CHAPTER 6 HUMAN BEHAVIOR UNDERSTANDING MODEL

6.1 INTRODUCTION

Analyzing human behavior in video sequences has been an active field of research for the past few years. Vital applications of this field include the monitoring of behaviors at secure installations, video surveillance, video retrieval and human-computer interaction systems. The main objective is to recognize and predict behavior and to detect abnormalities. Many researchers have contributed approaches that predict behavior as a post-processing task. In this work, it is proposed to analyze the behavior of a human action during its course. In common scenarios, such as parking lots and supermarkets, the visual surveillance system should detect abnormal behavior (as an indication of theft) and raise an alarm to alert the visual analysts. Hence, the action patterns of people should be analyzed and the state of each action detected as either normal or abnormal in order to understand the behavior. The characterization of human behavior amounts to dealing with a sequence of video frames that contains both spatial and temporal information (Cadamo et al 2010). The temporal information conveys more details for human behavior understanding. Normally, human posture analysis is the basic step for extracting the temporal information. During human posture analysis, various human behavior patterns are exhibited in the form of key postures such as turnleft, guardkick, falldown, etc.

This chapter presents a novel human behavior understanding model that analyzes human movements and learns the human posture status, either normal or abnormal, from video sequences using a Probabilistic Global Action Graph (PGAG). Based on the posture analysis, the status of the human behavior can be predicted as either normal or abnormal using the proposed approach. The process flow of the human behavior understanding model is shown in Figure 6.1.

Figure 6.1 Process flow of the human behavior understanding model (training phase: video capturing, preprocessing by foreground segmentation into silhouettes, feature extraction of TSOC and COC, vector quantization into VQ symbols, and behavior modeling with the PGAG; testing phase: feature extraction from the test video, state likelihood estimation via arg max_j P_ij, and behavior alarming with normal/abnormal status)

The proposed human behavior understanding model consists of two phases, namely the training and testing phases. The training phase involves the following pipeline of processes: (i) foreground segmentation, (ii) feature extraction, (iii) vector quantization, and (iv) Probabilistic Global Action Graph construction.

(i) A pixel-layer based approach is used as an initial preprocessing step to segment the foreground (human silhouette) from the action video.

(ii) TSOC and COC are the features extracted from each silhouette.

(iii) Vector quantization groups similar postures together and creates a finite number of key postures representing the code book. The 35-dimensional shape feature vector of each key posture is symbolized as a one-dimensional VQ symbol in the code book (see the sketch below).

(iv) A semi-automatic, state-space based human behavior understanding model is simulated using the Probabilistic Global Action Graph (PGAG).

To evaluate the designed model, a test phase is formulated, which has two major steps:

(i) For the input silhouette sequence, the most likely key posture is identified using a similarity measure.

(ii) The key posture is analyzed as either normal or abnormal via the PGAG, and an alarm is raised accordingly.

In any real-time system, the behavior model is essential for capturing domain knowledge irrespective of the action. The rest of the chapter focuses on designing a domain-specific behavior model based on key posture transitions.
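The grouping of posture feature vectors into key postures can be illustrated with k-means clustering. The following is a minimal sketch, assuming the 35-dimensional TSOC/COC feature vectors are stacked in a NumPy array; the variable names and the choice of scikit-learn's KMeans are assumptions for illustration, not the exact implementation used in this work.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(posture_features: np.ndarray, n_key_postures: int):
    """Vector-quantize posture features into key postures (VQ symbols).

    posture_features: (n_frames, 35) array of TSOC + COC shape features.
    Returns the fitted k-means model (the code book) and the VQ symbol
    (cluster index) assigned to every training frame.
    """
    kmeans = KMeans(n_clusters=n_key_postures, n_init=10, random_state=0)
    vq_symbols = kmeans.fit_predict(posture_features)
    return kmeans, vq_symbols

# Example: quantize 1000 synthetic 35-D feature vectors into 82 key postures,
# mirroring the 82-symbol code book used later in this chapter.
features = np.random.rand(1000, 35)
codebook, symbols = build_codebook(features, n_key_postures=82)
print(symbols[:10])  # one one-dimensional VQ symbol per frame
```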

6.2 PROBABILISTIC GLOBAL ACTION GRAPH (PGAG)

In general, every action has a finite number of key postures, and there exists a bound relation across actions due to the restricted movement of the human body. Hence, related action sequences may share a few of the key postures. These relations can be illustrated with two cases:

(i) Considering the normal actions walk and turnaround, a few of the frequently correlated postures are heel-strike, toe-off, mid-stance, etc.

(ii) Similarly, for the abnormal actions shotgun and falldown, raise-hand, point, bend-knee, lower-body and fall-on-floor are the correlated postures.

Normally, a few actions exhibit similar key postures either at the beginning or ending point of their occurrence. A weighted directed action graph termed the PGAG is constructed, which acquires and distinguishes the posture transitions across various composite actions globally. In this graph, each node represents a key posture. The weighted link between nodes represents the transition probability between the two key postures. The temporal characteristic of each action is obtained from the posture transitions. Under the state-level hypothesis, transitions among nodes signify the occurrence of an event. Events can be defined based on the dominant and persistent characteristics of the posture transitions. Hence, the PGAG possesses the capability of understanding behavior in terms of posture transitions.

6.2.1 Construction of PGAG

The PGAG is constructed using a probabilistic posture transition matrix. The steps involved in the construction of the PGAG are as follows:

1. The number of nodes in the PGAG equals the number of VQ symbols in the posture code book (i.e., # of PGAG nodes = # of key postures).

2. For each pair of postures, the posture transition probability $P_{ij}$ between key posture $i$ and key posture $j$ is obtained as

$$P_{ij} = \frac{\text{number of transitions from posture } i \text{ to posture } j}{\text{number of transitions from posture } i} \qquad (6.1)$$

3. The posture transition probabilities are constrained (as sketched in code below) by

$$\sum_{j=1}^{m} P_{ij} = 1 \qquad (6.2)$$

where the sum of the transition probabilities from the i-th posture to all postures j must equal 1, and m is the total number of key postures. Hence, the posture transition matrix has dimension m x m.

The PGAG for six key postures of the runstop action is depicted in Figure 6.2. In this graph, the runstop action is performed in multiple views. The transition paths are normally cyclic in nature, and there exist specific beginning and ending key postures, as detailed in Figure 6.2.
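Concretely, Equations (6.1) and (6.2) amount to counting consecutive VQ-symbol pairs and row-normalizing the counts. The sketch below, with assumed variable names, illustrates this estimation; it is not the exact implementation used in this work.

```python
import numpy as np

def posture_transition_matrix(vq_sequence, m: int) -> np.ndarray:
    """Estimate the m x m posture transition matrix P from a VQ symbol sequence.

    vq_sequence: list of key posture indices (0..m-1), one per frame.
    Each row of P sums to 1 (Equation 6.2) wherever the posture occurs.
    """
    counts = np.zeros((m, m))
    for i, j in zip(vq_sequence[:-1], vq_sequence[1:]):
        counts[i, j] += 1                      # transition posture i -> posture j
    row_totals = counts.sum(axis=1, keepdims=True)
    # Equation (6.1): divide each count by the total transitions out of posture i;
    # rows for unseen postures are left as all zeros.
    return np.divide(counts, row_totals, out=np.zeros_like(counts),
                     where=row_totals > 0)

# Example: a toy runstop-like sequence over 6 key postures (P0..P5).
seq = [0, 1, 2, 3, 4, 3, 5, 5]
P = posture_transition_matrix(seq, m=6)
print(P.round(3))
```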

Figure 6.2 PGAG with 6 nodes (P0-P5) and posture transition probabilities for the runstop action

The corresponding posture transition matrix of the PGAG is listed in Table 6.1. The matrix has zero probabilities in most entries below the main diagonal; the main reason is that the runstop action has temporally related key postures with predominantly left-to-right transitions.

Table 6.1 Posture transition matrix for runstop action

| P_ij | P0    | P1    | P2    | P3    | P4    | P5    |
|------|-------|-------|-------|-------|-------|-------|
| P0   | 0.109 | 0.165 | 0.226 | 0.125 | 0.245 | 0.130 |
| P1   | 0.000 | 0.228 | 0.450 | 0.000 | 0.000 | 0.322 |
| P2   | 0.000 | 0.130 | 0.200 | 0.370 | 0.000 | 0.300 |
| P3   | 0.000 | 0.000 | 0.190 | 0.340 | 0.454 | 0.016 |
| P4   | 0.000 | 0.000 | 0.000 | 0.520 | 0.480 | 0.000 |
| P5   | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 |

6.3 BEHAVIOR UNDERSTANDING MODEL

The constructed PGAG can be effectively used for analyzing the frame-level action dynamics in the form of key posture transitions. A human behavior understanding model is simulated, which predicts the status of the key postures as either normal or abnormal using a priori knowledge. In the training phase, each VQ symbol is assigned a unique behavior status, either normal or abnormal. The model notifies abnormal behavior in the sequence of events: it raises an alarm during an abnormal event by analyzing the current state and the next state using the PGAG and the VQ symbol status (normal/abnormal). The probabilistic state transitions are described in the form of four cases, depicted in Figures 6.3(a) to 6.3(d).

Case 1: The initial state (t = 1) is normal.

Figure 6.3(a) Case 1 of PGAG

Case 2: The current state is normal and the next state is most probably normal (the transition out of posture P_i is governed by whether P_ii or some P_ij exceeds 0.5).

Figure 6.3(b) Case 2 of PGAG

Case 3: The initial state (t = 1) is abnormal.

Figure 6.3(c) Case 3 of PGAG

Case 4: The current state is abnormal and the next state is most probably abnormal (again governed by whether P_ii or some P_ij exceeds 0.5).

Figure 6.3(d) Case 4 of PGAG

In Case 2 and Case 4, the decision quantity for posture i is the maximum likelihood of posture transitions from the current posture i to any posture j, i.e. the maximum value of the i-th row of the posture transition matrix; these cases are summarized in the sketch below.

In the testing phase, the behavior status for a video can be plotted as the number of frames versus their posture status indication (where normal = 0 and abnormal = 1). Thus, the PGAG-based human behavior understanding model is capable of measuring the probabilistic likelihood of the next state of the posture sequence and generating an appropriate alarm for the concerned authorities in real-time.
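The four cases can be condensed into a small per-frame decision routine. The sketch below is one assumed reading of the case analysis, using the transition matrix P, a per-symbol normal/abnormal table, and hypothetical names; it is not the exact alarming logic of the implemented system.

```python
import numpy as np

def behavior_status(vq_sequence, P: np.ndarray, is_abnormal: dict):
    """Label each frame 0 (normal) or 1 (abnormal) from its VQ symbol
    (Cases 1 and 3) and predict the most probable status of the next
    state via the maximum-likelihood row of the PGAG (Cases 2 and 4).
    """
    status = [1 if is_abnormal[i] else 0 for i in vq_sequence]
    i = vq_sequence[-1]
    j = int(np.argmax(P[i]))          # maximum value of the i-th row
    predicted_next = 1 if is_abnormal[j] else 0
    return status, predicted_next

# Toy example: 6 postures, postures 4 and 5 annotated abnormal.
P = np.full((6, 6), 1.0 / 6)
abnormal = {k: k >= 4 for k in range(6)}
frames, nxt = behavior_status([0, 1, 4, 5], P, abnormal)
print(frames, "next state most probably", "ABNORMAL" if nxt else "NORMAL")
```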

6.4 EXPERIMENTS

The proposed human behavior understanding model is evaluated on the public video data set MuHAVi-MAS (http://dipersec.king.ac.uk/muhavi-mas/). The silhouette images are in PNG format, and each action combination can be downloaded as a small zip file (between 1 and 3 MB). The developers of MuHAVi-MAS have also prefixed the three constant characters "GT-" to every original image name to label it as a ground truth image. Five composite action classes, namely CA1-walkturnback, CA2-runstop, CA3-punch, CA4-kick and CA5-shotguncollapse, are available along with a manually annotated action status for the corresponding image frames. The data set also contains information about the actor, camera views and sample identity. Thus, the MuHAVi-MAS data set has enough information to validate the performance of the proposed PGAG-based human behavior understanding model. Sample frames from the data set for the five composite actions are shown in Figure 6.4.

Figure 6.4 Sample image frames from the MuHAVi-MAS data set for the 5 composite actions: (a) CA3 - Punch, (b) CA4 - Kick, (c) CA5 - ShotGunCollapse, (d) CA1 - WalkTurnBack, (e) CA2 - RunStop

This multi-view data set, recorded with five cameras, contains ground truth that is explicitly represented for each of the composite actions performed by five actors. These five composite actions have been logically partitioned into 14 primitive actions, as detailed in Table 6.2.

Table 6.2 Detailed specification of the 14 primitive actions from the MuHAVi-MAS data set

| Composite action label | Composite action (status) | Primitive action label | Primitive action | Data set size = samples x frames |
|---|---|---|---|---|
| CA1 | WalkTurnBack (N - Normal) | C11 | WalkRightToLeft | 8 x 72 = 576 |
| | | C13 | TurnBackRight | 4 x 61 = 244 |
| | | C12 | WalkLeftToRight | 8 x 86 = 708 |
| | | C14 | TurnBackLeft | 4 x 54 = 216 |
| CA2 | Run_Stop (N - Normal) | C9 | RunRightToLeft | 8 x 66 = 528 |
| | | C13 | TurnBackRight | 4 x 52 = 208 |
| | | C10 | RunLeftToRight | 8 x 78 = 624 |
| | | C14 | TurnBackLeft | 4 x 51 = 204 |
| CA3 | Punch (AN - Abnormal) | C8 | GuardToPunch | 16 x 28 = 448 |
| | | C7 | PunchRight | 16 x 46 = 736 |
| CA4 | Kick (AN - Abnormal) | C6 | GuardToKick | 16 x 28 = 448 |
| | | C5 | KickRight | 16 x 47 = 752 |
| CA5 | ShotGunCollapse (AN - Abnormal) | C1 | CollapseRight | 8 x 84 = 672 |
| | | C3 | StandupRight | 8 x 120 = 960 |
| | | C2 | CollapseLeft | 8 x 93 = 744 |
| | | C4 | StandupLeft | 4 x 112 = 448 |

Based on the available ground truth, the data set with 140 video samples contains 3308 normal frames and 5208 abnormal frames; as a whole, 8516 frames are considered for the experimentation. The data set detailed in Table 6.2 is uniformly partitioned into two sets, namely a Train Set and a Test Set. The number of frames considered per composite action in this partitioning, together with the per-frame ground truth status (normal or abnormal), is given in Table 6.3.

Table 6.3 MuHAVi-MAS data set partitioning

| Data Set | CA1 | CA2 | CA3 | CA4 | CA5 | Normal | Abnormal |
|---|---|---|---|---|---|---|---|
| Train set | 872 | 782 | 592 | 600 | 1412 | 2248 | 2010 |
| Test set | 872 | 782 | 592 | 600 | 1412 | 2248 | 2010 |

In the training phase, the Train set is used to learn the behavior by updating the PGAG. From each action video's silhouettes, the TSOC and COC features are extracted and then vector quantized into 82 key postures. The recognized key postures are further subcategorized into 39 normal and 43 abnormal key postures. The detailed categorization of key postures is listed in Table 6.4.

Table 6.4 Categorization of key postures per composite action

| Composite action | No. of key postures | VQ symbols | No. of normal postures | No. of abnormal postures |
|---|---|---|---|---|
| WalkTurnBack | 15 | w1-w15 | 15 | 0 |
| Kick | 17 | k1-k17 | 3 | 14 |
| Punch | 18 | p1-p18 | 4 | 14 |
| Runstop | 12 | r1-r12 | 12 | 0 |
| ShotGunCollapse | 20 | s1-s20 | 5 | 15 |

The recognized 82 key postures form the nodes of the PGAG, and the 82 x 82 posture transition probability matrix is computed by considering the similarity between the training postures and the key postures. The runstop action has 12 key postures, and their posture transition matrix is listed in Table 6.5.

Table 6.5 Posture transition matrix for runstop action using MuHAVi-MAS data samples

| P_ij | r1 | r2 | r3 | r4 | r5 | r6 | r7 | r8 | r9 | r10 | r11 | r12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| r1 | 0.351 | 0.149 | 0.020 | 0.007 | 0.041 | 0.027 | 0.027 | 0.074 | 0.074 | 0.047 | 0.115 | 0.068 |
| r2 | 0.159 | 0.280 | 0.098 | 0.015 | 0.030 | 0.023 | 0.061 | 0.121 | 0.091 | 0.008 | 0.098 | 0.015 |
| r3 | 0.041 | 0.107 | 0.339 | 0.132 | 0.008 | 0.116 | 0.017 | 0.190 | 0.000 | 0.000 | 0.041 | 0.008 |
| r4 | 0.000 | 0.028 | 0.135 | 0.390 | 0.057 | 0.199 | 0.014 | 0.149 | 0.000 | 0.014 | 0.014 | 0.000 |
| r5 | 0.050 | 0.042 | 0.034 | 0.050 | 0.210 | 0.084 | 0.000 | 0.143 | 0.025 | 0.185 | 0.134 | 0.042 |
| r6 | 0.022 | 0.014 | 0.065 | 0.275 | 0.080 | 0.348 | 0.007 | 0.152 | 0.000 | 0.007 | 0.029 | 0.000 |
| r7 | 0.064 | 0.085 | 0.213 | 0.043 | 0.043 | 0.021 | 0.277 | 0.106 | 0.064 | 0.000 | 0.085 | 0.000 |
| r8 | 0.026 | 0.093 | 0.144 | 0.093 | 0.093 | 0.113 | 0.015 | 0.273 | 0.005 | 0.021 | 0.093 | 0.031 |
| r9 | 0.126 | 0.165 | 0.000 | 0.000 | 0.000 | 0.010 | 0.019 | 0.010 | 0.505 | 0.000 | 0.029 | 0.136 |
| r10 | 0.090 | 0.010 | 0.000 | 0.010 | 0.170 | 0.000 | 0.000 | 0.020 | 0.030 | 0.310 | 0.240 | 0.120 |
| r11 | 0.099 | 0.033 | 0.007 | 0.013 | 0.139 | 0.033 | 0.033 | 0.159 | 0.040 | 0.106 | 0.265 | 0.073 |
| r12 | 0.136 | 0.027 | 0.009 | 0.000 | 0.055 | 0.018 | 0.009 | 0.000 | 0.109 | 0.145 | 0.045 | 0.445 |

At any current state of the simulated model with the i-th key posture, the possible next-state transition is evaluated using the maximum likelihood in the i-th row of Table 6.5, which represents the most probable transition of the i-th key posture. Cells with the value 0.000 imply that no transitions occurred between the corresponding postures.

6.4.1 GUI Design for Human Behavior Analysis

A GUI for human behavior analysis is implemented, which includes a Browse option and display provisions for the frame number, the similarity score with the closest key posture, the key posture label and the current action status. The Browse option is used to select the test video sample for verifying the model performance. The frame number indicates the current silhouette being processed in the input video. The similarity score, in the range [0..1], provides the distance measure between the current frame and the closest key posture in the code book. The posture label displays the VQ symbol for the action. The posture type indicates the behavior alarm as either normal or abnormal. The model's performance for the given video is plotted as the number of frames versus the posture status, where normal and abnormal are scaled as 0 and 1 respectively. The per-frame matching behind this display is sketched below.
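The following minimal sketch shows the per-frame matching the GUI reports, assuming a code book array of key posture feature vectors, a per-symbol status table, and a normalized distance as the similarity score; all names, and the distance-to-similarity mapping, are assumptions for illustration.

```python
import numpy as np

def match_frame(frame_features: np.ndarray, codebook: np.ndarray, status: list):
    """Find the closest key posture for one frame's 35-D feature vector.

    codebook: (m, 35) array, one row per key posture (VQ symbol).
    status: per-symbol label, 'NORMAL' or 'ABNORMAL'.
    Returns (vq_symbol, similarity in [0..1], status label).
    """
    dists = np.linalg.norm(codebook - frame_features, axis=1)
    vq = int(np.argmin(dists))
    # Map the distance to a [0..1] similarity score (assumed convention).
    similarity = 1.0 / (1.0 + dists[vq])
    return vq, similarity, status[vq]

# Example with a toy 82-symbol code book (39 normal, 43 abnormal symbols).
codebook = np.random.rand(82, 35)
labels = ['NORMAL'] * 39 + ['ABNORMAL'] * 43
vq, sim, label = match_frame(np.random.rand(35), codebook, labels)
print(f"VQ symbol {vq}, similarity {sim:.2f}, status {label}")
```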

Figure 6.5 illustrates the summary of results obtained using the proposed PGAG-based behavior understanding model for a test sample of the abnormal action kick. This sample video sequence has 40 frames, out of which 29 frames are categorized as abnormal and 11 frames as normal. According to the annotation provided with the data set, the result achieved during behavior learning for this sample is 72.5%. Even though kick is categorized as abnormal at the action level, at the frame level its starting and ending frame sequences exhibit only a standing posture, which is a normal one. Hence, the proposed model has attained correct behavior understanding. Similarly, for the second test sample, the abnormal action shotguncollapse, the results are summarized in Figure 6.6. This sample consists of 50 frames, out of which 48 frames are categorized as abnormal. Thus, the proposed behavior understanding model obtained 96% accuracy. Likewise, for the third test sample, the normal action walkturnback shown in Figure 6.7, the reported accuracy is 95%.

Figure 6.5 GUI-based results for the kick action, where frame number 40 is alarmed as ABNORMAL and the unique VQ symbol from the PGAG is 56. The overall performance plot shows that, out of 40 frames, 29 frames are identified as ABNORMAL; hence the status is most probably ABNORMAL

Figure 6.6 GUI-based results for the shotguncollapse action, where frame number 50 is alarmed as ABNORMAL and the unique VQ symbol from the PGAG is 62. The overall performance plot shows that, out of 50 frames, 48 frames are identified as ABNORMAL; hence the status is most probably ABNORMAL

Figure 6.7 GUI-based results for the walkturnback action, where frame number 40 is alarmed as NORMAL and the unique VQ symbol from the PGAG is 23. The overall performance plot shows that, out of 40 frames, 38 frames are identified as NORMAL; hence the status is most probably NORMAL

6.4.2 Performance Analysis

The model is evaluated for predicting the normal or abnormal posture status using a test set of 56 video samples with 4258 frames. The test outcome can be either 1, i.e. predicting that the human has performed an abnormal action, or 0, i.e. predicting that the human has performed a normal action.

TP - true positives (abnormal, correctly declared abnormal)
TN - true negatives (normal, correctly declared normal)
FP - false positives (normal, incorrectly declared abnormal)
FN - false negatives (abnormal, incorrectly declared normal)

The performance is measured with the following metrics, computed as sketched in the code that follows:

Accuracy - the proportion of true results in the result set.

$$\text{Accuracy (Ac)} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (6.3)$$

Precision - the proportion of true positives among all positive results.

$$\text{Precision (Pr)} = \frac{TP}{TP + FP} \qquad (6.4)$$

Sensitivity - the proportion of actual positives that are correctly identified as such, also called the recall rate.

$$\text{Sensitivity (Sen)} = \frac{TP}{TP + FN} \times 100 \qquad (6.5)$$

Specificity - the proportion of negatives that are correctly identified.

$$\text{Specificity (Spe)} = \frac{TN}{TN + FP} \times 100 \qquad (6.6)$$
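Equations (6.3) to (6.6) translate directly into a small helper. The sketch below, with assumed names, reproduces the reported train-set figures from the confusion counts given in Table 6.6.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute the four metrics of Equations (6.3)-(6.6)."""
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),   # Eq. (6.3)
        "precision":   tp / (tp + fp),                    # Eq. (6.4)
        "sensitivity": 100 * tp / (tp + fn),              # Eq. (6.5), in %
        "specificity": 100 * tn / (tn + fp),              # Eq. (6.6), in %
    }

# Train-set confusion counts reported in Table 6.6.
print(classification_metrics(tp=1691, tn=1931, fp=319, fn=317))
# -> accuracy ~0.85, precision ~0.84, sensitivity ~84%, specificity ~86%
```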

The performance of the behavior model on the training and test data sets using the PGAG approach is reported in Table 6.6 and Table 6.7 respectively.

Table 6.6 Performance analysis of human behavior model on training set

| Data Set | N frames | AN frames | FP | FN | TP | TN | Ac | Pr | Sen (%) | Spe (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Train Set | 2248 | 2010 | 319 | 317 | 1691 | 1931 | 0.85 | 0.84 | 84 | 86 |

Table 6.6 reports the performance of the behavior model on the Train set of 4258 video frames. The proposed work has correctly categorized the normal status with 86% specificity and the abnormal status with 84% sensitivity. Hence, out of 2010 abnormal frames, 1691 are correctly recognized; similarly, out of 2248 normal frames, 1931 are correctly recognized, giving 85% accuracy.

Table 6.7 Performance analysis of human behavior model on test set

| Data Set | N frames | AN frames | FP | FN | TP | TN | Ac | Pr | Sen (%) | Spe (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Test Set | 2248 | 2010 | 341 | 329 | 1669 | 1919 | 0.84 | 0.83 | 84 | 85 |

Table 6.7 reports the performance of the behavior model on the Test set of 4258 unseen video frames. The proposed work has correctly categorized the normal status with 85% specificity and the abnormal status with 84% sensitivity. Hence, out of 2010 abnormal frames, 1669 are correctly recognized; similarly, out of 2248 normal frames, 1919 are correctly recognized, giving 84% accuracy.

For both the results reported in Table 6.6 and Table 6.7, the reason for reaching only around 84% is imperfect ground truth information. Based on the ground truth, the actions kick, punch and shotguncollapse have been marked as abnormal. However, as per human visual perception, even though these actions are marked as abnormal, about 13% of their frames, at the initial and end points, are in fact normal. Hence, the reported sensitivity and specificity are lower. The probabilistic behavior model is well structured and implemented on real-time data, and the performance shows that the system is highly reliable for behavior analysis.

6.5 SUMMARY

The main contribution of this chapter is a human behavior understanding model simulated for a real-time environment. Toward this objective, a state space approach is formulated, and the PGAG is proposed to learn the action dynamics at the frame level. The ultimate purpose of the system is to predict the behavior status as either normal or abnormal. The human behavior model is designed and evaluated on a multi-view data set with a train set of 4258 frames and a test set of 4258 frames, 8516 frames in total. The system is evaluated with four metrics.

The performance results achieved 86% specificity and 84% sensitivity for the train set; similarly, for the test set, the system achieved 85% specificity and 84% sensitivity. The simulated behavior understanding model can analyze video content and recognize human postures and the status of actions well in advance. The proposed model can thus be effectively utilized in real-world scenarios where behavior understanding is a complex task.

The forthcoming chapter presents concluding remarks and summarizes the findings of this research work. Future avenues for further extension are also highlighted.