FORECASTING TRENDS FOR PROACTIVE CRIME PREVENTION AND DETECTION USING WEKA DATA MINING TOOL-KIT

Ramesh Singh
National Informatics Centre, New Delhi, India

Rahul Thukral
Department of Computer Science and Engineering, DCE, India

ABSTRACT

One of the most complex tasks that law enforcement agencies across the world face is analyzing crime data so that crime trends can be forecast. This involves predicting the behaviour of criminals so that proactive preventive measures can be taken by law enforcement officials. Data mining is emerging as one of the tools for crime analysis, prediction of crime trends and many other related applications. This paper presents an implementation of a classification algorithm analysis tool on crime data collected from CIPA (Common Integrated Police Application), with the purpose of predicting criminal behaviour and crime trends in a particular area. The proposed system can provide answers to the following types of questions:

What is the state-to-state or district-to-district crime trend?
What is the probability or tendency of a particular age group committing a specific type of crime?
Which day of the week is the safest in a particular area?
What is the average response time for the PCR van to reach the crime spot?
How likely are different crimes, so that proactive preventive steps can be taken?

Keywords
Weka Data Mining Tool-kit, Decision Tree, Machine Learning, Classification, Clustering

INTRODUCTION

Data mining approaches are used for classification and prediction. Data mining helps researchers set up a model in a short period, analyze large data sets and predict results using classification algorithms [1]. Decision trees [6] are among the most powerful classification algorithms and are increasingly popular in the field of data mining. Popular decision tree algorithms include ID3 and C4.5 [2-3]. A major challenge facing all law-enforcement and intelligence-gathering organizations is accurately and efficiently analyzing the growing volumes of crime data. We present a general framework for crime data mining using the WEKA Workbench [4].

Data mining is defined as the process of discovering patterns in data. The patterns discovered must be meaningful in that they lead to some advantage. The data is invariably present in substantial quantities. We are interested in techniques for finding and describing structural patterns in data as a tool for helping to explain that data and make predictions from it. Useful patterns allow us to make non-trivial predictions on new data.

Clustering techniques [5] group data items into classes with similar characteristics so as to maximize intraclass similarity and minimize interclass similarity, for example to identify suspects who conduct crimes in similar ways or to distinguish among groups belonging to different gangs.

Classification finds common properties among different crime entities and organizes them into predefined classes. Often used to predict crime trends, classification can reduce the time required to identify crime entities. However, the technique requires a predefined classification scheme. Classification also requires reasonably complete training and testing data, because a high degree of missing data would limit prediction accuracy. Many learning techniques look for structural descriptions of what is learned, descriptions that can become fairly complex and are typically expressed as sets of rules or decision trees.
SPSS's commercially available PASW Modeler [9] is one of the tools that can identify crime trends and patterns both expediently and inexpensively. Many police departments across the world, such as the Richmond Police Department, have turned to such Business Intelligence (BI) software to make better analytical, operational and policy decisions relating to crime. Features of this Modeler include Nearest Neighbour, to quickly and easily group similar crime cases using prediction or segmentation techniques, and statistical integration and visualization of cluster breakdowns. IBM's Cognos Crime Information Warehouse [8] is another tool that allows police departments to report, analyze and understand crime statistics in near-real time. Rather than merely reacting, departments with instant access to accurate information can redeploy in response to crime trends as they occur.

Despite the compelling benefits offered by these commercial crime data mining frameworks, the open-source availability of the Weka data mining framework, its generic architecture and its implementation of a large number of machine learning algorithms make it comparable to the commercial crime prediction frameworks [7-8] available in the market.

This paper is organized as follows. The next section introduces the WEKA pre-processing stage, where the data is prepared for crime predictive analysis. Sections 3 and 4 give the methodology used to conduct the prediction analysis on the crime data and the experimental results. Conclusions and future work are given in the last section.

PRE-PROCESSING

A large amount of crime data is recorded and accumulated in the Common Integrated Police System. However, the original dataset often includes noisy, missing and inconsistent data. Data pre-processing improves the quality of the data and facilitates efficient data mining. ARFF files (used in Weka [4]) are used as the file format for data files. This provides an external way of defining the type of each attribute in the data sets, and makes comparisons and collaboration with Weka easy. An alternative way to input data is the Open DB option, which accepts data from a database. As most of the voluminous data generated in the crime domain resides in various databases, it is a good option to use the existing database to import data into the Weka environment.

One issue with WEKA is that it handles data mostly in numeric and nominal formats, so a mapping between the WEKA data types and the data types supported by the DBMS has to be defined, as demonstrated in Figure 1.

Figure 1 - An additional mapping snippet for modifying the Weka DatabaseUtils.props to support Postgres-specific data types

We chose a dataset that contains the crime records for the years 2006 and 2007 for the various districts in the New Delhi state. The real crime data was obtained from the CIPA (Common Integrated Police Application) project under non-disclosure agreements. The operational data was converted into de-normalized data using extraction and transformation; for example, Date/Time data are separated into columns such as Year, Month, Day, Hour and AmPm. As there is a general tendency for crimes to happen at night, the time of the crime also plays an important decisive role in this process. With these considerations in mind, the next task was identifying the significant attributes for the classification process. This involved taking feedback from crime domain experts; based on their evaluations, the attributes that could help in crime prediction were short-listed. The short-listed attributes are given in Figure 2.
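The pre-processing steps described above (splitting a Date/Time value into Year, Month, Day, Hour and AmPm columns, then writing the de-normalized rows as an ARFF file) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record fields, attribute names, timestamp format and sample values below are hypothetical, and missing values are written as "?" following the ARFF convention.

```python
from datetime import datetime


def split_timestamp(ts):
    """Split a 'YYYY-MM-DD HH:MM' string (assumed format) into separate columns."""
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M")
    return {
        "Year": dt.year,
        "Month": dt.month,
        "Day": dt.day,
        "Hour": dt.hour,
        "AmPm": "AM" if dt.hour < 12 else "PM",
    }


def to_arff(relation, attributes, rows):
    """Render rows as ARFF text.

    attributes is a list of (name, spec) pairs, where spec is either the
    string "numeric" or a list of allowed nominal values.
    """
    lines = ["@relation " + relation, ""]
    for name, spec in attributes:
        if spec == "numeric":
            lines.append("@attribute " + name + " numeric")
        else:
            lines.append("@attribute " + name + " {" + ",".join(spec) + "}")
    lines += ["", "@data"]
    for row in rows:
        cells = []
        for name, _ in attributes:
            value = row.get(name)
            cells.append("?" if value is None else str(value))  # "?" marks a missing value
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"


# Hypothetical raw records standing in for the operational data.
raw = [
    {"timestamp": "2006-04-12 23:40", "district": "66", "commit_again": "yes"},
    {"timestamp": "2007-05-03 09:15", "district": "62", "commit_again": None},
]

rows = []
for rec in raw:
    row = split_timestamp(rec["timestamp"])
    row["District"] = rec["district"]
    row["CommitAgain"] = rec["commit_again"]
    rows.append(row)

attributes = [
    ("Year", "numeric"),
    ("Month", "numeric"),
    ("Day", "numeric"),
    ("Hour", "numeric"),
    ("AmPm", ["AM", "PM"]),
    ("District", ["62", "66"]),
    ("CommitAgain", ["yes", "no"]),
]

arff_text = to_arff("crime", attributes, rows)
print(arff_text)
```

Declaring District and CommitAgain as nominal attributes, rather than numeric ones, reflects the point made above that Weka works mostly with numeric and nominal data.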

Figure 2 - List of Selected Attributes for Crime Analysis

CLASSIFICATION & PREDICTION OF CRIME TRENDS

The aim here is to use data mining to detect much more complex patterns, since in real life there are many attributes or factors for a crime and often only partial information is available about it. In general it will not be easy for a computer analyst or detective to identify these patterns by simple querying. Thus classification and clustering techniques from data mining come in handy for dealing with noisy or missing data about crime incidents.

RESULTS AND ANALYSIS

Based on these results, the following observations can be made that might be useful in predicting crime behaviour or trends in an area.

4.1 Age Group Factor

Figure 3 - Shows the tendency of crime to happen within a particular age group.
X-Axis: the various age groups
Y-Axis: the no. of instances
Legend: Red - Commit Again, Blue - Don't Commit Again

The following observation can be made from the above:
(i) People from the age group 20-35 are more prone to committing crime again.

4.2 Crime Type Factor

Figure 4 - Shows the tendency of crime to happen based on the crime type.
X-Axis: the various crime types
Y-Axis: the no. of instances
Legend: Red - Commit Again, Blue - Don't Commit Again

The following observations can be made from the above:
(i) Out of the 3638 instances in the crime dataset, the majority of the people were prosecuted under the IPC, 1860 Act (Crime Type Code 0043), and the probability of crimes of this type being committed again is high.
(ii) Out of the 749 instances of criminals prosecuted under the Punjab Excise Act, 1914 (Crime Type Code 0708), none showed any probability of committing crime again.
(iii) Out of the 458 criminals prosecuted under the Arms Act, 1959 (Crime Type Code 0004), many had a tendency of committing crime again, indicating a possible flaw in the punishment; hence there might be a need to replace this law with a tougher one.

4.3 Religion Factor

Figure 5 - Shows the tendency of crime to happen based on the religion.
X-Axis: the various religions
Y-Axis: the no. of crime instances
Legend: Red - Commit Again, Blue - Don't Commit Again

The following observations can be made from the above:
(i) The majority of criminal cases were registered against Religion code 003, and this group also showed the higher tendency of committing crime again.
(ii) The second religious group, Religion code 0006, also showed a tendency of committing crimes again.

4.5 Category Factor

Figure 6 - Shows the tendency of crime to happen based on the category.
X-Axis: the various categories
Y-Axis: the no. of crime instances
Legend: Red - Commit Again, Blue - Don't Commit Again

The following patterns can be observed from the above:
(i) People belonging to Category Code A had a greater tendency of committing crimes again.
(ii) People belonging to Categories O and T showed no pattern of crimes being committed again.

4.6 Living Status Factor

Figure 7 - Shows the tendency of crime to happen based on the living status of the accused.
X-Axis: the different living statuses
Y-Axis: the no. of crime instances
Legend: Red - Commit Again, Blue - Don't Commit Again

The following patterns can be observed from the above:
(i) People with living status 002 (Lower Income Group) had more criminal proceedings registered and a slightly greater tendency of committing crimes again.
(ii) People with living status 006 (Upper Income Group) registered the lowest no. of crimes.

4.7 Month Factor

Figure 8 - Shows the tendency of crime to happen during the various months of the year.
X-Axis: the months of the year
Y-Axis: the no. of crime instances
Legend: Red - Commit Again, Blue - Don't Commit Again

The following patterns can be observed from the above:
(i) The months of April and May witnessed more crimes.
(ii) The no. of crime incidents increases between January and May and declines thereafter.

4.8 Crime Trend Prediction (District Wise)

Figure 9 - Shows the district-wise crime patterns for the various months of the year.
X-Axis: the various months
Y-Axis: the no. of crime instances
Legend: Blue - District 66 (North Delhi District), Red - District 7 (South West Delhi District), Cyan - District 73 (North East Delhi District), Dull Green - District 62 (Central Delhi District)

The following crime trends can be observed:
(i) District 66 and District 7 witnessed an increasing trend in crime rate until the month of May.
(ii) The crime rate in District 73 and District 62 did not change significantly over the months.

4.9 Year-Wise Crime Patterns in Various Districts

Figure 10 - Shows the district-wise crime patterns for the years 2006-2007.
X-Axis: the various years
Y-Axis: the no. of crime instances
Legend: Blue - District 66 (North Delhi District), Red - District 7 (South West Delhi District), Cyan - District 73 (North East Delhi District), Dull Green - District 62 (Central Delhi District)

The following crime trends can be observed:
(i) District 66 registered a 90 % increase in the no. of crime incidents from 2006 to 2007.
(ii) Other districts also registered an alarming increase in the no. of crime incidents compared to previous years.

4.10 District Wise Distribution of Violent Crime

Figure 11 - Shows the district-wise crime patterns for violent crimes.
X-Axis: whether violent crime or not
Y-Axis: the no. of crime instances
Legend: Blue - District 66 (North Delhi District), Red - District 7 (South West Delhi District), Cyan - District 73 (North East Delhi District), Dull Green - District 62 (Central Delhi District)

The following crime pattern can be observed:
(i) District 66 and District 7 witnessed an increasing rate of violent crime compared to previous years.
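The per-attribute breakdowns in Sections 4.1 to 4.10 amount to counting instances per attribute value, split by the class label (Commit Again or not), and the year-wise comparison is a percentage change between two counts. The following is a minimal Python sketch of both computations. The records, field names and values are hypothetical toy data, not the CIPA dataset, and the sketch does not reproduce Weka's visualization.

```python
from collections import Counter, defaultdict


def class_counts(records, attribute, label="commit_again"):
    """Count instances per attribute value, split by class label."""
    table = defaultdict(Counter)
    for rec in records:
        table[rec[attribute]][rec[label]] += 1
    return {value: dict(counts) for value, counts in table.items()}


def percent_change(old, new):
    """Percentage change from an old count to a new count."""
    if old == 0:
        raise ValueError("percentage change is undefined for a zero baseline")
    return 100.0 * (new - old) / old


# Hypothetical toy records.
records = [
    {"age_group": "20-35", "year": 2006, "commit_again": "yes"},
    {"age_group": "20-35", "year": 2007, "commit_again": "yes"},
    {"age_group": "20-35", "year": 2007, "commit_again": "no"},
    {"age_group": "36-50", "year": 2007, "commit_again": "no"},
]

by_age = class_counts(records, "age_group")          # data behind a class-coloured histogram
per_year = Counter(rec["year"] for rec in records)   # instances per year
growth = percent_change(per_year[2006], per_year[2007])

print(by_age)
print(growth)
```

The same class_counts call with a different attribute name (crime type, month, district and so on) yields the counts behind each of the other charts.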

4.11 Decision Tree Generated from Crime Data

We used decision tree classifiers as the supervised learning method to generate the tree and the rules that are used to formulate predictions.

Figure 12 - Decision Tree

Out of the 23 attributes in the training data set provided, only 4 were deciding factors that could be used for predicting crime trends. This depends heavily on the crime instances that were fed in as the training set. The full training set classifier model was chosen in Weka; its output, headed "=== Classifier model (full training set) ===", is the textual tree of Figure 13.
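Why only a few of the attributes end up as deciding factors can be illustrated with the attribute-selection idea behind decision tree learners: ID3 ranks attributes by information gain, and C4.5 refines this with a gain ratio. The paper does not state which Weka classifier or settings were used, so the following Python sketch is only an illustration of information gain on hypothetical toy records, not a reproduction of the authors' model.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(records, attribute, label="commit_again"):
    """Reduction in label entropy obtained by splitting on one attribute."""
    base = entropy([rec[label] for rec in records])
    groups = {}
    for rec in records:
        groups.setdefault(rec[attribute], []).append(rec[label])
    remainder = sum(len(g) / len(records) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical toy records: 'violent' separates the classes, 'gender' does not.
records = [
    {"violent": "yes", "gender": "male", "commit_again": "yes"},
    {"violent": "yes", "gender": "female", "commit_again": "yes"},
    {"violent": "no", "gender": "male", "commit_again": "no"},
    {"violent": "no", "gender": "female", "commit_again": "no"},
]

# Rank candidate attributes from most to least informative.
ranked = sorted(
    ["violent", "gender"],
    key=lambda name: information_gain(records, name),
    reverse=True,
)
print(ranked)
```

An attribute whose split leaves the class mix unchanged has zero gain and would not be chosen as a test in the tree, which is consistent with most of the attributes not appearing in the generated tree.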

Figure 13 - Textual Decision Tree

This supervised decision tree classification algorithm identified violent crime, age, employed and gender as the decisive attributes in predicting whether a particular criminal has the tendency of committing crime again. Based on the past crime records, it found the following pattern:

If it is a violent crime
And age between 9 to 35
And not employed
And gender male
Then the probability of this crime incident recurring is very high

4.12 Predictions

Apart from visualizing and predicting crime trends, one can get answers to questions like:

Q. What is the probability that a crime of Crime Type 0708 would be committed by an Indian National (080) in district 62 of the Delhi region territory?

Figure 14 - Prediction

Based on the patterns in the previous crime records, we can predict the probability of occurrence of the above instance. Depending on the patterns and trends in past criminal cases, the algorithm predicted the possibility of such an incident as 'no' with a probability of .796. As crime type 0708 is very uncommon in district 62, the probability of the above crime incident is very low; hence policing can be planned and diverted to more crime-prone areas.

CONCLUSION & FUTURE DIRECTION

We looked at the use of the data mining tool WEKA for identifying crime patterns using statistics and classification techniques. Our contribution here was to formulate crime pattern detection as a machine learning task and thereby to use data mining to support police detectives in solving crimes. Some of the benefits include:

I. Less time and effort required to make more informed decisions for proactive crime prevention.
II. Better understanding of crime patterns and trends.
III. Reduced crime and identification of serial offenders.
IV. Help in making informed decisions, analyzing crime patterns and planning policing.
V. Research into crime trends.
VI. Easy discernment of emerging and established crime trends as well as deviations and anomalies.

One limitation of our study is that crime pattern analysis is sensitive to the quality of the input data, which may be inaccurate or have missing information. Also, mapping real data to data mining attributes is not always an easy task and often requires help from domain experts.

As a future extension of this study we will extend the Weka interface by generalizing it for the crime domain. We also plan to add support for geographic crime hot spot analysis.

REFERENCES

[1] Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco (2001).
[2] Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
[3] Quinlan, J.R.: Induction of Decision Trees. Machine Learning, vol. 1, Morgan Kaufmann, pp. 81-106, 1986.
[4] WEKA project, WEKA 2009, University of Waikato. Available from: www.cs.waikato.ac.nz/~ml/weka/.
[5] Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Computing Surveys, 31(3), 1999, 264-323.
[6] Mitra, S., Acharya, T.: Data Mining: Multimedia, Soft Computing, and Bioinformatics. John Wiley & Sons, Inc. 200
[7] McCue, C.: Using Data Mining to Predict and Prevent Violent Crimes, at http://www.spss.com
[8] Cognos Crime Information Warehouse, at http://www.cognos.com
[9] SPSS's PASW Modeler at Richmond Police Department. http://www.spss.com/success/pdf/rpdcs-2009.pdf