Methods

ArcMap

To begin our analysis, we obtained the CSV file containing every crime incident in Chicago in 2013 from the City of Chicago Data Portal website. The fields needed from this file are the type of crime, the coordinates of each crime, and the number of the community area each crime falls into. We added this file to ArcMap, displayed its X and Y coordinates as points, and named the result the weapon layer. A layer containing the name and area number of each community, obtained from the same portal, was added on top of the crime layer. The third dataset, the socio-economic variables of each community, also came from the same portal. This last table has no coordinate system, but its area numbers are identical to those in the community layer, which does have one. We therefore joined the tables of the two layers to produce a new layer, named com_social, that holds the socio-economic variables for each community together with accurate longitude and latitude. The weapon layer then became the target layer for the join operation called "Join data from another layer based on spatial location", in which the number of weapon violations is summed for each community. The resulting file contains a new summary field known as count_, our dependent variable for the rest of the study, alongside the seven socio-economic independent variables.
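The counting and joining steps can be illustrated outside ArcMap with a minimal sketch. It swaps the spatial join for a plain attribute count, which assumes that the community area number recorded on each incident matches the polygon the point falls in. The records, field names, and values below are hypothetical placeholders, not data from the study.

```python
from collections import Counter

# Hypothetical incident records: (crime type, community area number).
incidents = [
    ("WEAPONS VIOLATION", 1),
    ("THEFT", 1),
    ("WEAPONS VIOLATION", 1),
    ("WEAPONS VIOLATION", 2),
]

# Hypothetical socio-economic table keyed by community area number.
socio = {
    1: {"name": "Area A", "pct_crowded": 5.0},
    2: {"name": "Area B", "pct_crowded": 3.0},
    3: {"name": "Area C", "pct_crowded": 1.0},
}


def count_by_area(records, crime_type):
    """Count incidents of one crime type per community area number."""
    return Counter(area for kind, area in records if kind == crime_type)


def join_counts(table, counts):
    """Attach a count_ field to every community, using 0 where none occurred."""
    joined = {}
    for area, attrs in table.items():
        row = dict(attrs)
        row["count_"] = counts.get(area, 0)
        joined[area] = row
    return joined


counts = count_by_area(incidents, "WEAPONS VIOLATION")
joined = join_counts(socio, counts)
```

Communities with no matching incidents keep a count_ of zero rather than being dropped, so every community remains available for the later regressions.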

Exploratory Regression Analysis

This analysis examined all combinations of the input explanatory (independent) variables influencing the spatial occurrence of weapon violation incidents, in order to find the combination of independent variables with the greatest influence on crime rate. The Exploratory Regression tool can be found in the Spatial Statistics category in ArcToolbox. The input feature was weapon_crime, the dependent variable was count_, and the Candidate Explanatory Variables were the independent variables, excluding the Hardship Index. The output text file identified the combination of three variables with the highest Adjusted R-squared (AdjR2) value and the lowest Akaike's Information Criterion (AICc). These variables were: the percentage of crowded housing, the number of people over sixteen who are unemployed, and the number of people over the age of twenty-five without a high school diploma. These variables were carried forward into the OLS and GWR model runs.
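The idea behind the exploratory step can be sketched in plain Python: fit an ordinary least squares model for every combination of candidate variables and rank the models by adjusted R-squared and AICc. This is only an illustration under stated assumptions. The AICc below is one common Gaussian formulation and may not match the value ArcGIS reports, and the variable names and numbers are hypothetical.

```python
import math
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(columns, y):
    """Return (adjusted R2, AICc) for y regressed on columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(xtx, xty)
    rss = sum((y[i] - sum(bj * xj for bj, xj in zip(beta, X[i]))) ** 2
              for i in range(n))
    mean_y = sum(y) / n
    tss = sum((v - mean_y) ** 2 for v in y)
    adj_r2 = 1 - (rss / tss) * (n - 1) / (n - p)
    k = p + 1  # coefficients plus the error variance
    aicc = n * math.log(rss / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)
    return adj_r2, aicc


def rank_models(candidates, y, max_size):
    """Fit every combination up to max_size; best adjusted R2 first."""
    results = []
    for size in range(1, max_size + 1):
        for names in combinations(sorted(candidates), size):
            adj_r2, aicc = fit_ols([candidates[nm] for nm in names], y)
            results.append((names, adj_r2, aicc))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Hypothetical candidate variables and dependent variable (eight observations).
candidates = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [2, 1, 4, 3, 6, 5, 8, 7],
    "x3": [5, 3, 8, 1, 9, 2, 7, 4],
}
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 15.8]
ranked = rank_models(candidates, y, max_size=3)
```

Each entry of `ranked` holds the variable names, the adjusted R-squared, and the AICc, so the two criteria can be compared side by side as in the tool's text report.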

Ordinary Least Squares

OLS is a commonly used linear regression method that generates predictions or models a dependent variable in terms of its relationships to a set of independent variables. The OLS tool can be found in the same category as the Exploratory Regression tool in ArcToolbox. The exploratory regression had shown that the combination of the percentage of crowded housing, the number of people over sixteen who are unemployed, and the number of people over the age of twenty-five without a high school diploma had the highest influence on crime rate. The Input Feature Class was again weapon_crime, the Unique ID Field was ID_, the Dependent Variable was count_, and the three identified variables were entered as Explanatory Variables. The output included a PDF report, which is interpreted later.

Geographically Weighted Regression

GWR generates spatial data that reveal the spatial variation in the relationships between the dependent variable and the independent variables. GWR was performed using the same socio-economic variables as OLS. The Kernel type was set to Adaptive and the Bandwidth method was set to Bandwidth_parameter. With the Adaptive option, the bandwidth distance changes according to the spatial density of features in the input feature class; the number of neighbours used for the analysis is reported instead of a specific distance (ESRI). We tested 15, 30, and 45 as the number of neighbours and determined that 30 was our best option. As shown in Figure 3, Panel B (bandwidth of 30) shows an appropriate amount of generalization and a proper degree of smoothing in the model. We also rounded the output cell size to the nearest integer, 455, and generated the coefficient raster layers to show the effect of each individual variable on the spatial pattern.

Fig. 3 Various GWR local bandwidth attempts, including 15, 30 and 45
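To illustrate how an adaptive kernel behaves, the sketch below sets the bandwidth at a regression location to the distance of its k-th nearest feature and then applies a Gaussian distance decay. The function name and the exact weighting formula are assumptions made for illustration; the weights ArcGIS computes internally may differ.

```python
import math


def adaptive_gaussian_weights(points, target, k):
    """Return (bandwidth, weights) for one regression location.

    The bandwidth is the distance from target to its k-th nearest point,
    so it shrinks where features are dense and grows where they are sparse.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    dists = [math.hypot(px - target[0], py - target[1]) for px, py in points]
    bandwidth = sorted(dists)[k - 1]
    if bandwidth == 0:
        raise ValueError("k-th nearest point coincides with the target")
    # Gaussian decay: weight 1 at the target, falling off with distance.
    weights = [math.exp(-0.5 * (d / bandwidth) ** 2) for d in dists]
    return bandwidth, weights


# Hypothetical feature locations; the target is the first feature itself.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
bandwidth, weights = adaptive_gaussian_weights(pts, (0.0, 0.0), k=2)
```

A larger k widens the bandwidth and smooths the local coefficients, which is the trade-off compared across the 15, 30, and 45 neighbour runs.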

CrimeStat

Fuzzy Mode

Our primary dbf file was exported from the weapon_crime layer, which contains the location of each crime. POINT_X and POINT_Y were added to its attribute table using Add XY Coordinates from ArcToolbox. These fields made it possible for the exported dbf file to be imported into CrimeStat. The X and Y fields were set accordingly, and the Z (intensity) field was taken from the count column, which holds a single count at each crime location. The type of coordinate system was Projected (Euclidean), and we used feet as data units because the original shapefile was in customary units. We ran the Hot-Spot Fuzzy Mode analysis with a 2500-foot radius; for each location, it counts the incidents at that location and within 2500 feet of it. Identifying small hot spot areas in this way reveals accumulations of crimes in close proximity. The computed dbf was then added to ArcMap using the Add XY Data… command. The symbology of the points was changed to Unique Values, using FREQ (frequency) as the value field and a green-to-red color ramp.
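The counting rule behind the fuzzy mode can be sketched as follows: for every distinct location, count the incidents that lie at or within the search radius. This is a brute-force illustration with hypothetical coordinates in feet, and it treats every incident as a single count; it is not CrimeStat's own implementation.

```python
import math


def fuzzy_mode(points, radius):
    """Return (x, y, freq) per distinct location, highest frequency first.

    freq counts every incident at or within `radius` of the location,
    including incidents at the location itself.
    """
    results = []
    for x, y in dict.fromkeys(points):  # distinct locations, original order
        freq = sum(
            1 for px, py in points
            if math.hypot(px - x, py - y) <= radius
        )
        results.append((x, y, freq))
    return sorted(results, key=lambda r: -r[2])


# Hypothetical incident coordinates in feet.
crimes = [(0, 0), (2000, 0), (4000, 0), (20000, 0)]
hot_spots = fuzzy_mode(crimes, radius=2500)
```

The `freq` value plays the role of the FREQ field used for symbology: the middle incident at (2000, 0) has two neighbours within 2500 feet and therefore ranks first.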

Nearest Neighbour Hierarchical Spatial Clustering (Nnh):

The Nnh process in CrimeStat differs from the traditional method of nearest neighbour calculation. It defines a threshold distance, and only points that are closer than this threshold to one or more other points are selected for clustering (CrimeStat IV Chapter 7). For our analysis, we fixed the threshold distance at 2500 feet with a minimum of 10 points per cluster. The resulting shapefile was imported into ArcMap for further analysis.
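A much-simplified sketch of the first-order grouping is shown below. It links points that lie within the threshold distance of each other and keeps only groups that reach the minimum size. CrimeStat's routine is more involved (it builds higher-order clusters and outputs cluster shapes), so this single-linkage grouping is only an assumption-laden illustration with hypothetical coordinates.

```python
import math


def threshold_clusters(points, threshold, min_points):
    """Group points chained together by distances <= threshold.

    Returns a list of clusters, each a sorted list of point indices,
    keeping only clusters with at least min_points members.
    """
    n = len(points)
    seen = [False] * n
    clusters = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack = [start]
        members = []
        while stack:
            i = stack.pop()
            members.append(i)
            for j in range(n):
                if seen[j]:
                    continue
                d = math.hypot(points[i][0] - points[j][0],
                               points[i][1] - points[j][1])
                if d <= threshold:
                    seen[j] = True
                    stack.append(j)
        if len(members) >= min_points:
            clusters.append(sorted(members))
    return clusters


# Hypothetical coordinates in feet; min_points is lowered to 3 for the toy data.
crimes = [(0, 0), (1000, 0), (2000, 0), (50000, 50000)]
first_order = threshold_clusters(crimes, threshold=2500, min_points=3)
```

With the study's settings the call would use a threshold of 2500 and a minimum of 10 points; the isolated point in the toy data is discarded because it never reaches the minimum cluster size.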

Risk Adjusted Nearest Neighbour Hierarchical Spatial Clustering (Rnnh):

Population data were extracted from the Illinois Action for Children dataset. For each community, we subtracted the number of persons aged twelve or under from the total number of people. We chose age 12 as the cutoff both because of data limitations and because we believe people below this age could not easily commit such a crime. The population data were then imported into ArcMap and matched by community name to the layer holding the location data. X and Y points were added, and the result was exported as a file that could be used in CrimeStat. The newly created dbf, pop12.dbf, was selected as the secondary file. The appropriate X and Y fields were selected, with population in the Z (intensity) column. In the parameter settings, the distance was fixed at 2500 feet, with at least 5 points per cluster and a minimum sample size of 100. The results were saved and imported into ArcGIS. Three orders of clusters were produced, identifying the most prominent areas of weapon crime incidents in relation to the population over 12 years of age.

Kernel Density: Single and dual surface estimation

Before generating a kernel density map, a reference file is required. We created a new grid from the Reference File tab by entering X and Y extents. These X and Y coordinates were taken from the Source tab of the Properties dialog of our community shapefile. The cell spacing was 820 feet. Kernel density interpolation generates a map showing the likelihood of an incident occurring at a location. It can be found under the Interpolation I tab of Spatial Modeling I. Under this tab, only Single is checked for a single kernel density map. Additional settings included changing the method of interpolation to triangular, the area units to square miles, the output units to absolute density, and the minimum sample size to 100. The output shapefile was then imported into ArcMap. The kernel density surface extended well beyond Chicago, so we clipped it to the outline of the city. The classification scheme under Graduated Colors symbology was changed to Geometrical Interval with 10 classes, and we added two zeroes to the Sampling value so that all data points were considered for classification. The value field was set to Z, with a green-to-red color ramp.
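The single kernel density step can be sketched as a reference grid plus a triangular kernel evaluated at each cell. The fixed bandwidth below is an assumption made for simplicity (the minimum sample size setting suggests CrimeStat chose the bandwidth adaptively), the weights are left unnormalized, and the coordinates are hypothetical.

```python
import math


def make_grid(xmin, ymin, xmax, ymax, cell):
    """Return grid points spaced `cell` apart, row by row from the lower left."""
    nx = int((xmax - xmin) // cell) + 1
    ny = int((ymax - ymin) // cell) + 1
    return [(xmin + i * cell, ymin + j * cell)
            for j in range(ny) for i in range(nx)]


def triangular_density(points, grid, bandwidth):
    """Sum triangular kernel weights of all points at every grid location.

    A point contributes 1 at zero distance, declining linearly to 0 at
    the bandwidth; points farther away contribute nothing.
    """
    surface = []
    for gx, gy in grid:
        z = 0.0
        for px, py in points:
            d = math.hypot(px - gx, py - gy)
            if d < bandwidth:
                z += 1.0 - d / bandwidth
        surface.append((gx, gy, z))
    return surface


# Hypothetical extent in feet with the 820-foot spacing used in the study.
grid = make_grid(0, 0, 1640, 820, 820)
surface = triangular_density([(0, 0)], grid, bandwidth=1640)
```

Each tuple in `surface` corresponds to one cell of the output, with the third value playing the role of the Z field that was later classified and symbolized in ArcMap.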

The dual kernel density takes the secondary file, the population data, into consideration. Its result differs from the single kernel density in that it is normalized by population. Most of the configuration is identical to that of the single kernel density; the only difference is in the output units, which are set to ratio of densities. The resulting shapefile was again imported into ArcGIS and clipped to the city outline, repeating the same steps as above.
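The ratio-of-densities output can be illustrated with a cell-by-cell division of two surfaces evaluated on the same grid. Returning zero where the population density is zero is an assumption of this sketch, not a documented CrimeStat rule, and the density values are hypothetical.

```python
def ratio_of_densities(primary, secondary):
    """Divide crime density by population density, cell by cell.

    Both lists must cover the same grid cells in the same order.
    Cells with zero population density are assigned 0.0 (an assumption).
    """
    if len(primary) != len(secondary):
        raise ValueError("surfaces must have the same number of cells")
    return [p / s if s > 0 else 0.0 for p, s in zip(primary, secondary)]


# Hypothetical densities for three grid cells.
crime_density = [2.0, 1.0, 0.5]
population_density = [4.0, 0.0, 0.5]
risk = ratio_of_densities(crime_density, population_density)
```

A cell with modest crime density can still rank highly once the population at risk is small, which is the difference the dual surface is meant to expose.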

Additional Attempts

SPSS

SPSS was used to conduct a discriminant analysis in order to identify individual variables contributing to the occurrence of weapon violations in the communities. Two groups were generated, one consisting of communities with crimes and the other of communities without crime. As seen in Figure 4, each row number represents a community, and the class number was derived from the count: a count of zero was assigned to class one, and any count equal to or greater than one was assigned to class two. Because the enumeration areas for this study were communities, only one community was free of crime. This community alone was insufficient to identify the independent variables contributing to the occurrence of crimes. The discriminant analysis was therefore abandoned.

Fig. 4 SPSS discriminant analysis

Fig. 5 SPSS Result

WEAPON VIOLATIONS IN CHICAGO

An Analysis of Spatial and Socio-Economic Factors