[Editor’s note: this post was co-authored by SAS’ Tom Sabo.]
Narrative data from police agencies on arrest or offense incidents, as well as referrals to police departments, are rich in information but are largely unavailable to the public for analysis. That said, I recently came across ~45,000 unique narratives, each describing a police incident in the city of Dallas, TX, available at https://www.dallasopendata.com/.
Evaluating large amounts of narrative data for patterns using manual analysis alone can be time-consuming and yield limited qualitative results. We have begun to demonstrate how modern methods in text analytics can help. In particular, we wanted to identify operational textual and geospatial patterns related to human trafficking (Figure 1) and other crimes.
Figure 1. Example narrative incident
To solve this, we had to think critically about how technology could improve the existing process. In particular, we considered that the people who would benefit most might be those who work in policing day to day, rather than analysts or data scientists. Ultimately, we sought to save police investigators time by using text analytics to highlight trafficking incidents and other crime patterns, and then to provide intuitive access to them through visual dashboards. Fortunately, the text analytics methods we’ve used elsewhere to automatically categorize data and to find trends, entities (people, places, objects) and the relationships between them work very well on police incident reports.
This workflow and approach can be seen in Figure 2 below, which details the general process and the analytics applied to the police incidents. The narrative text was run through a GUI-based text analytics pipeline that used common, industry-standard NLP (natural language processing) and text analytics approaches such as topic analysis, entity extraction, summarization, text data profiling, and more. This pipeline-based approach produces standardized, analytics-ready tables that we bring into Visual Analytics to explore and visualize the results of our analysis. The process saves substantial time in extracting crime-related information from large volumes of narrative data, in a form that police investigators can use immediately. Using it, we identified patterns of theft, violence and human trafficking across 45,000 narratives in minutes.
Figure 2: Text analytics workflow and approach
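The post does not publish the pipeline itself, which was built in a GUI. Below is a minimal plain-Python sketch of the general idea: each pipeline step turns raw narratives into rows of one standardized table. The step functions, the keyword lists, and the row layout (`doc_id`, `field`, `value`) are all hypothetical stand-ins. They do not reproduce SAS Visual Text Analytics behavior.

```python
import re
from typing import Callable, Dict, List

# Hypothetical keyword lists; a real pipeline would use curated rules or trained models.
CATEGORY_TERMS: Dict[str, List[str]] = {
    "theft": ["stole", "theft", "burglary", "shoplift"],
    "violence": ["assault", "punched", "struck", "stabbed"],
    "trafficking": ["prostitution", "harboring", "recruit"],
}

Row = Dict[str, str]


def categorize(doc_id: str, text: str) -> List[Row]:
    """Emit one row per category whose terms appear in the narrative."""
    lowered = text.lower()
    rows = []
    for category, terms in CATEGORY_TERMS.items():
        if any(term in lowered for term in terms):
            rows.append({"doc_id": doc_id, "field": "category", "value": category})
    return rows


def profile(doc_id: str, text: str) -> List[Row]:
    """Emit a simple text-profiling statistic (token count) as a row."""
    n_tokens = len(re.findall(r"\w+", text))
    return [{"doc_id": doc_id, "field": "token_count", "value": str(n_tokens)}]


def run_pipeline(docs: Dict[str, str],
                 steps: List[Callable[[str, str], List[Row]]]) -> List[Row]:
    """Apply every step to every document and concatenate the results into one table."""
    table: List[Row] = []
    for doc_id, text in docs.items():
        for step in steps:
            table.extend(step(doc_id, text))
    return table


if __name__ == "__main__":
    sample = {"A1": "Suspect stole a wallet and punched the complainant."}
    for row in run_pipeline(sample, [categorize, profile]):
        print(row)
```

Because every step emits rows with the same three columns, new steps can be added without changing the table a dashboard reads.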
Many of our results were based on rules we developed in SAS Visual Text Analytics (VTA). These rules define how to extract the crime patterns mentioned above, among others. Concept rule sets and open source integration were used to extract, geocode, and categorize locations by type. To accomplish this, we wrote a rule to extract street addresses. The rule combines street numbers, street words (Avenue, Street, Drive, etc.), direction indicators (N, S, E, W), and filler words representing the literal street name. Using this, we were able to filter for incidents occurring near schools, as shown in Figure 3.
Figure 3: Geolocation concept rules and derived analysis
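A simplified version of the street-address rule described above can be approximated with a single regular expression. This is a sketch, not the VTA concept rule. The direction and street-word vocabularies are short hypothetical lists, and the pattern assumes the street name is one to four words long.

```python
import re
from typing import List

# Hypothetical, abbreviated vocabularies; a production rule set would be much larger.
DIRECTIONS = r"(?:N|S|E|W|NORTH|SOUTH|EAST|WEST)"
STREET_WORDS = r"(?:AVENUE|AVE|STREET|ST|DRIVE|DR|ROAD|RD|BOULEVARD|BLVD|LANE|LN|PARKWAY|PKWY)"

ADDRESS_RE = re.compile(
    r"\b\d{1,6}"                      # street number
    rf"(?:\s+{DIRECTIONS}\b)?"        # optional leading direction indicator
    r"(?:\s+[A-Z][A-Z0-9']*){1,4}?"   # one to four filler words (the literal street name)
    rf"\s+{STREET_WORDS}\b"           # street word such as Avenue or Drive
    rf"(?:\s+{DIRECTIONS}\b)?",       # optional trailing direction indicator
    re.IGNORECASE,
)


def extract_addresses(narrative: str) -> List[str]:
    """Return every street-address-like span found in a narrative."""
    return [m.group(0) for m in ADDRESS_RE.finditer(narrative)]
```

For example, `extract_addresses("Officers responded to 1400 S Lamar St regarding a theft.")` returns `["1400 S Lamar St"]`. The lazy quantifier on the filler words keeps each match as short as possible, so the first street word ends the address instead of being absorbed into the name.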
After extracting the full street addresses, we ran a Python process (using geopy) that produced the latitude and longitude for each address. The resulting coordinates were then reverse geocoded, that is, converted back into an address. This returns a longer, more complete address than the one extracted from the narrative.
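The post names geopy but not the geocoding service it used. The sketch below assumes a geopy-style geocoder such as `geopy.geocoders.Nominatim`. Its `geocode()` and `reverse()` methods return either `None` or a `Location` with `latitude`, `longitude`, and `address`. The helper takes the geocoder as a parameter so it can be exercised without network access. `geocode_and_reverse` and `make_nominatim` are hypothetical names.

```python
from typing import Optional, Tuple


def geocode_and_reverse(geocoder, address: str) -> Optional[Tuple[float, float, str]]:
    """Forward-geocode an extracted address, then reverse-geocode its coordinates.

    `geocoder` is any object with geopy-style `geocode(query)` and `reverse(point)`
    methods returning an object with `latitude`, `longitude`, and `address`, or None.
    """
    location = geocoder.geocode(address)
    if location is None:
        return None  # the service could not resolve the extracted text
    lat, lon = location.latitude, location.longitude
    reverse = geocoder.reverse((lat, lon))
    # Prefer the (often longer) reverse-geocoded address; fall back to the forward result.
    full_address = reverse.address if reverse is not None else location.address
    return lat, lon, full_address


def make_nominatim(app_name: str = "narrative-geocoding-sketch"):
    """Build a geopy Nominatim geocoder (requires `pip install geopy` and network access)."""
    from geopy.geocoders import Nominatim  # imported lazily so the helper above has no hard dependency
    # Nominatim's usage policy requires an identifying user_agent and limits
    # clients to about one request per second.
    return Nominatim(user_agent=app_name)
```

With geopy installed, `geocode_and_reverse(make_nominatim(), "920 SAS Campus Drive, Cary, NC 27513")` would issue two requests. For thousands of addresses, geopy's `RateLimiter` (in `geopy.extra.rate_limiter`) can wrap the calls to respect the service's rate limits.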
Example of address geocoding and reverse geocoding:
- Original address: 920 SAS Campus Drive, Cary, NC 27513
- Geocoordinates: 35.815658, -78.749284
- Reverse geocoded address: SAS Global Education Center, 920 SAS Campus Drive, Cary, NC 27513
As the example above shows, reverse geocoding can return additional information, such as the name of a hotel, gas station, school, or other landmark at that address. This additional information allowed us to group the extracted locations with a VTA-created taxonomy that classifies locations by type. We built ~10 location types for this project, including gas stations, restaurants, hotels, and schools. Combined with the rest of the analysis, this categorization provides new structured fields to use as an entry point for analysis in Visual Analytics. That entry point enabled exploratory analysis and quick discovery of interesting findings. One example is an armed robbery that occurred in front of an elementary school. By geocoding, estimating location types, and extracting weapon references, we were able to geospatially locate and categorize the unstructured narratives by time, place, and event type. This helps investigators and increases analyst efficiency.
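A keyword-based stand-in for the location taxonomy might look like the following. The post's ~10 types were VTA categories. The type names, keywords, and weapon list here are hypothetical and deliberately small. Matching uses word boundaries so that, for instance, "inn" does not fire inside "dinner". The second function shows how a location type and a weapon mention can be combined to surface cases like the armed robbery near a school.

```python
import re
from typing import Dict, List

# Hypothetical taxonomy approximating a few of the post's location types.
LOCATION_TYPES: Dict[str, List[str]] = {
    "school": ["elementary", "middle school", "high school", "academy", "school"],
    "hotel": ["hotel", "motel", "inn", "suites"],
    "gas station": ["gas station", "fuel", "shell", "exxon", "chevron"],
    "restaurant": ["restaurant", "grill", "cafe", "diner"],
}

# Hypothetical weapon vocabulary.
WEAPON_TERMS = ["gun", "handgun", "pistol", "rifle", "knife", "firearm"]


def _has_term(text: str, term: str) -> bool:
    """Whole-word (or whole-phrase) match, case-insensitive."""
    return re.search(rf"\b{re.escape(term)}\b", text.lower()) is not None


def classify_location(reverse_geocoded: str) -> List[str]:
    """Return every location type whose keywords appear in a reverse-geocoded address."""
    return [
        loc_type
        for loc_type, terms in LOCATION_TYPES.items()
        if any(_has_term(reverse_geocoded, t) for t in terms)
    ]


def armed_incident_at(loc_type: str, reverse_geocoded: str, narrative: str) -> bool:
    """True when the location has the given type and the narrative mentions a weapon."""
    has_weapon = any(_has_term(narrative, w) for w in WEAPON_TERMS)
    return has_weapon and loc_type in classify_location(reverse_geocoded)
```

A location can carry several types, which is why `classify_location` returns a list rather than a single label.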
Additional rules were developed in VTA to extract vehicles mentioned in police incidents. These rules use combinations of key vehicle characteristics such as color, make, model, year, type, and basic vehicle descriptors. By looking at combinations of these features, we extracted multiple vehicles from the narratives. This provides additional, useful information for studying the narratives and looking at trends across the corpus. Examples of vehicles identified in the narratives are shown in Figure 4.
Figure 4: Vehicle extraction
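The vehicle rules can be approximated by anchoring on a make and looking at nearby tokens for the other characteristics. This sketch assumes small hypothetical vocabularies and a two-token window on each side of the make. Real rules would need far larger lists and would normalize variants such as "Chevy" and "Chevrolet".

```python
import re
from typing import Dict, List, Optional

# Hypothetical vocabularies for illustration only.
COLORS = {"white", "black", "silver", "gray", "grey", "red", "blue", "green", "tan", "gold"}
MAKES = {"chevy", "chevrolet", "ford", "toyota", "honda", "nissan", "dodge"}
MODELS = {"impala", "tahoe", "express", "f-150", "camry", "civic", "altima", "charger"}
BODY_TYPES = {"van", "truck", "sedan", "suv", "pickup", "coupe"}
YEAR_RE = re.compile(r"(?:19|20)\d{2}")


def extract_vehicles(narrative: str) -> List[Dict[str, Optional[str]]]:
    """Find vehicle mentions by anchoring on a make and scanning neighboring tokens."""
    tokens = re.findall(r"[a-z0-9-]+", narrative.lower())
    vehicles = []
    for i, tok in enumerate(tokens):
        if tok not in MAKES:
            continue
        vehicle: Dict[str, Optional[str]] = {
            "color": None, "year": None, "make": tok, "model": None, "type": None,
        }
        # Look back up to two tokens for a year and/or color ("white 2005", "2005 white").
        for prev in tokens[max(0, i - 2):i]:
            if YEAR_RE.fullmatch(prev):
                vehicle["year"] = prev
            elif prev in COLORS:
                vehicle["color"] = prev
        # Look ahead up to two tokens for a model and/or body type.
        for nxt in tokens[i + 1:i + 3]:
            if nxt in MODELS:
                vehicle["model"] = nxt
            elif nxt in BODY_TYPES:
                vehicle["type"] = nxt
        vehicles.append(vehicle)
    return vehicles
```

For example, `extract_vehicles("Witness saw a white 2005 Chevy van leave the lot.")` returns one vehicle with color `white`, year `2005`, make `chevy`, and type `van`. A fixed window is crude: in dense text it can attach a neighboring vehicle's color or year, which curated concept rules are designed to avoid.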
Many of the extracted concepts are shown in the network diagram below (Figure 5) as they relate to their source documents. Blue nodes are source documents, yellow nodes are addresses, and orange nodes are weapon references. This visualization lets users quickly explore overlaps, trends, and potential patterns across 40k narratives. Discovering these many connections and overlaps would not have been feasible through manual extraction alone; it depended on automated concept extraction and visualization. Figure 5 shows many examples of potentially interesting trends. For example, many narratives mention a white 2005 Chevy van. This may indicate a trend involving this vehicle and warrant further study of the source narratives. Another example is examining how frequently specific weapons or addresses are referenced in reports, and the trends in those references.
Figure 5: Network-based exploration of extracted concepts in SAS Visual Analytics
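The network view in Figure 5 was built in SAS Visual Analytics. The structure underneath it is a bipartite graph in which documents link to the concepts extracted from them. This stdlib sketch assumes extractions arrive as `(concept_type, value)` pairs per document, a hypothetical format. It surfaces concepts connected to many documents, such as a vehicle mentioned repeatedly.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Concept = Tuple[str, str]  # (concept_type, value), e.g. ("vehicle", "white 2005 chevy van")


def build_concept_graph(extractions: Dict[str, List[Concept]]) -> Dict[Concept, Set[str]]:
    """Map each concept node to the set of documents that mention it."""
    graph: Dict[Concept, Set[str]] = defaultdict(set)
    for doc_id, concepts in extractions.items():
        for concept in concepts:
            graph[concept].add(doc_id)  # a set, so repeat mentions in one document count once
    return dict(graph)


def shared_concepts(graph: Dict[Concept, Set[str]], min_docs: int = 2) -> List[Tuple[Concept, int]]:
    """Concepts linked to at least `min_docs` documents, most-connected first."""
    hits = [(concept, len(docs)) for concept, docs in graph.items() if len(docs) >= min_docs]
    return sorted(hits, key=lambda item: (-item[1], item[0]))
```

The same `graph` dictionary can be flattened into a document-to-concept edge list for whatever network visualization tool is at hand.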
Human trafficking rules were developed using artificial intelligence and statistical methods in SAS Visual Analytics to identify patterns around known terms of interest. For example, in Figure 6 below, searching the narrative database for terms like ‘prostitution’ immediately surfaces related trafficking terms, including ‘harbor’, ‘recruitment’ and, notably, ‘minor petitioner’.
Figure 6: Using SAS Visual Analytics to identify terms and incidents related to human trafficking
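The post does not say which statistical method produced the related terms. One simple option, sketched here as an assumption, is document-level lift. Lift measures how much more often a term appears in narratives containing the seed term than in the corpus overall. A real analysis would also remove stop words and require a higher minimum count. `related_terms` is a hypothetical helper.

```python
import re
from collections import Counter
from typing import List, Tuple


def related_terms(docs: List[str], seed: str, min_count: int = 2,
                  top_n: int = 10) -> List[Tuple[str, float]]:
    """Rank terms by lift = P(term | doc contains seed) / P(term)."""
    doc_terms = [set(re.findall(r"[a-z']+", d.lower())) for d in docs]
    seed_docs = [terms for terms in doc_terms if seed in terms]
    if not seed_docs:
        return []
    overall = Counter(t for terms in doc_terms for t in terms)
    with_seed = Counter(t for terms in seed_docs for t in terms if t != seed)
    scored = []
    for term, count in with_seed.items():
        if count < min_count:
            continue  # too rare alongside the seed to be informative
        lift = (count / len(seed_docs)) / (overall[term] / len(doc_terms))
        scored.append((term, lift))
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]
```

A lift above 1 means the term is over-represented in narratives that mention the seed. Common words such as "a" fall below 1, which is why they sink in the ranking even without a stop-word list.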
From there, using artificial intelligence methods and additional rules related to threats, coercion, blackmail, and escape, we were able to flag narrative incidents that directly indicated human trafficking (like Figure 7 below). We could also flag incidents that described risky situations, such as physical violence against women or teenagers. Such situations may be directly related to human trafficking or may lead to a trafficking situation in the future.
Figure 7: Flagging statements in narratives that suggest human trafficking
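The actual threat, coercion, blackmail, and escape rules are not published. The sketch below uses a few hypothetical regular-expression cues per category and flags the sentences that match. It marks a narrative as higher risk when cues from at least two distinct categories appear. That threshold is an illustrative choice, not the authors'.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical cue patterns, one small list per category.
CUES: Dict[str, List[str]] = {
    "threat": [r"\bthreaten(?:ed|s)?\b", r"\bwould (?:kill|hurt)\b"],
    "coercion": [r"\bforced\b", r"\bmade (?:her|him|them) work\b"],
    "blackmail": [r"\bblackmail(?:ed)?\b", r"\bpost (?:the )?(?:photos|pictures)\b"],
    "escape": [r"\bescaped\b", r"\bran away\b", r"\bnot allowed to leave\b"],
}


def flag_sentences(narrative: str) -> List[Tuple[str, List[str]]]:
    """Return (sentence, matched cue categories) for sentences with at least one cue."""
    sentences = re.split(r"(?<=[.!?])\s+", narrative.strip())
    flagged = []
    for sentence in sentences:
        lowered = sentence.lower()
        hits = [cat for cat, patterns in CUES.items()
                if any(re.search(p, lowered) for p in patterns)]
        if hits:
            flagged.append((sentence, hits))
    return flagged


def trafficking_risk(narrative: str, min_categories: int = 2) -> bool:
    """Flag a narrative when cues from at least `min_categories` distinct categories appear."""
    categories = {cat for _, hits in flag_sentences(narrative) for cat in hits}
    return len(categories) >= min_categories
```

Returning the matching sentences, not just a yes/no flag, mirrors the highlighting in Figure 7: an investigator can see exactly which statements triggered the flag.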
Putting it all together, we can use the geospatial methods discussed earlier to isolate the narrative incidents associated with human trafficking or the risk of trafficking, and make them available for investigation, as shown in Figure 8 below. The result is intended to be an intuitive dashboard that an investigator or police officer can use.
Figure 8: Geospatially mapped narratives that indicate human trafficking or a risk of it
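Figure 8 was built in SAS Visual Analytics. For other mapping tools, one common interchange format is GeoJSON. This sketch assumes each flagged incident already carries coordinates from the geocoding step and categories from the flagging step. The input field names (`doc_id`, `lat`, `lon`, `categories`) are hypothetical.

```python
import json
from typing import Dict, List


def to_geojson(incidents: List[Dict]) -> str:
    """Convert flagged incidents with coordinates into a GeoJSON FeatureCollection string.

    Incidents without coordinates (for example, failed geocodes) are skipped.
    """
    features = []
    for inc in incidents:
        if inc.get("lat") is None or inc.get("lon") is None:
            continue
        features.append({
            "type": "Feature",
            # GeoJSON (RFC 7946) orders point coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [inc["lon"], inc["lat"]]},
            "properties": {"doc_id": inc["doc_id"], "categories": inc["categories"]},
        })
    return json.dumps({"type": "FeatureCollection", "features": features})
```

The longitude-first order is the most common mistake when exporting geocoded points, so it is worth checking against a known address before mapping a full corpus.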
In summary, our goal was to show how, given minimal structured data, we used text analytics capabilities to identify patterns in narrative data that can be evaluated in intuitive ways. Police departments have additional metadata associated with these narrative incidents. However, such metadata may capture only the primary crime, such as a drug use incident, while the narrative contains indications of a secondary issue, such as the risk of human trafficking. Additionally, similar methods can be applied to written or transcribed tips and other textual data sources to help filter, classify and route these leads for quick action.