How to Reduce Crime in Big Cities
October 15, 2017 | Posted By: Marty Abbott
I’m a huge Malcolm Gladwell fan. Gladwell’s ability to convey complex concepts and virtually incomprehensible academic research in easily understood prose is second to none within his field of journalism. A perfect example of his skill is on display in the Tipping Point, where Gladwell wrestles the topic of Complexity Theory (aka Chaos Theory) into submission, making it accessible to all of us. In the Tipping Point, Gladwell also introduces us to The Broken Windows Theory.
The Broken Windows Theory gets its name from a 1982 The Atlantic Monthly article. This article asked the reader to imagine a building with a few broken windows. The authors claim that the existence of these windows invite vandals to break still more windows. A continuous cycle of expanding vandalism ensues, with squatters moving in, nearby buildings getting vandalized, etc. Subsequent authors expanded upon the theory, claiming that the presence of vandalism invites other crimes and that crime rates soar in communities where unhandled vandalism is present. A corollary to the Broken Windows Theory is that cities can reduce crime rates by focusing law enforcement on petty crimes. Several high profile examples seem to illustrate the power and correctness of this theory, such as New York Mayor Giuliani’s “Zero Tolerance Program”. The program focused on vandalism, public drinking, public urination, and subway fare evasion. Crime rates dropped over a 10 year period, corresponding with the initiation of the program. Several other cities and other experiments showed similar effects. Proof that the hypothesis underpinning the theory is correct.
Not So Fast…
Enter the self-described “Rogue Economist” Stephen Levitt and his co-author Stephen Dubner - both of Freakonomics fame. While the two authors don’t deny that the Broken Windows theory may explain some drop in crime, they do cast significant doubt on the approach as the primary explanation for crime rates dropping. Crime rates dropped nationally during the same 10 year period in which New York pursued its Zero Tolerance Program. This national drop in crime occurred in cities that both practiced Broken Windows and those that did not. Further, crime rate dropped irrespective of either an increase or decrease in police spending. The explanation therefore, argue the authors, cannot primarily be Broken Windows. The most likely explanation and most highly correlated variable is a reduction in a pool of potential criminals. Roe v. Wade legalized abortion, and as a result there was a significant decrease in the number of unwanted children, a disproportionately high percentage of whom would grow up to be criminals.
Gladwell isn’t therefore incorrect in proffering Broken Windows as an explanation for reduction in crime. But the explanation is not the best one available and as a result it holds residence somewhere between misleading (worst case) and incomplete (best case).
To be fair, it’s hard to hold Gladwell accountable for this oversight. Gladwell is not a scientist and therefore not trained in how to scientifically evaluate the research he reported. Furthermore, his is an oft repeated mistake even among highly trained researchers. And what exactly is that mistake? The mistake made here is illustrated by the difference in approach between the Broken Windows researchers and the Freakonomics authors. The Broken Windows researchers started with something like the following question “Does the presence of vandalism invite additional vandalism and escalating crime?” Levitt and Dubner first asked the question “What variables appear to explain the rate of crime?”
Broken Windows started with a question focused on deductive analysis. Deduction starts with a hypothesis - “Evidence of vandalism and/or other petty crimes invites similar and more egregious crimes”. The process continues to attempt to confirm or disprove the hypothesis. Deduction starts with a broad and abstract view of the data – a generalization or hypothesis as to relationships – and attempts to move to show specific relationships between data elements. The Broken Windows folks started with a hypothesis, developed a series of experiments to test the hypothesis and then ultimately evaluated time series data in cities with various Broken Windows approaches to policing. What they lacked was a broad question that may have developed a range of options indicating possible causes.
The Freakonomics authors started with an inductive question. Induction is the process of moving from specific observations about data into generalizations. These generalizations are often in the form of hypothesis or models as to how data interacts. Induction helps to inform what questions should be asked of the data. Induction is the asking of “What change in what independent variables seem to correspond with a resulting change in some dependent variable?” Whereas deduction works from independent variable to dependent variable, induction attempts to work backwards from dependent variable to identify independent variable relationships.
The jump to deduction, without forming the right questions and hypotheses through induction, is the biggest mistake we see in developing Big Data programs and implementing Big Data solutions. We all approach problems with unique experiences and unique biases. The combination of these often cause us to race to hypotheses and want to test them. The issue here is two-fold. The best case is that we develop an incomplete (and as a result partially or mostly incorrect) answer similar to that of The Broken Windows researchers. The worst case is that we suffer what statisticians call a Type 1 error – confirming an incorrect answer. The probability of type 1 errors increases when we don’t look for alternative or better answers for outcomes within our data sets. Induction helps to uncover those alternative or supporting explanations. Exploring the data to discover potential relationships helps us to ask the right questions and form better hypotheses and better models. Skipping induction makes it highly probable that we will get an incorrect, misleading or substandard answer.
But it is not enough to simply ensure that we practice both induction and deduction. We must also recognize that the solutions that support induction are different from those that support deduction. Further, we must understand that the two processes while complimentary can actually interfere with each other when performed on the same system. Induction is necessarily a very broad and as a result slow and tedious process. Deduction, on the other hand, needs significantly less data and “prefers” to be faster in implementation. Inductive systems are best supported by solutions that impose very few relations or structure on the data we observe. Systems that support deduction, in order to allow for faster response times, impose increased structure relative to inductive systems. While the two phases of discovery (Induction and Deduction) support each other, their differences suggest that they should be performed on solutions purpose built to their specific needs.
Similarly, not everyone is equally qualified to perform both induction and deduction. Our experience is that the folks who tend to be good at determining how to prove relationships between variables are often not as good at identifying patterns and vice versa.
These two observations, that the systems that support induction and deduction should be separated and that the people performing these tasks may need to be different, have ramifications to how we develop our analytics systems and organize our Big Data teams. We’ll discuss these ramifications and more in our next post, “10 Anti-Patterns within Big Data”.
Subscribe to the AKF Newsletter
How an AI bot beat the world's best gamers
September 5, 2017 | Posted By: Roger Andelin
Last month, a bot developed by OpenAI (co-founded by Elon Musk) beat the world’s best, pro Dota 2 players. This is another milestone accomplishment in the field of artificial intelligence and machine learning and more fuel for the fire of concerns surrounding the AI debate. However, before we jump into that debate, here is some background you should understand about the technology fueling this debate.
The Evolution of Traditional Programming
A lot of what computer programming is can be simplified into three steps. First step, read in some data. Second step, do something with that data. Third step, output some result.
For example, imagine you want to fly somewhere for the weekend. You may first go to your travel app and input some dates, times, number of people traveling, airports, etc. Second, the app uses that information to search its database of available flights. Third, it returns a list of available flights for you to see.
This approach to software design has been the norm since the earliest days of programming. Artificial intelligence, in particular machine learning, has changed that approach. The first step is still the same: Read in some data. The third step is the same: Output some result.
However, with artificial intelligence technologies like machine learning, the second step, doing something with the data, is very different. In the example of finding a flight, a programmer easily can read the software code to understand the sequence of steps the computer has been programmed to do to produce the output data. If the programmer wants to change or improve the program’s behavior she can do that by writing new code or by altering the existing code. For example, if you wanted to compare the prices for available flights near the dates you have selected, a programmer can easily change several lines of code in the program to do just that. The programming code identifies every step the computer takes to arrive at its output. Said another way, the program only does what it’s specifically told to do in the code, nothing more or nothing less.
By contrast, the output of today’s most common machine learning programs is not determined by instructions written in computer code. There is no code for a programmer to read or modify when a change is desired. The output is determined by the program’s neural network.
Neural Networks in Action
What is a neural network? At the core of a neural network is a neuron. Similar to a traditional computer program, a neuron takes some input data, does a mathematical calculation on that data and then outputs some data. A typical neuron in a neural network will receive as input hundreds to thousands of numbers, typically between 0 and 1. A neuron will then multiply each number by a weight and sum the result of all the numbers. Many neurons will then convert the result into a number between 0 and 1. That result is then sent to the next neuron in sequence until the final output neuron is reach.
Here is an example of the math a typical neuron will do: If “x1, x2, x3…” represents input data and “w1, w2, w3…” represent the weights stored in the neuron, the calculation done by the neuron in a neural network looks like this: x1*w1 + x2*w2 + x3*w3 and so forth.
You can think of the calculation inside the neuron in a different way: The neuron is reading in a bunch of numbers and the weights in the neuron determine the importance, “or weight” of that input in producing the output. If the input is not important the weight for the input will be near zero and the input is not passed along to the next neuron. Therefore, the weights in a neuron effectively decided what input is valuable and what input should be ignored.
In a neural network, neurons like the one I described above are connected in parallel and in series to create a matrix of neurons. The input data to a neural network will go into hundreds or thousands of neurons in parallel, all with different weights. The output of those neurons is then sent to another layer of neurons and so forth, usually multiple layers deep. This is called a deep neural network. Another way to look at this is the neurons are grouped into a matrix of rows and columns, all interconnected. The final layer of the neural network is the output layer. Therefore, the final output of a neural network is the result of millions of calculations done by the neurons of that network.
When a programmer creates a neural network in software, the weights for each neuron are initially just random numbers. In other words, the weights arbitrarily decide to either diminish, increase or leave the input data alone, and output from the network is random. However, through a process called training, the weights move from randomly assigned values to values that can produce very useful outputs.
Training is both a time consuming and complicated mathematical process. However, it is much like training you and I would do to get better at something. For example, let’s say I wanted to learn how to shoot and arrow with a bow. I might pick up the bow and arrow, point it at the target, pull back the string and release. In my case, I know the arrow would miss the target. Therefore, I would try again and again making corrections to my aim based on on how far and which direction I was off from the target.
During the training process for a neural network, the weights in each of the neurons are changed slightly to improve the output, or “aim.” The most common approach for making those changes is called backpropagation. Backpropagation is a mathematical approach for applying corrections to every weight in every neuron of the network. During training, input is fed into the network and output is generated. The output is compared to the desired target and the difference between the output and the target is the error. Using the error, backpropagation makes changes to the weights in each neuron to reduce the error. If all goes well during training and backpropagation, the output error diminishes until it reaches expert or better than expert level.
AI vs Humans
In the case of the OpenAI Dota bot that recently beat the world’s best Dota 2 player, the outputs, which were a sequence of steps, strategies and decisions, went from random moves to moves that were so good the bot was able to easily defeat the best pro players in the world. The critical information that enabled the bot to win its matches was stored in the weights of the neurons and the neural network architecture itself.
A good question at this point is to ask if a programmer looking at the Dota 2 bot’s neural network could understand the steps taken by the bot to beat the human player. The answer is no. A programmer can see areas of the neural network that influence an output but it is not possible to explain why the bot took specific steps to formulate its moves and strategies. All the programmer would see is a huge matrix of weights that would be quite overwhelming to interpret.
Another good question to ask is whether or not a program written traditionally by a programmer with step by step instructions could beat the best Dota 2 player. The answer is no. Step by step programs where the programmer specifically instructs the computer to do something would easily be defeated by a professional player. However, a neural network can learn from training things that a programmer would never have the knowledge to program, store that learning in its neurons and use that learning to do things like defeat a human pro.
What makes the Dota 2 bot special is that it learned to beat the best pro players by playing against itself whereas most machine learning programs learn from training on data given to it by a programmer. In machine learning, good training data is like gold. It’s scarce and valuable. (note: This is one reason why Google and other big tech companies want to collect so much data.) Data is used to train neural networks to do useful things like recognizing people and places in your pictures or recognizing your voice from others in your family. OpenAI built a bot that learned almost entirely by playing against itself with the exception of some coaching provided by the OpenAI team. OpenAI has shown clearly that learning can occur without having tons of training data. It’s a little like being able to make gold.
Does the development of the OpenAI dota bot mean bots can now decide to train against themselves and become super bots? No. But it does say that humans can now program two bots to train against each other to become superbots. The key enabler being us. It’s anyone’s guess what type of bot can be imagined and developed in this way, useful or harmful. Obviously to most, a gaming superbot seems pretty innocuous, except of course to the gamer who may unexpectedly run into one during a match. However, it’s not hard to imagine super bots that are not so harmless. Or, perhaps you can just imagine a time when someone trains a bot to play football against itself until the bot becomes better at calling plays and strategy than every coach in the NFL. What happens then? The answer is disruption. Are you ready for it?
AKF Partners recommends that boards and executives direct their teams to identify sources of innovation and patterns of disruption that AI techniques may represent within their respective markets Walmart is already working on facial recognition technology in their stores to determine whether or not shoppers are satisfied at checkout. Will this give them a potential advantage over Amazon? How can machine learning and AI help you prevent fraud in your payment systems or the use of your commerce system to launder money?
AKF is prepared to help answer that question and others you may be facing. We will help you craft your AI strategy, sort through the hype, help you find the opportunities, and identify the potential threats of AI technology to your business.
Reach out to AKF
Subscribe to the AKF Newsletter