What do city-wide crime rates actually correlate with? A statistical dive into neighborhood crime rates in five cities.

Do city-wide crime rates really indicate anything important or useful to us?

When people look at crime rates, they’re typically looking to find out how dangerous or violent an area is. Crime rates are most widely presented as nation-wide or city-wide figures. Cities are large areas, and anyone who has lived in one knows that crime and danger are not evenly spread across it. There’s always a “bad part of town” and a “good part of town”. We all understand, then, that city-wide crime rates do not tell us how violent or dangerous the entire city is. So, what do city-wide crime rates actually tell us? Do they correlate with how dangerous the most dangerous neighborhoods in a city are? Do they correlate with how dangerous the downtown area is? Do they correlate with something else? Do they correlate with nothing? That is what I’m trying to find out today.

As someone who’s looked at crime rates in neighborhoods a lot throughout the last decade of my life (dunno, it’s just been a persistent curiosity of mine), my prediction is that city-wide crime rates will not correlate with any useful metric. They will probably correlate mostly with what percentage of neighborhoods in the city are low crime, but they will absolutely not correlate with how bad the worst neighborhoods are, and they will not correlate with how bad the downtown area is.

Why do I make this prediction? Well, for example, a robbery rate of 500 per 100,000 residents is a high rate for a city, but it’s not a high rate for neighborhoods. I’ve seen many neighborhoods with robbery rates over 2,000. A neighborhood with a robbery rate of 500 has a pretty moderate or mediocre frequency of robberies, something you’d expect in a “it’s not the best part of town, it ain’t quiet as the burbs, but it’s definitely not the hood” neighborhood. A bad neighborhood can have a robbery rate of just 500, but it wouldn’t have a robbery problem… perhaps it has a homicide or assault problem.

My point is that the typical city has city-wide crime rates that a pretty safe neighborhood would also have.

So, for this study, I looked at crime rates by neighborhood in five cities: the American cities of Pittsburgh, Minneapolis, Buffalo and Portland and, for some national diversity, the Canadian city of Winnipeg. I did look at more cities, but the data for those cities was not usable for reasons such as: (1) the neighborhood populations were too big, (2) I couldn’t find census data by neighborhood, (3) the crime data only had coordinates or addresses and not neighborhoods, (4) the crime data didn’t distinguish specifically enough between violent crimes.

Phenomena resolution

Not long ago, I was reading a discussion about Tornado Alley on Reddit. Someone, chiming in with a surprising “fact”, pointed out that per unit of land, tornados are more common in the UK than in the USA. The idea here was that the prevalence of tornados in Tornado Alley is overstated. However, another user pointed out that although this is technically true, it’s not a useful fact because the landmass that makes up the USA consists of many areas that are devoid of tornados, like Oregon and Maine, and others that are hotbeds, like Kansas and Nebraska. The reality remains that the worst tornado spots in the USA are much worse than the worst tornado spots in the UK, and that the notoriety of Tornado Alley is not overstated. So the “fact” that the UK has more tornados per unit of land than the USA is irrelevant to the severity of Tornado Alley.

This is an example of what I call phenomena resolution. If someone said it’s “102 degrees in Texas today”, that would be a useless statistic because the geographic area (Texas) is too large to map, average or attribute the phenomenon (temperature) to. If it’s raining in California, that does not mean it’s raining in Oakland. How big of an area is appropriate to attribute rain to? Not an entire state, that’s for sure. How big of an area is appropriate to attribute a temperature to? Not an entire state, that’s for sure. How big of an area is appropriate to attribute a tornado likelihood to? Not an entire country, that’s for sure. Statistics about the frequency or existence of a phenomenon are only useful if the zone or area is an appropriate size.

So, how big of a population (the area) is appropriate to attribute a crime rate or violence level (the phenomenon) to? Not an entire city, that’s for sure. In the same sense that it could be 20 degrees in Flagstaff on a January day while being 65 degrees in Phoenix, it could be very safe in one neighborhood while very dangerous in another one in the same city. Just as there is no such thing as the “temperature in Arizona”, there is no such thing as “the danger of Phoenix” (or any city).

Crime rates of the five studied cities and the “violence rating” metric

The first chart here shows the city-wide crime rates of the five cities I looked at. Note that the data, depending on the cities, spans from 3 to 5 years. For more discussion on the year-span, there is a section on this page dedicated to it. 

The rates are crime instances per 100,000 residents per year. The city-wide crime and population data also come from the neighborhood crime data and the neighborhood census data; they do not come from averaging the FBI crime data or the StatCan crime data, so these figures will not align perfectly with other sources.


This city-wide metric is only useful inasmuch as we will compare the city-wide rates to metrics that are actually informative: ones that have appropriate phenomena resolution (neighborhood-sized populations). We will end up seeing whether the city-wide rates correlate with or predict what the resolution-appropriate metrics show us.

The “violence rating” is a metric that weights and averages the three main violent crimes: homicide, robbery, and aggravated assault. The weight of each crime was determined by its frequency in the 2019 national crime rates of the USA. Aggravated assaults were the most common of the three, so an aggravated assault is weighted at 1. Aggravated assaults were 3 times more common than robberies, so a robbery is weighted at 3. Finally, aggravated assaults were 48 times more common than homicides, so a homicide is weighted at 48.

After doing the multiplications for each crime, they are added together and then divided by three. The division is necessary because some jurisdictions don’t have proper data available for aggravated assaults, so when this is the case, like with Buffalo, we divide by two. This makes it possible to compare cities with missing or inappropriate data in one crime category.

For example, Portland has the following rates: 6.6 (homicide), 162.9 (robbery) and 373.8 (aggravated assault). The calculation therefore is: [(6.6 x 48) + (162.9 x 3) + (373.8)] / 3 = 393.4
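The calculation above can be sketched in a few lines of Python. This is only an illustration of the weighting as described; the function name and the use of None for a missing category are my own assumptions. With the rounded Portland rates quoted above it comes out to roughly 393.1, so the small gap from the 393.4 figure presumably comes from using unrounded rates.

```python
from typing import Optional

# Weights as described above (based on 2019 US national frequencies).
WEIGHTS = {"homicide": 48, "robbery": 3, "aggravated_assault": 1}


def violence_rating(homicide: Optional[float],
                    robbery: Optional[float],
                    aggravated_assault: Optional[float]) -> float:
    """Weighted average of the available per-100,000 rates.

    A category passed as None (no usable data) is left out and the
    divisor shrinks accordingly, e.g. to two when assaults are missing.
    """
    rates = {
        "homicide": homicide,
        "robbery": robbery,
        "aggravated_assault": aggravated_assault,
    }
    available = {name: rate for name, rate in rates.items() if rate is not None}
    if not available:
        raise ValueError("at least one crime rate is required")
    weighted_sum = sum(WEIGHTS[name] * rate for name, rate in available.items())
    return weighted_sum / len(available)


# Portland, using the rounded rates from the example above.
portland = violence_rating(6.6, 162.9, 373.8)
print(round(portland, 1))
```

Passing None for aggravated assault reproduces the divide-by-two rule used for cities like Buffalo.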

Does the violence severity of downtown correlate with the city-wide crime rates?

The second chart here shows the crime in the downtown areas of the five cities.

Downtowns aren’t normal neighborhoods; they attract a lot more people than other neighborhoods for work reasons, commercial reasons, entertainment reasons, and so on. So, it’s not appropriate to use the residential population of a downtown to quantify and compare its violence. My thought here is that the best way to quantify and compare downtowns is to use the “urban area population”, which is basically the population of people who do, could and would frequent that city’s downtown on a habitual or recurrent basis, for work, shopping, entertainment, and so on. Later on, we will look at why using the “city population” is inappropriate due to its geographic arbitrariness (“suburban fluff”).

I also did make a chart using the residential populations of the downtowns in each city. This metric isn’t useful in my opinion, but I believe many would want to see it, so I just put it here for that reason. The order switches up quite a bit, though Winnipeg’s downtown is the worst with both metrics.


So, at this point we can see that one part of my hypothesis was correct: the city-wide rates do not correlate with how violent or dangerous a city’s downtown is. Winnipeg had the 4th highest violence rating at the city level, but has the highest violence rating at the downtown level, and by quite a bit too: double the rate of the second highest city in both metrics. Likewise, Buffalo had the worst city-wide crime rates, but its downtown is the second least violent of the five cities here.

The next series of charts takes a look at the worst neighborhoods in each city. Note that I either excluded neighborhoods with populations under 500 or amalgamated them with a bordering neighborhood if it was appropriate (mainly if they had connectivity). I also excluded neighborhoods that were mostly industrial, and I halved the rates of downtown areas and other commercial-residential areas to make them more comparable with proper residential neighborhoods. Note that the only commercial-residential area (shopping malls) this applied to was “Lloyd” in Portland. I only looked at high crime areas to do this halving process, and the only other high crime commercial-residential area was “Polo Park” in Winnipeg, but its population was under 500.



So at this point, again, we can see that another part of my hypothesis was correct: city-wide rates do not correlate with how bad the worst neighborhoods are. The top 5 neighborhoods with the worst violence ratings are not in Buffalo; rather, they are in Winnipeg. Likewise, the single worst neighborhoods in Pittsburgh, Buffalo and Minneapolis all have a similar violence rating, between 3500 and 3600, despite the differences in their city-wide rates. So city-wide rates clearly do not correlate with how violent the most violent parts of the city are.

If you want to see the crime rates of all of the neighborhoods in each city, you can click here.

Camden and Compton are… average?

We’ll come back to finding out what the city-wide crime rates correlate with, but for now I’d like to expand a bit more on phenomena resolution and its relation to crime stats (crime resolution). The following is what I’m going to call the boundary bias. This factor has a colossal effect on the city-wide rates.

The mere boundaries of a city play a huge part in a city’s crime rates. Where does the city end and the next one begin? The boundaries between cities and towns are oftentimes geographically arbitrary. For instance, there is no geographic feature that separates the municipality of Pittsburgh from the municipalities of Crafton and Ingram. There is no mountain. There is no river. There isn’t empty, undeveloped space. There is a continuous, contiguous urban landscape, and at some point, you happen to exit Pittsburgh and enter Crafton.


The same can be said about Portland and Gresham.


The same can be said about Fairfax and the municipality of Cincinnati. In some cases we even have weird divisions where one municipality surrounds an entire other one. Look at Norwood and St. Bernard within Cincinnati, for example. Geographically speaking, they’re just like neighborhoods of Cincinnati, but they’re their own municipalities. These are called enclave cities.



And if we go back to Pittsburgh… well, there is an enclave city called Mount Oliver which is directly beside a neighborhood in Pittsburgh called Mt. Oliver. Funny…




Sometimes several adjacent municipalities amalgamate into one city. For example, Winnipeg used to be twelve different municipalities, but in the 1970s they all joined up and formed one city. And don’t worry, we’ll come back to this and look at what the crime rates of each of these municipalities would be if they had never amalgamated.

So the point here is that city boundaries are oftentimes very arbitrary. And this is relevant because, if they just happened to be different, the crime rates of the city would also be different… and in some cases, very different. What if Winnipeg had never amalgamated? What if Newark amalgamated with a bunch of the municipalities to its south? What if Phoenix amalgamated with its entire continuous urban landscape? Well, let’s take a look.

Phoenix shares a continuous urban landscape with a bunch of other municipalities. Empty distance divides Phoenix from Tucson, but it doesn’t divide Phoenix from Glendale. The city of Phoenix itself, within its current boundaries, had a homicide rate of 7.8 and a robbery rate of 189 in 2020. The rates aren’t that menacing on their own, but they get even less so if you consider the entire continuous urban landscape as one amalgamated unit: Glendale, Chandler, Mesa, Gilbert, Scottsdale, Tempe, Tolleson, Goodyear, Peoria and Phoenix (there were a bunch of other contiguous towns and cities I didn’t include, just for time’s sake). Now we get a homicide rate of 4.8 and a robbery rate of 119, with a population of 3,698,571.



Newark is a city that is fairly notorious for its crime. And I’m not saying there isn’t a crime problem in certain parts of the Newark boundaries… but these boundaries really accentuate its crime if we compare it to cities like Phoenix. As per its current boundaries, in 2020 Newark had a homicide rate of 20.3 with a population of about 280,000 people. But if we just extend the boundaries southward to include the municipalities of Elizabeth, Hillside Township, Roselle, Roselle Park and Linden, we reduce the homicide rate to 12.7 with a population of about 510,000.

A huge contiguous urban landscape: Newark’s boundaries are geographically arbitrary


How much further would we have to go until we match Phoenix’s homicide rate at 7.8? Could we do it before we match populations at about 1.6 million? Well, it takes us an expansion up to 870,870 people to match the homicide rate of 7.8.
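The arithmetic behind these amalgamation exercises is just pooling: turn each rate back into an incident count, add up the counts and the populations, and re-express the total per 100,000. Here is a minimal sketch, where the function names are mine and the Newark inputs are the rounded figures quoted above, so the result is approximate. It also backs out the rate the added municipalities must have had, which comes to roughly 3.4 homicides per 100,000.

```python
def pooled_rate(areas):
    """Combine per-100,000 rates for several areas into one rate.

    `areas` is a list of (rate_per_100k, population) pairs. Each rate is
    converted back to an incident count, the counts and populations are
    summed, and the total is re-expressed per 100,000 residents.
    """
    total_incidents = sum(rate * pop / 100_000 for rate, pop in areas)
    total_population = sum(pop for _, pop in areas)
    return total_incidents / total_population * 100_000


def implied_rate(rate_a, pop_a, rate_combined, pop_combined):
    """Rate the added area must have for the combined rate to hold."""
    added_pop = pop_combined - pop_a
    added_incidents = (rate_combined * pop_combined - rate_a * pop_a) / 100_000
    return added_incidents / added_pop * 100_000


# Newark alone vs. Newark plus the municipalities to its south
# (populations are the rounded "about" values from the text).
added = implied_rate(20.3, 280_000, 12.7, 510_000)
print(round(added, 1))  # roughly 3.4

# Pooling Newark with the added area recovers the combined rate.
print(round(pooled_rate([(20.3, 280_000), (added, 230_000)]), 1))
```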

And this is a key phenomenon here: boundary accentuation. Some cities just happen to have boundaries that stop before much of the safer areas are included, areas which, if included, would bring the city-wide rates down. For basically any city you see that has super high crime rates but also a small population, just go and look it up on Google Maps. East St. Louis and Camden, for example, literally just look on a map like the inner cities, the older parts of town, in cities with larger boundaries. There are massive continuous urban landscapes that they are a part of, yet they just happen to be their own cities because they never amalgamated with the contiguous municipalities.

A picture of Camden. Look at how small Camden’s boundaries are within the huge contiguous urban landscape.

So what if we did this to other cities? Let’s remove areas from a city’s boundary, make their population smaller and consequently, increase their crime rates. In other words, let’s boundary bias them: remove parts of a city’s boundaries to accentuate its crime.

Note: Keep in mind, the following boundary-biased areas are all contiguous within a city. So I didn’t just pick the worst neighborhoods and combine them with gaps in between; they had to touch each other and form one contiguous boundary.

Camden is one of the most notorious cities in the USA for its crime and poverty. It has about 75,000 residents. So let’s boundary bias Portland, Pittsburgh, Winnipeg, Buffalo and Minneapolis with their worst 75,000-resident clusters.
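For anyone who wants to try this kind of search programmatically, here is one possible sketch. It is a simple greedy heuristic and not necessarily how the clusters in this post were put together: from each seed neighborhood it keeps adding the highest-rated neighborhood that touches the cluster until the population target is reached, then keeps the seed that gives the highest pooled rating. The neighborhood names, populations, ratings and adjacency list are all made up for illustration, and a greedy search is not guaranteed to find the true worst cluster.

```python
def cluster_rating(names, hoods):
    """Population-weighted violence rating of a set of neighborhoods."""
    pop = sum(hoods[n]["pop"] for n in names)
    return sum(hoods[n]["rating"] * hoods[n]["pop"] for n in names) / pop


def worst_contiguous_cluster(hoods, adjacency, target_pop):
    """Greedy search for a high-violence contiguous cluster."""
    best = None
    for seed in hoods:
        cluster = {seed}
        pop = hoods[seed]["pop"]
        while pop < target_pop:
            # Neighborhoods touching the cluster but not yet in it.
            frontier = {nb for n in cluster for nb in adjacency[n]} - cluster
            if not frontier:
                break  # ran out of touching neighborhoods
            nxt = max(frontier, key=lambda n: hoods[n]["rating"])
            cluster.add(nxt)
            pop += hoods[nxt]["pop"]
        if pop >= target_pop:
            rating = cluster_rating(cluster, hoods)
            if best is None or rating > best[1]:
                best = (cluster, rating)
    return best


# Hypothetical city: five neighborhoods in a row, A-B-C-D-E.
hoods = {
    "A": {"pop": 25_000, "rating": 200},
    "B": {"pop": 25_000, "rating": 1500},
    "C": {"pop": 25_000, "rating": 1200},
    "D": {"pop": 25_000, "rating": 300},
    "E": {"pop": 25_000, "rating": 2000},
}
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
             "D": ["C", "E"], "E": ["D"]}

cluster, rating = worst_contiguous_cluster(hoods, adjacency, 75_000)
print(sorted(cluster), round(rating, 1))
```

Note that in this toy city the two worst neighborhoods (B and E) cannot end up in the same 75,000-resident cluster because they don’t touch, which is exactly the contiguity constraint described above.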


Compared to the worst 75,000-resident areas within these larger cities, Camden is the 4th most violent of the 6 areas sampled here. I haven’t looked at neighborhood statistics in 100s of cities, but at worst it seems that Camden is akin to your average inner-city area in a larger city; Camden just happens to be basically 100% inner city, therefore its city-wide crime rates look like an outlier and exceptionally bad.

Compton is also one of the most notorious American cities for crime. It has about 100,000 residents. So let’s see what we get when we do the same experiment with Compton.

Well, we find the same outcome as with Camden: Compton doesn’t measure up to the worst contiguous clusters of about 100,000 people in the analyzed cities. Note that I know that Compton was way worse in the 90s, so nobody needs to comment on this. However, its reputation is still thriving, which is why I used it as an example.

If you want to see the neighborhoods conjoined to create these worst-75,000’s and worst-100,000’s, click here.

De-amalgamating Winnipeg

What would Winnipeg look like, crime-statistically speaking, if it never amalgamated?


Old Winnipeg, with a homicide rate of 19.2, a robbery rate of 615.5 and an aggravated assault rate of 891.0, looks like a “tier 2” violent city now, doesn’t it? It’s pretty staggering just how much the boundaries alone can affect the crime rate. The happenstance boundaries really have profound effects on the city-wide rates; they bias the rates, and when we play with the boundaries, making them larger or smaller, we can really see the effects.

But even populations of 100,000 or 75,000 are still way too big for crime resolution. In my opinion, the ideal population size for a crime area is between 1,000 and 7,500 residents. I actually found data for many cities like Toronto, Vancouver, Chicago, St. Paul and so on, but in these cities practically every neighborhood has over 10,000 residents apiece. The average neighborhood in St. Paul has about 16,800 residents, for example. Chicago even has some neighborhoods with over 100,000 people. So the data in these cities was pretty useless. On the other hand, the average neighborhood in Minneapolis has 5,108 residents, a much better crime resolution.

The safe zone skew

My hypothesis was somewhat correct here. I said city-wide rates would correlate with the number of low-crime areas in the city, but they seem to correlate better with the number of neighborhoods that are moderate and safe collectively. In other words, city-wide rates correlate with the percentage of neighborhoods that are “not bad”. A low city-wide crime rate tells us that the city has a lot of safe and / or mediocre neighborhoods, but it doesn’t tell us anything about the state of its rougher areas, its downtown, and so on.

Alright so let’s get some proper data on this to substantiate the claims made above.

The chart here shows the city-wide crime rates for each of the five studied cities. It also contains a 7-way categorization of neighborhoods based on their violence rating, and the figure below shows what percentage of each city’s population lives in a neighborhood within each category.


What we see here are things like this: only 9.2% of the population in the high crime rate city of Buffalo lives in a neighborhood with a violence rating less than 300. By contrast, this figure is 70.3% for Winnipeg and 52.2% for Pittsburgh. That right there demonstrates pretty well why the city-wide crime rates are so much higher in Buffalo than in Winnipeg and Pittsburgh: it’s the safe zone skew.

Another interesting and important statistic to notice is those neighborhoods between 600 and 999. Although these scores would be high for a city-wide rate, for individual neighborhoods, the way I would describe this violence rating range is like “not the best areas, but not the hood either”. You can see that 34.9% of Buffalo’s residents live in a neighborhood like this, while this figure is 18.7% for Minneapolis and 4.7% for Winnipeg. This is also going to have a big effect on the city-wide rates.

The percentage of the population that lives in neighborhoods with a violence rating under 600 correlates quite well with the violence rating of the entire city.
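The “percentage of the population living under a given violence rating” figures can be computed with a short helper like the one below. This is a minimal sketch; the function name and the sample neighborhoods are hypothetical and are not taken from the data package.

```python
def share_below(neighborhoods, threshold):
    """Percent of residents living in neighborhoods rated under `threshold`.

    `neighborhoods` is a list of (population, violence_rating) pairs.
    """
    total = sum(pop for pop, _ in neighborhoods)
    below = sum(pop for pop, rating in neighborhoods if rating < threshold)
    return 100 * below / total


# Hypothetical city with five neighborhoods (20,000 residents in total).
city = [(4_000, 150), (6_000, 450), (5_000, 800), (3_000, 2_100), (2_000, 280)]

print(share_below(city, 300))  # 30.0
print(share_below(city, 600))  # 60.0
```

The shares are weighted by population, not by neighborhood count, so a single large safe neighborhood moves the figure more than several tiny ones.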


The below is what I’m going to call “crime ribbons”. There are 100 lines that represent 1% each. Each line is given a color that corresponds to one of the 7 violence rating categories. I find looking at these really visualizes nicely what’s responsible for the city-wide crime rates.



Another interesting observation here is how Winnipeg and Pittsburgh have similar violence ratings at the city level, but such different distributions among the 7 categories.

In conclusion, what city-wide crime rates seem able to predict is the percentage of the population that lives in low and moderate crime environments. They do not help at all in predicting how dangerous the shady areas and downtown are.

Something I noticed about city boundary tendencies in Canada vs. the USA

This is an interesting tendency I noticed, related to the concept of the safe zone skew. Looking at various cities in the USA and Canada on Google Maps, I noticed that Canadian cities tend to include a lot more suburban developments in their “city” than American cities do.

Generally speaking, suburban areas have much lower crime rates than urban areas. I believe the main reason why suburbs tend to have less crime is simply that they are newer developments, and therefore they are more likely to be inhabited by economically well-off people, and at a local level, economic status is one of the main (but not the only) correlates of crime. I could hypothesize many other factors, but the cause of this reality isn’t relevant at this point. The fact is that suburbs have less crime than urban areas, so the greater the percentage of suburbs in your city, the lower the city-wide crime rates will be. This is what I call “suburban fluff”. A lot of major Canadian cities have quite low crime rates. But this appears to be an illusion, the suburban fluff illusion.

I don’t have any data that properly delineates suburban and urban populations of cities. The best metric I could come up with was seeing what percentage of a city’s urban area population its municipality population is. The idea here is that the bigger that percentage is, the more suburbs there are within the municipality’s (city’s) boundaries. For simplicity’s sake, we’ll call this metric the “suburban inclusion likelihood”.
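As a quick illustration of the metric, here is the calculation in Python. The function name and the two example cities are hypothetical; the populations are invented and are not figures for any real city.

```python
def suburban_inclusion_likelihood(municipal_pop, urban_area_pop):
    """Municipality population as a percentage of urban area population."""
    return 100 * municipal_pop / urban_area_pop


# Hypothetical city whose limits take in most of its urban area.
print(suburban_inclusion_likelihood(700_000, 800_000))    # 87.5

# Hypothetical city whose limits cover only the core.
print(suburban_inclusion_likelihood(300_000, 1_200_000))  # 25.0
```

Nothing caps the value at 100: if the municipal population figure is larger than the urban area figure, the result simply goes above 100%.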

So, I looked at a bunch of American and Canadian cities and I found this:


As suspected from looking at Google Maps satellite images, Canadian cities have a tendency to include more suburbs in their city limits. I’m not really sure why the municipality population is bigger than the urban area population for three Canadian cities, but the population figures are from the same year. I assume it has to do with some way that StatCan filters or presents the data.

Anyway, the average suburban inclusion likelihood for Canadian cities is 74.9%, while for the American cities this figure was only 27.4%. In other words, the average Canadian city will include 74.9% of the surrounding suburbs into its city population, while the average American city will only include 27.4%.

And apparently I wasn’t the only one to notice this tendency. Later on, I was reading the “Suburb” article on Wikipedia and it had this written:

“In some areas, such as India, China, New Zealand, Canada, the United Kingdom, and parts of the United States, new suburbs are routinely annexed by adjacent cities due to urban sprawl. In others, such as Morocco, France, and much of the United States, many suburbs remain separate municipalities or are governed locally as part of a larger metropolitan area such as a county, district or borough.”

So to demonstrate this effect on crime rates, we’re going to separate urban Winnipeg and suburban Winnipeg.

Basically, to determine whether something was suburban or not, I was mostly looking at things like the existence of driveways instead of back lanes, no commercial corridors (just big box strip malls), few entries into the neighborhood, and non-gridded, arbitrarily winding roads. I do want to point out that Pittsburgh has lots of winding roads, but there are geographic reasons for this. Winnipeg, on the other hand, appears to be very flat, so winding roads correlated pretty well with other suburban characteristics.

Anyway, if Winnipeg had more “American-like” boundaries, it would have a population of 350,330, a homicide rate of 12.0, a robbery rate of 430.8, an aggravated assault rate of 612.5, and a violence rating of 826.8. These rates are quite a bit higher than its “actual” rates.


I also found some good yet outdated data from the Canadian city of Regina. It’s from the years 2010 and 2011. Unfortunately, the newer data didn’t have homicide or robbery statistics. It would not be suitable to compare this older 2010s data with the newer 2020s data from the other five cities, but I was still interested to see how the crime rates would change if we took out the suburbs.

So with this data, Regina has a population of 193,120, a homicide rate of 4.1, and a robbery rate of 153.8, but no aggravated assault data. So that gives a violence rating of 330.1. But if we cut off the suburban fluff, we get a population of 101,810 with a homicide rate of 6.4, a robbery rate of 258.3 and a violence rating of 540.7.


Basically, what I’ve found here is that Canadian cities appear to have lower crime rates due to suburban fluffing. We cannot do this same exercise with most American cities, because their suburban areas are not included in the municipality boundaries to begin with.

As a final note here: some of you might disagree with the way I delineated between urban and suburban areas in these cities. In reality, there is a continuum between the two categories and there is no strict, perfect way to categorize them. Many neighborhoods straddle the line between urban and suburban, and I had to make a lot of iffy decisions. You can check which neighborhoods I called “suburban” in the downloadable data package.

A new metric… Crime inequality score

Moving on… I want to introduce a new metric here, called the crime inequality score. It is calculated like an income inequality metric, such as the Palma ratio or the 20:20 ratio. These metrics take the income of the richest given percentage of a community, city or country, and see how much richer they are than the poorest given percentage. The Palma ratio compares the richest 10% with the poorest 40%, while the 20:20 ratio compares the richest 20% with the poorest 20%.

The metric we’re introducing here is similar. We will compare the violence rating of the worst 5% of the city with the violence rating of the safest 50% of the city. All we do is divide the violence rating of the worst 5% by that of the safest 50%, and that gives us the crime inequality score. A lower score means the city’s violence is more equally distributed around the city. A score of 1 means the crime is perfectly evenly distributed. A higher score means the crime is more heavily concentrated in certain areas.
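Here is one way the score could be computed from neighborhood data. Treat it as a sketch under assumptions of mine that the description above does not pin down: the “worst 5%” and “safest 50%” are taken as shares of the population, each group’s rating is a population-weighted average, and a neighborhood that straddles a cutoff is counted partially. The function names and sample cities are hypothetical.

```python
def group_rating(neighborhoods, share, worst):
    """Population-weighted rating of the worst (or safest) `share` of residents.

    `neighborhoods` is a list of (population, violence_rating) pairs.
    """
    ordered = sorted(neighborhoods, key=lambda n: n[1], reverse=worst)
    quota = share * sum(pop for pop, _ in neighborhoods)
    remaining = quota
    weighted = 0.0
    for pop, rating in ordered:
        take = min(pop, remaining)  # partial count at the cutoff
        weighted += take * rating
        remaining -= take
        if remaining <= 0:
            break
    return weighted / quota


def crime_inequality_score(neighborhoods):
    """Violence rating of the worst 5% divided by that of the safest 50%."""
    worst = group_rating(neighborhoods, 0.05, worst=True)
    safest = group_rating(neighborhoods, 0.50, worst=False)
    return worst / safest


# Hypothetical city: violence concentrated in one small neighborhood.
concentrated = [(1_000, 3_000), (4_000, 1_000), (5_000, 400), (10_000, 100)]
# Hypothetical city: every neighborhood has the same rating.
uniform = [(1_000, 500), (3_000, 500)]

print(round(crime_inequality_score(concentrated), 1))  # 30.0
print(round(crime_inequality_score(uniform), 1))       # 1.0
```

The uniform city scores 1, matching the “perfectly evenly distributed” case described above.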


I’ve also included Regina in this metric. As we can see, the two Canadian cities’ crime inequality scores are by far the worst, at 51.0 and 47.0. It seems that suburban fluff will make a city’s crime rates look better, but it will make a city’s crime inequality score look really bad.

I believe that by taking into account a city’s crime inequality score and its overall violence rating, we get a clearer but still general picture of the crime situation in a city. If we just look at the crime inequality score, then Buffalo and Portland look pretty good… however, if we take into account the violence rating of the entire city, we can see that Buffalo is doing a much worse job than Portland with violence. On the other hand, if we just look at the crime rates in the cities, it looks like Buffalo is doing a lot worse than Winnipeg, when in reality, Winnipeg seems to be doing pretty atrociously, just like Buffalo, which we can see if we take into account the crime inequality score.

Outro!

Believe it or not, I have a lot more I could say, but I don’t want this post to drag on for too long. It’s already very long as is. There are almost 5,000 words. I could write 5,000 more. But I just want to end this article by coining some new terms, “fallacies” if you will, based around inappropriately using city-wide crime rates.

The “violence uniformity fallacy”: when someone assumes criminal activity and violence are evenly distributed throughout a city. We all know this is false because we know “bad parts of town” exist, yet when we use language like “a dangerous city” (“Camden is more dangerous than Minneapolis”), we are falling for this fallacy.

The “city-wide extrapolation fallacy”: when someone extrapolates from the city-wide crime rates and assumes they correlate with how dangerous the most crime-ridden neighborhoods and the city’s downtown are. We know this is false because the worst neighborhoods in Pittsburgh, Minneapolis and Buffalo all have very close violence ratings, even though these three cities have drastically different city-wide rates.

The “standard distribution of crime fallacy”: when someone assumes that the distribution of crime in each city is the same as in the next. For example, that in all cities, 5% of the neighborhoods are really bad, 25% are moderate and 70% are safe. This assumption then leads them to conclude that if a city has higher crime rates than another city, its bad areas must be more violent and its safe areas must be less safe. We already saw the “crime ribbons” and know that the distributions of crime in two cities can be wildly different.

Sources, data notes & explanations and extra data can be found here.

