Analysis of NYC DOHMH Restaurant Inspection Data: Part 1, Exploratory Data Analysis
With an important public health function, material business impact, byzantine regulations, and good data availability through NYC Open Data, the NYC Department of Health and Mental Hygiene (DOHMH) restaurant health inspection system is a target-rich environment for applied data analysis.
My analysis of the system begins with an exploratory data analysis and review of salient aspects of the documentation provided by NYC DOHMH, presented as a Jupyter notebook here.
While my exploratory findings are in line with what previous analyses have found, greater consideration of the relevant documentation suggests areas for further research. See Analysis of NYC DOHMH Restaurant Inspection Data: Part 2, Seasons of Vermin for more.
New York City has approximately 27,000 restaurants, delis and other food service providers across its five boroughs (1). The task of ensuring compliance with food service regulations and keeping patrons safe falls to the NYC Department of Health and Mental Hygiene. Starting in 2010, restaurants were required to post their most recent health inspection grade in clear view of passersby. This change made the health inspection system significantly more salient to everyday people and raised the business stakes of holding a strong grade to reassure potential customers. In fact, the city claims that 88% of people consider restaurant grades when deciding where to eat (2).
So how are these grades actually determined? Per DOHMH, restaurants receive an initial inspection during which they are assigned points for any violations found. If those points add up to fewer than 14, the restaurant receives an A and will be reinspected in 12 months. If the restaurant receives 14 or more points, it will be reinspected 7 or more days later. The score from the re-inspection determines the grade to be posted. If the restaurant scores 14 or more points again, it has the option of posting its B or C grade card or a “GRADE PENDING” card until its violations are adjudicated through an Office of Administrative Trials and Hearings (OATH) hearing. This process is summarized in the graphic below.
In other words, each restaurant has three chances per cycle to receive an A (initial inspection, re-inspection, final adjudication). Note that the initial score also determines how quickly the inspection cycle begins again for each restaurant; restaurants with more violation points are reinspected sooner than those with fewer. The graphic also does not describe the conditions under which restaurants face closure by the DOHMH.
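The grading flow described above can be sketched in a few lines of Python. This is only an illustration of the rules as summarized here, not DOHMH's own implementation. The function and constant names are hypothetical, and the 28-point boundary between B and C is an assumption taken from DOHMH's published grade ranges rather than from the summary above.

```python
from typing import Optional

A_CUTOFF = 14  # fewer than 14 points earns an A (from the rules above)
C_CUTOFF = 28  # assumption: DOHMH's published ranges put C at 28 or more points


def letter_grade(score: int) -> str:
    """Map an inspection score to a letter grade (lower scores are better)."""
    if score < 0:
        raise ValueError("inspection scores cannot be negative")
    if score < A_CUTOFF:
        return "A"
    if score < C_CUTOFF:
        return "B"
    return "C"


def posted_card(initial_score: int, reinspection_score: Optional[int] = None) -> str:
    """Return the card a restaurant would post before any OATH hearing.

    An A on the initial inspection is posted right away. Otherwise the
    re-inspection score decides, and a B or C there may be swapped for a
    "GRADE PENDING" card until the hearing.
    """
    if letter_grade(initial_score) == "A":
        return "A"
    if reinspection_score is None:
        return "awaiting re-inspection"
    grade = letter_grade(reinspection_score)
    return grade if grade == "A" else f"{grade} or GRADE PENDING"
```

Under these assumptions, `posted_card(20, 12)` returns `"A"`: the restaurant missed an A on its initial inspection but earned one on re-inspection, which is the second of the three chances noted above.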
So what are restaurants graded on under this system? The short answer is sanitary violations, covering food safety training and supervision, prep surfaces and conditions, storage of food items at the correct temperature, and the presence or evidence of several kinds of pests. The long answer, in the form of the full list of violations, is here, starting on page 8.
DOHMH inspection data is provided through the NYC Open Data portal. Let’s begin by exploring the dataset to get a feel for how restaurants and grades are distributed throughout the city before diving into any more substantial research questions.
The full Jupyter notebook is accessible here, through Google Colab.
Overall, this analysis appears to suggest that:
There is not substantial variation in the overall distribution of scores across boroughs or cuisines, indicating reasonably consistent application of the inspection standards from restaurant to restaurant across the DOHMH health inspector corps.
The abundance of A’s in the data (just under 90%), combined with the clustering of scores just below each grade’s cutoff and the large number of re-inspections, raises more questions than it answers about how A grades are obtained and about the fairness of that process, especially given the OATH hearing adjudication pathway.
These questions include:
At the highest level, is the prevalence of A’s due to successful and persistent remediation of issues uncovered during the initial inspection, or to band-aid fixes following the initial inspection and hard bargaining during the OATH adjudication process? If remediation is the explanation, then the city’s approach is clearly effective, and the number of opportunities allowed to remedy issues supports the ultimate goal of ensuring food safety in the city. If band-aid fixes or hard bargaining are the explanation, the system is likely ineffective at ensuring consumer safety and works better as a source of citation revenue. These explanations are hard to disentangle given how little the dataset reveals about the adjudication process. However, it may be possible to gain some insight by comparing a set of observations before and after adjudication: extract the most recent observations at time t, then check which of them have been removed from the dataset at time t+6 months, given that the dataset captures only sustained or not-yet-adjudicated citations (p. 1). The larger the decrease in the number of rows at t+6 months versus the original observation (call this the adjudication spread), the greater the impact of the OATH adjudication process.
In a similar vein, what percentage of restaurants obtain A’s through the OATH hearing adjudication pathway? If this is a common pathway to an A, it could put recently immigrated restaurant owners at a disadvantage, given their lack of familiarity with navigating American governmental institutions. The potential for inequity could be examined by correlating the percentage of recent immigrants by ZIP code with the average inspection scores for those ZIP codes, or, more specifically, by performing the same analysis with the adjudication spread in place of the average score. Data on recent immigrant populations have proven hard to find so far, postponing this exercise.
Is there a substantial difference between a restaurant scoring no points and one scoring 13 points, both of which earn an A? If there is, the grading system is failing to adequately reflect the riskiness of dining choices to consumers and should offer greater granularity to recognize the efforts of restaurants that score at or near zero.
How do seasonal effects impact score and grade distributions? It seems likely that populations of mice, rats and bugs will fluctuate over the course of the year, as should their relative prevalence in kitchens. If vermin are less present in a particular part of the year, restaurants that happen to be inspected in that advantaged season may be inspected less frequently than restaurants inspected at other times, since an initial-inspection or ‘natural’ A grade results in an annual inspection cycle, while higher initial inspection scores trigger a shorter cycle. This would create an unfair advantage for those restaurants and would also give the same grade different values depending on when it is earned. Additionally, the shorter inspection cycle for restaurants with higher initial inspection scores would serve to concentrate inspections in the advantaged season, unbalancing the DOHMH inspectors’ burden.
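The adjudication spread proposed in the first question above could be computed along the following lines. This is a minimal sketch under stated assumptions, not a tested pipeline. The `Citation` fields are illustrative and do not mirror the portal's column names. The spread is expressed here as a share of the rows present at time t rather than a raw row count, so that snapshots of different sizes stay comparable. The data below is made-up toy data.

```python
from typing import Iterable, NamedTuple, Set


class Citation(NamedTuple):
    """One violation row; field names are illustrative, not the portal's schema."""
    restaurant_id: str
    inspection_date: str  # ISO date string
    violation_code: str   # placeholder codes, not real DOHMH codes


def adjudication_spread(snapshot_t: Iterable[Citation],
                        snapshot_later: Iterable[Citation]) -> float:
    """Share of citations present at time t that are missing from the later snapshot."""
    before: Set[Citation] = set(snapshot_t)
    if not before:
        return 0.0
    # Rows added after time t are ignored; only removals count toward the spread.
    removed = before - set(snapshot_later)
    return len(removed) / len(before)


# Toy snapshots: one of four citations disappears, one unrelated row is added.
t0 = [
    Citation("r1", "2019-03-14", "V1"),
    Citation("r1", "2019-03-14", "V2"),
    Citation("r2", "2019-04-02", "V1"),
    Citation("r3", "2019-04-20", "V3"),
]
t1 = [t0[0], t0[2], t0[3], Citation("r4", "2019-09-01", "V1")]

print(adjudication_spread(t0, t1))  # 0.25
```

A real comparison would need two downloads of the dataset taken roughly six months apart, since the portal only exposes the current state of each citation.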
This last question will be further explored in the second part of this analysis, Seasons of Vermin.
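As a first pass at the seasonal question, one could average initial-inspection scores by calendar month and look for months that sit noticeably above or below the rest. The sketch below assumes the input has already been filtered to initial inspections and reduced to (date, score) pairs. That input shape and the function name are assumptions made for illustration, and the values are toy data.

```python
from collections import defaultdict
from datetime import date
from statistics import fmean
from typing import Dict, Iterable, List, Tuple


def mean_score_by_month(inspections: Iterable[Tuple[date, int]]) -> Dict[int, float]:
    """Average inspection score for each calendar month (1 to 12) that has data."""
    scores_by_month: Dict[int, List[int]] = defaultdict(list)
    for inspection_date, score in inspections:
        scores_by_month[inspection_date.month].append(score)
    return {month: fmean(scores) for month, scores in sorted(scores_by_month.items())}


# Toy data: two January inspections and one July inspection.
sample = [
    (date(2019, 1, 10), 10),
    (date(2019, 1, 25), 20),
    (date(2019, 7, 4), 30),
]

print(mean_score_by_month(sample))  # {1: 15.0, 7: 30.0}
```

A monthly mean is only a rough screen. It does not separate vermin-related violations from other kinds, and it does not account for the shorter inspection cycle that follows a high initial score.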