The New York City Department of Health and Mental Hygiene (DOHMH) conducts unannounced restaurant inspections annually to check compliance with policies on food handling, food temperature, personal hygiene of restaurant workers, and vermin control. Each violation is worth a pre-specified number of points, which are totaled at the end of the inspection. The total score is converted into a grade, where a lower score earns a better grade.
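To make the score-to-grade conversion concrete, here is a minimal sketch in base R. The cutoffs used (0 to 13 points for A, 14 to 27 for B, 28 or more for C) are an assumption taken from DOHMH's published grading scheme, not from this dataset, and `score_to_grade()` is a hypothetical helper name. The `grade` column in the data may not match this mapping on every row, since not every inspection results in a posted grade.

```r
# Hypothetical helper: map an inspection score to a letter grade.
# Cutoffs (0-13 = A, 14-27 = B, 28+ = C) are assumed from DOHMH's published
# grading scheme; they are not defined anywhere in the dataset itself.
score_to_grade = function(score) {
  as.character(cut(
    score,
    breaks = c(-Inf, 13, 27, Inf),  # right-closed intervals: (-Inf, 13], (13, 27], (27, Inf)
    labels = c("A", "B", "C")
  ))
}

score_to_grade(c(0, 13, 14, 27, 28, 45))
# "A" "A" "B" "B" "C" "C"
```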
We’re going to make some plotly plots.
library(tidyverse)
library(p8105.datasets)
library(plotly)
The first step is to load the dataset from the p8105.datasets package.
data("rest_inspec")
I did some data cleaning and kept only Italian restaurants for further analysis.
data("rest_inspec")
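# drop_na() with no arguments drops every row that has a missing value in
# any column; the remaining rows are then restricted to Italian restaurants.
nyc_inspections_italian =
  rest_inspec %>%
  drop_na() %>%
  filter(cuisine_description == "Italian")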
nyc_inspections_italian
## # A tibble: 8,437 × 18
## action boro build…¹ camis criti…² cuisi…³ dba inspection_date inspe…⁴
## <chr> <chr> <chr> <int> <chr> <chr> <chr> <dttm> <chr>
## 1 Viola… MANH… 378 4.04e7 Critic… Italian SALU… 2015-05-20 00:00:00 Cycle …
## 2 Viola… MANH… 132 5.00e7 Critic… Italian INNS… 2016-07-22 00:00:00 Cycle …
## 3 Viola… MANH… 425 4.15e7 Critic… Italian SPIN… 2015-10-05 00:00:00 Cycle …
## 4 Viola… MANH… 425 4.15e7 Critic… Italian SPIN… 2016-04-18 00:00:00 Cycle …
## 5 Viola… MANH… 132 5.00e7 Not Cr… Italian INNS… 2017-10-10 00:00:00 Cycle …
## 6 Viola… MANH… 425 4.15e7 Critic… Italian SPIN… 2017-03-30 00:00:00 Cycle …
## 7 Viola… MANH… 4 5.00e7 Not Cr… Italian ULIVO 2016-05-18 00:00:00 Pre-pe…
## 8 Viola… MANH… 378 4.04e7 Not Cr… Italian SALU… 2016-06-17 00:00:00 Cycle …
## 9 Viola… MANH… 108 5.00e7 Not Cr… Italian BARI… 2016-03-07 00:00:00 Pre-pe…
## 10 Viola… MANH… 425 4.15e7 Critic… Italian SPIN… 2017-03-30 00:00:00 Cycle …
## # … with 8,427 more rows, 9 more variables: phone <chr>, record_date <dttm>,
## # score <int>, street <chr>, violation_code <chr>,
## # violation_description <chr>, zipcode <int>, grade <chr>, grade_date <dttm>,
## # and abbreviated variable names ¹building, ²critical_flag,
## # ³cuisine_description, ⁴inspection_type
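Calling `drop_na()` without arguments is a fairly aggressive filter, because a row is removed if any one of its 18 columns is missing. The sketch below uses a small made-up tibble (not the real data) to contrast that with dropping on a single column; which variant is appropriate depends on the analysis.

```r
library(tidyverse)

# Toy data, made up for illustration: row "Y" lacks a score, row "Z" lacks a grade.
toy = tibble(
  dba   = c("X", "Y", "Z"),
  score = c(10, NA, 30),
  grade = c("A", "B", NA)
)

nrow(drop_na(toy))         # 1: only rows complete in every column survive
nrow(drop_na(toy, score))  # 2: only rows with a missing `score` are dropped
```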
My first plot:
# Plotly line plot: monthly mean inspection score of Italian restaurants
# in 2017, by borough.
nyc_inspections_italian %>%
  separate(inspection_date, c("year", "month", "day")) %>%
  filter(year == 2017) %>%
  group_by(boro, month) %>%
  summarise(mean_score = mean(as.numeric(score))) %>%
  plot_ly(
    x = ~month, y = ~mean_score, type = "scatter", mode = "lines",
    color = ~boro, alpha = 1)
## `summarise()` has grouped output by 'boro'. You can override using the
## `.groups` argument.
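As an alternative to splitting the date into character pieces with `separate()`, the year and month can be extracted as numbers with lubridate, and `.groups = "drop"` silences the grouping message shown above. This is only a sketch on made-up data, not the code used for the plot.

```r
library(tidyverse)

# Toy data, made up for illustration: four inspections in 2017 and one in 2016.
toy = tibble(
  boro = c("BRONX", "BRONX", "QUEENS", "QUEENS", "QUEENS"),
  inspection_date = as.POSIXct(
    c("2017-01-05", "2017-01-20", "2017-02-10", "2017-02-11", "2016-12-31"),
    tz = "UTC"
  ),
  score = c(10, 20, 7, 9, 40)
)

monthly = toy %>%
  mutate(
    year  = lubridate::year(inspection_date),   # numeric year
    month = lubridate::month(inspection_date)   # numeric month, 1 to 12
  ) %>%
  filter(year == 2017) %>%
  group_by(boro, month) %>%
  summarise(mean_score = mean(score), .groups = "drop")

monthly
# Two rows: BRONX / month 1 / mean 15, and QUEENS / month 2 / mean 8
```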
My second plot:
# Plotly bar chart: number of grade A inspection records for Italian
# restaurants in each borough in 2017. count(boro) counts rows, so a
# restaurant with several records is counted more than once.
nyc_inspections_italian %>%
  separate(inspection_date, c("year", "month", "day")) %>%
  filter(grade == "A", year == 2017) %>%
  count(boro) %>%
  mutate(boro = fct_reorder(boro, n)) %>%
  plot_ly(x = ~boro, y = ~n, color = ~boro, type = "bar", colors = "viridis")
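Because `count(boro)` tallies rows, and the printed data shows the same restaurant appearing on several rows, the bars reflect records rather than unique restaurants. If a per-restaurant count is wanted, one option is to deduplicate first. The sketch below uses made-up data and assumes that `camis` identifies a restaurant uniquely.

```r
library(tidyverse)

# Toy data, made up for illustration: restaurant 1 appears on two rows.
toy = tibble(
  camis = c(1, 1, 2, 3),
  boro  = c("BRONX", "BRONX", "BRONX", "QUEENS"),
  grade = c("A", "A", "A", "A")
)

by_record = toy %>% count(boro)
by_restaurant = toy %>% distinct(camis, boro) %>% count(boro)

by_record$n      # 3 1 (BRONX, QUEENS): rows
by_restaurant$n  # 2 1 (BRONX, QUEENS): unique restaurants
```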
My third plot:
# Plotly boxplot: distribution of inspection scores in each borough in 2017.
# The higher the score, the more severe the violations.
nyc_inspections_italian %>%
  separate(inspection_date, c("year", "month", "day")) %>%
  filter(year == 2017) %>%
  mutate(score = as.numeric(score),
         boro = fct_reorder(boro, score)) %>%
  plot_ly(y = ~score, color = ~boro, type = "box", colors = "viridis")
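The order of the boxes comes from `fct_reorder(boro, score)`, which by default sorts the factor levels by the median of `score` within each level. A small sketch on made-up data shows that a single extreme value does not change the ordering:

```r
library(tidyverse)

# Toy data, made up for illustration: QUEENS has one large outlier.
toy = tibble(
  boro  = c("BRONX", "BRONX", "BRONX", "QUEENS", "QUEENS", "QUEENS"),
  score = c(20, 22, 24, 5, 7, 90)
)

levels(fct_reorder(toy$boro, toy$score))
# "QUEENS" "BRONX": QUEENS has the lower median (7 vs 22) despite the outlier
```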