Uncovering the Link Between Municipalities’ Influence, Public Health Spending, and Patient Displacement in Brazil

A compelling storytelling journey with Quarto, Shiny, and ChatGPT

Fernando Barbalho
Towards Data Science


Photo by Natanael Melchor on Unsplash

The Brazilian public health system has long struggled to allocate resources and provide care more efficiently. One of its most significant challenges is that many patients must travel to other cities to receive the hospital treatment they need. Using data from the Brazilian National Health System, we estimated that in 2021 alone, patients across the country made around 4.0 million such trips. This article explores the implementation details of a project that addresses this public health issue. It may be particularly relevant to those who work on health care public policy. Also, because it details the code used in the final product, it may interest professionals working with data visualization and storytelling.

To better understand the issue in this article, we investigated the relationship between hospital attendance spending and patient displacement. Our analysis revealed a significant inequality in spending among Brazilian municipalities, with smaller cities spending much less than their larger counterparts. We hypothesized that this disparity in spending would lead to a significant flow of patients from cities with low hospital capacity to those with greater availability of care infrastructure.

We confirmed our hypothesis using REGIC, a city influence model produced by the Brazilian Institute of Geography and Statistics (IBGE). We demonstrated that patient displacement primarily affects smaller cities with less management capacity and lower proportional spending on hospital and outpatient care. Meanwhile, larger municipalities with more management and influence capacity are more likely to receive patients from outside, increasing the demand for their health services.

To make our findings more accessible to a broader audience, we developed an interactive dashboard using Shiny, Quarto, and data visualization techniques. The dashboard lets users explore the data in real time, with dynamic ChatGPT prompts providing additional insights and storytelling elements. By leveraging these tools, we hope to shed new light on the challenges facing the Brazilian public health system and contribute to ongoing efforts to improve its performance. See the complete analysis and the interactive pages through this link. In addition, the following sections present some graphs used in the product and the related code.

Public health data in graphs (and code)

The visualizations in the product rely heavily on maps. The idea was to show how Brazilian cities differ in their residents' need to travel for hospital treatment. The graph below, for example, shows how cities differ in the outflow of patients who seek care in other municipalities.

Patients were traveling for assistance. Image by the author.

Below is the block of code that builds a dataset that will later be used as the basis for maps using ggplot.

library(dplyr)
library(tidyr)

# Count admissions per municipality by type of movement:
#   "saida"   = residents admitted in another city
#   "entrada" = patients received from other cities
#   "local"   = patients admitted in their own city
agrupamento_municipio <-
  dataset_analise %>%
  filter(deslocamento == 1) %>%
  group_by(munic_res) %>%
  summarise(numero_internacoes = n()) %>%
  mutate(code_muni = munic_res,
         tipo_deslocamento = "saida") %>%
  bind_rows(
    dataset_analise %>%
      filter(deslocamento == 1) %>%
      group_by(codufmun) %>%
      summarise(numero_internacoes = n()) %>%
      mutate(code_muni = codufmun,
             tipo_deslocamento = "entrada"),
    dataset_analise %>%
      filter(deslocamento == 0) %>%
      group_by(codufmun) %>%
      summarise(numero_internacoes = n()) %>%
      mutate(code_muni = codufmun,
             tipo_deslocamento = "local")
  ) %>%
  group_by(code_muni, tipo_deslocamento) %>%
  summarise(total_internacoes = sum(numero_internacoes))

# One row per municipality, plus the net balance of admissions
agrupamento_municipio <-
  agrupamento_municipio %>%
  tidyr::pivot_wider(names_from = tipo_deslocamento, values_from = total_internacoes) %>%
  mutate(liquido = ifelse(is.na(entrada), 0, entrada) +
           ifelse(is.na(local), 0, local) -
           ifelse(is.na(saida), 0, saida))

# Replace missing counts with zero and compute outflow and inflow percentages
agrupamento_municipio <-
  agrupamento_municipio %>%
  mutate(local = ifelse(is.na(local), 0, local),
         saida = ifelse(is.na(saida), 0, saida),
         entrada = ifelse(is.na(entrada), 0, entrada),
         perc_saida = saida / (saida + local) * 100,
         perc_entrada = entrada / (entrada + local) * 100,
         perc_entrada = ifelse(is.nan(perc_entrada), 0, perc_entrada))

library(stringr)
library(ggplot2)
library(ggrepel)     # geom_text_repel()
library(colorspace)  # scale_fill_continuous_sequential()

# Join the municipality seats with the flow percentages and the REGIC levels,
# then draw one map panel per hierarchy level
municipios_seat %>%
  mutate(code_muni = str_sub(as.character(code_muni), 1, 6)) %>%
  inner_join(agrupamento_municipio) %>%
  inner_join(
    REGIC_trabalho %>%
      mutate(code_muni = str_sub(as.character(cod_cidade), 1, 6))
  ) %>%
  ggplot() +
  geom_sf(data = estados_mapa, fill = NA, color = "#808080") +
  geom_sf(aes(fill = perc_saida), pch = 21, color = "#444444", size = 2.9) +
  geom_text_repel(data = mun_sel_nivel_1A, aes(x = X, y = Y, label = name_muni), fontface = "bold", color = "white") +
  geom_text_repel(data = mun_sel_nivel_1B, aes(x = X, y = Y, label = name_muni), fontface = "bold", color = "white") +
  geom_text_repel(data = mun_sel_nivel_1C, aes(x = X, y = Y, label = name_muni), fontface = "bold", color = "white") +
  geom_text_repel(data = mun_sel_nivel_2A, aes(x = X, y = Y, label = name_muni), fontface = "bold", color = "white", force = 2) +
  scale_fill_continuous_sequential(palette = "Heat 2") +
  labs(
    fill = str_wrap("% de pacientes internados em outros municípios", 15)
  ) +
  theme_light() +
  theme(
    text = element_text(size = 20),
    panel.background = element_rect(fill = "black"),
    panel.grid = element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    strip.background = element_rect(fill = "#505050"),
    strip.text = element_text(color = "white"),
    axis.text = element_blank(),
    legend.key = element_rect(fill = "#15202B")
  ) +
  facet_wrap(nome_nivel_hierarquia_ordenado ~ .)

Each point on the map is a municipality, and the colors indicate the percentage of patients from the city who seek care in another municipality. We used the “facet” data visualization feature to show where each city fits within the REGIC model. In this model, the most influential cities with the highest management capabilities are the metropolises (groups shown in the first row of the figure), and the lowest level of the hierarchy, the local centers, appears in the last panel.
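As a minimal sketch of the quantity behind the color scale, the snippet below computes the outflow percentage on a made-up table of admissions. The column names `munic_res`, `codufmun`, `deslocamento`, and `perc_saida` follow the code above; the toy values and the names `toy_admissions` and `outflow` are assumptions used only for illustration.

```r
library(dplyr)

# Hypothetical toy data: one row per hospital admission.
# munic_res = municipality of residence, codufmun = municipality of care.
toy_admissions <- tibble(
  munic_res = c("A", "A", "A", "A", "B", "B"),
  codufmun  = c("A", "B", "B", "B", "B", "B")
) %>%
  # deslocamento = 1 when the patient was admitted outside the home city
  mutate(deslocamento = as.integer(munic_res != codufmun))

# Share of each city's residents who were admitted somewhere else
outflow <- toy_admissions %>%
  group_by(munic_res) %>%
  summarise(
    saida = sum(deslocamento == 1),
    local = sum(deslocamento == 0),
    .groups = "drop"
  ) %>%
  mutate(perc_saida = saida / (saida + local) * 100)

print(outflow)
```

In this toy table, three of the four admissions of residents of city A happen elsewhere, so A would be drawn in a strong red shade, while city B, which treats all of its residents locally, would sit at the yellow end of the scale.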

As seen in the map, the local centers concentrate the points with stronger red shades that show the municipalities with a high percentage of patients who need to travel. On the other hand, the map presents all sub-hierarchical levels of metropolises with yellow colors that represent low travel needs.

The picture reinforces what we expected of the extreme hierarchical levels concerning autonomy given by management capabilities. With few services and low management, there is a strong dependence on hospital structures in other municipalities.

It is also essential to show how patients flow between the levels of the REGIC model. For this purpose, we use alluvial diagrams.

The flow between REGIC levels. Image by the author

The alluvial diagram above was built using the code below. Some data preparation was necessary before calling the two functions that build the flow graphic.

library(easyalluvial)  # alluvial_wide()

# Destination levels for patients leaving local centers, with their volumes
ordem_y <-
  dataset_analise %>%
  filter(deslocamento == 1,
         nome_nivel_hierarquia.x == "Centro Local",
         !(is.na(nome_nivel_hierarquia.y))) %>%
  group_by(nome_nivel_hierarquia.y) %>%
  summarise(quantidade = n()) %>%
  ungroup() %>%
  inner_join(
    de_para_hierarquia %>%
      rename(nome_nivel_hierarquia.y = nome_nivel_hierarquia,
             entrada_abreviado = nome_abreviado)
  ) %>%
  arrange(quantidade) %>%
  mutate(entrada = entrada_abreviado)

# One row per displaced patient: origin level (saída) and destination level (entrada)
aluvial <-
  dataset_analise %>%
  filter(deslocamento == 1,
         nome_nivel_hierarquia.x == "Centro Local",
         !(is.na(nome_nivel_hierarquia.y))) %>%
  mutate(saída = nome_nivel_hierarquia.x,
         entrada = nome_nivel_hierarquia.y) %>%
  select(saída, entrada)

# Swap the full level names for their abbreviated labels
aluvial <-
  aluvial %>%
  inner_join(
    de_para_hierarquia %>%
      rename(saída = nome_nivel_hierarquia,
             saida_abreviado = nome_abreviado)
  ) %>%
  inner_join(
    de_para_hierarquia %>%
      rename(entrada = nome_nivel_hierarquia,
             entrada_abreviado = nome_abreviado)
  ) %>%
  select(saida_abreviado, entrada_abreviado) %>%
  rename(saída = saida_abreviado,
         entrada = entrada_abreviado)

# Order the destination levels by number of patients received
aluvial$entrada <- factor(aluvial$entrada, levels = unique(ordem_y$entrada[order(ordem_y$quantidade)]))

# Static alluvial plot, then its interactive version
p <-
  alluvial_wide(data = aluvial,
                max_variables = 2,
                fill_by = 'first_variable')

parcats::parcats(p, data_input = aluvial, marginal_histograms = FALSE, labelfont = list(size = 15, color = "black"), sortpaths = "backwards")

The flow represented in the figure above shows that patients from municipalities categorized as local centers go mainly, in this order, to sub-regional centers B (17.6%), regional capitals C (16.6%), and sub-regional centers A (15.7%); the metropolises appear only in fourth position (13.8%). This indicates that patients, in a certain way, climb the REGIC hierarchy when they require hospital care. The metropolis is attractive to local centers, but the proximity and management capacity of other urban levels moderate that attraction.
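The shares quoted above are simple proportions of displaced patients by destination level. A minimal sketch of that calculation, on invented data, could look like the following. The `entrada` and `quantidade` column names mirror the alluvial code; the level labels, the counts, and the names `toy_flows` and `shares` are hypothetical and do not reproduce the article's figures.

```r
library(dplyr)

# Hypothetical toy flows: destination REGIC level of each displaced patient
# who lives in a local center. Values are invented for illustration.
toy_flows <- tibble(
  entrada = c("Sub-regional B", "Sub-regional B", "Regional capital C",
              "Metropolis", "Sub-regional B", "Regional capital C",
              "Sub-regional A", "Metropolis", "Sub-regional B",
              "Sub-regional A")
)

# Count patients per destination level and turn the counts into percentages
shares <- toy_flows %>%
  count(entrada, name = "quantidade") %>%
  mutate(perc = quantidade / sum(quantidade) * 100) %>%
  arrange(desc(perc))

print(shares)
```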

We identified that some cities are significant in receiving patients from other cities. So we made two maps that bring the impact of travel to two cities that stand out in receiving patients, showing both the distances traveled and the number of patients received. For this purpose, see below the map focusing on Recife, the Brazilian city that receives most patients from other locations.

Patients traveling to Recife. Image by the author.

The code below is a bit long. There is a lot of data transformation before the instructions that create two objects with the two graphs shown above. Here we use the {patchwork} package to place the charts side by side.

municipio_selecionado<-"261160"
muni_sel<-
dataset_analise %>%
filter(deslocamento ==1,
codufmun== municipio_selecionado) %>%
group_by(codufmun,nome_nivel_hierarquia_ordenado.y, uf.y) %>%
summarise(quantidade = n()) %>%
rename(code_muni= codufmun,
hierarquia = nome_nivel_hierarquia_ordenado.y,
uf = uf.y) %>%
mutate(tipo_deslocamento = "destino",
distancia = 0) %>%
bind_rows(
dataset_analise %>%
filter(deslocamento ==1,
codufmun== municipio_selecionado) %>%
group_by(munic_res,nome_nivel_hierarquia_ordenado.x, uf.x) %>%
summarise(
quantidade = n(),
distancia =min(distancia)
) %>%
ungroup() %>%
rename(code_muni= munic_res,
hierarquia = nome_nivel_hierarquia_ordenado.x,
uf=uf.x)%>%
mutate(tipo_deslocamento = "origem")
)

muni_sel_posicao<-
dataset_analise %>%
dplyr::filter(deslocamento ==1,
codufmun== municipio_selecionado)%>%
distinct(codufmun, mun_res_lat.x, mun_res_lat.y, mun_res_lon.x, mun_res_lon.y,distancia)

muni_sel_posicao<-
municipios_seat %>%
mutate(code_muni = str_sub(as.character(code_muni),1,6)) %>%
inner_join(
muni_sel_posicao %>%
rename(code_muni= codufmun)
)

muni_sel_repel<-
municipios_seat %>%
mutate(code_muni = str_sub(as.character(code_muni),1,6)) %>%
filter(code_muni %in% c("260960", "260790","120020")) %>% #261160-Recife,260790 -Jaboatão, 260960 - Olinda, 260410 - Caruaru, 260545 - Fernando de Noronha, 120020 - Cruzeiro do Sul-AC
inner_join(muni_sel)

xmin<- min(min(muni_sel_posicao$mun_res_lon.x), min(muni_sel_posicao$mun_res_lon.y)) -1
xmax <- max(max(muni_sel_posicao$mun_res_lon.x), max(muni_sel_posicao$mun_res_lon.y)) +1

ymin<- min(min(muni_sel_posicao$mun_res_lat.x), min(muni_sel_posicao$mun_res_lat.y)) -1
ymax <- max(max(muni_sel_posicao$mun_res_lat.x), max(muni_sel_posicao$mun_res_lat.y)) +1

g1<-
municipios_seat %>%
mutate(code_muni = str_sub(as.character(code_muni),1,6)) %>%
inner_join(
muni_sel
) %>%
ggplot()+
geom_sf(data = estados_mapa, fill=NA, color="#505050")+
geom_curve(data=muni_sel_posicao, aes(x=mun_res_lon.x,y=mun_res_lat.x,xend=mun_res_lon.y,yend=mun_res_lat.y, colour= distancia),
curvature = -.25, ncp = 800,size = 1)+
geom_sf(fill="white",size=1.9,pch=21, color="#444444")+
scale_fill_discrete_qualitative(palette="dark2")+
scale_color_continuous_sequential(palette= "Heat 2")+
coord_sf(xlim = c(xmin,xmax), ylim=c(ymin,ymax))+
labs(
fill= "",
color = str_wrap("distância em Km",10)
)+
theme_light() +
theme(
text = element_text(size=18),
panel.background = element_rect(fill = "black"),
panel.grid = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
strip.background = element_rect(fill = "#505050"),
strip.text = element_text(color = "white"),
axis.text = element_blank()
)
muni_sel_foco<-
municipios_seat %>%
mutate(code_muni = str_sub(as.character(code_muni),1,6)) %>%
inner_join(
muni_sel%>%
filter(code_muni==municipio_selecionado)
)

muni_sel<-
muni_sel%>%
filter(code_muni!=municipio_selecionado)

set.seed(1972)
g2<-
municipios_seat %>%
mutate(code_muni = str_sub(as.character(code_muni),1,6)) %>%
inner_join(
muni_sel
) %>%
ggplot()+
geom_sf(data = estados_mapa, fill=NA, color="#505050")+#505050
geom_sf( aes(fill=quantidade),pch=21, color="#444444", size=2, show.legend = TRUE)+
geom_sf( data= muni_sel_foco, aes(size=quantidade),pch=21, color="#444444", fill="white")+
geom_text_repel(data = muni_sel_repel,
aes(x=X, y=Y, label= str_wrap(paste(name_muni,":",quantidade),10)),
color = "white",
limits = c(0,2352),
fontface = "bold",
nudge_x = c(0,2,2.5),
nudge_y = c(0,-3.5,2),
show.legend = TRUE)+
geom_text_repel(data = muni_sel_foco,
aes(x=X, y=Y, label= str_wrap(name_muni,20)),
fontface = "bold",
color="white",
nudge_x = c(3),
nudge_y = c(0))+
scale_fill_continuous_sequential(palette= "Heat", trans= "log2" )+
coord_sf(xlim = c(xmin,xmax), ylim=c(ymin,ymax))+
labs(

fill = str_wrap("Quantidade de saídas",15),
size= str_wrap("Quantidade de entradas",15)
)+
theme_light() +
theme(
text = element_text(size=18),
panel.background = element_rect(fill = "black"),
panel.grid = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
strip.background = element_rect(fill = "#505050"),
strip.text = element_text(color = "white"),
axis.text = element_blank(),
legend.key = element_rect(fill = "#15202B")
)

library(patchwork)
g1|g2

The influence of the metropolis of Recife on hospital care can be seen throughout Brazil. Looking at the haversine distances on the map on the left, one can see that Recife serves patients who live more than 4000 km away. The sample indicates that the capital of Pernambuco received patients from almost all the federative units of the country. On the other hand, the map on the right shows that the most significant influence occurs in the cities of the metropolitan region, particularly Olinda and Jaboatão dos Guararapes. It is also easy to see a spread of points in red shades covering the entire state of Pernambuco. One can also identify the influence over neighboring states, notably Paraíba, Alagoas, and Rio Grande do Norte.
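For readers unfamiliar with the haversine distance, here is a minimal sketch of the formula in base R. It is not taken from the project code: the function name `haversine_km`, the spherical-Earth radius of 6371 km, and the approximate coordinates for Recife and Cruzeiro do Sul (AC) are assumptions used only to illustrate the calculation.

```r
# Great-circle distance in km between two points given in decimal degrees.
# Assumes a spherical Earth with a mean radius of 6371 km.
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding errors pushing sqrt(a) slightly above 1
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Approximate coordinates, used only as an example:
# Recife (-8.05, -34.88) and Cruzeiro do Sul, AC (-7.63, -72.67).
d_recife_czs <- haversine_km(-8.05, -34.88, -7.63, -72.67)
d_recife_czs
```

With these approximate coordinates the result comes out above 4000 km, which is consistent with the longest curves on the left map.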

At the end of our original story, we needed to demonstrate that lower municipal spending on hospital care is associated with a greater need for patient displacement. To test this association, we clustered the 5570 Brazilian municipalities by their percentages of outgoing and incoming patients. Using the silhouette measure to evaluate clusters generated by the PAM (Partitioning Around Medoids) technique, we identified four groups: moderate entry, weak exit, moderate exit, and strong exit. This last group is the most relevant for the analysis. The graphs below provide insights for evaluating group importance.
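The clustering code is not reproduced in this article, so the following is only a sketch of how a number of groups can be chosen with PAM and the average silhouette width, using the {cluster} package. The toy data, the candidate range `2:6`, and all object names are assumptions; only the `perc_saida` and `perc_entrada` column names come from the code shown earlier.

```r
library(cluster)

# Hypothetical toy data: four tight blobs in the (perc_saida, perc_entrada) plane
offsets <- expand.grid(dx = c(-1, 0, 1), dy = c(-1, 0, 1))
centers <- data.frame(cx = c(10, 10, 60, 60), cy = c(10, 60, 10, 60))
toy <- do.call(rbind, lapply(seq_len(nrow(centers)), function(i) {
  data.frame(
    perc_saida   = centers$cx[i] + offsets$dx,
    perc_entrada = centers$cy[i] + offsets$dy
  )
}))

# Average silhouette width for each candidate number of clusters
candidate_k <- 2:6
avg_width <- sapply(candidate_k, function(k) pam(toy, k)$silinfo$avg.width)

# Keep the k with the highest average silhouette width
best_k <- candidate_k[which.max(avg_width)]
best_k
```

On real municipal data the groups are far less clean than in this toy example, so the silhouette curve is usually flatter and worth inspecting before settling on a value.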

Box-plot for expenses on Hospital care. Image by the author.

The code below starts by bringing into memory the data with the clusters of municipalities. Next, using ggplot we construct the two box-plots, placed side by side using the {patchwork} package.

agrupamento_municipio_cluster<-readRDS("agrupamento_municipio_2021.RDS")

g1<-
dataset_analise %>%
filter(deslocamento == 1,
perc.x>0,
perc.x<=50) %>%
distinct(nome_nivel_hierarquia.x,munic_res, perc.x) %>%
inner_join(
agrupamento_municipio_cluster %>%
rename(munic_res=code_muni)
) %>%
ggplot() +
geom_jitter(aes(x=cluster_4_k, y=perc.x, fill=perc_saida), pch=21, color="#444444",size=2)+
geom_boxplot(aes(x=cluster_4_k, y=perc.x),fill=NA, color= "white", outlier.shape = NA)+
scale_fill_continuous_sequential(palette= "Red-Yellow")+
theme_light() +
theme(
text = element_text(size=18),
panel.background = element_rect(fill = "black"),
panel.grid = element_blank(),
axis.title.x = element_blank(),
strip.background = element_rect(fill = "#505050"),
strip.text = element_text(color = "white"),
#axis.text = element_blank(),
axis.text.x = element_text(angle = 45, vjust = 0.5),
legend.key = element_rect(fill = "#15202B")
)+
labs(
fill= "(%) saída",
y = "Gastos Hospitalares e ambulatoriais - (%) do total"
)

g2<-
dataset_analise %>%
filter(deslocamento == 1,
perc.y>0,
perc.y<=50) %>%
distinct(nome_nivel_hierarquia.y,codufmun, perc.y) %>%
inner_join(
agrupamento_municipio_cluster %>%
rename(codufmun=code_muni)
) %>%
ggplot() +
geom_jitter(aes(x=cluster_4_k, y=perc.y, fill=perc_entrada), pch=21, color="#444444",size=2)+
geom_boxplot(aes(x=cluster_4_k, y=perc.y),fill=NA, color= "white", outlier.shape = NA)+
scale_fill_continuous_sequential(palette= "Red-Yellow")+
theme_light() +
theme(
text = element_text(size=18),
panel.background = element_rect(fill = "black"),
panel.grid = element_blank(),
axis.title.x = element_blank(),
strip.background = element_rect(fill = "#505050"),
strip.text = element_text(color = "white"),
axis.text.x = element_text(angle = 45, vjust = 0.5),
legend.key = element_rect(fill = "#15202B")
)+
labs(
fill= "(%) entrada",
y = "Gastos Hospitalares e ambulatoriais - (%) do total"
)
g1|g2

Each colored dot in the graphs is a Brazilian municipality. The colors represent the percentage of patient outflow in the left graph and the percentage of patients received from other cities in the right graph. In both graphs, the horizontal axis shows the groups and the vertical axis shows the percentage of expenses on inpatient and outpatient care.

The graphs support the most important conclusion about the association between expenses and the patient inflow and outflow groups. In the left graph, the strong exit group has a much lower median of proportional hospital expenses than the other groups.

Did you ask for Shiny with interaction?

As we indicated at the beginning of the text, in addition to the main story, we prepared several tabs that let the user apply filters and generate graphics exploring the cities selected in the interaction. See below screenshots of some of these interactions. Note that almost all tabs allow downloading the data associated with the graphics.

The flow of patients — Brazil. Image by the author
Flow of patients to a chosen city. Image by the author
Position of chosen cities in a Box-plot. Image by the author.
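To give an idea of how a filter and a download button fit together in Shiny, here is a minimal sketch. It is a plain Shiny app rather than the Quarto document used in the product, and the toy table, the input and output ids, and the file name pattern are all hypothetical.

```r
library(shiny)

# Hypothetical toy table standing in for the municipality-level dataset
toy_cities <- data.frame(
  name_muni  = c("City A", "City B", "City C"),
  uf         = c("PE", "PE", "PB"),
  perc_saida = c(75, 10, 40)
)

ui <- fluidPage(
  selectInput("uf", "State (UF)", choices = sort(unique(toy_cities$uf))),
  tableOutput("tabela"),
  downloadButton("baixar", "Download data")
)

server <- function(input, output, session) {
  # Rows that match the state chosen by the user
  filtrado <- reactive(toy_cities[toy_cities$uf == input$uf, ])

  output$tabela <- renderTable(filtrado())

  # Serve the filtered rows as a CSV file
  output$baixar <- downloadHandler(
    filename = function() paste0("municipios_", input$uf, ".csv"),
    content  = function(file) write.csv(filtrado(), file, row.names = FALSE)
  )
}

app <- shinyApp(ui, server)
# Run interactively with: runApp(app)
```

Keeping the filtered data in a single reactive expression means the table and the download always reflect the same selection.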

One tab brings the complete table with data that can be used for filters and downloads.

A table with an X-ray of the municipalities. Image by the author.

What about ChatGPT?

The last tab shows data that summarizes the primary information about a municipality selected by the user. See the screenshot below.

Summary of information and generation of prompt to chatGPT. Image by the author

The last table is where one can interact with ChatGPT. The panel dynamically generates a prompt with the data present in the other tables of this last tab. The user can press the Copy button, take the prompt to ChatGPT, and watch the magic happen. See screenshots of one example. (If the reader does not read Portuguese and wants to understand the prompt and the AI’s response, please contact me at my e-mail address: fbarbalho@gmail.com.)

Prompt generated by the application. Image by the author
Text generated by chatGPT as a response to the prompt created by the application. Image by the author
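A dynamic prompt of this kind is essentially a text template filled with the indicators of the selected municipality. The sketch below shows one possible way to do that in base R. The function name `build_prompt`, the field names, the English wording, and the values are all hypothetical; the application's actual prompt is written in Portuguese and is not reproduced here.

```r
# Build a ChatGPT prompt from a named list of municipality indicators.
# All field names and values here are hypothetical placeholders.
build_prompt <- function(info) {
  paste0(
    "Write a short analysis of hospital care in the municipality of ",
    info$name_muni, " (", info$uf, "). ",
    "REGIC level: ", info$hierarquia, ". ",
    "Share of residents admitted in other cities: ",
    format(info$perc_saida, nsmall = 1), "%. ",
    "Share of admissions of patients from other cities: ",
    format(info$perc_entrada, nsmall = 1), "%."
  )
}

toy_info <- list(
  name_muni    = "Example City",
  uf           = "PE",
  hierarquia   = "Centro Local",
  perc_saida   = 62.5,
  perc_entrada = 3.0
)

prompt <- build_prompt(toy_info)
cat(prompt)
```

In a Shiny context, a function like this could be called inside a reactive expression so that the text updates whenever the user picks another municipality.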

Code and data

The complete code is available on GitHub.

All datasets are characterized as public domain since they are data produced by Brazilian federal government institutions, made available on the internet as active transparency, and are subject to the Brazilian Access to Information Law.

The author acknowledges Ben Huberman for the valuable comments.


Doctor in Business Administration from UNB (2014). As a data scientist, he researches and implements products for transparency in the Brazilian public sector.