I like the manga / anime One Piece. In the world of One Piece, notorious pirates and outlaws are given bounties by the sinister World Government. A number of characters went up in value in the most recent chapter (ch. 1058), so I decided to make a little chart showing each character’s most recent bounty. I didn’t feel like manually searching through every chapter for the bounties, so I figured it would be easier to scrape them from a fan wiki, thereby profiting from the work of some nerd who has already done so.
# Importing some libraries
library(tidyverse) # for dplyr, etc
library(rvest) # for web scraping
library(ggthemes) # for nice looking charts
# Options
options(scipen = 999) # I don't like scientific notation
First, we’re going to use the read_html() function from the {rvest} package to grab the html from a fan wiki. We’ll grab the full html with read_html(), then get the tables with html_nodes('table') |> html_table(). You can read more about how {rvest} works here.
webpage <- read_html("https://onepiece.fandom.com/wiki/Bounties/List") |>
  html_nodes("table") |>
  html_table()
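If you want to see what that pipeline returns without hitting the wiki, here is a minimal sketch on a made-up page. The HTML snippet, the names toy_page and toy_tables, and the "Pirate A" rows are all invented for illustration; minimal_html() is an {rvest} helper for building a tiny document from a string.

```r
library(rvest)

# A made-up two-table page standing in for the wiki (not real bounty data)
toy_page <- minimal_html('
  <table>
    <tr><th>Name</th><th>Bounty</th></tr>
    <tr><td>Pirate A</td><td>100[1]</td></tr>
  </table>
  <table>
    <tr><th>Name</th><th>Bounty</th></tr>
    <tr><td>Pirate B</td><td>200[2]</td></tr>
    <tr><td>Pirate C</td><td>300</td></tr>
  </table>
')

# Same two steps as above: find every <table>, then convert each to a data frame
toy_tables <- toy_page |>
  html_nodes("table") |>
  html_table()

length(toy_tables)  # one list element per <table> on the page
toy_tables[[2]]     # each element is its own data frame
```

The real wiki page works the same way, just with a lot more tables in the list.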
That leaves webpage as a big list of tables, one for each table on the wiki page. Here’s what that looks like:
We’re going to loop through this list, extracting all the information from each table into a nice neat dataframe. We’ll start with an empty tibble, then loop through each table in webpage, grabbing and cleaning all the data as we go.
ds_ <- tibble()
# seq_along() is safe even if the list is empty (1:length(x) would give 1, 0)
for (i in seq_along(webpage)) {
  tbl <- webpage[[i]] |>
    # Clean up the column / variable names
    janitor::clean_names() |>
    mutate(
      # We want the notes in a separate column,
      # but the tables have them as separate rows.
      # So we'll move this over into a new variable
      # on every other row, then delete the leftovers
      row_number = row_number(),
      notes = if_else(row_number %% 2 == 0, name, NA_character_)
    ) |>
    fill(notes, .direction = "up") |>
    filter(row_number %% 2 != 0) |>
    # Now to clean the data of "[3]" wiki citations, random slashes, etc.
    mutate(
      nickname = gsub("[[:punct:]]", "", nickname),
      nickname = trimws(gsub("[0-9]", "", nickname)),
      bounty = gsub("\\[[0-9]*\\]", "", bounty),
      bounty = parse_number(bounty)
    ) |>
    select(-row_number)
  # Eventually we get to the non-canon bounties, which are in a different format.
  # So we'll just stop when that happens.
  if (ncol(tbl) == 4) {
    ds_ <- bind_rows(ds_, tbl)
  } else {
    break
  }
}
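The every-other-row trick is easier to follow on a tiny example. Below is a minimal sketch using four made-up rows; the names toy and toy_clean and all the values are invented, and the layout (each character row followed by a notes row whose text repeats in every column) is my assumption about what the scraped tables look like, based on the code above.

```r
library(dplyr)
library(tidyr)
library(readr)

# Made-up rows mimicking the assumed wiki layout (not real bounty data)
toy <- tibble(
  name     = c("Pirate A", "Note about A", "Pirate B", "Note about B"),
  nickname = c("\"Big Example\"[2]", "Note about A", "\"Other Example\"", "Note about B"),
  bounty   = c("1,500,000[1]", "Note about A", "3,000,000,000", "Note about B")
)

toy_clean <- toy |>
  mutate(
    row_number = row_number(),
    # Even rows are notes: copy their text into a new column
    notes = if_else(row_number %% 2 == 0, name, NA_character_)
  ) |>
  # Pull each note up onto the character row just above it
  fill(notes, .direction = "up") |>
  # Drop the now-redundant notes rows
  filter(row_number %% 2 != 0) |>
  mutate(
    # Strip quotes and brackets, then the leftover citation digits
    nickname = gsub("[[:punct:]]", "", nickname),
    nickname = trimws(gsub("[0-9]", "", nickname)),
    # Remove "[1]"-style citations before parsing the number
    bounty = gsub("\\[[0-9]*\\]", "", bounty),
    bounty = parse_number(bounty)
  ) |>
  select(-row_number)

toy_clean
```

The four input rows collapse to two, each with name, nickname, bounty and notes, which is the four-column shape the loop checks for before binding.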
The loop gives a bunch of warnings for when the parse_number() call doesn’t work (the notes rows), but who cares! We have our nice dataframe now. Here’s what it looks like:
Next we’ll make a nice graph and call it a day. I wanted a few more details in the data, so I went ahead and added them manually in LibreOffice Calc. I’m going to throw the result into a quick {ggplot2} chart.
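The chart code relies on two small tricks: lumping the minor crews into "Other" with if_else(), and ordering the bars by bounty with reorder(). Here is a minimal sketch of both on made-up rows; crew_list_toy, toy and the values in them are invented for illustration.

```r
library(dplyr)

# A shortened, hypothetical crew list for the example
crew_list_toy <- c("Straw Hat Pirates", "Cross Guild")

toy <- tibble(
  name   = c("Pirate A", "Pirate B", "Pirate C"),
  crew   = c("Straw Hat Pirates", "Some Small Crew", NA),
  bounty = c(300, 100, 200)
) |>
  # Anything not in the list (including a missing crew) becomes "Other"
  mutate(crew = if_else(crew %in% crew_list_toy, crew, "Other"))

toy$crew
# "Straw Hat Pirates" "Other" "Other"

# reorder() turns name into a factor whose levels are sorted by bounty
levels(reorder(toy$name, toy$bounty))
# "Pirate B" "Pirate C" "Pirate A"
```

Since NA %in% crew_list_toy is FALSE, characters with no crew recorded land in "Other" too. With coord_flip(), the first factor level is drawn at the bottom, so the biggest bounty ends up at the top of the chart.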
# The scraped data w/ manually added details
ds <- read_csv("one-piece-bounties-fixed.csv")
crew_list <- c("Straw Hat Pirates", "Grand Fleet", "Cross Guild", "Beast Pirates",
               "Revolutionary Army", "Big Mom Pirates", "Blackbeard Pirates",
               "Donquixote Pirates", "Whitebeard Pirates")
ds |>
  filter(!is.na(bounty)) |>
  mutate(crew = if_else(crew %in% crew_list, crew, "Other")) |>
  # The ggplot itself
  ggplot(aes(x = reorder(name, bounty), y = bounty, fill = crew)) +
  geom_col(color = "black") +
  geom_text(aes(label = paste0(" ", format(bounty, big.mark = ","), " (", last_update, ")")),
            hjust = "left", size = 3) +
  guides(fill = guide_legend(nrow = 9)) +
  scale_y_continuous(labels = scales::comma,
                     limits = c(0, 7500000000)) +
  scale_fill_manual(values = c("#114c8f", "#f1bec4", "black", "#73d7e7",
                               "#aa5e84", "grey", "red", "yellow", "white")) +
  coord_flip() +
  theme_solarized() +
  theme(
    legend.position = c(0.8, 0.5),
    plot.title.position = "plot"
  ) +
  labs(
    title = "One Piece World Government Bounties",
    subtitle = "confirmed canon bounties only, as of chapter 1058",
    x = "",
    y = "Bounty (berry)",
    fill = "Crew / Affiliation"
  )
Nice! The main takeaway is that Chopper is disrespected and deserves better. #RespectChopper