Scraping One Piece bounties from a Wiki

I like the manga / anime One Piece. In the world of One Piece, notorious pirates and outlaws are given bounties by the sinister World Government. A number of characters went up in value in the most recent chapter (ch. 1058), so I decided to make a little chart showing each character’s most recent bounty. I didn’t feel like manually searching through every chapter for the bounties, so I figured it would be easier to scrape them from a fan wiki, thereby profiting from the work of some nerd who has already done the searching.

# Importing some libraries
library(tidyverse)    # for dplyr, etc
library(rvest)        # for web scraping
library(ggthemes)     # for nice looking charts

# Options 
options(scipen = 999) # I don't like scientific notation
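
If you’re wondering what that scipen option actually changes, here’s a small sketch in base R. The number is just an example value I picked (bounties get into the billions), and the previous setting is restored at the end so the sketch doesn’t leave your session in a different state.

```r
# Temporarily switch to R's default (scipen = 0), remembering the old setting
old_opts <- options(scipen = 0)
default_fmt <- format(3000000000)   # short enough in scientific notation, so R uses it

# A large penalty on scientific notation forces plain digits
options(scipen = 999)
fixed_fmt <- format(3000000000)

# Put things back the way they were
options(old_opts)

print(c(default_fmt, fixed_fmt))
```

The first value comes out in scientific notation and the second as plain digits, which is what we want on chart labels.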

First, we’re going to grab the full html from a fan wiki with the read_html() function from the {rvest} package, then we’ll get the tables out of it with html_nodes('table') |> html_table(). You can read more about how {rvest} works here.

webpage <- read_html("https://onepiece.fandom.com/wiki/Bounties/List") |>
  html_nodes("table") |>
  html_table()

The result is a big list of tables, one for each table on the wiki page. Here’s what that looks like:

What a mess!
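
If you want to see the shape of that list without hitting the wiki, here’s a minimal sketch on a made-up page. The HTML, the character names, and the toy_page / toy_tables objects are all invented for illustration and are not the real wiki markup; minimal_html() is the {rvest} helper for building a small document from a string.

```r
library(rvest)

# A made-up page with two tiny tables (NOT the real wiki markup)
toy_page <- minimal_html('
  <table>
    <tr><th>Name</th><th>Bounty</th></tr>
    <tr><td>Character A</td><td>1,000</td></tr>
  </table>
  <table>
    <tr><th>Name</th><th>Bounty</th></tr>
    <tr><td>Character B</td><td>500</td></tr>
    <tr><td>Character C</td><td>250</td></tr>
  </table>
')

# Same two steps as above: find every <table>, then parse each one
toy_tables <- toy_page |>
  html_nodes("table") |>
  html_table()

length(toy_tables)   # one list element per <table> on the page
toy_tables[[2]]      # each element is its own little data frame
```

So the "mess" is really just a plain list with one data frame per table, which is why a loop over it works.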

We’re going to loop through this list, extracting all the information from each table into a nice neat dataframe. We’ll start with an empty tibble, then loop through each table in webpage, grabbing and cleaning all the data as we go.

ds_ <- tibble()
for (i in seq_along(webpage)) {

  tbl <- webpage[[i]] |>
    # Clean up the column / variable names
    janitor::clean_names() |>
    mutate(
      # We want the notes in a separate column,
      # but the tables have them as separate rows.
      # So we'll move this over into a new variable
      # on every other row, then delete the leftovers
      row_number = row_number(),
      notes = if_else(row_number %% 2 == 0, name, NA_character_)
    ) |>
    fill(notes, .direction = "up") |>
    filter(row_number %% 2 != 0) |>
    # Now to clean the data of "[3]" wiki citations, random slashes, etc.
    mutate(
      nickname = gsub("[[:punct:]]", "", nickname),
      nickname = trimws(gsub("[0-9]", "", nickname)),
      bounty = gsub("\\[[0-9]*\\]", "", bounty),
      bounty = parse_number(bounty)
    ) |>
    select(-row_number)

  # Eventually we get to the non-canon bounties, which are in a different format.
  # So we'll just stop when that happens.
  if (ncol(tbl) == 4) {
    # Append this table's rows to the running result
    ds_ <- bind_rows(ds_, tbl)
  } else {
    break
  }

}

This throws a bunch of warnings wherever the parse_number() call doesn’t work (the notes rows), but who cares! We have our nice dataframe now. Here’s what it looks like:

Nice and tidy!
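
The every-other-row trick from the loop is easier to see on a toy table. This is a minimal sketch with invented data: the names, notes, and bounties below are made up, and I’ve called the helper column row_id here instead of row_number just to keep it visually distinct from the function.

```r
library(dplyr)
library(tidyr)
library(readr)

# Invented data: each character row is followed by a notes row,
# with the note repeated across the columns
toy <- tibble(
  name   = c("Character A", "Note about A", "Character B", "Note about B"),
  bounty = c("1,000[1]",    "Note about A", "500[2]",      "Note about B")
)

toy_clean <- toy |>
  mutate(
    row_id = row_number(),
    # Copy the note text into its own column, on the even rows only
    notes = if_else(row_id %% 2 == 0, name, NA_character_)
  ) |>
  # Pull each note up onto the character row above it
  fill(notes, .direction = "up") |>
  # Drop the leftover notes rows
  filter(row_id %% 2 != 0) |>
  mutate(
    bounty = gsub("\\[[0-9]*\\]", "", bounty),  # strip "[1]"-style citations
    bounty = parse_number(bounty)               # "1,000" becomes 1000
  ) |>
  select(-row_id)

print(toy_clean)
```

Each character ends up on a single row with a numeric bounty and the note sitting next to it.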

Next we’ll make a nice graph and call it a day. I wanted a few more details in the data (the crew and last_update columns used below), so I went ahead and added them manually in LibreOffice Calc and saved the result as a csv. I’m going to throw it into a quick {ggplot2} chart.

# The scraped data w/ manually added details
ds <- read_csv("one-piece-bounties-fixed.csv")

crew_list <- c("Straw Hat Pirates", "Grand Fleet", "Cross Guild", "Beast Pirates",
               "Revolutionary Army", "Big Mom Pirates", "Blackbeard Pirates",
               "Donquixote Pirates", "Whitebeard Pirates")

ds |>
  filter(!is.na(bounty)) |>
  mutate(crew = if_else(crew %in% crew_list, crew, "Other")) |>
  # The ggplot itself
  ggplot(aes(x = reorder(name, bounty), y = bounty, fill = crew)) +
  geom_col(color = "black") +
  geom_text(aes(label = paste0(" ", format(bounty, big.mark = ","), " (", last_update, ")")), 
    hjust = "left", size = 3) +
  guides(fill = guide_legend(nrow = 9)) +
  scale_y_continuous(labels = scales::comma,
                     limits = c(0, 7500000000)) +
  scale_fill_manual(values = c("#114c8f", "#f1bec4", "black", "#73d7e7",
                               "#aa5e84", "grey", "red", "yellow", "white")) +
  coord_flip() +
  theme_solarized() +
  theme(
    legend.position = c(0.8, 0.5),
    plot.title.position = "plot"
  ) +
  labs(
    title = "One Piece World Government Bounties",
    subtitle = "confirmed canon bounties only, as of chapter 1058",
    x = "",
    y = "Bounty (berry)",
    fill = "Crew / Affiliation"
  )

Nice! The main takeaway is that Chopper is disrespected and deserves better. #RespectChopper