---
title: "Ingesting text"
format: html
---
This example is adapted from Section 19.3 of *Modern Data Science with R* (2nd edition).

# Using *rvest*

Take a look at the Wikipedia [List of songs recorded by the Beatles](http://en.wikipedia.org/wiki/List_of_songs_recorded_by_the_Beatles).

The book uses the second table, "Other songs"; here I use the "Main songs" table instead.

A great reference for regular expressions (used by functions like `gsub()`) is the [R for Data Science](https://r4ds.had.co.nz) book; see Chapter 14 on strings.

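
As a small illustration of the kind of cleanup `gsub()` is used for, the sketch below strips quotation marks and a footnote marker from two song titles. The input vector and the `[a]` marker are made up for illustration and are not taken from the scraped table.

```{r}
# Illustrative input: titles wrapped in quotes, one with a footnote marker
titles <- c('"Help!"', '"Yesterday"[a]')

# Remove the double quotes
no_quotes <- gsub('"', "", titles)

# Remove bracketed markers such as [a] or [12]
clean <- gsub("\\[[a-z0-9]+\\]", "", no_quotes)
clean
```
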
```{r}
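# Load packages; tidyverse already attaches tidyr
library(tidyverse)
library(rvest)
library(methods)
library(mdsr)
library(tm)

# Read the page and collect every <table> element
url <- "http://en.wikipedia.org/wiki/List_of_songs_recorded_by_the_Beatles"
tables <- url %>%
  read_html() %>%
  html_nodes(css = "table")
tables

# Main songs table; the index depends on the current layout of the page
songs <- html_table(tables[[4]])
glimpse(songs)
songs

# Other songs table, the one used in the book
other <- html_table(tables[[5]])
glimpse(other)
other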
```
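
The hard-coded positions `tables[[4]]` and `tables[[5]]` will point at the wrong tables if Wikipedia editors add or remove a table. One possible alternative, sketched below, is to parse every table and keep the ones that contain an expected column. The inline page and the column name `Song` are assumptions made so the example runs offline; check `names()` on the real tables before relying on it.

```{r}
library(rvest)
library(purrr)

# Tiny stand-in for the Wikipedia page (made-up content, not the real article)
page <- minimal_html('
  <table>
    <tr><th>Key</th><th>Meaning</th></tr>
    <tr><td>x</td><td>cover</td></tr>
  </table>
  <table>
    <tr><th>Song</th><th>Year</th></tr>
    <tr><td>Help!</td><td>1965</td></tr>
  </table>
')

# Parse every table on the page into a list of data frames
all_tables <- page %>%
  html_elements("table") %>%
  html_table()

# Keep only the tables that have a column named "Song" (assumed name)
song_tables <- keep(all_tables, ~ "Song" %in% names(.x))
song_tables[[1]]
```

Selecting by column name takes a few more lines than indexing by position, but it fails loudly (an empty list) rather than silently returning an unrelated table.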