---
title: "Ingesting text"
format: html
---

This is from Section 19.3 of the *Modern Data Science with R* (2nd edition) book.

# Using *rvest*

Take a look at the Wikipedia [List of songs recorded by the Beatles](http://en.wikipedia.org/wiki/List_of_songs_recorded_by_the_Beatles). The book uses the second list, "Other songs"; I have used the "Main songs" list instead.

A great reference for regular expressions (used by functions like `gsub()`) is the [r4ds](https://r4ds.had.co.nz) book; see Chapter 14 on strings.

```{r}
library(tidyverse)
library(rvest)
library(tidyr)
library(methods)
library(mdsr)
library(tm)

url <- "http://en.wikipedia.org/wiki/List_of_songs_recorded_by_the_Beatles"

# Download the page and collect every <table> node on it
tables <- url %>%
  read_html() %>%
  html_nodes(css = "table")
tables

# Table positions depend on the current page layout;
# inspect `tables` to confirm which index holds which list.
songs <- html_table(tables[[4]])
glimpse(songs)
songs

other <- html_table(tables[[5]])
glimpse(other)
other
```
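
The scraping step above, and the kind of `gsub()` cleanup that the regex reference covers, can be tried offline on a tiny page. The chunk below is a minimal sketch and not the book's code: the HTML snippet, the object name `toy_songs`, and the cleanup pattern are all assumptions made for illustration, and the live Wikipedia tables may need different patterns. It assumes rvest 1.0 or later, which provides `minimal_html()` and `html_elements()`.

```{r}
library(rvest)

# Hypothetical miniature page standing in for the Wikipedia article
page <- minimal_html('
  <table>
    <tr><th>Song</th><th>Year</th></tr>
    <tr><td>"Help!"</td><td>1965</td></tr>
    <tr><td>"Yesterday" [a]</td><td>1965</td></tr>
  </table>
')

# Same pattern as above: select the table nodes, then parse one by position
toy_tables <- html_elements(page, css = "table")
toy_songs <- html_table(toy_tables[[1]])

# Remove the surrounding double quotes and any footnote marker such as " [a]"
toy_songs$Song <- gsub('"|\\s*\\[[a-z]+\\]', "", toy_songs$Song)

toy_songs
```

Working on a small inline page first makes it easier to check a regular expression before applying it to the full scraped table, where footnote markers and quoting may vary from row to row.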