html_table {rvest} | R Documentation
Parse an html table into a data frame.
html_table(x, header = NA, trim = TRUE, fill = FALSE, dec = ".")
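x
A node, node set or document.
header
Use first row as header? If NA, will use first row if it consists of <th> tags.
trim
Remove leading and trailing whitespace within each cell?
fill
If TRUE, automatically fill rows with fewer than the maximum number of columns with NAs.
dec
The character used as decimal mark.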
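As a rough illustration of how header and dec interact, the sketch below parses a small inline table. The table contents and the variable names (page, tbl, with_header, no_header) are made up for this example, and the HTML is passed as a literal string so no network access is needed.

```r
library(rvest)

# A hypothetical two-column table; the first row uses <th> cells and the
# numbers use a comma as the decimal mark.
page <- read_html(paste0(
  "<table>",
  "<tr><th>name</th><th>score</th></tr>",
  "<tr><td>  a  </td><td>1,5</td></tr>",
  "<tr><td>b</td><td>2,25</td></tr>",
  "</table>"
))

tbl <- html_node(page, "table")

# header = NA (the default): the first row consists of <th> cells,
# so it is used for the column names. dec = "," lets the score
# column be converted to numbers.
with_header <- html_table(tbl, dec = ",")

# header = FALSE: the first row is kept as an ordinary data row.
no_header <- html_table(tbl, header = FALSE)
```

Here with_header should have the columns name and score, while no_header keeps all three rows as data.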
html_table currently makes a few assumptions:
No cells span multiple rows
Headers are in the first row
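Since cells that span multiple rows fall outside these assumptions, it may be worth checking for rowspan attributes before parsing. The following is a minimal sketch of one way to do that, not part of html_table itself; the inline table and the names page, tbl, spanning and has_rowspan are hypothetical.

```r
library(rvest)

# A hypothetical table in which one cell spans two rows.
page <- read_html(paste0(
  "<table>",
  "<tr><th>group</th><th>value</th></tr>",
  "<tr><td rowspan='2'>x</td><td>1</td></tr>",
  "<tr><td>2</td></tr>",
  "</table>"
))

tbl <- html_node(page, "table")

# Select every data or header cell that carries a rowspan attribute.
spanning <- html_nodes(tbl, "td[rowspan], th[rowspan]")
has_rowspan <- length(spanning) > 0

if (has_rowspan) {
  message("Table has cells spanning multiple rows; check the parsed result.")
}
```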
library(rvest)

tdist <- read_html("http://en.wikipedia.org/wiki/Student%27s_t-distribution")
tdist %>%
  html_node("table.infobox") %>%
  html_table(header = FALSE)

births <- read_html("https://www.ssa.gov/oact/babynames/numberUSbirths.html")
html_table(html_nodes(births, "table")[[2]])

# If the table is badly formed, and has a different number of columns in
# each row, use fill = TRUE. Here it's due to incorrect colspan
# specification.
skiing <- read_html("http://data.fis-ski.com/dynamic/results.html?sector=CC&raceid=22395")
skiing %>% html_table(fill = TRUE)