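I have seen many scraping approaches online, but they always feel cumbersome and fail to show off how concise R can be. Here is a streamlined version.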
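library(rvest)  # read_html(), html_nodes(), html_text(); also provides %>%
library(rjson)  # fromJSON()

url <- "http://data2.7m.cn/history_Matches_Data/2009-2010/92/en/fixture.js"
# html() is deprecated in current rvest; read_html() replaces it
dat <- url %>% read_html() %>% html_nodes('p') %>% html_text()
# One element per line of the JavaScript file
data <- strsplit(dat, "\r\n")
# Lines holding the round, date, home team, score, and away team arrays
nb <- c(6, 7, 11, 8, 12)
# From each line, keep the text from the first "[" to the first "]" and parse it as JSON
d <- lapply(nb, function(i) data[[1]][i] %>%
  substr(regexpr("\\[", .)[1], regexpr("\\]", .)[1]) %>% fromJSON)
Total <- do.call(cbind, d) %>% as.data.frame
# Column names: round, date, home team, score, away team
names(Total) <- c('轮次','日期','主队','得分','客队')
# Split into rounds of 10 matches each
a <- seq(1, nrow(Total), 10)
England <- lapply(a, function(i) Total[i:(i+9), ])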
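The extraction step inside lapply is the least obvious part of the script, so here is a small offline sketch of it. The sample line and the name sample_arr are made up for illustration; the real fixture.js may use different variable names and contents.

```r
library(rjson)

# Hypothetical sample line, invented for this sketch; not taken from fixture.js.
line <- "var sample_arr = [1,1,2,2];"

# Positions of the first "[" and the first "]" in the line.
start <- regexpr("\\[", line)[1]
end   <- regexpr("\\]", line)[1]

# Keep only the bracketed part, then parse it as a JSON array.
json_text <- substr(line, start, end)
values <- fromJSON(json_text)

print(json_text)
print(values)
```

Because regexpr returns the first match only, this relies on each line holding one flat array with no nested brackets. The console output that follows comes from the full script, not from this sample.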
> England[[38]]
轮次 日期 主队 得分 客队
371 38 2010,05,09,23,00,00 Everton 1-0(0-0) Portsmouth
372 38 2010,05,09,23,00,00 West Ham United 1-1(1-1) Manchester City
373 38 2010,05,09,23,00,00 Bolton Wanderers 2-1(1-0) Birmingham
374 38 2010,05,09,23,00,00 Manchester United 4-0(2-0) Stoke City
375 38 2010,05,09,23,00,00 Chelsea FC 8-0(2-0) Wigan Athletic
376 38 2010,05,09,23,00,00 Aston Villa 0-1(0-0) Blackburn Rovers
377 38 2010,05,09,23,00,00 Wolves 2-1(1-1) Sunderland
378 38 2010,05,09,23,00,00 Burnley 4-2(1-2) Tottenham Hotspur
379 38 2010,05,09,23,00,00 Arsenal 4-0(3-0) Fulham
380 38 2010,05,09,23,00,00 Hull City 0-0(0-0) Liverpool
> England[[1]]
轮次 日期 主队 得分 客队
1 1 2009,08,15,19,45,00 Chelsea FC 2-1(1-1) Hull City
2 1 2009,08,15,22,00,00 Blackburn Rovers 0-2(0-1) Manchester City
3 1 2009,08,15,22,00,00 Wolves 0-2(0-1) West Ham United
4 1 2009,08,15,22,00,00 Bolton Wanderers 0-1(0-1) Sunderland
5 1 2009,08,15,22,00,00 Stoke City 2-0(2-0) Burnley
6 1 2009,08,15,22,00,00 Portsmouth 0-1(0-1) Fulham
7 1 2009,08,15,22,00,00 Aston Villa 0-2(0-1) Wigan Athletic
8 1 2009,08,16,00,30,00 Everton 1-6(0-3) Arsenal
9 1 2009,08,16,20,30,00 Manchester United 1-0(1-0) Birmingham
10 1 2009,08,16,23,00,00 Tottenham Hotspur 2-1(1-0) Liverpool
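The date and score columns above are still plain strings. As a possible next step, the sketch below parses two rows copied from the output using base R only. The English column names, the UTC time zone, and the reading of the bracketed score as the half-time result are my assumptions, not something the data source states.

```r
# Two rows copied from the output above (round 1 and round 38, Chelsea FC at home).
matches <- data.frame(
  date  = c("2009,08,15,19,45,00", "2010,05,09,23,00,00"),
  score = c("2-1(1-1)", "8-0(2-0)"),
  stringsAsFactors = FALSE
)

# Parse "YYYY,MM,DD,hh,mm,ss". The source time zone is unknown; UTC is an assumption.
matches$kickoff <- as.POSIXct(matches$date,
                              format = "%Y,%m,%d,%H,%M,%S", tz = "UTC")

# Split "H-A(h-a)" into four integers: full-time goals, then the bracketed
# pair (assumed here to be the half-time score).
pattern <- "^([0-9]+)-([0-9]+)\\(([0-9]+)-([0-9]+)\\)$"
parts <- regmatches(matches$score, regexec(pattern, matches$score))
goals <- do.call(rbind, lapply(parts, function(p) as.integer(p[-1])))
colnames(goals) <- c("home_ft", "away_ft", "home_ht", "away_ht")

matches <- cbind(matches, goals)
print(matches)
```

The same two transformations could be applied to Total$日期 and Total$得分 once the scrape has run.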