The pitching data.table (created in Quiz 3.3) contains one row for each combination of pitcher and year. Suppose we instead want to create a data.table with one row for each player, summarizing that player's performance across his career.
Which column in the pitching data.table uniquely identifies each individual player?
playerID
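One way to see why playerID is the grouping column is to count rows per player. A pitcher who played in several years appears on several rows, but always with the same playerID. The sketch below assumes a small, hypothetical stand-in for the Quiz 3.3 table. The player IDs, team IDs and numbers are invented for illustration and are not real data.

```r
library(data.table)

# Hypothetical stand-in for the Quiz 3.3 pitching table (invented values)
pitching = data.table(
  playerID = c("aaa01", "aaa01", "bbb01", "bbb01", "ccc01"),
  yearID   = c(2001L, 2002L, 2001L, 2002L, 2002L),
  teamID   = c("T1", "T1", "T1", "T2", "T2"),
  G        = c(30L, 25L, 10L, 40L, 5L),
  BB       = c(50L, 40L, 12L, 30L, 3L)
)

# Rows per player: a multi-year pitcher shows up more than once
rows.per.player = pitching[, .N, by = "playerID"]
print(rows.per.player)

# Number of distinct players (3 here) versus number of rows (5 here)
n.players = uniqueN(pitching$playerID)
print(c(players = n.players, rows = nrow(pitching)))
```

Summarizing "by playerID" therefore collapses the five pitcher-year rows into three player rows.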
The G column holds the number of games the pitcher played in during that year, and the BB column holds the number of walks he allowed (BB stands for "base on balls").
Summarize the pitching dataset by player to find the total number of games (the sum of the G column) each player played in his career.
pitching[, sum(G), by="playerID"]
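In pitching[, sum(G), by="playerID"], the j expression sum(G) is evaluated once for each group of rows sharing a playerID. Because the result is unnamed, data.table names the new column V1. A minimal sketch on hypothetical toy data (invented values, not the real table):

```r
library(data.table)

# Hypothetical toy version of the pitching table (invented values)
pitching = data.table(
  playerID = c("aaa01", "aaa01", "bbb01", "bbb01", "ccc01"),
  G        = c(30L, 25L, 10L, 40L, 5L)
)

career.games = pitching[, sum(G), by = "playerID"]
print(career.games)
# Groups keep their order of first appearance: aaa01 = 55, bbb01 = 50, ccc01 = 5

# The unnamed sum ends up in a column called V1
print(names(career.games))
```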
Summarize the pitching dataset by player again, this time calculating two new columns: totalG, containing the total number of games (the sum of the G column), and totalBB, containing the total number of walks the pitcher allowed (the sum of the BB column). Save the result as a variable called "pitchers".
pitchers = pitching[, list(totalG=sum(G), totalBB=sum(BB)), by="playerID"]
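Wrapping several named expressions in list() produces one named output column per expression. Inside [...], data.table also accepts .() as an alias for list(). A sketch on hypothetical toy data (invented values) showing that both spellings give the same summary:

```r
library(data.table)

# Hypothetical toy version of the pitching table (invented values)
pitching = data.table(
  playerID = c("aaa01", "aaa01", "bbb01", "bbb01", "ccc01"),
  G        = c(30L, 25L, 10L, 40L, 5L),
  BB       = c(50L, 40L, 12L, 30L, 3L)
)

# Named list: one output column per named expression
pitchers = pitching[, list(totalG = sum(G), totalBB = sum(BB)), by = "playerID"]

# Same summary using the .() shorthand for list()
pitchers.alt = pitching[, .(totalG = sum(G), totalBB = sum(BB)), by = playerID]

print(pitchers)
# aaa01: totalG 55, totalBB 90; bbb01: 50, 42; ccc01: 5, 3
```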
Sort the "pitchers" data.table by the total number of games each pitcher played.
pitchers[order(totalG), ]
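order(totalG) sorts in ascending order and returns a new data.table, leaving pitchers itself unchanged. To put the most-used pitchers first, negate the column; alternatively, setorder() reorders the table in place by reference. A sketch on a hypothetical, already-summarized toy table (invented values):

```r
library(data.table)

# Hypothetical toy version of the "pitchers" summary (invented values)
pitchers = data.table(
  playerID = c("aaa01", "bbb01", "ccc01"),
  totalG   = c(50L, 5L, 55L),
  totalBB  = c(42L, 3L, 90L)
)

ascending  = pitchers[order(totalG), ]   # fewest games first; new table
descending = pitchers[order(-totalG), ]  # most games first; new table
print(ascending)
print(descending)

# setorder() sorts pitchers itself in place, without making a copy
setorder(pitchers, -totalG)
print(pitchers)
```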
Summarize the pitching dataset again, this time by team, calculating two columns: totalG (the sum of the G column) and totalBB (the sum of the BB column). Save the result as a variable called "summarized.teams".
summarized.teams = pitching[, list(totalG=sum(G), totalBB=sum(BB)), by="teamID"]
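Grouping by teamID works the same way as grouping by playerID; only the by column changes. Each team's totalG adds up the G values of all of its pitchers, so it is a sum of per-pitcher game counts rather than the number of games the team played. If the result should be sorted by team, keyby can be used in place of by. A sketch on hypothetical toy data (invented values):

```r
library(data.table)

# Hypothetical toy version of the pitching table (invented values)
pitching = data.table(
  playerID = c("aaa01", "aaa01", "bbb01", "bbb01", "ccc01"),
  teamID   = c("T2", "T2", "T2", "T1", "T1"),
  G        = c(30L, 25L, 10L, 40L, 5L),
  BB       = c(50L, 40L, 12L, 30L, 3L)
)

# by keeps teams in order of first appearance (T2 before T1 here)
summarized.teams = pitching[, list(totalG = sum(G), totalBB = sum(BB)), by = "teamID"]
print(summarized.teams)

# keyby groups the same way but sorts the result by teamID and sets it as the key
summarized.teams.sorted = pitching[, list(totalG = sum(G), totalBB = sum(BB)), keyby = "teamID"]
print(summarized.teams.sorted)
```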