In R a vector can be created with the c() function, which combines its
arguments into a single vector (lists have their own constructor, list()). To
explore c() we can create three vectors containing made-up polling results for
a collection of political candidates from two networks:
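A minimal sketch of what those three statements might look like. Only the vector name Name and the candidate Jeb come from the text; the other candidate names, the vector names NetworkA and NetworkB, and all the numbers are placeholders chosen for illustration.

```r
# Candidate names: only "Jeb" appears in the text, the rest are placeholders.
Name <- c("Jeb", "Marco", "Ted")
# Made-up polling percentages from two hypothetical networks.
NetworkA <- c(12, 9, 15)
NetworkB <- c(10, 11, 17)
```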
After executing these three statements we can look in the RStudio
Environment window and confirm the vectors have the intended data types
(numeric and character).
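The same check can also be made from the console rather than the Environment window. The sketch below assumes placeholder vectors (the values and the name NetworkA are illustrative, not taken from the original data) and uses class() and str() to report the type of each.

```r
# Placeholder vectors for illustration; values and NetworkA are assumptions.
Name <- c("Jeb", "Marco", "Ted")
NetworkA <- c(12, 9, 15)

class(Name)      # "character"
class(NetworkA)  # "numeric"
str(NetworkA)    # compact summary: type, length, and first values
```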
Our intention was to create vectors, but can we confirm this
somehow? The R language includes a family of functions that test whatever is
passed to them, so we can try is.vector(), is.data.frame(), and is.list() to see
what R returns for each of these tests.
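As a rough illustration, the three tests can be run against a placeholder character vector (the names other than Jeb are assumptions made for this sketch):

```r
# Placeholder vector; only "Jeb" comes from the text.
Name <- c("Jeb", "Marco", "Ted")

is.vector(Name)      # TRUE:  a plain character vector
is.data.frame(Name)  # FALSE: not a data frame
is.list(Name)        # FALSE: an atomic vector is not a list
```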
There is a large list of is.___() functions that come with
the base installation of R. Using the R console’s autocomplete functionality,
the different functions can be scrolled through by typing “is.” and pausing in
the console until a scroll window appears.
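Outside of autocomplete, one way to list these functions programmatically is apropos() from the utils package, which searches attached packages for object names matching a regular expression. This is offered as an alternative sketch, not as the method used in the original walkthrough; the variable name is_functions is made up here.

```r
# Find every visible object whose name starts with "is."
is_functions <- utils::apropos("^is\\.")

head(is_functions)    # first few matches
length(is_functions)  # how many were found (varies with loaded packages)
```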
With these three individual vectors we can use the data.frame() function
to create a single data frame that ties the network polling results to the
respective candidates.
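A minimal sketch of that step follows. The data frame name polling and the column Name come from the text; the vector names NetworkA and NetworkB, the extra candidate names, and the values are placeholders.

```r
# Placeholder vectors; only "Jeb", Name, and polling come from the text.
Name <- c("Jeb", "Marco", "Ted")
NetworkA <- c(12, 9, 15)
NetworkB <- c(10, 11, 17)

# Combine the three vectors column-wise into one data frame.
polling <- data.frame(Name, NetworkA, NetworkB)
polling
```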
And as with the vectors, it is possible to quickly inspect
the success of this statement in the RStudio Environment window.
The data.frame() function takes the column names of the data
frame from the names of the vectors passed to it. Inheriting names in this way
is common in R when functions create new vectors, data frames, and other
structures.
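The inherited names can be checked with names(), as in this sketch (NetworkA, NetworkB, and the values are placeholder assumptions):

```r
# Placeholder vectors; only "Jeb", Name, and polling come from the text.
Name <- c("Jeb", "Marco", "Ted")
NetworkA <- c(12, 9, 15)
NetworkB <- c(10, 11, 17)
polling <- data.frame(Name, NetworkA, NetworkB)

names(polling)  # "Name" "NetworkA" "NetworkB"
```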
If we want to calculate the mean value for individual
candidates we cannot simply apply the mean() function to the polling data
frame. Each candidate occupies a single row, so we need a function that
calculates the mean of a row’s values, restricted to the numeric columns so
that it does not attempt to use the Name character column. The rowMeans()
function meets these needs: we subset the data frame by row number and column
range with polling[row, columns] and pass the result to rowMeans(). In this
data the first row is Jeb and columns 2 & 3 contain the polling values, so
rowMeans(polling[1,2:3]) returns Jeb’s average.
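A short sketch of the single-row case, using placeholder data (NetworkA, NetworkB, the extra names, and the numbers are assumptions; jeb_mean is a made-up variable name):

```r
# Placeholder data; only "Jeb", Name, and polling come from the text.
Name <- c("Jeb", "Marco", "Ted")
NetworkA <- c(12, 9, 15)
NetworkB <- c(10, 11, 17)
polling <- data.frame(Name, NetworkA, NetworkB)

# Row 1 is Jeb; columns 2 and 3 hold the two networks' results.
jeb_mean <- rowMeans(polling[1, 2:3])
jeb_mean  # (12 + 10) / 2 = 11, labelled with the row name "1"
```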
When we leave the row number blank, R selects every row of the
data frame. So when we execute rowMeans(polling[,2:3]) the function returns
the average of columns 2 & 3 for each row. It is also possible to insert
these average values back into the polling data frame, creating a new column
named average in the process, with
polling$average <- rowMeans(polling[,2:3]).
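Put together, the all-rows case might look like the following sketch, again with placeholder data (NetworkA, NetworkB, the extra names, and the numbers are assumptions):

```r
# Placeholder data; only "Jeb", Name, polling, and average come from the text.
Name <- c("Jeb", "Marco", "Ted")
NetworkA <- c(12, 9, 15)
NetworkB <- c(10, 11, 17)
polling <- data.frame(Name, NetworkA, NetworkB)

# A blank row index selects all rows; 2:3 keeps only the numeric columns.
rowMeans(polling[, 2:3])

# Store the per-candidate averages as a new column.
polling$average <- rowMeans(polling[, 2:3])
polling
```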
These average polling results can be plotted with qplot(polling$Name,
polling$average), a function from the ggplot2 package that must first be
loaded with library(ggplot2), to create a point plot.
Point plots are not easy to read for data such as this,
however, and are not an effective visualization of the descriptive statistics.
A bar plot would be easier to take in visually. Such a bar plot can be created
using barplot(polling$average, names.arg=polling$Name):
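For a self-contained run of that call, the sketch below rebuilds the data frame from placeholder values (NetworkA, NetworkB, the extra candidate names, the numbers, and the variable bar_midpoints are assumptions) and then draws the bar plot with base graphics.

```r
# Placeholder data; only "Jeb", Name, polling, and average come from the text.
Name <- c("Jeb", "Marco", "Ted")
NetworkA <- c(12, 9, 15)
NetworkB <- c(10, 11, 17)
polling <- data.frame(Name, NetworkA, NetworkB)
polling$average <- rowMeans(polling[, 2:3])

# One bar per candidate, labelled with the candidate's name.
# barplot() also returns the x positions of the bar midpoints.
bar_midpoints <- barplot(polling$average, names.arg = polling$Name)
```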