The R language has three object oriented systems: S3, S4, and Reference Classes (also referred to as R5). A class defines a type of object along with the attributes or values that describe it, and every object is a member of a class. A method is a function associated with objects of a given class. In this post I focus only on S3 and S4 and will not touch upon R5/Reference Classes.
To determine whether a given object is S3 or S4 there are two common function calls. The function isS4() returns TRUE if the supplied object is S4 and FALSE if it is anything other than S4. The function otype(), which is part of the pryr package, is more informative since it is not a simple true/false test for S4 but instead names the object system the object belongs to (for example base, S3, or S4).
In this example churn is a valid S3 object, inv101 is a valid S4 object, and inv501 is a simple numeric vector of three numbers. Because isS4() only answers TRUE or FALSE to the question of whether the provided object is S4, it cannot differentiate between a plain vector and an S3 object, while otype() can.
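The difference can be sketched in base R without installing pryr. The three objects below are small hypothetical stand-ins for churn, inv101, and inv501, and object_system() is an assumed helper written for this illustration (it is not part of pryr or base R and only approximates what otype() reports):

```r
library(methods)  # attached by default in interactive R; explicit for Rscript

# Hypothetical stand-ins for the three objects discussed above.
churn  <- structure(list(minutes = c(120, 45)), class = "customers")  # S3: list + class attribute
setClass("car", slots = c(Price = "numeric"))                         # minimal S4 class
inv101 <- new("car", Price = 9500)
inv501 <- c(1, 2, 3)                                                  # plain numeric vector

# Assumed helper: a rough base-R approximation of pryr::otype().
# Reference class objects would be reported as "S4" here.
object_system <- function(x) {
  if (isS4(x)) "S4" else if (is.object(x)) "S3" else "base"
}

objs <- list(churn = churn, inv101 = inv101, inv501 = inv501)
s4_flags <- sapply(objs, isS4)           # only inv101 is TRUE
systems  <- sapply(objs, object_system)  # distinguishes all three
print(s4_flags)
print(systems)
```

isS4() gives the same answer for churn and inv501, while the helper separates them by checking for a class attribute with is.object().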
The base type of an object can be found with a call to typeof(). Calling typeof() on an S4 object returns "S4", making it another means to determine if an object is S4, but like isS4() it does not positively identify S3: for an S3 object built on a data frame or list it reports 'list'. Passing typeof() an individual member of the object (for example a column accessed with $) returns the base type of that member:
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjsNnJh4poIzQsqHmj2ocWPeugOFTuRGxGT2zcFDoTEkkP-imtCQXI1kBzjg2QEMqtTOlXJUuhJo6mTLDF1tVYGFLNhdMBmPO-uBVb6wywu5eo4SeQP7MOJRxn_V7oGmMf2oY0m5ajEI-o/s1600/2.JPG)
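As a minimal sketch of the same checks, assuming small hypothetical stand-ins for the S3 and S4 objects (the member names minutes and Price are illustrative only):

```r
library(methods)

# Hypothetical S3 object: a list carrying a class attribute.
churn <- structure(list(minutes = c(120, 45)), class = "customers")

# Hypothetical S4 object with a single numeric slot.
setClass("car", slots = c(Price = "numeric"))
inv101 <- new("car", Price = 9500)

type_s4        <- typeof(inv101)         # "S4"
type_s3        <- typeof(churn)          # "list": the S3 class is not reported
type_s3_member <- typeof(churn$minutes)  # "double": base type of one member
type_s4_slot   <- typeof(inv101@Price)   # "double": base type of one slot
```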
In their paper “Generic Programming” (http://stepanovpapers.com/genprog.pdf), presented at the 1988 First International Joint Conference of ISSAC-88 and AAECC-6, David Musser and Alexander Stepanov defined generic programming as centering “around the idea of abstracting from concrete, efficient algorithms to obtain generic algorithms that can be combined with different data representations to produce a wide variety of useful software.” In R, a generic function does not require the caller to name the implementation that matches the data being passed. There may be several specific methods that behave differently for different classes of object, but the generic is called the same way regardless of the object types involved. When the function is called the R interpreter attempts to route the call to a method defined for the class of the objects in the function call.
For example, plot() and print() are generic functions, and the specific methods available to each can be listed by calling methods() with the generic function. Calling methods(plot) reveals the methods the R interpreter can select from when plot() is called. The selection is based on the class of the supplied object: a data.frame is routed to plot.data.frame(), a fitted linear model to plot.lm(), and an object with no class-specific method falls back to plot.default().
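The routing can be sketched with a small S3 generic. The generic describe() and its methods are hypothetical names invented for this illustration; only UseMethod() is the actual base R mechanism:

```r
# A hypothetical S3 generic: UseMethod() picks a method from the class of x.
describe <- function(x, ...) UseMethod("describe")

# Class-specific methods follow the generic.class naming convention.
describe.data.frame <- function(x, ...) paste("data frame with", ncol(x), "columns")
describe.matrix     <- function(x, ...) paste("matrix with", nrow(x), "rows")
describe.default    <- function(x, ...) paste("object of type", typeof(x))  # fallback

res_df  <- describe(data.frame(a = 1:3))   # routed to describe.data.frame
res_mat <- describe(matrix(1:6, nrow = 2)) # routed to describe.matrix
res_int <- describe(1:3)                   # no specific method, so describe.default
print(c(res_df, res_mat, res_int))
```

The caller writes describe(x) in every case; the interpreter chooses the method.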
This abstraction helps simplify aspects of programming and use of functions but it can introduce unexpected behavior and incorrect or misleading results. S3 implemented an unchecked form of object oriented design and S4 was introduced to correct these shortcomings with a stricter design more akin to other existing object oriented systems. Despite the stricter nature of S4, or possibly because of this stricter nature, S3 remains the most commonly used object oriented system in the R language.
S4 requires formal definition of classes and class inheritance, while S3 allows the programmer to turn a data frame or a list into an instance of a class by simply setting its class attribute. The other substantial difference in S4 is that a generic function can be dispatched to a method based on the classes of any number of arguments and not just one as in S3.
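Both points can be illustrated briefly. The class name "account" and the generic pair_up() are hypothetical names chosen for this sketch:

```r
library(methods)

# S3: any list becomes an instance of a class once its class attribute is set.
acct <- list(id = 1)
class(acct) <- "account"
is_account <- inherits(acct, "account")

# S4: a generic whose methods are selected on the classes of BOTH arguments.
setGeneric("pair_up", function(x, y) standardGeneric("pair_up"))
setMethod("pair_up", signature("numeric", "character"),
          function(x, y) paste("number first:", x, y))
setMethod("pair_up", signature("character", "numeric"),
          function(x, y) paste("text first:", x, y))

res_nc <- pair_up(1, "a")   # uses the (numeric, character) method
res_cn <- pair_up("a", 1)   # uses the (character, numeric) method
print(c(res_nc, res_cn))
```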
In practice this helps eliminate some of the inconsistencies or misleading results that come from passing incorrect or bad data to an S3 object or method. In S3 the programmer can attach a class to any object, and if that object lacks the fields the functions written for that class expect, the R interpreter will not necessarily raise any errors or warnings and may simply return NULL values. In S4 the programmer declares the type of each slot in the class definition, and if a user supplies data of the wrong type when creating an object, an error is raised.
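A minimal sketch of the contrast, using two hypothetical classes ("s3car" and "s4car") that are not part of the original example:

```r
library(methods)

# S3: nothing checks the contents, so a wrong type is accepted silently
# and a missing member just comes back as NULL.
s3_car <- structure(list(Price = "cheap"), class = "s3car")
missing_member <- s3_car$Mileage   # NULL, with no error or warning

# S4: the slot is declared numeric, so a character value is rejected by new().
setClass("s4car", slots = c(Price = "numeric"))
s4_result <- tryCatch(
  new("s4car", Price = "cheap"),
  error = function(e) conditionMessage(e)  # capture the error text instead of stopping
)
print(s4_result)
```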
For some examples I import a dataset of cellular churn data from a CSV file, assign a class to the data, and then explore some of the attributes and values as S3:
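In the below shot of the RStudio values we can see that churn is now referred to as "Large customers", whereas before it was assigned a class it was listed as a collection of 20,000 observations. After the class is assigned with class(churn) <-, the data set is also shown with two attributes, attr("class") and attr("row.names"). Only the class attribute is changed by the assignment; row.names was already part of the original data frame.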
Executing the commands that follow class(churn) in the earlier screenshot confirms how specific class members or variables are accessed via the $ operator, and shows the results of our tests for S3 or S4 membership and base type:
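The steps described above can be approximated with a small sketch. The three-row data frame and its column names are hypothetical stand-ins for the 20,000-row CSV, which is not available here; only the class name "Large customers" comes from the example:

```r
# Hypothetical stand-in for the data read from the CSV file.
churn <- data.frame(customer_id = 1:3, minutes = c(310, 95, 480))

# Assigning the class attribute is all it takes to make this an S3 object.
class(churn) <- "Large customers"

churn_class <- class(churn)              # "Large customers"
attr_names  <- names(attributes(churn))  # includes "class" and "row.names"
minutes     <- churn$minutes             # members are still reached with $

is_s4     <- isS4(churn)    # FALSE
base_type <- typeof(churn)  # "list"
print(attr_names)
```

Replacing the class this way drops "data.frame" from the class attribute, which is why the object is no longer displayed as a table of observations.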
To explore S4 objects I create a class and a validity function, then test the validity checker and run the same tests as above for S3 or S4 membership and base types:
This code creates a class "usedcars" which uses a validity function "checkCar" to make sure supplied values fit within the constraints we have specified for Price, Mileage, and Year. In the next code segment we perform some tests on the class and its slots, and then create an instance of the usedcars class with a formal call to the new() function:
The console output for executing these commands is:
The getClass() function reports the 'slots' of the class "usedcars" and their types, which is useful for making sure the correct slot names are used when values are passed to a new instance of "usedcars". inv101 receives the new instance of "usedcars", and since no error messages were returned to the console we can be confident that the correct data types were passed and the values were in the allowed ranges.
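A class of this shape might be defined and instantiated along the following lines. The names usedcars, checkCar, and the three slots come from the text; the numeric limits in the validity function and the values given to inv101 are assumptions chosen purely for illustration:

```r
library(methods)  # attached by default in interactive R; explicit for Rscript

# Validity function: returns TRUE when the object is acceptable, otherwise
# a character vector describing each problem. The limits are assumed values.
checkCar <- function(object) {
  problems <- character(0)
  if (any(object@Price < 0 | object@Price > 100000))
    problems <- c(problems, "Price must be between 0 and 100000")
  if (any(object@Mileage < 0))
    problems <- c(problems, "Mileage must not be negative")
  if (any(object@Year < 1980 | object@Year > 2030))
    problems <- c(problems, "Year must be between 1980 and 2030")
  if (length(problems) == 0) TRUE else problems
}

setClass("usedcars",
         slots = c(Price = "numeric", Mileage = "numeric", Year = "numeric"),
         validity = checkCar)

getClass("usedcars")                  # prints the slot names and their types
slot_names <- slotNames("usedcars")   # the same names as a character vector

# new() checks the slot types and then runs checkCar on the result.
inv101 <- new("usedcars", Price = 9500, Mileage = 62000, Year = 2012)
inv101_is_s4 <- isS4(inv101)
inv101_price <- inv101@Price          # slots are read with @ rather than $
```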
In the below code we test for S4 and object type, test the validity function of the "usedcars" class, and access slot values directly using the @ operator (which takes the place of the $ operator used with S3):
The integrity check successfully caught the out-of-specification values when creating inv102 and inv103 above. The price supplied for inv102 was too high, and both the price and year values for inv103 were out of range.
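The failure cases can be sketched as follows. The class is redefined here in compact form so the snippet runs on its own; the limits, the input values, and the helper try_car() are all assumptions made for illustration rather than the values used in the original run:

```r
library(methods)

# Compact stand-in for the usedcars class; the limits are assumed values.
setClass("usedcars",
         slots = c(Price = "numeric", Mileage = "numeric", Year = "numeric"),
         validity = function(object) {
           problems <- character(0)
           if (any(object@Price > 100000)) problems <- c(problems, "Price too high")
           if (any(object@Year > 2030))    problems <- c(problems, "Year out of range")
           if (length(problems) == 0) TRUE else problems
         })

# Hypothetical helper: build a car, or return the error message if new() fails.
try_car <- function(...) {
  tryCatch(new("usedcars", ...), error = function(e) conditionMessage(e))
}

inv102 <- try_car(Price = 950000, Mileage = 62000, Year = 2012)   # price too high
inv103 <- try_car(Price = 950000, Mileage = 62000, Year = 20120)  # price and year both wrong
print(inv102)
print(inv103)
```

Because new() stops with an error, neither inv102 nor inv103 would exist as a usedcars object in an unwrapped call; here each holds the error text instead, and the message for inv103 lists both problems.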