Chapter 2 Basic Types

In every programming language, data is stored in different ways. Writing a program that manipulates data requires understanding all of the choices. That is why we must be concerned with the different types of data in our R and Python programs. Different types are suitable for different purposes.

There are similarities between Python’s and R’s type systems. However, there are many differences as well. Be prepared for these differences. There are many more of them in this chapter than there were in the previous chapter!

If you’re ever unsure what type a variable has, use type() (in Python) or typeof() (in R) to query it.

Storing an individual piece of information is simple in both languages. However, while Python has scalar types, R does not draw as strong a distinction between scalar and compound types.

2.1 Basic Types In Python

In Python, the simplest types we frequently use are str (short for string), int (short for integer), float (short for floating point) and bool (short for Boolean). This list is not exhaustive, but these are a good collection to start thinking about. For a complete list of built-in types in Python, see the “Built-in Types” page of the official Python documentation.

print(type('a'), type(1), type(1.3))
## <class 'str'> <class 'int'> <class 'float'>

Strings are useful for processing text data such as names of people/places/things and messages such as texts, tweets and emails (Beazley and Jones 2014). If you are dealing with numbers, you need a float if the number might have a fractional part after its decimal point; otherwise you’ll need an integer. Booleans are useful for situations where you need to record whether something is true or false. They are also important to understand for control-flow in section 11.
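
As a small illustration of the bool type described above, here is a minimal sketch. The variable names and values are made up for this example only. It shows that both the literals True and False and the result of a comparison have type bool.

```python
# Hypothetical values, chosen only for illustration.
is_raining = True               # a Boolean literal
n_students = 12                 # a whole number, so an int
temperature = 18.5              # may have a fractional part, so a float

is_cold = temperature < 10.0    # a comparison evaluates to a bool

print(type(is_raining), type(is_cold))
## <class 'bool'> <class 'bool'>
print(type(n_students), type(temperature))
## <class 'int'> <class 'float'>
```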

In the next section we will discuss the NumPy library. This library has a broader collection of basic types that allows for finer control over any script you write.

2.1.1 Type Conversions in Python

We will often have to convert between types in a Python program. This is called type conversion, and it can be done either implicitly or explicitly.

For example, ints are often implicitly converted to floats, so that arithmetic operations work.

my_int = 1
my_float = 3.2
my_sum = my_int + my_float
print("my_int's type", type(my_int))
## my_int's type <class 'int'>
print("my_float's type", type(my_float))
## my_float's type <class 'float'>
print(my_sum)
## 4.2
print("my_sum's type", type(my_sum))
## my_sum's type <class 'float'>
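
A few other implicit conversions follow the same pattern. The sketch below is only illustrative, and its variable names are made up. It shows a bool being treated as an int in arithmetic, and the difference between true division (/) and floor division (//) of two ints.

```python
# A bool is treated as the int 1 (True) or 0 (False) in arithmetic.
bool_plus_int = True + 1
print(bool_plus_int, type(bool_plus_int))
## 2 <class 'int'>

# True division of two ints always produces a float...
true_div = 7 / 2
print(true_div, type(true_div))
## 3.5 <class 'float'>

# ...while floor division of two ints stays an int.
floor_div = 7 // 2
print(floor_div, type(floor_div))
## 3 <class 'int'>
```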

You might be disappointed if you always count on this behavior, though. For example, try the following piece of code on your machine. You will receive the following error: TypeError: unsupported operand type(s) for +: 'float' and 'str'.

3.2 + "3.2"

Explicit conversions occur when we as programmers explicitly ask Python to perform a conversion. We will do this with functions such as int(), str(), float(), and bool().

my_date = "5/2/2021"
month_day_year = my_date.split('/')
my_year = int(month_day_year[-1]) 
print('my_year is', my_year, 'and its type is', type(my_year))
## my_year is 2021 and its type is <class 'int'>
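
The same functions resolve the TypeError from the earlier 3.2 + "3.2" example, but they have a few behaviors worth knowing about. The following is a minimal sketch with made-up variable names, not an exhaustive list of the rules.

```python
# Converting the string first makes the earlier addition work.
my_total = 3.2 + float("3.2")
print(my_total)
## 6.4

# int() drops the fractional part of a float rather than rounding.
truncated = (int(3.9), int(-3.9))
print(truncated)
## (3, -3)

# bool() of any non-empty string is True, even the string "False".
from_strings = (bool("False"), bool(""))
print(from_strings)
## (True, False)

# A string that does not look like a number cannot be converted.
conversion_failed = False
try:
    int("five")
except ValueError as err:
    conversion_failed = True
    print("conversion failed:", err)
```

Because a failed conversion raises a ValueError instead of returning a placeholder value, wrapping the call in try/except is one way to handle text that might not be numeric.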

2.2 Basic Types In R

In R, the names of basic types are only slightly different. They are logical (instead of bool), integer (instead of int), double or numeric (instead of float)⁵, character (instead of str), complex (for calculations involving imaginary numbers), and raw (useful for working with bytes).

# cat() is kind of like print()
cat(typeof('a'), typeof(1), typeof(1.3))
## character double double

In this case R treated 1 as a double, because numeric literals are doubles by default. If you want to force it to be an integer, you can add a capital “L” to the end of the number.

# cat() is kind of like print()
cat(typeof('a'), typeof(1L), typeof(1.3))
## character integer double

2.2.1 Type Conversions in R

You can explicitly and implicitly convert types in R just as you did in Python. Implicit conversion looks like this.

myInt = 1
myDouble = 3.2
mySum = myInt + myDouble
print(paste0("myInt's type is ", typeof(myInt)))
## [1] "myInt's type is double"
print(paste0("myDouble's type is ", typeof(myDouble)))
## [1] "myDouble's type is double"
print(mySum)
## [1] 4.2
print(paste0("mySum's type is ", typeof(mySum)))
## [1] "mySum's type is double"

Explicit conversion can be achieved with functions such as as.integer, as.logical, as.double, etc.

print(typeof(1))
## [1] "double"
print(typeof(as.logical(1)))
## [1] "logical"

2.2.2 R’s Simplification

The basic types of R are a little different than the basic types of Python. On the one hand, Python has basic types for individual elements, and it uses separate types as containers for storing many elements. On the other, R uses the same type to store a single element as it does to store many elements. Strictly speaking, R does not have a scalar type.

Technically, all of the examples we just did in R are using length one vectors. logical, integer, double, character, complex, and raw are the possible modes of a vector. Vectors will be discussed further in section 3.
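
For contrast, here is a sketch of the Python side of this distinction, using made-up variable names. A single float and a list holding one float are different types, and only the list has a length. In R, by comparison, length(1.3) returns 1.

```python
my_scalar = 1.3          # a single float
my_container = [1.3]     # a list holding one float

print(type(my_scalar), type(my_container))
## <class 'float'> <class 'list'>
print(len(my_container))
## 1

# A float is not a collection, so asking for its length fails.
scalar_has_length = True
try:
    len(my_scalar)
except TypeError as err:
    scalar_has_length = False
    print("TypeError:", err)
```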

Think about which option you prefer. What are the benefits of using separate types for scalars and collections? What are the benefits of using the same type?

2.3 Exercises

2.3.1 R Questions

Which R base type is ideal for each piece of data? Assign your answers to a character vector of length four called questionOne.

An individual’s IP address
whether or not an individual attended a study
the number of seeds found in a plant
the amount of time it takes for a car to race around a track

Floating points are weird. What gets printed is not the same as what is stored! In R, you can control how many digits get printed by using the options function.

Assign 2/3 to a
Print a, and copy/paste what you see into the variable aPrint. Make sure it is a character.
Take a look at the documentation for options. Assign the value of options()$digits to numDigitsStart
Change the number of digits to 22
Again, print a, and copy/paste what you see into the variable aPrintv2. Make sure it is a character.
Assign the output of options()$digits to numDigitsEnd

Floating points are weird. What gets stored might not be what you want. “The only numbers that can be represented exactly in R’s numeric type are integers and fractions whose denominator is a power of 2.” As a consequence, you should never test strict equality (i.e. using ==) between two floating points.

Assign the square root of 2 to mySqrt
Print the square of this variable
Test (using ==) that this variable is equal to 2. Assign the result of this test to isTwoRecoverable
Test for near equality (using all.equal). In other words, check that this variable is very close to 2. Assign the result of this test to closeEnough. Make sure to read the documentation for this function because the return type can be tricky!

2.3.2 Python Questions

Which Python type is ideal for each piece of data? Assign your answers to a list of strings called question_one.

An individual’s IP address
whether or not an individual attended a study
the number of seeds found in a plant
the amount of time it takes for a car to race around a track

Floating points are weird. What gets printed is not the same as what is stored! In Python, you need to edit a class’s __str__ method if you want to control how many digits get printed for a user-defined type/class, but we won’t do that. Instead, we’ll use str.format() to return a string directly (instead of copy/paste-ing it).

Assign 2/3 to a
Print a, and copy/paste what you see into the variable a_print
Create a str that displays 22 digits of 2/3. Call it a_printv2
Print the above string

Floating points are weird. What gets stored might not be what you want. The Python documentation has an excellent discussion of how storage behavior can be surprising; see the tutorial chapter “Floating Point Arithmetic: Issues and Limitations.”

Assign the square root of 2 to my_sqrt
print the square of this variable
Test (using ==) that this variable is equal to 2. Assign the result of this test to is_two_recoverable
Test for near equality (using np.isclose, which is available after running import numpy as np). In other words, check that this variable is close to 2. Assign the result of this test to close_enough.
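
The storage behavior behind these floating point questions also shows up with simple decimal fractions. The sketch below is only an illustration with made-up variable names. It uses math.isclose and decimal.Decimal from the standard library rather than np.isclose, and it echoes the earlier remark that only fractions whose denominator is a power of 2 are stored exactly.

```python
from decimal import Decimal
import math

# 0.5 and 0.25 have power-of-two denominators, so they are stored exactly.
exact_case = (0.5 + 0.25 == 0.75)
print(exact_case)
## True

# 0.1, 0.2 and 0.3 do not, so each is stored as a nearby binary fraction.
inexact_case = (0.1 + 0.2 == 0.3)
print(inexact_case)
## False

# Decimal(float) shows the exact value that is actually stored for 0.1.
stored_value = Decimal(0.1)
print(stored_value)

# Comparing within a tolerance is the safer test.
near_equal = math.isclose(0.1 + 0.2, 0.3)
print(near_equal)
## True
```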

References

Beazley, David M., and Brian K. Jones. 2014. Python Cookbook: Recipes for Mastering Python 3. Third edition. O’Reilly Media.

“double” is short for “double precision floating point.” In other programming languages, the programmer might choose how many decimal points of precision he or she wants.