Chapter 2 Basic Types
In every programming language, data is stored in different ways. Writing a program that manipulates data requires understanding all of the choices. That is why we must be concerned with the different types of data in our R and Python programs. Different types are suitable for different purposes.
There are similarities between Python’s and R’s type systems. However, there are many differences as well. Be prepared for these differences. There are many more of them in this chapter than there were in the previous chapter!
If you’re ever unsure what type a variable has, use type() (in Python) or typeof() (in R) to query it.
Storing an individual piece of information is simple in both languages. However, while Python has scalar types, R does not draw as strong a distinction between scalar and compound types.
2.1 Basic Types In Python
In Python, the simplest types we frequently use are str (short for string), int (short for integer), float (short for floating point) and bool (short for Boolean). This list is not exhaustive, but it is a good collection to start thinking about. For a complete list of built-in types in Python, see the official Python documentation.
Strings are useful for processing text data such as names of people/places/things and messages such as texts, tweets and emails (Beazley and Jones 2014). If you are dealing with numbers, you need a floating point if the number might have a fractional part after its decimal point; otherwise you’ll need an integer. Booleans are useful for situations where you need to record whether something is true or false. They are also important to understand for control flow in section 11.
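As a quick illustration of the four types just described, the following sketch stores one value of each type and asks type() for its name. The variable names and values are made up for this example.

```python
# Hypothetical values, chosen only to illustrate each basic type
first_name = "Ada"     # str: text data
age = 36               # int: a whole number
temperature = 98.6     # float: a number with a fractional part
is_member = True       # bool: true or false

values = (first_name, age, temperature, is_member)

for value in values:
    # type(value).__name__ is the short name of the value's type
    print(value, "has type", type(value).__name__)

type_names = [type(value).__name__ for value in values]
```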
In the next section we will discuss the NumPy library. This library has a broader collection of basic types that allows for finer control over any script you write.
2.1.1 Type Conversions in Python
We will often have to convert between types in a Python program. This is called type conversion, and it can be done either implicitly or explicitly.
For example, ints are often implicitly converted to floats, so that arithmetic operations work.
my_int = 1
my_float = 3.2
my_sum = my_int + my_float
print("my_int's type", type(my_int))
## my_int's type <class 'int'>
print("my_float's type", type(my_float))
## my_float's type <class 'float'>
print(my_sum)
## 4.2
print("my_sum's type", type(my_sum))
## my_sum's type <class 'float'>
You might be disappointed if you always count on this behavior, though. For example, adding a float to a str does not trigger an implicit conversion. If you try it on your machine, you will receive the following error: TypeError: unsupported operand type(s) for +: 'float' and 'str'.
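The error message above is what Python reports when a float is added to a str. Here is a minimal sketch that reproduces it. The specific operands (the number 3.2 and the string "3.2") are an assumption made for illustration, and the exception is caught only so that the script keeps running.

```python
my_float = 3.2
my_str = "3.2"  # assumed value: a string that merely looks like a number

try:
    # Python does not implicitly convert the str to a float here
    my_sum = my_float + my_str
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)
```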
Explicit conversions occur when we as programmers explicitly ask Python to perform a conversion. We will do this with functions such as int(), str(), float(), and bool().
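The sketch below shows these four functions in action. The variable names and starting value are hypothetical, chosen only to show how each conversion behaves.

```python
my_str = "3.2"  # hypothetical starting value

as_float = float(my_str)    # parse the text as a floating point number
as_int = int(as_float)      # int() drops the fractional part
back_to_str = str(as_int)   # turn the integer back into text
as_bool = bool(as_int)      # any nonzero number converts to True
empty_is_false = bool("")   # the empty string converts to False

print(as_float, as_int, back_to_str, as_bool, empty_is_false)

# After an explicit conversion, arithmetic with a float works
fixed_sum = 3.2 + float(my_str)
print(fixed_sum)
```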
2.2 Basic Types In R
In R, the names of basic types are only slightly different. They are logical (instead of bool), integer (instead of int), double or numeric (instead of float), character (instead of str), complex (for calculations involving imaginary numbers), and raw (useful for working with bytes). The note at the end of this chapter explains the name double.
R treats a bare number such as 1 as a double, even though it has no fractional part. If you want to force it to be an integer, add a capital “L” to the end of the number.
# cat() is kind of like print()
cat(typeof('a'), typeof(1L), typeof(1.3))
## character integer double
2.2.1 Type Conversions in R
You can explicitly and implicitly convert types in R just as you did in Python. Implicit conversion looks like this.
myInt = 1
myDouble = 3.2
mySum = myInt + myDouble
print(paste0("myInt's type is ", typeof(myInt)))
## [1] "myInt's type is double"
print(paste0("myDouble's type is ", typeof(myDouble)))
## [1] "myDouble's type is double"
print(mySum)
## [1] 4.2
print(paste0("mySum's type is ", typeof(mySum)))
## [1] "mySum's type is double"
Explicit conversion can be achieved with functions such as as.integer, as.logical, as.double, etc.
2.2.2 R’s Simplification
The basic types of R are a little different from the basic types of Python. On the one hand, Python has basic types for individual elements, and it uses separate types as containers for storing many elements. On the other hand, R uses the same type to store a single element as it does to store many elements. Strictly speaking, R does not have a scalar type.
Technically, all of the examples we just did in R use length-one vectors. logical, integer, double, character, complex, and raw are the possible modes of a vector. Vectors will be discussed further in section 3.
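For contrast, here is a small sketch of the Python side of this comparison. A single float and a list of floats are different types in Python, and a scalar has no length at all. In R, by comparison, length(3.2) is 1. The variable names are hypothetical.

```python
single = 3.2        # a scalar float
many = [3.2, 1.5]   # a list, which is a separate container type

print(type(single).__name__)
print(type(many).__name__)

# A scalar has no length, so len() raises a TypeError
try:
    len(single)
except TypeError as err:
    scalar_len_error = str(err)
    print("TypeError:", scalar_len_error)

# A one-element list is still a list, not a float
one_element = [3.2]
print(type(one_element).__name__, len(one_element))
```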
Think about which option you prefer. What are the benefits of using separate types for scalars and collections? What are the benefits of using the same type?
2.3 Exercises
2.3.1 R Questions
Which R base type is ideal for each piece of data? Assign your answers to a character vector of length four called questionOne.
- An individual’s IP address
- whether or not an individual attended a study
- the number of seeds found in a plant
- the amount of time it takes for a car to race around a track
Floating points are weird. What gets printed is not the same as what is stored! In R, you can control how many digits get printed by using the options function.
- Assign 2/3 to a
- print a, and copy/paste what you see into the variable aPrint. Make sure it is a character.
- Take a look at the documentation for options. Assign the value of options()$digits to numDigitsStart
- Change the number of digits to 22
- Again, print a, and copy/paste what you see into the variable aPrintv2. Make sure it is a character.
- Assign the output of options()$digits to numDigitsEnd
Floating points are weird. What gets stored might not be what you want. “The only numbers that can be represented exactly in R’s numeric type are integers and fractions whose denominator is a power of 2.” As a consequence, you should never test strict equality (i.e. using ==) between two floating points.
- Assign the square root of 2 to mySqrt
- Print the square of this variable
- Test (using ==) that this variable is equal to 2. Assign the result of this test to isTwoRecoverable
- Test for near equality (using all.equal). In other words, check that this variable is very close to 2. Assign the result of this test to closeEnough. Make sure to read the documentation for this function because the return type can be tricky!
2.3.2 Python Questions
Which Python type is ideal for each piece of data? Assign your answers to a list of strings called question_one.
- An individual’s IP address
- whether or not an individual attended a study
- the number of seeds found in a plant
- the amount of time it takes for a car to race around a track
Floating points are weird. What gets printed is not the same as what is stored! In Python, you need to edit a class’s __str__ method if you want to control how many digits get printed for a user-defined type/class, but we won’t do that. Instead, we’ll use str.format() to return a string directly (instead of copy/paste-ing it).
- Assign 2/3 to a
- print a, and copy/paste what you see into the variable a_print
- Create a str that displays 22 digits of 2/3. Call it a_printv2
- print the above string
Floating points are weird. What gets stored might not be what you want. The Python documentation has an excellent discussion of how storage behavior can be surprising. See the tutorial chapter “Floating-Point Arithmetic: Issues and Limitations.”
- Assign the square root of 2 to my_sqrt
- print the square of this variable
- Test (using ==) that this variable is equal to 2. Assign the result of this test to is_two_recoverable
- Test for near equality (using np.isclose, which is available after running import numpy as np). In other words, check that this variable is close to 2. Assign the result of this test to close_enough.
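The floating point questions above rest on one observation: a sum that looks exact on paper may not be exact once stored. The following sketch shows the effect with a different pair of numbers, so it is a hint rather than an answer. It uses math.isclose from the standard library instead of np.isclose. That substitution is a choice made here to keep the example free of extra dependencies.

```python
import math

total = 0.1 + 0.2

# Strict equality fails because 0.1, 0.2 and 0.3 are not stored exactly
strictly_equal = (total == 0.3)

# Near equality succeeds because the stored values differ only slightly
nearly_equal = math.isclose(total, 0.3)

print(repr(total))
print(strictly_equal, nearly_equal)

# 0.5, 0.25 and 0.75 have power-of-two denominators, so they are stored exactly
exact_sum_ok = (0.5 + 0.25 == 0.75)
print(exact_sum_ok)
```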
References
Beazley, David M., and Brian K. Jones. 2014. Python Cookbook: Recipes for Mastering Python 3. Third edition. O’Reilly Media.
Note: “double” is short for “double precision floating point.” In other programming languages, the programmer might choose how many digits of precision he or she wants.