Chapter 5 R’s lists Versus Python’s lists and dicts
When you need to store elements in a container, but you can’t guarantee that these elements all have the same type, or you can’t guarantee that they all have the same size, then you need a list in R. In Python, you might need a list or dict (short for dictionary) (Lutz 2013).
5.1 lists In R
lists are one of the most flexible data types in R. You can access individual elements in many different ways, each element can be of a different size, and each element can be of a different type.
myList <- list(c(1,2,3), "May 5th, 2021", c(TRUE, TRUE, FALSE))
myList[1] # length-1 list; first element is length 3 vector
## [[1]]
## [1] 1 2 3
myList[[1]] # length-3 vector
## [1] 1 2 3
If you want to extract an element, you need to decide between using single square brackets or double square brackets. The former returns a list, while the latter returns the element itself, with whatever type that element has.
You can also name the elements of a list. This can lead to more readable code. To see why, examine the example below that makes use of some data about cars (“SAS Viya Example Data Sets” 2021). The lm() function estimates a linear regression model. It returns a list with plenty of components.
dataSet <- read.csv("data/cars.csv")
results <- lm(log(Horsepower) ~ Type, data = dataSet)
length(results)
## [1] 13
# names(results) # try this <-
results$contrasts
## $Type
## [1] "contr.treatment"
results['rank']
## $rank
## [1] 6
results is a list (because is.list(results) returns TRUE), but to be more specific, it is an S3 object of class lm. If you do not know what this means, do not worry! S3 classes are discussed more in a later chapter. Why is this important? For one, I mention it so that you aren’t confused if you type class(results) and see lm instead of list. Second, the fact that the authors of lm() wrote code that returns results as a “fancy list” suggests that they are encouraging another way to access elements of results: to use specialized functions! For example, you can use residuals(results), coefficients(results), and fitted.values(results). These functions do not work for all lists in R, but when they do work (for lm and glm objects, among other model classes), you can be sure you are writing the kind of code that is encouraged by the authors of lm().
5.2 lists In Python
Python lists are very flexible, too. There are fewer choices for accessing and modifying elements of lists in Python; you’ll most likely end up using the square bracket operator. Elements can be different sizes and types, just like they were with R’s lists.
Unlike in R, however, you cannot name elements of lists. If you want a container that allows you to access elements by name, look into Python dictionaries (see section 5.3) or Pandas’ Series objects (see section 3.2).
From the example below, you can see that we’ve been introduced to lists already. We have been constructing Numpy arrays from them.
import numpy as np
another_list = [np.array([1,2,3]), "May 5th, 2021", True, [42,42]]
another_list[2]
## True
another_list[2] = 100
another_list
## [array([1, 2, 3]), 'May 5th, 2021', 100, [42, 42]]
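The square bracket operator handles both indexing and slicing. As a rough parallel to R's double and single brackets, indexing returns the element itself, while slicing returns a new list. The following is a minimal sketch; the list mixed and the other variable names are made up for illustration.

```python
# A small list with mixed types (hypothetical example data)
mixed = ["May 5th, 2021", 3.14, True, [42, 42]]

first_elem = mixed[0]     # indexing: returns the element itself (a str)
first_slice = mixed[0:1]  # slicing: returns a new list of length 1
last_elem = mixed[-1]     # negative indices count from the end
middle = mixed[1:3]       # positions 1 and 2 (the stop index is excluded)

print(type(first_elem), first_elem)
print(type(first_slice), first_slice)
print(last_elem)
print(middle)
```

Note that slicing never raises an error for out-of-range bounds, whereas indexing past the end of a list raises an IndexError.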
Python lists have methods attached to them, which can come in handy.
another_list
## [array([1, 2, 3]), 'May 5th, 2021', 100, [42, 42]]
another_list.append('new element')
another_list
## [array([1, 2, 3]), 'May 5th, 2021', 100, [42, 42], 'new element']
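Besides append(), a few other list methods come up often. The sketch below uses a made-up list called letters to show some of them; most of these methods modify the list in place rather than returning a new one.

```python
letters = ['a', 'b']            # hypothetical example list

letters.extend(['c', 'd'])      # add several elements at the end
letters.insert(0, 'z')          # insert 'z' at position 0
removed = letters.pop()         # remove and return the last element
letters.remove('z')             # remove the first occurrence of 'z'
position = letters.index('b')   # position of the first 'b'

print(letters)    # ['a', 'b', 'c']
print(removed)    # d
print(position)   # 1
```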
Creating lists can be done as above, with the square bracket operators. They can also be created with the list() function, and by creating a list comprehension. List comprehensions are discussed more in section 11.2.
my_list = list(('a','b','c')) # converting a tuple to a list
your_list = [i**2 for i in range(3)] # list comprehension
my_list
## ['a', 'b', 'c']
your_list
## [0, 1, 4]
The code above makes reference to a type that is not extensively discussed in this text: tuples.
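For readers who have not met tuples before, here is a brief sketch of how they differ from lists: a tuple is an ordered container like a list, but it cannot be modified after it is created. The names point, as_list, and message are made up for illustration.

```python
point = (3, 4)            # a tuple, written with parentheses
x, y = point              # tuples can be unpacked into variables

try:
    point[0] = 10         # tuples do not support item assignment
except TypeError as err:
    message = str(err)    # keep the error text for inspection

as_list = list(point)     # convert to a list when you need to modify
as_list[0] = 10

print(x, y)
print(message)
print(as_list)
```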
5.3 Dictionaries In Python
Dictionaries in Python provide a container of key-value pairs. The keys are unique, and they must be hashable, which in practice usually means immutable. strs are the most common key type, but ints can be used as well.
Here is an example of creating a dict with curly braces (i.e. {}). This dict stores the current price of a few popular cryptocurrencies. Accessing an individual element’s value using its key is done with the square bracket operator (i.e. []), and deleting elements is done with the del keyword.
crypto_prices = {'BTC': 38657.14, 'ETH': 2386.54, 'DOGE': .308122}
crypto_prices['DOGE'] # get the current price of Dogecoin
## 0.308122
del crypto_prices['BTC'] # remove the current price of Bitcoin
crypto_prices.keys()
## dict_keys(['ETH', 'DOGE'])
crypto_prices.values()
## dict_values([2386.54, 0.308122])
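A few other everyday dictionary operations are sketched below: testing for a key, looking up a key that may be missing, adding or overwriting a pair, and looping over the pairs. The dict prices and the values assigned to it are made up for illustration.

```python
prices = {'ETH': 2386.54, 'DOGE': 0.308122}   # hypothetical example data

has_btc = 'BTC' in prices            # membership tests look at the keys
btc_price = prices.get('BTC')        # .get() returns None for a missing key
ltc_price = prices.get('LTC', 0.0)   # ...or a default that you supply

prices['ETH'] = 2400.00              # assigning to an existing key overwrites
prices['ADA'] = 1.25                 # assigning to a new key adds a pair

# .items() yields (key, value) pairs in insertion order
for ticker, price in prices.items():
    print(ticker, price)
```

Using square brackets with a missing key (e.g. prices['BTC']) raises a KeyError, which is why .get() is handy when you are not sure a key is present.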
You can also create dicts using dictionary comprehensions. Just like list comprehensions, these are discussed more in section 11.2.
incr_cryptos = {key:val*1.1 for (key,val) in crypto_prices.items()}
incr_cryptos
## {'ETH': 2625.194, 'DOGE': 0.3389342}
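The restriction on keys mentioned at the start of this section can be checked directly. More precisely, Python requires keys to be hashable, which built-in immutable types such as str, int, and tuples of those are, and which lists are not. This is a minimal sketch with a made-up dict called grid.

```python
grid = {}                      # hypothetical dict with several key types
grid[(0, 1)] = 'tuple key'     # a tuple of ints is hashable, so this works
grid[7] = 'int key'
grid['name'] = 'str key'

try:
    grid[[0, 1]] = 'list key'  # a list is mutable and unhashable
except TypeError as err:
    error_message = str(err)   # keep the error text for inspection

print(len(grid))
print(error_message)
```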
Personally, I don’t use dictionaries as much as lists. If I have a dictionary, I usually convert it to a Pandas data frame (more information on those in 8.2).
5.4 Exercises
5.4.1 R Questions
Consider the data sets "adult.data", "car.data", "hungarian.data", "iris.data", "long-beach-va.data" and "switzerland.data" (Janosi et al. 1988), (Fisher 1988), (“Adult” 1996) and (“Car Evaluation” 1997) hosted by (Dua and Graff 2017). Read all of these in and store them all as a list of data.frames. Call the list listDfs.
Here are two lists in R:
Make a new list that is these two lists above “squished together.” It has to be length \(4\), and each element is one of the elements of l1 and l2. Call this list l3. Make sure to delete all the “tags” or “names” of these four elements.
Extract the third element of l3 as a length one list and assign it to the name l4.
Extract the third element of l3 as a vector and assign it to the name v1.
5.4.2 Python Questions
Read in car.data with pd.read_csv(), and use a DataFrame method to convert that to a dict. Store your answer as car_dict.
Here are two dicts in Python:
Make a new list that is these two dicts above “squished together” (why can’t it be another dict?) It has to be length \(4\), and each value is one of the values of \(d1\) and \(d2\). Call this list my_list.
Use a list comprehension to create a list called special_list of all numbers starting from zero, up to (and including) one million, but don’t include numbers that are divisible by any prime number less than seven.
Assign the average of all elements in the above list to the variable special_ave.
References
“Adult.” 1996. UCI Machine Learning Repository.
“Car Evaluation.” 1997. UCI Machine Learning Repository.
Dua, Dheeru, and Casey Graff. 2017. “UCI Machine Learning Repository.” University of California, Irvine, School of Information; Computer Sciences. http://archive.ics.uci.edu/ml.
Fisher, R. A. 1988. “Iris.” UCI Machine Learning Repository.
Janosi, Andras, William Steinbrunn, Matthias Pfisterer, and Robert Detrano. 1988. “Heart Disease.” UCI Machine Learning Repository.
Lutz, Mark. 2013. Learning Python. 5th ed. Beijing: O’Reilly. https://www.safaribooksonline.com/library/view/learning-python-5th/9781449355722/.
“SAS Viya Example Data Sets.” 2021. https://support.sas.com/documentation/onlinedoc/viya/examples.htm.