Chapter 5 R’s lists Versus Python’s lists and dicts

When you need to store elements in a container, but you can’t guarantee that these elements all have the same type, or you can’t guarantee that they all have the same size, then you need a list in R. In Python, you might need a list or dict (short for dictionary) (Lutz 2013).

5.1 lists In R

lists are one of the most flexible data types in R. You can access individual elements in many different ways, each element can be of a different size, and each element can be of a different type.

If you want to extract an element, you need to decide between using single square brackets or double square brackets. The former returns a list, while the latter returns the element itself, with whatever type that element has.

You can also name the elements of a list. This can lead to more readable code. To see why, examine the example below that makes use of some data about cars (“SAS Viya Example Data Sets” 2021). The lm() function estimates a linear regression model. It returns a list with plenty of components.

results is a list (because is.list(results) returns TRUE), but to be more specific, it is an S3 object of class lm. If you do not know what this means, do not worry! S3 classes are discussed more in a later chapter. Why is this important? For one, I mention it so that you aren’t confused if you type class(results) and see lm instead of list. Second, the fact that the authors of lm() wrote code that returns results as a “fancy list” suggests that they are encouraging another way to access its elements: to use specialized functions! For example, you can use residuals(results), coefficients(results), and fitted.values(results). These functions do not work for all lists in R, but when they do work (for example, on lm and glm objects), you can be sure you are writing the kind of code that is encouraged by the authors of lm().

5.2 lists In Python

Python lists are very flexible, too. There are fewer choices for accessing and modifying elements of lists in Python; you’ll most likely end up using the square bracket operator. Elements can be different sizes and types, just like they were with R’s lists.

Unlike in R, however, you cannot name elements of lists. If you want a container that allows you to access elements by name, look into Python dictionaries (see section 5.3) or Pandas’ Series objects (see section 3.2).

From the example below, you can see that we’ve been introduced to lists already. We have been constructing Numpy arrays from them.
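As a minimal sketch of the points above, the following builds a list whose elements differ in type and size, then reads and modifies it with the square bracket operator. All names and values are illustrative assumptions. A list literal like this one is also the kind of object we have been passing to Numpy when constructing arrays.

```python
# A list can mix types and sizes; every name and value here is illustrative.
mixed = [1, "two", 3.0, [4, 5, 6]]

first = mixed[0]       # indexing starts at zero
last = mixed[-1]       # negative indices count from the end
middle = mixed[1:3]    # slicing returns a new list (end index excluded)

mixed[1] = 2           # lists are mutable, so elements can be reassigned

print(first, last, middle, mixed)
```

Note that the slice stored in middle is a separate list, so reassigning mixed[1] afterwards does not change it.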

Python lists have methods attached to them, which can come in handy.
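Here is a short sketch of a few commonly used list methods. The list name and its contents are made up for illustration; the methods themselves are part of the built-in list type.

```python
scores = [3, 1, 2]            # illustrative data

scores.append(5)              # add one element to the end
scores.extend([4, 1])         # add every element of another list
removed = scores.pop()        # remove and return the last element
scores.sort()                 # sort in place (returns None)

position = scores.index(4)    # position of the first occurrence of 4
how_many = scores.count(1)    # number of times 1 appears

print(scores, removed, position, how_many)
```

Most of these methods change the list in place rather than returning a new list, which differs from the way most R functions behave.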

Creating lists can be done as above, with the square bracket operators. They can also be created with the list() function, and by creating a list comprehension. List comprehensions are discussed more in 11.2.
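The sketch below shows the two other ways of creating lists just mentioned. The variable names are illustrative assumptions.

```python
# list() converts any iterable into a list
from_tuple = list(("a", "b", "c"))   # from a tuple
from_range = list(range(5))          # from a range object

# A list comprehension builds a list from an expression and a loop
squares = [n ** 2 for n in range(5)]
evens = [n for n in range(10) if n % 2 == 0]   # with a filtering condition
pairs = [(n, n ** 2) for n in range(3)]        # each element is a tuple

print(from_tuple, from_range, squares, evens, pairs)
```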

The code above makes reference to a type that is not extensively discussed in this text: tuples.

5.3 Dictionaries In Python

Dictionaries in Python provide a container of key-value pairs. The keys are unique, and they must be hashable, which in practice means using immutable types. strings are the most common key type, but ints and tuples can be used as well. A list, being mutable, cannot be a key.

Here is an example of creating a dict with curly braces (i.e. {}). This dict stores the current price of a few popular cryptocurrencies. Accessing an individual element’s value using its key is done with the square bracket operator (i.e. []), and deleting elements is done with the del keyword.
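A minimal sketch of these operations follows. The ticker symbols are real, but the prices are made-up numbers used only for illustration, and the variable names are assumptions.

```python
# Hypothetical prices in US dollars, invented for this example
crypto_prices = {"BTC": 40000.0, "ETH": 2500.0, "DOGE": 0.25}

btc_price = crypto_prices["BTC"]       # look up a value by its key
crypto_prices["ETH"] = 2600.0          # overwrite an existing value
crypto_prices["LTC"] = 150.0           # assigning to a new key adds a pair
del crypto_prices["DOGE"]              # delete a key-value pair

remaining = list(crypto_prices.keys())
missing = crypto_prices.get("DOGE")    # .get() returns None for absent keys

print(btc_price, remaining, missing)
```

Using square brackets with a key that is not in the dict raises a KeyError, which is why the last lookup uses the get() method instead.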

You can also create dicts using dictionary comprehensions. Just like list comprehensions, these are discussed more in 11.2.
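As a brief preview, a dictionary comprehension looks like a list comprehension wrapped in curly braces, with a key: value expression in front. The names below are illustrative assumptions.

```python
words = ["list", "dict", "tuple"]   # illustrative data

# Map each word to its length
word_lengths = {w: len(w) for w in words}

# Keep only the pairs that satisfy a condition
long_words = {w: n for w, n in word_lengths.items() if n > 4}

print(word_lengths, long_words)
```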

Personally, I don’t use dictionaries as much as lists. If I have a dictionary, I usually convert it to a Pandas data frame (more information on those in 8.2).

5.4 Exercises

5.4.1 R Questions

Consider the data sets "adult.data", "car.data", "hungarian.data", "iris.data", "long-beach-va.data" and "switzerland.data" (Janosi et al. 1988), (Fisher 1988), (“Adult” 1996) and (“Car Evaluation” 1997) hosted by (Dua and Graff 2017). Read all of these in and store them all as a list of data.frames. Call the list listDfs.

Here are two lists in R:

  1. Make a new list that is these two lists above “squished together.” It has to be length \(4\), and each element is one of the elements of l1 and l2. Call this list l3. Make sure to delete all the “tags” or “names” of these four elements.

  2. Extract the third element of l3 as a length one list and assign it to the name l4.

  3. Extract the third element of l3 as a vector and assign it to the name v1.

5.4.2 Python Questions

Read in car.data with pd.read_csv(), and use a DataFrame method to convert that to a dict. Store your answer as car_dict.

Here are two dicts in Python:

  1. Make a new list that is these two dicts above “squished together” (why can’t it be another dict?). It has to be length \(4\), and each element is one of the values of d1 and d2. Call this list my_list.

  2. Use a list comprehension to create a list called special_list of all numbers starting from zero, up to (and including) one million, but don’t include numbers that are divisible by any prime number less than seven.

  3. Assign the average of all elements in the above list to the variable special_ave.

References

“Adult.” 1996. UCI Machine Learning Repository.

“Car Evaluation.” 1997. UCI Machine Learning Repository.

Dua, Dheeru, and Casey Graff. 2017. “UCI Machine Learning Repository.” University of California, Irvine, School of Information; Computer Sciences. http://archive.ics.uci.edu/ml.

Fisher, R. A. 1988. “Iris.” UCI Machine Learning Repository.

Janosi, Andras, William Steinbrunn, Matthias Pfisterer, and Robert Detrano. 1988. “Heart Disease.” UCI Machine Learning Repository.

Lutz, Mark. 2013. Learning Python. 5th ed. Beijing: O’Reilly. https://www.safaribooksonline.com/library/view/learning-python-5th/9781449355722/.