Chapter 4 Numpy ndarrays Versus R’s matrix and array Types

Sometimes you want a collection of elements that are all the same type, but you want to store them in a two- or three-dimensional structure. For instance, say you need to use matrix multiplication for some linear regression software you’re writing, or that you need to use tensors for a computer vision project you’re working on.

4.1 Numpy ndarrays In Python

In Python, you could still use arrays for these kinds of tasks. You will be pleased to learn that the Numpy arrays we discussed earlier are a special case of Numpy’s N-dimensional arrays. Each array will come with an enormous amount of methods and attributes (more on object-oriented program in chapter 14) attached to it. A few are demonstrated below.

Matrix and elementwise multiplication is often useful, too.

I should mention that there is also a matrix type in Numpy; however, this is not described in this text because it is preferable to work with Numpy arrays (Albon 2018).

In both R and Python, there are matrix types and array types. In R, it is more common to work with matrixs than arrays, and the opposite is true in Python!

4.2 The matrix and array classes in R

In Python, adding a dimension to your “container” is simple. You keep using Numpy arrays, and you just change the .shape attribute (perhaps with a call to .reshape() or something similar). In R, there is a stronger distinction between 1-,2-, and 3-dimensional containers. Each has its own class. 2-dimensional containers that store objects of the same type are of the matrix class. Containers with 3 or more dimensions are of the array class7. In this section, I will provide a quick introduction to using these two classes. For more information, see chapter 3 of (Matloff 2011).

Just like vectors, matrix objects do not necessarily have to be used to perform matrix arithmetic. Yes, they require all the elements are of the same type, but it doesn’t really make sense to “multiply” matrix objects that hold onto characters.

I usually create matrix objects with the matrix() function or the as.matrix() function. matrix() is to be preferred in my opinion. The first argument is explicitly a vector of all the flattened data that you want in your matrix. On the other hand, as.matrix() is more flexible; it takes in a variety of R objects (e.g. data.frames), and tries to figure out what to do with them on a case-by-case basis. In other words, as.matrix() is a generic function. More information about generic functions is provided in 14.2.2.

Some other things to remember with matrix(): byrow= is FALSE by default, and you will also need to specify either ncol= and/or nrow= if you want anything that isn’t a 1-column matrix.

array() is used to create array objects. This type is used less than the matrix type, but this doesn’t mean you should avoid learning about it. This is mostly a reflection of what kind of data sets people prefer to work with, and the fact that matrix algebra is generally better understood than tensor algebra. You won’t be able to avoid 3-d data sets (3-dimensions, not a 3-column matrix) forever, though, particularly if you’re working in an area such as neuroimaging or computer vision.

You can matrix-multiply matrix objects together with the %*% operator. If you’re working on this, then the transpose operator (i.e. t()) comes in handy, too. You can still use element-wise (Hadamard) multiplication. This is defined with the more familiar multiplication operator *.

Sometimes you need to access or modify individual elements of a matrix object. You can use the familiar [ and [<- operators to do this. Here is a setting example. You don’t need to worry about coercion to different types here.

Here are some extraction examples. Notice that, if it can, [ will coerce a matrix to vector. If you wish to avoid this, you can specify drop=FALSE.

There are other functions that operate on one or more matrix objects in more interesting ways, but much of this will be covered in future sections. For instance, we will describe how apply() works with matrixs in section 15, and we will discuss combining matrix objects in different ways in section 12.

4.3 Exercises

4.3.1 R Questions

Consider the following data set. Let \(N = 20\) be the number of rows. For \(i=1,\ldots,N\), define \(\mathbf{x}_i \in \mathbb{R}^4\) as the data in row \(i\).

For the following problems, make sure to only use the transpose function t(), matrix multiplication (i.e. %*%), and scalar multiplication/division. You may use other functions in interactive mode to check your work, but please do not use them in your submission.

  1. Calculate the sample mean \(\bar{\mathbf{x}} = \frac{1}{N} \sum_{i=1}^N \mathbf{x}_i\). Check your work with colMeans(), but don’t use that function in your submitted code. Assign it to the variable xbar. Make sure it is a \(4 \times 1\) matrix object.

  2. Calculate the \(4 \times 4\) sample covariance of the following data. Call the variable mySampCov, and make sure it is also a matrix object. You can check your work with cov(), but don’t use it in your submitted code. A formula for the sample covariance is \[\begin{equation} \frac{1}{N-1} \sum_{i=1}^N (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^\intercal \end{equation}\]

Create a matrix called P that has one hundred rows, one hundred columns, all of its elements nonnegative, \(1/10\) on every diagonal element, and all rows summing to one. This matrix is called stochastic and it describes how a Markov chain moves randomly through time.

Create a matrix called X that has one thousand rows, four columns, has every element set to either \(0\) or \(1\), has its first column set to all \(1\)s, has the second column set to \(1\) in the second \(250\) elements and \(0\) elsewhere, has the third column set to \(1\) in the third \(250\) spots and \(0\) elsewhere, and has the fourth column set to \(1\) in the last \(250\) spots and \(0\) elsewhere. In other words, it looks something like

\[\begin{equation} \begin{bmatrix} \mathbf{1}_{250} & \mathbf{0}_{250} & \mathbf{0}_{250} & \mathbf{0}_{250} \\ \mathbf{1}_{250} & \mathbf{1}_{250} & \mathbf{0}_{250} & \mathbf{0}_{250} \\ \mathbf{1}_{250} & \mathbf{0}_{250} & \mathbf{1}_{250} & \mathbf{0}_{250} \\ \mathbf{1}_{250} & \mathbf{0}_{250} & \mathbf{0}_{250} & \mathbf{1}_{250} \\ \end{bmatrix} \end{equation}\] where \(\mathbf{1}_{250}\) and \(\mathbf{0}_{250}\) are length \(250\) column vectors with all of their elements set to \(1\) or \(0\), respectively.

  1. Compute the projection (or hat) matrix \(\mathbf{H} := \mathbf{X}\left(\mathbf{X}^\intercal \mathbf{X}\right)^{-1} \mathbf{X}^\intercal\). Make it a matrix and call it H.

  2. An exchangeable covariance matrix for a random vector is a covariance matrix that has all the same variances, and all the same covariances. In other words, it has two unique elements: the diagonal elements should be the same, and the off-diagonals should be the same. In R, generate ten \(100 \times 100\) exchangeable covariance matrices, each with \(2\) as the variance, and have the possible covariances take values in the collection \(0,.01,.02, ..., .09.\) Store these ten covariance matrices in a three-dimensional array. The first index should be each matrix’s row index, the second should be the column index of each matrix, and the third index should be the “layer” or “slice” indicating which of the \(10\) matrices you have. Name this array myCovMats

  3. In R, generate one hundred \(10 \times 10\) exchangeable covariance matrices, each with \(2\) as the variance, and have the possible covariances take values in the collection \(0,0.0009090909, ..., 0.0890909091, .09.\) Store these \(100\) covariance matrices in a three-dimensional array. The first index should be each matrix’s row index, the second should be the column index of each matrix, and the third index should be the “layer” or “slice” indicating which of the \(100\) matrices you have. Name this array myCovMats2

4.3.2 Python Questions

Let \(\mathbf{X}\) be an \(n \times 1\) random vector. It has a multivariate normal distribution with mean vector \(\mathbf{m}\) and positive definite covariance matrix \(\mathbf{C}\) if its probability density function can be written as

\[\begin{equation} f(\mathbf{x}; \mathbf{m}, \mathbf{C}) = (2\pi)^{-n/2}\text{det}\left( \mathbf{C} \right)^{-1/2}\exp\left[- \frac{1}{2} (\mathbf{x}- \mathbf{m})^\intercal \mathbf{C}^{-1} (\mathbf{x}- \mathbf{m}) \right] \end{equation}\]

Evaluating this density should be done with care. There is no one function that is optimal for all situations. Here are a couple quick things to consider.

  • Inverting very large matrices with either np.linalg.solve or np.linalg.inv becomes very slow if the covariance matrix is high-dimensional. If you have special assumptions about the structure of the covariance matrix, use it! Also, it’s a good idea to be aware of what happens when you try to invert noninvertible matrices. For instance, can you rely on errors to be thrown, or will it return a bogus answer?

  • Recall from the last lab that exponentiating numbers close to \(-\infty\) risks numerical underflow. It’s better to prefer evaluating log densities (base \(e\), the natural logarithm). There are also special functions that evaluate log determinants that are less likely to underflow/overflow, too!

Complete the following problems. Do not use pre-made functions such as scipy.stats.norm and scipy.stats.multivariate_normal in your submission, but you may use them to check your work. Use only “standard” functions and Numpy n-dimensional arrays. Use the following definitions for \(\mathbf{x}\) and \(\mathbf{m}\):

  1. Let \(\mathbf{C} = \begin{bmatrix} 10 & 0 & 0 \\ 0 & 10 & 0 \\ 0 & 0 & 10 \end{bmatrix}\). Evaluate and assign the log density to a float-like called log_dens1. Can you do this without defining a numpy array for \(\mathbf{C}\)?

  2. Let \(\mathbf{C} = \begin{bmatrix} 10 & 0 & 0 \\ 0 & 11 & 0 \\ 0 & 0 & 12 \end{bmatrix}\). Evaluate and assign the log density to a float-like called log_dens2. Can you do this without defining a numpy array for \(\mathbf{C}\)?

  3. Let \(\mathbf{C} = \begin{bmatrix} 10 & -.9 & -.9 \\ -.9 & 11 & -.9 \\ -.9 & -.9 & 12 \end{bmatrix}\). Evaluate and assign the log density to a float-like called log_dens3. Can you do this without defining a numpy array for \(\mathbf{C}\)?

Consider this wine data set from (Cortez et al. 2009) hosted by (Dua and Graff 2017). Read it in with the following code. Note that you might need to use os.chdir() first.

  1. Create the design matrix (denoted mathematically by \(\mathbf{X}\)) by removing the "quality" column, and subtracting the column mean from each element. Call the variable X, and make sure that it is a Numpy ndarray, not a Pandas DataFrame.

  2. Compute the spectral decomposition of \(\mathbf{X}^\intercal \mathbf{X}\). In other words, find “special” matrices8 \(\mathbf{V}\) and \(\boldsymbol{\Lambda}\) such that \(\mathbf{X}^\intercal \mathbf{X} = \mathbf{V} \boldsymbol{\Lambda} \mathbf{V}^\intercal\). Note that the eigenvectors are stored as columns in a matrix \(\mathbf{V} := \begin{bmatrix} \mathbf{V}_1 & \cdots & \mathbf{V}_{11}\end{bmatrix}\), and the scalar eigenvalues are stored as diagonal elements \(\boldsymbol{\Lambda} = \text{diag}(\lambda_1, \ldots, \lambda_{11})\). Store the eigenvectors in an ndarray called eig_vecs, and store the eigenvalues in a Numpy array called eig_vals. Hint: use np.linalg.eig(). Also, if you’re rusty with your linear algebra, don’t worry too much about refreshing your memory about what eigenvectors and eigenvalues are.

  3. Compute the singular value decomposition of \(\mathbf{X}\). In other words, find “special”9 matrices \(\mathbf{U}\), \(\mathbf{\Sigma}\), and \(\mathbf{V}\) such that \(\mathbf{X} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^\intercal\). Use np.linalg.svd, and don’t worry too much about the mathematical details. These two decompositions are related. If you do it correctly, the two \(\mathbf{V}\) matrices should be the same, and the elements of \(\boldsymbol{\Sigma}\) should be the square roots of the elements of \(\boldsymbol{\Lambda}\). Store the eigenvectors as columns in an ndarray called eig_vecs_v2, and store the singular values (diagonal elements of \(\boldsymbol{\Sigma}\)) in a Numpy array called sing_vals.

  4. Compute the first principal component vector, and call it first_pc_v1. The mathematical formula is \(\mathbf{X} \mathbf{U}_1\) where \(\mathbf{U}_1\) is the eigenvector associated with the largest eigenvalue \(\lambda_1\). This can be thought of as, in a sense, the most informative predictor that you can create by averaging together all other predictors.

References

Albon, Chris. 2018. Machine Learning with Python Cookbook: Practical Solutions from Preprocessing to Deep Learning. 1st ed. O’Reilly Media, Inc.

Cortez, Paulo, António Cerdeira, Fernando Almeida, Telmo Matos, and José Reis. 2009. “Modeling Wine Preferences by Data Mining from Physicochemical Properties.” Decis. Support Syst. 47 (4): 547–53. http://dblp.uni-trier.de/db/journals/dss/dss47.html#CortezCAMR09.

Dua, Dheeru, and Casey Graff. 2017. “UCI Machine Learning Repository.” University of California, Irvine, School of Information; Computer Sciences. http://archive.ics.uci.edu/ml.

Matloff, Norman. 2011. The Art of R Programming: A Tour of Statistical Software Design. 1st ed. USA: No Starch Press.


  1. Technically, the distinction between all of these containers is more subtle. An array in R can have one, two or more dimensions, and it is just a vector which is stored with additional dimension attributes. Moreover, a 2-dimensional array is the same as a matrix.

  2. Do not worry too much about the properties of these matrices for this problem

  3. Again, do not worry too much about the properties of these matrices for this problem.