Teaching tensor products in a 2nd linear algebra course


I want to introduce tensor products in a 2nd-year linear algebra course. The demographics are typical: mathematics, physics, computer science, etc. students who have already had a course focusing on linear systems. The syllabus is also typical: we will follow Linear Algebra Done Right, covering the basics of vector spaces, then eigenthings/diagonalization + inner products. We skip quotient spaces.

This typical syllabus, however, doesn't cover tensor products. The case for deviating from it is strong, given the ubiquitous applications: in pure math, in quantum-mechanical entanglement, in vector fields in differential geometry, in many CS applications (in particular machine learning), etc. (the latter also justifies it for the other applied sciences).

I see two usual ways to introduce tensors: either through a lens that is quite abstract for this level (the universal property, bilinear forms, and then a presentation of $V\otimes W$), or an extremely basis-oriented approach which describes tensors as multidimensional arrays following a certain rule for their product, like this. I feel there should be a middle way between these two which could benefit students taking a 2nd course in linear algebra.

I think this actually points at an interesting gap in mathematical foundations. To me the cleanest bare-bones way of describing the tensor product is simply the following: an element of $V \otimes W$ is a formal sum of formal products $v \otimes w$ of elements $v \in V$ and $w \in W$, and two such formal sums are equal iff they can be transformed into each other by a sequence of applications of the bilinearity axioms.

This does not require saying anything about universal properties or constructing the tensor product using free vector spaces, which I've come to suspect is ultimately a hack to get around foundational issues: don't you find it inelegant that to construct the tensor product of two finite-dimensional vector spaces $V, W$, which is also finite-dimensional, we need to first pass to a vastly infinite-dimensional intermediate vector space $F(V \times W)$, before quotienting it by a vastly infinite-dimensional subspace? This intermediate space is very confusing to many students and we immediately discard it anyway. So wouldn't it be better to find an approach that avoids it?

What the above approach requires instead is altering our conception of what a set is. Namely, we need to use Bishop sets. Quoting the nLab:

In (Bishop) the notion of set is specified by stating that a set has to be given by a description of how to build elements of this set and by giving a binary relation of equality, which has to be an equivalence relation.

This is what we've done above: we've described how to build elements of the tensor product and we've described what it means for two elements of the tensor product to be equal. So we've built a Bishop set. If you like, this is a way to collapse the two-step construction (first construct $F(V \times W)$, then quotient by the axiom subspace) into a single step (so we never have to discuss the intermediate $F(V \times W)$ directly), at the expense of producing something that isn't a set in the ordinary sense.

But a Bishop set tells us exactly what we need to perform calculations! So I would argue that this is meaningfully less abstract than the usual approaches while also being closer to mathematical practice. Many sets in mathematics are naturally Bishop sets: for example $\mathbb{Q}$ (you construct an element by writing down a fraction $\frac{n}{m}$, and two fractions are equal iff etc.), $\mathbb{R}$ (you construct an element by writing down a Cauchy sequence, and two Cauchy sequences are equal iff etc.), and so on. Of course, I've never been in a position to try teaching anyone anything from this perspective, so I can't speak to how well it works in practice.
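For concreteness, here is a minimal Python/NumPy sketch of this recipe (my own illustration, not part of the argument above), assuming $V=\mathbb{R}^n$ and $W=\mathbb{R}^m$ with their standard bases: an element is built as a finite list of pairs, read as a formal sum of elementary tensors, and the equality relation is decided by comparing coordinate matrices, which is equivalent to being related by the bilinearity axioms.

```python
import numpy as np

# A "Bishop set" presentation of V (x) W for V = R^n, W = R^m:
# an element is built as a list of pairs (v, w), read as the formal sum  sum_i v_i (x) w_i.
Tensor = list  # list of (np.ndarray, np.ndarray) pairs

def normal_form(t: Tensor) -> np.ndarray:
    """Coordinate matrix of the formal sum w.r.t. the standard bases: sum_i v_i w_i^T."""
    return sum(np.outer(v, w) for v, w in t)

def equal(s: Tensor, t: Tensor) -> bool:
    """The equality relation: two formal sums are equal iff the bilinearity
    axioms transform one into the other, i.e. iff their coordinate matrices agree."""
    return np.allclose(normal_form(s), normal_form(t))

v1, v2, w = np.array([1., 0.]), np.array([0., 1.]), np.array([2., 3.])
# (v1 + v2) (x) w  ==  v1 (x) w + v2 (x) w
print(equal([(v1 + v2, w)], [(v1, w), (v2, w)]))  # True
# v1 (x) w  !=  v2 (x) w
print(equal([(v1, w)], [(v2, w)]))                # False
```

Of course, deciding equality by passing to coordinate matrices is exactly the "multidimensional array" picture re-entering, now as a computation rather than as the definition.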

This train of thought was prompted by a remark in a paper about geometric algebra (I think by Hestenes) that Clifford algebras can be defined in this style, in a way that avoids an explicit discussion of tensor products, universal properties, etc.: you just say that an element of $\text{Cl}(V, q)$ is a noncommutative polynomial in elements of $V$, and that two such elements are equal iff they can be transformed into each other by a sequence of applications of the associative algebra axioms together with the additional axiom $v^2 = q(v)$.
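As a quick illustration of that style (a standard computation, not specific to any one paper): in $\text{Cl}(\mathbb{R}^2, q)$ with $q$ the standard quadratic form and $e_1, e_2$ orthonormal, expanding $(e_1+e_2)^2 = q(e_1+e_2) = 2$ using the algebra axioms gives
$$e_1^2 + e_1 e_2 + e_2 e_1 + e_2^2 = 2 \quad\Longrightarrow\quad e_1 e_2 = -e_2 e_1,$$
so the familiar anticommutation relations are literally consequences of the equality relation generated by $v^2 = q(v)$ on noncommutative polynomials.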

I particularly like the definition given in Halmos's Finite Dimensional Vector Spaces book; the tensor product of vector spaces $U$ and $V$ is defined as $\mathcal{B}(U, V)^*$, the dual space of the space of bilinear forms on $U$ and $V$.

Such a construction is very concrete: it's essentially immediate that it satisfies the universal property, and it's also immediate that the dimension of $U \otimes V$ is $\dim U \cdot \dim V$. Yet it doesn't require choosing bases for $U$ or $V$ (so no "well-definedness" questions arise), nor does it raise any questions about what "formal linear combinations" mean or why you can simply postulate "bilinearity relations".
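For instance, here is a small NumPy sketch (my own, hypothetical illustration with $U = \mathbb{R}^m$, $V = \mathbb{R}^n$, and bilinear forms stored as their $m\times n$ matrices) of the elementary tensor $u\otimes v$ as the evaluation functional $B \mapsto B(u,v)$:

```python
import numpy as np

def elementary_tensor(u, v):
    """u (x) v as a linear functional on bilinear forms:
    a bilinear form on R^m x R^n is represented by its m x n matrix B,
    acting via (u, v) -> u^T B v."""
    return lambda B: u @ B @ v

m, n = 3, 2
rng = np.random.default_rng(0)
u, u2, v = rng.normal(size=m), rng.normal(size=m), rng.normal(size=n)
B = rng.normal(size=(m, n))   # an arbitrary bilinear form

# Bilinearity in the first slot holds automatically, value by value:
lhs = elementary_tensor(u + u2, v)(B)
rhs = elementary_tensor(u, v)(B) + elementary_tensor(u2, v)(B)
print(np.isclose(lhs, rhs))  # True: (u+u') (x) v = u (x) v + u' (x) v
```

No quotient and no well-definedness check appear anywhere: the bilinearity identities hold simply because the two sides are functionals taking equal values on every $B$.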

I also like that this seems to mesh very nicely with the common "physicists'" perspective that a tensor of valence $(k, l)$ is a multilinear function of $k$ covectors and $l$ vectors (which naturally leads to the "multi-dimensional array of numbers" idea, just by picking a basis).

The largest subtlety with this approach is that it requires being comfortable with dual spaces, which you may not cover, and which are themselves a topic students don't always understand well.

Since this topic is going to be the hardest one in the course, be sure you have a good reading reference, as tensor products are not in your chosen textbook (Axler). [Edit: see update at the end of the next paragraph.] And make sure you will have time to do something with tensor products: solve several problems using them that are not directly about them.

My experience teaching tensor products is with graduate students. For an undergraduate course aimed at a mixed audience of math, physics, and CS majors, I like the idea of using the approach in Halmos's book, which is mentioned in another answer: $V\otimes_K W$ is set to be $({\rm Bil}(V\times W,K))^*$. In terms of the real definition of tensor products, $({\rm Bil}(V\times W,K))^*$ is naturally isomorphic to $((V\otimes_K W)^*)^*$, so the fact that the Halmos approach can be okay relies heavily on $V$ and $W$ being finite-dimensional, together with double duality in the finite-dimensional case. [Edit: a comment below by Axler points out that the latest edition of his book has a treatment of tensor products at the end: the definition of $V\otimes_K W$ on page $372$ is ${\rm Bil}(V^* \times W^*,K)$, which is $(V^*\otimes W^*)^*$, and that is nearly what Halmos does, but perhaps a bit simpler technically. This new part of the book does not do anything with tensor product spaces besides study them for their own sake, as far as I could tell. So that leaves something to do in the next edition. :)]

Emphasize to students not to probe too much into what a tensor "is" but instead to learn its computational rules along with what a basis can be:

(i) for $v \in V$ and $w\in W$, the elementary tensor $v\otimes w$ is the map $B \mapsto B(v,w)$ for all bilinear $B:V\times W \to K$;

(ii) we have the formulas $(cv)\otimes w = c(v\otimes w)$ and $(v+v')\otimes w = v\otimes w+v'\otimes w$, as well as the two analogous rules for linear changes in the second "tensorand";

(iii) if $\{e_1,\ldots,e_m\}$ and $\{f_1,\ldots,f_n\}$ are any bases of $V$ and $W$, respectively, then $\{e_i\otimes f_j\}$ is a basis of $V\otimes_K W$.

Perhaps present the computational and basis rules and make some tensor calculations without first giving an official definition of $V\otimes_K W$, so that students get used to how tensor calculations work, and only then present the peculiar definition for finite-dimensional spaces in order to show that a tensor product space with its very overdetermined bilinear tensor rules actually makes sense and really has the desired basis construction property: for the bases $\{e_i\}$ and $\{f_j\}$, a basis of ${\rm Bil}(V\times W,K)$ is given by the $mn$ bilinear maps $\{B_{ij}\}$, where $B_{ij}$ is the unique bilinear map $V\times W\to K$ such that $B_{ij}(e_r,f_s) = \delta_{ir}\delta_{js}$ for all $r$ and $s$. Then check that each $t$ in $V\otimes_K W$ is $\sum c_{ij}\, e_i \otimes f_j$ where $c_{ij} = t(B_{ij})$, and that the $mn$ elementary tensors $\{e_i\otimes f_j\}$ are linearly independent.
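If the audience includes CS students, the coordinate-extraction step can be made tangible in a few lines of NumPy (my own sketch, assuming $V=\mathbb{R}^m$, $W=\mathbb{R}^n$, and bilinear forms stored as $m\times n$ matrices):

```python
import numpy as np

m, n = 3, 2
rng = np.random.default_rng(1)

def elem(v, w):
    """The elementary tensor v (x) w as an element of Bil(V x W, K)^*:
    it sends the bilinear form with matrix B to B(v, w) = v^T B w."""
    return lambda B: v @ B @ w

def matrix_unit(i, j):
    """B_ij: the bilinear form with B_ij(e_r, f_s) = delta_ir * delta_js."""
    E = np.zeros((m, n))
    E[i, j] = 1.0
    return E

# A sample element t of V (x) W, given as a sum of elementary tensors.
pairs = [(rng.normal(size=m), rng.normal(size=n)) for _ in range(4)]
t = lambda B: sum(elem(v, w)(B) for v, w in pairs)

# Read off the coordinates of t in the basis {e_i (x) f_j}: c_ij = t(B_ij).
C = np.array([[t(matrix_unit(i, j)) for j in range(n)] for i in range(m)])

# Consistency check: these coordinates are the entries of sum_k v_k w_k^T.
print(np.allclose(C, sum(np.outer(v, w) for v, w in pairs)))  # True
```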

There are two main examples, which I would start from.

Usually the vector spaces they deal with (like $\mathbb{R}^n$) are spaces of functions on a finite set $A$ (for $\mathbb{R}^n$, $A=\{1,2,\ldots,n\}$). For two such spaces with ground sets $A$ and $B$, the tensor product is just the space of functions on the Cartesian product $A\times B$, and $f\otimes g$, for functions $f, g$ on $A, B$ respectively, is defined by the formula $$(f\otimes g)(x, y)=f(x)g(y).\tag{$\heartsuit$}$$
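In coordinates (a sketch of my own, using the identification of functions on a finite set with vectors of their values), formula $(\heartsuit)$ is just the outer product:

```python
import numpy as np

# f: a function on A = {1,...,4}, g: a function on B = {1,...,3},
# stored as vectors of their values.
f = np.array([1., 2., 0., -1.])
g = np.array([3., 1., 2.])

# (f (x) g)(x, y) = f(x) * g(y): a function on A x B, i.e. a 4 x 3 array.
fg = np.outer(f, g)
print(fg.shape)                       # (4, 3)
print(np.isclose(fg[1, 2], f[1] * g[2]))  # True
```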

If $X$ is the space of linear polynomials in the variables $x_1,\ldots,x_n$ and $Y$ is the space of linear polynomials in the other variables $y_1,\ldots,y_m$, then $X\otimes Y$ is the space of bilinear polynomials in these $n+m$ variables $x_1,\ldots,x_n$ and $y_1,\ldots,y_m$. Again, formula $(\heartsuit)$ holds.
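A quick SymPy check (again my own illustration): multiplying a linear polynomial in the $x_i$ by a linear polynomial in the $y_j$ expands into a combination of the products $x_i y_j$, which play the role of the basis tensors $x_i\otimes y_j$.

```python
from sympy import symbols, expand

x1, x2, y1, y2, y3 = symbols('x1 x2 y1 y2 y3')

p = 2*x1 - x2          # an element of X: linear in the x's
q = y1 + 3*y2 - y3     # an element of Y: linear in the y's

print(expand(p*q))
# 2*x1*y1 + 6*x1*y2 - 2*x1*y3 - x2*y1 - 3*x2*y2 + x2*y3  (up to term order)
```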

Dimension and basis issues are easily understood.

These two examples are isomorphic, of course: enumerate $A=\{a_1,\ldots,a_n\}$ and define $x_i$ as a function on $A$ which takes the value 1 at $a_i$ and 0 otherwise. Actually all finite-dimensional examples are isomorphic to these two.

After that, more or less abstract general constructions (from formal linear combinations subject to some rules, up to the universal property, depending on the audience's background and the goals of the course) can be introduced.

I'd only add a general remark. I assume that the main objects in this course are finite-dimensional real or complex vector spaces. Since the spaces $V\otimes W$, $\text{Bil}(V\times W)$, $\text{Hom}(V,W)$, etc., with various combinations of duals and bi-duals, are all isomorphic, some students may find their introduction a sort of abstract obstinacy. Naturality and functoriality of these constructions of course provide a sufficient justification, but again on an abstract level that has to be motivated (e.g. by their use in differential geometry, though that is a bit far afield). To make these objects more distinguished and more concrete, it seems to me not a bad idea to carry out these constructions together with norms, proving that the various linear isomorphisms are in fact isometries, or at least computing bounds. This may require some more work and move the course slightly toward topics in numerical linear algebra, but that seems reasonable, given the demographics.
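One concrete instance of this suggestion (my own example, choosing the Euclidean norms on $V$ and $W$ and the Frobenius/Hilbert–Schmidt norm on $V\otimes W$ realized as matrices): the resulting norm is a cross norm, $\|v\otimes w\| = \|v\|\,\|w\|$, which students can verify both by hand and numerically.

```python
import numpy as np

rng = np.random.default_rng(2)
v, w = rng.normal(size=5), rng.normal(size=3)

# Realize v (x) w as the 5 x 3 matrix v w^T and use the Frobenius norm on V (x) W.
cross = np.linalg.norm(np.outer(v, w))             # ||v (x) w||_F
product = np.linalg.norm(v) * np.linalg.norm(w)    # ||v|| * ||w||
print(np.isclose(cross, product))  # True: the Frobenius norm is a cross norm
```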
