Comment by seanhunter 4 days ago
In general this is a cool article, but I worry when TFA observes that cosine similarity and the dot product are the same under certain conditions (specifically, that the vectors are unit vectors). It's really important that people who use these measures understand something as basic as this. It shouldn't need to be said in an article, but I feel (as the author obviously also does) that it does need to be said, because people are blindly using these measures without understanding them at all.
Cosine similarity literally comes from solving the geometric formula for the dot product of two Euclidean vectors to find cos theta, so of course it's the same. That is to say
a . b = |a||b| cos theta
where a and b are the two vectors and theta is the angle between them. Therefore
cos theta = (a . b)/(|a||b|)
TADA! cosine similarity.[1]
If the vectors are unit vectors (he calls this "normalization" in the article) then |a||b| = 1 so of course cos theta = a . b in that case.
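To make this concrete, here is a minimal NumPy sketch (the vectors are made up for illustration):

    import numpy as np

    a = np.array([3.0, 4.0, 0.0])
    b = np.array([1.0, 2.0, 2.0])

    # cos theta = (a . b) / (|a||b|)  -- cosine similarity
    cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    # Normalize to unit vectors: now |a||b| = 1, so the plain dot
    # product gives the same number.
    a_hat = a / np.linalg.norm(a)
    b_hat = b / np.linalg.norm(b)

    print(cos_theta)             # 0.7333...
    print(np.dot(a_hat, b_hat))  # same value
    print(np.arccos(cos_theta))  # theta itself, if you ever need it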
If you don't understand this, I really recommend you invest an afternoon in something like Khan Academy's "vectors" track from their precalculus syllabus. Understanding the underlying basic maths will really pay off in the long run.[2]
[1] If you have ever been confused about why it's called cosine similarity when the formula doesn't include a cosine, that's why. The formula gives you cos theta. You would take the arccosine if you wanted theta itself, but if you're just using it for similarity you may as well not bother to compute the angle and just use cos theta directly.
[2] Although ML people are just going to keep on misusing the word "tensor" to refer to a mere multidimensional array. I think that ship has sailed and I just need to give up on that now, but there's still hope that people at least understand vectors when they work on this stuff. Here's an amazing explanation of what a tensor actually is, for anyone who is interested: https://www.youtube.com/watch?v=f5liqUk0ZTw
A few nitpicks:
First, I am not sure whom you refer to, as (I hope) everyone who uses cosine similarity has seen a . b = |a||b| cos theta. It is right there in the very name: "cosine (of the angle between vectors) used as a similarity measure".
Second, cos theta = (a . b)/(|a||b|) is pretty much how you define the angle between vectors when working in Hilbert spaces.
Third, you pick a very narrow view of a tensor, one based on spatial coordinates (which is where covariant and contravariant indices come from). But even in physics the notion is broader - e.g. in quantum physics, the state of two qubits lives in the tensor product space of two single-qubit states. Sure, for both states and operators you have a notion of covariance and contravariance (bras and kets, respectively). In mathematics it is broader still - all you need is two vector spaces and ⊗.
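To see that concretely, here's a sketch using nothing beyond plain NumPy (the states are the standard basis kets):

    import numpy as np

    # Single-qubit basis states |0> and |1>.
    ket0 = np.array([1.0, 0.0])
    ket1 = np.array([0.0, 1.0])

    # The joint state of two qubits lives in the tensor product space:
    # |0> ⊗ |1> is a vector in the 4-dimensional space C^2 ⊗ C^2.
    ket01 = np.kron(ket0, ket1)
    print(ket01)  # [0. 1. 0. 0.], i.e. the basis vector |01>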
In terms of deep learning (at least in most cases), there is no notion of co- and contravariance. Yet the tensor product still makes sense, as (say) we can take an outer product between the sample and channel dimensions. Quite a few operations can be understood that way, e.g. so-called 1x1 convolutions, which mix channels but do nothing spatially - see the sketch below.
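A sketch of what I mean (shapes and values are made up): a 1x1 convolution is just a matrix acting on the channel axis, applied independently at every spatial position.

    import numpy as np

    h, w = 8, 8          # spatial size (arbitrary)
    c_in, c_out = 3, 5   # channel counts (arbitrary)

    x = np.random.randn(c_in, h, w)        # input feature map
    weight = np.random.randn(c_out, c_in)  # a 1x1 conv kernel, spatial dims squeezed out

    # Mix channels at each position (i, j) independently: no spatial interaction.
    y = np.einsum('oc,chw->ohw', weight, x)
    print(y.shape)  # (5, 8, 8)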
A few notes here:
https://github.com/stared/thinking-in-tensors-writing-in-pyt...