Classification of norms

In mathematics, a norm measures the size or length of a vector or matrix. In machine learning, norms quantify the magnitude of a model's parameters and measure the distance between two vectors.

Let $V$ be a vector space with $u, v, w \in V$, and let $||\cdot|| : V \to \mathbb{R}$ be a function from $V$ to the set of real numbers. The function $||\cdot||$ is called a norm if:

$$ \begin{align*} \bullet \hspace{0.2cm}& ||v|| \geq 0\\ \bullet \hspace{0.2cm} & ||v|| =0 \iff v=0 \\\bullet \hspace{0.2cm} & ||\lambda v|| = |\lambda| \times ||v||, \hspace{1cm} \text{for }\lambda \in \mathbb{R} \\\bullet \hspace{0.2cm} & ||u-w|| \leq ||u-v||+||v-w|| \end{align*} $$

The above properties state that a norm is always non-negative and equals zero only when the vector itself is zero; the last property is the triangle inequality, written here in distance form. In other words, a norm is a generalized notion of distance, and in particular the $L^2$ norm gives the usual distance or length of a vector in $\mathbb{R}^n$.
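
As a quick numerical sanity check (a minimal sketch assuming NumPy; the random vectors and the scalar are illustrative), the snippet below verifies these properties for the Euclidean norm on a few vectors in $\mathbb{R}^5$:

```python
import numpy as np

rng = np.random.default_rng(0)
u, v, w = rng.normal(size=(3, 5))  # three random vectors in R^5
lam = -2.5

# Non-negativity, and zero norm only for the zero vector
assert np.linalg.norm(v) >= 0
assert np.isclose(np.linalg.norm(np.zeros(5)), 0.0)

# Absolute homogeneity: ||lambda * v|| = |lambda| * ||v||
assert np.isclose(np.linalg.norm(lam * v), abs(lam) * np.linalg.norm(v))

# Triangle inequality in distance form: ||u - w|| <= ||u - v|| + ||v - w||
assert np.linalg.norm(u - w) <= np.linalg.norm(u - v) + np.linalg.norm(v - w)
```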

There are different types of norms, such as the $L^1$ norm, $L^2$ norm, Max norm, Frobenius norm, and Spectral norm. The choice of norm can significantly affect a model's performance, so it is essential to choose the right one for the problem at hand. Below, we give the mathematical definition of these norms and describe how they are used in neural networks.

$L^1$ Norm

The $L^1$ norm sums the absolute values of the entries of a vector $v$. Let $v \in \mathbb{R}^n$ with $v=(x_1, x_2,\ldots, x_n)$; then the $L^1$ norm of $v$ is denoted by $||v||_1$ and is defined as:

$$ ||v||_1 = \sum_{i=1}^n |x_i| = |x_1| + |x_2| + \cdots + |x_n| $$

The $L^1$ norm is also known as the Manhattan norm. It is commonly used in regularization techniques, such as $L^1$ regularization, to encourage sparsity in neural network models.
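
As a small illustration (a sketch using NumPy; the vector values and the regularization strength `lam` are made up), the $L^1$ norm can be computed by summing absolute values or via `np.linalg.norm` with `ord=1`, and the same quantity is what an $L^1$ penalty adds to a training loss:

```python
import numpy as np

v = np.array([3.0, -4.0, 0.0, 1.5])

l1_manual = np.sum(np.abs(v))          # |3| + |-4| + |0| + |1.5| = 8.5
l1_builtin = np.linalg.norm(v, ord=1)  # same value via NumPy

assert np.isclose(l1_manual, l1_builtin)

# In L1 regularization, this norm of the weights is scaled and added to the loss:
lam = 0.01
l1_penalty = lam * l1_manual
```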

$L^2$ Norm

The $L^2$ norm is also known as the Euclidean norm. For a vector $v=(x_1, x_2,\ldots, x_n)$, the $L^2$ norm of $v$ is denoted by $||v||_2$ and is defined as:

$$ ||v||_2 = \sqrt{\sum_{i=1}^n |x_i|^2} = \sqrt{|x_1|^2 + |x_2|^2 + \cdots + |x_n|^2} $$

For real vectors $v$, $|x_i|^2 = x_i^2$, so the absolute value can be dropped. But for a vector $v$ with complex entries, $|x_i|$ denotes the complex modulus of the entry $x_i$. The $L^2$ norm is commonly used in optimization techniques, such as gradient descent, to update the weights in a neural network.
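
Here is a minimal sketch (assuming NumPy; the example vectors are arbitrary) that computes the $L^2$ norm both from the formula and with `np.linalg.norm`, including the complex case where $|x_i|$ is the modulus:

```python
import numpy as np

v = np.array([3.0, -4.0])

l2_manual = np.sqrt(np.sum(v ** 2))  # sqrt(9 + 16) = 5.0
l2_builtin = np.linalg.norm(v)       # ord=2 is the default for vectors

assert np.isclose(l2_manual, l2_builtin)

# Complex entries: |x_i| is the modulus, so square the absolute values
z = np.array([1 + 1j, 2 - 1j])
l2_complex = np.sqrt(np.sum(np.abs(z) ** 2))
assert np.isclose(l2_complex, np.linalg.norm(z))
```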

Max Norm

The max norm measures the maximum absolute value among the elements of a vector. For a vector $v=(x_1, x_2,\ldots, x_n)$, the max norm of $v$ is denoted by $||v||_\infty$ and is defined as:

$$ ||v||_\infty = \max_{i} |x_i| $$

Max norm is commonly used in regularization techniques to limit the magnitude of weights in a neural network. It can also be used to prevent exploding gradients during backpropagation.
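
The sketch below (assuming NumPy; the cap value is arbitrary) computes the max norm and shows one simple way a weight vector could be rescaled so that its max norm stays under a chosen cap. This is only an illustration of the idea, not any particular library's constraint implementation:

```python
import numpy as np

v = np.array([3.0, -7.0, 2.5])

max_manual = np.max(np.abs(v))               # |-7.0| = 7.0
max_builtin = np.linalg.norm(v, ord=np.inf)  # the infinity (max) norm in NumPy

assert np.isclose(max_manual, max_builtin)

# Illustrative max-norm cap: rescale the vector if its max norm exceeds a threshold
cap = 3.0
if max_manual > cap:
    v = v * (cap / max_manual)  # now max(|v_i|) == cap
```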

Geometric Interpretation

The three norms above admit a simple geometric interpretation. Fix the underlying space to be $\mathbb{R}^2$ and let $v=(x, y) \in \mathbb{R}^2$; each norm assigns a different value to $||v||$. Setting $||v|| = 1$ in each norm and drawing the resulting set of points traces out the unit ball of that norm, as sketched below.
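
A minimal sketch of that picture (assuming NumPy and Matplotlib are available) samples many directions in $\mathbb{R}^2$ and rescales each one so its norm equals 1, tracing the unit ball of each norm: a diamond for $L^1$, a circle for $L^2$, and a square for the max norm.

```python
import numpy as np
import matplotlib.pyplot as plt

theta = np.linspace(0, 2 * np.pi, 400)
d = np.stack([np.cos(theta), np.sin(theta)])  # directions in R^2

norms = {
    "L1 (diamond)": np.sum(np.abs(d), axis=0),
    "L2 (circle)": np.sqrt(np.sum(d ** 2, axis=0)),
    "Max (square)": np.max(np.abs(d), axis=0),
}

for label, n in norms.items():
    # By homogeneity, d / ||d|| has norm exactly 1, so this traces the unit ball
    plt.plot(d[0] / n, d[1] / n, label=label)

plt.gca().set_aspect("equal")
plt.legend()
plt.title("Unit balls of the L1, L2, and max norms in R^2")
plt.show()
```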