July 23, 2019

A convolution operator for point clouds

Deep learning has revolutionized computer vision and Convolutional Neural Networks (CNN) now produce some great results on tasks like semantic classification and object segmentation for 2D images. Convolution is the fundamental building block of a neural network for image processing, but its definition is not directly adapted to point cloud processing. In this article, we present a convolution operator that operates on 3D point clouds without any intermediate representation. It can be used to design networks performing segmentation or classification tasks on point clouds.

Convolutions and image processing

Discrete convolution is the base operation in CNNs and image processing. This linear space filter combines the data of local neighborhoods on a 2D grid. Given a kernel \(g\) (a set of weights localized by pixel coordinates), the convolution of image \(I\) and \(g\) at pixel \(x\) is defined by:

\begin{equation}
(I \ast g)(x)=\sum_{y \in N_x}g(x-y)I(y)
\end{equation}

where \(N_x\) is the neighborhood of \(x\), whose size depends on the choice of kernel \(g\).

Principle of an image convolution.

In CNNs, convolution weights (the values of kernels \(g\) for each pixel) are learned at various layers (corresponding to various scales), allowing the network to respond to geometric features of the input.

Thanks to the regular structure of images, convolutions can be computed with high efficiency on modern hardware. However, 3D point clouds have no such regular structure.

Definition of a convolution operator for point clouds

Inspired by image convolutions, we can define a convolution operator that operates directly on points.

In an image, each pixel is associated to a color, a gray level or more generally a vector of features. Likewise, in a point cloud, each point can be associated to a vector of features. Let \(P \subset \mathbb{R}^3\) be a point cloud and \(F = \left\{ f(x) ~|~ x \in P \right\}\) be the set of features held by the points of \(P\). The generic definition of a convolution of \(P\) (or better yet \(F\)) with kernel \(g\) at point \(x \in \mathbb{R}^3\) would be:

\begin{equation}
(F \ast g)(x)=\sum_{y \in N_x}g(x-y)f(y).
\end{equation}

By analogy, we would like \(g\) to apply different weights to different areas in a well chosen neighborhood of \(x\). We fix a radius \(r\) and try to design a kernel \(g\) whose definition domain is the ball \(B_r^3 = \left\{ x \in \mathbb{R}^3 ~|~ \left\| x \right\| \leq r \right\}\).

There are many ways to define areas in 3D space but relate the definition to points may be the most intuitive as features are also localized by them. Let \( (\tilde{x}_k)_{k < K} \subset B_r^3\) be a set of \(K\) kernel points and \((W_k)_{k < K}\) be their associated weight matrices. We define the kernel function \(g\) for any point \(x \in B_r^3\) as:

\begin{equation}
g(x)=\sum_{k < K}h(x,\tilde{x}_k)W_k
\end{equation}

where \(h\) is a correlation between \(x\) and a given kernel point \(\tilde{x}_k\) that should increase when the distance from \(x\) to \(\tilde{x}_k\) decreases. For its simplicity, a linear correlation is a reasonable choice:

\begin{equation}
h(x,\tilde{x}_k) = \max(0, 1 - \frac{\left | x - \tilde{x}_k \right |}{\sigma})
\end{equation}

where \(\sigma\) is the influence distance of the kernel points. \(\sigma\) must be chosen according to the input density.

Comparison between a convolution kernel for an image (left) and a convolution kernel defined by points for point clouds (right). In point clouds, the neighborhood is not explicitly defined by a grid structure like in images: it has to be computed for each input point. Such a neighborhood can be computed efficiently thanks to a radius search in a \(k\)-d tree.

Now that we have defined a generic convolution operator, it is possible to design CNNs operating directly on points where

  • the radius \(r\) and influence distance \(\sigma\) are fixed for each layer,
  • the number \(K\) and positions of the kernel points \((\tilde{x}_k)_{k < K}\) are fixed for each layer,
  • the associated weight matrices \((W_k)_{k < K}\) are learned.

Geometric features learned in a Convolutional Neural Network

We can visualize the geometric features learned by the network by coloring the points according to their level of activation for this feature at a given layer.

Response of an input cloud to a given kernel.

In the following figures, we show the low and high-level features learned by a CNN constructed around the convolution operator we defined. We observe that, in its first layer, the network is able to learn low-level features like vertical/horizontal planes, linear structures or corners. In the later layers, the network detects more complex shapes like small buttresses, balls, cones, or stairs.

Low-level features the network has learned at the first layer.
High-level features the network has learned at the third layer.

References

For a full description of this convolution operator and an overview of the applications it allows through neural networks, please read: Thomas, H. et al. KPConv: Flexible and Deformable Convolution for Point Clouds, In International Conference on Computer Vision (ICCV), 2019. [arXiv] [Code]

Recent Posts

How we evaluate the performances of our classification softwares

People often wonder what the success rate (or the accuracy) of our automated classification softwares are. Unfortunately, there is no straightforward answer to that question. In this article, we describe what we think is the most scientific way to assess a classification performance and how some of our clients choose to complement the quantitative indices with […]

Read More
From Paris to Fontainebleau

At Terra3D we love ping pong. But more than that we really (really) love cycling. Besides, we also have two sites in Parisian region : one at Paris, in the heart of the city of love, and another in Fontainebleau, a city renowned for the forest of Fontainebleau and for the historic Chateau de Fontainebleau, just […]

Read More
Introduction to 3D point clouds

What is a 3D Point Cloud ? A 3D point cloud is a list of cartesian coordinates X, Y and Z representing the geometry of a scene.  Generally composed of millions or billions of points, they are accurate digital representations of the real world. These coordinates correspond to the actual size of objects. Thus, we […]

Read More
linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram