How we evaluate the performance of our classification software

People often wonder what the success rate (or accuracy) of our automated classification software is. Unfortunately, there is no straightforward answer to that question. In this article, we describe what we consider the most scientific way to assess classification performance, and how some of our clients choose to complement the quantitative indices with evaluations of their own.

Performance metrics

The two most common performance metrics in a classification context are precision and recall. Precision \(P\) is the fraction of actually relevant instances among the instances the algorithm proposed (“is the algorithm raising too many false alarms?”), while recall \(R\) is the fraction of relevant instances that were retrieved in the scene (“is the algorithm missing objects of interest?”).

For a given class, let \(T_p\), \(F_p\), \(T_n\) and \(F_n\) denote the number of true positives, false positives, true negatives and false negatives, respectively. Precision can be computed as 

\begin{equation}P = \frac{T_p}{T_p + F_p}\end{equation}

and recall as

\begin{equation}R = \frac{T_p}{T_p + F_n}\end{equation}

When tuning the parameters of a detection algorithm, there is usually an inverse relationship between precision and recall: increasing the sensitivity improves recall at the cost of precision, and decreasing it does the opposite.
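This trade-off is easy to observe on a toy example. The scores and labels below are made up for illustration; the sketch simply recomputes precision and recall as a detection threshold moves:

```python
# Illustrative detection scores and ground-truth labels (1 = relevant instance).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   1,   0,   0]

def pr_at_threshold(t):
    # An instance is "proposed" when its score reaches the threshold t.
    tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
    fp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 0)
    fn = sum(1 for s, l in zip(scores, labels) if s < t and l == 1)
    return tp / (tp + fp), tp / (tp + fn)

print(pr_at_threshold(0.75))  # high threshold: high precision, low recall
print(pr_at_threshold(0.25))  # low threshold: lower precision, high recall
```

Raising the threshold makes the detector cautious (fewer false alarms, more misses); lowering it does the opposite.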

The \(F_1\)-score and the Intersection over Union \(IoU\) are examples of measures combining precision and recall into a single figure. The \(F_1\)-score is the harmonic mean of precision and recall:

\begin{equation}F_1 = 2 \frac{P \times R}{P + R}\end{equation}

and the \(IoU\) is computed as

\begin{equation}IoU = \frac{T_p}{T_p + F_p + F_n}\end{equation}

Example of performance metrics for a classification algorithm with 4 classes: terrain, cable, tower and vegetation.
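All four metrics follow directly from the per-class counts. A minimal Python sketch (the counts below are illustrative, not measured results):

```python
def precision(tp, fp):
    # Fraction of proposed instances that are actually relevant.
    return tp / (tp + fp)

def recall(tp, fn):
    # Fraction of relevant instances that were retrieved.
    return tp / (tp + fn)

def f1_score(p, r):
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r)

def iou(tp, fp, fn):
    # Intersection over Union for one class.
    return tp / (tp + fp + fn)

# Hypothetical counts for a single class, e.g. "cable".
tp, fp, fn = 90, 10, 30
p, r = precision(tp, fp), recall(tp, fn)
print(round(p, 3), round(r, 3), round(f1_score(p, r), 3), round(iou(tp, fp, fn), 3))
# prints: 0.9 0.75 0.818 0.692
```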

However, the most comprehensive tool for discussing the results of a classification algorithm is the confusion matrix: a square table where each row represents the instances of an actual class, while each column represents the instances of a predicted class. The closer it is to a diagonal matrix, the better the algorithm performed.

Example of a confusion matrix for a classification algorithm with 4 classes: terrain, cable, tower and vegetation.
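Building a confusion matrix takes only a few lines; the class names match the example above, and the label lists are illustrative:

```python
CLASSES = ["terrain", "cable", "tower", "vegetation"]

def confusion_matrix(truth, pred, classes=CLASSES):
    idx = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(truth, pred):
        m[idx[t]][idx[p]] += 1  # rows: actual class, columns: predicted class
    return m

truth = ["terrain", "terrain", "cable", "vegetation", "tower", "cable"]
pred  = ["terrain", "vegetation", "cable", "vegetation", "tower", "tower"]
for row in confusion_matrix(truth, pred):
    print(row)
```

The per-class counts derive from the matrix: \(T_p\) is the diagonal entry, \(F_p\) the rest of the column, \(F_n\) the rest of the row.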

Scale of the study

In a classified point cloud, each point is assigned a label. As a result, it is possible to compute performance metrics at different scales.

Testing set

In practice, such performance metrics are computed based on a sample dataset. The results we obtained automatically are compared to a ground truth: a reference version of the testing set where all the points are perfectly classified.

Unfortunately, the only way to obtain such a ground truth is to perform the classification by hand: the labels are assigned manually by a specialist of the data we are dealing with. This is very tedious work, which explains why providing precision and recall figures is sometimes difficult: in most cases, no representative ground truth is available, and building one takes too much time.

Manually classified point cloud
Automatically classified point cloud

Example of a testing set in a powerline environment. On the left: reference point cloud manually classified by a specialist. On the right: point cloud classified automatically.

Industrial applications

In some industrial applications, the indices we defined above do not really matter and the qualitative analysis is more important to our clients.

Moreover, not all the classification errors are of the same importance to them: some errors are critical and put the rest of the process chain at risk, while some are insignificant and have no influence on the remainder of the process. For example, if the final application is vegetation monitoring, a cable point classified as vegetation will raise a false alarm, whereas a building point remaining unclassified is not harmful.

This is why some of our clients decide to add a study of the manual fixing time: they measure the time it takes for an operator to correct the output of our algorithms so that it is fit for their industrial use. And better precision and recall scores do not always mean shorter fixing times.

The importance of designing a learning-free algorithm

Ultimately, even if precision, recall and the \(F_1\)-score are useful working tools, our goal is to reduce manual fixing time and make it ever more straightforward for our clients to integrate our software into their process chains.

This is achieved through the reduction of what our clients see as critical errors, that is to say the reduction of some particular values of the confusion matrix.

We could have designed a theory-of-everything algorithm based on learning techniques*: it would have maximized our "accuracy" scores, minimized our efforts and sounded very sexy. However, our learning-free approach has two noteworthy benefits:

  1. There is no need for you to provide a manually labelled training set; our software is self-contained.
  2. Our engineers master each layer of the algorithms; they'll know how to treat the specific errors you might find in our results. This would be impossible with a black-box classifier no human being knows how to influence.

*: Actually, we did it. Twice. But we won't use it in production. [Handcrafted features][Deep Learning]

From Paris to Fontainebleau

At Terra3D we love ping pong. But more than that, we really (really) love cycling. We also have two sites in the Paris region: one in Paris, in the heart of the city of love, and another in Fontainebleau, a city renowned for its forest and for the historic Château de Fontainebleau, just 65 km from Paris.

It naturally sounded like a challenge for us, lovers of the tarmac. And what better occasion than a hot July day, during the Tour de France, to take it on?

After a 4-hour ride, 65.8 km and 2 broken chains, we finally reached Fontainebleau. The Fontainebleau guys showed us some local spots, such as the steep Massoury climb or the Reine Amélie climb.

Rendez-vous next year for this new Terra3D tradition.

Relive 'T3D from Paris to Fontainebleau'

The path from Paris to Fontainebleau in the morning.

Relive 'T3D in the Forest of Fontainebleau'

The path from Fontainebleau to Melun in the afternoon.

Nice Fontainebleau panorama behind the "croix du calvaire".
Meet Terra3D team at the top of the Reine Amélie climb.

A convolution operator for point clouds

Deep learning has revolutionized computer vision and Convolutional Neural Networks (CNN) now produce some great results on tasks like semantic classification and object segmentation for 2D images. Convolution is the fundamental building block of a neural network for image processing, but its definition is not directly adapted to point cloud processing. In this article, we present a convolution operator that operates on 3D point clouds without any intermediate representation. It can be used to design networks performing segmentation or classification tasks on point clouds.

Convolutions and image processing

Discrete convolution is the basic operation in CNNs and image processing. This linear spatial filter combines the data of local neighborhoods on a 2D grid. Given a kernel \(g\) (a set of weights localized by pixel coordinates), the convolution of image \(I\) with \(g\) at pixel \(x\) is defined by:

\begin{equation}
(I \ast g)(x)=\sum_{y \in N_x}g(x-y)I(y)
\end{equation}

where \(N_x\) is the neighborhood of \(x\), whose size depends on the choice of kernel \(g\).
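A direct, unoptimized NumPy sketch of this definition; a square kernel and zero padding at the borders are assumptions, not part of the formula:

```python
import numpy as np

def convolve2d(image, g):
    # Discrete convolution (I * g)(x) = sum over y in N_x of g(x - y) I(y),
    # where N_x is the square neighborhood matching the kernel size.
    k = g.shape[0] // 2
    padded = np.pad(image, k, mode="constant")  # zero padding at the borders
    out = np.zeros_like(image, dtype=float)
    h, w = image.shape
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + 2 * k + 1, j:j + 2 * k + 1]
            # g is indexed by the offset (x - y), hence the flip.
            out[i, j] = np.sum(np.flip(g) * patch)
    return out
```

In practice, libraries implement this with highly optimized vectorized routines; the double loop is only for readability.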

Principle of an image convolution.

In CNNs, convolution weights (the values of kernels \(g\) for each pixel) are learned at various layers (corresponding to various scales), allowing the network to respond to geometric features of the input.

Thanks to the regular structure of images, convolutions can be computed with high efficiency on modern hardware. However, 3D point clouds have no such regular structure.

Definition of a convolution operator for point clouds

Inspired by image convolutions, we can define a convolution operator that operates directly on points.

In an image, each pixel is associated with a color, a gray level or, more generally, a vector of features. Likewise, in a point cloud, each point can be associated with a vector of features. Let \(P \subset \mathbb{R}^3\) be a point cloud and \(F = \left\{ f(x) ~|~ x \in P \right\}\) be the set of features held by the points of \(P\). The generic definition of a convolution of \(P\) (or rather of \(F\)) with kernel \(g\) at point \(x \in \mathbb{R}^3\) would be:

\begin{equation}
(F \ast g)(x)=\sum_{y \in N_x}g(x-y)f(y).
\end{equation}

By analogy, we would like \(g\) to apply different weights to different areas in a well chosen neighborhood of \(x\). We fix a radius \(r\) and try to design a kernel \(g\) whose definition domain is the ball \(B_r^3 = \left\{ x \in \mathbb{R}^3 ~|~ \left\| x \right\| \leq r \right\}\).

There are many ways to define areas in 3D space, but relating the definition to points may be the most intuitive, as features are also localized by points. Let \( (\tilde{x}_k)_{k < K} \subset B_r^3\) be a set of \(K\) kernel points and \((W_k)_{k < K}\) be their associated weight matrices. We define the kernel function \(g\) for any point \(x \in B_r^3\) as:

\begin{equation}
g(x)=\sum_{k < K}h(x,\tilde{x}_k)W_k
\end{equation}

where \(h\) is a correlation between \(x\) and a given kernel point \(\tilde{x}_k\) that should increase when the distance from \(x\) to \(\tilde{x}_k\) decreases. For its simplicity, a linear correlation is a reasonable choice:

\begin{equation}
h(x,\tilde{x}_k) = \max\left(0, 1 - \frac{\left\| x - \tilde{x}_k \right\|}{\sigma}\right)
\end{equation}

where \(\sigma\) is the influence distance of the kernel points; it must be chosen according to the density of the input cloud.
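Putting the pieces together, here is a minimal NumPy sketch of the kernel function \(g\) and the resulting point convolution. The shapes and the loop-based implementation are illustrative simplifications, not the production operator:

```python
import numpy as np

def kernel_g(z, kernel_pts, W, sigma):
    # g(z) = sum_k h(z, x~_k) W_k, a (D_in, D_out) matrix.
    # kernel_pts: (K, 3) kernel points, W: (K, D_in, D_out) weight matrices.
    g = np.zeros(W.shape[1:])
    for x_k, W_k in zip(kernel_pts, W):
        # Linear correlation h: 1 at the kernel point, 0 beyond distance sigma.
        corr = max(0.0, 1.0 - np.linalg.norm(z - x_k) / sigma)
        g += corr * W_k
    return g

def point_conv(x, neighbors, feats, kernel_pts, W, sigma):
    # (F * g)(x) = sum over y in N_x of g(x - y) applied to f(y).
    # neighbors: (N, 3) points of N_x, feats: (N, D_in) their features.
    out = np.zeros(W.shape[2])
    for y, f in zip(neighbors, feats):
        out += f @ kernel_g(x - y, kernel_pts, W, sigma)
    return out
```

With a single kernel point at the origin and an identity weight matrix, a neighbor located at \(x\) contributes its feature vector unchanged, which is a quick sanity check of the definition.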

Comparison between a convolution kernel for an image (left) and a convolution kernel defined by points for point clouds (right). In point clouds, the neighborhood is not explicitly defined by a grid structure like in images: it has to be computed for each input point. Such a neighborhood can be computed efficiently thanks to a radius search in a \(k\)-d tree.
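The radius search mentioned above can be sketched with SciPy's `cKDTree` (the radius value and the random cloud are arbitrary examples):

```python
import numpy as np
from scipy.spatial import cKDTree

# Computing the neighborhood N_x of every point with a radius search in a
# k-d tree; the radius matches the ball domain of the kernel.
rng = np.random.default_rng(0)
points = rng.random((1000, 3))
tree = cKDTree(points)
neighborhoods = tree.query_ball_point(points, r=0.1)  # one index list per point
```

Each entry of `neighborhoods` lists the indices of the points within the radius, including the query point itself.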

Now that we have defined a generic convolution operator, it is possible to design CNNs that operate directly on point clouds.

Geometric features learned in a Convolutional Neural Network

We can visualize the geometric features learned by the network by coloring the points according to their level of activation for this feature at a given layer.

Response of an input cloud to a given kernel.

In the following figures, we show the low and high-level features learned by a CNN constructed around the convolution operator we defined. We observe that, in its first layer, the network is able to learn low-level features like vertical/horizontal planes, linear structures or corners. In the later layers, the network detects more complex shapes like small buttresses, balls, cones, or stairs.

Low-level features the network has learned at the first layer.
High-level features the network has learned at the third layer.

References

For a full description of this convolution operator and an overview of the applications it allows through neural networks, please read: Thomas, H. et al. KPConv: Flexible and Deformable Convolution for Point Clouds, In International Conference on Computer Vision (ICCV), 2019. [arXiv] [Code]

Real-time rendering of 3D point clouds

References

Bouchiba, H., Deschaud, J-E. and Goulette, F. Raw point cloud deferred shading through screen space pyramidal operators. Short paper, In Eurographics 2018, the 39th Annual Conference of the European Association for Computer Graphics, Delft, Netherlands, 2018.

Aerial and UAV point cloud automated processing

Terra3D's expertise has been applied to asset management for high-voltage electric lines. We developed software to automatically classify aerial point clouds. This classification is then used to compute a model of the lines, which in turn can be used to check whether the vegetation is far enough from them.

Automated PCRS (urban GIS database)

Terra3D has developed powerful tools for automatic urban and road point cloud analysis. These tools can be used to create GIS vector databases; in particular, they can be used to produce the PCRS (which stands for Plan Corps de Rue Simplifié).

PCRS

The PCRS is a standard that defines GIS vector maps for urban asset management and network planning. It is defined by the French government: before 2026, cities will be required to provide a PCRS to all network managers (electric, water and gas). The PCRS provides a common reference for various use cases.

Method

Terra3D developed an automated pipeline that processes point cloud data acquired by mobile mapping to generate PCRS maps. This pipeline consists of two steps: automatic classification of the point cloud, followed by vectorization of the relevant classes.

Raw unprocessed point cloud. Here the point cloud is colored: each point holds RGB information.
Automatically classified point cloud. Each class is represented as a color.
Automatically vectorized point cloud. Vectorized classes are curb, road marking, and walls.