How we evaluate the performance of our classification software

People often wonder what the success rate (or accuracy) of our automated classification software is. Unfortunately, there is no straightforward answer to that question. In this article, we describe what we think is the most scientific way to assess classification performance, and how some of our clients choose to complement the quantitative indices with evaluations of their own.

Performance metrics

The two most common performance metrics in a classification context are precision and recall. Precision \(P\) is the fraction of actually relevant instances among the instances the algorithm proposed (“is the algorithm not raising too many false alarms?”), while recall \(R\) is the fraction of relevant instances in the scene that have been retrieved (“is the algorithm not missing objects of interest?”).

For a given class, let \(T_p\), \(F_p\), \(T_n\) and \(F_n\) denote the number of true positives, false positives, true negatives and false negatives, respectively. Precision can be computed as 

\begin{equation}P = \frac{T_p}{T_p + F_p}\end{equation}

and recall as

\begin{equation}R = \frac{T_p}{T_p + F_n}\end{equation}

When tuning the parameters of a detection algorithm, there is usually an inverse relationship between precision and recall: increasing the sensitivity improves recall at the cost of precision, while decreasing it does the opposite.
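As a small illustration of this tradeoff (the detector scores and ground-truth labels below are made up for the example, they do not come from our software), sweeping a decision threshold moves precision and recall in opposite directions:

```python
import numpy as np

def precision_recall_at(scores, labels, threshold):
    # Predict "positive" whenever the detector score reaches the threshold
    pred = scores >= threshold
    tp = np.sum(pred & labels)    # true positives
    fp = np.sum(pred & ~labels)   # false alarms
    fn = np.sum(~pred & labels)   # missed objects of interest
    return tp / (tp + fp), tp / (tp + fn)

# Invented detector scores (sorted) and ground-truth labels
scores = np.array([0.95, 0.9, 0.8, 0.6, 0.5, 0.4, 0.2, 0.1])
labels = np.array([True, True, True, False, True, False, True, False])

p_low, r_low = precision_recall_at(scores, labels, 0.3)    # permissive: better recall
p_high, r_high = precision_recall_at(scores, labels, 0.7)  # strict: better precision
```

With the permissive threshold, recall is higher but false alarms creep in; with the strict one, every detection is correct but some objects are missed.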

The \(F_1\)-score and the Intersection over Union \(IoU\) are examples of measures combining precision and recall into a single figure. The \(F_1\)-score is the harmonic mean of the two:

\begin{equation}F_1 = 2 \frac{P \times R}{P + R}\end{equation}

and the \(IoU\) is computed as

\begin{equation}IoU = \frac{T_p}{T_p + F_p + F_n}\end{equation}

Example of performance metrics for a classification algorithm with 4 classes: terrain, cable, tower and vegetation.
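The four formulas above translate directly into code. A minimal sketch in Python, using made-up counts for a single hypothetical class:

```python
def precision(tp, fp):
    # P = Tp / (Tp + Fp)
    return tp / (tp + fp)

def recall(tp, fn):
    # R = Tp / (Tp + Fn)
    return tp / (tp + fn)

def f1_score(p, r):
    # Harmonic mean of precision and recall
    return 2 * p * r / (p + r)

def iou(tp, fp, fn):
    # Intersection over Union
    return tp / (tp + fp + fn)

# Invented counts for one class, e.g. "cable"
tp, fp, fn = 90, 10, 30
p, r = precision(tp, fp), recall(tp, fn)  # 0.9 and 0.75
```

Note that both combined measures penalize the weaker of the two components: with these counts, \(F_1 \approx 0.82\) while \(IoU \approx 0.69\).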

However, the most comprehensive tool to discuss the results of a classification algorithm is the confusion matrix: a square table where each row depicts the instances in an actual class, while each column represents the instances in a predicted class. The closer it is to a diagonal matrix, the better the algorithm performed.

Example of a confusion matrix for a classification algorithm with 4 classes: terrain, cable, tower and vegetation.
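A minimal sketch of how such a matrix can be built and read, with the four classes above and made-up labels (0 = terrain, 1 = cable, 2 = tower, 3 = vegetation):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    # Rows: actual classes, columns: predicted classes
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def precision_recall(cm, c):
    tp = cm[c, c]
    fp = cm[:, c].sum() - tp  # predicted as c, actually something else
    fn = cm[c, :].sum() - tp  # actually c, predicted as something else
    return tp / (tp + fp), tp / (tp + fn)

# Invented per-point labels for the example
y_true = [0, 0, 1, 1, 2, 3, 3, 3]
y_pred = [0, 1, 1, 1, 2, 3, 3, 0]
cm = confusion_matrix(y_true, y_pred, 4)
p_terrain, r_terrain = precision_recall(cm, 0)
```

The diagonal holds the correctly classified points, and every per-class precision and recall figure can be derived from the rows and columns, as shown for the terrain class.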

Scale of the study

In a classified point cloud, each point is assigned a label. As a result, it is possible to compute performance metrics at different scales.

Testing set

In practice, such performance metrics are computed based on a sample dataset. The results we obtained automatically are compared to a ground truth: a reference version of the testing set where all the points are perfectly classified.

Unfortunately, the only way to obtain such a ground truth is to perform the classification by hand, that is, to have the labels assigned manually by a specialist of the data we are dealing with. This is very tedious work, which explains why providing precision and recall figures is sometimes difficult: in most cases, no representative ground truth is available, and building one takes too much time.

Manually classified point cloud
Automatically classified point cloud

Example of a testing set in a powerline environment. On the left: point cloud classified automatically. On the right: reference point cloud manually classified by a specialist.

Industrial applications

In some industrial applications, the indices we defined above do not really matter, and qualitative analysis is more important to our clients.

Moreover, not all classification errors are equally important to them: some errors are critical and put the rest of the processing chain at risk, while others are insignificant and have no influence on the remainder of the process. For example, if the final application is vegetation monitoring, a cable point classified as vegetation will raise a false alarm, whereas a building point left unclassified is harmless.

This is why some of our clients decide to add a study of the manual fixing time: they measure the time it takes for an operator to correct the output of our algorithms so that it is fit for their industrial use. And better precision and recall scores do not always mean shorter fixing times.

The importance of designing a learning-free algorithm

Ultimately, even if precision, recall and the \(F_1\)-score are useful working tools, our goal is to reduce manual fixing time and make it ever more straightforward for our clients to integrate our software into their process chains.

This is achieved through the reduction of what our clients see as critical errors, that is to say the reduction of some particular values of the confusion matrix.

We could have designed a theory-of-everything algorithm based on learning techniques*: it would have maximized our "accuracy" scores, minimized our efforts and sounded very sexy. However, our learning-free approach has two noteworthy benefits:

  1. There is no need for you to provide a manually labelled training set; our software is self-contained.
  2. Our engineers master each layer of the algorithms; they'll know how to treat the specific errors you might find in our results. This would be impossible with a black-box classifier that no human being knows how to influence.

*: Actually, we did it. Twice. But we won't use it in production. [Handcrafted features][Deep Learning]

From Paris to Fontainebleau

At Terra3D we love ping pong. But more than that, we really (really) love cycling. We also have two sites in the Paris region: one in Paris, in the heart of the city of love, and another in Fontainebleau, a city renowned for its forest and for the historic Chateau de Fontainebleau, just 65 km from Paris.

It naturally sounded like a challenge for us, lovers of the tarmac. And what better occasion than a hot July day, during the Tour de France, to achieve this objective?

After a 4-hour ride, 65.8 km and 2 broken chains, we finally reached Fontainebleau. The Fontainebleau guys showed us some local spots, such as the steep Massoury climb or the Reine Amélie climb.

Rendez-vous next year for this new Terra3D tradition.

Relive 'T3D from Paris to Fontainebleau'

The path from Paris to Fontainebleau in the morning.

Relive 'T3D in the Forest of Fontainebleau'

The path from Fontainebleau to Melun in the afternoon.

Nice Fontainebleau panorama behind the "croix du calvaire".
Meet the Terra3D team at the top of the Reine Amélie climb.

Introduction to 3D point clouds

What is a 3D Point Cloud?

A 3D point cloud is a list of Cartesian coordinates X, Y and Z representing the geometry of a scene.

Generally composed of millions or billions of points, point clouds are accurate digital representations of the real world: their coordinates correspond to the actual size of objects, so we can make measurements, inspections and inventories as if we were in the field.

3D point clouds are usually acquired by LiDAR scanners mounted on terrestrial or aerial vehicles. Geo-referencing is usually carried out using Global Positioning Systems (GPS) and Inertial Measurement Units (IMU). Depending on the acquisition system, additional attributes such as laser intensity, GPS time or color may be available.
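As a toy sketch of what this data looks like in memory (the coordinates and intensity values below are invented for illustration), a point cloud with attributes can be held as plain arrays, and the metric coordinates make field-like measurements immediate:

```python
import numpy as np

# Invented geo-referenced coordinates (X, Y, Z), in meters
points = np.array([
    [651200.10, 6860400.25, 35.2],
    [651200.85, 6860400.30, 35.1],
    [651201.40, 6860401.10, 41.7],
])

# Optional per-point attributes, when the scanner provides them
intensity = np.array([120, 98, 200])

# Coordinates are metric, so distances correspond to real-world sizes
d = np.linalg.norm(points[0] - points[1])  # distance in meters
```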

This technology is particularly well suited for mapping large-scale infrastructures such as electrical corridors, railways, highways and urban environments, among others.

Example of Mobile Laser Scanning (MLS)

What is 3D point cloud classification?

Classification is the process of giving a semantic meaning to each group of points in the 3D point cloud. For example, in an electrical corridor, you may be interested in determining which points correspond to conductors, which to towers, which to ground and which to vegetation. This can be useful, for example, to generate alerts when an electric conductor is too close to the vegetation; we all remember the tragic 2018 Camp Fire in California.

Instead of performing classification manually, which is a painful and time-consuming task, automatic algorithms can do the job for you in an accurate, fast and reproducible way. Terra3D specializes in developing such automatic tools.

Raw point cloud
Classified point cloud

Example of automatic classification in a railway environment. On the right: raw point cloud in white color. On the left: classified point cloud where each color represents an object class.

Raw point cloud
Classified point cloud

Example of automatic classification in a powerline environment. On the right: raw point cloud in white color. On the left: classified point cloud where each color represents an object class.

What is 3D modeling?

Modeling is the process of generating a geometrical model from a 3D point cloud. It is usually carried out from a previously classified point cloud. For example, in a powerline corridor, wire conductors can be modelled as 3D polylines generated from the set of 3D points classified as such. Those polylines can be obtained by fitting a catenary shape. When geometric models are exported to a common CAD file format such as DWG, DXF or SHP, the process is often called vectorisation. Such models can easily be imported into classical CAD and GIS software.

Example of automatic modeling in a powerline environment. Red polylines correspond to 3D catenary models of wire conductors.
Example of automatic modeling in a railway environment. Red polylines correspond to 3D models of cables while blue polylines correspond to 3D models of rails.
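As an illustrative sketch of the catenary-fitting step (the span points below are synthetic and the parameter values invented; a production pipeline is more involved), one span of conductor points can be fitted with a least-squares routine:

```python
import numpy as np
from scipy.optimize import curve_fit

def catenary(x, a, x0, z0):
    # z = z0 + a * (cosh((x - x0) / a) - 1)
    # a: catenary parameter, (x0, z0): lowest point of the wire
    return z0 + a * (np.cosh((x - x0) / a) - 1.0)

# Synthetic conductor points along one 60 m span, with a little noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 60.0, 50)
z = catenary(x, 800.0, 30.0, 12.0) + rng.normal(0.0, 0.01, x.size)

# Recover the catenary parameters from the noisy points
(a, x0, z0), _ = curve_fit(catenary, x, z, p0=(500.0, 25.0, 10.0))
```

Sampling the fitted curve at regular abscissas then yields the 3D polyline that gets exported to DWG, DXF or SHP.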

Additional resources

A convolution operator for point clouds

Deep learning has revolutionized computer vision and Convolutional Neural Networks (CNN) now produce some great results on tasks like semantic classification and object segmentation for 2D images. Convolution is the fundamental building block of a neural network for image processing, but its definition is not directly adapted to point cloud processing. In this article, we present a convolution operator that operates on 3D point clouds without any intermediate representation. It can be used to design networks performing segmentation or classification tasks on point clouds.

Convolutions and image processing

Discrete convolution is the basic operation in CNNs and image processing. This linear spatial filter combines the data of local neighborhoods on a 2D grid. Given a kernel \(g\) (a set of weights localized by pixel coordinates), the convolution of image \(I\) and \(g\) at pixel \(x\) is defined by:

\begin{equation}(I \ast g)(x)=\sum_{y \in N_x}g(x-y)I(y)\end{equation}

where \(N_x\) is the neighborhood of \(x\), whose size depends on the choice of kernel \(g\).

Principle of an image convolution.
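A minimal NumPy sketch of this sum, in "valid" mode (the kernel is flipped to match the \(g(x-y)\) term; deep-learning libraries typically skip the flip and compute cross-correlation):

```python
import numpy as np

def conv2d(image, kernel):
    # Discrete convolution: out[x] = sum over y in N_x of g(x - y) * I(y)
    g = kernel[::-1, ::-1]  # flip the kernel to implement g(x - y)
    kh, kw = g.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Weighted sum over the neighborhood N_x of the output pixel
            out[i, j] = np.sum(g * image[i:i + kh, j:j + kw])
    return out

# Toy example: 4x4 image, 3x3 averaging-style kernel of ones
out = conv2d(np.arange(16.0).reshape(4, 4), np.ones((3, 3)))
```

The double Python loop is only for readability; the efficiency mentioned above comes from hardware-friendly implementations of exactly this computation.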

In CNNs, convolution weights (the values of kernels \(g\) for each pixel) are learned at various layers (corresponding to various scales), allowing the network to respond to geometric features of the input.

Thanks to the regular structure of images, convolutions can be computed with high efficiency on modern hardware. However, 3D point clouds have no such regular structure.

Definition of a convolution operator for point clouds

Inspired by image convolutions, we can define a convolution operator that operates directly on points.

In an image, each pixel is associated with a color, a gray level or, more generally, a vector of features. Likewise, in a point cloud, each point can be associated with a vector of features. Let \(P \subset \mathbb{R}^3\) be a point cloud and \(F = \left\{ f(x) ~|~ x \in P \right\}\) be the set of features held by the points of \(P\). The generic definition of a convolution of \(P\) (or better yet \(F\)) with kernel \(g\) at point \(x \in \mathbb{R}^3\) would be:

\begin{equation}(F \ast g)(x)=\sum_{y \in N_x}g(x-y)f(y)\end{equation}

By analogy, we would like \(g\) to apply different weights to different areas in a well-chosen neighborhood of \(x\). We fix a radius \(r\) and try to design a kernel \(g\) whose definition domain is the ball \(B_r^3 = \left\{ x \in \mathbb{R}^3 ~|~ \left\| x \right\| \leq r \right\}\).

There are many ways to define areas in 3D space, but relating the definition to points may be the most intuitive, as features are also localized by points. Let \( (\tilde{x}_k)_{k < K} \subset B_r^3\) be a set of \(K\) kernel points and \((W_k)_{k < K}\) be their associated weight matrices. We define the kernel function \(g\) for any point \(x \in B_r^3\) as:

\begin{equation}g(x)=\sum_{k < K}h(x,\tilde{x}_k)W_k\end{equation}

where \(h\) is a correlation between \(x\) and a given kernel point \(\tilde{x}_k\) that should increase when the distance from \(x\) to \(\tilde{x}_k\) decreases. For its simplicity, a linear correlation is a reasonable choice:

\begin{equation}h(x,\tilde{x}_k) = \max\left(0, 1 - \frac{\left\| x - \tilde{x}_k \right\|}{\sigma}\right)\end{equation}

where \(\sigma\) is the influence distance of the kernel points. \(\sigma\) must be chosen according to the input density.

Comparison between a convolution kernel for an image (left) and a convolution kernel defined by points for point clouds (right). In point clouds, the neighborhood is not explicitly defined by a grid structure like in images: it has to be computed for each input point. Such a neighborhood can be computed efficiently thanks to a radius search in a \(k\)-d tree.
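Putting the three equations together, here is a naive sketch of this point convolution for a single output location (the shapes are hypothetical: \(K\) kernel points, weight matrices of size in_dim × out_dim; a real implementation batches this over all points and uses a k-d tree for the neighborhoods):

```python
import numpy as np

def kernel_correlations(z, kernel_points, sigma):
    # h(z, x_k) = max(0, 1 - ||z - x_k|| / sigma), one value per kernel point
    d = np.linalg.norm(kernel_points - z, axis=1)
    return np.maximum(0.0, 1.0 - d / sigma)

def point_convolution(x, neighbors, features, kernel_points, weights, sigma):
    # (F * g)(x) = sum over neighbors y of g(x - y) f(y),
    # with g(z) = sum_k h(z, x_k) W_k
    out = np.zeros(weights.shape[2])
    for y, f in zip(neighbors, features):
        h = kernel_correlations(x - y, kernel_points, sigma)  # (K,)
        g = np.einsum("k,kio->io", h, weights)                # (in_dim, out_dim)
        out += f @ g
    return out

# Sanity check: a single kernel point at the origin with an identity weight
# matrix reproduces the feature of a neighbor located exactly at x
out = point_convolution(
    np.zeros(3),
    neighbors=np.array([[0.0, 0.0, 0.0]]),
    features=np.array([[2.0, 3.0]]),
    kernel_points=np.zeros((1, 3)),
    weights=np.eye(2)[None, :, :],  # K=1, in_dim=2, out_dim=2
    sigma=1.0,
)
```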

Now that we have defined a generic convolution operator, it is possible to design CNNs operating directly on points.

Geometric features learned in a Convolutional Neural Network

We can visualize the geometric features learned by the network by coloring the points according to their level of activation for this feature at a given layer.

Response of an input cloud to a given kernel.

In the following figures, we show the low and high-level features learned by a CNN constructed around the convolution operator we defined. We observe that, in its first layer, the network is able to learn low-level features like vertical/horizontal planes, linear structures or corners. In the later layers, the network detects more complex shapes like small buttresses, balls, cones, or stairs.

Low-level features the network has learned at the first layer.
High-level features the network has learned at the third layer.


For a full description of this convolution operator and an overview of the applications it allows through neural networks, please read: Thomas, H. et al. KPConv: Flexible and Deformable Convolution for Point Clouds, In International Conference on Computer Vision (ICCV), 2019. [arXiv] [Code]

Real-time rendering of 3D point clouds


H. Bouchiba, J-E. Deschaud and F. Goulette (2018). "Raw point cloud deferred shading through screen space pyramidal operators". Short paper, Eurographics 2018, the 39th Annual Conference of the European Association for Computer Graphics, Delft, Netherlands.