We study the shape of data given as point clouds: collections of samples (documents, images, individuals, ...) represented as points in a high-dimensional feature space. A fundamental premise of data science is that such high-dimensional datasets contain simple underlying geometric structure. For instance, linear regression assumes that a cloud of high-dimensional points has an inherently linear structure. Here, we develop a framework and an algorithm for making geometric inferences about the underlying shape, which can support downstream machine learning and data analysis.
Our framework studies integro-geometric properties (such as perimeter, surface area, and mean width) of the union of virtual balls centered at the data points of the cloud. Estimating these properties over a wide range of ball radii shows how they evolve across scales, which serves as a proxy for the geometry of the structure from which the points were sampled.
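To make the idea of tracking a property across scales concrete, the following is a minimal sketch for the two-dimensional case. It estimates the area of the union of disks by plain Monte Carlo sampling, which is an illustrative stand-in and not the algorithm described here. The function name `union_area_mc`, the sample count, and the example points are assumptions made for this sketch.

```python
import numpy as np


def union_area_mc(points, radius, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the area of the union of disks.

    Each disk has the given radius and is centered at one row of `points`
    (shape (n, 2)). Hypothetical helper for illustration only.
    """
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)

    # Bounding box that contains every disk.
    lo = points.min(axis=0) - radius
    hi = points.max(axis=0) + radius
    samples = rng.uniform(lo, hi, size=(n_samples, 2))

    # A sample is covered if it lies inside at least one disk.
    covered = np.zeros(n_samples, dtype=bool)
    for center in points:
        covered |= ((samples - center) ** 2).sum(axis=1) <= radius ** 2

    box_area = float(np.prod(hi - lo))
    return box_area * covered.mean()


# Area of the union as a function of the ball radius (the "scale").
cloud = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.8]])
radii = [0.1, 0.25, 0.5, 1.0]
area_curve = [union_area_mc(cloud, r) for r in radii]
```

The resulting `area_curve` is one example of the kind of scale-dependent profile described above; perimeter or mean width would need a different estimator.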
We have developed an algorithm that realizes these geometric inferences; our naive first implementation, however, is too slow to be practical. At its core, the algorithm must repeatedly compute intersection volumes of multiple virtual balls, which is extremely time-consuming. To identify the relevant computations more quickly, we now use an inductive method to efficiently compute the Vietoris–Rips complex of the point cloud as a preliminary data representation. In addition, for each simplex dimension, we train a neural network to learn the intersection volumes of multiple virtual balls from the relative distances between the balls and their radii alone. The trained networks can then replace the costly exact computations when estimating the integro-geometric properties of the point cloud shape.
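As an illustration of the preliminary representation, here is a minimal sketch of an inductive Vietoris–Rips construction: each k-simplex is extended to (k+1)-simplices by adding a vertex that is a lower-indexed neighbor of all of its vertices. This is one standard way to build the complex and is not necessarily the exact method used here. The names `rips_complex` and `euler_characteristic`, the convention that two points are joined when their distance is at most `epsilon`, and the `max_dim` cutoff are assumptions of the sketch.

```python
import numpy as np


def rips_complex(points, epsilon, max_dim=2):
    """Return the simplices of the Vietoris-Rips complex, grouped by dimension.

    levels[k] is a list of k-simplices, each a tuple of vertex indices in
    increasing order. Assumed convention: an edge joins two points whose
    distance is at most `epsilon`.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

    # Lower neighbors of i: adjacent vertices with a smaller index.
    lower = [{j for j in range(i) if dist[i, j] <= epsilon} for i in range(n)]

    levels = [[(i,) for i in range(n)]]
    for k in range(max_dim):
        next_level = []
        for simplex in levels[k]:
            # Vertices adjacent to every vertex of the simplex and smaller
            # than all of them; each one yields exactly one coface.
            common = set.intersection(*(lower[v] for v in simplex))
            for v in sorted(common):
                next_level.append((v,) + simplex)
        levels.append(next_level)
    return levels


def euler_characteristic(levels):
    """Alternating sum of simplex counts, truncated at the built dimension."""
    return sum((-1) ** k * len(simplices) for k, simplices in enumerate(levels))


# Four corners of the unit square: sides are joined, diagonals are not.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
levels = rips_complex(square, epsilon=1.1)
```

Because balls with a common point must also intersect pairwise, only the vertex sets that appear as simplices of this complex (at the matching scale) can have a nonempty joint intersection, so intersection volumes need to be evaluated for those sets only.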
Preliminary results show that our method robustly estimates geometric features (area, perimeter, and Euler–Poincaré characteristic) of two-dimensional point clouds and distinguishes between various shapes.