Human-in-the-loop Clustering Dashboard for Materials Structure Exploration

Maria Alexandra Theodorescu, Daniela Raicu, Jacob Furst, Roselyne Tchoua, College of Computing and Digital Media, DePaul University, 243 S Wabash Ave, Chicago, IL 60604

Data science is intrinsically interdisciplinary; however, end-users of machine learning models are not always trained data scientists. At the same time, it is crucial that these models be infused with domain knowledge in order to increase explainability and trust in their output. Our ultimate goal is to assign domain-aware confidence scores that help domain experts make informed decisions. Our hypothesis is that, given confidence scores, end-users will be more willing to trust and adopt machine learning models. We test this hypothesis in materials informatics, a field that has the potential to greatly reduce time-to-market and development costs for new materials because it leverages machine learning and large datasets for targeted design. For example, automated phase-mapping seeks to discover samples of material mixtures with similar structure. This is challenging because the number of measurements per sample far exceeds the number of samples to cluster, which makes the results difficult to interpret and generalize. Towards our goal, we are building a dashboard that compares and contrasts clustering methods. We envision that scientists will not only be able to assess confidence scores but also interact with the results by merging and splitting clusters to guide the discovery process. We describe the signals in terms of peaks and other interpretable features, and we cluster the data using K-Means with varied numbers of clusters. We provide several visualization options (e.g., layered graphs, samples closest to and farthest from centroids). Our preliminary results, evaluated against ground truth labels, show a number of fully or mostly homogeneous clusters, discovering well-defined clusters with a fraction of the original features (28 out of 2000, or 1.4%). As we repeat the experiment with more clusters, we identify consistently homogeneous clusters as well as larger clusters that are candidates for further splitting. We will experiment with different clustering methods, comparing performance using our features, the original features, and other reduced feature sets (e.g., PCA).
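The clustering and inspection steps described in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' pipeline: the toy feature vectors, the label names "alpha" and "beta", the function names, and the deterministic farthest-point initialization are all assumptions made for illustration. It runs K-Means on a handful of samples, reports how homogeneous each cluster is against ground truth labels (as the fraction of members sharing the majority label), and ranks cluster members from closest to farthest from the centroid.

```python
import math
from collections import Counter


def kmeans(points, k, n_iter=50):
    """Plain K-Means with farthest-point initialization (an assumption,
    chosen here only so the sketch is deterministic)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point whose nearest existing centroid is farthest away.
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    assign = None
    for _ in range(n_iter):
        new_assign = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


def cluster_purity(assign, labels, k):
    """Per cluster: (size, majority label, fraction of members with that label)."""
    report = {}
    for c in range(k):
        member_labels = [lab for lab, a in zip(labels, assign) if a == c]
        if member_labels:
            label, count = Counter(member_labels).most_common(1)[0]
            report[c] = (len(member_labels), label, count / len(member_labels))
    return report


def rank_by_distance(points, assign, centroids, c):
    """Indices of the members of cluster c, closest to the centroid first."""
    members = [i for i, a in enumerate(assign) if a == c]
    return sorted(members, key=lambda i: math.dist(points[i], centroids[c]))


# Hypothetical 3-feature descriptions of 8 samples (e.g. peak position,
# peak width, peak count); the values are made up for illustration.
samples = [
    (1.0, 0.20, 5.0), (1.1, 0.25, 5.2), (0.9, 0.15, 4.9), (1.05, 0.20, 5.1),
    (4.0, 0.80, 1.0), (4.2, 0.85, 1.1), (3.9, 0.75, 0.9), (4.1, 0.80, 1.05),
]
labels = ["alpha"] * 4 + ["beta"] * 4  # hypothetical ground truth phases

assign, centroids = kmeans(samples, k=2)
purity = cluster_purity(assign, labels, k=2)
for c, (size, label, frac) in purity.items():
    ranked = rank_by_distance(samples, assign, centroids, c)
    print(f"cluster {c}: size={size}, majority={label}, purity={frac:.2f}, "
          f"closest={ranked[0]}, farthest={ranked[-1]}")
```

In a dashboard setting, a low-purity or unusually large cluster in such a report would be a natural candidate for splitting, and the closest and farthest members give a quick visual check of what a cluster represents.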

Additional Abstract Information

Presenter: Maria Alexandra Theodorescu

Institution: DePaul University

Type: Poster

Subject: Computer Science

Status: Approved

Time and Location

Session: Poster 5
Date/Time: Tue 12:30pm-1:30pm
Session Number: 4003