Maria Alexandra Theodorescu, Daniela Raicu, Jacob Furst, Roselyne Tchoua, College of Computing and Digital Media, DePaul University, 243 S Wabash Ave, Chicago, IL 60604
Data science is intrinsically interdisciplinary; however, end-users of machine learning models are not always trained data scientists. At the same time, these models must be infused with domain knowledge to increase explainability and trust in their output. Our ultimate goal is to assign domain-aware confidence scores that help domain experts make informed decisions. Our hypothesis is that, given confidence scores, end-users will be more willing to trust and adopt machine learning models. We test this hypothesis in materials informatics, a field that leverages machine learning and large datasets for targeted design and has the potential to greatly reduce time-to-market and development costs for new materials. For example, automated phase-mapping seeks to discover samples of material mixtures with similar structure. This is challenging because the number of measurements per sample far exceeds the number of samples to cluster, which makes the results difficult to interpret and generalize. Towards our goal, we are building a dashboard that compares and contrasts clustering methods. We envision that scientists will not only be able to assess confidence scores but also interact with the results by merging and splitting clusters, thereby guiding the discovery process. We describe the signals in terms of peaks and other interpretable features, and we cluster the data using K-Means with varied numbers of clusters. We provide several visualization options (e.g. layered graphs, samples closest to and farthest from centroids). Our preliminary results, evaluated against ground truth labels, show a number of fully or mostly homogeneous clusters, discovering well-defined clusters with a fraction of the original features (28 out of 2000, or 1.4%). As we repeat the experiment with more clusters, we identify consistently homogeneous clusters as well as larger clusters that are candidates for further splitting.
We will experiment with different clustering methods, comparing their performance on our features, the original features, and other reduced feature sets (e.g. PCA).
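The clustering workflow described above (K-Means with varied numbers of clusters, homogeneity checked against ground truth labels, and samples closest to and farthest from each centroid) can be illustrated with a minimal sketch. It is not the authors' implementation. The synthetic two-feature samples, the labels "A", "B", "C", and the helper names `make_samples`, `kmeans`, and `cluster_report` are all hypothetical, and a plain Lloyd's-algorithm K-Means is used so that the example needs only the Python standard library.

```python
import math
import random
from collections import Counter


def make_samples(seed=1):
    """Hypothetical data: 3 ground-truth groups, 20 samples each, 2 features."""
    rng = random.Random(seed)
    centres = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (0.0, 10.0)}
    points, labels = [], []
    for name, (cx, cy) in centres.items():
        for _ in range(20):
            points.append((cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1)))
            labels.append(name)
    return points, labels


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct samples
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each sample goes to its nearest centroid.
        new_assign = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assign) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
        if new_assign == assign:
            break
        assign = new_assign
    return centroids, assign


def cluster_report(points, labels, centroids, assign):
    """Per-cluster size, purity, and indices of nearest/farthest samples."""
    report = {}
    for c, centre in enumerate(centroids):
        idx = [i for i, a in enumerate(assign) if a == c]
        if not idx:
            continue
        majority, n_major = Counter(labels[i] for i in idx).most_common(1)[0]
        by_dist = sorted(idx, key=lambda i: math.dist(points[i], centre))
        report[c] = {
            "size": len(idx),
            "majority_label": majority,
            "purity": n_major / len(idx),  # 1.0 means fully homogeneous
            "closest": by_dist[0],
            "farthest": by_dist[-1],
        }
    return report


if __name__ == "__main__":
    points, labels = make_samples()
    for k in (2, 3, 4):  # repeat the experiment with more clusters
        centroids, assign = kmeans(points, k)
        for c, row in cluster_report(points, labels, centroids, assign).items():
            print(k, c, row)
```

In this sketch, purity is the fraction of a cluster's samples that share its majority ground-truth label. A cluster whose purity stays at 1.0 as `k` grows corresponds to a consistently homogeneous cluster, while a large cluster with low purity is a candidate for further splitting.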
Presenter: Maria Alexandra Theodorescu
Institution: DePaul University
Type: Poster
Subject: Computer Science
Status: Approved