Verifying Data with a Better Mapper

Data sets often come in a “cloud” of points, says CUNY Baccalaureate student Alisa Leshchenko. The algorithm known as Mapper helps extract the shape of the data from this cloud. But Mapper isn’t perfect.

In a new study, Leshchenko and Professor Mikael Vejdemo-Johansson of The College of Staten Island and The Graduate Center, CUNY, present an improved version of Mapper that makes it possible to verify the algorithm’s results.

Graphs showing Mapper analysis of the Iris dataset using Petal Length as a filter function, and density of Z-scores of log persistences from applying Method 4 to the Iris mapper graph. Marked with a vertical line and a separate point, both in orange, is the maximum Z-score from the dataset itself.

“I think it will be particularly useful in applications where a guarantee is paramount,” Leshchenko said, “such as where medical or financial decisions are at stake.”

Their paper appears in the Proceedings of the Abel Symposium.

The Mapper algorithm extracts a data cloud’s shape by chopping up the cloud, shrinking the resultant chunks, and piecing them back together to form a “pixelated” version of the original.
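The chop, shrink, and reconnect steps can be sketched in a few lines of Python. This is only an illustration of the classical procedure as it is commonly described, not the authors' code and not Certified Mapper. The function names (`mapper`, `single_linkage`), the parameters (`n_intervals`, `overlap`, `eps`), and the choice of a simple distance-threshold clustering are all assumptions made for the example.

```python
import math
from itertools import combinations

def single_linkage(points, idxs, eps):
    """Cluster the given point indices: points within eps of each other are linked."""
    clusters, unvisited = [], set(idxs)
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in unvisited
                    if math.dist(points[i], points[j]) <= eps}
            unvisited -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters

def mapper(points, filter_fn, n_intervals=4, overlap=0.25, eps=0.5):
    """Hypothetical minimal Mapper: returns (nodes, edges).

    nodes: list of frozensets of point indices (one per cluster)
    edges: set of node-index pairs whose clusters share a point
    """
    values = [filter_fn(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    pad = overlap * width
    nodes = []
    for k in range(n_intervals):
        # Chop: take the points whose filter value falls in an overlapping interval.
        a, b = lo + k * width - pad, lo + (k + 1) * width + pad
        chunk = [i for i, v in enumerate(values) if a <= v <= b]
        # Shrink: each cluster inside the chunk becomes a single node.
        nodes.extend(single_linkage(points, chunk, eps))
    # Piece together: connect two nodes when their clusters share a point.
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]}
    return nodes, edges

# Example: 40 points on a circle, filtered by their x-coordinate.
circle = [(math.cos(2 * math.pi * i / 40), math.sin(2 * math.pi * i / 40))
          for i in range(40)]
nodes, edges = mapper(circle, filter_fn=lambda p: p[0], eps=0.3)
print(len(nodes), "nodes,", len(edges), "edges")
```

On the circle example, the resulting graph is itself a loop, which is the "pixelated" shape one would hope to recover. How well this works in general depends heavily on the assumed parameters.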

“Classical Mapper has been used, for example, to identify a subpopulation of breast cancer patients who have good survival odds,” Leshchenko said. “Standard techniques would not have been able to identify this group.”

A theorem called the Nerve Lemma says that if you chop up and reconnect your data in a certain way, the new shape will keep the most useful features of the original. In mathematical terms, the two will be “topologically indistinguishable.”
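For readers who want to see the object the theorem is named for: the "nerve" of a collection of overlapping chunks has one vertex per chunk and one simplex for every group of chunks that share at least one point. The sketch below builds it for finite sets. The function name `nerve` and the `max_dim` parameter are assumptions for illustration. The code does not check the theorem's hypothesis, usually stated as requiring that every non-empty overlap be contractible (roughly, have no holes of its own).

```python
from itertools import combinations

def nerve(cover, max_dim=2):
    """Return the simplices of the nerve of `cover`, up to dimension max_dim.

    `cover` is a list of sets. A simplex is a tuple of indices into `cover`
    whose sets have at least one element in common.
    """
    simplices = []
    for size in range(1, max_dim + 2):
        for combo in combinations(range(len(cover)), size):
            if set.intersection(*(cover[i] for i in combo)):
                simplices.append(combo)
    return simplices

# Three chunks that overlap pairwise but have no point common to all three.
cover = [{1, 2}, {2, 3}, {3, 1}]
print(nerve(cover))
```

Here the nerve is a hollow triangle: three vertices and three edges, but no filled-in face, because no single point lies in all three chunks.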

Unfortunately, the Classical Mapper doesn’t meet the requirements of this theorem, so you can’t use the Nerve Lemma to check if your new shape is missing important features.

The researchers describe an algorithm they have developed, dubbed Certified Mapper, that does meet the Nerve Lemma’s requirements. In other words, results from the new algorithm can be checked.

The authors say that these algorithms are still finding their place in statistics and that Certified Mapper still needs to be optimized, but it could someday be used in fields such as drug discovery.

Work By

Mikael Vejdemo-Johansson (Assistant Professor, Data Science, Mathematics)
Alisa Leshchenko (Student, CUNY Baccalaureate, Mathematics and Data Science)