JCHAOSINDEX: Measuring and Benchmarking Dispersion in Randomized Data

Jui Keskar, Germany


Authors

Jui Keskar, Germany

Abstract

Randomizing data is a recurring need in applications such as the design of clinical trials and the training of AI models. To control the level of randomization, it is important to measure the randomness, i.e. the unpredictability and dispersion, of the "randomized" data relative to the original data. While permutation entropy measures unpredictability, there is no technique that measures the dispersion of randomized data. To measure this dispersion, a technique based on "Neighbour Displacement Delta" (NDD) is proposed. JChaosIndex, a measure of dispersion, considers the displacement of each data element as well as the relative displacements of the neighbours of each data element. The higher the JChaosIndex, the more dispersed the randomized data. The JChaosIndex measurement technique can easily be included in a programming language library, in database methods, or in any algorithm. Importantly, the technique is domain-agnostic, as it works purely on the indexes of the data records and not on the actual data.
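
As a rough illustration of the two index-based quantities named in the abstract, the sketch below computes a per-element displacement and a neighbour displacement delta for a permutation of record indexes. This is not the JChaosIndex formula, which the abstract does not state. The function names, the convention that `perm[new_position]` holds the original index, and the definition of the neighbour delta as the change in distance between originally adjacent elements are all assumptions made for illustration.

```python
from typing import List, Sequence


def new_positions(perm: Sequence[int]) -> List[int]:
    """Invert the permutation: result[i] is the new position of original index i.

    Assumed convention (hypothetical): perm[new_position] == original_index.
    """
    result = [0] * len(perm)
    for position, original_index in enumerate(perm):
        result[original_index] = position
    return result


def displacements(perm: Sequence[int]) -> List[int]:
    """Absolute distance each element moved, listed by original index."""
    pos = new_positions(perm)
    return [abs(pos[i] - i) for i in range(len(perm))]


def neighbour_deltas(perm: Sequence[int]) -> List[int]:
    """Change in separation for each originally adjacent pair (i, i + 1).

    Assumed definition (not taken from the paper): the pair started at
    distance 1, so the delta is its new distance minus 1.
    """
    pos = new_positions(perm)
    return [abs(pos[i + 1] - pos[i]) - 1 for i in range(len(perm) - 1)]


if __name__ == "__main__":
    reversed_order = [3, 2, 1, 0]
    shuffled = [2, 0, 3, 1]
    # A reversal moves elements far, yet every pair of neighbours stays adjacent.
    print(displacements(reversed_order), neighbour_deltas(reversed_order))
    # A shuffle that also separates former neighbours.
    print(displacements(shuffled), neighbour_deltas(shuffled))
```

In this sketch a full reversal yields large displacements but zero neighbour deltas, which suggests why a dispersion measure might look at both quantities rather than at displacement alone. Only indexes are used, in line with the domain-agnostic property described in the abstract.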

Keywords

Measure of Randomness, Data Dispersion, JChaosIndex, Permutation Entropy, Neighbour Displacement Delta

CS&IT Conference Proceedings
