Authors
Jui Keskar, Germany
Abstract
Data randomization is a recurring need in applications such as clinical trial design and AI model training, to name a few. To control the level of randomization, it is important to measure the level of randomness, i.e. unpredictability and dispersion, in the "randomized" data vis-a-vis the original data. While permutation entropy measures unpredictability, there is no technique that measures the dispersion of randomized data. To measure dispersion in randomized data, a "Neighbour-displacement-delta" (NDD) based technique is proposed. JChaosIndex, a measure of dispersion, considers the displacement of each data element as well as the relative displacements of the neighbours of each data element. The higher the JChaosIndex, the more dispersed the randomized data. The JChaosIndex measurement technique can be easily included in a programming language library, in database methods, or in any algorithm. Importantly, this technique is domain-agnostic, as it works purely on the indexes of the data records and not on the actual data.
Keywords
Measure of Randomness, Data Dispersion, JChaosIndex, Permutation Entropy, Neighbour Displacement Delta