Abstract
Probabilistic graphs have gained much interest in the data mining community since the big data revolution. Graph clustering is a widely used technique in exploratory data analysis such as data compression, information retrieval, protein–protein interaction network, etc. Traditional clustering algorithms assume that the data represented in a graph is deterministic, i.e., the edges between nodes are certain, though they may vary in their importance (weight). In probabilistic graphs, however, edges are uncertain and are represented as probabilities. In this work, we propose an evolutionary algorithm based clustering technique specifically to deal with uncertainties in such data. It has been observed that correlation usually exists among adjacent edges. Our work exploits this principle and explores the paths between nodes (including those of multiple order) to provide an estimate of the similarity between nodes with probabilistic edges. We embed this information into the fitness function and use Genetic algorithm to optimize the solution. The algorithm is tested on several benchmark datasets and compared with recent state-of-the-art algorithms.
| Original language | English |
|---|---|
| Pages (from-to) | 216-238 |
| Number of pages | 23 |
| Journal | Information Sciences |
| Volume | 568 |
| Early online date | 6 Apr 2021 |
| DOIs | |
| Publication status | Published - Aug 2021 |
Keywords
- Clustering
- Probabilistic graphs
- Co-similarity
- Genetic algorithm