Node sampling services provide peers in a peer-to-peer system with a source of randomly chosen addresses of other nodes. Ideally, samples should be independent and uniform. The restrictions of a distributed environment, however, introduce various dependencies between samples. We review gossip-based sampling protocols proposed in previous work, and identify sources of inaccuracy. These include replicating the items from which samples are drawn, and imprecise management of the process of refreshing items. Based on this analysis, we propose a new protocol, Eddy, which aims to minimize temporal and spatial dependencies between samples. We demonstrate, through extensive simulation experiments, that these changes lead to an improved sampling service. Eddy maintains a balanced distribution of items representing active system nodes, even in the face of realistic levels of message loss and node churn. As a result, it behaves more like a centralized random number generator than previous protocols. We demonstrate this by showing that using Eddy improves the accuracy of a simple algorithm that uses random samples to estimate the size of a peer-to-peer network.
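The abstract does not say which size-estimation algorithm is used, so the following is only a minimal sketch of one common approach, a collision-counting (birthday-paradox style) estimator, to illustrate why sample quality matters. The function names `estimate_size`, `uniform_sampler`, and `skewed_sampler` are hypothetical, and the samplers are local stand-ins for a sampling service, not an implementation of Eddy or of any gossip protocol.

```python
import random
from collections import Counter


def estimate_size(samples):
    """Estimate the population size from repeated samples.

    With k uniform, independent samples from N nodes, the expected number
    of colliding pairs is k(k-1)/(2N), so N is roughly k(k-1)/(2 * pairs).
    """
    k = len(samples)
    counts = Counter(samples)
    # Number of unordered pairs of samples that refer to the same node.
    pairs = sum(c * (c - 1) // 2 for c in counts.values())
    if pairs == 0:
        raise ValueError("no repeated samples; draw more samples")
    return k * (k - 1) / (2 * pairs)


def uniform_sampler(n, rng):
    """Hypothetical ideal service: every node id in [0, n) is equally likely."""
    return lambda: rng.randrange(n)


def skewed_sampler(n, rng):
    """Hypothetical biased service: half of all samples come from the
    first 10% of node ids, mimicking an unbalanced item distribution."""
    hot = max(1, n // 10)

    def draw():
        if rng.random() < 0.5:
            return rng.randrange(hot)
        return rng.randrange(hot, n)

    return draw


if __name__ == "__main__":
    rng = random.Random(1)
    n, k = 1000, 2000
    for name, make in (("uniform", uniform_sampler), ("skewed", skewed_sampler)):
        draw = make(n, rng)
        print(name, round(estimate_size([draw() for _ in range(k)])))
```

Under these assumptions, the skewed sampler produces more collisions than a uniform one, so the same estimator reports a network noticeably smaller than the true size. This is the kind of error that a more balanced sampling service is meant to reduce.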
Bibliographical note
Funding Information:
Acknowledgements: This research is funded in part by the Engineering and Physical Sciences Research Council (EPSRC) UK grant number EP/F000936/1.
Keywords
- Node sampling
ASJC Scopus subject areas
- Computer Networks and Communications