Predictive modelling of online dynamic user-interaction recordings, and community identification from such data, are becoming increasingly important with the widespread use of online communication technologies. Despite the time-dependent nature of the problem, existing approaches to community identification are based on static or fully observed network connections. Here we present a new dynamic generative model for inferring communities from a sequence of temporal events produced through online computer-mediated interactions. The distinctive feature of our approach is that it models the process more realistically by accounting for possible random temporal delays between the intended connections. Inferring these delays from the data forms an integral part of our state-clustering methodology, so that the most likely communities are found on the basis of the likely intended connections rather than just the observed ones. We derive a maximum likelihood estimation algorithm for the identification of our model. The algorithm is computationally efficient for the analysis of historical data: it scales linearly with the number of non-zero observed (L + 1)-grams, where L is the Markov memory length. We also derive an incremental version of the algorithm, which could be used for real-time analysis. Results obtained on both synthetic and real-world data sets demonstrate that the approach is flexible and able to reveal novel and insightful structural aspects of online interactions. In particular, the analysis of a synchronous Internet relay chat participation sequence covering a full day reveals the formation of an extremely clear community structure.
- online community identification
- temporal delay
- latent variable model
- Markov chain