Modeling population growth in online social networks
© Zhu et al.; licensee Springer. 2013
Received: 14 March 2013
Accepted: 27 May 2013
Published: 24 June 2013
Online social networks (OSNs) are now among the most popular applications on the web, offering platforms for people to interact, communicate and collaborate with others. The rapid development of OSNs creates opportunities for people’s daily communication, but it also brings problems such as bursts of network traffic and server overload. Studying population growth patterns in OSNs helps service providers understand how people communicate in OSNs and facilitates the management of network resources. In this paper, we propose a population growth model for OSNs based on the study of population distribution and growth in spatiotemporal scale-space.
We investigate the population growth in three data sets randomly sampled from popular OSN web sites: Renren, Twitter and Gowalla. We find that the population size follows a power-law distribution over geographic locations, and that the population growth of a location fits a power function of time. An aggregated population growth model is derived by integrating the population growth over geographic locations and time.
We use the data sets to validate our population growth model. Extensive experiments also show that the proposed model fits the population growth of Facebook and Sina Weibo well. As an application, we use the model to predict the monthly population in three data sets. By comparing the predicted population with ground-truth values, the results show that our model can achieve a prediction accuracy between 86.14% and 99.89%.
With our proposed population growth model, one can estimate the population size of an online social network in a given time period and predict its population at a future time.
Online social networks (OSNs) are now among the most popular applications on the web, offering platforms for people to interact, communicate and collaborate with others. The user population of online social networks is growing rapidly. Facebook (2013) reportedly reached 900 million users in April 2012, and Twitter (2013) surpassed 500 million users in July 2012. The rapid development of OSNs facilitates people’s daily communication. However, the growth of the user population also causes problems for service providers, such as server overload. One example is the “fail whale” phenomenon in Twitter, where a requested page returns a “fail whale” image when a burst of too many requests occurs.
The issues and patterns of population growth in OSNs have drawn much attention from academia, and many studies have been carried out in recent years. A study of microscopic evolution in OSNs (Leskovec et al. 2008) captured the best fits of population growth in four different OSNs and showed that the growth tendency varies with time. Torkjazi et al. (2009) and Rejaie et al. (2010) observed S-shaped population growth: slow growth in the beginning, followed by a period of exponential growth and finally a significant and sudden slowdown in the growth of the population. However, most of these studies do not provide a theoretical model to describe the population growth in OSNs. Moreover, existing works study population growth only in the temporal dimension and do not consider the dynamics at the geographic scale.
In this paper, we investigate the population growth of OSNs in spatiotemporal scale-space. Our investigation is based on three data sets randomly sampled from popular OSN websites, namely Renren (2013), Twitter (2013) and Gowalla (2011), from which we explore population distributions over geographic locations and the time-varying properties of population growth. We find that, in the spatial scale, the population size follows a power-law distribution over geographic locations. In the temporal scale, the population of the most populated location grows as a power function of time. The number of populated locations also increases as a power function of time. Based on these observations, we propose an aggregated population growth model by integrating the population growth over geographic locations and time. We present a theoretical analysis to derive this model and conduct comprehensive experiments to verify its effectiveness. The proposed model fits the population growth of large-scale, rapidly growing OSNs such as Facebook (2013) and Sina Weibo (2013) well. As an application, we further use the model to predict population growth in the three data sets, where it achieves a prediction accuracy between 86.14% and 99.89%.
There are several applications of our work. Understanding the population growth of OSN users is valuable for Internet Service Providers (ISPs), because it further reveals user interaction patterns and network traffic patterns. It is also beneficial for OSN web sites, which can deploy servers and place advertisements on the basis of the population growth model. Third-party service providers can use the model to analyze the service market and further optimize their resource deployment and investment.
To conduct our analysis, we collect data from three online social network sites: Renren, a social-based application service; Twitter, a social-based media service; and Gowalla, a location-based online social service. Renren, established in December 2005 and now with 160 million users, is a Chinese online social network that organizes users into membership-based networks representing schools, companies and geographic locations. It allows users to post short messages known as statuses, as well as blogs and pictures, and to share content such as videos, articles and pictures. Twitter, launched in July 2006 and now with over 500 million users, is known for its microblogging service, in which users can write about any topic within a 140-character limit. Such a short message is known as a tweet. A user can follow any other user and receive tweets from the users he or she follows. Unlike the two networks above, Gowalla is a location-oriented online social service in which people check in at the places they visit via mobile devices. It was launched in 2007 and closed in 2012 with approximately 600,000 users.
We collect the Renren and Twitter data sets by crawling from their sites. We start our crawling with randomly selected users from the largest weakly connected component (WCC). Following friends’ links in the forward direction in a breadth first search (BFS) fashion, we collect a sample of each social network. To eliminate the degree bias caused by BFS, we launch the BFS-bias correction procedure described in (Kurant et al. 2011). Furthermore, according to the estimation method of the size of social networks by Katzir et al. (2011), we believe the quality and quantity of our data sets are enough to reveal the laws of population growth in OSNs.
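The crawling step can be illustrated with a minimal breadth-first sampling sketch. The adjacency dictionary `toy_graph`, the function name `bfs_sample` and the `max_nodes` cap are hypothetical names introduced for illustration; the sketch does not include the BFS-bias correction of Kurant et al. (2011), which would be applied to the collected sample afterwards.

```python
from collections import deque

def bfs_sample(graph, seeds, max_nodes):
    """Collect up to max_nodes users by following outgoing friend links in BFS order."""
    visited = []             # users in the order they were crawled
    seen = set(seeds)        # users already queued, to avoid revisiting
    queue = deque(seeds)
    while queue and len(visited) < max_nodes:
        user = queue.popleft()
        visited.append(user)
        for friend in graph.get(user, []):
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return visited

# Toy directed friendship graph (illustrative data only).
toy_graph = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": [], "e": ["a"]}
sample = bfs_sample(toy_graph, ["a"], max_nodes=4)
print(sample)
```

BFS over-represents high-degree users, which is why a correction step is needed before the sample is used for estimation.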
In order to capture the growth of population in different geographic locations, we need to know the account creation time and geographic information of each user. For Twitter, we obtain the account creation time from the user profile. For Renren, however, the account creation time cannot be retrieved explicitly from the user profile. To approximate it, we use the time of a user’s first activity, such as updating a status, posting a blog or interacting with friends, as the time when the account was created. We also extract users’ geographic locations from their profiles and keep only users with valid geographic information in our data sets.
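As a minimal sketch of this preprocessing, the following code takes the earliest activity timestamp of each user as the estimated creation time and accumulates the estimates into a monthly population curve. The function names and the toy `activities` dictionary are assumptions made for illustration, not part of the original pipeline.

```python
from collections import Counter
from datetime import datetime

def estimate_creation_time(activity_times):
    """Use the earliest observed activity as a proxy for account creation."""
    return min(activity_times) if activity_times else None

def cumulative_monthly_population(creation_times):
    """Return [((year, month), cumulative users)] for months with new users."""
    counts = Counter((t.year, t.month) for t in creation_times)
    total = 0
    curve = []
    for month in sorted(counts):   # months without registrations are skipped
        total += counts[month]
        curve.append((month, total))
    return curve

# Toy activity logs (illustrative data only).
activities = {
    "u1": [datetime(2006, 3, 5), datetime(2006, 1, 20)],
    "u2": [datetime(2006, 1, 2)],
    "u3": [datetime(2006, 3, 1)],
}
created = [estimate_creation_time(times) for times in activities.values()]
curve = cumulative_monthly_population(created)
print(curve)
```

The first-activity proxy can only be later than or equal to the true registration time, so the estimated curve may lag the true one.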
The Gowalla data set, obtained from a public source (Cho et al. 2011), contains more than 100,000 users, as well as their social relations and check-in histories. We take the time of a user’s first check-in as the registration time. To obtain users’ geographic information, we follow Cho et al. (2011) and infer a user’s location by dividing the globe into 25 by 25 km cells and defining the location as the cell with the most check-ins.
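The cell-based location inference can be sketched as follows. The exact gridding used by Cho et al. (2011) is not described here, so the conversion below is an assumption: it uses roughly 111.32 km per degree of latitude and scales longitude by the cosine of the latitude at the centre of each cell row. `cell_of` and `home_cell` are hypothetical helper names.

```python
import math
from collections import Counter

CELL_KM = 25.0
KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude

def cell_of(lat, lon):
    """Map a coordinate to a (row, col) index of an approximate 25 km grid."""
    row = math.floor(lat * KM_PER_DEG_LAT / CELL_KM)
    # Use the row's centre latitude so all points in a row share one column width.
    row_center_lat = (row + 0.5) * CELL_KM / KM_PER_DEG_LAT
    km_per_deg_lon = KM_PER_DEG_LAT * math.cos(math.radians(row_center_lat))
    col = math.floor(lon * km_per_deg_lon / CELL_KM)
    return (row, col)

def home_cell(checkins):
    """Return the cell that contains the most check-ins of one user."""
    counts = Counter(cell_of(lat, lon) for lat, lon in checkins)
    return counts.most_common(1)[0][0]

# Toy check-ins: two close together, one far away (illustrative data only).
checkins = [(30.27, -97.74), (30.28, -97.75), (40.71, -74.00)]
home = home_cell(checkins)
print(home)
```

The flat-grid approximation degrades near the poles, which is acceptable for a sketch but would need a proper equal-area grid in practice.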
Statistics of data sets
We present the methods for modeling of population growth in online social networks in this section.
To study the population growth in OSNs, we first illustrate the basic approach of modeling the population growth in spatiotemporal dimension.
The population of an OSN grows over both locations and time. In the spatial aspect, people from different geographic locations register as users, so people from more and more locations join the network. This expansion of the OSN from location to location leads to spatial growth of the population. At the same time, the population in each geographic location grows in the temporal scale: people in a location are attracted to join the network over time, so the location accumulates more and more users. Therefore, combining spatial and temporal effects, we model the population growth as the accumulation of populations in different geographic locations, where the population in each location changes as a function of time. We describe the population growth in spatial and temporal dimensions as follows.
This formulation describes the aggregated population on a certain time point in spatial aspect.
The formulation reflects not only the spatial characteristic that the total is an aggregation of populations in different geographic locations, but also the temporal factor that population growth is a dynamic process and a function of time.
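The accumulation idea can be illustrated numerically with a small sketch. It is not the closed-form model derived in this paper: each location is represented by a hypothetical pair of a joining month and a growth function, and the aggregated population at time t is simply the sum over the locations that have joined by then.

```python
def aggregated_population(location_curves, t):
    """Sum the population of every location that has joined the network by time t."""
    return sum(curve(t) for start, curve in location_curves if t >= start)

# Two toy locations (illustrative growth functions only):
# one joins at month 0, the other at month 2.
location_curves = [
    (0, lambda t: 10 * t),
    (2, lambda t: 5 * (t - 2)),
]

print(aggregated_population(location_curves, 1))  # only the first location counts
print(aggregated_population(location_curves, 4))  # both locations contribute
```

Both effects described above are visible here: the number of terms in the sum grows as new locations join, and each term grows with time.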
So far, we have proposed a population growth model from a spatiotemporal perspective. To specify this model, we need to study three time-dependent functions: the dynamics of the population distribution P(s,t), the growth function of populated locations l(t) and the growth function of the largest population n(t). We study them in the following subsections.
In addition, we test alternative hypotheses about the population distribution with the likelihood ratio test (Clauset et al. 2009), which suggests that the distribution is a power law if the likelihood ratio between the alternative and the power-law distribution is positive. The likelihood ratio of the exponential distribution compared with the power-law distribution is 2.23, and that of the log-normal distribution compared with the power-law distribution is 0.12. The results suggest that the power law is the best distribution to represent the population distribution.
We fit each distribution in the figures with maximum likelihood estimation (MLE) (Newman 2005; Clauset et al. 2009). The fitting results are shown as dashed lines. The Renren data set has a power-law exponent of 1.4, Twitter has a power-law exponent of 1.78 and Gowalla has a power-law exponent of 1.4.
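A minimal sketch of these two steps is given below, assuming the continuous power-law estimator of Clauset et al. (2009), alpha = 1 + n / sum(ln(x_i / x_min)), and an exponential alternative shifted to start at x_min. The function names, the synthetic sample and the sign convention (log-likelihood of the power law minus that of the exponential, so a positive value favors the power law) are assumptions of the sketch, not the paper's exact procedure.

```python
import math

def powerlaw_mle_exponent(values, x_min):
    """Continuous power-law MLE: alpha = 1 + n / sum(ln(x / x_min))."""
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum <= 0:
        raise ValueError("need at least one value above x_min")
    return 1.0 + len(tail) / log_sum

def loglik_powerlaw(values, x_min, alpha):
    """Log-likelihood under p(x) = (alpha - 1) / x_min * (x / x_min) ** -alpha."""
    n = len(values)
    return n * math.log((alpha - 1.0) / x_min) - alpha * sum(
        math.log(x / x_min) for x in values)

def loglik_exponential(values, x_min):
    """Log-likelihood under a shifted exponential at its MLE rate."""
    n = len(values)
    rate = n / sum(x - x_min for x in values)
    return n * math.log(rate) - n  # because rate * sum(x - x_min) equals n

# Deterministic synthetic sample from a power law via inverse transform sampling.
true_alpha, x_min, n = 1.78, 1.0, 10000
sample = [x_min * (1.0 - (i + 0.5) / n) ** (-1.0 / (true_alpha - 1.0))
          for i in range(n)]

alpha_hat = powerlaw_mle_exponent(sample, x_min)
log_ratio = loglik_powerlaw(sample, x_min, alpha_hat) - loglik_exponential(sample, x_min)
print(round(alpha_hat, 3), log_ratio > 0)
```

For integer-valued populations a discrete estimator would be more appropriate; the continuous form is used here only to keep the example short.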
where φ is the scaling factor and λ is the power-law exponent. The equation reveals that the population distribution in different time periods is a power-law function and is independent of time.
To model the population growth in OSNs, one important aspect is to understand the growth of populated locations, which we investigate in this subsection.
with scaling parameter η and power exponent ε. The power exponent is 1.26 for Renren, 1.96 for Twitter and 1.62 for Gowalla.
In summary, we find that the number of populated locations in OSNs grows as a power function of time.
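One simple way to estimate η and ε for a power function of the form η·t^ε is ordinary least squares on log-transformed data, since log l(t) = log η + ε log t. The sketch below makes that assumption; the fitting procedure actually used for the figures is not stated, and the function name and synthetic data are illustrative.

```python
import math

def fit_power_function(times, values):
    """Fit values ~ eta * times**eps by least squares in log-log space."""
    xs = [math.log(t) for t in times]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    eps = sxy / sxx                          # slope in log-log space
    eta = math.exp(mean_y - eps * mean_x)    # intercept mapped back
    return eta, eps

# Synthetic noise-free data with eta = 3 and eps = 1.26 (illustrative only).
months = list(range(1, 21))
locations = [3.0 * t ** 1.26 for t in months]
eta, eps = fit_power_function(months, locations)
print(round(eta, 6), round(eps, 6))
```

Log-log regression weights small values more heavily than a direct nonlinear fit would, so the two approaches can give slightly different exponents on noisy data.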
As we model the population growth as an accumulation of populations in various locations, the largest population as the upper bound of the formulation also needs to be investigated.
Besides the power component $a \cdot t^{b}$, a constant $c$ is added to the power function. We use this function to fit the largest population size in each data set, shown as the solid lines in the figures. Specifically, the power parameter is 1.31 for Renren, 2.97 for Twitter and 1.61 for Gowalla.
Therefore, the growth function of the largest population size is a power function. The population growth of a location will affect the aggregated population growth. We will give the detailed model of the population growth in the following subsection.
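A fit of the largest population to a power function plus a constant, a·t^b + c, can be sketched as follows. Because the model is linear in a and c once b is fixed, the sketch scans a grid of candidate b values and solves the remaining least-squares problem in closed form. The grid, the function name and the synthetic data are assumptions for illustration, not the fitting method used in the paper.

```python
def fit_power_plus_constant(times, values, b_grid):
    """Grid-search b; for each b solve least squares for a and c in a*t**b + c."""
    best = None
    n = len(times)
    mean_y = sum(values) / n
    for b in b_grid:
        us = [t ** b for t in times]
        mean_u = sum(us) / n
        suu = sum((u - mean_u) ** 2 for u in us)
        if suu == 0:
            continue  # degenerate candidate, cannot determine a
        suy = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, values))
        a = suy / suu
        c = mean_y - a * mean_u
        sse = sum((a * u + c - y) ** 2 for u, y in zip(us, values))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    return best[1], best[2], best[3]

# Synthetic noise-free data with a = 2, b = 1.5, c = 10 (illustrative only).
months = list(range(1, 31))
largest = [2.0 * t ** 1.5 + 10.0 for t in months]
grid = [round(0.5 + 0.01 * k, 2) for k in range(301)]  # b from 0.50 to 3.50
a, b, c = fit_power_plus_constant(months, largest, grid)
print(a, b, c)
```

The resolution of the recovered exponent is limited by the grid step, which can be refined around the best candidate if more precision is needed.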
The above equation reveals that the population growth is a function of time that resembles a power function. The model describes the aggregated population growth of online social networks in both the temporal and the spatial dimension.
To demonstrate the effectiveness of our population growth model, we evaluate it from three aspects. First, we verify the model on the early-stage population growth of the three data sets. Then we evaluate the model on the full population growth of Facebook (2013) and Sina Weibo (2013). Finally, as an application, we use the model to predict the later-stage population growth of the three data sets.
We verify our population growth model by estimating the early stages of population growth in three data sets (i.e. first 35 months’ population growth of Renren, the first 40 months’ population growth of Twitter and the first 14 months’ population growth of Gowalla).
In summary, the verification on the three data sets validates the correctness of the population growth model.
As an application, we use our model to predict the later-stage population growth of the three OSN data sets.
The prediction accuracy of the population growth model
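The accuracy measure is not defined explicitly in this section. A common choice, assumed here purely for illustration, is one minus the relative error between the predicted and the ground-truth population; `prediction_accuracy` is a hypothetical helper name and the numbers are toy values.

```python
def prediction_accuracy(predicted, actual):
    """Accuracy as 1 - |predicted - actual| / actual (an assumed definition)."""
    if actual <= 0:
        raise ValueError("actual population must be positive")
    return 1.0 - abs(predicted - actual) / actual

# Toy monthly predictions against ground truth (illustrative numbers only).
predicted = [95.0, 210.0, 330.0]
actual = [100.0, 200.0, 300.0]
accuracies = [prediction_accuracy(p, a) for p, a in zip(predicted, actual)]
print([round(x, 4) for x in accuracies])
```

Under this definition, both over-prediction and under-prediction reduce the accuracy symmetrically.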
In this section, we discuss the impact of the methods used for data collection and processing, the choice of how population is counted in OSNs and the scope of our growth model.
In this paper, we use BFS started from randomly selected nodes to collect data samples from online social networks. A population growth model based on sampled data may yield inaccurate population estimation and prediction. To mitigate this issue, we take several measures to make the data sets representative. First, we apply a BFS-bias correction procedure (Kurant et al. 2011) to eliminate the bias caused by BFS. Second, based on the method for estimating social network sizes from random samples presented in (Katzir et al. 2011), we argue that the quality of the data sets is not affected by the randomly selected nodes. Finally, we use the full population sizes of Facebook and Sina Weibo to validate the effectiveness of the proposed population growth model. All these efforts are made so that the data samples collected from online social networks are accurate enough for modeling population growth.
We use registered users as the population of an OSN. Thus, we count every registered user in the network as a member of the total population, which contains both active and inactive users. In our model, we consider inactive users as part of the aggregated population for the following reasons: (1) Inactive users are also part of the population. We cannot say that a user who is not active in the network does not belong to it. (2) Detecting active users is a complex process. For example, one may identify active users from the activities they perform on the web site. However, many people only browse the web site without any explicit activity. They appear inactive in terms of interaction with others, but they are in fact active users. For these two reasons, we consider registered users instead of active users as the population of an OSN.
Our population growth model focuses on the growing stage of OSNs. We do not intend to track the full life cycle of an OSN. When an OSN’s population stops growing, our model no longer applies. To specify the growing stage of an OSN, we define the monthly population growth rate r(t): if r(t) > 0, we consider the population to be in the growing stage between time t-1 and t. We say an OSN is in the growing stage if its monthly growth rates are all greater than 0 in the observed time period. Our crawled data sets and the two full-population OSNs (Facebook and Sina Weibo) are all in the growing stage and are therefore suitable for our study. Moreover, most popular OSNs (such as Facebook and Twitter) are currently still in the growing stage. Therefore, our population model focuses on the stage of population growth in OSNs.
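The growing-stage check can be sketched as follows. The formula for r(t) is not reproduced in this text, so the sketch assumes the usual relative change r(t) = (N(t) - N(t-1)) / N(t-1), where N(t) is the population in month t; the function names and the toy series are illustrative.

```python
def monthly_growth_rates(populations):
    """Relative month-over-month change (an assumed definition of r(t))."""
    return [(cur - prev) / prev
            for prev, cur in zip(populations, populations[1:])]

def in_growing_stage(populations):
    """True if every monthly growth rate in the observed period is positive."""
    return all(r > 0 for r in monthly_growth_rates(populations))

# Toy monthly population series (illustrative numbers only).
growing = [100, 120, 150]
stalled = [100, 120, 120]
rates = monthly_growth_rates(growing)
print(rates, in_growing_stage(growing), in_growing_stage(stalled))
```

Only the sign of r(t) matters for the check, so any definition that is positive exactly when the population increases would give the same result.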
In this paper, we propose a population growth model for online social networks. We investigate population growth from a spatiotemporal perspective. By studying the population growth over locations and time in three data sets from Renren, Twitter and Gowalla, we find that the population distribution over locations follows a power law. The number of populated locations and the largest population both grow as power functions of time. By integrating the temporal and spatial characteristics of population growth, we derive the general population growth model. Extensive experiments show that our model fits the population growth of Facebook and Sina Weibo. As an application, we use the model for population prediction in the three data sets, where it achieves a prediction accuracy between 86.14% and 99.89%.
The authors acknowledge funding from the Alexander von Humboldt Foundation and the DAAD Foundation. We would like to thank Mr. Cong Ding for his help in crawling the Renren data sets. We also appreciate the comments from the anonymous reviewers, which improved the quality of the paper.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.