Tag Archives: social network analysis

Empirical Study: Twitter is not a Social Network

Given my long-standing interest in complexity science, I often browse through arXiv (pronounced “archive”, as if the “X” were the Greek letter Chi, χ) for a little distraction. This archive is the go-to site for electronic preprints of scientific papers in the fields of mathematics, physics, computer science and statistics. If only we had a similar archive in the social sciences.

In any case, I was pleasantly surprised to find a paper on arXiv entitled “Social Networks that Matter: Twitter Under the Microscope.” The authors argue that the linked structures of social networks do not reveal actual interactions among people. “Scarcity of attention and the daily rhythms of life and work makes people default to interacting with those few that matter and that reciprocate their attention.” Using Twitter to study social interactions, the authors find that the “driver of usage is a sparse and hidden network of connections underlying the ‘declared’ set of friends and followers.”

The authors compiled a large dataset of 309,740 Twitter users. They obtained the number of followers and followees for each user, along with the content and datestamp of all of that user’s posts. They also identified the number of directed (@name) posts and defined a user’s friend as a person to whom the user has directed at least two posts. The researchers were thus able to compare the number of friends a user has with the number of followers and followees that user declared.
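
To make the friend definition concrete, here is a minimal Python sketch of how one might derive friends from a list of posts. It is not the authors' code. It assumes that a directed post is one whose text begins with @name, that usernames are case-insensitive, and that posts arrive as hypothetical (author, text) pairs; the function name and the sample data are made up for illustration.

```python
import re
from collections import Counter, defaultdict

# Assumption: a "directed" post is one whose text starts with @name.
DIRECTED = re.compile(r"^@(\w+)")


def friends_by_user(posts, min_directed=2):
    """Map each author to the set of users they directed >= min_directed posts to.

    `posts` is an iterable of (author, text) pairs (a hypothetical input format).
    Authors with no directed posts do not appear in the result.
    """
    counts = defaultdict(Counter)
    for author, text in posts:
        match = DIRECTED.match(text.strip())
        if not match:
            continue  # an undirected post: it counts as activity, not as a tie
        target = match.group(1).lower()
        if target != author.lower():  # ignore posts directed at oneself
            counts[author.lower()][target] += 1
    return {
        author: {t for t, n in targets.items() if n >= min_directed}
        for author, targets in counts.items()
    }


# Hypothetical sample data.
sample_posts = [
    ("alice", "@bob did you see the new paper?"),
    ("alice", "@bob thanks for the link"),
    ("alice", "@carol welcome!"),
    ("bob", "reading arXiv this morning"),
]
sample_friends = friends_by_user(sample_posts)
```

With this sample, alice has one friend (bob) because carol received only a single directed post, which falls below the two-post threshold.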

The first figure below depicts the number of posts as a function of the number of followers. The number of posts initially increases as the number of followers increases but it eventually saturates.

arXiv Twitter1

The second figure depicts the number of posts as a function of the number of friends. The number of posts increases as the number of friends increases, reaching the maximum of 3,200 posts without saturating. As the authors note, “this suggests that in order to predict how active a Twitter user is, the number of friends is a more accurate signal than the number of his followers.”

arXiv Twitter2

The histogram below depicts a Twitter user’s number of friends divided by the number of followers. Most users have a very small number of friends compared to the number of followers they declared. “Hence, while the social network created by the declared followers and followees appears to be very dense, in reality the more influential network of friends suggests that the social network is sparse.”

arXiv Twitter3

The next figure represents the number of friends as a function of the number of followees. As can be noted, the total number of friends saturates while the number of followees keeps growing due to the minimal effort required to add a followee.

arXiv Twitter4

In turn, the figure below depicts the ratio of friends to followees as a function of the number of followees. The curve initially increases but rapidly approaches zero as the number of followees increases.

arXiv Twitter5

The authors thus conclude that Twitter users have a very small number of friends compared to the number of followers and followees they declare.

“This implies the existence of two different networks: a very dense one made up of followers and followees, and a sparser and simpler network of actual friends. The latter proves to be a more influential network in driving Twitter usage since users with many actual friends tend to post more updates than users with few actual friends. On the other hand, users with many followers or followees post updates more infrequently than those with few followers or followees.”

arXiv Twitter6

In social network (a) above, all followees are depicted as linked nodes. In network (b), only links to actual friends are depicted. The latter is the hidden network that is more representative of actual interactions between Twitter users.
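
The contrast between the two networks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hidden network is taken to be the subset of declared followee links that point to an actual friend, density is the usual directed-graph ratio of existing links to possible links, and the users and links below are invented.

```python
def hidden_network(declared_edges, friends):
    """Keep only the declared (user, followee) links that point to an actual friend.

    `friends` maps each user to a set of friends (hypothetical input format).
    """
    return {(u, v) for (u, v) in declared_edges if v in friends.get(u, set())}


def density(num_nodes, edges):
    """Directed-graph density: links present divided by links possible."""
    if num_nodes < 2:
        return 0.0
    return len(set(edges)) / (num_nodes * (num_nodes - 1))


# Hypothetical toy data: three users who mostly follow one another.
declared = {("a", "b"), ("a", "c"), ("b", "a"), ("c", "a")}
friends = {"a": {"b"}}  # only a -> b reflects repeated directed posts

hidden = hidden_network(declared, friends)
declared_density = density(3, declared)
hidden_density = density(3, hidden)
```

In this toy case the declared network fills four of six possible links while the hidden network keeps only one, which mirrors the dense versus sparse distinction the authors draw.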

Most avid Twitter users would probably find the authors’ conclusions rather obvious. As Twitter user @timoreilly recently tweeted, “Facebook is about people you used to know; Twitter is about people you’d like to know better.” I for one view Twitter as more of an information subscription tool that complements my use of email than an actual network for social interaction.

This is precisely what the authors of the Twitter study are getting at:

“Many people, including scholars, advertisers and political activists, see online social networks as an opportunity to study the propagation of ideas, the formation of social bonds and viral marketing, among others.

“This view should be tempered by our findings that a link between any two people does not necessarily imply an interaction between them. As we showed in the case of Twitter, most of the links declared within Twitter were meaningless from an interaction point of view. Thus the need to find the hidden social network; the one that matters when trying to rely on word of mouth to spread an idea, a belief, or a trend.”

This is an important reminder, especially for colleagues of mine at the Berkman Center who are engaged in social network analyses of various political blogospheres. Just because the data are there and “easily” available doesn’t mean that they actually represent the offline social interactions that we are ultimately interested in studying. Social network data, no matter how novel, are still proxy data at best.

Patrick Philippe Meier

Politics 2.0 Conference: Social Network Analysis

“The Politics of Blogging” is the first panel I am blogging live from at the Politics 2.0 conference in London. In what reflects an increasing interest in applying social network analysis (SNA) to blogosphere dynamics, two of the three papers applied SNA to political blogs in South Korea and Greece. See my previous blog post on mapping the Persian blogosphere here.

The first presentation was entitled “Social Network Analysis of Ideological Landscapes from the Political Blogosphere: The Case of South Korea.” The presenter argued that South Korea provides an ideal case study for network analysis. The country has seen important grassroots activities prior to the arrival of the Internet; there have been periods of demonstrations, student and worker revolutions/protests. South Korea also has the highest proportion of broadband users in the world. The analysis drew on the 115 blogs of the country’s 219 assembly members and their blog rolls.

The result of the analysis presented an interesting contrast to the results of SNA studies carried out on Republicans and Democrats in the US. South Korea’s political blogosphere was far less polarized. In fact, a substantial number of blogs linked to both the political-right and center parties. The main drawback of the study is the lack of statistical analysis applied to the network map, let alone any statistical analysis of dynamics and trends over time.

The presentation on the Greek political blogosphere applied standard SNA metrics to tease out some of the underlying structures of the network. The case study focused specifically on the recent debate that took place on the Web with respect to the presidential elections for the Pan-Hellenic Socialist Movement (PASOK).

What I appreciate about this paper is the application of statistical analysis to the network map. Indeed, one reason for using mathematical and graphical techniques in social network analysis is to represent the descriptions of networks compactly and systematically. A related reason for using formal methods for representing social networks is that mathematical representations allow us to use software programs to analyze the network data. The third and final reason for using mathematics and graphs for representing social network data is that the techniques of graphing and the rules of mathematics themselves suggest properties that we might look for in our networked data: features that might not have occurred to us if we presented our data using descriptions in words. These reasons are articulated by Hanneman and Riddle here.
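
As a small illustration of why a formal representation helps, here is a Python sketch that stores a blog link network as a mapping from each blog to the blogs it links to, then computes two common SNA measures: in-degree (how many blogs link to a given blog) and reciprocity (the share of links that are returned). The panel summary does not say which metrics the Greek study used, so these two are my own picks, and the blog names are placeholders.

```python
def in_degree(links):
    """Count incoming links per blog in a {source: set_of_targets} mapping."""
    counts = {blog: 0 for blog in links}
    for targets in links.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts


def reciprocity(links):
    """Fraction of directed links whose reverse link also exists."""
    edges = {(s, t) for s, targets in links.items() for t in targets}
    if not edges:
        return 0.0
    mutual = sum(1 for (s, t) in edges if (t, s) in edges)
    return mutual / len(edges)


# Hypothetical blogroll data: blog_a and blog_b link to each other,
# while blog_c is linked to but links to nobody.
blogroll = {
    "blog_a": {"blog_b", "blog_c"},
    "blog_b": {"blog_a"},
    "blog_c": set(),
}
incoming = in_degree(blogroll)
share_mutual = reciprocity(blogroll)
```

Once the network is in this form, properties such as the two-thirds reciprocity of this toy example fall out of a few lines of code, which is much harder to spot from a verbal description of who links to whom.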

Another reason I liked the paper is that the authors tied their analysis to the existing literature, e.g., Drezner and Farrell’s paper on the power and politics of blogs. Disclaimer: Professor Daniel Drezner is the chair of my dissertation committee. One of the interesting points that came out of the Q & A was the suggestion of studying negative links, i.e., those bloggers who tell others not to look at certain blogs. I had the last comment of the Q & A session, in which I relayed to the panelists Berkman’s recent study on the Iranian blogosphere. My recommendations to the panelists were the same ones I gave to a colleague of mine at Berkman. These are included in my previous blog post on Berkman’s work.

Patrick Philippe Meier