Saturday, September 4, 2010

Social networks and Dunbar's number

In response to my earlier post about Metcalfe's Law, Lior Sion brought up another magic number that is frequently cited when it comes to the number of social relationships: Dunbar's Number. I remembered reading about this in Malcolm Gladwell's Tipping Point and it got me thinking about how it applies to online social networks.

To get started in my investigation, I looked up the Wikipedia article on Dunbar's number. The article describes the number as "a theoretical cognitive limit to the number of people with whom one can maintain stable social relationships". As I continued my search I found a better reference: Dunbar himself explaining how he arrived at this number in this video. According to Dunbar, the number of meaningful relationships a primate can maintain depends on the ratio of the size of the neocortex (which exists only in mammals) to the overall brain size. You can compare this ratio and the corresponding social unit size across different primates and arrive at a number of roughly 150 for humans. Several examples of social units without a significant hierarchy support this hypothesis, including the Gore-Tex micro-businesses cited in Gladwell's book and Hutterite communes.

So how is this number related to online social communities? Apparently Dunbar is investigating this issue, as reported in this article. The full report is due some time later this year. Not surprisingly, he notes that even in the case of people who have millions of "friends", the communication frequencies indicate that the number of meaningful connections is less than 150. I ran some numbers on Spigit's communities and came away with the same conclusion. The chart below shows the total number of connections as a function of individual users for three community sizes: small (< 500), medium (< 5000) and large (> 5000). I cut off the long tail in order to clearly show the variation in the fat part (please note that no animals were harmed in this process).


The most conspicuous thing about the charts is that the number of connections as a function of users seems to follow a power law. An interesting property of power law functions is that the natural log of the frequencies (y values) varies linearly with the natural log of the x values. I applied exactly that transformation to the values shown in the first chart and plotted the resulting values in the chart below.


The top few highly connected users fail to live up to the power law expectations (the Wikipedia article mentions similar deviations in other phenomena that otherwise follow a power law), but the shape of the curve is quite close to linear towards the end.
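For readers who want to try the same log-log check on their own data, here is a minimal sketch. It assumes the x axis is the user's rank (most connected first) and the y axis is that user's connection count, which is my reading of the charts above; the function name `fit_log_log` and the synthetic data are made up for illustration and are not Spigit data.

```python
import math

def fit_log_log(connection_counts):
    """Least-squares fit of ln(count) = intercept + slope * ln(rank)."""
    # Rank users from most to least connected; skip zero counts (log is undefined).
    ranked = sorted((c for c in connection_counts if c > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two users with connections")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic counts that follow count = 1000 * rank^-0.8 exactly (illustrative only).
counts = [1000.0 * rank ** -0.8 for rank in range(1, 201)]
slope, intercept = fit_log_log(counts)
print(f"slope={slope:.3f}, scale={math.exp(intercept):.1f}")
```

A slope that stays roughly constant across the range is what "linear on the log-log chart" means; with real data you would expect the first few ranks to pull away from the fitted line, as noted above.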

Looking at the first chart, it is quite clear that the bulk of users have fewer than 150 connections, but what about the highly talkative users who seem to have communicated with over a thousand other individuals? The answer lies in the fact that none of these users maintained all these conversations simultaneously. Online communities differ significantly from the social groups studied earlier in three inter-related respects:

  1. Lower barrier to entry - The Internet is a highly efficient communication mechanism. It is incredibly easy to get virtual access to people that you have no prior knowledge of.
  2. Higher mobility - It is much easier to leave or join a social cluster compared to other social units where physical proximity plays a critical role.
  3. Driven by Individual Goals - Online clusters form around a common interest or a goal that is initiated at the individual level not at the communal level. Offline social units form around a set of predefined common goals (e.g. Gore factory unit, HOAs, etc.). Social units (certainly within Spigit communities) in online communities are spawned by people who want to change the world around them.

These three factors contribute to a very dynamic set of social clusters that form around topics of interest, gain traction with a small group of users and are eventually replaced by other hot spots with a different membership. In fact, looking under the covers, it was clear to me that even highly social users changed their group affiliations over time. Even though they remained interested in communicating with a handful of individuals on a continuous basis, they communicated with different sets of people at different times. The chart shown below illustrates this point. It shows the communication pattern of some of the most connected users in highly successful Spigit communities. Each data point reflects the number of connections in a 30-day period. As you can see from the chart, that number is below 150 with the exception of one data point, even though the total number of connections over the entire period is above 1000 for some of these users.
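The per-period counting can be sketched roughly as follows. This is not the code behind the chart, just one plausible way to do it: it assumes each interaction is recorded as a (date, other user) pair for a single user, and it uses fixed, non-overlapping 30-day buckets starting at the first interaction. The function name and the sample log are hypothetical.

```python
from collections import defaultdict
from datetime import date

def connections_per_window(interactions, window_days=30):
    """Return (distinct partners per window, distinct partners overall)."""
    if not interactions:
        return [], 0
    start = min(day for day, _ in interactions)
    partners_by_window = defaultdict(set)
    all_partners = set()
    for day, partner in interactions:
        # Bucket index: how many whole windows have passed since the first interaction.
        window = (day - start).days // window_days
        partners_by_window[window].add(partner)
        all_partners.add(partner)
    last = max(partners_by_window)
    # Windows with no activity are reported as 0 rather than skipped.
    per_window = [len(partners_by_window.get(w, ())) for w in range(last + 1)]
    return per_window, len(all_partners)

# Hypothetical interaction log for one user.
log = [
    (date(2010, 1, 1), "a"),
    (date(2010, 1, 15), "b"),
    (date(2010, 2, 5), "c"),
    (date(2010, 2, 10), "a"),
    (date(2010, 4, 20), "d"),
]
per_window, total = connections_per_window(log)
print(per_window, total)
```

The point of separating the two numbers is that the overall total keeps growing as a user moves between clusters, while the per-window count only reflects who they were talking to at that time.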


So why is this analysis so interesting (other than satisfying one's academic curiosity)? Well, because it helps us evaluate the breadth of collaboration in an online community, which is strongly correlated with diversity of participation. The analysis presented here and in my last post about Metcalfe's Law helps us set expectations in this regard. First, the number of interactions per user follows a power law even in the most successful communities. In general, the best you can hope to achieve is to lift the long tail part of the curve upwards. Second, whether it is due to cognitive or bandwidth limitations, the number of active communications that even the most active individuals can maintain in a short time frame seems to be below 150. As a result, it is not realistic to judge the level of collaboration in terms of the total number of possible connections (n*(n-1)/2), but Dunbar's number seems to be an achievable ceiling for the most active users in your community.
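To get a feel for how far apart those two yardsticks are, here is a small back-of-the-envelope sketch. The capped figure is my own simplification, not something from Dunbar: it assumes every user could sustain up to 150 connections at once, which the power law above says is far too generous for most users, so treat it as a loose upper bound. The function names are made up.

```python
DUNBAR = 150  # rough ceiling on simultaneous connections per user

def possible_connections(n):
    """All distinct pairs among n users: n*(n-1)/2."""
    return n * (n - 1) // 2

def dunbar_capped_connections(n, cap=DUNBAR):
    """Loose upper bound if no user holds more than `cap` connections.

    Each connection is shared by two users, hence the division by 2
    (rounded down when n * cap is odd).
    """
    return n * min(n - 1, cap) // 2

for n in (100, 500, 5000):
    print(n, possible_connections(n), dunbar_capped_connections(n))
```

For a community of 5000 users the pairwise count is about 12.5 million, while even the generous capped bound is 375,000, which is why the pairwise total is not a useful target.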