Paper presented at the Ford Foundation conference on media diversity, Dec. 2003.


Measuring Media Concentration Online and Offline


By Mathew Hindman and Kenneth Neil Cukier

Fellows, National Center for Digital Government

John F. Kennedy School of Government, Harvard University



The Web is a critical part of the media environment. It is now the largest information source ever assembled by humankind, comprising innumerable terabytes of data, far more than any one person could explore in many lifetimes. One reason the Web has been able to scale to such an extent is that online content is generated in an entirely decentralized way: it is the aggregate of contributions from millions of individual parties. The result, as we all know, is a cornucopia of information on every conceivable subject.


But has this vast amount of Web content translated into diversity in what users actually see and access? There is evidence that it has not. On the contrary, despite the sheer volume of content online, the structure of the Web produces a staggering and unexpected degree of concentration, one that may even exceed the concentration found in traditional, offline media.


The reason lies in a paradox of the Internet that social scientists are just beginning to understand: although it is easier than ever to create content, it is harder than ever for that content to be seen by others. More competition means more concentration. Ironically, the very factors that account for the Web’s success are responsible for this shortcoming. Because the Web is far larger than any one person can hope to view or comprehend, navigational aids such as hyperlinks and search engines become paramount. These aids funnel users toward the most popular content, while less popular content is extremely hard to find. And because content is produced by millions of parties, the vast majority of it is likely to be overlooked by most people.


But to what degree does this happen? And how does that compare to traditional media?


This paper seeks to answer these questions. The issue is particularly important in light of the Federal Communications Commission’s decision in June 2003 to loosen existing media ownership restrictions. Part I measures the concentration of Web content by examining its link structure, which is correlated with site traffic, and finds that the Net is highly concentrated. Part II compares online and offline media concentration using two statistical measures, the Herfindahl-Hirschman Index (HHI) and the Gini coefficient, and finds that the Web exhibits a higher degree of audience concentration than other media.


The paper concludes by applying the findings to media diversity policy, and suggests that regulators should take the limitations of the medium into account when they consider what the Web means for media as a whole.


I. Measuring Online Content Concentration


The Web allows any information provider to compete with any other on equal footing. However, this equality in content creation belies an extreme inequality in the attention that sites receive. How the Web is used in practice necessarily alters our understanding of online diversity: what matters most is the spectrum of information sources people actually hear, not simply the ability to speak. The question becomes how to measure the Web, looking not at the huge number of sites that sit idle waiting for surfers to pass, but focusing instead on the online sources people actually use.


The gold standard of Web measurement would be to monitor the Web traffic of every individual and every Web site. This is obviously infeasible. Current methods of studying large samples of users have significant drawbacks: they are extremely expensive, usually proprietary, and they introduce large-scale bias into subjects’ usage patterns. And though the Web is celebrated as a “narrowcasting” or “pointcasting” medium, cross-sectional data does not provide enough information about such small content niches to support valid statistical inferences.


An alternative to tracking users is to look at the link structure of the Web. The number of links pointing to a site is, it turns out, highly correlated with the number of visitors that a site receives. Even more directly, the link structure of the Web determines what content citizens can find: most modern search engine algorithms—such as Google’s PageRank—use link structure to rank search engine results. Link data, which is all publicly available, thus allows us to draw a rough map of how the attention of citizens is distributed across different sources of online information. 
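As a minimal sketch of how link data can be turned into a measure of attention, the Python snippet below counts inbound links per site from a list of (source, target) hyperlink pairs. The function name, the site names, and the choice to ignore self-links and repeated links between the same pair of sites are illustrative assumptions, not details of the crawl described here.

```python
from collections import Counter


def inbound_link_counts(edges):
    """Count inbound links per target site from (source, target) pairs.

    Assumption for this sketch: self-links are dropped, and repeated links
    from the same source to the same target are counted only once.
    """
    unique_edges = {(src, dst) for src, dst in edges if src != dst}
    return Counter(dst for _, dst in unique_edges)


# Hypothetical crawl fragment; the site names are made up.
edges = [
    ("a.org", "hub.org"),
    ("b.org", "hub.org"),
    ("c.org", "hub.org"),
    ("c.org", "hub.org"),    # duplicate link, counted once
    ("a.org", "b.org"),
    ("hub.org", "c.org"),
    ("hub.org", "hub.org"),  # self-link, ignored
]

# Print sites from most to least linked.
for site, count in inbound_link_counts(edges).most_common():
    print(site, count)
```

Ranking sites by these counts gives the rough map of attention the paragraph describes; a real crawl would also need to normalize page URLs to the site level, which this sketch does not attempt.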

In a previous large-scale study of online political information, the Web was crawled using sophisticated text-classification software to identify all sites related to one of six contentious political topics (abortion, gun control, the death penalty, the presidency, the U.S. Congress, and general politics). For each of the resulting Web communities, which together comprise 3 million pages and half a terabyte of data, the number of links pointing to each Web page was then counted. Again, the degree to which a page is linked to is highly correlated with the amount of traffic it receives. (The previous study can be viewed at


The results show that in every sub-community of Web content examined, the link density of sites followed a power-law distribution, an extreme winner-take-all pattern. For instance, in the area of gun control there are over 13,000 Web pages, yet two-thirds of all hyperlinks point to the 10 most popular sites. For capital punishment, the top 10 sites account for 63% of all links on the topic. Moreover, in every category of content, more than half the Web sites have only a single link pointing to them.
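The two summary statistics quoted above, the share of links going to the top sites and the fraction of sites with only a single inbound link, can be computed from per-site link counts along these lines. This is a hypothetical sketch: the function names are illustrative, and the counts in the example are invented to show the calculation, not data from the study.

```python
def top_k_share(link_counts, k=10):
    """Fraction of all inbound links that point to the k most-linked sites."""
    ranked = sorted(link_counts, reverse=True)
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


def single_link_fraction(link_counts):
    """Fraction of linked sites (at least one inbound link) that have exactly one."""
    linked = [c for c in link_counts if c > 0]
    if not linked:
        return 0.0
    return sum(1 for c in linked if c == 1) / len(linked)


# Invented counts for a small topic community (not data from the study).
counts = [500, 300, 120, 60, 20] + [1] * 100
print(round(top_k_share(counts, k=5), 3))      # 0.909
print(round(single_link_fraction(counts), 3))  # 0.952
```

In this toy community, five sites capture about 91% of all links while about 95% of sites have a single link, the same winner-take-all shape the study reports at much larger scale.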


The implication of these findings is that, contrary to conventional wisdom, the Web exhibits a high degree of information-source concentration. In other words, in practice the Web does not deliver the diversity of content that it promises in theory. This raises the question of how it compares to traditional media, which the next section addresses.



II. Comparing Online Concentration to Offline Media


We propose two metrics for comparing the concentration of Web-based content with that of traditional media: the Herfindahl-Hirschman Index (HHI) and the Gini coefficient. Although the two metrics measure different things, they ultimately tell a similar story: online content is at least as concentrated as traditional media.


HHI is a measure commonly used by competition regulators to gauge the concentration of firms within a single market. It is calculated by taking the percentage market share of each firm in the market, squaring it, and summing the results. In this instance, however, we are measuring not market share among firms but the concentration of audience among the outlets of each medium. The advantage of HHI is that, by squaring each share, it weights the largest sources most heavily and so focuses our attention on the power of the few top information sources. In essence, HHI tells us how powerful the big boys are.
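A minimal Python sketch of the HHI calculation, assuming raw audience figures per outlet are available and converting them to percentage shares before squaring. The function name and the example audiences are hypothetical.

```python
def hhi(audiences):
    """Herfindahl-Hirschman Index from raw audience sizes.

    Each outlet's audience is converted to a percentage share, squared, and
    summed, so the result runs from near 0 (many tiny outlets) to 10,000
    (a single outlet holds the entire audience).
    """
    total = sum(audiences)
    if total <= 0:
        raise ValueError("total audience must be positive")
    return sum((100 * a / total) ** 2 for a in audiences)


# Hypothetical audiences (arbitrary units), for illustration only.
print(round(hhi([25, 25, 25, 25])))  # 2500: four equal outlets
print(round(hhi([70, 10, 10, 10])))  # 5200: one dominant outlet
```

Because shares are squared, the dominant outlet in the second example contributes 4,900 of the 5,200 points on its own, which is why HHI is driven mainly by the largest sources.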


The second metric is the Gini coefficient, originally developed to measure income inequality within and across populations. The Gini coefficient is based on the Lorenz curve, which plots the cumulative share of wealth held against the cumulative share of the population, with the population ordered from poorest to richest.


The Gini coefficient produces a number between 0 and 1, where 0 represents perfectly equal distribution and 1 is complete concentration. Importantly, this measure is impervious to changes in the size of a population – or in this case, the size of a given media niche. In contrast to HHI, the Gini coefficient accounts for the mass of guppies at the tail end, not just the big fish at the top.
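The Lorenz curve and the Gini coefficient can be sketched in Python as follows, assuming per-outlet audience figures as input. This version computes the area under the Lorenz curve with the trapezoid rule and reports the Gini coefficient as one minus twice that area; the function names and sample inputs are illustrative.

```python
def lorenz_curve(audiences):
    """Return Lorenz curve points (cumulative population share, cumulative audience share).

    Outlets are sorted from smallest to largest audience, so the curve lies on
    or below the 45-degree line of perfect equality.
    """
    if not audiences or sum(audiences) <= 0:
        raise ValueError("need at least one outlet with a positive audience")
    xs = sorted(audiences)
    n, total = len(xs), sum(xs)
    points = [(0.0, 0.0)]
    running = 0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points


def gini(audiences):
    """Gini coefficient computed as 1 - 2 * (area under the Lorenz curve)."""
    pts = lorenz_curve(audiences)
    # Trapezoid rule over consecutive Lorenz points.
    area = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return 1 - 2 * area


print(gini([1, 1, 1, 1]))                # 0.0: perfectly equal audiences
print(gini([0, 0, 0, 10]))               # 0.75: one outlet has the whole audience
print(round(gini([1, 1, 1, 9]), 6))      # 0.5
print(round(gini([1, 1, 1, 9] * 2), 6))  # 0.5: same shape, twice as many outlets
```

With n outlets, the largest value this discrete version can reach is (n - 1)/n, so the value of 1 for complete concentration is approached only as the number of outlets grows. The last two lines illustrate the population-size property described above: duplicating every outlet leaves the coefficient unchanged.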


By both metrics, the degree of concentration of online information sources exceeds that of offline media (see table, below).


Media Type                          Gini Coefficient

TV: Primetime Audience Share
Radio: New York Market
Radio: Boston Market
Print: All U.S. Magazines
Print: All U.S. Newspapers
Web: Abortion Sites
Web: Gun Control Sites
Web: Presidency Sites

The results show the magnitude of the disparity among media. Using the Gini coefficient, concentration in traditional media grows as the number of outlets competing within a medium increases, as one would expect. Audience is fairly evenly distributed among television networks, but concentration grows for radio and print. And because the Gini coefficient is, in effect, a comparison of areas under the Lorenz curve, these numbers are even more striking than they appear at first glance. The Web communities have roughly one-sixth as much area under the curve as newspapers and magazines, and roughly one-tenth as much as radio and television.
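The step from Gini values to areas under the Lorenz curve can be made explicit. Writing L(p) for the Lorenz curve and B for the area beneath it, the Gini coefficient is G = 1 - 2B, so each medium's area is (1 - G)/2 and the ratio of two media's areas depends only on how far each Gini value falls short of 1. The Gini values in the example are purely hypothetical, chosen only to show the arithmetic.

```latex
B = \int_0^1 L(p)\,dp, \qquad G = 1 - 2B \quad\Longrightarrow\quad B = \frac{1 - G}{2}

\frac{B_{\text{Web}}}{B_{\text{print}}} = \frac{1 - G_{\text{Web}}}{1 - G_{\text{print}}},
\qquad \text{e.g. } G_{\text{Web}} = 0.90,\; G_{\text{print}} = 0.40
\;\Rightarrow\; \frac{B_{\text{Web}}}{B_{\text{print}}} = \frac{0.10}{0.60} = \frac{1}{6}
```

Because B shrinks toward zero as G approaches 1, Gini values that look only moderately far apart can correspond to a several-fold difference in area.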


Using HHI, the disparity is also pronounced. Television is again the least concentrated medium. Radio and print show more concentration, but in the reverse order from the Gini coefficient ranking. Still, Web content again appears far more concentrated than any other medium.


The choice of metric certainly matters: in this context, local radio markets, for example, look more concentrated under HHI than under the Gini coefficient. Yet by both measures, online content looks like a different world from traditional media, though not in the way that scholars and policymakers would likely have expected.
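Why the two metrics can disagree is easiest to see with two made-up markets. In this hypothetical sketch, a market of four equally sized outlets scores high on HHI but has a Gini coefficient of zero, while a market with a few leaders and a long tail of small outlets scores low on HHI but shows substantial inequality by Gini. The two functions are self-contained restatements of the standard formulas; the Gini coefficient here uses the mean-difference form, which gives the same value as one minus twice the area under the Lorenz curve.

```python
def hhi(audiences):
    """Herfindahl-Hirschman Index: sum of squared percentage audience shares."""
    total = sum(audiences)
    return sum((100 * a / total) ** 2 for a in audiences)


def gini(audiences):
    """Gini coefficient via the mean absolute difference over all ordered pairs."""
    n = len(audiences)
    mean = sum(audiences) / n
    diffs = sum(abs(a - b) for a in audiences for b in audiences)
    return diffs / (2 * n * n * mean)


# Hypothetical markets, for illustration only.
few_equal = [25, 25, 25, 25]       # four outlets with identical audiences
many_skewed = [10] * 5 + [1] * 50  # five leaders plus a long tail of small outlets

for name, market in [("few_equal", few_equal), ("many_skewed", many_skewed)]:
    print(name, round(hhi(market)), round(gini(market), 3))
# few_equal 2500 0.0
# many_skewed 550 0.409
```

A market with a handful of comparably sized outlets is the kind of case where HHI reads high while the Gini coefficient reads low, because HHI rewards having few players and Gini rewards having equal ones.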




Although the Web greatly expands the number of information sources people can choose from, in practice the structure of the medium creates a high degree of concentration among a small handful of sites. In short, every Web site has a voice, but most speak in a whisper while a powerful few are given a megaphone. Compared with traditional media, both the Herfindahl-Hirschman Index and the Gini coefficient show that Web content is at least as concentrated as offline outlets, and probably more so.


These findings are relevant to the FCC’s decision to raise the media ownership caps. In the face of criticism, FCC Chairman Michael Powell justified the Commission’s decision by stating that new technologies, particularly the Internet, have increased the diversity of information sources that Americans see. Although the Web was not mentioned in the FCC’s formal decision, it nevertheless plays a significant role in the FCC’s media diversity index, through which the Commission will monitor the impact of media concentration on an ongoing basis.


The Web’s concentrated structure does not point to the need for any sort of regulatory remedy to lessen the winner-take-all situation. Rather, its importance for regulatory policy is more modest: the phenomenon should be acknowledged and taken into account. Specifically, the mere existence of the Web cannot be taken as evidence that it increases media diversity.


That the Web may exacerbate, rather than remedy, long-standing concerns over media concentration is a counterintuitive conclusion, given the commonly held idea of the Net as an informational nirvana. Yet just as people’s thinking about what the dot-com world means for business evolved after an initial period of over-excitement, scholars and regulators need to reconsider the assumptions they bring to the question of what the Web means for media diversity.