Gnutella File Sharing

by Moya K. Mason

Originally created by Tom Pepper and Justin Frankel at Nullsoft, Gnutella is a protocol that allows Internet file and information sharing in a decentralized system. They designed it in hopes of creating a purer peer-to-peer network.

Instead of connecting to a central server, users connect directly to one another. Each user acts as both a client and a server, creating a peer-to-peer network in which information and files are shared and found without any central system.

Gnutella software is composed of a mini search engine combined with a file system.

In Gnutella and the Transient Web, Kelly Truelove says that:

The Gnutella protocol restores the Web's original symmetry, enabling even transient computers to effectively participate as servers. It's far from a complete solution, and alternative systems may eclipse it. Nonetheless, this simple and idiosyncratic protocol is currently in the vanguard of the emergence of the transient Web. The transient Web has the potential to be every bit as disruptive as the conventional "permanent" Web, and possibly more so.

What makes Gnutella so great? Even with audio-sharing programs like Napster, SpinFrenzy, and CuteMX, you can't always find what you're looking for. This is where Gnutella can help. It can search for any kind of file, such as the latest software downloads, audio samples, and movies, and its search results can include files, links, graphics, and advertisements. Many people have suspected that if Napster shut down, its users would turn to Gnutella to find their music.

However, in First Among Equals, Will Knight says that Gnutella is moving towards a more Napster-like system, using reflectors, or proxy servers, to host information for users with slow connections, who are no longer allowed to act as servers themselves.

In A Network of Peers: Peer-to-Peer Models Through the History of the Internet, by Nelson Minar and Marc Hedlund, the authors state:

Through the music-sharing application called Napster, and the larger movement dubbed "peer-to-peer," the millions of users connecting to the Internet have started using their ever more powerful home computers for more than just browsing the Web and trading email. Instead, machines in the home and on the desktop are connecting to each other directly, forming groups and collaborating to become user-created search engines, virtual supercomputers, and file systems.

So how does Gnutella actually work? Every Gnutella user must obtain the IP address of another user in order to connect. The message used to connect is "GNUTELLA CONNECT/0.4\n\n", and is referred to as a handshake message (Jerome Kuptz). The peer contacted responds with "GNUTELLA OK\n\n", and this reply connects you to GnutellaNet (Jerome Kuptz). The network of peers known to your own peer is called your radius; a typical radius includes 2,000 to 10,000 peers, which gives access to 500,000 to 1 million files (Jerome Kuptz). Just as in Napster or Google, search terms are entered in a search box. From there, the peers in your radius search their own files for matches to your query. Newer implementations allow the user to set the duration of the search.
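The handshake exchange can be sketched in a few lines of Python. The two message strings are taken from the Kuptz excerpt quoted later in this paper (step 2); the function names are hypothetical, and a real client would send these bytes over a TCP connection rather than pass them between functions.

```python
from typing import Optional

# Handshake strings as quoted in the Kuptz excerpt; everything else is illustrative.
CONNECT = b"GNUTELLA CONNECT/0.4\n\n"
OK = b"GNUTELLA OK\n\n"


def answer_handshake(received: bytes) -> Optional[bytes]:
    """Reply as the contacted peer would: OK for a valid connect, otherwise nothing."""
    return OK if received == CONNECT else None


def handshake_succeeded(reply: Optional[bytes]) -> bool:
    """The connecting peer treats only the exact OK message as success."""
    return reply == OK


reply = answer_handshake(CONNECT)
print(handshake_succeeded(reply))
```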

In Free Riding on Gnutella, Eytan Adar and Bernardo A. Huberman report that:

  1. Gnutella has a significant amount of free riding in its system.
  2. They found that nearly 70% of Gnutella users share no files.
  3. Nearly 50% of all responses are returned by the top 1% of sharing hosts.

These statistics highlight the number of freeloaders using the system. Since no one is in charge, users are expected to volunteer files and cooperate with one another. But there is no way to enforce this because Gnutella has an anonymous user population. Ultimately, the users of Gnutella are responsible for its future success.
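To make the two headline figures concrete, here is a minimal sketch of how they could be computed from per-host counts. The sample data is invented purely to reproduce the 70% and 50% proportions; it is not the data collected by Adar and Huberman, and the function name is hypothetical.

```python
def free_riding_stats(files_shared, responses):
    """Return (fraction of hosts sharing no files,
               fraction of all responses returned by the top 1% of hosts).

    files_shared[i] and responses[i] both describe host i.
    """
    n = len(files_shared)
    share_nothing = sum(1 for f in files_shared if f == 0) / n
    top_count = max(1, n // 100)  # size of the top 1%, at least one host
    top = sorted(responses, reverse=True)[:top_count]
    total = sum(responses)
    top_share = sum(top) / total if total else 0.0
    return share_nothing, top_share


# Invented population of 100 hosts: 70 share nothing, one host dominates.
files = [0] * 70 + [10] * 29 + [500]
answers = [0] * 70 + [1] * 29 + [29]
print(free_riding_stats(files, answers))
```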

How is all of this information sharing legal? No one company is responsible for Gnutella. It is a protocol consisting of information accessible to anyone and everyone at absolutely no cost. Since no company is in charge, it is difficult to hold anyone accountable. If one part of the network is shut down, the system will continue to survive because it is a distributed system and doesn't exist on a single server.

Gnutella, by Gene Kan

"Gnutella was born sometime in early March 2000. Justin Frankel and Tom Pepper, working under the dot-com pen name of Gnullsoft, are Gnutella's inventors. Their last life-changing product, Winamp, was the beginning of a company called Nullsoft, which was purchased by America Online (AOL) in 1999. Winamp was developed primarily to play digital music files. According to Tom Pepper, Gnutella was developed primarily to share recipes.

Gnutella was developed in just fourteen days by two guys without college degrees. It was released as an experiment. Unfortunately, executives at AOL were not amenable to improving the state of recipe sharing and squashed the nascent Gnutella just hours after its birth. What was supposed to be a GNU General Public License product when it matured to Version 1.0 was never allowed to grow beyond Version 0.56. Certainly if Gnutella were allowed to develop further under the hands of Frankel and Pepper, this chapter would look a lot different.

At least Gnutella was born with a name. The neologism comes from ramming GNU and Nutella together at high speed. GNU is short for GNU's Not Unix, the geekish rallying cry of a new generation of software developers who enjoy giving free access to the source code of their products. Nutella is the hazelnut and chocolate spread produced by Italian confectioner Ferrero. It is typically used on dessert crepes and the like. I think it's great, and chocolate is my nemesis.

Anyway, Gnutella was declared an "unauthorized freelance project" and put out to pasture like a car that goes a hundred miles on a gallon of gas. Or maybe like a technology that could eliminate the need for a physical music distribution network. Cast out like a technology that could close the books on a lot of old-world business models? Well, something like that, anyway.

Freenet can really be described as a bandwidth- and disk space-sharing concept with the goal of promoting free speech. Gnutella is a searching and discovery network that promotes free interpretation and response to queries.

In contrast, Gnutella is a distributed searching system with obvious applications for humans and less obvious applications for automatons. Each Gnutella node is free to interpret the query as it wants, allowing Gnutella nodes to give hits in the form of filenames, advertising messages, URLs, graphics, and other arbitrary content. There is no such flexibility in the Freenet system."

First Among Equals, by Will Knight

"New techniques are making the Gnutella file-sharing network flourish, but may be taking away the key benefits of pure peer-to-peer networking.

In peer-to-peer networking, each desktop computer both provides and receives information, i.e. it acts as both a server and a client.

Gnutella was designed to create a more pure peer-to-peer network. The first version removed the need for Napster's central servers by sending requests for files to every computer on the network. In practice, however, this overloaded the network when more than a few thousand users were connected.

Ironically, Gnutella is now moving back to a system reminiscent of Napster. Recent refinements to Gnutella and related software reduce the burden on the network by stopping clients with slow connections acting as servers. Instead, proxy servers - called reflectors - host information for the slower clients and reduce the overall strain on the network."
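A back-of-the-envelope calculation shows why sending every request to every computer becomes expensive. The sketch below assumes that each peer keeps the same number of connections (the value 4 is a hypothetical choice) and that no two forwarding paths ever meet, so it gives an upper bound rather than a measurement. The forwarding limit of 7 hops is the default TTL mentioned in the Kuptz excerpt.

```python
def max_reach(fanout: int, ttl: int) -> int:
    """Upper bound on peers reached by one flooded message.

    Assumes every peer has `fanout` connections and forwarding paths never overlap.
    """
    reached = 0
    frontier = fanout  # hop 1: the sender's direct neighbours
    for _ in range(ttl):
        reached += frontier
        frontier *= fanout - 1  # each peer forwards to everyone except the sender
    return reached


print(max_reach(4, 7))  # 4372
```

Under these assumptions a single request touches thousands of peers, and every one of those peers is sending requests of its own. Raising the hop limit by one roughly triples the total in this example.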

A Network of Peers: Peer-to-Peer Models Through the History of the Internet, by Nelson Minar and Marc Hedlund

"Through the music-sharing application called Napster, and the larger movement dubbed "peer-to-peer," the millions of users connecting to the Internet have started using their ever more powerful home computers for more than just browsing the Web and trading email. Instead, machines in the home and on the desktop are connecting to each other directly, forming groups and collaborating to become user-created search engines, virtual supercomputers, and file systems."

Mojo Nation, by Jim McCoy

"In a similar vein, Napster and other distributed client-servers are built on the shifting sands of volunteerism. Freeloaders and parasites cannot be controlled. The freeloader gains all the benefit of the whole system and pushes the cost to those foolish enough to give away their resources. Xerox PARC researchers Eytan Adar and Bernardo A. Huberman documented these problems in their Free Riding on Gnutella paper that found 70% of Gnutella users provided no files or resources to the system and that 1% of the users were providing half of the total system resources. Whether or not this turns out to be a major problem for peer-to-peer systems remains to be seen but the Mojo Nation technology provides flexible tools to reduce freeloading if it becomes a serious problem."

Free Riding on Gnutella, by Eytan Adar and Bernardo A. Huberman

"An extensive analysis of user traffic on Gnutella shows a significant amount of free riding in the system. By sampling messages on the Gnutella network over a 24-hour period, we established that almost 70% of Gnutella users share no files, and nearly 50% of all responses are returned by the top 1% of sharing hosts. Furthermore, we found out that free riding is distributed evenly between domains, so that no one group contributes significantly more than others, and that peers that volunteer to share files are not necessarily those who have desirable ones. We argue that free riding leads to degradation of the system performance and adds vulnerability to the system. If this trend continues copyright issues might become moot compared to the possible collapse of such systems.

In this paper we analyzed user traffic in Gnutella and concluded that there is a significant amount of free riding in the system. Specifically, we found that nearly 70% of Gnutella users share no files, and nearly 50% of all responses are returned by the top 1% of sharing hosts. Furthermore, we found that free riding is distributed evenly between domains, so that no one group contributes significantly more than others, and that peers that volunteer to share files are not necessarily those who have desirable ones.

These findings have serious implications for the future development of Gnutella and its many variants. In order for distributed systems with no central monitoring to succeed, a large amount of voluntary cooperation is required, a requirement that is very hard to fulfill in systems with large user populations that remain anonymous.

Sometimes, the logic behind the decision to cooperate or not changes when the interaction is ongoing, since future expected utility gains will join present ones in influencing the rational individual's decision. In particular, individual expectations concerning the future evolution of the social dilemma can play a significant role in each member's decisions [Hu96]. An interesting continuation of these experiments may lead to an understanding of how free riding changes over time."

Gnutella and the Transient Web, by Kelly Truelove

"To the lament of Tim Berners-Lee and other Web pioneers, the Web became more of a one-to-many medium than the many-to-many communication system originally envisioned.

The Gnutella protocol restores the Web's original symmetry, enabling even transient computers to effectively participate as servers. It's far from a complete solution, and alternative systems may eclipse it. Nonetheless, this simple and idiosyncratic protocol is currently in the vanguard of the emergence of the transient Web. The transient Web has the potential to be every bit as disruptive as the conventional "permanent" Web, and possibly more so.

What do Gnutella and the Web have to do with each other? Isn't Gnutella just one of many P2P file-sharing systems?

Yes, Gnutella enables P2P file sharing, but take a closer look. With Gnutella, file transfer is accomplished via HTTP, the same protocol Web browsers and servers use to transfer Web pages and other data. Under the hood, each Gnutella application contains a no-frills Web-server component for serving files and a primitive browser-like element for retrieving them."
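As an illustration of the "browser-like element", the sketch below builds a plain HTTP request for a file held by another peer. The path layout, the address, and the port are assumptions made for the example; the excerpt does not say how Gnutella clients name files in their requests.

```python
from urllib.parse import quote


def build_download_request(host: str, port: int, filename: str) -> bytes:
    """Build a minimal HTTP/1.0 GET request for `filename` on another peer."""
    path = "/" + quote(filename)  # hypothetical path layout, percent-encoded
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}:{port}", "", ""]
    return "\r\n".join(lines).encode("ascii")


# 192.0.2.10 is a documentation-only address; the port is an example value.
request = build_download_request("192.0.2.10", 6346, "crepe recipe.txt")
print(request.decode("ascii"))
```

Because the result is an ordinary HTTP request, the peer serving the file only needs the "no-frills Web-server component" the excerpt describes.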

Independence Array, by Jerome Kuptz

"Any chance of shutting down unauthorized file-sharing ended on March 14, when programmers at AOL's Nullsoft division unleashed their server-less P2P client, Gnutella. AOL yanked the program from Nullsoft's site within hours, but dozens of reverse-engineered replacements have since been posted to the Net, many complete with source code. As shown at right, Gnutella's architecture is fully decentralized, so file-sharers' computers can find each other without soliciting a central server. Shut down any part of the network, and the rest will keep running. Gnutella's freedom to file-share, alas, isn't without trade-offs: It replaces efficient client-server transactions with a many-to-many packet flurry that can chew up bandwidth. And the decentralization that prevents shutdowns also means there's no place to build user relationships or collect revenue. Who cares? Not the end user.

1. The Gnutella application on your desktop is actually a peer, acting as both client and server in interactions with a network of similar peers. Unlike Napster, Gnutella has no central servers to which it can connect for information. Before it can begin swapping files, your peer must be told (by the user or from its own database) the IP address of one other peer to which it can connect.

2. Your Gnutella peer transmits a handshake message ("GNUTELLA CONNECT/0.4\n\n") to the other peer. The handshake identifies you to the other peer, which in return sends back a confirmation ("GNUTELLA OK\n\n").

3. Your peer sends a ping request to the other peer, announcing its presence on the network. The ping request includes a TTL (time-to-live) count, which states how many times the request can be forwarded to other computers. The default for most Gnutella peers is 7.

4. The other peer replies to your ping with a pong, which contains its IP address and file sharing information (total files and kilobytes shared).

5. The other peer also forwards your ping to additional Gnutella peers it knows about, after first reducing the TTL count by 1, from 7 to 6. Each peer that receives the packet similarly subtracts 1 from the TTL and forwards the packet to others. Many peers end up forwarding your ping to one another over and over. Gnutella relies on fat bandwidth to overcome this inefficiency. Users raising their TTLs past 7 could flood the Net with trillions of pings. To keep Gnutella efficient, other peers will adjust high TTLs before forwarding them.

6. Each peer that receives your ping sends back a pong to your peer, routing the pong back along the path of the ping.

7. As pongs arrive, your hostcatcher collects the IP addresses of available peers. They may be anywhere on the Internet, but all are at most seven degrees of separation from you. This network of peers known to your own is your radius.

8. A typical radius includes 2,000 to 10,000 other peers, with 500,000 to 1 million files. Gnutella's open architecture means you can also share files with users of compatible programs such as Gnotella or Gnucleus.

9. To find a file, you enter a search term into the Gnutella interface on your screen. Your peer then sends a query directly to every peer known to your peer.

10. Each peer searches its local files for matches to your query. If it doesn't find any, it doesn't reply. This prevents your computer from being bombarded with "no results" messages.

11. If there are one or more matches, a query results message is routed to your peer, containing the IP address of the sender and the matching file name. Unlike Napster or a Web search engine, your peer doesn't know when the search process is complete: Peers that haven't replied either have found no results, or are still working on a reply. Newer implementations allow the user to set the duration of the search.

12. When you select one of the query results for downloading, your peer creates a standard http request - the kind used by browsers to request Web pages - from the IP address and filename in the results message. It sends this request directly to the peer, which returns the file via http. This is part of what makes Gnutella networks hard to shut down - their file transfers look just like ordinary Web traffic.

13. If the file you're after is hidden behind a firewall, your peer will issue a push request - a broadcast message that winds its way around the network until it gets to the recipient, which responds by connecting to your peer and transmitting the file. An estimated 50 percent of Gnutella traffic is across firewalls.

14. Peers on low-bandwidth networks will miss (or "drop") messages, causing pings, pongs, queries, and replies to be lost. This happens not only to messages to and from the low-bandwidth computer, but to any to which it is trying to forward packets. In other words, a huge portion of your radius can "go dark," becoming unreachable and unusable. This is another inefficiency inherent in Gnutella's serverless structure."
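Steps 3 to 11 above can be simulated on a toy network. The sketch below is a simplification under stated assumptions: peers are entries in a dictionary rather than machines, each peer ignores a ping it has already seen (the excerpt notes that real peers may forward the same ping repeatedly), and a "match" is a case-insensitive substring test. The peer names and file names are invented.

```python
from collections import deque


def flood(graph, origin, ttl):
    """Return {peer: hops} for every peer reached by a ping sent from `origin`."""
    hops = {origin: 0}
    queue = deque([(origin, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if remaining == 0:  # TTL used up: this peer does not forward
            continue
        for neighbour in graph[peer]:
            if neighbour not in hops:  # assumption: pings already seen are dropped
                hops[neighbour] = hops[peer] + 1
                queue.append((neighbour, remaining - 1))
    del hops[origin]
    return hops


def search(graph, files, origin, ttl, term):
    """Ask every peer in the radius; only peers with a match reply (step 10)."""
    hits = {}
    for peer in flood(graph, origin, ttl):
        matches = [name for name in files.get(peer, []) if term.lower() in name.lower()]
        if matches:
            hits[peer] = matches
    return hits


network = {"A": ["B", "E"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"], "E": ["A"]}
shared = {"C": ["crepe_recipe.txt"], "D": ["crepe_notes.txt"], "E": ["song.mp3"]}

print(flood(network, "A", 2))                     # radius of A with a TTL of 2
print(search(network, shared, "A", 2, "crepe"))   # D is out of reach
print(search(network, shared, "A", 3, "crepe"))   # one more hop reaches D
```

With a TTL of 2, peer D lies outside the radius of A, so its matching file is never found. This is the trade-off the TTL controls: a larger radius finds more files but generates more traffic.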

The Noisy War Over Napster, by Steven Levy

"But even if the music industry succeeds in killing Napster, it is faced with a series of imitators, some of whom are even scarier from an industry point of view. In a way, Napster is a fat target for attackers: it uses a centralized database, which allows the company some control over its users. (And keeps a list of transfers handy for potential litigants.) But with some newer systems, the searching is done in a distributed manner that can't be shut down or modulated. One of these systems is Gnutella (pronounced New-tella). Unlike Napster, Gnutella could be used to exchange not just music files but any files, including movies, text and photos--a copyright holder's nightmare."

In Praise of Freeloaders, by Clay Shirky

"As the excitement over P2P grew during the past year, it seemed that decentralized architectures could do no wrong. Napster and its cousins managed to decentralize costs and control, creating applications of seemingly unstoppable power. And then researchers at Xerox brought us P2P's first crisis: freeloading.

Freeloading is the tendency of people to take resources without paying for them. In the case of P2P systems, this means consuming resources provided by other users without providing an equivalent amount of resources (if any) back to the system. The Xerox study of Gnutella (now available at FirstMonday) found that " ... a large proportion of the user population, upwards of 70 percent, enjoy the benefits of the system without contributing to its content," and labels the problem a "Tragedy of the Digital Commons."

The Tragedy of the Commons is an economic problem with a long pedigree. As Mojo Nation, a P2P system set up to combat freeloading, states in its FAQ:

"Other file-sharing systems are plagued by "the tragedy of the commons," in which rational folks using a shared resource eat the resources to death. Most often, the "Tragedy of the Commons" refers to farmers and pasture, but technology journalists are writing about users who download and download but never contribute to the system."

To combat this problem, Mojo Nation proposes creating a market for computational resources -- disk space, bandwidth, CPU cycles. In its proposed system, if you provide computational resources to the system, you earn Mojo, a kind of digital currency. If you consume computational resources, you spend the Mojo you've earned. This system is designed to keep freeloaders from consuming more than they contribute to the system.

The Xerox study on Gnutella makes broad claims about the relevance of its findings, even as Napster, which adds more users each day than the entire installed base of Gnutella, is growing without suffering from the study's predicted effects. Indeed, Napster's genius in building an architecture that understands the inevitability of freeloading and works within those constraints has led Dan Bricklin to christen Napster's effects "The Cornucopia of the Commons."

Systems that set out to right the imagined wrongs of freeloading are more marketing efforts than technological ones, in that they attempt to inflame our sense of injustice at the users who download and download but never contribute to the system. This plays well in the press, of course, garnering headlines like "A revolutionary file-sharing system could spell the end for dot-communism and Net leeches" or labeling P2P users "cyberparasites."

This sense of unfairness, however, obscures two key aspects of P2P file-sharing: the economics of digital resources, which are either replicable or replenishable; and the ways the selfish nature of user participation drives the system.

The Tragedy of the Commons is a simple, compelling illustration of what can happen to commonly owned resources. It is also almost completely inapplicable to the digital world."
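The market that Mojo Nation proposes in the excerpt above can be pictured as a simple ledger: providing resources earns Mojo, and consuming resources spends it. The sketch below is only an illustration of that idea. The class name, the one-to-one exchange rate, and the rule that a user with too little Mojo is refused are all assumptions, since the excerpt does not describe how Mojo Nation actually prices or enforces anything.

```python
class MojoLedger:
    """Toy ledger: one unit of Mojo per unit of resource (assumed rate)."""

    def __init__(self):
        self.balances = {}

    def provide(self, user, units):
        """Credit a user who supplied disk space, bandwidth, or CPU cycles."""
        self.balances[user] = self.balances.get(user, 0) + units

    def consume(self, user, units):
        """Debit a user; refuse the request if the balance is too low."""
        if self.balances.get(user, 0) < units:
            return False
        self.balances[user] -= units
        return True


ledger = MojoLedger()
ledger.provide("alice", 5)
print(ledger.consume("alice", 3))  # alice has earned enough
print(ledger.consume("bob", 1))    # bob has contributed nothing
```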

Profit from Peer-to-Peer

"For the past year or so, the term peer-to-peer (P2P) has become synonymous with Napster, the controversial file-sharing program created by a 20-year-old software whiz called Shawn Fanning and now the subject of numerous lawsuits. Napster, much like its close cousin, Gnutella, allows users to transfer music files among themselves, circumventing many legal controls over copyright and creating a massive network of music libraries scattered about the Internet. Napster is a clever twist on a time-worn architecture dating back to the early days of the Internet. Now, a number of start-up firms are hoping to harness the same technology in the corporate world, promising to use the computing architecture to empower workers, unleash their creativity and solve communication problems.

As the embattled Napster struggles to switch to a subscription-based business model, companies such as Groove Networks, NextPage and XDegrees are trying to introduce the peer-to-peer style of computing into applications that allow workers to collaborate on joint projects (groupware), swap information, and share network resources such as storage space and other costly bits of equipment. Other firms, including Entropia and United Devices, are developing supercomputing applications that use the unbridled power of computers hooked to networks (see article). All of them believe fervently that the corporate world is fertile ground for P2P computing. The shift will, they say, do for computing in the 2000s what the PC did in the 1980s.

With Groove, for instance, workers can connect to colleagues in "virtual environments" to pursue all manner of collaborative work, ranging from brainstorming and event-planning to sharing documents and surfing the Internet together. Groove's system, like others, informs each user when colleagues ("buddies") are online, identifies them, and allows a user to connect to them from anywhere. Most important for the business world, once the software is launched, the service creates a secure space for users to communicate - be they on the Internet or on a private Intranet behind a corporate firewall. There is no need to involve the company's service engineers; no need to establish some form of central organisation; and little concern about strangers gaining access to the corporate network.

Finally, there are the distributed computing services that deliver supercomputing power to companies needing massive number-crunching capacity occasionally but unwilling to pay millions of dollars for it. Essentially, the technology being developed by firms such as United Devices of Austin, Texas; Entropia of San Diego, California; and Applied Meta of Cambridge, Massachusetts, breaks down large computations into small parcels that can be distributed among computers tethered to a network. Each PC simultaneously computes the data and returns the results to a central computer that assembles the parts into a whole.

Would that signal the end of commercial P2P? Not necessarily. Even as the first swarm of peer-to-peer companies drop like flies, the know-how will permeate all manner of applications developed for large enterprises by Microsoft, Oracle, Sun Microsystems and others. Those behemoths have worked to emphasise their backing of the technology in recent months. Intel, the first big company to back peer-to-peer, has thrown its weight behind the technology, investing in several start-ups and helping guide the peer-to-peer working group to develop standards. Meanwhile, Microsoft unveiled in April its Project Hailstorm, as part of its software-as-service initiative (".Net"), which features peer-to-peer services prominently. Not to be outdone, Sun Microsystems unveiled its JXTA (pronounced "Juxta") set of standards for an open-source P2P platform. Sun also has begun investing in the area - most recently by acquiring Infrasearch for a reported $10m.

But for all the problems associated with the technology and financing, it is the psychology of the modern workplace that will make peer-to-peer a force to be reckoned with. The past decade has seen a dramatic shift in the nature of work. Company boundaries have grown wider, tying customers and suppliers ever closer, and increasing their reliance on temporary workers and consultants, while depending ever more on ad hoc work-groups. According to Daniel Pink, the author of "Free Agent Nation", such a workplace is increasingly devoid of fixed structures and clearly defined social and professional roles. Peer-to-peer matches the behaviour of modern corporations rather well.

Bonnie Nardi, an anthropologist at Hewlett-Packard in San Jose, California, who has studied the way people work today, has identified the rise of ad hoc networks as one of the most fundamental innovations in the workforce in recent years. These networks, which Ms Nardi calls "intensional networks", because of their intentional nature and inherent tension with other structures, have become a significant extension of work - and, increasingly, of the corporation itself. In turn, such self-managed messaging systems as ICQ and other peer-to-peer tools for collaboration then become the centre point of such networks, allowing users to contact peers freely, easily, and on their own terms."
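The distributed computing services mentioned in the excerpt above break a large computation into parcels, hand the parcels to many computers, and assemble the partial results centrally. A minimal sketch of that pattern follows. Worker threads stand in for networked PCs, the sum-of-squares job is an invented example, and none of the names reflect the products of the firms mentioned.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parcel_size):
    """Break the full data set into parcels of at most `parcel_size` items."""
    return [data[i:i + parcel_size] for i in range(0, len(data), parcel_size)]


def work(parcel):
    """The job each machine runs on its parcel: here, a sum of squares."""
    return sum(x * x for x in parcel)


def run(data, parcel_size=1000, workers=4):
    """Distribute parcels to workers, then assemble the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(work, split(data, parcel_size)))
    return sum(partials)


print(run(list(range(10)), parcel_size=3))  # 285
```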

Related Papers

Short History of Barcodes
Short History of Collaborative Filtering
Freenet File Sharing
Bio of Bernardo Huberman
Management of Reputation Using Collaborative Filtering
Early History and Media Coverage of Webraska
Early History of Napster
Notes on RFID Technology, 2001
What is Mojo Nation?


Copyright © 2014 Moya K. Mason, All Rights Reserved
