In the May 1970 issue of Popular Science magazine, Arthur C. Clarke was reported to have predicted that satellites would one day "bring the accumulated knowledge of the world to your fingertips" using a console that would combine the functionality of the Xerox, telephone, television and a small computer, allowing data transfer and video conferencing around the globe.
[5]
In March 1989,
Tim Berners-Lee wrote a proposal that referenced
ENQUIRE, a database and software project he had built in 1980, and described a more elaborate information management system.
[6]
With help from
Robert Cailliau, he published a more formal proposal (on November 12, 1990) to build a "
Hypertext project" called "WorldWideWeb" (one word, also "W3") as a "web" of "hypertext documents" to be viewed by "
browsers" using a
client–server architecture.
[2] This proposal estimated that a read-only web would be developed within three months and that it would take six months to achieve "the creation of new links and new material by readers, [so that] authorship becomes universal" as well as "the automatic notification of a reader when new material of interest to him/her has become available." See
Web 2.0 and
RSS/
Atom, which have taken a little longer to mature.
The proposal was modeled after the
Dynatext SGML reader by Electronic Book Technology, a spin-off from the Institute for Research in Information and Scholarship at Brown University. The Dynatext system, licensed by CERN, was technically advanced and was a key player in the extension of SGML ISO 8879:1986 to Hypermedia within
HyTime, but it was considered too expensive and had an inappropriate licensing policy for use in the general high energy physics community, namely a fee for each document and each document alteration.
Figure: The CERN datacenter in 2010, housing some WWW servers.
A
NeXT Computer was used by Berners-Lee as the world's first
web server and also to write the first
web browser,
WorldWideWeb, in 1990. By Christmas 1990, Berners-Lee had built all the tools necessary for a working Web:
[7] the
first web browser (which was a web editor as well); the first web server; and the first web pages,
[8] which described the project itself. On August 6, 1991, he posted a short summary of the World Wide Web project on the
alt.hypertext newsgroup.
[9] This date also marked the debut of the Web as a publicly available service on the
Internet. The first photo on the web was uploaded by Berners-Lee in 1992, an image of the CERN house band
Les Horribles Cernettes.
Web as a "Side Effect" of the 40 years of Particle Physics Experiments. It happened many times during history of science that the most impressive results of large scale scientific efforts appeared far away from the main directions of those efforts... After the World War 2 the nuclear centers of almost all developed countries became the places with the highest concentration of talented scientists. For about four decades many of them were invited to the international CERN's Laboratories. So specific kind of the CERN's intellectual "entire culture" (as you called it) was constantly growing from one generation of the scientists and engineers to another. When the concentration of the human talents per square foot of the CERN's Labs reached the critical mass, it caused an intellectual explosion The Web -- crucial point of human's history -- was born... Nothing could be compared to it... We cant imagine yet the real scale of the recent shake, because there has not been so fast growing multi-dimension social-economic processes in human history... [10]
The first server outside Europe was set up at
SLAC to host the
SPIRES-HEP database. Accounts differ substantially as to the date of this event. The World Wide Web Consortium says December 1992,
[11] whereas SLAC itself claims 1991.
[12][13] The 1991 date is supported by a W3C document entitled A Little History of the World Wide Web.
[14]
The crucial underlying concept of
hypertext originated with older projects from the 1960s, such as the Hypertext Editing System (HES) at Brown University,
Ted Nelson's
Project Xanadu, and
Douglas Engelbart's
oN-Line System (NLS). Both Nelson and Engelbart were in turn inspired by
Vannevar Bush's
microfilm-based "
memex", which was described in the 1945 essay "
As We May Think".
[citation needed]
Berners-Lee's breakthrough was to marry hypertext to the
Internet. In his book
Weaving The Web, he explains that he had repeatedly suggested that a marriage between the two technologies was possible to members of
both technical communities, but when no one took up his invitation, he finally tackled the project himself. In the process, he developed a system of globally unique identifiers for resources on the Web and elsewhere: the Universal Document Identifier (UDI), later known as
Uniform Resource Locator (URL) and
Uniform Resource Identifier (URI); the publishing language
HyperText Markup Language (HTML); and the
Hypertext Transfer Protocol (HTTP).
[15]
The World Wide Web had a number of differences from other hypertext systems that were then available. The Web required only unidirectional links rather than bidirectional ones. This made it possible for someone to link to another resource without action by the owner of that resource. It also significantly reduced the difficulty of implementing web servers and browsers (in comparison to earlier systems), but in turn presented the chronic problem of
link rot. Unlike predecessors such as
HyperCard, the World Wide Web was non-proprietary, making it possible to develop servers and clients independently and to add extensions without licensing restrictions. On April 30, 1993,
CERN announced
[16] that the World Wide Web would be free to anyone, with no fees due. Coming two months after the announcement that the
Gopher protocol was no longer free to use, this produced a rapid shift away from Gopher and towards the Web. An early popular web browser was ViolaWWW, which was patterned after HyperCard.
Scholars generally agree that a turning point for the World Wide Web began with the introduction
[17] of the
Mosaic web browser
[18] in 1993, a graphical browser developed by a team at the
National Center for Supercomputing Applications at the
University of Illinois at Urbana-Champaign (NCSA-UIUC), led by
Marc Andreessen. Funding for Mosaic came from the U.S.
High-Performance Computing and Communications Initiative, a funding program initiated by the
High Performance Computing and Communication Act of 1991, one of
several computing developments initiated by U.S. Senator
Al Gore.
[19] Prior to the release of Mosaic, graphics were not commonly mixed with text in web pages and the Web's popularity was less than older protocols in use over the Internet, such as
Gopher and
Wide Area Information Servers (WAIS). Mosaic's graphical user interface allowed the Web to become, by far, the most popular Internet protocol.
The
World Wide Web Consortium (W3C) was founded by Tim Berners-Lee after he left the European Organization for Nuclear Research (
CERN) in October 1994. It was founded at the
Massachusetts Institute of Technology Laboratory for Computer Science (MIT/LCS) with support from the
Defense Advanced Research Projects Agency (DARPA), which had pioneered the
Internet; a year later, a second site was founded at INRIA (a French national computer research lab) with support from the
European Commission DG InfSo; and in 1996, a third continental site was created in Japan at Keio University. By the end of 1994, while the total number of websites was still minute compared to present standards, quite a number of
notable websites were already active, many of which are the precursors or inspiration for today's most popular services.
Connected by the existing Internet, other
websites were created around the world, adding international standards for
domain names and
HTML. Since then, Berners-Lee has played an active role in guiding the development of web standards (such as the
markup languages in which web pages are composed), and in recent years has advocated his vision of a
Semantic Web. The World Wide Web enabled the spread of information over the
Internet through an easy-to-use and flexible format. It thus played an important role in popularizing use of the Internet.
[20] Although the two terms are sometimes
conflated in popular use,
World Wide Web is not
synonymous with
Internet.
[21] The Web is an application built on top of the Internet.
Function
The terms Internet and World Wide Web are often used in everyday speech without much distinction. However, the Internet and the World Wide Web are not the same. The Internet is a global system of interconnected computer networks. In contrast, the Web is one of the services that runs on the Internet. It is a collection of interconnected documents and other resources, linked by hyperlinks and URLs. In short, the Web is an application running on the Internet.
[22] Viewing a
web page on the World Wide Web normally begins either by typing the
URL of the page into a
web browser, or by following a
hyperlink to that page or resource. The web browser then initiates a series of communication messages, behind the scenes, in order to fetch and display it.
First, the server-name portion of the URL is resolved into an
IP address using the global, distributed
Internet database known as the
Domain Name System (DNS). This IP address is necessary to contact the
Web server. The browser then requests the resource by sending an
HTTP request to the Web server at that particular address. In the case of a typical web page, the HTML text of the page is requested first and parsed immediately by the web browser, which then makes additional requests for images and any other files that complete the page. Statistics measuring a website's popularity are usually based either on the number of page views or on the associated server 'hits' (file requests) that take place.
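The two steps described above, name resolution followed by an HTTP request, can be sketched in Python. This is a minimal illustration rather than a description of how any particular browser works: the function names split_url, resolve and build_request are hypothetical, the URL is a placeholder, and the request contains only the headers needed for a bare HTTP/1.1 GET.

```python
import socket
from urllib.parse import urlsplit


def split_url(url):
    """Return (host, port, target) for an http:// or https:// URL."""
    parts = urlsplit(url)
    default_port = 443 if parts.scheme == "https" else 80
    port = parts.port or default_port
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return parts.hostname, port, target


def resolve(host):
    """Step 1: ask DNS for an IPv4 address (requires network access)."""
    return socket.gethostbyname(host)


def build_request(host, target):
    """Step 2: the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {target} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # blank line that terminates the header section
        "",
    ]
    return "\r\n".join(lines)


# Placeholder URL; resolve() is defined but not called, so this runs offline.
host, port, target = split_url("http://www.example.com/index.html")
request_text = build_request(host, target)
print(request_text)
```

A real client would open a TCP connection to the address returned by resolve(host) on the given port, send request_text, and then read and parse the response before requesting images and other files.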
While receiving these files from the web server, browsers may progressively
render the page onto the screen as specified by its HTML,
Cascading Style Sheets (CSS), or other page composition languages. Any images and other resources are incorporated to produce the on-screen web page that the user sees. Most web pages contain
hyperlinks to other related pages and perhaps to downloadable files, source documents, definitions and other web resources. Such a collection of useful, related resources, interconnected via hypertext links is dubbed a
web of information. Publication on the Internet created what
Tim Berners-Lee first called the
WorldWideWeb (in its original
CamelCase, which was subsequently discarded) in November 1990.
[2]
Linking
Figure: Graphic representation of a minute fraction of the WWW, demonstrating hyperlinks.
Over time, many web resources pointed to by hyperlinks disappear, relocate, or are replaced with different content. This makes hyperlinks obsolete, a phenomenon referred to in some circles as link rot, and the hyperlinks affected by it are often called dead links. The ephemeral nature of the Web has prompted many efforts to archive web sites. The Internet Archive, active since 1996, is one of the best-known efforts.
Dynamic updates of web pages
JavaScript is a
scripting language that was initially developed in 1995 by
Brendan Eich, then of
Netscape, for use within web pages.
[23] The standardized version is
ECMAScript.
[23] To overcome some of the limitations of the page-by-page model described above, some web applications also use Ajax (asynchronous JavaScript and XML). JavaScript delivered with the page can make additional HTTP requests to the server, either in response to user actions such as mouse clicks or based on elapsed time. The server's responses are used to modify the current page rather than creating a new page with each response, so the server only needs to provide limited, incremental information. Since multiple Ajax requests can be handled at the same time, users can interact with a page even while data is being retrieved. Some web applications regularly poll the server to ask if new information is available.
[24]
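The polling pattern can be illustrated with a small simulation. In a browser this logic would be written in JavaScript and would issue real HTTP requests; the Python sketch below replaces the server with an in-memory list, and the names SERVER_LOG, server_poll and Page are hypothetical. The point it shows is that each poll returns only the increment, which the client merges into the page it already holds.

```python
# Hypothetical in-memory stand-in for a server that publishes items over time.
SERVER_LOG = ["headline A", "headline B", "headline C"]


def server_poll(since):
    """Return (current_version, items added after the first `since` items)."""
    return len(SERVER_LOG), SERVER_LOG[since:]


class Page:
    """Client-side state: the items shown so far and the version last seen."""

    def __init__(self):
        self.version = 0
        self.items = []

    def poll_once(self, fetch=server_poll):
        """Ask for news, merge the increment, and report how many items arrived."""
        version, new_items = fetch(self.version)
        self.items.extend(new_items)  # update the current page in place
        self.version = version
        return len(new_items)


page = Page()
first = page.poll_once()          # initial load: everything is new
SERVER_LOG.append("headline D")   # the server gains one item
second = page.poll_once()         # only the increment is transferred
third = page.poll_once()          # nothing new since the last poll
print(first, second, third, page.items)
```

A real application would run poll_once on a timer and choose the interval as a trade-off between freshness and server load.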
WWW prefix
Many domain names used for the World Wide Web begin with
www because of the long-standing practice of naming
Internet hosts (servers) according to the services they provide. The
hostname for a
web server is often
www, in the same way that it may be
ftp for an
FTP server, and
news or
nntp for a
USENET news server. These host names appear as
Domain Name System (DNS)
subdomain names, as in
www.example.com. The use of 'www' as a subdomain name is not required by any technical or policy standard; indeed, the first ever web server was called
nxoc01.cern.ch,
[25] and many web sites exist without it. Many established websites still use 'www', or they invent other subdomain names such as 'www2', 'secure', etc. Many such web servers are set up such that both the domain root (e.g., example.com) and the
www subdomain (e.g., www.example.com) refer to the same site; others require one form or the other, or they may map to different web sites.
The use of a subdomain name is useful for load balancing incoming web traffic by creating a CNAME record that points to a cluster of web servers. Because a CNAME record cannot coexist with the other records that a domain root must carry (such as its NS and SOA records), only a subdomain can be aliased this way, and the same result cannot be achieved with the bare domain root.
When a user submits an incomplete website address to a web browser in its address bar input field, some web browsers automatically try adding the prefix "www" to the beginning of it and possibly ".com", ".org" and ".net" at the end, depending on what might be missing. For example, entering 'microsoft' may be transformed to
http://www.microsoft.com/ and 'openoffice' to
http://www.openoffice.org. This feature started appearing in early versions of Mozilla
Firefox, when it still had the working title 'Firebird' in early 2003.
[26] It is reported that Microsoft was granted a US patent for the same idea in 2008, but only for mobile devices.
[27]
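The completion behavior described above can be sketched as a function that turns address-bar input into candidate URLs. This is an illustrative heuristic only, not the algorithm of any particular browser: the function name candidate_urls, the suffix order, and the choice to prepend http:// when no scheme is given are all assumptions.

```python
def candidate_urls(text, suffixes=(".com", ".org", ".net")):
    """Return URLs a browser might try for incomplete address-bar input."""
    text = text.strip()
    if "://" in text:
        return [text]  # already a complete URL; leave it alone
    host, _, rest = text.partition("/")
    path = "/" + rest
    if "." in host:
        # Looks like a full host name: only the scheme is missing.
        return [f"http://{host}{path}"]
    # Bare word: try the "www" prefix with each common suffix in turn.
    return [f"http://www.{host}{suffix}{path}" for suffix in suffixes]


print(candidate_urls("microsoft"))
print(candidate_urls("www.example.com"))
```

A browser using such a heuristic would try the candidates in order and stop at the first one that resolves, which is why 'openoffice' can end up at a .org address.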
The scheme specifiers (http:// or https://) in URIs refer to the Hypertext Transfer Protocol and to HTTP Secure respectively, and so define the communication protocol to be used for the request and response. HTTP is fundamental to the operation of the World Wide Web, and the encryption involved in HTTPS adds an essential layer if confidential information such as passwords or banking information is to be exchanged over the public Internet. Web browsers usually also prepend the scheme to URLs if it is omitted.
In English, www is pronounced by individually pronouncing the name of each character (double-u double-u double-u). Although some technical users pronounce it dub-dub-dub, this is not widespread. The English writer Douglas Adams once quipped in The Independent on Sunday (1999): "The World Wide Web is the only thing I know of whose shortened form takes three times longer to say than what it's short for." Stephen Fry later pronounced it "wuh wuh wuh" in his "Podgrammes" series of podcasts. In Mandarin Chinese, World Wide Web is commonly translated via a phono-semantic matching to wàn wéi wǎng (万维网), which satisfies www and literally means "myriad dimensional net", [28] a translation that reflects the design concept and proliferation of the World Wide Web. Tim Berners-Lee's web-space states that World Wide Web is officially spelled as three separate words, each capitalized, with no intervening hyphens.
[29]
Privacy
Computer users, who save time and money and who gain convenience and entertainment, may have surrendered the right to privacy in exchange for using a number of technologies, including the Web.
[30] Worldwide, more than a half billion people have used a
social network service,
[31] and of Americans who grew up with the Web, half created an online profile
[32] and are part of a generational shift that could be changing norms.
[33][34] Facebook progressed from U.S. college students to a 70% non-U.S. audience, and in 2009 estimated that only 20% of its members use privacy settings.
[35] In 2010 (six years after co-founding the company),
Mark Zuckerberg wrote, "we will add privacy controls that are much simpler to use".
[36]
Privacy representatives from 60 countries have resolved to ask for laws to complement industry self-regulation, for education for children and other minors who use the Web, and for default protections for users of social networks.
[37] They also believe data protection for
personally identifiable information benefits business more than the sale of that information.
[37] Users can opt in to features in browsers to clear their personal histories locally and block some cookies and advertising networks, [38] but they are still tracked in websites' server logs, and particularly by web beacons.
[39] Berners-Lee and colleagues see hope in accountability and appropriate use achieved by extending the Web's architecture to policy awareness, perhaps with audit logging, reasoners and appliances.
[40]
In exchange for providing free content, vendors hire advertisers who spy on Web users and base their business model on tracking them.
[41] Since 2009, they have bought and sold consumer data on exchanges (lacking a few details that could make it possible to de-anonymize, or identify, an individual).
[42][41] Hundreds of millions of times per day, Lotame Solutions captures what users are typing in real time and sends that text to OpenAmplify, which then tries to determine, to quote a writer at
The Wall Street Journal, "what topics are being discussed, how the author feels about those topics, and what the person is going to do about them".
[43][44]
Microsoft backed away in 2008 from its plans for strong privacy features in Internet Explorer,
[45] leaving its users (50% of the world's Web users) open to advertisers who may make assumptions about them based on only
one click when they visit a website.
[46] Among services paid for by
advertising,
Yahoo! could collect the most data about users of commercial websites, about 2,500 bits of information per month about each typical user of its site and its affiliated advertising network sites. Yahoo! was followed by
MySpace with about half that potential and then by
AOL–
TimeWarner,
Google,
Facebook, Microsoft, and
eBay.
[47]
Security
The Web has become criminals' preferred pathway for spreading
malware. Cybercrime carried out on the Web can include
identity theft, fraud, espionage and intelligence gathering.
[48] Web-based
vulnerabilities now outnumber traditional computer security concerns,
[49][50] and as measured by
Google, about one in ten web pages may contain malicious code.
[51] Most Web-based
attacks take place on legitimate websites, and most, as measured by
Sophos, are hosted in the United States, China and Russia.
[52] The most common of all malware threats is the SQL injection attack against websites.
[53] Through HTML and URIs the Web was vulnerable to attacks like
cross-site scripting (XSS) that came with the introduction of JavaScript
[54] and were exacerbated to some degree by Web 2.0 and Ajax
web design that favors the use of scripts.
[55] By one estimate, 70% of all websites are open to XSS attacks on their users.
[56]
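The two attack classes named above share a cause: untrusted input is pasted into text that is later interpreted as code. The sketch below shows the usual countermeasures with Python's standard library, parameterized queries against SQL injection and output escaping against XSS. The table, the sample data and the function names are hypothetical, and real applications need more than these two measures.

```python
import html
import sqlite3

# Hypothetical in-memory database for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "a1"), ("bob", "b2")])


def find_unsafe(name):
    """VULNERABLE: the input is spliced into the SQL text itself."""
    query = f"SELECT secret FROM users WHERE name = '{name}'"
    return [row[0] for row in conn.execute(query)]


def find_safe(name):
    """Parameterized query: the input is passed as data, never as SQL."""
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


attack = "nobody' OR '1'='1"
leaked = find_unsafe(attack)   # the injected OR clause matches every row
blocked = find_safe(attack)    # no user has this literal name

# XSS: escape untrusted text before placing it into an HTML page.
comment = '<script>alert("xss")</script>'
escaped = html.escape(comment)

print(sorted(leaked), blocked, escaped)
```

With the parameterized form the attack string is compared as an ordinary name and matches nothing, while the escaped comment is displayed as text instead of being executed by the browser.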
Proposed solutions vary to extremes. Large security vendors like
McAfee already design governance and compliance suites to meet post-9/11 regulations,
[57] and some, like Finjan, have recommended active real-time inspection of code and all content regardless of its source.
[48] Some have argued that for enterprises to see security as a business opportunity rather than a cost center,
[58] "ubiquitous, always-on digital rights management" enforced in the infrastructure by a handful of organizations must replace the hundreds of companies that today secure data and networks.
[59] Jonathan Zittrain has said users sharing responsibility for computing safety is far preferable to locking down the
Internet.
[60]
Standards
Main article: Web standards
Many formal standards and other technical specifications and software define the operation of different aspects of the World Wide Web, the
Internet, and computer information exchange. Many of the documents are the work of the
World Wide Web Consortium (W3C), headed by Berners-Lee, but some are produced by the
Internet Engineering Task Force (IETF) and other organizations.
Usually, when web standards are discussed, the following publications are seen as foundational:
Additional publications provide definitions of other essential technologies for the World Wide Web, including, but not limited to, the following:
- Uniform Resource Identifier (URI), which is a universal system for referencing resources on the Internet, such as hypertext documents and images. URIs, often called URLs, are defined by the IETF's RFC 3986 / STD 66: Uniform Resource Identifier (URI): Generic Syntax, as well as its predecessors and numerous URI scheme-defining RFCs;
- HyperText Transfer Protocol (HTTP), especially as defined by RFC 2616: HTTP/1.1 and RFC 2617: HTTP Authentication, which specify how the browser and server authenticate each other.
Accessibility
Access to the Web is for everyone regardless of
disability, including visual, auditory, physical, speech, cognitive, or neurological disability. Accessibility features also help people with temporary disabilities, such as a broken arm, and older people whose abilities change as they age.
[61] The Web is used for receiving information as well as providing information and interacting with society, making it essential that the Web be accessible in order to provide equal access and
equal opportunity to people with disabilities.
[62] Tim Berners-Lee once noted, "The power of the Web is in its universality. Access by everyone regardless of disability is an essential aspect."
[61] Many countries regulate
web accessibility as a requirement for websites.
[63] International cooperation in the W3C
Web Accessibility Initiative led to simple guidelines that web content authors as well as software developers can use to make the Web accessible to persons who may or may not be using
assistive technology.
[61][64]
Internationalization
The W3C
Internationalization Activity ensures that web technology will work in all languages, scripts, and cultures.
[65] Beginning in 2004 or 2005,
Unicode gained ground and eventually in December 2007 surpassed both
ASCII and Western European as the Web's most frequently used
character encoding.
[66] Originally
RFC 3986 allowed resources to be identified by
URI in a subset of US-ASCII.
RFC 3987 allows more characters—any character in the
Universal Character Set—and now a resource can be identified by
IRI in any language.
[67]
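The relationship between the two identifier forms can be made concrete by mapping an IRI to an ASCII-only URI. The sketch below is a simplified approximation of that mapping, not a complete RFC 3987 implementation: the function name iri_to_uri and the example address are hypothetical, user information in the authority is ignored, and the host is converted with Python's built-in "idna" codec while other components are percent-encoded as UTF-8.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri):
    """Map an IRI with a host component to an ASCII-only URI (simplified)."""
    parts = urlsplit(iri)
    # Host names are converted to their ASCII-compatible (Punycode) form.
    host = parts.hostname.encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Other components: percent-encode the UTF-8 bytes of non-ASCII characters.
    # "%" is kept safe so that existing escapes are not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


uri = iri_to_uri("http://bücher.example/straße?q=café")
print(uri)
```

Browsers perform a conversion of this kind internally, so a user can type or follow an address in their own script while the request on the wire still uses the restricted URI character set.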
Statistics
According to a 2001 study, there were more than 550 billion documents on the Web, mostly in the invisible Web, or
deep Web.
[68] A 2002 survey of 2,024 million Web pages
[69] determined that by far the most Web content was in English: 56.4%; next were pages in German (7.7%), French (5.6%), and Japanese (4.9%). A more recent study, which used Web searches in 75 different languages to sample the Web, determined that there were over 11.5 billion Web pages in the
publicly indexable Web as of the end of January 2005.
[70] As of March 2009, the indexable web contains at least 25.21 billion pages.
[71] On July 25, 2008, Google software engineers Jesse Alpert and Nissan Hajaj announced that
Google Search had discovered one trillion unique URLs.
[72] As of May 2009, over 109.5 million websites operated.
[73] Of these 74% were commercial or other sites operating in the
.com
generic top-level domain.
[73]
Speed issues
Frustration over
congestion issues in the
Internet infrastructure and the high
latency that results in slow browsing has led to a pejorative name for the World Wide Web: the
World Wide Wait.
[74] How to speed up the Internet is an ongoing discussion that centers on the use of peering and QoS technologies. Other solutions to reduce the congestion can be found at
W3C.
[75] Standard
guidelines for ideal Web response times are:
[76]
- 0.1 second (one tenth of a second). Ideal response time. The user doesn't sense any interruption.
- 1 second. Highest acceptable response time. Download times above 1 second interrupt the user experience.
- 10 seconds. Unacceptable response time. The user experience is interrupted and the user is likely to leave the site or system.
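The three thresholds above can be turned into a simple classifier, for example to label measured page-load times in a monitoring script. The function name and the label strings are assumptions made for illustration, as is the treatment of exact boundary values; the source gives only the three reference points.

```python
def classify_response_time(seconds):
    """Label a response time using the 0.1 s / 1 s / 10 s guideline."""
    if seconds <= 0.1:
        return "ideal"          # the user senses no interruption
    if seconds <= 1.0:
        return "acceptable"     # within the highest acceptable time
    if seconds < 10.0:
        return "interrupted"    # the user experience is interrupted
    return "unacceptable"       # the user is likely to leave


for sample in (0.05, 0.8, 3.0, 12.0):
    print(sample, classify_response_time(sample))
```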
Caching
If a user revisits a Web page after only a short interval, the page data may not need to be re-obtained from the source Web server. Almost all web browsers
cache recently obtained data, usually on the local hard drive. HTTP requests sent by a browser will usually ask only for data that has changed since the last download. If the locally cached data are still current, they will be reused. Caching helps reduce the amount of Web traffic on the Internet. The decision about expiration is made independently for each downloaded file, whether image,
stylesheet,
JavaScript, HTML, or whatever other content the site may provide. Thus even on sites with highly dynamic content, many of the basic resources only need to be refreshed occasionally. Web site designers find it worthwhile to collate resources such as CSS data and JavaScript into a few site-wide files so that they can be cached efficiently. This helps reduce page download times and lowers demands on the Web server.
There are other components of the
Internet that can cache Web content. Corporate and academic
firewalls often cache Web resources requested by one user for the benefit of all. (See also
Caching proxy server.) Some
search engines also store cached content from websites. Apart from the facilities built into Web servers that can determine when files have been updated and so need to be re-sent, designers of dynamically generated Web pages can control the HTTP headers sent back to requesting users, so that transient or sensitive pages are not cached.
Internet banking and news sites frequently use this facility. Data requested with an
HTTP 'GET' is likely to be cached if other conditions are met; data obtained in response to a 'POST' is assumed to depend on the data that was POSTed and so is not cached.
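The revalidation and cache-control behavior described in this section can be sketched with a small in-memory simulation. ETag, If-None-Match and Cache-Control are standard HTTP header names, but everything else here is an assumption for illustration: the functions handle_get and is_cacheable are hypothetical, the resource is a placeholder, and real caches apply many more rules than these.

```python
import hashlib

# Placeholder resource standing in for a stylesheet on a web server.
RESOURCE = b"body { color: black; }"


def etag_for(body):
    """Derive a validator from the content (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def handle_get(request_headers, body=RESOURCE):
    """Server side: answer 304 when the client's cached copy is current."""
    tag = etag_for(body)
    if request_headers.get("If-None-Match") == tag:
        return 304, {"ETag": tag}, b""  # no body is re-sent
    return 200, {"ETag": tag, "Cache-Control": "max-age=3600"}, body


def is_cacheable(method, response_headers):
    """Cache side: store GET responses unless the server forbids it."""
    if method != "GET":
        return False  # POST responses depend on the data that was posted
    return "no-store" not in response_headers.get("Cache-Control", "")


status1, headers1, body1 = handle_get({})                                    # first visit
status2, headers2, body2 = handle_get({"If-None-Match": headers1["ETag"]})   # revisit
print(status1, len(body1), status2, len(body2))
```

On the revisit only headers travel over the network, which is the saving the section describes. A banking or news page that must not be stored would instead be sent with a Cache-Control value such as no-store.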