文/斯蒂芬妮·帕帕斯 譯/周臻
By Stephanie Pappas
互聯網是一個繁忙之所。據國際實時統計項目的網站"互聯網實時統計"顯示,每秒大約有6000條推文發布,4萬多次谷歌搜索,200多萬封電子郵件發送。
[2]但這些統計數據只暗示了網絡大小。截至2014年9月,互聯網上有10億個網站,其數量隨著每分鐘有網站消失和誕生而波動。互聯網不停變化,某種程度上卻可量化——就在這人人熟知的互聯網之下,是谷歌和其他搜索引擎都未索引的"深網"。深網的內容可以與在線數據庫的搜索結果一樣無害,也可以像那些使用特殊Tor軟件才能訪問的黑市論壇一樣神秘。(Tor并非只用于非法活動;只要人們出于某種原因需要在網上匿名,就會用到它。)
[3]結合"表層"網絡的不斷變化和深網的無法量化,很容易看出為什么估算互聯網的大小是一項艱巨的任務。不過分析師們認為,網絡規模龐大且越來越大。
[4]除了大約10億個網站,網絡還是數量多得多的單個網頁的家園。www.worldwidewebsize.com就是其中一個網頁,它旨在利用互聯網顧問莫里斯·德昆德爾的研究來量化網頁總數。德昆德爾和他的同事們于2016年2月在《科學計量學》雜志上發表了他們的研究方法。為了得出估算結果,研究人員把50個常用詞成批提交給谷歌和必應進行搜索。(雅虎搜索和Ask.com曾經被納入,但因為它們不再顯示結果總數而被排除。)研究人員知道這些詞在普通印刷品中的出現頻率,便能根據包含這些詞的頁面數量推算出網上頁面的總數。各搜索引擎索引的頁面互有重疊,因此該方法還需要估計并去除可能的重疊部分。
[5]根據這些計算,截至2016年3月中,至少有46.6億個網頁在線。但是,該計算僅涵蓋可搜索的網絡,不包括深網。
[6]那么互聯網存有多少信息呢?據加州大學戴維斯分校通信系教授馬丁·希爾伯特說,有三種方法來審視這個問題。
[7]"互聯網存儲信息,互聯網傳播信息,互聯網計算信息。"希爾伯特如是說。他表示,互聯網的通信能力,可以通過在任何給定時間內能夠傳輸多少信息或實際傳輸多少信息來衡量。
[8] 2014年,研究人員在《超級計算前沿和創新》雜志上發表了一份研究報告,估算互聯網的存儲容量為10的24次方字節,即100萬埃字節。1個字節是一個包含8個比特的數據單元,相當于您正讀到的1個單詞中的單個字符。1個埃字節是100億億字節。
[9]估算互聯網通信能力的一種方法是測量互聯網的流量。根據思科可視網絡指數計劃,互聯網正處于"澤字節時代"。1個澤字節等于十萬億億個字節或1000埃字節。根據思科推斷,截至2016年底,全球互聯網流量將達到每年1.1澤字節,到2019年,全球流量預計將達到每年2澤字節。
[10]思科思維領袖總監小托馬斯·巴內特在2011年的一篇博客中寫到了公司的發現,1個澤字節相當于長達3.6萬年的高清視頻,也相當于播放Netflix的整個目錄3177次。
[11] 2011年,希爾伯特和他的同事在《科學》雜志上發表了一篇論文,以帶寬衡量,估算出互聯網的通信能力為每秒3×10¹²千比特。這是基于硬件的能力,而不是任何時刻實際傳輸的信息量。
[12]在一項特別不尋常的研究中,一個匿名黑客通過統計有多少個IP(互聯網協議)地址在使用來測量互聯網的大小。IP地址是數據在互聯網上通行所經的路標,每個在線設備至少有一個IP地址。據該黑客估計,2012年在線使用的IP地址有13億個。
[13]互聯網大大改變了數據格局。希爾伯特及同事發現,2000年,在互聯網應用無所不在之前,電信容量為2.2個完美壓縮的埃字節。2007年,這個數字為65。這個容量包括電話網絡和語音呼叫,以及龐大的互聯網信息庫接入。然而研究者們發現,2007年移動網絡上的數據流量已經超過了語音流量。
[14]如果感覺所有這些位元和字節有點抽象,別擔心:2015年,研究人員嘗試了用物理術語來表達互聯網的大小。他們在《跨學科科學課題》雜志上發文稱,據估計,需要用2%的亞馬孫熱帶雨林制造的紙張來打印出整個網絡(包括暗網)。對于這項研究,他們做出了一些關于網上文本的大膽假設:一張普通網頁估計需要30頁A4紙(8.27×11.69英寸)?;谶@個假設,打印互聯網上的文本將需要1360億頁之多。(后來,《華盛頓郵報》的一名記者想要提升估算的準確率,他認為一張網頁的平均長度更接近6.5頁,因而估算出需要3055億頁來打印整個互聯網。)
[15]當然,用文本形式打印出來的互聯網不會包含大量在線的非文本數據。根據思科的調查結果,2015年,視頻的IP傳輸量為每月8000拍字節,而網頁、電郵和數據傳輸每月則為約3000拍字節。(拍字節是100萬吉字節或2的50次方字節。)總的來說,該公司估計,視頻占當年大部分互聯網流量,達到3.4萬拍字節。文件共享排在第二,達1.4萬拍字節。
[16]希爾伯特及同事采取了自己的方式,將全世界的信息可視化。在發表于2011年《科學》雜志的文章里,他們計算出,全世界模擬和數字存儲的信息容量為295個完美壓縮埃字節。研究人員寫道:若用光盤存儲295埃字節,需要的光盤將摞到月球(238900英里,即384400公里),接著再壘起地球到月球的四分之一距離。總距離為298625英里(480590公里)。到2007年,94%的信息是數字化的,意味著如果存儲在光盤上,僅世界上的數字信息就會沖過月球,延伸280707.5英里(451755公里)。
[17]希爾伯特說,互聯網的規模在不斷變化,而其增長呈跳躍式。這些洶涌而來的信息,只有一個可取之處:比起存儲的數據量,我們的計算能力增長更快。
[18]希爾伯特說,全世界存儲容量每三年翻一番,但全世界計算能力每一年半翻一番。2011年,人類可以用其所有計算機每秒執行6.4×10¹⁸條指令——相當于人腦每秒的神經脈沖數。5年后,計算機的能力將大致達到8個人類大腦的水平。當然,這并不意味著一個房間里的8個人就可以超越全世界的電腦。從許多方面講,人工智能已經勝過人類的認知能力(盡管人工智能還遠未能模擬普通的類人智力)。在線上,人工智能決定了你能看到的臉書帖子、谷歌搜索內容,甚至80%的股票交易。希爾伯特說,只有計算能力的擴展,才能讓線上爆炸式增長的數據變得有用。
[19]他說:"我們正從信息時代進入知識時代。"□
The Internet is a busy place. Every second, approximately 6,000 tweets are tweeted; more than 40,000 Google queries are searched; and more than 2 million emails are sent, according to Internet Live Stats(注1:该信息来源于http://www.internetlivestats.com/one-second/。本文写于2016年,数据已不准确。), a website of the international Real Time Statistics Project.
[2] But these statistics only hint at the size of the Web. As of September 2014, there were 1 billion websites on the Internet, a number that fluctuates by the minute as sites go defunct and others are born. And beneath this constantly changing (but sort of quantifiable) Internet that's familiar to most people lies the "Deep Web"(注2:這里參照了淺海和深海的概念。), which includes things Google and other search engines don't index. Deep Web content can be as innocuous as the results of a search of an online database or as secretive as black-market forums accessible only to those with special Tor(注3:Tor是The Onion Router的縮寫,是第二代洋蔥路由(onion routing)的一種實現,用戶通過Tor可以防範流量過濾、嗅探分析,在互聯網上實現匿名交流。) software. (Though Tor isn't only for illegal activity, it's used wherever people might have reason to go anonymous online.)
[3] Combine the constant change in the "surface" Web with the unquantifiability of the Deep Web, and it's easy to see why estimating the size of the Internet is a difficult task. However, analysts say the Web is big and getting bigger.
[4] With about 1 billion websites, the Web is home to many more individual Web pages. One of these pages, www.worldwidewebsize.com, seeks to quantify the number using research by Internet consultant Maurice de Kunder. De Kunder and his colleagues published their methodology in February 2016 in the journal Scientometrics(注4:由Springer發行的學術期刊,關注科學和科學研究中的量化方法和特征研究。). To come to an estimate, the researchers sent a batch of 50 common words to be searched by Google and Bing. (Yahoo Search and Ask.com used to be included but are not anymore because they no longer show the total results.) The researchers knew how frequently these words have appeared in print in general, allowing them to extrapolate the total number of pages out there based on how many contain the reference words. Search engines overlap in the pages they index, so the method also requires estimating and subtracting the likely overlap.
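To make the extrapolation concrete, here is a minimal sketch of the idea in Python. The word frequencies, hit counts and 70-percent overlap below are invented placeholder values, and the overlap handling is simplified relative to de Kunder's published method; the sketch only illustrates the arithmetic of frequency-based extrapolation.

# Sketch of frequency-based index-size estimation (illustrative values only).
# If a word appears on a known fraction f of printed pages, and a search
# engine reports h pages containing it, that engine's index holds roughly h / f pages.

word_frequency = {"the": 0.60, "of": 0.45, "and": 0.40}         # hypothetical print-corpus fractions
hits_engine_a = {"the": 2.6e10, "of": 2.0e10, "and": 1.8e10}    # hypothetical hit counts
hits_engine_b = {"the": 1.4e10, "of": 1.1e10, "and": 0.9e10}

def estimate_index_size(hits, freqs):
    """Average the per-word estimates hits/frequency over all words."""
    estimates = [hits[w] / freqs[w] for w in freqs]
    return sum(estimates) / len(estimates)

size_a = estimate_index_size(hits_engine_a, word_frequency)
size_b = estimate_index_size(hits_engine_b, word_frequency)

# The two indexes overlap; assume (hypothetically) that 70% of the smaller
# index also appears in the larger one, and remove that overlap.
overlap = 0.70 * min(size_a, size_b)
total_indexed_web = size_a + size_b - overlap

print(f"Engine A index: {size_a:.2e} pages")
print(f"Engine B index: {size_b:.2e} pages")
print(f"Estimated indexed Web: {total_indexed_web:.2e} pages")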
[5] According to these calculations, there were at least 4.66 billion Web pages online as of mid-March 2016. This calculation covers only the searchable Web, however, not the Deep Web.
[6] So how much information does the Internet hold? There are three ways to look at that question, said Martin Hilbert, a professor of communications at the University of California, Davis.
[7] "The Internet stores information, the Internet communicates information and the Internet computes information," Hilbert said. The communication capacity of the Internet can be measured by how much information it can transfer, or how much information it does transfer at any given time, he said.
[8] In 2014, researchers published a study in the journal Supercomputing Frontiers and Innovations estimating the storage capacity of the Internet at 10²⁴ bytes, or 1 million exabytes. A byte is a data unit comprising 8 bits, and is equal to a single character in one of the words you're reading now. An exabyte is 1 billion billion bytes.(注5:有必要列表一下各種字節單位的換算:1B(byte 字節)=8bit(比特),1KB(Kilobyte 千字節)=1024B,1MB(Megabyte 兆字節,簡稱"兆")=1024KB,1GB(Gigabyte 吉字節,又稱"千兆")=1024MB,1TB(Terabyte 萬億字節,太字節)=1024GB,1PB(Petabyte 千萬億字節,拍字節)=1024TB,1EB(Exabyte 百億億字節,埃字節)=1024PB,1ZB(Zettabyte 十萬億億字節,澤字節)=1024EB,1YB(Yottabyte 一億億億字節,堯字節)=1024ZB。)
[9] One way to estimate the communication capacity of the Internet is to measure the traffic moving through it. According to Cisco's Visual Networking Index initiative, the Internet is now in the "zettabyte era." A zettabyte equals 1 sextillion bytes(注6:根據國際單位制,一個sextillion相當于10的21次方。), or 1,000 exabytes. By the end of 2016, global Internet traffic will reach 1.1 zettabytes per year, according to Cisco, and by 2019, global traffic is expected to hit 2 zettabytes per year.
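As a quick sense of scale for these units, the short Python sketch below converts Cisco's projected 1.1 zettabytes per year into an average transfer rate. The conversion uses the decimal (SI) definitions quoted in the article; the resulting bytes-per-second figure is our own derived illustration, not a number reported by Cisco.

# Decimal (SI) byte units as used in the article's figures.
EXABYTE = 10**18
ZETTABYTE = 10**21                      # 1 ZB = 1,000 EB

SECONDS_PER_YEAR = 365 * 24 * 3600

annual_traffic_bytes = 1.1 * ZETTABYTE  # Cisco's projection for 2016
average_rate = annual_traffic_bytes / SECONDS_PER_YEAR

print(f"1.1 ZB/year = {annual_traffic_bytes / EXABYTE:.0f} EB/year")   # 1100 EB/year
print(f"Average rate = {average_rate / 10**12:.0f} TB/s")              # roughly 35 TB/s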
[10] One zettabyte is the equivalent of 36,000 years of high-definition video, which, in turn, is the equivalent of streaming Netflix's entire catalog 3,177 times, Thomas Barnett Jr., Cisco's director of thought leadership, wrote in a 2011 blog post about the company's findings.(注7:Netflix是全球最大的在線電視電影節目付費收看服務。)
[11] In 2011, Hilbert and his colleagues published a paper in the journal Science estimating the communication capacity of the Internet at 3 × 10¹² kilobits per second, a measure of bandwidth. This was based on hardware capacity, and not on how much information was actually being transferred at any moment.
[12] In one particularly offbeat study, an anonymous hacker measured the size of the Internet by counting how many IPs (Internet Protocols) were in use. IPs are the wayposts of the Internet through which data travels, and each device online has at least one IP address. According to the hacker's estimate, there were 1.3 billion IP addresses used online in 2012.
[13] The Internet has vastly altered the data landscape. In 2000, before Internet use became ubiquitous, telecommunications capacity was 2.2 optimally compressed exabytes, Hilbert and his colleagues found. In 2007, the number was 65. This capacity includes phone networks and voice calls as well as access to the enormous information reservoir that is the Internet. However, data traffic over mobile networks was already outpacing voice traffic in 2007, the researchers found.
[14] If all of these bits and bytes feel a little abstract, don't worry: In 2015, researchers tried to put the Internet's size in physical terms. The researchers estimated that it would take 2 percent of the Amazon rainforest to make the paper to print out the entire Web (including the Dark Web), they reported in the Journal of Interdisciplinary Science Topics. For that study, they made some big assumptions about the amount of text online by estimating that an average Web page would require 30 pages of A4 paper (8.27 by 11.69 inches). With this assumption, the text on the Internet would require 1.36 × 10¹¹ pages to print a hard copy. (A Washington Post reporter later aimed for a better estimate and determined that the average length of a Web page was closer to 6.5 printed pages, yielding an estimate of 305.5 billion pages to print the whole Internet.)
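The 1.36 × 10¹¹ figure lines up with the page counts given earlier in the article; the following Python fragment is a rough reconstruction under the assumption that the study simply multiplied the number of indexed Web pages by 30 sheets each.

# Rough reconstruction of the "print the Web" arithmetic (assumed inputs).
web_pages = 4.66e9        # indexed Web pages, mid-March 2016 estimate
sheets_per_page = 30      # A4 sheets per Web page, the study's assumption

total_sheets = web_pages * sheets_per_page
print(f"about {total_sheets:.2e} printed pages")   # ~1.4e11, same order as 1.36 x 10^11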
[15] Of course, printing out the Internet in text form wouldn't include the massive amount of nontext data hosted online. According to Cisco's research, 8,000 petabytes per month of IP traffic was dedicated to video in 2015, compared with about 3,000 petabytes per month for Web, email and data transfer. (A petabyte is a million gigabytes or 2⁵⁰ bytes.) All told, the company estimated that video accounted for most Internet traffic that year, at 34,000 petabytes. File sharing came in second, at 14,000 petabytes.
[16] Hilbert and his colleagues took their own stab at visualizing the world's information. In their 2011 Science paper, they calculated that the information capacity of the world's analog and digital storage was 295 optimally compressed exabytes. To store 295 exabytes on CD-ROMs would require a stack of discs reaching to the moon (238,900 miles, or 384,400 kilometers), and then a quarter of the distance from the Earth to the moon again, the researchers wrote. That's a total distance of 298,625 miles (480,590 km). By 2007, 94 percent of information was digital, meaning that the world's digital information alone would overshoot the moon if stored on CD-ROM. It would stretch 280,707.5 miles (451,755 km).
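For readers who want to see roughly where such a distance comes from, here is a back-of-the-envelope check in Python. The 700 MB capacity and 1.2 mm thickness per disc are our own assumed round values, not figures from the Science paper, so the result only approximates the 480,590 km quoted above.

# Back-of-the-envelope check of the CD-ROM stack height (assumed disc specs).
EXABYTE = 10**18
total_bytes = 295 * EXABYTE

disc_capacity_bytes = 700 * 10**6   # assumed 700 MB per CD-ROM
disc_thickness_m = 1.2e-3           # assumed 1.2 mm per disc

discs_needed = total_bytes / disc_capacity_bytes
stack_height_km = discs_needed * disc_thickness_m / 1000

print(f"Discs needed: {discs_needed:.2e}")          # ~4.2e11 discs
print(f"Stack height: {stack_height_km:,.0f} km")   # ~500,000 km, same order as 480,590 km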
[17] The Internet's size is a moving target, Hilbert said, but it's growing by leaps and bounds. There's just one saving grace when it comes to this deluge of information: Our computing capacity is growing even faster than the amount of data we store.
[18] While world storage capacity doubles every three years, world computing capacity doubles every year and a half, Hilbert said. In 2011, humanity could carry out 6.4 × 10¹⁸ instructions per second with all of its computers—similar to the number of nerve impulses per second in the human brain. Five years later, computational power is up in the ballpark of about eight human brains. That doesn't mean, of course, that eight people in a room could outthink the world's computers. In many ways, artificial intelligence already outperforms human cognitive capacity (though A.I. is still far from mimicking general, humanlike intelligence). Online, artificial intelligence determines which Facebook posts you see, what comes up in a Google search and even 80 percent of stock market transactions. The expansion of computing power is the only thing making the explosion of data online useful, Hilbert said.
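Hilbert's storage-versus-computing comparison follows directly from the two doubling periods he cites; the small Python sketch below applies them over an illustrative six-year window of our choosing.

# Growth implied by the doubling times Hilbert cites:
# storage doubles every 3 years, computing every 1.5 years.
def growth_factor(years, doubling_period_years):
    """How many times larger a quantity becomes after the given number of years."""
    return 2 ** (years / doubling_period_years)

years = 6  # illustrative window chosen for this example
storage_growth = growth_factor(years, 3.0)   # 2**2 = 4x
compute_growth = growth_factor(years, 1.5)   # 2**4 = 16x

print(f"After {years} years: storage x{storage_growth:.0f}, computing x{compute_growth:.0f}")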
[19] "We're going from an information age to a knowledge age," he said. ■