
Internet Archive: recording the forgotten Internet

via: 驱动之家     time: 2019/11/16 14:00:57     reads: 2398

Martin Luther King Jr.'s Wikipedia entry has more than 300 footnotes, including 66 book citations.

This is why people trust Wikipedia. Almost every statement in every entry can be traced, and readers can check the accuracy of the text against its references.

But even an Internet encyclopedia such as Wikipedia keeps only limited records. A New Yorker article entitled "Can the Internet Be Archived?" once wrote, "the Internet always lives in the present. It is illusory, short-lived, unstable and unreliable." Sometimes the page you want to visit returns a 404. Sometimes the page you want to consult has been overwritten by newer content.

So is there any way to recover web pages that now return 404 or have since been modified?

Backing up the Internet

Someone tried to back up the whole Internet.

In 1996, fearing that information on the Web could not be preserved forever the way printed books are, Brewster Kahle founded Internet Archive.

Many people describe Internet Archive as the greatest search site. The Wayback Machine, a search tool developed by Kahle, regularly crawls websites around the world and saves what it finds. It also prioritizes its work: the number and frequency of captures differ from site to site.
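As a rough illustration of how a reader might look for a saved copy of a page, the sketch below builds a query for the Wayback Machine availability endpoint (https://archive.org/wayback/available) and reads the closest snapshot out of a response. It makes no network call: SAMPLE_RESPONSE is a hand-written example in the general shape that endpoint returns, not real archive data, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode

# Public availability endpoint of the Wayback Machine.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_query(page_url, timestamp=None):
    """Return the query URL asking for the snapshot closest to `timestamp`.

    `timestamp` is a string such as "20140717" (YYYYMMDD); it is optional.
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Extract the closest snapshot URL from a JSON response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"]


# Hand-written sample in the shape of a response (illustrative only).
SAMPLE_RESPONSE = json.dumps({
    "url": "example.com",
    "archived_snapshots": {
        "closest": {
            "available": True,
            "status": "200",
            "timestamp": "20140717000000",
            "url": "http://web.archive.org/web/20140717000000/http://example.com/",
        }
    },
})

# Sample for a page with no saved copy (illustrative only).
EMPTY_RESPONSE = json.dumps({"url": "example.com", "archived_snapshots": {}})

print(build_availability_query("example.com/page", "20140717"))
print(closest_snapshot(SAMPLE_RESPONSE))
print(closest_snapshot(EMPTY_RESPONSE))
```

To query the live service, the URL from build_availability_query could be fetched with any HTTP client and the body passed to closest_snapshot.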

To date, Internet Archive has saved 330 billion web pages and page snapshots. What makes it remarkable is that, in addition, this huge archive holds 20 million books and texts, 8.5 million audio and video recordings, 3 million images and 200,000 software programs.

All in all, Internet Archive aims to make information easier to obtain and more accurate. Recently, Internet Archive and Wikipedia worked together to make Wikipedia more reliable: Internet Archive has linked 130,000 book references in Wikipedia footnotes to 50,000 Internet Archive books (in English, Greek and Arabic) that have been digitally scanned and made public. Visitors can click the page number in a footnote to view a two-page preview of the passage being cited.


Visitors can click the page number in a footnote to view a two-page preview of the passage being cited | Internet Archive

Online library

The New Yorker wrote: "The footnote is a milestone in the history of human civilization. It took centuries to invent and spread, and only a few years to destroy." In the past, footnotes in books and papers told you exactly where a piece of information came from and where to find more. On the Internet, you can still click a footnote link for more information, but you never know on which day that link will fail.

In October 2016, Wikipedia and Internet Archive announced a partnership to tackle broken links. The Internet Archive bot, developed by Mark Graham, director of the Wayback Machine, automatically scans Wikipedia footnotes for broken links and points them to the pages saved by the Wayback Machine. "We've edited 14 million links, more than 11 million of them to Internet Archive," Graham said.

Linking books works in a similar way but is more challenging. Graham explained that not all books have ISBNs, and not all footnotes use the correct citation format or give page numbers.

Internet Archive calls itself an online library. Many offline libraries also lend books to users after digitizing them. When a reference book interests you, you can ask Internet Archive to lend you the electronic version.

Internet Archive began digitizing books in 2005 and now holds 3.8 million "collections". It currently has 22 sites around the world, where 100 employees scan about 1,000 books a day, yet millions of books are still waiting in line.

In the digital age, people are drifting further and further from books. "We want to start with Wikipedia and connect readers with books by weaving them into the Internet," Kahle said.
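The link-repair idea described above can be sketched in a few lines. This is a minimal illustration, not the bot's actual code: repair_links, is_dead and find_timestamp are hypothetical names, the dead-link check and the snapshot lookup are stubbed with in-memory data, and the only real convention used is the Wayback Machine URL form https://web.archive.org/web/<timestamp>/<original URL>.

```python
from typing import Callable, List, Optional, Tuple


def wayback_url(timestamp: str, original_url: str) -> str:
    """Build a Wayback Machine snapshot URL for the given capture time."""
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


def repair_links(
    links: List[str],
    is_dead: Callable[[str], bool],
    find_timestamp: Callable[[str], Optional[str]],
) -> Tuple[List[str], int]:
    """Replace dead links with archived copies where a snapshot exists.

    Returns the new list of links and the number of links rewritten.
    Dead links without a snapshot are left unchanged.
    """
    repaired = []
    fixed = 0
    for link in links:
        timestamp = find_timestamp(link) if is_dead(link) else None
        if timestamp is not None:
            repaired.append(wayback_url(timestamp, link))
            fixed += 1
        else:
            repaired.append(link)
    return repaired, fixed


# Toy data standing in for real HTTP checks and archive lookups.
footnote_links = [
    "http://example.com/alive",
    "http://example.com/gone",
    "http://example.com/lost",
]
dead_links = {"http://example.com/gone", "http://example.com/lost"}
snapshots = {"http://example.com/gone": "20140717000000"}

new_links, rewritten = repair_links(
    footnote_links,
    is_dead=lambda url: url in dead_links,
    find_timestamp=snapshots.get,
)
print(rewritten, new_links)
```

Passing the two checks in as functions keeps the rewriting step separate from the network, so the same loop could be driven by real HTTP status checks and a real archive lookup.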

Internet archives

The youthful memories of people born in the 1980s and 1990s may one day vanish with the closure of Tianya and Douban, and even Facebook has been around for only a decade or so. The Internet speeds up the spread and turnover of information, and people forget faster as a result. In Internet Archive, though, nostalgic visitors can still see the Tianya community of its day, once a "machine for manufacturing" hot topics, along with snapshots of the Sina Weibo homepage that now look somewhat "non-mainstream".



Snapshots of Tianya and Sina Weibo saved by Internet Archive | Internet Archive

As The New Yorker commented, it is almost certain that if something is not captured by the Wayback Machine, it is as if it never existed.

On July 17, 2014, a Malaysian Boeing 777 crashed in Ukraine less than three hours after takeoff. "We just shot down a plane, an An-26," Strelkov, a commander of the separatist forces in Ukraine, wrote on the Russian social network VKontakte. The post contained a link to a video of the wreckage, which looked like a Boeing 777, and was later deleted. The next day, the post was found in the Wayback Machine. Internet Archive wrote on Facebook, "That's what we exist for."

As the Financial Times commented, in an era when false information and extremist content are rapidly created and spread, and when social media posts are constantly revised and updated, being able to record who said what and when, in a form that cannot be altered, matters more than ever. Studying the historical record of different periods through Internet Archive has become more valuable. For example, after Trump was elected, Internet Archive collected more than 6,000 videos of him, including ones from before he took office, to help people identify and verify false information.

However, building a global Internet archive is not easy, partly because the relevant laws are not harmonized across countries: legal deposit, copyright, privacy and so on. At the beginning of the year, the Society of Authors said that Internet Archive's practice was suspected of infringing copyright.

Soon after, a document released by the National Writers Union and co-signed by 36 other organizations, including the Society of Authors, condemned the scanning and distribution of e-books by Internet Archive and its partner libraries. Although Internet Archive explained that it had signed the CDL (controlled digital lending) agreement,

The law cannot keep pace with technology. Like many who dare to be pioneers, Internet Archive finds itself caught between resource sharing and the supremacy of copyright.


Brewster Kahle, founder of Internet Archive | Wikipedia

"In the ancient days of the Internet, people not only used the Internet but also took part in building it. For example, they went to Wikipedia to edit entries and manage content. In the Chinese Internet world, people went to douban.com to add entries for movies, books and music albums, making it easy for other netizens to mark, collect and comment on them," the online writer He Caitou once wrote.

This may resemble the Internet world that Internet Archive wants to build. In Graham's words, Internet Archive wants to make all knowledge universally accessible. Although Internet Archive is based in San Francisco, it has very little in common with today's Silicon Valley, Kahle said. He hopes that the "legacy" of all technology will not end up in the hands of a few people. "I like the feeling that many people can win."

