- Google has come under scrutiny lately for its dominance over the flow of information on the internet.
- TagTheWeb is researching a method to allow the “wisdom of the crowd” to categorize the internet more effectively.
- With or without Google, the internet looks to change significantly in the future, in ways we may not be ready for.
The internet is always changing. It reached a billion websites in 2014, and it’ll probably collect another billion by next year. On Internet Live Stats, the counters for Google searches, emails sent, and tweets tweeted climb faster than the U.S. National Debt. As of this writing, the internet traffic for today has amounted to 5 billion gigabytes of data.
This supply of data is incomprehensibly large, way too much for anyone to sift through just to find out which movie stars Bruce Campbell as a mummy-hunting Elvis Presley. Clearly, we need a curator to sift through this data and inform us that Campbell got his uh-huh on in Bubba Ho-Tep.
For many, Google is the curator of choice, and as far as overlords, er, curators go, Google’s great. Its searches are quick and responsive. It keeps vast amounts of spam and parked domains from cropping up in its results. And those doodles and Easter eggs are a blast.
But some researchers are looking at a new way to navigate the internet, one that doesn’t require Google.
How Google curates the internet for you
(Photo from Flickr)
The Google Mountain View campus. For most people, this is where the internet happens. For others, that’s a huge problem.
Search engines like Google’s build their indexes through a process called web crawling. Web crawlers explore webpages to gather data on their content, links, keywords, and the like. The crawlers then send these data back to the search engine, where an algorithm uses them to create an index of pages. When you enter search terms, the algorithm matches those terms to its index and displays results based on its internal ranking system.
That’s the basic recipe for the search engine sauce. But different engines each add their own proprietary ingredients to their algorithms, such as speed, the number of webpages crawled, how they weigh a website’s content, and what information they have on you to personalize your results. It’s like how all Italian restaurants use tomatoes for the base of their marinara sauce, yet each sauce is unique based on its combination of oregano, basil, and (heaven forbid!) mushrooms.
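To make the crawl, index, and match steps concrete, here is a minimal sketch in Python. It is not how Google or any real engine works: the pages are a hypothetical hard-coded stand-in for what a crawler would send back, the URLs are made up, and the "ranking system" is simply a count of how many query terms each page contains.

```python
import re
from collections import defaultdict

# Hypothetical "crawled" pages: URL -> text a crawler might have sent back.
PAGES = {
    "example.com/bubba": "Bruce Campbell stars as Elvis in Bubba Ho-Tep",
    "example.com/elvis": "Elvis Presley was a singer and actor",
    "example.com/sauce": "Marinara sauce starts with tomatoes",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: term -> set of URLs containing that term."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """Return URLs ordered by how many query terms they match (toy ranking)."""
    scores = defaultdict(int)
    for term in tokenize(query):
        for url in index.get(term, ()):
            scores[url] += 1
    # Highest score first; ties are broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))


index = build_index(PAGES)
results = search(index, "Bruce Campbell Elvis")
print(results)
```

The Bubba Ho-Tep page matches all three query terms and so ranks above the Elvis page, which matches only one. A real engine would replace that simple count with its own far more elaborate scoring.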
You may have noticed a potential issue here. While Google does a great job of navigating the internet for you, it is ultimately the one in charge. You see the sites it chooses for you, and you have little control over how its algorithm decides what sites meet your needs. For example, last year the European Union accused Google of violating antitrust law by rigging its search results to favor Google’s products.
This dominance over the flow of information has consequences, not only for Google’s competition but also for the information available to the user. That’s where TagTheWeb comes in.
Many hands make light categorization
TagTheWeb is an experiment designed to create a general-purpose system to categorize content on the web. It’s the brainchild of Brazilian researchers Jerry Fernandes Medeiros, Bernardo Pereira Nunes, Sean Wolfgand Matsui Siqueira and Luiz André Portes Paes Leme, who demoed their initial findings at the 2018 European Semantic Web Conference.
They based their search tool on the Wikipedia Categorization scheme, with the stated goal of “automatically categoriz[ing] any text-based content on the Web according to the collective knowledge of Wikipedia contributors.”
The process uses three steps. First, text annotation pulls structured information out of unstructured sources. Then categories are extracted by looking at the relationships shared by that information. Finally, the system generates a “fingerprint” of the main topic categories for easy retrieval and comparison of documents.
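The three steps can be sketched roughly as follows. This is an illustration under loud assumptions, not TagTheWeb’s actual code: the entity table, the category names, the parent mapping, and the two main topics are all hypothetical stand-ins for a real text annotator and Wikipedia’s category graph, and cosine similarity is just one plausible way to compare fingerprints.

```python
import math
from collections import Counter

# Hypothetical stand-ins for an entity annotator and Wikipedia's category graph.
ENTITY_CATEGORIES = {
    "elvis presley": ["American singers"],
    "bruce campbell": ["American actors"],
    "google": ["Technology companies"],
    "web crawler": ["Internet search"],
}
# Hypothetical mapping from each category to a main topic category.
PARENT = {
    "American singers": "Arts",
    "American actors": "Arts",
    "Technology companies": "Technology",
    "Internet search": "Technology",
}
MAIN_TOPICS = ["Arts", "Technology"]


def annotate(text):
    """Step 1: find known entities in unstructured text (naive substring match)."""
    lowered = text.lower()
    return [entity for entity in ENTITY_CATEGORIES if entity in lowered]


def extract_categories(entities):
    """Step 2: collect the categories attached to each entity."""
    return [cat for entity in entities for cat in ENTITY_CATEGORIES[entity]]


def fingerprint(categories):
    """Step 3: share of the document's categories under each main topic."""
    counts = Counter(PARENT[cat] for cat in categories)
    total = sum(counts.values())
    return [counts[topic] / total if total else 0.0 for topic in MAIN_TOPICS]


def cosine(a, b):
    """Compare two fingerprints; 1.0 means the same topic mix, 0.0 means no overlap."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def categorize(text):
    return fingerprint(extract_categories(annotate(text)))


movie = categorize("Bruce Campbell plays Elvis Presley in Bubba Ho-Tep")
tech = categorize("Google sends a web crawler to index pages")
mixed = categorize("Google results for Elvis Presley")
print(movie, tech, mixed)
print(cosine(movie, tech), cosine(movie, mixed))
```

In this toy setup the movie text lands entirely under "Arts" and the crawler text entirely under "Technology," so their fingerprints share nothing, while the mixed text sits halfway between the two.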
The result is a classification system driven by people and common sense, the “wisdom of the crowd,” not domain experts like Google.
TagTheWeb is still in its experimental phase, so it will be a while before it upends any online paradigms. If you want to try it out, you can find it at http://www.tagtheweb.com.br.
Brave new world wide web
(Photo from Wikimedia)
Former Google CEO Eric Schmidt foresees that the U.S. and China’s different approaches to free speech may break the internet into two.
Even if TagTheWeb doesn’t take off, plenty of other changes will come to the internet in the coming years. That’s the nature of the e-beast. Here are some of the more far-reaching forecasts about the internet’s future:
An internet adolescence. The World Economic Forum foresees a tightening of regulations on the internet. It predicts governments will put pressure on platforms to police their content more efficiently, take action to legislate more stringent digital privacy protections, and embrace broader definitions of antitrust laws to curb Silicon Valley’s monopolistic practices.
Split consensus. According to a Pew Research survey, experts are split as to whether technology can curb the internet’s penchant for misleading stories. Forty-nine percent believe that technological innovations will help lessen the spreading of lies, while 51 percent believe the situation won’t improve.
A tale of two internets. Former Google CEO Eric Schmidt believes the internet will split in two. One internet will be China-led, the other U.S.-led. Google’s Dragonfly prototype is allegedly a search engine designed to meet China’s strict censorship practices. Schmidt worries bifurcation will occur as other countries fall under China’s infrastructural influence and adopt its suppressed version.
“If you think of China as like, ‘Oh yeah, they’re good with the Internet,’ you’re missing the point,” Schmidt said. “Globalization means that they get to play too.”
Will any of these predictions come to pass? Who can say? The only thing that is certain is that the internet is always in flux, and it won’t be the same tomorrow as it is today.