OUT OF MIND

Open Source Web Crawling is About Ten to Fifteen Years Behind Google


PurpleSkyz
Admin

Date: August 31, 2019
Author: Nwo Report

Source: Brian Wang
 
In 1999, it took Google one month to crawl and build an index of about 50 million pages. In 2012, the same task was accomplished in less than one minute. The 2012 capability is roughly 50,000 times faster (a 30-day month has 43,200 minutes). That is somewhat better than doubling the speed every year for the 13 years between 1999 and 2012.
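The arithmetic behind that comparison can be checked in a few lines of Python. This is only a sketch: it assumes a 30-day month and treats "less than one minute" as exactly one minute, so the real speedup is at least this large. The variable names are made up for illustration.

```python
import math

MINUTES_PER_MONTH = 30 * 24 * 60  # assumption: a 30-day month
YEARS = 2012 - 1999               # years between the two data points

# One month of crawling in 1999 versus (at most) one minute in 2012.
speedup = MINUTES_PER_MONTH / 1

# Average yearly improvement factor needed to reach that speedup.
annual_factor = speedup ** (1 / YEARS)

# Number of doublings the speedup corresponds to.
doublings = math.log2(speedup)

print(f"speedup: {speedup:,.0f}x")
print(f"average yearly factor: {annual_factor:.2f}x")
print(f"doublings: {doublings:.1f} in {YEARS} years")
```

Under these assumptions the yearly factor comes out a little above 2, which is consistent with "slightly better than doubling every year".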
In 2016, a new open-source web crawler, BUbiNG, was announced that can achieve around 12,000 crawled pages per second on a relatively slow connection. That works out to about 1 billion pages per day. The cost is about $40 per day. There is an arXiv article from 2016 (BUbiNG: Massive Crawling for the Masses). This is about the capability that Google had ten to fifteen years ago.
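For anyone who wants to check the daily figure, here is a quick sketch in Python. It assumes the crawler sustains 12,000 pages per second around the clock, and it takes the $40 per day at face value (the post does not say what that price covers). The cost per million pages is derived here and is not a number from the article.

```python
PAGES_PER_SECOND = 12_000
SECONDS_PER_DAY = 24 * 60 * 60
COST_PER_DAY_USD = 40.0  # figure quoted in the post; its scope is not stated

# Pages crawled in one day at a constant rate.
pages_per_day = PAGES_PER_SECOND * SECONDS_PER_DAY

# Derived figure (an illustration, not from the article).
cost_per_million_pages = COST_PER_DAY_USD / (pages_per_day / 1_000_000)

print(f"pages per day: {pages_per_day:,}")
print(f"cost per million pages: ${cost_per_million_pages:.3f}")
```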
BUbiNG is available on GitHub.
On a 64-core, 64 GB workstation it can download hundreds of millions of pages at more than 10,000 pages per second, respecting politeness both by host and by IP, while analyzing, compressing and storing more than 160 MB/s of data.
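"Politeness both by host and by IP" means the crawler waits between requests to the same host name and also between requests to the same IP address, since many hosts can share one server. The sketch below shows the general idea in Python. It is not BUbiNG's code (BUbiNG is written in Java); the class name PolitenessGate, its methods and the delay values are all hypothetical.

```python
import time
from urllib.parse import urlsplit


class PolitenessGate:
    """Tracks the earliest time each host and each IP may be fetched again."""

    def __init__(self, host_delay=1.0, ip_delay=0.5, clock=time.monotonic):
        self.host_delay = host_delay  # hypothetical default, in seconds
        self.ip_delay = ip_delay      # hypothetical default, in seconds
        self.clock = clock            # injectable so it can be tested
        self.next_ok = {}             # ("host", name) or ("ip", addr) -> time

    def wait_time(self, url, ip):
        """Seconds to wait before `url` (resolved to `ip`) may be fetched."""
        host = urlsplit(url).hostname
        now = self.clock()
        # The request must satisfy both the host and the IP cooldown.
        ready = max(self.next_ok.get(("host", host), now),
                    self.next_ok.get(("ip", ip), now))
        return max(0.0, ready - now)

    def record_fetch(self, url, ip):
        """Call right after a fetch to start both cooldowns."""
        host = urlsplit(url).hostname
        now = self.clock()
        self.next_ok[("host", host)] = now + self.host_delay
        self.next_ok[("ip", ip)] = now + self.ip_delay
```

A crawler loop would call wait_time() before each download and record_fetch() after it. A real crawler at this scale would keep per-host queues instead of a single dictionary, but the two-level check is the same.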
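A 10-terabyte hard drive costs about $200. At the 160 MB/s storage rate quoted above, that is about 576 GB per hour, so one such drive would hold roughly 17 hours of crawling rather than one hour.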
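As a rough check on the storage figure, the sketch below works out how long a 10 TB drive lasts at the 160 MB/s quoted for the workstation. It assumes decimal units (1 TB = 1,000,000 MB) and that 160 MB/s is what actually gets written to disk. Under those assumptions the drive fills in roughly 17 hours.

```python
DRIVE_TB = 10
WRITE_RATE_MB_S = 160  # storage rate quoted for the 64-core workstation

# Decimal units, as drive vendors use: 1 TB = 1,000,000 MB.
drive_mb = DRIVE_TB * 1_000_000

# Data written per hour, in GB.
gb_per_hour = WRITE_RATE_MB_S * 3600 / 1000

# Hours until the drive is full at a constant write rate.
hours_to_fill = drive_mb / WRITE_RATE_MB_S / 3600

print(f"data written per hour: {gb_per_hour:.0f} GB")
print(f"hours to fill the drive: {hours_to_fill:.1f}")
```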

Thanks to: https://nworeport.me



  
