
Open Source Web Crawling is About Ten to Fifteen Years Behind Google


PurpleSkyz

Date: August 31, 2019
Author: Nwo Report

Source: Brian Wang
 
In 1999, it took Google one month to crawl and build an index of about 50 million pages. In 2012, the same task was accomplished in less than one minute. That makes the 2012 capability about 50,000 times faster, which is slightly better than doubling the speed every year over the 13 years in between.
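As a rough check on those figures, here is a minimal Python sketch of the arithmetic. It assumes a 30-day month and treats "less than one minute" as exactly one minute, so the speedup it computes is a lower bound; the function names are only illustrative.

```python
# Rough check of the 1999 vs. 2012 comparison.
# Assumptions: a 30-day month, and "less than one minute" taken as 1 minute.
MINUTES_PER_MONTH = 30 * 24 * 60


def speedup(old_minutes: float, new_minutes: float) -> float:
    """How many times faster the newer run is."""
    return old_minutes / new_minutes


def annual_growth(total_speedup: float, years: float) -> float:
    """Constant yearly multiplier that compounds to total_speedup."""
    return total_speedup ** (1.0 / years)


factor = speedup(MINUTES_PER_MONTH, 1.0)  # 43200.0
print(factor)

# Yearly multiplier over 1999 to 2012; comes out a bit above 2.
print(annual_growth(factor, 2012 - 1999))

# Same calculation with the article's rounder figures (50,000x, 14 years).
print(annual_growth(50_000, 14))
```

Either way the yearly multiplier lands a little above 2, which matches the "slightly better than doubling" description.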
In 2016, a new open-source web crawler called BUbiNG was announced that can crawl around 12,000 pages per second on a relatively slow connection. That could be about 1 billion pages per day. The cost is about $40 per day. There is an arXiv article from 2016 (BUbiNG: Massive Crawling for the Masses). This is about the capability that Google had ten to fifteen years ago.
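The "1 billion pages per day" figure follows directly from the per-second rate. A small Python sketch of that conversion, plus the implied cost per million pages, is below. It assumes the crawler sustains 12,000 pages per second around the clock and that the $40 per day covers everything, neither of which the article spells out.

```python
# Convert the quoted crawl rate to a daily total and a per-page cost.
# Assumptions: the rate is sustained 24 hours a day and $40/day is the full cost.
SECONDS_PER_DAY = 24 * 60 * 60


def pages_per_day(pages_per_second: int) -> int:
    return pages_per_second * SECONDS_PER_DAY


def cost_per_million_pages(daily_cost_usd: float, daily_pages: int) -> float:
    return daily_cost_usd / (daily_pages / 1_000_000)


daily = pages_per_day(12_000)  # 1_036_800_000
print(daily)

# Works out to a few cents per million pages under these assumptions.
print(cost_per_million_pages(40.0, daily))
```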
BUbiNG is available on GitHub.
On a 64-core, 64 GB workstation, it can download hundreds of millions of pages at more than 10,000 pages per second while respecting politeness both by host and by IP, and it analyzes, compresses, and stores more than 160 MB/s of data.
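For anyone unfamiliar with the term, "politeness by host and by IP" generally means a crawler waits a minimum interval between requests to the same hostname and also between requests to the same IP address, since many hosts can share one server. The Python sketch below illustrates that idea only. It is not BUbiNG's code, and the class name, method name, and delay values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PolitenessGate:
    """Toy gate: allow a fetch only if both the host and its IP have rested."""
    host_delay: float = 2.0  # hypothetical seconds between hits on one host
    ip_delay: float = 0.5    # hypothetical seconds between hits on one IP
    _next_host: dict = field(default_factory=dict, init=False)
    _next_ip: dict = field(default_factory=dict, init=False)

    def try_acquire(self, host: str, ip: str, now: float) -> bool:
        # Refuse if either the host or the IP was contacted too recently.
        if now < self._next_host.get(host, 0.0):
            return False
        if now < self._next_ip.get(ip, 0.0):
            return False
        # Record the earliest time each may be contacted again.
        self._next_host[host] = now + self.host_delay
        self._next_ip[ip] = now + self.ip_delay
        return True


gate = PolitenessGate()
print(gate.try_acquire("a.example", "192.0.2.1", now=0.0))  # first hit is allowed
print(gate.try_acquire("b.example", "192.0.2.1", now=0.2))  # same IP, too soon
```

A crawler running at thousands of pages per second has to spread its requests over a very large number of hosts and IPs to stay within limits like these.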
A 10-terabyte hard drive costs about $200. At the quoted storage rate of 160 MB/s, a drive that size would fill in roughly 17 hours of crawling.
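How long a 10 TB drive lasts depends on the write rate. Taking the 160 MB/s storage figure quoted for BUbiNG at face value, and assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), a quick Python check looks like this:

```python
# How long a drive lasts at a constant write rate.
# Assumptions: decimal units and a steady 160 MB/s, as quoted for BUbiNG.
TB = 10**12
MB = 10**6


def hours_until_full(drive_bytes: float, write_bytes_per_second: float) -> float:
    return drive_bytes / write_bytes_per_second / 3600


hours = hours_until_full(10 * TB, 160 * MB)
print(round(hours, 1))  # about 17.4 hours
```

On those numbers, one drive holds well under a day of crawl output, so storage would cost more per day than the $40 quoted for running the crawler.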

Thanks to: https://nworeport.me



  
