I first encountered Hadoop in the fall of 2008, when I was working on an internet
crawl-and-analysis project at Verisign. We were making discoveries similar to those that
Doug Cutting and others at Nutch had made several years earlier about how to efficiently
store and manage terabytes of crawled and analyzed data. At the time, we were
getting by with our homegrown distributed system, but it couldn't handle the influx of a new
data stream, or the requirement to join that stream with our crawl data, within the
required timeline.