Doug Laney, writing for Information Management, argues that web content is an overlooked external data source. He gives examples of how companies are leveraging scraped web data and offers tips for companies interested in doing the same. Below is an excerpt from the article.
Web Content: The Overlooked External Data Source
Amid the gold rush to mine external data from social media posts, open data sources, and syndicated data providers for improved diagnostic, predictive, and prescriptive analytics, many organizations are turning to yet another source: harvested web content.
No doubt there is untapped value in the billions of social media posts published each day, in the 10 million or so open datasets published by government organizations and others, and in the hundreds of thousands of datasets aggregated and sold by thousands of data product companies. However, when it comes to understanding what’s happening in your industry, or any industry, one of the best and most up-to-date sources of information can be other organizations’ websites.
Harvesting (or “scraping”) content from websites can yield invaluable insights into the activities of competitors, partners, and suppliers, such as product pricing, product specifications and descriptions, job openings, newly introduced offerings, and so forth.