Harvest Data from Websites…But How Ethical is Web Scraping?
Posted by Ritesh Sanghani | Posted on: January 27th, 2015
We have entered an age in which relevant, comprehensive and accurate information plays a pivotal role. In fact, it would not be an exaggeration to say that around 80% of business processes across the world today rely, in some way, on effective online data harvesting and scraping.
This dependence on data has turned web scraping and data harvesting into a full-fledged industry with an annual turnover of millions of dollars. As a matter of fact, it has become a favorite technique for many market researchers to understand the ins and outs of the volatile marketplace.
However, a recent verdict from the highest court of the European Union brought grim news for web scrapers. The European Union’s top judicial body ruled last week that Ryanair has every right to block, or impose conditions on, price comparison websites that mine data from the airline’s online database without prior permission.
Yes, for all those who harvest data from websites to get insights into current market scenarios, the verdict of the CJEU, the highest judicial body of the European Union, has brought web scraping services under scrutiny.
The verdict came in a case in which the airline operator Ryanair and a Dutch price comparison business, PR Aviation, faced off in court.
The news has stirred a controversy, and a very crucial one: is web scraping unethical? Well, this question will have a number of answers, and all of them will be extremely subjective.
A start-up might simply love it, since it requires no extra investment yet offers a powerful means of gaining insight into the volatile market. From the perspective of a start-up company, data collection or harvesting is crucial, since it helps them understand ever-changing business scenarios and plan their next moves.
But, as always, every story has a flip side: it might be a blessing for small-scale businesses and start-ups, but it can lead to severely bottlenecked servers for the bigwigs.
Bots and spiders are used to crawl through a site and capture its information, which adversely affects the speed of the servers and ultimately slows the site, resulting in high bounce rates. Moreover, these bots create real-life problems: big companies complain that they make the sensitive data on their sites extremely vulnerable. According to Incapsula.com, over 30% of bots and spiders are malicious and put cyber security at stake. Email spamming, for instance, is nothing but a result of (illegal) web scraping. Perhaps this explains why the big names get shivers down their spines on hearing the term web scraping.
However, you cannot generalize everything! There are several professional web scraping service providers who abide by the general rules and regulations, and who make sure that they never intrude into the secure space of a site. In other words, they get adequate and appropriate authorization from the concerned web resource.
These professional service providers will not engage in what may be called an offensive attack: they will not intrude into the protected space (containing sensitive data such as the email IDs of the respective clients, credit card numbers or other details of financial transactions).
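One common way a scraper can respect a site's stated rules is to consult its robots.txt file before fetching a page, and to pause between requests so the server is not overloaded. The sketch below is only an illustration of that idea using Python's standard urllib.robotparser module. The robots.txt content, the paths, the example.com URLs and the bot name are all hypothetical, and honoring robots.txt is not a substitute for the explicit authorization described above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /account/
Disallow: /checkout/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the site's rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


def polite_delay(parser: RobotFileParser, user_agent: str, default: float = 1.0) -> float:
    """Return the seconds to wait between requests.

    Uses the site's Crawl-delay if one is declared, otherwise a default
    chosen by the scraper (an assumption of this sketch).
    """
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    bot = "example-research-bot"  # hypothetical bot name
    for url in ("https://example.com/fares", "https://example.com/account/settings"):
        print(url, "allowed" if may_fetch(rules, bot, url) else "blocked")
    print("wait between requests (seconds):", polite_delay(rules, bot))
```

A real crawler would load the live file with set_url() and read(), and would sleep for the returned delay between requests. Both steps are left out here to keep the sketch self-contained.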
Ethical or not, one cannot deny that web scraping has a crucial part to play in helping a business flourish. However, given the present situation, it is important to note that using a professional’s service to scrape data will lead to the right path, one where legalities and ethics are not compromised.