Protect Broker Data with Anti-Scraping

by Victor Lund on April 9, 2014

If you are a broker or an MLS, you must treat your data as an asset. By and large, our industry does a great job of protecting the front door of data access, but the back door of data scraping is wide open. That raises the question: if you care about data protection, why are you scrutinizing only the legitimate data users without hunting for the nefarious and illegitimate data thieves?

Thieves who want to steal data send robots out onto the internet to do their pickpocketing, and they do it quite well. Reviewing Distil Networks’ reports with Matt Cohen of Clareity Consulting heightened my awareness of how clever these people are. The rookies are the ones who attack from outside the country. The good ones attack from networks like Charter, Comcast, and other common Internet service providers, which cloak them among ordinary consumer traffic. They even use simulated browsers to access your site so that automated behavior is difficult to detect.

Until recently, the answer came down to the high cost and technical acumen that someone would have to absorb. Who can operate the solution, and who will pay for it? But with Distil Networks, for a small to midsized broker with 50 website domains, the cost would be $100 per year to cover 1 million page requests a month. A single site is $36 per year. That is hardly an economic barrier, and it includes support.

Sidenote: Those ComScore numbers that many portals are bragging about are not all consumers; double-digit percentages of them are robots. One of the chief reasons realtor.com® traffic was outpaced so heavily by others is their implementation of anti-scraping solutions. By doing the right thing, they got their teeth kicked in.

Protecting every site in an IDX region will only happen if you mandate it across all IDX vendors and the MLS itself. Many MLSs already charge data access fees. Perhaps they can use those fees to offset the costs of an anti-scraping solution. The issue with any mandate or requirement of this type is that it would succeed only with the support of the participating brokers. Without that support, it would be a political nightmare for any MLS.

Aside from blocking data scraping, products like Distil are effective at setting a trap to snare companies that are scraping data and profiting from it. The technology collects the behavior of evildoers forensically, and that record can be used to pursue legal action against them. You may not be surprised to learn that the same entity scraping your data is doing it nationwide. You may also not be surprised to learn that many well-known companies in the real estate industry are guilty of this practice. Frankly, it is easier and less expensive to scrape data than it is to license it. That is, unless you get caught.

A broker spending $100 per year will not shut down scraping across a market unless every website in that market has the defense measures in place. However, for that $100 per year, any broker can enjoy three other benefits. First, your site speed will likely increase by about 30 percent! Everyone loves fast sites, so consumers may prefer your sites to others.

Second, Google and other search engines will appreciate it and give you some extra SEO juice (or so say the experts). Finally, if your hosting fees vary with the amount of bandwidth you use, you will be able to cut those costs in proportion to the 20-35 percent of your traffic that is not human and gets blocked.
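To put a rough number on that last benefit, here is a back-of-the-envelope sketch in Python. It is only an illustration: the function name and the $200 monthly bandwidth bill are hypothetical, and it assumes your hosting bill scales linearly with bandwidth and that a robot request costs about as much as a human one.

```python
def estimated_hosting_savings(monthly_bandwidth_cost, bot_share):
    """Estimate monthly savings if non-human traffic is blocked.

    Simplifying assumptions (not vendor figures): the bill scales
    linearly with bandwidth, and bot requests use about as much
    bandwidth as human requests.
    """
    if not 0.0 <= bot_share <= 1.0:
        raise ValueError("bot_share must be between 0 and 1")
    return monthly_bandwidth_cost * bot_share


# Hypothetical $200/month bandwidth bill, using the 20-35 percent
# non-human traffic range cited in the article.
low = estimated_hosting_savings(200.0, 0.20)
high = estimated_hosting_savings(200.0, 0.35)
print(f"Estimated savings: ${low:.2f} to ${high:.2f} per month")
```

Swap in your own bandwidth bill to see whether the savings alone would cover the subscription. Real hosting plans often bill in tiers, so the actual figure may differ.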

For more information, visit realestate.distilnetworks.com

Disclaimer: WAV Group is not affiliated or associated with Distil or Clareity in any way. I would like to thank Matt Cohen of Clareity Consulting for spending time to help me become familiar with Distil.
