Web Scraping Hypocrisy
Web scraping is a time-honored tradition. Everyone does it, just not at the scale that companies like Google, Facebook and Microsoft do.
Some people say that data is the new oil. If true, that means if you have more data than the other team, you will win.
AI, for example, needs to train on lots of data. And when I say lots, I mean lots at a level you may not be able to comprehend.
Companies like Facebook and LinkedIn have taken web scrapers to court.
But the problem is that the data that the other team is scraping is not owned by the Facebooks and LinkedIns of the world – it is, at least in theory, owned by you and me.
When we use a social media platform, we give the company a license to use our data, but it doesn’t own the copyright to that data.
But when these companies go after their competitors they pretend that the data is theirs. And the fact that no court has ever said that the social media companies have any property rights in your data doesn’t seem to be stopping them.
In the early days, the companies used a legal theory called trespass to chattels. That theory requires the company to show that the scraping actually damaged it. Scratch that theory.
Then they tried saying that it was a violation of the Computer Fraud and Abuse Act (CFAA). But the men and women in black (SCOTUS) disabused them of that thinking.
Now they are trying to use breach of contract.
Twitter, for example, has filed multiple lawsuits against web scrapers, including Bright Data, which might just be the biggest web scraper in the world.
Previously, the lawyers filed 10 to 20 claims hoping something would stick. Now the lawyers seem to be focusing on breach of contract claims, hoping that will work.
If the contract law claims work, the ball is in the court of the website operators.
If companies claim they have intellectual property rights in the data they collected from you, the courts may be more friendly to their theory.
Since the breach-of-contract-as-property theory carries no legal requirement for honesty, companies are free to press their advantage over whatever they deem proprietary while claiming that they are free to steal whatever data they want.
This is far from settled, but it is going to get sorted out in the next decade or so.
Credit: Professor Eric Goldman