
Microsoft’s AI boss thinks it’s perfectly OK to steal content if it’s on the open web

2023 Concordia Annual Summit - September 18
Photo by Riccardo Savi/Getty Images for Concordia Summit

Microsoft AI boss Mustafa Suleyman incorrectly believes that the moment you publish anything on the open web, it becomes “freeware” that anyone can freely copy and use.

When CNBC’s Andrew Ross Sorkin asked him whether “AI companies have effectively stolen the world’s IP,” he said:

I think that with respect to content that’s already on the open web, the social contract of that content since the ‘90s has been that it is fair use. Anyone can copy it, recreate with it, reproduce with it. That has been “freeware,” if you like, that’s been the understanding.

Microsoft is currently the target of multiple lawsuits alleging that it and OpenAI are stealing copyrighted online stories to train generative AI models, so it may not surprise you to hear a Microsoft exec defend the practice as perfectly legal. I just didn’t expect him to be so very publicly and obviously wrong!

I am not a lawyer, but even I can tell you that the moment you create a work, it’s automatically protected by copyright in the US. You don’t even need to apply for it, and you certainly don’t void your rights just by publishing it on the web. In fact, it’s so difficult to waive your rights that lawyers had to come up with special web licenses to help!

Fair use, meanwhile, is not granted by a “social contract” — it’s granted by a court. It’s a legal defense that allows some uses of copyrighted material once that court weighs what you’re copying, why, how much, and whether it’ll harm the copyright owner.

That certainly hasn’t kept many AI companies from claiming that training on copyrighted content is “fair use,” but most haven’t been as brazen as Suleyman when talking about it.

Speaking of brazen, he’s got a choice quote about the purpose of humanity shortly after his “fair use” remark:

What are we, collectively, as an organism of humans, other than a knowledge and intellectual production engine?

Suleyman does seem to think there’s something to the robots.txt idea: that a text file specifying which bots may not scrape a particular website might keep people from taking its content. He says:

There’s a separate category where a website, or a publisher, or a news organization had explicitly said ‘do not scrape or crawl me for any other reason than indexing me so that other people can find this content.’ That’s a grey area, and I think it’s going to work its way through the courts.

But robots.txt is not a legal document. It, not fair use, is the social contract that’s been with us since the ‘90s — and yet some AI companies appear to be ignoring it, too. Microsoft partner OpenAI is reportedly among those ignoring it.
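
If you've never looked at one, a robots.txt file is just a short list of plain-text rules, and honoring it is a check the crawler has to choose to run. Here is a minimal sketch of that check using Python's standard-library urllib.robotparser. The bot names, the rules, and the example.com URL are all made up for illustration; this is not any real site's file or any real crawler's code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: turn away one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules permit this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "https://example.com/story"  # placeholder URL
    print(may_fetch("ExampleAIBot", story))      # False: explicitly disallowed
    print(may_fetch("ExampleSearchBot", story))  # True: falls under the "*" rule
```

Nothing in the file enforces the answer. A scraper that never calls a check like this, or ignores the result, can still download the page, which is why robots.txt works as a courtesy and not as a lock.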

Disclosure: Vox Media, The Verge’s parent company, has a technology and content deal with OpenAI.