A report by The New York Times claims both OpenAI and Google transcribed videos from YouTube and used these texts to train their AI models, possibly violating copyrights. OpenAI reportedly transcribed more than one million hours of YouTube videos.
We already know they used all the public information on the Internet. How is this news? If AI is going to be any use, it needs to learn from somewhere.
People have grown used to relying on a lot of private services by now. YouTube is so ubiquitous it’s almost like a utility: everyone has access to it, it’s everywhere, and it has no real competitor.
But all of these social media services are private, so as much as they feel like public information utilities, once you’re on one, your data isn’t your own. I think that’s the disconnect when people hear that “their data” has been used for AI training. It ceased to be their data as soon as it went on the platform, at least tacitly in the US.
There has traditionally been a public expectation of control that simply isn’t there for any of these services. The industry knows this and capitalizes on it regularly. It’s a key tenet of technofeudalism.
Yeah, I’ve had this loop-de-loop conversation with a few people now:
“Are you against AI in principle?”
“No, they just shouldn’t use copyrighted material!”
“But you want them to be very similar to a human?”
“Yes”
“Have you ever talked to someone who’s never seen anything copyrighted?”