A report by The New York Times claims both OpenAI and Google transcribed videos from YouTube and used these texts to train their AI models, possibly violating copyrights. OpenAI reportedly transcribed more than one million hours of YouTube videos.
We already know they used all the public information on the Internet. How is this news? If AI is going to be any use, it needs to learn from somewhere.
People have grown used to relying on a lot of private services by now. YouTube is so ubiquitous it’s almost like a utility: everyone has access to it, it’s everywhere, and it has no real competitor.
But all of these social media services are private, so as much as they feel like public information utilities, once you’re on one, your data isn’t your own. I think that’s the disconnect when people hear that “their data” has been used for AI training. It ceased to be their data as soon as it went on the platform, at least tacitly in the US.
There has traditionally been a public expectation of control that simply isn’t there for any of these services. The industry knows this and capitalizes on it regularly. It’s a key tenet of technofeudalism.
Yeah, I’ve had this loop-de-loop conversation with a few people now:
“Are you against AI in principle?”
“No, they just shouldn’t use copyrighted material!”
“But you want them to be very similar to a human?”
“Yes”
“Have you ever talked to someone who’s never seen anything copyrighted?”