Basically a deer with a human face. Despite probably being some sort of magical nature spirit, his interests are primarily in technology and politics and science fiction.

Spent many years on Reddit and then some time on kbin.social.

  • 0 Posts
  • 686 Comments
Joined 4 months ago
cake
Cake day: March 3rd, 2024

help-circle

  • There was a politics subreddit I was on that had a “downvoting is not allowed” rule. There’s literally no way to tell who’s downvoting on Reddit, or even if downvoting is happening if it’s not enough to go below 0 or trigger the “controversial” indicator.

    I got permabanned from that subreddit when someone who’d said something offensive asked “why am I being downvoted???” And I tried to explain to them why that was the case. No trial, one million years dungeon, all modmail ignored. I guess they don’t get to enforce that rule often and so leapt at the opportunity to find an excuse.





  • Especially because seeing the same information in different contexts helps mapping the links between the different contexts and helps dispel incorrect assumptions.

    Yes, but this is exactly the point of deduplication - you don’t want identical inputs, you want variety. If you want the AI to understand the concept of cats you don’t keep showing it the same picture of a cat over and over, all that tells it is that you want exactly that picture. You show it a whole bunch of different pictures whose only commonality is that there’s a cat in it, and then the AI can figure out what “cat” means.

    They need to fundamentally change big parts of how learning happens and how the algorithm learns to fix this conflict.

    Why do you think this?


  • There actually isn’t a downside to de-duplicating data sets, overfitting is simply a flaw. Generative models aren’t supposed to “memorize” stuff - if you really want a copy of an existing picture there are far easier and more reliable ways to accomplish that than giant GPU server farms. These models don’t derive any benefit from drilling on the same subset of data over and over. It makes them less creative.

    I want to normalize the notion that copyright isn’t an all-powerful fundamental law of physics like so many people seem to assume these days, and if I can get big companies like Meta to throw their resources behind me in that argument then all the better.


  • Remember when piracy communities thought that the media companies were wrong to sue switch manufacturers because of that?

    It baffles me that there’s such an anti-AI sentiment going around that it would cause even folks here to go “you know, maybe those litigious copyright cartels had the right idea after all.”

    We should be cheering that we’ve got Meta on the side of fair use for once.

    look up sample recover attacks.

    Look up “overfitting.” It’s a flaw in generative AI training that modern AI trainers have done a great deal to resolve, and even in the cases of overfitting it’s not all of the training data that gets “memorized.” Only the stuff that got hammered into the AI thousands of times in error.




  • It also isn’t telepathic, so the only thing it has to go on when determining “what you want” is what you tell it you want.

    I often see people gripe about how ChatGPT’s essay writing style is mediocre and always sounds the same, for example. But that’s what you get when you just tell ChatGPT “write me an essay about X.” It doesn’t know what kind of essay you want unless you tell it. You have to give it context and direction to get good results.