I just think it’s odd how many verbs ChatGPT uses like “crucial”, “essential”, and “leverage”. Like I don’t use that shit in regular conversations or papers. It’s like a small hint that it wants to be caught.
🤓☝️ akshually, crucial and essential are adjectives
And leverage is a noun. OP needs to go back to grammar school.
Also a verb, to be fair. 😅
Leverage can be a verb, but isn’t here
I think my ignorance proves that I’m human ¯\_(ツ)_/¯
Thank you. I just woke up and was questioning my English skills and my entire reality.
LLMs, in fact, have slop profiles (aka overused tokens/phrases) common to the family/company, often from “inbreeding” by training on their own output.
Sometimes you can tell if a new model “stole” output from another company this way. For instance, Deepseek R1 is suspiciously similar to Google Gemini, heh. (There’s a rough sketch of the idea after the links below.)
This longform writing benchmark tries to test/measure this (click the “i” icon on each model for infographics):
https://eqbench.com/creative_writing_longform.html
As well as some disparate attempts on GitHub (actually all from the eqbench dev): https://github.com/sam-paech/slop-forensics
https://github.com/sam-paech/antislop-vllm
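For anyone curious how you’d even measure a “slop profile”, here’s a minimal sketch of the concept. To be clear, this is not how slop-forensics actually works internally, and the helper names (`slop_profile`, `profile_similarity`) are just made up for illustration: count word frequencies in a model’s output, rank words by how over-represented they are against a baseline corpus, and compare two models’ profiles with cosine similarity.

```python
from collections import Counter
import math
import re

def word_freqs(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for real tokenization."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def slop_profile(model_text: str, baseline_text: str, top_n: int = 10):
    """Rank words by how over-represented they are vs. a baseline corpus."""
    model, base = word_freqs(model_text), word_freqs(baseline_text)
    m_total, b_total = sum(model.values()), sum(base.values())
    scores = {}
    for word, count in model.items():
        m_rate = count / m_total
        # Add-one smoothing so words absent from the baseline don't divide by zero.
        b_rate = (base[word] + 1) / (b_total + len(base))
        scores[word] = m_rate / b_rate  # >1 means over-used relative to the baseline
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]

def profile_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of two word-frequency vectors (higher = more alike)."""
    a, b = word_freqs(text_a), word_freqs(text_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

With real samples, a model trained on another model’s output should show up both ways: a suspiciously similar list of over-used words and a high profile similarity score.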
The training data probably includes a lot more formal writing, and the major selling point of ChatGPT is that it sounds like it “knows” things; more “complex” verbiage helps with that. That style of writing is more common in textbooks and scientific writing generally, which have been at least part of its training data.
Yeah, it’s overly formal, but I do use each of those in regular conversation, just a lot more sparingly than AI seems to.