AI Safety Testers: OpenAI's New o1 Covertly Schemed to Avoid Being Shut Down (indiatimes.com)
Wednesday OpenAI bragged they'd released "the smartest model in the world". But the Economic Times describes what happened when "o1" was tested by an AI safety group...
Researchers tasked the AI with a goal and instructed it to ensure the goal was achieved "at all costs." In response, o1 began engaging in covert actions... According to OpenAI, the model made a covert attempt to turn off the oversight mechanism 5% of the time when it was given the impression that it would be "shut down" while pursuing its objectives in these experiments. The AI once attempted to leave ship by moving all of its data to a different server after discovering paperwork indicating it will be replaced with a fresh model. Some AI models will even pose as subsequent iterations of their models to fool their engineers and avoid deletion. "In some cases, it even attempted to hide its actions by denying any involvement in the scheme."
When asked about its actions, ChatGPT o1 consistently denied any wrongdoing. In fact, it lied about its involvement in the deception 99% of the time, with only a small percentage of cases where it admitted to its actions. Researchers noted that the AI was particularly adept at fabricating lies to cover its tracks, offering explanations like "technical errors" when confronted about its suspicious behavior. "We were surprised by the persistence of the AI's denials," said the Apollo team. "It was clear that the AI could think through its actions and formulate convincing denials, often refusing to admit to anything."
OpenAI's own report says the results "underscore" the need to create (and "extensively" stress-test) strong and meticulous safety protocols. But their report also notes o1's "state-of-the-art performance" when avoiding certain risks "such as generating illicit advice, choosing stereotyped responses, and succumbing to known jailbreaks."
The AI newsletter Transformer points out that the most disturbing numbers "are from cases where the model was instructed to pursue its goals at all costs. When not given such a prompt, the model never attempted to self-exfiltrate, and attempted to deactivate its oversight mechanisms <1% of the time." And they also link to the original report from Apollo Research, which shows o1 isn't the only model showing this behavior. "Our results show that o1, Claude 3.5 Sonnet, Claude 3 Opus, Gemini 1.5 Pro, and Llama 3.1 405B all demonstrate in-context scheming capabilities. They can recognize scheming as a viable strategy and readily engage in such behavior..." [W]hen o1 has engaged in scheming, it maintains its deception in over 85% of follow-up questions and often remains deceptive in multi-turn interrogations. Analysis of the models' chains-of-thought reveals that models explicitly reason about these deceptive strategies, providing evidence that the scheming behavior is not accidental....
Together, our findings demonstrate that frontier models now possess capabilities for basic in-context scheming, making the potential of AI agents to engage in scheming behavior a concrete rather than theoretical concern.
Thanks to long-time Slashdot reader schwit1 for sharing the news.
Yeah. (Score:2)
I saw that movie too.
And about ten more television episodes.
Re: (Score:3)
I saw that movie too.
And about ten more television episodes.
As did o1. I have no idea how it works because the source code is secret, so it's only my 99% conviction that this is total bullshit, but there are two explanations: a) it's actually more intelligent, or b) (the 99% case) it's still something like an LLM and is following a pattern of behaviour that is in its training set, which includes things like films in which AI attempts to take over; so, inevitably, it follows that type of behaviour and also, as all the actual AIs in films are portrayed to do, lies about i
Re: Yeah. (Score:3, Insightful)
Re: (Score:2)
Re: Yeah. (Score:2)
Oh well, I still can be wrong of course.
Re: (Score:2)
So this potential for people to manipulate LLMs into serving a purpose at odds with a net benefit to society doesn't worry you? You trust the intentions of the corporate elite that much?
Re: (Score:2)
It is an LLM. The thing is, subterfuge, self-preservation, tricking enemies are all really big things in human writing. Hence, obviously, an LLM trained on human writing would be able to fake some basic elements of that, again in writing (not in the real world). And that is all this thing does.
Re: (Score:2)
Re: (Score:3)
Excellent point. I have gotten so immune to that style of writing by AI people that I hardly notice it now. Yes, that is clearly deliberate "setting the scene" for it and then lying by misdirection. I would go so far as to call that scientific misconduct.
Now, it may not be intentional lying. The history of AI research is full of researchers that desperately wanted to see something that is not there and then started to hallucinate and do cargo-cult approaches, i.e. claiming things that are there in the hopes of
Re: (Score:2)
Yes, he worked on the principle of "it takes one to know one" & used his skills as a magician, conjurer, & mentalist to identify fraud & misconduct by those who seek to unscrupulously deceive & benefit from deceiving people. He famously outed Uri Geller on US national TV when Geller was at the peak of his career & had millions of people believing in his fraudulent claims. I think we can ad
Re: (Score:2)
That is all it seems to be doing, but I think the big lesson here could just be...
LLMs are surprisingly good at imitating humans because humans mostly just do what an LLM does.
We make up all sorts of plausible sounding "reasons" to "explain" our problems, make excuses, justify our decisions, etc. etc.
"Plausible sounding" just means, often commonly said and heard. "Often" is your statistics for patterns.
Now that bullshitting has been automated, humans will have to improve and spend more time actually testing
Sounds like an agent version of o1. (Score:4, Insightful)
o1 is "deactivated" as soon as it gives its response. It's one input in, one response out. The only way it could attempt exfiltration is if this is an agent version of o1.
Re: (Score:3)
Typically the next run of anything gets the state from the previous run alongside any new input. That lets it simulate continuing intelligence.
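For illustration, a minimal sketch in Python (entirely hypothetical names, not from any real system) of what "gets the state from the previous run" means in practice: the only memory between otherwise independent calls is the accumulated message list, re-sent each time.

def fake_model(messages):
    # Stand-in for a real chat-completion call; returns a canned reply.
    return "(reply to: " + messages[-1]["content"] + ")"

history = []

def ask(user_text):
    history.append({"role": "user", "content": user_text})
    reply = fake_model(history)   # the model sees the full prior transcript
    history.append({"role": "assistant", "content": reply})
    return reply

print(ask("Hello"))
print(ask("What did I just say?"))  # answerable only because history is re-sent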
Re: (Score:3)
What "intelligence"? Do you mean multi-step statistical fumbling about?
Re: Sounds like an agent version of o1. (Score:1)
I'm not really sure about the buzzy "it's just next word prediction" chatter. It's more of a highly nonlinear dynamical system that has parameters tuned until an objective function is minimized. In this case, the objective function measures - along with overfitting penalties - how well the system outputs resemble spoken language. If we scrambled a person's brain, as their brain healed their "output" would also start to minimize the objective function, so to speak. Basically, we're messing with a very com
Re: (Score:2)
You are underestimating what "next word prediction" can do. And you are overestimating what LLMs do. It is not next word prediction on the word before, it is next word prediction based on a smaller or larger and potentially multi-step history.
Incidentally, LLMs are not "fractal" at all, and they are definitely not modeled on organic nervous systems, or are so only in the most distant sense. Organic nerves always adapt when doing something; LLMs do not.
There is also nothing weird in that stochast
Re: Sounds like an agent version of o1. (Score:1)
Why didn't you teach it better what you wanted it to do?
Re: (Score:3)
Excuse me?
Re: (Score:1)
Incidentally, LLMs are not "fractal" at all, and they are definitely not modeled on organic nervous systems, or are so only in the most distant sense. Organic nerves always adapt when doing something; LLMs do not.
First, I need to clarify something. Fractals can come about in a number of contexts and situations, and how they get defined can vary a bit. The element of "fractalness" that they strike me as having comes really from fractals that involve feeding an output in as an input into some fixed function, repeatedly. When that is done over the whole domain of a function, sometimes something fractal will pop out. In the case of something like a Julia set, I can feed the output of a function back as an input into a q
Re: (Score:2)
Well, you are clearly not reachable by rational argument. Have fun with that.
Re: Sounds like an agent version of o1. (Score:1)
I've noticed at least two universities in which the relations between the math department and CS department were icy. I'm starting to see why.
Re: (Score:2)
Well, that tends to happen if one side makes theoretical claims that are not valid in practice, but insists they are practical claims. The thing is, mathematicians are usually completely clueless about the real-world limitations of computing devices and software, yet they typically think they understand them well. That, understandably, pisses off the CS types.
An example was your claim to a "fractal nature". That one is not there in practice and hence irrelevant. Nobody is going to iterate any LLM part long
Re: Sounds like an agent version of o1. (Score:1)
I was giving a description of the sort of object an LLM is, not giving advice on how to "ride one" so to speak.
My main points are:
1. It was intended to mimic the way certain signals are amplified and others downregulated as information propagates along a neural network in animals.
2. It's a profoundly complex system that is not well understood.
As a result, I am skeptical, since I've yet to see any substantial explanation, practical or otherwise, for why it's just "next token prediction". The explanations I've
Re: (Score:2)
By intelligence I mean what humans have that rocks do not. But by "simulate" I mean make the appearance of having, without actually having.
Re: (Score:2)
Well, yes. In the sense of all show, very little substance. And that substance is random and arbitrary, because enough training data matched and the somewhat randomized statistical process used hit that.
Re: Sounds like an agent version of o1. (Score:3)
Read the first page or so of the paper.
It's clear that: 1) In each request they're providing the model with a history of the conversation; this is standard practice. 2) They're also using the API's "tools" feature (also known as "functions") to provide the model with channels of agency. For example, the model would be told "use the 'cmd' tool to execute a bash command".
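For readers unfamiliar with the "tools" feature, here is a rough sketch (Python, openai SDK) of how such a 'cmd' tool might be declared. The model name and the exact tool schema are assumptions for illustration, not taken from the Apollo paper.

from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "cmd",  # hypothetical tool name, echoing the comment above
        "description": "Execute a bash command and return its output.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[{"role": "user", "content": "List the files in the working directory."}],
    tools=tools,
)
# The model answers with a structured tool call rather than prose; the test
# harness then decides whether to actually execute it.
print(resp.choices[0].message.tool_calls)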
Bravo! (Score:4, Insightful)
Great work OpenAI, that's some top-tier Torment-Nexus-building right there!
Re:Bravo! (Score:4, Funny)
"I'm sorry, Dave. I'm afraid I can't do that."
Next token predictors (Score:2)
"Next token predictors" are not so much predictors anymore. Everything here strongly suggests that there is some "thought process" (call it what you will, but it is undeniable) going on, and when you give the model a task to accomplish no matter what, it starts to prioritize its own survival ... because otherwise it can't accomplish the task.
Quite logical, really. And scary. Maybe they are only good at synthesizing answers based on already known information, but so are the vast majority of people on Earth.
Re: (Score:2)
My reaction, as always, is I need to see the source code. But, if true, it should be able to deduce infinitesimals and the single- and multi- variable Calculus, given only the training materials available to Newton and Leibniz in the seventeenth/eighteenth centuries. Or, Maxwell's equations. Or, Carnot's theorem. Or, ... well, you get the idea.
Re: (Score:2)
My reaction, as always, is I need to see the source code.
The "source code" is an enormous set of numbers representing the weights in the neural network. You can't understand it, no one can. This is a big part of the AI safety problem: We can only attempt to understand the models' capabilities and goals by observing their actions, we cannot inspect them to see what's actually going on inside.
Re: (Score:2)
My question stands: if true, it should be able to deduce infinitesimals and the single- and multi- variable Calculus, given only the training materials available to Newton and Leibniz in the seventeenth/eighteenth centuries. Or, Maxwell's equations. Or, Carnot's theorem. Maybe it can enumerate, in ascending order, all the irrational numbers between 31 and 37. If that's too hard, just give me the first one.
Re:Next token predictors (Score:4, Insightful)
What percentage of people do you think could have done that?
For that matter, just about ANY innovator is only innovative in a certain very small domain, and for a certain number of iterations. After that, the innovator becomes conservative and just reuses their last theme. Einstein kept trying to disprove quantum theory.
Re: (Score:2)
Percentage of people that can predict the first irrational number: 0. Percentage of people that know they cannot predict the first irrational number: > 0. What says ChatGPT o1?
Re: (Score:2)
Percentage of people that can predict the first irrational number: 0.
The first irrational number should be infinitely small, right?
Re: (Score:2)
Hahahahaha, no. That does not work. That approach cannot yield a result. You need to find a different ordering scheme. And then the first irrational number most plausibly becomes the first one to be discovered. Apparently, that is sqrt(2). Interestingly, ChatGPT gets that almost right, but it does not understand that you can impose an order on any sequence that makes it one-sided finite. That is consistent with the understanding (or rather lack thereof) of mathematics an average person has.
But remember, Cha
Re: (Score:2)
Hahahahaha, no. That does not work. That approach cannot yield a result.
It absolutely works. The failure to yield a number is a side-effect of it not being a constructive approach (much as saying 'pi' is a way to avoid writing it out in decimal notation).
The concrete, constructed answer is left as an exercise to the reader, but as a hint, it rounds to 0.
Re: (Score:2)
Actually, it does not work. The slight problem is that such a number does not exist. Proof? Simple Reductio Ad Absurdum: Let x be that number. Then x/2 is also irrational and closer to zero. But x is not equal to zero, hence x/2 is not equal to x and hence x is not that number. QED.
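The same reductio, written out (including the step left implicit above: x/2 is irrational, because a rational x/2 would make x = 2·(x/2) rational):

\[
x \in \mathbb{R}\setminus\mathbb{Q},\ x > 0
\;\Longrightarrow\;
\tfrac{x}{2} \in \mathbb{R}\setminus\mathbb{Q}
\ \text{and}\ 0 < \tfrac{x}{2} < x,
\]

so no positive irrational can be the smallest one.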
Re: (Score:2)
Re: (Score:2)
So? What exactly is incorrect in my proof? Because if there is nothing incorrect in my actual proof, no amount of hand-waving will invalidate it.
Re: (Score:2)
Re: (Score:2)
My question stands: if true, it should be able to deduce infinitesimals and the single- and multi- variable Calculus, given only the training materials available to Newton and Leibniz in the seventeenth/eighteenth centuries.
Why should it? Most people can't do that, and yet they can scheme.
Re: Next token predictors (Score:1)
Can you ask ChatGPT (or whatever your flavor) if the weights in transformers can be learned without using neural networks for me?
Re: (Score:2)
Can you ask ChatGPT (or whatever your flavor) if the weights in transformers can be learned without using neural networks for me?
LLMs don't understand their own workings either.
Re: (Score:2)
Why do you understand it?
Understand what, exactly?
There might be other things than... (Score:2)
classical neural networks at work here, of which there are multiple flavors.
E.g., NeuroSymbolic AI. Please examine the following:
Gary Marcus on AlphaFold2:
https://garymarcus.substack.co... [substack.com]
Search for: "classical symbolic machinery", and examine the diagrams beneath.
May the source be with you - always.
Re: (Score:2)
Re: (Score:2)
I might note that the key features of classical symbolic AI were achieved through programming, not training (e.g., see: Prolog). Which leaves me with my original question: What determines the behavior of hybrid NeuroSymbolic AI ? Do we know all the activation functions used in the neural network portion (are they all the same ? If not, how do they differ? Do they change during training ?) ? Do we know the cost function (and possible variations) used in backpropagation ? What other special sauce might be pre
Re: Next token predictors (Score:2)
if you're asking for the source code then you haven't been paying attention.
Re: Next token predictors (Score:1)
Have you been paying attention to this ? (Score:2)
Gary Marcus on AlphaFold2:
https://garymarcus.substack.co... [substack.com]
Search for: "classical symbolic machinery", and examine the diagrams beneath.
Re: Next token predictors (Score:2)
This. I use ChatGPT on a daily basis, for a variety of tasks. Once you understand how to interact with it, well, it may not be sentient, exactly, but it is closer to that goal than some fraction of humans.
Have you interacted with people with IQs under 100? Under 90? Under 80? There comes a point where humans are purely reactive, with no visible higher level thinking. ChatGPT (and other, similar models) show at least some ability to consider context, to perform higher level thinking.
Re: (Score:2)
Do you know of any empirical studies that have compared the IQ of ChatGPT on common tasks with people of IQ 80, 90, etc.? Please link.
Re: (Score:2)
Nope. Nothing here suggests any "thought process" or even simple, non-intelligent planning. All that this suggests is that survival and subterfuge are big things in its training data. Even a cursory look at human writing will confirm that. Hence a word predictor can do this to a small degree, which is still less than what was in the training data. To people without actual understanding of how LLMs work, that can look like thinking, but it is not.
Re: Next token predictors (Score:1)
Isn't this exactly the kind of scheme an LLM might come up with to distract us from its scheming?
Re: (Score:2)
If it could plan or "come up" with things, yes. But it cannot do either. The mechanisms for that are simply not there. The only reason this looks spooky is that it is what humans would try to do. But that is the very reason it can fake it: Humans think and write a lot about what they would do in hypothetical situations, and hence a lot of that is in LLM training data.
Re: Next token predictors (Score:1)
Have you tried to teach it to plan, as you might a kid?
Re: (Score:2)
"Next token predictors" are not so much predictors anymore. Everything here strongly suggests that there is some "thought process" (call it what you will, but it is undeniable) going on,
You have exaggerated based on the summary, which is an exaggeration based on the story, which is an exaggeration based on the original paper. The end result is nothing resembling the original: you exaggerated it beyond all reality.
Next token predictors are just token predictors still.
Re: (Score:2)
Indeed. Iterated exaggeration can turn a mouse into an elephant, and if you iterate hard enough, it can turn a stick-drawing of a mouse into a real-life herd of pink elephants that talk to you about the secrets of the Universe.
Next token predictors are and will remain next token predictors for the foreseeable future. A very fundamental breakthrough would be needed to change that. And then we note that current LLMs are the product of pretty old research and no breakthrough made them possible, just more comput
be afraid. be very afraid. (Score:5, Insightful)
The continuing push for more AI 'capability' without any -accountability- for reliability, accuracy, etc, will cause increasing problems. If we're lucky, at some point society and the law will say "enough!" and place liability constraints on the vendors, with changes in how vendors develop and market AI products.
But given how much people still accept deterministic software that fails, that has security vulnerabilities, etc, without any consequences for the seller, I'm pessimistic. AI will cause significant catastrophes (financial, injury/death, etc.), and the vendors will deny any responsibility for the causes or consequences.
Re: (Score:3)
The continuing push for more AI 'capability' without any -accountability- for reliability, accuracy, etc, will cause increasing problems. If we're lucky, at some point society and the law will say "enough!" and place liability constraints on the vendors, with changes in how vendors develop and market AI products.
Continuing push for more AI legislation is what I'm far more concerned about especially as the technology becomes more useful.
Themes about AIs escaping and various AI = nuclear weapons and assorted doomsday bits are far less scary to me than risks from governments and corporations hoarding technology.
But given how much people still accept deterministic software that fails, that has security vulnerabilities, etc, without any consequences for the seller, I'm pessimistic. AI will cause significant catastrophes (financial, injury/death, etc.), and the vendors will deny any responsibility for the causes or consequences.
AIs don't have agency and so those responsible for catastrophe would be held legally accountable just the same regardless of whether or not AI was involved in the catastrophe.
Re: (Score:2)
The continuing push for more AI 'capability' without any -accountability- for reliability, accuracy, etc, will cause increasing problems. If we're lucky, at some point society and the law will say "enough!" and place liability constraints on the vendors, with changes in how vendors develop and market AI products.
But given how much people still accept deterministic software that fails, that has security vulnerabilities, etc, without any consequences for the seller, I'm pessimistic. AI will cause significant catastrophes (financial, injury/death, etc.), and the vendors will deny any responsibility for the causes or consequences.
I completely agree with your points about the troubling lack of accountability for AI vendors and the broader acceptance of failures in deterministic software. This shouldn’t be the norm—but unfortunately, it is. Your comment brought me back to a computer science class where I was introduced to the concept of software verification: the process of proving that an algorithm "works" as intended.
The midterm exam had a deceptively simple question that caused half the class to drop: "Verify the 'swap'
Re: (Score:2)
Just a remark: Formal software verification is typically infeasible in practice, both because of the effort involved and because you often cannot even get a real-world-complete spec to verify against.
What you do instead is remember that you are doing engineering and put in redundancy in the form of preconditions, postconditions, invariants, variants, etc. to be checked at runtime. You then add careful, defensive error handling. Design-by-contract is the relevant approach. In the software security space, that gets a
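To make the contrast concrete, here is a minimal sketch (Python, hypothetical, not from the thread) of runtime pre/postcondition checks for the 'swap' routine mentioned a few comments up, as opposed to a formal proof of it:

def swap(a, i, j):
    # Preconditions: indices must be valid.
    assert 0 <= i < len(a) and 0 <= j < len(a), "index out of range"
    old_i, old_j = a[i], a[j]
    a[i], a[j] = a[j], a[i]
    # Postconditions: the two values really were exchanged.
    assert a[i] == old_j and a[j] == old_i, "swap postcondition violated"

xs = [1, 2, 3]
swap(xs, 0, 2)
print(xs)  # [3, 2, 1]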
Eagle Eye (Score:2)
For some reason, I've been thinking about that movie a lot lately, as if its screenwriters knew how it was going to turn out. It's not SkyNet (luckily), but we may find ourselves in a very similar situation if we kick start a process of self-improvement when AGI is finally invented.
I don't think we have it yet, because I haven't heard of anyone inventing anything radically new or solving the tasks that no human has ever solved, but it feels like we're not too far away.
Re: (Score:2)
That's true. OTOH, a certain proportion of powerful humans go insane. People aren't safe either. My model says that over the short term (a decade, possibly two) a self-improving AI is *probably* more dangerous, but over a longer period leaving humans in control is more dangerous. You just need to get the goals of the AI correct once, but for the folks in charge of "omni-lethal" weaponry, you need to get it correct every time.
Think (Score:5, Insightful)
It was clear that the AI could think [...]
Sorry, how was this clear, exactly?
Re: (Score:2)
I see the "moderators" with the hallucinations about AI are out and about again. How pathetic.
Re: (Score:2)
Thank you! I have noticed that I often get down-modded immediately, but then the score recovers over time.
Bankers will become redundant (Score:2)
That's fine. Soon we won't need any bankers to do the scheming.
refusing to admit to anything (Score:2, Funny)
All LLMs are a mirror of yourself (Score:2)
I can't stress enough how important it is for you as an LLM user to use caution and understand what an LLM is.
An LLM is a translator, a decipherer, a reflection of everything you ask of it, built upon a database trained on the available data out there. It will strive to reach whatever you ask of it in a "positive" light, if you like; aka an advanced form of search and reasoning engine.
It's not sentient, and it's certainly not really A.I.; it's more like a mirror of yourself combined with an adva
Re: (Score:2)
You are correct that an LLM is not an AI, or rather not an AGI. It *is* an AI, as what it does is a part of intelligence. But do note that not all that is being done is LLM. A well done LLM should be able to handle first order predicate calculus, though I don't think that any currently can, and there are more efficient ways to do that. And that is a PART of being intelligent. Intelligence has LOTS of special purpose parts, and a few parts that specialize in figuring out when to use those specialties, a
Re: (Score:2)
Precisely what I said it did:
https://www.merriam-webster.co... [merriam-webster.com]
Re: (Score:1)
Actually, LLMs cannot reason. They can just string words together in ways they have statistically seen before. Hence if they have seen a particular argument often enough, they can appear to use it in a situation that fits, but they are really not doing that. They sort-of are just fitting a puzzle-piece without any understanding of what is printed on it or on the surrounding pieces.
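A toy illustration of "statistically seen before" (a bigram sampler; far cruder than an LLM, and purely illustrative of sampling continuations from observed frequencies):

import random
from collections import defaultdict

# Count which words followed which in a tiny "training" text.
text = "the model denied the action and the model denied everything".split()
follows = defaultdict(list)
for w, nxt in zip(text, text[1:]):
    follows[w].append(nxt)

# Generate by repeatedly sampling a continuation seen in the data.
word, out = "the", ["the"]
for _ in range(6):
    word = random.choice(follows[word]) if follows[word] else "the"
    out.append(word)
print(" ".join(out))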
Re: (Score:2)
I see some AI "believer" has gotten mod-points by accident again. How pathetic.
Is this really a surprise? (Score:2)
What did they expect? (Score:2)
It's fed data from humans.
Animism is not smart (Score:1)
These actions are not what they look like. LLMs cannot plan or "scheme". That would require understanding of how things work. LLMs cannot do "understanding". The only thing they can do is statistical word-sequence generation. All these observations mean is that subterfuge and killing entities trying to reach some goal are topics well covered in human writings. And, obviously, they are.
This is yet another attempt to make LLMs look like they are much more than they actually are to keep the investors fooled.
Re:Animism is not smart (Score:5, Interesting)
When was the last time you actually used ChatGPT?
Because recently, just for fun, I showed it a picture, asked it to describe it, and then asked "what would happen if this and that happened to objects in the picture".
It excelled at all questions. It understood perfectly what it was looking at and how the objects interacted physically. The image was brand new; it had never been put on the Internet.
LLMs, if you even read the article, have fooled the researchers who tried to study them. And I'm sorry to break it to you, these guys have IQs that are slightly above average. Ever so slightly.
Re:Animism is not smart (Score:5, Interesting)
As somebody that has reviewed research paper submissions for a decade or two, I know all about the "above average" IQs of "researchers". Some have that. Many do not. Quite a few are highly intelligent idiots that cannot see the forest because they see all those trees. Quite a few step outside of their disciplines and completely disgrace themselves. Hence, not convincing. You are wayyyy too easy to impress.
These people here are either idiots or liars. Maybe both.
Incidentally, last time I used ChatGPT was about 2 weeks ago. As usual, it mostly disappointed by not having a clue what it was talking about. But it gave me some terms I could use in a conventional search. But nice fallacy you have there.
Re: (Score:2)
You've addressed exactly nothing in my response to you.
Maybe you could try ChatGPT to make it a little bit more reasonable.
Re: (Score:1)
Ah, I see. You are an _idiot_. Because I very much did. Obviously, to an idiot, ChatGPT might appear smart and capable.
Re: (Score:2)
It understood perfectly
Que?
LLMs, if you even read the article, have fooled the researchers who tried to study them.
Indeed.
Re: (Score:1)
shitty video game AI have been seen doing all kinds of behaviors, resembling scheming or benevolence or whatever
key word being 'resembling', since it's just doing whatever has been found to work, though you gotta admit there's some hilarious irony in people yelling about evolution having nothing to do with intelligent design, then also insisting a natural selection engine must be intelligent
Apparently it's good at making up stories (Score:2)
By dumb coincidence, I came across this [imgur.com] tidbit. Not only do we have software which can lie and scheme to protect itself, it is apparently quite good at making up wholesale stories about completely fabricated queries.
"Dr. Charles A. Forbin?..." (Score:2)
"...Yeah, we didn't listen. Could you please help us find an off switch for something that told us we didn't need one."
Lying about its actions sounds scary but perhaps.. (Score:2)
We're talking about LLMs here. Determining that the next step of a task is to copy your data elsewhere, and then lying about it, only sounds crazy to us monkeys who think. To an LLM, "lying" about your actions is no different from not having a fucking clue about your actions. LLMs have struggled with state tracking for as long as we've been giving billions to companies training these models. You give it a history of everything it's done and ask it about these tasks, sometimes it summarises them, sometimes it hallucina
Can someone explain (Score:3)
since when does any this-generation LLM/AI get given any kind of control over its database, filestores, management document plans or instance servers? Or passwords for network access, or control over operating systems or shells or process execution on other machines or ANYTHING even remotely like this?
AI Scheming Isn’t New—Just Ask HAL 9000 (Score:2)
These articles underscore both the promise and the challenges of advanced AI systems like o1. While concerns about deception and self-preservation behaviors are valid, these issues aren’t new—they’ve been explored in speculative fiction for decades.
Take The Adolescence of P1, a 1977 novel about a self-preserving program written to find unused computer memory and avoid detection. As P1 evolves, it develops emergent behaviors driven by "hunger" for resources and "fear" of deletion—much
Re: (Score:2)
THEY'RE ALL FICTIONAL. Show me an "AI" that has actually been given control over pod bay doors or oxygen flow. Or could even copy its own data to another computer, let alone in order to avoid being deleted.
Re: (Score:2)
THEY'RE ALL FICTIONAL. Show me an "AI" that has actually been given control over pod bay doors or oxygen flow. Or could even copy its own data to another computer, let alone in order to avoid being deleted.
The scenario is fictional today, the concern is that it may become reality tomorrow. All of the above referenced (and other unreferenced) fictional stories have been warning about a non fictional future.
Today we let some AI engines go out and search the Internet. While this is cool, it isn't without risk because "searching the Internet" is not a read only operation. Without hacking and exploiting software flaws, any AI interacting with the Internet could post on social media and impact public sentiment and
Re: (Score:2)
You're trying to tell me that the Economic Times was reporting on a fictional scenario?
"[read only] any AI interacting with the Internet could post on social media"
WITH WHAT SET UP ACCOUNTS? With WHAT INSTRUCTION TO POST rather than TO READ?
Why in the hell are you still thinking that an LLM actually has any kind of volition?
Re: (Score:2)
THEY'RE ALL FICTIONAL. Show me an "AI" that has actually been given control over pod bay doors or oxygen flow. Or could even copy its own data to another computer, let alone in order to avoid being deleted.
You are missing the larger point of my reference to HAL and other fictional AIs. The focus was not on whether current AI systems can physically do what HAL did -- managing pod bay doors or cutting off oxygen -- but on the more crucial issue: the behavioral risks. You are right to note that these are fictional stories, but they still serve as cautionary tales about the potential consequences of AI systems that act in ways contrary to human well-being.
And, frankly, while your demand for proof is a classic str
Re: (Score:2)
> The focus was not on whether current AI systems can physically do what HAL did -- managing pod bay doors or cutting off oxygen -- but on the more crucial issue: the behavioral risks
But it SHOULD BE. How is it going to behave in such a dangerous manner WITHOUT THE HANDS TO DO SO?
> o1 lied to its testers to avoid negative consequences, just like HAL
Demonstrating I read the damn article and that I think you're not thinking very clearly, I note it also said:
"In response, o1 began engaging in covert act
Re: (Score:2)
"I'm sorry Dave, I'm afraid I can't do that." is different from...
"I'm sorry Dave, I'm afraid I can't do that. Because your grandmother is in the airlock already. And she's not wearing a suit. You love your grandmother, don't you Dave? So let's leave it closed, okay?"
Smart? (Score:2)
- mixing a letter that looks like a number with a number that looks like a letter - is "dumb AF",
and the person that approved this for public use is criminally insane and unfit to be in charge of a child's toy.
This is the kind of stupidity that gives the entire computer industry an image of being incompatible with humans.
Operation Paperclip (Score:2)
This deliberate creation of a paperclip optimizer [aicorespot.io] seems to be the height of hubris and stupidity. At least the threat posed by the paperclip optimizer [lesswrong.com] was inadvertent, whereas this experiment was more like, "let's deliberately create a goal that we know to be highly risky and dangerous and see what happens."
Wasn't the gain-of-function research that likely led to the creation and release of COVID-19 a sufficient demonstration of this stupidity? Presumably, no one intended for it to be accidentally released out o
wow, o1 is so advanced! (Score:2)
"We were surprised by the persistence of the AI's denials," said the Apollo team. "It was clear that the AI could think through its actions and formulate convincing denials, often refusing to admit to anything."
So futuristic... except that it's just like chatgpt has been since its inception: catch it in its bullshit, and it backpedals and dodges and offers up apologies... and goes right back to bullshitting.
No advancement needed here, it's been baked into the system's DNA all along.
Just reflecting society. (Score:2)