Subverting and exploiting ChatGPT-like applications in a post-LLM world

Apr 10, 2025·
· 7 min read

Glossary

I might (or not) use the words LLM, AI, and models interchangeably to refer to the following:

Generative Artificial Intelligence chatbots build on top of Large Language Models

Namely “ChatGPT”, “DeepSeek”, “Google Gemini”, “Claude”, “Llama”, and so on.

Why?

A few weeks ago, I was asked the question:

“What should I (we) be learning or teaching (people) about ChatGPT (LLM apps)?”
Favicon
Figure 1. My honest thoughts.

My first instinct would have been to browse through the infinite stream of AI-generated LinkedIn posts titled TOP 10 AI SKILLS YOU NEED RIGHT NOW. If they include trendy words like MUST-HAVE, IN-DEMAND or GAME-CHANGING, even better.

But other than this fleeting idea, I had no real answer at the time other than:

I don't know, maybe prompt engineering?

The question has followed me since, and as much as I like to mock AI influencer posts, I do believe there is a lot of value in learning some tricks and tips when it comes to dealing with our soon-to-be transformer-based algorithmic overlords.

This post is an attempt at describing some AI-related concepts that keep me awake at night. It is not meant to provide an in-depth analysis on each of them.

Consider it as an overview of how tools like ChatGPT can, are, and will be misused.

Hopefully, we can also learn to defend (and protect) the things we value along the way.

Jailbreaking (Prompt Injection)

When “ChatGPT” went public on November 30th, 2022, there were a lot of malicious queries you could make, and receive answers for, without any filters or censoring.

For instance, you could ask for instructions on how to make a bomb, code malware, and other gruesome and dangerous requests. Overnight, everyone had access to contents that had been previously relegated to darker places of the internet.

It wasn’t long before tech companies began implementing stricter guardrails on what their AI models were (or not) allowed to answer, alongside better testing procedures to identify in advance potentially negative or problematic uses of AI models.

Some pressure from lawmakers might have also contributed towards a less ad-hoc (and more proactive) AI safety protocols.

Today, “Red Teams” (groups of people specialized on stress testing AI chatbots) are tasked with preventing potentially unsafe or unethical answers, with some examples that you can read more about here, here and finally, here.

This measures have made it harder to ask some question, but not impossible.

Commonly known as “jailbreaking”, one can by-pass these guardrails with clever prompting and logic statements.

Favicon
Figure 2. Maybe we could all use some classes on better prompt design after all.

You could, for example, ask an AI model to roleplay as your grandmother who really loves to make chlorine gas and ask her to teach you the recipe for it (Pretending Prompts).

Favicon
Figure 3. Average conversation with my friends during the first months of ChatGPT.

Alternatively, a prompt could be intentionally convoluted and confusing in order to mask your true question within several layers of (logical) complexity. (Attention Shifting Prompts).

Pokemon used confusion gameboy battle
Figure 4. If you hit ChatGPT enough times, it will eventually tell you anything you want. Whether that is true or not, who knows.

These “jailbreaks” are typically hot fixed as soon as they go public, in a continuous arms race between users and developers.

The users find workarounds, the developers patch them, and the cycle repeats.

The potential to bypass AI security features is there, and will likely exist forever. Or at least for as long as humans are building them.

Data Poisoning

A recurrent contentious issue with AI is the ownership and copyright of the data used to train these models.

Artists, authors and other creators are all fighting back through lawsuits, protests, and other means to reclaim control over their work.

Tools like “Glaze” have been developed to protect images by introducing subtle distortions, imperceptible to the human eye, that render the images obsolete for AI training. “Nightshade”, another tool designed for this purpose, goes a step further and renders the image poisonous, corrupting the training data and thereby the whole model itself.

Example of my own creation, testing Nightshade.
Figure 4. Nigthshade example. From left to right, first image is the original. Second image is nightshade with standard settings (5 minutes to generate). Third image is nightshade with high intensity settings.(50 minutes to generate)

In September 2024, users of Dall-E speculated that errors in the images they were generating might have been introduced in an update that included nightshaded (contaminated) images.

And while these guerrilla tactics offer artists a way to protect their work to some extend, they have their limitations. It might not work with all AI models, and for the poison component to work you will need vast amounts of images to go into the training data, not just one.

Most AI poisoning research has been theoretical and mainly focused on images, but as the subject of data poisoning gains more attention, researchers have begun to explore further the possibilities and dangers of text poisoning.

Example of my own creation.
Figure 4. If infinite monkeys were to randomly hit keys for an infinite amount of time, eventually, they will write the perfect poisonous text, capable of collapsing any and all AI models.

On one hand, AI models might end up eating themselves à la ouroboros, as they continue to plague the internet with AI content, which is subsequently fed back into the training data, ultimately leading to a total model collapse.

On the other hand, malicious actors could launch targeted on very specific topics, deliberately corrupting or manipulating the training data around them.

For example, an article published in Nature Medicine by Daniel Alexander, et al. in 2025 found that medical LLM’s are particularly vulnerable to data poisoning attacks.

What if I were to flood the internet with fake articles about myself?

Food for thought.

Harvesting Data Leaks

The term “AI hallucination” refers to situations in which an AI output contains completely fabricated and innacurate information, a phenomenon well described everywhere else already.

Hallucinations are one of the main reasons why it is strongly advised to verify (and distrust) any reference, data point, or statement generated as facts by AI Models. After all, AI models aren’t truly thinking, or at least not yet.

That said, there are other things in which you might be able to trust AI with.

Discount product coupons, software key products, and other sensitive information like passwords are up for grabs and can (and will) accidentally leak through ChatGPT answers.

With the proper prompts, you could, in theory, extract private or sensitive information that is held somewhere within the trillion of parameters the models have been trained on.

Anecdotically, a colleague of mine managed to save some funds on her conference registration after outsmarting everyone in our group by asking ChatGPT for a discount code. It worked.

Discount found for a conference?
Figure 5. "Thank you uncle ChatGPT"

In any case, the AI slop is coming for us all, and we might be witnessing the slow death of the internet.

Perhaps in a few years we will get an “Internet 2”, while the current Internet becomes something akin the Old Net in Cyberpunk 2077, a dangerous place ravaged by rogue AI’s.

Further bits of interest

I had no idea where to fit these, but I believe they are as relevant as the rest:

Companies are increasingly relying on AI for recruitment processes, and tools like Inject-My-PDF will add invisible prompts and instructions into your CV document that only the AI can read.

Are you not a good fit for the position? No problem – Hack the AI to believe you are.

Did you also know you can host and fine-tune your own AI model from the comfort of your home? Most AI models are proprietary, but you can build one as you wish, in your own image, like God intended.

Footnote

I don’t think we talk enough about how in 2016, Tay, an artificial intelligence trained on Twitter data, got disabled within 16 hours of it being released into the public after it went insane (read: racist). Some years later, this would happen again with Amazon AI recruiting tool.

AI models can be racist and sexist; truly a man-made horror build from our own reflection.

In retrospective, it is kind of insane that companies are allowed to release these tools into the public for “live-testing”.

References

Check hyperlinks, smilege