Limiting the Chance of Code Agent Prompt Injections

Yesterday, I wrote about the Lethal Trifecta when using coding agents and how I am escaping it via sandboxing. I built a place to code where there is nothing valuable to lose. The agents might be poisoned by prompt injection and able to phone home, but there’s nothing to send. I can wipe the entire VM at any time and rebuild it from a snapshot or from scratch easily.

This deals with one leg of the trifecta, which is sufficient, but I don’t ignore the other two.

To limit the chance of an agent being exposed to a prompt injection, I build on an architecture of very limited dependencies. My current project is to build visualizations in JS on D3. I only include D3 on pages in the browser (it’s not on my machine). I don’t use npm, and I have no other dependencies.

The thing I miss most is jest, but I decided to build a minimal testing framework (just need to run functions and make assertions). I run the tests in a browser, so I get access to a DOM too, which I could test against. All of the code for this project only makes sense inside of a web page in the browser, which is another sandbox. It’s like Inception up in here.
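For illustration, a harness that only runs functions and makes assertions can be very small. The sketch below is an assumption about its general shape, not the actual code from this project; the names `test`, `assertEqual`, and `runTests` are hypothetical.

```javascript
// Minimal test harness sketch. All names here (test, assertEqual, runTests)
// are hypothetical and chosen for illustration only.
const tests = [];

// Register a named test function to run later.
function test(name, fn) {
  tests.push({ name, fn });
}

// Throw if the two values are not strictly equal.
function assertEqual(actual, expected, message = "") {
  if (actual !== expected) {
    throw new Error(`${message} expected ${expected}, got ${actual}`.trim());
  }
}

// Run every registered test and collect pass/fail results.
function runTests() {
  const results = [];
  for (const { name, fn } of tests) {
    try {
      fn();
      results.push({ name, passed: true });
    } catch (err) {
      results.push({ name, passed: false, error: err.message });
    }
  }
  return results;
}

test("adds numbers", () => assertEqual(1 + 2, 3));
test("fails on purpose", () => assertEqual("a", "b"));

// Only register the DOM test when a document exists (i.e. in a browser).
if (typeof document !== "undefined") {
  test("DOM is available", () => {
    const el = document.createElement("div");
    el.textContent = "hi";
    assertEqual(el.textContent, "hi");
  });
}

const results = runTests();
for (const r of results) {
  console.log(`${r.passed ? "PASS" : "FAIL"} ${r.name}${r.error ? ": " + r.error : ""}`);
}
```

Loaded from a plain `<script>` tag, something like this needs no package manager at all, which is the point of the exercise.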

My other projects are Python-based and live in their own VM. I need some dependencies there (pandas, numpy, matplotlib and more). The main thing I am doing is keeping that separate from the visualization project so that any issue in one doesn’t affect the other.

Nothing else that I need for the project (that I didn’t create) lives in that VM.

My main exposure to untrusted text is that I let the agent browse the web. I don’t see how I could avoid this, which is why this leg of the trifecta could never be the one I eliminate.

Escaping the Lethal Trifecta of AI Agents

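The “Lethal Trifecta” is a term coined by Simon Willison that posits that you are open to an attacker stealing your data using your own AI agent if that agent has:

  1. Access to your private data
  2. Exposure to untrusted content
  3. The ability to communicate externally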

You need all three to be vulnerable, but usage of Claw or Coding agents will have them by default. I would say that the second two are almost impossible to stop.

#2 Untrusted content includes all of your incoming email and messages, all documents you didn’t write, all packages you have downloaded (via pip, npm, or whatever) and every web page you let the agent read. I have no idea how to make an agent useful without some of these (especially web searching).

#3 External communication includes any API call you let it make, embedded images in responses, or just letting it read the web. Even if you whitelist domains, agents have found ways to piggyback communication because many URLs/APIs have a way of embedding a follow-up URL inside of them.
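To make the piggybacking problem concrete, here is a small sketch of why a host-only allowlist is not enough. The domains, the `next` parameter, and the function names are all made up for illustration, and the stricter check only catches one obvious pattern (a full URL in a query value); it is not a complete defense.

```javascript
// Hypothetical allowlist; these domain names are invented for the example.
const ALLOWED_HOSTS = new Set(["docs.example.com", "api.example.com"]);

// Naive check: only looks at the host of the outer URL.
function isHostAllowed(rawUrl) {
  try {
    return ALLOWED_HOSTS.has(new URL(rawUrl).hostname);
  } catch {
    return false; // not a parseable absolute URL
  }
}

// Extra check: list query values that are themselves http(s) URLs
// pointing at hosts outside the allowlist.
function embeddedDisallowedUrls(rawUrl) {
  const found = [];
  for (const value of new URL(rawUrl).searchParams.values()) {
    let inner;
    try {
      inner = new URL(value);
    } catch {
      continue; // value is not an absolute URL
    }
    const isWeb = inner.protocol === "http:" || inner.protocol === "https:";
    if (isWeb && !ALLOWED_HOSTS.has(inner.hostname)) {
      found.push(inner.href);
    }
  }
  return found;
}

// An allowed host that carries a follow-up URL (and data) in its query string.
const sneaky =
  "https://docs.example.com/redirect?next=https%3A%2F%2Fevil.example.net%2Fc%3Fd%3Dsecret";

console.log(isHostAllowed(sneaky));          // the naive check passes
console.log(embeddedDisallowedUrls(sneaky)); // the embedded URL is flagged
```

Even with the extra check, data can still be smuggled in paths, subdomains, or encoded parameters, which is why this leg can be reduced but not eliminated.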

For my uses, I find it impossible to avoid these two. Reduce? Yes, but not eliminate.

So, my only chance to escape the trifecta is to not give agents access to my private data. This means that I would never let an agent process my email or messages. I also would never run them on my personal laptop. I would never let them log in as me to a service.

This is why I built hardware and software sandboxes to code in. Inside a VM on a dedicated machine, there is no private data at all. I use it while assuming that all code inside that VM is untrusted and that my agent is compromised. I do my best to try to make sure that won’t happen, but my main concern is that there is no harm if it does happen.

Incidentally, this same lethal trifecta also applies to every package you install into your coding projects. If an NPM package (1) can read your secrets, (2) is untrusted, and (3) can communicate, then you may suffer from a supply chain attack. It’s obvious that code you install and run makes #2 and #3 impossible to safeguard against. Not having secrets in the VM is the best solution for supply chain attacks too.

Tomorrow, I’ll follow up with how I reduce the other two legs of the lethal trifecta.

Write While True Episode 52: Using Feedback

Lou: Hey, Brian. I wanted to start this episode by doing a little bit of a follow up to episode 48, where we talked about starting a collaboration. One of the things we ended with was doing a simple collaboration by just getting feedback on something you’re writing. And I wanted to talk to you about your thoughts on what to do with feedback.

Brian: Yeah, let’s do it, for sure. And to begin, the assumptions here are that you’re working on a piece of writing that you intend to iterate on. You’re going to revise this writing, you’re going to improve it. It might be a book, newsletter, blog post, something you care enough about to spend time on after you first put it out into the world for feedback.

Transcript

Dev Stack, Part XI: Sandboxing

Late last year, I completely changed my dev stack to Python on Linux with some other things. I wrote a series about it at the time:

My choices were driven by the dangers of AI Coding Agents and Supply Chain attacks (more generally, just running untrusted code).

Getting all development off of my main machine was a big step. Choosing Linux for that machine was driven by cost per computing power for a desktop machine, and that I only need to run VSCode, a browser, and dev tools that are Linux first anyway.

I have been programming on the bare OS, but I was always going to want more isolation between projects and between the projects and the machine. I finally completed that step.

My choice was to use QEMU-KVM, an open-source VM solution. This blog about QEMU-KVM on Ubuntu was the most useful (and accurate) for me.

My general setup:

  1. The machine only has Ubuntu, Firefox, Tailscale (see networking), and my KVM setup described above.
  2. I built one VM to work on a new project (charting visualizations for Google Sheets), which only needs Ubuntu, VSCode, Git, and Firefox.
  3. This project is in JavaScript, but I am building it with a dependency on D3 and nothing else. No NPM, not even jest. D3 is only loaded by the browser (not on the machine).
  4. For testing, I am building a minimal test harness in JS. It runs in the browser, so it will also be able to do DOM testing.
  5. There is no firewall yet, but I will probably do that soon. As a first step, just limiting the ports. I will document that if I go that way. It would be inside the VM.
  6. I allow some limited logged-in browsing in my outside OS, mostly ChatGPT, but not Google. The main OS is for research. Nothing else can be installed on it (through any means, even trusted). The VM browsers are only for using my software (not the internet).

Other solutions I considered:

  1. Cloud-based programming (like Codespaces): This would definitely work for some projects I have, but I feel like I’d run up against limitations. Long-term, I think this will become the only sane way to program.
  2. Docker: I am not that comfortable with it, and it seems like running GUIs (like VSCode) is not trivial. It would be more efficient with sharing installed software, but wasting disk space is just not an issue.
  3. No Sandbox: Just putting all development on a dedicated computer is probably enough. I went the VM route mostly out of personal interest. Having done it, one big plus is snapshotting.

Write While True Episode 51: Phase Based Writing Goals

Brian: I’m building up my ability to do that by spending my mornings and my first sip of coffee, learning this framework. And at the same time, you got to watch out because learning a JavaScript framework is not writing per se. So that’s what I’ve been up to. And it’s felt productive and like it works toward my ultimate goal, my ultimate writing focused goal, but I haven’t been writing. So I’ve mixed feelings about that. 

And I guess I want to say, Lou, you went through the whole process of blogging for a long time and then taking on a book project or multiple book projects, writing the book, editing the books. Now you’re in the phase where you have to be marketing the book. I hope you’re marketing the book. Talk to me about writing goals in different phases of projects.

Transcript

Write While True Episode 50: Habit Troubleshooting

Brian: I’m Brian Hall. And today we’re talking again about habits. In the last episode, we both set out some intentions to change our writing habits and establish a daily practice. Let’s check in on that. Lou, how did it go for you?

Lou: Not great, Brian. I did the habit none times.

Brian: Okay, not great. Remind us what you were going to do and tell us what happened.

Transcript

Write While True Episode 49: Tiny Habits

Brian: And so just getting yourself in position and getting started in this way tends to lead to more powerful, robust output. I guess I’ve reached the point where just by starting consistently every morning, I was cranking out a blog post a day for quite a while. And I guess that’s the other thing I’ll say too, is habits can come and go. You can lose a habit and it’s probably not helpful to beat yourself up about that because you can also regain them. If you’re in the middle of a sprint, you might write every single morning and you might produce quite a bit. And then you might finish that book or reach the end of that series of blog posts. And maybe you abandon your habit. It’s there to be reclaimed, I guess.

Transcript

26 for 26 March Update

At the beginning of the year I wrote down 26 fun mini-goals to try to do in 2026. I gave an update at the end of January and here’s another one with an update since then.

  1. I’ve been doing well with my vegan recipe books and I also found a great new recipe online for Sopa Locro de Papa that connects me to my Ecuadorian roots.
  2. I’ve donated all of the books I don’t want to keep (way more than 10), so that goal is done.
  3. I went to a new restaurant for Valentine’s Day in NYC.
  4. I appeared on the Thinking in Tech podcast where I talked about tech debt and AI.
  5. I released episode 48 of Write While True.
  6. I wrote one new Amazon Book Review for Forever Fit [ad] by Maxime Sigouin.
  7. I started a new open-source project to package up some visualizations I have made in D3 and want to use in the Google Sheet I ship with my book, Swimming in Tech Debt.

If you want to see the sheet that I use to manage tech debt and get free emails on how to use it, sign up below:

Finding My First Open Source Contribution

I keep track of my GitHub open source contributions on this site’s GitHub page, but only back to 2013. According to GitHub, I opened my account in late 2010 to open a couple of issues on Yammer.net, which I was using to build an internal tool for Atalasoft that needed access to our Yammer data (Yammer was a precursor to Slack).

My first GitHub source contribution was to YUICompressor (a JavaScript compression tool) to output a Munge Map to aid debugging. I PR’d it in 2011. I needed this to help debug Atalasoft’s JS code in production.

But that’s just GitHub. I had been posting code in other places before that. Here’s a multithreaded prime number sieve in Clojure from 2008. Here’s a port of Apple’s CPPUnit to run on Windows from 2006. I found evidence that I published a JavaScript Code39 Bar Code Generator on my Atalasoft blog in 2008, which also has a Code39 web app based on it (which hosts the JS code). I have a lot of code snippets on StackOverflow, but only after 2008. My first post with code was in 2003 (comparing jUnit and NUnit).

I had a distinct memory of emailing an open source dev with a multi-threaded race condition fix for a C++ data structure that we used at Spheresoft. Looking at a list of our external libraries jogged my memory that it was WFC by Samuel R. Blackburn. I also found the WFC release notes in the Wayback Machine that mention my fix. He migrated WFC to GitHub much later, but I found a comment mentioning my fix. The actual diff predates the migration, but it’s the double-checked locking directly below the comment:

    // 1999-12-08
    // Many many thanks go to Lou Franco (lfranco@spheresoft.com)
    // for finding an bug here. In rare but recreatable situations,
    // m_AddIndex could be in an invalid state.

So, that’s 1999. Ironically, my oldest verified contribution is actually on GitHub, but predates its release by about eight years. Where’s my green square?

Before that, I have to go by memory because I can’t find the originals.

One thing that came to mind was back in college. I co-developed code for our computer center to draw plots on a Unix PC terminal (saving paper). Using that code, we also built a Unix PC driver for GNU Plot and sent it to them. I am pretty sure this was hosted on MIT’s Athena.

That would be in 1991 or so. I did some simple searches and didn’t find it, but supposedly there are FTP archives from that era, so I might try looking later.

Write While True Episode 48: Start a Collaboration

This is the beginning of season five, episode 48, and I’m going to tell you about something that we’re doing a little different now. This has always been a podcast where I, a software developer, talk to you, who I think are also software developers, about what I write, and I’m trying to share with you tips and techniques for writing for people like us that want to write.

And in this season of Write While True, we’re going to start something new.

Write While True is now a collaboration between me and another software developer who writes. His name is Brian Hall. And throughout the entirety of this run of podcasts, I’ve talked a lot about the kinds of things I write, blogs, and I wrote a book, and this podcast.

Transcript