Time to run Goose
The next thing on my list to try out is Goose.
Goose is defined as “a local AI agent”. One of the use cases I keep seeing is using it to develop, change, and understand codebases. In this example I’m going to focus on writing some code from nothing. I’ll spend some time later on using it to help understand and change existing codebases, but as you’ll see, starting from zero has been a trip.
I’ll start out by noting I installed Goose in a virtual machine. I tried to use a local container, but I kept getting super weird errors I couldn’t google for, so a VM it is. I’ve heard too many horror stories about agents doing terrible things to the host system. The model Goose suggests by default is qwen3-coder:30b, so that’s what I’ll use. Here’s the prompt I used (yes I know it’s a bad prompt, bad prompts are part of the journey).
Create me a python application that runs as a REST API. I want the API to allow me to read or write a file. There should also be an API that can list all the files.
The application should use python virtual environments. The API should be served with uvicorn on port 8080.
After a few seconds of thinking, the model’s first tool call was to create a todo list from that prompt. I mean, I guess that’s not the worst place to start.
Todo Write content:
To do:
- Create project directory and virtual environment
- Install FastAPI and uvicorn
- Create the main API application
- Implement the file listing, reading, and writing endpoints
- Configure to run on port 8080
- Test the API
So that’s something. I mean, it could be worse; it didn’t delete my home directory :)
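For reference, the todo list amounts to a pretty short set of manual commands. This is just a sketch of what the agent was planning; the directory name and `main.py` filename are my own choices:

```shell
# Roughly what the model's todo list does by hand
mkdir -p file-api && cd file-api

# Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install the web framework and server
pip install fastapi uvicorn

# Serve the app (assumes the code lives in main.py) on port 8080
uvicorn main:app --port 8080
```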
I tried many, many things, but I just couldn’t get the qwen3-coder model to work the way I was hoping while writing this blog post, and it kept failing in different ways. The nondeterminism of these things is one of my pet peeves. My favorite failure was when qwen3-coder kept trying to write files into the /repo/ directory. This is fixable with a better prompt, of course: if I specified the exact directory to use, it would at least write things in the correct place.
After some research it sounds like a more general model like gpt-oss:120b is better for this sort of agent workflow. So I’ll switch to that. The coding models are good at coding, but apparently not as good at tool calling.
I should also add I threw this prompt at Claude as the model backend for Goose. It did the right thing without any trouble (which should not surprise anyone really).
Tool calling?
Yeah, there’s a thing that happens in this universe called tool calling. It’s how the LLM interacts with your environment (like deleting your home directory). There’s a way to add templates to existing models to give them the knowledge to call tools. At some future point I need to understand what that means.
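To make that concrete, here’s a sketch of what tool calling looks like on the wire, using the common OpenAI-style function schema that agent frameworks pass to the model. The `write_file` tool name and its parameters are illustrative, not Goose’s actual tool definitions:

```python
import json

# A tool definition the agent advertises to the model. The model never runs
# anything itself; it just learns that this function exists and what
# arguments it takes.
write_file_tool = {
    "type": "function",
    "function": {
        "name": "write_file",
        "description": "Write text content to a file at the given path",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string"},
                "content": {"type": "string"},
            },
            "required": ["path", "content"],
        },
    },
}

# Instead of replying with prose, the model emits a structured call like
# this, and the agent executes it against your environment (this is the
# part that can delete your home directory).
tool_call = {
    "name": "write_file",
    "arguments": json.dumps({"path": "app/main.py", "content": "# hello"}),
}

args = json.loads(tool_call["arguments"])
print(tool_call["name"], args["path"])
```

The “templates” mentioned above are chat templates that teach a model how to format exactly this kind of structured output, which is why a model can be great at writing code but bad at emitting well-formed tool calls.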
I want to point out that by fumbling around with terrible prompts and random models, I learned about things I didn’t know about, in this case tool calling. This is why I try to look like an idiot when I work on these things. I don’t know what I don’t know. Yet. By acting dumb on purpose I get to fail really fast.
The results
So when I fed my prompt to gpt-oss:120b it mostly did what you would expect. It created a very lame application that serves some file data via REST. It did what I asked, which, given how bad that prompt was, isn’t too bad. I’ll paste the code below. It’s not the best code ever written, and the upload doesn’t quite work right (I might be holding it wrong), but whatever, it’s close enough for this experiment.
I should note I burned through over a million tokens while working on all this in less than an hour. It’s pretty clear that using an agent lights a token fire.
I’ll be curious to see how this handles much larger and more complex prompts. I have a suspicion that for a large coding project, the giant LLM companies will be unbeatable. Which is fine. I’m more interested in using this to parse lots of text than I am to write code (I like to write code).
The code
from fastapi import FastAPI, HTTPException, UploadFile, File
import os
from typing import List

app = FastAPI()

# Base directory for file storage (relative to this script)
BASE_DIR = os.path.join(os.path.dirname(__file__), "data")
# Ensure the directory exists
os.makedirs(BASE_DIR, exist_ok=True)

from pydantic import BaseModel

class FileInfo(BaseModel):
    filename: str
    size: int

@app.get("/files", response_model=List[FileInfo])
async def list_files():
    """Return a list of files with their sizes stored in the data directory."""
    try:
        files = []
        for f in os.listdir(BASE_DIR):
            fp = os.path.join(BASE_DIR, f)
            if os.path.isfile(fp):
                info = FileInfo(filename=f, size=os.path.getsize(fp))
                files.append(info)
        return files
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/files/{filename}")
async def read_file(filename: str):
    """Read the contents of a file and return as plain text."""
    file_path = os.path.join(BASE_DIR, filename)
    if not os.path.isfile(file_path):
        raise HTTPException(status_code=404, detail="File not found")
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            content = f.read()
        return {"filename": filename, "content": content}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.post("/files/{filename}")
async def write_file(filename: str, content: str):
    """Create or overwrite a file with the provided content."""
    file_path = os.path.join(BASE_DIR, filename)
    try:
        with open(file_path, "w", encoding="utf-8") as f:
            f.write(content)
        return {"filename": filename, "detail": "File written successfully"}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
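My guess at why the write endpoint feels off: in FastAPI, a bare `content: str` parameter that doesn’t appear in the route path and isn’t a Pydantic model is treated as a query parameter, not a request body. So writing a file means stuffing the whole content into the URL, which is awkward for anything non-trivial. A quick sketch of what the client side has to do (the `notes.txt` filename is just an example):

```python
from urllib.parse import urlencode

# Because `content` is a query parameter, the file contents ride along
# in the URL rather than in a request body.
base = "http://localhost:8080/files/notes.txt"
params = urlencode({"content": "hello world"})
url = f"{base}?{params}"
print(url)  # http://localhost:8080/files/notes.txt?content=hello+world
```

The usual fix is to declare a small Pydantic model with a `content` field and accept it as the request body, which is probably what a better prompt would have coaxed out of the model.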