Triggering local actions with AI

Another topic I got pretty interested in is having AI actually do something on my local machine, instead of just using it for suggestions. OpenAI introduced function calling for that quite a while ago, but that's a bit technical to use. If you'd like to join in the fun, I'll show you a few easy ways to do that in this blog post. About as easy as writing simple shell scripts, that is - you'll need some experience with that.

Besides some basics, I'll show you how to trigger local actions with my CoDeveloper GPT Engine and from the command line with the Swiss-army-knife-style chatgpt script from my ChatGPT Toolsuite, which I recently extended to support image input, chats, structured output and tools. The CoDeveloper GPT Engine works even with the free version of ChatGPT; the chatgpt script needs an API key from OpenAI - which I think is an excellent investment if you'd like to use AI from the command line. There is also some work on function calling with local LLMs (e.g. Functionary), but I haven't tried that out yet.

Illustration - instructing the AI to trigger actions on your computer

Using GPTs

One of the first ways OpenAI implemented for ChatGPT to trigger external actions was GPTs (expanding on the previous concept of ChatGPT Plugins). They offer an easy way to customize ChatGPT with instructions, knowledge and - external actions: you can register an OpenAPI description of the interface of a server, and the GPT can then call that server. The catch, however, is that you need a publicly accessible server using https for that, which your local machine normally isn't, right? But since I wanted that really badly for myself I did the work, and you can use that, too: I wrote the CoDeveloper GPT Engine, which has a plugin mechanism besides its other functions.

Using the CoDeveloper GPT Engine to trigger external actions

I developed the CoDeveloper GPT Engine mostly to support me in programming. It consists of a local server you can start on the command line in an arbitrary directory on your machine, and a GPT you have to set up in ChatGPT that connects to that server when it is running. (Setting up the https access to that server is a bit of a hassle, but I have described several ways to do it.) That way the GPT can access the files in that directory - list, read, write or modify them. Since I wanted to have it execute and fix builds, have a GPT inspect my database and run SQL queries and so on, I added the possibility to put shell scripts into a certain directory as external actions, and then I can ask the AI to execute them. The AI can pass arguments to a script as well as write something to its standard input, and gets the output fed back. To inform the AI about the available scripts, I usually put a special script listActions.sh there that scans all the scripts in that directory for a comment Plugin Action: that describes the script, and prints the names and these comments. So if you have the AI call that first, it knows what's available.
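To make the idea behind listActions.sh concrete, here is a minimal sketch of such a scan, written in Python instead of shell. It is not the actual script: the function name list_actions is made up for illustration, and it assumes the description sits in a shell comment of the form "# Plugin Action: ..." on a single line.

```python
import os
import re
import tempfile

# Assumption: the description is a one-line shell comment like
#   # Plugin Action: runs the build and prints the log
MARKER = re.compile(r"#\s*Plugin Action:\s*(.*)")


def list_actions(directory):
    """Return (script name, description) pairs for all files carrying the marker comment."""
    actions = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8", errors="replace") as script:
            for line in script:
                match = MARKER.search(line)
                if match:
                    # Only the first marker line of each script is used.
                    actions.append((name, match.group(1).strip()))
                    break
    return actions


if __name__ == "__main__":
    # Demo on a throwaway directory with one made-up script.
    with tempfile.TemporaryDirectory() as demo_dir:
        with open(os.path.join(demo_dir, "build.sh"), "w", encoding="utf-8") as f:
            f.write("#!/bin/sh\n# Plugin Action: runs the build and prints the log\nmake\n")
        for name, description in list_actions(demo_dir):
            print(f"{name}: {description}")
```

Files without the marker are simply skipped, so helper files in the same directory are not advertised to the AI.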

So, if you could use its capabilities to read and modify text files anyway, or are a programmer, that could be a nice solution for you. It'll need a bit of setup, though - see the website.

A bit of background: the OpenAI chat completion API

If you don't want to do everything from the ChatGPT chat interface, the Chat Completion API probably is the way to go. At its core it is a very simple API where you carry out a conversation with the AI by sending it the conversation so far, e.g.:

{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is the capital of France?"},
{"role": "assistant", "content": "The capital of France is Paris."},
{"role": "user", "content": "What is the capital of England?"}

and you will get back a response like:

{ "role": "assistant", "content": "The capital of England is London." }

If you want to carry on with that chat, you add that response and the user's next message to the request and send it again. It's a bit more complicated than that - there are various additional parameters, but for most applications you won't need most of them, except perhaps the model parameter to select a different model and the "max_tokens" parameter to limit the length of the response. That's all that's needed even for complex applications like my Composum AI or my AI Code Generation Pipeline.
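As a rough sketch of that request cycle, using only the Python standard library: the helper names build_request, send and continue_chat are made up for this example, and the model name is just an assumption you would replace with whatever model you want to use. The endpoint, the Bearer authorization header and the choices[0].message path in the response are those of the chat completions REST interface.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"


def build_request(messages, model="gpt-4o-mini", max_tokens=None):
    """Build the JSON payload: the conversation so far plus a few optional parameters."""
    payload = {"model": model, "messages": list(messages)}
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    return payload


def send(payload, api_key):
    """POST the payload and return the assistant message of the first choice."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]


def continue_chat(messages, assistant_message, next_user_text):
    """The API is stateless: the next request repeats everything and adds the new turn."""
    return messages + [assistant_message, {"role": "user", "content": next_user_text}]


if __name__ == "__main__":
    key = os.environ.get("OPENAI_API_KEY")
    if key:  # only talk to the API if a key is configured
        chat = [{"role": "user", "content": "What is the capital of France?"}]
        answer = send(build_request(chat, max_tokens=100), key)
        print(answer["content"])
        chat = continue_chat(chat, answer, "What is the capital of England?")
        print(send(build_request(chat, max_tokens=100), key)["content"])
```

Note that continue_chat returns a new list instead of modifying the old one, which makes it easy to branch a conversation or retry a turn.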

That's also how the chatgpt script started out quite a while ago. Just try chatgpt make a Haiku about the weather or try chatgpt -cr to have a conversation. To try the further examples in this blog, you'll need to either grab a copy of the ChatGPT Toolsuite and add its bin directory to your path, or link the programs you need into a directory in your path. Of course you can also download the script individually, though it's more difficult to get updates this way and some of its functionality (e.g. dictating prompts) relies on other scripts in the suite. You'll need an API key from OpenAI and have to put it either into the environment variable OPENAI_API_KEY or into a file ~/.openai-api-key.txt .

Using structured output

Now, if you want to trigger an action on your local machine, there are more facets to that - e.g. you have to make sure that the AI returns something these local actions can process, since - being built for a chat interface - the GPT models tend to give explanations and comments in addition to what you request from them. OpenAI provides a feature called structured output, where you supply a JSON schema that describes the intended output. Creating a JSON schema sounds a bit complicated, but often isn't these days - you can often just describe what you need to ChatGPT and have it generate one.

An example using this approach is the script findprogram, which might be worth a look. It uses chatgpt to search the path / Applications on your local machine in a two-step process. First, it gives the machine description to the AI and asks it to print a list of names of programs that could help with the user's task. It passes a simple JSON schema to the AI using the -rf argument of the chatgpt script:

{
  "type": "object",
  "properties": {
    "programs": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": [ "programs" ],
  "additionalProperties": false
}

This ensures easily parseable JSON output, from which the program names are extracted with jq -r '.programs[]' . Then it searches for programs matching any of the returned patterns and gives all found programs to ChatGPT, asking it to print the best choices. Please also compare my blog entry about using structured output from the command line.
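For comparison, here is a small sketch of how the same schema could be used when calling the API directly. The helper names response_format and extract_programs and the schema name "programs" are made up for this example; the response_format structure is the one the Chat Completion API expects for structured output, and extract_programs does roughly what the jq call does.

```python
import json

# The schema from above, as a Python dictionary.
PROGRAMS_SCHEMA = {
    "type": "object",
    "properties": {
        "programs": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["programs"],
    "additionalProperties": False,
}


def response_format(schema, name="programs"):
    """Wrap a JSON schema as the response_format parameter of a chat completion request."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": True, "schema": schema},
    }


def extract_programs(reply_content):
    """Rough equivalent of jq -r '.programs[]': parse the reply and return the names."""
    data = json.loads(reply_content)
    programs = data["programs"]
    if not isinstance(programs, list) or not all(isinstance(p, str) for p in programs):
        raise ValueError("reply does not match the expected schema")
    return programs


if __name__ == "__main__":
    # A made-up reply, shaped the way the schema forces the model to answer.
    for program in extract_programs('{"programs": ["ffmpeg", "convert"]}'):
        print(program)
```

The extra type check is cheap insurance: with strict structured output the reply should always match the schema, but the script that consumes it does not have to rely on that.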

Using function / tool calling

A big step forward was, however, the introduction of function calling. The big difference here is that you don't control the whole workflow from outside of the AI; instead, the AI itself can decide whether to trigger one or more actions, and which action with which parameters. Basically, the AI can make a plan and trigger the actions itself. OpenAI likely introduced that for the ChatGPT plugins / GPTs mentioned above, but it is a part of the OpenAI Chat Completion API, so you can use it yourself.

I won't go into the gory details here - you can find them in the API documentation - but it basically works like this. In each call to the API you can pass a list of tools, that is, descriptions of functions the AI can trigger. The AI can either ignore them and output text for the user, or create a message with tool_calls containing one or more calls, each a JSON snippet that gives the name of the tool to call and the arguments. The application's job is now to execute those calls and send a new chat request that contains the conversation up to that point (including the message with the tool calls) and, for each tool call, a tool message that contains the result of that call. The AI can then continue by generating the resulting text for the user, or decide to make another round of tool calls.
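One such round can be sketched as follows. This shows only the application's side of the protocol and is not how any of my tools implement it: the function name run_tool_calls and the implementations dictionary are made up for this example. The message shapes (tool_calls with id and function, arguments as a JSON string, and the answering tool messages with tool_call_id) are those of the Chat Completion API.

```python
import json


def run_tool_calls(messages, assistant_message, implementations):
    """Execute all tool calls of an assistant message and return the extended conversation.

    implementations maps tool names to Python callables (a stand-in for real tools).
    """
    # The assistant message with the tool calls becomes part of the conversation ...
    updated = messages + [assistant_message]
    for call in assistant_message.get("tool_calls") or []:
        function = call["function"]
        arguments = json.loads(function["arguments"])  # arguments arrive as a JSON string
        handler = implementations.get(function["name"])
        if handler is None:
            result = f"error: unknown tool {function['name']}"
        else:
            result = handler(**arguments)
        # ... followed by one tool message per call, linked to it by the call id.
        updated.append(
            {"role": "tool", "tool_call_id": call["id"], "content": str(result)}
        )
    return updated


if __name__ == "__main__":
    # A made-up assistant message, shaped like the ones the API returns.
    assistant = {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_1",
                "type": "function",
                "function": {"name": "add", "arguments": '{"a": 2, "b": 3}'},
            }
        ],
    }
    chat = [{"role": "user", "content": "What is 2 + 3?"}]
    for message in run_tool_calls(chat, assistant, {"add": lambda a, b: a + b}):
        print(message["role"])
```

The returned list is what goes into the next request; the application repeats this until the AI answers with plain text instead of further tool calls.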

That's probably what does the work behind the scenes of the OpenAI GPTs (and thus of my CoDeveloper GPT Engine mentioned above). You can use this yourself via the chat completions REST interface, following the specification, or use libraries that wrap it (the core module of my Composum AI also does that job, for example). But I've also made something that makes this as easy as creating shell scripts: you can use tools from the chatgpt script.

Giving the AI tools with the chatgpt script

For quick experiments or varying daily jobs that are not common enough to be worth creating an actual application, I added a mechanism to my Swiss-army-knife-style chatgpt script that gives the AI access to command line tools. This needs either a tools configuration file that lists some tools and how they are to be called (which makes sense especially if the tool is an external program you cannot change, though writing a wrapper shell script might be an alternative, or if you want to define a whole set of tools at once), or each tool can output its own description if called with the special argument --openaitoolsconfig. For both testing and demonstration purposes, the ChatGPT Toolsuite contains a couple of example tools. Let's have a look at the tool writefile.sh, which outputs this description when called with --openaitoolsconfig:

{
  "function": {
    "name": "writefilesh",
    "description": "Overwrites the named file with the given content.",
    "parameters": {
      "type": "object",
      "properties": {
        "filename": {
          "type": "string",
          "description": "relative path to the file to write wrt the current working directory - it's not allowed to write files outside of the current working directory"
        },
        "content": {
          "type": "string",
          "description": "The content the file will be overwritten with"
        }
      },
      "required": [ "filename", "content" ],
      "additionalProperties": false
    },
    "strict": true
  },
  "commandline": [ "./writefile.sh", "$filename" ],
  "stdin": "$content"
}

The function part is basically what OpenAI expects as a tool description in the Chat Completion interface that is used in the end. It defines the description of the tool (used by the model) and the parameters the AI has to provide. But there is additional information on how to call the program: commandline contains the name of the script and can contain references to the parameters (like $filename here), and stdin can be given if the tool expects input from stdin - here the actual content for the file. The path to the program is given relative to the location of the tools config file if you use a separate config file, or relative to the program if --openaitoolsconfig was used. Of course, you can also have an array of several tool descriptions instead of only one.
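To illustrate how such a description can be turned into an actual program call, here is a minimal sketch. It is my guess at a simple way to do it, not the code of the chatgpt script: the function names substitute, build_invocation and run_tool are made up, and it assumes that a reference like $filename always makes up a whole entry of commandline or the whole stdin value.

```python
import subprocess


def substitute(template, arguments):
    """Replace a whole-string reference like "$filename" by the argument value."""
    if template.startswith("$") and template[1:] in arguments:
        return str(arguments[template[1:]])
    return template  # anything else is passed through verbatim


def build_invocation(tool_config, arguments):
    """Return (argv, stdin text) for a tool description and the arguments from the AI."""
    argv = [substitute(part, arguments) for part in tool_config["commandline"]]
    stdin_template = tool_config.get("stdin")
    stdin_text = None if stdin_template is None else substitute(stdin_template, arguments)
    return argv, stdin_text


def run_tool(tool_config, arguments, cwd=None):
    """Run the tool and return its standard output, which goes back to the AI."""
    argv, stdin_text = build_invocation(tool_config, arguments)
    completed = subprocess.run(
        argv, input=stdin_text, capture_output=True, text=True, cwd=cwd
    )
    return completed.stdout


if __name__ == "__main__":
    config = {"commandline": ["./writefile.sh", "$filename"], "stdin": "$content"}
    print(build_invocation(config, {"filename": "notes.txt", "content": "hello"}))
```

Passing the arguments as a list to subprocess.run, instead of assembling one command string for a shell, means that a filename containing spaces or shell metacharacters reaches the script as a single argument and is never interpreted by a shell.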

If you are developing a tool description for yourself: the OpenAI playground gives a nice way to test the syntax of such descriptions - just add a function and try to call it in the chat.

The OpenAI assistants API

Worth mentioning: OpenAI also provides the assistants API, which goes a step further and encapsulates the whole chat. While the chat completion API is stateless, the assistants API saves the current state of the chat, so that only the difference (the last user, tool or assistant message) is sent in each step. That means a different billing strategy, and also allows more functionality: it is possible to upload documents the AI can use as background knowledge (likely using an internal RAG), and the chat can be arbitrarily long (which likely means that an internal RAG mechanism will access old messages when the chat becomes very long).

Conclusion

If you search for "AI agents" you'll find more and more tools where an AI can trigger some actions. But usually these are pre-made actions in the cloud. As a developer I'm more interested in having an AI help me with my daily tasks, and having an AI call little scripts is a nice low-barrier way to do that. I hope I could give you some ideas on how you can easily do that. Please tell me what you did, what you want to do and how it worked!