2025-03-02 Triggering local actions with AI on your computer
Triggering local actions with AI
Another topic I got pretty interested in is having AI actually do something on my local machine, instead of just using it for suggestions. OpenAI introduced function calling for that quite a while ago, but that's a bit technical to use. If you'd like to join in the fun, I'll show you in this blog post a few easy ways to do that. About as easy as writing simple shell scripts, that is - you'll need some experience with that.
Besides some basics I'll show you how to trigger local actions with my CoDeveloper GPT Engine and from the command line
with the Swiss army knife type chatgpt script from my ChatGPT Toolsuite, which I recently extended to support image
input, chats, structured output and tools. The CoDeveloper GPT Engine will work even with the free version of ChatGPT;
the chatgpt script will need an API key from OpenAI - which is an excellent investment if you'd like to use AI from the
command line, I think. There is also some work on function calling with local LLM models (e.g. Functionary), but I
haven't tried that out yet.

Using GPTs
One of the first ways OpenAI implemented for ChatGPT to trigger external actions was GPTs (expanding on the previous concept of ChatGPT Plugins). They offer an easy way to customize ChatGPT with instructions, knowledge and external actions: you can register an OpenAPI description of the interface of a server, and the GPT can then call that server. The catch, however, is that you need a publicly accessible server using https for that, which your local machine normally isn't, right? But since I wanted that really badly for myself I did the work, and you can use it, too: I wrote the CoDeveloper GPT Engine, which has a plugin mechanism besides its other functions.
Using the CoDeveloper GPT Engine to trigger external actions
I developed the CoDeveloper GPT Engine mostly to support me in programming. It consists of a local server you can start
on the command line in an arbitrary directory on your machine, and a GPT you have to set up in ChatGPT that connects to
that server when it is running. (Setting up the https access to that server is a bit of a hassle, but I have described
several ways to do it.) That way the GPT can access the files in that directory - list, read, write or modify them.
Since I wanted to have it execute and fix builds, have a GPT inspect my database and run SQL queries and so on, I added
the possibility to put shell scripts into a certain directory as external actions, and then I can ask the AI to execute
them. The AI can pass arguments to a script as well as write something to its standard input, and gets the output fed
back. To inform the AI about the available scripts, I usually put a special script listActions.sh there that scans all
the scripts in that directory for a comment "Plugin Action:" that describes the script, and prints the names and these
comments. So if you have the AI call that first, it knows what's available.
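To make the idea behind listActions.sh more concrete, here is a minimal sketch of such a scan, written in Python instead of shell. It is not the actual script: the `*.sh` file pattern, the function name `list_actions` and the handling of the comment line are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

MARKER = "Plugin Action:"  # comment marker named in the text; exact format is assumed


def list_actions(directory):
    """Return (script name, description) pairs for scripts that carry the marker."""
    actions = []
    # Assumption: actions are the *.sh files directly inside the directory.
    for script in sorted(Path(directory).glob("*.sh")):
        for line in script.read_text(errors="replace").splitlines():
            if MARKER in line:
                # Everything after the marker is taken as the description.
                actions.append((script.name, line.split(MARKER, 1)[1].strip()))
                break  # only the first marker line per script is used
    return actions


# Demo on a throwaway directory with one hypothetical action script.
with tempfile.TemporaryDirectory() as demo_dir:
    Path(demo_dir, "build.sh").write_text(
        "#!/bin/bash\n# Plugin Action: runs the project build\nmake\n"
    )
    demo_actions = list_actions(demo_dir)

for name, description in demo_actions:
    print(f"{name}: {description}")
```

The printed list is what the AI would read before deciding which script to call.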
So, if you could use its capabilities to read and modify text files anyway, or are a programmer, that could be a nice solution for you. It'll need a bit of setup, though - see the website.
A bit of background: the OpenAI chat completion API
If you don't want to do everything from the ChatGPT chat interface, the Chat Completion API probably is the way to go. At its core it is a very simple API: you carry out a conversation with the AI by sending it the conversation so far, e.g.:
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is the capital of France?"},
{"role": "assistant", "content": "The capital of France is Paris."},
{"role": "user", "content": "What is the capital of England?"}
and you will get back a response like:
{
"role": "assistant",
"content": "The capital of England is London."
}
If you want to carry on with that chat, you add that response and the user's next message to the request and send it
again.
It's a bit more complicated than that - there are various additional parameters, but for most applications you won't
need most of them, except perhaps the model parameter to select a different model and the max_tokens parameter to limit
the length of the response. That's all that's needed even for complex applications like my Composum AI or my AI Code
Generation Pipeline.
That's also how the chatgpt script started out quite a while ago. Just try chatgpt make a Haiku about the weather
or try chatgpt -cr to have a conversation.
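The request and response cycle described above can be sketched in a few lines of Python. This only builds the JSON request body and does not call the API; the helper names `build_request` and `continue_chat` are made up for this illustration, and the model name is just a placeholder.

```python
import json


def build_request(messages, model="gpt-4o-mini", max_tokens=None):
    """Build the JSON body for one chat completion call (model name is a placeholder)."""
    body = {"model": model, "messages": messages}
    if max_tokens is not None:
        body["max_tokens"] = max_tokens  # optional limit on the response length
    return body


def continue_chat(messages, assistant_reply, next_user_message):
    """Return the history extended by the AI's reply and the user's next message."""
    return messages + [
        assistant_reply,
        {"role": "user", "content": next_user_message},
    ]


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]
# In a real program this reply would come back from the API.
reply = {"role": "assistant", "content": "The capital of France is Paris."}
history = continue_chat(history, reply, "What is the capital of England?")

print(json.dumps(build_request(history, max_tokens=100), indent=2))
```

Since the API is stateless, the whole history is sent again with every request; the program is responsible for keeping it.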
To try the further examples in this blog, you'll need to either grab a copy of the ChatGPT Toolsuite and add its bin
directory to your path, or link the programs you need into a directory in your path. Of course you can also download
the script individually, though it's more difficult to get updates this way and some of its functionality (e.g.
dictating prompts) relies on other scripts in the suite. You'll need to have an API key from OpenAI and put that either
into the environment variable OPENAI_API_KEY or into a file ~/.openai-api-key.txt.
Using structured output
Now, if you want to trigger an action on your local machine there are more facets to that - e.g. you have to make sure that the AI gives something these local actions can process, since - being made for a chat interface - the GPT models tend to give explanations and comments in addition to what you request from them. OpenAI provides a feature called structured output where you give a JSON schema that describes the intended output. Creating a JSON schema file sounds a bit complicated, but often isn't these days - you can often just describe what you need to ChatGPT and have it generate one.
An example using this approach is the script findprogram, which might be worth a look. It uses chatgpt to search the
path / Applications on your local machine in a two step process. First, it gives the machine description to the AI and
asks it to print a list of names of programs that could help with the user's task. It gives a simple JSON schema with
the argument -rf via the chatgpt script to the AI:
{
"type": "object",
"properties": {
"programs": {
"type": "array",
"items": {
"type": "string"
}
}
},
"required": [
"programs"
],
"additionalProperties": false
}
This ensures an easily parseable JSON output, from which the program names are extracted with jq -r '.programs[]'.
Then it searches for programs matching any of the returned patterns and gives all found programs to ChatGPT, asking it
to print the best choices. Please compare also my blog entry about using structured output from the command line.
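As a rough illustration of the second step, here is a Python sketch that does what jq -r '.programs[]' does and then looks for matching executables. It is not the findprogram script itself: the sample reply is made up, and matching by case-insensitive substring is an assumption.

```python
import json
import os


def extract_programs(reply_text):
    """Python counterpart of jq -r '.programs[]', plus a basic shape check."""
    programs = json.loads(reply_text)["programs"]
    if not isinstance(programs, list) or not all(isinstance(p, str) for p in programs):
        raise ValueError("reply does not match the schema")
    return programs


def find_on_path(patterns, path_dirs):
    """Return files whose name contains one of the patterns (substring match is assumed)."""
    found = set()
    for directory in path_dirs:
        if not os.path.isdir(directory):
            continue  # PATH often contains entries that do not exist
        for name in os.listdir(directory):
            if any(p.lower() in name.lower() for p in patterns):
                found.add(os.path.join(directory, name))
    return sorted(found)


sample_reply = '{"programs": ["ffmpeg", "convert"]}'  # made-up model output
patterns = extract_programs(sample_reply)
candidates = find_on_path(patterns, os.environ.get("PATH", "").split(os.pathsep))
print(candidates)
```

Because the schema forces a single object with a programs array, the parsing step needs no cleanup of explanations or code fences around the JSON.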
Using function / tool calling
A big step forward, however, was the introduction of function calling. The big difference here is that you don't control the whole workflow from outside of the AI; instead the AI itself can decide whether to trigger one or more actions, and which action with which parameters. Basically the AI can make a plan and trigger the actions itself. OpenAI likely introduced that for the ChatGPT plugins / GPTs mentioned above, but it is part of the OpenAI Chat Completion API, so you can use it yourself.
I won't go into the gory details here - you can see that in the API documentation - but it basically works like this.
In each call to the API you can give a list of tools, that is, descriptions of functions the AI can trigger. The AI has
the choice to ignore them and output text for the user, or to create a message with tool_calls containing one or more
calls, each with a JSON snippet that gives the name of the tool to call and the arguments. The application's job is now
to execute those calls and send a new chat request that contains the conversation up to that point and, for each tool
call, an added tool message that contains the result of that call. The AI can then continue with generating the
resulting text for the user, or decide to make another round of tool calls.
That's probably what is doing the work behind the scenes of the OpenAI GPTs (and thus my CoDeveloper GPT Engine I
mentioned). You can use this via the chat completions REST interface yourself, following the specification, or use
libraries that wrap it (the core module of my Composum AI also has that job, for example). But I've also made something
that makes this as easy as creating shell scripts - you can use tools from the chatgpt script.
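The tool calling round trip described in this section can be sketched as a small loop. To keep the example runnable without an API key, the API call is replaced by a stand-in function `fake_model`; the tool `get_time` and all other names are hypothetical. The message shapes (tool_calls with id and function, and tool messages with tool_call_id) follow the Chat Completion API as far as I know it.

```python
import json


def get_time(timezone):
    """Hypothetical local tool; a real one would do actual work."""
    return f"12:00 in {timezone}"


LOCAL_TOOLS = {"get_time": get_time}


def fake_model(messages):
    """Stand-in for the API call: requests a tool once, then answers in text."""
    if not any(m["role"] == "tool" for m in messages):
        return {"role": "assistant", "content": None, "tool_calls": [{
            "id": "call_1", "type": "function",
            "function": {"name": "get_time", "arguments": '{"timezone": "UTC"}'},
        }]}
    return {"role": "assistant", "content": "It is " + messages[-1]["content"] + "."}


def run_conversation(messages, model=fake_model, max_rounds=5):
    """Call the model until it answers with text instead of tool calls."""
    for _ in range(max_rounds):
        reply = model(messages)
        messages.append(reply)
        calls = reply.get("tool_calls")
        if not calls:
            return reply["content"]
        for call in calls:
            function = call["function"]
            arguments = json.loads(function["arguments"])  # arguments arrive as a JSON string
            result = LOCAL_TOOLS[function["name"]](**arguments)
            # One tool message per call, linked to the call by its id.
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": str(result)})
    raise RuntimeError("too many tool call rounds")


chat = [{"role": "user", "content": "What time is it?"}]
answer = run_conversation(chat)
print(answer)
```

The max_rounds limit is a design choice of this sketch: since the AI decides itself whether to call more tools, a cap keeps a confused model from looping forever.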
Giving the AI tools with the chatgpt script
For quick experiments or varying daily jobs that are not common enough to be worth creating an actual application, I
added a mechanism to my Swiss army knife type chatgpt script that gives the AI access to command line tools. This needs
either a tools configuration file that lists some tools and how they are to be called (which makes sense especially if
the tool is an external program you cannot change, though it might make sense to write a wrapper shell script, or if
you want to define a whole set of tools at once), or each tool can output its own description if called with the
special argument --openaitoolsconfig. Both for testing and demonstration purposes I have a couple of example tools in
the ChatGPT Toolsuite that show how that works.
Let's have a look at the tool writefile.sh, which outputs this as its --openaitoolsconfig description:
{
"function": {
"name": "writefilesh",
"description": "Overwrites the named file with the given content.",
"parameters": {
"type": "object",
"properties": {
"filename": {
"type": "string",
"description": "relative path to the file to write wrt the current working directory - it's not allowed to write files outside of the current working directory"
},
"content": {
"type": "string",
"description": "The content the file will be overwritten with"
}
},
"required": [ "filename", "content" ],
"additionalProperties": false
},
"strict": true
},
"commandline": [ "./writefile.sh", "$filename" ],
"stdin": "$content"
}
The function part is basically what OpenAI expects as tool description in the Chat Completion interface that is used in
the end. This defines the description of the tool (used by the model) and the parameters the AI has to provide. But
there is additional information on how to call the program: commandline contains the name of the script and can contain
references to the parameters (like $filename here), and stdin can be given if the tool expects input from stdin - here
the actual content for the file. The path to the program is given relative to the tools config file location if you use
a separate config file, or relative to the program if --openaitoolsconfig was used. Of course, you can also have an
array of several tool descriptions instead of only one.
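How the commandline and stdin entries could be turned into an actual program call can be sketched like this. It is a guess at the mechanism, not the code of the chatgpt script: the function names are made up, the substitution uses Python's string.Template, and the demo tool is a small Python one-liner instead of writefile.sh so that nothing gets written to disk.

```python
import subprocess
import sys
from string import Template


def expand(template, arguments):
    """Replace $name references with the argument values supplied by the AI."""
    return Template(template).substitute(arguments)


def run_tool(tool, arguments):
    """Run one tool as described by its commandline / stdin entries, return its output."""
    command = [expand(part, arguments) for part in tool["commandline"]]
    stdin_text = expand(tool["stdin"], arguments) if "stdin" in tool else None
    # No shell is involved: each list entry is passed as exactly one argument.
    completed = subprocess.run(command, input=stdin_text, capture_output=True, text=True)
    return completed.stdout


# Hypothetical demo tool: prints its first argument and whatever it reads from stdin.
demo_tool = {
    "commandline": [sys.executable, "-c",
                    "import sys; print(sys.argv[1] + ': ' + sys.stdin.read())",
                    "$filename"],
    "stdin": "$content",
}
output = run_tool(demo_tool, {"filename": "notes.txt", "content": "hello"})
print(output)
```

Passing the command as a list rather than a shell string means a filename with spaces or shell metacharacters coming from the AI stays a single harmless argument.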
If you are developing a tool description for yourself: the OpenAI playground gives a nice way to test the syntax of such descriptions - just add a function and try to call it in the chat.
The OpenAI assistants API
Just mentioning: OpenAI also provides the assistants API that goes a step further and encapsulates the whole chat. While the chat completion API is stateless, the assistants API saves the current state of the chat, so that just the difference (the last user, tools or assistant message) is sent in each step. That means a different billing strategy, and also allows more functionality: it is possible to upload documents the AI can use as background knowledge (likely using an internal RAG), and the chat can be arbitrarily long (which likely means that an internal RAG mechanism will access old messages when the chat becomes very long).
Conclusion
If you search for "AI agents" you'll find more and more tools where an AI can trigger some actions. But usually these are pre-made actions in the cloud. As a developer I'm more interested in having an AI help me in my daily tasks, and having an AI calling little scripts is a nice low barrier way to do that. I hope I could give you some ideas how you can easily do that. Please tell me what you did, want to do and how it worked!