2024-11-02 Using OpenAI structured JSON output from the command line
If you like to use AI for daily tasks from the command line, e.g. to extract data from unstructured text or even images or audio, and want to feed the results into other programs, you have to make the output machine readable. Of course, it's a well-proven technique to describe the desired output in the prompt and give some examples. Still, the AI can go off track and deviate from the intended format, and then your process stops right there. To prevent that, OpenAI provides the nifty structured output feature: you supply a JSON schema, and OpenAI makes sure the model's response matches it. (That is, by the way, a really interesting problem to solve: at each output token, it requires checking which of the tokens suggested by the LLM still have a continuation that matches the schema. That'd be a lot of fun to write, but alas way too much for a spare-time project.)
I integrated that into chatgpt, my Swiss-army-knife command line tool for OpenAI's chat completion API, so you can use structured output directly from the command line. But I didn't stop there: since writing JSON schemas is a bit of a hassle, I added some shortcuts for common use cases.
There are also many more command line tools in my ChatGPT toolsuite that you might like.
An example
As an example, let's extract the links from the slides of my AdaptTo 2024 talk about Composum AI in a machine readable format, pretending the slides weren't properly linked. We'll use multimodal input, so the first step is turning the slides into images we can submit to OpenAI. Asking suggestbash for a command line suggestion with
suggestbash split talk.pdf into individual images
suggests e.g.
pdftoppm -png talk.pdf slides
which generates the files slides-01.png to slides-31.png. From these, you can already get the links with chatgpt, using image input:
cmd=chatgpt
for fil in slides-*; do cmd="$cmd -i $fil"; done
$cmd 'print urls of links in the image; if there are no links print nothing'
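The loop above accumulates one long string and relies on the shell splitting $cmd into words again, which goes wrong as soon as a file name contains a space. A sketch of a sturdier variant, assuming bash: the hypothetical helper image_args (not part of the chatgpt tool) collects the -i options in a bash array, so every file name stays a single argument.

```shell
# Hypothetical helper (bash): fill the global array IMAGE_ARGS with
# "-i <file>" pairs for every existing file passed in.
image_args() {
  IMAGE_ARGS=()
  local f
  for f in "$@"; do
    # An unmatched glob stays as the literal pattern; skip it.
    [ -e "$f" ] || continue
    IMAGE_ARGS+=(-i "$f")
  done
}
```

Usage would then be
image_args slides-*.png
chatgpt "${IMAGE_ARGS[@]}" 'print urls of links in the image; if there are no links print nothing'
Bash sorts glob matches, so the zero-padded names slides-01.png to slides-31.png arrive in slide order.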
That works, but you might get a code block around the links, comments, and whatnot, and you'd have to dissuade the AI from that in the prompt or take cleverer measures. So let's see how to avoid that.
Using a JSON schema
Now let's take the structured output route and provide a JSON schema. Since writing one by hand would be tedious: when you choose the response format json_schema, OpenAI's playground offers an assistant that generates a schema for you. Given the description "output a list of urls", it created this one:
{
  "name": "url_list",
  "schema": {
    "type": "object",
    "properties": {
      "urls": {
        "type": "array",
        "description": "A list of URLs.",
        "items": {
          "type": "string",
          "description": "A single URL."
        }
      }
    },
    "required": [
      "urls"
    ],
    "additionalProperties": false
  },
  "strict": true
}
Now let's call chatgpt again, this time using $(printf -- '-i slides-%02d.png ' {1..31}) (kudos to ChatGPT) to generate the -i slides-01.png -i slides-02.png ... arguments, and passing the schema, saved as urlsschema.json, with -rf:
chatgpt $(printf -- '-i slides-%02d.png ' {1..31}) -rf urlsschema.json 'print urls of links in the image as JSON'
That nicely prints:
{"urls":["https://ai.composum.com","https://github.com/ist-dresden/composum-AI","https://www.composum.com/","https://www.stoerr.net/ai.html","https://github.com/ist-dresden/composum-nodes","https://www.stoerr.net/ai"]}
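The $(printf ...) trick works because printf applies its format string again to each remaining argument. Here is a small demo with three slides instead of 31; note that {1..3} is bash brace expansion, so this needs bash rather than a plain POSIX sh.

```shell
# printf reuses its format string for each remaining argument, producing one
# "-i slides-NN.png " chunk per number; %02d zero-pads to two digits,
# matching the slides-01.png ... naming from pdftoppm above.
# {1..3} is bash brace expansion for "1 2 3".
args=$(printf -- '-i slides-%02d.png ' {1..3})
printf '%s\n' "$args"
```

In the chatgpt call the substitution is left unquoted, so the shell splits it into separate words; that's fine here because the generated file names contain no spaces.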
Shortcut arguments
I added shortcuts that save you from creating a schema file for two common use cases: a simple object with some string attributes, and a list of simple objects with some string attributes. Here is an excerpt from the chatgpt -h help:
Response Options:
  -rj                [R]esponse mode JSON: model outputs a JSON object
  -rf schemafile     Structured output: requests that the [r]esponse conforms
                     to the given JSON schema read from a [f]ile.
  -ra attr1,...      Structured output [r]esponse - JSON with [a]ttributes: comma separated
                     list of attributes to include in the JSON response.
                     Alternative to -rf - creates a simple schema with these attributes
                     as string properties.
  -rar attr1,...     Structured output for JSON [r]esponse [ar]ray of objects with the
                     given attributes - e.g. for extracting a list of entities from an input.
                     Alternative to -rf and -ra , all are string properties.
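The help only says that -ra and -rar create simple schemas with string properties; the exact schema the tool generates isn't shown here. As a rough sketch, assuming OpenAI's strict mode rules (every property listed in required, additionalProperties set to false) and reusing the wrapper format of the playground output above, the hypothetical bash functions ra_schema and rar_schema below could produce comparable schema files for -rf. The wrapper property name "entries" in rar_schema is an arbitrary choice for this sketch.

```shell
# Hypothetical helper: turn "attr1,attr2" into the JSON fragments
# '"attr1":{"type":"string"},...' (PROPS) and '"attr1",...' (REQUIRED).
# Attribute names are used verbatim: no trimming, no JSON escaping.
_string_props() {
  local attr
  local -a attrs
  IFS=, read -r -a attrs <<< "$1"
  PROPS="" REQUIRED=""
  for attr in "${attrs[@]}"; do
    PROPS+="${PROPS:+,}\"$attr\":{\"type\":\"string\"}"
    REQUIRED+="${REQUIRED:+,}\"$attr\""
  done
}

# ra_schema NAME ATTRS: schema for one object with required string attributes.
ra_schema() {
  _string_props "$2"
  printf '{"name":"%s","schema":{"type":"object","properties":{%s},"required":[%s],"additionalProperties":false},"strict":true}\n' \
    "$1" "$PROPS" "$REQUIRED"
}

# rar_schema NAME ATTRS: schema for a list of such objects. The array is
# wrapped in an object property "entries", assuming structured output
# expects an object at the top level.
rar_schema() {
  _string_props "$2"
  local item="{\"type\":\"object\",\"properties\":{$PROPS},\"required\":[$REQUIRED],\"additionalProperties\":false}"
  printf '{"name":"%s","schema":{"type":"object","properties":{"entries":{"type":"array","items":%s}},"required":["entries"],"additionalProperties":false},"strict":true}\n' \
    "$1" "$item"
}
```

For instance, ra_schema talk author,title > talkschema.json followed by chatgpt -i slides-01.png -rf talkschema.json "Print talk author and talk title as JSON" should behave much like the -ra call below. With rar_schema, the list would come back wrapped as {"entries":[...]} rather than as a bare array.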
If I want, e.g., to extract the author and talk title from the first slide, I can use the -ra option:
chatgpt -i slides-01.png -ra author,title "Print talk author and talk title as JSON"
That prints
{
  "author": "Dr. Hans-Peter Störr, IST GmbH Dresden",
  "title": "Composum AI - Supporting the Content Author with LLM"
}
Under the hood, it creates a JSON schema for that object, so you don't have to bother with that. Or let's create a list of objects:
chatgpt $(printf -- '-i slides-%02d.png ' {1..31}) -rar name,description 'print the slide name and a content description as JSON'
returns
[
  {
    "name": "Slide 1",
    "description": "Introduction to the presentation and overview of the conference."
  },
  {
    "name": "Slide 2",
    "description": "Details on the talk's content, focusing on the functionalities of Composum AI."
  },
  {
    "name": "Slide 3",
    "description": "Introduction to Dr. Hans-Peter Störr and his background."
  },
  ...
]
BTW: if all the chatgpt options confuse you, how about using the built-in help feature -ha:
chatgpt -ha how can I take the prompt from prompt.mp3
Conclusion
The chatgpt tool from my ChatGPT toolsuite makes it easy to use OpenAI's structured output to get machine readable results for the tasks you want the AI to do. It opens many possibilities to combine it with other tools, in the best Unix spirit of joining small tools to do great things. Give it a try! I've been using it for extracting URLs from screenshots, categorizing bank statements, extracting information from web pages, asking ChatGPT quick questions from the command line, and much more.