Dictation tools using OpenAI API

I increasingly like using dictation for texts. Not that quite basic stuff you get on your handy or the medium quality MacOS dictation, but I'm quite impressed by the speech to text service of OpenAI. Unfortunatly I didn't find good tools to use it - so, again, it was time to roll my own. This blog describes a couple of them - perhaps there is something for you, too.

The Dictation App

One of the neat features of the speech to text API is that you can add a "prompt" to the audio, which is in this case a bit of a misnomer, as it doesn't contain instructions but rather an example that guides the model - in case of a dictation that ill usually just be the preceding text. That was a main idea behind my Dictation App . It is a simple web page that gives you a text area and a "Dictate" button, which works in a "push to talk" mode - you can set the cursor in any place in the text area, press the button, speak, and release the button. The audio will then be transmitted to the API, and the text will be inserted at the cursor position. That gives a nice flow creating short texts. THere is, of course, an "Undo", and as a spice up there is a "Fixup" button that will pass the text through the chat completion API with the instruction to reformat the text, correct obvious errors and perhaps carry out any correction instructions in there. (If you care - in dictation.js method fixupText there is the prompt that does this.) Of cource that needs copying and pasting to / from that app, but as long as I don't find an app that can do that, e.g., in a text editor, that 's a nice extension. BTW: of course part of this blog has been dictated using this app.

Dictation in the Composum AI

Of course I put that way of dictation also in our Composum AI. That's a plugin for Adobe AEM or Composum Pages that supports editing texts or generating them from external sources according to prompts. Here the prompts can often speed up things and on prompts any speech recognition errors are likely not that important as they would be in actual texts for the webpages.

Dictation on the command line

My ChatGPT tools provide many OpenAI services on the command line, among them chatgpttranscription to transcribe audio files, chatgptdictate, which will record audio until a key is pressed and then transcribes it and writes it to stdout. I added to the chatgpt script, a swiss knife type tool which you can access the OpenAI Chat Completion API including submitting files etc., with the argument -pa. That triggers chatgptdictate and thus will allow you to dictate part of the prompt, if you like.

Since ChatGPT is quite chatty in its normal personality, I really liked the nice idea on to create a command line tool for quick questions. So, of course I implemented this with my ChatGPT tool as well (q and as a nice twist integrated chatgptdictate into this, so that you can also dictate the prompt: qa . That forwards all additional arguments to chatgpt and will thus allow you for instance to add files which you want to process with the prompt you dictate. So, if you want a quick description of the logo of the AI Generation Pipeline , you can type:

qa -iu https://aigenpipeline.stoerr.net/images/AIGenPipelineLogo-square.png

and then say "describe this picture please", press a key and it will tell you something like

The image features colorful cartoon robots lined up on a conveyor-like structure, holding papers. It includes the text "AI Generation Pipeline." The background is blue and circular.

So, have fun using dictation with OpenAI. It's very functional, and easy to integrate into your own applications! If you have any comments, questions or suggestions, please contact me!