2024-08-27 Dictation tools using OpenAI API
Dictation tools using OpenAI API
I increasingly like using dictation for texts. Not that quite basic stuff you get on your handy or the medium quality MacOS dictation, but I'm quite impressed by the speech to text service of OpenAI. Unfortunatly I didn't find good tools to use it - so, again, it was time to roll my own. This blog describes a couple of them - perhaps there is something for you, too.
The Dictation App
One of the neat features of the speech to text API is that you can add a "prompt" to the audio, which is in this case a
bit of a misnomer, as it doesn't contain instructions but rather an example that guides the model - in case of a
dictation that ill usually just be the preceding text. That was a main idea behind my
Dictation App .
It is a simple web page that gives you a text area and a "Dictate" button, which works in a "push to talk" mode - you
can set the cursor in any place in the text area, press the button, speak, and release the button. The audio will then
be transmitted to the API, and the text will be inserted at the cursor position. That gives a nice flow creating short
texts. THere is, of course, an "Undo", and as a spice up there is a "Fixup" button that will pass the text through
the chat completion API with the instruction to reformat the text, correct obvious errors and perhaps carry out any
correction instructions in there. (If you care - in
dictation.js
method fixupText
there is the prompt that does this.) Of cource that needs copying and pasting to / from that app, but
as long as I don't find an app that can do that, e.g., in a text editor, that 's a nice extension.
BTW: of course part of this blog has been dictated using this app.
Dictation in the Composum AI
Of course I put that way of dictation also in our Composum AI. That's a plugin for Adobe AEM or Composum Pages that supports editing texts or generating them from external sources according to prompts. Here the prompts can often speed up things and on prompts any speech recognition errors are likely not that important as they would be in actual texts for the webpages.
Dictation on the command line
My ChatGPT tools provide many OpenAI services on the command line, among them
chatgpttranscription
to transcribe audio files, chatgptdictate
, which will record audio until a key is pressed and
then transcribes it and writes it to stdout. I added to the
chatgpt script,
a swiss knife type tool which you can access the OpenAI Chat Completion API including submitting files etc., with the argument
-pa. That triggers chatgptdictate
and thus will allow you to dictate part of the prompt, if you like.
Since ChatGPT is quite chatty in its normal personality, I really liked the nice idea on to create a
command line tool for quick questions. So, of course I implemented this with my ChatGPT tool as well
(q and as a nice twist integrated chatgptdictate
into this, so that
you can also dictate the prompt:
qa . That forwards all additional arguments to chatgpt
and will
thus allow you for instance to add files which you want to process with the prompt you dictate. So, if you want a quick description of the
logo of the
AI Generation Pipeline , you can type:
qa -iu https://aigenpipeline.stoerr.net/images/AIGenPipelineLogo-square.png
and then say "describe this picture please", press a key and it will tell you something like
The image features colorful cartoon robots lined up on a conveyor-like structure, holding papers. It includes the text "AI Generation Pipeline." The background is blue and circular.
So, have fun using dictation with OpenAI. It's very functional, and easy to integrate into your own applications! If you have any comments, questions or suggestions, please contact me!