2024-08-25 Flashcards from pictures with OpenAI API command line tools
Creating flashcards from pictures of vocabulary lists with OpenAI API command line tools
Recently I wanted to help my son with his Spanish lessons by creating flashcards for the extremely useful Anki app. He had a couple of pages of vocabulary lists in his school book. I admit a more responsible parent might have made him type the words into the program himself, since learning improves if you do it in many ways: reading, writing, speaking, listening, and so on. But since I always find it interesting to see whether I can solve a little problem quickly with AI, I thought I'd try to generate the flashcards from pictures of the lists, using my command line tools for OpenAI's services. Here is how nicely that went.
The flashcard program Anki can import flashcards from a CSV file in which each line contains a question and the desired answer. Using it is easy: there are clients for the web, phones and the desktop. When learning, you are shown the question and have to think of the answer; then you get to see the real answer and judge for yourself how well you did, which determines when you'll get that question again. That's my primary approach when I need to learn some random stuff. So I needed to turn pictures of pages that basically have columns like this into a CSV file:
con with Ana habla con Julia
¿verdad? Right? Tu eres Julia, ¿verdad?
Now, I'm a big fan of Unix command line tools, since they can often be combined in nice and quick ways to solve problems. That's why I wrote the ChatGPT tools, which contain, among other useful stuff, quite a number of little scripts that give you the power of OpenAI's API services on the command line, from calling the chat completion API to creating images from a prompt, transcribing audio files, text to speech and so on. The chatgpt script is a Swiss army knife type tool that provides much of the functionality of the chat completion API on the command line, including what I need here: it allows passing prompts and files that need processing (including images, in our case) to ChatGPT. The prompt needed a bit of experimentation, but in principle it's good old prompt engineering: say very precisely what you mean, and give some examples.
The image is a photo of a Spanish German vocabulary list.
For each vocabulary definition create two lines - one for Spanish to German and one for German to Spanish.
The tag of spanish to german is spde and the tag of german to spanish is desp.
In the image there is in most lines a spanish word or phrase followed by its german translation and sometimes
followed by a spanish example or comment where ~ is a placeholder for the spanish word / phrase.
For each such spanish word or phrase, output two lines that will be imported into the flashcard program Anki.
The first line should contain the spanish word or phrase, a | , the german translation, and,
if the example is present, a \n and the example.
The second line should contain the german translation, a | , the spanish word or phrase, and,
if the example is present, a \n and the example.
For example, for one line of the input, the output should be two lines like this:
hace sol|die Sonne Scheint \n Cuando hace sol, vamos a la playa
die Sonne Scheint|hace sol \n Cuando hace sol, vamos a la playa
Output everything without comments as a CSV file that can be imported into Anki.
Check that the translation / the examples are correct; if not print a FIXME at the end of the line.
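Because the prompt pins the output down to exactly one | per line and asks for a FIXME marker on doubtful entries, the generated files can be checked mechanically before importing them. The following is only a sketch under that assumption; the function names count_bad_lines and list_fixmes are made up for illustration and are not part of the ChatGPT tools.

```shell
#!/usr/bin/env bash
# Sketch: sanity checks for the card lines the prompt asks for.
# Both functions read the generated text on stdin.

# count_bad_lines prints how many non-empty lines do NOT contain
# exactly one "|" separator (question|answer).
count_bad_lines() {
  awk -F'|' 'NF > 0 && NF != 2 { bad++ } END { print bad + 0 }'
}

# list_fixmes prints, with line numbers, every line the model flagged
# with FIXME. The "|| true" keeps the exit status at 0 when none exist.
list_fixmes() {
  grep -n 'FIXME' || true
}

# Usage (file name is an example):
#   count_bad_lines < page1.txt
#   list_fixmes < page1.txt
```

A nonzero count usually means the model added commentary or code block markers, or split a card across lines, so that page is worth a second look.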
It's wise to give the input to ChatGPT piece by piece (in this case page by page), since it gets confused if there is too much data at once. So I ran the chatgpt script on each image, where ankiprompt.txt contains the prompt above:
for fil in *.jpg; do
  chatgpt -i "$fil" -pf ankiprompt.txt > "${fil%.jpg}.txt"
done
As you might guess, the -i option says the next argument is an image file, and the -pf option has the script read the prompt from a file.
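If a call fails halfway through a stack of pages, the plain loop leaves an empty or partial output file behind and a rerun pays for every page again. A slightly more defensive variant could look like the sketch below. It uses only the -i and -pf options shown above; the function name process_pages and the temporary .tmp suffix are assumptions for illustration, and it assumes the chatgpt script returns a nonzero exit status on failure.

```shell
#!/usr/bin/env bash
# Sketch: rerunnable version of the per-page loop.

# process_pages PROMPTFILE
# Runs the chatgpt script once per *.jpg in the current directory and
# skips pages that already have a non-empty result from an earlier run.
process_pages() {
  prompt_file=$1
  for img in *.jpg; do
    [ -e "$img" ] || continue      # no .jpg present: glob stayed literal
    out="${img%.jpg}.txt"
    [ -s "$out" ] && continue      # result exists already, skip this page
    # Write to a temporary file first so a failed call leaves no half result.
    if chatgpt -i "$img" -pf "$prompt_file" > "$out.tmp"; then
      mv "$out.tmp" "$out"
    else
      echo "failed: $img" >&2
      rm -f "$out.tmp"
    fi
  done
}

# Usage:
#   process_pages ankiprompt.txt
```

Quoting "$img" also keeps the loop working for photo names that contain spaces.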
In the end I just concatenated the files with cat *.txt > all.csv and edited the result slightly to remove the code block start and end markers ChatGPT had put in, though I could have done that with chatgptextractcodeblock, too.
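The manual cleanup can also be done with standard tools. The sketch below assumes the markers are ordinary Markdown fences, that is, lines starting with three backticks and possibly followed by a language tag such as csv; strip_fences is a made-up helper name, not part of the ChatGPT tools.

```shell
#!/usr/bin/env bash
# Sketch: remove Markdown code fence lines from the model output.

# strip_fences reads text on stdin and deletes every line that starts
# with three backticks; all other lines pass through unchanged.
strip_fences() {
  sed '/^`\{3\}/d'
}

# Usage: clean each page, then merge everything into one file.
# The prompt file is skipped because it also matches *.txt.
#   for f in *.txt; do
#     [ "$f" = ankiprompt.txt ] && continue
#     strip_fences < "$f"
#   done > all.csv
```

Note that a plain cat *.txt would also pick up ankiprompt.txt if it sits in the same directory as the results, which is why the usage example excludes it.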
BTW, if you'd like to try something like that: the chatgpt script requires an OpenAI API key in ~/.openai-api-key.txt.
If you prefer Anthropic's Claude: I had Claude rewrite it into a similar claude script, too.
The whole thing saved quite some time and was interesting to do. I increasingly use such AI-based command line tools for small tasks, though in software development projects I tend to use more serious tools like my AI Code Generation Pipeline. Have fun spicing up your command line skills with AI, too! You are welcome to use my tools, or you can of course use other tools like Simon Willison's impressive llm. If you have any questions or suggestions, please contact me!