Implementation ideas for dealing with AI translation

I currently have the pleasure of supporting the translation efforts for a multinational site with many languages through AI tooling: I extended the Composum AI to support automatic translation. I implemented quite a number of interesting ideas there, which I'd like to share with you in this blog.

We wanted to use large language models (LLMs), currently the OpenAI models with the Chat Completions API, since OpenAI has excellent support for many languages. Using an LLM also means that you can give instructions for the translation, like setting tone and style or requiring specific translations for certain words or phrases. This goes beyond the capabilities of traditional translation tools.

The automatic translation is a feature of the Composum AI that integrates a new way of doing translation into Adobe Experience Manager (AEM). If you happen to know AEM: instead of using language copies, it integrates into the live copy mechanism, so that rolling out the live copy automagically translates the page as well. If you don't know AEM, that doesn't matter much for this blog: I'll be talking about the translation implementation and will discuss the points I find interesting enough that you might reuse them if you are doing something similar.

Finding attributes to translate

In AEM a page is composed of a number of components. Each can have subcomponents and a number of attributes, which might be technical values or actual texts (either rich text or plain text) that need to be translated. The first task is to identify the text attributes among all attributes. Usually you'd have to configure translation rules enumerating all properties. We found that a heuristic works very well here: if an attribute contains, say, two separate whitespace sequences, then it's very likely a text to be translated, and if it doesn't, it's very likely a technical attribute. There are also a number of attributes like text, title, jcr:title and jcr:description that are always texts, and it's of course possible to configure exceptions (both blacklists and whitelists) for the heuristic. Since the result of an automatic translation always has to be inspected before publishing, it is pretty safe to use this heuristic approach and configure exceptions if needed, which proved to be very rare.
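The heuristic described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the actual Composum AI code: the function name `looks_translatable`, the two exception sets and the default threshold of two whitespace runs are made up for the example.

```python
import re

# Attribute names that are always treated as text (the examples named in this post).
ALWAYS_TEXT = {"text", "title", "jcr:title", "jcr:description"}

# Hypothetical exception lists; in a real setup these would come from configuration.
FORCE_TRANSLATE = set()   # whitelist: translate even if the heuristic says no
NEVER_TRANSLATE = set()   # blacklist: never translate, whatever the value looks like

WHITESPACE_RUN = re.compile(r"\s+")


def looks_translatable(name: str, value: str, min_whitespace_runs: int = 2) -> bool:
    """Guess whether an attribute holds human-readable text that should be translated."""
    if name in NEVER_TRANSLATE:
        return False
    if name in FORCE_TRANSLATE or name in ALWAYS_TEXT:
        return bool(value.strip())
    # Count separate whitespace sequences inside the value (leading/trailing ones ignored).
    return len(WHITESPACE_RUN.findall(value.strip())) >= min_whitespace_runs
```

With these assumptions a value like "A short teaser text" is classified as text, while "myapp/components/teaser" or "/content/site/en/home" are treated as technical attributes.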

Translating a whole page at a time

One very important argument for using an LLM for translation is that it's possible to translate a whole page at a time, giving the LLM the possibility to see the connections between the texts of the page - which might play an important role in choosing the right translation. You'll however need a way to put the text attributes of all components together into one text, and to separate out the individual attributes again after it's translated. I did that by introducing separators like this:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 757238 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
About translating a whole page at a time
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 865743 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
We introduce separators like this and instruct the model to copy over this kind of separator, and also to leave HTML tags untouched to enable richtext.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 424242 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

These are different enough from normal text that the model doesn't try to translate them somehow, but they also do not disturb the text flow much. The numbers are random, so it's easy to check whether the LLM went through everything, or whether it aborted or got caught in a loop, as it sometimes does for very long texts. In that case we apply divide and conquer - the text is split into two halves and the translation is retried for each half.
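To make the joining and splitting concrete, here is a minimal Python sketch of the separator technique. It is not the actual Composum AI code; the names `join_texts` and `split_texts`, the six-digit ids and the length of the percent-sign run are assumptions chosen to resemble the example above.

```python
import random
import re

BAR = "%" * 32  # run of percent signs; the exact length is an arbitrary choice here
SEPARATOR = re.compile(re.escape(BAR) + r" (\d{6}) " + re.escape(BAR))


def join_texts(texts, rng=random):
    """Join the texts into one block, with a random numbered separator before
    each text and one after the last. Returns (block, ids)."""
    ids = [str(n) for n in rng.sample(range(100_000, 1_000_000), len(texts) + 1)]
    parts = []
    for sep_id, text in zip(ids, texts):
        parts.append(f"{BAR} {sep_id} {BAR}")
        parts.append(text)
    parts.append(f"{BAR} {ids[-1]} {BAR}")
    return "\n".join(parts), ids


def split_texts(translated, ids):
    """Split the translated block back into individual texts.

    Returns None if the separators in the output are not exactly the expected
    ids in the expected order (model aborted, looped, or dropped a separator).
    """
    # With one capture group, re.split yields [before, id1, text1, id2, ..., idN, after].
    pieces = SEPARATOR.split(translated)
    if pieces[1::2] != ids:
        return None
    # Surrounding whitespace is stripped, since it only stems from the joining.
    return [piece.strip() for piece in pieces[2:-1:2]]
```

If `split_texts` returns None, the caller can split the list of texts into two halves and translate each half separately, as described above.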

An alternative would have been to use JSON somehow, but that would have put a considerably bigger cognitive load on the model, which might degrade the translation quality. Some experiments would be interesting here, though.

Additional checks

Besides the check mentioned above that exactly the numbered separators present in the input are also present in the translation, we also check whether all URLs present in the input are present in the output, since the LLM sometimes tries to translate paths in URLs despite being instructed not to. If one or more URLs or relative URLs (content paths) found in the input with regular expressions are missing from the output, we also apply divide and conquer to reduce the cognitive load, and retry the translation with additional explicit instructions not to translate these URLs.
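The URL check and the divide-and-conquer retry could look roughly like the following Python sketch. Everything here is an assumption for illustration: the regular expression is deliberately rough, `translate` stands for whatever function sends the texts to the LLM (returning None when the separator check fails), and the single extra retry for an unsplittable text is my own choice.

```python
import re
from typing import Callable, List, Optional

# Rough patterns assumed for this sketch: absolute http(s) URLs and /content/... paths.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+|/content/[^\s\"'<>]+")

Translator = Callable[[List[str], str], Optional[List[str]]]


def missing_urls(source: str, translated: str) -> List[str]:
    """URLs and content paths that occur in the source but not in the translation."""
    return [url for url in URL_PATTERN.findall(source) if url not in translated]


def translate_checked(texts: List[str], translate: Translator,
                      instructions: str = "", single_retries: int = 1) -> List[str]:
    """Translate texts; on a failed check, split into halves and retry."""
    result = translate(texts, instructions)
    lost: List[str] = []
    if result is not None and len(result) == len(texts):
        lost = [u for src, out in zip(texts, result) for u in missing_urls(src, out)]
        if not lost:
            return result
    if lost:
        # Make the instruction explicit for the retry.
        instructions += "\nDo not translate these URLs: " + ", ".join(lost)
    if len(texts) == 1:
        # Cannot split further in this sketch: retry a limited number of times.
        if single_retries <= 0:
            raise ValueError("translation keeps failing the checks")
        return translate_checked(texts, translate, instructions, single_retries - 1)
    mid = len(texts) // 2
    return (translate_checked(texts[:mid], translate, instructions)
            + translate_checked(texts[mid:], translate, instructions))
```

The recursion always terminates, because each call either shortens the list of texts or uses up one of the remaining retries.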

Translation rules

The other important advantage of using an LLM for translation is that it's possible to give it additional instructions for the translation. That can be things like tone and style or how formal to be (for instance whether to use the German informal "du" or the formal "Sie" to translate "you"), but also how exactly to translate e.g. product names or technical terms. Since there could be a lot of such rules, it's necessary to restrict them to certain pages: it's possible either to set a regular expression for the page path, or to set a word / phrase / regular expression that has to occur in the page in order for the rule's instructions to be added to the translation instructions for the page. (In AEM those rules can be set with Sling Context Aware Configuration.)

In case you have some kind of dictionary that should be used for translation, you'd probably get too many rules from it to create them manually. Thus it's also possible to use a spreadsheet where one column contains a word or phrase in the source language and another the word or phrase in the target language. If the source word / phrase occurs in the page, an appropriate instruction Translate "X" to "Y" is added.
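As an illustration of how rules and dictionary entries might be selected for a page, here is a small Python sketch. It is not how Composum AI stores or evaluates its rules: the `Rule` class, the function name and the case-insensitive word-boundary matching are assumptions, and the dictionary is simply a list of (source, target) pairs as they might be read from a spreadsheet export.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Rule:
    instruction: str
    path_regex: Optional[str] = None     # if set, the page path has to match
    content_regex: Optional[str] = None  # if set, the page text has to match


def instructions_for_page(path: str, page_text: str, rules: Iterable[Rule],
                          dictionary: Iterable[Tuple[str, str]]) -> List[str]:
    """Collect only those instructions that are relevant for this page."""
    selected = []
    for rule in rules:
        if rule.path_regex and not re.search(rule.path_regex, path):
            continue
        if rule.content_regex and not re.search(rule.content_regex, page_text, re.IGNORECASE):
            continue
        selected.append(rule.instruction)
    for source_term, target_term in dictionary:
        # Only add a dictionary instruction if the term actually occurs in the page.
        pattern = r"\b" + re.escape(source_term) + r"\b"
        if re.search(pattern, page_text, re.IGNORECASE):
            selected.append(f'Translate "{source_term}" to "{target_term}"')
    return selected
```

Filtering this way keeps the prompt short: a page only carries the instructions that can actually affect its translation.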

Handling retranslation

The basic idea is that the page is translated, checked and possibly manually improved. To change texts in our AEM implementation, it is necessary to cancel the live copy inheritance for the component, which completely exempts it from future retranslations when the source text changes. If the inheritance is not cancelled, the text is simply overwritten with a new translation whenever the component's source text changes.

For the retranslation of components with cancelled inheritance, that is, with manually revised text, we are currently implementing a feature that lists all such component texts of a page. For each one it shows the current source text, the automatic translation of that source text and the current text, and allows you to edit the current text. To support that, there will be a button that does what I call differential retranslation: merging the changes from the source text into the current text. If you'd like to have a look, here is the request that is sent to the OpenAI chat completion API for that:

{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "system",
      "content": "You are tasked as an expert translator to translate texts with utmost fidelity, preserving the original style, tone, sentiment, and all formatting elements (markdown, HTML tags, special characters) to the greatest extent possible.\nIMPORTANT: Only provide the translated text, maintaining all original formatting and non-translatable elements. Avoid any extraneous comments or actions not directly related to the translation."
    },
    { "role": "user", "content": "Print the original text you have to translate exactly without any comments." },
    { "role": "assistant", "content": "Our Contributors" },
    { "role": "user", "content": "Translate this text into German" },
    { "role": "assistant", "content": "Unsere Mitwirkenden" },
    { "role": "user", "content": "Print this original text as it was manually adapted." },
    { "role": "assistant", "content": "Unsere Autoren" },
    { "role": "user", "content": "Print the new text that is to be translated." },
    { "role": "assistant", "content": "Meet our contributors" },
    { "role": "user", "content": "Translate the new text. Take care to include the manual adaptions for the original text." }
  ]
}

There are two interesting points about that. First, this creates a made-up "conversation": it uses the chat completion API to simulate a chat that is structured in such a way that the logical next answer of the AI is exactly what I need - a kind of "conversation engineering". Second, it uses what I call the "put it in the AI's mouth" pattern for this fake conversation: it is structured so that the actual texts are in AI (assistant) messages instead of user messages, which makes it considerably less likely that the AI will interpret these texts as instructions, reducing prompt injection risks.
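Building such a fake conversation is straightforward. The following Python sketch assembles the message list of the request shown above from its four variable texts; the function and parameter names are my own for this example, and the system prompt is passed in rather than repeated here.

```python
from typing import Dict, List


def differential_retranslation_messages(system_prompt: str,
                                        original_source: str,
                                        auto_translation: str,
                                        revised_translation: str,
                                        new_source: str,
                                        target_language: str) -> List[Dict[str, str]]:
    """Build the simulated chat. All page texts go into assistant messages,
    so the model is less likely to read them as instructions."""

    def user(content: str) -> Dict[str, str]:
        return {"role": "user", "content": content}

    def assistant(content: str) -> Dict[str, str]:
        return {"role": "assistant", "content": content}

    return [
        {"role": "system", "content": system_prompt},
        user("Print the original text you have to translate exactly without any comments."),
        assistant(original_source),
        user(f"Translate this text into {target_language}"),
        assistant(auto_translation),
        user("Print this original text as it was manually adapted."),
        assistant(revised_translation),
        user("Print the new text that is to be translated."),
        assistant(new_source),
        user("Translate the new text. Take care to include the manual adaptions for the original text."),
    ]
```

The returned list can be used as the "messages" field of the request body; the model's next assistant message is then the merged translation.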

Conclusion

I enjoyed working on this project a lot, since it required finding (almost inventing) and implementing interesting ideas, and I am looking forward to implementing further enhancements to the translation. I hope you enjoyed reading about it. Please contact me if you have any questions or suggestions! In case you're working with AEM: since the Composum AI is open source and free for you to use, please make sure you try it and tell me about your experiences!