Note: This article was updated in May 2024.
Recent advancements in AI, such as Google Gemini, open up new possibilities for Flutter app localization, offering accurate translations with contextual understanding. Despite this progress, many projects still grapple with time-consuming processes, especially those using only ARB files for translations.
To address this gap, we at our Flutter app development agency created the arb_translate package. It automates the addition of missing translations using Google's Gemini API or OpenAI's ChatGPT, streamlining localization for Flutter apps. Read on to discover how we implemented it and how you can integrate it into your projects.
Until recently, if we wanted to provide a localized experience for our app users, we had only two options available: hiring a translator or using machine translation. Neither was ideal. Human translation can be costly and may slow down the release cycle, especially when coordinating multiple translators across languages. Machine translation, on the other hand, often produces inaccurate results that ignore the application context.
The situation has recently changed with the advancement of LLMs (large language models) such as OpenAI ChatGPT and Google Gemini. These models are now capable of understanding context and providing decent translations almost instantaneously.
New AI translation features have been swiftly introduced to many TMS (translation management software) services; however, not all have implemented them yet. Additionally, not all projects use TMS to store translations. This is particularly true for small or personal projects that rely on ARB files stored in the repository as the source of translations. Using AI translation in such projects requires time-consuming manual copying of new messages back and forth between ARB files and an AI chat. This is the problem arb_translate aims to solve by automatically adding missing translations to ARB files. Generated translations can then be imported back to the TMS service for review and manual tweaks if necessary.
As the AI model powering arb_translate, we chose Google's Gemini LLM: Google had recently released a Dart SDK for Gemini models, presenting a great opportunity to explore its capabilities.
arb_translate has been designed to seamlessly integrate with Flutter apps using flutter_localizations for code generation from ARB files. Thanks to this integration, the setup process for arb_translate can be completed in just a few steps:
1. Generate your Gemini API token by following https://ai.google.dev/tutorials/setup
2. Install arb_translate by running the following command:
$ dart pub global activate arb_translate
3. Save your API token in the environment variable ARB_TRANSLATE_API_KEY by executing the command:
$ export ARB_TRANSLATE_API_KEY={your-api-key}
4. Run arb_translate in your Flutter project:
$ arb_translate
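Because arb_translate integrates with flutter_localizations, it reads the same l10n.yaml file that flutter gen-l10n uses. A typical minimal configuration might look like this (the paths and file names are illustrative):

```yaml
# l10n.yaml - read by both flutter gen-l10n and arb_translate
arb-dir: lib/l10n
template-arb-file: app_en.arb
output-localization-file: app_localizations.dart
```

With this in place, arb_translate discovers the ARB directory and template file on its own, so no extra arguments are needed for the basic workflow.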
The integration process becomes slightly more complex if you are located in a region where the Gemini API is not yet available. Refer to the arb_translate package description for details.
Update: Google released the Gemini API in the EU, UK, and Switzerland during Google I/O 2024, so using Vertex AI in those regions is no longer necessary.
Update: Version 1.0.0 of arb_translate supports OpenAI ChatGPT models.
Setup for ChatGPT requires slightly different steps:
1. Generate your ChatGPT API token here https://platform.openai.com/api-keys
2. Install arb_translate by running the following command:
$ dart pub global activate arb_translate
3. Save your API token in the environment variable ARB_TRANSLATE_API_KEY by executing the command:
$ export ARB_TRANSLATE_API_KEY={your-api-key}
4. Select OpenAI as the model provider. To do so, add arb-translate-model-provider: open-ai to your l10n.yaml file or pass the command-line argument --model-provider open-ai
5. Run arb_translate in your Flutter project:
$ arb_translate
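Putting the provider selection from step 4 into l10n.yaml might look like this (paths and file names are illustrative):

```yaml
# l10n.yaml - selecting OpenAI as the model provider
arb-dir: lib/l10n
template-arb-file: app_en.arb
arb-translate-model-provider: open-ai
```

Keeping the provider in l10n.yaml rather than on the command line means every team member and CI job runs arb_translate with the same configuration.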
Implementing a tool like arb_translate using an AI language model isn't particularly difficult. There are a few problems that need to be solved:
1. Reading and writing the ARB files
2. Communicating with the AI model
3. Creating prompts that result in good translations
4. Handling unexpected results from the model
The ARB format is based on JSON and isn’t particularly complex; however, working with it still requires some logic for extracting locale from file names and parsing placeholders, especially those including plurals. While we could write this logic from scratch, there is no need since the Flutter team has already implemented it in flutter_localizations. By reusing this implementation, we not only avoid the need to maintain it in the future, but we can also ensure that arb_translate handles ARB files consistently with how flutter_localizations handles them.
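To illustrate what this parsing logic has to handle, here is a small ARB template file with a plain message and a plural message, each carrying metadata (the message names and texts are made up for this example):

```json
{
  "@@locale": "en",
  "pageTitle": "Inbox",
  "@pageTitle": {
    "description": "Title shown on the main inbox screen"
  },
  "newMessages": "{count, plural, =0{No new messages} one{1 new message} other{{count} new messages}}",
  "@newMessages": {
    "description": "Unread message counter",
    "placeholders": {
      "count": {"type": "int"}
    }
  }
}
```

The locale comes from the file name (e.g. app_en.arb) or the @@locale key, and the plural message uses ICU notation, which is exactly the syntax that the flutter_localizations parser understands.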
There is only one caveat: flutter_localizations was originally meant to be used only internally by the Flutter gen-l10n command. Because of that, we need to use some not-so-pretty hacks to make it work outside the Flutter command environment.
Communication with the Gemini API is very straightforward, especially using the recently released google_generative_ai package. It's a Dart client for Google's AI models: all we need is an API key and a few lines of code to send a prompt to the model and receive an answer.
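As a minimal sketch of what this communication looks like with the google_generative_ai package (the model name and prompt here are illustrative, not taken from arb_translate's source):

```dart
import 'dart:io';

import 'package:google_generative_ai/google_generative_ai.dart';

Future<void> main() async {
  // API key taken from the environment, as in arb_translate's setup.
  final apiKey = Platform.environment['ARB_TRANSLATE_API_KEY']!;
  final model = GenerativeModel(model: 'gemini-pro', apiKey: apiKey);

  // Send a single prompt and print the model's text response.
  final response = await model.generateContent(
    [Content.text('Translate "Hello world" to Polish.')],
  );
  print(response.text);
}
```

That really is the whole round trip: construct a model, send content, read the text back.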
However, when implementing a tool that needs to work with a variety of inputs, we need to take into consideration the limitations of the model and API we are working with.
Like any other model, Gemini has limits on the amount of data it can receive as input and return as output. In the case of Gemini Pro, these limits are 32768 tokens of input and 2048 tokens of output. For our purposes, the lower of the two limits is more critical because translation tasks typically involve similar amounts of input and output data. To ensure that our translation results fit within the limit, we need to split our data into batches and translate them separately.
The next challenge is deciding the batch size. The limits are expressed in tokens, and a single token roughly represents 4 characters. While the Gemini API provides a method to calculate the token count of content, the translation result and its token count are impossible to predict. Therefore, the only approach is to estimate the output with some margin for error.
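A batching strategy along these lines can be sketched as follows. This is an illustration of the idea rather than arb_translate's actual code: the function name, the 4-characters-per-token estimate, and the safety margin are all assumptions for the example.

```dart
/// Splits ARB messages into batches whose estimated token count stays
/// below the model's output limit, with a safety margin.
///
/// Assumes roughly 4 characters per token; the margin accounts for the
/// unpredictable size of the translated output.
List<List<MapEntry<String, String>>> batchMessages(
  Map<String, String> messages, {
  int maxOutputTokens = 2048,
  double safetyMargin = 0.5,
}) {
  final tokenBudget = (maxOutputTokens * safetyMargin).floor();
  final batches = <List<MapEntry<String, String>>>[];
  var current = <MapEntry<String, String>>[];
  var currentTokens = 0;

  for (final entry in messages.entries) {
    // Rough estimate: one token per ~4 characters of key plus value.
    final estimate = (entry.key.length + entry.value.length) ~/ 4 + 1;
    if (current.isNotEmpty && currentTokens + estimate > tokenBudget) {
      batches.add(current);
      current = <MapEntry<String, String>>[];
      currentTokens = 0;
    }
    current.add(entry);
    currentTokens += estimate;
  }
  if (current.isNotEmpty) batches.add(current);
  return batches;
}
```

The margin is deliberately generous: underestimating the output and hitting the token limit mid-response costs a full retry, while smaller batches only cost a few extra requests.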
Each batch's translation can take a few seconds, so we send batches in parallel to expedite the process. However, another limitation of the Gemini API comes into play here: a rate limit of 60 requests per minute. To stay within this limit, we have to cap the number of parallel requests and be prepared for occasional rate-limiting errors.
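One simple way to cap parallelism is to process batches in fixed-size windows. The sketch below assumes a hypothetical translateBatch function that sends one batch to the model and returns its translations; the window size of 5 is an arbitrary example value, not arb_translate's actual setting.

```dart
/// Translates batches a few at a time to respect the API rate limit.
/// `translateBatch` is a hypothetical function that sends one batch
/// to the model and returns its translated content.
Future<List<String>> translateAll(
  List<String> batches,
  Future<String> Function(String batch) translateBatch, {
  int maxConcurrent = 5,
}) async {
  final results = <String>[];
  for (var i = 0; i < batches.length; i += maxConcurrent) {
    final window = batches.skip(i).take(maxConcurrent);
    // Await each window of requests before starting the next one.
    results.addAll(await Future.wait(window.map(translateBatch)));
  }
  return results;
}
```

A windowed approach is coarser than a true request pool, but it keeps the code short and makes the worst-case request rate easy to reason about.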
AI translation offers two crucial advantages over previous machine translation tools: understanding the translation context and the ability to generate correct forms for messages with plurals. To benefit from these abilities, we must prepare a prompt that clearly articulates our expectations.
Ensuring the model generates correct forms for plural instances is quite simple with state-of-the-art models such as Gemini. To achieve this, we can include instructions in our prompt to add the necessary plural forms in the desired notation, which is ICU in the case of ARB format. This approach results in the model generating correct forms most of the time. However, translations may sometimes include invalid ICU categories, which is why we implemented validation, as described in the next section. This process could potentially be improved by explicitly providing the model with the correct ICU categories for a given language, which we will explore in the future.
Two types of context can enhance translation quality: the context of a single message in the ARB file and the application context, which can be provided in the arb_translate configuration.
Each message in the ARB template file can have additional metadata, including technical data related to plurals and a message description. Because the model can utilize all this information to provide more accurate translations, we send messages along with their metadata.
The second type of context is the application context, which allows the model to use terminology specific to our application's domain and to select the correct interpretation when several are plausible. This becomes particularly crucial with homonyms, words with different meanings depending on the context. "Bat" is an example of a homonym in English: it can refer to either an animal or a piece of sporting equipment. Without additional context, Gemini translates "Bat" as the animal. However, when we provide context such as "Sporting goods store application", the word is translated appropriately, although some inconsistencies may still occur.
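To make this concrete, a prompt combining both kinds of context could look roughly like the following. This is an illustrative sketch, not the verbatim prompt arb_translate uses:

```
Translate the following ARB messages from en to pl.
Application context: "Sporting goods store application".
Respond with raw ARB (JSON) content only, without any extra text.

{"bat": "Bat", "@bat": {"description": "Product category name"}}
```

Here the application context line steers the model toward the sporting-equipment sense of "Bat", and the message description gives it additional per-message information to work with.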
Given the nondeterministic nature of LLMs, we need to be prepared for malformed or unexpected results. Our tool expects the translation output to consist solely of raw ARB data. However, on occasion, the model may return additional text alongside the ARB data, or data that is otherwise malformed.
The ARB format is based on JSON, so the initial step in result processing involves parsing it as JSON. If JSON parsing fails, we simply retry the translation. In the vast majority of cases, on the second attempt, we receive the raw JSON data response that we expect. While it may require more attempts in some cases, we can almost always get the correct result before reaching our retry limit.
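The parse-and-retry loop can be sketched like this. The requestTranslation callback is a hypothetical stand-in for the call that returns the model's raw response text, and the retry limit of 3 is an example value:

```dart
import 'dart:convert';

/// Retries a translation call until the response parses as JSON,
/// up to [maxRetries] attempts. `requestTranslation` is a hypothetical
/// function that returns the model's raw response text.
Future<Map<String, Object?>> translateWithRetry(
  Future<String> Function() requestTranslation, {
  int maxRetries = 3,
}) async {
  for (var attempt = 0; attempt < maxRetries; attempt++) {
    final response = await requestTranslation();
    try {
      // ARB is JSON-based, so a successful parse is the first
      // validation step before any ICU syntax checks.
      return jsonDecode(response) as Map<String, Object?>;
    } on FormatException {
      // Malformed response: ask the model again.
    }
  }
  throw StateError('Failed to get valid JSON after $maxRetries attempts');
}
```

Because the model is nondeterministic, simply re-sending the same request is usually enough to get a clean response on the next attempt.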
One of the most significant advantages of LLM translation over previous machine translation solutions is its understanding of the ICU plural notation used in ARB format, enabling it to generate correct translation plural forms for other languages. However, while plural forms are typically formatted correctly in the ICU notation, there are instances when the model may hallucinate and produce nonexistent plural categories. To prevent invalid outputs from being written to ARB files, all results from the model are first parsed using a parser from flutter_localizations. This ensures that the output is at least syntactically correct.
The number of AI translation tools is increasing. While AI translation features provided by dedicated TMS services may often be the right choice, they may not be suitable for all projects. In such cases, our arb_translate tool powered by Gemini AI may be a good option to streamline the localization process of your Flutter app. It also serves as a proof of concept of how the Gemini API can be used in a Dart package to leverage modern LLM capabilities.
Try our AI localization tool for Flutter: the arb_translate package is available on pub.dev and GitHub.
Don't forget to give us feedback and star our package.