Non-English datasets
Finding a dataset to train your model is often the most challenging part of a project. Sometimes a suitable dataset exists, but in a different language; in that case, translation can make it usable for your task. There are several ways to translate a dataset:
- Ask someone you know who speaks the language
- Employ a specialist translation company
- Use an online translation service (e.g. Google Translate) either through the GUI or via an API
Clearly, the first two are the preferred options; however, they come at a cost in effort, time, and money. The third is also a good choice, especially when there is a lot of data to translate. It should be used with care, however, because (as we will see) translation services have their own nuances, and each can produce different results.
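As a rough illustration of the third option, the sketch below shows one way a dataset could be passed through a translation service programmatically. It is a minimal sketch, not any particular service's API: `translate_dataset`, `translate_fn`, and `toy_translate` are hypothetical names introduced here, and `toy_translate` is a stand-in glossary lookup where a real service call would go. Keeping the translator as a plain function argument makes it easy to swap services and compare their output on the same rows.

```python
from typing import Callable, Dict, Iterable, List


def translate_dataset(
    texts: Iterable[str],
    translate_fn: Callable[[str], str],
) -> List[str]:
    """Translate each text, calling translate_fn once per distinct string."""
    cache: Dict[str, str] = {}  # avoids paying twice for repeated rows
    translated: List[str] = []
    for text in texts:
        if text not in cache:
            cache[text] = translate_fn(text)
        translated.append(cache[text])
    return translated


def toy_translate(text: str) -> str:
    """Hypothetical stand-in for a real service call (tiny fixed glossary)."""
    glossary = {"bonjour": "hello", "merci": "thank you"}
    # Unknown words are returned unchanged rather than guessed.
    return glossary.get(text.lower(), text)


if __name__ == "__main__":
    rows = ["Bonjour", "merci", "Bonjour"]
    print(translate_dataset(rows, toy_translate))
```

To try a real service, replace `toy_translate` with a function that wraps that service's client; the rest of the sketch stays the same. The cache matters mainly for large datasets with many repeated strings, where each call to a service may cost time or money.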
There are lots of different translation services available...