Semi-Automating the DDTP translations: step by step process (contact: #ubuntu-fr-l10n or #ubuntu-translators on freenode).

Rationale

  • The DDTP represent around 50 000 strings to translate * 140 languages= 7 million strings to translate !
  • On good weeks, a typical yet active translation team translates 500 strings. It would take a hundred weeks with highly motivated volunteers of a large translation team, working non-stop, at their best to fully translate a language.

The solution

  • Delegate initial translation to Google Translate and review translations with humans to speed the process.

A Word of Caution before beginning

  • Machine translations don't replace (yet) human translations. It needs to be validated and corrected in many cases.
  • Depending on the target language, the quality of the translations will fluctuate. (e.g. French, German, Spanish…) should work well. Smaller languages might have more issues.
  • Another factor is the complexity of the strings. You'll have really good translations for simple strings and poor translations for poorly written or complex descriptions.
  • Finally, DO take time to do a mass review of the translations after the initial import. Given the size of the DDTP, a simple find and replace can save dozens of hours of translation.
  • Make sure your review team is closed and has strong standards to avoid validation of poorly translated strings by too enthusiastic members

Process

  1. 1/ Preparing the DDTP for Machine Translation
  1. 2/ Perform the Machine Translation

<REPEAT FOR EACH CHUNK OF THE FILE>

  • Import the file in Google Translator Kit : http://translate.google.com/toolkit/docupload?hl=fr
  • You might see errors:
  • the file might be too big (remove a few strings)
  • After a couple of seconds, Google Translator Kit displays the file along with automatic translations. You could theoretically review them here, but that's not the point.
  • You should fix some mistakes directly in Google Translator Kit thanks to its powerful search and replace functionnality that can be limited to translations only
  • The \n (line break) become \ n which messes everything
  • The \" become \ " (with a space) which messes everything
  • Check the po file validates in Poedit (do more fixes directly in Google Translator Kit if there are many mistakes)

</REPEAT FOR EACH CHUNK OF THE FILE>

  1. 3/ Bring back the suggestions into your drive

Download all the translated po back to your drive or your shared online folder in a specific folder (named translated to avoid messing up with original files)

  1. 4/ Do mass-Replace of obvious mistakes in each file to be able to import them back into Launchpad and also save a lot time when correcting the strings
  • DO NOT CORRECT ORIGINAL ENGLISH STRINGS: if you do so, Launchpad will consider those new strings (as now they differ) and thus discard the string AND your translation.
  • Use Gedit syntax coloring to quickly spot mistakes
  • Use RegEx to correct as many mistakes as possible before initial import into the Bogus Project. The Time Savings are HUGE.
  • Make sure that the PO file validates (using PoEdit)
  • It seems that Google Translate adds spaces after \ .
  • The \n (line break) become \ n which messes everything
  • The \" become \ " (with a space) which messes everything
  • The syntaxic coloration in gEdit highlights the mistakes well.
  • It might create bugs during the launchpad import as "" is a string delimiter.
  • Strings that you should also be careful for (Global)
  • Any word in English in the translated part
  • The beginning of your string should be capitalized (Paquet, Interface…)
  • Consistency in similar strings
  • Punctuation and Typography rules (spaces..)
  • Typos in the original strings should be reported as bugs against the DDTP (https://bugs.launchpad.net/ddtp-ubuntu/+filebug)
  • Strings that you should be careful for (French specific)
  1. 5/ Import the PO files back in our bogus project in Launchpad
  1. 6/ Review Process
  • Go to the DDTP, the imported translations in DDTP automation bogus project should appear as suggestions
  • Ask for forum members to do a first review for obvious mistakes and make suggestions
  • Then, team members can make the actual review and validate the translation

Additional Advice

Bring back the validated translations in the Bogus Project

  • To simplify mass corrections, regularly export the official DDTP in their normal form and reimport it in the Bogus project.
  • The manually reviewed suggestions will replace the automatic suggestions, reducing time to mass correct mistake and removing the previous automated suggestions.

Recycling fuzzy translations from older releases (contact Redmar for more details)

  • When you merge the translations of an older version of Ubuntu into the current version, there will be a lot of « fuzzy » translations for strings that are similar (for example, meta packages for different programs, debugging symbols etc).
  • msgmerge quantal_ddtp.po raring_ddtp.po -o merged_ddtp.po, for example
  • Those fuzzy strings often only need a few small changes (eg. program name) to be accepted, which can really speed up translations. And you don't have to worry about google putting in a weird translation, since it is all based on earlier translations done by a human.
  • You have to unfuzzy them using Find & Replace in a text editor. Remove the line with #, fuzzy. Save. Upload.

#. type: newglossaryentry{#2} #: frontmatter/glossary-entries.tex :15 #, fuzzy msgid "bla bla english" msgstr "bla bla translated"

Status, Results, Todo, Bugs & Questions

  • Results
  • Todo & Bugs
  • Create a DDTP automation LP team and invite coordinator for each language
  • Questions
  • Your question here
  • Status for French
  • Main
  • 6000 suggestions to generate, splitted by Pierre, 5 parts, translated and merged
  • Restricted
  • Translated manually, not applicable
  • Multiverse
  • 300 suggestions to generate, likely technical & difficult, Splitted by Pierre & Merged
  • Universe
  • 14 parts, translated and merged
  • ubuntu-l10n-fr/ddtp-ubuntu.txt
  • Dernière modification: Le 19/11/2013, 14:28
  • par teolemon