Going Beyond Google Translate?

Chessa, Francesca
Brelstaff, Gavin
Statistical machine translation (SMT) delivers texts that are unacceptable for literary or academic purposes because it generally cannot assimilate adequate context. Yet how might one ever articulate such context? Here, rather than taking a theoretical perspective, we adopt a spatio-visual approach made possible by recent advances in the electronic presentation of multilingual texts: we allow the translator to supply the colour highlights. But how? Semantic units do not respect lexical boundaries, and they occur at different scales. Any translator committed to providing a definitive version of a text eventually arrives at an irreversible order of words, and may actually wish to justify their choices by documenting the correspondence between their version and the original. We focus on verse, an extreme challenge for SMT, with the eventual aim of expressing elusive aspects of semantic communication in order to differentiate those that can be articulated via spatio-visual cues. In verse, a deviation from literal correspondence is essential to re-establish in the translation a "decorum" appropriate to the original, so that readers of the translated work are encouraged to achieve an equivalent respect for its author. We use jQuery to provide an interface that lets the human translator mark up what they consider a correct alignment between words, or groups of words, in the original and their own translation, with a view to articulating context that may not be readily available to SMT. We detail below how the interface runs off a web page and allows the alignment of equivalent ranges in parallel texts via a simple point-and-click action. Alignments created by the user are instantaneously made visible using a variant of the interactive colour-highlight system mentioned above. Key to reducing the complexity of the implementation of the interface is our systematic deployment of open-standard, non-proprietary web technologies.
CRS4 internal seminar series 2012 (Ciclo 2012 di seminari interni CRS4), Number 20120229.
multilingual web, HCI, jQuery, TEI, XML, parallel texts