European Commission - Directorate-General for Translation (DGT)
The DGT-Acquis is a family of multilingual parallel corpora extracted from the Official Journal of the European Union (OJ) in Formex 4 (XML) format. It consists of documents from mid-2004 to the end of 2011 in up to 23 languages.
Resource type: 
Resource availability: 
available for commercial use
available for research purposes
Can the resource be directly downloaded?: 
Production date: 
Format explanation: 
The original OJ data was processed in several steps, each refining the result of the previous step to a finer granularity: (1) original data, (2) file level in Formex 4 format, (3) file level in plain text and (4) paragraph level. The result of each step is a corpus packaged as a self-contained Multilingual Dataset Format (muset) file. Although the musets are independent of one another, they are linked, so that, for example, the source document of any given text segment can be found. Data users can choose the processing level best suited to their needs.
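The linking between processing levels can be pictured with a small sketch. It is illustrative only and does not reflect the actual muset schema: the `Item` class, the level names, the identifiers and the `trace_to_source` helper are all hypothetical, and the sketch assumes each item simply records the identifier of the item it was derived from.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Item:
    """One unit at some processing level (hypothetical model, not the muset schema)."""
    item_id: str
    level: str                 # e.g. "original", "formex4_file", "plain_text_file", "paragraph"
    parent_id: Optional[str]   # item in the previous level; None for original data


def trace_to_source(item_id: str, index: Dict[str, Item]) -> List[Item]:
    """Follow parent links from any item back to the original data."""
    chain = []
    current = index[item_id]
    while True:
        chain.append(current)
        if current.parent_id is None:
            return chain
        current = index[current.parent_id]


# Made-up identifiers: one OJ document refined through the four levels.
items = [
    Item("oj-1", "original", None),
    Item("fmx-1", "formex4_file", "oj-1"),
    Item("txt-1", "plain_text_file", "fmx-1"),
    Item("par-1", "paragraph", "txt-1"),
    Item("par-2", "paragraph", "txt-1"),
]
index = {item.item_id: item for item in items}

# Walk from a paragraph-level segment back to its source document.
for step in trace_to_source("par-2", index):
    print(step.level, step.item_id)
```

Under these assumptions, a user working at paragraph level can still recover the plain-text file, the Formex 4 file and the original document a segment came from.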