Author: 
PELCRA group, Department of English Language and Applied Linguistics, University of Łódź, http://pelcra.pl
Description: 
A subset of the PELCRA Polish parallel corpora, released under the CC-BY license. This resource contains 11,300 texts in 6 languages from the CORDIS website, 5,556 texts in 28 languages from the RAPID site, 3,037 press releases of the European Parliament in 22 languages, and 109 press releases of the European Southern Observatory in 17 languages. The texts are sentence-aligned with the mAligna aligner using the Gale & Church algorithm. They are provided both as TEI P5-compliant XML files with custom PELCRA extensions and in the XLIFF format.
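As an illustration of how the sentence-aligned XLIFF files mentioned above might be consumed, the following is a minimal sketch in Python. It assumes the files follow the XLIFF 1.2 layout (`trans-unit` elements holding one `source` and one `target`); the actual version and structure used in this resource are not stated here and may differ. The function name `read_aligned_pairs` and the sample document are hypothetical and not taken from the corpus.

```python
import xml.etree.ElementTree as ET

# Assumption: XLIFF 1.2 namespace; the corpus files may use another version.
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}


def read_aligned_pairs(xliff_text):
    """Return (source, target) sentence pairs from an XLIFF 1.2 string."""
    root = ET.fromstring(xliff_text)
    pairs = []
    for unit in root.iterfind(".//x:trans-unit", XLIFF_NS):
        source = unit.find("x:source", XLIFF_NS)
        target = unit.find("x:target", XLIFF_NS)
        if source is None or target is None:
            continue  # skip units that lack one side of the alignment
        pairs.append((
            "".join(source.itertext()).strip(),
            "".join(target.itertext()).strip(),
        ))
    return pairs


# Hypothetical sample document, not an excerpt from the corpus.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file source-language="pl" target-language="en" datatype="plaintext" original="sample">
    <body>
      <trans-unit id="1">
        <source>To jest zdanie.</source>
        <target>This is a sentence.</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""

pairs = read_aligned_pairs(SAMPLE)
```

For a file on disk, the same function could be called with the file's contents read as UTF-8 text. The TEI P5 files would need a separate reader, since their custom PELCRA extensions are not described in this record.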
Resource type: 
corpus
Modality: 
text
Size: 
Total: 31,810,000 Words, 60,120 Texts. Slovenian (1,359,000 Words), Swedish (1,403,000 Words), Bulgarian (1,070,000 Words), English (1,985,000 Words), Greek (1,650,000 Words), Estonian (987,000 Words), Spanish (1,911,000 Words), Czech (1,401,000 Words), G
Production date: 
30/06/20
Domain: 
Format explanation: 
UTF-8. Language scripts: Latn, Grek, Cyrl