Source name: 
Author: 
INESC-ID Portugal
Description: 
This parallel corpus (European Portuguese and English) contains, for each language, nearly 5,500 questions for training and 500 questions for testing. Details on the translation and some experiments on statistical machine translation of questions can be found in [1]. The original corpus of 6,000 questions in English can be found at http://cogcomp.cs.illinois.edu/Data/QA/QC/. Both corpora have the same number of lines, and each line is the translation of the line with the same number in the other language. Thus, the two documents can be used to create a sentence-by-sentence alignment.
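Because the alignment is purely positional, it can be reconstructed by pairing lines with the same number. The following is a minimal sketch in Python, not part of the corpus distribution; the function names, the UTF-8 encoding, and the one-question-per-line plain-text layout are assumptions made for illustration.

```python
def align(pt_lines, en_lines):
    """Pair line i of the Portuguese text with line i of the English text."""
    # The record states both sides have the same number of lines;
    # a mismatch means the files are not the parallel pair expected.
    if len(pt_lines) != len(en_lines):
        raise ValueError(
            f"line counts differ: {len(pt_lines)} vs {len(en_lines)}"
        )
    return list(zip(pt_lines, en_lines))


def load_parallel(pt_path, en_path, encoding="utf-8"):
    """Read two line-aligned files (hypothetical paths) and return the pairs."""
    # Assumption: one question per line, UTF-8 encoded.
    with open(pt_path, encoding=encoding) as f:
        pt_lines = [line.rstrip("\n") for line in f]
    with open(en_path, encoding=encoding) as f:
        en_lines = [line.rstrip("\n") for line in f]
    return align(pt_lines, en_lines)
```

Calling `load_parallel` with the paths of the Portuguese and English training files would return a list of (Portuguese, English) tuples, one per question. The explicit length check is there so that a truncated or mismatched file fails loudly instead of silently producing shifted pairs.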
Resource type: 
corpus
Resource availability: 
free
Tags: 
Modality: 
text
Size: 
6,000 sentences