Projects for NLP course

Design a simple QA system (or dialog system)

You can use the Ubuntu Dialogue Corpus, which contains about 100M words. The reference paper is "The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems". GitHub: rkadlec/ubuntu-ranking-dataset-creator

Design a Chinese-English translation system

You can use the United Nations Parallel Corpus (https://conferences.unite.un.org/UNCorpus/zh#introduction)

Design an automatic summary extractor with Baidu wiki
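As a possible starting point, here is a minimal sketch of a frequency-based extractive summarizer. It is only one simple baseline, not a required method. The function names (`split_sentences`, `tokenize`, `summarize`) are hypothetical, and treating each Chinese character as its own token is a simplifying assumption; a real system would likely use a proper word segmenter.

```python
import re
from collections import Counter


def split_sentences(text):
    # Keep each sentence together with its closing punctuation mark.
    return [s.strip() for s in re.findall(r"[^。！？.!?]+[。！？.!?]?", text) if s.strip()]


def tokenize(text):
    # Assumption: Latin words are tokens, and every CJK character is one token.
    return re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]", text.lower())


def summarize(text, k=2):
    """Return the k highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        # Average token frequency, so long sentences are not favoured by length alone.
        return sum(freq[t] for t in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

Calling `summarize(article_text, k=3)` on the text of one wiki entry would return up to three sentences to use as the summary.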

Design an information retrieval system with Baidu wiki
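A simple baseline for this project could be TF-IDF ranking with cosine similarity. The sketch below uses only the standard library. The class name `TfidfIndex` and the character-level tokenization of Chinese text are assumptions made for illustration, and the smoothed IDF formula is one common variant among several.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: Latin words are tokens, and every CJK character is one token.
    return re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]", text.lower())


class TfidfIndex:
    def __init__(self, docs):
        self.docs = list(docs)
        tokenized = [tokenize(d) for d in self.docs]
        n = len(self.docs)
        # Document frequency: the number of documents containing each token.
        df = Counter(t for toks in tokenized for t in set(toks))
        # Smoothed inverse document frequency.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
        self.vectors = [self._vectorize(toks) for toks in tokenized]

    def _vectorize(self, tokens):
        # L2-normalised TF-IDF vector; tokens unseen at index time are ignored.
        tf = Counter(t for t in tokens if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def search(self, query, k=3):
        """Return up to k (document, cosine score) pairs with a positive score."""
        q = self._vectorize(tokenize(query))
        scored = [
            (sum(w * vec.get(t, 0.0) for t, w in q.items()), i)
            for i, vec in enumerate(self.vectors)
        ]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [(self.docs[i], s) for s, i in scored[:k] if s > 0]
```

For example, `TfidfIndex(pages).search("熊猫 竹子")` would rank the pages in `pages` (a hypothetical list of page texts) by similarity to the query.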

Design a text classifier for news

You can use the data from ByteDance (https://github.com/aceimnorstuvwxz/toutiao-text-classfication-dataset)
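Before trying neural models, a multinomial Naive Bayes classifier can serve as a quick baseline. The sketch below is an illustration under stated assumptions: the class name `NaiveBayes` is hypothetical, training data is assumed to already be loaded as parallel lists of texts and labels (loading the dataset files is not shown), and each Chinese character is treated as one token.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: Latin words are tokens, and every CJK character is one token.
    return re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]", text.lower())


class NaiveBayes:
    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.token_counts[label].update(tokenize(text))
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        return self

    def predict(self, text):
        # Tokens never seen in training carry no information and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            # Laplace (add-one) smoothing over the shared vocabulary.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

After `model = NaiveBayes().fit(train_titles, train_labels)`, calling `model.predict(title)` returns the most likely category for a news title; `train_titles` and `train_labels` are placeholder names.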

*Any competition released by Alibaba, ByteDance, Baidu, Tencent, Huawei, etc.

Read an NLP paper published in the last three years at a top conference, such as AAAI, IJCAI, ACL, or EMNLP. You need to implement it and present your code when you report on it.

Note that you will most likely receive a lower score if you choose Level 2 rather than Level 1.