Projects for NLP course

There are two levels of projects; choose one of them to submit.

Level 1

  • Design a simple QA system (or dialog system)
  • You can use FAQ data from the SMS Spam Collection Data Set, or the Ubuntu Dialogue Corpus, which contains about 100M words. The reference paper is "The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems". GitHub: rkadlec/ubuntu-ranking-dataset-creator

  • Design a Chinese-English translation system
  • You can use the data from here (https://conferences.unite.un.org/UNCorpus/zh#introduction)

  • Design an automatic summary extractor using Baidu wiki
  • Design an information retrieval system using Baidu wiki
  • Text classifier for news
  • You can use the data from ByteDance (https://github.com/aceimnorstuvwxz/toutiao-text-classfication-dataset)
  • *Any competition released by Alibaba, ByteDance, Baidu, Tencent, Huawei, etc.
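
As a starting point for the QA and information retrieval projects above, here is a minimal sketch of a retrieval-based FAQ baseline: it scores each stored question against the query with TF-IDF cosine similarity and returns the answer of the best match. It is only one possible baseline, not a required approach. The function names (tokenize, build_index, cosine, answer) and the three FAQ entries are hypothetical and made up for illustration. The whitespace tokenizer is an assumption that only works for space-separated text; Chinese data would need a word segmenter.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (placeholder; Chinese needs a real segmenter)."""
    return text.lower().split()


def build_index(questions):
    """Compute a TF-IDF vector (stored as a dict) for every question."""
    docs = [Counter(tokenize(q)) for q in questions]
    n_docs = len(docs)
    # Document frequency: in how many questions each term appears.
    doc_freq = Counter(term for doc in docs for term in doc)
    # Smoothed IDF, so terms that occur in every question keep a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    return idf, vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def answer(query, faq, idf, vectors):
    """Return the answer of the best matching question, or None if nothing overlaps."""
    counts = Counter(tokenize(query))
    # Terms never seen in the FAQ questions are ignored.
    q_vec = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scores = [cosine(q_vec, v) for v in vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return faq[best][1] if scores[best] > 0.0 else None


# Made-up FAQ pairs (question, answer), for illustration only.
FAQ = [
    ("how do i install a package", "Use the package manager."),
    ("how do i change my password", "Run the passwd command."),
    ("how do i list files in a directory", "Run ls in that directory."),
]
IDF, VECTORS = build_index([q for q, _ in FAQ])

if __name__ == "__main__":
    print(answer("change password", FAQ, IDF, VECTORS))
```

A stronger project would replace the TF-IDF scoring with a learned ranking model, but a baseline like this is useful for checking the data pipeline and for comparison.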

Level 2

    Read a paper on NLP published in the last three years at a top conference, such as AAAI, IJCAI, ACL, or EMNLP. You need to implement it and show your code when you present your report.

    Note that you will most likely receive a lower score if you choose Level 2 rather than Level 1.