  1. Chinese Word Segmentation and POS Tagging for Micro-Blog Texts
    The training and test data consist of micro-blogs on various topics, such as finance, sports, and entertainment.
    Download:  http://nlp.fudan.edu.cn/nlpcc2015/
    A newer and larger dataset is available at https://github.com/FudanNLP/NLPCC-WordSeg-Weibo .
  2. Multi-task Learning for Text Classification
    The datasets of all 16 tasks are publicly available HERE.
  3. Neural Sentence Ordering
    Because paper abstracts are generally well written and contain strong logical clues, we collected all abstracts of papers submitted to arXiv.org before 2016-5-25. Abstracts from arXiv fall into 7 main categories: statistics, quantitative biology, physics, computer science, nonlinear sciences, quantitative finance, and mathematics. The development and test sets are the first and last 10% of the abstracts in the shuffled data, and the training set consists of the remainder. Details of the arXiv dataset are given in https://arxiv.org/abs/1607.06952 .
    Download: https://drive.google.com/drive/folders/0B-mnK8kniGAiNVB6WTQ4bmdyamc
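
    The split described above (shuffle, take the first 10% as development data and the last 10% as test data, keep the rest for training) can be sketched as follows. This is only an illustrative sketch, not the script used to build the released files: the function name `split_abstracts`, the `percent` parameter, and the fixed `seed` are assumptions, and the original shuffling procedure is not specified here.

```python
import random


def split_abstracts(abstracts, percent=10, seed=0):
    """Shuffle abstracts, then split into (train, dev, test).

    dev  = first `percent`% of the shuffled data
    test = last `percent`% of the shuffled data
    train = everything in between
    `seed` is a hypothetical choice made only for reproducibility.
    """
    shuffled = list(abstracts)              # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)   # deterministic shuffle for a given seed
    n = len(shuffled)
    k = n * percent // 100                  # integer size of the dev and test sets
    dev = shuffled[:k]
    test = shuffled[n - k:]
    train = shuffled[k:n - k]
    return train, dev, test


if __name__ == "__main__":
    # Placeholder strings stand in for real abstracts.
    data = [f"abstract {i}" for i in range(100)]
    train, dev, test = split_abstracts(data)
    print(len(train), len(dev), len(test))  # 80 10 10
```

    Because the released dataset already ships with its own split, a sketch like this is mainly useful for applying the same protocol to a different collection of abstracts.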