1. Chinese Word Segmentation and POS Tagging for Micro-Blog Texts 
    The training and test data consist of micro-blog posts on a range of topics, such as finance, sports, and entertainment.
    Download: http://nlp.fudan.edu.cn/nlpcc2015/
    A newer and larger dataset is available at https://github.com/FudanNLP/NLPCC-WordSeg-Weibo .
  2. Neural Sentence Ordering
    Because paper abstracts are generally well written and contain strong logical clues, we collected all abstracts of papers (before 2016-5-25) from arXiv.org. Abstracts from arXiv fall mainly into 7 categories: statistics, quantitative biology, physics, computer science, nonlinear sciences, quantitative finance, and mathematics. The development set and test set are the first and last 10% of the abstracts in the shuffled data, respectively, and the training set consists of the remainder. Detailed information about the arXiv dataset is given in https://arxiv.org/abs/1607.06952 .
    Download: https://drive.google.com/drive/folders/0B-mnK8kniGAiNVB6WTQ4bmdyamc
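
The split described for the arXiv dataset (first 10% of the shuffled abstracts for development, last 10% for test, the rest for training) can be sketched as follows. This is only an illustration: the function name `split_abstracts`, the `seed` parameter, and the use of Python's `random` module are assumptions, not the procedure used to build the released files, so the sketch will not reproduce the released split.

```python
import random


def split_abstracts(abstracts, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle abstracts, then take the head as dev and the tail as test.

    Hypothetical helper: the seed and shuffling method are assumptions.
    """
    shuffled = list(abstracts)              # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)   # deterministic shuffle for a given seed

    n = len(shuffled)
    n_dev = int(n * dev_frac)
    n_test = int(n * test_frac)

    dev = shuffled[:n_dev]                  # first 10% of the shuffled data
    test = shuffled[n - n_test:]            # last 10% of the shuffled data
    train = shuffled[n_dev:n - n_test]      # everything in between
    return train, dev, test


if __name__ == "__main__":
    toy = [f"abstract {i}" for i in range(100)]
    train, dev, test = split_abstracts(toy)
    print(len(train), len(dev), len(test))
```

Taking the development and test sets from opposite ends of the shuffled list keeps the three subsets disjoint without any extra bookkeeping.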