Date: 2015-07-27

Title: Improving Topic Models with Latent Feature Word Representations

Speaker: Mark Johnson (Macquarie University, Australia)

          web.science.mq.edu.au/~mjohnson

Time: 30th July 2015, 15:30

Venue: Seminar Room (334), Level 3, Building 5,

     Institute of Software, Chinese Academy of Sciences (CAS),

     4 Zhongguancun South Fourth Street, Haidian District, Beijing 100190


Abstract:

Probabilistic topic models are widely used to discover latent topics in document collections, while latent feature vector representations of words learnt from very large external corpora have been used to improve the performance of many NLP tasks. In this talk I explain how we extended two existing Dirichlet multinomial topic models by incorporating latent feature vector representations of words trained on very large corpora, and show that this improves the word-topic mapping learnt on much smaller target corpora. Experimental results show that by using latent feature information from large external corpora, our new models produce significant improvements on topic coherence, document clustering and document classification tasks. The improvement is greatest on datasets with few or short documents, including social media such as Twitter.
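The core idea, a topic-word distribution that mixes a Dirichlet-multinomial component with a latent-feature component defined over pre-trained word vectors, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the variable names, the random stand-in vectors and counts, and the mixture weight `lam` are all assumptions for exposition.

```python
import numpy as np

# Illustrative sketch only. All data below (word vectors, topic vectors,
# topic-word counts, hyperparameters) are random stand-ins, not learned values.
rng = np.random.default_rng(0)
V, K, D = 5, 2, 10                      # vocabulary size, topics, vector dim
word_vecs = rng.normal(size=(V, D))     # pre-trained word vectors (stand-in)
topic_vecs = rng.normal(size=(K, D))    # per-topic feature vectors (stand-in)
counts = rng.integers(1, 10, size=(K, V)).astype(float)  # topic-word counts
beta, lam = 0.01, 0.6                   # Dirichlet prior; mixture weight

def topic_word_probs(t):
    """Probability of each vocabulary word under topic t."""
    # Dirichlet-multinomial component (smoothed relative frequencies)
    mult = (counts[t] + beta) / (counts[t].sum() + V * beta)
    # Latent-feature component: softmax over topic-word dot products
    scores = word_vecs @ topic_vecs[t]
    scores -= scores.max()              # subtract max for numerical stability
    lf = np.exp(scores) / np.exp(scores).sum()
    # Mix the two components; lam trades off external vs. corpus information
    return lam * lf + (1 - lam) * mult
```

Each call returns a proper distribution over the vocabulary, so it can slot directly into a Gibbs-style sampler in place of the usual multinomial topic-word term.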


Joint work with Dat Quoc Nguyen, Richard Billingsley and Lan Du.