Title: | Weakly supervised prototype topic model with discriminative seed words: modifying the category prior by self-exploring supervised signals |
Author(s): | Ximing Li |
Keywords: | Dataless text classification; Topic modeling; Seed words; Category prior; Prototype scheme |
Abstract: | Dataless text classification, a new paradigm of weakly supervised learning, refers to the task of learning with unlabeled documents and a few predefined representative words for each category, known as seed words. Recent generative dataless methods construct document-specific category priors from seed word occurrences only; however, such category priors often contain very limited and even noisy supervised signals. To remedy this problem, we propose a novel formulation of the category prior. First, for each document, we consider its label membership degree by not only counting seed word occurrences but also using a novel prototype scheme, which captures pseudo-nearest neighboring categories. Second, for each label, we consider prior knowledge of its frequency in the corpus, which also provides discriminative knowledge for classification. By incorporating the proposed category prior into an existing generative dataless method, we obtain a novel generative dataless method, namely the Weakly Supervised Prototype Topic Model (WSPTM). Experimental results on real-world datasets demonstrate that WSPTM outperforms the existing baseline methods. |
Issue Date: | 2023 |
Publisher: | Springer |
Series/Report no.: | Vol. 27 |
URI: | https://digital.lib.ueh.edu.vn/handle/UEH/68745 |
DOI: | https://doi.org/10.1007/s00500-022-07771-9 |
ISSN: | 1432-7643 (Print), 1433-7479 (Online) |
Appears in Collections: | INTERNATIONAL PUBLICATIONS |
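
The abstract notes that earlier generative dataless methods build a document-specific category prior from seed word occurrences alone. The sketch below illustrates only that baseline idea, not WSPTM's prototype scheme or its label frequency term, whose formulas are not given in this record. The function name `seed_word_category_prior`, the additive `smoothing` constant, and the normalization to a probability vector are all assumptions made for illustration.

```python
from collections import Counter


def seed_word_category_prior(tokens, seed_words, smoothing=0.01):
    """Return a normalized prior over categories for one document.

    Hypothetical helper: each category's score is the number of times its
    seed words occur in the document, plus a small smoothing constant
    (an assumption) so that categories with no matches keep nonzero mass.
    """
    counts = Counter(tokens)
    scores = {
        category: smoothing + sum(counts[word] for word in seeds)
        for category, seeds in seed_words.items()
    }
    total = sum(scores.values())
    return {category: score / total for category, score in scores.items()}


# Toy seed words and document, invented purely for illustration.
seed_words = {"sports": ["game", "team"], "politics": ["election", "vote"]}
doc = "the team won the game after the vote".split()
prior = seed_word_category_prior(doc, seed_words)
```

A document containing no seed words receives a uniform prior under this sketch, which shows how little supervision occurrence counts alone can carry, the limitation the abstract sets out to address.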