Title: | Improving deep embedded clustering for intent mining with Jensen-Shannon divergence and Sophia optimizer |
Author(s): | Nguyễn Quỳnh Khánh Hà |
Advisor(s): | Đặng Ngọc Hoàng Thành |
Abstract: | Discovering customer intents from written or spoken language plays a vital role in natural language understanding and automated dialogue response. However, labeling intents for a new domain from the ground up is a daunting and time-consuming process that often requires extensive manual effort from domain experts. To address this challenge, this paper proposes an unsupervised approach for discovering intents and automatically producing meaningful intent labels from a collection of unlabeled utterances in a banking domain. In the first stage, we deploy Deep Embedded Clustering (DEC) to simultaneously learn feature representations and cluster assignments, producing a set of coherent clusters in which the utterances of each cluster share the same intent. To improve performance, we modify the joint loss function of DEC to preserve local structure (an approach known as Improved Deep Embedded Clustering with Local Structure Preservation). Importantly, we explore the use of a state-of-the-art optimization technique, the Sophia optimizer, and employ the Jensen-Shannon divergence as the measure of similarity in the clustering algorithm. We empirically show that our proposed modification achieves state-of-the-art results in terms of NMI score, surpassing all prior unsupervised DEC architectures. In the second stage, an intent label for each cluster is generated automatically by extracting the ACTION-OBJECT pair from each utterance using a dependency parser. The proposed unsupervised approach is capable of automatically generating meaningful intent labels while obtaining high evaluation scores in utterance clustering and intent discovery. Although initially developed to build an intent model for conversational systems, the framework can also be adapted to short-text clustering in more general applications. |
Issue Date: | 2024 |
Publisher: | University of Economics Ho Chi Minh City |
Series/Report no.: | UEH Young Researcher Award 2024 (Giải thưởng Nhà nghiên cứu trẻ UEH 2024) |
URI: | https://digital.lib.ueh.edu.vn/handle/UEH/72835 |
Appears in Collections: | UEH Young Researchers (Nhà nghiên cứu trẻ UEH) |
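
Illustrative note: the abstract does not spell out how the Jensen-Shannon divergence enters the clustering objective. The sketch below assumes one plausible reading, in which it replaces the Kullback-Leibler divergence that standard DEC computes between the soft assignments Q and the sharpened target distribution P. All function names are hypothetical, the embeddings and centroids are plain lists of floats, and neither the autoencoder nor the Sophia optimizer is shown. It is not the authors' implementation.

```python
import math

def soft_assignments(embeddings, centroids, alpha=1.0):
    """Student's t soft assignment q_ij of each embedding to each centroid (standard DEC)."""
    q = []
    for z in embeddings:
        weights = []
        for mu in centroids:
            dist_sq = sum((a - b) ** 2 for a, b in zip(z, mu))
            weights.append((1.0 + dist_sq / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(weights)
        q.append([w / total for w in weights])
    return q

def target_distribution(q):
    """Sharpened targets: p_ij proportional to q_ij^2 / f_j, with f_j = sum_i q_ij."""
    n_clusters = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / freq[j] for j in range(n_clusters)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded above by ln 2."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def js_clustering_loss(p, q):
    """ASSUMPTION: mean JS(p_i, q_i) used in place of DEC's KL(P || Q) term."""
    return sum(js_divergence(pr, qr) for pr, qr in zip(p, q)) / len(q)

# Toy usage: four 2-D embeddings, two centroids.
embeddings = [[0.0, 0.1], [0.2, 0.0], [2.0, 2.1], [1.9, 2.0]]
centroids = [[0.0, 0.0], [2.0, 2.0]]
q = soft_assignments(embeddings, centroids)
p = target_distribution(q)
loss = js_clustering_loss(p, q)
```

Unlike the KL divergence, the Jensen-Shannon divergence is symmetric and always finite, which is one commonly cited reason for trying it as a clustering objective.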