Klyachin V.A., Khizhnyakova E.V. On the Possibility of Using the Wiener Index to Calculate Features of Natural Language Texts

https://doi.org/10.15688/mpcm.jvolsu.2025.3.3

Vladimir A. Klyachin
Doctor of Sciences (Physics and Mathematics), Head of the Department of Computer Sciences and Experimental Mathematics, Volgograd State University
https://orcid.org/0000-0003-1922-7849
Prosp. Universitetsky, 100, 400062 Volgograd, Russian Federation

Ekaterina V. Khizhnyakova
Senior Lecturer, Department of Computer Sciences and Experimental Mathematics, Volgograd State University
https://orcid.org/0000-0002-7914-9988
Prosp. Universitetsky, 100, 400062 Volgograd, Russian Federation

Abstract. The article demonstrates how the Wiener index can be applied to one of the problems of natural language text processing. The Wiener index of a weighted connected graph is defined as the sum of the shortest-path distances between all pairs of its vertices; this value characterizes the complexity of the graph. Two modifications of the index are introduced. In the first, the usual Wiener index of a connected graph with N vertices is divided by (N − 1)². In the second, the Wiener index of a Euclidean graph is divided by the sum of the distances between all pairs of distinct vertices. To apply these quantities to text processing, the article introduces a graph of text sentences, in which two words are joined by an edge if they occur together in some sentence of the text. Word embeddings are used to compute the Wiener index of the corresponding Euclidean graph, and the article briefly describes T. Mikolov's algorithm for learning them. In addition, the article gives an algorithm for approximately computing a spanning tree with minimal Wiener index; it is based on minimizing the new term that appears when an edge is added to the already constructed part of the tree. To identify uninformative texts, four features are computed from the Wiener index and its modifications, and classification is carried out with standard machine learning methods.
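The definitions above can be illustrated with a short Python sketch. It is not the authors' code and rests on several assumptions: the text is already split into tokenized sentences, the "usual" Wiener index uses unit edge weights, vertex pairs are unordered, and word vectors are supplied as a dictionary (for example, exported from a trained word2vec model; training itself is not shown). All function names are hypothetical.

```python
import heapq
import itertools
import math


def sentence_graph(sentences):
    """Build the sentence graph: words are vertices, and two words are
    adjacent if they occur together in at least one sentence.
    Assumption: `sentences` is a list of already tokenized sentences."""
    adj = {}
    for sentence in sentences:
        words = sorted(set(sentence))
        for w in words:
            adj.setdefault(w, set())
        for u, v in itertools.combinations(words, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def shortest_distances(adj, weight, source):
    """Dijkstra's algorithm: shortest-path distances from `source`."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v in adj[u]:
            nd = d + weight(u, v)
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def wiener_index(adj, weight=lambda u, v: 1.0):
    """Sum of shortest-path distances over all unordered vertex pairs.
    Assumption: unit edge weights unless `weight` is given."""
    total = 0.0
    for s in adj:
        dist = shortest_distances(adj, weight, s)
        if len(dist) != len(adj):
            raise ValueError("graph is not connected")
        total += sum(dist.values())
    return total / 2  # every unordered pair was counted from both ends


def normalized_wiener(adj):
    """First modification: W / (N - 1)^2 for an N-vertex graph."""
    n = len(adj)
    if n < 2:
        raise ValueError("need at least two vertices")
    return wiener_index(adj) / (n - 1) ** 2


def euclidean_wiener_ratio(adj, emb):
    """Second modification: Wiener index of the Euclidean graph (edge weight =
    distance between word vectors) divided by the sum of distances between
    all pairs of distinct vertices. `emb` maps each word to a vector."""
    w = wiener_index(adj, lambda u, v: math.dist(emb[u], emb[v]))
    direct = sum(math.dist(emb[u], emb[v])
                 for u, v in itertools.combinations(adj, 2))
    return w / direct
```

For example, the two sentences `["a", "b"]` and `["b", "c"]` give a three-vertex path with W = 4, so the first modification equals 4 / 2² = 1.0. For a connected graph the Euclidean ratio is never below 1, because by the triangle inequality no path is shorter than the straight-line distance between its endpoints; it equals 1 for a complete graph.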
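The abstract describes the spanning-tree algorithm only as minimizing the new term added when an edge joins the constructed part of the tree. One possible reading, given here as an assumption rather than the authors' exact procedure, is a Prim-style greedy search: attaching an outside vertex v to a tree vertex u by an edge of weight w increases the Wiener index of the tree by D(u) + k·w, where k is the current number of tree vertices and D(u) is the sum of tree distances from u to them. Each step adds the edge with the smallest such increase. The function name and the fixed starting vertex are hypothetical.

```python
def greedy_min_wiener_tree(adj, weight, start):
    """Hypothetical Prim-style heuristic for a spanning tree with a small
    Wiener index. `adj` maps each vertex to its neighbours, `weight(u, v)`
    gives the edge weight. Returns (tree edges, Wiener index of the tree)."""
    in_tree = [start]
    tree_dist = {start: {start: 0.0}}  # distances measured along the tree
    edges = []
    wiener = 0.0
    while len(in_tree) < len(adj):
        best = None
        k = len(in_tree)
        for u in in_tree:
            d_u = sum(tree_dist[u].values())  # D(u): sum of tree distances from u
            for v in adj[u]:
                if v in tree_dist:
                    continue  # v is already in the tree
                delta = d_u + k * weight(u, v)  # new term added to the index
                if best is None or delta < best[0]:
                    best = (delta, u, v)
        if best is None:
            raise ValueError("graph is not connected")
        delta, u, v = best
        w = weight(u, v)
        tree_dist[v] = {v: 0.0}
        for x in in_tree:
            # the only tree path from x to the new vertex v passes through u
            tree_dist[x][v] = tree_dist[v][x] = tree_dist[x][u] + w
        in_tree.append(v)
        edges.append((u, v))
        wiener += delta
    return edges, wiener
```

Because only the increment of each step is evaluated, the result can depend on the starting vertex and need not be the true minimum; running the search from every vertex and keeping the best tree is a simple but costlier variant.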

Key words: graph, Wiener index, spanning tree, word embeddings, machine learning.

On the Possibility of Using the Wiener Index to Calculate Features of Natural Language Texts by Klyachin V.A., Khizhnyakova E.V. is licensed under a Creative Commons Attribution 4.0 International License.

Citation in English: Mathematical Physics and Computer Simulation, Vol. 28, No. 3, 2025, pp. 24-36.

Full text (PDF): https://mp.jvolsu.com/index.php/en/component/attachments/download/1252