Komendantov A.S., Matveev A.G., Svetlov A.V. Automation of archival documents morphological tagging

https://doi.org/10.15688/mpcm.jvolsu.2019.4.4

Anatoly S. Komendantov
Student, Institute of Mathematics and IT,
Volgograd State University
Prosp. Universitetsky, 100, 400062 Volgograd, Russian Federation

Alexander G. Matveev
Student, Institute of Mathematics and IT,
Volgograd State University
Prosp. Universitetsky, 100, 400062 Volgograd, Russian Federation

Andrey V. Svetlov
Candidate of Physical and Mathematical Sciences, Associate Professor, Department of
Mathematical Analysis and Function Theory,
Volgograd State University
https://orcid.org/0000-0002-8764-6132
Prosp. Universitetsky, 100, 400062 Volgograd, Russian Federation

Abstract. The paper describes an add-on to the stemming tool MyStem by I. Segalovich. We designed the application to give MyStem a convenient graphical interface that is intuitive and easy to learn for users who do not specialize in information technology. It turned out that MyStem correctly processes outdated vocabulary if the text is passed to the program in modern Cyrillic. In addition to the convenient interface, our program therefore has an option for working with the outdated Cyrillic alphabet. When the option is turned on, outdated letters are replaced with modern equivalents (for instance, the letters zelo and omega are replaced by «ks» and «o» respectively), only then is the text passed to MyStem for analysis, and afterwards the original characters are restored in the processed document. Thus our add-on intercepts the output of the MyStem tool, then reformats and analyzes it. The application also lets the user resolve homonymy manually when the automatic tagging of a word's morphological characteristics is incorrect. The main purpose of the application is to prepare the morphological tagging of documents of the archival fund «Mikhailovsky Stanichny Ataman» in order to create a linguistic corpus. During the work on the application, we solved the problem of correctly processing texts that contain outdated Cyrillic characters. To implement a functional and user-friendly graphical interface, we use the JavaFX platform (OpenJFX).
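
The replace-then-restore step described in the abstract can be illustrated with a minimal Java sketch. It is not the authors' implementation: the class name `OldCyrillicRoundTrip`, its methods, and the replacement table are hypothetical. Only the omega to «o» substitution is taken from the abstract, and the Unicode code points chosen for it are an assumption. Rather than editing the analyzer output afterwards, the sketch keeps each original word next to its modernized form, so the original spelling can be shown again once the modern form has been analyzed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OldCyrillicRoundTrip {

    // Hypothetical replacement table. Only omega -> o comes from the abstract;
    // the code points used here and any further entries are assumptions.
    static Map<Character, String> defaultTable() {
        Map<Character, String> table = new HashMap<>();
        table.put('\u0461', "\u043E"); // small omega -> small Cyrillic o
        table.put('\u0460', "\u041E"); // capital omega -> capital Cyrillic O
        return table;
    }

    // Replace every character found in the table; copy all other characters unchanged.
    // A replacement may be longer than one character.
    static String normalize(String word, Map<Character, String> table) {
        StringBuilder modern = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            String replacement = table.get(c);
            if (replacement != null) {
                modern.append(replacement);
            } else {
                modern.append(c);
            }
        }
        return modern.toString();
    }

    // Split the text on whitespace and return {original, normalized} pairs.
    // The normalized form would go to the analyzer; the original form is kept
    // so it can be restored in the tagged document.
    static List<String[]> pairTokens(String text, Map<Character, String> table) {
        List<String[]> pairs = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new String[] {word, normalize(word, table)});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String sample = "\u0461\u0442\u0435\u0446\u044A"; // a word spelled with an initial omega
        for (String[] pair : pairTokens(sample, defaultTable())) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

Pairing the tokens avoids tracking character offsets, which would shift whenever one outdated letter is replaced by two modern ones. Whitespace splitting is a simplification; a real tool would also have to handle punctuation and line breaks.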

Key words: automation of linguistic analysis, automation of morphological analysis, MyStem tool, graphical interface, software shell, corpus-based linguistics.

Creative Commons License
Automation of archival documents morphological tagging by Komendantov A.S., Matveev A.G., Svetlov A.V. is licensed under a Creative Commons Attribution 4.0 International License.

 

Citation in English: Mathematical Physics and Computer Simulation. Vol. 22 No. 4 2019, pp. 53-63

 

Attachments:
1_Komendanrov_i_dr.pdf
URL: https://mp.jvolsu.com/index.php/en/component/attachments/download/901