Language model-based automatic prefix abbreviation expansion method for biomedical big data analysis
Authors
Anjum, Ashiq
Affiliation
University of Derby
Issue Date
2019-03-28
Abstract
In the biomedical domain, abbreviations appear more and more frequently in various data sets, which creates significant obstacles to biomedical big data analysis. Dictionary-based approaches have been adopted to process abbreviations, but they cannot handle ad hoc abbreviations, and no dictionary can cover all abbreviations. To overcome these drawbacks, this paper proposes an automatic abbreviation expansion method called LMAAE (Language Model-based Automatic Abbreviation Expansion). In this method, the abbreviation is first divided into blocks; then, expansion candidates are generated by restoring each block; and finally, the expansion candidates are filtered and clustered, using a language model and a clustering method, to obtain the final expansion result. By restricting abbreviations to prefix abbreviations, the search space of expansions is reduced sharply. The search space is then reduced further by constraining the effectiveness and the length of the partition. To validate the effectiveness of the method, two types of experiments are designed. For standard abbreviations, the expansion results include most of the expansions in the dictionary, so the method has high precision. For ad hoc abbreviations, the precision of schema matching and of knowledge fusion is increased by using this method to handle the abbreviations. Although the recall for standard abbreviations needs to be improved, this does not diminish the method's value as a complement to the dictionary method.
Citation
Du, X., Zhu, R., Li, Y. and Anjum, A. (2019). 'Language model-based automatic prefix abbreviation expansion method for biomedical big data analysis'. Future Generation Computer Systems, 98, pp. 238-251. DOI: 10.1016/j.future.2019.01.016.
Publisher
Elsevier
Journal
Future Generation Computer Systems
DOI
10.1016/j.future.2019.01.016
Additional Links
https://www.sciencedirect.com/science/article/pii/S0167739X18326529
Type
Article
Language
en
ISSN
0167739X
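Illustrative sketch
The pipeline outlined in the abstract (divide the abbreviation into blocks, restore each block to a word it prefixes, then rank the candidates) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy vocabulary `VOCAB`, its counts, the function names `partitions` and `expand`, and the `max_block_len` and `top_k` parameters are all hypothetical. A product of unigram counts stands in for the paper's language model, and the clustering step is omitted.

```python
from itertools import product

# Hypothetical toy vocabulary with made-up unigram counts (assumption, not from the paper).
VOCAB = {"blood": 50, "bladder": 10, "pressure": 40, "press": 5, "protein": 30}


def partitions(abbr, max_block_len=4):
    """Yield every split of abbr into consecutive blocks of 1..max_block_len characters."""
    if not abbr:
        yield []
        return
    for i in range(1, min(max_block_len, len(abbr)) + 1):
        for rest in partitions(abbr[i:], max_block_len):
            yield [abbr[:i]] + rest


def expand(abbr, vocab, max_block_len=4, top_k=3):
    """Return up to top_k candidate expansions, best first."""
    best = {}  # phrase -> best score seen across all partitions
    for blocks in partitions(abbr.lower(), max_block_len):
        # Prefix restriction: a block may only be restored to words that start with it.
        options = [[w for w in vocab if w.startswith(b)] for b in blocks]
        if not all(options):
            continue  # some block matches no word, so this partition is discarded
        for words in product(*options):
            score = 1
            for w in words:
                score *= vocab[w]  # unigram stand-in for a real language model
            phrase = " ".join(words)
            best[phrase] = max(score, best.get(phrase, 0))
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [phrase for phrase, _ in ranked[:top_k]]


print(expand("blpres", VOCAB))
```

With this toy vocabulary, only the partition `bl|pres` survives the prefix filter, and the top-ranked candidate is "blood pressure". The prefix restriction is what keeps the search small: any partition containing a block that no vocabulary word starts with is dropped before candidates are enumerated.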