ChinaXiv.org: Chinese Academy of Sciences Scientific Paper Preprint Platform

Submitted Date: 2020 (2)
Subjects: Natural Language Understanding and Machine Translation (2)
Institution: 国防科技大学计算机学院 (College of Computer, National University of Defense Technology) (2)

Total results: 2

Search conditions: 谭郁松

1. ChinaXiv:202010.00060

An Advanced ICD9 Terminology Standardization Method Based on BERT and Text Similarity

Subjects: Computer Science >> Natural Language Understanding and Machine Translation
Submitted: 2020-10-27

Authors: 刘宜佳, 纪斌, 余杰, 谭郁松, 马俊, 吴庆波

Abstract: The ICD-9 terminology standardization task aims to map the colloquial terminology that doctors record in medical records to the standard terminology defined in the ninth revision of the International Classification of Diseases (ICD-9). In this paper, we first propose a BERT and Text Similarity Based Method (BTSBM) that combines a BERT classification model with a text similarity calculation algorithm: 1) use the N-gram algorithm to generate a Candidate Standard Terminology Set (CSTS) for each colloquial term, which serves as the training and test data for the next step; 2) use the BERT classification model to select the correct standard term from the CSTS. In BTSBM, if a larger CSTS is used as the test dataset, the training dataset must be correspondingly larger. However, each CSTS contains only one positive sample, so expanding the CSTS causes a serious imbalance between positive and negative samples, which significantly degrades system performance. Conversely, if the test dataset is kept relatively small, the CSTS Accuracy (CSTSA) drops significantly, which results in a very low system performance ceiling. To address these problems, we then propose an optimized terminology standardization method, called the Advanced BERT and Text Similarity Based Method (ABTSBM), which 1) uses a large-scale initial CSTS to maintain a high CSTSA and thus a high system performance ceiling, 2) denoises the CSTS based on body structure to alleviate the imbalance between positive and negative samples without reducing the CSTSA, and 3) introduces the focal loss function to further rebalance positive and negative samples. Experiments show that the precision of ABTSBM reaches 83.5%, which is 0.6% higher than that of BTSBM, while the computation cost of ABTSBM is 26.7% lower than that of BTSBM.
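The two ingredients named in the abstract, N-gram candidate generation and focal loss, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the N-gram step scores candidates, so character-bigram Jaccard overlap is assumed here; the function names, the toy term list, and the `gamma`/`alpha` defaults (taken from the original focal loss formulation, FL = -alpha_t (1 - p_t)^gamma log(p_t)) are likewise assumptions.

```python
import math


def char_ngrams(text, n=2):
    """Return the set of character n-grams of a term (assumed tokenization)."""
    text = text.strip().lower()
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a, b):
    """Jaccard overlap of two n-gram sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def candidate_set(term, standard_terms, k=3, n=2):
    """Return the k standard terms most similar to a colloquial term.

    A larger k raises the chance that the correct term is in the set
    (higher CSTSA) but adds more negative samples per positive one.
    """
    grams = char_ngrams(term, n)
    ranked = sorted(
        standard_terms,
        key=lambda s: jaccard(grams, char_ngrams(s, n)),
        reverse=True,
    )
    return ranked[:k]


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one sample.

    p is the predicted probability of the positive class, y is 0 or 1.
    Easy, well-classified samples are down-weighted by (1 - p_t) ** gamma.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)  # guard against log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Toy data, made up for illustration; not actual ICD-9 entries.
standard = ["partial gastrectomy", "total gastrectomy",
            "appendectomy", "cholecystectomy"]
print(candidate_set("gastrectomy, partial", standard, k=2))
print(focal_loss(0.9, 1), focal_loss(0.1, 1))
```

In this sketch each candidate in the returned set would become one (colloquial term, candidate) pair for the classifier, with exactly one pair labeled positive, which is where the imbalance discussed in the abstract comes from.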

Peer Review Status: Awaiting Review

Hits: 12792; Downloads: 1994

2. ChinaXiv:202010.00061

Span Classification Based Model For Clinical Concept Extraction

Subjects: Computer Science >> Natural Language Understanding and Machine Translation
Submitted: 2020-10-27

Authors: 汤勇韬, 余杰, 李莎莎, 纪斌, 谭郁松, 吴庆波

Abstract: Recently, how to structure electronic medical records (EMRs) has attracted considerable attention from researchers. Extracting clinical concepts from EMRs is a critical part of EMR structuring, and its performance directly affects the downstream tasks that depend on it. However, the mainstream method, the sequence labeling model, has some shortcomings. Clinical concept extraction based on sequence labeling does not conform to the human cognitive model of language. In addition, its extraction results are difficult to couple with downstream tasks, which causes error propagation and degrades downstream performance. To deal with these problems, we propose a span classification based method that improves the performance of clinical concept extraction by considering the overall semantics of a token sequence instead of the semantics of each token. We call this model the span classification model. Experiments show that the span classification model achieves the best micro-average F1 score (81.22%) on the corpora of the 2012 i2b2 NLP challenges and obtains an F1 score (89.25%) comparable to SOTA on the 2010 i2b2 NLP challenges. Furthermore, our approach consistently outperforms sequence labeling models such as the BiLSTM-CRF model and the softmax classifier.
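The contrast with per-token sequence labeling can be illustrated with a small sketch of span-level classification: enumerate every candidate span up to a maximum width and assign one label to the span as a whole. This is only an illustration of the general idea under stated assumptions. The abstract does not describe the model architecture, so the classifier here is a hypothetical lexicon lookup standing in for a learned model, and `max_width`, the function names, the label names, and the example sentence are all made up.

```python
def enumerate_spans(tokens, max_width=4):
    """Yield (start, end) pairs, end exclusive, for every span of up to
    max_width tokens."""
    for start in range(len(tokens)):
        last_end = min(start + max_width, len(tokens))
        for end in range(start + 1, last_end + 1):
            yield start, end


def extract_concepts(tokens, classify, max_width=4):
    """Classify each candidate span as a whole.

    classify takes a list of tokens and returns a label, or None when
    the span is not a concept. Returns (start, end, label) triples.
    """
    results = []
    for start, end in enumerate_spans(tokens, max_width):
        label = classify(tokens[start:end])
        if label is not None:
            results.append((start, end, label))
    return results


# Hypothetical stand-in for a trained span classifier.
TOY_LEXICON = {
    ("chest", "pain"): "PROBLEM",
    ("aspirin",): "TREATMENT",
}


def toy_classify(span_tokens):
    """Look the lowercased span up in the toy lexicon."""
    return TOY_LEXICON.get(tuple(t.lower() for t in span_tokens))


tokens = "Patient reports chest pain and was given aspirin".split()
print(extract_concepts(tokens, toy_classify))
```

Because each prediction is attached to a whole span, the output is already a list of concept boundaries with labels, rather than a tag sequence that has to be decoded into spans afterwards.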

Peer Review Status: Awaiting Review

Hits: 12574; Downloads: 1638
