
Introduction
Technological advances in the healthcare industry have helped us generate ever more data: global healthcare data was estimated to grow from 153 exabytes in 2013 to 2,314 exabytes in 2020 [1]. These data come from multiple sources such as EHRs (Electronic Health Records), clinical research, patients' Internet of Things (IoT) devices, outpatient medical records, and patients' telehealth records and chat histories. A large portion of these data takes the form of unstructured narrative text describing medical procedures, such as clinical records, discharge summaries, clinical monitoring sheets and radiological reports [2].
These unstructured narrative texts are written by the doctors, nurses, pharmacists and other staff providing care to a patient, and they offer more detail than the traditional discharge summary. Because these notes are generated during the course of care, they contain detailed information such as the progression of a patient's condition, the plan of care, medical and family history, and a number of other clinical attributes.
As the volume of this type of data grows, it becomes difficult for computer systems to analyze it in a usable format, and extracting valuable information from it consumes medical professionals' time. Automated analysis of this information is therefore vital for medical professionals seeking to improve patients' clinical outcomes and lower hospital operating costs. Examples of such work include predicting hospital readmission, identifying adverse effects in high-risk patients and creating personalized disease risk predictions [3].
This automated analysis of unstructured text can be accomplished with medical text analytics techniques. Text analytics is the process of examining large collections of documents to discover new information or knowledge. It identifies facts, relationships and assertions in medical notes and converts them into a structured form that can be integrated into databases, data warehouses or business intelligence dashboards, and then used by health professionals for descriptive, predictive or prescriptive analytics.
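To make the "unstructured text to structured form" step concrete, here is a minimal Python sketch. It assumes a hypothetical note style in which vital signs are written as `BP 140/65` and `HR 72`; the regular expressions, the `extract_vitals` and `load_rows` helpers and the `vitals` table are illustrative assumptions, not part of any cited system. Each note becomes one row in an in-memory SQLite table that could then feed a data warehouse or dashboard.

```python
import re
import sqlite3

# Hypothetical patterns for two vital signs; real notes need far richer rules.
BP_PATTERN = re.compile(r"\bBP\s*(\d{2,3})/(\d{2,3})\b")
HR_PATTERN = re.compile(r"\bHR\s*(\d{2,3})\b")


def extract_vitals(note_id, text):
    """Turn free-text vitals into one structured row (None where a value is absent)."""
    bp = BP_PATTERN.search(text)
    hr = HR_PATTERN.search(text)
    return {
        "note_id": note_id,
        "systolic": int(bp.group(1)) if bp else None,
        "diastolic": int(bp.group(2)) if bp else None,
        "heart_rate": int(hr.group(1)) if hr else None,
    }


def load_rows(rows):
    """Store structured rows in an in-memory SQLite table for downstream analytics."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vitals (note_id TEXT, systolic INTEGER, "
        "diastolic INTEGER, heart_rate INTEGER)"
    )
    conn.executemany(
        "INSERT INTO vitals VALUES (:note_id, :systolic, :diastolic, :heart_rate)",
        rows,
    )
    return conn


notes = {
    "n1": "Pt seen in clinic. BP 140/65, HR 72. Plan: continue current meds.",
    "n2": "Follow-up visit, HR 88, no complaints.",
}
conn = load_rows([extract_vitals(nid, text) for nid, text in notes.items()])
print(conn.execute("SELECT * FROM vitals ORDER BY note_id").fetchall())
# [('n1', 140, 65, 72), ('n2', None, None, 88)]
```

Hand-written patterns like these break easily on the variety found in real notes; they are shown only to illustrate what the structured output of text analytics looks like.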
i. Named Entity Recognition (NER)
Medical Named Entity Recognition (NER) refers to the task of automatically identifying medical terms in chunks of textual data [5]. It is mainly used to extract important entity categories from clinical notes, such as clinical findings, procedures, and drug names and their dosages [6]. NER is typically a two-phase process: the first phase detects and determines the entities within the text, and the second selects and extracts them [8]. Examples of NER applications in medicine include extracting medical concepts from clinical reports written in different languages, discovering concepts related to temporal expressions, anonymizing personal data and, finally, extracting relationships between medical entities [7].
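The two-phase process described above can be sketched with a small dictionary (gazetteer) matcher. This is a minimal illustration, not the method of [6] or [8]: the `LEXICON` entries, the dosage pattern and the helper names are hypothetical, and production systems typically rely on curated vocabularies and trained models. Phase 1 (`detect_candidates`) finds every lexicon term in the text; phase 2 (`select_entities`) keeps non-overlapping spans, preferring the leftmost and, at equal starts, the longest, and attaches a dosage written directly after a drug name.

```python
import re

# Hypothetical mini-lexicon; real systems draw on curated medical vocabularies.
LEXICON = {
    "metformin": "DRUG",
    "lisinopril": "DRUG",
    "type 2 diabetes": "FINDING",
    "diabetes": "FINDING",
    "hypertension": "FINDING",
    "chest x-ray": "PROCEDURE",
}
# A dosage such as "500 mg" written directly after a drug name.
DOSE_PATTERN = re.compile(r"\s*(\d+(?:\.\d+)?\s*(?:mg|mcg|g|ml))\b", re.IGNORECASE)


def detect_candidates(text):
    """Phase 1: find every lexicon term in the text (overlapping matches allowed)."""
    lowered = text.lower()
    candidates = []
    for term, label in LEXICON.items():
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            candidates.append((match.start(), match.end(), label))
    return candidates


def select_entities(text, candidates):
    """Phase 2: keep non-overlapping spans and extract them, adding drug dosages."""
    entities, taken_until = [], -1
    # Leftmost spans first; at equal starts, the longest span wins.
    for start, end, label in sorted(candidates, key=lambda c: (c[0], c[0] - c[1])):
        if start < taken_until:
            continue  # overlaps an entity that was already selected
        entity = {"text": text[start:end], "label": label}
        if label == "DRUG":
            dose = DOSE_PATTERN.match(text, end)
            if dose:
                entity["dose"] = dose.group(1)
        entities.append(entity)
        taken_until = end
    return entities


note = ("Started metformin 500 mg for type 2 diabetes; lisinopril 10 mg "
        "for hypertension. Chest X-ray ordered.")
for entity in select_entities(note, detect_candidates(note)):
    print(entity)
```

Because phase 2 prefers the longest span at a given position, `type 2 diabetes` is returned rather than the nested `diabetes`.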

ii. Hypothesis Generation And Knowledge Discovery
NLP is often used to discover new hypotheses and hidden knowledge in textual data. It gives healthcare professionals important insights that can be used in their daily practice and research. This technique helps to detect a patient's risk factors, symptoms and critical events, and facilitates decision making by health professionals. Example applications include the detection and identification of adverse drug events (ADEs) from the content of scientific articles and health-related social media [9], and the classification and longitudinal study of pain in patients with metastatic prostate cancer using medical record text [10].
![Source: [10] Longitudinal analysis of pain in patients with metastatic prostate cancer using natural language processing of medical record text](https://towardsdatascience.com/wp-content/uploads/2020/11/1Ntd9OwvU2wQlg6dd3XFNrg.png)
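A common starting point for literature-based hypothesis generation is to count how often two kinds of concept are mentioned together. The sketch below is an illustration under stated assumptions, not the neural approach of [9]: it uses hypothetical drug and event word lists, treats each sentence as the unit of co-occurrence, and ranks drug-event pairs by lift, the ratio of their observed co-occurrence rate to the rate expected if they were independent. High-lift pairs are only candidate hypotheses for expert review.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical concept lists; real systems map text to curated vocabularies.
DRUGS = {"warfarin", "ibuprofen", "metformin"}
EVENTS = {"bleeding", "nausea", "rash"}


def terms_in(sentence, vocabulary):
    """Return the vocabulary words that appear in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower())) & vocabulary


def rank_drug_event_pairs(documents, min_count=2):
    """Rank drug-event pairs co-mentioned in a sentence by lift."""
    drug_counts, event_counts, pair_counts = Counter(), Counter(), Counter()
    n_sentences = 0
    for document in documents:
        for sentence in re.split(r"[.!?]+", document):
            if not sentence.strip():
                continue
            n_sentences += 1
            drugs, events = terms_in(sentence, DRUGS), terms_in(sentence, EVENTS)
            drug_counts.update(drugs)
            event_counts.update(events)
            pair_counts.update(product(drugs, events))
    ranked = []
    for (drug, event), together in pair_counts.items():
        if together < min_count:
            continue  # too rare to suggest anything
        # lift = P(drug, event) / (P(drug) * P(event)), estimated over sentences
        lift = together * n_sentences / (drug_counts[drug] * event_counts[event])
        ranked.append((drug, event, together, round(lift, 2)))
    return sorted(ranked, key=lambda row: row[3], reverse=True)


corpus = [
    "Patient on warfarin presented with bleeding. Metformin continued.",
    "Warfarin dose increased; minor bleeding noted. No rash.",
    "Ibuprofen taken for pain. Patient reports nausea after ibuprofen.",
    "Metformin started. Mild nausea reported.",
    "No bleeding on ibuprofen this week.",
]
print(rank_drug_event_pairs(corpus))  # [('warfarin', 'bleeding', 2, 3.0)]
```

The sketch ignores negation, so "No bleeding on ibuprofen" still counts as a co-mention; real pipelines add negation detection and far larger corpora before any pair is treated as a signal.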
iii. Text Summarization
Automatic summarization refers to the process by which the main topics of one or more documents are identified and presented concisely and precisely [4]. This allows healthcare professionals to obtain the main points of a document, such as EHR clinical notes or admission notes, in the form of a short summary. This reduces the time required, increases productivity and makes the analysis less susceptible to the subjectivity that appears when large volumes of information are analyzed. Example applications include a summarization system that automatically generates summaries of medical news articles [11] and the extraction of sentences from research paper abstracts (Medline citations) to support clinicians' information needs [12].
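Extractive summarization, which selects existing sentences rather than writing new ones (as the sentence-extraction work in [12] does), can be illustrated with a simple frequency heuristic: score each sentence by the average frequency, within the note, of its non-stopword tokens, and keep the top-scoring sentences in their original order. The stopword list, the scoring rule and the example note are assumptions for illustration only.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller, domain-aware one.
STOPWORDS = {"a", "an", "and", "at", "by", "for", "in", "is", "of", "on",
             "the", "to", "was", "with"}


def content_words(text):
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Extractive summary: keep the highest-scoring sentences in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average frequency, so long sentences are not favoured just for length.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:max_sentences]))


note = ("Patient admitted with chest pain. "
        "Troponin was elevated and chest pain persisted overnight. "
        "Family visited in the afternoon. "
        "Cardiology diagnosed acute coronary syndrome and started aspirin for chest pain.")
print(summarize(note))
# Patient admitted with chest pain. Troponin was elevated and chest pain persisted overnight.
```

On this note the heuristic keeps the two chest-pain sentences but drops the diagnosis, which illustrates why domain knowledge, as used in [11], matters for medical summarization.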

iv. Text Classification
Text classification refers to the automatic process of assigning text from massive data collections to categories; it plays an essential role in text data retrieval and mining [13]. Classification of health-related text is considered a more challenging case of text classification, as medical text generally contains domain-specific terminology that refers to concepts or abbreviations in the medical field, such as a blood pressure reading of 140/65. Moreover, medical records often contain grammatically poor sentences [14]. Example applications of text classification include clinical alert or risk factor categorization, automatic diagnostic classification, patient stratification, adverse event classification, electronic health record classification, symptomatology categorization, and health mining via opinion and sentiment analysis [4].
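As a minimal illustration of supervised text classification, the sketch below implements multinomial Naive Bayes with add-one smoothing using only the Python standard library. Naive Bayes is a classic baseline chosen here for brevity; it is not the neural network method of [14]. The four training snippets and the `cardiac`/`respiratory` labels are hypothetical. The tokenizer keeps `/` so that readings such as `140/65` survive as single tokens, a small nod to the medical-text quirks mentioned above.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase word tokens; '/' is kept so '140/65' stays a single token."""
    return re.findall(r"[a-z0-9/]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            # Work in log space to avoid underflow from multiplying small probabilities.
            score = math.log(doc_count / total_docs)
            for word in tokenize(text):
                if word in self.vocabulary:  # words unseen in training are ignored
                    score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training snippets and labels, for illustration only.
train_texts = [
    "chest pain radiating to left arm, bp 160/95",
    "palpitations and elevated troponin",
    "shortness of breath and wheezing, oxygen saturation low",
    "productive cough with fever, crackles on auscultation",
]
train_labels = ["cardiac", "cardiac", "respiratory", "respiratory"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(model.predict("patient reports chest pain and palpitations"))  # cardiac
print(model.predict("wheezing and cough overnight"))  # respiratory
```

With realistic data one would train on thousands of labelled notes and evaluate on held-out examples; scikit-learn's `MultinomialNB` provides a tested implementation of the same classifier.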

Conclusion:
With the shift of healthcare to EHRs, the future of medical text analytics is bright. Continued advances in medical text analytics offer better quality of care to patients, lower healthcare costs at both the hospital and national level, and give patients greater access to healthcare. Healthcare professionals and health policy makers must understand the benefits this technique provides and construct a solid plan to incorporate this technology into daily practice. Different types of text analytics terminologies and datasets in the medical domain were introduced in this paper; with this abundance of resources, medical professionals and health informatics practitioners can leverage existing resources and start to implement medical text mining in their practice at a much lower cost.
Lastly, embracing a new practice or technology in the healthcare industry is not a trivial task, especially when the industry involves patients' lives and privacy. Numerous challenges have to be overcome, such as the preparedness of existing facilities, legal and ethical issues, patient privacy and the support of senior management. The author hopes that medical professionals can overcome these challenges, as this technology could help countless lives and save large sums of money.
Thank you for reading!
References:
[1] M. Zwolenski and L. Weatherill, "The Digital Universe: Rich Data and the Increasing Value of the Internet of Things," Australian Journal of Telecommunications and the Digital Economy, vol. 2, no. 3, 2014.
[2] K. Feldman, N. Hazekamp, and N. V. Chawla, "Mining the Clinical Narrative: All Text are Not Equal," 2016 IEEE International Conference on Healthcare Informatics (ICHI), 2016.
[3] I. Muñoz and M. R. Zambrana, "Applying Ontologies to Terminology: Advantages and Disadvantages," Hermes: Journal of Language and Communication in Business, vol. 26, no. 51, pp. 65–77, 2013, doi: 10.7146/hjlcb.v26i51.97438.
[4] "What is Text Mining, Text Analytics and Natural Language Processing?" Linguamatics, 14-Aug-2020. [Online]. Available: https://www.linguamatics.com/what-text-mining-text-analytics-andnatural-language-processing. [Accessed: 13-Nov-2020].
[5] M. S. Simpson and D. Demner-Fushman, "Biomedical Text Mining: A Survey of Recent Progress," Mining Text Data, pp. 465–517, 2012.
[6] P. Corbett and A. Copestake, "Cascaded classifiers for confidence-based chemical named entity recognition," BMC Bioinformatics, vol. 9, no. S11, 2008.
[7] C. Luque, J. M. Luna, M. Luque, and S. Ventura, "An advanced review on Text Mining in Medicine," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 9, 2018, doi: 10.1002/widm.1302.
[8] G. Popovski, S. Kochev, B. Seljak, and T. Eftimov, "FoodIE: A Rule-based Named-entity Recognition Method for Food Information Extraction," Proceedings of the 8th International Conference on Pattern Recognition Applications and Methods, 2019.
[9] A. P. Tafti, J. Badger, E. LaRose, E. Shirzadi, A. Mahnke, J. Mayer, Z. Ye, D. Page, and P. Peissig, "Adverse drug event discovery using biomedical literature: a big data neural network adventure," JMIR Medical Informatics, vol. 5, no. 4, 2017.
[10] N. H. Heintzelman, R. J. Taylor, L. Simonsen, R. Lustig, D. Anderko, J. A. Haythornthwaite, L. C. Childs, and G. S. Bova, "Longitudinal analysis of pain in patients with metastatic prostate cancer using natural language processing of medical record text," Journal of the American Medical Informatics Association, vol. 20, no. 5, pp. 898–905, 2013.
[11] K. Sarkar, "Using Domain Knowledge for Text Summarization in Medical Domain," International Journal of Recent Trends in Engineering, vol. 1, 2009.
[12] S. R. Jonnalagadda, G. D. Fiol, R. Medlin, C. Weir, M. Fiszman, J. Mostafa, and H. Liu, "Automatically extracting sentences from Medline citations to support clinicians' information needs," Journal of the American Medical Informatics Association, vol. 20, no. 5, pp. 995–1000, 2013.
[13] F. Sebastiani, "Machine learning in automated text categorization," ACM Computing Surveys, vol. 34, no. 1, pp. 1–47, 2002.
[14] L. Qing, W. Linhong, and D. Xuehai, "A Novel Neural Network-Based Method for Medical Text Classification," Future Internet, vol. 11, no. 12, p. 255, 2019.
[15] J. Manyika, M. Chui, B. Brown, J. Bughin, R.