
Creating Smart Knowledge Base Systems (KBS) using an advanced NLP library

Creating and sustaining large Knowledge Base Systems (KBS) can be a daunting exercise even for the largest corporations in the world.


Enterprises, institutions, and other large organizations have built up their knowledge over many years of existence by recording it in books, journals/articles, documents, etc. Continuous access to this knowledge is essential for sustained operations by employees, students, and teaching/research communities. KM (Knowledge Management) tools in the market address this need to a certain extent by creating a knowledge repository of sorts and enabling access to it. KM tools require a document to be tagged (either manually or automatically) in order for it to be easily searchable by users. While these tools serve a purpose, organizations face new challenges in the digital era –

  1. Capturing contextual & semantic information in the knowledge base
  2. The ability to access information instantly
  3. The need for an automated & sustained methodology to create such a system

As-is scenario

A search result like this may not serve the purpose all the time

An enterprise knowledge management solution could surface all the relevant documents for a user’s query. But that hardly satisfies the user’s requirement, because now that the user has access to (let’s say) 10 documents, one still has to read through them to get the information one is looking for. This can prove to be a tedious and frustrating task for users in live projects, the research community, etc., as it introduces a delay in getting the information.

To-be scenario

What information can I get for you today?

In the future, accessing information from a knowledge base could be as trivial as asking a chatbot for the exact information one needs. For example, a query like "In how many project implementations was a certain technology used?" would come back instantaneously with an answer. Although this might sound too good to be true to most people, the technological advancements in Deep Learning & Natural Language Processing frameworks have made such a solution feasible. Moreover, thanks to the democratization of AI technologies and the healthy ecosystem of open source projects in AI, such a futuristic solution is now within the grasp of small and large enterprises alike.

Who is the Genie behind the chatbot?

Well, it’s obvious there is no Genie behind all of this. Figuratively, though, the hordes of research scientists and research institutes who relentlessly contribute cutting-edge work in ML & AI for NLP are the genies who made this possible, and we should all be collectively grateful for their efforts. Here’s how the solution works.

Knowledge Graph (Illustration)

The chatbot is integrated with a huge graph database that captures the various entities (such as a person, company, location, or technology name) as nodes, and the relationships & associations among those entities as edges. For example, in the sentence ‘company X uses technology XYZ’, ‘uses’ is a relationship that exists between entities X & XYZ, and it is stored as an edge. Further, all other associated information about entities and relationships is stored as properties of the nodes and edges respectively. We were motivated by how Google uses knowledge graphs to store and retrieve information. Essentially, this solution can be thought of as building a Google-like semantic search engine on a body of knowledge for instant retrieval.
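To make this concrete, here is a minimal sketch of the node/edge/property idea in Python, using networkx as a lightweight stand-in for a production graph database. The node types, relation name, and document IDs below are hypothetical, purely for illustration:

```python
import networkx as nx

# Entities become nodes; each relationship becomes a typed, directed edge.
# All other associated details live on the nodes and edges as properties.
kg = nx.MultiDiGraph()

kg.add_node("Project Alpha", type="Project")   # hypothetical entity
kg.add_node("Project Beta", type="Project")    # hypothetical entity
kg.add_node("XYZ", type="Technology")

kg.add_edge("Project Alpha", "XYZ", relation="uses", source_doc="doc_017.pdf")
kg.add_edge("Project Beta", "XYZ", relation="uses", source_doc="doc_042.pdf")

# "In how many project implementations was technology XYZ used?"
count = sum(
    1
    for src, _, attrs in kg.in_edges("XYZ", data=True)
    if attrs.get("relation") == "uses" and kg.nodes[src].get("type") == "Project"
)
print(count)  # -> 2
```

In a real deployment, networkx would be replaced by a graph database such as Neo4j, where the same question becomes a single declarative query over the graph rather than a Python loop.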

Open source library from Explosion.ai

The most critical part, the heart of the solution, is determining which entities occur in the natural language text and what relationships exist among them. This is powered by NLP libraries such as spacy.io, which empower developers to train customized NER (Named Entity Recognition) and dependency parser models on any natural language text. The outcome of these models is well-identified entities and relationships that capture the semantic context within the documents. This semantic knowledge is then fed into the knowledge graph.
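As an illustration, here is a minimal extraction sketch using spaCy’s pretrained English pipeline. The naive subject-verb-object rule below is only a stand-in for the customized NER and dependency-parser models described above:

```python
import spacy

# Assumes the small pretrained English pipeline is installed:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Company X uses technology XYZ for its payment platform.")

# Named entities identified by the pretrained NER model
for ent in doc.ents:
    print(ent.text, ent.label_)

# Naive subject-verb-object triples read off the dependency parse;
# a production system would train custom NER and relation models instead.
for token in doc:
    if token.pos_ == "VERB":
        subjects = [w for w in token.lefts if w.dep_ in ("nsubj", "nsubjpass")]
        objects = [w for w in token.rights if w.dep_ in ("dobj", "attr")]
        if subjects and objects:
            print((subjects[0].text, token.lemma_, objects[0].text))
```

Each extracted (subject, relation, object) triple maps directly onto a node-edge-node structure in the knowledge graph.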

Once the above process flow is automated into a pipeline, the knowledge graph can absorb future additions or overlaps when processing subsequent documents. Where it recognizes the same entity, it reuses it, and it creates new entities only when they are completely new to the knowledge graph. This way, the knowledge base can grow organically and in a sustained manner for all subsequent additions while retaining the relationships with existing knowledge.
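A sketch of that dedup-and-reuse step, again with networkx and a hypothetical alias table; real implementations typically lean on a graph database’s merge/upsert semantics plus a proper entity-resolution model:

```python
import networkx as nx

def upsert_entity(kg, name, etype, aliases):
    """Reuse the existing node if this entity is already known;
    create a new node only when it is genuinely new to the graph."""
    # Normalize the surface form and resolve known aliases so that
    # "Acme Corp." and "Acme Inc" collapse onto one canonical node.
    key = aliases.get(name.lower(), name.lower())
    if key not in kg:
        kg.add_node(key, label=name, type=etype)
    return key

kg = nx.MultiDiGraph()
aliases = {"acme corp.": "acme", "acme inc": "acme"}  # hypothetical alias table

a = upsert_entity(kg, "Acme Corp.", "Company", aliases)
t = upsert_entity(kg, "XYZ", "Technology", aliases)
kg.add_edge(a, t, relation="uses", source_doc="doc_101.pdf")

# A later document mentions the same company under a different surface form:
b = upsert_entity(kg, "Acme Inc", "Company", aliases)
assert a == b                  # the existing node is reused
print(kg.number_of_nodes())    # -> 2, not 3
```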

Challenges & other considerations

As with any technology, implementing a Smart Knowledge Base System (KBS) has its own set of challenges.

  1. Change Management – The solution entails a new process of digitizing and processing documents in the enterprise, requiring the active participation of knowledge SMEs in the organization; it might also require the creation of new roles in the organization
  2. Garbage In, Garbage Out – as the famous adage goes. Ensuring that the documents are digitized and the information is extracted accurately is paramount for the success of this initiative. In addition, the tagging of entities & relationships needs to be done accurately
  3. This solution may not be needed in organizations where latency in information retrieval is not a huge roadblock; there, investing time & resources might prove counterproductive
  4. Data Science Team – It’s essential to have an experienced data science team oversee the implementation, even if it means hiring experts externally
  5. Technical Infrastructure – The solution would need reasonable to significant investment in technology infrastructure, either on-premises or in the cloud. If you already have an ongoing relationship with a cloud service provider, provisioning for this requirement would be seamless
  6. Lastly, involve the end users in the solution even during the design stage, to encourage large-scale adoption and to avoid mismatched expectations at a later stage

Liked my article? Please hit the claps button to help others find it too, and check out my other popular posts below:

  1. From Business Intelligence to Data Science & Machine Learning
  2. ML & DL Learning path

I published this article on Linkedin too.

