Knowledge Management (KM) is a powerful tool for improving the business results of any enterprise. It does so by engaging the enterprise in a concerted effort to optimise the use of its knowledge assets, which include recorded information and methodologies as well as human experience and know-how.

KM is both a strategic and a tactical endeavour. It will fail without vision and leadership from the top. It will only succeed when all parts of the organisation collaborate to align people and resources toward a common goal.

KM is also a technical endeavour. It requires a systematic approach based on standards and best practices, as well as the support of software systems for managing knowledge organisation, discovery, and sharing.

The technologies and methodologies used to organise, discover and share knowledge are the subject of this article. Academically, this activity is firmly grounded in library and information science, but other disciplines now play an increasingly important role, including computer and data science, language engineering, linked data, social media and user experience design.

The Origins and Evolution of Knowledge Organisation Systems

Organisation methods and practices began as soon as people started collecting knowledge in libraries. In the 3rd century BCE, the poet and scholar Callimachus of Cyrene produced the Pinakes, a 120-volume bibliographic catalogue of the half-million works held in the Library of Alexandria. The catalogue comprised indexes and tables of information, including title, author and subject, as well as brief biographies and abstracts.

In the late 19th century CE, the American librarian and educator Melvil Dewey developed a hierarchically organised classification system. The Dewey Decimal Classification was widely adopted by libraries around the world and is still in use today. In the 20th century CE, the Indian mathematician and librarian S. R. Ranganathan devised a faceted classification system, the Colon Classification, which supports much greater indexing specificity.

Library classification schemes were primarily designed to place physical books onto physical shelves. This posed an immediate knowledge organisation challenge. A physical book can’t be in two places at the same time, but many books span multiple subjects and justify being accessible under more than one heading.

The problem was overcome by creating indexes, which can provide access to a book under each of the subject headings it covers, as well as by title, author and other parameters.


Simple card indexes grew in size and complexity, taking the form of alphabetical and hierarchical indexes of subjects, places and people. Libraries standardised the terminology they used by creating ‘controlled vocabularies’ and ‘authority files’, such as the Library of Congress Subject Headings.

The Digital Age

Once information was stored digitally rather than on paper, computers could search the full text of books and documents at lightning speed. For a while, some people thought they no longer needed the knowledge organisation tools and methods developed by libraries.

Most enterprises came to realise, however, that full-text search has its own limitations. Search can find specific words or phrases, but it has no understanding of which words and phrases are significant; it doesn’t understand what a document is about. Additionally, because one word can refer to different things and one thing can be described by different words, full-text search is necessarily imprecise. For every relevant document returned, a user may have to wade through hundreds or thousands of irrelevant ones.

A more insidious problem is that some relevant documents will never be seen because the language used by the searcher doesn’t match the language used by the author.
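To make both problems concrete, here is a minimal sketch over a hypothetical four-document corpus (the documents, the `SYNONYMS` table and the function names are all invented for illustration). Plain keyword matching misses a document that says “automobile” instead of “car”, and returns both senses of “jaguar”.

```python
import re

# Hypothetical mini-corpus: document ID -> text.
DOCS = {
    "d1": "Quarterly revenue from automobile sales rose sharply.",
    "d2": "Car sales in Europe fell in the second quarter.",
    "d3": "The jaguar is a big cat native to the Americas.",
    "d4": "Jaguar unveiled a new electric car model.",
}

# Hypothetical variant-term list: a search term mapped to words with the same meaning.
SYNONYMS = {"car": {"car", "cars", "automobile", "automobiles"}}


def tokens(text: str) -> set[str]:
    """Lower-case word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_search(term: str) -> set[str]:
    """Naive full-text search: documents containing the exact word."""
    return {doc_id for doc_id, text in DOCS.items() if term.lower() in tokens(text)}


def expanded_search(term: str) -> set[str]:
    """Search for the term or any of its listed variants."""
    variants = SYNONYMS.get(term.lower(), {term.lower()})
    return {doc_id for doc_id, text in DOCS.items() if variants & tokens(text)}


print(keyword_search("car"))     # misses d1, which says "automobile"
print(expanded_search("car"))    # finds d1, d2 and d4
print(keyword_search("jaguar"))  # mixes the animal (d3) with the car maker (d4)
```

Variant expansion recovers the missed document but does nothing about the two senses of “jaguar”; separating those requires knowing which concept each occurrence refers to, which is what a KOS provides.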

Knowledge Organisation Systems

Knowledge Organisation Systems (KOS) are formally structured schemes that describe collections of like things, such as subjects, people and places. They disambiguate words that can have multiple meanings, and they map together variant terms for the same concept. Their relationships assert facts about how concepts and entities relate to one another.
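A minimal sketch of how a KOS can record these distinctions, loosely modelled on SKOS properties (`prefLabel`, `altLabel`, `broader`). The `ex:` identifiers, the `Concept` class and the `lookup` and `broader_labels` functions are hypothetical; a real scheme would use resolvable URIs and an RDF toolkit.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    uri: str                                            # unique identifier
    pref_label: str                                     # preferred term (cf. skos:prefLabel)
    alt_labels: set[str] = field(default_factory=set)  # variant terms (cf. skos:altLabel)
    broader: set[str] = field(default_factory=set)     # broader concepts (cf. skos:broader)


# Hypothetical scheme: the "ex:" identifiers are invented for illustration.
SCHEME = {c.uri: c for c in [
    Concept("ex:Planets", "Planets"),
    Concept("ex:ChemicalElements", "Chemical elements"),
    Concept("ex:MercuryPlanet", "Mercury (planet)", {"Mercury"}, {"ex:Planets"}),
    Concept("ex:MercuryElement", "Mercury (element)",
            {"Mercury", "Hg", "quicksilver"}, {"ex:ChemicalElements"}),
]}


def lookup(term: str) -> list[str]:
    """All concepts whose preferred or variant label matches `term`."""
    t = term.casefold()
    return sorted(
        c.uri for c in SCHEME.values()
        if t == c.pref_label.casefold() or t in {a.casefold() for a in c.alt_labels}
    )


def broader_labels(uri: str) -> list[str]:
    """Read the assertion 'X is a kind of Y' from the broader links."""
    return sorted(SCHEME[b].pref_label for b in SCHEME[uri].broader)
```

An ambiguous term such as “Mercury” resolves to two distinct concepts, which a search interface can present as a choice, while variants such as “Hg” and “quicksilver” lead to the same single concept.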

Numerous national and international standards are available to guide the design of KOS. Foremost among them are ISO 25964 and the W3C’s SKOS (Simple Knowledge Organization System) and OWL (Web Ontology Language) recommendations.

Linked Data

One of the most exciting innovations in the management of knowledge is the prolific growth of Linked Data. Because Linked Data identifies things with shared URIs that any dataset can reference, diverse datasets can be recombined in novel ways. Linked Data can be kept behind the firewall as Linked Enterprise Data or published to the world as Linked Open Data. Linked Open Data provides a framework for the entire planet to share knowledge. When different user communities and perspectives come together and collaborate, our collective knowledge is enriched in the process. Any organisation can start tapping into the wealth of structured knowledge currently available as Linked Open Data. Organisations that fail to embrace the Linked Data opportunity may struggle to keep up with those that do.
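The recombination can be sketched with datasets represented as sets of (subject, predicate, object) triples. In this minimal sketch all identifiers are hypothetical, and plain Python sets stand in for an RDF store queried with SPARQL; what matters is that both datasets use the same identifiers for places, so merging them is a simple union and a question neither can answer alone becomes answerable.

```python
Triple = tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical datasets; all identifiers are invented. They share the place URIs.
INTERNAL: set[Triple] = {  # e.g. Linked Enterprise Data behind the firewall
    ("int:report42", "int:mentions", "geo:Mumbai"),
    ("int:report43", "int:mentions", "geo:Oslo"),
}
EXTERNAL: set[Triple] = {  # e.g. a Linked Open Data source describing places
    ("geo:Mumbai", "geo:country", "geo:India"),
    ("geo:Oslo", "geo:country", "geo:Norway"),
}


def merge(*graphs: set[Triple]) -> set[Triple]:
    """With shared identifiers, combining datasets is just a union of triples."""
    return set().union(*graphs)


def objects(graph: set[Triple], subject: str, predicate: str) -> set[str]:
    return {o for s, p, o in graph if s == subject and p == predicate}


def reports_about_country(graph: set[Triple], country: str) -> list[str]:
    """Reports mentioning any place located in `country`: a join across datasets."""
    return sorted(
        s for s, p, o in graph
        if p == "int:mentions" and country in objects(graph, o, "geo:country")
    )


merged = merge(INTERNAL, EXTERNAL)
print(reports_about_country(merged, "geo:India"))    # ['int:report42']
print(reports_about_country(INTERNAL, "geo:India"))  # [] (internal data alone cannot answer)
```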

Semantic Annotation

Words and phrases found in content may be individually annotated by linking them to the most specific concepts and names they discuss or represent. Larger sections of content, including complete documents, can be classified to the broader categories they are about. The indexing process can also extract concepts and names from content and submit them as candidates for inclusion in the ontology. Natural Language Processing (NLP) techniques analyse the text and metadata of content. The indexing process uses the semantic relationships of the ontology to determine the context of words and phrases within the text, and then matches them to disambiguated concepts in the ontology.
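A minimal sketch of context-based annotation and document classification. It swaps in a deliberately simple heuristic, not any particular product’s method: each candidate concept is scored by how many labels of its related concepts (the hypothetical `context` sets) also appear in the text. The ontology, identifiers and functions are invented, and only single-word labels are handled.

```python
import re

# Hypothetical mini-ontology. "context" holds single-word labels of each concept's
# related concepts; it is what lets the annotator tell homonyms apart.
ONTOLOGY = {
    "ex:JaguarAnimal": {"labels": {"jaguar"}, "broader": "ex:Animals",
                        "context": {"cat", "rainforest", "predator", "prey"}},
    "ex:JaguarCars": {"labels": {"jaguar"}, "broader": "ex:CarMakers",
                      "context": {"car", "engine", "vehicle", "model"}},
    "ex:Rainforest": {"labels": {"rainforest"}, "broader": "ex:Habitats",
                      "context": set()},
}


def annotate(text: str) -> list[tuple[str, str]]:
    """Link words to concepts, using context overlap to choose between homonyms."""
    words = re.findall(r"[a-z]+", text.lower())
    vocab = set(words)
    annotations = []
    for word in words:
        candidates = [c for c, d in ONTOLOGY.items() if word in d["labels"]]
        if not candidates:
            continue
        # Score each candidate by how many of its context words appear in the text.
        scored = sorted(((len(ONTOLOGY[c]["context"] & vocab), c) for c in candidates),
                        reverse=True)
        # Annotate only with a single candidate or a clear winner;
        # ties are left for a human indexer.
        if len(scored) == 1 or scored[0][0] > scored[1][0]:
            annotations.append((word, scored[0][1]))
    return annotations


def classify(text: str) -> list[str]:
    """Document-level classification: roll annotations up to broader categories."""
    return sorted({ONTOLOGY[c]["broader"] for _, c in annotate(text)})


print(annotate("The jaguar is a large cat that stalks prey in the rainforest."))
print(classify("Jaguar showed a new model with a bigger engine."))  # ['ex:CarMakers']
```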

Semantic Enrichment

Concepts and names within an enterprise ontology may be mapped to equivalents in external ontologies. Alternatively, external ontologies may be adopted and used directly as reference authorities. When internal content is indexed against, or mapped to, external resources, additional information contained in those resources can be retrieved. For example, text analysis may identify that a document mentions ‘London’ and that this refers to London, England, as distinct from London, Ontario. Once the named entity has been unambiguously identified, it can be mapped to equivalent entities in resources such as DBpedia and GeoNames. These resources can then deliver additional information such as latitude and longitude coordinates, population statistics, maps, images and data on industry, commerce and government.
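The enrichment step can be sketched as a lookup from an internal, already-disambiguated entity to its external equivalent, and from there to the facts held about it. In this minimal sketch the `ent:` identifiers and the `enrich` function are hypothetical, and `EXTERNAL_FACTS` is a local stand-in for what a live system would retrieve from DBpedia or GeoNames (for example, with a SPARQL query). Only the two DBpedia resource URIs are real, and the coordinates are approximate.

```python
# Hypothetical mapping from enterprise entity IDs to external equivalents
# (cf. owl:sameAs or skos:exactMatch). The "ent:" IDs are invented.
SAME_AS = {
    "ent:London_England": "http://dbpedia.org/resource/London",
    "ent:London_Ontario": "http://dbpedia.org/resource/London,_Ontario",
}

# Local stand-in for data fetched from the external resource.
# Coordinates are approximate and for illustration only.
EXTERNAL_FACTS = {
    "http://dbpedia.org/resource/London":
        {"country": "United Kingdom", "lat": 51.51, "long": -0.13},
    "http://dbpedia.org/resource/London,_Ontario":
        {"country": "Canada", "lat": 42.98, "long": -81.25},
}


def enrich(entity_id: str) -> dict:
    """Return external facts for an internal entity, or {} if it is unmapped."""
    external_uri = SAME_AS.get(entity_id)
    if external_uri is None:
        return {}
    return {"sameAs": external_uri, **EXTERNAL_FACTS.get(external_uri, {})}


print(enrich("ent:London_Ontario")["country"])  # Canada
```

Keeping the mapping separate from the fetched facts means the enterprise ontology only has to maintain identifiers; the descriptive data stays with, and is updated by, the external source.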

Human and Machine Indexing

Indexing content can be a fully manual process, a fully automated process or, most often, a mixture of the two. Automation involves the use of Named Entity Recognition (NER) and NLP systems. It also draws on the semantic relationships found within Knowledge Organisation Systems, general-purpose lexicons and grammatical parsers, along with custom-built indexing rules and machine learning processes. Modular NLP components can be assembled to create a finely tuned semantic indexing pipeline.
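The modular idea can be sketched as a list of interchangeable stages that each read and extend a shared document record. This is a minimal illustration: the stage functions, the `GAZETTEER` entries and the year rule are hypothetical, and production NLP toolkits offer far richer components.

```python
import re
from typing import Callable

Doc = dict  # each stage reads from and adds keys to this dict
Stage = Callable[[Doc], Doc]

# Hypothetical gazetteer derived from KOS labels: surface form -> concept ID.
GAZETTEER = {"Mumbai": "ex:Mumbai", "Ontario": "ex:Ontario"}


def tokenize(doc: Doc) -> Doc:
    doc["tokens"] = re.findall(r"\w+", doc["text"])
    return doc


def gazetteer_ner(doc: Doc) -> Doc:
    """Dictionary-based named entity recognition against KOS labels."""
    doc["entities"] = [(t, GAZETTEER[t]) for t in doc["tokens"] if t in GAZETTEER]
    return doc


def year_rule(doc: Doc) -> Doc:
    """A custom-built indexing rule: tag four-digit years from 1900 to 2099."""
    doc["years"] = [t for t in doc["tokens"] if re.fullmatch(r"(19|20)\d\d", t)]
    return doc


def run_pipeline(text: str, stages: list[Stage]) -> Doc:
    doc: Doc = {"text": text}
    for stage in stages:
        doc = stage(doc)
    return doc


result = run_pipeline("The congress met in Mumbai in June 2016.",
                      [tokenize, gazetteer_ner, year_rule])
print(result["entities"])  # [('Mumbai', 'ex:Mumbai')]
print(result["years"])     # ['2016']
```

Because every stage has the same signature, stages can be reordered, swapped or extended (for instance with a machine-learned classifier) without touching the others.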

During the prototyping phase, tools may be used to create reference training sets through consensus-based human indexing. NLP and NER processes can then be tested against these training sets and optimised. After the prototyping phase, the corpora, ontologies and NLP/NER processing resources may be compiled into a semantic annotation application.
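Testing an automated indexer against a consensus reference set can be sketched as follows. The majority-vote rule, the use of precision and recall, and all the sample tags are assumptions chosen for illustration; the text does not specify how consensus or performance is measured.

```python
from collections import Counter


def consensus(human_indexes: list[set[str]], min_votes: int) -> set[str]:
    """Reference set: concepts assigned by at least `min_votes` human indexers."""
    votes = Counter(concept for index in human_indexes for concept in index)
    return {concept for concept, n in votes.items() if n >= min_votes}


def precision_recall(predicted: set[str], gold: set[str]) -> tuple[float, float]:
    """Precision: share of machine tags that are correct.
    Recall: share of reference tags that the machine found."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical data: three people index the same document.
humans = [
    {"ex:Finance", "ex:India", "ex:Banking"},
    {"ex:Finance", "ex:India"},
    {"ex:Finance", "ex:Banking", "ex:Mumbai"},
]
gold = consensus(humans, min_votes=2)  # Finance, India, Banking; Mumbai has one vote
machine = {"ex:Finance", "ex:India", "ex:Cricket"}
p, r = precision_recall(machine, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.67
```

Re-running this comparison after each change to rules or models gives a simple, repeatable signal for the optimisation step.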

Knowledge Discovery

We first discover knowledge about the world directly through our senses: touch, smell, taste, sight and sound. We then discover knowledge vicariously through books, media and dialogue with other people.

In the digital age most of the vicarious information we receive is delivered via computers and mobile devices. Screens have become the medium between people and knowledge. Knowledge discovery is the end-game of knowledge organisation.

Modes of Human Interaction with Information Systems

There are three fundamentally different ways in which humans interact with information systems: (i) Search, which starts with the user expressing a question (usually as one or two keywords) and proceeds by iterative refinement; (ii) Browse, which starts with the system presenting organised lists or graphs of related things and then follows the user’s chosen pathway; and (iii) Discovery, which happens when the user’s search or browse experience is interrupted by the surfacing of relevant concepts or content that were not part of the user’s initial query.
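The three modes can be contrasted over one small concept scheme. In this minimal sketch the identifiers and function names are hypothetical: search matches labels, browse walks the broader/narrower hierarchy, and discovery surfaces related concepts that the query itself did not return.

```python
# Hypothetical concept scheme for an art collection.
LABELS = {
    "ex:PaintingMovements": "Painting movements",
    "ex:Impressionism": "Impressionism",
    "ex:Cubism": "Cubism",
    "ex:Monet": "Claude Monet",
    "ex:Renoir": "Pierre-Auguste Renoir",
    "ex:WaterLilies": "Water Lilies",
    "ex:Giverny": "Giverny",
    "ex:Picasso": "Pablo Picasso",
}
BROADER = {  # child -> parent: the hierarchy used for browsing
    "ex:Impressionism": "ex:PaintingMovements",
    "ex:Cubism": "ex:PaintingMovements",
    "ex:Monet": "ex:Impressionism",
    "ex:Renoir": "ex:Impressionism",
}
RELATED = {  # associative links used for discovery
    "ex:Monet": {"ex:WaterLilies", "ex:Giverny"},
    "ex:Cubism": {"ex:Picasso"},
}


def search(query: str) -> set[str]:
    """Search: concepts whose label contains the query text."""
    q = query.casefold()
    return {c for c, label in LABELS.items() if q in label.casefold()}


def browse(concept: str) -> list[str]:
    """Browse: the narrower concepts one level below `concept`."""
    return sorted(c for c, parent in BROADER.items() if parent == concept)


def discover(results: set[str]) -> set[str]:
    """Discovery: related concepts that were not among the results."""
    found = set().union(*(RELATED.get(c, set()) for c in results))
    return found - results


hits = search("monet")
print(hits)                       # {'ex:Monet'}
print(browse("ex:Impressionism")) # ['ex:Monet', 'ex:Renoir']
print(discover(hits))             # Giverny and Water Lilies, never mentioned in the query
```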

Examples of Knowledge Discovery

The presentation slides accompanying this talk provide five examples illustrating how Knowledge Organisation Systems support knowledge discovery.

In example 1 below, the user searches for “love”, and the Knowledge Organisation System is used to identify artworks and specific figurative details that are about the concept of love, regardless of which words actually appear in the content.

In example 2 below, the user pans and zooms around images, and the Knowledge Organisation System responds by dynamically updating a discovery panel to reveal the concepts related to what is in view.

In example 3 below, the user browses alphabetical and hierarchical lists of concepts and visual features, and the system responds by opening the image and panning and zooming to the specific visual details.
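Examples 2 and 3 use the same mapping between image regions and concepts, in opposite directions. A minimal sketch, assuming regions are annotated as rectangles; the `Box` type, the coordinates and the `ex:` concepts are hypothetical and not taken from the system shown in the slides.

```python
from typing import NamedTuple, Optional


class Box(NamedTuple):
    """Axis-aligned rectangle in image pixel coordinates (x0 < x1, y0 < y1)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def intersects(self, other: "Box") -> bool:
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)


# Hypothetical annotations of one painting: region -> concept it depicts.
REGIONS = [
    (Box(100, 100, 300, 250), "ex:Dove"),
    (Box(600, 50, 800, 200), "ex:Cupid"),
    (Box(400, 500, 700, 700), "ex:Rose"),
]


def concepts_in_view(viewport: Box) -> list[str]:
    """Example 2: the concepts to show in the discovery panel for this viewport."""
    return sorted({concept for box, concept in REGIONS if box.intersects(viewport)})


def zoom_target(concept: str, margin: float = 20) -> Optional[Box]:
    """Example 3: the area to pan and zoom to when a concept is chosen from a list."""
    for box, c in REGIONS:
        if c == concept:
            return Box(box.x0 - margin, box.y0 - margin, box.x1 + margin, box.y1 + margin)
    return None


print(concepts_in_view(Box(0, 0, 500, 400)))  # ['ex:Dove']
print(zoom_target("ex:Rose"))
```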

In example 4 below, the Knowledge Organisation System works behind the scenes to facilitate the discovery of conceptually related content. Using Linked Open Data, the KOS opens further gateways to related external content from sources such as DBpedia.

Finally, example 5 illustrates how Knowledge Organisation Systems and Linked Open Data ontologies allow enterprises to ‘search outside the box’ of their own content, enabling questions to be answered where the relevant data does not exist within the enterprise’s internal content. The example uses NLP technology to identify people within news articles. The people are then semantically indexed using named entities in external biographical ontologies. The external biographical data can then be queried along with the internal news content.

This technique enables powerful queries, such as finding news articles about politicians who were born in England, are under the age of 50 and hold cabinet positions. The internal content only has to mention the people by name; the rest of the query is answered by referencing knowledge contained in external KOS ontologies.
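The query can be sketched as a join between people found in internal articles and facts held externally. In this minimal sketch every person, article and fact is hypothetical, `BIOS` stands in for an external biographical ontology, birthplace is a plain string rather than a place URI with part-of relationships, and `find_articles` is an illustrative name.

```python
from datetime import date

# Internal content: articles and the people NLP found in them (hypothetical IDs).
ARTICLE_PEOPLE = {
    "news-001": {"person:A", "person:B"},
    "news-002": {"person:C"},
    "news-003": {"person:B"},
}

# Stand-in for an external biographical ontology (hypothetical people and facts).
BIOS = {
    "person:A": {"occupation": "politician", "birthplace": "England",
                 "born": date(1980, 5, 1), "cabinet": True},
    "person:B": {"occupation": "politician", "birthplace": "Scotland",
                 "born": date(1975, 2, 1), "cabinet": True},
    "person:C": {"occupation": "politician", "birthplace": "England",
                 "born": date(1960, 1, 1), "cabinet": True},
}


def age_on(born: date, today: date) -> int:
    """Whole years of age, subtracting one if the birthday has not yet come."""
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))


def matches(person: str, today: date) -> bool:
    bio = BIOS.get(person)
    return (bio is not None and bio["occupation"] == "politician"
            and bio["birthplace"] == "England" and bio["cabinet"]
            and age_on(bio["born"], today) < 50)


def find_articles(today: date) -> list[str]:
    """Articles mentioning at least one person who satisfies every condition."""
    return sorted(a for a, people in ARTICLE_PEOPLE.items()
                  if any(matches(p, today) for p in people))


print(find_articles(date(2016, 6, 1)))  # ['news-001']
```

None of the conditions appear in the article text; they are all answered from the external records once each mention has been linked to a person identifier.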

Conclusions

KM and knowledge organisation (KO) can help enterprises to create, preserve and disseminate actionable knowledge. At the level of knowledge organisation and discovery, the key challenge is to help people see both the forest and the trees: to optimise search so that they can find the needle in the haystack, while also contextualising how resources relate to one another. This addresses the challenges of both retrieval and relevance.

As digital data continues to grow exponentially, the need for knowledge organisation tools only increases. We need systems that can index documents by all the significant concepts they discuss and then summarise what each document is about. Controlled vocabularies and authority files are key enablers in this endeavour.


This article is based on a presentation to the Global Management Congress in Mumbai, June 2016, delivered by Gene Loh on behalf of David Clarke upon acceptance of the Knowledge Management Leadership Award.