Visual images are a powerful medium for communicating ideas and information, and they provide a valuable complement to textual content.  A vast amount of information resides inside photographs, paintings, diagrams and drawings.  This information is readily comprehensible to the human eye but until recently has been relatively inaccessible to machine queries.

‘Access to image-based resources is fundamental to research, scholarship and the transmission of cultural knowledge. Digital images are a container for much of the information content in the Web-based delivery of images, books, newspapers, manuscripts, maps, scrolls, single sheet collections, and archival materials.’

International Image Interoperability Framework (IIIF)

Well-established techniques exist to support searching and browsing images based on metadata applied to the whole image, but search and browse access to specific points or regions within images is a relatively immature field.

Image-level metadata may be sufficient for information access to the majority of images, but it is insufficient to support access to large or complex images.  Three examples, from art, science and current affairs, are considered below.

Figure 1 below shows The Garden of Earthly Delights by Hieronymus Bosch, an allegorical work of art with dense figurative detail.  The painting’s narrative is built up from many separate scenes, each telling a story that is rich in allusion and symbolism.  Metadata applied to the whole image does not enable the viewer to locate or interpret the individual scenes and figures in the work.

Figure 1: The Garden of Earthly Delights, Hieronymus Bosch, c. 1503

Figure 2 below presents an image of the Earth’s moon.  Metadata applied to the whole image does not enable the viewer to identify topographic features or the location of the Apollo landing sites.

Figure 2: Full Moon, Gregory H. Revera, 2010

Figure 3 below presents a photograph of ten international leaders at the G8 Summit in Lough Erne, Northern Ireland, on 18th June 2013.  Image-level metadata can list the names and titles of the people in the photograph, but it cannot match each name to a specific individual.

Figure 3: Leaders of the G8, 2013

The Value Proposition for Sub-Image Annotation

In all three of the examples, while a certain level of information access can be supported by image-level metadata, a much richer knowledge discovery experience can be provided if specific visual details are individually identified.

An analogy with information access to physical books illustrates the value proposition for sub-image annotation.

In a physical library, card catalogues and bibliographic databases provide a means to identify shelves of books and individual books.  Tables of contents and subject indexes then facilitate a deeper level of information access to specific pages, sections and paragraphs within a book.

Image-level metadata may be compared with card catalogues and bibliographic databases: it takes the user to a discrete work but cannot take the user inside the work to discover its interior content.  Sub-image annotation may be compared with tables of contents and subject indexes: it allows the user to search and to navigate inside images.

Image-Level Metadata

Image-level metadata describes the whole image. At a minimum, descriptive properties usually include an image title, long description, date and creator.  They may also include metadata automatically captured by cameras according to the Exif standard. Different knowledge domains and disciplines will each need to capture additional information.  In the cultural heritage community it may be desirable to distinguish the date and creator of the photograph from the date and creator of the object depicted in the photograph, and to add provenance information about the depicted object.  The image-level metadata may be further extended and linked to collection management system records.  In the healthcare and life-sciences community it may be desirable to include metadata captured according to the AIM or DICOM standards.

A generalized prescriptive approach to image-level metadata would fail to satisfy most real-life user needs, which require image-level metadata to be an extensible ontology.  A base set of properties and attributes needs to be defined for any knowledge domain, and it must then be quick and easy to extend.  Image-level metadata provides a catalogue of the images in a collection, so the metadata elements need to support faceted search and navigation of the collection.
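
The idea of a base property set that each domain can extend, and that still supports faceted search, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names ImageRecord, facet_counts and filter_by, the "domain" extension key, and the sample values are all hypothetical and do not come from any particular standard or collection management system.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Minimum descriptive properties named in the text above.
BASE_PROPERTIES = ("title", "description", "date", "creator")


@dataclass
class ImageRecord:
    """Image-level metadata: a fixed base set plus open-ended extensions."""
    title: str
    description: str
    date: str
    creator: str
    # Domain-specific properties (hypothetical keys, e.g. "domain", "provenance").
    extensions: Dict[str, Any] = field(default_factory=dict)

    def get(self, prop: str) -> Any:
        """Return a base property, or an extension property, or None."""
        if prop in BASE_PROPERTIES:
            return getattr(self, prop)
        return self.extensions.get(prop)


def facet_counts(records: List[ImageRecord], prop: str) -> Counter:
    """Count how many records carry each value of one property (one facet)."""
    values = (record.get(prop) for record in records)
    return Counter(value for value in values if value is not None)


def filter_by(records: List[ImageRecord], **criteria: Any) -> List[ImageRecord]:
    """Keep only the records that match every given property value."""
    return [
        record for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]


# Sample catalogue built from the three figures; "domain" is an assumed extension.
records = [
    ImageRecord("The Garden of Earthly Delights",
                "Allegorical painting with dense figurative detail",
                "c. 1503", "Hieronymus Bosch", {"domain": "art"}),
    ImageRecord("Full Moon", "Image of the Earth's moon",
                "2010", "Gregory H. Revera", {"domain": "science"}),
    ImageRecord("Leaders of the G8", "Ten international leaders at the G8 Summit",
                "2013", "(not stated)", {"domain": "current affairs"}),
]

domain_facet = facet_counts(records, "domain")
science_images = filter_by(records, domain="science")
```

Keeping the extensions in an open dictionary is one simple way to let a new domain add properties without changing the base record; a production system would more likely use a formal schema or ontology language.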

Sub-Image-Level Metadata

Sub-image-level metadata, also referred to as annotation metadata, describes particular points of interest or regions of interest in an image.  At a minimum, descriptive properties usually include an annotation label and a longer description.  Additional metadata is required to define the spatial co-ordinates of the point or area and the level of zoom defined for the point or region.  Spatial co-ordinates may be expressed as pixel offsets from the top-left corner of the image or as percentages of its width and height.
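
The properties listed above map naturally onto a small data structure. The following Python sketch is one possible shape, not a reference to any existing annotation standard: the class names Region and Annotation, the "pixel" and "percent" unit strings, and the example image size are assumptions made for illustration. It shows how a region recorded in percentages can be converted to pixel offsets from the top-left corner, and back again.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A rectangle measured from the top-left corner of the image.

    A point of interest is modelled as a region with zero width and height.
    """
    x: float
    y: float
    width: float = 0.0
    height: float = 0.0
    unit: str = "pixel"  # assumed convention: "pixel" or "percent"

    def to_pixels(self, image_width: int, image_height: int) -> "Region":
        """Convert percentage co-ordinates to pixel offsets."""
        if self.unit == "pixel":
            return self
        return Region(
            x=self.x / 100 * image_width,
            y=self.y / 100 * image_height,
            width=self.width / 100 * image_width,
            height=self.height / 100 * image_height,
            unit="pixel",
        )

    def to_percent(self, image_width: int, image_height: int) -> "Region":
        """Convert pixel offsets to percentages of the image size."""
        if self.unit == "percent":
            return self
        return Region(
            x=self.x / image_width * 100,
            y=self.y / image_height * 100,
            width=self.width / image_width * 100,
            height=self.height / image_height * 100,
            unit="percent",
        )


@dataclass(frozen=True)
class Annotation:
    """Sub-image-level metadata for one point or region of interest."""
    label: str
    description: str
    region: Region
    zoom: float = 1.0  # zoom level associated with this point or region


# Hypothetical annotation on a 4000 x 2000 pixel image.
note = Annotation(
    label="Example region",
    description="Placeholder description of a visual detail",
    region=Region(x=25, y=50, width=12.5, height=25, unit="percent"),
    zoom=2.0,
)
in_pixels = note.region.to_pixels(4000, 2000)
round_trip = in_pixels.to_percent(4000, 2000)
```

Storing percentages keeps an annotation valid when the same image is served at several resolutions, whereas pixel offsets are tied to one specific rendition of the image.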

As with image-level metadata, extensibility is essential in order to meet the real-world needs of users across diverse knowledge domains and disciplines.  Annotation metadata provides an inventory of the visual features within an image.  This metadata can support search-inside functions and table-of-contents-style browsable navigation.
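
As a rough illustration of these two functions, the Python sketch below runs a simple text search over a list of annotations and derives a browsable listing from their positions. The function names search_inside and table_of_contents are hypothetical, the matching is a plain case-insensitive substring test rather than a real search index, and the positions given for the lunar features are illustrative values only, not measurements taken from Figure 2.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Annotation:
    label: str
    description: str
    x: float  # percent of image width, measured from the left edge
    y: float  # percent of image height, measured from the top edge


def search_inside(annotations: List[Annotation], query: str) -> List[Annotation]:
    """Return annotations whose label or description contains the query."""
    needle = query.lower()
    return [
        a for a in annotations
        if needle in a.label.lower() or needle in a.description.lower()
    ]


def table_of_contents(annotations: List[Annotation]) -> List[str]:
    """List labels in reading order: top to bottom, then left to right."""
    ordered = sorted(annotations, key=lambda a: (a.y, a.x))
    return [a.label for a in ordered]


# Illustrative positions only; they are not measured from a real image.
moon = [
    Annotation("Tycho", "Prominent ray crater", 45.0, 85.0),
    Annotation("Apollo 11 landing site", "First crewed lunar landing, July 1969", 62.0, 47.0),
    Annotation("Mare Tranquillitatis", "Lunar mare, the Sea of Tranquility", 65.0, 40.0),
]

apollo_hits = search_inside(moon, "apollo")
crater_hits = search_inside(moon, "CRATER")
contents = table_of_contents(moon)
```

Each search hit carries its own co-ordinates, so a viewer could pan and zoom straight to the matching detail instead of merely returning the whole image.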

Further posts in this section will explore specific use cases and emerging standards, and will discuss the benefits of semantic indexing and Linked Data.