Open Annotation Draft Data Model

Abstract

This document provides an intuitive introduction and guide to the Open Annotation Data Model [OA-DM], an interoperable framework for creating associations between related resources, called annotations, using a methodology that conforms to the Architecture of the World Wide Web. This primer explains the fundamental concepts of the Open Annotation Data Model and provides examples of their use. It is intended as a starting point for those wishing to create or use annotation data compliant with the Open Annotation Data Model.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document.

Copyright © 2012-2013 the Contributors to the Open Annotation Core Data Model Specification, published by the Open Annotation Community Group under the W3C Community Contributor License Agreement (CLA). A human-readable summary is available.

This document was published by the Open Annotation Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

This document has been made available to the Open Annotation Community Group for review, but is not endorsed by them. This is a working draft, and it is not endorsed by the W3C or its members. It is inappropriate to refer to this document other than as "work in progress".

Please send general comments about this document to the public mailing list: public-openannotation@w3.org (public archives).

1. Introduction

Annotating, the act of creating associations between distinct pieces of information, is a pervasive activity online in many guises, but it lacks a structured approach. Web citizens make comments about online resources using tools built into the hosting web site, external web services, or the functionality of an annotation client. Comments about photos on Flickr, videos on YouTube, people's posts on Facebook, or mentions of resources on Twitter could all be considered annotations associated with the resource being discussed. In addition, there is a plethora of closed and proprietary web-based "sticky note" systems, and of stand-alone multimedia annotation systems. The primary complaint about all of these systems is that user-created annotations cannot be shared or reused, either because of a deliberate "lock-in" strategy within the environments where they were created or, at the very least, because there is no common approach to expressing annotations.

The Open Annotation data model provides an extensible, interoperable framework for expressing annotations such that they can easily be shared between platforms, with sufficient richness of expression to satisfy complex requirements while remaining simple enough to also allow for the most common use cases, such as attaching a piece of text to a single web resource.

Unlike previous attempts at annotation interoperability, the Open Annotation system does not prescribe a protocol for creating, managing and retrieving annotations. Instead it describes a web-centric method, promoting discovery and sharing of annotations without clients or servers having to agree on a particular set of operations on those annotations.

2. Intuitive overview of OA

An annotation is considered to be a set of connected resources, typically including a body and a target, and conveys that the body is related to the target. The exact nature of this relationship changes according to the intention of the annotation, but most frequently it conveys that the body is somehow "about" the target. Other possible relationships include that the body is an identifier for the target, provides a representation of the target, or classifies the target in some way (see [OA-DM], Introduction).

Figure 1. Annotation, Body and Target

The simplest set of resources characterizing an Annotation is depicted in fig. 1 and consists of (i) the annotation instance, a de facto reification of the annotation (depicted in yellow), (ii) one or more targets, i.e. what is being annotated (depicted in pink), and (iii) zero or more bodies which, if present, represent the content of the annotation: the information that is annotating the target or targets (depicted in light blue).

An Annotation can be as simple as a bookmark that targets a webpage and has no body. In the example in fig. 2, the target is the Open Annotation Community Group homepage and the Annotation has no body.

Figure 2. Simple bookmark of the Open Annotation Community Group page.

When a body exists, the Annotation usually conveys that the body is "somehow about" the target. This is the case for the note in fig. 3, where the target is the Open Annotation Community Group homepage and the Annotation has a textual body.

Figure 3. A note targeting the Open Annotation Community Group page.

An Annotation can have multiple targets, as in the note depicted in fig. 4, where the targets are two images and the Annotation has a textual body.

Figure 4. A note targeting two pictures of Boston
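A minimal Turtle sketch of the pattern in fig. 4, one textual body and two targets. The figure's actual values are not reproduced in this text, so the image identifiers, the note text and the ex: namespace are hypothetical placeholders; the other prefixes follow Table 1.

```turtle
@prefix oa:  <http://www.w3.org/ns/oa#> .
@prefix cnt: <http://www.w3.org/2011/content#> .
@prefix ex:  <http://example.org/> .   # hypothetical example namespace

# One Annotation, one textual Body, two Targets
ex:anno1 a oa:Annotation ;
  oa:hasBody ex:body1 ;
  oa:hasTarget ex:image1, ex:image2 .   # placeholders for the two Boston pictures

ex:body1 a cnt:ContentAsText ;
  cnt:chars "Two views of Boston" .      # hypothetical note text
```

The comma in the oa:hasTarget line is ordinary Turtle shorthand for repeating the same predicate, so the note applies to each of the two images.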

An Annotation can have multiple bodies and multiple targets, as in the example depicted in fig. 5, where the Wikipedia page about Boston and a YouTube video about Boston are declared to be "somehow about" the two targeted pictures.

Figure 5. A webpage (Wikipedia) and a video (YouTube) "about" two pictures.
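A possible Turtle rendering of fig. 5, assuming the Wikipedia page is http://en.wikipedia.org/wiki/Boston. The YouTube video URI and the two image URIs are not given in the text, so ex:video1, ex:image1 and ex:image2 (in a hypothetical ex: namespace) stand in for them.

```turtle
@prefix oa:      <http://www.w3.org/ns/oa#> .
@prefix dctypes: <http://purl.org/dc/dcmitype/> .
@prefix ex:      <http://example.org/> .   # hypothetical example namespace

# Two Bodies, both "somehow about" the two Targets
ex:anno1 a oa:Annotation ;
  oa:hasBody <http://en.wikipedia.org/wiki/Boston>, ex:video1 ;
  oa:hasTarget ex:image1, ex:image2 .

<http://en.wikipedia.org/wiki/Boston> a dctypes:Text .
ex:video1 a dctypes:MovingImage .   # placeholder for the YouTube video URI
```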

In the next sections, we will provide detailed examples showcasing the basic features of the Open Annotation model. Additional examples of how to model and implement specific situations are available in the Annotation Cookbook.

3. Notation

Examples throughout the document are conveyed both as a diagram and in the Turtle RDF format [TURTLE]. The Turtle examples do not include namespace declarations; they should be read as using the following namespaces:

Prefix	Namespace	Description
oa	http://www.w3.org/ns/oa#	The Open Annotation ontology
cnt	http://www.w3.org/2011/content#	Representing Content in RDF
dc	http://purl.org/dc/elements/1.1/	Dublin Core Elements
dcterms	http://purl.org/dc/terms/	Dublin Core Terms
dctypes	http://purl.org/dc/dcmitype/	Dublin Core Type Vocabulary
foaf	http://xmlns.com/foaf/0.1/	Friend-of-a-Friend Vocabulary
prov	http://www.w3.org/ns/prov#	Provenance Ontology
rdf	http://www.w3.org/1999/02/22-rdf-syntax-ns#	RDF
rdfs	http://www.w3.org/2000/01/rdf-schema#	RDF Schema
skos	http://www.w3.org/2004/02/skos/core#	Simple Knowledge Organization System
trig	http://www.w3.org/2004/03/trix/rdfg-1/	TriG Named Graphs

Table 1. Namespaces
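To load the examples into an RDF tool, the namespaces of Table 1 can be written once as Turtle prefix declarations. The ex: prefix used for example instances does not appear in Table 1; the namespace http://example.org/ below is an assumption chosen only for illustration.

```turtle
# Prefix declarations corresponding to Table 1
@prefix oa:      <http://www.w3.org/ns/oa#> .
@prefix cnt:     <http://www.w3.org/2011/content#> .
@prefix dc:      <http://purl.org/dc/elements/1.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dctypes: <http://purl.org/dc/dcmitype/> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .
@prefix prov:    <http://www.w3.org/ns/prov#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
@prefix trig:    <http://www.w3.org/2004/03/trix/rdfg-1/> .

# Hypothetical namespace for example instances (not part of Table 1)
@prefix ex:      <http://example.org/> .
```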

The diagrams of the examples use the following style:

Instances are depicted as colored ellipses
- Instances with a resolvable URI have a single line border
- Instances with a non-resolvable URN have a double line border
Classes are depicted as white rectangles
Literals are depicted as white lozenges
Relationships are depicted as straight, black lines (predicates where the range is a Resource)
Properties are depicted as curved, black lines (predicates where the range is a Literal)
Class instantiation (rdf:type) is depicted as a straight black line with white arrow head
Example instance identifiers or example literal values always end in a number (e.g. 'anno1' is a specific instance of an Annotation, whereas 'oa:Annotation' is a real class)
Relationships not explicit in the model, but important for understanding, are depicted as curved, dashed, colored lines
Resource boundaries not explicit in the model, but important for understanding, are depicted as grey dashed boxes around the components.

4. First example: Bookmarking a Webpage

In the context of the World Wide Web, a Bookmark is a locally stored Uniform Resource Identifier (URI). All modern web browsers include bookmark features (source: Wikipedia). Typically the URI is associated with a textual description (or name, or label) to improve human readability. In addition, Tags can be used to classify the resource and to improve retrieval.

To represent a Bookmark of the W3C Open Annotation Community Group homepage, we start by creating an instance of the oa:Annotation class; then we link this instance to the webpage (Target) identified by the homepage URI http://www.w3.org/community/openannotation/. Adding the Target's content type and dc:format (dctypes:Text and "text/html") can be useful for applications consuming the annotation.

The Open Annotation specification, in section 2.1.1 (Typing of Body and Target), recommends the use of the Dublin Core Types vocabulary for specifying content types.

Figure 6. Bookmark of a webpage, step 1, defining Annotation and Target

Targets are what we are annotating. A Target may be a resource with its own dereferenceable URI, and the representation retrieved from it may be of any content type. Resolving the Target's URI to retrieve a representation may require multiple steps, such as HTTP redirects, and potentially multiple protocols.
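Step 1 (Figure 6) can be sketched in Turtle as follows. The Target URI and its typing come from the text above; ex:anno1 is a hypothetical identifier in a hypothetical ex: namespace.

```turtle
@prefix oa:      <http://www.w3.org/ns/oa#> .
@prefix dc:      <http://purl.org/dc/elements/1.1/> .
@prefix dctypes: <http://purl.org/dc/dcmitype/> .
@prefix ex:      <http://example.org/> .   # hypothetical example namespace

# The Annotation and its Target (the Community Group homepage)
ex:anno1 a oa:Annotation ;
  oa:hasTarget <http://www.w3.org/community/openannotation/> .

# Optional typing of the Target, useful to consuming applications
<http://www.w3.org/community/openannotation/> a dctypes:Text ;
  dc:format "text/html" .
```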

We can then provide the oa:Motivation that encodes the reason(s) why the Annotation was created. In this specific case, the Motivation is oa:bookmarking:

Figure 7. Bookmark of a webpage, step 2, adding the Motivation
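Step 2 adds a single triple. In this sketch ex:anno1 is a hypothetical identifier for the Annotation of Figure 6; oa:bookmarking is the Motivation named in the text.

```turtle
@prefix oa: <http://www.w3.org/ns/oa#> .
@prefix ex: <http://example.org/> .   # hypothetical example namespace

# Record why the Annotation was created
ex:anno1 oa:motivatedBy oa:bookmarking .
```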

Next, we provide the Provenance information. The Open Annotation model allows encoding both the agent that created the annotation and the agent that serialized it.

Figure 8. Bookmark of a webpage, step 3, adding the Provenance information
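A sketch of the provenance triples for step 3, modelled on the agents used in the tagging example. The person, the software name and both timestamps are hypothetical values, not taken from Figure 8.

```turtle
@prefix oa:   <http://www.w3.org/ns/oa#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix ex:   <http://example.org/> .   # hypothetical example namespace

# Who created the Annotation, and which software serialized it
ex:anno1 oa:annotatedBy ex:person1 ;
  oa:annotatedAt "2013-01-14T10:00:00Z" ;    # hypothetical timestamp
  oa:serializedBy ex:software1 ;
  oa:serializedAt "2013-01-14T10:00:05Z" .   # hypothetical timestamp

ex:person1 a foaf:Person ;
  foaf:name "John Doe" .                    # hypothetical annotator

ex:software1 a foaf:Agent, prov:SoftwareAgent ;
  foaf:name "ExAnnotator" .                 # hypothetical software agent
```

Keeping the two agents separate lets a consumer distinguish the person who made the bookmark from the client that produced the RDF.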

In order to improve human readability, we can add a textual description as a Body.

Bodies, if any, represent the content of the annotation: they contain the information that is annotating the Target or Targets. Bodies can be either embedded or resources with their own dereferenceable URIs.

In this particular example, we embed the textual Body (see specs: Embedded Textual Bodies) using the Content in RDF specification, which introduces a resource with the class cnt:ContentAsText to represent the content, and a property cnt:chars to hold the content string itself.
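An embedded textual Body for the bookmark might look like this. The description text and the identifiers ex:anno1 and ex:body1 are hypothetical; the classes and properties are the cnt: terms described above.

```turtle
@prefix oa:  <http://www.w3.org/ns/oa#> .
@prefix cnt: <http://www.w3.org/2011/content#> .
@prefix dc:  <http://purl.org/dc/elements/1.1/> .
@prefix ex:  <http://example.org/> .   # hypothetical example namespace

# An embedded textual Body describing the bookmarked page
ex:anno1 oa:hasBody ex:body1 .

ex:body1 a cnt:ContentAsText ;
  cnt:chars "Home page of the W3C Open Annotation Community Group" ;  # hypothetical text
  dc:format "text/plain" .
```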

5. Second example: Tagging an Image

A Tag is a non-hierarchical keyword or term assigned to someone or something. Tags are typically used to categorize or describe resources by attaching some free-text label to them (source: Wikipedia).

We are now going to create multiple tags for the Wikipedia image of the Eiffel Tower, here identified by the URI http://alturl.com/wxidq. We start by creating an instance of the oa:Annotation class; then we link this instance to the image (Target). Adding the Target's content type and dc:format (dctypes:Image and "image/jpeg") can be useful for applications consuming the annotation.

The Open Annotation specification, in section 2.1.1 (Typing of Body and Target), recommends the use of the Dublin Core Types vocabulary for specifying content types.

Figure 10. Tagging an image, step 1, defining Annotation and Target

We can then provide the oa:Motivation that encodes the reason(s) why the Annotation was created. In this specific case, the Motivation is oa:tagging:

Figure 11. Tagging an image, step 2, adding a Motivation

To illustrate the different tagging mechanisms supported by the Open Annotation data model, we are going to create three different tags:

A textual tag: "Eiffel Tower". This is the embodiment of the 'classic' folksonomic approach. In this example we are using an embedded textual body (See specs section 2.1.2 Embedded Textual Bodies). The textual body is here identified by a UUID URI.

Figure 12. Tagging an image, step 3, adding a free text tag

In this example we have used a UUID URI to identify the embedded textual body. You could also use other URIs or, to reduce the burden of minting and maintaining identifiers when it is not necessary to do so, blank nodes. However, if you use blank nodes, note that further Annotations or other systems cannot refer to the Body without a Skolem IRI.
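For comparison, here is a sketch of the same textual tag written as a blank node instead of a UUID URI. The tag text follows the example; ex:anno1 and the ex: namespace are hypothetical, and the Target is omitted for brevity.

```turtle
@prefix oa:  <http://www.w3.org/ns/oa#> .
@prefix cnt: <http://www.w3.org/2011/content#> .
@prefix dc:  <http://purl.org/dc/elements/1.1/> .
@prefix ex:  <http://example.org/> .   # hypothetical example namespace

# The textual tag as a blank node: no identifier to mint or maintain,
# but nothing outside this graph can refer to this Body directly.
ex:anno1 oa:motivatedBy oa:tagging ;
  oa:hasBody [
    a cnt:ContentAsText ;
    cnt:chars "Eiffel Tower" ;
    dc:format "text/plain"
  ] .
```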

A semantic tag through a non-informational resource: the DBpedia entry for the Eiffel Tower. This is the pure semantic tag construct, obtained by linking to terms from ontologies and vocabularies and typing those terms as oa:SemanticTag.

Figure 13. Tagging an image, step 4, adding a semantic tag with a non-informational resource

The terms are identified by URIs and correspond to concepts, not to informational resources such as webpages. Applications therefore do not necessarily need to fetch them, as they would for a webpage.

A semantic tag through a webpage: the Wikipedia page about the Eiffel Tower. This is not an ideal case, but it is a common scenario: many applications use the URIs of webpages to identify concepts. For instance, given a webpage about a particular entity in biomedicine, and lacking an ontology or terminology for it, an application might use the URI of that webpage to indicate the concept as well. As that URI obviously identifies the webpage, it cannot be treated as a pure non-informational resource. We therefore define a new resource, typed as oa:SemanticTag, whose foaf:page is the webpage.

Figure 14. Tagging an image, step 5, adding a webpage as a semantic tag

As mentioned in section 2, the Open Annotation model allows multiple bodies. We can therefore pull together the three different ways of encoding tags in one single annotation. This is just an example: normally you would probably not create three different tags for the same entity, unless the goal is to infer mappings.

Figure 15. Tagging an image, step 6, pulling multiple tags together.

Finally, as in the previous example, we can add provenance.

Figure 16. Tagging an image, step 7, adding provenance.

The above example in Turtle RDF format:

 ex:anno1 a oa:Annotation ;
   oa:hasTarget <http://alturl.com/wxidq> ;
   oa:hasBody ex:uuid ;
   oa:hasBody ex:semtag1 ;
   oa:hasBody <http://dbpedia.org/resource/Eiffel_Tower> ;
   oa:motivatedBy oa:tagging ;
   oa:annotatedBy ex:person1 ;
   oa:annotatedAt "2012-02-12T15:02:14Z" ;
   oa:serializedBy ex:software1 ;
   oa:serializedAt "2012-02-12T15:02:14Z" .
 
 <http://alturl.com/wxidq> a dctypes:Image ;
   dc:format "image/jpeg" .
 
 ex:uuid a cnt:ContentAsText ;
   cnt:chars "Eiffel Tower" ;
   dc:format "text/plain" ;
   cnt:characterEncoding "utf-8" .

 ex:semtag1 a oa:SemanticTag ;
   foaf:page <http://en.wikipedia.org/wiki/Eiffel_Tower> .

 <http://dbpedia.org/resource/Eiffel_Tower> a oa:SemanticTag .

 ex:person1 a foaf:Person ;
   foaf:mbox <mailto:john.doe@example.org> ;
   foaf:name "John Doe" .
 
 ex:software1 a foaf:Agent, prov:SoftwareAgent ;
   foaf:name "ExAnnotator" .

6. Third example: Commenting on a text fragment within a Webpage

In this example we select a text fragment in the description displayed on the homepage of the Open Annotation Community Group and comment on it (see Figure 17).

Annotating fragments of resources is a very common use case for the Open Annotation model. In this example we will create a multiple-bodies annotation that targets a specific text fragment in a webpage. The Open Annotation model allows identifying resource fragments either with Fragment URIs (see 2.1.4 Fragment URIs Identifying Body or Target) or with the SpecificResource (see 3.1 Specifiers and Specific Resources) and Selector (see 3.2 Selectors) mechanism.

Figure 17. Commenting on a text fragment, step 1, identifying the fragment within the Open Annotation Community Group homepage.

We start by creating an instance of oa:SpecificResource that links to the original, entire resource (the Open Annotation Community Group homepage) through the relationship oa:hasSource.

Figure 18. Commenting on a text fragment, step 2, identifying the Specific Resource source.
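A Turtle sketch of step 2. The homepage URI comes from the text; ex:anno1 and ex:target1 are hypothetical identifiers in a hypothetical ex: namespace.

```turtle
@prefix oa: <http://www.w3.org/ns/oa#> .
@prefix ex: <http://example.org/> .   # hypothetical example namespace

# The Target is a Specific Resource: a part of the homepage, not the whole page
ex:anno1 a oa:Annotation ;
  oa:hasTarget ex:target1 .

ex:target1 a oa:SpecificResource ;
  oa:hasSource <http://www.w3.org/community/openannotation/> .
```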

As we want an annotation that could potentially be displayed on both the HTML and the PDF versions of the same document, we opt for a Text Quote Selector (see 3.2.2.2 Text Quote Selector), which relies on the text content rather than on the structure of a particular format. The idea behind this selector is simple: the match is identified by the text preceding it (oa:prefix), the matched text itself (oa:exact) and the text following it (oa:suffix).

Figure 19. Commenting on a text fragment, step 3, defining the selector (the exact match has been shortened).
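A sketch of the selector from step 3. Because the quoted text is shortened in the figure, the three literals below are placeholders rather than the actual homepage text; ex:target1 and ex:selector1 are hypothetical identifiers.

```turtle
@prefix oa: <http://www.w3.org/ns/oa#> .
@prefix ex: <http://example.org/> .   # hypothetical example namespace

# The selector identifies the fragment by quoting it together with its context
ex:target1 a oa:SpecificResource ;
  oa:hasSource <http://www.w3.org/community/openannotation/> ;
  oa:hasSelector ex:selector1 .

ex:selector1 a oa:TextQuoteSelector ;
  oa:prefix "text just before the selection" ;   # placeholder
  oa:exact  "the selected text" ;                # placeholder
  oa:suffix "text just after the selection" .    # placeholder
```

The prefix and suffix are what disambiguate the match when the same exact text occurs more than once in the document.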

Figure 20. Commenting on a text fragment, step 4, adding body and provenance (the exact match part of the selector has been shortened).
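A sketch of step 4 with a single textual comment. The comment text and the annotator are hypothetical, and the oa:commenting Motivation is assumed from the section title rather than read from Figure 20.

```turtle
@prefix oa:   <http://www.w3.org/ns/oa#> .
@prefix cnt:  <http://www.w3.org/2011/content#> .
@prefix dc:   <http://purl.org/dc/elements/1.1/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/> .   # hypothetical example namespace

ex:anno1 a oa:Annotation ;
  oa:motivatedBy oa:commenting ;        # assumed Motivation
  oa:hasTarget ex:target1 ;
  oa:hasBody ex:body1 ;
  oa:annotatedBy ex:person1 .

ex:body1 a cnt:ContentAsText ;
  cnt:chars "A comment on the selected passage" ;   # hypothetical comment
  dc:format "text/plain" .

ex:person1 a foaf:Person ;
  foaf:name "John Doe" .                # hypothetical annotator
```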

7. Fourth example: Annotating an image in a webpage with a video

The Wikipedia page about the 'Hubble Deep Field' contains an image (see figure 21).

Figure 21. Wikipedia page about 'Hubble deep field' that contains an image identified by the red arrow.

A user creates a video on YouTube that discusses the Hubble Deep Field Image. The video is therefore the Body of the Annotation, and the image is the Target, as the video is about the image. We start by defining the target of our annotation:

Figure 22. Defining the image as the annotation target.
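A sketch of the Target definition. The image's actual URI is not given in the text, so ex:image1 (in a hypothetical ex: namespace) stands in for it.

```turtle
@prefix oa:      <http://www.w3.org/ns/oa#> .
@prefix dctypes: <http://purl.org/dc/dcmitype/> .
@prefix ex:      <http://example.org/> .   # hypothetical example namespace

ex:anno1 a oa:Annotation ;
  oa:hasTarget ex:image1 .

ex:image1 a dctypes:Image .   # placeholder for the Hubble Deep Field image URI
```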

The Open Annotation model gives the option of recording the webpage that the user was looking at while creating the annotation, by using an oa:SpecificResource as the Target together with the relationship oa:hasScope.

Figure 23. Defining the image as the annotation target within the Wikipedia page (scope).
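A sketch of the scoped Target, assuming the Wikipedia page is http://en.wikipedia.org/wiki/Hubble_Deep_Field. ex:target1 and ex:image1 are hypothetical identifiers.

```turtle
@prefix oa:      <http://www.w3.org/ns/oa#> .
@prefix dctypes: <http://purl.org/dc/dcmitype/> .
@prefix ex:      <http://example.org/> .   # hypothetical example namespace

# The Target records both the image and the page it was seen in
ex:anno1 a oa:Annotation ;
  oa:hasTarget ex:target1 .

ex:target1 a oa:SpecificResource ;
  oa:hasSource ex:image1 ;                                         # the image itself
  oa:hasScope <http://en.wikipedia.org/wiki/Hubble_Deep_Field> .   # page being viewed

ex:image1 a dctypes:Image .
```

The annotation is still about the image (oa:hasSource); the scope only records the context, so the same image seen on another page remains the same source.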

We will now define the body of the annotation as the YouTube video.

Figure 24. Defining the video as the annotation body.
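A sketch of the video Body. The YouTube URI is not given in the text, so ex:video1 is a hypothetical placeholder.

```turtle
@prefix oa:      <http://www.w3.org/ns/oa#> .
@prefix dctypes: <http://purl.org/dc/dcmitype/> .
@prefix ex:      <http://example.org/> .   # hypothetical example namespace

ex:anno1 oa:hasBody ex:video1 .

ex:video1 a dctypes:MovingImage .   # placeholder for the YouTube video URI
```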

Figure 25. Adding provenance.

Figure 26. Identifying a Body fragment through a Media Fragment URI [MEDIA-FRAG].
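A sketch of a Body identified by a temporal Media Fragment URI, where #t=30,60 selects seconds 30 to 60. The video URI and the time range are hypothetical values chosen for illustration.

```turtle
@prefix oa:      <http://www.w3.org/ns/oa#> .
@prefix dctypes: <http://purl.org/dc/dcmitype/> .
@prefix ex:      <http://example.org/> .   # hypothetical example namespace

# Only seconds 30 to 60 of the (hypothetical) video form the Body
ex:anno1 oa:hasBody <http://example.org/video1.mp4#t=30,60> .

<http://example.org/video1.mp4#t=30,60> a dctypes:MovingImage .
```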

C. References

C.1 Normative references

No normative references.

C.2 Informative references

[OA-DM]: Rob Sanderson, Paolo Ciccarese, Herbert Van de Sompel. Open Annotation Data Model. URL: http://www.openannotation.org/spec/core/
[TURTLE]: Eric Prud'hommeaux, Gavin Carothers. Turtle: Terse RDF Triple Language. 9 August 2011. W3C Working Draft. URL: http://www.w3.org/TR/2011/WD-turtle-20110809/
[MEDIA-FRAG]: Raphaël Troncy, Erik Mannens, Silvia Pfeiffer, Davy Van Deursen. Media Fragments URI 1.0 (basic). 25 September 2012. W3C Recommendation. URL: http://www.w3.org/TR/media-frags/

Open Annotation Data Model Primer

Community Draft, 14 January 2013