Information engineering

Documentation, backgrounders and tutorial material related to information design, engineering, semantics, ontologies, and vocabularies

Best practice in formalizing a SKOS vocabulary

This page was originally developed as part of the SEEGrid website.

Some best practices have been developed that will assist in the preparation of a well-behaved, maintainable vocabulary.

The Simple Knowledge Organization System (SKOS) is based on a vocabulary/thesaurus model. It was designed primarily to formalize existing vocabularies using Semantic Web tools, and also to smooth the transition towards the richer logic-based tools used in ontology modeling. The aim of SKOS is to enable pre-existing controlled vocabularies to be consumed on the web, and to allow vocabulary creators to publish born-digital vocabularies on the web. Understanding SKOS requires a basic understanding of controlled vocabularies: hierarchical relationships, and broader and narrower terms, in which each term is linked to related terms.

SKOS is built on RDF, so SKOS data are represented as RDF triples. Like RDF itself, SKOS is a data-sharing standard with a formal logic and structure; data expressed this way are readily amenable to computation, which is much of their usefulness.

Syntax

RDF examples on this page are shown using Turtle, which is the most human-readable syntax. However, it is sometimes recommended that maintenance versions be stored in RDF/XML, which is the only serialization that conforming OWL tools are required to support.

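Resource identifiers in Turtle may be shown either as a full URI enclosed in angle brackets, e.g. <http://dbpedia.org/resource/Silurian>, or as a prefixed name of the form prefix:localName (e.g. skos:Concept), where the prefix is bound to a namespace URI by an @prefix declaration.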
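A prefixed name is only an abbreviation: the full URI is the namespace bound to the prefix followed by the local name. As a minimal sketch, assuming a hand-maintained prefix table (the `skos:` and `rdfs:` namespaces are the standard W3C ones; the `my:` prefix, its example.org namespace and the `expand` helper are hypothetical), expansion is simple concatenation:

```python
# Standard W3C namespaces, plus one hypothetical prefix for illustration.
PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "my": "http://example.org/vocab/",  # hypothetical namespace
}


def expand(term: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand a Turtle-style identifier to a full URI (hypothetical helper).

    <...> forms are returned without the angle brackets; prefix:local forms
    are expanded by concatenating the namespace and the local name.
    """
    if term.startswith("<") and term.endswith(">"):
        return term[1:-1]
    prefix, sep, local = term.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {term!r}")
    # An empty local name (e.g. "my:") denotes the namespace URI itself.
    return prefixes[prefix] + local
```

The empty-local-name case matters for this page: collection identifiers such as isc: and nil: in the examples below are prefixed names with no local part.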

Vocabulary content

Get the current vocabulary content from the organization responsible for its maintenance. Tell them about your intention to provide a web service to deliver it. Verify that they do not already provide such a service. Find out their versioning and maintenance policy. If possible, give them a formal role in the project, at least to ensure that you are notified of changes to vocabulary content. Discuss governance mechanisms, in particular how changes will be managed, and what the versioning policy is.

For each item in the vocabulary, make sure you have the following information:

If possible, also get

URIs and namespaces

RDF Namespace

URIs for vocabulary elements (concepts, collections, concept schemes) should be in a domain whose HTTP server can be configured to either

The URIs should be owned by, or at least acknowledge, the vocabulary governance body.

Namespace examples:

URIs are case sensitive

Except for the domain name (i.e. the part between http:// and the first single /), URIs are case-sensitive. It is often suggested that any URI which you expect people to type at a keyboard should be all lower-case. Using lower case throughout also allows a server to be configured easily to be case-insensitive, using a simple rule that folds mixed case into a single (lower) case.
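A folding rule of this kind can be sketched with Python's standard `urllib.parse`. The scheme and host are case-insensitive anyway, so lower-casing them never changes the resource identified; lower-casing the path is the optional server-side rule described above, and is only safe if every published URI is already lower-case. The `fold_case` name and the example.org URI are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def fold_case(uri: str, fold_path: bool = True) -> str:
    """Normalize the case of a URI (hypothetical helper).

    Scheme and host are always lower-cased (they are case-insensitive).
    The path is lower-cased only when fold_path is True; query and
    fragment are left untouched.
    """
    parts = urlsplit(uri)  # urlsplit already lower-cases the scheme
    netloc = parts.netloc.lower()  # note: also lower-cases any user:password part
    path = parts.path.lower() if fold_path else parts.path
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))


print(fold_case("HTTP://Example.ORG/Vocab/Silurian"))
print(fold_case("HTTP://Example.ORG/Vocab/Silurian", fold_path=False))
```

With `fold_path=False` the function applies only the normalization that is always safe; the full fold corresponds to a server configured to treat mixed-case requests as their lower-case equivalents.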

Vocabulary elements

All elements in a SKOS vocabulary are denoted with URIs.

Ontology document

An “Ontology Document” packages descriptions of the vocabulary elements together with their dependencies.

Ontology URI examples:

The Ontology URI plays no specific role in a vocabulary, but is involved in OWL dependency management through the owl:imports property. This mechanism essentially assumes that the complete ontology can be obtained, in RDF/XML format, from the ontology URI. To enable this, the ontology document should be designated by an Ontology URI, or a location on an HTTP server, where either

  1. an RDF/XML file can be placed, so that it can subsequently be downloaded by a simple HTTP request, or
  2. a suitable redirection or content generation operation can be triggered.
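Because owl:imports relies on each imported ontology being retrievable from its URI, a tool loading a vocabulary effectively walks an import graph. The sketch below is a simplification with no network access: the `imports_of` mapping stands in for "download the RDF/XML at this URI (e.g. with an Accept: application/rdf+xml header) and read its owl:imports values". The function name and the example.org ontology URIs are hypothetical.

```python
from collections import deque


def import_closure(root: str, imports_of: dict[str, list[str]]) -> list[str]:
    """Return the root ontology URI plus every URI it transitively imports.

    imports_of maps an ontology URI to the owl:imports values found in the
    document retrieved from that URI. Each URI is visited once, so import
    cycles terminate. Order is breadth-first from the root.
    """
    seen = {root}
    order = [root]
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for target in imports_of.get(current, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


# Hypothetical import graph, including a cycle between B and C.
IMPORTS = {
    "http://example.org/ont/A": ["http://example.org/ont/B"],
    "http://example.org/ont/B": ["http://example.org/ont/C"],
    "http://example.org/ont/C": ["http://example.org/ont/B"],
}
```

If any URI in the closure cannot be dereferenced to RDF/XML, the import fails for that branch, which is why the ontology document should be placed where a simple HTTP request (or a redirect) can reach it.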

The Ontology URI may also name a ‘graph’ of content in a triple-store, which is derived from a single source.

Concepts

Vocabulary terms are associated with SKOS Concepts.

Concept URIs are generally unversioned. The rationale for this is that a concept is an abstract resource which does not in principle change. What we know about a concept may change, and this will be captured in a document describing the concept (also known as a ‘graph’). The concept is not itself on the web, while the description is. The description is a concrete resource which should be denoted by a different URI from the concept, and may be versioned if what we know about the abstract resource changes. On the other hand, if a concept has “changed”, then it is a different concept and should be given a different URI, though the new URI will generally not be formed simply by incrementing a version number.

Concept URI examples:

(The version numbers in these URIs generally apply to the source of the definition, rather than concept identity.)

Several ‘standard’ patterns have been proposed to map the URI for an off-web resource to a URI for a document describing it.
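Two patterns are in wide use (both are described in the W3C note “Cool URIs for the Semantic Web”): with hash URIs, the describing document is simply the concept URI with its fragment removed; with slash URIs, the server answers a request for the concept with a 303 See Other redirect to a separate document URI. The sketch below shows both mappings. The `/id/` to `/doc/` rewrite is one common convention for choosing the document URI, assumed here for illustration; the function name and example.org URIs are hypothetical.

```python
from urllib.parse import urldefrag


def document_uri(concept_uri: str) -> str:
    """Map a concept URI to the URI of a document describing it (sketch)."""
    base, fragment = urldefrag(concept_uri)
    if fragment:
        # Hash pattern: the describing document is the URI without the fragment.
        return base
    # Slash pattern with an assumed /id/ -> /doc/ convention: a server would
    # answer a GET on the concept URI with 303 See Other to this location.
    if "/id/" in concept_uri:
        return concept_uri.replace("/id/", "/doc/", 1)
    raise ValueError(f"no document-URI rule for {concept_uri}")


print(document_uri("http://www.w3.org/2000/01/rdf-schema#Class"))
print(document_uri("http://example.org/id/timescale/Silurian"))
```

Either way the concept and its description get distinct URIs, which is what allows the description to be versioned while the concept URI stays fixed.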

Collections

Collections can be used for various purposes.

One application is to support partial URIs, treating the concept URI as a path so that each URI created by trimming a path element from the concept URI is realized as a skos:Collection. This provides entry points for exploring a vocabulary similar to browsing a traditional hierarchical file-system. Collections should be matched to every partial URI that the provider expects users to attempt to resolve.

Collection URI examples:

A useful convention is that URIs with a trailing ‘/’ denote fully described collections, and aliases without the trailing ‘/’ ensure that every possible path still resolves.
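The partial-URI collections can be derived mechanically from the list of concept URIs. A minimal sketch, assuming slash URIs under a hypothetical example.org namespace and the trailing-‘/’ convention just described: every ancestor path becomes a collection whose members are its immediate children (concepts or sub-collections). A real implementation would also publish the aliases without the trailing ‘/’ and emit the skos:Collection and skos:member triples themselves.

```python
from collections import defaultdict


def partial_uri_collections(concept_uris: list[str], base: str) -> dict[str, list[str]]:
    """Map each collection URI (ending in '/') to its sorted direct members.

    Only URIs below `base` (which must end in '/') are considered;
    `base` itself becomes the root collection.
    """
    members = defaultdict(set)
    for uri in concept_uris:
        if not uri.startswith(base):
            continue
        steps = uri[len(base):].split("/")
        parent = base
        for i, step in enumerate(steps):
            last = i == len(steps) - 1
            # Intermediate path elements are collections (trailing '/');
            # the final element is the concept itself.
            child = parent + step + ("" if last else "/")
            members[parent].add(child)
            parent = child
    return {coll: sorted(kids) for coll, kids in members.items()}


BASE = "http://example.org/vocab/"  # hypothetical namespace
CONCEPTS = [
    BASE + "timescale/Silurian",
    BASE + "timescale/Devonian",
    BASE + "lithology/granite",
]
```

Here partial_uri_collections(CONCEPTS, BASE) yields a root collection containing the timescale/ and lithology/ collections, each of which lists its concepts, giving the file-system-like entry points described above.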

Collections can also be used to group concepts on thematic grounds, perhaps as part of a faceted classification. In these cases the collection URI might not be the parent of the member concept URIs.

Concept-schemes

Concept-schemes provide another kind of container. In terms of semantics, a concept-scheme collects a set of concepts with a consistent set of hierarchical and other semantic relationships.

Membership in a concept scheme is explicit (through the skos:inScheme property) so the URI for a concept scheme might not use the same stem as the URIs for the member concepts. If the same URI stem is used, then the keyword ‘scheme’ may be used as the local identifier to distinguish the concept scheme from collections and concepts.

Concept-scheme URI examples:

Slash or Hash

There are different views on whether the RDF namespace is separated from the local name using a / or #. Currently we recommend use of / namespaces in vocabularies on pragmatic grounds.

Vocabulary items may be members of very large sets. Furthermore, vocabulary items are commonly accessed one-at-a-time, and not only in the context of the complete set. For those reasons ‘slash-URIs’ are recommended for vocabulary items, in which the local-name appears after a “/”, as shown in the examples above. Slash URIs allow access to individual items over HTTP.

In contrast, ‘hash-URIs’, in which the local-name appears after the # fragment separator, do not allow individual access, because HTTP clients do not send the fragment to the server. A request for an item identified by a hash-URI, such as http://www.w3.org/2000/01/rdf-schema#Class, will return the entire vocabulary, regardless of whether the secondary resource is even present in it. For some important use-cases this behaviour is undesirable, because (a) it requires additional processing on the client side to extract the secondary resource, and (b) it is not possible to rely on HTTP 404 to discover that a concept does not exist.
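The consequence is easy to see by stripping fragments the way an HTTP client does before sending a request. This sketch uses Python's standard `urllib.parse.urldefrag`; the `request_url` helper and the example.org slash-URIs are hypothetical.

```python
from urllib.parse import urldefrag


def request_url(uri: str) -> str:
    """Return the URL an HTTP client actually requests for `uri`.

    The fragment (after '#') stays on the client and is never sent.
    """
    return urldefrag(uri).url


hash_uris = [
    "http://www.w3.org/2000/01/rdf-schema#Class",
    "http://www.w3.org/2000/01/rdf-schema#label",
]
slash_uris = [
    "http://example.org/vocab/Silurian",
    "http://example.org/vocab/Devonian",
]

# All hash-URIs collapse to one request; slash-URIs stay distinct, so the
# server can answer each one (or return 404) individually.
print({request_url(u) for u in hash_uris})
print({request_url(u) for u in slash_uris})
```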

URI stability and patterns

If the vocabulary is intended to be used for a long period (and what vocabulary isn’t?) the URIs must be stable and persistent. For this reason the URI domain and pattern should not be based on something temporary, like the current deployment technology (e.g. .asp, .jsp, .cfm) or the current organizational structure of the deploying agency.

Intuitive URIs, including words rather than opaque identifiers, are more memorable and therefore more user- and developer-friendly. Nevertheless, URIs in which the localname is a number or code are fine, as long as a label is provided.

Do not over-design URI patterns. Elaborate path structures are unwieldy and may become outdated. Shorter paths and flatter hierarchies are generally more scalable and flexible. Do not rely on URI structure to capture semantics: semantics and relationships are better provided explicitly, in the resource descriptions.

Also see Tim Berners-Lee’s note from 1998 on the dangers of embedding too much information in URIs.

Vocabulary properties

Ontology metadata

The Ontology resource provides a hook primarily for metadata relating to the document. In addition to standard RDFS and Dublin Core terms, OWL provides the following important properties:

<http://resource.geosciml.org/vocabulary/timescale/isc-2010>
       a       owl:Ontology ;
       rdfs:label "Ontology document containing the International Stratigraphic Chart (2010)"@en ;
       rdfs:seeAlso <http://resource.geosciml.org/classifierscheme/ics/2010/ischart> ;
       dc:creator "Simon J D COX"@en ;
       dc:rights "RDF version Copyright © 2012 CSIRO, Arizona Geological Survey, IUGS"@en ,
           "Original version Copyright © 2010 International Commission on Stratigraphy"@en ,
           <http://opendatacommons.org/licenses/by/1.0> ,
           "This Ontology is made available under the Open Data Commons Attribution License: http://opendatacommons.org/licenses/by/1.0"^^xsd:string ;
       dcterms:isReplacedBy <http://resource.geosciml.org/vocabulary/timescale/isc-2012> ;
       dcterms:replaces <http://resource.geosciml.org/vocabulary/timescale/isc-2009> ;
       owl:imports <http://resource.geosciml.org/ontology/timescale/gts-30> , foaf: , <http://www.opengis.net/ont/geosparql> ;
       owl:priorVersion <https://www.seegrid.csiro.au/subversion/xmml/ontologies/tags/201205-hash-namespaces/GeologicTimeScale/isc-2010.rdf> ;
       owl:versionIRI <https://www.seegrid.csiro.au/subversion/xmml/ontologies/tags/201208-Temporal/GeologicTimeScale/isc-2010.rdf> .

Concept scheme properties

A concept-scheme provides a home for metadata related to the vocabulary content as a whole, including versioning. The description of a concept-scheme may be found by following the skos:inScheme property in a concept description.

<http://resource.geosciml.org/classifierscheme/ics/2010/ischart>
      a       gts:GeologicTimescale , skos:ConceptScheme ;
      rdfs:isDefinedBy <http://resource.geosciml.org/vocabulary/timescale/isc-2010> ;
      rdfs:label "International Stratigraphic Chart (2010)"@en ;
      rdfs:seeAlso <http://www.stratigraphy.org/upload/ISChart2010.pdf> ;
      dc:contributor "Chinese and Japanese preferred labels from SKOS by Xiaogang Ma, adopted from Asian Multilingual Thesaurus of Geosciences."@en , 
         "OneGeology Europe preferred labels merged in by S.M. Richard."@en , "International Commission on Stratigraphy"@en ;
      dc:creator "Simon J D COX"@en ;
      dc:description "This is an RDF/OWL representation of the 2010 edition of the International Stratigraphic Chart from the International Commission on Stratigraphy."@en ;
      dc:rights "RDF version Copyright © 2012 CSIRO, Arizona Geological Survey, IUGS"@en , "Original version Copyright © 2010 International Commission on Stratigraphy"@en , 
         <http://opendatacommons.org/licenses/by/1.0> , 
         "This Ontology is made available under the Open Data Commons Attribution License: http://opendatacommons.org/licenses/by/1.0"^^xsd:string ;
      dc:source "International Stratigraphic Chart 2010"@en ;
      dc:title "International Stratigraphic Chart (2010)"@en ;
      dcterms:created "2012-01-13"^^xsd:date ;
      dcterms:isReplacedBy
              <http://resource.geosciml.org/classifierscheme/ics/2012/ischart> ;
      dcterms:replaces <http://resource.geosciml.org/classifierscheme/ics/2009/ischart> ;
      skos:hasTopConcept isc:Precambrian , isc:Phanerozoic ;
      skos:prefLabel "International Stratigraphic Chart (2010)"@en .

The metadata properties are used as follows:

Concept descriptions

Each concept is described by a set of assertions using RDF properties from the core RDF vocabularies.

  1. each concept should have at least one skos:prefLabel. Additional prefLabel properties can support multi-lingual terms, and skos:altLabel properties can provide for non-preferred synonyms.
    • example: isc:Silurian skos:prefLabel "Силур"@bg , "Silúrcio"@es , "シルル紀"@ja , "Silurian"@en .
  2. one of the skos:prefLabel values should also be recorded as the rdfs:label for display in simple clients and IDEs
    • since skos:prefLabel rdfs:subPropertyOf rdfs:label, this can be generated with the help of a reasoner
  3. all known synonyms should be recorded using skos:altLabel
  4. each concept should have a textual definition recorded using skos:definition
  5. the language should be indicated for every text string (@lang tag)
  6. where an external source for the individual definition is available, this should be recorded in two ways:
    • link to an online resource using rdfs:seeAlso
    • the name of the external source of the definition using dc:source
  7. use skos:notation for any symbolic label or code which is externally recognised
  8. use skos:broader and skos:narrower relationships between concepts to capture hierarchies in the vocabulary if they exist
    • example: isc:Silurian skos:broader isc:Paleozoic .
      Note: These terms have sometimes been used to capture partitive as well as hierarchical relationships, such as my:wheel skos:broader my:car . However, it is recommended to use skos:broader only for is-a relationships, else misleading inferences might be drawn.
  9. transitive hierarchical relationships (skos:broaderTransitive, skos:narrowerTransitive) should be added as assertions if they are expected to be used in vocabulary queries
    • these can be generated from the broader/narrower relationships with the help of a reasoner
    • example: isc:Silurian skos:broaderTransitive isc:Phanerozoic , isc:Paleozoic .
  10. non-specific relationships between concepts should be added using skos:semanticRelation, or using skos:related (symmetric but not transitive).
  11. all inverse relationships (broader/narrower, broaderTransitive/narrowerTransitive) should be added as assertions if they are expected to be used in vocabulary queries
    • these can be generated from the initial asserted relationships with the help of a reasoner
    • example: isc:Paleozoic skos:narrower isc:Silurian .
  12. owl:sameAs should be used to record alternative URIs (aliases) used for the same concept
    • example: isc:Silurian owl:sameAs <http://dbpedia.org/resource/Silurian> .
  13. skos:closeMatch, skos:exactMatch etc. should be used to link to concepts in other published vocabularies
    • see VocabularyHarmonization
  14. each concept should be linked to a concept-scheme using skos:inScheme
isc:Silurian
     a      skos:Concept ;
     owl:sameAs      <http://dbpedia.org/resource/Silurian> ;
     rdfs:comment "younger bound-416 +/-2.8"@en , "older bound-443.7 +/-1.5"@en ;
     rdfs:label "Silurian Period"@en ;
     skos:definition "The Silurian is a geologic period and system that extends from the end of the Ordovician Period, about 
         443.7 ± 1.5 million years ago (mya), to the beginning of the Devonian Period, about 416.0 ± 2.8 mya. As with other 
         geologic periods, the rock beds that define the period's start and end are well identified, but the exact dates are 
         uncertain by several million years. The base of the Silurian is set at a major extinction event when 60% of marine 
         species were wiped out."^^xsd:string ;
     rdfs:seeAlso <http://www.stratigraphy.org/ics%20chart/ChronostratChart2012.pdf> ;
     dc:source "International Stratigraphic Chart. International Commission on Stratigraphy, July 2012"^^xsd:string ;
     skos:broader isc:Paleozoic ;
     skos:broaderTransitive isc:Paleozoic , isc:Phanerozoic ;
     skos:inScheme <http://resource.geosciml.org/classifierscheme/ics/2012/ischart> ;
     skos:narrower isc:Llandovery , isc:Ludlow , isc:Wenlock , isc:Pridoli ;
     skos:narrowerTransitive   isc:Aeronian , isc:Llandovery , isc:Sheinwoodian , isc:Ludlow , isc:Wenlock , isc:Gorstian , 
         isc:Telychian , isc:Rhuddanian , isc:Homerian , isc:Pridoli , isc:Ludfordian ;
     skos:notation "a1.1.3.4"^^<http://resource.geosciml.org/schema/cgi/gts/3.0#EraCode>;
     skos:prefLabel "Silúrico"@pt , "Silur"@no , "Silur"@cs , "Silur"@da , "Silur"@de , "Silur"@et , "siluriano"@it , 
         "Silúrcio"@es , "Siluur"@nl , "Silurian"@en , "Siluuri"@fi , "Sylur"@pl , "志留纪"@zh , 
         "szilur"@hu , "Silūras"@lt , "Силур"@bg , "silur"@sv , "silur"@sl , 
         "シルル紀"@ja , "Silurien"@fr , "silúr"@sk ;
     foaf:isPrimaryTopicOf <http://sweet.jpl.nasa.gov/2.2/stateTimeGeologic.owl#Silurian> . 
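Items 9 and 11 suggest generating the transitive and inverse relationships with a reasoner. Where no reasoner is at hand, the same triples can be materialized with a short script. A minimal sketch, assuming the asserted skos:broader links are available as (concept, broader concept) pairs of prefixed names; it follows the SKOS rule that skos:broader is a sub-property of skos:broaderTransitive, so direct links are included in the closure. The `materialize` name is hypothetical; the Silurian/Paleozoic/Phanerozoic chain mirrors the example above.

```python
from collections import defaultdict


def materialize(broader: list[tuple[str, str]]) -> set[tuple[str, str, str]]:
    """Derive skos:narrower, skos:broaderTransitive and skos:narrowerTransitive
    triples from asserted (concept, broader_concept) pairs."""
    parents = defaultdict(set)
    for child, parent in broader:
        parents[child].add(parent)

    triples = set()
    for child, parent in broader:
        triples.add((child, "skos:broader", parent))
        triples.add((parent, "skos:narrower", child))  # inverse of broader

    for start in list(parents):
        # Depth-first walk up the hierarchy; `seen` guards against cycles.
        seen, stack = set(), list(parents[start])
        while stack:
            ancestor = stack.pop()
            if ancestor in seen:
                continue
            seen.add(ancestor)
            stack.extend(parents.get(ancestor, ()))
        for ancestor in seen:
            triples.add((start, "skos:broaderTransitive", ancestor))
            triples.add((ancestor, "skos:narrowerTransitive", start))
    return triples


asserted = [
    ("isc:Silurian", "isc:Paleozoic"),
    ("isc:Paleozoic", "isc:Phanerozoic"),
]
```

Running materialize(asserted) reproduces the assertions recommended above, e.g. isc:Silurian skos:broaderTransitive isc:Phanerozoic , isc:Paleozoic and isc:Paleozoic skos:narrower isc:Silurian.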

Collection properties

A primary use of skos:Collection is to provide a resolvable resource for every partial path in the URI set for the concepts. skos:Collection and skos:OrderedCollection can also be used for any other grouping, within or across concept-schemes.

  1. a skos:Concept or skos:Collection can be a member of any number of skos:Collections
  2. an rdfs:label and a skos:prefLabel should be provided for display in user interfaces

Example:

nil:
      a       skos:Collection ;
      rdfs:label "OGC Nils 0" ;
      skos:member nil:AboveDetectionRange , nil:withheld , nil:unknown , nil:missing , nil:inapplicable , nil:template , nil:BelowDetectionRange .

Example:

isc:  a       skos:Collection ;
      rdfs:label "Geologic Timescale Elements"^^xsd:string ;
      owl:versionInfo "Created with TopBraid Composer"@en ;
      skos:member isc:Bajocian , isc:Cenozoic , isc:Tournaisian , isc:LowerMississippian , isc:LowerJurassic , 
         isc:BaseKungurian , isc:BaseMaastrichtian , isc:BasePridoli , isc:Proterozoic , isc:BaseMiddleOrdovician , 
         isc:Tithonian , isc:BaseLopingian , isc:Rhyacian , isc:UpperJurassic , isc:BaseOrosirian , isc:BaseFamennian , isc:BaseLudlow ;
      skos:prefLabel "Geologic Timescale Elements"^^xsd:string .

isc:Eras
      a       skos:Collection ;
      rdfs:label "Eras (all ranks) in the International Stratigraphic Chart"@en ;
      skos:member isc:Bajocian , isc:Cenozoic , isc:LowerMississippian , isc:Tournaisian , isc:Wenlock , 
         isc:MiddleJurassic , isc:LowerJurassic , isc:UpperOrdovician , isc:Kimmeridgian , isc:MiddleOrdovician , 
         isc:Cryogenian , isc:Aalenian , isc:Kasimovian , isc:Proterozoic . 
(etc)

Other considerations

Container patterns

A number of container resources and patterns are available in RDF/OWL/SKOS.

rdf:type

Basic RDF provides for resource types. For example:

my:ResourceA rdf:type skos:Concept .
my:ResourceB rdf:type skos:Concept .

asserts that the resources are members of the class indicated. The subject resources are individuals and the object resources are classes in this case.

rdfs:subClassOf rdfs:subPropertyOf

RDFS adds mechanisms to define subsumption hierarchies at the class level:

my:ResourceC rdfs:subClassOf some:ClassN .  
my:ResourceD rdfs:subClassOf some:ClassN .  
my:ResourceE rdfs:subClassOf my:ResourceD .

asserts that the resources are specializations of the class indicated. Both subjects and objects of these triples are classes, such as skos:Concept. RDFS also provides for specialization of properties:

my:propertyF rdfs:subPropertyOf some:propertyO .

rdfs:isDefinedBy

An OWL Ontology collects a set of classes, properties and axioms. By convention, rdfs:isDefinedBy links a resource to the ontology context that contains its definition:

my:ResourceD rdfs:isDefinedBy my:OntologyP .  
my:propertyF rdfs:isDefinedBy my:OntologyP .

The subject resources may be either individuals (including properties) or classes.

There is no inverse of rdfs:isDefinedBy to indicate the members of an owl:Ontology, but dct:hasPart has approximately the required semantics. For example:

<http://environment.data.gov.au/water/quality/def/op>
      a       owl:Ontology ;
      dct:hasPart wqop:QualityKind , wqop:qualityKind , wqop:constraint , wqop:ScaledQuantityKind , wqop:featureOfInterest , wqop:matrix , wqop:PropertyKind , wqop:propertyKind , wqop:SubstanceOrTaxon , wqop:applicableVocabulary , wqop:objectOfInterest , wqop:procedure .

If the ontology is also a void:Dataset or skos:ConceptScheme, then the predicates from those vocabularies are available.

skos:ConceptScheme

SKOS introduces the notion of a concept-scheme, which is a set of concepts with a related scope and well defined semantic relationships:

my:ConceptH skos:inScheme my:ConceptSchemeQ .  
my:ConceptI skos:inScheme my:ConceptSchemeQ .

Both concepts and concept-schemes are individuals. In practice skos:ConceptScheme resources add little information useful for reasoning, but can provide a convenient point to attach metadata relating to the set of concepts.

SKOS concept-schemes and OWL ontologies have a similar intention: to provide a container for a set of related resources. However, while ontologies can contain any kind of resource (axioms related to classes, properties, or individuals), SKOS only supports membership of concept schemes by SKOS concepts (i.e. individuals). The SKOS Reference indicates that it is consistent with SKOS semantics for the same resource to be typed as both an ontology and a concept-scheme, but OWL reasoners may object, as this is only consistent with OWL Full.

skos:Collection

SKOS also provides collections:

my:CollectionR skos:member my:ConceptH , my:ConceptI , my:CollectionS .

Collections are also individuals. Note that the members of a collection can be either concepts or collections, so a skos:Collection can be used to assist navigation through a hierarchy of concepts in a fashion similar to a traditional file-system.

skos:narrower (and skos:broader)

Specialized semantic relations in SKOS provide for asserting specialization relations amongst individual concepts within a concept scheme:

my:ConceptH skos:broader my:ConceptJ .  
my:ConceptI skos:broader my:ConceptJ .  
my:ConceptJ skos:narrower my:ConceptH , my:ConceptI .

and between concepts from different schemes:

my:ConceptH skos:broadMatch her:ConceptK .  
my:ConceptH skos:narrowMatch his:ConceptL .

and also for approximate and exact matches between concepts from different schemes:

my:ConceptH skos:closeMatch her:ConceptM .  
my:ConceptH skos:exactMatch his:ConceptN .

Linking back to the skos:ConceptScheme pattern, the top-concept property and its inverse provide entry points at the top of a concept-scheme:

my:ConceptJ skos:topConceptOf my:ConceptSchemeQ .  
my:ConceptSchemeQ skos:hasTopConcept my:ConceptJ .

(As expected: skos:topConceptOf rdfs:subPropertyOf skos:inScheme .)
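This relies on the RDFS sub-property rule: if p rdfs:subPropertyOf q, then every triple (s p o) also entails (s q o). So a concept asserted as skos:topConceptOf a scheme is also skos:inScheme that scheme, and by the same rule every skos:prefLabel is also an rdfs:label. A minimal forward-chaining sketch of that single rule (not a full RDFS reasoner), using the two sub-property axioms mentioned on this page; the function name and the label value are hypothetical.

```python
def apply_subproperty_rule(triples, subproperty_of):
    """Add (s, q, o) for every (s, p, o) where p rdfs:subPropertyOf q.

    Repeats until no new triples appear, so chains p -> q -> r are followed.
    """
    result = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(result):
            for q in subproperty_of.get(p, ()):
                if (s, q, o) not in result:
                    result.add((s, q, o))
                    changed = True
    return result


SUBPROPERTY_OF = {
    "skos:topConceptOf": ["skos:inScheme"],
    "skos:prefLabel": ["rdfs:label"],
}
data = {
    ("my:ConceptJ", "skos:topConceptOf", "my:ConceptSchemeQ"),
    ("my:ConceptJ", "skos:prefLabel", '"Concept J"@en'),  # hypothetical label
}
```

Applied to `data`, the rule adds my:ConceptJ skos:inScheme my:ConceptSchemeQ and an rdfs:label copy of the preferred label, which is the kind of materialization referred to in the concept-description checklist.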

When preparing a specific vocabulary, any or all of these patterns may be applicable. The best practice described on this page uses the rdf:type, skos:ConceptScheme, skos:Collection and skos:broader/skos:narrower patterns.

Non-SKOS properties

A vocabulary provided through a SISSvoc service can contain other ontological relationships. For example, this Geologic Timescale is represented using SKOS, with Cambrian, Ordovician, Silurian etc. modeled as concepts, all ‘narrower’ than Paleozoic. But the concepts in this vocabulary are also typed as a boundary or era, with additional relationships that reflect the topology and semantics of the timescale model, which has been formalized as an OWL ontology.

NOTE:

  1. Additional properties will be reported in SISSvoc results, but a basic SISSvoc service does not expose the non-SKOS properties for query.
  2. SISSvoc will only return resources whose type is skos:Concept, so reasoning must be enabled to also return resources whose type is a subClassOf skos:Concept.
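Note 2 can be illustrated with a toy type filter. This is not SISSvoc code; it only shows why a query for resources of type skos:Concept misses resources typed with a subclass unless rdfs:subClassOf is taken into account. The class my:Era, its subclass axiom, and the function name are hypothetical stand-ins for the timescale model's classes.

```python
def instances_of(target, types, subclass_of, reasoning=True):
    """Return resources whose rdf:type is `target` or, with reasoning,
    a (transitive) subclass of it.

    types: resource -> set of asserted classes
    subclass_of: class -> set of direct superclasses
    """
    def superclasses(cls):
        # Walk rdfs:subClassOf upwards, collecting cls and all its ancestors.
        seen, stack = set(), [cls]
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(subclass_of.get(c, ()))
        return seen

    hits = set()
    for resource, classes in types.items():
        for cls in classes:
            closure = superclasses(cls) if reasoning else {cls}
            if target in closure:
                hits.add(resource)
                break
    return hits


SUBCLASS_OF = {"my:Era": {"skos:Concept"}}  # hypothetical axiom
TYPES = {
    "my:ConceptH": {"skos:Concept"},
    "my:Silurian": {"my:Era"},  # typed only with the subclass
}
```

Without reasoning only my:ConceptH is found; with the subclass closure my:Silurian is returned as well, which is the behaviour that enabling reasoning restores.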

How many documents/concept-schemes/repositories?

The relationship between concept schemes and vocabularies, ontology documents, and concept repositories is flexible and can be used to support various governance models. Vocabularies provided by SISSvoc services to date use various patterns, including:

Triple-stores, quad-stores and graphs

RDF repositories are commonly referred to as ‘triple-stores’ but in practice are almost always ‘quad-stores’, with an extra field associated with each triple. This element holds a URI which can be:

Different patterns of usage for the extra field can result in few or many different values in the fourth field within a single repository.
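As a minimal sketch of the idea, each statement can be held as a (subject, predicate, object, graph) quad, and the fourth field then partitions the repository, for example one graph per ontology document as suggested under “Ontology document” above. Counting the distinct values of that field shows which usage pattern a repository follows. The graph URIs and the `triples_by_graph` helper are hypothetical.

```python
from collections import defaultdict

Quad = tuple[str, str, str, str]  # (subject, predicate, object, graph)


def triples_by_graph(quads: list[Quad]) -> dict[str, set[tuple[str, str, str]]]:
    """Group quads into named graphs, keyed by the fourth field."""
    graphs = defaultdict(set)
    for s, p, o, g in quads:
        graphs[g].add((s, p, o))
    return dict(graphs)


# Hypothetical repository holding two versions of a vocabulary as separate graphs.
QUADS = [
    ("isc:Silurian", "skos:broader", "isc:Paleozoic", "http://example.org/graph/isc-2010"),
    ("isc:Silurian", "skos:broader", "isc:Paleozoic", "http://example.org/graph/isc-2012"),
    ("isc:Paleozoic", "skos:broader", "isc:Phanerozoic", "http://example.org/graph/isc-2012"),
]
```

Note that the same triple can occur in several graphs; a plain triple-level view of the repository is the union of all graphs, so per-version content is only distinguishable through the fourth field.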

Complete examples