Using Protégé-2000 to Edit RDF

30 January 2001

Abstract:

For the past 15 years, the Knowledge Modeling Group (KMG) at Stanford University has developed a variety of knowledge-modeling tools as part of the Protégé project. The current knowledge-base editing tool, Protégé-2000, is an extensible, open-source application. An explicit goal of the KMG is to provide a knowledge-base editing platform that can easily be adapted to any well-defined frame-based modeling language while simultaneously enabling maximal code reuse (e.g., tools developed for a particular modeling language and application ought to be reusable with different modeling languages and applications). Since RDF Schema is a frame-based language, the core Protégé-2000 tools can be easily extended to acquire, edit, and maintain RDF knowledge bases. This document discusses the current support for RDF editing in Protégé-2000 and the tradeoffs and possibilities of implementing a complete set of RDF and RDF Schema editing features in Protégé.

Document Status

This document compares RDF and RDF Schema with Protégé-2000 version 1.5, which includes completely new RDF support. See this document for the comparison of RDF and RDF Schema with the Protégé-2000 version 1.4 release.
Comments on this specification may be sent to protege-help@smi.stanford.edu.

1. An overview of Protégé

Protégé-2000 is a knowledge-base design and knowledge-acquisition system developed at Stanford University. It is the result of 15 years of experience in building tools for knowledge-base construction. In recent years, the major thrust of development has been in making the tools usable to a much larger audience, for a much wider range of applications. There are currently more than 100 projects using Protégé-2000 world-wide.

Protégé-2000 is now available as free software under the open-source Mozilla Public License and is compatible with a wide range of knowledge-representation languages. It provides an integrated knowledge-base editing environment and an extensible architecture for the creation of customized knowledge-based tools. In the remainder of this overview, we briefly discuss the key architectural ideas that underlie Protégé-2000 in order to motivate and simplify the rest of the document. This diagram may be helpful.

1.1 The Idea of a Form

The Protégé knowledge-modeling language distinguishes between classes and instances. Classes correspond to definitions of concepts, and instances correspond to specific examples of a concept. In addition, slots are a third type of modeling abstraction: they are first-class objects that correspond to attributes (i.e., they can take values) of either a class or an instance. Slots correspond to properties in RDF, and we use the two terms interchangeably (all the Protégé documentation refers to slots). For example:

    MiniVan is a class
    example_INSTANCE_0009 is an instance of MiniVan
    registeredTo and rearSeatLegRoom are slots which have been attached to the class MiniVan
    The value of the rearSeatLegRoom slot for example_INSTANCE_0009 is "40.2"
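The four statements above can be mirrored in a small in-memory model. This is a minimal Python sketch. The names `Cls`, `Instance`, `set_value` and `own_values` are hypothetical illustrations and not the Protégé-2000 API. The sketch assumes, as Protégé does, that each instance has exactly one direct type.

```python
# Hypothetical, minimal model of the MiniVan example (not the Protégé API).
from dataclasses import dataclass, field


@dataclass
class Cls:
    name: str
    template_slots: list[str] = field(default_factory=list)


@dataclass
class Instance:
    name: str
    direct_type: Cls  # assumption mirrored from Protégé: one direct type
    own_values: dict[str, object] = field(default_factory=dict)

    def set_value(self, slot: str, value: object) -> None:
        # Only slots attached to the instance's class may carry values.
        if slot not in self.direct_type.template_slots:
            raise KeyError(f"{slot} is not attached to {self.direct_type.name}")
        self.own_values[slot] = value


mini_van = Cls("MiniVan", ["registeredTo", "rearSeatLegRoom"])
van = Instance("example_INSTANCE_0009", mini_van)
van.set_value("rearSeatLegRoom", 40.2)
print(van.own_values)  # {'rearSeatLegRoom': 40.2}
```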

(Alternatively, you can view these classes and instances in the Protégé default user interface.)

Protégé-2000 significantly simplifies the often-complicated task of developing an appropriate class hierarchy for a given application (or set of applications). The user can easily browse the class hierarchy, visualize class definitions, create new classes and slots, and bind slots to classes. This can be easily done in the Protégé-2000 ontology editor.

Structured data entry allows users to enter an instance quickly and easily (and to verify that the information they have entered is correct). Protégé-2000 enables this capability through the use of forms: every class in Protégé-2000 is associated with a user-interface form, which can be customized in a number of ways (for example, by placing important information at the "top" of the form). The use of forms transforms the knowledge-base operation "acquiring an instance" into the user-interface operation "filling in the blanks on the form."

The complete editing cycle is therefore the following: define a concept, lay out the associated form, and use the form to acquire instances.
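The editing cycle can be sketched in a few functions, under stated assumptions. First, a default form is generated from a class's template slots. Next, the layout is customized by moving an important slot to the top. Finally, the form is filled in to acquire an instance. `FormField`, `default_form`, `move_to_top` and `fill_out` are hypothetical names. The value-type-to-widget table is also an assumption made for illustration.

```python
# Hypothetical sketch of: define a concept, lay out its form, fill it in.
from dataclasses import dataclass


@dataclass
class FormField:
    slot: str
    widget: str


def default_form(template_slots: dict[str, str]) -> list[FormField]:
    """One field per template slot; the widget is chosen by value type."""
    widget_for_type = {"Float": "FloatFieldWidget", "String": "TextFieldWidget",
                       "Instance": "InstanceFieldWidget"}  # assumed table
    return [FormField(slot, widget_for_type.get(vtype, "TextFieldWidget"))
            for slot, vtype in template_slots.items()]


def move_to_top(form: list[FormField], slot: str) -> list[FormField]:
    """Customize the layout: the chosen slot first, others keep their order."""
    return sorted(form, key=lambda f: f.slot != slot)  # stable sort


def fill_out(form: list[FormField], answers: dict[str, object]) -> dict[str, object]:
    """'Acquiring an instance' as filling in the blanks on the form."""
    return {f.slot: answers[f.slot] for f in form if f.slot in answers}


mini_van_slots = {"registeredTo": "Instance", "rearSeatLegRoom": "Float"}
form = move_to_top(default_form(mini_van_slots), "rearSeatLegRoom")
values = fill_out(form, {"rearSeatLegRoom": 40.2})
```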

1.2 What Widgets Do

A slot is, formally, a binary relation with both a domain and a range. That is, a slot is attached to a set of classes (the slot domain) and, for each class to which the slot is attached, the slot has a set of allowed value types (taken as a whole, these define the range of the slot). In the example referenced above, the slot rearSeatLegRoom is attached to the class MiniVan and has floating-point values.
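The domain/range view of a slot can be made concrete with a short sketch. The domain is the set of classes the slot is attached to, and the range is the allowed value types per class. The `Slot` class and its `attachments` mapping are hypothetical. Python types stand in for Protégé value types.

```python
# Hypothetical sketch: a slot as a binary relation with domain and range.
from dataclasses import dataclass, field


@dataclass
class Slot:
    name: str
    # class name -> allowed value types for this slot on that class
    attachments: dict[str, tuple[type, ...]] = field(default_factory=dict)

    def domain(self) -> set[str]:
        return set(self.attachments)

    def accepts(self, cls_name: str, value: object) -> bool:
        allowed = self.attachments.get(cls_name)
        return allowed is not None and isinstance(value, allowed)


rear_seat = Slot("rearSeatLegRoom", {"MiniVan": (float,)})
print(rear_seat.domain())                    # {'MiniVan'}
print(rear_seat.accepts("MiniVan", 40.2))    # True
print(rear_seat.accepts("MiniVan", "40.2"))  # False: a string, not a float
```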

Protégé-2000 defines a set of custom user-interface components (widgets) that know how to acquire and display the value of a slot on a particular class or instance. In this example, FloatFieldWidget is being used to acquire the rearSeatLegRoom of an instance of MiniVan. FloatFieldWidget not only displays the value of the slot but also allows the user to edit the value, and it can perform some simple validation checks on what the user has typed.
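The display, edit and validate behaviour described for FloatFieldWidget can be sketched as follows. This is not FloatFieldWidget's code. `FloatField`, `display` and `edit` are hypothetical names. Rejecting non-finite values is an assumed example of a "simple validation check".

```python
# Hypothetical sketch of a float-field widget's display/edit/validate cycle.
import math
from typing import Optional


class FloatField:
    def __init__(self, value: Optional[float] = None) -> None:
        self.value = value

    def display(self) -> str:
        return "" if self.value is None else repr(self.value)

    def edit(self, text: str) -> Optional[str]:
        """Store the typed text if valid; return an error message otherwise."""
        try:
            parsed = float(text.strip())
        except ValueError:
            return f"'{text}' is not a floating-point number"
        if not math.isfinite(parsed):  # assumed extra check: no inf/nan
            return "value must be finite"
        self.value = parsed
        return None
```

On an invalid edit the previous value is kept, so the slot never holds text the widget could not parse.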

System developers use a well-defined API to implement widgets as separate components that interact with the core Protégé framework. This component architecture enables the creation of collections of user-interface devices that are specialized for acquiring certain types of knowledge (e.g., widgets that are appropriate for slots with certain value types). In the following example, a user decides to use a SliderWidget instead of a FloatFieldWidget to acquire (and view) a floating-point value. The user views the automatically generated form, chooses to use a SliderWidget instead of a FloatFieldWidget, and uses it to browse the knowledge base.
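One way to picture this component architecture is as a registry. Widgets declare the value types they can handle, and the user may override the default widget for a particular slot. The `WidgetRegistry` class and its methods are hypothetical; the real Protégé plug-in API is a Java API and differs. Treating the first widget registered for a type as the default is an assumption of this sketch.

```python
# Hypothetical widget registry: defaults per value type, per-slot overrides.
class WidgetRegistry:
    def __init__(self) -> None:
        self._by_type: dict[str, list[str]] = {}
        self._overrides: dict[str, str] = {}

    def register(self, widget: str, value_type: str) -> None:
        self._by_type.setdefault(value_type, []).append(widget)

    def choose(self, slot: str, widget: str, value_type: str) -> None:
        # Only widgets able to handle the slot's value type may be chosen.
        if widget not in self._by_type.get(value_type, []):
            raise ValueError(f"{widget} cannot handle {value_type} slots")
        self._overrides[slot] = widget

    def widget_for(self, slot: str, value_type: str) -> str:
        # Assumed rule: the first widget registered for a type is the default.
        return self._overrides.get(slot) or self._by_type[value_type][0]


reg = WidgetRegistry()
reg.register("FloatFieldWidget", "Float")
reg.register("SliderWidget", "Float")
default_widget = reg.widget_for("rearSeatLegRoom", "Float")
reg.choose("rearSeatLegRoom", "SliderWidget", "Float")
chosen_widget = reg.widget_for("rearSeatLegRoom", "Float")
```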

1.3 Storage Models and Persistence

The final piece of the Protégé-2000 component architecture is a storage model. The core Protégé framework does not include any code to save (or load) a knowledge base. Instead, the framework delegates this functionality to a persistence layer, with which it interacts via a published (and formally defined) API. The net effect is to decouple the widgets, and the user interface as a whole, completely from the actual storage mechanism, and thereby to enable Protégé-2000 to save a given knowledge base in a wide variety of formats.
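The delegation described above amounts to coding the core against a small storage interface. In this sketch, `StorageBackend`, `JsonBackend` and `KnowledgeBase` are hypothetical names. JSON is only a stand-in format; an RDF backend would implement the same two methods.

```python
# Hypothetical sketch: the core delegates persistence to a pluggable backend.
import json
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    @abstractmethod
    def save(self, frames: dict) -> str: ...

    @abstractmethod
    def load(self, text: str) -> dict: ...


class JsonBackend(StorageBackend):
    """Stand-in format; any other backend only needs save() and load()."""

    def save(self, frames: dict) -> str:
        return json.dumps(frames, sort_keys=True)

    def load(self, text: str) -> dict:
        return json.loads(text)


class KnowledgeBase:
    def __init__(self, backend: StorageBackend) -> None:
        self.backend = backend  # the core never formats files itself
        self.frames: dict = {}

    def save(self) -> str:
        return self.backend.save(self.frames)

    def reload(self, text: str) -> None:
        self.frames = self.backend.load(text)
```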

An additional benefit of this layer of indirection is that it provides a convenient location for translation code. For example, we implemented the RDF storage layer to import RDF files into Protégé and to store Protégé knowledge bases in RDF. In addition to saving a knowledge base as an RDF document or creating a knowledge base from an RDF document, this layer performs the necessary interpretation and translation.
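The translation step inside such a storage layer can be sketched as turning frames into RDF triples. The input layout and the default namespace `http://example.org/kb#` are assumptions made for this illustration. The sketch writes every slot value as a plain literal string and does not handle references to other instances. The RDF and RDF Schema namespace URIs are the standard ones.

```python
# Hypothetical sketch: translating Protégé-style frames into RDF triples.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
BASE = "http://example.org/kb#"  # assumed default namespace


def to_triples(classes: dict[str, list[str]],
               instances: dict[str, tuple[str, dict[str, object]]]
               ) -> list[tuple[str, str, str]]:
    triples = []
    for cls, superclasses in classes.items():
        triples.append((BASE + cls, RDF + "type", RDFS + "Class"))
        for sup in superclasses:
            triples.append((BASE + cls, RDFS + "subClassOf", BASE + sup))
    for inst, (direct_type, values) in instances.items():
        triples.append((BASE + inst, RDF + "type", BASE + direct_type))
        for slot, value in values.items():
            # Every value is written as a literal string in this sketch.
            triples.append((BASE + inst, BASE + slot, str(value)))
    return triples
```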

1.4 Conclusions

While there are some minor distinctions between the Protégé knowledge model and the knowledge model used in RDF, the differences are easy to identify and to overcome, using a few simple mapping conventions that we adopted in the persistence layer.

2. Summary of mappings and differences between core concepts of RDF and Protégé-2000

2.1 Core classes and properties mapping

The RDF persistence layer, which is available in version 1.5 of Protégé-2000, eliminates most of the terminological differences in the mapping between core classes and properties in RDF and Protégé. We discuss explicit RDF support in Protégé-2000 in the next section.

2.2 Summary of differences between the knowledge models of RDF and Protégé-2000

The table below summarizes the relatively minor semantic differences between the knowledge models of RDF and Protégé-2000. In Section 4, we elaborate on the elements in the table and suggest ways of resolving the differences.

Feature	RDF and RDF Schema	Protégé-2000
Multi-class membership	A resource can be an instance of one or more classes	An instance can have only one direct type
Range constraints	The value of the range property is a single class, which constrains the value of the corresponding property to instances of that class	A value of a slot can be a value of a primitive type or an instance of a class. There can be one or more classes that constrain the value
Containers	There are three types of container objects: bag, sequence, and alternative	Collections have to be encoded, e.g., by ordered lists
Namespaces	Frame names are unique within one schema; for multiple schemas, the XML namespace facility is used to associate each property with the schema	Frame names are unique within one project. Name conflicts are not resolved during project inclusion.
Literal markup	A literal may have content that is XML markup but is not further evaluated by the RDF processor or it can be a primitive datatype defined by XML	Literals can be either plain strings, numbers, symbols, or boolean values

3. How to use Protégé-2000 as an RDF editor

When you create or save an RDF project in Protégé (by selecting RDF Schema as your storage format when you open, import, or save a project), Protégé uses its normal set of system metaclasses, which map directly to the RDF core classes as follows:

:THING <-> rdfs:Resource: the superclass of all classes
:STANDARD-CLASS <-> rdfs:Class: the default metaclass for each new class you create.
A metaclass is a template from which new classes get their own slots. More formally, each class is an instance of a metaclass. An own slot of a class defines a property of the class itself and not of its instances.
:STANDARD-SLOT <-> rdf:Property: the default metaclass for each new property (or slot) you create
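The correspondence listed above can be written as a two-way lookup table that an import/export layer could consult. The table contents come from the list above. The dictionary names and the helper function are a hypothetical sketch, not Protégé-2000 code.

```python
# Hypothetical lookup table for the system-frame mapping listed above.
from typing import Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

PROTEGE_TO_RDF = {
    ":THING": RDFS + "Resource",        # superclass of all classes
    ":STANDARD-CLASS": RDFS + "Class",  # default metaclass for new classes
    ":STANDARD-SLOT": RDF + "Property", # default metaclass for new slots
}
RDF_TO_PROTEGE = {uri: frame for frame, uri in PROTEGE_TO_RDF.items()}


def system_frame_for(uri: str) -> Optional[str]:
    """Return the Protégé system frame for an RDF core class URI, if any."""
    return RDF_TO_PROTEGE.get(uri)
```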

Since Protégé-2000 version 1.5 does not allow its metaclasses to be changed, you cannot make changes to rdfs:Resource, rdfs:Class, and rdf:Property. As a consequence, rdfs:seeAlso, rdfs:isDefinedBy, and rdfs:label are not supported since this would require attaching them to :THING. This will be fixed in a future release.

4. Differences between knowledge models of RDF and Protégé-2000

The RDF storage layer in version 1.5 of Protégé-2000 does not completely eliminate the semantic differences that we describe in this section. For many of them, the current RDF layer uses a short-term solution, which we plan to improve later.

4.1 Multi-class membership

RDF permits "multi-class membership" or "complex-entity types": resources may be instances of several classes C₁, ..., Cₙ. In Protégé-2000, each instance has only one direct type because of user-interface considerations. (If there is no single class that is a parent of that instance, then there is no place to edit the complete form for the instance.) When a user needs to create a resource that is an instance of several classes C₁, ..., Cₙ, the RDF-editing layer of Protégé-2000 could simulate multi-class membership by automatically creating a new class C as a subclass of C₁, ..., Cₙ and then creating an instance of class C. This solution was used for creating an interface between Protégé and Loom. In the RDF support of Protégé-2000 version 1.5, only one type is picked, and an error message is generated.
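Both strategies can be sketched side by side. One generates a subclass of C₁, ..., Cₙ and instantiates it. The other mirrors the version 1.5 behaviour of keeping one type and reporting an error. The naming scheme for generated classes is hypothetical. Keeping the first listed type is also an assumption, since the text does not say which type is picked.

```python
# Hypothetical sketch of two ways to handle a resource with several types.
def simulate_multi_class(types: list[str],
                         superclasses: dict[str, list[str]]) -> str:
    """Long-term approach: create a subclass of all the types and use it."""
    if len(types) == 1:
        return types[0]
    name = "_AND_".join(sorted(types))  # hypothetical generated class name
    superclasses.setdefault(name, sorted(types))
    return name


def pick_one_type(types: list[str], errors: list[str], resource: str) -> str:
    """Version 1.5 style: keep one type (here, assumed: the first) and log."""
    if len(types) > 1:
        errors.append(f"{resource}: multiple types {types}; using {types[0]}")
    return types[0]
```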

4.2 Core constraints: rdfs:domain and rdfs:range

In Protégé-2000, slots (properties) are linked to classes through slot attachment (see Protégé-2000 knowledge model summary). For each slot S, the set of classes to which S is attached can be viewed as the domain of slot S. This notion of slot domain is the same as rdfs:domain for properties in RDF.

Protégé-2000 and RDF Schema handle ranges quite differently. RDF Schema, via the rdfs:range property, defines the range of a property to be instances of a single class. In Protégé-2000, on the other hand, slot ranges are defined using multiple properties. Each slot has both an associated primitive type (one of: integer, float, string, symbol, boolean, class, or instance) and additional constraints (depending on the primitive type) that allow the range to be specified more precisely. For example, the semantics of RDF Schema's rdfs:range property are modeled exactly by using the primitive type Instance and listing exactly one class in the "Allowed Classes" facet. Protégé-2000 allows the user to have more than one class in the "Allowed Classes" list.
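The comparison above suggests a direct translation from a Protégé slot range (primitive type plus allowed classes) to a single rdfs:range. The sketch below is hypothetical. Mapping the Class type to rdfs:Class and the remaining primitive types to rdfs:Literal are assumptions. A multi-class list is deliberately left unresolved because it needs extra handling.

```python
# Hypothetical sketch: Protégé slot range -> single rdfs:range (or None).
from typing import Optional

RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def rdfs_range(value_type: str, allowed_classes: list[str]) -> Optional[str]:
    if value_type == "Instance":
        if len(allowed_classes) == 1:
            return allowed_classes[0]  # exactly the semantics of rdfs:range
        return None                    # several classes: no direct equivalent
    if value_type == "Class":
        return RDFS + "Class"          # assumption
    # integer, float, string, symbol, boolean: assumed to map to rdfs:Literal
    return RDFS + "Literal"
```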

In order to comply with the one-range restriction in RDF, the RDF-editing layer picks the smallest common superclass of the classes in the allowed-classes list (another option would be to create a new class that is a superclass of all of the intended allowed classes and to use it as the range). This mapping can be done in the RDF storage model without the user ever being aware of it. Alternatively, users of Protégé-2000 can exercise self-discipline and create the common superclass themselves.
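The "smallest common superclass" rule can be sketched for a multiple-inheritance hierarchy. The sketch intersects the ancestor sets of the allowed classes and keeps the candidates that have no other common candidate below them. It assumes a single root (such as :THING or rdfs:Resource), so that a common superclass always exists. When several minimal candidates remain, it breaks the tie alphabetically; that tie-break is an assumption.

```python
# Hypothetical sketch: smallest common superclass under multiple inheritance.
def ancestors(cls: str, parents: dict[str, list[str]]) -> set[str]:
    """cls itself plus all of its transitive superclasses."""
    seen: set[str] = set()
    stack = [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen


def smallest_common_superclass(allowed: list[str],
                               parents: dict[str, list[str]]) -> str:
    common = set.intersection(*(ancestors(c, parents) for c in allowed))
    # Drop any candidate that is a superclass of another common candidate.
    minimal = [c for c in common
               if not any(c != d and c in ancestors(d, parents) for d in common)]
    return sorted(minimal)[0]  # assumed tie-break
```

For example, with the MotorVehicle hierarchy from the RDF Schema specification, the allowed classes Van and PassengerVehicle share the common superclasses MotorVehicle and Resource. MotorVehicle is chosen because it is the more specific of the two.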

4.3 Containers

RDF Schema defines three different types of containers for properties that have multiple values: Sequence, Alternative, and Bag (each of these container types has a different semantics). The Protégé knowledge model, on the other hand, does not have an explicit container type.
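In RDF, a container lists its members through the membership properties rdf:_1, rdf:_2, and so on. Protégé has no container type, so one possible encoding reads the members back as an ordered list. The following sketch assumes the container arrives as (subject, predicate, object) triples. The function name is hypothetical. For a Bag, the resulting order carries no meaning.

```python
# Hypothetical sketch: decode an rdf:Bag/Seq/Alt into (kind, ordered members).
from typing import Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def container_members(node: str, triples: list[tuple[str, str, str]]
                      ) -> tuple[Optional[str], list[str]]:
    kind: Optional[str] = None
    members: dict[int, str] = {}
    for s, p, o in triples:
        if s != node:
            continue
        if p == RDF + "type" and o in (RDF + "Bag", RDF + "Seq", RDF + "Alt"):
            kind = o[len(RDF):]
        elif p.startswith(RDF + "_") and p[len(RDF) + 1:].isdigit():
            members[int(p[len(RDF) + 1:])] = o  # rdf:_n -> index n
    # Order by membership index (meaningful for Seq, arbitrary for Bag).
    return kind, [members[i] for i in sorted(members)]
```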

4.4 Namespaces

XML namespaces, which are basic to RDF and RDF Schema, are not currently supported directly in Protégé-2000. However, the RDF persistence layer introduces the concept of namespace abbreviations: each frame that is not in the default namespace (which has to be specified when importing or saving a knowledge base) is prefixed with a namespace abbreviation. When a knowledge base is saved, the abbreviations are expanded into the full namespace URIs. This concept also allows included projects to use namespaces different from the namespace of the main project, thus avoiding name clashes.
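The abbreviation scheme can be sketched as a pair of functions. Unprefixed frame names belong to the default namespace. A name of the form `abbr:localName` expands to the namespace URI registered for `abbr`. The function names, the `abbr:localName` syntax and the example URIs are assumptions made for illustration.

```python
# Hypothetical sketch of namespace abbreviations in frame names.
def expand(frame_name: str, default_ns: str,
           abbreviations: dict[str, str]) -> str:
    prefix, sep, local = frame_name.partition(":")
    if sep and prefix in abbreviations:
        return abbreviations[prefix] + local
    return default_ns + frame_name  # unprefixed: default namespace


def abbreviate(uri: str, default_ns: str,
               abbreviations: dict[str, str]) -> str:
    if uri.startswith(default_ns):
        return uri[len(default_ns):]
    for prefix, ns in abbreviations.items():
        if uri.startswith(ns):
            return f"{prefix}:{uri[len(ns):]}"
    raise ValueError(f"no abbreviation registered for {uri}")
```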

4.5 Literals

RDF literals are "the most primitive value type represented in RDF, typically a string of characters. The content of a literal is not interpreted by RDF itself and may contain additional XML markup." RDF does not yet define any other concrete data types, such as integer, float, or date, but the corresponding XML Schema types are often used (unofficially).
Therefore, there are two issues in supporting the RDF notion of literals in Protégé: allowing XML markup and supporting primitive data types. In the current implementation, XML Schema datatypes are not automatically recognized on import. rdfs:Literal (and any of its subclasses) is mapped to Protégé's String type when used as a range.
XML markup for RDF literals is not currently supported in Protégé-2000. The Protégé plug-in API allows implementing slot widgets that display, acquire, and validate slot values of a specific type. Therefore, one could implement an XML-value widget that acquires and validates XML input to ensure that it is well-formed and that it is valid with respect to a specific DTD. Such widgets can be implemented independently of the RDF-editing layer.

5. Solutions for Difficulties Identified in this Document

The table below summarizes the issues involved in making Protégé-2000 an RDF and RDF Schema editor.

Issue	Implemented in Protégé-2000 version 1.5	Long-Term Solution
Multi-class membership	Only one class is picked (plus error message).	Automatically generate the necessary intermediate classes.
Range constraints	Smallest superclass is used.	Automatically generate intermediate classes which encapsulate the range constraints.
Containers	Not supported.
Namespaces	Automatically prepend the namespace abbreviation to the frame name.	Support namespaces throughout the Protégé-2000 framework.
Literals	Mapped to String.	Support XML Schema types if their use becomes official in RDF.

6. Acknowledgments

The following people contributed to this document: Harold Boley, Stefan Decker, William Grosso, Natalya Fridman Noy, Michael Sintek, and Mark A. Musen.

Appendix. Protégé-2000 knowledge model

Protégé-2000 is a frame-based system. The knowledge model of Protégé-2000 is compatible with the Open Knowledge Base Connectivity (OKBC) protocol. The main elements of the Protégé knowledge model are frames representing:

classes which correspond to concepts in the domain;
instances of classes;
slots which are properties of classes and instances;
facets which are properties of slots

Classes are organized in a subclass-of hierarchy with multiple inheritance. Every instance of a class A is also an instance of any of the superclasses of A. Classes themselves can be instances of other classes (metaclasses). Slots are first-class objects in Protégé-2000. Slots are attached to classes and instances in one of two ways:

Slots can be attached to a class as template slots, describing the properties of instances of that class. When a slot is attached to a class as a template slot, value-type restrictions can be defined for that slot. Template slots are inherited by subclasses. Template slots become own slots of the class's instances.
Slots can be attached to a class or an instance as own slots, describing properties of that particular class or instance (and not of the instances of a class). Therefore, when an own slot is attached to a class, its value describes properties of the class itself (e.g., URI, creator, etc.).

Creating and editing classes

The class hierarchy and the slots attached to classes can be browsed and edited within Protégé-2000's Classes tab. For instance, Protégé-2000 will present the example of 'MotorVehicle' from the RDF Schema Specification in the following way.

The left-hand pane visualizes the class hierarchy as a tree. The right-hand pane summarizes the slots that are attached to the highlighted class. Each slot has a cardinality (single or multiple), which defines the number of possible values for the slot, and a value type, which defines the type of those values. Depending on the value type, additional restrictions on the values can be specified using facets. For instance, if the value of a slot is an instance of another class, the allowed-classes facet contains the list of classes that the instances can come from.

Creating and editing instances

The Instances tab in Protégé-2000 provides the interface for creating instances of classes. Protégé-2000 uses a forms interface for acquiring the slot values for instances. Protégé-2000 automatically generates the layout and content of the instance forms based on the value types and cardinality of the slots of the class. The user can then customize the forms using the Forms tab. An instance of MiniVan from the earlier example is represented in the following way.