In ‘Developing High Quality Data Models’, a set of three white papers published in 1994 [1], Matthew West of Shell Oil Company enumerated six design principles that can be used to produce high-quality data models. These design principles are:
- Entity types should represent, and be named after, the underlying nature of an object, not the role it plays in a particular context.
- Entity types should be part of a sub-type/super-type hierarchy ("class hierarchy" if you're familiar with object oriented terms) in order to define a universal context for the model.
- Activities and associations should be represented by entity types (not relationships).
- Relationships (in the entity/relationship sense) should only be used to express the involvement of entity types with activities or associations.
- Entity types should have a single attribute as their primary unique identifier. This should be artificial, and not changeable by the user. Relationships should not be used as a part of the primary unique identifier. (They may be part of alternate identifiers.)
- Candidate attributes should be suspected of representing relationships to other entity types.
The objective of this document is to highlight how the RIM incorporates these design principles to create a flexible and powerful conceptual information model. To use the model effectively in the V3 development process, it is important to understand its basic underpinnings, and seeing how these principles relate to the RIM gives a deeper understanding of the model. Readers who have basic knowledge of the RIM but need to understand it better in order to develop and implement models based on it can benefit from the discussion below.
Reference Model
The HL7 V3 Reference Information Model (RIM) is the information model from which all information-related content of V3 messages is derived. The RIM is the ultimate source of all information models developed as part of the V3 development process. Essentially, it is a common, shared view of information semantics and structure that defines and binds concepts into a meaningful, generic abstraction of the real world. The HL7 RIM consists of six core classes and several specialized and enumerated sub-types of these core classes.
Figure 1: RIM Backbone
The HL7 V3 development process is a model-driven methodology based on iteratively constraining the scope and content of information through a linear sequence of constraint models. All models derived from the RIM are statements of constraint against the RIM for its use in a specific context. Constraint models narrow down the properties of a class, restrict the set of values that an attribute can take, restrict the domain of coded concepts, or restrict the cardinalities of associations between model classes. At each level of constraint, scope and flexibility are reduced while the information model becomes increasingly specific to a particular usage or requirement.
This process of applying multiple sequential constraints to the RIM ultimately leads to the constraint model that defines the structure and semantics of the message to be exchanged. To start with, a domain information message model (DIM) is developed using new concepts created by constraining class attributes, data types and relationships from the RIM. A DIM is a common, shared model for a set of Message Information Models (MIMs) in a particular domain. A MIM is a specific model of constraint against a DIM and is a common, shared information model for a set of messages.

Figure 2: V3 Constraint Models
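The rule that each derived model may only narrow its parent can be sketched in Python. This is a minimal illustration under stated assumptions, not HL7 tooling: the AttributeSpec class, its fields, the constrain helper, the attribute name "exampleCode" and the codes "A", "B" and "C" are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class AttributeSpec:
    """Hypothetical description of one attribute in a constraint model."""
    name: str
    min_card: int
    max_card: Optional[int]        # None means unbounded ("*")
    allowed_codes: FrozenSet[str]  # coded values the attribute may take


def constrain(parent, min_card, max_card, allowed_codes):
    """Return a narrower spec; raise ValueError if the parent would be widened."""
    if min_card < parent.min_card:
        raise ValueError("minimum cardinality may not be lowered")
    if parent.max_card is not None and (max_card is None or max_card > parent.max_card):
        raise ValueError("maximum cardinality may not be raised")
    if max_card is not None and min_card > max_card:
        raise ValueError("minimum cardinality exceeds maximum")
    if not allowed_codes <= parent.allowed_codes:
        raise ValueError("coded values may only be restricted")
    return AttributeSpec(parent.name, min_card, max_card, frozenset(allowed_codes))


# Most general level: optional, repeatable, broad set of placeholder codes.
rim_level = AttributeSpec("exampleCode", 0, None, frozenset({"A", "B", "C"}))
# Domain level: at most one value, smaller set of codes.
dim_level = constrain(rim_level, 0, 1, {"A", "B"})
# Message level: mandatory, exactly one permitted code.
mim_level = constrain(dim_level, 1, 1, {"A"})
```

Each call can only tighten what the previous level allowed, so an attempt to reintroduce code "C" at the message level raises ValueError.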
The first design principle states that:
“Entity types should represent, and be named after, the underlying nature of an object, not the role it plays in a particular context.”
This is a very powerful design principle that distinguishes between “what an entity is” and “what it does in a particular context”. These two things often get mixed up as a single abstraction in the modeling process, which introduces inflexibility into the model.
For instance, take the example of a person who is a customer as well as a vendor of the same product. In figure 3 below, a person is abstracted as either a vendor or a customer of a product that he trades in. Classes Customer and Vendor are shown as specializations of class Person, resulting in two different types of person: Customer or Vendor. In other words, a person could be either a customer or a vendor of a product, but not both.

Figure 3: Entity and Context
In the redesigned model below, a distinction is made between what the entity is and what it does in the current context. ‘What an entity does’ is moved out of the identity of the entity and into the relationship between the Person and Product classes. The role played by a person in each of the two relationships is based upon what the person does in the context of that relationship. If a person buys a product, he becomes a customer, and if he sells a product, he becomes a vendor of that product. These roles are represented as association roles in the model below. The model recognizes that what a person does in a context is a role played by the person in that context. As we will see in the next section, this relationship is modeled using an associative Role class that defines the competency or role played by an entity in the context of another entity.

Figure 4: Activity as Relationship
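The redesigned model can be sketched in Python as follows. This is an illustrative sketch only: the Trade class, the roles_of helper and the product name "Widget" are hypothetical and do not come from the figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str            # what the entity is; no role is baked in


@dataclass(frozen=True)
class Product:
    name: str


@dataclass(frozen=True)
class Trade:
    """Link between a person and a product; the role lives on the link."""
    person: Person
    product: Product
    role: str            # "customer" or "vendor"


def roles_of(person, product, trades):
    """Collect every role the person plays with respect to the product."""
    return {t.role for t in trades if t.person == person and t.product == product}


john = Person("John Doe")
widget = Product("Widget")
trades = [Trade(john, widget, "customer"), Trade(john, widget, "vendor")]
```

Because the role is carried by the relationship rather than by a Person sub-type, roles_of(john, widget, trades) contains both "customer" and "vendor", which the sub-type design of figure 3 cannot express.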
In a similar manner, the RIM defines ‘Entity’ as a set of information classes that describe ‘things’ such as persons, organizations, places, devices, substances and containers. Entity classes and enumerations thus defined in the RIM are based purely on what an entity is and are devoid of any superimposed identity based upon what the entity does in the context of another entity. For example, the RIM does not define the concept ‘Employee’ as a type of entity but makes it part of the relationship between two entity classes: ‘Employee’ is described as a relationship between the entity classes Person and Organization.
The second design principle states that:
“Activities and associations should be represented by entity types (not relationships)”
A relationship between classes is usually modeled using association roles that name what a class does in relation to another class. In figure 5 below, class Person plays the role of Patient and class Organisation plays the role of Provider, according to the activities performed by these classes in the relationship. The association between instances of the two classes is many-to-many: zero or more instances of the Person entity class are associated with zero or more instances of the Organisation entity class.

Figure 5: Association Roles
In this model, even though a Person instance can be associated with multiple instances of the Organisation entity, there can be only one association between any specific pair of Person and Organisation instances. For example, as depicted in figure 6 below, person ‘John Doe’ could be a patient at organisation ‘Apollo Hospital’ at different points in time, so John Doe has more than one association with Apollo Hospital. This information cannot be captured through the simple association of this model, which constrains John Doe to a single association with Apollo Hospital.

Figure 6: Multiple associations between specific entity instances
To solve this problem, the association between the Person and Organisation classes is modeled as a class that sits between these two classes and represents the association between them. This ‘Role’ class is an associative class that also represents what one entity class does in the context of another. Any information that pertains to the association between the Person and Organisation classes is captured in this associative class. For example, information about multiple encounters of John Doe in the role of a patient with Apollo Hospital is captured in the Role class (e.g., in a date attribute). Figure 7 shows these associations of John Doe and Apollo Hospital as multiple instances of the Role class.

Figure 7: Example of multiple associations between specific entity instances
In addition to allowing attributes that are specific to the role to be specified, externalizing Role from an entity also enables multiple roles to be associated with an entity. Seen another way, this allows two entities to be associated in multiple ways via the roles played by the player entity in the context of the scoper entity. This design principle also ties into the actor-role pattern, which is useful and commonly used. Figure 8 displays multiple roles played by the Person entity in the context of an Organisation.

Figure 8: Multiple relationships between entity types
For example, John Doe, while being a patient at Apollo Hospital, can also be an employee there.

Figure 9: Example of multiple relationships between entity types
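The two cases above, repeated associations of the same kind and different kinds of association between the same pair of entities, can be sketched with Role as an ordinary class. This is a minimal Python illustration and not the RIM definition of Role: the attribute names, the lower-case role labels and the dates are assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Entity:
    name: str


@dataclass(frozen=True)
class Role:
    """Associative class: `player` plays the role, `scoper` gives it context."""
    class_code: str        # hypothetical label such as "patient" or "employee"
    player: Entity
    scoper: Entity
    effective_date: date   # information that belongs to the association itself


john = Entity("John Doe")
apollo = Entity("Apollo Hospital")

roles = [
    Role("patient", john, apollo, date(2010, 3, 1)),    # first patient association
    Role("patient", john, apollo, date(2012, 7, 15)),   # second patient association
    Role("employee", john, apollo, date(2011, 1, 10)),  # a different kind of role
]

# Every distinct way in which John Doe is related to Apollo Hospital.
kinds = {r.class_code for r in roles if r.player == john and r.scoper == apollo}
# Dates of each separate patient association.
patient_dates = sorted(r.effective_date for r in roles if r.class_code == "patient")
```

Since each association is an instance in its own right, nothing limits the number of Role instances that link the same player and scoper.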
Similarly, there is a many-to-many relationship between the Role and Act classes. In the figure below, an entity in the role of a Physician can perform many acts of type observation, and one act of observation can be performed by many physicians.

Figure 10: Many-to-many association between Role and Act class
To resolve the many-to-many relationship between the Role and Act classes, we introduce a Role Participation class that is an associative class between Role and Act. As stated in this design principle, all activities between the Role and Act classes should be represented as a class type. The Role Participation class allows all activities performed by a role in an act to be captured as instances of the Participation class. It includes a coded attribute that enumerates the different participation roles that a Role can assume in the context of an Act. Essentially, a Role Participation is a contextual role played by an entity in a competency role.

Figure 11: Participation Role Class
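An associative Participation class of this kind can be sketched in Python as follows. The sketch is illustrative only: the attribute names, the participation labels "performer" and "verifier", and the physicians and observations are hypothetical and are not taken from the RIM vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    player_name: str
    class_code: str      # competency role, e.g. "physician"


@dataclass(frozen=True)
class Act:
    description: str


@dataclass(frozen=True)
class Participation:
    """Associative class resolving the many-to-many link between Role and Act."""
    type_code: str       # coded attribute naming how the role takes part
    role: Role
    act: Act


dr_a = Role("Dr. A", "physician")
dr_b = Role("Dr. B", "physician")
blood_pressure = Act("blood pressure observation")
temperature = Act("temperature observation")

participations = [
    Participation("performer", dr_a, blood_pressure),
    Participation("performer", dr_a, temperature),
    Participation("verifier", dr_b, blood_pressure),
]

# One role taking part in many acts, and one act involving many roles.
acts_of_dr_a = {p.act for p in participations if p.role == dr_a}
roles_in_bp = {p.role for p in participations if p.act == blood_pressure}
```

Neither Role nor Act holds a reference to the other; both directions of the many-to-many link are recovered from the Participation instances.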
The third design principle states that:
“Entity types should be part of sub-type/super-type class hierarchy”
According to this design principle, any conceptual abstraction of a real-world entity should either specialize another entity or itself be the object of specialization. In a sub-type / super-type class hierarchy, a sub-type inherits all attributes and relationships of the super-type but adds at least one attribute and/or relationship that is not present in the super-type. For example, in the RIM entity model, the generic entity class Living_Subject is specialized into the “Person” and “Non Person Living Subject” entity classes. Class ‘Person’ adds new attributes such as ‘address’, ‘marital status code’, etc. that complement and complete the definition of a person as a concept.

Figure 12: Entity Type Specialisation
Extending sub-types from super-types, as explained above, provides what is known as formal justification for setting up a sub-type / super-type class hierarchy.
In situations where sub-types have the same set of attributes and relationships as the super-type, the sub-types serve to illustrate the kinds of things represented by the super-type. This method of organizing real-world information provides what is known as informal justification for setting up a sub-type / super-type class hierarchy.
The figure below displays a fragment of the RIM entity class hierarchy, with entity sub-types represented as entities nested inside other entities. Attributes shared between an entity super-type and its sub-type(s) are shown in the outer entity. All attributes of the entity super-type are inherited by the sub-type.

Figure 13: Entity Class Hierarchy
Each sub-type (or concept) in the class hierarchy ‘Entity->Living_Subject->Person, NonPersonLivingSubject’ defines attributes in addition to those inherited from the super-type (or concept). This is an example of type specialization that is based upon the formal rationale for specialization.
The sub-types of NonPersonLivingSubject displayed in the right corner box in the diagram above are examples of specialization that is not based upon formal reasoning. Concepts such as Animal, Microorganism, and Plant are sub-types of the NonPersonLivingSubject type, but no additional attributes have been defined for them. Animal, Microorganism, and Plant can be thought of as distinct concepts within the ‘NonPersonLivingSubject’ domain whose distinguishing features we have, at present, decided not to model. These concepts serve as examples of the concept NonPersonLivingSubject.
The RIM does not represent (in the model diagram) conceptual specializations of a class that require no properties beyond those of the class they specialize. For all such enumerated specializations, a classifier attribute is defined in the super class to distinguish specializations that exist conceptually but are not represented in the model. For example, the concepts Animal, Microorganism, and Plant are distinguished by the Entity Class Code values ‘ANM’, ‘MIC’, and ‘PLNT’ respectively in the controlling vocabulary of Entity Class Code.
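The difference between the two kinds of specialization can be sketched in Python. This is a simplified illustration, not the RIM class definitions: the field names and the kind_of helper are assumptions, the attributes of NonPersonLivingSubject are omitted for brevity, and only the three code values quoted above are used.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    class_code: str   # classifier attribute defined on the super class
    name: str


@dataclass
class LivingSubject(Entity):
    pass              # attributes of this level omitted for brevity


@dataclass
class Person(LivingSubject):
    # Formal specialization: a modeled class that adds its own attributes.
    address: str = ""
    marital_status_code: str = ""


@dataclass
class NonPersonLivingSubject(LivingSubject):
    pass              # its own attributes are omitted in this sketch


# Informal specializations: no classes of their own, only code values.
NON_PERSON_KINDS = {"ANM": "Animal", "MIC": "Microorganism", "PLNT": "Plant"}


def kind_of(subject):
    """Name the conceptual specialization selected by the class code."""
    return NON_PERSON_KINDS.get(subject.class_code, "NonPersonLivingSubject")


rex = NonPersonLivingSubject(class_code="ANM", name="Rex")
fern = NonPersonLivingSubject(class_code="PLNT", name="Fern")
```

Here rex and fern are instances of the same class and differ only in class_code, whereas Person needs a class of its own because it carries additional attributes.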
Looking at the model defined in Figure 4, we realize that, in addition to a person, an organisation could also be a vendor and/or customer of the product. We can capture the common attributes of Person and Organisation in a generalization hierarchy with Trading Partner as the super class of the enumerated specializations Person and Organisation. Doing so simplifies our model, because relationships can now apply to either a person or an organisation.

Figure 14: Entity Specialisation
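A brief Python sketch of this generalization follows. The Trade class and the names "Widget" and "Acme Ltd" are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TradingPartner:
    name: str                 # attribute common to every specialization


@dataclass(frozen=True)
class Person(TradingPartner):
    pass


@dataclass(frozen=True)
class Organisation(TradingPartner):
    pass


@dataclass(frozen=True)
class Trade:
    partner: TradingPartner   # any specialization is accepted here
    product: str
    role: str                 # "customer" or "vendor"


trades = [
    Trade(Person("John Doe"), "Widget", "customer"),
    Trade(Organisation("Acme Ltd"), "Widget", "vendor"),
]
vendors = [t.partner.name for t in trades if t.role == "vendor"]
```

The relationship is declared once against the super class, so no separate customer and vendor relationships are needed for persons and organisations.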
Similarly, common entity concepts in the RIM are organized in a generalization-specialization hierarchy with the Entity class at the root. This keeps the RIM simple, since all entity class specializations can participate in the same relationships that apply to their generalization in the entity class hierarchy.
The RIM achieves its flexibility as a model through the use of structural attributes such as Class Code that help distinguish between different conceptual specializations. The controlling vocabulary provides the semantic description of the concepts. It also provides a universal context of generic entity types that are linked to each other in a hierarchical generalization-specialization relationship, one that is based purely on the semantics of the concepts involved.
The fourth design principle states that:
“Relationships as involvement”
This design principle states that where a relationship between two entity types is represented as an associative entity type, the relationship between the entity types themselves is simply the involvement of each entity type with the associative entity between them.
As shown in figure 15 below, the association between the Person and Organisation entity types is modeled using another entity type (the Role class). With the introduction of the Role class, the relationship between the Person and Organisation entity types becomes the involvement of the Person and Organisation classes with the Role class. In the figure below, the involvement is depicted as ‘plays’ or ‘scoped by’, which are the roles played by the Person and Organisation entity types in their relationship with the Role class.

Figure 15: Relationship as involvement
The fifth design principle states that:
“Use only surrogate identifiers”
This design principle asserts that only surrogate identifiers should be used to identify entity types. Surrogate identifiers are system-generated identifiers that are unique within the system. The identifier carries no meaning and is used only to guarantee data integrity. Surrogate identifiers are best used for reference entities when it is difficult to find attributes that never change value.
In the HL7 RIM, the Instance Identifier (II) data type is used to uniquely identify an object. Instance Identifiers are created as Object Identifiers (OIDs). OIDs are guaranteed to be unique if created by an ISO registration authority following procedures laid down by ISO standards. The basic structure of the instance identifier includes a namespace, which is the root OID, and an extension that serves as the identifier within the namespace. Instance identifiers uniquely identify Act, Role, and Entity class objects. Though HL7 does not mandate the creation of meaningless identifiers to identify objects, it does provide the option of doing so using the extension attribute of the instance identifier.
In cases where the entity described by the class attributes is a concept, the Concept Descriptor (CD) data type is used to express object identity. In a CD, the namespace is the code system and the identifier is the code attribute.
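The namespace-plus-identifier structure shared by both data types can be sketched in Python. These classes are simplified stand-ins rather than the HL7 data type definitions, and the OID strings and extension values are made-up placeholders, not registered identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstanceIdentifier:
    """Simplified stand-in for II: a namespace plus an identifier within it."""
    root: str                        # OID acting as the namespace
    extension: Optional[str] = None  # identifier within that namespace


@dataclass(frozen=True)
class ConceptDescriptor:
    """Simplified stand-in for CD: the code system plays the namespace role."""
    code_system: str
    code: str


# Placeholder OIDs, for illustration only.
visit_a = InstanceIdentifier("1.2.3.4.5", "0001")
visit_b = InstanceIdentifier("1.2.3.4.5", "0001")
other_ns = InstanceIdentifier("1.2.3.4.6", "0001")

animal = ConceptDescriptor("1.2.3.4.7", "ANM")
plant = ConceptDescriptor("1.2.3.4.7", "PLNT")
```

Two identifiers denote the same object only when both parts match, so visit_a equals visit_b, while other_ns differs even though its extension is the same.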
The sixth design principle states that:
“Candidate attributes should be suspected of representing relationships to other entity types.”
This design principle calls for examining every class attribute to determine whether it is really a relationship to another concept.
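As a small illustration in Python, consider an attribute that holds an employer's name. All class and field names here are hypothetical, and ‘Jane Roe’ is an invented second person; the sketch only shows the kind of attribute the principle asks the modeler to question.

```python
from dataclasses import dataclass


@dataclass
class PersonFlat:
    name: str
    employer_name: str   # candidate attribute: really a link to another entity


@dataclass
class Organisation:
    name: str


@dataclass
class Person:
    name: str


@dataclass
class Employment:
    """The former attribute, modeled as a relationship between two entities."""
    employee: Person
    employer: Organisation


apollo = Organisation("Apollo Hospital")
jobs = [
    Employment(Person("John Doe"), apollo),
    Employment(Person("Jane Roe"), apollo),
]

# A change to the organisation is made once and is seen through every link.
apollo.name = "Apollo Hospitals"
employer_names = {job.employer.name for job in jobs}
```

With PersonFlat the employer's name would be copied into every person record and could drift apart; with the relationship, the organisation is described in one place.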
Conclusion
In the discussion above, we saw how the HL7 RIM data model conforms to design principles that produce data models generic and flexible enough to provide a universal context for modeling real-world entities.
________________________________________________________________________
[1]. Matthew West "Developing High Quality Data Models, Volume 1, Principles and Techniques", The Data Management Guide. (London: Shell International Petroleum Company Limited, 1994). These ideas are further expanded at http://www.matthew-west.org.uk/documents/princ03.pdf.




