Abstract
Creating a model of some domain for publishing data as Linked Data is, fundamentally, no different to any other form of modelling exercise: we need to understand the core entities and their attributes, including their relationships to one another. RDF has a core model that is based on the decades old Entity-Attribute-Value (EAV) approach that underpins many different information management approaches and technologies. What is different and new for many people is the context in which that modelling is carried out.
When we are creating an Entity-Relationship (ER) or Object-Oriented model we are typically doing so in order to support the needs of a particular application. We have some idea about what data the application needs to store and we tailor the model accordingly. We also typically optimise the model for the needs of that application. In particular, when translating a logical ER into a physical database schema, we often need to introduce new entities, e.g. to capture many-many relationships in order to be able to express the model within the restrictions imposed by a relational database.
In a Linked Data context we approach a modelling exercise in a different way.
Firstly we may be able to avoid the exercise completely. Using RDF Schema and OWL it is possible to share vocabularies (or ontologies, or schemas, whatever your prefered term) on the web, just as we share data. This means that communities of interest can collaborate around a data model and then extend it where necessary. With semantic web technologies we can better crowd-source and share our data models.
It is often the case that a combination of existing vocabularies will adequately cover a domain. Where there is a short-fall we can define small extensions, e.g. new properties or types, that extend the model for our purposes.
The second key difference is that we focus on modelling the domain itself and set aside the immediate needs of the application. By focusing on the model rather than the application, we are more likely to be able to extend and enrich the model at a later date. Thinking about the entities and their relationships, rather than our application code, results in a stronger model. And one that is more likely to be of use to others.
Of course a model must ultimately support the kinds of data we want to capture or manipulate in our application. The ability to populate and query a model is a good test if it is fit for purpose, but application requirements alone shouldn't guide our modelling.
This is not the same as saying that all modelling must be done up front. Modelling for Linked Data can be agile and iterative. We should just take care to think about the likely areas of extension, as the Link Not Label pattern illustrates.
Thirdly, and finally, the other important way in which RDF modelling differs from other approaches is that there is no separate physical data model: the logical model is exactly how our data is stored. An RDF triple store is "schema-free" in the sense that we don't have to define the physical layout (e.g. tables, columns) of our database. Any RDF triple store can store any RDF data expressed using any RDF vocabulary.
This makes RDF very amenable for rapid, iterative application development. Importantly it also means that everyone involved in a project team is working off the same domain model; the same mental model of how the resources in a dataset are described and how they relate to one another.
This chapter introduces a number of RDF modelling patterns that are useful in a variety of different use cases. Several of the patterns are really illustrations of how to use particular features of RDF, e.g. annotation of literals with data types or languages. We've included these within the overall patterns collection as we feel that presenting them as solutions to specific questions, e.g. "How can internationalized text be expressed in RDF?" may help address what are frequently asked questions by new users.