Graph Per Aspect

How can we avoid contention around updates to a single graph when applying the Graph Per Resource pattern?

Context

For some applications the entire description of a resource might be maintained by a single authority, e.g. the data might all derive from a single data conversion or be managed by a single editing interface. However, in other applications data about a single resource might be contributed in different ways. One example is a VoID description for a dataset. A dataset description may consist of a mixture of hand-authored information (e.g. a title, description, and example resources) plus some statistics derived from the dataset itself, e.g. size and class partitions. An administrator might update the descriptive aspects while the rest is updated asynchronously by a background application that analyses the dataset.

Multiple applications writing to the same graph could lead to contention for system resources or the need to implement complex locking behaviour.

Solution

Apply a combination of the Graph Per Resource and Graph Per Source patterns and factor out the different aspects of a resource's description into separate graphs. Use a Union Graph to collate the different aspects of the description of a resource into a single view.
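
The idea can be sketched with a small in-memory model. This is only an illustration, not a real triple store: the `AspectStore` class and its method names are assumptions, and predicates are written as plain prefixed strings. Each writer replaces only the graph it owns, and readers see the union of all graphs.

```python
from collections import defaultdict


class AspectStore:
    """Hypothetical in-memory model: graph URI -> set of (s, p, o) triples."""

    def __init__(self):
        self.graphs = defaultdict(set)

    def replace_graph(self, graph_uri, triples):
        """Overwrite one aspect graph; all other graphs are left untouched."""
        self.graphs[graph_uri] = set(triples)

    def union(self):
        """Union Graph view: every triple from every named graph."""
        merged = set()
        for triples in self.graphs.values():
            merged |= triples
        return merged


store = AspectStore()
doc = "http://example.org/document/1"

# The editing interface owns the "core" graph.
store.replace_graph(
    "http://data.example.org/graphs/core/document/1",
    [(doc, "dcterms:title", "Bath in the Summertime")],
)

# A background process owns the "tags" graph and rewrites it independently.
store.replace_graph(
    "http://data.example.org/graphs/tags/document/1",
    [(doc, "dc:subject", "Bath"), (doc, "dc:subject", "Travel")],
)

description = store.union()
```

Because each writer touches a different graph, no writer needs to read, merge or lock another writer's statements before updating its own.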

Example(s)

A content management application stores information about articles. This includes descriptive metadata about the articles as well as pointers to the content. Content metadata will be manually managed by users. In the background two additional processes will be carrying out additional tasks. One will be retrieving the content of the article to perform text mining, resulting in machine-tagging of subjects in the article. The second will be harvesting related links from the rest of the system and the web. Three "aspect graphs" are created: one for the core metadata, one for the tags and one for the links:

	
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

#core description of a resource; provided by user
<http://data.example.org/graphs/core/document/1> {
	<http://example.org/document/1> dcterms:title "Bath in the Summertime".
}
#tags; maintained by process 1.
<http://data.example.org/graphs/tags/document/1> {
	<http://example.org/document/1> dc:subject "Bath".
	<http://example.org/document/1> dc:subject "Travel".
}
#related links; maintained by process 2.
<http://data.example.org/graphs/links/document/1> {
	<http://example.org/document/1> dcterms:relation <http://travel.example.org/doc/bath>.
}
#System metadata graph, listing topic of each graph
<http://data.example.org/graphs> {
	<http://data.example.org/graphs/core/document/1> foaf:primaryTopic <http://example.org/document/1>.
	<http://data.example.org/graphs/tags/document/1> foaf:primaryTopic <http://example.org/document/1>.
	<http://data.example.org/graphs/links/document/1> foaf:primaryTopic <http://example.org/document/1>.
}

As the above example illustrates, graph URIs for the different aspects of a resource's description can be generated by using Patterned URIs. A fourth graph, covering system-wide metadata, is also maintained. This graph lists the foaf:primaryTopic of each graph, allowing applications to discover which graphs relate to a specific resource.
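
A minimal sketch of generating such graph URIs follows. It assumes the convention visible in the example, i.e. a base URI, then an aspect name, then the path of the resource URI. The function name `aspect_graph_uri` and the `ASPECTS` tuple are hypothetical.

```python
from urllib.parse import urlsplit

# Base URI taken from the example; aspect names follow the three graphs shown.
GRAPH_BASE = "http://data.example.org/graphs"
ASPECTS = ("core", "tags", "links")


def aspect_graph_uri(aspect, resource_uri):
    """Build the graph URI for one aspect of a resource.

    Assumed pattern: <base>/<aspect>/<path of the resource URI>.
    """
    path = urlsplit(resource_uri).path.strip("/")
    return f"{GRAPH_BASE}/{aspect}/{path}"


graph_uris = [
    aspect_graph_uri(aspect, "http://example.org/document/1")
    for aspect in ASPECTS
]
```

With a predictable pattern like this, each writing process can compute the graph it owns directly from the resource URI, without first consulting the metadata graph.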

An application consuming this data could rely on a system default Union Graph to provide a complete view of a resource, while partial views might address individual named graphs. Using a CONSTRUCT query it is also possible to construct a view of a resource using just those graphs referenced in the system metadata graph:

	
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

CONSTRUCT { ?s ?p ?o. }
WHERE {
  GRAPH <http://data.example.org/graphs> {
    ?graph foaf:primaryTopic <http://example.org/document/1>.
  }
  GRAPH ?graph { ?s ?p ?o. }
}
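
For readers who want to see the two-step lookup spelled out, the following sketch mimics what the query does over a plain dictionary. It is not a SPARQL engine: the `dataset` structure (graph URI mapped to a set of triples), the `construct_view` function and the second document are assumptions made for illustration.

```python
FOAF_PRIMARY_TOPIC = "http://xmlns.com/foaf/0.1/primaryTopic"
METADATA_GRAPH = "http://data.example.org/graphs"
DOC1 = "http://example.org/document/1"
DOC2 = "http://example.org/document/2"  # hypothetical second resource

# Hypothetical dataset: graph URI -> set of (subject, predicate, object).
dataset = {
    METADATA_GRAPH: {
        (METADATA_GRAPH + "/core/document/1", FOAF_PRIMARY_TOPIC, DOC1),
        (METADATA_GRAPH + "/tags/document/1", FOAF_PRIMARY_TOPIC, DOC1),
        (METADATA_GRAPH + "/core/document/2", FOAF_PRIMARY_TOPIC, DOC2),
    },
    METADATA_GRAPH + "/core/document/1": {
        (DOC1, "dcterms:title", "Bath in the Summertime"),
    },
    METADATA_GRAPH + "/tags/document/1": {
        (DOC1, "dc:subject", "Bath"),
        (DOC1, "dc:subject", "Travel"),
    },
    METADATA_GRAPH + "/core/document/2": {
        (DOC2, "dcterms:title", "Another article"),
    },
}


def construct_view(dataset, resource_uri):
    """Merge every graph whose foaf:primaryTopic is the given resource."""
    # Step 1: find the relevant graphs in the system metadata graph.
    graph_uris = [
        s
        for (s, p, o) in dataset.get(METADATA_GRAPH, set())
        if p == FOAF_PRIMARY_TOPIC and o == resource_uri
    ]
    # Step 2: copy all triples from each of those graphs into one view.
    view = set()
    for uri in graph_uris:
        view |= dataset.get(uri, set())
    return view


view = construct_view(dataset, DOC1)
```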

Discussion

Named graphs provide flexibility in how to organise an RDF data store. In some cases storage is oriented towards the sources of data, in others around individual resources. The Graph Per Aspect pattern provides a combination of those features that allows for very fine-grained graph management. The description of each resource is divided over a number of graphs, each of which is contributed to the system by a different source or application component.

As with the other named graph patterns, this pattern relies on the Union Graph pattern to bring together the description of a resource into a single consistent view.

Separating out aspects of resource description into different graphs also provides a way to shard a dataset. Different aspects might be stored in different triple stores across a network. These are then brought together in the application for building a user interface. With knowledge of how graphs are partitioned across the network, as well as which graphs contain which statements, an application can use Parallel Retrieval to synthesise a local working copy of a resource's description. This aggregation could happen within a server component or on the client side.
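
A rough sketch of this kind of Parallel Retrieval, using a thread pool from the Python standard library, is shown below. The `fetch_graph` function is a stand-in: here it reads from a local dictionary, whereas a real application would issue a request to whichever store holds the graph. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

DOC = "http://example.org/document/1"

# Stand-in for graphs held in different remote stores.
REMOTE_GRAPHS = {
    "http://data.example.org/graphs/core/document/1": {
        (DOC, "dcterms:title", "Bath in the Summertime"),
    },
    "http://data.example.org/graphs/tags/document/1": {
        (DOC, "dc:subject", "Bath"),
        (DOC, "dc:subject", "Travel"),
    },
}


def fetch_graph(graph_uri):
    """Hypothetical fetch; replace with a request to the owning store."""
    return REMOTE_GRAPHS.get(graph_uri, set())


def retrieve_description(graph_uris, max_workers=4):
    """Fetch the aspect graphs concurrently and merge them locally."""
    merged = set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for triples in pool.map(fetch_graph, graph_uris):
            merged |= triples
    return merged


working_copy = retrieve_description(list(REMOTE_GRAPHS))
```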

The small, focused graphs created by use of this pattern and, more generally, by the Graph Per Resource pattern are well suited to delivery to mobile and web clients for local processing. By separating out the different aspects of a resource into graphs that are likely to change with differing frequencies, caching can be made much more efficient.
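
One way a client might exploit those differing change frequencies is to cache each aspect with its own lifetime. The sketch below is an assumption-laden illustration: the `AspectCache` class, the `fetch` callable and the lifetimes in `ASPECT_TTL` are all invented for the example, and real values would depend on how often each aspect actually changes.

```python
import time

# Assumed lifetimes in seconds: hand-edited metadata changes rarely,
# harvested links change often.
ASPECT_TTL = {"core": 86400, "tags": 3600, "links": 300}


class AspectCache:
    """Hypothetical per-aspect cache with a separate lifetime per aspect."""

    def __init__(self, fetch, clock=time.monotonic):
        self.fetch = fetch    # callable: (aspect, resource_uri) -> triples
        self.clock = clock    # injectable clock, useful for testing
        self.entries = {}     # (aspect, resource_uri) -> (expires_at, triples)

    def get(self, aspect, resource_uri):
        key = (aspect, resource_uri)
        now = self.clock()
        cached = self.entries.get(key)
        if cached is not None and cached[0] > now:
            return cached[1]  # still fresh: no fetch needed
        triples = self.fetch(aspect, resource_uri)
        self.entries[key] = (now + ASPECT_TTL[aspect], triples)
        return triples
```

With a single large graph per resource, the whole description would have to be refetched whenever its most volatile part expired. Here only the stale aspect is refetched.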

Related

Further Reading