Parallel Retrieval

How can we improve the performance of an application that dynamically retrieves Linked Data?


An application that draws on data from the web will typically retrieve a number of different resources. This is especially true when it uses the Follow Your Nose pattern to discover data.


Use several workers to make parallel GET requests, with each worker writing into a shared RDF graph.
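A minimal sketch of the solution in Python, using only the standard library. It assumes that each resource can be requested as N-Triples via content negotiation (`Accept: application/n-triples`), and it represents the shared "graph" as a set of `(subject, predicate, object)` string tuples guarded by a lock. The function names are hypothetical, and the line-based parser is deliberately naive (no escape handling); a real application would use an RDF library such as rdflib instead.

```python
import threading
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def parse_ntriples(text):
    """Naive N-Triples parser (assumption): one triple per line, no escapes."""
    triples = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith("."):
            line = line[:-1].rstrip()  # drop the statement terminator
        subject, predicate, obj = line.split(None, 2)
        triples.add((subject, predicate, obj))
    return triples


def http_get_ntriples(url, timeout=10):
    """GET a URL, asking for N-Triples (assumes the server supports conneg)."""
    request = urllib.request.Request(url, headers={"Accept": "application/n-triples"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parallel_retrieve(urls, fetch=http_get_ntriples, max_workers=8):
    """Fetch every URL on a pool of workers, merging triples into one shared set."""
    graph = set()             # the shared "RDF graph"
    lock = threading.Lock()   # serialises writes from concurrent workers

    def worker(url):
        triples = parse_ntriples(fetch(url))  # network I/O and parsing run in parallel
        with lock:
            graph.update(triples)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() drains the iterator so any exception raised in a worker surfaces here
        list(pool.map(worker, urls))
    return graph
```

Passing a different `fetch` callable makes it easy to swap in another HTTP client or a cache, which is also how the function can be exercised without network access.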


Most good HTTP client libraries support making HTTP requests in parallel, e.g. PHP's curl_multi functions or Ruby's typhoeus library.


Parallelisation of HTTP requests can greatly reduce overall retrieval time: instead of the sum of all the individual request times, the total can approach the time taken by the single slowest GET request.
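The timing claim can be illustrated with a small simulation. This sketch assumes hypothetical per-resource latencies and uses `time.sleep` in place of real GET requests, so it only demonstrates the arithmetic (sequential time is roughly the sum of the latencies, parallel time roughly the largest one); real numbers will also depend on bandwidth, server load and client connection limits.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-resource latencies in seconds, standing in for real GETs.
LATENCIES = {"/a": 0.3, "/b": 0.5, "/c": 0.2, "/d": 0.4}


def simulated_get(path):
    time.sleep(LATENCIES[path])  # pretend to wait on the network
    return path


def sequential(paths):
    start = time.perf_counter()
    results = [simulated_get(p) for p in paths]  # one request after another
    return results, time.perf_counter() - start


def parallel(paths):
    start = time.perf_counter()
    # One worker per request, so all the simulated waits overlap.
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        results = list(pool.map(simulated_get, paths))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    paths = list(LATENCIES)
    _, t_seq = sequential(paths)
    _, t_par = parallel(paths)
    print(f"sequential: {t_seq:.2f}s (sum of latencies: {sum(LATENCIES.values()):.2f}s)")
    print(f"parallel:   {t_par:.2f}s (slowest request: {max(LATENCIES.values()):.2f}s)")
```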

By combining this approach with Resource Caching of the individual responses, an application can maintain a local cache of the most frequently requested data. The cached responses are then combined and parsed into a single RDF graph that drives the application's behaviour.
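A sketch of how caching and parallel retrieval might fit together, under stated assumptions: the `CachingRetriever` class is hypothetical, its cache is a plain in-memory dictionary keyed by URL (a real implementation of Resource Caching would honour HTTP caching headers and expiry), and responses are assumed to be N-Triples, parsed with the same naive line-based approach as a stand-in for a proper RDF parser. Only cache misses are fetched, in parallel; the cached bodies for all requested URLs are then combined and parsed into one set of triples.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def parse_ntriples(text):
    """Naive N-Triples parser (assumption): one triple per line, no escapes."""
    triples = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith("."):
            line = line[:-1].rstrip()
        subject, predicate, obj = line.split(None, 2)
        triples.add((subject, predicate, obj))
    return triples


class CachingRetriever:
    """Hypothetical helper: caches raw responses per URL, fetches misses in parallel."""

    def __init__(self, fetch, max_workers=8):
        self.fetch = fetch            # callable: url -> response body (str)
        self.max_workers = max_workers
        self.cache = {}               # url -> body; stand-in for a real HTTP cache
        self.lock = threading.Lock()

    def _fetch_and_store(self, url):
        body = self.fetch(url)
        with self.lock:
            self.cache[url] = body

    def graph_for(self, urls):
        # De-duplicate while preserving order, then keep only uncached URLs.
        with self.lock:
            misses = [u for u in dict.fromkeys(urls) if u not in self.cache]
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            list(pool.map(self._fetch_and_store, misses))  # re-raises worker errors
        # Combine the cached responses and parse them into a single graph.
        combined = "\n".join(self.cache[u] for u in urls)
        return parse_ntriples(combined)
```

Concatenating raw responses before parsing only works because N-Triples is line-based, and it is only safe when documents contain no blank nodes, since identical blank node labels from different documents would be conflated. With other serialisations it is safer to parse each response separately and merge the resulting graphs.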

Parallelisation is particularly useful for AJAX-based applications, as browsers are well optimised for issuing many concurrent HTTP requests, although they typically cap the number of simultaneous connections to any single host.