According to Wikipedia (page: Linked Data), Tim Berners-Lee coined the term "linked data" in 2006 in a document about the Semantic Web project. Nonetheless, Wikipedia goes on to cite Bizer, Heath and Berners-Lee's 2009 paper, "Linked Data: The Story So Far", as the source for its opening definition of Linked Data:
"a method of publishing structured data so that it can be interlinked and become more useful through semantic queries. It builds upon standard Web technologies such as HTTP, RDF and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried."
Why is Linked Data relevant to the DPRR project?
First of all, I believe Linked and Open Data principles are particularly relevant to DPRR because DPRR is a prosopography, and, generally speaking, I believe that published prosopography is an almost ideal kind of research to express as linked data. There are two senses in which prosopography connects with linked data's central principles. First, because a prosopography aims to establish the identity of its historical persons in a way that crosses multiple historical sources, these identified historical people act, by their very nature, as a kind of interlinking between those different sources. Second, a prosopography is, at least potentially, a global object: something used by other researchers throughout the world as a source of identities for historical people. The people-as-entities in a prosopography ideally have a global reach and can thus play a part in the Global Graph that web folk, and those working on the Semantic Web and Linked Data in particular, talk about. For these reasons, it seems to me that a prosopography forms the basis for a particularly rich and interesting kind of Linked Data publication.
Furthermore, the Digital Prosopography of the Roman Republic (DPRR), like DDH/CCH's many other prosopographical projects, represents its materials as highly structured data. Indeed, like DDH/CCH’s other structured prosopographies, DPRR is built on top of that quintessential highly structured paradigm, the relational database, and as a result DPRR’s historical research work has already been expressed in terms of entities, attributes and relationships as they are understood in the relational data model. Since the Linked Data model is also based on the idea of representing materials as highly structured data, in its case data that is accessible globally, DPRR’s highly structured database would appear to fit well with it.
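To make this fit between the two models more concrete, here is a minimal sketch of how rows in a relational database might be restated as subject-predicate-object statements of the kind the Linked Data model uses. The table names, column names and the `http://example.org/` namespace are all hypothetical and chosen only for illustration; they are not DPRR's actual schema or URIs.

```python
import sqlite3

# Hypothetical namespace, used only for this illustration.
BASE = "http://example.org/prosopography/"


def build_demo_db():
    """Create a tiny in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE post_assertion (
            id INTEGER PRIMARY KEY,
            person_id INTEGER REFERENCES person(id),
            office TEXT
        );
        INSERT INTO person VALUES (1, 'Example Person');
        INSERT INTO post_assertion VALUES (10, 1, 'consul');
        """
    )
    return conn


def rows_to_triples(conn):
    """Restate each row as (subject, predicate, object) triples.

    Each row becomes a subject URI, each ordinary column becomes a
    statement with a literal value, and each foreign key becomes a
    statement whose object is the URI of the row it points to.
    """
    triples = []
    for pid, name in conn.execute("SELECT id, name FROM person ORDER BY id"):
        triples.append((f"{BASE}person/{pid}", f"{BASE}vocab/name", name))
    query = "SELECT id, person_id, office FROM post_assertion ORDER BY id"
    for aid, pid, office in conn.execute(query):
        subject = f"{BASE}assertion/{aid}"
        triples.append((subject, f"{BASE}vocab/office", office))
        # The foreign key turns into a link between two URIs.
        triples.append((subject, f"{BASE}vocab/person", f"{BASE}person/{pid}"))
    return triples


if __name__ == "__main__":
    for triple in rows_to_triples(build_demo_db()):
        print(triple)
```

The point of the sketch is only that entities, attributes and relationships each have a natural counterpart in the triple form; a real conversion would also need an agreed vocabulary and stable, published URIs.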
Although DPRR is like DDH/CCH's many other prosopographical projects in that it takes a structured-data approach rather than the traditional approach based on narrative articles, it is unlike most of DDH’s others in that it is not structured as a factoid prosopography (Bradley 2017). Part of the reason for its non-factoid approach is that, again unlike DDH/CCH’s other prosopographical projects, which worked directly with primary sources, DPRR has been built primarily upon existing print and digital prosopographies of the late 19th, 20th and 21st centuries, which we call here secondary sources.
What sources are used in DPRR?
Of all the prosopographical work available about the Roman Republic, T. Robert Broughton's study of office-holders (1951-2, supplement 1986) has been a primary resource for historians since its creation. When it was completed in the 1950s, reviewers described it as:
All of the relevant parts of Broughton are fully included in the DPRR data. However, DPRR is not built only upon the work and publications of Broughton. DPRR has also arranged for more recent prosopographical work by Nicolet (on Roman Knights), Rüpke (on Priests), Zmeskal (on Senators) and others to be included. Indeed, more than 30 secondary sources are listed in DPRR’s database. This is a further way in which DPRR acts as one kind of model for a linked data project: it represents and links together this diverse range of originally separate modern scholarly prosopographical publications on the Roman Republic, and does so via the historical people that appear in them.
We have already said that DPRR is built upon the relational database — and called this the “quintessential structured data paradigm”. Here, however, we are talking about a linked data or semantic web representation of DPRR’s materials, and although both linked data/semantic web technologies and relational database technologies are built upon a shared basic conception of highly structured data, they are not the same. What, then, is necessary to turn DPRR’s already existing database-like structured materials into a publication that fits the similar-but-different Linked Data model? In order to think about this most usefully, we need to understand the fundamental principles of Linked Data.
Tim Berners-Lee gave a presentation on linked data at the TED 2009 conference. In it, he restated the linked data principles as three "extremely simple" rules:
More formally, Bizer, Heath and Berners-Lee's 2009 paper, mentioned earlier, specifies four criteria that Berners-Lee had described as a “set of 'rules' for publishing data on the Web” in such a way that all published data becomes part of a single global data space. These four principles are presented succinctly in Wikipedia’s “Linked Data” entry:
Rules one and two of these four seem to be met already by DPRR’s existing browser-oriented web application at http://www.romanrepublic.ac.uk/. Take rule one: the browser-oriented web application provides one publishable URL for each person in DPRR, and it is a RESTful one (for a definition of RESTful URLs, see Wikipedia). This URL could therefore be interpreted as rule 1’s “URI” acting as a name for its person. Furthermore, if the existing web application is presented directly with this person’s URL, it will return an HTML page containing the information DPRR has about that person. Thus, as rule 2 requires, anyone with WWW access can use this HTTP URI to look up that DPRR person. The generated DPRR page also provides, as rule 3 says, “useful information” about the entity it refers to, although, of course, the material is presented as an HTML page in a form suitable for display by a web browser and is not delivered using the semantic web standard of RDF. Finally (rule 4), these generated web pages do in fact contain links to other URIs within DPRR.
So, what is missing from the existing DPRR web application that is needed to make it more fully into a Linked Data application? The key issue can be found in the second half of Wikipedia’s definition of linked data (items 3 and 4). As Bizer, Heath and Berners-Lee say, to operate as Linked Data, the material has to be presented “in a way that can be read automatically by computers.” They then go on to say that this enables data from different sources to be “appropriately connected and queried." With the current “browser oriented” web application the material is presented as an HTML web page suitable for reading by a human user, rather than in the form of formal structured data. Of course, one can apply techniques called “screen scraping” to extract the data from the presented web pages, but screen scraping is broadly understood by its practitioners to be awkward and prone to error. Thus, when presented as a set of HTML web pages, DPRR’s data cannot readily be processed, as data, by computers, and cannot readily be used as a source to be connected, as data, with other sources. This is why Bizer, Heath and Berners-Lee attach an explicit reference to RDF in their rule 3. RDF is described in its own documentation as a representation of a world-wide “graph-based data model” (section 1.1, https://www.w3.org/TR/rdf11-concepts/). RDF is a language specifically designed for interlinking data, potentially on a world-wide scale, in a form that remains available for further computer processing. By presenting the DPRR data in RDF, one can therefore offer DPRR’s research materials as a more satisfactory Linked Data source.
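As a rough illustration of what "connected and queried" can mean once material is expressed as graph data rather than HTML, the following sketch holds statements from two imagined sources as plain triples, joins them simply because they use the same URI for the same person, and writes them out in the line-based N-Triples syntax for RDF. The URIs, the vocabulary terms and the helper names `to_ntriples` and `match` are all assumptions made for this example; none of them come from DPRR, and the literal escaping is deliberately minimal.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """A plain text value; any bare str in a triple is treated as a URI."""
    value: str


# Hypothetical identifiers, for illustration only.
PERSON = "http://example.org/person/1"
NAME = "http://example.org/vocab/name"
MENTIONED_IN = "http://example.org/vocab/mentionedIn"
BOOK = "http://example.org/book/42"

# Two imagined datasets that happen to name the same person with one URI.
source_a = [(PERSON, NAME, Literal("Example Person"))]
source_b = [(PERSON, MENTIONED_IN, BOOK)]

# Because both use the same URI, combining them needs no extra matching step.
merged = source_a + source_b


def to_ntriples(triples):
    """Serialise triples as N-Triples lines (minimal escaping only)."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, Literal):
            escaped = o.value.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


if __name__ == "__main__":
    print(to_ntriples(merged))
    # Everything either source says about the person:
    print(match(merged, s=PERSON))
```

In practice one would use an RDF library and a query language such as SPARQL instead of hand-written helpers like these; the sketch is meant only to show why shared URIs and a uniform statement form make this kind of combination straightforward in a way that scraping HTML pages does not.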
The work described and referenced in these pages does exactly this: it turns DPRR’s relational database, which holds the intellectual work embodied in DPRR, into RDF, and then uses pieces of RDF-related technology to deliver that RDF over the internet to anyone who wants to use it. However, the work done here goes further than this. In addition:
So, the work described here resulted in several products:
The rest of this DPRR RDF site talks more about this work, and has three parts: