Why DPRR as RDF?

According to Wikipedia (page: Linked Data), Tim Berners-Lee coined the term "linked data" in 2006 in a document about the Semantic Web project. Nonetheless, Wikipedia goes on to cite Bizer, Heath and Berners-Lee's 2009 paper, "Linked Data: The Story So Far", as the source for its opening definition of linked data:

"a method of publishing structured data so that it can be interlinked and become more useful through semantic queries. It builds upon standard Web technologies such as HTTP, RDF and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried."

About DPRR

Why is Linked Data relevant to the DPRR project? First of all, I believe Linked and Open Data principles are particularly relevant because DPRR is a prosopography and, generally speaking, published prosopography is an almost ideal kind of research to express as linked data. There are two senses in which prosopography connects with linked data's central principles. First, because a prosopography aims to develop the identity of its historical persons in a way that crosses multiple historical sources, these identified historical people act, by their very nature, as a kind of interlinking between these different sources. Second, a prosopography is, at least potentially, a global object: something used by other researchers throughout the world as a source of identities for historical people. The people-as-entities in a prosopography ideally have a global reach and can thus play a part in the Global Graph that web folk, and those working on the Semantic Web and Linked Data in particular, talk about. For these reasons, it seems to me that a prosopography forms the basis for a particularly rich and interesting kind of Linked Data publication.

Furthermore, the Digital Prosopography of the Roman Republic (DPRR), like DDH/CCH's many other prosopographical projects, represents its materials as highly structured data. Indeed, like DDH/CCH’s other structured prosopographies, DPRR is built on top of that quintessential highly structured paradigm, the relational database. As a result, DPRR’s historical research work has already been expressed in terms of entities, attributes and relationships as they are understood in the relational data model. Since the Linked Data model is also based on the idea of representing materials as highly structured data that is accessible globally, DPRR’s highly structured database would appear to fit well with it.

Although DPRR is like DDH/CCH's many other prosopographical projects in that it takes a structured-data approach rather than the traditional narrative-article approach, it is unlike most of them in that it is not structured as a factoid prosopography (Bradley 2017). Part of the reason for its non-factoid approach is that, unlike DDH/CCH’s other prosopographical projects, which worked directly with primary sources, DPRR has been built primarily upon existing print and digital prosopographies of the late 19th, 20th and 21st centuries: what we call here secondary sources.

What sources are used in DPRR?

Of all the prosopographical work available about the Roman Republic, T. Robert Broughton's study of office-holders (1951-2, supplement 1986) has been a primary resource for historians since its creation. When it was completed in the 1950s, reviewers described it as:

"a valuable substitute for a full prosopography of the Roman Republic" in Bagnani's review in Phoenix, vol. 7, no. 4 (1953), pp. 161-2
"useful and extensive accomplishment in Republican Prosopography" in Adams's review in American Journal of Archaeology, vol. 58, no. 2 (April 1954), p. 176

All of the relevant material in Broughton is fully included in the DPRR data. However, DPRR is not built only upon the work and publications of Broughton. It also includes more recent prosopographical work by Nicolet (on Roman Knights), Rüpke (on Priests), Zmeskal (on Senators) and others. Indeed, more than 30 secondary sources are listed in DPRR’s database. This is a further way in which DPRR acts as one kind of model for a linked data project: it represents and links together this diverse range of originally separate modern scholarly prosopographical publications on the Roman Republic, and does this via the historical people that appear in them.

Why a DPRR RDF Server?

We have already said that DPRR is built upon the relational database — and called this the “quintessential structured data paradigm”. Here, however, we are talking about a linked data or semantic web representation of DPRR’s materials, and although both linked data/semantic web technologies and relational database technologies are built upon a shared basic conception of highly structured data, they are not the same. What, then, is necessary to turn DPRR’s already existing database-like structured materials into a publication that fits the similar-but-different Linked Data model? In order to think about this most usefully, we need to understand the fundamental principles of Linked Data.

Tim Berners-Lee gave a presentation on linked data at the TED 2010 conference. In it, he restated the linked data principles as three "extremely simple" rules:

"All kinds of conceptual things, they have names now that start with HTTP.
I get important information back. I will get back some data in a standard format which is kind of useful data that somebody might like to know about that thing, about that event.
I get back that information it's not just got somebody's height and weight and when they were born, it's got relationships. And when it has relationships, whenever it expresses a relationship then the other thing that it's related to is given one of those names that starts with HTTP."

More formally, Bizer, Heath and Berners-Lee's 2009 paper, mentioned earlier, specifies four criteria that Berners-Lee had described as a “set of 'rules' for publishing data on the Web” in such a way that all published data becomes part of a single global data space. These four principles are presented succinctly in Wikipedia’s “Linked Data” entry:

Use URIs as names for things
Use HTTP URIs so that people can look up those names
When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
Include links to other URIs, so that they can discover more things.
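
The four rules above can be sketched in a few lines of Python. This is only an illustration of the idea, not DPRR's data: the example.org URIs, the vocabulary terms and the helper functions are all hypothetical.

```python
# Hypothetical URIs and vocabulary, for illustration only; these are not
# DPRR's real identifiers.
BASE = "http://example.org/dprr/"

# Each triple is (subject, predicate, object).
# Rules 1 and 2: things and relationships are named with HTTP URIs.
TRIPLES = [
    (BASE + "person/1", BASE + "vocab#hasName", "Example Person"),
    (BASE + "person/1", BASE + "vocab#heldOffice", BASE + "office/consul"),
    (BASE + "office/consul", BASE + "vocab#hasLabel", "consul"),
]


def is_http_uri(value):
    """Return True when the value looks like an HTTP(S) URI rather than a literal."""
    return value.startswith(("http://", "https://"))


def describe(uri):
    """Rule 3: looking up a URI returns the statements made about it."""
    return [(p, o) for s, p, o in TRIPLES if s == uri]


def linked_uris(uri):
    """Rule 4: objects that are themselves URIs lead on to further things."""
    return [o for _, o in describe(uri) if is_http_uri(o)]


for target in linked_uris(BASE + "person/1"):
    print(target, describe(target))
```

Following the URIs returned by linked_uris and describing each in turn is, in miniature, what it means for data to take part in a single global graph.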

Rules one and two of these four seem to be met already by DPRR’s existing browser-oriented web application at http://www.romanrepublic.ac.uk/. Take rule one: the web application provides one publishable URL for each person in DPRR, and it is a RESTful one (see Wikipedia for a definition of RESTful URLs). This URL, then, could be interpreted as rule 1’s “URI” acting as a name for its person. Furthermore, if the existing web application is presented directly with this person’s URL, it will return an HTML page containing the information DPRR has about that person. Thus, as rule 2 requires, anyone with WWW access can use this HTTP URI to look up that DPRR person. The generated DPRR page also provides, as rule 3 says, “useful information” about the entity it refers to, although the material is presented as an HTML page in a form suitable for display by a web browser and is not delivered using the semantic web standard of RDF. Finally (rule 4), these generated web pages do in fact contain links to other URIs within DPRR.

So, what is missing from the existing DPRR web application that is needed to make it more fully into a Linked Data application? The key issue can be found in the second half of Wikipedia’s definition of linked data (items 3 and 4). As Bizer, Heath and Berners-Lee say, to operate as Linked Data, the material has to be presented “in a way that can be read automatically by computers.” They then go on to say that this enables data from different sources to be “appropriately connected and queried." With the current “browser oriented” web application the material is presented as an HTML web page suitable for reading by a human user, rather than as formal structured data. Of course, one can apply techniques called “screen scraping” to extract the data from the presented web pages, but screen scraping is broadly understood by its practitioners to be awkward and prone to error. Thus, when presented as a set of HTML web pages, DPRR’s data cannot readily be processed, as data, by computers, and cannot readily be used as a source to be connected, as data, with other sources. This is why Bizer, Heath and Berners-Lee attach an explicit reference to RDF in their rule 3. RDF is described in its own documentation as a representation of a world-wide “graph-based data model” (section 1.1, https://www.w3.org/TR/rdf11-concepts/). By presenting the DPRR data in RDF, a language specifically designed for interlinking data on a potentially world-wide scale, and thereby making it available for further computer processing, one can present DPRR’s research materials as a more satisfactory Linked Data source.
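
The contrast between screen scraping and reading structured data can be illustrated with a small Python sketch. Both the HTML fragment and the N-Triples line below are invented for the purpose and are not real DPRR output, and the triple pattern is a deliberate simplification that handles only a plain literal without escapes.

```python
import re
from html.parser import HTMLParser

# Both snippets are invented for illustration; neither is real DPRR output.
HTML_PAGE = (
    "<html><body><h1>Person record</h1>"
    "<p>Name: <b>Example Person</b></p></body></html>"
)
NTRIPLE = (
    "<http://example.org/person/1> "
    "<http://example.org/vocab#hasName> "
    '"Example Person" .'
)


class NameScraper(HTMLParser):
    """Screen scraping: relies on the name happening to sit in a <b> element."""

    def __init__(self):
        super().__init__()
        self.in_bold = False
        self.name = None

    def handle_starttag(self, tag, attrs):
        if tag == "b" and self.name is None:
            self.in_bold = True

    def handle_endtag(self, tag):
        if tag == "b":
            self.in_bold = False

    def handle_data(self, data):
        if self.in_bold:
            self.name = data


def scrape_name(html):
    """Return the text of the first <b> element, or None if there is none."""
    scraper = NameScraper()
    scraper.feed(html)
    return scraper.name


# Simplified pattern: <subject> <predicate> "plain literal" .
TRIPLE_PATTERN = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+"([^"]*)"\s*\.$')


def parse_literal_triple(line):
    """Split one simple N-Triples line into (subject, predicate, literal)."""
    match = TRIPLE_PATTERN.match(line.strip())
    if match is None:
        raise ValueError("not a simple literal triple: " + line)
    return match.groups()


print(scrape_name(HTML_PAGE))
print(parse_literal_triple(NTRIPLE))
```

The scraper stops working as soon as the page designer changes the markup (for instance from <b> to <strong>), whereas the triple states explicitly which thing, which property and which value are meant, independently of any page layout.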

Building the RDF

The work described and referenced in these pages does exactly this: it turns DPRR’s relational database, which holds the intellectual work embodied in DPRR, into RDF, and then uses pieces of RDF-related technology to deliver that RDF over the internet to anyone who wants to use it. However, the work done here goes further than just this. In addition:

First, it provides access to RDF data using the query mechanism specifically designed for selecting and processing RDF data: SPARQL (http://www.w3.org/TR/sparql11-query/). By providing a SPARQL endpoint, we are allowing users to select and order data in DPRR in any way that a SPARQL query allows (subject, of course, to length-of-time limitations for processing), and this, in turn, allows for a much broader range of querying than DPRR’s current browser-oriented web application data selection methods support. You can see how this can be exploited in the timeline demonstration provided here.
Second, the work has resulted in the creation of a Semantic Web ontology for DPRR: a formal definition of the structure of DPRR’s data. Digital ontologies are important elements in the whole framework of Semantic Web technologies that go beyond the more modest aims of Linked Data. They provide mechanisms that, because of their formalism, allow the computer on its own to better exploit the link between the structure of DPRR’s data and its meaning in the world. DPRR’s ontology does indeed allow some of the Semantic Web’s ideas related to automatic reasoning to be exploited against this data. However, an ontology for DPRR has a further, significantly different purpose: it makes it easier for human users to understand what kind of information is in the set of RDF data for DPRR, and how it fits together.
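
As a rough sketch of what using a SPARQL endpoint involves, the Python below builds the URL for a query sent by HTTP GET, where the SPARQL 1.1 Protocol passes the query text in a parameter named "query". The endpoint address and the ex: vocabulary are placeholders and not DPRR's actual ones; the real endpoint and ontology terms are described in the server documentation.

```python
from urllib.parse import urlencode

# Hypothetical endpoint address and vocabulary, for illustration only.
ENDPOINT = "http://example.org/rdf/sparql"

QUERY = """
PREFIX ex: <http://example.org/vocab#>
SELECT ?person ?name
WHERE {
  ?person ex:hasName ?name .
}
ORDER BY ?name
LIMIT 10
"""


def build_query_url(endpoint, query):
    """Build a GET request URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

The resulting URL could then be fetched with any HTTP client. The ORDER BY and LIMIT clauses show the kind of selecting and ordering that is left to the person writing the query rather than fixed by a web form.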

So, the work described here resulted in several products:

The DPRR Postgres relational database was translated into a sequence of RDF triples,
The triples were loaded into an RDF repository, and made available to the WWW via a server,
DPRR’s server was extended to support a SPARQL query frontend (a so-called SPARQL endpoint), and
The structure behind RDF’s triples has been defined in a basic Semantic Web OWL ontology.
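
The first of these products, the translation of relational rows into triples, can be pictured with the following minimal Python sketch. It is not the procedure actually used for DPRR: the namespace, the table and column names, and the convention that a column ending in "_id" is a foreign key are all assumptions made for the example.

```python
# Hypothetical namespace, table layout and column names, for illustration only.
BASE = "http://example.org/dprr/"


def escape_literal(text):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_ntriples(table, row, key="id"):
    """Turn one relational row (a dict) into a list of N-Triples lines."""
    subject = f"<{BASE}{table}/{row[key]}>"
    lines = []
    for column, value in row.items():
        if column == key or value is None:
            continue  # the key becomes the subject URI; NULLs yield no triple
        predicate = f"<{BASE}vocab#{column}>"
        if column.endswith("_id"):
            # Assumed convention: a foreign key becomes a link to another URI.
            obj = f"<{BASE}{column[:-3]}/{value}>"
        else:
            obj = f'"{escape_literal(str(value))}"'
        lines.append(f"{subject} {predicate} {obj} .")
    return lines


example_row = {"id": 7, "name": "Example Person", "office_id": 3, "note": None}
for line in row_to_ntriples("person", example_row):
    print(line)
```

Each row's primary key becomes part of the subject URI, each ordinary column becomes a literal-valued property, and each foreign key becomes a link, which is how relationships in the database turn into the interlinking that Linked Data requires.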

The rest of this DPRR RDF site talks more about this work, and has three parts:

Using DPRR’s RDF Server: guidance on how to use the DPRR RDF Linked Data server to get at, and query, DPRR materials (Tab "Using the Server", above).
Building DPRR’s RDF data and server: an overview (Tab "Building the Server") of how the RDF triples, repository/server and ontology were created out of the DPRR database.
DPRR Ontology: an overview (Tab "DPRR Ontology") of DPRR's ontology. In addition, there is a web-site presentation of the ontology that was automatically generated by OWLDoc here.

References

Berners-Lee, Tim (2010). “The year open data went worldwide”. TED Talk. At https://www.ted.com/talks/tim_berners_lee_the_year_open_data_went_worldwide
Bizer, Christian, Tom Heath and Tim Berners-Lee (2009). "Linked Data: the Story So Far". In International Journal on Semantic Web and Information Systems. 5 (3): 1–22. doi:10.4018/jswis.2009081901. ISSN 1552-6283. At http://tomheath.com/papers/bizer-heath-berners-lee-ijswis-linked-data.pdf.
Broughton, T. Robert S. (1951-2, 1986). The Magistrates of the Roman Republic. In series De Lacy, Phillip H. (ed) Philological Monographs. Atlanta: Scholars Press edition.
