The trouble with PSI’s

Published Subject Identifiers face a couple of serious problems, as do all URI-based identifier schemes. A recent post of Lars Marius Garshol reminded me of the – pardon the pun – subject. I was pretty occupied with PSI’s some time ago, maybe now is the moment to write down some of my reservations about PSI’s. PSI’s are URI’s which uniquely identify something, which is – ideally – described on the web page you see when you browse to the URI – read Lars’ post for a real introduction.

First, PSI’s solve the wrong problem. The idea of using some schema to uniquely identify things is hardly novel. Any larger collection of names or id’s for whatever sooner or later faces the problem of telling whether two names point to the same whatever or not. So we’ve got ISBN numbers for books, codes for asteroids and social security numbers for people. They are supposed to designate a single individual or single concept. The problem with all those identifier schemes is not the idea, but the fact that they get polluted over time. Through software glitches and human error and – mostly – intentional fraud a single social security number may point to two or more persons and two or more social security numbers may point to a single person. I can’t speak for non-Dutch realms, but the problem here is very real, and I assume given human nature it is not much different elsewhere. So the real problem is not inventing a identification scheme, the problem is avoiding pollution. This may seem like unfair criticism – no, PSI’s don’t solve famine and war either – but it does set PSI’s in the right light – they are not a panacea for identity problems.

Second, PSI’s are supposed to help identification for both computers – they will compare URI’s, and conclude two things are the same if their URI’s are equivalent – and for humans, through an associated web page. The trouble is what to put on the web page. Let’s make a PSI for me, using my social securitiy number: Now what can we put on the web page? We could say “This page identifies the person with the Dutch social security number 0123.456.789” – but that is hardly additional information. If we elaborate – “This page identifies Marc de Graauw, the son of Joop de Graauw and Mieke Hoendervangers, who was born on the 6th of March 1961 in Tilburg, the Netherlands, the person with the Dutch social security number 0123.456.789” we get into trouble. I could find out for instance I was not actually born in Tilburg, but my that my parents for some reason falsely reported this as my birthplace to the authorities. Now even if this were the case, 0123.456.789 would still be my social security number, and it would identify me, not someone else. But if we look at the page, we have to conclude identifies nobody, since nobody fits all the criteria listed. The same goes for any other fact we could list – I could find out I was born on another day, to other parents et cetera. The only truly reliable information, the one piece we cannot change, is “This page identifies the person who has been given the Dutch social security number 0123.456.789 by the Dutch Tax Authority”, which hardly is no information at all beyond the social security number itself. All we’ve achieved is prepending my social security number with, and this simple addition won’t solve any real-world problem. The problem is highlighted by Lars’ example of a PSI, the date page for my birthday, This page has no information whatsoever which could not be conveyed with a simple standardized date format, such ‘1961-03-06’ in ISO 8601.

Third, in a real-world scenario, establishing identity statements across concepts from diverse ontologies is the problem to solve. Getting everybody to use the same single identifier for a concept is not feasible. Take an example such as Chimezie Ogbuji’s work on a Problem-Oriented Medical Record Ontology, where it says about ‘Person’:

cpr:person = foaf:Person and galen:Person and rim:EntityPerson and dol:rational-agent

In FOAF a person is defined with:”Something is a foaf:Person if it is a person. We don’t nitpic about whethet they’re alive, dead, real or imaginary.”

HL7’s RIM defines Person as:

“A subtype of LivingSubject representing a human being.”

and defines LivingSubject as:
“A subtype of Entity representing an organism or complex animal, alive or not.”

In FOAF persons can be imaginary and ‘not real’, in the RIM they cannot. Now Chimezie wisely uses and which is an intersection in OWL, so his cpr:Person does not include imaginary persons. And for patient records, which are his concern, it doesn’t matter: we don’t treat Donald Duck for bird flu, so for medical care the entire problem is theoretical. But what about PSI’s: could we ever reconcile the concepts behind foaf:Person and rim:EntityPerson? Probably not: there are a lot of contexts where imaginary persons make sense. So if we make two PSI’s, foaf:Person and rim:EntityPerson, our subjects won’t merge, even when – in a certain context such as medical care – they should. Or we could forbid the use of foaf:Person in the medical realm, but this seems to harsh: the FOAF approach to personal information is certainly useful in medical care.

Identity of concepts is context-dependent. The definitions behind the concepts don’t matter much. Trying to find a universal definition for any complex concept such as ‘person’ will only lead to endless semantic war. Usually natural language words will do for a definition (but you do need disambiguation for homonyms). Way more important than trying to establish a single new id system with new definitions, are ways to make sensible context-dependent equivalences between existing id systems.

15 Sep 2008: Comments are closed