specifying the persona #236

stoicflame · 2013-03-14T20:45:31Z

Your comments are invited on the attached changes to the specification which define the notion of a persona and provide a way to indicate data that is identified as a persona.

…cause the information is redundant with the flag.

MystEre84 · 2013-03-14T22:10:23Z

Hi,

do you have an example when a persona would be useful?

Personally I think it would complicate the GEDCOM X model. Isn't the "confidence level" of conclusions already doing something like that? I thought, and I prefer, to have only one person and all of the "information" about it, right or wrong (using the confidence level to make the difference).
The idea of a "persona" means that there may be "right" and "wrong" information, but since we mostly research people we never knew in person, I consider that every piece of information we collect can be wrong; we just choose to treat some of it as "right", with or without proof.

mikkelee · 2013-03-14T22:16:21Z

@stoicflame - would personas be N-tier?

My use case in #232 might benefit from a tiered persona construct. A through D would then be personas, they would be gathered as AB + CD personas, and finally a top-level Person gathering the evidence with associated confidence-levels.

stoicflame · 2013-03-14T22:58:31Z

do you have an example when a persona would be useful?

Now, I'm only an amateur (close to novice) genealogist, but as I do my best to understand the needs of professional researchers, they express the need for more fidelity with the genealogical research process. Part of the formality of this process is a separation between the act of gathering information and compiling evidence. The process of gathering information includes digging through all the potentially relevant sources and recording what they say. After you've gone through the process of gathering information, you compile evidence that supports your conclusions about a person. The process of gathering evidence includes a bunch of analysis of the information you've gathered. You do your best to determine whether a given information item is applicable as evidence, based on (among other things) where it came from and what it said.

So that's a pretty quick-and-dirty overview, but the concept of a persona is intended to separate your "information" from your "evidence".

I thought, and I prefer, to have only one person and all of the "information" about it, right or wrong (using the confidence level to make the difference).

Yes, but you're describing an activity that is pretty strictly in the "evidence" side of the world. I believe the model supports what you describe, but the concept of a persona is intended to support applications that deal with more than just the evidence gathering part of the research process.

would personas be N-tier?

Yes, they are one of the basic "units" of the N-tier architecture.

My use case in #232 might benefit from a tiered persona construct.

Indeed, that's the idea.

A through D would then be personas, they would be gathered as AB + CD personas, and finally a top-level Person gathering the evidence with associated confidence-levels.

Yes, although I would say that the act of identifying A and B as the same person is part of the conclusion-making process and therefore the thing that binds AB would be considered a "person" and not a "persona".

mikkelee · 2013-03-14T23:13:21Z

Yes, although I would say that the act of identifying A and B as the same person is part of the conclusion-making process and therefore the thing that binds AB would be considered a "person" and not a "persona".

Ok, that makes sense. I like this idea so far, I'll think on the consequences some more.

jralls · 2013-03-15T03:41:05Z

Since it hasn't been mentioned yet, this is related to #149, #72, and #138, among several others that were discussed at excruciating length last year.

As for the present proposal, it's seriously incomplete: If you flag a Person as a Persona, how then do you connect to a conclusional Person? Via a SourceReference?

stoicflame · 2013-03-15T15:37:01Z

As for the present proposal, it's seriously incomplete: If you flag a Person as a Persona, how then do you connect to a conclusional Person? Via a SourceReference?

I'm not sure what you mean by "connect to". Do you mean "cite as a source"? If so, then yeah, via source reference. Or do you mean "make a conclusion that two personas are the same person"? If so, then via an identifier of type Evidence. If you're wanting clarification on how to support a full n-tiered implementation as defined by @ttwetmore, then you're correct that it's incomplete: I expect to attach the proposed changes to #149 in the next few days.

jralls · 2013-03-15T16:31:31Z

I meant "make a conclusion...", but I'll wait for the changes to N-tier before commenting further.

…erson...

MystEre84 · 2013-03-15T21:11:57Z

Thank you for the answer. I understand the need for evidence; I just don't see the best way to model it. So with this change it would be possible either to draw conclusions directly from sources or to define information first, is that right?
Why the constraint of a unique source for a persona? Is it because, even if the names are the same, we have to prove that it's the same person?

I think the boolean choice is a good idea, so we can go from information to conclusion, and back, quickly without duplicating information when it is not needed. Did I understand correctly?

Another question: why was there "extractedConclusions" in SourceDescription? Can't we find them through the SourceReference in each conclusion?

thomast73 · 2013-03-15T22:19:41Z

I would like to attempt to restate the use case, requirements, and proposal, then perhaps re-solicit feedback.

But first, I want to attempt to be strict in my use of a couple of words. I like to think of a source as a container of information (e.g. a death record source might include information about birth, death, burial, parents, etc.). When we identify information in a source as helpful to answering a question, the selected information becomes evidence in an answer to that question.

The Use Case

We often create digital representations of our sources -- things like image copies, extracts, abstracts, transcriptions, indexes, etc. Combined with appropriate software, these digital representations become useful in the research process (e.g., finding aids, mechanisms for sharing sources so that our work can be peer reviewed, etc.). One desired representation of the information in a source is a lineage-linked representation of the persons and relationships found there -- a "micro-tree" of sorts. Ideally, this lineage-linked data would be constructed using GEDCOM X entities -- Person, Relationship, etc. These entities are intended to represent the information in the source. Ideally, these entities are not the result of lots of interpretation but remain true to the information in the source.

As a researcher, I find sources and create digital representations of the information in those sources -- including lineage-linked representations. Alongside the representations of this information is the data about what I have concluded -- the conclusions that represent the result of correlating the evidence I've selected. My conclusions are also represented with GEDCOM X entities -- Person, Relationship, etc. -- the same objects used to represent the lineage-linked information from my sources.

Requirement(s)

In exchanging data, it is required that the data representing conclusions be distinct from the data representing information in sources.

It is also desired that the same model entities be used to describe both conclusions and their informational equivalents.

The Current Model

We designate the objects intended to represent information in a source as such by adding references to them to the SourceDescription.extractedConclusions list.

NOTE: Given a list of Person instances, we cannot tell which instances represent information and which represent conclusions without walking the list of SourceDescription instances to discover which appear in extractedConclusions lists.
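As a rough illustration of the lookup this note describes, here is a minimal Python sketch. The class and attribute names (`Person`, `SourceDescription`, `extracted_conclusions`, `information_ids`) are simplified stand-ins chosen for the example, not the actual GEDCOM X API, and references are assumed to be plain ids.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Person:
    id: str


@dataclass
class SourceDescription:
    id: str
    # Ids of the entities that represent information found in this source.
    extracted_conclusions: List[str] = field(default_factory=list)


def information_ids(sources: List[SourceDescription]) -> Set[str]:
    """Walk every source description and collect the ids it lists as extracted."""
    ids: Set[str] = set()
    for source in sources:
        ids.update(source.extracted_conclusions)
    return ids


persons = [Person("p1"), Person("p2"), Person("p3")]
sources = [SourceDescription("s1", ["p1"]), SourceDescription("s2", ["p3"])]

# A Person alone cannot tell us what it is; the sources must be consulted first.
extracted = information_ids(sources)
personas = [p.id for p in persons if p.id in extracted]
conclusions = [p.id for p in persons if p.id not in extracted]
```

The point of the sketch is only that classifying a Person requires a pass over all SourceDescription instances.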

The Proposal

@stoicflame has proposed that we remove the extractedConclusions list from SourceDescription and add something to Person to mark that instance of Person as being a persona (an instance of Person being used to represent information in a single source). The absence of the marker says that the instance is a conclusion.

NOTE: Given a list of Person instances, it is easy to pick out those intended to represent information. To discover other information entities, one must explore the Person instances associated with those entities.
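A minimal sketch of the proposed marker, in Python. The field name `persona` is a placeholder for whatever the marker ends up being called, and `Person` here is a simplified stand-in rather than the real GEDCOM X type.

```python
from dataclasses import dataclass


@dataclass
class Person:
    id: str
    # Placeholder marker: True means "represents information in a single source".
    # Absence of the marker (False) means the instance is a conclusion.
    persona: bool = False


persons = [Person("p1", persona=True), Person("p2"), Person("p3", persona=True)]

# Each Person can be classified by looking at the Person alone.
personas = [p.id for p in persons if p.persona]
conclusions = [p.id for p in persons if not p.persona]
```

Under this assumption no other object needs to be consulted to classify a Person, but entities without the marker (relationships, events, and so on) can only be classified through the persons they are attached to.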

Comments

I understand the value of representing both information and conclusions using the same types of entities. I also believe that it will be important to distinguish the conclusions from the information.

I do not think that either mechanism -- the current or the proposed -- makes distinguishing conclusions and information particularly easy.

Some feel that pushing the marker to the Conclusion object would be overkill (e.g., do I need to mark a Name as being information?). There have also been objections to sub-classing entities (e.g. InformationPerson extends Person).

We could mark the top-level entities (e.g., Person, Relationship, Event, Document)?

What are some other ways we might look at these issues?

@MystEre84: Why the constraint of a unique source for a persona?

The reason for this constraint is that the personas are intended to represent the information in a single source.

ttwetmore · 2013-03-15T22:32:34Z

I hope you will indulge me as I have managed to be silent on this topic, which is of great interest to me, for many months! So here are a few sentences on my personal views about personas and related concepts.

A persona is a record in a database. Its fields contain information extracted from evidence found in a source. A persona has only one source because it holds information extracted from a single item of evidence.

When a researcher decides that two personas represent the same person a new person record is created that links to the two persona records. The two persona records are permanent records. They are never destroyed. They are not merged into the body of the new person record. The new person record does not need a source because the persona records already hold complete source information. The new person record does not need any fields at all, really, since it inherits things like name, gender, birth date from the personas. If there are conflicts in the data in the two personas then the preferred or chosen or even modified values can be added to the person records, which then take precedence over the values in the personas.
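The inheritance-with-override behaviour described above could be sketched as follows. This is an illustration only: the names `Persona`, `Person`, `overrides`, `values` and `resolve` are invented for the example, and treating a conflict with no recorded preference as "unresolved" (`None`) is an assumption, not something stated in the thread.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Persona:
    source: str               # exactly one source per persona
    fields: Dict[str, str]    # what that source says, never modified


@dataclass
class Person:
    personas: List[Persona]   # links to permanent persona records
    overrides: Dict[str, str] = field(default_factory=dict)

    def values(self, name: str) -> List[str]:
        """All distinct values the linked personas record for a field."""
        seen: List[str] = []
        for persona in self.personas:
            value = persona.fields.get(name)
            if value is not None and value not in seen:
                seen.append(value)
        return seen

    def resolve(self, name: str) -> Optional[str]:
        """Override if set, inherited value if unambiguous, otherwise None."""
        if name in self.overrides:
            return self.overrides[name]
        found = self.values(name)
        return found[0] if len(found) == 1 else None


a = Persona("1850 census", {"name": "Jon Smith", "gender": "M"})
b = Persona("death record", {"name": "John Smith", "gender": "M"})
person = Person([a, b])

print(person.resolve("gender"))   # inherited: both personas agree
print(person.resolve("name"))     # conflict, no preference recorded yet
person.overrides["name"] = "John Smith"
print(person.resolve("name"))     # the researcher's chosen value takes precedence
```

The persona records stay untouched throughout; only the person record carries the chosen value.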

The person record obviously represents a conclusion. In a sense that conclusion is the "source" of the person. Whereas personas need source references, a person should have a conclusion. When you set an attribute of the person record, say the person's name, which may be different in the different personas, you are making the conclusion that this is the better name for the person.

What is described here is a two tier, binary system. There is no need to be binary. A person record may contain links to many persona records. This is obviously necessary as new evidence is found, and that evidence is codified into more persona records.

And there is no need for the system to be limited to two tiers. Say you decide that two of your person records (each referring to multiple persona records) represent the same real person. Two obvious approaches exist. First, all the personas from the two persons could be grouped together into one person record, replacing the two persons. Or a new person record could be created that refers to the two person records, adding a tier; the two person records are not modified in any way.

There are advantages to both approaches. In the former things remain two tier. The persona level is always a codification of evidence, and the person level is always the codification of conclusions and decision making. And two tiers are simple and make good sense.

In the latter case, the history of decision making is maintained. Each interior node in an n-tier tree keeps its own conclusion, so you end up with a "conclusion tree" that clearly shows how you made your decisions about who was who. Another advantage of the n-tier approach is its reversibility. You can undo decisions easily.
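To make the two approaches concrete, here is a small Python sketch. All names (`Persona`, `Person`, `conclusion`, `children`, `leaves`, `depth`) and the sample data are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Persona:
    id: str
    source: str


@dataclass
class Person:
    id: str
    conclusion: str  # why the children were judged to be the same person
    children: List[Union["Person", Persona]] = field(default_factory=list)


def leaves(node) -> List[str]:
    """Ids of all personas reachable from a node."""
    if isinstance(node, Persona):
        return [node.id]
    return [leaf for child in node.children for leaf in leaves(child)]


def depth(node) -> int:
    """Number of tiers at and below a node (a persona counts as one tier)."""
    if isinstance(node, Persona):
        return 1
    return 1 + max((depth(child) for child in node.children), default=0)


p1, p2, p3, p4 = (Persona(f"persona{i}", f"source{i}") for i in range(1, 5))
a = Person("A", "same name and birth year", [p1, p2])
b = Person("B", "same spouse", [p3, p4])

# Approach 1: regroup every persona under one person; A and B are replaced.
flat = Person("C", "A and B are the same man", a.children + b.children)

# Approach 2: add a tier; A and B survive unchanged with their own conclusions.
tiered = Person("C", "A and B are the same man", [a, b])

# Undoing the tiered decision only means discarding C; A and B are intact.
restored = tiered.children
```

Both structures reach the same four personas; they differ in whether the intermediate conclusions A and B are still present afterwards.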

All this depends on the idea that we decide we want to codify our evidence into persistent database records. I don't put a value judgement on that. I want to be able to do it, because it is how I do my own research and it models how I view the research process. But others do just as well by only keeping the conclusion person records around, adding information from new sources directly to those conclusion records, which grow larger and larger as new evidence is found.

ttwetmore · 2013-03-15T22:47:40Z

In an n-tier system, which is the system I prefer, there is no need, in my opinion, for a tag to specify whether a person record is a persona record or a conclusion record. If a record has tiers below it, it must be a conclusion. If a record is a leaf in a tree (or a stand alone record) we WANT it to be a persona, but there is no way to require it to be. A persona record could be defined operationally as any person record with a source reference.
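The operational definition above can be written down as a small rule. In this Python sketch the record shape (`PersonRecord`, `sources`, `children`) is invented for illustration, and the "undetermined" result for a leaf without any source reference is an assumption added to make the function total.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PersonRecord:
    id: str
    sources: List[str] = field(default_factory=list)              # source references
    children: List["PersonRecord"] = field(default_factory=list)  # tiers below


def classify(record: PersonRecord) -> str:
    """Infer the role of a record from its structure instead of from a tag."""
    if record.children:
        return "conclusion"    # has tiers below it
    if record.sources:
        return "persona"       # leaf or stand-alone record with a source reference
    return "undetermined"      # assumption: the structure alone cannot decide


leaf = PersonRecord("p1", sources=["1850 census"])
root = PersonRecord("A", children=[leaf])
bare = PersonRecord("p2")

roles = {r.id: classify(r) for r in (leaf, root, bare)}
```

Note that under this rule any stand-alone record that cites a source is read as a persona, whatever its author intended.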

But if you want a tag there's not too much to complain about from my point of view. I always prefer simplicity in a model, knowing that things always complexify enough!

stoicflame · 2013-03-15T23:01:50Z

Thanks, @thomast73. I'm so glad you're there to help fill in my many gaps.

And, thanks @ttwetmore for taking the time to expound on the n-tier model and particularly to compare it to the two-tier model. I'd like to say again that we intend GEDCOM X to be able to support an n-tier model to accommodate applications that implement such a model. I hope to be able to initiate a proposal at #149 within the next few days.

there is no need, in my opinion, for a tag to specify whether a person record is a persona record or a conclusion record.

So what about person records that are "stand alone"? How would you be able to tell whether applications should treat such a record as an "information item" (i.e. the record shouldn't have more than one source and shouldn't be modified in such a way so as to conflict with what that source says) or a "conclusion record"?

mikkelee · 2013-03-16T05:33:10Z

First off, I love the amount of detail that just showed up here tonight. To quote Hemingway, this is fascinating as obscenity. My comments:

@thomast73 I would like to attempt to restate the use case

Thanks, that helped a lot on forming my thoughts, diffuse as they may appear.

@ttwetmore When a researcher decides that two personas represent the same person a new person record is created that links to the two persona records. The two persona records are permanent records. They are never destroyed. They are not merged into the body of the new person record. The new person record does not need a source because the persona records already hold complete source information.

I very much agree that a "person" composed of "personas" should show all data from all contained "personas". Say I have five different personas that I consider the same person. It is then up to me and my software to sort, e.g., their birth year data based on confidence and conflicts or lack thereof (say all except one say born in 1743, and the last one says born in 1753). The method would be something like: sort events of type T by confidence, then average the numeric values or display the values color-coded by strength of evidence.
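One possible reading of that method, sketched in Python. The numeric confidence scale, the sample values, and the choice to sum the confidence behind each distinct value are all assumptions made for the example; nothing here is defined by GEDCOM X.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (birth year, confidence) as recorded by five personas of the same person.
# The 0..1 confidence scale is hypothetical.
claims: List[Tuple[int, float]] = [
    (1743, 0.9), (1743, 0.6), (1753, 0.4), (1743, 0.8), (1743, 0.7),
]


def rank_values(claims: List[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Sum the confidence behind each distinct value and sort strongest first."""
    support: Dict[int, float] = defaultdict(float)
    for value, confidence in claims:
        support[value] += confidence
    return sorted(support.items(), key=lambda item: item[1], reverse=True)


ranked = rank_values(claims)
best_year = ranked[0][0]   # value to show first; the others stay available
```

Every value is kept in `ranked`, so the software can still display the minority claim, for example dimmed or color-coded by its support.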

Obviously that is too complex to encode in a standard, that's just me talking idealism, but the idea being as @ttwetmore says: All data is always there, and it is up to the user & software to decide what is right and what is wrong - to show to the end user at first glance. Generally data will somewhat agree, and if they do not match up at all, the user is probably at fault, putting two obviously conflicting "personas" into the same "person" without any confidence check.

@stoicflame So what about person records that are "stand alone"?

I suppose those could be classified by being 1 ref from an "actual" source - ie, any person deriving itself from conclusions on other persons is not a persona. Any person derived from a piece of paper or a picture is a persona.

ttwetmore · 2013-03-16T11:27:44Z

@stoicflame So what about person records that are "stand alone"? How would you be able to tell whether applications should treat such a record as an "information item" (i.e. the record shouldn't have more than one source and shouldn't be modified in such a way so as to conflict with what that source says) or a "conclusion record"?

If a person record has a source link it is a persona. If a stand-alone person record does not have a source reference it is either a "lazy" persona (user didn't bother to add source info) or it is an "old-fashioned" conclusion record (as in today's systems, in which case one hopes that at least some of the individual attributes/fields/properties within the record will have source references).

If we do end up with applications that support the research process by using evidence based records (e.g., personas) and conclusion based records (e.g., "today's" person records), then at certain times the user interface will concentrate on personas (just the facts, ma'am), and sometimes on conclusions (showing the user the "roots of the person trees", with options to "dig deeper" into the facts). Where does a stand-alone person record fit into this UI scheme? I think the user might want to see these in both contexts. Certainly if the record has a source reference it is a fact and should be shown with them. Certainly if it doesn't have a source reference it should be shown with the conclusions. If I were writing such software I would also have a mode where I could see all and only the stand-alone records.

The issue is probably whether to allow a stand-alone record to not have a source reference. One could imagine a very strict, research based application, that simply insists that all stand-alone records must be personas, and be done with it -- that's just the way it is. I would want a more flexible system that would allow stand-alones that could be "imported conclusion records." But wanting such flexibility might be another example of my desire to resist restrictions, even in places where they are the best way.

But this is probably all moot. I have no real objection to the tag at all, other than my natural contrariness toward any kind of rule or restriction before it is fully considered.

stoicflame · 2013-03-16T15:06:39Z

If a person record has a source link it is a persona.

So that doesn't make sense to me. Why shouldn't we accommodate the notion of a "conclusion" person that cites sources? Most applications today (which are decidedly not n-tier) don't even provide much of a UX that allows a user to gather "information" in the form of a persona, instead just allowing users to put all their conclusions together and cite the sources they used.

So maybe what you're saying is that a media type that enforced the n-tier architecture wouldn't need to have the persona flag, and I guess I can see that.

Just to be clear, though: when I say "we intend GEDCOM X to be able to support an n-tier model," I'm not saying that we intend GEDCOM X to enforce an n-tier model. GEDCOM X needs to (also) accommodate implementations that allow "conclusion" persons to cite sources. Hence the need for a persona flag to distinguish.

ttwetmore · 2013-03-16T17:35:59Z

Ryan,

Yes, I think the whole thing comes down to the "model" that an application supports.

Today a person record (in a typical desktop or on-line system) is a conclusion record and it contains possibly many PFACTs (properties, facts, attributes, characteristics, traits), with the pfacts extracted from multiple sources, and each pfact "should" have a source reference to indicate where it was found. I believe this is the base model we are all comfortable with. Therefore it is a model that GEDCOM-X should support.

When I said a conclusion record doesn't need source references, I was referring to conclusion records as they might exist in a two-tier or n-tier system, in which the conclusion records, instead of containing pfacts, contain references to persona records that have the pfacts and the source references. In such a research based system the conclusion records get their source references indirectly through their personas. This is a model that I hope GEDCOM-X will be able to support, and I am happy to see that you are supporting the idea.

I agree with you that GEDCOM-X, if it is to be a generic model, must not restrict things to any particular genealogical software model. So a person record should be able to have a source reference that applies to all the pfacts in the record inclusively, and/or each pfact should be able to have its own source reference. This generic approach is the one I use in the DeadEnds model.

I guess one of the reasons that I don't like the tag idea (which really is a fine idea, I just don't like it), is that it kind of admits to the world exactly what kind of a model the software is choosing to use. I think I'm just being dumb about this as that's not such a big deal. Maybe it is important, given that GEDCOM-X will be able to support different genealogical models, that there be tags to indicate the fact.

zappala · 2013-03-16T22:05:05Z

@ttwetmore

Say you decide that two of your person records (each referring to multiple persona records) represent the same real person. Two obvious approaches exist. First, all the personas from the two persons could be grouped together into one person record, replacing the two persons. Or a new person record could be created that refers to the two person records, adding a tier; the two person records are not modified in any way.

And then what happens if you decide that the original two persons were wrong altogether?

Consider Persona 1 and 2 referenced by Person A, and Persona 3 and 4 referenced by Person B. These are merged into Person C, referencing Person A and B. An astute researcher realizes the proper grouping is actually Person D referencing Persona 1 and 3, and Person E referencing Persona 2 and 4. (You better believe this will happen, especially in shared trees with novice researchers.)

It seems like the only logical conclusion is to delete Persons A, B and C and instead provide Persons D and E. How do you retain any of the advantages of the N-tier model when it reduces back to 2 tiers the minute a genealogist makes a mistake and the tree of Persons must be re-arranged by a more careful researcher?

------ EDIT -------

I see from another thread that perhaps the right thing to do here is to modify Person A and Person B pointing to ALL the personas, including the wrong ones, and write a proof statement for each person that includes the conflicting evidence. If they end up being the same person, you can merge them into person C, and if that is the wrong thing to do you break that merge and retain the original two persons, with revised proof statements.

ttwetmore · 2013-03-17T00:43:54Z

@zappala, I agree with your analysis (it would be hard to disagree!). But I don't believe it applies to the 2-tier vs n-tier issue. N-tier only makes sense (to me) as a way to record the history of sequential decision making. As your example points out, when you have to undo earlier decisions and make fundamental changes to the structure of the records, you lose that history, in the sense that it is no longer present in the structure of the records. If it is important to you to record every decision, the ones that break down older decisions, as well as the ones that build up new decisions, you will have to find another way to do it. Notes in the person records come to mind.

I don't wish to sound preachy about the n-tier approach. If it were available I think I would use it. I have used the same structure in a real-world, non-genealogical application that had to automatically join billions of persona records into 100s of thousands of person records (the algorithms used by the Zoominfo Company). Because of the sheer mass of data, and the need to solve the O(n*n) problems of comparing billions of records to billions of records, a multi-phase approach based on n-tiers (one tier per phase) was the only practical solution I could come up with. This is certainly not an argument that a similar structure is needed in a genealogical application that deals with orders of magnitude fewer records. There are enough analogies however to make it intriguing to think about.

A two-tier approach will solve the problems of codifying research excellently. The real issue boils down to whether or not we decide that our genealogical databases should hold our evidence in some explicit record-based format, or whether we only wish to copy items of evidence directly from the source material to our conclusion records with no intermediary (e.g., persona) stage.

stoicflame · 2013-03-18T16:54:54Z

I guess one of the reasons that I don't like the tag idea (which really is a fine idea, I just don't like it), is that it kind of admits to the world exactly what kind of a model the software is choosing to use.

Believe it or not, I can relate to that. I had the same concern (but didn't articulate it the same way) and pushed us to where we are today as @thomast73 articulated. But it turned out to be confusing and hard to explain, and it added potential for data integrity violations. Hence the proposal here to just use a flag because it's clearer.

stoicflame · 2013-03-19T21:21:27Z

FYI, I've attached the initial draft of the proposal to support the N-Tier Evidence Architecture to issue #149. The proposal is dependent on this issue. The pith of it is at 0161dd8. Your comments are welcome.

jralls · 2013-03-19T22:29:58Z

Note that the changes in #149 include the changes here, so it's easier to review there.

thomast73 · 2013-03-28T16:43:14Z

I continue to worry that the only marker being considered is in Person. Persons are not the only information we need to represent from a source. It is true that all of the names, gender and facts associated with a "persona" can be reasonably construed to be information in the source associated with the "persona". But not every piece of information associated with a source can be directly associated with a Person.

In some cases, we use information from a source to make a case, and the information is not directly associated with any Person being discussed in the case.

To repeat myself...

We could mark the top-level entities (e.g., Person, Relationship, Event, Document)?

I would add to this list PlaceDescription.

For Relationship, which should always be associated with two Persons, I can see an argument that it doesn't need a flag to indicate that it represents information. If the Relationship was associated with two "personas", it could be reasonable to assume that it is an information entity and not a conclusion entity. But I wonder if there would still be some convenience in having it marked explicitly?

But what about an Event to which we are not associating any persons? What about representing research about a place via a PlaceDescription? Or what about a Document that contains a transcription, extract, abstract or translation? All of these could be attempts to represent information in a single source. All of these may need to be included in a data exchange and have no connection to any "persona". Shouldn't we be able to mark these as representing information in a single source? I feel like there would be a definite gap in the model if we could not designate these objects as information entities.

stoicflame · 2013-03-28T18:29:55Z

So if we're going to add a flag to more than just Person, then we need to come up with a name for the concept that can be applied to Person, Relationship, Event, Document, PlaceDescription, etc. Does anyone have any good ideas for a name of the flag that could be applied to more than just Person?

information ? Not specific enough.
informationItem ? Still not quite specific enough.
scopedToSource ? Ick.
singleSource ? Ick.
assertion ? Probably too generic.

Here's an idea: instead of distinguishing what resources are "information items", we could distinguish which resources are "working conclusions". Then the flag would be the inverse and might be more easily named...

conclusion ? Maybe...

Or what about:

analysis ? That might work. This person, event, place, document, etc. represents an analysis of information.

What do you think?

jralls · 2013-03-28T20:30:30Z

Let's step back a bit and think about what we're trying to model.

What Thad is pointing out is that a proof argument may need to take into account a wide variety of evidence, some of which may not directly mention the historical person under discussion and therefore doesn't generate a Persona instance but nevertheless bears on determining the fact or perhaps just in writing a good biographical sketch.

The BCG crowd advocates a tree of prose analyses culminating in a proof argument that is attached to one or more events and facts associated with a person. For the most part they also advocate doing that work outside of the genealogy database program, because none of those programs provide any support for recording the analysis. Is that where you want to go?

zappala · 2013-03-28T21:54:33Z

How about a flag called evidence?

jralls · 2013-03-29T22:31:55Z

How about using the extractedConclusion list on the SourceDescription like we already decided in #202? We've been around this block before.

stoicflame · 2013-03-30T03:30:49Z

How about using the extractedConclusion list on the SourceDescription

That is, indeed, an option. But I'd (personally) vote against it because I've had to explain it to too many people who get confused about it. After going through the rounds of explanation until they finally get it, their question is usually: why not just provide a flag?

jralls · 2013-03-30T03:43:49Z

OK, but on Conclusion and perhaps called "abstracted"? Or are you still stuck on it applying only to Persons? Can we lose the extractedConclusion list on SourceDescription in exchange?

stoicflame · 2013-04-01T14:59:35Z

OK, but on Conclusion and perhaps called "abstracted"?

Huh. Yeah. I guess I kind of like that. Anybody else?

So @jralls I like your suggestion, but I still can't tell if you would prefer to just not have a flag. What's your preference?

jralls · 2013-04-01T15:32:44Z

What's your preference?

KISS. I like a flag that says "this conclusion is a verbatim abstract of the cited source" a lot better than a list on the source of "conclusions which are verbatim abstracts of this object" because the former is better data encapsulation: One shouldn't have to look at another object to get a complete description of the object at hand.

But what is the motivation for such a tag? Does anyone besides Tom contemplate writing a program that makes use of the difference? Does it really add anything to an n-tier program? Tom doesn't think so:

In an n-tier system, which is the system I prefer, there is no need, in my opinion, for a tag to specify whether a person record is a persona record or a conclusion record. If a record has tiers below it, it must be a conclusion. If a record is a leaf in a tree (or a stand alone record) we WANT it to be a persona, but there is no way to require it to be. A persona record could be defined operationally as any person record with a source reference.

thomast73 · 2013-04-01T17:45:35Z

The words "extract" and "abstract" already have meaning in the genealogical community. We have already received push-back on associating these names with the concept of a "verbatim" representation of information in a source. We tried to get around that concern by combining two words to form "extractedConclusions".

Perhaps the flag could be "extractedConclusion"?

Here are a few more attempts at a name...none of which I am truly happy with...just hoping to spur ideas:

trueToSource
verbatimFromSource
infoFromSource
informationItem

jralls · 2013-04-01T19:02:59Z

"extractedConclusion" is less ugly than the others.

My Mac's thesaurus produces the following synonyms for "abstract":
summary, synopsis, précis, résumé, outline, abridgment, digest, summation; wrap-up.
And for "extract":
excerpt, passage, citation, quotation; (excerpts) analects.

I like 'digest'.

stoicflame · 2013-04-02T14:57:36Z

I like the word "extracted", but to put a property named "extractedConclusion" on a data type called "Conclusion" seems redundant. I'd prefer just "extracted" so that the accessor would look something like "conclusion.extracted".
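
To make the two designs under discussion concrete, here is a minimal sketch, assuming simplified stand-ins for the spec's types. The class and field names below (`SourceDescription.extracted_conclusions`, `Conclusion.extracted`, `source_id`) are illustrative and are not the actual GEDCOM X definitions; only the idea of a list on the source versus a flag on the conclusion comes from the thread.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SourceDescription:
    id: str
    # Earlier design: the source carries the list of conclusions extracted from it.
    extracted_conclusions: List[str] = field(default_factory=list)


@dataclass
class Conclusion:
    id: str
    source_id: str
    # Proposed design: the conclusion itself says whether it is extracted.
    extracted: bool = False


def extracted_via_list(c: Conclusion, sources: Dict[str, SourceDescription]) -> bool:
    """List-on-source design: answering requires looking up another object."""
    return c.id in sources[c.source_id].extracted_conclusions


def extracted_via_flag(c: Conclusion) -> bool:
    """Flag design: the conclusion is self-describing."""
    return c.extracted
```

The flag version needs nothing but the conclusion at hand, which is the encapsulation argument made earlier in the thread, and the accessor reads as `conclusion.extracted`.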

jralls · 2013-04-03T03:46:48Z

Doesn't 'extracted' elicit the same objections as 'extract' and 'abstract'?

thomast73 · 2013-04-08T21:35:47Z

I like 'digest'.

I like some of the dictionary definitions I saw for this word, and thought they were applicable. But without reading those definitions, I did not see an immediate connection. So I worry about adopting this name.

Doesn't 'extracted' elicit the same objections as 'extract' and 'abstract'?

So...I went back and looked for the original objection...and found it here? If this was it, it was not raised exactly like I remember it, so I apologize.

And despite the objection, the "extractedConclusions" name was eventually adopted.

@stoicflame argues that the "conclusion" part of the name was more meaningful when its context was the SourceDescription class, and that the "conclusion" part of the name is redundant when the context will be the Conclusion class and therefore ought to be eliminated.

At this point, I guess I would also lean toward using the "extracted" name.

jralls · 2013-04-08T23:15:01Z

So...I went back and looked for the original objection...and found it here? If this was it, it was not raised exactly like I remember it, so I apologize.

With "here" being #202, we are indeed revisiting last year's work. That is rather different from the usual meaning of "pushback" in this forum, where you usually mean the anonymous group of "outsiders" who occasionally veto a consensus arrived at after (sometimes weeks of) discussion in an issue.

And despite the objection, the "extractedConclusions" name was eventually adopted.

Well, Ryan exercised his executive privilege and committed a change with that in it; it wasn't because of anything resembling consensus. In any case the objection was something of an aside in a long and rather circuitous discussion.

With that cleared up, "Extracted" is fine with me.

But to repeat an earlier question, is this replacing the extractedConclusion list on SourceDescriptions?

thomast73 · 2013-04-08T23:30:47Z

But to repeat an earlier question, is this replacing the extractedConclusion list on SourceDescriptions?

Yes. That is the current plan.

thomast73 · 2013-04-08T23:39:18Z

I have tried to update the specification to reflect the current state of the proposal.

jralls · 2013-04-09T19:00:18Z

Looks good, but there are still a couple of references to "persona" that should be redone.

stoicflame · 2013-04-09T21:12:45Z

Looks good, but there are still a couple of references to "persona" that should be redone.

See d28209d, which formally defines the "persona" concept using the "extracted information" concept. (I think there is still value in formalizing the notion of "persona", even if it's just for convenience.)

jralls · 2013-04-09T21:29:31Z

OK. I don't see why we need a special term for it, but it's harmless.

"Encloses" sounds odd. What was wrong with "contains"? While I'm nit-picking wording, how about "Extracted Conclusion Constraints" instead of "Extracted Information Constraints"? The constraints are on Conclusions, and "information" isn't a defined term in the spec.

stoicflame · 2013-04-10T19:24:11Z

"Encloses" sounds odd. What was wrong with "contains"?

ac170a6

how about "Extracted Conclusion Constraints" instead of "Extracted Information Constraints"? The constraints are on Conclusions, and "information" isn't a defined term in the spec.

+1

I'd like to have @thomast73 comment on that, though.

thomast73 · 2013-04-14T01:54:48Z

One of the purposes for adding this flag to the GEDCOM X model is to add a provision in the model for what is usually termed "information" in the Genealogical Proof Standard (GPS) literature—see Elizabeth Shown Mills, Evidence Explained: Citing History Sources from Artifacts to Cyberspace, 2d Ed. (Baltimore, Maryland: Genealogical Publishing Company, 2009), 24.

Given this, and after trying to get the constraints written, etc., I would actually like to modify the proposal so that the flag is called extractedInformation, tying it to the equivalent concept in GPS. I would like the "Extracted Information Constraints" section to remain titled as it is. I would also like that section to be tied to its related concept in the GPS literature, which probably means that its text needs to be updated.

jralls · 2013-04-14T05:19:52Z

The GPS itself is a bit loose about information, data, and evidence, using all three in the same bullet-item without any distinction about which means what.
Ms. Mills on p. 24 defines information as "referring to the content of the source" [italics hers], then goes on to explain the difference between primary and secondary (i.e. eyewitness vs. hearsay) information. The point she's trying to make here is that sources themselves aren't primary or secondary (an incorrect distinction which is unfortunately common in genealogy texts), but rather that they can contain both primary and secondary content. The classic illustration is a death certificate, where the informant likely has direct knowledge of the death information but in most cases is unlikely to have been present at the deceased's birth.

She goes on to define evidence as "our interpretation of information we consider relevant to the research question or problem" [again, italics hers], and to explain direct, indirect, and negative evidence. Two pages later, she presents the "five essential parts" of a proof argument, where the third is "presentation of evidence, supported by thorough source citations and analyses" and the fourth "explicit discussion of any conflicting evidence".

ISTM you want to use 'evidence' here (as in extractedEvidence instead of extractedInformation), as the very act of encoding the information into a database is necessarily interpretive.

stoicflame · 2013-04-14T14:15:45Z

Well said, John.

I think what Thad's trying to express is that he'd like to use the term "information" so that those who use the vocabulary as you've explained it can more easily identify how those concepts are supported in GEDCOM X. I think Thad would like to identify the "information" and then refer to that information as "evidence" from the (working) conclusion.

So, the piece that's not on the table yet is a new (forthcoming) proposal to introduce a new concept called something like "evidence reference" which is used to refer, for example, to personas from persons instead of using the identifiers as is currently established. Then, you can put on that evidence reference things like direct/indirect etc.

For my comments, I could get behind the name extractedInformation although it's not a big deal to me because, as you point out, "the very act of encoding the information into a database is necessarily interpretive."

jralls · 2013-04-14T18:02:43Z

So, the piece that's not on the table yet is a new (forthcoming) proposal to introduce a new concept called something like "evidence reference" which is used to refer, for example, to personas from persons instead of using the identifiers as is currently established.

It's a better name, but it's not really a new concept, is it?

For my comments, I could get behind the name extractedInformation although it's not a big deal to me because, as you point out, "the very act of encoding the information into a database is necessarily interpretive."

That's the point, actually, and is why using "information" isn't appropriate. Information is abstract; as soon as you make concrete bits of it, it becomes evidence.

There's another viewpoint buried in here, though, and that's that we're writing a spec for programmers, not genealogists, and the name of the object being modified is "conclusion", not "evidence" or "information". In OO speak, Evidence subclasses Conclusion, and to my mind that's conceptually clearer and therefore easier to explain than applying constraints to Conclusion regardless of how you label them.
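
The contrast drawn here between a subclass and externally applied constraints can be sketched as follows. Everything in this block is hypothetical: the thread does not spell out the actual constraints, so "evidence must cite a source" is an assumed stand-in chosen only to show where the rule lives in each design.

```python
from typing import Optional


class Conclusion:
    """Hypothetical base type: a claim with an optional source citation."""

    def __init__(self, value: str, source_ref: Optional[str] = None) -> None:
        self.value = value
        self.source_ref = source_ref


class Evidence(Conclusion):
    """Subclass variant: the type itself carries the extra constraint."""

    def __init__(self, value: str, source_ref: str) -> None:
        # Assumed constraint for illustration: evidence must cite a source.
        if not source_ref:
            raise ValueError("Evidence must cite a source")
        super().__init__(value, source_ref)


def check_flagged(conclusion: Conclusion, extracted: bool) -> bool:
    """Flag variant: the same assumed constraint, checked from outside."""
    return (not extracted) or bool(conclusion.source_ref)
```

In the subclass variant an invalid `Evidence` cannot be constructed at all, whereas in the flag variant a plain `Conclusion` can carry the flag and still fail `check_flagged`, so validation becomes a separate step.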

thomast73 · 2013-04-15T17:02:13Z

I am going to rescind my most recent addendum to the proposal.

The distinctions between information and evidence in Evidence Explained are not as tight as the pundits (e.g., Tom Jones) are currently preaching, but I do not know where to find the current preaching in writing to be able to discuss and cite it in a public forum. I think the extracted name and the reference to "Conclusion" in the constraints title as suggested by @jralls are generic, saying little about how these objects and properties ought to be mapped to the GPS concepts of information and evidence. I feel that this is probably sufficient to move forward. Our opportunity to associate the concepts in the model with the concepts in GPS may turn out to be largely a function of the documentation anyway.

stoicflame · 2013-04-15T18:48:44Z

4ff62e9 is available for review.

We also missed adding the extracted property to the XML and JSON specs. Fixed at 8e0e399.
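
As a rough illustration of what carrying the flag through a JSON serialization might look like, here is a small sketch. Only the property name `extracted` comes from this thread; the payload shape, the `id` field, the helper name `conclusion_to_json`, and the choice to omit the property when it is false are assumptions, not a description of the committed JSON spec.

```python
import json


def conclusion_to_json(conclusion_id: str, extracted: bool) -> str:
    """Serialize a hypothetical conclusion, emitting the flag only when set."""
    payload = {"id": conclusion_id}
    if extracted:
        # Assumed convention: an absent property means "not extracted".
        payload["extracted"] = True
    return json.dumps(payload, sort_keys=True)
```

Calling `conclusion_to_json("P1", True)` yields an object with both `id` and `extracted`, while a non-extracted conclusion serializes with `id` alone.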

stoicflame added 2 commits March 14, 2013 14:38

introducing the persona flag and a definition of a persona in the spec

15ca985

removing the references from a source to its extracted conclusions because the information is redundant with the flag.

9343eb3

woops! didn't mean to get rid of 'attribution' documentation on the person...

afb17be

stoicflame mentioned this pull request Mar 16, 2013

Events vs Facts #208

Closed

stoicflame mentioned this pull request Mar 19, 2013

clarify again support for "n-tiered" implementation #149

Closed

Modifies the specification to add the concept of __*extracted information*__ to conclusions.

772cf9a

formalizing the notion of 'persona' using the notion of 'extracted information'

d28209d

s/encloses/contains

ac170a6

stoicflame added 2 commits April 15, 2013 12:39

s/extracted information/extracted conclusion so as to avoid introducing another concept in the conceptual model that must be defined.

4ff62e9

json, xml updates for the 'extracted' flag

8e0e399

Merge branch 'master' into persona-flag

0179a09

Conflicts: specifications/conceptual-model-specification.md

stoicflame merged commit 0179a09 into master Apr 16, 2013
