SOURcing details #455

Open
WGroleau opened this issue Apr 11, 2024 · 10 comments
Labels
awaiting use (Waiting for evidence of vendor implementation and use), next minor

Comments

@WGroleau

In the GEDCOM 5.5 spec, we find
“The <<SOURCE_CITATION>> structure is placed subordinate to the fact being cited.  It is generally best if the source citation contains only information specific to the fact being cited …”
But limiting it to levels one and two runs counter to that.  Often, a source identifies only the DATE or only the PLAC or only some other fact.

A SOURCE_CITATION should be allowed at any level, subordinate to the specific detail it supports.
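
To make the request concrete, here is a minimal Python sketch (not taken from the spec) that parses a few GEDCOM-style lines and reports which structure each SOUR is subordinate to. The sample record, the @S1@ and @S2@ pointers, and the helper name `sour_parents` are assumptions for illustration only. The SOUR under DATE is the kind of placement this issue asks for; the SOUR directly under BIRT is the placement available today.

```python
SAMPLE = """\
0 @I1@ INDI
1 BIRT
2 DATE 12 MAR 1850
3 SOUR @S2@
2 PLAC Bergen, Norway
2 SOUR @S1@
"""


def sour_parents(text):
    """Return (parent_tag, pointer) for every SOUR line in the text."""
    stack = []   # stack[i] holds the tag most recently seen at level i
    found = []
    for line in text.splitlines():
        parts = line.split(" ", 3)
        level = int(parts[0])
        # A record line carries a cross-reference id such as @I1@ before the tag.
        if parts[1].startswith("@"):
            tag, payload = parts[2], " ".join(parts[3:])
        else:
            tag, payload = parts[1], " ".join(parts[2:])
        del stack[level:]        # drop tags from deeper or equal levels
        stack.append(tag)
        if tag == "SOUR" and level > 0:
            found.append((stack[level - 1], payload))
    return found


print(sour_parents(SAMPLE))
```

With the sample above, the first citation is reported as belonging to DATE and the second to BIRT, which is the distinction that gets lost when every citation has to hang off the event itself.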

@dthaler
Collaborator

dthaler commented Apr 11, 2024

Discussion during GEDCOM Steering Committee meeting April 11, 2024:

  • This can be done in GEDCOM 5.5 or 7.0 via an extension today (i.e., relocated standard structure, using the language in the 7.0 spec), such as defining an _SOUR tag that can appear in such places.
  • The process for incorporating new functionality in the standard over time is explained on the extension registry page, which includes the requirement that an extension be used (or a commitment to be used) by at least two independent applications or websites.
  • This idea of allowing SOUR in more places was suggested during the 7.0 specification process but there was significant pushback from implementers at the time.
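
As a rough illustration of the first bullet, the Python sketch below writes a GEDCOM 7 style fragment that declares an `_SOUR` extension tag under `HEAD.SCHMA` and then uses it beneath DATE. The URI, the record contents, and the function name `build_fragment` are hypothetical placeholders, not something the committee specified; the exact wording on relocated standard structures in the 7.0 spec should be checked before relying on this shape.

```python
# Hypothetical URI for the extension; a real one would point at its documentation.
EXT_URI = "https://example.com/gedcom-ext/_SOUR"


def build_fragment(date, source_xref):
    """Return GEDCOM 7 style lines with an _SOUR placed under DATE."""
    return [
        "0 HEAD",
        "1 GEDC",
        "2 VERS 7.0",
        "1 SCHMA",
        f"2 TAG _SOUR {EXT_URI}",       # documents the extension tag
        f"0 {source_xref} SOUR",
        "1 TITL Parish register (made-up example)",
        "0 @I1@ INDI",
        "1 BIRT",
        f"2 DATE {date}",
        f"3 _SOUR {source_xref}",       # citation attached to the date only
        "0 TRLR",
    ]


print("\n".join(build_fragment("12 MAR 1850", "@S1@")))
```

An application that does not know `_SOUR` can still read the file, but it is free to ignore or drop the citation.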

@dthaler transferred this issue from FamilySearch/GEDCOM.io Apr 11, 2024
@dthaler added the "next minor" and "awaiting use" (Waiting for evidence of vendor implementation and use) labels Apr 11, 2024
@Norwegian-Sardines

The process for incorporating new functionality in the standard over time is explained on the [extension registry] which includes the requirement that an extension be used (or a commitment to be used) by at least two independent applications or websites.

  1. Why must applications first commit to using a feature before it is added? Is this the wrong way to add a feature?

  2. What applications are polled/used to base this commitment? If I write an application and I support a feature/change to GEDCOM will my vote count?

  3. If we leave feature changes up to a controlling few, the changes may not support good design, good genealogy, or knowledgeable users!

As with this specific need, for example. Most commercial programs do not support facts that have multiple sourced, but different, dates and/or places. For example: Do they provide multiple BIRT tags for each of the sourced data, lump all data into one BIRT tag, write tons of notes, ignore all but the first BIRT tag?

If GEDCOM does not provide a sanctioned way to transmit data, each application will go its own way and later fight for its way of transmitting data so it doesn’t have to change its code! We have seen this fight play out in the years between the publishing of GEDCOM V5.5 and today! Some applications refused to use all of v5.5.1 because it was a “Draft”, or implemented only what they thought was good for them and ignored other things that they could not implement. Others create an “extension” in GEDCOM instead of using a perfectly good tag already in the current standard. We see this today in the multiple ways applications have implemented Sourcing and Citations, very often without using the current standard ways of saving source information!

Sorry for this long rant, I’m feeling frustrated and stymied by the lack of movement on some GEDCOM fronts!

@tychonievich
Collaborator

@Norwegian-Sardines Thanks for sharing your feelings! I personally want several changes to the spec to happen that have not met these requirements, so I sympathize with your frustration.

The steering committee has discussed this topic many times. One model we've discussed is we add to the specification all the things we think make for good family history data and then rely on the strength of the GEDCOM name to encourage applications to implement the features that data presumes, like was done with ALIA in 5.5. Another model we've discussed is we only add what there's already market consensus to use, like was done with UID in 7.0.

The standardize-first model has two primary risks: it might cause future versions of the spec to be seen as out of touch with reality, causing people to not move to them; and it might cause us to commit to elements of the specification that are not as good in practice as they appear in advance.

The consensus-first model has two primary risks: it might cause a stagnation of the standard, perpetuating bad design and bad practice; and it might cause a proliferation of application-specific alternatives, reducing the interoperability of software.

Our current goal is to find a balance between these two extremes. We do not require full consensus, but we do require evidence that some people have implemented or are committed to implementing each feature we add to the specification. We might revisit this balance point in the future, but for now the steering committee is united around this set of criteria.

On your question "If I write an application and I support a feature/change to GEDCOM will my vote count?" — Yes! We have not limited who counts and who doesn't. We are looking for evidence of use or commitment to use; that can come from any software, regardless of how popular it is.

@Norwegian-Sardines

On your question "If I write an application and I support a feature/change to GEDCOM will my vote count?" — Yes! We have not limited who counts and who doesn't. We are looking for evidence of use or commitment to use; that can come from any software, regardless of how popular it is.

How does a software program or committee know that it can comment or show that it wants a feature to be included?

More importantly, how does a software program or committee know that it can risk breaking GEDCOM interoperability by “rolling their own standard” and get its vision into GEDCOM in the future? Some software adheres to the standard (because it is a STANDARD) while other software breaks it without concern for best design and interoperability!

For example: Why has the German _LOC (location extension) not been implemented or suggested for inclusion? The software I use has some of it implemented! Was that software consulted about its implementation?

Some applications allow their users to write “add-in” or “modules” that augment the application, do these get recognition as “use” as well? Are they even identified for consideration?

@WGroleau
Author

WGroleau commented Apr 26, 2024

"This can be done in GEDCOM 5.5 or 7.0 via an extension today (i.e., relocated standard structure, using the language in the 7.0 spec), such as defining an _SOUR tag that can appear in such places."
That's not really a solution—if one believes this level of integrity is needed, he/she is forced to use only the application (currently none as far as I know) that supports it.  Unless he/she is capable of designing and coding a rather large application.

@WGroleau
Author

Aside: if _LOC functionality is ever adopted (I hope it is), please change it to PLAC. If NOTE or SOUR can be both level zero or a pointer, why not PLAC?

@Norwegian-Sardines

Aside: if _LOC functionality is ever adopted (I hope it is), please change it to PLAC. If NOTE or SOUR can be both level zero or a pointer, why not PLAC?

Actually, NOTE is no longer a zero-level entity in GEDCOM v7.0; it has been replaced by SNOTE, indicating a "Shared Note". Following this construct, the appropriate "Shared Place" would be SPLAC, which would conveniently not break any applications that use the older definition of PLAC.

Normalizing the various repeatable "shared elements" such as "Facts" and "Citations" could follow the same change, reducing the size of GEDCOM files and allowing shared facts and citations to be reused.

I would also advocate for augmenting the current GEDCOM with new record types for each of these "shared elements", such as: a) "Shared Citation" (SCITA ?) and b) "Shared Fact" (SFACT ?). A "shared citation" would replace the way some applications and advocates misuse the Source_Record design along with their source-template systems. A "shared fact" would, like SNOTE, be used as an alternative to the current Attribute/Event stand-alone tags.
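
For readers less familiar with the pointer mechanism behind SNOTE, a small Python sketch may help. It resolves pointers to level-0 records so that two events can reuse one shared record. SPLAC here is only the hypothetical tag floated above and is not part of any GEDCOM version; the record contents and the helper names `index_records` and `resolve_places` are made up for illustration.

```python
SAMPLE = """\
0 @P1@ SPLAC Bergen, Norway
0 @I1@ INDI
1 BIRT
2 SPLAC @P1@
0 @I2@ INDI
1 DEAT
2 SPLAC @P1@
"""


def index_records(text):
    """Map each level-0 cross-reference id to (tag, payload)."""
    records = {}
    for line in text.splitlines():
        parts = line.split(" ", 3)
        if parts[0] == "0" and parts[1].startswith("@"):
            payload = parts[3] if len(parts) > 3 else ""
            records[parts[1]] = (parts[2], payload)
    return records


def resolve_places(text):
    """Return the place text each SPLAC pointer refers to, in file order."""
    records = index_records(text)
    places = []
    for line in text.splitlines():
        parts = line.split(" ", 2)
        # Only substructure lines (level > 0) hold pointers to the shared record.
        if parts[0] != "0" and parts[1] == "SPLAC":
            places.append(records[parts[2]][1])
    return places


print(resolve_places(SAMPLE))
```

The place text is stored once and referenced twice, which is the size and reuse benefit described above.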

@chtiland

Hello!

Since I started doing genealogy, I have used and tested a lot of software.
It seems to me that the vast majority of them have made the same mistake: considering GEDCOM as a data model.
As a result, their application is based on a "GEDCOM" structure, with its advantages, but above all its disadvantages in limiting developments and functionalities.
Not only that, but I also noticed that from one application to another, the interpretation of GEDCOM generated disparities!
So much time was wasted in discussions about 'how to interpret' this or that definition given in GEDCOM! And the number of years that pass between each evolution of GEDCOM doesn't help matters.

Of course, the GEDCOM format has the merit of existing, and it has certainly played a major role in the evolution of digital genealogy. But it is time that genealogy software publishers stopped treating the GEDCOM format as a data model and started treating it as an exchange protocol, a simple export/import file model, as has always been the case:

"A common usage is as a standard format for the backup and transfer of family tree data between different genealogy software and websites, most of which support importing from and exporting to GEDCOM format" - Wikipedia

@Norwegian-Sardines

Norwegian-Sardines commented Jul 11, 2024

@chtiland

You are both Right and Wrong!

Many applications do have their own internal data model!

Adhering to the GEDCOM standard as your data model does limit an application’s ability to do real genealogy and to support elements and constructs that were not envisioned in GEDCOM or were excluded by a biased design.

However, without GEDCOM as a base for an application’s data model, users of an application will find themselves “locked into” that application due to data loss when trying to transfer information to other applications, or misinterpretation of data by others when receiving shared data. We already see this condition today because of the different data models software uses, and it gets worse when applications try to use GEDCOM as a data transfer protocol!

In a closed business system, developing a data model to serve the business needs of the company is very important; data transfer is the farthest thing from the developer’s mind! It is not until the company decides to migrate to a new application with its own data model that they start to “wring hands” and call in migration experts to resolve data inconsistency issues. I did this for a living as a database and application consultant and wrote several utilities to make migration easier!

Most of us have experienced this data loss and misinterpretation firsthand over many years, and it is the whole reason many of us are striving for a “BetterGEDCOM”!

@fisharebest
Contributor

"Shared Fact" (SFACT ?)

Out of interest, I started to develop a system based on "shared facts" (and also shared citations) a while back.

All events were shared - although some obviously only had one participant. I recall lots of things becoming very simple and consistent. e.g.

  • FAM records are redundant, as families are implicit from things like MARR (two participants with role SELF) and BIRT (participants with roles SELF, PARENT, PARENT).
  • ASSO facts are redundant, as the associate just has a link to the event with a role of WITNESS or whatever.
  • RESI events allow you to record people living together (e.g. cohabiting).
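
A minimal Python sketch of that idea, reconstructed from the description above rather than from the actual code (the class, field, and function names are assumptions): every event carries a list of (person, role) participants, and parent-child links and couples are derived from BIRT and MARR events instead of being stored in FAM records.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    kind: str                                          # e.g. "BIRT", "MARR", "RESI"
    participants: list = field(default_factory=list)   # (person_id, role) pairs

    def with_role(self, role):
        """Return the ids of all participants holding the given role."""
        return [p for p, r in self.participants if r == role]


def parents_of(person, events):
    """Parents are the PARENT participants of the person's own BIRT event."""
    for ev in events:
        if ev.kind == "BIRT" and person in ev.with_role("SELF"):
            return ev.with_role("PARENT")
    return []


def couples(events):
    """Each MARR event with two SELF participants implies a couple."""
    return [tuple(ev.with_role("SELF")) for ev in events
            if ev.kind == "MARR" and len(ev.with_role("SELF")) == 2]


events = [
    Event("MARR", [("I1", "SELF"), ("I2", "SELF"), ("I9", "WITNESS")]),
    Event("BIRT", [("I3", "SELF"), ("I1", "PARENT"), ("I2", "PARENT")]),
]

print(parents_of("I3", events))
print(couples(events))
```

The witness on the MARR event shows how an ASSO-style link falls out of the same structure: it is just another participant with a different role.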
