ChatGPT (Artificial Intelligence/AI) transcribing family documents dense with names and dates

Recently I watched a familytree webinar with a panel on the topic of Genealogy meets AI. I had rather dismissed ChatGPT after a short talk at the Princeton Public Library and some experimentation. The demos have me trying to understand prompts and GPTs, to see if I can get solutions to some tedious comparison tasks on my genealogy list. (Ironically, a recent upgrade to macOS Sequoia involved setting up Apple Intelligence, Siri, and ChatGPT.)

In Chrome on my Windows 11 box, I signed into ChatGPT (4, not the previously used plus, which has the paperclip grayed out). Apparently share is broken, or pay-for, so I am trying to copy and paste the more interesting bits. I used the paperclip icon to upload page 7 of a certain well known family document and typed "transcribe exactly". (I really need to rename my files.) It started transcribing the page and asked how it was doing. Things I told it included "The table format should be used for each set of children" and "the page number on the bottom was dropped". I should go back and get the header used only once per page. It even went back and gave my new chat an appropriate name, Genealogy Table Transcription. In any case, ChatGPT is better than Google Docs at recognizing the significance of spacing to create tables.
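The "spacing creates tables" idea can be roughed out by hand too. Here is a minimal sketch in Python, assuming the transcribed text keeps runs of two or more spaces between columns and only single spaces inside a cell. The function names and the sample rows are made up for illustration; they are not from the family document or from anything ChatGPT produced.

```python
import re

def split_columns(line: str) -> list[str]:
    """Split one line into cells wherever two or more whitespace characters occur."""
    return [cell.strip() for cell in re.split(r"\s{2,}", line.strip()) if cell.strip()]

def lines_to_table(text: str) -> list[list[str]]:
    """Turn a block of space-aligned text into a list of rows, skipping blank lines."""
    return [split_columns(line) for line in text.splitlines() if line.strip()]

# Made-up sample rows, only to show the shape of the input.
sample = """\
Name            Born          Died
Child One       1 Jan 1900    2 Feb 1980
Child Two       3 Mar 1902
"""

for row in lines_to_table(sample):
    print(" | ".join(row))
```

One limitation of this sketch: a cell that is missing in the middle of a row shifts the later cells left, so it only copes with blanks at the end of a row.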

Next up "analyze". It did a nice job with "Structure and Content". While I think most of the info is in familysearch, I am going to make sure I am not sharing anything I shouldn't by not including more. "Observations" were spot on. I went back to ask "who was missing a death date" and received a nice contextual list. Looking at the "Suggestions for Improvement" I feel like I should see if it could help with 2) Formatting and 4) Visualization.
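The "who was missing a death date" question is also easy to check by hand once a page is in table form. A minimal sketch in Python, assuming each person ends up as a small dictionary with hypothetical keys `name`, `born`, and `died` (the placeholder people below are invented, not from the document):

```python
def missing_field(rows: list[dict[str, str]], field: str) -> list[str]:
    """Return the names of rows whose given field is empty or absent."""
    return [row["name"] for row in rows if not row.get(field, "").strip()]

# Placeholder data; the keys are an assumption about how a page might be stored.
people = [
    {"name": "Person A", "born": "1900", "died": "1980"},
    {"name": "Person B", "born": "1902", "died": ""},
    {"name": "Person C", "born": "1905"},
]

print(missing_field(people, "died"))
```

This only flags blank or absent values; a placeholder such as "Unknown" would need to be treated as missing separately.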

Observations

  1. Consistency:

    • The document is methodically organized by generations and branches of the family tree.
    • Each set of children is grouped under their respective parents, making it easy to trace lineage.
  2. Missing Information:

    • Some entries have missing details, particularly for spouses' birth or death dates (e.g., ???, ???, etc.).
    • ???’s death date is omitted entirely.
  3. Highlights:

    • This document provides a clear snapshot of family connections over three generations, including spouses.
    • It serves as a valuable resource for anyone researching the Hauck family history or constructing a detailed family tree.
  4. Usability:

    • The use of tables makes the information structured and easy to follow. However, for a printed format, ensuring all tables fit neatly on one page would be essential.

Suggestions for Improvement

  1. Standardization:

    • Ensure all missing dates or approximate markers (*) are explained, or leave placeholders (e.g., “Unknown”).
    • Consistently indicate multiple spouses for individuals (e.g., George Hauck II's two spouses).
  2. Formatting:

    • If the document is to remain in its current form, reformat it slightly to ensure all text and tables fit neatly within page boundaries, especially for printed use.
  3. Additional Context:

    • If possible, include migration details (e.g., year of arrival in the United States) to enhance historical context.
  4. Family Tree Visualization:

    • Consider supplementing this document with a diagram or chart for visualizing the family tree. It would help in understanding complex relationships at a glance.

Finally I asked "how many pages of similar data can you handle" because of warnings of the system slowing down. This is only one page of a longer document (some add a marriage date). Then there is a supplement with additional information. Can it combine them? Can it add citation references to the added/corrected data? I want to see the family tree, GEDCOM, and maybe even the genealogy report.

I can handle a large number of pages of similar data, provided they're shared in manageable formats (e.g., images, PDFs, or text). Here's a breakdown of how I can assist:

  1. For Image or PDF Files:

    • I can transcribe and process dozens of pages of similar genealogical data in structured formats like tables, preserving the original structure and details.
    • If the files are particularly lengthy, processing time might increase, but there’s no strict limit on the number of pages.
  2. For Text Files or Copy-Paste Input:

    • I can process and organize vast amounts of data at once. For example, hundreds of names, birth/death dates, and other details can be formatted systematically.
  3. If You Need Summaries or Insights:

    • I can analyze or cross-reference the data for missing details, duplicates, or patterns.
  4. When Creating or Updating a Document:

    • I can compile all the information into a single well-structured document, such as a family tree or genealogy report.

This small effort was preceded by watching another familytree webinar, Using AI to Transcribe Records by Thomas MacEntee, to figure out how to get started. I was unhappy to learn that this kind of work also needs citing. In any case, I decided it was time to take a break from the computer. By the time I finished copying and pasting into the blog, the free version had timed out.

 

You’ve hit the Free plan limit for GPT-4o.

You need GPT-4o to continue this chat because there's an attachment. Your limit resets after 5:25 PM.

MacEntee had demoed on chat.openai.com, switching between the free and $$ versions and getting different quality results. He also mentioned trying Anthropic's claude.ai, copilot.microsoft.com, and gemini.google.com (formerly Bard). With the Mac, lots of experimentation is available.

How do I cite the two documents and include the citations in the output?

The whole reason I started down this path was to compare information in different forms. Clearly that is an effort for another time.
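For that later effort, here is a minimal sketch in Python of what the comparison could look like, assuming both sources have already been transcribed into dictionaries keyed by a person's name. The function name, the field names, and the sample entries are all hypothetical, and matching people by exact name is a big simplification for real family data.

```python
Records = dict[str, dict[str, str]]

def compare_records(first: Records, second: Records) -> list[str]:
    """List people found in only one source and fields whose values differ."""
    differences = []
    for name in sorted(first.keys() | second.keys()):
        if name not in first:
            differences.append(f"{name}: only in second source")
        elif name not in second:
            differences.append(f"{name}: only in first source")
        else:
            for field in sorted(first[name].keys() | second[name].keys()):
                left = first[name].get(field, "")
                right = second[name].get(field, "")
                if left != right:
                    differences.append(f"{name}: {field} differs ({left!r} vs {right!r})")
    return differences

# Invented entries standing in for the main document and the supplement.
document = {"Person A": {"born": "1900", "died": "1980"}, "Person B": {"born": "1902"}}
supplement = {"Person A": {"born": "1900", "died": "1981"}, "Person C": {"born": "1910"}}

for line in compare_records(document, supplement):
    print(line)
```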

1/16 update:
I've been trying to transcribe more of this document a page at a time, and keep getting put off. After 6 pages it barfed, I mean reset to no knowledge. The plan is to continue to transcribe pages and download them, then see if I can start a new chat with the text files (or maybe even one file) and proceed from there. This is no longer saving me time. Granted, I could pay $20 / month. My feeling is there is optimism, but it is still just a toy.
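Getting from many page files to "maybe even one file" does not need ChatGPT at all. A minimal sketch in Python, assuming the downloaded transcriptions are `.txt` files in one folder with the page number somewhere in the file name (for example `page7.txt`; that naming is an assumption, as are the function names):

```python
import re
import tempfile
from pathlib import Path

def page_number(path: Path) -> int:
    """Pull the first number out of a file name so page10 sorts after page7."""
    match = re.search(r"(\d+)", path.stem)
    return int(match.group(1)) if match else 0

def combine_pages(folder: Path, output: Path) -> int:
    """Write all .txt pages in folder into one file, in page order; return the page count."""
    pages = sorted(
        (p for p in folder.glob("*.txt") if p.resolve() != output.resolve()),
        key=page_number,
    )
    # A header line per page keeps track of where each chunk came from.
    parts = [f"--- {p.name} ---\n{p.read_text(encoding='utf-8').strip()}\n" for p in pages]
    output.write_text("\n".join(parts), encoding="utf-8")
    return len(pages)

if __name__ == "__main__":
    # Demo in a throwaway folder with stand-in page contents.
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "page10.txt").write_text("tenth page text", encoding="utf-8")
        (folder / "page7.txt").write_text("seventh page text", encoding="utf-8")
        combined = folder / "combined.txt"
        print(combine_pages(folder, combined))
        print(combined.read_text(encoding="utf-8"))
```

The per-page header lines might also help later with citing which page a corrected entry came from.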
