Saturday, 3 December 2016

Analysing my DNA: Crossovers Part 1

Recently I have been working on identifying the "crossover points" in the DNA of a group of four siblings. They are related to me, so the results of their DNA tests will not only help me in my ancestral searches, but also in discovering more about my own DNA and mapping it to specific ancestors. Crossover points are important for anyone who is trying to identify where segments of DNA came from, ie through which ancestors, as they indicate a change in the DNA, from that of one grandparent to that of the other grandparent in that couple.

There are detailed explanations of the processes involved available elsewhere online but these are the basics, for anyone who is new to this. We all have 23 pairs of chromosomes, one of each pair coming from our father and one from our mother. Just as we received them from our parents, so our parents received one of each pair of their chromosomes from each of their parents, any future child's grandparents:

Things aren't usually as simple as that shown above. Meiosis, the process of cell division that produces the egg or sperm and ensures the correct amount of DNA is passed on to the offspring, more commonly involves the two chromosomes in a pair splitting and then recombining in a different way, so that the resulting chromosome that's passed on is a mixture of the parent's two chromosomes:

 As the process of recombination is random, children of the same parents will each receive a different combination of the DNA that came to their parents from their grandparents:

Unfortunately, the DNA tests do not phase our results, so we cannot even identify our two separate chromosomes, unless we have other relatives tested. All we have to work with is the raw data and information about where we match other people.

In the case of a parent and child who have both tested, comparison with their matches may sometimes indicate a possible crossover:

Not only does this match share much less DNA with me than with my mother, they also match a group of other people who match only my mother, not me, over the latter part of the segment, from about 122,000,000 to 134,000,000. So it appears that I may have a crossover in my maternal chromosome and did not receive the rest of the segment from the ancestor shared with this match. If I knew which side of my mother's family this match has a connection to then, if my mother and I have any shared matches starting after 125,000,000, I would know to concentrate my search for the shared ancestry on the other side of my mother's family.

But comparisons with a parent against other matches are only likely to reveal a few of the potential crossovers. A better method, available to anyone with a group of three or more siblings tested, is to use the sibling comparisons to identify crossovers. As already indicated, a group of siblings will have received different segments of the grandparents' DNA, but the results are not phased by any of the testing company chromosome browsers:

I have represented in orange the (approximate!) overall matching segments of the children - but, using Gedmatch, it is possible to also identify where two people fully match, ie match on both of their chromosomes, rather than just "half match", ie match on one chromosome. It is this, more complete, pattern of matching - changing between the states of having full, half, or no, matching DNA - which is used in order to identify the points where the DNA "crossed over" from one grandparent's DNA to the other.

There are probably several methods for doing this but I think most credit goes to Kathy Johnston for her "Visual Phasing" method. There are some very good blog posts about using Kathy's method, which I shall include links to below - it is worth reading several, as we all have different ways of describing what we do. I have now developed a slightly different method of working, which suits me better.

But I am going to start with the smallest chromosome, chromosome 21, and follow Kathy's instructions, to illustrate the basic method.

These are the comparisons between the four siblings at Gedmatch:

This is the key to the Gedmatch colours:

 And these are the figures for the comparisons:

I have kept the figures separate from the individual chromosome images, as I found the crossover lines end up obscuring the figures on the chromosomes with more crossovers.

Despite having looked at comparisons between the siblings in various other formats (eg the FTDNA downloads), it was only when I did these comparisons that I realised Siblings A and B do not match each other on this chromosome.

Which goes to show how we often notice just what is present - not what is missing! 🙂

But, as you can see, even where two siblings do not match each other (ie they have a grey bar along the lower section, not a blue bar) there are still some base pairs showing a half, or even a full, match. There just aren't enough of these matching base pairs in a consecutive sequence for it to be regarded as genealogically relevant.
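To make the idea of half, full and no matching at individual positions a little more concrete, here is a minimal sketch in Python. It assumes each person's unphased result at a position is a two-letter string such as "AG", and it treats two results as a half match when they share at least one letter and a full match when both letters agree. The function names and the sample data are invented for illustration, and this is not how Gedmatch itself is implemented; real tools also apply thresholds on segment length, which this sketch ignores.

```python
def snp_match(g1, g2):
    """Classify one position as 'full', 'half' or 'none' for two unphased results."""
    a, b = sorted(g1), sorted(g2)
    if a == b:
        return "full"          # both letters agree (order does not matter)
    if set(a) & set(b):
        return "half"          # at least one letter is shared
    return "none"              # nothing in common at this position


def longest_matching_run(genos1, genos2):
    """Length of the longest run of consecutive positions that at least half match."""
    best = run = 0
    for g1, g2 in zip(genos1, genos2):
        if snp_match(g1, g2) == "none":
            run = 0            # a non-matching position breaks the run
        else:
            run += 1
            best = max(best, run)
    return best


# Invented example data: five consecutive positions for two people
person1 = ["AA", "AG", "CC", "TT", "GG"]
person2 = ["AG", "GA", "TT", "CT", "GG"]
states = [snp_match(x, y) for x, y in zip(person1, person2)]
```

In this made-up example the third position does not match at all, so the longest consecutive run is only two positions, even though four of the five positions show some degree of matching. That is the situation described above for siblings who "do not match": scattered matching positions, but no long unbroken stretch.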

The next step is for the crossover points to be identified. These are the points where there is a change between fully matching and half matching, or half matching and non-matching, ie where the bottom bar changes between blue and grey, or where the top bars change between an area that is consistently green and one which is predominantly yellow, with intermittent green. The former changes are also demonstrated by the figures. Unfortunately, the changes between fully matching and half matching are not specifically identified in any figures at Gedmatch, although Sue Griffith has explained how to obtain a very good estimate of them*. They can also be identified using one of David Pike's tools*.
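As a rough sketch of this step, suppose the comparison between two siblings has already been written down as a list of consecutive stretches, each labelled "full", "half" or "none". The candidate crossover points are then simply the places where the label changes. The function name, the tuple layout and the start and end positions (15 and 48) below are assumptions made for illustration; only the "37" and "40" labels come from the chromosome 21 example in this post.

```python
def change_points(segments):
    """Return (position, state_before, state_after) wherever the match state changes.

    `segments` is a list of (start, end, state) tuples in chromosome order,
    where state is 'full', 'half' or 'none'.
    """
    points = []
    for (_, _, before), (start, _, after) in zip(segments, segments[1:]):
        if before != after:
            points.append((start, before, after))
    return points


# Hypothetical C-to-D comparison on chromosome 21 (positions in millions)
c_vs_d = [
    (15, 37, "full"),
    (37, 40, "half"),
    (40, 48, "full"),
]
points = change_points(c_vs_d)
```

Running this over every pair of siblings gives a list of candidate positions for each comparison. Positions that are close together in different comparisons probably represent the same crossover, which is what the next step relies on.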

Once the crossover points have been identified, they are allocated to particular siblings - a crossover "belongs" to the sibling who shows that change in all of their comparisons. (This isn't always obvious, especially if only using three siblings - sometimes, what looks like a single crossover for one sibling can actually be a double crossover for the two others. Having results available from more than three siblings is an advantage for me.)
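The allocation rule can be sketched in a few lines of Python. The assumption here is that, for one crossover position, we have noted which sibling-to-sibling comparisons change state at that point. A sibling is a candidate owner if the comparisons that change are exactly that sibling's comparisons with everyone else. The function name and input format are invented for illustration.

```python
def crossover_owner(siblings, changed_pairs):
    """Return the siblings whose comparisons are exactly the ones that change.

    One name in the result means a clear owner; more than one (or none)
    means the crossover cannot be allocated from this information alone.
    """
    changed = {frozenset(pair) for pair in changed_pairs}
    owners = []
    for sib in siblings:
        own_pairs = {frozenset((sib, other)) for other in siblings if other != sib}
        if own_pairs == changed:
            owners.append(sib)
    return owners


sibs = ["A", "B", "C", "D"]
# At "37" every comparison involving C changes; at "40" every one involving D
owner_37 = crossover_owner(sibs, [("C", "A"), ("C", "B"), ("C", "D")])
owner_40 = crossover_owner(sibs, [("D", "A"), ("D", "B"), ("D", "C")])
# With only two siblings, the single comparison cannot say whose crossover it is
ambiguous = crossover_owner(["A", "B"], [("A", "B")])
```

Note that this sketch only applies the simple rule. It cannot detect the trickier case mentioned above, where what looks like one sibling's crossover is really a double crossover for the others, so the result still needs checking against the rest of the evidence.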

In the comparison between B and D, the matching segment does seem to start before the crossover point indicated in the comparisons between D and A and between D and C. I suspect this segment is being artificially extended by some base pairs that just happen to match in both B and D. Issues like this are things to note for future investigation, as they may be a hint that something is wrong with the identification.

Next, working with just the identified crossover lines in an image, but referring to the comparison diagram and the figures, the phased segments of the grandparents' DNA are constructed, usually starting with a segment where two siblings are fully identical. In order to do this, four colours are chosen to represent the DNA received by the children from the grandparents.  Two colours are used for the top grandparent couple and two for the bottom grandparent couple.  [Note: if you follow a colour-coded genealogy filing system, I would suggest choosing different colours for the chromosome mapping, at least until you are absolutely positive you have identified the correct grandparents' segments, at which point you could change the colours to match your genealogy system.  That would then also be a visual clue that the chromosome is "confirmed".  If you use those colours prior to such confirmation, you might find yourself becoming confused, as we do not yet know which grandparent couple is represented by which phased chromosome.]

At the start of this chromosome 21, B does not match any of their siblings, whereas the other three are all fully identical to each other, so the colours can be allocated as follows:

Since neither A nor B has any crossovers, their coloured bars can be extended for the full length of the chromosome. D's can also be extended as far as D's crossover line at "40":

 Between "37" and "40", C becomes half identical to all three of the other siblings. We don't know whether the crossover is on the maternal or the paternal chromosomes (and we haven't identified the colours as being for specific grandparents anyway), so we just have to pick one of the colours to change. I have chosen the top chromosome, purple changing to blue.  As this is the only crossover C has, the two bars can then be extended to the end:

At "40", D becomes half matching to A and B, but fully matching to C. The same chromosome that we changed for C therefore needs to change for D, in order to produce the correct pattern of matching, and the other colour can be extended, unchanged, to the end:

At this stage, we don't know which colours represent which grandparents - that can only be identified by comparison to other known relatives. But we can still look at the shared matches between the siblings, to see how those results correlate with the phasing represented here. For example, I would expect there to be no shared matches between A and B at any point on this chromosome, whereas A, C and D should have exactly the same matches prior to the point "37". B, C and D will share some matches after point "40", but not all of them. The ones C and D don't share with B after "40" should be people who match A as well.
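One way to sanity-check a phasing like this is to work backwards from the colours and see whether they reproduce the original pattern of full, half and no matching. The Python sketch below does that for the chromosome 21 example. The "purple" and "blue" labels for the top pair follow the description above; "bottom1" and "bottom2" are placeholder names, since the colours used for the bottom pair are not stated here, and the three intervals are simply labelled by the "37" and "40" crossover lines.

```python
from itertools import combinations

INTERVALS = ["start-37", "37-40", "40-end"]

# (top colour, bottom colour) for each sibling in each interval
PHASING = {
    "A": [("purple", "bottom1")] * 3,                       # no crossovers
    "B": [("blue", "bottom2")] * 3,                         # no crossovers
    "C": [("purple", "bottom1"), ("blue", "bottom1"), ("blue", "bottom1")],
    "D": [("purple", "bottom1"), ("purple", "bottom1"), ("blue", "bottom1")],
}


def match_state(x, y):
    """Two siblings fully match if both colours agree, half match if one does."""
    shared = (x[0] == y[0]) + (x[1] == y[1])
    return ["none", "half", "full"][shared]


def expected_matches(phasing):
    """Predicted match state for every pair of siblings in every interval."""
    return {
        (a, b): [match_state(x, y) for x, y in zip(phasing[a], phasing[b])]
        for a, b in combinations(sorted(phasing), 2)
    }


table = expected_matches(PHASING)
for pair, states in table.items():
    print(pair, dict(zip(INTERVALS, states)))
```

The predictions agree with what was described above: A and B never match, A, C and D are fully identical before "37", C is half identical to everyone between "37" and "40", and after "40" D fully matches C but only half matches A and B. If a phasing attempt produced a table that disagreed with the Gedmatch comparisons, that would be a sign that a crossover had been allocated to the wrong sibling or the wrong chromosome.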

So, in my next post, I will explore that. I'll also describe some of the issues I have come across in this process so far, as well as explain the way I have adapted Kathy's method to my own way of working.

But, if you thought this chromosome was easy to phase, then perhaps you'd like to consider the following set of comparisons:

[PS Having begun to look at the matches the siblings have with their niece and their 1st cousin, as well as the more distant matches, I have found an "anomaly". So, perhaps phasing chromosome 21 isn't so straightforward, after all!]

* Sources and references I have found helpful:

Kathy Johnston - step by step instructions for her method: http://forums.familytreedna.com/showthread.php?t=36812 (make sure you download both the slides and the instructions)
Jason Lee - a blog post detailing Kathy's method: http://dnagenealogy.tumblr.com/post/137722603308/the-use-of-crossover-lines-among-siblings-to
Blaine Bettinger's pdf combining his five posts about the phasing process - http://thegeneticgenealogist.com/wp-content/uploads/2016/11/Visual-Phasing-Bettinger.pdf

Two other bloggers with helpful posts about phasing, including issues such as the way what looks like a single crossover for one sibling can actually be a double crossover for two others:
Ann Raymont - https://dnasleuth.wordpress.com/2016/06/01/chromosome-mapping-with-siblings-part-2/ (and part 1)
Joel Hartley: http://www.jmhartley.com/HBlog/?p=2239

Sue Griffith's post on how to obtain the values for crossovers from FIR to HIR & vice versa: http://www.genealogyjunkie.net/blog/obtaining-fir-boundaries-on-gedmatch-using-the-little-tick-marks

David Pike has a number of free DNA tools, including the "Search for Shared DNA Segments in Two Raw Data Files" which reports single and double matching segments (ie half identical and fully identical): http://www.math.mun.ca/~dapike/FF23utils/pair-comp.php

Friday, 25 November 2016

DNA Update

It has been an "interesting" year on my DNA journey. Ever since I first took an autosomal DNA test with 23andMe in 2010, I have been working on looking for what are known as "triangulating groups" (TGs) in the data. These are groups of people, who all match me over the same segment of DNA and who also all match each other over that same segment. The theory is that shared DNA indicates shared ancestry and, therefore, if a group of people all share the same segment of DNA, it must have come from the same ancestor (at some level - some of the people in the group may share a close ancestor along the line back to the overall shared ancestor.) The theory sounds "right" and logical, and it appears to fit the patterns I can see in the data:

 I liked using 23andMe for this process. It is the only testing company where it is possible to compare the people you match (and are sharing with) to each other and therefore confirm for yourself whether, or not, they form a TG. This is not possible at the other companies I have tested with. At Family Tree DNA (FTDNA), it is only possible to see where someone matches you, and whether they are "in common with" (ie also share some DNA with) any of your other matches. But you then need to ask them where they match the other people, in order to confirm if they actually match those people over the same segment that they match you on. If it is a different segment, so the TG theory went, then you may all be related to each other through different ancestors, since many of us probably have multiple ancestors in common, as we move further back in time. It was said that you could only be sure the DNA was from the same ancestor if you matched on the same segment.
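The check described here can be sketched in Python. It assumes you have, for each person, the segment where they match you, written as (chromosome, start, end), plus whatever segments you have been able to obtain for how they match each other. The names, positions and function names are all invented for illustration, and no testing company provides the data in exactly this form.

```python
from itertools import combinations


def overlap(a, b):
    """Return the overlapping part of two (chromosome, start, end) segments, or None."""
    if a[0] != b[0]:
        return None
    start, end = max(a[1], b[1]), min(a[2], b[2])
    return (a[0], start, end) if start < end else None


def triangulates(my_matches, pair_matches):
    """True if everyone matches me on a common stretch AND each other on that stretch."""
    names = sorted(my_matches)
    common = my_matches[names[0]]
    for name in names[1:]:
        common = overlap(common, my_matches[name])
        if common is None:
            return False       # no stretch where they all match me
    for a, b in combinations(names, 2):
        segments = pair_matches.get(frozenset((a, b)), [])
        if not any(overlap(common, seg) is not None for seg in segments):
            return False       # this pair do not match each other there
    return True


# Invented example: three matches on chromosome 7 (positions in millions)
my_matches = {"P": (7, 100, 130), "Q": (7, 105, 135), "R": (7, 110, 128)}
pair_matches = {
    frozenset(("P", "Q")): [(7, 104, 131)],
    frozenset(("P", "R")): [(7, 109, 129)],
    frozenset(("Q", "R")): [(7, 111, 127)],
}
is_tg = triangulates(my_matches, pair_matches)
```

At FTDNA the `pair_matches` information is exactly the part that has to be requested from the matches themselves, which is why the check is so much harder to carry out there. Because the underlying data is not phased, passing this check is suggestive of a triangulating group rather than proof of one.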

Part of the difficulty in identifying the TGs at FTDNA, and why you cannot assume people who match you over what looks to be the same segment, and who are "in common with" each other, actually do match each other in the same place and therefore form a TG, is that these DNA tests do not phase the data, ie they do not split it into the two sides we received from our parents. We all have 23 pairs of chromosomes, one of each pair from our father and one from our mother - but the tests just report the two base pairs (bits of DNA!) we have at particular points along the chromosome. So, whilst it might look as if two people match you over the same segment of DNA, one could be matching you on your maternal side and one could be matching you on your paternal side. In that case, the DNA each shares with you would be from different ancestors, one on each side of your family. If the two people also happened to share another ancestor between them, they would show as "in common with" each other - but you would not all be a TG.

 [The lack of phasing also creates the possibility of "false positives" - people who show as a match but who aren't really, because the computers doing the matching have effectively criss-crossed between the base pairs of each chromosome. This is potentially an issue at both FTDNA and 23andMe, in particular. It isn't thought to be so much of an issue at Ancestry, as Ancestry does a form of phasing of the data. However, I didn't think such false matches were likely to be much of a problem, because I thought that, if a group of people were all triangulating, then the chances of all the comparisons being "computer creations" must be quite slim. I do have some groups of matches where no-one matches each other, despite all apparently matching me over the same segment - so those were the matches I took to be "false positives", as theoretically there can only be a maximum of two non-matching results over any particular segment. A third person must match one of the other two, if the matches are genuine.]

Although I have more of my relatives tested at FTDNA, the reliance on having to contact your matches in order to obtain the details for how they match others was why FTDNA did not seem to be so useful to me, especially as many people do not respond to contact. And Ancestry does not give us any tools to analyse where the actual shared DNA is, so the process of finding TGs is impossible there. Therefore, whilst the other companies do have their own advantages, 23andMe was where I did most of my "work". Although most of the triangulating groups at 23andMe shared relatively small segments with me (ie between 7cM and 15cM), I had identified the potential shared ancestry with one of my matches, a 4th Cousin 1x removed, who shared 14cM with me, and I just assumed the relationships for the other matches were likely to be further back in time.

So I was happy with my 23andMe process. I'd even agreed to do a talk for the Guild of One-Name Studies on using autosomal DNA, as I felt confident I knew what I was doing.

But a couple of months later, everything changed. A different theory had developed, partly as a result of statistics produced by Ancestry but also through the work of other scientists. These statistics suggested that it was very unlikely, if not impossible, for several cousins actually to share the same matching segment. Instead of "triangles", we now had "circles" - and suddenly that brought into question exactly what all these "triangulating groups" really are.

The "circle" theory is still based on the fact that shared DNA means shared ancestry - but now the claim was that the shared DNA would be on different segments of the chromosomes, because of the way DNA is transmitted. A parent passes half their DNA to each child, but each child receives a different half, as there is a recombination process between each parent's two chromosomes before one chromosome is passed on to the child. After several generations, there would be quite a variety of smaller segments carried by cousins descended from the same ancestor. So, rather than looking for the TGs, we should be looking for "genetic networks", clusters of people who share DNA with each other in the cluster but not necessarily over the same segments. The existence of the TGs was explained partly by features in the testing process, such as the lack of phasing, but also by the existence of what are called "population segments" - sequences of base pairs that are just very common in particular populations, so everyone has them, even though there are no close ancestors in common.

How does one know the difference between a genealogically significant triangulating segment and a population segment? Or between a group of matches who have received different segments of DNA from a single ancestor and a group of matches who match on different segments that have come to them from a variety of shared ancestors? Surely the companies are taking these factors into account when they predict the matches? Were the results from the companies even reliable?

So many questions - I felt like I was floundering.

My confidence in what I was doing certainly took a dive at that time. It didn't help that I had also uploaded the raw data for my mother and me to another organisation, DNA Land, who claim to be able to impute "missing" (by which I assume they mean, "untested") areas of DNA, in order to produce a more complete sequence - and yet the number of matches they suggested as a result of this process was not only much lower than I have at the other companies, it included people who don't appear to match me at any of the other companies. That seems strange, given that I have tested at all three of the main companies. I know only a small number of my matches elsewhere will have uploaded to DNA Land, but the differences still seemed quite significant [ie only three matches, including Mum, for me at DNA Land - compared to the 1888 I currently have at 23andMe, 1146 at FTDNA, and almost 6000 at Ancestry!]

Was this DNA testing all a waste of time (and money!)?

When in doubt - I go back to what I know. Just as I work from the known to the unknown in my normal genealogy, I realised I needed to do that more with my DNA research, as well. A "stab in the dark" may occasionally hit a target but it's just as likely to leave me floundering around in the darkness, following blind alleys.  And that's what looking for shared ancestry just from the TGs felt like.

The statistics from all of the companies indicate that autosomal test relationships can only be predicted reliably for about the first five generations. That is not to say we won't show a match to more distant relatives - it's just that, the more distant the relationship, the more difficult it becomes to predict the level of that relationship, as the range of possibilities increases. A single segment of DNA may be passed on unchanged for many generations. But, in all the test results, I knew my known relatives always showed up as they should do. My mother was definitely my mother (not that I doubted that!) And my father's known relatives all show up as matches at the right levels.

So DNA testing works!

Beating the temptation to run and hide, I gave the talk in August, describing the two theories and commenting that "most of us don't understand enough about the statistics to make definitive claims either way so a combination of the methods seems to be the best approach. Both methods are valid but have caveats, eg small segments often appear to triangulate, but may not be genuine, clusters of people sharing different DNA may be due to having multiple ancestors in common."

Some bloggers do seem to be finding segments that are shared by groups of distant cousins. The problem for many of us in the UK, though, is that often we don't have sufficient "middle-distance" relatives identified (both in our genealogy and in our DNA) to produce the sort of success stories that many in the US seem to be experiencing. For example, I only have 29 fourth cousins in the Ancestry "4th cousins & closer" section, whereas some of the American results I have seen have between 400 and 750 relatives at that level!

But I have had some success in identifying relationships with my matches - I now have the potential shared ancestry identified for 10 of them (and if the 10th is actually correct, it's a big clue as to which of my ancestral lines three other shared matches fit into). So that's a start.

As well as confirming my genealogy & finding new relatives, one of my goals with DNA testing is mapping where my DNA came from. Identifying shared ancestry with my matches is one part of this process and, so far, my chromosome map, mapping DNA received to the relevant "most recent common ancestor" (MRCA), looks like this:

Chromosome 4 shows where a known Parry segment contains within it a Saunders segment:

And this shows how that Saunders segment of DNA appears to have passed down to my Parry grandfather:

Any other matches over the identified segments on the chromosome map should (if the identification is correct) be either a descendant of the same couple, or a descendant of one of their ancestors. 

I think there needs to be a continual checking process, using both DNA and genealogy - for example, having found a genealogical connection to one of my DNA matches at Ancestry, we were then able to confirm, using FTDNA, that the person also matched my mother over the same segment, and that neither my mother, nor I, matched the person's father (both requirements necessary for the genealogy to be correct.) 

Since I have several close relatives tested, it gives me the opportunity to work from the DNA data backwards, rather than just concentrating on those potential triangulating groups of distant relatives. My DNA consists of segments of the DNA of my grandparents, passed to me by each of my parents. The "crossover points", where a segment from one grandparent switches over to a segment from the other grandparent can (sometimes) be identified in our DNA, using the details of how we match close relatives. This is a process I began looking at some years ago, using tools written by David Pike. But now more of my relatives are on Gedmatch, I can use the "Visual phasing" method as explained by Kathy Johnston, which should be a lot easier. 

I have been working on this recently and will post about the process soon (now there's a challenge to myself!)

Friday, 24 June 2016

My Ancestors and their Descendants - my potential DNA Tree

Earlier this year, our ISP informed us that it would no longer support personal web spaces - a poor decision in my view (of course!)

The upside of this is that it will force me to do the web site "re-write" that I set as a goal in 2015.

The downside is that I haven't done it yet, so my Parry Surname Research (Family History and the One-Name Study) site has disappeared.

Theoretically, since the site was written in html and css, it would have been quite easy to just upload all the files elsewhere.  But then there'd be little incentive to get the rewrite done.  And, with the development of the Guild's "Members' Websites Project", it seems an ideal opportunity to separate out any personal family history from the Parry One-Name Study information, and to ensure the long term survival of the ONS data by placing it on the Guild's site.

So that's the plan. And it is in progress (slowly).

But today, frustrated at the loss of my "DNA tree", which I really need to accompany the autosomal DNA project I have set up at Family Tree DNA, I decided to try uploading that here, on Blogger.  It's taken a bit of tweaking of the coding, especially on the page width, which I hope I don't accidentally delete, but at least the information is available again:

My Ancestors and their Descendants - my potential DNA Tree

And now I've been reminded of just how many of my ancestors and their descendants I still need to trace. ☺

Tuesday, 5 May 2015

Other activities - the Genealogy Do-Over interlude

Sometimes I keep a diary.  And sometimes I don't.  And, when I don't, I often look back and wonder what I did for all those days! 

So, for my own future reference (and for any descendants who ever wonder what their "x times great" grandmother did), here are a few notes.  Firstly, I resurrected another hobby - sewing.  Prompted by the thought that the Saturday night banquet at the Guild of One-Name Studies Conference has seen me wearing the same dress for a number of years, I decided to make a skirt - which then developed into making a skirt, top, evening bag and several other items just for the fun of it.  Getting the critical items finished on time did involve stitching at 5.30 am on the morning of the banquet but, since I'd woken up early anyway, it seemed like a good use of my time.

Finishing the sewing so early at least left me free to chat to people in any spare time during that day.  And chat I did, as the Conference is a great time for catching up with "old" friends, as well as making new ones.  Some of the conference sessions were recorded and the videos are available on the Guild's YouTube channel - I am looking forward to watching some of those sessions I missed, due to there being two sessions running at the same time.  It would be hard to pick highlights from the Conference, as it was all so good, but I think Jim Benedict's interactive session on "Succession-Proofing your ONS" probably stands out as providing the most laughs, as the various groups debated why *their* method of succession-proofing was best (Debbie, have you bought that spaceship yet?).

We heard more about the Guild Members Websites project over the weekend and I took the opportunity to chat with Mike Spathaky about his Cree Study site, and the various different options for producing websites.  It was Mike who had asked me, on the Guild hangout in February, why I was thinking of moving my PARRY ONS site to WordPress.  As a result of our discussions about the benefits, and potential longevity, of html, I now have a few more reasons for not doing so.

For the first time at the Conference, on the Friday afternoon there was an informal meeting for those interested in DNA testing.  Despite me being totally disorganised, having arrived at the hotel later than planned, and then walking all the way to my hotel room, only to discover that my key didn't work, so that I was still carrying around half my belongings at the time the meeting began, things seemed to run smoothly as we all shared about our various levels of involvement with DNA testing.  No doubt we will all be building on this in the coming months and years. 

I have frequently come away from the Conference with some snippet of Parry information, whether it has been from Marriage Challenge certificates passed on to me, or references I have found in books on the bookstall, or in someone's talk, etc.  This year was no exception, as Jo Fitz-Henry very kindly supplied me with photographs of some Parry gravestones that she had come across.  I'll write more about those on the Parry ONS blog.

The Conference was held at Brigg in Lincolnshire and my route there provided an opportunity to drive past RAF Scampton, one of the bases where my mother had been stationed in her WRAF days.  When planning my conference attendance, I had originally thought of contacting the museum on the base with a view to arranging to visit enroute to Brigg.  It was probably a good job I didn't do that, given how time went.  But that's now on my "To Do" list, for another occasion.

Moving on from the Conference in March, the next main event was the WDYTYA? Live Show in April which, for the first time, was being held at the NEC, Birmingham.  This provided another incentive to do some sewing!  Several years ago, Dick Eastman blogged about the Progeny Charting Companion program and its ability to produce an embroidery pattern from your family tree.  "What a wonderful idea," I thought, and soon after that, I was able to replace my 35 year old sewing machine with a new one capable of following such a pattern.  Then came the "busy-ness" of the last few years.  I still haven't tried that program but, ever since I discovered some ancestors who were "artisans in fireworks", I have had an idea in my mind - and I finally managed to execute that in time to wear to the show.

Okay, the hall was too warm to actually wear the hoody *in* the show, but I'd achieved my goal!  I'm now on the look-out for other items I can embroider with bits of my family history!

At the show, I was helping to man the ISOGG stand (ISOGG = International Society of Genetic Genealogy).  We were so busy throughout most of the time that I was amazed I hadn't lost my voice - it seemed like every time I sat down, another visitor would arrive with a query.  Hopefully, we will be seeing a rapid increase in DNA testing in the UK over the coming months, especially now all three of the main companies (FamilyTreeDNA, 23andMe and Ancestry) are marketing their products here.  Another enjoyable aspect of WDYTYA was meeting many of the ISOGG members who came across from the United States to assist with the practical aspects of testing on the FTDNA stand.  Although ISOGG itself is an independent organisation and, as far as possible, information is always presented without bias, many of us would admit to having a personal preference towards FTDNA, not least because they are the only testing company that support the YDNA and mtDNA projects.  (Having taken the autosomal test at all three companies, I think it only fair to mention that I can find pros and cons for each of them.)

There was a fair amount of catching up to do, after the three days of "doing nothing" at WDYTYA, which was followed by a deadline for some paperwork.  But, now that's been met, I find myself actually restarting my Genealogy Do-Over. 

I wonder whether I can get to week 13 without any further interruptions!

Genealogy Do-Over "restart"

It's time to restart my restart!

As I described in my last post, I needed to postpone my Genealogy Do-Over, as other activities have had to take priority recently.  However, I'm now back again - and, amazingly, back before the repeat of the scheduled Do-Over week that I had paused at.  So that gives me a bit of time to refresh my memory of what I had been doing (seems to be an increasingly necessary task these days!)

There has still been some - almost unintentional - progress on the Do-Over topics in the interim.  I have bought a new laptop, as the start up of my previous one would have been beaten by a snail doing a marathon.  Unlike previous occasions when I have changed computers, this time I do not intend to just transfer everything across in one go, thus maintaining (and perhaps being limited by) the old file structure.  Instead,  I will take the opportunity to redesign my filing system - which was one of my aims for the Do-Over.  Since I am keeping the old laptop to use whenever I run a stand for the Guild of One-Name Studies at a family history fair, the new laptop has also been a good opportunity to purchase full and/or up-to-date versions of the programs I'm going to be using from now on, such as Legacy and Evidentia.

So the next couple of weeks will be a steep learning curve, as I start to get to grips with these properly, as well as continue trying to build the use of programs such as OneNote and Evernote into my routine, in order to maintain a good system for my research files and the Parry data collection, in particular.  Thankfully, many of the programs have active User Groups, which I imagine I shall be making frequent use of!

Sunday, 5 April 2015

How am I doing? A Do-Over review

Half way through the Do-Over, I started to assess my progress.  Seven weeks later, the post is still sitting here unfinished - which probably says it all!

Other aspects of life got in the way again, and with the "Who Do You Think You Are? Live" exhibition coming up soon at the NEC, Birmingham, as well as family activities, the Do-Over situation won't improve anytime soon.   It isn't really a problem to me - I always knew that applying the lessons of the Do-Over would take longer than the 13 weeks of the scheme.  Thomas MacEntee is now repeating the series, for those who joined late, or who just want to repeat it.  Although I would have liked to have completed the full sequence of topics, if only at a basic level, before repeating them to add further layers of knowledge and experiences, for this second time around perhaps I will just pick it up again when he reaches week 7.

(Did I see a reference to cycles 3 and 4 among Thomas's comments on Facebook?  That will certainly help to keep me going all year. Perhaps by December I will have made it to week 13! :-) )

For those new to the idea, the Do-Over Facebook group can be found at https://www.facebook.com/groups/genealogydoover/

Tuesday, 10 March 2015

Genealogy Do-Over week 6

The topics for week 6 of the Genealogy Do-Over were:
1. Evaluating Evidence
2. Reviewing Online Education Options

Collecting data, or "evidence", is easy - I do it all the time, particularly for my one-name study.  A new database is announced, I visit the site, search for "Parry", and then collect any results.  Sometimes this is only at the index level as, depending on the format of the database, extraction of any additional details can be quite time consuming.  And often, because the Parry ONS is a fairly large study, that is as far as I get.  Yes, eventually, when I am identifying individuals, and tracking the events of their lives, the expectation is that I will take a closer look at the details and be able to add the information to a person in a pedigree.  But that does not always happen to start with, and even an index level of detail can have value for a one-name study, so that's okay.  It is still progress on the study.

However, it is another step to actually evaluate the evidence found.  But this is an essential step, if we're aiming to produce reliable pedigrees, or life histories, or even just statistics from the original database.  After all, how complete *is* that database?  Are the results really representative of what I think they are?

Sometimes the need for evaluation of a source is obvious.  When I first started collecting any references to the Parry surname, I soon realised that there were certain "well known" Parry families.  For example, 'The Parrys of Poston', in Herefordshire, who are frequently noted because descendants include Blanche Parry, Chief Gentlewoman to Queen Elizabeth I.   But, when I found the often quoted source, a pedigree for the family in the "History of Breconshire", warning bells began to ring.  It wasn't just the tracing of the tree back into the 'myths of time', from "Catherine, widow of Thomas Lord Laci", through "Idio Wyllt, Earl of Desmond",  and back to the kings of Ireland, but basic issues, such as the almost total absence of dates, and even occasionally names, for some of the more recent individuals in the pedigree. 

Clearly there are questions to be asked about the accuracy and reliability of such a work. 

But the necessity for evaluation of all sources is easy to forget when dealing with some of the more recent "evidence" we collect.  So we take documents such as census records or birth certificates at face value.  Occasionally, we might perhaps spot an anomaly that causes us to ponder but, generally, we can be tempted to think, "it's an official record, it must be accurate".  We can also fall into the trap of assuming that, just because we can only find one entry for the name we're looking for, then that *must* be the relevant one.  I was amused to see a blog post recently, by Cherie Tabor Cayemberg, which illustrated exactly this point, as she was searching for the death date of a relative with what seemed to be a rare combination of names, but found two possibilities in the same area.  How easy it would have been to be misled, if there had only been one obituary available (Tuesday's Tip - The Case of the Two Viola Vanias http://haveyouseenmyroots.blogspot.co.uk/2015/03/tuesdays-tip-case-of-two-viola-vanias.html ).

These days, it is so easy to add details to a family tree without going through a process of evaluation (especially when the tree is on the same site as the databases themselves, such as on Ancestry, with their "Save to person in your tree" button).  Once entered into a tree, there's even less chance of a later reader examining why a particular connection was made, or how strong the evidence was for a stated fact.   Good research, that produces results which can be relied upon, requires a better examination of every source, or piece of evidence, and a ranking of reliability.  That was something I was aiming at with my Colston Parry pedigree at http://freepages.family.rootsweb.ancestry.com/~parryresearch/colston.htm , but I still have some way to go to build this process into my practice. 

The principles of evaluating genealogical evidence, usually based on the work of Elizabeth Shown Mills (see https://www.evidenceexplained.com/content/quicklesson-17-evidence-analysis-process-map ), can be found on many sites. Thomas MacEntee added the relevant considerations as columns in his Research Log spreadsheet but, for a working reference sheet, I quite like the way Dawn Kogutkiewicz formatted the items as questions (at http://dawninggenealogy.blogspot.co.uk/2015/02/genealogy-do-over-week-6.html?spref=fB ).  So these are now entered into my OneNote Research Notebooks, to be referred to whenever I am collecting data.  I have also added a note to develop some questions for myself, that I can apply to a whole database prior to even looking at individual entries, as evaluation at that level will be necessary if I am drawing conclusions based on index level information.

Reviewing Online Education Options
This topic made me laugh - as, if "doing the Do-Over" wasn't enough of an example of online education, I don't know what is!

We all need to keep learning, as Thomas MacEntee says, not just to improve our own research, but to keep up with new developments and to learn about new areas of research.  So, do I need a specific 'education plan', as he suggests setting?  One needs to remember that those whose livelihood involves genealogical education will keep on producing 'new' courses, webinars, etc., as long as people keep attending them.  The danger is that there is so much information 'out there', that we can easily spend all our time trying to learn everything, and we never actually 'do' anything.

So, no, I am not going to create a new 'education plan' this week - in a sense, I already have one, because the goals that I set out initially for this year of my Do-Over, such as mastering the new techniques and new programs that I am using, involve a lot of learning.  So I shall continue to focus on the items already specified and to try to ensure that what I learn actually gets embedded into my practice.