Knowing Neurons
What are genome-wide association studies?

by Rebeka Popovic

To operate properly, each cell relies on thousands of proteins performing their functions at the right time and place within the boundaries of its membrane. The function of a protein depends on its structure, which is dictated by the genetic code written in our DNA. More specifically, the sequence of nucleotides in the DNA, i.e., the order of adenine (A), thymine (T), guanine (G), and cytosine (C), codes for the amino acid sequence of the protein (Brown, 2002). The human nuclear genome comprises approximately 3.2 billion nucleotides of DNA. Now, that is a lot of A’s, T’s, G’s and C’s!
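To make the link between nucleotides and amino acids concrete, here is a minimal Python sketch. It reads a short DNA sequence three letters (one codon) at a time and looks each codon up in a table. The sequence is invented for illustration, and `CODON_TABLE` is deliberately partial: it holds only the handful of standard codons this example needs, not the full genetic code.

```python
# Partial codon table (coding-strand DNA codon -> one-letter amino acid).
# Only the codons used below are included; "*" marks a stop codon.
CODON_TABLE = {
    "ATG": "M", "GTG": "V", "CAT": "H", "CTG": "L",
    "ACT": "T", "CCT": "P", "GAG": "E", "TAA": "*",
}

def translate(dna):
    """Translate a DNA string codon by codon until a stop codon is reached."""
    protein = []
    usable_length = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    for i in range(0, usable_length, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# Hypothetical sequences that differ at a single nucleotide (A -> T).
reference = "ATGGTGCATCTGACTCCTGAGGAGTAA"
variant = "ATGGTGCATCTGACTCCTGTGGAGTAA"

print(translate(reference))  # MVHLTPEE
print(translate(variant))    # MVHLTPVE
```

Changing one letter turns the codon GAG (glutamate, E) into GTG (valine, V), so a single-nucleotide difference is enough to alter the protein that is made.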

Whilst differences in the DNA sequence make us unique, for example by dictating the color of our eyes and hair, the same differences also influence our susceptibility to disease. Sometimes, variations in the genetic code can lead to changes in a gene’s instructions for synthesizing a specific protein, causing it to malfunction or not be produced at all. In monogenic disorders, such as Huntington’s disease, a variation in a single gene is responsible for the disease (Nopoulos, 2016). However, many neurological diseases are complex or multifactorial, meaning they are caused by the interaction of multiple genetic and environmental factors (National Human Genome Research Institute, 2023). In these cases, the disease’s onset is influenced by the combined effects of many genes, each contributing a small effect that adds to the individual’s predisposition to a certain disorder (Jackson et al., 2018). Therefore, one of the main challenges of understanding such diseases is to identify the specific genes and interactions involved. So, how do scientists detect the many genes that contribute to a disease?

Genome-wide association studies (GWAS) are a research approach used by scientists to find genes associated with a particular disease or trait. This method compares the genomes of large groups of individuals with and without the particular trait. Researchers can then identify small genetic variations, known as single nucleotide polymorphisms (SNPs), in or near specific genes that are more prevalent in individuals with the disease of interest (Uffelmann et al., 2021). They can then begin to investigate how these variations affect gene and protein function, and how interactions between these genes lead to the disease. GWAS have become a very important tool to detect genetic differences associated with disease. Between the first GWAS in 2005 (Klein et al., 2005) and 2020, more than 4,300 papers describing 4,500 different GWAS were published (Loos, 2020). Remarkably, these studies identified over 55,000 unique loci for nearly 5,000 diseases and traits.
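The core comparison can be sketched for a single SNP. The Python snippet below counts how often each allele appears in cases and in controls, then applies a Pearson chi-square test to the resulting 2x2 table. This is a simplified illustration rather than the method of any particular study: the function name and the allele counts are invented, and real analyses typically use regression models that also account for covariates such as age, sex, and ancestry.

```python
import math

def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Returns the chi-square statistic, its p-value, and the odds ratio.
    Assumes all four counts are greater than zero.
    """
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    # Pearson chi-square for a 2x2 table: n * (ad - bc)^2 / product of margins
    numerator = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2
    denominator = ((case_alt + case_ref) * (ctrl_alt + ctrl_ref)
                   * (case_alt + ctrl_alt) * (case_ref + ctrl_ref))
    chi2 = numerator / denominator
    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio

# Invented counts: 1,000 cases and 1,000 controls, two alleles per person.
chi2, p_value, odds_ratio = allelic_association(600, 1400, 500, 1500)
print(f"chi2 = {chi2:.2f}, p = {p_value:.1e}, odds ratio = {odds_ratio:.2f}")
```

In this made-up example the alternative allele makes up 30% of alleles in cases and 25% in controls, giving an odds ratio of about 1.29. A GWAS repeats a test like this for hundreds of thousands or millions of SNPs across the genome.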

Several steps are involved in any GWAS (Uffelmann et al., 2021). First, the scientists need to collect DNA and phenotypic information from a group of individuals, noting their disease status and other general information such as sex, age, and ethnicity. Then, the genetic code of participants needs to be determined using common genotyping or sequencing strategies, such as microarrays or next-generation sequencing methods. Whilst determining the first sequence of the human genome took 13 years and $3 billion, nowadays this analysis can be done in weeks and costs only around $1,000 (National Human Genome Research Institute, 2022; Wetterstrand, 2021). Finally, a statistical test for association is performed and the results are interpreted through multiple post-GWAS analyses. The primary output of a GWAS can be visualized using a Manhattan plot, named for its similarity to the Manhattan skyline: the data points form a silhouette that resembles tall skyscrapers towering above the city.

An example Manhattan plot (Kunkle et al., 2019).
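The height of each "skyscraper" in a Manhattan plot is the negative base-10 logarithm of a variant's p-value, so smaller p-values produce taller towers. The sketch below computes those heights for a few invented variants. The cutoff of 5 × 10⁻⁸ is a widely used convention for genome-wide significance rather than a figure from this article; it corresponds to a 0.05 significance level corrected for roughly one million independent tests. The variant names and p-values are hypothetical.

```python
import math

SIGNIFICANCE_LEVEL = 0.05
ASSUMED_INDEPENDENT_TESTS = 1_000_000  # conventional assumption, not from the article
GENOME_WIDE_THRESHOLD = SIGNIFICANCE_LEVEL / ASSUMED_INDEPENDENT_TESTS  # about 5e-8

def manhattan_heights(results):
    """Convert (variant, chromosome, p-value) rows into Manhattan plot heights.

    Returns rows of (variant, chromosome, -log10(p), passes_threshold).
    """
    rows = []
    for variant, chromosome, p_value in results:
        height = -math.log10(p_value)
        rows.append((variant, chromosome, height, p_value < GENOME_WIDE_THRESHOLD))
    return rows

# Hypothetical association results for three variants.
example_results = [
    ("variant_a", 1, 0.03),
    ("variant_b", 6, 2e-6),
    ("variant_c", 19, 1e-12),
]

for variant, chromosome, height, significant in manhattan_heights(example_results):
    print(f"{variant} (chr {chromosome}): height = {height:.2f}, significant = {significant}")
```

Only the third variant clears the threshold. On a plot, variants are placed along the horizontal axis by chromosome and position, and this threshold is usually drawn as a horizontal line.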

Prior to GWAS, mapping genes to a particular disease mainly relied on family-based approaches, such as linkage analysis, which tracks the transmission of specific DNA sequence variants, also known as alleles, between genetically related individuals. This method is powerful for detecting one or a few rare genetic factors with high penetrance, meaning that the trait will almost always be apparent in an individual carrying the allele. That makes it very sensitive for monogenic, or Mendelian, diseases. However, it is not very sensitive for complex diseases, where genetic factors have low penetrance, do not cluster in families, or involve many common variants (Londin et al., 2013). Therefore, when it comes to complex diseases, GWAS can provide better results by enabling us to look at data from many unrelated individuals instead of only observing data from families.

GWAS have been conducted for many neurological disorders and have helped elucidate the molecular mechanisms underlying these diseases. For example, GWAS looking into Alzheimer’s disease (AD) have identified several genetic variants that increase the risk of developing AD. One such variant, R47H, is found in the Triggering receptor expressed on myeloid cells-2 (TREM2) gene, which encodes a receptor of the innate immune system expressed by microglia in the human brain (Ulland and Colonna, 2018). Follow-up studies have found that the R47H variant results in impaired TREM2 function. When studying AD in mice, scientists demonstrated that a lack of TREM2 reduces microglial activation, the process that normally allows these cells to recognize and fight AD-associated pathology (Song et al., 2018; Cheng-Hathaway et al., 2018). Therefore, by using GWAS, the scientists were able to find additional genes involved in AD progression, helping to explain how the brain’s immune cells contribute to this disease.

As amazing as they might seem, GWAS have some limitations. They can only explain a small portion of the total genetic risk for most diseases, as environmental and lifestyle factors, as well as other genetic variations that have not yet been identified, may also play a role in a given disease. Furthermore, GWAS can be affected by limited sample sizes and confounding variables. One example is population stratification, which occurs when differences in allele frequency between individuals with and without the disease are due to differences in ancestry rather than to genes associated with the studied disease (Tam et al., 2019). Nonetheless, they are another useful tool that can help us understand how our genetic make-up may predispose us to certain diseases, such as Alzheimer’s disease!
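Population stratification is easier to see with numbers. In the hypothetical Python sketch below, two ancestry groups carry an allele at different frequencies (20% and 60%), and the study happens to recruit most of its cases from one group and most of its controls from the other. All counts are invented, and the odds ratio is used here only as a simple measure of association.

```python
def odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Odds ratio for carrying the alternative allele in cases versus controls."""
    return (case_alt * ctrl_ref) / (case_ref * ctrl_alt)

# Invented allele counts as (case_alt, case_ref, ctrl_alt, ctrl_ref).
# Group A: allele frequency 20%; 800 cases and 200 controls.
group_a = (320, 1280, 80, 320)
# Group B: allele frequency 60%; 200 cases and 800 controls.
group_b = (240, 160, 960, 640)
# Pooling the two groups, as an analysis that ignores ancestry would do.
pooled = tuple(a + b for a, b in zip(group_a, group_b))

print("Group A odds ratio:", odds_ratio(*group_a))  # 1.0
print("Group B odds ratio:", odds_ratio(*group_b))  # 1.0
print("Pooled odds ratio:", round(odds_ratio(*pooled), 2))  # 0.36
```

Within each group the allele is equally common in cases and controls, so there is no association at all. Once the groups are pooled, the allele appears strongly protective, purely because of how ancestry and disease status were mixed in the sample. This is why GWAS analyses usually include a correction for ancestry.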

~~~

Written by Rebeka Popovic
Illustrated by Mary Bullock
Edited by Zoe Dobler, Johanna Popp, and Shiri Spitz Siddiqi

~~~

References

Brown, T. A. (2002). The Human Genome. In Genomes. 2nd edition. Wiley-Liss. https://www.ncbi.nlm.nih.gov/books/NBK21134/

Cheng-Hathaway, P. J., Reed-Geaghan, E. G., Jay, T. R., Casali, B. T., Bemiller, S. M., Puntambekar, S. S., von Saucken, V. E., Williams, R. Y., Karlo, J. C., Moutinho, M., Xu, G., Ransohoff, R. M., Lamb, B. T., & Landreth, G. E. (2018). The Trem2 R47H variant confers loss-of-function-like phenotypes in Alzheimer’s disease. Molecular Neurodegeneration, 13(1), 29. https://doi.org/10.1186/s13024-018-0262-8

Jackson, M., Marks, L., May, G. H. W., & Wilson, J. B. (2018). The genetic basis of disease. Essays in Biochemistry, 62(5), 643–723. https://doi.org/10.1042/EBC20170053

Klein, R. J., Zeiss, C., Chew, E. Y., Tsai, J.-Y., Sackler, R. S., Haynes, C., Henning, A. K., SanGiovanni, J. P., Mane, S. M., Mayne, S. T., Bracken, M. B., Ferris, F. L., Ott, J., Barnstable, C., & Hoh, J. (2005). Complement factor H polymorphism in age-related macular degeneration. Science, 308(5720), 385–389. https://doi.org/10.1126/science.1109557

Londin, E., Yadav, P., Surrey, S., Kricka, L. J., & Fortina, P. (2013). Use of Linkage Analysis, Genome-Wide Association Studies, and Next-Generation Sequencing in the Identification of Disease-Causing Mutations. In F. Innocenti & R. H. N. van Schaik (Eds.), Pharmacogenomics: Methods and Protocols (pp. 127–146). Humana Press. https://doi.org/10.1007/978-1-62703-435-7_8

Loos, R. J. F. (2020). 15 years of genome-wide association studies and no signs of slowing down. Nature Communications, 11(1), Article 1. https://doi.org/10.1038/s41467-020-19653-5

National Human Genome Research Institute. (2023, March 23). Complex diseases. https://www.genome.gov/genetics-glossary/Complex-Disease

National Human Genome Research Institute. (2022, August 24). Fact sheet: Human Genome Project. https://www.genome.gov/about-genomics/educational-resources/fact-sheets/human-genome-project

Nopoulos, P. C. (2016). Huntington disease: A single-gene degenerative disorder of the striatum. Dialogues in Clinical Neuroscience, 18(1), 91–98.

Song, W. M., Joshita, S., Zhou, Y., Ulland, T. K., Gilfillan, S., & Colonna, M. (2018). Humanized TREM2 mice reveal microglia-intrinsic and -extrinsic effects of R47H polymorphism. Journal of Experimental Medicine, 215(3), 745–760. https://doi.org/10.1084/jem.20171529

Tam, V., Patel, N., Turcotte, M., Bossé, Y., Paré, G., & Meyre, D. (2019). Benefits and limitations of genome-wide association studies. Nature Reviews Genetics, 20(8), Article 8. https://doi.org/10.1038/s41576-019-0127-1

Uffelmann, E., Huang, Q. Q., Munung, N. S., de Vries, J., Okada, Y., Martin, A. R., Martin, H. C., Lappalainen, T., & Posthuma, D. (2021). Genome-wide association studies. Nature Reviews Methods Primers, 1(1), Article 1. https://doi.org/10.1038/s43586-021-00056-9

Ulland, T. K., & Colonna, M. (2018). TREM2—A key player in microglial biology and Alzheimer disease. Nature Reviews Neurology, 14(11), Article 11. https://doi.org/10.1038/s41582-018-0072-1

Wetterstrand, K. A. (2021). DNA Sequencing Costs: Data from the NHGRI Genome Sequencing Program (GSP). www.genome.gov/sequencingcostsdata

Author

  • Rebeka Popovic

    Rebeka is a PhD student at the University of Cambridge. Her current research explores molecular mechanisms driving Parkinson’s disease using the fruit fly as a model organism. She received her MSc in Neuroscience from King’s College London, specialising in neurodegeneration research. Outside the lab, she enjoys bouldering and listening to music. For more about Rebeka’s research and experience, please visit her full profile.
