Forward vs reverse genetics, phenotype & genotype

I’m always forgetting forward vs. reverse genetics, so I thought I’d share a memory tip stemming from phonetics! I hate when people introduce cliff-hangers and/or teases, and then leave you waiting, so here it is: Forward genetics starts from the PHenotype. If you can remember that sound-alikeness you can then know reverse genetics must start from the genotype. Those are just fancy words for the version of the gene (genotype) & the effects of a gene (phenotype) – like having a version of a gene (allele) for brown hair would be your genotype & brown hair would be the phenotype. If you take a brown-haired person and try to figure out the genetic reason why their hair is brown, that would be forward genetics. Say you discover that reason, a certain gene version. So you take a blonde-haired person, introduce that gene version, and see the person’s hair turn brown. That would be reverse genetics.

video added 2/19/22

In a way, it’s like light. White light is made up of light waves of every color of the rainbow. The only reason things look colored to us is that color-absorbing parts of molecules (chromophores) absorb some of that light. But they only absorb light of specific wavelengths. Which wavelengths they absorb depends on their molecular makeup, so different things look different colors because they’re removing different wavelengths from the light that’s bounced back (reflected) or goes through (transmitted). The color we see is the sum of all the remaining wavelengths, which is no longer white. more here: http://bit.ly/lightleafcolor

Forward genetics is like seeing light of a certain color (analogous to phenotype), then shining it through a prism to see what wavelength is missing (analogous to genotype). Reverse genetics is like removing a specific wavelength (analogous to changing the genotype) then looking to see what the resulting color is (analogous to phenotype).

That helps to visualize things, but when it comes to biological phenotypes, we’re often dealing with changes to proteins, which are like cellular machines with lots of working parts. The “instruction manual” for putting together all these proteins is your genome, which is a collection of really long, really coiled strands of DNA called chromosomes. Genes are the equivalent of chapters of these manuals – they’re stretches of DNA that hold the instructions for making proteins (often – sometimes they have instructions for making functional RNAs, but I’m not going to go there today). Therefore, changes to the genetic sequence (either naturally-occurring mutations or experimentally-introduced ones), which we classify as genotypic changes, can change the protein.

At that point, you’ve changed the protein’s phenotype, cuz it “looks different,” but phenotype is kinda like one of those “if a tree falls in the forest does it make a noise?” situations. Phenotype typically refers to a change we observe and/or measure. So if that change to the protein (caused by a change to the gene) was so mild that it doesn’t change anything enough to be detected, we wouldn’t say it was a phenotypic change. But, if that change causes the protein to stop working or something, causing problems that we detect, we would say that there was a phenotypic change.

And here we get into a problem: whether or not you detect a change depends in part on whether you’re measuring the thing that has changed. Say a clown breaks his leg. If you go to a circus and measure how fast that clown can juggle, and compare that to how fast the clown normally juggles, you won’t be able to tell he broke his leg (assuming the cast is hidden under a costume or something). But, if you ask that clown to ride a unicycle, now you see that something’s “changed.”

An example of a genotype/phenotype version of this would be if a mutation causes changes to a person’s toes, but the doctors never examine those (sorry, had to rhyme there!). At a cellular level, maybe there’s a mutation to a protein enzyme (reaction mediator/speed upper) that’s needed for breaking down (metabolizing) the sugar fructose. But you’re measuring the cells’ ability to metabolize the sugar glucose. So you won’t detect the defect. Similarly, if you only feed the cells a different sugar, you won’t know anything’s wrong. But if you switch them to a fructose-only diet, the cells start having issues. In that same example, at the molecular level, maybe you *are* measuring fructose metabolism, but you’re using an assay (experiment) that only looks at an endpoint measurement – give the enzyme a bunch of fructose and a lot of time and see if it can all get broken down. If the mutation causes the enzyme to be slow but not non-functional, you might not be able to detect the change (kinda like if you check at the finish line of a marathon 12 hrs in to see how many people finished).
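To make the endpoint-vs-speed point concrete, here’s a minimal sketch in Python. It assumes the fructose gets broken down by simple first-order decay, and the rate constants, time points, and detection limit are all made-up numbers for illustration (not real enzyme data).

```python
import math

# Hypothetical rate constants (per hour), chosen only for illustration
NORMAL_K = 2.0
SLOW_MUTANT_K = 0.5

def fraction_remaining(rate_constant, hours):
    """Fraction of fructose left after `hours`, assuming first-order breakdown."""
    return math.exp(-rate_constant * hours)

def looks_done(rate_constant, hours, detection_limit=0.01):
    """True if the leftover fructose is below what the (assumed) assay can detect."""
    return fraction_remaining(rate_constant, hours) < detection_limit

# Endpoint assay: check once, after a long time. Both enzymes look fine.
print(looks_done(NORMAL_K, 12), looks_done(SLOW_MUTANT_K, 12))

# Earlier time point: the slow mutant has clearly broken down less fructose.
print(fraction_remaining(NORMAL_K, 1), fraction_remaining(SLOW_MUTANT_K, 1))
```

With these numbers, the 12-hour check can’t tell the two enzymes apart, but looking at 1 hour can, which is the marathon-finish-line problem in code form.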

The situation is even more complicated because some molecules are only made in certain kinds of cells at certain times. And so you might be searching in vain for effects of something in the wrong cells and/or at the wrong time. Like if you go to an acrobat circus to see if anything’s askew, but the broken-legged clown isn’t in that show. Bottom line – “phenotype” can be conditional and complicated, and messy….

But, say you find some phenotype – like a subpar circus act or purple toes or cells failing to grow – and you want to figure out what’s causing it. This is the realm of FORWARD GENETICS. You go from the phenotype to the genotype. It always confuses me because if you look at the

genotype -> phenotype

situation, where a genotype (what version of a gene one has) leads to a phenotype (what measurable effects are seen as a result), it looks like phenotype to genotype would be reverse. But it’s not! In addition to the Ph/F mnemonic, it can help to think about forward genetics as “traditional genetics” – when Mendel was studying peas and stuff, he wasn’t going in and purposefully introducing mutations to specific genes (which would be reverse genetics). That would be bonkers! He didn’t even know genes existed. It was only later, after a lot of advancements in the field, that reverse genetics became possible.

More on reverse genetics later, but let’s get back to forward genetics. One important practical use of forward genetics is in medical genetics, where people are trying to figure out the genetic causes of diseases. Some diseases are caused by mutations to a single gene. In that case, the disease is usually inherited in a “Mendelian fashion.” Basically, you get one version of each non-sex-chromosome-located (i.e. autosomal) gene from each biological parent. Some diseases are “autosomal dominant” meaning you only need 1 bad copy to get the disease. If a parent has the disease (with 1 bad copy & 1 good copy), each child thus has a 50/50 chance of inheriting it. Other diseases are “autosomal recessive” meaning that you need 2 bad copies to get the disease. Therefore, in order for a person to inherit the disease, both parents must either have the disease or be a carrier for it (have 1 bad copy & 1 good copy). If both parents are carriers, each child has a 1/2 risk of being a carrier & a 1/4 risk of having the disease.
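If you want to check those odds yourself, here’s a small Python sketch of a Punnett square for a single autosomal gene. The function name and the allele letters are just made up for this example; it assumes each parent passes on either of their 2 copies with equal probability.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def offspring_genotypes(parent1, parent2):
    """Return {genotype: probability} for one autosomal gene.

    Each parent is a 2-letter string such as "Aa" (one letter per gene copy).
    Every combination of one copy from each parent is equally likely.
    """
    counts = Counter("".join(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}

# Autosomal recessive: "a" is the bad copy, and both parents are carriers
print(offspring_genotypes("Aa", "Aa"))

# Autosomal dominant: "D" is the bad copy, and one parent is affected
print(offspring_genotypes("Dd", "dd"))
```

For the carrier x carrier cross, "aa" (has the disease) comes out to 1/4 and "Aa" (carrier) to 1/2. For the dominant cross, "Dd" (has the disease) comes out to 1/2.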

Bottom line, since these things all have certain probabilities, you can look at genealogical trees, looking at who does/doesn’t have the disease to figure out the inheritance pattern. And then you can use genetic techniques to map out what gene sequences the people with the disease have in common that the people without the disease don’t have. An example of this classical form of gene mapping is the discovery of the gene that, when mutated, causes the lung (and more) disease cystic fibrosis. http://bit.ly/cysticfibrosisscience

Nowadays, DNA sequencing can be used to help, but things aren’t as easy as you might think because people have a LOT of DNA and there’s a lot of normal variation, so it can be hard to know if a change is really causative (which is where reverse genetics can come in to help (for example, if you introduce that change to cells in a dish, do you see a phenotype that might explain the disease?)).

Those techniques are useful for monogenic diseases, but most diseases aren’t that “simple” (with simple already being really hard!). Instead, most diseases are influenced by multiple genes as well as environmental factors, etc. To try to tease out what genes may be involved in some condition (disease or trait (like height, etc.)), scientists frequently turn to something called a GWAS (pronounced gee-waz). GWAS stands for Genome-Wide Association Study and basically it looks at genetic markers (specific DNA sequences that tend to have variation) spanning the “entire” genome for a lot of people and looks to see if people with that condition/trait have markers in common. These markers don’t tell you what gene might be involved, but, since DNA sequences that are close together on a chromosome are more likely to be inherited together, the markers serve as a “proxy” for nearby genes. So if scientists find a “hit,” a shared marker, they can then look to sequence databases to see what genes are nearby which might help explain things.
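Here’s a toy Python sketch of the “do people with the condition share a marker?” idea. It is not how any real GWAS pipeline works: the marker names and counts are invented, and it just uses a plain chi-square statistic on a 2x2 table (with/without the marker, with/without the condition) to rank 3 markers. A real study tests a huge number of markers and needs much stricter statistics.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Made-up counts per marker:
# (cases with marker, cases without, controls with marker, controls without)
markers = {
    "marker_1": (50, 50, 48, 52),
    "marker_2": (80, 20, 40, 60),
    "marker_3": (30, 70, 33, 67),
}

# Bigger score = marker is more unevenly split between cases and controls
scores = {name: chi_square_2x2(*counts) for name, counts in markers.items()}
top_hit = max(scores, key=scores.get)

print(scores)
print(top_hit)
```

In this made-up data, marker_2 is much more common in people with the condition, so it’s the “hit” whose neighborhood you’d go look up in the sequence databases.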

Those are examples in which forward genetics is used to explain naturally-occurring variation. And they’re great for studying human genetics. But other forward genetics projects are more actively experimental – introducing random mutations (not to people though!) and then screening for cool phenotypes to study.

In the early days, scientists would use “generic” mutagens like radiation to introduce random mutations in genes (more here: http://bit.ly/2TkzbKR ), then look for cells/flies etc. with cool features to study. Then they’d have to figure out where the mutation was, which could be hard.

Nowadays, scientists can use a “library” of sequences as guides to knock down lots of different genes in different cells using CRISPR/Cas or RNAi (more here: http://bit.ly/2EfCycW ) – screen the cells for some effect, then look to see what sequence was in that cell.

Once you find something, you have to flip things around and go in the reverse direction to find out more about what the gene does and how it does it. How you go about this depends in part on what “kind” of scientist you are. Cell biologists alter the gene “in place” in cells to get a sense of how it works “in the wild” whereas protein biochemists like myself take a copy of the gene out of the original place and edit it in a tube using site-directed mutagenesis. This is a lot different than random mutagenesis – we introduce precise mutations at precise locations to test how they affect different things. We then stick it into expression cells (often bacteria or insects) to make the protein for us. Then we purify the protein and study it. Less “real” but more controllable – so we can test directly what specific parts do. more here: http://bit.ly/sitedirectedmutagenesis2

If you want to study the protein in the wild….CRISPR/Cas is a way to use RNA as a guide to direct a protein called Cas (Crispr-associated-protein) to a specific location on DNA (target sequence) and cut it, then let the cells fix it. If you give it a matching piece you can have the cells put it in when they fix it, and thereby change the sequence. If you don’t give it an insert, it’ll try to stitch it together but often make mistakes that make the cell not make the protein. more here: http://bit.ly/crisprdoudna I’ve never actually used CRISPR, but I do a LOT of cloning. A LOT A LOT. I think I’m at ~170 clones since I joined the lab…

Another way to study the effects of a protein “in the wild” that is NOT a form of genetic engineering is to reduce the levels of the protein without actually changing the gene. You can do this with a process called RNA interference (RNAi) that “takes the protein off the menu” without destroying the original recipe. more here: http://bit.ly/rnaiknockdown

It can do this because the “original recipe” for a protein is its gene, which is written in DNA and locked up in a membrane-bound compartment of the cell called the nucleus. RNAi only affects RNA copies of the gene that are made to serve as messengers between the gene in the nucleus and the protein-making machinery (ribosomes) in the cytoplasm – hence the name “messenger RNA” or mRNA.

So the basic process if a cell wants to make a protein is -> find the genetic instructions in the DNA (easier said than done because you have a LOT of DNA and it’s wound up tight – transcription factors can help make the region accessible and call in helpers to transcribe it). TRANSCRIPTION is a process in which RNA Polymerase (RNA Pol) makes an RNA copy of the DNA gene. This messenger RNA (mRNA) then gets exported out of the nucleus into the general cell part (cytoplasm) where ribosomes read it and make the corresponding protein in a process called TRANSLATION.
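The transcription-then-translation flow can be sketched in a few lines of Python. This is a heavily simplified toy: the “gene” is a made-up 15-letter sequence, the codon table only includes the handful of codons that gene uses (the real table has 64), and it skips everything about finding the gene, RNA processing, and export.

```python
# Partial codon table: only the codons needed for this toy example
CODON_TABLE = {
    "AUG": "Met",   # also the usual start codon
    "UUU": "Phe",
    "GGC": "Gly",
    "AAA": "Lys",
    "UAA": "STOP",
}

def transcribe(coding_strand_dna):
    """TRANSCRIPTION: the mRNA matches the coding DNA strand, with U in place of T."""
    return coding_strand_dna.upper().replace("T", "U")

def translate(mrna):
    """TRANSLATION: read 3 letters (a codon) at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein

gene = "ATGTTTGGCAAATAA"  # hypothetical toy gene
mrna = transcribe(gene)
protein = translate(mrna)

print(mrna)      # AUGUUUGGCAAAUAA
print(protein)   # ['Met', 'Phe', 'Gly', 'Lys']
```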

With RNAi, you stick RNA in the cells that acts as a guide to direct mRNA-destruction-machinery to target mRNAs with complementary sequences. This allows you to study the effects of proteins “in place” in a cell but in a way that is NOT genetic engineering because you’re only changing mRNA copies of the gene so you’re only affecting protein levels not destroying the recipe. But you’re also not changing the instructions so you’re only looking at normal vs. reduced levels of the protein, so you can’t make changes to see what different parts of the protein do, which is where protein biochemists can come in to help complement things.
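The “complementary sequences” part is the key to how RNAi picks its targets, and it’s easy to sketch in Python. Everything here is hypothetical: the mRNA names and sequences are invented, the guide is much shorter than a real one, and real targeting involves protein machinery and tolerates some mismatches. The sketch only shows exact base-pairing (A with U, G with C) between a guide and a stretch of mRNA.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Sequence that would base-pair with `rna` (paired strands run in opposite directions)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def is_targeted(guide, mrna):
    """True if the guide can pair perfectly with some stretch of the mRNA."""
    return reverse_complement(guide) in mrna

# Made-up mRNAs in a made-up cell
mrnas = {
    "fructose_enzyme": "AUGUUUGGCAAAUAA",
    "other_protein": "AUGCCCGAUUGGUAA",
}

# Design a guide that pairs with a stretch of the fructose enzyme's mRNA
guide = reverse_complement("UUUGGCAAA")

knocked_down = [name for name, seq in mrnas.items() if is_targeted(guide, seq)]
print(guide)          # UUUGCCAAA
print(knocked_down)   # ['fructose_enzyme']
```

Notice that nothing in the code touches a DNA sequence: only the mRNA copies get matched, and the gene itself is left alone.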

Speaking of complementing things, before I go, one more terminology thing that often trips me up – sometimes scientists talk about “rescuing a phenotype.” My first thought is usually that the phenotype has been rescued (i.e. it’s present), but it really means that the cell, etc. is rescued *from* that phenotype, and is now “normal.” For example, going back to our mutated fructose metabolizing enzyme, if a scientist adds the normal version of the gene to those cells, that gene can complement the bad gene and compensate for the defect, allowing the fructose-makes-cell-get-messed-up phenotype to go away, and the cell to act normally.

P.S. this post was inspired by an episode of my new favorite podcast, The Skeptics’ Guide to the Universe – it’s been keeping me company during some really monotonous experiments, and they had a “what’s the word” segment on what phenotype means. https://www.theskepticsguide.org/podcasts

more on topics mentioned (& others) #365DaysOfScience All (with topics listed) 👉 http://bit.ly/2OllAB0
