RT-qPCR (protein recipe copy counting)

“Normal” PCR can take you far – but it only copies regions of DNA. Sometimes you want to take things to the next level… You want to copy things and get a readout while it’s copying. Oh – and it would be nice if you could start with RNA so that you could measure protein recipe copies (messenger RNAs) to see what a cell was probably making. Would that be possible? Yes! Enter Reverse Transcription Real Time quantitative PCR (RT-qPCR)!

Want to know how many copies of a protein recipe there are? Cue the qPCR! Just like a bakery might only serve strawberry pie in the summer and pumpkin pie in the fall, cells make different proteins at different times, and they alter the amount made based on demand by regulating how many copies of the messenger RNA (mRNA) recipe they make to give to the chefs (protein-makers called ribosomes). If we can count the recipe copies, we can get an idea about how much of each protein is being made under different conditions, and quantitative PCR (qPCR) provides a way to do this by making copies of these copies and watching them exponentially amplify in “Real Time.”

note: refreshed from February 2020, video added 2/22/22

more on the practical aspects of analyzing the data here: https://bit.ly/qpcranalysis

Quantitative PCR (qPCR) is a way to “count copies” of specific pieces of DNA. When those specific pieces are DNA copies of RNA (like the RNA copies of protein recipes) we call it Reverse Transcription (RT) qPCR. Confusingly, RT can also stand for Real Time, which refers to the measuring: the products are detected as they’re made because of the fluorescence they cause (either because of generic dyes or specific probes). Unlike traditional PCR (Polymerase Chain Reaction), which you use just to make copies of defined stretches of DNA, with the goal of making as many copies as possible, RT-qPCR counts the copies as they’re made to figure out how many copies you started with.

In the beginning there aren’t enough copies to measure directly, so you stabilize the recipe copies and make more copies of them to amplify the signal to a detectable level – the more copies you start with, the fewer amplification cycles you’ll need to do this – so you can compare how many cycles are required for different recipes, a number referred to as the quantification cycle (Cq) value – a lower Cq value means more recipes, and likely more protein. Here’s the gist of how it works.

Cells are a bit like bakeries, where the “baked goods” are molecular workers called proteins, and their recipes are called genes. The “original recipes” are written in DNA form in “cookbooks” called chromosomes, which are really long, really coiled up, strands of DNA that have lots of recipes – this DNA is precious so you don’t want to mess with it – if mutations are introduced, dysfunctional proteins can be produced!

So special measures are taken to protect it. In eukaryotes (which for the most part basically means non-bacteria) it’s actually physically “locked up” in a membrane-bound compartment called the nucleus. If a cell wants to make a protein, it first makes a copy of the recipe, and it gives the copy to the chefs.

Instead of making the copies in DNA, which is pretty stable, it writes the copies in DNA’s “cousin” RNA. They’re really similar – both have 4 letters (nucleotides), each with a generic sugar-phosphate part that allows for linking together to form chains (RNA’s sugar has 1 more oxygen than DNA’s) & 1 of 4 unique nitrogenous bases (“bases”) that allow for strand-to-strand “zipping up” through specific base pairing (A to T (or U in RNA) and C to G). more here: http://bit.ly/nucleicacidstructure

The RNA copy of a gene that gets made is called messenger RNA (mRNA), and it’s made in a process called TRANSCRIPTION. In transcription, an enzyme called RNA Polymerase (RNA Pol) travels along the DNA and makes a letter-for-letter RNA copy. Then this copy is edited – in the process of SPLICING, regions called INTRONS (which don’t code for protein, though they can have regulatory roles) get removed and the protein-coding parts (EXONS) get stitched back together to form the finished recipe. more here: https://bit.ly/altsplicing
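Here’s a toy sketch of transcription and splicing in Python. The sequence is made up, and marking exons in UPPERCASE and the intron in lowercase is just a convention for this example (real splicing machinery recognizes sequence signals, not letter case). It also takes the shortcut of starting from the coding strand, which reads the same as the mRNA except for T vs U.

```python
# Toy gene (made-up sequence): exons in UPPERCASE, intron in lowercase.
def transcribe(coding_strand: str) -> str:
    """Rewrite the coding-strand DNA letters in RNA letters (T -> U)."""
    return coding_strand.replace("T", "U").replace("t", "u")

def splice(pre_mrna: str) -> str:
    """Drop the lowercase (intron) letters and join the uppercase (exon) letters."""
    return "".join(base for base in pre_mrna if base.isupper())

gene = "ATGGCCgtaagtttcagGATTAA"
pre_mrna = transcribe(gene)   # still contains the intron
mrna = splice(pre_mrna)       # exons only
print(pre_mrna)
print(mrna)
```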

In addition to the unique instructions parts, all of the recipes get a generic “cover” and “backmatter” – a 5’ cap and a 3’ poly-A tail – these generic parts help the cell know that these are recipes & provide room for processing machinery to latch on – and the tail will come in handy later…

And then the recipes get handed over to the chefs (ribosomes) for protein-making (TRANSLATION). Proteins are made up of strings of amino acid “letters.” In the process of TRANSLATION, ribosomes travel along the recipe and “servants” called transfer RNAs (tRNAs) bring them the next amino acid to add. They know which to add because the tRNA has a 3-letter anticodon that complements the 3-letter codon on the mRNA. more here: https://bit.ly/translationtimestwo
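As a rough sketch of that codon-by-codon reading, here’s a tiny Python example. The codon table only includes the handful of codons this made-up recipe uses (the real genetic code has 64), and the function names are just for illustration.

```python
# Only the codons used in this toy example; the real genetic code has 64.
CODON_TABLE = {"AUG": "Met", "GCC": "Ala", "GAU": "Asp",
               "UAA": "STOP", "UAG": "STOP", "UGA": "STOP"}
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon(codon):
    """The tRNA anticodon (written 5' to 3') that pairs with an mRNA codon."""
    return "".join(RNA_PAIR[base] for base in reversed(codon))

def translate(mrna):
    """Read the recipe 3 letters at a time until a stop codon shows up."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein

protein = translate("AUGGCCGAUUAA")
print(protein, anticodon("AUG"))
```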

Lots of different recipes get made & when it comes to the chefs, it’s kinda like a first-come-first-serve basis, so the more copies of a recipe there are the more likely it is to be made. That’s why mRNA abundance is often used as a predictor of protein abundance, which is harder to measure (though it can be done through things like Western blotting). It’s not really quite this simple because there are mechanisms for translational and post-translational regulation, but qPCR won’t detect these – it only sees changes in mRNA levels (which reflect how much of a recipe gets transcribed & how quickly the copies get degraded).

In addition to making RNA copies of stretches of DNA in transcription, your cells make DNA copies of *all* your DNA – whole chromosomes – in a process called replication – each time before they divide so both daughter cells will get a full library. And it does this using a DNA Polymerase (DNA Pol). And we can use DNA Pol *outside* of cells to copy DNA in little tubes in a process called Polymerase Chain Reaction (PCR). More here: http://bit.ly/pcrtrain

We don’t want it to copy everything, but thankfully for us, we can utilize its limitations to our advantage to only copy defined stretches of DNA. You see, polymerases are a bit like trains that can only travel on double-stranded track, so if a Pol wants to travel along single-stranded DNA or RNA it has to lay down the complementary track as it goes. This limitation also applies to the starting – it can only start from double-stranded track. So if the thing you want to copy is single-stranded (like mRNA, or DNA after you unzip the strands by heating them up (melting)) you need to provide start “stations,” which you can do with short pieces of DNA called primers that you design to match where you want Pol to start. Another limitation is that it can only travel in 1 direction (5’ to 3’). more here: http://bit.ly/sequencetermstools

I’ve been talking about DNA Pol because that’s what we use in Reverse Transcription qPCR – even though we’re interested in RNA not DNA. RNA is less stable than DNA – in part because of its chemical makeup (RNA’s extra O makes it more prone to self-destruction) and in part because of enzymes called RNases (RNA chewers) that chew up RNAs that are foreign or no longer wanted. more here: https://bit.ly/rnafragility

So the first step in RT-qPCR (after you isolate the RNA) is making DNA copies of the RNA copies of the DNA recipes through REVERSE TRANSCRIPTION. Normal transcription goes DNA->RNA. REVERSE transcription goes RNA->DNA. It uses a different polymerase (instead of the usual DNA-RNA or DNA-DNA Pols you need an RNA-DNA Pol – we call such Pols reverse transcriptase) – and we call the DNA copies of the mature mRNAs complementary DNA (cDNA)
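In sequence terms, the first-strand cDNA is the reverse complement of the mRNA, written in DNA letters. A minimal Python sketch, using a made-up mRNA and an illustrative function name:

```python
def reverse_transcribe(mrna: str) -> str:
    """First-strand cDNA (written 5' to 3') for an mRNA (written 5' to 3')."""
    pair = {"A": "T", "U": "A", "G": "C", "C": "G"}  # RNA letter -> DNA partner
    # The new strand runs antiparallel to the template, so read the mRNA backwards.
    return "".join(pair[base] for base in reversed(mrna))

first_strand = reverse_transcribe("AUGGCCGAUUAA")
print(first_strand)
```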

The reverse transcriptase can make DNA copies of RNA, but it still has the limitation of needing a double-stranded starting platform – so you need to provide primers for it.

Usually you want to measure multiple mRNAs. Even if you’re only interested in levels of one recipe, you need to normalize it to something so you make sure that if you see twice as many strawberry pie recipes under a set of conditions it isn’t just because you had RNA from twice as many cells.

Traditionally this is done by comparing levels of “housekeeping genes” which are recipes that are made at pretty constant levels under all conditions. It’s kinda like a bakery that always sells cheesecake – unlike strawberries, cream cheese never goes out of season, so the same amount of cheesecake is made year-round and you can compare samples after adjusting so that they have equal amounts of cheesecake-making.
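One common way to do that adjusting is the “delta-delta Cq” calculation, sketched below in Python. The Cq values are made up, the function name is just for illustration, and the default of 2.0 assumes every cycle perfectly doubles the copies (real assays often fall a bit short, so treat this as a sketch rather than a full analysis pipeline).

```python
def relative_expression(cq_target_test, cq_ref_test,
                        cq_target_ctrl, cq_ref_ctrl, efficiency=2.0):
    """Fold change of the target recipe in test vs control samples,
    after normalizing each sample to its housekeeping (reference) gene."""
    delta_test = cq_target_test - cq_ref_test   # target relative to reference, test sample
    delta_ctrl = cq_target_ctrl - cq_ref_ctrl   # same thing for the control sample
    return efficiency ** -(delta_test - delta_ctrl)

# Made-up numbers: "strawberry pie" crosses the threshold 3 cycles earlier
# in summer, while "cheesecake" stays put.
summer_vs_fall = relative_expression(22.0, 18.0, 25.0, 18.0)

# Made-up numbers: everything is 1 cycle earlier (twice as much RNA loaded),
# so after normalizing there is no real change.
just_more_cells = relative_expression(21.0, 17.0, 22.0, 18.0)
print(summer_vs_fall, just_more_cells)
```

The second case is the “twice as many cells” trap: both recipes shift together, so the normalized fold change stays at 1.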

Some “cheesecake-like” genes that are frequently used as “endogenous internal controls” are: “cleaners” like UBC (a gene for ubiquitin which, as the name implies, is pretty ubiquitous! It’s expressed lots everywhere and is used as a tag to target unwanted proteins for degradation); genes for proteins involved in energy-making, including GAPDH (Glyceraldehyde 3-phosphate dehydrogenase, which plays a key role in glycolysis (sugar-breaking-down)) and ATP5B (part of the ATP synthase machine that turns ADP to ATP); and genes for proteins that form a cell’s cytoskeleton (tubes, filaments, etc. that give cells shape and help things move inside) like TUBB2B & TUBA1A (which make beta & alpha isoforms of tubulin, which makes up a system of microtubules that help things move in an orderly fashion throughout cells) & ACTB (which makes tubulin’s cytoskeletal friend actin).

Since you want to count multiple things, you usually start by stabilizing and reverse-transcribing all the mRNA and/or all the RNA (mRNA or otherwise). To just RT the mRNA you can take advantage of that generic poly-A tail we saw earlier. Since A pairs with T, you can use a short stretch (often somewhere around 12-20) of DNA T’s (an oligo(dT)) as an all-mRNA-specific primer. It’ll latch onto the poly-A tail to provide a starting point for the reverse transcriptase.

If you provide a “normal” oligo-dT primer, it can latch on anywhere along the poly-A tail, but if you use an “anchored oligo-dT,” which ends (3’ end) with a letter other than T (a G, C, or A that acts as an anchor), it can only latch onto the part closest to the end of the unique stuff (it binds at the 5’ end of the poly(A) tail).

To illustrate: imagine you have an mRNA whose unique part ends in a C:

blahblahblahCAAAAAAAAAAAAAAAAAAAA

If you use an un-anchored oligo(dT) like TTTTTTTT, it can bind anywhere along the stretch of As and serve as a primer for the reverse transcriptase. So you can get

<——————————TTTTTTTT

blahblahblahCAAAAAAAAAAAAAAAAAAAA

<——————————————TTTTTTTT

blahblahblahCAAAAAAAAAAAAAAAAAAAA

etc. But if you use anchored oligo(dT)s where you have a mix of ATTTTTTTT, CTTTTTTTT, & GTTTTTTTT, only the G-version can bind (its G pairs with that C) and it can only bind in one spot

<——————GTTTTTTTT

blahblahblahCAAAAAAAAAAAAAAAAAAAA
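To make the pictures above concrete, here’s a small Python sketch that lists every spot where a primer can pair perfectly with a made-up mRNA. The primers are written the same way as in the drawings (3’ to 5’, left to right, so they line up letter-by-letter under the mRNA), and the mRNA sequence and function name are just for illustration.

```python
# DNA primer letter -> the RNA letter it pairs with
PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}

def binding_sites(mrna, primer_3to5):
    """Start positions (0-based) where the primer pairs perfectly with the mRNA.
    The primer is written 3'->5', i.e. as it would be drawn under the mRNA."""
    target = "".join(PAIR[base] for base in primer_3to5)
    return [i for i in range(len(mrna) - len(target) + 1)
            if mrna[i:i + len(target)] == target]

# Made-up mRNA: a 10-letter unique part ending in C, then a 20-A tail.
mrna = "GGAUCCGUAC" + "A" * 20

unanchored = binding_sites(mrna, "TTTTTTTT")
anchored = {primer: binding_sites(mrna, primer)
            for primer in ("ATTTTTTTT", "CTTTTTTTT", "GTTTTTTTT")}
print(len(unanchored), anchored)
```

The un-anchored primer finds 13 possible spots along the tail, while only the G-version of the anchored mix finds a home, and only at the one spot where its G sits under the final C.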

Oligo-dT primers are good if you don’t have much RNA to start with – but there are some disadvantages. They can sometimes prime at internal poly(A) sites (if an RNA happens to have a stretch of As before the end, the primer can stick to the center of the recipe “thinking it’s the tail end,” so you get a truncated (end-lobbed-off) version of the recipe, not the whole thing). And they can also bind to RNAs that aren’t mRNAs but just happen to have a lot of As (like some rRNAs (ribosomal RNAs)).

You can also have the “opposite problem” – some mRNAs like the ones for histone proteins are weird and un-tailed. And if you want to look at expression of things like tRNA, rRNA, & noncoding RNAs, oligo(dT)s won’t help you.

If you want to reverse transcribe TOTAL RNA (not just the mRNA but also tRNA, rRNA, noncoding RNA, etc.) you can use random primers (random oligos). These are short (6-9 nt) random sequences that are short enough that they’re likely to be found in lots of genes, and you have enough variety that all the genes are likely to find several matches. Since they start copying from lots of different spots, you end up with pieces of the recipe copied, not the full-length thing.

Even if your thing is tailed & even if you use anchored oligo(dT)s so you’re as close to the unique stuff as possible, mRNAs can be REALLY LONG, and if you want to detect something further towards an mRNA’s start, random oligos or a mix of random & oligo-dT might be a better option.

If you only have a few you want to test & you want high sensitivity (be able to detect tiny amounts) you can use sequence-specific primers that are designed to match the mRNA you want to look for and are longer, so those sequences aren’t found other places (unlike the shorter random ones).
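Some back-of-the-envelope math shows why short random primers land all over the place while longer primers don’t. This Python sketch assumes a made-up 2000-letter transcript whose letters are random and equally likely (real sequences aren’t, so it’s only a ballpark), and the function name is just for illustration.

```python
def expected_sites(transcript_length, primer_length):
    """Expected number of perfect-match sites for ONE specific primer
    sequence in a random transcript (all 4 letters equally likely)."""
    positions = transcript_length - primer_length + 1
    return positions / 4 ** primer_length

possible_hexamers = 4 ** 6            # how many different 6-letter primers exist
hexamer = expected_sites(2000, 6)     # one particular 6-mer
twenty_mer = expected_sites(2000, 20) # one particular 20-mer
print(possible_hexamers, hexamer, twenty_mer)
```

Any one 6-letter primer has roughly a coin-flip chance of matching somewhere in a transcript that size, and a random mix contains all 4096 of them, so priming happens at many spots. A particular 20-letter sequence is vanishingly unlikely to turn up by chance.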

After you make the cDNA copies of “everything” or at least all the mRNAs, you want to make copies ONLY of the recipe you’re interested in. So instead of aiming for genericness, you want your primers to be super specific. And you’ll need primerS now because now we want to amplify – with reverse transcription we only made 1 strand of cDNA, but now we want to make lots of copies. So we need to make a second strand from that first strand and then we can use those strands as templates for the other strands so you can make more and more and more…

Since you’re now just copying DNA to DNA, you can use a “normal” DNA Pol. And just like in normal PCR, qPCR is performed in cycles of temperature changes – MELT (heat up to separate strands) -> ANNEAL (cool down to let primers bind & Pol latch on) -> EXTEND (let Pol lay down complementary track) -> REPEAT.

So you need 2 primers – one for each strand – one will define the start & the other the stop for the region you want to copy. The first primer will bind the cDNA (where you want to start) and Pol will start copying it 5’ to 3’ until it falls off the end of the cDNA or it runs out of steam, etc. And then in the next cycle that second strand needs a primer that binds it – and where it binds will define where the copy of that strand starts. And it’ll go to where the 1st primer started because that’s as far as the strand it’s copying goes. So from then on, your strands will be the same length, bookended by those primer sites. So where to choose to put your primers?
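Here’s a toy “PCR on paper” in Python showing how the 2 primers bookend the product. The template and primers are made up (and much shorter than real primers), the template is written as the recipe-sense strand, and the helper names are just for illustration.

```python
def reverse_complement(dna):
    pair = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pair[base] for base in reversed(dna))

def amplicon(template, forward, reverse):
    """Stretch of `template` (sense strand, 5'->3') bookended by the primers,
    or None if either primer has no perfect-match site."""
    start = template.find(forward)
    if start == -1:
        return None
    # The reverse primer pairs with the sense strand, so on the sense strand
    # its site reads as the primer's reverse complement.
    site = template.find(reverse_complement(reverse), start + len(forward))
    if site == -1:
        return None
    return template[start:site + len(reverse)]

template = "CCATGGCCGATTACGGATCCTTGACAGG"   # made-up sequence
product = amplicon(template, forward="ATGGCC", reverse="TGTCAA")
print(product, len(product))
```

Everything outside the primer sites (the CC at the start and the GG at the end here) gets left behind, which is why the copies end up at one defined length.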

Ideally, you want to design primers that span an exon-exon junction – this prevents you from amplifying the genomic version. In the original DNA recipe, the exons are separated by introns (e.g. EXON1-intron1-EXON2), so an exon-exon specific primer (e.g. EXON1-EXON2) can’t bind it.
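A quick Python sketch of why a junction-spanning primer ignores the genomic version. The exon and intron sequences are made up, and “matching” here just means the primer’s sequence appears in the strand (a stand-in for it being able to bind the complementary strand).

```python
# Made-up sequences for illustration only
exon1 = "ATGGCCGATC"
intron1 = "GTAAGTTTCAG"
exon2 = "GATTACGGTAA"

genomic = exon1 + intron1 + exon2   # original DNA recipe: EXON1-intron1-EXON2
cdna_sense = exon1 + exon2          # spliced version: EXON1-EXON2

# Primer built from the last 6 letters of exon 1 plus the first 6 of exon 2
junction_primer = exon1[-6:] + exon2[:6]

matches_cdna = junction_primer in cdna_sense
matches_genomic = junction_primer in genomic
print(junction_primer, matches_cdna, matches_genomic)
```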

Each round of PCR, another copy can be made from each copy, so you increase exponentially. In the very beginning you can’t tell this though because the levels are so low you’re below the background & just see “noise.” But soon you’ll enter the exponential phase where you get measurable doubling each cycle – and since you start with way more supplies (primers, dNTPs, etc.) than you need, you don’t have to worry about running out. But later on you do start running out, so copy # stops growing exponentially, and your curve plateaus.
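A very crude Python model of that curve: perfect doubling every cycle until the copies hit a hard ceiling standing in for “supplies ran out.” The starting copy number and the ceiling are made-up values, and real curves bend over gradually rather than hitting a wall.

```python
def amplify(start_copies, cycles, max_copies=1e12):
    """Copy number after each cycle: perfect doubling until a hard supply cap."""
    copies = [float(start_copies)]
    for _ in range(cycles):
        copies.append(min(copies[-1] * 2, max_copies))
    return copies

curve = amplify(start_copies=100, cycles=40)
for cycle in (0, 10, 20, 30, 40):
    print(cycle, f"{curve[cycle]:.3g}")
```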

How do the copies get measured? Fluorescence – this is where a molecule absorbs a certain wavelength of light and releases a different (longer) wavelength. More here: http://bit.ly/fluorescentstains

If you can directly couple the amount of light given off to the number of copies you make, and you use a special PCR machine with a fluorescence detector, you can read out – in “Real Time” – the number of copies you’re making. There are a couple of different ways of going about this.

“Generic” DNA-binding dyes like SYBR Green – it has a flat structure that can kinda wedge itself in between bases in DNA (intercalate). It fluoresces strongly when it’s zapped with a laser of the right wavelength of light AND it’s bound to double-stranded (ds) DNA, but not when bound to single-stranded (ss) DNA. So the more dsDNA is made (which happens when you make more copies) the more fluorescence you’ll see.

An alternative is to use specific REPORTER PROBES – a common such probe method is “TaqMan.” In some ways these probes are like specific primers – they’re short pieces of DNA (oligos) that you design to match specific sequences – but unlike primers, these aren’t designed to act as starting stations for Pol – they lack a free 3’ OH so can’t be built off of. And instead of the start of your thing, you design them to match somewhere in the middle of the thing you want copied.

The probe will bind and serve as a kind of “roadblock” – but instead of getting slowed down by it, the Pol chews it up because it has 5’ to 3’ exonuclease activity (it can chew up nucleic acids in its path, starting from their 5’ end). And when the roadblock gets chewed up you get a fluorescent read-out. Why? Don’t FRET – let me explain!

On one end of the probe is a fluorophore (the reporter) and on the other end is a quencher. When the probe is uncut they’re close enough that the quencher can absorb the energy that the fluorophore would normally give off as light – it “quenches” the fluorescence that would be given off by the fluorophore in a phenomenon known as FRET (Förster Resonance Energy Transfer). More on FRET: http://bit.ly/fretandfluorescence

But when they get chewed up they separate so the quencher gets far away, the light doesn’t get stolen, & you can see it. And since the chewing occurs each time a copy gets made, you can use increase in fluorescence as an indicator of copy-making.
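To get a feel for how sharply that energy-stealing depends on distance, here’s a Python sketch using the standard FRET efficiency relationship, E = 1 / (1 + (r/R0)^6), where R0 is the distance at which half the energy gets transferred. The R0 and the “intact” and “cut” distances below are made-up round numbers for illustration, not values for any particular probe.

```python
def fret_efficiency(distance, forster_radius):
    """Fraction of the fluorophore's energy handed off to the quencher."""
    return 1.0 / (1.0 + (distance / forster_radius) ** 6)

R0 = 5.0                              # hypothetical Forster radius (nm)
intact = fret_efficiency(2.5, R0)     # reporter and quencher held close together
at_r0 = fret_efficiency(5.0, R0)      # exactly at R0: half transferred
cut = fret_efficiency(10.0, R0)       # drifted apart after the probe is chewed up
print(round(intact, 3), at_r0, round(cut, 3))
```

Because of that 6th power, merely doubling the distance from R0 drops the energy transfer from half to under 2%, which is why cutting the probe turns the light on so cleanly.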

You can plot cycle # vs fluorescence and – in either type of measuring – what you’re looking for is a value called the Cq value (quantification cycle), which is the # of cycles it takes to pass a “threshold line” set just above the background fluorescence level – the more copies you start with, the fewer cycles it will take (lower Cq) and the more “left-shifted” your curve will be.
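Here’s a minimal Python sketch of reading a Cq off a curve: find the first pair of neighboring cycles that straddle the threshold and estimate where in between the line gets crossed. The fluorescence readings and threshold are made up, and the straight-line interpolation is a simplification (instrument software uses its own, fancier methods).

```python
def cq_from_curve(fluorescence, threshold):
    """fluorescence[i] is the reading after cycle i + 1.
    Returns the (fractional) cycle where the curve first crosses the
    threshold, or None if it never does."""
    for i in range(1, len(fluorescence)):
        low, high = fluorescence[i - 1], fluorescence[i]
        if low < threshold <= high:
            # low was read at cycle i, high at cycle i + 1
            return i + (threshold - low) / (high - low)
    return None

# Made-up readings: same shape, but the second sample takes off 2 cycles later
more_template = [1, 1, 1, 2, 4, 8, 16, 32]
less_template = [1, 1, 1, 1, 1, 2, 4, 8, 16, 32]

cq_more = cq_from_curve(more_template, threshold=6)
cq_less = cq_from_curve(less_template, threshold=6)
print(cq_more, cq_less)
```

With perfect doubling, a 2-cycle gap like this one would mean the left-shifted sample started with about 4 times as many copies.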

RT-qPCR is just one way to measure “gene expression” and it has limitations – but so do all methods – so if there’s something you really care about showing you’ll often measure the gene product at multiple levels (like using qPCR and western blotting). More here: bit.ly/measuringexpression

qPCR involves a lot of pipetting & it’s a real pain & easy to make stupid mistakes – I did it a lot in one of my lab rotations and hated it – especially since I have this bad habit of holding my breath when I concentrate – so I kept making myself light-headed setting all the reactions up!

In addition to being useful in the research lab, RT-qPCR is also useful in the diagnostic lab. It serves as the basis for traditional diagnostic tests for SARS-CoV-2 (“the” coronavirus), which is an RNA virus. Basically, the tests reverse transcribe RNA in swabbed fluids to capture evidence the virus has left and then run qPCR to measure how many copies of the viral genome there are. more here: http://bit.ly/coronavirustesting

a couple helpful webinars:

IDT webinar: Design considerations for qPCR assays: https://youtu.be/C-aJ103lUwQ

IDT webinar: Technical Tips for qPCR—Sample and Experimental Considerations: https://youtu.be/g9rqdtdYOg0

more on topics mentioned (& others) #365DaysOfScience All (with topics listed) 👉 http://bit.ly/2OllAB0