Before we get into data and papers, I wanted to go over what is involved in generating the data. I was hoping to write these types of posts as I processed the samples and provide real-time pictures of the process. However, with COVID severely limiting university laboratory access for the foreseeable future, I decided I should just get started.
Step 1: Sample Collection
I am sure most of you are now familiar with this step, but I wanted to go over a few details about collecting samples for biological analysis. Biological analysis of any kind can be messy and inconsistent because microbial communities are always changing and can be heterogeneous on a very small scale, depending on the environment. To help overcome these sampling difficulties, we sample more than once so that the sampling captures the potential heterogeneity. Community change can usually be overcome by essentially “killing” or slowing down the community by freezing. DNA is very resilient and stable, so it is not damaged by freezing but rather preserved. Preservatives for soils/liquids can also be used to stop microbial activity and stabilize DNA, but they can be expensive.
Step 2: DNA extraction
The goal of a DNA extraction is to go from a compost sample to only DNA in water. If you have a pure culture of cells (think bacterial colonies on a petri dish), this step can even be skipped. With soils, and especially compost, humic acids and other inhibitors such as salts can be a huge problem for downstream analysis. While there are many methods for DNA extraction, most are performed with commercial kits designed for a specific sample type. The first step is to break the cells open. This is accomplished by adding glass beads and a soapy solution to the sample, which is then shaken vigorously in an instrument called a bead beater (think small paint shaker). The beads physically break open the cells, while the soap pulls apart the fatty cell membranes. After bead beating, the DNA should be out of the cells and dissolved in solution. Chemical solutions designed to pull out different inhibitor types such as proteins, salts, and humics are then added and removed. The whole process is basically precipitating or binding different compounds and separating them from the DNA so that, ideally, you end up with only DNA dissolved in water.
Step 3: DNA quantification and quality
Next, we want to see how much DNA we recovered from our samples and get an idea of how pure it is. One of the most common ways of doing this is with an instrument called a Nanodrop. This instrument uses only 3 µL (a pin drop) of sample and UV light to estimate the concentration of DNA in your sample, as well as the presence of potential PCR inhibitors (humics).
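For the curious, here is a rough sketch of the arithmetic behind this kind of reading. It assumes the common rule of thumb that an absorbance of 1.0 at 260 nm corresponds to about 50 ng/µL of double-stranded DNA, and it uses the two ratios people usually look at for purity (A260/A280 and A260/A230). The function name and the absorbance values are made up for illustration; they are not readings from any real sample.

```python
# Assumed rule of thumb: an absorbance of 1.0 at 260 nm (1 cm path length)
# corresponds to roughly 50 ng/uL of double-stranded DNA.
DSDNA_NG_PER_UL_PER_A260 = 50.0


def summarize_reading(a230, a260, a280):
    """Estimate DNA concentration and purity ratios from UV absorbance values."""
    return {
        "ng_per_ul": a260 * DSDNA_NG_PER_UL_PER_A260,
        "a260_a280": a260 / a280,  # low values often point to protein carryover
        "a260_a230": a260 / a230,  # low values often point to humics or salts
    }


# Hypothetical absorbance values, invented for illustration only.
reading = summarize_reading(a230=0.50, a260=0.80, a280=0.40)
for name, value in reading.items():
    print(f"{name}: {value:.2f}")
```

As a loose guide, an A260/A280 ratio near 1.8 is usually read as fairly clean DNA, and a low A260/A230 ratio is a hint that humics or salts came along for the ride.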
Step 4: Polymerase chain reaction (PCR)
PCR amplifies a specific fragment of DNA exponentially and is one of the most commonly used molecular biology techniques. Essentially, we are mimicking how cells copy their DNA to make new cells, but instead of replicating all of the DNA from all of the different cells, we only amplify a biomarker gene (16S / 18S gene) from each cell. These are the genes that will actually be sequenced and analyzed. We can target the polymerase to amplify these genes using primers. These are small fragments of DNA which bind to either side of the gene and allow the DNA polymerase to bind to the single-stranded DNA. The amplification is performed using the DNA polymerase enzyme, a mixture of single nucleotides (the A, T, C, and G building blocks), and primers. The mixture is then cycled through different temperatures in a machine called a thermocycler. The short video below describes the process better than I can.
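To give a feel for what "exponentially" means here, this is a minimal sketch of the idealized math: if every cycle copies every fragment, the number of copies doubles each cycle. The starting copy number, the cycle count, and the `efficiency` parameter are all assumptions for illustration, and real reactions fall short of perfect doubling.

```python
def copies_after_pcr(starting_copies, cycles, efficiency=1.0):
    """Idealized PCR yield; efficiency=1.0 means perfect doubling every cycle."""
    return starting_copies * (1 + efficiency) ** cycles


# Hypothetical numbers: 100 starting copies of the gene, 30 cycles.
perfect = copies_after_pcr(100, 30)
realistic = copies_after_pcr(100, 30, efficiency=0.9)

print(f"perfect doubling: {perfect:.3e} copies")
print(f"90% efficiency:   {realistic:.3e} copies")
```

Even with a less-than-perfect reaction, a small number of starting copies turns into billions, which is why PCR makes it possible to sequence a single gene out of a messy compost sample.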
Step 5: PCR Clean Up and Pooling
After PCR, we will have one PCR reaction per sample, which needs to be purified to remove the leftover nucleotides and DNA polymerase. Magnetic beads are used to bind the DNA and pull it out of solution. It is then re-suspended in water, so we now have only our gene of interest (from all of the cells in our sample) and water. As the last step, we pool all of the cleaned PCR reactions into a single tube and are now ready to sequence!
But how do we know which sample is which if they are all in a single tube? One detail I left out of the previous PCR step is that special primers are used so that each sample has a unique string of ATCG’s (a DNA barcode) attached to every fragment of DNA that is amplified. These barcodes correspond back to the original sample and are used after sequencing to parse the DNA sequences back to their sample.
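Here is a toy sketch of how that sorting by barcode can work. It assumes the barcode sits at the very start of each sequence and must match exactly. The barcodes, sample names, and sequences are all invented for illustration, and real pipelines handle details such as sequencing errors in the barcode that this ignores.

```python
from collections import defaultdict

# Hypothetical barcodes and sample names, invented for illustration.
BARCODE_TO_SAMPLE = {
    "ACGT": "compost_pile_A",
    "TGCA": "compost_pile_B",
}
BARCODE_LENGTH = 4  # kept short for readability; real barcodes are usually longer


def demultiplex(reads):
    """Group reads by the barcode at their start and trim the barcode off."""
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:BARCODE_LENGTH], read[BARCODE_LENGTH:]
        # Reads with an unknown barcode are kept aside rather than guessed at.
        sample = BARCODE_TO_SAMPLE.get(barcode, "unassigned")
        by_sample[sample].append(insert)
    return dict(by_sample)


pooled_reads = ["ACGTGGATTACA", "TGCACCTTAGGA", "ACGTTTGACCAA", "GGGGAAAATTTT"]
result = demultiplex(pooled_reads)
for sample, sequences in result.items():
    print(sample, sequences)
```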
Step 6: Sequencing (Finally!!)
We are now ready to load our potentially hundreds of pooled samples into the sequencer. There are many different types of sequencing instruments, but the most commonly used types are made by Illumina. Again, sparing some details… the DNA from the sample binds uniformly to a special chip, or flow cell, which contains small strands of DNA that each bind a single strand of DNA. Each strand of DNA is then copied again using fluorescent bases which light up. Each base (A, T, C, or G) is a different color. After each cycle of adding a base, a high-resolution camera takes a picture of the chip and records the colors at each location where a strand is bound. This process is performed over and over again for 300 cycles and then another 300 cycles in the opposite direction. Ultimately, we end up with about 15 million sequences, which are ideally evenly divided between all of the samples. We now have a 10GB text file (fastq file) with ATCGs and quality scores, which needs to undergo bioinformatic processing to assign the sequences to the right sample, perform quality control, and assign taxonomy. Each sequence ideally represents a cell, but this is not quite the case due to biases we'll talk more about later. I'll also talk more about the bioinformatic processing in another post, because generating the data is only half the battle.
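If you are wondering what that fastq file looks like, here is a small sketch that reads one. It assumes the usual layout of four lines per sequence (a name, the ATCGs, a "+" line, and a line of quality characters) and the common Phred+33 encoding, where each quality character's ASCII code minus 33 gives the quality score. The two records and the figure of 200 samples are made up for illustration.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality_scores) from 4-line FASTQ records."""
    while True:
        header = handle.readline().strip()
        if not header:
            break  # end of file
        sequence = handle.readline().strip()
        handle.readline()  # the '+' separator line, not needed here
        quality_line = handle.readline().strip()
        # Assumed Phred+33 encoding: ASCII code minus 33 is the quality score.
        scores = [ord(ch) - 33 for ch in quality_line]
        yield header[1:], sequence, scores


# Two invented records standing in for the real multi-gigabyte file.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nII5#\n")
records = list(read_fastq(example))
for name, sequence, scores in records:
    print(name, sequence, scores, "mean quality:", sum(scores) / len(scores))

# Even split of about 15 million sequences across a hypothetical 200 samples.
reads_per_sample = 15_000_000 // 200
print("reads per sample:", reads_per_sample)
```

A higher score means the sequencer was more confident in that base, so the quality control step mostly comes down to trimming or discarding sequences whose scores drop too low.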
I know this is a lot of info (a few years of molecular bio courses), but please feel free to ask any questions. I’ll try to post progress pictures later of your samples as they go through each step!