First Look at Sequencing Data!

Hi everyone,

We finally have some results to look at from the sequencing run. As I said before, the initial data looked really good, meaning the run worked and we got a lot of sequences (~13,000,000). However, once we took a closer look, there was a bit of bad news. We only generated sequences for about half of the samples, so out of about 250 samples we submitted, we have data for about 125. I am really, really bummed about this because we have such a rich data set. Usually a few samples fail to sequence, but losing half indicates a larger problem.

After discussion with the sample preparation lab, the problem likely started with the DNA extractions: a lot of samples had trouble amplifying, likely due to humic acid contamination. The original plan was to work through potential issues like this before even collecting samples, but we had to just go for it because university labs were unavailable. Unfortunately, we don't have the time or the money to reprocess all the samples and try again. That said, I think there is still a lot of information we can pull out and hypotheses we can generate for future studies.

Final disclaimer: the work below is only exploratory and the interpretation may change. Also, some sample groups contain only 1 or 2 samples and lack proper replication.


Diversity is usually broken down into richness and evenness. Richness is the number of different species, and evenness is how close in number those species are.
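To make the richness idea concrete, here is a minimal Python sketch. It assumes each sample has already been reduced to a table of unique sequences and how many times each was seen; the function name `richness`, the sequence labels, and the counts are all made up for illustration and are not from our data or our actual pipeline.

```python
def richness(counts):
    """Return the number of distinct sequences observed in one sample.

    `counts` maps a sequence label to the number of reads assigned to it.
    Entries with a count of zero are treated as "not observed".
    """
    return sum(1 for n in counts.values() if n > 0)


# Hypothetical toy sample (labels and counts are invented for illustration).
toy_sample = {"seq_A": 120, "seq_B": 30, "seq_C": 0, "seq_D": 5}

print(richness(toy_sample))  # 3, because seq_C was never observed
```

Note that richness ignores how abundant each sequence is: a sequence seen 5 times counts the same as one seen 120 times. That is the part evenness is meant to capture.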


First, let's look at how many "species", or unique sequences, we are seeing in each sample (figure below).

For now, samples are grouped by type on the horizontal axis using a box plot. These groupings are fairly self-explanatory except for "worm", which is mostly worm guts. The vertical axis is the number of unique sequences in each sample, and each dot is a sample. Don't worry about the p-value bars at the top for now; they only indicate which sample groups have statistically different values.

As you can see, we have the most samples for vermicompost, with a really broad range of richness. The sample types peat, soil, and compost only have a few samples each and will likely be dropped from future figures.

Let's focus on the vermicompost, precompost, and manure. One of my hypotheses was that diversity would increase during the vermicomposting process, which we can generally see to be true as we go from Manure --> Precompost --> Vermicompost. However, we do see some vermicompost samples with much lower diversity than precompost and manure. Vermicompost diversity and the potential factors affecting it are definitely something we will look into in a future post.


Next is a quick look at evenness, or how the abundances of the species in a sample compare to each other. For example, a perfect evenness of 1 means every species in the sample is equally abundant (e.g., 20 species, each 5% of the population). The closer to zero, the more uneven a population is.
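For anyone who wants to see the arithmetic, here is a small Python sketch of one common evenness measure, Pielou's evenness, which divides the Shannon diversity by its maximum possible value for that number of species. This is an assumption for illustration: I am not claiming it is the exact index behind the figure, and the function name and toy counts are made up.

```python
import math


def pielou_evenness(counts):
    """Pielou's evenness J = H / ln(S) for a list of per-species counts.

    H is the Shannon diversity and S is the number of species present.
    Returns NaN when fewer than two species are present (J is undefined).
    """
    present = [n for n in counts if n > 0]
    total = sum(present)
    species = len(present)
    if species < 2:
        return float("nan")
    shannon = -sum((n / total) * math.log(n / total) for n in present)
    return shannon / math.log(species)


# The example from the text: 20 species, each 5% of the population.
even_sample = [5] * 20
# Hypothetical uneven sample: one dominant species and 19 rare ones.
uneven_sample = [981] + [1] * 19

print(round(pielou_evenness(even_sample), 3))   # 1.0
print(round(pielou_evenness(uneven_sample), 3)) # much closer to 0
```

Both toy samples have the same richness (20 species), so the difference in the result comes entirely from how the abundance is spread out.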

I don't see too much going on here except for the worm guts being really uneven. This typically means there are a few dominant species present, which we will see later. There also looks to be a similar trend, with evenness increasing from Manure --> Precompost --> Vermicompost, and the difference is statistically significant.

Beta Diversity

Beta diversity basically shows how similar the communities of each sample are. The closer the dots the more similar the communities are.
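As a rough illustration of what "similar" means here, the sketch below computes the Bray-Curtis dissimilarity between two samples, a widely used distance for this kind of plot. This is an assumption on my part for the example: the metric behind the actual figure may differ, and the function name, taxon labels, and counts are invented.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two samples of taxon counts.

    0 means the two communities are identical in composition and abundance;
    1 means they share no taxa at all.
    """
    taxa = set(sample_a) | set(sample_b)
    shared = sum(min(sample_a.get(t, 0), sample_b.get(t, 0)) for t in taxa)
    total = sum(sample_a.values()) + sum(sample_b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    return 1 - (2 * shared) / total


# Hypothetical toy samples (labels and counts are made up).
sample_1 = {"taxon_A": 10, "taxon_B": 10}
sample_2 = {"taxon_A": 10, "taxon_C": 10}
sample_3 = {"taxon_D": 20}

print(bray_curtis(sample_1, sample_1))  # 0.0, identical communities
print(bray_curtis(sample_1, sample_2))  # 0.5, half of the reads are shared
print(bray_curtis(sample_1, sample_3))  # 1.0, nothing in common
```

A beta diversity plot takes a table of pairwise distances like these and places the samples in two dimensions so that small distances end up as dots that sit close together.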

Let me start by saying this is an example of a not very good beta diversity plot. We can see the compost (1 sample) is very different from all the other sample types and is skewing the analysis. We will remove this sample and redo the chart so we can "zoom in" on the vermicompost, precompost, and manure samples. The tea also appears to be different, but again this is only 2 samples, unfortunately, so it's hard to say anything definitively.

So what is actually there...? (Taxonomy)

This part gets tricky but is usually the most interesting and fun. With samples like vermicompost, which potentially contain hundreds of different species, it is very hard to look at everything. Even if we did, it is unlikely we would know what each organism is specifically doing. With that said, let's take a look!

We are going to start way zoomed out and look at the Kingdom level. Remember, organisms are classified on the following levels: Kingdom, Phylum, Class, Order, Family, Genus, and Species.


All of these figures will display the top 20 groups of organisms as percentages (0-100%) at each taxonomic level. The colors form a heatmap where red is high abundance and blue is low abundance. I think we are all familiar with Bacteria, single-celled organisms without a nucleus. Archaea are also single-celled without a nucleus but have different features from bacteria, such as the cell wall, which allow them to live in extreme conditions. Eukaryota include all multicellular life such as fungi, nematodes, plants, and animals. All Eukaryotic sequences are represented by 18S gene sequences, while the Bacteria and Archaea are represented by 16S gene sequences.
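If you are curious how raw counts turn into the percentages in these figures, here is a minimal Python sketch. It assumes we have the number of sequences assigned to each group in a sample; the function name `top_groups_percent`, the group labels, and the counts are hypothetical and are not our real numbers.

```python
def top_groups_percent(counts, n=20):
    """Return the n most abundant groups as (name, percent of all reads).

    Percentages are relative to every read in the sample, not only the
    top n, so they will not add up to 100 when groups are left out.
    """
    total = sum(counts.values())
    if total == 0:
        return []
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(name, 100 * count / total) for name, count in ranked[:n]]


# Hypothetical toy sample (labels and counts are invented for illustration).
toy_counts = {"group_A": 700, "group_B": 250, "group_C": 50}

for name, percent in top_groups_percent(toy_counts, n=2):
    print(f"{name}: {percent:.1f}%")
# group_A: 70.0%
# group_B: 25.0%
```

The same calculation applies at every taxonomic level; only the grouping of the sequences changes.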

From the Kingdom figure above, I notice a trend of decreasing percentage of Eukaryota and increasing percentage of Bacteria. We can also see the compost and worm samples have a lot of Eukaryota sequences, which could be a big reason why they are so far away from the other samples on the beta diversity plot.


If we drop down one taxonomic level to the Phylum, we can already start to see some organisms of interest. Looking at our worm samples on the end, we can see they are composed of almost 23% Annelida, or worm, sequences. Not surprising, pretty neat, but these should probably be filtered out in the future. Jumping to Compost, we can see the eukaryote Mucoromycota at 8.7%, which is known to contain mainly mycorrhizal fungi, root endophytes, and plant decomposers. We also see compost has 5.5% of something that is yet to be defined at all... there is still a lot to be discovered in this area of microbiology.


If we jump to the top 20 families in the samples we start to see some more potential organisms of interest.

Our #1 family is Chitinophagaceae, which is most enriched in the single compost sample but present in all of the samples to some degree. This family still contains a large diversity of organisms, which makes it hard to even speculate what they are. At #2 we see Moraxellaceae enriched in the worm gut. These are likely commensal organisms, meaning they are likely among the main inhabitants of the worm gut. The Tea samples are starting to show enrichment of some families, but there are too many to go over now. The Manure and vermicompost samples are showing an enrichment of Microscillaceae, which have also been found in manure amendments and are suggested to increase degradation of petroleum-based toxins. Vermicompost is also showing an enrichment of Pirellulaceae, which are aerobic organisms that have mostly been studied in aquatic environments.


The Genus level of classification is as far down as we can confidently identify with this type of sequencing. This is because we can only sequence ~25% of the whole gene due to sequencing length limitations. Also keep in mind we are only looking at the top 20 most abundant genera, so for most samples we are still not seeing the majority of organisms present, as it becomes too unwieldy to deal with. Also keep in mind that these are averages of many samples, which could be very different from each other.

I can't go through all of these right now, but one of the genera that jumps out is Taibaiella, which includes Taibaiella smilacinae, a known endophyte isolated from a plant root. If we look at the Tea samples, we can see they contain 4.5% Massilia, a major group of root-colonizing organisms. There is also a presence of Chryseobacterium, especially in the Tea, which is known to contain groups of plant-growth-promoting organisms. Interestingly, Ruminofilibacter is only seen in the precompost and has previously been shown to be dominant in mature manure-based compost.

Please feel free to look more of these up on your own if you are curious. Many of these organisms I am not familiar with, so I am looking them all up too. If you find something interesting, please let me know. I'm eager to know what you're interested in learning more about. Hopefully we can get a more detailed analysis of the vermicompost samples out soon.
