Obs # 6: The authors failed to report critical sequencing information
Each sample was extracted, PCR amplified, <u>and sequenced twice to ensure</u> that the sequences generated were not modified through low template copy number. <u>We recovered five full-length env sequences and five partial (0.7- to 1.2-kb) gag sequences.</u>
A number of things are unclear in the above statement. Before I address some of them, I think it’s important to clarify a few points about DNA itself.
DNA consists of a sequence of subunits called nucleotides. Each nucleotide consists of a sugar, a phosphate and a base. In a given stretch of DNA, the following 4 bases are found: Adenine (A), Cytosine (C), Guanine (G), and Thymine (T). To sequence a piece of DNA, a gene, a chromosome, or an entire genome means to find the order in which these bases are arranged. An example of a DNA sequence is ATCGAATTGCCG. And that sequence, if found in a DNA sample that is adequately stored, will be the same no matter how often one looks at it. Given that, I understand that our story tellers failed to report critical sequencing information in their study:
a) How much similarity did they find in their various sequences?
The authors started with 6 blood samples, and one of them did not give any PCR product (Obs # 4). If we call that failed sample “Sample A”, that leaves us with 5 samples (B, C, D, E, and F). Considering that each sample was sequenced twice, we would have something like this: B1 (1st sequence from Sample B) and B2 (2nd sequence from Sample B), C1 and C2, D1 and D2, E1 and E2, and F1 and F2.
As readers, we would like to know, for instance, whether B1=B2, C1=C2 and so on. And since we are dealing with the same virus and the same pair of genes (env and gag), perhaps it would even be good for us to know whether B1=D2=F2 for each gene. In other words, we would like to know if experimental or human error was large or small. Was there significant variation within samples (a different sequence for each replicate of the same sample) or among samples (a different sequence for each sample)? Basically, the authors failed to tell us if the sequences generated by each replicate were identical, completely different, or partially similar. As a result, when they talk about sequence alignment throughout the paper, we simply can’t know what they really mean. And that’s not good for us. For them, maybe; but not for us!
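To make the point concrete, here is a minimal sketch of the kind of comparison I would have liked to see reported. It assumes the replicate sequences have already been aligned to the same length, and it uses short made-up sequences labeled B1, B2, C1 and C2; these are hypothetical toy data, not sequences from the paper, and the function names are mine.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two pre-aligned sequences carry the same base."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def identity_table(reads: dict) -> dict:
    """Percent identity for every pair of labeled sequences."""
    return {
        (p, q): percent_identity(reads[p], reads[q])
        for p, q in combinations(sorted(reads), 2)
    }


# Hypothetical toy sequences, NOT data from the paper.
reads = {
    "B1": "ATCGAATTGCCG",
    "B2": "ATCGAATTGCCG",  # replicate identical to B1
    "C1": "ATCGTATTGCCG",
    "C2": "ATCGTATTGCCA",  # replicate differs from C1 at the last base
}

table = identity_table(reads)
for (p, q), pid in table.items():
    print(f"{p} vs {q}: {pid:.1f}%")
```

A table like this, with within-sample pairs (B1 vs B2) set apart from between-sample pairs (B1 vs C1), would tell the reader at a glance how much of the variation comes from the replication itself.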
b) How can they use both full and partial sequences?
The authors told us they recovered full-length sequences for the env gene but only partial sequences for the gag gene from each of the five samples, and they carried on as usual without even attempting to offer an explanation. Is it because they simply could not recover a full sequence after repeated attempts? Or is it because a full sequence of the gag gene was not necessary for their purpose? Your guess is as good as mine.
c) Why did they pick only “two” gene sequences among the many that are available?
I do not find a solid justification for their choice of just the env and gag genes anywhere in their paper. Perhaps they did not see this as an important point, but we are talking science here, not mere propaganda. There is much more that is known about HIV.
gelin