Dispensation Order Generation for Pyrosequencing
by Mats Carlsson
With the huge increase in the use of DNA technology in fields such as forensic analysis, mutation analysis, antibiotic resistance studies, clinical genetics and pharmacogenetics, it is becoming ever more important to optimize the throughput of DNA sequencing equipment. In a project at SICS, constraint programming was used to optimize the instruction sequence driving DNA sequencing based on the Pyrosequencing principle.
The main application area of Pyrosequencing is the analysis of polymorphic stretches of DNA sequence in the context of stretches of known sequence. The method is based on the principle of sequencing by synthesis. That is, a single strand of DNA is used as a template for synthesizing its complementary strand. The synthesis proceeds by incorporating one nucleotide at a time. In each reaction cycle, a nucleotide is dispensed, i.e. added to the reaction. There are two possibilities:
- if it matches the current nucleotide in the template (that is, it is complementary to it), it is incorporated into the complementary strand, and a chain of reactions leads to the emission of quantitatively detectable visible light;
- otherwise, no incorporation takes place and no light is emitted.
In either case, certain enzymes ensure that any surplus reagents are degraded, making the equipment ready for the next cycle.
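The reaction cycle described above can be sketched as a small simulation. This is an illustrative model only, not the algorithm developed in the project: the function name pyrogram is hypothetical, the sequence is written as the strand being synthesized (the complement of the template), and the light signal is assumed to be proportional to the number of identical nucleotides incorporated in one cycle.

```python
def pyrogram(sequence, dispensation_order):
    """Simulate one strand: return the light signal for each dispensation.

    `sequence` is the strand to be synthesized (assumption: given 5'->3',
    already complemented). The signal for a dispensation is the number of
    consecutive nucleotides incorporated in that cycle (0 means no light).
    """
    signals = []
    pos = 0  # index of the next nucleotide to be synthesized
    for nucleotide in dispensation_order:
        incorporated = 0
        # A run of identical nucleotides is incorporated in a single cycle.
        while pos < len(sequence) and sequence[pos] == nucleotide:
            pos += 1
            incorporated += 1
        signals.append(incorporated)
    return signals


if __name__ == "__main__":
    # Cyclic order A, G, T, C applied to a short example sequence.
    print(pyrogram("GATTC", "AGTCAGTC"))
```

In this model, a zero in the result corresponds to a dispensation that produced no light, and a value of two corresponds to a double-height peak from two identical neighbouring nucleotides.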
Thus, the equipment is driven by a dispensation order, that is, a sequence of instructions. An instruction is one of the DNA nucleotides A, C, G, T. The picture shows how a cyclic dispensation order (A, G, T, C, A, G, T, C, ...) can be used to analyze an unknown sequence. However, if most of the sequence to analyze is known, this is wasteful in terms of reagents and time. By cleverly taking into account the known parts of the sequence, and the known variants of the polymorphic parts, a dispensation order that allows for an optimal throughput of the analysis can be computed.
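To make the cost of a cyclic order concrete, the sketch below counts how many dispensations are consumed for a fully known sequence under a cyclic order and under an order tailored to that sequence. It is a simplification under stated assumptions: the helper names are hypothetical, the sequence has no polymorphic parts, and the tailored order is simply the sequence with runs of identical nucleotides collapsed.

```python
from itertools import cycle, groupby


def dispensations_needed(sequence, order):
    """Count the dispensations taken from `order` until `sequence` is complete."""
    if set(sequence) - set("ACGT"):
        raise ValueError("sequence must consist of A, C, G, T only")
    pos = 0
    count = 0
    for nucleotide in order:
        if pos == len(sequence):
            break  # synthesis finished, no further dispensation needed
        count += 1
        while pos < len(sequence) and sequence[pos] == nucleotide:
            pos += 1
    if pos < len(sequence):
        raise ValueError("order ended before the sequence was complete")
    return count


def tailored_order(sequence):
    """Collapse runs of identical nucleotides: one dispensation per run."""
    return "".join(nucleotide for nucleotide, _ in groupby(sequence))


if __name__ == "__main__":
    known = "GATTC"
    print(dispensations_needed(known, cycle("AGTC")))           # cyclic order
    print(dispensations_needed(known, tailored_order(known)))   # tailored order
```

For real assays the order must also cope with the polymorphic positions, so the collapsed sequence is only a lower bound on what a usable order looks like.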
The task of finding such a dispensation order may seem relatively straightforward. However, humans and most other higher organisms are diploid, i.e. we have all inherited one copy of each gene from our mother and one from our father. These two copies may be identical or may represent different variants in the polymorphic parts. The dispensation order must therefore make it possible to determine unambiguously and quantitatively which sequence(s) are present in the sample being analyzed.
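The diploid requirement can be illustrated with a toy check. The sketch below assumes that the two copies react independently and that their light signals add up; a dispensation order is then treated as adequate if every possible pair of variants yields a different summed signal. The brute-force search at the end is a stand-in for illustration only; the project itself used constraint programming, and all function names here are hypothetical.

```python
from itertools import combinations_with_replacement, product


def pyrogram(sequence, order):
    """Light signal per dispensation for one strand (run length incorporated)."""
    signals, pos = [], 0
    for nucleotide in order:
        run = 0
        while pos < len(sequence) and sequence[pos] == nucleotide:
            pos += 1
            run += 1
        signals.append(run)
    return signals


def genotype_signals(variants, order):
    """Map each unordered pair of variants to the sum of its two pyrograms."""
    result = {}
    for a, b in combinations_with_replacement(variants, 2):
        summed = tuple(x + y for x, y in zip(pyrogram(a, order), pyrogram(b, order)))
        result[(a, b)] = summed
    return result


def distinguishes_all(variants, order):
    """True if no two genotypes produce the same summed signal."""
    signals = genotype_signals(variants, order)
    return len(set(signals.values())) == len(signals)


def shortest_order(variants, max_length=6):
    """Brute-force search (not the project's method) for a shortest adequate order."""
    for length in range(1, max_length + 1):
        for candidate in product("ACGT", repeat=length):
            if distinguishes_all(variants, candidate):
                return "".join(candidate)
    return None


if __name__ == "__main__":
    variants = ["CAG", "CGG"]  # two variants differing at one position
    print(genotype_signals(variants, "CAG"))
    print(shortest_order(variants))
```

Exhaustive enumeration grows as four to the power of the order length, which is one reason a more sophisticated search technique is needed for realistic inputs.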
To further increase throughput of the method, it is often possible to multiplex several Pyrosequencing reactions in the same reaction well. This significantly increases the complexity of the problem of finding an optimal dispensation order, or even a feasible one.
Biotage AB (formerly Pyrosequencing AB) is a Swedish corporation manufacturing Pyrosequencing equipment. A project with SICS was set up, where the task of SICS was to study this problem and come up with an algorithm to produce an optimal dispensation order given a formal description of the sequence(s) to analyze.
From a computer science perspective, this was a clean yet challenging problem. To successfully address this challenge, a host of computational techniques was brought to bear, including logic programming, term rewriting, nogoods, and constraint programming over finite domains.
The technical part of the project was finished in a matter of weeks. Because the final algorithm relied on advanced computer science techniques and was far from trivial, the best way of communicating it to Biotage became an issue. At the time, Biotage's technical staff included computer science engineers, but they were not specialists in the techniques used in the algorithm. We solved this issue by delivering, in addition to the code itself, an intensive course in logic programming, in constraint programming, and in the details of the algorithm.
Mats Carlsson, SICS, Sweden
Tel: +46 18 572361