The Cost of Quality for Cycle-Harvesting Grids
by Chris Kenyon
Cycle-harvesting locating unused on-line computational capacity and making it available is an important feature of Grid resource management. Work at IBM Research has led to a quality of service (QoS) definition, called Hard Statistical QoS (HSQ), and an accompanying system architecture that can support commercial service contracts.
Business models in Grid computing dealing in buying and selling resources across budget boundaries (within or between organizations) are in their very early stages. Cycle-harvesting (or -scavenging, or -stealing) is a significant area of Grid and cluster computing with software available from several vendors. However, creating commercial contracts based on resources made available by cycle-harvesting is a significant challenge for two reasons. Firstly, the characteristics of the harvested resources are inherently stochastic. Secondly, in a commercial environment, purchasers can expect the sellers of such contracts to optimize against the quality of service (QoS) definitions provided. These challenges have been successfully met in conventional commodities, eg, Random Length Lumber (RLL), traded on financial exchanges (the Chicago Mercantile (CME, www.cme.com) in this case). The essential point for creating a commercially valuable QoS definition is to guarantee a set of statistical parameters of each and every contract instance.
The chance of achieving a good QoS level of the delivered resource slots just by taking whatever comes and passing it on (ie, basically random sampling of the underlying harvested resources) is shown in Figure 1. The error level permitted is taken as 1% of the contract size. As expected, larger contracts (more samples per quantile) make it easier to attain the fixed QoS requirement. However the probability of satisfying the quality requirements with random sampling is basically zero. Our analysis is valid when quantiles are guaranteed for any distribution: the analysis has no dependence on the underlying distribution. Thus we need a systematic way to deliver HSQ we do this by combining stochastic (harvested) resources with directly controlled (deterministic) resources.
|Figure 1: Probability of fulfilling a contract (Prob[success]) for a given distribution of slots using random sampling from the harvested resources relative to the precision with which the distribution is defined (number of quantiles). These results are valid for any distribution.
|Figure 2: Expected proportion of deterministic resources required (E[R]) to guarantee a given distribution (LogNormal with mean 3 and variance 1.5) to different levels of precision as defined by the distributions quantiles.
Stochastic QoS is a QoS parameter or set of QoS parameters that is based on statistical properties of some QoS metric. Hard Stochastic QoS (HSQ) is simply a QoS parameter or set of QoS parameters that is based on statistical properties of some QoS metric and is guaranteed with certainty. These metrics are guaranteed for each and every contract. Statistically this is the difference between guaranteeing the properties of a sample versus guaranteeing the properties of the population from which the sample was drawn. In business terms this means that each contract can have a consistent value not just a collection of contracts or a particular provider. A typical example in conventional commodities is the specification of Random Length Lumber contracts on the Chicago Mercantile Exchange; in the IT world statistical guarantees on network usage and capacity parameters are commonplace.
We have defined a system architecture for delivering HSQ. This comprises five things: an HSQ controller; a pool of stochastic (harvested) resources; a pool of deterministic (dedicated) resources; monitoring of the stochastic pool of resources; and control of the stochastic resource pool. The basic idea is that the HSQ Controller monitors the Cycle Harvesting System (CHS) resources, using the CHS Monitor, as applied to each Contract and takes one or more of two possible actions: sends an artificially 'harvested resource end' to a particular CHS resource; or diverts contract obligations from the CHS resources to the Deterministic Resources, under control of the Deterministic Controller, with appropriate instructions for their execution.
Given that resource contracts are being traded and the owner of the HSQ system sees an Ask (ie, a request for resources) with a price and a deadline, what is the cost to support that request? If it is below the price then the HSQ system owner can respond directly to the Ask. If not then the owner can post an alternative price. The cost will consist of the quantity of stochastic resources required and the relative proportion of deterministic resources to meet the quality requirements together with the costs of the two types of resources. In many situations the major cost will be the deterministic resources.
The motivation for this work was the desire to see whether it was possible to turn cycle-harvested resources into something commercially valuable. Was it possible to define a contract that was valuable given that the underlying resources are inherently stochastic and that in a commercial environment we can expect the provider of a QoS guarantee to optimize against it? We wanted to avoid all arguments based on long term reputation because we were interested in having valuable (tradable) individual contracts for cycle harvested resources not the question of how to create valuable reputations for the providers.
We have shown how HSQ can be implemented for cycle-harvested resources using a hybrid stochastic-deterministic system. We have also described architecture, and algorithms, to support HSQ contracts. We have analyzed the algorithm behavior analytically using a distribution-free approach in terms of the expected proportion of deterministic resources required to support a given HSQ level. Thus our analysis can be applied whatever the details of a particular system happen to be.
For a particular HSQ example (see Figure 2) where time slot lengths were log-Normally distributed we found that to provide hard guarantees on 8 quantiles for contract sizes of 16 to 1024 slots, from 13% to 1% additional deterministic resources were required. Permitting oversampling, for example for a contract size of 64, was relatively inefficient, leading to up to 61% of the stochastic resources being wasted. Including downwards substitution (ie, accepting longer time slots as valid substitutes for shorter ones) reduced deterministic resource requirements by roughly half. These results are robust (change <5\%) for changes in shape and scale parameters by a factor of 2 in both directions.
We have concluded that commercial service contracts with hard statistical QoS guarantees (HSQ), based on cycle-harvested resources are viable both from a conceptual point of view and quantitatively with only small (1% 10%) requirements for deterministic (dedicated) resources.
Chris Kenyon, IBM Research,
Zurich Research Lab., Switzerland