Dorothea Beringer and Gio Wiederhold
Computer Science Department, Stanford University
{beringer,gio}@db.stanford.edu
http://www-db.stanford.edu/CHAIMS/
Because we assume remote services to be autonomous, and assume that the programmer does not have reliable information about their performance, CPAM allows the client to get performance estimates before invocations take place and to check the status of an ongoing execution. We assume that the services we compose by using CPAM are large, remote, and autonomous. Remote services can be offered by servers within the same organization or by servers of other organizations. The location of a service is unimportant as long as the servers and the client are connected by some distribution system on which CPAM is implemented.
We assume that the services offered by servers in other organizations are autonomous. Autonomy implies that the implementation, execution, and maintenance of a service are under the control of the organization owning it. The only control the client has is to select another service or to negotiate improvements, which is easier when the service is being paid for. Changes to the servers and their services can be made at any time as long as the services still comply with the interface posted in a generally accessible repository. Autonomy also means that the person doing the composition has no control over the resources that the providing organization makes available to the services. Yet the availability of resources has a great influence on the performance of services.
Of course, we could require that performance characteristics of the services be posted along with the interface definition in a repository. Yet this approach has several weaknesses: 1) performance depends not only on resources but also on the specific input data; 2) static performance information cannot take into account dynamic fluctuations in the performance of services, and it might give only an average or an upper bound; 3) static performance information can help in deciding whether to use a certain service, but it offers no way to monitor the execution and performance of the service at run-time. Therefore we have introduced into CPAM the capability to obtain, prior to invocation, performance estimates from the module offering the service.
We also enable the monitoring of the performance of a service during
its execution. This allows clients to make informed decisions before and during
service invocations, and opens up various novel optimization possibilities
specific to the composition of large, remote, and autonomous services.
In order to allow composition by a simple sequential client, all the primitives in CPAM are initiated by the client and are synchronous procedure calls from the client to the modules offering the services. This has the advantage of being a simple and very generally applicable paradigm that can be easily implemented on top of various distribution systems like RMI, CORBA or DCE. Having a simple client structure also simplifies system optimization, e.g., exploiting the inherent parallelism between various services.
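The following minimal sketch illustrates how a purely sequential client can still exploit parallelism between services when every primitive is a short synchronous call. It assumes that an invocation call returns a handle right away while the work continues on the server; that assumption, the ToyService class, its tick method (which only simulates server-side time), and all method names are hypothetical and not part of the CPAM definition.

```python
class ToyService:
    """Stand-in for a remote module; every name here is hypothetical."""

    def __init__(self, work_units):
        self._work_units = work_units
        self._remaining = {}  # invocation handle -> work units left
        self._next_id = 0

    def invoke(self):
        """Start an invocation and return its handle immediately."""
        self._next_id += 1
        self._remaining[self._next_id] = self._work_units
        return self._next_id

    def tick(self):
        """Simulate one unit of server-side time passing."""
        for handle in self._remaining:
            self._remaining[handle] = max(0, self._remaining[handle] - 1)

    def examine(self, handle):
        """Report the status of one invocation."""
        return "DONE" if self._remaining[handle] == 0 else "NOT_DONE"


def run_client(services):
    """Sequential client: start every invocation, then poll until all are done."""
    handles = [(service, service.invoke()) for service in services]
    elapsed = 0
    while any(service.examine(handle) != "DONE" for service, handle in handles):
        for service in services:
            service.tick()
        elapsed += 1
    return elapsed


# Two services needing 3 and 5 time units overlap instead of running back to back.
elapsed = run_client([ToyService(3), ToyService(5)])
```

In this toy setting the elapsed time equals that of the slowest service rather than the sum of both, even though the client itself contains no threads.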
A client can use the TERMINATE primitive to terminate an invocation in which it is no longer interested. Any results the server has stored for extraction, as well as other invocation-specific information, are then deleted.
Presetting of parameters: Input parameters for invocations as well as results are transmitted in CPAM as name-value lists. For each parameter, these lists contain its name as specified by the server providing the service in a generally accessible repository. For simple data elements, the parameter value is a triplet containing type, descriptive information, and the actual value. Complex data elements are hierarchies of triplets, with each node having type and descriptive information as well as either the actual value or another complex data element.
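One possible in-memory shape for such a name-value list is sketched below. The class names Simple and Complex, the flatten helper, and the example parameters are illustrative assumptions; the text above does not prescribe a concrete encoding.

```python
from dataclasses import dataclass


@dataclass
class Simple:
    """Leaf triplet: type, descriptive information, actual value."""
    type: str
    description: str
    value: object


@dataclass
class Complex:
    """Inner node: type and description plus nested, named data elements."""
    type: str
    description: str
    children: dict  # name -> Simple or Complex


def flatten(name, element):
    """Return (dotted path, value) pairs for every leaf below element."""
    if isinstance(element, Simple):
        return [(name, element.value)]
    pairs = []
    for child_name, child in element.children.items():
        pairs.extend(flatten(f"{name}.{child_name}", child))
    return pairs


# A hypothetical name-value list with one simple and one complex parameter.
params = {
    "origin": Simple("string", "departure city", "A"),
    "shipment": Complex("record", "goods to transport", {
        "weight": Simple("float", "weight in kg", 120.5),
        "fragile": Simple("boolean", "needs careful handling", True),
    }),
}

leaves = [pair for name, element in params.items() for pair in flatten(name, element)]
```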
In order to avoid data flow redundancy, INVOKE only needs those input parameters that differ from the default values provided by the service. CPAM provides the primitives SETPARAM and GETPARAM for presetting parameters. Parameters that remain the same for several service invocations by the same client have to be set only once.
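The effect of presetting can be sketched as a three-level lookup: values passed to INVOKE override the client's presets, which in turn override the service defaults. The ToyModule class, its method signatures, the behavior assumed for GETPARAM (returning the preset or default currently in effect), and the parameter names are all hypothetical.

```python
class ToyModule:
    """Hypothetical module that keeps service defaults and client presets."""

    def __init__(self, defaults):
        self._defaults = dict(defaults)
        self._presets = {}

    def setparam(self, **params):
        """Preset parameters once for all later invocations by this client."""
        unknown = set(params) - set(self._defaults)
        if unknown:
            raise KeyError(f"unknown parameters: {sorted(unknown)}")
        self._presets.update(params)

    def getparam(self, name):
        """Return the value currently in effect (assumed behavior)."""
        return self._presets.get(name, self._defaults[name])

    def invoke(self, **params):
        """Return the effective parameter set: defaults < presets < call arguments."""
        return {**self._defaults, **self._presets, **params}


route = ToyModule({"origin": None, "destination": None,
                   "currency": "USD", "max_stops": 2})
route.setparam(origin="A", currency="CHF")           # sent only once
first = route.invoke(destination="B")                # only the differing value is sent
second = route.invoke(destination="C", max_stops=0)  # per-call override
```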
Partial result extraction: A client can extract a subset of the results of a service, only including the elements it needs. CPAM also allows progressive extraction: the client can repeatedly extract more accurate values of the same result parameter. Incremental extraction of results is used if the service makes a result available as soon as its computation is completed, even before the computation of the next result is done.
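The three extraction styles described above can be illustrated with a toy result store. The ToyInvocation class, the publish and extract methods, and the result names are assumptions made for this sketch only.

```python
class ToyInvocation:
    """Hypothetical server-side result store for a single invocation."""

    def __init__(self):
        self._results = {}

    def publish(self, name, value):
        """Server side: make a result available, or replace it with a refined value."""
        self._results[name] = value

    def extract(self, names):
        """Client side: return only the requested results that exist so far."""
        return {n: self._results[n] for n in names if n in self._results}


inv = ToyInvocation()
inv.publish("cheapest_fare", 410.0)
inv.publish("itinerary", ["A", "hub", "B"])

# Partial extraction: the client asks for one element and ignores the rest.
partial = inv.extract(["cheapest_fare"])

# Incremental extraction: a result that is not yet computed is simply absent.
early = inv.extract(["cheapest_fare", "carbon_estimate"])

# Progressive extraction: the same result is extracted again after refinement.
inv.publish("cheapest_fare", 385.0)
refined = inv.extract(["cheapest_fare"])
```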
Invocation monitoring: The EXAMINE primitive allows the client to monitor the progress of an invocation. It returns the status of the invocation (e.g., DONE, NOT_DONE) as well as a progress estimate (e.g., 30%). While the first return parameter, the invocation status, is well defined in CPAM, the interpretation of the second parameter, the progress, is service specific. It can be a quantitative measure denoting the progress of work in time or data volume. In the case of simulation services, it can express the quality of the current results.
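A client might use the status returned by EXAMINE as sketched below, giving up on an invocation that is still not done after a fixed number of polls. The ToyInvocation class, the scripted progress values, and the wait_or_give_up policy are hypothetical; only the status values DONE and NOT_DONE come from the text above.

```python
class ToyInvocation:
    """Hypothetical invocation whose progress follows a scripted sequence."""

    def __init__(self, progress_steps):
        self._steps = iter(progress_steps)
        self.terminated = False

    def examine(self):
        """Return (status, progress); progress is a percentage in this toy."""
        progress = next(self._steps, 100)  # report 100 once the script is used up
        status = "DONE" if progress >= 100 else "NOT_DONE"
        return status, progress

    def terminate(self):
        """Stand-in for TERMINATE: mark the invocation as abandoned."""
        self.terminated = True


def wait_or_give_up(invocation, max_polls):
    """Poll up to max_polls times; terminate the invocation if it is still running."""
    for poll in range(1, max_polls + 1):
        status, _progress = invocation.examine()
        if status == "DONE":
            return "DONE", poll
    invocation.terminate()
    return "TERMINATED", max_polls


fast = ToyInvocation([30, 70, 100])
slow = ToyInvocation([5, 10, 15, 20])
fast_outcome = wait_or_give_up(fast, max_polls=5)
slow_outcome = wait_or_give_up(slow, max_polls=3)
```

A real client would additionally interpret the progress value, but because its meaning is service specific the sketch relies on the status alone.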
Cost estimation: EXAMINE only allows the client to monitor an invocation and get progress estimates after a service has been invoked. Yet in many cases, especially for optimization and invocation scheduling, a client would like to get various cost estimates prior to invocation. This is done with the ESTIMATE primitive, which for a specific service and a specific set of preset parameters returns estimates of the execution time of the service, the fee to be paid for the service, and the data volume of the expected results.
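As one example of how a client could act on such estimates, the sketch below picks the cheapest service whose estimated execution time meets a deadline. The Estimate record, the choose_service function, the service names, and all numbers are invented for illustration; in practice the values would come from ESTIMATE calls to the individual modules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    """The three quantities named in the text, in hypothetical units."""
    seconds: float     # estimated execution time
    fee: float         # fee to be paid for the service
    result_bytes: int  # expected data volume of the results


def choose_service(estimates, deadline_seconds):
    """Return the cheapest service that meets the deadline, or None if none does."""
    feasible = {name: e for name, e in estimates.items()
                if e.seconds <= deadline_seconds}
    if not feasible:
        return None
    return min(feasible, key=lambda name: feasible[name].fee)


# Made-up estimates, standing in for the answers of three modules.
estimates = {
    "air_quote": Estimate(seconds=12.0, fee=4.00, result_bytes=20_000),
    "rail_quote": Estimate(seconds=45.0, fee=0.50, result_bytes=8_000),
    "road_quote": Estimate(seconds=30.0, fee=1.25, result_bytes=15_000),
}

choice = choose_service(estimates, deadline_seconds=40)
```

Other objectives, such as minimizing transferred data volume, would only change the key used in the final selection.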
For more information about CPAM see [Melloul:99], or the description
of the various CHAIMS components on our web pages (http://www-db.stanford.edu/CHAIMS/).
Pre-invocation estimates (ESTIMATE primitive) can be used for various optimization objectives:
So far we have used the CPAM protocol in the CHAIMS project, where we implemented it on top of CORBA and RMI. We also provide wrapper templates for wrapping legacy code into CPAM-compatible modules. These wrapper templates take care of handling several concurrent invocations, presetting parameters, transforming parameter values, and dispatching method calls to the legacy code. For the cases where a legacy module does not provide any pre-invocation estimates, future versions of the wrappers will also provide estimates based on the history of previous invocations. The repository, accessible either as a simple text file or via a graphical browser, contains the names of available services and their parameters together with additional information. The CHAIMS environment ([Beringer:98], [Perrochon:97]) furthermore contains a compiler for the language CLAM (Composition Language for Autonomous Megamodules) [Sample:99]. This compiler generates client code that uses CPAM within various distribution systems in order to access distributed, CPAM-compliant modules.
The ESTIMATE and EXAMINE primitives make various optimization techniques feasible. It is possible to hand-code clients that optimize based on these primitives, either directly in C++ or Java, or in a higher-level language like the composition language CLAM. Yet this is a very tedious task. We are therefore investigating automatic optimization and scheduling techniques. Based on given preferences and constraints, these optimization techniques find optimal services and invocation schedules. No pre-existing cost model is used. Optimization can be done at compile time as well as at run-time, based on the ESTIMATE and EXAMINE primitives. Run-time estimates take into account the influence of actual input parameters as well as the availability of resources. Besides avoiding additional dependencies and information flows between the highly independent providers of autonomous services and the clients using these services, this also has the advantage of being accurate even in fast-changing environments.
Our current demonstration example for CPAM comes from the domain of
logistics. Several information and reservation services are used to determine
the best way of transportation from a city A to a city B. Yet the simple
CPAM protocol is not limited to software services, and there are no constraints
concerning the maximum execution time of a service. Therefore, we also
plan to investigate the applicability of the CPAM protocol to workflow
management. We believe that especially the usage of pre-invocation estimates
could be of high interest in the domain of workflow management.
[Melloul:99] L. Melloul, D. Beringer, N. Sample, G. Wiederhold: CPAM, A Protocol for Software Composition; submitted
[Perrochon:97] Perrochon, Wiederhold, Burback: A Compiler for Composition: CHAIMS; Fifth International Symposium on Assessment of Software Tools and Technologies (SAST'97), Pittsburgh, June 3-5, 1997
[Sample:99] N. Sample, D. Beringer, L. Melloul, G. Wiederhold: CLAM: Composition Language for Autonomous Megamodules; submitted
[Swenson:98] K. Swenson: SWAP Simple Workflow Access Protocol (SWAP); Internet Draft, http://www.ietf.org/internet-drafts/draft-swenson-swap-prot-00.txt