September 10, 1997
The PolyMath Development Group of the Centre for Experimental and Constructive Mathematics has completed an initial survey of the HPCnet network and a proof-of-concept implementation of a PolyMath-based mediated resource brokerage system. The project's intent has been to assess the readiness of the technology, the network sites and the available facilities for the implementation of a distributed network infrastructure supporting the operation of HPCnet.
At the time of this report, the survey has been completed and a simple demo employing PolyServ has been implemented. Progress in connecting the sites in the HPCnet network has been slowed by various considerations. This work is still underway, and the demo will be expanded to include new sites as they become available and time permits.
Results from the project indicate a high level of feasibility for such a distributed network infrastructure within HPCnet. While no particular technical problems are expected, there are a number of administrative issues (such as security and site-to-site consistency) which will need to be addressed in order to pursue such a large-scale project effectively.
In light of initiatives like the Canadian Computational Collaboratory, we propose that HPCnet seriously consider investigating a distributed approach to its network, employing some of the technologies that are described below. Discussions would need to take place regarding the nature of such a network and its breadth of utility before any end-specific project could be proposed.
We wish to thank HPCnet for its support of this project and the opportunity to work with the various sites in the network. We appreciate the interest and helpfulness of the managers and committees at the sites. We hope to be provided with further opportunity to work with HPCnet and its constituents, and to bring our vision of an integrated distributed network to reality with the help and cooperation of other groups working on aspects of this problem.
The HPCnet network is presently a collection of resources and services which strongly calls for an infrastructure of unifying communications and facilitation mechanisms. This central structure would allow users to access the associated HPCnet facilities in the most efficient fashion, and would allow services to be delivered to the larger community. In order to consider such an overarching support system, HPCnet would need to examine closely, and implement, some of the latest network technologies. This is viewed as essential in light of initiatives like the Canadian Computational Collaboratory.
In our view, such a support system would have the form of an agent-oriented network which would include the following:
The CECM PolyMath group proposed that it
To achieve these goals, we proposed to proceed in several steps:
These goals form the first phase in a project to construct the infrastructure
needed to upgrade HPCnet to a fully agent-capable modern network that
can deliver more than simple large-scale batch-oriented computing to
education, research, and business users in Canada.
PDG decided to test the interoperability of the HPCnet systems by
implementing their PolyServ distributed computing technology on it.
PolyServ is a system for delivering mathematical services over the network
for use in advanced online environments for working in the sciences
and engineering.
PolyServ functions by placing ServiceLaunchers on each host. Each
ServiceLauncher maintains a list of the services available on its host
and publishes that information to ServiceCentres on coordinating hosts.
Clients can then request resources through the ServiceCentres, which
allocate those resources on the best machine possible.
The core server and client systems are written in Java. This allows
mediating agents to be dynamically downloaded into web browsers to handle
communication with the remote resources, providing security and stability
to the system.
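The launcher-to-centre flow described above can be pictured with a small in-process sketch. This is not the PolyServ code or its API: the class names ServiceAd, ServiceCentreSketch and ServiceLauncherSketch, the host names, and the choice of "least reported load" as the meaning of "best machine" are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** One advertisement: a host offering a named service at a reported load. */
final class ServiceAd {
    final String host;
    final String service;
    final double load;

    ServiceAd(String host, String service, double load) {
        this.host = host;
        this.service = service;
        this.load = load;
    }
}

/** Hypothetical stand-in for a ServiceCentre: stores advertisements and brokers requests. */
final class ServiceCentreSketch {
    // service name -> (host name -> most recent advertisement from that host)
    private final Map<String, Map<String, ServiceAd>> registry = new HashMap<>();

    void publish(ServiceAd ad) {
        registry.computeIfAbsent(ad.service, k -> new HashMap<>()).put(ad.host, ad);
    }

    /** Assumption: the "best" machine is the one with the lowest reported load. */
    Optional<String> allocate(String service) {
        Map<String, ServiceAd> ads = registry.get(service);
        if (ads == null) {
            return Optional.empty();
        }
        return ads.values().stream()
                .min(Comparator.comparingDouble((ServiceAd ad) -> ad.load))
                .map(ad -> ad.host);
    }
}

/** Hypothetical stand-in for a ServiceLauncher running on one host. */
final class ServiceLauncherSketch {
    private final String host;
    private final List<String> services = new ArrayList<>();

    ServiceLauncherSketch(String host) {
        this.host = host;
    }

    void addService(String name) {
        services.add(name);
    }

    /** Publishes every locally available service, tagged with the current load. */
    void publishTo(ServiceCentreSketch centre, double currentLoad) {
        for (String service : services) {
            centre.publish(new ServiceAd(host, service, currentLoad));
        }
    }
}

final class BrokerDemo {
    public static void main(String[] args) {
        ServiceCentreSketch centre = new ServiceCentreSketch();

        ServiceLauncherSketch a = new ServiceLauncherSketch("hostA");
        a.addService("host-info");
        ServiceLauncherSketch b = new ServiceLauncherSketch("hostB");
        b.addService("host-info");

        a.publishTo(centre, 0.9);
        b.publishTo(centre, 0.2);

        // The client asks the centre, not the hosts, for a machine.
        System.out.println(centre.allocate("host-info").orElse("none"));
    }
}
```

In a deployed system the publish and allocate calls would cross the network; here they are plain method calls so that the brokerage logic can be read in isolation.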
PolyServ represents PDG's core development of a fully agent-oriented, publish-and-subscribe networking system. It was used in Phase I as a testbed, with minimal services, to explore the communication and implementation issues that must be resolved before a complete implementation of such a system on HPCnet.
The test used for the HPCnet Phase I project was to attempt implementing
a basic service that would provide HPCnet host and load information, and
basic information on network connection quality.
Each ServiceLauncher was set up to publish the required information,
periodically and on demand, to a single ServiceCentre. A simple client
was written that could then request this information from the ServiceCentre
and graphically display it in a Java-capable web-browser.
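The "periodically and on demand" publishing described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the Phase I code: LoadBoard and LoadReporter are hypothetical names, the centre is reduced to an in-memory map, and the demo reads the load from the standard java.lang.management API, which was not available to the Java 1.1 systems discussed in this report.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.DoubleSupplier;

/** Hypothetical centre-side store: the most recent load reported by each host. */
final class LoadBoard {
    private final Map<String, Double> latest = new ConcurrentHashMap<>();

    void report(String host, double load) {
        latest.put(host, load);
    }

    /** Returns the last reported load, or null if the host has never reported. */
    Double loadOf(String host) {
        return latest.get(host);
    }
}

/** Hypothetical launcher-side reporter: publishes on a timer and on request. */
final class LoadReporter {
    private final String host;
    private final DoubleSupplier probe;   // how the load is measured (assumption)
    private final LoadBoard board;
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();

    LoadReporter(String host, DoubleSupplier probe, LoadBoard board) {
        this.host = host;
        this.probe = probe;
        this.board = board;
    }

    /** On-demand publication: measure once and report immediately. */
    void publishNow() {
        board.report(host, probe.getAsDouble());
    }

    /** Periodic publication: report now, then again every periodMillis. */
    void start(long periodMillis) {
        timer.scheduleAtFixedRate(this::publishNow, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    void stop() {
        timer.shutdownNow();
    }
}

final class LoadDemo {
    public static void main(String[] args) throws InterruptedException {
        LoadBoard board = new LoadBoard();
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();

        // getSystemLoadAverage() returns a negative value where the platform
        // cannot supply a load average.
        LoadReporter reporter = new LoadReporter("hostA", os::getSystemLoadAverage, board);
        reporter.start(1000);
        Thread.sleep(2500);
        System.out.println("hostA load: " + board.loadOf("hostA"));
        reporter.stop();
    }
}
```

Passing the measurement in as a DoubleSupplier keeps the reporter independent of how load is obtained, so the same class can be exercised with fixed values.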
The project largely met the Phase I goals. Status is as follows:
Modern Java Remote Method Invocation (RMI) systems need Java 1.1 to function. This is now available from most vendors for their platforms. The present status is as follows:
It was not possible to implement the software at all sites due to some limitations at various sites in HPCnet. However, it is firmly believed that these problems can be resolved if the project moves into Phase II.
University of Alberta
System: IBM SP2 (8 nodes)
Machine names: (not listed)
Address: husky1.ucs.ualberta.ca, husky2.ucs.ualberta.ca, ..., husky8.ucs.ualberta.ca
OS: AIX 4.1.4
Java capable: Yes
Notes: Prompt and friendly email response to our OS query.

University of Calgary
System: SGI PowerChallenge R8000 (18 nodes)
Machine names: Oxygen
Address: oxygen.cpsc.ucalgary.ca
OS: (not listed)
Java capable: Yes
Notes: System installed and functioning.

Dalhousie University
System: IBM SP2 (4 nodes)
Machine names: (not listed)
Address: sp2-eN.ucis.dal.ca
OS: AIX V3.2.5
Java capable: No
Notes: N/A

Memorial University
System: Digital AlphaServer 4100 (4 nodes)
Machine names: (not listed)
Address: (not listed)
OS: Digital Unix
Java capable: Yes
Notes: Presently discussing security questions.

University of Montreal
System: SGI PowerChallenge R10000
Machine names: Rossini, Schubert
Address: (not listed)
OS: IRIX64 version 6.2
Java capable: Yes
Notes: Despite initial contact delays (primary contact was out of town), email exchanges have been quite speedy and friendly. Account requests must go through lengthy committee decisions. Concerns raised include security issues and load requirements.

University of Quebec at Hull
System: Alex AVX-3 (16 nodes)
Machine names: (not listed)
Address: (not listed)
OS: (not listed)
Java capable: No
Notes: Presently discussing account.

University of Sherbrooke
System: IBM SP2 (16 nodes)
Machine names: (not listed)
Address: (not listed)
OS: AIX 3.2.5
Java capable: No
Notes: N/A

University of Western Ontario
System: Ultrasparc Enterprise 4000 (6 nodes)
Machine names: (not listed)
Address: panther.uwo.ca
OS: Solaris 2.5.1
Java capable: Yes
System: Cray J90
Machine names: (not listed)
Address: (not listed)
OS: UNICOS 9.2
Java capable: No
Notes: Acquisition of account underway. Fee required for system access.

The project was successful in highlighting the issues around constructing a collaborative, agent-based network infrastructure on top of the present HPCnet. The issues that arose were as follows:
Phase II Suggestions
PDG is interested in initiating the second phase of this investigation.
We would suggest that we do the following:
The scale of the Phase II project is significantly larger than that of the Phase I project. It will require significantly more integration with, and access to, HPCnet resources. Although PDG has encountered some constraints at the various sites, the issues are more administrative than technical. It is felt that distributed computing, with full agent support, is entirely possible with most of the HPCnet sites. The requirements are as follows:
PolyServ: remote high performance computing; resource delivery-on-demand; distributed resource base; load balancing; thin client technology
PolyNet: OpenMath-based standard delivery of mathematical objects; inter-tool communications
PolyShare (collaboration and real-time interaction): mediated resources; mediated sharing and exchange; thin interaction protocols; diverse interaction models
PolyManager (user environment): user control; customization; multiple windowing tool environment
Thus the PolyMath Development Group is seeking to build a consortium with other groups around the world to share resources and knowledge.