Mediated Resource Brokerage
Phase I Investigation, Final Report

September 10, 1997

Dr. Loki Jörgenson / Dr. Stephen Braham
Centre for Experimental and Constructive Mathematics
Simon Fraser University

  • Mediated Resource Brokerage project
  • Summary
  • Long Range Vision
  • Phase I Goals
  • PolyServ
  • Load Monitor System
  • Load Monitor Demo

  • Phase I Project Review
  • Demo Status
  • JDK Status
  • Project Concerns
  • HPCnet Resource Summary
  • Challenges for the Network

  • Phase II
  • Phase II Suggestions
  • Phase II Requirements

  • Background - PolyMath Overview
  • The PDG
  • PolyMath Projects
  • Forming a Consortium



  • Mediated Resource Brokerage project
    Summary

    The PolyMath Development Group of the Centre for Experimental and Constructive Mathematics has completed an initial survey of the HPCnet network and a proof-of-concept implementation of a PolyMath-based mediated resource brokerage system. The project's intent has been to assess the readiness of the technology, the network sites and the available facilities for the implementation of a distributed network infrastructure supporting the operation of HPCnet.

    At the time of this report, the survey has been completed and a simple demo employing PolyServ has been implemented. Progress in connecting various sites in the HPCnet network has been slowed by various considerations. This work is still underway, and the demo will be expanded to include new sites as they become available and as time permits.

    Results from the project indicate a high level of feasibility for such a distributed network infrastructure within HPCnet. While it is not expected to cause any particular technical problems, there are a number of administrative issues (such as security and site-to-site consistency) which will need to be addressed in order to pursue such a large-scale project effectively.

    In light of initiatives like the Canadian Computational Collaboratory, we propose that HPCnet seriously consider investigating a distributed approach to its network, employing some of the technologies that are described below. Discussions would need to take place regarding the nature of such a network and its breadth of utility before any end-specific project could be proposed.

    We wish to thank HPCnet for its support of this project and the opportunity to work with the various sites in the network. We appreciate the interest and helpfulness of the managers and committees at the sites. We hope to be provided with further opportunity to work with HPCnet and its constituents, and to bring our vision of an integrated distributed network to reality with the help and cooperation of other groups working on aspects of this problem.

    Long Range Vision

    The HPCnet network is presently a collection of resources and services which strongly begs the presence of an infrastructure of unifying communications and facilitation mechanisms. This central structure would allow users to access the associated HPCnet facilities in the most efficient fashion, and for services to be delivered to the larger community. In order to consider such an overarching support system, HPCnet would need to examine closely and implement some of the latest network technologies. This is viewed as essential in light of initiatives like the Canadian Computational Collaboratory.

    In our view, such a support system would have the form of an agent-oriented network which would include the following:

    User profiling
    Agent-based systems allow clients accessing HPCnet to maintain a record of what the user needs, and allow access to system resources to be built on those needs.

    Membership network
    A fully integrated infrastructure will allow users to truly be members of HPCnet, accessing many of the network's resources through a single command interface, and providing general access security.

    Two tier audience; developer and user
    Presently HPCnet operates on a `roll your own' basis: Users develop code that is then run on a target machine (typically for only a few executions) during which data is collected. This model does not recognize that a general researcher may want to use tools that require high-performance computing, but does not want to be the sole implementor of those tools. For HPCnet to deliver computational resources to a wide range of users, it is important to be able to offer a wide range of network-accessible user interfaces to advanced applications running on HPCnet hosts.

    User access control
    Users presently need to have accounts on a range of machines, with each account set up separately. With modern networking, and especially with agent-oriented networking, it should be possible to set up a single `HPCnet' account, and then set up an authorization list for which machines the user may use. Such a system would then automatically transfer files between hosts as needed to allow the user to access the best machine possible.
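    The single-account idea above can be sketched in Java as follows. This is only an illustration under stated assumptions: the class name AuthorizationList, the account name, and the load figures are hypothetical and are not part of PolyServ. The sketch keeps one authorization list per account and picks the least-loaded host that the account may use; file transfer between hosts is left out.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch: one HPCnet account mapped to the hosts it may use. */
class AuthorizationList {
    // account name -> set of host names the account is authorized to use
    private final Map<String, Set<String>> allowed = new HashMap<>();

    /** Adds a host to the account's authorization list. */
    void authorize(String account, String host) {
        allowed.computeIfAbsent(account, k -> new HashSet<>()).add(host);
    }

    /** True if the account has been authorized for the host. */
    boolean mayUse(String account, String host) {
        return allowed.getOrDefault(account, Collections.emptySet()).contains(host);
    }

    /** Returns the least-loaded host the account may use, or empty if none qualifies. */
    Optional<String> bestHost(String account, Map<String, Double> loadByHost) {
        return loadByHost.entrySet().stream()
                .filter(e -> mayUse(account, e.getKey()))
                .min(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        AuthorizationList acl = new AuthorizationList();
        acl.authorize("jsmith", "oxygen.cpsc.ucalgary.ca");   // hypothetical account
        acl.authorize("jsmith", "panther.uwo.ca");

        Map<String, Double> load = new HashMap<>();           // made-up load figures
        load.put("oxygen.cpsc.ucalgary.ca", 0.9);
        load.put("panther.uwo.ca", 0.2);
        load.put("husky1.ucs.ualberta.ca", 0.1);              // not authorized, so ignored

        System.out.println(acl.bestHost("jsmith", load).orElse("none"));
    }
}
```

    With the figures shown, the account is directed to panther.uwo.ca: husky1 carries the lowest load but is not on the account's list.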

    Resource access control
    An agent-based infrastructure for HPCnet will also allow host-based resources to be controlled and allocated on a per-user basis.

    Remote communication and exchange protocol
    The largest gain of a single infrastructure is that it will allow communication between remote users and services, and also allow real-time communication between users. This provides the essential collaborative structure needed for effective interactive use of high-performance computing.

    Phase I was designed to investigate HPCnet's current readiness for such an infrastructure by implementing a proof-of-concept system. The outcomes then identify the potential for a full-scale prototype in any subsequent phase of development.


    Phase I Goals

    The CECM PolyMath group proposed that it

    To achieve these goals, we proposed to proceed in several steps:

    • We conduct a tightly focussed investigation of HPCnet's resource base, audience needs and near future developments to determine the details of the implementation.
    • A proof-of-concept demonstration is developed.
    • A presentation is made to HPCnet members in Calgary.
    • Following discussions which establish a particular set of development goals and a production schedule, we pursue a pilot project.

    These goals form the first phase in a project to construct the infrastructure needed to upgrade HPCnet to a fully agent-capable modern network that can deliver more than simple large-scale batch-oriented computing to education, research, and business users in Canada.



    PolyServ: Distributed Services System

    PDG decided to test the interoperability of the HPCnet systems by implementing their PolyServ distributed computing technology on it.

    PolyServ is a system for delivering mathematical services over the network for use in advanced online environments for working in the sciences and engineering.

    The aims are to:
  • use distributed object networking to place services on many machines
  • create `service centres' that process requests for new services
  • automatically launch copies of services as needed
  • maintain service `pools' that provide pre-initialized services ready for rapid access (to handle high-load situations)
  • recycle services after use, to reduce the overhead of constantly creating new services
  • dispose of services, including service pools, when not needed
  • make all accesses completely transparent, so that location of service and service user do not matter
  • allow dynamic `hot-swapping' of services, so that services can be added, deleted, and modified without shutting down PolyServ or PolyNet

    PolyServ functions by placing ServiceLaunchers on each host, which maintain a list of services available on those hosts and publish that information to ServiceCentres on coordinating hosts. Clients can then request resources through the ServiceCentres, which allocate those resources on the best machine possible.
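
    The publish-and-allocate flow described above can be illustrated with a small Java sketch. It is not the PDG implementation: the class names ServiceCentreSketch and Listing, the service names, and the load figures are assumptions made for the example, and "best machine" is simplified to "lowest reported load among hosts offering the service". Everything runs in one process, whereas the real ServiceLaunchers and ServiceCentres communicate over the network.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical stand-in for a ServiceCentre; names and policy are assumptions. */
class ServiceCentreSketch {
    /** What a launcher publishes: its host, current load, and the services it offers. */
    static final class Listing {
        final String host;
        final double load;
        final Set<String> services;

        Listing(String host, double load, String... services) {
            this.host = host;
            this.load = load;
            this.services = new HashSet<>(Arrays.asList(services));
        }
    }

    // host name -> most recent listing published by that host's launcher
    private final Map<String, Listing> listings = new HashMap<>();

    /** Called by a launcher; a newer listing from the same host replaces the old one. */
    void publish(Listing listing) {
        listings.put(listing.host, listing);
    }

    /** Called when a launcher shuts down or a host leaves the network. */
    void withdraw(String host) {
        listings.remove(host);
    }

    /** Picks the least-loaded host that offers the requested service, if any. */
    Optional<String> allocate(String service) {
        return listings.values().stream()
                .filter(l -> l.services.contains(service))
                .min(Comparator.comparingDouble(l -> l.load))
                .map(l -> l.host);
    }

    public static void main(String[] args) {
        ServiceCentreSketch centre = new ServiceCentreSketch();
        centre.publish(new Listing("oxygen.cpsc.ucalgary.ca", 0.75, "load-monitor"));
        centre.publish(new Listing("panther.uwo.ca", 0.40, "load-monitor", "plotter"));
        System.out.println(centre.allocate("load-monitor").orElse("none"));
    }
}
```

    Because a fresh listing simply replaces the previous one for that host, a launcher can add or drop services by republishing, which is one simple way to approximate the hot-swapping aim listed above.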

    The core server system and client system are written in Java. This allows mediating agents to be dynamically downloaded into web browsers to handle communication with the remote resources, providing security and stability to the system.

    PolyServ represents PDG's core development of a fully agent-oriented, publish and subscribe, networking system. It was used in Phase I as a testbed, with minimal services, to explore the communication and implementation issues needed for a later complete implementation of such a system on HPCnet.
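
    Of the aims listed for PolyServ, the pooling, recycling, and disposal of services lend themselves to a compact illustration. The following Java sketch is a hypothetical example only: the class ServicePool, its parameters preload and maxIdle, and the use of strings as stand-in services are assumptions for the illustration and do not describe PolyServ's internals.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Hypothetical pool of pre-initialized services; S is whatever a service object is. */
class ServicePool<S> {
    private final Deque<S> idle = new ArrayDeque<>();
    private final Supplier<S> launcher;   // creates a fresh service when the pool is empty
    private final int maxIdle;            // services returned beyond this limit are dropped

    ServicePool(Supplier<S> launcher, int preload, int maxIdle) {
        this.launcher = launcher;
        this.maxIdle = maxIdle;
        for (int i = 0; i < preload; i++) {
            idle.push(launcher.get());    // pre-initialize for rapid access under load
        }
    }

    /** Hands out a pooled service if one is idle, otherwise launches a new one. */
    synchronized S acquire() {
        return idle.isEmpty() ? launcher.get() : idle.pop();
    }

    /** Recycles a service after use; returns false if it was dropped because the pool is full. */
    synchronized boolean release(S service) {
        if (idle.size() >= maxIdle) {
            return false;
        }
        idle.push(service);
        return true;
    }

    synchronized int idleCount() {
        return idle.size();
    }

    /** Disposes of the whole pool, for example when the service is no longer needed. */
    synchronized void dispose() {
        idle.clear();
    }

    public static void main(String[] args) {
        int[] launched = {0};
        ServicePool<String> pool =
                new ServicePool<>(() -> "service-" + (++launched[0]), 2, 2);
        String s = pool.acquire();        // served from the pool, no new launch needed
        pool.release(s);                  // recycled for the next client
        System.out.println("launched=" + launched[0] + " idle=" + pool.idleCount());
    }
}
```

    The point of the design is that a client request under load costs a pop from the idle queue rather than a full service start-up; a new service is launched only when the pool has run dry.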




    HPCnet Load Monitoring System

    The test chosen for the HPCnet Phase I project was to implement a basic service that would provide HPCnet host and load information, along with basic information on network connection quality.

    Each ServiceLauncher was set up to publish the required information, periodically and on demand, to a single ServiceCentre. A simple client was written that could then request this information from the ServiceCentre and graphically display it in a Java-capable web-browser.
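
    A rough Java sketch of this publish-and-query pattern is given below. It is not the Phase I demo code: the names LoadMonitorSketch, LoadReport, and Centre are hypothetical, the 200 ms interval is arbitrary, and the load figure comes from the standard java.lang.management API of current JDKs, which was not available in JDK 1.1. Launcher and centre share one process here, and network connection quality is not measured.

```java
import java.lang.management.ManagementFactory;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical load monitor: a launcher samples its host and publishes to a centre. */
class LoadMonitorSketch {
    /** One sample of host and load information. */
    static final class LoadReport {
        final String host;
        final double loadAverage;
        final long timestampMillis;

        LoadReport(String host, double loadAverage, long timestampMillis) {
            this.host = host;
            this.loadAverage = loadAverage;
            this.timestampMillis = timestampMillis;
        }
    }

    /** ServiceCentre side: remembers the latest report from each host. */
    static final class Centre {
        private final Map<String, LoadReport> latest = new ConcurrentHashMap<>();

        void publish(LoadReport report) {
            latest.put(report.host, report);
        }

        /** Returns the latest report for the host, or null if none was published. */
        LoadReport query(String host) {
            return latest.get(host);
        }
    }

    /** Samples the local host; the load average is negative where the platform cannot supply it. */
    static LoadReport sample() {
        String host;
        try {
            host = InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            host = "unknown";             // fall back when name lookup fails
        }
        double load = ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
        return new LoadReport(host, load, System.currentTimeMillis());
    }

    public static void main(String[] args) throws InterruptedException {
        Centre centre = new Centre();
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // Periodic publication, at an arbitrary interval chosen for the sketch.
        timer.scheduleAtFixedRate(() -> centre.publish(sample()), 0, 200, TimeUnit.MILLISECONDS);
        Thread.sleep(500);
        timer.shutdownNow();
        // On-demand publication: sample and publish immediately, then read it back.
        LoadReport now = sample();
        centre.publish(now);
        System.out.println(now.host + " load=" + centre.query(now.host).loadAverage);
    }
}
```

    A graphical client of the kind described would call query for each host and draw the results; that part is omitted.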




    Phase I Project Review
    Demo Status

    The project largely met the Phase I goals. Status is as follows:

  • The demo system was successfully constructed and ported to the SGI R10000 Power Challenge at the University of Calgary.
    Load Monitor Demo

  • Each machine in HPCnet was investigated for Java portability.
  • Account requests were placed at several HPCnet sites (some unsuccessfully).
  • Results were compiled, and conclusions drawn.



  • Java (1.1) availability

    Modern Java Remote Method Invocation (RMI) systems need Java 1.1 to function. This is now available from most vendors for their platforms. The present status is as follows:

  • Available on:
    • Sun Enterprise Series (Solaris 2.4 or above)
    • SGI PowerChallenge R-series (IRIX 5.3 or above)
    • IBM SP2 (AIX 4.1.3 or above)
    • Digital AlphaServer (Digital Unix)
    • Alex AVX-3 (NT 4.0 and above)
  • Not available on:
    • IBM SP2 with AIX 3.* OS
    • Alex AVX-3
    • Cray J90
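
    One way to confirm RMI support on a given host, once an account is available, is to ask the installed Java runtime directly. The sketch below is an assumption about how such a check might be written and was not part of the Phase I survey; the class name RmiCapabilityCheck is hypothetical, and the code uses syntax from current JDKs rather than JDK 1.1. It reports the runtime version and tests whether the core RMI interface java.rmi.Remote can be loaded.

```java
/** Hypothetical per-host check: does this Java runtime include RMI? */
class RmiCapabilityCheck {
    /** True if the named class can be loaded by this runtime. */
    static boolean hasClass(String name) {
        try {
            Class.forName(name);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version  = " + System.getProperty("java.version"));
        System.out.println("os.name/arch  = " + System.getProperty("os.name")
                + "/" + System.getProperty("os.arch"));
        // java.rmi.Remote is the marker interface that every RMI remote object implements.
        System.out.println("RMI available = " + hasClass("java.rmi.Remote"));
    }
}
```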



  • Project Concerns

    It was not possible to implement the software at all sites due to some limitations at various sites in HPCnet. However, it is firmly believed that these problems can be resolved if the project moves into Phase II.

  • Costs at some sites
    Many sites charge for CPU usage. This is not an appropriate model for an experimental network infrastructure (though, of course, it is perfectly reasonable for charging users).

  • Security concerns
    Many sites have security concerns when it comes to network connectivity. It should be noted that PDG is committed to ensuring that PolyServ does not expose hosts to attack. Under the circumstances though, the reluctance of some sites to participate is fully understood.

  • Slow progress communicating
    A lot of sites have a committee structure that makes it awkward to have accounts allocated for experimental work.

  • Batch-only machines
    Many sites use a batch-oriented procedure for distributing jobs across their machines. This is a historically significant technique, consistent with many existing computing practices, but it is not really compatible with modern high-performance computing, considering the availability of more advanced distributed computing solutions. For experimental network work, it is important that projects like ours have access to the underlying system, and are not limited to batch-oriented requests.

  • Internal policy limits
    There are many policies in place at HPCnet sites that are focussed on account requests that require purely high-powered, batch-oriented, computing. It is difficult to get accounts that are based on interactive access to and from the network. This makes it difficult to implement the software needed for network infrastructure at many HPCnet sites.



  • Summary of HPCnet Resources and Status

    University of Alberta

    System: IBM SP2 (8 nodes)

    Machine Names:

    Address: husky1.ucs.ualberta.ca, husky2.ucs.ualberta.ca, ..., husky8.ucs.ualberta.ca

    OS: AIX 4.1.4

    Java capable: Yes

    Notes

    Prompt and friendly email response to our OS query.
    Account recently received. Installation in progress.

    University of Calgary

    System: SGI PowerChallenge R8000 (18 nodes)

    Machine Names: Oxygen

    Address: oxygen.cpsc.ucalgary.ca

    OS:

    Java capable: Yes

    Notes

    System installed and functioning.
    Slight modifications needed due to DNS problems.

    Dalhousie University

    System: IBM SP2 (4 nodes)

    Machine Names:

    Address: sp2-eN.ucis.dal.ca

    OS: AIX V3.2.5

    Java capable: No

    Notes

    N/A

    Memorial University

    System: Digital AlphaServer 4100 (4 nodes)

    Machine Names:

    Address:

    OS: Digital Unix

    Java capable: Yes

    Notes

    Presently discussing security questions.

    University of Montreal

    System: SGI PowerChallenge R10000
    (two 8-node systems connected via HIPPI)

    Machine Names: Rossini, Schubert

    Address:

    OS: IRIX64 version 6.2

    Java capable: Yes

    Notes

    Despite initial contact delays (primary contact was out of town), email exchanges have been quite speedy and friendly. Account requests must go through lengthy committee decisions.

    Concerns raised include security issues and load requirements.

    University of Quebec at Hull

    System: Alex AVX-3 (16 nodes)

    Machine Names:

    Address:

    OS:

    Java capable: No

    Notes

    Presently discussing account.

    University of Sherbrooke

    System: IBM SP2 (16 nodes)

    Machine Names:

    Address:

    OS: AIX 3.2.5

    Java capable: No

    Notes

    N/A

    University of Western Ontario

    System: Ultrasparc Enterprise 4000 (6 nodes)

    Machine names:

    Address: panther.uwo.ca

    OS: Solaris 2.5.1

    Java capable: Yes


    System: Cray J90

    Machine names:

    Address:

    OS: UNICOS 9.2

    Java capable: No

    Notes

    Acquisition of account underway.

    Fee required for system access.


    Implementation issues

    The project was successful in highlighting the issues around constructing a collaborative, agent-based, network infrastructure on top of the present HPCnet. The issues that arose were as follows:

  • Security control
    It is important to ensure that security matters are addressed by anybody setting up a national infrastructure for HPCnet networking. This requires that considerable communication and understanding be built up between local site management and HPCnet administration.

  • Site requirements; current JDK
    Most of the HPCnet sites are capable of supporting Java in the latest, distributed computing-capable releases. It would be beneficial to have that software at all HPCnet sites, and essential for large-scale distributed agent-oriented networking. However, some sites cite security issues as a reason for not having it available. Others run batch-only services.

  • Breadth of utility
    It is not entirely clear how best to organize and offer HPCnet's services so as to meet its mandates. Considerable reflection is necessary to establish its direction and intended audience and how to meet the requirements of both.

  • HPCnet commitment to distributed networking
    If HPCnet commits to implementing distributed, collaborative computing throughout its network, PDG feels that specific support for such an initiative will be required. Related policies would be relatively straightforward and specialized in nature. Infrastructure needs to be supported as a research topic uniformly at all sites, with provisions made for special accounts and accounting procedures. This would greatly enhance HPCnet's ability to quickly implement experimental infrastructure, and investigate solutions for the future of the network.



  • Phase II

    Phase II Suggestions

    PDG is interested in initiating the second phase of this investigation. We would suggest that we do the following:

  • Implement a robust PolyServ system on all available HPCnet hosts.

  • Add a full agent-capable layer to PolyServ, allowing services to dynamically move from one HPCnet host to another.

  • Implement selected services through PolyServ that provide HPCnet utilization to general and/or specific users in a secure manner.

  • Implement PDG's PolyShare technology, which allows for collaborative use of the PolyServ system.



  • Phase II requirements

    The scale of the Phase II project is significantly larger than that of the Phase I project. It will require significantly more integration with, and access to, HPCnet resources. Although PDG has encountered some constraints at the various sites, the issues are more administrative than technical. It is felt that distributed computing, with full agent support, is entirely possible with most of the HPCnet sites. The requirements are as follows:

  • Access to non-batched, interactive shell, accounts on all HPCnet nodes.

  • Unlimited runtime, and no charges.

  • Exploration, with all HPCnet members, of security questions.

  • Open debate on utilization of the HPCnet network.

  • One full man-year of funding (note the increased level of commitment over the original proposal).





  • Background on PDG and PolyMath Technologies
    PolyMath Development Group
    http://pdg.cecm.sfu.ca/

    Director - Stephen Braham
    Developer - Terrance Yu
    Developer - Paul Irvine
    Educational Testing and Evaluation Coordinator - Nathalie Sinclair
    Educational Development - Terry Stanway
    Technical Assistant - Jen Chang



    PolyMath Technologies


    PolyServ

            ___
           /
          /
         /  ___
        /  / 
       /  / 
     ----------
       \  \ 
        \  \___
         \
          \
           \___
    
    remote high performance computing

    resource delivery-on-demand

    distributed resource base

    load balancing

    thin client technology


    PolyNet

            ___
           / 
          / 
     ----------
          \ 
           \___
    
    OpenMath-based standard

    delivery of mathematical objects

    inter-tool communications


    PolyShare

            ___
           /
          /
         /  ___
        /  / 
       /  / 
     ----------
       \  \ 
        \  \___
         \
          \
           \___
    
    collaboration and real-time interaction

    mediated resources

    mediated sharing and exchange

    thin interaction protocols

    diverse interaction models


    PolyManager

            ___
           /
          /
         /  ___
        /  / 
       /  / 
     ----------
       \  \ 
        \  \___
         \
          \
           \___
    
    user environment

    user control

    customization

    multiple windowing

    tool environment




    A Consortium for Network Technology development
    in the Mathematical Sciences

    It is important to link the resources of the various groups that are now working on the knotty problems of

  • Communicating scientific and engineering data

  • Providing for sophisticated online learning

  • Constructing networked environments for collaborative scientific research

  • Providing interactive content

    Thus the PolyMath Development Group is seeking to build a consortium with other groups around the world to share resources and knowledge.