Mon Nov 23 16:03:12 EST 1992
From owner-mpi-collcomm@CS.UTK.EDU  Tue Nov 24 23:07:28 1992
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA15625; Tue, 24 Nov 92 23:07:28 -0500
Received:  by CS.UTK.EDU (5.61++/2.8s-UTK)
	id AA26099; Tue, 24 Nov 92 22:57:48 -0500
Received: from gstws.EPM.ORNL.GOV by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA26095; Tue, 24 Nov 92 22:57:45 -0500
Received: by gstws.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03)
          id AA13365; Tue, 24 Nov 1992 22:57:44 -0500
Date: Tue, 24 Nov 1992 22:57:44 -0500
From: geist@gstws.epm.ornl.gov (Al Geist)
Message-Id: <9211250357.AA13365@gstws.epm.ornl.gov>
To: mpi-collcomm@cs.utk.edu
Subject: MPI collective communication...


Collective communication subcommittee.

Welcome. We have our work cut out for us - first because collective
communication was not included in the first iteration of the MPI draft
and second because "groups" caused the most resistance in the last meeting.

In the next 6 weeks we need to come up with and agree on the definition 
of a set of routines that fall under the jurisdiction of collective
communication. As I see it these routines fall into two categories.

- routines that require the cooperation of a group of processes.
  This includes collective communication like multicast 
  and cooperative routines like synchronization.

- routines that create groups of processes and potentially modify these groups.
  This also needs to include group information routines 
  that we feel are required like who am I in the group.

Two items that need to be coordinated with the pt2pt subcommittee:
heterogeneity - it's not in the present MPI draft. If we want to be able
                to execute across heterogeneous networks, then we have to 
                think about how a process is identified in MPI and
                also how a message buffer can get encoded/decoded.
                For the latter we will need to know the type of the
                data in pack/unpack routines. 
                (or specified directly in the send/recv)

Inter-group communication - point to point communication between two members
                of a group.

As a first step I would like to get everyone's ideas out on the table
so we can see what type of consensus we have. And so we don't miss any
good ideas. So what basic routines (functions) do you think are required?
I would like to get your input to this first step by December 5.
----------------------
Since I got the short straw, I'll go first.
My basic philosophy about MPI and our standards effort is to
KEEP THINGS SIMPLE. It is easier to add a function later if
we see lots of users combining the basic routines in standard ways.
It is a waste to support a bunch of routines only 1% of the users ever call.

General:
I would like to see all the routines be functions that return error code(s)
as opposed to subroutines.

=======================================================================
Groups:
=======================================================================
Groups could be implemented separate from the collective communication routines.
The collective routines could take an integer array list of task IDs
and there could be a group routine that returned such a list.
There are efficiency factors here since the list of members of a group
would not have to be looked up every time a collective routine was called.
FUNCTIONS: groupsize()
           groupmembers()

GID: groups could be user named and addressed by name
or they could be addressed by a system supplied (unique) integer group ID.

Question - should groups be allowed to overlap?
Question - should we let groups be dynamic or restrict them to be static?

Group member IDs: There should be a notion of the members of a group
being addressable either directly or indirectly by [0 -- num_of_members-1]
There needs to be a routine to return mygroupINDEX (at least) and maybe
a more general routine that can return any process' group index.
FUNCTIONS: gettaskID( given GID and group index )
           getindex( given GID and taskID )
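
As a rough illustration of the two lookups above, the sketch below
(Python, not part of any MPI draft) stores a static group as an ordered
list of task IDs. The class name Group and its constructor are
assumptions made for the example; only groupsize, groupmembers,
gettaskID and getindex are names taken from this message.

```python
class Group:
    """Hypothetical static group: an ordered list of task IDs."""

    def __init__(self, gid, task_ids):
        self.gid = gid
        self._members = list(task_ids)
        # Reverse map from task ID to group index, built once.
        self._index = {tid: i for i, tid in enumerate(self._members)}

    def groupsize(self):
        return len(self._members)

    def groupmembers(self):
        # Return a copy so callers cannot modify the group.
        return list(self._members)

    def gettaskID(self, index):
        """Task ID of the member with the given group index."""
        return self._members[index]

    def getindex(self, task_id):
        """Group index (0 .. groupsize-1) of the given task ID."""
        return self._index[task_id]
```

Building the reverse map once means getindex() does not have to scan
the member list on every call.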

Creating groups: Here are three alternative methods. 
Method 1 (dynamic)
Most general case is to allow any task to join or leave
any group at any time without the consent of the other group members.
While this creates a simple and flexible user interface, it can be 
difficult to implement because of the potential race conditions.
FUNCTIONS: joingroup()
           leavegroup()

Method 2 (static)
A group could be defined by any single task by listing the task IDs.
Or alternatively all the future members of a group have to simultaneously
define the same group.
FUNCTION: makegroup()

Method 3 (dynamic)
Another method which met with some resistance when presented at the 
last MPI meeting was the notion of creating groups by partitioning
an existing group. The negative comments were the large number of routines
involved and the lack of usefulness of a tree of groups.
I am not keen on this method but include it for completeness.
FUNCTIONS: from MPI draft
           partition()
           root()
           children()
           parent()
           siblings()
           pushg()
           popg()

==============================================================================
Collective Routines:
==============================================================================
One problem we can get into is defining many different 
collective communication routines gmax, gsum, gadd, etc.
I propose that we have only a handful of routines based 
on the underlying communication logic.
All participating tasks call the same function.

FUNCTIONS:

broadcast()  broadcast a message from one task to all tasks in a group.

reduce()     inverse of broadcast. Data from all tasks in a group
			 is reduced using a predefined function or a user function
			 and the result is placed in a specified task.
			 Function name is specified in the argument list.
			 Pre-defined functions should include: max, min, add, mult,
			 and optionally AND, OR, XOR. (others?)

scatter()    a single task contains different messages for each task.
			 Scatter these messages to all tasks in a group.

gather()     inverse of scatter. gather distinct messages from each task
			 in a group and collect them in a specified task.

synchronize()  barrier synchronization of a group of tasks.

shift()      assume group members form a (logical) ring.
			 shift the message in each task to its right (or left) neighbor.
			 (useful in matrix multiply shift and roll algorithm)

exchange()   equivalent to every task in a group calling scatter.
             (routine used for matrix transpose)

all2all()    equivalent to every task in a group calling broadcast.
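
To make the intended data movement concrete, here is a minimal Python
sketch that models a group as a list indexed by group index, with
entry i holding the data of task i. It shows only the before/after
contents of each task, not how messages travel. The argument lists are
assumptions, since this message does not define them.

```python
from functools import reduce as fold


def broadcast(data, root):
    """Every task ends up with the root's message."""
    return [data[root] for _ in data]


def reduce(data, root, func):
    """Combine all contributions with func; only the root gets the result."""
    result = fold(func, data)
    return [result if i == root else None for i in range(len(data))]


def scatter(messages):
    """The root holds one message per task; task i receives messages[i]."""
    return list(messages)


def gather(data, root):
    """Inverse of scatter: the root collects one message from each task."""
    return [list(data) if i == root else None for i in range(len(data))]


def exchange(data):
    """data[i][j] is the message task i holds for task j.
    Afterwards task j holds data[0][j], data[1][j], ... (a transpose)."""
    n = len(data)
    return [[data[i][j] for i in range(n)] for j in range(n)]


def all2all(data):
    """Every task broadcasts, so every task ends up with all messages."""
    return [list(data) for _ in data]
```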

                          -----------------------------
   __o        /\          Al Geist
 _`\<,_    /\/  \         Oak Ridge National Laboratory
(_)/ (_)  /      \        (615) 574-3153   gst@ornl.gov
* * * * * * * * * *       -----------------------------
From owner-mpi-collcomm@CS.UTK.EDU  Wed Nov 25 13:41:20 1992
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA21743; Wed, 25 Nov 92 13:41:20 -0500
Received:  by CS.UTK.EDU (5.61++/2.8s-UTK)
	id AA09805; Wed, 25 Nov 92 13:14:39 -0500
Received: from relay2.UU.NET by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA09790; Wed, 25 Nov 92 13:14:31 -0500
Received: from uunet.uu.net (via LOCALHOST.UU.NET) by relay2.UU.NET with SMTP 
	(5.61/UUNET-internet-primary) id AA22179; Wed, 25 Nov 92 13:14:32 -0500
Received: from kailand.UUCP by uunet.uu.net with UUCP/RMAIL
	(queueing-rmail) id 131339.4005; Wed, 25 Nov 1992 13:13:39 EST
Received: from brisk.kai.com (brisk) by kailand.kai.com via SMTP
  (5.65d-92031301) id AA12688; Wed, 25 Nov 1992 12:06:04 -0600
Received: by brisk.kai.com
  (920330.SGI-92101201) id AA08958; Wed, 25 Nov 92 12:06:02 -0600
Date: Wed, 25 Nov 92 12:06:02 -0600
Message-Id: <9211251806.AA08958@brisk.kai.com>
To: mpi-pt2pt@cs.utk.edu, mpi-collcomm@cs.utk.edu, mpi-formal@cs.utk.edu,
        mpi-ptop@cs.utk.edu
In-Reply-To: William Gropp's message of Wed, 25 Nov 92 09:28:43 CST <9211251528.AA12985@godzilla.mcs.anl.gov>
Subject: Nonblocking functions and handlers.
From: Steven Ericsson Zenith <zenith@kai.com>
Sender: zenith@kai.com
Organization: 	Kuck and Associates, Inc.
		1906 Fox Drive, Champaign IL USA 61820-7334,
		voice 217-356-2288, fax 217-356-5199


Bill Gropp writes:

    (Warning: radical position that I'm not sure even I hold follows:)
    An interesting issue is whether we should defer all nonblocking communications
    to a thread-based execution model.

I'm not so sure this is a radical position Bill since even
nonsynchronized communication will need to be defined formally this way.
Nonsynchronized communication is in effect creating a parallel process
that has the job of passing the communication on. Al Geist earlier asked
the question whether buffers used by nonsynchronized communication
should be accessible after the communication has started - the answer
should be - no, unless by some explicit mechanism that formally amounts
to a communication with the process mentioned above.  Any nonexplicit
interaction (e.g. a write to the buffer) would have to be specified as
formally equivalent to an explicit interaction.

Also, there is quite a range of terminology in use.  One common error:
"Asynchronous" and "synchronous" have quite a particular meaning in EE
and when CS people use the terms in relation to message passing they
usually mean NONSYNCHRONIZED and SYNCHRONIZED. Also BLOCKING =
SYNCHRONIZED. Let us begin a glossary that defines the terms we use - if
no-one else volunteers I'll take this to be the responsibility of the
Formal Specification Subcommittee. So I'm looking for volunteers from
that subcommittee.

Steven

From owner-mpi-collcomm@CS.UTK.EDU  Wed Nov 25 15:37:50 1992
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA25014; Wed, 25 Nov 92 15:37:50 -0500
Received:  by CS.UTK.EDU (5.61++/2.8s-UTK)
	id AA12301; Wed, 25 Nov 92 15:15:34 -0500
Received: from relay2.UU.NET by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA12297; Wed, 25 Nov 92 15:15:32 -0500
Received: from uunet.uu.net (via LOCALHOST.UU.NET) by relay2.UU.NET with SMTP 
	(5.61/UUNET-internet-primary) id AA26923; Wed, 25 Nov 92 15:15:35 -0500
Received: from kailand.UUCP by uunet.uu.net with UUCP/RMAIL
	(queueing-rmail) id 151429.20685; Wed, 25 Nov 1992 15:14:29 EST
Received: from brisk.kai.com (brisk) by kailand.kai.com via SMTP
  (5.65d-92031301 for <mpi-collcomm@cs.utk.edu>) id AA15317; Wed, 25 Nov 1992 13:22:32 -0600
Received: by brisk.kai.com
  (920330.SGI-92101201) id AA09015; Wed, 25 Nov 92 13:22:31 -0600
Date: Wed, 25 Nov 92 13:22:31 -0600
Message-Id: <9211251922.AA09015@brisk.kai.com>
To: geist@gstws.epm.ornl.gov
Cc: mpi-collcomm@cs.utk.edu
In-Reply-To: Al Geist's message of Tue, 24 Nov 1992 22:57:44 -0500 <9211250357.AA13365@gstws.epm.ornl.gov>
Subject: MPI collective communication...
From: Steven Ericsson Zenith <zenith@kai.com>
Sender: zenith@kai.com
Organization: 	Kuck and Associates, Inc.
		1906 Fox Drive, Champaign IL USA 61820-7334,
		voice 217-356-2288, fax 217-356-5199


   Date: Tue, 24 Nov 1992 22:57:44 -0500
   From: geist@gstws.epm.ornl.gov (Al Geist)

	<discussion on groups>

Is this discussion not the domain of the process and topology subcommittee?

   ==============================================================================
   Collective Routines:
   ==============================================================================
   One problem we can get into is defining many different 
   collective communication routines gmax, gsum, gadd, etc.
   I propose that we have only a handful of routines based 
   on the underlying communication logic.
   All participating tasks call the same function.

   FUNCTIONS:

   broadcast()  broadcast a message from one task to all tasks in a group.

Agreed. So, using my earlier suggestion where communications are
"channels" (logically shared objects .. whatever) with logical names, a
broadcast channel called G

	/* example of a declaration */
	communication broadcast_type(N) <datatype> G

where "(N)" identifies the number of participants in the broadcast,
would be broadcast to by

	broadcast(G, expression)

(actually this would be formally equivalent to "send(G, expression)"
since G carries the broadcast semantics)

and each process (task) would receive the message by

	receive(G, variable)

You must clearly identify the meaning of parallel broadcasts to the same
group. I would choose the construction

	(... broadcast(G, x) ...) ||
	(... broadcast(G, y) ...) ||
	(... receive(G, v1) -> receive(G, v2) ...) ||
	...
to mean
	v1 = x | y
	v2 = x iff v1 = y
	v2 = y iff v1 = y

   reduce()     inverse of broadcast. Data from all tasks in a group
			    is reduced using a predefined function or a user function
			    and the result is placed in a specified task.
			    Function name is specified in the argument list.
			    Pre-defined functions should include: max, min, add, mult,
			    and optionally AND, OR, XOR. (others?)

I'm not sure I like the introduction of the function name. The inverse
of broadcast though is in effect a many-to-one. So

	/* example of a declaration */
	communication reduce_type(N) <datatype> R

would be written to by 

	send(R, e)

in N processes, and

	reduce(R, v, f)

is equivalent to

	receive( R, result )
	receive( R, v)
	v = f( result, v )
	receive(R, result)
	v = f(result, v)
	... until N receive times

In both broadcast and reduce cases we have left it to the implementation
to count the distinct communication instances.
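
The equivalence above can be sketched in Python with a queue standing
in for the channel R. This is only an illustration under stated
assumptions: queue.Queue buffers values, whereas the sends in these
examples are meant to be synchronized, and the helper names send and
receive simply mirror the notation used here. The count n is passed
explicitly, although the text leaves counting to the implementation.

```python
import queue


def send(channel, value):
    """Write one value to the channel (buffered in this sketch)."""
    channel.put(value)


def receive(channel):
    """Take the next value from the channel, blocking until one arrives."""
    return channel.get()


def reduce(channel, n, f):
    """Receive n values from the channel and fold them together with f."""
    v = receive(channel)
    for _ in range(n - 1):
        result = receive(channel)
        v = f(result, v)
    return v
```

For example, after three processes have each called send(R, e), a
single reduce(R, 3, max) returns the largest of the three values.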

Again we must concern ourselves with the meaning of parallel reduce constructions

	(... reduce(R, v1, f) ...) ||
	(... reduce(R, v2, f) ...) ||
	(... send(R, x) -> send(R, y) ...)

It would be simplest to restrict this case and say reduce can only
appear in one process for each reduce type, but what about 

	(... reduce(R, v2, f) ...) ||
	(... send(R, x) -> send(R, y) ...)

does each send in sequence apply to one reduce or to subsequent reduces? To
be the inverse of broadcast it would be the former.

   scatter()    a single task contains different messages for each task.
			    Scatter these messages to all tasks in a group.

Isn't this an abbreviation for a sequence of sends on an array of
one-to-one channels? So an array of channels S

	/* example of a declaration */
	communication one-to-one (N) S

where 

	scatter(S, A)

such that A is an array of size N, and the scatter is equivalent to

	parallel do i
		send(S[i], A[i])
	end parallel do

and the corresponding receive looks like

	receive(S[i], v)
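
A minimal Python sketch of this reading of scatter, assuming threads
for the "parallel do" and one queue.Queue per one-to-one channel. The
driver run_scatter is hypothetical and exists only to exercise the
example; each receiving task i performs the single receive(S[i], v)
shown above.

```python
import queue
import threading


def scatter(channels, values):
    """Send values[i] on channels[i], one sender thread per element."""
    senders = [threading.Thread(target=ch.put, args=(val,))
               for ch, val in zip(channels, values)]
    for t in senders:
        t.start()
    for t in senders:
        t.join()


def run_scatter(values):
    """Start one receiving task per channel, scatter, and collect results."""
    n = len(values)
    S = [queue.Queue(maxsize=1) for _ in range(n)]
    received = [None] * n

    def task(i):
        # Corresponds to receive(S[i], v) in task i.
        received[i] = S[i].get()

    tasks = [threading.Thread(target=task, args=(i,)) for i in range(n)]
    for t in tasks:
        t.start()
    scatter(S, values)
    for t in tasks:
        t.join()
    return received
```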

   gather()     inverse of scatter. gather distinct messages from each task
			    in a group and collect them in a specified task.

Similarly, this is an abbreviation for a sequence of receives on an array of
one-to-one channels. So an array of channels G

	/* example of a declaration */
	communication one-to-one (N) G

where 

	gather(G, A)

such that A is an array of size N, and the gather is equivalent to

	parallel do i
		receive(G[i], A[i])
	end parallel do

and the corresponding send looks like

	send(G[i], e)

   synchronize()  barrier synchronization of a group of tasks.

This is also a many-to-one where the one is a synchronization process
created by the declaration (yes, I know this sounds odd).

	/* example of a declaration */
	communication sync SYNC
and
	synchronize(SYNC)

is equivalent to the output

	send(SYNC)

i.e. send with no output value.

   shift()      assume group members form a (logical) ring.
			    shift the message in each task to its right (or left) neighbor.
			    (useful in matrix multiply shift and roll algorithm)

This can be constructed from the above.

   exchange()   equivalent to every task in a group calling scatter.
		(routine used for matrix transpose)

This is tricky, and isn't as simple as is implied. I have no trouble
with it if we can specify a deadlock free implementation, but frankly I
think it is out of place here.

   all2all()    equivalent to every task in a group calling broadcast.

Why doesn't this cause deadlock in the group? Nah! It does cause deadlock.

Steven


From owner-mpi-collcomm@CS.UTK.EDU  Wed Nov 25 18:12:56 1992
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA26783; Wed, 25 Nov 92 18:12:56 -0500
Received:  by CS.UTK.EDU (5.61++/2.8s-UTK)
	id AA15993; Wed, 25 Nov 92 18:07:19 -0500
Received: from relay1.UU.NET by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA15989; Wed, 25 Nov 92 18:07:17 -0500
Received: from uunet.uu.net (via LOCALHOST.UU.NET) by relay1.UU.NET with SMTP 
	(5.61/UUNET-internet-primary) id AA23256; Wed, 25 Nov 92 18:07:14 -0500
Received: from kailand.UUCP by uunet.uu.net with UUCP/RMAIL
	(queueing-rmail) id 180640.18440; Wed, 25 Nov 1992 18:06:40 EST
Received: from brisk.kai.com (brisk) by kailand.kai.com via SMTP
  (5.65d-92031301 for <mpi-collcomm@cs.utk.edu>) id AA24413; Wed, 25 Nov 1992 16:21:00 -0600
Received: by brisk.kai.com
  (920330.SGI-92101201) id AA09165; Wed, 25 Nov 92 16:20:58 -0600
Date: Wed, 25 Nov 92 16:20:58 -0600
Message-Id: <9211252220.AA09165@brisk.kai.com>
To: zenith@kai.com
Cc: geist@gstws.epm.ornl.gov, mpi-collcomm@cs.utk.edu
In-Reply-To: Steven Ericsson Zenith's message of Wed, 25 Nov 92 13:22:31 -0600 <9211251922.AA09015@brisk.kai.com>
Subject: MPI collective communication...
From: Steven Ericsson Zenith <zenith@kai.com>
Sender: zenith@kai.com
Organization: 	Kuck and Associates, Inc.
		1906 Fox Drive, Champaign IL USA 61820-7334,
		voice 217-356-2288, fax 217-356-5199


A typo crept into my last message.

	v1 = x | y
	v2 = x iff v1 = y
	v2 = y iff v1 = y

should, of course, be

 	v1 = x | y
	v2 = x iff v1 = y
	v2 = y iff v1 = x

And in the examples all sends are synchronized (blocking).

Steven


From owner-mpi-collcomm@CS.UTK.EDU  Wed Nov 25 19:37:42 1992
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA28059; Wed, 25 Nov 92 19:37:42 -0500
Received:  by CS.UTK.EDU (5.61++/2.8s-UTK)
	id AA16782; Wed, 25 Nov 92 19:14:32 -0500
Received: from relay2.UU.NET by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA16778; Wed, 25 Nov 92 19:14:29 -0500
Received: from uunet.uu.net (via LOCALHOST.UU.NET) by relay2.UU.NET with SMTP 
	(5.61/UUNET-internet-primary) id AA25540; Wed, 25 Nov 92 19:14:34 -0500
Received: from kailand.UUCP by uunet.uu.net with UUCP/RMAIL
	(queueing-rmail) id 191312.8393; Wed, 25 Nov 1992 19:13:12 EST
Received: from brisk.kai.com (brisk) by kailand.kai.com via SMTP
  (5.65d-92031301 for <mpi-collcomm@cs.utk.edu>) id AA26829; Wed, 25 Nov 1992 17:33:21 -0600
Received: by brisk.kai.com
  (920330.SGI-92101201) id AA09251; Wed, 25 Nov 92 17:33:20 -0600
Date: Wed, 25 Nov 92 17:33:20 -0600
Message-Id: <9211252333.AA09251@brisk.kai.com>
To: geist@gstws.epm.ornl.gov, mpi-collcomm@cs.utk.edu
In-Reply-To: Steven Ericsson Zenith's message of Wed, 25 Nov 92 13:22:31 -0600 <9211251922.AA09015@brisk.kai.com>
Subject: MPI collective communication...
From: Steven Ericsson Zenith <zenith@kai.com>
Sender: zenith@kai.com
Organization: 	Kuck and Associates, Inc.
		1906 Fox Drive, Champaign IL USA 61820-7334,
		voice 217-356-2288, fax 217-356-5199


Observation on the following point:

	    synchronize()  barrier synchronization of a group of tasks.

	 This is also a many-to-one where the one is a synchronization process
	 created by the declaration (yes, I know this sounds odd).

			 /* example of a declaration */
		 communication sync SYNC
	 and
		 synchronize(SYNC)

	 is equivalent to the output

		 send(SYNC)

	 i.e. send with no output value.

I should clarify this. Given

	(P||Q);R

This reads P and Q in parallel followed by R; i.e., there is a barrier
at the semicolon. To implement this barrier using Al's primitive the
compiler in effect places a send(SYNC) at the end of P and Q and the
corresponding receive(SYNC);receive(SYNC) at the start of R. Using
something perhaps more familiar,

	begin parallel
	   section
		P
	   end section
	   section
		Q
	   end section
	end parallel
	R

translated using MPI might become the following three programs executed
on three nodes of a distributed memory machine

	program Node0
		P
		synchronize(SYNC)
	end program

	program Node1
		Q
		synchronize(SYNC)
	end program

	program Node2
		receive(SYNC)
		receive(SYNC)
		R
	end program

But now I'm less convinced we need a separate synchronize primitive and
should just permit "empty" messages in send and receive for their
synchronization characteristics. (An implementation may, of course,
choose to send a dummy value to gain the same effect).
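
The three node programs can be imitated in Python to check that R
never starts before both P and Q have finished. This is a sketch under
assumptions: threads stand in for the nodes, a queue.Queue stands in
for SYNC, and an "empty" message is modeled as None. The function name
run_barrier_demo is made up for the example.

```python
import queue
import threading


def run_barrier_demo():
    """Run P and Q in parallel, then R, using only empty messages."""
    SYNC = queue.Queue()
    log = []
    lock = threading.Lock()

    def record(name):
        with lock:
            log.append(name)

    def node0():
        record("P")
        SYNC.put(None)      # send(SYNC): a message with no value

    def node1():
        record("Q")
        SYNC.put(None)

    def node2():
        SYNC.get()          # receive(SYNC)
        SYNC.get()          # receive(SYNC)
        record("R")

    threads = [threading.Thread(target=f) for f in (node2, node0, node1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

P and Q may appear in either order in the returned log, but R is
always last.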

Steven
	



From owner-mpi-collcomm@CS.UTK.EDU  Fri Nov 27 12:08:43 1992
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA25724; Fri, 27 Nov 92 12:08:43 -0500
Received:  by CS.UTK.EDU (5.61++/2.8s-UTK)
	id AA08742; Fri, 27 Nov 92 12:06:12 -0500
Received: from relay1.UU.NET by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA08738; Fri, 27 Nov 92 12:06:10 -0500
Received: from uunet.uu.net (via LOCALHOST.UU.NET) by relay1.UU.NET with SMTP 
	(5.61/UUNET-internet-primary) id AA02767; Fri, 27 Nov 92 12:06:08 -0500
Received: from kailand.UUCP by uunet.uu.net with UUCP/RMAIL
	(queueing-rmail) id 120540.25488; Fri, 27 Nov 1992 12:05:40 EST
Received: from brisk.kai.com (brisk) by kailand.kai.com via SMTP
  (5.65d-92031301) id AA12937; Fri, 27 Nov 1992 10:25:06 -0600
Received: by brisk.kai.com
  (920330.SGI-92101201) id AA11158; Fri, 27 Nov 92 10:25:05 -0600
Date: Fri, 27 Nov 92 10:25:05 -0600
Message-Id: <9211271625.AA11158@brisk.kai.com>
To: mpi-collcomm@cs.utk.edu
Cc: mpi-formal@cs.utk.edu
In-Reply-To: Steven Ericsson Zenith's message of Wed, 25 Nov 92 13:22:31 -0600 <9211251922.AA09015@brisk.kai.com>
Subject: MPI collective communication...
From: Steven Ericsson Zenith <zenith@kai.com>
Sender: zenith@kai.com
Organization: 	Kuck and Associates, Inc.
		1906 Fox Drive, Champaign IL USA 61820-7334,
		voice 217-356-2288, fax 217-356-5199


Observation on the following:

	   all2all()    equivalent to every task in a group calling broadcast.

	Why doesn't this cause deadlock in the group? Nah! It does cause deadlock.

I was thinking about this yesterday over my stuffed Tofu :-). Even if we
permit the broadcast to be nonsynchronized we have the problem I
described earlier with defining the behavior of parallel broadcasts. If
all2all is nonsynchronized then the order of received values must be
nondeterministic.

(|| i for N: broadcast(C, e[i])) || (|| k for N:|| j for N: receive(C, v[k, j]))

i.e., the order of values from e in v is nondeterministic. Now maybe I'm
missing something that has to do with the TMC perspective - in any case,
I have never seen the use of such a construction in an application. If
we do specify a deadlock free behavior for all2all is it desirable given
this nondeterminism? I know its implementation will be tricky to get
right. Can we have some vendor comments please?

I have assumed here that the values broadcast are the same type.

Steven

Footnote: The syntax 

(|| i for N: broadcast(C, e[i])) || (|| k for N:|| j for N: receive(C, v[k, j]))

illustrates N broadcasts implementing the all2all, where N is the number
of participants, in parallel with N parallel groups of N (parallel) receives.
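
The nondeterminism can be seen in a small Python sketch, assuming one
thread per broadcaster and one buffered inbox per receiver (both are
modeling choices, not part of the proposal). Every receiver obtains
the same N values, but the order in which they arrive depends on
thread scheduling.

```python
import queue
import threading


def all2all(values):
    """N nonsynchronized broadcasts, each delivered to all N receivers."""
    n = len(values)
    inbox = [queue.Queue() for _ in range(n)]

    def broadcast(e):
        # Deliver e to every participant; other broadcasts may interleave.
        for q in inbox:
            q.put(e)

    senders = [threading.Thread(target=broadcast, args=(e,)) for e in values]
    for t in senders:
        t.start()
    for t in senders:
        t.join()

    # Receiver k performs N receives; v[k][j] is its j-th received value.
    return [[inbox[k].get() for _ in range(n)] for k in range(n)]
```

A caller can rely on sorted(v[k]) being the same for every k, but not
on v[k] itself being the same from run to run or across receivers.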

From owner-mpi-collcomm@CS.UTK.EDU  Fri Nov 27 12:37:48 1992
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA25951; Fri, 27 Nov 92 12:37:48 -0500
Received:  by CS.UTK.EDU (5.61++/2.8s-UTK)
	id AA08855; Fri, 27 Nov 92 12:17:02 -0500
Received: from sampson.ccsf.caltech.edu by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA08851; Fri, 27 Nov 92 12:16:57 -0500
Received: from elephant by sampson.ccsf.caltech.edu with SMTP id AA24714
  (5.65c/IDA-1.4.4 for mpi-collcomm@cs.utk.edu); Fri, 27 Nov 1992 09:16:50 -0800
Received: by elephant (4.1/SMI-4.1)
	id AA13810; Fri, 27 Nov 92 08:13:55 PST
Date: Fri, 27 Nov 92 08:13:55 PST
From: jwf@parasoft.com (Jon Flower)
Message-Id: <9211271613.AA13810@elephant>
To: mpi-collcomm@cs.utk.edu

To: mpi-collcomm@cs.utk.edu
Re: A few ideas

In response to Al Geist's request here are a few more or less
random ideas about collective communication based on my own
experience......

Some comments about groups:

    I think that we should not describe the "group" concept in
    collective communication in terms of lists of task ID's. It
    might be implemented that way but I think the underlying
    concept should be related to the user application topology.

    I think the key to really optimizing collective 
    communication routines is to match the system's geometric
    knowledge of the hardware topology with the geometric
    behavior of the "logical topology" of the user code. 
    So, for example, you can do a lot better on a column-restricted
    broadcast if you know that the user's logical topology actually
    matches the hardware of the DELTA (for example).

    Similarly the exchange primitive doesn't make too much
    sense when defined only in terms of a list of nodes since
    at best the user is left with the responsibility of forming
    the list in the right order.

    I would like to see groups described in conjunction with
    the topological info and, in general, I thought Rolf's idea
    was pretty good except that I didn't see how to deal with
    a very common case; a broadcast from a "host" program
    to all of its "nodes" or a reduction from all the "nodes"
    into the "host". These can both be represented as the
    combination of a "node" only operation and a point-to-point
    operation but it would be nice to encapsulate them somehow
    since they come up all the time.

    We could leave open a loophole for "expert" users to make their
    own groups from lists of task ID's but I don't know how we'd
    optimize their behavior.

Some comments about the individual functions:

broadcast:
---------
    I'm not sure who to address this issue to - it probably
    falls outside our domain of comment, but how do you
    deal with the case of a "master" program broadcasting
    to all of its slaves? (Actually read "host" for "master"
    and "node" for "slaves".) Does MPI1 even support this
    concept? It comes up all the time in our applications.
    I suppose this is an application of the group concept
    but it's one that I would like to see very streamlined
    because of its generality.

reduce:
------
    The comments for "reduce" and "gather" indicate that only
    one task in the group can get at the result? I would hope
    that there was some way for all the tasks in a group to 
    get the answer too, without following the reduce/gather
    with a broadcast operation since this loses a lot of
    efficiency.
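
One well-known way to give every task the result without a separate
broadcast is recursive doubling, sketched below in Python. This
technique is not proposed anywhere in this thread; it is shown only to
illustrate the efficiency point. The sketch assumes a power-of-two
group size and a function that is associative and commutative, and it
simulates all tasks in one list rather than passing real messages.

```python
def allreduce(data, func):
    """Simulated recursive-doubling reduction; every task gets the result."""
    n = len(data)
    if n & (n - 1):
        raise ValueError("sketch assumes a power-of-two group size")
    values = list(data)
    step = 1
    while step < n:
        # In each round task i exchanges its partial result with
        # task i XOR step, and both sides combine the two values.
        values = [func(values[i], values[i ^ step]) for i in range(n)]
        step *= 2
    return values
```

With N tasks this takes log2(N) exchange rounds, where a tree reduce
followed by a tree broadcast would take roughly twice as many.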

    I like the idea of a function pointer for reduce.
    Is this done by having a general facility for the user
    with a function pointer argument and then providing a list
    of pre-defined "external" functions that do the common tasks?
    This would be my preference since the heterogeneity is then
    taken care of by the system. However, how do you express
    a reduce on a standard data type using a user-specified 
    function? Is there an argument to reduce that says the data
    type so that the system can still byte swap or are we
    going to restrict reduce (and possibly all collective ops)
    to the "byte-stream" data type and force the user to
    deal with it themselves. This latter is horrible because
    putting the byte swapping in the right places for a reduction
    operation is hard.
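
A small Python sketch of why the data type matters. It assumes each
contribution arrives as raw bytes together with the sender's byte
order, and that the system (not the user function) decodes it before
the reduction. The function names and the (buffer, byte_order) pairing
are hypothetical.

```python
import struct


def decode(buffer, byte_order, fmt="i"):
    """Decode a buffer of fixed-size items.
    byte_order is '<' (little-endian) or '>' (big-endian);
    fmt is a struct type code, 'i' being a 4-byte integer here."""
    count = len(buffer) // struct.calcsize(byte_order + fmt)
    return list(struct.unpack(f"{byte_order}{count}{fmt}", buffer))


def typed_reduce(contributions, func, fmt="i"):
    """Element-wise reduction over (buffer, byte_order) pairs.
    The user function only ever sees decoded values."""
    decoded = [decode(buf, order, fmt) for buf, order in contributions]
    result = decoded[0]
    for values in decoded[1:]:
        result = [func(a, b) for a, b in zip(result, values)]
    return result
```

Because typed_reduce knows the element type, a user-supplied func such
as max works unchanged across machines with different byte orders.
With a pure byte-stream type, func would have to do the swapping
itself.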

    I would like to add "average" to the list of predefined 
    functions even though it's a triviality.

gather:
------
    As for reduce - I would hope that all tasks can get at the result
    too.

synchronize:
-----------
   This one seems to be a real thorn. I would like to have a 
   non-blocking synchronize - you call the function to say that
   you're interested in synchronizing a particular group of
   tasks and then later check to see whether they're all done
   or not. This is very valuable in certain types of event-driven 
   simulation, for example, where you might start each time 
   step by invoking the sync. function and then go off and
   respond to incoming events. Periodically you then check to
   see if everyone in your group has checked in and if so, 
   increase global virtual time for the next step.

   A non-blocking sync. also allows a single (master) task to wait for
   the completion of either/or subtasks in two disjoint slave 
   groups. Obviously this can be done in another way but is very
   elegant and simple to code with non-blocking syncs.

   I would propose both a blocking and a non-blocking "wait for
   sync to complete" function in the same way that the point-to-
   point style has both.
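
A minimal Python sketch of such an interface, using threads in place
of tasks. The class name GroupSync and the method names start, test
and wait are assumptions chosen for the example. The object is good
for one synchronization only.

```python
import threading


class GroupSync:
    """Hypothetical single-use, non-blocking barrier for n tasks."""

    def __init__(self, n):
        self._n = n
        self._arrived = 0
        self._lock = threading.Lock()
        self._all_in = threading.Event()

    def start(self):
        """Check in and return immediately."""
        with self._lock:
            self._arrived += 1
            if self._arrived == self._n:
                self._all_in.set()

    def test(self):
        """Non-blocking check: has every task checked in?"""
        return self._all_in.is_set()

    def wait(self, timeout=None):
        """Blocking check: wait until every task has checked in.
        Returns False if the optional timeout expires first."""
        return self._all_in.wait(timeout)
```

In the event-driven case described above, a task would call start()
at the top of a time step, handle incoming events, and poll test()
between events before advancing global virtual time.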

shift:
-----
   How do you specify the (non-)periodicity of the edge elements?
   In fact what does left and right actually mean - is there an implied
   ordering in the entries of a group?
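
One possible answer, sketched in Python: take the group index as the
implied ordering, let a positive step mean "towards higher index", and
make periodicity an explicit argument. All three choices, and the
fill argument, are assumptions for illustration.

```python
def shift(data, step=1, periodic=True, fill=None):
    """data[i] is the message held by the task with group index i.
    Returns what each task holds after shifting by step positions.
    Non-periodic edge tasks that receive nothing get fill."""
    n = len(data)
    out = [fill] * n
    for i, value in enumerate(data):
        j = i + step
        if periodic:
            out[j % n] = value      # wrap around the logical ring
        elif 0 <= j < n:
            out[j] = value          # messages falling off the edge are dropped
    return out
```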

exchange, all2all:
-----------------
    These are life savers in my opinion since they encapsulate
    the biggest problem that I've seen in user codes. Writing
    these with point-to-point message passing primitives almost
    guarantees that the code doesn't scale and that it runs out
    of memory as you go to more nodes or even bigger problems.

    On the downside I agree with Steve Zenith that implementations
    of these functions are hard. I would also say that users
    often use these functions because they don't want to
    think about a better decomposition method, and so it's possible
    that by supporting these functions we are contributing to less
    than optimal coding at the user level. I would still vote
    them in, however, on the grounds that I would get fewer
    phone calls from customers!

Generalities:
============
I think the set of functions listed is rich enough 
for most applications. It would be interesting to see how many
arguments these things end up with when you try to write down
functional specs. I wonder if it might be worth having two
functions in each category; one with very few arguments that
does what most users will probably want and another that
has all the arguments and flexibility. This might reduce the
number of "simple" mistakes that can be made. 

For example, I often forget the "EXTERN MAX" that you need 
to pass MAX as a function pointer in FORTRAN programs. Perhaps 
the simple form of the reduce operation could have a variable 
indicating the operation type instead?
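
The two-level idea might look like the following Python sketch, where
the simple form takes an operation code and the general form takes a
function. The names reduce_simple, reduce_general and the OP_*
constants are hypothetical; they are not taken from any draft.

```python
import operator
from functools import reduce as fold

# Operation codes for the simple form.
OP_MAX, OP_MIN, OP_ADD, OP_MULT = range(4)

_PREDEFINED = {
    OP_MAX: max,
    OP_MIN: min,
    OP_ADD: operator.add,
    OP_MULT: operator.mul,
}


def reduce_general(values, func):
    """General form: the caller supplies any two-argument function."""
    return fold(func, values)


def reduce_simple(values, op):
    """Simple form: the caller names a predefined operation by code."""
    return reduce_general(values, _PREDEFINED[op])
```

The simple form leaves no function to declare or pass, which removes
the kind of mistake described above.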

Do the collective routines have message types like the 
point-to-point routines? In general I don't think they need to
since everyone is participating at once. On the other hand if
you make a mistake in this regard having a different message
type for each one sometimes facilitates looking them up in
a debugger. The one area where a message type might be 
interesting is in regard to the "synchronize" primitive
as discussed in the comments above.

	Jon Flower, jwf@parasoft.com
	ParaSoft Corp.
	818-792-9941
From owner-mpi-collcomm@CS.UTK.EDU  Sat Dec  5 22:08:57 1992
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA22904; Sat, 5 Dec 92 22:08:57 -0500
Received:  by CS.UTK.EDU (5.61++/2.8s-UTK)
	id AA19180; Sat, 5 Dec 92 21:55:26 -0500
Received: from msr.EPM.ORNL.GOV by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA19176; Sat, 5 Dec 92 21:55:23 -0500
Received: by msr.EPM.ORNL.GOV (5.61/1.34)
	id AA02074; Sat, 5 Dec 92 21:55:20 -0500
Date: Sat, 5 Dec 92 21:55:20 -0500
From: geist@msr.EPM.ORNL.GOV (Al Geist)
Message-Id: <9212060255.AA02074@msr.EPM.ORNL.GOV>
To: mpi-collcomm@cs.utk.edu
Subject: A proposal for collective communication interface. Opinions?

Collective Communication Proposal.

After reading Marc Snir's point-to-point outline, I think our 
work in the collective communication subcommittee is more clear.
A few of the Goals from the outline that I felt were particularly relevant:

1. Design an application programming interface.

2. Design an interface that is not too different from current practice.

3. Define an interface that can be quickly implemented on many vendor platforms.

4. Focus on a proposal that can be agreed upon in 6 months.

5. Provide a reliable communication interface.

===============================================================================

Primary Requirement.
--------------------------------------------------------------------
The collective communication interface should be an extension of the
point-to-point interface. 
--------------------------------------------------------------------
As Marc points out on page 2 
 "SEND and RECV are a particular case of broadcast in a group of size 2;
 this observation can be used to check if the definition of collective
 communication semantics are consistent with the definition of 
 point-to-point communication."

This leads to the following points:
a. Collective routines like broadcast should provide the same 
   message data format as point-to-point routines.
   Be that [from page 7] scalar, contiguous, buffer with stride, typed
   or a union of these.

b. Collective communication should follow the same message context paradigms
   and recognize the same context control functions.

c. By using a structured name space (described on pages 7-8) 
   where all processes are identified by a (group,rank) pair
   for both the point-to-point and collective routines,
   then the users will have a consistent naming scheme
   across all the MPI communication routines.
   And those desiring a flat name space can have it by 
   using the default group "ALL".

d. Syntax of collective routines should follow the point-to-point scheme,
   whatever that turns out to be.

Collective communication is a matter of convenience for the user
and a matter of efficiency for the implementer. We must not
lose track of the fact that ANY collective communication function
can be implemented using only the MPI point-to-point routines.
I bring this up because, in the spirit of simplicity and robustness,
the following proposal contains only the most commonly used
and currently available functions.

=========================================================================

I propose the following minimum set of collective routines
be presented at the next committee meeting.

1. info = MPI_BCAST( buf, bytes, type, gid, root )

   Function:
   Called by all members of the group "gid" 
   using the same argument for "bytes", "type", "gid", and "root".
   On return the contents of "buf" on "root" is contained in "buf"
   on all group members.
   On return "info" contains the error code.

2. info = MPI_GATHER( buf, bytes, type, gid, root )

   Function:
   Called by all members of the group "gid" 
   using the same argument for "bytes", "type", "gid", and "root".
   On return all the individual "buf" are concatenated into the "root" buf,
   which must be of size at least gsize*bytes.
   The data is laid out in the "root" buf in rank order, that is:
   | gid,0 data | gid,1 data | ...| gid, root data | ...| gid, gsize-1 data |
   Other members' "buf" are unchanged on return.
   On return "info" contains the error code.

3. info = MPI_GLOBAL_OP( inbuf, bytes, type, gid, op, outbuf )

   Function:
   Called by all members of the group "gid"
   using the same argument for "bytes", "type", "gid", and "op".
   On return the "outbuf" of all group members contains the 
   result of the global operation "op" applied pointwise to
   the collective "inbuf". For example, if the op is max and
   inbuf contains two floating-point numbers, then
     outbuf(1) = global max( inbuf(1) ) and
     outbuf(2) = global max( inbuf(2) )
   A set of standard operations is supplied with MPI, including:
     global max  - for each data type
     global min  - for each data type
     global sum  - for each data type
     global mult - for each data type
     global AND  - for integer and logical type
     global OR   - for integer and logical type
     global XOR  - for integer and logical type
   Optionally the users may define their own global functions for this routine.
   On return "info" contains the error code.

4. info = MPI_SYNCH( gid )

   Function:
   Called by all members of the group "gid"
   Returns only when all members have called this function.
   On return "info" contains the error code.

5. gid = MPI_MKGROUP( list_of_processes )

   Function:
   Called by all processes in the list.
   Forms a logical group containing the listed processes
   and assigns each process a unique rank in the group.
   The ranks are consecutively numbered from 0 to gsize-1.
   On return "gid" is an MPI assigned group ID (or error code if < 0)

6. gsize = MPI_GROUPSIZE( gid )

   Function:
   Can be called by any process.
   On return "gsize" is the number of members in the group "gid"
   (or error code if < 0).

7. rank = MPI_MYRANK( gid )

   Function:
   Can be called only by members of group "gid".
   On return "rank" is the rank of the calling process in group "gid"
   (an integer between 0 and gsize-1) or error code if < 0.
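
The data movement described for routines 1 to 3 can be sketched as a small single-process simulation. This is only an illustration of the stated semantics, not an implementation: a group is modelled as a Python list with one buffer per rank, and the function names bcast, gather and global_op are hypothetical stand-ins for the proposed MPI_BCAST, MPI_GATHER and MPI_GLOBAL_OP.

```python
from functools import reduce

def bcast(bufs, root):
    """Every member ends up with a copy of the root's buffer."""
    return [list(bufs[root]) for _ in bufs]

def gather(bufs, root):
    """Root's buffer becomes the rank-ordered concatenation of all buffers;
    the buffers of the other members are returned unchanged."""
    out = [list(b) for b in bufs]
    out[root] = [x for b in bufs for x in b]
    return out

def global_op(inbufs, op):
    """Every member's outbuf holds op applied pointwise across all inbufs."""
    result = [reduce(op, column) for column in zip(*inbufs)]
    return [list(result) for _ in inbufs]
```

For instance, global_op([[1.0, 5.0], [4.0, 2.0]], max) gives every member the buffer [4.0, 5.0], which matches the two-element max example in routine 3.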

===========================================================================
Comments?
From owner-mpi-collcomm@CS.UTK.EDU  Mon Dec 14 15:48:54 1992
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA24894; Mon, 14 Dec 92 15:48:54 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA17266; Mon, 14 Dec 92 15:48:41 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Mon, 14 Dec 1992 20:48:40 GMT
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from THUD.CS.UTK.EDU by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA17227; Mon, 14 Dec 92 15:48:19 -0500
From: Jack Dongarra <dongarra@cs.utk.edu>
Received:  by thud.cs.utk.edu (5.61++/2.7c-UTK)
	id AA03749; Mon, 14 Dec 92 15:48:17 -0500
Date: Mon, 14 Dec 92 15:48:17 -0500
Message-Id: <9212142048.AA03749@thud.cs.utk.edu>
To: mpi-collcomm@cs.utk.edu, mpi-pt2pt@cs.utk.edu
Subject: Re: Message Passing Interface Forum
Forwarding: Mail from '"Dr. C.D. Wright" <CDW10@LIVERPOOL.AC.UK>'
      dated: Mon, 14 Dec 92 12:16:10 GMT

---------- Begin Forwarded Message ----------
>From @ibm.liv.ac.uk:CDW10@LIVERPOOL.AC.UK Mon Dec 14 07:20:05 1992
Return-Path: <@ibm.liv.ac.uk:CDW10@LIVERPOOL.AC.UK>
Received: from mail.liv.ac.uk by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA22696; Mon, 14 Dec 92 07:19:55 -0500
Received: from ibm.liverpool.ac.uk by mailhub.liverpool.ac.uk via JANET 
          with NIFTP (PP) id <21042-0@mailhub.liverpool.ac.uk>;
          Mon, 14 Dec 1992 12:19:28 +0000
Received: from UK.AC.LIVERPOOL by MAILER(4.4.t); 14 Dec 1992 12:20:02 GMT
Date: Mon, 14 Dec 92 12:16:10 GMT
From: "Dr. C.D. Wright" <CDW10@LIVERPOOL.AC.UK>
Subject: Re: Message Passing Interface Forum
To: dongarra@edu.utk.cs
Message-Id: <"mailhub.li.044:14.11.92.12.19.28"@liverpool.ac.uk>
Status: RO

Hi.

Since I am in the UK it is clear that I can't actively participate
in the MPI Forum.  I do, however, have one particular problem with
every comms library I have used so far that I would like to see
addressed in any new "standard", and I hope you can pass this on to
whoever is the appropriate person to deal with it.

In many packages such as PVM, PARMACS, p4, etc, it is possible to
probe for and/or receive messages selectively, the selection being
based on the message type (usually an integer) and/or the sender.
This is overly restrictive.  It would be far more useful if the
message's format were sufficiently well defined for the user to be
able to provide their own selection function to be passed in and
used as the basis for reception and/or probing.
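
A minimal sketch of what such a facility could look like, assuming a message is a record with sender, type and payload fields. The Message layout and the probe and receive functions are hypothetical illustrations and do not correspond to PVM, PARMACS, p4 or any MPI draft.

```python
from collections import namedtuple

# Hypothetical message layout that a selection function can inspect.
Message = namedtuple("Message", ["sender", "msgtype", "payload"])

def probe(queue, select):
    """Return the first queued message accepted by select, or None."""
    for msg in queue:
        if select(msg):
            return msg
    return None

def receive(queue, select):
    """Remove and return the first queued message accepted by select."""
    for i, msg in enumerate(queue):
        if select(msg):
            return queue.pop(i)
    return None
```

Selection by type or sender is then just one particular select function, for example lambda m: m.msgtype == 7, while a user could equally select on message contents.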

That's it.  Hope you can do something with this gripe/suggestion.

Colin.
----------- End Forwarded Message -----------

From owner-mpi-collcomm@CS.UTK.EDU  Tue Dec 15 19:28:40 1992
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA21809; Tue, 15 Dec 92 19:28:40 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA15597; Tue, 15 Dec 92 19:28:32 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Wed, 16 Dec 1992 00:28:32 GMT
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from helios.llnl.gov by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA15573; Tue, 15 Dec 92 19:28:06 -0500
Received: by helios.llnl.gov (4.1/LLNL-1.18)
	id AA11599; Tue, 15 Dec 92 16:30:03 PST
Date: Tue, 15 Dec 92 16:30:03 PST
From: tony@helios.llnl.gov (Anthony Skjellum)
Message-Id: <9212160030.AA11599@helios.llnl.gov>
To: dongarra@cs.utk.edu, mpi-collcomm@cs.utk.edu, mpi-pt2pt@cs.utk.edu
Subject: Re: Message Passing Interface Forum

That is what we have been talking about in Zipcode for a long time.
- Tony
From owner-mpi-collcomm@CS.UTK.EDU  Thu Dec 31 22:14:12 1992
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA14668; Thu, 31 Dec 92 22:14:12 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA12447; Thu, 31 Dec 92 22:14:01 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Fri, 01 Jan 1993 03:14:00 GMT
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from pnlg.pnl.gov by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA12429; Thu, 31 Dec 92 22:13:45 -0500
Received: from carbon.pnl.gov (130.20.65.121) by pnlg.pnl.gov; Thu, 31 Dec 92
 19:09 PST
Received: from fermi.pnl.gov by carbon.pnl.gov (4.1/SMI-4.1) id AA21172; Thu,
 31 Dec 92 19:08:30 PST
Received: by fermi.pnl.gov (4.1/SMI-4.1) id AA11537; Thu, 31 Dec 92 19:08:29 PST
Date: Thu, 31 Dec 92 19:08:29 PST
From: d3g681@fermi.pnl.gov
To: littlefield@fermi.pnl.gov, mpi-collcomm@cs.utk.edu, mpi-ptop@cs.utk.edu
Message-Id: <9301010308.AA11537@fermi.pnl.gov>
X-Envelope-To: mpi-ptop@cs.utk.edu, mpi-collcomm@cs.utk.edu

Posted to mpi-collcomm and mpi-ptop.

I have just taken the archived discussion from netlib@ornl and not
found anything more recent than December 15 (collcomm) and 21 (ptop).
Since I asked for my name to be on the mailing lists and have seen
nothing I assume that things have been quiet since then.

Al Geist's proposal (Dec. 5) for collective communication and the
reasoning behind it seems to provide a reasonable starting point for
the discussion of interface and functionality.  I have only a few
minor comments in this regard, but given that the efficiency of
collective communications is critically sensitive to hardware topology
it *must* be essential to more closely integrate the definition of
process groups with topology.  I restrict my comments here to
this subject.

For example, on the Touchstone Delta, efficient sub-group global-ops
would suggest that process groups map as well as possible to square
sub-meshes, on the iPSC to sub-cubes, and on the KSR to sub-rings.
Currently, if one's interest is in performing efficient collective
communication in subgroups, there is no way of performing this mapping
in a portable way.  In this instance one might want something that
functions along these lines

  Create NG process groups with P(0), P(1), ..., P(NG-1) processes in each
  group and assign each process to one of these groups so that collective
  communication within each (and perhaps also between all) subgroup is
  optimized.

Such a mapping might also be readily accommodated as a sub-partitioning
of an existing process group, with the default being ALL.  I could
envisage writing, for instance, a fast-multipole integration using this
functionality.
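
As a rough sketch of the interface shape only: the function below splits an existing ordered group into NG consecutive subgroups of the requested sizes. It assumes, hypothetically, that the parent group is already ordered so that neighbouring ranks are close in the hardware topology; a real implementation would choose the assignment per machine. The name partition_group is an illustration.

```python
def partition_group(members, sizes):
    """Split an ordered member list into consecutive subgroups.

    sizes[k] plays the role of P(k); consecutive blocks are only one
    possible placement policy, used here for illustration.
    """
    if sum(sizes) != len(members):
        raise ValueError("subgroup sizes must add up to the group size")
    groups, start = [], 0
    for size in sizes:
        groups.append(members[start:start + size])
        start += size
    return groups
```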

Comments?

Robert J. Harrison

Mail Stop K1-90                             tel: 509-375-2037
Battelle Pacific Northwest Laboratory       fax: 509-375-6631
P.O. Box 999, Richland WA 99352          E-mail: rj_harrison@pnl.gov





From owner-mpi-collcomm@CS.UTK.EDU  Fri Jan  1 11:54:06 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA16862; Fri, 1 Jan 93 11:54:06 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA13835; Fri, 1 Jan 93 11:53:57 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Fri, 01 Jan 1993 16:53:56 GMT
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from msr.EPM.ORNL.GOV by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA13817; Fri, 1 Jan 93 11:53:46 -0500
Received: by msr.EPM.ORNL.GOV (5.61/1.34)
	id AA04566; Fri, 1 Jan 93 11:53:35 -0500
Date: Fri, 1 Jan 93 11:53:35 -0500
From: geist@msr.EPM.ORNL.GOV (Al Geist)
Message-Id: <9301011653.AA04566@msr.EPM.ORNL.GOV>
To: d3g681@fermi.pnl.gov, littlefield@fermi.pnl.gov, mpi-collcomm@cs.utk.edu,
        mpi-ptop@cs.utk.edu
Subject: Re: groups and topology.


>I have only a few
>minor comments in this regard, but given that the efficiency of
>collective communications is critically sensitive to hardware topology
>it *must* be essential to more closely integrate the definition of
>process groups with topology.

>Currently, if one's interest is in performing efficient collective
>communication in subgroups, there is no way of performing this mapping
>in a portable way.

It is critical that MPI be portable even if efficiency suffers.
Portability is the primary reason for having a standard.

Efficiency is important and tightly coupled to the implementation
on a given vendor's machine. My feeling is that our MPI work
should specify the functionality at the user level
and not dictate how MPI is implemented underneath.

Mapping is the key word in integrating topology and groups,
and mapping is not defined (so far) in MPI. It is related to
the spawning and placement of tasks. I can envision some implementations
allowing tasks to migrate to improve load balance and fault tolerance.
This greatly compounds the mapping problem, but I don't think MPI
should exclude such implementations.
The hope would be that vendors would supply MPI implementations
that map process number to node number in a way that their
collective routines would be efficient with default ALL group
AND that the vendor's mapping would be documented so that
a user could specify subgroups that could exploit this same efficiency.

Al Geist
From owner-mpi-collcomm@CS.UTK.EDU  Sat Jan 16 06:36:12 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA18118; Sat, 16 Jan 93 06:36:12 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA01632; Sat, 16 Jan 93 06:35:42 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Sat, 16 Jan 1993 06:35:41 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from sol.cs.wmich.edu by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA01612; Sat, 16 Jan 93 06:35:37 -0500
Received: from id.wmich.edu (id.cs.wmich.edu) by cs.wmich.edu (4.1/SMI-4.1)
	id AA06149; Sat, 16 Jan 93 06:30:31 EST
Date: Sat, 16 Jan 93 06:30:31 EST
From: john@cs.wmich.edu (John Kapenga)
Message-Id: <9301161130.AA06149@cs.wmich.edu>
To: mpi-collcomm@cs.utk.edu
Subject: A Collection of Primitives


\section{Introduction}
This is a description of some topology independent combined
communication primitives. These primitives are more commonly
referred to relative to "nodes" (e.g., Single Node Broadcast),
rather than the politically correct "process" (e.g., Single Process
Broadcast). The names below might look better with the word
Process deleted. These primitives often go under other names
as well; I've put the names used in previous posts next to the
names below. The MPI_* names appear in Al Geist's post with calls;
the names further to the right appear in Al Geist's earlier post and
Jon Flower's post.

Some things below were developed in discussion at the last MPI
meeting (i.e., don't give me credit, but you can give me blame).
This is a list of primitives for discussion.
%
\section{Topology Independent Collective Communication Primitives}
Assume there is a group of $N$ processes. The following collective 
communication primitives can be defined.

Barrier:                  BARRIER       MPI_SYNCH        synchronize
    Every process blocks at the barrier until all processes reach it.
    (unless we have a non-blocking version too, then I would prefer the
     name synchronize)
Collective Operator:      COP           MPI_GLOBAL_OP
    START: every process $i$ has a value $m_i$.
    STOP: a single designated process has the combine of all values
          $m_i: 0 <= i < N$.
    The operations supported in COP are fixed, they include
        add, multiply, min, max, and, or, xor. 
        Types supported include: int, float and double.
Global Operator:          GOP           
    START: every process $i$ has a value $m_i$.
    STOP: every process has the combine of all values $m_i: 0 <= i < N$.
    The operations supported in GOP are fixed, they include
        add, multiply, min, max, and, or, xor. 
        Types supported include: int, float and double.
Single Process Broadcast-    SPB        MPI_BCAST
    START: a single designated process $i$ has a message $m$.
    STOP: every process has message $m$.
Multiple Process Broadcast-  MPB	
    START: every process $i$ has a message $m_i$.
    STOP: every process has all messages $m_i: 0 <= i < N$.
Single Process Accumulate-   SPA                          reduce
    START: every process $i$ has a message $m_i$.
    STOP: a single designated process $i$ contains the combine of all the $m_i$.
          Any process can combine two messages into a single new message. (This
          makes the most sense when the combine is associative and commutative.)
Multiple Process Accumulate- MPA
    START: every process $i$ has $N$ messages $m_{i,j}: 0 <= j < N$.
    STOP: every process $j$ has the combine of the $N$ messages
          $m_{i,j}: 0 <= i < N$
Single Process Scatter-      SPS
    START: a single designated process $i$ has $N$ messages $m_j: 0 <= j < N$.
    STOP: every process $j$ has message $m_j$.
Single Process Gather-       SPG        MPI_GATHER        gather
    START: every process $i$ has 1 message $m_i$.
    STOP: a single designated process $i$ has all messages $m_i: 0 <= i < N$.
Total Process Exchange-      TPE                          all2all
    START: every process $i$ has $N$ messages $m_{i,j}:  0 <= j < N$.
    STOP: every process $j$ has $N$ messages $m_{i,j}: 0 <= i < N$.

Note that a Multiple Process Gather would be the same as a Multiple Process
Scatter; this is called a Total Process Exchange or all2all.
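
The START/STOP descriptions of the multiple-process primitives can be checked with a small simulation. This is a sketch under the assumption that m[i][j] denotes the message process i holds for process j; the function names simply reuse the abbreviations above and are not proposed routine names.

```python
from functools import reduce

def mpb(msgs):
    """Multiple Process Broadcast: every process ends with all messages."""
    return [list(msgs) for _ in msgs]

def tpe(m):
    """Total Process Exchange: process j ends with m[i][j] for every i."""
    return [list(column) for column in zip(*m)]

def mpa(m, combine):
    """Multiple Process Accumulate: process j ends with the combine
    of m[i][j] over all i."""
    return [reduce(combine, column) for column in zip(*m)]
```

In this model an MPA gives the same result as a TPE followed by a local combine on each process, which is one way to read the arrow from Total Process Exchange to Multiple Process Accumulate in the diagram of Theorem 1.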
%
\section{Some Background}
For some background, the following simple relationships are known.

Theorem 1:
Assume no computation time and unit communication time per hop for all
messages.  For any network the following diagram holds. A directed arrow
from A to B indicates an algorithm for solving A also solves B and the
optimal time for solving B is not more than the optimal time for solving A.
Horizontal double arrows indicate the relationship holds in both directions.

                             Total Process Exchange
                                      |
                                      V
Multiple Process Broadcast    <----------------->   Multiple Process Accumulate
        |                                                   |
        V                                                   V
Single Process Gather         <----------------->   Single Process Scatter
        |                                                   |
        V                                                   V
Single Process Accumulate     <----------------->   Single Process Broadcast


Theorem 2:
The following optimal complexities can be proven (the log is base 2).
The tree is a balanced binary tree and the times for a linear array are
the same as for the ring. p is the number of processors and d is the mesh
dimension. (W means to within a constant factor.)

Problem                     ring      tree          mesh            hypercube
-------------------------------------------------------------------------
single process broadcast    W(p)      W(log p)      W(p ** (1/d))   W(log p)
single process scatter      W(p)      W(p)          W(p)            W(p/log p)
multiple process broadcast  W(p)      W(p)          W(p)            W(p/log p)
total process exchange      W(p**2)   W(p**2)       W(p**((d+1)/d)) W(p)     

Theorem 3:
Additionally, assuming a process can only send one message at a time
(even if it has many links), some optimal complexities for the above
communication primitives can again be determined.

Problem                    ring      tree          mesh             hypercube
------------------------------------------------------------------------
single process broadcast   W(p)      W(log p)      W(p ** (1/d))    W(log p)
single process scatter     W(p)      W(p)          W(p)             W(p)     
multiple process broadcast W(p)      W(p)          W(p)             W(p)
total process exchange     W(p**2)   W(p**2)       W(p**((d+1)/d))  W(p log p)

Some results in this direction are also known for wormhole routing.
Cluster architecture machines can be included as well.
%
\section{Remarks}
SPB and SPA
These require a spanning tree of the group.  One difference between
the COP and a SPA followed by a SPB is that the COP uses fixed operations,
while the combine functions should be user supplied. The user supplied function
must be run as a user process on data in user memory on a computation
processor. The COP on the other hand is safe in a system process and may be
able to run on the communication processor directly.

I tend to use the GOP more often than the COP.

On Steve Ericsson Zenith's question about non-deterministic order of receives:
My implementations of such primitives have been very deterministic. They 
loosely synchronize to protect the message system. A receiving node on an
all2all (TPE) knows who sent each message, so even if it could be implemented
by N parallel scatters (SPS) the receiver would know where to put each of the 
N incoming messages.

For the primitives above SPB and SPA it becomes important to be very careful
not to overload most current message systems.

We talked about the combine function. Should it be strictly binary or
expect to combine a list of size n? I'll claim binary is enough because
fan in any reasonable implementation is likely to be low at any node.
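
To illustrate why a strictly binary combine suffices, here is a sketch of an accumulate that only ever combines two values at a time, level by level, as a tree-based implementation might. The name tree_accumulate is hypothetical. The pairing keeps neighbours in rank order, so it relies on associativity but not commutativity.

```python
def tree_accumulate(values, combine):
    """Combine values pairwise, level by level.

    Returns (result, rounds), where rounds is the number of tree levels used.
    """
    if not values:
        raise ValueError("need at least one value")
    level = list(values)
    rounds = 0
    while len(level) > 1:
        # Combine neighbours (0,1), (2,3), ...; an odd element is carried up.
        nxt = [combine(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
        rounds += 1
    return level[0], rounds
```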

Two of Jon Flower's requests are for the GOP (note gop() was such a function
even in an iPSC-1 library) and the MPB, which is the same as an SPG followed
by an SPB.

I prefer the form of global communication primitives shown by Al Geist,
where all processes make the same call.

The BARRIER and the other primitives could share many of the "512 variations"
currently proposed for the send. In particular a non-blocking BARRIER does
make sense (as requested by Jon Flower). 

There are many questions about details of any colcom primitives, most
of those questions should be clearer as the pt2pt specification matures.
We can discuss the colcom primitives we would propose. 

I would expect BARRIER, COP, GOP, SPA, SPB, SPG and SPS.

I have used 2 of the 3 others (and know where the other might be used).
But if I'm the only one who uses them ... :-)

We could provide (ALL) these primitives based on MPI pt2pt primitives for
groups with an actual topology: Hypercube, Mesh, 2-level Cluster and Generic.
These could be ready a few weeks after the pt2pt specification is stable.
Note these would be much slower than kernel based primitives, but better
than many user codes.

john

From owner-mpi-collcomm@CS.UTK.EDU  Sat Jan 16 06:45:00 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA19956; Sat, 16 Jan 93 06:45:00 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA02116; Sat, 16 Jan 93 06:44:37 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Sat, 16 Jan 1993 06:44:36 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from sol.cs.wmich.edu by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA02108; Sat, 16 Jan 93 06:44:35 -0500
Received: from id.wmich.edu (id.cs.wmich.edu) by cs.wmich.edu (4.1/SMI-4.1)
	id AA06158; Sat, 16 Jan 93 06:39:29 EST
Date: Sat, 16 Jan 93 06:39:29 EST
From: john@cs.wmich.edu (John Kapenga)
Message-Id: <9301161139.AA06158@cs.wmich.edu>
To: mpi-collcomm@cs.utk.edu
Subject: groups and architectures


I strongly agree with Jon Flower in that topology is important in the
group definition. It seems that any effort to define a group on a large
machine, say 64K nodes, would be futile without using a very regular structure.

I would hope that there are group defining functions which require topology 
and carry that information with them. Most applications on large machines
I know of treat the machine as a unit of a given topology for each stage
of the computation. Whatever else the MPI does, it must support that mode of
operation efficiently.

For example, a program might do an inquire to find out what kind of machine
topology the machine really is, and then request a 2d-mesh group of a given
size, knowing it will be well laid out on the machine. I know this is against
the architecture independent spirit. If that type of facility is not to be
allowed then it must be shown that on current machines the same effect can
still be achieved.

I would suggest we need the ability to map standard structures onto current
large machines. If we have some primitives that can be safely ignored on later
machines there is no harm.

john
From owner-mpi-collcomm@CS.UTK.EDU  Mon Jan 25 15:20:44 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA25035; Mon, 25 Jan 93 15:20:44 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA18241; Mon, 25 Jan 93 15:20:12 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Mon, 25 Jan 1993 15:20:11 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from beagle.cps.msu.edu by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA18223; Mon, 25 Jan 93 15:20:06 -0500
Received: from uranium.cps.msu.edu by beagle.cps.msu.edu (4.1/rpj-5.0); id AA05995; Mon, 25 Jan 93 15:19:58 EST
Received: by uranium.cps.msu.edu (4.1/4.1)
	id AA12809; Mon, 25 Jan 93 15:19:58 EST
Date: Mon, 25 Jan 93 15:19:58 EST
From: huangch@cps.msu.edu
Message-Id: <9301252019.AA12809@uranium.cps.msu.edu>
To: mpi-intro@cs.utk.edu
Subject: Subscription 
Cc: mpi-collcomm@cs.utk.edu


Please add my name into your mailing list.

Thanks,

--Chengchang Huang
From owner-mpi-collcomm@CS.UTK.EDU  Mon Feb 15 06:51:52 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA26138; Mon, 15 Feb 93 06:51:52 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA12752; Mon, 15 Feb 93 06:51:16 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Mon, 15 Feb 1993 06:51:15 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from daedalus.epcc.ed.ac.uk by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA12744; Mon, 15 Feb 93 06:51:11 -0500
Date: Mon, 15 Feb 93 11:51:03 GMT
Message-Id: <21574.9302151151@subnode.epcc.ed.ac.uk>
From: L J Clarke <lyndon@epcc.ed.ac.uk>
Subject: Re: A Collection of Primitives
To: mpi-collcomm@cs.utk.edu
Reply-To: lyndon@epcc.ed.ac.uk

Dear MPI Colleagues

I found the "Collection of Primitives" most useful. We have a similar
suite of communication routines which we find most useful.

a) When considering finding maxima/minima of distributed data items, it
is often useful to also be able to locate the maxima/minima either in
terms of the process holding that data value, or its position within a
distributed data structure.  The approach we have taken to this is to
introduce a set of procedures which choose a value from a set, rather
than combining a set of values.  The programmer provides an integer
identifier associated with each data value, this may be simply a process
number or a position within a distributed data set such as matrix row
number, and the routine provides the maxima/minima and their
identifiers.  (Ties are resolved by choosing the lowest identifier value,
and all identifiers must be unique.) I propose that we should add a
routine, or routines, of this nature.

b) After some discussion with other interested persons locally, I come
to the conclusion that we should take time at the meeting to consider
what the collcomm operations involving a mixture of communications plus
calculations, such as combination, mean in a heterogeneous environment -
both in terms of mixed language applications and mixed processor types. 

c) John poses the question of which operations to retain.  I have never
seen an application which uses a large number of these kinds of
functions, but on the other hand I have seen applications which between
them use all of the functions we have implemented.  I therefore suggest
that we retain all of them. 
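
The "choose a value" routines of point (a) can be sketched as follows, assuming each process contributes a (value, identifier) pair with unique integer identifiers. The names choose_max and choose_min are hypothetical; the tie-breaking rule (lowest identifier wins) is the one stated above.

```python
def choose_max(pairs):
    """Return the (value, identifier) pair with the largest value;
    ties go to the lowest identifier."""
    return max(pairs, key=lambda p: (p[0], -p[1]))

def choose_min(pairs):
    """Return the (value, identifier) pair with the smallest value;
    ties go to the lowest identifier."""
    return min(pairs, key=lambda p: (p[0], p[1]))
```

The identifier could be a process number or, for example, a matrix row number when searching for a pivot.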

Best Wishes
Lyndon

         /--------------------------------------------------------\
    e||) | Lyndon J Clarke    Edinburgh Parallel Computing Centre | e||) 
    c||c | Tel: 031 650 5021  Email: lyndon@epcc.edinburgh.ac.uk  | c||c 
         \--------------------------------------------------------/


From owner-mpi-collcomm@CS.UTK.EDU  Sat Feb 20 10:11:10 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA29389; Sat, 20 Feb 93 10:11:10 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA22263; Sat, 20 Feb 93 10:10:14 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Sat, 20 Feb 1993 10:10:13 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from vnet.ibm.com by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA22249; Sat, 20 Feb 93 10:10:10 -0500
Message-Id: <9302201510.AA22249@CS.UTK.EDU>
Received: from KGNVMA by vnet.ibm.com (IBM VM SMTP V2R2) with BSMTP id 7138;
   Sat, 20 Feb 93 10:07:58 EST
Date: Sat, 20 Feb 93 09:43:04 EST
From: "Daniel D. Frye" <DANIELF@KGNVMA.VNET.IBM.COM>
To: mpi-collcomm@cs.utk.edu

I would recommend we add the following collective communication
routines

  mpi-index - Each process sends a distinct message to all the other
              processes in the group, aka - all-to-all personalized
              communication. Each process in the calling group partitions
              its local buffer into N blocks of equal size, where N is
              the number of processes in the group.  The ith process in
              the group sends its jth block in the out buffer to the jth
              process, which stores it as the ith block of its in buffer.
              On process i, therefore, the ith block of the out buffer is
              copied locally to the ith block of the in buffer.  The only
              arguments necessary are out buffer, in buffer, length of
              the block, gid, and tag/context/whatever.

  mpi-shift - Perform a shift or rotation within a group.  Send a
              block of data any specified number of steps along the
              group either up or down.  The difference between shift
              and rotation is whether or not there is "wrap-around".
              The arguments necessary are out buffer, in buffer, length
              of the block, gid, # of steps, and (perhaps) a flag to
              decide shift or rotation (possibly we want 2 routines?),
              and tag/context/whatever.

  mpi-prefix - Apply parallel prefix (aka scan) with respect to an
               associative reduction operation on data distributed across
               a group and place the corresponding result in each process
               in the group (necessary, I believe, for the generalized
               combine operation we invented in Dallas.)  The operation
               can be any of the functions used in the mpi-reduce operation.
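
The three routines can be sketched as a single-process simulation in which position i of a list stands for process i. The function names are hypothetical stand-ins for mpi-index, mpi-shift and mpi-prefix. Two details are assumptions not stated above: a non-wrapping shift leaves a fill value where no data arrives, and the prefix is inclusive (process i's result includes its own value).

```python
from itertools import accumulate

def index(out_bufs):
    """Block j of process i's out buffer becomes block i of
    process j's in buffer."""
    n = len(out_bufs)
    return [[out_bufs[i][j] for i in range(n)] for j in range(n)]

def shift(blocks, steps, wrap, fill=None):
    """Move each process's block `steps` positions up (positive) or down
    (negative) the group; wrap=True turns the shift into a rotation."""
    n = len(blocks)
    result = [fill] * n
    for src in range(n):
        dst = src + steps
        if wrap:
            result[dst % n] = blocks[src]
        elif 0 <= dst < n:
            result[dst] = blocks[src]
    return result

def prefix(values, op):
    """Process i receives op applied to the values of processes 0..i."""
    return list(accumulate(values, op))
```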


Has anyone taken a shot at a list of reduce operations?


Furthermore, before I forget, given non-blocking collective communication
operations (head-shaking here), we need to define order.  It's more
complicated than ptp message-passing but probably still possible.  I'm
sure we can guarantee order for (e.g.) two successive broadcasts in the
same group with the same root, but not if they have different roots.
Similarly for the cases with a particular destination.   More tricky are
the cases where every process gets a different result.  Can order be
defined for mpi-combine and still preserve some performance?

Thanks.
Dan Frye

From owner-mpi-collcomm@CS.UTK.EDU  Sun Feb 21 11:09:09 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA09315; Sun, 21 Feb 93 11:09:09 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA17311; Sun, 21 Feb 93 11:08:33 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Sun, 21 Feb 1993 11:08:32 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from Aurora.CS.MsState.Edu by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA17303; Sun, 21 Feb 93 11:08:31 -0500
Received:  by Aurora.CS.MsState.Edu (4.1/6.0s-FWP);
	   id AA18416; Sun, 21 Feb 93 10:07:17 CST
Date: Sun, 21 Feb 93 10:07:17 CST
From: Tony Skjellum <tony@Aurora.CS.MsState.Edu>
Message-Id: <9302211607.AA18416@Aurora.CS.MsState.Edu>
To: DANIELF@KGNVMA.VNET.IBM.COM
Subject: hi
Cc: mpi-collcomm@cs.utk.edu

Dan,

It is not obvious to me that we can require the same order for two
successive broadcasts from the same root.  I say this because hardware
implementations (which would be fast) might not support this form of
determinism.  Second, performance characteristics might be better on
average if a different apparent permutation of the participants were
used (for the same root) each time.  I would furthermore add that an
algorithm might like to control that question.

In broadcasts, I see that there are four reasonable cases, modulo
the permutations just discussed:  an algorithm with the root node
sending ceil(log N) messages; an algorithm with each node sending at
most two messages; and the same two algorithms with the root node
first off-loading its data to another node (hot-spot reduction) and
then sending no other messages.
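
As an illustration of the first case, here is a sketch of a binomial-tree
broadcast schedule, which is one common way to arrive at a root that
sends ceil(log2 N) messages.  The function name and the round-by-round
doubling scheme are assumptions for illustration, not something the
subcommittee has specified.

```python
import math


def binomial_broadcast(n, root=0):
    """Return the (sender, receiver) pairs of a binomial-tree broadcast
    among ranks 0..n-1, listed round by round.

    Ranks are numbered relative to the root; in each round every rank
    that already holds the data sends to the rank `have` positions on.
    """
    sends = []
    have = 1  # number of relative ranks that already hold the data
    while have < n:
        for rel in range(have):
            peer = rel + have
            if peer < n:
                sends.append(((rel + root) % n, (peer + root) % n))
        have *= 2
    return sends


schedule = binomial_broadcast(8)
root_sends = sum(1 for sender, _ in schedule if sender == 0)
```

In this schedule the root sends once per round, so root_sends equals
math.ceil(math.log2(n)), while the total number of messages stays n - 1.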

- Tony

From owner-mpi-collcomm@CS.UTK.EDU  Thu Mar  4 10:37:50 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA29481; Thu, 4 Mar 93 10:37:50 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA00367; Thu, 4 Mar 93 10:37:05 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Thu, 4 Mar 1993 10:37:03 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from marge.meiko.com by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA00357; Thu, 4 Mar 93 10:37:00 -0500
Received: from hub.meiko.co.uk by marge.meiko.com with SMTP id AA19768
  (5.65c/IDA-1.4.4 for <mpi-collcomm@cs.utk.edu>); Thu, 4 Mar 1993 10:36:56 -0500
Received: from float.co.uk (float.meiko.co.uk) by hub.meiko.co.uk (4.1/SMI-4.1)
	id AA21448; Thu, 4 Mar 93 15:36:53 GMT
Date: Thu, 4 Mar 93 15:36:53 GMT
From: jim@meiko.co.uk (James Cownie)
Message-Id: <9303041536.AA21448@hub.meiko.co.uk>
Received: by float.co.uk (5.0/SMI-SVR4)
	id AA01959; Thu, 4 Mar 93 15:34:12 GMT
To: mpi-collcomm@cs.utk.edu
Cc: jim@meiko.co.uk
Subject: Synchronisation semantics
Content-Length: 1419

Sorry if you get this twice, I sent something similar yesterday, but
didn't get it back myself, so I guess it's disappeared into the great
bit-bucket in the sky.

As I understand the current collective communication proposal, the
synchronisation semantics of the global operations are only weakly
specified. Either 
1) each process can continue as soon as its contribution to the global
   operation is complete 
or 
2) they can be implemented as if there were a group synchronisation.

The first case allows code like this to execute

	Process 1	Process 2	Process 3

	broadcast(rx)   receive from 1	broadcast(tx)
	send to 2	broadcast(rx)	

the second would cause it to deadlock.
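
The difference can be checked with a small simulation.  The sketch
below is a toy model, not an MPI implementation: the operation tuples,
the function name runs_to_completion, and the assumption that sends are
buffered (never block) are all made up for illustration.

```python
def runs_to_completion(programs, synchronizing):
    """programs maps a process id to a list of operations:
    ("bcast", root), ("send", dst) or ("recv", src).

    Sends are assumed to be buffered.  With synchronizing=True a
    broadcast completes only when every process has entered it;
    otherwise the root continues at once and a receiver continues as
    soon as the root has contributed.  Returns True if every process
    finishes and False if the system gets stuck (deadlock).
    """
    pc = {p: 0 for p in programs}  # index of the next operation
    mailbox = set()                # (src, dst) messages in flight
    root_posted = False            # has the broadcast root contributed?

    def current(p):
        ops = programs[p]
        return ops[pc[p]] if pc[p] < len(ops) else None

    progress = True
    while progress:
        progress = False
        if synchronizing and all(
            current(p) is not None and current(p)[0] == "bcast"
            for p in programs
        ):
            for p in programs:  # everyone leaves the broadcast together
                pc[p] += 1
            progress = True
            continue
        for p in programs:
            op = current(p)
            if op is None:
                continue
            kind, arg = op
            if kind == "send":
                mailbox.add((p, arg))
                pc[p] += 1
                progress = True
            elif kind == "recv" and (arg, p) in mailbox:
                mailbox.remove((arg, p))
                pc[p] += 1
                progress = True
            elif kind == "bcast" and not synchronizing:
                if p == arg:
                    root_posted = True
                    pc[p] += 1
                    progress = True
                elif root_posted:
                    pc[p] += 1
                    progress = True
    return all(current(p) is None for p in programs)


# The three-process example above, with process 3 as broadcast root.
EXAMPLE = {
    1: [("bcast", 3), ("send", 2)],
    2: [("recv", 1), ("bcast", 3)],
    3: [("bcast", 3)],
}
```

Calling runs_to_completion(EXAMPLE, synchronizing=False) models case 1,
and synchronizing=True models case 2.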

I don't believe we should leave this an open issue, since in the
absence of a specification, the user MUST assume that a group
synchronisation occurs. (And if they assume it does they'll get bitten
when it doesn't).

I believe that we should assert that the synchronisation happens.

Those users who explicitly do NOT want it can then make use of the
non-blocking forms of the collective operations (whichever we allow
in) to relax the synchronisation point.

-- Jim
James Cownie 
Meiko Limited			Meiko Inc.
650 Aztec West			Reservoir Place
Bristol BS12 4SD		1601 Trapelo Road
England				Waltham
				MA 02154

Phone : +44 454 616171		+1 617 890 7676
FAX   : +44 454 618188		+1 617 890 5042
E-Mail: jim@meiko.co.uk   or    jim@meiko.com


From owner-mpi-collcomm@CS.UTK.EDU  Thu Mar  4 12:12:41 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA02263; Thu, 4 Mar 93 12:12:41 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA05340; Thu, 4 Mar 93 12:10:21 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Thu, 4 Mar 1993 12:10:19 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from gstws.EPM.ORNL.GOV by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA05321; Thu, 4 Mar 93 12:10:18 -0500
Received: by gstws.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03)
          id AA14943; Thu, 4 Mar 1993 12:10:17 -0500
Date: Thu, 4 Mar 1993 12:10:17 -0500
From: geist@gstws.epm.ornl.gov (Al Geist)
Message-Id: <9303041710.AA14943@gstws.epm.ornl.gov>
To: mpi-collcomm@cs.utk.edu
Subject: Re: Synchronisation semantics



>I believe that we should assert that the synchronisation happens.

I, on the other hand, would like to declare the example you give
an erroneous program and put it in the (growing larger) class
of erroneous programs that can now be written in pt2pt.
And I would prefer that users' applications not be forced to wait
for synchronization to occur. Existing interfaces are a mixed bag:
some use method 1, some use method 2. Method 1 is faster,
and I don't hear users complaining about their codes breaking
when using the existing method 1 interfaces.
So I am inclined to specify:
1) each process can continue as soon as its contribution to the global
   operation is complete 

Do other people in this subcommittee have an opinion?

Al Geist
From owner-mpi-collcomm@CS.UTK.EDU  Fri Mar  5 03:32:04 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA23652; Fri, 5 Mar 93 03:32:04 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA22953; Fri, 5 Mar 93 03:31:40 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Fri, 5 Mar 1993 03:31:39 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from pnlg.pnl.gov by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA22945; Fri, 5 Mar 93 03:31:36 -0500
Received: from fermi.pnl.gov (130.20.182.50) by pnlg.pnl.gov; Thu, 4 Mar 93
 12:10 PST
Received: by fermi.pnl.gov (4.1/SMI-4.1) id AA02720; Thu, 4 Mar 93 12:09:19 PST
Date: Thu, 04 Mar 93 12:09:17 -0800
From: Robert J Harrison <d3g681@fermi.pnl.gov>
Subject: Re: Synchronisation semantics
To: mpi-collcomm@cs.utk.edu
Message-Id: <9303042009.AA02720@fermi.pnl.gov>
In-Reply-To: Your message of "Thu, 04 Mar 93 12:10:17 EST."
 <9303041710.AA14943@gstws.epm.ornl.gov>
X-Envelope-To: mpi-collcomm@cs.utk.edu

In message <9303041710.AA14943@gstws.epm.ornl.gov> you write:
> 
> 
> >I believe that we should assert that the synchronisation happens.
> 
> I, on the other hand, would like to declare the example you give
> an erroneous program and put it in the (growing larger) class
> of erroneous programs that can now be written in pt2pt.
> And I would prefer that users' applications not be forced to wait
> for synchronization to occur. Existing interfaces are a mixed bag:
> some use method 1, some use method 2. Method 1 is faster,
> and I don't hear users complaining about their codes breaking
> when using the existing method 1 interfaces.
> So I am inclined to specify:
> 1) each process can continue as soon as its contribution to the global
>    operation is complete 
> 
> Do other people in this subcommittee have an opinion?
> 
> Al Geist


I do not think that one can define what this

> 1) each process can continue as soon as its contribution to the global
>    operation is complete 

means without reference to an implementation.  Also, some implementations
may require synchronization (e.g. for efficiency, or due to h/w or s/w 
limitations).  Other implementations may not.  With proper use
of tagging etc. no synchronization is required for correct execution
no matter what order messages arrive in, apart from the usual
concerns about available buffer space.

Thus, from consideration of orthogonality of function and efficiency,
I would suggest that

1) The synchronization properties of global operations be left
   undefined where this is not required for their termination
   with correct numerical results (e.g. a global summation).
   Any constraints on tags, etc., for correct execution should
   also be defined, though I think we should work very hard to
   remove any such constraints.

2) A separate primitive that acts as a barrier or synchronization
   be provided (I think this is the case already).

Primitive 2 might be provided as a special form of primitive 1, so
that unnecessary communication is avoided.  However, this seems
to me a minor optimization.

Robert.

Robert J. Harrison

Mail Stop K1-90                             tel: 509-375-2037
Battelle Pacific Northwest Laboratory       fax: 509-375-6631
P.O. Box 999, Richland WA 99352          E-mail: rj_harrison@pnl.gov





From owner-mpi-collcomm@CS.UTK.EDU  Mon Mar  8 07:07:24 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA01919; Mon, 8 Mar 93 07:07:24 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA17798; Mon, 8 Mar 93 07:06:48 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Mon, 8 Mar 1993 07:06:47 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from marge.meiko.com by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA17790; Mon, 8 Mar 93 07:06:43 -0500
Received: from hub.meiko.co.uk by marge.meiko.com with SMTP id AA03142
  (5.65c/IDA-1.4.4 for <mpi-collcomm@cs.utk.edu>); Mon, 8 Mar 1993 07:06:39 -0500
Received: from float.co.uk (float.meiko.co.uk) by hub.meiko.co.uk (4.1/SMI-4.1)
	id AA10106; Mon, 8 Mar 93 12:06:35 GMT
Date: Mon, 8 Mar 93 12:06:35 GMT
From: jim@meiko.co.uk (James Cownie)
Message-Id: <9303081206.AA10106@hub.meiko.co.uk>
Received: by float.co.uk (5.0/SMI-SVR4)
	id AA02376; Mon, 8 Mar 93 12:03:45 GMT
To: geist@gstws.epm.ornl.gov
Cc: mpi-collcomm@cs.utk.edu
In-Reply-To: Al Geist's message of Thu, 4 Mar 1993 12:10:17 -0500 <9303041710.AA14943@gstws.epm.ornl.gov>
Subject: Synchronisation semantics
Content-Length: 1001

Jim> I believe that we should assert that the synchronisation happens.
OK, so maybe I was a bit stronger than I meant to be.

I actually don't mind too much one way or the other, as long as we
understand what it is that we're doing. 

Therefore are we specifying
> 1) each process CAN continue as soon as its contribution to the global
>    operation is complete 

or

1) each process MUST continue as soon as its contribution to the global
   operation is complete 

(In other words is an implementation free to treat all global operations
as a global synchronisation or not ?) I'm happy with the first of
these statements, but not the second. (However it should be re-worded to make
the possibility clearer in a draft).

-- Jim
James Cownie 
Meiko Limited			Meiko Inc.
650 Aztec West			Reservoir Place
Bristol BS12 4SD		1601 Trapelo Road
England				Waltham
				MA 02154

Phone : +44 454 616171		+1 617 890 7676
FAX   : +44 454 618188		+1 617 890 5042
E-Mail: jim@meiko.co.uk   or    jim@meiko.com


From owner-mpi-collcomm@CS.UTK.EDU  Mon Mar  8 09:14:45 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA05445; Mon, 8 Mar 93 09:14:45 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA22382; Mon, 8 Mar 93 09:14:09 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Mon, 8 Mar 1993 09:14:07 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from msr.EPM.ORNL.GOV by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA22373; Mon, 8 Mar 93 09:14:06 -0500
Received: by msr.EPM.ORNL.GOV (5.67/1.34)
	id AA13188; Mon, 8 Mar 93 09:13:48 -0500
Date: Mon, 8 Mar 93 09:13:48 -0500
From: geist@msr.EPM.ORNL.GOV (Al Geist)
Message-Id: <9303081413.AA13188@msr.EPM.ORNL.GOV>
To: jim@meiko.co.uk
Subject: Re:  Synchronisation semantics
Cc: mpi-collcomm@cs.utk.edu

The draft will read:
1) each process CAN continue as soon as its contribution to the global
   operation is complete.

Cheers,
 Al
From owner-mpi-collcomm@CS.UTK.EDU  Wed Mar 10 08:13:16 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA28509; Wed, 10 Mar 93 08:13:16 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA11051; Wed, 10 Mar 93 08:11:02 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Wed, 10 Mar 1993 08:11:01 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from super.super.org by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA11039; Wed, 10 Mar 93 08:10:59 -0500
Received: from b125.super.org by super.super.org (4.1/SMI-4.1)
	id AA20486; Wed, 10 Mar 93 08:10:57 EST
Received: by b125.super.org (4.1/SMI-4.1)
	id AA01741; Wed, 10 Mar 93 08:10:56 EST
Date: Wed, 10 Mar 93 08:10:56 EST
From: lederman@b125.super.org (Steve Huss-Lederman)
Message-Id: <9303101310.AA01741@b125.super.org>
To: mpi-collcomm@cs.utk.edu
Subject: non-blocking routines

I casually raised the issue at the last meeting of whether we were
going to make the collective communications completely compatible with
the point-to-point standard.  Specifically, I raised the issue of
whether there would be a non-blocking broadcast.  This started a chain
of events that ultimately led to a non-blocking wait.  I can only hope
that in the official minutes the originator's name will be lost.  I
already see the laughing occurring when one first hears this suggestion
out of context :-)

But seriously, since I started this thing I would like to now see it
resolved.  The next points involve the global picture and then some
details of non-blocking collective communications follow.  I have not
filled in a lot of details until I think the global picture is
resolved.

The way this whole thing started was for symmetry with
point-to-point.  I think people agree that you could take advantage of
a non-blocking collective communication in the same way you can a
non-blocking send.  It was even pointed out that collective
communications are generally more expensive in terms of latency and
time so there might be an even bigger justification.  So the group
voted for a non-blocking broadcast by a fairly large majority (if my
recollection is correct).  Now the slippery slope argument sets in.  If
you have a non-blocking broadcast, you need a non-blocking gather,
scan, etc.  This finally led to the non-blocking wait or, more
appropriately, a non-blocking barrier.  As the votes progressed,
there were fewer total votes and fewer yes votes for the non-blocking
version.  I interpret this as people starting to understand the
consequences of the first vote and starting to have second thoughts.
Do people agree with this interpretation?  Are my memory and notes correct?

So the big picture question is whether we should have non-blocking
collective communications calls at all.  Here are my current feelings.
I think that they have merit and can be useful.  However, they add a
lot of complexity to routines that are already difficult to do
correctly and efficiently.  I think it is unlikely that we can specify
these routines and get done in 3 more meetings.  (I am also posting
this idea in a more general context to the whole committee.)  It also
falls outside current practice.  If we decide to pursue a more complex
standard and extend the deadline, then we should include this too.
However, I would think a more manageable first standard that can get
done quickly would be better.

Given that, I raise a few of the issues involved in non-blocking
collective communications.  I only list some to show what is involved.
If we decide to continue down this path, then I will be more explicit
and get involved more in details.

If we have non-blocking calls, then we need all the routines like
point-to-point has.  For example, we need a wait, probe and either two
calls or an option to choose between blocking and not.  Another issue
is dealing with two non-blocking calls in a row.  For example, suppose
you do two non-blocking broadcasts in a row but use a different root.
It seems to me that an intermediate node could get two different
messages from another intermediate node and have trouble telling which
broadcast it is supposed to be for.  Are we going to allow this?  If
so, the coding of the broadcast may be much harder on some systems.
If not, you restrict the user in a way that is unnatural.

Steve

P.S. - The moral is: never make a casual suggestion at an MPI
meeting.  You'll probably live to regret it :-).
From owner-mpi-collcomm@CS.UTK.EDU  Thu Mar 11 12:49:49 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA02571; Thu, 11 Mar 93 12:49:49 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA06487; Thu, 11 Mar 93 12:48:54 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Thu, 11 Mar 1993 12:48:53 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from canidae.cps.msu.edu by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA06461; Thu, 11 Mar 93 12:48:46 -0500
Received: from pit-bull.cps.msu.edu by canidae.cps.msu.edu (4.1/rpj-5.0); id AA01661; Thu, 11 Mar 93 12:48:43 EST
Received: by pit-bull.cps.msu.edu (4.1/4.1)
	id AA04285; Thu, 11 Mar 93 12:48:42 EST
Date: Thu, 11 Mar 93 12:48:42 EST
From: kalns@cps.msu.edu
Message-Id: <9303111748.AA04285@pit-bull.cps.msu.edu>
To: mpi-collcomm@cs.utk.edu
Subject: reduction and gather

Dear Collective Communications Subcommittee:

I have not participated in this forum in the past;
however, I have been an active MPI reader for the past two
months.  I would like to comment on the following:

1. Reduction
   a. each participating process gets the result
   b. additional ops

2. Gather
   a. concatenation in rank order
----------

Al Geist proposed the following interface for reduction:

>  info = MPI_GLOBAL_OP( inbuf, bytes, type, gid, op, outbuf )
>
>  Function:
>  Called by all members of the group "gid"
>  using the same argument for "bytes", "type", "gid", and "op".
>  On return the "outbuf" of all group members contains the
>  result of the global operation "op" applied pointwise to
>  the collective "inbuf". For example, if the op is max and
>  inbuf contains two floating-point numbers then
>        outbuf(1) = global max( inbuf(1)) and
>        outbuf(2) = global max( inbuf(2))
>  A set of standard operations is supplied with MPI including:
>    global max - for each data type
>    global min - for each data type
>    global sum - for each data type
>    global mult- for each data type
>    global AND - for integer and logical type
>    global OR  - for integer and logical type
>    global XOR - for integer and logical type

Every process receives the result of the reduction operation.
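
A minimal sketch of the pointwise semantics described in the quoted
interface, modelled on Python lists.  The function name global_op and
the list-of-lists representation of the members' buffers are
assumptions for illustration; the "bytes", "type" and "gid" arguments
of the proposed routine are not modelled.

```python
from functools import reduce


def global_op(inbufs, op):
    """inbufs[r] is the inbuf of rank r; all inbufs have equal length.

    Element k of the result is op applied across element k of every
    member's inbuf.  Every member gets its own copy as its outbuf.
    """
    result = [reduce(op, column) for column in zip(*inbufs)]
    return [list(result) for _ in inbufs]


# Three members, each contributing two floating-point numbers.
outbufs = global_op([[1.0, 9.0], [4.0, 2.0], [3.0, 5.0]], max)
```

Here every outbuf holds the global max of the first elements followed
by the global max of the second elements.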

John Kapenga proposed two different reductions in
"Collection of Primitives" where in one
case all processes receive the result, the other only a
single process receives the result.

I concur with John's more flexible approach since for some
applications, only a single process needs the result.
Consider Gaussian Elimination with columns of the coefficient
matrix distributed to processors.  The following code illustrates.
This code must be translated into message-passing (SPMD) code
for each processor. (Assuming one process/processor)

s1:  DO I=1,N
s2:    LOC = MAXLOC(A[I,I:N])              /* max location in row */
s3:    EXCHANGE(A[1:N,I],A[1:N,LOC])       /* exchange columns */
s4:    A[I,I:N] = A[I,I:N] / A[I,I]
s5:    DO J=I+1,N
s6:       DO K=I+1,N
s7:          A[J,K] = A[J,K] - A[J,I] * A[I,K]
s8:       END DO
s9:    END DO
s10: END DO

The only processes that need to know the max location are
the process which owns column I and the process which
owns column LOC, in order to exchange columns.
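
A sketch of such a single-destination reduction with a MAXLOC-style
operator, modelled on Python lists.  The names maxloc and
reduce_to_root, the (value, index) pair encoding, and the rule that
ties go to the lower index are assumptions for illustration only.

```python
from functools import reduce


def maxloc(a, b):
    """Combine two (value, index) pairs, keeping the larger value.
    Ties go to the lower index (an assumed convention)."""
    if a[0] != b[0]:
        return a if a[0] > b[0] else b
    return a if a[1] <= b[1] else b


def reduce_to_root(contributions, op, root):
    """contributions[r] is the value supplied by rank r.  Only `root`
    receives the combined result; every other rank receives None."""
    result = reduce(op, contributions)
    return [result if r == root else None
            for r in range(len(contributions))]


# Each process contributes its row entry together with its column index.
row = [(3.0, 0), (7.5, 1), (7.5, 2), (1.0, 3)]
received = reduce_to_root(row, maxloc, root=0)
```

The root would then still have to tell the owner of column LOC about
the result, which is a point-to-point step outside this sketch.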

The above code also illustrates where MAXLOC (and MINLOC)
would be useful as additional reduction operations.

Al Geist proposed the following interface for gather:
>  info = MPI_GATHER( buf, bytes, type, gid, root )
>
>  Function:
>  Called by all members of the group "gid"
>  using the same argument for "bytes", "type", "gid", and "root".
>  On return all the individual "buf" are concatenated into the "root" buf,
>  which must be of size at least gsize*bytes.
>  The data is laid in the "root" buf in rank order that is
>  | gid,0 data | gid,1 data | ...| gid, root data | ...| gid, gsize-1 data |
>  Other member's "buf" are unchanged on return.
>  On return "info" contains the error code.

Why must the data be laid out in "rank order"? This may not
always be necessary.  There is certainly additional overhead
in arranging it this way instead of just concatenating messages (with
GCPID) as they arrive. Perhaps there could be an option to obtain
the data in rank order when necessary.

Regards,
Edgar

======================================================================
| Edgar T. Kalns                     | Internet: kalns@cps.msu.edu   |
| Advanced Computing Systems Lab     | Tel: (517) 353-8666           |   
| Department of Computer Science     |                               |
| Michigan State University          |                               |
| East Lansing, MI 48824, USA        |                               |
======================================================================

From owner-mpi-collcomm@CS.UTK.EDU  Thu Mar 11 13:25:26 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA04034; Thu, 11 Mar 93 13:25:26 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA09574; Thu, 11 Mar 93 13:24:28 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Thu, 11 Mar 1993 13:24:27 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from deepthought.cs.utexas.edu by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA09566; Thu, 11 Mar 93 13:24:25 -0500
From: rvdg@cs.utexas.edu (Robert van de Geijn)
Received: from grit.cs.utexas.edu by deepthought.cs.utexas.edu (5.64/1.2/relay) with SMTP
	id AA23445; Thu, 11 Mar 93 12:24:26 -0600
Received: by grit.cs.utexas.edu (5.64/Client-v1.3)
	id AA13089; Thu, 11 Mar 93 12:24:16 -0600
Date: Thu, 11 Mar 93 12:24:16 -0600
Message-Id: <9303111824.AA13089@grit.cs.utexas.edu>
To: kalns@cps.msu.edu
Cc: mpi-collcomm@cs.utk.edu
In-Reply-To: kalns@cps.msu.edu's message of Thu, 11 Mar 93 12:48:42 EST <9303111748.AA04285@pit-bull.cps.msu.edu>
Subject: reduction and gather

   Dear Collective Communications Subcommittee:

   Al Geist proposed the following interface for reduction:
   >  info = MPI_GLOBAL_OP( inbuf, bytes, type, gid, op, outbuf )
   >
 
   Every process receives the result of the reduction operation.

   John Kapenga proposed two different reductions in
   "Collection of Primitives" where in one
   case all processes receive the result, the other only a
   single process receives the result.

   I concur with John's more flexible approach since for some
   applications, only a single process needs the result.
   Consider Gaussian Elimination with columns of the coefficient
   matrix distributed to processors.  The following code illustrates.
   This code must be translated into message-passing (SPMD) code
   for each processor. (Assuming one process/processor)

There are a number of reasons to have two versions.  The "fan-in"
is often used, and on most systems it can be implemented in half the
time of the GSUM to all (for large vectors).  I would also propose a
third version: a combine leaving the result in pieces distributed
among the nodes.  (This would be the inverse of the GCOLX routine,
with a combine added, in Intel lingo.)  An integer array would
indicate the size of the piece to be left at each node.  Again, there
are performance issues behind the need for this last operation, since
the GSUM to all performs this operation, and more.
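
A sketch of what this third version would compute, modelled on Python
lists.  The function name combine_scatter and the list representation
are assumptions for illustration; piece_sizes plays the role of the
integer array mentioned above.

```python
from functools import reduce
import operator


def combine_scatter(inbufs, op, piece_sizes):
    """inbufs[r] is the vector contributed by node r (all the same
    length).  The vectors are combined pointwise with op and node r is
    left with the next piece_sizes[r] elements of the result."""
    combined = [reduce(op, column) for column in zip(*inbufs)]
    if sum(piece_sizes) != len(combined):
        raise ValueError("piece sizes must cover the whole vector")
    pieces, start = [], 0
    for size in piece_sizes:
        pieces.append(combined[start:start + size])
        start += size
    return pieces


pieces = combine_scatter(
    [[1, 2, 3, 4], [10, 20, 30, 40]], operator.add, [1, 3]
)
```

Concatenating the pieces in node order gives the same vector that a
GSUM to all would leave on every node.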

Robert




=====================================================================
  Robert A. van de Geijn                     rvdg@cs.utexas.edu  
  Assistant Professor
  Department of Computer Sciences            (Work)  (512) 471-9720
  The University of Texas                    (Home)  (512) 251-8301 
  Austin, TX 78712                           (FAX)   (512) 471-8885 
=====================================================================
From owner-mpi-collcomm@CS.UTK.EDU  Thu Mar 11 13:44:15 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA04337; Thu, 11 Mar 93 13:44:15 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA10540; Thu, 11 Mar 93 13:43:30 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Thu, 11 Mar 1993 13:43:29 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from msr.EPM.ORNL.GOV by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA10532; Thu, 11 Mar 93 13:43:16 -0500
Received: by msr.EPM.ORNL.GOV (5.67/1.34)
	id AA01838; Thu, 11 Mar 93 13:43:03 -0500
Date: Thu, 11 Mar 93 13:43:03 -0500
From: geist@msr.EPM.ORNL.GOV (Al Geist)
Message-Id: <9303111843.AA01838@msr.EPM.ORNL.GOV>
To: mpi-collcomm@cs.utk.edu
Subject: Re: Edgar's questions
Cc: kalns@cps.msu.edu


Hi Edgar,

>John Kapenga proposed two different reductions in
>all processes receive the result
>single process receives the result
>I concur with John's more flexible approach

I also agree that we can have both functions,
and the collective communication draft I am madly writing
contains both (and some others submitted by Frye).

>Why must the data be laid out in "rank order"? This may not
>always be necessary.

It is a convenience to the user so that he may quickly
find data from a particular task. Since bytes is constant,
root can place each message in the correct location in buf
with no extra overhead. So there is no incentive to have
a random order.
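
The placement argument can be sketched as follows.  This is a toy
model on Python byte strings, not the proposed MPI_GATHER itself; the
function name gather_rank_order and the (rank, data) arrival pairs are
assumptions for illustration.

```python
def gather_rank_order(arrivals, gsize, nbytes):
    """arrivals yields (rank, data) pairs in whatever order messages
    reach the root; every data item is exactly nbytes long.

    Because nbytes is the same for every member, the slot for a given
    rank is known in advance, so each message is written straight to
    offset rank * nbytes with no later reordering pass."""
    root_buf = bytearray(gsize * nbytes)
    for rank, data in arrivals:
        offset = rank * nbytes
        root_buf[offset:offset + nbytes] = data
    return bytes(root_buf)


# Messages arrive out of order but land in rank order.
buf = gather_rank_order([(2, b"cc"), (0, b"aa"), (1, b"bb")],
                        gsize=3, nbytes=2)
```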

Al Geist
From owner-mpi-collcomm@CS.UTK.EDU  Fri Mar 12 11:23:01 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA23453; Fri, 12 Mar 93 11:23:01 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA08903; Fri, 12 Mar 93 11:22:14 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Fri, 12 Mar 1993 11:22:12 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from [128.219.8.54] by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA08885; Fri, 12 Mar 93 11:22:09 -0500
Received: by gstws.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03)
          id AA15629; Fri, 12 Mar 1993 11:21:55 -0500
Date: Fri, 12 Mar 1993 11:21:55 -0500
From: geist@gstws.epm.ornl.gov (Al Geist)
Message-Id: <9303121621.AA15629@gstws.epm.ornl.gov>
To: mpi-collcomm@cs.utk.edu
Subject: First draft of Collective Communication section of MPI.


\documentstyle[12pt]{article}
\begin{document}

\section{Collective Communication}

[I have placed comments and questions in square brackets.]

\subsection{Introduction}

This section is a draft of the current proposal for collective communication.
Collective communication is defined to be communication that involves
a group of tasks. Examples are broadcast and global sum.
Because of the need to deal with groups of tasks, this section will also
present a proposal for the formation, partitioning, and managing
of basic groups. 
A basic group has two properties:
it has a group identifier that is associated with a set of tasks, and
each task in the group has a unique rank between $0$ and $p-1$,
where $p$ is the number of tasks in the group.
There is an initial default group {\bf ALL} that contains
all the tasks.
Forming groups with topological features is presented in section 4.
[by the Topology subcommittee]

The collective communication routines are built above the point-to-point
routines. While vendors may optimize certain collective routines for
their architectures, a complete library of the collective communication
routines written entirely in point-to-point will be available.
The following communication functions are proposed.
\begin{itemize}
\item
Broadcast from one member to all members of a group.
\item
Barrier across all group members
\item
Gather data from all group members to one member.
\item
Scatter data from one member to all members of a group.
\item
Global operations such as sum, max, min, etc., where the result
is known by all group members and a variation where the result is
known by only one member. User-defined global operations
are also supported.
\item
Simultaneous shift of data around the group, the simplest example
being all members sending their data to (rank+1) with wrap around.
For portability, the topology section provides routines for
identifying a member's neighbor in a given direction and a given
number of hops away.
\item
Scan across all members of a group (also called parallel prefix).
\item
Broadcast from all members to all members of a group.
\item
Scatter data from all members to all members of a group
(also called complete exchange or index).
\end{itemize}

To simplify the collective communication interface it is
designed with two layers. The low level routines have all the
generality of, and make use of, the buffer descriptor routines
of the point-to-point section which allows arbitrarily complex
messages to be constructed. The second level routines are
similar to the upper level point-to-point routines in that they send
only a contiguous buffer.

\section{Group Functions}

Before defining a collective operation between a group of tasks,
it is necessary to create and manage a group.
A group is identified by a group name that is supplied by the user.
[It is sufficient with static groups to only have an opaque group ID,
which is returned to the user during group formation.
But if we allow dynamic groups in some (future) version of MPI,
then there is no way for a new task to join a group since the 
user doesn't have the opportunity to label the group.
To allow for future extensibility of the group concept
the present draft specifies that groups be named. 
The underlying implementation can map this label to any
type of group ID that is convenient or fast. This could be
an elaborate structure or a simple integer.]
Each member of a group has a unique rank in the group. 
The rank values are the integers 0 to number-of-members minus 1.
Each group has a topology associated with it. 
The collective communication routines are implemented in terms
of the topology associated with a given group.
Although the function would be the same, a broadcast in a group 
with a ring topology could be implemented differently from a broadcast in
a group with hypercube topology.

The default topology for a group is fully connected. 
Existing groups including the {\bf ALL} group
can switch their associated topology using the functions described in
section 4. This allows the user to match the group topology to the
algorithm executed by the group or the underlying hardware.

The debate rages on about whether groups should be dynamic or static.
A static group is defined to be a group where once it is formed
its membership never changes.
Static groups are just a subset of dynamic groups. The added generality
of dynamic groups is perceived as a useful property to have in MPI
at some future time. One of the most important things dynamic
groups allow is the development of fault tolerant applications.
Given the time constraints for MPI-1,
the following proposal is written so that dynamic groups 
are possible, but MPI-1 only specifies the restricted case
where the groups are static. 

my\_rank = MPI\_PARTGROUP(group, newgroup)

     Returns only after all members of group have called it.
     The newgroup argument is used as a key. All members with
     the same newgroup argument are placed in the same group
     and their rank in this new group is returned.
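
     The partitioning rule can be illustrated with a small serial
     model. This is only a sketch: the helper name partgroup is
     hypothetical, and ordering the new ranks by old rank is an
     assumption, since the draft does not say how ranks are assigned
     within a new group.

\begin{verbatim}
```python
def partgroup(keys):
    """Model MPI_PARTGROUP: keys[r] is the newgroup key passed by old rank r.

    Returns a list whose entry r is (newgroup key, new rank) for old rank r.
    Assumption: new ranks follow old-rank order within each key.
    """
    next_rank = {}          # next free rank for each newgroup key
    result = []
    for key in keys:        # iterate in old-rank order
        rank = next_rank.get(key, 0)
        next_rank[key] = rank + 1
        result.append((key, rank))
    return result


# Six tasks split into two new groups by alternating keys.
print(partgroup(["row0", "row1", "row0", "row1", "row0", "row1"]))
```
\end{verbatim}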

old\_group = MPI\_LVGROUP(group)

     In the restricted case of MPI-1, returns only after all
     members of group have called it. 
     Frees all memory and system resources used by group. 
     Returns the name of the old group
     from which they were last partitioned (and of which
     they are still a member). It is an error to call lvgroup(ALL).

size    = MPI\_GSIZE(group)

     Returns the (instantaneous) size of group. [can be called by any task?]

rank    = MPI\_GETRANK(group,pid)

     Given that pid is the unique (possibly opaque) task identifier,
     returns the rank of pid in group.

pid     = MPI\_GETPID(group,rank)

     Given that pid is the unique (possibly opaque) task identifier,
     returns the pid of the task identified by (group,rank).

pid = MPI\_MYPID()

This is included here for completeness to show how a task could
get its rank in a group.

my\_rank = MPI\_JOINGROUP(group)

     Dynamic group function available in MPI-2. 
     Can be called by an individual task with any argument for group.
     If group doesn't exist, then it is created and this task
     becomes its first member.
     If the group exists, then this task is placed in the group
     and given the lowest available rank. For example, if there is
     a gap in the ranks due to a process failure, then this task
     would fill the gap.
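
     The "lowest available rank" rule can be sketched as follows.
     The function name is hypothetical and the model is serial; it
     only shows which rank a joining task would receive given the
     ranks currently in use.

\begin{verbatim}
```python
def lowest_available_rank(used_ranks):
    """Return the smallest non-negative rank not present in used_ranks."""
    used = set(used_ranks)
    rank = 0
    while rank in used:
        rank += 1
    return rank


# Ranks 0, 1 and 3 are live; rank 2 was lost to a process failure,
# so a joining task fills that gap.
print(lowest_available_rank([0, 1, 3]))
# An empty group: the joining task becomes the first member.
print(lowest_available_rank([]))
```
\end{verbatim}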

\section{Communication Functions}

The proposed communication functions are divided into two layers.
The lowest level uses the same buffer descriptor routines 
available in point-to-point to create noncontiguous, multiple data type
messages. The second level handles only contiguous single data type
messages. Like the point-to-point high level interface, the second
level of collective communication routines handles heterogeneity.

There has been discussion about the synchronization properties
of the collective communication routines. In this proposal
routines can (but are not required to) return as soon as their 
participation in the collective communication is complete.

Each of the following functions returns an error code 
in the info argument.

\subsection{Level 2 routines}

info = MPI\_BCAST( buf, nitems, type, tag, group, from\_rank )

MPI\_BCAST broadcasts a message to all members of a group.
It is called by all members of group using the same arguments for
nitems, type, tag, group, and from\_rank.
On return the contents of the array buf on the member with from\_rank
is contained in buf on all group members.
type is the data type to be sent, nitems is the number of 
these items, tag is a user supplied message tag.

info = MPI\_BARRIER( group, tag )

MPI\_BARRIER blocks the calling task until all group members have called it
using the same tag.

info = MPI\_GATHER( inbuf, outbuf, nitems, type, tag, group, to\_rank~)

MPI\_GATHER gathers the nitems in each group member's inbuf
and places these items in rank order in the to\_rank member's outbuf.
It is called by all members of group using the same arguments for
nitems, type, tag, group, and to\_rank.
The receiving member must declare outbuf to be at least
(nitems * sizeof(type)) * (gsize(group)) bytes.
outbuf is unchanged on all the other group members.


info = MPI\_SCATTER( inbuf, outbuf, nitems, type, tag, group, from\_rank~)

MPI\_SCATTER sends different pieces of the from\_rank member's inbuf
to each of the other group members.
The routine is called by all members of the group using the same arguments for
nitems, type, tag, group, and from\_rank.
The data is laid out in the from\_rank member's inbuf in rank order.
The other members' inbuf is unchanged by the routine.
On return each member's outbuf contains its nitems piece of the
originator's inbuf.
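
The rank-order layout used by MPI\_GATHER and MPI\_SCATTER can be
illustrated with a serial model in which Python lists stand in for
the buffers. The function names are hypothetical and no actual
communication takes place; the sketch only shows where each item ends up.

\begin{verbatim}
```python
def gather(inbufs, nitems):
    """outbuf on to_rank: each member's nitems items, concatenated in rank order."""
    outbuf = []
    for buf in inbufs:              # inbufs[r] is rank r's inbuf
        outbuf.extend(buf[:nitems])
    return outbuf


def scatter(inbuf, nitems, gsize):
    """outbuf on each rank r: the r-th nitems-sized piece of from_rank's inbuf."""
    return [inbuf[r * nitems:(r + 1) * nitems] for r in range(gsize)]


pieces = scatter([10, 11, 20, 21, 30, 31], nitems=2, gsize=3)
print(pieces)                 # one piece per rank
print(gather(pieces, 2))      # gathering the pieces restores the original
```
\end{verbatim}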

info = MPI\_GLOBAL\_OP( inbuf, outbuf, nitems, type, tag, group, op~)

MPI\_GLOBAL\_OP performs a global operation on the inbuf and
returns the result in outbuf.
The routine is called by all group members using the same arguments
for nitems, type, tag, group, and op.
On return the outbuf of each member contains the result of 
the global operation op applied pointwise to the collective inbuf.
For example, if the op is max and inbuf contains two floating point numbers,
then outbuf(1) $=$ global max(inbuf(1)) and outbuf(2) $=$ global max(inbuf(2)).
A set of standard operations is supplied with MPI including:
\begin{itemize}
\item global max for each data type
\item global min for each data type
\item global sum for each data type
\item global mult for each data type
\item global AND for integer and logical
\item global OR for integer and logical
\item global XOR for integer and logical
\item global scalar max and who has it
\item global scalar min and who has it
\end{itemize}
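
The pointwise semantics can be shown with a serial model. The names
below are hypothetical, and the tie-breaking rule in the
"max and who has it" helper (lowest rank wins) is an assumption,
since the draft does not specify one.

\begin{verbatim}
```python
from functools import reduce


def global_op(inbufs, op):
    """Apply op pointwise across all members' inbufs; every member gets the result."""
    # zip(*inbufs) groups element i of every member's inbuf together.
    return [reduce(op, column) for column in zip(*inbufs)]


def max_and_owner(inbufs):
    """Scalar max and who has it: (value, rank). Assumption: lowest rank wins ties."""
    values = [buf[0] for buf in inbufs]
    best = max(values)
    return best, values.index(best)


inbufs = [[1.0, 8.0], [5.0, 2.0], [3.0, 4.0]]   # three members, nitems = 2
print(global_op(inbufs, max))                   # pointwise global max
print(global_op(inbufs, lambda a, b: a + b))    # pointwise global sum
print(max_and_owner(inbufs))                    # max of first element and its rank
```
\end{verbatim}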

info = MPI\_USER\_OP( inbuf, outbuf, nitems, type, tag, group, func~)

Same as the global operation function above except the user
supplies the function that is performed on each member rather
than using the standard operations.

info = MPI\_REDUCE(inbuf, outbuf, nitems, type, tag, group, to\_rank, op~)

Same as the global operation function above except only the 
to\_rank member receives the result in its outbuf. The outbuf
of all other members is unchanged.

info = MPI\_SHIFT( inbuf, outbuf, nitems, type, tag, group, steps~)

Simultaneous shift of data a given number of steps around the group, 
the simplest example
being all members sending their data to (rank+1) with wrap around.
For portability, the topology section provides routines for
defining who is a member's neighbor in a given direction and hops away.
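
A serial sketch of the shift on a group treated as a ring follows.
The function name is hypothetical, and letting a negative steps value
shift in the opposite direction is an assumption not stated in the draft.

\begin{verbatim}
```python
def shift(inbufs, steps):
    """Return outbufs where rank r's inbuf lands on rank (r + steps) mod p."""
    p = len(inbufs)
    outbufs = [None] * p
    for rank, buf in enumerate(inbufs):
        outbufs[(rank + steps) % p] = buf    # wrap around the end of the group
    return outbufs


# Four members, each sending its data to (rank+1) with wrap around.
print(shift(["a", "b", "c", "d"], 1))
```
\end{verbatim}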

info = MPI\_SCAN( inbuf, outbuf, nitems, type, tag, group, op )

MPI\_SCAN is used to perform a parallel prefix with respect to
an associative reduction operation on data distributed across the group. 
The same standard operations as found in MPI\_GLOBAL\_OP are supplied
with MPI.
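
As an illustration, a parallel prefix over one item per member can be
modeled serially as below. The draft does not say whether the scan is
inclusive or exclusive; this sketch assumes inclusive, so that rank r
receives the operation applied to the values of ranks 0 through r.

\begin{verbatim}
```python
from itertools import accumulate
import operator


def scan(values, op):
    """Inclusive prefix (an assumption): result[r] = op over values[0..r]."""
    return list(accumulate(values, op))


values = [3, 1, 4, 1, 5]             # values[r] is rank r's single item
print(scan(values, operator.add))    # running sum across ranks
print(scan(values, max))             # running max across ranks
```
\end{verbatim}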

info = MPI\_ALLCAST( inbuf, outbuf, nitems, type, tag, group )

Broadcast from all members to all members of a group.

info = MPI\_ALLSCATTER( inbuf, outbuf, nitems, type, tag, group~)

Each process sends a distinct message to each of the other
processes in the group (also known as all-to-all personalized
communication). Each process in the calling group partitions
its local buffer into N blocks of equal size, where N is
the number of processes in the group.  The ith process in
the group sends the jth block of its out buffer to the jth process,
which stores it as the ith block of its in buffer.
Therefore the ith block of the out buffer on process i is copied
locally to the ith block of its in buffer.
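
The block exchange described above amounts to a transpose of the
blocks. A minimal serial sketch, assuming each process's out buffer is
already split into N blocks (the function name allscatter is a label
for this model only):

\begin{verbatim}
```python
def allscatter(outbufs):
    """outbufs[i][j] is the block that process i sends to process j.

    Returns inbufs where inbufs[j][i] is the block process j received from i.
    """
    n = len(outbufs)
    return [[outbufs[i][j] for i in range(n)] for j in range(n)]


# Three processes; block "b2" is what process b (rank 1) sends to rank 2.
outbufs = [["a0", "a1", "a2"],
           ["b0", "b1", "b2"],
           ["c0", "c1", "c2"]]
for rank, inbuf in enumerate(allscatter(outbufs)):
    print(rank, inbuf)
```
\end{verbatim}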

\subsection{Level 1 routines}

[I suggest that the level 1 routines be deferred to MPI-2
as well as the buffer descriptor versions of point-to-point.
But if point-to-point includes bd versions then it will be
easy to include comparable versions of collective communication routines.
I like the bd version of point-to-point and collective, but I feel
it deviates too far from common practice for MPI-1.]

Level 1 routines allow the user to communicate noncontiguous messages
containing multiple data types. The present proposal is for the 
collective routines to use the same routines that are in the
point-to-point interface to create these arbitrary messages.
Not all collective operations make sense in this context.
The following functions are provided in level 1:

\begin{tabular}{l}
info = MPI\_BCASTBD( bd, tag, group, from\_rank )            \\
info = MPI\_GATHERBD( inbd, outbd, tag, group, to\_rank )    \\
info = MPI\_SCATTERBD( inbd, outbd, tag, group, from\_rank ) \\
info = MPI\_USER\_OPBD( inbd, outbd, tag, group, func )     \\
info = MPI\_SHIFTBD( inbd, outbd, tag, group, steps )       \\
info = MPI\_ALLCASTBD( inbd, outbd, tag, group )            \\
info = MPI\_ALLSCATTERBD( inbd, outbd, tag, group )         \\
\end{tabular}

The descriptions of the functions are the same as in level 2
with the exception that instead of a contiguous block of data
of the same data type each block of data is described by a
buffer descriptor for both input and output buffers.

\subsection{Nonblocking Communication}

[There was discussion at the last meeting about having nonblocking
variants of the collective communication routines.
They are not presented here because a formal proposal was never 
submitted to the collective communication subcommittee for discussion.
The proposal must explain how the routines work, how they are
used in an application preferably with an example, and if 
possible how the routines could be implemented with discussion
about message order guarantees, robustness, and cancellation.
I feel that the nonblocking routines are far too complex for MPI-1,
and should not be discussed in the present proposal.]

\end{document}
From owner-mpi-collcomm@CS.UTK.EDU  Sun Mar 14 13:59:12 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA05206; Sun, 14 Mar 93 13:59:12 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA16184; Sun, 14 Mar 93 13:58:44 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Sun, 14 Mar 1993 13:58:42 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from pnlg.pnl.gov by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA16158; Sun, 14 Mar 93 13:58:04 -0500
Received: from carbon.pnl.gov (130.20.188.38) by pnlg.pnl.gov; Sun, 14 Mar 93
 10:57 PST
Received: from sodium.pnl.gov by carbon.pnl.gov (4.1/SMI-4.1) id AA29374; Sun,
 14 Mar 93 10:55:26 PST
Received: by sodium.pnl.gov (4.1/SMI-4.0) id AA05149; Sun, 14 Mar 93 10:55:22
 PST
Date: Sun, 14 Mar 93 10:55:22 PST
From: rj_littlefield@pnlg.pnl.gov
Subject: proposal to mpi-collcomm
To: d39135@sodium.pnl.gov, geist@gstws.epm.ornl.gov, gropp@mcs.anl.gov,
        jim@meiko.co.uk, lusk@mcs.anl.gov, lyndon@epcc.ed.ac.uk,
        mpi-collcomm@cs.utk.edu, mpi-context@cs.utk.edu, ranka@top.cis.syr.edu,
        tony@Aurora.CS.MsState.Edu
Message-Id: <9303141855.AA05149@sodium.pnl.gov>
X-Envelope-To: mpi-context@cs.utk.edu, mpi-collcomm@cs.utk.edu

Al & Tony, et al.:

I am about to send to mpi-collcomm two notes regarding changes I
propose to the collective communication specification.  (One note
summarizes the changes; the other discusses the reasons for them.)

I am also sending these notes to mpi-context and friends because
they relate to other discussions going on there.

Thought you'd like to know...
--Rik

----------------------------------------------------------------------
rj_littlefield@pnl.gov               Rik Littlefield
Tel: 509-375-3927                    Pacific Northwest Lab, MS K1-87
                                     P.O.Box 999, Richland, WA  99352
From owner-mpi-collcomm@CS.UTK.EDU  Sun Mar 14 15:04:40 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA06334; Sun, 14 Mar 93 15:04:40 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA18143; Sun, 14 Mar 93 15:04:09 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Sun, 14 Mar 1993 15:04:08 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from pnlg.pnl.gov by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA18125; Sun, 14 Mar 93 15:03:46 -0500
Received: from carbon.pnl.gov (130.20.188.38) by pnlg.pnl.gov; Sun, 14 Mar 93
 12:01 PST
Received: from sodium.pnl.gov by carbon.pnl.gov (4.1/SMI-4.1) id AA29382; Sun,
 14 Mar 93 11:59:17 PST
Received: by sodium.pnl.gov (4.1/SMI-4.0) id AA05208; Sun, 14 Mar 93 11:59:13
 PST
Date: Sun, 14 Mar 93 11:59:13 PST
From: rj_littlefield@pnlg.pnl.gov
Subject: collcomm changes, summary
To: geist@gstws.epm.ornl.gov, gropp@mcs.anl.gov, jim@meiko.co.uk,
        lusk@mcs.anl.gov, lyndon@epcc.ed.ac.uk, mpi-collcomm@cs.utk.edu,
        mpi-context@cs.utk.edu, ranka@top.cis.syr.edu,
        tony@Aurora.CS.MsState.Edu
Cc: d39135@sodium.pnl.gov
Message-Id: <9303141959.AA05208@sodium.pnl.gov>
X-Envelope-To: mpi-context@cs.utk.edu, mpi-collcomm@cs.utk.edu

SUMMARY OF SUGGESTED CHANGES TO COLLECTIVE COMMUNICATION PROPOSAL

The draft proposal that Al Geist distributed several days ago
contains some features that would prevent it from being
implemented as a layer on top of MPI point-to-point facilities.

The purpose of this note is to propose changes to the group
control routines in order to permit layering, and to propose
other changes for better and more predictable performance.

A discussion of the rationale for these proposed changes 
will be distributed separately because of its length.

The main changes introduced in this note are:

. The concept of group identification is firmed up.  Most
  operations use a "group handle" that is local to the process.
  (Think of the group handle as being just the address of a
  potentially large and complex "group descriptor".)  There is
  still a "group ID" that is globally unique, but it has only a
  secondary role and can be ignored by most applications.  The
  "group name" is entirely removed from MPI-1.  (Group names are
  still anticipated in MPI-2, but upward-compatibility is
  maintained in a different way from the draft proposal.)

. A semantic restriction is introduced, that a process can access
  information about a group only if the process holds a group
  handle for it.  Group handles can be obtained in two ways: 1)
  they are produced by group formation routines, and 2) a process
  can explicitly distribute copies of its group handles to other
  processes, using new routines introduced specifically for that
  purpose.

. A cacheing mechanism is introduced, that allows modules to
  attach arbitrary information to a group descriptor in such a
  way that it can be quickly retrieved.  Cacheing facilitates the
  construction of collective communication routines that are
  "fast after the first execution in a group", no matter how the
  other group operations are implemented.

. A new group formation routine is introduced, that is less
  synchronous and more general than MPI_PARTGROUP.

Specifically, the following routines are proposed to be added or
modified:

1. Arbitrary group formation:

    newgrp_handle = MPI_FORMGROUP (grouptag,groupsize,knownmembers)

    where
     grouptag     is a user-provided integer tag, sufficiently unique
                  to disambiguate overlapping groups that might be
                  formed simultaneously (say by multiple threads).

     groupsize    is the number of members that will compose the group.

     knownmembers is a set of pid's of some or all members of the group.
                  Each member of the group must provide the same
                  set of knownmembers.

     newgrp_handle  is a group handle for the newly formed group

    This new routine must be called synchronously, but only by those
    processes forming the group.

2. Group partitioning:

    newgrp_handle = MPI_PARTGROUP (oldgrp_handle,grouptag)

    where the semantics are the same as the draft proposal except that
    the return value is now a new group handle instead of a rank.
    (The rank can be determined by a separate call to
    MPI_GETRANK(group_handle,pid) .)

3. Group disbanding:

    MPI_LVGROUP (group_handle)

    where the semantics are the same as the draft proposal except that
    MPI_LVGROUP now does not return any result.  (Since groups can now
    be formed arbitrarily, not just by partitioning, it is not obvious
    what MPI_LVGROUP could return in general.)  This routine can be
    called only by members of the group.

4. Distribution of group handles and disposition of distributed handles:

    MPI_SendGroupHandle (pid,context,tag,old_group_handle)

    new_group_handle = MPI_RecvGroupHandle (pid,context,tag)

    MPI_FreeGroupHandle (group_handle)

    (The latter routine is similar to MPI_LVGROUP except that
    it can be called only for distributed group handles.  This is
    solely for semantic clarity; a single interface routine would do.)

5. Cacheing group-specific process-local information:

    The following routines get and free keys for use with group
    cacheing.

      key = MPI_GetAttributeKey ()
      MPI_FreeAttributeKey ()

    The following routines cache and retrieve information.

      MPI_SetGroupAttribute  (grouphandle,key,value,destructor_routine)
      status = MPI_TestGroupAttribute (grouphandle,key,&value)

    where
      key         must be unique within the group
      value       is anything the size of a pointer
      destructor_routine   is an application-provided routine that
                           is called by MPI_LVGROUP, with arguments
                           being the group handle, cached key and value.

    Cached information is stripped from the new group handle
    returned by MPI_SendGroupHandle.

    In a conforming implementation, MPI_TestGroupAttribute must
    be no slower than a point-to-point communication call.
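
    As an illustration only, the cacheing interface could be modeled
    as a per-descriptor dictionary. The sketch below is in Python for
    brevity; the class and function names are hypothetical, and the
    destructor is called with (group, key, value) as described above.

```python
class CachedGroup:
    """Process-local group descriptor with an attribute cache (hypothetical model)."""

    def __init__(self):
        self._attrs = {}            # key -> (value, destructor)

    def set_attribute(self, key, value, destructor=None):
        self._attrs[key] = (value, destructor)

    def test_attribute(self, key):
        """Return (found, value); a dictionary lookup, no communication."""
        if key in self._attrs:
            return True, self._attrs[key][0]
        return False, None

    def leave(self):
        """Model MPI_LVGROUP: run each destructor, then drop the cache."""
        for key, (value, destructor) in list(self._attrs.items()):
            if destructor is not None:
                destructor(self, key, value)
        self._attrs.clear()


_next_key = 0


def get_attribute_key():
    """Hand out integer keys that are unique within this process."""
    global _next_key
    _next_key += 1
    return _next_key


freed = []
group = CachedGroup()
key = get_attribute_key()
group.set_attribute(key, {"algorithm": "nested rings"},
                    lambda g, k, v: freed.append(k))
print(group.test_attribute(key))    # found, with the cached value
group.leave()
print(freed == [key])               # the destructor ran for this key
```

    A module would typically test for its key on entry, compute and
    cache its setup information on a miss, and reuse it afterwards.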

6. Retrieving global group ID:

    global_id = MPI_GetGlobalGroupID (grouphandle)

7. Other collective communications:

   Consistently substitute "grouphandle" in place of "group".

----------------------------------------------------------------------
rj_littlefield@pnl.gov               Rik Littlefield
Tel: 509-375-3927                    Pacific Northwest Lab, MS K1-87
                                     P.O.Box 999, Richland, WA  99352
From owner-mpi-collcomm@CS.UTK.EDU  Sun Mar 14 15:50:50 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA06925; Sun, 14 Mar 93 15:50:50 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA19504; Sun, 14 Mar 93 15:50:24 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Sun, 14 Mar 1993 15:50:22 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from pnlg.pnl.gov by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA19411; Sun, 14 Mar 93 15:49:35 -0500
Received: from carbon.pnl.gov (130.20.188.38) by pnlg.pnl.gov; Sun, 14 Mar 93
 12:48 PST
Received: from sodium.pnl.gov by carbon.pnl.gov (4.1/SMI-4.1) id AA29389; Sun,
 14 Mar 93 12:46:59 PST
Received: by sodium.pnl.gov (4.1/SMI-4.0) id AA05301; Sun, 14 Mar 93 12:46:57
 PST
Date: Sun, 14 Mar 93 12:46:57 PST
From: rj_littlefield@pnlg.pnl.gov
Subject: collcomm changes, rationale
To: geist@gstws.epm.ornl.gov, gropp@mcs.anl.gov, jim@meiko.co.uk,
        lusk@mcs.anl.gov, lyndon@epcc.ed.ac.uk, mpi-collcomm@cs.utk.edu,
        mpi-context@cs.utk.edu, ranka@top.cis.syr.edu,
        tony@Aurora.CS.MsState.Edu
Cc: d39135@sodium.pnl.gov
Message-Id: <9303142046.AA05301@sodium.pnl.gov>
X-Envelope-To: mpi-context@cs.utk.edu, mpi-collcomm@cs.utk.edu

RATIONALE FOR SUGGESTED CHANGES TO COLLECTIVE COMMUNICATION PROPOSAL

In a related summary, I outlined a set of suggested changes to
the concepts and routines in the collective communication proposal.

The purpose of this note is to present the rationale for those
suggestions and to discuss possible alternatives.

The discussion is organized into 5 areas, flagged with "----- Topic #".

Entries flagged with > are from my summary of suggested changes.
Entries flagged with >>> are from the draft proposal sent out by Al Geist.

----- Topic #1: Group Identification -----

> . The concept of group identification has been firmed up.  Most
>   operations use a "group handle" that is local to the process.
>   (Think of the group handle as being just the address of a
>   potentially large and complex "group descriptor".)
>        ...
> . A semantic restriction is introduced, that a process can access
>   information about a group only if the process holds a group
>   handle for it.  Group handles can be obtained in two ways: 1)
>   they are produced by group formation routines, and 2) a process
>   can explicitly distribute copies of its group handles to other
>   processes, using new routines introduced specifically for that
>   purpose.

There are two issues here: one of being able to layer collective
communications on top of point-to-point at all, and a secondary
one of efficiency.

The more fundamental issue is layering.  Given only MPI point-
to-point functionality, how can a group identifier (whatever it
is) be transmitted between processes so as to be useful to the
receiver?

Presumably we want to allow group identifiers to be passed around so
that any process holding the group identifier can use it for purposes
like translating between (group,rank) and pid.  We also want to allow
this translation to be done asynchronously, i.e., without requiring
the explicit cooperation of any other MPI process at the time of
translation.  Since MPI pt-pt does not support asynchronous servers or
an interrupt receive capability, this implies that the group
identifier must come complete with enough information to resolve all
translations without communication.

This prompts the concept that the group identifier must be associated
with a "group descriptor" that is large and complex enough to
fully describe the group.

How is the association done?  This is a question of efficiency.  If
the identifier is allowed to be process-local, the group descriptor
can be located very quickly -- just make the identifier be a pointer
to the group descriptor.  Requiring the identifier to have global
scope would not be so good.  In that case, either the identifier has
to be carefully constructed or the association has to be done with
some sort of table search.  These issues also arise with global pid's.
However, groups can be formed much more often and in greater numbers
than processes.  I doubt that careful construction tricks could be
assured to be adequate, and if not, then a table search would be
required on each collective communication call.

The conclusion is that, for most purposes, a process-local
identifier generated by the system is preferred.  Such things are
typically called "handles", hence the term "group handle".
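
To make the handle idea concrete, here is a minimal sketch in Python
(chosen only for illustration; the class and method names are
hypothetical and the pid values are made up).  The descriptor carries
the full membership list, so both translations are local lookups
that need no communication.

```python
class GroupDescriptor:
    """Holds everything needed to translate between rank and pid locally."""

    def __init__(self, pids):
        self.pids = list(pids)                                       # rank -> pid
        self.rank_of = {pid: r for r, pid in enumerate(self.pids)}   # pid -> rank

    def getpid(self, rank):
        return self.pids[rank]

    def getrank(self, pid):
        return self.rank_of[pid]

    def gsize(self):
        return len(self.pids)


# The Python object reference plays the role of the process-local handle.
handle = GroupDescriptor(pids=[4021, 4022, 4030])
print(handle.getpid(2), handle.getrank(4022), handle.gsize())
```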

> 4. Distribution of group handles and disposition of distributed handles:
> 
>     MPI_SendGroupHandle (pid,context,tag,old_group_handle)
> 
>     new_group_handle = MPI_RecvGroupHandle (pid,context,tag)
> 
>     MPI_FreeGroupHandle  (group_handle)

The next question is how group handles should be distributed.

Implicit distribution is out because MPI pt-pt doesn't support
a server capability, and presumably we aren't willing to synchronize
all of the processes whenever somebody creates a group handle.

So, explicit distribution is required.  How do we handle it?

Two ideas that I do not like are the following.  MPI might provide
routines to translate to and from some machine- and process-
independent format, so that the translated information could be sent
using normal point-point primitives.  This strategy requires that the
user program manage the storage of indefinite-length objects, which
makes for an ugly Fortran interface.  Or, group descriptors (and their
translation routines) might be built into point-point MPI as another
data type.  This violates the spirit of layering collective
communication on point-to-point, and has the same storage management
problem.

The three routines proposed above were the cleanest interface
I could think of.

----- Topic #2: Global Group ID -----

>   There is
>   still a "group ID" that is globally unique, but it has only a
>   secondary role and can be ignored by most applications.  
>         ...
>     global_id = MPI_GetGlobalGroupID (grouphandle)

Given that we are now able (and required) to pass around copies of
group handles, it is not clear to me that MPI really needs special
support for the concept of a global group ID.  On the other hand,
it's easy to provide, since we have to construct one or more
globally unique context values for each group anyway.  So just
use the first such context value as the global ID.  This gives
something unique that all processes can agree on.  

But note that knowing just the global group ID does not let you
get other information about the group -- you have to hold a group
handle for that.

(We could add a routine that would accept the global group ID and
return a handle for that group, presuming that the process held
one.  This would be cheap to do, since group handles are managed
by MPI anyway, and I can vaguely imagine that it might help some
applications.  On the other hand, there are no similar "handle
lookup" facilities provided elsewhere in MPI, and I'm reluctant
to set that kind of precedent without clear need.)

----- Topic #3: Group Formation -----

> . A new group formation routine is introduced, that is less
>   synchronous and more general than MPI_PARTGROUP.
>        ...
> 1. Arbitrary group formation:
> 
>     newgrp_handle = MPI_FORMGROUP (grouptag,groupsize,knownmembers)
> 
>     where
>      grouptag     is a user-provided integer tag, sufficiently unique
>                   to disambiguate overlapping groups that might be
>                   formed simultaneously, say by multiple threads.
> 
>      groupsize    is the number of members that will compose the group.
> 
>      knownmembers is a set of pid's of some or all members of the group.
>                   Each member of the group must provide the same
>                   set of knownmembers.
> 
>      newgrp_handle     is a group handle for the newly formed group
> 
>     This new routine must be called synchronously, but only by those
>     processes forming the group.  

The draft proposal distributed by Al Geist says that

>>> A group is identified by a group name that is supplied by the user.

A group name by itself is not enough to allow implementing groups
as a layer on top of point-to-point, unless we impose
restrictions that I think would not be acceptable.

The problem is: how does a group-forming routine know whom it
should send messages to, in order to form the group?

MPI_PARTGROUP does not have a problem with this, because it has
to be called synchronously by all members of the group.  Since
each current member of the group holds a handle (descriptor) for
that group, it is easy for each member to figure out who talks to
whom.

Unfortunately, there are some important application designs that
I do not see how to implement with just MPI_PARTGROUP.

For example, I am now doing an application that uses a
master-slaves strategy to asynchronously parcel out chunks of
work, with each chunk being done by several processes working
collaboratively.  Collective communication between those
processes is required, so it seems natural to organize them into
MPI groups.  Using a synchronous group partitioning routine
would introduce a risk of load imbalance, because the varying
chunk size implies that groups can finish their work at
different times, and synchronous partitioning would delay their
reassignment.

Applications like this could benefit from a group formation
routine that is called synchronously, but only by those
processes forming the group -- hence MPI_FORMGROUP.

This type of routine does have the problem of identifying its
collaborators, and the only solution I can think of is to
tell it.  That's what the knownmembers argument is for.

I have specified knownmembers in terms of pid's because I assume
that point-to-point communication based on pid's is always fast
and unrestricted.  If knownmembers were based on (group,rank)
pairs, then per the discussion above, all processes making this
call would have to hold handles (descriptors) for the referenced
groups.  This seems to me to be more trouble than it's worth, but
others may disagree.

Another comment about efficiency...  The size of the knownmembers
set affects the efficiency of group formation.  At one extreme,
only one member is required to be known.  This is scalable in a
memory sense, but not in a time sense, because it implies O(P)
group formation time for a group of P processes.  At the other
extreme, all members can be specified.  This is not scalable in a
memory sense, but allows guaranteed O(log P) formation time.
Other tradeoffs are possible, such as O(sqrt P) knownmembers and
O(sqrt P) formation time.  The interface as specified allows
each application to choose the type of scalability it wants.
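
The three tradeoff points can be tabulated for a concrete group size.
This is a rough sketch that ignores constant factors: the function
name is hypothetical, and the "time" column is just the order of
growth quoted above evaluated at P, not a measured cost.

```python
import math


def formation_tradeoffs(p):
    """Return (label, knownmembers size, formation-time order) for a group of p."""
    root = math.isqrt(p)
    return [
        ("one known member", 1, p),                          # O(P) time
        ("sqrt(P) known members", root, root),               # O(sqrt P) time
        ("all members known", p, math.ceil(math.log2(p))),   # O(log P) time
    ]


for label, memory, time_order in formation_tradeoffs(1024):
    print(f"{label:24s} knownmembers={memory:5d} time~{time_order}")
```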

----- Topic #4: Group Names -----

>   ...  The
>   "group name" is entirely removed from MPI-1.  (Group names are
>   still anticipated in MPI-2, but upward-compatibility is
>   maintained in a different way from the draft proposal.)

The draft distributed by Al Geist states:

>>> To allow for future extensibility of the group concept
>>> the present draft specifies that groups be named. 

Requiring names has the drawback that 1) it burdens the user with
at least the appearance of having to create unique names, in
order to be upward-compatible with dynamic groups, even though 2)
in a layered MPI-1, there is no way in general to check global
uniqueness, and thus programs can work fine with non-unique names.

This combination strikes me as actually impeding upward-
compatibility.  The tendency will be for programmers to use
non-unique names because it works and it's easy.  But such programs
would break when MPI-2 came along and started actually using
the names for something.  I don't like encouraging people to
write programs that are going to break.

I do support upward compatibility.  However, rather than requiring
names in MPI-1, I propose that they be deferred entirely to
MPI-2, at which point they can be supported either just through
MPI_JOINGROUP (as an alternative to MPI_FORMGROUP) or via
additional routines to attach globally unique names to groups
that have already been formed via MPI_FORMGROUP.

----- Topic #5: Cacheing -----

> 5. Cacheing group-specific process-local information:
> 
>     The following routines get and free keys for use with group
>     cacheing.
> 
>       key = MPI_GetAttributeKey ()
>       MPI_FreeAttributeKey ()
> 
>     The following routines cache and retrieve information.
> 
>       MPI_SetGroupAttribute  (grouphandle,key,value,destructor_routine)
>       MPI_TestGroupAttribute (grouphandle,key,&value)
> 
>     where
>       key         must be unique within the group
>       value       is anything the size of a pointer
>       destructor_routine   is an application-provided routine that
>                            is called by MPI_LVGROUP, with arguments
>                            being the group handle, cached key and value.
> 
>     Cached information is stripped from the new group handle
>     returned by MPI_SendGroupHandle.
> 
>     In a conforming implementation, MPI_TestGroupAttribute must
>     be no slower than a point-to-point communication call.

This feature is purely for efficiency, but I think it's so valuable,
cheap, and clean that something like it has to go in.

One feature of collective communication is that the fastest
algorithm for any particular job usually depends on the machine
topology, which processes belong to the group, and the amount of
data being manipulated.  For example, global combine of L data
elements across P = RC processes on a 2-D RxC mesh can be done in
O(L log(P)) time using a fanin/fanout algorithm, or in O(L + sqrt(P))
time using a nested rings algorithm.  The former is better for
small L, the latter for big L, and using the wrong one can easily
cost a factor of 3 in execution time.
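
To make the tradeoff concrete, here is a toy cost model in C.  It is
only a sketch: the startup cost alpha, the per-element cost beta, the
constant factors, and all function names are my assumptions for
illustration, not measurements and not part of any MPI proposal.

```c
#include <assert.h>

/* Hypothetical cost model (assumed, not from the draft):
   alpha = startup cost per message, beta = cost per data element. */

/* Smallest k with 2^k >= p. */
static unsigned ceil_log2(unsigned p)
{
    unsigned k = 0;
    while ((1u << k) < p)
        k++;
    return k;
}

/* Smallest s with s*s >= p. */
static unsigned ceil_sqrt(unsigned p)
{
    unsigned s = 0;
    while (s * s < p)
        s++;
    return s;
}

/* Fanin then fanout: 2*log2(P) stages, each moving all L elements. */
double fanin_fanout_cost(unsigned p, unsigned l, double alpha, double beta)
{
    return 2.0 * ceil_log2(p) * (alpha + beta * l);
}

/* Nested rings: about 4*(sqrt(P)-1) startups, about two passes over L. */
double nested_rings_cost(unsigned p, unsigned l, double alpha, double beta)
{
    return 4.0 * (ceil_sqrt(p) - 1) * alpha + 2.0 * beta * l;
}

/* Returns 1 if nested rings is predicted to be cheaper, 0 otherwise. */
int prefer_nested_rings(unsigned p, unsigned l, double alpha, double beta)
{
    assert(p >= 1 && p <= 65536);   /* keep the helper loops well defined */
    return nested_rings_cost(p, l, alpha, beta)
         < fanin_fanout_cost(p, l, alpha, beta);
}
```

With P = 64, alpha = 100 and beta = 1, this model picks fanin/fanout
for L = 10 and nested rings for L = 1000.  The crossover point depends
entirely on the assumed constants.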

So, there is strong motivation to write collective communication
routines that are adaptive in the sense of figuring out which
algorithm is best.  The problem is that it can take quite a lot
of time to make the decision, starting from a scratch position
of not even knowing which processes belong to the group.  It's
going to take lots of calls to the inquiry routines to get that
information, and then some more cycles to make the proper decisions.

Obviously it would be profitable to cache the information and/or
decisions.  The question is, where?  

It is tempting to say that the collective communication routine
could or should keep its own cache, indexed by group handle
and/or global group ID.  The problem is, groups are dynamic in
the sense of being formed and disbanded, so that unless group IDs
can get very large, eventually they will have to be reused.  Now,
it wouldn't do to have a collective communication routine use
stale cached information, so if the collective communication
routine is keeping its own cache, then it needs to be notified of
the reuse so that it can release the cached stuff.
Alternatively, perhaps the cached information could be
automatically released.  (Either strategy guarantees immediate
release of cached info when the group handle/descriptor is
released.  I presume we want to do that, to avoid getting into
the morass of garbage collection.)

The method proposed here can be thought of as implementing both
strategies.  The idea is that the routines that free group
handles (and the associated descriptors) loop through the cached
information, calling an application-provided destructor routine
for each piece of cached information.  Typically, the cached
information will be a pointer to a hunk of memory managed by the
collective communication routine, which the destructor will free in
whatever way it has to.  Upon return from the destructor, the
group-freeing routine will release the little piece of memory
holding the pointer, and everything will be cleaned up.

If that group handle/descriptor is ever reused, it will be
reinitialized to indicate no cached information, and
MPI_TestGroupAttribute will return "not found".
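
For readers who want to see the mechanism itself, here is a minimal
single-process sketch in C of a per-group attribute table with
destructors.  Everything named toy_* is hypothetical and exists only
for this illustration; a real implementation would live inside the
group descriptor and would not use a fixed-size table.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ATTRS 8            /* arbitrary table size for this sketch */

typedef void (*toy_destructor)(void *group, int key, void *value);

struct toy_attr {
    int key;                       /* 0 marks an empty slot */
    void *value;
    toy_destructor destroy;
};

struct toy_group {
    struct toy_attr attrs[TOY_MAX_ATTRS];
};

int toy_next_key = 1;
int toy_destroyed = 0;             /* counts destructor calls, for the demo */

/* Hand out a fresh nonzero key. */
int toy_get_key(void) { return toy_next_key++; }

/* Cache value under key.  Returns 1 on success, 0 if the table is full.
   An existing entry with the same key is overwritten without cleanup. */
int toy_set_attr(struct toy_group *g, int key, void *value, toy_destructor d)
{
    int i;
    assert(key != 0);
    for (i = 0; i < TOY_MAX_ATTRS; i++) {
        if (g->attrs[i].key == 0 || g->attrs[i].key == key) {
            g->attrs[i].key = key;
            g->attrs[i].value = value;
            g->attrs[i].destroy = d;
            return 1;
        }
    }
    return 0;
}

/* Look up key.  Returns 1 and stores the value if found, else 0. */
int toy_test_attr(const struct toy_group *g, int key, void **value)
{
    int i;
    for (i = 0; i < TOY_MAX_ATTRS; i++) {
        if (key != 0 && g->attrs[i].key == key) {
            *value = g->attrs[i].value;
            return 1;
        }
    }
    return 0;
}

/* Free the group: run every destructor, then reset the table so that a
   reused descriptor starts with nothing cached. */
void toy_free_group(struct toy_group *g)
{
    int i;
    for (i = 0; i < TOY_MAX_ATTRS; i++) {
        if (g->attrs[i].key != 0 && g->attrs[i].destroy != NULL)
            g->attrs[i].destroy(g, g->attrs[i].key, g->attrs[i].value);
        g->attrs[i].key = 0;
        g->attrs[i].value = NULL;
        g->attrs[i].destroy = NULL;
    }
}

/* Example destructor: a real one would free the cached storage. */
void toy_count_destructor(void *group, int key, void *value)
{
    (void)group; (void)key; (void)value;
    toy_destroyed++;
}
```

The lookup is a short linear scan over a small table, which is one way
to keep the test operation cheap relative to a point-to-point call.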

An efficient-after-first-call group-global operation using 
cacheing might look like this:

   static int gop_key_assigned = 0;    /* 0 only on first entry */
   static MPI_key_type gop_key;        /* key for this module's stuff */

   efficient_global_op (grphandle, ...)
   struct group_descriptor_type *grphandle;
   {
     struct gop_stuff_type *gop_stuff;   /* whatever we need */

     if (!gop_key_assigned)     /* get a key on first call ever */
     { gop_key_assigned = 1;
       if ( ! (gop_key = MPI_GetAttributeKey()) ) {
         MPI_abort ("Insufficient keys available");
       }
     }

     if (MPI_TestGroupAttribute (grphandle,gop_key,&gop_stuff))
     { /* This module has executed in this group before.
          We will use the cached information */
     }
     else
     { /* This is a group that we have not yet cached anything in.
          We will now do so.
        */

       gop_stuff = (struct gop_stuff_type *)
                   malloc (sizeof (struct gop_stuff_type));

       /* ... fill in *gop_stuff with whatever we want ... */

       MPI_SetGroupAttribute (grphandle, gop_key, gop_stuff, 
                              gop_stuff_destructor);
     }

     /* ... use contents of *gop_stuff to do the global op ... */
     
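   }

   /* Called by MPI when the group handle is freed.  Per the proposal
      quoted above, the arguments are the group handle, the cached key,
      and the cached value. */
   gop_stuff_destructor (grphandle, key, gop_stuff)
   struct group_descriptor_type *grphandle;
   MPI_key_type key;
   struct gop_stuff_type *gop_stuff;
   {
     /* ... free storage pointed to by gop_stuff ... */
   }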


----------------------------------------------------------------------
rj_littlefield@pnl.gov               Rik Littlefield
Tel: 509-375-3927                    Pacific Northwest Lab, MS K1-87
                                     P.O.Box 999, Richland, WA  99352
From owner-mpi-collcomm@CS.UTK.EDU  Tue Mar 16 04:54:47 1993
Date: Tue, 16 Mar 1993 10:53:37 GMT
From: Rolf.Hempel@gmd.de
Message-Id: <9303161053.AA15815@f1neuman.gmd.de>
To: mpi-collcomm@cs.utk.edu, mpi-ptop@cs.utk.edu
Subject: Al's COLLCOMM proposal
Cc: gmap10@f1neuman.gmd.de


I would like to comment on Al's Collective Communications draft which
he sent out a few days ago. First of all, I agree with Al on most
points, especially that we should not attempt to include everything
in MPI-1. Given the limited time available, it seems to me a good
idea to leave the level-1 routines and all dynamic stuff for MPI-2.

In section 2 Al says that "each group has a topology associated with
it". As far as I know this is still an open issue. Do we agree that
there is always a default topology (like a ring, to make the shift
operation meaningful in all cases, or fully connected)? Otherwise a
topology is an optional attribute which a group may or may not have.

Another question then is how this assignment is done. At the last
Dallas meeting we discussed two basic options:
1. A topology is defined after the creation of the group. The topology
   thus is an attribute which is assigned to the group, and which can
   be overwritten without creating a new group.
2. A topology definition always creates a new group (or even two of
   them, the second one being the collection of processes which are not
   used by the topology). The advantage of this choice is that the
   rank of a process within a group never changes. When a group with
   topology is created, the processes can be arranged in the optimal
   way from the very beginning.

Personally, I prefer the second option. One additional advantage is the
following: assume that the original group has 10 processes, and then
a (3,3) grid topology is defined. Does a global operation on this group
include all 10 processes? If the (3,3) grid formation creates a new
subgroup of 9 processes, the answer is clear.

The draft is not consistent in the relationship of groups and
topologies. In the Introduction it says "Giving or forming groups with
topological features is presented in section 4" which suggests option
2. above. On the other hand, under Group Functions it states that
"Existing groups including the ALL group can switch their associated
topology", which sounds like option 1. Do we all agree on choosing
option 2?

On page 2 the draft states that "The collective communication routines
are implemented in terms of the topology associated with a given group.
Although the function would be the same, a broadcast in a group with
a ring topology could be implemented differently from a broadcast in
a group with hypercube topology". I see a confusion of application
and machine topologies here. The optimal implementation of a broadcast
is guided by the machine topology, which could be a hypercube. Even
if the logical group topology is a mesh, the global operation would
follow the hypercube structure. However, this implementation detail
is completely invisible to the user and should not be part of the
standard. The only thing the user sees is the mesh topology and the
result of the broadcast.

The proposed MPI_SHIFT function could be made much more useful by
adding another argument. Here's my proposal:

 Info = MPI_SHIFT(inbuf,outbuf,nitems,type,tag,group,direction,steps)

The additional integer argument "direction" selects the coordinate
direction in the group topology, and "steps" is the number of steps
in that direction. In the case of cartesian structures the meaning is
immediately clear. One could apply the function also in the case of a
general graph. In this case "direction" would specify the neighbor
number. "steps" could either be ignored, or we could define a
transitive scheme of the kind "neighbor of neighbor of neighbor ...",
with the indirection depth being specified by "steps".
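
For the cartesian case, the partner of a process can be computed as in
the following C sketch.  It is an illustration under my own
assumptions, not part of the proposal: ranks are taken to be numbered
with the last coordinate varying fastest, every direction is treated
as periodic, and the name toy_cart_shift_dest is hypothetical.

```c
#include <assert.h>

#define TOY_MAX_DIMS 8             /* arbitrary limit for this sketch */

/* Nonnegative remainder: C's % can return a negative value. */
static int wrap(int a, int n) { return ((a % n) + n) % n; }

/* Rank reached from `rank` by moving `steps` positions along coordinate
   `direction` of a periodic grid with extents dims[0..ndims-1]. */
int toy_cart_shift_dest(const int *dims, int ndims, int rank,
                        int direction, int steps)
{
    int coords[TOY_MAX_DIMS];
    int i, r = rank, dest = 0;

    assert(ndims >= 1 && ndims <= TOY_MAX_DIMS);
    assert(direction >= 0 && direction < ndims);

    /* Decode the rank into coordinates, last dimension fastest. */
    for (i = ndims - 1; i >= 0; i--) {
        coords[i] = r % dims[i];
        r /= dims[i];
    }

    coords[direction] = wrap(coords[direction] + steps, dims[direction]);

    /* Encode the shifted coordinates back into a rank. */
    for (i = 0; i < ndims; i++)
        dest = dest * dims[i] + coords[i];
    return dest;
}
```

On a (3,3) grid, rank 4 sits at coordinates (1,1); one step in
direction 0 leads to rank 7, and one step in direction 1 leads to
rank 5.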

A hot topic for further discussions will be the "group names" proposed
by Al. I see his point, but I don't see how the user-supplied group
name solves the problem which arises if a new process wants to join
a group. Even if the user tells MPI the global name of the group,
global knowledge of all groups in the system is required to find the
other group members to talk to. I agree with most points in Rik
Littlefield's comments. The only thing which does not convince me yet
is the explicit caching mechanism. If the information caching is
handled consistently between the group management and collective
communication routines (in order to avoid usage of stale group
information), I still hope that it could be done without showing up
at the user interface.

As a last point, I would like to forward the following note by
Tom Henderson:

> Rolf,
> 
> Would it be a good idea to merge the mpi-collcomm and mpi-ptop
> mailing lists? It seems like lots of stuff on that mailing list
> now is closely related to process topology. I suppose the
> mpi-collcomm stuff could just be forwarded to the mpi-ptop list.  
> 
> Tom

I agree. What do others think?

Rolf
From owner-mpi-collcomm@CS.UTK.EDU  Tue Mar 16 08:15:58 1993
Date: Tue, 16 Mar 1993 08:14:54 -0500
From: geist@gstws.epm.ornl.gov (Al Geist)
Message-Id: <9303161314.AA12817@gstws.epm.ornl.gov>
To: mpi-collcomm@cs.utk.edu
Subject: Revised Collective Draft - consistent with p2p draft.



Hi Folks,

I had written the first collective communication draft before
seeing the latest point-to-point draft from Marc. The two need
to be consistent in MPI. Marc has revised my first draft to be
consistent with the point-to-point section and has clarified
the section considerably. Many thanks Marc.

The new collective communication draft is attached below;
this should be the focus of our discussion in this subcommittee.
The major changes:
1. The context/group management routines are now a part of the p2p section
   rather than the collcomm section. The routines are repeated
   in the following draft for completeness.

2. The buffer descriptor versions of the collective routines are
   described in much more detail in the new draft and form the core
   of the collective communication routines.

Rik has also sent both collective and p2p committees an alternate
proposal for managing context/group. Other comments are welcome.

Al Geist

--------------------------- Draft follows ---------------------------


\documentstyle[12pt]{article}


\newcommand{\discuss}[1]{
\ \\ \ \\ {\small {\bf Discussion:} #1} \ \\ \ \\
}

\newcommand{\missing}[1]{
\ \\ \ \\ {\small {\bf Missing:} #1} \\ \ \\
}

\begin{document}

\title{ Collective Communication}


\author{Al Geist \\ Marc Snir}
\maketitle

\section{Collective Communication}
\subsection{Introduction}

This section is a draft of the current proposal for collective communication.
Collective communication is defined to be communication that involves
a group of processes.  Examples are broadcast and global sum.
A collective operation is executed by having all processes in the group call the
communication routine, with matching parameters.
Routines can (but are not required to) return as soon as their
participation in the collective communication is complete.  The completion
of a call indicates that the caller is now free to access the locations in the
communication buffer, or any other location that can be referenced by the
collective operation.  However, it does not indicate that other processes in
the group have started the operation (unless otherwise indicated in the
description of the operation).   Nevertheless, the successful completion of
a collective communication call may depend on the execution of a matching call
at all processes in the group.

The syntax and semantics of the collective operations are
defined so as to be consistent with the syntax and semantics of the
point-to-point operations.

The reader is referred to the point-to-point communication section of the current
MPI draft for information concerning groups (aka contexts) and group formation
operations, and for general information on types of objects used by the MPI
library.

The collective communication routines are built above the point-to-point
routines.  While vendors may optimize certain collective routines for
their architectures, a complete library of the collective communication
routines can be written entirely using point-to-point communication
functions.  We are using naive implementations of the collective calls in terms
of point-to-point operations in order to provide an operational definition of
their semantics.

The following communication functions are proposed.
\begin{itemize}
\item
Broadcast from one member to all members of a group.
\item
Barrier across all group members.
\item
Gather data from all group members to one member.
\item
Scatter data from one member to all members of a group.
\item
Global operations such as sum, max, min, etc., where the result
is known by all group members, and a variation where the result is
known by only one member. This includes the ability to have
user-defined global operations.
\item
Simultaneous shift of data around the group, the simplest example
being all members sending their data to (rank+1) with wrap around.
\item
Scan across all members of a group (also called parallel prefix).
\item
Broadcast from all members to all members of a group.
\item
Scatter data from all members to all members of a group
(also called complete exchange or index).
\end{itemize}

To simplify the collective communication interface it is
designed with two layers. The low level routines have all the
generality of, and make use of, the buffer descriptor routines
of the point-to-point section which allows arbitrarily complex
messages to be constructed. The second level routines are
similar to the upper level point-to-point routines in that they send
only a contiguous buffer.

\missing {

The current draft does not include the nonblocking collective communication
calls that were discussed at the last meeting.
}


\subsection{Group Functions}

The point-to-point document discusses the use of groups (aka contexts), and
describes the operations available for the creation and manipulation of
groups and group objects. For the sake of completeness, we list
them anew here.


{\bf \ \\ MPI\_CREATE(handle, type, persistence)} \\
Create new opaque object
\begin{description}
\item[OUT handle] handle to object
\item[IN type] state value that identifies the type of object to be created
\item[IN persistence] state value; either {\tt MPI\_PERSISTENT} or {\tt
MPI\_EPHEMERAL}.
\end{description}

{\bf \ \\ MPI\_FREE(handle)} \\
Destroy object associated with handle.
\begin{description}
\item[IN handle] handle to object
\end{description}


{\bf \ \\ MPI\_ASSOCIATED(handle, type)}  \\
Returns the type of the object the handle is currently associated with, if
such exists.  Returns the special type {\tt MPI\_NULL} if the handle is
not currently associated with any object.
\begin{description}
\item[IN handle] handle to object
\item[OUT type] state
\end{description}


{\bf \ \\ MPI\_COPY\_CONTEXT(newcontext, context)}  \\

Create a new context that includes all processes in the old context.
The rank of the processes in the previous context is preserved.  The call must
be executed by all processes in the old context.  It is a blocking call:  No
call returns until all processes have called the function.
\begin{description}
\item[OUT newcontext]  handle to newly created context.  The handle should not
be associated with an object before the call.
\item[IN context] handle to old context
\end{description}

{\bf \ \\ MPI\_NEW\_CONTEXT(newcontext, context, key, index)} \\
A new context is created for
each distinct value of {\tt key}; this context is shared by all processes that
made the call with this key value.  Within each new context the processes are
ranked according to the order of the {\tt index} values they provided; in case
of ties, processes are ranked according to their rank in the old context.
This call is blocking:  No call returns until all processes in the old context
have executed the call.
\begin{description}
\item[OUT newcontext] handle to newly created context at calling process.   This
handle should not be associated with an object before the call.
\item[IN context] handle to old context
\item[IN key] integer
\item[IN index] integer
\end{description}
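
The ranking rule can be illustrated with a small sketch in C.  It is not
part of the proposal: the function {\tt toy\_new\_rank} and its array
arguments are hypothetical, and the sketch assumes that the {\tt key} and
{\tt index} values of all processes in the old context are available in one
place, indexed by old rank.
\begin{verbatim}
```c
#include <assert.h>

/* For the process with old rank `me`, compute its rank in the new
   context: the number of processes with the same key that precede it
   when ordered by (index, old rank). */
int toy_new_rank(const int *key, const int *index, int nprocs, int me)
{
    int r, new_rank = 0;

    assert(me >= 0 && me < nprocs);
    for (r = 0; r < nprocs; r++) {
        if (key[r] != key[me])
            continue;              /* lands in a different new context */
        if (index[r] < index[me] || (index[r] == index[me] && r < me))
            new_rank++;            /* r is ranked ahead of me */
    }
    return new_rank;
}
```
\end{verbatim}
For example, with keys (0,1,0,1,0) and index values (5,2,5,1,3), the
processes with key 0 are old ranks 0, 2 and 4; old rank 4 has the
smallest index and becomes rank 0, and the tie between old ranks 0 and 2
is broken in favor of old rank 0.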

{\bf \ \\ MPI\_RANK(rank, context)} \\
Return the rank of the calling process within the specified context.
\begin{description}
\item[OUT rank] integer
\item[IN context] context handle
\end{description}


{\bf \ \\ MPI\_SIZE(size, context)} \\
Return the number of processes that belong to the specified context.
\begin{description}
\item[OUT size] integer
\item[IN context] context handle
\end{description}

\paragraph*{Extensions}
Possible extensions for dynamic process spawning (MPI2):

{\bf \ \\ MPI\_PROCESS(process, context, rank)} \\
Returns a handle to
the process identified by the {\tt rank} and {\tt context} parameters.
\begin{description}
\item[OUT process] handle to process object
\item[IN context] handle to context object
\item[IN rank] integer
\end{description}

{\bf \ \\ MPI\_CREATE\_CONTEXT(newcontext, list\_of\_process\_handles)} \\
Creates a new context out of an explicit list of members
and ranks them in their order of occurrence in the list.
\begin{description}
\item[OUT newcontext] handle to newly created context.  Handle should not
be associated with an object before the call.
\item[IN list\_of\_process\_handles]
List of handles to processes to be included in new group.
\end{description}

This, coupled with a mechanism for requesting the
spawning of new processes for the computation, will make it possible to create a new
all-inclusive context that includes the additional processes.


\subsection{Communication Functions}

The proposed communication functions are divided into two layers.
The lowest level uses the same buffer descriptor objects
available in point-to-point to create noncontiguous, multiple data type
messages. The second level is similar to the block send/receive
point-to-point operations in that it supports only contiguous buffers of
arithmetic storage units.   For each communication operation, we list these two
levels of calls together.


\subsubsection{Synchronization}

\paragraph*{Barrier synchronization}

{\bf \ \\ MPI\_BARRIER( group, tag )} \\

MPI\_BARRIER blocks the calling process until all group members have called
it; the call returns at any process only after all group members have
entered the call.
\begin{description}
\item[IN group] group handle
\item[IN tag] communication tag (integer)
\end{description}

{\tt \ \\ MPI\_BARRIER( group, tag )}  \\ is
\begin{verbatim}
MPI_CREATE(buffer_handle, MPI_BUFFER, MPI_PERSISTENT);
MPI_SIZE( &size, group);
MPI_RANK( &rank, group);
if (rank==0)
{
   for (i=1; i < size; i++)
      MPI_RECV(buffer_handle, i, tag, group);
   for (i=1; i < size; i++)
      MPI_SEND(buffer_handle, i, tag, group);
}
else
{
   MPI_SEND(buffer_handle, 0, tag, group);
   MPI_RECV(buffer_handle, 0, tag, group);
}
MPI_FREE(buffer_handle);
\end{verbatim}

\subsubsection{Data move functions}

\paragraph*{Circular shift}

{\bf \ \\ MPI\_CSHIFT( inbuf, outbuf, tag, group, shift)} \\

Process with rank {\tt i} sends the data in its input buffer to
process with rank $\tt (i+ shift) \bmod  group\_size$, who receives the
data in its output buffer. All processes make the call with the same values for
{\tt tag, group}, and {\tt shift}.  The {\tt shift} value can be positive, zero,
or negative.

\begin{description}
\item[IN inbuf] handle to input buffer descriptor
\item[OUT outbuf] handle to output buffer descriptor
\item[IN tag] operation tag (integer)
\item[IN group] handle to group
\item[IN shift] integer
\end{description}


{\bf \ \\ MPI\_CSHIFTB( inbuf, outbuf, len, tag, group, shift)} \\

Behaves like {\tt MPI\_CSHIFT}, with buffers restricted to be blocks of
numeric units.
All processes make the call with the same values for
{\tt len, tag, group}, and {\tt shift}.
\begin{description}
\item[IN inbuf] initial location of input buffer
\item[OUT outbuf] initial location of output buffer
\item[IN len] number of entries in input (and output) buffers
\item[IN tag] operation tag (integer)
\item[IN group] handle to group
\item[IN shift] integer
\end{description}


{\tt \ \\ MPI\_CSHIFT( inbuf, outbuf, tag, group, shift)} \\ is
\begin{verbatim}
MPI_SIZE( &size, group);
MPI_RANK( &rank, group);
MPI_ISEND( handle, inbuf, mod(rank+shift, size), tag, group);
MPI_RECV( outbuf, mod(rank-shift,size), tag, group);
MPI_WAIT(handle);
\end{verbatim}
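
One detail of the code above needs care in C: the {\tt \%} operator can
return a negative result when its left operand is negative, so {\tt mod}
has to be a true nonnegative remainder, in particular for
{\tt rank-shift} and for negative {\tt shift} values.  A minimal sketch
follows; the helper names are hypothetical and not part of the draft.
\begin{verbatim}
```c
#include <assert.h>

/* Nonnegative remainder of a divided by n, for n > 0. */
int toy_mod(int a, int n)
{
    int r;

    assert(n > 0);
    r = a % n;                     /* may be negative when a < 0 */
    return r < 0 ? r + n : r;
}

/* Rank that the caller sends to in a circular shift. */
int toy_cshift_dest(int rank, int shift, int size)
{
    return toy_mod(rank + shift, size);
}

/* Rank that the caller receives from in a circular shift. */
int toy_cshift_source(int rank, int shift, int size)
{
    return toy_mod(rank - shift, size);
}
```
\end{verbatim}
In a group of 4 processes with {\tt shift = 1}, rank 0 sends to rank 1 and
receives from rank 3.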

\discuss{
Do we want to support the case {\tt inbuf = outbuf} somehow?
}

\paragraph*{End-off shift}

{\bf \ \\ MPI\_EOSHIFT( inbuf, outbuf, tag, group, shift)} \\

Process with rank {\tt i}, $\tt \max( 0, -shift) \le i < \min( size, size -
shift)$, sends the data
in its input buffer to process with rank {\tt i+ shift}, who receives the data
in its output buffer.   The output buffer of processes which do not receive
data is left unchanged.   All processes
make the call with the same values for {\tt tag, group}, and {\tt shift}.

\begin{description}
\item[IN inbuf] handle to input buffer descriptor
\item[OUT outbuf] handle to output buffer descriptor
\item[IN tag] operation tag (integer)
\item[IN group] handle to group
\item[IN shift] integer
\end{description}
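
The rank range above decides which processes take part in the transfer.
The following C sketch spells the test out; the function names are
hypothetical and only illustrate the definition.
\begin{verbatim}
```c
#include <assert.h>

/* Returns 1 if `rank` sends in an end-off shift, that is, if
   max(0, -shift) <= rank < min(size, size - shift). */
int toy_eoshift_sends(int rank, int shift, int size)
{
    int lo = shift < 0 ? -shift : 0;            /* max(0, -shift) */
    int hi = shift > 0 ? size - shift : size;   /* min(size, size - shift) */

    return rank >= lo && rank < hi;
}

/* Returns 1 if `rank` receives, that is, if rank - shift is a valid
   rank.  Assumes 0 <= rank < size. */
int toy_eoshift_receives(int rank, int shift, int size)
{
    int src = rank - shift;

    assert(rank >= 0 && rank < size);
    return src >= 0 && src < size;
}
```
\end{verbatim}
With 4 processes and {\tt shift = 1}, ranks 0 to 2 send and ranks 1 to 3
receive; the output buffer of rank 0 is left unchanged.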


{\bf \ \\ MPI\_EOSHIFTB( inbuf, outbuf, len, tag, group, shift)} \\

Behaves like {\tt MPI\_EOSHIFT}, with buffers restricted to be blocks of
numeric units.
All processes make the call with the same values for
{\tt len, tag, group}, and {\tt shift}.
\begin{description}
\item[IN inbuf] initial location of input buffer
\item[OUT outbuf] initial location of output buffer
\item[IN len] number of entries in input (and output) buffers
\item[IN tag] operation tag (integer)
\item[IN group] handle to group
\item[IN shift] integer
\end{description}

\discuss{

Two other possible definitions for end-off shift: (i) zero filling for processes
that don't receive messages, or (ii) boundary values explicitly provided as an
additional parameter.  Any preferences?
(Fortran 90 allows boundary values to be provided optionally, and zero-fills
if none are provided.)

}

\paragraph*{Broadcast}

{\bf \ \\  MPI\_BCAST( buffer\_handle, tag, group, root )} \\

{\tt MPI\_BCAST} broadcasts a message from the process with rank {\tt root} to
all other processes
of the group. It is called by all members of the group using the same arguments for
{\tt tag, group}, and {\tt root}.
On return, the contents of the buffer of the process with rank {\tt root}
are contained in the buffers of all group members.
\begin{description}
\item[INOUT buffer\_handle]  Handle for the buffer from which the message is
sent or into which it is received.
\item[IN tag] tag of communication operation (integer)
\item[IN group] context of communication (handle)
\item[IN root] rank of broadcast root (integer)
\end{description}


{\bf \ \\  MPI\_BCASTB( buf, len, tag, group, root )} \\

{\tt MPI\_BCASTB} behaves like broadcast, restricted to a block buffer.
It is called by all processes with the same arguments for {\tt len, tag, group}
and {\tt root}.
\begin{description}
\item[INOUT buf]  Starting address of buffer (choice type)
\item[IN len] Number of words in buffer (integer)
\item[IN tag] tag of communication operation (integer)
\item[IN group] context of communication (handle)
\item[IN root] rank of broadcast root (integer)
\end{description}


{\tt \ \\  MPI\_BCAST( buffer\_handle, tag, group, root )} \\
is
\begin{verbatim}
MPI_SIZE( &size, group);
MPI_RANK( &rank, group);
MPI_IRECV(handle, buffer_handle, root, tag, group);
if (rank==root)
   for (i=0; i < size; i++)
      MPI_SEND(buffer_handle, i, tag, group);
MPI_WAIT(handle);
\end{verbatim}

\paragraph*{Gather}

{\bf \ \\ MPI\_GATHER( inbuf, outbuf, tag, group, root, len) } \\

Each process (including the root process) sends the content of its input
buffer to the root process.  The root process concatenates all the
incoming messages in the order of the senders' rank and places the
results in its output buffer.
It is called by all members of group using the same arguments for
{\tt tag, group}, and {\tt root}.   The input buffers of different processes may
have different lengths.
\begin{description}
\item[IN inbuf] handle to input buffer descriptor
\item[OUT outbuf] handle to output buffer descriptor -- significant only at root
(choice)
\item[IN tag] operation tag (integer)
\item[IN group] group handle
\item[IN root] rank of receiving process (integer)
\item[OUT len] difference between output buffer size (in bytes) and
number of bytes received.
\end{description}
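
What the root ends up with can be pictured by the following C sketch,
which concatenates contributions of different lengths in rank order
within one address space.  The function {\tt toy\_gather} is hypothetical;
in particular, returning $-1$ when the output buffer is too small is a
choice made for this sketch, since the draft does not say what happens in
that case.
\begin{verbatim}
```c
#include <assert.h>
#include <string.h>

/* Concatenate nprocs contributions into out, in rank order.
   inbuf[r] points to the inlen[r] bytes contributed by rank r.
   Returns the number of unused bytes left in out (the `len` result of
   the gather), or -1 if out is too small. */
long toy_gather(const char **inbuf, const size_t *inlen, int nprocs,
                char *out, size_t outsize)
{
    size_t used = 0;
    int r;

    for (r = 0; r < nprocs; r++) {
        if (inlen[r] > outsize - used)
            return -1;             /* sketch-specific overflow signal */
        memcpy(out + used, inbuf[r], inlen[r]);
        used += inlen[r];
    }
    return (long)(outsize - used);
}
```
\end{verbatim}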

\discuss{

It would be more elegant (but no more convenient) to have a return status
object.
}

{\bf \ \\ MPI\_GATHERB( inbuf, inlen, outbuf, tag, group, root) } \\

{\tt MPI\_GATHERB} behaves like {\tt MPI\_GATHER} restricted to block
buffers, and with the additional restriction that all input buffers should
have the same length.   All processes should provide the same values for
{\tt inlen, tag, group}, and {\tt root}.
\begin{description}
\item[IN inbuf] first variable of input buffer (choice)
\item[IN inlen] Number of (word) variables in input buffer (integer)
\item[OUT outbuf] first variable of output buffer -- significant only at
root (choice)
\item[IN tag] operation tag (integer)
\item[IN group] group handle
\item[IN root] rank of receiving process (integer)
\end{description}


{\tt \ \\ MPI\_GATHERB( inbuf, inlen, outbuf, tag, group, root) } \\
is
\begin{verbatim}
MPI_SIZE( &size, group);
MPI_RANK( &rank, group);
MPI_ISENDB(handle, inbuf, inlen, root, tag, group);
if (rank==root)
   for (i=0; i < size; i++)
   {
      MPI_RECVB(outbuf, inlen, i, tag, group, return_status);
      outbuf += inlen;
   }
MPI_WAIT(handle);
\end{verbatim}

\paragraph*{Scatter}

{\bf \ \\ MPI\_SCATTER( list\_of\_inbufs, outbuf, tag, group, root, len)} \\

The root process sends the content of its {\tt i}-th input buffer
to the process with rank {\tt i}; each process (including the root process)
stores the incoming message in its output buffer.
The difference between the size of
the output buffer (in bytes) and the number of bytes received is returned
in {\tt len}.  The routine is called by all members of the group using the same
arguments for {\tt tag, group}, and {\tt root}.
\begin{description}
\item[IN list\_of\_inbufs] list of buffer descriptor handles
\item[OUT outbuf] buffer descriptor handle
\item[IN tag]  operation tag (integer)
\item[IN group] handle
\item[IN root]  rank of sending process (integer)
\item[OUT len]  number of remaining bytes in the output buffer at each process
(integer)
\end{description}


{\tt \ \\ MPI\_SCATTER( list\_of\_inbufs, outbuf, tag, group, root, len)} \\
is
\begin{verbatim}
MPI_SIZE( &size, group);
MPI_RANK( &rank, group);
MPI_IRECV(handle, outbuf, root, tag, group);
if (rank==root)
   for (i=0; i < size; i++)
      MPI_SEND(list_of_inbufs[i], i, tag, group);
MPI_WAIT(handle, return_status);
MPI_RETURN_STATUS(return_status, len, source, tag);
\end{verbatim}


{\bf \ \\ MPI\_SCATTERB( inbuf, outbuf, len, tag, group, root)}
\\

{\tt MPI\_SCATTERB} behaves like {\tt MPI\_SCATTER} restricted to block buffers,
and with the additional restriction that all output buffers have the same
length. The input buffer block of the root process is partitioned into
{\tt n} consecutive blocks, where {\tt n} is the number of processes in the group,
each consisting of {\tt len} words.  The {\tt i}-th block is sent to the
{\tt i}-th process in the group and stored in its output buffer.
The routine is called by all members of the group using the same
arguments for {\tt tag, group, len}, and {\tt root}.
\begin{description}
\item[IN inbuf] first entry in input buffer -- significant only at root
(choice).
\item[OUT outbuf] first entry in output buffer (choice).
\item[IN len]  number of entries to be stored in output buffer (integer)
\item[IN tag]  operation tag (integer)
\item[IN group] handle
\item[IN root]  rank of sending process (integer)
\end{description}


{\tt \ \\ MPI\_SCATTERB( inbuf, outbuf, outlen, tag, group, root) } \\
is
\begin{verbatim}
MPI_SIZE( &size, group);
MPI_RANK( &rank, group);
MPI_IRECVB( handle, outbuf, outlen, root, tag, group);
if (rank==root)
   for (i=0; i < size; i++)
   {
      MPI_SENDB(inbuf, outlen, i, tag, group, return_status);
      inbuf += outlen;
   }
MPI_WAIT(handle);
\end{verbatim}

\paragraph*{All-to-all scatter}

{\bf \ \\ MPI\_ALLSCATTER( list\_of\_inbufs, outbuf, tag, group, len)} \\

Each process in the group sends its {\tt i}-th buffer in its input buffer list
to the process with rank {\tt i} (itself included); each process concatenates
the incoming messages in its output buffer, in the order of the senders' ranks.
The number of bytes left in the output buffer is returned
in {\tt len}.  The routine is called by all members of the group using the same
arguments for {\tt tag} and {\tt group}.
\begin{description}
\item[IN list\_of\_inbufs] list of buffer descriptor handles
\item[OUT outbuf] buffer descriptor handle
\item[IN tag]  operation tag (integer)
\item[IN group] handle
\item[OUT len]  number of remaining bytes in the output buffer (integer)
\end{description}




{\bf \ \\ MPI\_ALLSCATTERB( inbuf, outbuf, len, tag, group)} \\

{\tt MPI\_ALLSCATTERB} behaves like {\tt MPI\_ALLSCATTER} restricted to
block buffers,
and with the additional restriction that all blocks sent from one process
to another have
the same length. The input buffer block of each process is partitioned
into {\tt n} consecutive blocks,
each consisting of {\tt len} words.  The {\tt i}-th block is sent to the
{\tt i}-th process in the group.  Each process concatenates the incoming
messages, in the order of the senders' ranks, and stores them in its output
buffer. The routine is called by all members of the group using the same
arguments for {\tt tag, group}, and {\tt len}.
\begin{description}
\item[IN inbuf] first entry in input buffer (choice).
\item[OUT outbuf] first entry in output buffer (choice).
\item[IN len]  number of entries sent from each process to each other (integer).
\item[IN tag]  operation tag (integer)
\item[IN group] handle
\end{description}


{\tt \ \\ MPI\_ALLSCATTERB( inbuf, outbuf, len, tag, group)} \\ is
\begin{verbatim}
MPI_SIZE( &size, group);
MPI_RANK( &rank, group);
for (i=0; i < size; i++)
   {
    MPI_IRECVB(recv_handle[i], outbuf, len, i, tag, group);
    outbuf += len;
   }
for (i=0; i < size; i++)
   {
    MPI_ISENDB(send_handle[i], inbuf, len, i, tag, group);
    inbuf += len;
   }
MPI_WAITALL(send_handle);
MPI_WAITALL(recv_handle);
\end{verbatim}
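
The data movement of the all-to-all scatter can be checked against a small
sequential model.  The Python sketch below is an illustration only; the helper
{\tt allscatterb} and the use of lists as buffers are assumptions of this
example and do not belong to the proposed interface.

```python
def allscatterb(inbufs, length):
    """Model of a block all-to-all scatter.

    inbufs -- inbufs[src] is the input buffer of rank src; it holds one
              block of `length` entries for every rank in the group
    Returns outbufs, where outbufs[dest] is the output buffer of rank dest.
    """
    size = len(inbufs)
    for buf in inbufs:
        if len(buf) < size * length:
            raise ValueError("each input buffer needs size * length entries")
    outbufs = []
    for dest in range(size):
        out = []
        # Incoming blocks are concatenated in the order of the senders' ranks.
        for src in range(size):
            out.extend(inbufs[src][dest * length:(dest + 1) * length])
        outbufs.append(out)
    return outbufs


result = allscatterb([[0, 1, 2, 3], [10, 11, 12, 13]], length=2)
```

With two processes, the operation amounts to a block transpose: rank 0
collects the first block of every input buffer and rank 1 the second.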

\paragraph*{All-to-all broadcast}

{\bf \ \\ MPI\_ALLCAST( inbuf, outbuf, tag, group, len)} \\

Each process in the group broadcasts its input buffer
to all processes (including itself);
each process concatenates
the incoming messages in its output buffer, in the order of the senders' ranks.
The number of bytes left in the output buffer is returned
in {\tt len}.  The routine is called by all members of the group using the same
arguments for {\tt tag} and {\tt group}.
\begin{description}
\item[IN inbuf] buffer descriptor handle for input buffer
\item[OUT outbuf] buffer descriptor handle for output buffer
\item[IN tag]  operation tag (integer)
\item[IN group] handle
\item[OUT len]  number of remaining untouched bytes in each output buffer
(integer)
\end{description}




{\bf \ \\ MPI\_ALLCASTB( inbuf, outbuf, len, tag, group)} \\

{\tt MPI\_ALLCASTB} behaves like {\tt MPI\_ALLCAST} restricted to
block buffers,
and with the additional restriction that all blocks sent from one process
to another have the same length.
The routine is called by all members of the group using the same
arguments for {\tt tag, group}, and {\tt len}.
\begin{description}
\item[IN inbuf] first entry in input buffer (choice).
\item[OUT outbuf] first entry in output buffer (choice).
\item[IN len]  number of entries sent from each process to each other
(including itself) (integer).
\item[IN tag]  operation tag (integer)
\item[IN group] handle
\end{description}


{\tt \ \\ MPI\_ALLCASTB( inbuf, outbuf, len, tag, group)} \\ is
\begin{verbatim}
MPI_SIZE( &size, group);
MPI_RANK( &rank, group);
for (i=0; i < size; i++)
   {
    MPI_IRECVB(recv_handle[i], outbuf, len, i, tag, group);
    outbuf += len;
   }
for (i=0; i < size; i++)
   {
    MPI_ISENDB(send_handle[i], inbuf, len, i, tag, group);
   }
MPI_WAITALL(send_handle);
MPI_WAITALL(recv_handle);
\end{verbatim}
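
As a rough check of the intended result, the all-to-all broadcast can be
modelled sequentially.  The Python sketch below is illustrative only; the
helper {\tt allcastb} and its list-based buffers are assumptions of this
example, not part of the proposal.

```python
def allcastb(inbufs):
    """Model of a block all-to-all broadcast.

    inbufs -- inbufs[src] is the input buffer of rank src; all buffers
              must have the same length
    Returns one output buffer per rank; each is the concatenation of all
    input buffers in the order of the senders' ranks.
    """
    if not inbufs:
        return []
    length = len(inbufs[0])
    if any(len(buf) != length for buf in inbufs):
        raise ValueError("all input buffers must have the same length")
    gathered = [entry for buf in inbufs for entry in buf]
    # Every process receives its own copy of the same concatenation.
    return [list(gathered) for _ in inbufs]


outbufs = allcastb([[1, 2], [3, 4], [5, 6]])
```

Unlike the all-to-all scatter, every process ends up with the same output
buffer.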


\subsubsection{Global Compute Operations}

\paragraph*{Reduce}

{\bf \ \\ MPI\_REDUCE( inbuf, outbuf, tag, group, root, op)} \\

Combines the values provided in the input buffer of each process in the
group, using the operation {\tt op}, and returns the combined value in
the output buffer of the process with rank {\tt root}.
Each process can provide one value, or a sequence of values, in which case the
combine operation is executed pointwise on each entry of the sequence.
For example, if the operation is {\tt max} and each input buffer contains two
floating point numbers, then outbuf(1) $=$ global max(inbuf(1)) and
outbuf(2) $=$ global max(inbuf(2)). All input
buffers should define sequences of equal length of entries of types
that match the type of the operands of {\tt op}.  The
output buffer should define a sequence of the same length of entries of
types that match the type of the result of {\tt op}.
(Note that,
here as for all other communication operations, the type of entries inserted in
a message depends on the information provided by the input buffer descriptor, and
not on the declarations of these variables in the calling program.   The types
of the variables in the calling program need not match the types defined by the
buffer descriptor, but in such a case the outcome of a reduce operation may be
implementation dependent.)

The operation
defined by {\tt op} is associative and commutative, and the implementation can
take advantage of associativity and commutativity in order to change the
order of evaluation.
The routine is called by all group members using the same arguments
for {\tt tag, group, root} and {\tt op}.
\begin{description}
\item[IN inbuf] handle to input buffer
\item[OUT outbuf] handle to output buffer -- significant only at root
\item[IN tag]  operation tag (integer)
\item[IN group] handle to group
\item[IN root] rank of root process (integer)
\item[IN op] operation (status)
\end{description}
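
The pointwise semantics, and the freedom to reorder evaluation, can be
illustrated with a sequential model.  The Python sketch below is an
illustration under stated assumptions: the helper {\tt reduce\_pointwise} is
hypothetical, buffers are plain lists, and the pairwise tree is just one
evaluation order that an implementation might choose for an associative
operation.

```python
import operator


def reduce_pointwise(inbufs, op):
    """Combine equal-length input buffers entry by entry.

    inbufs -- inbufs[rank] is the input buffer of that rank
    op     -- associative two-argument operation
    Returns the buffer that the root would obtain.
    """
    if len({len(buf) for buf in inbufs}) != 1:
        raise ValueError("all input buffers must have the same length")
    level = [list(buf) for buf in inbufs]
    # Combine neighbouring buffers pairwise until a single buffer remains.
    while len(level) > 1:
        nxt = []
        for k in range(0, len(level) - 1, 2):
            nxt.append([op(a, b) for a, b in zip(level[k], level[k + 1])])
        if len(level) % 2:
            nxt.append(level[-1])  # an odd buffer moves up unchanged
        level = nxt
    return level[0]


maxima = reduce_pointwise([[1.0, 5.0], [3.0, 2.0], [2.0, 4.0]], max)
totals = reduce_pointwise([[1, 2], [3, 4], [5, 6], [7, 8]], operator.add)
```

Here {\tt maxima} holds the global maximum of the first entries followed by
the global maximum of the second entries, as in the {\tt max} example above.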

We list below the operations that are supported for Fortran, each with the
corresponding value of the {\tt op} parameter.
\begin{description}
\item[MPI\_IMAX] integer maximum
\item[MPI\_RMAX] real maximum
\item[MPI\_DMAX] double precision real maximum
\item[MPI\_IMIN] integer minimum
\item[MPI\_RMIN] real minimum
\item[MPI\_DMIN] double precision real minimum
\item[MPI\_ISUM] integer sum
\item[MPI\_RSUM] real sum
\item[MPI\_DSUM] double precision real sum
\item[MPI\_CSUM] complex sum
\item[MPI\_DCSUM] double precision complex sum
\item[MPI\_IPROD] integer product
\item[MPI\_RPROD] real product
\item[MPI\_DPROD] double precision real product
\item[MPI\_CPROD] complex product
\item[MPI\_DCPROD] double precision complex product
\item[MPI\_AND] logical and
\item[MPI\_IAND] integer (bit-wise) and
\item[MPI\_OR] logical or
\item[MPI\_IOR] integer (bit-wise) or
\item[MPI\_XOR] logical xor
\item[MPI\_IXOR] integer (bit-wise) xor
\item[MPI\_MAXLOC] rank of process with maximum integer value
\item[MPI\_MAXRLOC] rank of process with maximum real value
\item[MPI\_MAXDLOC] rank of process with maximum double precision real value
\item[MPI\_MINLOC] rank of process with minimum integer value
\item[MPI\_MINRLOC] rank of process with minimum real value
\item[MPI\_MINDLOC] rank of process with minimum double precision real value
\end{description}

{\bf \ \\ MPI\_REDUCEB( inbuf, outbuf, len, tag, group, root, op)} \\

Same as {\tt MPI\_REDUCE}, restricted to a block buffer.
\begin{description}
\item[IN inbuf] first location in input buffer
\item[OUT outbuf] first location in output buffer -- significant only at root
\item[IN len] number of entries in input and output buffer (integer)
\item[IN tag]  operation tag (integer)
\item[IN group] handle to group
\item[IN root] rank of root process (integer)
\item[IN op] operation (status)
\end{description}

\discuss{

If we are to be compatible with the point to point block operations, the
{\tt len} parameter should indicate the number of words in buffer.  But it
might be more natural to have {\tt len} indicate the number of entries in
the buffer, so that if the entries are complex or double precision, {\tt
len} will be half the number of words in the buffer.

}


{\bf \ \\ MPI\_USER\_REDUCE( inbuf, outbuf, tag, group, root, function)} \\

Same as the reduce operation function above except that a user
supplied function is used.  {\tt function} is an associative and commutative
function with two arguments.  The types of the two arguments and of the
returned values all agree.
\begin{description}
\item[IN inbuf] handle to input buffer
\item[OUT outbuf] handle to output buffer -- significant only at root
\item[IN tag]  operation tag (integer)
\item[IN group] handle to group
\item[IN root] rank of root process (integer)
\item[IN function] user provided function
\end{description}

{\bf \ \\ MPI\_USER\_REDUCEB( inbuf, outbuf, len, tag, group, root, function)}
\\
Same as {\tt MPI\_USER\_REDUCE}, restricted to a block buffer.
\begin{description}
\item[IN inbuf] first location in input buffer
\item[OUT outbuf] first location in output buffer -- significant only at root
\item[IN len] number of entries in input and output buffer (integer)
\item[IN tag]  operation tag (integer)
\item[IN group] handle to group
\item[IN root] rank of root process (integer)
\item[IN function] user provided function
\end{description}


\discuss{

Do we also want a version of reduce that broadcasts the result to all processes
in the group?  (This can be achieved by a reduce followed by a broadcast, but a
combined function may be somewhat more efficient.)

}

\paragraph*{Scan}

{\bf \ \\  MPI\_SCAN( inbuf, outbuf, tag, group, op )} \\

MPI\_SCAN is used to perform a parallel prefix with respect to
an associative reduction operation on data distributed across the group.
The operation returns in the output buffer of the process with rank {\tt i} the
reduction of the values in the input buffers of processes with ranks {\tt
0,...,i}.  The types of operations supported, their semantics, and the
constraints on input and output buffers are as for {\tt MPI\_REDUCE}.
\begin{description}
\item[IN inbuf] handle to input buffer
\item[OUT outbuf] handle to output buffer
\item[IN tag]  operation tag (integer)
\item[IN group] handle to group
\item[IN op] operation (status)
\end{description}
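
The prefix semantics can be illustrated with a sequential model.  The Python
sketch below is illustrative only; the helper {\tt scan} and the list-based
buffers are assumptions of this example rather than part of the proposed
interface.

```python
import operator


def scan(inbufs, op):
    """Model of a parallel prefix over the group.

    inbufs -- inbufs[rank] is the input buffer of that rank
    op     -- associative two-argument operation, applied pointwise
    Returns outbufs, where outbufs[i] is the reduction of the input
    buffers of ranks 0..i.
    """
    outbufs = []
    running = None
    for buf in inbufs:
        if running is None:
            running = list(buf)  # rank 0 receives its own input
        else:
            if len(buf) != len(running):
                raise ValueError("all input buffers must have the same length")
            running = [op(a, b) for a, b in zip(running, buf)]
        outbufs.append(list(running))
    return outbufs


prefix_sums = scan([[1, 10], [2, 20], [3, 30]], operator.add)
```

The output buffer of the last rank equals the result that a reduce with the
same operation would deliver to its root.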

{\bf \ \\  MPI\_SCANB( inbuf, outbuf, len, tag, group, op )} \\
Same as {\tt MPI\_SCAN}, restricted to block buffers.

\begin{description}
\item[IN inbuf] first input buffer element (choice)
\item[OUT outbuf] first output buffer element (choice)
\item[IN len] number of entries in input and output buffer (integer)
\item[IN tag]  operation tag (integer)
\item[IN group] handle to group
\item[IN op] operation (status)
\end{description}


{\bf \ \\  MPI\_USER\_SCAN( inbuf, outbuf, tag, group, function )} \\

Same as the scan operation function above except that a user
supplied function is used.  {\tt function} is an associative and commutative
function with two arguments.  The types of the two arguments and of the
returned values all agree.
\begin{description}
\item[IN inbuf] handle to input buffer
\item[OUT outbuf] handle to output buffer
\item[IN tag]  operation tag (integer)
\item[IN group] handle to group
\item[IN function] user provided function
\end{description}

{\bf \ \\ MPI\_USER\_SCANB( inbuf, outbuf, len, tag, group, function)}
\\
Same as {\tt MPI\_USER\_SCAN}, restricted to a block buffer.
\begin{description}
\item[IN inbuf] first location in input buffer
\item[OUT outbuf] first location in output buffer
\item[IN len] number of entries in input and output buffer (integer)
\item[IN tag]  operation tag (integer)
\item[IN group] handle to group
\item[IN function] user provided function
\end{description}

\discuss{

Do we want scan operations executed by segments? (The HPF definition of prefix
and suffix operation might be handy -- in addition to the scanned vector of
values there is a mask that tells where segments start and end.)
}
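
To make the question about segments concrete, a segmented scan can be
modelled as follows.  This Python sketch is only an illustration and does not
reproduce the HPF definition: the helper {\tt segmented\_scan} is
hypothetical, each process contributes a single value, and the mask is
assumed to mark the first element of each segment.

```python
import operator


def segmented_scan(values, starts, op):
    """Prefix reduction that restarts at every segment start.

    values -- one value per rank, in rank order
    starts -- starts[i] is True if rank i begins a new segment
    op     -- associative two-argument operation
    """
    if len(values) != len(starts):
        raise ValueError("values and starts must have the same length")
    out = []
    for value, is_start in zip(values, starts):
        if is_start or not out:
            out.append(value)  # a new segment begins with its own value
        else:
            out.append(op(out[-1], value))
    return out


sums = segmented_scan([1, 2, 3, 4, 5],
                      [True, False, True, False, False],
                      operator.add)
```

With a mask that is true only at rank 0, the model reduces to the ordinary
scan.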

\missing{

Nonblocking (immediate) collective operations.  The syntax is obvious:   for
each collective operation  {\tt MPI\_op(params)} one may have a new nonblocking
collective operation of the form {\tt MPI\_Iop(handle, params)}, that initiates
the execution of the corresponding operation.  The execution of the operation
is completed by executing {\tt MPI\_WAIT(handle,...)},  {\tt
MPI\_STATUS(handle,...)},  {\tt MPI\_WAITALL}, {\tt MPI\_WAITANY}, or {\tt
MPI\_STATUSANY}.   There are three issues to consider:

(i) The exact definition of the semantics of these operations (in particular,
constraints on order).

(ii) The complexity of implementation (including the complexity of having the
same {\tt WAIT} or {\tt STATUS} functions apply both to point-to-point and to
collective operations).

(iii) The accrued performance advantage.
}

\subsection{Correctness}

\discuss{ This is still very preliminary}

The semantics of the collective communication operations can be derived from
their operational definition in terms of  point-to-point communication.  It is
assumed that messages pertaining to one
operation cannot be confused with messages pertaining to another operation.
Also, messages pertaining to two distinct occurrences of the same operation
cannot be confused if the two occurrences have distinct parameters.
The relevant parameters for this purpose are {\tt group}, {\tt tag}, {\tt
root} and {\tt op}.
The implementer can, of course, use another, more efficient
implementation, as long as it has the same effect.

\discuss{

This statement does not yet apply to the current, incomplete and
somewhat careless definitions I provided in this draft.

The definition above means that messages pertaining to a collective
communication carry information identifying the operation itself, and the
values of the {\tt tag, group} and,
where relevant, {\tt root} or {\tt op} parameters.
Is this acceptable?

}


A few examples:

\begin{verbatim}
MPI_BCAST(buf, len, tag, group, 0);
MPI_BCAST(buf, len, tag, group, 1);
\end{verbatim}

Two consecutive broadcasts, in the same group, with the same tag, but different
roots.  Since the operations are distinguishable, messages from one broadcast
cannot be confused with messages from the other broadcast; the program is safe
and will execute as expected.

\begin{verbatim}
MPI_BCAST(buf, len, tag, group, 0);
MPI_BCAST(buf, len, tag, group, 0);
\end{verbatim}

Two consecutive broadcasts, in the same group, with the same tag and root.
Since point-to-point communication preserves the order of messages,
here, too, messages from one broadcast will not be confused with messages from
the other broadcast; the program is safe and will execute as intended.

\begin{verbatim}
MPI_RANK(&rank, group);
if (rank==0)
  {
   MPI_BCASTB(buf, len, tag, group, 0);
   MPI_SENDB(buf, len, 1, tag, group);
  }
else if (rank==1)
  {
   MPI_RECVB(buf, len, MPI_DONTCARE, tag, group);
   MPI_BCASTB(buf, len, tag, group, 0);
   MPI_RECVB(buf, len, MPI_DONTCARE, tag, group);
  }
else
  {
   MPI_SENDB(buf, len, 1, tag, group);
   MPI_BCASTB(buf, len, tag, group, 0);
  }
\end{verbatim}

Process zero executes a broadcast followed by a send to process one;
process two executes a send to process one, followed by a broadcast;
and process one executes a receive, a broadcast and a receive.
A possible outcome is for the operations to be matched as illustrated by the
diagram below.

\begin{verbatim}


    0                       1                      2

                / - >  receive            / - send
              /                         /
broadcast   /         broadcast       /   broadcast
           /                        /
  send   -             receive  < -


\end{verbatim}

The reason is that broadcast is not a synchronous operation; the call at a
process may return before the other processes have entered the broadcast.
Thus, the message sent by process zero can arrive at process one before the
message sent by process two, and before the call to broadcast on process one.

\end{document}



From owner-mpi-collcomm@CS.UTK.EDU  Tue Mar 16 13:43:41 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA25648; Tue, 16 Mar 93 13:43:41 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA12864; Tue, 16 Mar 93 13:43:08 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Tue, 16 Mar 1993 13:43:07 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from pnlg.pnl.gov by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA12826; Tue, 16 Mar 93 13:42:10 -0500
Received: from carbon.pnl.gov (130.20.188.38) by pnlg.pnl.gov; Tue, 16 Mar 93
 10:35 PST
Received: from sodium.pnl.gov by carbon.pnl.gov (4.1/SMI-4.1) id AA01248; Tue,
 16 Mar 93 10:33:37 PST
Received: by sodium.pnl.gov (4.1/SMI-4.0) id AA07439; Tue, 16 Mar 93 10:33:34
 PST
Date: Tue, 16 Mar 93 10:33:34 PST
From: rj_littlefield@pnlg.pnl.gov
Subject: Re:  Al's COLLCOMM proposal
To: Rolf.Hempel@gmd.de, mpi-collcomm@cs.utk.edu, mpi-ptop@cs.utk.edu
Cc: gmap10@f1neuman.gmd.de, rj_littlefield@pnlg.pnl.gov
Message-Id: <9303161833.AA07439@sodium.pnl.gov>
X-Envelope-To: mpi-ptop@cs.utk.edu, mpi-collcomm@cs.utk.edu

Rolf Hempel writes:

> I agree to most points of Rik
> Littlefields comments. The only thing which does not convince me yet
> is the explicite caching mechanism. If the information caching is
> handled consistently between the group management and collective
> communication routines (in order to avoid usage of stale group
> information), I still hope that it could be done without showing up
> at the user interface.

Just a point of clarification.  

I do NOT propose that cacheing be visible at the interface between
the application program and a collective communication routine that
it calls.  The example I provided was perhaps not explicit enough on
this point.  It said:

   efficient_global_op (grphandle, ...)
   struct group_descriptor_type *grphandle;
     <and so on>

I intended "..." to mean only the arguments that would be provided
to any collective communication routine, e.g., data buffer, number
of elements, and so on.  Nothing about cacheing there.

I think Rolf would agree that the standard collective communication
routines need an internal facility like this to coordinate with the
standard group management routines, if they are to achieve high
efficiency.  

My proposal is essentially to standardize and export that facility so
as to permit new collective communication routines to run as
efficiently as the built-ins.  In this vein, you may wish to think of
standardized cacheing as a feature to increase MPI's extensibility.

--Rik

----------------------------------------------------------------------
rj_littlefield@pnl.gov (alias 'd39135')   Rik Littlefield
Tel: 509-375-3927                         Pacific Northwest Lab, MS K1-87
Fax: 509-375-6631                         P.O.Box 999, Richland, WA  99352
From owner-mpi-collcomm@CS.UTK.EDU  Tue Mar 16 14:02:53 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA26216; Tue, 16 Mar 93 14:02:53 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA13784; Tue, 16 Mar 93 14:02:08 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Tue, 16 Mar 1993 14:02:07 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from watson.ibm.com by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA13776; Tue, 16 Mar 93 14:02:06 -0500
Message-Id: <9303161902.AA13776@CS.UTK.EDU>
Received: from YKTVMV by watson.ibm.com (IBM VM SMTP V2R3) with BSMTP id 4245;
   Tue, 16 Mar 93 14:02:05 EST
Date: Tue, 16 Mar 93 14:01:00 EST
From: "Marc Snir" <snir@watson.ibm.com>
X-Addr: (914) 945-3204  (862-3204)
        28-226 IBM T.J. Watson Research Center
        P.O. Box 218 Yorktown Heights NY 10598
To: mpi-collcomm@cs.utk.edu
Subject: draft by Geist and Snir
Reply-To: SNIR@watson.ibm.com

Next message will be the PostScript file, for nonlatexers.
From owner-mpi-collcomm@CS.UTK.EDU  Tue Mar 16 14:04:18 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA26260; Tue, 16 Mar 93 14:04:18 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA13847; Tue, 16 Mar 93 14:03:24 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Tue, 16 Mar 1993 14:03:21 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from watson.ibm.com by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA13831; Tue, 16 Mar 93 14:03:16 -0500
Message-Id: <9303161903.AA13831@CS.UTK.EDU>
Received: from YKTVMV by watson.ibm.com (IBM VM SMTP V2R3) with BSMTP id 4261;
   Tue, 16 Mar 93 14:03:15 EST
Date: Tue, 16 Mar 93 14:03:14 EST
From: "Marc Snir" <snir@watson.ibm.com>
To: MPI-COLLCOMM@CS.UTK.EDU

%!PS-Adobe-2.0
%%Creator: dvips 5.47 Copyright 1986-91 Radical Eye Software
%%Title: COLLECT2.DVI.*
%%Pages: 25 1
%%BoundingBox: 0 0 612 792
%%EndComments
%%BeginProcSet: texc.pro
/TeXDict 250 dict def TeXDict begin /N /def load def /B{bind def}N /S /exch
load def /X{S N}B /TR /translate load N /isls false N /vsize 10 N /@rigin{
isls{[0 1 -1 0 0 0]concat}if 72 Resolution div 72 VResolution div neg scale
Resolution VResolution vsize neg mul TR matrix currentmatrix dup dup 4 get
round 4 exch put dup dup 5 get round 5 exch put setmatrix}N /@letter{/vsize 10
N}B /@landscape{/isls true N /vsize -1 N}B /@a4{/vsize 10.6929133858 N}B /@a3{
/vsize 15.5531 N}B /@ledger{/vsize 16 N}B /@legal{/vsize 13 N}B /@manualfeed{
statusdict /manualfeed true put}B /@copies{/#copies X}B /FMat[1 0 0 -1 0 0]N
/FBB[0 0 0 0]N /nn 0 N /IE 0 N /ctr 0 N /df-tail{/nn 8 dict N nn begin
/FontType 3 N /FontMatrix fntrx N /FontBBox FBB N string /base X array
/BitMaps X /BuildChar{CharBuilder} N /Encoding IE N end dup{/foo setfont}2
array copy cvx N load 0 nn put /ctr 0 N[}B /df{/sf 1 N /fntrx FMat N df-tail}
B /dfs{div /sf X /fntrx[sf 0 0 sf neg 0 0]N df-tail}B /E{pop nn dup definefont
setfont}B /ch-width{ch-data dup length 5 sub get} B /ch-height{ch-data dup
length 4 sub get} B /ch-xoff{128 ch-data dup length 3 sub get sub} B /ch-yoff{
ch-data dup length 2 sub get 127 sub} B /ch-dx{ch-data dup length 1 sub get} B
/ch-image{ch-data dup type /stringtype ne{ctr get /ctr ctr 1 add N}if}B /id 0
N /rw 0 N /rc 0 N /gp 0 N /cp 0 N /G 0 N /sf 0 N /CharBuilder{save 3 1 roll S
dup /base get 2 index get S /BitMaps get S get /ch-data X pop /ctr 0 N ch-dx 0
ch-xoff ch-yoff ch-height sub ch-xoff ch-width add ch-yoff setcachedevice
ch-width ch-height true[1 0 0 -1 -.1 ch-xoff sub ch-yoff .1 add]/id ch-image N
/rw ch-width 7 add 8 idiv string N /rc 0 N /gp 0 N /cp 0 N{rc 0 ne{rc 1 sub
/rc X rw}{G}ifelse}imagemask restore}B /G{{id gp get /gp gp 1 add N dup 18 mod
S 18 idiv pl S get exec}loop}B /adv{cp add /cp X}B /chg{rw cp id gp 4 index
getinterval putinterval dup gp add /gp X adv}B /nd{/cp 0 N rw exit}B /lsh{rw
cp 2 copy get dup 0 eq{pop 1}{dup 255 eq{pop 254}{dup dup add 255 and S 1 and
or}ifelse}ifelse put 1 adv}B /rsh{rw cp 2 copy get dup 0 eq{pop 128}{dup 255
eq{pop 127}{dup 2 idiv S 128 and or}ifelse}ifelse put 1 adv}B /clr{rw cp 2
index string putinterval adv}B /set{rw cp fillstr 0 4 index getinterval
putinterval adv}B /fillstr 18 string 0 1 17{2 copy 255 put pop}for N /pl[{adv
1 chg}bind{adv 1 chg nd}bind{1 add chg}bind{1 add chg nd}bind{adv lsh}bind{
adv lsh nd}bind{adv rsh}bind{adv rsh nd}bind{1 add adv}bind{/rc X nd}bind{1
add set}bind{1 add clr}bind{adv 2 chg}bind{adv 2 chg nd}bind{pop nd}bind]N /D{
/cc X dup type /stringtype ne{]}if nn /base get cc ctr put nn /BitMaps get S
ctr S sf 1 ne{dup dup length 1 sub dup 2 index S get sf div put}if put /ctr
ctr 1 add N}B /I{cc 1 add D}B /bop{userdict /bop-hook known{bop-hook}if /SI
save N @rigin 0 0 moveto}N /eop{clear SI restore showpage userdict /eop-hook
known{eop-hook}if}N /@start{userdict /start-hook known{start-hook}if
/VResolution X /Resolution X 1000 div /DVImag X /IE 256 array N 0 1 255{IE S 1
string dup 0 3 index put cvn put} for}N /p /show load N /RMat[1 0 0 -1 0 0]N
/BDot 260 string N /rulex 0 N /ruley 0 N /v{/ruley X /rulex X V}B /V
statusdict begin /product where{pop product dup length 7 ge{0 7 getinterval
(Display)eq}{pop false}ifelse}{false}ifelse end{{gsave TR -.1 -.1 TR 1 1 scale
rulex ruley false RMat{BDot}imagemask grestore}}{{gsave TR -.1 -.1 TR rulex
ruley scale 1 1 false RMat{BDot}imagemask grestore}}ifelse B /a{moveto}B
/delta 0 N /tail{dup /delta X 0 rmoveto}B /M{S p delta add tail}B /b{S p tail}
B /c{-4 M}B /d{-3 M}B /e{-2 M}B /f{-1 M}B /g{0 M}B /h{1 M}B /i{2 M}B /j{3 M}B
/k{4 M}B /w{0 rmoveto}B /l{p -4 w}B /m{p -3 w}B /n{p -2 w}B /o{p -1 w}B /q{p 1
w}B /r{p 2 w}B /s{p 3 w}B /t{p 4 w}B /x{0 S rmoveto}B /y{3 2 roll p a}B /bos{
/SS save N}B /eos{clear SS restore}B end
%%EndProcSet
TeXDict begin 1000 300 300 @start /Fa 2 61 df<127012F812FCA2127C120CA41218A212
30A212601240060F7C840E>59 D<15181578EC01E0EC0780EC1E001478EB03E0EB0F80013CC7FC
13F0EA03C0000FC8FC123C12F0A2123C120FEA03C0EA00F0133CEB0F80EB03E0EB0078141EEC07
80EC01E0EC007815181D1C7C9926>I E /Fb 33 118 df<137013F01201EA03C0EA0780EA0F00
121E121C123C123812781270A212F05AA87E1270A212781238123C121C121E7EEA0780EA03C0EA
01F0120013700C24799F18>40 D<126012F012787E7E7EEA0780120313C0120113E01200A213F0
1370A813F013E0A2120113C0120313801207EA0F00121E5A5A5A12600C247C9F18>I<123C127E
127FA3123F120F120E121E127C12F81270080C788518>44 D<127812FCA412780606778518>46
D<387FFFC0B512E0A3C8FCA4B512E0A36C13C0130C7E9318>61 D<137013F8A213D8A2EA01DCA3
138CEA038EA41306EA0707A4380FFF80A3EA0E03A2381C01C0A2387F07F038FF8FF8387F07F015
1C7F9B18>65 D<EA7FFFB512806C1300EA01C0B3A4EA7FFFB512806C1300111C7D9B18>73
D<EA7FE012FF127F000EC7FCB11470A5387FFFF0B5FC7E141C7F9B18>76
D<38FC01F8EAFE03A2383B06E0A4138EA2EA398CA213DCA3EA38D8A213F81370A21300A638FE03
F8A3151C7F9B18>I<387E07F038FF0FF8387F07F0381D81C0A313C1121CA213E1A313611371A2
13311339A31319A2131D130DA3EA7F07EAFF87EA7F03151C7F9B18>I<EAFFFEEBFF8014C0EA1C
03EB01E013001470A514E01301EB03C0EA1FFF1480EBFE00001CC7FCA8B47EA3141C7F9B18>80
D<3807F380EA1FFF5AEA7C1FEA7007EAF00312E0A290C7FC7E1278123FEA1FF0EA0FFEEA01FF38
001F80EB03C0EB01E01300A2126012E0130100F013C0EAFC07B512801400EAE7FC131C7E9B18>
83 D<387FFFF8B5FCA238E07038A400001300B2EA07FFA3151C7F9B18>I<38FF83FEA3381C0070
B2001E13F0000E13E0EA0F013807C7C03803FF806C1300EA007C171C809B18>I<38FE03F8A338
700070A36C13E0A513F8A2EA39DCA2001913C0A3138CEA1D8DA4000D13801305EA0F07A2EA0E03
151C7F9B18>87 D<38FF07F8A3381C01C0EA1E03000E1380EA0F0700071300A2EA038EA2EA01DC
A213FC6C5AA21370A9EA01FC487E6C5A151C7F9B18>89 D<EA1FE0EA3FF8487EEA783EEA300FC6
7EA248B4FC120F123FEA7F07127812F012E0A26C5AEA783F387FFFF0EA3FFBEA0FE114147D9318
>97 D<127E12FE127E120EA5133EEBFF80000F13C0EBE3E0EB80F0EB00701478000E1338A5120F
14781470EB80F0EBC3E0EBFFC0000E138038067E00151C809B18>I<EB1F80133F131F1303A5EA
03F3EA0FFBEA1FFFEA3E1FEA780FEA700712F0EAE003A5130712F01270EA780FEA3E3F381FFFF0
380FFBF83803E3F0151C7E9B18>100 D<EA03F0EA0FFC487EEA3E1F38780780EA700300F013C0
EAE001A2B5FCA300F0C7FC1270387801C0123CEA3F07381FFF8000071300EA01FC12147D9318>
I<EB1FC0EB7FE013FFEA01F1EBC0C01400A3387FFFC0B5FCA23801C000AEEA7FFFA3131C7F9B18
>I<3803F1F03807FFF85A381E1F30383C0F00EA3807A5EA3C0FEA1E1EEA1FFC485AEA3BF00038
C7FC123CEA1FFF14C04813E0387801F038F00078481338A36C1378007813F0EA7E03383FFFE000
0F13803803FE00151F7F9318>I<127E12FE127E120EA5133FEBFF80000F13C0EBE1E013801300
A2120EAA387FC3FC38FFE7FE387FC3FC171C809B18>I<EA0380487EA36C5AC8FCA4EA7FC012FF
127F1201AEB5FC14801400111D7C9C18>I<EA7FE012FF127F1200B3A4387FFFC0B512E06C13C0
131C7E9B18>108 D<387DF1F038FFFBF86CB47E381F1F1CEA1E1EA2EA1C1CAB387F1F1F39FFBF
BF80397F1F1F001914819318>I<EA7E3F38FEFF80007F13C0380FE1E013801300A2120EAA387F
C3FC38FFE7FE387FC3FC1714809318>I<EA01F0EA0FFE487E383E0F80EA3803387001C0A238E0
00E0A5EAF001007013C0EA7803383C0780EA3E0F381FFF006C5AEA01F013147E9318>I<EA7E3E
38FEFF80007F13C0380FE3E0EB80F0EB00701478000E1338A5120F14781470EB80F0EBC3E0EBFF
C0000E1380EB7E0090C7FCA7EA7FC0487E6C5A151E809318>I<387F87E038FF9FF8EA7FBF3803
FC78EBF030EBE0005BA35BA8EA7FFEB5FC6C5A15147F9318>114 D<EA0FF7EA3FFF5AEAF81FEA
E007A212F0007CC7FCEA7FF0EA1FFCEA07FEEA001F38600780EAE00312F0130738FC0F00B5FC5B
EAE7F811147D9318>I<487E1203A4387FFFC0B5FCA238038000A9144014E0A21381EBC3C0EA01
FF6C1380EB7E0013197F9818>I<387E07E0EAFE0FEA7E07EA0E00AC1301EA0F073807FFFC6C13
FE3801FCFC1714809318>I E /Fc 65 126 df<EA01E0487E487EEA0F3CEA0E1CA4133CEB39FC
1379EA0FF13807E1E0EBC1C013811383000F1380EA1FC7003D1300EA79E7EAF0EFEAE0FE137EEB
3C08141CEAF07E3878FF3C387FE7F8EA3FC3380F81F0161E7F9D1A>38 D<1338137813F8EA01E0
EA03C0EA0780EA0F00121E121C123C123812781270A312F05AA87E1270A312781238123C121C12
1E7EEA0780EA03C0EA01E0EA00F8137813380D2878A21A>40 D<126012F012787E7E7EEA0780EA
03C0120113E0120013F01370A313781338A813781370A313F013E0120113C01203EA0780EA0F00
121E5A5A5A12600D287CA21A>I<13301378A8387FFFF0B512F8A26C13F038007800A813301516
7E991A>43 D<123C127E127FA3123F1207120F120E123E12FC12F812E0080D77851A>I<387FFF
C0B512E0A26C13C013047D901A>I<127812FCA41278060676851A>I<14C0EB01E01303A214C013
07A2EB0F80A2EB1F00A2131E133EA25BA25BA2485AA25B1203A2485AA2485AA290C7FC5AA2123E
A25AA2127812F8A25A126013277DA21A>I<EA01F0EA07FC487EEA1F1FEA1C0738380380A23870
01C0A338E000E0A9EAF001007013C0A2EA780300381380EA3C07001C1300EA1F1FEA0FFE6C5AEA
01F0131E7D9D1A>I<13C012011203A21207120F127F12FD12791201B2EA7FFFA3101E7B9D1A>I<
EA07F8EA0FFE487E383C0F80387803C0EAF00100E013E0EAF000A21260C7FCA2130114C01303EB
0780EB0F00130E133E5B5BEA01E0485A485A48C7FC001E13E05AEA7FFFB5FC7E131E7D9D1A>I<
123C127EA4123C1200A9123C127C127EA3123E120E121E121C123C12F812F012E0071C77941A>
59 D<14C0EB03E01307EB0FC0EB3F80EB7F00EA01FC485AEA07E0EA1FC0485A007EC7FC5AA212
7E6C7E6C7EEA07E0EA03F86C7EEA007FEB3F80EB0FC0EB07E01303EB00C0131A7D9B1A>I<387F
FFF0B512F8A26C13F0C8FCA4387FFFF0B512F8A26C13F0150C7E941A>I<126012F87E127E6C7E
6C7EEA07F06C7EC67E137FEB3F80EB0FC0EB07E0A2EB0FC0EB3F80EB7F0013FCEA03F8485AEA1F
C0485A007EC7FC5A5A1260131A7D9B1A>I<1338137CA2136C13EEA313C6A2EA01C7A438038380
A4380701C0A213FFA24813E0EA0E00A4481370387F01FC38FF83FE387F01FC171E7F9D1A>65
D<EAFFFEEBFF8014C0381C03E0130014F01470A414E01301EB07C0381FFF80A214C0381C01E0EB
00F014701438A5147814F01301B512E014C01400151E7E9D1A>I<EBFE383803FFB84813F8EA0F
83EA1E00001C1378123C4813381270A200F013005AA87E00701338A212786C1378001C1370001E
13F0380F83E03807FFC06C13803800FE00151E7E9D1A>I<EA7FFEB5FC6C1380381C07C0EB01E0
EB00F0147014781438A2143C141CA8143C1438A21478147014F0EB01E0EB07C0EA7FFFB512006C
5A161E7F9D1A>I<B512F8A3381C0038A41400A3130EA3EA1FFEA3EA1C0EA390C7FCA3141CA5B5
12FCA3161E7E9D1A>I<387FFFFCB5FC7E380E001CA41400A3EB0380A3EA0FFFA3EA0E03A390C7
FCA8EA7FE012FF127F161E7F9D1A>I<3801F8E0EA03FEEA07FFEA0F0FEA1E03EA3C011238EA78
001270A200F013005AA5EB0FF8A338F000E01270130112781238EA3C03121EEA0F0FEA07FFEA03
FEEA01F8151E7E9D1A>I<38FF83FEA3381C0070AA381FFFF0A3381C0070AB38FF83FEA3171E7F
9D1A>I<B51280A33801C000B3A6B51280A3111E7C9D1A>I<387F03F838FF87FC387F03F8381C01
E0EB03C014801307EB0F00131E131C133C5B5B7FEA1DFC121F139E130E130FEA1E07001C138013
0314C0EB01E0A2EB00F01470007F13FC38FF81FE387F00FC171E7F9D1A>75
D<EA7FE0487E6C5A000EC7FCB3141CA5387FFFFCB5FC7E161E7F9D1A>I<007E133FB4EB7F806C
1400381D80DCA313C1A2001C139CA213E3A2EB631C1377A21336A2133E131CA21300A7007F137F
39FF80FF80397F007F00191E809D1A>I<38FE03FE12FFA2381D8070A213C0121CA213E0A21360
1370A213301338A21318131CA2130C130EA21306A213071303A238FF81F0A21380171E7F9D1A>
I<EA0FFE383FFF804813C0EA7C07EA700100F013E0EAE000B1EAF001A2007013C0EA7C07EA7FFF
6C1380380FFE00131E7D9D1A>I<EAFFFEEBFF8014C0381C03E0EB00F0147014781438A4147814
7014F0EB03E0381FFFC01480EBFE00001CC7FCA9B47EA3151E7E9D1A>I<EAFFFC13FF1480381C
07C0EB01E0EB00F01470A414F0EB01E0EB07C0381FFF8014001480381C07C0EB01E01300A514E2
14E7A338FF80FF147E143C181E7F9D1A>82 D<3807F1C0EA1FFDEA3FFFEA7C1FEA7007EAF003EA
E001A390C7FC7E1278123FEA1FF8EA0FFEEA01FF38000F80EB03C0130114E01300126012E0A2EA
F001EB03C038FE0780B5FCEBFE00EAE3FC131E7D9D1A>I<387FFFFEB5FCA238E0380EA4000013
00B3A23803FF80A3171E7F9D1A>I<38FF83FEA3381C0070B3A2001E13F0000E13E0EA0F013807
C7C03803FF806C1300EA007C171E7F9D1A>I<38FF01FEA3381C0070A3001E13F0000E13E0A338
0701C0A438038380A43801C700A4EA00C613EEA3136C137CA21338171E7F9D1A>I<00FE13FEEA
FF01EAFE000070131C0078133C00381338A7137C001C137013EEA513C6A2380DC760A31383A300
0F13E0A2380701C0171E7F9D1A>I<383FFFF85AA23870007014F0EB01E014C0EA0003EB0780EB
0F00130E131E5B133813785B5B1201485A5B120748C7FC001E1338121C123C5A1270B512F8A315
1E7E9D1A>90 D<EAFFF8A3EAE000B3AFEAFFF8A30D2776A21A>I<EAFFF8A3EA0038B3AFEAFFF8
A30D277EA21A>93 D<387FFFC0B512E0A26C13C013047D7E1A>95 D<EA1FF0EA3FFC487EEA781F
38300780EA0003A213FF1207121FEA3F83EA7C0312F012E0A3EAF007EA7C1F383FFFFCEA1FFDEA
07F016157D941A>97 D<12FEA3120EA6133FEBFFC0000F13E0EBE1F0EB8070EB00781438000E13
3C141CA5000F133C14381478EB80F0EBC3E0EBFFC0000E138038067E00161E7F9D1A>I<3801FF
80000713C04813E0EA1F01383C00C0481300127012F05AA57E1270007813707E381F01F0380FFF
E06C13C00001130014157D941A>I<EB1FC0A31301A6EA01F9EA07FDEA0FFFEA1F0FEA3C07EA78
031270EAF00112E0A5EAF0031270EA78071238EA3E1F381FFFFCEA0FFDEA03F1161E7E9D1A>I<
EA01FCEA07FF481380381F07C0383C01E0EA7800007013F000F013705AB512F0A300E0C7FC7E12
70007813707E381F01F0380FFFE06C13C00001130014157D941A>I<EB0FF0EB1FF8133FEB7878
EBF030EBE000A4387FFFF0B5FCA23800E000AF383FFF804813C06C1380151E7F9D1A>I<3801F8
FC3807FFFE5A381F0F8C381C0380003C13C0EA3801A3EA3C03001C1380EA1F0FEBFF00485AEA39
F80038C7FC123C121C381FFF8014F04813F8387C00FC0070131C00F0131E48130EA36C131E0078
133C383F01F8381FFFF06C13E00001130017217F941A>I<12FEA3120EA6133FEBFF80000F13C0
EBE1E013801300A2120EAB38FFE3FE13E713E3171E7F9D1A>I<EA01C0487EA36C5AC8FCA5EA7F
E0A31200AF387FFF80B512C06C1380121F7C9E1A>I<12FEA3120EA6EB0FFCEB1FFEEB0FFCEB03
C0EB0780EB0F00131E5B5B13FC120F13DE138F380E0780EB03C0A2EB01E0EB00F038FFE3FE14FF
14FE181E7F9D1A>107 D<EAFFE0A31200B3A6B512E0A3131E7D9D1A>I<387DF1F038FFFBF86CB4
7E381F1F1CEA1E1EA2EA1C1CAC387F1F1F39FF9F9F80397F1F1F00191580941A>I<EAFE3FEBFF
80B512C0380FE1E013801300A2120EAB38FFE3FE13E713E317157F941A>I<EA01F0EA07FCEA1F
FF383E0F80EA3C07387803C0EA700138E000E0A6EAF001007013C0EA7803383C0780EA3E0F381F
FF00EA07FCEA01F013157D941A>I<EAFE3FEBFFC0B512E0380FE1F0EB8070EB00781438000E13
3C141CA5000F133C14381478EB80F0EBC3E0EBFFC0000E1380EB7E0090C7FCA8EAFFE0A316207F
941A>I<387F87F038FF9FFCEA7FBF3803FC3CEBF018EBE000A25BA25BA9EA7FFFB5FC7E16157E
941A>114 D<380FFB80EA3FFF5AEAF80FEAE003A300F8C7FCEA7FC0EA3FFCEA0FFF38007F80EB
07C0EA600112E012F0130338FC0F80B512005BEAE7F812157C941A>I<13C01201A6387FFFE0B5
FCA23801C000AA1470A314F0EBE1E0EA00FFEB7FC0EB3F00141C7F9B1A>I<38FE0FE0A3EA0E00
AC1301A2EA0F073807FFFE7EEA01FC17157F941A>I<387F83FC38FFC7FE387F83FC380E00E0A3
380701C0A338038380A33801C700A3EA00EEA3137CA2133817157F941A>I<387FC7F8EBCFFCEB
C7F8380703C038038380EBC700EA01EFEA00FE137C13781338137C13EE120113C7380383800007
13C0EA0F01387FC7FC00FF13FE007F13FC17157F941A>120 D<387FC3FC38FFC7FE387FC3FC38
0E00E0A27EEB01C013811203EB838013C31201EBC700EA00E7A213E61366136E133CA31338A35B
A21230EA78E01271EA7FC06C5A001EC7FC17207F941A>I<387FFFF0B5FCA238E001E0EB03C0EB
078038000F00131E5B5B5B485A485A485A380F0038121E5A5AB512F8A315157E941A>I<EB07E0
131F133FEB7C0013F05BAB1201EA07C0B45A90C7FC7FEA07C0EA01E01200AB7F137CEB3FE0131F
130713277DA21A>I<127CB4FC7FEA07C0EA01E01200AB7F137CEB3FE0131F133FEB7C0013F05B
AB1201EA07C0B45A90C7FC127C13277DA21A>125 D E /Fd 56 123 df<903807F83F017FB512
C03A01FC0FE3E03903F01FC7EA07E0D80FC01387ED83C0ED8000A6B612FCA2390FC01F80B2397F
F8FFF8A223237FA221>11 D<13181330136013C01201EA0380120713005A121EA2123E123CA212
7CA3127812F8AD1278127CA3123CA2123E121EA27E7E13801203EA01C012001360133013180D31
7BA416>40 D<12C012607E7E121C7E120F7E1380EA03C0A213E01201A213F0A3120013F8AD13F0
1201A313E0A2120313C0A2EA078013005A120E5A12185A5A5A0D317DA416>I<1238127C12FE12
FFA2127F123B1203A212071206A2120C121C12181270122008117C8610>44
D<EAFFFCA50E057F8D13>I<1238127C12FEA3127C123807077C8610>I<13181378EA01F812FFA2
1201B3A7387FFFE0A213207C9F1C>49 D<EA03FCEA0FFF383C1FC0387007E0007C13F0EAFE0314
F8A21301127CEA3803120014F0A2EB07E014C0EB0F80EB1F00133E13385BEBE018EA01C0EA0380
EA0700000E1338380FFFF05A5A5AB5FCA215207D9F1C>I<EA01FE3807FFC0380F07E0381E03F0
123FEB01F813811301EA1F03000C13F0120014E0EB07C0EB1F803801FE007F380007C0EB01F014
F8EB00FCA214FE127CA212FEA214FCEA7C01007813F8383C07F0380FFFC03803FE0017207E9F1C
>I<1470A214F8A3497EA2497EA3EB06FF80010E7FEB0C3FA201187F141F01387FEB300FA20160
7F140701E07F90B5FCA239018001FCA200038090C7FCA20006147FA23AFFE00FFFF8A225227EA1
2A>65 D<B67E15E03907F001F86E7E157EA2157FA5157E15FE5DEC03F890B55AA29038F001FCEC
007E811680151F16C0A6ED3F80A2ED7F00EC01FEB612F815C022227EA128>I<D903FE13809038
1FFF819038FF01E33901F8003FD803E0131F4848130F48481307121F48C71203A2481401127EA2
00FE91C7FCA8127EED0180127F7E15036C6C1400120F6C6C1306D803F05B6C6C13386CB413F090
381FFFC0D903FEC7FC21227DA128>I<B67E15F03907F003FCEC007E81ED1F80ED0FC0ED07E0A2
16F01503A316F8A916F0A3ED07E0A2ED0FC0ED1F80ED3F00157EEC03FCB612F0158025227EA12B
>I<B612FCA23807F000153C151C150C150EA215061418A3150014381478EBFFF8A2EBF0781438
1418A21503A214001506A3150EA2151E153EEC01FCB6FCA220227EA125>I<B612F8A23807F001
EC007815381518151CA2150CA21418A21500A214381478EBFFF8A2EBF07814381418A491C7FCA8
B512E0A21E227EA123>I<D903FE134090391FFFC0C090387F00F1D801F8133F4848130FD807C0
1307000F1403485A48C71201A2481400127EA200FE1500A791380FFFFC127E007F9038001FC0A2
7EA26C7E6C7E6C7E6C7ED801FC133F39007F80E790381FFFC30103130026227DA12C>I<B53883
FFFEA23A07F0001FC0AD90B6FCA29038F0001FAFB53883FFFEA227227EA12C>I<B512E0A23803
F800B3ACB512E0A213227FA115>I<B538803FFCA23A07F0000380ED0700150E15185D15E04A5A
4A5A4AC7FC140E1418143814FCEBF1FE13F3EBF77F01FE7FEBF83F496C7E81140F6E7E8114036E
7E816E7E811680ED3FC0B53883FFFCA226227EA12C>75 D<B512E0A2D807F0C7FCB31518A41538
A21570A215F014011407B6FCA21D227EA122>I<D8FFF0EC0FFF6D5C000716E0D806FC1437A301
7E1467A26D14C7A290391F800187A290390FC00307A3903807E006A2903803F00CA2903801F818
A3903800FC30A2EC7E60A2EC3FC0A2EC1F80A3EC0F00D8FFF091B5FC140630227EA135>I<D8FF
F8EB1FFE7F0007EC00C07FEA06FF6D7E6D7E6D7E130F806D7E6D7E6D7E130080EC7F80EC3FC0EC
1FE0EC0FF0140715F8EC03FCEC01FEEC00FF157FA2153F151F150F15071503A2D8FFF013011500
27227EA12C>I<EB07FC90383FFF809038FC07E03903F001F848486C7E4848137E48487FA248C7
EA1F80A24815C0007E140FA200FE15E0A9007E15C0007F141FA26C15806D133F001F15006C6C13
7E6C6C5B6C6C485A3900FC07E090383FFF80D907FCC7FC23227DA12A>I<B6FC15E03907F007F0
EC01FC1400157EA2157FA5157EA215FC1401EC07F090B512E0150001F0C7FCADB57EA220227EA1
26>I<B512FEECFFC03907F007F0EC01F86E7E157E157FA6157E5D4A5AEC07F090B512C05D9038
F00FE06E7E6E7E6E7EA81606EC00FEEDFF0CB538803FF8ED0FF027227EA12A>82
D<3801FC043807FF8C381F03FC383C007C007C133C0078131CA200F8130CA27E1400B4FC13E06C
B4FC14C06C13F06C13F86C13FC000313FEEA003FEB03FFEB007F143FA200C0131FA36C131EA26C
133C12FCB413F838C7FFE00080138018227DA11F>I<007FB61280A2397E03F80F007814070070
14030060140100E015C0A200C01400A400001500B3A20003B512F8A222227EA127>I<B538803F
FCA23A07F0000180B3A60003EC03007F000114066C6C130E017E5B90383F80F890380FFFE00101
90C7FC26227EA12B>I<B53A0FFFF01FFEA2260FF00090C712E000076E14C0A26C6C9138800180
153F6D1503000103C01300A26C6C90387FE006156F7F6D9038C7F00CA20280EBF81C90263F8183
1318A2D91FC36D5A150114E3903A0FE600FE60A202F6EBFFE0D907FC6D5AA201035D4A133FA26D
486DC7FCA20100141E4A130EA237227FA13A>87 D<3A7FFFC1FFF0A23A03FC000C006C6C5B0000
14386D5B90387F8060013F5B14C190381FE380010F90C7FC14F7EB07FE6D5AA26D7E1300808149
7F14BF9038031FE0496C7E130E90380C07F8496C7E133890383001FE496C7E13E04848EB7F8049
EB3FC03AFFFC03FFFEA227227FA12A>I<B538800FFEA2D807F8C712C015016C6C14806C6CEB03
005D6C6C13065D90387F801C90383FC0185D90381FE07090380FF06015E06D6C5A903803FD8014
FF6D90C7FC5C1300AC90381FFFF0A227227FA12A>I<003FB512E0A29038801FC0383E003F003C
14800038EB7F00485B5C1301386003FC5C130700005B495A131F5C133F495A91C7FC5B49136048
5A12035B000714E0485A5B001FEB01C013C0383F8003007F1307EB003FB6FCA21B227DA122>I<
EA07FC381FFF80383F0FC0EB07E0130314F0121E1200A213FF1207EA1FC3EA3F03127E12FCA4EA
7E07EB1DF8381FF8FF3807E07F18167E951B>97 D<B47EA2121FABEB8FE0EBBFF8EBF07CEBC01E
EB801FEC0F80A215C0A81580141F1500EBC03EEB607C381E3FF8381C0FC01A237EA21F>I<EBFF
80000713E0380F83F0EA1F03123E127E387C01E090C7FC12FCA6127C127EA2003E13306C136038
0FC0E03807FF803800FE0014167E9519>I<EB03FEA2EB007EABEA01FCEA07FF380F81FEEA1F00
003E137E127E127C12FCA8127CA27E001E13FEEA0F833907FF7FC0EA01FC1A237EA21F>I<13FE
3807FF80380F87C0381E01E0003E13F0EA7C0014F812FCA2B5FCA200FCC7FCA3127CA2127E003E
13186C1330380FC0703803FFC0C6130015167E951A>I<EB3F80EBFFC03801F3E0EA03E7EA07C7
120FEBC3C0EBC000A6EAFFFCA2EA0FC0B2EA7FFCA213237FA211>I<3801FE1F0007B51280380F
87E7EA1F03391E01E000003E7FA5001E5BEA1F03380F87C0EBFF80D819FEC7FC0018C8FC121CA2
381FFFE014F86C13FE80123F397C003F8048131F140FA3007CEB1F00007E5B381F80FC6CB45A00
0113C019217F951C>I<B47EA2121FABEB87E0EB9FF8EBB8FCEBE07CEBC07EA21380AE39FFF1FF
C0A21A237EA21F>I<120E121FEA3F80A3EA1F00120EC7FCA7EAFF80A2121FB2EAFFF0A20C247F
A30F>I<B47EA2121FABECFF80A2EC38005C14C0EB83800187C7FC138E139E13BE13FFEBDF80EB
8FC0A2EB87E0EB83F0A2EB81F8EB80FC147E39FFF1FFC0A21A237EA21E>107
D<EAFF80A2121FB3ADEAFFF0A20C237FA20F>I<3AFF87F00FE090399FFC3FF83A1FB87E70FC90
39E03EC07C9039C03F807EA201801300AE3BFFF1FFE3FFC0A22A167E952F>I<38FF87E0EB9FF8
381FB8FCEBE07CEBC07EA21380AE39FFF1FFC0A21A167E951F>I<13FE3807FFC0380F83E0381E
00F0003E13F848137CA300FC137EA7007C137CA26C13F8381F01F0380F83E03807FFC03800FE00
17167E951C>I<38FF8FE0EBBFF8381FF07CEBC03E497E1580A2EC0FC0A8EC1F80A2EC3F00EBC0
3EEBE0FCEBBFF8EB8FC00180C7FCA8EAFFF0A21A207E951F>I<EAFF1FEB3FC0381F67E013C7A3
EB83C0EB8000ADEAFFF8A213167E9517>114 D<EA07F3EA1FFFEA780FEA7007EAF003A26CC7FC
B4FC13F0EA7FFC6C7E6C7E120738003F80EAC00F130712E0A200F01300EAFC1EEAEFFCEAC7F011
167E9516>I<13C0A41201A212031207120F121FB5FCA2EA0FC0ABEBC180A51207EBE300EA03FE
C65A11207F9F16>I<38FF83FEA2381F807EAF14FEA2EA0F833907FF7FC0EA01FC1A167E951F>I<
39FFF01FE0A2390FC00600A2EBE00E0007130CEBF01C0003131813F800015BA26C6C5AA2EB7EC0
A2137F6D5AA26DC7FCA2130EA21B167F951E>I<3AFFE3FF87F8A23A1F807C00C0D80FC0EB0180
147E13E0000790387F030014DF01F05B00031486EBF18FD801F913CC13FB9038FF07DC6C14F8EB
FE03017E5BA2EB7C01013C5BEB380001185B25167F9528>I<39FFF07FC0A2390FC01C006C6C5A
6D5A6C6C5A00015B3800FD80017FC7FCA27F6D7E497E80EB67F013E33801C1F8380381FC48C67E
000E137E39FF81FFE0A21B167F951E>I<39FFF01FE0A2390FC00600A2EBE00E0007130CEBF01C
0003131813F800015BA26C6C5AA2EB7EC0A2137F6D5AA26DC7FCA2130EA2130CA25B1278EAFC38
13305BEA69C0EA7F80001FC8FC1B207F951E>I<387FFFF0A2387C07E038700FC0EA601F00E013
8038C03F005B13FEC65A1201485AEBF0301207EA0FE0EBC070EA1F80003F1360EB00E0EA7E03B5
FCA214167E9519>I E /Fe 48 124 df<90380FC3E090387FEFF09038E07C783801C0F8D80380
13303907007000A7B61280A23907007000B0387FE3FFA21D20809F1B>11
D<EB1F80EB7FC03801E0E0EA0381A2EA070190C7FCA6B512E0A2EA0700B0387FC3FEA21720809F
19>I<90380F80F890387FE7FE9038E06E063901C0FC0F380380F8380700F00270C7FCA6B7FCA2
3907007007B03A7FE3FE3FF0A22420809F26>14 D<127012F812FCA2127C120CA31218A2123812
3012601240060E7C9F0D>39 D<136013C0EA0180EA03005A12065A121C12181238A212301270A3
1260A212E0AC1260A21270A312301238A21218121C120C7E12077EEA0180EA00C013600B2E7DA1
12>I<12C012607E7E121C120C7E12077E1380A2120113C0A31200A213E0AC13C0A21201A31380
1203A213005A12065A121C12185A5A5A0B2E7DA112>I<127012F812FCA2127C120CA31218A212
38123012601240060E7C840D>44 D<EAFFC0A30A037F8A0F>I<127012F8A3127005057C840D>I<
EA03F0EA0FFCEA1E1EEA1C0E487E00781380EA7003A300F013C0AD00701380A3EA780700381300
EA1C0EEA1E1EEA0FFCEA03F0121F7E9D17>48 D<EA03F0487EEA1E1CEA380E7F1270EB038012F0
A214C0A5EA7007A2EA380F121CEA1FFBEA07F338000380A2130714001230EA780EA2EA701CEA30
78EA1FF0EA0FC0121F7E9D17>57 D<127012F8A312701200AA127012F8A3127005147C930D>I<
EA0FC0EA3FF0EA7078EA6038EAE03C12F0A212601200137813F013E0EA01C0138012031300A7C7
FCA51207EA0F80A3EA07000E207D9F15>63 D<EB0380A3497EA3EB0DE0A3EB18F0A3EB3078A349
7EA3EBE01E13C0EBFFFE487FEB800FA200031480EB0007A24814C01403EA0F8039FFE03FFEA21F
207F9F22>65 D<B512E014F83807803E80801580A515005C143E5CEBFFF880EB801E8015801407
15C0A51580140FEC1F00143EB512FC14F01A1F7E9E20>I<B512E014FC3807803E140FEC0780EC
03C015E0140115F01400A215F8A915F0A2140115E0A2EC03C0EC0780EC0F00143EB512FC14E01D
1F7E9E23>68 D<B6FCA23807801F140780A215801401A214C1A2ECC000A2138113FFA213811380
A491C7FCA8EAFFFEA2191F7E9E1E>70 D<39FFF8FFF8A23907800F00AC90B5FCA2EB800FAD39FF
F8FFF8A21D1F7E9E22>72 D<EAFFFCA2EA0780B3A9EAFFFCA20E1F7F9E10>I<39FF807FF813C0
0007EB07809038E00300A2EA06F0A21378133CA2131EA2130FA2EB078314C31303EB01E3A2EB00
F3A2147BA2143F80A280A2000F7FEAFFF0801D1F7E9E22>78 D<B512E014F83807807C141E141F
801580A515005C141E147CEBFFF814E00180C7FCACEAFFFCA2191F7E9E1F>80
D<007FB512E0A238780F010070130000601460A200E0147000C01430A400001400B23807FFFEA2
1C1F7E9E21>84 D<EA1FE0487EEA78387FEA300E1200A3EA03FE121FEA3E0E127812F800F01330
A3131E38783F70383FEFE0380F878014147E9317>97 D<120E12FEA2120EA9133FEBFF80380FC3
C0EB00E0000E13F014701478A7147014F0120FEB01E0EBC3C0380CFF80EB3E0015207F9F19>I<
EA03F8EA0FFCEA1E1E123CEA380CEA7800127012F0A612701278EA3803123CEA1F0EEA0FFCEA03
F010147E9314>I<EB0380133FA21303A9EA03E3EA0FFBEA1E0FEA3C07EA7803A2127012F0A612
70A2EA78071238EA1E1F380FFBF8EA03E315207E9F19>I<EA03F0EA0FFCEA1E1E487EEA380712
783870038012F0B5FCA200F0C7FCA31270127838380180EA1C03380F0700EA07FEEA01F811147F
9314>I<133C13FEEA01CFEA038F1306EA0700A7EAFFF0A2EA0700B0EA7FF0A21020809F0E>I<EB
01E03803E3F0380FFF70EA1C1C383C1E00EA380EEA780FA4EA380EEA3C1EEA1C1CEA3FF8EA33E0
0030C7FCA21238EA3FFE381FFF804813C0387003E0EB00F0481370A36C13F0387801E0383E07C0
380FFF00EA03FC141F7F9417>I<120E12FEA2120EA9133E13FF380FC380EB01C0A2120EAD38FF
E7FCA216207F9F19>I<121C121E123E121E121CC7FCA6120E127EA2120EAFEAFFC0A20A1F809E
0C>I<13E0EA01F0A3EA00E01300A61370EA07F0A212001370B3A21260EAF0E0EAF1C0EA7F80EA
3E000C28829E0E>I<120E12FEA2120EA9EB1FF0A2EB0F80EB0E00130C5B5B137013F0EA0FF813
38EA0E1C131E130E7F1480130314C038FFCFF8A215207F9F18>I<120E12FEA2120EB3A9EAFFE0
A20B20809F0C>I<390E3F03F039FEFF8FF839FFC1DC1C390F80F80EEB00F0000E13E0AD3AFFE7
FE7FE0A223147F9326>I<EA0E3EEAFEFF38FFC380380F01C0A2120EAD38FFE7FCA216147F9319>
I<EA01F8EA07FE381E0780383C03C0EA3801387000E0A200F013F0A6007013E0EA7801003813C0
EA3C03381E07803807FE00EA01F814147F9317>I<EA0E3F38FEFF8038FFC3C0380F01E0380E00
F0A21478A7147014F0120FEB01E0EBC3C0380EFF80EB3E0090C7FCA7EAFFE0A2151D7F9319>I<
EA0E78EAFEFCEAFF9EEA0F1E130C1300120EACEAFFE0A20F147F9312>114
D<EA1F90EA3FF0EA7070EAE030A3EAF0001278EA7F80EA3FE0EA0FF01200EAC0781338A212E0A2
EAF070EADFE0EA8F800D147E9312>I<1206A4120EA2121E123EEAFFF8A2EA0E00AA1318A5EA07
3013E0EA03C00D1C7F9B12>I<380E01C0EAFE1FA2EA0E01AC1303A2EA070FEBFDFCEA01F11614
7F9319>I<38FF87F8A2381E01E0000E13C01480A238070300A3EA0386A2138EEA01CCA213FC6C
5AA21370A315147F9318>I<39FF9FF3FCA2391C0780F01560ECC0E0D80E0F13C0130C14E00007
EBE180EB186114713903987300EBB033A2143F3801F03EEBE01EA20000131CEBC00C1E147F9321
>I<387FC7FCA2380703E0148038038300EA01C7EA00EE13EC13781338133C137C13EEEA01C713
8738030380380701C0000F13E038FF87FEA21714809318>I<38FF87F8A2381E01E0000E13C014
80A238070300A3EA0386A2138EEA01CCA213FC6C5AA21370A31360A35B12F0EAF18012F3007FC7
FC123C151D7F9318>I<EA3FFFA2EA380EEA301CEA703CEA6038137013F0EA01E013C0EA0380EA
0783EA0F03120EEA1C07EA3C061238EA701EEAFFFEA210147F9314>I<B512FCA21602808C17>I
E /Ff 10 118 df<1238127C12FEA3127C12381200A61238127C12FEA3127C123807147C930F>
58 D<B512FEECFFC03907F007F0EC01F86E7E157E81A2ED1F80A316C0A91680A3ED3F00A2157E
5D4A5AEC07F0B612C04AC7FC221F7E9E28>68 D<D8FFF0EC7FF86D14FF00071600D806FCEB01BF
A3017EEB033FA26D1306A290381F800CA390380FC018A2903807E030A2903803F060A3903801F8
C0A2903800FD80A2EC7F00A2143EA33BFFF01C07FFF8A22D1F7E9E32>77
D<EA01FE3807FF80381F0FC0123EA2127CEB030000FCC7FCA6127C127E003E1360003F13C0EA1F
813807FF00EA01FC13147E9317>99 D<3801FC3C3807FFFE380F07DEEA1E03003E13E0A5001E13
C0380F0780EBFF00EA19FC0018C7FCA2121C381FFF8014F06C13F8003F13FC387C007C0070133E
00F0131EA30078133CA2383F01F8380FFFE000011300171E7F931A>103
D<121C123F5AA37E121CC7FCA6B4FCA2121FB0EAFFE0A20B217EA00E>105
D<38FE0FC0EB3FE0381E61F0EBC0F8EA1F801300AD38FFE3FFA218147D931D>110
D<48B4FC000713C0381F83F0383E00F8A248137CA200FC137EA6007C137CA26C13F8A2381F83F0
3807FFC00001130017147F931A>I<EA0FE6EA3FFEEA701EEA600EEAE006A2EAF800EAFFC0EA7F
F8EA3FFCEA1FFE1203EA001FEAC007A212E0EAF006EAF81EEAFFFCEAC7F010147E9315>115
D<38FF07F8A2EA1F00AD1301A2EA0F073807FEFFEA03F818147D931D>117
D E /Fg 3 21 df<B612FCA21E027C8C27>0 D<EA03F0EA0FFC487E487E481380A2B512C0A86C
1380A26C13006C5A6C5AEA03F012147D9519>15 D<150C153C15F0EC03C0EC0F00143C14F0EB07
C0011FC7FC1378EA01E0EA0780001EC8FC127812E01278121EEA0780EA01E0EA0078131FEB07C0
EB00F0143C140FEC03C0EC00F0153C150C1500A8B612FCA21E277C9F27>20
D E /Fh 70 124 df<90380F83E090387FE7F09038F07E783801C0F8EA0380EC7000EA0700A8B6
12C0A23907007000B1397FE3FF80A21D2380A21C>11 D<EB0FC0EB3FE0EBF0703801C038380380
78A23807003091C7FCA7B512F8A2380700781438B0397FE1FF80A2192380A21B>I<EB0FF8133F
EBF078EA01C0EA03801438EA0700A8B512F8A238070038B1397FF3FF80A2192380A21B>I<9038
07E03F90393FF0FF809039F03BC1C03A01C01F00E03903803E01A23A07001C00C01600A7B712E0
A23907001C011500B03A7FF1FFCFFEA2272380A229>I<127012F812FCA2127C120CA41218A212
30A212601240060F7CA20E>39 D<1330136013C0EA0180EA03005A1206120E120C121C12181238
A212301270A3126012E0AE12601270A312301238A21218121C120C120E120612077EEA0180EA00
C0136013300C327DA413>I<12C012607E7E7E120E120612077E1380120113C0A2120013E0A313
601370AE136013E0A313C01201A21380120313005A1206120E120C5A5A5A5A0C327DA413>I<49
7EB0B612FEA23900018000B01F227D9C26>43 D<127012F812FCA2127C120CA41218A21230A212
601240060F7C840E>I<EAFFE0A30B037F8B10>I<127012F8A3127005057C840E>I<EB0180A213
031400A25B1306A2130E130CA2131C1318A313381330A213701360A213E05BA212015BA2120390
C7FCA25A1206A2120E120CA3121C1218A212381230A212701260A212E05AA211317DA418>I<EA
01F0EA07FCEA0E0E487E38380380A2007813C0EA7001A300F013E0AE007013C0A3EA7803003813
80A2381C0700EA0E0EEA07FCEA01F013227EA018>I<EA01801203120F12FF12F31203B3A8EAFF
FEA20F217CA018>I<EA03F0EA0FFCEA1C1F38300F80EA6007EB03C012C000F013E0EAF801A3EA
2003120014C0A2EB0780A2EB0F00131E131C5B5B5B485A485A38070060120E120C4813E04813C0
EA7FFFB5FCA213217EA018>I<EA03F0EA0FFCEA1C1F383007801270007813C0A21303EA380712
001480A2EB0F00130E133CEA03F8A2EA001E7FEB078014C0130314E01220127012F8A200F013C0
1260EB07801230381C1F00EA0FFCEA03F013227EA018>I<130EA2131EA2133EA2136E13EE13CE
1201138EEA030E12071206120E120C1218A212301270126012E0B512F8A238000E00A73801FFF0
A215217FA018>I<00101380EA1C07381FFF005B5B13F00018C7FCA613F8EA1BFEEA1F0F381C07
80EA180314C0EA000114E0A4126012F0A214C0EAC0031260148038300700EA1C1EEA0FFCEA03F0
13227EA018>I<137E48B4FC3803C180380701C0EA0E03121CEB018048C7FCA2127812701320EA
F1FCEAF3FEEAF60738FC038000F813C0130112F014E0A51270A3003813C0130300181380381C07
00EA0E0EEA07FCEA01F013227EA018>I<12601270387FFFE0A214C0EA600038E0018038C00300
A21306C65AA25BA25BA25BA213E0A3485AA51203A86C5A13237DA118>I<EA01F0EA07FCEA0E0F
38180780EA3803383001C01270A31278EB0380123E383F0700EA1FCEEA0FFCEA03F87FEA0F7F38
1C3F80EA380F387007C0130338E001E01300A5387001C0A238380380381E0F00EA0FFEEA03F013
227EA018>I<EA01F0EA07FCEA0E0E487E383803801278127038F001C0A314E0A5127013031278
EA3807EA1C0DEA0FF9EA07F1380081C0130113031480A2383007001278130EEA701C6C5AEA1FF0
EA0FC013227EA018>I<127012F8A312701200AB127012F8A3127005157C940E>I<127012F8A312
701200AB127012F8A312781218A41230A3126012E01240051F7C940E>I<B612FEA2C9FCA8B612
FEA21F0C7D9126>61 D<497E497EA3497EA3497E130CA2EB1CF8EB1878A2EB383C1330A2497EA3
497EA348B51280A2EB800739030003C0A30006EB01E0A3000EEB00F0001F130139FFC00FFFA220
237EA225>65 D<B512F814FE3907800F80EC07C0EC03E0140115F0A515E01403EC07C0EC0F8090
B512005C9038801F80EC07C0EC03E0EC01F0140015F8A6EC01F0140315E0EC0FC0B6120014FC1D
227EA123>I<90380FE01090383FF8309038F81C703801E0063903C003F03807800148C7FC121E
003E1470123C127C15301278A212F81500A700781430A2127CA2003C1460123E121E6C14C06C7E
3903C001803901E003003800F80EEB3FF8EB0FE01C247DA223>I<B512F014FE3807801FEC07C0
1403EC01E0EC00F015F81578157C153CA3153EA9153CA2157C1578A215F0EC01E01403EC07C0EC
1F00B512FE14F81F227EA125>I<B612C0A23807800F14031401140015E0A215601460A3150014
E0138113FFA2138113801460A21518A214001530A4157015F01401EC07E0B6FCA21D227EA121>
I<B612C0A23807800F14031401140015E0A21560A21460A21500A214E0138113FFA21381138014
60A491C7FCA8EAFFFEA21B227EA120>I<903807F00890383FFC189038FC0E383801E0033903C0
01F83807800048C71278121E15385AA2007C14181278A212F81500A6EC1FFF1278007CEB0078A2
123CA27EA27E6C7E6C6C13F83801F0013900FC079890383FFE08903807F80020247DA226>I<39
FFFC3FFFA239078001E0AD90B5FCA2EB8001AF39FFFC3FFFA220227EA125>I<EAFFFCA2EA0780
B3ACEAFFFCA20E227EA112>I<EAFFFEA2EA0780B3EC0180A41403A215005CA25C143FB6FCA219
227EA11E>76 D<D8FFC0EB03FF6D5B000715E0A2D806F0130DA301781319A36D1331A36D1361A3
6D13C1A29038078181A3903803C301A3EB01E6A3EB00FCA31478EA1F80D8FFF0EB3FFF14302822
7EA12D>I<39FF800FFF13C00007EB01F89038E000607F12061378A27F133E131E7FA2EB078014
C01303EB01E0A2EB00F01478A2143CA2141E140FA2EC07E0A214031401A2381F8000EAFFF01560
20227EA125>I<EB0FE0EB7FFCEBF83E3903E00F8039078003C0390F0001E0A2001EEB00F0003E
14F8003C1478007C147CA20078143CA200F8143EA9007C147CA3003C1478003E14F8001E14F06C
EB01E0EB80033907C007C03903E00F803900F83E00EB7FFCEB0FE01F247DA226>I<B512F014FC
3807803FEC0F801407EC03C0A215E0A515C0A2EC0780140FEC3F00EBFFFC14F00180C7FCADEAFF
FCA21B227EA121>I<B512E014F83807803E140F6E7E816E7EA64A5A5D4AC7FC143EEBFFF85CEB
80788080140E140FA481A3ED818015C114073AFFFC03E300EC01FEC8127C21237EA124>82
D<3803F020380FFC60381C0EE0EA3803EA7001A2EAE000A21460A36C1300A21278127FEA3FF0EA
1FFE6C7E0003138038003FC0EB07E01301EB00F0A2147012C0A46C136014E06C13C0EAF80138EF
038038C7FF00EA81FC14247DA21B>I<007FB512F8A2387C07800070143800601418A200E0141C
00C0140CA500001400B3A20003B5FCA21E227EA123>I<3BFFF03FFC07FEA23B0F0007C001F002
03EB00E01760D807806D13C0A33B03C007F001801406A216032701E00C781300A33A00F0183C06
A3903978383E0CEC301EA2161C90393C600F18A390391EC007B0A3010F14E0EC8003A36D486C5A
A32F237FA132>87 D<EA0FE0EA1FF8EA3C1C7FEA18071200A25BEA03FF120FEA3F07127C127812
F01418A2130F1278387C3FB8383FF3F0380FC3C015157E9418>97 D<120E12FEA2121E120EAAEB
1F80EB7FE0380FC0F0EB0078000E1338143C141C141EA7141C143C000F1338EB8070EBC1F0380C
7FC0EB1F0017237FA21B>I<EA01FEEA07FF380F0780121C383803000078C7FC127012F0A71278
14C07E381E0180380F0300EA07FEEA01F812157E9416>I<14E0130FA213011300AAEA03F0EA07
FEEA1F07EA3C01EA38001278127012F0A712701278EA3801EA3C03381E0EF0380FFCFEEA03F017
237EA21B>I<EA01FCEA07FF380F0780381C03C0EA3801007813E0EA7000B5FCA200F0C7FCA512
7814607E6C13C0380F83803807FF00EA00FC13157F9416>I<133C13FEEA01CFEA038FA2EA0700
A9EAFFF8A2EA0700B1EA7FF8A2102380A20F>I<14F03801F1F83807FFB8380F1F38381E0F00EA
1C07003C1380A5001C1300EA1E0FEA0F1EEA1FFCEA19F00018C7FCA2121CEA1FFF6C13C04813E0
383801F038700070481338A400701370007813F0381E03C0380FFF803801FC0015217F9518>I<
120E12FEA2121E120EAAEB1F80EB7FC0380FC1E0EB80F0EB0070120EAE38FFE7FFA218237FA21B
>I<121C121E123E121E121CC7FCA8120E12FEA2121E120EAFEAFFC0A20A227FA10E>I<EA01C0EA
03E0A3EA01C0C7FCA8EA01E0120FA212011200B3A4EA60C012F11380EA7F00123E0B2C82A10F>
I<120E12FEA2121E120EAAEB0FFCA2EB07E0EB0380EB0700130E13185B137813F8EA0F9C131EEA
0E0E7F1480EB03C0130114E014F038FFE3FEA217237FA21A>I<120E12FEA2121E120EB3ABEAFF
E0A20B237FA20E>I<390E1FC07F3AFE7FE1FF809039C0F303C03A1F807E01E0390F003C00000E
1338AE3AFFE3FF8FFEA227157F942A>I<380E1F8038FE7FC038FFC1E0381F80F0380F0070120E
AE38FFE7FFA218157F941B>I<EA01FCEA07FF380F0780381C01C0383800E0007813F000701370
00F01378A700701370007813F0003813E0381C01C0380F07803807FF00EA01FC15157F9418>I<
380E1F8038FE7FE038FFC1F0380F0078120E143CA2141EA7143CA2000F1378EB8070EBC1F0380E
7FC0EB1F0090C7FCA8EAFFE0A2171F7F941B>I<3801F060EA07FCEA1F06381C03E0EA3C01EA78
00A25AA712781301123C1303EA1F0EEA0FFCEA03F0C7FCA8EB0FFEA2171F7E941A>I<EA0E3CEA
FEFEEAFFCFEA1F8FEA0F061300120EADEAFFF0A210157F9413>I<EA0F88EA3FF8EA7078EAE038
1318A3EAF000127FEA3FE0EA1FF0EA01F8EA003CEAC01CA212E0A2EAF018EAF878EADFF0EA8FC0
0E157E9413>I<1206A5120EA3121E123EEAFFF8A2EA0E00AA130CA51308EA0718EA03F0EA01E0
0E1F7F9E13>I<000E137038FE07F0A2EA1E00000E1370AC14F01301380703783803FE7FEA01F8
18157F941B>I<38FFC3FEA2381E00F8000E1360A26C13C0A338038180A213C300011300A2EA00
E6A3137CA31338A217157F941A>I<39FF8FF9FFA2391E01C07CD81C031338000EEBE030A2EB06
600007EB7060A2130E39038C30C01438139C3901D81980141DA2EBF00F00001400A2497EEB6006
20157F9423>I<387FC1FFA2380780F8000313E03801C1C014803800E3001377133E133C131C13
3E13771367EBC3803801C1C0380380E0380700F0EA0F8038FFC1FFA2181580941A>I<38FFC3FE
A2381E00F8000E1360A26C13C0A338038180A213C300011300A2EA00E6A3137CA31338A21330A2
13701360A2EAF0C012F1EAF380007FC7FC123E171F7F941A>I<383FFFC0A2383C038038380700
EA300EEA701EEA603C13385BEA00F0485A3803C0C01380EA07005AEA1E01001C1380EA3803EA70
07B5FCA212157F9416>I<B512FEA21702808D18>I E /Fi 22 118 df<121C127FEAFF80A5EA7F
00121C09097B8813>46 D<13075B137FEA07FFB5FCA212F8C6FCB3AB007F13FEA317277BA622>
49 D<EBFF80000713F0001F13FC383F03FFD87C001380007FEB7FC0EAFF80EC3FE0A3141FEA7F
00001C133FC7FC15C0A2EC7F80A2ECFF00495A5CEB03F0495A495A495A90383E00E05B13789038
F001C0EA01C0EA038048B5FC5A5A5A481480B6FCA31B277DA622>I<EB7F803801FFF0000713FC
380F81FE381F80FF487E9038E07F80A5381FC0FFD807001300C7FC495AEB03F8495AEBFFC014F0
EB01FC6DB4FCEC7F8015C0143F15E0121EEA7F80A2EAFFC0A315C0147FD87F801380387E00FF6C
481300380FFFFC000313F0C613801B277DA622>I<14075C5C5C5C5CA25B5B497E130F130E131C
1338137013F013E0EA01C0EA0380EA07005A120E5A5A5A5AB612F8A3C71300A7017F13F8A31D27
7EA622>I<91393FF00180903903FFFE07010FEBFF8F90393FF007FF9038FF80014848C7127FD8
07FC143F49141F4848140F485A003F15075B007F1503A3484891C7FCAB6C7EEE0380A2123F7F00
1F15076C6C15006C6C5C6D141ED801FE5C6C6C6C13F890393FF007F0010FB512C0010391C7FC90
38003FF829297CA832>67 D<B712C0A33903FE003FED0FE015031501A21500A316F09138038070
A31600A21407140F90B5FCA3EBFE0F14071403A591C8FCA9B512FEA324297DA82B>70
D<91387FE003903903FFFC0F011FEBFF1F90397FF00FFF9038FF8001D803FEC7FC484880484880
4980485A003F815B007F81A3484891C7FCA90203B512F8A2EA7FC0DA00011300A2123F7F121F6C
7E7F6C7E6C6C5B3800FF8090387FF00F011FB5123F0103EBFC0F9039007FE0032D297CA836>I<
B512FEA300011300B3B1B512FEA317297FA81A>73 D<48B47E000F13F0381F81FC486C7E147FA2
EC3F80A2EA0F00C7FCA2EB0FFF90B5FC3807FC3FEA1FE0EA3F80127F130012FEA3147F7E6CEBFF
C0393F83DFFC380FFF0F3801FC031E1B7E9A21>97 D<EB1FF0EBFFFE3803F03F390FE07F80EA1F
C0EA3F80A2127F9038001E004890C7FCA97E7F003FEB01C013C0001F1303390FE007803903F01F
003800FFFCEB1FE01A1B7E9A1F>99 D<EC3FF8A31403ACEB1FE3EBFFFB3803F03F380FE00F381F
C007383F8003A2127F13005AA97EA2EA3F801407381FC00F380FE01F3A03F03FFF803800FFF3EB
3FC3212A7EA926>I<EB3FE03801FFF83803F07E380FE03F391FC01F80393F800FC0A2EA7F00EC
07E05AA390B5FCA290C8FCA47E7F003F14E01401D81FC013C0380FE0033903F81F803900FFFE00
EB1FF01B1B7E9A20>I<1207EA1FC013E0123FA3121F13C0EA0700C7FCA7EAFFE0A3120FB3A3EA
FFFEA30F2B7DAA14>105 D<3BFFC07F800FF0903AC1FFE03FFC903AC783F0F07E3B0FCE03F9C0
7F903ADC01FB803F01F8D9FF00138001F05BA301E05BAF3CFFFE1FFFC3FFF8A3351B7D9A3A>
109 D<38FFC07F9038C1FFC09038C787E0390FCE07F09038DC03F813F813F0A313E0AF3AFFFE3F
FF80A3211B7D9A26>I<EB3FE03801FFFC3803F07E390FC01F80391F800FC0003F14E0EB000748
14F0A34814F8A86C14F0A2393F800FE0A2001F14C0390FC01F803907F07F003801FFFC38003FE0
1D1B7E9A22>I<38FFE1FE9038E7FF809038FE07E0390FF803F8496C7E01E07F140081A2ED7F80
A9EDFF00A25DEBF0014A5A01F85B9038FE0FE09038EFFF80D9E1FCC7FC01E0C8FCA9EAFFFEA321
277E9A26>I<38FFC3F0EBCFFCEBDC7E380FD8FF13F85BA3EBE03C1400AFB5FCA3181B7E9A1C>
114 D<3803FE30380FFFF0EA3E03EA7800127000F01370A27E6C1300EAFFE013FE387FFFC06C13
E06C13F0000713F8C613FC1303130000E0137C143C7EA26C13787E38FF01F038F7FFC000C11300
161B7E9A1B>I<1370A413F0A312011203A21207381FFFF0B5FCA23807F000AD1438A73803F870
000113F03800FFE0EB1F8015267FA51B>I<39FFE03FF8A3000F1303B11407A2140F0007131F3A
03F03BFF803801FFF338003FC3211B7D9A26>I E /Fj 13 119 df<EB01E01303130F137FEA1F
FFB5FCA213BFEAE03F1200B3B0007FB512F0A41C2F7AAE29>49 D<913A03FF800380023FEBF007
49B5EAFC0F0107ECFF1F011F9038803FBF903A3FF80007FFD9FFE07F48497F48497F4890C8127F
4848153F49151F121F49150F123F5B007F1607A34992C7FC12FFAB127F7FEF0780A2123F7F001F
160F6D1600120F6D5D6C6C153E6C6D5C6C6D14FC6C6D495AD93FF8495A903A1FFF801FC0010790
B55A01014AC7FCD9003F13F80203138031337BB13C>67 D<EB7FF80003B5FC000F14C0391FE01F
F09038F007F88114036E7EEA0FE0EA07C0EA0100C7FCA2EB01FF133F3801FFF13807FE01EA1FF0
EA3FE0EA7FC0138012FF1300A3EB800314076C6C487E263FF03E13F8391FFFF87F0007EBF03FC6
EB801F25207E9F28>97 D<EB07FF017F13E048B512F83903FC03FC3807F807EA0FF0EA1FE0EA3F
C0EC03F8007FEB01F0903880004000FF1400AA6C7EA2003F141E7F001F143E6C6C137C6C6C13F8
3903FE03F06CB512E06C6C1380903807FC001F207D9F25>99 D<EB0FFE90387FFFC048B57E3903
FE0FF03907F801F848486C7E48487F4848137FA2007F80491480A212FFA290B6FCA30180C8FCA3
127FA27F003FEC07807F001F140F6C6CEB1F006C6C133E3903FF01FCC6EBFFF8013F13E0010790
C7FC21207E9F26>101 D<EA03C0EA0FF0487EA37F5BA36C5AEA03C0C8FCA8EA01F812FFA4120F
1207B3A4B51280A411337DB217>105 D<EA01F812FFA4120F1207B3B3A4B512C0A412327DB117>
108 D<2703F007F8EB0FF000FFD93FFFEB7FFE4A6DB5FC903CF1F03FC3E07F80903CF3C01FE780
3FC0260FF780EBEF0000079026000FFEEB1FE001FE5C495CA2495CB2B500C1B50083B5FCA44020
7D9F45>I<3903F007F800FFEB3FFF4A7F9039F1F03FC09039F3C01FE0380FF7800007496C7E13
FE5BA25BB2B500C1B51280A429207D9F2E>I<EB07FE90383FFFC090B512F03903FC03FC3907F0
00FE4848137F4848EB3F80003F15C0A24848EB1FE0A300FF15F0A8007F15E0A36C6CEB3FC0A26C
6CEB7F80000F15003907F801FE3903FE07FC6CB55AD8003F13C0D907FEC7FC24207E9F29>I<13
78A513F8A41201A212031207120F381FFFFEB5FCA33807F800AF140FA7141F3803FC1EEBFE3E38
01FFFC38007FF0EB1FC0182E7EAD20>116 D<D801F8EB03F000FFEB01FFA4000FEB001F000714
0FB1151FA2153F157F6C6C497E903AFE03EFFF806CB512CF6C6C130FEB0FFC29207D9F2E>I<B5
38803FFEA43A07F80003C06D1307000315806D130F000115006D5B6C141EA26D6C5AA2ECC07C01
3F1378ECE0F8011F5B14F1010F5B14F3903807FBC0A214FF6D5BA26D90C7FCA26D5AA2147CA227
207E9F2C>I E /Fk 19 117 df<1238127C12FEA212FF127F123B1203A41206A2120CA2121812
381270122008137B8611>44 D<1318133813F8120712FF12F81200B3AD487E387FFFF0A214287C
A71E>49 D<137F3801FFC0380781F0380E00F80018137C121E003F137EEB803EA3381F007E000E
137CC7FCA25C5C495AEB07C001FFC7FCA2EB01E06D7E147C80A280A21580123C127EB4FCA31500
485B007C133E00305B001C5B380F01F06CB45AC690C7FC19297EA71E>51
D<EB0FE0EB3FF0EBF8383801E00C3803803E0007137EEA0F00120E121E001C133C003C90C7FCA2
127C1278130438F87FC0EBFFF038F9807838FB003C00FE131C141E48131F805A1580A41278A312
7C003C1400A2001C131E121E000E5B6C5B3803C0F03801FFC06C6CC7FC19297EA71E>54
D<137F3801FFC03807C1E0380F0070001E7F001C133C003C131C48131EA200F87FA41580A41278
141F127C003C133F121C001E136F6C13CF3807FF8F0001130FD8001013001300A2141EA2121E00
3F5BA25C1470003E5B381801C0380E0780D807FEC7FCEA01F819297EA71E>57
D<1418143CA3147EA314FFA3903801BF80149FA29038030FC0A390380607E0A3496C7EA3496C7E
A3496C7EA2EB3FFF497F903860007EA2497FA20001158049131FA2000315C090C7120F487ED81F
C0EB1FE026FFF801B5FCA2282A7EA92D>65 D<02FF13100107EBE03090391FC0707090387E001C
01F8EB0EF048481303485A4848130148481300A248C812705A123E1630127E127CA200FC1500A8
4AB5FC127C007E90380007F01503123EA2123F7E6C7EA26C7E6C7E6C6C13076C7E017E131C9039
1FC07870903907FFE0100100EB8000282B7DA92F>71 D<D8FFF0913807FFC06D5C0007EEF80000
035E017C141BA36D1433A36D1463A26D6C13C3A3903907C00183A3903903E00303A2903801F006
A3903800F80CA3EC7C18A3EC3E30A2EC1F60A3EC0FC0A33907800780D80FC04A7ED8FFFC91B512
C06E5A32297EA837>77 D<EBFE013803FF83380781E7381E0077001C133F487F00787F127000F0
7FA280A27EA26C90C7FC127EEA7FC0EA3FFCEBFFC06C13F06C7F6C7F00017F38001FFF01011380
EB003F140F15C0140712C01403A37E1580A26C13076C14006C130E00EF5B38E3C07838C1FFF038
803FC01A2B7DA921>83 D<EA07FC381FFF80383E07C0383F01E06D7E1478121EC7FCA3EB0FF8EA
01FF3807F878EA1FC0EA3F00127CA2481460A314F8A2EA7C01393F077CC0391FFE3F803907F01F
001B1A7E991E>97 D<EB7FE03801FFF83807C07C380F00FC121E123E003C1378007C1300127812
F8A8127CA2003C130C123E6C1318380F80303807E0603801FFC038007F00161A7E991B>99
D<137E3803FF80380783E0380F00F0121E481378A2007C133C127812F8B512FCA200F8C7FCA512
78127C003C130C123E001E1318380F80303807E0603801FFC038007F00161A7E991B>101
D<EA078012FFA2120F1207ACEB83F8EB8FFCEB9C1EEBB00F9038E0078013C0A21380B139FFFCFF
FCA21E2A7FA921>104 D<120FEA1F80A213C01380A2EA0F00C7FCA8EA0780127FA2120F1207B3
A2EAFFF8A20D297FA811>I<EA078012FFA2120F1207B3B2EAFFFCA20E2A7FA911>108
D<380783F838FF8FFCEB9C1E380FB00F3907E0078013C0A21380B139FFFCFFFCA21E1A7F9921>
110 D<380787C038FF9FE0EBB9F0EA0FF1EA07E1EBC0E01400A25BAF7FEAFFFEA2141A7F9917>
114 D<3807F840381FFFC0EA3C07EA7003EA6001EAE000A36C1300127EEA7FF0EA3FFC6CB4FC00
07138038003FC0130738C001E013007EA36C13C0EAF80138FE078038C7FF00EA83F8131A7E9918
>I<487EA41203A31207A2120F123FB51280A238078000AD14C0A73803C180EBE300EA01FEEA00
7C12257FA417>I E /Fl 12 119 df<DBFFC01360020701F813E0023F13FE9139FFC01F01903A
03FE0007C3D907F8EB01E7D91FE0EB00F74948143F4948141F49C8FC4848150F48481507491503
120748481501A2485A1700123F5B1860127FA348481600AD6C7E1860A2123FA27F001F17E018C0
6C7E17016C6C1680000316037F6C6CED07006C6C150E6D6C141E6D6C5C6D6C5CD907F85CD903FE
EB03E0903A00FFC01F8091263FFFFEC7FC020713F8020013C0333D7BBB3E>67
D<EB3FC03801FFF83807C07E390E001F80001E6D7E393F8007E013C06E7EA26E7EEA1F80EA0F00
C7FCA4141FEB07FFEB3FF9EBFF01EA03F8EA07F0EA1FE013C0EA3F80EA7F00A248150C5AA31403
A26C13076C130E3A3F800C7C183A1FC03C7E383A0FE0703FF03A03FFE01FE03A007F800F802628
7CA62B>97 D<EB03FE90381FFFC090387E01F09038F800384848133C484813FE00071301EA0FC0
EA1F80A2003FEB00FC90C71278481400A2127E12FEAA127E127FA215037E6D1307001F14066C6C
130E150C6C6C131C6C6C1338C66C13F090387E03C090381FFF00EB03FC20287DA626>99
D<EB03FCEB1FFF90387E07C09038F803F03903F001F848486C7E49137C000F147E4848133E003F
143F90C77EA2481580A2127E12FEA2B7FCA248C9FCA6127E127FA26CEC0180A26C6C130316006C
6C5B6C6C130E0003140C6C6C133CD800FC137090383F01E090380FFF80D901FEC7FC21287EA626
>101 D<EA01C0EA07F0487EA56C5AEA01C0C8FCABEA01F8127FA312071201B3AB487EB512E0A3
133A7FB917>105 D<EA01F812FFA312071201B3B3AF487EB512F0A3143C7FBB17>108
D<2701F803F8EB03F800FFD91FFFEB1FFF913B7C0FC07C0FC0913BE007E0E007E03C07F9C003E1
C0032601FB80D9F3807FD9FF00EBF70049D901FE6D7EA2495CA3495CB3A4486C496C497EB500F0
B500F0B512F0A344267EA549>I<3901F807F800FFEB1FFE9138781F809138E00FC03A07F9C007
E03801FB80EBFF00496D7EA25BA35BB3A4486C497EB500F1B512E0A32B267EA530>I<EB01FE90
380FFFC090383F03F09038F8007C48487F48487F4848EB0F804848EB07C0A248C7EA03E04815F0
A3007EEC01F8A300FE15FCA9007E15F8A2007F14036C15F0A26C15E06D1307000F15C06C6CEB0F
806C6CEB1F006C6C133E6C6C5B90383F03F090380FFFC0D901FEC7FC26287EA62B>I<1318A513
38A41378A213F8A2120112031207001FB5FCB6FCA2D801F8C7FCB2EC0180AA3800FC031500137C
EB7E07EB3F0EEB0FFCEB03F019367EB421>116 D<D801F8EB03F000FFEB01FFA30007EB000F00
011403B3A51507A3150F12006D131F017CEB3BFC017E903873FFE090381F81E390380FFF83903A
01FE03F0002B277EA530>I<B538801FFFA33A07FC0007F86C48EB03E0ED01C0120116806C6CEB
0300A3017E1306A2017F130E6D130CA26D6C5AA2ECC038010F1330A26D6C5AA36D6C5AA214F901
015BA26DB4C7FCA3147EA2143CA3141828267EA42D>I E end
%%EndProlog
%%BeginSetup
%%Feature: *Resolution 300
TeXDict begin
%%EndSetup
%%Page: 1 1
bop 477 509 a Fl(Collecti)q(v)o(e)31 b(Comm)n(unication)864
656 y Fk(Al)20 b(Geist)843 731 y(Marc)g(Snir)772 848 y(Marc)n(h)h(16,)e(1993)
164 1054 y Fj(1)83 b(Collectiv)n(e)25 b(Comm)n(unication)164
1178 y Fi(1.1)70 b(In)n(tro)r(duction)164 1270 y Fh(This)17
b(section)f(is)h(a)g(draft)g(of)g(the)g(curren)o(t)e(prop)q(osal)k(for)e
(collectiv)o(e)d(comm)o(uni)o(cation.)164 1330 y(Collectiv)o(e)20
b(comm)o(unication)g(is)i(de\014ned)h(to)g(b)q(e)f(comm)o(unication)e(that)j
(in)o(v)o(olv)o(es)e(a)164 1391 y(group)h(of)e(pro)q(cesses.)35
b(Examples)20 b(are)g(broadcast)i(and)f(global)g(sum.)34 b(A)20
b(collectiv)o(e)164 1451 y(op)q(eration)c(is)e(executed)g(b)o(y)g(ha)o(ving)h
(all)g(pro)q(cesses)g(in)g(the)f(group)i(call)e(the)h(comm)o(uni-)164
1511 y(cation)c(routine,)g(with)g(matc)o(hing)e(parameters.)19
b(Routines)11 b(can)g(\(but)g(are)g(not)g(required)164 1571
y(to\))19 b(return)g(as)g(so)q(on)h(as)f(their)f(participation)h(in)f(the)h
(collectiv)o(e)d(comm)o(uni)o(cation)g(is)164 1631 y(complete.)28
b(The)20 b(completion)d(of)j(a)g(call)f(indicates)f(that)i(the)g(caller)e(is)
h(no)o(w)h(free)f(to)164 1692 y(access)d(the)g(lo)q(cations)g(in)g(the)g
(comm)o(unic)o(ation)e(bu\013er,)i(or)g(an)o(y)g(other)g(lo)q(cation)g(that)
164 1752 y(can)h(b)q(e)g(referenced)e(b)o(y)i(the)f(collectiv)o(e)e(op)q
(eration.)24 b(Ho)o(w)o(ev)o(er,)15 b(it)h(do)q(es)i(not)f(indicate)164
1812 y(that)i(other)g(pro)q(cesses)g(in)f(the)g(group)i(ha)o(v)o(e)e(started)
g(the)h(op)q(eration)g(\(unless)g(other-)164 1872 y(wise)e(indicated)g(in)g
(the)g(description)g(of)h(the)g(op)q(eration\).)26 b(Ho)o(w)o(ev)o(er,)15
b(the)j(successful)164 1932 y(completion)d(of)i(a)g(collectiv)o(e)c(comm)o
(unication)h(call)i(ma)o(y)f(dep)q(end)h(on)i(the)e(execution)164
1993 y(of)h(a)f(matc)o(hing)f(call)g(at)i(all)f(pro)q(cesses)g(in)g(the)g
(group.)237 2053 y(The)h(syn)o(tax)f(and)h(seman)o(tics)e(of)i(the)g
(collectiv)o(e)c(op)q(erations)18 b(are)f(de\014ned)f(so)h(as)h(to)164
2113 y(b)q(e)c(consisten)o(t)f(with)h(the)f(syn)o(tax)h(and)g(seman)o(tics)e
(of)i(the)f(p)q(oin)o(t)h(to)g(p)q(oin)o(t)g(op)q(erations.)237
2173 y(The)24 b(reader)f(is)h(referred)e(to)j(the)e(p)q(oin)o(t-to-p)q(oin)o
(t)i(comm)o(unic)o(ation)c(section)j(of)164 2233 y(the)16 b(curren)o(t)f(MPI)
g(draft)h(for)g(information)f(concerning)g(groups)i(\(ak)m(a)g(con)o(texts\))
e(and)164 2293 y(group)i(formation)f(op)q(erations,)h(and)g(for)g(general)f
(information)f(on)i(t)o(yp)q(es)g(of)f(ob)s(jects)164 2354
y(used)g(b)o(y)g(the)g(MPI)g(library)l(.)237 2414 y(The)f(collectiv)o(e)d
(comm)o(unic)o(ation)h(routines)h(are)h(built)f(ab)q(o)o(v)o(e)h(the)g(p)q
(oin)o(t-to-p)q(oin)o(t)164 2474 y(routines.)35 b(While)20
b(v)o(endors)h(ma)o(y)f(optimize)e(certain)i(collectiv)o(e)e(routines)j(for)h
(their)961 2599 y(1)p eop
%%Page: 2 2
bop 164 307 a Fh(arc)o(hitectures,)22 b(a)h(complete)c(library)j(of)g(the)g
(collectiv)o(e)e(comm)o(unic)o(ation)g(routines)164 367 y(can)11
b(b)q(e)g(written)g(en)o(tirely)d(using)k(p)q(oin)o(t-to-p)q(oin)o(t)g(comm)o
(unic)o(ation)d(functions.)19 b(W)l(e)11 b(are)164 428 y(using)16
b(naiv)o(e)e(impleme)o(n)o(tations)f(of)j(the)f(collectiv)o(e)e(calls)i(in)g
(terms)f(of)h(p)q(oin)o(t)h(to)g(p)q(oin)o(t)164 488 y(op)q(erations)h(in)f
(order)h(to)f(pro)o(vide)g(an)g(op)q(erational)h(de\014nition)f(of)h(their)e
(seman)o(tics.)237 548 y(The)h(follo)o(wing)g(comm)o(unication)d(functions)k
(are)f(prop)q(osed.)237 640 y Fg(\017)24 b Fh(Broadcast)17
b(from)e(one)h(mem)o(b)q(er)d(to)k(all)f(mem)n(b)q(ers)e(of)i(a)h(group.)237
739 y Fg(\017)24 b Fh(Barrier)15 b(across)i(all)f(group)h(mem)o(b)q(ers)237
838 y Fg(\017)24 b Fh(Gather)17 b(data)g(from)e(all)h(group)h(mem)n(b)q(ers)d
(to)j(one)f(mem)o(b)q(er.)237 936 y Fg(\017)24 b Fh(Scatter)16
b(data)h(from)e(one)i(mem)n(b)q(er)d(to)i(all)g(mem)o(b)q(ers)d(of)k(a)f
(group.)237 1035 y Fg(\017)24 b Fh(Global)13 b(op)q(erations)g(suc)o(h)g(as)g
(sum,)f(max,)f(min,)g(etc.,)h(where)g(the)g(result)g(is)h(kno)o(wn)286
1095 y(b)o(y)f(all)g(group)i(mem)o(b)q(ers)c(and)j(a)g(v)m(ariation)g(where)g
(the)f(result)g(is)h(kno)o(wn)g(b)o(y)f(only)286 1155 y(one)k(mem)o(b)q(er.)i
(The)f(abilit)o(y)d(to)j(ha)o(v)o(e)e(user)i(de\014ned)f(global)g(op)q
(erations.)237 1254 y Fg(\017)24 b Fh(Sim)o(ultaneous)e(shift)i(of)g(data)h
(around)g(the)e(group,)k(the)c(simplest)f(example)286 1314
y(b)q(eing)16 b(all)g(mem)o(b)q(ers)d(sending)k(their)e(data)i(to)g
(\(rank+1\))g(with)f(wrap)h(around.)237 1413 y Fg(\017)24 b
Fh(Scan)16 b(across)i(all)d(mem)o(b)q(ers)f(of)i(a)h(group)g(\(also)g(called)
e(parallel)g(pre\014x\).)237 1512 y Fg(\017)24 b Fh(Broadcast)17
b(from)e(all)h(mem)n(b)q(ers)e(to)j(all)e(mem)o(b)q(ers)e(of)k(a)g(group.)237
1610 y Fg(\017)24 b Fh(Scatter)18 b(data)h(from)e(all)g(mem)o(b)q(ers)e(to)k
(all)e(mem)o(b)q(ers)e(of)k(a)f(group)h(\(also)g(called)286
1670 y(complete)14 b(exc)o(hange)h(or)i(index\).)237 1763 y(T)l(o)24
b(simplify)d(the)i(collectiv)o(e)e(comm)o(unic)o(ation)g(in)o(terface)h(it)h
(is)h(designed)f(with)164 1823 y(t)o(w)o(o)18 b(la)o(y)o(ers.)28
b(The)18 b(lo)o(w)h(lev)o(el)d(routines)j(ha)o(v)o(e)e(all)h(the)h(generalit)
o(y)e(of,)i(and)g(mak)o(e)e(use)164 1883 y(of,)i(the)g(bu\013er)g(descriptor)
f(routines)h(of)g(the)g(p)q(oin)o(t-to-p)q(oin)o(t)g(section)g(whic)o(h)f
(allo)o(ws)164 1944 y(arbitrarily)h(complex)e(messages)j(to)g(b)q(e)g
(constructed.)32 b(The)19 b(second)h(lev)o(el)e(routines)164
2004 y(are)f(similar)e(to)i(the)f(upp)q(er)i(lev)o(el)c(p)q(oin)o(t-to-p)q
(oin)o(t)k(routines)f(in)g(that)g(they)f(send)h(only)164 2064
y(a)g(con)o(tiguous)f(bu\013er.)164 2233 y Ff(Missing:)237
2293 y Fe(The)h(curren)o(t)f(draft)g(do)q(es)h(not)f(include)j(the)e(non)o
(blo)q(c)o(king)h(collectiv)o(e)g(comm)o(unication)164 2354
y(calls)e(that)f(w)o(ere)g(discussed)i(at)d(the)i(last)f(meeting.)961
2599 y Fh(2)p eop
%%Page: 3 3
bop 164 307 a Fi(1.2)70 b(Group)24 b(F)-6 b(unctions)164 400
y Fh(The)15 b(p)q(oin)o(t)g(to)g(p)q(oin)o(t)g(do)q(cumen)o(t)e(discusses)i
(the)g(use)g(of)g(groups)h(\(ak)m(a)f(con)o(texts\),)f(and)164
460 y(describ)q(es)f(the)g(op)q(erations)i(a)o(v)m(ailable)e(for)h(the)f
(creation)g(and)h(manipulation)f(of)h(groups)164 520 y(and)j(group)g(ob)s
(jects.)k(F)l(or)16 b(sak)o(e)g(of)h(completeness,)d(w)o(e)h(list)h(them)f
(anew)h(here.)164 640 y Fd(MPI)p 279 640 17 2 v 20 w(CREA)-5
b(TE\(handle,)20 b(t)n(yp)r(e,)d(p)r(ersistence\))164 700 y
Fh(Create)f(new)g(opaque)h(ob)s(ject)164 799 y Fd(OUT)i(handle)25
b Fh(handle)16 b(to)h(ob)s(ject)164 900 y Fd(IN)h(t)n(yp)r(e)24
b Fh(state)16 b(v)m(alue)g(that)h(iden)o(ti\014es)e(the)h(t)o(yp)q(e)g(of)g
(ob)s(ject)g(to)g(b)q(e)h(created)164 1000 y Fd(IN)h(p)r(ersistence)24
b Fh(state)16 b(v)m(alue;)g(either)f Fc(MPI)p 1020 1000 16
2 v 17 w(PERSISTENT)e Fh(or)j Fc(MPI)p 1447 1000 V 18 w(EPHEMERAL)o
Fh(.)164 1159 y Fd(MPI)p 279 1159 17 2 v 20 w(FREE\(handle\))164
1219 y Fh(Destro)o(y)g(ob)s(ject)g(asso)q(ciated)h(with)f(handle.)164
1317 y Fd(IN)i(handle)26 b Fh(handle)16 b(to)g(ob)s(ject)164
1476 y Fd(MPI)p 279 1476 V 20 w(ASSOCIA)-5 b(TED\(handle,)21
b(t)n(yp)r(e\))164 1536 y Fh(Returns)h(the)f(t)o(yp)q(e)h(of)g(the)f(ob)s
(ject)h(the)f(handle)h(is)g(curren)o(tly)e(asso)q(ciated)j(with,)f(if)164
1597 y(suc)o(h)15 b(exists.)20 b(Returns)c(the)f(sp)q(ecial)g(t)o(yp)q(e)f
Fc(MPI)p 1041 1597 16 2 v 18 w(NULL)g Fh(if)g(the)h(handle)h(is)f(not)h
(curren)o(tly)164 1657 y(asso)q(ciated)h(with)f(an)o(y)g(ob)s(ject.)164
1755 y Fd(IN)i(handle)26 b Fh(handle)16 b(to)g(ob)s(ject)164
1856 y Fd(OUT)j(t)n(yp)r(e)k Fh(state)164 2014 y Fd(MPI)p 279
2014 17 2 v 20 w(COPY)p 461 2014 V 22 w(CONTEXT\(new)n(con)n(text,)18
b(con)n(text\))237 2135 y Fh(Create)f(a)g(new)f(con)o(text)g(that)h(includes)
e(all)h(pro)q(cesses)i(in)e(the)g(old)h(con)o(text.)k(The)164
2195 y(rank)c(of)f(the)g(pro)q(cesses)h(in)f(the)h(previous)f(con)o(text)f
(is)h(preserv)o(ed.)21 b(The)16 b(call)g(m)o(ust)f(b)q(e)164
2255 y(executed)j(b)o(y)i(all)f(pro)q(cesses)i(in)e(the)h(old)g(con)o(text.)
31 b(It)19 b(is)h(a)g(blo)q(c)o(king)f(call:)28 b(No)20 b(call)164
2315 y(returns)c(un)o(til)f(all)h(pro)q(cesses)h(ha)o(v)o(e)e(called)h(the)g
(function.)164 2414 y Fd(OUT)j(new)n(con)n(text)24 b Fh(handle)12
b(to)h(newly)f(created)g(con)o(text.)19 b(The)13 b(handle)f(should)h(not)286
2474 y(b)q(e)j(asso)q(ciated)i(with)e(an)g(ob)s(ject)g(b)q(efore)g(the)g
(call.)961 2599 y(3)p eop
%%Page: 4 4
bop 164 307 a Fd(IN)18 b(con)n(text)24 b Fh(handle)16 b(to)h(old)f(con)o
(text)164 463 y Fd(MPI)p 279 463 17 2 v 20 w(NEW)p 438 463
V 20 w(CONTEXT\(new)n(con)n(text,)i(con)n(text,)g(k)n(ey)-5
b(,)18 b(index\))164 524 y Fh(A)13 b(new)h(con)o(text)f(is)h(created)f(for)i
(eac)o(h)e(distinct)g(v)m(alue)h(of)g Fc(key)p Fh(;)f(this)h(con)o(text)f(is)
g(shared)164 584 y(b)o(y)20 b(all)g(pro)q(cesses)h(that)g(made)e(the)i(call)e
(with)i(this)f(k)o(ey)f(v)m(alue.)34 b(Within)20 b(eac)o(h)g(new)164
644 y(con)o(text)c(the)g(pro)q(cesses)i(are)f(rank)o(ed)f(according)h(to)g
(the)g(order)f(of)h(the)g Fc(index)e Fh(v)m(alues)164 704 y(they)j(pro)o
(vided;)h(in)g(case)f(of)h(ties,)g(pro)q(cesses)g(are)g(rank)o(ed)f
(according)h(to)h(their)e(rank)164 764 y(in)g(the)g(old)g(con)o(text.)26
b(This)18 b(call)f(is)h(blo)q(c)o(king:)24 b(No)18 b(call)g(returns)g(un)o
(til)f(all)g(pro)q(cesses)164 825 y(in)f(the)g(old)g(con)o(text)g(executed)e
(the)i(call.)164 920 y Fd(OUT)j(new)n(con)n(text)24 b Fh(handle)13
b(to)g(newly)g(created)f(con)o(text)g(at)i(calling)e(pro)q(cess.)21
b(This)286 981 y(handle)16 b(should)h(not)g(b)q(e)f(asso)q(ciated)h(with)f
(an)h(ob)s(ject)f(b)q(efore)g(the)g(call.)164 1080 y Fd(IN)i(con)n(text)24
b Fh(handle)16 b(to)h(old)f(con)o(text)164 1180 y Fd(IN)i(k)n(ey)24
b Fh(in)o(teger)164 1280 y Fd(IN)18 b(index)25 b Fh(in)o(teger)164
1436 y Fd(MPI)p 279 1436 V 20 w(RANK\(rank,)18 b(con)n(text\))164
1496 y Fh(Return)e(the)g(rank)g(of)h(the)f(calling)f(pro)q(cess)i(within)f
(the)g(sp)q(eci\014ed)g(con)o(text.)164 1592 y Fd(OUT)j(rank)24
b Fh(in)o(teger)164 1692 y Fd(IN)18 b(con)n(text)24 b Fh(con)o(text)15
b(handle)164 1848 y Fd(MPI)p 279 1848 V 20 w(SIZE\(size,)k(con)n(text\))164
1909 y Fh(Return)d(the)g(n)o(um)o(b)q(er)e(of)j(pro)q(cesses)g(that)f(b)q
(elong)h(to)g(the)f(sp)q(eci\014ed)g(con)o(text.)164 2005 y
Fd(OUT)j(size)24 b Fh(in)o(teger)164 2104 y Fd(IN)18 b(con)n(text)24
b Fh(con)o(text)15 b(handle)164 2233 y Fd(Extensions)49 b Fh(P)o(ossible)15
b(extensions)h(for)h(dynamic)d(pro)q(cess)j(spa)o(wning)g(\(MPI2\):)164
2354 y Fd(MPI)p 279 2354 V 20 w(PR)n(OCESS\(pro)r(cess,)i(con)n(text,)e
(rank\))164 2414 y Fh(Returns)f(a)g(handle)g(to)h(the)e(pro)q(cess)i(iden)o
(ti\014ed)d(b)o(y)i(the)g Fc(rank)e Fh(and)j Fc(context)c Fh(param-)164
2474 y(eters.)961 2599 y(4)p eop
%%Page: 5 5
bop 164 307 a Fd(OUT)19 b(pro)r(cess)k Fh(handle)17 b(to)f(pro)q(cess)h(ob)s
(ject)164 407 y Fd(IN)h(con)n(text)24 b Fh(handle)16 b(to)h(con)o(text)e(ob)s
(ject)164 507 y Fd(IN)j(rank)25 b Fh(in)o(teger)164 664 y Fd(MPI)p
279 664 17 2 v 20 w(CREA)-5 b(TE)p 531 664 V 21 w(CONTEXT\(new)n(con)n(text,)
18 b(list)p 1242 664 V 22 w(of)p 1309 664 V 20 w(pro)r(cess)p
1508 664 V 19 w(handles\))164 724 y Fh(creates)h(a)h(new)f(con)o(text)g(out)g
(of)h(an)g(explicit)d(list)i(of)h(mem)n(b)q(ers)d(and)j(ranks)f(them)f(in)164
784 y(their)d(order)i(of)f(o)q(ccurrence)g(in)g(the)g(list.)164
881 y Fd(OUT)j(new)n(con)n(text)24 b Fh(handle)16 b(to)g(newly)f(created)h
(con)o(text.)k(Handle)15 b(should)h(not)h(b)q(e)286 941 y(asso)q(ciated)g
(with)f(an)h(ob)s(ject)f(b)q(efore)g(the)g(call.)164 1041 y
Fd(IN)i(list)p 324 1041 V 22 w(of)p 391 1041 V 20 w(pro)r(cess)p
590 1041 V 19 w(handles)26 b Fh(List)17 b(of)g(handles)g(to)g(pro)q(cesses)h
(to)f(b)q(e)g(included)f(in)286 1101 y(new)g(group.)237 1198
y(This,)i(coupled)f(with)h(a)g(mec)o(hanism)d(for)j(requiring)f(the)h(spa)o
(wning)g(of)g(new)g(pro-)164 1258 y(cesses)i(to)h(the)g(computation,)f(will)f
(allo)o(w)i(to)g(create)f(a)h(new)f(all)g(inclusiv)o(e)f(con)o(text)164
1318 y(that)e(includes)e(the)h(additional)g(pro)q(cesses.)164
1462 y Fi(1.3)70 b(Comm)n(unicati)o(on)21 b(F)-6 b(unctions)164
1554 y Fh(The)23 b(prop)q(osed)i(comm)o(uni)o(cation)c(functions)i(are)g
(divided)f(in)o(to)h(t)o(w)o(o)g(la)o(y)o(ers.)41 b(The)164
1614 y(lo)o(w)o(est)12 b(lev)o(el)e(uses)j(the)f(same)f(bu\013er)i
(descriptor)f(ob)s(jects)g(a)o(v)m(ailable)g(in)g(p)q(oin)o(t-to-p)q(oin)o(t)
164 1675 y(to)20 b(create)f(noncon)o(tiguous,)i(m)o(ultiple)c(data)j(t)o(yp)q
(e)f(messages.)31 b(The)20 b(second)g(lev)o(el)d(is)164 1735
y(similar)10 b(to)j(the)f(blo)q(c)o(k)g(send/receiv)o(e)e(p)q(oin)o(t-to-p)q
(oin)o(t)j(op)q(erations)h(in)e(that)h(it)e(supp)q(orts)164
1795 y(only)k(con)o(tiguous)g(bu\013ers)g(of)h(arithmetic)c(storage)k(units.)
21 b(F)l(or)15 b(eac)o(h)f(comm)o(unicati)o(on)164 1855 y(op)q(eration,)j(w)o
(e)e(list)h(these)g(t)o(w)o(o)g(lev)o(els)e(of)j(calls)e(together.)164
1984 y Fd(1.3.1)55 b(Sync)n(hronization)164 2077 y(Barrier)19
b(sync)n(hronization)164 2137 y(MPI)p 279 2137 V 20 w(BARRIER\()f(group,)h
(tag)f(\))237 2257 y Fh(MPI)p 336 2257 15 2 v 17 w(BARRIER)e(blo)q(c)o(ks)h
(the)f(calling)h(pro)q(cess)g(un)o(til)f(all)h(group)h(mem)n(b)q(ers)d(ha)o
(v)o(e)164 2317 y(called)i(it;)h(the)g(call)f(returns)h(at)h(an)o(y)f(pro)q
(cess)g(only)g(after)g(all)g(group)h(mem)n(b)q(ers)d(ha)o(v)o(e)164
2377 y(en)o(tered)f(the)h(call.)164 2474 y Fd(IN)i(group)25
b Fh(group)17 b(handle)961 2599 y(5)p eop
%%Page: 6 6
bop 164 307 a Fd(IN)h(tag)25 b Fh(comm)o(unication)13 b(tag)k(\(in)o(teger\))164
469 y Fc(MPI)p 245 469 16 2 v 17 w(BARRIER\()23 b(group,)g(tag)i(\))164
529 y Fh(is)164 631 y Fc(MPI_CREATE)o(\(bu)o(ff)o(er_)o(han)o(dle)o(,)d
(MPI_BUFFER,)g(MPI_PERSIS)o(TEN)o(T\);)164 691 y(MPI_SIZE\()g(&size,)i
(group\);)164 751 y(MPI_RANK\()e(&rank,)i(group\);)164 812
y(if)h(\(rank==0\))164 872 y({)241 932 y(for)f(\(i=1;)g(i)h(<)h(size;)e
(i++\))318 992 y(MPI_RECV\()o(buf)o(fer)o(_ha)o(nd)o(le,)e(i,)j(tag,)f
(group\);)241 1052 y(for)g(\(i=1;)g(i)h(<)h(size;)e(i++\))318
1112 y(MPI_SEND\()o(buf)o(fer)o(_ha)o(nd)o(le,)e(i,)j(tag,)f(group\);)164
1173 y(})164 1233 y(else)164 1293 y({)241 1353 y(MPI_SEND\(b)o(uf)o(fer)o
(_ha)o(ndl)o(e,)e(0,)j(tag,)f(group\);)241 1413 y(MPI_RECV\(b)o(uf)o(fer)o
(_ha)o(ndl)o(e,)e(0,)j(tag,)f(group\);)164 1474 y(})164 1534
y(MPI_FREE\(b)o(uff)o(er)o(_ha)o(ndl)o(e\);)164 1664 y Fd(1.3.2)55
b(Data)19 b(mo)n(v)n(e)g(functions)164 1756 y(Circular)h(shift)164
1816 y(MPI)p 279 1816 17 2 v 20 w(CSHIFT\()f(in)n(buf,)h(outbuf,)f(tag,)f
(group,)h(shift\))237 1937 y Fh(Pro)q(cess)13 b(with)g(rank)f
Fc(i)h Fh(sends)f(the)h(data)g(in)f(its)g(input)h(bu\013er)f(to)h(pro)q(cess)
g(with)g(rank)164 1997 y(\()p Fc(i)8 b Fh(+)g Fc(shift)p Fh(\))k(mo)q(d)h
Fc(group)p 664 1997 16 2 v 17 w(size)p Fh(,)g(who)j(receiv)o(es)c(the)j(data)
h(in)e(its)h(output)g(bu\013er.)21 b(All)164 2057 y(pro)q(cesses)g(mak)o(e)e
(the)h(call)f(with)i(the)f(same)f(v)m(alues)i(for)g Fc(tag,)j(group)p
Fh(,)19 b(and)i Fc(shift)p Fh(.)164 2117 y(The)16 b Fc(shift)f
Fh(v)m(alue)h(can)g(b)q(e)g(p)q(ositiv)o(e,)f(zero,)h(or)h(negativ)o(e.)164
2231 y Fd(IN)h(in)n(buf)26 b Fh(handle)16 b(to)h(input)f(bu\013er)h
(descriptor)164 2333 y Fd(OUT)i(outbuf)25 b Fh(handle)16 b(to)g(output)h
(bu\013er)g(descriptor)164 2435 y Fd(IN)h(tag)25 b Fh(op)q(eration)17
b(tag)g(\(in)o(teger\))961 2599 y(6)p eop
%%Page: 7 7
bop 164 307 a Fd(IN)18 b(group)25 b Fh(handle)16 b(to)h(group)164
409 y Fd(IN)h(shift)25 b Fh(in)o(teger)164 583 y Fd(MPI)p 279
583 17 2 v 20 w(CSHIFTB\()19 b(in)n(buf,)h(outbuf,)f(len,)f(tag,)g(group,)h
(shift\))237 704 y Fh(Beha)o(v)o(es)c(lik)o(e)g Fc(MPI)p 596
704 16 2 v 18 w(CSHIFT)p Fh(,)f(with)i(bu\013ers)h(restricted)f(to)h(b)q(e)f
(blo)q(c)o(ks)h(of)g(n)o(umeric)164 764 y(units.)j(All)10 b(pro)q(cesses)i
(mak)o(e)d(the)j(call)e(with)i(the)f(same)g(v)m(alues)g(for)h
Fc(len,)24 b(tag,)g(group)p Fh(,)164 824 y(and)17 b Fc(shift)p
Fh(.)164 926 y Fd(IN)h(in)n(buf)26 b Fh(initial)15 b(lo)q(cation)i(of)f
(input)g(bu\013er)164 1027 y Fd(OUT)j(outbuf)25 b Fh(initial)14
b(lo)q(cation)j(of)g(output)f(bu\013er)164 1129 y Fd(IN)i(len)25
b Fh(n)o(um)o(b)q(er)14 b(of)j(en)o(tries)e(in)h(input)g(\(and)h(output\))g
(bu\013ers)164 1231 y Fd(IN)h(tag)25 b Fh(op)q(eration)17 b(tag)g(\(in)o
(teger\))164 1333 y Fd(IN)h(group)25 b Fh(handle)16 b(to)h(group)164
1434 y Fd(IN)h(shift)25 b Fh(in)o(teger)164 1596 y Fc(MPI)p
245 1596 V 17 w(CSHIFT\()e(inbuf,)h(outbuf,)f(tag,)h(group,)f(shift\))164
1656 y Fh(is)164 1758 y Fc(MPI_SIZE\()f(&size,)i(group\);)164
1818 y(MPI_RANK\()e(&rank,)i(group\);)164 1878 y(MPI_ISEND\()e(handle,)h
(inbuf,)g(mod\(rank+sh)o(ift)o(,)g(size\),)g(tag,)h(group\);)164
1939 y(MPI_RECV\()e(outbuf,)h(mod\(rank-sh)o(ift)o(,s)o(ize)o(\),)f(tag,)i
(group\);)164 1999 y(MPI_WAIT\(h)o(and)o(le)o(\);)164 2221 y
Ff(Discussion:)40 b Fe(Do)15 b(w)o(e)g(w)o(an)o(t)f(to)h(supp)q(ort)g(the)g
(case)g Fb(inbuf)23 b(=)h(outbuf)15 b Fe(someho)o(w?)961 2599
y Fh(7)p eop
%%Page: 8 8
bop 164 307 a Fd(End-o\013)18 b(shift)164 367 y(MPI)p 279 367
17 2 v 20 w(EOSHIFT\()h(in)n(buf,)h(outbuf,)e(tag,)g(group,)h(shift\))237
488 y Fh(Pro)q(cess)i(with)f(rank)g Fc(i)p Fh(,)g(max)o(\()p
Fc(0)p Fa(;)8 b Fg(\000)p Fc(shift)p Fh(\))17 b Fg(\024)j Fc(i)g
Fa(<)g Fc(min)p Fh(\()p Fc(size)p Fa(;)7 b Fc(size)j Fg(\000)j
Fc(shift)p Fh(\),)164 548 y(sends)f(the)g(data)h(in)e(its)h(input)g(bu\013er)
g(to)g(pro)q(cess)h(with)f(rank)g Fc(i+)25 b(shift)p Fh(,)10
b(who)j(receiv)o(es)164 608 y(the)18 b(data)i(in)e(its)h(output)g(bu\013er.)
29 b(The)19 b(output)g(bu\013er)g(of)g(pro)q(cesses)g(whic)o(h)f(do)h(not)164
668 y(receiv)o(e)f(data)j(is)g(left)e(unc)o(hanged.)35 b(All)19
b(pro)q(cesses)i(mak)o(e)d(the)j(call)e(with)i(the)f(same)164
729 y(v)m(alues)c(for)h Fc(tag,)24 b(group)p Fh(,)14 b(and)j
Fc(shift)p Fh(.)164 843 y Fd(IN)h(in)n(buf)26 b Fh(handle)16
b(to)h(input)f(bu\013er)h(descriptor)164 944 y Fd(OUT)i(outbuf)25
b Fh(handle)16 b(to)g(output)h(bu\013er)g(descriptor)164 1046
y Fd(IN)h(tag)25 b Fh(op)q(eration)17 b(tag)g(\(in)o(teger\))164
1148 y Fd(IN)h(group)25 b Fh(handle)16 b(to)h(group)164 1249
y Fd(IN)h(shift)25 b Fh(in)o(teger)164 1424 y Fd(MPI)p 279
1424 V 20 w(EOSHIFTB\()19 b(in)n(buf,)g(outbuf,)g(len,)g(tag,)f(group,)g
(shift\))237 1544 y Fh(Beha)o(v)o(es)13 b(lik)o(e)f Fc(MPI)p
591 1544 16 2 v 17 w(EOSHIFT)p Fh(,)f(with)j(bu\013ers)h(restricted)e(to)h(b)
q(e)g(blo)q(c)o(ks)g(of)h(n)o(umeric)164 1604 y(units.)20 b(All)10
b(pro)q(cesses)i(mak)o(e)d(the)j(call)e(with)i(the)f(same)g(v)m(alues)g(for)h
Fc(len,)24 b(tag,)g(group)p Fh(,)164 1665 y(and)17 b Fc(shift)p
Fh(.)164 1766 y Fd(IN)h(in)n(buf)26 b Fh(initial)15 b(lo)q(cation)i(of)f
(input)g(bu\013er)164 1868 y Fd(OUT)j(outbuf)25 b Fh(initial)14
b(lo)q(cation)j(of)g(output)f(bu\013er)164 1970 y Fd(IN)i(len)25
b Fh(n)o(um)o(b)q(er)14 b(of)j(en)o(tries)e(in)h(input)g(\(and)h(output\))g
(bu\013ers)164 2071 y Fd(IN)h(tag)25 b Fh(op)q(eration)17 b(tag)g(\(in)o
(teger\))164 2173 y Fd(IN)h(group)25 b Fh(handle)16 b(to)h(group)164
2275 y Fd(IN)h(shift)25 b Fh(in)o(teger)961 2599 y(8)p eop
%%Page: 9 9
bop 164 420 a Ff(Discussion:)237 477 y Fe(Tw)o(o)11 b(other)g(p)q(ossible)i
(de\014nitions)h(for)d(end-o\013)g(shift:)19 b(\(i\))11 b(zero)h(\014lling)i
(for)d(pro)q(cesses)h(that)164 533 y(don't)g(receiv)o(e)i(messages,)e(or)g
(\(ii\))h(b)q(oundary)g(v)m(alues)h(explicitly)h(pro)o(vided)f(as)e(an)g
(additional)164 589 y(parameter.)25 b(An)o(y)18 b(preferences?)28
b(\(F)l(ortran)15 b(90)i(allo)o(ws)g(to)g(optionally)i(pro)o(vide)e(b)q
(oundary)164 646 y(v)m(alues,)f(and)f(do)q(es)h(zero)f(\014lling,)i(if)e
(none)h(w)o(ere)f(pro)o(vided\))164 956 y Fd(Broadcast)164
1017 y(MPI)p 279 1017 17 2 v 20 w(BCAST\()20 b(bu\013er)p 677
1017 V 19 w(handle,)f(tag,)g(group,)f(ro)r(ot)g(\))237 1137
y Fc(MPI)p 318 1137 16 2 v 17 w(BCAST)e Fh(broadcasts)j(a)f(message)f(from)f
(the)h(pro)q(cess)i(with)e(rank)h Fc(root)e Fh(to)h(all)164
1197 y(other)23 b(pro)q(cesses)g(of)g(the)g(group.)41 b(It)23
b(is)f(called)g(b)o(y)g(all)g(mem)o(b)q(ers)e(of)j(group)h(using)164
1257 y(the)17 b(same)e(argumen)o(ts)h(for)h Fc(tag,)25 b(group,)e(and)i(root)
p Fh(.)c(On)c(return)g(the)f(con)o(ten)o(ts)h(of)164 1318 y(the)h(bu\013er)h
(of)g(the)f(pro)q(cess)h(with)g(rank)g Fc(root)e Fh(is)h(con)o(tained)g(in)g
(bu\013er)h(of)g(all)f(group)164 1378 y(mem)o(b)q(ers.)164
1479 y Fd(INOUT)g(bu\013er)p 518 1479 17 2 v 20 w(handle)25
b Fh(Handle)c(for)h(bu\013er)g(where)f(from)g(message)g(is)g(sen)o(t)h(or)286
1540 y(receiv)o(ed.)164 1641 y Fd(IN)c(tag)25 b Fh(tag)17 b(of)f(comm)o
(unication)d(op)q(eration)18 b(\(in)o(teger\))164 1743 y Fd(IN)g(group)25
b Fh(con)o(text)15 b(of)i(comm)o(unic)o(ation)d(\(handle\))164
1845 y Fd(IN)k(ro)r(ot)24 b Fh(rank)16 b(of)h(broadcast)g(ro)q(ot)h(\(in)o
(teger\))164 2007 y Fd(MPI)p 279 2007 V 20 w(BCASTB\()i(buf,)e(len,)h(tag,)f
(group,)g(ro)r(ot)g(\))237 2127 y Fc(MPI)p 318 2127 16 2 v
17 w(BCASTB)j Fh(b)q(eha)o(v)o(es)i(lik)o(e)e(broadcast,)26
b(restricted)21 b(to)j(a)f(blo)q(c)o(k)g(bu\013er.)42 b(It)22
b(is)164 2187 y(called)c(b)o(y)g(all)g(pro)q(cesses)i(with)f(the)f(same)g
(argumen)o(ts)g(for)h Fc(len,)24 b(tag,)g(group)17 b Fh(and)164
2247 y Fc(root)p Fh(.)164 2349 y Fd(INOUT)h(bu\013er)24 b Fh(Starting)17
b(address)g(of)f(bu\013er)h(\(c)o(hoice)e(t)o(yp)q(e\))164
2451 y Fd(IN)j(len)25 b Fh(Num)o(b)q(er)14 b(of)j(w)o(ords)g(in)f(bu\013er)g
(\(in)o(teger\))961 2599 y(9)p eop
%%Page: 10 10
bop 164 307 a Fd(IN)18 b(tag)25 b Fh(tag)17 b(of)f(comm)o(unication)d(op)q
(eration)18 b(\(in)o(teger\))164 409 y Fd(IN)g(group)25 b Fh(con)o(text)15
b(of)i(comm)o(unic)o(ation)d(\(handle\))164 511 y Fd(IN)19
b(ro)r(ot)24 b Fh(rank)16 b(of)h(broadcast)g(ro)q(ot)h(\(in)o(teger\))164
672 y Fc(MPI)p 245 672 16 2 v 17 w(BCAST\()24 b(buffer)p 598
672 V 16 w(handle,)f(tag,)h(group,)g(root)g(\))164 733 y Fh(is)164
834 y Fc(MPI_SIZE\()e(&size,)i(group\);)164 895 y(MPI_RANK\()e(&rank,)i
(group\);)164 955 y(MPI_IRECV\()o(han)o(dl)o(e,)e(buffer_hand)o(le,)g
(root,)i(tag,)g(group\);)164 1015 y(if)h(\(rank==roo)o(t\))241
1075 y(for)f(\(i=0;)g(i)h(<)h(size;)e(i++\))318 1135 y(MPI_SEND\()o(buf)o
(fer)o(_ha)o(nd)o(le,)e(i,)j(tag,)f(group\);)164 1196 y(MPI_WAIT\(h)o(and)o
(le)o(\);)164 1325 y Fd(Gather)164 1386 y(MPI)p 279 1386 17
2 v 20 w(GA)-5 b(THER\()19 b(in)n(buf,)h(outbuf,)e(tag,)h(group,)f(ro)r(ot,)g
(len\))237 1506 y Fh(Eac)o(h)g(pro)q(cess)h(\(including)f(the)f(ro)q(ot)j
(pro)q(cess\))f(sends)f(the)g(con)o(ten)o(t)g(of)g(its)g(input)164
1566 y(bu\013er)h(to)h(the)f(ro)q(ot)h(pro)q(cess.)30 b(The)19
b(ro)q(ot)h(pro)q(cess)g(concatenates)f(all)g(the)g(incoming)164
1626 y(messages)13 b(in)g(the)g(order)h(of)g(the)f(senders')g(rank)g(and)h
(places)g(the)f(results)g(in)g(its)g(output)164 1687 y(bu\013er.)33
b(It)19 b(is)h(called)f(b)o(y)g(all)h(mem)n(b)q(ers)e(of)i(group)h(using)f
(the)g(same)f(argumen)o(ts)g(for)164 1747 y Fc(tag,)24 b(group)p
Fh(,)14 b(and)j Fc(root)p Fh(.)j(The)c(input)h(bu\013er)f(of)h(eac)o(h)f(pro)
q(cess)h(ma)o(y)d(ha)o(v)o(e)i(di\013eren)o(t)164 1807 y(length.)164
1909 y Fd(IN)i(in)n(buf)26 b Fh(handle)16 b(to)h(input)f(bu\013er)h
(descriptor)164 2010 y Fd(OUT)i(outbuf)25 b Fh(handle)18 b(to)h(output)g
(bu\013er)g(descriptor)f({)h(signi\014can)o(t)f(only)g(at)h(ro)q(ot)286
2071 y(\(c)o(hoice\))164 2172 y Fd(IN)f(tag)25 b Fh(op)q(eration)17
b(tag)g(\(in)o(teger\))164 2274 y Fd(IN)h(group)25 b Fh(group)17
b(handle)164 2376 y Fd(IN)h(ro)r(ot)24 b Fh(rank)16 b(of)h(receiving)e(pro)q
(cess)h(\(in)o(teger\))949 2599 y(10)p eop
%%Page: 11 11
bop 164 307 a Fd(OUT)19 b(len)24 b Fh(di\013erence)d(b)q(et)o(w)o(een)f
(output)i(bu\013er)g(size)f(\(in)g(b)o(ytes\))g(and)h(n)o(um)o(b)q(er)e(of)
286 367 y(b)o(ytes)c(receiv)o(ed.)164 578 y Ff(Discussion:)237
638 y Fe(It)k(w)o(ould)g(b)q(e)g(more)f(elegan)o(t)g(\(but)h(no)f(more)g(con)
o(v)o(enien)o(t\))h(to)f(ha)o(v)o(e)g(a)g(return)h(status)164
698 y(ob)s(ject.)164 939 y Fd(MPI)p 279 939 17 2 v 20 w(GA)-5
b(THERB\()19 b(in)n(buf,)h(inlen,)f(outbuf,)f(tag,)h(group,)f(ro)r(ot\))237
1060 y Fc(MPI)p 318 1060 16 2 v 17 w(GATHERB)12 b Fh(b)q(eha)o(v)o(es)h(lik)o
(e)f Fc(MPI)p 847 1060 V 17 w(GATHER)f Fh(restricted)i(to)h(blo)q(c)o(k)f
(bu\013ers,)h(and)g(with)164 1120 y(the)h(additional)g(restriction)g(that)g
(all)g(input)g(bu\013ers)h(should)g(ha)o(v)o(e)e(the)h(same)g(length.)164
1180 y(All)g(pro)q(cesses)i(should)g(pro)o(vided)f(the)g(same)g(v)m(alues)h
(for)f Fc(inlen,)24 b(tag,)g(group)p Fh(,)14 b(and)164 1240
y Fc(root)h Fh(.)164 1342 y Fd(IN)j(in)n(buf)26 b Fh(\014rst)17
b(v)m(ariable)f(of)g(input)g(bu\013er)h(\(c)o(hoice\))164 1443
y Fd(IN)h(inlen)26 b Fh(Num)o(b)q(er)14 b(of)j(\(w)o(ord\))f(v)m(ariables)g
(in)g(input)g(bu\013er)h(\(in)o(teger\))164 1545 y Fd(OUT)i(outbuf)25
b Fh(\014rst)11 b(v)m(ariable)f(of)h(output)h(bu\013er)f({)g(signi\014can)o
(t)g(only)f(at)h(ro)q(ot)h(\(c)o(hoice\))164 1646 y Fd(IN)18
b(tag)25 b Fh(op)q(eration)17 b(tag)g(\(in)o(teger\))164 1748
y Fd(IN)h(group)25 b Fh(group)17 b(handle)164 1850 y Fd(IN)h(ro)r(ot)24
b Fh(rank)16 b(of)h(receiving)e(pro)q(cess)h(\(in)o(teger\))164
2011 y Fc(MPI)p 245 2011 V 17 w(GATHERB\()23 b(inbuf,)g(inlen,)h(outbuf,)f
(tag,)h(group,)g(root\))164 2072 y Fh(is)164 2173 y Fc(MPI_SIZE\()e(&size,)i
(group\);)164 2233 y(MPI_RANK\()e(&rank,)i(group\);)164 2293
y(MPI_ISENDB)o(\(ha)o(nd)o(le,)e(inbuf,)h(inlen,)h(root,)g(tag,)g(group\);)
164 2354 y(if)h(\(rank==roo)o(t\))241 2414 y(for)f(\(i=0;)g(i)h(<)h(size;)e
(i++\))241 2474 y({)949 2599 y Fh(11)p eop
%%Page: 12 12
bop 318 307 a Fc(MPI_RECVB)o(\(ou)o(tbu)o(f,)22 b(inlen,)i(i,)g(tag,)h
(group,)e(return_sta)o(tus)o(\);)318 367 y(outbuf)g(+=)i(inlen;)241
428 y(})164 488 y(MPI_WAIT\(h)o(and)o(le)o(\);)164 616 y Fd(Scatter)164
676 y(MPI)p 279 676 17 2 v 20 w(SCA)-5 b(TTER\()20 b(list)p
680 676 V 21 w(of)p 746 676 V 20 w(in)n(bufs,)g(outbuf,)f(tag,)f(group,)g(ro)
r(ot,)g(len\))237 797 y Fh(The)e(ro)q(ot)g(pro)q(cess)g(sends)g(the)f(con)o
(ten)o(t)f(of)i(its)f Fc(i)p Fh(-th)h(input)f(bu\013er)g(to)h(the)f(pro)q
(cess)164 857 y(with)k(rank)h Fc(i)p Fh(;)g(eac)o(h)f(pro)q(cess)i
(\(including)d(the)h(ro)q(ot)i(pro)q(cess\))f(stores)g(the)g(incoming)164
917 y(message)d(in)h(its)f(output)i(bu\013er.)26 b(The)18 b(di\013erence)e(b)
q(et)o(w)o(een)h(the)h(size)f(of)h(the)f(output)164 977 y(bu\013er)i(\(in)e
(b)o(ytes\))h(and)h(the)f(n)o(um)o(b)q(er)f(of)h(b)o(ytes)g(receiv)o(ed)e(is)
i(returned)g(in)g Fc(len)p Fh(.)26 b(The)164 1038 y(routine)17
b(is)g(called)g(b)o(y)g(all)f(mem)o(b)q(ers)f(of)i(the)h(group)g(using)g(the)
f(same)f(argumen)o(ts)h(for)164 1098 y Fc(tag,)24 b(group)p
Fh(,)14 b(and)j Fc(root)p Fh(.)164 1191 y Fd(IN)h(list)p 324
1191 V 22 w(of)p 391 1191 V 20 w(in)n(bufs)26 b Fh(list)15
b(of)i(bu\013er)g(descriptor)e(handles)164 1290 y Fd(OUT)k(outbuf)25
b Fh(bu\013er)16 b(descriptor)g(handle)164 1389 y Fd(IN)i(tag)25
b Fh(op)q(eration)17 b(tag)g(\(in)o(teger\))164 1488 y Fd(IN)h(group)25
b Fh(handle)164 1587 y Fd(IN)18 b(ro)r(ot)24 b Fh(rank)16 b(of)h(sending)f
(pro)q(cess)h(\(in)o(teger\))164 1686 y Fd(OUT)i(len)24 b Fh(n)o(um)o(b)q(er)
18 b(of)h(remaining)e(b)o(ytes)i(in)g(the)f(output)i(bu\013er)f(at)h(eac)o(h)
e(pro)q(cess)286 1746 y(\(in)o(teger\))164 1899 y Fc(MPI)p
245 1899 16 2 v 17 w(SCATTER\()23 b(list)p 597 1899 V 17 w(of)p
666 1899 V 18 w(inbufs,)g(outbuf,)g(tag,)h(group,)f(root,)h(len\))164
1959 y Fh(is)164 2053 y Fc(MPI_SIZE\()e(&size,)i(group\);)164
2113 y(MPI_RANK\()e(&rank,)i(group\);)164 2173 y(MPI_IRECV\()o(han)o(dl)o(e,)
e(outbuf,)h(root,)h(tag,)g(group\);)164 2233 y(if)h(\(rank==root)o(\))241
2293 y(for)f(\(i=0;)g(i)h(<)h(size;)e(i++\))318 2354 y(MPI_SEND\()o(inb)o
(uf[)o(i],)e(i,)j(tag,)f(group\);)164 2414 y(MPI_WAIT\(h)o(and)o(le)o(,)f
(return_st)o(atu)o(s\);)164 2474 y(MPI_RETURN)o(_ST)o(AT)o(US\()o(ret)o(urn)o
(_s)o(tat)o(us,)f(len,)i(source,)f(tag\);)949 2599 y Fh(12)p
eop
%%Page: 13 13
bop 164 367 a Fd(MPI)p 279 367 17 2 v 20 w(SCA)-5 b(TTERB\()20
b(in)n(buf,)f(outbuf,)g(len,)f(tag,)h(group,)f(ro)r(ot\))237
488 y Fc(MPI)p 318 488 16 2 v 17 w(SCATTERB)d Fh(b)q(eha)o(v)o(es)i(lik)o(e)f
Fc(MPI)p 910 488 V 17 w(SCATTER)f Fh(restricted)i(to)g(blo)q(c)o(k)g
(bu\013ers,)h(and)164 548 y(with)e(the)h(additional)f(restriction)g(that)h
(all)f(output)h(bu\013ers)g(ha)o(v)o(e)f(the)g(same)g(length.)164
608 y(The)i(input)g(bu\013er)h(blo)q(c)o(k)e(of)i(the)f(ro)q(ot)h(pro)q(cess)
g(is)f(partitioned)g(in)o(to)g Fc(n)f Fh(consecutiv)o(e)164
668 y(blo)q(c)o(ks,)24 b(eac)o(h)e(consisting)h(of)g Fc(len)e
Fh(w)o(ords.)42 b(The)22 b Fc(i)p Fh(-th)h(blo)q(c)o(k)f(is)h(sen)o(t)f(to)h
(the)g Fc(i)p Fh(-th)164 729 y(pro)q(cess)15 b(in)g(the)f(group)h(and)h
(stored)f(in)f(its)g(output)h(bu\013er.)21 b(The)15 b(routine)f(is)h(called)e
(b)o(y)164 789 y(all)18 b(mem)o(b)q(ers)e(of)j(the)f(group)h(using)g(the)g
(same)e(argumen)o(ts)h(for)h Fc(tag,)24 b(group,)f(len)p Fh(,)164
849 y(and)17 b Fc(root)p Fh(.)164 951 y Fd(IN)h(in)n(buf)26
b Fh(\014rst)17 b(en)o(try)e(in)h(input)g(bu\013er)h({)f(signi\014can)o(t)g
(only)g(at)h(ro)q(ot)g(\(c)o(hoice\).)164 1052 y Fd(OUT)i(outbuf)25
b Fh(\014rst)16 b(en)o(try)f(in)h(output)h(bu\013er)g(\(c)o(hoice\).)164
1154 y Fd(IN)h(len)25 b Fh(n)o(um)o(b)q(er)14 b(of)j(en)o(tries)e(to)i(b)q(e)
f(stored)h(in)f(output)g(bu\013er)h(\(in)o(teger\))164 1256
y Fd(IN)h(group)25 b Fh(handle)164 1357 y Fd(IN)18 b(ro)r(ot)24
b Fh(rank)16 b(of)h(sending)f(pro)q(cess)h(\(in)o(teger\))164
1519 y Fc(MPI)p 245 1519 V 17 w(SCATTERB\()23 b(inbuf,)g(outbuf,)g(outlen,)g
(tag,)h(group,)g(root\))164 1579 y Fh(is)164 1681 y Fc(MPI_SIZE\()e(&size,)i
(group\);)164 1741 y(MPI_RANK\()e(&rank,)i(group\);)164 1802
y(MPI_IRECVB)o(\()f(handle,)g(outbuf,)g(outlen,)g(root,)h(tag,)g(group\);)164
1862 y(if)h(\(rank==root)o(\))241 1922 y(for)f(\(i=0;)g(i)h(<)h(size;)e(i++\))
241 1982 y({)318 2042 y(MPI_SENDB)o(\(in)o(buf)o(,)f(outlen,)g(i,)h(tag,)h
(group,)e(return_sta)o(tus)o(\);)318 2103 y(inbuf)h(+=)g(outlen;)241
2163 y(})164 2223 y(MPI_WAIT\(h)o(and)o(le)o(\);)164 2353 y
Fd(All-to-all)d(scatter)164 2413 y(MPI)p 279 2413 17 2 v 20
w(ALLSCA)-5 b(TTER\()19 b(list)p 789 2413 V 22 w(of)p 856 2413
V 20 w(in)n(bufs,)h(outbuf,)e(tag,)h(group,)f(len\))949 2599
y Fh(13)p eop
%%Page: 14 14
bop 237 307 a Fh(Eac)o(h)22 b(pro)q(cess)h(in)e(the)h(group)h(sends)f(its)f
Fc(i)p Fh(-th)h(bu\013er)h(in)e(its)h(input)f(bu\013er)i(list)164
367 y(to)d(the)g(pro)q(cess)g(with)g(rank)g Fc(i)f Fh(\(itself)g(included\);)
g(eac)o(h)h(pro)q(cess)g(concatenates)g(the)164 428 y(incoming)e(messages)i
(in)f(its)h(output)g(bu\013er,)h(in)e(the)h(order)g(of)g(the)g(senders')f
(ranks.)164 488 y(The)14 b(n)o(um)o(b)q(er)d(of)j(b)o(ytes)f(left)g(in)g(the)
g(output)i(bu\013er)f(is)f(returned)g(in)g Fc(len)p Fh(.)20
b(The)13 b(routine)164 548 y(is)j(called)e(b)o(y)i(all)f(mem)o(b)q(ers)e(of)j
(the)g(group)h(using)f(the)g(same)e(argumen)o(ts)h(for)i Fc(tag)d
Fh(and)164 608 y Fc(group)p Fh(.)164 698 y Fd(IN)k(list)p 324
698 17 2 v 22 w(of)p 391 698 V 20 w(in)n(bufs)26 b Fh(list)15
b(of)i(bu\013er)g(descriptor)e(handles)164 796 y Fd(OUT)k(outbuf)25
b Fh(bu\013er)16 b(descriptor)g(handle)164 894 y Fd(IN)i(tag)25
b Fh(op)q(eration)17 b(tag)g(\(in)o(teger\))164 992 y Fd(IN)h(group)25
b Fh(handle)164 1090 y Fd(OUT)19 b(len)24 b Fh(n)o(um)o(b)q(er)15
b(of)h(remaining)f(b)o(ytes)h(in)g(the)g(output)g(bu\013er)h(\(in)o(teger\))
164 1240 y Fd(MPI)p 279 1240 V 20 w(ALLSCA)-5 b(TTERB\()19
b(in)n(buf,)h(outbuf,)f(len,)f(tag,)g(group\))237 1361 y Fc(MPI)p
318 1361 16 2 v 17 w(ALLSCATTERB)7 b Fh(b)q(eha)o(v)o(es)k(lik)o(e)e
Fc(MPI)p 967 1361 V 17 w(ALLSCATTER)e Fh(restricted)j(to)h(blo)q(c)o(k)f
(bu\013ers,)164 1421 y(and)19 b(with)f(the)g(additional)g(restriction)g(that)
g(all)g(blo)q(c)o(ks)g(sen)o(t)g(from)f(one)i(pro)q(cess)g(to)164
1481 y(another)d(ha)o(v)o(e)e(the)g(same)g(length.)21 b(The)15
b(input)g(bu\013er)g(blo)q(c)o(k)f(of)h(eac)o(h)g(pro)q(cess)g(is)g(par-)164
1541 y(titioned)k(in)o(to)g Fc(n)h Fh(consecutiv)o(e)e(blo)q(c)o(ks,)i(eac)o
(h)f(consisting)h(of)g Fc(len)f Fh(w)o(ords.)32 b(The)20 b
Fc(i)p Fh(-th)164 1601 y(blo)q(c)o(k)e(is)h(sen)o(t)f(to)h(the)g
Fc(i)p Fh(-th)f(pro)q(cess)i(in)e(the)h(group.)29 b(Eac)o(h)19
b(pro)q(cess)h(concatenates)164 1661 y(the)15 b(incoming)f(messages,)g(in)h
(the)g(order)h(of)g(the)f(senders')f(ranks,)i(and)g(store)g(them)d(in)164
1722 y(its)19 b(output)h(bu\013er.)31 b(The)19 b(routine)g(is)g(called)g(b)o
(y)f(all)h(mem)o(b)q(ers)e(of)i(the)g(group)i(using)164 1782
y(the)16 b(same)f(argumen)o(ts)h(for)g Fc(tag,)24 b(group)p
Fh(,)14 b(and)j Fc(len)p Fh(.)164 1872 y Fd(IN)h(in)n(buf)26
b Fh(\014rst)17 b(en)o(try)e(in)h(input)g(bu\013er)h(\(c)o(hoice\).)j(ro)q
(ot)d(\(in)o(teger\))164 1970 y Fd(OUT)i(outbuf)25 b Fh(\014rst)16
b(en)o(try)f(in)h(output)h(bu\013er)g(\(c)o(hoice\).)164 2068
y Fd(IN)h(len)25 b Fh(n)o(um)o(b)q(er)14 b(of)j(en)o(tries)e(sen)o(t)h(from)f
(eac)o(h)h(pro)q(cess)h(to)f(eac)o(h)g(other)h(\(in)o(teger\).)164
2166 y Fd(IN)h(tag)25 b Fh(op)q(eration)17 b(tag)g(\(in)o(teger\))164
2263 y Fd(IN)h(group)25 b Fh(handle)164 2414 y Fc(MPI)p 245
2414 V 17 w(ALLSCATTERB)o(\()e(inbuf,)g(outbuf,)g(len,)h(tag,)g(group\))164
2474 y Fh(is)949 2599 y(14)p eop
%%Page: 15 15
bop 164 307 a Fc(MPI_SIZE\()22 b(&size,)i(group\);)164 367
y(MPI_RANK\()e(&rank,)i(group\);)164 428 y(for)h(\(i=0;)e(i)j(<)f(size;)f
(i++\))241 488 y({)267 548 y(MPI_IRECV)o(B\()o(rec)o(v_h)o(and)o(le)o(s[i)o
(],)e(outbuf,)h(len,)h(tag,)h(group\);)267 608 y(outbuf)e(+=)i(len;)241
668 y(})164 729 y(for)g(\(i=0;)e(i)j(<)f(size;)f(i++\))241
789 y({)267 849 y(MPI_ISEND)o(B\()o(sen)o(d_h)o(and)o(le)o([i])o(,)f(inbuf,)g
(len,)h(i,)h(tag,)f(group\);)267 909 y(inbuf)f(+=)i(len;)241
969 y(})164 1029 y(MPI_WAITAL)o(L\(s)o(en)o(d_h)o(and)o(le\))o(;)164
1090 y(MPI_WAITAL)o(L\(r)o(ec)o(v_h)o(and)o(le\))o(;)164 1220
y Fd(All-to-all)c(broadcast)164 1280 y(MPI)p 279 1280 17 2
v 20 w(ALLCAST\()e(in)n(buf,)h(outbuf,)f(tag,)f(group,)h(len\))237
1400 y Fh(Eac)o(h)k(pro)q(cess)h(in)e(the)h(group)h(broadcasts)h(its)d(input)
h(bu\013er)g(to)h(all)e(pro)q(cesses)164 1460 y(\(including)g(itself)s(\);)i
(eac)o(h)e(pro)q(cess)h(concatenates)g(the)f(incoming)f(messages)h(in)g(its)
164 1521 y(output)d(bu\013er,)g(in)f(the)g(order)g(of)h(the)f(senders')g
(ranks.)28 b(The)19 b(n)o(um)o(b)q(er)d(of)j(b)o(ytes)f(left)164
1581 y(in)d(the)f(output)i(bu\013er)f(is)g(returned)f(in)g
Fc(len)p Fh(.)20 b(The)15 b(routine)g(is)f(called)g(b)o(y)h(all)f(mem)o(b)q
(ers)164 1641 y(of)j(the)f(group)h(using)f(the)g(same)g(argumen)o(ts)f(for)h
Fc(tag)g Fh(and)g Fc(group)p Fh(.)164 1743 y Fd(IN)i(in)n(buf)26
b Fh(bu\013er)17 b(descriptor)f(handle)g(for)g(input)g(bu\013er)164
1844 y Fd(OUT)j(outbuf)25 b Fh(bu\013er)16 b(descriptor)g(handle)g(for)h
(output)f(bu\013er)164 1946 y Fd(IN)i(tag)25 b Fh(op)q(eration)17
b(tag)g(\(in)o(teger\))164 2048 y Fd(IN)h(group)25 b Fh(handle)164
2149 y Fd(OUT)19 b(len)24 b Fh(n)o(um)o(b)q(er)16 b(of)i(remaining)f(un)o
(touc)o(hed)g(b)o(ytes)g(in)g(eac)o(h)h(output)g(bu\013er)g(\(in-)286
2210 y(teger\))164 2371 y Fd(MPI)p 279 2371 V 20 w(ALLCASTB\()h(in)n(buf,)h
(outbuf,)f(len,)f(tag,)h(group\))949 2599 y Fh(15)p eop
%%Page: 16 16
bop 237 307 a Fc(MPI)p 318 307 16 2 v 17 w(ALLCASTB)15 b Fh(b)q(eha)o(v)o(es)
i(lik)o(e)f Fc(MPI)p 910 307 V 17 w(ALLCAST)f Fh(restricted)i(to)g(blo)q(c)o
(k)g(bu\013ers,)h(and)164 367 y(with)11 b(the)g(additional)g(restriction)g
(that)g(all)g(blo)q(c)o(ks)g(sen)o(t)g(from)f(one)i(pro)q(cess)g(to)f
(another)164 428 y(ha)o(v)o(e)g(the)h(same)f(length.)20 b(The)12
b(routine)g(is)g(called)f(b)o(y)h(all)g(mem)n(b)q(ers)e(of)i(the)g(group)i
(using)164 488 y(the)i(same)f(argumen)o(ts)h(for)g Fc(tag,)24
b(group)p Fh(,)14 b(and)j Fc(len)p Fh(.)164 589 y Fd(IN)h(in)n(buf)26
b Fh(\014rst)17 b(en)o(try)e(in)h(input)g(bu\013er)h(\(c)o(hoice\).)j(ro)q
(ot)d(\(in)o(teger\))164 691 y Fd(OUT)i(outbuf)25 b Fh(\014rst)16
b(en)o(try)f(in)h(output)h(bu\013er)g(\(c)o(hoice\).)164 793
y Fd(IN)h(len)25 b Fh(n)o(um)o(b)q(er)19 b(of)h(en)o(tries)g(sen)o(t)g(from)f
(eac)o(h)h(pro)q(cess)h(to)g(eac)o(h)f(other)h(\(including)286
853 y(itself)s(\).)164 955 y Fd(IN)d(group)25 b Fh(handle)164
1117 y Fc(MPI)p 245 1117 V 17 w(ALLCASTB\()e(inbuf,)g(outbuf,)g(len,)h(tag,)g
(group\))164 1177 y Fh(is)164 1279 y Fc(MPI_SIZE\()e(&size,)i(group\);)164
1339 y(MPI_RANK\()e(&rank,)i(group\);)164 1399 y(for)h(\(i=0;)e(i)j(<)f
(size;)f(i++\))241 1459 y({)267 1519 y(MPI_IRECV)o(B\()o(rec)o(v_h)o(and)o
(le)o(s[i)o(],)e(outbuf,)h(len,)h(tag,)h(group\);)267 1579
y(outbuf)e(+=)i(len;)241 1640 y(})164 1700 y(for)g(\(i=0;)e(i)j(<)f(size;)f
(i++\))241 1760 y({)267 1820 y(MPI_ISEND)o(B\()o(sen)o(d_h)o(and)o(le)o([i])o
(,)f(inbuf,)g(len,)h(i,)h(tag,)f(group\);)241 1880 y(})164
1941 y(MPI_WAITAL)o(L\(s)o(en)o(d_h)o(and)o(le\))o(;)164 2001
y(MPI_WAITAL)o(L\(r)o(ec)o(v_h)o(and)o(le\))o(;)164 2131 y
Fd(1.3.3)55 b(Global)20 b(Compute)f(Op)r(erations)164 2223
y(Reduce)164 2283 y(MPI)p 279 2283 17 2 v 20 w(REDUCE\()g(in)n(buf,)h
(outbuf,)e(tag,)g(group,)h(ro)r(ot,)e(op\))237 2404 y Fh(Com)o(bines)g(the)h
(v)m(alues)g(pro)o(vided)f(in)h(the)g(input)f(bu\013er)i(of)f(eac)o(h)g(pro)q
(cess)g(in)g(the)164 2464 y(group,)d(using)g(the)g(op)q(eration)g
Fc(op)p Fh(,)f(and)h(returns)g(the)f(com)o(bined)e(v)m(alue)j(in)f(the)g
(output)949 2599 y(16)p eop
%%Page: 17 17
bop 164 307 a Fh(bu\013er)20 b(of)f(the)g(pro)q(cess)h(with)f(rank)g
Fc(root)p Fh(.)29 b(Eac)o(h)19 b(pro)q(cess)h(can)g(pro)o(vide)e(one)i(v)m
(alue,)164 367 y(or)j(a)f(sequence)g(of)g(v)m(alues,)i(in)e(whic)o(h)f(case)i
(the)f(com)o(bine)e(op)q(eration)j(is)f(executed)164 428 y(p)q(oin)o(t)o
(wise)e(on)i(eac)o(h)e(en)o(try)g(of)h(the)g(sequence.)34 b(F)l(or)21
b(example,)e(if)i(the)f(op)q(eration)i(is)164 488 y Fc(max)c
Fh(and)i(input)f(bu\013ers)h(con)o(tains)g(t)o(w)o(o)f(\015oating)h(p)q(oin)o
(t)g(n)o(um)o(b)q(ers,)e(then)h(outbuf\(1\))164 548 y(=)j(global)h(max\(in)o
(buf\(1\)\))e(and)h(outbuf\(2\))i(=)e(global)g(max\(in)o(buf\(2\)\).)38
b(All)21 b(input)164 608 y(bu\013ers)15 b(should)g(de\014ne)f(sequences)f(of)
i(equal)e(length)h(of)h(en)o(tries)e(of)i(t)o(yp)q(es)f(that)g(matc)o(h)164
668 y(the)j(t)o(yp)q(e)g(of)g(the)g(op)q(erands)i(of)e Fc(op)p
Fh(.)23 b(The)18 b(output)f(bu\013er)h(should)g(de\014ne)e(a)i(sequence)164
729 y(of)i(the)f(same)g(length)g(of)h(en)o(tries)e(of)i(t)o(yp)q(es)g(that)g
(matc)o(h)e(the)h(t)o(yp)q(e)g(of)h(the)f(result)h(of)164 789
y Fc(op)p Fh(.)k(\(Note)17 b(that,)h(here)f(as)h(for)f(all)g(other)h(comm)o
(unic)o(ation)d(op)q(erations,)j(the)g(t)o(yp)q(e)f(of)164
849 y(en)o(tries)10 b(inserted)h(in)g(a)h(message)e(dep)q(end)i(on)g(the)f
(information)f(pro)o(vided)h(b)o(y)g(the)g(input)164 909 y(bu\013er)j
(descriptor,)g(and)g(not)g(on)h(the)e(declarations)h(of)g(these)g(v)m
(ariables)g(in)f(the)h(calling)164 969 y(program.)24 b(The)17
b(t)o(yp)q(es)f(of)i(the)f(v)m(ariables)g(in)f(the)h(calling)f(program)h
(need)g(not)g(matc)o(h)164 1029 y(the)e(t)o(yp)q(es)g(de\014ned)g(b)o(y)g
(the)g(bu\013er)g(descriptor,)g(but)h(in)e(suc)o(h)i(case)f(the)g(outcome)f
(of)i(a)164 1090 y(reduce)f(op)q(eration)j(ma)o(y)c(b)q(e)j(implem)o(e)o(n)o
(tation)d(dep)q(enden)o(t.\))237 1150 y(The)h(op)q(eration)g(de\014ned)g(b)o
(y)f Fc(op)g Fh(is)g(asso)q(ciativ)o(e)h(and)g(comm)o(utativ)o(e)o(,)d(and)j
(the)f(im-)164 1210 y(plemen)o(tation)d(can)j(tak)o(e)e(adv)m(an)o(tage)j(of)
e(asso)q(ciativit)o(y)g(and)h(comm)o(utativi)o(t)o(y)c(in)j(order)164
1270 y(to)20 b(c)o(hange)f(order)g(of)h(ev)m(aluation.)30 b(The)20
b(routine)f(is)g(called)f(b)o(y)h(all)g(group)h(mem)o(b)q(ers)164
1330 y(using)d(the)f(same)f(argumen)o(ts)g(for)i Fc(tag,)24
b(group,)f(root)15 b Fh(and)i Fc(op)p Fh(.)164 1426 y Fd(IN)h(in)n(buf)26
b Fh(handle)16 b(to)h(input)f(bu\013er)164 1526 y Fd(OUT)j(outbuf)25
b Fh(handle)16 b(to)g(output)h(bu\013er)g({)f(signi\014can)o(t)g(only)g(at)h
(ro)q(ot)164 1625 y Fd(IN)h(tag)25 b Fh(op)q(eration)17 b(tag)g(\(in)o
(teger\))164 1725 y Fd(IN)h(group)25 b Fh(handle)16 b(to)h(group)164
1824 y Fd(IN)h(ro)r(ot)24 b Fh(rank)16 b(of)h(ro)q(ot)g(pro)q(cess)g(\(in)o
(teger\))164 1924 y Fd(IN)h(op)25 b Fh(op)q(eration)17 b(\(status\))237
2020 y(W)l(e)k(list)g(b)q(elo)o(w)g(the)g(op)q(erations)h(are)g(supp)q(orted)
g(for)g(F)l(ortran,)g(eac)o(h)f(with)g(the)164 2080 y(corresp)q(onding)c(v)m
(alue)f(of)h(the)f Fc(op)f Fh(parameter.)164 2175 y Fd(MPI)p
279 2175 17 2 v 20 w(IMAX)26 b Fh(in)o(teger)15 b(maxim)n(um)164
2275 y Fd(MPI)p 279 2275 V 20 w(RMAX)25 b Fh(real)16 b(maxim)o(um)164
2374 y Fd(MPI)p 279 2374 V 20 w(DMAX)26 b Fh(double)16 b(precision)f(real)h
(maxim)o(um)164 2474 y Fd(MPI)p 279 2474 V 20 w(IMIN)25 b Fh(in)o(teger)15
b(minim)n(um)949 2599 y(17)p eop
%%Page: 18 18
bop 164 307 a Fd(MPI)p 279 307 17 2 v 20 w(RMIN)24 b Fh(real)16
b(minim)n(um)164 407 y Fd(MPI)p 279 407 V 20 w(DMIN)25 b Fh(double)16
b(precision)f(real)h(minim)n(um)164 508 y Fd(MPI)p 279 508
V 20 w(ISUM)25 b Fh(in)o(teger)15 b(sum)164 608 y Fd(MPI)p
279 608 V 20 w(RSUM)25 b Fh(real)16 b(sum)164 708 y Fd(MPI)p
279 708 V 20 w(DSUM)25 b Fh(double)16 b(precision)g(real)g(sum)164
809 y Fd(MPI)p 279 809 V 20 w(CSUM)26 b Fh(complex)14 b(sum)164
909 y Fd(MPI)p 279 909 V 20 w(DCSUM)26 b Fh(double)16 b(precision)f(complex)f
(sum)164 1009 y Fd(MPI)p 279 1009 V 20 w(IPR)n(OD)25 b Fh(in)o(teger)16
b(pro)q(duct)164 1110 y Fd(MPI)p 279 1110 V 20 w(RPR)n(OD)25
b Fh(real)16 b(pro)q(duct)164 1210 y Fd(MPI)p 279 1210 V 20
w(DPR)n(OD)25 b Fh(double)17 b(precision)e(real)h(pro)q(duct)164
1310 y Fd(MPI)p 279 1310 V 20 w(CPR)n(OD)26 b Fh(complex)14
b(pro)q(duct)164 1411 y Fd(MPI)p 279 1411 V 20 w(DCPR)n(OD)26
b Fh(double)16 b(precision)f(complex)g(pro)q(duct)164 1511
y Fd(MPI)p 279 1511 V 20 w(AND)25 b Fh(logical)16 b(and)164
1611 y Fd(MPI)p 279 1611 V 20 w(IAND)25 b Fh(in)o(teger)15
b(\(bit-wise\))h(and)164 1712 y Fd(MPI)p 279 1712 V 20 w(OR)25
b Fh(logical)15 b(or)164 1812 y Fd(MPI)p 279 1812 V 20 w(IOR)25
b Fh(in)o(teger)15 b(\(bit-wise\))h(or)164 1912 y Fd(MPI)p
279 1912 V 20 w(X)n(OR)25 b Fh(logical)16 b(xor)164 2013 y
Fd(MPI)p 279 2013 V 20 w(IX)n(OR)25 b Fh(in)o(teger)16 b(\(bit-wise\))f(xor)
164 2113 y Fd(MPI)p 279 2113 V 20 w(MAXLOC)26 b Fh(rank)16
b(of)h(pro)q(cess)g(with)f(maxim)n(um)c(in)o(teger)k(v)m(alue)164
2213 y Fd(MPI)p 279 2213 V 20 w(MAXRLOC)26 b Fh(rank)16 b(of)h(pro)q(cess)g
(with)f(maxim)n(um)c(real)k(v)m(alue)164 2314 y Fd(MPI)p 279
2314 V 20 w(MAXDLOC)26 b Fh(rank)d(of)h(pro)q(cess)g(with)f(maxim)o(um)c
(double)k(precision)f(real)286 2374 y(v)m(alue)164 2474 y Fd(MPI)p
279 2474 V 20 w(MINLOC)j Fh(rank)16 b(of)h(pro)q(cess)g(with)f(minim)n(um)c
(in)o(teger)j(v)m(alue)949 2599 y(18)p eop
%%Page: 19 19
bop 164 307 a Fd(MPI)p 279 307 17 2 v 20 w(MINRLOC)25 b Fh(rank)16
b(of)h(pro)q(cess)g(with)f(minim)n(um)c(real)k(v)m(alue)164
407 y Fd(MPI)p 279 407 V 20 w(MINDLOC)25 b Fh(rank)11 b(of)g(pro)q(cess)h
(with)e(minim)n(um)d(double)k(precision)f(real)g(v)m(alue)164
565 y Fd(MPI)p 279 565 V 20 w(REDUCEB\()19 b(in)n(buf,)g(outbuf,)g(len,)g
(tag,)f(group,)g(ro)r(ot,)g(op\))237 685 y Fh(Is)e(same)f(as)i
Fc(MPI)p 553 685 16 2 v 18 w(REDUCE)p Fh(,)c(restricted)i(to)i(a)g(blo)q(c)o
(k)e(bu\013er.)164 782 y Fd(IN)j(in)n(buf)26 b Fh(\014rst)17
b(lo)q(cation)f(in)g(input)g(bu\013er)164 883 y Fd(OUT)j(outbuf)25
b Fh(\014rst)16 b(lo)q(cation)h(in)f(output)g(bu\013er)h({)g(signi\014can)o
(t)f(only)g(at)g(ro)q(ot)164 983 y Fd(IN)i(len)25 b Fh(n)o(um)o(b)q(er)14
b(of)j(en)o(tries)e(in)h(input)g(and)h(output)g(bu\013er)f(\(in)o(teger\))164
1083 y Fd(IN)i(tag)25 b Fh(op)q(eration)17 b(tag)g(\(in)o(teger\))164
1183 y Fd(IN)h(group)25 b Fh(handle)16 b(to)h(group)164 1283
y Fd(IN)h(ro)r(ot)24 b Fh(rank)16 b(of)h(ro)q(ot)g(pro)q(cess)g(\(in)o
(teger\))164 1384 y Fd(IN)h(op)25 b Fh(op)q(eration)17 b(\(status\))164
1590 y Ff(Discussion:)237 1646 y Fe(If)h(w)o(e)f(are)g(to)f(b)q(e)i
(compatible)g(with)g(the)f(p)q(oin)o(t)h(to)f(p)q(oin)o(t)g(blo)q(c)o(k)h(op)
q(erations,)g(the)f Fb(len)164 1703 y Fe(parameter)g(should)h(indicate)h(the)
e(n)o(um)o(b)q(er)h(of)f(w)o(ords)f(in)j(bu\013er.)26 b(But)17
b(it)h(migh)o(t)f(b)q(e)h(more)164 1759 y(natural)i(to)e(ha)o(v)o(e)i
Fb(len)f Fe(indicate)i(the)e(n)o(um)o(b)q(er)h(of)f(en)o(tries)h(in)h(the)e
(bu\013er,)i(so)e(that)g(if)h(the)164 1816 y(en)o(tries)d(are)g(complex)h(or)
f(double)h(precision,)h Fb(len)d Fe(will)j(b)q(e)f(half)f(the)h(n)o(um)o(b)q
(er)f(of)g(w)o(ords)f(in)164 1872 y(the)f(bu\013er.)164 2173
y Fd(MPI)p 279 2173 17 2 v 20 w(USER)p 452 2173 V 20 w(REDUCE\()k(in)n(buf,)h
(outbuf,)e(tag,)g(group,)h(ro)r(ot,)e(function\))237 2293 y
Fh(Same)d(as)i(the)g(reduce)e(op)q(eration)j(function)e(ab)q(o)o(v)o(e)g
(except)f(that)i(a)g(user)f(supplied)164 2354 y(function)h(is)f(used.)21
b Fc(function)13 b Fh(is)j(an)g(asso)q(ciativ)o(e)f(and)i(comm)o(utativ)n(e)c
(function)i(with)164 2414 y(t)o(w)o(o)i(argumen)o(ts.)23 b(The)17
b(t)o(yp)q(es)f(of)i(the)f(t)o(w)o(o)f(argumen)o(ts)h(and)g(of)h(the)e
(returned)h(v)m(alues)164 2474 y(all)f(agree.)949 2599 y(19)p
eop
%%Page: 20 20
bop 164 307 a Fd(IN)18 b(in)n(buf)26 b Fh(handle)16 b(to)h(input)f(bu\013er)
164 409 y Fd(OUT)j(outbuf)25 b Fh(handle)16 b(to)g(output)h(bu\013er)g({)f
(signi\014can)o(t)g(only)g(at)h(ro)q(ot)164 511 y Fd(IN)h(tag)25
b Fh(op)q(eration)17 b(tag)g(\(in)o(teger\))164 612 y Fd(IN)h(group)25
b Fh(handle)16 b(to)h(group)164 714 y Fd(IN)h(ro)r(ot)24 b
Fh(rank)16 b(of)h(ro)q(ot)g(pro)q(cess)g(\(in)o(teger\))164
816 y Fd(IN)h(function)26 b Fh(user)16 b(pro)o(vided)f(function)164
978 y Fd(MPI)p 279 978 17 2 v 20 w(USER)p 452 978 V 20 w(REDUCEB\()h(in)n
(buf,)i(outbuf,)e(len,)h(tag,)f(group,)g(ro)r(ot,)g(func-)164
1038 y(tion\))164 1098 y Fh(Is)g(same)f(as)i Fc(MPI)p 480 1098
16 2 v 498 1098 V 36 w(USER)p 620 1098 V 17 w(REDUCE)p Fh(,)d(restricted)h
(to)h(a)h(blo)q(c)o(k)f(bu\013er.)164 1200 y Fd(IN)i(in)n(buf)26
b Fh(\014rst)17 b(lo)q(cation)f(in)g(input)g(bu\013er)164 1301
y Fd(OUT)j(outbuf)25 b Fh(\014rst)16 b(lo)q(cation)h(in)f(output)g(bu\013er)h
({)g(signi\014can)o(t)f(only)g(at)g(ro)q(ot)164 1403 y Fd(IN)i(len)25
b Fh(n)o(um)o(b)q(er)14 b(of)j(en)o(tries)e(in)h(input)g(and)h(output)g
(bu\013er)f(\(in)o(teger\))164 1505 y Fd(IN)i(tag)25 b Fh(op)q(eration)17
b(tag)g(\(in)o(teger\))164 1606 y Fd(IN)h(group)25 b Fh(handle)16
b(to)h(group)164 1708 y Fd(IN)h(ro)r(ot)24 b Fh(rank)16 b(of)h(ro)q(ot)g(pro)
q(cess)g(\(in)o(teger\))164 1810 y Fd(IN)h(op)25 b Fh(op)q(eration)17
b(\(status\))164 2021 y Ff(Discussion:)237 2077 y Fe(Do)d(w)o(e)g(also)g(w)o
(an)o(t)f(a)h(v)o(ersion)h(of)e(reduce)j(that)d(broadcasts)h(the)g(result)h
(to)e(all)i(pro)q(cesses)164 2134 y(in)j(the)f(group?)27 b(\(This)17
b(can)h(b)q(e)f(ac)o(hiev)o(ed)i(b)o(y)e(a)g(reduce)h(follo)o(w)o(ed)f(b)o(y)
g(a)g(broadcast,)g(but)g(a)164 2190 y(com)o(bined)f(function)g(ma)o(y)f(b)q
(e)h(somewhat)e(more)h(e\016cien)o(t.)949 2599 y Fh(20)p eop
%%Page: 21 21
bop 164 307 a Fd(Scan)164 367 y(MPI)p 279 367 17 2 v 20 w(SCAN\()20
b(in)n(buf,)f(outbuf,)g(tag,)f(group,)h(op)g(\))237 488 y Fh(MPI)p
336 488 15 2 v 17 w(SCAN)12 b(is)h(used)f(to)h(p)q(erform)f(a)h(parallel)f
(pre\014x)g(with)g(resp)q(ect)g(to)h(an)g(asso)q(cia-)164 548
y(tiv)o(e)f(reduction)h(op)q(eration)h(on)g(data)g(distributed)f(across)h
(the)f(group.)22 b(The)13 b(op)q(eration)164 608 y(returns)k(in)g(the)g
(output)g(bu\013er)h(of)f(the)g(pro)q(cess)h(with)e(rank)i
Fc(i)e Fh(the)h(reduction)g(of)g(the)164 668 y(v)m(alues)k(in)g(the)g(input)g
(bu\013ers)h(of)g(pro)q(cesses)g(with)f(ranks)h Fc(0,...,i)p
Fh(.)33 b(The)22 b(t)o(yp)q(e)f(of)164 729 y(op)q(erations)e(supp)q(orted)h
(and)f(their)e(seman)o(tic,)g(and)i(the)f(constrain)o(ts)g(on)h(input)f(and)
164 789 y(output)f(bu\013ers)g(are)f(as)h(for)f Fc(MPI)p 779
789 16 2 v 18 w(REDUCE)p Fh(.)164 872 y Fd(IN)i(in)n(buf)26
b Fh(handle)16 b(to)h(input)f(bu\013er)164 967 y Fd(OUT)j(outbuf)25
b Fh(handle)16 b(to)g(output)h(bu\013er)164 1063 y Fd(IN)h(tag)25
b Fh(op)q(eration)17 b(tag)g(\(in)o(teger\))164 1158 y Fd(IN)h(group)25
b Fh(handle)16 b(to)h(group)164 1254 y Fd(IN)h(op)25 b Fh(op)q(eration)17
b(\(status\))164 1397 y Fd(MPI)p 279 1397 17 2 v 20 w(SCANB\()j(in)n(buf,)f
(outbuf,)g(len,)f(tag,)h(group,)f(op)h(\))164 1457 y Fh(Same)c(as)i
Fc(MPI)p 435 1457 16 2 v 17 w(SCAN)p Fh(,)e(restricted)g(to)h(blo)q(c)o(k)g
(bu\013ers.)164 1546 y Fd(IN)i(in)n(buf)26 b Fh(\014rst)17
b(input)f(bu\013er)g(elemen)o(t)e(\(c)o(hoice\))164 1642 y
Fd(OUT)19 b(outbuf)25 b Fh(\014rst)16 b(output)h(bu\013er)f(elemen)o(t)e(\(c)
o(hoice\))164 1737 y Fd(IN)k(len)25 b Fh(n)o(um)o(b)q(er)14
b(of)j(en)o(tries)e(in)h(input)g(and)h(output)g(bu\013er)f(\(in)o(teger\))164
1833 y Fd(IN)i(tag)25 b Fh(op)q(eration)17 b(tag)g(\(in)o(teger\))164
1928 y Fd(IN)h(group)25 b Fh(handle)16 b(to)h(group)164 2024
y Fd(IN)h(op)25 b Fh(op)q(eration)17 b(\(status\))164 2173
y Fd(MPI)p 279 2173 17 2 v 20 w(USER)p 452 2173 V 20 w(SCAN\()j(in)n(buf,)f
(outbuf,)g(tag,)f(group,)h(function)g(\))237 2293 y Fh(Same)g(as)h(the)f
(scan)h(op)q(eration)h(function)e(ab)q(o)o(v)o(e)g(except)g(that)h(a)g(user)f
(supplied)164 2354 y(function)d(is)f(used.)21 b Fc(function)13
b Fh(is)j(an)g(asso)q(ciativ)o(e)f(and)i(comm)o(utativ)n(e)c(function)i(with)
164 2414 y(t)o(w)o(o)i(argumen)o(ts.)23 b(The)17 b(t)o(yp)q(es)f(of)i(the)f
(t)o(w)o(o)f(argumen)o(ts)h(and)g(of)h(the)e(returned)h(v)m(alues)164
2474 y(all)f(agree.)949 2599 y(21)p eop
%%Page: 22 22
bop 164 307 a Fd(IN)18 b(in)n(buf)26 b Fh(handle)16 b(to)h(input)f(bu\013er)
164 409 y Fd(OUT)j(outbuf)25 b Fh(handle)16 b(to)g(output)h(bu\013er)164
511 y Fd(IN)h(tag)25 b Fh(op)q(eration)17 b(tag)g(\(in)o(teger\))164
612 y Fd(IN)h(group)25 b Fh(handle)16 b(to)h(group)164 714
y Fd(IN)h(function)26 b Fh(user)16 b(pro)o(vided)f(function)164
876 y Fd(MPI)p 279 876 17 2 v 20 w(USER)p 452 876 V 20 w(SCANB\()k(in)n(buf,)
h(outbuf,)f(len,)f(tag,)h(group,)f(function\))164 936 y Fh(Is)e(same)f(as)i
Fc(MPI)p 480 936 16 2 v 18 w(USER)p 602 936 V 16 w(SCAN)p Fh(,)e(restricted)g
(to)h(a)h(blo)q(c)o(k)f(bu\013er.)164 1038 y Fd(IN)i(in)n(buf)26
b Fh(\014rst)17 b(lo)q(cation)f(in)g(input)g(bu\013er)164 1139
y Fd(OUT)j(outbuf)25 b Fh(\014rst)16 b(lo)q(cation)h(in)f(output)g(bu\013er)
164 1241 y Fd(IN)i(len)25 b Fh(n)o(um)o(b)q(er)14 b(of)j(en)o(tries)e(in)h
(input)g(and)h(output)g(bu\013er)f(\(in)o(teger\))164 1343
y Fd(IN)i(tag)25 b Fh(op)q(eration)17 b(tag)g(\(in)o(teger\))164
1445 y Fd(IN)h(group)25 b Fh(handle)16 b(to)h(group)164 1546
y Fd(IN)h(function)26 b Fh(user)16 b(pro)o(vided)f(function)164
1757 y Ff(Discussion:)237 1817 y Fe(Do)j(w)o(e)g(w)o(an)o(t)f(scan)i(op)q
(erations)f(executed)i(b)o(y)e(segmen)o(ts?)30 b(\(The)18 b(HPF)g
(de\014nition)i(of)164 1878 y(pre\014x)c(and)f(su\016x)g(op)q(eration)h(migh)
o(t)f(b)q(e)h(handy)f({)g(in)h(addition)h(to)d(the)i(scanned)g(v)o(ector)e
(of)164 1938 y(v)m(alues)i(there)g(is)f(a)g(mask)g(that)f(tells)j(where)e
(segmen)o(ts)g(start)f(and)h(end.\))164 2227 y Ff(Missing:)237
2284 y Fe(Non)o(blo)q(c)o(king)d(\(immediate\))g(collectiv)o(e)h(op)q
(erations.)18 b(The)12 b(syn)o(tax)e(is)h(ob)o(vious:)18 b(for)11
b(eac)o(h)164 2340 y(collectiv)o(e)20 b(op)q(eration)e Fb(MPI)p
645 2340 15 2 v 16 w(op\(params\))f Fe(one)h(ma)o(y)f(ha)o(v)o(e)g(a)h(new)g
(non)o(blo)q(c)o(king)h(collectiv)o(e)164 2397 y(op)q(eration)e(of)g(the)g
(form)f Fb(MPI)p 687 2397 V 17 w(Iop\(handle,)22 b(params\))p
Fe(,)16 b(that)h(initiates)h(the)f(execution)h(of)949 2599
y Fh(22)p eop
%%Page: 23 23
bop 164 307 a Fe(the)14 b(corresp)q(onding)i(op)q(eration.)k(The)14
b(execution)i(of)e(the)g(op)q(eration)h(is)g(completed)g(b)o(y)f(exe-)164
364 y(cuting)d Fb(MPI)p 373 364 15 2 v 17 w(WAIT\(handle,...)p
Fe(,)d Fb(MPI)p 843 364 V 17 w(STATUS\(handle,...\))p Fe(,)g
Fb(MPI)p 1385 364 V 17 w(WAITALL)p Fe(,)h Fb(MPI)p 1664 364
V 17 w(WAITANY)p Fe(,)164 420 y(or)15 b Fb(MPI)p 295 420 V
16 w(STATUSANY)p Fe(.)f(There)h(are)g(three)h(issues)g(to)e(consider:)237
477 y(\(i\))22 b(The)h(exact)e(de\014nition)j(of)e(the)g(seman)o(tics)g(of)g
(there)g(op)q(erations)h(\(in)f(particular)164 533 y(constrain)o(ts)15
b(on)g(order.)237 589 y(\(ii\))23 b(The)f(complexit)o(y)h(of)f(implemen)o
(tation)h(\(including)i(the)d(complexit)o(y)h(of)f(ha)o(ving)164
646 y(the)15 b(same)g Fb(WAIT)g Fe(or)f Fb(STATUS)h Fe(functions)h(apply)f(b)
q(oth)h(to)e(p)q(oin)o(t-to-p)q(oin)o(t)i(and)g(to)e(collectiv)o(e)164
702 y(op)q(erations\).)237 763 y(\(iii\))j(The)e(accrued)h(p)q(erformance)f
(adv)m(an)o(tage.)164 1027 y Fi(1.4)70 b(Correctness)164 1240
y Ff(Discussion:)40 b Fe(This)16 b(is)g(still)g(v)o(ery)f(preliminary)237
1421 y Fh(The)g(seman)o(tics)e(of)h(the)h(collectiv)o(e)d(comm)o(uni)o
(cation)g(op)q(erations)k(can)f(b)q(e)f(deriv)o(ed)164 1481
y(from)j(their)g(op)q(erational)i(de\014nition)f(in)f(terms)g(of)h(p)q(oin)o
(t-to-p)q(oin)o(t)i(comm)o(uni)o(cation.)164 1541 y(It)c(is)f(assumed)h(that)
g(messages)g(p)q(ertaining)g(to)h(one)f(op)q(eration)h(cannot)g(b)q(e)f
(confused)164 1601 y(with)g(messages)f(p)q(ertaining)h(to)g(another)h(op)q
(eration.)22 b(Also)15 b(messages)h(p)q(ertaining)g(to)164
1661 y(t)o(w)o(o)21 b(distinct)g(o)q(ccurrences)f(of)i(the)f(same)f(op)q
(eration)j(cannot)f(b)q(e)f(confused,)h(if)f(the)164 1722 y(t)o(w)o(o)c(o)q
(ccurrences)h(ha)o(v)o(e)e(distinct)h(parameters.)24 b(The)18
b(relev)m(an)o(t)f(parameters)f(for)i(this)164 1782 y(purp)q(ose)j(are)e
Fc(group)p Fh(,)g Fc(tag)p Fh(,)g Fc(root)f Fh(and)i Fc(op)p
Fh(.)31 b(messages)19 b(p)q(ertaining)h(to)g(another)g(o)q(c-)164
1842 y(currence)14 b(of)h(the)g(same)f(op)q(eration,)i(with)f(di\013eren)o(t)
g(parameters.)20 b(The)15 b(implem)o(en)n(ter)164 1902 y(can,)i(of)h(course,)
f(use)g(another,)g(more)f(e\016cien)o(t)f(implem)o(en)o(tation,)f(as)k(long)g
(as)f(it)g(has)164 1962 y(the)f(same)f(e\013ect.)164 2132 y
Ff(Discussion:)237 2188 y Fe(This)j(statemen)o(t)f(do)q(es)h(not)f(y)o(et)g
(apply)h(to)f(the)h(curren)o(t,)g(incomplete)h(and)f(somewhat)164
2245 y(careless)e(de\014nitions)h(I)e(pro)o(vided)h(in)g(this)g(draft.)237
2301 y(The)i(de\014nition)h(ab)q(o)o(v)o(e)f(means)f(that)g(messages)g(p)q
(ertaining)i(to)e(a)h(collectiv)o(e)h(comm)o(u-)164 2358 y(nication)g(carry)f
(information)g(iden)o(tifying)i(the)e(op)q(eration)h(itself,)g(and)f(the)h(v)
m(alues)g(of)f(the)164 2414 y Fb(tag,)23 b(group)15 b Fe(and,)g(where)g
(relev)m(an)o(t,)h Fb(root)e Fe(or)h Fb(op)g Fe(parameters.)k(Is)c(this)h
(acceptable?)949 2599 y Fh(23)p eop
%%Page: 24 24
bop 237 488 a Fh(A)16 b(few)g(examples:)164 596 y Fc(MPI_BCAST\()o(buf)o(,)22
b(len,)j(tag,)f(group,)f(0\);)164 656 y(MPI_BCAST\()o(buf)o(,)f(len,)j(tag,)f
(group,)f(1\);)237 765 y Fh(Tw)o(o)c(consecutiv)o(e)d(broadcasts,)j(in)f(the)
g(same)f(group,)i(with)e(the)h(same)f(tag,)i(but)164 825 y(di\013eren)o(t)d
(ro)q(ots.)26 b(Since)16 b(the)h(op)q(erations)i(are)e(distinguishable,)g
(messages)g(from)f(one)164 885 y(broadcast)k(cannot)f(b)q(e)f(confused)h
(with)f(messages)g(from)f(the)h(other)h(broadcast;)h(the)164
945 y(program)c(is)g(safe)h(and)f(will)g(execute)e(as)j(exp)q(ected.)164
1054 y Fc(MPI_BCAST\()o(buf)o(,)22 b(len,)j(tag,)f(group,)f(0\);)164
1114 y(MPI_BCAST\()o(buf)o(,)f(len,)j(tag,)f(group,)f(0\);)237
1222 y Fh(Tw)o(o)c(consecutiv)o(e)e(broadcasts,)k(in)d(the)h(same)e(group,)j
(with)f(the)f(same)g(tag)h(and)164 1282 y(ro)q(ot.)36 b(Since)21
b(p)q(oin)o(t-to-p)q(oin)o(t)h(comm)o(uni)o(cation)c(preserv)o(es)i(the)h
(order)g(of)g(messages)164 1342 y(here,)c(to)q(o,)j(messages)d(from)g(one)h
(broadcast)i(will)d(not)h(b)q(e)g(confused)h(with)e(messages)164
1403 y(from)e(the)h(other)g(broadcast;)i(the)e(program)g(is)g(safe)g(and)h
(will)e(execute)g(as)i(in)o(tended.)164 1511 y Fc(MPI_RANK\(&)o(ran)o(k,)22
b(group\))164 1571 y(if)j(\(rank==0\))215 1631 y({)241 1692
y(MPI_BCASTB)o(\(b)o(uf,)d(len,)i(tag,)g(group,)g(0\);)241
y(MPI_SENDB\()o(bu)o(f,)e(len,)j(1,)f(tag,)h(group\);)215
1812 y(})164 1872 y(elseif)e(\(rank==1\))215 1932 y({)241 1993
y(MPI_RECVB\()o(bu)o(f,)f(len,)j(MPI_DONTC)o(AR)o(E,)d(tag,)j(group\);)241
2053 y(MPI_BCASTB)o(\(b)o(uf,)d(len,)i(tag,)g(group,)g(0\);)241
2113 y(MPI_RECVB\()o(bu)o(f,)e(len,)j(MPI_DONTC)o(AR)o(E,)d(tag,)j(group\);)
215 2173 y(})164 2233 y(else)215 2293 y({)241 2354 y(MPI_SENDB\()o(bu)o(f,)d
(len,)j(1,)f(tag,)h(group\);)241 2414 y(MPI_BCASTB)o(\(b)o(uf,)d(len,)i(tag,)
g(group,)g(0\);)215 2474 y(})949 2599 y Fh(24)p eop
%%Page: 25 25
bop 237 307 a Fh(Pro)q(cess)25 b(zero)f(executes)e(a)j(broadcast)g(follo)o(w)
o(ed)e(b)o(y)h(a)g(send)g(to)h(pro)q(cess)f(one;)164 367 y(pro)q(cess)e(t)o
(w)o(o)g(executes)e(a)i(send)f(to)h(pro)q(cess)g(one,)h(follo)o(w)o(ed)d(b)o
(y)i(a)f(broadcast;)k(and)164 428 y(pro)q(cess)13 b(one)g(executes)e(a)i
(receiv)o(e,)e(a)i(broadcast)g(and)h(a)f(receiv)o(e.)k(A)12
b(p)q(ossible)h(outcome)164 488 y(is)j(for)h(the)f(op)q(erations)h(to)g(b)q
(e)f(matc)o(hed)e(as)j(illustrated)f(b)o(y)f(the)h(diagram)g(b)q(elo)o(w.)267
722 y Fc(0)589 b(1)563 b(2)574 843 y(/)25 b(-)h(>)51 b(receive)305
b(/)25 b(-)g(send)523 903 y(/)640 b(/)164 963 y(broadcast)74
b(/)230 b(broadcast)176 b(/)77 b(broadcast)446 1023 y(/)615
b(/)215 1083 y(send)76 b(-)333 b(receive)48 b(<)25 b(-)237
1318 y Fh(The)18 b(reason)g(is)f(that)h(broadcast)g(is)g(not)f(a)h(sync)o
(hronous)g(op)q(eration;)h(the)e(call)g(at)164 1378 y(a)f(pro)q(cess)h(ma)o
(y)d(return)i(b)q(efore)g(the)g(other)g(pro)q(cesses)g(ha)o(v)o(e)g(en)o
(tered)e(the)i(broadcast.)164 1438 y(Th)o(us,)h(the)f(message)g(sen)o(t)h(b)o
(y)f(pro)q(cess)i(zero)e(can)h(arriv)o(e)f(to)h(pro)q(cess)g(one)g(b)q(efore)
g(the)164 1499 y(message)c(sen)o(t)g(b)o(y)g(pro)q(cess)h(t)o(w)o(o,)g(and)g
(b)q(efore)f(the)h(call)e(to)i(broadcast)h(on)f(pro)q(cess)g(one.)949
2599 y(25)p eop
%%Trailer
end
userdict /end-hook known{end-hook}if
%%EOF
From owner-mpi-collcomm@CS.UTK.EDU  Wed Mar 17 03:33:29 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA11551; Wed, 17 Mar 93 03:33:29 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA00308; Wed, 17 Mar 93 03:33:01 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Wed, 17 Mar 1993 03:33:00 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from gmdzi.gmd.de by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA00300; Wed, 17 Mar 93 03:32:54 -0500
Received: from f1neuman.gmd.de (f1neuman) by gmdzi.gmd.de with SMTP id AA10257
  (5.65c/IDA-1.4.4 for <mpi-collcomm@cs.utk.edu>); Wed, 17 Mar 1993 09:31:20 +0100
Received: by f1neuman.gmd.de id AA15351; Wed, 17 Mar 1993 09:32:45 GMT
Date: Wed, 17 Mar 1993 09:32:45 GMT
From: Rolf.Hempel@gmd.de
Message-Id: <9303170932.AA15351@f1neuman.gmd.de>
To: mpi-collcomm@cs.utk.edu
Subject: information cacheing
Cc: gmap10@f1neuman.gmd.de


Thanks to Rick for the clarification! Without any doubt the proposed
caching mechanism is very useful for implementors of global
communication routines. The remaining question is whether we want to
export it to writers of custom-made collective routines, and therefore
put it into the standard. If we decide to do so, then we have to mark
this section so that the regular MPI user knows that he does not have
to read it.

Rolf Hempel
From owner-mpi-collcomm@CS.UTK.EDU  Wed Mar 17 05:28:26 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA20171; Wed, 17 Mar 93 05:28:26 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA09098; Wed, 17 Mar 93 05:27:37 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Wed, 17 Mar 1993 05:27:36 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from dino.conicit.ve by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA09089; Wed, 17 Mar 93 05:27:32 -0500
Received: by dino.conicit.ve (4.1/SMI-4.1/RP-1.2)
	id AA06437; Wed, 17 Mar 93 06:27:47-040
From: mcuttin@conicit.ve (Marco Cuttin (USB))
Message-Id: <9303171027.AA06437@dino.conicit.ve>
Subject: MPI Group information required
To: gst@ornl.gov, mpi-collcomm@cs.utk.edu
Date: Wed, 17 Mar 93 6:27:47 AST
Cc: cuttin@usb.ve (Marco Cuttin (USB-PDP-SUN))
X-Mailer: ELM [version 2.2 PL13]

Mr. Geist,
We at the Simon Bolivar University of Caracas, Venezuela, are trying to
implement the MPI standard on a transputer platform. We have been
reading the various mails circulated by the MPI committee,
but we need more information about the concept of groups. We
have seen this concept in the original draft standard (A proposal for a
User-level Message-Passing interface in a distributed memory
environment, October 1992).
Please let us know what you mean by the concept of a group of processes,
and send any other information you think will help us in our implementation.
Hoping to hear from you soon, and thanking you in advance,

Sincerely,

Marco Cuttin
cuttin@usb.ve, mcuttin@conicit.ve
FAX: +58-2-238-1816
Phone: +58-2-238-7749

From owner-mpi-collcomm@CS.UTK.EDU  Wed Mar 17 09:25:09 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA15942; Wed, 17 Mar 93 09:25:09 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA19426; Wed, 17 Mar 93 09:24:05 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Wed, 17 Mar 1993 09:24:03 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from gmdzi.gmd.de by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA19418; Wed, 17 Mar 93 09:23:58 -0500
Received: from f1neuman.gmd.de (f1neuman) by gmdzi.gmd.de with SMTP id AA29985
  (5.65c/IDA-1.4.4 for <mpi-collcomm@cs.utk.edu>); Wed, 17 Mar 1993 15:22:23 +0100
Received: by f1neuman.gmd.de id AA15159; Wed, 17 Mar 1993 15:23:46 GMT
Date: Wed, 17 Mar 1993 15:23:46 GMT
From: Rolf.Hempel@gmd.de
Message-Id: <9303171523.AA15159@f1neuman.gmd.de>
To: mpi-collcomm@cs.utk.edu
Subject: New draft
Cc: gmap10@f1neuman.gmd.de


The distribution of the second version of the COLLCOMM draft after
just a few days came as a surprise to me. There are some changes,
and I would like to throw in a few comments:

1. What I said yesterday about the shift function still holds with the
   new draft. If the shift is based on the group topology, the end-off
   version comes as a special case of the (single) shift function,
   depending on whether the cartesian topology is periodic in the
   shift direction or not. I still propose to add the "direction"
   argument.

2. I do not like to return via the "len" argument the difference of
   the buffer length and the actual message length. It is common
   practice to return the message length, and that's what most users
   will expect. If we choose the other definition, this will lead to
   frequent user errors.

3. In Marc's definition, a list of handles contains the number of
   elements as the first entry. I would prefer an additional argument
   over putting together the handles and their number into one vector
   (at least in Fortran). As far as I understand the definition code of
   routine MPI_SCATTER, in the loop
    
      for (i=0; i < size; i++)
         MPI_SEND(inbuf[i], i, tag, group);

   inbuf[i] must be replaced by something like list_of_inbufs.inbuf[i].

4. In function MPI_REDUCE (or in an additional function) I would like
   to see the possibility of specifying different operations for
   different elements of the input buffer. So, it would be possible
   to have a buffer of two reals, and to compute the global sum on the
   first entry and the maximum on the second. In the current proposal
   I don't see how this can be done, at least not in Fortran, even if
   one resorts to the MPI_USER_REDUCE function.
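
To make point 4 concrete, here is a minimal sketch in C of the kind of
two-argument combine function meant there: it sums the first entry and
takes the maximum of the second. It is not taken from the draft. The
names pair_t, combine_sum_max and fold_contributions are hypothetical,
and the loop only stands in for the reduction across the group; no MPI
routine is called. Whether such a function can be passed to
MPI_USER_REDUCE, in particular from Fortran, is exactly the open
question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element type: one buffer of two reals, as in point 4. */
typedef struct {
    double sum_part;  /* entry 1: combined with + */
    double max_part;  /* entry 2: combined with max */
} pair_t;

/* Hypothetical user combine function. Both component operations are
   (mathematically) associative and commutative, so the pairwise
   combination is as well. */
static pair_t combine_sum_max(pair_t a, pair_t b)
{
    pair_t r;
    r.sum_part = a.sum_part + b.sum_part;
    r.max_part = (a.max_part > b.max_part) ? a.max_part : b.max_part;
    return r;
}

/* Stand-in for the reduction itself: folds the contributions of n
   processes in rank order. No MPI call is made here. */
static pair_t fold_contributions(const pair_t *contrib, size_t n)
{
    assert(n > 0);
    pair_t acc = contrib[0];
    for (size_t i = 1; i < n; i++)
        acc = combine_sum_max(acc, contrib[i]);
    return acc;
}
```

With three contributions (1, 5), (2, 3) and (4, 9), the fold gives the
sum 7 in the first entry and the maximum 9 in the second.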

Rolf Hempel
From owner-mpi-collcomm@CS.UTK.EDU  Wed Mar 17 11:25:17 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA19014; Wed, 17 Mar 93 11:25:17 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA26161; Wed, 17 Mar 93 11:21:59 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Wed, 17 Mar 1993 11:21:57 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from watson.ibm.com by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA26153; Wed, 17 Mar 93 11:21:56 -0500
Message-Id: <9303171621.AA26153@CS.UTK.EDU>
Received: from YKTVMV by watson.ibm.com (IBM VM SMTP V2R3) with BSMTP id 2519;
   Wed, 17 Mar 93 11:21:53 EST
Date: Wed, 17 Mar 93 11:09:52 EST
From: "Marc Snir" <snir@watson.ibm.com>
X-Addr: (914) 945-3204  (862-3204)
        28-226 IBM T.J. Watson Research Center
        P.O. Box 218 Yorktown Heights NY 10598
To: mpi-collcomm@cs.utk.edu
Reply-To: SNIR@watson.ibm.com
Subject:  New draft

Reference:  Attached note from Rolf.Hempel at gmd.de




*************** Forwarded Note ***************

Received: from CS.UTK.EDU by watson.ibm.com (IBM VM SMTP V2R3) with TCP;
   Wed, 17 Mar 93 09:27:56 EST
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA19426; Wed, 17 Mar 93 09:24:05 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Wed, 17 Mar 1993 09:24:03 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from gmdzi.gmd.de by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA19418; Wed, 17 Mar 93 09:23:58 -0500
Received: from f1neuman.gmd.de (f1neuman) by gmdzi.gmd.de with SMTP id AA29985
  (5.65c/IDA-1.4.4 for <mpi-collcomm@cs.utk.edu>); Wed, 17 Mar 1993 15:22:23 +0100
Received: by f1neuman.gmd.de id AA15159; Wed, 17 Mar 1993 15:23:46 GMT
Date: Wed, 17 Mar 1993 15:23:46 GMT
From: Rolf.Hempel at gmd.de
Message-Id: <9303171523.AA15159@f1neuman.gmd.de>
To: mpi-collcomm@cs.utk.edu
Subject: New draft
Cc: gmap10@f1neuman.gmd.de


The distribution of the second version of the COLLCOMM draft after
just a few days came as a surprise to me. There are some changes,
and I would like to throw in a few comments:

1. What I said yesterday about the shift function still holds with the
   new draft. If the shift is based on the group topology, the end-off
   version comes as a special case of the (single) shift function,
   depending on whether the cartesian topology is periodic in the
   shift direction or not. I still propose to add the "direction"
   argument.

>>> Is this an argument for "topological shift" functions, that use
>>> CSHIFT or EOSHIFT as appropriate, or is this an argument for
>>> different group shift functions?


2. I do not like to return via the "len" argument the difference of
   the buffer length and the actual message length. It is common
   practice to return the message length, and that's what most users
   will expect. If we choose the other definition, this will lead to
   frequent user errors.

>>> The reason for my choice is that I believe that most of the time
>>> people will check for a match (len=0) or mismatch (len >0).  I
>>> wanted this test to be easy.  But I am willing to bow to "accepted
>>> practice" if, indeed, there is an entrenched practice.



3. In Marc's definition, a list of handles contains the number of
   elements as the first entry. I would prefer an additional argument
   over putting together the handles and their number into one vector
   (at least in Fortran). As far as I understand the definition code of
   routine MPI_SCATTER, in the loop

      for (i=0; i < size; i++)
         MPI_SEND(inbuf[i], i, tag, group);

   inbuf[i] must be replaced by something like list_of_inbufs.inbuf[i].

>>> I don't see the virtue of an additional argument.  The definition
>>> code is, indeed, messier than what I provided, but this makes the
>>> user interface simpler, which is goodness.  I don't think there
>>> will be any significant difference in efficiency.  In any case, if we
>>> change here the definition of "list of handles", it should be
>>> done consistently across MPI, including for the WAITALL, WAITANY
>>> functions. I don't view this as a major issue.

4. In function MPI_REDUCE (or in an additional function) I would like
   to see the possibility of specifying different operations for
   different elements of the input buffer. So, it would be possible
   to have a buffer of two reals, and to compute the global sum on the
   first entry and the maximum on the second. In the current proposal
   I don't see how this can be done, at least not in Fortran, even if
   one resorts to the MPI_USER_REDUCE function.

>>> A proposal will be appreciated.



Rolf Hempel
>>> Thanks for the comments
>>> Marc Snir
From owner-mpi-collcomm@CS.UTK.EDU  Wed Mar 17 15:19:59 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA24555; Wed, 17 Mar 93 15:19:59 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA07968; Wed, 17 Mar 93 15:19:12 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Wed, 17 Mar 1993 15:19:10 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from gstws.EPM.ORNL.GOV by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA07952; Wed, 17 Mar 93 15:19:09 -0500
Received: by gstws.epm.ornl.gov (AIX 3.2/UCB 5.64/4.03)
          id AA16341; Wed, 17 Mar 1993 15:19:07 -0500
Date: Wed, 17 Mar 1993 15:19:07 -0500
From: geist@gstws.epm.ornl.gov (Al Geist)
Message-Id: <9303172019.AA16341@gstws.epm.ornl.gov>
To: mpi-collcomm@cs.utk.edu
Subject: Re: New draft



>What I said yesterday about the shift function still holds with the
>new draft. I still propose to add the "direction" argument.

point taken. I sent out the revised draft before I saw your comments Rolf.
There is the question of what does "direction" mean if the user
has not specified a topology. Can he use the collective functions
without invoking topology routines?

>I do not like to return via the "len" argument the difference of
>the buffer length and the actual message length. It is common
>practice to return the message length, and that's what most users
>will expect.

We should discuss this in Dallas, I suspect most will favor your 
"common practice" approach.

>In Marc's definition, a list of handles contains the number of
>elements as the first entry. I would prefer an additional argument
>over putting together the handles and their number into one vector.

What are the other subcommittee members' thoughts?

>to see the possibility of specifying different operations for
>different elements of the input buffer
>I don't see how this can be done, at least not in Fortran,

We should make sure the draft reads that the MPI_USER_REDUCE function
specified by the user takes the buffer as an argument so that the
user can manipulate the buffer any way he wishes before returning
it to MPI_USER_REDUCE. PICL contains a routine like this so it is possible.

Al
From owner-mpi-collcomm@CS.UTK.EDU  Wed Mar 17 15:28:05 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA24602; Wed, 17 Mar 93 15:28:05 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA08481; Wed, 17 Mar 93 15:27:00 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Wed, 17 Mar 1993 15:26:58 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from sampson.ccsf.caltech.edu by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA08473; Wed, 17 Mar 93 15:26:56 -0500
Received: from elephant (elephant.parasoft.com) by sampson.ccsf.caltech.edu with SMTP id AA22235
  (5.65c/IDA-1.4.4 for mpi-collcomm@cs.utk.edu); Wed, 17 Mar 1993 12:26:52 -0800
Received: from lion.parasoft by elephant (4.1/SMI-4.1)
	id AA20937; Wed, 17 Mar 93 12:19:02 PST
Received: by lion.parasoft (4.1/SMI-4.1)
	id AA02454; Wed, 17 Mar 93 12:19:42 PST
Date: Wed, 17 Mar 93 12:19:42 PST
From: jwf@lion.Parasoft.COM (Jon Flower)
Message-Id: <9303172019.AA02454@lion.parasoft>
To: mpi-collcomm@cs.utk.edu


I'm definitely in favor of having the number of entries in the list
and the list as separate arguments. Although I don't think it's
confusing either way, as long as it's documented clearly, it's
very easy to make simple "off by one" errors if you put everything
together. 

I would like MPI to be consistent in having distinct arguments
for lists of things and the number of them.
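
The two conventions under discussion can be sketched in C as follows. This
is only an illustration: handle_t, sum_packed and sum_separate are
hypothetical names, not part of the draft, and the "handles" are plain
integers so that the loops can be checked.

```c
#include <assert.h>

typedef int handle_t;   /* hypothetical stand-in for an MPI handle */

/* Packed convention: list[0] holds the count, handles are list[1..count]. */
int sum_packed(const handle_t *list)
{
    int n = list[0];
    int total = 0;
    int i;

    assert(n >= 0);
    for (i = 1; i <= n; i++)   /* writing "i = 0; i < n" here is the classic slip */
        total += list[i];
    return total;
}

/* Separate convention: the count travels as its own argument. */
int sum_separate(const handle_t *handles, int n)
{
    int total = 0;
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++)
        total += handles[i];
    return total;
}
```

With the packed form the loop bounds differ from the usual C idiom, which
is where the "off by one" errors tend to come from.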

	Jon

From owner-mpi-collcomm@CS.UTK.EDU  Wed Mar 17 15:31:37 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA24665; Wed, 17 Mar 93 15:31:37 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA08743; Wed, 17 Mar 93 15:30:50 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Wed, 17 Mar 1993 15:30:49 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from watson.ibm.com by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA08725; Wed, 17 Mar 93 15:30:47 -0500
Message-Id: <9303172030.AA08725@CS.UTK.EDU>
Received: from YKTVMV by watson.ibm.com (IBM VM SMTP V2R3) with BSMTP id 5629;
   Wed, 17 Mar 93 15:30:36 EST
Date: Wed, 17 Mar 93 15:25:59 EST
From: "Marc Snir" <snir@watson.ibm.com>
X-Addr: (914) 945-3204  (862-3204)
        28-226 IBM T.J. Watson Research Center
        P.O. Box 218 Yorktown Heights NY 10598
To: mpi-collcomm@cs.utk.edu
Subject: direction for shifts
Reply-To: SNIR@watson.ibm.com

The current definition of shift in the draft assumes no topology information
for the underlying group -- just an ordering of the processes in the group.
Thus, direction is not meaningful in this context.   A "grid shift"
operation that would act on grids (i.e., on topological objects, rather than
groups) could take advantage of this additional parameter.

I don't think it's a good idea to force each group object to be associated with
a topology.   More to come on this.
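
A "grid shift" of the kind mentioned above might compute its partner along
one dimension roughly as in the sketch below. The function name, the
NO_NEIGHBOR marker and the argument list are assumptions made for
illustration; nothing here is taken from the draft. The "direction"
argument would select which dimension's coordinate, extent and periodicity
are passed in.

```c
#include <assert.h>

#define NO_NEIGHBOR (-1)   /* hypothetical marker for "shifted off the end" */

/* Coordinate reached from `coord` by moving `disp` steps along one
 * dimension of extent `extent`.  A periodic dimension wraps around
 * (circular shift); a non-periodic one yields NO_NEIGHBOR once the
 * target falls outside the grid (end-off shift). */
int shift_coord(int coord, int extent, int disp, int periodic)
{
    int target;

    assert(extent > 0 && coord >= 0 && coord < extent);
    target = coord + disp;
    if (periodic) {
        target %= extent;
        if (target < 0)
            target += extent;   /* C's % may give a negative remainder */
        return target;
    }
    return (target >= 0 && target < extent) ? target : NO_NEIGHBOR;
}
```

A group with nothing but a rank ordering corresponds to the one-dimensional
case, where the only open question is the value of `periodic`.
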
From owner-mpi-collcomm@CS.UTK.EDU  Wed Mar 17 17:52:04 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA27117; Wed, 17 Mar 93 17:52:04 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA18194; Wed, 17 Mar 93 17:51:27 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Wed, 17 Mar 1993 17:51:25 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from pnlg.pnl.gov by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA18183; Wed, 17 Mar 93 17:51:21 -0500
Received: from fermi.pnl.gov (130.20.182.50) by pnlg.pnl.gov; Wed, 17 Mar 93
 14:49 PST
Received: by fermi.pnl.gov (4.1/SMI-4.1) id AA22140; Wed, 17 Mar 93 14:47:28 PST
Date: Wed, 17 Mar 93 14:47:27 -0800
From: Robert J Harrison <d3g681@fermi.pnl.gov>
Subject: Re: New draft
To: mpi-collcomm@cs.utk.edu
Message-Id: <9303172247.AA22140@fermi.pnl.gov>
In-Reply-To: Your message of "Wed, 17 Mar 93 15:19:07 EST."
 <9303172019.AA16341@gstws.epm.ornl.gov>
X-Envelope-To: mpi-collcomm@cs.utk.edu

In message <9303172019.AA16341@gstws.epm.ornl.gov> you write:
> 

...

> 
> >I do not like to return via the "len" argument the difference of
> >the buffer length and the actual message length. It is common
> >practice to return the message length, and that's what most users
> >will expect.
> 
> We should discuss this in Dallas, I suspect most will favor your 
> "common practice" approach.

I certainly strongly endorse the common practice argument in this
instance.  Also, in FORTRAN, there is no special interpretation of
zero being equivalent to FALSE, as there is in C.
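
For comparison, here is a small C sketch of what the caller has to do under
each convention. The function names are made up for this illustration, and
it assumes the message is never longer than the buffer.

```c
#include <assert.h>

/* Convention A ("common practice"): len is the received message length. */
int a_message_length(int len, int buflen)
{
    assert(len >= 0 && len <= buflen);
    return len;
}

int a_is_exact_match(int len, int buflen)
{
    return len == buflen;
}

/* Convention B (the draft, as described in this thread):
 * len is the buffer length minus the message length. */
int b_message_length(int len, int buflen)
{
    assert(len >= 0 && len <= buflen);
    return buflen - len;
}

int b_is_exact_match(int len)
{
    return len == 0;
}
```

Convention B saves one comparison operand in the match test; convention A
saves a subtraction whenever the actual length is needed.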

> 
> >In Marc's definition, a list of handles contains the number of
> >elements as the first entry. I would prefer an additional argument
> >over putting together the handles and their number into one vector.
> 
> What are the other subcommittee members' thoughts?

Certainly, in FORTRAN again, it would be much easier, and also far more
consistent with current practice, to manipulate them separately than
together.  It is possible to use an EQUIVALENCE to treat the
array reference as a true scalar variable, but this is generally
deprecated practice.

> 
> >to see the possibility of specifying different operations for
> >different elements of the input buffer
> >I don't see how this can be done, at least not in Fortran,
> 
> We should make sure the draft reads that the MPI_USER_REDUCE function
> specified by the user takes the buffer as an argument so that the
> user can manipulate the buffer any way he wishes before returning
> it to MPI_USER_REDUCE. PICL contains a routine like this so it is possible.

If this functionality of operating on different pieces of the data
vector with different functions is to be supported, it should not
compromise the possible efficiency of the simpler single function
operations.  By requiring that the entire vector be available before
the function operates we preclude some *major* optimizations (e.g.
pipelining, recursive splitting, ...) that can transform, for
example,  naive O(N log P) algorithms to effective O(N) algorithms
(for some N >> P).

I propose therefore that two interfaces be provided.  One that is
capable of functioning by applying a user supplied function on
arbitrary (with constraints of item size) chunks of the vector.
Another, as described by Al, that is given the whole vector.

I also propose that we further discuss if MPI-1 should worry about
providing this second routine.

Robert.
From owner-mpi-collcomm@CS.UTK.EDU  Thu Mar 18 02:44:37 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA06810; Thu, 18 Mar 93 02:44:37 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA13098; Thu, 18 Mar 93 02:44:01 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Thu, 18 Mar 1993 02:44:00 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from gmdzi.gmd.de by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA13089; Thu, 18 Mar 93 02:43:57 -0500
Received: from f1neuman.gmd.de (f1neuman) by gmdzi.gmd.de with SMTP id AA08455
  (5.65c/IDA-1.4.4 for <mpi-collcomm@cs.utk.edu>); Thu, 18 Mar 1993 08:42:23 +0100
Received: by f1neuman.gmd.de id AA15304; Thu, 18 Mar 1993 08:43:49 GMT
Date: Thu, 18 Mar 1993 08:43:49 GMT
From: Rolf.Hempel@gmd.de
Message-Id: <9303180843.AA15304@f1neuman.gmd.de>
To: mpi-collcomm@cs.utk.edu
Subject: more on New Draft
Cc: gmap10@f1neuman.gmd.de


Just a few more thoughts about the new COLLCOMM proposal:

1. Commenting on my proposed change to MPI_SHIFT Marc asks:

   >>> Is this an argument for "topological shift" functions, that use
   >>> CSHIFT or EOSHIFT as appropriate, or is this an argument for
   >>> different group shift functions?

   Well, it depends on whether we can agree on a default topology for
   a group (this is the topology of a group which is not created by
   a topology definition function like MPI_CART). If we define this
   default to be a ring topology, then we need only one shift function.
   I think there is some reason for this approach. After all, the
   linear ordering of processes by rank (with wrap-around as in some
   examples we have seen) is nothing else than a logical ring topology.

2. Yesterday I proposed that we should have a reduce function with
   the capability of applying different operations on different 
   elements of the buffer. Al's suggestion to define the operands
   of the MPI_USER_REDUCE function as being blocks instead of single
   elements seems to resolve my problem. Bob Harrison then said that
   both versions should be available for the sake of efficient
   implementations. Perhaps the algorithms he mentioned (pipelining,
   recursive splitting,...) could work on the level of blocks in a
   vector type buffer. Would this work as a compromise?

Rolf Hempel
From owner-mpi-collcomm@CS.UTK.EDU  Thu Mar 18 08:26:19 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA10327; Thu, 18 Mar 93 08:26:19 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA04137; Thu, 18 Mar 93 08:25:33 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Thu, 18 Mar 1993 08:25:31 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from daedalus.epcc.ed.ac.uk by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA04106; Thu, 18 Mar 93 08:25:24 -0500
Date: Thu, 18 Mar 93 13:25:15 GMT
Message-Id: <9144.9303181325@subnode.epcc.ed.ac.uk>
From: L J Clarke <lyndon@epcc.ed.ac.uk>
Subject: document of March 16
To: mpi-collcomm@cs.utk.edu
Reply-To: lyndon@epcc.ed.ac.uk

Hi all

I just got back from approx.  one week of leave and customer visits, and
what a lovely load of email I find!

I have just read the document of March 16, "Collective Communication",
of Al and Marc.  On first skim-read, this looks generally great,
although I do have a couple of problems with it. 

I suggest that certain sections be deleted from this document, as they
do not appear to be within the remit of the collective communication
subcommittee. The material is:

a) Section 1.2 from "MPI_COPY_CONTEXT(" to the end of section 1.2, as
this would appear to be within the remit of the context subcommittee. 

Comments? Flames??

Best Wishes
Lyndon

         /--------------------------------------------------------\
    e||) | Lyndon J Clarke    Edinburgh Parallel Computing Centre | e||) 
    c||c | Tel: 031 650 5021  Email: lyndon@epcc.edinburgh.ac.uk  | c||c 
         \--------------------------------------------------------/


From owner-mpi-collcomm@CS.UTK.EDU  Thu Mar 18 08:56:56 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA10924; Thu, 18 Mar 93 08:56:56 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA05271; Thu, 18 Mar 93 08:54:32 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Thu, 18 Mar 1993 08:54:31 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from watson.ibm.com by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA05262; Thu, 18 Mar 93 08:54:29 -0500
Message-Id: <9303181354.AA05262@CS.UTK.EDU>
Received: from YKTVMV by watson.ibm.com (IBM VM SMTP V2R3) with BSMTP id 1801;
   Thu, 18 Mar 93 08:54:30 EST
Date: Thu, 18 Mar 93 08:51:17 EST
From: "Marc Snir" <snir@watson.ibm.com>
X-Addr: (914) 945-3204  (862-3204)
        28-226 IBM T.J. Watson Research Center
        P.O. Box 218 Yorktown Heights NY 10598
To: mpi-collcomm@cs.utk.edu
Reply-To: SNIR@watson.ibm.com
Subject: reduce

*************** Referenced Note ***************

Received: from CS.UTK.EDU by watson.ibm.com (IBM VM SMTP V2R3) with TCP;
   Thu, 18 Mar 93 02:47:11 EST
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA13098; Thu, 18 Mar 93 02:44:01 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Thu, 18 Mar 1993 02:44:00 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from gmdzi.gmd.de by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA13089; Thu, 18 Mar 93 02:43:57 -0500
Received: from f1neuman.gmd.de (f1neuman) by gmdzi.gmd.de with SMTP id AA08455
  (5.65c/IDA-1.4.4 for <mpi-collcomm@cs.utk.edu>); Thu, 18 Mar 1993 08:42:23 +0100
Received: by f1neuman.gmd.de id AA15304; Thu, 18 Mar 1993 08:43:49 GMT
Date: Thu, 18 Mar 1993 08:43:49 GMT
From: Rolf.Hempel@gmd.de
Message-Id: <9303180843.AA15304@f1neuman.gmd.de>
To: mpi-collcomm@cs.utk.edu
Subject: more on New Draft
Cc: gmap10@f1neuman.gmd.de


Just a few more thoughts about the new COLLCOMM proposal:

1. Commenting on my proposed change to MPI_SHIFT Marc asks:

   >>> Is this an argument for "topological shift" functions, that use
   >>> CSHIFT or EOSHIFT as appropriate, or is this an argument for
   >>> different group shift functions?

   Well, it depends on whether we can agree on a default topology for
   a group (this is the topology of a group which is not created by
   a topology definition function like MPI_CART). If we define this
   default to be a ring topology, then we need only one shift function.
   I think there is some reason for this approach. After all, the
   linear ordering of processes by rank (with wrap-around as in some
   examples we have seen) is nothing else than a logical ring topology.

*** I think this discussion has to be postponed until we resolve the
*** question of the status of topologies in MPI.


2. Yesterday I proposed that we should have a reduce function with
   the capability of applying different operations on different
   elements of the buffer. Al's suggestion to define the operands
   of the MPI_USER_REDUCE function as being blocks instead of single
   elements seems to resolve my problem. Bob Harrison then said that
   both versions should be available for the sake of efficient
   implementations. Perhaps the algorithms he mentioned (pipelining,
   recursive splitting,...) could work on the level of blocks in a
   vector type buffer. Would this work as a compromise?

*** The proposal, I assume, is to have two user defined reduce functions:
*** one is elemental, and applies to each element in the input buffer;
*** the other applies to the entire input buffer, as one argument.



Rolf Hempel

*** Marc


From owner-mpi-collcomm@CS.UTK.EDU  Thu Mar 18 09:09:45 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA11145; Thu, 18 Mar 93 09:09:45 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA05806; Thu, 18 Mar 93 09:08:55 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Thu, 18 Mar 1993 09:08:54 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from sampson.ccsf.caltech.edu by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA05798; Thu, 18 Mar 93 09:08:51 -0500
Received: from elephant (elephant.parasoft.com) by sampson.ccsf.caltech.edu with SMTP id AA17821
  (5.65c/IDA-1.4.4 for mpi-collcomm@cs.utk.edu); Thu, 18 Mar 1993 06:08:49 -0800
Received: from lion.parasoft by elephant (4.1/SMI-4.1)
	id AA05586; Thu, 18 Mar 93 06:00:54 PST
Received: by lion.parasoft (4.1/SMI-4.1)
	id AA02012; Thu, 18 Mar 93 06:01:37 PST
Date: Thu, 18 Mar 93 06:01:37 PST
From: jwf@lion.Parasoft.COM (Jon Flower)
Message-Id: <9303181401.AA02012@lion.parasoft>
To: mpi-collcomm@cs.utk.edu
Subject: Default topology


I agree with Rolf. Whether we like it or not there is
definitely a default topology associated with every group
of nodes. In fact it's the one that most people take advantage
of in their programs whenever they make use of processor
numbers that are merely ranks in this group.

I think it would be eminently sensible to take advantage
of this default topology and endow it with whatever
properties all the other topologies will have. That way
we can have both opaque node identifiers but still easy
access to rank information without inventing yet another
set of routines.

I think the shift function (and its partner, exchange)
are extremely useful.

I also agree with the comments about the user level REDUCE 
operation. We have implemented the "Blocks" approach in
Express and it seems to work fine. Of course the user
can cause it to break by having extremely large blocks
but that's always going to be true.

	Jon
From owner-mpi-collcomm@CS.UTK.EDU  Thu Mar 18 09:43:46 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA11672; Thu, 18 Mar 93 09:43:46 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA07311; Thu, 18 Mar 93 09:43:18 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Thu, 18 Mar 1993 09:43:17 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from marge.meiko.com by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA07302; Thu, 18 Mar 93 09:43:14 -0500
Received: from hub.meiko.co.uk by marge.meiko.com with SMTP id AA13129
  (5.65c/IDA-1.4.4 for <mpi-collcomm@cs.utk.edu>); Thu, 18 Mar 1993 09:43:10 -0500
Received: from float.co.uk (float.meiko.co.uk) by hub.meiko.co.uk (4.1/SMI-4.1)
	id AA11776; Thu, 18 Mar 93 14:43:06 GMT
Date: Thu, 18 Mar 93 14:43:06 GMT
From: jim@meiko.co.uk (James Cownie)
Message-Id: <9303181443.AA11776@hub.meiko.co.uk>
Received: by float.co.uk (5.0/SMI-SVR4)
	id AA01402; Thu, 18 Mar 93 14:39:55 GMT
To: mpi-collcomm@cs.utk.edu
In-Reply-To: Jon Flower's message of Thu, 18 Mar 93 06:01:37 PST <9303181401.AA02012@lion.parasoft>
Subject: User reduction functions
Content-Length: 1422

From a performance point of view it is important to allow the
vectorisation of these calls, especially if they need a whole
transition into user space (as may be the case in some
implementations). [See Nessett's work on data conversion for the
effects of removing similarly cheap subroutine calls from the loop].

I would therefore suggest that the user function operations should
look like this.

MPI_USER_XXX(inbuf,outbuf,tag,group,function,BLOCKSIZE)

where BLOCKSIZE must be a factor of the length of inbuf, and is the
smallest chunk which will be passed to the function. [This lets the
implementation split the buffer if that is beneficial, while allowing
the user to ensure that suitable contiguous chunks are kept together
if that is a requirement, say because the buffer is really an array of
structures. ]

The user function should ALWAYS look something like

void reduceFunction(inbuf1, inbuf2, outbuf, nelems)
{
   register int i;

   for (i=0; i < nelems; i++)
     outbuf[i] = inbuf1[i] OP inbuf2[i];
}	

Questions :--
1) Should nelems be passed in, or can the user function obtain this
   from the buffer descriptors (cheaply !)
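
As a concrete illustration of the BLOCKSIZE idea, the sketch below treats
the buffer as an array of {sum, max} pairs, so BLOCKSIZE is 2. The user
function follows the shape given above; apply_in_chunks is a hypothetical
stand-in for whatever splitting an implementation might choose and is not
part of any proposal.

```c
#include <assert.h>

#define BLOCKSIZE 2   /* one {sum, max} pair per block (illustrative) */

/* User function: combine nelems elements of two input buffers.
 * nelems is always a multiple of BLOCKSIZE, so each pair stays whole. */
void sum_max_reduce(const double *inbuf1, const double *inbuf2,
                    double *outbuf, int nelems)
{
    int i;

    assert(nelems % BLOCKSIZE == 0);
    for (i = 0; i < nelems; i += BLOCKSIZE) {
        /* first entry of the pair: sum */
        outbuf[i] = inbuf1[i] + inbuf2[i];
        /* second entry of the pair: maximum */
        outbuf[i + 1] = inbuf1[i + 1] > inbuf2[i + 1] ? inbuf1[i + 1]
                                                      : inbuf2[i + 1];
    }
}

/* Hypothetical driver: hand the buffers to the user function in chunks
 * of `chunk` elements, where chunk is a multiple of BLOCKSIZE. */
void apply_in_chunks(const double *a, const double *b, double *out,
                     int n, int chunk)
{
    int off;

    assert(chunk > 0 && chunk % BLOCKSIZE == 0 && n % BLOCKSIZE == 0);
    for (off = 0; off < n; off += chunk) {
        int len = (n - off < chunk) ? n - off : chunk;
        sum_max_reduce(a + off, b + off, out + off, len);
    }
}
```

The result does not depend on the chunk size, which is what lets the
implementation split the buffer when that is beneficial.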

-- Jim
James Cownie 
Meiko Limited			Meiko Inc.
650 Aztec West			Reservoir Place
Bristol BS12 4SD		1601 Trapelo Road
England				Waltham
				MA 02154

Phone : +44 454 616171		+1 617 890 7676
FAX   : +44 454 618188		+1 617 890 5042
E-Mail: jim@meiko.co.uk   or    jim@meiko.com

From owner-mpi-collcomm@CS.UTK.EDU  Thu Mar 18 11:56:59 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA14768; Thu, 18 Mar 93 11:56:59 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA13527; Thu, 18 Mar 93 11:55:38 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Thu, 18 Mar 1993 11:55:36 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from pnlg.pnl.gov by CS.UTK.EDU with SMTP (5.61++/2.8s-UTK)
	id AA13498; Thu, 18 Mar 93 11:55:30 -0500
Received: from fermi.pnl.gov (130.20.182.50) by pnlg.pnl.gov; Thu, 18 Mar 93
 08:54 PST
Received: by fermi.pnl.gov (4.1/SMI-4.1) id AA23964; Thu, 18 Mar 93 08:53:04 PST
Date: Thu, 18 Mar 93 08:53:03 -0800
From: Robert J Harrison <d3g681@fermi.pnl.gov>
Subject: Re: User reduction functions
To: mpi-collcomm@cs.utk.edu
Message-Id: <9303181653.AA23964@fermi.pnl.gov>
In-Reply-To: Your message of "Thu, 18 Mar 93 14:43:06 GMT."
 <9303181443.AA11776@hub.meiko.co.uk>
X-Envelope-To: mpi-collcomm@cs.utk.edu

In message <9303181443.AA11776@hub.meiko.co.uk> you write:

c.f. the previous discussion about supporting multiple functions
     to operate on disjoint sections: Jim's syntax might be slightly
     adjusted to include an additional argument, base, the offset
     of this array chunk in the entire vector.
> 
> The user function should ALWAYS look something like
> 
> void reduceFunction(inbuf1, inbuf2, outbuf, nelems)

  void reduceFunction(inbuf1, inbuf2, outbuf, nelems, base)

> {
>    register int i;
> 
>    for (i=0; i < nelems; i++)
>      outbuf[i] = inbuf1[i] OP inbuf2[i];

       outbuf[i] = inbuf1[i] OP[i+base] inbuf2[i];

> }	

OP[i+base] could of course be independent of its argument.
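
A minimal sketch of the base-offset variant, assuming for illustration that
even positions of the full vector are summed and odd positions take the
maximum. The names combine and reduce_with_base are hypothetical.

```c
#include <assert.h>

/* Operation chosen by position in the whole vector (illustrative rule):
 * even index -> sum, odd index -> maximum. */
double combine(int global_index, double x, double y)
{
    if (global_index % 2 == 0)
        return x + y;
    return x > y ? x : y;
}

/* User function with the extra `base` argument: the offset of this
 * chunk within the entire vector. */
void reduce_with_base(const double *inbuf1, const double *inbuf2,
                      double *outbuf, int nelems, int base)
{
    int i;

    assert(nelems >= 0 && base >= 0);
    for (i = 0; i < nelems; i++)
        outbuf[i] = combine(i + base, inbuf1[i], inbuf2[i]);
}
```

Because the operation depends on i+base rather than on i alone, the vector
can be cut at any element boundary without changing the result.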

> 
> Questions :--
> 1) Should nelems be passed in, or can the user function obtain this
>    from the buffer descriptors (cheaply !)

I would recommend that all required information be passed directly
in as arguments.  Since FORTRAN does not typically support macros
or inline functions in as clean a fashion as C or C++, there is
little chance to optimize away any subroutine call overhead.

Robert.


Robert J. Harrison

Mail Stop K1-90                             tel: 509-375-2037
Battelle Pacific Northwest Laboratory       fax: 509-375-6631
P.O. Box 999, Richland WA 99352          E-mail: rj_harrison@pnl.gov

From owner-mpi-collcomm@CS.UTK.EDU  Mon Mar 22 13:28:31 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA28628; Mon, 22 Mar 93 13:28:31 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA19822; Mon, 22 Mar 93 13:27:26 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Mon, 22 Mar 1993 13:27:25 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from pnlg.pnl.gov by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA19813; Mon, 22 Mar 93 13:27:23 -0500
Received: from carbon.pnl.gov (130.20.188.38) by pnlg.pnl.gov; Mon, 22 Mar 93
 10:23 PST
Received: from sodium.pnl.gov by carbon.pnl.gov (4.1/SMI-4.1) id AA06637; Mon,
 22 Mar 93 10:22:02 PST
Received: by sodium.pnl.gov (4.1/SMI-4.0) id AA16699; Mon, 22 Mar 93 10:21:58
 PST
Date: Mon, 22 Mar 93 10:21:58 PST
From: rj_littlefield@pnlg.pnl.gov
Subject: inbuf == outbuf
To: geist@gstws.epm.ornl.gov, mpi-collcomm@cs.utk.edu
Cc: d39135@carbon.pnl.gov
Message-Id: <9303221821.AA16699@sodium.pnl.gov>
X-Envelope-To: mpi-collcomm@cs.utk.edu

The March 16 collective communication draft asks:

> Do we want to support the case {\tt inbuf = outbuf} somehow?

Yes -- this is important to some of our applications.

If inbuf=outbuf is not permitted, then these applications
have to explicitly copy data and to allocate extra storage.
The extra storage may also have to be smaller than the
buffer that the application would otherwise handle, due to
hard limits on process memory.

The resulting application code is certainly longer and
probably slower than it would be with inbuf=outbuf.

However, I believe it would be a mistake to allow arbitrary
overlap, due to the difficulty of writing correct code,
particularly in user reduction routines.  It would be OK if
inbuf and outbuf had to be either disjoint or coincident.
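
The "disjoint or coincident" rule could be checked along the lines of the
sketch below. It assumes both buffers are nbytes long and that addresses
can be compared as integers (a flat address space); buffers_ok is a
hypothetical helper, not a proposed MPI routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the two buffers are the same buffer or do not overlap
 * at all, and 0 if they overlap only partially. */
int buffers_ok(const void *inbuf, const void *outbuf, size_t nbytes)
{
    uintptr_t in = (uintptr_t)inbuf;
    uintptr_t out = (uintptr_t)outbuf;

    assert(inbuf != NULL && outbuf != NULL);
    if (in == out)
        return 1;                       /* coincident */
    if (in + nbytes <= out || out + nbytes <= in)
        return 1;                       /* disjoint */
    return 0;                           /* partial overlap */
}
```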

--Rik

----------------------------------------------------------------------
rj_littlefield@pnl.gov (alias 'd39135')   Rik Littlefield
Tel: 509-375-3927                         Pacific Northwest Lab, MS K1-87
Fax: 509-375-6631                         P.O.Box 999, Richland, WA  99352
From owner-mpi-collcomm@CS.UTK.EDU  Tue Mar 23 14:27:00 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA25676; Tue, 23 Mar 93 14:27:00 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA25694; Tue, 23 Mar 93 14:26:33 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Tue, 23 Mar 1993 14:26:32 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from almaden.ibm.com by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA25686; Tue, 23 Mar 93 14:26:30 -0500
Message-Id: <9303231926.AA25686@CS.UTK.EDU>
Received: from almaden.ibm.com by almaden.ibm.com (IBM VM SMTP V2R2)
   with BSMTP id 4942; Tue, 23 Mar 93 11:26:55 PST
Date: Tue, 23 Mar 93 11:24:18 PST
From: "Ching-Tien (Howard) Ho" <ho@almaden.ibm.com>
To: mpi-collcomm@cs.utk.edu
Subject: No tag for a CC routine?

Hi,
  I would like to revisit an old issue regarding the Collective Communication (CC)
proposal to MPI.

Do we really need a user-supplied tag (aka type) in a CC call?

I know most of you believe a tag is needed and am ready to get a lot of
objections.  I remember the issue was raised in the first MPI meeting but not
really resolved.  I didn't attend the 2nd one and don't know if that was
discussed.  FYI, the original design of Venus, by Bala
and Kipnis, took a tag as well.  However, the
newer version of Venus also removed a tag from a call to CC.
In the Collective Communication Library (CCL)
which is part of the External User Interface (EUI) of IBM's Scalable Parallel
Systems, we decided not to take "tags" for CC routines after various
discussions.  (See some supporting arguments below.  The receive-by-source
methodology is described in another part of a forthcoming paper of ours.)

========================================================================
\subsection{No Tags for CCL Routines}

Certain communication libraries
require the user to supply a user tag to each CCL call.  There
are a few disadvantages to this approach.  Consider two typical cases
for the semantics of this user's tag.  One is that the user needs to
guarantee that the tag is uniquely matched within the given group
instance and cannot be matched with any other group instances existing
at the same time, for all possible program runs.  The other case is
that the user only guarantees that the tag is matched within the given
group instance, but may not be unique at a given time.

Consider the semantics of the first case for the user's tag.  Although
this helps to simplify the implementation, it is inconvenient and
tedious, if not impossible, for the user to guarantee such property on
the tag, given the fact that the receive-by-source methodology used in
the implementation of CCL can solve the
matching problem gracefully.  For the second case, it means that the
implementation cannot use the user's tag alone in selecting the
expected incoming message, as confusion may occur.  In fact, using
(gid, tag) to select an incoming message still cannot guarantee
correctness.  Thus, the receive-by-source implementation is still
needed to guarantee correctness.  In other words, the tag is
really a redundant field which does not add more functionality and
cannot substitute for the receive-by-source
implementation.  Furthermore, the semantics of tag here are not
consistent with those in the context of point-to-point communication.
Specifically, the
tag here is used for message matching from a given source.  (That is, if
the tag of the expected message does not match with the tag of the
first message from the specified source, CCL returns an error
flag.)  In contrast, the tag in point-to-point routines is used in
selecting a message from a given source.  (That is if the tag of the
expected message does not match with the tag of the first message from
the specified source, the receive call simply blocks until there is
one that matches.)  In summary, a tag is a redundant argument to the CCL
routines. It is not required for correct implementation of CCL, and it may
cause confusion to the users.  The only advantage for
having a tag field in collective communication routines is that in
an incorrect program where the collective communication routines
are mismatched, one may be able to locate the mismatch at
an earlier place than otherwise (assuming the user does not introduce
tag-mismatch errors).
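
The difference between the two tag semantics described above can be
sketched as follows.  This is only an illustrative model, not CCL or MPI
code: the per-source message queue and the names receive_matching and
receive_selecting are hypothetical.

```python
from collections import deque

def receive_matching(queue, tag):
    """Collective-style tag: the FIRST message from the source must match."""
    msg_tag, payload = queue[0]
    if msg_tag != tag:
        raise ValueError("tag mismatch")   # the library would return an error flag
    queue.popleft()
    return payload

def receive_selecting(queue, tag):
    """Point-to-point-style tag: pick the first message carrying the tag."""
    for i, (msg_tag, payload) in enumerate(queue):
        if msg_tag == tag:
            del queue[i]
            return payload
    return None                            # a real receive would block here

# Messages from one source, in arrival order: (tag, payload).
pending = deque([(1, "a"), (2, "b")])
selected = receive_selecting(pending, 2)   # skips over the tag-1 message
matched = receive_matching(pending, 1)     # head of queue carries tag 1
```

With matching semantics the tag can only confirm or reject the next
message; it never changes which message is delivered.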

===========================================================================

Any comments?

-- Howard







From owner-mpi-collcomm@CS.UTK.EDU  Tue Mar 23 14:38:00 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA25900; Tue, 23 Mar 93 14:38:00 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA26157; Tue, 23 Mar 93 14:37:32 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Tue, 23 Mar 1993 14:37:31 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from daedalus.epcc.ed.ac.uk by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA26148; Tue, 23 Mar 93 14:37:27 -0500
Date: Tue, 23 Mar 93 19:37:21 GMT
Message-Id: <16027.9303231937@subnode.epcc.ed.ac.uk>
From: L J Clarke <lyndon@epcc.ed.ac.uk>
Subject: Re: No tag for a CC routine?
To: "Ching-Tien (Howard) Ho" <ho@almaden.ibm.com>, mpi-collcomm@cs.utk.edu
In-Reply-To: Howard's message of Tue, 23 Mar 93 11:24:18 PST
Reply-To: lyndon@epcc.ed.ac.uk

> Hi,
>   I like to revisit an old issue regarding the Collective Communication (CC)
> proposal to MPI.

I support the specification of collective communications without use of
message tag. I just cannot see that it is needed there.

Best Wishes
Lyndon

         /--------------------------------------------------------\
    e||) | Lyndon J Clarke    Edinburgh Parallel Computing Centre | e||) 
    c||c | Tel: 031 650 5021  Email: lyndon@epcc.edinburgh.ac.uk  | c||c 
         \--------------------------------------------------------/


From owner-mpi-collcomm@CS.UTK.EDU  Wed Mar 24 00:12:23 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA04586; Wed, 24 Mar 93 00:12:23 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA19034; Wed, 24 Mar 93 00:11:46 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Wed, 24 Mar 1993 00:11:44 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from watson.ibm.com by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA19026; Wed, 24 Mar 93 00:11:41 -0500
Message-Id: <9303240511.AA19026@CS.UTK.EDU>
Received: from YKTVMV by watson.ibm.com (IBM VM SMTP V2R3) with BSMTP id 2163;
   Wed, 24 Mar 93 00:11:41 EST
Date: Wed, 24 Mar 93 00:11:40 EST
From: "Marc Snir" <snir@watson.ibm.com>
To: MPI-COLLCOMM@CS.UTK.EDU

\documentstyle[12pt]{article}


\newcommand{\discuss}[1]{
\ \\ \ \\ {\small {\bf Discussion:} #1} \ \\ \ \\
}

\newcommand{\missing}[1]{
\ \\ \ \\ {\small {\bf Missing:} #1} \\ \ \\
}

\begin{document}

\title{ Collective Communication}


\author{Al Geist \\ Marc Snir}
\maketitle

\section{Collective Communication}
\subsection{Introduction}

This section is a draft of the current proposal for collective communication.
Collective communication is defined to be communication that involves
a group of processes.  Examples are broadcast and global sum.
A collective operation is executed by having all processes in the group call the
communication routine, with matching parameters.
Routines can (but are not required to) return as soon as their
participation in the collective communication is complete.  The completion
of a call indicates that the caller is now free to access the locations in the
communication buffer, or any other location that can be referenced by the
collective operation.  However, it does not indicate that other processes in
the group have started the operation (unless otherwise indicated in the
description of the operation).   On the other hand, the successful completion of
a collective communication call may depend on the execution of a matching call
at all processes in the group.

The syntax and semantics of the collective operations are
defined so as to be consistent with the syntax and semantics of the point to
point operations.

The reader is referred to the point-to-point communication section of the current
MPI draft for information concerning groups (aka contexts) and group formation
operations, and for general information on types of objects used by the MPI
library.

The collective communication routines are built above the point-to-point
routines.  While vendors may optimize certain collective routines for
their architectures, a complete library of the collective communication
routines can be written entirely using point-to-point communication
functions.  We are using naive implementations of the collective calls in terms
of point to point operations in order to provide an operational definition of
their semantics.

The following communication functions are proposed.
\begin{itemize}
\item
Broadcast from one member to all members of a group.
\item
Barrier across all group members
\item
Gather data from all group members to one member.
\item
Scatter data from one member to all members of a group.
\item
Global operations such as sum, max, min, etc., where the result
is known by all group members, and a variation where the result is
known by only one member, together with the ability to have user defined
global operations.
\item
Simultaneous shift of data around the group, the simplest example
being all members sending their data to (rank+1) with wrap around.
\item
Scan across all members of a group (also called parallel prefix).
\item
Broadcast from all members to all members of a group.
\item
Scatter data from all members to all members of a group
(also called complete exchange or index).
\end{itemize}

To simplify the collective communication interface it is
designed with two layers. The low level routines have all the
generality of, and make use of, the buffer descriptor routines
of the point-to-point section which allows arbitrarily complex
messages to be constructed. The second level routines are
similar to the upper level point-to-point routines in that they send
only a contiguous buffer.

\missing {

The current draft does not include the nonblocking collective communication
calls that were discussed at the last meeting.
}

\discuss{

The current proposal assumes that a group carries no ``topology''
information; it is just an ordered set of processes.
}

\subsection{Group Functions}

The point to point document discusses the use of groups (aka contexts), and
describes the operations available for the creation and manipulation of
groups and group objects. For the sake of completeness, we list
them anew here.


{\bf \ \\ MPI\_CREATE(handle, type, persistence)} \\
Create new opaque object
\begin{description}
\item[OUT handle] handle to object
\item[IN type] state value that identifies the type of object to be created
\item[IN persistence] state value; either {\tt MPI\_PERSISTENT} or {\tt
MPI\_EPHEMERAL}.
\end{description}

{\bf \ \\ MPI\_FREE(handle)} \\
Destroy object associated with handle.
\begin{description}
\item[IN handle] handle to object
\end{description}


{\bf \ \\ MPI\_ASSOCIATED(handle, type)}  \\
Returns the type of the object the handle is currently associated with, if
such exists.  Returns the special type {\tt MPI\_NULL} if the handle is
not currently associated with any object.
\begin{description}
\item[IN handle] handle to object
\item[OUT type] state
\end{description}


{\bf \ \\ MPI\_COPY\_CONTEXT(newcontext, context)}  \\

Create a new context that includes all processes in the old context.
The rank of the processes in the previous context is preserved.  The call must
be executed by all processes in the old context.  It is a blocking call:  No
call returns until all processes have called the function.
\begin{description}
\item[OUT newcontext]  handle to newly created context.  The handle should not
be associated with an object before the call.
\item[IN context] handle to old context
\end{description}

{\bf \ \\ MPI\_NEW\_CONTEXT(newcontext, context, key, index)} \\
A new context is created for
each distinct value of {\tt key}; this context is shared by all processes that
made the call with this key value.  Within each new context the processes are
ranked according to the order of the {\tt index} values they provided; in case
of ties, processes are ranked according to their rank in the old context.
This call is blocking:  No call returns until all processes in the old context
executed the call.
\begin{description}
\item[OUT newcontext] handle to newly created context at calling process.   This
handle should not be associated with an object before the call.
\item[IN context] handle to old context
\item[IN key] integer
\item[IN index] integer
\end{description}
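
The ranking rule of {\tt MPI\_NEW\_CONTEXT} can be illustrated by a small
sequential model.  This is a sketch under stated assumptions, not part of
the proposal: the name {\tt new\_context} and the representation of the
calls as a list of (key, index) pairs, one per old rank, are hypothetical.

```python
from collections import defaultdict

def new_context(calls):
    """calls[old_rank] = (key, index) supplied by that process.

    Returns {key: [old ranks listed in new-rank order]}.
    """
    groups = defaultdict(list)
    for old_rank, (key, index) in enumerate(calls):
        groups[key].append((index, old_rank))
    # Order by index; ties are broken by rank in the old context.
    return {key: [old for _, old in sorted(members)]
            for key, members in groups.items()}

# Five processes; ranks 0, 2 and 4 use key 0, ranks 1 and 3 use key 1.
contexts = new_context([(0, 5), (1, 2), (0, 5), (1, 1), (0, 3)])
```

Here old ranks 0 and 2 supply the same index, so they keep their relative
order from the old context.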

{\bf \ \\ MPI\_RANK(rank, context)} \\
Return the rank of the calling process within the specified context.
\begin{description}
\item[OUT rank] integer
\item[IN context] context handle
\end{description}


{\bf \ \\ MPI\_SIZE(size, context)} \\
Return the number of processes that belong to the specified context.
\begin{description}
\item[OUT size] integer
\item[IN context] context handle
\end{description}

\paragraph*{Extensions}
Possible extensions:

{\bf \ \\ MPI\_CREATE\_CONTEXT(newcontext, oldcontext,
list\_of\_ranks)} \\
Creates a new context out of an explicit list of members
and ranks them in their order of occurrence in the list.
\begin{description}
\item[OUT newcontext] handle to newly created context.  Handle should not
be associated with an object before the call.
\item[IN oldcontext] handle to previous context.
\item[IN list\_of\_ranks]
List of the ranks, in the old group, of the
processes to be included in the new group.
\end{description}

The function is called by all processes in the list, and all
supply the same parameters.


{\bf \ \\ MPI\_EXTEND\_CONTEXT(context, number)} \\
Add processes to an existing context.  The new processes are ranked above
the old context members.
\begin{description}
\item[INOUT context] handle to context object
\item[IN number] number of additional processes (integer)

\end{description}
\subsection{Communication Functions}

The proposed communication functions are divided into two layers.
The lowest level uses the same buffer descriptor objects
available in point-to-point to create noncontiguous, multiple data type
messages. The second level is similar to the block send/receive
point-to-point operations in that it supports only contiguous buffers of
arithmetic storage units.   For each communication operation, we list these two
levels of calls together.


\subsubsection{Synchronization}

\paragraph*{Barrier synchronization}

{\bf \ \\ MPI\_BARRIER( group, tag )} \\

MPI\_BARRIER blocks the calling process until all group members have called
it; the call returns at any process only after all group members have
entered the call.
\begin{description}
\item[IN group] group handle
\item[IN tag] communication tag (integer)
\end{description}

{\tt \ \\ MPI\_BARRIER( group, tag )}  \\ is
\begin{verbatim}
MPI_CREATE(buffer_handle, MPI_BUFFER, MPI_PERSISTENT);
MPI_SIZE( &size, group);
MPI_RANK( &rank, group);
if (rank==0)
{
   for (i=1; i < size; i++)
      MPI_RECV(buffer_handle, i, tag, group);
   for (i=1; i < size; i++)
      MPI_SEND(buffer_handle, i, tag, group);
}
else
{
   MPI_SEND(buffer_handle, 0, tag, group);
   MPI_RECV(buffer_handle, 0, tag, group);
}
MPI_FREE(buffer_handle);
\end{verbatim}

\subsubsection{Data move functions}

\paragraph*{Circular shift}

{\bf \ \\ MPI\_CSHIFT( inbuf, outbuf, tag, group, shift)} \\

Process with rank {\tt i} sends the data in its input buffer to
process with rank $\tt (i+ shift) \bmod  group\_size$, who receives the
data in its output buffer. All processes make the call with the same values for
{\tt tag, group}, and {\tt shift}.  The {\tt shift} value can be positive, zero,
or negative.

\begin{description}
\item[IN inbuf] handle to input buffer descriptor
\item[OUT outbuf] handle to output buffer descriptor
\item[IN tag] operation tag (integer)
\item[IN group] handle to group
\item[IN shift] integer
\end{description}


{\bf \ \\ MPI\_CSHIFTB( inbuf, outbuf, len, tag, group, shift)} \\

Behaves like {\tt MPI\_CSHIFT}, with buffers restricted to be blocks of
numeric units.
All processes make the call with the same values for
{\tt len, tag, group}, and {\tt shift}.
\begin{description}
\item[IN inbuf] initial location of input buffer
\item[OUT outbuf] initial location of output buffer
\item[IN len] number of entries in input (and output) buffers
\item[IN tag] operation tag (integer)
\item[IN group] handle to group
\item[IN shift] integer
\end{description}


{\tt \ \\ MPI\_CSHIFT( inbuf, outbuf, tag, group, shift)} \\ is
\begin{verbatim}
MPI_SIZE( &size, group);
MPI_RANK( &rank, group);
MPI_ISEND( handle, inbuf, mod(rank+shift, size), tag, group);
MPI_RECV( outbuf, mod(rank-shift,size), tag, group);
MPI_WAIT(handle);
\end{verbatim}
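
The wrap-around arithmetic of the circular shift can be checked with a
sequential model.  This is only a sketch: the name {\tt cshift} and the use
of one list entry per rank are assumptions made for illustration.  Note that
{\tt mod} must be a true modulus with a non-negative result; Python's
{\tt \%} behaves this way for a positive divisor, whereas C's {\tt \%} may
return a negative value when {\tt rank-shift} is negative.

```python
def cshift(inbufs, shift):
    """inbufs[i] is the input buffer of rank i; returns the output buffers."""
    size = len(inbufs)
    outbufs = [None] * size
    for rank in range(size):
        dest = (rank + shift) % size      # always in 0..size-1 in Python
        outbufs[dest] = inbufs[rank]
    return outbufs

right = cshift(["a", "b", "c", "d"], 1)   # every rank sends to rank+1
left = cshift(["a", "b", "c", "d"], -1)   # every rank sends to rank-1
```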

\discuss{
Do we want to support the case {\tt inbuf = outbuf} somehow?
}

\paragraph*{End-off shift}

{\bf \ \\ MPI\_EOSHIFT( inbuf, outbuf, tag, group, shift)} \\

Process with rank {\tt i}, $\tt \max( 0, -shift) \le i < \min( size, size -
shift)$, sends the data
in its input buffer to process with rank {\tt i+ shift}, who receives the data
in its output buffer.   The output buffer of processes which do not receive
data is left unchanged.   All processes
make the call with the same values for {\tt tag, group}, and {\tt shift}.

\begin{description}
\item[IN inbuf] handle to input buffer descriptor
\item[OUT outbuf] handle to output buffer descriptor
\item[IN tag] operation tag (integer)
\item[IN group] handle to group
\item[IN shift] integer
\end{description}


{\bf \ \\ MPI\_EOSHIFTB( inbuf, outbuf, len, tag, group, shift)} \\

Behaves like {\tt MPI\_EOSHIFT}, with buffers restricted to be blocks of
numeric units.
All processes make the call with the same values for
{\tt len, tag, group}, and {\tt shift}.
\begin{description}
\item[IN inbuf] initial location of input buffer
\item[OUT outbuf] initial location of output buffer
\item[IN len] number of entries in input (and output) buffers
\item[IN tag] operation tag (integer)
\item[IN group] handle to group
\item[IN shift] integer
\end{description}
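
The range of sending ranks in the end-off shift can be illustrated by a
sequential model.  This is a sketch only; the name {\tt eoshift} and the
list-per-rank representation are assumptions.  It follows the definition
given above, in which the output buffer of a process that receives nothing
is left unchanged.

```python
def eoshift(inbufs, outbufs, shift):
    """Return the output buffers after an end-off shift by `shift`."""
    size = len(inbufs)
    result = list(outbufs)                # untouched unless a message arrives
    first = max(0, -shift)                # lowest rank that sends
    last = min(size, size - shift)        # one past the highest rank that sends
    for rank in range(first, last):
        result[rank + shift] = inbufs[rank]
    return result

data = ["a", "b", "c", "d"]
old = ["-", "-", "-", "-"]
up = eoshift(data, old, 1)
down = eoshift(data, old, -1)
```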

\discuss{

Two other possible definitions for end-off shift: (i) zero filling for processes
that don't receive messages, or (ii) boundary values explicitly provided as an
additional parameter.  Any preferences?
(Fortran 90 allows boundary values to be provided optionally, and does zero
filling if none are provided.)

}

\paragraph*{Broadcast}

{\bf \ \\  MPI\_BCAST( buffer\_handle, tag, group, root )} \\

{\tt MPI\_BCAST} broadcasts a message from the process with rank {\tt root} to
all other processes
of the group. It is called by all members of the group using the same arguments for
{\tt tag, group}, and {\tt root}.
On return, the contents of the buffer of the process with rank {\tt root}
are contained in the buffer of all group members.
\begin{description}
\item[INOUT buffer\_handle]  Handle for the buffer from which the message is
sent or into which it is received.
\item[IN tag] tag of communication operation (integer)
\item[IN group] context of communication (handle)
\item[IN root] rank of broadcast root (integer)
\end{description}


{\bf \ \\  MPI\_BCASTB( buf, len, tag, group, root )} \\

{\tt MPI\_BCASTB} behaves like broadcast, restricted to a block buffer.
It is called by all processes with the same arguments for {\tt len, tag, group}
and {\tt root}.
\begin{description}
\item[INOUT buf]  Starting address of buffer (choice type)
\item[IN len] Number of words in buffer (integer)
\item[IN tag] tag of communication operation (integer)
\item[IN group] context of communication (handle)
\item[IN root] rank of broadcast root (integer)
\end{description}


{\tt \ \\  MPI\_BCAST( buffer\_handle, tag, group, root )} \\
is
\begin{verbatim}
MPI_SIZE( &size, group);
MPI_RANK( &rank, group);
MPI_IRECV(handle, buffer_handle, root, tag, group);
if (rank==root)
   for (i=0; i < size; i++)
      MPI_SEND(buffer_handle, i, tag, group);
MPI_WAIT(handle);
\end{verbatim}

\paragraph*{Gather}

{\bf \ \\ MPI\_GATHER( inbuf, outbuf, tag, group, root, len) } \\

Each process (including the root process) sends the content of its input
buffer to the root process.  The root process concatenates all the
incoming messages in the order of the senders' rank and places the
results in its output buffer.
It is called by all members of group using the same arguments for
{\tt tag, group}, and {\tt root}.   The input buffer of each process may have
a different length.
\begin{description}
\item[IN inbuf] handle to input buffer descriptor
\item[OUT outbuf] handle to output buffer descriptor -- significant only at root
(choice)
\item[IN tag] operation tag (integer)
\item[IN group] group handle
\item[IN root] rank of receiving process (integer)
\item[OUT len] difference between output buffer size (in bytes) and
number of bytes received.
\end{description}

\discuss{

It would be more elegant (but no more convenient) to have a return status
object.

If we follow ``accepted practice'' we shall return number of bytes
received.   The choice here and in subsequent similar functions
should be consistent with similar choice for point to point routines.

}

{\bf \ \\ MPI\_GATHERB( inbuf, inlen, outbuf, tag, group, root) } \\

{\tt MPI\_GATHERB} behaves like {\tt MPI\_GATHER} restricted to block
buffers, and with the additional restriction that all input buffers should
have the same length.   All processes should provide the same values for
{\tt inlen, tag, group}, and {\tt root}.
\begin{description}
\item[IN inbuf] first variable of input buffer (choice)
\item[IN inlen] Number of (word) variables in input buffer (integer)
\item[OUT outbuf] first variable of output buffer -- significant only at
root (choice)
\item[IN tag] operation tag (integer)
\item[IN group] group handle
\item[IN root] rank of receiving process (integer)
\end{description}


{\tt \ \\ MPI\_GATHERB( inbuf, inlen, outbuf, tag, group, root) } \\
is
\begin{verbatim}
MPI_SIZE( &size, group);
MPI_RANK( &rank, group);
MPI_ISENDB(handle, inbuf, inlen, root, tag, group);
if (rank==root)
   for (i=0; i < size; i++)
   {
      MPI_RECVB(outbuf, inlen, i, tag, group, return_status);
      outbuf += inlen;
   }
MPI_WAIT(handle);
\end{verbatim}
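
The result of the block gather can be pictured with a sequential model.
This is a sketch, not an implementation: the name {\tt gatherb} and the
representation of each rank's buffer as a Python list are assumptions.

```python
def gatherb(inbufs, root):
    """inbufs[i] is the input block of rank i; returns every rank's outbuf."""
    inlen = len(inbufs[0])
    # The block variant requires all input buffers to have the same length.
    assert all(len(block) == inlen for block in inbufs)
    outbufs = [None] * len(inbufs)        # outbuf is significant only at root
    outbufs[root] = [entry for block in inbufs for entry in block]
    return outbufs

gathered = gatherb([[1, 2], [3, 4], [5, 6]], root=1)
```

The root's own block appears in the result at the position given by its
rank, like that of any other sender.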

\paragraph*{Scatter}

{\bf \ \\ MPI\_SCATTER( list\_of\_inbufs, outbuf, tag, group, root, len)} \\

The root process sends the content of its {\tt i}-th input buffer
to the process with rank {\tt i}; each process (including the root process)
stores the incoming message in its output buffer.
The difference between the size of
the output buffer (in bytes) and the number of bytes received is returned
in {\tt len}.  The routine is called by all members of the group using the same
arguments for {\tt tag, group}, and {\tt root}.
\begin{description}
\item[IN list\_of\_inbufs] list of buffer descriptor handles
\item[OUT outbuf] buffer descriptor handle
\item[IN tag]  operation tag (integer)
\item[IN group] handle
\item[IN root]  rank of sending process (integer)
\item[OUT len]  number of remaining bytes in the output buffer at each process
(integer)
\end{description}


{\tt \ \\ MPI\_SCATTER( list\_of\_inbufs, outbuf, tag, group, root, len)} \\
is
\begin{verbatim}
MPI_SIZE( &size, group);
MPI_RANK( &rank, group);
MPI_IRECV(handle, outbuf, root, tag, group);
if (rank==root)
   for (i=0; i < size; i++)
      MPI_SEND(list_of_inbufs[i], i, tag, group);
MPI_WAIT(handle, return_status);
MPI_RETURN_STATUS(return_status, len, source, tag);
\end{verbatim}


{\bf \ \\ MPI\_SCATTERB( inbuf, outbuf, len, tag, group, root)}
\\

{\tt MPI\_SCATTERB} behaves like {\tt MPI\_SCATTER} restricted to block buffers,
and with the additional restriction that all output buffers have the same
length. The input buffer block of the root process is partitioned into
{\tt n} consecutive blocks, where {\tt n} is the number of processes in the group,
each consisting of {\tt len} words.  The {\tt i}-th block is sent to the
{\tt i}-th process in the group and stored in its output buffer.
The routine is called by all members of the group using the same
arguments for {\tt tag, group, len}, and {\tt root}.
\begin{description}
\item[IN inbuf] first entry in input buffer -- significant only at root
(choice).
\item[OUT outbuf] first entry in output buffer (choice).
\item[IN len]  number of entries to be stored in output buffer (integer)
\item[IN tag]  operation tag (integer)
\item[IN group] handle
\item[IN root]  rank of sending process (integer)
\end{description}


{\tt \ \\ MPI\_SCATTERB( inbuf, outbuf, len, tag, group, root) } \\
is
\begin{verbatim}
MPI_SIZE( &size, group);
MPI_RANK( &rank, group);
MPI_IRECVB( handle, outbuf, len, root, tag, group);
if (rank==root)
   for (i=0; i < size; i++)
   {
      MPI_SENDB(inbuf, len, i, tag, group);
      inbuf += len;
   }
MPI_WAIT(handle);
\end{verbatim}
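
The partitioning of the root's input buffer in the block scatter can be
sketched sequentially.  The name {\tt scatterb} and the list representation
are assumptions made for illustration only.

```python
def scatterb(inbuf, length, size):
    """Split the root's inbuf into `size` blocks of `length` entries each.

    Element i of the result is the output buffer of the process with rank i.
    """
    assert len(inbuf) == length * size
    return [inbuf[i * length:(i + 1) * length] for i in range(size)]

pieces = scatterb([1, 2, 3, 4, 5, 6], length=2, size=3)
```

Concatenating the pieces in rank order gives back the root's input buffer,
so the block scatter is the inverse of the block gather.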

\paragraph*{All-to-all scatter}

{\bf \ \\ MPI\_ALLSCATTER( list\_of\_inbufs, outbuf, tag, group, len)} \\

Each process in the group sends its {\tt i}-th buffer in its input buffer list
to the process with rank {\tt i} (itself included); each process concatenates
the incoming messages in its output buffer, in the order of the senders' ranks.
The number of bytes left in the output buffer is returned
in {\tt len}.  The routine is called by all members of the group using the same
arguments for {\tt tag} and {\tt group}.
\begin{description}
\item[IN list\_of\_inbufs] list of buffer descriptor handles
\item[OUT outbuf] buffer descriptor handle
\item[IN tag]  operation tag (integer)
\item[IN group] handle
\item[OUT len]  number of remaining bytes in the output buffer (integer)
\end{description}




{\bf \ \\ MPI\_ALLSCATTERB( inbuf, outbuf, len, tag, group)} \\

{\tt MPI\_ALLSCATTERB} behaves like {\tt MPI\_ALLSCATTER} restricted to
block buffers,
and with the additional restriction that all blocks sent from one process
to another have
the same length. The input buffer block of each process is partitioned
into {\tt n} consecutive blocks,
each consisting of {\tt len} words.  The {\tt i}-th block is sent to the
{\tt i}-th process in the group.  Each process concatenates the incoming
messages, in the order of the senders' ranks, and stores them in its output
buffer. The routine is called by all members of the group using the same
arguments for {\tt tag, group}, and {\tt len}.
\begin{description}
\item[IN inbuf] first entry in input buffer (choice).
\item[OUT outbuf] first entry in output buffer (choice).
\item[IN len]  number of entries sent from each process to each other (integer).
\item[IN tag]  operation tag (integer)
\item[IN group] handle
\end{description}


{\tt \ \\ MPI\_ALLSCATTERB( inbuf, outbuf, len, tag, group)} \\ is
\begin{verbatim}
MPI_SIZE( &size, group);
MPI_RANK( &rank, group);
for (i=0; i < size; i++)
   {
    MPI_IRECVB(recv_handle[i], outbuf, len, i, tag, group);
    outbuf += len;
   }
for (i=0; i < size; i++)
   {
    MPI_ISENDB(send_handle[i], inbuf, len, i, tag, group);
    inbuf += len;
   }
MPI_WAITALL(send_handle);
MPI_WAITALL(recv_handle);
\end{verbatim}
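
Viewed across the whole group, the block all-to-all scatter is a transpose
of blocks: block {\tt j} of rank {\tt i} becomes block {\tt i} of rank
{\tt j}.  The following sequential model is a sketch only; the name
{\tt allscatterb} and the list representation are assumptions.

```python
def allscatterb(inbufs, length):
    """inbufs[s] is the input buffer of rank s, made of len(inbufs) blocks."""
    size = len(inbufs)
    outbufs = []
    for receiver in range(size):
        outbuf = []
        for sender in range(size):        # concatenate in sender-rank order
            start = receiver * length
            outbuf.extend(inbufs[sender][start:start + length])
        outbufs.append(outbuf)
    return outbufs

exchanged = allscatterb([[0, 1, 2], [10, 11, 12], [20, 21, 22]], length=1)
```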

\paragraph*{All-to-all broadcast}

{\bf \ \\ MPI\_ALLCAST( inbuf, outbuf, tag, group, len)} \\

Each process in the group broadcasts its input buffer
to all processes (including itself);
each process concatenates
the incoming messages in its output buffer, in the order of the senders' ranks.
The number of bytes left in the output buffer is returned
in {\tt len}.  The routine is called by all members of the group using the same
arguments for {\tt tag} and {\tt group}.
\begin{description}
\item[IN inbuf] buffer descriptor handle for input buffer
\item[OUT outbuf] buffer descriptor handle for output buffer
\item[IN tag]  operation tag (integer)
\item[IN group] handle
\item[OUT len]  number of remaining untouched bytes in each output buffer
(integer)
\end{description}




{\bf \ \\ MPI\_ALLCASTB( inbuf, outbuf, len, tag, group)} \\

{\tt MPI\_ALLCASTB} behaves like {\tt MPI\_ALLCAST} restricted to
block buffers,
and with the additional restriction that all blocks sent from one process
to another have the same length.
The routine is called by all members of the group using the same
arguments for {\tt tag, group}, and {\tt len}.
\begin{description}
\item[IN inbuf] first entry in input buffer (choice).
\item[OUT outbuf] first entry in output buffer (choice).
\item[IN len]  number of entries sent from each process to each other
(including itself).
\item[IN tag]  operation tag (integer)
\item[IN group] handle
\end{description}


{\tt \ \\ MPI\_ALLCASTB( inbuf, outbuf, len, tag, group)} \\ is
\begin{verbatim}
MPI_SIZE( &size, group);
MPI_RANK( &rank, group);
for (i=0; i < size; i++)
   {
    MPI_IRECVB(recv_handle[i], outbuf, len, i, tag, group);
    outbuf += len;
   }
for (i=0; i < size; i++)
   {
    MPI_ISENDB(send_handle[i], inbuf, len, i, tag, group);
   }
MPI_WAITALL(send_handle);
MPI_WAITALL(recv_handle);
\end{verbatim}
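
The outcome of the block all-to-all broadcast is that every process ends
up with the same concatenation of all input blocks.  A minimal sequential
sketch, in which the name {\tt allcastb} and the list representation are
assumptions:

```python
def allcastb(inbufs):
    """inbufs[i] is the input block of rank i; returns every rank's outbuf."""
    combined = [entry for block in inbufs for entry in block]
    # Each rank gets its own copy of the concatenation, in sender-rank order.
    return [list(combined) for _ in inbufs]

everywhere = allcastb([[1], [2], [3]])
```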


\subsubsection{Global Compute Operations}

\paragraph*{Reduce}

{\bf \ \\ MPI\_REDUCE( inbuf, outbuf, tag, group, root, op)} \\

Combines the values provided in the input buffer of each process in the
group, using the operation {\tt op}, and returns the combined value in
the output buffer of the process with rank {\tt root}.
Each process can provide one value, or a sequence of values, in which case the
combine operation is executed pointwise on each entry of the sequence.
For example, if the operation is {\tt max} and each input buffer contains two
floating point numbers, then outbuf(1) $=$ global max(inbuf(1)) and
outbuf(2) $=$ global max(inbuf(2)). All input
buffers should define sequences of equal length of entries of types
that match the type of the operands of {\tt op}.  The
output buffer should define a sequence of the same length of entries of
types that match the type of the result of {\tt op}.
(Note that,
here as for all other communication operations, the type of entries inserted in
a message depends on the information provided by the input buffer descriptor, and
not on the declarations of these variables in the calling program.   The types
of the variables in the calling program need not match the types defined by the
buffer descriptor, but in such case the outcome of a reduce operation may be
implementation dependent.)

The operation
defined by {\tt op} is associative and commutative, and the implementation can
take advantage of associativity and commutativity in order to change
order of evaluation.
The routine is called by all group members using the same arguments
for {\tt tag, group, root} and {\tt op}.
\begin{description}
\item[IN inbuf] handle to input buffer
\item[OUT outbuf] handle to output buffer -- significant only at root
\item[IN tag]  operation tag (integer)
\item[IN group] handle to group
\item[IN root] rank of root process (integer)
\item[IN op] operation (status)
\end{description}
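
The pointwise application of {\tt op} can be illustrated by a sequential
model.  This is a sketch only: the name {\tt reduce\_pointwise} is
hypothetical, and the fixed left-to-right evaluation order is one of the
orders an implementation is free to choose for an associative and
commutative operation.

```python
from functools import reduce
import operator

def reduce_pointwise(inbufs, op):
    """inbufs[i] is the input sequence of rank i; returns the root's outbuf."""
    # All input buffers must define sequences of equal length.
    assert len({len(buf) for buf in inbufs}) == 1
    # zip(*inbufs) groups the k-th entry of every process together.
    return [reduce(op, column) for column in zip(*inbufs)]

maxima = reduce_pointwise([[1.0, 7.0], [3.0, 2.0], [2.0, 5.0]], max)
sums = reduce_pointwise([[1, 2], [3, 4]], operator.add)
```

In the first call each process contributes two values, and the root
receives the global maximum of the first entries followed by the global
maximum of the second entries.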

We list below the operations that are supported for Fortran, each with the
corresponding value of the {\tt op} parameter.
\begin{description}
\item[MPI\_IMAX] integer maximum
\item[MPI\_RMAX] real maximum
\item[MPI\_DMAX] double precision real maximum
\item[MPI\_IMIN] integer minimum
\item[MPI\_RMIN] real minimum
\item[MPI\_DMIN] double precision real minimum
\item[MPI\_ISUM] integer sum
\item[MPI\_RSUM] real sum
\item[MPI\_DSUM] double precision real sum
\item[MPI\_CSUM] complex sum
\item[MPI\_DCSUM] double precision complex sum
\item[MPI\_IPROD] integer product
\item[MPI\_RPROD] real product
\item[MPI\_DPROD] double precision real product
\item[MPI\_CPROD] complex product
\item[MPI\_DCPROD] double precision complex product
\item[MPI\_AND] logical and
\item[MPI\_IAND] integer (bit-wise) and
\item[MPI\_OR] logical or
\item[MPI\_IOR] integer (bit-wise) or
\item[MPI\_XOR] logical xor
\item[MPI\_IXOR] integer (bit-wise) xor
\item[MPI\_MAXLOC] rank of process with maximum integer value
\item[MPI\_MAXRLOC] rank of process with maximum real value
\item[MPI\_MAXDLOC] rank of process with maximum double precision real value
\item[MPI\_MINLOC] rank of process with minimum integer value
\item[MPI\_MINRLOC] rank of process with minimum real value
\item[MPI\_MINDLOC] rank of process with minimum double precision real value
\end{description}

{\bf \ \\ MPI\_REDUCEB( inbuf, outbuf, len, tag, group, root, op)} \\

Same as {\tt MPI\_REDUCE}, restricted to a block buffer.
\begin{description}
\item[IN inbuf] first location in input buffer
\item[OUT outbuf] first location in output buffer -- significant only at root
\item[IN len] number of entries in input and output buffer (integer)
\item[IN tag]  operation tag (integer)
\item[IN group] handle to group
\item[IN root] rank of root process (integer)
\item[IN op] operation (status)
\end{description}

\discuss{

If we are to be compatible with the point to point block operations, the
{\tt len} parameter should indicate the number of words in buffer.  But it
might be more natural to have {\tt len} indicate the number of entries in
the buffer, so that if the entries are complex or double precision, {\tt
len} will be half the number of words in the buffer.

}


{\bf \ \\ MPI\_USER\_REDUCE( inbuf, outbuf, tag, group, root, function)} \\

Same as the reduce operation function above except that a user
supplied function is used.  {\tt function} is an associative and commutative
function with two arguments.  The types of the two arguments and of the
returned value of the function, and the types of all entries in the
input and output buffers all agree.  The output buffer has the same
length as the input buffer.
\begin{description}
\item[IN inbuf] handle to input buffer
\item[OUT outbuf] handle to output buffer -- significant only at root
\item[IN tag]  operation tag (integer)
\item[IN group] handle to group
\item[IN root] rank of root process (integer)
\item[IN function] user provided function
\end{description}

{\bf \ \\ MPI\_USER\_REDUCEB( inbuf, outbuf, len, tag, group, root, function)}
\\
Same as {\tt MPI\_USER\_REDUCE}, restricted to a block buffer.
\begin{description}
\item[IN inbuf] first location in input buffer
\item[OUT outbuf] first location in output buffer -- significant only at root
\item[IN len] number of entries in input and output buffer (integer)
\item[IN tag]  operation tag (integer)
\item[IN group] handle to group
\item[IN root] rank of root process (integer)
\item[IN function] user provided function
\end{description}


\discuss{

Do we also want a version of reduce that broadcasts the result to all processes
in the group?  (This can be achieved by a reduce followed by a broadcast, but a
combined function may be somewhat more efficient.)

Do we want a user provided {\em function} (two IN parameters, one OUT
value), or a user provided procedure that overwrites the second input
(i.e., one IN parameter and one INOUT parameter, the equivalent of a C {\tt a op= b}
type assignment)?  The second choice may allow a
more efficient implementation, without changing the semantics of the
MPI call.

Several people have suggested an {\tt MPI\_GLOBAL\_USER\_REDUCE} function
where the user function is applied to the entire buffer as one argument, rather
than piecewise to each entry in the buffer.
A possible definition is given below.

{\bf \ \\ MPI\_GLOBAL\_USER\_REDUCE( inbuf, outbuf, tag,
group, root, routine)} \\

Same as the user reduce operation function above except that the user
supplied routine applies to the entire buffer at once.
{\tt routine} has {\tt 2n} parameters:
{\tt routine( a1, ..., an, b1, ..., bn)}.
Each argument {\tt ai} has
intent {\tt IN} and each argument {\tt bi} has intent {\tt INOUT}.
The routine assigns to {\tt bi} the value {\tt ai $op_i$ bi}, where
$op_i$ is a commutative and associative operator (possibly distinct
for each $i$).   Both the input buffer and the output buffer have {\tt n}
entries, and the type of the {\tt i}-th entry in each agrees with the type
of {\tt ai} and of {\tt bi}.
\begin{description}
\item[IN inbuf] handle to input buffer
\item[OUT outbuf] handle to output buffer -- significant only at root
\item[IN tag]  operation tag (integer)
\item[IN group] handle to group
\item[IN root] rank of root process (integer)
\item[IN routine] user provided routine
\end{description}

A similar ``block'' function can be defined.  Note that, in Fortran 77, there
is no straightforward mechanism for passing a heterogeneous structure as one
argument to a function, or for having a function return a heterogeneous
structure as its result.

A more ``reasonable'' design for a global user reduce function is possible in
the case where all buffer entries have the same type.

}
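
The three calling conventions raised in the discussion above can be compared
in a small C sketch.  This is only an illustration under stated assumptions:
all entries are taken to be of type {\tt int}, and the names {\tt fn\_style},
{\tt proc\_style}, {\tt global\_style} and the {\tt combine\_with\_*} helpers
are hypothetical, not part of any proposed interface.

\begin{verbatim}
```c
#include <assert.h>
#include <stddef.h>

/* Style 1: a function with two IN parameters and one returned value. */
typedef int (*fn_style)(int a, int b);
/* Style 2: a procedure with one IN and one INOUT parameter (b = a op b). */
typedef void (*proc_style)(const int *a, int *b);
/* Whole-buffer style, assuming all n entries share one type. */
typedef void (*global_style)(const int *a, int *b, size_t n);

int add_fn(int a, int b) { return a + b; }
void add_proc(const int *a, int *b) { *b = *a + *b; }
void add_global(const int *a, int *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        b[i] = a[i] + b[i];
}

/* Fold an incoming contribution `in` into the accumulator `acc`. */
void combine_with_fn(const int *in, int *acc, size_t n, fn_style f)
{
    assert(in != NULL && acc != NULL);
    for (size_t i = 0; i < n; i++)
        acc[i] = f(in[i], acc[i]);      /* result copied back by the caller */
}

void combine_with_proc(const int *in, int *acc, size_t n, proc_style p)
{
    assert(in != NULL && acc != NULL);
    for (size_t i = 0; i < n; i++)
        p(&in[i], &acc[i]);             /* accumulator updated in place */
}

void combine_with_global(const int *in, int *acc, size_t n, global_style g)
{
    assert(in != NULL && acc != NULL);
    g(in, acc, n);                      /* one call for the entire buffer */
}
```
\end{verbatim}

All three produce the same accumulator contents; they differ only in how many
calls are made and in whether a temporary result has to be copied back.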

\paragraph*{Scan}

{\bf \ \\  MPI\_SCAN( inbuf, outbuf, tag, group, op )} \\

MPI\_SCAN is used to perform a parallel prefix with respect to
an associative reduction operation on data distributed across the group.
The operation returns in the output buffer of the process with rank {\tt i} the
reduction of the values in the input buffers of processes with ranks {\tt
0,...,i}.  The types of operations supported and their semantics, and the
constraints on input and output buffers, are as for {\tt MPI\_REDUCE}.
\begin{description}
\item[IN inbuf] handle to input buffer
\item[OUT outbuf] handle to output buffer
\item[IN tag]  operation tag (integer)
\item[IN group] handle to group
\item[IN op] operation (status)
\end{description}
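
The prefix semantics can be modelled sequentially as follows.  This is a
sketch only: each process is assumed to contribute a single {\tt int} entry,
and the names {\tt assoc\_op}, {\tt simulate\_scan} and {\tt int\_sum} are
hypothetical.

\begin{verbatim}
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type for an associative reduction operation on int. */
typedef int (*assoc_op)(int, int);

/* in[i] is the entry contributed by the process with rank i; out[i] is what
   that process would find in its output buffer: the reduction of
   in[0], ..., in[i]. */
void simulate_scan(const int *in, int *out, size_t nprocs, assoc_op op)
{
    assert(nprocs > 0 && op != NULL);
    out[0] = in[0];
    for (size_t i = 1; i < nprocs; i++)
        out[i] = op(out[i - 1], in[i]);
}

int int_sum(int a, int b) { return a + b; }
```
\end{verbatim}

With inputs {\tt 3, 1, 4, 1, 5} on ranks 0 to 4 and integer sum as the
operation, the ranks obtain {\tt 3, 4, 8, 9, 14}.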

{\bf \ \\  MPI\_SCANB( inbuf, outbuf, len, tag, group, op )} \\
Same as {\tt MPI\_SCAN}, restricted to block buffers.

\begin{description}
\item[IN inbuf] first input buffer element (choice)
\item[OUT outbuf] first output buffer element (choice)
\item[IN len] number of entries in input and output buffer (integer)
\item[IN tag]  operation tag (integer)
\item[IN group] handle to group
\item[IN op] operation (status)
\end{description}


{\bf \ \\  MPI\_USER\_SCAN( inbuf, outbuf, tag, group, function )} \\

Same as the scan operation function above except that a user
supplied function is used.  {\tt function} is an associative and commutative
function with two arguments.  The types of the two arguments and of the
returned values all agree.
\begin{description}
\item[IN inbuf] handle to input buffer
\item[OUT outbuf] handle to output buffer
\item[IN tag]  operation tag (integer)
\item[IN group] handle to group
\item[IN function] user provided function
\end{description}

{\bf \ \\ MPI\_USER\_SCANB( inbuf, outbuf, len, tag, group, function)}
\\
Same as {\tt MPI\_USER\_SCAN}, restricted to a block buffer.
\begin{description}
\item[IN inbuf] first location in input buffer
\item[OUT outbuf] first location in output buffer
\item[IN len] number of entries in input and output buffer (integer)
\item[IN tag]  operation tag (integer)
\item[IN group] handle to group
\item[IN function] user provided function
\end{description}

\discuss{

Do we want scan operations executed by segments? (The HPF definition of prefix
and suffix operation might be handy -- in addition to the scanned vector of
values there is a mask that tells where segments start and end.)
}
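
A segmented scan of the kind raised in the discussion above could behave as
in the following sequential sketch.  The encoding of the mask is an
assumption made for illustration (a nonzero flag marks the first rank of a
segment); it is not necessarily the HPF encoding, and the names
{\tt assoc\_op}, {\tt simulate\_segmented\_scan} and {\tt int\_add} are
hypothetical.

\begin{verbatim}
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type for an associative reduction operation on int. */
typedef int (*assoc_op)(int, int);

/* starts[i] != 0 means that rank i begins a new segment.  The prefix
   restarts at every segment boundary; rank 0 always begins a segment. */
void simulate_segmented_scan(const int *in, const int *starts, int *out,
                             size_t nprocs, assoc_op op)
{
    assert(op != NULL);
    for (size_t i = 0; i < nprocs; i++) {
        if (i == 0 || starts[i])
            out[i] = in[i];                  /* first rank of a segment */
        else
            out[i] = op(out[i - 1], in[i]);  /* continue current segment */
    }
}

int int_add(int a, int b) { return a + b; }
```
\end{verbatim}

For inputs {\tt 1, 2, 3, 4, 5, 6} with segments starting at ranks 0, 3 and 5,
the sum scan yields {\tt 1, 3, 6, 4, 9, 6}.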

\missing{

Nonblocking (immediate) collective operations.  The syntax is obvious:   for
each collective operation  {\tt MPI\_op(params)} one may have a new nonblocking
collective operation of the form {\tt MPI\_Iop(handle, params)}, that initiates
the execution of the corresponding operation.  The execution of the operation
is completed by executing {\tt MPI\_WAIT(handle,...)},  {\tt
MPI\_STATUS(handle,...)},  {\tt MPI\_WAITALL}, {\tt MPI\_WAITANY}, or {\tt
MPI\_STATUSANY}.   There are three issues to consider:

(i) The exact definition of the semantics of these operations (in particular,
constraints on order).

(ii) The complexity of implementation (including the complexity of having the
same {\tt WAIT} or {\tt STATUS} functions apply both to point-to-point and to
collective operations).

(iii) The accrued performance advantage.
}

\subsection{Correctness}

\discuss{ This is still very preliminary}

The semantics of the collective communication operations can be derived from
their operational definition in terms of  point-to-point communication.  It is
assumed that messages pertaining to one
operation cannot be confused with messages pertaining to another operation.
Also, messages pertaining to two distinct occurrences of the same operation
cannot be confused if the two occurrences have distinct parameters.
The relevant parameters for this purpose are {\tt group}, {\tt tag}, {\tt
root} and {\tt op}.
The implementer can, of course, use another, more efficient
implementation, as long as it has the same effect.

\discuss{

This statement does not yet apply to the current, incomplete and
somewhat careless definitions I provided in this draft.

The definition above means that messages pertaining to a collective
communication carry information identifying the operation itself, and the
values of the {\tt tag, group} and,
where relevant, {\tt root} or {\tt op} parameters.
Is this acceptable?

}
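
The distinguishability rule of this subsection can be sketched as a
comparison of the identifying parameters of two collective calls.  The
structure {\tt coll\_id} and the function {\tt distinguishable} are
hypothetical names introduced for illustration; representing the group handle
and the operation as integers is likewise an assumption.

\begin{verbatim}
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical identification carried by the messages of one collective
   call.  Fields that are not relevant to a routine are set to 0. */
typedef struct {
    int operation;   /* which collective routine, e.g. broadcast */
    int group;       /* stands in for the group handle */
    int tag;
    int root;
    int op;
} coll_id;

/* Returns 1 if messages of the two calls cannot be confused because the
   calls differ in some identifying parameter, and 0 if only the order of
   messages keeps the two calls apart. */
int distinguishable(const coll_id *a, const coll_id *b)
{
    assert(a != NULL && b != NULL);
    return a->operation != b->operation
        || a->group != b->group
        || a->tag != b->tag
        || a->root != b->root
        || a->op != b->op;
}
```
\end{verbatim}

Two broadcasts that differ only in their root are distinguishable under this
rule; two broadcasts with identical parameters are not, and must rely on
message order.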


A few examples:

\begin{verbatim}
MPI_BCAST(buf, len, tag, group, 0);
MPI_BCAST(buf, len, tag, group, 1);
\end{verbatim}

Two consecutive broadcasts, in the same group, with the same tag, but different
roots.  Since the operations are distinguishable, messages from one broadcast
cannot be confused with messages from the other broadcast; the program is safe
and will execute as expected.

\begin{verbatim}
MPI_BCAST(buf, len, tag, group, 0);
MPI_BCAST(buf, len, tag, group, 0);
\end{verbatim}

Two consecutive broadcasts, in the same group, with the same tag and root.
Since point-to-point communication preserves the order of messages,
here, too, messages from one broadcast will not be confused with messages from
the other broadcast; the program is safe and will execute as intended.

\begin{verbatim}
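MPI_RANK(&rank, group);
if (rank==0)
  {
   /* process zero: broadcast, then send to process one */
   MPI_BCASTB(buf, len, tag, group, 0);
   MPI_SENDB(buf, len, 1, tag, group);
  }
else if (rank==1)
  {
   /* process one: wildcard receive, broadcast, wildcard receive */
   MPI_RECVB(buf, len, MPI_DONTCARE, tag, group);
   MPI_BCASTB(buf, len, tag, group, 0);
   MPI_RECVB(buf, len, MPI_DONTCARE, tag, group);
  }
else
  {
   /* process two: send to process one, then broadcast */
   MPI_SENDB(buf, len, 1, tag, group);
   MPI_BCASTB(buf, len, tag, group, 0);
  }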
\end{verbatim}

Process zero executes a broadcast followed by a send to process one;
process two executes a send to process one, followed by a broadcast;
and process one executes a receive, a broadcast and a receive.
A possible outcome is for the operations to be matched as illustrated by the
diagram below.

\begin{verbatim}


    0                       1                      2

                / - >  receive            / - send
              /                         /
broadcast   /         broadcast       /   broadcast
           /                        /
  send   -             receive  < -


\end{verbatim}

The reason is that broadcast is not a synchronous operation; the call at a
process may return before the other processes have entered the broadcast.
Thus, the message sent by process zero can arrive at process one before the
message sent by process two, and before the call to broadcast on process one.
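
The dependence on arrival order can be made concrete with a toy model of the
messages pending at process one.  This is a deliberately simplified sketch:
the {\tt mailbox} structure and {\tt wildcard\_receive} are hypothetical, and
both messages are assumed to have arrived already when the receives are
posted.

\begin{verbatim}
```c
#include <assert.h>

/* Hypothetical model of the messages waiting at process one: the ranks of
   the senders, listed in arrival order. */
typedef struct {
    int sender[4];
    int count;
} mailbox;

/* A receive with source MPI_DONTCARE matches the earliest pending message
   and removes it from the mailbox.  Returns the rank of the sender. */
int wildcard_receive(mailbox *mb)
{
    assert(mb->count > 0);
    int src = mb->sender[0];
    for (int i = 1; i < mb->count; i++)
        mb->sender[i - 1] = mb->sender[i];
    mb->count--;
    return src;
}
```
\end{verbatim}

If the message from process zero arrives first, the first receive on process
one matches it; if the message from process two arrives first, the matching
is reversed.  Both outcomes are consistent with the program text.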

\end{document}



From owner-mpi-collcomm@CS.UTK.EDU  Wed Mar 24 00:17:08 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA04629; Wed, 24 Mar 93 00:17:08 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA19214; Wed, 24 Mar 93 00:16:45 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Wed, 24 Mar 1993 00:16:45 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from watson.ibm.com by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA19204; Wed, 24 Mar 93 00:16:43 -0500
Message-Id: <9303240516.AA19204@CS.UTK.EDU>
Received: from YKTVMV by watson.ibm.com (IBM VM SMTP V2R3) with BSMTP id 2223;
   Wed, 24 Mar 93 00:16:44 EST
Date: Wed, 24 Mar 93 00:12:02 EST
From: "Marc Snir" <snir@watson.ibm.com>
X-Addr: (914) 945-3204  (862-3204)
        28-226 IBM T.J. Watson Research Center
        P.O. Box 218 Yorktown Heights NY 10598
To: mpi-collcomm@cs.utk.edu
Subject: new draft
Reply-To: SNIR@watson.ibm.com

Minor changes, some discussion of alternative choices of reduce with
user provided function.  Thanks to Rolf Hempel, Jon Flower, Robert Harrison,
and everybody else for their comments.

By the way, Steve Otto will put out in a day or two (isn't it, Otto?) a new
complete draft in Postscript format -- So you poor dislatexic guys, be
patient.
From owner-mpi-collcomm@CS.UTK.EDU  Wed Mar 24 18:18:20 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA04031; Wed, 24 Mar 93 18:18:20 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA09968; Wed, 24 Mar 93 18:17:26 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Wed, 24 Mar 1993 18:17:25 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from fslg8.fsl.noaa.gov by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA09960; Wed, 24 Mar 93 18:17:22 -0500
Received: by fslg8.fsl.noaa.gov (5.57/Ultrix3.0-C)
	id AA27247; Wed, 24 Mar 93 23:17:16 GMT
Received: by macaw.fsl.noaa.gov (4.1/SMI-4.1)
	id AA01503; Wed, 24 Mar 93 16:15:59 MST
Date: Wed, 24 Mar 93 16:15:59 MST
From: hender@macaw.fsl.noaa.gov (Tom Henderson)
Message-Id: <9303242315.AA01503@macaw.fsl.noaa.gov>
To: mpi-collcomm@cs.utk.edu
Subject: Re:   Revised Collective Draft - consistent with p2p draft


Hi all,

I have a few (mostly minor) comments/questions about the current Collective 
Communication proposal.  

1.  One feature I especially like in the point-to-point proposal is the use of 
    a (start, len, datatype) triplet to describe a sequence of contiguous 
    values (block).  Is this going to appear in collective communication 
    also?  I'd like to see it.  For example, MPI_CSHIFTB() would then look 
    like:  

    MPI_CSHIFTB(inbuf, outbuf, len, datatype, tag, group, shift)


2.  How should the buffer descriptor at the root process be specified during a 
    call to MPI_GATHER()?  (This happens elsewhere as well.)  When considering 
    the simple "BLOCK" buffer components only, I can see two alternatives:  

    A)  outbuf in the root is identical to the inbuf in each of the other 
        processes except for length (i.e., they all have the same number and type
        of buffer components).  Each buffer component in the root must have 
        length equal to the sum of lengths of corresponding buffer components 
        in the other processes.  For example, suppose process 0 is the root for 
        an MPI_GATHER() called by processes 0, 1, and 2.  If processes 1 and 2 
        have buffer components with the following characteristics:  

            Process 1

                Buffer Component Number:  0
                Buffer Component Type:    BLOCK
                Data Type:                DOUBLE
                Length:                   100

                Buffer Component Number:  1
                Buffer Component Type:    BLOCK
                Data Type:                INTEGER
                Length:                   5


            Process 2

                Buffer Component Number:  0
                Buffer Component Type:    BLOCK
                Data Type:                DOUBLE
                Length:                   200

                Buffer Component Number:  1
                Buffer Component Type:    BLOCK
                Data Type:                INTEGER
                Length:                   10


        then Process 0 must have buffer components with the following 
        characteristics:  

            Process 0

                Buffer Component Number:  0
                Buffer Component Type:    BLOCK
                Data Type:                DOUBLE
                Length:                   300

                Buffer Component Number:  1
                Buffer Component Type:    BLOCK
                Data Type:                INTEGER
                Length:                   15

        In this case, the routine would behave like a bunch of separate calls 
        to MPI_GATHERB().  

    B)  outbuf contains the sum of all buffer components in all buffer 
        descriptors in the other processes, in the appropriate order.  For 
        the same example, if processes 1 and 2 have buffer components with the 
        following characteristics:  

            Process 1

                Buffer Component Number:  0
                Buffer Component Type:    BLOCK
                Data Type:                DOUBLE
                Length:                   100

                Buffer Component Number:  1
                Buffer Component Type:    BLOCK
                Data Type:                INTEGER
                Length:                   5


            Process 2

                Buffer Component Number:  0
                Buffer Component Type:    BLOCK
                Data Type:                DOUBLE
                Length:                   200

                Buffer Component Number:  1
                Buffer Component Type:    BLOCK
                Data Type:                INTEGER
                Length:                   10


        then Process 0 must have buffer components with the following 
        characteristics:  

            Process 0

                Buffer Component Number:  0
                Buffer Component Type:    BLOCK
                Data Type:                DOUBLE
                Length:                   100

                Buffer Component Number:  1
                Buffer Component Type:    BLOCK
                Data Type:                INTEGER
                Length:                   5

                Buffer Component Number:  2
                Buffer Component Type:    BLOCK
                Data Type:                DOUBLE
                Length:                   200

                Buffer Component Number:  3
                Buffer Component Type:    BLOCK
                Data Type:                INTEGER
                Length:                   10

        In this case, MPI_GATHER() is a bit more flexible.  

    I'm not sure which I prefer...  "A" may be a bit easier.  I think that we 
    should pick one and say so explicitly in the document.  

    When considering the other types of buffer components (VECTOR and INDEX) 
    it looks like "BLOCK" could be replaced by either "VECTOR" or "INDEX" 
    anywhere in the examples above as long as the total length of each buffer 
    component is preserved.  (This is really point-to-point stuff now.  Is 
    mixing of different buffer components permitted?  I can't see how to 
    prevent it without sending extra junk along with each message...)  
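
    The two alternatives can be compared with a small C sketch that computes
    the buffer components the root would need.  It is an illustration only:
    the types and function names (component, root_layout_a, root_layout_b)
    are hypothetical, only BLOCK components are modelled, and, as in the
    example above, only the non-root processes contribute data.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { DT_DOUBLE, DT_INTEGER } datatype;

/* One BLOCK buffer component: a data type and a length in entries. */
typedef struct {
    datatype type;
    size_t len;
} component;

/* Alternative A: the root has as many components as each sender, and each
   root component is as long as the corresponding sender components summed.
   senders holds ncomp components per sender, sender after sender. */
void root_layout_a(const component *senders, size_t nsenders, size_t ncomp,
                   component *root)
{
    assert(nsenders > 0);
    for (size_t c = 0; c < ncomp; c++) {
        root[c].type = senders[c].type;
        root[c].len = 0;
        for (size_t s = 0; s < nsenders; s++) {
            /* corresponding components must agree in type */
            assert(senders[s * ncomp + c].type == root[c].type);
            root[c].len += senders[s * ncomp + c].len;
        }
    }
}

/* Alternative B: the root lists every component of every sender, in order,
   so it needs nsenders * ncomp components. */
void root_layout_b(const component *senders, size_t nsenders, size_t ncomp,
                   component *root)
{
    for (size_t i = 0; i < nsenders * ncomp; i++)
        root[i] = senders[i];
}
```

    For the example above, alternative A yields two root components (DOUBLE
    of length 300, INTEGER of length 15) and alternative B yields four.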


3.  MPI_GATHER() returns "len" == difference in bytes between number of bytes 
    expected and number of bytes received at the root.  (Total number of bytes 
    delivered to the root is proposed as an alternative.)  In MPI_GATHERB() 
    there is no equivalent return value and "inlen" refers to words.  In 
    MPI_CSHIFTB() "len" means "number of elements".  I think this might be 
    confusing (I'm confused!  :-).  I would like to see a "status" returned 
    from each of these routines that behaves in the same way (like "0" means 
    success or something).  (Are you suggesting this in the "Discussion"?)  
    Also, do all calling processes get the return value?  


4.  MPI_REDUCE() has the following op parameters:  

    MPI_IMAX     integer maximum
    MPI_RMAX     real maximum
    MPI_DMAX     double precision real maximum
    MPI_IMIN     integer minimum
    MPI_RMIN     real minimum
    MPI_DMIN     double precision real minimum
    MPI_ISUM     integer sum
    MPI_RSUM     real sum
    MPI_DSUM     double precision real sum
    MPI_CSUM     complex sum
    MPI_DCSUM    double precision complex sum
    MPI_IPROD    integer product
    MPI_RPROD    real product
    MPI_DPROD    double precision real product
    MPI_CPROD    complex product
    MPI_DCPROD   double precision complex product
    MPI_AND      logical and
    MPI_IAND     integer (bit-wise) and
    MPI_OR       logical or
    MPI_IOR      integer (bit-wise) or
    MPI_XOR      logical xor
    MPI_IXOR     integer (bit-wise) xor
    MPI_MAXLOC   rank of process with maximum integer value
    MPI_MAXRLOC  rank of process with maximum real value
    MPI_MAXDLOC  rank of process with maximum double precision real value
    MPI_MINLOC   rank of process with minimum integer value
    MPI_MINRLOC  rank of process with minimum real value
    MPI_MINDLOC  rank of process with minimum double precision real value

    Since buffer components contain data type information, it seems like these 
    could be reduced to:  

    MPI_MAX    maximum (integer, real, or double)
    MPI_MIN    minimum (integer, real, or double)
    MPI_SUM    sum (integer, real, double, complex, or double complex)
    MPI_PROD   product (integer, real, double, complex, or double complex)
    MPI_AND    and (logical or bit-wise integer)
    MPI_OR     or (logical or bit-wise integer)
    MPI_XOR    xor (logical or bit-wise integer)
    MPI_MAXLOC rank of process with maximum value (integer, real, or double)
    MPI_MINLOC rank of process with minimum value (integer, real, or double)
    (I kind of hate to suggest getting rid of MPI_MINDLOC...  :-)

    This makes sense for MPI_REDUCEB() if datatype is explicitly included in 
    the parameter list as in point 1.  

    MPI_REDUCEB(inbuf, outbuf, len, datatype, tag, group, root, op)

    I'm completely in favor of having "len" refer to number of entries in a 
    buffer for all the MPI_xxxxxB() routines.  
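
    The suggestion can be sketched in C as a combining step that selects its
    behavior from a (datatype, op) pair rather than from a type-specific
    operation name.  This is only an illustration: the enumerations and the
    function combine are hypothetical, and just two data types and two
    operations are modelled.  Here len counts entries, not words.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { T_INTEGER, T_DOUBLE } datatype;
typedef enum { OP_MAX, OP_SUM } reduce_op;

/* Combine len entries in place: acc[i] = acc[i] op in[i].  The data type
   chooses how the buffers are interpreted; the operation is generic. */
void combine(void *acc, const void *in, size_t len,
             datatype type, reduce_op op)
{
    assert(acc != NULL && in != NULL);
    for (size_t i = 0; i < len; i++) {
        if (type == T_INTEGER) {
            int *a = (int *)acc;
            const int *b = (const int *)in;
            if (op == OP_SUM)
                a[i] = a[i] + b[i];
            else
                a[i] = (a[i] > b[i]) ? a[i] : b[i];
        } else {
            double *a = (double *)acc;
            const double *b = (const double *)in;
            if (op == OP_SUM)
                a[i] = a[i] + b[i];
            else
                a[i] = (a[i] > b[i]) ? a[i] : b[i];
        }
    }
}
```

    One generic OP_SUM or OP_MAX then covers what would otherwise need a
    separate named operation per data type.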

Generally, I like this proposal.  


Tom Henderson
NOAA Forecast Systems Laboratory
hender@fsl.noaa.gov


From owner-mpi-collcomm@CS.UTK.EDU  Thu Mar 25 09:09:46 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA14741; Thu, 25 Mar 93 09:09:46 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA17983; Thu, 25 Mar 93 09:08:58 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Thu, 25 Mar 1993 09:08:57 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from marge.meiko.com by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA17972; Thu, 25 Mar 93 09:08:45 -0500
Received: from hub.meiko.co.uk by marge.meiko.com with SMTP id AA01580
  (5.65c/IDA-1.4.4 for <mpi-collcomm@cs.utk.edu>); Thu, 25 Mar 1993 09:08:42 -0500
Received: from float.co.uk (float.meiko.co.uk) by hub.meiko.co.uk (4.1/SMI-4.1)
	id AA18010; Thu, 25 Mar 93 14:08:39 GMT
Date: Thu, 25 Mar 93 14:08:39 GMT
From: jim@meiko.co.uk (James Cownie)
Message-Id: <9303251408.AA18010@hub.meiko.co.uk>
Received: by float.co.uk (5.0/SMI-SVR4)
	id AA07231; Thu, 25 Mar 93 14:05:11 GMT
To: ho@almaden.ibm.com
Cc: mpi-collcomm@cs.utk.edu
In-Reply-To: "Ching-Tien (Howard) Ho"'s message of Tue, 23 Mar 93 11:24:18 PST <9303231926.AA25686@CS.UTK.EDU>
Subject: No tag for a CC routine?
Content-Length: 916

In the current draft of the CC chapter, the explanation of the way in
which the collective routines function is in terms of point to point,
and it uses the supplied tag to do the necessary selection...

I guess that this conforms to your first semantic (tag unique in the
group, and all other groups which have intersecting members with this
group). [Actually this means program wide unique, since all processes
are in the INITIAL or ALL group !]

Why is this so unpleasant ? It seems to me to be no more than the
normal requirements of a tag, which are that the user's application
understands it and does not incorrectly replicate it.

-- Jim
James Cownie 
Meiko Limited			Meiko Inc.
650 Aztec West			Reservoir Place
Bristol BS12 4SD		1601 Trapelo Road
England				Waltham
				MA 02154

Phone : +44 454 616171		+1 617 890 7676
FAX   : +44 454 618188		+1 617 890 5042
E-Mail: jim@meiko.co.uk   or    jim@meiko.com



From owner-mpi-collcomm@CS.UTK.EDU  Thu Mar 25 12:16:41 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA20707; Thu, 25 Mar 93 12:16:41 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA25872; Thu, 25 Mar 93 12:16:09 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Thu, 25 Mar 1993 12:16:08 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from pnlg.pnl.gov by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA25864; Thu, 25 Mar 93 12:16:05 -0500
Received: from carbon.pnl.gov (130.20.188.38) by pnlg.pnl.gov; Thu, 25 Mar 93
 09:12 PST
Received: from sodium.pnl.gov by carbon.pnl.gov (4.1/SMI-4.1) id AA10516; Thu,
 25 Mar 93 09:10:33 PST
Received: by sodium.pnl.gov (4.1/SMI-4.0) id AA01146; Thu, 25 Mar 93 09:10:29
 PST
Date: Thu, 25 Mar 93 09:10:29 PST
From: rj_littlefield@pnlg.pnl.gov
Subject: Re:  No tag for a CC routine?
To: ho@almaden.ibm.com, jim@meiko.co.uk
Cc: d39135@carbon.pnl.gov, mpi-collcomm@cs.utk.edu
Message-Id: <9303251710.AA01146@sodium.pnl.gov>
X-Envelope-To: mpi-collcomm@cs.utk.edu

Jim Cownie writes:

> In the current draft of the CC chapter, the explanation of the way in
> which the collective routines function is in terms of point to point,
> and it uses the supplied tag to do the necessary selection...

Yes, the draft does this, but it's arguably an oversight.  See below.

> I guess that this conforms to your first semantic (tag unique in the
> group, and all other groups which have intersecting members with this
> group). [Actually this means program wide unique, since all processes
> are in the INITIAL or ALL group !]
>
> Why is this so unpleasant ? It seems to me to be no more than the
> normal requirements of a tag, which are that the user's application
> understands it and does not incorrectly replicate it.

It's unpleasant because the coding used in the draft would break
if another module in the group happened to use wildcard receive.
(The draft itself acknowledges that the draft example routines
are not bulletproof.)

The preferred way to isolate one collective comm's messages from all
others is to use "context".  All of the context/group proposals
provide mechanisms to make this cheap and effective.  Presumably
a subsequent draft of collective communication will reflect whatever
mechanism the committee selects for context management.

A collective comm routine might use tags internally to keep its
own messages straight.  But then it needs more than one tag, so
passing one in as an argument would not even be adequate.

> -- Jim
> James Cownie 

--Rik
----------------------------------------------------------------------
rj_littlefield@pnl.gov (alias 'd39135')   Rik Littlefield
Tel: 509-375-3927                         Pacific Northwest Lab, MS K1-87
Fax: 509-375-6631                         P.O.Box 999, Richland, WA  99352
From owner-mpi-collcomm@CS.UTK.EDU  Thu Mar 25 15:44:07 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA25543; Thu, 25 Mar 93 15:44:07 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA05674; Thu, 25 Mar 93 15:43:25 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Thu, 25 Mar 1993 15:43:24 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from deepthought.cs.utexas.edu by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA05664; Thu, 25 Mar 93 15:43:22 -0500
From: rvdg@cs.utexas.edu (Robert van de Geijn)
Received: from grit.cs.utexas.edu by deepthought.cs.utexas.edu (5.64/1.2/relay) with SMTP
	id AA29413; Thu, 25 Mar 93 14:43:20 -0600
Received: by grit.cs.utexas.edu (5.64/Client-v1.3)
	id AA05025; Thu, 25 Mar 93 14:42:58 -0600
Date: Thu, 25 Mar 93 14:42:58 -0600
Message-Id: <9303252042.AA05025@grit.cs.utexas.edu>
To: lyndon@epcc.ed.ac.uk
Cc: ho@almaden.ibm.com, mpi-collcomm@cs.utk.edu
In-Reply-To: L J Clarke's message of Tue, 23 Mar 93 19:37:21 GMT <16027.9303231937@subnode.epcc.ed.ac.uk>
Subject: No tag for a CC routine?

   X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Tue, 23 Mar 1993 14:37:31 EST
   Date: Tue, 23 Mar 93 19:37:21 GMT
   From: L J Clarke <lyndon@epcc.ed.ac.uk>
   Reply-To: lyndon@epcc.ed.ac.uk

   > Hi,
   >   I like to revisit an old issue regarding the Collective Communication (CC)
   > proposal to MPI.

   I support the specification of collective communications without use of
   message tag. I just cannot see that it is needed there.

   Best Wishes
   Lyndon

	    /--------------------------------------------------------\
       e||) | Lyndon J Clarke    Edinburgh Parallel Computing Centre | e||) 
       c||c | Tel: 031 650 5021  Email: lyndon@epcc.edinburgh.ac.uk  | c||c 
	    \--------------------------------------------------------/



Ditto here.

Robert

=====================================================================
  Robert A. van de Geijn                     rvdg@cs.utexas.edu  
  Assistant Professor
  Department of Computer Sciences            (Work)  (512) 471-9720
  The University of Texas                    (Home)  (512) 251-8301 
  Austin, TX 78712                           (FAX)   (512) 471-8885 
=====================================================================
From owner-mpi-collcomm@CS.UTK.EDU  Thu Mar 25 16:20:54 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA26660; Thu, 25 Mar 93 16:20:54 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA07148; Thu, 25 Mar 93 16:20:05 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Thu, 25 Mar 1993 16:20:04 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from Aurora.CS.MsState.Edu by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA07136; Thu, 25 Mar 93 16:20:03 -0500
Received:  by Aurora.CS.MsState.Edu (4.1/6.0s-FWP);
	   id AA07870; Thu, 25 Mar 93 15:13:41 CST
Date: Thu, 25 Mar 93 15:13:41 CST
From: Tony Skjellum <tony@Aurora.CS.MsState.Edu>
Message-Id: <9303252113.AA07870@Aurora.CS.MsState.Edu>
To: lyndon@epcc.ed.ac.uk, rvdg@cs.utexas.edu
Subject: Re: No tag for a CC routine?
Cc: ho@almaden.ibm.com, mpi-collcomm@cs.utk.edu


Yes, one wonders...

Rationale for:
	1) Debugging of erroneous programs (well, what does the tag mean???)
	2) symmetry with point-to-point ???


Rationale against:
	1) prohibits use of some hardware, for certain
	2) no clear value
	3) tag might have role in implementing certain global operations,
		for some implementations

Despite my previous comments for this, I agree that tag should go.
- Tony
From owner-mpi-collcomm@CS.UTK.EDU  Thu Mar 25 18:13:21 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA01329; Thu, 25 Mar 93 18:13:21 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA11992; Thu, 25 Mar 93 18:12:45 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Thu, 25 Mar 1993 18:12:45 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from pnlg.pnl.gov by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA11984; Thu, 25 Mar 93 18:12:43 -0500
Received: from carbon.pnl.gov (130.20.188.38) by pnlg.pnl.gov; Thu, 25 Mar 93
 15:11 PST
Received: from sodium.pnl.gov by carbon.pnl.gov (4.1/SMI-4.1) id AA11096; Thu,
 25 Mar 93 15:09:21 PST
Received: by sodium.pnl.gov (4.1/SMI-4.0) id AA01610; Thu, 25 Mar 93 15:09:18
 PST
Date: Thu, 25 Mar 93 15:09:18 PST
From: rj_littlefield@pnlg.pnl.gov
Subject: RE: No tag for a CC routine?
To: lyndon@epcc.ed.ac.uk, rvdg@cs.utexas.edu, tony@Aurora.CS.MsState.Edu
Cc: d39135@carbon.pnl.gov, ho@almaden.ibm.com, mpi-collcomm@cs.utk.edu
Message-Id: <9303252309.AA01610@sodium.pnl.gov>
X-Envelope-To: mpi-collcomm@cs.utk.edu

SUMMARY: I am in favor of tags for collective communication calls,
on the basis that they have more value than cost.

Tony says

> Rationale for:
> 	1) Debugging of erroneous programs (well, what does the tag mean???)
> 	2) symmetry with point-to-point ???
> 
> Rationale against:
> 	1) prohibits use of some hardware, for certain
> 	2) no clear value
> 	3) tag might have role in implementing certain global operations,
> 		for some implementations

Let us specify that the tag value is logically redundant.

That is, let us specify that collective comm calls in separate
processes are actually matched by group and sequence, but that a
program is declared correct only if the tag value is the same for
all matching calls.  

The match can be checked for debugging.

This clarifies and supports reason #1 in favor of tags.

Reason #1 against is not true (under this spec).  Since the tags
are logically redundant, they can be ignored for the sake of
efficiency.

Reason #2 against is countered by the personal observation that
programmers sometimes foul up and match calls they didn't intend to.
Having a facility to detect this foulup would be valuable.

I don't much care about reason #2 for, and I don't understand reason
#3 against.  It bears some resemblance to my previous reply to Jim.
However, all I intended to do in that note was point out that if
there is a tag, it should not be interpreted as meaning anything in
terms of the point-to-point comms used inside the collective comm
routine.
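
The debugging check described above might look like the following C sketch.
It is a sequential model under stated assumptions, not an implementation:
the function name first_tag_mismatch is hypothetical, and the tags passed by
every process to its successive collective calls on one group are assumed to
have been recorded in a single array.

```c
#include <assert.h>
#include <stddef.h>

/* tags[p * ncalls + k] is the tag that process p passed to its k-th
   collective call on the group.  Calls are matched by group and sequence
   number k, never by tag; the tag is only checked for agreement.
   Returns the sequence number of the first call whose tags disagree,
   or -1 if the program is correct in this respect. */
long first_tag_mismatch(const int *tags, size_t nprocs, size_t ncalls)
{
    assert(nprocs > 0);
    for (size_t k = 0; k < ncalls; k++)
        for (size_t p = 1; p < nprocs; p++)
            if (tags[p * ncalls + k] != tags[k])
                return (long)k;
    return -1;
}
```

Because matching never depends on the tag, an implementation that uses
hardware collective operations could skip this check entirely outside of
debugging runs.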

--Rik

----------------------------------------------------------------------
rj_littlefield@pnl.gov (alias 'd39135')   Rik Littlefield
Tel: 509-375-3927                         Pacific Northwest Lab, MS K1-87
Fax: 509-375-6631                         P.O.Box 999, Richland, WA  99352
From owner-mpi-collcomm@CS.UTK.EDU  Thu Mar 25 19:45:48 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA02875; Thu, 25 Mar 93 19:45:48 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA15216; Thu, 25 Mar 93 19:44:59 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Thu, 25 Mar 1993 19:44:59 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from Aurora.CS.MsState.Edu by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA15208; Thu, 25 Mar 93 19:44:57 -0500
Received:  by Aurora.CS.MsState.Edu (4.1/6.0s-FWP);
	   id AA11124; Thu, 25 Mar 93 18:38:18 CST
Date: Thu, 25 Mar 93 18:38:18 CST
From: Tony Skjellum <tony@Aurora.CS.MsState.Edu>
Message-Id: <9303260038.AA11124@Aurora.CS.MsState.Edu>
To: lyndon@epcc.ed.ac.uk, rvdg@cs.utexas.edu, tony@Aurora.CS.MsState.Edu,
        rj_littlefield@pnlg.pnl.gov
Subject: RE: No tag for a CC routine?
Cc: d39135@carbon.pnl.gov, ho@almaden.ibm.com, mpi-collcomm@cs.utk.edu


With regard to hardware problems introduced by tag, it is possible that
a hardware 'maximum' or 'combine' might not be able to handle the extra
tag, without significant additional overhead.  That is all.

I am not strongly against this, and I do value Rik's points of view on
this, provided we do not create an abstraction that limits the (important)
ability to use the emerging hardware-supported SIMD-like operations,
as appropriate.

- Tony
From owner-mpi-collcomm@CS.UTK.EDU  Fri Mar 26 02:16:58 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA09236; Fri, 26 Mar 93 02:16:58 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA28767; Fri, 26 Mar 93 02:16:18 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Fri, 26 Mar 1993 02:16:17 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from pnlg.pnl.gov by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA28759; Fri, 26 Mar 93 02:16:15 -0500
Received: from carbon.pnl.gov (130.20.188.38) by pnlg.pnl.gov; Thu, 25 Mar 93
 23:14 PST
Received: from sodium.pnl.gov by carbon.pnl.gov (4.1/SMI-4.1) id AA11259; Thu,
 25 Mar 93 23:12:47 PST
Received: by sodium.pnl.gov (4.1/SMI-4.0) id AA01977; Thu, 25 Mar 93 23:12:44
 PST
Date: Thu, 25 Mar 93 23:12:44 PST
From: d39135@sodium.pnl.gov
Subject: RE: No tag for a CC routine?
To: lyndon@epcc.ed.ac.uk, rj_littlefield@pnlg.pnl.gov, rvdg@cs.utexas.edu,
        tony@Aurora.CS.MsState.Edu
Cc: d39135@carbon.pnl.gov, ho@almaden.ibm.com, mpi-collcomm@cs.utk.edu
Message-Id: <9303260712.AA01977@sodium.pnl.gov>
X-Envelope-To: mpi-collcomm@cs.utk.edu

Tony says

> I am not strongly against this...
> ...provided we do not create an abstraction that limits the (important)
> ability to use the emerging hardware-supported SIMD-like operations,
> as appropriate.

I agree completely with Tony's concern.  If we are going to include
a tag for collective comms (as I argue would be desirable),
then we need to be sure that the semantics are defined so as to
not exclude efficient hardware ops.  I believe that the specification
I stated accomplishes this.  If not, please correct me.

--Rik
From owner-mpi-collcomm@CS.UTK.EDU  Fri Apr  2 01:52:44 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA24322; Fri, 2 Apr 93 01:52:44 -0500
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA08752; Fri, 2 Apr 93 01:51:45 -0500
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Fri, 2 Apr 1993 01:51:43 EST
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from Aurora.CS.MsState.Edu by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA08727; Fri, 2 Apr 93 01:51:13 -0500
Received:  by Aurora.CS.MsState.Edu (4.1/6.0s-FWP);
	   id AA01906; Fri, 2 Apr 93 00:50:53 CST
Date: Fri, 2 Apr 93 00:50:53 CST
From: Tony Skjellum <tony@Aurora.CS.MsState.Edu>
Message-Id: <9304020650.AA01906@Aurora.CS.MsState.Edu>
To: mpi-context@cs.utk.edu
Subject: the gathering
Cc: mpi-collcomm@cs.utk.edu

Dear Context sub-committee members (and observers from collcomm, etc),

	The meeting this week underscored the need for convergence
to a unifying proposal that captures the features of Proposal I, VIII, and
III+VII=X.  The following work will be accomplished before May 12 to
that end, while respecting the current separateness of I and VIII.  I
regret having to leave current MPI meeting early, but the context
discussions were quite sufficient to put me in a higher gear on the
problems before us...

	.  Rik Littlefield agrees to organize a set of test cases to be coded
		for each proposal; proposers including codings in their
		proposals.  Deadline for such examples is April 21, 8pm EST.
		This will be discussed on mpi-context over next three weeks.
	.  I will develop a unified proposal X (with sensible names, and
		rationale, details, performance discussion, and examples). 
	.  I will ask for help, as needed, from Lyndon/Mark/Marc etc, on
		understanding nuances of their proposals,
	.  Marc Snir / Lyndon Clarke 
		will discuss changes/enhancements (if any) to Proposal I
	.  Mark Sears will complete (presumably) a full proposal VIII

(Tacit in this discussion is the accepted merger of III+VII as X,
despite its incomplete state, so we have eliminated some proposals
from consideration this round).  To be considered for a straw vote
(before next meeting), all proposals must be complete in that they
must

	.  Address their interactions with the first-reading of pt2pt, and
		current status of collcomm, including needed changes if any

	.  Provide specific syntax/semantics, as needed for pt2pt & collcomm
		chapters

	.  Describe any known flaws in syntax / semantics

	.  Describe logical subsets, if any, for MPI1

	.  Implement the examples that Rik organizes, and upon which we
		agree together (including those from Wednesday night 
		 discussion session)

	.  Include discussions of how starting works, and what the spawning
	   semantics must provide them (or through an initial message)
	   so that they can work. 

	.  The meaning of the MPI_ALL group in the proposal, if any, or
	   weaker substitutes for same.

	.  The existence/non-existence/requirement for servers or
	   shared-memory locations to effect some features

	.  Include expectations for performance of key operations
	   (eg, how much does it cost to get a new context?, can this
		be done outside of loops and cached?)

	.  Describe their use of a "cacheing facility," if any

	.  Describe their syntax/semantics of a "cacheing facility"

	.  Describe their reliance on any other MPI1 features not specifically
		part of context/group/tag/pid nature

		-	-	-	-	-

Presumably Proposals I, VIII, and X will fill all requirements to
reach the next straw poll deadline.  Whichever do make this Straw poll
deadline, (May 10, 1993, 5pm EST), can be considered by the voting
subcommittee.  A ranking will be developed, with the bottom N-2
proposals dropped.  We will meet on the evening of Wednesday, May 12,
8:00pm CST, for as long as it takes to choose the final proposal,
possibly by further merger of the remaining strong proposals.  On
Thursday, May 13, we will present our first reading of the Context
subcommittee (with possible spill over to Friday, May 14).  Actual
context sub-committee members will vote, only, in all cases.  Please
recall the two-sub-committee voting limit of the MPIF (as well as
sub-committee membership; observers are always welcome).

I will strive not to send fine-grain changes to proposal X's around,
but will wait to circulate my product in complete form, prior to May
10, so there is a lower e-mail burden for next weeks; perhaps others
will like to keep their updates coarse grain, but share important
things with everyone, for sure.  If agreements/compromises occur
between proposals and/or proposers, please share this with me and the
sub-committee in a timely fashion; I do not desire surprises at the
next meeting.  For instance, if Marc Snir were willing to consider a
separate context feature (separate from group) in Proposal I, a lot of
effort could be averted, because his proposal is pretty good otherwise
(except in re inter-group issues).  I think Lyndon will be talking to
Marc about making inter-group communication easier in Proposal I,
also.  If any breakthroughs are made, please let me know.

- Tony

PS Please copy mpi-collcomm on context-related matters for the
duration of MPIF. 

.	.	.	.	.	.	.	.	.      .
"There is no lifeguard at the gene pool." - C. H. Baldwin
"In the end ... there can be only one." - Ramirez (Sean Connery) in <Highlander>

Anthony Skjellum, MSU/ERC, (601)325-8435; FAX: 325-8997; tony@cs.msstate.edu




From owner-mpi-collcomm@CS.UTK.EDU  Tue Apr  6 15:38:54 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA25360; Tue, 6 Apr 93 15:38:54 -0400
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA10129; Tue, 6 Apr 93 15:38:13 -0400
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Tue, 6 Apr 1993 15:38:12 EDT
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from daedalus.epcc.ed.ac.uk by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA10111; Tue, 6 Apr 93 15:38:09 -0400
Date: Tue, 6 Apr 93 20:38:05 BST
Message-Id: <841.9304061938@subnode.epcc.ed.ac.uk>
From: L J Clarke <lyndon@epcc.ed.ac.uk>
Subject: [L J Clarke: mpi-context: comment and suggestion]
To: mpi-collcomm@cs.utk.edu
Reply-To: lyndon@epcc.ed.ac.uk

---- Start of forwarded text ----

Dear MPI context colleagues.

I'd like to say something about contexts and groups and the extant
proposals ...

First off, we have two major concepts floating around, which I need to
define here for purpose of the discussion below. 

Group --- is an ordered collection of distinct processes, or formally of
references to distinct processes.  It provides a naming scheme for
processes in terms of a group name and rank of process within group. 

Context --- is a distinct space of messages, or more formally of message
tags.  It provides management of messages as a message in context A
cannot be received as a message in context B. 

Within these definitions there are exactly two themes in the extant
proposals. 

Marc Snir, in Proposal I, views Group and Context as identical.  This
simplifies the number of concepts in MPI, but does mean that we can have
intragroup communication and no way at all can you have intergroup
communication within the above definition of Group and Context.  

Rik and I amusingly coined the term "grountext" to describe the
group/context entity in this proposal. 

Tony Skjellum, in Proposal III, views Group and Context as independent. 
This means two concepts instead of one, but does mean that we can allow
intragroup communication and some intergroup communication with
restriction on how flexible we can make such communication. 

Proposals VIII and X are identical to III in the manner in which they
treat Context and Group as independent concepts.  Please consider
Proposal VII as not compliant with the above definitions of Group and
Context. 

We need to decide:

1) Are context and group identical or different?

2) Is intergroup communication provided?

Now I want to point out something about intergroup communication which
we have in our system and find most expressive and convenient, but does
not fit in with the above frameworks and the assumption that the message
envelope always contains just (context, process, tag). 

Receive in intergroup communication can wildcard on (sender group and
sender rank) or (sender rank), in addition to message tag. 

We (at EPCC) do, and want to do (in MPI) (written out in longhand
notation)

receive(group, group', rank, tag)

where group is the receiver group, group' is the sender group,
rank is the sender rank in group' and tag is the message tag.
The receiver can never wildcard group.
The receiver can always wildcard tag.
The receiver can always wildcard either (rank) or (group' and rank).

(In fact, group and group' in this expression are more like the
grountext of Marc's proposal or the "context" of historical proposal
VII, but never mind on that point.)
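
As a minimal sketch of the wildcard rules above (illustrative Python, not
part of any proposal; the names Envelope, ANY and matches are hypothetical,
and groups are stood in for by plain strings):

```python
from typing import NamedTuple

ANY = None  # hypothetical wildcard marker


class Envelope(NamedTuple):
    dst_group: str  # receiver group ("group")
    src_group: str  # sender group ("group'")
    src_rank: int   # sender rank within group'
    tag: int


def matches(env, group, src_group=ANY, rank=ANY, tag=ANY):
    """Return True if receive(group, group', rank, tag) accepts env."""
    if group is ANY:
        raise ValueError("the receiver group can never be wildcarded")
    if src_group is ANY and rank is not ANY:
        # Allowed wildcards are (rank) or (group' and rank), never group' alone.
        raise ValueError("wildcarding group' requires wildcarding rank too")
    return (env.dst_group == group
            and (src_group is ANY or env.src_group == src_group)
            and (rank is ANY or env.src_rank == rank)
            and (tag is ANY or env.tag == tag))
```

For a message with envelope ("solvers", "io", 3, 7), a receive on group
"solvers" with everything else wildcarded accepts it, while a receive that
names sender group "viz" does not.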

In the framework of Marc we can reasonably do intergroup communication
without wildcard on group'.  To do this we transmit group information in
messages and form a group which is the union of group and group'.  We
cannot add wildcard on group' by saying that to do that one forms a
union of group and all cases of group'.  This requires the sender to
always know too much about the detail of the receive call with which it
is to match (i.e., that the receiver is or is not doing a wildcard).  If
you disbelieve this, then you should probably argue that we do not need
source selection in point-to-point as you can use tag to choose the
source, as it is the same argument (and bogus in my opinion). 

In the framework of Tony we can reasonably do intergroup communication
without wildcard on group'.  To do this we transmit group information in
messages and choose a context for the pair of groups to use for
intergroup communication.  We cannot add wildcard on group' by using a
context agreed for such use between group and all cases of group'.  The
argument is the same as that above after a little substitution. 

If we are serious about intergroup communication then in my opinion we
really should provide the facility to wildcard on sender group.  This
throws up a small number of issues, some of which I now address. 

No process addressed: I didn't mention process addressed communication
at all.  Perhaps the demons of speed are bothered by this.  Well, we
could do such as (context,process,tag), and the above does not exclude
it.  We can fit it in, of course. 

Size of point-to-point section: I said above "longhand notation".  Well
that is the most expressive and convenient notation, and if you ask me
then I think that (group,group,rank,tag) or (NULL,group,rank,tag) are
both acceptable for intragroup communication.  On the other hand one can
introduce some grunge syntax for intergroup communication which use the
same framework as intragroup communication and replaces group in
(group,rank,tag) with some glob object which is "shorthand" for (group,
group').  This is not the best syntax in the world but we can live with
it.  We can even fit in the process addressed stuff with this kind of
syntax as I have shown in Proposal X. 

Message envelope: You probably spot that this needs the sender group id
to go into the message envelope.  Perhaps the demons of speed are
bothered by this.  Well, you could have a different envelope for
groupless communication, intragroup communication and intergroup
communication, and only pay the cost of the bigger envelope when you
need it.  This is going to take two bits for envelope identification. 
Big deal! It will anyway be natural not to match communications of
different kinds (e.g.  intergroup cannot match with intragroup,
groupless cannot match with intergroup) so the extra header bits would
be useful anyway. 
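
A minimal sketch of the two-bit envelope identification, in illustrative
Python.  The layout (kind in the low two bits, the remaining header fields
above it) and all names here are assumptions made for the example, not
something any proposal specifies:

```python
from enum import IntEnum


class Kind(IntEnum):
    GROUPLESS = 0
    INTRAGROUP = 1
    INTERGROUP = 2


KIND_BITS = 2                       # three kinds fit in two bits
KIND_MASK = (1 << KIND_BITS) - 1


def pack_header(kind, rest):
    """Put the envelope kind in the low bits, other header bits above it."""
    return (rest << KIND_BITS) | int(kind)


def unpack_header(header):
    """Split a header word back into (kind, rest)."""
    return Kind(header & KIND_MASK), header >> KIND_BITS


def kinds_match(send_header, recv_header):
    """Communications of different kinds never match each other."""
    return (send_header & KIND_MASK) == (recv_header & KIND_MASK)
```

Only a header whose kind is INTERGROUP would then need to be followed by
the sender group id, so the other two kinds do not pay for the larger
envelope.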

Unknown group: You probably also spot that the receive with wildcard on
group can pick up a group that the receiver knows nothing of.  I would
be happiest if the implementation of MPI at the receiver asked the
implementation of MPI at the sender about the group in this case, so
that the receiver never has to bother about the eventuality.  We (at
EPCC) could accept that the returned group identifier is a NULL
identifier.  This means that groups have to exchange flattened group
descriptions in messages in a reasonable way before they can make a
great deal of sense of intergroup communication.  Not ideal, but we can
live with it. 

Comments please?

---- End of forwarded text ----
         /--------------------------------------------------------\
    e||) | Lyndon J Clarke    Edinburgh Parallel Computing Centre | e||) 
    c||c | Tel: 031 650 5021  Email: lyndon@epcc.edinburgh.ac.uk  | c||c 
         \--------------------------------------------------------/


From owner-mpi-collcomm@CS.UTK.EDU  Thu Apr  8 10:58:23 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA11790; Thu, 8 Apr 93 10:58:23 -0400
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA15413; Thu, 8 Apr 93 10:57:34 -0400
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Thu, 8 Apr 1993 10:57:33 EDT
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from daedalus.epcc.ed.ac.uk by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA15386; Thu, 8 Apr 93 10:56:27 -0400
Date: Thu, 8 Apr 93 15:56:22 BST
Message-Id: <2310.9304081456@subnode.epcc.ed.ac.uk>
From: L J Clarke <lyndon@epcc.ed.ac.uk>
Subject: mpi-context: context and group (medium)
To: mpi-context@cs.utk.edu
Reply-To: lyndon@epcc.ed.ac.uk
Cc: mpi-collcomm@cs.utk.edu

Dear MPI Colleagues

This letter is about groups, contexts, independence and coupling
thereof, the kinds of point-to-point communication which we have talked
about, and to brief extent libraries. 

Before embarking on the guts of the letter, I should like to express
very strong support for the suggestion that MPI users can cleanly
program in the host-node model.  In my opinion, this model of
programming is of considerable commercial significance, and I observe
that there are a number of important programs around which use this
model. 


			o--------------------o

I understand three different kinds of point-to-point communication which
have been discussed by various people in MPIF.  I write these out with
separate group and context concepts, as per a previous message to
mpi-context [Subject: mpi-context: comment and suggestion].  I will then
discuss coupling of group and context.  I refer the reader to my previous
message to mpi-comm which described classes of MPI user libraries
[Subject: mpi-comm: various (long)], as there is some follow on
discussion below. 


Groupless (process addressed)
-----------------------------
(process, context, tag)
Wildcard on process, tag.
No wildcard on context

Intragroup (closed group)
-------------------------
(group, rank, context, tag)
Wildcard on rank, tag. 
No wildcard on group, context.

Intergroup (open group)
-----------------------
(lgroup, rgroup, rank, context, tag)
Wildcard on rgroup, rank, tag. 
No wildcard on lgroup, context.

Observe that "group" in intragroup and "lgroup" in intergroup  are the
same thing, they are the group of the calling process.
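
The three kinds listed above can be summarised as a small table of
selector fields and permitted wildcards.  This is a minimal sketch in
illustrative Python; KINDS, ANY and check_receive are hypothetical names:

```python
ANY = None  # hypothetical wildcard marker

# For each kind: the selector fields, and the subset that may be wildcarded.
KINDS = {
    "groupless": (("process", "context", "tag"),
                  {"process", "tag"}),
    "intragroup": (("group", "rank", "context", "tag"),
                   {"rank", "tag"}),
    "intergroup": (("lgroup", "rgroup", "rank", "context", "tag"),
                   {"rgroup", "rank", "tag"}),
}


def check_receive(kind, **selectors):
    """Validate a receive's selectors; return them in field order."""
    fields, wildcardable = KINDS[kind]
    if set(selectors) != set(fields):
        raise TypeError(f"{kind} receive takes exactly {fields}")
    for name in fields:
        if selectors[name] is ANY and name not in wildcardable:
            raise ValueError(f"{name} cannot be wildcarded in {kind} receive")
    return tuple(selectors[name] for name in fields)
```

For example, an intragroup receive may pass ANY for rank and tag, but
passing ANY for group or context is rejected.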

Since neither "group" nor "context" in intragroup can be wildcard then
there  may  appear to be appeal in some coupling of them  in order  to
provide  shorter  syntax and  easier  context/group  management.  This
implies that we couple context to group of calling process.  Now  this
coupling  is not compatible  with intergroup  since  the  two  calling
processes have  different  groups, thus different  contexts, thus  the
send and receive can never match. We can resolve this  difficulty by a
more careful statement of where the context of the message is coupled.
In particular  we can state that the context of the message is coupled
to the  group of the message receiver.   In  this way we would express
intragroup  as a coupling  of (group,context),  and  we would  express
intergroup as a pair of such couplings.

The claim  we have  heard that  context  and  group  must  be strongly
coupled, resulting in a proposal which asserts that context and  group
are identical,  is  possibly nothing  more  than a  consequence of  an
assumption  that messages may only be  distinguished  on the basis  of
(process, context, tag)  (here process is a process label which can be
a rank  within  a group).   Given  that  assumption, we  can only  use
context to  distinguish messages within different groups  and  the two
entities become  strongly coupled.   Examining records  of  the  early
meetings  of  MPI,  I  find  that  this "decision"  was  made  by  the
point-to-point subcommittee  in a straw  poll which rejected selection
by group by  a narrow  majority of 10 to 11. Please note also that the
same  meeting  rejected context  modifying process  identifier  ---  a
"decision" which we are already  often  ignoring.   These  "decisions"
predate the existence of  the contexts  subcommittee  and the vigorous
discussion of contexts and groups which has been and continues to take
place.  We should uniformly be  open minded enough  to allow ourselves
to question all such "decisions", and to change them if we see fit.

The description of  MPI  user libraries which has been  given by  Mark
Sears and  myself strongly  suggests  that  context and group  must be
independent entities. 

Provision  of the process addressed communication immediately suggests
that a context can appear without coupling to a group in which case it
seems (to me) that they are independent entities.

There  is an  argument  against process  addressed communication which
says that process addressed communication gains nothing in performance
over intragroup  communication  in the group of all processes.  The
process  description in process addressed communication will, for the
sake of generality and thus portability, have to be some kind of pointer
to a process description object which contains whatever information is
needed to route a message to the intended recipient.  It could be just
that (in C, at  least), a pointer.  Sometimes,  on  some  machines, it
will actually  be implementable with some other kind of magic which is
more scalable, but it must always appear the same way.  It could be an
index, representable as  an integer in the host language, into a table
of process description objects  (better for F77, for sure).   It could
be  a  rank in  a group  of (all)  processes, used as an  index into a
process  description object table,  which is  just fine  for  a static
process model (and reflects existing practice).  It could be some kind
of global  unique process  identifier which is again  used as  a table
index  somewhere.  If  tables  grow too large in either of the  latter
cases, then there may be some hashing and/or caching involved.

There are counter arguments. I give one,  and invite you to give more.
On some machines,  the global unique process identifier is  sufficient
to route the message, and is representable  as an integer in  the host
language. For example, the global process id can be a composite of two
bit fields (nodeid,  procid) where nodeid is a physical processor node
number and procid is a process number  on the node, and the nodeid bit
field is sufficient to route.  In  these cases, there is no need for a
process description object table, and no need to do a table lookup. We
probably all have used machines just like this.
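
A minimal sketch of such a composite identifier, in illustrative Python.
The 8-bit width of the procid field and the function names are assumptions
made for the example only:

```python
PROC_BITS = 8                        # assumed width of the procid field
PROC_MASK = (1 << PROC_BITS) - 1


def make_pid(nodeid, procid):
    """Compose a global process id from (nodeid, procid) bit fields."""
    if not 0 <= procid <= PROC_MASK:
        raise ValueError("procid does not fit in its bit field")
    if nodeid < 0:
        raise ValueError("nodeid must be non-negative")
    return (nodeid << PROC_BITS) | procid


def route_node(pid):
    """The nodeid field alone is enough to route: no table lookup."""
    return pid >> PROC_BITS


def local_proc(pid):
    """Process number on the destination node."""
    return pid & PROC_MASK
```

With these widths, process 5 on node 3 has global id 773, and routing
needs only a shift to recover node 3.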

For  me the arguments  have  piled up  in favour  of context and group
being separate and  independent entities.  This letter therefore makes
the recommendation that context and group are separate and independent
entities. In that light  I propose further discussion on management of
contexts within and between processes, and within  and between groups,
and on the  subject of the use of objects which bind one or more contexts
and  one or more  groups in  order  to keep  the  communication  syntax
compact by overloading. I shall post another letter to you tomorrow.

			o--------------------o

Comments, questions, (flames :-) please?!

Best Wishes
Lyndon


         /--------------------------------------------------------\
    e||) | Lyndon J Clarke    Edinburgh Parallel Computing Centre | e||) 
    c||c | Tel: 031 650 5021  Email: lyndon@epcc.edinburgh.ac.uk  | c||c 
         \--------------------------------------------------------/


From owner-mpi-collcomm@CS.UTK.EDU  Thu Apr  8 14:31:08 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA15535; Thu, 8 Apr 93 14:31:08 -0400
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA26254; Thu, 8 Apr 93 14:30:02 -0400
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Thu, 8 Apr 1993 14:30:01 EDT
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from ssd.intel.com by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA26243; Thu, 8 Apr 93 14:29:58 -0400
Received: from ernie.ssd.intel.com by SSD.intel.com (4.1/SMI-4.1)
	id AA01330; Thu, 8 Apr 93 11:29:43 PDT
Message-Id: <9304081829.AA01330@SSD.intel.com>
To: lyndon@epcc.ed.ac.uk
Cc: mpi-context@cs.utk.edu, mpi-collcomm@cs.utk.edu, prp@SSD.intel.com
Subject: Re: mpi-context: context and group (longer) 
In-Reply-To: Your message of "Thu, 08 Apr 93 15:56:22 BST."
             <2310.9304081456@subnode.epcc.ed.ac.uk> 
Date: Thu, 08 Apr 93 11:29:42 -0700
From: prp@SSD.intel.com


> From: L J Clarke <lyndon@epcc.ed.ac.uk>
> Subject: mpi-context: context and group (medium)
>
> ...
> 
> For  me the arguments  have  piled up  in favour  of context and group
> being separate and  independent entities.
>
> Lyndon

I agree, although I also see merit in associating a context with a group.

I would like to share my thoughts about context which lead me to think we need
two differently managed forms of context. Some of you have already heard this.

Most of the discussion about context has revolved around protecting two
different entities: libraries and groups. I think this is
required, but I think they need very differently managed contexts. One form
is not adequate to cover both needs without sacrificing performance.

Consider a SPMD program with these calls. Assume the calls are loosely
synchronous.

		call to LibA (Group1)
		call to LibB (Group1)
		call to LibB (Group2)

In a loosely synchronous environment, messages for the next call can come in
before the previous one has completed. Here we see two forms of overlap.

Within the call to LibA, we might get messages from processes which have already
entered LibB. If LibA and LibB are independently written, they might use some
of the same tags. To avoid messages from LibB matching receives in LibA, we
must use different contexts. If we have static contexts, allocated when the
libraries are initialized, each call in the library can quickly provide the
context to its point-to-point calls. If we only have dynamic contexts,
especially if contexts are carried inside groups, then a library must be
prepared to dynamically allocate a new context on any call when it sees a new
group. I know we discussed ways to do this locally, so the context could be
created and cached locally on the fly without communication, but I find the
idea of incorporating such code into every library call horrifying.

Within the first call to LibB, we might get messages from processes which have
entered the second call to LibB. Since these calls are in different groups, it
might be difficult to code LibB in such a way that messages could not
intermix, since a process' position in Group2 might be quite different from
its position in Group1. (I would hope that libraries would be coded so that
multiple sequential calls to the same library with the same group would be
safe. That seems to be current practice.) To keep the two calls from
interfering, it would be convenient to have a different context for each
group. If each group contains a dynamically allocated context, that's easy. But
if contexts are statically allocated, especially if they require a name
server, getting a new context for each new group might be a global operation
that wouldn't scale well.

So I propose that we need two forms of context, one that is quite static for
protecting code, and one that is more dynamic for protecting groups.

The only mechanism I know of that is adequate for protecting code is context
allocated via a nameserver. In MIMD programs, one cannot say much about the
order in which libraries are initialized. Thus, if context is statically
allocated at initialization time, there must be a way to obtain the global
context value for a piece of code independently of other processes. A more
static method, such as a MPI registry or a "dollar bill server" has the
disadvantage of requiring a much larger value range for context. That uses
precious bits in the envelope of every message. Once a context is allocated to
a piece of code, it can be safely stored in a global variable without
endangering thread safety or shared memory implementations, because no matter
how many instantiations of the library store into the variable, they will
always store the same value.

There are nice dynamic mechanisms for allocating context for groups, which
require only communication within the group. This can piggyback on the
communication which is probably required to set up and synchronize the group
when it is created. For instance, one might set aside a small number of
context values for use by groups. When a group is created, every process in
the group could provide its current set of free context values, possibly as a
bit vector. After a groupwide reduction, each process chooses the smallest
value from the intersection, resulting in every process choosing the same
value.
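
A minimal sketch of that selection step, in illustrative Python.  The
groupwide reduction is simulated here by a local bitwise AND over a list
holding one bit vector per process; in a real implementation it would be
a collective operation within the group.  The function name is
hypothetical:

```python
from functools import reduce


def choose_group_context(free_masks):
    """Pick the smallest context value that is free on every process.

    free_masks holds one integer per process in the new group; bit i set
    means context value i is currently free on that process.
    """
    if not free_masks:
        raise ValueError("a group needs at least one process")
    common = reduce(lambda a, b: a & b, free_masks)  # the intersection
    if common == 0:
        raise RuntimeError("no context value is free on every process")
    lowest = common & -common          # isolate the lowest set bit
    return lowest.bit_length() - 1     # its index is the chosen context
```

Because every process sees the same reduced bit vector, every process
computes the same index, so no further agreement step is needed.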

Other forms of context protection might be required in the future. I don't
predict any, and expect that with both a static and a dynamic form, it is
likely that future needs would be covered.

The point-to-point calls might be configured to accept (group, rank, context).
In this configuration, the static context protecting the code is passed in
explicitly, and the context protecting the group is inside the group object.
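
A minimal sketch of how the two contexts would combine for the three
calls in the earlier example, in illustrative Python.  The Group class,
the envelope function and the numeric context values are all hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    name: str
    context: int  # dynamic context, assumed chosen when the group was created


def envelope(group, rank, code_context, tag):
    """Matching key: a receive accepts a message only if all fields agree."""
    return (code_context, group.context, rank, tag)


LIB_A, LIB_B = 1, 2  # hypothetical static contexts, one per library
group1 = Group("Group1", context=10)
group2 = Group("Group2", context=11)

# The same rank and tag are used in every call, yet no two keys collide.
calls = [
    envelope(group1, 0, LIB_A, tag=99),  # call to LibA (Group1)
    envelope(group1, 0, LIB_B, tag=99),  # call to LibB (Group1)
    envelope(group2, 0, LIB_B, tag=99),  # call to LibB (Group2)
]
```

The first two keys differ in the static (code) context, and the last two
differ in the group context, which are the two forms of overlap described
above.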

I'm not sure how this interacts with cross-group message passing. Perhaps the
simplest solution is to use a well-known group context in such cases, which
effectively disables group protection.

Those are my thoughts on context. Although I think the methods outlined here
are simple enough, I would be happy to see simpler mechanisms that solve the
same problems. I am not comfortable with any solution that requires active
participation by every library call, no matter how local.

Paul
From owner-mpi-collcomm@CS.UTK.EDU  Fri Apr  9 11:48:57 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA29408; Fri, 9 Apr 93 11:48:57 -0400
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA24729; Fri, 9 Apr 93 11:48:37 -0400
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Fri, 9 Apr 1993 11:48:36 EDT
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from daedalus.epcc.ed.ac.uk by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA24704; Fri, 9 Apr 93 11:48:22 -0400
Date: Fri, 9 Apr 93 16:48:19 BST
Message-Id: <3201.9304091548@subnode.epcc.ed.ac.uk>
From: L J Clarke <lyndon@epcc.ed.ac.uk>
Subject: Re: mpi-context: context and group (medium)
To: mpi-context@cs.utk.edu
In-Reply-To: L J Clarke's message of Thu, 8 Apr 93 15:56:22 BST
Reply-To: lyndon@epcc.ed.ac.uk
Cc: mpi-collcomm@cs.utk.edu

Dear MPI Colleagues

This is a short letter. 

First: a colleague here pointed out to me that I left an unfinished
point, ie failed to draw to conclusion, in the mail message "Subject:
mpi-context: context and group (medium)".  Apologies to you all for my
slipshod work.  I conclude that discussion here. 

Regarding the process identifier, for which there are arguments for and
against its appearance as a global unique process identifier and as a
process local handle to opaque process descriptor object.  The
discussion in the referenced letter should have concluded that MPI
should say that it is a process local identifier of a process expressed
as an integer, and no more.  This allows the implementation of MPI to
choose the "best" form, which may be a global unique process identifier
or may be a process local opaque reference to a process description
object or may be an index into a table of such objects describing all
processes. 

Second: the letter I sent to you all "Subject: mpi-context: comment and
suggestion" contained errors.  Apologies again.  I correct those errors here. 

* The claim that the conceptual framework of Tony regarding Group and 
  Context restricts the possibilities for inter(group)communication is 
  false.  It is the restriction of the message envelope to 
  (context,process,tag) which creates the limitation in this case.

* When I explained how intergroup communication can be done within the
  conceptual framework of Marc (Snir) I should have said that this 
  is a method for *simulating* intergroup communication without wildcard
  on group'.

* When I explained how intergroup communication can be done within the
  framework of Tony I should have said that this is a method for
  *implementing* intergroup communication without wildcard on group'.

Final: Regarding the same letter which really deals with the subject of
inter(group)communication I may have made errors or at least unhelpful
assumptions in the latter couple of paragraphs of the message.  Again I
apologise.  I plan to go into deep thought on the subject of
inter(group,context)communication, and promise to deliver some quality
discussion to you all next week.  Please bear with me.  Until such time
I shall omit inter(group,context)communication from my discussions. 

Best Wishes
Lyndon


         /--------------------------------------------------------\
    e||) | Lyndon J Clarke    Edinburgh Parallel Computing Centre | e||) 
    c||c | Tel: 031 650 5021  Email: lyndon@epcc.edinburgh.ac.uk  | c||c 
         \--------------------------------------------------------/


From owner-mpi-collcomm@CS.UTK.EDU  Fri Apr  9 12:21:22 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA29775; Fri, 9 Apr 93 12:21:22 -0400
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA26245; Fri, 9 Apr 93 12:20:56 -0400
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Fri, 9 Apr 1993 12:20:56 EDT
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from daedalus.epcc.ed.ac.uk by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA26150; Fri, 9 Apr 93 12:20:12 -0400
Date: Fri, 9 Apr 93 17:20:03 BST
Message-Id: <3227.9304091620@subnode.epcc.ed.ac.uk>
From: L J Clarke <lyndon@epcc.ed.ac.uk>
Subject: mpi-context: Why scarce contexts?
To: mpi-context@cs.utk.edu
Reply-To: lyndon@epcc.ed.ac.uk
Cc: mpi-collcomm@cs.utk.edu

Dear MPI Colleagues.

This question is primarily directed at Mark Sears.  

Mark, in Proposal VII you say that contexts will be a scarce resource,
in fact you suggest 16 which is in my mind very scarce indeed. 

Why do you say this? It will help me/us if I/we understand, I am sure. 
Please reply. 

Best Wishes
Lyndon

         /--------------------------------------------------------\
    e||) | Lyndon J Clarke    Edinburgh Parallel Computing Centre | e||) 
    c||c | Tel: 031 650 5021  Email: lyndon@epcc.edinburgh.ac.uk  | c||c 
         \--------------------------------------------------------/


From owner-mpi-collcomm@CS.UTK.EDU  Fri Apr  9 13:32:48 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA01040; Fri, 9 Apr 93 13:32:48 -0400
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA29417; Fri, 9 Apr 93 13:32:23 -0400
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Fri, 9 Apr 1993 13:32:22 EDT
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from daedalus.epcc.ed.ac.uk by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA29393; Fri, 9 Apr 93 13:31:43 -0400
Date: Fri, 9 Apr 93 18:31:38 BST
Message-Id: <3385.9304091731@subnode.epcc.ed.ac.uk>
From: L J Clarke <lyndon@epcc.ed.ac.uk>
Subject: Re: mpi-context: Why scarce contexts? 
To: mpsears@newton.cs.sandia.gov, mpi-context@cs.utk.edu
In-Reply-To: mpsears@newton.cs.sandia.gov's message of Fri, 09 Apr 93 11:06:58 MST
Reply-To: lyndon@epcc.ed.ac.uk
Cc: mpi-collcomm@cs.utk.edu


Dear Mark

First, apologies for getting the proposal number wrong.

> 
> Lyndon asks why I think context values will be a scarce resource.
>

[stuff deleted]

> I think there are several reasons. The first is that a context
> requires underlying resources in the implementation (e.g. queues)
> which may be limited. A message arrives at a process, it goes
> into a queue matching the assigned context value in the
> envelope. Both support for the queue and the matching function
> take some effort. (16 queues is not too bad; 1000 is a lot.)
> One way to limit the effort required is to
> limit the number of supported contexts.

What you seem to be asking in this argument is that each process should
use a limited number of contexts, which is different to asking that the
system as a whole should use a limited number of contexts. Okay, that is
just perhaps a subtle point.

You are assuming details of an implementation of context.  For example,
in a different approach there could be just one queue which is searched
through (in some fashion) in receive for a matching message, testing for
context in no different way to testing for tag and sender.  In that
implementation contexts do not require resource, and the number of
contexts is bounded only by the bit length of the context identifier. 
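
A minimal sketch of the single-queue approach described above, in C.  The
structure layout, the wildcard values and the function names are all
assumptions made for illustration; nothing here is taken from a proposal.
The point is only that context is tested in the same pass as sender and
tag, so the number of contexts costs no per-context queue.

```c
#include <stddef.h>

/* Hypothetical envelope: field names and types are illustrative only. */
struct envelope {
    int context;
    int sender;
    int tag;
};

struct message {
    struct envelope env;
    struct message *next;
};

#define ANY_SENDER (-1)  /* assumed wildcard values */
#define ANY_TAG    (-1)

/* Context is compared like sender and tag, except that it has no wildcard. */
static int matches(const struct envelope *m, int context, int sender, int tag)
{
    return m->context == context
        && (sender == ANY_SENDER || m->sender == sender)
        && (tag == ANY_TAG || m->tag == tag);
}

/* Unlink and return the oldest matching message in the single queue,
   or NULL if no queued message matches. */
struct message *dequeue_match(struct message **head,
                              int context, int sender, int tag)
{
    struct message **link;

    for (link = head; *link != NULL; link = &(*link)->next) {
        if (matches(&(*link)->env, context, sender, tag)) {
            struct message *found = *link;
            *link = found->next;
            found->next = NULL;
            return found;
        }
    }
    return NULL;
}
```

The cost moves from per-context storage to search time, which grows with
the number of unmatched messages rather than with the number of contexts.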

I imagine that you must have good reasons for the assumed implementation
of context.  Please do let me/us know why you make the assumption, I am
sure that I am not alone in my concern that the number of contexts
should be so scarce, but perhaps you know of very good reasons why they
should be so. 

> Second, the bits in the envelope that support the context value
> have to come from somewhere, probably the existing tag field. If
> the tag field is only 16 bits to begin with (for argument's sake),
> then taking more than 4 bits for a context value might have a
> large impact.

I must be missing something here again.  This seems to say that the bit
length of the envelope is fixed to some number of bits and the more
fields we want to cram into the envelope the shorter the bit lengths of
fields must be.  Is there a good reason why the bit length of the
envelope should be fixed in this fashion, or perhaps are you arguing
that the bit length of the envelope should be as short as possible?
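
To make the arithmetic in the quoted argument concrete, here is a small C
sketch of a fixed 16-bit field shared between context and tag.  The 16-bit
and 4-bit widths are the "for argument's sake" figures quoted above; the
function names are hypothetical.  With 4 bits of context there are 16
contexts and the tag shrinks to 12 bits, i.e. 4096 values.

```c
#include <stdint.h>

#define CONTEXT_BITS 4u                      /* bits taken from the tag field */
#define TAG_BITS     (16u - CONTEXT_BITS)    /* bits left for the tag */
#define CONTEXT_MASK ((1u << CONTEXT_BITS) - 1u)
#define TAG_MASK     ((1u << TAG_BITS) - 1u)

/* Pack a context and a tag into one 16-bit label; out-of-range bits
   are silently discarded by the masks. */
uint16_t pack_label(unsigned context, unsigned tag)
{
    return (uint16_t)(((context & CONTEXT_MASK) << TAG_BITS)
                      | (tag & TAG_MASK));
}

/* Recover the context from a packed label. */
unsigned label_context(uint16_t label)
{
    return (unsigned)label >> TAG_BITS;
}

/* Recover the tag from a packed label. */
unsigned label_tag(uint16_t label)
{
    return (unsigned)label & TAG_MASK;
}
```

Every bit moved from TAG_BITS to CONTEXT_BITS doubles the contexts and
halves the tags, which is the trade-off only if the total is held fixed.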

> This is a question vendors might answer: how many
> context values and tag values are you willing to support on future
> platforms and how many are you willing to back fit on existing ones?
> 

Yes, this would be a good question for the vendors indeed.  

VENDORS - PLEASE PLEASE PLEASE DO ADVISE US ON THIS ONE. 

> Last, I don't see a need for billions of contexts. My model calls
> for most programs to use handfuls, not thousands.

Yes, your model demands that programs use a handful; the concern which
I have is that complex and highly modular software will not be able to
conform with your model, inhibiting the development of third party
software. 

> I would also like to
> think (this is a hopeless cause, but here goes) that much of
> MPI could be implemented in hardware, not just the communications
> part but the part that we now think of as overhead. This would
> greatly extend the class of programs that could benefit from
> parallelization, and I oppose for this reason things which add
> unnecessary complexity to the communications process. 

I am sure that vendors do take very seriously the possibility of
implementing relevant parts of MPI in hardware. 

Best Wishes
Lyndon

         /--------------------------------------------------------\
    e||) | Lyndon J Clarke    Edinburgh Parallel Computing Centre | e||) 
    c||c | Tel: 031 650 5021  Email: lyndon@epcc.edinburgh.ac.uk  | c||c 
         \--------------------------------------------------------/


From owner-mpi-collcomm@CS.UTK.EDU  Fri Apr  9 15:33:50 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA05151; Fri, 9 Apr 93 15:33:50 -0400
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA04547; Fri, 9 Apr 93 15:33:05 -0400
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Fri, 9 Apr 1993 15:33:04 EDT
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from daedalus.epcc.ed.ac.uk by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA04510; Fri, 9 Apr 93 15:32:25 -0400
Date: Fri, 9 Apr 93 20:32:21 BST
Message-Id: <3457.9304091932@subnode.epcc.ed.ac.uk>
From: L J Clarke <lyndon@epcc.ed.ac.uk>
Subject: mpi-context: context management and group binding (long)
To: mpi-context@cs.utk.edu
Reply-To: lyndon@epcc.ed.ac.uk
Cc: mpi-collcomm@cs.utk.edu

Dear MPI Colleagues

I now discuss context management and group binding. As promised I omit
inter(group,context)communication  for  the  present.  This letter  is
further to letters of today and yesterday to mpi-comm and mpi-context.

Some of the people I talked with  about contexts at the recent meeting
wanted to be able  to  generate some contexts values  themselves, i.e.
not by calling context constructor procedures. This is accommodated in
the recommendations of this letter.

In  my letter to mpi-comm  "Subject: various (long)"  I suggested that
the  question  of   secure/insecure   point-to-point  and   collective
communications  could  be described  as a property of  a context, with
some advantage. In this letter I will incorporate this feature.

I will also be discussing communicator objects as  in  Proposal X, but
with sensible names.  Tony Skjellum has  made the valued suggestion to
me,  privately,  that it  is better  to  attributise the  communicator
object with the secure/insecure  stuff,  rather  than  the context.  I
shall  adopt  this suggestion  in  this  letter  and  attributise  the
communicator object rather than the context.

			o--------------------o

Message Contexts
----------------

In this  proposal  a (message) context identifier  is (like a  message
tag) just an integer which is used in message selection and  (unlike a
message tag) may not be wildcard.

The interval of context identifiers (1, ...,  MPI_NUMUSR_CONTEXTS)  is
reserved for  the  MPI user to manage as she sees  fit.  Use  of these
contexts  allows the user to  write programs  which do not make use of
the provided context creation and deletion facilities.  How big should
MPI_NUMUSR_CONTEXTS be? Say  1, 2, 4, 8, 16,  32, 64, 128, ...   Steve
and Tom, and friends, can you advise?


The  MPI system provides a procedure which  creates  a  unique context
outside  the  interval  of reserved user  context identifiers,  and  a
procedure  which deletes a created context  (it does  not delete  user
reserved contexts).  For example:

            context = mpi_create_context()
            mpi_delete_context(context)


There  may  be  advantage  in defining  the context  create and delete
functions such that they create and delete more than one context at  a
time, in order to amortise creation/deletion overhead. 

Please note that  these  context generation calls are made by a single
process and are  asynchronous.  They  can be implemented  as a process
local  operation  by  attaching  the  global  process identifier  to a
process local context  allocator, at the  expense  of needing a lot of
bits in  the  context.   They can  also be implemented  via access  to
shared data (or a reactive server) in which case the bit length of the
context can be made smaller.  [I view this as an implementation detail
which we  should not dwell on in MPI, and should be the freedom of the
implementor  to  choose any  formally  correct method  which hopefully
optimises execution on the target platform.]
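
A minimal C sketch of the process-local option, under stated assumptions:
the global process identifier is an integer below 65535, each process
keeps a 16-bit local counter, and the names below are hypothetical.  The
identifier is placed in the high bits, which is where the "lot of bits"
goes.

```c
#include <stdint.h>

#define LOCAL_BITS 16u   /* assumed width of the per-process counter */
#define NO_CONTEXT 0u    /* returned when the local counter is exhausted */

struct context_allocator {
    uint32_t process_id;   /* global process identifier, assumed unique */
    uint32_t next_local;   /* next unused local counter value */
};

/* Create a context without any communication.  Two processes can never
   return the same value because the high bits differ.  Using
   process_id + 1 keeps every result at or above 1 << LOCAL_BITS, clear
   of a small interval of user reserved contexts. */
uint32_t create_context(struct context_allocator *a)
{
    if (a->next_local >> LOCAL_BITS)
        return NO_CONTEXT;
    return ((a->process_id + 1u) << LOCAL_BITS) | a->next_local++;
}
```

A shared-data or server implementation could hand out consecutive small
integers instead, at the price of communication on each creation.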

The  user program may make  use of the user reserved  contexts. ClassB
libraries (encapsulated  objects) are  expected  to use system created
contexts. These can  be created  as above or through  the Communicator
object constructors described below.

Communicator Objects
--------------------

The context  acquired by the  user in either of  the above ways is not
valid  for  communication.  Communication  is  effected by  use  of  a
Communicator  object,  which is a binding  of  context, zero  or  more
groups (just zero or one in this letter),  and communicator attributes
(just one in this letter).

Two classes of communicator are described in this letter:

* WorldCommunicator - an instance of a WorldCommunicator is a binding of
  context   to  nothing.   This   communicator   allows   the   user  to
  intracommunicate within  the world  of  processes comprising  the user
  application,  labelling  processes with their (process  local) process
  identifier.

* GroupCommunicator - an instance of a GroupCommunicator is a binding of
  context to  a process  group.  This  communicator  allows the user  to
  intracommunicate within the  group of  processes comprising the group,
  labelling processes with their (group global) rank within group.

Communicator   creation  defines   the   SECURITY  attribute   of  the
communicator to be created, which may be any of the following:

* MPI_DEFAULT_COMMUNICATOR - the default Security attribute
  specified in environmental management.

* MPI_REGULAR_COMMUNICATOR - the regular Security attribute
  which provides regular point-to-point and collective semantics

* MPI_SECURE_COMMUNICATOR  - the secure Security attribute
  which provides secure point-to-point and collective semantics

Communicator objects are opaque objects of undefined size referenced
by an object handle which is expressed as an integer in the host
language.

Communicator  creation will create a context  for the Communicator, or
will  accept and  bind  a  user  managed  context. MPI should  provide
procedures for creation of each class of Communicator objects, and for
deletion of any class of Communicator object.  
For example,
            handle = mpi_create_world_communicator(context, security)
            handle = mpi_create_group_communicator(group, context, security)
            mpi_delete_communicator(handle)

In  each  creation  procedure  "security" is  the  security  attribute
described above. It is  the responsibility  of the user to ensure that
all communicators with the same context also have the same security.

In each creation procedure "context"  may be a user managed context or
may take the value  MPI_NULL_CONTEXT  (or  something  like that :-) in
which  case  the creation  procedure also creates  a context  for  the
communicator.  If the creation procedure creates  a  context  then the
procedure  synchronises the  calling processes  (all  processes for  a
WorldCommunicator and the  group of processes for a GroupCommunicator)
and returns the same context to each copy of the  communicator object.
If a user managed context was supplied then the  procedure is  process
local  and it is the responsibility of the  user  to ensure that  each
user managed context is bound to no more than one  communicator at any
time.

In the GroupCommunicator  creation procedure "group"  is a handle to a
group description.

The  communicator deletion procedure deletes the bound context if that
context  was created in the communicator creation  procedure  but does
not delete a user managed context.
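
The ownership rule for the bound context can be sketched in C as below.
This is a single-process stand-in under stated assumptions: the handle is
a pointer rather than an integer, NULL_CONTEXT stands in for
MPI_NULL_CONTEXT, system contexts are drawn from a simple counter, and
the synchronisation of the calling processes is omitted.

```c
#include <stdlib.h>

#define NULL_CONTEXT (-1)   /* stands in for MPI_NULL_CONTEXT */

struct communicator {
    int context;
    int security;
    int owns_context;   /* nonzero if creation allocated the context */
};

static int next_system_context = 1000;  /* assumed outside the user interval */
static int live_system_contexts = 0;    /* system contexts not yet deleted */

struct communicator *create_world_communicator(int context, int security)
{
    struct communicator *c = malloc(sizeof *c);

    if (c == NULL)
        return NULL;
    c->owns_context = (context == NULL_CONTEXT);
    if (c->owns_context) {
        /* A real implementation would synchronise the callers here so
           that every copy of the communicator gets the same context. */
        context = next_system_context++;
        live_system_contexts++;
    }
    c->context = context;
    c->security = security;
    return c;
}

void delete_communicator(struct communicator *c)
{
    if (c == NULL)
        return;
    if (c->owns_context)
        live_system_contexts--;   /* release only what creation allocated */
    free(c);
}
```

A user managed context passes through untouched, so the user remains
responsible for binding it to no more than one communicator at a time.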

Short Examples
--------------

A user program which only makes use of  two user reserved contexts and
makes no  use  of process  groupings  can "enable" the  user  reserved
contexts by creating WorldCommunicator objects.
For example,
            c0 = mpi_create_world_communicator(1,MPI_DEFAULT_COMMUNICATOR)
            c1 = mpi_create_world_communicator(2,MPI_DEFAULT_COMMUNICATOR)

A ClassA library can accept a communicator object as argument.
For example,
            void class_a_procedure(int communicator, ...) 
            {
              /* do it */
            }

A ClassB library can accept a group as argument and create private
GroupCommunicator objects.
For example,
            void class_b_procedure(int group, ...)
            {
              static int communicator = MPI_NULL_COMMUNICATOR;

              /* create the private communicator on the first call only */
              if (communicator == MPI_NULL_COMMUNICATOR) 
              {
                  communicator = mpi_create_group_communicator(group,
                                                    MPI_NULL_CONTEXT, 
                                            MPI_SECURE_COMMUNICATOR);
              }

              /* do it */
            }
This example could  be generalised by adding a group "cache"  facility
as described by Rik Littlefield.
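
One possible shape for such a generalisation is sketched below in C.  It
is not Rik Littlefield's facility, whose details are not given here, only
an illustration of keeping one private communicator per group.  The fixed
cache size and all names are assumptions, and create_group_communicator
is a local stub standing in for mpi_create_group_communicator.

```c
#define CACHE_SIZE 8
#define NO_COMMUNICATOR (-1)

/* Stub for mpi_create_group_communicator(group, MPI_NULL_CONTEXT, ...):
   it just hands out distinct integers and counts its calls. */
static int creations = 0;

static int create_group_communicator(int group)
{
    (void)group;
    return 100 + creations++;
}

static struct { int group; int communicator; } cache[CACHE_SIZE];
static int cached = 0;

/* Return the communicator already created for this group, creating and
   remembering one the first time the group is seen. */
int communicator_for_group(int group)
{
    int i;

    for (i = 0; i < cached; i++)
        if (cache[i].group == group)
            return cache[i].communicator;
    if (cached == CACHE_SIZE)
        return NO_COMMUNICATOR;   /* full: a real library would evict or grow */
    cache[cached].group = group;
    cache[cached].communicator = create_group_communicator(group);
    return cache[cached++].communicator;
}
```

With this lookup in place of the single static variable, a call with a
second group gets its own communicator instead of reusing the first.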

Point-to-point communication
----------------------------

The  point-to-point  (intra)communication  procedures  have a  generic
process     and     message     addressing     form     (communicator,
process_label,message_label).  I  shall  deal with  Send  and  Receive
separately.

Send(communicator, process-label, message-label)
----

* communicator  is   a WorldCommunicator or a GroupCommunicator

* process-label is { the (process local) identifier of the receiver when
                   {                 communicator is a WorldCommunicator
                   {
                   { the rank in communicator.group of the receiver when
                   {                 communicator is a GroupCommunicator

* message-label is   the message tag in communicator.context.

The point-to-point  communication is  REGULAR if communicator.security
has    the    value    MPI_REGULAR_COMMUNICATOR,    and    SECURE   if
communicator.security has the value MPI_SECURE_COMMUNICATOR.


Recv(communicator, process-label, message-label)
----

* communicator  is   a WorldCommunicator or a GroupCommunicator

* process-label is { the (process local) identifier of the receiver when
                   {                 communicator is a WorldCommunicator
                   {
                   { the rank in communicator.group of the receiver when
                   {                 communicator is a GroupCommunicator
                   {
                   { a wildcard value in either case

* message-label is   the message tag in communicator.context or a
  wildcard value

The point-to-point  communication is  REGULAR if communicator.security
has    the    value    MPI_REGULAR_COMMUNICATOR,    and    SECURE   if
communicator.security has the value MPI_SECURE_COMMUNICATOR.

Collective communication
------------------------

The WorldCommunicator is not valid for MPI collective communication.

The  GroupCommunicator  is  valid  for  MPI  collective  communication
procedures.     The   collective    communication   is   REGULAR    if
communicator.security  has  the  value  MPI_REGULAR_COMMUNICATOR,  and
SECURE if communicator.security has the value MPI_SECURE_COMMUNICATOR.

			o--------------------o


Comments, questions, (flames :-), please!

For your convenience, my plan now is to go into a session of deep thought
regarding intercommunication, the work we have done at EPCC, and MPI.  I
will then discuss these thoughts with my colleagues here, and promise to
return quality discussion of intercommunication to you sometime next
week. 

[If anyone wants to discuss intercommunication with me, I prefer to do
so privately until I have really thought longer and harder than before.]

I have an outstanding reply to Paul Pierce's recent letter, which I shall
make now.  I'll be off-line for a while, probably come on-line again
Sunday, and will reply to letters which I hope you will write in a
reactive and less prolific fashion. 

Happy reading :-)

Best Wishes
Lyndon

         /--------------------------------------------------------\
    e||) | Lyndon J Clarke    Edinburgh Parallel Computing Centre | e||) 
    c||c | Tel: 031 650 5021  Email: lyndon@epcc.edinburgh.ac.uk  | c||c 
         \--------------------------------------------------------/


From owner-mpi-collcomm@CS.UTK.EDU  Fri Apr  9 16:01:34 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA06769; Fri, 9 Apr 93 16:01:34 -0400
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA05808; Fri, 9 Apr 93 16:01:03 -0400
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Fri, 9 Apr 1993 16:01:02 EDT
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from daedalus.epcc.ed.ac.uk by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA05800; Fri, 9 Apr 93 16:01:00 -0400
Date: Fri, 9 Apr 93 21:00:58 BST
Message-Id: <3497.9304092000@subnode.epcc.ed.ac.uk>
From: L J Clarke <lyndon@epcc.ed.ac.uk>
Subject: mpi-context: CORRECTION to previous message
To: mpi-collcomm@cs.utk.edu
Reply-To: lyndon@epcc.ed.ac.uk

Dear MPI Colleagues

An astute colleague here has pointed out two silly errors and some
exceptionally bad phrasing in my previous letter "Subject: mpi-context:
context management and group binding (long)"

When describing point-to-point receive, please replace the two erroneous
occurrences of "receiver" by "sender".  Cut and paste errors, sorry. 

In the final paragraph I am inviting your replies and informing that I
personally will be in a reactive and less prolific mode of operation. 
The wording implies that I am asking you to be reactive and less
prolific, which of course I would not ask.  Tired and hungry errors (it's
9pm here now, Easter Friday), sorry. 

Best Wishes
Lyndon "the prolific" 

ps thanks Al :-)

         /--------------------------------------------------------\
    e||) | Lyndon J Clarke    Edinburgh Parallel Computing Centre | e||) 
    c||c | Tel: 031 650 5021  Email: lyndon@epcc.edinburgh.ac.uk  | c||c 
         \--------------------------------------------------------/


From owner-mpi-collcomm@CS.UTK.EDU  Fri Apr  9 16:20:54 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA07116; Fri, 9 Apr 93 16:20:54 -0400
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA06566; Fri, 9 Apr 93 16:20:22 -0400
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Fri, 9 Apr 1993 16:20:22 EDT
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from daedalus.epcc.ed.ac.uk by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA06500; Fri, 9 Apr 93 16:19:36 -0400
Date: Fri, 9 Apr 93 21:19:33 BST
Message-Id: <3512.9304092019@subnode.epcc.ed.ac.uk>
From: L J Clarke <lyndon@epcc.ed.ac.uk>
Subject: Re: mpi-context: context and group (longer) 
To: prp@SSD.intel.com
In-Reply-To: prp@SSD.intel.com's message of Thu, 08 Apr 93 11:29:42 -0700
Reply-To: lyndon@epcc.ed.ac.uk
Cc: mpi-context@cs.utk.edu, mpi-collcomm@cs.utk.edu

Paul Pierce writes:

> > For  me the arguments  have  piled up  in favour  of context and group
> > being separate and  independent entities.
> >
> > Lyndon
> 
> I agree, although I also see merit in associating a context with a group.
> 

Hey, some consensus here.  Magic!

BTW, Paul, I wanted to ask what you thought about my suggestions for
secure send/receive being bound to a context kind of thing (now
communicator object as of last mail to mpi-context), as opposed to
having different calls.  I think you are right about the microscopic
effect on the code.  I just tried to give both global default control
and per module instance control over the security question. 

> Consider a SPMD program with these calls. Assume the calls are loosely
> synchronous.
> 
> 		call to LibA (Group1)
> 		call to LibB (Group1)
> 		call to LibB (Group2)
> 
> In a loosely synchronous environment, messages for the next call can come in
> before the previous one has completed. Here we see two forms of overlap.
> 
> Within the call to LibA, we might get messages from processes which have already
> entered LibB. If LibA and LibB are independently written, they might use some
> of the same tags. To avoid messages from LibB matching receives in LibA, we
> must use different contexts. If we have static contexts, allocated when the
> libraries are initialized, each call in the library can quickly provide the
> context to its point-to-point calls. If we only have dynamic contexts,
> especially if contexts are carried inside groups, then a library must be
> prepared to dynamically allocate a new context on any call when it sees a new
> group. I know we discussed ways to do this locally, so the context could be
> created and cached locally on the fly without communication, but I find the
> idea of incorporating such code into every library call horrifying.

Paul, I have a model for libraries like this, which in my mail to
mpi-comm "Subject: mpi-comm: various (long)" I referred to as ClassB
libraries, which maybe you might want to think about.  It's quite
simple.  We write libraries just like this, which are akin to
encapsulated objects. 

We think in terms of library instances.  The library provides an
instance constructor which accepts a group, creates context(s) for the
instance and constructs the instance, returning an instance id to the
user which is used to refer to the instance for all calls.  That is, all
calls including and up to the instance destructor, which asks an
instance to destruct itself. 

Our experience is that users do not find it difficult to manage this
model for ClassB libraries. 
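
The instance model can be sketched in C as follows.  The function names,
the fixed instance table and the counter used as a context allocator are
assumptions for illustration only; a real constructor would obtain its
context(s) from the message passing system and would involve the whole
group.

```c
#define MAX_INSTANCES 4
#define NO_INSTANCE (-1)

struct instance {
    int in_use;
    int group;
    int context;   /* private to this instance */
};

static struct instance instances[MAX_INSTANCES];
static int next_context = 1000;   /* stand-in for a system context allocator */

/* Instance constructor: accepts a group, creates a context for the
   instance and returns the instance id used in all later calls. */
int lib_construct(int group)
{
    int id;

    for (id = 0; id < MAX_INSTANCES; id++) {
        if (!instances[id].in_use) {
            instances[id].in_use = 1;
            instances[id].group = group;
            instances[id].context = next_context++;
            return id;
        }
    }
    return NO_INSTANCE;
}

/* Any library call starts by looking up the instance's private context.
   Returns -1 if the id does not name a live instance. */
int lib_context(int id)
{
    if (id < 0 || id >= MAX_INSTANCES || !instances[id].in_use)
        return -1;
    return instances[id].context;
}

/* Instance destructor. */
void lib_destruct(int id)
{
    if (id >= 0 && id < MAX_INSTANCES)
        instances[id].in_use = 0;
}
```

Context creation happens once per instance in the constructor, so the
ordinary library calls contain no context management of their own.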

> 
> So I propose that we need two forms of context, one that is quite static for
> protecting code, and one that is more dynamic for protecting groups.

I cannot see any difference between the latter of these two contexts
"more dynamic for protecting groups" and a global group identifier. 

> The only mechanism I know of that is adequate for protecting code is context
> allocated via a nameserver. In MIMD programs, one cannot say much about the
> order in which libraries are initialized. Thus, if context is statically
> allocated at initialization time, there must be a way to obtain the global
> context value for a piece of code independently of other processes. A more
> static method, such as a MPI registry or a "dollar bill server" has the
> disadvantage of requiring a much larger value range for context. That uses
> precious bits in the envelope of every message. Once a context is allocated to
> a piece of code, it can be safely stored in a global variable without
> endangering thread safety or shared memory implementations, because no matter
> how many instantiations of the library store into the variable, they will
> always store the same value.

We find that with regard to operations within a process group, and in
particular to library instance construction and destruction described
above, the main user program has a highly SPMD nature.  So we can
exploit sequencing.  This is a most valuable learning experience,
because we had similar thoughts to those you express here, implemented a
name server, and really didn't need it once (for this purpose). 

> 
> The point-to-point calls might be configured to accept (group, rank, context).
> In this configuration, the static context protecting the code is passed in
> explicitly, and the context protecting the group is inside the group object.
> 
> I'm not sure how this interacts with cross-group message passing. Perhaps the
> simplest solution is to use a well-known group context in such cases, which
> effectively disables group protection.

As I point out above, your "group protecting context hidden inside
group" really does just seem to me to be a global group identifier. 
Within the definition of context I see no reason why we necessarily will
cause a problem with intercommunication. 

When you say "use a well-known group context in such cases" I take it you
mean a common ancestor like the "group context" of all processes or
something? 

I have promised, I will return quality discussion on intercommunication
next week. 

Did the points in this reply letter help, Paul?

Best Wishes
Lyndon "the temporarily less prolific"

         /--------------------------------------------------------\
    e||) | Lyndon J Clarke    Edinburgh Parallel Computing Centre | e||) 
    c||c | Tel: 031 650 5021  Email: lyndon@epcc.edinburgh.ac.uk  | c||c 
         \--------------------------------------------------------/


From owner-mpi-collcomm@CS.UTK.EDU  Fri Apr  9 18:44:27 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA08721; Fri, 9 Apr 93 18:44:27 -0400
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA12880; Fri, 9 Apr 93 18:43:52 -0400
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Fri, 9 Apr 1993 18:43:51 EDT
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from ssd.intel.com by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA12872; Fri, 9 Apr 93 18:43:49 -0400
Received: from ernie.ssd.intel.com by SSD.intel.com (4.1/SMI-4.1)
	id AA24612; Fri, 9 Apr 93 15:43:36 PDT
Message-Id: <9304092243.AA24612@SSD.intel.com>
To: lyndon@epcc.ed.ac.uk
Cc: prp@SSD.intel.com, mpi-context@cs.utk.edu, mpi-collcomm@cs.utk.edu
Subject: Re: mpi-context: context and group (longer) 
In-Reply-To: Your message of "Fri, 09 Apr 93 21:19:33 BST."
             <3512.9304092019@subnode.epcc.ed.ac.uk> 
Date: Fri, 09 Apr 93 15:43:35 -0700
From: prp@SSD.intel.com


> From: L J Clarke <lyndon@epcc.ed.ac.uk>
> 
> Paul Pierce writes:
>
> > Consider a SPMD program with these calls. Assume the calls are loosely
> > synchronous.
> > 
> > 		call to LibA (Group1)
> > 		call to LibB (Group1)
> > 		call to LibB (Group2)
> > 
> > In a loosely synchronous environment, messages for the next call can come in
> > before the previous one has completed. Here we see two forms of overlap.
> > 
> > Within the call to LibA, we might get messages from processes which have already
> > entered LibB.
> 
> Paul, I have a model for libraries like this, which in my mail to
> mpi-comm "Subject: mpi-comm: various (long)" I referred to as ClassB
> libraries ...

Yes, that is a matching concept.

> We think in terms of library instances. ...

I found this part hard to understand. However, if you propose to use the
mechanisms in your just previous mail:

> A ClassB library can accept a group as argument and create private
> GroupCommunicator objects.
> For example,
>            void class_b_procedure(int group, ...)
>            {
>              static int communicator = MPI_NULL_COMMUNICATOR;
>
>              if (communicator == MPI_NULL_COMMUNICATOR) 
>              {
>                  communicator = mpi_create_group_communicator(group,
>                                                    MPI_NULL_CONTEXT, 
>                                            MPI_SECURE_COMMUNICATOR);
>              }
>
>              /* do it */
>            }
> This example could  be generalised by adding a group "cache"  facility
> as described by Rik Littlefield.

First of all this code doesn't work - if the library is called with a
different group (see the LibB(Group{1,2}) example above) it will mistakenly
use the communicator for Group1 when called for Group2. This problem can be
fixed using caching. But...

This is exactly the sort of too-dynamic, too-intrusive mechanism I find
horrifying. I can't conceive of unleashing on the unsuspecting world a
standard that requires you to put code like that in every library call
(it's even more complex with caching). We _must_ come up with a
better mechanism.

The example mechanism I talked about might look like this:

	int my_context;

	void class_b_initialize() /* Called once at the beginning of time */
	{
		my_context = create_and_or_lookup_context("mylib");
	}

	void class_b_procedure(int group, ...)
	{

		/* do it using (group, rank, my_context) */
	}

Note the total absence of context maintenance in the arbitrary library
procedure. For group protection, the group must contain an additional embedded
context.
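
To show the property this relies on, here is a C sketch of what
create_and_or_lookup_context might do.  It is a process-local table only;
in the scheme described above the table would live in a name server
shared by all processes.  The table sizes and the use of table indices as
context values are assumptions.

```c
#include <string.h>

#define MAX_NAMES 16
#define NAME_LEN  32

static struct { char name[NAME_LEN]; int context; } registry[MAX_NAMES];
static int registered = 0;

/* Return the context registered under name, registering a new one on
   the first request.  Returns -1 if the name is too long or the table
   is full. */
int create_and_or_lookup_context(const char *name)
{
    int i;

    if (strlen(name) >= NAME_LEN)
        return -1;
    for (i = 0; i < registered; i++)
        if (strcmp(registry[i].name, name) == 0)
            return registry[i].context;
    if (registered == MAX_NAMES)
        return -1;
    strcpy(registry[registered].name, name);
    registry[registered].context = registered;   /* index doubles as context */
    return registry[registered++].context;
}
```

Because every request for the same name returns the same value, the order
in which libraries initialise does not matter, and repeated stores into a
global variable such as my_context are harmless.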

> > So I propose that we need two forms of context, one that is quite static for
> > protecting code, and one that is more dynamic for protecting groups.
> 
> I cannot see any difference between the latter of these two contexts
> "more dynamic for protecting groups" and a global group identifier.

You are right, it's the same. My point is that group context is necessary but
_not_ sufficient.

> > The only mechanism I know of that is adequate for protecting code is context
> > allocated via a nameserver. ...
> 
> We find that with regard to operations within a process group, and in
> particular to library instance construction and destruction described
> above, the main user program has a highly SPMD nature.  So we can
> exploit sequencing.  This is a most valuable learning experience,
> because we had similar thoughts to those you express here, implemented a
> name server, and really didn't need it once (for this purpose).

You learned that you have a SPMD universe. We have a mostly SPMD universe, but
we have customers already with MPMD applications.

One can argue that sequencing is acceptable for a static-process model, but it
is not adequate for dynamic processes. We have talked about defining MPI in
such a way that it is complete for a static process model but without limiting
its extension to a dynamic process model. So we must be careful - if we assume
sequencing now, we must do it in a way that allows for a nameserver later.


> Lyndon "the temporarily less prolific"

Paul


From owner-mpi-collcomm@CS.UTK.EDU  Fri Apr  9 22:58:35 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA11840; Fri, 9 Apr 93 22:58:35 -0400
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA20945; Fri, 9 Apr 93 22:58:12 -0400
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Fri, 9 Apr 1993 22:58:12 EDT
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from pnlg.pnl.gov by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA20883; Fri, 9 Apr 93 22:56:53 -0400
Received: from carbon.pnl.gov (130.20.188.38) by pnlg.pnl.gov; Fri, 9 Apr 93
 19:45 PST
Received: from sodium.pnl.gov by carbon.pnl.gov (4.1/SMI-4.1) id AA09922; Fri,
 9 Apr 93 19:43:30 PDT
Received: by sodium.pnl.gov (4.1/SMI-4.0) id AA22459; Fri, 9 Apr 93 19:43:26 PDT
Date: Fri, 9 Apr 93 19:43:26 PDT
From: rj_littlefield@pnlg.pnl.gov
Subject: proposal -- context and tag limits
To: lyndon@epcc.ed.ac.uk, mpi-context@cs.utk.edu, mpsears@newton.cs.sandia.gov
Cc: d39135@carbon.pnl.gov, gropp@mcs.anl.gov, mpi-collcomm@cs.utk.edu,
        mpi-envir@cs.utk.edu, mpi-pt2pt@cs.utk.edu
Message-Id: <9304100243.AA22459@sodium.pnl.gov>
X-Envelope-To: mpi-pt2pt@cs.utk.edu, mpi-envir@cs.utk.edu,
 mpi-context@cs.utk.edu, mpi-collcomm@cs.utk.edu

Lyndon et al. write:

> ...  This seems to say that the bit
> length of the envelope is fixed to some number of bits and the more
> fields we want to cram into the envelope the shorter the bit lengths of
> fields must be.  Is there a good reason why the bit length of the
> envelope should be fixed in this fashion, or perhaps are you arguing
> that the bit length of the envelope should be as short as possible?
> 
> > This is a question vendors might answer: how many
> > context values and tag values are you willing to support on future
> > platforms and how many are you willing to back fit on existing ones?
> 
> Yes, this would be a good question for the vendors indeed.  
> 
> VENDORS - PLEASE PLEASE PLEASE DO ADVISE US ON THIS ONE. 

I wonder what kind of useful advice vendors could really give us.

Hardware support boils down to a question of getting faster
performance in exchange for some relatively small resource limit.

But in almost every case I can think of, such limits are made
functionally transparent to the user by automatic fallback to
some slower mechanism without the resource limit.  Thus we have..
  fixed size register sets with compilers that spill to memory,
  fixed size caches with automatic flush/reload from main memory,
  fixed size TLB's with cpu traps for TLB reload, 
  fixed size physical memory with virtual memory support, 
and so on.

The only counterexample that pops to mind is fixed-length numeric
values, for which reasonably well established conventions exist.

No such conventions currently exist regarding tag and context
values.

============  PROPOSAL TO ENVIRONMENT COMMITTEE ==============

The MPI specification should 

1. require that all MPI implementations provide functional
   support for specified generous limits (e.g., 32 bits) on tag
   and context values, and

2. suggest that vendors provide a system-specific mechanism by
   which the user can optionally specify tag and context limits
   that the program agrees to abide by.  Even the form of
   these limits should remain unspecified since they may vary
   from system to system.
   
======================== END PROPOSAL ========================

Further discussion...

If a vendor wishes to provide hardware support to enhance
performance for some stricter limits, and if some people are able
and willing to write programs within those limits, that's great.
Those people on those machines will be lark happy.  If the
performance increase is substantial, and I'm on one of those
machines, and my program is simple enough, I'll probably be one
of those people.

However, I am not aware of any system on which generous limits
could not be supported, albeit with some loss of performance
compared to staying within the (currently hypothetical)
hardware-supported limits.
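
The fallback idea can be sketched in C as follows.  This is only an
illustration: the hardware limits (8-bit context, 16-bit tag) and all
names are hypothetical, since no vendor has stated any such limits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hardware-supported field widths; purely illustrative. */
#define HW_CONTEXT_BITS 8
#define HW_TAG_BITS     16

typedef enum { PATH_FAST_HW, PATH_SLOW_SW } match_path;

/* Choose the matching path for one message.  Values that fit the
 * hardware limits take the fast path; anything larger falls back to a
 * slower software path instead of being rejected. */
match_path choose_path(uint32_t context, uint32_t tag)
{
    int fits;

    /* Shifting a 32-bit value by 32 or more would be undefined. */
    assert(HW_CONTEXT_BITS < 32 && HW_TAG_BITS < 32);

    fits = (context >> HW_CONTEXT_BITS) == 0 &&
           (tag >> HW_TAG_BITS) == 0;
    return fits ? PATH_FAST_HW : PATH_SLOW_SW;
}
```

A program that stays within the narrow limits never sees the slow
path; a program that exceeds them still runs, only more slowly.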

Everyone I know would MUCH prefer suboptimal performance 
over HAVING to rewrite applications to conform to varying and
inconsistent hard limits.

Yes, I recall the many arguments against mandating specific
limits.  But, I claim that those arguments are misdirected.
They are based on analogy to things like word length and memory
size, which I again note are subject to well established
conventions and principles.  (You can't run big programs on small
machines, and we pretty much agree about what "big" and "small"
mean.)  In the case of context and tag values, such conventions
do not exist, and a very wide range of conflicting limits have
been discussed at various times and places.

I believe that we will not meet our goal of portability 
if we do not specify usable limits on tag and context values.

--Rik

----------------------------------------------------------------------
rj_littlefield@pnl.gov (alias 'd39135')   Rik Littlefield
Tel: 509-375-3927                         Pacific Northwest Lab, MS K1-87
Fax: 509-375-6631                         P.O.Box 999, Richland, WA  99352
From owner-mpi-collcomm@CS.UTK.EDU  Mon Apr 12 13:58:09 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA19594; Mon, 12 Apr 93 13:58:09 -0400
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA24464; Mon, 12 Apr 93 13:57:40 -0400
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Mon, 12 Apr 1993 13:57:38 EDT
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from pnlg.pnl.gov by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA24426; Mon, 12 Apr 93 13:56:31 -0400
Received: from carbon.pnl.gov (130.20.188.38) by pnlg.pnl.gov; Mon, 12 Apr 93
 10:42 PST
Received: from sodium.pnl.gov by carbon.pnl.gov (4.1/SMI-4.1) id AA11608; Mon,
 12 Apr 93 10:40:32 PDT
Received: by sodium.pnl.gov (4.1/SMI-4.0) id AA24711; Mon, 12 Apr 93 10:40:28
 PDT
Date: Mon, 12 Apr 93 10:40:28 PDT
From: rj_littlefield@pnlg.pnl.gov
Subject: contexts examples/problems 1-3
To: jwf@parasoft.com, lyndon@epcc.ed.ac.uk, mpi-collcomm@cs.utk.edu,
        mpi-context@cs.utk.edu, mpsears@cs.sandia.gov, snir@watson.ibm.com,
        tony@cs.msstate.edu
Cc: d39135@carbon.pnl.gov
Message-Id: <9304121740.AA24711@sodium.pnl.gov>
X-Envelope-To: mpi-context@cs.utk.edu, mpi-collcomm@cs.utk.edu

Folks,

As Tony Skjellum noted, I am organizing a set of test cases & issues
to be addressed by the various context proposals.  I have formulated
these as a set of "problems" such as might be found on an essay test.

Here are draft versions of the first three "problem statements"
for the context proposals.

I anticipate that at least one more problem will be submitted.

Please tell me about defects and inadequacies in these problems.

If you have a favorite concern, now is the time to get it
reflected in the problem set.

Thanks,
--Rik Littlefield


BACKGROUND INFO

. Be sure that your point-to-point and group/context control calls
  are specified elsewhere in your proposal.

PROBLEM 1 (simple):

. Specify your calling sequence for an MPI circular-shift routine
  that operates on a contiguous buffer of double precision float
  values.

  E.g. you might specify

    MPI_CSHIFTB (inbuf,outbuf,datatype,len,group,shift)

    where  IN inbuf        input buffer
           OUT outbuf      output buffer
           IN datatype     symbolic constant MPI_DOUBLE
           IN len          length of inbuf (# of elements)
           IN group        handle to group descriptor
           IN shift        number of processes to shift
           
. Assume that a user desires to write a new collective
  communication routine with the same calling sequence as cshift,
  but with different semantics.  

  To be definite, this routine exchanges data in the pattern needed
  for one stage in a butterfly.  I.e., the process of rank i exchanges
  data with the process of rank i+shift*(1-2*((i%(2*shift))/shift)).

  Call this routine bflyexchange.
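
The partner computation alone can be written in C as below.  This is a
sketch of the rank arithmetic only, not an answer to the problem; the
function name bfly_partner is made up, and ranks are assumed to be
numbered from 0 with integer division truncating.

```c
#include <assert.h>

/* Partner of rank i in one butterfly stage with the given shift. */
int bfly_partner(int i, int shift)
{
    assert(i >= 0 && shift > 0);

    /* (i % (2*shift)) / shift is 0 in the lower half of each block of
     * 2*shift ranks and 1 in the upper half, so the factor below is
     * +1 (partner is shift ranks above) or -1 (shift ranks below). */
    return i + shift * (1 - 2 * ((i % (2 * shift)) / shift));
}
```

When shift is a power of two this is the same as flipping one bit of
the rank, i.e. i ^ shift, and applying the function twice returns i.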

. Show an implementation of bflyexchange in terms of your
  point-to-point and group/context control calls.

. Specify the conditions necessary to ensure correct operation of
  this implementation.

  E.g., you might say "safe under all conditions", "safe if and
  only if no other routine issues wildcard receives in the same
  group/context", "safe if and only if context and tag are
  unique", or something like that.

  Making these conditions simple and broad is good.  
  Getting caught stating conditions that are too broad is bad.

. Discuss the performance of this implementation.  

  Note that the semantics of bflyexchange require only a single send
  and receive per process.  Explain how this level of performance can
  be achieved or approached by your implementation.  

  If you assert that group control operations can be done without
  communications, explain how this works and what implications it has
  on other system parameters, e.g., the number and range of context
  values.


PROBLEM 2 (medium)

. Write a "guidelines for library developers and users" document
  that explains how to write and call libraries in order to maintain
  message-passing isolation between the various libraries and
  between the libraries and the user program.  Be sure to explain
  how to achieve good efficiency.

  Be complete, but brief.

  (Long explanations can be interpreted as indicating a complex
  design.)

  You may wish to describe two or more self-consistent strategies,
  along the lines of Lyndon's "ClassA" and "ClassB" libraries as
  discussed earlier on mpi-context.


PROBLEM 3 (hard?)

This problem is paraphrased from one posed by Jon Flower.  The task
is to simulate the host-node programming model through the use of
"host" and "node" groups.  This is interesting both for backward-
compatibility and for its inter-group communication requirements.

As stated by Jon, this problem really spans subcommittees.  For
the sake of the present discussion, I have reformulated it in
terms of an SPMD programming model in which a black-box function
is used to tell each process whether it's the host or a node.
Note in particular that nodes don't know the id of the host.

Here is pseudo-code for the desired program:

    main()
    {
      if (I_am_the_host())
        host ();
      else
        node ();
    }

    host()
    {
    /*
     * Form two groups containing:
     *     i)  only the host process.
     *     ii) the node processes.
     */
	host_group = mpi_...;
	node_group = mpi_...;
    /*
     * Broadcast from host to all nodes; using "ALL" group.
     * (It would be nice to have inter-group broadcast for
     *  this since that is more like "current practice".)
     */
	myrank = mpi_...;
	mpi_bcast( ...., myrank, MPI_GROUP_ALL, ...);
    /*
     * Send individual message to each node in turn.
     */
	for(node=0; node < MPI_ORDER(node_group); node++) {
	    mpi_send( ..., (node_group, node), ...);
	}
    /*
     * Receive result from node 0.
     */
	mpi_recv( ..., (node_group, 0), ...);
    }

    node()
    {
    /*
     * Form two groups containing:
     *     i)  only the host process.
     *     ii) the node processes.
     */
	host_group = mpi_...;
	node_group = mpi_...;
    /*
     * Receive bcast from host using ALL group.
     */
	host_rank = mpi_...;
	mpi_bcast(..., host_rank, MPI_GROUP_ALL, ...);
    /*
     * Receive single message from host.
     */
	mpi_recv(..., 0, host_group, ...);
    /*
     * Send point-to-point messages in node group.
     */
        myrank = mpi_... (node_group);
        nnodes = mpi_... (node_group);
        sendhandle = mpi_isend( ..., 
         (node_group,(myrank+1)%nnodes), ...);
        mpi_recv ( ...,
         (node_group,(myrank-1+nnodes)%nnodes), ...);
        mpi_complete (sendhandle);
    /*
     * Compute global sum in nodes only.
     */
	mpi_reduce(...  , node_group, MPI_SUM_OP, ...);
    /*
     * Node 0 sends sum to host.
     */
	 if(myrank == 0) mpi_send(..., 0, host_group, ...);
    }

. Show how to implement this pseudo-code using your point-to-point
  and group calls.  Note that this code wants to think of node
  processes in terms of their rank in the node_group, not the
  ALL group.  Be sure to show all details of any translations
  that are required.

. Discuss how the collective comms and point-to-point messages
  are kept separate, even if the point-to-point calls are
  changed to use wildcards.
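
As an illustration of the kind of translation meant here, the C
sketch below converts between a rank in node_group and a rank in the
ALL group.  It rests on an assumption that the problem does not state:
that node_group keeps the ALL-group ordering with the host simply
removed.  The function names are hypothetical.

```c
#include <assert.h>

/* Rank in the ALL group of the process with rank node_rank in
 * node_group, given the host's rank in the ALL group. */
int node_to_all(int node_rank, int host_rank)
{
    assert(node_rank >= 0 && host_rank >= 0);
    /* Nodes below the host keep their rank; the rest move up by one. */
    return node_rank < host_rank ? node_rank : node_rank + 1;
}

/* Inverse translation; all_rank must not be the host itself. */
int all_to_node(int all_rank, int host_rank)
{
    assert(all_rank >= 0 && all_rank != host_rank);
    return all_rank < host_rank ? all_rank : all_rank - 1;
}
```

A proposal in which groups may be ordered arbitrarily would need an
explicit table in place of this arithmetic.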

----------------------------------------------------------------------
rj_littlefield@pnl.gov (alias 'd39135')   Rik Littlefield
Tel: 509-375-3927                         Pacific Northwest Lab, MS K1-87
Fax: 509-375-6631                         P.O.Box 999, Richland, WA  99352
From owner-mpi-collcomm@CS.UTK.EDU  Mon Apr 12 17:55:01 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA24478; Mon, 12 Apr 93 17:55:01 -0400
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA10963; Mon, 12 Apr 93 17:54:05 -0400
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Mon, 12 Apr 1993 17:54:04 EDT
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from Aurora.CS.MsState.Edu by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA10955; Mon, 12 Apr 93 17:54:02 -0400
Received:  by Aurora.CS.MsState.Edu (4.1/6.0s-FWP);
	   id AA13925; Mon, 12 Apr 93 16:52:04 CDT
Date: Mon, 12 Apr 93 16:52:04 CDT
From: Tony Skjellum <tony@Aurora.CS.MsState.Edu>
Message-Id: <9304122152.AA13925@Aurora.CS.MsState.Edu>
To: tony@aurora@cs.msstate.edu, mpsears@newton.cs.sandia.gov
Subject: Re: the gathering
Cc: mpi-context@cs.utk.edu, mpi-collcomm@cs.utk.edu

Mark,

You should explain what your model implies about the starting of
processes.  If you assume that processes have been started by MPI, that
is OK (generally a tacit assumption of MPI1), but in any event you
should tell us what the process is told at the moment of spawning (eg,
about ALL groups, or its name, etc), that will help it become part of
MPI-based communication.  We need to see how "safe"/"unsafe" it will
be to start MPI in every model.  If it is extremely difficult/simple
to get from the "just-spawned" state to the "MPI-up-and-running" state,
that should be made clear.

I am happy to answer more questions!  Please shoot away.

- Tony
PS Because this is of general interest to all readers, I am echoing to
	the reflector.  I hope that is OK with you.

----- Begin Included Message -----

From mpsears@newton.cs.sandia.gov Mon Apr 12 15:08:18 1993
To: tony@aurora@cs.msstate.edu
Subject: Re: the gathering
Date: Mon, 12 Apr 93 14:10:36 MST
From: mpsears@newton.cs.sandia.gov
Content-Length: 243


Tony, I need a little clarification of what you mean by

"Include discussion of how starting works and what the spawning
semantics must provide them (or through an initial message)
so that they can work."

Starting and spawning what?

mark




----- End Included Message -----

From owner-mpi-collcomm@CS.UTK.EDU  Mon Apr 19 09:47:57 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA01603; Mon, 19 Apr 93 09:47:57 -0400
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA09663; Mon, 19 Apr 93 09:47:31 -0400
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Mon, 19 Apr 1993 09:47:30 EDT
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from daedalus.epcc.ed.ac.uk by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA09302; Mon, 19 Apr 93 09:45:46 -0400
Date: Mon, 19 Apr 93 14:45:01 BST
Message-Id: <3994.9304191345@subnode.epcc.ed.ac.uk>
From: L J Clarke <lyndon@epcc.ed.ac.uk>
Subject: operation modes
To: mpi-pt2pt@cs.utk.edu, mpi-collcomm@cs.utk.edu
Reply-To: lyndon@epcc.ed.ac.uk

Dear mpi-pt2pt and mpi-collcomm colleagues,

I'm sending this to both subcommittees.  There is a section for pt2pt
and a section for collcomm, however these sections deal with a subject
which probably should be consistent across both subcommittees, hence I
send to both. 

                           o----------o
pt2pt
-----

Here is (yet another) suggestion which if adopted would help to reduce
the multiplicity of send calls.  In particular the multiplicity derived
from the three extant communication modes REGULAR (STANDARD?), READY and
SECURE (SYNCHRONOUS?). 

Observe that the send call in the case of each mode has the same syntax
class, unlike the multiplicity derived from data buffer nature.  The
suggestion is to have one send procedure which accepts a MODE argument
describing the communication mode, i.e.  is one of: REGULAR (STANDARD?);
READY; SECURE (SYNCHRONOUS?). 

This lets the MPI user make either local code decisions about which mode
is appropriate, by using the above names, or global code decisions by
use of #define in C and use of PARAMETER in Fortran (for example).
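
In C the suggestion might look like the sketch below.  The type and
function names are hypothetical, and make_send only records the
arguments; it stands in for a real send and says nothing about how
each mode would be implemented.

```c
#include <assert.h>

/* The three communication modes, passed as an ordinary argument. */
typedef enum { MODE_REGULAR, MODE_READY, MODE_SYNCHRONOUS } send_mode;

/* Global code decision: edit this one line to change the mode of
 * every send that is written in terms of APP_SEND_MODE. */
#define APP_SEND_MODE MODE_REGULAR

typedef struct {
    int       dest;
    int       tag;
    send_mode mode;
} send_request;

/* Stand-in for the single send procedure: one entry point for all
 * modes instead of one procedure per mode. */
send_request make_send(int dest, int tag, send_mode mode)
{
    send_request r;

    assert(mode == MODE_REGULAR || mode == MODE_READY ||
           mode == MODE_SYNCHRONOUS);
    r.dest = dest;
    r.tag  = tag;
    r.mode = mode;
    return r;
}
```

A local decision names the mode at the call site, for example
make_send(1, 7, MODE_READY); a global decision writes
make_send(1, 7, APP_SEND_MODE) everywhere and changes the #define.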

I also suggest that we say SYNCHRONOUS rather than SECURE, so as not to
give the impression that REGULAR (rather than STANDARD), is always not
secure, since it may be secure some of the time. 

I propose to the pt2pt subcommittee the suggestions made here.

                           o----------o

collcomm
--------

There is a class of collcomm procedures which we see may or may not
barrier synchronise the calling group.  The suggestion at the last
meeting was that users have to write code which allows such procedures
to barrier whereas they may not. 

The suggestion here is that those procedures which are not implicitly
barrier synchronising accept a MODE argument which determines whether
they certainly barrier synchronise, or whether they may or may not
barrier synchronize depending on the implementation.  This mode argument
is one of: REGULAR; SYNCHRONOUS.  Obviously I suggest that SYNCHRONOUS
is the mode which forces barrier synchronisation of the group. 

This is consistent with the pt2pt suggestion above, except that READY is
not a collcomm mode, and again lets the MPI user make either local code
decisions about which mode is appropriate, by using the above names, or
global code decisions by use of #define in C and use of PARAMETER in
Fortran (for example). 

I propose to the collcomm subcommittee the suggestions made here.

                           o----------o

Comments, questions, (flames :-) please.

Best Wishes
Lyndon

         /--------------------------------------------------------\
    e||) | Lyndon J Clarke    Edinburgh Parallel Computing Centre | e||) 
    c||c | Tel: 031 650 5021  Email: lyndon@epcc.edinburgh.ac.uk  | c||c 
         \--------------------------------------------------------/


From owner-mpi-collcomm@CS.UTK.EDU  Tue Apr 20 13:27:37 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA03009; Tue, 20 Apr 93 13:27:37 -0400
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA18519; Tue, 20 Apr 93 13:26:45 -0400
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Tue, 20 Apr 1993 13:26:44 EDT
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from daedalus.epcc.ed.ac.uk by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA18365; Tue, 20 Apr 93 13:24:25 -0400
Date: Tue, 20 Apr 93 18:23:45 BST
Message-Id: <4968.9304201723@subnode.epcc.ed.ac.uk>
From: L J Clarke <lyndon@epcc.ed.ac.uk>
To: mpi-context@cs.utk.edu
Reply-To: lyndon@epcc.ed.ac.uk
Cc: mpi-collcomm@cs.utk.edu

Subject: mpi-context; intercommunication etc (long)

Dear mpi-context colleagues

I  previously  wrote  regarding  context  management  and  binding  of
contexts for intracommunication in the  letter 
[Subject:  mpi-context: context management and  group binding (long)]  
and  sent  out a  short correction in the letter 
[Subject: mpi-context: CORRECTION to previous message] 
to which I draw your attention.

In this letter I wish to briefly revisit and recap the above subjects,
then move on to  briefly discuss and  make  a  concrete suggestion for
intercommunication.   This is a long letter. Probably best to print it
and read over a coffee.

I really must clarify  the  nature of the context which I am assuming.
In  this letter contexts are assumed to be global in the sense that if
a process P  creates a context C, it can send C to another  process Q,
and Q can both send and  receive  messages of context C.  This is  the
model  adopted  by Zipcode, which  I view as the  exemplar of existing
practice regarding message context.

Regarding intracommunication I hope  to slightly simplify the  content
of my suggestion compared to the letters referred to above.  Regarding
intercommunication the particular suggestion I  make is  motivated  by
conformity  with intracommunication both in the point-to-point  syntax
class, and in the content of the message envelope.

			o--------------------o

1. Communicator and Communication
=================================

Communicator   objects   provide    point-to-point    and   collective
communication in MPI.  A communicator object is a binding of a message
context   and   one  or   more  process  worlds.   Two  subclasses  of
communicator   object   are  defined  below,   intracommunicator   and
intercommunicator.   Communicator  objects are identified  by  process
local object identifiers.

1.1 Construction, Destruction and Information
---------------------------------------------

MPI  provides subclass specific  communicator  constructors  described
below. MPI provides a subclass generic communicator object  destructor
procedure.

     mpi_delete_communicator(id)

id           is identifier of a communicator

purpose      deletes the communicator object identified by id
             See Note 1) under intracommunicator construction
             and Note 1) under intercommunicator construction

Notes:

1) This procedure could be replaced with MPI_FREE if we wish to fit in
with the manipulation of communication handles and buffer descriptor
handles described in the point-to-point chapter.

MPI provides a subclass generic procedure which returns the context
identifier of a communicator object.

context = mpi_communicator_context(id)

context   is the context bound to the communicator
id        is the identifier of a communicator
          See Note 1) under intracommunicator construction
          and Note 1) under intercommunicator construction

purpose   informs the caller of the context bound to a communicator

1.2 Discussion
--------------

2. Intracommunicator and Intracommunication
===========================================

Intracommunicator objects provide point-to-point communication between
processes of the same process world in MPI.  Intracommunicator objects
also provide collective communication in MPI.

2.1 Construction and Information
--------------------------------

MPI provides a subclass intracommunicator constructor.

id = mpi_create_intracommunicator(context, world)

id           is identifier of created communicator
context      is message context for communications
world        is process world of receiver and sender in both send and recv

purpose      creates an intracommunicator object

Notes:

1) The context of an intracommunicator is either an actual context  or
the null  context (MPI_NULL). If the context is an actual context then
the call does not  synchronise processes in  the process world of  the
intracommunicator.  If  the context is the null  context then the call
synchronises  the process world of  the  communicator  and  creates  a
context for the communicator. In this case the context is deleted when
the communicator is  itself deleted  calling  mpi_delete_communicator,
and that call  will synchronise the  process world.  In this  case the
information procedure mpi_communicator_context will return MPI_NULL to
the caller --- the  caller is  not  allowed to  have knowledge  of the
context created.

2) The  process  world  of  an intracommunicator  object is either  an
actual process group or the null group (MPI_NULL). If the world is  an
actual  process  group then  the world  is  understood to contain  all
processes  composing the process  group  and  the  communicator object
identifies processes in  a relative  sense, i.e. as a  rank within the
process group.   If the world is the  null  group  then  the  world is
understood to  contain  all  processes  composing the program  and the
communicator  object identifies  processes in an absolute  sense, i.e.
as a process identifier.
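
Notes 1) and 2) can be pictured with the following C sketch.  It is a
single-process model only: the struct layout, the NULL_ID constant
standing in for MPI_NULL, and the counter used to invent private
contexts are all assumptions, and the synchronisation a real
constructor would perform is reduced to a comment.

```c
#include <assert.h>
#include <stddef.h>

#define NULL_ID (-1)            /* stands in for MPI_NULL */

typedef struct {
    int context;        /* message context bound to the communicator */
    int world;          /* group identifier, or NULL_ID for all processes */
    int owns_context;   /* 1 if the constructor created the context */
} intracomm;

static int next_private_context = 1000;   /* hypothetical allocator */

intracomm create_intracomm(int context, int world)
{
    intracomm c;

    c.world = world;
    if (context == NULL_ID) {
        /* Null context: a real call would synchronise the world here
         * and agree on a fresh context for the communicator. */
        c.context = next_private_context++;
        c.owns_context = 1;
    } else {
        /* Actual context: no synchronisation is needed. */
        c.context = context;
        c.owns_context = 0;
    }
    return c;
}

/* The caller may not learn a context that the constructor created. */
int communicator_context(const intracomm *c)
{
    assert(c != NULL);
    return c->owns_context ? NULL_ID : c->context;
}

/* 1 if processes are named by rank in a group, 0 if by absolute pid. */
int addresses_by_rank(const intracomm *c)
{
    assert(c != NULL);
    return c->world != NULL_ID;
}
```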

MPI  provides  a subclass  information  procedure  which  returns  the
identifier of the world of the intracommunicator.

world = mpi_intracommunicator_world(id)

world        is process world of the communicator
id           is identifier of created communicator

purpose      returns the world identifier of the intracommunicator, 
             either an actual group identifier or the null group
             identifier (MPI_NULL)

2.2 Point-to-point
------------------

I deal with generic "send" and "recv" separately, and  can  ignore the
multiple flavours thereof.

send(id, process, label, ...)

id           is identifier of intracommunicator object
process      is identifier of receiver in world of object
label        is message tag in context of object        

recv(id, process, label, ...)

id           is identifier of intracommunicator object,
                and cannot be wildcard
process      is identifier of sender in world of object, 
                and can be wildcard
label        is message tag in context of object, 
                and can be wildcard

Notes:

1) The caller  must be  in the  world of  the intracommunicator,  i.e.
either  it is the  null process  group  or an  actual process group of
which the caller is a member.

2.3 Collective
--------------

I  deal  with a  generic collective "operation",  and  can ignore  the
multiple flavours thereof.


operation(id, ...)

id           is identifier of intracommunicator object

Notes:

1) The intracommunicator must have a world which  is an actual process
group of which the caller is a member.

2.4 Envelope
------------

The message envelope for intracommunication consists of:

* sender identifier within process world of communicator (pid or rank)
* receiver routing (implementation defined)
* message context of communicator
* message tag
* message length   (implementation defined)

The sender and  receiver must  bind  the  context to  the same process
world in an intracommunicator, thus the world is determinable.
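
A C sketch of this envelope and of the matching rule implied by the
recv arguments in section 2.2 follows.  The field types, the WILDCARD
value, and the function name are assumptions; the implementation
defined receiver routing is left out.

```c
#include <assert.h>
#include <stddef.h>

#define WILDCARD (-1)

typedef struct {
    int    sender;    /* pid or rank within the communicator's world */
    int    context;   /* message context of the communicator */
    int    tag;       /* message tag */
    size_t length;    /* message length */
} envelope;

/* Does envelope e satisfy recv(id, process, label)?  The context
 * comes from the communicator id and can never be a wildcard; the
 * process and label arguments can. */
int envelope_matches(const envelope *e, int context, int process, int label)
{
    assert(e != NULL && context != WILDCARD);
    return e->context == context &&
           (process == WILDCARD || e->sender == process) &&
           (label == WILDCARD || e->tag == label);
}
```

Because the context is always compared exactly, a wildcard receive in
one communicator cannot pick up a message sent in another.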

2.5 Discussion
--------------

The facilities for intracommunication, coupled with the context model,
provide a  convenient and powerful interface  for communications which
are  closed  within  the   scope  of  a  group  and   for  the  serial
client-server model.

The ability to create an  intracommunicator without synchronisation of
processes  simplifies the construction  of  libraries  in highly  MIMD
programs, and  can be  used  to  advantage  in  conjunction  with  the
association and location facilities described below.

3. Association, Dissociation, Location, Passivation and Activation
==================================================================

3.1 Association, Dissociation and Location 
------------------------------------------

These facilities allow the user to bind names to process, group, and
context objects.

     mpi_associate(name, id)

name     is a string which is the name bound to the given object 
id       is the object identifier (process, group or context)

purpose  associates name with object identified by id


id = mpi_locate(name, wait)

id       is the object identifier (process, group or context)
name     is a string which is the name bound to the given object
wait     is a boolean value determining whether the caller waits for
         the name to become associated with an object of given class

purpose  creates a copy of the object associated with name

     mpi_dissociate(id)

id       is the object identifier (process, group or context)

purpose  removes the association of name with object id, and can only
         be performed by the process which previously associated name.

Notes:

1) These facilities are a name service. This could be implemented by a
name server process  which can run on  a host or login  node, and need
not consume expensive numerical computation resources.
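
To make the intended behaviour concrete, here is a C sketch of the
table such a name server might keep.  Everything in it is an
assumption made for illustration: the ns_ function names, the explicit
owner and caller arguments, the fixed table size, and the restriction
to a non-waiting locate.

```c
#include <assert.h>
#include <string.h>

#define MAX_NAMES 16
#define NAME_LEN  32
#define NO_ID     (-1)

typedef struct {
    char name[NAME_LEN];
    int  id;       /* object identifier bound to the name */
    int  owner;    /* process that made the association */
    int  used;
} name_entry;

static name_entry ns_table[MAX_NAMES];

/* Bind name to id on behalf of owner.  Returns 0 on success, -1 if the
 * name is too long, already bound, or the table is full. */
int ns_associate(const char *name, int id, int owner)
{
    int i, free_slot = -1;

    assert(name != NULL);
    if (strlen(name) >= NAME_LEN)
        return -1;
    for (i = 0; i < MAX_NAMES; i++) {
        if (ns_table[i].used && strcmp(ns_table[i].name, name) == 0)
            return -1;
        if (!ns_table[i].used && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return -1;
    strcpy(ns_table[free_slot].name, name);
    ns_table[free_slot].id = id;
    ns_table[free_slot].owner = owner;
    ns_table[free_slot].used = 1;
    return 0;
}

/* Non-waiting locate: returns the bound id, or NO_ID if unbound. */
int ns_locate(const char *name)
{
    int i;

    assert(name != NULL);
    for (i = 0; i < MAX_NAMES; i++)
        if (ns_table[i].used && strcmp(ns_table[i].name, name) == 0)
            return ns_table[i].id;
    return NO_ID;
}

/* Only the process that made the association may remove it. */
int ns_dissociate(int id, int caller)
{
    int i;

    for (i = 0; i < MAX_NAMES; i++) {
        if (ns_table[i].used && ns_table[i].id == id) {
            if (ns_table[i].owner != caller)
                return -1;
            ns_table[i].used = 0;
            return 0;
        }
    }
    return -1;
}
```

A waiting locate would block or retry until ns_associate has been
called for the name; that part depends on the transport and is
omitted.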

3.2 Passivation and Activation 
------------------------------

These  facilities  allow  the  user  to  transmit  process,  group  and
context objects.   Passivation and  activation  produce  a  "portable"
description  of the object in  a  memory buffer  (conventionally these
operations produce  a description in a file, but a  memory  buffer  is
more convenient for transmission in a message :-).

     mpi_passivate(id, buf, len)

id       is the object identifier (process, group or context)
buf      is an array of character
len      is the length of the array buf

purpose  writes a portable description of object identified by id in
         the memory buffer buf

id = mpi_activate(buf, len)

id       is the object identifier (process, group or context)
buf      is an array of character
len      is the length of the array buf

purpose  reads a portable description of an object and creates a copy
         of the object

Notes:

1) The detailed type  of the  memory buffer is not of great importance
provided  that  we define that type.  I have used character above,  we
could choose integer, for example.
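
One way to obtain a portable description is plain decimal text, as in
the C sketch below for a group object.  The group layout and the
function names are assumptions, and the signatures differ from those
above: the sketch works on a struct rather than an object identifier,
and it returns status codes so that a short buffer can be reported.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_MEMBERS 8

typedef struct {
    int size;
    int pids[MAX_MEMBERS];
} group;

/* Write "size pid0 pid1 ..." into buf as NUL-terminated text.
 * Returns the number of characters written, or -1 if buf is too
 * small.  Text avoids byte-order and word-size differences. */
int group_passivate(const group *g, char *buf, int len)
{
    int used, i, n;

    assert(g != NULL && buf != NULL && len > 0);
    assert(g->size >= 0 && g->size <= MAX_MEMBERS);

    n = snprintf(buf, (size_t)len, "%d", g->size);
    if (n < 0 || n >= len)
        return -1;
    used = n;
    for (i = 0; i < g->size; i++) {
        n = snprintf(buf + used, (size_t)(len - used), " %d", g->pids[i]);
        if (n < 0 || n >= len - used)
            return -1;
        used += n;
    }
    return used;
}

/* Rebuild a group from the text form.  Returns 0 on success, -1 if
 * the description is malformed. */
int group_activate(const char *buf, group *out)
{
    char *end;
    long v;
    int i;

    assert(buf != NULL && out != NULL);
    v = strtol(buf, &end, 10);
    if (end == buf || v < 0 || v > MAX_MEMBERS)
        return -1;
    out->size = (int)v;
    for (i = 0; i < out->size; i++) {
        buf = end;
        v = strtol(buf, &end, 10);
        if (end == buf)
            return -1;
        out->pids[i] = (int)v;
    }
    return 0;
}
```

The buffer filled by group_passivate can be sent as an ordinary
character message and handed to group_activate on the receiving side.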

3.3 Discussion
--------------

I  have assumed  that  MPI  can  distinguish the class  of  the object
(process, group  or  context)  given the object  identifier.   If this
cannot be the case  then we can describe a different set of procedures
for each class or we can add a class argument to the above procedures.

The  name association and location  service is the most manageable way
of  describing  which   groups   communicate  with  one  another.  The
passivation activation facilities are potentially a building block  in
the implementation of the name association and location service.

Deletion  of objects created  by  activation or  location  should only
delete the process local copy of the object.  It should not delete the
original copy. 

When location and activation "create" an object and the object already
exists within the calling process, a  new object should not be created
and the id of the existing object should be returned.  This means that
such  objects  have  multiple references,  so  we  should  define  the
destructors in  terms of deleting  references to  objects, leaving the
implementation to delete the object when there are zero references.
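
The reference-counting rule can be sketched in C as below.  The table,
the notion of a global key that identifies the original object, and
the function names are all assumptions made for the example.

```c
#include <assert.h>

#define MAX_OBJECTS 16
#define NO_ID       (-1)

typedef struct {
    int global_key;   /* identifies the original object */
    int refs;         /* number of local references; 0 means slot free */
} local_object;

static local_object objects[MAX_OBJECTS];

/* Called by location or activation.  If a local copy of the object
 * already exists, its id is returned and its count is raised;
 * otherwise a new local copy is created with one reference. */
int object_acquire(int global_key)
{
    int i, free_slot = NO_ID;

    for (i = 0; i < MAX_OBJECTS; i++) {
        if (objects[i].refs > 0 && objects[i].global_key == global_key) {
            objects[i].refs++;
            return i;
        }
        if (objects[i].refs == 0 && free_slot == NO_ID)
            free_slot = i;
    }
    if (free_slot == NO_ID)
        return NO_ID;
    objects[free_slot].global_key = global_key;
    objects[free_slot].refs = 1;
    return free_slot;
}

/* Destructor: drops one reference and returns the number remaining.
 * The local copy is freed when the count reaches zero; the original
 * object held by another process is never touched. */
int object_release(int id)
{
    assert(id >= 0 && id < MAX_OBJECTS && objects[id].refs > 0);
    return --objects[id].refs;
}
```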

4. Intercommunicator and Intercommunication
===========================================

Intercommunicator objects provide point-to-point communication between
processes  of  different  process  worlds  in  MPI.  Intercommunicator
objects do not provide collective communication in MPI (yet :-).

4.1 Construction
----------------

id = mpi_create_intercommunicator(context, local_world, remote_world)

id           is identifier of created communicator
context      is message context for communications
local_world  is process world of sender in send and receiver in recv
remote_world is process world of receiver in send and sender in recv

purpose      creates an intercommunicator object

Notes:

1)  The  context  can  be  an  actual  context  or  the  null  context
(MPI_NULL).  If  the context is an actual  context then the  call does
not  synchronise  processes  within  the  two  process  worlds of  the
communicator.  If  the  context  is the  null  context then  the  call
synchronises the two process worlds of  the communicator and creates a
context for the communicator. In this case the context is deleted when
the communicator  is  itself deleted calling  mpi_delete_communicator,
and that call  will synchronise the  process world.   In this case the
information procedure mpi_communicator_context will return MPI_NULL to
the caller ---  the caller is not  allowed to have  knowledge  of  the
context created.

2)  Each process world of  an  intercommunicator object  is  either an
actual process  group or the null group (MPI_NULL). If the world is an
actual process group  then  the world  is  understood to  contain  all
processes  composing  the  process  group and  the communicator object
identifies processes in that world in a relative sense, i.e. as a rank
within the process group.  If  the world is  the null  group  then the
world is understood to contain all processes composing the program and
the communicator  object  identifies  processes  in  that  world in an
absolute sense,  i.e.   as a  global process identifier.  

MPI  provides  subclass  information  procedures  which  return  the
identifier of the local_world and remote_world of the intercommunicator.

world = mpi_intercommunicator_local_world(id)

world        is local process world of the communicator
id           is identifier of created communicator

purpose      returns the local world identifier of the intercommunicator, 
             either an actual group identifier or the null group
             identifier (MPI_NULL)


world = mpi_intercommunicator_remote_world(id)

world        is remote process world of the communicator
id           is identifier of created communicator

purpose      returns the remote world identifier of the intercommunicator, 
             either an actual group identifier or the null group
             identifier (MPI_NULL)


4.2 Point-to-point
------------------

I deal with generic "send" and "recv" separately, and can ignore the
multiple flavours thereof.

send(id, process, label, ...)

id           is identifier of intercommunicator object
process      is identifier of receiver in remote_world of object
label        is message tag in context of object

recv(id, process, label, ...)

id           is identifier of intercommunicator object,
                and cannot be wildcard
process      is identifier of sender in remote_world of object, 
                and can be a wildcard
label        is message tag in context of object, 
                and can be a wildcard

1)  The caller must be in the local_world of the intercommunicator,
i.e. the local_world is either the null process group or an actual
process group of which the caller is a member.

4.3 Envelope
------------

The message envelope for intercommunication consists of:

* sender identifier within process world of communicator (pid or rank)
* receiver routing (implementation defined)
* message context of communicator
* message tag
* message length   (implementation defined)

The sender and receiver must bind the context to the same process
worlds in an intercommunicator, thus both the local_world and the
remote_world are determinable.

This is identical to the envelope of intracommunication.
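
To make the envelope and the wildcard rules of 4.2 concrete, here is a
minimal sketch of envelope matching on the receiving side.  It is an
illustration only: the Envelope class, the ANY marker and the matches
function are hypothetical names, and the implementation-defined
receiver routing is left out.

```python
from dataclasses import dataclass

# Hypothetical wildcard marker for the process and label arguments.
ANY = object()


@dataclass(frozen=True)
class Envelope:
    sender: int   # pid or rank within the sender's process world
    context: int  # message context of the communicator
    tag: int      # message tag ("label" in send/recv)
    length: int   # message length (implementation defined)


def matches(env, context, process=ANY, label=ANY):
    """Return True if a recv with these arguments accepts env.

    The context comes from the communicator and is never a wildcard;
    process and label may each be the wildcard ANY.
    """
    if env.context != context:
        return False
    if process is not ANY and env.sender != process:
        return False
    if label is not ANY and env.tag != label:
        return False
    return True


# Example message from sender 2 in context 5 with tag 11.
env = Envelope(sender=2, context=5, tag=11, length=64)
```

A recv in context 5 with both wildcards accepts env, while a recv in
any other context rejects it whatever the process and label.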

4.4 Discussion
--------------

The facilities for intercommunication, coupled with the context model,
and the name service, provide a convenient interface for  the parallel
client-server   model  and   parallel  modular-application   software,
provided    that   the   WAIT_ANY()   facilities   of   point-to-point
communication are fair.

The ability  to create an intercommunicator without synchronisation of
processes  simplifies  the   programming  of  parallel   client-server
software, and avoids a dependency graph problem when writing  parallel
modular-application software in which the module graph contains loops.

5. Discussion
=============

I find  it  a  wee  bit  amusing that an  intercommunicator  in  which
local_world  and  remote_world  are  the same  is  no different to  an
intracommunicator.  This  suggests to me that either  (a) there should
only  be  an   intercommunicator  class  or  (b)   we  think  of   the
intracommunicator   class   as  simply   syntactic  sugar  around  the
intercommunicator class. 

The  communicator  object  class  names   are  rather  long.   Perhaps
programmers would prefer shorter names  in programs. We could take the
approach  of  deriving  names  from  the  list  of  objects  which   a
communicator binds, for example: "intracommunicator"  becomes "CW"  as
it is a binding of a context and  a world; "intercommunicator" becomes
"CWW" as it is a binding of a context  and  a world and another world.
On the other hand we  could take  collections of letters from the long
names,  for  example: "intracommunicator"  becomes  "RACO"  or  "ACO";
"intercommunicator" becomes "ERCO" or "ECO".


			o--------------------o

Comments, questions, flames, please!

Best Wishes
Lyndon 

         /--------------------------------------------------------\
    e||) | Lyndon J Clarke    Edinburgh Parallel Computing Centre | e||) 
    c||c | Tel: 031 650 5021  Email: lyndon@epcc.edinburgh.ac.uk  | c||c 
         \--------------------------------------------------------/


From owner-mpi-collcomm@CS.UTK.EDU  Tue Apr 20 14:07:45 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA04411; Tue, 20 Apr 93 14:07:45 -0400
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA22213; Tue, 20 Apr 93 14:07:04 -0400
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Tue, 20 Apr 1993 14:07:02 EDT
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from daedalus.epcc.ed.ac.uk by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA22052; Tue, 20 Apr 93 14:06:10 -0400
Date: Tue, 20 Apr 93 19:06:06 BST
Message-Id: <5045.9304201806@subnode.epcc.ed.ac.uk>
From: L J Clarke <lyndon@epcc.ed.ac.uk>
Subject: Re: proposal -- context and tag limits
To: rj_littlefield@pnlg.pnl.gov, mpi-context@cs.utk.edu
In-Reply-To: rj_littlefield@pnlg.pnl.gov's message of Fri, 9 Apr 93 19:43:26 PDT
Reply-To: lyndon@epcc.ed.ac.uk
Cc: d39135@carbon.pnl.gov, gropp@mcs.anl.gov, mpi-collcomm@cs.utk.edu,
        mpi-envir@cs.utk.edu, mpi-pt2pt@cs.utk.edu

Rik writes:

> ============  PROPOSAL TO ENVIRONMENT COMMITTEE ==============

Yes, I support the spirit and detail of the proposal.

> Everyone I know would MUCH prefer suboptimal performance 
> over HAVING to rewrite applications to conform to varying and
> inconsistent hard limits.

Yes, this claim is true of everyone I know except for one very small
community of academic scientists who will write their relatively simple
programs from scratch for every machine on which they will do major
scientific production runs.  I know a whole lot more academics and
commercials who just will not write programs from scratch in this way. 

> Yes, I recall the many arguments against mandating specific
> limits.  But, I claim that those arguments are misdirected.

Indeed I believe that your claim is valid.

> I believe that we will not meet our goal of portability 
> if we do not specify usable limits on tag and context values.

I have the same belief.  I also believe that if we fail on portability
then we fail period. 

Best Wishes
Lyndon

         /--------------------------------------------------------\
    e||) | Lyndon J Clarke    Edinburgh Parallel Computing Centre | e||) 
    c||c | Tel: 031 650 5021  Email: lyndon@epcc.edinburgh.ac.uk  | c||c 
         \--------------------------------------------------------/


From owner-mpi-collcomm@CS.UTK.EDU  Wed Apr 21 12:39:43 1993
Received: from CS.UTK.EDU by surfer.EPM.ORNL.GOV (5.61/1.34)
	id AA02902; Wed, 21 Apr 93 12:39:43 -0400
Received: from localhost by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA03050; Wed, 21 Apr 93 12:38:34 -0400
X-Resent-To: mpi-collcomm@CS.UTK.EDU ; Wed, 21 Apr 1993 12:38:32 EDT
Errors-To: owner-mpi-collcomm@CS.UTK.EDU
Received: from almaden.ibm.com by CS.UTK.EDU with SMTP (5.61+IDA+UTK-930125/2.8s-UTK)
	id AA03027; Wed, 21 Apr 93 12:38:01 -0400
Message-Id: <9304211638.AA03027@CS.UTK.EDU>
Received: from almaden.ibm.com by almaden.ibm.com (IBM VM SMTP V2R2)
   with BSMTP id 3186; Wed, 21 Apr 93 09:38:37 PDT
Date: Wed, 21 Apr 93 09:19:33 PDT
From: "Ching-Tien (Howard) Ho" <ho@almaden.ibm.com>
To: mpi-collcomm@cs.utk.edu
Subject: The CCL Common Group Structures paper by Bruck et al.

Hi,
  Here is another recent related paper by us (A Proposal for Common
Group Structures in a Collective Communication Library),
which I have distributed to some people on various occasions.
It has appeared as IBM RJ 9241, March 1993.

In this paper, we tried NOT to change the semantics of process groups from
their original definition: an ordered set of processes.  Also, the assumption
for our case was to IGNORE the machine topology altogether, for various
reasons.  Under these two assumptions, the process topology was treated in
an implicit way (say, based on creating a subgroup and performing a +1/-1
shift within a subgroup).  That is, we mainly provide a set of macros which
conveniently create various subgroups from a specified group, based on the
commonly used algorithm structures imposed on the group by the user.
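
The implicit treatment described above might look roughly like the
sketch below.  It is not taken from the paper: the function names, the
row-major grid layout and the cyclic wrap-around of the shift are all
assumptions made for illustration.

```python
def row_subgroup(group, ncols, rank):
    """Members sharing a row with `rank` when the ordered group is
    viewed as a row-major grid with ncols columns."""
    row = rank // ncols
    return group[row * ncols:(row + 1) * ncols]


def col_subgroup(group, ncols, rank):
    """Members sharing a column with `rank` in the same grid view."""
    return group[rank % ncols::ncols]


def shift(subgroup, member, delta):
    """Neighbour of `member` at offset delta (+1 or -1) within the
    subgroup, wrapping around at the ends (an assumption)."""
    i = subgroup.index(member)
    return subgroup[(i + delta) % len(subgroup)]


# An ordered group of 12 processes, viewed as a 3 x 4 grid.
group = list(range(100, 112))
row = row_subgroup(group, 4, 5)   # [104, 105, 106, 107]
col = col_subgroup(group, 4, 5)   # [101, 105, 109]
```

The group stays a plain ordered set throughout; the grid exists only
in the way the subgroups are carved out of it.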

As usual, all comments are welcome.

Regards,

-- Howard

%!PS-Adobe-2.0
%%Creator: dvips 5.47 (RS/6000 1.0) Copyright 1986-91 Radical Eye Software
%%Title: grid.dvi
%%Pages: 24 1
%%BoundingBox: 0 0 612 792
%%EndComments
%%BeginProcSet: tex.pro
/TeXDict 250 dict def TeXDict begin /N /def load def /B{bind def}N /S /exch
load def /X{S N}B /TR /translate load N /isls false N /vsize 10 N /@rigin{
isls{[0 1 -1 0 0 0]concat}if 72 Resolution div 72 VResolution div neg scale
Resolution VResolution vsize neg mul TR matrix currentmatrix dup dup 4 get
round 4 exch put dup dup 5 get round 5 exch put setmatrix}N /@letter{/vsize 10
N}B /@landscape{/isls true N /vsize -1 N}B /@a4{/vsize 10.6929133858 N}B /@a3{
/vsize 15.5531 N}B /@ledger{/vsize 16 N}B /@legal{/vsize 13 N}B /@manualfeed{
statusdict /manualfeed true put}B /@copies{/#copies X}B /FMat[1 0 0 -1 0 0]N
/FBB[0 0 0 0]N /nn 0 N /IE 0 N /ctr 0 N /df-tail{/nn 8 dict N nn begin
/FontType 3 N /FontMatrix fntrx N /FontBBox FBB N string /base X array
/BitMaps X /BuildChar{CharBuilder} N /Encoding IE N end dup{/foo setfont}2
array copy cvx N load 0 nn put /ctr 0 N[}B /df{/sf 1 N /fntrx FMat N df-tail}
B /dfs{div /sf X /fntrx[sf 0 0 sf neg 0 0]N df-tail}B /E{pop nn dup definefont
setfont}B /ch-width{ch-data dup length 5 sub get} B /ch-height{ch-data dup
length 4 sub get} B /ch-xoff{128 ch-data dup length 3 sub get sub} B /ch-yoff{
ch-data dup length 2 sub get 127 sub} B /ch-dx{ch-data dup length 1 sub get} B
/ch-image{ch-data dup type /stringtype ne{ctr get /ctr ctr 1 add N}if}B /id 0
N /rw 0 N /rc 0 N /gp 0 N /cp 0 N /G 0 N /sf 0 N /CharBuilder{save 3 1 roll S
dup /base get 2 index get S /BitMaps get S get /ch-data X pop /ctr 0 N ch-dx 0
ch-xoff ch-yoff ch-height sub ch-xoff ch-width add ch-yoff setcachedevice
ch-width ch-height true[1 0 0 -1 -.1 ch-xoff sub ch-yoff .1 add]{ch-image}
imagemask restore}B /D{/cc X dup type /stringtype ne{]}if nn /base get cc ctr
put nn /BitMaps get S ctr S sf 1 ne{dup dup length 1 sub dup 2 index S get sf
div put}if put /ctr ctr 1 add N}B /I{cc 1 add D}B /bop{userdict /bop-hook
known{bop-hook}if /SI save N @rigin 0 0 moveto}N /eop{clear SI restore
showpage userdict /eop-hook known{eop-hook}if}N /@start{userdict /start-hook
known{start-hook}if /VResolution X /Resolution X 1000 div /DVImag X /IE 256
array N 0 1 255{IE S 1 string dup 0 3 index put cvn put} for}N /p /show load N
/RMat[1 0 0 -1 0 0]N /BDot 260 string N /rulex 0 N /ruley 0 N /v{/ruley X
/rulex X V}B /V statusdict begin /product where{pop product dup length 7 ge{0
7 getinterval(Display)eq}{pop false}ifelse}{false}ifelse end{{gsave TR -.1 -.1
TR 1 1 scale rulex ruley false RMat{BDot}imagemask grestore}}{{gsave TR -.1
-.1 TR rulex ruley scale 1 1 false RMat{BDot}imagemask grestore}}ifelse B /a{
moveto}B /delta 0 N /tail{dup /delta X 0 rmoveto}B /M{S p delta add tail}B /b{
S p tail}B /c{-4 M}B /d{-3 M}B /e{-2 M}B /f{-1 M}B /g{0 M}B /h{1 M}B /i{2 M}B
/j{3 M}B /k{4 M}B /w{0 rmoveto}B /l{p -4 w}B /m{p -3 w}B /n{p -2 w}B /o{p -1 w
}B /q{p 1 w}B /r{p 2 w}B /s{p 3 w}B /t{p 4 w}B /x{0 S rmoveto}B /y{3 2 roll p
a}B /bos{/SS save N}B /eos{clear SS restore}B end
%%EndProcSet
TeXDict begin 1000 300 300 @start /Fa 4 113 df<000FE0000FE00001E00001E00001E0
0001E00001E00001E00001E003F9E00F07E01C03E03C01E07801E07801E0F801E0F801E0F801E0
F801E0F801E07801E07801E03C01E01C03E00F0DFC03F9FC161A7F9919>100
D<1C003E003E003E003E001C0000000000000000007E007E001E001E001E001E001E001E001E00
1E001E001E001E001E001E00FF80FF80091B7F9A0D>105 D<FE1F01F000FE63C63C001E81C81C
001F01F01E001F01F01E001E01E01E001E01E01E001E01E01E001E01E01E001E01E01E001E01E0
1E001E01E01E001E01E01E001E01E01E001E01E01E00FFCFFCFFC0FFCFFCFFC022117F9025>
109 D<FE7F00FFC3C01F01E01E00F01E00F81E00781E007C1E007C1E007C1E007C1E007C1E0078
1E00F81E00F01F01E01F83C01E7F001E00001E00001E00001E00001E0000FFC000FFC00016187F
9019>112 D E /Fb 30 123 df<1C3C3C3C3C040408081020204080060E7D840E>44
D<7FF0FFE07FE00C037D8A10>I<0000600000E00000E00000E00001C00001C00001C000038000
0380000300000700000700000600000E00000C0000180000180000300000300000630000C70000
8700010700030700060E00040E00080E003F8E00607C00801FC0001C00001C0000380000380000
380000380000700000700000600013277E9D17>52 D<08E0100BF01017F8201FF8603E19C0380E
80200080600100400300800300000600000E00000C00001C00001C000038000038000070000070
0000F00000F00001E00001E00001E00003C00003C00003C00007C000078000078000030000141F
799D17>55 D<001F000061800080C00100600300600600600600600600600E00C00F00800F8180
07C30007E40003F80001F80003FC00047E00183F00300F00200700600700C00300C00300C00300
800600800600C00C00C008004030003060001F8000131F7B9D17>I<01FFF0001F00001E00001E
00001E00003C00003C00003C00003C0000780000780000780000780000F00000F00000F00000F0
0001E00001E00001E00001E00003C00003C00003C00003C0000780000780000780000780000F80
00FFF800141F7D9E12>73 D<01FFFF80001E00E0001E0070001E0038001E003C003C003C003C00
3C003C003C003C003C0078007800780078007800F0007800E000F003C000F00F0000FFFC0000F0
000001E0000001E0000001E0000001E0000003C0000003C0000003C0000003C000000780000007
80000007800000078000000F800000FFF000001E1F7D9E1F>80 D<0007E040001C18C000300580
0060038000C0038001C00180018001000380010003800100038001000380000003C0000003C000
0003F8000001FF800001FFE000007FF000001FF0000001F8000000780000007800000038000000
380020003800200038002000300060007000600060006000E0007000C000E8038000C606000081
F800001A217D9F1A>83 D<00F1800389C00707800E03801C03803C038038070078070078070078
0700F00E00F00E00F00E00F00E20F01C40F01C40703C40705C40308C800F070013147C9317>97
D<07803F8007000700070007000E000E000E000E001C001C001CF01D0C3A0E3C0E380F380F700F
700F700F700FE01EE01EE01EE01CE03CE038607060E031C01F0010207B9F15>I<007E0001C100
0300800E07801E07801C07003C0200780000780000780000F00000F00000F00000F00000F00000
70010070020030040018380007C00011147C9315>I<0000780003F80000700000700000700000
700000E00000E00000E00000E00001C00001C000F1C00389C00707800E03801C03803C03803807
00780700780700780700F00E00F00E00F00E00F00E20F01C40F01C40703C40705C40308C800F07
0015207C9F17>I<007C01C207010E011C013C013802780C7BF07C00F000F000F000F000700070
0170023804183807C010147C9315>I<00007800019C00033C00033C000718000700000700000E
00000E00000E00000E00000E0001FFE0001C00001C00001C00001C000038000038000038000038
0000380000700000700000700000700000700000700000E00000E00000E00000E00000C00001C0
0001C0000180003180007B0000F300006600003C00001629829F0E>I<003C6000E27001C1E003
80E00700E00F00E00E01C01E01C01E01C01E01C03C03803C03803C03803C03803C07003C07001C
0F001C17000C2E0003CE00000E00000E00001C00001C00301C00783800F0700060E0003F800014
1D7E9315>I<01E0000FE00001C00001C00001C00001C000038000038000038000038000070000
070000071E000763000E81800F01C00E01C00E01C01C03801C03801C03801C0380380700380700
380700380E10700E20700C20701C20700C40E00CC060070014207D9F17>I<00C001E001E001C0
00000000000000000000000000000E003300230043804300470087000E000E000E001C001C001C
003840388030807080310033001C000B1F7C9E0E>I<01E0000FE00001C00001C00001C00001C0
000380000380000380000380000700000700000703C00704200E08E00E11E00E21E00E40C01C80
001D00001E00001FC00038E000387000387000383840707080707080707080703100E03100601E
0013207D9F15>107 D<03C01FC0038003800380038007000700070007000E000E000E000E001C
001C001C001C0038003800380038007000700070007100E200E200E200E200640038000A207C9F
0C>I<1C0F80F0002630C318004740640C004780680E004700700E004700700E008E00E01C000E
00E01C000E00E01C000E00E01C001C01C038001C01C038001C01C038001C01C070803803807100
3803806100380380E10038038062007007006600300300380021147C9325>I<1C0F802630C047
40604780604700704700708E00E00E00E00E00E00E00E01C01C01C01C01C01C01C038438038838
03083807083803107003303001C016147C931A>I<007C0001C3000301800E01C01E01C01C01E0
3C01E07801E07801E07801E0F003C0F003C0F003C0F00780F00700700F00700E00301800187000
07C00013147C9317>I<01C1E002621804741C04781C04701E04701E08E01E00E01E00E01E00E0
1E01C03C01C03C01C03C01C0380380780380700380E003C1C0072380071E000700000700000E00
000E00000E00000E00001C00001C0000FFC000171D809317>I<1C1E0026610047838047878047
07804703008E00000E00000E00000E00001C00001C00001C00001C000038000038000038000038
000070000030000011147C9313>114 D<00FC030206010C030C070C060C000F800FF007F803FC
003E000E700EF00CF00CE008401020601F8010147D9313>I<018001C003800380038003800700
0700FFF007000E000E000E000E001C001C001C001C003800380038003820704070407080708031
001E000C1C7C9B0F>I<0E00C03300E02301C04381C04301C04701C08703800E03800E03800E03
801C07001C07001C07001C07101C0E20180E20180E201C1E200C264007C38014147C9318>I<03
83800CC4401068E01071E02071E02070C040E00000E00000E00000E00001C00001C00001C00001
C040638080F38080F38100E5810084C60078780013147D9315>120 D<0E00C03300E02301C043
81C04301C04701C08703800E03800E03800E03801C07001C07001C07001C07001C0E00180E0018
0E001C1E000C3C0007DC00001C00001C00003800F03800F07000E06000C0C0004380003E000013
1D7C9316>I<01C04003E08007F1800C1F00080200000400000800001000002000004000008000
0100000200000401000802001002003E0C0063FC0041F80080E00012147D9313>I
E /Fc 3 111 df<003E000C000C000C000C0018001800180018073018F0307060706060C060C0
60C06080C080C480C4C1C446C838700F177E9612>100 D<030003800300000000000000000000
0000001C002400460046008C000C0018001800180031003100320032001C0009177F960C>105
D<383C0044C6004702004602008E06000C06000C06000C0C00180C00180C401818401818803008
80300F00120E7F8D15>110 D E /Fd 46 122 df<FFFCFFFCFFFCFFFC0E047F8C13>45
D<387CFEFEFE7C3807077C8610>I<00180000780001F800FFF800FFF80001F80001F80001F800
01F80001F80001F80001F80001F80001F80001F80001F80001F80001F80001F80001F80001F800
01F80001F80001F80001F80001F80001F80001F80001F80001F8007FFFE07FFFE013207C9F1C>
49 D<03FC000FFF003C1FC07007E07C07F0FE03F0FE03F8FE03F8FE01F87C01F83803F80003F8
0003F00003F00007E00007C0000F80001F00003E0000380000700000E01801C018038018070018
0E00380FFFF01FFFF03FFFF07FFFF0FFFFF0FFFFF015207D9F1C>I<00FE0007FFC00F07E01E03
F03F03F03F81F83F81F83F81F81F03F81F03F00003F00003E00007C0001F8001FE0001FF000007
C00001F00001F80000FC0000FC3C00FE7E00FEFF00FEFF00FEFF00FEFF00FC7E01FC7801F81E07
F00FFFC001FE0017207E9F1C>I<0000E00001E00003E00003E00007E0000FE0001FE0001FE000
37E00077E000E7E001C7E00187E00307E00707E00E07E00C07E01807E03807E07007E0E007E0FF
FFFEFFFFFE0007E00007E00007E00007E00007E00007E00007E000FFFE00FFFE17207E9F1C>I<
1000201E01E01FFFC01FFF801FFF001FFE001FF8001BC00018000018000018000018000019FC00
1FFF001E0FC01807E01803E00003F00003F00003F80003F83803F87C03F8FE03F8FE03F8FC03F0
FC03F07007E03007C01C1F800FFF0003F80015207D9F1C>I<000070000000007000000000F800
000000F800000000F800000001FC00000001FC00000003FE00000003FE00000003FE00000006FF
000000067F0000000E7F8000000C3F8000000C3F800000183FC00000181FC00000381FE0000030
0FE00000300FE00000600FF000006007F00000E007F80000FFFFF80000FFFFF800018001FC0001
8001FC00038001FE00030000FE00030000FE000600007F000600007F00FFE00FFFF8FFE00FFFF8
25227EA12A>65 D<0003FE0080001FFF818000FF01E38001F8003F8003E0001F8007C0000F800F
800007801F800007803F000003803F000003807F000001807E000001807E00000180FE00000000
FE00000000FE00000000FE00000000FE00000000FE00000000FE00000000FE000000007E000000
007E000001807F000001803F000001803F000003801F800003000F8000030007C000060003F000
0C0001F800380000FF00F000001FFFC0000003FE000021227DA128>67 D<FFFFFF8000FFFFFFF0
0007F003FC0007F0007E0007F0003F0007F0001F8007F0000FC007F00007E007F00007E007F000
07F007F00003F007F00003F007F00003F007F00003F807F00003F807F00003F807F00003F807F0
0003F807F00003F807F00003F807F00003F807F00003F807F00003F007F00003F007F00003F007
F00007E007F00007E007F0000FC007F0001F8007F0003F0007F0007E0007F003FC00FFFFFFF000
FFFFFF800025227EA12B>I<FFFFFFFCFFFFFFFC07F000FC07F0003C07F0001C07F0000C07F000
0E07F0000E07F0000607F0180607F0180607F0180607F0180007F0380007F0780007FFF80007FF
F80007F0780007F0380007F0180007F0180007F0180307F0180307F0000307F0000607F0000607
F0000607F0000E07F0000E07F0001E07F0003E07F001FCFFFFFFFCFFFFFFFC20227EA125>I<00
03FE0040001FFFC0C0007F00F1C001F8003FC003F0000FC007C00007C00FC00003C01F800003C0
3F000001C03F000001C07F000000C07E000000C07E000000C0FE00000000FE00000000FE000000
00FE00000000FE00000000FE00000000FE00000000FE000FFFFC7E000FFFFC7F00001FC07F0000
1FC03F00001FC03F00001FC01F80001FC00FC0001FC007E0001FC003F0001FC001FC003FC0007F
80E7C0001FFFC3C00003FF00C026227DA12C>71 D<FFFF83FFFEFFFF83FFFE07F0001FC007F000
1FC007F0001FC007F0001FC007F0001FC007F0001FC007F0001FC007F0001FC007F0001FC007F0
001FC007F0001FC007F0001FC007F0001FC007FFFFFFC007FFFFFFC007F0001FC007F0001FC007
F0001FC007F0001FC007F0001FC007F0001FC007F0001FC007F0001FC007F0001FC007F0001FC0
07F0001FC007F0001FC007F0001FC007F0001FC007F0001FC0FFFF83FFFEFFFF83FFFE27227EA1
2C>I<FFFFE0FFFFE003F80003F80003F80003F80003F80003F80003F80003F80003F80003F800
03F80003F80003F80003F80003F80003F80003F80003F80003F80003F80003F80003F80003F800
03F80003F80003F80003F80003F80003F80003F800FFFFE0FFFFE013227FA115>I<FFFFE000FF
FFE00007F0000007F0000007F0000007F0000007F0000007F0000007F0000007F0000007F00000
07F0000007F0000007F0000007F0000007F0000007F0000007F0000007F0000007F0000007F000
0007F0001807F0001807F0001807F0001807F0003807F0003807F0007007F0007007F000F007F0
01F007F007F0FFFFFFF0FFFFFFF01D227EA122>76 D<FFF000000FFFFFF800001FFF07F800001F
E006FC000037E006FC000037E006FC000037E0067E000067E0067E000067E0063F0000C7E0063F
0000C7E0061F800187E0061F800187E0060FC00307E0060FC00307E0060FC00307E00607E00607
E00607E00607E00603F00C07E00603F00C07E00601F81807E00601F81807E00601F81807E00600
FC3007E00600FC3007E006007E6007E006007E6007E006003FC007E006003FC007E006001F8007
E006001F8007E006001F8007E006000F0007E0FFF00F00FFFFFFF00600FFFF30227EA135>I<FF
F8001FFEFFFC001FFE07FC0000C007FE0000C006FF0000C0067F8000C0063FC000C0061FE000C0
060FE000C0060FF000C00607F800C00603FC00C00601FE00C00600FE00C00600FF00C006007F80
C006003FC0C006001FE0C006000FF0C0060007F0C0060007F8C0060003FCC0060001FEC0060000
FFC00600007FC00600007FC00600003FC00600001FC00600000FC006000007C006000003C00600
0003C0FFF00001C0FFF00000C027227EA12C>I<0007FC0000003FFF800000FC07E00003F001F8
0007E000FC000FC0007E001F80003F001F80003F003F00001F803F00001F807F00001FC07E0000
0FC07E00000FC0FE00000FE0FE00000FE0FE00000FE0FE00000FE0FE00000FE0FE00000FE0FE00
000FE0FE00000FE0FE00000FE07E00000FC07F00001FC07F00001FC03F00001F803F80003F801F
80003F000FC0007E0007E000FC0003F001F80000FC07E000003FFF80000007FC000023227DA12A
>I<FFFFFF00FFFFFFE007F007F007F001FC07F000FC07F0007E07F0007E07F0007F07F0007F07
F0007F07F0007F07F0007F07F0007E07F0007E07F000FC07F001FC07F007F007FFFFE007FFFF00
07F0000007F0000007F0000007F0000007F0000007F0000007F0000007F0000007F0000007F000
0007F0000007F0000007F00000FFFF8000FFFF800020227EA126>I<FFFFFE0000FFFFFFC00007
F007F00007F001F80007F000FC0007F0007E0007F0007F0007F0007F0007F0007F0007F0007F00
07F0007F0007F0007F0007F0007E0007F000FC0007F001F80007F007F00007FFFFC00007FFFF80
0007F00FE00007F007F00007F003F80007F001FC0007F001FC0007F001FC0007F001FC0007F001
FC0007F001FC0007F001FC0007F001FC0007F001FC0607F000FE0607F000FF0CFFFF803FF8FFFF
800FF027227EA12A>82 D<01FC0407FF8C1F03FC3C007C7C003C78001C78001CF8000CF8000CFC
000CFC0000FF0000FFE0007FFF007FFFC03FFFF01FFFF80FFFFC03FFFE003FFE0003FF00007F00
003F00003FC0001FC0001FC0001FE0001EE0001EF0003CFC003CFF00F8C7FFE080FF8018227DA1
1F>I<7FFFFFFF807FFFFFFF807E03F80F807803F807807003F803806003F80180E003F801C0E0
03F801C0C003F800C0C003F800C0C003F800C0C003F800C00003F800000003F800000003F80000
0003F800000003F800000003F800000003F800000003F800000003F800000003F800000003F800
000003F800000003F800000003F800000003F800000003F800000003F800000003F800000003F8
00000003F8000003FFFFF80003FFFFF80022227EA127>I<FFFF803FFCFFFF803FFC07F0000180
07F000018007F000018007F000018007F000018007F000018007F000018007F000018007F00001
8007F000018007F000018007F000018007F000018007F000018007F000018007F000018007F000
018007F000018007F000018007F000018007F000018007F000018007F000018007F000018003F0
00030003F800030001F800060000FC000E00007E001C00003F80F800000FFFE0000001FF000026
227EA12B>I<FFFF800FFEFFFF800FFE07F80000C007F80001C003FC00018001FE00030001FE00
070000FF00060000FF000C00007F801C00003FC01800003FC03000001FE07000000FF06000000F
F0E0000007F8C0000003FD80000003FF80000001FF00000001FE00000000FE00000000FE000000
00FE00000000FE00000000FE00000000FE00000000FE00000000FE00000000FE00000000FE0000
0000FE00000000FE0000001FFFF000001FFFF00027227FA12A>89 D<07FC001FFF803F07C03F03
E03F01E03F01F01E01F00001F00001F0003FF003FDF01FC1F03F01F07E01F0FC01F0FC01F0FC01
F0FC01F07E02F07E0CF81FF87F07E03F18167E951B>97 D<FF000000FF0000001F0000001F0000
001F0000001F0000001F0000001F0000001F0000001F0000001F0000001F0000001F0000001F0F
E0001F3FF8001FF07C001F801E001F001F001F000F801F000F801F000FC01F000FC01F000FC01F
000FC01F000FC01F000FC01F000FC01F000FC01F000F801F001F801F801F001FC03E001EE07C00
1C3FF800180FC0001A237EA21F>I<00FF8007FFE00F83F01F03F03E03F07E03F07C01E07C0000
FC0000FC0000FC0000FC0000FC0000FC00007C00007E00007E00003E00301F00600FC0E007FF80
00FE0014167E9519>I<0001FE000001FE0000003E0000003E0000003E0000003E0000003E0000
003E0000003E0000003E0000003E0000003E0000003E0001FC3E0007FFBE000F81FE001F007E00
3E003E007E003E007C003E00FC003E00FC003E00FC003E00FC003E00FC003E00FC003E00FC003E
00FC003E007C003E007C003E003E007E001E00FE000F83BE0007FF3FC001FC3FC01A237EA21F>
I<00FE0007FF800F87C01E01E03E01F07C00F07C00F8FC00F8FC00F8FFFFF8FFFFF8FC0000FC00
00FC00007C00007C00007E00003E00181F00300FC07003FFC000FF0015167E951A>I<003F8000
FFC001E3E003C7E007C7E00F87E00F83C00F80000F80000F80000F80000F80000F8000FFFC00FF
FC000F80000F80000F80000F80000F80000F80000F80000F80000F80000F80000F80000F80000F
80000F80000F80000F80000F80000F80007FF8007FF80013237FA211>I<03FC1E0FFF7F1F0F8F
3E07CF3C03C07C03E07C03E07C03E07C03E07C03E03C03C03E07C01F0F801FFF0013FC00300000
3000003800003FFF801FFFF00FFFF81FFFFC3800FC70003EF0001EF0001EF0001EF0001E78003C
7C007C3F01F80FFFE001FF0018217E951C>I<FF000000FF0000001F0000001F0000001F000000
1F0000001F0000001F0000001F0000001F0000001F0000001F0000001F0000001F07E0001F1FF8
001F307C001F403C001F803E001F803E001F003E001F003E001F003E001F003E001F003E001F00
3E001F003E001F003E001F003E001F003E001F003E001F003E001F003E001F003E00FFE1FFC0FF
E1FFC01A237EA21F>I<1C003F007F007F007F003F001C000000000000000000000000000000FF
00FF001F001F001F001F001F001F001F001F001F001F001F001F001F001F001F001F001F001F00
FFE0FFE00B247EA310>I<FF000000FF0000001F0000001F0000001F0000001F0000001F000000
1F0000001F0000001F0000001F0000001F0000001F0000001F00FF801F00FF801F0038001F0060
001F01C0001F0380001F0700001F0E00001F1C00001F7E00001FFF00001FCF00001F0F80001F07
C0001F03E0001F01E0001F01F0001F00F8001F007C001F003C00FFE0FFC0FFE0FFC01A237EA21E
>107 D<FF00FF001F001F001F001F001F001F001F001F001F001F001F001F001F001F001F001F
001F001F001F001F001F001F001F001F001F001F001F001F001F001F001F00FFE0FFE00B237EA2
10>I<FF07F007F000FF1FFC1FFC001F303E303E001F403E403E001F801F801F001F801F801F00
1F001F001F001F001F001F001F001F001F001F001F001F001F001F001F001F001F001F001F001F
001F001F001F001F001F001F001F001F001F001F001F001F001F001F001F001F001F001F001F00
1F001F001F00FFE0FFE0FFE0FFE0FFE0FFE02B167E9530>I<FF07E000FF1FF8001F307C001F40
3C001F803E001F803E001F003E001F003E001F003E001F003E001F003E001F003E001F003E001F
003E001F003E001F003E001F003E001F003E001F003E001F003E00FFE1FFC0FFE1FFC01A167E95
1F>I<00FE0007FFC00F83E01E00F03E00F87C007C7C007C7C007CFC007EFC007EFC007EFC007E
FC007EFC007EFC007E7C007C7C007C3E00F81F01F00F83E007FFC000FE0017167E951C>I<FF0F
E000FF3FF8001FF07C001F803E001F001F001F001F801F001F801F000FC01F000FC01F000FC01F
000FC01F000FC01F000FC01F000FC01F000FC01F001F801F001F801F803F001FC03E001FE0FC00
1F3FF8001F0FC0001F0000001F0000001F0000001F0000001F0000001F0000001F0000001F0000
00FFE00000FFE000001A207E951F>I<FE1F00FE3FC01E67E01EC7E01E87E01E87E01F83C01F00
001F00001F00001F00001F00001F00001F00001F00001F00001F00001F00001F00001F0000FFF0
00FFF00013167E9517>114 D<0FF3003FFF00781F00600700E00300E00300F00300FC00007FE0
007FF8003FFE000FFF0001FF00000F80C00780C00380E00380E00380F00700FC0E00EFFC00C7F0
0011167E9516>I<0180000180000180000180000380000380000780000780000F80003F8000FF
FF00FFFF000F80000F80000F80000F80000F80000F80000F80000F80000F80000F80000F80000F
81800F81800F81800F81800F81800F830007C30003FE0000F80011207F9F16>I<FF01FE00FF01
FE001F003E001F003E001F003E001F003E001F003E001F003E001F003E001F003E001F003E001F
003E001F003E001F003E001F003E001F003E001F003E001F007E001F00FE000F81BE0007FF3FC0
01FC3FC01A167E951F>I<FFE7FF07F8FFE7FF07F81F007800C00F807801800F807C01800F807C
018007C07E030007C0DE030007E0DE070003E0DF060003E18F060001F18F0C0001F38F8C0001FB
079C0000FB07D80000FE03D800007E03F000007E03F000007C01F000003C01E000003800E00000
1800C00025167F9528>119 D<FFE07FC0FFE07FC00F801C0007C0380003E0700003F0600001F8
C00000F98000007F8000003F0000001F0000001F8000003FC0000037C0000063E00000C1F00001
C0F8000380FC0007007E000E003E00FF80FFE0FF80FFE01B167F951E>I<FFE01FE0FFE01FE00F
8006000F8006000FC00E0007C00C0007E01C0003E0180003E0180001F0300001F0300000F86000
00F86000007CC000007CC000007FC000003F8000003F8000001F0000001F0000000E0000000E00
00000C0000000C00000018000078180000FC380000FC300000FC60000069C000007F8000001F00
00001B207F951E>I E /Fe 4 21 df<FFFFFFC0FFFFFFC01A027C8B23>0
D<400004C0000C6000183000301800600C00C006018003030001860000CC000078000030000030
0000780000CC000186000303000601800C00C0180060300030600018C0000C40000416187A9623
>2 D<01800180018001800180C183F18F399C0FF003C003C00FF0399CF18FC183018001800180
0180018010147D9417>I<000000C0000003C000000F0000003C000000F0000003C00000070000
001C00000078000001E00000078000001E00000078000000E0000000780000001E000000078000
0001E0000000780000001C0000000700000003C0000000F00000003C0000000F00000003C00000
00C0000000000000000000000000000000000000000000000000000000007FFFFF80FFFFFFC01A
247C9C23>20 D E /Ff 8 122 df<00FFF83FF8000FC00F80000F80060000078004000007C008
000003C010000003C020000003E040000001E080000001F100000000F300000000F600000000FC
0000000078000000007C000000007C000000007C00000000BE000000011E000000021E00000006
1F0000000C0F000000080F800000100780000020078000004007C000008003C000010003E00003
0003E0000F0007E000FFE01FFE00251F7F9E26>88 D<FFF801FF0F8000780F8000600780004007
C0008007C0018003C0010003E0020003E0040001E0080001F0180000F0100000F0200000F84000
00788000007D0000007D0000003E0000003C0000003C0000003800000078000000780000007800
000070000000F0000000F0000000F0000000F0000001E000003FFF0000201F7F9E1A>I<007FFF
F800FC00F000E001E000C003C0018007800100078003000F0002001E0002003C00040078000000
F8000000F0000001E0000003C00000078000000F0000000F0000001E0000003C00000078008000
F0008001F0010001E0010003C00300078002000F0006001E0004003E000C003C003C007800F800
FFFFF8001D1F7D9E1F>I<0000780003F80000700000700000700000700000E00000E00000E000
00E00001C00001C000F1C00389C00707800E03801C03803C0380380700780700780700780700F0
0E00F00E00F00E00F00E10F01C20F01C20703C20705C40308C400F078015207E9F18>100
D<00E001E001E000C000000000000000000000000000000E001300238043804380438087000700
07000E000E001C001C001C20384038403840388019000E000B1F7E9E10>105
D<1E07802318C023A06043C0704380704380708700E00700E00700E00700E00E01C00E01C00E01
C00E03821C03841C07041C07081C03083803101801E017147E931B>110
D<03C1C00C62201034701038F02038F020386040700000700000700000700000E00000E00000E0
0000E02061C040F1C040F1C080E2C080446300383C0014147E931A>120
D<0F00601180702180E021C0E041C0E04380E08381C00701C00701C00701C00E03800E03800E03
800E03800E07000C07000C07000E0F00061E0003EE00000E00000E00001C007818007838007070
0060600021C0001F0000141D7E9316>I E /Fg 18 117 df<60F0F06004047D830B>46
D<00300030007000F000F001700370027004700C7008701070307020704070C070FFFF00700070
007000700070007007FF10187F9713>52 D<000C0000000C0000000C0000001E0000001E000000
3F000000270000002700000043800000438000004380000081C0000081C0000081C0000100E000
0100E00001FFE000020070000200700006007800040038000400380008001C0008001C001C001E
00FF00FFC01A1A7F991D>65 D<FEFEC0C0C0C0C0C0C0C0C0C0C0C0C0C0C0C0C0C0C0C0C0C0C0C0
C0C0C0C0C0C0C0C0C0FEFE07257D9B0B>91 D<FEFE060606060606060606060606060606060606
060606060606060606060606060606FEFE0725809B0B>93 D<3F8070C070E020700070007007F0
1C7030707070E070E071E071E0F171FB1E3C10107E8F13>97 D<07F80C1C381C30087000E000E0
00E000E000E000E0007000300438080C1807E00E107F8F11>99 D<007E00000E00000E00000E00
000E00000E00000E00000E00000E00000E0003CE000C3E00380E00300E00700E00E00E00E00E00
E00E00E00E00E00E00E00E00600E00700E00381E001C2E0007CFC0121A7F9915>I<07C01C3030
187018600CE00CFFFCE000E000E000E0006000300438080C1807E00E107F8F11>I<01F0031807
380E100E000E000E000E000E000E00FFC00E000E000E000E000E000E000E000E000E000E000E00
0E000E000E007FE00D1A80990C>I<18003C003C001800000000000000000000000000FC001C00
1C001C001C001C001C001C001C001C001C001C001C001C001C00FF80091A80990A>105
D<FC00001C00001C00001C00001C00001C00001C00001C00001C00001C00001C3F801C1E001C18
001C10001C20001C40001DC0001FE0001CE0001C70001C78001C38001C1C001C1E001C1F00FF3F
C0121A7F9914>107 D<FCF8001D0C001E0E001E0E001C0E001C0E001C0E001C0E001C0E001C0E
001C0E001C0E001C0E001C0E001C0E00FF9FC012107F8F15>110 D<07E01C38300C700E6006E0
07E007E007E007E007E0076006700E381C1C3807E010107F8F13>I<FCF8001F0E001E07001C03
801C03801C01C01C01C01C01C01C01C01C01C01C01C01C03801C03001E07001F0C001CF0001C00
001C00001C00001C00001C00001C0000FF800012177F8F15>I<FCE01D701E701E201C001C001C
001C001C001C001C001C001C001C001C00FFC00C107F8F0F>114 D<1F2060E04020C020C020F0
007F003FC01FE000F080708030C030C020F0408F800C107F8F0F>I<0400040004000C000C001C
003C00FFC01C001C001C001C001C001C001C001C001C201C201C201C201C200E4003800B177F96
0F>I E /Fh 1 50 df<0C003C00CC000C000C000C000C000C000C000C000C000C000C000C000C
00FF8009107E8F0F>49 D E /Fi 42 123 df<00E00001E0000FE000FFE000F3E00003E00003E0
0003E00003E00003E00003E00003E00003E00003E00003E00003E00003E00003E00003E00003E0
0003E00003E00003E00003E00003E00003E00003E000FFFF80FFFF80111D7C9C1A>49
D<07F0001FFE00383F007C1F80FE0FC0FE0FC0FE0FE0FE07E07C07E03807E0000FE0000FC0000F
C0001F80001F00003E0000780000F00000E00001C0000380600700600E00601C00E01FFFC03FFF
C07FFFC0FFFFC0FFFFC0131D7D9C1A>I<01FC0007FF000E0F801E0FC03F07E03F07E03F07E03F
07E01E0FC0000FC0000F80001F0001FC0001FC00000F800007C00003E00003F00003F83803F87C
03F8FE03F8FE03F8FE03F0FC03F07807E03C0FC01FFF8003FC00151D7E9C1A>I<0001C00003C0
0007C00007C0000FC0001FC0003BC00073C00063C000C3C00183C00383C00703C00E03C00C03C0
1803C03803C07003C0E003C0FFFFFEFFFFFE0007C00007C00007C00007C00007C00007C000FFFE
00FFFE171D7F9C1A>I<0000E000000000E000000001F000000001F000000001F000000003F800
000003F800000006FC00000006FC0000000EFE0000000C7E0000000C7E000000183F000000183F
000000303F800000301F800000701FC00000600FC00000600FC00000C007E00000FFFFE00001FF
FFF000018003F000018003F000030001F800030001F800060001FC00060000FC000E0000FE00FF
E00FFFE0FFE00FFFE0231F7E9E28>65 D<FFFFFE00FFFFFFC007C007E007C003F007C001F807C0
01FC07C001FC07C001FC07C001FC07C001FC07C001F807C003F807C007F007C00FE007FFFF8007
FFFFC007C003F007C001F807C001FC07C000FC07C000FE07C000FE07C000FE07C000FE07C000FE
07C000FC07C001FC07C003F807C007F0FFFFFFE0FFFFFF001F1F7E9E25>I<0007FC02003FFF0E
00FE03DE03F000FE07E0003E0FC0001E1F80001E3F00000E3F00000E7F0000067E0000067E0000
06FE000000FE000000FE000000FE000000FE000000FE000000FE0000007E0000007E0000067F00
00063F0000063F00000C1F80000C0FC0001807E0003803F0007000FE01C0003FFF800007FC001F
1F7D9E26>I<FFFFFE0000FFFFFFC00007E007F00007E001F80007E000FC0007E0007E0007E000
3F0007E0003F0007E0001F8007E0001F8007E0001F8007E0001FC007E0001FC007E0001FC007E0
001FC007E0001FC007E0001FC007E0001FC007E0001FC007E0001FC007E0001F8007E0001F8007
E0001F8007E0003F0007E0003F0007E0007E0007E000FC0007E001F80007E007F000FFFFFFC000
FFFFFE0000221F7E9E28>I<FFFFFFE0FFFFFFE007E007E007E001E007E000E007E0006007E000
7007E0003007E0003007E0603007E0603007E0600007E0E00007E1E00007FFE00007FFE00007E1
E00007E0E00007E0600007E0600C07E0600C07E0000C07E0001807E0001807E0001807E0003807
E0007807E000F807E003F0FFFFFFF0FFFFFFF01E1F7E9E22>I<FFFFFFE0FFFFFFE007E007E007
E001E007E000E007E0006007E0007007E0003007E0003007E0603007E0603007E0600007E0E000
07E1E00007FFE00007FFE00007E1E00007E0E00007E0600007E0600007E0600007E0000007E000
0007E0000007E0000007E0000007E0000007E0000007E00000FFFF8000FFFF80001C1F7E9E21>
I<0007FC0200003FFF0E0000FE03DE0003F000FE0007E0003E000FC0001E001F80001E003F0000
0E003F00000E007F000006007E000006007E00000600FE00000000FE00000000FE00000000FE00
000000FE00000000FE003FFFE0FE003FFFE07E00007E007E00007E007F00007E003F00007E003F
00007E001F80007E000FC0007E0007E0007E0003F000FE0000FE01FE00003FFF8E000007FC0600
231F7D9E29>I<FFFFFFFF07E007E007E007E007E007E007E007E007E007E007E007E007E007E0
07E007E007E007E007E007E007E007E007E007E007E007E007E0FFFFFFFF101F7E9E14>73
D<FFE000003FF8FFF000007FF807F000007F0006F80000DF0006F80000DF0006F80000DF00067C
00019F00067C00019F00063E00031F00063E00031F00061F00061F00061F00061F00060F800C1F
00060F800C1F000607C0181F000607C0181F000607C0181F000603E0301F000603E0301F000601
F0601F000601F0601F000600F8C01F000600F8C01F0006007D801F0006007D801F0006003F001F
0006003F001F0006003F001F0006001E001F00FFF01E03FFF8FFF00C03FFF82D1F7E9E32>77
D<FFE000FFF0FFF000FFF007F000060007F800060006FC000600067E000600063F000600063F80
0600061F800600060FC006000607E006000603F006000601F806000601FC06000600FC06000600
7E060006003F060006001F860006001FC60006000FE600060007E600060003F600060001FE0006
0000FE00060000FE000600007E000600003E000600001E000600000E00FFF0000600FFF0000600
241F7E9E29>I<001FF80000FFFF0001F81F8007E007E00FC003F01F8001F81F0000F83F0000FC
7F0000FE7E00007E7E00007EFE00007FFE00007FFE00007FFE00007FFE00007FFE00007FFE0000
7FFE00007FFE00007F7E00007E7F0000FE7F0000FE3F0000FC3F8001FC1F8001F80FC003F007E0
07E001F81F8000FFFF00001FF800201F7D9E27>I<FFFFFE00FFFFFF8007E00FE007E003F007E0
01F807E001F807E001FC07E001FC07E001FC07E001FC07E001FC07E001F807E001F807E003F007
E00FE007FFFF8007FFFE0007E0000007E0000007E0000007E0000007E0000007E0000007E00000
07E0000007E0000007E0000007E0000007E00000FFFF0000FFFF00001E1F7E9E24>I<FFFFF800
00FFFFFF000007E01FC00007E007E00007E003F00007E003F00007E003F80007E003F80007E003
F80007E003F80007E003F00007E003F00007E007E00007E01FC00007FFFF000007FFFC000007E0
3E000007E01F000007E00F800007E00F800007E00FC00007E00FC00007E00FC00007E00FE00007
E00FE00007E00FE00007E00FE03007E007F03007E003F860FFFF01FFC0FFFF007F80241F7E9E27
>82 D<03FC080FFF381E03F83800F8700078700038F00038F00018F00018F80000FC00007FC000
7FFE003FFF801FFFC00FFFF007FFF000FFF80007F80000FC00007C00003CC0003CC0003CC0003C
E00038E00078F80070FE01E0E7FFC081FF00161F7D9E1D>I<FFFF01FFE0FFFF01FFE007E0000C
0007E0000C0007E0000C0007E0000C0007E0000C0007E0000C0007E0000C0007E0000C0007E000
0C0007E0000C0007E0000C0007E0000C0007E0000C0007E0000C0007E0000C0007E0000C0007E0
000C0007E0000C0007E0000C0007E0000C0007E0000C0007E0000C0003E000180001F000180001
F000300000F8006000007E03C000001FFF80000003FC0000231F7E9E28>85
D<07FC001FFF003F0F803F07C03F03E03F03E00C03E00003E0007FE007FBE01F03E03C03E07C03
E0F803E0F803E0F803E0FC05E07E0DE03FF8FE0FE07E17147F9319>97 D<FF0000FF00001F0000
1F00001F00001F00001F00001F00001F00001F00001F00001F00001F1FC01F7FF01FE0F81F807C
1F007E1F003E1F003E1F003F1F003F1F003F1F003F1F003F1F003F1F003E1F003E1F007C1F807C
1EC1F81C7FE0181F8018207E9F1D>I<01FE0007FF801F0FC03E0FC03E0FC07C0FC07C0300FC00
00FC0000FC0000FC0000FC0000FC00007C00007E00003E00603F00C01F81C007FF0001FC001314
7E9317>I<0007F80007F80000F80000F80000F80000F80000F80000F80000F80000F80000F800
00F801F8F80FFEF81F83F83E01F87E00F87C00F87C00F8FC00F8FC00F8FC00F8FC00F8FC00F8FC
00F87C00F87C00F87E00F83E01F81F07F80FFEFF03F8FF18207E9F1D>I<01FE0007FF800F83C0
1E01E03E00F07C00F07C00F8FC00F8FFFFF8FFFFF8FC0000FC0000FC00007C00007C00003E0018
1E00180F807007FFE000FF8015147F9318>I<001F8000FFC001F3E003E7E003C7E007C7E007C3
C007C00007C00007C00007C00007C000FFFC00FFFC0007C00007C00007C00007C00007C00007C0
0007C00007C00007C00007C00007C00007C00007C00007C00007C00007C0003FFC003FFC001320
7F9F10>I<01FC3C07FFFE0F079E1E03DE3E03E03E03E03E03E03E03E03E03E01E03C00F07800F
FF0009FC001800001800001C00001FFF800FFFF007FFF81FFFFC3C007C70003EF0001EF0001EF0
001E78003C78003C3F01F80FFFE001FF00171E7F931A>I<FF0000FF00001F00001F00001F0000
1F00001F00001F00001F00001F00001F00001F00001F0FC01F3FE01F61F01FC0F81F80F81F00F8
1F00F81F00F81F00F81F00F81F00F81F00F81F00F81F00F81F00F81F00F81F00F81F00F8FFE3FF
FFE3FF18207D9F1D>I<1C003E003F007F003F003E001C00000000000000000000000000FF00FF
001F001F001F001F001F001F001F001F001F001F001F001F001F001F001F001F00FFE0FFE00B21
7EA00E>I<FF0000FF00001F00001F00001F00001F00001F00001F00001F00001F00001F00001F
00001F01FE1F01FE1F00F01F00C01F03801F07001F0C001F18001F7C001FFC001F9E001F0F001E
0F801E07C01E03C01E01E01E01F01E00F8FFC3FFFFC3FF18207E9F1C>107
D<FF00FF001F001F001F001F001F001F001F001F001F001F001F001F001F001F001F001F001F00
1F001F001F001F001F001F001F001F001F001F001F00FFE0FFE00B207E9F0E>I<FE0FE03F80FE
1FF07FC01E70F9C3E01E407D01F01E807E01F01F807E01F01F007C01F01F007C01F01F007C01F0
1F007C01F01F007C01F01F007C01F01F007C01F01F007C01F01F007C01F01F007C01F01F007C01
F01F007C01F0FFE3FF8FFEFFE3FF8FFE27147D932C>I<FE0FC0FE3FE01E61F01EC0F81E80F81F
00F81F00F81F00F81F00F81F00F81F00F81F00F81F00F81F00F81F00F81F00F81F00F81F00F8FF
E3FFFFE3FF18147D931D>I<01FF0007FFC01F83F03E00F83E00F87C007C7C007CFC007EFC007E
FC007EFC007EFC007EFC007E7C007C7C007C3E00F83E00F81F83F007FFC001FF0017147F931A>
I<FF1FC0FF7FF01FE1F81F80FC1F007E1F007E1F003E1F003F1F003F1F003F1F003F1F003F1F00
3F1F003E1F007E1F007C1F80FC1FC1F81F7FE01F1F801F00001F00001F00001F00001F00001F00
001F0000FFE000FFE000181D7E931D>I<FE3E00FE7F801ECFC01E8FC01E8FC01F8FC01F03001F
00001F00001F00001F00001F00001F00001F00001F00001F00001F00001F0000FFF000FFF00012
147E9316>114 D<0FE63FFE701E600EE006E006F800FFC07FF83FFC1FFE03FE001FC007C007E0
07F006F81EFFFCC7F010147E9315>I<01800180018003800380038007800F803F80FFFCFFFC0F
800F800F800F800F800F800F800F800F800F800F860F860F860F860F8607CC03F801F00F1D7F9C
14>I<FF07F8FF07F81F00F81F00F81F00F81F00F81F00F81F00F81F00F81F00F81F00F81F00F8
1F00F81F00F81F00F81F01F81F01F80F06F807FCFF03F8FF18147D931D>I<FFE7FE1FE0FFE7FE
1FE01F00F003001F00F803000F80F806000F80F8060007C1BC0C0007C1BC0C0007C1BE0C0003E3
1E180003E31E180001F60F300001F60F300001F60FB00000FC07E00000FC07E000007803C00000
7803C000007803C000003001800023147F9326>119 D<FFE1FF00FFE1FF000F80700007C0E000
07E0C00003E1800001F3800000FF0000007E0000003E0000003F0000007F8000006F800000C7C0
000183E0000381F0000701F8000E00FC00FF81FF80FF81FF8019147F931C>I<FFE07F80FFE07F
801F001C000F8018000F80180007C0300007C0300003E0600003E0600001F0C00001F0C00001F9
C00000F9800000FF8000007F0000007F0000003E0000003E0000001C0000001C00000018000000
18000078300000FC300000FC600000C0E00000E1C000007F8000001E000000191D7F931C>I<3F
FFE03FFFE03C07C0380F80701F80603F00603E00607C0000F80001F80003F00003E06007C0600F
80601F80E03F00C03E01C07C03C0FFFFC0FFFFC013147F9317>I E /Fj
2 51 df<03000700FF000700070007000700070007000700070007000700070007000700070007
00070007007FF00C157E9412>49 D<0F8030E040708030C038E0384038003800700070006000C0
0180030006000C08080810183FF07FF0FFF00D157E9412>I E /Fk 80 123
df<001F83E000F06E3001C078780380F8780300F0300700700007007000070070000700700007
0070000700700007007000FFFFFF80070070000700700007007000070070000700700007007000
070070000700700007007000070070000700700007007000070070000700700007007000070070
0007007000070070007FE3FF001D20809F1B>11 D<003F0000E0C001C0C00381E00701E00701E0
070000070000070000070000070000070000FFFFE00700E00700E00700E00700E00700E00700E0
0700E00700E00700E00700E00700E00700E00700E00700E00700E00700E00700E00700E07FC3FE
1720809F19>I<003FE000E0E001C1E00381E00700E00700E00700E00700E00700E00700E00700
E00700E0FFFFE00700E00700E00700E00700E00700E00700E00700E00700E00700E00700E00700
E00700E00700E00700E00700E00700E00700E00700E07FE7FE1720809F19>I<001F81F80000F0
4F040001C07C06000380F80F000300F00F000700F00F0007007000000700700000070070000007
0070000007007000000700700000FFFFFFFF000700700700070070070007007007000700700700
070070070007007007000700700700070070070007007007000700700700070070070007007007
000700700700070070070007007007000700700700070070070007007007007FE3FE3FF0242080
9F26>I<7038F87CFC7EFC7E743A0402040204020804080410081008201040200F0E7E9F17>34
D<70F8FCFC74040404080810102040060E7C9F0D>39 D<0020004000800100020006000C000C00
180018003000300030007000600060006000E000E000E000E000E000E000E000E000E000E000E0
00E0006000600060007000300030003000180018000C000C000600020001000080004000200B2E
7DA112>I<800040002000100008000C00060006000300030001800180018001C000C000C000C0
00E000E000E000E000E000E000E000E000E000E000E000E000C000C000C001C001800180018003
000300060006000C00