The following glossary has been put together to attempt to clarify expressions commonly used in and around parallel computing for people who are not familiar with the area. Experts will notice that it over-simplifies many of the definitions, and some are technically not quite true. It should not be used as more than an indication of what the phrases mean. [ Aside: for example, if anyone can provide a mathematically precise definition of either causal consistency or strong memory consistency that doesn't classify at least some designs 'the wrong way', I should very much like to see it! ]

Access -- For data, any action on it: reading, writing or updating
ACML -- AMD Core Math Library - AMD's mathematical library, including tuned versions of the BLAS and LAPACK (see also BLAS, LAPACK, MKL and NAG)
Ada -- A programming language, originally designed for high reliability for the DoD, named after Ada, Countess of Lovelace
Altivec -- The SIMD instruction set used on IBM's POWER systems (see also SSE)
AMD -- The main maker of Intel-compatible CPUs (see also Intel)
AMD64 -- The CPU architecture that almost all modern systems use (see x86 for more information)
API -- Application Programming Interface (the syntax, names and purpose needed for a programmer to use a facility, but usually omitting any detailed specification)
Acquire -- See release-acquire
Affinity -- Either when some memory, a device or other resource is associated with a system thread, or when a logical (program) thread or process is associated with a CPU core
Alias/Aliasing -- When two names, pointers etc. refer to the same or overlapping objects, especially when they are in different threads
Architecture -- The abstract design of something, usually computer hardware but sometimes programming interfaces
ASCI -- Accelerated Strategic Computing Initiative (a USA slush fund for HPC computing, set up to simulate nuclear bomb testing)
Asynchronous/Asynchronism -- Performing operations at an unspecified time, which may or may not be in parallel (see also synchronous)
Attached Processor -- A separate CPU which is attached to (say) a workstation to deliver special functionality or high-performance facilities
Atomic -- An action which, once it starts, happens completely: no intermediate state can be observed and no external change will affect it; also a variable for which all accesses are atomic (see the sketch below) - note that many people use it to imply coherence, but that is not always the case
Autoparallelisation -- When a compiler takes a serial program and makes it run in parallel, with no code changes needed by the user, invariably using some form of threading
AVX -- Advanced Vector Extensions - the latest and most powerful SIMD extensions in the x86 instruction set, currently available on the Xeon Phi and latest Intel CPUs - the first such extension was MMX, followed by several versions of SSE, both of which are still supported
Background Process -- A process that is run asynchronously, leaving the initiating process free to do something else - in Unix, started by a command that ends in an '&' and often incorrectly called a job
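Example: a minimal sketch of the 'Atomic' entry above, assuming a C++11 (or later) compiler; the names are illustrative only.

    #include <atomic>
    #include <iostream>
    #include <thread>

    std::atomic<long> atomic_count{0};   // increments are indivisible; a plain
                                         // 'long' here would give a data race

    void worker()
    {
        for (int i = 0; i < 100000; ++i)
            ++atomic_count;              // safe: an atomic read-modify-write
    }

    int main()
    {
        std::thread a(worker), b(worker);
        a.join();
        b.join();
        std::cout << atomic_count << "\n";   // always prints 200000
    }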
Batch -- When a job is executed without interacting with anything other than files and similar devices - i.e. not with a user, the network, other jobs or the application that started it
Binding -- A specification of an abstract (semi-formal) interface design, in terms of an explicit API for a particular programming language
BLAS -- Basic Linear Algebra Subprograms - a standard interface for basic operations on real and complex vectors and matrices (see also ACML, LAPACK, MKL and NAG, and the sketch below)
Block/Blocking -- In I/O and message passing, when a transfer does not return until the data has been copied - for writes, it may be in a system buffer and may not have reached its destination
BSP -- Bulk Synchronous Parallel - a very simple parallel model developed by Leslie Valiant
C++ Threads -- The threading facilities built into the C++ language from C++11 onwards (see also POSIX threads)
C++/C++03/C++11 -- The C++ programming language and the various versions of the standard defined by ISO SC22/WG21
C/C90/C99/C11 -- The C programming language and the various versions of the standard defined by ISO SC22/WG14
Cache -- A faster form of memory, used to keep a copy of the most recently used locations in main memory
Cache line -- The unit of memory that is copied into or out of the cache - typically 32-128 bytes, occasionally 256 or more
Causal consistency -- For data access, the property that apparent 'time travel' cannot occur (see also sequential consistency); note that causal consistency is not a well-defined term
Child -- A thread, process or program that was created and (usually) is controlled by its parent
Cilk/Cilk Plus -- Intel's language extensions to C++ intended for shared-memory parallel programming; also the compiler for them
Client -- A program that makes requests of a server - think of a Web browser or FTP command
Clock rate -- The frequency at which a CPU starts instructions or a memory controller accesses data - the number of instructions actually issued per second may be larger, if several are issued at once
Cluster -- A system built up of multiple workstations or small servers, typically connected by Ethernet
Coarrays -- See Fortran coarrays
Coherence -- For data access, the property that each thread will see simultaneous parallel actions on data occur in some unspecified order; note that it does not mean that all threads see the same order (see consistency)
Communication -- Any form of data transfer or signalling between two CPUs, processes etc.
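Example: a hedged sketch of calling the BLAS through the common C interface (CBLAS), illustrating the 'BLAS' entry above; the header name and link line vary between implementations (ACML, MKL, OpenBLAS and so on), so treat those details as assumptions.

    #include <cblas.h>
    #include <iostream>

    int main()
    {
        // C = alpha*A*B + beta*C for 2x2 row-major matrices
        double A[] = {1.0, 2.0,
                      3.0, 4.0};
        double B[] = {5.0, 6.0,
                      7.0, 8.0};
        double C[] = {0.0, 0.0,
                      0.0, 0.0};
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    2, 2, 2,        // M, N, K
                    1.0, A, 2,      // alpha, A, lda
                    B, 2,           // B, ldb
                    0.0, C, 2);     // beta, C, ldc
        std::cout << C[0] << " " << C[1] << "\n"
                  << C[2] << " " << C[3] << "\n";   // 19 22 / 43 50
    }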
Condition variables -- See locking
Condor -- A widely-used job scheduler
Consistency -- For data access, the property that all threads' views of parallel data accesses obey some consistency rule (see also causal consistency and sequential consistency)
Controller -- A program or piece of hardware that controls the execution of other programs or hardware (see also harness)
CORBA -- Common Object Request Broker Architecture - a networking interface widely used in commercial applications
Core -- A unit of a CPU that executes a single thread or process
CPU -- Central Processing Unit - usually used to refer to the processing hardware of a computer system (see also GPU and memory)
Cray -- A USA manufacturer of supercomputer systems, specialising in DoD (USA Department of Defense) contracts
Cray SHMEM -- A semi-shared memory communication mechanism, requiring RDMA
Critical Sections -- Sections of code that are automatically locked, because otherwise they might cause a data race
CS -- Computer Science
CUDA -- Compute Unified Device Architecture - NVIDIA's interface specification for using its GPUs as compute processors, available on its GeForce and TESLA ranges of GPUs
DAG -- See Directed Acyclic Graph
Data Affinity -- When some data locations are bound more closely to particular CPU cores on an SMP system than to others
Data distribution -- How the program's data is distributed across multiple processes (or, sometimes, threads or CPU cores)
Dataflow -- An execution design where the programming model describes how data are filtered through actions, rather than specifying an order of execution of actions
Data race -- Non-atomic access to the same or overlapping data by two threads with no intervening synchronisation (see also race condition)
Deadlock -- When a set of threads or processes are stuck, because none can proceed until one of the others has (see also livelock, and the sketch below)
Directed Acyclic Graph -- A mathematical directed graph is a set of nodes connected by one-directional links; a graph is acyclic if there is no possible path from any node back to itself
Distributed memory -- A programming or hardware model where multiple processes run with no shared data, and communicate by message passing or I/O
DMA -- See RDMA
DoD -- USA Department of Defense
Double precision -- For floating-point, roughly twice as much precision as the 'basic' precision - nowadays, typically taking 8 bytes and giving 15-16 significant digits
Duplex I/O -- A connection where data can be passed in both directions
Dynamic process -- A process that is created and destroyed as part of the execution of a program
Email -- Electronic mail
Embarrassingly parallel -- An application that can be parallelised by running multiple separate threads, with very little or no communication between them; the term originated because these applications perform well in parallel benchmarking, no matter how slow the interconnect
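Example: a hedged sketch of the classic lock-ordering deadlock described under 'Deadlock' above, together with one way to avoid it; it assumes a C++17 compiler for std::scoped_lock, and the names are illustrative only.

    #include <mutex>
    #include <thread>

    std::mutex lock_a, lock_b;

    // Deadlock-prone pattern: if one thread takes lock_a then lock_b while
    // another takes lock_b then lock_a, each can end up waiting for ever for
    // the lock that the other already holds, so neither can proceed.

    void safe_worker()
    {
        // One fix: acquire both locks together (or always in the same order).
        std::scoped_lock both(lock_a, lock_b);
        // ... work on the data protected by the two locks ...
    }

    int main()
    {
        std::thread t1(safe_worker), t2(safe_worker);
        t1.join();
        t2.join();
    }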
Encapsulate -- To ensure that all accesses to some data or actions of a particular form (e.g. I/O) are through a small number of interfaces
Erlang -- A programming language with built-in parallelism
Ethernet -- The currently dominant network hardware specification (see also InfiniBand)
Event -- The name used for semaphores in the forthcoming Fortran coarray extension
Farmable -- Like embarrassingly parallel, but with no communication between threads or processes, and none with the harness except reading the parameters and input, and writing the output
Fence -- A very low-level synchronisation mechanism, used to construct higher-level ones; fences must be executed by both threads to get any synchronisation between them
FIFO -- First In, First Out - another name for a queue, usually used under POSIX systems to indicate devices like sockets, pipes and named FIFOs
FFTW -- The Fastest Fourier Transform in the West - a widely used and portable open-source fast Fourier transform library
Firmware -- Software that is stored in read-only memory, and that appears to programmers to be part of the hardware (see also hardware and software)
Fortran coarrays -- The Fortran 2008 parallel programming facility; it is a PGAS model
Fortran/Fortran 66/Fortran 77/Fortran 90/Fortran 2003/Fortran 2008 -- The Fortran programming language and the various versions of the standard defined by ISO SC22/WG5
FPS -- Floating Point Systems - the maker of attached SIMD units that were widely used in the 1980s to enhance the performance of minicomputers and workstations for scientific calculations (see also GPU)
FTP -- File Transfer Protocol - a widely used Internet protocol for transferring files between systems
Future -- In C++, a handle through which the result of an asynchronous task can be collected later (see the sketch below)
Gang scheduling -- When all threads or processes either execute on separate, dedicated CPU cores or none execute
GeForce -- NVIDIA's range of ordinary video cards that can also be used for GPU computation
GNU -- The brand name for software produced by the Free Software Foundation
GNU Ada -- The Ada compiler that is part of the GNU compiler suite, which also includes gcc, g++ and gfortran
GPU -- Graphics Processing Unit; while mainly used to deliver smooth effects for video and gaming, many modern ones (like NVIDIA's) can be used as high-performance SIMD attached processors
GridEngine -- A widely-used job scheduler
GUI -- Graphical User Interface - the sort of interface available on almost all modern computers used for interactive work
Handshake -- A barrier that involves only two agents
'Happens After'/'Happens Before' -- In Java, C++ etc., when two actions are required to occur in a specific order by the language rules
Hardware -- The physical components of a computer system (see also firmware and software)
Harness -- A program, script or other framework that is used to control the execution of other processes (see also controller)
High Performance Computing -- Computing that is limited by the availability of resources, where the primary objective is to do larger calculations, faster
HPC -- High Performance Computing
HPF -- High Performance Fortran - one of the earlier PGAS designs, no longer available, and superseded by either OpenMP or Fortran coarrays
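Example: a minimal sketch of the 'Future' and 'Task' entries, assuming C++11; std::async may run the function on another thread, and get() collects the result later.

    #include <future>
    #include <iostream>

    long sum_up_to(long n)                // the 'task': an ordinary function
    {
        long total = 0;
        for (long i = 1; i <= n; ++i)
            total += i;
        return total;
    }

    int main()
    {
        // Launch the task, potentially on another thread.
        std::future<long> result =
            std::async(std::launch::async, sum_up_to, 1000000L);

        // ... the caller is free to do other work here ...

        std::cout << result.get() << "\n";   // blocks until the task finishes
    }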
I/O -- Input/output - when a program reads or writes data to a file or other file-like device (e.g. a socket or terminal)
IA/IA-32/IA-64 -- Intel Architecture; IA-32 usually means x86 and IA-64 usually means Itanium
IBM -- International Business Machines (see also POWER and Altivec)
Image -- The name used for a thread/process in the Fortran standard
InfiniBand -- The leading 'non-proprietary' specification for HPC interconnects -- it is faster and more expensive than Ethernet
Intel -- The maker of most CPUs currently used for general computing (see also AMD)
Interconnect -- The network used to link a cluster together and enable fast message passing or other communication
Internet servers/Internet services -- The server systems and services accessible via the Internet, such as online shopping ones
ISO -- The International Organization for Standardization, the body responsible for standards like Fortran, C and C++
Itanium -- A computer architecture developed by HP and Intel, intended to replace x86, but which is fast disappearing
Java -- The widely used object-oriented programming language designed by Sun
Job -- A set of commands that specifies the execution of one or more processes and the location of their data (see also task); Unix incorrectly used it for background processes, and that use is widespread
Job scheduler -- A system program that controls where and when jobs are executed
Kernel scheduler -- The part of the operating system kernel that controls where and when threads and processes are executed on the CPU cores it manages
KISS -- Keep It Simple and Stupid - an age-old engineering principle, and perhaps the most important one in computing - the acronym was coined by Kelly Johnson of the Skunk Works and is commonly misquoted as Keep It Simple, Stupid
LAPACK -- Linear Algebra Package - portable, high-quality, open source code for matrix decomposition, solution of simultaneous linear equations and eigensystems (see also ACML, BLAS, MKL and NAG)
Linux -- The Unix-like operating system that is currently used for most scientific computing
Livelock -- When a set of threads or processes are stuck in an infinite loop, all waiting for one of the others to do something (see also deadlock)
Lock/Locking -- To prevent any other thread, process or command getting access to an item of data or facility until the lock is released (unlocked) - there are numerous different forms of this, including condition variables, mutexes, readers/writers locks and semaphores, which are sometimes provided under other names
LSF -- Load Sharing Facility - a widely-used job scheduler
Master/Master-Worker -- An application design where one process (the master) parcels out work to the other processes (the workers)
Matlab -- The widely used matrix programming package
MCS -- Managed Cluster Service - the teaching systems supported by the University of Cambridge's University Information Services
Memory -- The component of a computer used to store the data that is being worked on by a process
Memory Consistency/Memory Model -- The rules stating what guarantees are given for parallel accesses to the same or overlapping data locations, and what rules a program must obey to get defined behaviour
Message passing -- A communication model that involves one process sending messages to another (a bit like a sort of internal Email); see the sketch below
Message Passing Interface -- See MPI
MIC -- Many Integrated Core Architecture (see Xeon Phi)
Microsoft -- The well-known computer company, and its software
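Example: a hedged sketch of the 'Message passing' entry above, using the MPI library (defined below); it assumes an MPI implementation such as MPICH or OpenMPI, compiled with mpicxx and run with something like 'mpirun -np 2'.

    #include <mpi.h>
    #include <iostream>

    int main(int argc, char *argv[])
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   // which process am I?
        MPI_Comm_size(MPI_COMM_WORLD, &size);   // how many processes in total?

        if (rank == 0 && size > 1) {
            double value = 42.0;
            MPI_Send(&value, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            double value;
            MPI_Recv(&value, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            std::cout << "process 1 received " << value << "\n";
        }

        MPI_Finalize();
    }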
MIMD -- Multiple Instruction Multiple Data - when multiple processes execute independently on different data - it is often incorrectly used to mean distributed memory with message passing (see also MPMD)
MKL -- Math Kernel Library - Intel's tuned mathematical library, including tuned versions of the BLAS and LAPACK (see also ACML, BLAS, LAPACK and NAG)
MMX -- Multi-Media Extensions - see AVX
MPI -- Message Passing Interface - the name of the distributed memory message passing library standard that dominates HPC programming on clusters (see also OpenMPI and MPICH)
MPICH -- A widely-used open source MPI implementation (see also OpenMPI)
MPMD -- Multiple Program Multiple Data - when multiple processes execute separate programs on different data
MTBF -- Mean Time Between Failures - a measure of failure rate used when analysing non-repeatable failures
Mutex -- See locking
NAG -- Numerical Algorithms Group - a not-for-profit commercial company, producing probably the most general and high-quality numerical library
NAG SMP -- The version of the NAG library that can make use of multiple cores for extra performance
Named FIFO -- A FIFO that is accessed by a filename (see also FIFO)
Nesting -- When an instance of one construct occurs inside another instance; e.g. when lock B is used in code that already holds lock A
NUMA -- Non-Uniform Memory Architecture - a form of SMP where it takes longer to access some data locations than others from a particular thread -- almost all SMP CPUs nowadays are NUMA (see also data affinity)
NVIDIA/NVIDIA Fermi/NVIDIA Kepler/NVIDIA Tesla -- The well-known manufacturer of video cards, and the names for its high-end GPUs
OpenACC -- A directive-based extension to Fortran, C and C++ for offloading work to attached accelerators such as GPUs; related to, but distinct from, OpenMP's accelerator support
OpenCL -- The most widely-used interface used to program GPUs, callable from C and C++ and indirectly from other languages
OpenMP -- A language extension for Fortran, C and C++ that is commonly used for SMP programming using threads (see the sketch below) - not to be confused with OpenMPI
OpenMPI -- Possibly the most widely-used open source MPI implementation (see also MPICH) - not to be confused with OpenMP
Packet -- A unit of data sent over a network
Parallel Genetic Algorithms -- Parallel search methods based on a simplification of a biological model of genetics
Parallel Global Array Storage -- See PGAS
Parameter Space Searching -- Global optimisation to find the best combination of a set of parameters for some calculation
Parent -- A thread, process or program that creates and (usually) controls its children
Partitioned Global Address Space -- See PGAS
PBS -- Portable Batch System - a widely-used job scheduler
PC -- Personal Computer
PDE -- Partial Differential Equation
Perl -- A widely-used but low-level and complicated scripting language (see also Python)
PGAS -- Partitioned Global Address Space or Parallel Global Array Storage - a hybrid distributed/shared memory model where some arrays can be accessed semi-directly from all processes
Pipe -- A Unix (POSIX) mechanism for one process to pass data to another, using normal I/O facilities (see also FIFO)
POSIX/POSIX standard -- The specification of the Unix-like operating system interface that is fairly closely followed by Linux and other modern Unix systems
POSIX mmap/POSIX shmat/POSIX shmem -- The shared memory segment APIs defined by POSIX
POSIX threads -- The APIs for executing multiple threads with shared memory defined by POSIX
POWER -- IBM's proprietary computer architecture
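Example: a minimal sketch of shared-memory threading with OpenMP, illustrating the 'OpenMP' entry above; it assumes a compiler option such as -fopenmp, and the reduction clause avoids a data race on the shared total.

    #include <omp.h>
    #include <cstdio>

    int main()
    {
        const int n = 1000000;
        double total = 0.0;

        #pragma omp parallel for reduction(+:total)
        for (int i = 0; i < n; ++i)
            total += 1.0 / (i + 1.0);   // each thread sums part of the range,
                                        // and the partial sums are combined

        std::printf("%f computed with up to %d threads\n",
                    total, omp_get_max_threads());
    }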
Process -- In modern usage, the unit of execution that is protected against other processes (whether owned by the same or other users), including not being able to access their data directly; each process may have multiple threads executing
Prolog -- A declarative (logic programming) language, whose execution model has something in common with dataflow designs
Prototype -- A preliminary implementation of a design, intended to find problems that have been missed up to that stage
PVM -- Parallel Virtual Machine - a distributed memory programming environment for multiple workstations, now superseded by MPI
PWF -- Personal Workstation Facility, later Public Workstation Facility - this is now called the MCS
PWF Condor -- The version of Condor used on the PWF
Python -- A widely-used, simple and recommended scripting language
Queue -- A data structure where entries are appended to the end and taken off the beginning; job schedulers often use it for jobs (see also FIFO)
Race Condition -- Two conflicting actions that happen with no intervening synchronisation; this is a generalisation of a data race
RAS -- Reliability, Availability and Serviceability - i.e. the property that a system rarely crashes, misbehaves or is inaccessible
RDMA -- Remote Direct Memory Access - a facility for one distributed memory process to access the memory of another, without that other process needing to do anything
Readers/writers locks -- See locking
Reduction -- A parallel mechanism by which data is (for example) summed across threads or processes
Release/Release-acquire -- A pairwise synchronisation mechanism in which a release operation by one thread synchronises with a matching acquire operation by another, so that everything written before the release is visible after the acquire (see the sketch below)
SC22 -- The ISO subcommittee responsible for all programming languages and programming interfaces
ScaLAPACK -- A parallel form of LAPACK (Linear Algebra Package), based on MPI
Scheduler -- See job scheduler and kernel scheduler
Semaphore -- See locking
Semantics -- The meaning of a program and the restrictions on what is permitted, as distinct from its syntax (i.e. the rules for writing its text)
Sequential -- Occurring in a single order in time, which may be explicit or unspecified; the opposite of parallel (see also synchronous and asynchronous)
Sequential Consistency -- For data access, the property that all parallel data accesses appear to have occurred in some serial order (see 'causal consistency')
Serial -- See sequential
Serial Debugger -- A debugger written to handle only serial programs
Shared Memory -- A programming or hardware model where multiple threads can access each other's data as if it were their own; it is often incorrectly assumed to mean coherence and even consistency
Shared Memory/Shared Memory Processor -- See SMP
Shared Memory Segment -- An area of memory that can be shared between two processes which otherwise do not share memory
Shared Memory Threading -- See threading
SHMEM -- See Cray SHMEM
Sibling -- Two threads, processes or programs with the same parent
SIGPIPE -- A signal sent to indicate that a pipe is broken under POSIX systems
SIMD -- Single Instruction Multiple Data - when a single operation acts on a large number of data items at once, usually some form of vector operation
Simplex I/O -- A connection where data can be passed in only one direction (see also streaming I/O)
Smalltalk -- A language with built-in parallelism and message passing that had a major influence on later parallel designs
SMP -- Shared Memory Processor - a system capable of sharing memory between multiple CPU cores (and hence threads) - originally, it meant Symmetric Multi-Processor, but that usage has disappeared
SOAP -- Simple Object Access Protocol - a networking interface widely used in commercial applications
Socket -- A mechanism for one process to pass data to another, not necessarily on the same system, using normal I/O facilities (see also FIFO)
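Example: a minimal sketch of release-acquire synchronisation in C++11, illustrating the 'Release-acquire' entry above; the release store 'synchronises with' the acquire load, so the consumer is guaranteed to see the payload written before the release. The names are illustrative only.

    #include <atomic>
    #include <iostream>
    #include <thread>

    int payload = 0;                       // ordinary (non-atomic) data
    std::atomic<bool> ready{false};

    void producer()
    {
        payload = 42;                                  // written first
        ready.store(true, std::memory_order_release);  // then published
    }

    void consumer()
    {
        while (!ready.load(std::memory_order_acquire))
            ;                                          // spin until published
        std::cout << payload << "\n";                  // guaranteed to print 42
    }

    int main()
    {
        std::thread c(consumer), p(producer);
        p.join();
        c.join();
    }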
Software -- The programs of a computer system, including the operating system itself (see also firmware and hardware)
Solaris dtrace -- A system debugging mechanism that allows an ordinary user to trace some events that happen in the kernel and are related to their processes
Son of Star Wars -- See ASCI
Spawn -- To create a new (child) process
Specification -- The description of something, in this context often a programming interface
Spin loop -- Where one thread or process waits for another by testing an atomic variable in a tight loop, and exiting when it changes value
SPMD -- Single Program Multiple Data - a form of MIMD where each thread or process runs the same executable
SSE -- Streaming SIMD Extensions - see AVX
Standard -- A specification that is produced by some official body (e.g. ISO) or is widely accepted as the interface to design to
STL -- (C++) Standard Template Library - the old name for the standard library, especially the containers, iterators and algorithms
Strong Memory Model -- Typically, a memory model that guarantees causal consistency
Streaming -- When a program takes a sequence of inputs (e.g. lines), processes them, and writes them as soon as they are ready; many basic Unix utilities (e.g. grep, cat, tr) are streaming (see the sketch below)
Streaming I/O -- Simplex I/O that does not have any form of repositioning (including being closed and reopened)
Sun -- A computer company, now taken over by Oracle
SVD -- Singular Value Decomposition
Synchronisation -- Actions to ensure that parallel operations occur in a particular order (see also asynchronous)
Synchronous -- Performing operations at a specified time; in a parallel context, this implies that they are executed in some serial order
Syntax -- The text forms that are valid in a programming language
SysV shmem -- See POSIX shmem
Task -- A unit of work, such as a procedure (function) call, but more general than that; in threading, usually an asynchronous procedure call that is run as a separate thread
TANSTAAFL -- There Ain't No Such Thing As A Free Lunch - an acronym popularised by the science fiction writer Robert A. Heinlein to indicate that everything has its disadvantages
TCP/IP -- Transmission Control Protocol/Internet Protocol - the currently most widely used low-level interface used to transfer data between systems
TESLA -- A range of NVIDIA's GPUs designed for high-end computation (models include FERMI and KEPLER)
Thread -- In modern usage, a unit of execution that can run asynchronously from other threads, but can share data with them; threads usually exist in the context of a process
Thread Interference -- When two threads do something that conflicts, so that the program misbehaves
Thread pool -- A collection of threads that can be used to provide a thread when one is needed to execute a task
Transaction -- An action (usually involving multiple agents or threads) that is packaged to be atomic - i.e. it is independent of all other transactions and has no effect if it is cancelled
UPC -- Unified Parallel C - a widely-hyped PGAS model which does not seem to be much used, and is not recommended
Update -- Any access that can potentially change the contents of a data location, whether it does or not
USA -- United States of America
Variable -- The name of a data location that can be updated
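Example: a minimal sketch of a streaming program in the Unix-filter style, illustrating the 'Streaming' entry above; it reads lines as they arrive, converts each to upper case, and writes it out immediately rather than waiting for all of the input.

    #include <cctype>
    #include <iostream>
    #include <string>

    int main()
    {
        std::string line;
        while (std::getline(std::cin, line)) {
            for (char &c : line)
                c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
            std::cout << line << "\n";   // written as soon as it is ready
        }
    }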
Vector Hardware/Vector System -- A system which provides major SIMD capabilities - these were the dominant supercomputers of the 1970s and 1980s, but have been superseded by SSE etc. and GPUs
Vectorisation -- Ensuring that a program uses the SIMD facilities of its CPU - compilers often do this automatically for SSE etc. at high levels of optimisation (see the sketch below)
Virtual shared memory -- A programming model that makes multiple cores with distributed memory appear to the programmer as if they had shared memory
Weak Memory Model -- A memory model that is not a strong memory model (q.v.)
Web/World Wide Web -- The collection of information accessible via a browser over the Internet
Web of a Million Lies -- A gross underestimate; while the information on the Web is extremely useful, a high proportion of it is misleading or just plain wrong
Web pages -- Locations (i.e. views of data) on the Web
WG14 -- The ISO SC22 working group responsible for the C standard
WG21 -- The ISO SC22 working group responsible for the C++ standard
WG5 -- The ISO SC22 working group responsible for the Fortran standard
Worker -- See Master-Worker
x86/x86-64 -- The Intel architecture used by almost all modern systems larger than tablets; the 64-bit version was developed by AMD (see AMD64) and Intel calls it x86-64
Xeon Phi -- Intel's range of attached processors for x86 systems, using the MIC architecture, which provide a large number of CPU cores
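Example: a hedged sketch of a loop that a compiler can usually vectorise automatically for SSE or AVX (typically at -O2 or -O3), illustrating the 'Vectorisation' entry above; each iteration is independent, so several elements can be processed by a single SIMD instruction.

    #include <iostream>
    #include <vector>

    int main()
    {
        const std::size_t n = 1000;
        std::vector<float> x(n, 1.0f), y(n, 2.0f);
        const float a = 3.0f;

        for (std::size_t i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];    // SAXPY: no dependence between iterations

        std::cout << y[0] << "\n";     // prints 5
    }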