VectorLib
VectorLib is the vector functions part of OptiVec. This file describes the basic principles of the OptiVec libraries and gives an overview of VectorLib. The new object-oriented interface, VecObj, is described in chapter 3. MatrixLib and CMATH are described separately.
Contents
1. Introduction
2. The Elements of OptiVec Routines
3. C++ only: VecObj, the Object-Oriented Interface for VectorLib
4. VectorLib Functions and Routines: A Short Overview
5. Error Handling
6. Trouble-Shooting
7. The Include-Files and Units of OptiVec
1. Introduction
OptiVec offers a powerful set of routines for numerically demanding applications, making the philosophy of vectorized programming available for C/C++ and Pascal/Delphi languages. It serves to overcome the limitations of loop management of conventional compilers – which proved to be one of the largest obstacles in the programmer's way towards efficient coding for scientific and data analysis applications.
In comparison to the old vector language APL, OptiVec has the advantage of being incorporated into the modern and versatile languages C/C++ and Pascal/Delphi. Recent versions of C++ and Fortran already offer some sort of vector processing, by virtue of iterator classes using templates (C++) and field functions (Fortran 90). Both of these, however, are basically a convenient means of letting the compiler write the loop for you and then compile it to the usual inefficient code. The same is true for most implementations of the popular BLAS (Basic Linear Algebra Subroutine) libraries.
In comparison to these approaches, OptiVec is superior mainly with respect to execution speed – on the average by a factor of 2-3, in some cases even up to 8. The performance is no longer limited by the quality of your compiler, but rather by the real speed of the processor!
There is a certain overlap in the range of functions offered by OptiVec and by BLAS, LINPACK, and other libraries and source-code collections. However, the latter must be compiled, and, consequently, their performance is determined mainly by the quality of the compiler chosen. To the best of our knowledge, it is our product, OptiVec, that offers the first comprehensive vectorized-functions library realized in a true Assembler implementation.
- All operators and mathematical functions of C/C++ are implemented in vectorized form; additionally, many more mathematical functions are included which would normally have to be calculated by more or less complicated combinations of existing functions. Not only the execution speed, but also the accuracy of the results is greatly improved.
- Building blocks for statistical data analysis are supplied.
- Derivatives, integrals, and interpolation schemes are included.
- Fast Fourier Transform techniques allow for efficient convolutions, correlation analyses, spectral filtering, and so on.
- Graphical representation of data offers a convenient way of monitoring the results of vectorized calculations.
- A wide range of optimized matrix functions like matrix arithmetics, algebra, decompositions, data fitting, etc. is offered by MatrixLib. TensorLib is planned as a future extension of these concepts to general multi-dimensional arrays.
- Each function exists for every data type for which this is reasonable. The data type is signalled by the prefix of the function name. No implicit name mangling or other specific C++ features are used, which makes OptiVec usable in plain-C as well as in C++ programs. Moreover, the names and the syntax of nearly all functions are the same in the C/C++ and Pascal/Delphi languages.
- The input and output vectors/matrices of VectorLib and MatrixLib routines may be of variable size, and it is possible to process only a part (e.g., the first 100 elements, or every 10th element) of a vector, which is another important advantage over other approaches, where only whole arrays are processed.
- A new object-oriented interface for C++, named VecObj, encapsulates all vector functions, offering even easier use and increased memory safety.
- Using OptiVec routines instead of loops can make your source code much more compact and far more readable.
The wide range of routines and functions covered by OptiVec, the high numerical efficiency, and the increased ease of programming make this package a powerful programming tool for scientific and data analysis applications, competing with (and often beating) many high-priced integrated systems, but embedded into your favourite programming language.
1.1 Why Vectorized Programming Pays Off on the PC
To process one-dimensional data arrays or "vectors", a programmer would normally write a loop over all vector elements. Similarly, two- or higher-dimensional arrays ("matrices" or "tensors") are usually processed through nested loops over the indices in all dimensions. The alternative to this classic style of programming is vector and matrix functions.
Vector functions act on whole arrays/vectors instead of single scalar arguments. They are the purest form of "vectorization", i.e., organisation of program code (by clever compilers or by the programmer himself) in such a way as to optimize vector treatment.
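The difference between the two styles can be sketched in plain C. The function below is our own hypothetical stand-in, not an OptiVec routine; it only illustrates the point that in vectorized style the loop lives inside the function, not in the caller:

```c
#include <stddef.h>

/* Hypothetical vector function (illustration only, not an OptiVec name):
   squares all n elements of x into y. The loop over the elements is
   hidden inside the function. */
static void vsq(double *y, const double *x, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = x[i] * x[i];
}
```

Where conventional code would write `for (i = 0; i < 1000; i++) y[i] = x[i] * x[i];`, the vectorized style is the single call `vsq( y, x, 1000 );` – and it is this function body that a library like OptiVec can hand-optimize once, for all callers.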
Vectorization has always been the magic formula for supercomputers with their multiprocessor parallel architectures. On these architectures, one tries to spread the computational effort equally over the available processors, thus maximizing execution speed. The so-called "divide and conquer" algorithms break down more complicated numerical tasks into small loops over array elements. Sophisticated compilers then find the most efficient way to distribute the array elements among the processors. Many supercomputer compilers also come with a large set of predefined proprietary vector and matrix functions for many basic tasks. These vectorized functions offer the best way to achieve maximum throughput.
Obviously, the massive parallel processing of, say, a Cray is not possible even on modern PCs with their modest 2- or 4-processor core configurations, let alone on the classical single-processor PC. Consequently, at first sight, it might seem difficult to apply the principle of vectorized programming to the PC. Actually, however, many vector-specific optimizations are possible, even for computers with only one CPU. Most of these optimizations are not available to present compilers. Rather, one has to go down to the machine-code level. Hand-optimized, Assembler-written vector functions outperform compiled loops by a factor of two to three, on the average. This means that vectorization, properly done, is indeed worth the effort for PC programs as well.
1.1.1 General OptiVec Optimization Strategies
Here are the most important optimization strategies, employed in OptiVec to boost the performance on any PC (regardless of the number of processor cores):
Prefetch of chunks of vector elements: Beginning with the Pentium III processor, Intel introduced the very useful feature of explicit memory prefetch. With these commands, it is possible to "tell" the processor to fetch data from memory sufficiently in advance, so that no time is wasted waiting for them when they are actually needed.
Cache control: The Pentium III+ processors offer the possibility to mark data as "temporal" (will be used again) or "non-temporal" (used only once) while they are fetched or stored. In OptiVec functions, it is assumed that input vectors (and matrices) will not be used again, whereas the output vectors are likely to become the input of some ensuing procedure. Consequently, the cache is bypassed while loading input data, but the output data are written into the cache. Of course, this approach breaks down if the vectors or matrices become too large to fit into the cache. For these cases, a large-vector version of the OptiVec libraries is available which bypasses the cache also while writing the output vectors. For simple arithmetic functions, up to 20% in speed is gained as compared to the small-and-medium-size version. On the other hand, as this large-vector version effectively switches the cache off, a drastic performance penalty (up to a factor of three or four!) will result if it is used for smaller systems. For the same reason, before resorting to the large-vector version, you should carefully check whether your problem could perhaps be split up into smaller vectors; this would allow you to achieve the much higher performance resulting from efficient data caching.
Use of SIMD commands: You might wonder why this strategy is not listed first. The SSE or "Streaming Single-Instruction-Multiple-Data Extensions" of the Pentium III, Pentium 4 and their successors provide explicit support for vectorized programming with floating-point data in float / single or double precision (the latter only from the Pentium 4 on). At first sight, therefore, they should revolutionize vector programming. Given the normal relation between processor and data-bus speeds, however, many of the simple arithmetic operations are data-transfer limited, and the use of SIMD commands does not make the large difference (with respect to well-written FPU code) it could make otherwise. In most cases, the advantage of treating four floats in a single command melts down to a 20-30% increase in speed (which is not that bad, anyway!). For more complicated operations, on the other hand, SIMD commands often cannot be employed, either because conditional branches have to be taken for each vector element individually, or because the "extra" accuracy and range available with traditional FPU commands (with their internal extended accuracy) allow the algorithms to be simplified so much that the FPU code is still faster. As a consequence, we use SIMD commands only where a real speed gain is possible. Please note, however, that the SIMD-employing library versions (P6, P7, etc.) generally sacrifice 2-3 digits of accuracy in order to attain the described speed gain. If this is not acceptable for your specific task, please stay with the P4 libraries.
Preload of floating-point constants: Floating-point constants employed in the evaluation of mathematical functions are loaded into floating-point registers outside the actual loop and stay there as long as they are needed. This saves a large number of loading/unloading operations which would be necessary if a mathematical function were called for each element of a vector separately.
Full XMM and FPU stack usage: Where necessary, all eight (in 64-bit: all sixteen) XMM registers and/or all eight coprocessor registers are employed.
Superscalar scheduling: By careful "pairing" of commands whose results do not depend upon each other, the two integer pipes and the two fadd/fmul units of the processor are used as efficiently as possible.
Loop unrolling: Where optimum pairing of commands cannot be achieved for single elements, vectors are often processed in chunks of two, four, or even more elements. This makes full use of the parallel-processing capabilities of the Pentium and its successors. Moreover, the relative amount of time spent on loop management is significantly reduced. In connection with the data prefetching described above, the depth of the unrolled loops is most often adapted to the cache-line size of 32 bytes (PentiumXX) or 64 bytes (AMD 64 X2 or Core2 Duo).
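The control structure of an unrolled loop can be sketched in plain C. The actual OptiVec implementation is hand-written Assembler; this sketch (our own, with an assumed chunk depth of four) only shows how the main loop processes chunks and how a remainder loop picks up the leftover elements:

```c
#include <stddef.h>

/* Unrolled vector addition: four elements per iteration of the main
   loop (four independent operations that a superscalar CPU can overlap),
   plus a remainder loop for the last n % 4 elements. */
static void add_unrolled(float *z, const float *x, const float *y, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        z[i]     = x[i]     + y[i];
        z[i + 1] = x[i + 1] + y[i + 1];
        z[i + 2] = x[i + 2] + y[i + 2];
        z[i + 3] = x[i + 3] + y[i + 3];
    }
    for (; i < n; ++i)          /* leftover elements */
        z[i] = x[i] + y[i];
}
```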
Simplified addressing: The addressing of vector elements is still a major source of inefficiency with present compilers. Switching back and forth between input and output vectors, they perform a large number of redundant addressing operations. The strict (and easy!) definitions of all OptiVec functions allow these operations to be reduced to a minimum.
Replacement of floating-point by integer commands: For any operations on floating-point numbers that can also be performed using integer commands (like copying, swapping, or comparing to preset values), the faster method is consistently employed.
Strict precision control: C compilers convert a float into a double – Borland Pascal/Delphi even into extended – before passing it to a mathematical function. This approach was useful in times when disk memory was too scarce to include separate functions for each data type in the .LIB files, but it is simply inefficient on modern PCs. Consequently, no such implicit conversions are present in OptiVec routines. Here, a function of a float is calculated to float (i.e., single) precision, wasting no time on the calculation of more digits than necessary – which would be discarded anyway. There is also a brute-force approach to precision control: you can call V_setFPAccuracy( 1 ); to actively switch the FPU to single precision, if that is enough for a given application. Thereby, execution can be slightly sped up on Pentium and later CPUs. Be prepared, however, to accept even lower-than-single accuracy of your end results if you elect this option. For further details and precautions, see V_setFPAccuracy.
All-inline coding: All external function calls are eliminated from the inner loops of the vector processing. This saves the execution time needed for the "call / ret" pairs and for loading the parameters onto the stack.
Cache-line matching of local variables
The Level-1 cache of the Pentium and its 32-bit successors is organized in lines of 32 bytes each; modern 64-bit processors use 64-byte lines. Many OptiVec functions need double-precision or extended-precision real local variables on the stack (mainly for integer/floating-point conversions or for range checking). Present compilers align the stack on 4-byte boundaries, which means there is a certain chance that the 8 bytes of a double or the 10 bytes of an extended, stored on the stack, will cross a cache-line boundary. This, in turn, would lead to a cache line-break penalty, deteriorating the performance. Consequently, those OptiVec functions where this is an issue use special procedures to align their local variables on 8-byte (for doubles), 16-byte (for extendeds), or 32-byte boundaries (for XMM values).
Unprotected and reduced-range functions
OptiVec offers alternative forms of some mathematical functions, where you have the choice between the fully protected variant with error handling and another, unprotected variant without. In the case of the integer power functions, for example, the absence of error checking allows the unprotected versions to be vectorized much more efficiently. Similarly, the sine and cosine functions can be coded more efficiently for arguments that the user can guarantee to lie in the range −2π to +2π. In these special cases, the execution time may be reduced by up to 40%, depending on the hardware environment. This increased speed must always be balanced against the increased risk, though: if any input element outside the valid range is encountered, the unprotected and reduced-range functions will crash without warning.
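The trade-off can be illustrated with a scalar sketch of our own (these are not OptiVec's actual functions, and the overflow guard shown is deliberately crude): the unprotected variant does nothing but the arithmetic, while the protected variant spends extra work on checking its input first.

```c
#include <math.h>

/* Unprotected integer power: binary exponentiation with no checks at
   all. Fast, but the caller must guarantee the input cannot overflow. */
static double ipow_unprotected(double x, unsigned n)
{
    double r = 1.0;
    while (n) {
        if (n & 1u) r *= x;
        x *= x;
        n >>= 1;
    }
    return r;
}

/* Protected variant: a (crude, illustrative) guard catches obviously
   overflow-prone input before delegating to the fast kernel. */
static double ipow_protected(double x, unsigned n)
{
    if (fabs(x) > 1.0 && n > 1024u)
        return HUGE_VAL;
    return ipow_unprotected(x, n);
}
```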
1.1.2 Multi-Processor Optimization
Multithread support
All the above being said about single-CPU PCs, multiprocessor computers (with Intel's Core i3, i5, i7, Core2 Duo, AMD's Athlon 64 X2, or workstations and servers equipped with 2 or 4 PentiumXX chips) do allow the operating system to distribute threads among the available processors, doubling or quadrupling the overall performance. For that, any functions running in parallel must be prevented from interfering with each other through read/write operations on global variables. With very few exceptions (namely the plotting functions, which have to use global variables to store the current window and coordinate-system settings, and the non-linear data-fitting functions), all OptiVec functions are reentrant and may run in parallel.
Be careful with multithreading if you are using the P6 or P7 versions of OptiVec: the earlier releases of 32-bit Windows do not save the XMM registers (employed in the SIMD commands) during task switches. No such problems have been found with Windows XP or Vista.
When designing your multithread application, you have two options: functional parallelism and data parallelism.
Functional Parallelism
If different threads perform different tasks – i.e., they are functionally different – one speaks of functional parallelism. As an example, consider one thread handling user input/output, while another performs background calculations. Even on a single-core CPU, this kind of multithreading may offer advantages (e.g., the user interface does not block during extensive background calculations, but still accepts input). On a multi-core computer, the two (or more) threads can actually run simultaneously on different processor cores. In general, however, the load balance between the processor cores is far from perfect: often, one processor is running at maximum load, while another sits idle, waiting for input. Still, functional multithreading is the best option whenever your numerical tasks involve vectors and matrices of only small-to-moderate size.
Data Parallelism
In order to improve the load balance between the available processor cores, thereby maximizing throughput, it is possible to employ classical parallel processing: the data to be processed are split up into several chunks, each thread getting one of these chunks. This is aptly called data parallelism. The usefulness of this approach is limited by the overhead involved in the data distribution and in the thread-to-thread communication. Moreover, there are always parts of the code which need to be processed sequentially and cannot be parallelized. Therefore, data parallelism pays off only for larger vectors and matrices. Typical break-even sizes range from about 100 elements (for the calculation of transcendental functions of complex input values) to several 10,000 elements (as in the simple arithmetic functions). Only when your vectors and matrices are considerably larger than that threshold is the performance actually improved over a functional-parallelism approach. The boost then quickly approaches (but never exactly reaches) the theoretical limit of a factor equal to the number of processor cores available.
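A minimal data-parallel sketch (our own, using POSIX threads rather than the OptiVec thread engine): the vector is split into two chunks, each summed by its own thread, and the partial results are combined afterwards. The chunking, thread creation, and join are exactly the overhead that makes this approach pay off only for large vectors.

```c
#include <pthread.h>
#include <stddef.h>

/* One chunk of work: a sub-vector and a slot for its partial sum. */
typedef struct { const double *x; size_t n; double sum; } Chunk;

static void *sum_chunk(void *arg)
{
    Chunk *c = arg;
    c->sum = 0.0;
    for (size_t i = 0; i < c->n; ++i)
        c->sum += c->x[i];
    return NULL;
}

/* Split the vector in two halves, sum one half in a worker thread and
   the other in the calling thread, then combine the partial sums. */
static double parallel_sum(const double *x, size_t n)
{
    size_t half = n / 2;
    Chunk a = { x, half, 0.0 };
    Chunk b = { x + half, n - half, 0.0 };
    pthread_t t;
    pthread_create(&t, NULL, sum_chunk, &a);  /* first chunk in a worker */
    sum_chunk(&b);                            /* second chunk here */
    pthread_join(t, NULL);
    return a.sum + b.sum;
}
```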
1.1.3 CUDA Device Support
Modern graphics cards are equipped with powerful multi-processor capacity of up to several hundred processor cores running in parallel. In recent years, interfaces have been developed that allow this processing capacity to be exploited not only for graphics rendering, but also for general calculations. One of these approaches is the CUDA concept by NVIDIA. Practically all current NVIDIA graphics cards support CUDA. Additionally, dedicated CUDA hardware is offered by NVIDIA with the "Tesla" and "Fermi" board families. With the "C" libraries (e.g., OVVC8C.LIB), OptiVec offers a simple way to use a CUDA device for vector/matrix calculations without the hassle of actually programming in CUDA. There are a number of points to be considered:
- Obviously, the "C" libraries can be used only with a CUDA-enabled device installed. This means only NVIDIA products are supported.
- Of the compilers supported by OptiVec, NVIDIA currently provides CUDA support only for MS Visual C++. This means there are presently no CUDA OptiVec libraries available for the Embarcadero / Borland compilers.
- It is necessary to have the latest display driver installed. Even brand-new computers most often do not have the latest drivers; they must be selected and downloaded from NVIDIA's website, www.nvidia.com.
- Even a sub-$100 graphics card can boost the performance of certain functions on a computer with a medium-range CPU by a factor of 10, and dedicated hardware by much more. However, the combination of a high-end CPU with a low-end graphics card (as is often found in laptop computers) will benefit only marginally from the "C" libraries.
- The cost of swapping data back and forth between mainboard memory and graphics memory is so high that it can be "earned back" only for quite large vectors and matrices. E.g., for mathematical functions like the sine or exponential function, CUDA pays off from about 100,000 vector elements on. For matrix multiplication, pay-back occurs in the region of 200x200 elements. All OptiVec functions check whether using the CUDA device makes sense and decide accordingly whether to source out processing to the graphics processor or to stay on the CPU.
- Using CUDA with OptiVec is as easy as simply linking with the "C" library and with the import libraries provided by NVIDIA. No modifications of your source code are necessary. On the other hand, by eliminating the repeated data transfers for each function, programming directly for CUDA devices with NVIDIA's CUDA SDK can lead to considerably higher performance than is possible with the OptiVec "C" libraries.
- As support for double-precision floating-point is more or less restricted to the expensive Tesla and Fermi boards, OptiVec currently uses the CUDA device only for single precision.
- The OptiVec "C" libraries actually use DLLs developed by NVIDIA. These have to be installed along with the OptiVec libraries.
- NVIDIA might at any time change the licence terms for their CUDA libraries, so that we might at some point no longer be able to include them in our distributions and/or to support CUDA at all.
1.1.4 Choosing the right OptiVec Library
Whenever you want your application to run on a wide range of supported platforms, and your vectors and matrices are only of small-to-moderate size, we recommend using the general-purpose libraries: OVVC4.LIB (for MS Visual C++), VCF4W.LIB (for Borland C++), or the units in OPTIVEC\LIB4 (for Delphi). These libraries combine good performance with backward compatibility to older hardware, down to the 486DX, Pentium, and old models of the Athlon. They are all multithread-safe and support functional parallelism. If you do not need full floating-point accuracy and that degree of backward compatibility, you can get higher performance by switching to the P6, P7, or P8 libraries (marked by the respective number in the library name).
For large vectors/matrices on single-core machines from the Pentium III on, we offer versions that gain some performance by simply bypassing the data cache. These Large-Vector Libraries are marked by the letter "L": OVVC6L.LIB (for MS Visual C++), VCF6L.LIB (for Borland C++), or the units in OPTIVEC\LIB6L (for Delphi). Replace the "6" with "7" to get the Pentium 4+ versions, and so on. If misused for smaller vectors/matrices, the Large-Vector libraries will perform significantly slower than the general-purpose libraries!
Finally, for large vectors/matrices on multi-core machines, the multi-core optimized libraries actively distribute the workload over the available processor cores for data-parallel execution. These libraries are marked by the letter "M", as in OVVC7M.LIB (for MS Visual C++, using SSE2), VCF4M.LIB (for Borland C++, full FPU accuracy), or the units in OPTIVEC\LIB8M (for Delphi, using SSE3). These libraries are designed for the AMD 64 X2, Intel Core2 Duo, or machines equipped with several discrete processors of the Pentium 4+ level. The CUDA libraries are based on the "M" libraries and are marked by the letter "C", as, e.g., in OVVC8C.LIB.
The "M" and "C" libraries will still run on single-core machines, but – due to the thread-management overhead – somewhat slower than the general-purpose libraries. Although the "M" libraries are designed with medium-to-large vectors in mind, the penalty for using them with smaller vectors is almost negligible, as the OptiVec thread engine automatically executes a function in a single thread if the vector size is too small for parallel execution to earn back the cost of the thread-to-thread communication.
If you use the "M" or "C" libraries, your program must call V_initMT( nAvailProcCores ) before any of the vector functions.
Back to VectorLib Table of Contents
OptiVec home
2. Elements of OptiVec Routines
2.1 Synonyms for Some Data Types
To increase the versatility and completeness of OptiVec, additional data types are defined in <VecLib.h> or the unit VecLib:
a) C/C++ only:
The data type ui (short for "unsigned index") is used for the indexing of vectors and is defined as "unsigned int".
The 64-bit integer data type (__int64 in BC++ Builder and MS Visual C++, Int64 in Delphi) is called quad (for "quadword integer") in OptiVec.
In the 32-bit versions, the type quad is always signed. Functions for unsigned 64-bit integers are available only in the 64-bit versions of OptiVec.
Borland C++ below C++Builder 2006 only: For these older BC versions, which did not directly support 64-bit integers, the data type quad is implemented as a struct of two 32-bit values. Floating-point numbers (preferably long doubles with their 64-bit mantissa) have to be used as intermediates. The necessary interface functions are setquad, quadtod and _quadtold. Alternatively, the two 32-bit halves may be set explicitly, as in:
xq.Hi = 0x00000001UL;
xq.Lo = 0x2468ABCDUL;
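The arithmetic behind this two-halves representation can be sketched in portable C (our own stand-in struct and helper name, not the OptiVec interface; we also use double as the intermediate here purely for illustration, so only values up to 53 bits are exact, whereas the OptiVec interface functions prefer long double):

```c
#include <stdint.h>

/* Hypothetical stand-in for the Hi/Lo representation described above:
   a 64-bit quantity stored as two 32-bit halves. */
typedef struct { uint32_t Hi, Lo; } quad2;

/* Reassemble the value as a floating-point intermediate: Hi * 2^32 + Lo. */
static double quad2_to_double(quad2 q)
{
    return (double)q.Hi * 4294967296.0 + (double)q.Lo;
}
```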
The data type extended, which is familiar to Pascal/Delphi programmers, is defined as a synonym for "long double" in OptiVec for C/C++. As Visual C++ does not support 80-bit reals, we define extended as "double" in the OptiVec versions for that compiler.
b) Delphi only:
The data type Float, which is familiar to C/C++ programmers, is defined as a synonym for Single. We prefer to have the letters defining the real-number data types in alphabetical proximity: "D" for Double, "E" for Extended, and "F" for Float. Possible future 128-bit and 256-bit real numbers could find their place in this series as "G" for Great and "H" for Hyper.
For historical reasons (dating back to the development of Turbo Pascal), the various integer data types have a somewhat confusing nomenclature in Delphi. In order to make the derived function prefixes compatible with the C/C++ versions of OptiVec, we define a number of synonyms, as described in the following table:
type                                | Delphi name | synonym | derived prefix
8 bit signed                        | ShortInt    | ByteInt | VBI_
8 bit unsigned                      | Byte        | UByte   | VUB_
16 bit signed                       | SmallInt    | –       | VSI_
16 bit unsigned                     | Word        | USmall  | VUS_
32 bit signed                       | LongInt     | –       | VLI_
32 bit unsigned                     | –           | ULong   | VUL_
64 bit signed                       | Int64       | QuadInt | VQI_
64 bit unsigned (x64 version only!) | UInt64      | UQuad   | VUQ_
16/32 bit signed                    | Integer     | –       | VI_
16/32 bit unsigned                  | Cardinal    | UInt    | VU_
To have a Boolean data type available which is of the same size as Integer, we define the type IntBool. It is equivalent to LongBool in Delphi. You will see the IntBool type as the return value of many mathematical VectorLib functions.
2.2 Complex Numbers
As described in greater detail for CMATH, OptiVec supports complex numbers both in cartesian and polar format.
If you use only the vectorized complex functions (but not the scalar functions of CMATH), you need not explicitly include CMATH. In this case, the following complex data types are defined in <VecLib.h> for C/C++:
typedef struct { float Re, Im; } fComplex;
typedef struct { double Re, Im; } dComplex;
typedef struct { extended Re, Im; } eComplex;
typedef struct { float Mag, Arg; } fPolar;
typedef struct { double Mag, Arg; } dPolar;
typedef struct { extended Mag, Arg; } ePolar;
The corresponding definitions for Pascal/Delphi are contained in the unit VecLib:
type fComplex = record Re, Im: Float; end;
type dComplex = record Re, Im: Double; end;
type eComplex = record Re, Im: Extended; end;
type fPolar = record Mag, Arg: Float; end;
type dPolar = record Mag, Arg: Double; end;
type ePolar = record Mag, Arg: Extended; end;
If, for example, a complex number z is declared as "fComplex z;", the real and imaginary parts of z are available as z.Re and z.Im, respectively. Complex numbers are initialized either by setting the constituent parts separately to the desired values, e.g.,
z.Re = 3.0; z.Im = 5.7;
p.Mag = 4.0; p.Arg = 0.7;
(of course, the assignment operator is := in Pascal/Delphi).
Alternatively, the same initialization can be accomplished by the
functions fcplx or fpolr:
C/C++:
z = fcplx( 3.0, 5.7 );
p = fpolr( 4.0, 0.7 );
Pascal/Delphi:
fcplx( z, 3.0, 5.7 );
fpolr( p, 4.0, 0.7 );
For doubleprecision complex numbers, use dcplx and dpolr, for extendedprecision complex numbers, use ecplx and epolr.
Pointers to arrays or vectors of complex numbers are declared using the data types cfVector, cdVector, and ceVector (for cartesian complex) and pfVector, pdVector, and peVector (for polar complex) described below.
2.3 Vector Data Types
We define, as usual, a "vector" as a one-dimensional array of data containing at least one element, with all elements being of the same data type. Using a more mathematical definition, a vector is a rank-one tensor. A two-dimensional array (i.e., a rank-two tensor) is denoted as a "matrix", and higher dimensions are always referred to as "tensors".
In contrast to other approaches, VectorLib does not allow zero-size vectors!
The basis of all VectorLib routines is formed by the various vector data types given below, declared in <VecLib.h> or the unit VecLib. In contrast to fixed-size static arrays, the VectorLib types use dynamic memory allocation and allow for varying sizes. Because of this increased flexibility, we recommend that you predominantly use the latter. Here they are:
C/C++
typedef  float *  fVector 
typedef  double *  dVector 
typedef  extended *  eVector 
typedef  fComplex *  cfVector 
typedef  dComplex *  cdVector 
typedef  eComplex *  ceVector 
typedef  fPolar *  pfVector 
typedef  dPolar *  pdVector 
typedef  ePolar *  peVector 
typedef  int *  iVector 
typedef  byte *  biVector 
typedef  short *  siVector 
typedef  long *  liVector 
typedef  quad *  qiVector 
typedef  uquad *  uqVector 
typedef  unsigned *  uVector 
typedef  unsigned char *  ubVector 
typedef  unsigned short *  usVector 
typedef  unsigned long *  ulVector 
typedef  ui *  uiVector 
 
Pascal/Delphi
type  fVector  = ^Float; 
type  dVector  = ^Double; 
type  eVector  = ^Extended; 
type  cfVector  = ^fComplex; 
type  cdVector  = ^dComplex; 
type  ceVector  = ^eComplex; 
type  pfVector  = ^fPolar; 
type  pdVector  = ^dPolar; 
type  peVector  = ^ePolar; 
type  iVector  = ^Integer; 
type  biVector  = ^ByteInt; 
type  siVector  = ^SmallInt; 
type  liVector  = ^LongInt; 
type  qiVector  = ^QuadInt; 
type  uVector  = ^UInt; 
type  ubVector  = ^UByte; 
type  usVector  = ^USmall; 
type  ulVector  = ^ULong; 
  

Internally, a data type like fVector means "pointer to float", but you may think of a variable declared as fVector rather in terms of a "vector of floats".

Note: in connection with Windows programs, the letter "l" or "L" is often used to denote "long int" variables. In order to prevent confusion, however, here the data type "long int" is signalled by "li" or "LI", and the data type "unsigned long" by "ul" or "UL". Conflicts with prefixes for "long double" vectors are avoided by deriving these from the alias name "extended" and using "e", "ce", "E", and "CE", as described above and in the following.
C/C++ specific:
Vector elements can be accessed either with the [] operator, as in VA[375] = 1.234;,
or by the type-specific functions VF_element (which returns the value of the
desired vector element, but cannot be used to overwrite it) and
VF_Pelement (which returns a pointer to a vector element).
Especially with some older Borland C versions (which have a bug in their
pointer arithmetic), VF_Pelement has to be used instead of the syntax
X+n.
In your programs, you may mix these vector types with the static arrays of classic C style.
For example:
float a[100]; /* classic static array */
fVector b = VF_vector(100); /* VectorLib vector */
VF_equ1( a, 100 ); /* set the first 100 elements of a equal to 1.0 */
VF_equC( b, 100, 3.7 ); /* set the first 100 elements of b equal to 3.7 */
Pascal/Delphi specific:
As in C/C++, you may mix these vector types with the static arrays of classic Pascal style. Static arrays have to be passed to OptiVec functions with the "address of" operator. Here, the above example reads:
a: array[0..99] of Single; (* classic static array *)
b: fVector; (* VectorLib vector *)
b := VF_vector(100);
VF_equ1( @a, 100 ); (* set first 100 elements of a = 1.0 *)
VF_equC( b, 100, 3.7 ); (* set first 100 elements of b = 3.7 *)
Delphi also offers dynamically allocated arrays, which may also be used as arguments for OptiVec functions. The following table compares the pointer-based vectors of VectorLib with the array types of Pascal/Delphi:
 OptiVec vectors  Pascal/Delphi static/dynamic arrays 
alignment of first element  on 32-byte boundary for optimum cache-line matching  2- or 4-byte boundary (may cause a cache-line-break penalty for double, QuadInt) 
alignment of following elements  packed (i.e., no dummy bytes between elements, even for 8-, 10-, and 16-bit types)  arrays must be declared as "packed" for Delphi 4+ to be compatible with OptiVec 
index range checking  none  automatic with builtin size information 
dynamic allocation  function VF_vector, VF_vector0  procedure SetLength (Delphi 4+ only) 
initialization with 0  optional by calling VF_vector0  always (Delphi 4+ only) 
deallocation  function V_free, V_freeAll  procedure Finalize (Delphi 4+ only) 
reading single elements  function VF_element: a := VF_element(X,5); Delphi 4+ only: typecast into array also possible: a := fArray(X)[5];  index in brackets: a := X[5]; 
setting single elements  function VF_Pelement: VF_Pelement(X,5)^ := a; Delphi 4+ only: typecast into array also possible: fArray(X)[5] := a;  index in brackets: X[5] := a; 
passing to OptiVec function  directly: VF_equ1( X, sz );  addressof operator: VF_equ1( @X, sz ); 
passing subvector to OptiVec function  function VF_Pelement: VF_equC( VF_Pelement(X,10), sz-10, 3.7 );  address-of operator: VF_equC( @X[10], sz-10, 3.7 ); 
Summarizing the properties of OptiVec vectors and of Pascal/Delphi arrays, the latter are somewhat more convenient and, due to the index range checking, safer, whereas the pointer-based OptiVec vectors are processed faster (due to the better alignment and to the absence of checking routines).
Back to VectorLib Table of Contents
OptiVec home
2.4 Vector Function Prefixes
In the plain-C, Pascal and Delphi versions, every OptiVec function has a prefix denoting the data type on which it acts. (Read here about the overloaded C++ functions of VecObj.)
Prefix  Arguments and return value 
VF_  fVector and float 
VD_  dVector and double 
VE_  eVector and extended (long double) 
VCF_  cfVector and fComplex 
VCD_  cdVector and dComplex 
VCE_  ceVector and eComplex 
VPF_  pfVector and fPolar 
VPD_  pdVector and dPolar 
VPE_  peVector and ePolar 
VI_  iVector and int / Integer 
VBI_  biVector and byte / ByteInt 
VSI_  siVector and short int / SmallInt 
VLI_  liVector and long int / LongInt 
VQI_  qiVector and quad / QuadInt 
VU_  uVector and unsigned / UInt 
VUB_  ubVector and unsigned char / UByte 
VUS_  usVector and unsigned short / USmall 
VUL_  ulVector and unsigned long / ULong 
VUQ_  uqVector and uquad / UQuad (for Win64 only!) 
VUI_  uiVector and ui 
V_  (datatype conversions like V_FtoD, datatype independent functions like V_initPlot) 
3. VecObj, the Object-Oriented Interface for VectorLib
VecObj, the object-oriented C++ interface to the OptiVec vector functions, was written by Brian Dale, Case Western Reserve University.
Among the advantages it offers are the following:
 automatic allocation and deallocation of memory
 simplified vector handling
 greatly reduced risk of memory leaks
 increased memory access safety
 intuitive overloaded operators
 simpler function calls
There are a few drawbacks, though, which you should be aware of:
 increased compiler load
 larger overhead (as for any encapsulated C++ code!), leading to
 increased code size
 decreased computational efficiency
 vectors can be processed only as a whole, not in parts
VecObj is contained in the include files <VecObj.h>, <fVecObj.h>, <dVecObj.h> etc., with one include file for each of the data types supported in OptiVec.
To get the whole interface (for all data types at once),
#include <OptiVec.h>.
For access to any of the vector graphics functions, always include <OptiVec.h>.
MS Visual C++ and Embarcadero / Borland C++ Builder (but not previous Borland C++ versions): Programmers should put the directive
"using namespace OptiVec;"
either in the body of any function that uses tVecObj, or in the global declaration part of the program. Placing the directive in the function body is safer, as it avoids potential namespace conflicts in other functions.
The vector objects are defined as classes vector<T>, encapsulating the vector address (pointer) and size.
For easier use, these classes have the alias names fVecObj, dVecObj, and so on, with the data type signalled by the first one or two letters of the class name, in the same way as for the vector types described above.
All functions defined in VectorLib for a specific vector datatype are contained as member functions in the respective tVecObj class.
The constructors are available in four forms:
vector(); // no memory allocated, size set to 0
vector( ui size ); // vector of size elements allocated
vector( ui size, T fill ); // as before, but initialized with value "fill"
vector( const vector<T> &init ); // creates a copy of the vector "init"
For all vector classes, the arithmetic operators
+ - * / += -= *= /=
are defined, with the exception of the polarcomplex vector classes, where only multiplications and divisions, but no additions or subtractions are supported. These operators are the only cases in which you can directly assign the result of a calculation to a vector object, like fVecObj Z = X + Y; or
fVecObj Z = X * 3.5;
Note, however, that the C++ class syntax rules do not allow a very efficient implementation of these operators. The arithmetic member functions are much faster. If speed is an issue, use
Z.addV( X, Y ); or
Z.mulC( X, 3.5 );
(with Z declared as fVecObj) instead of the operator syntax. The operator * refers to element-wise multiplication, not to the scalar product of two vectors.
All other arithmetic and math functions can only be called as member functions of the respective output vector as, for example, Y.exp(X). Although it would certainly be more logical to have these functions defined in such a way that you could write "Y = exp(X)" instead, the member-function syntax was chosen for efficiency considerations: The only way to implement the second variant is to store the result of the exponential function of X first in a temporary vector, which is then copied into Y, thus considerably increasing the workload and memory demands.
While most VecObj functions are member functions of the output vector, there are a number of functions which do not have an output vector. In these cases, they are member functions of an input vector.
Example: s = X.mean();
If you ever need to process a VecObj vector in a "classic" plainC VectorLib function (for example, to process only some part of it), you may use the member functions
getSize() to retrieve its size,
getVector() for the pointer (of data type tVector, where "t" stands for the usual type prefix), and
Pelement( n ) for a pointer to the n'th element.
Continue with chapter 4. VectorLib Functions and Routines: A Short Overview
Copyright © 1998-2013 OptiCode – Dr. Martin Sander Software Development
