VectorLib




VectorLib is the vector-functions part of OptiVec. This file describes the basic principles of the OptiVec libraries and gives an overview of VectorLib. The new object-oriented interface, VecObj, is described in chapter 3. MatrixLib and CMATH are described separately.

Contents

1. Introduction
1.1 Why Vectorized Programming Pays Off on the PC
 1.1.1 General OptiVec Optimization Strategies
 1.1.2 Multi-Processor Optimization
 1.1.3 CUDA Device Support
 1.1.4 Choosing the right OptiVec Library
2. The Elements of OptiVec Routines
2.1 Synonyms for Some Data Types
2.2 Complex Numbers
2.3 Vector Data Types
2.4 Vector Function Prefixes
3. C++ only: VecObj, the Object-Oriented Interface for VectorLib
4. VectorLib Functions and Routines: A Short Overview
4.1 Generation, Initialization and De-Allocation of Vectors
4.2 Index-oriented Manipulations
4.3 Data-Type Interconversions
4.4 More about Integer Arithmetics
4.5 Basic Functions of Complex Vectors
4.6 Mathematical Functions
4.6.1 Rounding
4.6.2 Comparisons
4.6.3 Direct Bit-Manipulation
4.6.4 Basic Arithmetics, Accumulations
4.6.5 Geometrical Vector Arithmetics
4.6.6 Powers
4.6.7 Exponentials and Hyperbolic Functions
4.6.8 Logarithms
4.6.9 Trigonometric Functions
4.7 Analysis
4.8 Signal Processing: Fourier Transforms and Related Topics
4.9 Statistical Functions and Building Blocks
4.10 Data Fitting
4.11 Input and Output
4.12 Graphics
5. Error Handling
5.1 General Remarks
5.2 Integer Errors
5.3 Floating-Point Errors
5.3.1 C/C++ specific
5.3.2 Pascal/Delphi specific
5.3.3 Error Types (Both C/C++ and Pascal/Delphi)
5.4 The Treatment of Denormal Numbers
5.5 Advanced Error Handling: Writing Messages into a File
5.6 OptiVec Error Messages
6. Trouble-Shooting
7. The Include-Files and Units of OptiVec


1. Introduction

OptiVec offers a powerful set of routines for numerically demanding applications, making the philosophy of vectorized programming available for C/C++ and Pascal/Delphi languages. It serves to overcome the limitations of loop management of conventional compilers – which proved to be one of the largest obstacles in the programmer's way towards efficient coding for scientific and data analysis applications.

In comparison to the old vector language APL, OptiVec has the advantage of being incorporated into the modern and versatile languages C/C++ and Pascal/Delphi. Recent versions of C++ and Fortran already offer some sort of vector processing, by virtue of iterator classes using templates (C++) and field functions (Fortran 90). Both of these, however, are basically a convenient means of letting the compiler write the loop for you and then compile it into the usual inefficient code. The same is true of most implementations of the popular BLAS (Basic Linear Algebra Subroutines) libraries. In comparison to these approaches, OptiVec is superior mainly with respect to execution speed – on average by a factor of 2-3, in some cases even up to 8. The performance is no longer limited by the quality of your compiler, but rather by the real speed of the processor!

There is a certain overlap in the range of functions offered by OptiVec and by BLAS, LINPACK, and other libraries and source-code collections. However, the latter must be compiled, and, consequently, their performance is determined mainly by the quality of the compiler chosen. To the best of our knowledge, it is our product, OptiVec, that offers the first comprehensive vectorized-functions library realized in a true Assembler implementation.

  • All operators and mathematical functions of C/C++ are implemented in vectorized form; additionally many more mathematical functions are included which normally would have to be calculated by more or less complicated combinations of existing functions. Not only the execution speed, but also the accuracy of the results is greatly improved.
  • Building blocks for statistical data analysis are supplied.
  • Derivatives, integrals, interpolation schemes are included.
  • Fast Fourier Transform techniques allow for efficient convolutions, correlation analyses, spectral filtering, and so on.
  • Graphical representation of data offers a convenient way of monitoring the results of vectorized calculations.
  • A wide range of optimized matrix functions like matrix arithmetics, algebra, decompositions, data fitting, etc. is offered by MatrixLib.
    TensorLib is planned as a future extension of these concepts for general multidimensional arrays.
  • Each function exists for every data type for which this is reasonable. The data type is signalled by the prefix of the function name. No implicit name mangling or other specific C++ features are used, which makes OptiVec usable in plain-C as well as in C++ programs. Moreover, the names and the syntax of nearly all functions are the same in C/C++ and Pascal/Delphi languages.
  • The input and output vectors/matrices of VectorLib and MatrixLib routines may be of variable size and it is possible to process only a part (e.g., the first 100 elements, or every 10th element) of a vector, which is another important advantage over other approaches, where only whole arrays are processed.
  • A new object-oriented interface for C++, named VecObj, encapsulates all vector functions, offering even easier use and increased memory safety.
  • Using OptiVec routines instead of loops can make your source code much more compact and far better readable.

The wide range of routines and functions covered by OptiVec, the high numerical efficiency, and the increased ease of programming make this package a powerful programming tool for scientific and data-analysis applications, competing with (and often beating) many high-priced integrated systems, but embedded into your favourite programming language.


1.1 Why Vectorized Programming Pays Off on the PC

To process one-dimensional data arrays or "vectors", a programmer would normally write a loop over all vector elements. Similarly, two- or higher-dimensional arrays ("matrices" or "tensors") are usually processed through nested loops over the indices in all dimensions. The alternative to this classic style of programming are vector and matrix functions.
Vector functions act on whole arrays/vectors instead of single scalar arguments. They are the most thorough form of "vectorization", i.e., the organisation of program code (by clever compilers or by the programmer himself) in such a way as to optimize vector treatment.

Vectorization has always been the magic formula for supercomputers with their multi-processor parallel architectures. On these architectures, one tries to spread the computational effort equally over the available processors, thus maximizing execution speed. The so-called "divide and conquer" algorithms break down more complicated numerical tasks into small loops over array elements. Sophisticated compilers then find out the most efficient way how to distribute the array elements among the processors. Many supercomputer compilers also come with a large set of pre-defined proprietary vector and matrix functions for many basic tasks. These vectorized functions offer the best way to achieve maximum throughput.

Obviously, the massive parallel processing of, say, a Cray is not possible even on modern PCs with their modest 2 or 4-processor core configurations, let alone on the classical single-processor PC. Consequently, at first sight, it might seem difficult to apply the principle of vectorized programming to the PC. Actually, however, there are many vector-specific optimizations possible, even for computers with only one CPU. Most of these optimizations are not available to present compilers. Rather, one has to go down to the machine-code level. Hand-optimized, Assembler-written vector functions outperform compiled loops by a factor of two to three, on the average. This means that vectorization, properly done, is indeed worth the effort, also for PC programs.

1.1.1 General OptiVec Optimization Strategies

Here are the most important optimization strategies, employed in OptiVec to boost the performance on any PC (regardless of the number of processor cores):

Prefetch of chunks of vector elements
Beginning with the Pentium III processor, Intel introduced the very useful feature of explicit memory prefetch. With these commands, it is possible to "tell" the processor to fetch data from memory sufficiently in advance, so that no time is wasted waiting for them when they are actually needed.

Cache control
The Pentium III+ processors offer the possibility to mark data as "temporal" (will be used again) or "non-temporal" (used only once) while they are fetched or stored. OptiVec functions assume that input vectors (and matrices) will not be used again, whereas the output vectors are likely to become the input of some ensuing procedure. Consequently, the cache is bypassed while loading input data, but the output data are written into the cache. Of course, this approach breaks down if the vectors or matrices become too large to fit into the cache. For these cases, a large-vector version of the OptiVec libraries is available which bypasses the cache also while writing the output vectors. For simple arithmetic functions, speed gains of up to 20% are obtained as compared to the small-and-medium-size version. On the other hand, as this large-vector version effectively switches the cache off, a drastic performance penalty (up to a factor of three or four!) will result if it is used for smaller systems. For the same reason, you should carefully check whether your problem could be split up into smaller vectors before resorting to the large-vector version; that would allow you to achieve the much higher performance resulting from efficient data caching.

Use of SIMD commands
You might wonder why this strategy is not listed first. The SSE or "Streaming Single-Instruction-Multiple-Data Extensions" of the Pentium III, Pentium 4 and their successors provide explicit support for vectorized programming with floating-point data in float / single or double precision (the latter only from the Pentium 4 on). At first sight, therefore, they should revolutionize vector programming. Given the normal relation between processor and data-bus speeds, however, many of the simple arithmetic operations are data-transfer limited, and the use of SIMD commands does not make the large difference (with respect to well-written FPU code) it could make otherwise. In most cases, the advantage of treating four floats in a single command melts down to a 20-30% increase in speed (which is not that bad, anyway!). For more complicated operations, on the other hand, SIMD commands often cannot be employed, either because conditional branches have to be taken for each vector element individually, or because the "extra" accuracy and range of traditional FPU commands (with their internal extended accuracy) allow algorithms to be simplified so much that the FPU code is still faster. As a consequence, we use SIMD commands only where a real speed gain is possible. Please note, however, that the SIMD-employing library versions (P6, P7, etc.) generally sacrifice 2-3 digits of accuracy in order to attain the described speed gain. If this is not acceptable for your specific task, please stay with the P4 libraries.

Preload of floating-point constants
Floating-point constants, employed in the evaluation of mathematical functions, are loaded into floating-point registers outside the actual loop and stay as long as they are needed. This saves a large amount of loading/unloading operations which are necessary if a mathematical function is called for each element of a vector separately.

Full XMM and FPU stack usage
Where necessary, all eight (64-bit: all sixteen) XMM registers and/or all eight coprocessor registers are employed.

Superscalar scheduling
By careful "pairing" of commands whose results do not depend upon each other, the two integer pipes and the two fadd/fmul units of the processor are used as efficiently as possible.

Loop-unrolling
Where optimum pairing of commands cannot be achieved for single elements, vectors are often processed in chunks of two, four, or even more elements. This makes it possible to fully exploit the parallel-processing capabilities of the Pentium and its successors. Moreover, the relative amount of time spent on loop management is significantly reduced. In connection with the data prefetching described above, the depth of the unrolled loops is most often adapted to the cache-line size of 32 bytes (PentiumXX) or 64 bytes (AMD 64 X2 or Core2 Duo).

Simplified addressing
The addressing of vector elements is still a major source of inefficiency with present compilers. Switching back and forth between input and output vectors, they perform a large number of redundant addressing operations. The strict (and easy!) definitions of all OptiVec functions allow these operations to be reduced to a minimum.

Replacement of floating-point by integer commands
For any operations with floating-point numbers that can also be performed using integer commands (like copying, swapping, or comparing to preset values), the faster method is consistently employed.

Strict precision control
C compilers convert a float into a double – Borland Pascal/Delphi even into extended – before passing it to a mathematical function. This approach was useful in times when disk space was too scarce to include separate functions for each data type in the .LIB files, but it is simply inefficient on modern PCs. Consequently, no such implicit conversions are present in OptiVec routines. Here, a function of a float is calculated to float (i.e., single) precision, wasting no time on the calculation of more digits than necessary – which would be discarded anyway. There is also a brute-force approach to precision control: you can call V_setFPAccuracy( 1 ); to actively switch the FPU to single precision, if that is enough for a given application. Thereby, execution can be slightly sped up on Pentium and later CPUs. Be prepared, however, to accept even lower-than-single accuracy of your end results if you elect this option. For further details and precautions, see V_setFPAccuracy.

All-inline coding
All external function calls are eliminated from the inner loops of the vector processing. This saves the execution time necessary for the "call / ret" pairs and for loading the parameters onto the stack.

Cache-line matching of local variables
The Level-1 cache of the Pentium and its 32-bit successors is organized in lines of 32 bytes each, modern 64-bit processors use 64-byte lines. Many OptiVec functions need double-precision or extended-precision real local variables on the stack (mainly for integer/floating-point conversions or for range checking). Present compilers align the stack on 4-byte boundaries, which means there is a certain chance that the 8 bytes of a double or the 10 bytes of an extended, stored on the stack, will cross a cache-line boundary. This, in turn, would lead to a cache line-break penalty, deteriorating the performance. Consequently, those OptiVec functions where this is an issue, use special procedures to align their local variables on 8-byte (for doubles), 16-byte (for extendeds), or 32-byte boundaries (for XMM values).

Unprotected and reduced-range functions
OptiVec offers alternative forms of some mathematical functions, giving you the choice between a fully protected variant with error handling and an unprotected variant without it. In the case of the integer power functions, for example, the absence of error checking allows the unprotected versions to be vectorized much more efficiently. Similarly, the sine and cosine functions can be coded more efficiently for arguments that the user can guarantee to lie in the range between -2π and +2π. In these special cases, the execution time may be reduced by up to 40%, depending on the hardware environment. This increased speed must always be balanced against the increased risk, though: if any input element outside the valid range is encountered, the unprotected and reduced-range functions will crash without warning.

1.1.2 Multi-Processor Optimization

Multithread support
All the above being said about single-CPU PCs, multi-processor computers (with Intel's Core i3, i5, i7, Core2 Duo, AMD's Athlon 64 X2, or workstations and servers equipped with 2 or 4 PentiumXX chips) do allow the operating system to distribute threads among the available processors, doubling or quadrupling the overall performance. For that, any functions running in parallel must be prevented from interfering with each other through read/write operations on global variables. With very few exceptions (namely the plotting functions, which have to use global variables to store the current window and coordinate-system settings, and the non-linear data-fitting functions), all OptiVec functions are reentrant and may run in parallel.
Be careful with multi-threading, if you are using the P6 or P7 versions of OptiVec: The earlier releases of 32-bit Windows do not save the XMM registers (employed in the SIMD commands) during task switches. No such problems have been found with Windows XP or Vista.

When designing your multi-thread application, you have two options: functional parallelism and data parallelism.

Functional Parallelism
If different threads are performing different tasks – they are functionally different – one speaks of functional parallelism. As an example, consider one thread handling user input / output, while another one performs background calculations. Even on a single-core CPU, this kind of multi-threading may offer advantages (e.g., the user interface does not block during extensive background calculations, but still takes input). On a multi-core computer, the two (or more) threads can actually run simultaneously on the different processor cores. In general, however, the load balance between the processor cores is far from perfect: often, one processor is running at maximum load, while another one is sitting idle, waiting for input. Still, functional multithreading is the best option whenever your numerical tasks involve vectors and matrices of only small-to-moderate size.

Data Parallelism
In order to improve the load balance between the available processor cores, thereby maximizing throughput, it is possible to employ classical parallel processing: the data to be processed is split up into several chunks, each thread getting one of these chunks. This is aptly called data parallelism. The usefulness of this approach is limited by the overhead involved in the data distribution and in the thread-to-thread communication. Moreover, there are always parts of the code which need to be processed sequentially and cannot be parallelized. Therefore, data parallelism pays off only for larger vectors and matrices. Typical break-even sizes range from about 100 elements (for the calculation of transcendental functions of complex input values) to several tens of thousands of elements (as in the simple arithmetic functions). Only when your vectors and matrices are considerably larger than that threshold is the performance actually improved over a functional-parallelism approach. The boost then quickly approaches (but never exactly reaches) the theoretical limit of a factor equal to the number of processor cores available.


1.1.3 CUDA Device Support

Modern graphics cards are equipped with powerful multiprocessor capacity of up to several hundred processor cores running in parallel. In recent years, interfaces have been developed which allow this processing capacity to be exploited not only for graphics rendering, but also for general calculations. One of these approaches is the CUDA concept by NVIDIA. Practically all current NVIDIA graphics cards support CUDA. Additionally, dedicated CUDA hardware is offered by NVIDIA with the "Tesla" and "Fermi" board families. With the "C" libraries (e.g., OVVC8C.LIB), OptiVec offers a simple way to use a CUDA device for vector / matrix calculations without the hassles of actually programming in CUDA. There are a number of points to be considered:
  • Obviously, the "C" libraries can be used only with a CUDA-enabled device installed. This means, only NVIDIA products are supported.
  • Out of the compilers supported by OptiVec, NVIDIA currently provides CUDA support only for MS Visual C++. This means that no CUDA OptiVec libraries are presently available for the Embarcadero / Borland compilers.
  • It is necessary to have the latest display driver installed. Even brand-new computers most often do not have the latest drivers. They must be selected and downloaded from NVIDIA's web-site, www.nvidia.com.
  • Even a sub-$100 graphics card can boost the performance of certain functions on a computer with a medium-range CPU by a factor of 10, and dedicated hardware by much more. However, the combination of a high-end CPU with a low-end graphics card (as often found in laptop computers) will benefit only marginally from the "C" libraries.
  • The cost of swapping data back and forth between main-board memory and graphics memory is so high that it can be "earned back" only for quite large vectors and matrices. E.g., for mathematical functions like the sine or exponential functions, CUDA pays off from about 100,000 vector elements on. For matrix multiplication, payback occurs in the region of 200x200 elements. All OptiVec functions check if using the CUDA device makes sense and decide accordingly whether to offload processing to the graphics processor or to stay on the CPU.
  • Using CUDA with OptiVec is as easy as simply linking with the "C" library and with the import libraries provided by NVIDIA. No modifications of your source code are necessary. On the other hand, by eliminating the repeated data transfers for each function, programming directly for CUDA devices with NVIDIA's CUDA SDK can lead to considerably higher performance than is possible with the OptiVec "C" libraries.
  • As support for double-precision floating-point is more or less restricted to the expensive Tesla and Fermi boards, OptiVec currently uses the CUDA device only for single precision.
  • The OptiVec "C" libraries actually use DLLs developed by NVIDIA. These have to be installed along with the OptiVec libraries.
  • NVIDIA might at any time change the licence terms for their CUDA libraries, so that we might at some point no longer be able to include them in our distributions and/or to support CUDA at all.

1.1.4 Choosing the right OptiVec Library

Whenever you want your application to run on a wide range of supported platforms, and when your vectors and matrices are only of small-to-moderate size, we recommend using the general-purpose libraries, OVVC4.LIB (for MS Visual C++), VCF4W.LIB (for Borland C++), or the units in OPTIVEC\LIB4 (for Delphi). These libraries combine good performance with back-compatibility to older hardware, down to the 486DX, the Pentium, and old models of the Athlon. They are all multi-thread safe and support functional parallelism. If you do not need full floating-point accuracy and that amount of back-compatibility, you can get higher performance by switching to the P6, P7, or P8 libraries (marked by the respective number in the library name).

For large vectors/matrices on single-core machines from Pentium III+ on, we offer versions gaining some performance by simply bypassing the data cache. These Large-Vector Libraries are marked by the letter "L":  OVVC6L.LIB  (for MS Visual C++),  VCF6L.LIB  (for Borland C++),  or the units in OPTIVEC\LIB6L  (for Delphi). Replace the "6" with "7" to get the Pentium 4+ versions, and so on. If mis-used for smaller vectors / matrices, the Large-Vector libraries will perform significantly slower than the general-purpose libraries!

Finally, for large vectors/matrices on multi-core machines, multi-core optimized libraries actively distribute the work load over the available processor cores for data-parallel execution. These libraries are marked by the letter "M", as in OVVC7M.LIB (for MS Visual C++, using SSE2), VCF4M.LIB (for Borland C++, full FPU accuracy), or the units in OPTIVEC\LIB8M (for Delphi, using SSE3). These libraries are designed for the AMD 64 X2, the Intel Core2 Duo, or machines equipped with several discrete processors of the Pentium 4+ level. The CUDA libraries are based on the "M" libraries and are marked by the letter "C", as, e.g., in OVVC8C.LIB.
The "M" and "C" libraries will still run on single-core machines, but – due to the thread-management overhead – somewhat slower than the general-purpose libraries. Although the "M" libraries are designed with medium to large vectors in mind, the penalty for using them with smaller vectors is almost negligible, as the OptiVec thread-engine automatically executes a function in a single thread, if the vector size is too small for parallel execution to earn back the cost involved in the thread-to-thread communication.
If you use the "M" or "C" libraries, your programme must call V_initMT( nAvailProcCores ) before any of the vector functions.



2. The Elements of OptiVec Routines

2.1 Synonyms for Some Data Types

To increase the versatility and completeness of OptiVec, additional data types are defined in <VecLib.h> or the unit VecLib:

a) C/C++ only:

The data type ui (short for "unsigned index") is used for the indexing of vectors and is defined as "unsigned int".

The 64-bit integer data type (__int64 in BC++ Builder and MS Visual C++, Int64 in Delphi) is called quad (for "quadword integer") in OptiVec.
In 32-bit, the type quad is always signed. Functions for unsigned 64-bit integers are available only in the 64-bit versions of OptiVec.

  • Borland C++ below C++ Builder 2006 only: For the older BC versions, which did not directly support 64-bit integers, the data type quad is implemented as a struct of two 32-bit values. Floating-point numbers (preferably long doubles with their 64-bit mantissa) have to be used as intermediates. The necessary interface functions are setquad, quadtod and _quadtold. Alternatively, the two 32-bit halves may explicitly be set, as in:
    xq.Hi = 0x00000001UL;
    xq.Lo = 0x2468ABCDUL;

The data type extended, which is familiar to Pascal/Delphi programmers, is defined as a synonym for "long double" in OptiVec for C/C++. As Visual C++ does not support 80-bit reals, we define extended as "double" in the OptiVec versions for that compiler.

b) Delphi only:

The data type Float, which is familiar to C/C++ programmers, is defined as a synonym for Single. We prefer to have the letters defining the real-number data types in alphabetical proximity: "D" for Double, "E" for Extended, and "F" for Float. Possible future 128-bit and 256-bit real numbers could find their place in this series as "G" for Great and "H" for Hyper.

For historical reasons (dating back to the development of Turbo Pascal), the various integer data types have a somewhat confusing nomenclature in Delphi. In order to make the derived function prefixes compatible with the C/C++ versions of OptiVec, we define a number of synonyms, as described in the following table:
type                                 Delphi name   synonym   derived prefix
8 bit signed                         ShortInt      ByteInt   VBI_
8 bit unsigned                       Byte          UByte     VUB_
16 bit signed                        SmallInt                VSI_
16 bit unsigned                      Word          USmall    VUS_
32 bit signed                        LongInt                 VLI_
32 bit unsigned                                    ULong     VUL_
64 bit signed                        Int64         QuadInt   VQI_
64 bit unsigned (x64 version only!)  UInt64        UQuad     VUQ_
16/32 bit signed                     Integer                 VI_
16/32 bit unsigned                   Cardinal      UInt      VU_

To have a Boolean data type available which is of the same size as Integer, we define the type IntBool. It is equivalent to LongBool in Delphi. You will see the IntBool type as the return value of many mathematical VectorLib functions.

2.2 Complex Numbers

As described in greater detail for CMATH, OptiVec supports complex numbers both in cartesian and polar format.

If you use only the vectorized complex functions (but not the scalar functions of CMATH), you need not explicitly include CMATH. In this case, the following complex data types are defined in <VecLib.h> for C/C++:
typedef struct { float Re, Im; } fComplex;
typedef struct { double Re, Im; } dComplex;
typedef struct { extended Re, Im; } eComplex;
typedef struct { float Mag, Arg; } fPolar;
typedef struct { double Mag, Arg; } dPolar;
typedef struct { extended Mag, Arg; } ePolar;

The corresponding definitions for Pascal/Delphi are contained in the unit VecLib:
type fComplex = record Re, Im: Float; end;
type dComplex = record Re, Im: Double; end;
type eComplex = record Re, Im: Extended; end;
type fPolar = record Mag, Arg: Float; end;
type dPolar = record Mag, Arg: Double; end;
type ePolar = record Mag, Arg: Extended; end;

If, for example, a complex number z is declared as "fComplex z;", the real and imaginary parts of z are available as z.Re and z.Im, resp. Complex numbers are initialized either by setting the constituent parts separately to the desired value, e.g.,
z.Re = 3.0; z.Im = 5.7;
p.Mag = 4.0; p.Arg = 0.7;

(of course, the assignment operator is := in Pascal/Delphi).
Alternatively, the same initialization can be accomplished by the functions fcplx or fpolr:
C/C++:
z = fcplx( 3.0, 5.7 );
p = fpolr( 4.0, 0.7 );

Pascal/Delphi:
fcplx( z, 3.0, 5.7 );
fpolr( p, 4.0, 0.7 );

For double-precision complex numbers, use dcplx and dpolr, for extended-precision complex numbers, use ecplx and epolr.
Pointers to arrays or vectors of complex numbers are declared using the data types cfVector, cdVector, and ceVector (for cartesian complex) and pfVector, pdVector, and peVector (for polar complex) described below.

2.3 Vector Data Types

We define a "vector", as usual, as a one-dimensional array of data containing at least one element, with all elements being of the same data type. In more mathematical terms, a vector is a rank-one tensor. A two-dimensional array (i.e., a rank-two tensor) is denoted a "matrix", and higher-dimensional arrays are referred to as "tensors".
In contrast to other approaches, VectorLib does not allow zero-size vectors!

The basis of all VectorLib routines is formed by the various vector data types given below and declared in <VecLib.h> or the unit VecLib. In contrast to the fixed-size static arrays, the VectorLib types use dynamic memory allocation and allow for varying sizes. Because of this increased flexibility, we recommend that you predominantly use the latter. Here they are:
 
C/C++
typedef float    *fVector;
typedef double   *dVector;
typedef extended *eVector;
typedef fComplex *cfVector;
typedef dComplex *cdVector;
typedef eComplex *ceVector;
typedef fPolar   *pfVector;
typedef dPolar   *pdVector;
typedef ePolar   *peVector;
typedef int      *iVector;
typedef byte     *biVector;
typedef short    *siVector;
typedef long     *liVector;
typedef quad     *qiVector;
typedef uquad    *uqVector;
typedef unsigned *uVector;
typedef unsigned byte  *ubVector;
typedef unsigned short *usVector;
typedef unsigned long  *ulVector;
typedef ui       *uiVector;
Pascal/Delphi
type fVector  = ^Float;
type dVector  = ^Double;
type eVector  = ^Extended;
type cfVector = ^fComplex;
type cdVector = ^dComplex;
type ceVector = ^eComplex;
type pfVector = ^fPolar;
type pdVector = ^dPolar;
type peVector = ^ePolar;
type iVector  = ^Integer;
type biVector = ^ByteInt;
type siVector = ^SmallInt;
type liVector = ^LongInt;
type qiVector = ^QuadInt;
type uVector  = ^UInt;
type ubVector = ^UByte;
type usVector = ^USmall;
type ulVector = ^ULong;

Internally, a data type like fVector means "pointer to float", but you may think of a variable declared as fVector rather in terms of a "vector of floats".
 
Note: in Windows programming, the letter "l" or "L" is often used to denote "long int" variables. To prevent confusion, OptiVec signals the data type "long int" by "li" or "LI", and the data type "unsigned long" by "ul" or "UL". Conflicts with the prefixes for "long double" vectors are avoided by deriving those from the alias name "extended" and using "e", "ce", "E", and "CE", as described above and in the following.
 
C/C++ specific:
Vector elements can be accessed either with the [] operator, as in VA[375] = 1.234;
or by the type-specific functions VF_element (which returns the value of the desired vector element, but cannot be used to overwrite it) and VF_Pelement (which returns a pointer to a vector element).
Especially for some older Borland C versions (which have a bug in their pointer arithmetic), VF_Pelement has to be used instead of the syntax X+n.
In your programs, you may mix these vector types with the static arrays of classic C style.
For example:
float a[100];                /* classic static array */
fVector b = VF_vector(100);  /* VectorLib vector */
VF_equ1( a, 100 );           /* set the first 100 elements of a equal to 1.0 */
VF_equC( b, 100, 3.7 );      /* set the first 100 elements of b equal to 3.7 */

Pascal/Delphi specific:
As in C/C++, you may mix these vector types with the static arrays of classic Pascal style. Static arrays have to be passed to OptiVec functions with the "address of" operator. Here, the above example reads:
a: array[0..99] of Single;   (* classic static array *)
b: fVector;                  (* VectorLib vector *)
b := VF_vector(100);
VF_equ1( @a, 100 );          (* set the first 100 elements of a equal to 1.0 *)
VF_equC( b, 100, 3.7 );      (* set the first 100 elements of b equal to 3.7 *)

Delphi also offers dynamically-allocated arrays, which may also be used as arguments for OptiVec functions. The following table compares the pointer-based vectors of VectorLib with the array types of Pascal/Delphi:
 
alignment of first element
    OptiVec vectors:       on a 32-byte boundary for optimum cache-line matching
    Pascal/Delphi arrays:  on a 2- or 4-byte boundary (may cause a cache-line break penalty for Double, QuadInt)

alignment of following elements
    OptiVec vectors:       packed (i.e., no dummy bytes between elements, even for 8-, 10-, and 16-bit types)
    Pascal/Delphi arrays:  arrays must be declared as "packed" (Delphi 4+) to be compatible with OptiVec

index range checking
    OptiVec vectors:       none
    Pascal/Delphi arrays:  automatic, with built-in size information

dynamic allocation
    OptiVec vectors:       functions VF_vector, VF_vector0
    Pascal/Delphi arrays:  procedure SetLength (Delphi 4+ only)

initialization with 0
    OptiVec vectors:       optional, by calling VF_vector0
    Pascal/Delphi arrays:  always (Delphi 4+ only)

de-allocation
    OptiVec vectors:       functions V_free, V_freeAll
    Pascal/Delphi arrays:  procedure Finalize (Delphi 4+ only)

reading single elements
    OptiVec vectors:       function VF_element:  a := VF_element(X,5);
                           (Delphi 4+ only: a typecast into an array is also possible:  a := fArray(X)[5];)
    Pascal/Delphi arrays:  index in brackets:  a := X[5];

setting single elements
    OptiVec vectors:       function VF_Pelement:  VF_Pelement(X,5)^ := a;
                           (Delphi 4+ only: a typecast into an array is also possible:  fArray(X)[5] := a;)
    Pascal/Delphi arrays:  index in brackets:  X[5] := a;

passing to an OptiVec function
    OptiVec vectors:       directly:  VF_equ1( X, sz );
    Pascal/Delphi arrays:  address-of operator:  VF_equ1( @X, sz );

passing a sub-vector to an OptiVec function
    OptiVec vectors:       function VF_Pelement:  VF_equC( VF_Pelement(X,10), sz-10, 3.7 );
    Pascal/Delphi arrays:  address-of operator:  VF_equC( @X[10], sz-10, 3.7 );
 
Summarizing the properties of OptiVec vectors and of Pascal/Delphi arrays: the latter are somewhat more convenient and, thanks to the index range checking, safer, whereas the pointer-based OptiVec vectors are processed faster (due to the better alignment and the absence of checking routines).

Back to VectorLib Table of Contents   OptiVec home 

2.4 Vector Function Prefixes

In the plain-C, Pascal and Delphi versions, every OptiVec function has a prefix denoting the data-type on which it acts. (Read here about the overloaded C++ functions of VecObj.)
 
Prefix   Arguments and return value
VF_      fVector and float
VD_      dVector and double
VE_      eVector and extended (long double)
VCF_     cfVector and fComplex
VCD_     cdVector and dComplex
VCE_     ceVector and eComplex
VPF_     pfVector and fPolar
VPD_     pdVector and dPolar
VPE_     peVector and ePolar
VI_      iVector and int / Integer
VBI_     biVector and byte / ByteInt
VSI_     siVector and short int / SmallInt
VLI_     liVector and long int / LongInt
VQI_     qiVector and quad / QuadInt
VU_      uVector and unsigned / UInt
VUB_     ubVector and unsigned char / UByte
VUS_     usVector and unsigned short / USmall
VUL_     ulVector and unsigned long / ULong
VUQ_     uqVector and uquad / UQuad (for Win64 only!)
VUI_     uiVector and ui
V_       (data-type conversions like V_FtoD, data-type independent functions like V_initPlot)
 

Back to VectorLib Table of Contents   OptiVec home 


3. VecObj, the Object-Oriented Interface for VectorLib

VecObj, the object-oriented C++ interface to the OptiVec vector functions, was written by Brian Dale, Case Western Reserve University.
Among the advantages it offers are the following:
  • automatic allocation and deallocation of memory
  • simplified vector handling
  • greatly reduced risk of memory leaks
  • increased memory access safety
  • intuitive overloaded operators
  • simpler function calls
There are a few drawbacks, though, which you should be aware of:
  • increased compiler load
  • larger overhead (as for any encapsulated C++ code!), leading to
  • increased code size
  • decreased computational efficiency
  • vectors can be processed only as a whole, not in parts
VecObj is contained in the include-files <VecObj.h>, <fVecObj.h>, <dVecObj.h> etc., with one include-file for each of the data-types supported in OptiVec.
To get the whole interface (for all data types at once),
#include <OptiVec.h>.
For access to any of the vector graphics functions, always include <OptiVec.h>.

MS Visual C++ and Embarcadero / Borland C++ Builder (but not earlier Borland C++ versions): programmers should put the directive
"using namespace OptiVec;"
either into the body of any function that uses tVecObj, or into the global declaration part of the program. Placing the directive in the function body is safer, as it avoids potential namespace conflicts in other functions.
The vector objects are defined as classes vector<T>, encapsulating the vector address (pointer) and the size.
For easier use, these classes have the alias names fVecObj, dVecObj, and so on, with the data type signalled by the first one or two letters of the class name, in the same way as for the vector types described above.

All functions defined in VectorLib for a specific vector data-type are contained as member functions in the respective tVecObj class.
The constructors are available in four forms:
vector(); // no memory allocated, size set to 0
vector( ui size ); // vector of size elements allocated
vector( ui size, T fill ); // as before, but initialized with value "fill"
vector( vector<T> init ); // creates a copy of the vector "init"

For all vector classes, the arithmetic operators
+    -    *    /    +=    -=    *=    /=
are defined, with the exception of the polar-complex vector classes, where only multiplications and divisions, but no additions or subtractions are supported. These operators are the only cases in which you can directly assign the result of a calculation to a vector object, like
fVecObj Z = X + Y; or
fVecObj Z = X * 3.5;
Note, however, that the C++ class syntax rules do not allow a very efficient implementation of these operators. The arithmetic member functions are much faster. If speed is an issue, use
Z.addV( X, Y ); or
Z.mulC( X, 3.5 );
instead of the operator syntax. The operator * refers to element-wise multiplication, not to the scalar product of two vectors.

All other arithmetic and math functions can be called only as member functions of the respective output vector, as in, for example, Y.exp(X). Although it would certainly be more logical to define these functions in such a way that you could write "Y = exp(X)" instead, the member-function syntax was chosen for reasons of efficiency: the only way to implement the second variant is to store the result of the exponential function of X in a temporary vector first, which is then copied into Y, considerably increasing the work-load and memory demands.

While most VecObj functions are member functions of the output vector, there are a number of functions which do not have an output vector. These are member functions of an input vector.
Example: s = X.mean();.

If you ever need to process a VecObj vector with a "classic" plain-C VectorLib function (for example, to process only a part of it), you may use the member functions
getSize() to retrieve its size,
getVector() for the pointer (of data type tVector, where "t" stands for the usual type prefix), and
Pelement( n ) for a pointer to the n'th element.

Continue with chapter 4. VectorLib Functions and Routines: A Short Overview
Back to VectorLib Table of Contents      OptiVec home 

Copyright © 1998-2013 OptiCode – Dr. Martin Sander Software Development

Last modified: 20 January 2013