VectorLib




VectorLib is the vector-functions part of OptiVec. This file describes the basic principles of the OptiVec libraries and gives an overview of VectorLib. The new object-oriented interface, VecObj, is described in chapter 3. MatrixLib and CMATH are described separately.

Contents

1. Introduction
1.1 Why Vectorized Programming Pays Off on the PC
 1.1.1 General OptiVec Optimization Strategies
 1.1.2 Multi-Processor Optimization
 1.1.3 CUDA Device Support
 1.1.4 Choosing the right OptiVec Library
2. The Elements of OptiVec Routines
2.1 Synonyms for Some Data Types
2.2 Complex Numbers
2.3 Vector Data Types
2.4 Vector Function Prefixes
3. C++ only: VecObj, the Object-Oriented Interface for VectorLib
4. VectorLib Functions and Routines: A Short Overview
4.1 Generation, Initialization and De-Allocation of Vectors
4.2 Index-oriented Manipulations
4.3 Data-Type Interconversions
4.4 More about Integer Arithmetics
4.5 Basic Functions of Complex Vectors
4.6 Mathematical Functions
4.6.1 Rounding
4.6.2 Comparisons
4.6.3 Direct Bit-Manipulation
4.6.4 Basic Arithmetics, Accumulations
4.6.5 Geometrical Vector Arithmetics
4.6.6 Powers
4.6.7 Exponentials and Hyperbolic Functions
4.6.8 Logarithms
4.6.9 Trigonometric Functions
4.7 Analysis
4.8 Signal Processing: Fourier Transforms and Related Topics
4.9 Statistical Functions and Building Blocks
4.10 Data Fitting
4.11 Input and Output
4.12 Graphics
5. Error Handling
5.1 General Remarks
5.2 Integer Errors
5.3 Floating-Point Errors
5.3.1 C/C++ specific
5.3.2 Pascal/Delphi specific
5.3.3 Error Types (Both C/C++ and Pascal/Delphi)
5.4 The Treatment of Denormal Numbers
5.5 Advanced Error Handling: Writing Messages into a File
5.6 OptiVec Error Messages
6. Trouble-Shooting
7. The Include-Files and Units of OptiVec


1. Introduction

OptiVec offers a powerful set of routines for numerically demanding applications, making the philosophy of vectorized programming available for C/C++ and Pascal/Delphi languages. It serves to overcome the limitations of loop management of conventional compilers – which proved to be one of the largest obstacles in the programmer's way towards efficient coding for scientific and data analysis applications.

In comparison to the old vector language APL, OptiVec has the advantage of being incorporated into the modern and versatile languages C/C++ and Pascal/Delphi. Recent versions of C++ and Fortran already offer some sort of vector processing, by virtue of iterator classes using templates (C++) and field functions (Fortran 90). Both of these, however, are basically a convenient means of letting the compiler write the loop for you and then compile it into the usual inefficient code. The same is true of most implementations of the popular BLAS (Basic Linear Algebra Subroutines) libraries. In comparison to these approaches, OptiVec is superior mainly with respect to execution speed – on average by a factor of 2-3, in some cases even up to 8. The performance is no longer limited by the quality of your compiler, but rather by the real speed of the processor!

There is a certain overlap in the range of functions offered by OptiVec and by BLAS, LINPACK, and other libraries and source-code collections. However, the latter must be compiled, and, consequently, their performance is determined mainly by the quality of the compiler chosen. To the best of our knowledge, it is our product, OptiVec, that offers the first comprehensive vectorized-functions library realized in a true Assembler implementation.

  • All operators and mathematical functions of C/C++ are implemented in vectorized form; additionally many more mathematical functions are included which normally would have to be calculated by more or less complicated combinations of existing functions. Not only the execution speed, but also the accuracy of the results is greatly improved.
  • Building blocks for statistical data analysis are supplied.
  • Derivatives, integrals, interpolation schemes are included.
  • Fast Fourier Transform techniques allow for efficient convolutions, correlation analyses, spectral filtering, and so on.
  • Graphical representation of data offers a convenient way of monitoring the results of vectorized calculations.
  • A wide range of optimized matrix functions like matrix arithmetics, algebra, decompositions, data fitting, etc. is offered by MatrixLib.
    TensorLib is planned as a future extension of these concepts for general multidimensional arrays.
  • Each function exists for every data type for which this is reasonable. The data type is signalled by the prefix of the function name. No implicit name mangling or other specific C++ features are used, which makes OptiVec usable in plain-C as well as in C++ programs. Moreover, the names and the syntax of nearly all functions are the same in C/C++ and Pascal/Delphi languages.
  • The input and output vectors/matrices of VectorLib and MatrixLib routines may be of variable size and it is possible to process only a part (e.g., the first 100 elements, or every 10th element) of a vector, which is another important advantage over other approaches, where only whole arrays are processed.
  • A new object-oriented interface for C++, named VecObj, encapsulates all vector functions, offering even easier use and increased memory safety.
  • Using OptiVec routines instead of loops can make your source code much more compact and far better readable.

The wide range of routines and functions covered by OptiVec, the high numerical efficiency, and the increased ease of programming make this package a powerful programming tool for scientific and data-analysis applications, competing with (and often beating) many high-priced integrated systems, but embedded into your favourite programming language.


1.1 Why Vectorized Programming Pays Off on the PC

To process one-dimensional data arrays or "vectors", a programmer would normally write a loop over all vector elements. Similarly, two- or higher-dimensional arrays ("matrices" or "tensors") are usually processed through nested loops over the indices in all dimensions. The alternative to this classic style of programming are vector and matrix functions.
Vector functions act on whole arrays/vectors instead of single scalar arguments. They are the most thorough form of "vectorization", i.e., the organisation of program code (by clever compilers or by the programmer himself) in such a way as to optimize vector treatment.

Vectorization has always been the magic formula for supercomputers with their multi-processor parallel architectures. On these architectures, one tries to spread the computational effort equally over the available processors, thus maximizing execution speed. The so-called "divide and conquer" algorithms break down more complicated numerical tasks into small loops over array elements. Sophisticated compilers then find out the most efficient way how to distribute the array elements among the processors. Many supercomputer compilers also come with a large set of pre-defined proprietary vector and matrix functions for many basic tasks. These vectorized functions offer the best way to achieve maximum throughput.

Obviously, the massive parallel processing of, say, a Cray is not possible even on modern PCs with their modest 2 or 4-processor core configurations, let alone on the classical single-processor PC. Consequently, at first sight, it might seem difficult to apply the principle of vectorized programming to the PC. Actually, however, there are many vector-specific optimizations possible, even for computers with only one CPU. Most of these optimizations are not available to present compilers. Rather, one has to go down to the machine-code level. Hand-optimized, Assembler-written vector functions outperform compiled loops by a factor of two to three, on the average. This means that vectorization, properly done, is indeed worth the effort, also for PC programs.

1.1.1 General OptiVec Optimization Strategies

Here are the most important optimization strategies, employed in OptiVec to boost the performance on any PC (regardless of the number of processor cores):

Prefetch of chunks of vector elements
Beginning with the Pentium III processor, Intel introduced the very useful feature of explicit memory prefetch. With these commands, it is possible to "tell" the processor to fetch data from memory sufficiently in advance, so that no time is wasted waiting for them when they are actually needed.

Cache control
The Pentium III+ processors offer the possibility to mark data as "temporal" (will be used again) or "non-temporal" (used only once) while they are fetched or stored. OptiVec functions assume that input vectors (and matrices) will not be used again, whereas the output vectors are likely to become the input of some ensuing procedure. Consequently, the cache is bypassed while loading input data, but the output data are written into the cache. Of course, this approach breaks down if the vectors or matrices become too large to fit into the cache. For these cases, a large-vector version of the OptiVec libraries is available which bypasses the cache also while writing the output vectors. For simple arithmetic functions, speed gains of up to 20% are obtained as compared to the small-and-medium-size version. On the other hand, as this large-vector version effectively switches the cache off, a drastic performance penalty (up to a factor of three or four!) will result if it is used for smaller systems. For the same reason, you should carefully check whether your problem could be split up into smaller vectors before resorting to the large-vector version; that would allow you to achieve the much higher performance resulting from efficient data caching.

Use of SIMD commands
You might wonder why this strategy is not listed first. The SSE or "Streaming Single-Instruction-Multiple-Data Extensions" of the Pentium III, Pentium 4 and their successors provide explicit support for vectorized programming with floating-point data in float / single or double precision (the latter only from the Pentium 4 on). At first sight, therefore, they should revolutionize vector programming. Given the normal relation between processor and data-bus speeds, however, many of the simple arithmetic operations are data-transfer limited, and the use of SIMD commands does not make the large difference (with respect to well-written FPU code) it could make otherwise. In most cases, the advantage of treating four floats in a single command melts down to a 20-30% increase in speed (which is not that bad, anyway!). For more complicated operations, on the other hand, SIMD commands often cannot be employed, either because conditional branches have to be taken for each vector element individually, or because the "extra" accuracy and range of traditional FPU commands (with their internal extended accuracy) allow algorithms to be simplified so much that the FPU code is still faster. As a consequence, we use SIMD commands only where a real speed gain is possible. Please note, however, that the SIMD-employing library versions (P6, P7, etc.) generally sacrifice 2-3 digits of accuracy in order to attain the described speed gain. If this is not acceptable for your specific task, please stay with the P4 libraries.

Preload of floating-point constants
Floating-point constants, employed in the evaluation of mathematical functions, are loaded into floating-point registers outside the actual loop and stay as long as they are needed. This saves a large amount of loading/unloading operations which are necessary if a mathematical function is called for each element of a vector separately.

Full XMM and FPU stack usage
Where necessary, all eight (64-bit: all sixteen) XMM registers and/or all eight coprocessor registers are employed.

Superscalar scheduling
By careful "pairing" of commands whose results do not depend upon each other, the two integer pipes and the two fadd/fmul units of the processor are used as efficiently as possible.

Loop-unrolling
Where optimum pairing of commands cannot be achieved for single elements, vectors are often processed in chunks of two, four, or even more elements. This makes it possible to fully exploit the parallel-processing capabilities of the Pentium and its successors. Moreover, the relative amount of time spent on loop management is significantly reduced. In connection with the data prefetching described above, the depth of the unrolled loops is most often adapted to the cache-line size of 32 bytes (PentiumXX) or 64 bytes (AMD 64 X2 or Core2 Duo).

Simplified addressing
The addressing of vector elements is still a major source of inefficiency with present compilers. Switching back and forth between input and output vectors, they perform a large number of redundant addressing operations. The strict (and easy!) definitions of all OptiVec functions allow these operations to be reduced to a minimum.

Replacement of floating-point by integer commands
For any operations with floating-point numbers that can also be performed using integer commands (like copying, swapping, or comparing to preset values), the faster method is consistently employed.

Strict precision control
C compilers convert a float into a double – Borland Pascal/Delphi even into extended – before passing it to a mathematical function. This approach was useful in times when disk space was too scarce to include separate functions for each data type in the .LIB files, but it is simply inefficient on modern PCs. Consequently, no such implicit conversions are present in OptiVec routines. Here, a function of a float is calculated to float (i.e., single) precision, wasting no time on the calculation of more digits than necessary – which would be discarded anyway. There is also a brute-force approach to precision control: you can call V_setFPAccuracy( 1 ); to actively switch the FPU to single precision, if that is enough for a given application. Thereby, execution can be slightly sped up on Pentium and later CPUs. Be prepared, however, to accept even lower-than-single accuracy of your end results if you elect this option. For further details and precautions, see V_setFPAccuracy.

All-inline coding
All external function calls are eliminated from the inner loops of the vector processing. This saves the execution time necessary for the "call / ret" pairs and for loading the parameters onto the stack.

Cache-line matching of local variables
The Level-1 cache of the Pentium and its 32-bit successors is organized in lines of 32 bytes each, modern 64-bit processors use 64-byte lines. Many OptiVec functions need double-precision or extended-precision real local variables on the stack (mainly for integer/floating-point conversions or for range checking). Present compilers align the stack on 4-byte boundaries, which means there is a certain chance that the 8 bytes of a double or the 10 bytes of an extended, stored on the stack, will cross a cache-line boundary. This, in turn, would lead to a cache line-break penalty, deteriorating the performance. Consequently, those OptiVec functions where this is an issue, use special procedures to align their local variables on 8-byte (for doubles), 16-byte (for extendeds), or 32-byte boundaries (for XMM values).

Unprotected and reduced-range functions
OptiVec offers alternative forms of some mathematical functions, giving you the choice between a fully protected variant with error handling and an unprotected variant without it. In the case of the integer power functions, for example, the absence of error checking allows the unprotected versions to be vectorized much more efficiently. Similarly, the sine and cosine functions can be coded more efficiently for arguments that the user can guarantee to lie in the range between -2π and +2π. In these special cases, the execution time may be reduced by up to 40%, depending on the hardware environment. This increased speed must always be balanced against the increased risk, though: if any input element outside the valid range is encountered, the unprotected and reduced-range functions will crash without warning.

1.1.2 Multi-Processor Optimization

Multithread support
All the above being said about single-CPU PCs, multi-processor computers (with Intel's Core i3, i5, i7, Core2 Duo, AMD's Athlon 64 X2, or workstations and servers equipped with 2 or 4 PentiumXX chips) do allow the operating system to distribute threads among the available processors, doubling or quadrupling the overall performance. For that, any functions running in parallel must be prevented from interfering with each other through read/write operations on global variables. With very few exceptions (namely the plotting functions, which have to use global variables to store the current window and coordinate-system settings, and the non-linear data-fitting functions), all OptiVec functions are reentrant and may run in parallel.
Be careful with multi-threading, if you are using the P6 or P7 versions of OptiVec: The earlier releases of 32-bit Windows do not save the XMM registers (employed in the SIMD commands) during task switches. No such problems have been found with Windows XP or Vista.

When designing your multi-thread application, you have two options: functional parallelism and data parallelism.

Functional Parallelism
If different threads are performing different tasks – they are functionally different – one speaks of functional parallelism. As an example, consider one thread handling user input / output, while another one performs background calculations. Even on a single-core CPU, this kind of multi-threading may offer advantages (e.g., the user interface does not block during extensive background calculations, but still takes input). On a multi-core computer, the two (or more) threads can actually run simultaneously on the different processor cores. In general, however, the load balance between the processor cores is far from perfect: often, one processor is running at maximum load, while another one is sitting idle, waiting for input. Still, functional multithreading is the best option whenever your numerical tasks involve vectors and matrices of only small-to-moderate size.

Data Parallelism
In order to improve the load balance between the available processor cores, thereby maximizing throughput, it is possible to employ classical parallel processing: the data to be processed is split up into several chunks, each thread getting one of these chunks. This is aptly called data parallelism. The usefulness of this approach is limited by the overhead involved in the data distribution and in the thread-to-thread communication. Moreover, there are always parts of the code which need to be processed sequentially and cannot be parallelized. Therefore, data parallelism pays off only for larger vectors and matrices. Typical break-even sizes range from about 100 elements (for the calculation of transcendental functions of complex input values) to several tens of thousands of elements (as in the simple arithmetic functions). Only when your vectors and matrices are considerably larger than that threshold is the performance actually improved over a functional-parallelism approach. The boost then quickly approaches (but never exactly reaches) the theoretical limit of a factor equal to the number of processor cores available.


1.1.3 CUDA Device Support

Modern graphics cards are equipped with powerful multiprocessor capacity of up to several hundred processor cores running in parallel. In recent years, interfaces have been developed which allow this processing capacity to be exploited not only for graphics rendering, but also for general calculations. One of these approaches is the CUDA concept by NVIDIA. Practically all current NVIDIA graphics cards support CUDA. Additionally, dedicated CUDA hardware is offered by NVIDIA with the "Tesla" and "Fermi" board families. With the "C" libraries (e.g., OVVC8C.LIB), OptiVec offers a simple way to use a CUDA device for vector / matrix calculations without the hassles of actually programming in CUDA. There are a number of points to be considered:
  • Obviously, the "C" libraries can be used only with a CUDA-enabled device installed. This means, only NVIDIA products are supported.
  • Out of the compilers supported by OptiVec, NVIDIA currently provides CUDA support only for MS Visual C++. This means that no CUDA OptiVec libraries are presently available for the Embarcadero / Borland compilers.
  • It is necessary to have the latest display driver installed. Even brand-new computers most often do not have the latest drivers. They must be selected and downloaded from NVIDIA's web-site, www.nvidia.com.
  • Even a sub-$100 graphics card can boost the performance of certain functions on a computer with a medium-range CPU by a factor of 10, and dedicated hardware by much more. However, the combination of a high-end CPU with a low-end graphics card (as often found in laptop computers) will benefit only marginally from the "C" libraries.
  • The cost of swapping data back and forth between main-board memory and graphics memory is so high that it can be "earned back" only for quite large vectors and matrices. E.g., for mathematical functions like the sine or exponential functions, CUDA pays off from about 100,000 vector elements on. For matrix multiplication, payback occurs in the region of 200x200 elements. All OptiVec functions check if using the CUDA device makes sense and decide accordingly whether to offload processing to the graphics processor or to stay on the CPU.
  • Using CUDA with OptiVec is as easy as simply linking with the "C" library and with the import libraries provided by NVIDIA. No modifications of your source code are necessary. On the other hand, by eliminating the repeated data transfers for each function, programming directly for CUDA devices with NVIDIA's CUDA SDK can lead to considerably higher performance than is possible with the OptiVec "C" libraries.
  • As support for double-precision floating-point is more or less restricted to the expensive Tesla and Fermi boards, OptiVec currently uses the CUDA device only for single precision.
  • The OptiVec "C" libraries actually use DLLs developed by NVIDIA. These have to be installed along with the OptiVec libraries.
  • NVIDIA might at any time change the licence terms for their CUDA libraries, so that we might at some point no longer be able to include them in our distributions and/or to support CUDA at all.

1.1.4 Choosing the right OptiVec Library

Whenever you want your application to run on a wide range of supported platforms, and when your vectors and matrices are only of small-to-moderate size, we recommend using the general-purpose libraries, OVVC4.LIB (for MS Visual C++), VCF4W.LIB (for Borland C++), or the units in OPTIVEC\LIB4 (for Delphi). These libraries combine good performance with back-compatibility to older hardware, down to the 486DX, the Pentium, and old models of the Athlon. They are all multi-thread safe and support functional parallelism. If you do not need full floating-point accuracy and that amount of back-compatibility, you can get higher performance by switching to the P6, P7, or P8 libraries (marked by the respective number in the library name).

For large vectors/matrices on single-core machines from Pentium III+ on, we offer versions gaining some performance by simply bypassing the data cache. These Large-Vector Libraries are marked by the letter "L":  OVVC6L.LIB  (for MS Visual C++),  VCF6L.LIB  (for Borland C++),  or the units in OPTIVEC\LIB6L  (for Delphi). Replace the "6" with "7" to get the Pentium 4+ versions, and so on. If mis-used for smaller vectors / matrices, the Large-Vector libraries will perform significantly slower than the general-purpose libraries!

Finally, for large vectors/matrices on multi-core machines, multi-core optimized libraries actively distribute the work load over the available processor cores for data-parallel execution. These libraries are marked by the letter "M", as in OVVC7M.LIB (for MS Visual C++, using SSE2), VCF4M.LIB (for Borland C++, full FPU accuracy), or the units in OPTIVEC\LIB8M (for Delphi, using SSE3). These libraries are designed for the AMD 64 X2, the Intel Core2 Duo, or machines equipped with several discrete processors of the Pentium 4+ level. The CUDA libraries are based on the "M" libraries and are marked by the letter "C", as, e.g., in OVVC8C.LIB.
The "M" and "C" libraries will still run on single-core machines, but – due to the thread-management overhead – somewhat slower than the general-purpose libraries. Although the "M" libraries are designed with medium to large vectors in mind, the penalty for using them with smaller vectors is almost negligible, as the OptiVec thread-engine automatically executes a function in a single thread, if the vector size is too small for parallel execution to earn back the cost involved in the thread-to-thread communication.
If you use the "M" or "C" libraries, your programme must call V_initMT( nAvailProcCores ) before any of the vector functions.



2. The Elements of OptiVec Routines

2.1 Synonyms for Some Data Types

To increase the versatility and completeness of OptiVec, additional data types are defined in <VecLib.h> or the unit VecLib:

a) C/C++ only:

The data type ui (short for "unsigned index") is used for the indexing of vectors and is defined as "unsigned int".

The 64-bit integer data type (__int64 in BC++ Builder and MS Visual C++, Int64 in Delphi) is called quad (for "quadword integer") in OptiVec.
In 32-bit, the type quad is always signed. Functions for unsigned 64-bit integers are available only in the 64-bit versions of OptiVec.

  • Borland C++ below C++ Builder 2006 only: For the older BC versions, which did not directly support 64-bit integers, the data type quad is implemented as a struct of two 32-bit values. Floating-point numbers (preferably long doubles with their 64-bit mantissa) have to be used as intermediates. The necessary interface functions are setquad, quadtod and _quadtold. Alternatively, the two 32-bit halves may explicitly be set, as in:
    xq.Hi = 0x00000001UL;
    xq.Lo = 0x2468ABCDUL;

The data type extended, which is familiar to Pascal/Delphi programmers, is defined as a synonym for "long double" in OptiVec for C/C++. As Visual C++ does not support 80-bit reals, we define extended as "double" in the OptiVec versions for that compiler.

b) Delphi only:

The data type Float, which is familiar to C/C++ programmers, is defined as a synonym for Single. We prefer to have the letters defining the real-number data types in alphabetical proximity: "D" for Double, "E" for Extended, and "F" for Float. Possible future 128-bit and 256-bit real numbers could find their place in this series as "G" for Great and "H" for Hyper.

For historical reasons (dating back to the development of Turbo Pascal), the various integer data types have a somewhat confusing nomenclature in Delphi. In order to make the derived function prefixes compatible with the C/C++ versions of OptiVec, we define a number of synonyms, as described in the following table:
type                                 Delphi name   synonym   derived prefix
8 bit signed                         ShortInt      ByteInt   VBI_
8 bit unsigned                       Byte          UByte     VUB_
16 bit signed                        SmallInt                VSI_
16 bit unsigned                      Word          USmall    VUS_
32 bit signed                        LongInt                 VLI_
32 bit unsigned                                    ULong     VUL_
64 bit signed                        Int64         QuadInt   VQI_
64 bit unsigned (x64 version only!)  UInt64        UQuad     VUQ_
16/32 bit signed                     Integer                 VI_
16/32 bit unsigned                   Cardinal      UInt      VU_

To have a Boolean data type available which is of the same size as Integer, we define the type IntBool. It is equivalent to LongBool in Delphi. You will see the IntBool type as the return value of many mathematical VectorLib functions.

2.2 Complex Numbers

As described in greater detail for CMATH, OptiVec supports complex numbers both in cartesian and polar format.

If you use only the vectorized complex functions (but not the scalar functions of CMATH), you need not explicitly include CMATH. In this case, the following complex data types are defined in <VecLib.h> for C/C++:
typedef struct { float Re, Im; } fComplex;
typedef struct { double Re, Im; } dComplex;
typedef struct { extended Re, Im; } eComplex;
typedef struct { float Mag, Arg; } fPolar;
typedef struct { double Mag, Arg; } dPolar;
typedef struct { extended Mag, Arg; } ePolar;

The corresponding definitions for Pascal/Delphi are contained in the unit VecLib:
type fComplex = record Re, Im: Float; end;
type dComplex = record Re, Im: Double; end;
type eComplex = record Re, Im: Extended; end;
type fPolar = record Mag, Arg: Float; end;
type dPolar = record Mag, Arg: Double; end;
type ePolar = record Mag, Arg: Extended; end;

If, for example, a complex number z is declared as "fComplex z;", the real and imaginary parts of z are available as z.Re and z.Im, resp. Complex numbers are initialized either by setting the constituent parts separately to the desired value, e.g.,
z.Re = 3.0; z.Im = 5.7;
p.Mag = 4.0; p.Arg = 0.7;

(of course, the assignment operator is := in Pascal/Delphi).
Alternatively, the same initialization can be accomplished by the functions fcplx or fpolr:
C/C++:
z = fcplx( 3.0, 5.7 );
p = fpolr( 4.0, 0.7 );

Pascal/Delphi:
fcplx( z, 3.0, 5.7 );
fpolr( p, 4.0, 0.7 );

For double-precision complex numbers, use dcplx and dpolr, for extended-precision complex numbers, use ecplx and epolr.
Pointers to arrays or vectors of complex numbers are declared using the data types cfVector, cdVector, and ceVector (for cartesian complex) and pfVector, pdVector, and peVector (for polar complex) described below.

2.3 Vector Data Types

We define a "vector", as usual, as a one-dimensional array of data containing at least one element, with all elements being of the same data type. In more mathematical terms, a vector is a rank-one tensor. A two-dimensional array (i.e., a rank-two tensor) is denoted a "matrix", and higher-dimensional arrays are referred to as "tensors".
In contrast to other approaches, VectorLib does not allow zero-size vectors!

The basis of all VectorLib routines is formed by the various vector data types given below and declared in <VecLib.h> or the unit VecLib. In contrast to the fixed-size static arrays, the VectorLib types use dynamic memory allocation and allow for varying sizes. Because of this increased flexibility, we recommend that you predominantly use the latter. Here they are:
 
C/C++
typedef float    *fVector;
typedef double   *dVector;
typedef extended *eVector;
typedef fComplex *cfVector;
typedef dComplex *cdVector;
typedef eComplex *ceVector;
typedef fPolar   *pfVector;
typedef dPolar   *pdVector;
typedef ePolar   *peVector;
typedef int      *iVector;
typedef byte     *biVector;
typedef short    *siVector;
typedef long     *liVector;
typedef quad     *qiVector;
typedef uquad    *uqVector;
typedef unsigned *uVector;
typedef unsigned byte  *ubVector;
typedef unsigned short *usVector;
typedef unsigned long  *ulVector;
typedef ui       *uiVector;
Pascal/Delphi
type fVector  = ^Float;
type dVector  = ^Double;
type eVector  = ^Extended;
type cfVector = ^fComplex;
type cdVector = ^dComplex;
type ceVector = ^eComplex;
type pfVector = ^fPolar;
type pdVector = ^dPolar;
type peVector = ^ePolar;
type iVector  = ^Integer;
type biVector = ^ByteInt;
type siVector = ^SmallInt;
type liVector = ^LongInt;
type qiVector = ^QuadInt;
type uVector  = ^UInt;
type ubVector = ^UByte;
type usVector = ^USmall;
type ulVector = ^ULong;

Internally, a data type like fVector means "pointer to float", but you may think of a variable declared as fVector rather in terms of a "vector of floats".
 
Note: in Windows programming, the letter "l" or "L" is often used to denote "long int" variables. To prevent confusion, OptiVec signals the data type "long int" by "li" or "LI", and the data type "unsigned long" by "ul" or "UL". Conflicts with the prefixes for "long double" vectors are avoided by deriving those from the alias name "extended" and using "e", "ce", "E", and "CE", as described above and in the following.
 
C/C++ specific:
Vector elements can be accessed either with the [] operator, as in VA[375] = 1.234;
or by the type-specific functions VF_element (which returns the value of the desired vector element, but cannot be used to overwrite it) and VF_Pelement (which returns a pointer to a vector element).
Especially for some older Borland C versions (which have a bug in their pointer arithmetic), VF_Pelement has to be used instead of the syntax X+n.
In your programs, you may mix these vector types with the static arrays of classic C style.
For example:
float a[100];                /* classic static array */
fVector b = VF_vector(100);  /* VectorLib vector */
VF_equ1( a, 100 );           /* set the first 100 elements of a equal to 1.0 */
VF_equC( b, 100, 3.7 );      /* set the first 100 elements of b equal to 3.7 */

Pascal/Delphi specific:
As in C/C++, you may mix these vector types with the static arrays of classic Pascal style. Static arrays have to be passed to OptiVec functions with the "address of" operator. Here, the above example reads:
a: array[0..99] of Single;   (* classic static array *)
b: fVector;                  (* VectorLib vector *)
b := VF_vector(100);
VF_equ1( @a, 100 );          (* set the first 100 elements of a equal to 1.0 *)
VF_equC( b, 100, 3.7 );      (* set the first 100 elements of b equal to 3.7 *)

Delphi also offers dynamically-allocated arrays, which may also be used as arguments for OptiVec functions. The following table compares the pointer-based vectors of VectorLib with the array types of Pascal/Delphi:
 
alignment of first element
    OptiVec vectors:       on a 32-byte boundary for optimum cache-line matching
    Pascal/Delphi arrays:  on a 2- or 4-byte boundary (may cause a cache-line break penalty for Double, QuadInt)

alignment of following elements
    OptiVec vectors:       packed (i.e., no dummy bytes between elements, even for 8-, 10-, and 16-bit types)
    Pascal/Delphi arrays:  arrays must be declared as "packed" (Delphi 4+) to be compatible with OptiVec

index range checking
    OptiVec vectors:       none
    Pascal/Delphi arrays:  automatic, with built-in size information

dynamic allocation
    OptiVec vectors:       functions VF_vector, VF_vector0
    Pascal/Delphi arrays:  procedure SetLength (Delphi 4+ only)

initialization with 0
    OptiVec vectors:       optional, by calling VF_vector0
    Pascal/Delphi arrays:  always (Delphi 4+ only)

de-allocation
    OptiVec vectors:       functions V_free, V_freeAll
    Pascal/Delphi arrays:  procedure Finalize (Delphi 4+ only)

reading single elements
    OptiVec vectors:       function VF_element:  a := VF_element(X,5);
                           (Delphi 4+ only: a typecast into an array is also possible:  a := fArray(X)[5];)
    Pascal/Delphi arrays:  index in brackets:  a := X[5];

setting single elements
    OptiVec vectors:       function VF_Pelement:  VF_Pelement(X,5)^ := a;
                           (Delphi 4+ only: a typecast into an array is also possible:  fArray(X)[5] := a;)
    Pascal/Delphi arrays:  index in brackets:  X[5] := a;

passing to an OptiVec function
    OptiVec vectors:       directly:  VF_equ1( X, sz );
    Pascal/Delphi arrays:  address-of operator:  VF_equ1( @X, sz );

passing a sub-vector to an OptiVec function
    OptiVec vectors:       function VF_Pelement:  VF_equC( VF_Pelement(X,10), sz-10, 3.7 );
    Pascal/Delphi arrays:  address-of operator:  VF_equC( @X[10], sz-10, 3.7 );
 
Summarizing the properties of OptiVec vectors and of Pascal/Delphi arrays: the latter are somewhat more convenient and, thanks to the index range checking, safer, whereas the pointer-based OptiVec vectors are processed faster (due to the better alignment and the absence of checking routines).

Back to VectorLib Table of Contents   OptiVec home 

2.4 Vector Function Prefixes

In the plain-C, Pascal and Delphi versions, every OptiVec function has a prefix denoting the data-type on which it acts. (Read here about the overloaded C++ functions of VecObj.)
 
Prefix   Arguments and return value
VF_      fVector and float
VD_      dVector and double
VE_      eVector and extended (long double)
VCF_     cfVector and fComplex
VCD_     cdVector and dComplex
VCE_     ceVector and eComplex
VPF_     pfVector and fPolar
VPD_     pdVector and dPolar
VPE_     peVector and ePolar
VI_      iVector and int / Integer
VBI_     biVector and byte / ByteInt
VSI_     siVector and short int / SmallInt
VLI_     liVector and long int / LongInt
VQI_     qiVector and quad / QuadInt
VU_      uVector and unsigned / UInt
VUB_     ubVector and unsigned char / UByte
VUS_     usVector and unsigned short / USmall
VUL_     ulVector and unsigned long / ULong
VUQ_     uqVector and uquad / UQuad (for Win64 only!)
VUI_     uiVector and ui
V_       (data-type conversions like V_FtoD, data-type independent functions like V_initPlot)
 

Back to VectorLib Table of Contents   OptiVec home 


3. VecObj, the Object-Oriented Interface for VectorLib

VecObj, the object-oriented C++ interface to the OptiVec vector functions, was written by Brian Dale, Case Western Reserve University.
Among the advantages it offers are the following:
  • automatic allocation and deallocation of memory
  • simplified vector handling
  • greatly reduced risk of memory leaks
  • increased memory access safety
  • intuitive overloaded operators
  • simpler function calls
There are a few drawbacks, though, which you should be aware of:
  • increased compiler load
  • larger overhead (as for any encapsulated C++ code!), leading to
  • increased code size
  • decreased computational efficiency
  • vectors can be processed only as a whole, not in parts
VecObj is contained in the include-files <VecObj.h>, <fVecObj.h>, <dVecObj.h> etc., with one include-file for each of the data-types supported in OptiVec.
To get the whole interface (for all data types at once),
#include <OptiVec.h>.
For access to any of the vector graphics functions, always include <OptiVec.h>.

MS Visual C++ and Embarcadero / Borland C++ Builder (but not earlier Borland C++ versions): programmers should put the directive
"using namespace OptiVec;"
either into the body of any function that uses tVecObj, or into the global declaration part of the program. Placing the directive in the function body is safer, as it avoids potential namespace conflicts in other functions.
The vector objects are defined as classes vector<T>, encapsulating the vector address (pointer) and the size.
For easier use, these classes have the alias names fVecObj, dVecObj, and so on, with the data type signalled by the first one or two letters of the class name, in the same way as for the vector types described above.

All functions defined in VectorLib for a specific vector data-type are contained as member functions in the respective tVecObj class.
The constructors are available in four forms:
vector(); // no memory allocated, size set to 0
vector( ui size ); // vector of size elements allocated
vector( ui size, T fill ); // as before, but initialized with value "fill"
vector( vector<T> init ); // creates a copy of the vector "init"

For all vector classes, the arithmetic operators
+    -    *    /    +=    -=    *=    /=
are defined, with the exception of the polar-complex vector classes, where only multiplications and divisions, but no additions or subtractions are supported. These operators are the only cases in which you can directly assign the result of a calculation to a vector object, like
fVecObj Z = X + Y; or
fVecObj Z = X * 3.5;
Note, however, that the C++ class syntax rules do not allow a very efficient implementation of these operators. The arithmetic member functions are much faster. If speed is an issue, use
Z.addV( X, Y ); or
Z.mulC( X, 3.5 );
instead of the operator syntax. The operator * refers to element-wise multiplication, not to the scalar product of two vectors.

All other arithmetic and math functions can be called only as member functions of the respective output vector, as in, for example, Y.exp(X). Although it would certainly be more logical to define these functions in such a way that you could write "Y = exp(X)" instead, the member-function syntax was chosen for reasons of efficiency: the only way to implement the second variant is to store the result of the exponential function of X in a temporary vector first, which is then copied into Y, considerably increasing the work-load and memory demands.

While most VecObj functions are member functions of the output vector, there are a number of functions which do not have an output vector. These are member functions of an input vector.
Example: s = X.mean();.

If you ever need to process a VecObj vector with a "classic" plain-C VectorLib function (for example, to process only a part of it), you may use the member functions
getSize() to retrieve its size,
getVector() for the pointer (of data type tVector, where "t" stands for the usual type prefix), and
Pelement( n ) for a pointer to the n'th element.

Continue with chapter 4. VectorLib Functions and Routines: A Short Overview
Back to VectorLib Table of Contents      OptiVec home 

Copyright © 1998-2013 OptiCode – Dr. Martin Sander Software Development

Last modified: 20 January 2013