VectorLib
VectorLib is the vector functions part of OptiVec. This file describes the basic principles of the OptiVec libraries and gives an overview of VectorLib. The new object-oriented interface, VecObj, is described in chapter 3. MatrixLib and CMATH are described separately.
Contents
1. Introduction
1.1.3 CUDA Device Support
1.1.4 Choosing the right OptiVec Library
2.1 Synonyms for Some Data Types
2.2 Complex Numbers
2.3 Vector Data Types
2.4 Vector Function Prefixes
4.7 Analysis
4.8 Signal Processing: Fourier Transforms and Related Topics
4.9 Statistical Functions and Building Blocks
4.10 Data Fitting
4.11 Input and Output
4.12 Graphics
5.1 General Remarks
5.2 Integer Errors
5.3 Floating-Point Errors
5.3.1 C/C++ specific
5.3.2 Pascal/Delphi specific
5.3.3 Error Types (Both C/C++ and Pascal/Delphi)
5.4 The Treatment of Denormal Numbers
5.5 Advanced Error Handling: Writing Messages into a File
5.6 OptiVec Error Messages
OptiVec offers a powerful set of routines for numerically demanding applications, making the philosophy of vectorized programming available for C/C++ and Pascal/Delphi languages. It serves to overcome the limitations of loop management of conventional compilers which proved to be one of the largest obstacles in the programmer's way towards efficient coding for scientific and data analysis applications.
In comparison to the old vector language APL, OptiVec has the advantage of being incorporated into the modern and versatile languages C/C++ and Pascal/Delphi. Recent versions of C++ and Fortran do already offer some sort of vector processing, by virtue of iterator classes using templates (C++) and field functions (Fortran90). Both of these, however, are basically a convenient means of letting the compiler write the loop for you and then compile it to the usual inefficient code. The same is true for most implementations of the popular BLAS (Basic Linear Algebra Subprograms) libraries. Compared to these approaches, OptiVec is superior mainly in execution speed: on average by a factor of 2-3, in some cases even up to 8. The performance is no longer limited by the quality of your compiler, but rather by the real speed of the processor!
There is a certain overlap in the range of functions offered by OptiVec and by BLAS, LINPACK, and other libraries and source-code collections. However, the latter must be compiled, and, consequently, their performance is determined mainly by the quality of the compiler chosen. To the best of our knowledge, it is our product, OptiVec, that offers the first comprehensive vectorized-functions library realized in a true Assembler implementation.
The wide range of routines and functions covered by OptiVec, the high numerical efficiency, and the increased ease of programming make this package a powerful programming tool for scientific and data analysis applications, competing with (and often beating) many high-priced integrated systems, but embedded into your favourite programming language.
Vectorization has always been the magic formula for supercomputers with their multi-processor parallel architectures. On these architectures, one tries to spread the computational effort equally over the available processors, thus maximizing execution speed. The so-called "divide and conquer" algorithms break down more complicated numerical tasks into small loops over array elements. Sophisticated compilers then find the most efficient way to distribute the array elements among the processors. Many supercomputer compilers also come with a large set of pre-defined proprietary vector and matrix functions for many basic tasks. These vectorized functions offer the best way to achieve maximum throughput.
Obviously, the massive parallel processing of, say, a Cray is not possible even on modern PCs with their modest 2 or 4-processor core configurations, let alone on the classical single-processor PC. Consequently, at first sight, it might seem difficult to apply the principle of vectorized programming to the PC. Actually, however, there are many vector-specific optimizations possible, even for computers with only one CPU. Most of these optimizations are not available to present compilers. Rather, one has to go down to the machine-code level. Hand-optimized, Assembler-written vector functions outperform compiled loops by a factor of two to three, on the average. This means that vectorization, properly done, is indeed worth the effort, also for PC programs.
Here are the most important optimization strategies, employed in OptiVec to boost the performance on any PC (regardless of the number of processor cores):
Prefetch of chunks of vector elements
Beginning with the Pentium III processor, Intel introduced the very useful feature of explicit memory prefetch. With these commands, it is possible to "tell" the processor to fetch data from memory sufficiently in advance, so that no time is wasted waiting for them when they are actually needed.
Cache control
The Pentium III+ processors offer the possibility to mark data as "temporal" (will be used again) or "non-temporal" (used only once), while they are fetched or stored. In OptiVec functions, it is assumed that input vectors (and matrices) will not be used again, whereas the output vectors are likely to become the input for some ensuing procedure. Consequently, the cache is bypassed while loading input data, but the output data are written into the cache. Of course, this approach breaks down if the vectors or matrices become too large to fit into the cache. For these cases, a large-vector version of the OptiVec libraries is available which bypasses the cache also while writing the output vectors. For simple arithmetic functions, this gains up to 20% in speed compared to the small-and-medium-size version. On the other hand, as this large-vector version effectively switches the cache off, a drastic performance penalty (up to a factor of three or four!) will result if it is used for smaller systems. For the same reason, you should carefully check whether your problem could be split up into smaller vectors before resorting to the large-vector version. This would allow you to achieve the much higher performance resulting from efficient data caching.
Use of SIMD commands
You might wonder why this strategy is not listed first. The SSE or "Streaming Single-Instruction-Multiple-Data Extensions" of Pentium III, Pentium 4 and their successors provide explicit support for vectorized programming with floating-point data in float / single or double precision (the latter only for Pentium 4). At first sight, therefore, they should revolutionize vector programming. Given the normal relation between processor and data bus speeds, however, many of the simple arithmetic operations are data-transfer limited, and the use of SIMD commands does not make the large difference (with respect to well-written FPU code) it could make otherwise. In most cases, the advantage of treating four floats in a single command melts down to a 20-30% increase in speed (which is not that bad, anyway!). For more complicated operations, on the other hand, SIMD commands often cannot be employed, either because conditional branches have to be taken for each vector element individually, or because the "extra" accuracy and range available with traditional FPU commands (with their internal extended accuracy) allow the algorithms to be simplified so much that the FPU code is still faster. As a consequence, we use SIMD commands only where a real speed gain is possible. Please note, however, that the SIMD-employing library versions (P6, P7 etc.) generally sacrifice 2-3 digits of accuracy in order to attain the described speed gain. If this is not acceptable for your specific task, please stay with the P4 libraries.
Preload of floating-point constants
Floating-point constants, employed in the evaluation of mathematical functions, are loaded into floating-point registers outside the actual loop and stay as long as they are needed. This saves a large amount of loading/unloading operations which are necessary if a mathematical function is called for each element of a vector separately.
Full XMM and FPU stack usage
Where necessary, all eight (64-bit: all sixteen) XMM registers and/or all eight coprocessor registers are employed.
Superscalar scheduling
By careful "pairing" of commands whose results do not depend upon each other, the two integer pipes and the two fadd/fmul units of the processor are used as efficiently as possible.
Loop-unrolling
Where optimum pairing of commands cannot be achieved for single elements, vectors are often processed in chunks of two, four, or even more elements. This makes it possible to fully exploit the parallel-processing capabilities of the Pentium and its successors. Moreover, the relative amount of time spent on loop management is significantly reduced. In connection with the data prefetching described above, the depth of the unrolled loops is most often adapted to the cache line size of 32 bytes (PentiumXX) or 64 bytes (AMD 64 x2 or Core2 Duo).
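As a rough illustration of the chunked processing described here, the following C++ sketch shows a four-way unrolled loop. It is not OptiVec code (the library's routines are hand-written Assembler); the function name add_const_unrolled4 is hypothetical and only shows the general shape of the technique: a main loop over chunks of four elements, followed by a remainder loop.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration, not an OptiVec function:
// Y[i] = X[i] + c, processed in chunks of four elements.
void add_const_unrolled4(float* Y, const float* X, std::size_t n, float c)
{
    std::size_t i = 0;
    // Main loop: four mutually independent operations per iteration,
    // so loop management is paid once per four elements.
    for (; i + 4 <= n; i += 4) {
        Y[i]     = X[i]     + c;
        Y[i + 1] = X[i + 1] + c;
        Y[i + 2] = X[i + 2] + c;
        Y[i + 3] = X[i + 3] + c;
    }
    // Remainder loop: handles the last 0..3 elements.
    for (; i < n; ++i)
        Y[i] = X[i] + c;
}
```

With 4-byte floats, a chunk of four elements covers 16 bytes, so two such chunks correspond to one 32-byte cache line.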
Simplified addressing
The addressing of vector elements is still a major source of inefficiency with present compilers. Switching back and forth between input and output vectors, a large number of redundant addressing operations is performed. The strict (and easy!) definitions of all OptiVec functions allow these operations to be reduced to a minimum.
Replacement of floating-point by integer commands
For any operations with floating-point numbers that can also be performed using integer commands (like copying, swapping, or comparing to preset values), the faster method is consistently employed.
Strict precision control
C compilers convert a float into a double (Borland Pascal/Delphi even into extended) before passing it to a mathematical function. This approach was useful at times when disk memory was too great a problem to include separate functions for each data type in the .LIB files, but it is simply inefficient on modern PCs. Consequently, no such implicit conversions are present in OptiVec routines. Here, a function of a float is calculated to float (i.e. single) precision, wasting no time on calculating more digits than necessary, which would be discarded anyway. There is also a brute-force approach to precision control: you can call V_setFPAccuracy( 1 ); to actively switch the FPU to single precision, if that is enough for a given application. Thereby, execution can be slightly sped up from Pentium CPUs on. Be prepared, however, to accept even lower-than-single accuracy of your end results if you elect this option. For further details and precautions, see V_setFPAccuracy.
All-inline coding
All external function calls are eliminated from the inner loops of the vector processing. This saves the execution time necessary for the "call / ret" pairs and for loading the parameters onto the stack.
Cache-line matching of local variables
The Level-1 cache of the Pentium and its 32-bit successors is organized in lines of 32 bytes each; modern 64-bit processors use 64-byte lines. Many OptiVec functions need double-precision or extended-precision real local variables on the stack (mainly for integer/floating-point conversions or for range checking). Present compilers align the stack on 4-byte boundaries, which means there is a certain chance that the 8 bytes of a double or the 10 bytes of an extended, stored on the stack, will cross a cache-line boundary. This, in turn, would lead to a cache line-break penalty, deteriorating the performance. Consequently, those OptiVec functions where this is an issue use special procedures to align their local variables on 8-byte (for doubles), 16-byte (for extendeds), or 32-byte boundaries (for XMM values).
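The alignment requirement can be made concrete with a small C++ sketch. It uses the standard alignas specifier of C++11, which is an assumption made for illustration only; the text does not say how OptiVec's own "special procedures" are implemented. The helper names is_aligned and locals_are_aligned are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if p lies on an address that is a multiple of `alignment` bytes.
bool is_aligned(const void* p, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical demo: request the boundaries named in the text for local
// variables and check that the addresses really lie on them.
bool locals_are_aligned()
{
    alignas(8)  double d    = 1.0;   // 8-byte boundary for a double
    alignas(32) float  x[8] = {0};   // 32-byte boundary; eight floats = 32 bytes
    return is_aligned(&d, 8) && is_aligned(x, 32);
}
```

A variable that starts on a boundary equal to its own size (or a multiple of it) cannot straddle two cache lines, which is the penalty the paragraph above describes.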
Unprotected and reduced-range functions
OptiVec offers alternative forms of some mathematical functions, where you have the choice between the fully protected variant with error handling and another, unprotected variant without. In the case of the integer power functions, for example, the absence of error checking allows the unprotected versions to be vectorized much more efficiently. Similarly, the sine and cosine functions can be coded more efficiently for arguments that the user can guarantee to lie in the range between -2π and +2π. In these special cases, the execution time may be reduced by up to 40%, depending on the hardware environment. This increased speed always has to be balanced against the increased risk, though: if any input element outside the valid range is encountered, the unprotected and reduced-range functions will crash without warning.
Multithread support
All the above being said about single-CPU PCs, multi-processor computers (with Intel's Core i3, i5, i7, Core2Duo, AMD's Athlon 64 X2, or workstations and servers equipped with 2 or 4 PentiumXX chips) do allow the operating system to distribute threads among the available processors, doubling or quadrupling the overall performance. For that, any functions running in parallel must be prevented from interfering with each other through read/write operations on global variables. With very few exceptions (namely the plotting functions, which have to use global variables to store the current window and coordinate system settings, and the non-linear data-fitting functions), all OptiVec functions are reentrant and may run in parallel.
Be careful with multi-threading, if you are using the P6 or P7 versions of OptiVec: The earlier releases of 32-bit Windows do not save the XMM registers (employed in the SIMD commands) during task switches. No such problems have been found with Windows XP or Vista.
When designing your multi-thread application, you have two options: functional parallelism and data parallelism.
Functional Parallelism
If different threads are performing different tasks (that is, if they are functionally different), one speaks of functional parallelism. As an example, consider one thread handling user input / output, while another one performs background calculations. Even on a single-core CPU, this kind of multi-threading may offer advantages (e.g., the user interface does not block during extensive background calculations, but still takes input). On a multi-core computer, the two (or more) threads can actually run simultaneously on the different processor cores. In general, however, the load balance between the processor cores is far from perfect: often, one processor is running at maximum load, while another one is sitting idle, waiting for input. Still, functional multithreading is the best option whenever your numerical tasks involve vectors and matrices of only small-to-moderate size.
Data Parallelism
In order to improve the load balance between the available processor cores, thereby maximizing throughput, it is possible to employ classical parallel processing: the data to be processed is split up into several chunks, each thread getting one of these chunks. This is aptly called data parallelism. The usefulness of this approach is limited by the overhead involved in the data distribution and in the thread-to-thread communication. Moreover, there are always parts of the code which need to be processed sequentially and cannot be parallelized. Therefore, data parallelism pays off only for larger vectors and matrices. Typical break-even sizes range from about 100 (for the calculation of transcendental functions of complex input values) to several 10,000 elements (as in the simple arithmetic functions). Only when your vectors and matrices are considerably larger than that threshold, the performance is actually improved over a functional-parallelism approach. The boost then quickly approaches (but never exactly reaches) the theoretical limit of a factor equal to the number of processor cores available.
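The chunking idea can be sketched in standard C++ with std::thread. This is not the OptiVec thread engine; the function name mul_const_parallel and the parameter minSizeForThreads (a stand-in for the break-even size discussed above) are hypothetical, and a real implementation would reuse threads rather than create them on every call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical illustration of data parallelism:
// Y[i] = X[i] * c, with the index range split into one chunk per thread.
void mul_const_parallel(float* Y, const float* X, std::size_t n, float c,
                        unsigned nThreads, std::size_t minSizeForThreads)
{
    auto work = [=](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i)
            Y[i] = X[i] * c;
    };
    // Below the break-even size, thread overhead would outweigh the gain,
    // so the whole vector is processed in the calling thread.
    if (nThreads < 2 || n < minSizeForThreads) {
        work(0, n);
        return;
    }
    const std::size_t chunk = (n + nThreads - 1) / nThreads;  // rounded up
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < nThreads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end   = std::min(n, begin + chunk);
        if (begin < end)
            pool.emplace_back(work, begin, end);  // chunks do not overlap
    }
    for (auto& th : pool)
        th.join();  // sequential part: wait for all chunks to finish
}
```

Because each thread writes only to its own index range, no locking is needed; the join at the end is the unavoidable sequential part mentioned in the text.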
Choosing the right OptiVec Library
Whenever you want your application to run on a wide range of supported platforms, and when your vectors and matrices are only of small-to-moderate size, we recommend using the general-purpose libraries, OVVC4.LIB (for MS Visual C++), VCF4W.LIB (for Borland C++), or the units in OPTIVEC\LIB4 (for Delphi). These libraries combine good performance with back-compatibility to older hardware, down to 486DX, Pentium, and old models of Athlon. They are all multi-thread safe and support functional parallelism. If you do not need full floating-point accuracy and that amount of back-compatibility, you can get higher performance by switching to the P6, P7, or P8 libraries (marked by the respective number in the library name).
For large vectors/matrices on single-core machines from Pentium III+ on, we offer versions gaining some performance by simply bypassing the data cache. These Large-Vector Libraries are marked by the letter "L": OVVC6L.LIB (for MS Visual C++), VCF6L.LIB (for Borland C++), or the units in OPTIVEC\LIB6L (for Delphi). Replace the "6" with "7" to get the Pentium 4+ versions, and so on. If mis-used for smaller vectors / matrices, the Large-Vector libraries will perform significantly slower than the general-purpose libraries!
Finally, for large vectors/matrices on multi-core machines, multi-core optimized libraries actively distribute the work load over the available processor cores for data parallel execution. These libraries are marked by the letter "M", as in OVVC7M.LIB (for MS Visual C++, using SSE2), VCF4M.LIB (for Borland C++, full FPU accuracy), or the units in OPTIVEC\LIB8M (for Delphi, using SSE3). These libraries are designed for AMD 64 x2, Intel Core2 Duo, or machines equipped with several discrete processors of the Pentium 4+ level. The CUDA libraries are based on the "M" libraries and are marked by the letter "C", as, e.g., in OVVC8C.LIB.
The "M" and "C" libraries will still run on single-core machines, but due to the thread-management overhead somewhat slower than the general-purpose libraries. Although the "M" libraries are designed with medium to large vectors in mind, the penalty for using them with smaller vectors is almost negligible, as the OptiVec thread-engine automatically executes a function in a single thread, if the vector size is too small for parallel execution to earn back the cost involved in the thread-to-thread communication.
If you use the "M" or "C" libraries, your programme must call V_initMT( nAvailProcCores ) before any of the vector functions.
The 64-bit integer data type (__int64 in BC++ Builder and MS Visual C++, Int64 in Delphi) is called quad (for "quadword integer") in OptiVec.
In 32-bit, the type quad is always signed. Functions for unsigned 64-bit integers are available only in the 64-bit versions of OptiVec.
The data type extended, which is familiar to Pascal/Delphi programmers, is defined as a synonym for "long double" in OptiVec for C/C++. As Visual C++ does not support 80-bit reals, we define extended as "double" in the OptiVec versions for that compiler.
For historical reasons (dating back to the development of Turbo Pascal), the various integer data types have a somewhat confusing nomenclature in Delphi. In order to make the derived function prefixes compatible with the C/C++ versions of OptiVec, we define a number of synonyms, as described in the following table:
| type | Delphi name | synonym | derived prefix |
| 8 bit signed | ShortInt | ByteInt | VBI_ |
| 8 bit unsigned | Byte | UByte | VUB_ |
| 16 bit signed | SmallInt | | VSI_ |
| 16 bit unsigned | Word | USmall | VUS_ |
| 32 bit signed | LongInt | | VLI_ |
| 32 bit unsigned | | ULong | VUL_ |
| 64 bit signed | Int64 | QuadInt | VQI_ |
| 64 bit unsigned (x64 version only!) | UInt64 | UQuad | VUQ_ |
| 16/32 bit signed | Integer | | VI_ |
| 16/32 bit unsigned | Cardinal | UInt | VU_ |
To have a Boolean data type available which is of the same size as Integer, we define the type IntBool. It is equivalent to LongBool in Delphi. You will see the IntBool type as the return value of many mathematical VectorLib functions.
If you use only the vectorized complex functions (but not the scalar functions of CMATH), you need not explicitly include CMATH. In this case, the following complex data types are defined in <VecLib.h> for C/C++:
typedef struct { float Re, Im; } fComplex;
typedef struct { double Re, Im; } dComplex;
typedef struct { extended Re, Im; } eComplex;
typedef struct { float Mag, Arg; } fPolar;
typedef struct { double Mag, Arg; } dPolar;
typedef struct { extended Mag, Arg; } ePolar;
The corresponding definitions for Pascal/Delphi are contained in the unit VecLib:
type fComplex = record Re, Im: Float; end;
type dComplex = record Re, Im: Double; end;
type eComplex = record Re, Im: Extended; end;
type fPolar = record Mag, Arg: Float; end;
type dPolar = record Mag, Arg: Double; end;
type ePolar = record Mag, Arg: Extended; end;
If, for example, a complex number z is declared as "fComplex z;", the real and imaginary parts of z are available as z.Re and z.Im, resp. Complex numbers are initialized either by setting the constituent parts separately to the desired value, e.g.,
z.Re = 3.0; z.Im = 5.7;
p.Mag = 4.0; p.Arg = 0.7;
(of course, the assignment operator is := in Pascal/Delphi).
Alternatively, the same initialization can be accomplished by the functions fcplx or fpolr:
C/C++:
z = fcplx( 3.0, 5.7 );
p = fpolr( 4.0, 0.7 );
Pascal/Delphi:
fcplx( z, 3.0, 5.7 );
fpolr( p, 4.0, 0.7 );
For double-precision complex numbers, use dcplx and dpolr, for extended-precision complex numbers, use ecplx and epolr.
Pointers to arrays or vectors of complex numbers are declared using the data types cfVector, cdVector, and ceVector (for cartesian complex) and pfVector, pdVector, and peVector (for polar complex) described below.
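To make the two representations concrete, here is a small self-contained C++ sketch. It re-declares fComplex and fPolar locally with the layout given above so that it compiles without <VecLib.h>. The helpers make_fcplx, make_fpolr, and polar_to_cartesian are hypothetical stand-ins written for this example, not the library's own fcplx and fpolr.

```cpp
#include <cassert>
#include <cmath>

// Local re-declarations with the same layout as the typedefs shown above.
struct fComplex { float Re, Im; };
struct fPolar   { float Mag, Arg; };

// Hypothetical stand-ins for the initialization functions.
inline fComplex make_fcplx(float re, float im)   { return {re, im}; }
inline fPolar   make_fpolr(float mag, float arg) { return {mag, arg}; }

// Hypothetical helper: convert polar (Mag, Arg) to cartesian (Re, Im)
// using Re = Mag*cos(Arg), Im = Mag*sin(Arg).
inline fComplex polar_to_cartesian(fPolar p)
{
    return { p.Mag * std::cos(p.Arg), p.Mag * std::sin(p.Arg) };
}
```

For instance, make_fcplx( 3.0f, 5.7f ) yields a value whose Re and Im members hold 3.0 and 5.7, and a polar number with Arg = 0 converts to a purely real cartesian number.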
The basis of all VectorLib routines is formed by the various vector data types given below and declared in <VecLib.h> or the unit VecLib. In contrast to the fixed-size static arrays, the VectorLib types use dynamic memory allocation and allow for varying sizes. Because of this increased flexibility, we recommend that you predominantly use the latter. Here they are:
| C/C++ | Pascal/Delphi |
Note: in connection with Windows programs, often the letter "l" or "L" is used to denote "long int" variables. In order to prevent confusion, however, the data type "long int" is signalled by "li" or "LI", and the data type "unsigned long" is signalled by "ul" or "UL". Conflicts with prefixes for "long double" vectors are avoided by deriving these from the alias name "extended" and using "e", "ce", "E", and "CE", as described above and in the following.
Pascal/Delphi specific:
As in C/C++, you may mix these vector types with the static arrays of classic Pascal style. Static arrays have to be passed to OptiVec functions with the "address of" operator. Here, the above example reads:
a: array[0..99] of Single; (* classic static array *)
b: fVector; (* VectorLib vector *)
b := VF_vector(100);
VF_equ1( @a, 100 ); (* set first 100 elements of a = 1.0 *)
VF_equC( b, 100, 3.7 ); (* set first 100 elements of b = 3.7 *)
Delphi also offers dynamically-allocated arrays, which may also be used as arguments for OptiVec functions. The following table compares the pointer-based vectors of VectorLib with the array types of Pascal/Delphi:
| | OptiVec vectors | Pascal/Delphi static/dynamic arrays |
| alignment of first element | on 32-byte boundary for optimum cache-line matching | 2 or 4-byte boundary (may cause line-break penalty for double, QuadInt) |
| alignment of following elements | packed (i.e., no dummy bytes between elements, even for 8, 10, and 16-bit types) | arrays must be declared as "packed" for Delphi 4+ to be compatible with OptiVec |
| index range checking | none | automatic with built-in size information |
| dynamic allocation | function VF_vector, VF_vector0 | procedure SetLength (Delphi 4+ only) |
| initialization with 0 | optional by calling VF_vector0 | always (Delphi 4+ only) |
| de-allocation | function V_free, V_freeAll | procedure Finalize (Delphi 4+ only) |
| reading single elements | function VF_element: a := VF_element(X,5); Delphi 4+ only: typecast into array also possible: a := fArray(X)[5]; | index in brackets: a := X[5]; |
| setting single elements | function VF_Pelement: VF_Pelement(X,5)^ := a; Delphi 4+ only: typecast into array also possible: fArray(X)[5] := a; | index in brackets: X[5] := a; |
| passing to OptiVec function | directly: VF_equ1( X, sz ); | address-of operator: VF_equ1( @X, sz ); |
| passing sub-vector to OptiVec function | function VF_Pelement: VF_equC( VF_Pelement(X,10), sz-10, 3.7); | address-of operator: VF_equC( @X[10], sz-10, 3.7 ); |
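The last two rows of the table describe a pointer-plus-size calling convention. In C++ terms this can be sketched as follows, under the assumption that a vector is passed as a plain pointer to its first element. The function set_const is a hypothetical stand-in written for this example (it mimics what the table shows for VF_equC), and fill_tail_demo is likewise illustrative.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a function like VF_equC: set n elements to c.
void set_const(float* X, std::size_t n, float c)
{
    for (std::size_t i = 0; i < n; ++i)
        X[i] = c;
}

// Fill a whole vector with 1.0, then overwrite everything from element 10 on.
void fill_tail_demo(float* X, std::size_t sz)
{
    set_const(X, sz, 1.0f);                // whole vector: pointer and full size
    if (sz > 10)
        set_const(X + 10, sz - 10, 3.7f);  // sub-vector: starts at X[10]
}
```

The sub-vector call passes the address of element 10 together with the reduced size sz-10, which is the same pattern as VF_equC( VF_Pelement(X,10), sz-10, 3.7 ) in the table.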
| Prefix | Arguments and return value |
| VF_ | fVector and float |
| VD_ | dVector and double |
| VE_ | eVector and extended (long double) |
| VCF_ | cfVector and fComplex |
| VCD_ | cdVector and dComplex |
| VCE_ | ceVector and eComplex |
| VPF_ | pfVector and fPolar |
| VPD_ | pdVector and dPolar |
| VPE_ | peVector and ePolar |
| VI_ | iVector and int / Integer |
| VBI_ | biVector and byte / ByteInt |
| VSI_ | siVector and short int / SmallInt |
| VLI_ | liVector and long int / LongInt |
| VQI_ | qiVector and quad / QuadInt |
| VU_ | uVector and unsigned / UInt |
| VUB_ | ubVector and unsigned char / UByte |
| VUS_ | usVector and unsigned short / USmall |
| VUL_ | ulVector and unsigned long / ULong |
| VUQ_ | uqVector and uquad / UQuad (for Win64 only!) |
| VUI_ | uiVector and ui |
| V_ | (data-type conversions like V_FtoD, data-type independent functions like V_initPlot) |
MS Visual C++ and Embarcadero / Borland C++ Builder (but not previous Borland C++ versions): Programmers should put the directive
"using namespace OptiVec;"
either in the body of any function that uses tVecObj, or in the global declaration part of the program. Placing the directive in the function body is safer, avoiding potential namespace conflicts in other functions.
The vector objects are defined as classes vector<T>, encapsulating the vector address (pointer) and size.
For easier use, these classes have the alias names fVecObj, dVecObj, and so on, with the data type signalled by the first one or two letters of the class name, in the same way as the vector types described above.
All functions defined in VectorLib for a specific vector data-type are contained as member functions in the respective tVecObj class.
The constructors are available in four forms:
vector(); // no memory allocated, size set to 0
vector( ui size ); // vector of size elements allocated
vector( ui size, T fill ); // as before, but initialized with value "fill"
vector( vector<T> init ); // creates a copy of the vector "init"
For all vector classes, the arithmetic operators
+ - * / += -= *= /=
are defined, with the exception of the polar-complex vector classes, where only multiplications and divisions, but no additions or subtractions are supported. These operators are the only cases in which you can directly assign the result of a calculation to a vector object, like
fVecObj Z = X + Y; or
fVecObj Z = X * 3.5;
Note, however, that the C++ class syntax rules do not allow a very efficient implementation of these operators. The arithmetic member functions are much faster. If speed is an issue, use
Z.addV( X, Y ); or
Z.mulC( X, 3.5 );
instead of the operator syntax. The operator * refers to element-wise multiplication, not to the scalar product of two vectors.
All other arithmetic and math functions can only be called as member functions of the respective output vector as, for example, Y.exp(X). Although it would certainly be more logical to have these functions defined in such a way that you could write "Y = exp(X)" instead, the member-function syntax was chosen for efficiency considerations: The only way to implement the second variant is to store the result of the exponential function of X first in a temporary vector, which is then copied into Y, thus considerably increasing the work-load and memory demands.
While most VecObj functions are member functions of the output vector, there are a number of functions which do not have an output vector. In these cases, the functions are member functions of an input vector.
Example: s = X.mean();.
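The two calling conventions can be illustrated with a miniature class. MiniVec below is a hypothetical toy written for this example, built on std::vector; it is not the real fVecObj class and reproduces only the shape of the calls Y.exp(X) and s = X.mean().

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical miniature of the calling convention, not the real fVecObj.
class MiniVec {
public:
    explicit MiniVec(std::size_t size, float fill = 0.0f) : data_(size, fill) {}

    std::size_t getSize() const { return data_.size(); }
    float& operator[](std::size_t i)      { return data_[i]; }
    float operator[](std::size_t i) const { return data_[i]; }

    // Member of the OUTPUT vector: results are written straight into this
    // object's storage, so no temporary result vector is created.
    void exp(const MiniVec& X)
    {
        assert(X.getSize() == getSize());
        for (std::size_t i = 0; i < data_.size(); ++i)
            data_[i] = std::exp(X[i]);
    }

    // No output vector exists, so this is a member of the INPUT vector.
    float mean() const
    {
        float sum = 0.0f;
        for (float v : data_)
            sum += v;
        return data_.empty() ? 0.0f : sum / static_cast<float>(data_.size());
    }

private:
    std::vector<float> data_;
};
```

With MiniVec X(4, 0.0f), Y(4); the call Y.exp(X) fills Y in place, while X.mean() returns a scalar and leaves X untouched.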
If you ever need to process a VecObj vector in a "classic" plain-C VectorLib function (for example, to process only some part of it), you may use the member functions
getSize() to retrieve its size,
getVector() for the pointer (of data type tVector, where "t" stands for the usual type prefix), and
Pelement( n ) for a pointer to the n'th element.
Continue with chapter 4. VectorLib Functions and Routines: A Short Overview
Copyright © 1998-2012 OptiCode Dr. Martin Sander Software Development