OptiVec
OptiCode
Dr. Martin Sander Software Development
Brahmsstr. 6
D-32756 Detmold
Germany
http://www.optivec.com
e-mail: optivec@gmx.de

Part I. A: Handbook
This HANDBOOK describes the basic principles of the OptiVec libraries and gives an overview of VectorLib, the first part of OptiVec. The object-oriented interface, VecObj, is described in chapter 3. The other parts have their own descriptions in separate files; see MATRIX.HTM and CMATH.HTM.
Chapter 1.2 of this Handbook contains the licence terms for the Demo version, Chapter 1.3 for the Registered version.
OptiCode® and OptiVec® are trademarks of Dr. Martin Sander Software Dev. Other brand and product names mentioned in this handbook for identification purposes are trademarks or registered trademarks of their respective holders.
German-speaking users:
To keep the cost of downloading the demo version over the Internet as low as possible for everyone, it contains only the English documentation. You can find the German description separately at http://www.optivec.de/download/OVDOCD.ZIP.
1.1 Why Vectorized Programming Pays Off on the PC
1.1.1 General OptiVec Optimization Strategies
1.1.2 Multi-Processor Optimization
1.1.3 CUDA Device Support
1.1.4 Choosing the right OptiVec Library
1.2 License Terms for the Demo Version
1.3 Registered Versions
1.3.1 Registered Versions: Ordering
1.3.2 License Terms for the Registered Versions
1.4 Getting Started
4.7 Analysis
4.8 Signal Processing: Fourier Transforms and Related Topics
4.9 Statistical Functions and Building Blocks
4.10 Data Fitting
4.11 Input and Output
4.12 Graphics
5.1 General Remarks
5.2 Integer Errors
5.3 Floating-Point Errors
5.4 The Treatment of Denormal Numbers
5.5 Advanced Error Handling: Writing Messages into a File
5.6 OptiVec Error Messages
OptiVec offers a powerful set of routines for numerically demanding applications, making the philosophy of vectorized programming available for the C/C++ and Pascal/Delphi languages. It serves to overcome the limitations of loop management in conventional compilers, which have proved to be one of the largest obstacles in the programmer's way towards efficient coding for scientific and data analysis applications.
In contrast to integrated packages like MATLAB or others, OptiVec has the advantage of being incorporated into the modern and versatile languages C/C++ and Pascal/Delphi. Both C++ and Fortran already offer some sort of vector processing, by virtue of iterator classes using templates (C++) and array operations (Fortran 90). Both of these, however, are basically a convenient means of letting the compiler write the loop for you and then compile it into the usual inefficient code. The same is true for most implementations of the popular BLAS (Basic Linear Algebra Subprograms) libraries.
In comparison to these approaches, OptiVec is superior mainly with respect to execution speed: on average by a factor of 2-3, in some cases even up to 8. The performance is no longer limited by the quality of your compiler, but rather by the real speed of the processor!
There is a certain overlap in the range of functions offered by OptiVec and by BLAS, LINPACK, and other libraries and source-code collections. However, the latter must be compiled, and, consequently, their performance is determined mainly by the quality of the compiler chosen. To the best of our knowledge, OptiVec was, in 1996, the first product on the market offering a comprehensive vectorized-functions library realized in a true Assembler implementation, and it has only grown and evolved since.
The wide range of routines and functions covered by OptiVec, the high numerical efficiency and increased ease of programming make this package a powerful programming tool for scientific and data analysis applications, competing with (and often beating) many high-priced integrated systems, but embedded into your favourite programming language.
This documentation describes the OptiVec implementations for
Vectorization has always been the magic formula for supercomputers with their multi-processor parallel architectures. On these architectures, one tries to spread the computational effort equally over the available processors, thus maximizing execution speed. So-called "divide and conquer" algorithms break down more complicated numerical tasks into small loops over array elements. Sophisticated compilers then determine the most efficient way to distribute the array elements among the processors. Many supercomputer compilers also come with a large set of pre-defined proprietary vector and matrix functions for many basic tasks. These vectorized functions offer the best way to achieve maximum throughput.
Obviously, the massively parallel processing of, say, a Cray is not possible even on modern PCs with their modest 2-, 4- or 8-core configurations. Consequently, at first sight, it might seem difficult to apply the principle of vectorized programming to the PC. Actually, however, many vector-specific optimizations are possible, even on computers with only one CPU. Most of these optimizations are not performed automatically by current compilers; rather, one has to go down to the machine-code level. Hand-optimized, Assembler-written vector functions outperform compiled loops by a factor of two to three on average. This means that vectorization, properly done, is indeed worth the effort for PC programs as well.
Here are the most important optimization strategies employed in OptiVec to boost performance on any PC (regardless of the number of processor cores):
Preload of constants
Floating-point as well as integer constants, employed in the evaluation of mathematical functions, are loaded into registers outside of the actual loop and remain there as long as they are needed. This saves a large number of load/unload operations that would be necessary if a mathematical function were called separately for each element of a vector.
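As an illustration in portable C (not OptiVec's actual assembler code), the following sketch contrasts a scalar function, which fetches its coefficients from memory on every call, with a vector routine that copies them into local variables once, before the loop. Because the output pointer could alias the global coefficient array, the local copies are also what allows the compiler to keep the constants in registers throughout the loop. The function names and the polynomial are hypothetical.

```c
#include <stddef.h>

/* Hypothetical coefficients of p(x) = 1 + 2x + 3x^2 + 4x^3 (non-const,
   so the compiler really has to load them from memory). */
double coeff[4] = { 1.0, 2.0, 3.0, 4.0 };

/* Scalar function: reloads all coefficients on every call. */
double poly_scalar(double x)
{
    return coeff[0] + x * (coeff[1] + x * (coeff[2] + x * coeff[3]));
}

/* Vector function: the constants are loaded once, before the loop.
   As locals, they cannot be aliased by the stores to y[], so the
   compiler can keep them in registers for all n elements. */
void poly_vector(double *y, const double *x, size_t n)
{
    const double c0 = coeff[0], c1 = coeff[1], c2 = coeff[2], c3 = coeff[3];
    for (size_t i = 0; i < n; i++) {
        double xi = x[i];
        y[i] = c0 + xi * (c1 + xi * (c2 + xi * c3));   /* Horner scheme */
    }
}
```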
Prefetch of chunks of vector elements
Beginning with the Pentium III processor, Intel introduced the very useful feature of explicit memory prefetch. With the prefetch instructions, it is possible to "tell" the processor to fetch data from memory sufficiently in advance, so that no time is wasted waiting for them when they are actually needed.
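A minimal sketch of software prefetching in C, assuming a GCC- or Clang-compatible compiler (`__builtin_prefetch` is a compiler extension; on other compilers the macro below falls back to a no-op). The prefetch distance of 16 doubles (two 64-byte cache lines) is an illustrative assumption that would have to be tuned for a given CPU; the hint never changes the computed results.

```c
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_READ(p) __builtin_prefetch((p), 0, 3)   /* read access, high temporal locality */
#else
#define PREFETCH_READ(p) ((void)(p))                       /* no-op fallback */
#endif

#define DOUBLES_PER_LINE 8    /* 64-byte cache line / 8-byte double */
#define PREFETCH_DIST   16    /* assumed distance: two cache lines ahead */

/* y[i] = a * x[i], requesting the input two cache lines ahead of its use.
   One prefetch per cache line is enough; the hint does not affect results. */
void scale_prefetch(double *y, const double *x, double a, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (i % DOUBLES_PER_LINE == 0 && i + PREFETCH_DIST < n)
            PREFETCH_READ(&x[i + PREFETCH_DIST]);
        y[i] = a * x[i];
    }
}
```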
Full XMM and FPU stack usage
Where necessary, all eight (64-bit: all sixteen) XMM registers and/or all eight coprocessor registers are employed.
Use of SIMD commands
You might wonder why this strategy is not listed first. The SSE or "Streaming Single-Instruction-Multiple-Data Extensions", introduced with the Pentium III and improved with every new processor generation, provide explicit support for vectorized programming with floating-point data in float / single or double precision. At first sight, therefore, they should revolutionize vector programming. Given the usual ratio between processor and data bus speeds, however, many of the simple arithmetic operations are limited by data transfer, and the use of SIMD commands does not make the large difference (with respect to well-written FPU code) that it could make otherwise. In many cases, the advantage of using an SIMD instruction instead of separate FPU instructions shrinks to a 20-30% increase in speed (which is not that bad, anyway!). For more complicated operations, on the other hand, SIMD commands often cannot be employed, either because conditional branches have to be taken for each vector element individually, or because the "extra" accuracy and range available with traditional FPU commands (with their internal extended accuracy) allow algorithms to be simplified so much that the FPU code is still faster. As a consequence, we use SIMD commands only where a real speed gain is possible. Please note, however, that the SIMD-employing library versions (P8, P9 etc.) generally sacrifice 1-2 digits of accuracy in order to attain the described speed gain. If this is not acceptable for your specific task, please stay with the P4 libraries.
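The following sketch shows what this explicit SIMD support looks like at the C level, using the SSE2 intrinsics from `<emmintrin.h>` to add two double vectors two elements at a time, with a scalar loop for the remainder (and as a fallback on non-x86 targets). It is not OptiVec code. A simple addition like this is exactly the kind of data-transfer-limited operation described above, so the speed-up over a well-compiled scalar loop is usually modest.

```c
#include <stddef.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>          /* SSE2 intrinsics */
#define HAVE_SSE2 1
#endif

/* z[i] = x[i] + y[i]. With SSE2, one instruction adds two doubles. */
void vadd(double *z, const double *x, const double *y, size_t n)
{
    size_t i = 0;
#ifdef HAVE_SSE2
    for (; i + 2 <= n; i += 2) {
        __m128d a = _mm_loadu_pd(&x[i]);      /* load 2 doubles (unaligned is OK) */
        __m128d b = _mm_loadu_pd(&y[i]);
        _mm_storeu_pd(&z[i], _mm_add_pd(a, b));
    }
#endif
    for (; i < n; i++)                      /* remainder, or all elements without SSE2 */
        z[i] = x[i] + y[i];
}
```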
Superscalar scheduling
By careful "pairing" of commands whose results do not depend upon each other, the parallel integer pipes and fadd/fmul units of the processor are used as efficiently as possible.
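In C, the same idea can be approximated by giving the processor independent dependency chains to work on. This sketch (hypothetical function names) sums a vector with one and with two accumulators; in the second version, consecutive additions do not have to wait for each other. Because the additions are regrouped, the two versions can differ in the last bits for general input.

```c
#include <stddef.h>

/* One accumulator: every addition depends on the result of the previous one,
   so the pipelined adder cannot start the next add early. */
double sum_one_chain(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Two independent accumulators: additions into s0 and s1 do not depend
   on each other, so they can overlap in the floating-point pipeline. */
double sum_two_chains(const double *x, size_t n)
{
    double s0 = 0.0, s1 = 0.0;
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        s0 += x[i];
        s1 += x[i + 1];
    }
    if (i < n)              /* odd element left over */
        s0 += x[i];
    return s0 + s1;
}
```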
Loop-unrolling
Where SIMD instructions cannot be used and where optimum pairing of commands cannot be achieved for single elements, vectors are often processed in chunks of two, four, or even more elements. This allows the parallel execution pipes to be fully exploited. Moreover, the relative amount of time spent on loop management is significantly reduced. In combination with the data prefetching described above, the depth of the unrolled loops is usually adapted to the cache line size.
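A minimal C sketch of four-fold loop unrolling with a remainder loop (the function name is hypothetical; OptiVec's own unrolled loops are written in assembler and matched to the cache line size). The loop counter is updated and tested once per four elements instead of once per element.

```c
#include <stddef.h>

/* y[i] = a * x[i], unrolled by four, followed by a loop for the last 0..3 elements. */
void scale_unrolled4(double *y, const double *x, double a, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {   /* one counter update and branch per 4 elements */
        y[i]     = a * x[i];
        y[i + 1] = a * x[i + 1];
        y[i + 2] = a * x[i + 2];
        y[i + 3] = a * x[i + 3];
    }
    for (; i < n; i++)             /* remainder */
        y[i] = a * x[i];
}
```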
Simplified addressing
The addressing of vector elements is still a major source of inefficiency with current compilers. Switching back and forth between input and output vectors, they perform a large number of redundant addressing operations. The strict (and simple!) definitions of all OptiVec functions allow these operations to be reduced to a minimum.
Replacement of floating-point by integer commands
For any operations with floating-point numbers that can also be performed using integer commands (like copying, swapping, or comparing to preset values), the faster method is consistently employed.
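As one illustrative instance of this technique (not necessarily the one OptiVec uses), the absolute value of a float and a test for zero can be performed entirely with integer operations on the IEEE 754 bit pattern, since both concern only the sign bit and the remaining bits. The sketch assumes 32-bit IEEE 754 floats and uses `memcpy` for the bit reinterpretation, which is the portable way to do this in C; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "assumes 32-bit floats");

#define SIGN_MASK 0x7FFFFFFFu   /* all bits except the IEEE 754 sign bit */

/* y[i] = |x[i]|, computed by clearing the sign bit with an integer AND. */
void vf_abs_int(float *y, const float *x, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t bits;
        memcpy(&bits, &x[i], sizeof bits);    /* reinterpret float as integer */
        bits &= SIGN_MASK;
        memcpy(&y[i], &bits, sizeof bits);
    }
}

/* Counts elements equal to +0.0 or -0.0 using an integer comparison:
   both zeros have all bits cleared apart from the sign bit. */
size_t vf_count_zero_int(const float *x, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t bits;
        memcpy(&bits, &x[i], sizeof bits);
        if ((bits & SIGN_MASK) == 0)
            count++;
    }
    return count;
}
```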
Strict precision control
C compilers convert a float into a double (Borland Pascal/Delphi even into extended) before passing it to a mathematical function. This approach was useful when disk space was too scarce to include separate functions for each data type in the .LIB files, but it is simply inefficient on modern PCs. Consequently, no such implicit conversions are present in OptiVec routines. Here, a function of a float is calculated to float (i.e. single) precision, wasting no time on calculating more digits than necessary, which would be discarded anyway. There is also a brute-force approach to precision control: you can call V_setFPAccuracy( 1 ); to actively switch the FPU to single precision, if that is enough for a given application. On Pentium and later CPUs, this can speed up execution slightly. Be prepared, however, to accept even lower-than-single accuracy of your end results if you choose this option. For further details and precautions, see V_setFPAccuracy.
All-inline coding
All external function calls are eliminated from the inner loops of the vector processing. This saves the execution time necessary for the "call / ret" pairs and for loading the parameters onto the stack.
Cache-line matching of local variables
The Level-1 cache of modern processors uses 64-byte lines. Many OptiVec functions need double-precision or extended-precision real local variables on the stack (mainly for integer/floating-point conversions or for range checking). 32-bit compilers align the stack on 4-byte boundaries, which means there is a certain chance that the 8 bytes of a double or the 10 bytes of an extended, stored on the stack, will cross a cache-line boundary. This, in turn, would lead to a cache line-break penalty, deteriorating the performance. Consequently, those OptiVec functions where this is an issue use special procedures to align their local variables on 8-byte (for doubles), 16-byte (for extendeds), or 64-byte boundaries (for XMM and YMM values).
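The line-break condition can be written down directly: for example, an 8-byte double starting at offset 60 within a line occupies bytes 60 to 67 and therefore spills into the next line. The sketch below assumes the 64-byte line size stated above; `crosses_cache_line` is a hypothetical helper, and C11 `_Alignas` stands in for the special alignment procedures OptiVec uses internally.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* Level-1 cache line size in bytes, as stated above */

/* Returns 1 if an object of 'size' bytes (size > 0) starting at address
   'addr' spans two cache lines, i.e. would cause a line-break penalty. */
int crosses_cache_line(uintptr_t addr, size_t size)
{
    return (addr / CACHE_LINE) != ((addr + size - 1) / CACHE_LINE);
}

/* A local array aligned on a full cache line with C11 _Alignas: 8 doubles
   fill exactly one 64-byte line, so none of them can straddle a boundary.
   Returns 1 if the compiler honoured the requested alignment. */
int local_buffer_is_aligned(void)
{
    _Alignas(CACHE_LINE) double buf[8];
    return ((uintptr_t)buf % CACHE_LINE) == 0;
}
```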
Unprotected and reduced-range functions
OptiVec offers alternative forms of some mathematical functions, where you have the choice between the fully protected variant with error handling and an unprotected variant without it. In the case of the integer power functions, for example, the absence of error checking allows the unprotected versions to be vectorized much more efficiently. Similarly, the sine and cosine functions can be coded more efficiently for arguments that the user can guarantee to lie in the range between -2π and +2π. In these special cases, the execution time may be reduced by up to 40%, depending on the hardware environment. This increased speed always has to be balanced against the increased risk, though: if any input element outside the valid range is encountered, the unprotected and reduced-range functions will crash without warning.
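To illustrate the trade-off, here is a hypothetical pair of integer power functions in C (not the OptiVec functions themselves). The protected version tests every multiplication for overflow and, on overflow, sets a flag and returns ±DBL_MAX; the unprotected version drops the test, leaving a branch-free loop body that is easier to vectorize across vector elements, but it silently overflows to infinity (with floating-point exceptions masked) for out-of-range input. The error flag and the clamping convention are assumptions made for this sketch.

```c
#include <float.h>

/* Hypothetical error flag standing in for a library's error handling. */
int ipow_overflow = 0;

/* Protected: checks every step; on overflow, sets the flag and returns
   +-DBL_MAX instead of infinity. */
double ipow_protected(double x, unsigned k)
{
    double r = 1.0;
    double ax = x < 0 ? -x : x;
    for (unsigned i = 0; i < k; i++) {
        double ar = r < 0 ? -r : r;
        if (ax > 1.0 && ar > DBL_MAX / ax) {   /* next product would overflow */
            ipow_overflow = 1;
            return ((r < 0) != (x < 0)) ? -DBL_MAX : DBL_MAX;
        }
        r *= x;
    }
    return r;
}

/* Unprotected: no test inside the loop, but the result silently becomes
   +-infinity on overflow (with floating-point exceptions masked). */
double ipow_unprotected(double x, unsigned k)
{
    double r = 1.0;
    for (unsigned i = 0; i < k; i++)
        r *= x;
    return r;
}
```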
Multithread support
Modern multi-core processors allow the operating system to distribute threads among the available processors, scaling the overall performance with the number of available processor cores. For that, any functions running in parallel must be prevented from interfering with each other through read/write operations on global variables. With very few exceptions (namely the plotting functions, which have to use global variables to store the current window and coordinate system settings), all OptiVec functions are reentrant and may run in parallel.
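The difference between a reentrant and a non-reentrant routine can be seen in a minimal C sketch (hypothetical functions that scale a vector by its maximum, assuming at least one positive element): the first version keeps an intermediate result in a global variable and could give wrong results if two threads called it at the same time; the second keeps all state in local variables and is safe to run in parallel.

```c
#include <stddef.h>

/* NOT reentrant: the running maximum lives in a global variable, so two
   threads calling this concurrently overwrite each other's value. */
static double g_max;

void normalize_nonreentrant(double *y, const double *x, size_t n)
{
    g_max = 0.0;
    for (size_t i = 0; i < n; i++)
        if (x[i] > g_max) g_max = x[i];
    for (size_t i = 0; i < n; i++)
        y[i] = x[i] / g_max;
}

/* Reentrant: all state is local (on each thread's own stack), so
   concurrent calls cannot interfere with each other. */
void normalize_reentrant(double *y, const double *x, size_t n)
{
    double max = 0.0;
    for (size_t i = 0; i < n; i++)
        if (x[i] > max) max = x[i];
    for (size_t i = 0; i < n; i++)
        y[i] = x[i] / max;
}
```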
When designing your multi-thread application, you have two options: functional parallelism and data parallelism.
Functional Parallelism
If different threads are performing different tasks – they are functionally different – one speaks of functional parallelism. As an example, consider one thread handling user input / output, while another one performs background calculations. Even on a single-core CPU, this kind of multi-threading may offer advantages (e.g., the user interface does not block during extensive background calculations, but still takes input). On a multi-core computer, the two (or more) threads can actually run simultaneously on the different processor cores. In general, however, the load balance between the processor cores is far from perfect: often, one processor is running at maximum load, while another one is sitting idle, waiting for input. Still, functional multithreading is the best option whenever your numerical tasks involve vectors and matrices of only small-to-moderate size.
Data Parallelism
In order to improve the load balance between the available processor cores, thereby maximizing throughput, one can employ classical parallel processing: the data to be processed is split into several chunks, and each thread gets one of these chunks. This is aptly called data parallelism. The usefulness of this approach is limited by the overhead involved in the data distribution and in the thread-to-thread communication. Moreover, there are always parts of the code which need to be processed sequentially and cannot be parallelized. Therefore, data parallelism pays off only for larger vectors and matrices. Typical break-even sizes range from about 100 elements (for the calculation of transcendental functions of complex input values) to several tens of thousands of elements (as in the simple arithmetic functions). Only when your vectors and matrices are considerably larger than that threshold is performance actually improved over a functional-parallelism approach. The boost then quickly approaches (but never exactly reaches) the theoretical limit of a factor equal to the number of available processor cores.
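Data parallelism with a break-even threshold can be sketched in C using OpenMP (a swapped-in technique for illustration, not OptiVec's own thread engine). The `schedule(static)` clause splits the index range into contiguous chunks, one per thread, and the `if` clause keeps vectors below the threshold single-threaded. The threshold value of 10000 and the function name are assumptions chosen only for illustration.

```c
/* Assumed break-even size below which threading does not pay off.
   The text quotes about 100 to several tens of thousands of elements,
   depending on the function; 10000 is an illustrative value only. */
#define MT_THRESHOLD 10000

/* z[i] = x[i] * y[i], split into contiguous chunks (one per thread).
   Compile with -fopenmp (GCC/Clang) or /openmp (MSVC); without it, the
   pragma is ignored and the loop simply runs in a single thread.
   A signed loop index is used for compatibility with OpenMP 2.0. */
void vmul_parallel(double *z, const double *x, const double *y, long n)
{
    #pragma omp parallel for schedule(static) if(n >= MT_THRESHOLD)
    for (long i = 0; i < n; i++)
        z[i] = x[i] * y[i];
}
```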
For large vectors/matrices on multi-core machines, multi-core optimized libraries actively distribute the work load over the available processor cores for data parallel execution. These libraries are marked by the letter "M", as in OVVC8M.LIB (for MS Visual C++, using SSE2), VCF4M.LIB (for Embarcadero/Borland C++, full FPU accuracy), or the units in OPTIVEC\LIB8M (for Delphi, using SSE3). These libraries are designed for AMD 64 x2, Intel Core2 Duo, or machines equipped with several discrete processors of the Pentium 4+ level. The CUDA libraries are based on the "M" libraries and are marked by the letter "C", as, e.g., in OVVC8C.LIB.
The "M" and "C" libraries will still run on single-core machines, but – due to the thread-management overhead – somewhat slower than the general-purpose libraries. Although the "M" libraries are designed with medium to large vectors in mind, the penalty for using them with smaller vectors is almost negligible, as the OptiVec thread-engine automatically executes a function in a single thread, if the vector size is too small for parallel execution to earn back the cost involved in the thread-to-thread communication.
If you use the "M" or "C" libraries, your program must call V_initMT( nAvailProcCores ) before calling any of the vector functions.
Purchasing the full (registered) version gives you the right to use it on as many computers at a time as the number of units you bought.
The right to distribute applications employing functions of OptiVec is included in the commercial-version licence. No run-time licences are needed for your customers! Corporate site and world-wide licences are available upon request.
OptiVec for single compilers: C++ Builder, Visual C++, GCC (Win), LLVM CLang (Win), Delphi, Lazarus / FreePascal, or Linux (GCC / LLVM CLang)
OptiVec Master License for all supported compilers
CMATH separately; for single compilers: C++ Builder, Visual C++, GCC (Win), LLVM CLang (Win), Delphi, Lazarus / FreePascal, or Linux (GCC / LLVM CLang)
CMATH separately; master license for all supported compilers
If you have a European VAT ID, or if you order from outside the European Union, you are exempt from German VAT, and it will be deducted from your bill, but you may have to pay your local VAT and/or import duties according to local laws.
Please order through www.optivec.de/order/
or send this order form to
OptiCode – Dr. Martin Sander Software Dev.
Brahmsstr. 6
D-32756 Detmold
Germany
optivec@gmx.de
The SOFTWARE in this package is licensed to you as the user. It is not sold. The term "user" means a programmer who links binary code of this SOFTWARE into his own applications. People who, in turn, use his applications without needing to install this SOFTWARE themselves do not need any runtime license for the SOFTWARE. The right to distribute applications containing code of this SOFTWARE is included in the license fee for the commercial version.
Once you have paid the required license fee, you may use the SOFTWARE for as long as you like, provided you do not violate the copyright and if you observe the following rules:
Platform | Compiler | Static runtime library | Runtime library as DLL |
Win64x | bcc64x (modern), from RAD Studio 12.1.1 (2024) on | ovbcbase64cs.lib | ovbcbase64cd.lib |
Win64 | bcc64 (classic), from RAD Studio 12.x (2023) on | ovbcbase64.a | ovbcbase64.a |
Win64 | bcc64, all versions until RAD Studio 11.x | ovbcx64.a | ovbcx64.a |
Win32 | bcc32 (classic), RAD Studio 12+ | ovbcbase32s.lib | ovbcbase32d.lib |
Win32 | bcc32 (classic), all versions until RAD Studio 11.x | vcfs.lib | vcfd.lib |
Win32 | bcc32c (CLang-enhanced), RAD Studio 12+ | ovbcbase32cs.lib | ovbcbase32cd.lib |
Win32 | bcc32c (CLang-enhanced), all versions until RAD Studio 11.x | ovbc10_11base32cs.lib | ovbc10_11base32cd.lib |
Second, select the required processor-specific library from the following table and add it to your project. Here, there is no difference between the classic and the CLang-enhanced compiler.
Platform | Compiler | Processor | General-purpose | Debug | Multi-Processor | MP+CUDA |
Win64x | bcc64x | P9: Haswell+ / Excavator+ | OVVC64x_9.lib | ---- | OVBC64x_9M.lib | OVBC64x_9C.lib |
Win64x | bcc64x | P8: AMD64xxx, Core2xxx | OVBC64x_8.lib | OVBC64x_8D.lib | OVBC64x_8M.lib | OVBC64x_8C.lib |
Win64 | bcc64 | P9: Haswell+ / Excavator+ | OVVC64_9.a | ---- | OVBC64_9M.a | OVBC64_9C.a |
Win64 | bcc64 | P8: AMD64xxx, Core2xxx | OVBC64_8.a | OVBC64_8D.a | OVBC64_8M.a | OVBC64_8C.a |
Win32 | bcc32 (classic) | P8: AMD64xxx, Core2xxx | VCF8W.LIB | ---- | VCF8M.LIB | VCF8C.LIB |
Win32 | bcc32 (classic) | P4: Max compatibility, full FPU accuracy | VCF4W.LIB | VCF4D.LIB | VCF4M.LIB | ---- |
Win32 | bcc32c (CLang) from C++ Builder 10.1 Berlin on | P8: AMD64xxx, Core2xxx | ovbc32c_8.lib | ---- | ovbc32c_8m.lib | ovbc32c_8c.lib |
Win32 | bcc32c (CLang) from C++ Builder 10.1 Berlin on | P4: Max compatibility, full FPU accuracy | ovbc32c_4.lib | ovbc32c_4d.lib | ovbc32c_4m.lib | ---- |
Win32 | bcc32c (CLang) up until C++ Builder 10 Seattle | P8: AMD64xxx, Core2xxx | VCF8W.LIB | ---- | VCF8M.LIB | VCF8C.LIB |
Win32 | bcc32c (CLang) up until C++ Builder 10 Seattle | P4: Max compatibility, full FPU accuracy | VCF4W.LIB | VCF4D.LIB | VCF4M.LIB | ---- |
In order to profit from the CUDA-enhanced OptiVec library, do the following things:
In previous versions, there were certain restrictions concerning the use of CMATH with the 32-bit CLang-enhanced Borland Compiler bcc32c.exe. All of them have now been lifted.
Continue with chap. 1.5 Declaration of OptiVec functions in C/C++
Platform | Visual Studio version | Debug DLL | Debug Static | Release DLL | Release Static |
Win64 | VS 2022 | OVVC17x64MDD.LIB | OVVCx64MTD.LIB | OVVC17x64MDR.LIB | OVVCx64MTR.LIB |
Win64 | VS 2019 | OVVC16x64MDD.LIB | OVVCx64MTD.LIB | OVVC16x64MDR.LIB | OVVCx64MTR.LIB |
Win64 | VS 2017 | OVVC15x64MDD.LIB | OVVCx64MTD.LIB | OVVC15x64MDR.LIB | OVVCx64MTR.LIB |
Win64 | VS 2015 | OVVC14x64MDD.LIB | OVVCx64MTD.LIB | OVVC14x64MDR.LIB | OVVCx64MTR.LIB |
Win64 | VS 2013 | OVVC12x64MDD.LIB | OVVC8_12x64MTD.LIB | OVVC12x64MDR.LIB | OVVC8_12x64MTR.LIB |
Win64 | VS 2012 | OVVC11x64MDD.LIB | OVVC8_12x64MTD.LIB | OVVC11x64MDR.LIB | OVVC8_12x64MTR.LIB |
Win64 | VS 2010 | OVVC8x64MDR.LIB* | OVVC8_12x64MTD.LIB | OVVC8x64MDR.LIB | OVVC8_12x64MTR.LIB* |
Win64 | VS 2008 | OVVC8x64MDR.LIB* | OVVC8_12x64MTD.LIB | OVVC8x64MDR.LIB* | OVVC8_12x64MTR.LIB |
Win64 | VS 2005 | OVVC8x64MDD.LIB | OVVC8_12x64MTD.LIB | OVVC8x64MDR.LIB | OVVC8_12x64MTR.LIB |
Win32 | VS 2022 | OVVC17MDD.LIB | OVVCMTD.LIB | OVVC17MDR.LIB | OVVCMTR.LIB |
Win32 | VS 2019 | OVVC16MDD.LIB | OVVCMTD.LIB | OVVC16MDR.LIB | OVVCMTR.LIB |
Win32 | VS 2017 | OVVC15MDD.LIB | OVVCMTD.LIB | OVVC15MDR.LIB | OVVCMTR.LIB |
Win32 | VS 2015 | OVVC14MDD.LIB | OVVCMTD.LIB | OVVC14MDR.LIB | OVVCMTR.LIB |
Win32 | VS 2013 | OVVC12MDD.LIB | OVVC8_12MTD.LIB | OVVC12MDR.LIB | OVVC8_12MTR.LIB |
Win32 | VS 2012 | OVVC11MDD.LIB | OVVC8_12MTD.LIB | OVVC11MDR.LIB | OVVC8_12MTR.LIB |
Win32 | VS 2010 | OVVC10MDD.LIB | OVVC8_12MTD.LIB | OVVC10MDR.LIB | OVVC8_12MTR.LIB |
Win32 | VS 2008 | OVVC9MDD.LIB | OVVC8_12MTD.LIB | OVVC9MDR.LIB | OVVC8_12MTR.LIB |
Win32 | VS 2005 | OVVC8MDD.LIB | OVVC8_12MTD.LIB | OVVC8MDR.LIB | OVVC8_12MTR.LIB |
Please note that there is a certain inconsistency in how the configurations are labelled in Visual Studio: the default configurations "Debug" and "Release" actually use the runtime library and MFC as DLLs. Therefore, you have to use the OptiVec base libraries OVVC??MDD.lib and OVVC??MDR.lib with these configurations. There is a problem with using these configurations, however: the RTL and MFC DLLs for the specific compiler version must always be installed on the computer running your application. For many applications, it is therefore recommended to change Project / Properties / Configuration Properties / C/C++ / Code Generation / Runtime Library to "Multi-threaded Debug (/MTd)" or "Multi-threaded (/MT)", respectively, in order to get rid of the DLL redistributables. This is done in the "DebugStatic" configuration in the demo files coming with OptiVec.
After that, please add the second, processor-specific OptiVec library according to the following table:
Processor | General-purpose library | Debug library | Multi-Processor | MP + CUDA | 64-bit General | 64-bit Debug | 64-bit Multi-Proc. | 64-bit MP + CUDA |
P9: Haswell+ / Excavator+ | ---- | ---- | ---- | ---- | OVVC64_9.LIB | ---- | OVVC64_9M.LIB | OVVC64_9C.LIB |
P8: AMD64xxx, Core2xxx | OVVC8.LIB | ---- | OVVC8M.LIB | OVVC8C.LIB | OVVC64_8.LIB | OVVC64_8D.LIB | OVVC64_8M.LIB | OVVC64_8C.LIB |
P4: Full FPU accuracy, 486DX/Pentium | OVVC4.LIB | OVVC4D.LIB | OVVC4M.LIB | ---- | ---- | ---- | ---- | ---- |
In order to profit from the CUDA-enhanced OptiVec library, do the following things:
Continue with chap. 1.5 Declaration of OptiVec functions in C/C++
You have to include two OptiVec libraries. The first one ("base library") contains the interface between OptiVec and the GCC runtime libraries; it has to be matched with the specific configuration of GCC. The second one is independent of the configuration and runtime library; you have to choose it according to the desired CPU support.
First choose the base library from the following table:
Platform | GCC thread model | GCC exception model | Matching OptiVec base library |
Win64 | Windows threads | SEH | ovgcbase64ws.lib |
Win64 | Windows threads | Setjmp/Longjmp | ovgcbase64wj.lib |
Win64 | Posix threads | SEH | ovgcbase64ps.lib |
Win64 | Posix threads | Setjmp/Longjmp | ovgcbase64pj.lib |
Win32 | Windows threads | Dwarf | ovgcbase32wd.lib |
Win32 | Windows threads | Setjmp/Longjmp | ovgcbase32wj.lib |
Win32 | Posix threads | Dwarf | ovgcbase32pd.lib |
Win32 | Posix threads | Setjmp/Longjmp | ovgcbase32pj.lib |
After that, choose the second, processor-specific OptiVec library according to the following table:
Processor | 32-bit General-purpose | Debug | Multi-Processor | MP + CUDA | 64-bit General | 64-bit Debug | 64-bit Multi-Proc. | 64-bit MP + CUDA |
P9: Haswell+ / Excavator+ | ---- | ---- | ---- | ---- | ovgc64_9.lib | ---- | ovgc64_9m.lib | ovgc64_9c.lib |
P8: AMD64xxx, Core2xxx | ovgc32_8.lib | ---- | ovgc32_8m.lib | ovgc32_8c.lib | ovgc64_8.lib | ovgc64_8d.lib | ovgc64_8m.LIB | ovgc64_8c.lib |
P4: Full FPU accuracy, 486DX/Pentium | ovgc32_4.lib | ovgc32_4d.lib | ovgc32_4m.lib | ---- | ---- | ---- | ---- | ---- |
In order to profit from the CUDA-enhanced OptiVec library, do the following things:
One very important point to observe when working with GCC is that the linker does not resolve inter-dependencies between included libraries. As the base library and the processor-specific library of OptiVec do have interdependencies, this means that you will have to include them pair-wise at least twice. In each pair, the processor-specific library should come first and the base library second. If you get linker errors about missing OptiVec functions, just include the same pair of libraries once more. For an example, see the makefile for the OptiVec demo programs.
GCC is the only one of the "big" compilers that supports 80-bit real numbers (long doubles, extended) in 64-bit mode. This is a very valuable feature, as the extra accuracy and range can make life much simpler on many occasions. OptiVec also supports this data type with the VE_, VCE_, VPE_, ME_, and MCE_ functions.
For the Linux version of OptiVec for GCC, see below.
Continue with chap. 1.5 Declaration of OptiVec functions in C/C++
You have to include two OptiVec libraries. The first one ("base library") contains the interface between OptiVec and the CLang runtime libraries. (Actually, CLang relies heavily on the Visual C++ runtime libraries and is almost compatible with Visual C++. This compatibility is not perfect, however, which is why OptiVec comes with a separate CLang version.) This base library is ovclbase64.lib for 64-bit and ovclbase32.lib for 32-bit.
The second library is specific to the desired CPU support:
Processor | 32-bit General-purpose | Debug | Multi-Processor | MP + CUDA | 64-bit General | 64-bit Debug | 64-bit Multi-Proc. | 64-bit MP + CUDA |
P9: Haswell+ / Excavator+ | ---- | ---- | ---- | ---- | ovcl64_9.lib | ---- | ovcl64_9m.lib | ovcl64_9c.lib |
P8: AMD64xxx, Core2xxx | ovcl32_8.lib | ---- | ovcl32_8m.lib | ovcl32_8c.lib | ovcl64_8.lib | ovcl64_8d.lib | ovcl64_8m.LIB | ovcl64_8c.lib |
P4: Full FPU accuracy, 486DX/Pentium | ovcl32_4.lib | ovcl32_4d.lib | ovcl32_4m.lib | ---- | ---- | ---- | ---- | ---- |
In order to profit from the CUDA-enhanced OptiVec library, do the following things:
For the Linux-CLang version of OptiVec, see below.
Continue with chap. 1.5 Declaration of OptiVec functions in C/C++
Processor | 32-bit General-purpose | Debug | Autothreading (Multi-Processor) | MP + CUDA | 64-bit General | 64-bit Debug | 64-bit Multi-Proc. | 64-bit MP+CUDA | 64-bit MP+legacy CUDA |
P9: Haswell+ / Excavator+ | ---- | ---- | ---- | ---- | Win64\LIB9 | ---- | Win64\LIB9M | Win64\LIB9C | ---- |
P8: AMD64xxx, Core2xxx | LIB8 | ---- | LIB8M | LIB8C | Win64\LIB8 | Win64\LIB8D | Win64\LIB8M | Win64\LIB8C | Win64\LIB8Cleg |
P4: FPU accuracy, 486DX/Pentium | LIB4 | LIB4D | LIB4M | ---- | ---- | ---- | ---- | ---- | ---- |
Continue with chap. 1.5.2 Declaration of OptiVec functions in Pascal / Delphi
Processor | General-purpose | Debug | Autothreading (Multi-Processor) | MP + CUDA | MP + legacy CUDA |
P9: Haswell+ / Excavator+ | LIB9 | --- | LIB9M | LIB9C | --- |
P8: AMD64xxx, Core2xxx | LIB8 | LIB8D | LIB8M | LIB8C | LIB8Cleg |
Continue with chap. 1.5.2 Declaration of OptiVec functions in Pascal / Delphi
Platform | Threading | Matching OptiVec base library |
Linux 64 | Single-thread | ovlxcbase64s.a |
Linux 64 | Multi-thread | ovlxcbase64m.a |
After that, choose the second, processor-specific OptiVec library according to the following table:
Processor | General | Debug | Multi-Proc. auto-threading |
P9: Haswell+ / Excavator+ | ovlxc64_9.a | ---- | ovlxc64_9m.a |
P8: AMD64xxx, Core2xxx | ovlxc64_8.a | ovlxc64_8d.a | ovlxc64_8m.a |
One very important point to observe when working with GCC and CLang on Linux is that the linker does not resolve inter-dependencies between included libraries. As the base library and the processor-specific library of OptiVec do have interdependencies, this means that you will have to include them pair-wise at least twice. In each pair, the processor-specific library should come first and the base library second. If you get linker errors about missing OptiVec functions, just include the same pair of libraries once more. For an example, see the makefile for the OptiVec demo programs.
Continue with chap. 1.6 Sample programs
After these preparations, all OptiVec functions are available for your programs.
Should you wish to remove OptiVec from your computer, please run UNINSTAL.EXE or simply delete the directory OPTIVEC with its subdirectories.
The 64-bit integer data type (__int64 in BC++ Builder and MS Visual C++, Int64 in Delphi) is called quad (for "quadword integer") in OptiVec.
In 32-bit, the type quad is always signed. Functions for unsigned 64-bit integers are available only in the 64-bit versions of OptiVec.
The data type extended, which is familiar to Pascal/Delphi programmers, is defined as a synonym for "long double" in OptiVec for C/C++. As most 64-bit compilers (and Visual C++ even in 32-bit mode) do not support 80-bit reals, we define "extended" as "double" in the OptiVec versions for these compilers.
The reason for the choice of the name "extended" is that all OptiVec routines shall have identical names in the C/C++ and Pascal/Delphi languages. Since the function prefixes are derived from the data types of the processed vectors (see below), this necessitates the definition of alias names for some data types denoted differently in the various languages. While the letter "L" (which could possibly stand for "long double") is already overcrowded by the data types long int and unsigned long, the letter "E" is unique to the data type extended and is therefore used in the prefixes for vectors and functions of long double precision. This way, the letters defining the real-number data types are in alphabetical proximity: "D" for double, "E" for extended, and "F" for float. In the future, high-precision 128-bit real numbers (__fp128 / __float128) would find their place in this series as "G" for "great", and half floats (__fp16) as "H".
For historical reasons (dating back to the development of Turbo Pascal), the various integer data types have a somewhat confusing nomenclature in Delphi. In order to make the derived function prefixes compatible with the C/C++ versions of OptiVec, we define a number of synonyms, as described in the following table:
Type | Delphi name | Synonym | Derived prefix |
8 bit signed | ShortInt | ByteInt | VBI_ |
8 bit unsigned | Byte | UByte | VUB_ |
16 bit signed | SmallInt | --- | VSI_ |
16 bit unsigned | Word | USmall | VUS_ |
32 bit signed | LongInt | --- | VLI_ |
32 bit unsigned | --- | ULong | VUL_ |
64 bit signed | Int64 | QuadInt | VQI_ |
64 bit unsigned (x64 version only!) | UInt64 | UQuad | VUQ_ |
16/32 bit signed | Integer | --- | VI_ |
16/32 bit unsigned | Cardinal | UInt | VU_ |
To have a Boolean data type available which is of the same size as Integer, we define the type IntBool. It is equivalent to WordBool in Pascal and to LongBool in Delphi. You will see the IntBool type as the return value of many mathematical VectorLib functions.
Most compilers and available libraries implement complex functions very inefficiently and inaccurately. (Just writing down the textbook formula for a complex function, as is usually done, works well only for a very limited range of arguments!)
Our aims are
VectorLib itself contains the necessary initialization functions of complex numbers and all vectorized forms of complex math functions. If you are using only these, you need not explicitly include CMATH. In this case, the following complex data types are defined in <VecLib.h> for C/C++:
typedef struct { float Re, Im; } fComplex;
typedef struct { double Re, Im; } dComplex;
typedef struct { extended Re, Im; } eComplex;
typedef struct { float Mag, Arg; } fPolar;
typedef struct { double Mag, Arg; } dPolar;
typedef struct { extended Mag, Arg; } ePolar;
(the data type extended is used as a synonym for long double, see above.)
The corresponding definitions for Pascal/Delphi are contained in the unit VecLib:
type fComplex = record Re, Im: Float; end;
type dComplex = record Re, Im: Double; end;
type eComplex = record Re, Im: Extended; end;
type fPolar = record Mag, Arg: Float; end;
type dPolar = record Mag, Arg: Double; end;
type ePolar = record Mag, Arg: Extended; end;
If, for example, a complex number z is declared as "fComplex z;", its real and imaginary parts are available as z.Re and z.Im, respectively; likewise, a polar number declared as "fPolar p;" has the parts p.Mag and p.Arg. Complex numbers are initialized either by setting the constituent parts separately to the desired value, e.g.,
z.Re = 3.0; z.Im = 5.7;
p.Mag = 4.0; p.Arg = 0.7;
(of course, the assignment operator is := in Pascal/Delphi).
Alternatively, the same initialization can be accomplished by the functions fcplx or fpolr:
C/C++:
z = fcplx( 3.0, 5.7 );
p = fpolr( 4.0, 0.7 );
Pascal/Delphi:
fcplx( z, 3.0, 5.7 );
fpolr( p, 4.0, 0.7 );
For double-precision complex numbers, use dcplx and dpolr, for extended-precision complex numbers, use ecplx and epolr.
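The C/C++ and Pascal/Delphi forms of fcplx differ because Pascal/Delphi cannot return complex types. The following minimal plain-C sketch illustrates the two calling styles. It uses two hypothetical helpers, make_fcplx (returns the struct, like the C/C++ form) and set_fcplx (fills an output parameter, like the Pascal/Delphi procedure form). These are illustrations only, not OptiVec's implementation. The typedef repeats the fComplex layout given above.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as the fComplex definition from <VecLib.h> shown above */
typedef struct { float Re, Im; } fComplex;

/* Hypothetical helper, C style: the complex number is the return value */
static fComplex make_fcplx(float re, float im)
{
    fComplex z;
    z.Re = re;
    z.Im = im;
    return z;
}

/* Hypothetical helper, Pascal style: the result goes to an output parameter */
static void set_fcplx(fComplex *z, float re, float im)
{
    assert(z != NULL);
    z->Re = re;
    z->Im = im;
}
```

Both helpers produce the same value, so "z = make_fcplx( 3.0f, 5.7f );" and "set_fcplx( &z, 3.0f, 5.7f );" are interchangeable. This mirrors the relationship between the C/C++ and Pascal/Delphi forms of fcplx.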
Pointers to arrays or vectors of complex numbers are declared using the data types cfVector, cdVector, and ceVector (for cartesian complex) and pfVector, pdVector, and peVector (for polar complex) described below.
The basis of all VectorLib routines is formed by the various vector data types given below and declared in <VecLib.h> or the unit VecLib. In contrast to the fixed-size static arrays, the VectorLib types use dynamic memory allocation and allow for varying sizes. Because of this increased flexibility, we recommend that you predominantly use the latter. Here they are:
C/C++ | Pascal/Delphi |
Note: in connection with Windows programs, often the letter "l" or "L" is used to denote "long int" variables. In order to prevent confusion, however, the data type "long int" is signalled by "li" or "LI", and the data type "unsigned long" is signalled by "ul" or "UL". Conflicts with prefixes for "long double" vectors are avoided by deriving these from the alias name "extended" and using "e", "ce", "E", and "CE", as described above and in the following. |
property | OptiVec vectors | Pascal/Delphi static/dynamic arrays |
alignment of first element | on 32-byte boundary for optimum cache-line matching | 2- or 4-byte boundary (may cause cache-line-break penalty for double, QuadInt) |
alignment of following elements | packed (i.e., no dummy bytes between elements, even for 10- and 20-byte types) | arrays must be declared as "packed" for Delphi to be compatible with OptiVec |
index range checking | Debug libraries: automatic; Release libraries: none | automatic with built-in size information |
dynamic allocation | function VF_vector, VF_vector0 | procedure SetLength |
initialization with 0 | optional by calling VF_vector0 | always |
de-allocation | function V_free, V_freeAll | procedure Finalize |
reading single elements | function VF_element: a := VF_element(X,5); typecast into array also possible: a := fArray(X)[5]; | index in brackets: a := X[5]; |
setting single elements | function VF_setElement: VF_setElement(X,5, a); Delphi only: typecast into array also possible: fArray(X)[5] := a; | index in brackets: X[5] := a; |
getting the address of a single element | function VF_Pelement | |
passing to OptiVec function | directly: VF_equ1( X, sz ); | address-of operator: VF_equ1( @X, sz ); |
passing sub-vector to OptiVec function | function VF_Pelement: VF_equC( VF_Pelement(X,10), sz-10, 3.7); | address-of operator: VF_equC( @X[10], sz-10, 3.7 ); |
Each of the algebraic and mathematical functions included in this library exists in one variant for each floating-point format. The data type of all floating-point vector elements, parameters, and of the return value is always the same within one function. The data type is signalled by the second letter of the prefix: VF_ denotes the variant of a function that uses exclusively the data type float (Pascal: Single), VD_ stands for the data type double, and VE_ for the data type extended, i.e., long double. (The first letter, "V", stands for "Vector function", of course.) VF_ functions thus work on arrays declared as fVector, use parameters of the type float, and, if there is any floating-point return value, this will also be of the type float. With very few exceptions, there are no mixed-type functions (functions that would, e.g., work on vectors of type fVector, use parameters of type double, and return a value of type long double).
For the description of the functions in the Alphabetical Reference, generally only the VF_ version is described and its syntax explicitly given. The versions for the data types double and long double are exactly analogous to the VF_ variant. You need only replace the prefix VF_ by VD_ (or VE_) and use "dVector" and "double" (or "eVector" and "extended", respectively) wherever you find "fVector" and "float" in the VF_ version.
Return values of the complex data types are not possible in Pascal/Delphi. Therefore, the syntax of those functions returning a complex number is different in C/C++ and Pascal/Delphi.
In contrast to the carelessness with which complex mathematical functions are often treated (see above), the complex functions of OptiVec are designed in such a way as to achieve full accuracy over the complete range of input/output values possible with the respective data type.
In order to perform non-vectorized complex operations with the same level of speed and reliability as the vectorized ones, use CMATH. See CMATH.HTM for details.
For the unsigned integer data types, we have:
prefix VUB_: data type unsigned char (unsigned byte) / UByte,
prefix VUS_: data type unsigned short / USmall,
prefix VU_: data type unsigned / UInt,
prefix VUL_: data type unsigned long / ULong (32-bit for Windows, 64-bit for Linux),
prefix VUQ_: data type uquad / UQuad (only for Win64 and Linux),
prefix VUI_: data type ui.
Don't be afraid of so many data types. It is one of the advantages of modern computer languages to have them, and it is one of the disadvantages, at the same time, that a programming style is supported which mixes all the data types until it is no longer clear "who is who". In all normal cases, the VI_, VLI_, and VU_ functions should be sufficient; but keep in mind that there are more available in case you need them.
If present, the vectorized integer functions are always described together with their floating-point analogues. To obtain, for example, the VI_ version, vectors of type iVector have to be substituted for those of type fVector which are demanded by the VF_ version. In the same way, the other versions are obtained by changing "float" and "fVector" into the desired data type.
MS Visual C++ and Embarcadero / Borland C++ Builder (but not previous Borland C++ versions): Programmers should put the directive
"using namespace OptiVec;"
either in the body of any function that uses tVecObj, or in the global declaration part of the program. Placing the directive in the function body is safer, as it avoids potential namespace conflicts in other functions.
The vector objects are defined as classes vector<T>, encapsulating the vector address (pointer) and size.
For easier use, these classes have alias names fVecObj, dVecObj, and so on, with the data type signalled by the first one or two letters of the class name, in the same way as for the vector types described above.
All functions defined in VectorLib for a specific vector data-type are contained as member functions in the respective tVecObj class.
The constructors are available in four forms:
vector(); // no memory allocated, size set to 0
vector( ui size ); // vector of size elements allocated
vector( ui size, T fill ); // as before, but initialized with value "fill"
vector( vector<T> init ); // creates a copy of the vector "init"
For all vector classes, the arithmetic operators
+ - * / += -= *= /=
are defined, with the exception of the polar-complex vector classes, where only multiplications and divisions, but no additions or subtractions are supported. These operators are the only cases in which you can directly assign the result of a calculation to a vector object, like
fVecObj Z = X + Y; or
fVecObj Z = X * 3.5;
Note, however, that the C++ class syntax rules do not allow a very efficient implementation of these operators. The arithmetic member functions are much faster. If speed is an issue, use
Z.addV( X, Y ); or
Z.mulC( X, 3.5 );
(with Z declared beforehand as an fVecObj) instead of the operator syntax.
The operator * refers to element-wise multiplication, not to the scalar product of two vectors.
All other arithmetic and math functions can only be called as member functions of the respective output vector as, for example, Y.exp(X). Although it would certainly be more logical to have these functions defined in such a way that you could write "Y = exp(X)" instead, the member-function syntax was chosen for efficiency considerations: The only way to implement the second variant is to store the result of the exponential function of X first in a temporary vector, which is then copied into Y, thus considerably increasing the work-load and memory demands.
While most VecObj functions are member functions of the output vector, a number of functions have no output vector. In these cases, the functions are member functions of an input vector.
Example: s = X.mean();
If you ever need to process a VecObj vector in a "classic" plain-C VectorLib function (for example, to process only some part of it), you may use the member functions
getSize() to retrieve its size,
getVector() for the pointer (of data type tVector, where "t" stands for the usual type prefix), and
Pelement( n ) for a pointer to the n'th element.
The syntax of all VecObj functions is described in FUNCREF.HTM together with the basic VectorLib functions for which tVecObj serves as a wrapper.
The following functions manage dynamically allocated vectors:
VF_vector | memory allocation for one vector |
VF_vector0 | memory allocation and initialization of all elements with 0 |
V_free | free one vector |
V_nfree | free n vectors (only for C, not for Pascal) |
V_freeAll | free all existing vectors |
C/C++: X = VF_vector( 3*size); Z = (Y = X+size) + size; |
Pascal/Delphi: X := VF_vector( 3*size ); Y := VF_Pelement( X, size ); Z := VF_Pelement( Y, size ); |
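The two rows above allocate one block of 3*size elements and let Y and Z point into it. The following is a minimal plain-C sketch of the same pattern. It uses malloc instead of VF_vector only to stay self-contained, and the helper name alloc_split3 is hypothetical. Because Y and Z are not separate allocations, only the base pointer X may be freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate one contiguous block of 3*size floats and split it into
   three vectors X, Y, Z of size elements each (plain-C analogue of
   X = VF_vector( 3*size ); Z = (Y = X+size) + size;).              */
static float *alloc_split3(size_t size, float **Y, float **Z)
{
    float *X;
    assert(Y != NULL && Z != NULL && size > 0);
    X = malloc(3 * size * sizeof *X);
    if (X == NULL)
        return NULL;
    *Y = X + size;   /* Y starts at element X[size]      */
    *Z = *Y + size;  /* Z starts at element X[2*size]    */
    return X;        /* free(X) later; never free Y or Z */
}
```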
The following functions are used to initialize or re-initialize vectors that have already been created:
VF_equ0 | set all elements of a vector equal to 0 |
VCF_Reequ0 | set all real parts equal to 0, leaving the imaginary parts unchanged |
VCF_Imequ0 | set all imaginary parts equal to 0, leaving the real parts unchanged |
VF_equ1 | set all elements equal to 1 |
VF_equm1 | set all elements equal to −1 |
VF_equC | set all elements equal to a constant C |
VF_equV | make one vector a copy of another |
VFx_equV | "expanded" version of the equality operation: Yi = a * Xi + b |
VF_ramp | "ramp": Xi = a * i + b. |
VUI_ramp | "index ramp": Xi = i (VUI_ and VU_ versions only) |
VF_randomLC | high-quality random numbers |
VF_random | simplified form of VF_randomLC for high-quality random numbers |
VF_noiseLC | white noise |
VF_noise | simplified form of VF_noiseLC for white noise |
VF_comb | "comb": equals a constant C at equidistant points, elsewhere 0 |
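The formulas for VF_ramp (Xi = a * i + b) and VFx_equV (Yi = a * Xi + b) translate directly into single loops. The sketch below uses hypothetical plain-C helpers ramp_f and expanded_equ_f to show the element-wise semantics. It does not show OptiVec's optimized implementation or its exact argument lists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical analogue of VF_ramp: X[i] = a * i + b */
static void ramp_f(float *X, size_t size, float a, float b)
{
    assert(X != NULL);
    for (size_t i = 0; i < size; i++)
        X[i] = a * (float)i + b;
}

/* Hypothetical analogue of VFx_equV: Y[i] = a * X[i] + b */
static void expanded_equ_f(float *Y, const float *X, size_t size,
                           float a, float b)
{
    assert(Y != NULL && X != NULL);
    for (size_t i = 0; i < size; i++)
        Y[i] = a * X[i] + b;
}
```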
The following functions are used to access and modify single vector elements:
VF_Pelement | returns a pointer to the vector element specified by its index |
VF_element | returns a specific vector element |
VF_getElement | copies a specific vector element into a variable |
VF_setElement | sets a vector element to a new value |
VF_accElement | X[i] += c; adds a value to one vector element |
VF_decElement | X[i] -= c; subtracts a value from one vector element |
VF_Hann | Hann window |
VF_Parzen | Parzen window |
VF_Welch | Welch window |
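As an example of what a window function computes, the sketch below fills a vector with a Welch (parabolic) window. It assumes the Numerical Recipes convention Xi = 1 − ((i − (size−1)/2) / ((size+1)/2))². OptiVec's exact normalization for VF_Welch may differ, so check FUNCREF.HTM. The helper name welch_f is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Welch window in the Numerical Recipes convention (assumed here):
   X[i] = 1 - ((i - (size-1)/2) / ((size+1)/2))^2                   */
static void welch_f(float *X, size_t size)
{
    double half_m1 = 0.5 * ((double)size - 1.0);  /* window center   */
    double half_p1 = 0.5 * ((double)size + 1.0);  /* half-width      */
    assert(X != NULL);
    for (size_t i = 0; i < size; i++) {
        double t = ((double)i - half_m1) / half_p1;
        X[i] = (float)(1.0 - t * t);
    }
}
```

With this convention the end points are small but not exactly zero, and the window is symmetric about its center.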
VF_ReImtoC | merge two vectors, Re and Im, into one cartesian complex vector |
VF_RetoC | overwrite the real part of a cartesian complex vector |
VF_ImtoC | overwrite the imaginary part of a cartesian complex vector |
VF_PolartoC | construct a cartesian complex vector from polar coordinates, entered as separate vectors Mag and Arg |
VF_MagArgtoP | merge two vectors, Mag and Arg into one polar complex vector |
VF_MagArgtoPrincipal | merge two vectors, Mag and Arg into one polar complex vector, reducing the Arg range to the principal value, −π < Arg ≤ +π |
VF_MagtoP | overwrite the Mag part of a polar complex vector |
VF_ArgtoP | overwrite the Arg part of a polar complex vector |
VF_ReImtoP | construct a polar complex vector from cartesian coordinates, entered as separate vectors Re and Im |
VF_rev | reverse the element ordering |
VCF_revconj | complex conjugate and reverse element ordering |
VF_reflect | set the upper half of a vector equal to the reversed lower half |
VF_rotate | rotate the ordering of the elements |
VF_rotate_buf | efficient rotation, employing user-specified buffer memory |
VF_insert | insert one element into a vector |
VF_delete | delete one element from a vector |
VF_sort | fast sorting of the elements (ascending or descending order) |
VF_sortind | sorting of an index array associated with a vector |
VF_subvector | extract a subvector from a (normally larger) vector, using a constant sampling interval. |
VF_indpick | fills a vector with elements "picked" from another vector according to their indices. |
VF_indput | distribute the elements of one vector to the sites of another vector specified by their indices. |
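VF_indpick and VF_indput are mirror images of each other. One gathers, Yi = X[Ind[i]], and the other scatters, Y[Ind[i]] = Xi. The following plain-C sketch shows that semantics. The helper names and argument order are hypothetical choices of this sketch; see FUNCREF.HTM for the real signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical analogue of VF_indpick (gather): Y[i] = X[Ind[i]] */
static void indpick_f(float *Y, const size_t *Ind, size_t sizey,
                      const float *X)
{
    assert(Y != NULL && Ind != NULL && X != NULL);
    for (size_t i = 0; i < sizey; i++)
        Y[i] = X[Ind[i]];
}

/* Hypothetical analogue of VF_indput (scatter): Y[Ind[i]] = X[i] */
static void indput_f(float *Y, const float *X, const size_t *Ind,
                     size_t sizex)
{
    assert(Y != NULL && Ind != NULL && X != NULL);
    for (size_t i = 0; i < sizex; i++)
        Y[Ind[i]] = X[i];
}
```

Elements of Y not addressed by Ind are left unchanged by the scatter.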
VF_searchC | search for the element of a vector that is closest to a pre-set value C (closest, closest larger-or-equal, or closest smaller-or-equal value, depending on a parameter "mode") |
VF_searchV | the same, but for a whole array of pre-set values |
VF_polyinterpol | polynomial interpolation |
VF_ratinterpol | rational interpolation |
VF_natCubSplineInterpol | natural cubic spline interpolation |
VF_splineinterpol | general cubic spline interpolation |
V_FtoD | float to double |
V_CDtoCF | complex<double> to complex<float> (with overflow protection) |
V_PFtoPE | polar<float> to polar<extended> |
VF_PtoC | polar<float> to complex<float> |
V_ItoLI | int to long int |
V_ULtoUS | unsigned long to unsigned short |
V_ItoU | signed int to unsigned int. Interconversions between signed and unsigned types can only be performed between types of the same size. Functions like "V_UStoLI" do not exist. |
V_ItoF | int to float |
VF_roundtoI | round to the closest integer |
VF_choptoI | round by neglecting ("chopping off") the fractional part |
VF_trunctoI | the same as VF_choptoI |
VF_ceiltoI | round to the next greater-or-equal integer |
VF_floortoI | round to the next smaller-or-equal integer |
VF_ReImtoC | form a cartesian complex vector out of its real and imaginary parts |
VF_RetoC | overwrite the real part |
VF_ImtoC | overwrite the imaginary part |
VF_CtoReIm | extract the real and imaginary parts |
VF_CtoRe | extract the real part |
VF_CtoIm | extract the imaginary part |
VF_PolartoC | form a cartesian complex vector out of polar coordinates, entered as separate vectors Mag and Arg |
VF_CtoPolar | transform cartesian complex into polar coordinates, returned in the separate vectors Mag and Arg |
VF_CtoAbs | absolute value (magnitude of the pointer in the complex plane) |
VF_CtoArg | argument (angle of the pointer in the complex plane) |
VF_CtoNorm | norm (here defined as the square of the absolute value) |
VCF_normtoC | norm, stored as a cartesian complex vector (with all imaginary parts equal to 0) |
VF_MagArgtoP | merge two vectors, Mag and Arg into one polar complex vector |
VF_MagArgtoPrincipal | merge two vectors, Mag and Arg into one polar complex vector, reducing the Arg range to the principal value, −π < Arg ≤ +π |
VF_MagtoP | overwrite the Mag part of a polar complex vector |
VF_ArgtoP | overwrite the Arg part of a polar complex vector |
VF_PtoMagArg | extract the Mag and Arg parts |
VF_PtoMag | extract the Mag part |
VF_PtoArg | extract the Arg part |
VF_PtoNorm | norm (here defined as the square of the magnitude) |
VF_ReImtoP | construct a polar complex vector from cartesian coordinates, entered as separate vectors Re and Im |
VF_PtoReIm | transform a polar complex vector into two real vectors, representing the corresponding cartesian coordinates Re and Im |
VF_PtoRe | calculate the real part of the polar complex input numbers |
VF_PtoIm | calculate the imaginary part of the polar complex input numbers |
VPF_principal | calculate the principal value. You might recall that each complex number has an infinite number of representations in polar coordinates, with the angles differing by an integer multiple of 2π. The representation with −π < Arg ≤ +π is called the principal value. |
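Finding the principal value means adding or subtracting multiples of 2π until the angle falls into the half-open interval (−π, +π]. The following is a minimal scalar sketch of that reduction using a hypothetical helper, principal_arg (not OptiVec code). The simple loops keep it free of library calls but make it slow for very large angles.

```c
#include <assert.h>

#define PI_D 3.14159265358979323846

/* Reduce an angle to its principal value, -pi < arg <= +pi. */
static double principal_arg(double arg)
{
    assert(arg - arg == 0.0);   /* reject infinities and NaN */
    while (arg > PI_D)
        arg -= 2.0 * PI_D;
    while (arg <= -PI_D)
        arg += 2.0 * PI_D;
    return arg;
}
```

Note the asymmetric boundaries: +π is kept as is, while −π is mapped to +π.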
In addition to this error handling "by element", the return values of the VectorLib math functions show if all elements have been processed successfully. In C/C++, the return value is of the data-type int, in Pascal/Delphi, it is IntBool. (We do not yet use the newly introduced data type bool for this return value in C/C++, in order to make VectorLib compatible also with older versions of C compilers.) If a math function worked error-free, the return value is FALSE (0), otherwise it is TRUE (any non-zero number).
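The return-value convention (FALSE = 0 if every element was processed without error, TRUE otherwise) can be sketched in plain C as follows. The reciprocal function inv_f is hypothetical. The placeholder it stores for a zero divisor (FLT_MAX) is an assumption made only for illustration; the actual per-element treatment in OptiVec is described in chapter 5.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical vectorized reciprocal: processes every element,
   returns 0 (FALSE) if all succeeded, nonzero (TRUE) otherwise.
   Storing FLT_MAX for X[i] == 0 is an assumption of this sketch. */
static int inv_f(float *Y, const float *X, size_t size)
{
    int err = 0;
    assert(Y != NULL && X != NULL);
    for (size_t i = 0; i < size; i++) {
        if (X[i] == 0.0f) {
            Y[i] = FLT_MAX;   /* placeholder; processing continues */
            err = 1;
        } else {
            Y[i] = 1.0f / X[i];
        }
    }
    return err;
}
```

A caller can therefore write "if( inv_f( Y, X, size ) ) ..." and react once per call rather than once per element.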
VF_round | round to the closest integer |
VF_chop | round by neglecting ("chopping off") the fractional part |
VF_trunc | the same as VF_chop |
VF_ceil | round to the next greater-or-equal integer |
VF_floor | round to the next smaller-or-equal integer |
VF_roundtoI | round to the closest integer |
VF_choptoI | round by neglecting ("chopping off") the fractional part |
VF_trunctoI | the same as VF_choptoI |
VF_ceiltoI | round to the next greater-or-equal integer |
VF_floortoI | round to the next smaller-or-equal integer |
VF_choptoSI | neglect the fractional part and store as short int / SmallInt |
VF_ceiltoLI | round up and store as long int / LongInt |
VF_floortoQI | round downwards and store as quadruple integer, quad / QuadInt |
VF_roundtoU | round and store as unsigned / UInt |
VF_ceiltoUS | round up and store as unsigned short / USmall |
VD_choptoUL | neglect the fractional part and store as unsigned long / ULong |
VF_cmp_eq0 / _eqC / _eqV | Xi = 0 / C / Yi ? ("equal") |
VF_cmp_ne0 / _neC / _neV | Xi ≠ 0 / C / Yi ? ("not equal") |
VF_cmp_gt0 / _gtC / _gtV | Xi > 0 / C / Yi ? ("greater than") |
VF_cmp_ge0 / _geC / _geV | Xi ≥ 0 / C / Yi ? ("greater than or equal") |
VF_cmp_lt0 / _ltC / _ltV | Xi < 0 / C / Yi ? ("less than") |
VF_cmp_le0 / _leC / _leV | Xi ≤ 0 / C / Yi ? ("less than or equal") |
VF_cmp_stV | Xi ≈ Yi ? ("similar to") |
VF_cmp_dtV | Xi ≉ Yi ? ("dissimilar to") |
As a second possibility, the indices of conforming elements can be stored in an index vector as is done by the VF_cmp_...ind series. Some examples:
VF_cmp_neCind | Indices of all elements Xi ≠ C |
VD_cmp_lt0ind | Indices of all elements Xi < 0 |
VE_cmp_geVind | Indices of all elements Xi ≥ Yi |
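The ...ind comparisons compress the result into a list of matching indices. The plain-C sketch below shows the idea for the condition Xi ≥ Yi, written for float rather than extended. The helper cmp_geV_ind is hypothetical, and returning the number of indices found is an assumption of this sketch. Ind must have room for size entries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical analogue of VE_cmp_geVind (shown for float):
   stores the indices i with X[i] >= Y[i] in Ind and returns
   how many were found.                                      */
static size_t cmp_geV_ind(size_t *Ind, const float *X, const float *Y,
                          size_t size)
{
    size_t n = 0;
    assert(Ind != NULL && X != NULL && Y != NULL);
    for (size_t i = 0; i < size; i++)
        if (X[i] >= Y[i])
            Ind[n++] = i;
    return n;
}
```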
Testing if elements fall into a certain range is done by the functions with the postfixes "inclrange0C", "inclrangeCC", "exclrange0C" and "exclrangeCC":
VF_cmp_inclrange0C | TRUE for 0 ≤ x ≤ C (C positive), 0 ≥ x ≥ C (C negative) |
VF_cmp_exclrange0C | TRUE for 0 < x < C (C positive), 0 > x > C (C negative) |
VF_cmp_inclrangeCC | TRUE for CLo ≤ x ≤ CHi |
VF_cmp_exclrangeCC | TRUE for CLo < x < CHi |
VF_cmp_inclrange0Cind | Indices of all elements 0 ≤ Xi ≤ C (C positive), 0 ≥ Xi ≥ C (C negative) |
VF_cmp_exclrange0Cind | Indices of all elements 0 < Xi < C (C positive), 0 > Xi > C (C negative) |
VF_cmp_inclrangeCCind | Indices of all elements CLo ≤ Xi ≤ CHi |
VF_cmp_exclrangeCCind | Indices of all elements CLo < Xi < CHi |
Counting elements fulfilling a comparison condition is performed by the VF_cnt_... series of functions. Some examples:
VF_cnt_eq0 | count the number of elements equal to 0 (accepting −0 as valid) |
VD_cnt_gtC | count the number of elements greater than a constant C |
VE_cnt_leV | count the number of elements less than or equal to the corresponding elements of another vector |
VLI_cnt_inclrange0C | count the number of elements xi falling into the range 0 ≤ xi ≤ C or, if C is negative, 0 ≥ xi ≥ C |
Additionally, the signum function can be performed with the three possible answers +1 for "greater than", 0 for "equal to" or −1 for "less than". For input vectors of the unsigned integer data types, the output vectors are of the corresponding signed integer types.
VF_cmp0 | signum function: compare to 0: yi = +1, if xi > 0; yi = 0, if xi = 0; yi = −1, if xi < 0. |
VU_cmpC | compare to a constant C: yi = +1, if xi > C; yi = 0, if xi = C; yi = −1, if xi < C. |
VE_cmpV | compare corresponding vector elements: zi = +1, if xi > yi; zi = 0, if xi = yi; zi = −1, if xi < yi |
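The three-way comparison can be written without branches by subtracting the two Boolean results. The following plain-C sketch, using a hypothetical helper cmpV_f, shows the element-wise semantics of VE_cmpV for float data. With this idiom, a NaN on either side yields 0, which is a property of the sketch and not a documented OptiVec behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of an element-wise three-way comparison (cf. VE_cmpV):
   Z[i] = +1 if X[i] > Y[i], 0 if X[i] == Y[i], -1 if X[i] < Y[i] */
static void cmpV_f(float *Z, const float *X, const float *Y, size_t size)
{
    assert(Z != NULL && X != NULL && Y != NULL);
    for (size_t i = 0; i < size; i++)
        Z[i] = (float)((X[i] > Y[i]) - (X[i] < Y[i]));
}
```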
VF_iselementC | returns TRUE, if C is an element of a vector |
VF_iselementV | checks for each element of a vector if it is contained in a table |
VI_shl | shift the bits to the left |
VI_shr | shift the bits to the right |
VI_or | apply a bit mask in an OR operation |
VI_xor | apply a bit mask in an XOR operation |
VI_not | invert all bits |
VF_neg | Yi = - Xi |
VF_abs | Yi = | Xi | |
VCF_conj | Yi.Re = Xi.Re; Yi.Im = −(Xi.Im) |
VF_inv | Yi = 1.0 / Xi |
The functions in the right column of the above two sections also exist in an expanded form (with the prefix VFx_...) in which the function is not evaluated for Xi itself, but for the expression (a * Xi + b), e.g.
VFx_addV | Zi = (a * Xi + b) + Yi |
VFx_divrV | Zi = Yi / (a * Xi + b) |
VFs_addV | Zi = C * (Xi + Yi) |
VFs_subV | Zi = C * (Xi − Yi) |
VFs_mulV | Zi = C * (Xi * Yi) |
VFs_divV | Zi = C * (Xi / Yi) |
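The expanded (VFx_) and scaled (VFs_) forms fold an extra affine transformation or scale factor into the same pass over the data, so no temporary vector and no second sweep through memory are needed. The following plain-C sketch shows the semantics of VFx_addV and VFs_addV with hypothetical helper names and argument order.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of VFx_addV semantics: Z[i] = (a * X[i] + b) + Y[i] */
static void x_addV_f(float *Z, const float *X, const float *Y,
                     size_t size, float a, float b)
{
    assert(Z != NULL && X != NULL && Y != NULL);
    for (size_t i = 0; i < size; i++)
        Z[i] = (a * X[i] + b) + Y[i];
}

/* Sketch of VFs_addV semantics: Z[i] = C * (X[i] + Y[i]) */
static void s_addV_f(float *Z, const float *X, const float *Y,
                     size_t size, float C)
{
    assert(Z != NULL && X != NULL && Y != NULL);
    for (size_t i = 0; i < size; i++)
        Z[i] = C * (X[i] + Y[i]);
}
```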
VF_maxC | set Yi equal to Xi or C, whichever is greater |
VF_minC | choose the smaller of Xi and C |
VF_maxV | set Zi equal to Xi or Yi, whichever is greater |
VF_minV | set Zi equal to Xi or Yi, whichever is smaller |
VF_limit | limit the range of values |
VF_flush0 | set all values to zero which are below a preset threshold |
VF_flushInv | set all values to zero which are below a preset threshold and take the inverse of all other values |
VF_absHuge | replace negative poles by positive ones |
VF_intfrac | split into integer and fractional parts |
VF_mantexp | split into mantissa and exponent |
VF_addVI | fVector Z = fVector X + iVector Y |
VD_mulVUL | dVector Z = dVector X * ulVector Y |
VE_divrVBI | eVector Z = biVector Y / eVector X |
Similarly, there is a family of functions for the accumulation of data in either the same type or in higher-precision data types. Some examples are:
VF_accV | fVector Y += fVector X |
VD_accVF | dVector Y += fVector X |
VF_accVI | fVector Y += iVector X |
VQI_accVLI | qiVector Y += liVector X |
VF_acc2V | fVector Y += fVector X1 + fVector X2 |
VD_acc2VF | dVector Y += fVector X1 + fVector X2 |
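Accumulating into a higher-precision type means widening each input element before the addition, so the running sums keep the precision of the target vector. The following plain-C sketch, with a hypothetical helper acc_d_from_f, shows the semantics of VD_accVF.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of VD_accVF semantics: double Y[i] += float X[i].
   Each float is widened to double before the addition.     */
static void acc_d_from_f(double *Y, const float *X, size_t size)
{
    assert(Y != NULL && X != NULL);
    for (size_t i = 0; i < size; i++)
        Y[i] += (double)X[i];
}
```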
Again only within the floating-point data-types, you can also accumulate squares and products:
VF_accV2 | fVector Y += fVector X² |
VD_accVF2 | dVector Y += fVector X² |
VCF_accVmulVconj | cfVector Z += cfVector X * cfVector Y* (Y* denoting the complex conjugate of Y) |
VF_scalprod | scalar product of two vectors |
VF_xprod | cross-product (or vector product) of two vectors |
VF_Euclid | Euclidean norm |
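For reference, the scalar product and the cross product reduce to the following plain-C sketch. The helper names are hypothetical. The double accumulator in scalprod_f is a choice made in this sketch and is not necessarily what VF_scalprod does internally. The cross product is defined only for three-element vectors, and Z must not alias X or Y.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar (dot) product: sum of X[i] * Y[i], accumulated in double */
static double scalprod_f(const float *X, const float *Y, size_t size)
{
    double sum = 0.0;
    assert(X != NULL && Y != NULL);
    for (size_t i = 0; i < size; i++)
        sum += (double)X[i] * (double)Y[i];
    return sum;
}

/* Cross product of two 3-element vectors: Z = X x Y (Z must not alias X, Y) */
static void xprod_f(float Z[3], const float X[3], const float Y[3])
{
    assert(Z != NULL && X != NULL && Y != NULL);
    Z[0] = X[1] * Y[2] - X[2] * Y[1];
    Z[1] = X[2] * Y[0] - X[0] * Y[2];
    Z[2] = X[0] * Y[1] - X[1] * Y[0];
}
```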
If, on the other hand, two real input vectors X and Y, or one complex input vector XY, define the coordinates of several points in a planar coordinate system, there is a function to rotate these coordinates:
VF_rotateCoordinates | counter-clockwise rotation of the input coordinates specified by the vectors X and Y; the result is returned in the vectors Xrot and Yrot. |
VCF_rotateCoordinates | counter-clockwise rotation of the input coordinates specified by the cartesian complex vector XY; the result is returned in the vector XYrot. |
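A counter-clockwise rotation by an angle θ maps each point (x, y) to (x cos θ − y sin θ, x sin θ + y cos θ). The following plain-C sketch, with a hypothetical helper rotate_coords_f, takes cos θ and sin θ as precomputed arguments so that they are evaluated only once for the whole vector. Whether VF_rotateCoordinates expects the angle in this form is an assumption; see FUNCREF.HTM.

```c
#include <assert.h>
#include <stddef.h>

/* Counter-clockwise rotation of points (X[i], Y[i]) by theta,
   given as precomputed costh = cos(theta), sinth = sin(theta):
     Xrot = x*costh - y*sinth,  Yrot = x*sinth + y*costh        */
static void rotate_coords_f(float *Xrot, float *Yrot,
                            const float *X, const float *Y,
                            size_t size, float costh, float sinth)
{
    assert(Xrot != NULL && Yrot != NULL && X != NULL && Y != NULL);
    for (size_t i = 0; i < size; i++) {
        float x = X[i], y = Y[i];  /* copies allow in-place rotation */
        Xrot[i] = x * costh - y * sinth;
        Yrot[i] = x * sinth + y * costh;
    }
}
```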
normal version | unprotected version | operation |
VF_square | VFu_square | square |
VF_cubic | VFu_cubic | cubic |
VF_quartic | VFu_quartic | quartic (fourth power) |
VF_rsquare | VFu_rsquare | reciprocal square |
VF_rcubic | VFu_rcubic | reciprocal cubic |
VF_rquartic | VFu_rquartic | reciprocal quartic (fourth power) |
VF_sqrt | VFu_sqrt | square-root (which corresponds to a power of 0.5) |
VF_rsqrt | VFu_rsqrt | reciprocal square-root (which corresponds to a power of −0.5) |
VF_inv | VFu_inv | reciprocal (power of −1) |
VF_ipow | VFu_ipow | arbitrary integer powers |
VF_pow | n.a. | fractional powers |
VF_powexp | n.a. | fractional powers, multiplied by exponential function: x^r * exp(x) |
VF_poly | VFu_poly | polynomial |
VF_polyOdd | VFu_polyOdd | polynomial consisting of odd terms only |
VF_polyEven | VFu_polyEven | polynomial consisting of even terms only |
VF_ratio | VFu_ratio | ratio of two polynomials, p/q |
VF_ratioOddEven | VFu_ratioOddEven | ratio of two polynomials, p/q, where p consists of odd terms only, and q consists of even terms only (like in the rational approximation of the tangent function) |
VF_ratioEvenOdd | VFu_ratioEvenOdd | ratio of two polynomials, p/q, where p consists of even terms only, and q consists of odd terms only (like in the rational approximation of the cotangent function) |
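Element-wise polynomial evaluation is usually done with Horner's scheme, which needs only deg multiplications and deg additions per element. The plain-C sketch below assumes that coefficients are stored in ascending order, Yi = c0 + c1*Xi + ... + cdeg*Xi^deg. That ordering and the helper name poly_f are assumptions of this sketch, not a statement of VF_poly's actual interface.

```c
#include <assert.h>
#include <stddef.h>

/* Element-wise polynomial with ascending coefficients (assumed):
     Y[i] = c[0] + c[1]*X[i] + ... + c[deg]*X[i]^deg
   evaluated with Horner's scheme.                                */
static void poly_f(float *Y, const float *X, size_t size,
                   const float *c, unsigned deg)
{
    assert(Y != NULL && X != NULL && c != NULL);
    for (size_t i = 0; i < size; i++) {
        float x = X[i];
        float p = c[deg];
        for (unsigned k = deg; k-- > 0; )
            p = p * x + c[k];
        Y[i] = p;
    }
}
```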
VF_pow10 | fractional powers of 10 |
VF_ipow10 | integer powers of 10 (stored as floating-point numbers) |
VF_pow2 | fractional powers of 2 |
VF_ipow2 | integer powers of 2 (stored as floating-point numbers) |
VF_exp | exponential function |
VF_exp10 | exponential function to the base 10 (identical to VF_pow10) |
VF_exp2 | exponential function to the base 2 (identical to VF_pow2) |
VF_expArbBase | exponential function of an arbitrary base |
The complex-number equivalents are available as well, both for cartesian and polar coordinates. Additionally, two special cases are covered:
VCF_powReExpo | real, fractional powers of complex numbers |
VCF_exptoP | takes a cartesian input vector, returning its exponential function in polar coordinates. |
VF_exp | exponential function |
VF_expc | complementary exponential function Yi = 1 - exp[Xi] |
VF_expmx2 | exponential function of the negative square of the argument, Yi = exp( −Xi² ). This is a bell-shaped function. |
VF_Gauss | Gaussian distribution function |
VF_erf | Error function (Integral over the Gaussian distribution) |
VF_erfc | complementary error function, 1 - erf( Xi ) |
VF_powexp | fractional powers, multiplied by exponential function, Xi^r * exp(Xi) |
VF_sinh | hyperbolic sine |
VF_cosh | hyperbolic cosine |
VF_tanh | hyperbolic tangent |
VF_coth | hyperbolic cotangent |
VF_sech | hyperbolic secant |
VF_cosech | hyperbolic cosecant |
VF_sech2 | square of the hyperbolic secant |
VF_log10 | decadic logarithm (to the base 10) |
VF_log | natural logarithm (to the base e) |
VF_ln | synonym for VF_log |
VF_log2 | binary logarithm (to the base 2) |
VPF_log10toC | decadic logarithm (to the base 10) |
VPF_logtoC | natural logarithm (to the base e) |
VPF_lntoC | synonym for VPF_logtoC |
VPF_log2toC | binary logarithm (to the base 2) |
VF_OD | OD = log10( X0/X ) for fVector as input and as output |
VF_ODwDark | OD = log10( (X0−X0Dark) / (X−XDark) ) for fVector as input and as output |
VUS_ODtoF | OD, calculated in float precision for usVector input |
VUL_ODtoD | OD, calculated in double precision for ulVector input |
VQI_ODtoEwDark | OD with dark-current correction, calculated in extended precision for qiVector input |
VF_sin | sine |
VFr_sin | extra-fast "reduced-range" sine function for −2π ≤ Xi ≤ +2π |
VF_cos | cosine |
VFr_cos | cosine for −2π ≤ Xi ≤ +2π |
VF_sincos | sine and cosine at once |
VFr_sincos | sine and cosine for −2π ≤ Xi ≤ +2π |
VF_tan | tangent |
VF_cot | cotangent |
VF_sec | secant |
VF_cosec | cosecant |
VF_sin2 | sine² |
VFr_sin2 | sine² for −2π ≤ Xi ≤ +2π |
VF_cos2 | cosine² |
VFr_cos2 | cosine² for −2π ≤ Xi ≤ +2π |
VF_sincos2 | sine² and cosine² at once |
VFr_sincos2 | sine² and cosine² for −2π ≤ Xi ≤ +2π |
VF_tan2 | tangent² |
VF_cot2 | cotangent² |
VF_sec2 | secant² |
VF_cosec2 | cosecant² |
VF_sinrpi | sine of p/q * π |
VF_cosrpi | cosine of p/q * π |
VF_sincosrpi | sine and cosine of p/q * π at once |
VF_tanrpi | tangent of p/q * π |
VF_cotrpi | cotangent of p/q * π |
VF_secrpi | secant of p/q * π |
VF_cosecrpi | cosecant of p/q * π |
VF_sinrpi2 | sine of p / 2^n * π |
VF_tanrpi3 | tangent of p / (3*n) * π |
VF_sinc | sinc function, Yi = sin( Xi ) / Xi |
VF_Kepler | Kepler function, calculating the time-dependent angular position of a planet or comet |
VF_asin | arc sin |
VF_acos | arc cos |
VF_atan | arc tan |
VF_atan2 | arc tan of ratios, Zi = atan( Yi / Xi ) |
VF_derivV | derivative of a Y-array with respect to an X-array |
VF_derivC | the same for constant intervals between the X-values |
VF_integralV | value of the integral of a Y-array over an X-array |
VF_runintegralV | point-by-point ("running") integral |
VF_integralC | integral over an equally spaced X-axis |
VF_runintegralC | point-by-point integral over an equally spaced X-axis |
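For an equally spaced X axis with step dx, a simple way to obtain both the total and the running integral is the trapezoidal rule. The following plain-C sketch uses hypothetical helpers integral_c and runintegral_c. Whether VF_integralC and VF_runintegralC use exactly this rule (and the double-precision output chosen here) is an assumption; see FUNCREF.HTM.

```c
#include <assert.h>
#include <stddef.h>

/* Trapezoidal-rule integral over an equally spaced axis (step dx) */
static double integral_c(const float *Y, size_t size, double dx)
{
    double sum = 0.0;
    assert(Y != NULL);
    for (size_t i = 1; i < size; i++)
        sum += 0.5 * ((double)Y[i - 1] + (double)Y[i]) * dx;
    return sum;
}

/* Running integral: R[i] = integral from element 0 to element i */
static void runintegral_c(double *R, const float *Y, size_t size, double dx)
{
    assert(R != NULL && Y != NULL);
    if (size == 0)
        return;
    R[0] = 0.0;
    for (size_t i = 1; i < size; i++)
        R[i] = R[i - 1] + 0.5 * ((double)Y[i - 1] + (double)Y[i]) * dx;
}
```

The last element of the running integral equals the total integral.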
VF_ismonoton | test if an array is monotonically rising or falling |
VF_iselementC | test, if a given value occurs within a vector |
VF_searchC | search an ordered table for the entry whose value comes closest to a preset value C |
VF_localmaxima | detect local maxima (points whose right and left neighbours are smaller) |
VF_localminima | detect local minima (points whose right and left neighbours are larger) |
VF_max | detect global maximum |
VF_min | detect global minimum |
VF_minmax | detect global minimum and maximum |
VF_maxind | global maximum and its index |
VF_minind | global minimum and its index |
VF_absmax | global maximum absolute value |
VF_absmin | global minimum absolute value |
VF_absminmax | detect global minimum and maximum absolute values |
VF_minpos | smallest positive value within a vector |
VF_absmaxind | global maximum absolute value and its index |
VF_absminind | global minimum absolute value and its index |
VF_maxexp | global maximum exponent |
VF_minexp | global minimum exponent |
VF_runmax | "running" maximum |
VF_runmin | "running" minimum |
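A "running" maximum can be read as the largest value seen so far, Yi = max(X0, ..., Xi). That interpretation, and the helper name runmax_f, are assumptions of the following plain-C sketch. The running minimum is obtained by reversing the comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Running maximum: Y[i] = max(X[0], ..., X[i]) */
static void runmax_f(float *Y, const float *X, size_t size)
{
    float m;
    assert(Y != NULL && X != NULL);
    if (size == 0)
        return;
    m = X[0];
    for (size_t i = 0; i < size; i++) {
        if (X[i] > m)
            m = X[i];
        Y[i] = m;
    }
}
```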
The complex equivalents of the last group of functions are:
VCF_maxReIm | maximum real and imaginary parts separately |
VCF_minReIm | minimum real and imaginary parts separately |
VCF_absmaxReIm | maximum absolute real and imaginary values separately |
VCF_absminReIm | minimum absolute real and imaginary values separately |
VCF_absmax | largest magnitude (absolute value; this is a real number) |
VCF_absmin | smallest magnitude |
VCF_cabsmax | complex number of largest magnitude |
VCF_cabsmin | complex number of smallest magnitude |
VCF_sabsmax | complex number for which the sum |Re| + |Im| is largest |
VCF_sabsmin | smallest complex number in terms of the sum |Re| + |Im| |
VCF_absmaxind | largest magnitude (absolute value) and its index |
VCF_absminind | smallest magnitude and its index |
To determine the center of gravity of a vector, you have the choice between the following two functions:
VF_centerOfGravityInd | center of gravity, returned as an interpolated element index |
VF_centerOfGravityV | center of gravity of a Y vector with explicitly given X axis |
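Read as a formula, the interpolated-index variant is a weighted average of the element indices, with the Y values as weights: cog = sum( i * Y[i] ) / sum( Y[i] ). A plain-C sketch of that definition follows. The formula is the usual one for a center of gravity and is assumed here rather than quoted from FUNCREF.HTM; the function name is hypothetical, and the OptiVec routine may treat edge cases such as a zero sum differently:

```c
#include <assert.h>
#include <stddef.h>

/* Center of gravity of Y as an interpolated element index:
   sum( i * Y[i] ) / sum( Y[i] ).
   Assumes the Y values do not sum to zero (e.g. all Y[i] >= 0, not all zero). */
double center_of_gravity_ind(const float *Y, size_t size)
{
    assert(size > 0);
    double weighted = 0.0, total = 0.0;
    for (size_t i = 0; i < size; i++) {
        weighted += (double)i * Y[i];
        total    += Y[i];
    }
    assert(total != 0.0);
    return weighted / total;
}
```

For Y = { 1, 0, 0, 3 }, the result is (3 * 3) / 4 = 2.25, a position between the elements with indices 2 and 3, closer to the heavier one.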
VF_FFTtoC | forward Fast Fourier Transform (FFT) of a real vector; the result is a cartesian complex vector |
VF_FFT | forward and backward FFT of a real vector; the result of the forward FFT is packed into a real vector of the same size as the input vector |
VCF_FFT | forward and backward FFT of a complex vector |
MF_Rows_FFT | FFT along the rows of a matrix; this function may be used for batch-processing of several vectors of identical size, stored as the rows of a matrix |
MF_Cols_FFT | FFT along the columns of a matrix; this function may be used for batch-processing of several vectors of identical size, stored as the columns of a matrix |
VF_convolve, VF_convolvewEdit | convolution with a given response function |
VF_deconvolve, VF_deconvolvewEdit | deconvolution, assuming a given response function |
VF_filter | spectral filtering |
VF_spectrum | spectral analysis |
VF_xspectrum | cross-spectral density of two signals (complex one-sided variant) |
VF_xspectrumAbs | cross-spectral density of two signals (absolute values) |
VF_coherence | coherence function between two signals |
VF_autocorr | autocorrelation function of a data array |
VF_xcorr | cross-correlation function of two arrays |
VF_setRspEdit | set the default editing threshold for the filter in convolutions and deconvolutions (determines how "lost" frequencies are treated) |
VF_getRspEdit | retrieve the current default editing threshold |
FFT and all FFT-based functions need additional buffer memory, which they allocate and free internally. To avoid this overhead in repeated calls, all of these functions also exist in a version marked by the prefix VFb_, which takes a pointer to user-supplied buffer memory as an additional argument. The required size of this buffer is given for each function in its detailed description in FUNCREF.HTM.
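The trade-off behind the VFb_ variants is the usual one between convenience and allocation overhead. The sketch below shows it with a hypothetical operation, a circular shift, which needs scratch memory so that input and output may be the same vector. rotate_b and rotate are illustrative names, not OptiVec functions, and the operation is unrelated to the actual FFT code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: circular right-shift of X by k positions into Y, using caller-supplied
   scratch memory Buf of at least size floats. Y may be identical to X. */
void rotate_b(float *Y, const float *X, size_t size, size_t k, float *Buf)
{
    assert(size > 0 && Buf != NULL);
    k %= size;
    for (size_t i = 0; i < size; i++)
        Buf[(i + k) % size] = X[i];          /* read all of X before touching Y */
    memcpy(Y, Buf, size * sizeof(float));
}

/* Convenience version: allocates and frees the scratch buffer on every call.
   Returns 0 on success, -1 if the allocation fails. */
int rotate(float *Y, const float *X, size_t size, size_t k)
{
    float *Buf = malloc(size * sizeof(float));
    if (Buf == NULL) return -1;
    rotate_b(Y, X, size, k, Buf);
    free(Buf);
    return 0;
}
```

When many vectors of the same size are processed in a loop, allocating Buf once before the loop and passing it to rotate_b in every iteration saves one malloc/free pair per call, which is the kind of saving the VFb_ functions offer.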
Although they do not use Fourier transform methods, the functions VF_biquad (bi-quadratic audio filtering) and VF_smooth (crude form of frequency filtering which removes high-frequency noise) should be mentioned here.
VF_sum | sum of all elements |
VI_fsum | sum of all elements of an integer vector, accumulated as a floating point number in double or extended precision |
VF_prod | product of all elements |
VF_ssq | sum-of-squares of all elements |
VF_sumabs | sum of absolute values of all elements |
VF_rms | root-of-the-mean-square of all elements |
VF_runsum | running sum, also called "cumulative sum" or "inclusive sum-scan" |
VF_runprod | running product |
VF_sumdevC | sum over the deviations from a preset constant, sum( |Xi-C| ) |
VF_sumdevV | sum over the deviations from another vector, sum( |Xi-Yi| ) |
VF_sumdevVwSaturation | sum over the deviations from another vector, sum( |Xi-Yi| ) with saturation of possible overflow to HUGE_VAL |
VF_subV_sumabs | difference between two vectors and sum over the absolute values of the results |
VF_avdevC | average deviation from a preset constant, 1/N * sum( |Xi-C| ) |
VF_avdevV | average deviation from another vector, 1/N * sum( |Xi-Yi| ) |
VF_ssqdevC | sum-of-squares of the deviations from a preset constant, sum( (Xi - C)² ) |
VF_ssqdevV | sum-of-squares of the deviations from another vector, sum( (Xi - Yi)² ) |
VF_ssqdevVwSaturation | sum-of-squares of the deviations from another vector with saturation of possible overflow to HUGE_VAL |
VF_subV_ssq | difference between two vectors and sum-of-squares over the results |
VF_chi2 | chi-square merit function |
VF_chi2wSaturation | chi-square merit function with saturation of possible overflow to HUGE_VAL |
VF_subV_chi2 | difference between two vectors and chi-square merit function |
VF_chiabs | "robust" merit function, similar to VF_chi2, but based on absolute instead of squared deviations |
VF_chiabswSaturation | The same as VF_chiabs, but with saturation of possible overflow to HUGE_VAL |
VF_subV_chiabs | difference between two vectors and chiabs merit function |
VF_mean | equally-weighted mean (or average) of all elements |
VF_meanwW | "mean with weights" of all elements |
VF_meanabs | equally-weighted mean (or average) of the absolute values of all elements |
VF_selected_mean | averages only those vector elements which fall into a specified range, thus allowing outlier points to be excluded from the calculation of the mean |
VF_varianceC | variance of a distribution with respect to a preset constant value |
VF_varianceCwW | the same with non-equal weighting |
VF_varianceV | variance of one distribution with respect to another |
VF_varianceVwW | the same with non-equal weighting |
VF_meanvar | mean and variance of a distribution simultaneously |
VF_meanvarwW | the same with non-equal weighting |
VF_median | median of a distribution |
VF_corrcoeff | linear correlation coefficient of two distributions |
VF_distribution | histogram calculation - bins data into a discrete one-dimensional distribution function |
VF_min_max_mean_stddev | simultaneous calculation of the minimum, maximum, mean, and standard deviation of a one-dimensional distribution |
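Several of the entries above are defined by formulas. The plain-C sketch below implements three of them, using the definitions given in the table for VF_runsum and VF_avdevC and, for VF_chi2, the common definition sum( (Xi - Yi)² * InvVar_i ), where InvVar holds the inverse variances of the data points (this definition is an assumption; the table does not state it). The function names are hypothetical, and the argument order of the real functions may differ:

```c
#include <assert.h>
#include <stddef.h>

/* Running ("cumulative", inclusive) sum: Y[i] = X[0] + ... + X[i]. Y may be identical to X. */
void runsum(float *Y, const float *X, size_t size)
{
    float acc = 0.0f;
    for (size_t i = 0; i < size; i++) {
        acc += X[i];
        Y[i] = acc;
    }
}

/* Average deviation from a constant C: 1/N * sum( |X[i] - C| ). */
double avdevC(const float *X, size_t size, float C)
{
    assert(size > 0);
    double sum = 0.0;
    for (size_t i = 0; i < size; i++) {
        double d = (double)X[i] - C;
        sum += d < 0.0 ? -d : d;
    }
    return sum / size;
}

/* Chi-square merit function (assumed definition): sum( (X[i] - Y[i])^2 * InvVar[i] ). */
double chi2(const float *X, const float *Y, const float *InvVar, size_t size)
{
    double sum = 0.0;
    for (size_t i = 0; i < size; i++) {
        double d = (double)X[i] - Y[i];
        sum += d * d * InvVar[i];
    }
    return sum;
}
```

The sketch accumulates avdevC and chi2 in double to reduce round-off for long vectors; whether the OptiVec routines do the same is not stated in this handbook.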
A detailed description of the various data-fitting concepts is given in chapter 13 of MATRIX.HTM. Therefore, the available X-Y fitting functions are only summarized here in the following table:
VF_linregress | equally-weighted linear regression on X-Y data |
VF_linregresswW | the same with non-equal weighting |
VF_polyfit | fitting of one X-Y data set to a polynomial |
VF_polyfitwW | the same for non-equal data-point weighting |
VF_polyfitOdd | fitting of one X-Y data set to a polynomial with odd terms only |
VF_polyfitOddwW | the same for non-equal data-point weighting |
VF_polyfitEven | fitting of one X-Y data set to a polynomial with even terms only |
VF_polyfitEvenwW | the same for non-equal data-point weighting |
VF_linfit | fitting of one X-Y data set to an arbitrary function linear in its parameters |
VF_linfitwW | the same for non-equal data-point weighting |
VF_setLinfitNeglect | set threshold to neglect (i.e. set equal to zero) a fitting parameter A[i], if its significance is smaller than the threshold |
VF_getLinfitNeglect | retrieve current significance threshold |
VF_nonlinfit | fitting of one X-Y data set to an arbitrary, possibly non-linear function |
VF_nonlinfitwW | the same for non-equal data-point weighting |
VF_multiLinfit | fitting of multiple X-Y data sets to one common linear function |
VF_multiLinfitwW | the same for non-equal data-point weighting |
VF_multiNonlinfit | fitting of multiple X-Y data sets to one common nonlinear function |
VF_multiNonlinfitwW | the same for non-equal data-point weighting |
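For equally weighted data, the straight-line fit y = a + b*x has a closed-form least-squares solution. The sketch below implements the textbook normal-equation formulas in plain C. It does not reproduce the argument list or the additional outputs of VF_linregress, which are described in FUNCREF.HTM, and linregress is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>

/* Equally weighted least-squares fit of y = a + b*x.
   Returns 0 on success, -1 if all X values are identical (slope undefined). */
int linregress(double *a, double *b, const float *X, const float *Y, size_t size)
{
    assert(size >= 2);
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (size_t i = 0; i < size; i++) {
        sx  += X[i];
        sy  += Y[i];
        sxx += (double)X[i] * X[i];
        sxy += (double)X[i] * Y[i];
    }
    double n = (double)size;
    double denom = n * sxx - sx * sx;
    if (denom == 0.0) return -1;
    *b = (n * sxy - sx * sy) / denom;   /* slope */
    *a = (sy - *b * sx) / n;            /* intercept */
    return 0;
}
```

For X = { 0, 1, 2, 3 } and Y = { 1, 3, 5, 7 }, the function yields a = 1 and b = 2. The normal-equation form is compact but can lose accuracy when the X values are large and close together; subtracting the mean of X before summing is the usual remedy.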
VF_cprint | Windows with MS Visual C++ or Borland / Embarcadero compiler: print the elements of a vector to the screen (or "console", hence the "c" in the name) into the current text window. The height and width of the text window are automatically detected. After printing one page, the user is prompted to continue. (Only for console applications) Other Windows compilers and Linux: identical to VF_print. |
VF_print | is similar to VF_cprint in that the output is directed to the screen, but the screen dimensions are not detected automatically. The symbolic constant V_consoleWindowWidth (defined in <VecLib.h> or in the unit VecLib with a default value of 150) determines the line width, and the output is not divided into pages. (Only for console applications) |
VF_fprint | print a vector to a stream. |
VF_chexprint | Similar to VF_cprint, but printed in hexadecimal format. |
VF_hexprint | Similar to VF_print, but printed in hexadecimal format. |
VF_fhexprint | Similar to VF_fprint, but printed in hexadecimal format. |
VF_write | write data in ASCII format into a stream |
VF_read | read a vector from an ASCII file |
VF_nwrite | write n vectors of the same data type as the columns of a table into a stream |
VF_nread | read the columns of a table into n vectors of the same type |
VF_store | store data in binary format |
VF_recall | retrieve data in binary format |
The following functions modify the standard settings of VF_write, VF_nwrite, and VI_read:
VF_setWriteFormat | define a certain number format |
VF_setWriteSeparate | define a separation string between successive elements, written by VF_write |
VF_setNWriteSeparate | define a separation string between the columns written by VF_nwrite |
V_setRadix | define a radix other than the default of 10 for the whole-number variants of the V.._read functions |
V_initPlot | initialize VectorLib graphics functions. No shut-down is needed at the end, since the Windows graphics functions always remain accessible. V_initPlot automatically reserves a part of the screen for plotting operations: about the right two thirds of the screen, leaving one line free at the top for a heading and a few lines free at the bottom. To change this default plotting region, call V_setPlotRegion after V_initPlot. |
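A radix other than 10 only changes how the digits of each whole number are interpreted. In standard C, the same effect is available through the base argument of strtol. The sketch below is a hypothetical helper, not OptiVec's implementation of V_setRadix or VI_read; it reads whitespace-separated whole numbers in a chosen radix:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse up to maxcount whitespace-separated whole numbers from s in the given radix (2..36).
   Returns the number of values stored in out; stops at the first token that is not a
   valid number in this radix, or whose value does not fit into a long. */
size_t read_longs_radix(long *out, size_t maxcount, const char *s, int radix)
{
    assert(radix >= 2 && radix <= 36);
    size_t count = 0;
    while (count < maxcount) {
        char *end;
        errno = 0;
        long v = strtol(s, &end, radix);        /* strtol skips leading whitespace */
        if (end == s || errno == ERANGE) break; /* no digits, or value out of range */
        out[count++] = v;
        s = end;
    }
    return count;
}
```

For example, with radix 16 the string "ff 10 7f" yields 255, 16, and 127, while with radix 2 the string "101 11" yields 5 and 3.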
V_initPrint | initialize VectorLib graphics functions and direct them to a printer. By default, one whole page is reserved for plotting. In order to change this, call V_setPlotRegion after V_initPrint. |
V_setPlotRegion | set a plotting region different from the default |
VectorLib distinguishes between two sorts of plotting functions, AutoPlot and DataPlot. All AutoPlot functions (e.g., VF_xyAutoPlot) execute the following steps:
V?_autoPlot (no suffix): | both X and Y axis linear. |
V?_autoPlot_xlg_ylin: | X axis logarithmic, Y axis linear. |
V?_autoPlot_xlg_ylg: | both X and Y axis logarithmic. |
V?_autoPlot_xlin_ylg: | X axis linear, Y axis logarithmic. |
VF_xyAutoPlot | display an automatically-scaled plot of an X-Y vector pair |
VF_yAutoPlot | plot a single Y-vector, using the index as X-axis |
VF_xy2AutoPlot | plot two X-Y pairs at once, scaling the axes in such a way that both vectors fit into the same coordinate system |
VF_y2AutoPlot | the same for two Y-vectors, plotted against their indices |
VF_xyDataPlot | plot one additional set of X-Y data |
VF_yDataPlot | plot one additional Y vector over its index |
Cartesian complex arrays are plotted in the complex plane (the imaginary parts versus the real parts), using
VCF_autoPlot | plot one cartesian complex vector |
VCF_2AutoPlot | plot two cartesian complex vectors simultaneously |
VCF_dataPlot | plot one additional cartesian complex vector |
At present, no plotting functions for polar complex vectors are included.
It is possible to draw more than one coordinate system into a given window on the screen. The position of each coordinate system must be specified with the above-mentioned function V_setPlotRegion. Switching ("hopping") between the different coordinate systems and adding new DataPlots after defining new viewports (e.g., for text output) is made possible by the following functions:
V_continuePlot | go back to the viewport of the last plot and restore its scalings |
V_getCoordSystem | get a copy of the scalings and position of the current coordinate system |
V_setCoordSystem | restore the scalings and position of a coordinate system; these must have been stored previously, using V_getCoordSystem |
The production libraries of OptiVec treat all mathematical errors (Overflow, Singularity, Domain/Range, Loss-of-Accuracy) "silently" and continue program execution with a corrected result.
The Debug libraries, on the other hand, indicate the occurrence of errors by emitting a message, in addition to correcting the result. As described below, the function V_setFPErrorHandling can be called to control which errors lead to a message. The function V_setErrorEventFile lets you choose whether the messages are emitted into a log file, to a popup window, or into a console text window.
A series of identical errors occurring within one and the same OptiVec function leads to one error message only. Subsequent identical messages are suppressed.
The protection mechanisms cannot catch every floating-point error. The extended-precision versions in particular, but also the double-precision versions, do not have much of a "safety margin". To be on the safe side, constant parameters should not exceed about 1.E32 for float, 1.E150 for double, and 1.E2000 for extended parameters.
In the "expanded" versions of all functions with extended accuracy (those with the prefixes VEx_ and VCEx_; for example, VEx_exp), there is generally no overflow protection for the calculation of A*Xi+B, but only for the core of the function itself and for the final multiplication by C.
There is a fundamental difference between floating-point and integer numbers with respect to OVERFLOW and DOMAIN errors: for floating-point numbers, these are always serious errors, whereas for integer numbers, by virtue of the implicit modulo-2^n arithmetic, this is not necessarily the case. The following two paragraphs give details on the error handling of integer and floating-point numbers, respectively.
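The following plain-C sketch illustrates why an integer overflow need not be an error: with unsigned 16-bit numbers, an intermediate sum that wraps around still leads to the correct final result once the operation is undone, because all arithmetic is carried out modulo 2^16. The checked variant shows the kind of test an overflow-detecting function has to perform. All functions are hypothetical, not OptiVec code; unsigned types are used because signed overflow is undefined behavior in C:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Wrap-around addition: the result is (a + b) modulo 2^16. */
uint16_t add_u16_wrap(uint16_t a, uint16_t b)
{
    return (uint16_t)(a + b);   /* computed in int, then reduced modulo 65536 by the cast */
}

/* Wrap-around subtraction: the result is (a - b) modulo 2^16. */
uint16_t sub_u16_wrap(uint16_t a, uint16_t b)
{
    return (uint16_t)(a - b);
}

/* Overflow-detecting addition: stores the wrapped sum in *sum and returns true
   if the exact sum does not fit into 16 bits. */
bool add_u16_checked(uint16_t a, uint16_t b, uint16_t *sum)
{
    assert(sum != NULL);
    uint32_t exact = (uint32_t)a + (uint32_t)b;
    *sum = (uint16_t)exact;
    return exact > UINT16_MAX;
}
```

add_u16_wrap(60000, 10000) yields 4464 (that is, 70000 - 65536), and subtracting 10000 again modulo 2^16 gives back exactly 60000.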
ierrNote | print an error message |
ierrAbort | print an error message and exit the program |
ierrIgnore | ignore the problem. With this option, error handling can be switched off temporarily. |
Although you may use a call to
V_setIntErrorHandling( ierrIgnore );
to switch the error handling off, it is always better to use the "normal" VI_ version instead of the VIo_ version with switched-off error handling, because the normal version is much faster.
C/C++ only:
To use the overflow-detecting version not only for individual function calls but everywhere, the easiest way is to define the symbolic constant V_trapIntError in the program header before (!) <VecLib.h> is included:
Example:
#define V_trapIntError 1
#include <VSIstd.h>
#include <VSImath.h>
/* ..... further #includes as needed */
int main( void ) /* or WinMain(), or OwlMain() */
{
    siVector SI1, SI2;
    SI1 = VSI_vector( 1000 ); SI2 = VSI_vector( 1000 );
    V_setIntErrorHandling( ierrNote );
    VSI_ramp( SI1, 1000, 0, 50 ); /* an overflow will occur here! */
    V_setIntErrorHandling( ierrIgnore );
    VSI_mulC( SI2, SI1, 1000, 5 );
    /* here, even a whole series of overflows will occur; they are all ignored. */
    /* .... */
    V_free( SI2 ); V_free( SI1 ); /* release the vectors */
    return 0;
}
Debug libraries only: As mentioned above, you can call V_setFPErrorHandling to select which error types lead to a message and which lead to program execution being aborted. The available options are set by the predefined constants fperrXXX:
Option | Meaning |
fperrIgnore | Treat all floating-point errors silently |
fperrNoteDOMAIN | Notify in case of DOMAIN / ERANGE errors |
fperrAbortDOMAIN | Notify and break off in case of DOMAIN / ERANGE errors |
fperrNoteSING | Notify in case of Singularities (divisions by 0) |
fperrAbortSING | Notify and break off in case of Singularities |
fperrNoteOVERFLOW | Notify in case of Overflow |
fperrAbortOVERFLOW | Notify and break off in case of Overflow |
fperrNoteTLOSS | Notify in case of Total Loss of Precision (e.g., at sin(1.e30)) |
fperrAbortTLOSS | Notify and break off in case of Total Loss of Precision |
fperrDefault | Default setting = fperrAbortDOMAIN + fperrNoteSING + fperrNoteOVERFLOW |
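The fperrDefault entry shows that options are combined by adding the constants. That works if every option occupies its own bit, as in the plain-C sketch below. The OPT_ names and their values are hypothetical and are not the actual definitions of the fperrXXX constants, which should be taken from the OptiVec include files:

```c
#include <assert.h>

/* Hypothetical option flags (NOT the real fperrXXX values): one bit per option,
   so distinct options can be combined by + or |. */
enum {
    OPT_NOTE_DOMAIN    = 0x01,
    OPT_ABORT_DOMAIN   = 0x02,
    OPT_NOTE_SING      = 0x04,
    OPT_ABORT_SING     = 0x08,
    OPT_NOTE_OVERFLOW  = 0x10,
    OPT_ABORT_OVERFLOW = 0x20
};

/* A default built the same way as fperrDefault in the table above. */
enum { OPT_DEFAULT = OPT_ABORT_DOMAIN + OPT_NOTE_SING + OPT_NOTE_OVERFLOW };

/* Nonzero if the option set requests a message or an abort on singularities. */
int reports_sing(unsigned options)
{
    assert(options <= 0x3F);   /* only the six bits defined above are meaningful */
    return (options & (OPT_NOTE_SING | OPT_ABORT_SING)) != 0;
}
```

Under this assumption, + and | give the same result as long as no option is added twice. Adding the same flag twice carries into a neighbouring bit and silently changes the meaning: here, OPT_NOTE_SING + OPT_NOTE_SING equals OPT_ABORT_SING.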
In the following description of all floating-point error types, we denote by "HUGE_VAL" the largest number possible in the respective data type. Similarly, "TINY_VAL" is the smallest denormal number representable in the respective data type; this is not the same as "MIN_VAL", which is the smallest full-accuracy number of the respective data type.
In general, denormal numbers may be treated just like ordinary numbers. In some instances, however, such as taking the reciprocal, overflow errors may occur. In these cases, the somewhat academic distinction between SING and OVERFLOW errors is dropped, and a SING error is signalled (as if it were a division by exactly 0).
On the other hand, for functions like the logarithms, very small input numbers may give perfectly reasonable results, although the exact number 0.0 is an illegal argument, leading to a SING error. Here, the possible loss of precision is neglected and denormals are considered valid arguments. (This treatment is quite different from that chosen for the math functions of most compilers, where denormal arguments lead to SING errors also in these cases, which seems much less appropriate to us.)
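In C, the quantities named above correspond to constants from <float.h>: for float, MIN_VAL is FLT_MIN and TINY_VAL is FLT_TRUE_MIN (C11), while the handbook's HUGE_VAL corresponds to FLT_MAX rather than to C's own HUGE_VAL macro, which on IEEE 754 implementations is positive infinity. The plain-C sketch below, with hypothetical function names and assuming IEEE 754 single precision with denormals enabled (no flush-to-zero mode), classifies denormals and shows why taking the reciprocal of a denormal can overflow:

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Nonzero if x is a positive denormal float: TINY_VAL <= x < MIN_VAL in the terms above. */
int is_positive_denormal(float x)
{
    return x > 0.0f && x < FLT_MIN;
}

/* Reciprocal of x; *overflowed is set to 1 if the result lies outside the finite float
   range, which is the situation the text above reports as a SING error. */
float reciprocal(float x, int *overflowed)
{
    assert(x != 0.0f && overflowed != NULL);
    volatile float r = 1.0f / x;   /* volatile forces rounding to float precision */
    *overflowed = (r > FLT_MAX || r < -FLT_MAX);
    return r;
}
```

For x = FLT_TRUE_MIN (about 1.4E-45), 1/x exceeds FLT_MAX (about 3.4E38) and overflows, whereas 1/FLT_MIN (about 8.5E37) is still finite.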
You might wish to circumvent this. To this end, OptiVec provides the function V_setErrorEventFile. This function needs as arguments the desired name of your event file and a switch named ScreenAndFile, which decides whether the error message is printed only into the file (ScreenAndFile = 0), additionally into a message box (ScreenAndFile = 1), or, for console programs, additionally to the screen (ScreenAndFile = 2). By calling V_setErrorEventFile( "NULL", 0 ) (C/C++) or V_setErrorEventFile( 'nil', 0 ) (Pascal/Delphi), you can even switch off all messages completely (if you decide that is a wise thing to do).
Note that this redirection of error messages is valid only for errors occurring in OptiVec routines. It is possible, however, for a user program to use the OptiVec function V_printErrorMsg for its own error messages.
Certain configurations of the compilers supported by OptiVec do not allow the full set of options described above: in some, either output into a message box or output to the console screen is unavailable. In these cases, an error message is displayed and the output is redirected to the other available option.
C/C++ only: Declare the use of OptiVec functions with #include statements. If you are using MFC (Microsoft Foundation Classes) or the OWL (ObjectWindows Library) with old Borland compilers, the MFC or OWL include-files have to be #included before (!) the OptiVec include-files.
Pascal/Delphi only: Declare the use of OptiVec functions with the uses clause.
Include-file (add suffix .H) or unit | Contents |
VecLib | Basic definitions of the data types along with the functions common to all data types (prefix V_) except for the graphics initialization functions. |
VFstd, VDstd, VEstd | Floating-point "standard operations:" generation and initialization of vectors, index-oriented manipulations, data-type interconversions, statistics, analysis, geometrical vector arithmetics, Fourier-Transform related functions, I/O operations. |
VCFstd, VCDstd, VCEstd, VPFstd, VPDstd, VPEstd | Standard operations for cartesian and polar complex vectors |
VIstd, VBIstd, VSIstd, VLIstd, VQIstd | Standard operations for signed integer vectors |
VUstd, VUBstd, VUSstd, VULstd, VUQstd, VUIstd | Standard operations for unsigned integer vectors |
VFmath, VDmath, VEmath | Algebraic, arithmetical and mathematical functions for floating-point vectors |
VCFmath, VCDmath, VCEmath, VPFmath, VPDmath, VPEmath | Arithmetical and mathematical functions for complex vectors |
VImath, VBImath, VSImath, VLImath, VQImath | Arithmetical and mathematical functions for signed integer vectors |
VUmath, VUBmath, VUSmath, VULmath, VUQmath, VUImath | Arithmetical and mathematical functions for unsigned integer vectors |
Vgraph | Graphics functions for all data types |
VFNLFIT, VDNLFIT, VENLFIT | Non-linear fitting functions (Pascal/Delphi only; in C/C++, they are in M?std) |
VFMNLFIT, VDMNLFIT, VEMNLFIT | Non-linear fitting functions for multiple data sets (Pascal/Delphi only; in C/C++, they are in M?std) |
MFstd, MDstd, MEstd | Matrix operations for real-valued matrices |
MCFstd, MCDstd, MCEstd | Matrix operations for cartesian complex matrices |
Mgraph | Matrix graphics functions for all data types |
MFNLFIT, MDNLFIT, MENLFIT | Non-linear fitting functions for Z = f(X, Y) data (Pascal/Delphi only; in C/C++, they are in M?std) |
MFMNLFIT, MDMNLFIT, MEMNLFIT | Non-linear fitting functions for multiple Z = f(X, Y) data sets (Pascal/Delphi only; in C/C++, they are in M?std) |
NEWCPLX | complex class library CMATH; C++ only |
CMATH | complex library CMATH for Pascal/Delphi and plain C |
CFMATH, CDMATH, CEMATH | C/C++ only: type-specific parts of CMATH. |
OVXMATH | A few non-vectorized math functions needed internally by other OptiVec functions; they are publicly accessible (see chapter 9). C/C++: also declares the sine, cosecant, and tangent tables for VF_sinrpi2 etc. |
FSINTAB2, DSINTAB2, ESINTAB2, FSINTAB3, DSINTAB3, ESINTAB3 | sine tables (Pascal/Delphi only; for C/C++, they are in OVXMATH) |
FCSCTAB2, DCSCTAB2, ECSCTAB2, FCSCTAB3, DCSCTAB3, ECSCTAB3 | cosecant tables (Pascal/Delphi only; for C/C++, they are in OVXMATH) |
FTANTAB2, DTANTAB2, ETANTAB2, FTANTAB3, DTANTAB3, ETANTAB3 | tangent tables (Pascal/Delphi only; for C/C++, they are in OVXMATH) |
VecObj | basic definitions for VecObj, the object-oriented interface for C++ |
fVecObj, dVecObj, eVecObj | VecObj member functions for real-valued vector objects (C++ only) |
cfVecObj, cdVecObj, ceVecObj, pfVecObj, pdVecObj, peVecObj | VecObj member functions for complex vector objects (C++ only) |
iVecObj, biVecObj, siVecObj, liVecObj, qiVecObj | VecObj member functions for signed-integer vector objects (C++ only) |
uVecObj, ubVecObj, usVecObj, ulVecObj, uiVecObj | VecObj member functions for unsigned-integer vector objects (C++ only) |
OptiVec | includes the whole OptiVec package (C++ only) |
VecAll | includes all VectorLib and CMATH functions (C or C++ only) |
MatAll | includes all MatrixLib functions (C or C++ only) |