author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-07 19:33:14 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-07 19:33:14 +0000 |
commit | 36d22d82aa202bb199967e9512281e9a53db42c9 (patch) | |
tree | 105e8c98ddea1c1e4784a60a5a6410fa416be2de /media/kiss_fft/README | |
parent | Initial commit. (diff) | |
Adding upstream version 115.7.0esr. (refs: upstream/115.7.0esr, upstream)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to '')
-rw-r--r-- | media/kiss_fft/README | 134 |
-rw-r--r-- | media/kiss_fft/README.simd | 78 |
-rw-r--r-- | media/kiss_fft/README_MOZILLA | 8 |
3 files changed, 220 insertions, 0 deletions
diff --git a/media/kiss_fft/README b/media/kiss_fft/README
new file mode 100644
index 0000000000..03b2e7a9c1
--- /dev/null
+++ b/media/kiss_fft/README
@@ -0,0 +1,134 @@
+KISS FFT - A mixed-radix Fast Fourier Transform based up on the principle,
+"Keep It Simple, Stupid."
+
+    There are many great fft libraries already around. Kiss FFT is not trying
+to be better than any of them. It only attempts to be a reasonably efficient,
+moderately useful FFT that can use fixed or floating data types and can be
+incorporated into someone's C program in a few minutes with trivial licensing.
+
+USAGE:
+
+    The basic usage for 1-d complex FFT is:
+
+        #include "kiss_fft.h"
+
+        kiss_fft_cfg cfg = kiss_fft_alloc( nfft ,is_inverse_fft ,0,0 );
+
+        while ...
+
+            ... // put kth sample in cx_in[k].r and cx_in[k].i
+
+            kiss_fft( cfg , cx_in , cx_out );
+
+            ... // transformed. DC is in cx_out[0].r and cx_out[0].i
+
+        free(cfg);
+
+    Note: frequency-domain data is stored from dc up to 2pi.
+    so cx_out[0] is the dc bin of the FFT
+    and cx_out[nfft/2] is the Nyquist bin (if exists)
+
+    Declarations are in "kiss_fft.h", along with a brief description of the
+functions you'll need to use.
+
+Code definitions for 1d complex FFTs are in kiss_fft.c.
+
+You can do other cool stuff with the extras you'll find in tools/
+
+    * multi-dimensional FFTs
+    * real-optimized FFTs (returns the positive half-spectrum: (nfft/2+1) complex frequency bins)
+    * fast convolution FIR filtering (not available for fixed point)
+    * spectrum image creation
+
+The core fft and most tools/ code can be compiled to use float, double,
+ Q15 short or Q31 samples. The default is float.
+
+
+BACKGROUND:
+
+    I started coding this because I couldn't find a fixed point FFT that didn't
+use assembly code. I started with floating point numbers so I could get the
+theory straight before working on fixed point issues. In the end, I had a
+little bit of code that could be recompiled easily to do ffts with short, float
+or double (other types should be easy too).
+
+    Once I got my FFT working, I was curious about the speed compared to
+a well respected and highly optimized fft library. I don't want to criticize
+this great library, so let's call it FFT_BRANDX.
+During this process, I learned:
+
+    1. FFT_BRANDX has more than 100K lines of code. The core of kiss_fft is about 500 lines (cpx 1-d).
+    2. It took me an embarrassingly long time to get FFT_BRANDX working.
+    3. A simple program using FFT_BRANDX is 522KB. A similar program using kiss_fft is 18KB (without optimizing for size).
+    4. FFT_BRANDX is roughly twice as fast as KISS FFT in default mode.
+
+    It is wonderful that free, highly optimized libraries like FFT_BRANDX exist.
+But such libraries carry a huge burden of complexity necessary to extract every
+last bit of performance.
+
+    Sometimes simpler is better, even if it's not better.
+
+FREQUENTLY ASKED QUESTIONS:
+    Q: Can I use kissfft in a project with a ___ license?
+    A: Yes. See LICENSE below.
+
+    Q: Why don't I get the output I expect?
+    A: The two most common causes of this are
+        1) scaling : is there a constant multiplier between what you got and what you want?
+        2) mixed build environment -- all code must be compiled with same preprocessor
+        definitions for FIXED_POINT and kiss_fft_scalar
+
+    Q: Will you write/debug my code for me?
+    A: Probably not unless you pay me. I am happy to answer pointed and topical questions, but
+    I may refer you to a book, a forum, or some other resource.
+
+
+PERFORMANCE:
+    (on Athlon XP 2100+, with gcc 2.96, float data type)
+
+    Kiss performed 10000 1024-pt cpx ffts in .63 s of cpu time.
+    For comparison, it took md5sum twice as long to process the same amount of data.
+
+    Transforming 5 minutes of CD quality audio takes less than a second (nfft=1024).
+
+DO NOT:
+    ... use Kiss if you need the Fastest Fourier Transform in the World
+    ... ask me to add features that will bloat the code
+
+UNDER THE HOOD:
+
+    Kiss FFT uses a time decimation, mixed-radix, out-of-place FFT. If you give it an input buffer
+    and output buffer that are the same, a temporary buffer will be created to hold the data.
+
+    No static data is used. The core routines of kiss_fft are thread-safe (but not all of the tools directory).
+
+    No scaling is done for the floating point version (for speed).
+    Scaling is done both ways for the fixed-point version (for overflow prevention).
+
+    Optimized butterflies are used for factors 2,3,4, and 5.
+
+    The real (i.e. not complex) optimization code only works for even length ffts. It does two half-length
+    FFTs in parallel (packed into real&imag), and then combines them via twiddling. The result is
+    nfft/2+1 complex frequency bins from DC to Nyquist. If you don't know what this means, search the web.
+
+    The fast convolution filtering uses the overlap-scrap method, slightly
+    modified to put the scrap at the tail.
+
+LICENSE:
+    Revised BSD License, see COPYING for verbiage.
+    Basically, "free to use&change, give credit where due, no guarantees"
+    Note this license is compatible with GPL at one end of the spectrum and closed, commercial software at
+    the other end. See http://www.fsf.org/licensing/licenses
+
+    A commercial license is available which removes the requirement for attribution. Contact me for details.
+
+
+TODO:
+    *) Add real optimization for odd length FFTs
+    *) Document/revisit the input/output fft scaling
+    *) Make doc describing the overlap (tail) scrap fast convolution filtering in kiss_fastfir.c
+    *) Test all the ./tools/ code with fixed point (kiss_fastfir.c doesn't work, maybe others)
+
+AUTHOR:
+    Mark Borgerding
+    Mark@Borgerding.net
diff --git a/media/kiss_fft/README.simd b/media/kiss_fft/README.simd
new file mode 100644
index 0000000000..b0fdac5506
--- /dev/null
+++ b/media/kiss_fft/README.simd
@@ -0,0 +1,78 @@
+If you are reading this, it means you think you may be interested in using the SIMD extensions in kissfft
+to do 4 *separate* FFTs at once.
+
+Beware! Beyond here there be dragons!
+
+This API is not easy to use, is not well documented, and breaks the KISS principle.
+
+
+Still reading? Okay, you may get rewarded for your patience with a considerable speedup
+(2-3x) on intel x86 machines with SSE if you are willing to jump through some hoops.
+
+The basic idea is to use the packed 4 float __m128 data type as a scalar element.
+This means that the format is pretty convoluted. It performs 4 FFTs per fft call on signals A,B,C,D.
+
+For complex data, the data is interlaced as follows:
+rA0,rB0,rC0,rD0, iA0,iB0,iC0,iD0, rA1,rB1,rC1,rD1, iA1,iB1,iC1,iD1 ...
+where "rA0" is the real part of the zeroth sample for signal A
+
+Real-only data is laid out:
+rA0,rB0,rC0,rD0, rA1,rB1,rC1,rD1, ...
+
+Compile with gcc flags something like
+-O3 -mpreferred-stack-boundary=4 -DUSE_SIMD=1 -msse
+
+Be aware of SIMD alignment. This is the most likely cause of segfaults.
+The code within kissfft uses scratch variables on the stack.
+With SIMD, these must have addresses on 16 byte boundaries.
+Search on "SIMD alignment" for more info.
+
+
+
+Robin at Divide Concept was kind enough to share his code for formatting to/from the SIMD kissfft.
+I have not run it -- use it at your own risk. It appears to do 4xN and Nx4 transpositions
+(out of place).
+
+void SSETools::pack128(float* target, float* source, unsigned long size128)
+{
+    __m128* pDest = (__m128*)target;
+    __m128* pDestEnd = pDest+size128;
+    float* source0=source;
+    float* source1=source0+size128;
+    float* source2=source1+size128;
+    float* source3=source2+size128;
+
+    while(pDest<pDestEnd)
+    {
+        *pDest=_mm_set_ps(*source3,*source2,*source1,*source0);
+        source0++;
+        source1++;
+        source2++;
+        source3++;
+        pDest++;
+    }
+}
+
+void SSETools::unpack128(float* target, float* source, unsigned long size128)
+{
+
+    float* pSrc = source;
+    float* pSrcEnd = pSrc+size128*4;
+    float* target0=target;
+    float* target1=target0+size128;
+    float* target2=target1+size128;
+    float* target3=target2+size128;
+
+    while(pSrc<pSrcEnd)
+    {
+        *target0=pSrc[0];
+        *target1=pSrc[1];
+        *target2=pSrc[2];
+        *target3=pSrc[3];
+        target0++;
+        target1++;
+        target2++;
+        target3++;
+        pSrc+=4;
+    }
+}
diff --git a/media/kiss_fft/README_MOZILLA b/media/kiss_fft/README_MOZILLA
new file mode 100644
index 0000000000..a2fd35d6a4
--- /dev/null
+++ b/media/kiss_fft/README_MOZILLA
@@ -0,0 +1,8 @@
+The source from this directory was copied from the kissfft hg repository using
+the update.sh script. The only changes made were those applied by update.sh
+and the addition of moz.build and Makefile.in build files for the Mozilla build
+system.
+
+The kissfft hg repository is: http://hg.code.sf.net/p/kissfft/code
+
+The hg revision ID used was fbe1bb0bc7b94ec252842b8b7e3f3347ec75d92f.
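
Reviewer note on the README.simd layout: the real-only interleaving it describes (rA0,rB0,rC0,rD0, rA1,rB1,rC1,rD1, ...) can be sketched in portable C without SSE intrinsics. This is only an illustration of the 4xN and Nx4 transpositions the README attributes to Robin's code, not the code from this commit. The names pack4 and unpack4 are hypothetical, and the sketch assumes the four input signals are stored back to back, as the source0..source3 pointer arithmetic in pack128 suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of kissfft.
 * src holds four planar signals back to back: A[0..n-1], B[0..n-1],
 * C[0..n-1], D[0..n-1]. dst receives the interleaved layout
 * A0,B0,C0,D0, A1,B1,C1,D1, ... (4*n floats in total).
 * Out of place only: dst and src must not overlap. */
void pack4(float *dst, const float *src, size_t n)
{
    assert(dst != src);
    for (size_t k = 0; k < n; ++k)          /* sample index */
        for (size_t s = 0; s < 4; ++s)      /* signal index: A=0 .. D=3 */
            dst[4 * k + s] = src[s * n + k];
}

/* Hypothetical inverse of pack4: interleaved layout back to planar. */
void unpack4(float *dst, const float *src, size_t n)
{
    assert(dst != src);
    for (size_t k = 0; k < n; ++k)
        for (size_t s = 0; s < 4; ++s)
            dst[s * n + k] = src[4 * k + s];
}
```

Calling pack4 and then unpack4 with the same n should return the original planar buffer. The sketch does not address the 16-byte alignment that README.simd warns about; buffers handed to a SIMD build would still need to satisfy it.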