You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
285 lines
22 KiB
285 lines
22 KiB
;* ======================================================================== *;
|
|
;* TEXAS INSTRUMENTS, INC. *;
|
|
;* *;
|
|
;* IMGLIB DSP Image/Video Processing Library *;
|
|
;* *;
|
|
;* Release: Revision 1.04b *;
|
|
;* CVS Revision: 1.6 Sun Sep 29 03:32:31 2002 (UTC) *;
|
|
;* Snapshot date: 23-Oct-2003 *;
|
|
;* *;
|
|
;* This library contains proprietary intellectual property of Texas *;
|
|
;* Instruments, Inc. The library and its source code are protected by *;
|
|
;* various copyrights, and portions may also be protected by patents or *;
|
|
;* other legal protections. *;
|
|
;* *;
|
|
;* This software is licensed for use with Texas Instruments TMS320 *;
|
|
;* family DSPs. This license was provided to you prior to installing *;
|
|
;* the software. You may review this license by consulting the file *;
|
|
;* TI_license.PDF which accompanies the files in this library. *;
|
|
;* ------------------------------------------------------------------------ *;
|
|
;* Copyright (C) 2003 Texas Instruments, Incorporated. *;
|
|
;* All Rights Reserved. *;
|
|
;* ======================================================================== *;
|
|
;* ======================================================================== *;
|
|
;* Assembler compatibility shim for assembling 4.30 and later code on *;
|
|
;* tools prior to 4.30. *;
|
|
;* ======================================================================== *;
|
|
;* ======================================================================== *;
|
|
;* End of assembler compatibility shim. *;
|
|
;* ======================================================================== *;
|
|
* ========================================================================= *
|
|
* TEXAS INSTRUMENTS, INC. *
|
|
* *
|
|
* NAME *
|
|
* IMG_wave_vert : Vertical Pass of Wavelet Transform *
|
|
* *
|
|
* REVISION DATE *
|
|
* 08-Feb-2000 *
|
|
* *
|
|
* USAGE *
|
|
* This routine is C-callable and can be called as: *
|
|
* *
|
|
* void IMG_wave_vert *
|
|
* ( *
|
|
* const short *restrict *
|
|
* *restrict in_data, /* Array of row pointers */ *
|
|
* const short *restrict qmf, /* Low pass QMF filter */ *
|
|
* const short *restrict mqmf, /* High pass QMF filter */ *
|
|
* short *restrict out_ldata, /* Low pass output data */ *
|
|
* short *restrict out_hdata, /* High pass output data */ *
|
|
* int cols /* Length of rows to process */ *
|
|
* ); *
|
|
* *
|
|
* *
|
|
* DESCRIPTION *
|
|
* The benchmark performs the vertical pass of 2D wavelet *
|
|
* transform It performs a vertical filter on 8 rows which are *
|
|
* pointed to by the pointers contained in an array of pointers. *
|
|
* It produces two lines worth of output, one being the low-pass *
|
|
* and the other being the high pass result. Instead of performing *
|
|
* a transpose on the column and re-using the wave_horz kernel, *
|
|
* the vertical filter is traversed over the entire width of the *
|
|
* line and the low pass and high pass filtering kernels are *
|
|
* performed together. *
|
|
* *
|
|
* This implies that the low-pass and highpass filters be *
|
|
* overlapped in execution so that the input data array may be *
|
|
* loaded once and both filters can be exceuted in parallel. *
|
|
* *
|
|
* TECHNIQUES *
|
|
* The inner loop that advances along each filter tap is totally *
|
|
* optimized by unrolling. Double-word loads are performed, and *
|
|
* paired multiplies are used to perform four iterations of *
|
|
* low-pass filter in parallel. *
|
|
* *
|
|
* For the high-pass kernel, the same loop is reused, in order *
|
|
* to save codesize. This is done by loading the filter *
|
|
* coefficients in a special order. *
|
|
* *
|
|
* ASSUMPTIONS *
|
|
* Since the wavelet transform is dyadic, the image dimensions *
|
|
* "rows" and "cols" are assumed to be powers of 2. No checking *
|
|
* is done within the code to ensure this. *
|
|
* *
|
|
* The input filters qmf and mqmf are assumed to be word aligned *
|
|
* and have exactly 8 taps. *
|
|
* *
|
|
* The output data and input data on any line is assumed to be *
|
|
* dword-aligned. *
|
|
* *
|
|
* The mqmf filter is constructed from the qmf as follows: *
|
|
* *
|
|
* status = -1; *
|
|
* for (i = 0; i < M; i++) *
|
|
* { *
|
|
* status = status * -1; *
|
|
* hdata = qmf[i] * status; *
|
|
* filter[i] = hdata; *
|
|
* } *
|
|
* *
|
|
* The kernels assume that the number of filter taps is exactly *
|
|
* 8. In addition data that is loaded for producing out_ldata[0] *
|
|
* and out_hdata[0] is not identical. The data loaded for *
|
|
* producing out_hdata[0] produces results at the location *
|
|
* *
|
|
* out_lstart = o_im + ((rows >> 1) - 3) * cols *
|
|
* out_hstart = o_im + (rows >> 1) * cols *
|
|
* *
|
|
* Where o_im is start of output image, rows is # of rows of the *
|
|
* input image, and cols is # of cols of the output image. *
|
|
* *
|
|
* The following table illustrates how ylptr and yhptr need to be *
|
|
* updated at the start of each call to this function: *
|
|
* *
|
|
* Call# out_ldata out_hdata *
|
|
* 1 out_lstart out_hstart *
|
|
* 2 out_lstart + cols out_hstart + cols *
|
|
* 3 out_lstart + 2*cols out_hstart + 2*cols *
|
|
* *
|
|
* At this point ylptr wraps around to become o_im, while yhptr *
|
|
* proceeds as usual: *
|
|
* *
|
|
* 4 o_im out_hstart + 3*cols *
|
|
* *
|
|
* In addition the kernel accepts a pointer to an array of *
|
|
* pointers for each input line so that a working buffer of 10 *
|
|
* lines can be used to effectively mix DMA's and processing as *
|
|
* shown below: *
|
|
* *
|
|
* ihptr LINE BUFFER *
|
|
* ptr0 ---->|-------------------------------------------------| *
|
|
* ptr1 ---->|-------------------------------------------------| *
|
|
* ... *
|
|
* ptr7 ---->|-------------------------------------------------| *
|
|
* *
|
|
* At the start of the kernel 8 input lines are filled to the *
|
|
* first 8 lines and processing begins. In the background the next *
|
|
* two lines are fetched. The pointers are moved up by 2 namely *
|
|
* ptr[i] = ptr[i+2] and the last two lines now point to lines 9 *
|
|
* and 10 and processing starts again. In the background the next *
|
|
* two lines are brought in the first two lines of the line *
|
|
* buffer. Pointers move up again by 2 but now the last two *
|
|
* pointers to line 0 and 1. This pattern then repeats. *
|
|
* *
|
|
* The first line to begin filtering is always obtained from *
|
|
* ptr[0], the next from ptr[1] and so on. *
|
|
* *
|
|
* MEMORY NOTE *
|
|
* In order to eliminate bank conflicts succesive lines in the *
|
|
* line buffer or the pointers to these lines are seperated by *
|
|
* exactly two banks (one word) so that loads to any succesive *
|
|
* lines may be parallelized together. *
|
|
* *
|
|
* This code is a LITTLE ENDIAN implementation. *
|
|
* *
|
|
* NOTES *
|
|
* This code masks interrupts for nearly its entire duration. As a *
|
|
* result the code is interrupt tolerant but not interruptible. *
|
|
* *
|
|
* C CODE *
|
|
* void IMG_wave_vert *
|
|
* ( *
|
|
* const short *restrict *
|
|
* *restrict in_data, /* Array of row pointers */ *
|
|
* const short *restrict qmf, /* Low pass QMF filter */ *
|
|
* const short *restrict mqmf, /* High pass QMF filter */ *
|
|
* short *restrict out_ldata, /* Low pass output data */ *
|
|
* short *restrict out_hdata, /* High pass output data */ *
|
|
* int cols /* Length of rows to process */ *
|
|
* ) *
|
|
* { *
|
|
* const int M = 8; *
|
|
* int i, iters, j; *
|
|
* int sum_h, sum_l; *
|
|
* int prod_h, prod_l; *
|
|
* *
|
|
* short res_h, res_l; *
|
|
* short xdata, hdata, ldata; *
|
|
* short *filt_ptr; *
|
|
* *
|
|
* /* ------------------------------------------------------ */ *
|
|
* /* iters: variable for the # of loop iterations. */ *
|
|
* /* */ *
|
|
* /* Both the low pass and the high pass filters produce */ *
|
|
* /* 'iters' points, which is also the width of the input */ *
|
|
* /* line. The low-pass filter reads filter coefficients */ *
|
|
* /* from qmf and the high pass filter reads filter */ *
|
|
* /* coefficients from the conjugate mirror filter. In */ *
|
|
* /* addition note that the low-pass filter coefficients */ *
|
|
* /* are read in increasing order while the high pass the */ *
|
|
* /* filter coefficients are read in the opposite order. */ *
|
|
* /* ------------------------------------------------------ */ *
|
|
* iters = cols; *
|
|
* *
|
|
* /* ------------------------------------------------------ */ *
|
|
* /* Since the filters have fractional point coefficients, */ *
|
|
* /* all math is done using Q15 fixed-point arithmetic. */ *
|
|
* /* Qr is the associated round value and is set as */ *
|
|
* /* follows: */ *
|
|
* /* */ *
|
|
* /* #define Qpt 15 */ *
|
|
* /* #define Qr 16384 */ *
|
|
* /* */ *
|
|
* /* Low-Pass filter: ihptr contains 8 pointers which */ *
|
|
* /* point to input lines. The filters are placed */ *
|
|
* /* vertically and input data is read from 8 seperate */ *
|
|
* /* lines. Hence data-reuse is not possible when */ *
|
|
* /* traversing horizontally. sum_l is initialized to Qr */ *
|
|
* /* and contains the low-pass FIR sum at the end of the */ *
|
|
* /* j loop. sum_h contains the accumulator result for */ *
|
|
* /* the high pass filter in a similar fashion. M is */ *
|
|
* /* assumed to be 8 by all kernels and is # filter taps */ *
|
|
* /* for D4. */ *
|
|
* /* ------------------------------------------------------ */ *
|
|
* *
|
|
* for ( i = 0; i < iters; i++) *
|
|
* { *
|
|
* sum_l = Qr; *
|
|
* filt_ptr = qmf; *
|
|
* *
|
|
* for ( j = 0; j < M; j++) *
|
|
* { *
|
|
* xdata = in_data[j][i]; *
|
|
* ldata = *filt_ptr++; *
|
|
* prod_l = xdata * ldata; *
|
|
* sum_l += prod_l; *
|
|
* } *
|
|
* *
|
|
* res_l = (sum_l >> Qpt); *
|
|
* *out_ldata++ = res_l; *
|
|
* } *
|
|
* *
|
|
* /* ------------------------------------------------------ */ *
|
|
* /* High-Pass filter: ihptr contains 8 pointers which */ *
|
|
* /* point to input lines. The filters are placed */ *
|
|
* /* vertically and input data is read from 8 seperate */ *
|
|
* /* lines. Hence data-reuse is not possible when */ *
|
|
* /* traversing horizontally. sum_h is initialized to */ *
|
|
* /* Qr and contains the low-pass FIR sum at the end of */ *
|
|
* /* the j loop. sum_h contains the accumulator result */ *
|
|
* /* for the high pass filter in a similar fashion. M */ *
|
|
* /* is # filter taps and is assumed to be 8 by all */ *
|
|
* /* kernels. */ *
|
|
* /* ------------------------------------------------------ */ *
|
|
* for ( i = 0; i < iters; i++) *
|
|
* { *
|
|
* sum_h = Qr; *
|
|
* filt_ptr = mqmf + M - 1; *
|
|
* *
|
|
* for ( j = 0; j < M; j++) *
|
|
* { *
|
|
* xdata = in_data[j][i]; *
|
|
* hdata = *filt_ptr--; *
|
|
* prod_h = xdata * hdata; *
|
|
* sum_h += prod_h; *
|
|
* } *
|
|
* *
|
|
* res_h = (sum_h >> Qpt); *
|
|
* *out_hdata++ = res_h; *
|
|
* } *
|
|
* } *
|
|
* *
|
|
* CYCLES *
|
|
* cycles = 4*cols + 96 (both lowpass and highpass together) *
|
|
* *
|
|
* For cols = 256, cycles = 1120. *
|
|
* For cols = 512, cycles = 2144. *
|
|
* *
|
|
* CODESIZE *
|
|
* 888 bytes *
|
|
* *
|
|
* BIBLIOGRAPHY *
|
|
* Mallat, Stephane. "A Wavelet Tour of Signal Processing", pg. 309. *
|
|
* ------------------------------------------------------------------------- *
|
|
* Copyright (c) 2003 Texas Instruments, Incorporated. *
|
|
* All Rights Reserved. *
|
|
* ========================================================================= *
|
|
|
|
.global _IMG_wave_vert
|
|
|
|
* ========================================================================= *
|
|
* End of file: img_wave_vert.h64 *
|
|
* ------------------------------------------------------------------------- *
|
|
* Copyright (c) 2003 Texas Instruments, Incorporated. *
|
|
* All Rights Reserved. *
|
|
* ========================================================================= *
|
|
|