kvazaar
We truncated the diff of some files because they were too big.
Changes of Revision 5
kvazaar.changes
Changed
@@ -1,4 +1,9 @@
 -------------------------------------------------------------------
+Mon Nov 2 16:48:30 UTC 2015 - aloisio@gmx.com
+
+- Update to version 0.7.2
+
+-------------------------------------------------------------------
 Sun Oct 25 06:12:37 UTC 2015 - aloisio@gmx.com
 
 - Update to version 0.7.1
kvazaar.spec
Changed
@@ -18,7 +18,7 @@
 %define libname libkvazaar
 %define libmver 2
 Name: kvazaar
-Version: 0.7.1
+Version: 0.7.2
 Release: 0
 Summary: HEVC encoder
 License: LGPL-2.1
kvazaar-0.7.1.tar.gz/README.md -> kvazaar-0.7.2.tar.gz/README.md
Changed
@@ -3,12 +3,11 @@
 Join channel #kvazaar_hevc in Freenode IRC network to contact us.
 
-Kvazaar is not yet finished and does not implement all the features of HEVC. Compression performance will increase as we add more coding tools.
+Kvazaar is not yet finished and does not implement all the features of
+HEVC. Compression performance will increase as we add more coding tools.
 
 http://ultravideo.cs.tut.fi/#encoder for more information.
 
-http://github.com/ultravideo/kvazaar/wiki/List-of-suggested-topics for a list of topics you might want to examine if you would like to do something bigger than a bug fix but don't know what yet.
-
 [![Build Status](https://travis-ci.org/ultravideo/kvazaar.svg?branch=master)](https://travis-ci.org/ultravideo/kvazaar)
 
 ##Using Kvazaar
@@ -96,12 +95,12 @@
     Disable threads if set to 0.
   Tiles:
-  --tiles-width-split <string>|u<int> : 
+  --tiles-width-split <string>|u<int> :
     Specifies a comma separated list of pixel positions of tiles columns separation coordinates. Can also be u followed by and a single int n, in which case it produces columns of uniform width.
-  --tiles-height-split <string>|u<int> : 
+  --tiles-height-split <string>|u<int> :
     Specifies a comma separated list of pixel positions of tiles rows separation coordinates. Can also be u followed by and a single int n,
@@ -112,7 +111,7 @@
   --owf <integer>|auto : Number of parallel frames to process. 0 to disable.
   Slices:
-  --slice-addresses <string>|u<int>: 
+  --slice-addresses <string>|u<int>:
     Specifies a comma separated list of LCU positions in tile scan order of tile separations. Can also be u followed by and a single int n,
@@ -123,70 +122,116 @@
   -w, --width : Width of input in pixels
   -h, --height : Height of input in pixels
 
-Example:
-    kvazaar -i <INPUT_YUV> --input-res <WIDTH>x<HEIGHT> -o <OUTPUT.BIN> -n <NUMBER_OF_FRAMES> -q <QP>
+For example:
 
-eg. `kvazaar -i BQMall_832x480_60.yuv --input-res 832x480 -o out.bin -n 600 -q 32`
+    kvazaar -i BQMall_832x480_60.yuv --input-res 832x480 -o out.hevc -n 600 -q 32
 
 The only accepted input format so far is 8-bit YUV 4:2:0.
 
+##Kvazaar library
+
+See [kvazaar.h](src/kvazaar.h) for the library API and its
+documentation.
+
+When using the static Kvazaar library on Windows, macro `KVZ_STATIC_LIB`
+must be defined. On other platforms it's not strictly required.
+
+The needed linker and compiler flags can be obtained with pkg-config.
 
 ##Compiling Kvazaar
 
-If you have trouble regarding compiling the source code, please make an [issue](https://github.com/ultravideo/kvazaar/issues) about in Github. Others might encounter the same problem and there is probably much to improve in the build process. We want to make this as simple as possible.
+If you have trouble regarding compiling the source code, please make an
+[issue](https://github.com/ultravideo/kvazaar/issues) about in Github.
+Others might encounter the same problem and there is probably much to
+improve in the build process. We want to make this as simple as
+possible.
 
 ###Required libraries
-- For Visual Studio pthreads-w32 library is required. Platforms with native posix thread support don't need anything.
-  - The project file expects the library to be in ../pthreads.2/ relative to kvazaar. You can just extract the pre-built library there.
-  - The executable needs pthreadVC2.dll to be present. Either install it somewhere or ship it with the executable.
+- For Visual Studio, the pthreads-w32 library is required. Platforms
+  with native POSIX thread support don't need anything.
+  - The project file expects the library to be in ../pthreads.2/
+    relative to Kvazaar. You can just extract the pre-built library
+    there.
+  - The executable needs pthreadVC2.dll to be present. Either install it
+    somewhere or ship it with the executable.
 
-###Visual Studio 2010
-- VS2010 and older does not have support for some of the c99 features that we use. Please use VS2013 or newer or GCC (MinGW) to compile on windows.
+###GCC
+- Makefile can be found in the src directory.
+- Yasm is expected to be in PATH.
+  - Alternatively, NASM can be used by passing `AS=nasm` to make.
 
-###Visual Studio 2013
-- project files included
-- requires external [vsyasm.exe](http://yasm.tortall.net/Download.html) in %PATH%
-  - run `rundll32 sysdm.cpl,EditEnvironmentVariables` and add PATH to user variables
+On Linux, both the shared and the static library are built and installed
+by default. On Windows and OS X, the default is to only build the
+DLL/dylib. The static command line program is built by default on all
+platforms.
 
-###GCC
-- Simple Makefile included in src/
-- Yasm is expected to be in PATH
+The default targets can be installed by running `make install`.
 
 ###OS X
-- The program should compile and work on OS X but you might need a newer version of GCC than what comes with the platform.
+- The program should compile and work on OS X but you might need a newer
+  version of GCC than what comes with the platform.
 
-###Other
-- There is a scons SConstruct file that should work on both Windows and Linux.
-- Contact us for support or write an [issue in Github](https://github.com/ultravideo/kvazaar/issues)
+###Visual Studio
+- VS2010 and older do not have support for some of the C99 features that
+  we use. Please use VS2013 or newer or GCC (MinGW) to compile on
+  Windows.
+- Project files can be found under build/.
+- Requires external [vsyasm.exe](http://yasm.tortall.net/Download.html)
+  in %PATH%
+  - Run `rundll32 sysdm.cpl,EditEnvironmentVariables` and add PATH to
+    user variables
+- Building the Kvazaar library is not yet supported.
 
 ##Contributing to Kvazaar
 
-###For version control we try to follow these conventions:
-
-- Master branch always produces a working bitstream (can be decoded with HM).
-- Commits for new features and major changes/fixes put to a sensibly named feature branch first and later merged to the master branch.
-- Always merge the feature branch to the master branch, not the other way around, with fast-forwarding disabled if necessary. We have found that this differentiates between working and unfinished versions nicely.
-- Every commit should at least compile. Producing a working bitstream is nice as well, but not always possible. Features may be temporarily disabled to produce a working bitstream, but remember to re-enbable them before merging to master.
-
-
-###Testing:
-
-- We do not have a proper testing framework yet. We test mainly by decoding the bitstream with HM and checking that the result matches the encoders own reconstruction.
-- You should at least test that HM decodes a bitstream file made with your changes without throwing checksum errors. If your changes shouldn't alter the bitstream, you should check that they don't.
-- We would like to have a suite of automatic tests that also check for BD-rate increase and speed decrease in addition to checking that the bitstream is valid. As of yet there is no such suite.
+See http://github.com/ultravideo/kvazaar/wiki/List-of-suggested-topics
+for a list of topics you might want to examine if you would like to do
+something bigger than a bug fix but don't know what yet.
 
+###For version control we try to follow these conventions:
 
-###Unit tests:
-- There are some unit tests located in the tests directory. We would like to have more.
-- The Visual Studio project links the unit tests against the actual .lib file used by the encoder. There is no Makefile as of yet.
-- The unit tests use "greatest" unit testing framework. It is included as a submodule, but getting it requires the following commands to be run in the root directory of kvazaar:
+- Master branch always produces a working bitstream (can be decoded with
+  HM).
+- Commits for new features and major changes/fixes put to a sensibly
+  named feature branch first and later merged to the master branch.
+- Always merge the feature branch to the master branch, not the other
+  way around, with fast-forwarding disabled if necessary. We have found
+  that this differentiates between working and unfinished versions
+  nicely.
+- Every commit should at least compile. Producing a working bitstream is
+  nice as well, but not always possible. Features may be temporarily
+  disabled to produce a working bitstream, but remember to re-enbable
+  them before merging to master.
+
+
+###Testing
+
+- We do not have a proper testing framework yet. We test mainly by
+  decoding the bitstream with HM and checking that the result matches
+  the encoders own reconstruction.
+- You should at least test that HM decodes a bitstream file made with
+  your changes without throwing checksum errors. If your changes
+  shouldn't alter the bitstream, you should check that they don't.
+- We would like to have a suite of automatic tests that also check for
+  BD-rate increase and speed decrease in addition to checking that the
+  bitstream is valid. As of yet there is no such suite.
+
+
+###Unit tests
+- There are some unit tests located in the tests directory. We would
+  like to have more.
+- The Visual Studio project links the unit tests against the actual .lib
+  file used by the encoder. There is no Makefile as of yet.
+- The unit tests use "greatest" unit testing framework. It is included
+  as a submodule, but getting it requires the following commands to be
+  run in the root directory of kvazaar:
 
     git submodule init
     git submodule update
 
-###Code style:
+###Code style
 
 We try to follow the following conventions:
 
 - C99 without features not supported by Visual Studio 2013 (VLAs).
@@ -197,11 +242,19 @@
 - Reference and deference next to the variable name.
 - Variable names in lowered characters with words divided by underscore.
 - Maximum line length 79 characters when possible.
-- Functions only used inside the module shouldn't be defined in the module header. They can be defined in the beginning of the .c file if necessary.
-
kvazaar-0.7.1.tar.gz/src/Makefile -> kvazaar-0.7.2.tar.gz/src/Makefile
Changed
@@ -13,7 +13,7 @@
 # Library version number
 VER_MAJOR = 2
-VER_MINOR = 0
+VER_MINOR = 1
 VER_RELEASE = 0
 
 PROG = kvazaar
kvazaar-0.7.1.tar.gz/src/global.h -> kvazaar-0.7.2.tar.gz/src/global.h
Changed
@@ -144,7 +144,7 @@
 // NOTE: When making a release, remember to also bump library version in
 // Makefile, if necessary.
 
-#define KVZ_VERSION 0.7.1
+#define KVZ_VERSION 0.7.2
 #define VERSION_STRING QUOTE_EXPAND(KVZ_VERSION)
 
 //#define VERBOSE 1
kvazaar-0.7.1.tar.gz/src/strategies/avx2/intra-avx2.c -> kvazaar-0.7.2.tar.gz/src/strategies/avx2/intra-avx2.c
Changed
@@ -27,8 +27,353 @@
 #include "intra-avx2.h"
 #include "strategyselector.h"
 
-#if COMPILE_INTEL_AVX2
+#if COMPILE_INTEL_AVX2 && defined X86_64
 #include <immintrin.h>
+#include "strategies/strategies-common.h"
+
+ /**
+ * \brief Linear interpolation for 4 pixels. Returns 4 filtered pixels in lowest 32-bits of the register.
+ * \param ref_main Reference pixels
+ * \param delta_pos Fractional pixel precise position of sample displacement
+ * \param x Sample offset in direction x in ref_main array
+ */
+static INLINE __m128i filter_4x1_avx2(const kvz_pixel *ref_main, int16_t delta_pos, int x){
+
+  int8_t delta_int = delta_pos >> 5;
+  int8_t delta_fract = delta_pos & (32-1);
+  __m128i sample0 = _mm_cvtsi32_si128(*(uint32_t*)&(ref_main[x + delta_int]));
+  __m128i sample1 = _mm_cvtsi32_si128(*(uint32_t*)&(ref_main[x + delta_int + 1]));
+  __m128i pairs = _mm_unpacklo_epi8(sample0, sample1);
+  __m128i weight = _mm_set1_epi16( (delta_fract << 8) | (32 - delta_fract) );
+  sample0 = _mm_maddubs_epi16(pairs, weight);
+  sample0 = _mm_add_epi16(sample0, _mm_set1_epi16(16));
+  sample0 = _mm_srli_epi16(sample0, 5);
+  sample0 = _mm_packus_epi16(sample0, sample0);
+
+  return sample0;
+}
+
+ /**
+ * \brief Linear interpolation for 4x4 block. Writes filtered 4x4 block to dst.
+ * \param dst Destination buffer
+ * \param ref_main Reference pixels
+ * \param sample_disp Sample displacement per row
+ * \param vertical_mode Mode direction, true if vertical
+ */
+void filter_4x4_avx2(kvz_pixel *dst, const kvz_pixel *ref_main, int sample_disp, bool vertical_mode){
+
+  __m128i row0 = filter_4x1_avx2(ref_main, 1 * sample_disp, 0);
+  __m128i row1 = filter_4x1_avx2(ref_main, 2 * sample_disp, 0);
+  __m128i row2 = filter_4x1_avx2(ref_main, 3 * sample_disp, 0);
+  __m128i row3 = filter_4x1_avx2(ref_main, 4 * sample_disp, 0);
+
+  //Transpose if horizontal mode
+  if (!vertical_mode) {
+    __m128i temp = _mm_unpacklo_epi16(_mm_unpacklo_epi8(row0, row1), _mm_unpacklo_epi8(row2, row3));
+    row0 = _mm_cvtsi32_si128(_mm_extract_epi32(temp, 0));
+    row1 = _mm_cvtsi32_si128(_mm_extract_epi32(temp, 1));
+    row2 = _mm_cvtsi32_si128(_mm_extract_epi32(temp, 2));
+    row3 = _mm_cvtsi32_si128(_mm_extract_epi32(temp, 3));
+  }
+
+  *(int32_t*)(dst + 0 * 4) = _mm_cvtsi128_si32(row0);
+  *(int32_t*)(dst + 1 * 4) = _mm_cvtsi128_si32(row1);
+  *(int32_t*)(dst + 2 * 4) = _mm_cvtsi128_si32(row2);
+  *(int32_t*)(dst + 3 * 4) = _mm_cvtsi128_si32(row3);
+}
+
+ /**
+ * \brief Linear interpolation for 8 pixels. Returns 8 filtered pixels in lower 64-bits of the register.
+ * \param ref_main Reference pixels
+ * \param delta_pos Fractional pixel precise position of sample displacement
+ * \param x Sample offset in direction x in ref_main array
+ */
+static INLINE __m128i filter_8x1_avx2(const kvz_pixel *ref_main, int16_t delta_pos, int x){
+
+  int8_t delta_int = delta_pos >> 5;
+  int8_t delta_fract = delta_pos & (32-1);
+  __m128i sample0 = _mm_cvtsi64_si128(*(uint64_t*)&(ref_main[x + delta_int]));
+  __m128i sample1 = _mm_cvtsi64_si128(*(uint64_t*)&(ref_main[x + delta_int + 1]));
+  __m128i pairs_lo = _mm_unpacklo_epi8(sample0, sample1);
+  __m128i pairs_hi = _mm_unpackhi_epi8(sample0, sample1);
+
+  __m128i weight = _mm_set1_epi16( (delta_fract << 8) | (32 - delta_fract) );
+  __m128i v_temp_lo = _mm_maddubs_epi16(pairs_lo, weight);
+  __m128i v_temp_hi = _mm_maddubs_epi16(pairs_hi, weight);
+  v_temp_lo = _mm_add_epi16(v_temp_lo, _mm_set1_epi16(16));
+  v_temp_hi = _mm_add_epi16(v_temp_hi, _mm_set1_epi16(16));
+  v_temp_lo = _mm_srli_epi16(v_temp_lo, 5);
+  v_temp_hi = _mm_srli_epi16(v_temp_hi, 5);
+  sample0 = _mm_packus_epi16(v_temp_lo, v_temp_hi);
+
+  return sample0;
+}
+
+ /**
+ * \brief Linear interpolation for 8x8 block. Writes filtered 8x8 block to dst.
+ * \param dst Destination buffer
+ * \param ref_main Reference pixels
+ * \param sample_disp Sample displacement per row
+ * \param vertical_mode Mode direction, true if vertical
+ */
+static void filter_8x8_avx2(kvz_pixel *dst, const kvz_pixel *ref_main, int sample_disp, bool vertical_mode){
+  __m128i row0 = filter_8x1_avx2(ref_main, 1 * sample_disp, 0);
+  __m128i row1 = filter_8x1_avx2(ref_main, 2 * sample_disp, 0);
+  __m128i row2 = filter_8x1_avx2(ref_main, 3 * sample_disp, 0);
+  __m128i row3 = filter_8x1_avx2(ref_main, 4 * sample_disp, 0);
+  __m128i row4 = filter_8x1_avx2(ref_main, 5 * sample_disp, 0);
+  __m128i row5 = filter_8x1_avx2(ref_main, 6 * sample_disp, 0);
+  __m128i row6 = filter_8x1_avx2(ref_main, 7 * sample_disp, 0);
+  __m128i row7 = filter_8x1_avx2(ref_main, 8 * sample_disp, 0);
+
+  //Transpose if horizontal mode
+  if (!vertical_mode) {
+    __m128i q0 = _mm_unpacklo_epi8(row0, row1);
+    __m128i q1 = _mm_unpacklo_epi8(row2, row3);
+    __m128i q2 = _mm_unpacklo_epi8(row4, row5);
+    __m128i q3 = _mm_unpacklo_epi8(row6, row7);
+
+    __m128i h0 = _mm_unpacklo_epi16(q0, q1);
+    __m128i h1 = _mm_unpacklo_epi16(q2, q3);
+    __m128i h2 = _mm_unpackhi_epi16(q0, q1);
+    __m128i h3 = _mm_unpackhi_epi16(q2, q3);
+
+    __m128i temp0 = _mm_unpacklo_epi32(h0, h1);
+    __m128i temp1 = _mm_unpackhi_epi32(h0, h1);
+    __m128i temp2 = _mm_unpacklo_epi32(h2, h3);
+    __m128i temp3 = _mm_unpackhi_epi32(h2, h3);
+
+    row0 = _mm_cvtsi64_si128(_mm_extract_epi64(temp0, 0));
+    row1 = _mm_cvtsi64_si128(_mm_extract_epi64(temp0, 1));
+    row2 = _mm_cvtsi64_si128(_mm_extract_epi64(temp1, 0));
+    row3 = _mm_cvtsi64_si128(_mm_extract_epi64(temp1, 1));
+    row4 = _mm_cvtsi64_si128(_mm_extract_epi64(temp2, 0));
+    row5 = _mm_cvtsi64_si128(_mm_extract_epi64(temp2, 1));
+    row6 = _mm_cvtsi64_si128(_mm_extract_epi64(temp3, 0));
+    row7 = _mm_cvtsi64_si128(_mm_extract_epi64(temp3, 1));
+  }
+
+  _mm_storel_epi64((__m128i*)(dst + 0 * 8), row0);
+  _mm_storel_epi64((__m128i*)(dst + 1 * 8), row1);
+  _mm_storel_epi64((__m128i*)(dst + 2 * 8), row2);
+  _mm_storel_epi64((__m128i*)(dst + 3 * 8), row3);
+  _mm_storel_epi64((__m128i*)(dst + 4 * 8), row4);
+  _mm_storel_epi64((__m128i*)(dst + 5 * 8), row5);
+  _mm_storel_epi64((__m128i*)(dst + 6 * 8), row6);
+  _mm_storel_epi64((__m128i*)(dst + 7 * 8), row7);
+}
+
+ /**
+ * \brief Linear interpolation for two 16 pixels. Returns 8 filtered pixels in lower 64-bits of both lanes of the YMM register.
+ * \param ref_main Reference pixels
+ * \param delta_pos Fractional pixel precise position of sample displacement
+ * \param x Sample offset in direction x in ref_main array
+ */
+static INLINE __m256i filter_16x1_avx2(const kvz_pixel *ref_main, int16_t delta_pos, int x){
+
+  int8_t delta_int = delta_pos >> 5;
+  int8_t delta_fract = delta_pos & (32-1);
+  __m256i sample0 = _mm256_cvtepu8_epi16(_mm_loadu_si128((__m128i*)&(ref_main[x + delta_int])));
+  sample0 = _mm256_packus_epi16(sample0, sample0);
+  __m256i sample1 = _mm256_cvtepu8_epi16(_mm_loadu_si128((__m128i*)&(ref_main[x + delta_int + 1])));
+  sample1 = _mm256_packus_epi16(sample1, sample1);
+  __m256i pairs_lo = _mm256_unpacklo_epi8(sample0, sample1);
+  __m256i pairs_hi = _mm256_unpackhi_epi8(sample0, sample1);
+
+  __m256i weight = _mm256_set1_epi16( (delta_fract << 8) | (32 - delta_fract) );
+  __m256i v_temp_lo = _mm256_maddubs_epi16(pairs_lo, weight);
+  __m256i v_temp_hi = _mm256_maddubs_epi16(pairs_hi, weight);
+  v_temp_lo = _mm256_add_epi16(v_temp_lo, _mm256_set1_epi16(16));
+  v_temp_hi = _mm256_add_epi16(v_temp_hi, _mm256_set1_epi16(16));
+  v_temp_lo = _mm256_srli_epi16(v_temp_lo, 5);
+  v_temp_hi = _mm256_srli_epi16(v_temp_hi, 5);
+  sample0 = _mm256_packus_epi16(v_temp_lo, v_temp_hi);
+
+  return sample0;
+}
+
+ /**
+ * \brief Linear interpolation for 16x16 block. Writes filtered 16x16 block to dst.
+ * \param dst Destination buffer
+ * \param ref_main Reference pixels
+ * \param sample_disp Sample displacement per row
+ * \param vertical_mode Mode direction, true if vertical
+ */
+void filter_16x16_avx2(kvz_pixel *dst, const kvz_pixel *ref_main, int sample_disp, bool vertical_mode){
+  for (int y = 0; y < 16; y += 8) {
+    __m256i row0 = filter_16x1_avx2(ref_main, (y + 1) * sample_disp, 0);
+    __m256i row1 = filter_16x1_avx2(ref_main, (y + 2) * sample_disp, 0);
+    __m256i row2 = filter_16x1_avx2(ref_main, (y + 3) * sample_disp, 0);
+    __m256i row3 = filter_16x1_avx2(ref_main, (y + 4) * sample_disp, 0);
+    __m256i row4 = filter_16x1_avx2(ref_main, (y + 5) * sample_disp, 0);
+    __m256i row5 = filter_16x1_avx2(ref_main, (y + 6) * sample_disp, 0);
+    __m256i row6 = filter_16x1_avx2(ref_main, (y + 7) * sample_disp, 0);
+    __m256i row7 = filter_16x1_avx2(ref_main, (y + 8) * sample_disp, 0);
+
+    if (!vertical_mode) {
+      __m256i q0 = _mm256_unpacklo_epi8(row0, row1);
+      __m256i q1 = _mm256_unpacklo_epi8(row2, row3);
+      __m256i q2 = _mm256_unpacklo_epi8(row4, row5);
+      __m256i q3 = _mm256_unpacklo_epi8(row6, row7);
+
+      __m256i h0 = _mm256_unpacklo_epi16(q0, q1);
+      __m256i h1 = _mm256_unpacklo_epi16(q2, q3);
+      __m256i h2 = _mm256_unpackhi_epi16(q0, q1);
+      __m256i h3 = _mm256_unpackhi_epi16(q2, q3);
+
+      __m256i temp0 = _mm256_unpacklo_epi32(h0, h1);
kvazaar-0.7.1.tar.gz/src/strategies/avx2/quant-avx2.c -> kvazaar-0.7.2.tar.gz/src/strategies/avx2/quant-avx2.c
Changed
@@ -30,9 +30,11 @@
 #include "strategyselector.h"
 #include "encoder.h"
 #include "transform.h"
+#include "rdo.h"
 
-#if COMPILE_INTEL_AVX2
+#if COMPILE_INTEL_AVX2 && defined X86_64
 #include <immintrin.h>
+#include <smmintrin.h>
 
 /**
  * \brief quantize transformed coefficents
@@ -194,6 +196,208 @@
   }
 }
 
+static INLINE __m128i get_residual_4x1_avx2(const kvz_pixel *a_in, const kvz_pixel *b_in){
+  __m128i a = _mm_cvtsi32_si128(*(int32_t*)a_in);
+  __m128i b = _mm_cvtsi32_si128(*(int32_t*)b_in);
+  __m128i diff = _mm_sub_epi16(_mm_cvtepu8_epi16(a), _mm_cvtepu8_epi16(b) );
+  return diff;
+}
+
+static INLINE __m128i get_residual_8x1_avx2(const kvz_pixel *a_in, const kvz_pixel *b_in){
+  __m128i a = _mm_cvtsi64_si128(*(int64_t*)a_in);
+  __m128i b = _mm_cvtsi64_si128(*(int64_t*)b_in);
+  __m128i diff = _mm_sub_epi16(_mm_cvtepu8_epi16(a), _mm_cvtepu8_epi16(b) );
+  return diff;
+}
+
+static INLINE int32_t get_quantized_recon_4x1_avx2(int16_t *residual, const kvz_pixel *pred_in){
+  __m128i res = _mm_loadl_epi64((__m128i*)residual);
+  __m128i pred = _mm_cvtsi32_si128(*(int32_t*)pred_in);
+  __m128i rec = _mm_add_epi16(res, _mm_cvtepu8_epi16(pred));
+  return _mm_cvtsi128_si32(_mm_packus_epi16(rec, rec));
+}
+
+static INLINE int64_t get_quantized_recon_8x1_avx2(int16_t *residual, const kvz_pixel *pred_in){
+  __m128i res = _mm_loadu_si128((__m128i*)residual);
+  __m128i pred = _mm_cvtsi64_si128(*(int64_t*)pred_in);
+  __m128i rec = _mm_add_epi16(res, _mm_cvtepu8_epi16(pred));
+  return _mm_cvtsi128_si64(_mm_packus_epi16(rec, rec));
+}
+
+static void get_residual_avx2(const kvz_pixel *ref_in, const kvz_pixel *pred_in, int16_t *residual, int width, int in_stride){
+
+  __m128i diff = _mm_setzero_si128();
+  switch (width) {
+    case 4:
+      diff = get_residual_4x1_avx2(ref_in + 0 * in_stride, pred_in + 0 * in_stride);
+      _mm_storel_epi64((__m128i*)&(residual[0]), diff);
+      diff = get_residual_4x1_avx2(ref_in + 1 * in_stride, pred_in + 1 * in_stride);
+      _mm_storel_epi64((__m128i*)&(residual[4]), diff);
+      diff = get_residual_4x1_avx2(ref_in + 2 * in_stride, pred_in + 2 * in_stride);
+      _mm_storel_epi64((__m128i*)&(residual[8]), diff);
+      diff = get_residual_4x1_avx2(ref_in + 3 * in_stride, pred_in + 3 * in_stride);
+      _mm_storel_epi64((__m128i*)&(residual[12]), diff);
+      break;
+    case 8:
+      diff = get_residual_8x1_avx2(&ref_in[0 * in_stride], &pred_in[0 * in_stride]);
+      _mm_storeu_si128((__m128i*)&(residual[0]), diff);
+      diff = get_residual_8x1_avx2(&ref_in[1 * in_stride], &pred_in[1 * in_stride]);
+      _mm_storeu_si128((__m128i*)&(residual[8]), diff);
+      diff = get_residual_8x1_avx2(&ref_in[2 * in_stride], &pred_in[2 * in_stride]);
+      _mm_storeu_si128((__m128i*)&(residual[16]), diff);
+      diff = get_residual_8x1_avx2(&ref_in[3 * in_stride], &pred_in[3 * in_stride]);
+      _mm_storeu_si128((__m128i*)&(residual[24]), diff);
+      diff = get_residual_8x1_avx2(&ref_in[4 * in_stride], &pred_in[4 * in_stride]);
+      _mm_storeu_si128((__m128i*)&(residual[32]), diff);
+      diff = get_residual_8x1_avx2(&ref_in[5 * in_stride], &pred_in[5 * in_stride]);
+      _mm_storeu_si128((__m128i*)&(residual[40]), diff);
+      diff = get_residual_8x1_avx2(&ref_in[6 * in_stride], &pred_in[6 * in_stride]);
+      _mm_storeu_si128((__m128i*)&(residual[48]), diff);
+      diff = get_residual_8x1_avx2(&ref_in[7 * in_stride], &pred_in[7 * in_stride]);
+      _mm_storeu_si128((__m128i*)&(residual[56]), diff);
+      break;
+    default:
+      for (int y = 0; y < width; ++y) {
+        for (int x = 0; x < width; x+=16) {
+          diff = get_residual_8x1_avx2(&ref_in[x + y * in_stride], &pred_in[x + y * in_stride]);
+          _mm_storeu_si128((__m128i*)&residual[x + y * width], diff);
+          diff = get_residual_8x1_avx2(&ref_in[(x+8) + y * in_stride], &pred_in[(x+8) + y * in_stride]);
+          _mm_storeu_si128((__m128i*)&residual[(x+8) + y * width], diff);
+        }
+      }
+      break;
+  }
+}
+
+static void get_quantized_recon_avx2(int16_t *residual, const kvz_pixel *pred_in, int in_stride, kvz_pixel *rec_out, int out_stride, int width){
+
+  switch (width) {
+    case 4:
+      *(int32_t*)&(rec_out[0 * out_stride]) = get_quantized_recon_4x1_avx2(residual + 0 * width, pred_in + 0 * in_stride);
+      *(int32_t*)&(rec_out[1 * out_stride]) = get_quantized_recon_4x1_avx2(residual + 1 * width, pred_in + 1 * in_stride);
+      *(int32_t*)&(rec_out[2 * out_stride]) = get_quantized_recon_4x1_avx2(residual + 2 * width, pred_in + 2 * in_stride);
+      *(int32_t*)&(rec_out[3 * out_stride]) = get_quantized_recon_4x1_avx2(residual + 3 * width, pred_in + 3 * in_stride);
+      break;
+    case 8:
+      *(int64_t*)&(rec_out[0 * out_stride]) = get_quantized_recon_8x1_avx2(residual + 0 * width, pred_in + 0 * in_stride);
+      *(int64_t*)&(rec_out[1 * out_stride]) = get_quantized_recon_8x1_avx2(residual + 1 * width, pred_in + 1 * in_stride);
+      *(int64_t*)&(rec_out[2 * out_stride]) = get_quantized_recon_8x1_avx2(residual + 2 * width, pred_in + 2 * in_stride);
+      *(int64_t*)&(rec_out[3 * out_stride]) = get_quantized_recon_8x1_avx2(residual + 3 * width, pred_in + 3 * in_stride);
+      *(int64_t*)&(rec_out[4 * out_stride]) = get_quantized_recon_8x1_avx2(residual + 4 * width, pred_in + 4 * in_stride);
+      *(int64_t*)&(rec_out[5 * out_stride]) = get_quantized_recon_8x1_avx2(residual + 5 * width, pred_in + 5 * in_stride);
+      *(int64_t*)&(rec_out[6 * out_stride]) = get_quantized_recon_8x1_avx2(residual + 6 * width, pred_in + 6 * in_stride);
+      *(int64_t*)&(rec_out[7 * out_stride]) = get_quantized_recon_8x1_avx2(residual + 7 * width, pred_in + 7 * in_stride);
+      break;
+    default:
+      for (int y = 0; y < width; ++y) {
+        for (int x = 0; x < width; x += 16) {
+          *(int64_t*)&(rec_out[x + y * out_stride]) = get_quantized_recon_8x1_avx2(residual + x + y * width, pred_in + x + y * in_stride);
+          *(int64_t*)&(rec_out[(x + 8) + y * out_stride]) = get_quantized_recon_8x1_avx2(residual + (x + 8) + y * width, pred_in + (x + 8) + y * in_stride);
+        }
+      }
+      break;
+  }
+}
+
+/**
+* \brief Quantize residual and get both the reconstruction and coeffs.
+*
+* \param width  Transform width.
+* \param color  Color.
+* \param scan_order  Coefficient scan order.
+* \param use_trskip  Whether transform skip is used.
+* \param stride  Stride for ref_in, pred_in rec_out and coeff_out.
+* \param ref_in  Reference pixels.
+* \param pred_in  Predicted pixels.
+* \param rec_out  Reconstructed pixels.
+* \param coeff_out  Coefficients used for reconstruction of rec_out.
+*
+* \returns  Whether coeff_out contains any non-zero coefficients.
+*/
+int kvz_quantize_residual_avx2(encoder_state_t *const state,
+    const cu_info_t *const cur_cu, const int width, const color_t color,
+    const coeff_scan_order_t scan_order, const int use_trskip,
+    const int in_stride, const int out_stride,
+    const kvz_pixel *const ref_in, const kvz_pixel *const pred_in,
+    kvz_pixel *rec_out, coeff_t *coeff_out)
+{
+  // Temporary arrays to pass data to and from kvz_quant and transform functions.
+  int16_t residual[TR_MAX_WIDTH * TR_MAX_WIDTH];
+  coeff_t quant_coeff[TR_MAX_WIDTH * TR_MAX_WIDTH];
+  coeff_t coeff[TR_MAX_WIDTH * TR_MAX_WIDTH];
+
+  int has_coeffs = 0;
+
+  assert(width <= TR_MAX_WIDTH);
+  assert(width >= TR_MIN_WIDTH);
+
+  // Get residual. (ref_in - pred_in -> residual)
+  get_residual_avx2(ref_in, pred_in, residual, width, in_stride);
+
+  // Transform residual. (residual -> coeff)
+  if (use_trskip) {
+    kvz_transformskip(state->encoder_control, residual, coeff, width);
+  }
+  else {
+    kvz_transform2d(state->encoder_control, residual, coeff, width, (color == COLOR_Y ? 0 : 65535));
+  }
+
+  // Quantize coeffs. (coeff -> quant_coeff)
+  if (state->encoder_control->rdoq_enable) {
+    int8_t tr_depth = cur_cu->tr_depth - cur_cu->depth;
+    tr_depth += (cur_cu->part_size == SIZE_NxN ? 1 : 0);
+    kvz_rdoq(state, coeff, quant_coeff, width, width, (color == COLOR_Y ? 0 : 2),
+             scan_order, cur_cu->type, tr_depth);
+  }
+  else {
+    kvz_quant(state, coeff, quant_coeff, width, width, (color == COLOR_Y ? 0 : 2),
+              scan_order, cur_cu->type);
+  }
+
+  // Check if there are any non-zero coefficients.
+  {
+    int i;
+    for (i = 0; i < width * width; i+=8) {
+      __m128i v_quant_coeff = _mm_loadu_si128((__m128i*)&(quant_coeff[i]));
+      has_coeffs = !_mm_testz_si128(_mm_set1_epi8(0xFF), v_quant_coeff);
+      if(has_coeffs) break;
+    }
+  }
+
+  // Copy coefficients to coeff_out.
+  kvz_coefficients_blit(quant_coeff, coeff_out, width, width, width, out_stride);
+
+  // Do the inverse quantization and transformation and the reconstruction to
+  // rec_out.
+  if (has_coeffs) {
+
+    // Get quantized residual. (quant_coeff -> coeff -> residual)
+    kvz_dequant(state, quant_coeff, coeff, width, width, (color == COLOR_Y ? 0 : (color == COLOR_U ? 2 : 3)), cur_cu->type);
+    if (use_trskip) {
+      kvz_itransformskip(state->encoder_control, residual, coeff, width);
+    }
+    else {
+      kvz_itransform2d(state->encoder_control, residual, coeff, width, (color == COLOR_Y ? 0 : 65535));
+    }
kvazaar-0.7.1.tar.gz/src/strategies/generic/quant-generic.c -> kvazaar-0.7.2.tar.gz/src/strategies/generic/quant-generic.c
Changed
@@ -28,6 +28,7 @@
 #include "strategyselector.h"
 #include "encoder.h"
 #include "transform.h"
+#include "rdo.h"
 
 #define QUANT_SHIFT 14
 /**
@@ -162,12 +163,174 @@
   }
 }
 
+/**
+* \brief Quantize residual and get both the reconstruction and coeffs.
+*
+* \param width  Transform width.
+* \param color  Color.
+* \param scan_order  Coefficient scan order.
+* \param use_trskip  Whether transform skip is used.
+* \param stride  Stride for ref_in, pred_in rec_out and coeff_out.
+* \param ref_in  Reference pixels.
+* \param pred_in  Predicted pixels.
+* \param rec_out  Reconstructed pixels.
+* \param coeff_out  Coefficients used for reconstruction of rec_out.
+*
+* \returns  Whether coeff_out contains any non-zero coefficients.
+*/
+int kvz_quantize_residual_generic(encoder_state_t *const state,
+    const cu_info_t *const cur_cu, const int width, const color_t color,
+    const coeff_scan_order_t scan_order, const int use_trskip,
+    const int in_stride, const int out_stride,
+    const kvz_pixel *const ref_in, const kvz_pixel *const pred_in,
+    kvz_pixel *rec_out, coeff_t *coeff_out)
+{
+  // Temporary arrays to pass data to and from kvz_quant and transform functions.
+  int16_t residual[TR_MAX_WIDTH * TR_MAX_WIDTH];
+  coeff_t quant_coeff[TR_MAX_WIDTH * TR_MAX_WIDTH];
+  coeff_t coeff[TR_MAX_WIDTH * TR_MAX_WIDTH];
+
+  int has_coeffs = 0;
+
+  assert(width <= TR_MAX_WIDTH);
+  assert(width >= TR_MIN_WIDTH);
+
+  // Get residual. (ref_in - pred_in -> residual)
+  {
+    int y, x;
+    for (y = 0; y < width; ++y) {
+      for (x = 0; x < width; ++x) {
+        residual[x + y * width] = (int16_t)(ref_in[x + y * in_stride] - pred_in[x + y * in_stride]);
+      }
+    }
+  }
+
+  // Transform residual. (residual -> coeff)
+  if (use_trskip) {
+    kvz_transformskip(state->encoder_control, residual, coeff, width);
+  }
+  else {
+    kvz_transform2d(state->encoder_control, residual, coeff, width, (color == COLOR_Y ? 0 : 65535));
+  }
+
+  // Quantize coeffs. (coeff -> quant_coeff)
+  if (state->encoder_control->rdoq_enable) {
+    int8_t tr_depth = cur_cu->tr_depth - cur_cu->depth;
+    tr_depth += (cur_cu->part_size == SIZE_NxN ? 1 : 0);
+    kvz_rdoq(state, coeff, quant_coeff, width, width, (color == COLOR_Y ? 0 : 2),
+             scan_order, cur_cu->type, tr_depth);
+  }
+  else {
+    kvz_quant(state, coeff, quant_coeff, width, width, (color == COLOR_Y ? 0 : 2),
+              scan_order, cur_cu->type);
+  }
+
+  // Check if there are any non-zero coefficients.
+  {
+    int i;
+    for (i = 0; i < width * width; ++i) {
+      if (quant_coeff[i] != 0) {
+        has_coeffs = 1;
+        break;
+      }
+    }
+  }
+
+  // Copy coefficients to coeff_out.
+  kvz_coefficients_blit(quant_coeff, coeff_out, width, width, width, out_stride);
+
+  // Do the inverse quantization and transformation and the reconstruction to
+  // rec_out.
+  if (has_coeffs) {
+    int y, x;
+
+    // Get quantized residual. (quant_coeff -> coeff -> residual)
+    kvz_dequant(state, quant_coeff, coeff, width, width, (color == COLOR_Y ? 0 : (color == COLOR_U ? 2 : 3)), cur_cu->type);
+    if (use_trskip) {
+      kvz_itransformskip(state->encoder_control, residual, coeff, width);
+    }
+    else {
+      kvz_itransform2d(state->encoder_control, residual, coeff, width, (color == COLOR_Y ? 0 : 65535));
+    }
+
+    // Get quantized reconstruction. (residual + pred_in -> rec_out)
+    for (y = 0; y < width; ++y) {
+      for (x = 0; x < width; ++x) {
+        int16_t val = residual[x + y * width] + pred_in[x + y * in_stride];
+        rec_out[x + y * out_stride] = (kvz_pixel)CLIP(0, PIXEL_MAX, val);
+      }
+    }
+  }
+  else if (rec_out != pred_in) {
+    // With no coeffs and rec_out == pred_int we skip copying the coefficients
+    // because the reconstruction is just the prediction.
+    int y, x;
+
+    for (y = 0; y < width; ++y) {
+      for (x = 0; x < width; ++x) {
+        rec_out[x + y * out_stride] = pred_in[x + y * in_stride];
+      }
+    }
+  }
+
+  return has_coeffs;
+}
+
+/**
+ * \brief inverse quantize transformed and quantized coefficents
+ *
+ */
+void kvz_dequant_generic(const encoder_state_t * const state, coeff_t *q_coef, coeff_t *coef, int32_t width, int32_t height,int8_t type, int8_t block_type)
+{
+  const encoder_control_t * const encoder = state->encoder_control;
+  int32_t shift,add,coeff_q;
+  int32_t n;
+  int32_t transform_shift = 15 - encoder->bitdepth - (kvz_g_convert_to_bit[ width ] + 2);
+
+  int32_t qp_scaled = kvz_get_scaled_qp(type, state->global->QP, (encoder->bitdepth-8)*6);
+
+  shift = 20 - QUANT_SHIFT - transform_shift;
+
+  if (encoder->scaling_list.enable)
+  {
+    uint32_t log2_tr_size = kvz_g_convert_to_bit[ width ] + 2;
+    int32_t scalinglist_type = (block_type == CU_INTRA ? 0 : 3) + (int8_t)("\0\3\1\2"[type]);
+
+    const int32_t *dequant_coef = encoder->scaling_list.de_quant_coeff[log2_tr_size-2][scalinglist_type][qp_scaled%6];
+    shift += 4;
+
+    if (shift >qp_scaled / 6) {
+      add = 1 << (shift - qp_scaled/6 - 1);
+
+      for (n = 0; n < width * height; n++) {
+        coeff_q = ((q_coef[n] * dequant_coef[n]) + add ) >> (shift - qp_scaled/6);
+        coef[n] = (coeff_t)CLIP(-32768,32767,coeff_q);
+      }
+    } else {
+      for (n = 0; n < width * height; n++) {
+        // Clip to avoid possible overflow in following shift left operation
+        coeff_q = CLIP(-32768, 32767, q_coef[n] * dequant_coef[n]);
+        coef[n] = (coeff_t)CLIP(-32768, 32767, coeff_q << (qp_scaled/6 - shift));
+      }
+    }
+  } else {
+    int32_t scale = kvz_g_inv_quant_scales[qp_scaled%6] << (qp_scaled/6);
+    add = 1 << (shift-1);
+
+    for (n = 0; n < width*height; n++) {
+      coeff_q = (q_coef[n] * scale + add) >> shift;
+      coef[n] = (coeff_t)CLIP(-32768, 32767, coeff_q);
+    }
+  }
+}
 
 int kvz_strategy_register_quant_generic(void* opaque, uint8_t bitdepth)
 {
   bool success = true;
 
   success &= kvz_strategyselector_register(opaque, "quant", "generic", 0, &kvz_quant_generic);
+  success &= kvz_strategyselector_register(opaque, "quantize_residual", "generic", 0, &kvz_quantize_residual_generic);
+  success &= kvz_strategyselector_register(opaque, "dequant", "generic", 0, &kvz_dequant_generic);
 
   return success;
 }
View file
kvazaar-0.7.1.tar.gz/src/strategies/generic/quant-generic.h -> kvazaar-0.7.2.tar.gz/src/strategies/generic/quant-generic.h
Changed
@@ -28,4 +28,11 @@
 void kvz_quant_generic(const encoder_state_t * const state, coeff_t *coef, coeff_t *q_coef, int32_t width,
   int32_t height, int8_t type, int8_t scan_idx, int8_t block_type);
 
+int kvz_quantize_residual_generic(encoder_state_t *const state,
+  const cu_info_t *const cur_cu, const int width, const color_t color,
+  const coeff_scan_order_t scan_order, const int use_trskip,
+  const int in_stride, const int out_stride,
+  const kvz_pixel *const ref_in, const kvz_pixel *const pred_in,
+  kvz_pixel *rec_out, coeff_t *coeff_out);
+
 #endif //STRATEGIES_QUANT_GENERIC_H_
View file
kvazaar-0.7.1.tar.gz/src/strategies/strategies-quant.c -> kvazaar-0.7.2.tar.gz/src/strategies/strategies-quant.c
Changed
@@ -23,6 +23,8 @@
 
 // Define function pointers.
 quant_func *kvz_quant;
+quant_residual_func *kvz_quantize_residual;
+dequant_func *kvz_dequant;
 
 // Headers for platform optimizations.
 #include "generic/quant-generic.h"
View file
kvazaar-0.7.1.tar.gz/src/strategies/strategies-quant.h -> kvazaar-0.7.2.tar.gz/src/strategies/strategies-quant.h
Changed
@@ -25,15 +25,27 @@
 
 // Declare function pointers.
 typedef unsigned (quant_func)(const encoder_state_t * const state, coeff_t *coef, coeff_t *q_coef, int32_t width,
   int32_t height, int8_t type, int8_t scan_idx, int8_t block_type);
+typedef unsigned (quant_residual_func)(encoder_state_t *const state,
+  const cu_info_t *const cur_cu, const int width, const color_t color,
+  const coeff_scan_order_t scan_order, const int use_trskip,
+  const int in_stride, const int out_stride,
+  const kvz_pixel *const ref_in, const kvz_pixel *const pred_in,
+  kvz_pixel *rec_out, coeff_t *coeff_out);
+typedef unsigned (dequant_func)(const encoder_state_t * const state, coeff_t *q_coef, coeff_t *coef, int32_t width,
+  int32_t height, int8_t type, int8_t block_type);
 
 // Declare function pointers.
 extern quant_func * kvz_quant;
+extern quant_residual_func * kvz_quantize_residual;
+extern dequant_func *kvz_dequant;
 
 int kvz_strategy_register_quant(void* opaque, uint8_t bitdepth);
 
 #define STRATEGIES_QUANT_EXPORTS \
   {"quant", (void**) &kvz_quant}, \
+  {"quantize_residual", (void**) &kvz_quantize_residual}, \
+  {"dequant", (void**) &kvz_dequant}, \
View file
kvazaar-0.7.1.tar.gz/src/transform.c -> kvazaar-0.7.2.tar.gz/src/transform.c
Changed
@@ -130,165 +130,6 @@
   }
 }
 
 /**
- * \brief inverse quantize transformed and quantized coefficents
- *
- */
-void kvz_dequant(const encoder_state_t * const state, coeff_t *q_coef, coeff_t *coef, int32_t width, int32_t height,int8_t type, int8_t block_type)
-{
-  const encoder_control_t * const encoder = state->encoder_control;
-  int32_t shift,add,coeff_q;
-  int32_t n;
-  int32_t transform_shift = 15 - encoder->bitdepth - (kvz_g_convert_to_bit[ width ] + 2);
-
-  int32_t qp_scaled = kvz_get_scaled_qp(type, state->global->QP, (encoder->bitdepth-8)*6);
-
-  shift = 20 - QUANT_SHIFT - transform_shift;
-
-  if (encoder->scaling_list.enable)
-  {
-    uint32_t log2_tr_size = kvz_g_convert_to_bit[ width ] + 2;
-    int32_t scalinglist_type = (block_type == CU_INTRA ? 0 : 3) + (int8_t)("\0\3\1\2"[type]);
-
-    const int32_t *dequant_coef = encoder->scaling_list.de_quant_coeff[log2_tr_size-2][scalinglist_type][qp_scaled%6];
-    shift += 4;
-
-    if (shift >qp_scaled / 6) {
-      add = 1 << (shift - qp_scaled/6 - 1);
-
-      for (n = 0; n < width * height; n++) {
-        coeff_q = ((q_coef[n] * dequant_coef[n]) + add ) >> (shift - qp_scaled/6);
-        coef[n] = (coeff_t)CLIP(-32768,32767,coeff_q);
-      }
-    } else {
-      for (n = 0; n < width * height; n++) {
-        // Clip to avoid possible overflow in following shift left operation
-        coeff_q = CLIP(-32768, 32767, q_coef[n] * dequant_coef[n]);
-        coef[n] = (coeff_t)CLIP(-32768, 32767, coeff_q << (qp_scaled/6 - shift));
-      }
-    }
-  } else {
-    int32_t scale = kvz_g_inv_quant_scales[qp_scaled%6] << (qp_scaled/6);
-    add = 1 << (shift-1);
-
-    for (n = 0; n < width*height; n++) {
-      coeff_q = (q_coef[n] * scale + add) >> shift;
-      coef[n] = (coeff_t)CLIP(-32768, 32767, coeff_q);
-    }
-  }
-}
-
-
-/**
- * \brief Quantize residual and get both the reconstruction and coeffs.
- *
- * \param width  Transform width.
- * \param color  Color.
- * \param scan_order  Coefficient scan order.
- * \param use_trskip  Whether transform skip is used.
- * \param stride  Stride for ref_in, pred_in rec_out and coeff_out.
- * \param ref_in  Reference pixels.
- * \param pred_in  Predicted pixels.
- * \param rec_out  Reconstructed pixels.
- * \param coeff_out  Coefficients used for reconstruction of rec_out.
- *
- * \returns  Whether coeff_out contains any non-zero coefficients.
- */
-int kvz_quantize_residual(encoder_state_t *const state,
-    const cu_info_t *const cur_cu, const int width, const color_t color,
-    const coeff_scan_order_t scan_order, const int use_trskip,
-    const int in_stride, const int out_stride,
-    const kvz_pixel *const ref_in, const kvz_pixel *const pred_in,
-    kvz_pixel *rec_out, coeff_t *coeff_out)
-{
-  // Temporary arrays to pass data to and from kvz_quant and transform functions.
-  int16_t residual[TR_MAX_WIDTH * TR_MAX_WIDTH];
-  coeff_t quant_coeff[TR_MAX_WIDTH * TR_MAX_WIDTH];
-  coeff_t coeff[TR_MAX_WIDTH * TR_MAX_WIDTH];
-
-  int has_coeffs = 0;
-
-  assert(width <= TR_MAX_WIDTH);
-  assert(width >= TR_MIN_WIDTH);
-
-  // Get residual. (ref_in - pred_in -> residual)
-  {
-    int y, x;
-    for (y = 0; y < width; ++y) {
-      for (x = 0; x < width; ++x) {
-        residual[x + y * width] = (int16_t)(ref_in[x + y * in_stride] - pred_in[x + y * in_stride]);
-      }
-    }
-  }
-
-  // Transform residual. (residual -> coeff)
-  if (use_trskip) {
-    kvz_transformskip(state->encoder_control, residual, coeff, width);
-  } else {
-    kvz_transform2d(state->encoder_control, residual, coeff, width, (color == COLOR_Y ? 0 : 65535));
-  }
-
-  // Quantize coeffs. (coeff -> quant_coeff)
-  if (state->encoder_control->rdoq_enable) {
-    int8_t tr_depth = cur_cu->tr_depth - cur_cu->depth;
-    tr_depth += (cur_cu->part_size == SIZE_NxN ? 1 : 0);
-    kvz_rdoq(state, coeff, quant_coeff, width, width, (color == COLOR_Y ? 0 : 2),
-             scan_order, cur_cu->type, tr_depth);
-  } else {
-    kvz_quant(state, coeff, quant_coeff, width, width, (color == COLOR_Y ? 0 : 2),
-              scan_order, cur_cu->type);
-  }
-
-  // Check if there are any non-zero coefficients.
-  {
-    int i;
-    for (i = 0; i < width * width; ++i) {
-      if (quant_coeff[i] != 0) {
-        has_coeffs = 1;
-        break;
-      }
-    }
-  }
-
-  // Copy coefficients to coeff_out.
-  kvz_coefficients_blit(quant_coeff, coeff_out, width, width, width, out_stride);
-
-  // Do the inverse quantization and transformation and the reconstruction to
-  // rec_out.
-  if (has_coeffs) {
-    int y, x;
-
-    // Get quantized residual. (quant_coeff -> coeff -> residual)
-    kvz_dequant(state, quant_coeff, coeff, width, width, (color == COLOR_Y ? 0 : (color == COLOR_U ? 2 : 3)), cur_cu->type);
-    if (use_trskip) {
-      kvz_itransformskip(state->encoder_control, residual, coeff, width);
-    } else {
-      kvz_itransform2d(state->encoder_control, residual, coeff, width, (color == COLOR_Y ? 0 : 65535));
-    }
-
-    // Get quantized reconstruction. (residual + pred_in -> rec_out)
-    for (y = 0; y < width; ++y) {
-      for (x = 0; x < width; ++x) {
-        int16_t val = residual[x + y * width] + pred_in[x + y * in_stride];
-        rec_out[x + y * out_stride] = (kvz_pixel)CLIP(0, PIXEL_MAX, val);
-      }
-    }
-  } else if (rec_out != pred_in) {
-    // With no coeffs and rec_out == pred_int we skip copying the coefficients
-    // because the reconstruction is just the prediction.
-    int y, x;
-
-    for (y = 0; y < width; ++y) {
-      for (x = 0; x < width; ++x) {
-        rec_out[x + y * out_stride] = pred_in[x + y * in_stride];
-      }
-    }
-  }
-
-  return has_coeffs;
-}
-
-
-/**
  * \brief Like kvz_quantize_residual except that this uses trskip if that is better.
  *
  * Using this function saves one step of quantization and inverse quantization
View file
kvazaar-0.7.1.tar.gz/src/transform.h -> kvazaar-0.7.2.tar.gz/src/transform.h
Changed
@@ -33,10 +33,6 @@
 extern const uint8_t kvz_g_chroma_scale[58];
 extern const int16_t kvz_g_inv_quant_scales[6];
 
-
-
-void kvz_dequant(const encoder_state_t *state, coeff_t *q_coef, coeff_t *coef, int32_t width, int32_t height, int8_t type, int8_t block_type);
-
 void kvz_transformskip(const encoder_control_t *encoder, int16_t *block,int16_t *coeff, int8_t block_size);
 void kvz_itransformskip(const encoder_control_t *encoder, int16_t *block,int16_t *coeff, int8_t block_size);