Overview

Request 5114 (accepted)

- Update to version 3.4
  New features:
  * Edge-aware quadtree partitioning to terminate CU depth
    recursion based on edge information. --rskip level 2 enables
    the feature and --rskip-edge-threshold denotes the minimum
    expected edge-density percentage within the CU, below which
    the recursion is skipped. Experimental feature.
  * Application-level feature --abr-ladder for automating
    efficient ABR ladder generation. Shows ~65% savings in the
    over-all turn-around time required for the generation of a
    typical Apple HLS ladder in Intel(R) Xeon(R) Platinum 8280
    CPU @ 2.70GHz over a sequential ABR-ladder generation
    approach that leverages save-load architecture.
  Enhancements to existing features:
  * Improved efficiency in 2-pass rate-control algorithm. The
    savings in the bitrate is ~1.72% with visual improvement in
    quality in the initial 1-2 secs.
  Encoder enhancements:
  * Faster ARM64 encodes enabled by ASM contributions from
    Huawei. The speed-up over no-asm version for 1080p encodes @
    medium preset is ~15% in a 16 core H/W.
  * Strict VBV conformance in zone encoding.
  Bug fixes:
  * Multi-pass encode failures with --frame-dup.
  * Corrupted bitstreams with --hist-scenecut when input depth
    and internal bit-depth differ.
  * Incorrect analysis propagation in multi-level save-load
    architecture.
  * Failure in detecting NUMA packages installed in non-standard
    directories.

Submit package Staging / x265 to package Essentials / x265

x265.changes Changed
@@ -1,4 +1,40 @@
 -------------------------------------------------------------------
+Mon Jun  1 17:51:22 UTC 2020 - Luigi Baldoni <aloisio@gmx.com>
+
+- Update to version 3.4
+  New features:
+  * Edge-aware quadtree partitioning to terminate CU depth
+    recursion based on edge information. --rskip level 2 enables
+    the feature and --rskip-edge-threshold denotes the minimum
+    expected edge-density percentage within the CU, below which
+    the recursion is skipped. Experimental feature.
+  * Application-level feature --abr-ladder for automating
+    efficient ABR ladder generation. Shows ~65% savings in the
+    over-all turn-around time required for the generation of a
+    typical Apple HLS ladder in Intel(R) Xeon(R) Platinum 8280
+    CPU @ 2.70GHz over a sequential ABR-ladder generation
+    approach that leverages save-load architecture.
+  Enhancements to existing features:
+  * Improved efficiency in 2-pass rate-control algorithm. The
+    savings in the bitrate is ~1.72% with visual improvement in
+    quality in the initial 1-2 secs.
+  Encoder enhancements:
+  * Faster ARM64 encodes enabled by ASM contributions from
+    Huawei. The speed-up over no-asm version for 1080p encodes @
+    medium preset is ~15% in a 16 core H/W.
+  * Strict VBV conformance in zone encoding.
+  Bug fixes:
+  * Multi-pass encode failures with --frame-dup.
+  * Corrupted bitstreams with --hist-scenecut when input depth
+    and internal bit-depth differ.
+  * Incorrect analysis propagation in multi-level save-load
+    architecture.
+  * Failure in detecting NUMA packages installed in non-standard
+    directories.
+
+- Refreshed arm.patch
+
+-------------------------------------------------------------------
 Sat Mar 28 14:28:56 UTC 2020 - Luigi Baldoni <aloisio@gmx.com>
 
 - Update to version 3.3
x265.spec Changed
@@ -17,11 +17,11 @@
 #
 
 
-%define sover  188
+%define sover  192
 %define libname lib%{name}
 %define libsoname %{libname}-%{sover}
 Name:           x265
-Version:        3.3
+Version:        3.4
 Release:        0
 Summary:        A free h265/HEVC encoder - encoder binary
 License:        GPL-2.0-or-later
@@ -67,7 +67,6 @@
 %patch0 -p1
 %patch1 -p1
 %patch2 -p1
-
 sed -i -e "s/0.0/%{sover}.0/g" source/cmake/version.cmake
 
 
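Illustrative note (not part of the submission): a minimal Python sketch of what the spec macros and the sed call in x265.spec amount to. The sample input line is hypothetical, since the contents of source/cmake/version.cmake are not shown in this request. The dot in the sed pattern is unescaped, so it matches any character rather than only a literal dot; the sketch reproduces that behaviour as written.

```python
import re


def libsoname(name: str, sover: int) -> str:
    # %{libsoname} expands to %{libname}-%{sover}, and %{libname} to lib%{name}.
    return f"lib{name}-{sover}"


def apply_sover(text: str, sover: int) -> str:
    # Same effect as: sed -e "s/0.0/<sover>.0/g"
    # The '.' is a regex wildcard here, exactly as in the sed expression.
    return re.sub(r"0.0", f"{sover}.0", text)


# Hypothetical input line, for illustration only.
sample = 'set(X265_LATEST_TAG "0.0")'
print(libsoname("x265", 192))
print(apply_sover(sample, 192))
```

The library name produced this way is the one the baselibs.conf change in this request expects for the new soversion.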
arm.patch Changed
@@ -1,8 +1,8 @@
-Index: x265_2.2/source/CMakeLists.txt
+Index: x265_3.4/source/CMakeLists.txt
 ===================================================================
---- x265_2.2.orig/source/CMakeLists.txt
-+++ x265_2.2/source/CMakeLists.txt
-@@ -65,15 +65,22 @@ elseif(POWERMATCH GREATER "-1")
+--- x265_3.4.orig/source/CMakeLists.txt
++++ x265_3.4/source/CMakeLists.txt
+@@ -64,26 +64,26 @@ elseif(POWERMATCH GREATER "-1")
          add_definitions(-DPPC64=1)
          message(STATUS "Detected POWER PPC64 target processor")
      endif()
@@ -12,41 +12,62 @@
 -    else()
 -        set(CROSS_COMPILE_ARM 0)
 -    endif()
--    message(STATUS "Detected ARM target processor")
 -    set(ARM 1)
--    add_definitions(-DX265_ARCH_ARM=1 -DHAVE_ARMV6=1)
+-    if("${CMAKE_SIZEOF_VOID_P}" MATCHES 8)
+-        message(STATUS "Detected ARM64 target processor")
+-        set(ARM64 1)
+-        add_definitions(-DX265_ARCH_ARM=1 -DX265_ARCH_ARM64=1 -DHAVE_ARMV6=0)
+-    else()
+-        message(STATUS "Detected ARM target processor")
+-        add_definitions(-DX265_ARCH_ARM=1 -DX265_ARCH_ARM64=0 -DHAVE_ARMV6=1)
+-    endif()
 +elseif(${SYSPROC} MATCHES "armv5.*")
 +    message(STATUS "Detected ARMV5 system processor")
 +    set(ARMV5 1)
-+    add_definitions(-DX265_ARCH_ARM=1 -DHAVE_ARMV6=0 -DHAVE_NEON=0)
++    add_definitions(-DX265_ARCH_ARM=1 -DX265_ARCH_ARM64=0 -DHAVE_ARMV6=0 -DHAVE_NEON=0)
 +elseif(${SYSPROC} STREQUAL "armv6l")
 +    message(STATUS "Detected ARMV6 system processor")
 +    set(ARMV6 1)
-+    add_definitions(-DX265_ARCH_ARM=1 -DHAVE_ARMV6=1 -DHAVE_NEON=0)
++    add_definitions(-DX265_ARCH_ARM=1 -DX265_ARCH_ARM64=0 -DHAVE_ARMV6=1 -DHAVE_NEON=0)
 +elseif(${SYSPROC} STREQUAL "armv7l")
 +    message(STATUS "Detected ARMV7 system processor")
 +    set(ARMV7 1)
-+    add_definitions(-DX265_ARCH_ARM=1 -DHAVE_ARMV6=1 -DHAVE_NEON=0)
++    add_definitions(-DX265_ARCH_ARM=1 -DX265_ARCH_ARM64=0 -DHAVE_ARMV6=1 -DHAVE_NEON=0)
 +elseif(${SYSPROC} STREQUAL "aarch64")
 +    message(STATUS "Detected AArch64 system processor")
 +    set(ARMV7 1)
-+    add_definitions(-DX265_ARCH_ARM=1 -DHAVE_ARMV6=1 -DHAVE_NEON=0)
++    add_definitions(-DX265_ARCH_ARM=1 -DX265_ARCH_ARM64=1 -DHAVE_ARMV6=0 -DHAVE_NEON=0)
  else()
      message(STATUS "CMAKE_SYSTEM_PROCESSOR value `${CMAKE_SYSTEM_PROCESSOR}` is unknown")
      message(STATUS "Please add this value near ${CMAKE_CURRENT_LIST_FILE}:${CMAKE_CURRENT_LIST_LINE}")
-@@ -208,18 +215,9 @@ if(GCC)
+ endif()
+-
+ if(UNIX)
+     list(APPEND PLATFORM_LIBS pthread)
+     find_library(LIBRT rt)
+@@ -238,28 +238,9 @@ if(GCC)
              endif()
          endif()
      endif()
 -    if(ARM AND CROSS_COMPILE_ARM)
--        set(ARM_ARGS -march=armv6 -mfloat-abi=soft -mfpu=vfp -marm -fPIC)
+-        if(ARM64)
+-            set(ARM_ARGS -fPIC)
+-        else()
+-            set(ARM_ARGS -march=armv6 -mfloat-abi=soft -mfpu=vfp -marm -fPIC)
+-        endif()
+-        message(STATUS "cross compile arm")
 -    elseif(ARM)
--        find_package(Neon)
--        if(CPU_HAS_NEON)
--            set(ARM_ARGS -mcpu=native -mfloat-abi=hard -mfpu=neon -marm -fPIC)
+-        if(ARM64)
+-            set(ARM_ARGS -fPIC)
 -            add_definitions(-DHAVE_NEON)
 -        else()
--            set(ARM_ARGS -mcpu=native -mfloat-abi=hard -mfpu=vfp -marm)
+-            find_package(Neon)
+-            if(CPU_HAS_NEON)
+-                set(ARM_ARGS -mcpu=native -mfloat-abi=hard -mfpu=neon -marm -fPIC)
+-                add_definitions(-DHAVE_NEON)
+-            else()
+-                set(ARM_ARGS -mcpu=native -mfloat-abi=hard -mfpu=vfp -marm)
+-            endif()
 -        endif()
 +    if(ARMV7)
 +        add_definitions(-fPIC)
@@ -55,11 +76,11 @@
      if(FPROFILE_GENERATE)
          if(INTEL_CXX)
              add_definitions(-prof-gen -prof-dir="${CMAKE_CURRENT_BINARY_DIR}")
-Index: x265_2.2/source/common/cpu.cpp
+Index: x265_3.4/source/common/cpu.cpp
 ===================================================================
---- x265_2.2.orig/source/common/cpu.cpp
-+++ x265_2.2/source/common/cpu.cpp
-@@ -37,7 +37,7 @@
+--- x265_3.4.orig/source/common/cpu.cpp
++++ x265_3.4/source/common/cpu.cpp
+@@ -39,7 +39,7 @@
  #include <machine/cpu.h>
  #endif
  
@@ -68,7 +89,7 @@
  #include <signal.h>
  #include <setjmp.h>
  static sigjmp_buf jmpbuf;
-@@ -344,7 +344,6 @@ uint32_t cpu_detect(void)
+@@ -350,7 +350,6 @@ uint32_t cpu_detect(bool benableavx512)
      }
  
      canjump = 1;
@@ -76,7 +97,7 @@
      canjump = 0;
      signal(SIGILL, oldsig);
  #endif // if !HAVE_NEON
-@@ -360,7 +359,7 @@ uint32_t cpu_detect(void)
+@@ -366,7 +365,7 @@ uint32_t cpu_detect(bool benableavx512)
      // which may result in incorrect detection and the counters stuck enabled.
      // right now Apple does not seem to support performance counters for this test
  #ifndef __MACH__
@@ -84,4 +105,4 @@
 +    //flags |= PFX(cpu_fast_neon_mrc_test)() ? X265_CPU_FAST_NEON_MRC : 0;
  #endif
      // TODO: write dual issue test? currently it's A8 (dual issue) vs. A9 (fast mrc)
- #endif // if HAVE_ARMV6
+ #elif X265_ARCH_ARM64
baselibs.conf Changed
@@ -1,1 +1,1 @@
-libx265-179
+libx265-192
x265_3.3.tar.gz/.hg_archival.txt -> x265_3.4.tar.gz/.hg_archival.txt Changed
@@ -1,5 +1,4 @@
 repo: 09fe40627f03a0f9c3e6ac78b22ac93da23f9fdf
-node: f94b0d32737d40b2b9a9d74df57fee45e6be5cb0
-branch: Release_3.3
-latesttag: 3.3
-latesttagdistance: 1
+node: 2a65b720985096bcb1664f7cb05c3d04aeb576f5
+branch: Release_3.4
+tag: 3.4
x265_3.3.tar.gz/.hgtags -> x265_3.4.tar.gz/.hgtags Changed
@@ -40,3 +40,4 @@
 5ee3593ebd82b4d8957909bbc1b68b99b59ba773 3.3_RC1
 96a10df63c0b778b480330bdf3be8da7db8a5fb1 3.3_RC2
 057215961bc4b51b6260a584ff3d506e6d65cfd6 3.3
+ee92f36782800f145970131e01c79955a3ed5c10 3.4_RC1
x265_3.4.tar.gz/build/aarch64-linux/crosscompile.cmake Added
@@ -0,0 +1,15 @@
+# CMake toolchain file for cross compiling x265 for aarch64
+# This feature is only supported as experimental. Use with caution.
+# Please report bugs on bitbucket
+# Run cmake with: cmake -DCMAKE_TOOLCHAIN_FILE=crosscompile.cmake -G "Unix Makefiles" ../../source && ccmake ../../source
+
+set(CROSS_COMPILE_ARM 1)
+set(CMAKE_SYSTEM_NAME Linux)
+set(CMAKE_SYSTEM_PROCESSOR aarch64)
+
+# specify the cross compiler
+set(CMAKE_C_COMPILER aarch64-linux-gnu-gcc)
+set(CMAKE_CXX_COMPILER aarch64-linux-gnu-g++)
+
+# specify the target environment
+SET(CMAKE_FIND_ROOT_PATH  /usr/aarch64-linux-gnu)
x265_3.4.tar.gz/build/aarch64-linux/make-Makefiles.bash Added
@@ -0,0 +1,4 @@
+#!/bin/bash
+# Run this from within a bash shell
+
+cmake -DCMAKE_TOOLCHAIN_FILE="crosscompile.cmake" -G "Unix Makefiles" ../../source && ccmake ../../source
x265_3.3.tar.gz/doc/reST/cli.rst -> x265_3.4.tar.gz/doc/reST/cli.rst Changed
@@ -107,6 +107,9 @@
    
    **BufferFillFinal** Buffer bits available after removing the frame out of CPB.
    
+   **UnclippedBufferFillFinal** Unclipped buffer bits available after removing the frame 
+   out of CPB only used for csv logging purpose.
+   
    **Latency** Latency in terms of number of frames between when the frame 
    was given in and when the frame is given out.
    
@@ -842,15 +845,31 @@
    Measure 2Nx2N merge candidates first; if no residual is found, 
    additional modes at that depth are not analysed. Default disabled
 
-.. option:: --rskip, --no-rskip
+.. option:: --rskip <0|1|2>
+
+   This option determines early exit from CU depth recursion in modes 1 and 2. When a skip CU is
+   found, additional heuristics (depending on the RD level and rskip mode) are used to decide whether
+   to terminate recursion. The following table summarizes the behavior.
+   
+   +----------+------------+----------------------------------------------------------------+
+   | RD Level | Rskip Mode |   Skip Recursion Heuristic                                     |
+   +==========+============+================================================================+
+   |   0 - 4  |      1     |   Neighbour costs and CU homogenity.                           |
+   +----------+------------+----------------------------------------------------------------+
+   |   5 - 6  |      1     |   Comparison with inter2Nx2N.                                  |
+   +----------+------------+----------------------------------------------------------------+
+   |   0 - 6  |      2     |   CU edge density.                                             |
+   +----------+------------+----------------------------------------------------------------+
+
+   Provides minimal quality degradation at good performance gains for non-zero modes.
+   :option:`--rskip mode 0` means disabled. Default: 1, disabled when :option:`--tune grain` is used.
+   This is a integer value representing the edge-density percentage within the CU. Internally normalized to a number between 0.0 to 1.0 in x265. 
+   Recommended low thresholds for slow encodes and high for fast encodes.
 
-   This option determines early exit from CU depth recursion. When a skip CU is
-   found, additional heuristics (depending on rd-level) are used to decide whether
-   to terminate recursion. In rdlevels 5 and 6, comparison with inter2Nx2N is used, 
-   while at rdlevels 4 and neighbour costs are used to skip recursion.
-   Provides minimal quality degradation at good performance gains when enabled. 
+.. option:: --rskip-edge-threshold <0..100>
 
-   Default: enabled, disabled for :option:`--tune grain`
+   Denotes the minimum expected edge-density percentage within the CU, below which the recursion is skipped.
+   Default: 5, requires :option:`--rskip mode 2` to be enabled.
 
 .. option:: --splitrd-skip, --no-splitrd-skip
 
@@ -2501,6 +2520,28 @@
    --recon-y4m-exec "ffplay -i pipe:0 -autoexit"
 
    **CLI ONLY**
+   
+ABR-ladder Options
+==================
+
+.. option:: --abr-ladder <filename>
+
+   File containing the encoder configurations to generate ABR ladder.
+   The format of each line is:
+
+   **<encID:reuse-level:refID> <CLI>**
+   
+   where, encID indicates the unique name given to the encode, refID indicates
+   the name of the encode from which analysis info has to be re-used ( set to 'nil'
+   if analysis reuse isn't preferred ), and reuse-level indicates the level ( :option:`--analysis-load-reuse-level`)
+   at which analysis info has to be reused.
+   
+   A sample config file is available in `the downloads page <https://bitbucket.org/multicoreware/x265/downloads/Sample_ABR_ladder_config>`_
+   
+   Default: Disabled ( Conventional single encode generation ). Experimental feature.
+
+   **CLI ONLY**
+
 
 SVT-HEVC Encoder Options
 ========================
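Illustrative note (not part of the submission): the --abr-ladder line format documented in cli.rst, encID:reuse-level:refID followed by the CLI string, can be sketched as a small parser. This is a minimal sketch, not x265's own parsing code. The delimiter handling is an assumption: the documentation writes the header in angle brackets, and the sketch strips either angle or square brackets because the literal syntax cannot be confirmed from this diff alone. The function and field names are hypothetical.

```python
from typing import NamedTuple, Optional


class LadderEntry(NamedTuple):
    enc_id: str            # unique name of this encode
    reuse_level: int       # level at which analysis info is reused
    ref_id: Optional[str]  # encode to reuse analysis from; None for 'nil'
    cli: str               # remaining command-line options for this encode


def parse_ladder_line(line: str) -> LadderEntry:
    # Split the header token from the CLI string at the first space.
    head, _, cli = line.strip().partition(" ")
    # Assumption: the header may be wrapped in <> or []; strip either.
    head = head.strip("<>[]")
    enc_id, level, ref_id = head.split(":")
    return LadderEntry(enc_id, int(level),
                       None if ref_id == "nil" else ref_id,
                       cli.strip())


print(parse_ladder_line("[360p:0:nil] --preset fast --bitrate 700"))
print(parse_ladder_line("<720p:10:360p> --bitrate 2400"))
```

In this sketch a ref_id of None marks an encode that runs without analysis reuse, which matches the documented meaning of 'nil'.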
x265_3.3.tar.gz/doc/reST/releasenotes.rst -> x265_3.4.tar.gz/doc/reST/releasenotes.rst Changed
@@ -2,6 +2,32 @@
 Release Notes
 *************
 
+Version 3.4
+===========
+
+Release date - 29th May, 2020.
+
+New features
+------------
+1. **Edge-aware quadtree partitioning** to terminate CU depth recursion based on edge information. :option:`--rskip` level 2 enables the feature and  :option:`--rskip-edge-threshold` denotes the minimum expected edge-density percentage within the CU, below which the recursion is skipped. Experimental feature.
+2. Application-level feature :option:`--abr-ladder` for automating efficient ABR ladder generation. Shows ~65% savings in the over-all turn-around time required for the generation of a typical Apple HLS ladder in Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz over a sequential ABR-ladder generation approach that leverages save-load architecture.
+
+Enhancements to existing features
+---------------------------------
+1. Improved efficiency in 2-pass rate-control algorithm. The savings in the bitrate is ~1.72% with visual improvement in quality in the initial 1-2 secs.
+
+Encoder enhancements
+--------------------
+1. Faster ARM64 encodes enabled by ASM contributions from Huawei. The speed-up over no-asm version for 1080p encodes @ medium preset is ~15% in a 16 core H/W.
+2. Strict VBV conformance in zone encoding.
+
+Bug fixes
+---------
+1. Multi-pass encode failures with :option:`--frame-dup`.
+2. Corrupted bitstreams with :option:`--hist-scenecut` when input depth and internal bit-depth differ.
+3. Incorrect analysis propagation in multi-level save-load architecture.
+4. Failure in detecting NUMA packages installed in non-standard directories.
+
 Version 3.3
 ===========
 
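Illustrative note (not part of the submission): the release notes describe --rskip-edge-threshold as a minimum expected edge-density percentage below which CU recursion is skipped, and cli.rst adds that the percentage is normalized to a value between 0.0 and 1.0. A minimal sketch of that comparison follows. How x265 actually measures edge density is not shown in this request, so the edge-pixel count used here is an assumption, and the function name is hypothetical.

```python
def skip_recursion(edge_pixels: int, cu_pixels: int,
                   threshold_percent: int = 5) -> bool:
    """Return True when the CU's edge density is below the threshold.

    Assumption: edge density is the fraction of CU pixels marked as
    edges. The default of 5 is the documented option default.
    """
    if not 0 <= threshold_percent <= 100:
        raise ValueError("threshold must be within 0..100")
    threshold = threshold_percent / 100.0  # normalize to 0.0..1.0
    density = edge_pixels / cu_pixels
    return density < threshold


# A 32x32 CU has 1024 pixels.
print(skip_recursion(10, 1024))   # sparse edges
print(skip_recursion(200, 1024))  # dense edges
```

Under this reading, a higher threshold skips recursion for more CUs, which is consistent with the documented advice to use high thresholds for fast encodes and low thresholds for slow ones.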
x265_3.3.tar.gz/source/CMakeLists.txt -> x265_3.4.tar.gz/source/CMakeLists.txt Changed
@@ -29,7 +29,7 @@
 option(STATIC_LINK_CRT "Statically link C runtime for release builds" OFF)
 mark_as_advanced(FPROFILE_USE FPROFILE_GENERATE NATIVE_BUILD)
 # X265_BUILD must be incremented each time the public API is changed
-set(X265_BUILD 188)
+set(X265_BUILD 192)
 configure_file("${PROJECT_SOURCE_DIR}/x265.def.in"
                "${PROJECT_BINARY_DIR}/x265.def")
 configure_file("${PROJECT_SOURCE_DIR}/x265_config.h.in"
@@ -40,7 +40,7 @@
 # System architecture detection
 string(TOLOWER "${CMAKE_SYSTEM_PROCESSOR}" SYSPROC)
 set(X86_ALIASES x86 i386 i686 x86_64 amd64)
-set(ARM_ALIASES armv6l armv7l)
+set(ARM_ALIASES armv6l armv7l aarch64)
 list(FIND X86_ALIASES "${SYSPROC}" X86MATCH)
 list(FIND ARM_ALIASES "${SYSPROC}" ARMMATCH)
 set(POWER_ALIASES ppc64 ppc64le)
@@ -70,9 +70,15 @@
     else()
         set(CROSS_COMPILE_ARM 0)
     endif()
-    message(STATUS "Detected ARM target processor")
     set(ARM 1)
-    add_definitions(-DX265_ARCH_ARM=1 -DHAVE_ARMV6=1)
+    if("${CMAKE_SIZEOF_VOID_P}" MATCHES 8)
+        message(STATUS "Detected ARM64 target processor")
+        set(ARM64 1)
+        add_definitions(-DX265_ARCH_ARM=1 -DX265_ARCH_ARM64=1 -DHAVE_ARMV6=0)
+    else()
+        message(STATUS "Detected ARM target processor")
+        add_definitions(-DX265_ARCH_ARM=1 -DX265_ARCH_ARM64=0 -DHAVE_ARMV6=1)
+    endif()
 else()
     message(STATUS "CMAKE_SYSTEM_PROCESSOR value `${CMAKE_SYSTEM_PROCESSOR}` is unknown")
     message(STATUS "Please add this value near ${CMAKE_CURRENT_LIST_FILE}:${CMAKE_CURRENT_LIST_LINE}")
@@ -95,6 +101,8 @@
         if(NUMA_FOUND)
             link_directories(${NUMA_LIBRARY_DIR})
             list(APPEND CMAKE_REQUIRED_LIBRARIES numa)
+            list(APPEND CMAKE_REQUIRED_INCLUDES ${NUMA_INCLUDE_DIR})
+            list(APPEND CMAKE_REQUIRED_LINK_OPTIONS "-L${NUMA_LIBRARY_DIR}")
             check_symbol_exists(numa_node_of_cpu numa.h NUMA_V2)
             if(NUMA_V2)
                 add_definitions(-DHAVE_LIBNUMA)
@@ -231,14 +239,24 @@
         endif()
     endif()
     if(ARM AND CROSS_COMPILE_ARM)
-        set(ARM_ARGS -march=armv6 -mfloat-abi=soft -mfpu=vfp -marm -fPIC)
+        if(ARM64)
+            set(ARM_ARGS -fPIC)
+        else()
+            set(ARM_ARGS -march=armv6 -mfloat-abi=soft -mfpu=vfp -marm -fPIC)
+        endif()
+        message(STATUS "cross compile arm")
     elseif(ARM)
-        find_package(Neon)
-        if(CPU_HAS_NEON)
-            set(ARM_ARGS -mcpu=native -mfloat-abi=hard -mfpu=neon -marm -fPIC)
+        if(ARM64)
+            set(ARM_ARGS -fPIC)
             add_definitions(-DHAVE_NEON)
         else()
-            set(ARM_ARGS -mcpu=native -mfloat-abi=hard -mfpu=vfp -marm)
+            find_package(Neon)
+            if(CPU_HAS_NEON)
+                set(ARM_ARGS -mcpu=native -mfloat-abi=hard -mfpu=neon -marm -fPIC)
+                add_definitions(-DHAVE_NEON)
+            else()
+                set(ARM_ARGS -mcpu=native -mfloat-abi=hard -mfpu=vfp -marm)
+            endif()
         endif()
     endif()
     add_definitions(${ARM_ARGS})
@@ -518,7 +536,11 @@
     # compile ARM arch asm files here
         enable_language(ASM)
         foreach(ASM ${ARM_ASMS})
-            set(ASM_SRC ${CMAKE_CURRENT_SOURCE_DIR}/common/arm/${ASM})
+            if(ARM64)
+                set(ASM_SRC ${CMAKE_CURRENT_SOURCE_DIR}/common/aarch64/${ASM})
+            else()
+                set(ASM_SRC ${CMAKE_CURRENT_SOURCE_DIR}/common/arm/${ASM})
+            endif()
             list(APPEND ASM_SRCS ${ASM_SRC})
             list(APPEND ASM_OBJS ${ASM}.${SUFFIX})
             add_custom_command(
@@ -725,16 +747,16 @@
         # Xcode seems unable to link the CLI with libs, so link as one targget
         if(ENABLE_HDR10_PLUS)
         add_executable(cli ../COPYING ${InputFiles} ${OutputFiles} ${GETOPT}
-                        x265.cpp x265.h x265cli.h
+                        x265.cpp x265.h x265cli.cpp x265cli.h abrEncApp.cpp abrEncApp.h
                         $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> $<TARGET_OBJECTS:dynamicHDR10> ${ASM_OBJS})
         else()
             add_executable(cli ../COPYING ${InputFiles} ${OutputFiles} ${GETOPT}
-                        x265.cpp x265.h x265cli.h
+                        x265.cpp x265.h x265cli.cpp x265cli.h abrEncApp.cpp abrEncApp.h
                         $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> ${ASM_OBJS})
         endif()
     else()
         add_executable(cli ../COPYING ${InputFiles} ${OutputFiles} ${GETOPT} ${X265_RC_FILE}
-                       ${ExportDefs} x265.cpp x265.h x265cli.h)
+                       ${ExportDefs} x265.cpp x265.h x265cli.cpp x265cli.h abrEncApp.cpp abrEncApp.h)
         if(WIN32 OR NOT ENABLE_SHARED OR INTEL_CXX)
             # The CLI cannot link to the shared library on Windows, it
             # requires internal APIs not exported from the DLL
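Illustrative note (not part of the submission): the upstream CMakeLists.txt change selects ARM definitions and compiler flags from three inputs: pointer size, cross-compilation, and NEON availability. The decision table can be sketched in Python as below. This is a reading aid under stated assumptions, not build logic: the function names are hypothetical, and the sketch uses an equality test where CMake uses the regex test MATCHES 8.

```python
from typing import List, Tuple


def arm_definitions(sizeof_void_p: int) -> Tuple[bool, List[str]]:
    """Return (is_arm64, preprocessor definitions) for an ARM target."""
    if sizeof_void_p == 8:  # CMake: "${CMAKE_SIZEOF_VOID_P}" MATCHES 8
        return True, ["-DX265_ARCH_ARM=1", "-DX265_ARCH_ARM64=1", "-DHAVE_ARMV6=0"]
    return False, ["-DX265_ARCH_ARM=1", "-DX265_ARCH_ARM64=0", "-DHAVE_ARMV6=1"]


def arm_args(arm64: bool, cross_compile: bool,
             cpu_has_neon: bool) -> Tuple[List[str], bool]:
    """Return (ARM_ARGS, whether -DHAVE_NEON is added)."""
    if cross_compile:
        if arm64:
            return ["-fPIC"], False
        return ["-march=armv6", "-mfloat-abi=soft", "-mfpu=vfp", "-marm", "-fPIC"], False
    if arm64:
        # Native ARM64 builds define HAVE_NEON without probing for it.
        return ["-fPIC"], True
    if cpu_has_neon:
        return ["-mcpu=native", "-mfloat-abi=hard", "-mfpu=neon", "-marm", "-fPIC"], True
    return ["-mcpu=native", "-mfloat-abi=hard", "-mfpu=vfp", "-marm"], False


is64, defs = arm_definitions(8)
print(is64, defs)
print(arm_args(is64, cross_compile=False, cpu_has_neon=False))
```

One detail the table makes visible: in the diff as shown, a cross-compiled ARM64 build gets -fPIC but no HAVE_NEON definition, while a native ARM64 build gets both.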
x265_3.4.tar.gz/source/abrEncApp.cpp Added
1110
 
1
@@ -0,0 +1,1108 @@
2
+/*****************************************************************************
3
+* Copyright (C) 2013-2020 MulticoreWare, Inc
4
+*
5
+* Authors: Pooja Venkatesan <pooja@multicorewareinc.com>
6
+*          Aruna Matheswaran <aruna@multicorewareinc.com>
7
+*
8
+* This program is free software; you can redistribute it and/or modify
9
+* it under the terms of the GNU General Public License as published by
10
+* the Free Software Foundation; either version 2 of the License, or
11
+* (at your option) any later version.
12
+*
13
+* This program is distributed in the hope that it will be useful,
14
+* but WITHOUT ANY WARRANTY; without even the implied warranty of
15
+* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
16
+* GNU General Public License for more details.
17
+*
18
+* You should have received a copy of the GNU General Public License
19
+* along with this program; if not, write to the Free Software
20
+* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02111, USA.
21
+*
22
+* This program is also available under a commercial proprietary license.
23
+* For more information, contact us at license @ x265.com.
24
+*****************************************************************************/
25
+
26
+#include "abrEncApp.h"
27
+#include "mv.h"
28
+#include "slice.h"
29
+#include "param.h"
30
+
31
+#include <signal.h>
32
+#include <errno.h>
33
+
34
+#include <queue>
35
+
36
+using namespace X265_NS;
37
+
38
+/* Ctrl-C handler */
39
+static volatile sig_atomic_t b_ctrl_c /* = 0 */;
40
+static void sigint_handler(int)
41
+{
42
+    b_ctrl_c = 1;
43
+}
44
+
45
+namespace X265_NS {
46
+    // private namespace
47
+#define X265_INPUT_QUEUE_SIZE 250
48
+
49
+    AbrEncoder::AbrEncoder(CLIOptions cliopt[], uint8_t numEncodes, int &ret)
50
+    {
51
+        m_numEncodes = numEncodes;
52
+        m_numActiveEncodes.set(numEncodes);
53
+        m_queueSize = (numEncodes > 1) ? X265_INPUT_QUEUE_SIZE : 1;
54
+        m_passEnc = X265_MALLOC(PassEncoder*, m_numEncodes);
55
+
56
+        for (uint8_t i = 0; i < m_numEncodes; i++)
57
+        {
58
+            m_passEnc[i] = new PassEncoder(i, cliopt[i], this);
59
+            if (!m_passEnc[i])
60
+            {
61
+                x265_log(NULL, X265_LOG_ERROR, "Unable to allocate memory for passEncoder\n");
62
+                ret = 4;
63
+            }
64
+            m_passEnc[i]->init(ret);
65
+        }
66
+
67
+        if (!allocBuffers())
68
+        {
69
+            x265_log(NULL, X265_LOG_ERROR, "Unable to allocate memory for buffers\n");
70
+            ret = 4;
71
+        }
72
+
73
+        /* start passEncoder worker threads */
74
+        for (uint8_t pass = 0; pass < m_numEncodes; pass++)
75
+            m_passEnc[pass]->startThreads();
76
+    }
77
+
78
+    bool AbrEncoder::allocBuffers()
79
+    {
80
+        m_inputPicBuffer = X265_MALLOC(x265_picture**, m_numEncodes);
81
+        m_analysisBuffer = X265_MALLOC(x265_analysis_data*, m_numEncodes);
82
+
83
+        m_picWriteCnt = new ThreadSafeInteger[m_numEncodes];
84
+        m_picReadCnt = new ThreadSafeInteger[m_numEncodes];
85
+        m_analysisWriteCnt = new ThreadSafeInteger[m_numEncodes];
86
+        m_analysisReadCnt = new ThreadSafeInteger[m_numEncodes];
87
+
88
+        m_picIdxReadCnt = X265_MALLOC(ThreadSafeInteger*, m_numEncodes);
89
+        m_analysisWrite = X265_MALLOC(ThreadSafeInteger*, m_numEncodes);
90
+        m_analysisRead = X265_MALLOC(ThreadSafeInteger*, m_numEncodes);
91
+        m_readFlag = X265_MALLOC(int*, m_numEncodes);
92
+
93
+        for (uint8_t pass = 0; pass < m_numEncodes; pass++)
94
+        {
95
+            m_inputPicBuffer[pass] = X265_MALLOC(x265_picture*, m_queueSize);
96
+            for (uint32_t idx = 0; idx < m_queueSize; idx++)
97
+            {
98
+                m_inputPicBuffer[pass][idx] = x265_picture_alloc();
99
+                x265_picture_init(m_passEnc[pass]->m_param, m_inputPicBuffer[pass][idx]);
100
+            }
101
+
102
+            m_analysisBuffer[pass] = X265_MALLOC(x265_analysis_data, m_queueSize);
103
+            m_picIdxReadCnt[pass] = new ThreadSafeInteger[m_queueSize];
104
+            m_analysisWrite[pass] = new ThreadSafeInteger[m_queueSize];
105
+            m_analysisRead[pass] = new ThreadSafeInteger[m_queueSize];
106
+            m_readFlag[pass] = X265_MALLOC(int, m_queueSize);
107
+        }
108
+        return true;
109
+    }
110
+
111
+    void AbrEncoder::destroy()
112
+    {
113
+        x265_cleanup(); /* Free library singletons */
114
+        for (uint8_t pass = 0; pass < m_numEncodes; pass++)
115
+        {
116
+            for (uint32_t index = 0; index < m_queueSize; index++)
117
+            {
118
+                X265_FREE(m_inputPicBuffer[pass][index]->planes[0]);
119
+                x265_picture_free(m_inputPicBuffer[pass][index]);
120
+            }
121
+
122
+            X265_FREE(m_inputPicBuffer[pass]);
123
+            X265_FREE(m_analysisBuffer[pass]);
124
+            X265_FREE(m_readFlag[pass]);
125
+            delete[] m_picIdxReadCnt[pass];
126
+            delete[] m_analysisWrite[pass];
127
+            delete[] m_analysisRead[pass];
128
+            m_passEnc[pass]->destroy();
129
+            delete m_passEnc[pass];
130
+        }
131
+        X265_FREE(m_inputPicBuffer);
132
+        X265_FREE(m_analysisBuffer);
133
+        X265_FREE(m_readFlag);
134
+
135
+        delete[] m_picWriteCnt;
136
+        delete[] m_picReadCnt;
137
+        delete[] m_analysisWriteCnt;
138
+        delete[] m_analysisReadCnt;
139
+
140
+        X265_FREE(m_picIdxReadCnt);
141
+        X265_FREE(m_analysisWrite);
142
+        X265_FREE(m_analysisRead);
143
+
144
+        X265_FREE(m_passEnc);
145
+    }
146
+
147
+    PassEncoder::PassEncoder(uint32_t id, CLIOptions cliopt, AbrEncoder *parent)
148
+    {
149
+        m_id = id;
150
+        m_cliopt = cliopt;
151
+        m_parent = parent;
152
+        if(!(m_cliopt.enableScaler && m_id))
153
+            m_input = m_cliopt.input;
154
+        m_param = cliopt.param;
155
+        m_inputOver = false;
156
+        m_lastIdx = -1;
157
+        m_encoder = NULL;
158
+        m_scaler = NULL;
159
+        m_reader = NULL;
160
+        m_ret = 0;
161
+    }
162
+
163
+    int PassEncoder::init(int &result)
164
+    {
165
+        if (m_parent->m_numEncodes > 1)
166
+            setReuseLevel();
167
+                
168
+        if (!(m_cliopt.enableScaler && m_id))
169
+            m_reader = new Reader(m_id, this);
170
+        else
171
+        {
172
+            VideoDesc *src = NULL, *dst = NULL;
173
+            dst = new VideoDesc(m_param->sourceWidth, m_param->sourceHeight, m_param->internalCsp, m_param->internalBitDepth);
174
+            int dstW = m_parent->m_passEnc[m_id - 1]->m_param->sourceWidth;
175
+            int dstH = m_parent->m_passEnc[m_id - 1]->m_param->sourceHeight;
176
+            src = new VideoDesc(dstW, dstH, m_param->internalCsp, m_param->internalBitDepth);
177
+            if (src != NULL && dst != NULL)
178
+            {
179
+                m_scaler = new Scaler(0, 1, m_id, src, dst, this);
180
+                if (!m_scaler)
181
+                {
182
+                    x265_log(m_param, X265_LOG_ERROR, "\n MALLOC failure in Scaler");
183
+                    result = 4;
184
+                }
185
+            }
186
+        }
187
+
188
+        /* note: we could try to acquire a different libx265 API here based on
189
+        * the profile found during option parsing, but it must be done before
190
+        * opening an encoder */
191
+
192
+        if (m_param)
193
+            m_encoder = m_cliopt.api->encoder_open(m_param);
194
+        if (!m_encoder)
195
+        {
196
+            x265_log(NULL, X265_LOG_ERROR, "x265_encoder_open() failed for Enc, \n");
197
+            m_ret = 2;
198
+            return -1;
199
+        }
200
+
201
+        /* get the encoder parameters post-initialization */
202
+        m_cliopt.api->encoder_parameters(m_encoder, m_param);
203
+
204
+        return 1;
205
+    }
206
+
+    void PassEncoder::setReuseLevel()
+    {
+        uint32_t r, padh = 0, padw = 0;
+
+        m_param->confWinBottomOffset = m_param->confWinRightOffset = 0;
+
+        m_param->analysisLoadReuseLevel = m_cliopt.loadLevel;
+        m_param->analysisSaveReuseLevel = m_cliopt.saveLevel;
+        m_param->analysisSave = m_cliopt.saveLevel ? "save.dat" : NULL;
+        m_param->analysisLoad = m_cliopt.loadLevel ? "load.dat" : NULL;
+        m_param->bUseAnalysisFile = 0;
+
+        if (m_cliopt.loadLevel)
+        {
+            x265_param *refParam = m_parent->m_passEnc[m_cliopt.refId]->m_param;
+
+            if (m_param->sourceHeight == (refParam->sourceHeight - refParam->confWinBottomOffset) &&
+                m_param->sourceWidth == (refParam->sourceWidth - refParam->confWinRightOffset))
+            {
+                m_parent->m_passEnc[m_id]->m_param->confWinBottomOffset = refParam->confWinBottomOffset;
+                m_parent->m_passEnc[m_id]->m_param->confWinRightOffset = refParam->confWinRightOffset;
+            }
+            else
+            {
+                int srcH = refParam->sourceHeight - refParam->confWinBottomOffset;
+                int srcW = refParam->sourceWidth - refParam->confWinRightOffset;
+
+                /* divide in floating point so the rounding to tenths below is meaningful */
+                double scaleFactorH = double(m_param->sourceHeight) / srcH;
+                double scaleFactorW = double(m_param->sourceWidth) / srcW;
+
+                int absScaleFactorH = (int)(10 * scaleFactorH + 0.5);
+                int absScaleFactorW = (int)(10 * scaleFactorW + 0.5);
+
+                if (absScaleFactorH == 20 && absScaleFactorW == 20)
+                {
+                    m_param->scaleFactor = 2;
+
+                    m_parent->m_passEnc[m_id]->m_param->confWinBottomOffset = refParam->confWinBottomOffset * 2;
+                    m_parent->m_passEnc[m_id]->m_param->confWinRightOffset = refParam->confWinRightOffset * 2;
+
+                }
+            }
+        }
+
+        int h = m_param->sourceHeight + m_param->confWinBottomOffset;
+        int w = m_param->sourceWidth + m_param->confWinRightOffset;
+        if (h & (m_param->minCUSize - 1))
+        {
+            r = h & (m_param->minCUSize - 1);
+            padh = m_param->minCUSize - r;
+            m_param->confWinBottomOffset += padh;
+
+        }
+
+        if (w & (m_param->minCUSize - 1))
+        {
+            r = w & (m_param->minCUSize - 1);
+            padw = m_param->minCUSize - r;
+            m_param->confWinRightOffset += padw;
+        }
+    }
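The tail of setReuseLevel() rounds each padded dimension up to a multiple of minCUSize with a bit mask, and the load path compares the target-to-reference ratio, rounded to tenths, against 20. The standalone sketch below isolates that arithmetic. The helper names `padToCuMultiple` and `scaleInTenths` are hypothetical and not part of x265, and the mask trick assumes the minimum CU size is a power of two.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of x265): number of rows or columns to add
// so that `size` becomes a multiple of `minCuSize`.
// Assumes minCuSize is a power of two, which the bit-mask trick requires.
inline uint32_t padToCuMultiple(uint32_t size, uint32_t minCuSize)
{
    assert(minCuSize != 0 && (minCuSize & (minCuSize - 1)) == 0);
    uint32_t remainder = size & (minCuSize - 1); // equals size % minCuSize
    return remainder ? minCuSize - remainder : 0;
}

// Hypothetical helper: ratio target/reference rounded to the nearest tenth
// and returned as an integer (20 means a 2.0x scale). The division is done
// in floating point so that the +0.5 rounding has an effect.
inline int scaleInTenths(int target, int reference)
{
    assert(reference > 0);
    return (int)(10.0 * target / reference + 0.5);
}
```

For example, a 1082-row picture with an 8-pixel minimum CU needs 6 rows of padding, and a 2160-row target against a 1080-row reference gives a scale of 20 tenths.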
+
+    void PassEncoder::startThreads()
+    {
+        /* Start slave worker threads */
+        m_threadActive = true;
+        start();
+        /* Start reader threads */
+        if (m_reader != NULL)
+        {
+            m_reader->m_threadActive = true;
+            m_reader->start();
+        }
+        /* Start scaling worker threads */
+        if (m_scaler != NULL)
+        {
+            m_scaler->m_threadActive = true;
+            m_scaler->start();
+        }
+    }
+
+    void PassEncoder::copyInfo(x265_analysis_data * src)
+    {
+
+        uint32_t written = m_parent->m_analysisWriteCnt[m_id].get();
+
+        int index = written % m_parent->m_queueSize;
+        // If all streams have read the analysis data, reuse that position in the queue
+
+        int read = m_parent->m_analysisRead[m_id][index].get();
+        int write = m_parent->m_analysisWrite[m_id][index].get();
+
+        int overwrite = written / m_parent->m_queueSize;
+        bool emptyIdxFound = 0;
+        while (!emptyIdxFound && overwrite)
+        {
+            for (uint32_t i = 0; i < m_parent->m_queueSize; i++)
+            {
+                read = m_parent->m_analysisRead[m_id][i].get();
+                write = m_parent->m_analysisWrite[m_id][i].get();
+                write *= m_cliopt.numRefs;
+
+                if (read == write)
+                {
+                    index = i;
+                    emptyIdxFound = 1;
+                }
+            }
+        }
+
+        x265_analysis_data *m_analysisInfo = &m_parent->m_analysisBuffer[m_id][index];
+
+        memcpy(m_analysisInfo, src, sizeof(x265_analysis_data));
+        x265_alloc_analysis_data(m_param, m_analysisInfo);
+
+        bool isVbv = m_param->rc.vbvBufferSize && m_param->rc.vbvMaxBitrate;
+        if (m_param->bDisableLookahead && isVbv)
+        {
+            memcpy(m_analysisInfo->lookahead.intraSatdForVbv, src->lookahead.intraSatdForVbv, src->numCuInHeight * sizeof(uint32_t));
+            memcpy(m_analysisInfo->lookahead.satdForVbv, src->lookahead.satdForVbv, src->numCuInHeight * sizeof(uint32_t));
+            memcpy(m_analysisInfo->lookahead.intraVbvCost, src->lookahead.intraVbvCost, src->numCUsInFrame * sizeof(uint32_t));
+            memcpy(m_analysisInfo->lookahead.vbvCost, src->lookahead.vbvCost, src->numCUsInFrame * sizeof(uint32_t));
+        }
+
+        if (src->sliceType == X265_TYPE_IDR || src->sliceType == X265_TYPE_I)
+        {
+            if (m_param->analysisSaveReuseLevel < 2)
+                goto ret;
+            x265_analysis_intra_data *intraDst, *intraSrc;
+            intraDst = (x265_analysis_intra_data*)m_analysisInfo->intraData;
+            intraSrc = (x265_analysis_intra_data*)src->intraData;
+            memcpy(intraDst->depth, intraSrc->depth, sizeof(uint8_t) * src->depthBytes);
+            memcpy(intraDst->modes, intraSrc->modes, sizeof(uint8_t) * src->numCUsInFrame * src->numPartitions);
+            memcpy(intraDst->partSizes, intraSrc->partSizes, sizeof(char) * src->depthBytes);
+            memcpy(intraDst->chromaModes, intraSrc->chromaModes, sizeof(uint8_t) * src->depthBytes);
+            if (m_param->rc.cuTree)
+                memcpy(intraDst->cuQPOff, intraSrc->cuQPOff, sizeof(int8_t) * src->depthBytes);
+        }
+        else
+        {
+            bool bIntraInInter = (src->sliceType == X265_TYPE_P || m_param->bIntraInBFrames);
+            int numDir = src->sliceType == X265_TYPE_P ? 1 : 2;
+            memcpy(m_analysisInfo->wt, src->wt, sizeof(WeightParam) * 3 * numDir);
+            if (m_param->analysisSaveReuseLevel < 2)
+                goto ret;
+            x265_analysis_inter_data *interDst, *interSrc;
+            interDst = (x265_analysis_inter_data*)m_analysisInfo->interData;
+            interSrc = (x265_analysis_inter_data*)src->interData;
+            memcpy(interDst->depth, interSrc->depth, sizeof(uint8_t) * src->depthBytes);
+            memcpy(interDst->modes, interSrc->modes, sizeof(uint8_t) * src->depthBytes);
+            if (m_param->rc.cuTree)
+                memcpy(interDst->cuQPOff, interSrc->cuQPOff, sizeof(int8_t) * src->depthBytes);
+            if (m_param->analysisSaveReuseLevel > 4)
+            {
+                memcpy(interDst->partSize, interSrc->partSize, sizeof(uint8_t) * src->depthBytes);
+                memcpy(interDst->mergeFlag, interSrc->mergeFlag, sizeof(uint8_t) * src->depthBytes);
+                if (m_param->analysisSaveReuseLevel == 10)
+                {
+                    memcpy(interDst->interDir, interSrc->interDir, sizeof(uint8_t) * src->depthBytes);
+                    for (int dir = 0; dir < numDir; dir++)
+                    {
+                        memcpy(interDst->mvpIdx[dir], interSrc->mvpIdx[dir], sizeof(uint8_t) * src->depthBytes);
+                        memcpy(interDst->refIdx[dir], interSrc->refIdx[dir], sizeof(int8_t) * src->depthBytes);
+                        memcpy(interDst->mv[dir], interSrc->mv[dir], sizeof(MV) * src->depthBytes);
+                    }
+                    if (bIntraInInter)
+                    {
+                        x265_analysis_intra_data *intraDst = (x265_analysis_intra_data*)m_analysisInfo->intraData;
+                        x265_analysis_intra_data *intraSrc = (x265_analysis_intra_data*)src->intraData;
+                        memcpy(intraDst->modes, intraSrc->modes, sizeof(uint8_t) * src->numPartitions * src->numCUsInFrame);
+                        memcpy(intraDst->chromaModes, intraSrc->chromaModes, sizeof(uint8_t) * src->depthBytes);
+                    }
+                }
+            }
+            if (m_param->analysisSaveReuseLevel != 10)
+                memcpy(interDst->ref, interSrc->ref, sizeof(int32_t) * src->numCUsInFrame * X265_MAX_PRED_MODE_PER_CTU * numDir);
+        }
+
+ret:
+        // Increment the analysis write counters
+        m_parent->m_analysisWriteCnt[m_id].incr();
+        m_parent->m_analysisWrite[m_id][index].incr();
+        return;
+    }
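Once the queue has wrapped, copyInfo() looks for a slot that every consumer has finished with: one whose read count equals its write count multiplied by numRefs. The sketch below models only that scan, single-threaded and with plain integers standing in for the thread-safe counters. `findReusableSlot` is a hypothetical name, and returning -1 when nothing matches is an assumption of this sketch (the original loop keeps scanning instead).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, single-threaded model of the slot scan in copyInfo().
// reads[i]  : how many times slot i has been consumed
// writes[i] : how many times slot i has been filled
// numRefs   : number of consumers that must read each filled slot
// Returns the last slot whose reads equal writes * numRefs, or -1 if none.
inline int findReusableSlot(const std::vector<int>& reads,
                            const std::vector<int>& writes,
                            int numRefs)
{
    assert(reads.size() == writes.size());
    int index = -1;
    for (size_t i = 0; i < reads.size(); i++)
    {
        if (reads[i] == writes[i] * numRefs)
            index = (int)i; // no early exit, so a later match wins
    }
    return index;
}
```

Like the loop it models, the sketch does not stop at the first match, so when several slots are free the highest index is chosen.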
+
+
+    bool PassEncoder::readPicture(x265_picture *dstPic)
+    {
+        /* Check and wait if there are any input frames to read */
+        int ipread = m_parent->m_picReadCnt[m_id].get();
+        int ipwrite = m_parent->m_picWriteCnt[m_id].get();
+
+        bool isAbrLoad = m_cliopt.loadLevel && (m_parent->m_numEncodes > 1);
+        while (!m_inputOver && (ipread == ipwrite))
+        {
+            ipwrite = m_parent->m_picWriteCnt[m_id].waitForChange(ipwrite);
+        }
+
+        if (m_threadActive && ipread < ipwrite)
+        {
+            /* Get the input index to read from inputQueue. If the encode does not need analysis info, it need not wait to fetch the POC from analysisQueue */
+            int readPos = ipread % m_parent->m_queueSize;
+            x265_analysis_data* analysisData = 0;
+
+            if (isAbrLoad)
+            {
+                /* If the stream is the master of each slave pass, fetch analysis data from the previous pass */
+                int analysisQId = m_cliopt.refId;
+                /* Check and wait if there is any analysis data to read */
+                int analysisWrite = m_parent->m_analysisWriteCnt[analysisQId].get();
+                int written = analysisWrite * m_parent->m_passEnc[analysisQId]->m_cliopt.numRefs;
+                int analysisRead = m_parent->m_analysisReadCnt[analysisQId].get();
+
+                while (m_threadActive && written == analysisRead)
+                {
+                    analysisWrite = m_parent->m_analysisWriteCnt[analysisQId].waitForChange(analysisWrite);
+                    written = analysisWrite * m_parent->m_passEnc[analysisQId]->m_cliopt.numRefs;
+                }
+
+                if (analysisRead < written)
+                {
+                    int analysisIdx = 0;
+                    if (!m_param->bDisableLookahead)
+                    {
+                        bool analysisdRead = false;
+                        while ((analysisRead < written) && !analysisdRead)
+                        {
+                            while (analysisWrite < ipread)
+                            {
+                                analysisWrite = m_parent->m_analysisWriteCnt[analysisQId].waitForChange(analysisWrite);
+                                written = analysisWrite * m_parent->m_passEnc[analysisQId]->m_cliopt.numRefs;
+                            }
+                            for (uint32_t i = 0; i < m_parent->m_queueSize; i++)
+                            {
+                                analysisData = &m_parent->m_analysisBuffer[analysisQId][i];
+                                int read = m_parent->m_analysisRead[analysisQId][i].get();
+                                int write = m_parent->m_analysisWrite[analysisQId][i].get() * m_parent->m_passEnc[analysisQId]->m_cliopt.numRefs;
+                                if ((analysisData->poc == (uint32_t)(ipread)) && (read < write))
+                                {
+                                    analysisIdx = i;
+                                    analysisdRead = true;
+                                    break;
+                                }
+                            }
+                        }
+                    }
+                    else
+                    {
+                        analysisIdx = analysisRead % m_parent->m_queueSize;
+                        analysisData = &m_parent->m_analysisBuffer[analysisQId][analysisIdx];
+                        readPos = analysisData->poc % m_parent->m_queueSize;
+                        while ((ipwrite < readPos) || ((ipwrite - 1) < (int)analysisData->poc))
+                        {
+                            ipwrite = m_parent->m_picWriteCnt[m_id].waitForChange(ipwrite);
+                        }
+                    }
+
+                    m_lastIdx = analysisIdx;
+                }
+                else
+                    return false;
+            }
+
+
+            x265_picture *srcPic = (x265_picture*)(m_parent->m_inputPicBuffer[m_id][readPos]);
+
+            x265_picture *pic = (x265_picture*)(dstPic);
+            pic->colorSpace = srcPic->colorSpace;
+            pic->bitDepth = srcPic->bitDepth;
+            pic->framesize = srcPic->framesize;
+            pic->height = srcPic->height;
+            pic->pts = srcPic->pts;
+            pic->dts = srcPic->dts;
+            pic->reorderedPts = srcPic->reorderedPts;
+            pic->width = srcPic->width;
+            pic->analysisData = srcPic->analysisData;
+            pic->userSEI = srcPic->userSEI;
+            pic->stride[0] = srcPic->stride[0];
+            pic->stride[1] = srcPic->stride[1];
+            pic->stride[2] = srcPic->stride[2];
+            pic->planes[0] = srcPic->planes[0];
+            pic->planes[1] = srcPic->planes[1];
+            pic->planes[2] = srcPic->planes[2];
+            if (isAbrLoad)
+                pic->analysisData = *analysisData;
+            return true;
+        }
+        else
+            return false;
+    }
+
+    void PassEncoder::threadMain()
+    {
+        THREAD_NAME("PassEncoder", m_id);
+
+        while (m_threadActive)
+        {
+
+#if ENABLE_LIBVMAF
+            x265_vmaf_data* vmafdata = m_cliopt.vmafData;
+#endif
+            /* This allows muxers to modify bitstream format */
+            m_cliopt.output->setParam(m_param);
+            const x265_api* api = m_cliopt.api;
+            ReconPlay* reconPlay = NULL;
+            if (m_cliopt.reconPlayCmd)
+                reconPlay = new ReconPlay(m_cliopt.reconPlayCmd, *m_param);
+            char* profileName = m_cliopt.encName ? m_cliopt.encName : (char *)"x265";
+
+            if (m_cliopt.zoneFile)
+            {
+                if (!m_cliopt.parseZoneFile())
+                {
+                    x265_log(NULL, X265_LOG_ERROR, "Unable to parse zonefile in %s\n", profileName);
+                    fclose(m_cliopt.zoneFile);
+                    m_cliopt.zoneFile = NULL;
+                }
+            }
+
+            if (signal(SIGINT, sigint_handler) == SIG_ERR)
+                x265_log(m_param, X265_LOG_ERROR, "Unable to register CTRL+C handler: %s in %s\n",
+                    strerror(errno), profileName);
+
+            x265_picture pic_orig, pic_out;
+            x265_picture *pic_in = &pic_orig;
+            /* Allocate recon picture if analysis save/load is enabled */
+            std::priority_queue<int64_t>* pts_queue = m_cliopt.output->needPTS() ? new std::priority_queue<int64_t>() : NULL;
+            x265_picture *pic_recon = (m_cliopt.recon || m_param->analysisSave || m_param->analysisLoad || pts_queue || reconPlay || m_param->csvLogLevel) ? &pic_out : NULL;
+            uint32_t inFrameCount = 0;
+            uint32_t outFrameCount = 0;
+            x265_nal *p_nal;
+            x265_stats stats;
+            uint32_t nal;
+            int16_t *errorBuf = NULL;
+            bool bDolbyVisionRPU = false;
+            uint8_t *rpuPayload = NULL;
+            int inputPicNum = 1;
+            x265_picture picField1, picField2;
+            x265_analysis_data* analysisInfo = (x265_analysis_data*)(&pic_out.analysisData);
+            bool isAbrSave = m_cliopt.saveLevel && (m_parent->m_numEncodes > 1);
+
+            if (!m_param->bRepeatHeaders && !m_param->bEnableSvtHevc)
+            {
+                if (api->encoder_headers(m_encoder, &p_nal, &nal) < 0)
+                {
+                    x265_log(m_param, X265_LOG_ERROR, "Failure generating stream headers in %s\n", profileName);
+                    m_ret = 3;
+                    goto fail;
+                }
+                else
+                    m_cliopt.totalbytes += m_cliopt.output->writeHeaders(p_nal, nal);
+            }
+
+            if (m_param->bField && m_param->interlaceMode)
+            {
+                api->picture_init(m_param, &picField1);
+                api->picture_init(m_param, &picField2);
+                // restore the original input height
+                m_param->sourceHeight *= 2;
+                api->picture_init(m_param, &pic_orig);
+            }
+            else
+                api->picture_init(m_param, &pic_orig);
+
+            if (m_param->dolbyProfile && m_cliopt.dolbyVisionRpu)
+            {
+                rpuPayload = X265_MALLOC(uint8_t, 1024);
+                pic_in->rpu.payload = rpuPayload;
+                if (pic_in->rpu.payload)
+                    bDolbyVisionRPU = true;
+            }
+
+            if (m_cliopt.bDither)
+            {
+                errorBuf = X265_MALLOC(int16_t, m_param->sourceWidth + 1);
+                if (errorBuf)
+                    memset(errorBuf, 0, (m_param->sourceWidth + 1) * sizeof(int16_t));
+                else
+                    m_cliopt.bDither = false;
+            }
+
+            // main encoder loop
+            while (pic_in && !b_ctrl_c)
+            {
+                pic_orig.poc = (m_param->bField && m_param->interlaceMode) ? inFrameCount * 2 : inFrameCount;
+                if (m_cliopt.qpfile)
+                {
+                    if (!m_cliopt.parseQPFile(pic_orig))
+                    {
+                        x265_log(NULL, X265_LOG_ERROR, "can't parse qpfile for frame %d in %s\n",
+                            pic_in->poc, profileName);
+                        fclose(m_cliopt.qpfile);
+                        m_cliopt.qpfile = NULL;
+                    }
+                }
+
+                if (m_cliopt.framesToBeEncoded && inFrameCount >= m_cliopt.framesToBeEncoded)
+                    pic_in = NULL;
+                else if (readPicture(pic_in))
+                    inFrameCount++;
+                else
+                    pic_in = NULL;
+
+                if (pic_in)
+                {
+                    if (pic_in->bitDepth > m_param->internalBitDepth && m_cliopt.bDither)
+                    {
+                        x265_dither_image(pic_in, m_cliopt.input->getWidth(), m_cliopt.input->getHeight(), errorBuf, m_param->internalBitDepth);
+                        pic_in->bitDepth = m_param->internalBitDepth;
+                    }
+                    /* Overwrite PTS */
+                    pic_in->pts = pic_in->poc;
+
+                    // convert to field
+                    if (m_param->bField && m_param->interlaceMode)
+                    {
+                        int height = pic_in->height >> 1;
+
+                        int static bCreated = 0;
+                        if (bCreated == 0)
+                        {
+                            bCreated = 1;
+                            inputPicNum = 2;
+                            picField1.fieldNum = 1;
+                            picField2.fieldNum = 2;
+
+                            picField1.bitDepth = picField2.bitDepth = pic_in->bitDepth;
+                            picField1.colorSpace = picField2.colorSpace = pic_in->colorSpace;
+                            picField1.height = picField2.height = pic_in->height >> 1;
+                            picField1.framesize = picField2.framesize = pic_in->framesize >> 1;
+
+                            size_t fieldFrameSize = (size_t)pic_in->framesize >> 1;
+                            char* field1Buf = X265_MALLOC(char, fieldFrameSize);
+                            char* field2Buf = X265_MALLOC(char, fieldFrameSize);
+
+                            int stride = picField1.stride[0] = picField2.stride[0] = pic_in->stride[0];
+                            uint64_t framesize = stride * (height >> x265_cli_csps[pic_in->colorSpace].height[0]);
+                            picField1.planes[0] = field1Buf;
+                            picField2.planes[0] = field2Buf;
+                            for (int i = 1; i < x265_cli_csps[pic_in->colorSpace].planes; i++)
+                            {
+                                picField1.planes[i] = field1Buf + framesize;
+                                picField2.planes[i] = field2Buf + framesize;
+
+                                stride = picField1.stride[i] = picField2.stride[i] = pic_in->stride[i];
+                                framesize += (stride * (height >> x265_cli_csps[pic_in->colorSpace].height[i]));
+                            }
+                            assert(framesize == picField1.framesize);
+                        }
+
+                        picField1.pts = picField1.poc = pic_in->poc;
+                        picField2.pts = picField2.poc = pic_in->poc + 1;
+
+                        picField1.userSEI = picField2.userSEI = pic_in->userSEI;
+
+                        //if (pic_in->userData)
+                        //{
+                        //    // Have to handle userData here
+                        //}
+
+                        if (pic_in->framesize)
+                        {
+                            for (int i = 0; i < x265_cli_csps[pic_in->colorSpace].planes; i++)
+                            {
+                                char* srcP1 = (char*)pic_in->planes[i];
+                                char* srcP2 = (char*)pic_in->planes[i] + pic_in->stride[i];
+                                char* p1 = (char*)picField1.planes[i];
+                                char* p2 = (char*)picField2.planes[i];
+
+                                int stride = picField1.stride[i];
+
+                                for (int y = 0; y < (height >> x265_cli_csps[pic_in->colorSpace].height[i]); y++)
+                                {
+                                    memcpy(p1, srcP1, stride);
+                                    memcpy(p2, srcP2, stride);
+                                    srcP1 += 2 * stride;
+                                    srcP2 += 2 * stride;
+                                    p1 += stride;
+                                    p2 += stride;
+                                }
+                            }
+                        }
+                    }
+
+                    if (bDolbyVisionRPU)
+                    {
+                        if (m_param->bField && m_param->interlaceMode)
+                        {
+                            if (m_cliopt.rpuParser(&picField1) > 0)
+                                goto fail;
+                            if (m_cliopt.rpuParser(&picField2) > 0)
+                                goto fail;
+                        }
+                        else
+                        {
+                            if (m_cliopt.rpuParser(pic_in) > 0)
+                                goto fail;
+                        }
+                    }
+                }
+
+                for (int inputNum = 0; inputNum < inputPicNum; inputNum++)
+                {
+                    x265_picture *picInput = NULL;
+                    if (inputPicNum == 2)
+                        picInput = pic_in ? (inputNum ? &picField2 : &picField1) : NULL;
+                    else
+                        picInput = pic_in;
+
+                    int numEncoded = api->encoder_encode(m_encoder, &p_nal, &nal, picInput, pic_recon);
+
+                    int idx = (inFrameCount - 1) % m_parent->m_queueSize;
+                    m_parent->m_picIdxReadCnt[m_id][idx].incr();
+                    m_parent->m_picReadCnt[m_id].incr();
+                    if (m_cliopt.loadLevel && picInput)
+                    {
+                        m_parent->m_analysisReadCnt[m_cliopt.refId].incr();
+                        m_parent->m_analysisRead[m_cliopt.refId][m_lastIdx].incr();
+                    }
+
+                    if (numEncoded < 0)
+                    {
+                        b_ctrl_c = 1;
+                        m_ret = 4;
+                        break;
+                    }
+
+                    if (reconPlay && numEncoded)
+                        reconPlay->writePicture(*pic_recon);
+
+                    outFrameCount += numEncoded;
+
+                    if (isAbrSave && numEncoded)
+                    {
+                        copyInfo(analysisInfo);
+                    }
+
+                    if (numEncoded && pic_recon && m_cliopt.recon)
+                        m_cliopt.recon->writePicture(pic_out);
+                    if (nal)
+                    {
+                        m_cliopt.totalbytes += m_cliopt.output->writeFrame(p_nal, nal, pic_out);
+                        if (pts_queue)
+                        {
+                            pts_queue->push(-pic_out.pts);
+                            if (pts_queue->size() > 2)
+                                pts_queue->pop();
+                        }
+                    }
+                    m_cliopt.printStatus(outFrameCount);
+                }
+            }
+
+            /* Flush the encoder */
+            while (!b_ctrl_c)
+            {
+                int numEncoded = api->encoder_encode(m_encoder, &p_nal, &nal, NULL, pic_recon);
+                if (numEncoded < 0)
+                {
+                    m_ret = 4;
+                    break;
+                }
+
+                if (reconPlay && numEncoded)
+                    reconPlay->writePicture(*pic_recon);
+
+                outFrameCount += numEncoded;
+                if (isAbrSave && numEncoded)
+                {
+                    copyInfo(analysisInfo);
+                }
+
+                if (numEncoded && pic_recon && m_cliopt.recon)
+                    m_cliopt.recon->writePicture(pic_out);
+                if (nal)
+                {
+                    m_cliopt.totalbytes += m_cliopt.output->writeFrame(p_nal, nal, pic_out);
+                    if (pts_queue)
+                    {
+                        pts_queue->push(-pic_out.pts);
+                        if (pts_queue->size() > 2)
+                            pts_queue->pop();
+                    }
+                }
+
+                m_cliopt.printStatus(outFrameCount);
+
+                if (!numEncoded)
+                    break;
+            }
+
+            if (bDolbyVisionRPU)
+            {
+                if (fgetc(m_cliopt.dolbyVisionRpu) != EOF)
+                    x265_log(NULL, X265_LOG_WARNING, "Dolby Vision RPU count is greater than frame count in %s\n",
+                        profileName);
+                x265_log(NULL, X265_LOG_INFO, "VES muxing with Dolby Vision RPU file successful in %s\n",
+                    profileName);
+            }
+
+            /* clear progress report */
+            if (m_cliopt.bProgress)
+                fprintf(stderr, "%*s\r", 80, " ");
+
+        fail:
+
+            delete reconPlay;
+
+            api->encoder_get_stats(m_encoder, &stats, sizeof(stats));
+            if (m_param->csvfn && !b_ctrl_c)
+#if ENABLE_LIBVMAF
+                api->vmaf_encoder_log(m_encoder, m_cliopt.argCount, m_cliopt.argString, m_cliopt.param, vmafdata);
+#else
+                api->encoder_log(m_encoder, m_cliopt.argCnt, m_cliopt.argString);
+#endif
+            api->encoder_close(m_encoder);
+
+            int64_t second_largest_pts = 0;
+            int64_t largest_pts = 0;
+            if (pts_queue && pts_queue->size() >= 2)
+            {
+                second_largest_pts = -pts_queue->top();
+                pts_queue->pop();
+                largest_pts = -pts_queue->top();
+                pts_queue->pop();
+            }
+            /* free the queue even when fewer than two timestamps were collected */
+            delete pts_queue;
+            pts_queue = NULL;
+            m_cliopt.output->closeFile(largest_pts, second_largest_pts);
+
+            if (b_ctrl_c)
+                general_log(m_param, NULL, X265_LOG_INFO, "aborted at input frame %d, output frame %d in %s\n",
+                    m_cliopt.seek + inFrameCount, stats.encodedPictureCount, profileName);
+
+            api->param_free(m_param);
+
+            X265_FREE(errorBuf);
+            X265_FREE(rpuPayload);
+
+            m_threadActive = false;
+            m_parent->m_numActiveEncodes.decr();
+        }
+    }
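PassEncoder::threadMain() tracks the two largest output timestamps by pushing negated PTS values into a std::priority_queue and popping whenever it holds more than two entries. Because the queue is a max-heap, negating the values means each pop discards the smallest timestamp seen so far. The sketch below reproduces the idea in isolation; `twoLargestPts` is a hypothetical helper, not an x265 function.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper: returns {largest, secondLargest} of the timestamps,
// using the same negated max-heap trick as the encoder loop.
inline std::pair<int64_t, int64_t> twoLargestPts(const std::vector<int64_t>& ptsList)
{
    assert(ptsList.size() >= 2);
    std::priority_queue<int64_t> negated; // max-heap of -pts acts as a min-heap of pts
    for (int64_t pts : ptsList)
    {
        negated.push(-pts);
        if (negated.size() > 2)
            negated.pop(); // removes the smallest pts currently held
    }
    int64_t secondLargest = -negated.top(); // top is the smaller of the two survivors
    negated.pop();
    int64_t largest = -negated.top();
    return std::make_pair(largest, secondLargest);
}
```

The heap never grows beyond three entries, so memory use stays constant regardless of how many frames are encoded.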
+
+    void PassEncoder::destroy()
+    {
+        stop();
+        if (m_reader)
+        {
+            m_reader->stop();
+            delete m_reader;
+        }
+        else
+        {
+            m_scaler->stop();
+            m_scaler->destroy();
+            delete m_scaler;
+        }
+    }
+
+    Scaler::Scaler(int threadId, int threadNum, int id, VideoDesc *src, VideoDesc *dst, PassEncoder *parentEnc)
+    {
+        m_parentEnc = parentEnc;
+        m_id = id;
+        m_srcFormat = src;
+        m_dstFormat = dst;
+        m_threadActive = false;
+        m_scaleFrameSize = 0;
+        m_filterManager = NULL;
+        m_threadId = threadId;
+        m_threadTotal = threadNum;
+
+        int csp = dst->m_csp;
+        uint32_t pixelbytes = dst->m_inputDepth > 8 ? 2 : 1;
+        for (int i = 0; i < x265_cli_csps[csp].planes; i++)
+        {
+            int w = dst->m_width >> x265_cli_csps[csp].width[i];
+            int h = dst->m_height >> x265_cli_csps[csp].height[i];
+            m_scalePlanes[i] = w * h * pixelbytes;
+            m_scaleFrameSize += m_scalePlanes[i];
+        }
+
+        if (src->m_height != dst->m_height || src->m_width != dst->m_width)
+        {
+            m_filterManager = new ScalerFilterManager;
+            m_filterManager->init(4, m_srcFormat, m_dstFormat);
+        }
+    }
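The Scaler constructor sizes each plane by right-shifting the destination width and height with per-plane values from the colour-space table, then multiplying by one or two bytes per sample depending on the input depth. A minimal sketch of that calculation follows. The `CspShifts` struct and the `kI420` entry are hypothetical stand-ins for one table row, assuming 4:2:0 subsampling, and are not the x265 definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for one row of a colour-space table: per-plane
// right-shifts applied to the luma width and height.
struct CspShifts
{
    int planes;
    int width[3];
    int height[3];
};

// Assumed values for 4:2:0: both chroma planes are halved in each direction.
static const CspShifts kI420 = { 3, { 0, 1, 1 }, { 0, 1, 1 } };

// Sums the byte size of every plane, using two bytes per sample above 8 bits.
inline size_t frameBytes(const CspShifts& csp, int width, int height, int bitDepth)
{
    assert(width > 0 && height > 0);
    size_t pixelBytes = bitDepth > 8 ? 2 : 1;
    size_t total = 0;
    for (int i = 0; i < csp.planes; i++)
    {
        size_t w = (size_t)(width >> csp.width[i]);
        size_t h = (size_t)(height >> csp.height[i]);
        total += w * h * pixelBytes;
    }
    return total;
}
```

Under these assumptions a 1920x1080 8-bit frame occupies 3110400 bytes, and the same frame at 10 bits occupies twice that.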
895
+
+    bool Scaler::scalePic(x265_picture * destination, x265_picture * source)
+    {
+        if (!destination || !source)
+            return false;
+        x265_param* param = m_parentEnc->m_param;
+        int pixelBytes = m_dstFormat->m_inputDepth > 8 ? 2 : 1;
+        if (m_srcFormat->m_height != m_dstFormat->m_height || m_srcFormat->m_width != m_dstFormat->m_width)
+        {
+            void **srcPlane = NULL, **dstPlane = NULL;
+            int srcStride[3], dstStride[3];
+            destination->bitDepth = source->bitDepth;
+            destination->colorSpace = source->colorSpace;
+            destination->pts = source->pts;
+            destination->dts = source->dts;
+            destination->reorderedPts = source->reorderedPts;
+            destination->poc = source->poc;
+            destination->userSEI = source->userSEI;
+            srcPlane = source->planes;
+            dstPlane = destination->planes;
+            srcStride[0] = source->stride[0];
+            destination->stride[0] = m_dstFormat->m_width * pixelBytes;
+            dstStride[0] = destination->stride[0];
+            if (param->internalCsp != X265_CSP_I400)
+            {
+                srcStride[1] = source->stride[1];
+                srcStride[2] = source->stride[2];
+                destination->stride[1] = destination->stride[0] >> x265_cli_csps[param->internalCsp].width[1];
+                destination->stride[2] = destination->stride[0] >> x265_cli_csps[param->internalCsp].width[2];
+                dstStride[1] = destination->stride[1];
+                dstStride[2] = destination->stride[2];
+            }
+            if (m_scaleFrameSize)
+            {
+                m_filterManager->scale_pic(srcPlane, dstPlane, srcStride, dstStride);
+                return true;
+            }
+            else
+                x265_log(param, X265_LOG_INFO, "Empty frame received\n");
+        }
+        return false;
+    }
+
+    void Scaler::threadMain()
+    {
+        THREAD_NAME("Scaler", m_id);
+
+        /* unscaled picture is stored in the last index */
+        uint32_t srcId = m_id - 1;
+        int QDepth = m_parentEnc->m_parent->m_queueSize;
+        while (!m_parentEnc->m_inputOver)
+        {
+
+            uint32_t scaledWritten = m_parentEnc->m_parent->m_picWriteCnt[m_id].get();
+
+            if (m_parentEnc->m_cliopt.framesToBeEncoded && scaledWritten >= m_parentEnc->m_cliopt.framesToBeEncoded)
+                break;
+
+            if (m_threadTotal > 1 && (m_threadId != scaledWritten % m_threadTotal))
+            {
+                continue;
+            }
+            uint32_t written = m_parentEnc->m_parent->m_picWriteCnt[srcId].get();
+
+            /*If all the input pictures are scaled by the current scale worker thread wait for input pictures*/
+            while (m_threadActive && (scaledWritten == written)) {
+                written = m_parentEnc->m_parent->m_picWriteCnt[srcId].waitForChange(written);
+            }
+
+            if (m_threadActive && scaledWritten < written)
+            {
+
+                int scaledWriteIdx = scaledWritten % QDepth;
+                int overWritePicBuffer = scaledWritten / QDepth;
+                int read = m_parentEnc->m_parent->m_picIdxReadCnt[m_id][scaledWriteIdx].get();
+
+                while (overWritePicBuffer && read < overWritePicBuffer)
+                {
+                    read = m_parentEnc->m_parent->m_picIdxReadCnt[m_id][scaledWriteIdx].waitForChange(read);
+                }
+
+                if (!m_parentEnc->m_parent->m_inputPicBuffer[m_id][scaledWriteIdx])
+                {
+                    int framesize = 0;
+                    int planesize[3];
+                    int csp = m_dstFormat->m_csp;
+                    int stride[3];
+                    stride[0] = m_dstFormat->m_width;
+                    stride[1] = stride[0] >> x265_cli_csps[csp].width[1];
+                    stride[2] = stride[0] >> x265_cli_csps[csp].width[2];
+                    for (int i = 0; i < x265_cli_csps[csp].planes; i++)
+                    {
+                        uint32_t h = m_dstFormat->m_height >> x265_cli_csps[csp].height[i];
+                        planesize[i] = h * stride[i];
+                        framesize += planesize[i];
+                    }
+
+                    m_parentEnc->m_parent->m_inputPicBuffer[m_id][scaledWriteIdx] = x265_picture_alloc();
+                    x265_picture_init(m_parentEnc->m_param, m_parentEnc->m_parent->m_inputPicBuffer[m_id][scaledWriteIdx]);
+
+                    ((x265_picture*)m_parentEnc->m_parent->m_inputPicBuffer[m_id][scaledWritten % QDepth])->framesize = framesize;
+                    for (int32_t j = 0; j < x265_cli_csps[csp].planes; j++)
+                    {
+                        m_parentEnc->m_parent->m_inputPicBuffer[m_id][scaledWritten % QDepth]->planes[j] = X265_MALLOC(char, planesize[j]);
+                    }
+                }
+
+                x265_picture *srcPic = m_parentEnc->m_parent->m_inputPicBuffer[srcId][scaledWritten % QDepth];
+                x265_picture* destPic = m_parentEnc->m_parent->m_inputPicBuffer[m_id][scaledWriteIdx];
+
+                // Enqueue this picture up with the current encoder so that it will asynchronously encode
+                if (!scalePic(destPic, srcPic))
+                    x265_log(NULL, X265_LOG_ERROR, "Unable to copy scaled input picture to input queue \n");
+                else
+                    m_parentEnc->m_parent->m_picWriteCnt[m_id].incr();
+                m_scaledWriteCnt.incr();
+                m_parentEnc->m_parent->m_picIdxReadCnt[srcId][scaledWriteIdx].incr();
+            }
+            if (m_threadTotal > 1)
+            {
+                written = m_parentEnc->m_parent->m_picWriteCnt[srcId].get();
+                int totalWrite = written / m_threadTotal;
+                if (written % m_threadTotal > m_threadId)
+                    totalWrite++;
+                if (totalWrite == m_scaledWriteCnt.get())
+                {
+                    m_parentEnc->m_parent->m_picWriteCnt[srcId].poke();
+                    m_parentEnc->m_parent->m_picWriteCnt[m_id].poke();
+                    break;
+                }
+            }
+            else
+            {
+                /* Once end of video is reached and all frames are scaled, release wait on picwritecount */
+                scaledWritten = m_parentEnc->m_parent->m_picWriteCnt[m_id].get();
+                written = m_parentEnc->m_parent->m_picWriteCnt[srcId].get();
+                if (written == scaledWritten)
+                {
+                    m_parentEnc->m_parent->m_picWriteCnt[srcId].poke();
+                    m_parentEnc->m_parent->m_picWriteCnt[m_id].poke();
+                    break;
+                }
+            }
+
+        }
+        m_threadActive = false;
+        destroy();
+    }
+
+    Reader::Reader(int id, PassEncoder *parentEnc)
+    {
+        m_parentEnc = parentEnc;
+        m_id = id;
+        m_input = parentEnc->m_input;
+    }
+
+    void Reader::threadMain()
+    {
+        THREAD_NAME("Reader", m_id);
+
+        int QDepth = m_parentEnc->m_parent->m_queueSize;
+        x265_picture* src = x265_picture_alloc();
+        x265_picture_init(m_parentEnc->m_param, src);
+
+        while (m_threadActive)
+        {
+            uint32_t written = m_parentEnc->m_parent->m_picWriteCnt[m_id].get();
+            uint32_t writeIdx = written % QDepth;
+            uint32_t read = m_parentEnc->m_parent->m_picIdxReadCnt[m_id][writeIdx].get();
+            uint32_t overWritePicBuffer = written / QDepth;
+
+            if (m_parentEnc->m_cliopt.framesToBeEncoded && written >= m_parentEnc->m_cliopt.framesToBeEncoded)
+                break;
+
+            while (overWritePicBuffer && read < overWritePicBuffer)
+            {
+                read = m_parentEnc->m_parent->m_picIdxReadCnt[m_id][writeIdx].waitForChange(read);
+            }
+
+            x265_picture* dest = m_parentEnc->m_parent->m_inputPicBuffer[m_id][writeIdx];
+            if (m_input->readPicture(*src))
+            {
+                dest->poc = src->poc;
+                dest->pts = src->pts;
+                dest->userSEI = src->userSEI;
+                dest->bitDepth = src->bitDepth;
+                dest->framesize = src->framesize;
+                dest->height = src->height;
+                dest->width = src->width;
+                dest->colorSpace = src->colorSpace;
+                dest->userSEI = src->userSEI;
+                dest->rpu.payload = src->rpu.payload;
+                dest->picStruct = src->picStruct;
+                dest->stride[0] = src->stride[0];
+                dest->stride[1] = src->stride[1];
+                dest->stride[2] = src->stride[2];
+
+                if (!dest->planes[0])
+                    dest->planes[0] = X265_MALLOC(char, dest->framesize);
+
+                memcpy(dest->planes[0], src->planes[0], src->framesize * sizeof(char));
+                dest->planes[1] = (char*)dest->planes[0] + src->stride[0] * src->height;
+                dest->planes[2] = (char*)dest->planes[1] + src->stride[1] * (src->height >> x265_cli_csps[src->colorSpace].height[1]);
+                m_parentEnc->m_parent->m_picWriteCnt[m_id].incr();
+            }
+            else
+            {
+                m_threadActive = false;
+                m_parentEnc->m_inputOver = true;
+                m_parentEnc->m_parent->m_picWriteCnt[m_id].poke();
+            }
+        }
+        x265_picture_free(src);
+    }
+}
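Both Reader::threadMain and Scaler::threadMain above derive a slot index and an overwrite count from a running write counter, then wait until the slot has been read often enough before reusing it. A minimal sketch of that arithmetic in isolation follows. It is an illustration only, not code from the patch: the helper names slotIndex, generation and canOverwrite are hypothetical, and the real code blocks on ThreadSafeInteger::waitForChange instead of returning a flag.

```cpp
#include <cstdint>

// Hypothetical helpers (not part of x265): they model the ring-buffer
// arithmetic applied to m_picWriteCnt and m_picIdxReadCnt in the patch.

// Slot the next picture is written to (written % QDepth in the patch).
inline uint32_t slotIndex(uint32_t written, uint32_t queueDepth)
{
    return written % queueDepth;
}

// How many times the queue has wrapped before this write
// (overWritePicBuffer = written / QDepth in the patch).
inline uint32_t generation(uint32_t written, uint32_t queueDepth)
{
    return written / queueDepth;
}

// Negation of the patch's wait condition
// "overWritePicBuffer && read < overWritePicBuffer":
// the writer may proceed on the first pass through the queue, or once the
// slot's read count has caught up with the wrap count.
inline bool canOverwrite(uint32_t written, uint32_t readsOfSlot, uint32_t queueDepth)
{
    const uint32_t gen = generation(written, queueDepth);
    return gen == 0 || readsOfSlot >= gen;
}
```

With a queue depth of 4, the eleventh picture (written == 10) lands in slot 2 and may only be stored after that slot has been read twice.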
x265_3.4.tar.gz/source/abrEncApp.h Added
@@ -0,0 +1,153 @@
+/*****************************************************************************
+* Copyright (C) 2013-2020 MulticoreWare, Inc
+*
+* Authors: Pooja Venkatesan <pooja@multicorewareinc.com>
+*          Aruna Matheswaran <aruna@multicorewareinc.com>
+*
+*
+* This program is free software; you can redistribute it and/or modify
+* it under the terms of the GNU General Public License as published by
+* the Free Software Foundation; either version 2 of the License, or
+* (at your option) any later version.
+*
+* This program is distributed in the hope that it will be useful,
+* but WITHOUT ANY WARRANTY; without even the implied warranty of
+* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+* GNU General Public License for more details.
+*
+* You should have received a copy of the GNU General Public License
+* along with this program; if not, write to the Free Software
+* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02111, USA.
+*
+* This program is also available under a commercial proprietary license.
+* For more information, contact us at license @ x265.com.
+*****************************************************************************/
+
+#ifndef ABR_ENCODE_H
+#define ABR_ENCODE_H
+
+#include "x265.h"
+#include "scaler.h"
+#include "threading.h"
+#include "x265cli.h"
+
+namespace X265_NS {
+    // private namespace
+
+    class PassEncoder;
+    class Scaler;
+    class Reader;
+
+    class AbrEncoder
+    {
+    public:
+        uint8_t           m_numEncodes;
+        PassEncoder        **m_passEnc;
+        uint32_t           m_queueSize;
+        ThreadSafeInteger  m_numActiveEncodes;
+
+        x265_picture       ***m_inputPicBuffer; //[numEncodes][queueSize]
+        x265_analysis_data **m_analysisBuffer; //[numEncodes][queueSize]
+        int                **m_readFlag;
+
+        ThreadSafeInteger  *m_picWriteCnt;
+        ThreadSafeInteger  *m_picReadCnt;
+        ThreadSafeInteger  **m_picIdxReadCnt;
+        ThreadSafeInteger  *m_analysisWriteCnt; //[numEncodes][queueSize]
+        ThreadSafeInteger  *m_analysisReadCnt; //[numEncodes][queueSize]
+        ThreadSafeInteger  **m_analysisWrite; //[numEncodes][queueSize]
+        ThreadSafeInteger  **m_analysisRead; //[numEncodes][queueSize]
+
+        AbrEncoder(CLIOptions cliopt[], uint8_t numEncodes, int& ret);
+        bool allocBuffers();
+        void destroy();
+
+    };
+
+    class PassEncoder : public Thread
+    {
+    public:
+
+        uint32_t m_id;
+        x265_param *m_param;
+        AbrEncoder *m_parent;
+        x265_encoder *m_encoder;
+        Reader *m_reader;
+        Scaler *m_scaler;
+        bool m_inputOver;
+
+        int m_threadActive;
+        int m_lastIdx;
+        uint32_t m_outputNalsCount;
+
+        x265_picture **m_inputPicBuffer;
+        x265_analysis_data **m_analysisBuffer;
+        x265_nal **m_outputNals;
+        x265_picture **m_outputRecon;
+
+        CLIOptions m_cliopt;
+        InputFile* m_input;
+        const char* m_reconPlayCmd;
+        FILE*    m_qpfile;
+        FILE*    m_zoneFile;
+        FILE*    m_dolbyVisionRpu;/* File containing Dolby Vision BL RPU metadata */
+
+        int m_ret;
+
+        PassEncoder(uint32_t id, CLIOptions cliopt, AbrEncoder *parent);
+        int init(int &result);
+        void setReuseLevel();
+
+        void startThreads();
+        void copyInfo(x265_analysis_data *src);
+
+        bool readPicture(x265_picture*);
+        void destroy();
+
+    private:
+        void threadMain();
+    };
+
+    class Scaler : public Thread
+    {
+    public:
+        PassEncoder *m_parentEnc;
+        int m_id;
+        int m_scalePlanes[3];
+        int m_scaleFrameSize;
+        uint32_t m_threadId;
+        uint32_t m_threadTotal;
+        ThreadSafeInteger m_scaledWriteCnt;
+        VideoDesc* m_srcFormat;
+        VideoDesc* m_dstFormat;
+        int m_threadActive;
+        ScalerFilterManager* m_filterManager;
+
+        Scaler(int threadId, int threadNum, int id, VideoDesc *src, VideoDesc * dst, PassEncoder *parentEnc);
+        bool scalePic(x265_picture *destination, x265_picture *source);
+        void threadMain();
+        void destroy()
+        {
+            if (m_filterManager)
+            {
+                delete m_filterManager;
+                m_filterManager = NULL;
+            }
+        }
+    };
+
+    class Reader : public Thread
+    {
+    public:
+        PassEncoder *m_parentEnc;
+        int m_id;
+        InputFile* m_input;
+        int m_threadActive;
+
+        Reader(int id, PassEncoder *parentEnc);
+        void threadMain();
+    };
+}
+
+#endif // ifndef ABR_ENCODE_H
+#pragma once
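The m_scalePlanes and m_scaleFrameSize members declared in Scaler hold per-plane and total byte counts that the constructor derives from the destination size, bit depth and chroma subsampling shifts. The sketch below reproduces that calculation standalone. It is an assumption-laden illustration rather than code from the patch: CspShifts, kI420 and frameBytes are hypothetical names, and the shift values for 4:2:0 are assumed to match the corresponding x265_cli_csps entry.

```cpp
#include <cstdint>

// Hypothetical stand-in for one x265_cli_csps entry: plane count plus the
// right-shift applied to width and height for each plane.
struct CspShifts
{
    int planes;
    int width[3];
    int height[3];
};

// Assumed 4:2:0 layout: chroma planes are halved in both dimensions.
constexpr CspShifts kI420 = { 3, { 0, 1, 1 }, { 0, 1, 1 } };

// Mirrors the loop in the Scaler constructor: sum of w * h * bytes-per-pixel
// over all planes, with 2 bytes per sample when the depth exceeds 8 bits.
inline uint64_t frameBytes(const CspShifts& csp, int width, int height, int bitDepth)
{
    const uint64_t pixelBytes = bitDepth > 8 ? 2 : 1;
    uint64_t total = 0;
    for (int i = 0; i < csp.planes; i++)
    {
        const uint64_t w = static_cast<uint64_t>(width >> csp.width[i]);
        const uint64_t h = static_cast<uint64_t>(height >> csp.height[i]);
        total += w * h * pixelBytes;
    }
    return total;
}
```

Under these assumptions a 1920x1080 4:2:0 frame occupies 3110400 bytes at 8 bits and twice that at 10 bits. The sketch uses a 64-bit accumulator for headroom, whereas the patch stores the sizes in int members.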
x265_3.3.tar.gz/source/common/CMakeLists.txt -> x265_3.4.tar.gz/source/common/CMakeLists.txt Changed
@@ -14,7 +14,7 @@
 endif(EXTRA_LIB)
 
 if(ENABLE_ASSEMBLY)
-    set_source_files_properties(threading.cpp primitives.cpp PROPERTIES COMPILE_FLAGS -DENABLE_ASSEMBLY=1)
+    set_source_files_properties(threading.cpp primitives.cpp pixel.cpp PROPERTIES COMPILE_FLAGS -DENABLE_ASSEMBLY=1)
     list(APPEND VFLAGS "-DENABLE_ASSEMBLY=1")
 endif(ENABLE_ASSEMBLY)
 
@@ -84,16 +84,33 @@
 endif(ENABLE_ASSEMBLY AND X86)
 
 if(ENABLE_ASSEMBLY AND (ARM OR CROSS_COMPILE_ARM))
-    set(C_SRCS asm-primitives.cpp pixel.h mc.h ipfilter8.h blockcopy8.h dct8.h loopfilter.h)
+    if(ARM64)
+        if(GCC AND (CMAKE_CXX_FLAGS_RELEASE MATCHES "-O3"))
+            message(STATUS "Detected CXX compiler using -O3 optimization level")
+            add_definitions(-DAUTO_VECTORIZE=1)
+        endif()
+        set(C_SRCS asm-primitives.cpp pixel.h ipfilter8.h)
 
-    # add ARM assembly/intrinsic files here
-    set(A_SRCS asm.S cpu-a.S mc-a.S sad-a.S pixel-util.S ssd-a.S blockcopy8.S ipfilter8.S dct-a.S)
-    set(VEC_PRIMITIVES)
+        # add ARM assembly/intrinsic files here
+        set(A_SRCS asm.S mc-a.S sad-a.S pixel-util.S ipfilter8.S)
+        set(VEC_PRIMITIVES)
 
-    set(ARM_ASMS "${A_SRCS}" CACHE INTERNAL "ARM Assembly Sources")
-    foreach(SRC ${C_SRCS})
-        set(ASM_PRIMITIVES ${ASM_PRIMITIVES} arm/${SRC})
-    endforeach()
+        set(ARM_ASMS "${A_SRCS}" CACHE INTERNAL "ARM Assembly Sources")
+        foreach(SRC ${C_SRCS})
+            set(ASM_PRIMITIVES ${ASM_PRIMITIVES} aarch64/${SRC})
+        endforeach()
+    else()
+        set(C_SRCS asm-primitives.cpp pixel.h mc.h ipfilter8.h blockcopy8.h dct8.h loopfilter.h)
+
+        # add ARM assembly/intrinsic files here
+        set(A_SRCS asm.S cpu-a.S mc-a.S sad-a.S pixel-util.S ssd-a.S blockcopy8.S ipfilter8.S dct-a.S)
+        set(VEC_PRIMITIVES)
+
+        set(ARM_ASMS "${A_SRCS}" CACHE INTERNAL "ARM Assembly Sources")
+        foreach(SRC ${C_SRCS})
+            set(ASM_PRIMITIVES ${ASM_PRIMITIVES} arm/${SRC})
+        endforeach()
+    endif()
     source_group(Assembly FILES ${ASM_PRIMITIVES})
 endif(ENABLE_ASSEMBLY AND (ARM OR CROSS_COMPILE_ARM))
 
@@ -151,4 +168,5 @@
     predict.cpp  predict.h
     scalinglist.cpp scalinglist.h
     quant.cpp quant.h contexts.h
-    deblock.cpp deblock.h)
+    deblock.cpp deblock.h
+    scaler.cpp scaler.h)
x265_3.4.tar.gz/source/common/aarch64/asm-primitives.cpp Added
@@ -0,0 +1,219 @@
+/*****************************************************************************
+ * Copyright (C) 2020 MulticoreWare, Inc
+ *
+ * Authors: Hongbin Liu <liuhongbin1@huawei.com>
+ *          Yimeng Su <yimeng.su@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA.
+ *
+ * This program is also available under a commercial proprietary license.
+ * For more information, contact us at license @ x265.com.
+ *****************************************************************************/
+
+#include "common.h"
+#include "primitives.h"
+#include "x265.h"
+#include "cpu.h"
+
+
+#if defined(__GNUC__)
+#define GCC_VERSION (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
+#endif
+
+#define GCC_4_9_0 40900
+#define GCC_5_1_0 50100
+
+extern "C" {
+#include "pixel.h"
+#include "pixel-util.h"
+#include "ipfilter8.h"
+}
+
+namespace X265_NS {
+// private x265 namespace
+
+
+template<int size>
+void interp_8tap_hv_pp_cpu(const pixel* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int idxX, int idxY)
+{
+    ALIGN_VAR_32(int16_t, immed[MAX_CU_SIZE * (MAX_CU_SIZE + NTAPS_LUMA - 1)]);
+    const int halfFilterSize = NTAPS_LUMA >> 1;
+    const int immedStride = MAX_CU_SIZE;
+
+    primitives.pu[size].luma_hps(src, srcStride, immed, immedStride, idxX, 1);
+    primitives.pu[size].luma_vsp(immed + (halfFilterSize - 1) * immedStride, immedStride, dst, dstStride, idxY);
+}
+
+
+/* Temporary workaround because luma_vsp assembly primitive has not been completed
+ * but interp_8tap_hv_pp_cpu uses mixed C primitive and assembly primitive.
+ * Otherwise, segment fault occurs. */
+void setupAliasCPrimitives(EncoderPrimitives &cp, EncoderPrimitives &asmp, int cpuMask)
+{
+    if (cpuMask & X265_CPU_NEON)
+    {
+        asmp.pu[LUMA_8x4].luma_vsp   = cp.pu[LUMA_8x4].luma_vsp;
+        asmp.pu[LUMA_8x8].luma_vsp   = cp.pu[LUMA_8x8].luma_vsp;
+        asmp.pu[LUMA_8x16].luma_vsp  = cp.pu[LUMA_8x16].luma_vsp;
+        asmp.pu[LUMA_8x32].luma_vsp  = cp.pu[LUMA_8x32].luma_vsp;
+        asmp.pu[LUMA_12x16].luma_vsp = cp.pu[LUMA_12x16].luma_vsp;
+#if !AUTO_VECTORIZE || GCC_VERSION < GCC_5_1_0 /* gcc_version < gcc-5.1.0 */
+        asmp.pu[LUMA_16x4].luma_vsp  = cp.pu[LUMA_16x4].luma_vsp;
+        asmp.pu[LUMA_16x8].luma_vsp  = cp.pu[LUMA_16x8].luma_vsp;
+        asmp.pu[LUMA_16x12].luma_vsp = cp.pu[LUMA_16x12].luma_vsp;
+        asmp.pu[LUMA_16x16].luma_vsp = cp.pu[LUMA_16x16].luma_vsp;
+        asmp.pu[LUMA_16x32].luma_vsp = cp.pu[LUMA_16x32].luma_vsp;
+        asmp.pu[LUMA_16x64].luma_vsp = cp.pu[LUMA_16x64].luma_vsp;
+        asmp.pu[LUMA_32x16].luma_vsp = cp.pu[LUMA_32x16].luma_vsp;
+        asmp.pu[LUMA_32x24].luma_vsp = cp.pu[LUMA_32x24].luma_vsp;
+        asmp.pu[LUMA_32x32].luma_vsp = cp.pu[LUMA_32x32].luma_vsp;
+        asmp.pu[LUMA_32x64].luma_vsp = cp.pu[LUMA_32x64].luma_vsp;
+        asmp.pu[LUMA_48x64].luma_vsp = cp.pu[LUMA_48x64].luma_vsp;
+        asmp.pu[LUMA_64x16].luma_vsp = cp.pu[LUMA_64x16].luma_vsp;
+        asmp.pu[LUMA_64x32].luma_vsp = cp.pu[LUMA_64x32].luma_vsp;
+        asmp.pu[LUMA_64x48].luma_vsp = cp.pu[LUMA_64x48].luma_vsp;
+        asmp.pu[LUMA_64x64].luma_vsp = cp.pu[LUMA_64x64].luma_vsp;
+#if !AUTO_VECTORIZE || GCC_VERSION < GCC_4_9_0 /* gcc_version < gcc-4.9.0 */
+        asmp.pu[LUMA_4x4].luma_vsp   = cp.pu[LUMA_4x4].luma_vsp;
+        asmp.pu[LUMA_4x8].luma_vsp   = cp.pu[LUMA_4x8].luma_vsp;
+        asmp.pu[LUMA_4x16].luma_vsp  = cp.pu[LUMA_4x16].luma_vsp;
+        asmp.pu[LUMA_24x32].luma_vsp = cp.pu[LUMA_24x32].luma_vsp;
+        asmp.pu[LUMA_32x8].luma_vsp  = cp.pu[LUMA_32x8].luma_vsp;
+#endif
+#endif
+    }
+}
+
+
+void setupAssemblyPrimitives(EncoderPrimitives &p, int cpuMask)
+{
+    if (cpuMask & X265_CPU_NEON)
+    {
+        p.pu[LUMA_4x4].satd   = PFX(pixel_satd_4x4_neon);
+        p.pu[LUMA_4x8].satd   = PFX(pixel_satd_4x8_neon);
+        p.pu[LUMA_4x16].satd  = PFX(pixel_satd_4x16_neon);
+        p.pu[LUMA_8x4].satd   = PFX(pixel_satd_8x4_neon);
+        p.pu[LUMA_8x8].satd   = PFX(pixel_satd_8x8_neon);
+        p.pu[LUMA_12x16].satd = PFX(pixel_satd_12x16_neon);
+
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x4].satd    = PFX(pixel_satd_4x4_neon);
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x8].satd    = PFX(pixel_satd_4x8_neon);
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x16].satd   = PFX(pixel_satd_4x16_neon);
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x4].satd    = PFX(pixel_satd_8x4_neon);
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x8].satd    = PFX(pixel_satd_8x8_neon);
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_12x16].satd  = PFX(pixel_satd_12x16_neon);
+
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x4].satd    = PFX(pixel_satd_4x4_neon);
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x8].satd    = PFX(pixel_satd_4x8_neon);
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x16].satd   = PFX(pixel_satd_4x16_neon);
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x32].satd   = PFX(pixel_satd_4x32_neon);
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x4].satd    = PFX(pixel_satd_8x4_neon);
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x8].satd    = PFX(pixel_satd_8x8_neon);
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_12x32].satd  = PFX(pixel_satd_12x32_neon);
+
+        p.pu[LUMA_4x4].pixelavg_pp[NONALIGNED]   = PFX(pixel_avg_pp_4x4_neon);
+        p.pu[LUMA_4x8].pixelavg_pp[NONALIGNED]   = PFX(pixel_avg_pp_4x8_neon);
+        p.pu[LUMA_4x16].pixelavg_pp[NONALIGNED]  = PFX(pixel_avg_pp_4x16_neon);
+        p.pu[LUMA_8x4].pixelavg_pp[NONALIGNED]   = PFX(pixel_avg_pp_8x4_neon);
+        p.pu[LUMA_8x8].pixelavg_pp[NONALIGNED]   = PFX(pixel_avg_pp_8x8_neon);
+        p.pu[LUMA_8x16].pixelavg_pp[NONALIGNED]  = PFX(pixel_avg_pp_8x16_neon);
+        p.pu[LUMA_8x32].pixelavg_pp[NONALIGNED]  = PFX(pixel_avg_pp_8x32_neon);
+
+        p.pu[LUMA_4x4].pixelavg_pp[ALIGNED]   = PFX(pixel_avg_pp_4x4_neon);
+        p.pu[LUMA_4x8].pixelavg_pp[ALIGNED]   = PFX(pixel_avg_pp_4x8_neon);
+        p.pu[LUMA_4x16].pixelavg_pp[ALIGNED]  = PFX(pixel_avg_pp_4x16_neon);
+        p.pu[LUMA_8x4].pixelavg_pp[ALIGNED]   = PFX(pixel_avg_pp_8x4_neon);
+        p.pu[LUMA_8x8].pixelavg_pp[ALIGNED]   = PFX(pixel_avg_pp_8x8_neon);
+        p.pu[LUMA_8x16].pixelavg_pp[ALIGNED]  = PFX(pixel_avg_pp_8x16_neon);
+        p.pu[LUMA_8x32].pixelavg_pp[ALIGNED]  = PFX(pixel_avg_pp_8x32_neon);
+
+        p.pu[LUMA_8x4].sad_x3   = PFX(sad_x3_8x4_neon);
+        p.pu[LUMA_8x8].sad_x3   = PFX(sad_x3_8x8_neon);
+        p.pu[LUMA_8x16].sad_x3  = PFX(sad_x3_8x16_neon);
+        p.pu[LUMA_8x32].sad_x3  = PFX(sad_x3_8x32_neon);
+
+        p.pu[LUMA_8x4].sad_x4   = PFX(sad_x4_8x4_neon);
+        p.pu[LUMA_8x8].sad_x4   = PFX(sad_x4_8x8_neon);
+        p.pu[LUMA_8x16].sad_x4  = PFX(sad_x4_8x16_neon);
+        p.pu[LUMA_8x32].sad_x4  = PFX(sad_x4_8x32_neon);
+
+        // quant
+        p.quant = PFX(quant_neon);
+        // luma_hps
+        p.pu[LUMA_4x4].luma_hps   = PFX(interp_8tap_horiz_ps_4x4_neon);
+        p.pu[LUMA_4x8].luma_hps   = PFX(interp_8tap_horiz_ps_4x8_neon);
+        p.pu[LUMA_4x16].luma_hps  = PFX(interp_8tap_horiz_ps_4x16_neon);
+        p.pu[LUMA_8x4].luma_hps   = PFX(interp_8tap_horiz_ps_8x4_neon);
+        p.pu[LUMA_8x8].luma_hps   = PFX(interp_8tap_horiz_ps_8x8_neon);
+        p.pu[LUMA_8x16].luma_hps  = PFX(interp_8tap_horiz_ps_8x16_neon);
+        p.pu[LUMA_8x32].luma_hps  = PFX(interp_8tap_horiz_ps_8x32_neon);
+        p.pu[LUMA_12x16].luma_hps = PFX(interp_8tap_horiz_ps_12x16_neon);
+        p.pu[LUMA_24x32].luma_hps = PFX(interp_8tap_horiz_ps_24x32_neon);
+#if !AUTO_VECTORIZE || GCC_VERSION < GCC_5_1_0 /* gcc_version < gcc-5.1.0 */
+        p.pu[LUMA_16x4].luma_hps  = PFX(interp_8tap_horiz_ps_16x4_neon);
+        p.pu[LUMA_16x8].luma_hps  = PFX(interp_8tap_horiz_ps_16x8_neon);
+        p.pu[LUMA_16x12].luma_hps = PFX(interp_8tap_horiz_ps_16x12_neon);
+        p.pu[LUMA_16x16].luma_hps = PFX(interp_8tap_horiz_ps_16x16_neon);
+        p.pu[LUMA_16x32].luma_hps = PFX(interp_8tap_horiz_ps_16x32_neon);
+        p.pu[LUMA_16x64].luma_hps = PFX(interp_8tap_horiz_ps_16x64_neon);
+        p.pu[LUMA_32x8].luma_hps  = PFX(interp_8tap_horiz_ps_32x8_neon);
+        p.pu[LUMA_32x16].luma_hps = PFX(interp_8tap_horiz_ps_32x16_neon);
+        p.pu[LUMA_32x24].luma_hps = PFX(interp_8tap_horiz_ps_32x24_neon);
+        p.pu[LUMA_32x32].luma_hps = PFX(interp_8tap_horiz_ps_32x32_neon);
+        p.pu[LUMA_32x64].luma_hps = PFX(interp_8tap_horiz_ps_32x64_neon);
+        p.pu[LUMA_48x64].luma_hps = PFX(interp_8tap_horiz_ps_48x64_neon);
+        p.pu[LUMA_64x16].luma_hps = PFX(interp_8tap_horiz_ps_64x16_neon);
+        p.pu[LUMA_64x32].luma_hps = PFX(interp_8tap_horiz_ps_64x32_neon);
+        p.pu[LUMA_64x48].luma_hps = PFX(interp_8tap_horiz_ps_64x48_neon);
+        p.pu[LUMA_64x64].luma_hps = PFX(interp_8tap_horiz_ps_64x64_neon);
+#endif
+
+        p.pu[LUMA_8x4].luma_hvpp   =  interp_8tap_hv_pp_cpu<LUMA_8x4>;
+        p.pu[LUMA_8x8].luma_hvpp   =  interp_8tap_hv_pp_cpu<LUMA_8x8>;
+        p.pu[LUMA_8x16].luma_hvpp  =  interp_8tap_hv_pp_cpu<LUMA_8x16>;
+        p.pu[LUMA_8x32].luma_hvpp  =  interp_8tap_hv_pp_cpu<LUMA_8x32>;
+        p.pu[LUMA_12x16].luma_hvpp =  interp_8tap_hv_pp_cpu<LUMA_12x16>;
+#if !AUTO_VECTORIZE || GCC_VERSION < GCC_5_1_0 /* gcc_version < gcc-5.1.0 */
+        p.pu[LUMA_16x4].luma_hvpp  =  interp_8tap_hv_pp_cpu<LUMA_16x4>;
+        p.pu[LUMA_16x8].luma_hvpp  =  interp_8tap_hv_pp_cpu<LUMA_16x8>;
+        p.pu[LUMA_16x12].luma_hvpp =  interp_8tap_hv_pp_cpu<LUMA_16x12>;
+        p.pu[LUMA_16x16].luma_hvpp =  interp_8tap_hv_pp_cpu<LUMA_16x16>;
+        p.pu[LUMA_16x32].luma_hvpp =  interp_8tap_hv_pp_cpu<LUMA_16x32>;
+        p.pu[LUMA_16x64].luma_hvpp =  interp_8tap_hv_pp_cpu<LUMA_16x64>;
+        p.pu[LUMA_32x16].luma_hvpp =  interp_8tap_hv_pp_cpu<LUMA_32x16>;
+        p.pu[LUMA_32x24].luma_hvpp =  interp_8tap_hv_pp_cpu<LUMA_32x24>;
+        p.pu[LUMA_32x32].luma_hvpp =  interp_8tap_hv_pp_cpu<LUMA_32x32>;
+        p.pu[LUMA_32x64].luma_hvpp =  interp_8tap_hv_pp_cpu<LUMA_32x64>;
+        p.pu[LUMA_48x64].luma_hvpp =  interp_8tap_hv_pp_cpu<LUMA_48x64>;
+        p.pu[LUMA_64x16].luma_hvpp =  interp_8tap_hv_pp_cpu<LUMA_64x16>;
+        p.pu[LUMA_64x32].luma_hvpp =  interp_8tap_hv_pp_cpu<LUMA_64x32>;
+        p.pu[LUMA_64x48].luma_hvpp =  interp_8tap_hv_pp_cpu<LUMA_64x48>;
+        p.pu[LUMA_64x64].luma_hvpp =  interp_8tap_hv_pp_cpu<LUMA_64x64>;
+#if !AUTO_VECTORIZE || GCC_VERSION < GCC_4_9_0 /* gcc_version < gcc-4.9.0 */
+        p.pu[LUMA_4x4].luma_hvpp   =  interp_8tap_hv_pp_cpu<LUMA_4x4>;
+        p.pu[LUMA_4x8].luma_hvpp   =  interp_8tap_hv_pp_cpu<LUMA_4x8>;
+        p.pu[LUMA_4x16].luma_hvpp  =  interp_8tap_hv_pp_cpu<LUMA_4x16>;
+        p.pu[LUMA_24x32].luma_hvpp =  interp_8tap_hv_pp_cpu<LUMA_24x32>;
+        p.pu[LUMA_32x8].luma_hvpp  =  interp_8tap_hv_pp_cpu<LUMA_32x8>;
+#endif
+#endif
+
+#if !HIGH_BIT_DEPTH
+        p.cu[BLOCK_4x4].psy_cost_pp = PFX(psyCost_4x4_neon);
+#endif // !HIGH_BIT_DEPTH
+
+    }
+}
+} // namespace X265_NS
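The `#if !AUTO_VECTORIZE || GCC_VERSION < GCC_5_1_0` guards in asm-primitives.cpp compare a packed compiler version against fixed thresholds. As a rough illustration of that encoding and predicate, here is a small sketch. It is not taken from the patch: gccVersion and useGuardedBlock are hypothetical names, and the preprocessor logic is modelled with ordinary functions.

```cpp
// Hypothetical model of the GCC_VERSION macro:
// major * 10000 + minor * 100 + patchlevel.
constexpr int gccVersion(int major, int minor, int patch)
{
    return major * 10000 + minor * 100 + patch;
}

// Models "#if !AUTO_VECTORIZE || GCC_VERSION < threshold": the guarded
// assignments are compiled in when auto-vectorization is not enabled, or
// when the compiler is older than the threshold release.
constexpr bool useGuardedBlock(bool autoVectorize, int version, int threshold)
{
    return !autoVectorize || version < threshold;
}
```

One detail the function model makes visible: in a preprocessor `#if`, an undefined identifier evaluates to 0, so when GCC_VERSION is not defined (a compiler that does not define `__GNUC__`) the comparison against either threshold is true and the guarded blocks are included.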
x265_3.4.tar.gz/source/common/aarch64/asm.S Added
@@ -0,0 +1,69 @@
+/*****************************************************************************
+ * Copyright (C) 2020 MulticoreWare, Inc
+ *
+ * Authors: Hongbin Liu <liuhongbin1@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA.
+ *
+ * This program is also available under a commercial proprietary license.
+ * For more information, contact us at license @ x265.com.
+ *****************************************************************************/
+
+.arch           armv8-a
+
+#ifdef PREFIX
+#define EXTERN_ASM _
+#else
+#define EXTERN_ASM
+#endif
+
+#ifdef __ELF__
+#define ELF
+#else
+#define ELF @
+#endif
+
+#define HAVE_AS_FUNC 1
+
+#if HAVE_AS_FUNC
+#define FUNC
+#else
+#define FUNC @
+#endif
+
+.macro function name, export=1
+    .macro endfunc
+ELF     .size   \name, . - \name
+FUNC    .endfunc
+        .purgem endfunc
+    .endm
+        .align  2
+.if \export == 1
+        .global EXTERN_ASM\name
+ELF     .hidden EXTERN_ASM\name
+ELF     .type   EXTERN_ASM\name, %function
+FUNC    .func   EXTERN_ASM\name
+EXTERN_ASM\name:
+.else
+ELF     .hidden \name
+ELF     .type   \name, %function
+FUNC    .func   \name
+\name:
+.endif
+.endm
+
+
+#define FENC_STRIDE 64
+#define FDEC_STRIDE 32
x265_3.4.tar.gz/source/common/aarch64/ipfilter8.S Added
@@ -0,0 +1,414 @@
+/*****************************************************************************
+ * Copyright (C) 2020 MulticoreWare, Inc
+ *
+ * Authors: Yimeng Su <yimeng.su@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA.
+ *
+ * This program is also available under a commercial proprietary license.
+ * For more information, contact us at license @ x265.com.
+ *****************************************************************************/
+
+#include "asm.S"
+
+.section .rodata
+
+.align 4
+
+.text
+
+
+
+.macro qpel_filter_0_32b
+    movi            v24.8h, #64
+    uxtl            v19.8h, v5.8b
+    smull           v17.4s, v19.4h, v24.4h
+    smull2          v18.4s, v19.8h, v24.8h
+.endm
+
+.macro qpel_filter_1_32b
+    movi            v16.8h, #58
+    uxtl            v19.8h, v5.8b
+    smull           v17.4s, v19.4h, v16.4h
+    smull2          v18.4s, v19.8h, v16.8h
+
+    movi            v24.8h, #10
+    uxtl            v21.8h, v1.8b
+    smull           v19.4s, v21.4h, v24.4h
+    smull2          v20.4s, v21.8h, v24.8h
+
+    movi            v16.8h, #17
+    uxtl            v23.8h, v2.8b
+    smull           v21.4s, v23.4h, v16.4h
+    smull2          v22.4s, v23.8h, v16.8h
+
+    movi            v24.8h, #5
+    uxtl            v1.8h, v6.8b
+    smull           v23.4s, v1.4h, v24.4h
+    smull2          v16.4s, v1.8h, v24.8h
+
+    sub             v17.4s, v17.4s, v19.4s
+    sub             v18.4s, v18.4s, v20.4s
+
+    uxtl            v1.8h, v4.8b
+    sshll           v19.4s, v1.4h, #2
+    sshll2          v20.4s, v1.8h, #2
+
+    add             v17.4s, v17.4s, v21.4s
+    add             v18.4s, v18.4s, v22.4s
+
+    uxtl            v1.8h, v0.8b
+    uxtl            v2.8h, v3.8b
+    ssubl           v21.4s, v2.4h, v1.4h
+    ssubl2          v22.4s, v2.8h, v1.8h
+
+    add             v17.4s, v17.4s, v19.4s
+    add             v18.4s, v18.4s, v20.4s
+    sub             v21.4s, v21.4s, v23.4s
+    sub             v22.4s, v22.4s, v16.4s
+    add             v17.4s, v17.4s, v21.4s
+    add             v18.4s, v18.4s, v22.4s
+.endm
+
+.macro qpel_filter_2_32b
+    movi            v16.4s, #11
+    uxtl            v19.8h, v5.8b
+    uxtl            v20.8h, v2.8b
+    saddl           v17.4s, v19.4h, v20.4h
+    saddl2          v18.4s, v19.8h, v20.8h
+
+    uxtl            v21.8h, v1.8b
+    uxtl            v22.8h, v6.8b
+    saddl           v19.4s, v21.4h, v22.4h
+    saddl2          v20.4s, v21.8h, v22.8h
+
+    mul             v19.4s, v19.4s, v16.4s
+    mul             v20.4s, v20.4s, v16.4s
+
+    movi            v16.4s, #40
+    mul             v17.4s, v17.4s, v16.4s
+    mul             v18.4s, v18.4s, v16.4s
+
+    uxtl            v21.8h, v4.8b
+    uxtl            v22.8h, v3.8b
+    saddl           v23.4s, v21.4h, v22.4h
+    saddl2          v16.4s, v21.8h, v22.8h
+
+    uxtl            v1.8h, v0.8b
+    uxtl            v2.8h, v7.8b
+    saddl           v21.4s, v1.4h, v2.4h
+    saddl2          v22.4s, v1.8h, v2.8h
+
+    shl             v23.4s, v23.4s, #2
+    shl             v16.4s, v16.4s, #2
+
+    add             v19.4s, v19.4s, v21.4s
+    add             v20.4s, v20.4s, v22.4s
+    add             v17.4s, v17.4s, v23.4s
+    add             v18.4s, v18.4s, v16.4s
+    sub             v17.4s, v17.4s, v19.4s
+    sub             v18.4s, v18.4s, v20.4s
+.endm
+
+.macro qpel_filter_3_32b
+    movi            v16.8h, #17
+    movi            v24.8h, #5
+
+    uxtl            v19.8h, v5.8b
+    smull           v17.4s, v19.4h, v16.4h
+    smull2          v18.4s, v19.8h, v16.8h
+
+    uxtl            v21.8h, v1.8b
+    smull           v19.4s, v21.4h, v24.4h
+    smull2          v20.4s, v21.8h, v24.8h
+
+    movi            v16.8h, #58
+    uxtl            v23.8h, v2.8b
+    smull           v21.4s, v23.4h, v16.4h
+    smull2          v22.4s, v23.8h, v16.8h
+
+    movi            v24.8h, #10
+    uxtl            v1.8h, v6.8b
+    smull           v23.4s, v1.4h, v24.4h
+    smull2          v16.4s, v1.8h, v24.8h
+
+    sub             v17.4s, v17.4s, v19.4s
+    sub             v18.4s, v18.4s, v20.4s
+
+    uxtl            v1.8h, v3.8b
+    sshll           v19.4s, v1.4h, #2
+    sshll2          v20.4s, v1.8h, #2
+
+    add             v17.4s, v17.4s, v21.4s
+    add             v18.4s, v18.4s, v22.4s
+
+    uxtl            v1.8h, v4.8b
+    uxtl            v2.8h, v7.8b
+    ssubl           v21.4s, v1.4h, v2.4h
+    ssubl2          v22.4s, v1.8h, v2.8h
+
+    add             v17.4s, v17.4s, v19.4s
+    add             v18.4s, v18.4s, v20.4s
+    sub             v21.4s, v21.4s, v23.4s
+    sub             v22.4s, v22.4s, v16.4s
+    add             v17.4s, v17.4s, v21.4s
+    add             v18.4s, v18.4s, v22.4s
+.endm
+
+
+
+
+.macro vextin8
+    ld1             {v3.16b}, [x11], #16
+    mov             v7.d[0], v3.d[1]
+    ext             v0.8b, v3.8b, v7.8b, #1
+    ext             v4.8b, v3.8b, v7.8b, #2
+    ext             v1.8b, v3.8b, v7.8b, #3
+    ext             v5.8b, v3.8b, v7.8b, #4
+    ext             v2.8b, v3.8b, v7.8b, #5
+    ext             v6.8b, v3.8b, v7.8b, #6
+    ext             v3.8b, v3.8b, v7.8b, #7
+.endm
+
+
+
+// void interp_horiz_ps_c(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt)
+.macro HPS_FILTER a b filterhps
+    mov             w12, #8192
+    mov             w6, w10
+    sub             x3, x3, #\a
+    lsl             x3, x3, #1
+    mov             w9, #\a
+    cmp             w9, #4
+    b.eq            14f
+    cmp             w9, #12
+    b.eq            15f
+    b               7f
+14:
+    HPS_FILTER_4 \a \b \filterhps
+    b               10f
+15:
+    HPS_FILTER_12 \a \b \filterhps
+    b               10f
+7:
+    cmp             w5, #0
+    b.eq            8f
+    cmp             w5, #1
+    b.eq            9f
+8:
+loop1_hps_\filterhps\()_\a\()x\b\()_rowext0:
+    mov             w7, #\a
+    lsr             w7, w7, #3
+    mov             x11, x0
+    sub             x11, x11, #4
+loop2_hps_\filterhps\()_\a\()x\b\()_rowext0:
+    vextin8
+    \filterhps
+    dup             v16.4s, w12
+    sub             v17.4s, v17.4s, v16.4s
+    sub             v18.4s, v18.4s, v16.4s
+    xtn             v0.4h, v17.4s
+    xtn2            v0.8h, v18.4s
+    st1             {v0.8h}, [x2], #16
+    subs            w7, w7, #1
+    sub             x11, x11, #8
+    b.ne            loop2_hps_\filterhps\()_\a\()x\b\()_rowext0
+    subs            w6, w6, #1
+    add             x0, x0, x1
+    add             x2, x2, x3
+    b.ne            loop1_hps_\filterhps\()_\a\()x\b\()_rowext0
+    b               10f
+9:
+loop3_hps_\filterhps\()_\a\()x\b\()_rowext1:
+    mov             w7, #\a
+    lsr             w7, w7, #3
+    mov             x11, x0
+    sub             x11, x11, #4
+loop4_hps_\filterhps\()_\a\()x\b\()_rowext1:
+    vextin8
+    \filterhps
+    dup             v16.4s, w12
+    sub             v17.4s, v17.4s, v16.4s
+    sub             v18.4s, v18.4s, v16.4s
+    xtn             v0.4h, v17.4s
+    xtn2            v0.8h, v18.4s
+    st1             {v0.8h}, [x2], #16
+    subs            w7, w7, #1
+    sub             x11, x11, #8
+    b.ne            loop4_hps_\filterhps\()_\a\()x\b\()_rowext1
+    subs            w6, w6, #1
+    add             x0, x0, x1
+    add             x2, x2, x3
+    b.ne            loop3_hps_\filterhps\()_\a\()x\b\()_rowext1
+10:
+.endm
+
+.macro HPS_FILTER_4 w h filterhps
+    cmp             w5, #0
+    b.eq            11f
+    cmp             w5, #1
+    b.eq            12f
+11:
+loop4_hps_\filterhps\()_\w\()x\h\()_rowext0:
+    mov             x11, x0
+    sub             x11, x11, #4
+    vextin8
+    \filterhps
+    dup             v16.4s, w12
+    sub             v17.4s, v17.4s, v16.4s
+    xtn             v0.4h, v17.4s
+    st1             {v0.4h}, [x2], #8
+    sub             x11, x11, #8
+    subs            w6, w6, #1
+    add             x0, x0, x1
+    add             x2, x2, x3
+    b.ne            loop4_hps_\filterhps\()_\w\()x\h\()_rowext0
+    b               13f
+12:
+loop5_hps_\filterhps\()_\w\()x\h\()_rowext1:
+    mov             x11, x0
+    sub             x11, x11, #4
+    vextin8
+    \filterhps
+    dup             v16.4s, w12
+    sub             v17.4s, v17.4s, v16.4s
+    xtn             v0.4h, v17.4s
+    st1             {v0.4h}, [x2], #8
+    sub             x11, x11, #8
+    subs            w6, w6, #1
+    add             x0, x0, x1
+    add             x2, x2, x3
+    b.ne            loop5_hps_\filterhps\()_\w\()x\h\()_rowext1
+13:
+.endm
+
+.macro HPS_FILTER_12 w h filterhps
+    cmp             w5, #0
+    b.eq            14f
+    cmp             w5, #1
+    b.eq            15f
+14:
+loop12_hps_\filterhps\()_\w\()x\h\()_rowext0:
+    mov             x11, x0
+    sub             x11, x11, #4
+    vextin8
+    \filterhps
+    dup             v16.4s, w12
+    sub             v17.4s, v17.4s, v16.4s
+    sub             v18.4s, v18.4s, v16.4s
+    xtn             v0.4h, v17.4s
+    xtn2            v0.8h, v18.4s
+    st1             {v0.8h}, [x2], #16
+    sub             x11, x11, #8
+
+    vextin8
+    \filterhps
+    dup             v16.4s, w12
+    sub             v17.4s, v17.4s, v16.4s
+    xtn             v0.4h, v17.4s
+    st1             {v0.4h}, [x2], #8
+    add             x2, x2, x3
+    subs            w6, w6, #1
+    add             x0, x0, x1
+    b.ne            loop12_hps_\filterhps\()_\w\()x\h\()_rowext0
+    b               16f
+15:
+loop12_hps_\filterhps\()_\w\()x\h\()_rowext1:
+    mov             x11, x0
+    sub             x11, x11, #4
+    vextin8
+    \filterhps
+    dup             v16.4s, w12
+    sub             v17.4s, v17.4s, v16.4s
+    sub             v18.4s, v18.4s, v16.4s
+    xtn             v0.4h, v17.4s
+    xtn2            v0.8h, v18.4s
+    st1             {v0.8h}, [x2], #16
+    sub             x11, x11, #8
+
+    vextin8
+    \filterhps
+    dup             v16.4s, w12
+    sub             v17.4s, v17.4s, v16.4s
+    xtn             v0.4h, v17.4s
+    st1             {v0.4h}, [x2], #8
+    add             x2, x2, x3
+    subs            w6, w6, #1
+    add             x0, x0, x1
+    b.ne            loop12_hps_\filterhps\()_\w\()x\h\()_rowext1
+16:
+.endm
+
+// void interp_horiz_ps_c(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt)
+.macro LUMA_HPS w h
+function x265_interp_8tap_horiz_ps_\w\()x\h\()_neon
+    mov             w10, #\h
+    cmp             w5, #0
+    b.eq            6f
+    sub             x0, x0, x1, lsl #2
+
+    add             x0, x0, x1
+    add             w10, w10, #7
+6:
+    cmp             w4, #0
+    b.eq            0f
+    cmp             w4, #1
+    b.eq            1f
+    cmp             w4, #2
+    b.eq            2f
+    cmp             w4, #3
+    b.eq            3f
+0:
+    HPS_FILTER  \w \h qpel_filter_0_32b
+    b               5f
+1:
+    HPS_FILTER  \w \h qpel_filter_1_32b
+    b               5f
+2:
+    HPS_FILTER  \w \h qpel_filter_2_32b
+    b               5f
+3:
+    HPS_FILTER  \w \h qpel_filter_3_32b
+    b               5f
+5:
+    ret
+endfunc
+.endm
+
+LUMA_HPS    4 4
+LUMA_HPS    4 8
+LUMA_HPS    4 16
+LUMA_HPS    8 4
+LUMA_HPS    8 8
+LUMA_HPS    8 16
+LUMA_HPS    8 32
+LUMA_HPS    12 16
+LUMA_HPS    16 4
+LUMA_HPS    16 8
+LUMA_HPS    16 12
+LUMA_HPS    16 16
+LUMA_HPS    16 32
+LUMA_HPS    16 64
+LUMA_HPS    24 32
+LUMA_HPS    32 8
+LUMA_HPS    32 16
+LUMA_HPS    32 24
+LUMA_HPS    32 32
+LUMA_HPS    32 64
+LUMA_HPS    48 64
+LUMA_HPS    64 16
+LUMA_HPS    64 32
+LUMA_HPS    64 48
+LUMA_HPS    64 64
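
Reviewer note: for readers checking the NEON macros above against the scalar behaviour, the following is a minimal C sketch of what the 8-bit horizontal "ps" luma filter appears to compute. It is not the x265 C reference itself. The name `interp_horiz_ps_ref`, the table name `luma_filter_ref`, and the explicit `width`/`height` parameters are assumptions made for illustration (the assembly encodes the block size in the symbol name). The coefficient rows follow the constants loaded in `qpel_filter_0_32b` through `qpel_filter_3_32b`, and the constant 8192 matches `mov w12, #8192`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 8-tap luma coefficients per coeffIdx, as used by the qpel_filter_N_32b
 * macros: each row sums to 64. */
static const int16_t luma_filter_ref[4][8] = {
    {  0, 0,   0, 64,  0,   0, 0,  0 },
    { -1, 4, -10, 58, 17,  -5, 1,  0 },
    { -1, 4, -11, 40, 40, -11, 4, -1 },
    {  0, 1,  -5, 17, 58, -10, 4, -1 },
};

/* Hypothetical scalar model of x265_interp_8tap_horiz_ps_WxH_neon for
 * 8-bit pixels. dst_stride is counted in int16_t elements. */
void interp_horiz_ps_ref(const uint8_t *src, ptrdiff_t src_stride,
                         int16_t *dst, ptrdiff_t dst_stride,
                         int width, int height,
                         int coeff_idx, int is_row_ext)
{
    assert(coeff_idx >= 0 && coeff_idx < 4);
    const int16_t *c = luma_filter_ref[coeff_idx];
    const int offset = 8192;   /* internal offset subtracted from every sum */
    int rows = height;

    /* With row extension, start 3 rows above the block and emit 7 extra
     * rows, mirroring the pointer and counter adjustment in LUMA_HPS. */
    if (is_row_ext) {
        src -= 3 * src_stride;
        rows += 7;
    }

    for (int y = 0; y < rows; y++) {
        for (int x = 0; x < width; x++) {
            int sum = 0;
            /* Taps cover src[x - 3] .. src[x + 4]. */
            for (int k = 0; k < 8; k++)
                sum += c[k] * src[x + k - 3];
            dst[x] = (int16_t)(sum - offset);
        }
        src += src_stride;
        dst += dst_stride;
    }
}
```

Because every coefficient row sums to 64, a flat input of value p yields 64 * p - 8192 at every output position, which gives a quick sanity check when comparing against the assembly.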
x265_3.4.tar.gz/source/common/aarch64/ipfilter8.h Added
@@ -0,0 +1,55 @@
+/*****************************************************************************
+ * Copyright (C) 2020 MulticoreWare, Inc
+ *
+ * Authors: Yimeng Su <yimeng.su@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA.
+ *
+ * This program is also available under a commercial proprietary license.
+ * For more information, contact us at license @ x265.com.
+ *****************************************************************************/
+
+#ifndef X265_IPFILTER8_AARCH64_H
+#define X265_IPFILTER8_AARCH64_H
+
+
+void x265_interp_8tap_horiz_ps_4x4_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
+void x265_interp_8tap_horiz_ps_4x8_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
+void x265_interp_8tap_horiz_ps_4x16_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
+void x265_interp_8tap_horiz_ps_8x4_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
+void x265_interp_8tap_horiz_ps_8x8_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
+void x265_interp_8tap_horiz_ps_8x16_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
+void x265_interp_8tap_horiz_ps_8x32_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
+void x265_interp_8tap_horiz_ps_12x16_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
+void x265_interp_8tap_horiz_ps_16x4_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
+void x265_interp_8tap_horiz_ps_16x8_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
+void x265_interp_8tap_horiz_ps_16x12_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
+void x265_interp_8tap_horiz_ps_16x16_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
+void x265_interp_8tap_horiz_ps_16x32_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
+void x265_interp_8tap_horiz_ps_16x64_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
+void x265_interp_8tap_horiz_ps_24x32_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
+void x265_interp_8tap_horiz_ps_32x8_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
+void x265_interp_8tap_horiz_ps_32x16_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
+void x265_interp_8tap_horiz_ps_32x24_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
+void x265_interp_8tap_horiz_ps_32x32_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
+void x265_interp_8tap_horiz_ps_32x64_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
+void x265_interp_8tap_horiz_ps_48x64_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
+void x265_interp_8tap_horiz_ps_64x16_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
+void x265_interp_8tap_horiz_ps_64x32_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
+void x265_interp_8tap_horiz_ps_64x48_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
+void x265_interp_8tap_horiz_ps_64x64_neon(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx, int isRowExt);
+
+
+#endif // ifndef X265_IPFILTER8_AARCH64_H
x265_3.4.tar.gz/source/common/aarch64/mc-a.S Added
@@ -0,0 +1,63 @@
+/*****************************************************************************
+ * Copyright (C) 2020 MulticoreWare, Inc
+ *
+ * Authors: Hongbin Liu <liuhongbin1@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA.
+ *
+ * This program is also available under a commercial proprietary license.
+ * For more information, contact us at license @ x265.com.
+ *****************************************************************************/
+
+#include "asm.S"
+
+.section .rodata
+
+.align 4
+
+.text
+
+.macro pixel_avg_pp_4xN_neon h
+function x265_pixel_avg_pp_4x\h\()_neon
+.rept \h
+    ld1             {v0.s}[0], [x2], x3
+    ld1             {v1.s}[0], [x4], x5
+    urhadd          v2.8b, v0.8b, v1.8b
+    st1             {v2.s}[0], [x0], x1
+.endr
+    ret
+endfunc
+.endm
+
+pixel_avg_pp_4xN_neon 4
+pixel_avg_pp_4xN_neon 8
+pixel_avg_pp_4xN_neon 16
+
+.macro pixel_avg_pp_8xN_neon h
+function x265_pixel_avg_pp_8x\h\()_neon
+.rept \h
+    ld1             {v0.8b}, [x2], x3
+    ld1             {v1.8b}, [x4], x5
+    urhadd          v2.8b, v0.8b, v1.8b
+    st1             {v2.8b}, [x0], x1
+.endr
+    ret
+endfunc
+.endm
+
+pixel_avg_pp_8xN_neon 4
+pixel_avg_pp_8xN_neon 8
+pixel_avg_pp_8xN_neon 16
+pixel_avg_pp_8xN_neon 32
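
Reviewer note: the `urhadd` instruction used by both macros above is an unsigned rounding halving add, so each output pixel should equal (a + b + 1) >> 1. A minimal C sketch of that behaviour follows, for cross-checking only. The function name `pixel_avg_pp_ref` and the explicit `width`/`height` parameters are assumptions for illustration; the argument order (dst, dst stride, src0, stride0, src1, stride1) is inferred from the registers x0 through x5 used in the assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of x265_pixel_avg_pp_WxH_neon for 8-bit pixels:
 * a rounded average of two source blocks, written row by row. */
void pixel_avg_pp_ref(uint8_t *dst, ptrdiff_t dstride,
                      const uint8_t *src0, ptrdiff_t sstride0,
                      const uint8_t *src1, ptrdiff_t sstride1,
                      int width, int height)
{
    assert(width > 0 && height > 0);

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            /* Same arithmetic as URHADD: widen, add 1 for rounding, halve. */
            dst[x] = (uint8_t)((src0[x] + src1[x] + 1) >> 1);
        }
        dst  += dstride;
        src0 += sstride0;
        src1 += sstride1;
    }
}
```

The intermediate sum is computed in `int`, so 255 + 255 + 1 does not overflow before the shift, which matches the widened arithmetic `urhadd` performs internally.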
x265_3.4.tar.gz/source/common/aarch64/pixel-util.S Added
@@ -0,0 +1,419 @@
+/*****************************************************************************
+ * Copyright (C) 2020 MulticoreWare, Inc
+ *
+ * Authors: Yimeng Su <yimeng.su@huawei.com>
+ *          Hongbin Liu <liuhongbin1@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA.
+ *
+ * This program is also available under a commercial proprietary license.
+ * For more information, contact us at license @ x265.com.
+ *****************************************************************************/
+
+#include "asm.S"
+
+.section .rodata
+
+.align 4
+
+.text
+
+.macro x265_satd_4x8_8x4_end_neon
+    add             v0.8h, v4.8h, v6.8h
+    add             v1.8h, v5.8h, v7.8h
+    sub             v2.8h, v4.8h, v6.8h
+    sub             v3.8h, v5.8h, v7.8h
+
+    trn1            v16.8h, v0.8h, v1.8h
+    trn2            v17.8h, v0.8h, v1.8h
+    add             v4.8h, v16.8h, v17.8h
+    trn1            v18.8h, v2.8h, v3.8h
+    trn2            v19.8h, v2.8h, v3.8h
+    sub             v5.8h, v16.8h, v17.8h
+    add             v6.8h, v18.8h, v19.8h
+    sub             v7.8h, v18.8h, v19.8h
+    trn1            v0.4s, v4.4s, v6.4s
+    trn2            v2.4s, v4.4s, v6.4s
+    abs             v0.8h, v0.8h
+    trn1            v1.4s, v5.4s, v7.4s
+    trn2            v3.4s, v5.4s, v7.4s
+    abs             v2.8h, v2.8h
+    abs             v1.8h, v1.8h
+    abs             v3.8h, v3.8h
+    umax            v0.8h, v0.8h, v2.8h
+    umax            v1.8h, v1.8h, v3.8h
+    add             v0.8h, v0.8h, v1.8h
+    uaddlv          s0, v0.8h
+.endm
+
+.macro pixel_satd_4x8_neon
+    ld1r             {v1.2s}, [x2], x3
+    ld1r            {v0.2s}, [x0], x1
+    ld1r            {v3.2s}, [x2], x3
+    ld1r            {v2.2s}, [x0], x1
+    ld1r            {v5.2s}, [x2], x3
+    ld1r            {v4.2s}, [x0], x1
+    ld1r            {v7.2s}, [x2], x3
+    ld1r            {v6.2s}, [x0], x1
+
+    ld1             {v1.s}[1], [x2], x3
+    ld1             {v0.s}[1], [x0], x1
+    usubl           v0.8h, v0.8b, v1.8b
+    ld1             {v3.s}[1], [x2], x3
+    ld1             {v2.s}[1], [x0], x1
+    usubl           v1.8h, v2.8b, v3.8b
+    ld1             {v5.s}[1], [x2], x3
+    ld1             {v4.s}[1], [x0], x1
+    usubl           v2.8h, v4.8b, v5.8b
+    ld1             {v7.s}[1], [x2], x3
+    add             v4.8h, v0.8h, v1.8h
+    sub             v5.8h, v0.8h, v1.8h
+    ld1             {v6.s}[1], [x0], x1
+    usubl           v3.8h, v6.8b, v7.8b
+    add         v6.8h, v2.8h, v3.8h
+    sub         v7.8h, v2.8h, v3.8h
+    x265_satd_4x8_8x4_end_neon
+.endm
+
+// template<int w, int h>
+// int satd4(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2)
+function x265_pixel_satd_4x8_neon
+    pixel_satd_4x8_neon
+    mov               w0, v0.s[0]
+    ret
+endfunc
+
+// template<int w, int h>
+// int satd4(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2)
+function x265_pixel_satd_4x16_neon
+    eor             w4, w4, w4
+    pixel_satd_4x8_neon
+    mov               w5, v0.s[0]
+    add             w4, w4, w5
+    pixel_satd_4x8_neon
+    mov               w5, v0.s[0]
+    add             w0, w5, w4
+    ret
+endfunc
+
+// template<int w, int h>
+// int satd4(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2)
+function x265_pixel_satd_4x32_neon
+    eor             w4, w4, w4
+.rept 4
+    pixel_satd_4x8_neon
+    mov             w5, v0.s[0]
+    add             w4, w4, w5
+.endr
+    mov             w0, w4
+    ret
+endfunc
+
+// template<int w, int h>
+// int satd4(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2)
+function x265_pixel_satd_12x16_neon
+    mov             x4, x0
+    mov             x5, x2
+    eor             w7, w7, w7
+    pixel_satd_4x8_neon
+    mov             w6, v0.s[0]
+    add             w7, w7, w6
+    pixel_satd_4x8_neon
+    mov             w6, v0.s[0]
+    add             w7, w7, w6
+
+    add             x0, x4, #4
+    add             x2, x5, #4
+    pixel_satd_4x8_neon
+    mov             w6, v0.s[0]
+    add             w7, w7, w6
+    pixel_satd_4x8_neon
+    mov             w6, v0.s[0]
+    add             w7, w7, w6
+
+    add             x0, x4, #8
+    add             x2, x5, #8
+    pixel_satd_4x8_neon
+    mov             w6, v0.s[0]
+    add             w7, w7, w6
+    pixel_satd_4x8_neon
+    mov             w6, v0.s[0]
+    add             w0, w7, w6
+    ret
+endfunc
+
+// template<int w, int h>
+// int satd4(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2)
+function x265_pixel_satd_12x32_neon
+    mov             x4, x0
+    mov             x5, x2
+    eor             w7, w7, w7
+.rept 4
+    pixel_satd_4x8_neon
+    mov             w6, v0.s[0]
+    add             w7, w7, w6
+.endr
+
+    add             x0, x4, #4
+    add             x2, x5, #4
+.rept 4
+    pixel_satd_4x8_neon
+    mov             w6, v0.s[0]
+    add             w7, w7, w6
+.endr
+
+    add             x0, x4, #8
+    add             x2, x5, #8
+.rept 4
+    pixel_satd_4x8_neon
+    mov             w6, v0.s[0]
+    add             w7, w7, w6
+.endr
+
+    mov             w0, w7
+    ret
+endfunc
+
+// template<int w, int h>
+// int satd4(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2)
+function x265_pixel_satd_8x8_neon
+    eor             w4, w4, w4
+    mov             x6, x0
+    mov             x7, x2
+    pixel_satd_4x8_neon
+    mov             w5, v0.s[0]
+    add             w4, w4, w5
+    add             x0, x6, #4
+    add             x2, x7, #4
+    pixel_satd_4x8_neon
+    mov             w5, v0.s[0]
+    add             w0, w4, w5
+    ret
+endfunc
+
+// int psyCost_pp(const pixel* source, intptr_t sstride, const pixel* recon, intptr_t rstride)
+function x265_psyCost_4x4_neon
+    ld1r            {v4.2s}, [x0], x1
+    ld1r            {v5.2s}, [x0], x1
+    ld1             {v4.s}[1], [x0], x1
+    ld1             {v5.s}[1], [x0], x1
+
+    ld1r            {v6.2s}, [x2], x3
+    ld1r            {v7.2s}, [x2], x3
+    ld1             {v6.s}[1], [x2], x3
+    ld1             {v7.s}[1], [x2], x3
+
+    uaddl           v2.8h, v4.8b, v5.8b
+    usubl           v3.8h, v4.8b, v5.8b
+    uaddl           v18.8h, v6.8b, v7.8b
+    usubl           v19.8h, v6.8b, v7.8b
+
+    mov             v20.d[0], v2.d[1]
+    add             v0.4h, v2.4h, v20.4h
+    sub             v1.4h, v2.4h, v20.4h
+    mov             v21.d[0], v3.d[1]
+    add             v22.4h, v3.4h, v21.4h
+    sub             v23.4h, v3.4h, v21.4h
+
+    mov             v24.d[0], v18.d[1]
+    add             v16.4h, v18.4h, v24.4h
+    sub             v17.4h, v18.4h, v24.4h
+    mov             v25.d[0], v19.d[1]
+    add             v26.4h, v19.4h, v25.4h
+    sub             v27.4h, v19.4h, v25.4h
+
+    mov             v0.d[1], v22.d[0]
+    mov             v1.d[1], v23.d[0]
+    trn1            v22.8h, v0.8h, v1.8h
+    trn2            v23.8h, v0.8h, v1.8h
+    mov             v16.d[1], v26.d[0]
+    mov             v17.d[1], v27.d[0]
+    trn1            v26.8h, v16.8h, v17.8h
+    trn2            v27.8h, v16.8h, v17.8h
+
+    add             v2.8h, v22.8h, v23.8h
+    sub             v3.8h, v22.8h, v23.8h
+    add             v18.8h, v26.8h, v27.8h
+    sub             v19.8h, v26.8h, v27.8h
+
+    uaddl           v20.8h, v4.8b, v5.8b
+    uaddl           v21.8h, v6.8b, v7.8b
+
+    trn1            v0.4s, v2.4s, v3.4s
+    trn2            v1.4s, v2.4s, v3.4s
+    trn1            v16.4s, v18.4s, v19.4s
+    trn2            v17.4s, v18.4s, v19.4s
+    abs             v0.8h, v0.8h
+    abs             v16.8h, v16.8h
+    abs             v1.8h, v1.8h
+    abs             v17.8h, v17.8h
+
+    uaddlv          s20, v20.8h
+    uaddlv          s21, v21.8h
+    mov             v20.s[1], v21.s[0]
+
+    smax            v0.8h, v0.8h, v1.8h
+    smax            v16.8h, v16.8h, v17.8h
+
+    trn1            v4.2d, v0.2d, v16.2d
+    trn2            v5.2d, v0.2d, v16.2d
+    add             v0.8h, v4.8h, v5.8h
+    mov             v4.d[0], v0.d[1]
+    uaddlv          s0, v0.4h
+    uaddlv          s4, v4.4h
+
+    ushr            v20.2s, v20.2s, #2
+    mov             v0.s[1], v4.s[0]
+    sub             v0.2s, v0.2s, v20.2s
+    mov             w0, v0.s[0]
+    mov             w1, v0.s[1]
+    subs            w0, w0, w1
+    cneg            w0, w0, mi
+
+    ret
+endfunc
+
+// uint32_t quant_c(const int16_t* coef, const int32_t* quantCoeff, int32_t* deltaU, int16_t* qCoef, int qBits, int add, int numCoeff)
+function x265_quant_neon
+    mov             w9, #1
+    lsl             w9, w9, w4
+    dup             v0.2s, w9
+    neg             w9, w4
+    dup             v1.4s, w9
+    add             w9, w9, #8
+    dup             v2.4s, w9
+    dup             v3.4s, w5
+
+    lsr             w6, w6, #2
+    eor             v4.16b, v4.16b, v4.16b
+    eor             w10, w10, w10
+    eor             v17.16b, v17.16b, v17.16b
+
+.loop_quant:
+
+    ld1             {v18.4h}, [x0], #8
+    ld1             {v7.4s}, [x1], #16
+    sxtl            v6.4s, v18.4h
+
+    cmlt            v5.4s, v6.4s, #0
+
+    abs             v6.4s, v6.4s
+
+
+    mul             v6.4s, v6.4s, v7.4s
+
+    add             v7.4s, v6.4s, v3.4s
+    sshl            v7.4s, v7.4s, v1.4s
+
+    mls             v6.4s, v7.4s, v0.s[0]
+    sshl            v16.4s, v6.4s, v2.4s
+    st1             {v16.4s}, [x2], #16
+
+    // numsig
+    cmeq            v16.4s, v7.4s, v17.4s
+    add             v4.4s, v4.4s, v16.4s
+    add             w10, w10, #4
+
+    // level *= sign
+    eor             v16.16b, v7.16b, v5.16b
+    sub             v16.4s, v16.4s, v5.4s
+    sqxtn           v5.4h, v16.4s
+    st1             {v5.4h}, [x3], #8
+
+    subs            w6, w6, #1
+    b.ne             .loop_quant
+
+    addv            s4, v4.4s
+    mov             w9, v4.s[0]
+    add             w0, w10, w9
+    ret
+endfunc
+
345
+.macro satd_4x4_neon
+    ld1             {v1.s}[0], [x2], x3
+    ld1             {v0.s}[0], [x0], x1
+    ld1             {v3.s}[0], [x2], x3
+    ld1             {v2.s}[0], [x0], x1
+
+    ld1             {v1.s}[1], [x2], x3
+    ld1             {v0.s}[1], [x0], x1
+    ld1             {v3.s}[1], [x2], x3
+    ld1             {v2.s}[1], [x0], x1
+
+    usubl           v4.8h, v0.8b, v1.8b
+    usubl           v5.8h, v2.8b, v3.8b
+
+    add             v6.8h, v4.8h, v5.8h
+    sub             v7.8h, v4.8h, v5.8h
+
+    mov             v4.d[0], v6.d[1]
+    add             v0.8h, v6.8h, v4.8h
+    sub             v2.8h, v6.8h, v4.8h
+
+    mov             v5.d[0], v7.d[1]
+    add             v1.8h, v7.8h, v5.8h
+    sub             v3.8h, v7.8h, v5.8h
+
+    trn1            v4.4h, v0.4h, v1.4h
+    trn2            v5.4h, v0.4h, v1.4h
+
+    trn1            v6.4h, v2.4h, v3.4h
+    trn2            v7.4h, v2.4h, v3.4h
+
+    add             v0.4h, v4.4h, v5.4h
+    sub             v1.4h, v4.4h, v5.4h
+
+    add             v2.4h, v6.4h, v7.4h
+    sub             v3.4h, v6.4h, v7.4h
+
+    trn1            v4.2s, v0.2s, v1.2s
+    trn2            v5.2s, v0.2s, v1.2s
+
+    trn1            v6.2s, v2.2s, v3.2s
+    trn2            v7.2s, v2.2s, v3.2s
+
+    abs             v4.4h, v4.4h
+    abs             v5.4h, v5.4h
+    abs             v6.4h, v6.4h
+    abs             v7.4h, v7.4h
+
+    smax            v1.4h, v4.4h, v5.4h
+    smax            v2.4h, v6.4h, v7.4h
+
+    add             v0.4h, v1.4h, v2.4h
+    uaddlp          v0.2s, v0.4h
+    uaddlp          v0.1d, v0.2s
+.endm
+
+// int satd_4x4(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2)
+function x265_pixel_satd_4x4_neon
+    satd_4x4_neon
+    umov            x0, v0.d[0]
+    ret
+endfunc
+
+// int satd_8x4(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2)
+function x265_pixel_satd_8x4_neon
+    mov             x4, x0
+    mov             x5, x2
+    satd_4x4_neon
+    add             x0, x4, #4
+    add             x2, x5, #4
+    umov            x6, v0.d[0]
+    satd_4x4_neon
+    umov            x0, v0.d[0]
+    add             x0, x0, x6
+    ret
+endfunc
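Reviewer note: to check the NEON SATD routines against a scalar model, the following is a minimal C sketch of the 4x4 and 8x4 cases. It assumes an 8-bit build (`pixel` as `uint8_t`); the `_ref` function names are hypothetical and not part of x265. The assembly appears to replace the last butterfly stage with `smax` on absolute values, relying on |a+b| + |a-b| = 2*max(|a|, |b|), which would explain why it has no final halving step while the scalar form below does.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint8_t pixel; /* assumption: 8-bit build, one byte per sample */

/* Scalar SATD of one 4x4 block: Hadamard-transform the residual and
 * return half the sum of the absolute transformed values. */
static int satd_4x4_ref(const pixel *pix1, intptr_t stride_pix1,
                        const pixel *pix2, intptr_t stride_pix2)
{
    int d[4][4];
    int sum = 0;

    /* Residual, then a horizontal 4-point Hadamard on each row. */
    for (int y = 0; y < 4; y++) {
        int a0 = pix1[0] - pix2[0];
        int a1 = pix1[1] - pix2[1];
        int a2 = pix1[2] - pix2[2];
        int a3 = pix1[3] - pix2[3];
        int s01 = a0 + a1, d01 = a0 - a1;
        int s23 = a2 + a3, d23 = a2 - a3;
        d[y][0] = s01 + s23;
        d[y][1] = d01 + d23;
        d[y][2] = s01 - s23;
        d[y][3] = d01 - d23;
        pix1 += stride_pix1;
        pix2 += stride_pix2;
    }

    /* Vertical 4-point Hadamard on each column, summing magnitudes. */
    for (int x = 0; x < 4; x++) {
        int s01 = d[0][x] + d[1][x], d01 = d[0][x] - d[1][x];
        int s23 = d[2][x] + d[3][x], d23 = d[2][x] - d[3][x];
        sum += abs(s01 + s23) + abs(d01 + d23)
             + abs(s01 - s23) + abs(d01 - d23);
    }
    return sum >> 1;
}

/* 8x4 is handled as two side-by-side 4x4 blocks, mirroring the
 * "add x0, x4, #4" / "add x2, x5, #4" step in the NEON version. */
static int satd_8x4_ref(const pixel *pix1, intptr_t stride_pix1,
                        const pixel *pix2, intptr_t stride_pix2)
{
    return satd_4x4_ref(pix1, stride_pix1, pix2, stride_pix2)
         + satd_4x4_ref(pix1 + 4, stride_pix1, pix2 + 4, stride_pix2);
}
```

Comparing these against `x265_pixel_satd_4x4_neon` and `x265_pixel_satd_8x4_neon` on random blocks would be one way to sanity-check the port.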
x265_3.4.tar.gz/source/common/aarch64/pixel-util.h Added
@@ -0,0 +1,40 @@
+/*****************************************************************************
+ * Copyright (C) 2020 MulticoreWare, Inc
+ *
+ * Authors: Yimeng Su <yimeng.su@huawei.com>
+ *          Hongbin Liu <liuhongbin1@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA.
+ *
+ * This program is also available under a commercial proprietary license.
+ * For more information, contact us at license @ x265.com.
+ *****************************************************************************/
+
+#ifndef X265_PIXEL_UTIL_AARCH64_H
+#define X265_PIXEL_UTIL_AARCH64_H
+
+int x265_pixel_satd_4x4_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_4x8_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_4x16_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_4x32_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_8x4_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_8x8_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_12x16_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+int x265_pixel_satd_12x32_neon(const pixel* pix1, intptr_t stride_pix1, const pixel* pix2, intptr_t stride_pix2);
+
+uint32_t x265_quant_neon(const int16_t* coef, const int32_t* quantCoeff, int32_t* deltaU, int16_t* qCoef, int qBits, int add, int numCoeff);
+int PFX(psyCost_4x4_neon)(const pixel* source, intptr_t sstride, const pixel* recon, intptr_t rstride);
+
+#endif // ifndef X265_PIXEL_UTIL_AARCH64_H
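Reviewer note: the `x265_quant_neon` prototype above follows the `quant_c` signature quoted in the assembly comment. As a rough scalar model of what the NEON loop seems to compute (multiply, round with `add`, shift by `qBits`, record the remainder in `deltaU`, restore the sign, saturate to 16 bits, count non-zero levels), here is a minimal C sketch. The name `quant_ref` is hypothetical, and the sketch assumes `qBits >= 8` and an arithmetic right shift for negative values, as `sshl` with a negative shift count provides on AArch64.

```c
#include <stdint.h>
#include <stdlib.h>

/* Scalar quantisation sketch. Returns the number of non-zero levels. */
static uint32_t quant_ref(const int16_t *coef, const int32_t *quantCoeff,
                          int32_t *deltaU, int16_t *qCoef,
                          int qBits, int add, int numCoeff)
{
    const int qBits8 = qBits - 8; /* assumption: qBits >= 8 */
    uint32_t numSig = 0;

    for (int i = 0; i < numCoeff; i++) {
        int level = coef[i];
        int sign = level < 0 ? -1 : 1;

        /* Scale the magnitude, then round and shift down. */
        int tmplevel = abs(level) * quantCoeff[i];
        level = (tmplevel + add) >> qBits;

        /* Remainder after quantisation, kept at 8 fractional bits.
         * Negative when the level was rounded up. */
        deltaU[i] = (tmplevel - (level << qBits)) >> qBits8;

        if (level)
            numSig++;

        /* Restore the sign and saturate, like sqxtn in the NEON code. */
        level *= sign;
        if (level > 32767)
            level = 32767;
        if (level < -32768)
            level = -32768;
        qCoef[i] = (int16_t)level;
    }
    return numSig;
}
```

The NEON version processes four coefficients per iteration (`lsr w6, w6, #2`), so it presumably expects `numCoeff` to be a multiple of four; the scalar sketch has no such restriction.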
x265_3.4.tar.gz/source/common/aarch64/pixel.h Added
@@ -0,0 +1,105 @@
+/*****************************************************************************
+ * Copyright (C) 2020 MulticoreWare, Inc
+ *
+ * Authors: Hongbin Liu <liuhongbin1@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA.
+ *
+ * This program is also available under a commercial proprietary license.
+ * For more information, contact us at license @ x265.com.
+ *****************************************************************************/
+
+#ifndef X265_I386_PIXEL_AARCH64_H
+#define X265_I386_PIXEL_AARCH64_H
+
+void x265_pixel_avg_pp_4x4_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
+void x265_pixel_avg_pp_4x8_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
+void x265_pixel_avg_pp_4x16_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
+void x265_pixel_avg_pp_8x4_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
+void x265_pixel_avg_pp_8x8_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
+void x265_pixel_avg_pp_8x16_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
+void x265_pixel_avg_pp_8x32_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
+void x265_pixel_avg_pp_12x16_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
+void x265_pixel_avg_pp_16x4_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
+void x265_pixel_avg_pp_16x8_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
+void x265_pixel_avg_pp_16x12_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
+void x265_pixel_avg_pp_16x16_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
+void x265_pixel_avg_pp_16x32_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
+void x265_pixel_avg_pp_16x64_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
+void x265_pixel_avg_pp_24x32_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
+void x265_pixel_avg_pp_32x8_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
+void x265_pixel_avg_pp_32x16_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
+void x265_pixel_avg_pp_32x24_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
+void x265_pixel_avg_pp_32x32_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
+void x265_pixel_avg_pp_32x64_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
+void x265_pixel_avg_pp_48x64_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
+void x265_pixel_avg_pp_64x16_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
+void x265_pixel_avg_pp_64x32_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
+void x265_pixel_avg_pp_64x48_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
+void x265_pixel_avg_pp_64x64_neon (pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int);
+
+void x265_sad_x3_4x4_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
+void x265_sad_x3_4x8_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
+void x265_sad_x3_4x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
+void x265_sad_x3_8x4_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
+void x265_sad_x3_8x8_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
+void x265_sad_x3_8x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
+void x265_sad_x3_8x32_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
+void x265_sad_x3_12x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
+void x265_sad_x3_16x4_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
+void x265_sad_x3_16x8_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
+void x265_sad_x3_16x12_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
+void x265_sad_x3_16x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
+void x265_sad_x3_16x32_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
+void x265_sad_x3_16x64_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
+void x265_sad_x3_24x32_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
+void x265_sad_x3_32x8_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
+void x265_sad_x3_32x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
+void x265_sad_x3_32x24_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
+void x265_sad_x3_32x32_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
+void x265_sad_x3_32x64_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
+void x265_sad_x3_48x64_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
+void x265_sad_x3_64x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
+void x265_sad_x3_64x32_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
+void x265_sad_x3_64x48_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
+void x265_sad_x3_64x64_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, intptr_t frefstride, int32_t* res);
+
+void x265_sad_x4_4x4_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
+void x265_sad_x4_4x8_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
+void x265_sad_x4_4x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
+void x265_sad_x4_8x4_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
+void x265_sad_x4_8x8_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
+void x265_sad_x4_8x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
+void x265_sad_x4_8x32_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
+void x265_sad_x4_12x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
+void x265_sad_x4_16x4_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
+void x265_sad_x4_16x8_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
+void x265_sad_x4_16x12_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
+void x265_sad_x4_16x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
+void x265_sad_x4_16x32_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
+void x265_sad_x4_16x64_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
+void x265_sad_x4_24x32_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
+void x265_sad_x4_32x8_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
+void x265_sad_x4_32x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
+void x265_sad_x4_32x24_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
+void x265_sad_x4_32x32_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
+void x265_sad_x4_32x64_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
+void x265_sad_x4_48x64_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
+void x265_sad_x4_64x16_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
+void x265_sad_x4_64x32_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
+void x265_sad_x4_64x48_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
+void x265_sad_x4_64x64_neon(const pixel* fenc, const pixel* fref0, const pixel* fref1, const pixel* fref2, const pixel* fref3, intptr_t frefstride, int32_t* res);
+
+#endif // ifndef X265_I386_PIXEL_AARCH64_H
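Reviewer note: the `x265_pixel_avg_pp_*_neon` prototypes above all share one signature and differ only in block size. As a scalar model of the operation, here is a minimal C sketch of a rounded average of two prediction blocks. It assumes an 8-bit `pixel`, passes width and height at run time instead of baking them into the function name, and omits the trailing unnamed `int` argument, which is treated here as unused. The name `pixel_avg_pp_ref` is hypothetical.

```c
#include <stdint.h>

typedef uint8_t pixel; /* assumption: 8-bit build, one byte per sample */

/* Rounded average of two blocks: dst = (src0 + src1 + 1) >> 1. */
static void pixel_avg_pp_ref(pixel *dst, intptr_t dstride,
                             const pixel *src0, intptr_t sstride0,
                             const pixel *src1, intptr_t sstride1,
                             int width, int height)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            dst[x] = (pixel)((src0[x] + src1[x] + 1) >> 1);
        dst += dstride;   /* each buffer advances by its own stride */
        src0 += sstride0;
        src1 += sstride1;
    }
}
```

A call such as `pixel_avg_pp_ref(dst, 16, a, 16, b, 16, 8, 8)` would then correspond to the 8x8 entry in the list above.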
x265_3.4.tar.gz/source/common/aarch64/sad-a.S Added
@@ -0,0 +1,105 @@
+/*****************************************************************************
+ * Copyright (C) 2020 MulticoreWare, Inc
+ *
+ * Authors: Hongbin Liu <liuhongbin1@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA.
+ *
+ * This program is also available under a commercial proprietary license.
+ * For more information, contact us at license @ x265.com.
+ *****************************************************************************/
+
+#include "asm.S"
+
+.section .rodata
+
+.align 4
+
+.text
+
+.macro SAD_X_START_8 x
+    ld1             {v0.8b}, [x0], x9
+.if \x == 3
+    ld1             {v1.8b}, [x1], x4
+    ld1             {v2.8b}, [x2], x4
+    ld1             {v3.8b}, [x3], x4
+.elseif \x == 4
+    ld1             {v1.8b}, [x1], x5
+    ld1             {v2.8b}, [x2], x5
+    ld1             {v3.8b}, [x3], x5
+    ld1             {v4.8b}, [x4], x5
+.endif
+    uabdl           v16.8h, v0.8b, v1.8b
+    uabdl           v17.8h, v0.8b, v2.8b
+    uabdl           v18.8h, v0.8b, v3.8b
+.if \x == 4
+    uabdl           v19.8h, v0.8b, v4.8b
+.endif
+.endm
+
+.macro SAD_X_8 x
+    ld1             {v0.8b}, [x0], x9
+.if \x == 3
+    ld1             {v1.8b}, [x1], x4
+    ld1             {v2.8b}, [x2], x4
+    ld1             {v3.8b}, [x3], x4
+.elseif \x == 4
+    ld1             {v1.8b}, [x1], x5
+    ld1             {v2.8b}, [x2], x5
+    ld1             {v3.8b}, [x3], x5
+    ld1             {v4.8b}, [x4], x5
+.endif
+    uabal           v16.8h, v0.8b, v1.8b
+    uabal           v17.8h, v0.8b, v2.8b
+    uabal           v18.8h, v0.8b, v3.8b
+.if \x == 4
+    uabal           v19.8h, v0.8b, v4.8b
+.endif
+.endm
+
+.macro SAD_X_8xN x, h
+function x265_sad_x\x\()_8x\h\()_neon
+    mov             x9, #FENC_STRIDE
+    SAD_X_START_8 \x
+.rept \h - 1
+    SAD_X_8 \x
+.endr
+    uaddlv          s0, v16.8h
+    uaddlv          s1, v17.8h
+    uaddlv          s2, v18.8h
+.if \x == 4
+    uaddlv          s3, v19.8h
+.endif
+
+.if \x == 3
+    stp             s0, s1, [x5]
+    str             s2, [x5, #8]
+.elseif \x == 4
+    stp             s0, s1, [x6]
+    stp             s2, s3, [x6, #8]
+.endif
+    ret
+endfunc
+.endm
+
+SAD_X_8xN 3 4
+SAD_X_8xN 3 8
+SAD_X_8xN 3 16
+SAD_X_8xN 3 32
+
+SAD_X_8xN 4 4
+SAD_X_8xN 4 8
+SAD_X_8xN 4 16
+SAD_X_8xN 4 32
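Reviewer note: the `SAD_X_8xN` macro above compares one source block against three or four reference candidates in a single pass, with the source rows stepped by `FENC_STRIDE` and all references sharing `frefstride`. As a scalar model, here is a minimal C sketch of the three-candidate case. It assumes an 8-bit `pixel` and that `FENC_STRIDE` is 64 (the real value comes from x265's own headers, which this diff does not show); the `_ref` names and the run-time width and height parameters are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint8_t pixel;  /* assumption: 8-bit build, one byte per sample */
#define FENC_STRIDE 64  /* assumption: fixed stride of the source block */

/* Sum of absolute differences between a source block and one reference. */
static int32_t sad_ref(const pixel *fenc, const pixel *fref,
                       intptr_t frefstride, int width, int height)
{
    int32_t sum = 0;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            sum += abs(fenc[x] - fref[x]);
        fenc += FENC_STRIDE; /* source rows use the fixed stride */
        fref += frefstride;  /* reference rows use the caller's stride */
    }
    return sum;
}

/* Three candidates at once; res[0..2] receive one SAD per reference,
 * matching the stp/str stores at the end of the NEON function. */
static void sad_x3_ref(const pixel *fenc, const pixel *fref0,
                       const pixel *fref1, const pixel *fref2,
                       intptr_t frefstride, int32_t *res,
                       int width, int height)
{
    res[0] = sad_ref(fenc, fref0, frefstride, width, height);
    res[1] = sad_ref(fenc, fref1, frefstride, width, height);
    res[2] = sad_ref(fenc, fref2, frefstride, width, height);
}
```

The four-candidate form would add one more reference pointer and a fourth result. The NEON version reads each source row once and reuses it for all candidates, which is the main saving over calling a single-reference SAD three or four times.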
x265_3.3.tar.gz/source/common/arm/asm-primitives.cpp -> x265_3.4.tar.gz/source/common/arm/asm-primitives.cpp Changed
@@ -5,6 +5,7 @@
  *          Praveen Kumar Tiwari <praveen@multicorewareinc.com>
  *          Min Chen <chenm003@163.com> <min.chen@multicorewareinc.com>
  *          Dnyaneshwar Gorade <dnyaneshwar@multicorewareinc.com>
+ *          Hongbin Liu<liuhongbin1@huawei.com>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -48,77 +49,77 @@
         p.ssim_4x4x2_core = PFX(ssim_4x4x2_core_neon);
 
         // addAvg
-         p.pu[LUMA_4x4].addAvg   = PFX(addAvg_4x4_neon);
-         p.pu[LUMA_4x8].addAvg   = PFX(addAvg_4x8_neon);
-         p.pu[LUMA_4x16].addAvg  = PFX(addAvg_4x16_neon);
-         p.pu[LUMA_8x4].addAvg   = PFX(addAvg_8x4_neon);
-         p.pu[LUMA_8x8].addAvg   = PFX(addAvg_8x8_neon);
-         p.pu[LUMA_8x16].addAvg  = PFX(addAvg_8x16_neon);
-         p.pu[LUMA_8x32].addAvg  = PFX(addAvg_8x32_neon);
-         p.pu[LUMA_12x16].addAvg = PFX(addAvg_12x16_neon);
-         p.pu[LUMA_16x4].addAvg  = PFX(addAvg_16x4_neon);
-         p.pu[LUMA_16x8].addAvg  = PFX(addAvg_16x8_neon);
-         p.pu[LUMA_16x12].addAvg = PFX(addAvg_16x12_neon);
-         p.pu[LUMA_16x16].addAvg = PFX(addAvg_16x16_neon);
-         p.pu[LUMA_16x32].addAvg = PFX(addAvg_16x32_neon);
-         p.pu[LUMA_16x64].addAvg = PFX(addAvg_16x64_neon);
-         p.pu[LUMA_24x32].addAvg = PFX(addAvg_24x32_neon);
-         p.pu[LUMA_32x8].addAvg  = PFX(addAvg_32x8_neon);
-         p.pu[LUMA_32x16].addAvg = PFX(addAvg_32x16_neon);
-         p.pu[LUMA_32x24].addAvg = PFX(addAvg_32x24_neon);
-         p.pu[LUMA_32x32].addAvg = PFX(addAvg_32x32_neon);
-         p.pu[LUMA_32x64].addAvg = PFX(addAvg_32x64_neon);
-         p.pu[LUMA_48x64].addAvg = PFX(addAvg_48x64_neon);
-         p.pu[LUMA_64x16].addAvg = PFX(addAvg_64x16_neon);
-         p.pu[LUMA_64x32].addAvg = PFX(addAvg_64x32_neon);
-         p.pu[LUMA_64x48].addAvg = PFX(addAvg_64x48_neon);
-         p.pu[LUMA_64x64].addAvg = PFX(addAvg_64x64_neon);
+         p.pu[LUMA_4x4].addAvg[NONALIGNED]   = PFX(addAvg_4x4_neon);
+         p.pu[LUMA_4x8].addAvg[NONALIGNED]   = PFX(addAvg_4x8_neon);
+         p.pu[LUMA_4x16].addAvg[NONALIGNED]  = PFX(addAvg_4x16_neon);
+         p.pu[LUMA_8x4].addAvg[NONALIGNED]   = PFX(addAvg_8x4_neon);
+         p.pu[LUMA_8x8].addAvg[NONALIGNED]   = PFX(addAvg_8x8_neon);
+         p.pu[LUMA_8x16].addAvg[NONALIGNED]  = PFX(addAvg_8x16_neon);
+         p.pu[LUMA_8x32].addAvg[NONALIGNED]  = PFX(addAvg_8x32_neon);
+         p.pu[LUMA_12x16].addAvg[NONALIGNED] = PFX(addAvg_12x16_neon);
+         p.pu[LUMA_16x4].addAvg[NONALIGNED]  = PFX(addAvg_16x4_neon);
+         p.pu[LUMA_16x8].addAvg[NONALIGNED]  = PFX(addAvg_16x8_neon);
+         p.pu[LUMA_16x12].addAvg[NONALIGNED] = PFX(addAvg_16x12_neon);
+         p.pu[LUMA_16x16].addAvg[NONALIGNED] = PFX(addAvg_16x16_neon);
+         p.pu[LUMA_16x32].addAvg[NONALIGNED] = PFX(addAvg_16x32_neon);
+         p.pu[LUMA_16x64].addAvg[NONALIGNED] = PFX(addAvg_16x64_neon);
+         p.pu[LUMA_24x32].addAvg[NONALIGNED] = PFX(addAvg_24x32_neon);
+         p.pu[LUMA_32x8].addAvg[NONALIGNED]  = PFX(addAvg_32x8_neon);
+         p.pu[LUMA_32x16].addAvg[NONALIGNED] = PFX(addAvg_32x16_neon);
+         p.pu[LUMA_32x24].addAvg[NONALIGNED] = PFX(addAvg_32x24_neon);
+         p.pu[LUMA_32x32].addAvg[NONALIGNED] = PFX(addAvg_32x32_neon);
+         p.pu[LUMA_32x64].addAvg[NONALIGNED] = PFX(addAvg_32x64_neon);
+         p.pu[LUMA_48x64].addAvg[NONALIGNED] = PFX(addAvg_48x64_neon);
+         p.pu[LUMA_64x16].addAvg[NONALIGNED] = PFX(addAvg_64x16_neon);
+         p.pu[LUMA_64x32].addAvg[NONALIGNED] = PFX(addAvg_64x32_neon);
+         p.pu[LUMA_64x48].addAvg[NONALIGNED] = PFX(addAvg_64x48_neon);
+         p.pu[LUMA_64x64].addAvg[NONALIGNED] = PFX(addAvg_64x64_neon);
 
         // chroma addAvg
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x2].addAvg   = PFX(addAvg_4x2_neon);
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x4].addAvg   = PFX(addAvg_4x4_neon);
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x8].addAvg   = PFX(addAvg_4x8_neon);
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x16].addAvg  = PFX(addAvg_4x16_neon);
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_6x8].addAvg   = PFX(addAvg_6x8_neon);
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x2].addAvg   = PFX(addAvg_8x2_neon);
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x4].addAvg   = PFX(addAvg_8x4_neon);
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x6].addAvg   = PFX(addAvg_8x6_neon);
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x8].addAvg   = PFX(addAvg_8x8_neon);
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x16].addAvg  = PFX(addAvg_8x16_neon);
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x32].addAvg  = PFX(addAvg_8x32_neon);
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_12x16].addAvg = PFX(addAvg_12x16_neon);
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_16x4].addAvg  = PFX(addAvg_16x4_neon);
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_16x8].addAvg  = PFX(addAvg_16x8_neon);
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_16x12].addAvg = PFX(addAvg_16x12_neon);
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_16x16].addAvg = PFX(addAvg_16x16_neon);
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_16x32].addAvg = PFX(addAvg_16x32_neon);
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_24x32].addAvg = PFX(addAvg_24x32_neon);
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_32x8].addAvg  = PFX(addAvg_32x8_neon);
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_32x16].addAvg = PFX(addAvg_32x16_neon);
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_32x24].addAvg = PFX(addAvg_32x24_neon);
-        p.chroma[X265_CSP_I420].pu[CHROMA_420_32x32].addAvg = PFX(addAvg_32x32_neon);
-
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x8].addAvg   = PFX(addAvg_4x8_neon);
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x16].addAvg  = PFX(addAvg_4x16_neon);
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x32].addAvg  = PFX(addAvg_4x32_neon);
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_6x16].addAvg  = PFX(addAvg_6x16_neon);
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x4].addAvg   = PFX(addAvg_8x4_neon);
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x8].addAvg   = PFX(addAvg_8x8_neon);
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x12].addAvg  = PFX(addAvg_8x12_neon);
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x16].addAvg  = PFX(addAvg_8x16_neon);
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x32].addAvg  = PFX(addAvg_8x32_neon);
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x64].addAvg  = PFX(addAvg_8x64_neon);
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_12x32].addAvg = PFX(addAvg_12x32_neon);
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x8].addAvg  = PFX(addAvg_16x8_neon);
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x16].addAvg = PFX(addAvg_16x16_neon);
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x24].addAvg = PFX(addAvg_16x24_neon);
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x32].addAvg = PFX(addAvg_16x32_neon);
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x64].addAvg = PFX(addAvg_16x64_neon);
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_24x64].addAvg = PFX(addAvg_24x64_neon);
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_32x16].addAvg = PFX(addAvg_32x16_neon);
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_32x32].addAvg = PFX(addAvg_32x32_neon);
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_32x48].addAvg = PFX(addAvg_32x48_neon);
-        p.chroma[X265_CSP_I422].pu[CHROMA_422_32x64].addAvg = PFX(addAvg_32x64_neon);
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x2].addAvg[NONALIGNED]   = PFX(addAvg_4x2_neon);
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x4].addAvg[NONALIGNED]   = PFX(addAvg_4x4_neon);
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x8].addAvg[NONALIGNED]   = PFX(addAvg_4x8_neon);
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_4x16].addAvg[NONALIGNED]  = PFX(addAvg_4x16_neon);
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_6x8].addAvg[NONALIGNED]   = PFX(addAvg_6x8_neon);
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x2].addAvg[NONALIGNED]   = PFX(addAvg_8x2_neon);
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x4].addAvg[NONALIGNED]   = PFX(addAvg_8x4_neon);
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x6].addAvg[NONALIGNED]   = PFX(addAvg_8x6_neon);
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x8].addAvg[NONALIGNED]   = PFX(addAvg_8x8_neon);
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x16].addAvg[NONALIGNED]  = PFX(addAvg_8x16_neon);
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_8x32].addAvg[NONALIGNED]  = PFX(addAvg_8x32_neon);
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_12x16].addAvg[NONALIGNED] = PFX(addAvg_12x16_neon);
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_16x4].addAvg[NONALIGNED]  = PFX(addAvg_16x4_neon);
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_16x8].addAvg[NONALIGNED]  = PFX(addAvg_16x8_neon);
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_16x12].addAvg[NONALIGNED] = PFX(addAvg_16x12_neon);
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_16x16].addAvg[NONALIGNED] = PFX(addAvg_16x16_neon);
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_16x32].addAvg[NONALIGNED] = PFX(addAvg_16x32_neon);
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_24x32].addAvg[NONALIGNED] = PFX(addAvg_24x32_neon);
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_32x8].addAvg[NONALIGNED]  = PFX(addAvg_32x8_neon);
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_32x16].addAvg[NONALIGNED] = PFX(addAvg_32x16_neon);
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_32x24].addAvg[NONALIGNED] = PFX(addAvg_32x24_neon);
+        p.chroma[X265_CSP_I420].pu[CHROMA_420_32x32].addAvg[NONALIGNED] = PFX(addAvg_32x32_neon);
+
132
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x8].addAvg[NONALIGNED]   = PFX(addAvg_4x8_neon);
133
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x16].addAvg[NONALIGNED]  = PFX(addAvg_4x16_neon);
134
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_4x32].addAvg[NONALIGNED]  = PFX(addAvg_4x32_neon);
135
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_6x16].addAvg[NONALIGNED]  = PFX(addAvg_6x16_neon);
136
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x4].addAvg[NONALIGNED]   = PFX(addAvg_8x4_neon);
137
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x8].addAvg[NONALIGNED]   = PFX(addAvg_8x8_neon);
138
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x12].addAvg[NONALIGNED]  = PFX(addAvg_8x12_neon);
139
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x16].addAvg[NONALIGNED]  = PFX(addAvg_8x16_neon);
140
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x32].addAvg[NONALIGNED]  = PFX(addAvg_8x32_neon);
141
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_8x64].addAvg[NONALIGNED]  = PFX(addAvg_8x64_neon);
142
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_12x32].addAvg[NONALIGNED] = PFX(addAvg_12x32_neon);
143
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x8].addAvg[NONALIGNED]  = PFX(addAvg_16x8_neon);
144
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x16].addAvg[NONALIGNED] = PFX(addAvg_16x16_neon);
145
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x24].addAvg[NONALIGNED] = PFX(addAvg_16x24_neon);
146
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x32].addAvg[NONALIGNED] = PFX(addAvg_16x32_neon);
147
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_16x64].addAvg[NONALIGNED] = PFX(addAvg_16x64_neon);
148
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_24x64].addAvg[NONALIGNED] = PFX(addAvg_24x64_neon);
149
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_32x16].addAvg[NONALIGNED] = PFX(addAvg_32x16_neon);
150
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_32x32].addAvg[NONALIGNED] = PFX(addAvg_32x32_neon);
151
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_32x48].addAvg[NONALIGNED] = PFX(addAvg_32x48_neon);
152
+        p.chroma[X265_CSP_I422].pu[CHROMA_422_32x64].addAvg[NONALIGNED] = PFX(addAvg_32x64_neon);
153
 
154
         // quant
155
          p.quant = PFX(quant_neon);
156
@@ -402,7 +403,7 @@
157
         p.scale2D_64to32  = PFX(scale2D_64to32_neon);
158
 
159
         // scale1D_128to64
160
-        p.scale1D_128to64 = PFX(scale1D_128to64_neon);
161
+        p.scale1D_128to64[NONALIGNED] = PFX(scale1D_128to64_neon);
162
 
163
         // copy_count
164
         p.cu[BLOCK_4x4].copy_cnt     = PFX(copy_cnt_4_neon);
165
@@ -411,37 +412,37 @@
166
         p.cu[BLOCK_32x32].copy_cnt   = PFX(copy_cnt_32_neon);
167
 
168
         // filterPixelToShort
169
-        p.pu[LUMA_4x4].convert_p2s   = PFX(filterPixelToShort_4x4_neon);
170
-        p.pu[LUMA_4x8].convert_p2s   = PFX(filterPixelToShort_4x8_neon);
171
-        p.pu[LUMA_4x16].convert_p2s  = PFX(filterPixelToShort_4x16_neon);
172
-        p.pu[LUMA_8x4].convert_p2s   = PFX(filterPixelToShort_8x4_neon);
173
-        p.pu[LUMA_8x8].convert_p2s   = PFX(filterPixelToShort_8x8_neon);
174
-        p.pu[LUMA_8x16].convert_p2s  = PFX(filterPixelToShort_8x16_neon);
175
-        p.pu[LUMA_8x32].convert_p2s  = PFX(filterPixelToShort_8x32_neon);
176
-        p.pu[LUMA_12x16].convert_p2s = PFX(filterPixelToShort_12x16_neon);
177
-        p.pu[LUMA_16x4].convert_p2s  = PFX(filterPixelToShort_16x4_neon);
178
-        p.pu[LUMA_16x8].convert_p2s  = PFX(filterPixelToShort_16x8_neon);
179
-        p.pu[LUMA_16x12].convert_p2s = PFX(filterPixelToShort_16x12_neon);
180
-        p.pu[LUMA_16x16].convert_p2s = PFX(filterPixelToShort_16x16_neon);
181
-        p.pu[LUMA_16x32].convert_p2s = PFX(filterPixelToShort_16x32_neon);
182
-        p.pu[LUMA_16x64].convert_p2s = PFX(filterPixelToShort_16x64_neon);
183
-        p.pu[LUMA_24x32].convert_p2s = PFX(filterPixelToShort_24x32_neon);
184
-        p.pu[LUMA_32x8].convert_p2s  = PFX(filterPixelToShort_32x8_neon);
185
-        p.pu[LUMA_32x16].convert_p2s = PFX(filterPixelToShort_32x16_neon);
186
-        p.pu[LUMA_32x24].convert_p2s = PFX(filterPixelToShort_32x24_neon);
187
-        p.pu[LUMA_32x32].convert_p2s = PFX(filterPixelToShort_32x32_neon);
188
-        p.pu[LUMA_32x64].convert_p2s = PFX(filterPixelToShort_32x64_neon);
189
-        p.pu[LUMA_48x64].convert_p2s = PFX(filterPixelToShort_48x64_neon);
190
-        p.pu[LUMA_64x16].convert_p2s = PFX(filterPixelToShort_64x16_neon);
191
-        p.pu[LUMA_64x32].convert_p2s = PFX(filterPixelToShort_64x32_neon);
192
-        p.pu[LUMA_64x48].convert_p2s = PFX(filterPixelToShort_64x48_neon);
193
-        p.pu[LUMA_64x64].convert_p2s = PFX(filterPixelToShort_64x64_neon);
194
+        p.pu[LUMA_4x4].convert_p2s[NONALIGNED]   = PFX(filterPixelToShort_4x4_neon);
195
+        p.pu[LUMA_4x8].convert_p2s[NONALIGNED]   = PFX(filterPixelToShort_4x8_neon);
196
+        p.pu[LUMA_4x16].convert_p2s[NONALIGNED]  = PFX(filterPixelToShort_4x16_neon);
197
+        p.pu[LUMA_8x4].convert_p2s[NONALIGNED]   = PFX(filterPixelToShort_8x4_neon);
198
+        p.pu[LUMA_8x8].convert_p2s[NONALIGNED]   = PFX(filterPixelToShort_8x8_neon);
199
+        p.pu[LUMA_8x16].convert_p2s[NONALIGNED]  = PFX(filterPixelToShort_8x16_neon);
200
+        p.pu[LUMA_8x32].convert_p2s[NONALIGNED]  = PFX(filterPixelToShort_8x32_neon);
201
+        p.pu[LUMA_12x16].convert_p2s[NONALIGNED] = PFX(filterPixelToShort_12x16_neon);
202
+        p.pu[LUMA_16x4].convert_p2s[NONALIGNED]  = PFX(filterPixelToShort_16x4_neon);
203
+        p.pu[LUMA_16x8].convert_p2s[NONALIGNED]  = PFX(filterPixelToShort_16x8_neon);
204
+        p.pu[LUMA_16x12].convert_p2s[NONALIGNED] = PFX(filterPixelToShort_16x12_neon);
205
+        p.pu[LUMA_16x16].convert_p2s[NONALIGNED] = PFX(filterPixelToShort_16x16_neon);
206
+        p.pu[LUMA_16x32].convert_p2s[NONALIGNED] = PFX(filterPixelToShort_16x32_neon);
207
+        p.pu[LUMA_16x64].convert_p2s[NONALIGNED] = PFX(filterPixelToShort_16x64_neon);
208
+        p.pu[LUMA_24x32].convert_p2s[NONALIGNED] = PFX(filterPixelToShort_24x32_neon);
209
+        p.pu[LUMA_32x8].convert_p2s[NONALIGNED]  = PFX(filterPixelToShort_32x8_neon);
210
+        p.pu[LUMA_32x16].convert_p2s[NONALIGNED] = PFX(filterPixelToShort_32x16_neon);
211
+        p.pu[LUMA_32x24].convert_p2s[NONALIGNED] = PFX(filterPixelToShort_32x24_neon);
212
+        p.pu[LUMA_32x32].convert_p2s[NONALIGNED] = PFX(filterPixelToShort_32x32_neon);
213
+        p.pu[LUMA_32x64].convert_p2s[NONALIGNED] = PFX(filterPixelToShort_32x64_neon);
214
+        p.pu[LUMA_48x64].convert_p2s[NONALIGNED] = PFX(filterPixelToShort_48x64_neon);
215
+        p.pu[LUMA_64x16].convert_p2s[NONALIGNED] = PFX(filterPixelToShort_64x16_neon);
216
+        p.pu[LUMA_64x32].convert_p2s[NONALIGNED] = PFX(filterPixelToShort_64x32_neon);
217
+        p.pu[LUMA_64x48].convert_p2s[NONALIGNED] = PFX(filterPixelToShort_64x48_neon);
218
+        p.pu[LUMA_64x64].convert_p2s[NONALIGNED] = PFX(filterPixelToShort_64x64_neon);
219
 
220
         // Block_fill
221
-        p.cu[BLOCK_4x4].blockfill_s   = PFX(blockfill_s_4x4_neon);
222
-        p.cu[BLOCK_8x8].blockfill_s   = PFX(blockfill_s_8x8_neon);
223
-        p.cu[BLOCK_16x16].blockfill_s = PFX(blockfill_s_16x16_neon);
224
-        p.cu[BLOCK_32x32].blockfill_s = PFX(blockfill_s_32x32_neon);
225
+        p.cu[BLOCK_4x4].blockfill_s[NONALIGNED]   = PFX(blockfill_s_4x4_neon);
226
+        p.cu[BLOCK_8x8].blockfill_s[NONALIGNED]   = PFX(blockfill_s_8x8_neon);
227
+        p.cu[BLOCK_16x16].blockfill_s[NONALIGNED] = PFX(blockfill_s_16x16_neon);
228
+        p.cu[BLOCK_32x32].blockfill_s[NONALIGNED] = PFX(blockfill_s_32x32_neon);
229
 
230
         // Blockcopy_ss
231
         p.cu[BLOCK_4x4].copy_ss   = PFX(blockcopy_ss_4x4_neon);
232
@@ -495,21 +496,21 @@
233
         p.chroma[X265_CSP_I422].cu[BLOCK_422_32x64].copy_sp = PFX(blockcopy_sp_32x64_neon);
234
 
235
         // pixel_add_ps
236
-        p.cu[BLOCK_4x4].add_ps   = PFX(pixel_add_ps_4x4_neon);
237
-        p.cu[BLOCK_8x8].add_ps   = PFX(pixel_add_ps_8x8_neon);
238
-        p.cu[BLOCK_16x16].add_ps = PFX(pixel_add_ps_16x16_neon);
239
-        p.cu[BLOCK_32x32].add_ps = PFX(pixel_add_ps_32x32_neon);
240
-        p.cu[BLOCK_64x64].add_ps = PFX(pixel_add_ps_64x64_neon);
241
+        p.cu[BLOCK_4x4].add_ps[NONALIGNED]   = PFX(pixel_add_ps_4x4_neon);
242
+        p.cu[BLOCK_8x8].add_ps[NONALIGNED]   = PFX(pixel_add_ps_8x8_neon);
243
+        p.cu[BLOCK_16x16].add_ps[NONALIGNED] = PFX(pixel_add_ps_16x16_neon);
244
+        p.cu[BLOCK_32x32].add_ps[NONALIGNED] = PFX(pixel_add_ps_32x32_neon);
245
+        p.cu[BLOCK_64x64].add_ps[NONALIGNED] = PFX(pixel_add_ps_64x64_neon);
246
 
247
         // chroma add_ps
248
-        p.chroma[X265_CSP_I420].cu[BLOCK_420_4x4].add_ps   = PFX(pixel_add_ps_4x4_neon);
249
-        p.chroma[X265_CSP_I420].cu[BLOCK_420_8x8].add_ps   = PFX(pixel_add_ps_8x8_neon);
250
-        p.chroma[X265_CSP_I420].cu[BLOCK_420_16x16].add_ps = PFX(pixel_add_ps_16x16_neon);
251
-        p.chroma[X265_CSP_I420].cu[BLOCK_420_32x32].add_ps = PFX(pixel_add_ps_32x32_neon);
252
-        p.chroma[X265_CSP_I422].cu[BLOCK_422_4x8].add_ps   = PFX(pixel_add_ps_4x8_neon);
253
-        p.chroma[X265_CSP_I422].cu[BLOCK_422_8x16].add_ps  = PFX(pixel_add_ps_8x16_neon);
254
-        p.chroma[X265_CSP_I422].cu[BLOCK_422_16x32].add_ps = PFX(pixel_add_ps_16x32_neon);
255
-        p.chroma[X265_CSP_I422].cu[BLOCK_422_32x64].add_ps = PFX(pixel_add_ps_32x64_neon);
256
+        p.chroma[X265_CSP_I420].cu[BLOCK_420_4x4].add_ps[NONALIGNED]   = PFX(pixel_add_ps_4x4_neon);
257
+        p.chroma[X265_CSP_I420].cu[BLOCK_420_8x8].add_ps[NONALIGNED]   = PFX(pixel_add_ps_8x8_neon);
258
+        p.chroma[X265_CSP_I420].cu[BLOCK_420_16x16].add_ps[NONALIGNED] = PFX(pixel_add_ps_16x16_neon);
259
+        p.chroma[X265_CSP_I420].cu[BLOCK_420_32x32].add_ps[NONALIGNED] = PFX(pixel_add_ps_32x32_neon);
260
+        p.chroma[X265_CSP_I422].cu[BLOCK_422_4x8].add_ps[NONALIGNED]   = PFX(pixel_add_ps_4x8_neon);
261
+        p.chroma[X265_CSP_I422].cu[BLOCK_422_8x16].add_ps[NONALIGNED]  = PFX(pixel_add_ps_8x16_neon);
262
+        p.chroma[X265_CSP_I422].cu[BLOCK_422_16x32].add_ps[NONALIGNED] = PFX(pixel_add_ps_16x32_neon);
263
+        p.chroma[X265_CSP_I422].cu[BLOCK_422_32x64].add_ps[NONALIGNED] = PFX(pixel_add_ps_32x64_neon);
264
 
265
         // cpy2Dto1D_shr
266
         p.cu[BLOCK_4x4].cpy2Dto1D_shr   = PFX(cpy2Dto1D_shr_4x4_neon);
267
@@ -518,10 +519,10 @@
268
         p.cu[BLOCK_32x32].cpy2Dto1D_shr = PFX(cpy2Dto1D_shr_32x32_neon);
269
 
270
         // ssd_s
271
-        p.cu[BLOCK_4x4].ssd_s   = PFX(pixel_ssd_s_4x4_neon);
272
-        p.cu[BLOCK_8x8].ssd_s   = PFX(pixel_ssd_s_8x8_neon);
273
-        p.cu[BLOCK_16x16].ssd_s = PFX(pixel_ssd_s_16x16_neon);
274
-        p.cu[BLOCK_32x32].ssd_s = PFX(pixel_ssd_s_32x32_neon);
275
+        p.cu[BLOCK_4x4].ssd_s[NONALIGNED]   = PFX(pixel_ssd_s_4x4_neon);
276
+        p.cu[BLOCK_8x8].ssd_s[NONALIGNED]   = PFX(pixel_ssd_s_8x8_neon);
277
+        p.cu[BLOCK_16x16].ssd_s[NONALIGNED] = PFX(pixel_ssd_s_16x16_neon);
278
+        p.cu[BLOCK_32x32].ssd_s[NONALIGNED] = PFX(pixel_ssd_s_32x32_neon);
279
 
280
         // sse_ss
281
         p.cu[BLOCK_4x4].sse_ss   = PFX(pixel_sse_ss_4x4_neon);
282
@@ -548,10 +549,10 @@
283
         p.chroma[X265_CSP_I422].cu[BLOCK_422_32x64].sub_ps = PFX(pixel_sub_ps_32x64_neon);
284
 
285
         // calc_Residual
286
-        p.cu[BLOCK_4x4].calcresidual   = PFX(getResidual4_neon);
287
-        p.cu[BLOCK_8x8].calcresidual   = PFX(getResidual8_neon);
288
-        p.cu[BLOCK_16x16].calcresidual = PFX(getResidual16_neon);
289
-        p.cu[BLOCK_32x32].calcresidual = PFX(getResidual32_neon);
290
+        p.cu[BLOCK_4x4].calcresidual[NONALIGNED]   = PFX(getResidual4_neon);
291
+        p.cu[BLOCK_8x8].calcresidual[NONALIGNED]   = PFX(getResidual8_neon);
292
+        p.cu[BLOCK_16x16].calcresidual[NONALIGNED] = PFX(getResidual16_neon);
293
+        p.cu[BLOCK_32x32].calcresidual[NONALIGNED] = PFX(getResidual32_neon);
294
 
295
         // sse_pp
296
         p.cu[BLOCK_4x4].sse_pp   = PFX(pixel_sse_pp_4x4_neon);
297
@@ -722,31 +723,31 @@
298
         p.pu[LUMA_64x64].sad_x4 = PFX(sad_x4_64x64_neon);
299
 
300
         // pixel_avg_pp
301
-        p.pu[LUMA_4x4].pixelavg_pp   = PFX(pixel_avg_pp_4x4_neon);
302
-        p.pu[LUMA_4x8].pixelavg_pp   = PFX(pixel_avg_pp_4x8_neon);
303
-        p.pu[LUMA_4x16].pixelavg_pp  = PFX(pixel_avg_pp_4x16_neon);
304
-        p.pu[LUMA_8x4].pixelavg_pp   = PFX(pixel_avg_pp_8x4_neon);
305
-        p.pu[LUMA_8x8].pixelavg_pp   = PFX(pixel_avg_pp_8x8_neon);
306
-        p.pu[LUMA_8x16].pixelavg_pp  = PFX(pixel_avg_pp_8x16_neon);
307
-        p.pu[LUMA_8x32].pixelavg_pp  = PFX(pixel_avg_pp_8x32_neon);
308
-        p.pu[LUMA_12x16].pixelavg_pp = PFX(pixel_avg_pp_12x16_neon);
309
-        p.pu[LUMA_16x4].pixelavg_pp  = PFX(pixel_avg_pp_16x4_neon);
310
-        p.pu[LUMA_16x8].pixelavg_pp  = PFX(pixel_avg_pp_16x8_neon);
311
-        p.pu[LUMA_16x12].pixelavg_pp = PFX(pixel_avg_pp_16x12_neon);
312
-        p.pu[LUMA_16x16].pixelavg_pp = PFX(pixel_avg_pp_16x16_neon);
313
-        p.pu[LUMA_16x32].pixelavg_pp = PFX(pixel_avg_pp_16x32_neon);
314
-        p.pu[LUMA_16x64].pixelavg_pp = PFX(pixel_avg_pp_16x64_neon);
315
-        p.pu[LUMA_24x32].pixelavg_pp = PFX(pixel_avg_pp_24x32_neon);
316
-        p.pu[LUMA_32x8].pixelavg_pp  = PFX(pixel_avg_pp_32x8_neon);
317
-        p.pu[LUMA_32x16].pixelavg_pp = PFX(pixel_avg_pp_32x16_neon);
318
-        p.pu[LUMA_32x24].pixelavg_pp = PFX(pixel_avg_pp_32x24_neon);
319
-        p.pu[LUMA_32x32].pixelavg_pp = PFX(pixel_avg_pp_32x32_neon);
320
-        p.pu[LUMA_32x64].pixelavg_pp = PFX(pixel_avg_pp_32x64_neon);
321
-        p.pu[LUMA_48x64].pixelavg_pp = PFX(pixel_avg_pp_48x64_neon);
322
-        p.pu[LUMA_64x16].pixelavg_pp = PFX(pixel_avg_pp_64x16_neon);
323
-        p.pu[LUMA_64x32].pixelavg_pp = PFX(pixel_avg_pp_64x32_neon);
324
-        p.pu[LUMA_64x48].pixelavg_pp = PFX(pixel_avg_pp_64x48_neon);
325
-        p.pu[LUMA_64x64].pixelavg_pp = PFX(pixel_avg_pp_64x64_neon);
326
+        p.pu[LUMA_4x4].pixelavg_pp[NONALIGNED]   = PFX(pixel_avg_pp_4x4_neon);
327
+        p.pu[LUMA_4x8].pixelavg_pp[NONALIGNED]   = PFX(pixel_avg_pp_4x8_neon);
328
+        p.pu[LUMA_4x16].pixelavg_pp[NONALIGNED]  = PFX(pixel_avg_pp_4x16_neon);
329
+        p.pu[LUMA_8x4].pixelavg_pp[NONALIGNED]   = PFX(pixel_avg_pp_8x4_neon);
330
+        p.pu[LUMA_8x8].pixelavg_pp[NONALIGNED]   = PFX(pixel_avg_pp_8x8_neon);
331
+        p.pu[LUMA_8x16].pixelavg_pp[NONALIGNED]  = PFX(pixel_avg_pp_8x16_neon);
332
+        p.pu[LUMA_8x32].pixelavg_pp[NONALIGNED]  = PFX(pixel_avg_pp_8x32_neon);
333
+        p.pu[LUMA_12x16].pixelavg_pp[NONALIGNED] = PFX(pixel_avg_pp_12x16_neon);
334
+        p.pu[LUMA_16x4].pixelavg_pp[NONALIGNED]  = PFX(pixel_avg_pp_16x4_neon);
335
+        p.pu[LUMA_16x8].pixelavg_pp[NONALIGNED]  = PFX(pixel_avg_pp_16x8_neon);
336
+        p.pu[LUMA_16x12].pixelavg_pp[NONALIGNED] = PFX(pixel_avg_pp_16x12_neon);
337
+        p.pu[LUMA_16x16].pixelavg_pp[NONALIGNED] = PFX(pixel_avg_pp_16x16_neon);
338
+        p.pu[LUMA_16x32].pixelavg_pp[NONALIGNED] = PFX(pixel_avg_pp_16x32_neon);
339
+        p.pu[LUMA_16x64].pixelavg_pp[NONALIGNED] = PFX(pixel_avg_pp_16x64_neon);
340
+        p.pu[LUMA_24x32].pixelavg_pp[NONALIGNED] = PFX(pixel_avg_pp_24x32_neon);
341
+        p.pu[LUMA_32x8].pixelavg_pp[NONALIGNED]  = PFX(pixel_avg_pp_32x8_neon);
342
+        p.pu[LUMA_32x16].pixelavg_pp[NONALIGNED] = PFX(pixel_avg_pp_32x16_neon);
343
+        p.pu[LUMA_32x24].pixelavg_pp[NONALIGNED] = PFX(pixel_avg_pp_32x24_neon);
344
+        p.pu[LUMA_32x32].pixelavg_pp[NONALIGNED] = PFX(pixel_avg_pp_32x32_neon);
345
+        p.pu[LUMA_32x64].pixelavg_pp[NONALIGNED] = PFX(pixel_avg_pp_32x64_neon);
346
+        p.pu[LUMA_48x64].pixelavg_pp[NONALIGNED] = PFX(pixel_avg_pp_48x64_neon);
347
+        p.pu[LUMA_64x16].pixelavg_pp[NONALIGNED] = PFX(pixel_avg_pp_64x16_neon);
348
+        p.pu[LUMA_64x32].pixelavg_pp[NONALIGNED] = PFX(pixel_avg_pp_64x32_neon);
349
+        p.pu[LUMA_64x48].pixelavg_pp[NONALIGNED] = PFX(pixel_avg_pp_64x48_neon);
350
+        p.pu[LUMA_64x64].pixelavg_pp[NONALIGNED] = PFX(pixel_avg_pp_64x64_neon);
351
 
352
         // planecopy
353
         p.planecopy_cp = PFX(pixel_planecopy_cp_neon);
354
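The hunks above move each NEON routine from a single function-pointer slot into an array slot indexed by `NONALIGNED`. A minimal sketch of that table pattern follows. The `Alignment` enum, the `PuPrimitives` struct, the simplified `addAvg_t` signature, `addAvgRef`, and `pickAddAvg` are hypothetical stand-ins for illustration and are not x265's real declarations; the averaging kernel is a plain rounding average, not the encoder's actual arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical alignment index; the diff only shows that a NONALIGNED slot exists.
enum Alignment { NONALIGNED = 0, ALIGNED = 1, NUM_ALIGNMENT_TYPES = 2 };

// Hypothetical, simplified signature: average two 16-bit sources into 8-bit pixels.
typedef void (*addAvg_t)(const int16_t* src0, const int16_t* src1, uint8_t* dst, int count);

struct PuPrimitives
{
    addAvg_t addAvg[NUM_ALIGNMENT_TYPES]; // one slot per alignment variant
};

// Plain C++ reference kernel standing in for an assembly routine.
static void addAvgRef(const int16_t* src0, const int16_t* src1, uint8_t* dst, int count)
{
    assert(count >= 0);
    for (int i = 0; i < count; i++)
    {
        int v = (src0[i] + src1[i] + 1) >> 1;                 // rounding average
        dst[i] = (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v)); // clamp to 8 bits
    }
}

// Use the ALIGNED variant only when one is registered and the destination
// pointer meets the requested alignment; otherwise fall back to NONALIGNED.
static addAvg_t pickAddAvg(const PuPrimitives& pu, const void* dst, uintptr_t alignBytes)
{
    bool aligned = ((uintptr_t)dst % alignBytes) == 0;
    if (aligned && pu.addAvg[ALIGNED])
        return pu.addAvg[ALIGNED];
    return pu.addAvg[NONALIGNED];
}
```

With only the `NONALIGNED` slot filled, as in the ARM hunks above, `pickAddAvg` always returns that entry, so callers work regardless of buffer alignment.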
x265_3.3.tar.gz/source/common/common.h -> x265_3.4.tar.gz/source/common/common.h Changed
27
 
1
@@ -129,6 +129,7 @@
2
 typedef uint64_t sum2_t;
3
 typedef uint64_t pixel4;
4
 typedef int64_t  ssum2_t;
5
+#define SHIFT_TO_BITPLANE 9
6
 #define HISTOGRAM_BINS 1024
7
 #else
8
 typedef uint8_t  pixel;
9
@@ -136,6 +137,7 @@
10
 typedef uint32_t sum2_t;
11
 typedef uint32_t pixel4;
12
 typedef int32_t  ssum2_t; // Signed sum
13
+#define SHIFT_TO_BITPLANE 7
14
 #define HISTOGRAM_BINS 256
15
 #endif // if HIGH_BIT_DEPTH
16
 
17
@@ -270,6 +272,9 @@
18
 #define MAX_TR_SIZE (1 << MAX_LOG2_TR_SIZE)
19
 #define MAX_TS_SIZE (1 << MAX_LOG2_TS_SIZE)
20
 
21
+#define RDCOST_BASED_RSKIP 1
22
+#define EDGE_BASED_RSKIP 2
23
+
24
 #define COEF_REMAIN_BIN_REDUCTION   3 // indicates the level at which the VLC
25
                                       // transitions from Golomb-Rice to TU+EG(k)
26
 
27
x265_3.3.tar.gz/source/common/cpu.cpp -> x265_3.4.tar.gz/source/common/cpu.cpp Changed
19
 
1
@@ -5,6 +5,8 @@
2
  *          Laurent Aimar <fenrir@via.ecp.fr>
3
  *          Fiona Glaser <fiona@x264.com>
4
  *          Steve Borho <steve@borho.org>
5
+ *          Hongbin Liu <liuhongbin1@huawei.com>
6
+ *          Yimeng Su <yimeng.su@huawei.com>
7
  *
8
  * This program is free software; you can redistribute it and/or modify
9
  * it under the terms of the GNU General Public License as published by
10
@@ -367,6 +369,8 @@
11
     flags |= PFX(cpu_fast_neon_mrc_test)() ? X265_CPU_FAST_NEON_MRC : 0;
12
 #endif
13
     // TODO: write dual issue test? currently it's A8 (dual issue) vs. A9 (fast mrc)
14
+#elif X265_ARCH_ARM64
15
+    flags |= X265_CPU_NEON;
16
 #endif // if HAVE_ARMV6
17
     return flags;
18
 }
19
x265_3.3.tar.gz/source/common/frame.cpp -> x265_3.4.tar.gz/source/common/frame.cpp Changed
41
 
1
@@ -61,6 +61,8 @@
2
     m_edgePic = NULL;
3
     m_gaussianPic = NULL;
4
     m_thetaPic = NULL;
5
+    m_edgeBitPlane = NULL;
6
+    m_edgeBitPic = NULL;
7
 }
8
 
9
 bool Frame::create(x265_param *param, float* quantOffsets)
10
@@ -115,6 +117,19 @@
11
         m_thetaPic = X265_MALLOC(pixel, m_stride * (maxHeight + (m_lumaMarginY * 2)));
12
     }
13
 
14
+    if (param->recursionSkipMode == EDGE_BASED_RSKIP)
15
+    {
16
+        uint32_t numCuInWidth = (param->sourceWidth + param->maxCUSize - 1) / param->maxCUSize;
17
+        uint32_t numCuInHeight = (param->sourceHeight + param->maxCUSize - 1) / param->maxCUSize;
18
+        uint32_t lumaMarginX = param->maxCUSize + 32;
19
+        uint32_t lumaMarginY = param->maxCUSize + 16;
20
+        uint32_t stride = (numCuInWidth * param->maxCUSize) + (lumaMarginX << 1);
21
+        uint32_t maxHeight = numCuInHeight * param->maxCUSize;
22
+        uint32_t bitPlaneSize = stride * (maxHeight + (lumaMarginY * 2));
23
+        CHECKED_MALLOC_ZERO(m_edgeBitPlane, pixel, bitPlaneSize);
24
+        m_edgeBitPic = m_edgeBitPlane + lumaMarginY * stride + lumaMarginX;
25
+    }
26
+
27
     if (m_fencPic->create(param, !!m_param->bCopyPicToFrame) && m_lowres.create(param, m_fencPic, param->rc.qgSize))
28
     {
29
         X265_CHECK((m_reconColCount == NULL), "m_reconColCount was initialized");
30
@@ -267,4 +282,10 @@
31
         X265_FREE(m_gaussianPic);
32
         X265_FREE(m_thetaPic);
33
     }
34
+
35
+    if (m_param->recursionSkipMode == EDGE_BASED_RSKIP)
36
+    {
37
+        X265_FREE_ZERO(m_edgeBitPlane);
38
+        m_edgeBitPic = NULL;
39
+    }
40
 }
41
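The allocation added to `Frame::create` above pads the edge bit plane to whole CTUs plus a margin on every side, then points `m_edgeBitPic` at the first visible pixel. The arithmetic can be sketched in isolation as below. The `BitPlaneLayout` struct and `edgeBitPlaneLayout` function are hypothetical names introduced for illustration; the formulas are copied from the hunk.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the three quantities the hunk computes.
struct BitPlaneLayout
{
    uint32_t stride;       // padded row length, in pixels
    uint32_t size;         // total number of pixels to allocate
    uint32_t originOffset; // offset of the first visible pixel in the buffer
};

static BitPlaneLayout edgeBitPlaneLayout(uint32_t width, uint32_t height, uint32_t maxCUSize)
{
    assert(maxCUSize > 0);

    // Round the picture up to a whole number of CTUs in each dimension.
    uint32_t numCuInWidth  = (width  + maxCUSize - 1) / maxCUSize;
    uint32_t numCuInHeight = (height + maxCUSize - 1) / maxCUSize;

    // Margins as in the hunk: one CTU plus a fixed pad.
    uint32_t lumaMarginX = maxCUSize + 32;
    uint32_t lumaMarginY = maxCUSize + 16;

    BitPlaneLayout layout;
    layout.stride = (numCuInWidth * maxCUSize) + (lumaMarginX << 1);
    uint32_t maxHeight = numCuInHeight * maxCUSize;
    layout.size = layout.stride * (maxHeight + (lumaMarginY * 2));
    layout.originOffset = lumaMarginY * layout.stride + lumaMarginX;
    return layout;
}
```

For a 1920x1080 frame with 64x64 CTUs this gives a stride of 2112, a buffer of 2,635,776 pixels, and an origin offset of 169,056.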
x265_3.3.tar.gz/source/common/frame.h -> x265_3.4.tar.gz/source/common/frame.h Changed
21
 
1
@@ -99,7 +99,7 @@
2
     float*                 m_quantOffsets;       // points to quantOffsets in x265_picture
3
     x265_sei               m_userSEI;
4
     uint32_t               m_picStruct;          // picture structure SEI message
5
-    x265_dolby_vision_rpu            m_rpu;
6
+    x265_dolby_vision_rpu  m_rpu;
7
 
8
     /* Frame Parallelism - notification between FrameEncoders of available motion reference rows */
9
     ThreadSafeInteger*     m_reconRowFlag;       // flag of CTU rows completely reconstructed and extended for motion reference
10
@@ -137,6 +137,10 @@
11
     pixel*                 m_gaussianPic;
12
     pixel*                 m_thetaPic;
13
 
14
+    /* edge bit plane for rskips 2 and 3 */
15
+    pixel*                 m_edgeBitPlane;
16
+    pixel*                 m_edgeBitPic;
17
+
18
     Frame();
19
 
20
     bool create(x265_param *param, float* quantOffsets);
21
x265_3.3.tar.gz/source/common/param.cpp -> x265_3.4.tar.gz/source/common/param.cpp Changed
145
 
1
@@ -198,7 +198,8 @@
2
     param->bEnableWeightedPred = 1;
3
     param->bEnableWeightedBiPred = 0;
4
     param->bEnableEarlySkip = 1;
5
-    param->bEnableRecursionSkip = 1;
6
+    param->recursionSkipMode = 1;
7
+    param->edgeVarThreshold = 0.05f;
8
     param->bEnableAMP = 0;
9
     param->bEnableRectInter = 0;
10
     param->rdLevel = 3;
11
@@ -285,6 +286,7 @@
12
     param->rc.bEnableConstVbv = 0;
13
     param->bResetZoneConfig = 1;
14
     param->reconfigWindowSize = 0;
15
+    param->decoderVbvMaxRate = 0;
16
 
17
     /* Video Usability Information (VUI) */
18
     param->vui.aspectRatioIdc = 0;
19
@@ -546,7 +548,7 @@
20
             param->maxNumMergeCand = 5;
21
             param->searchMethod = X265_STAR_SEARCH;
22
             param->bEnableTransformSkip = 1;
23
-            param->bEnableRecursionSkip = 0;
24
+            param->recursionSkipMode = 0;
25
             param->maxNumReferences = 5;
26
             param->limitReferences = 0;
27
             param->lookaheadSlices = 0; // disabled for best quality
28
@@ -598,7 +600,7 @@
29
             param->rc.hevcAq = 0;
30
             param->rc.qpStep = 1;
31
             param->rc.bEnableGrain = 1;
32
-            param->bEnableRecursionSkip = 0;
33
+            param->recursionSkipMode = 0;
34
             param->psyRd = 4.0;
35
             param->psyRdoq = 10.0;
36
             param->bEnableSAO = 0;
37
@@ -702,8 +704,9 @@
38
     OPT("ref") p->maxNumReferences = atoi(value);
39
     OPT("fast-intra") p->bEnableFastIntra = atobool(value);
40
     OPT("early-skip") p->bEnableEarlySkip = atobool(value);
41
-    OPT("rskip") p->bEnableRecursionSkip = atobool(value);
42
-    OPT("me")p->searchMethod = parseName(value, x265_motion_est_names, bError);
43
+    OPT("rskip") p->recursionSkipMode = atoi(value);
44
+    OPT("rskip-edge-threshold") p->edgeVarThreshold = atoi(value)/100.0f;
45
+    OPT("me") p->searchMethod = parseName(value, x265_motion_est_names, bError);
46
     OPT("subme") p->subpelRefine = atoi(value);
47
     OPT("merange") p->searchRange = atoi(value);
48
     OPT("rect") p->bEnableRectInter = atobool(value);
49
@@ -919,7 +922,7 @@
50
     OPT("max-merge") p->maxNumMergeCand = (uint32_t)atoi(value);
51
     OPT("temporal-mvp") p->bEnableTemporalMvp = atobool(value);
52
     OPT("early-skip") p->bEnableEarlySkip = atobool(value);
53
-    OPT("rskip") p->bEnableRecursionSkip = atobool(value);
54
+    OPT("rskip") p->recursionSkipMode = atoi(value);
55
     OPT("rdpenalty") p->rdPenalty = atoi(value);
56
     OPT("tskip") p->bEnableTransformSkip = atobool(value);
57
     OPT("no-tskip-fast") p->bEnableTSkipFast = atobool(value);
58
@@ -1221,6 +1224,7 @@
59
             }
60
         }
61
         OPT("hist-threshold") p->edgeTransitionThreshold = atof(value);
62
+        OPT("rskip-edge-threshold") p->edgeVarThreshold = atoi(value)/100.0f;
63
         OPT("lookahead-threads") p->lookaheadThreads = atoi(value);
64
         OPT("opt-cu-delta-qp") p->bOptCUDeltaQP = atobool(value);
65
         OPT("multi-pass-opt-analysis") p->analysisMultiPassRefine = atobool(value);
66
@@ -1596,9 +1600,16 @@
67
     CHECK(param->rdLevel < 1 || param->rdLevel > 6,
68
           "RD Level is out of range");
69
     CHECK(param->rdoqLevel < 0 || param->rdoqLevel > 2,
70
-        "RDOQ Level is out of range");
71
+          "RDOQ Level is out of range");
72
     CHECK(param->dynamicRd < 0 || param->dynamicRd > x265_ADAPT_RD_STRENGTH,
73
-        "Dynamic RD strength must be between 0 and 4");
74
+          "Dynamic RD strength must be between 0 and 4");
75
+    CHECK(param->recursionSkipMode > 2 || param->recursionSkipMode < 0,
76
+          "Invalid Recursion skip mode. Valid modes 0,1,2");
77
+    if (param->recursionSkipMode == EDGE_BASED_RSKIP)
78
+    {
79
+        CHECK(param->edgeVarThreshold < 0.0f || param->edgeVarThreshold > 1.0f,
80
+              "Minimum edge density percentage for a CU should be an integer between 0 to 100");
81
+    }
82
     CHECK(param->bframes && param->bframes >= param->lookaheadDepth && !param->rc.bStatRead,
83
           "Lookahead depth must be greater than the max consecutive bframe count");
84
     CHECK(param->bframes < 0,
85
@@ -1789,6 +1800,7 @@
86
     }
87
     CHECK(param->confWinRightOffset < 0, "Conformance Window Right Offset must be 0 or greater");
88
     CHECK(param->confWinBottomOffset < 0, "Conformance Window Bottom Offset must be 0 or greater");
89
+    CHECK(param->decoderVbvMaxRate < 0, "Invalid Decoder Vbv Maxrate. Value can not be less than zero");
90
     return check_failed;
91
 }
92
 
93
@@ -1908,7 +1920,9 @@
94
     TOOLVAL(param->psyRdoq, "psy-rdoq=%.2lf");
95
     TOOLOPT(param->bEnableRdRefine, "rd-refine");
96
     TOOLOPT(param->bEnableEarlySkip, "early-skip");
97
-    TOOLOPT(param->bEnableRecursionSkip, "rskip");
98
+    TOOLVAL(param->recursionSkipMode, "rskip mode=%d");
99
+    if (param->recursionSkipMode == EDGE_BASED_RSKIP)
100
+        TOOLVAL(param->edgeVarThreshold, "rskip-edge-threshold=%.2f");
101
     TOOLOPT(param->bEnableSplitRdSkip, "splitrd-skip");
102
     TOOLVAL(param->noiseReductionIntra, "nr-intra=%d");
103
     TOOLVAL(param->noiseReductionInter, "nr-inter=%d");
104
@@ -2066,7 +2080,10 @@
105
     s += sprintf(s, " rd=%d", p->rdLevel);
106
     s += sprintf(s, " selective-sao=%d", p->selectiveSAO);
107
     BOOL(p->bEnableEarlySkip, "early-skip");
108
-    BOOL(p->bEnableRecursionSkip, "rskip");
109
+    BOOL(p->recursionSkipMode, "rskip");
110
+    if (p->recursionSkipMode == EDGE_BASED_RSKIP)
111
+        s += sprintf(s, " rskip-edge-threshold=%f", p->edgeVarThreshold);
112
+
113
     BOOL(p->bEnableFastIntra, "fast-intra");
114
     BOOL(p->bEnableTSkipFast, "tskip-fast");
115
     BOOL(p->bCULossless, "cu-lossless");
116
@@ -2204,6 +2221,7 @@
117
     if (p->bEnableSceneCutAwareQp)
118
         s += sprintf(s, " scenecut-window=%d max-qp-delta=%d", p->scenecutWindow, p->maxQpDelta);
119
     s += sprintf(s, "conformance-window-offsets right=%d bottom=%d", p->confWinRightOffset, p->confWinBottomOffset);
120
+    s += sprintf(s, " decoder-max-rate=%d", p->decoderVbvMaxRate);
121
 #undef BOOL
122
     return buf;
123
 }
124
@@ -2373,7 +2391,8 @@
125
     dst->bSaoNonDeblocked = src->bSaoNonDeblocked;
126
     dst->rdLevel = src->rdLevel;
127
     dst->bEnableEarlySkip = src->bEnableEarlySkip;
128
-    dst->bEnableRecursionSkip = src->bEnableRecursionSkip;
129
+    dst->recursionSkipMode = src->recursionSkipMode;
130
+    dst->edgeVarThreshold = src->edgeVarThreshold;
131
     dst->bEnableFastIntra = src->bEnableFastIntra;
132
     dst->bEnableTSkipFast = src->bEnableTSkipFast;
133
     dst->bCULossless = src->bCULossless;
134
@@ -2419,8 +2438,9 @@
135
     dst->rc.zonefileCount = src->rc.zonefileCount;
136
     dst->reconfigWindowSize = src->reconfigWindowSize;
137
     dst->bResetZoneConfig = src->bResetZoneConfig;
138
+    dst->decoderVbvMaxRate = src->decoderVbvMaxRate;
139
 
140
-    if (src->rc.zonefileCount && src->rc.zones)
141
+    if (src->rc.zonefileCount && src->rc.zones && src->bResetZoneConfig)
142
     {
143
         for (int i = 0; i < src->rc.zonefileCount; i++)
144
         {
145
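The param.cpp hunks above turn `rskip` from a boolean into an integer mode and add `rskip-edge-threshold`, which is parsed as an integer percentage and stored as a fraction. A minimal standalone sketch of that parse-then-validate flow follows. The `RskipParams` struct and both function names are hypothetical; the parsing expressions and range checks mirror the hunks.

```cpp
#include <cassert>
#include <cstdlib>

enum { RDCOST_BASED_RSKIP = 1, EDGE_BASED_RSKIP = 2 }; // values from common.h in this diff

// Hypothetical subset of x265_param holding only the two fields touched here.
struct RskipParams
{
    int   recursionSkipMode;
    float edgeVarThreshold;
};

// Parse the two option strings the same way the OPT() lines do.
static RskipParams parseRskip(const char* modeArg, const char* thresholdArg)
{
    assert(modeArg && thresholdArg);
    RskipParams p;
    p.recursionSkipMode = std::atoi(modeArg);
    p.edgeVarThreshold  = std::atoi(thresholdArg) / 100.0f; // percent -> fraction
    return p;
}

// Mirror of the CHECK() conditions: mode must be 0..2, and the threshold is
// range-checked only when the edge-based mode is selected.
static bool rskipParamsValid(const RskipParams& p)
{
    if (p.recursionSkipMode < 0 || p.recursionSkipMode > 2)
        return false;
    if (p.recursionSkipMode == EDGE_BASED_RSKIP &&
        (p.edgeVarThreshold < 0.0f || p.edgeVarThreshold > 1.0f))
        return false;
    return true;
}
```

Because the value goes through `atoi`, a fractional argument such as `5.5` is truncated to 5 before the division, which is consistent with the error message asking for an integer between 0 and 100.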
x265_3.3.tar.gz/source/common/pixel.cpp -> x265_3.4.tar.gz/source/common/pixel.cpp Changed
58
 
1
@@ -5,6 +5,7 @@
2
  *          Mandar Gurav <mandar@multicorewareinc.com>
3
  *          Mahesh Pittala <mahesh@multicorewareinc.com>
4
  *          Min Chen <min.chen@multicorewareinc.com>
5
+ *          Hongbin Liu<liuhongbin1@huawei.com>
6
  *
7
  * This program is free software; you can redistribute it and/or modify
8
  * it under the terms of the GNU General Public License as published by
9
@@ -265,6 +266,10 @@
10
 {
11
     int satd = 0;
12
 
13
+#if ENABLE_ASSEMBLY && X265_ARCH_ARM64
14
+    pixelcmp_t satd_4x4 = x265_pixel_satd_4x4_neon;
15
+#endif
16
+
17
     for (int row = 0; row < h; row += 4)
18
         for (int col = 0; col < w; col += 4)
19
             satd += satd_4x4(pix1 + row * stride_pix1 + col, stride_pix1,
20
@@ -279,6 +284,10 @@
21
 {
22
     int satd = 0;
23
 
24
+#if ENABLE_ASSEMBLY && X265_ARCH_ARM64
25
+    pixelcmp_t satd_8x4 = x265_pixel_satd_8x4_neon;
26
+#endif
27
+
28
     for (int row = 0; row < h; row += 4)
29
         for (int col = 0; col < w; col += 8)
30
             satd += satd_8x4(pix1 + row * stride_pix1 + col, stride_pix1,
31
@@ -876,6 +885,18 @@
32
     }
33
 }
34
 
35
+static void planecopy_pp_shr_c(const pixel* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int width, int height, int shift)
36
+{
37
+    for (int r = 0; r < height; r++)
38
+    {
39
+        for (int c = 0; c < width; c++)
40
+            dst[c] = (pixel)((src[c] >> shift));
41
+
42
+        dst += dstStride;
43
+        src += srcStride;
44
+    }
45
+}
46
+
47
 static void planecopy_sp_shl_c(const uint16_t* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int width, int height, int shift, uint16_t mask)
48
 {
49
     for (int r = 0; r < height; r++)
50
@@ -1316,6 +1337,7 @@
51
     p.planecopy_cp = planecopy_cp_c;
52
     p.planecopy_sp = planecopy_sp_c;
53
     p.planecopy_sp_shl = planecopy_sp_shl_c;
54
+    p.planecopy_pp_shr = planecopy_pp_shr_c;
55
 #if HIGH_BIT_DEPTH
56
     p.planeClipAndMax = planeClipAndMax_c;
57
 #endif
58
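The new `planecopy_pp_shr_c` primitive above copies a plane while right-shifting every sample. One plausible use, not confirmed by these hunks, is reducing an edge map to a 0/1 bit plane with `SHIFT_TO_BITPLANE` and then measuring edge density over a block. The sketch below assumes an 8-bit build (`pixel` as `uint8_t`, shift of 7); `planecopyShr` restates the loop from the hunk, and `edgeDensity` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

typedef uint8_t pixel; // assumption: 8-bit build, where SHIFT_TO_BITPLANE is 7

// Same loop as planecopy_pp_shr_c in the hunk: copy row by row, shifting each sample.
static void planecopyShr(const pixel* src, intptr_t srcStride,
                         pixel* dst, intptr_t dstStride,
                         int width, int height, int shift)
{
    assert(shift >= 0 && shift < 8);
    for (int r = 0; r < height; r++)
    {
        for (int c = 0; c < width; c++)
            dst[c] = (pixel)(src[c] >> shift);

        dst += dstStride;
        src += srcStride;
    }
}

// Hypothetical helper: fraction of samples in a block of a 0/1 bit plane that are set.
static float edgeDensity(const pixel* plane, intptr_t stride, int width, int height)
{
    assert(width > 0 && height > 0);
    int count = 0;
    for (int r = 0; r < height; r++, plane += stride)
        for (int c = 0; c < width; c++)
            count += plane[c];
    return (float)count / (float)(width * height);
}
```

Shifting an 8-bit sample right by 7 keeps only its most significant bit, so values of 128 and above map to 1 and everything below maps to 0. A density computed this way could then be compared against a fractional threshold such as `edgeVarThreshold`.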
x265_3.3.tar.gz/source/common/primitives.h -> x265_3.4.tar.gz/source/common/primitives.h Changed
@@ -8,6 +8,8 @@
  *          Rajesh Paulraj <rajesh@multicorewareinc.com>
  *          Praveen Kumar Tiwari <praveen@multicorewareinc.com>
  *          Min Chen <chenm003@163.com>
+ *          Hongbin Liu<liuhongbin1@huawei.com>
+ *          Yimeng Su <yimeng.su@huawei.com>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -204,6 +206,7 @@
 typedef void (*sign_t)(int8_t *dst, const pixel *src1, const pixel *src2, const int endX);
 typedef void (*planecopy_cp_t) (const uint8_t* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int width, int height, int shift);
 typedef void (*planecopy_sp_t) (const uint16_t* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int width, int height, int shift, uint16_t mask);
+typedef void (*planecopy_pp_t) (const pixel* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int width, int height, int shift);
 typedef pixel (*planeClipAndMax_t)(pixel *src, intptr_t stride, int width, int height, uint64_t *outsum, const pixel minPix, const pixel maxPix);
 
 typedef void (*cutree_propagate_cost) (int* dst, const uint16_t* propagateIn, const int32_t* intraCosts, const uint16_t* interCosts, const int32_t* invQscales, const double* fpsFactor, int len);
@@ -358,6 +361,7 @@
     planecopy_cp_t        planecopy_cp;
     planecopy_sp_t        planecopy_sp;
     planecopy_sp_t        planecopy_sp_shl;
+    planecopy_pp_t        planecopy_pp_shr;
     planeClipAndMax_t     planeClipAndMax;
 
     weightp_sp_t          weight_sp;
@@ -465,6 +469,9 @@
 void setupInstrinsicPrimitives(EncoderPrimitives &p, int cpuMask);
 void setupAssemblyPrimitives(EncoderPrimitives &p, int cpuMask);
 void setupAliasPrimitives(EncoderPrimitives &p);
+#if X265_ARCH_ARM64
+void setupAliasCPrimitives(EncoderPrimitives &cp, EncoderPrimitives &asmp, int cpuMask);
+#endif
 #if HAVE_ALTIVEC
 void setupPixelPrimitives_altivec(EncoderPrimitives &p);
 void setupDCTPrimitives_altivec(EncoderPrimitives &p);
@@ -479,4 +486,10 @@
 extern const char* PFX(build_info_str);
 #endif
 
+#if ENABLE_ASSEMBLY && X265_ARCH_ARM64
+extern "C" {
+#include "aarch64/pixel-util.h"
+}
+#endif
+
 #endif // ifndef X265_PRIMITIVES_H
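Note on the primitives.h hunks above: the new planecopy_pp_t typedef and the planecopy_pp_shr member follow the usual function-pointer table pattern, where a C fallback is registered first and may later be replaced by an optimized routine. A reduced sketch of that pattern follows. `MiniPrimitives` and `setupMiniCPrimitives` are hypothetical stand-ins for illustration and are not part of x265; `pixel` is assumed to be 8 bits wide here.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 8-bit build, so one pixel is a byte.
typedef uint8_t pixel;

// Signature copied from the typedef added in the diff.
typedef void (*planecopy_pp_t)(const pixel* src, intptr_t srcStride,
                               pixel* dst, intptr_t dstStride,
                               int width, int height, int shift);

// Hypothetical, heavily reduced stand-in for the EncoderPrimitives table.
struct MiniPrimitives
{
    planecopy_pp_t planecopy_pp_shr;
};

// Plain C++ fallback with the same loop shape as planecopy_pp_shr_c.
static void planecopy_pp_shr_c(const pixel* src, intptr_t srcStride,
                               pixel* dst, intptr_t dstStride,
                               int width, int height, int shift)
{
    for (int r = 0; r < height; r++)
    {
        for (int c = 0; c < width; c++)
            dst[c] = (pixel)(src[c] >> shift);

        dst += dstStride;
        src += srcStride;
    }
}

// Hypothetical setup routine: registers the fallback in the table.
// An assembly setup step could overwrite the pointer afterwards.
static void setupMiniCPrimitives(MiniPrimitives& p)
{
    p.planecopy_pp_shr = planecopy_pp_shr_c;
}
```

Callers go through the table entry instead of naming the implementation, so swapping in a faster routine does not change any call site.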
x265_3.4.tar.gz/source/common/scaler.cpp Added
@@ -0,0 +1,1110 @@
+/*****************************************************************************
+* Copyright (C) 2013-2020 MulticoreWare, Inc
+*
+* Authors: Pooja Venkatesan <pooja@multicorewareinc.com>
+*
+* This program is free software; you can redistribute it and/or modify
+* it under the terms of the GNU General Public License as published by
+* the Free Software Foundation; either version 2 of the License, or
+* (at your option) any later version.
+*
+* This program is distributed in the hope that it will be useful,
+* but WITHOUT ANY WARRANTY; without even the implied warranty of
+* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+* GNU General Public License for more details.
+*
+* You should have received a copy of the GNU General Public License
+* along with this program; if not, write to the Free Software
+* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02111, USA.
+*
+* This program is also available under a commercial proprietary license.
+* For more information, contact us at license @ x265.com.
+*****************************************************************************/
+
+#include "scaler.h"
+
+#if _MSC_VER
+#pragma warning(disable: 4706) // assignment within conditional
+#pragma warning(disable: 4244) // '=' : possible loss of data
+#endif
+
+#define SHORT_MIN (-(1 << 15))
+#define SHORT_MAX ((1 << 15) - 1)
+#define SHORT_MAX_10 ((1 << 10) - 1)
+
+namespace X265_NS{
+
+ScalerFilterManager::ScalerFilterManager() :
+    m_bitDepth(0),
+    m_algorithmFlags(0),
+    m_srcW(0),
+    m_srcH(0),
+    m_dstW(0),
+    m_dstH(0),
+    m_crSrcW(0),
+    m_crSrcH(0),
+    m_crDstW(0),
+    m_crDstH(0),
+    m_crSrcHSubSample(0),
+    m_crSrcVSubSample(0),
+    m_crDstHSubSample(0),
+    m_crDstVSubSample(0)
+{
+    for (int i = 0; i < m_numSlice; i++)
+        m_slices[i] = NULL;
+    for (int i = 0; i < m_numFilter; i++)
+        m_ScalerFilters[i] = NULL;
+}
+
+inline static void filter_copy_c(int64_t* filter, int64_t* filter2, int size)
+{
+    for (int i = 0; i < size; i++)
+        filter2[i] = filter[i];
+}
+
+#if X265_DEPTH == 8
+static void doScaling_c(int16_t *dst, int dstW, const uint8_t *src, const int16_t *filter, const int32_t *filterPos, int filterSize)
+{
+    for (int i = 0; i < dstW; i++)
+    {
+        int val = 0;
+        int sourcePos = filterPos[i];
+        for (int j = 0; j < filterSize; j++)
+            val += ((int)src[sourcePos + j]) * filter[filterSize * i + j];
+        // the cubic equation does overflow ...
+        dst[i] = x265_clip3(SHORT_MIN, SHORT_MAX, val >> 7);
+    }
+}
+static uint8_t clipUint8(int a)
+{
+    if (a&(~0xFF))
+        return (-a) >> 31;
+    else
+        return a;
+}
+
+static void yuv2PlaneX_c(const int16_t *filter, int filterSize, const int16_t **src, uint8_t *dest, int dstW)
+{
+    for (int i = 0; i < dstW; i++)
+    {
+        int val = 64 << 12;
+        for (int j = 0; j < filterSize; j++)
+            val += src[j][i] * filter[j];
+        dest[i] = clipUint8(val >> 19);
+    }
+}
+#else
+static void yuv2PlaneX_c_h(const int16_t *filter, int filterSize, const int16_t **src, uint8_t *dest, int dstW)
+{
+    for (int i = 0; i < dstW; i++)
+    {
+        int val = 1 << 16;
+        uint16_t* dst16bit = (uint16_t *)dest;
+        for (int j = 0; j < filterSize; j++)
+            val += src[j][i] * filter[j];
+        uint16_t d = x265_clip3(0, SHORT_MAX_10, val >> 17);
+        ((uint8_t*)(&dst16bit[i]))[0] = (d);
+        ((uint8_t*)(&dst16bit[i]))[1] = (d) >> 8;
+    }
+}
+static void doScaling_c_h(int16_t *dst, int dstW, const uint8_t *src, const int16_t *filter, const int32_t *filterPos, int filterSize)
+{
+    const uint16_t *srcLocal = (const uint16_t *)src;
+    for (int i = 0; i < dstW; i++)
+    {
+        int val = 0;
+        int sourcePos = filterPos[i];
+        for (int j = 0; j < filterSize; j++)
+            val += ((int)srcLocal[sourcePos + j]) * filter[filterSize * i + j];
+        // the cubic equation does overflow
+        dst[i] = x265_clip3(SHORT_MIN, SHORT_MAX, val >> 9);
+    }
+}
+#endif
+
+ScalerFilter::ScalerFilter() :
+    m_filtLen(0),
+    m_filtPos(NULL),
+    m_filt(NULL),
+    m_sourceSlice(NULL),
+    m_destSlice(NULL)
+{
+}
+
+ScalerFilter::~ScalerFilter()
+{
+    if (m_filtPos) {
+        delete[] m_filtPos; m_filtPos = NULL;
+    }
+    if (m_filt) {
+        delete[] m_filt; m_filt = NULL;
+    }
+}
+
+void ScalerHLumFilter::process(int sliceVer, int sliceHor)
+{
+    uint8_t ** src = m_sourceSlice->m_plane[0].lineBuf;
+    uint8_t ** dst = m_destSlice->m_plane[0].lineBuf;
+    int sourcePos = sliceVer - m_sourceSlice->m_plane[0].sliceVer;
+    int destPos = sliceVer - m_destSlice->m_plane[0].sliceVer;
+    int dstW = m_destSlice->m_width;
+    for (int i = 0; i < sliceHor; ++i)
+    {
+        m_hFilterScaler->doScaling((int16_t*)dst[destPos + i], dstW, (const uint8_t *)src[sourcePos + i], m_filt, m_filtPos, m_filtLen);
+        m_destSlice->m_plane[0].sliceHor += 1;
+    }
+}
+
+void ScalerHCrFilter::process(int sliceVer, int sliceHor)
+{
+    uint8_t ** src1 = m_sourceSlice->m_plane[1].lineBuf;
+    uint8_t ** dst1 = m_destSlice->m_plane[1].lineBuf;
+    uint8_t ** src2 = m_sourceSlice->m_plane[2].lineBuf;
+    uint8_t ** dst2 = m_destSlice->m_plane[2].lineBuf;
+
+    int sourcePos1 = sliceVer - m_sourceSlice->m_plane[1].sliceVer;
+    int destPos1 = sliceVer - m_destSlice->m_plane[1].sliceVer;
+    int sourcePos2 = sliceVer - m_sourceSlice->m_plane[2].sliceVer;
+    int destPos2 = sliceVer - m_destSlice->m_plane[2].sliceVer;
+
+    int dstW = m_destSlice->m_width >> m_destSlice->m_hCrSubSample;
+
+    for (int i = 0; i < sliceHor; ++i)
+    {
+        m_hFilterScaler->doScaling((int16_t*)dst1[destPos1 + i], dstW, src1[sourcePos1 + i], m_filt, m_filtPos, m_filtLen);
+        m_hFilterScaler->doScaling((int16_t*)dst2[destPos2 + i], dstW, src2[sourcePos2 + i], m_filt, m_filtPos, m_filtLen);
+        m_destSlice->m_plane[1].sliceHor += 1;
+        m_destSlice->m_plane[2].sliceHor += 1;
+    }
+}
+
+void VFilterScaler8Bit::yuv2PlaneX(const int16_t *filter, int filterSize, const int16_t **src, uint8_t *dest, int dstW)
+{
+    int IdxW = FACTOR_4;
+    int IdxF = FIL_DEF;
+
+    (dstW % 4 == 0) && (filterSize == 6) && (IdxF = FIL_6) && (IdxW = FACTOR_4);
+    (dstW % 4 == 0) && (filterSize == 8) && (IdxF = FIL_8) && (IdxW = FACTOR_4);
+
+#if X265_DEPTH == 8
+    yuv2PlaneX_c(filter, filterSize, src, dest, dstW);
+#else
+    yuv2PlaneX_c_h(filter, filterSize, src, dest, dstW);
+#endif
+}
+
+void VFilterScaler10Bit::yuv2PlaneX(const int16_t *filter, int filterSize, const int16_t **src, uint8_t *dest, int dstW)
+{
+    int IdxW = FACTOR_4;
+    int IdxF = FIL_DEF;
+
+    (dstW % 4 == 0) && (filterSize == 6) && (IdxF = FIL_6) && (IdxW = FACTOR_4);
+    (dstW % 4 == 0) && (filterSize == 8) && (IdxF = FIL_8) && (IdxW = FACTOR_4);
+
+#if X265_DEPTH == 8
+    yuv2PlaneX_c(filter, filterSize, src, dest, dstW);
+#else
+    yuv2PlaneX_c_h(filter, filterSize, src, dest, dstW);
+#endif
+}
+
+void ScalerVLumFilter::process(int sliceVer, int sliceHor)
+{
+    (void)sliceHor;
+    int first = X265_MAX(1 - m_filtLen, m_filtPos[sliceVer]);
+    int sp = first - m_sourceSlice->m_plane[0].sliceVer;
+    int dp = sliceVer - m_destSlice->m_plane[0].sliceVer;
+    uint8_t **src = m_sourceSlice->m_plane[0].lineBuf + sp;
+    uint8_t **dst = m_destSlice->m_plane[0].lineBuf + dp;
+    int16_t *filter = m_filt + (sliceVer * m_filtLen);
+    int dstW = m_destSlice->m_width;
+    m_vFilterScaler->yuv2PlaneX(filter, m_filtLen, (const int16_t**)src, dst[0], dstW);
+}
+
+void ScalerVCrFilter::process(int sliceVer, int sliceHor)
+{
+    (void)sliceHor;
+
+    const int crSkipMask = (1 << m_destSlice->m_vCrSubSample) - 1;
+    if (sliceVer & crSkipMask)
+        return;
+    else
+    {
+        int dstW = m_destSlice->m_width >> m_destSlice->m_hCrSubSample;
+        int crSliceVer = sliceVer >> m_destSlice->m_vCrSubSample;
+        int first = X265_MAX(1 - m_filtLen, m_filtPos[crSliceVer]);
+        int sp1 = first - m_sourceSlice->m_plane[1].sliceVer;
+        int sp2 = first - m_sourceSlice->m_plane[2].sliceVer;
+        int dp1 = crSliceVer - m_destSlice->m_plane[1].sliceVer;
+        int dp2 = crSliceVer - m_destSlice->m_plane[2].sliceVer;
+        uint8_t **src1 = m_sourceSlice->m_plane[1].lineBuf + sp1;
+        uint8_t **src2 = m_sourceSlice->m_plane[2].lineBuf + sp2;
+        uint8_t **dst1 = m_destSlice->m_plane[1].lineBuf + dp1;
+        uint8_t **dst2 = m_destSlice->m_plane[2].lineBuf + dp2;
+        int16_t *filter = m_filt + (crSliceVer * m_filtLen);
+
+        m_vFilterScaler->yuv2PlaneX((int16_t*)filter, m_filtLen, (const int16_t**)src1, dst1[0], dstW);
+        m_vFilterScaler->yuv2PlaneX((int16_t*)filter, m_filtLen, (const int16_t**)src2, dst2[0], dstW);
+    }
+}
+
+int ScalerFilter::initCoeff(int flag, int inc, int srcW, int dstW, int filtAlign, int one, int sourcePos, int destPos)
+{
+    int filterSize;
+    int filter2Size;
+    int minFilterSize;
+    int64_t *filter = NULL;
+    int64_t *filter2 = NULL;
+    const int64_t fone = 1LL << (54 - x265_min((int)X265_LOG2(srcW / dstW), 8));
+    int *outFilterSize = &m_filtLen;
+    int64_t xDstInSrc;
+    int sizeFactor = flag;
+
+    // Init filter pos, the +3 is for the MMX(+1) / SSE(+3) scaler which reads over the end
+    m_filtPos = new int32_t[dstW + 3];
+    int32_t **filterPos = &m_filtPos;
+
+    if (inc <= 1 << 16)
+        filterSize = 1 + sizeFactor; // upscale
+    else
+        filterSize = 1 + (sizeFactor * srcW + dstW - 1) / dstW;
+
+    filterSize = x265_min(filterSize, srcW - 2);
+    filterSize = x265_max(filterSize, 1);
+    filter = new int64_t[dstW * sizeof(*filter) * filterSize];
+
+    xDstInSrc = ((destPos*(int64_t)inc) >> 7) - ((sourcePos * 0x10000LL) >> 7);
+    for (int i = 0; i < dstW; i++)
+    {
+        int xx = (xDstInSrc - (filterSize - 2) * (1LL << 16)) / (1 << 17);
+        (*filterPos)[i] = xx;
+        for (int j = 0; j < filterSize; j++)
+        {
+            int64_t d = (X265_ABS(((int64_t)xx * (1 << 17)) - xDstInSrc)) << 13;
+            int64_t coeff = 0;
+
+            if (inc > 1 << 16)
+                d = d * dstW / srcW;
+
+            if (flag == 4) // BiCUBIC
+            {
+                int64_t B = (0) * (1 << 24);
+                int64_t C = (0.6) * (1 << 24);
+
+                if (d >= 1LL << 31)
+                    coeff = 0.0;
+                else
+                {
+                    int64_t dd = (d  * d) >> 30;
+                    int64_t ddd = (dd * d) >> 30;
+
+                    if (d < 1LL << 30)
+                        coeff = (12 * (1 << 24) - 9 * B - 6 * C) * ddd + (-18 * (1 << 24) + 12 * B + 6 * C) * dd + (6 * (1 << 24) - 2 * B) * (1 << 30);
+                    else
+                        coeff = (-B - 6 * C) * ddd + (6 * B + 30 * C) * dd + (-12 * B - 48 * C) * d + (8 * B + 24 * C) * (1 << 30);
+                }
+                coeff /= (1LL << 54) / fone;
+            }
+            else if (flag == 1) // BILINEAR
+            {
+                coeff = (1 << 30) - d;
+                if (coeff < 0)
+                    coeff = 0;
+                coeff *= fone >> 30;
+            }
+            else
+                assert(0);
+
+            filter[i * filterSize + j] = coeff;
+            xx++;
+        }
+        xDstInSrc += 2 * inc;
+    }
+
+    //apply src & dst Filter to filter -> filter2
+    X265_CHECK(filterSize > 0, "invalid filterSize value.\n");
+    filter2Size = filterSize;
+    filter2 = new int64_t[dstW * sizeof(*filter2) * filter2Size];
+
+    /* This is hard to read code, but much faster. Speed is crucial here */
+    int index = RES_FACTOR_DEF;
+    int size = dstW * filterSize;
+
+    (size % 4 == 0) && (index = RES_FACTOR_4);
+    (size % 8 == 0) && (index = RES_FACTOR_8);
+    (size % 16 == 0) && (index = RES_FACTOR_16);
+    (size % 32 == 0) && (index = RES_FACTOR_32);
+    (size % 64 == 0) && (index = RES_FACTOR_64);
+
+    filter_copy_c(filter, filter2, size);
+
+    delete[](filter);
+
+    // try to reduce the filter-size (step1 find size and shift left)
+    // Assume it is near normalized (*0.5 or *2.0 is OK but * 0.001 is not).
+    minFilterSize = 0;
+    for (int i = dstW - 1; i >= 0; i--)
+    {
+        int min = filter2Size;
+        int64_t cutOff = 0.0;
+
+        // get rid of near zero elements on the left by shifting left
+        for (int j = 0; j < filter2Size; j++)
+        {
+            int k;
+            cutOff += X265_ABS(filter2[i * filter2Size]);
+
+            if (cutOff > SCALER_MAX_REDUCE_CUTOFF * fone)
+                break;
+            // preserve monotonicity because the core can't handle the filter otherwise
+            if (i < dstW - 1 && (*filterPos)[i] >= (*filterPos)[i + 1])
+                break;
+
+            // move filter coefficients left
+            for (k = 1; k < filter2Size; k++)
+                filter2[i * filter2Size + k - 1] = filter2[i * filter2Size + k];
+            filter2[i * filter2Size + k - 1] = 0;
+            (*filterPos)[i]++;
+        }
+
+        cutOff = 0;
+        // count near zeros on the right
+        for (int j = filter2Size - 1; j > 0; j--)
+        {
+            cutOff += X265_ABS(filter2[i * filter2Size + j]);
+
+            if (cutOff > SCALER_MAX_REDUCE_CUTOFF * fone)
+                break;
+            min--;
+        }
+
+        if (min > minFilterSize)
+            minFilterSize = min;
+    }
+
+    X265_CHECK(minFilterSize > 0, "invalid minFilterSize value.\n");
+    filterSize = (minFilterSize + (filtAlign - 1)) & (~(filtAlign - 1));
+    X265_CHECK(filterSize > 0, "invalid filterSize value.\n");
+    filter = new int64_t[dstW*filterSize * sizeof(*filter)];
+
+    *outFilterSize = filterSize;
+
+    // try to reduce the filter-size (step2 reduce it)
+    for (int i = 0; i < dstW; i++)
+    {
+        for (int j = 0; j < filterSize; j++)
+        {
+            if (j >= filter2Size)
+                filter[i * filterSize + j] = 0;
+            else
+                filter[i * filterSize + j] = filter2[i * filter2Size + j];
+            if ((flag & SCALER_BITEXACT) && j >= minFilterSize)
+                filter[i * filterSize + j] = 0;
+        }
+    }
+
+    // fix borders
+    for (int i = 0; i < dstW; i++)
+    {
+        int j;
+        if ((*filterPos)[i] < 0)
+        {
+            // move filter coefficients left to compensate for filterPos
+            for (j = 1; j < filterSize; j++)
+            {
+                int left = x265_max(j + (*filterPos)[i], 0);
+                filter[i * filterSize + left] += filter[i * filterSize + j];
+                filter[i * filterSize + j] = 0;
+            }
+            (*filterPos)[i] = 0;
+        }
+
+        if ((*filterPos)[i] + filterSize > srcW)
+        {
+            int shift = (*filterPos)[i] + x265_min(filterSize - srcW, 0);
+            int64_t acc = 0;
+
+            for (j = filterSize - 1; j >= 0; j--)
+            {
+                if ((*filterPos)[i] + j >= srcW)
+                {
+                    acc += filter[i * filterSize + j];
+                    filter[i * filterSize + j] = 0;
+                }
+            }
+            for (j = filterSize - 1; j >= 0; j--)
+            {
+                if (j < shift)
+                    filter[i * filterSize + j] = 0;
+                else
+                    filter[i * filterSize + j] = filter[i * filterSize + j - shift];
+            }
+
+            (*filterPos)[i] -= shift;
+            filter[i * filterSize + srcW - 1 - (*filterPos)[i]] += acc;
+        }
+
+        X265_CHECK((*filterPos)[i] >= 0, "invalid: Value of (*filterPos)[%d] < 0.\n", i);
+        X265_CHECK((*filterPos)[i] < srcW, "invalid: Value of (*filterPos)[%d] > %d .\n", i, srcW);
+        if ((*filterPos)[i] + filterSize > srcW)
+        {
+            for (j = 0; j < filterSize; j++)
+            {
+                X265_CHECK(!filter[i * filterSize + j], "invalid: Value of filter[%d * filterSize + %d] != 0.\n", i, j);
+                X265_CHECK((*filterPos)[i] + j < srcW, "invalid: (*filterPos)[%d] + %d > %d .\n", i, i, srcW);
+            }
+        }
+    }
+
+    // init filter
+    m_filt = new int16_t[(dstW + 3)*(*outFilterSize)];
+    int16_t **outFilter = &m_filt;
+
+    // normalize & store in outFilter
+    for (int i = 0; i < dstW; i++)
+    {
+        int64_t error = 0;
+        int64_t sum = 0;
+
+        for (int j = 0; j < filterSize; j++)
+            sum += filter[i * filterSize + j];
+        sum = (sum + one / 2) / one;
+        if (!sum)
+        {
+            x265_log(NULL, X265_LOG_WARNING, "Scaler: zero vector in scaling\n");
+            sum = 1;
+        }
+        for (int j = 0; j < *outFilterSize; j++)
+        {
+            int64_t v = filter[i * filterSize + j] + error;
+            int intV = ROUNDED_DIVISION(v, sum);
+            (*outFilter)[i * (*outFilterSize) + j] = intV;
+            error = v - intV * sum;
+        }
+    }
+
+    (*filterPos)[dstW + 0] =
+        (*filterPos)[dstW + 1] =
+        (*filterPos)[dstW + 2] = (*filterPos)[dstW - 1];
+    for (int i = 0; i < *outFilterSize; i++)
+    {
+        int k = (dstW - 1) * (*outFilterSize) + i;
+        (*outFilter)[k + 1 * (*outFilterSize)] =
+            (*outFilter)[k + 2 * (*outFilterSize)] =
+            (*outFilter)[k + 3 * (*outFilterSize)] = (*outFilter)[k];
+    }
+
+    delete[](filter);
+    delete[](filter2);
+    return 0;
+}
+
+int ScalerFilterManager::init(int algorithmFlags, VideoDesc *srcVideoDesc, VideoDesc *dstVideoDesc)
+{
+    int srcW = m_srcW = srcVideoDesc->m_width;
+    int srcH = m_srcH = srcVideoDesc->m_height;
+    int dstW = m_dstW = dstVideoDesc->m_width;
+    int dstH = m_dstH = dstVideoDesc->m_height;
+    int lumXInc, crXInc;
+    int lumYInc, crYInc;
+    int  srcHCrPos;
+    int  dstHCrPos;
+    int  srcVCrPos;
+    int  dstVCrPos;
+    int dst_stride = SCALER_ALIGN(dstW * sizeof(int16_t) + 66, 16);
+    m_bitDepth = dstVideoDesc->m_inputDepth;
+    if (m_bitDepth == 16)
+        dst_stride <<= 1;
+
+    m_algorithmFlags = algorithmFlags;
+    lumXInc = (((int64_t)srcW << 16) + (dstW >> 1)) / dstW;
+    lumYInc = (((int64_t)srcH << 16) + (dstH >> 1)) / dstH;
+
+    srcHCrPos = -513;
+    dstHCrPos = -513;
+    srcVCrPos = -513;
+    dstVCrPos = -513;
+
+    int srcCsp = srcVideoDesc->m_csp;
+    if (x265_cli_csps[srcCsp].planes > 1)
+    {
+        m_crSrcHSubSample = x265_cli_csps[srcCsp].width[1];
+        m_crSrcVSubSample = x265_cli_csps[srcCsp].height[1];
+        m_crSrcW = srcVideoDesc->m_width >> m_crSrcHSubSample;
+        m_crSrcH = srcVideoDesc->m_height >> m_crSrcVSubSample;
+        if (srcCsp == 1)// i420
+            srcVCrPos = 128;
+    }
+    else
+    {
+        m_crSrcW = 0;
+        m_crSrcH = 0;
+        m_crSrcHSubSample = 0;
+        m_crSrcVSubSample = 0;
+    }
+    int dstCsp = dstVideoDesc->m_csp;
+    if (x265_cli_csps[dstCsp].planes > 1)
+    {
+        m_crDstHSubSample = x265_cli_csps[dstCsp].width[1];
+        m_crDstVSubSample = x265_cli_csps[dstCsp].height[1];
+        m_crDstW = dstVideoDesc->m_width >> m_crDstHSubSample;
+        m_crDstH = dstVideoDesc->m_height >> m_crDstVSubSample;
+        if (dstCsp == 1)// i420
+            dstVCrPos = 128;
+    }
+    else
+    {
+        m_crDstW = 0;
+        m_crDstH = 0;
+        m_crDstHSubSample = 0;
+        m_crDstVSubSample = 0;
+    }
+    // Only srcCsp == dstCsp is supported at present
+    if (srcCsp != dstCsp)
+    {
+        x265_log(NULL, X265_LOG_ERROR, "wrong, source csp != destination csp \n");
+        return false;
+    }
+
+    lumXInc = (((int64_t)srcW << 16) + (dstW >> 1)) / dstW;
+    lumYInc = (((int64_t)srcH << 16) + (dstH >> 1)) / dstH;
+    crXInc = (((int64_t)m_crSrcW << 16) + (m_crDstW >> 1)) / m_crDstW;
+    crYInc = (((int64_t)m_crSrcH << 16) + (m_crDstH >> 1)) / m_crDstH;
+
+    const int filterAlign = 1;
+
+    // init horizontal Luma Scaler filter
+    m_ScalerFilters[0] = new ScalerHLumFilter(m_bitDepth);
+    m_ScalerFilters[0]->initCoeff(m_algorithmFlags, lumXInc, srcW, dstW, filterAlign, 1 << 14, getLocalPos(0, 0), getLocalPos(0, 0));
+
+    // init horizontal cr Scaler filter
+    m_ScalerFilters[1] = new ScalerHCrFilter(m_bitDepth);
+    m_ScalerFilters[1]->initCoeff(m_algorithmFlags, crXInc, m_crSrcW, m_crDstW, filterAlign, 1 << 14,
+        getLocalPos(m_crSrcHSubSample, srcHCrPos), getLocalPos(m_crDstHSubSample, dstHCrPos));
+
+    // init vertical Luma scaler filter
+    m_ScalerFilters[2] = new ScalerVLumFilter(m_bitDepth);
+    m_ScalerFilters[2]->initCoeff(m_algorithmFlags, lumYInc, srcH, dstH, filterAlign, 1 << 12, getLocalPos(0, 0), getLocalPos(0, 0));
+
+    // init vertical cr scaler filter
+    m_ScalerFilters[3] = new ScalerVCrFilter(m_bitDepth);
+    m_ScalerFilters[3]->initCoeff(m_algorithmFlags, crYInc, m_crSrcH, m_crDstH, filterAlign, 1 << 12,
+        getLocalPos(m_crSrcVSubSample, srcVCrPos), getLocalPos(m_crDstVSubSample, dstVCrPos));
+
+    // init slice, must after filter initialization
+    initScalerSlice();
+
+    // set slice
+    m_ScalerFilters[0]->setSlice(m_slices[0], m_slices[1]);
+    m_ScalerFilters[1]->setSlice(m_slices[0], m_slices[1]);
+
+    m_ScalerFilters[2]->setSlice(m_slices[1], m_slices[2]);
+    m_ScalerFilters[3]->setSlice(m_slices[1], m_slices[2]);
+
+    return 0;
+}
+
+void HFilterScaler8Bit::doScaling(int16_t *dst, int dstW, const uint8_t *src, const int16_t *filter, const int32_t *filterPos, int filterSize)
+{
+    int IdxW = FACTOR_4;
+    int IdxF = FIL_DEF;
+
+    /* This is hard to read code, but much faster. Speed is crucial here */
+    (dstW % 8 == 0) && (filterSize == 6) && (IdxF = FIL_6) && (IdxW = FACTOR_8);
+    (dstW % 8 == 0) && (filterSize == 8) && (IdxF = FIL_8) && (IdxW = FACTOR_8);
+    (dstW % 8 == 0) && (filterSize == 16) && (IdxF = FIL_16) && (IdxW = FACTOR_8);
+    (dstW % 8 == 0) && (filterSize == 11) && (IdxF = FIL_11) && (IdxW = FACTOR_8);
+    (dstW % 8 == 0) && (filterSize == 10) && (IdxF = FIL_10) && (IdxW = FACTOR_8);
+    (dstW % 8 == 0) && (filterSize == 9) && (IdxF = FIL_9) && (IdxW = FACTOR_8);
+    (dstW % 8 == 0) && (filterSize == 15) && (IdxF = FIL_15) && (IdxW = FACTOR_8);
+    (dstW % 8 == 0) && (filterSize == 13) && (IdxF = FIL_13) && (IdxW = FACTOR_8);
+
+    /* Do not check multiple of width 4, if width is already multiple of 8 */
+    !(dstW % 8 == 0) && (dstW % 4 == 0) && (filterSize == 6) && (IdxF = FIL_6) && (IdxW = FACTOR_4);
+    !(dstW % 8 == 0) && (dstW % 4 == 0) && (filterSize == 8) && (IdxF = FIL_8) && (IdxW = FACTOR_4);
+    !(dstW % 8 == 0) && (dstW % 4 == 0) && (filterSize == 16) && (IdxF = FIL_16) && (IdxW = FACTOR_4);
+
+    (dstW % 4 == 0) && (filterSize == 24) && (IdxF = FIL_24) && (IdxW = FACTOR_4);
+    (dstW % 4 == 0) && (filterSize == 22) && (IdxF = FIL_22) && (IdxW = FACTOR_4);
+    (dstW % 4 == 0) && (filterSize == 19) && (IdxF = FIL_19) && (IdxW = FACTOR_4);
+    (dstW % 4 == 0) && (filterSize == 17) && (IdxF = FIL_17) && (IdxW = FACTOR_4);
+
+#if X265_DEPTH == 8
+    doScaling_c(dst, dstW, src, filter, filterPos, filterSize);
+#else
+    doScaling_c_h(dst, dstW, src, filter, filterPos, filterSize);
+#endif
+}
+
+void HFilterScaler10Bit::doScaling(int16_t *dst, int dstW, const uint8_t *src, const int16_t *filter, const int32_t *filterPos, int filterSize)
+{
+    int IdxW = FACTOR_4;
+    int IdxF = FIL_DEF;
+
+    /* This is hard to read code, but much faster. Speed is crucial here */
+    (dstW % 8 == 0) && (filterSize == 6) && (IdxF = FIL_6) && (IdxW = FACTOR_8);
+    (dstW % 8 == 0) && (filterSize == 8) && (IdxF = FIL_8) && (IdxW = FACTOR_8);
+    (dstW % 8 == 0) && (filterSize == 16) && (IdxF = FIL_16) && (IdxW = FACTOR_8);
+    (dstW % 8 == 0) && (filterSize == 11) && (IdxF = FIL_11) && (IdxW = FACTOR_8);
+    (dstW % 8 == 0) && (filterSize == 10) && (IdxF = FIL_10) && (IdxW = FACTOR_8);
+    (dstW % 8 == 0) && (filterSize == 9) && (IdxF = FIL_9) && (IdxW = FACTOR_8);
+    (dstW % 8 == 0) && (filterSize == 15) && (IdxF = FIL_15) && (IdxW = FACTOR_8);
+    (dstW % 8 == 0) && (filterSize == 13) && (IdxF = FIL_13) && (IdxW = FACTOR_8);
+
+    /* Do not check multiple of width 4, if width is already multiple of 8 */
+    !(dstW % 8 == 0) && (dstW % 4 == 0) && (filterSize == 6) && (IdxF = FIL_6) && (IdxW = FACTOR_4);
+    !(dstW % 8 == 0) && (dstW % 4 == 0) && (filterSize == 8) && (IdxF = FIL_8) && (IdxW = FACTOR_4);
+    !(dstW % 8 == 0) && (dstW % 4 == 0) && (filterSize == 16) && (IdxF = FIL_16) && (IdxW = FACTOR_4);
+
+    (dstW % 4 == 0) && (filterSize == 24) && (IdxF = FIL_24) && (IdxW = FACTOR_4);
+    (dstW % 4 == 0) && (filterSize == 22) && (IdxF = FIL_22) && (IdxW = FACTOR_4);
+    (dstW % 4 == 0) && (filterSize == 19) && (IdxF = FIL_19) && (IdxW = FACTOR_4);
+    (dstW % 4 == 0) && (filterSize == 17) && (IdxF = FIL_17) && (IdxW = FACTOR_4);
+
+#if X265_DEPTH == 8
+    doScaling_c(dst, dstW, src, filter, filterPos, filterSize);
+#else
+    doScaling_c_h(dst, dstW, src, filter, filterPos, filterSize);
+#endif
+}
+
+int ScalerFilterManager::scale_pic(void ** src, void ** dst, int * srcStride, int * dstStride)
+{
+    uint8_t** src_8bit, **dst_8bit;
+    src_8bit = (uint8_t**)src;
+    dst_8bit = (uint8_t**)dst;
+    if (!src_8bit || !dst_8bit)
+        return -1;
+
+    const int srcsliceHor = m_srcH;
+    const int dstW = m_dstW;
+    const int dstH = m_dstH;
+    int32_t *vLumFilterPos = m_ScalerFilters[2]->m_filtPos;
+    int32_t *vCrFilterPos = m_ScalerFilters[3]->m_filtPos;
+    const int vLumFilterSize = m_ScalerFilters[2]->m_filtLen;
+    const int vCrFilterSize = m_ScalerFilters[3]->m_filtLen;
+    const int crSrcsliceHor = UH_CEIL_SHIFTR(srcsliceHor, m_crSrcVSubSample);
+
+    // vars which will change and which we need to store back in the context
+    int lumBufIndex = -1;
+    int crBufIndex = -1;
+    int lastInLumBuf = -1;
+    int lastInCrBuf = -1;
+
+    int hasLumHoles = 1;
+    int hasCrHoles = 1;
+
+    ScalerSlice *src_slice = m_slices[0];
+    ScalerSlice *hout_slice = m_slices[1];
+    ScalerSlice *vout_slice = m_slices[2];
+    src_slice->initFromSrc((uint8_t**)src, srcStride, m_srcW, 0, srcsliceHor, 0, crSrcsliceHor, 1);
+    vout_slice->initFromSrc((uint8_t**)dst, dstStride, m_dstW, 0, dstH, 0, UH_CEIL_SHIFTR(dstH, m_crDstVSubSample), 0);
+
+    hout_slice->m_plane[0].sliceVer = 0;
+    hout_slice->m_plane[1].sliceVer = 0;
+    hout_slice->m_plane[2].sliceVer = 0;
+    hout_slice->m_plane[3].sliceVer = 0;
+    hout_slice->m_plane[0].sliceHor = 0;
+    hout_slice->m_plane[1].sliceHor = 0;
+    hout_slice->m_plane[2].sliceHor = 0;
+    hout_slice->m_plane[3].sliceHor = 0;
+    hout_slice->m_width = dstW;
+
+    for (int dstY = 0; dstY < dstH; dstY++)
+    {
+        const int crDstY = dstY >> m_crDstVSubSample;
+        const int firstLumSrcY = x265_max(1 - vLumFilterSize, vLumFilterPos[dstY]);
+        const int firstLumSrcY2 = x265_max(1 - vLumFilterSize, vLumFilterPos[x265_min(dstY | ((1 << m_crDstVSubSample) - 1), dstH - 1)]);
+        const int firstCrSrcY = x265_max(1 - vCrFilterSize, vCrFilterPos[crDstY]);
+
+        int lastLumSrcY = x265_min(m_srcH, firstLumSrcY + vLumFilterSize) - 1;
+        int lastLumSrcY2 = x265_min(m_srcH, firstLumSrcY2 + vLumFilterSize) - 1;
+        int lastCrSrcY = x265_min(m_crSrcH, firstCrSrcY + vCrFilterSize) - 1;
+
+        // handle holes
+        if (firstLumSrcY > lastInLumBuf)
+        {
+            hasLumHoles = lastInLumBuf != firstLumSrcY - 1;
+            if (hasLumHoles)
+            {
+                hout_slice->m_plane[0].sliceVer = firstLumSrcY;
+                hout_slice->m_plane[3].sliceVer = firstLumSrcY;
+                hout_slice->m_plane[0].sliceHor =
+                    hout_slice->m_plane[3].sliceHor = 0;
+            }
+
+            lastInLumBuf = firstLumSrcY - 1;
+        }
+        if (firstCrSrcY > lastInCrBuf)
+        {
+            hasCrHoles = lastInCrBuf != firstCrSrcY - 1;
+            if (hasCrHoles)
+            {
+                hout_slice->m_plane[1].sliceVer = firstCrSrcY;
+                hout_slice->m_plane[2].sliceVer = firstCrSrcY;
+                hout_slice->m_plane[1].sliceHor =
+                    hout_slice->m_plane[2].sliceHor = 0;
+            }
+
+            lastInCrBuf = firstCrSrcY - 1;
+        }
+
+        // Do we have enough lines in this slice to output the dstY line
+        int enoughLines = lastLumSrcY2 < 0 + srcsliceHor && lastCrSrcY < UH_CEIL_SHIFTR(0 + srcsliceHor, m_crSrcVSubSample);
+        if (!enoughLines)
+        {
+            lastLumSrcY = 0 + srcsliceHor - 1;
+            lastCrSrcY = 0 + crSrcsliceHor - 1;
+            x265_log(NULL, X265_LOG_INFO, "buffering slice: lastLumSrcY %d lastCrSrcY %d\n", lastLumSrcY, lastCrSrcY);
+        }
+
+        X265_CHECK(((lastLumSrcY - firstLumSrcY + 1) <= hout_slice->m_plane[0].availLines), "invalid value %d", lastLumSrcY - firstLumSrcY + 1);
+        X265_CHECK((lastCrSrcY - firstCrSrcY + 1) <= hout_slice->m_plane[1].availLines, "invalid value %d", lastCrSrcY - firstCrSrcY + 1);
+
+        int firstPosY, lastPosY, firstCPosY, lastCPosY;
+        int posY = hout_slice->m_plane[0].sliceVer + hout_slice->m_plane[0].sliceHor;
+        if (posY <= lastLumSrcY && !hasLumHoles)
+        {
+            firstPosY = x265_max(firstLumSrcY, posY);
+            lastPosY = x265_min(firstLumSrcY + hout_slice->m_plane[0].availLines - 1, 0 + srcsliceHor - 1);
771
+        }
772
+        else
773
+        {
774
+            firstPosY = posY;
775
+            lastPosY = lastLumSrcY;
776
+        }
777
+
778
+        int cPosY = hout_slice->m_plane[1].sliceVer + hout_slice->m_plane[1].sliceHor;
779
+        if (cPosY <= lastCrSrcY && !hasCrHoles)
780
+        {
781
+            firstCPosY = x265_max(firstCrSrcY, cPosY);
782
+            lastCPosY = x265_min(firstCrSrcY + hout_slice->m_plane[1].availLines - 1, UH_CEIL_SHIFTR(0 + srcsliceHor, m_crSrcVSubSample) - 1);
783
+        }
784
+        else
785
+        {
786
+            firstCPosY = cPosY;
787
+            lastCPosY = lastCrSrcY;
788
+        }
789
+
790
+        hout_slice->rotate(lastPosY, lastCPosY);
791
+        // horizontal luma scale
792
+        if (posY < lastLumSrcY + 1)
793
+            m_ScalerFilters[0]->process(firstPosY, lastPosY - firstPosY + 1);
794
+
795
+        lumBufIndex += lastLumSrcY - lastInLumBuf;
796
+        lastInLumBuf = lastLumSrcY;
797
+        // horizontal chroma Scale
798
+        if (cPosY < lastCrSrcY + 1)
799
+            m_ScalerFilters[1]->process(firstCPosY, lastCPosY - firstCPosY + 1);
800
+
801
+        crBufIndex += lastCrSrcY - lastInCrBuf;
802
+        lastInCrBuf = lastCrSrcY;
803
+
804
+        // wrap buf index around to stay inside the ring buffer
805
+        if (lumBufIndex >= vLumFilterSize)
806
+            lumBufIndex -= vLumFilterSize;
807
+        if (crBufIndex >= vCrFilterSize)
808
+            crBufIndex -= vCrFilterSize;
809
+        if (!enoughLines)
810
+            break;  // we can't output a dstY line so let's try with the next slice
811
+
812
+        // vertical scale(output converter)
813
+        for (int i = 2; i < m_numFilter; ++i)
814
+            m_ScalerFilters[i]->process(dstY, 1);
815
+    }
816
+    return 0;
817
+}
818
+
819
+void ScalerFilterManager::getMinBufferSize(int *out_lum_size, int *out_cr_size)
820
+{
821
+    int lumY;
822
+    int dstH = m_dstH;
823
+    int crDstH = m_crDstH;
824
+    int *lumFilterPos = m_ScalerFilters[2]->m_filtPos;
825
+    int *crFilterPos = m_ScalerFilters[3]->m_filtPos;
826
+    int lumFilterSize = m_ScalerFilters[2]->m_filtLen;
827
+    int crFilterSize = m_ScalerFilters[3]->m_filtLen;
828
+    int crSubSample = m_crSrcVSubSample;
829
+
830
+    *out_lum_size = lumFilterSize;
831
+    *out_cr_size = crFilterSize;
832
+
833
+    for (lumY = 0; lumY < dstH; lumY++)
834
+    {
835
+        int crY = (int64_t)lumY * crDstH / dstH;
836
+        int nextSlice = x265_max(lumFilterPos[lumY] + lumFilterSize - 1, ((crFilterPos[crY] + crFilterSize - 1) << crSubSample));
837
+
838
+        nextSlice >>= crSubSample;
839
+        nextSlice <<= crSubSample;
840
+        (*out_lum_size) = x265_max((*out_lum_size), nextSlice - lumFilterPos[lumY]);
841
+        (*out_cr_size) = x265_max((*out_cr_size), (nextSlice >> crSubSample) - crFilterPos[crY]);
842
+    }
843
+}
844
+
845
+int ScalerFilterManager::initScalerSlice()
846
+{
847
+    int ret = 0;
848
+    int dst_stride = SCALER_ALIGN(m_dstW * sizeof(int16_t) + 66, 16);
849
+    if (m_bitDepth == 16)
850
+        dst_stride <<= 1;
851
+
852
+    int lumBufSize;
853
+    int crBufSize;
854
+    int vLumFilterSize = m_ScalerFilters[2]->m_filtLen; // Vertical filter size for luma pixels.
855
+    int vCrFilterSize = m_ScalerFilters[3]->m_filtLen;  // Vertical filter size for chroma pixels.
856
+    getMinBufferSize(&lumBufSize, &crBufSize);
857
+    lumBufSize = X265_MAX(lumBufSize, vLumFilterSize + MAX_NUM_LINES_AHEAD);
858
+    crBufSize = X265_MAX(crBufSize, vCrFilterSize + MAX_NUM_LINES_AHEAD);
859
+
860
+    for (int i = 0; i < m_numSlice; i++)
861
+        m_slices[i] = new ScalerSlice;
862
+    ret = m_slices[0]->create(m_srcH, m_crSrcH, m_crSrcHSubSample, m_crSrcVSubSample, 0);
863
+    if (ret < 0)
864
+    {
865
+        x265_log(NULL, X265_LOG_ERROR, "alloc_slice m_slice[0] failed\n");
866
+        return -1;
867
+    }
868
+
869
+    // horizontal scaler output
870
+    ret = m_slices[1]->create(lumBufSize, crBufSize, m_crDstHSubSample, m_crDstVSubSample, 1);
871
+    if (ret < 0)
872
+    {
873
+        x265_log(NULL, X265_LOG_ERROR, "m_slice[1].create failed\n");
874
+        return -1;
875
+    }
876
+    ret = m_slices[1]->createLines(dst_stride, m_dstW);
877
+    if (ret < 0)
878
+    {
879
+        x265_log(NULL, X265_LOG_ERROR, "m_slice[1].createLines failed\n");
880
+        return -1;
881
+    }
882
+
883
+    m_slices[1]->fillOnes(dst_stride >> 1, m_bitDepth == 16);
884
+
885
+    // vertical scaler output
886
+    ret = m_slices[2]->create(m_dstH, m_crDstH, m_crDstHSubSample, m_crDstVSubSample, 0);
887
+    if (ret < 0)
888
+    {
889
+        x265_log(NULL, X265_LOG_ERROR, "m_slice[2].create failed\n");
890
+        return -1;
891
+    }
892
+
893
+    return 0;
894
+}
895
+
896
+int ScalerFilterManager::getLocalPos(int crSubSample, int pos)
897
+{
898
+    if (pos == -1 || pos <= -513)
899
+        pos = (128 << crSubSample) - 128;
900
+    pos += 128; // relative to ideal left edge
901
+    return pos >> crSubSample;
902
+}
903
+
904
+ScalerSlice::ScalerSlice() :
905
+    m_width(0),
906
+    m_hCrSubSample(0),
907
+    m_vCrSubSample(0),
908
+    m_isRing(0),
909
+    m_destroyLines(0)
910
+{
911
+    for (int i = 0; i < m_numSlicePlane; i++)
912
+    {
913
+        m_plane[i].availLines = 0;
914
+        m_plane[i].sliceVer = 0;
915
+        m_plane[i].sliceHor = 0;
916
+        m_plane[i].lineBuf = NULL;
917
+    }
918
+}
919
+
920
+void ScalerSlice::destroy()
921
+{
922
+    if (m_destroyLines)
923
+        destroyLines();
924
+    for (int i = 0; i < m_numSlicePlane; i++)
925
+    {
926
+        if (m_plane[i].lineBuf)
927
+            X265_FREE(m_plane[i].lineBuf);
928
+    }
929
+}
930
+
931
+int ScalerSlice::create(int lumLines, int crLines, int h_sub_sample, int v_sub_sample, int ring)
932
+{
933
+    int i;
934
+    int size[4] = { lumLines, crLines, crLines, lumLines };
935
+
936
+    m_hCrSubSample = h_sub_sample;
937
+    m_vCrSubSample = v_sub_sample;
938
+    m_isRing = ring;
939
+    m_destroyLines = 0;
940
+
941
+    for (i = 0; i < m_numSlicePlane; ++i)
942
+    {
943
+        int n = size[i] * (ring == 0 ? 1 : 3);
944
+        m_plane[i].lineBuf = X265_MALLOC(uint8_t*, n);
945
+        if (!m_plane[i].lineBuf)
946
+            return -1;
947
+
948
+        m_plane[i].availLines = size[i];
949
+        m_plane[i].sliceVer = 0;
950
+        m_plane[i].sliceHor = 0;
951
+    }
952
+    return 0;
953
+}
954
+
955
+/*
956
+slice lines contains extra bytes for vectorial code thus @size
957
+is the allocated memory size and @width is the number of pixels
958
+*/
959
+int ScalerSlice::createLines(int size, int width)
960
+{
961
+    int i;
962
+    int idx[2] = { 3, 2 };
963
+
964
+    m_destroyLines = 1;
965
+    m_width = width;
966
+
967
+    for (i = 0; i < 2; ++i) {
968
+        int n = m_plane[i].availLines;
969
+        int j;
970
+        int ii = idx[i];
971
+        assert(n == m_plane[ii].availLines);
972
+        for (j = 0; j < n; ++j)
973
+        {
974
+            // chroma plane line U and V are expected to be contiguous in memory
975
+            m_plane[i].lineBuf[j] = (uint8_t*)X265_MALLOC(uint8_t, size * 2 + 32);
976
+            if (!m_plane[i].lineBuf[j])
977
+            {
978
+                destroyLines();
979
+                return -1;
980
+            }
981
+            m_plane[ii].lineBuf[j] = m_plane[i].lineBuf[j] + size + 16;
982
+            if (m_isRing)
983
+            {
984
+                m_plane[i].lineBuf[j + n] = m_plane[i].lineBuf[j];
985
+                m_plane[ii].lineBuf[j + n] = m_plane[ii].lineBuf[j];
986
+            }
987
+        }
988
+    }
989
+
990
+    return 0;
991
+}
992
+
993
+void ScalerSlice::destroyLines()
994
+{
995
+    int i;
996
+    for (i = 0; i < 2; ++i)
997
+    {
998
+        int n = m_plane[i].availLines;
999
+        int j;
1000
+        for (j = 0; j < n; ++j)
1001
+        {
1002
+            X265_FREE(m_plane[i].lineBuf[j]);
1003
+            m_plane[i].lineBuf[j] = NULL;
1004
+            if (m_isRing)
1005
+                m_plane[i].lineBuf[j + n] = NULL;
1006
+        }
1007
+    }
1008
+
1009
+    for (i = 0; i < m_numSlicePlane; ++i)
1010
+        memset(m_plane[i].lineBuf, 0, sizeof(uint8_t*) * m_plane[i].availLines * (m_isRing ? 3 : 1));
1011
+    m_destroyLines = 0;
1012
+}
1013
+
1014
+void ScalerSlice::fillOnes(int n, int is16bit)
1015
+{
1016
+    int i;
1017
+    for (i = 0; i < m_numSlicePlane; ++i)
1018
+    {
1019
+        int j;
1020
+        int size = m_plane[i].availLines;
1021
+        for (j = 0; j < size; ++j)
1022
+        {
1023
+            int k;
1024
+            int end = is16bit ? n >> 1 : n;
1025
+            // fill also one extra element
1026
+            end += 1;
1027
+            if (is16bit)
1028
+                for (k = 0; k < end; ++k)
1029
+                    ((int32_t*)(m_plane[i].lineBuf[j]))[k] = 1 << 18;
1030
+            else
1031
+                for (k = 0; k < end; ++k)
1032
+                    ((int16_t*)(m_plane[i].lineBuf[j]))[k] = 1 << 14;
1033
+        }
1034
+    }
1035
+}
1036
+
1037
+int ScalerSlice::rotate(int lum, int cr)
1038
+{
1039
+    int i;
1040
+    if (lum)
1041
+    {
1042
+        for (i = 0; i < m_numSlicePlane; i += 3)
1043
+        {
1044
+            int n = m_plane[i].availLines;
1045
+            int l = lum - m_plane[i].sliceVer;
1046
+
1047
+            if (l >= n * 2)
1048
+            {
1049
+                m_plane[i].sliceVer += n;
1050
+                m_plane[i].sliceHor -= n;
1051
+            }
1052
+        }
1053
+    }
1054
+    if (cr)
1055
+    {
1056
+        for (i = 1; i < 3; ++i)
1057
+        {
1058
+            int n = m_plane[i].availLines;
1059
+            int l = cr - m_plane[i].sliceVer;
1060
+
1061
+            if (l >= n * 2)
1062
+            {
1063
+                m_plane[i].sliceVer += n;
1064
+                m_plane[i].sliceHor -= n;
1065
+            }
1066
+        }
1067
+    }
1068
+    return 0;
1069
+}
1070
+
1071
+int ScalerSlice::initFromSrc(uint8_t *src[4], const int stride[4], int srcW, int lumY, int lumH, int crY, int crH, int relative)
1072
+{
1073
+    int i = 0;
1074
+
1075
+    const int start[m_numSlicePlane] = { lumY, crY, crY, lumY };
1076
+
1077
+    const int end[m_numSlicePlane] = { lumY + lumH, crY + crH, crY + crH, lumY + lumH };
1078
+
1079
+    uint8_t *const src_[m_numSlicePlane] = { src[0] + (relative ? 0 : start[0]) * stride[0],
1080
+        src[1] + (relative ? 0 : start[1]) * stride[1],
1081
+        src[2] + (relative ? 0 : start[2]) * stride[2],
1082
+        src[3] + (relative ? 0 : start[3]) * stride[3] };
1083
+
1084
+    m_width = srcW;
1085
+
1086
+    for (i = 0; i < m_numSlicePlane; ++i)
1087
+    {
1088
+        int j;
1089
+        int first = m_plane[i].sliceVer;
1090
+        int n = m_plane[i].availLines;
1091
+        int lines = end[i] - start[i];
1092
+        int tot_lines = end[i] - first;
1093
+
1094
+        if (start[i] >= first && n >= tot_lines)
1095
+        {
1096
+            m_plane[i].sliceHor = x265_max(tot_lines, m_plane[i].sliceHor);
1097
+            for (j = 0; j < lines; j += 1)
1098
+                m_plane[i].lineBuf[start[i] - first + j] = src_[i] + j * stride[i];
1099
+        }
1100
+        else
1101
+        {
1102
+            m_plane[i].sliceVer = start[i];
1103
+            lines = lines > n ? n : lines;
1104
+            m_plane[i].sliceHor = lines;
1105
+            for (j = 0; j < lines; j += 1)
1106
+                m_plane[i].lineBuf[j] = src_[i] + j * stride[i];
1107
+        }
1108
+    }
1109
+    return 0;
1110
+}
1111
+}
1112
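Reviewer note on `ScalerSlice::createLines()` above: when `m_isRing` is set, entry `j + n` of each line table is pointed at the same memory as entry `j`, so a run of up to `n` consecutive lines can be read through plain indexing without a per-line modulo. The following is a minimal sketch of that aliasing only. `RingPlane` and `window()` are hypothetical names that do not exist in x265, the sketch models a single plane, and it reduces the start index with `%`, which is an assumption rather than what the scaler filters necessarily do.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one ring-buffered plane: n physical lines
// addressed through a pointer table of 2 * n entries.
struct RingPlane
{
    std::vector<std::vector<uint8_t>> storage; // n physical line buffers
    std::vector<uint8_t*> lineBuf;             // 2 * n pointers; second half aliases the first

    RingPlane(int n, int bytesPerLine)
        : storage(n, std::vector<uint8_t>(bytesPerLine)), lineBuf(2 * n)
    {
        assert(n > 0 && bytesPerLine > 0);
        for (int j = 0; j < n; ++j)
        {
            lineBuf[j] = storage[j].data();
            lineBuf[j + n] = lineBuf[j]; // same aliasing the patch sets up when m_isRing is set
        }
    }

    // The pointer table refers into storage, so copying would leave dangling pointers.
    RingPlane(const RingPlane&) = delete;
    RingPlane& operator=(const RingPlane&) = delete;

    int lines() const { return static_cast<int>(storage.size()); }

    // Returns a table from which `count` consecutive lines, starting at logical
    // line `first`, can be read as window[0] .. window[count - 1].
    uint8_t** window(int first, int count)
    {
        assert(first >= 0 && count >= 0 && count <= lines());
        return lineBuf.data() + first % lines();
    }
};
```

With four physical lines, `window(6, 4)` starts at slot 2 and its third entry wraps back to slot 0 through the aliased half of the table, which is why a vertical filter can walk its taps linearly.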
x265_3.4.tar.gz/source/common/scaler.h Added
256
 
1
@@ -0,0 +1,254 @@
2
+/*****************************************************************************
3
+ * Copyright (C) 2013-2020 MulticoreWare, Inc
4
+ *
5
+ * Authors: Pooja Venkatesan <pooja@multicorewareinc.com>
6
+ *
7
+ * This program is free software; you can redistribute it and/or modify
8
+ * it under the terms of the GNU General Public License as published by
9
+ * the Free Software Foundation; either version 2 of the License, or
10
+ * (at your option) any later version.
11
+ *
12
+ * This program is distributed in the hope that it will be useful,
13
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
14
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
15
+ * GNU General Public License for more details.
16
+ *
17
+ * You should have received a copy of the GNU General Public License
18
+ * along with this program; if not, write to the Free Software
19
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02111, USA.
20
+ *
21
+ * This program is also available under a commercial proprietary license.
22
+ * For more information, contact us at license @ x265.com.
23
+ *****************************************************************************/
24
+
25
+#ifndef X265_SCALER_H
26
+#define X265_SCALER_H
27
+
28
+#include "common.h"
29
+
30
+namespace X265_NS {
31
+//x265 private namespace
32
+
33
+class ScalerSlice;
34
+class VideoDesc;
35
+
36
+#define MAX_NUM_LINES_AHEAD 4
37
+#define SCALER_ALIGN(x, j) (((x)+(j)-1)&~((j)-1))
38
+#define X265_ABS(j) ((j) >= 0 ? (j) : (-(j)))
39
+#define SCALER_MAX_REDUCE_CUTOFF 0.002
40
+#define SCALER_BITEXACT  0x80000
41
+#define ROUNDED_DIVISION(i,j) (((i)>0 ? (i) + ((j)>>1) : (i) - ((j)>>1))/(j))
42
+#define UH_CEIL_SHIFTR(i,j) (!scale_builtin_constant_p(j) ? -((-(i)) >> (j)) \
43
+                                                          : ((i) + (1<<(j)) - 1) >> (j))
44
+
45
+#if defined(__GNUC__) || defined(__clang__)
46
+#    define scale_builtin_constant_p __builtin_constant_p
47
+#else
48
+#    define scale_builtin_constant_p(x) 0
49
+#endif
50
+
51
+enum ResFactor
52
+{
53
+    RES_FACTOR_64, RES_FACTOR_32, RES_FACTOR_16, RES_FACTOR_8,
54
+    RES_FACTOR_4, RES_FACTOR_DEF, NUM_RES_FACTOR
55
+};
56
+
57
+enum ScalerFactor
58
+{
59
+    FACTOR_4, FACTOR_8, NUM_FACTOR
60
+};
61
+
62
+enum FilterSize
63
+{
64
+    FIL_4, FIL_6, FIL_8, FIL_9, FIL_10, FIL_11, FIL_13, FIL_15,
65
+    FIL_16, FIL_17, FIL_19, FIL_22, FIL_24, FIL_DEF, NUM_FIL
66
+};
67
+
68
+class ScalerFilter {
69
+public:
70
+    int             m_filtLen;
71
+    int32_t*        m_filtPos;      // Array of horizontal/vertical starting pos for each dst for luma / chroma planes.
72
+    int16_t*        m_filt;         // Array of horizontal/vertical filter coefficients for luma / chroma planes.
73
+    ScalerSlice*    m_sourceSlice;  // Source slice
74
+    ScalerSlice*    m_destSlice;    // Output slice
75
+    ScalerFilter();
76
+    virtual ~ScalerFilter();
77
+    virtual void process(int sliceVer, int sliceHor) = 0;
78
+    int initCoeff(int flag, int inc, int srcW, int dstW, int filtAlign, int one, int sourcePos, int destPos);
79
+    void setSlice(ScalerSlice* source, ScalerSlice* dest) { m_sourceSlice = source; m_destSlice = dest; }
80
+};
81
+
82
+class VideoDesc {
83
+public:
84
+    int         m_width;
85
+    int         m_height;
86
+    int         m_csp;
87
+    int         m_inputDepth;
88
+
89
+    VideoDesc(int w, int h, int csp, int bitDepth)
90
+    {
91
+        m_width = w;
92
+        m_height = h;
93
+        m_csp = csp;
94
+        m_inputDepth = bitDepth;
95
+    }
96
+};
97
+
98
+typedef struct ScalerPlane
99
+{
100
+    int       availLines; // max number of lines that can be held by this plane
101
+    int       sliceVer;   // index of first line
102
+    int       sliceHor;   // number of lines
103
+    uint8_t** lineBuf;    // line buffer
104
+} ScalerPlane;
105
+
106
+// Assist horizontal filtering, base class
107
+class HFilterScaler {
108
+public:
109
+    int m_bitDepth;
110
+public:
111
+    HFilterScaler() :m_bitDepth(0) {};
112
+    virtual ~HFilterScaler() {};
113
+    virtual void doScaling(int16_t *dst, int dstW, const uint8_t *src, const int16_t *filter, const int32_t *filterPos, int filterSize) = 0;
114
+};
115
+
116
+// Assist vertical filtering, base class
117
+class VFilterScaler {
118
+public:
119
+    int m_bitDepth;
120
+public:
121
+    VFilterScaler() :m_bitDepth(0) {};
122
+    virtual ~VFilterScaler() {};
123
+    virtual void yuv2PlaneX(const int16_t *filter, int filterSize, const int16_t **src, uint8_t *dest, int dstW) = 0;
124
+};
125
+
126
+//  Assist horizontal filtering, process 8 bit case
127
+class HFilterScaler8Bit : public HFilterScaler {
128
+public:
129
+    HFilterScaler8Bit() { m_bitDepth = 8; }
130
+    virtual void doScaling(int16_t *dst, int dstW, const uint8_t *src, const int16_t *filter, const int32_t *filterPos, int filterSize);
131
+};
132
+
133
+//  Assist horizontal filtering, process 10 bit case
134
+class HFilterScaler10Bit : public HFilterScaler {
135
+public:
136
+    HFilterScaler10Bit() { m_bitDepth = 10; }
137
+    virtual void doScaling(int16_t *dst, int dstW, const uint8_t *src, const int16_t *filter, const int32_t *filterPos, int filterSize);
138
+};
139
+
140
+//  Assist vertical filtering, process 8 bit case
141
+class VFilterScaler8Bit : public VFilterScaler {
142
+public:
143
+    VFilterScaler8Bit() { m_bitDepth = 8; }
144
+    virtual void yuv2PlaneX(const int16_t *filter, int filterSize, const int16_t **src, uint8_t *dest, int dstW);
145
+};
146
+
147
+//  Assist vertical filtering, process 10 bit case
148
+class VFilterScaler10Bit : public VFilterScaler {
149
+public:
150
+    VFilterScaler10Bit() { m_bitDepth = 10; }
151
+    virtual void yuv2PlaneX(const int16_t *filter, int filterSize, const int16_t **src, uint8_t *dest, int dstW);
152
+};
153
+
154
+// Horizontal filter for luma
155
+class ScalerHLumFilter : public ScalerFilter {
156
+private:
157
+    HFilterScaler* m_hFilterScaler;
158
+public:
159
+    ScalerHLumFilter(int bitDepth) { bitDepth == 8 ? m_hFilterScaler = new HFilterScaler8Bit : bitDepth == 10 ? m_hFilterScaler = new HFilterScaler10Bit : NULL;}
160
+    ~ScalerHLumFilter() { if (m_hFilterScaler) X265_FREE(m_hFilterScaler); }
161
+    virtual void process(int sliceVer, int sliceHor);
162
+};
163
+
164
+// Horizontal filter for chroma
165
+class ScalerHCrFilter : public ScalerFilter {
166
+private:
167
+    HFilterScaler* m_hFilterScaler;
168
+public:
169
+    ScalerHCrFilter(int bitDepth) { bitDepth == 8 ? m_hFilterScaler = new HFilterScaler8Bit : bitDepth == 10 ? m_hFilterScaler = new HFilterScaler10Bit : NULL;}
170
+    ~ScalerHCrFilter() { if (m_hFilterScaler) X265_FREE(m_hFilterScaler); }
171
+    virtual void process(int sliceVer, int sliceHor);
172
+};
173
+
174
+// Vertical filter for luma
175
+class ScalerVLumFilter : public ScalerFilter {
176
+private:
177
+    VFilterScaler* m_vFilterScaler;
178
+public:
179
+    ScalerVLumFilter(int bitDepth) { bitDepth == 8 ? m_vFilterScaler = new VFilterScaler8Bit : bitDepth == 10 ? m_vFilterScaler = new VFilterScaler10Bit : NULL;}
180
+    ~ScalerVLumFilter() { if (m_vFilterScaler) X265_FREE(m_vFilterScaler); }
181
+    virtual void process(int sliceVer, int sliceHor);
182
+};
183
+
184
+// Vertical filter for chroma
185
+class ScalerVCrFilter : public ScalerFilter {
186
+private:
187
+    VFilterScaler*    m_vFilterScaler;
188
+public:
189
+    ScalerVCrFilter(int bitDepth) { bitDepth == 8 ? m_vFilterScaler = new VFilterScaler8Bit : bitDepth == 10 ? m_vFilterScaler = new VFilterScaler10Bit : NULL;}
190
+    ~ScalerVCrFilter() { if (m_vFilterScaler) X265_FREE(m_vFilterScaler); }
191
+    virtual void process(int sliceVer, int sliceHor);
192
+};
193
+
194
+class ScalerSlice
195
+{
196
+private:
197
+    enum ScalerSlicePlaneNum { m_numSlicePlane = 4 };
198
+public:
199
+    int m_width;        // Slice line width
200
+    int m_hCrSubSample; // horizontal Chroma subsampling factor
201
+    int m_vCrSubSample; // vertical chroma subsampling factor
202
+    int m_isRing;       // flag to identify if this ScalerSlice is a ring buffer
203
+    int m_destroyLines; // flag to identify if there are dynamic allocated lines
204
+    ScalerPlane m_plane[m_numSlicePlane];
205
+public:
206
+    ScalerSlice();
207
+    ~ScalerSlice() { destroy(); }
208
+    int rotate(int lum, int cr);
209
+    void fillOnes(int n, int is16bit);
210
+    int create(int lumLines, int crLines, int h_sub_sample, int v_sub_sample, int ring);
211
+    int createLines(int size, int width);
212
+    void destroyLines();
213
+    void destroy();
214
+    int initFromSrc(uint8_t *src[4], const int stride[4], int srcW, int lumY, int lumH, int crY, int crH, int relative);
215
+};
216
+
217
+class ScalerFilterManager {
218
+private:
219
+    enum ScalerFilterNum { m_numSlice = 3, m_numFilter = 4 };
220
+
221
+private:
222
+    int                     m_bitDepth;
223
+    int                     m_algorithmFlags;  // 1, bilinear; 4 bicubic, default is bicubic
224
+    int                     m_srcW;            // Width  of source luma planes.
225
+    int                     m_srcH;            // Height of source luma planes.
226
+    int                     m_dstW;            // Width of dest luma planes.
227
+    int                     m_dstH;            // Height of dest luma planes.
228
+    int                     m_crSrcW;          // Width  of source chroma planes.
229
+    int                     m_crSrcH;          // Height of source chroma planes.
230
+    int                     m_crDstW;          // Width  of dest chroma planes.
231
+    int                     m_crDstH;          // Height of dest chroma planes.
232
+    int                     m_crSrcHSubSample; // Binary log of horizontal subsampling factor between Y and Cr planes in src  image.
233
+    int                     m_crSrcVSubSample; // Binary log of vertical   subsampling factor between Y and Cr planes in src  image.
234
+    int                     m_crDstHSubSample; // Binary log of horizontal subsampling factor between Y and Cr planes in dest image.
235
+    int                     m_crDstVSubSample; // Binary log of vertical   subsampling factor between Y and Cr planes in dest image.
236
+    ScalerSlice*            m_slices[m_numSlice];
237
+    ScalerFilter*           m_ScalerFilters[m_numFilter];
238
+private:
239
+    int getLocalPos(int crSubSample, int pos);
240
+    void getMinBufferSize(int *out_lum_size, int *out_cr_size);
241
+    int initScalerSlice();
242
+public:
243
+    ScalerFilterManager();
244
+    ~ScalerFilterManager() {
245
+        for (int i = 0; i < m_numSlice; i++)
246
+            if (m_slices[i]) { m_slices[i]->destroy(); delete m_slices[i]; m_slices[i] = NULL; }
247
+        for (int i = 0; i < m_numFilter; i++)
248
+            if (m_ScalerFilters[i]) { delete m_ScalerFilters[i]; m_ScalerFilters[i] = NULL; }
249
+    }
250
+    int init(int algorithmFlags, VideoDesc* srcVideoDesc, VideoDesc* dstVideoDesc);
251
+    int scale_pic(void** src, void** dst, int* srcStride, int* dstStride);
252
+};
253
+}
254
+
255
+#endif //ifndef X265_SCALER_H
256
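Reviewer note on the helper macros in scaler.h: `UH_CEIL_SHIFTR(i, j)` computes the ceiling of `i / 2^j` (its non-constant branch relies on right-shifting a negative value behaving arithmetically), and `SCALER_ALIGN(x, j)` rounds `x` up to a multiple of a power-of-two `j`. The sketch below restates both as plain functions to make the arithmetic easy to check. The function names are illustrative and are not part of the patch, and only the additive form of the ceiling shift is shown.

```cpp
#include <cassert>

// Ceiling of value / 2^shift for non-negative value, using the additive form
// so that no negative number is ever right-shifted.
inline int ceilShiftRight(int value, int shift)
{
    assert(value >= 0 && shift >= 0 && shift < 31);
    return (value + (1 << shift) - 1) >> shift;
}

// Round value up to the next multiple of alignment; alignment must be a power of two.
inline int alignUp(int value, int alignment)
{
    assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}
```

For example, a 1081-line luma plane with one level of vertical chroma subsampling gives `ceilShiftRight(1081, 1) == 541` chroma lines, so the odd last luma line still has a chroma line. For a 1920-pixel destination width, the stride expression in `initScalerSlice()` becomes `alignUp(1920 * 2 + 66, 16) == 3920` bytes.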
x265_3.3.tar.gz/source/common/threading.h -> x265_3.4.tar.gz/source/common/threading.h Changed
@@ -238,6 +238,14 @@
         LeaveCriticalSection(&m_cs);
     }
 
+    void decr()
+    {
+        EnterCriticalSection(&m_cs);
+        m_val--;
+        WakeAllConditionVariable(&m_cv);
+        LeaveCriticalSection(&m_cs);
+    }
+
 protected:
 
     CRITICAL_SECTION   m_cs;
@@ -436,6 +444,14 @@
         pthread_mutex_unlock(&m_mutex);
     }
 
+    void decr()
+    {
+        pthread_mutex_lock(&m_mutex);
+        m_val--;
+        pthread_cond_broadcast(&m_cond);
+        pthread_mutex_unlock(&m_mutex);
+    }
+
 protected:
 
     pthread_mutex_t m_mutex;
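Reviewer note on the two `decr()` hunks in threading.h: both the Windows and the POSIX variants follow the same pattern, namely take the lock, decrement `m_val`, wake every waiter, release the lock. A portable sketch of that pattern with the C++ standard library is given below. `GuardedCounter`, `get()` and `waitFor()` are hypothetical names chosen for illustration, not the x265 class or its API.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical counter guarded by a mutex and a condition variable.
class GuardedCounter
{
public:
    explicit GuardedCounter(int initial = 0) : m_val(initial) {}

    // Same shape as the decr() added by the patch: lock, update, wake all waiters.
    void decr()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        --m_val;
        m_cond.notify_all();
    }

    int get()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_val;
    }

    // Block until the counter equals target. The predicate form of wait()
    // re-checks the value after every wake-up, so spurious wake-ups are harmless.
    void waitFor(int target)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cond.wait(lock, [this, target] { return m_val == target; });
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cond;
    int m_val;
};
```

Waking all waiters rather than one is the conservative choice when different threads may be waiting for different values of the same counter.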
x265_3.3.tar.gz/source/encoder/analysis.cpp -> x265_3.4.tar.gz/source/encoder/analysis.cpp Changed
151
 
1
@@ -1272,7 +1272,7 @@
2
                     md.pred[PRED_SKIP].cu.initSubCU(parentCTU, cuGeom, qp);
3
                     checkMerge2Nx2N_rd0_4(md.pred[PRED_SKIP], md.pred[PRED_MERGE], cuGeom);
4
 
5
-                    skipRecursion = !!m_param->bEnableRecursionSkip && md.bestMode;
6
+                    skipRecursion = !!m_param->recursionSkipMode && md.bestMode;
7
                     if (m_param->rdLevel)
8
                         skipModes = m_param->bEnableEarlySkip && md.bestMode;
9
                 }
10
@@ -1296,7 +1296,7 @@
11
                     md.pred[PRED_SKIP].cu.initSubCU(parentCTU, cuGeom, qp);
12
                     checkMerge2Nx2N_rd0_4(md.pred[PRED_SKIP], md.pred[PRED_MERGE], cuGeom);
13
 
14
-                    skipRecursion = !!m_param->bEnableRecursionSkip && md.bestMode;
15
+                    skipRecursion = !!m_param->recursionSkipMode && md.bestMode;
16
                     if (m_param->rdLevel)
17
                         skipModes = m_param->bEnableEarlySkip && md.bestMode;
18
                 }
19
@@ -1314,15 +1314,23 @@
20
                 skipModes = (m_param->bEnableEarlySkip || m_refineLevel == 2)
21
                 && md.bestMode && md.bestMode->cu.isSkipped(0); // TODO: sa8d threshold per depth
22
         }
23
-        if (md.bestMode && m_param->bEnableRecursionSkip && !bCtuInfoCheck && !(m_param->bAnalysisType == AVC_INFO && m_param->analysisLoadReuseLevel == 7 && (m_modeFlag[0] || m_modeFlag[1])))
24
+        if (md.bestMode && m_param->recursionSkipMode && !bCtuInfoCheck && !(m_param->bAnalysisType == AVC_INFO && m_param->analysisLoadReuseLevel == 7 && (m_modeFlag[0] || m_modeFlag[1])))
25
         {
26
             skipRecursion = md.bestMode->cu.isSkipped(0);
27
-            if (mightSplit && depth >= minDepth && !skipRecursion)
28
+            if (mightSplit && !skipRecursion)
29
             {
30
-                if (depth)
31
-                    skipRecursion = recursionDepthCheck(parentCTU, cuGeom, *md.bestMode);
32
-                if (m_bHD && !skipRecursion && m_param->rdLevel == 2 && md.fencYuv.m_size != MAX_CU_SIZE)
33
+                if (depth >= minDepth && m_param->recursionSkipMode == RDCOST_BASED_RSKIP)
34
+                {
35
+                    if (depth)
36
+                        skipRecursion = recursionDepthCheck(parentCTU, cuGeom, *md.bestMode);
37
+                    if (m_bHD && !skipRecursion && m_param->rdLevel == 2 && md.fencYuv.m_size != MAX_CU_SIZE)
38
+                        skipRecursion = complexityCheckCU(*md.bestMode);
39
+                }
40
+                else if (cuGeom.log2CUSize >= MAX_LOG2_CU_SIZE - 1 && m_param->recursionSkipMode == EDGE_BASED_RSKIP)
41
+                {
42
                     skipRecursion = complexityCheckCU(*md.bestMode);
43
+                }
44
+
45
             }
46
         }
47
         if (m_param->bAnalysisType == AVC_INFO && md.bestMode && cuGeom.numPartitions <= 16 && m_param->analysisLoadReuseLevel == 7)
48
@@ -1972,7 +1980,7 @@
49
                     checkInter_rd5_6(md.pred[PRED_2Nx2N], cuGeom, SIZE_2Nx2N, refMasks);
50
                     checkBestMode(md.pred[PRED_2Nx2N], cuGeom.depth);
51
 
52
-                    if (m_param->bEnableRecursionSkip && depth && m_modeDepth[depth - 1].bestMode)
53
+                    if (m_param->recursionSkipMode && depth && m_modeDepth[depth - 1].bestMode)
54
                         skipRecursion = md.bestMode && !md.bestMode->cu.getQtRootCbf(0);
55
                 }
56
                 if (m_param->analysisLoadReuseLevel > 4 && m_reusePartSize[cuGeom.absPartIdx] == SIZE_2Nx2N)
57
@@ -1996,7 +2004,7 @@
58
                     checkInter_rd5_6(md.pred[PRED_2Nx2N], cuGeom, SIZE_2Nx2N, refMasks);
59
                     checkBestMode(md.pred[PRED_2Nx2N], cuGeom.depth);
60
 
61
-                    if (m_param->bEnableRecursionSkip && depth && m_modeDepth[depth - 1].bestMode)
62
+                    if (m_param->recursionSkipMode && depth && m_modeDepth[depth - 1].bestMode)
63
                         skipRecursion = md.bestMode && !md.bestMode->cu.getQtRootCbf(0);
64
                 }
65
             }
66
@@ -2015,8 +2023,10 @@
67
             checkInter_rd5_6(md.pred[PRED_2Nx2N], cuGeom, SIZE_2Nx2N, refMasks);
68
             checkBestMode(md.pred[PRED_2Nx2N], cuGeom.depth);
69
 
70
-            if (m_param->bEnableRecursionSkip && depth && m_modeDepth[depth - 1].bestMode)
71
+            if (m_param->recursionSkipMode == RDCOST_BASED_RSKIP && depth && m_modeDepth[depth - 1].bestMode)
72
                 skipRecursion = md.bestMode && !md.bestMode->cu.getQtRootCbf(0);
73
+            else if (cuGeom.log2CUSize >= MAX_LOG2_CU_SIZE - 1 && m_param->recursionSkipMode == EDGE_BASED_RSKIP)
74
+                skipRecursion = md.bestMode && complexityCheckCU(*md.bestMode);
75
         }
76
         if (m_param->bAnalysisType == AVC_INFO && md.bestMode && cuGeom.numPartitions <= 16 && m_param->analysisLoadReuseLevel == 7)
77
             skipRecursion = true;
@@ -3525,27 +3535,47 @@
 
 bool Analysis::complexityCheckCU(const Mode& bestMode)
 {
-    uint32_t mean = 0;
-    uint32_t homo = 0;
-    uint32_t cuSize = bestMode.fencYuv->m_size;
-    for (uint32_t y = 0; y < cuSize; y++) {
-        for (uint32_t x = 0; x < cuSize; x++) {
-            mean += (bestMode.fencYuv->m_buf[0][y * cuSize + x]);
+    if (m_param->recursionSkipMode == RDCOST_BASED_RSKIP)
+    {
+        uint32_t mean = 0;
+        uint32_t homo = 0;
+        uint32_t cuSize = bestMode.fencYuv->m_size;
+        for (uint32_t y = 0; y < cuSize; y++) {
+            for (uint32_t x = 0; x < cuSize; x++) {
+                mean += (bestMode.fencYuv->m_buf[0][y * cuSize + x]);
+            }
         }
-    }
-    mean = mean / (cuSize * cuSize);
-    for (uint32_t y = 0 ; y < cuSize; y++){
-        for (uint32_t x = 0 ; x < cuSize; x++){
-            homo += abs(int(bestMode.fencYuv->m_buf[0][y * cuSize + x] - mean));
+        mean = mean / (cuSize * cuSize);
+        for (uint32_t y = 0; y < cuSize; y++) {
+            for (uint32_t x = 0; x < cuSize; x++) {
+                homo += abs(int(bestMode.fencYuv->m_buf[0][y * cuSize + x] - mean));
+            }
         }
-    }
-    homo = homo / (cuSize * cuSize);
+        homo = homo / (cuSize * cuSize);
 
-    if (homo < (.1 * mean))
-        return true;
+        if (homo < (.1 * mean))
+            return true;
 
-    return false;
-}
+        return false;
+    }
+    else
+    {
+        int blockType = bestMode.cu.m_log2CUSize[0] - LOG2_UNIT_SIZE;
+        int shift = bestMode.cu.m_log2CUSize[0] * LOG2_UNIT_SIZE;
+        intptr_t stride = m_frame->m_fencPic->m_stride;
+        intptr_t blockOffsetLuma = bestMode.cu.m_cuPelX + bestMode.cu.m_cuPelY * stride;
+        uint64_t sum_ss = primitives.cu[blockType].var(m_frame->m_edgeBitPic + blockOffsetLuma, stride);
+        uint32_t sum = (uint32_t)sum_ss;
+        uint32_t ss = (uint32_t)(sum_ss >> 32);
+        uint32_t pixelCount = 1 << shift;
+        double cuEdgeVariance = (ss - ((double)sum * sum / pixelCount)) / pixelCount;
+
+        if (cuEdgeVariance > (double)m_param->edgeVarThreshold)
+            return false;
+        else
+            return true;
+    }
+ }
 
 uint32_t Analysis::calculateCUVariance(const CUData& ctu, const CUGeom& cuGeom)
 {
@@ -3570,7 +3600,6 @@
             cnt++;
         }
     }
-    
     return cuVariance / cnt;
 }
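Note on the edge-based branch of complexityCheckCU() in the hunk above: it unpacks a 64-bit value (pixel sum in the low 32 bits, sum of squares in the high 32 bits) and turns it into a variance. The following is a minimal standalone sketch of that arithmetic, not the x265 implementation. packedSumAndSquares() is a hypothetical stand-in for the primitives.cu[].var() call, and the 8-bit pixel type and all function names are assumptions made for illustration.

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for primitives.cu[blockType].var(): pixel sum in the
// low 32 bits, sum of squared pixels in the high 32 bits.
static uint64_t packedSumAndSquares(const uint8_t* src, intptr_t stride, int size)
{
    uint32_t sum = 0, ss = 0;
    for (int y = 0; y < size; y++)
    {
        for (int x = 0; x < size; x++)
        {
            uint32_t p = src[y * stride + x];
            sum += p;
            ss += p * p;
        }
    }
    return sum | ((uint64_t)ss << 32);
}

// Variance of a square block, E[p^2] - E[p]^2, using the same expression as the diff.
static double blockVariance(const uint8_t* src, intptr_t stride, int log2Size)
{
    uint64_t sum_ss = packedSumAndSquares(src, stride, 1 << log2Size);
    uint32_t sum = (uint32_t)sum_ss;
    uint32_t ss = (uint32_t)(sum_ss >> 32);
    uint32_t pixelCount = 1u << (2 * log2Size); // size * size
    return (ss - ((double)sum * sum / pixelCount)) / pixelCount;
}

// Mirrors the decision in the diff: recursion is skipped unless the edge
// variance exceeds the threshold.
static bool skipRecursionByEdgeVariance(double variance, double threshold)
{
    return !(variance > threshold);
}
```

A flat block has zero variance and is skipped for any non-negative threshold, while a block that is half 0 and half 255 has a variance of 127.5 squared and is not skipped unless the threshold is set very high.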
 
x265_3.3.tar.gz/source/encoder/analysis.h -> x265_3.4.tar.gz/source/encoder/analysis.h Changed
@@ -52,7 +52,7 @@
         splitRefs = 0;
         mvCost[0] = 0; // L0
         mvCost[1] = 0; // L1
-        sa8dCost    = 0;
+        sa8dCost  = 0;
     }
 };
 
@@ -120,7 +120,6 @@
 
     Mode& compressCTU(CUData& ctu, Frame& frame, const CUGeom& cuGeom, const Entropy& initialContext);
     int32_t loadTUDepth(CUGeom cuGeom, CUData parentCTU);
-
 protected:
     /* Analysis data for save/load mode, writes/reads data based on absPartIdx */
     x265_analysis_inter_data*  m_reuseInterDataCTU;
x265_3.3.tar.gz/source/encoder/api.cpp -> x265_3.4.tar.gz/source/encoder/api.cpp Changed
@@ -1016,12 +1016,12 @@
 
 void x265_zone_free(x265_param *param)
 {
-    if (param && param->rc.zonefileCount) {
+    if (param && param->rc.zones && (param->rc.zoneCount || param->rc.zonefileCount))
+    {
         for (int i = 0; i < param->rc.zonefileCount; i++)
             x265_free(param->rc.zones[i].zoneParam);
-    }
-    if (param && (param->rc.zoneCount || param->rc.zonefileCount))
         x265_free(param->rc.zones);
+    }
 }
 
 static const x265_api libapi =
@@ -1294,6 +1294,8 @@
                     fprintf(csvfp, "RateFactor, ");
                 if (param->rc.vbvBufferSize)
                     fprintf(csvfp, "BufferFill, BufferFillFinal, ");
+                if (param->rc.vbvBufferSize && param->csvLogLevel >= 2)
+                    fprintf(csvfp, "UnclippedBufferFillFinal, ");
                 if (param->bEnablePsnr)
                     fprintf(csvfp, "Y PSNR, U PSNR, V PSNR, YUV PSNR, ");
                 if (param->bEnableSsim)
@@ -1405,6 +1407,8 @@
         fprintf(param->csvfpt, "%.3lf,", frameStats->rateFactor);
     if (param->rc.vbvBufferSize)
         fprintf(param->csvfpt, "%.3lf, %.3lf,", frameStats->bufferFill, frameStats->bufferFillFinal);
+    if (param->rc.vbvBufferSize && param->csvLogLevel >= 2)
+        fprintf(param->csvfpt, "%.3lf,", frameStats->unclippedBufferFillFinal);
     if (param->bEnablePsnr)
         fprintf(param->csvfpt, "%.3lf, %.3lf, %.3lf, %.3lf,", frameStats->psnrY, frameStats->psnrU, frameStats->psnrV, frameStats->psnr);
     if (param->bEnableSsim)
x265_3.3.tar.gz/source/encoder/encoder.cpp -> x265_3.4.tar.gz/source/encoder/encoder.cpp Changed
@@ -218,10 +218,7 @@
 
     if (m_param->bHistBasedSceneCut)
     {
-        for (int i = 0; i < x265_cli_csps[m_param->internalCsp].planes; i++)
-        {
-            m_planeSizes[i] = (m_param->sourceWidth >> x265_cli_csps[p->internalCsp].width[i]) * (m_param->sourceHeight >> x265_cli_csps[m_param->internalCsp].height[i]);
-        }
+        m_planeSizes[0] = (m_param->sourceWidth >> x265_cli_csps[p->internalCsp].width[0]) * (m_param->sourceHeight >> x265_cli_csps[m_param->internalCsp].height[0]);
         uint32_t pixelbytes = m_param->internalBitDepth > 8 ? 2 : 1;
         m_edgePic = X265_MALLOC(pixel, m_planeSizes[0] * pixelbytes);
         m_edgeHistThreshold = m_param->edgeTransitionThreshold;
@@ -1443,9 +1440,9 @@
     int32_t planeCount = x265_cli_csps[m_param->internalCsp].planes;
     memset(m_edgePic, 0, bufSize);
 
-    if (!computeEdge(m_edgePic, src, NULL, pic->width, pic->height, pic->width, false))
+    if (!computeEdge(m_edgePic, src, NULL, pic->width, pic->height, pic->width, false, 1))
     {
-        x265_log(m_param, X265_LOG_ERROR, "Failed edge computation!");
+        x265_log(m_param, X265_LOG_ERROR, "Failed to compute edge!");
         return false;
     }
 
@@ -1605,6 +1602,14 @@
         if (m_param->bHistBasedSceneCut && pic_in)
         {
             x265_picture *pic = (x265_picture *) pic_in;
+
+            if (pic->poc == 0)
+            {
+                /* for entire encode compute the chroma plane sizes only once */
+                for (int i = 1; i < x265_cli_csps[m_param->internalCsp].planes; i++)
+                    m_planeSizes[i] = (pic->width >> x265_cli_csps[m_param->internalCsp].width[i]) * (pic->height >> x265_cli_csps[m_param->internalCsp].height[i]);
+            }
+
             if (computeHistograms(pic))
             {
                 double maxUVSad = 0.0, edgeSad = 0.0;
@@ -1752,6 +1757,12 @@
                         }
                     }
                 }
+                if (m_param->recursionSkipMode == EDGE_BASED_RSKIP && m_param->bHistBasedSceneCut)
+                {
+                    pixel* src = m_edgePic;
+                    primitives.planecopy_pp_shr(src, inFrame->m_fencPic->m_picWidth, inFrame->m_edgeBitPic, inFrame->m_fencPic->m_stride,
+                        inFrame->m_fencPic->m_picWidth, inFrame->m_fencPic->m_picHeight, 0);
+                }
             }
             else
             {
@@ -2414,7 +2425,7 @@
         encParam->maxNumReferences = param->maxNumReferences; // never uses more refs than specified in stream headers
         encParam->bEnableFastIntra = param->bEnableFastIntra;
         encParam->bEnableEarlySkip = param->bEnableEarlySkip;
-        encParam->bEnableRecursionSkip = param->bEnableRecursionSkip;
+        encParam->recursionSkipMode = param->recursionSkipMode;
         encParam->searchMethod = param->searchMethod;
         /* Scratch buffer prevents me_range from being increased for esa/tesa */
         if (param->searchRange < encParam->searchRange)
@@ -3006,6 +3017,8 @@
             frameStats->ipCostRatio = curFrame->m_lowres.ipCostRatio;
         frameStats->bufferFill = m_rateControl->m_bufferFillActual;
         frameStats->bufferFillFinal = m_rateControl->m_bufferFillFinal;
+        if (m_param->csvLogLevel >= 2)
+            frameStats->unclippedBufferFillFinal = m_rateControl->m_unclippedBufferFillFinal;
         frameStats->frameLatency = inPoc - poc;
         if (m_param->rc.rateControlMode == X265_RC_CRF)
             frameStats->rateFactor = curEncData.m_rateFactor;
@@ -3400,7 +3413,7 @@
         p->maxNumReferences = zone->maxNumReferences;
         p->bEnableFastIntra = zone->bEnableFastIntra;
         p->bEnableEarlySkip = zone->bEnableEarlySkip;
-        p->bEnableRecursionSkip = zone->bEnableRecursionSkip;
+        p->recursionSkipMode = zone->recursionSkipMode;
         p->searchMethod = zone->searchMethod;
         p->searchRange = zone->searchRange;
         p->subpelRefine = zone->subpelRefine;
@@ -3681,20 +3694,6 @@
     if (p->analysisLoad && !p->analysisLoadReuseLevel)
         p->analysisLoadReuseLevel = 5;
 
-    if ((p->bAnalysisType == DEFAULT) && p->rc.cuTree)
-    {
-        if (p->analysisSaveReuseLevel && p->analysisSaveReuseLevel < 10)
-        {
-            x265_log(p, X265_LOG_WARNING, "cu-tree works only with analysis-save-reuse-level 10, Disabling cu-tree\n");
-            p->rc.cuTree = 0;
-        }
-        if (p->analysisLoadReuseLevel && p->analysisLoadReuseLevel < 10)
-        {
-            x265_log(p, X265_LOG_WARNING, "cu-tree works only with analysis-load-reuse-level 10, Disabling cu-tree\n");
-            p->rc.cuTree = 0;
-        }
-    }
-
     if ((p->analysisLoad || p->analysisSave) && (p->bDistributeModeAnalysis || p->bDistributeMotionEstimation))
     {
         x265_log(p, X265_LOG_WARNING, "Analysis load/save options incompatible with pmode/pme, Disabling pmode/pme\n");
@@ -3867,29 +3866,30 @@
         }
         else
         {
-            if (fread(&m_conformanceWindow.rightOffset, sizeof(int), 1, m_analysisFileIn) != 1)
+            int rightOffset, bottomOffset;
+            if (fread(&rightOffset, sizeof(int), 1, m_analysisFileIn) != 1)
             {
                 x265_log(NULL, X265_LOG_ERROR, "Error reading analysis data. Conformance window right offset missing\n");
                 m_aborted = true;
             }
-            else if (m_conformanceWindow.rightOffset && p->analysisLoadReuseLevel > 1)
+            else if (rightOffset && p->analysisLoadReuseLevel > 1)
             {
                 int scaleFactor = p->scaleFactor < 2 ? 1 : p->scaleFactor;
-                padsize = m_conformanceWindow.rightOffset * scaleFactor;
+                padsize = rightOffset * scaleFactor;
                 p->sourceWidth += padsize;
                 m_conformanceWindow.bEnabled = true;
                 m_conformanceWindow.rightOffset = padsize;
             }
 
-            if (fread(&m_conformanceWindow.bottomOffset, sizeof(int), 1, m_analysisFileIn) != 1)
+            if (fread(&bottomOffset, sizeof(int), 1, m_analysisFileIn) != 1)
             {
                 x265_log(NULL, X265_LOG_ERROR, "Error reading analysis data. Conformance window bottom offset missing\n");
                 m_aborted = true;
             }
-            else if (m_conformanceWindow.bottomOffset && p->analysisLoadReuseLevel > 1)
+            else if (bottomOffset && p->analysisLoadReuseLevel > 1)
             {
                 int scaleFactor = p->scaleFactor < 2 ? 1 : p->scaleFactor;
-                padsize = m_conformanceWindow.bottomOffset * scaleFactor;
+                padsize = bottomOffset * scaleFactor;
                 p->sourceHeight += padsize;
                 m_conformanceWindow.bEnabled = true;
                 m_conformanceWindow.bottomOffset = padsize;
@@ -4196,7 +4196,7 @@
         x265_log(p, X265_LOG_WARNING, "Radl requires fixed gop-length (keyint == min-keyint). Disabling radl.\n");
     }
 
-    if ((p->chunkStart || p->chunkEnd) && p->bOpenGOP)
+    if ((p->chunkStart || p->chunkEnd) && p->bOpenGOP && m_param->bResetZoneConfig)
     {
         p->chunkStart = p->chunkEnd = 0;
         x265_log(p, X265_LOG_WARNING, "Chunking requires closed gop structure. Disabling chunking.\n");
@@ -4229,12 +4229,6 @@
         x265_log(p, X265_LOG_WARNING, "Turning on repeat - headers for zone encoding\n");
     }
 
-    if (!m_param->bResetZoneConfig && (p->keyframeMax != p->keyframeMin))
-        x265_log(p, X265_LOG_WARNING, "External zone reconfiguration requires a fixed GOP size to enable appropriate signaling of HRD info\n");
-
-    if (!m_param->bResetZoneConfig && (p->reconfigWindowSize != (uint64_t)p->keyframeMax))
-        x265_log(p, X265_LOG_WARNING, "Zone size must be multiple of GOP size to enable appropriate signaling of HRD info\n");
-
     if (m_param->bEnableHME)
     {
         if (m_param->sourceHeight < 540)
@@ -4311,18 +4305,27 @@
         }
     }
 
+    uint32_t numCUsLoad, numCUsInHeightLoad;
+
     /* Now arrived at the right frame, read the record */
     analysis->poc = poc;
     analysis->frameRecordSize = frameRecordSize;
     X265_FREAD(&analysis->sliceType, sizeof(int), 1, m_analysisFileIn, &(picData->sliceType));
     X265_FREAD(&analysis->bScenecut, sizeof(int), 1, m_analysisFileIn, &(picData->bScenecut));
     X265_FREAD(&analysis->satdCost, sizeof(int64_t), 1, m_analysisFileIn, &(picData->satdCost));
-    X265_FREAD(&analysis->numCUsInFrame, sizeof(int), 1, m_analysisFileIn, &(picData->numCUsInFrame));
+    X265_FREAD(&numCUsLoad, sizeof(int), 1, m_analysisFileIn, &(picData->numCUsInFrame));
     X265_FREAD(&analysis->numPartitions, sizeof(int), 1, m_analysisFileIn, &(picData->numPartitions));
 
+    /* Update analysis info to save current settings */
+    uint32_t widthInCU = (m_param->sourceWidth + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+    uint32_t heightInCU = (m_param->sourceHeight + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+    uint32_t numCUsInFrame = widthInCU * heightInCU;
+    analysis->numCUsInFrame = numCUsInFrame;
+    analysis->numCuInHeight = heightInCU;
+
     if (m_param->bDisableLookahead)
     {
-        X265_FREAD(&analysis->numCuInHeight, sizeof(uint32_t), 1, m_analysisFileIn, &(picData->numCuInHeight));
+        X265_FREAD(&numCUsInHeightLoad, sizeof(uint32_t), 1, m_analysisFileIn, &(picData->numCuInHeight));
         X265_FREAD(&analysis->lookahead, sizeof(x265_lookahead_data), 1, m_analysisFileIn, &(picData->lookahead));
     }
     int scaledNumPartition = analysis->numPartitions;
@@ -4335,16 +4338,16 @@
 
     if (m_param->ctuDistortionRefine == CTU_DISTORTION_INTERNAL)
     {
-        X265_FREAD((analysis->distortionData)->ctuDistortion, sizeof(sse_t), analysis->numCUsInFrame, m_analysisFileIn, picDistortion);
+        X265_FREAD((analysis->distortionData)->ctuDistortion, sizeof(sse_t), numCUsLoad, m_analysisFileIn, picDistortion);
         computeDistortionOffset(analysis);
     }
     if (m_param->bDisableLookahead && m_rateControl->m_isVbv)
     {
         size_t vbvCount = m_param->lookaheadDepth + m_param->bframes + 2;
-        X265_FREAD(analysis->lookahead.intraVbvCost, sizeof(uint32_t), analysis->numCUsInFrame, m_analysisFileIn, picData->lookahead.intraVbvCost);
-        X265_FREAD(analysis->lookahead.vbvCost, sizeof(uint32_t), analysis->numCUsInFrame, m_analysisFileIn, picData->lookahead.vbvCost);
-        X265_FREAD(analysis->lookahead.satdForVbv, sizeof(uint32_t), analysis->numCuInHeight, m_analysisFileIn, picData->lookahead.satdForVbv);
-        X265_FREAD(analysis->lookahead.intraSatdForVbv, sizeof(uint32_t), analysis->numCuInHeight, m_analysisFileIn, picData->lookahead.intraSatdForVbv);
+        X265_FREAD(analysis->lookahead.intraVbvCost, sizeof(uint32_t), numCUsLoad, m_analysisFileIn, picData->lookahead.intraVbvCost);
+        X265_FREAD(analysis->lookahead.vbvCost, sizeof(uint32_t), numCUsLoad, m_analysisFileIn, picData->lookahead.vbvCost);
+        X265_FREAD(analysis->lookahead.satdForVbv, sizeof(uint32_t), numCUsInHeightLoad, m_analysisFileIn, picData->lookahead.satdForVbv);
+        X265_FREAD(analysis->lookahead.intraSatdForVbv, sizeof(uint32_t), numCUsInHeightLoad, m_analysisFileIn, picData->lookahead.intraSatdForVbv);
         X265_FREAD(analysis->lookahead.plannedSatd, sizeof(int64_t), vbvCount, m_analysisFileIn, picData->lookahead.plannedSatd);
 
         if (m_param->scaleFactor)
@@ -4352,12 +4355,12 @@
             for (uint64_t index = 0; index < vbvCount; index++)
                 analysis->lookahead.plannedSatd[index] *= factor;
 
-            for (uint32_t i = 0; i < analysis->numCuInHeight; i++)
+            for (uint32_t i = 0; i < numCUsInHeightLoad; i++)
             {
                 analysis->lookahead.satdForVbv[i] *= factor;
                 analysis->lookahead.intraSatdForVbv[i] *= factor;
             }
-            for (uint32_t i = 0; i < analysis->numCUsInFrame; i++)
+            for (uint32_t i = 0; i < numCUsLoad; i++)
             {
                 analysis->lookahead.vbvCost[i] *= factor;
                 analysis->lookahead.intraVbvCost[i] *= factor;
@@ -4407,13 +4410,13 @@
 
         if (!m_param->scaleFactor)
         {
-            X265_FREAD((analysis->intraData)->modes, sizeof(uint8_t), analysis->numCUsInFrame * analysis->numPartitions, m_analysisFileIn, intraPic->modes);
+            X265_FREAD((analysis->intraData)->modes, sizeof(uint8_t), numCUsLoad * analysis->numPartitions, m_analysisFileIn, intraPic->modes);
         }
         else
         {
-            uint8_t *tempLumaBuf = X265_MALLOC(uint8_t, analysis->numCUsInFrame * scaledNumPartition);
-            X265_FREAD(tempLumaBuf, sizeof(uint8_t), analysis->numCUsInFrame * scaledNumPartition, m_analysisFileIn, intraPic->modes);
-            for (uint32_t ctu32Idx = 0, cnt = 0; ctu32Idx < analysis->numCUsInFrame * scaledNumPartition; ctu32Idx++, cnt += factor)
+            uint8_t *tempLumaBuf = X265_MALLOC(uint8_t, numCUsLoad * scaledNumPartition);
+            X265_FREAD(tempLumaBuf, sizeof(uint8_t), numCUsLoad * scaledNumPartition, m_analysisFileIn, intraPic->modes);
+            for (uint32_t ctu32Idx = 0, cnt = 0; ctu32Idx < numCUsLoad * scaledNumPartition; ctu32Idx++, cnt += factor)
                 memset(&(analysis->intraData)->modes[cnt], tempLumaBuf[ctu32Idx], factor);
             X265_FREE(tempLumaBuf);
         }
@@ -4447,7 +4450,7 @@
         }
         if (m_param->bAnalysisType == HEVC_INFO)
         {
-            depthBytes = analysis->numCUsInFrame * analysis->numPartitions;
+            depthBytes = numCUsLoad * analysis->numPartitions;
             memcpy(((x265_analysis_inter_data *)analysis->interData)->depth, interPic->depth, depthBytes);
         }
         else
@@ -4551,25 +4554,26 @@
             {
                 if (!m_param->scaleFactor)
                 {
-                    X265_FREAD((analysis->intraData)->modes, sizeof(uint8_t), analysis->numCUsInFrame * analysis->numPartitions, m_analysisFileIn, intraPic->modes);
+                    X265_FREAD((analysis->intraData)->modes, sizeof(uint8_t), numCUsLoad * analysis->numPartitions, m_analysisFileIn, intraPic->modes);
                 }
                 else
                 {
-                    uint8_t *tempLumaBuf = X265_MALLOC(uint8_t, analysis->numCUsInFrame * scaledNumPartition);
-                    X265_FREAD(tempLumaBuf, sizeof(uint8_t), analysis->numCUsInFrame * scaledNumPartition, m_analysisFileIn, intraPic->modes);
-                    for (uint32_t ctu32Idx = 0, cnt = 0; ctu32Idx < analysis->numCUsInFrame * scaledNumPartition; ctu32Idx++, cnt += factor)
+                    uint8_t *tempLumaBuf = X265_MALLOC(uint8_t, numCUsLoad * scaledNumPartition);
+                    X265_FREAD(tempLumaBuf, sizeof(uint8_t), numCUsLoad * scaledNumPartition, m_analysisFileIn, intraPic->modes);
+                    for (uint32_t ctu32Idx = 0, cnt = 0; ctu32Idx < numCUsLoad * scaledNumPartition; ctu32Idx++, cnt += factor)
                         memset(&(analysis->intraData)->modes[cnt], tempLumaBuf[ctu32Idx], factor);
                     X265_FREE(tempLumaBuf);
                 }
             }
         }
         else
-            X265_FREAD((analysis->interData)->ref, sizeof(int32_t), analysis->numCUsInFrame * X265_MAX_PRED_MODE_PER_CTU * numDir, m_analysisFileIn, interPic->ref);
+            X265_FREAD((analysis->interData)->ref, sizeof(int32_t), numCUsLoad * X265_MAX_PRED_MODE_PER_CTU * numDir, m_analysisFileIn, interPic->ref);
 
         consumedBytes += frameRecordSize;
         if (numDir == 1)
             totalConsumedBytes = consumedBytes;
     }
+
 #undef X265_FREAD
 }
 
@@ -5032,13 +5036,14 @@
     X265_PARAM_VALIDATE(saveParam->lookaheadDepth, sizeof(int), 1, &m_param->lookaheadDepth, rc - lookahead);
     X265_PARAM_VALIDATE(saveParam->chunkStart, sizeof(int), 1, &m_param->chunkStart, chunk-start);
     X265_PARAM_VALIDATE(saveParam->chunkEnd, sizeof(int), 1, &m_param->chunkEnd, chunk-end);
-    X265_PARAM_VALIDATE(saveParam->cuTree,sizeof(int),1,&m_param->rc.cuTree, cutree - offset);
     X265_PARAM_VALIDATE(saveParam->ctuDistortionRefine, sizeof(int), 1, &m_param->ctuDistortionRefine, ctu - distortion);
+    X265_PARAM_VALIDATE(saveParam->frameDuplication, sizeof(int), 1, &m_param->bEnableFrameDuplication, frame - dup);
 
     int sourceHeight, sourceWidth;
     if (writeFlag)
     {
         X265_PARAM_VALIDATE(saveParam->analysisReuseLevel, sizeof(int), 1, &m_param->analysisSaveReuseLevel, analysis - save - reuse - level);
+        X265_PARAM_VALIDATE(saveParam->cuTree, sizeof(int), 1, &m_param->rc.cuTree, cutree-offset);
         sourceHeight = m_param->sourceHeight - m_conformanceWindow.bottomOffset;
         sourceWidth = m_param->sourceWidth - m_conformanceWindow.rightOffset;
         X265_PARAM_VALIDATE(saveParam->sourceWidth, sizeof(int), 1, &sourceWidth, res-width);
@@ -5073,6 +5078,15 @@
             return -1;
         }
 
+        int bcutree;
+        X265_FREAD(&bcutree, sizeof(int), 1, m_analysisFileIn, &(saveParam->cuTree));
+        if (loadLevel == 10 && m_param->rc.cuTree && (!bcutree || saveLevel < 2))
+        {
+            x265_log(NULL, X265_LOG_ERROR, "Error reading cu-tree info. Disabling cutree offsets. \n");
+            m_param->rc.cuTree = 0;
+            return -1;
+        }
+
         bool error = false;
         int curSourceHeight = m_param->sourceHeight - m_conformanceWindow.bottomOffset;
         int curSourceWidth = m_param->sourceWidth - m_conformanceWindow.rightOffset;
@@ -5701,7 +5715,7 @@
     TOOLCMP(oldParam->maxNumReferences, newParam->maxNumReferences, "ref=%d to %d\n");
     TOOLCMP(oldParam->bEnableFastIntra, newParam->bEnableFastIntra, "fast-intra=%d to %d\n");
     TOOLCMP(oldParam->bEnableEarlySkip, newParam->bEnableEarlySkip, "early-skip=%d to %d\n");
-    TOOLCMP(oldParam->bEnableRecursionSkip, newParam->bEnableRecursionSkip, "rskip=%d to %d\n");
+    TOOLCMP(oldParam->recursionSkipMode, newParam->recursionSkipMode, "rskip=%d to %d\n");
     TOOLCMP(oldParam->searchMethod, newParam->searchMethod, "me=%d to %d\n");
     TOOLCMP(oldParam->searchRange, newParam->searchRange, "merange=%d to %d\n");
     TOOLCMP(oldParam->subpelRefine, newParam->subpelRefine, "subme= %d to %d\n");
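Note on the analysis-load hunk at @@ -4311 above: it recomputes the CTU grid from the current source dimensions instead of reusing the count stored in the analysis file. The following is a small sketch of that round-up computation, assuming a power-of-two CTU size. The CtuGrid struct and computeCtuGrid() are illustrative names only and are not part of x265.

```cpp
#include <stdint.h>

// Illustrative container for the values the hunk stores into the analysis data.
struct CtuGrid
{
    uint32_t widthInCU;
    uint32_t heightInCU;
    uint32_t numCUsInFrame;
};

// Rounds each dimension up to a whole number of CTUs, using the expression
// from the diff: (dim + maxCUSize - 1) >> maxLog2CUSize.
static CtuGrid computeCtuGrid(uint32_t sourceWidth, uint32_t sourceHeight, uint32_t maxLog2CUSize)
{
    uint32_t maxCUSize = 1u << maxLog2CUSize;
    CtuGrid grid;
    grid.widthInCU = (sourceWidth + maxCUSize - 1) >> maxLog2CUSize;
    grid.heightInCU = (sourceHeight + maxCUSize - 1) >> maxLog2CUSize;
    grid.numCUsInFrame = grid.widthInCU * grid.heightInCU;
    return grid;
}
```

For a 1920x1080 source with 64x64 CTUs (maxLog2CUSize of 6) this gives a 30 by 17 grid, 510 CTUs in total.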
x265_3.3.tar.gz/source/encoder/frameencoder.cpp -> x265_3.4.tar.gz/source/encoder/frameencoder.cpp Changed
@@ -130,7 +130,7 @@
         {
             rowSum += sliceGroupSizeAccu;
             m_sliceBaseRow[++sidx] = i;
-        }        
+        }
     }
     X265_CHECK(sidx < m_param->maxSlices, "sliceID check failed!");
     m_sliceBaseRow[0] = 0;
@@ -448,6 +448,18 @@
     m_ssimCnt = 0;
     memset(&(m_frame->m_encData->m_frameStats), 0, sizeof(m_frame->m_encData->m_frameStats));
 
+    if (!m_param->bHistBasedSceneCut && m_param->rc.aqMode != X265_AQ_EDGE && m_param->recursionSkipMode == EDGE_BASED_RSKIP)
+    {
+        int height = m_frame->m_fencPic->m_picHeight;
+        int width = m_frame->m_fencPic->m_picWidth;
+        intptr_t stride = m_frame->m_fencPic->m_stride;
+
+        if (!computeEdge(m_frame->m_edgeBitPic, m_frame->m_fencPic->m_picOrg[0], NULL, stride, height, width, false, 1))
+        {
+            x265_log(m_param, X265_LOG_ERROR, " Failed to compute edge !");
+        }
+    }
+
     /* Emit access unit delimiter unless this is the first frame and the user is
      * not repeating headers (since AUD is supposed to be the first NAL in the access
      * unit) */
x265_3.3.tar.gz/source/encoder/ratecontrol.cpp -> x265_3.4.tar.gz/source/encoder/ratecontrol.cpp Changed
@@ -269,7 +269,7 @@
         x265_log(m_param, X265_LOG_WARNING, "NAL HRD parameters require VBV parameters, ignored\n");
         m_param->bEmitHRDSEI = 0;
     }
-    m_isCbr = m_param->rc.rateControlMode == X265_RC_ABR && m_isVbv && !m_2pass && m_param->rc.vbvMaxBitrate <= m_param->rc.bitrate;
+    m_isCbr = m_param->rc.rateControlMode == X265_RC_ABR && m_isVbv && m_param->rc.vbvMaxBitrate <= m_param->rc.bitrate;
     if (m_param->rc.bStrictCbr && !m_isCbr)
     {
         x265_log(m_param, X265_LOG_WARNING, "strict CBR set without CBR mode, ignored\n");
@@ -335,7 +335,7 @@
         int vbvBufferSize = m_param->rc.vbvBufferSize * 1000;
         int vbvMaxBitrate = m_param->rc.vbvMaxBitrate * 1000;
 
-        if (m_param->bEmitHRDSEI)
+        if (m_param->bEmitHRDSEI && !m_param->decoderVbvMaxRate)
         {
             const HRDInfo* hrd = &sps.vuiParameters.hrdParameters;
             vbvBufferSize = hrd->cpbSizeValue << (hrd->cpbSizeScale + CPB_SHIFT);
@@ -509,6 +509,7 @@
                 CMP_OPT_FIRST_PASS(" keyint", m_param->keyframeMax);
                 CMP_OPT_FIRST_PASS("scenecut", m_param->scenecutThreshold);
                 CMP_OPT_FIRST_PASS("intra-refresh", m_param->bIntraRefresh);
+                CMP_OPT_FIRST_PASS("frame-dup", m_param->bEnableFrameDuplication);
                 if (m_param->bMultiPassOptRPS)
                 {
                     CMP_OPT_FIRST_PASS("multi-pass-opt-rps", m_param->bMultiPassOptRPS);
@@ -546,7 +547,7 @@
                 x265_log(m_param, X265_LOG_WARNING, "2nd pass has fewer frames than 1st pass (%d vs %d)\n",
                          m_param->totalFrames, m_numEntries);
             }
-            if (m_param->totalFrames > m_numEntries)
+            if (m_param->totalFrames > m_numEntries && !m_param->bEnableFrameDuplication)
             {
                 x265_log(m_param, X265_LOG_ERROR, "2nd pass has more frames than 1st pass (%d vs %d)\n",
                          m_param->totalFrames, m_numEntries);
@@ -781,6 +782,10 @@
     // Init HRD
     HRDInfo* hrd = &sps.vuiParameters.hrdParameters;
     hrd->cbrFlag = m_isCbr;
+    if (m_param->reconfigWindowSize) {
+        hrd->cbrFlag = 0;
+        vbvMaxBitrate = m_param->decoderVbvMaxRate * 1000;
+    }
 
     // normalize HRD size and rate to the value / scale notation
     hrd->bitRateScale = x265_clip3(0, 15, calcScale(vbvMaxBitrate) - BR_SHIFT);
@@ -829,7 +834,7 @@
         /* weighted average of cplx of future frames */
         for (int j = 1; j < cplxBlur * 2 && j < m_numEntries - i; j++)
         {
-            int index = m_encOrder[i + j];
+            int index = i+j;
             RateControlEntry *rcj = &m_rce2Pass[index];
             weight *= 1 - pow(rcj->iCuCount / m_ncu, 2);
             if (weight < 0.0001)
@@ -842,7 +847,7 @@
         weight = 1.0;
         for (int j = 0; j <= cplxBlur * 2 && j <= i; j++)
         {
-            int index = m_encOrder[i - j];
+            int index = i-j;
             RateControlEntry *rcj = &m_rce2Pass[index];
             gaussianWeight = weight * exp(-j * j / 200.0);
             weightSum += gaussianWeight;
@@ -851,7 +856,7 @@
             if (weight < .0001)
                 break;
         }
-        m_rce2Pass[m_encOrder[i]].blurredComplexity = cplxSum / weightSum;
+        m_rce2Pass[i].blurredComplexity= cplxSum / weightSum;
     }
     CHECKED_MALLOC(qScale, double, m_numEntries);
     if (filterSize > 1)
@@ -870,7 +875,7 @@
     expectedBits = 1;
     for (int i = 0; i < m_numEntries; i++)
     {
-        RateControlEntry* rce = &m_rce2Pass[m_encOrder[i]];
+        RateControlEntry* rce = &m_rce2Pass[i];
         double q = getQScale(rce, 1.0);
         expectedBits += qScale2bits(rce, q);
         m_lastQScaleFor[rce->sliceType] = q;
@@ -893,15 +898,15 @@
         /* find qscale */
         for (int i = 0; i < m_numEntries; i++)
         {
-            RateControlEntry *rce = &m_rce2Pass[m_encOrder[i]];
+            RateControlEntry *rce = &m_rce2Pass[i];
             qScale[i] = getQScale(rce, rateFactor);
             m_lastQScaleFor[rce->sliceType] = qScale[i];
         }
 
         /* fixed I/B qscale relative to P */
-        for (int i = m_numEntries - 1; i >= 0; i--)
+        for (int i = 0; i < m_numEntries; i++)
         {
-            qScale[i] = getDiffLimitedQScale(&m_rce2Pass[m_encOrder[i]], qScale[i]);
+            qScale[i] = getDiffLimitedQScale(&m_rce2Pass[i], qScale[i]);
             X265_CHECK(qScale[i] >= 0, "qScale became negative\n");
         }
 
@@ -912,7 +917,6 @@
             for (int i = 0; i < m_numEntries; i++)
             {
                 double q = 0.0, sum = 0.0;
-
                 for (int j = 0; j < filterSize; j++)
                 {
                     int idx = i + j - filterSize / 2;
@@ -920,7 +924,7 @@
                     double coeff = qBlur == 0 ? 1.0 : exp(-d * d / (qBlur * qBlur));
                     if (idx < 0 || idx >= m_numEntries)
                         continue;
-                    if (m_rce2Pass[m_encOrder[i]].sliceType != m_rce2Pass[m_encOrder[idx]].sliceType)
+                    if (m_rce2Pass[i].sliceType != m_rce2Pass[idx].sliceType)
                         continue;
                     q += qScale[idx] * coeff;
                     sum += coeff;
@@ -932,7 +936,7 @@
         /* find expected bits */
         for (int i = 0; i < m_numEntries; i++)
         {
-            RateControlEntry *rce = &m_rce2Pass[m_encOrder[i]];
+            RateControlEntry *rce = &m_rce2Pass[i];
             rce->newQScale = clipQscale(NULL, rce, blurredQscale[i]); // check if needed
             X265_CHECK(rce->newQScale >= 0, "new Qscale is negative\n");
             expectedBits += qScale2bits(rce, rce->newQScale);
@@ -1279,6 +1283,7 @@
                 m_param->rc.vbvMaxBitrate = m_param->rc.zones[i].zoneParam->rc.vbvMaxBitrate;
                 memcpy(m_relativeComplexity, m_param->rc.zones[i].relativeComplexity, sizeof(double) * m_param->reconfigWindowSize);
                 reconfigureRC();
+                m_isCbr = 1; /* Always vbvmaxrate == bitrate here*/
                 m_top->zoneReadCount[i].incr();
             }
         }
136
@@ -1951,7 +1956,7 @@
137
                 /* Adjust quant based on the difference between
138
                  * achieved and expected bitrate so far */
139
                 double curTime = (double)rce->encodeOrder / m_numEntries;
140
-                double w = x265_clip3(0.0, 1.0, curTime * 100);
141
+                double w = x265_clip3(0.0, 1.0, curTime);
142
                 q *= pow((double)m_totalBits / m_expectedBitsSum, w);
143
             }
144
             if (m_framesDone == 0 && m_param->rc.rateControlMode == X265_RC_ABR && m_isGrainEnabled)
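Review note: removing the factor of 100 changes how quickly the overshoot correction reaches full strength. With the old expression the weight w saturates at 1.0 once 1% of the entries are encoded. With the new one it grows linearly over the whole encode. The sketch below only illustrates that arithmetic. The function name compensateOverflow and the timeScale parameter are hypothetical and exist only so both variants can be compared.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: scale q by the ratio of achieved to expected bits,
// raised to a weight w in [0, 1] that depends on encode progress.
// timeScale = 100.0 mimics the removed line, timeScale = 1.0 the added one.
double compensateOverflow(double q, double totalBits, double expectedBits,
                          int encodeOrder, int numEntries, double timeScale)
{
    assert(numEntries > 0 && expectedBits > 0.0);
    const double curTime = static_cast<double>(encodeOrder) / numEntries;
    const double w = std::min(1.0, std::max(0.0, curTime * timeScale));
    return q * std::pow(totalBits / expectedBits, w);
}
```

For example, 10 frames into a 1000-frame encode with a 20% overshoot, the old weight applies the full 1.2x factor to q, while the new weight applies roughly 1.2^0.01, i.e. a change of about 0.2%.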
145
@@ -2742,7 +2747,9 @@
146
         x265_log(m_param, X265_LOG_WARNING, "poc:%d, VBV underflow (%.0f bits)\n", rce->poc, m_bufferFillFinal);
147
 
148
     m_bufferFillFinal = X265_MAX(m_bufferFillFinal, 0);
149
-    m_bufferFillFinal += m_bufferRate;
150
+    m_bufferFillFinal += rce->bufferRate;
151
+    if (m_param->csvLogLevel >= 2)
152
+        m_unclippedBufferFillFinal = m_bufferFillFinal;
153
 
154
     if (m_param->rc.bStrictCbr)
155
     {
156
@@ -2752,14 +2759,14 @@
157
             filler += FILLER_OVERHEAD * 8;
158
         }
159
         m_bufferFillFinal -= filler;
160
-        bufferBits = X265_MIN(bits + filler + m_bufferExcess, m_bufferRate);
161
+        bufferBits = X265_MIN(bits + filler + m_bufferExcess, rce->bufferRate);
162
         m_bufferExcess = X265_MAX(m_bufferExcess - bufferBits + bits + filler, 0);
163
         m_bufferFillActual += bufferBits - bits - filler;
164
     }
165
     else
166
     {
167
         m_bufferFillFinal = X265_MIN(m_bufferFillFinal, m_bufferSize);
168
-        bufferBits = X265_MIN(bits + m_bufferExcess, m_bufferRate);
169
+        bufferBits = X265_MIN(bits + m_bufferExcess, rce->bufferRate);
170
         m_bufferExcess = X265_MAX(m_bufferExcess - bufferBits + bits, 0);
171
         m_bufferFillActual += bufferBits - bits;
172
         m_bufferFillActual = X265_MIN(m_bufferFillActual, m_bufferSize);
173
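Review note: the last two hunks of ratecontrol.cpp replace the encoder-wide m_bufferRate with the per-frame rce->bufferRate when refilling the VBV buffer. Below is a minimal sketch of the non-strict-CBR branch under that change. The VbvState struct and the updateVbv name are hypothetical, and the initial subtraction of the frame's bits is an assumption, because that statement lies outside the visible hunk.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical container for the rate-control members touched by the hunk.
struct VbvState
{
    double bufferSize;
    double bufferFillFinal;
    double bufferFillActual;
    double bufferExcess;
};

// One buffer update after a frame of `bits` bits, refilled at the rate
// recorded for that frame (rce->bufferRate in the diff).
void updateVbv(VbvState& s, double bits, double frameBufferRate)
{
    assert(s.bufferSize >= 0.0);
    s.bufferFillFinal -= bits; // assumption: done before the lines shown
    s.bufferFillFinal = std::max(s.bufferFillFinal, 0.0);
    s.bufferFillFinal += frameBufferRate;
    s.bufferFillFinal = std::min(s.bufferFillFinal, s.bufferSize);

    const double bufferBits = std::min(bits + s.bufferExcess, frameBufferRate);
    s.bufferExcess = std::max(s.bufferExcess - bufferBits + bits, 0.0);
    s.bufferFillActual += bufferBits - bits;
    s.bufferFillActual = std::min(s.bufferFillActual, s.bufferSize);
}
```

Passing the rate per frame means a frame encoded under one zone's settings is refilled at the rate that applied to it, even if a later reconfiguration has changed the encoder-wide rate. This reading is consistent with the "Strict VBV conformance in zone encoding" changelog entry, but the diff itself does not state the motivation.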
x265_3.3.tar.gz/source/encoder/ratecontrol.h -> x265_3.4.tar.gz/source/encoder/ratecontrol.h Changed
9
 
1
@@ -157,6 +157,7 @@
2
     double m_rateFactorConstant;
3
     double m_bufferSize;
4
     double m_bufferFillFinal;  /* real buffer as of the last finished frame */
5
+    double m_unclippedBufferFillFinal; /* real unclipped buffer as of the last finished frame used to log in CSV*/
6
     double m_bufferFill;       /* planned buffer, if all in-progress frames hit their bit budget */
7
     double m_bufferRate;       /* # of bits added to buffer_fill after each frame */
8
     double m_vbvMaxRate;       /* in kbps */
9
x265_3.3.tar.gz/source/encoder/slicetype.cpp -> x265_3.4.tar.gz/source/encoder/slicetype.cpp Changed
33
 
1
@@ -87,7 +87,7 @@
2
 
3
 namespace X265_NS {
4
 
5
-bool computeEdge(pixel *edgePic, pixel *refPic, pixel *edgeTheta, intptr_t stride, int height, int width, bool bcalcTheta)
6
+bool computeEdge(pixel* edgePic, pixel* refPic, pixel* edgeTheta, intptr_t stride, int height, int width, bool bcalcTheta, pixel whitePixel)
7
 {
8
     intptr_t rowOne = 0, rowTwo = 0, rowThree = 0, colOne = 0, colTwo = 0, colThree = 0;
9
     intptr_t middle = 0, topLeft = 0, topRight = 0, bottomLeft = 0, bottomRight = 0;
10
@@ -141,7 +141,7 @@
11
                        theta = 180 + theta;
12
                     edgeTheta[middle] = (pixel)theta;
13
                 }
14
-                edgePic[middle] = (pixel)(gradientMagnitude >= edgeThreshold ? edgeThreshold : blackPixel);
15
+                edgePic[middle] = (pixel)(gradientMagnitude >= EDGE_THRESHOLD ? whitePixel : blackPixel);
16
             }
17
         }
18
         return true;
19
@@ -519,6 +519,13 @@
20
                 if (param->rc.aqMode == X265_AQ_EDGE)
21
                     edgeFilter(curFrame, param);
22
 
23
+                if (param->rc.aqMode == X265_AQ_EDGE && !param->bHistBasedSceneCut && param->recursionSkipMode == EDGE_BASED_RSKIP)
24
+                {
25
+                    pixel* src = curFrame->m_edgePic + curFrame->m_fencPic->m_lumaMarginY * curFrame->m_fencPic->m_stride + curFrame->m_fencPic->m_lumaMarginX;
26
+                    primitives.planecopy_pp_shr(src, curFrame->m_fencPic->m_stride, curFrame->m_edgeBitPic,
27
+                        curFrame->m_fencPic->m_stride, curFrame->m_fencPic->m_picWidth, curFrame->m_fencPic->m_picHeight, SHIFT_TO_BITPLANE);
28
+                }
29
+
30
                 if (param->rc.aqMode == X265_AQ_AUTO_VARIANCE || param->rc.aqMode == X265_AQ_AUTO_VARIANCE_BIASED || param->rc.aqMode == X265_AQ_EDGE)
31
                 {
32
                     double bit_depth_correction = 1.f / (1 << (2 * (X265_DEPTH - 8)));
33
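Review note: computeEdge previously wrote the threshold value itself into edge pixels. It now writes a caller-supplied whitePixel, which defaults to EDGE_THRESHOLD in slicetype.h. The sketch below shows only that thresholding step for an 8-bit build. The name binarizeEdges, the constants and a black value of 0 are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using pixel = std::uint8_t;          // assumption: build without HIGH_BIT_DEPTH

const double kEdgeThreshold = 255.0; // mirrors EDGE_THRESHOLD for 8-bit builds
const pixel kBlackPixel = 0;         // assumption: blackPixel is 0

// Hypothetical helper: mark every sample whose gradient magnitude reaches the
// threshold with whitePixel and every other sample with black.
std::vector<pixel> binarizeEdges(const std::vector<double>& gradientMagnitude,
                                 pixel whitePixel = static_cast<pixel>(kEdgeThreshold))
{
    assert(whitePixel != kBlackPixel);
    std::vector<pixel> edgePic;
    edgePic.reserve(gradientMagnitude.size());
    for (double g : gradientMagnitude)
        edgePic.push_back(g >= kEdgeThreshold ? whitePixel : kBlackPixel);
    return edgePic;
}
```

Calling binarizeEdges(g) reproduces the old behaviour (edge pixels equal the threshold), while binarizeEdges(g, 1) yields a 0/1 mask. Whether any caller in 3.4 actually passes a value other than the default is not visible in this part of the diff.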
x265_3.3.tar.gz/source/encoder/slicetype.h -> x265_3.4.tar.gz/source/encoder/slicetype.h Changed
31
 
1
@@ -44,9 +44,9 @@
2
 #define EDGE_INCLINATION 45
3
 
4
 #if HIGH_BIT_DEPTH
5
-#define edgeThreshold 1023.0
6
+#define EDGE_THRESHOLD 1023.0
7
 #else
8
-#define edgeThreshold 255.0
9
+#define EDGE_THRESHOLD 255.0
10
 #endif
11
 #define PI 3.14159265
12
 
13
@@ -101,7 +101,7 @@
14
 protected:
15
 
16
     uint32_t acEnergyCu(Frame* curFrame, uint32_t blockX, uint32_t blockY, int csp, uint32_t qgSize);
17
-    uint32_t edgeDensityCu(Frame*curFrame, uint32_t &avgAngle, uint32_t blockX, uint32_t blockY, uint32_t qgSize);
18
+    uint32_t edgeDensityCu(Frame* curFrame, uint32_t &avgAngle, uint32_t blockX, uint32_t blockY, uint32_t qgSize);
19
     uint32_t lumaSumCu(Frame* curFrame, uint32_t blockX, uint32_t blockY, uint32_t qgSize);
20
     uint32_t weightCostLuma(Lowres& fenc, Lowres& ref, WeightParam& wp);
21
     bool     allocWeightedRef(Lowres& fenc);
22
@@ -265,7 +265,6 @@
23
     CostEstimateGroup& operator=(const CostEstimateGroup&);
24
 };
25
 
26
-bool computeEdge(pixel *edgePic, pixel *refPic, pixel *edgeTheta, intptr_t stride, int height, int width, bool bcalcTheta);
27
-
28
+bool computeEdge(pixel* edgePic, pixel* refPic, pixel* edgeTheta, intptr_t stride, int height, int width, bool bcalcTheta, pixel whitePixel = EDGE_THRESHOLD);
29
 }
30
 #endif // ifndef X265_SLICETYPE_H
31
x265_3.3.tar.gz/source/test/CMakeLists.txt -> x265_3.4.tar.gz/source/test/CMakeLists.txt Changed
24
 
1
@@ -23,13 +23,15 @@
2
 
3
 # add ARM assembly files
4
 if(ARM OR CROSS_COMPILE_ARM)
5
-    enable_language(ASM)
6
-    set(NASM_SRC checkasm-arm.S)
7
-    add_custom_command(
8
-        OUTPUT checkasm-arm.obj
9
-        COMMAND ${CMAKE_CXX_COMPILER}
10
-        ARGS ${NASM_FLAGS} ${CMAKE_CURRENT_SOURCE_DIR}/checkasm-arm.S -o checkasm-arm.obj
11
-        DEPENDS checkasm-arm.S)
12
+    if(NOT ARM64)
13
+        enable_language(ASM)
14
+        set(NASM_SRC checkasm-arm.S)
15
+        add_custom_command(
16
+            OUTPUT checkasm-arm.obj
17
+            COMMAND ${CMAKE_CXX_COMPILER}
18
+            ARGS ${NASM_FLAGS} ${CMAKE_CURRENT_SOURCE_DIR}/checkasm-arm.S -o checkasm-arm.obj
19
+            DEPENDS checkasm-arm.S)
20
+    endif()
21
 endif(ARM OR CROSS_COMPILE_ARM)
22
 
23
 # add PowerPC assembly files
24
x265_3.3.tar.gz/source/test/regression-tests.txt -> x265_3.4.tar.gz/source/test/regression-tests.txt Changed
23
 
1
@@ -75,7 +75,7 @@
2
 News-4k.y4m,--preset superfast --lookahead-slices 6 --aq-mode 0
3
 News-4k.y4m,--preset superfast --slices 4 --aq-mode 0 
4
 News-4k.y4m,--preset medium --tune ssim --no-sao --qg-size 16
5
-News-4k.y4m,--preset veryslow --no-rskip
6
+News-4k.y4m,--preset veryslow --rskip 0
7
 News-4k.y4m,--preset veryslow --pme --crf 40
8
 OldTownCross_1920x1080_50_10bit_422.yuv,--preset superfast --weightp
9
 OldTownCross_1920x1080_50_10bit_422.yuv,--preset medium --no-weightp
10
@@ -162,7 +162,11 @@
11
 sintel_trailer_2k_1920x1080_24.yuv, --preset medium --hist-scenecut --hist-threshold 0.02 --frame-dup --dup-threshold 60 --hrd --bitrate 10000 --vbv-bufsize 15000 --vbv-maxrate 12000
12
 sintel_trailer_2k_1920x1080_24.yuv, --preset medium --hist-scenecut --hist-threshold 0.02
13
 sintel_trailer_2k_1920x1080_24.yuv, --preset ultrafast --hist-scenecut --hist-threshold 0.02
14
-
15
+crowd_run_1920x1080_50.yuv, --preset faster --ctu 32 --rskip 2 --rskip-edge-threshold 5
16
+crowd_run_1920x1080_50.yuv, --preset fast --ctu 64 --rskip 2 --rskip-edge-threshold 5 --aq-mode 4
17
+crowd_run_1920x1080_50.yuv, --preset slow --ctu 32 --rskip 2 --rskip-edge-threshold 5 --hist-scenecut --hist-threshold 0.1
18
+crowd_run_1920x1080_50.yuv, --preset slower --ctu 16 --rskip 2 --rskip-edge-threshold 5 --hist-scenecut --hist-threshold 0.1 --aq-mode 4
19
+ 
20
 # Main12 intraCost overflow bug test
21
 720p50_parkrun_ter.y4m,--preset medium
22
 
23
x265_3.3.tar.gz/source/test/save-load-tests.txt -> x265_3.4.tar.gz/source/test/save-load-tests.txt Changed
6
 
1
@@ -18,3 +18,4 @@
2
 RaceHorses_416x240_30.y4m,   --preset slow --no-cutree --ctu 16 --analysis-save x265_analysis.dat --analysis-save-reuse-level 10 --scale-factor 2 --crf 22  --vbv-maxrate 1000 --vbv-bufsize 1000::RaceHorses_832x480_30.y4m,    --preset slow --no-cutree --ctu 32 --analysis-load x265_analysis.dat  --analysis-save x265_analysis_2.dat --analysis-load-reuse-level 10 --analysis-save-reuse-level 10 --scale-factor 2 --crf 16 --vbv-maxrate 4000 --vbv-bufsize 4000 --refine-intra 0 --refine-inter 1::RaceHorses_1664x960_30.y4m,   --preset slow --no-cutree --ctu 64 --analysis-load x265_analysis_2.dat  --analysis-load-reuse-level 10 --scale-factor 2 --crf 12 --vbv-maxrate 7000 --vbv-bufsize 7000 --refine-intra 2 --refine-inter 2
3
 crowd_run_540p50.y4m,   --preset veryslow --no-cutree --analysis-save x265_analysis_540.dat  --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 5000 --vbv-bufsize 15000 --vbv-maxrate 9000::crowd_run_1080p50.y4m,   --preset veryslow --no-cutree --analysis-save x265_analysis_1080.dat  --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 10000 --vbv-bufsize 30000 --vbv-maxrate 17500::crowd_run_1080p50.y4m,  --preset veryslow --no-cutree --analysis-save x265_analysis_1080.dat --analysis-load x265_analysis_540.dat --refine-intra 4 --dynamic-refine --analysis-load-reuse-level 10 --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 10000 --vbv-bufsize 30000 --vbv-maxrate 17500::crowd_run_2160p50.y4m,  --preset veryslow --no-cutree --analysis-save x265_analysis_2160.dat --analysis-load x265_analysis_1080.dat --refine-intra 3 --dynamic-refine --analysis-load-reuse-level 10 --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 24000 --vbv-bufsize 84000 --vbv-maxrate 49000::crowd_run_2160p50.y4m,  --preset veryslow --no-cutree --analysis-load x265_analysis_2160.dat --refine-intra 2 --dynamic-refine --analysis-load-reuse-level 10 --scale-factor 1 --bitrate 24000 --vbv-bufsize 84000 --vbv-maxrate 49000
4
 crowd_run_540p50.y4m,  --preset medium --no-cutree --analysis-save x265_analysis_540.dat  --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 5000 --vbv-bufsize 15000 --vbv-maxrate 9000::crowd_run_1080p50.y4m,  --preset medium --no-cutree --analysis-save x265_analysis_1080.dat  --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 10000 --vbv-bufsize 30000 --vbv-maxrate 17500::crowd_run_1080p50.y4m,  --preset medium --no-cutree --analysis-save x265_analysis_1080.dat --analysis-load x265_analysis_540.dat --refine-intra 4 --dynamic-refine --analysis-load-reuse-level 10 --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 10000 --vbv-bufsize 30000 --vbv-maxrate 17500::crowd_run_2160p50.y4m,  --preset medium --no-cutree --analysis-save x265_analysis_2160.dat --analysis-load x265_analysis_1080.dat --refine-intra 3 --dynamic-refine --analysis-load-reuse-level 10 --analysis-save-reuse-level 10 --scale-factor 2 --bitrate 24000 --vbv-bufsize 84000 --vbv-maxrate 49000::crowd_run_2160p50.y4m,  --preset medium --no-cutree --analysis-load x265_analysis_2160.dat --refine-intra 2 --dynamic-refine --analysis-load-reuse-level 10 --scale-factor 1 --bitrate 24000 --vbv-bufsize 84000 --vbv-maxrate 49000
5
+News-4k.y4m,  --preset medium --analysis-save x265_analysis_fdup.dat --frame-dup --hrd --bitrate 10000 --vbv-bufsize 15000 --vbv-maxrate 12000::News-4k.y4m, --analysis-load x265_analysis_fdup.dat --frame-dup --hrd --bitrate 10000 --vbv-bufsize 15000 --vbv-maxrate 12000
6
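Review note: each line of save-load-tests.txt appears to chain several encodes with "::", and each encode is written as "input file, options". The new News-4k entry follows that pattern with a save pass and a load pass. The sketch below splits such a line into its stages. The format is inferred only from the lines shown above, and splitSaveLoadLine and trim are hypothetical names, not part of the x265 test scripts.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Strip leading and trailing whitespace.
static std::string trim(const std::string& s)
{
    const char* ws = " \t\r\n";
    const std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos)
        return "";
    const std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Split "file, options::file, options::..." into (file, options) pairs.
std::vector<std::pair<std::string, std::string>> splitSaveLoadLine(const std::string& line)
{
    std::vector<std::pair<std::string, std::string>> stages;
    const std::string sep = "::";
    std::size_t start = 0;
    while (start <= line.size())
    {
        std::size_t end = line.find(sep, start);
        if (end == std::string::npos)
            end = line.size();
        const std::string stage = line.substr(start, end - start);
        const std::size_t comma = stage.find(',');
        if (comma == std::string::npos)
            stages.emplace_back(trim(stage), "");
        else
            stages.emplace_back(trim(stage.substr(0, comma)), trim(stage.substr(comma + 1)));
        start = end + sep.size();
    }
    assert(!stages.empty()); // the loop body runs at least once
    return stages;
}
```

Applied to the added News-4k line, this would give two stages: one carrying --analysis-save and one carrying --analysis-load, both with --frame-dup. That matches the "Multi-pass encode failures with --frame-dup" fix the entry presumably guards against.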
x265_3.3.tar.gz/source/test/testbench.cpp -> x265_3.4.tar.gz/source/test/testbench.cpp Changed
38
 
1
@@ -5,6 +5,7 @@
2
  *          Mandar Gurav <mandar@multicorewareinc.com>
3
  *          Mahesh Pittala <mahesh@multicorewareinc.com>
4
  *          Min Chen <chenm003@163.com>
5
+ *          Yimeng Su <yimeng.su@huawei.com>
6
  *
7
  * This program is free software; you can redistribute it and/or modify
8
  * it under the terms of the GNU General Public License as published by
9
@@ -208,6 +209,14 @@
10
         EncoderPrimitives asmprim;
11
         memset(&asmprim, 0, sizeof(asmprim));
12
         setupAssemblyPrimitives(asmprim, test_arch[i].flag);
13
+
14
+#if X265_ARCH_ARM64
15
+        /* Temporary workaround because luma_vsp assembly primitive has not been completed
16
+         * but interp_8tap_hv_pp_cpu uses mixed C primitive and assembly primitive.
17
+         * Otherwise, segment fault occurs. */
18
+        setupAliasCPrimitives(cprim, asmprim, test_arch[i].flag);
19
+#endif
20
+
21
         setupAliasPrimitives(asmprim);
22
         memcpy(&primitives, &asmprim, sizeof(EncoderPrimitives));
23
         for (size_t h = 0; h < sizeof(harness) / sizeof(TestHarness*); h++)
24
@@ -232,6 +241,13 @@
25
 #endif
26
     setupAssemblyPrimitives(optprim, cpuid);
27
 
28
+#if X265_ARCH_ARM64
29
+    /* Temporary workaround because luma_vsp assembly primitive has not been completed
30
+     * but interp_8tap_hv_pp_cpu uses mixed C primitive and assembly primitive.
31
+     * Otherwise, segment fault occurs. */
32
+    setupAliasCPrimitives(cprim, optprim, cpuid);
33
+#endif
34
+
35
     /* Note that we do not setup aliases for performance tests, that would be
36
      * redundant. The testbench only verifies they are correctly aliased */
37
 
38
x265_3.3.tar.gz/source/test/testharness.h -> x265_3.4.tar.gz/source/test/testharness.h Changed
26
 
1
@@ -3,6 +3,7 @@
2
  *
3
  * Authors: Steve Borho <steve@borho.org>
4
  *          Min Chen <chenm003@163.com>
5
+ *          Yimeng Su <yimeng.su@huawei.com>
6
  *
7
  * This program is free software; you can redistribute it and/or modify
8
  * it under the terms of the GNU General Public License as published by
9
@@ -81,12 +82,16 @@
10
 #if X265_ARCH_X86
11
     asm volatile("rdtsc" : "=a" (a) ::"edx");
12
 #elif X265_ARCH_ARM
13
+#if X265_ARCH_ARM64
14
+    asm volatile("mrs %0, cntvct_el0" : "=r"(a));
15
+#else
16
     // TOD-DO: verify following inline asm to get cpu Timestamp Counter for ARM arch
17
     // asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(a));
18
 
19
     // TO-DO: replace clock() function with appropriate ARM cpu instructions
20
     a = clock();
21
 #endif
22
+#endif
23
     return a;
24
 }
25
 #endif // ifdef _MSC_VER
26
x265_3.3.tar.gz/source/x265.cpp -> x265_3.4.tar.gz/source/x265.cpp Changed
1314
 
1
@@ -27,11 +27,7 @@
2
 
3
 #include "x265.h"
4
 #include "x265cli.h"
5
-
6
-#include "input/input.h"
7
-#include "output/output.h"
8
-#include "output/reconplay.h"
9
-#include "svt.h"
10
+#include "abrEncApp.h"
11
 
12
 #if HAVE_VLD
13
 /* Visual Leak Detector */
14
@@ -47,191 +43,59 @@
15
 #include <fstream>
16
 #include <queue>
17
 
18
-#define CONSOLE_TITLE_SIZE 200
19
-#ifdef _WIN32
20
-#include <windows.h>
21
-#define SetThreadExecutionState(es)
22
-static char orgConsoleTitle[CONSOLE_TITLE_SIZE] = "";
23
-#else
24
-#define GetConsoleTitle(t, n)
25
-#define SetConsoleTitle(t)
26
-#define SetThreadExecutionState(es)
27
-#endif
28
-
29
 using namespace X265_NS;
30
 
31
-/* Ctrl-C handler */
32
-static volatile sig_atomic_t b_ctrl_c /* = 0 */;
33
-static void sigint_handler(int)
34
-{
35
-    b_ctrl_c = 1;
36
-}
37
-#define START_CODE 0x00000001
38
-#define START_CODE_BYTES 4
39
-
40
-struct CLIOptions
41
-{
42
-    InputFile* input;
43
-    ReconFile* recon;
44
-    OutputFile* output;
45
-    FILE*       qpfile;
46
-    FILE*       zoneFile;
47
-    FILE*    dolbyVisionRpu;    /* File containing Dolby Vision BL RPU metadata */
48
-    const char* reconPlayCmd;
49
-    const x265_api* api;
50
-    x265_param* param;
51
-    x265_vmaf_data* vmafData;
52
-    bool bProgress;
53
-    bool bForceY4m;
54
-    bool bDither;
55
-    uint32_t seek;              // number of frames to skip from the beginning
56
-    uint32_t framesToBeEncoded; // number of frames to encode
57
-    uint64_t totalbytes;
58
-    int64_t startTime;
59
-    int64_t prevUpdateTime;
60
-
61
-    /* in microseconds */
62
-    static const int UPDATE_INTERVAL = 250000;
63
-
64
-    CLIOptions()
65
-    {
66
-        input = NULL;
67
-        recon = NULL;
68
-        output = NULL;
69
-        qpfile = NULL;
70
-        zoneFile = NULL;
71
-        dolbyVisionRpu = NULL;
72
-        reconPlayCmd = NULL;
73
-        api = NULL;
74
-        param = NULL;
75
-        vmafData = NULL;
76
-        framesToBeEncoded = seek = 0;
77
-        totalbytes = 0;
78
-        bProgress = true;
79
-        bForceY4m = false;
80
-        startTime = x265_mdate();
81
-        prevUpdateTime = 0;
82
-        bDither = false;
83
-    }
84
+#define X265_HEAD_ENTRIES 3
85
 
86
-    void destroy();
87
-    void printStatus(uint32_t frameNum);
88
-    bool parse(int argc, char **argv);
89
-    bool parseZoneParam(int argc, char **argv, x265_param* globalParam, int zonefileCount);
90
-    bool parseQPFile(x265_picture &pic_org);
91
-    bool parseZoneFile();
92
-};
93
-
94
-void CLIOptions::destroy()
95
-{
96
-    if (input)
97
-        input->release();
98
-    input = NULL;
99
-    if (recon)
100
-        recon->release();
101
-    recon = NULL;
102
-    if (qpfile)
103
-        fclose(qpfile);
104
-    qpfile = NULL;
105
-    if (zoneFile)
106
-        fclose(zoneFile);
107
-    zoneFile = NULL;
108
-    if (dolbyVisionRpu)
109
-        fclose(dolbyVisionRpu);
110
-    dolbyVisionRpu = NULL;
111
-    if (output)
112
-        output->release();
113
-    output = NULL;
114
-}
115
-
116
-void CLIOptions::printStatus(uint32_t frameNum)
117
-{
118
-    char buf[200];
119
-    int64_t time = x265_mdate();
120
-
121
-    if (!bProgress || !frameNum || (prevUpdateTime && time - prevUpdateTime < UPDATE_INTERVAL))
122
-        return;
123
-
124
-    int64_t elapsed = time - startTime;
125
-    double fps = elapsed > 0 ? frameNum * 1000000. / elapsed : 0;
126
-    float bitrate = 0.008f * totalbytes * (param->fpsNum / param->fpsDenom) / ((float)frameNum);
127
-    if (framesToBeEncoded)
128
-    {
129
-        int eta = (int)(elapsed * (framesToBeEncoded - frameNum) / ((int64_t)frameNum * 1000000));
130
-        sprintf(buf, "x265 [%.1f%%] %d/%d frames, %.2f fps, %.2f kb/s, eta %d:%02d:%02d",
131
-            100. * frameNum / (param->chunkEnd ? param->chunkEnd : param->totalFrames), frameNum, (param->chunkEnd ? param->chunkEnd : param->totalFrames), fps, bitrate,
132
-                eta / 3600, (eta / 60) % 60, eta % 60);
133
-    }
134
-    else
135
-        sprintf(buf, "x265 %d frames: %.2f fps, %.2f kb/s", frameNum, fps, bitrate);
136
-
137
-    fprintf(stderr, "%s  \r", buf + 5);
138
-    SetConsoleTitle(buf);
139
-    fflush(stderr); // needed in windows
140
-    prevUpdateTime = time;
141
-}
142
+#ifdef _WIN32
143
+#define strdup _strdup
144
+#endif
145
 
146
-bool CLIOptions::parseZoneParam(int argc, char **argv, x265_param* globalParam, int zonefileCount)
147
+#ifdef _WIN32
148
+/* Copy of x264 code, which allows for Unicode characters in the command line.
149
+ * Retrieve command line arguments as UTF-8. */
150
+static int get_argv_utf8(int *argc_ptr, char ***argv_ptr)
151
 {
152
-    bool bError = false;
153
-    int bShowHelp = false;
154
-    int outputBitDepth = 0;
155
-    const char *profile = NULL;
156
-
157
-    /* Presets are applied before all other options. */
158
-    for (optind = 0;;)
159
-    {
160
-        int c = getopt_long(argc, argv, short_options, long_options, NULL);
161
-        if (c == -1)
162
-            break;
163
-        else if (c == 'D')
164
-            outputBitDepth = atoi(optarg);
165
-        else if (c == 'P')
166
-            profile = optarg;
167
-        else if (c == '?')
168
-            bShowHelp = true;
169
-    }
170
-
171
-    if (!outputBitDepth && profile)
172
-    {
173
-        /* try to derive the output bit depth from the requested profile */
174
-        if (strstr(profile, "10"))
175
-            outputBitDepth = 10;
176
-        else if (strstr(profile, "12"))
177
-            outputBitDepth = 12;
178
-        else
179
-            outputBitDepth = 8;
180
-    }
181
-
182
-    api = x265_api_get(outputBitDepth);
183
-    if (!api)
184
+    int ret = 0;
185
+    wchar_t **argv_utf16 = CommandLineToArgvW(GetCommandLineW(), argc_ptr);
186
+    if (argv_utf16)
187
     {
188
-        x265_log(NULL, X265_LOG_WARNING, "falling back to default bit-depth\n");
189
-        api = x265_api_get(0);
190
-    }
191
+        int argc = *argc_ptr;
192
+        int offset = (argc + 1) * sizeof(char*);
193
+        int size = offset;
194
 
195
-    if (bShowHelp)
196
-    {
197
-        printVersion(globalParam, api);
198
-        showHelp(globalParam);
199
-    }
200
+        for (int i = 0; i < argc; i++)
201
+            size += WideCharToMultiByte(CP_UTF8, 0, argv_utf16[i], -1, NULL, 0, NULL, NULL);
202
 
203
-    globalParam->rc.zones[zonefileCount].zoneParam = api->param_alloc();
204
-    if (!globalParam->rc.zones[zonefileCount].zoneParam)
205
-    {
206
-        x265_log(NULL, X265_LOG_ERROR, "param alloc failed\n");
207
-        return true;
208
+        char **argv = *argv_ptr = (char**)malloc(size);
209
+        if (argv)
210
+        {
211
+            for (int i = 0; i < argc; i++)
212
+            {
213
+                argv[i] = (char*)argv + offset;
214
+                offset += WideCharToMultiByte(CP_UTF8, 0, argv_utf16[i], -1, argv[i], size - offset, NULL, NULL);
215
+            }
216
+            argv[argc] = NULL;
217
+            ret = 1;
218
+        }
219
+        LocalFree(argv_utf16);
220
     }
221
+    return ret;
222
+}
223
+#endif
224
 
225
-    memcpy(globalParam->rc.zones[zonefileCount].zoneParam, globalParam, sizeof(x265_param));
226
+/* Checks for abr-ladder config file in the command line.
227
+ * Returns true if abr-config file is present. Returns 
228
+ * false otherwise */
229
 
230
+static bool checkAbrLadder(int argc, char **argv, FILE **abrConfig)
231
+{
232
     for (optind = 0;;)
233
     {
234
         int long_options_index = -1;
235
         int c = getopt_long(argc, argv, short_options, long_options, &long_options_index);
236
         if (c == -1)
237
             break;
238
-
239
         if (long_options_index < 0 && c > 0)
240
         {
241
             for (size_t i = 0; i < sizeof(long_options) / sizeof(long_options[0]); i++)
242
@@ -248,593 +112,138 @@
243
                 /* getopt_long might have already printed an error message */
244
                 if (c != 63)
245
                     x265_log(NULL, X265_LOG_WARNING, "internal error: short option '%c' has no long option\n", c);
246
-                return true;
247
+                return false;
248
             }
249
         }
250
         if (long_options_index < 0)
251
         {
252
             x265_log(NULL, X265_LOG_WARNING, "short option '%c' unrecognized\n", c);
253
-            return true;
254
+            return false;
255
         }
256
-
257
-        bError |= !!api->zone_param_parse(globalParam->rc.zones[zonefileCount].zoneParam, long_options[long_options_index].name, optarg);
258
-
259
-        if (bError)
260
+        if (!strcmp(long_options[long_options_index].name, "abr-ladder"))
261
         {
262
-            const char *name = long_options_index > 0 ? long_options[long_options_index].name : argv[optind - 2];
263
-            x265_log(NULL, X265_LOG_ERROR, "invalid argument: %s = %s\n", name, optarg);
264
+            *abrConfig = x265_fopen(optarg, "rb");
265
+            if (!abrConfig)
266
+                x265_log_file(NULL, X265_LOG_ERROR, "%s abr-ladder config file not found or error in opening zone file\n", optarg);
267
             return true;
268
         }
269
     }
270
-
271
-    if (optind < argc)
272
-    {
273
-        x265_log(param, X265_LOG_WARNING, "extra unused command arguments given <%s>\n", argv[optind]);
274
-        return true;
275
-    }
276
     return false;
277
 }
278
 
279
-bool CLIOptions::parse(int argc, char **argv)
280
+static uint8_t getNumAbrEncodes(FILE* abrConfig)
281
 {
282
-    bool bError = false;
283
-    int bShowHelp = false;
284
-    int inputBitDepth = 8;
285
-    int outputBitDepth = 0;
286
-    int reconFileBitDepth = 0;
287
-    const char *inputfn = NULL;
288
-    const char *reconfn = NULL;
289
-    const char *outputfn = NULL;
290
-    const char *preset = NULL;
291
-    const char *tune = NULL;
292
-    const char *profile = NULL;
293
-    int svtEnabled = 0;
294
-
295
-    if (argc <= 1)
296
-    {
297
-        x265_log(NULL, X265_LOG_ERROR, "No input file. Run x265 --help for a list of options.\n");
298
-        return true;
299
-    }
300
-
301
-    /* Presets are applied before all other options. */
302
-    for (optind = 0;; )
303
-    {
304
-        int optionsIndex = -1;
305
-        int c = getopt_long(argc, argv, short_options, long_options, &optionsIndex);
306
-        if (c == -1)
307
-            break;
308
-        else if (c == 'p')
309
-            preset = optarg;
310
-        else if (c == 't')
311
-            tune = optarg;
312
-        else if (c == 'D')
313
-            outputBitDepth = atoi(optarg);
314
-        else if (c == 'P')
315
-            profile = optarg;
316
-        else if (c == '?')
317
-            bShowHelp = true;
318
-        else if (!c && !strcmp(long_options[optionsIndex].name, "svt"))
319
-            svtEnabled = 1;
320
-    }
321
+    char line[1024];
322
+    uint8_t numEncodes = 0;
323
 
324
-    if (!outputBitDepth && profile)
325
+    while (fgets(line, sizeof(line), abrConfig))
326
     {
327
-        /* try to derive the output bit depth from the requested profile */
328
-        if (strstr(profile, "10"))
329
-            outputBitDepth = 10;
330
-        else if (strstr(profile, "12"))
331
-            outputBitDepth = 12;
332
-        else
333
-            outputBitDepth = 8;
334
-    }
335
-
336
-    api = x265_api_get(outputBitDepth);
337
-    if (!api)
338
-    {
339
-        x265_log(NULL, X265_LOG_WARNING, "falling back to default bit-depth\n");
340
-        api = x265_api_get(0);
341
-    }
342
-
343
-    param = api->param_alloc();
344
-    if (!param)
345
-    {
346
-        x265_log(NULL, X265_LOG_ERROR, "param alloc failed\n");
347
-        return true;
348
-    }
349
-#if ENABLE_LIBVMAF
350
-    vmafData = (x265_vmaf_data*)x265_malloc(sizeof(x265_vmaf_data));
351
-    if(!vmafData)
352
-    {
353
-        x265_log(NULL, X265_LOG_ERROR, "vmaf data alloc failed\n");
354
-        return true;
355
-    }
356
-#endif
357
-
358
-    if (api->param_default_preset(param, preset, tune) < 0)
359
-    {
360
-        x265_log(NULL, X265_LOG_ERROR, "preset or tune unrecognized\n");
361
-        return true;
362
-    }
363
-
364
-    if (bShowHelp)
365
-    {
366
-        printVersion(param, api);
367
-        showHelp(param);
368
+        if (strcmp(line, "\n") == 0)
369
+            continue;
370
+        else if (!(*line == '#'))
371
+            numEncodes++;
372
     }
373
+    rewind(abrConfig);
374
+    return numEncodes;
375
+}
376
 
377
-    //Set enable SVT-HEVC encoder first if found in the command line
378
-    if (svtEnabled) api->param_parse(param, "svt", NULL);
379
+static bool parseAbrConfig(FILE* abrConfig, CLIOptions cliopt[], uint8_t numEncodes)
380
+{
381
+    char line[1024];
382
+    char* argLine;
383
 
384
-    for (optind = 0;; )
385
+    for (uint32_t i = 0; i < numEncodes; i++)
386
     {
387
-        int long_options_index = -1;
388
-        int c = getopt_long(argc, argv, short_options, long_options, &long_options_index);
389
-        if (c == -1)
390
-            break;
391
-
392
-        switch (c)
393
+        fgets(line, sizeof(line), abrConfig);
394
+        if (*line == '#' || (strcmp(line, "\r\n") == 0))
395
+            continue;
396
+        int index = (int)strcspn(line, "\r\n");
397
+        line[index] = '\0';
398
+        argLine = line;
399
+        char* start = strchr(argLine, ' ');
400
+        while (isspace((unsigned char)*start)) start++;
401
+        int argc = 0;
402
+        char **argv = (char**)malloc(256 * sizeof(char *));
403
+        // Adding a dummy string to avoid file parsing error
404
+        argv[argc++] = (char *)"x265";
405
+
406
+        /* Parse CLI header to identify the ID of the load encode and the reuse level */
407
+        char *header = strtok(argLine, "[]");
408
+        uint32_t idCount = 0;
409
+        char *id = strtok(header, ":");
410
+        char *head[X265_HEAD_ENTRIES];
411
+        cliopt[i].encId = i;
412
+        cliopt[i].isAbrLadderConfig = true;
413
+
414
+        while (id && (idCount <= X265_HEAD_ENTRIES))
415
         {
416
-        case 'h':
417
-            printVersion(param, api);
418
-            showHelp(param);
419
-            break;
420
-
421
-        case 'V':
422
-            printVersion(param, api);
423
-            x265_report_simd(param);
424
-            exit(0);
425
-
426
-        default:
427
-            if (long_options_index < 0 && c > 0)
428
-            {
429
-                for (size_t i = 0; i < sizeof(long_options) / sizeof(long_options[0]); i++)
430
-                {
431
-                    if (long_options[i].val == c)
432
-                    {
433
-                        long_options_index = (int)i;
434
-                        break;
435
-                    }
436
-                }
437
-
438
-                if (long_options_index < 0)
439
-                {
440
-                    /* getopt_long might have already printed an error message */
441
-                    if (c != 63)
442
-                        x265_log(NULL, X265_LOG_WARNING, "internal error: short option '%c' has no long option\n", c);
443
-                    return true;
444
-                }
445
-            }
446
-            if (long_options_index < 0)
447
-            {
448
-                x265_log(NULL, X265_LOG_WARNING, "short option '%c' unrecognized\n", c);
449
-                return true;
-            }
-#define OPT(longname) \
-    else if (!strcmp(long_options[long_options_index].name, longname))
-#define OPT2(name1, name2) \
-    else if (!strcmp(long_options[long_options_index].name, name1) || \
-             !strcmp(long_options[long_options_index].name, name2))
-
-            if (0) ;
-            OPT2("frame-skip", "seek") this->seek = (uint32_t)x265_atoi(optarg, bError);
-            OPT("frames") this->framesToBeEncoded = (uint32_t)x265_atoi(optarg, bError);
-            OPT("no-progress") this->bProgress = false;
-            OPT("output") outputfn = optarg;
-            OPT("input") inputfn = optarg;
-            OPT("recon") reconfn = optarg;
-            OPT("input-depth") inputBitDepth = (uint32_t)x265_atoi(optarg, bError);
-            OPT("dither") this->bDither = true;
-            OPT("recon-depth") reconFileBitDepth = (uint32_t)x265_atoi(optarg, bError);
-            OPT("y4m") this->bForceY4m = true;
-            OPT("profile") /* handled above */;
-            OPT("preset")  /* handled above */;
-            OPT("tune")    /* handled above */;
-            OPT("output-depth")   /* handled above */;
-            OPT("recon-y4m-exec") reconPlayCmd = optarg;
-            OPT("svt")    /* handled above */;
-            OPT("qpfile")
-            {
-                this->qpfile = x265_fopen(optarg, "rb");
-                if (!this->qpfile)
-                    x265_log_file(param, X265_LOG_ERROR, "%s qpfile not found or error in opening qp file\n", optarg);
-            }
-            OPT("dolby-vision-rpu")
-            {
-                this->dolbyVisionRpu = x265_fopen(optarg, "rb");
-                if (!this->dolbyVisionRpu)
-                {
-                    x265_log_file(param, X265_LOG_ERROR, "Dolby Vision RPU metadata file %s not found or error in opening file\n", optarg);
-                    return true;
-                }
-            }
-            OPT("zonefile")
-            {
-                this->zoneFile = x265_fopen(optarg, "rb");
-                if (!this->zoneFile)
-                    x265_log_file(param, X265_LOG_ERROR, "%s zone file not found or error in opening zone file\n", optarg);
-            }
-            OPT("fullhelp")
-            {
-                param->logLevel = X265_LOG_FULL;
-                printVersion(param, api);
-                showHelp(param);
-                break;
-            }
-            else
-                bError |= !!api->param_parse(param, long_options[long_options_index].name, optarg);
-            if (bError)
-            {
-                const char *name = long_options_index > 0 ? long_options[long_options_index].name : argv[optind - 2];
-                x265_log(NULL, X265_LOG_ERROR, "invalid argument: %s = %s\n", name, optarg);
-                return true;
-            }
-#undef OPT
+            head[idCount] = id;
+            id = strtok(NULL, ":");
+            idCount++;
         }
-    }
-
-    if (optind < argc && !inputfn)
-        inputfn = argv[optind++];
-    if (optind < argc && !outputfn)
-        outputfn = argv[optind++];
-    if (optind < argc)
-    {
-        x265_log(param, X265_LOG_WARNING, "extra unused command arguments given <%s>\n", argv[optind]);
-        return true;
-    }
-
-    if (argc <= 1)
-    {
-        api->param_default(param);
-        printVersion(param, api);
-        showHelp(param);
-    }
-
-    if (!inputfn || !outputfn)
-    {
-        x265_log(param, X265_LOG_ERROR, "input or output file not specified, try --help for help\n");
-        return true;
-    }
-
-    if (param->internalBitDepth != api->bit_depth)
-    {
-        x265_log(param, X265_LOG_ERROR, "Only bit depths of %d are supported in this build\n", api->bit_depth);
-        return true;
-    }
-
-#ifdef SVT_HEVC
-    if (svtEnabled)
-    {
-        EB_H265_ENC_CONFIGURATION* svtParam = (EB_H265_ENC_CONFIGURATION*)param->svtHevcParam;
-        param->sourceWidth = svtParam->sourceWidth;
-        param->sourceHeight = svtParam->sourceHeight;
-        param->fpsNum = svtParam->frameRateNumerator;
-        param->fpsDenom = svtParam->frameRateDenominator;
-        svtParam->encoderBitDepth = inputBitDepth;
-    }
-#endif
-
-    InputFileInfo info;
-    info.filename = inputfn;
-    info.depth = inputBitDepth;
-    info.csp = param->internalCsp;
-    info.width = param->sourceWidth;
-    info.height = param->sourceHeight;
-    info.fpsNum = param->fpsNum;
-    info.fpsDenom = param->fpsDenom;
-    info.sarWidth = param->vui.sarWidth;
-    info.sarHeight = param->vui.sarHeight;
-    info.skipFrames = seek;
-    info.frameCount = 0;
-    getParamAspectRatio(param, info.sarWidth, info.sarHeight);
-
-
-    this->input = InputFile::open(info, this->bForceY4m);
-    if (!this->input || this->input->isFail())
-    {
-        x265_log_file(param, X265_LOG_ERROR, "unable to open input file <%s>\n", inputfn);
-        return true;
-    }
-
-    if (info.depth < 8 || info.depth > 16)
-    {
-        x265_log(param, X265_LOG_ERROR, "Input bit depth (%d) must be between 8 and 16\n", inputBitDepth);
-        return true;
-    }
-
-    /* Unconditionally accept height/width/csp/bitDepth from file info */
-    param->sourceWidth = info.width;
-    param->sourceHeight = info.height;
-    param->internalCsp = info.csp;
-    param->sourceBitDepth = info.depth;
-
-    /* Accept fps and sar from file info if not specified by user */
-    if (param->fpsDenom == 0 || param->fpsNum == 0)
-    {
-        param->fpsDenom = info.fpsDenom;
-        param->fpsNum = info.fpsNum;
-    }
-    if (!param->vui.aspectRatioIdc && info.sarWidth && info.sarHeight)
-        setParamAspectRatio(param, info.sarWidth, info.sarHeight);
-    if (this->framesToBeEncoded == 0 && info.frameCount > (int)seek)
-        this->framesToBeEncoded = info.frameCount - seek;
-    param->totalFrames = this->framesToBeEncoded;
-
-#ifdef SVT_HEVC
-    if (svtEnabled)
-    {
-        EB_H265_ENC_CONFIGURATION* svtParam = (EB_H265_ENC_CONFIGURATION*)param->svtHevcParam;
-        svtParam->sourceWidth = param->sourceWidth;
-        svtParam->sourceHeight = param->sourceHeight;
-        svtParam->frameRateNumerator = param->fpsNum;
-        svtParam->frameRateDenominator = param->fpsDenom;
-        svtParam->framesToBeEncoded = param->totalFrames;
-       svtParam->encoderColorFormat = (EB_COLOR_FORMAT)param->internalCsp;
-    }
-#endif
-
-    /* Force CFR until we have support for VFR */
-    info.timebaseNum = param->fpsDenom;
-    info.timebaseDenom = param->fpsNum;
-
-    if (param->bField && param->interlaceMode)
-    {   // Field FPS
-        param->fpsNum *= 2;
-        // Field height
-        param->sourceHeight = param->sourceHeight >> 1;
-        // Number of fields to encode
-        param->totalFrames *= 2;
-    }
-
-    if (api->param_apply_profile(param, profile))
-        return true;
-
-    if (param->logLevel >= X265_LOG_INFO)
-    {
-        char buf[128];
-        int p = sprintf(buf, "%dx%d fps %d/%d %sp%d", param->sourceWidth, param->sourceHeight,
-                        param->fpsNum, param->fpsDenom, x265_source_csp_names[param->internalCsp], info.depth);
-
-        int width, height;
-        getParamAspectRatio(param, width, height);
-        if (width && height)
-            p += sprintf(buf + p, " sar %d:%d", width, height);
-
-        if (framesToBeEncoded <= 0 || info.frameCount <= 0)
-            strcpy(buf + p, " unknown frame count");
-        else
-            sprintf(buf + p, " frames %u - %d of %d", this->seek, this->seek + this->framesToBeEncoded - 1, info.frameCount);
-
-        general_log(param, input->getName(), X265_LOG_INFO, "%s\n", buf);
-    }
-
-    this->input->startReader();
-
-    if (reconfn)
-    {
-        if (reconFileBitDepth == 0)
-            reconFileBitDepth = param->internalBitDepth;
-        this->recon = ReconFile::open(reconfn, param->sourceWidth, param->sourceHeight, reconFileBitDepth,
-                                      param->fpsNum, param->fpsDenom, param->internalCsp);
-        if (this->recon->isFail())
+        if (idCount != X265_HEAD_ENTRIES)
         {
-            x265_log(param, X265_LOG_WARNING, "unable to write reconstructed outputs file\n");
-            this->recon->release();
-            this->recon = 0;
+            x265_log(NULL, X265_LOG_ERROR, "Incorrect number of arguments in ABR CLI header at line %d\n", i);
+            return false;
         }
         else
-            general_log(param, this->recon->getName(), X265_LOG_INFO,
-                    "reconstructed images %dx%d fps %d/%d %s\n",
-                    param->sourceWidth, param->sourceHeight, param->fpsNum, param->fpsDenom,
-                    x265_source_csp_names[param->internalCsp]);
-    }
-#if ENABLE_LIBVMAF
-    if (!reconfn)
-    {
-        x265_log(param, X265_LOG_ERROR, "recon file must be specified to get VMAF score, try --help for help\n");
-        return true;
-    }
-    const char *str = strrchr(info.filename, '.');
-
-    if (!strcmp(str, ".y4m"))
-    {
-        x265_log(param, X265_LOG_ERROR, "VMAF supports YUV file format only.\n");
-        return true;
-    }
-    if(param->internalCsp == X265_CSP_I420 || param->internalCsp == X265_CSP_I422 || param->internalCsp == X265_CSP_I444)
-    {
-        vmafData->reference_file = x265_fopen(inputfn, "rb");
-        vmafData->distorted_file = x265_fopen(reconfn, "rb");
-    }
-    else
-    {
-        x265_log(param, X265_LOG_ERROR, "VMAF will support only yuv420p, yu422p, yu444p, yuv420p10le, yuv422p10le, yuv444p10le formats.\n");
-        return true;
-    }
-#endif
-    this->output = OutputFile::open(outputfn, info);
-    if (this->output->isFail())
-    {
-        x265_log_file(param, X265_LOG_ERROR, "failed to open output file <%s> for writing\n", outputfn);
-        return true;
-    }
-    general_log_file(param, this->output->getName(), X265_LOG_INFO, "output file: %s\n", outputfn);
-    return false;
-}
-
-bool CLIOptions::parseQPFile(x265_picture &pic_org)
-{
-    int32_t num = -1, qp, ret;
-    char type;
-    uint32_t filePos;
-    pic_org.forceqp = 0;
-    pic_org.sliceType = X265_TYPE_AUTO;
-    while (num < pic_org.poc)
-    {
-        filePos = ftell(qpfile);
-        qp = -1;
-        ret = fscanf(qpfile, "%d %c%*[ \t]%d\n", &num, &type, &qp);
-
-        if (num > pic_org.poc || ret == EOF)
         {
-            fseek(qpfile, filePos, SEEK_SET);
-            break;
+            cliopt[i].encName = strdup(head[0]);
+            cliopt[i].loadLevel = atoi(head[1]);
+            cliopt[i].reuseName = strdup(head[2]);
         }
-        if (num < pic_org.poc && ret >= 2)
-            continue;
-        if (ret == 3 && qp >= 0)
-            pic_org.forceqp = qp + 1;
-        if (type == 'I') pic_org.sliceType = X265_TYPE_IDR;
-        else if (type == 'i') pic_org.sliceType = X265_TYPE_I;
-        else if (type == 'K') pic_org.sliceType = param->bOpenGOP ? X265_TYPE_I : X265_TYPE_IDR;
-        else if (type == 'P') pic_org.sliceType = X265_TYPE_P;
-        else if (type == 'B') pic_org.sliceType = X265_TYPE_BREF;
-        else if (type == 'b') pic_org.sliceType = X265_TYPE_B;
-        else ret = 0;
-        if (ret < 2 || qp < -1 || qp > 51)
-            return 0;
-    }
-    return 1;
-}
 
-bool CLIOptions::parseZoneFile()
-{
-    char line[256];
-    char* argLine;
-    param->rc.zonefileCount = 0;
-
-    while (fgets(line, sizeof(line), zoneFile))
-    {
-        if (!((*line == '#') || (strcmp(line, "\r\n") == 0)))
-            param->rc.zonefileCount++;
-    }
-
-    rewind(zoneFile);
-    param->rc.zones = X265_MALLOC(x265_zone, param->rc.zonefileCount);
-    for (int i = 0; i < param->rc.zonefileCount; i++)
-    {
-        while (fgets(line, sizeof(line), zoneFile))
+        char* token = strtok(start, " ");
+        while (token)
         {
-            if (*line == '#' || (strcmp(line, "\r\n") == 0))
-                continue;
-            param->rc.zones[i].zoneParam = X265_MALLOC(x265_param, 1);
-            int index = (int)strcspn(line, "\r\n");
-            line[index] = '\0';
-            argLine = line;
-            while (isspace((unsigned char)*argLine)) argLine++;
-            char* start = strchr(argLine, ' ');
-            start++;
-            param->rc.zones[i].startFrame = atoi(argLine);
-            int argCount = 0;
-            char **args = (char**)malloc(256 * sizeof(char *));
-            // Adding a dummy string to avoid file parsing error
-            args[argCount++] = (char *)"x265";
-            char* token = strtok(start, " ");
-            while (token)
-            {
-                args[argCount++] = token;
-                token = strtok(NULL, " ");
-            }
-            args[argCount] = NULL;
-            CLIOptions cliopt;
-            if (cliopt.parseZoneParam(argCount, args,param, i))
-            {
-                cliopt.destroy();
-                if (cliopt.api)
-                    cliopt.api->param_free(cliopt.param);
-                exit(1);
-            }
-            break;
+            argv[argc++] = strdup(token);
+            token = strtok(NULL, " ");
         }
-    }
-    return 1;
-}
-
-#ifdef _WIN32
-/* Copy of x264 code, which allows for Unicode characters in the command line.
- * Retrieve command line arguments as UTF-8. */
-static int get_argv_utf8(int *argc_ptr, char ***argv_ptr)
-{
-    int ret = 0;
-    wchar_t **argv_utf16 = CommandLineToArgvW(GetCommandLineW(), argc_ptr);
-    if (argv_utf16)
-    {
-        int argc = *argc_ptr;
-        int offset = (argc + 1) * sizeof(char*);
-        int size = offset;
-
-        for (int i = 0; i < argc; i++)
-            size += WideCharToMultiByte(CP_UTF8, 0, argv_utf16[i], -1, NULL, 0, NULL, NULL);
-
-        char **argv = *argv_ptr = (char**)malloc(size);
-        if (argv)
+        argv[argc] = NULL;
+        if (cliopt[i].parse(argc++, argv))
         {
-            for (int i = 0; i < argc; i++)
-            {
-                argv[i] = (char*)argv + offset;
-                offset += WideCharToMultiByte(CP_UTF8, 0, argv_utf16[i], -1, argv[i], size - offset, NULL, NULL);
-            }
-            argv[argc] = NULL;
-            ret = 1;
+            cliopt[i].destroy();
+            if (cliopt[i].api)
+                cliopt[i].api->param_free(cliopt[i].param);
+            exit(1);
         }
-        LocalFree(argv_utf16);
     }
-    return ret;
+    return true;
 }
-#endif
 
-/* Parse the RPU file and extract the RPU corresponding to the current picture
- * and fill the rpu field of the input picture */
-static int rpuParser(x265_picture * pic, FILE * ptr)
+static bool setRefContext(CLIOptions cliopt[], uint32_t numEncodes)
 {
-    uint8_t byteVal;
-    uint32_t code = 0;
-    int bytesRead = 0;
-    pic->rpu.payloadSize = 0;
+    bool hasRef = false;
+    bool isRefFound = false;
 
-    if (!pic->pts)
+    /* Identify reference encode IDs and set save/load reuse levels */
+    for (uint32_t curEnc = 0; curEnc < numEncodes; curEnc++)
     {
-        while (bytesRead++ < 4 && fread(&byteVal, sizeof(uint8_t), 1, ptr))
-            code = (code << 8) | byteVal;
-
-        if (code != START_CODE)
+        isRefFound = false;
+        hasRef = !strcmp(cliopt[curEnc].reuseName, "nil") ? false : true;
+        if (hasRef)
         {
-            x265_log(NULL, X265_LOG_ERROR, "Invalid Dolby Vision RPU startcode in POC %d\n", pic->pts);
-            return 1;
-        }
-    }
-
-    bytesRead = 0;
-    while (fread(&byteVal, sizeof(uint8_t), 1, ptr))
-    {
-        code = (code << 8) | byteVal;
-        if (bytesRead++ < 3)
-            continue;
-        if (bytesRead >= 1024)
-        {
-            x265_log(NULL, X265_LOG_ERROR, "Invalid Dolby Vision RPU size in POC %d\n", pic->pts);
-            return 1;
+            for (uint32_t refEnc = 0; refEnc < numEncodes; refEnc++)
+            {
+                if (!strcmp(cliopt[curEnc].reuseName, cliopt[refEnc].encName))
+                {
+                    cliopt[curEnc].refId = refEnc;
+                    cliopt[refEnc].numRefs++;
+                    cliopt[refEnc].saveLevel = X265_MAX(cliopt[refEnc].saveLevel, cliopt[curEnc].loadLevel);
+                    isRefFound = true;
+                    break;
+                }
+            }
+            if (!isRefFound)
+            {
+                x265_log(NULL, X265_LOG_ERROR, "Reference encode (%s) not found for %s\n", cliopt[curEnc].reuseName,
+                    cliopt[curEnc].encName);
+                return false;
+            }
         }
-
-        if (code != START_CODE)
-            pic->rpu.payload[pic->rpu.payloadSize++] = (code >> (3 * 8)) & 0xFF;
-        else
-            return 0;
     }
-
-    int ShiftBytes = START_CODE_BYTES - (bytesRead - pic->rpu.payloadSize);
-    int bytesLeft = bytesRead - pic->rpu.payloadSize;
-    code = (code << ShiftBytes * 8);
-    for (int i = 0; i < bytesLeft; i++)
-    {
-        pic->rpu.payload[pic->rpu.payloadSize++] = (code >> (3 * 8)) & 0xFF;
-        code = (code << 8);
-    }
-    if (!pic->rpu.payloadSize)
-        x265_log(NULL, X265_LOG_WARNING, "Dolby Vision RPU not found for POC %d\n", pic->pts);
-    return 0;
+    return true;
 }
-
-
 /* CLI return codes:
  *
  * 0 - encode successful
@@ -859,354 +268,57 @@
     get_argv_utf8(&argc, &argv);
 #endif
 
-    ReconPlay* reconPlay = NULL;
-    CLIOptions cliopt;
+    uint8_t numEncodes = 1;
+    FILE *abrConfig = NULL;
+    bool isAbrLadder = checkAbrLadder(argc, argv, &abrConfig);
 
-    if (cliopt.parse(argc, argv))
-    {
-        cliopt.destroy();
-        if (cliopt.api)
-            cliopt.api->param_free(cliopt.param);
-        exit(1);
-    }
+    if (isAbrLadder)
+        numEncodes = getNumAbrEncodes(abrConfig);
 
-    x265_param* param = cliopt.param;
-    const x265_api* api = cliopt.api;
-#if ENABLE_LIBVMAF
-    x265_vmaf_data* vmafdata = cliopt.vmafData;
-#endif
-    /* This allows muxers to modify bitstream format */
-    cliopt.output->setParam(param);
+    CLIOptions* cliopt = new CLIOptions[numEncodes];
 
-    if (cliopt.reconPlayCmd)
-        reconPlay = new ReconPlay(cliopt.reconPlayCmd, *param);
-
-    if (cliopt.zoneFile)
+    if (isAbrLadder)
     {
-        if (!cliopt.parseZoneFile())
-        {
-            x265_log(NULL, X265_LOG_ERROR, "Unable to parse zonefile\n");
-            fclose(cliopt.zoneFile);
-            cliopt.zoneFile = NULL;
-        }
+        if (!parseAbrConfig(abrConfig, cliopt, numEncodes))
+            exit(1);
+        if (!setRefContext(cliopt, numEncodes))
+            exit(1);
     }
-
-    /* note: we could try to acquire a different libx265 API here based on
-     * the profile found during option parsing, but it must be done before
-     * opening an encoder */
-
-    x265_encoder *encoder = api->encoder_open(param);
-    if (!encoder)
+    else if (cliopt[0].parse(argc, argv))
     {
-        x265_log(param, X265_LOG_ERROR, "failed to open encoder\n");
-        cliopt.destroy();
-        api->param_free(param);
-        api->cleanup();
-        exit(2);
+        cliopt[0].destroy();
+        if (cliopt[0].api)
+            cliopt[0].api->param_free(cliopt[0].param);
+        exit(1);
     }
 
-    /* get the encoder parameters post-initialization */
-    api->encoder_parameters(encoder, param);
-
-     /* Control-C handler */
-    if (signal(SIGINT, sigint_handler) == SIG_ERR)
-        x265_log(param, X265_LOG_ERROR, "Unable to register CTRL+C handler: %s\n", strerror(errno));
-
-    x265_picture pic_orig, pic_out;
-    x265_picture *pic_in = &pic_orig;
-    /* Allocate recon picture if analysis save/load is enabled */
-    std::priority_queue<int64_t>* pts_queue = cliopt.output->needPTS() ? new std::priority_queue<int64_t>() : NULL;
-    x265_picture *pic_recon = (cliopt.recon || param->analysisSave || param->analysisLoad || pts_queue || reconPlay || param->csvLogLevel) ? &pic_out : NULL;
-    uint32_t inFrameCount = 0;
-    uint32_t outFrameCount = 0;
-    x265_nal *p_nal;
-    x265_stats stats;
-    uint32_t nal;
-    int16_t *errorBuf = NULL;
-    bool bDolbyVisionRPU = false;
-    uint8_t *rpuPayload = NULL;
     int ret = 0;
-    int inputPicNum = 1;
-    x265_picture picField1, picField2;
-
-    if (!param->bRepeatHeaders && !param->bEnableSvtHevc)
-    {
-        if (api->encoder_headers(encoder, &p_nal, &nal) < 0)
-        {
-            x265_log(param, X265_LOG_ERROR, "Failure generating stream headers\n");
-            ret = 3;
-            goto fail;
-        }
-        else
-            cliopt.totalbytes += cliopt.output->writeHeaders(p_nal, nal);
-    }
-
-    if (param->bField && param->interlaceMode)
-    {
-        api->picture_init(param, &picField1);
-        api->picture_init(param, &picField2);
-        // return back the original height of input
-        param->sourceHeight *= 2;
-        api->picture_init(param, pic_in);
-    }
-    else
-        api->picture_init(param, pic_in);
-
-    if (param->dolbyProfile && cliopt.dolbyVisionRpu)
-    {
-        rpuPayload = X265_MALLOC(uint8_t, 1024);
-        pic_in->rpu.payload = rpuPayload;
-        if (pic_in->rpu.payload)
-            bDolbyVisionRPU = true;
-    }
-
-    if (cliopt.bDither)
-    {
-        errorBuf = X265_MALLOC(int16_t, param->sourceWidth + 1);
-        if (errorBuf)
-            memset(errorBuf, 0, (param->sourceWidth + 1) * sizeof(int16_t));
-        else
-            cliopt.bDither = false;
-    }
-
-    // main encoder loop
-    while (pic_in && !b_ctrl_c)
-    {
-        pic_orig.poc = (param->bField && param->interlaceMode) ? inFrameCount * 2 : inFrameCount;
-        if (cliopt.qpfile)
-        {
-            if (!cliopt.parseQPFile(pic_orig))
-            {
-                x265_log(NULL, X265_LOG_ERROR, "can't parse qpfile for frame %d\n", pic_in->poc);
-                fclose(cliopt.qpfile);
-                cliopt.qpfile = NULL;
-            }
-        }
-
-        if (cliopt.framesToBeEncoded && inFrameCount >= cliopt.framesToBeEncoded)
-            pic_in = NULL;
-        else if (cliopt.input->readPicture(pic_orig))
-            inFrameCount++;
-        else
-            pic_in = NULL;
-
-        if (pic_in)
-        {
-            if (pic_in->bitDepth > param->internalBitDepth && cliopt.bDither)
-            {
-                x265_dither_image(pic_in, cliopt.input->getWidth(), cliopt.input->getHeight(), errorBuf, param->internalBitDepth);
-                pic_in->bitDepth = param->internalBitDepth;
-            }
-            /* Overwrite PTS */
-            pic_in->pts = pic_in->poc;
-
-            // convert to field
-            if (param->bField && param->interlaceMode)
-            {
-                int height = pic_in->height >> 1;
-
-                int static bCreated = 0;
-                if (bCreated == 0)
-                {
-                    bCreated = 1;
-                    inputPicNum = 2;
-                    picField1.fieldNum = 1;
-                    picField2.fieldNum = 2;
-
-                    picField1.bitDepth = picField2.bitDepth = pic_in->bitDepth;
-                    picField1.colorSpace = picField2.colorSpace = pic_in->colorSpace;
-                    picField1.height = picField2.height = pic_in->height >> 1;
-                    picField1.framesize = picField2.framesize = pic_in->framesize >> 1;
-
-                    size_t fieldFrameSize = (size_t)pic_in->framesize >> 1;
-                    char* field1Buf = X265_MALLOC(char, fieldFrameSize);
-                    char* field2Buf = X265_MALLOC(char, fieldFrameSize);
-
-                    int stride = picField1.stride[0] = picField2.stride[0] = pic_in->stride[0];
-                    uint64_t framesize = stride * (height >> x265_cli_csps[pic_in->colorSpace].height[0]);
-                    picField1.planes[0] = field1Buf;
-                    picField2.planes[0] = field2Buf;
-                    for (int i = 1; i < x265_cli_csps[pic_in->colorSpace].planes; i++)
-                    {
-                        picField1.planes[i] = field1Buf + framesize;
-                        picField2.planes[i] = field2Buf + framesize;
-
-                        stride = picField1.stride[i] = picField2.stride[i] = pic_in->stride[i];
-                        framesize += (stride * (height >> x265_cli_csps[pic_in->colorSpace].height[i]));
-                    }
-                    assert(framesize  == picField1.framesize);
-                }
-
-                picField1.pts = picField1.poc = pic_in->poc;
-                picField2.pts = picField2.poc = pic_in->poc + 1;
-
-                picField1.userSEI = picField2.userSEI = pic_in->userSEI;
-
-                //if (pic_in->userData)
-                //{
-                //    // Have to handle userData here
-                //}
-
-                if (pic_in->framesize)
-                {
-                    for (int i = 0; i < x265_cli_csps[pic_in->colorSpace].planes; i++)
-                    {
-                        char* srcP1 = (char*)pic_in->planes[i];
-                        char* srcP2 = (char*)pic_in->planes[i] + pic_in->stride[i];
-                        char* p1 = (char*)picField1.planes[i];
-                        char* p2 = (char*)picField2.planes[i];
 
-                        int stride = picField1.stride[i];
-
-                        for (int y = 0; y < (height >> x265_cli_csps[pic_in->colorSpace].height[i]); y++)
-                        {
-                            memcpy(p1, srcP1, stride);
-                            memcpy(p2, srcP2, stride);
-                            srcP1 += 2*stride;
-                            srcP2 += 2*stride;
-                            p1 += stride;
-                            p2 += stride;
-                        }
-                    }
-                }
-            }
-
-            if (bDolbyVisionRPU)
-            {
-                if (param->bField && param->interlaceMode)
-                {
-                    if (rpuParser(&picField1, cliopt.dolbyVisionRpu) > 0)
-                        goto fail;
-                    if (rpuParser(&picField2, cliopt.dolbyVisionRpu) > 0)
-                        goto fail;
-                }
-                else
-                {
-                    if (rpuParser(pic_in, cliopt.dolbyVisionRpu) > 0)
-                        goto fail;
-                }
-            }
-        }
-
-        for (int inputNum = 0; inputNum < inputPicNum; inputNum++)
-        {
-            x265_picture *picInput = NULL;
-            if (inputPicNum == 2)
-                picInput = pic_in ? (inputNum ? &picField2 : &picField1) : NULL;
-            else
-                picInput = pic_in;
-
-            int numEncoded = api->encoder_encode( encoder, &p_nal, &nal, picInput, pic_recon );
-            if( numEncoded < 0 )
-            {
-                b_ctrl_c = 1;
-                ret = 4;
-                break;
-            }
-
-            if (reconPlay && numEncoded)
-                reconPlay->writePicture(*pic_recon);
-
-            outFrameCount += numEncoded;
-
-            if (numEncoded && pic_recon && cliopt.recon)
-                cliopt.recon->writePicture(pic_out);
-            if (nal)
-            {
-                cliopt.totalbytes += cliopt.output->writeFrame(p_nal, nal, pic_out);
-                if (pts_queue)
-                {
-                    pts_queue->push(-pic_out.pts);
-                    if (pts_queue->size() > 2)
-                        pts_queue->pop();
-                }
-            }
-            cliopt.printStatus( outFrameCount );
-        }
-    }
-
-    /* Flush the encoder */
-    while (!b_ctrl_c)
+    AbrEncoder* abrEnc = new AbrEncoder(cliopt, numEncodes, ret);
+    int threadsActive = abrEnc->m_numActiveEncodes.get();
+    while (threadsActive)
     {
-        int numEncoded = api->encoder_encode(encoder, &p_nal, &nal, NULL, pic_recon);
-        if (numEncoded < 0)
-        {
-            ret = 4;
-            break;
-        }
-
-        if (reconPlay && numEncoded)
-            reconPlay->writePicture(*pic_recon);
-
-        outFrameCount += numEncoded;
-        if (numEncoded && pic_recon && cliopt.recon)
-            cliopt.recon->writePicture(pic_out);
-        if (nal)
+        threadsActive = abrEnc->m_numActiveEncodes.waitForChange(threadsActive);
+        for (uint8_t idx = 0; idx < numEncodes; idx++)
         {
-            cliopt.totalbytes += cliopt.output->writeFrame(p_nal, nal, pic_out);
-            if (pts_queue)
+            if (abrEnc->m_passEnc[idx]->m_ret)
             {
-                pts_queue->push(-pic_out.pts);
-                if (pts_queue->size() > 2)
-                    pts_queue->pop();
-            }
+                if (isAbrLadder)
+                    x265_log(NULL, X265_LOG_INFO, "Error generating ABR-ladder \n");
+                ret = abrEnc->m_passEnc[idx]->m_ret;
+                threadsActive = 0;
+                break;
+            }
         }
-
-        cliopt.printStatus(outFrameCount);
-
-        if (!numEncoded)
-            break;
-    }
-
-    if (bDolbyVisionRPU)
-    {
-        if(fgetc(cliopt.dolbyVisionRpu) != EOF)
-            x265_log(NULL, X265_LOG_WARNING, "Dolby Vision RPU count is greater than frame count\n");
-        x265_log(NULL, X265_LOG_INFO, "VES muxing with Dolby Vision RPU file successful\n");
     }
 
-    /* clear progress report */
-    if (cliopt.bProgress)
-        fprintf(stderr, "%*s\r", 80, " ");
-
-fail:
-
-    delete reconPlay;
-
-    api->encoder_get_stats(encoder, &stats, sizeof(stats));
-    if (param->csvfn && !b_ctrl_c)
-#if ENABLE_LIBVMAF
-        api->vmaf_encoder_log(encoder, argc, argv, param, vmafdata);
-#else
-        api->encoder_log(encoder, argc, argv);
-#endif
-    api->encoder_close(encoder);
-
-    int64_t second_largest_pts = 0;
-    int64_t largest_pts = 0;
-    if (pts_queue && pts_queue->size() >= 2)
-    {
-        second_largest_pts = -pts_queue->top();
-        pts_queue->pop();
-        largest_pts = -pts_queue->top();
-        pts_queue->pop();
-        delete pts_queue;
-        pts_queue = NULL;
-    }
-    cliopt.output->closeFile(largest_pts, second_largest_pts);
-
-    if (b_ctrl_c)
-        general_log(param, NULL, X265_LOG_INFO, "aborted at input frame %d, output frame %d\n",
-                    cliopt.seek + inFrameCount, stats.encodedPictureCount);
-
-    api->cleanup(); /* Free library singletons */
-
-    cliopt.destroy();
+    abrEnc->destroy();
+    delete abrEnc;
 
-    api->param_free(param);
+    for (uint8_t idx = 0; idx < numEncodes; idx++)
+        cliopt[idx].destroy();
 
-    X265_FREE(errorBuf);
-    X265_FREE(rpuPayload);
+    delete[] cliopt;
 
     SetConsoleTitle(orgConsoleTitle);
     SetThreadExecutionState(ES_CONTINUOUS);
1314
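Review note: the lines removed from x265.cpp above tracked the two largest output PTS values by pushing negated timestamps into a queue and popping whenever more than two entries were held. The following is a minimal standalone sketch of that idea, assuming the queue is a `std::priority_queue<int64_t>` (the declaration is not visible in this hunk). The helper name `twoLargestPts` is hypothetical and is not part of x265.

```cpp
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper, not an x265 function: returns {largest, secondLargest}
// of the given PTS values, or {0, 0} when fewer than two values were seen
// (mirroring the zero defaults in the removed code).
std::pair<int64_t, int64_t> twoLargestPts(const std::vector<int64_t>& ptsValues)
{
    // std::priority_queue is a max-heap. Storing negated values means top()
    // is the negation of the SMALLEST PTS currently held.
    std::priority_queue<int64_t> queue;
    for (int64_t pts : ptsValues)
    {
        queue.push(-pts);
        if (queue.size() > 2)
            queue.pop(); // discard the smallest PTS, keeping the two largest
    }

    int64_t secondLargest = 0;
    int64_t largest = 0;
    if (queue.size() >= 2)
    {
        secondLargest = -queue.top(); // smaller of the two survivors
        queue.pop();
        largest = -queue.top();
        queue.pop();
    }
    return std::make_pair(largest, secondLargest);
}
```

Calling `twoLargestPts({3, 10, 7, 1})` yields the pair (10, 7), which corresponds to the two values the removed code handed to `closeFile`. In 3.4 this logic no longer lives in this part of x265.cpp; where it moved is not shown in this hunk.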
x265_3.3.tar.gz/source/x265.h -> x265_3.4.tar.gz/source/x265.h Changed
@@ -134,6 +134,7 @@
     int     ctuDistortionRefine;
     int     rightOffset;
     int     bottomOffset;
+    int     frameDuplication;
 }x265_analysis_validate;
 
 /* Stores intra analysis data for a single frame. This struct needs better packing */
@@ -304,6 +305,7 @@
     double           totalFrameTime;
     double           vmafFrameScore;
     double           bufferFillFinal;
+    double           unclippedBufferFillFinal;
 } x265_frame_stats;
 
 typedef struct x265_ctu_info_t
@@ -1255,9 +1257,9 @@
      * skip blocks. Default is disabled */
     int       bEnableEarlySkip;
 
-    /* Enable early CU size decisions to avoid recursing to higher depths. 
+    /* Enable early CU size decisions to avoid recursing to higher depths.
      * Default is enabled */
-    int bEnableRecursionSkip;
+    int       recursionSkipMode;
 
     /* Use a faster search method to find the best intra mode. Default is 0 */
     int       bEnableFastIntra;
@@ -1857,7 +1859,7 @@
     double    edgeTransitionThreshold;
 
     /* Enables histogram based scenecut detection algorithm to detect scenecuts. Default disabled */
-    int      bHistBasedSceneCut;
+    int       bHistBasedSceneCut;
 
     /* Enable HME search ranges for L0, L1 and L2 respectively. */
     int       hmeRange[3];
@@ -1874,7 +1876,7 @@
     * analysis information stored in analysis-save. Higher the refine level higher
     * the information stored. Default is 5 */
     int       analysisSaveReuseLevel;
-    
+
     /* A value between 1 and 10 (both inclusive) determines the level of
     * analysis information reused in analysis-load. Higher the refine level higher
     * the information reused. Default is 5 */
@@ -1901,6 +1903,12 @@
     * info is available from the corresponding analysis-save. */
 
     int      confWinBottomOffset;
+
+    /* Edge variance threshold for quad tree establishment. */
+    float    edgeVarThreshold;
+
+    /* Maxrate that could be signaled to the decoder. Default 0. API only. */
+    int      decoderVbvMaxRate;
 } x265_param;
 
 /* x265_param_alloc:
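Review note: this hunk replaces the boolean `bEnableRecursionSkip` with the integer `recursionSkipMode` and adds `edgeVarThreshold`. The changelog describes `--rskip-edge-threshold` as a minimum edge-density percentage below which recursion is skipped, and the new help text prints `edgeVarThreshold*100.0f`, which suggests the field holds a fraction. The sketch below illustrates that reading only. The struct `RskipSettings` and both functions are hypothetical stand-ins, and the real decision logic inside the encoder is not shown in this diff.

```cpp
// Hypothetical stand-in for the two x265_param fields touched by this hunk.
struct RskipSettings
{
    int   recursionSkipMode; // 0: disabled, 1: rdcost and CU homogeneity, 2: CU edge density
    float edgeVarThreshold;  // assumed to be a fraction in [0, 1]
};

// Assumption: the CLI percentage (integer in [0, 100]) maps to a fraction.
float percentToFraction(int percent)
{
    if (percent < 0)
        percent = 0;
    if (percent > 100)
        percent = 100;
    return percent / 100.0f;
}

// Illustrative decision only: in mode 2, skip further CU depth recursion
// when the measured edge density falls below the configured threshold.
bool skipRecursionByEdgeDensity(const RskipSettings& settings, float cuEdgeDensity)
{
    return settings.recursionSkipMode == 2 &&
           cuEdgeDensity < settings.edgeVarThreshold;
}
```

With a threshold of 5 percent, a CU whose edge density is 0.02 would be skipped under this sketch, while one at 0.10 would continue to recurse. Modes 0 and 1 never take this path here.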
x265_3.4.tar.gz/source/x265cli.cpp Added
@@ -0,0 +1,1062 @@
+/*****************************************************************************
+ * Copyright (C) 2013-2020 MulticoreWare, Inc
+ *
+ * Authors: Steve Borho <steve@borho.org>
+ *          Min Chen <chenm003@163.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02111, USA.
+ *
+ * This program is also available under a commercial proprietary license.
+ * For more information, contact us at license @ x265.com.
+ *****************************************************************************/
+#if _MSC_VER
+#pragma warning(disable: 4127) // conditional expression is constant, yes I know
+#endif
+
+#include "x265cli.h"
+#include "svt.h"
+
+#define START_CODE 0x00000001
+#define START_CODE_BYTES 4
+
+#ifdef __cplusplus
+namespace X265_NS {
+#endif
+
+    static void printVersion(x265_param *param, const x265_api* api)
+    {
+        x265_log(param, X265_LOG_INFO, "HEVC encoder version %s\n", api->version_str);
+        x265_log(param, X265_LOG_INFO, "build info %s\n", api->build_info_str);
+    }
+
+    static void showHelp(x265_param *param)
+    {
+        int level = param->logLevel;
+
+#define OPT(value) (value ? "enabled" : "disabled")
+#define H0 printf
+#define H1 if (level >= X265_LOG_DEBUG) printf
+
+        H0("\nSyntax: x265 [options] infile [-o] outfile\n");
+        H0("    infile can be YUV or Y4M\n");
+        H0("    outfile is raw HEVC bitstream\n");
+        H0("\nExecutable Options:\n");
+        H0("-h/--help                        Show this help text and exit\n");
+        H0("   --fullhelp                    Show all options and exit\n");
+        H0("-V/--version                     Show version info and exit\n");
+        H0("\nOutput Options:\n");
+        H0("-o/--output <filename>           Bitstream output file name\n");
+        H0("-D/--output-depth 8|10|12        Output bit depth (also internal bit depth). Default %d\n", param->internalBitDepth);
+        H0("   --log-level <string>          Logging level: none error warning info debug full. Default %s\n", X265_NS::logLevelNames[param->logLevel + 1]);
+        H0("   --no-progress                 Disable CLI progress reports\n");
+        H0("   --csv <filename>              Comma separated log file, if csv-log-level > 0 frame level statistics, else one line per run\n");
+        H0("   --csv-log-level <integer>     Level of csv logging, if csv-log-level > 0 frame level statistics, else one line per run: 0-2\n");
+        H0("\nInput Options:\n");
+        H0("   --input <filename>            Raw YUV or Y4M input file name. `-` for stdin\n");
+        H1("   --y4m                         Force parsing of input stream as YUV4MPEG2 regardless of file extension\n");
+        H0("   --fps <float|rational>        Source frame rate (float or num/denom), auto-detected if Y4M\n");
+        H0("   --input-res WxH               Source picture size [w x h], auto-detected if Y4M\n");
+        H1("   --input-depth <integer>       Bit-depth of input file. Default 8\n");
+        H1("   --input-csp <string>          Chroma subsampling, auto-detected if Y4M\n");
+        H1("                                 0 - i400 (4:0:0 monochrome)\n");
+        H1("                                 1 - i420 (4:2:0 default)\n");
+        H1("                                 2 - i422 (4:2:2)\n");
+        H1("                                 3 - i444 (4:4:4)\n");
+#if ENABLE_HDR10_PLUS
+        H0("   --dhdr10-info <filename>      JSON file containing the Creative Intent Metadata to be encoded as Dynamic Tone Mapping\n");
+        H0("   --[no-]dhdr10-opt             Insert tone mapping SEI only for IDR frames and when the tone mapping information changes. Default disabled\n");
+#endif
+        H0("   --dolby-vision-profile <float|integer> Specifies Dolby Vision profile ID. Currently only profile 5, profile 8.1 and profile 8.2 enabled. Specified as '5' or '50'. Default 0 (disabled).\n");
+        H0("   --dolby-vision-rpu <filename> File containing Dolby Vision RPU metadata.\n"
+            "                                 If given, x265's Dolby Vision metadata parser will fill the RPU field of input pictures with the metadata read from the file. Default NULL(disabled).\n");
+        H0("   --nalu-file <filename>        Text file containing SEI messages in the following format : <POC><space><PREFIX><space><NAL UNIT TYPE>/<SEI TYPE><space><SEI Payload>\n");
+        H0("-f/--frames <integer>            Maximum number of frames to encode. Default all\n");
+        H0("   --seek <integer>              First frame to encode\n");
+        H1("   --[no-]interlace <bff|tff>    Indicate input pictures are interlace fields in temporal order. Default progressive\n");
+        H0("   --[no-]field                  Enable or disable field coding. Default %s\n", OPT(param->bField));
+        H1("   --dither                      Enable dither if downscaling to 8 bit pixels. Default disabled\n");
+        H0("   --[no-]copy-pic               Copy buffers of input picture in frame. Default %s\n", OPT(param->bCopyPicToFrame));
+        H0("\nQuality reporting metrics:\n");
+        H0("   --[no-]ssim                   Enable reporting SSIM metric scores. Default %s\n", OPT(param->bEnableSsim));
+        H0("   --[no-]psnr                   Enable reporting PSNR metric scores. Default %s\n", OPT(param->bEnablePsnr));
+        H0("\nProfile, Level, Tier:\n");
+        H0("-P/--profile <string>            Enforce an encode profile: main, main10, mainstillpicture\n");
+        H0("   --level-idc <integer|float>   Force a minimum required decoder level (as '5.0' or '50')\n");
+        H0("   --[no-]high-tier              If a decoder level is specified, this modifier selects High tier of that level\n");
+        H0("   --uhd-bd                      Enable UHD Bluray compatibility support\n");
+        H0("   --[no-]allow-non-conformance  Allow the encoder to generate profile NONE bitstreams. Default %s\n", OPT(param->bAllowNonConformance));
+        H0("\nThreading, performance:\n");
+        H0("   --pools <integer,...>         Comma separated thread count per thread pool (pool per NUMA node)\n");
+        H0("                                 '-' implies no threads on node, '+' implies one thread per core on node\n");
+        H0("-F/--frame-threads <integer>     Number of concurrently encoded frames. 0: auto-determined by core count\n");
+        H0("   --[no-]wpp                    Enable Wavefront Parallel Processing. Default %s\n", OPT(param->bEnableWavefront));
+        H0("   --[no-]slices <integer>       Enable Multiple Slices feature. Default %d\n", param->maxSlices);
+        H0("   --[no-]pmode                  Parallel mode analysis. Default %s\n", OPT(param->bDistributeModeAnalysis));
+        H0("   --[no-]pme                    Parallel motion estimation. Default %s\n", OPT(param->bDistributeMotionEstimation));
+        H0("   --[no-]asm <bool|int|string>  Override CPU detection. Default: auto\n");
+        H0("\nPresets:\n");
+        H0("-p/--preset <string>             Trade off performance for compression efficiency. Default medium\n");
+        H0("                                 ultrafast, superfast, veryfast, faster, fast, medium, slow, slower, veryslow, or placebo\n");
+        H0("-t/--tune <string>               Tune the settings for a particular type of source or situation:\n");
+        H0("                                 psnr, ssim, grain, zerolatency, fastdecode\n");
+        H0("\nQuad-Tree size and depth:\n");
+        H0("-s/--ctu <64|32|16>              Maximum CU size (WxH). Default %d\n", param->maxCUSize);
+        H0("   --min-cu-size <64|32|16|8>    Minimum CU size (WxH). Default %d\n", param->minCUSize);
+        H0("   --max-tu-size <32|16|8|4>     Maximum TU size (WxH). Default %d\n", param->maxTUSize);
+        H0("   --tu-intra-depth <integer>    Max TU recursive depth for intra CUs. Default %d\n", param->tuQTMaxIntraDepth);
+        H0("   --tu-inter-depth <integer>    Max TU recursive depth for inter CUs. Default %d\n", param->tuQTMaxInterDepth);
+        H0("   --limit-tu <0..4>             Enable early exit from TU recursion for inter coded blocks. Default %d\n", param->limitTU);
+        H0("\nAnalysis:\n");
+        H0("   --rd <1..6>                   Level of RDO in mode decision 1:least....6:full RDO. Default %d\n", param->rdLevel);
+        H0("   --[no-]psy-rd <0..5.0>        Strength of psycho-visual rate distortion optimization, 0 to disable. Default %.1f\n", param->psyRd);
+        H0("   --[no-]rdoq-level <0|1|2>     Level of RDO in quantization 0:none, 1:levels, 2:levels & coding groups. Default %d\n", param->rdoqLevel);
+        H0("   --[no-]psy-rdoq <0..50.0>     Strength of psycho-visual optimization in RDO quantization, 0 to disable. Default %.1f\n", param->psyRdoq);
+        H0("   --dynamic-rd <0..4.0>         Strength of dynamic RD, 0 to disable. Default %.2f\n", param->dynamicRd);
+        H0("   --[no-]ssim-rd                Enable ssim rate distortion optimization, 0 to disable. Default %s\n", OPT(param->bSsimRd));
+        H0("   --[no-]rd-refine              Enable QP based RD refinement for rd levels 5 and 6. Default %s\n", OPT(param->bEnableRdRefine));
+        H0("   --[no-]early-skip             Enable early SKIP detection. Default %s\n", OPT(param->bEnableEarlySkip));
+        H0("   --rskip <mode>                Set mode for early exit from recursion. Mode 1: exit using rdcost & CU homogeneity. Mode 2: exit using CU edge density.\n"
+            "                                 Mode 0: disabled. Default %d\n", param->recursionSkipMode);
+        H1("   --rskip-edge-threshold        Threshold in terms of percentage (integer of range [0,100]) for minimum edge density in CUs used to prune the recursion depth. Applicable only for rskip mode 2. Value is preset dependent. Default: %.f\n", param->edgeVarThreshold*100.0f);
+        H1("   --[no-]tskip-fast             Enable fast intra transform skipping. Default %s\n", OPT(param->bEnableTSkipFast));
+        H1("   --[no-]splitrd-skip           Enable skipping split RD analysis when sum of split CU rdCost larger than one split CU rdCost for Intra CU. Default %s\n", OPT(param->bEnableSplitRdSkip));
+        H1("   --nr-intra <integer>          An integer value in range of 0 to 2000, which denotes strength of noise reduction in intra CUs. Default 0\n");
+        H1("   --nr-inter <integer>          An integer value in range of 0 to 2000, which denotes strength of noise reduction in inter CUs. Default 0\n");
+        H0("   --ctu-info <integer>          Enable receiving ctu information asynchronously and determine reaction to the CTU information (0, 1, 2, 4, 6) Default 0\n"
+            "                                    - 1: force the partitions if CTU information is present\n"
+            "                                    - 2: functionality of (1) and reduce qp if CTU information has changed\n"
+            "                                    - 4: functionality of (1) and force Inter modes when CTU Information has changed, merge/skip otherwise\n"
+            "                                    Enable this option only when planning to invoke the API function x265_encoder_ctu_info to copy ctu-info asynchronously\n");
+        H0("\nCoding tools:\n");
+        H0("-w/--[no-]weightp                Enable weighted prediction in P slices. Default %s\n", OPT(param->bEnableWeightedPred));
+        H0("   --[no-]weightb                Enable weighted prediction in B slices. Default %s\n", OPT(param->bEnableWeightedBiPred));
+        H0("   --[no-]cu-lossless            Consider lossless mode in CU RDO decisions. Default %s\n", OPT(param->bCULossless));
+        H0("   --[no-]signhide               Hide sign bit of one coeff per TU (rdo). Default %s\n", OPT(param->bEnableSignHiding));
+        H1("   --[no-]tskip                  Enable intra 4x4 transform skipping. Default %s\n", OPT(param->bEnableTransformSkip));
+        H0("\nTemporal / motion search options:\n");
+        H0("   --max-merge <1..5>            Maximum number of merge candidates. Default %d\n", param->maxNumMergeCand);
+        H0("   --ref <integer>               max number of L0 references to be allowed (1 .. 16) Default %d\n", param->maxNumReferences);
+        H0("   --limit-refs <0|1|2|3>        Limit references per depth (1) or CU (2) or both (3). Default %d\n", param->limitReferences);
+        H0("   --me <string>                 Motion search method dia hex umh star full. Default %d\n", param->searchMethod);
+        H0("-m/--subme <integer>             Amount of subpel refinement to perform (0:least .. 7:most). Default %d \n", param->subpelRefine);
+        H0("   --merange <integer>           Motion search range. Default %d\n", param->searchRange);
+        H0("   --[no-]rect                   Enable rectangular motion partitions Nx2N and 2NxN. Default %s\n", OPT(param->bEnableRectInter));
+        H0("   --[no-]amp                    Enable asymmetric motion partitions, requires --rect. Default %s\n", OPT(param->bEnableAMP));
+        H0("   --[no-]limit-modes            Limit rectangular and asymmetric motion predictions. Default %d\n", param->limitModes);
+        H1("   --[no-]temporal-mvp           Enable temporal MV predictors. Default %s\n", OPT(param->bEnableTemporalMvp));
+        H1("   --[no-]hme                    Enable Hierarchical Motion Estimation. Default %s\n", OPT(param->bEnableHME));
+        H1("   --hme-search <string>         Motion search-method for HME L0,L1 and L2. Default(L0,L1,L2) is %d,%d,%d\n", param->hmeSearchMethod[0], param->hmeSearchMethod[1], param->hmeSearchMethod[2]);
+        H1("   --hme-range <int>,<int>,<int> Motion search-range for HME L0,L1 and L2. Default(L0,L1,L2) is %d,%d,%d\n", param->hmeRange[0], param->hmeRange[1], param->hmeRange[2]);
+        H0("\nSpatial / intra options:\n");
+        H0("   --[no-]strong-intra-smoothing Enable strong intra smoothing for 32x32 blocks. Default %s\n", OPT(param->bEnableStrongIntraSmoothing));
+        H0("   --[no-]constrained-intra      Constrained intra prediction (use only intra coded reference pixels) Default %s\n", OPT(param->bEnableConstrainedIntra));
+        H0("   --[no-]b-intra                Enable intra in B frames in veryslow presets. Default %s\n", OPT(param->bIntraInBFrames));
+        H0("   --[no-]fast-intra             Enable faster search method for angular intra predictions. Default %s\n", OPT(param->bEnableFastIntra));
+        H0("   --rdpenalty <0..2>            penalty for 32x32 intra TU in non-I slices. 0:disabled 1:RD-penalty 2:maximum. Default %d\n", param->rdPenalty);
+        H0("\nSlice decision options:\n");
+        H0("   --[no-]open-gop               Enable open-GOP, allows I slices to be non-IDR. Default %s\n", OPT(param->bOpenGOP));
+        H0("-I/--keyint <integer>            Max IDR period in frames. -1 for infinite-gop. Default %d\n", param->keyframeMax);
+        H0("-i/--min-keyint <integer>        Scenecuts closer together than this are coded as I, not IDR. Default: auto\n");
+        H0("   --gop-lookahead <integer>     Extends gop boundary if a scenecut is found within this from keyint boundary. Default 0\n");
+        H0("   --no-scenecut                 Disable adaptive I-frame decision\n");
+        H0("   --scenecut <integer>          How aggressively to insert extra I-frames. Default %d\n", param->scenecutThreshold);
+        H1("   --scenecut-bias <0..100.0>    Bias for scenecut detection. Default %.2f\n", param->scenecutBias);
+        H0("   --hist-scenecut               Enables histogram based scene-cut detection using histogram based algorithm.\n");
+        H0("   --no-hist-scenecut            Disables histogram based scene-cut detection using histogram based algorithm.\n");
+        H1("   --hist-threshold <0.0..2.0>   Luma Edge histogram's Normalized SAD threshold for histogram based scenecut detection Default %.2f\n", param->edgeTransitionThreshold);
+        H0("   --[no-]fades                  Enable detection and handling of fade-in regions. Default %s\n", OPT(param->bEnableFades));
+        H1("   --[no-]scenecut-aware-qp      Enable increasing QP for frames inside the scenecut window after scenecut. Default %s\n", OPT(param->bEnableSceneCutAwareQp));
+        H1("   --scenecut-window <0..1000>   QP incremental duration(in milliseconds) when scenecut-aware-qp is enabled. Default %d\n", param->scenecutWindow);
+        H1("   --max-qp-delta <0..10>        QP offset to increment with base QP for inter-frames. Default %d\n", param->maxQpDelta);
+        H0("   --radl <integer>              Number of RADL pictures allowed in front of IDR. Default %d\n", param->radl);
+        H0("   --intra-refresh               Use Periodic Intra Refresh instead of IDR frames\n");
+        H0("   --rc-lookahead <integer>      Number of frames for frame-type lookahead (determines encoder latency) Default %d\n", param->lookaheadDepth);
+        H1("   --lookahead-slices <0..16>    Number of slices to use per lookahead cost estimate. Default %d\n", param->lookaheadSlices);
+        H0("   --lookahead-threads <integer> Number of threads to be dedicated to perform lookahead only. Default %d\n", param->lookaheadThreads);
+        H0("-b/--bframes <0..16>             Maximum number of consecutive b-frames. Default %d\n", param->bframes);
+        H1("   --bframe-bias <integer>       Bias towards B frame decisions. Default %d\n", param->bFrameBias);
+        H0("   --b-adapt <0..2>              0 - none, 1 - fast, 2 - full (trellis) adaptive B frame scheduling. Default %d\n", param->bFrameAdaptive);
+        H0("   --[no-]b-pyramid              Use B-frames as references. Default %s\n", OPT(param->bBPyramid));
+        H1("   --qpfile <string>             Force frametypes and QPs for some or all frames\n");
+        H1("                                 Format of each line: framenumber frametype QP\n");
+        H1("                                 QP is optional (none lets x265 choose). Frametypes: I,i,K,P,B,b.\n");
+        H1("                                 QPs are restricted by qpmin/qpmax.\n");
+        H1("   --force-flush <integer>       Force the encoder to flush frames. Default %d\n", param->forceFlush);
+        H1("                                 0 - flush the encoder only when all the input pictures are over.\n");
+        H1("                                 1 - flush all the frames even when the input is not over. Slicetype decision may change with this option.\n");
+        H1("                                 2 - flush the slicetype decided frames only.\n");
+        H0("   --[no-]hrd-concat             Set HRD concatenation flag for the first keyframe in the buffering period SEI. Default %s\n", OPT(param->bEnableHRDConcatFlag));
+        H0("\nRate control, Adaptive Quantization:\n");
+        H0("   --bitrate <integer>           Target bitrate (kbps) for ABR (implied). Default %d\n", param->rc.bitrate);
+        H1("-q/--qp <integer>                QP for P slices in CQP mode (implied). --ipratio and --pbratio determine other slice QPs\n");
+        H0("   --crf <float>                 Quality-based VBR (0-51). Default %.1f\n", param->rc.rfConstant);
+        H1("   --[no-]lossless               Enable lossless: bypass transform, quant and loop filters globally. Default %s\n", OPT(param->bLossless));
+        H1("   --crf-max <float>             With CRF+VBV, limit RF to this value. Default %f\n", param->rc.rfConstantMax);
+        H1("                                 May cause VBV underflows!\n");
+        H1("   --crf-min <float>             With CRF+VBV, limit RF to this value. Default %f\n", param->rc.rfConstantMin);
+        H1("                                 this specifies a minimum rate factor value for encode!\n");
+        H0("   --vbv-maxrate <integer>       Max local bitrate (kbit/s). Default %d\n", param->rc.vbvMaxBitrate);
+        H0("   --vbv-bufsize <integer>       Set size of the VBV buffer (kbit). Default %d\n", param->rc.vbvBufferSize);
+        H0("   --vbv-init <float>            Initial VBV buffer occupancy (fraction of bufsize or in kbits). Default %.2f\n", param->rc.vbvBufferInit);
+        H0("   --vbv-end <float>             Final VBV buffer emptiness (fraction of bufsize or in kbits). Default 0 (disabled)\n");
+        H0("   --vbv-end-fr-adj <float>      Frame from which qp has to be adjusted to achieve final decode buffer emptiness. Default 0\n");
+        H0("   --chunk-start <integer>       First frame of the chunk. Default 0 (disabled)\n");
+        H0("   --chunk-end <integer>         Last frame of the chunk. Default 0 (disabled)\n");
+        H0("   --pass                        Multi pass rate control.\n"
+            "                                   - 1 : First pass, creates stats file\n"
+            "                                   - 2 : Last pass, does not overwrite stats file\n"
+            "                                   - 3 : Nth pass, overwrites stats file\n");
+        H0("   --[no-]multi-pass-opt-analysis   Refine analysis in 2 pass based on analysis information from pass 1\n");
+        H0("   --[no-]multi-pass-opt-distortion Use distortion of CTU from pass 1 to refine qp in 2 pass\n");
+        H0("   --stats                       Filename for stats file in multipass rate control. Default x265_2pass.log\n");
+        H0("   --[no-]analyze-src-pics       Motion estimation uses source frame planes. Default disable\n");
+        H0("   --[no-]slow-firstpass         Enable a slow first pass in a multipass rate control mode. Default %s\n", OPT(param->rc.bEnableSlowFirstPass));
+        H0("   --[no-]strict-cbr             Enable stricter conditions and tolerance for bitrate deviations in CBR mode. Default %s\n", OPT(param->rc.bStrictCbr));
+        H0("   --analysis-save <filename>    Dump analysis info into the specified file. Default Disabled\n");
+        H0("   --analysis-load <filename>    Load analysis buffers from the file specified. Default Disabled\n");
+        H0("   --analysis-reuse-file <filename>    Specify file name used for either dumping or reading analysis data. Default x265_analysis.dat\n");
+        H0("   --analysis-reuse-level <1..10>      Level of analysis reuse indicates amount of info stored/reused in save/load mode, 1:least..10:most. Now deprecated. Default %d\n", param->analysisReuseLevel);
+        H0("   --analysis-save-reuse-level <1..10> Indicates the amount of analysis info stored in save mode, 1:least..10:most. Default %d\n", param->analysisSaveReuseLevel);
+        H0("   --analysis-load-reuse-level <1..10> Indicates the amount of analysis info reused in load mode, 1:least..10:most. Default %d\n", param->analysisLoadReuseLevel);
+        H0("   --refine-analysis-type <string>     Reuse analysis information received through API call. Supported options are avc and hevc. Default disabled - %d\n", param->bAnalysisType);
+        H0("   --scale-factor <int>          Specify factor by which input video is scaled down for analysis save mode. Default %d\n", param->scaleFactor);
+        H0("   --refine-intra <0..4>         Enable intra refinement for encode that uses analysis-load.\n"
+            "                                    - 0 : Forces both mode and depth from the save encode.\n"
+            "                                    - 1 : Functionality of (0) + evaluate all intra modes at min-cu-size's depth when current depth is one smaller than min-cu-size's depth.\n"
+            "                                    - 2 : Functionality of (1) + irrespective of size evaluate all angular modes when the save encode decides the best mode as angular.\n"
+            "                                    - 3 : Functionality of (1) + irrespective of size evaluate all intra modes.\n"
+            "                                    - 4 : Re-evaluate all intra blocks, does not reuse data from save encode.\n"
+            "                                Default:%d\n", param->intraRefine);
+        H0("   --refine-inter <0..3>         Enable inter refinement for encode that uses analysis-load.\n"
+            "                                    - 0 : Forces both mode and depth from the save encode.\n"
+            "                                    - 1 : Functionality of (0) + evaluate all inter modes at min-cu-size's depth when current depth is one smaller than\n"
+            "                                          min-cu-size's depth. When save encode decides the current block as skip(for all sizes) evaluate skip/merge.\n"
+            "                                    - 2 : Functionality of (1) + irrespective of size restrict the modes evaluated when specific modes are decided as the best mode by the save encode.\n"
+            "                                    - 3 : Functionality of (1) + irrespective of size evaluate all inter modes.\n"
+            "                                Default:%d\n", param->interRefine);
+        H0("   --[no-]dynamic-refine         Dynamically changes refine-inter level for each CU. Default %s\n", OPT(param->bDynamicRefine));
+        H0("   --refine-mv <1..3>            Enable mv refinement for load mode. Default %d\n", param->mvRefine);
+        H0("   --refine-ctu-distortion       Store/normalize ctu distortion in analysis-save/load.\n"
+            "                                    - 0 : Disabled.\n"
+            "                                    - 1 : Store/Load ctu distortion to/from the file specified in analysis-save/load.\n"
+            "                                Default 0 - Disabled\n");
256
+        H0("   --aq-mode <integer>           Mode for Adaptive Quantization - 0:none 1:uniform AQ 2:auto variance 3:auto variance with bias to dark scenes 4:auto variance with edge information. Default %d\n", param->rc.aqMode);
257
+        H0("   --[no-]hevc-aq                Mode for HEVC Adaptive Quantization. Default %s\n", OPT(param->rc.hevcAq));
258
+        H0("   --aq-strength <float>         Reduces blocking and blurring in flat and textured areas (0 to 3.0). Default %.2f\n", param->rc.aqStrength);
259
+        H0("   --qp-adaptation-range <float> Delta QP range by QP adaptation based on a psycho-visual model (1.0 to 6.0). Default %.2f\n", param->rc.qpAdaptationRange);
260
+        H0("   --[no-]aq-motion              Block level QP adaptation based on the relative motion between the block and the frame. Default %s\n", OPT(param->bAQMotion));
261
+        H0("   --qg-size <int>               Specifies the size of the quantization group (64, 32, 16, 8). Default %d\n", param->rc.qgSize);
262
+        H0("   --[no-]cutree                 Enable cutree for Adaptive Quantization. Default %s\n", OPT(param->rc.cuTree));
263
+        H0("   --[no-]rc-grain               Enable ratecontrol mode to handle grains specifically. turned on with tune grain. Default %s\n", OPT(param->rc.bEnableGrain));
264
+        H1("   --ipratio <float>             QP factor between I and P. Default %.2f\n", param->rc.ipFactor);
265
+        H1("   --pbratio <float>             QP factor between P and B. Default %.2f\n", param->rc.pbFactor);
266
+        H1("   --qcomp <float>               Weight given to predicted complexity. Default %.2f\n", param->rc.qCompress);
267
+        H1("   --qpstep <integer>            The maximum single adjustment in QP allowed to rate control. Default %d\n", param->rc.qpStep);
268
+        H1("   --qpmin <integer>             Sets a hard lower limit on QP allowed to rate control. Default %d\n", param->rc.qpMin);
269
+        H1("   --qpmax <integer>             Sets a hard upper limit on QP allowed to rate control. Default %d\n", param->rc.qpMax);
270
+        H0("   --[no-]const-vbv              Enable consistent VBV. Turned on with tune grain. Default %s\n", OPT(param->rc.bEnableConstVbv));
271
+        H1("   --cbqpoffs <integer>          Chroma Cb QP Offset [-12..12]. Default %d\n", param->cbQpOffset);
272
+        H1("   --crqpoffs <integer>          Chroma Cr QP Offset [-12..12]. Default %d\n", param->crQpOffset);
273
+        H1("   --scaling-list <string>       Specify a file containing HM style quant scaling lists or 'default' or 'off'. Default: off\n");
274
+        H1("   --zones <zone0>/<zone1>/...   Tweak the bitrate of regions of the video\n");
275
+        H1("                                 Each zone is of the form\n");
276
+        H1("                                   <start frame>,<end frame>,<option>\n");
277
+        H1("                                   where <option> is either\n");
278
+        H1("                                       q=<integer> (force QP)\n");
279
+        H1("                                   or  b=<float> (bitrate multiplier)\n");
280
+        H0("   --zonefile <filename>         Zone file containing the zone boundaries and the parameters to be reconfigured.\n");
281
+        H1("   --lambda-file <string>        Specify a file containing replacement values for the lambda tables\n");
282
+        H1("                                 MAX_MAX_QP+1 floats for lambda table, then again for lambda2 table\n");
283
+        H1("                                 Blank lines and lines starting with hash(#) are ignored\n");
284
+        H1("                                 Comma is considered to be white-space\n");
285
+        H0("   --max-ausize-factor <float>   This value controls the maximum AU size defined in specification.\n");
286
+        H0("                                 It represents the percentage of maximum AU size used. Default %.1f\n", param->maxAUSizeFactor);
287
+        H0("\nLoop filters (deblock and SAO):\n");
288
+        H0("   --[no-]deblock                Enable Deblocking Loop Filter, optionally specify tC:Beta offsets. Default %s\n", OPT(param->bEnableLoopFilter));
289
+        H0("   --[no-]sao                    Enable Sample Adaptive Offset. Default %s\n", OPT(param->bEnableSAO));
290
+        H1("   --[no-]sao-non-deblock        Use non-deblocked pixels, else right/bottom boundary areas skipped. Default %s\n", OPT(param->bSaoNonDeblocked));
291
+        H0("   --[no-]limit-sao              Limit Sample Adaptive Offset types. Default %s\n", OPT(param->bLimitSAO));
292
+        H0("   --selective-sao <int>         Enable slice-level SAO filter. Default %d\n", param->selectiveSAO);
293
+        H0("\nVUI options:\n");
294
+        H0("   --sar <width:height|int>      Sample Aspect Ratio, the ratio of width to height of an individual pixel.\n");
295
+        H0("                                 Choose from 0=undef, 1=1:1(\"square\"), 2=12:11, 3=10:11, 4=16:11,\n");
296
+        H0("                                 5=40:33, 6=24:11, 7=20:11, 8=32:11, 9=80:33, 10=18:11, 11=15:11,\n");
297
+        H0("                                 12=64:33, 13=160:99, 14=4:3, 15=3:2, 16=2:1 or custom ratio of <int:int>. Default %d\n", param->vui.aspectRatioIdc);
298
+        H1("   --display-window <string>     Describe overscan cropping region as 'left,top,right,bottom' in pixels\n");
299
+        H1("   --overscan <string>           Specify whether it is appropriate for decoder to show cropped region: undef, show or crop. Default undef\n");
300
+        H0("   --videoformat <string>        Specify video format from undef, component, pal, ntsc, secam, mac. Default undef\n");
301
+        H0("   --range <string>              Specify black level and range of luma and chroma signals as full or limited. Default limited\n");
302
+        H0("   --colorprim <string>          Specify color primaries from  bt709, unknown, reserved, bt470m, bt470bg, smpte170m,\n");
303
+        H0("                                 smpte240m, film, bt2020, smpte428, smpte431, smpte432. Default undef\n");
304
+        H0("   --transfer <string>           Specify transfer characteristics from bt709, unknown, reserved, bt470m, bt470bg, smpte170m,\n");
305
+        H0("                                 smpte240m, linear, log100, log316, iec61966-2-4, bt1361e, iec61966-2-1,\n");
306
+        H0("                                 bt2020-10, bt2020-12, smpte2084, smpte428, arib-std-b67. Default undef\n");
307
+        H1("   --colormatrix <string>        Specify color matrix setting from undef, bt709, fcc, bt470bg, smpte170m,\n");
308
+        H1("                                 smpte240m, GBR, YCgCo, bt2020nc, bt2020c, smpte2085, chroma-derived-nc, chroma-derived-c, ictcp. Default undef\n");
309
+        H1("   --chromaloc <integer>         Specify chroma sample location (0 to 5). Default %d\n", param->vui.chromaSampleLocTypeTopField);
310
+        H0("   --master-display <string>     SMPTE ST 2086 master display color volume info SEI (HDR)\n");
311
+        H0("                                    format: G(x,y)B(x,y)R(x,y)WP(x,y)L(max,min)\n");
312
+        H0("   --max-cll <string>            Specify content light level info SEI as \"cll,fall\" (HDR).\n");
313
+        H0("   --[no-]cll                    Emit content light level info SEI. Default %s\n", OPT(param->bEmitCLL));
314
+        H0("   --[no-]hdr10                  Control dumping of HDR10 SEI packet. If max-cll or master-display has non-zero values, this is enabled. Default %s\n", OPT(param->bEmitHDR10SEI));
315
+        H0("   --[no-]hdr-opt                Add luma and chroma offsets for HDR/WCG content. Default %s. Now deprecated.\n", OPT(param->bHDROpt));
316
+        H0("   --[no-]hdr10-opt              Block-level QP optimization for HDR10 content. Default %s.\n", OPT(param->bHDR10Opt));
317
+        H0("   --min-luma <integer>          Minimum luma plane value of input source picture\n");
318
+        H0("   --max-luma <integer>          Maximum luma plane value of input source picture\n");
319
+        H0("\nBitstream options:\n");
320
+        H0("   --[no-]repeat-headers         Emit SPS and PPS headers at each keyframe. Default %s\n", OPT(param->bRepeatHeaders));
321
+        H0("   --[no-]info                   Emit SEI identifying encoder and parameters. Default %s\n", OPT(param->bEmitInfoSEI));
322
+        H0("   --[no-]hrd                    Enable HRD parameters signaling. Default %s\n", OPT(param->bEmitHRDSEI));
323
+        H0("   --[no-]idr-recovery-sei       Emit recovery point info SEI at each IDR frame\n");
324
+        H0("   --[no-]temporal-layers        Enable a temporal sublayer for unreferenced B frames. Default %s\n", OPT(param->bEnableTemporalSubLayers));
325
+        H0("   --[no-]aud                    Emit access unit delimiters at the start of each access unit. Default %s\n", OPT(param->bEnableAccessUnitDelimiters));
326
+        H1("   --hash <integer>              Decoded Picture Hash SEI 0: disabled, 1: MD5, 2: CRC, 3: Checksum. Default %d\n", param->decodedPictureHashSEI);
327
+        H0("   --atc-sei <integer>           Emit the alternative transfer characteristics SEI message where the integer is the preferred transfer characteristics. Default disabled\n");
328
+        H0("   --pic-struct <integer>        Set the picture structure and emits it in the picture timing SEI message. Values in the range 0..12. See D.3.3 of the HEVC spec. for a detailed explanation.\n");
329
+        H0("   --log2-max-poc-lsb <integer>  Maximum of the picture order count\n");
330
+        H0("   --[no-]vui-timing-info        Emit VUI timing information in the bitstream. Default %s\n", OPT(param->bEmitVUITimingInfo));
331
+        H0("   --[no-]vui-hrd-info           Emit VUI HRD information in the bitstream. Default %s\n", OPT(param->bEmitVUIHRDInfo));
332
+        H0("   --[no-]opt-qp-pps             Dynamically optimize QP in PPS (instead of default 26) based on QPs in previous GOP. Default %s\n", OPT(param->bOptQpPPS));
333
+        H0("   --[no-]opt-ref-list-length-pps  Dynamically set L0 and L1 ref list length in PPS (instead of default 0) based on values in last GOP. Default %s\n", OPT(param->bOptRefListLengthPPS));
334
+        H0("   --[no-]multi-pass-opt-rps     Enable storing commonly used RPS in SPS in multi pass mode. Default %s\n", OPT(param->bMultiPassOptRPS));
335
+        H0("   --[no-]opt-cu-delta-qp        Optimize to signal consistent CU level delta QPs in frame. Default %s\n", OPT(param->bOptCUDeltaQP));
336
+        H1("\nReconstructed video options (debugging):\n");
337
+        H1("-r/--recon <filename>            Reconstructed raw image YUV or Y4M output file name\n");
338
+        H1("   --recon-depth <integer>       Bit-depth of reconstructed raw image file. Defaults to input bit depth, or 8 if Y4M\n");
339
+        H1("   --recon-y4m-exec <string>     pipe reconstructed frames to Y4M viewer, ex:\"ffplay -i pipe:0 -autoexit\"\n");
340
+        H0("   --lowpass-dct                 Use low-pass subband dct approximation. Default %s\n", OPT(param->bLowPassDct));
341
+        H0("   --[no-]frame-dup              Enable Frame duplication. Default %s\n", OPT(param->bEnableFrameDuplication));
342
+        H0("   --dup-threshold <integer>     PSNR threshold for Frame duplication. Default %d\n", param->dupThreshold);
343
+#ifdef SVT_HEVC
344
+        H0("   --[no-]svt                    Enable SVT HEVC encoder. Default %s\n", OPT(param->bEnableSvtHevc));
345
+        H0("   --[no-]svt-hme                Enable hierarchical motion estimation (HME) in SVT HEVC encoder\n");
346
+        H0("   --svt-search-width            Motion estimation search area width for SVT HEVC encoder \n");
347
+        H0("   --svt-search-height           Motion estimation search area height for SVT HEVC encoder \n");
348
+        H0("   --[no-]svt-compressed-ten-bit-format  Enable 8+2 encoding mode for 10bit input in SVT HEVC encoder \n");
349
+        H0("   --[no-]svt-speed-control      Enable speed control functionality to achieve real time encoding speed for  SVT HEVC encoder \n");
350
+        H0("   --svt-preset-tuner            Enable additional faster presets of SVT; This only has to be used on top of x265's ultrafast preset. Accepts values in the range of 0-2 \n");
351
+        H0("   --svt-hierarchical-level      Hierarchical layer for SVT-HEVC encoder; Accepts inputs in the range 0-3 \n");
352
+        H0("   --svt-base-layer-switch-mode  Select whether B/P slice should be used in base layer for SVT-HEVC encoder. 0-Use B-frames; 1-Use P frames in the base layer \n");
353
+        H0("   --svt-pred-struct             Select pred structure for SVT HEVC encoder;  Accepts inputs in the range 0-2 \n");
354
+        H0("   --[no-]svt-fps-in-vps         Enable VPS timing info for SVT HEVC encoder  \n");
355
+#endif
356
+        H0(" ABR-ladder settings\n");
357
+        H0("   --abr-ladder <file>           File containing config settings required for the generation of ABR-ladder\n");
358
+        H1("\nExecutable return codes:\n");
359
+        H1("    0 - encode successful\n");
360
+        H1("    1 - unable to parse command line\n");
361
+        H1("    2 - unable to open encoder\n");
362
+        H1("    3 - unable to generate stream headers\n");
363
+        H1("    4 - encoder abort\n");
364
+#undef OPT
365
+#undef H0
366
+#undef H1
367
+        if (level < X265_LOG_DEBUG)
368
+            printf("\nUse --fullhelp for a full listing (or --log-level full --help)\n");
369
+        printf("\n\nComplete documentation may be found at http://x265.readthedocs.org/en/default/cli.html\n");
370
+        exit(1);
371
+    }
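
Reviewer note: the --zones help text above describes entries of the form <start frame>,<end frame>,<option>, separated by '/', where <option> is q=<integer> or b=<float>. The following is a minimal sketch of how such a string could be split, not the parser x265 uses. ZoneSketch and parseZonesSketch are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical holder for one zone; this is NOT the zone struct from x265.h.
struct ZoneSketch
{
    int   startFrame;
    int   endFrame;
    bool  forceQp;        // true for q=<integer>, false for b=<float>
    int   qp;             // valid when forceQp is true
    float bitrateFactor;  // valid when forceQp is false
};

// Split "<start>,<end>,q=<int>" or "<start>,<end>,b=<float>" entries
// separated by '/'. Returns false on the first malformed entry.
static bool parseZonesSketch(const std::string& spec, std::vector<ZoneSketch>& out)
{
    size_t pos = 0;
    while (pos <= spec.size())
    {
        size_t slash = spec.find('/', pos);
        size_t len = (slash == std::string::npos) ? std::string::npos : slash - pos;
        std::string item = spec.substr(pos, len);

        ZoneSketch z = { 0, 0, false, 0, 1.0f };
        if (sscanf(item.c_str(), "%d,%d,q=%d", &z.startFrame, &z.endFrame, &z.qp) == 3)
            z.forceQp = true;
        else if (sscanf(item.c_str(), "%d,%d,b=%f", &z.startFrame, &z.endFrame, &z.bitrateFactor) != 3)
            return false;

        out.push_back(z);
        if (slash == std::string::npos)
            break;
        pos = slash + 1;
    }
    return !out.empty();
}
```

With this sketch, a string such as "0,100,q=30/101,200,b=1.5" yields two zones: the first forces a QP, the second scales the bitrate.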
372
+
373
+    void CLIOptions::destroy()
374
+    {
375
+        if (isAbrLadderConfig)
376
+        {
377
+            for (int idx = 1; idx < argCnt; idx++)
378
+                free(argString[idx]);
379
+            free(argString);
380
+        }
381
+
382
+        if (input)
383
+            input->release();
384
+        input = NULL;
385
+        if (recon)
386
+            recon->release();
387
+        recon = NULL;
388
+        if (qpfile)
389
+            fclose(qpfile);
390
+        qpfile = NULL;
391
+        if (zoneFile)
392
+            fclose(zoneFile);
393
+        zoneFile = NULL;
394
+        if (dolbyVisionRpu)
395
+            fclose(dolbyVisionRpu);
396
+        dolbyVisionRpu = NULL;
397
+        if (output)
398
+            output->release();
399
+        output = NULL;
400
+    }
401
+
402
+    void CLIOptions::printStatus(uint32_t frameNum)
403
+    {
404
+        char buf[200];
405
+        int64_t time = x265_mdate();
406
+
407
+        if (!bProgress || !frameNum || (prevUpdateTime && time - prevUpdateTime < UPDATE_INTERVAL))
408
+            return;
409
+
410
+        int64_t elapsed = time - startTime;
411
+        double fps = elapsed > 0 ? frameNum * 1000000. / elapsed : 0;
412
+        float bitrate = 0.008f * totalbytes * ((float)param->fpsNum / param->fpsDenom) / ((float)frameNum); // float division keeps fractional rates such as 30000/1001
413
+        if (framesToBeEncoded)
414
+        {
415
+            int eta = (int)(elapsed * (framesToBeEncoded - frameNum) / ((int64_t)frameNum * 1000000));
416
+            sprintf(buf, "x265 [%.1f%%] %d/%d frames, %.2f fps, %.2f kb/s, eta %d:%02d:%02d",
417
+                100. * frameNum / (param->chunkEnd ? param->chunkEnd : param->totalFrames), frameNum, (param->chunkEnd ? param->chunkEnd : param->totalFrames), fps, bitrate,
418
+                eta / 3600, (eta / 60) % 60, eta % 60);
419
+        }
420
+        else
421
+            sprintf(buf, "x265 %d frames: %.2f fps, %.2f kb/s", frameNum, fps, bitrate);
422
+
423
+        fprintf(stderr, "%s  \r", buf + 5);
424
+        SetConsoleTitle(buf);
425
+        fflush(stderr); // needed in windows
426
+        prevUpdateTime = time;
427
+    }
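
Reviewer note: the arithmetic in printStatus() can be checked in isolation. The sketch below restates the bitrate and ETA formulas as two free functions. bitrateKbps and etaSeconds are hypothetical helper names, not part of x265, and the sketch assumes elapsed time is measured in microseconds as in the patch. It uses floating-point division for the frame rate, which matters for fractional rates such as 30000/1001, where integer division would truncate to 29.

```cpp
#include <cassert>
#include <cstdint>

// kb/s so far: average bytes per frame * 8 bits / 1000, times frames per second.
static double bitrateKbps(uint64_t totalBytes, uint32_t fpsNum, uint32_t fpsDenom, uint32_t frameNum)
{
    assert(frameNum > 0 && fpsDenom > 0);
    double fps = (double)fpsNum / fpsDenom;   // floating-point, not integer, division
    return 0.008 * totalBytes * fps / frameNum;
}

// Remaining whole seconds: elapsed time per frame times the frames still to encode.
static int etaSeconds(int64_t elapsedUs, uint32_t framesTotal, uint32_t frameNum)
{
    assert(frameNum > 0 && framesTotal >= frameNum);
    return (int)(elapsedUs * (framesTotal - frameNum) / ((int64_t)frameNum * 1000000));
}
```

For example, 100 frames encoded in 10 seconds out of 300 leaves 200 frames at 0.1 s each, so etaSeconds(10000000, 300, 100) evaluates to 20.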
428
+
429
+    bool CLIOptions::parseZoneParam(int argc, char **argv, x265_param* globalParam, int zonefileCount)
430
+    {
431
+        bool bError = false;
432
+        int bShowHelp = false;
433
+        int outputBitDepth = 0;
434
+        const char *profile = NULL;
435
+
436
+        /* Scan for output bit depth and profile before parsing all other options. */
437
+        for (optind = 0;;)
438
+        {
439
+            int c = getopt_long(argc, argv, short_options, long_options, NULL);
440
+            if (c == -1)
441
+                break;
442
+            else if (c == 'D')
443
+                outputBitDepth = atoi(optarg);
444
+            else if (c == 'P')
445
+                profile = optarg;
446
+            else if (c == '?')
447
+                bShowHelp = true;
448
+        }
449
+
450
+        if (!outputBitDepth && profile)
451
+        {
452
+            /* try to derive the output bit depth from the requested profile */
453
+            if (strstr(profile, "10"))
454
+                outputBitDepth = 10;
455
+            else if (strstr(profile, "12"))
456
+                outputBitDepth = 12;
457
+            else
458
+                outputBitDepth = 8;
459
+        }
460
+
461
+        api = x265_api_get(outputBitDepth);
462
+        if (!api)
463
+        {
464
+            x265_log(NULL, X265_LOG_WARNING, "falling back to default bit-depth\n");
465
+            api = x265_api_get(0);
466
+        }
467
+
468
+        if (bShowHelp)
469
+        {
470
+            printVersion(globalParam, api);
471
+            showHelp(globalParam);
472
+        }
473
+
474
+        globalParam->rc.zones[zonefileCount].zoneParam = api->param_alloc();
475
+        if (!globalParam->rc.zones[zonefileCount].zoneParam)
476
+        {
477
+            x265_log(NULL, X265_LOG_ERROR, "param alloc failed\n");
478
+            return true;
479
+        }
480
+
481
+        memcpy(globalParam->rc.zones[zonefileCount].zoneParam, globalParam, sizeof(x265_param));
482
+
483
+        for (optind = 0;;)
484
+        {
485
+            int long_options_index = -1;
486
+            int c = getopt_long(argc, argv, short_options, long_options, &long_options_index);
487
+            if (c == -1)
488
+                break;
489
+
490
+            if (long_options_index < 0 && c > 0)
491
+            {
492
+                for (size_t i = 0; i < sizeof(long_options) / sizeof(long_options[0]); i++)
493
+                {
494
+                    if (long_options[i].val == c)
495
+                    {
496
+                        long_options_index = (int)i;
497
+                        break;
498
+                    }
499
+                }
500
+
501
+                if (long_options_index < 0)
502
+                {
503
+                    /* getopt_long might have already printed an error message */
504
+                    if (c != '?')
505
+                        x265_log(NULL, X265_LOG_WARNING, "internal error: short option '%c' has no long option\n", c);
506
+                    return true;
507
+                }
508
+            }
509
+            if (long_options_index < 0)
510
+            {
511
+                x265_log(NULL, X265_LOG_WARNING, "short option '%c' unrecognized\n", c);
512
+                return true;
513
+            }
514
+
515
+            bError |= !!api->zone_param_parse(globalParam->rc.zones[zonefileCount].zoneParam, long_options[long_options_index].name, optarg);
516
+
517
+            if (bError)
518
+            {
519
+                const char *name = long_options_index >= 0 ? long_options[long_options_index].name : argv[optind - 2]; /* index 0 is a valid table entry */
520
+                x265_log(NULL, X265_LOG_ERROR, "invalid argument: %s = %s\n", name, optarg);
521
+                return true;
522
+            }
523
+        }
524
+
525
+        if (optind < argc)
526
+        {
527
+            x265_log(param, X265_LOG_WARNING, "extra unused command arguments given <%s>\n", argv[optind]);
528
+            return true;
529
+        }
530
+        return false;
531
+    }
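
Reviewer note: both parseZoneParam() and parse() fall back to the profile name when no explicit output depth is given, using substring matches on "10" and "12". A minimal sketch of that fallback as a standalone function follows. bitDepthFromProfile is a hypothetical name used only here, and returning 0 for a missing profile is an assumption meaning "let the library choose its default depth".

```cpp
#include <cassert>
#include <cstring>

// Derive an output bit depth from a profile name by substring match,
// mirroring the strstr() checks in the patch.
// Returns 0 when no profile was given (assumed to mean "library default").
static int bitDepthFromProfile(const char* profile)
{
    if (!profile)
        return 0;
    if (strstr(profile, "10"))
        return 10;
    if (strstr(profile, "12"))
        return 12;
    return 8;   // any other profile name is treated as 8-bit
}
```

Because the match is a plain substring test, any profile name containing "10" maps to 10 bits before "12" is considered, so the order of the checks is part of the behaviour.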
532
+
533
+    bool CLIOptions::parse(int argc, char **argv)
534
+    {
535
+        bool bError = false;
536
+        int bShowHelp = false;
537
+        int inputBitDepth = 8;
538
+        int outputBitDepth = 0;
539
+        int reconFileBitDepth = 0;
540
+        const char *inputfn = NULL;
541
+        const char *reconfn = NULL;
542
+        const char *outputfn = NULL;
543
+        const char *preset = NULL;
544
+        const char *tune = NULL;
545
+        const char *profile = NULL;
546
+        int svtEnabled = 0;
547
+        argCnt = argc;
548
+        argString = argv;
549
+
550
+        if (argc <= 1)
551
+        {
552
+            x265_log(NULL, X265_LOG_ERROR, "No input file. Run x265 --help for a list of options.\n");
553
+            return true;
554
+        }
555
+
556
+        /* Presets are applied before all other options. */
557
+        for (optind = 0;;)
558
+        {
559
+            int optionsIndex = -1;
560
+            int c = getopt_long(argc, argv, short_options, long_options, &optionsIndex);
561
+            if (c == -1)
562
+                break;
563
+            else if (c == 'p')
564
+                preset = optarg;
565
+            else if (c == 't')
566
+                tune = optarg;
567
+            else if (c == 'D')
568
+                outputBitDepth = atoi(optarg);
569
+            else if (c == 'P')
570
+                profile = optarg;
571
+            else if (c == '?')
572
+                bShowHelp = true;
573
+            else if (!c && !strcmp(long_options[optionsIndex].name, "svt"))
574
+                svtEnabled = 1;
575
+        }
576
+
577
+        if (!outputBitDepth && profile)
578
+        {
579
+            /* try to derive the output bit depth from the requested profile */
580
+            if (strstr(profile, "10"))
581
+                outputBitDepth = 10;
582
+            else if (strstr(profile, "12"))
583
+                outputBitDepth = 12;
584
+            else
585
+                outputBitDepth = 8;
586
+        }
587
+
588
+        api = x265_api_get(outputBitDepth);
589
+        if (!api)
590
+        {
591
+            x265_log(NULL, X265_LOG_WARNING, "falling back to default bit-depth\n");
592
+            api = x265_api_get(0);
593
+        }
594
+
595
+        param = api->param_alloc();
596
+        if (!param)
597
+        {
598
+            x265_log(NULL, X265_LOG_ERROR, "param alloc failed\n");
599
+            return true;
600
+        }
601
+#if ENABLE_LIBVMAF
602
+        vmafData = (x265_vmaf_data*)x265_malloc(sizeof(x265_vmaf_data));
603
+        if (!vmafData)
604
+        {
605
+            x265_log(NULL, X265_LOG_ERROR, "vmaf data alloc failed\n");
606
+            return true;
607
+        }
608
+#endif
609
+
610
+        if (api->param_default_preset(param, preset, tune) < 0)
611
+        {
612
+            x265_log(NULL, X265_LOG_ERROR, "preset or tune unrecognized\n");
613
+            return true;
614
+        }
615
+
616
+        if (bShowHelp)
617
+        {
618
+            printVersion(param, api);
619
+            showHelp(param);
620
+        }
621
+
622
+        // Enable the SVT-HEVC encoder first if it was requested on the command line
623
+        if (svtEnabled) api->param_parse(param, "svt", NULL);
624
+
625
+        for (optind = 0;;)
626
+        {
627
+            int long_options_index = -1;
628
+            int c = getopt_long(argc, argv, short_options, long_options, &long_options_index);
629
+            if (c == -1)
630
+                break;
631
+
632
+            switch (c)
633
+            {
634
+            case 'h':
635
+                printVersion(param, api);
636
+                showHelp(param);
637
+                break;
638
+
639
+            case 'V':
640
+                printVersion(param, api);
641
+                x265_report_simd(param);
642
+                exit(0);
643
+
644
+            default:
645
+                if (long_options_index < 0 && c > 0)
646
+                {
647
+                    for (size_t i = 0; i < sizeof(long_options) / sizeof(long_options[0]); i++)
648
+                    {
649
+                        if (long_options[i].val == c)
650
+                        {
651
+                            long_options_index = (int)i;
652
+                            break;
653
+                        }
654
+                    }
655
+
656
+                    if (long_options_index < 0)
657
+                    {
658
+                        /* getopt_long might have already printed an error message */
659
+                        if (c != '?')
660
+                            x265_log(NULL, X265_LOG_WARNING, "internal error: short option '%c' has no long option\n", c);
661
+                        return true;
662
+                    }
663
+                }
664
+                if (long_options_index < 0)
665
+                {
666
+                    x265_log(NULL, X265_LOG_WARNING, "short option '%c' unrecognized\n", c);
667
+                    return true;
668
+                }
669
+#define OPT(longname) \
670
+                                            else if (!strcmp(long_options[long_options_index].name, longname))
671
+#define OPT2(name1, name2) \
672
+                                            else if (!strcmp(long_options[long_options_index].name, name1) || \
673
+             !strcmp(long_options[long_options_index].name, name2))
674
+
675
+                if (0);
676
+                OPT2("frame-skip", "seek") this->seek = (uint32_t)x265_atoi(optarg, bError);
677
+                OPT("frames") this->framesToBeEncoded = (uint32_t)x265_atoi(optarg, bError);
678
+                OPT("no-progress") this->bProgress = false;
679
+                OPT("output") outputfn = optarg;
680
+                OPT("input") inputfn = optarg;
681
+                OPT("recon") reconfn = optarg;
682
+                OPT("input-depth") inputBitDepth = (uint32_t)x265_atoi(optarg, bError);
683
+                OPT("dither") this->bDither = true;
684
+                OPT("recon-depth") reconFileBitDepth = (uint32_t)x265_atoi(optarg, bError);
685
+                OPT("y4m") this->bForceY4m = true;
686
+                OPT("profile") /* handled above */;
687
+                OPT("preset")  /* handled above */;
688
+                OPT("tune")    /* handled above */;
689
+                OPT("output-depth")   /* handled above */;
690
+                OPT("recon-y4m-exec") reconPlayCmd = optarg;
691
+                OPT("svt")    /* handled above */;
692
+                OPT("qpfile")
693
+                {
694
+                    this->qpfile = x265_fopen(optarg, "rb");
695
+                    if (!this->qpfile)
696
+                        x265_log_file(param, X265_LOG_ERROR, "%s qpfile not found or error in opening qp file\n", optarg);
697
+                }
698
+                OPT("dolby-vision-rpu")
699
+                {
700
+                    this->dolbyVisionRpu = x265_fopen(optarg, "rb");
701
+                    if (!this->dolbyVisionRpu)
702
+                    {
703
+                        x265_log_file(param, X265_LOG_ERROR, "Dolby Vision RPU metadata file %s not found or error in opening file\n", optarg);
704
+                        return true;
705
+                    }
706
+                }
707
+                OPT("zonefile")
708
+                {
709
+                    this->zoneFile = x265_fopen(optarg, "rb");
710
+                    if (!this->zoneFile)
711
+                        x265_log_file(param, X265_LOG_ERROR, "%s zone file not found or error in opening zone file\n", optarg);
712
+                }
713
+                OPT("fullhelp")
714
+                {
715
+                    param->logLevel = X265_LOG_FULL;
716
+                    printVersion(param, api);
717
+                    showHelp(param);
718
+                    break;
719
+                }
720
+                else
721
+                    bError |= !!api->param_parse(param, long_options[long_options_index].name, optarg);
722
+                if (bError)
723
+                {
724
+                    const char *name = long_options_index >= 0 ? long_options[long_options_index].name : argv[optind - 2]; /* index 0 is a valid table entry */
725
+                    x265_log(NULL, X265_LOG_ERROR, "invalid argument: %s = %s\n", name, optarg);
726
+                    return true;
727
+                }
728
+#undef OPT
729
+            }
730
+        }
731
+
732
+        if (optind < argc && !inputfn)
733
+            inputfn = argv[optind++];
734
+        if (optind < argc && !outputfn)
735
+            outputfn = argv[optind++];
736
+        if (optind < argc)
737
+        {
738
+            x265_log(param, X265_LOG_WARNING, "extra unused command arguments given <%s>\n", argv[optind]);
739
+            return true;
740
+        }
741
+
742
+        if (argc <= 1)
743
+        {
744
+            api->param_default(param);
745
+            printVersion(param, api);
746
+            showHelp(param);
747
+        }
748
+
749
+        if (!inputfn || !outputfn)
750
+        {
751
+            x265_log(param, X265_LOG_ERROR, "input or output file not specified, try --help for help\n");
752
+            return true;
753
+        }
754
+
755
+        if (param->internalBitDepth != api->bit_depth)
756
+        {
757
+            x265_log(param, X265_LOG_ERROR, "Only a bit depth of %d is supported in this build\n", api->bit_depth);
758
+            return true;
759
+        }
760
+
761
+#ifdef SVT_HEVC
762
+        if (svtEnabled)
763
+        {
764
+            EB_H265_ENC_CONFIGURATION* svtParam = (EB_H265_ENC_CONFIGURATION*)param->svtHevcParam;
765
+            param->sourceWidth = svtParam->sourceWidth;
766
+            param->sourceHeight = svtParam->sourceHeight;
767
+            param->fpsNum = svtParam->frameRateNumerator;
768
+            param->fpsDenom = svtParam->frameRateDenominator;
769
+            svtParam->encoderBitDepth = inputBitDepth;
770
+        }
771
+#endif
772
+
773
+        InputFileInfo info;
774
+        info.filename = inputfn;
775
+        info.depth = inputBitDepth;
776
+        info.csp = param->internalCsp;
777
+        info.width = param->sourceWidth;
778
+        info.height = param->sourceHeight;
779
+        info.fpsNum = param->fpsNum;
780
+        info.fpsDenom = param->fpsDenom;
781
+        info.sarWidth = param->vui.sarWidth;
782
+        info.sarHeight = param->vui.sarHeight;
783
+        info.skipFrames = seek;
784
+        info.frameCount = 0;
785
+        getParamAspectRatio(param, info.sarWidth, info.sarHeight);
786
+
787
+
788
+        this->input = InputFile::open(info, this->bForceY4m);
789
+        if (!this->input || this->input->isFail())
790
+        {
791
+            x265_log_file(param, X265_LOG_ERROR, "unable to open input file <%s>\n", inputfn);
792
+            return true;
793
+        }
794
+
795
+        if (info.depth < 8 || info.depth > 16)
796
+        {
797
+            x265_log(param, X265_LOG_ERROR, "Input bit depth (%d) must be between 8 and 16\n", info.depth);
798
+            return true;
799
+        }
800
+
801
+        /* Unconditionally accept height/width/csp/bitDepth from file info */
802
+        param->sourceWidth = info.width;
803
+        param->sourceHeight = info.height;
804
+        param->internalCsp = info.csp;
805
+        param->sourceBitDepth = info.depth;
806
+
807
+        /* Accept fps and sar from file info if not specified by user */
808
+        if (param->fpsDenom == 0 || param->fpsNum == 0)
809
+        {
810
+            param->fpsDenom = info.fpsDenom;
811
+            param->fpsNum = info.fpsNum;
812
+        }
813
+        if (!param->vui.aspectRatioIdc && info.sarWidth && info.sarHeight)
814
+            setParamAspectRatio(param, info.sarWidth, info.sarHeight);
815
+        if (this->framesToBeEncoded == 0 && info.frameCount > (int)seek)
816
+            this->framesToBeEncoded = info.frameCount - seek;
+        param->totalFrames = this->framesToBeEncoded;
+
+#ifdef SVT_HEVC
+        if (svtEnabled)
+        {
+            EB_H265_ENC_CONFIGURATION* svtParam = (EB_H265_ENC_CONFIGURATION*)param->svtHevcParam;
+            svtParam->sourceWidth = param->sourceWidth;
+            svtParam->sourceHeight = param->sourceHeight;
+            svtParam->frameRateNumerator = param->fpsNum;
+            svtParam->frameRateDenominator = param->fpsDenom;
+            svtParam->framesToBeEncoded = param->totalFrames;
+            svtParam->encoderColorFormat = (EB_COLOR_FORMAT)param->internalCsp;
+        }
+#endif
+
+        /* Force CFR until we have support for VFR */
+        info.timebaseNum = param->fpsDenom;
+        info.timebaseDenom = param->fpsNum;
+
+        if (param->bField && param->interlaceMode)
+        {   // Field FPS
+            param->fpsNum *= 2;
+            // Field height
+            param->sourceHeight = param->sourceHeight >> 1;
+            // Number of fields to encode
+            param->totalFrames *= 2;
+        }
+
+        if (api->param_apply_profile(param, profile))
+            return true;
+
+        if (param->logLevel >= X265_LOG_INFO)
+        {
+            char buf[128];
+            int p = sprintf(buf, "%dx%d fps %d/%d %sp%d", param->sourceWidth, param->sourceHeight,
+                param->fpsNum, param->fpsDenom, x265_source_csp_names[param->internalCsp], info.depth);
+
+            int width, height;
+            getParamAspectRatio(param, width, height);
+            if (width && height)
+                p += sprintf(buf + p, " sar %d:%d", width, height);
+
+            if (framesToBeEncoded <= 0 || info.frameCount <= 0)
+                strcpy(buf + p, " unknown frame count");
+            else
+                sprintf(buf + p, " frames %u - %d of %d", this->seek, this->seek + this->framesToBeEncoded - 1, info.frameCount);
+
+            general_log(param, input->getName(), X265_LOG_INFO, "%s\n", buf);
+        }
+
+        this->input->startReader();
+
+        if (reconfn)
+        {
+            if (reconFileBitDepth == 0)
+                reconFileBitDepth = param->internalBitDepth;
+            this->recon = ReconFile::open(reconfn, param->sourceWidth, param->sourceHeight, reconFileBitDepth,
+                param->fpsNum, param->fpsDenom, param->internalCsp);
+            if (this->recon->isFail())
+            {
+                x265_log(param, X265_LOG_WARNING, "unable to write reconstructed outputs file\n");
+                this->recon->release();
+                this->recon = 0;
+            }
+            else
+                general_log(param, this->recon->getName(), X265_LOG_INFO,
+                "reconstructed images %dx%d fps %d/%d %s\n",
+                param->sourceWidth, param->sourceHeight, param->fpsNum, param->fpsDenom,
+                x265_source_csp_names[param->internalCsp]);
+        }
+#if ENABLE_LIBVMAF
+        if (!reconfn)
+        {
+            x265_log(param, X265_LOG_ERROR, "recon file must be specified to get VMAF score, try --help for help\n");
+            return true;
+        }
+        const char *str = strrchr(info.filename, '.');
+
+        /* strrchr() returns NULL when the file name has no extension */
+        if (str && !strcmp(str, ".y4m"))
+        {
+            x265_log(param, X265_LOG_ERROR, "VMAF supports YUV file format only.\n");
+            return true;
+        }
+        if (param->internalCsp == X265_CSP_I420 || param->internalCsp == X265_CSP_I422 || param->internalCsp == X265_CSP_I444)
+        {
+            vmafData->reference_file = x265_fopen(inputfn, "rb");
+            vmafData->distorted_file = x265_fopen(reconfn, "rb");
+        }
+        else
+        {
+            x265_log(param, X265_LOG_ERROR, "VMAF will support only yuv420p, yuv422p, yuv444p, yuv420p10le, yuv422p10le, yuv444p10le formats.\n");
+            return true;
+        }
+#endif
+        this->output = OutputFile::open(outputfn, info);
+        if (this->output->isFail())
+        {
+            x265_log_file(param, X265_LOG_ERROR, "failed to open output file <%s> for writing\n", outputfn);
+            return true;
+        }
+        general_log_file(param, this->output->getName(), X265_LOG_INFO, "output file: %s\n", outputfn);
+        return false;
+    }
+
+    bool CLIOptions::parseQPFile(x265_picture &pic_org)
+    {
+        int32_t num = -1, qp, ret;
+        char type;
+        uint32_t filePos;
+        pic_org.forceqp = 0;
+        pic_org.sliceType = X265_TYPE_AUTO;
+        while (num < pic_org.poc)
+        {
+            filePos = ftell(qpfile);
+            qp = -1;
+            ret = fscanf(qpfile, "%d %c%*[ \t]%d\n", &num, &type, &qp);
+
+            if (num > pic_org.poc || ret == EOF)
+            {
+                fseek(qpfile, filePos, SEEK_SET);
+                break;
+            }
+            if (num < pic_org.poc && ret >= 2)
+                continue;
+            if (ret == 3 && qp >= 0)
+                pic_org.forceqp = qp + 1;
+            if (type == 'I') pic_org.sliceType = X265_TYPE_IDR;
+            else if (type == 'i') pic_org.sliceType = X265_TYPE_I;
+            else if (type == 'K') pic_org.sliceType = param->bOpenGOP ? X265_TYPE_I : X265_TYPE_IDR;
+            else if (type == 'P') pic_org.sliceType = X265_TYPE_P;
+            else if (type == 'B') pic_org.sliceType = X265_TYPE_BREF;
+            else if (type == 'b') pic_org.sliceType = X265_TYPE_B;
+            else ret = 0;
+            if (ret < 2 || qp < -1 || qp > 51)
+                return 0;
+        }
+        return 1;
+    }
+
+    bool CLIOptions::parseZoneFile()
+    {
+        char line[256];
+        char* argLine;
+        param->rc.zonefileCount = 0;
+
+        while (fgets(line, sizeof(line), zoneFile))
+        {
+            if (!((*line == '#') || (strcmp(line, "\r\n") == 0)))
+                param->rc.zonefileCount++;
+        }
+
+        rewind(zoneFile);
+        param->rc.zones = X265_MALLOC(x265_zone, param->rc.zonefileCount);
+        for (int i = 0; i < param->rc.zonefileCount; i++)
+        {
+            while (fgets(line, sizeof(line), zoneFile))
+            {
+                if (*line == '#' || (strcmp(line, "\r\n") == 0))
+                    continue;
+                param->rc.zones[i].zoneParam = X265_MALLOC(x265_param, 1);
+                int index = (int)strcspn(line, "\r\n");
+                line[index] = '\0';
+                argLine = line;
+                while (isspace((unsigned char)*argLine)) argLine++;
+                char* start = strchr(argLine, ' ');
+                start++;
+                param->rc.zones[i].startFrame = atoi(argLine);
+                int argCount = 0;
+                char **args = (char**)malloc(256 * sizeof(char *));
+                // Adding a dummy string to avoid file parsing error
+                args[argCount++] = (char *)"x265";
+                char* token = strtok(start, " ");
+                while (token)
+                {
+                    args[argCount++] = token;
+                    token = strtok(NULL, " ");
+                }
+                args[argCount] = NULL;
+                CLIOptions cliopt;
+                if (cliopt.parseZoneParam(argCount, args, param, i))
+                {
+                    cliopt.destroy();
+                    if (cliopt.api)
+                        cliopt.api->param_free(cliopt.param);
+                    exit(1);
+                }
+                break;
+            }
+        }
+        return 1;
+    }
+
+    /* Parse the RPU file and extract the RPU corresponding to the current picture
+    * and fill the rpu field of the input picture */
+    int CLIOptions::rpuParser(x265_picture * pic)
+    {
+        uint8_t byteVal;
+        uint32_t code = 0;
+        int bytesRead = 0;
+        pic->rpu.payloadSize = 0;
+
+        if (!pic->pts)
+        {
+            while (bytesRead++ < 4 && fread(&byteVal, sizeof(uint8_t), 1, dolbyVisionRpu))
+                code = (code << 8) | byteVal;
+
+            if (code != START_CODE)
+            {
+                x265_log(NULL, X265_LOG_ERROR, "Invalid Dolby Vision RPU startcode in POC %d\n", pic->pts);
+                return 1;
+            }
+        }
+
+        bytesRead = 0;
+        while (fread(&byteVal, sizeof(uint8_t), 1, dolbyVisionRpu))
+        {
+            code = (code << 8) | byteVal;
+            if (bytesRead++ < 3)
+                continue;
+            if (bytesRead >= 1024)
+            {
+                x265_log(NULL, X265_LOG_ERROR, "Invalid Dolby Vision RPU size in POC %d\n", pic->pts);
+                return 1;
+            }
+
+            if (code != START_CODE)
+                pic->rpu.payload[pic->rpu.payloadSize++] = (code >> (3 * 8)) & 0xFF;
+            else
+                return 0;
+        }
+
+        int ShiftBytes = START_CODE_BYTES - (bytesRead - pic->rpu.payloadSize);
+        int bytesLeft = bytesRead - pic->rpu.payloadSize;
+        code = (code << ShiftBytes * 8);
+        for (int i = 0; i < bytesLeft; i++)
+        {
+            pic->rpu.payload[pic->rpu.payloadSize++] = (code >> (3 * 8)) & 0xFF;
+            code = (code << 8);
+        }
+        if (!pic->rpu.payloadSize)
+            x265_log(NULL, X265_LOG_WARNING, "Dolby Vision RPU not found for POC %d\n", pic->pts);
+        return 0;
+    }
+
+#ifdef __cplusplus
+}
+#endif
\ No newline at end of file
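
Note on the qpfile format handled by parseQPFile() above: each line is "framenumber frametype [QP]", with frame types I, i, K, P, B, b and an optional QP in the range 0..51. The following is a minimal standalone sketch of that per-line rule, not the x265 implementation. The names SliceType, QpEntry and parseQpLine are hypothetical and exist only for illustration. Unlike the patch, the sketch parses a single string rather than seeking through a file, and it reports the QP directly instead of storing qp + 1 in forceqp.

```cpp
#include <cstdio>
#include <optional>
#include <string>

// Hypothetical stand-in for the X265_TYPE_* slice-type constants.
enum class SliceType { IDR, I, P, BRef, B };

// Hypothetical result record for one qpfile line.
struct QpEntry
{
    int frame;       // frame number the entry applies to
    SliceType type;  // forced slice type
    int qp;          // forced QP, or -1 to let the encoder choose
};

// Parse one "framenumber frametype [QP]" line.
// Returns std::nullopt when the line is malformed or the QP is out of range.
std::optional<QpEntry> parseQpLine(const std::string& line, bool openGop)
{
    int frame = -1, qp = -1;
    char type = 0;
    int fields = std::sscanf(line.c_str(), "%d %c %d", &frame, &type, &qp);
    if (fields < 2 || frame < 0)
        return std::nullopt;   // frame number and frame type are mandatory
    if (qp < -1 || qp > 51)
        return std::nullopt;   // same QP bounds as the check in the patch

    SliceType st;
    switch (type)
    {
    case 'I': st = SliceType::IDR; break;
    case 'i': st = SliceType::I; break;
    case 'K': st = openGop ? SliceType::I : SliceType::IDR; break; // keyframe follows the open-GOP setting
    case 'P': st = SliceType::P; break;
    case 'B': st = SliceType::BRef; break;
    case 'b': st = SliceType::B; break;
    default:  return std::nullopt; // unknown frame type
    }
    return QpEntry{ frame, st, qp };
}
```

Under these assumptions, parseQpLine("10 I 30", false) yields frame 10 as an IDR slice with QP 30, while parseQpLine("12 b", false) leaves the QP at -1 so the encoder would choose it.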
x265_3.3.tar.gz/source/x265cli.h -> x265_3.4.tar.gz/source/x265cli.h Changed
@@ -27,9 +27,23 @@
 
 #include "common.h"
 #include "param.h"
+#include "input/input.h"
+#include "output/output.h"
+#include "output/reconplay.h"
 
 #include <getopt.h>
 
+#define CONSOLE_TITLE_SIZE 200
+#ifdef _WIN32
+#include <windows.h>
+#define SetThreadExecutionState(es)
+static char orgConsoleTitle[CONSOLE_TITLE_SIZE] = "";
+#else
+#define GetConsoleTitle(t, n)
+#define SetConsoleTitle(t)
+#define SetThreadExecutionState(es)
+#endif
+
 #ifdef __cplusplus
 namespace X265_NS {
 #endif
@@ -105,8 +119,8 @@
     { "amp",                  no_argument, NULL, 0 },
     { "no-early-skip",        no_argument, NULL, 0 },
     { "early-skip",           no_argument, NULL, 0 },
-    { "no-rskip",             no_argument, NULL, 0 },
-    { "rskip",                no_argument, NULL, 0 },
+    { "rskip",                required_argument, NULL, 0 },
+    { "rskip-edge-threshold", required_argument, NULL, 0 },
     { "no-fast-cbf",          no_argument, NULL, 0 },
     { "fast-cbf",             no_argument, NULL, 0 },
     { "no-tskip",             no_argument, NULL, 0 },
@@ -358,6 +372,7 @@
     { "cll", no_argument, NULL, 0 },
     { "no-cll", no_argument, NULL, 0 },
     { "hme-range", required_argument, NULL, 0 },
+    { "abr-ladder", required_argument, NULL, 0 },
     { 0, 0, 0, 0 },
     { 0, 0, 0, 0 },
     { 0, 0, 0, 0 },
@@ -365,336 +380,82 @@
     { 0, 0, 0, 0 }
 };
 
-static void printVersion(x265_param *param, const x265_api* api)
-{
-    x265_log(param, X265_LOG_INFO, "HEVC encoder version %s\n", api->version_str);
-    x265_log(param, X265_LOG_INFO, "build info %s\n", api->build_info_str);
-}
+    struct CLIOptions
+    {
+        InputFile* input;
+        ReconFile* recon;
+        OutputFile* output;
+        FILE*       qpfile;
+        FILE*       zoneFile;
+        FILE*    dolbyVisionRpu;    /* File containing Dolby Vision BL RPU metadata */
+        const char* reconPlayCmd;
+        const x265_api* api;
+        x265_param* param;
+        x265_vmaf_data* vmafData;
+        bool bProgress;
+        bool bForceY4m;
+        bool bDither;
+        uint32_t seek;              // number of frames to skip from the beginning
+        uint32_t framesToBeEncoded; // number of frames to encode
+        uint64_t totalbytes;
+        int64_t startTime;
+        int64_t prevUpdateTime;
 
-static void showHelp(x265_param *param)
-{
-    int level = param->logLevel;
+        int argCnt;
+        char** argString;
 
-#define OPT(value) (value ? "enabled" : "disabled")
-#define H0 printf
-#define H1 if (level >= X265_LOG_DEBUG) printf
+        /* ABR ladder settings */
+        bool isAbrLadderConfig;
+        bool enableScaler;
+        char*    encName;
+        char*    reuseName;
+        uint32_t encId;
+        int      refId;
+        uint32_t loadLevel;
+        uint32_t saveLevel;
+        uint32_t numRefs;
 
-    H0("\nSyntax: x265 [options] infile [-o] outfile\n");
-    H0("    infile can be YUV or Y4M\n");
-    H0("    outfile is raw HEVC bitstream\n");
-    H0("\nExecutable Options:\n");
-    H0("-h/--help                        Show this help text and exit\n");
-    H0("   --fullhelp                    Show all options and exit\n");
-    H0("-V/--version                     Show version info and exit\n");
-    H0("\nOutput Options:\n");
-    H0("-o/--output <filename>           Bitstream output file name\n");
-    H0("-D/--output-depth 8|10|12        Output bit depth (also internal bit depth). Default %d\n", param->internalBitDepth);
-    H0("   --log-level <string>          Logging level: none error warning info debug full. Default %s\n", X265_NS::logLevelNames[param->logLevel + 1]);
-    H0("   --no-progress                 Disable CLI progress reports\n");
-    H0("   --csv <filename>              Comma separated log file, if csv-log-level > 0 frame level statistics, else one line per run\n");
-    H0("   --csv-log-level <integer>     Level of csv logging, if csv-log-level > 0 frame level statistics, else one line per run: 0-2\n");
-    H0("\nInput Options:\n");
-    H0("   --input <filename>            Raw YUV or Y4M input file name. `-` for stdin\n");
-    H1("   --y4m                         Force parsing of input stream as YUV4MPEG2 regardless of file extension\n");
-    H0("   --fps <float|rational>        Source frame rate (float or num/denom), auto-detected if Y4M\n");
-    H0("   --input-res WxH               Source picture size [w x h], auto-detected if Y4M\n");
-    H1("   --input-depth <integer>       Bit-depth of input file. Default 8\n");
-    H1("   --input-csp <string>          Chroma subsampling, auto-detected if Y4M\n");
-    H1("                                 0 - i400 (4:0:0 monochrome)\n");
-    H1("                                 1 - i420 (4:2:0 default)\n");
-    H1("                                 2 - i422 (4:2:2)\n");
-    H1("                                 3 - i444 (4:4:4)\n");
-#if ENABLE_HDR10_PLUS
-    H0("   --dhdr10-info <filename>      JSON file containing the Creative Intent Metadata to be encoded as Dynamic Tone Mapping\n");
-    H0("   --[no-]dhdr10-opt             Insert tone mapping SEI only for IDR frames and when the tone mapping information changes. Default disabled\n");
-#endif
-    H0("   --dolby-vision-profile <float|integer> Specifies Dolby Vision profile ID. Currently only profile 5, profile 8.1 and profile 8.2 enabled. Specified as '5' or '50'. Default 0 (disabled).\n");
-    H0("   --dolby-vision-rpu <filename> File containing Dolby Vision RPU metadata.\n"
-       "                                 If given, x265's Dolby Vision metadata parser will fill the RPU field of input pictures with the metadata read from the file. Default NULL(disabled).\n");
-    H0("   --nalu-file <filename>        Text file containing SEI messages in the following format : <POC><space><PREFIX><space><NAL UNIT TYPE>/<SEI TYPE><space><SEI Payload>\n");
-    H0("-f/--frames <integer>            Maximum number of frames to encode. Default all\n");
-    H0("   --seek <integer>              First frame to encode\n");
-    H1("   --[no-]interlace <bff|tff>    Indicate input pictures are interlace fields in temporal order. Default progressive\n");
-    H0("   --[no-]field                  Enable or disable field coding. Default %s\n", OPT( param->bField));
-    H1("   --dither                      Enable dither if downscaling to 8 bit pixels. Default disabled\n");
-    H0("   --[no-]copy-pic               Copy buffers of input picture in frame. Default %s\n", OPT(param->bCopyPicToFrame));
-    H0("\nQuality reporting metrics:\n");
-    H0("   --[no-]ssim                   Enable reporting SSIM metric scores. Default %s\n", OPT(param->bEnableSsim));
-    H0("   --[no-]psnr                   Enable reporting PSNR metric scores. Default %s\n", OPT(param->bEnablePsnr));
-    H0("\nProfile, Level, Tier:\n");
-    H0("-P/--profile <string>            Enforce an encode profile: main, main10, mainstillpicture\n");
-    H0("   --level-idc <integer|float>   Force a minimum required decoder level (as '5.0' or '50')\n");
-    H0("   --[no-]high-tier              If a decoder level is specified, this modifier selects High tier of that level\n");
-    H0("   --uhd-bd                      Enable UHD Bluray compatibility support\n");
-    H0("   --[no-]allow-non-conformance  Allow the encoder to generate profile NONE bitstreams. Default %s\n", OPT(param->bAllowNonConformance));
-    H0("\nThreading, performance:\n");
-    H0("   --pools <integer,...>         Comma separated thread count per thread pool (pool per NUMA node)\n");
-    H0("                                 '-' implies no threads on node, '+' implies one thread per core on node\n");
-    H0("-F/--frame-threads <integer>     Number of concurrently encoded frames. 0: auto-determined by core count\n");
-    H0("   --[no-]wpp                    Enable Wavefront Parallel Processing. Default %s\n", OPT(param->bEnableWavefront));
-    H0("   --[no-]slices <integer>       Enable Multiple Slices feature. Default %d\n", param->maxSlices);
-    H0("   --[no-]pmode                  Parallel mode analysis. Default %s\n", OPT(param->bDistributeModeAnalysis));
-    H0("   --[no-]pme                    Parallel motion estimation. Default %s\n", OPT(param->bDistributeMotionEstimation));
-    H0("   --[no-]asm <bool|int|string>  Override CPU detection. Default: auto\n");
-    H0("\nPresets:\n");
-    H0("-p/--preset <string>             Trade off performance for compression efficiency. Default medium\n");
-    H0("                                 ultrafast, superfast, veryfast, faster, fast, medium, slow, slower, veryslow, or placebo\n");
-    H0("-t/--tune <string>               Tune the settings for a particular type of source or situation:\n");
-    H0("                                 psnr, ssim, grain, zerolatency, fastdecode\n");
-    H0("\nQuad-Tree size and depth:\n");
-    H0("-s/--ctu <64|32|16>              Maximum CU size (WxH). Default %d\n", param->maxCUSize);
-    H0("   --min-cu-size <64|32|16|8>    Minimum CU size (WxH). Default %d\n", param->minCUSize);
-    H0("   --max-tu-size <32|16|8|4>     Maximum TU size (WxH). Default %d\n", param->maxTUSize);
-    H0("   --tu-intra-depth <integer>    Max TU recursive depth for intra CUs. Default %d\n", param->tuQTMaxIntraDepth);
-    H0("   --tu-inter-depth <integer>    Max TU recursive depth for inter CUs. Default %d\n", param->tuQTMaxInterDepth);
-    H0("   --limit-tu <0..4>             Enable early exit from TU recursion for inter coded blocks. Default %d\n", param->limitTU);
-    H0("\nAnalysis:\n");
-    H0("   --rd <1..6>                   Level of RDO in mode decision 1:least....6:full RDO. Default %d\n", param->rdLevel);
-    H0("   --[no-]psy-rd <0..5.0>        Strength of psycho-visual rate distortion optimization, 0 to disable. Default %.1f\n", param->psyRd);
-    H0("   --[no-]rdoq-level <0|1|2>     Level of RDO in quantization 0:none, 1:levels, 2:levels & coding groups. Default %d\n", param->rdoqLevel);
-    H0("   --[no-]psy-rdoq <0..50.0>     Strength of psycho-visual optimization in RDO quantization, 0 to disable. Default %.1f\n", param->psyRdoq);
-    H0("   --dynamic-rd <0..4.0>         Strength of dynamic RD, 0 to disable. Default %.2f\n", param->dynamicRd);
-    H0("   --[no-]ssim-rd                Enable ssim rate distortion optimization, 0 to disable. Default %s\n", OPT(param->bSsimRd));
-    H0("   --[no-]rd-refine              Enable QP based RD refinement for rd levels 5 and 6. Default %s\n", OPT(param->bEnableRdRefine));
-    H0("   --[no-]early-skip             Enable early SKIP detection. Default %s\n", OPT(param->bEnableEarlySkip));
-    H0("   --[no-]rskip                  Enable early exit from recursion. Default %s\n", OPT(param->bEnableRecursionSkip));
-    H1("   --[no-]tskip-fast             Enable fast intra transform skipping. Default %s\n", OPT(param->bEnableTSkipFast));
-    H1("   --[no-]splitrd-skip           Enable skipping split RD analysis when sum of split CU rdCost larger than one split CU rdCost for Intra CU. Default %s\n", OPT(param->bEnableSplitRdSkip));
-    H1("   --nr-intra <integer>          An integer value in range of 0 to 2000, which denotes strength of noise reduction in intra CUs. Default 0\n");
-    H1("   --nr-inter <integer>          An integer value in range of 0 to 2000, which denotes strength of noise reduction in inter CUs. Default 0\n");
-    H0("   --ctu-info <integer>          Enable receiving ctu information asynchronously and determine reaction to the CTU information (0, 1, 2, 4, 6) Default 0\n"
-       "                                    - 1: force the partitions if CTU information is present\n"
-       "                                    - 2: functionality of (1) and reduce qp if CTU information has changed\n"
-       "                                    - 4: functionality of (1) and force Inter modes when CTU Information has changed, merge/skip otherwise\n"
-       "                                    Enable this option only when planning to invoke the API function x265_encoder_ctu_info to copy ctu-info asynchronously\n");
-    H0("\nCoding tools:\n");
-    H0("-w/--[no-]weightp                Enable weighted prediction in P slices. Default %s\n", OPT(param->bEnableWeightedPred));
-    H0("   --[no-]weightb                Enable weighted prediction in B slices. Default %s\n", OPT(param->bEnableWeightedBiPred));
-    H0("   --[no-]cu-lossless            Consider lossless mode in CU RDO decisions. Default %s\n", OPT(param->bCULossless));
-    H0("   --[no-]signhide               Hide sign bit of one coeff per TU (rdo). Default %s\n", OPT(param->bEnableSignHiding));
-    H1("   --[no-]tskip                  Enable intra 4x4 transform skipping. Default %s\n", OPT(param->bEnableTransformSkip));
-    H0("\nTemporal / motion search options:\n");
-    H0("   --max-merge <1..5>            Maximum number of merge candidates. Default %d\n", param->maxNumMergeCand);
-    H0("   --ref <integer>               max number of L0 references to be allowed (1 .. 16) Default %d\n", param->maxNumReferences);
-    H0("   --limit-refs <0|1|2|3>        Limit references per depth (1) or CU (2) or both (3). Default %d\n", param->limitReferences);
-    H0("   --me <string>                 Motion search method dia hex umh star full. Default %d\n", param->searchMethod);
-    H0("-m/--subme <integer>             Amount of subpel refinement to perform (0:least .. 7:most). Default %d \n", param->subpelRefine);
-    H0("   --merange <integer>           Motion search range. Default %d\n", param->searchRange);
-    H0("   --[no-]rect                   Enable rectangular motion partitions Nx2N and 2NxN. Default %s\n", OPT(param->bEnableRectInter));
-    H0("   --[no-]amp                    Enable asymmetric motion partitions, requires --rect. Default %s\n", OPT(param->bEnableAMP));
-    H0("   --[no-]limit-modes            Limit rectangular and asymmetric motion predictions. Default %d\n", param->limitModes);
-    H1("   --[no-]temporal-mvp           Enable temporal MV predictors. Default %s\n", OPT(param->bEnableTemporalMvp));
-    H1("   --[no-]hme                    Enable Hierarchical Motion Estimation. Default %s\n", OPT(param->bEnableHME));
-    H1("   --hme-search <string>         Motion search-method for HME L0,L1 and L2. Default(L0,L1,L2) is %d,%d,%d\n", param->hmeSearchMethod[0], param->hmeSearchMethod[1], param->hmeSearchMethod[2]);
-    H1("   --hme-range <int>,<int>,<int> Motion search-range for HME L0,L1 and L2. Default(L0,L1,L2) is %d,%d,%d\n", param->hmeRange[0], param->hmeRange[1], param->hmeRange[2]);
-    H0("\nSpatial / intra options:\n");
-    H0("   --[no-]strong-intra-smoothing Enable strong intra smoothing for 32x32 blocks. Default %s\n", OPT(param->bEnableStrongIntraSmoothing));
-    H0("   --[no-]constrained-intra      Constrained intra prediction (use only intra coded reference pixels) Default %s\n", OPT(param->bEnableConstrainedIntra));
-    H0("   --[no-]b-intra                Enable intra in B frames in veryslow presets. Default %s\n", OPT(param->bIntraInBFrames));
-    H0("   --[no-]fast-intra             Enable faster search method for angular intra predictions. Default %s\n", OPT(param->bEnableFastIntra));
-    H0("   --rdpenalty <0..2>            penalty for 32x32 intra TU in non-I slices. 0:disabled 1:RD-penalty 2:maximum. Default %d\n", param->rdPenalty);
-    H0("\nSlice decision options:\n");
-    H0("   --[no-]open-gop               Enable open-GOP, allows I slices to be non-IDR. Default %s\n", OPT(param->bOpenGOP));
-    H0("-I/--keyint <integer>            Max IDR period in frames. -1 for infinite-gop. Default %d\n", param->keyframeMax);
-    H0("-i/--min-keyint <integer>        Scenecuts closer together than this are coded as I, not IDR. Default: auto\n");
-    H0("   --gop-lookahead <integer>     Extends gop boundary if a scenecut is found within this from keyint boundary. Default 0\n");
-    H0("   --no-scenecut                 Disable adaptive I-frame decision\n");
-    H0("   --scenecut <integer>          How aggressively to insert extra I-frames. Default %d\n", param->scenecutThreshold);
-    H1("   --scenecut-bias <0..100.0>    Bias for scenecut detection. Default %.2f\n", param->scenecutBias); 
-    H0("   --hist-scenecut               Enables scene-cut detection using a histogram-based algorithm.\n");
-    H0("   --no-hist-scenecut            Disables scene-cut detection using a histogram-based algorithm.\n");
-    H1("   --hist-threshold <0.0..2.0>   Luma Edge histogram's Normalized SAD threshold for histogram based scenecut detection Default %.2f\n", param->edgeTransitionThreshold);
-    H0("   --[no-]fades                  Enable detection and handling of fade-in regions. Default %s\n", OPT(param->bEnableFades));
-    H1("   --[no-]scenecut-aware-qp      Enable increasing QP for frames inside the scenecut window after scenecut. Default %s\n", OPT(param->bEnableSceneCutAwareQp));
-    H1("   --scenecut-window <0..1000>   QP incremental duration(in milliseconds) when scenecut-aware-qp is enabled. Default %d\n", param->scenecutWindow);
-    H1("   --max-qp-delta <0..10>        QP offset to increment with base QP for inter-frames. Default %d\n", param->maxQpDelta);
-    H0("   --radl <integer>              Number of RADL pictures allowed in front of IDR. Default %d\n", param->radl);
-    H0("   --intra-refresh               Use Periodic Intra Refresh instead of IDR frames\n");
-    H0("   --rc-lookahead <integer>      Number of frames for frame-type lookahead (determines encoder latency) Default %d\n", param->lookaheadDepth);
-    H1("   --lookahead-slices <0..16>    Number of slices to use per lookahead cost estimate. Default %d\n", param->lookaheadSlices);
-    H0("   --lookahead-threads <integer> Number of threads to be dedicated to perform lookahead only. Default %d\n", param->lookaheadThreads);
-    H0("-b/--bframes <0..16>             Maximum number of consecutive b-frames. Default %d\n", param->bframes);
-    H1("   --bframe-bias <integer>       Bias towards B frame decisions. Default %d\n", param->bFrameBias);
-    H0("   --b-adapt <0..2>              0 - none, 1 - fast, 2 - full (trellis) adaptive B frame scheduling. Default %d\n", param->bFrameAdaptive);
-    H0("   --[no-]b-pyramid              Use B-frames as references. Default %s\n", OPT(param->bBPyramid));
-    H1("   --qpfile <string>             Force frametypes and QPs for some or all frames\n");
-    H1("                                 Format of each line: framenumber frametype QP\n");
-    H1("                                 QP is optional (none lets x265 choose). Frametypes: I,i,K,P,B,b.\n");
-    H1("                                 QPs are restricted by qpmin/qpmax.\n");
-    H1("   --force-flush <integer>       Force the encoder to flush frames. Default %d\n", param->forceFlush);
-    H1("                                 0 - flush the encoder only when all the input pictures are over.\n");
-    H1("                                 1 - flush all the frames even when the input is not over. Slicetype decision may change with this option.\n");
-    H1("                                 2 - flush the slicetype decided frames only.\n");
-    H0("   --[no-]hrd-concat             Set HRD concatenation flag for the first keyframe in the buffering period SEI. Default %s\n", OPT(param->bEnableHRDConcatFlag));
-    H0("\nRate control, Adaptive Quantization:\n");
-    H0("   --bitrate <integer>           Target bitrate (kbps) for ABR (implied). Default %d\n", param->rc.bitrate);
-    H1("-q/--qp <integer>                QP for P slices in CQP mode (implied). --ipratio and --pbratio determine other slice QPs\n");
-    H0("   --crf <float>                 Quality-based VBR (0-51). Default %.1f\n", param->rc.rfConstant);
-    H1("   --[no-]lossless               Enable lossless: bypass transform, quant and loop filters globally. Default %s\n", OPT(param->bLossless));
-    H1("   --crf-max <float>             With CRF+VBV, limit RF to this value. Default %f\n", param->rc.rfConstantMax);
-    H1("                                 May cause VBV underflows!\n");
-    H1("   --crf-min <float>             With CRF+VBV, limit RF to this value. Default %f\n", param->rc.rfConstantMin);
-    H1("                                 this specifies a minimum rate factor value for encode!\n");
-    H0("   --vbv-maxrate <integer>       Max local bitrate (kbit/s). Default %d\n", param->rc.vbvMaxBitrate);
-    H0("   --vbv-bufsize <integer>       Set size of the VBV buffer (kbit). Default %d\n", param->rc.vbvBufferSize);
-    H0("   --vbv-init <float>            Initial VBV buffer occupancy (fraction of bufsize or in kbits). Default %.2f\n", param->rc.vbvBufferInit);
-    H0("   --vbv-end <float>             Final VBV buffer emptiness (fraction of bufsize or in kbits). Default 0 (disabled)\n");
-    H0("   --vbv-end-fr-adj <float>      Frame from which qp has to be adjusted to achieve final decode buffer emptiness. Default 0\n");
-    H0("   --chunk-start <integer>       First frame of the chunk. Default 0 (disabled)\n");
-    H0("   --chunk-end <integer>         Last frame of the chunk. Default 0 (disabled)\n");
-    H0("   --pass                        Multi pass rate control.\n"
-       "                                   - 1 : First pass, creates stats file\n"
-       "                                   - 2 : Last pass, does not overwrite stats file\n"
-       "                                   - 3 : Nth pass, overwrites stats file\n");
-    H0("   --[no-]multi-pass-opt-analysis   Refine analysis in 2 pass based on analysis information from pass 1\n");
-    H0("   --[no-]multi-pass-opt-distortion Use distortion of CTU from pass 1 to refine qp in 2 pass\n");
-    H0("   --stats                       Filename for stats file in multipass pass rate control. Default x265_2pass.log\n");
-    H0("   --[no-]analyze-src-pics       Motion estimation uses source frame planes. Default disable\n");
-    H0("   --[no-]slow-firstpass         Enable a slow first pass in a multipass rate control mode. Default %s\n", OPT(param->rc.bEnableSlowFirstPass));
-    H0("   --[no-]strict-cbr             Enable stricter conditions and tolerance for bitrate deviations in CBR mode. Default %s\n", OPT(param->rc.bStrictCbr));
-    H0("   --analysis-save <filename>    Dump analysis info into the specified file. Default Disabled\n");
-    H0("   --analysis-load <filename>    Load analysis buffers from the file specified. Default Disabled\n");
-    H0("   --analysis-reuse-file <filename>    Specify file name used for either dumping or reading analysis data. Deault x265_analysis.dat\n");
-    H0("   --analysis-reuse-level <1..10>      Level of analysis reuse indicates amount of info stored/reused in save/load mode, 1:least..10:most. Now deprecated. Default %d\n", param->analysisReuseLevel);
-    H0("   --analysis-save-reuse-level <1..10> Indicates the amount of analysis info stored in save mode, 1:least..10:most. Default %d\n", param->analysisSaveReuseLevel);
-    H0("   --analysis-load-reuse-level <1..10> Indicates the amount of analysis info reused in load mode, 1:least..10:most. Default %d\n", param->analysisLoadReuseLevel);
-    H0("   --refine-analysis-type <string>     Reuse anlaysis information received through API call. Supported options are avc and hevc. Default disabled - %d\n", param->bAnalysisType);
-    H0("   --scale-factor <int>          Specify factor by which input video is scaled down for analysis save mode. Default %d\n", param->scaleFactor);
-    H0("   --refine-intra <0..4>         Enable intra refinement for encode that uses analysis-load.\n"
-        "                                    - 0 : Forces both mode and depth from the save encode.\n"
-        "                                    - 1 : Functionality of (0) + evaluate all intra modes at min-cu-size's depth when current depth is one smaller than min-cu-size's depth.\n"
-        "                                    - 2 : Functionality of (1) + irrespective of size evaluate all angular modes when the save encode decides the best mode as angular.\n"
-        "                                    - 3 : Functionality of (1) + irrespective of size evaluate all intra modes.\n"
-        "                                    - 4 : Re-evaluate all intra blocks, does not reuse data from save encode.\n"
-        "                                Default:%d\n", param->intraRefine);
-    H0("   --refine-inter <0..3>         Enable inter refinement for encode that uses analysis-load.\n"
-        "                                    - 0 : Forces both mode and depth from the save encode.\n"
-        "                                    - 1 : Functionality of (0) + evaluate all inter modes at min-cu-size's depth when current depth is one smaller than\n"
-        "                                          min-cu-size's depth. When save encode decides the current block as skip(for all sizes) evaluate skip/merge.\n"
-        "                                    - 2 : Functionality of (1) + irrespective of size restrict the modes evaluated when specific modes are decided as the best mode by the save encode.\n"
-        "                                    - 3 : Functionality of (1) + irrespective of size evaluate all inter modes.\n"
-        "                                Default:%d\n", param->interRefine);
-    H0("   --[no-]dynamic-refine         Dynamically changes refine-inter level for each CU. Default %s\n", OPT(param->bDynamicRefine));
-    H0("   --refine-mv <1..3>            Enable mv refinement for load mode. Default %d\n", param->mvRefine);
-    H0("   --refine-ctu-distortion       Store/normalize ctu distortion in analysis-save/load.\n"
-        "                                    - 0 : Disabled.\n"
-        "                                    - 1 : Store/Load ctu distortion to/from the file specified in analysis-save/load.\n"
-        "                                Default 0 - Disabled\n");
-    H0("   --aq-mode <integer>           Mode for Adaptive Quantization - 0:none 1:uniform AQ 2:auto variance 3:auto variance with bias to dark scenes 4:auto variance with edge information. Default %d\n", param->rc.aqMode);
-    H0("   --[no-]hevc-aq                Mode for HEVC Adaptive Quantization. Default %s\n", OPT(param->rc.hevcAq));
-    H0("   --aq-strength <float>         Reduces blocking and blurring in flat and textured areas (0 to 3.0). Default %.2f\n", param->rc.aqStrength);
-    H0("   --qp-adaptation-range <float> Delta QP range by QP adaptation based on a psycho-visual model (1.0 to 6.0). Default %.2f\n", param->rc.qpAdaptationRange);
-    H0("   --[no-]aq-motion              Block level QP adaptation based on the relative motion between the block and the frame. Default %s\n", OPT(param->bAQMotion));
-    H0("   --qg-size <int>               Specifies the size of the quantization group (64, 32, 16, 8). Default %d\n", param->rc.qgSize);
-    H0("   --[no-]cutree                 Enable cutree for Adaptive Quantization. Default %s\n", OPT(param->rc.cuTree));
-    H0("   --[no-]rc-grain               Enable ratecontrol mode to handle grains specifically. turned on with tune grain. Default %s\n", OPT(param->rc.bEnableGrain));
-    H1("   --ipratio <float>             QP factor between I and P. Default %.2f\n", param->rc.ipFactor);
-    H1("   --pbratio <float>             QP factor between P and B. Default %.2f\n", param->rc.pbFactor);
-    H1("   --qcomp <float>               Weight given to predicted complexity. Default %.2f\n", param->rc.qCompress);
-    H1("   --qpstep <integer>            The maximum single adjustment in QP allowed to rate control. Default %d\n", param->rc.qpStep);
-    H1("   --qpmin <integer>             sets a hard lower limit on QP allowed to ratecontrol. Default %d\n", param->rc.qpMin);
-    H1("   --qpmax <integer>             sets a hard upper limit on QP allowed to ratecontrol. Default %d\n", param->rc.qpMax);
-    H0("   --[no-]const-vbv              Enable consistent vbv. turned on with tune grain. Default %s\n", OPT(param->rc.bEnableConstVbv));
-    H1("   --cbqpoffs <integer>          Chroma Cb QP Offset [-12..12]. Default %d\n", param->cbQpOffset);
-    H1("   --crqpoffs <integer>          Chroma Cr QP Offset [-12..12]. Default %d\n", param->crQpOffset);
-    H1("   --scaling-list <string>       Specify a file containing HM style quant scaling lists or 'default' or 'off'. Default: off\n");
-    H1("   --zones <zone0>/<zone1>/...   Tweak the bitrate of regions of the video\n");
-    H1("                                 Each zone is of the form\n");
-    H1("                                   <start frame>,<end frame>,<option>\n");
-    H1("                                   where <option> is either\n");
-    H1("                                       q=<integer> (force QP)\n");
-    H1("                                   or  b=<float> (bitrate multiplier)\n");
-    H0("   --zonefile <filename>         Zone file containing the zone boundaries and the parameters to be reconfigured.\n");
-    H1("   --lambda-file <string>        Specify a file containing replacement values for the lambda tables\n");
-    H1("                                 MAX_MAX_QP+1 floats for lambda table, then again for lambda2 table\n");
-    H1("                                 Blank lines and lines starting with hash(#) are ignored\n");
-    H1("                                 Comma is considered to be white-space\n");
-    H0("   --max-ausize-factor <float>   This value controls the maximum AU size defined in specification.\n");
-    H0("                                 It represents the percentage of maximum AU size used. Default %.1f\n", param->maxAUSizeFactor);
-    H0("\nLoop filters (deblock and SAO):\n");
-    H0("   --[no-]deblock                Enable Deblocking Loop Filter, optionally specify tC:Beta offsets Default %s\n", OPT(param->bEnableLoopFilter));
-    H0("   --[no-]sao                    Enable Sample Adaptive Offset. Default %s\n", OPT(param->bEnableSAO));
-    H1("   --[no-]sao-non-deblock        Use non-deblocked pixels, else right/bottom boundary areas skipped. Default %s\n", OPT(param->bSaoNonDeblocked));
-    H0("   --[no-]limit-sao              Limit Sample Adaptive Offset types. Default %s\n", OPT(param->bLimitSAO));
-    H0("   --selective-sao <int>         Enable slice-level SAO filter. Default %d\n", param->selectiveSAO);
-    H0("\nVUI options:\n");
-    H0("   --sar <width:height|int>      Sample Aspect Ratio, the ratio of width to height of an individual pixel.\n");
-    H0("                                 Choose from 0=undef, 1=1:1(\"square\"), 2=12:11, 3=10:11, 4=16:11,\n");
-    H0("                                 5=40:33, 6=24:11, 7=20:11, 8=32:11, 9=80:33, 10=18:11, 11=15:11,\n");
-    H0("                                 12=64:33, 13=160:99, 14=4:3, 15=3:2, 16=2:1 or custom ratio of <int:int>. Default %d\n", param->vui.aspectRatioIdc);
-    H1("   --display-window <string>     Describe overscan cropping region as 'left,top,right,bottom' in pixels\n");
-    H1("   --overscan <string>           Specify whether it is appropriate for decoder to show cropped region: undef, show or crop. Default undef\n");
-    H0("   --videoformat <string>        Specify video format from undef, component, pal, ntsc, secam, mac. Default undef\n");
-    H0("   --range <string>              Specify black level and range of luma and chroma signals as full or limited Default limited\n");
-    H0("   --colorprim <string>          Specify color primaries from  bt709, unknown, reserved, bt470m, bt470bg, smpte170m,\n");
-    H0("                                 smpte240m, film, bt2020, smpte428, smpte431, smpte432. Default undef\n");
-    H0("   --transfer <string>           Specify transfer characteristics from bt709, unknown, reserved, bt470m, bt470bg, smpte170m,\n");
-    H0("                                 smpte240m, linear, log100, log316, iec61966-2-4, bt1361e, iec61966-2-1,\n");
-    H0("                                 bt2020-10, bt2020-12, smpte2084, smpte428, arib-std-b67. Default undef\n");
-    H1("   --colormatrix <string>        Specify color matrix setting from undef, bt709, fcc, bt470bg, smpte170m,\n");
-    H1("                                 smpte240m, GBR, YCgCo, bt2020nc, bt2020c, smpte2085, chroma-derived-nc, chroma-derived-c, ictcp. Default undef\n");
-    H1("   --chromaloc <integer>         Specify chroma sample location (0 to 5). Default of %d\n", param->vui.chromaSampleLocTypeTopField);
-    H0("   --master-display <string>     SMPTE ST 2086 master display color volume info SEI (HDR)\n");
-    H0("                                    format: G(x,y)B(x,y)R(x,y)WP(x,y)L(max,min)\n");
-    H0("   --max-cll <string>            Specify content light level info SEI as \"cll,fall\" (HDR).\n");
-    H0("   --[no-]cll                    Emit content light level info SEI. Default %s\n", OPT(param->bEmitCLL));
-    H0("   --[no-]hdr10                  Control dumping of HDR10 SEI packet. If max-cll or master-display has non-zero values, this is enabled. Default %s\n", OPT(param->bEmitHDR10SEI));
-    H0("   --[no-]hdr-opt                Add luma and chroma offsets for HDR/WCG content. Default %s. Now deprecated.\n", OPT(param->bHDROpt));
-    H0("   --[no-]hdr10-opt              Block-level QP optimization for HDR10 content. Default %s.\n", OPT(param->bHDR10Opt));
-    H0("   --min-luma <integer>          Minimum luma plane value of input source picture\n");
-    H0("   --max-luma <integer>          Maximum luma plane value of input source picture\n");
-    H0("\nBitstream options:\n");
-    H0("   --[no-]repeat-headers         Emit SPS and PPS headers at each keyframe. Default %s\n", OPT(param->bRepeatHeaders));
-    H0("   --[no-]info                   Emit SEI identifying encoder and parameters. Default %s\n", OPT(param->bEmitInfoSEI));
-    H0("   --[no-]hrd                    Enable HRD parameters signaling. Default %s\n", OPT(param->bEmitHRDSEI));
-    H0("   --[no-]idr-recovery-sei      Emit recovery point infor SEI at each IDR frame \n");
-    H0("   --[no-]temporal-layers        Enable a temporal sublayer for unreferenced B frames. Default %s\n", OPT(param->bEnableTemporalSubLayers));
-    H0("   --[no-]aud                    Emit access unit delimiters at the start of each access unit. Default %s\n", OPT(param->bEnableAccessUnitDelimiters));
-    H1("   --hash <integer>              Decoded Picture Hash SEI 0: disabled, 1: MD5, 2: CRC, 3: Checksum. Default %d\n", param->decodedPictureHashSEI);
-    H0("   --atc-sei <integer>           Emit the alternative transfer characteristics SEI message where the integer is the preferred transfer characteristics. Default disabled\n");
-    H0("   --pic-struct <integer>        Set the picture structure and emits it in the picture timing SEI message. Values in the range 0..12. See D.3.3 of the HEVC spec. for a detailed explanation.\n");
-    H0("   --log2-max-poc-lsb <integer>  Maximum of the picture order count\n");
-    H0("   --[no-]vui-timing-info        Emit VUI timing information in the bistream. Default %s\n", OPT(param->bEmitVUITimingInfo));
-    H0("   --[no-]vui-hrd-info           Emit VUI HRD information in the bistream. Default %s\n", OPT(param->bEmitVUIHRDInfo));
-    H0("   --[no-]opt-qp-pps             Dynamically optimize QP in PPS (instead of default 26) based on QPs in previous GOP. Default %s\n", OPT(param->bOptQpPPS));
-    H0("   --[no-]opt-ref-list-length-pps  Dynamically set L0 and L1 ref list length in PPS (instead of default 0) based on values in last GOP. Default %s\n", OPT(param->bOptRefListLengthPPS));
-    H0("   --[no-]multi-pass-opt-rps     Enable storing commonly used RPS in SPS in multi pass mode. Default %s\n", OPT(param->bMultiPassOptRPS));
-    H0("   --[no-]opt-cu-delta-qp        Optimize to signal consistent CU level delta QPs in frame. Default %s\n", OPT(param->bOptCUDeltaQP));
-    H1("\nReconstructed video options (debugging):\n");
-    H1("-r/--recon <filename>            Reconstructed raw image YUV or Y4M output file name\n");
-    H1("   --recon-depth <integer>       Bit-depth of reconstructed raw image file. Defaults to input bit depth, or 8 if Y4M\n");
-    H1("   --recon-y4m-exec <string>     pipe reconstructed frames to Y4M viewer, ex:\"ffplay -i pipe:0 -autoexit\"\n");
-    H0("   --lowpass-dct                 Use low-pass subband dct approximation. Default %s\n", OPT(param->bLowPassDct));
-    H0("   --[no-]frame-dup              Enable Frame duplication. Default %s\n", OPT(param->bEnableFrameDuplication));
-    H0("   --dup-threshold <integer>     PSNR threshold for Frame duplication. Default %d\n", param->dupThreshold);
-#ifdef SVT_HEVC
-    H0("   --[no]svt                     Enable SVT HEVC encoder %s\n", OPT(param->bEnableSvtHevc));
-    H0("   --[no-]svt-hme                Enable Hierarchial motion estimation(HME) in SVT HEVC encoder \n");
-    H0("   --svt-search-width            Motion estimation search area width for SVT HEVC encoder \n");
-    H0("   --svt-search-height           Motion estimation search area height for SVT HEVC encoder \n");
-    H0("   --[no-]svt-compressed-ten-bit-format  Enable 8+2 encoding mode for 10bit input in SVT HEVC encoder \n");
-    H0("   --[no-]svt-speed-control      Enable speed control functionality to achieve real time encoding speed for  SVT HEVC encoder \n");
-    H0("   --svt-preset-tuner            Enable additional faster presets of SVT; This only has to be used on top of x265's ultrafast preset. Accepts values in the range of 0-2 \n");
-    H0("   --svt-hierarchical-level      Hierarchical layer for SVT-HEVC encoder; Accepts inputs in the range 0-3 \n");
-    H0("   --svt-base-layer-switch-mode  Select whether B/P slice should be used in base layer for SVT-HEVC encoder. 0-Use B-frames; 1-Use P frames in the base layer \n");
-    H0("   --svt-pred-struct             Select pred structure for SVT HEVC encoder;  Accepts inputs in the range 0-2 \n");
-    H0("   --[no-]svt-fps-in-vps         Enable VPS timing info for SVT HEVC encoder  \n");
-#endif
-    H1("\nExecutable return codes:\n");
-    H1("    0 - encode successful\n");
-    H1("    1 - unable to parse command line\n");
-    H1("    2 - unable to open encoder\n");
-    H1("    3 - unable to generate stream headers\n");
-    H1("    4 - encoder abort\n");
-#undef OPT
-#undef H0
-#undef H1
-    if (level < X265_LOG_DEBUG)
-        printf("\nUse --fullhelp for a full listing (or --log-level full --help)\n");
-    printf("\n\nComplete documentation may be found at http://x265.readthedocs.org/en/default/cli.html\n");
-    exit(1);
-}
+        /* in microseconds */
+        static const int UPDATE_INTERVAL = 250000;
+        CLIOptions()
+        {
+            input = NULL;
+            recon = NULL;
+            output = NULL;
+            qpfile = NULL;
+            zoneFile = NULL;
+            dolbyVisionRpu = NULL;
+            reconPlayCmd = NULL;
+            api = NULL;
+            param = NULL;
+            vmafData = NULL;
+            framesToBeEncoded = seek = 0;
+            totalbytes = 0;
+            bProgress = true;
+            bForceY4m = false;
+            startTime = x265_mdate();
+            prevUpdateTime = 0;
+            bDither = false;
+            isAbrLadderConfig = false;
+            enableScaler = false;
+            encName = NULL;
+            reuseName = NULL;
+            encId = 0;
+            refId = -1;
+            loadLevel = 0;
+            saveLevel = 0;
+            numRefs = 0;
+            argCnt = 0;
+        }
 
+        void destroy();
+        void printStatus(uint32_t frameNum);
+        bool parse(int argc, char **argv);
+        bool parseZoneParam(int argc, char **argv, x265_param* globalParam, int zonefileCount);
+        bool parseQPFile(x265_picture &pic_org);
+        bool parseZoneFile();
+        int rpuParser(x265_picture * pic);
+    };
 #ifdef __cplusplus
 }
 #endif
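
The help text removed in the diff above documents the `--zones` syntax: zones separated by `/`, each of the form `<start frame>,<end frame>,<option>`, where the option is `q=<integer>` (force QP) or `b=<float>` (bitrate multiplier). As a rough illustration of that format only, the following is a minimal C++ sketch of parsing such a string. It is not x265's own parser; `ZoneSketch`, `parseZoneSketch` and `parseZonesSketch` are hypothetical names introduced here, and range checks on frame numbers and QP values are left out.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical illustration type; x265's real zone structure differs.
struct ZoneSketch
{
    int   startFrame;
    int   endFrame;
    bool  forceQp;        // true for q=<integer>, false for b=<float>
    int   qp;             // meaningful only when forceQp is true
    float bitrateFactor;  // meaningful only when forceQp is false
};

// Parse one "<start frame>,<end frame>,<option>" entry.
// Returns false unless the whole entry matches one of the two option forms.
static bool parseZoneSketch(const std::string& text, ZoneSketch& zone)
{
    zone = ZoneSketch{0, 0, false, 0, 1.0f};

    // %n records how many characters were consumed, so trailing junk is rejected.
    int consumed = 0;
    if (std::sscanf(text.c_str(), "%d,%d,q=%d%n",
                    &zone.startFrame, &zone.endFrame, &zone.qp, &consumed) == 3 &&
        consumed == static_cast<int>(text.size()))
    {
        zone.forceQp = true;
        return true;
    }

    consumed = 0;
    if (std::sscanf(text.c_str(), "%d,%d,b=%f%n",
                    &zone.startFrame, &zone.endFrame, &zone.bitrateFactor, &consumed) == 3 &&
        consumed == static_cast<int>(text.size()))
    {
        zone.forceQp = false;
        return true;
    }
    return false;
}

// Split "<zone0>/<zone1>/..." on '/' and parse every entry.
// An empty entry (for example a trailing '/') is treated as an error.
static bool parseZonesSketch(const std::string& arg, std::vector<ZoneSketch>& zones)
{
    zones.clear();
    std::size_t begin = 0;
    while (begin <= arg.size())
    {
        std::size_t end = arg.find('/', begin);
        if (end == std::string::npos)
            end = arg.size();

        ZoneSketch zone;
        if (!parseZoneSketch(arg.substr(begin, end - begin), zone))
            return false;
        zones.push_back(zone);
        begin = end + 1;
    }
    return true;
}
```

Under these assumptions, `parseZonesSketch("0,100,q=20/101,200,b=0.5", zones)` yields two entries: a forced QP of 20 for frames 0 to 100, and a 0.5 bitrate multiplier for frames 101 to 200. An entry with any other option key, such as `0,100,x=3`, is rejected.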

Request History
Olaf Hering (olh) created request almost 5 years ago

Olaf Hering (olh) accepted request almost 5 years ago