Overview

Request 3913 (accepted)

Update to version 2.5

Submit package home:enzokiel:branches:Essentials / x265 to package Essentials / x265

x265.changes Changed
@@ -1,4 +1,57 @@
 -------------------------------------------------------------------
+Thu Jul 27 08:33:52 UTC 2017 - joerg.lorenzen@ki.tng.de
+
+- Update to version 2.5
+  Encoder enhancements
+  * Improved grain handling with --tune grain option by throttling
+    VBV operations to limit QP jumps.
+  * Frame threads are now decided based on number of threads
+    specified in the --pools, as opposed to the number of hardware
+    threads available. The mapping was also adjusted to improve
+    quality of the encodes with minimal impact to performance.
+  * CSV logging feature (enabled by --csv) is now part of the
+    library; it was previously part of the x265 application.
+    Applications that integrate libx265 can now extract frame level
+    statistics for their encodes by exercising this option in the
+    library.
+  * Globals that track min and max CU sizes, number of slices, and
+    other parameters have now been moved into instance-specific
+    variables. Consequently, applications that invoke multiple
+    instances of x265 library are no longer restricted to use the
+    same settings for these parameter options across the multiple
+    instances.
+  * x265 can now generate a separate library that exports the HDR10+
+    parsing API. Other libraries that wish to use this API may do
+    so by linking against this library. Enable ENABLE_HDR10_PLUS in
+    CMake options and build to generate this library.
+  * SEA motion search receives a 10% performance boost from AVX2
+    optimization of its kernels.
+  * The CSV log is now more elaborate with additional fields such
+    as PU statistics, average-min-max luma and chroma values, etc.
+    Refer to documentation of --csv for details of all fields.
+  * x86inc.asm cleaned-up for improved instruction handling.
+  API changes
+  * New API x265_encoder_ctu_info() introduced to specify suggested
+    partition sizes for various CTUs in a frame. To be used in
+    conjunction with --ctu-info to react to the specified
+    partitions appropriately.
+  * Rate-control statistics passed through the x265_picture object
+    for an incoming frame are now used by the encoder.
+  * Options to scale, reuse, and refine analysis for incoming
+    analysis shared through the x265_analysis_data field in
+    x265_picture for runs that use --analysis-reuse-mode load; use
+    options --scale, --refine-mv, --refine-inter, and
+    --refine-intra to explore.
+  * VBV now has a deterministic mode. Use --const-vbv to exercise.
+  Bug fixes
+  * Several fixes for HDR10+ parsing code including incompatibility
+    with user-specific SEI, removal of warnings, linking issues in
+    linux, etc.
+  * SEI messages for HDR10 repeated every keyint when HDR options
+    (--hdr-opt, --master-display) specified.
+- soname bump to 130.
+
+-------------------------------------------------------------------
 Thu Apr 27 14:15:13 UTC 2017 - joerg.lorenzen@ki.tng.de
 
 - Update to version 2.4
x265.spec Changed
@@ -1,10 +1,10 @@
 # based on the spec file from https://build.opensuse.org/package/view_file/home:Simmphonie/libx265/
 
 Name:           x265
-%define soname  116
+%define soname  130
 %define libname lib%{name}
 %define libsoname %{libname}-%{soname}
-Version:        2.4
+Version:        2.5
 Release:        0
 License:        GPL-2.0+
 Summary:        A free h265/HEVC encoder - encoder binary
baselibs.conf Changed
@@ -1,1 +1,1 @@
-libx265-116
+libx265-130
x265_2.4.tar.gz/source/dynamicHDR10/BasicStructures.cpp Deleted
@@ -1,40 +0,0 @@
-/**
- * @file                       BasicStructures.cpp
- * @brief                      Defines the structure of metadata parameters
- * @author                     Daniel Maximiliano Valenzuela, Seongnam Oh.
- * @create date                03/01/2017
- * @version                    0.0.1
- *
- * Copyright @ 2017 Samsung Electronics, DMS Lab, Samsung Research America and Samsung Research Tijuana
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version 2
- * of the License, or (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
- * MA 02110-1301, USA.
-**/
-
-#include "BasicStructures.h"
-#include "vector"
-
-struct PercentileLuminance{
-
-    float averageLuminance = 0.0;
-    float maxRLuminance = 0.0;
-    float maxGLuminance = 0.0;
-    float maxBLuminance = 0.0;
-    int order;
-    std::vector<unsigned int> percentiles;
-};
-
-
-
x265_2.4.tar.gz/.hg_archival.txt -> x265_2.5.tar.gz/.hg_archival.txt Changed
@@ -1,4 +1,4 @@
 repo: 09fe40627f03a0f9c3e6ac78b22ac93da23f9fdf
-node: e7a4dd48293b7956d4a20df257d23904cc78e376
+node: 64b2d0bf45a52511e57a6b7299160b961ca3d51c
 branch: stable
-tag: 2.4
+tag: 2.5
x265_2.4.tar.gz/.hgtags -> x265_2.5.tar.gz/.hgtags Changed
@@ -22,3 +22,4 @@
 981e3bfef16a997bce6f46ce1b15631a0e234747 2.1
 be14a7e9755e54f0fd34911c72bdfa66981220bc 2.2
 3037c1448549ca920967831482c653e5892fa8ed 2.3
+e7a4dd48293b7956d4a20df257d23904cc78e376 2.4
x265_2.4.tar.gz/doc/reST/api.rst -> x265_2.5.tar.gz/doc/reST/api.rst Changed
@@ -192,6 +192,12 @@
     *      presets is not recommended without a more fine-grained breakdown of
     *      parameters to take this into account. */
    int x265_encoder_reconfig(x265_encoder *, x265_param *);
+**x265_encoder_ctu_info**
+       /* x265_encoder_ctu_info:
+        *    Copy CTU information such as ctu address and ctu partition structure of all
+        *    CTUs in each frame. The function is invoked only if "--ctu-info" is enabled and
+        *    the encoder will wait for this copy to complete if enabled.
+        */
 
 Pictures
 ========
@@ -341,6 +347,14 @@
 Cleanup
 =======
 
+At the end of the encode, the application will want to trigger logging
+of the final encode statistics, if :option:`--csv` had been specified::
+
+   /* x265_encoder_log:
+    *       write a line to the configured CSV file. If a CSV filename was not
+    *       configured, or file open failed, this function will perform no write. */
+   void x265_encoder_log(x265_encoder *encoder, int argc, char **argv);
+   
 Finally, the encoder must be closed in order to free all of its
 resources. An encoder that has been flushed cannot be restarted and
 reused. Once **x265_encoder_close()** has been called, the encoder
x265_2.4.tar.gz/doc/reST/cli.rst -> x265_2.5.tar.gz/doc/reST/cli.rst Changed
@@ -52,8 +52,7 @@
    2. unable to open encoder
    3. unable to generate stream headers
    4. encoder abort
-   5. unable to open csv file
-
+   
 Logging/Statistic Options
 =========================
 
@@ -83,9 +82,66 @@
    it adds one line per run. If :option:`--csv-log-level` is greater than
    0, it writes one line per frame. Default none
 
-   Several frame performance statistics are available when 
-   :option:`--csv-log-level` is greater than or equal to 2:
-
+   The following statistics are available when :option:`--csv-log-level` is
+   greater than or equal to 1:
+   
+   **Encode Order** The frame order in which the encoder encodes.
+   
+   **Type** Slice type of the frame.
+   
+   **POC** Picture Order Count - The display order of the frames. 
+   
+   **QP** Quantization Parameter decided for the frame. 
+   
+   **Bits** Number of bits consumed by the frame.
+   
+   **Scenecut** 1 if the frame is a scenecut, 0 otherwise. 
+   
+   **RateFactor** Applicable only when CRF is enabled. The rate factor depends
+   on the CRF given by the user. This is used to determine the QP so as to 
+   target a certain quality.
+   
+   **BufferFill** Bits available for the next frame. Includes bits carried
+   over from the current frame.
+   
+   **Latency** Latency in terms of number of frames between when the frame 
+   was given in and when the frame is given out.
+   
+   **PSNR** Peak signal to noise ratio for Y, U and V planes.
+   
+   **SSIM** A quality metric that denotes the structural similarity between frames.
+   
+   **Ref lists** POC of references in lists 0 and 1 for the frame.
+   
+   Several statistics about the encoded bitstream and encoder performance are 
+   available when :option:`--csv-log-level` is greater than or equal to 2:
+   
+   **I/P cost ratio:** The ratio between the cost when a frame is decided as an
+   I frame to that when it is decided as a P frame as computed from the 
+   quarter-resolution frame in look-ahead. This, in combination with other parameters
+   such as position of the frame in the GOP, is used to decide scene transitions.
+   
+   **Analysis statistics:**
+   
+   **CU Statistics** percentage of CU modes.
+   
+   **Distortion** Average luma and chroma distortion. Calculated as
+   SSE is done on fenc and recon(after quantization).
+   
+   **Psy Energy**  Average psy energy calculated as the sum of absolute
+   difference between source and recon energy. Energy is measured by sa8d
+   minus SAD.
+   
+   **Residual Energy** Average residual energy. SSE is calculated on fenc 
+   and pred(before quantization).
+   
+   **Luma/Chroma Values** minimum, maximum and average(averaged by area)
+   luma and chroma values of source for each frame.
+   
+   **PU Statistics** percentage of PU modes at each depth.
+   
+   **Performance statistics:**
+   
    **DecideWait ms** number of milliseconds the frame encoder had to
    wait, since the previous frame was retrieved by the API thread,
    before a new frame has been given to it. This is the latency
@@ -111,6 +167,8 @@
    **Stall Time ms** the number of milliseconds of the reported wall
    time that were spent with zero worker threads, aka all compression
    was completely stalled.
+   
+   **Total frame time** Total time spent to encode the frame.
 
    **Avg WPP** the average number of worker threads working on this
    frame, at any given time. This value is sampled at the completion of
@@ -123,8 +181,6 @@
    is more of a problem for P frames where some blocks are much more
    expensive than others.
    
-   **CLI ONLY**
-
 .. option:: --csv-log-level <integer>
 
     Controls the level of detail (and size) of --csv log files
@@ -133,8 +189,6 @@
     1. frame level logging
     2. frame level logging with performance statistics
 
-    **CLI ONLY**
-
 .. option:: --ssim, --no-ssim
 
    Calculate and report Structural Similarity values. It is
@@ -795,33 +849,31 @@
 
 Analysis re-use options, to improve performance when encoding the same
 sequence multiple times (presumably at varying bitrates). The encoder
-will not reuse analysis if the resolution and slice type parameters do
-not match.
+will not reuse analysis if slice type parameters do not match.
 
-.. option:: --analysis-mode <string|int>
+.. option:: --analysis-reuse-mode <string|int>
 
-   Specify whether analysis information of each frame is output by encoder
-   or input for reuse. By reading the analysis data writen by an
-   earlier encode of the same sequence, substantial redundant work may
-   be avoided.
-
-   The following data may be stored and reused:
-   I frames   - split decisions and luma intra directions of all CUs.
-   P/B frames - motion vectors are dumped at each depth for all CUs.
+   This option allows reuse of analysis information from first pass to second pass.
+   :option:`--analysis-reuse-mode save` specifies that encoder outputs analysis information of each frame.
+   :option:`--analysis-reuse-mode load` specifies that encoder reuses analysis information from first pass.
+   There is no benefit using load mode without running encoder in save mode. Analysis data from save mode is
+   written to a file specified by :option:`--analysis-reuse-file`. The amount of analysis data stored/reused
+   is determined by :option:`--analysis-reuse-level`. By reading the analysis data written by an earlier encode
+   of the same sequence, substantial redundant work may be avoided. Requires cutree, pmode to be off. Default 0.
 
    **Values:** off(0), save(1): dump analysis data, load(2): read analysis data
 
-.. option:: --analysis-file <filename>
+.. option:: --analysis-reuse-file <filename>
 
-   Specify a filename for analysis data (see :option:`--analysis-mode`)
+   Specify a filename for analysis data (see :option:`--analysis-reuse-mode`)
    If no filename is specified, x265_analysis.dat is used.
 
-.. option:: --refine-level <1..10>
+.. option:: --analysis-reuse-level <1..10>
 
-   Amount of information stored/reused in :option:`--analysis-mode` is distributed across levels.
+   Amount of information stored/reused in :option:`--analysis-reuse-mode` is distributed across levels.
    Higher the value, higher the information stored/reused, faster the encode. Default 5.
 
-   Note that --refine-level must be paired with analysis-mode.
+   Note that --analysis-reuse-level must be paired with analysis-reuse-mode.
 
    +--------+-----------------------------------------+
    | Level  | Description                             |
@@ -835,6 +887,41 @@
    | 10     | Level 5 + Full CU analysis-info         |
    +--------+-----------------------------------------+
 
+.. option:: --scale-factor
+
+       Factor by which input video is scaled down for analysis save mode.
+       This option should be coupled with analysis-reuse-mode option, --analysis-reuse-level 10.
+       The ctu size of load should be double the size of save. Default 0.
+
+.. option:: --refine-intra <0|1|2>
+   
+   Enables refinement of intra blocks in current encode. 
+   
+   Level 0 - Forces both mode and depth from the previous encode.
+   
+   Level 1 - Evaluates all intra modes for blocks of size one smaller than 
+   the min-cu-size of the incoming analysis data from the previous encode, 
+   forces modes for blocks of larger size.
+   
+   Level 2 - Evaluates all intra modes for blocks of size one smaller than 
+   the min-cu-size of the incoming analysis data from the previous encode. 
+   For larger blocks, force only depth when angular mode is chosen by the 
+   previous encode, force depth and mode when other intra modes are chosen.
+   
+   Default 0.
+   
+.. option:: --refine-inter-depth
+
+   Enables refinement of inter blocks in current encode. Evaluates all 
+   inter modes for blocks of size one smaller than the min-cu-size of the 
+   incoming analysis data from the previous encode. Default disabled.
+
+.. option:: --refine-mv
+   
+   Enables refinement of motion vector for scaled video. Evaluates the best 
+   motion vector by searching the surrounding eight integer and subpel pixel
+    positions.
+
 Options which affect the transform unit quad-tree, sometimes referred to
 as the residual quad-tree (RQT).
 
@@ -1221,7 +1308,16 @@
    intra cost of a frame used in scenecut detection. For example, a value of 5 indicates,
    if the inter cost of a frame is greater than or equal to 95 percent of the intra cost of the frame,
    then detect this frame as scenecut. Values between 5 and 15 are recommended. Default 5. 
-   
+
+.. option:: --ctu-info <0, 1, 2, 4, 6>
+
+   This value enables receiving CTU information asynchronously and determine reaction to the CTU information. Default 0.
+   1: force the partitions if CTU information is present.
+   2: functionality of (1) and reduce qp if CTU information has changed.
+   4: functionality of (1) and force Inter modes when CTU Information has changed, merge/skip otherwise.
+   This option should be enabled only when planning to invoke the API function x265_encoder_ctu_info to copy ctu-info asynchronously. 
+   If enabled without calling the API function, the encoder will wait indefinitely.
+
 .. option:: --intra-refresh
 
    Enables Periodic Intra Refresh(PIR) instead of keyframe insertion.
@@ -1491,7 +1587,11 @@
     and also redundant steps are skipped.
     In pass 1 analysis information like motion vector, depth, reference and prediction
     modes of the final best CTU partition is stored for each CTU.
-    Default disabled.
+    Multipass analysis refinement cannot be enabled when 'analysis-save/analysis-load' option
+    is enabled and both will be disabled when enabled together. This feature requires 'pmode/pme'
+    to be disabled and hence pmode/pme will be disabled when enabled at the same time.
+
+    Default: disabled.
 
 .. option:: --multi-pass-opt-distortion, --no-multi-pass-opt-distortion
 
@@ -1499,7 +1599,11 @@
     ratecontrol. In pass 1 distortion of best CTU partition is stored. CTUs with high
     distortion get lower(negative)qp offsets and vice-versa for low distortion CTUs in pass 2.
     This helps to improve the subjective quality.
-    Default disabled.
+    Multipass refinement of qp cannot be enabled when 'analysis-save/analysis-load' option
+    is enabled and both will be disabled when enabled together. 'multi-pass-opt-distortion' 
+    requires 'pmode/pme' to be disabled and hence pmode/pme will be disabled when enabled along with it.
+
+    Default: disabled.
 
 .. option:: --strict-cbr, --no-strict-cbr
    
@@ -1573,6 +1677,11 @@
    that this option is used through the tune grain feature where a combination 
    of param options are used to improve visual quality.
    
+.. option:: --const-vbv, --no-const-vbv
+
+   Enables VBV algorithm to be consistent across runs. Default disabled. 
+   Enabled when :option:'--tune' grain is applied.
+   
 .. option:: --qblur <float>
 
    Temporally blur quants. Default 0.5
@@ -1879,7 +1988,12 @@
    
 .. option:: --dhdr10-info <filename>
 
-   Inserts tone mapping information as an SEI message.
+   Inserts tone mapping information as an SEI message. It takes as input, 
+   the path to the JSON file containing the Creative Intent Metadata 
+   to be encoded as Dynamic Tone Mapping into the bitstream. 
+   
+   Click `here <https://www.sra.samsung.com/assets/User-data-registered-itu-t-t35-SEI-message-for-ST-2094-40-v1.1.pdf>`_
+   for the syntax of the metadata file. A sample JSON file is available in `the downloads page <https://bitbucket.org/multicoreware/x265/downloads/DCIP3_4K_to_400_dynamic.json>`_
    
 .. option:: --dhdr10-opt, --no-dhdr10-opt
 
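Review note on the cli.rst diff: the documented `--ctu-info` values (0, 1, 2, 4, 6) read like a bit mask in which 2 and 4 each include the behaviour of 1. The value 6 is listed but not described, so treating it as "2 and 4 combined" is an assumption. The sketch below decodes a value under that reading; the enum and function names are hypothetical and are not part of x265.

```cpp
#include <cassert>
#include <string>

// Hypothetical names; only the numeric values come from the --ctu-info text.
enum CtuInfoFlag : unsigned {
    kForcePartitions = 1u, // 1: force the partitions if CTU information is present
    kReduceQp        = 2u, // 2: (1) plus reduce qp if CTU information has changed
    kForceInter      = 4u  // 4: (1) plus force inter modes when it has changed
};

// Describe the reactions implied by a --ctu-info value, assuming it is a bit mask.
std::string describeCtuInfo(unsigned value)
{
    if (value == 0)
        return "disabled";
    // Only the values listed in the option synopsis are accepted here.
    if (value != 1 && value != 2 && value != 4 && value != 6)
        return "invalid";
    std::string out = "force-partitions"; // every non-zero value includes (1)
    if (value & kReduceQp)
        out += "+reduce-qp";
    if (value & kForceInter)
        out += "+force-inter";
    return out;
}
```

Under this reading, `describeCtuInfo(6)` reports all three reactions, while `describeCtuInfo(3)` is rejected because 3 is not among the documented values.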
x265_2.4.tar.gz/doc/reST/releasenotes.rst -> x265_2.5.tar.gz/doc/reST/releasenotes.rst Changed
@@ -2,8 +2,33 @@
 Release Notes
 *************
 
-Release Notes
-*************
+Version 2.5
+===========
+
+Release date - 13th July, 2017.
+
+Encoder enhancements
+--------------------
+1. Improved grain handling with :option:`--tune` grain option by throttling VBV operations to limit QP jumps.
+2. Frame threads are now decided based on number of threads specified in the :option:`--pools`, as opposed to the number of hardware threads available. The mapping was also adjusted to improve quality of the encodes with minimal impact to performance.
+3. CSV logging feature (enabled by :option:`--csv`) is now part of the library; it was previously part of the x265 application. Applications that integrate libx265 can now extract frame level statistics for their encodes by exercising this option in the library.
+4.  Globals that track min and max CU sizes, number of slices, and other parameters have now been moved into instance-specific variables. Consequently, applications that invoke multiple instances of x265 library are no longer restricted to use the same settings for these parameter options across the multiple instances.
+5. x265 can now generate a separate library that exports the HDR10+ parsing API. Other libraries that wish to use this API may do so by linking against this library. Enable ENABLE_HDR10_PLUS in CMake options and build to generate this library.
+6. SEA motion search receives a 10% performance boost from AVX2 optimization of its kernels.
+7. The CSV log is now more elaborate with additional fields such as PU statistics, average-min-max luma and chroma values, etc. Refer to documentation of :option:`--csv` for details of all fields.
+8. x86inc.asm cleaned-up for improved instruction handling.
+
+API changes
+-----------
+1. New API x265_encoder_ctu_info() introduced to specify suggested partition sizes for various CTUs in a frame. To be used in conjunction with :option:`--ctu-info` to react to the specified partitions appropriately.
+2. Rate-control statistics passed through the x265_picture object for an incoming frame are now used by the encoder.
+3. Options to scale, reuse, and refine analysis for incoming analysis shared through the x265_analysis_data field in x265_picture for runs that use :option:`--analysis-reuse-mode` load; use options :option:`--scale`, :option:`--refine-mv`, :option:`--refine-inter`, and :option:`--refine-intra` to explore. 
+4. VBV now has a deterministic mode. Use :option:`--const-vbv` to exercise.
+
+Bug fixes
+---------
+1. Several fixes for HDR10+ parsing code including incompatibility with user-specific SEI, removal of warnings, linking issues in linux, etc.
+2. SEI messages for HDR10 repeated every keyint when HDR options (:option:`--hdr-opt`, :option:`--master-display`) specified.
 
 Version 2.4
 ===========
x265_2.4.tar.gz/source/CMakeLists.txt -> x265_2.5.tar.gz/source/CMakeLists.txt Changed
@@ -29,7 +29,7 @@
 option(STATIC_LINK_CRT "Statically link C runtime for release builds" OFF)
 mark_as_advanced(FPROFILE_USE FPROFILE_GENERATE NATIVE_BUILD)
 # X265_BUILD must be incremented each time the public API is changed
-set(X265_BUILD 116)
+set(X265_BUILD 130)
 configure_file("${PROJECT_SOURCE_DIR}/x265.def.in"
                "${PROJECT_BINARY_DIR}/x265.def")
 configure_file("${PROJECT_SOURCE_DIR}/x265_config.h.in"
@@ -182,12 +182,19 @@
     add_definitions(-O3 -qstrict -qhot -qaltivec)
     add_definitions(-qinline=level=10 -qpath=IL:/data/video_files/latest.tpo/)
 endif()
-
-
+# this option is to enable the inclusion of dynamic HDR10 library to the libx265 compilation
+option(ENABLE_HDR10_PLUS "Enable dynamic HDR10 compilation" OFF)
 if(GCC)
     add_definitions(-Wall -Wextra -Wshadow)
     add_definitions(-D__STDC_LIMIT_MACROS=1)
-    add_definitions(-std=gnu++98)
+    if(ENABLE_HDR10_PLUS)
+        if(CMAKE_CXX_COMPILER_VERSION VERSION_LESS "4.8")
+            message(FATAL_ERROR "gcc version above 4.8 required to support hdr10plus")
+        endif()
+        add_definitions(-std=gnu++11)
+    else()
+        add_definitions(-std=gnu++98)
+    endif()
     if(ENABLE_PIC)
          add_definitions(-fPIC)
     endif(ENABLE_PIC)
@@ -363,14 +370,12 @@
 else(HIGH_BIT_DEPTH)
     add_definitions(-DHIGH_BIT_DEPTH=0 -DX265_DEPTH=8)
 endif(HIGH_BIT_DEPTH)
-# this option is to enable the inclusion of dynamic HDR10 library to the libx265 compilation
-option(ENABLE_DYNAMIC_HDR10 "Enable dynamic HDR10 compilation" OFF)
-if (ENABLE_DYNAMIC_HDR10)
-    add_subdirectory(dynamicHDR10)
-    include_directories(dynamicHDR10)
-    add_definitions(-DENABLE_DYNAMIC_HDR10)
-endif(ENABLE_DYNAMIC_HDR10)
 
+if (ENABLE_HDR10_PLUS)
+    include_directories(. dynamicHDR10 "${PROJECT_BINARY_DIR}")
+    add_subdirectory(dynamicHDR10)
+    add_definitions(-DENABLE_HDR10_PLUS)
+endif(ENABLE_HDR10_PLUS)
 # this option can only be used when linking multiple libx265 libraries
 # together, and some alternate API access method is implemented.
 option(EXPORT_C_API "Implement public C programming interface" ON)
@@ -510,8 +515,10 @@
     endif()
 endif()
 source_group(ASM FILES ${ASM_SRCS})
-if(ENABLE_DYNAMIC_HDR10)
+if(ENABLE_HDR10_PLUS)
     add_library(x265-static STATIC $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> $<TARGET_OBJECTS:dynamicHDR10> ${ASM_OBJS} ${ASM_SRCS})
+    add_library(hdr10plus-static STATIC $<TARGET_OBJECTS:dynamicHDR10>)
+    set_target_properties(hdr10plus-static PROPERTIES OUTPUT_NAME hdr10plus)
 else()
     add_library(x265-static STATIC $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> ${ASM_OBJS} ${ASM_SRCS})
 endif()
@@ -524,6 +531,12 @@
 install(TARGETS x265-static
     LIBRARY DESTINATION ${LIB_INSTALL_DIR}
     ARCHIVE DESTINATION ${LIB_INSTALL_DIR})
+
+if(ENABLE_HDR10_PLUS)
+    install(TARGETS hdr10plus-static
+        LIBRARY DESTINATION ${LIB_INSTALL_DIR}
+        ARCHIVE DESTINATION ${LIB_INSTALL_DIR})
+endif()
 install(FILES x265.h "${PROJECT_BINARY_DIR}/x265_config.h" DESTINATION include)
 
 if(CMAKE_RC_COMPILER)
@@ -547,10 +560,16 @@
 endif()
 option(ENABLE_SHARED "Build shared library" ON)
 if(ENABLE_SHARED)
-
-    if(ENABLE_DYNAMIC_HDR10)
+    if(ENABLE_HDR10_PLUS)
         add_library(x265-shared SHARED "${PROJECT_BINARY_DIR}/x265.def" ${ASM_OBJS}
                     ${X265_RC_FILE} $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> $<TARGET_OBJECTS:dynamicHDR10>)
+        add_library(hdr10plus-shared SHARED $<TARGET_OBJECTS:dynamicHDR10>)
+
+        if(MSVC)
+            set_target_properties(hdr10plus-shared PROPERTIES OUTPUT_NAME libhdr10plus)
+        else()
+            set_target_properties(hdr10plus-shared PROPERTIES OUTPUT_NAME hdr10plus)
+        endif()
     else()
         add_library(x265-shared SHARED "${PROJECT_BINARY_DIR}/x265.def" ${ASM_OBJS}
                    ${X265_RC_FILE} $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common>)
@@ -585,6 +604,11 @@
                 ARCHIVE DESTINATION ${LIB_INSTALL_DIR}
                 RUNTIME DESTINATION ${BIN_INSTALL_DIR})
     endif()
+    if(ENABLE_HDR10_PLUS)
+        install(TARGETS hdr10plus-shared
+            LIBRARY DESTINATION ${LIB_INSTALL_DIR}
+            ARCHIVE DESTINATION ${LIB_INSTALL_DIR})
+    endif()
     if(LINKER_OPTIONS)
         # set_target_properties can't do list expansion
         string(REPLACE ";" " " LINKER_OPTION_STR "${LINKER_OPTIONS}")
@@ -646,18 +670,18 @@
     endif(WIN32)
     if(XCODE)
         # Xcode seems unable to link the CLI with libs, so link as one targget
-        if(ENABLE_DYNAMIC_HDR10)
+        if(ENABLE_HDR10_PLUS)
         add_executable(cli ../COPYING ${InputFiles} ${OutputFiles} ${GETOPT}
-                        x265.cpp x265.h x265cli.h x265-extras.h x265-extras.cpp
+                        x265.cpp x265.h x265cli.h
                         $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> $<TARGET_OBJECTS:dynamicHDR10> ${ASM_OBJS} ${ASM_SRCS})
         else()
             add_executable(cli ../COPYING ${InputFiles} ${OutputFiles} ${GETOPT}
-                        x265.cpp x265.h x265cli.h x265-extras.h x265-extras.cpp
+                        x265.cpp x265.h x265cli.h
                         $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> ${ASM_OBJS} ${ASM_SRCS})
         endif()
     else()
         add_executable(cli ../COPYING ${InputFiles} ${OutputFiles} ${GETOPT} ${X265_RC_FILE}
-                       ${ExportDefs} x265.cpp x265.h x265cli.h x265-extras.h x265-extras.cpp)
+                       ${ExportDefs} x265.cpp x265.h x265cli.h)
         if(WIN32 OR NOT ENABLE_SHARED OR INTEL_CXX)
             # The CLI cannot link to the shared library on Windows, it
             # requires internal APIs not exported from the DLL
x265_2.4.tar.gz/source/common/CMakeLists.txt -> x265_2.5.tar.gz/source/common/CMakeLists.txt Changed
@@ -57,10 +57,10 @@
     set(VEC_PRIMITIVES vec/vec-primitives.cpp ${PRIMITIVES})
     source_group(Intrinsics FILES ${VEC_PRIMITIVES})
 
-    set(C_SRCS asm-primitives.cpp pixel.h mc.h ipfilter8.h blockcopy8.h dct8.h loopfilter.h)
+    set(C_SRCS asm-primitives.cpp pixel.h mc.h ipfilter8.h blockcopy8.h dct8.h loopfilter.h seaintegral.h)
     set(A_SRCS pixel-a.asm const-a.asm cpu-a.asm ssd-a.asm mc-a.asm
                mc-a2.asm pixel-util8.asm blockcopy8.asm
-               pixeladd8.asm dct8.asm)
+               pixeladd8.asm dct8.asm seaintegral.asm)
     if(HIGH_BIT_DEPTH)
         set(A_SRCS ${A_SRCS} sad16-a.asm intrapred16.asm ipfilter16.asm loopfilter.asm)
     else()
x265_2.4.tar.gz/source/common/common.h -> x265_2.5.tar.gz/source/common/common.h Changed
@@ -259,7 +259,6 @@
 #define LOG2_RASTER_SIZE        (MAX_LOG2_CU_SIZE - LOG2_UNIT_SIZE)
 #define RASTER_SIZE             (1 << LOG2_RASTER_SIZE)
 #define MAX_NUM_PARTITIONS      (RASTER_SIZE * RASTER_SIZE)
-#define NUM_4x4_PARTITIONS      (1U << (g_unitSizeDepth << 1)) // number of 4x4 units in max CU size
 
 #define MIN_PU_SIZE             4
 #define MIN_TU_SIZE             4
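Review note on the common.h diff: the removed `NUM_4x4_PARTITIONS` macro derived the number of 4x4 units in the largest CU from the global `g_unitSizeDepth`. A minimal sketch of the same arithmetic without globals follows. It assumes the unit-size depth equals log2 of the CTU size minus log2 of the 4x4 unit size (2); the function names are hypothetical and only illustrate the computation, not the per-instance field x265 now uses.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: a "unit" is 4x4 pixels, so its log2 size is 2.
constexpr uint32_t kLog2UnitSize = 2;

// Number of 4x4 units in a square CTU whose side is 2^log2CtuSize pixels.
// Same shape as the removed macro: 1U << (unitSizeDepth << 1).
constexpr uint32_t num4x4Partitions(uint32_t log2CtuSize)
{
    return 1u << ((log2CtuSize - kLog2UnitSize) << 1);
}

// Number of 4x4 units covered by one CU at the given quad-tree depth;
// each extra depth level quarters the area, hence the shift by depth * 2.
constexpr uint32_t partitionsAtDepth(uint32_t log2CtuSize, uint32_t depth)
{
    return num4x4Partitions(log2CtuSize) >> (depth * 2);
}
```

For a 64x64 CTU (`log2CtuSize` of 6) this gives 256 units at depth 0 and 64 units at depth 1, which is why two encoder instances with different CTU sizes cannot share a single global value.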
x265_2.4.tar.gz/source/common/constants.cpp -> x265_2.5.tar.gz/source/common/constants.cpp Changed
@@ -161,7 +161,6 @@
     65535
 };
 
-int      g_ctuSizeConfigured = 0;
 uint32_t g_maxLog2CUSize = MAX_LOG2_CU_SIZE;
 uint32_t g_maxCUSize     = MAX_CU_SIZE;
 uint32_t g_unitSizeDepth = NUM_CU_DEPTH;
x265_2.4.tar.gz/source/common/constants.h -> x265_2.5.tar.gz/source/common/constants.h Changed
@@ -30,8 +30,6 @@
 namespace X265_NS {
 // private namespace
 
-extern int g_ctuSizeConfigured;
-
 extern double x265_lambda_tab[QP_MAX_MAX + 1];
 extern double x265_lambda2_tab[QP_MAX_MAX + 1];
 extern const uint16_t x265_chroma_lambda2_offset_tab[MAX_CHROMA_LAMBDA_OFFSET + 1];
x265_2.4.tar.gz/source/common/cpu.cpp -> x265_2.5.tar.gz/source/common/cpu.cpp Changed
@@ -69,6 +69,7 @@
     { "SSE2Slow",    SSE2 | X265_CPU_SSE2_IS_SLOW },
     { "SSE2",        SSE2 },
     { "SSE2Fast",    SSE2 | X265_CPU_SSE2_IS_FAST },
+    { "LZCNT", X265_CPU_LZCNT },
     { "SSE3",        SSE2 | X265_CPU_SSE3 },
     { "SSSE3",       SSE2 | X265_CPU_SSE3 | X265_CPU_SSSE3 },
     { "SSE4.1",      SSE2 | X265_CPU_SSE3 | X265_CPU_SSSE3 | X265_CPU_SSE4 },
@@ -78,16 +79,17 @@
     { "AVX",         AVX },
     { "XOP",         AVX | X265_CPU_XOP },
     { "FMA4",        AVX | X265_CPU_FMA4 },
-    { "AVX2",        AVX | X265_CPU_AVX2 },
     { "FMA3",        AVX | X265_CPU_FMA3 },
+    { "BMI1",        AVX | X265_CPU_LZCNT | X265_CPU_BMI1 },
+    { "BMI2",        AVX | X265_CPU_LZCNT | X265_CPU_BMI1 | X265_CPU_BMI2 },
+#define AVX2 AVX | X265_CPU_FMA3 | X265_CPU_LZCNT | X265_CPU_BMI1 | X265_CPU_BMI2 | X265_CPU_AVX2
+    { "AVX2", AVX2},
+#undef AVX2
 #undef AVX
 #undef SSE2
 #undef MMX2
     { "Cache32",         X265_CPU_CACHELINE_32 },
     { "Cache64",         X265_CPU_CACHELINE_64 },
-    { "LZCNT",           X265_CPU_LZCNT },
-    { "BMI1",            X265_CPU_BMI1 },
-    { "BMI2",            X265_CPU_BMI1 | X265_CPU_BMI2 },
     { "SlowCTZ",         X265_CPU_SLOW_CTZ },
     { "SlowAtom",        X265_CPU_SLOW_ATOM },
     { "SlowPshufb",      X265_CPU_SLOW_PSHUFB },
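Note on the cpu.cpp hunk: the "AVX2" table entry changes from `AVX | X265_CPU_AVX2` to a mask that also carries FMA3, LZCNT, BMI1 and BMI2. The sketch below models that change with hypothetical bit values (the `k...` names and their values are assumptions made for illustration; the real `X265_CPU_*` constants live in the x265 headers), only to show what "implies" means for such a mask.

```cpp
#include <cstdint>

// Hypothetical bit values, for illustration only.
constexpr uint32_t kAvx   = 1u << 0;  // stands in for the AVX macro used in the table
constexpr uint32_t kFma3  = 1u << 1;
constexpr uint32_t kLzcnt = 1u << 2;
constexpr uint32_t kBmi1  = 1u << 3;
constexpr uint32_t kBmi2  = 1u << 4;
constexpr uint32_t kAvx2  = 1u << 5;

// Mask behind the "AVX2" name before and after the hunk.
constexpr uint32_t kAvx2EntryOld = kAvx | kAvx2;
constexpr uint32_t kAvx2EntryNew = kAvx | kFma3 | kLzcnt | kBmi1 | kBmi2 | kAvx2;

// True when every bit in `required` is also set in `mask`.
constexpr bool impliesAll(uint32_t mask, uint32_t required)
{
    return (mask & required) == required;
}
```

With these stand-in values, `impliesAll(kAvx2EntryNew, kBmi2)` holds while `impliesAll(kAvx2EntryOld, kBmi2)` does not.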
x265_2.4.tar.gz/source/common/cudata.cpp -> x265_2.5.tar.gz/source/common/cudata.cpp Changed
@@ -28,6 +28,7 @@
 #include "picyuv.h"
 #include "mv.h"
 #include "cudata.h"
+#define MAX_MV 1 << 14
 
 using namespace X265_NS;
 
@@ -110,25 +111,23 @@
 
 }
 
-cubcast_t CUData::s_partSet[NUM_FULL_DEPTH] = { NULL, NULL, NULL, NULL, NULL };
-uint32_t CUData::s_numPartInCUSize;
-
 CUData::CUData()
 {
     memset(this, 0, sizeof(*this));
 }
 
-void CUData::initialize(const CUDataMemPool& dataPool, uint32_t depth, int csp, int instance)
+void CUData::initialize(const CUDataMemPool& dataPool, uint32_t depth, const x265_param& param, int instance)
 {
+    int csp = param.internalCsp;
     m_chromaFormat  = csp;
     m_hChromaShift  = CHROMA_H_SHIFT(csp);
     m_vChromaShift  = CHROMA_V_SHIFT(csp);
-    m_numPartitions = NUM_4x4_PARTITIONS >> (depth * 2);
+    m_numPartitions = param.num4x4Partitions >> (depth * 2);
 
     if (!s_partSet[0])
     {
-        s_numPartInCUSize = 1 << g_unitSizeDepth;
-        switch (g_maxLog2CUSize)
+        s_numPartInCUSize = 1 << param.unitSizeDepth;
+        switch (param.maxLog2CUSize)
         {
         case 6:
             s_partSet[0] = bcast256;
@@ -220,7 +219,7 @@
 
         m_distortion = dataPool.distortionMemBlock + instance * m_numPartitions;
 
-        uint32_t cuSize = g_maxCUSize >> depth;
+        uint32_t cuSize = param.maxCUSize >> depth;
         m_trCoeff[0] = dataPool.trCoeffMemBlock + instance * (cuSize * cuSize);
         m_trCoeff[1] = m_trCoeff[2] = 0;
         m_transformSkip[1] = m_transformSkip[2] = m_cbf[1] = m_cbf[2] = 0;
@@ -262,7 +261,7 @@
 
         m_distortion = dataPool.distortionMemBlock + instance * m_numPartitions;
 
-        uint32_t cuSize = g_maxCUSize >> depth;
+        uint32_t cuSize = param.maxCUSize >> depth;
         uint32_t sizeL = cuSize * cuSize;
         uint32_t sizeC = sizeL >> (m_hChromaShift + m_vChromaShift); // block chroma part
         m_trCoeff[0] = dataPool.trCoeffMemBlock + instance * (sizeL + sizeC * 2);
@@ -278,17 +277,17 @@
     m_encData       = frame.m_encData;
     m_slice         = m_encData->m_slice;
     m_cuAddr        = cuAddr;
-    m_cuPelX        = (cuAddr % m_slice->m_sps->numCuInWidth) << g_maxLog2CUSize;
-    m_cuPelY        = (cuAddr / m_slice->m_sps->numCuInWidth) << g_maxLog2CUSize;
+    m_cuPelX        = (cuAddr % m_slice->m_sps->numCuInWidth) << m_slice->m_param->maxLog2CUSize;
+    m_cuPelY        = (cuAddr / m_slice->m_sps->numCuInWidth) << m_slice->m_param->maxLog2CUSize;
     m_absIdxInCTU   = 0;
-    m_numPartitions = NUM_4x4_PARTITIONS;
+    m_numPartitions = m_encData->m_param->num4x4Partitions;
     m_bFirstRowInSlice = (uint8_t)firstRowInSlice;
     m_bLastRowInSlice  = (uint8_t)lastRowInSlice;
     m_bLastCuInSlice   = (uint8_t)lastCuInSlice;
 
     /* sequential memsets */
     m_partSet((uint8_t*)m_qp, (uint8_t)qp);
-    m_partSet(m_log2CUSize,   (uint8_t)g_maxLog2CUSize);
+    m_partSet(m_log2CUSize,   (uint8_t)m_slice->m_param->maxLog2CUSize);
     m_partSet(m_lumaIntraDir, (uint8_t)ALL_IDX);
     m_partSet(m_chromaIntraDir, (uint8_t)ALL_IDX);
     m_partSet(m_tqBypass,     (uint8_t)frame.m_encData->m_param->bLossless);
@@ -390,7 +389,7 @@
 
     memcpy(m_distortion + offset, subCU.m_distortion, childGeom.numPartitions * sizeof(sse_t));
 
-    uint32_t tmp = 1 << ((g_maxLog2CUSize - childGeom.depth) * 2);
+    uint32_t tmp = 1 << ((m_slice->m_param->maxLog2CUSize - childGeom.depth) * 2);
     uint32_t tmp2 = subPartIdx * tmp;
     memcpy(m_trCoeff[0] + tmp2, subCU.m_trCoeff[0], sizeof(coeff_t)* tmp);
 
@@ -489,7 +488,7 @@
 
     memcpy(ctu.m_distortion + m_absIdxInCTU, m_distortion, m_numPartitions * sizeof(sse_t));
 
-    uint32_t tmpY = 1 << ((g_maxLog2CUSize - depth) * 2);
+    uint32_t tmpY = 1 << ((m_slice->m_param->maxLog2CUSize - depth) * 2);
     uint32_t tmpY2 = m_absIdxInCTU << (LOG2_UNIT_SIZE * 2);
     memcpy(ctu.m_trCoeff[0] + tmpY2, m_trCoeff[0], sizeof(coeff_t)* tmpY);
 
@@ -568,7 +567,7 @@
     m_partCopy(ctu.m_tuDepth + m_absIdxInCTU, m_tuDepth);
     m_partCopy(ctu.m_cbf[0] + m_absIdxInCTU, m_cbf[0]);
 
-    uint32_t tmpY = 1 << ((g_maxLog2CUSize - depth) * 2);
+    uint32_t tmpY = 1 << ((m_slice->m_param->maxLog2CUSize - depth) * 2);
     uint32_t tmpY2 = m_absIdxInCTU << (LOG2_UNIT_SIZE * 2);
     memcpy(ctu.m_trCoeff[0] + tmpY2, m_trCoeff[0], sizeof(coeff_t)* tmpY);
 
@@ -656,7 +655,7 @@
         return m_cuLeft;
     }
 
-    alPartUnitIdx = NUM_4x4_PARTITIONS - 1;
+    alPartUnitIdx = m_encData->m_param->num4x4Partitions - 1;
     return m_cuAboveLeft;
 }
 
@@ -799,7 +798,7 @@
 /* Get left QpMinCu */
 const CUData* CUData::getQpMinCuLeft(uint32_t& lPartUnitIdx, uint32_t curAbsIdxInCTU) const
 {
-    uint32_t absZorderQpMinCUIdx = curAbsIdxInCTU & (0xFF << (g_unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2);
+    uint32_t absZorderQpMinCUIdx = curAbsIdxInCTU & (0xFF << (m_encData->m_param->unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2);
     uint32_t absRorderQpMinCUIdx = g_zscanToRaster[absZorderQpMinCUIdx];
 
     // check for left CTU boundary
@@ -816,7 +815,7 @@
 /* Get above QpMinCu */
 const CUData* CUData::getQpMinCuAbove(uint32_t& aPartUnitIdx, uint32_t curAbsIdxInCTU) const
 {
-    uint32_t absZorderQpMinCUIdx = curAbsIdxInCTU & (0xFF << (g_unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2);
+    uint32_t absZorderQpMinCUIdx = curAbsIdxInCTU & (0xFF << (m_encData->m_param->unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2);
     uint32_t absRorderQpMinCUIdx = g_zscanToRaster[absZorderQpMinCUIdx];
 
     // check for top CTU boundary
@@ -855,7 +854,7 @@
 
 int8_t CUData::getLastCodedQP(uint32_t absPartIdx) const
 {
-    uint32_t quPartIdxMask = 0xFF << (g_unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2;
+    uint32_t quPartIdxMask = 0xFF << (m_encData->m_param->unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2;
     int lastValidPartIdx = getLastValidPartIdx(absPartIdx & quPartIdxMask);
 
     if (lastValidPartIdx >= 0)
@@ -865,7 +864,7 @@
         if (m_absIdxInCTU)
             return m_encData->getPicCTU(m_cuAddr)->getLastCodedQP(m_absIdxInCTU);
         else if (m_cuAddr > 0 && !(m_slice->m_pps->bEntropyCodingSyncEnabled && !(m_cuAddr % m_slice->m_sps->numCuInWidth)))
-            return m_encData->getPicCTU(m_cuAddr - 1)->getLastCodedQP(NUM_4x4_PARTITIONS);
+            return m_encData->getPicCTU(m_cuAddr - 1)->getLastCodedQP(m_encData->m_param->num4x4Partitions);
         else
             return (int8_t)m_slice->m_sliceQp;
     }
@@ -997,7 +996,7 @@
 
 bool CUData::setQPSubCUs(int8_t qp, uint32_t absPartIdx, uint32_t depth)
 {
-    uint32_t curPartNumb = NUM_4x4_PARTITIONS >> (depth << 1);
+    uint32_t curPartNumb = m_encData->m_param->num4x4Partitions >> (depth << 1);
     uint32_t curPartNumQ = curPartNumb >> 2;
 
     if (m_cuDepth[absPartIdx] > depth)
@@ -1623,6 +1622,11 @@
                 dir |= (1 << list);
                 candMvField[count][list].mv = colmv;
                 candMvField[count][list].refIdx = refIdx;
+                if (m_encData->m_param->scaleFactor && m_encData->m_param->analysisReuseMode == X265_ANALYSIS_SAVE && m_log2CUSize[0] < 4)
+                {
+                    MV dist(MAX_MV, MAX_MV);
+                    candMvField[count][list].mv = dist;
+                }
             }
         }
 
@@ -1783,7 +1787,13 @@
             int curRefPOC = m_slice->m_refPOCList[picList][refIdx];
             int curPOC = m_slice->m_poc;
 
-            pmv[numMvc++] = amvpCand[num++] = scaleMvByPOCDist(neighbours[MD_COLLOCATED].mv[picList], curPOC, curRefPOC, colPOC, colRefPOC);
+            if (m_encData->m_param->scaleFactor && m_encData->m_param->analysisReuseMode == X265_ANALYSIS_SAVE && (m_log2CUSize[0] < 4))
+            {
+                MV dist(MAX_MV, MAX_MV);
+                pmv[numMvc++] = amvpCand[num++] = dist;
+            }
+            else
+                pmv[numMvc++] = amvpCand[num++] = scaleMvByPOCDist(neighbours[MD_COLLOCATED].mv[picList], curPOC, curRefPOC, colPOC, colRefPOC);
         }
     }
 
@@ -1905,10 +1915,10 @@
     uint32_t offset = 8;
 
     int16_t xmax = (int16_t)((m_slice->m_sps->picWidthInLumaSamples + offset - m_cuPelX - 1) << mvshift);
-    int16_t xmin = -(int16_t)((g_maxCUSize + offset + m_cuPelX - 1) << mvshift);
+    int16_t xmin = -(int16_t)((m_encData->m_param->maxCUSize + offset + m_cuPelX - 1) << mvshift);
 
     int16_t ymax = (int16_t)((m_slice->m_sps->picHeightInLumaSamples + offset - m_cuPelY - 1) << mvshift);
-    int16_t ymin = -(int16_t)((g_maxCUSize + offset + m_cuPelY - 1) << mvshift);
+    int16_t ymin = -(int16_t)((m_encData->m_param->maxCUSize + offset + m_cuPelY - 1) << mvshift);
 
     outMV.x = X265_MIN(xmax, X265_MAX(xmin, outMV.x));
     outMV.y = X265_MIN(ymax, X265_MAX(ymin, outMV.y));
@@ -2090,6 +2100,8 @@
 
 void CUData::calcCTUGeoms(uint32_t ctuWidth, uint32_t ctuHeight, uint32_t maxCUSize, uint32_t minCUSize, CUGeom cuDataArray[CUGeom::MAX_GEOMS])
 {
+    uint32_t num4x4Partition = (1U << ((g_log2Size[maxCUSize] - LOG2_UNIT_SIZE) << 1));
+
     // Initialize the coding blocks inside the CTB
     for (uint32_t log2CUSize = g_log2Size[maxCUSize], rangeCUIdx = 0; log2CUSize >= g_log2Size[minCUSize]; log2CUSize--)
     {
@@ -2118,7 +2130,7 @@
                 cu->log2CUSize = log2CUSize;
                 cu->childOffset = childIdx - cuIdx;
                 cu->absPartIdx = g_depthScanIdx[yOffset][xOffset] * 4;
-                cu->numPartitions = (NUM_4x4_PARTITIONS >> ((g_maxLog2CUSize - cu->log2CUSize) * 2));
+                cu->numPartitions = (num4x4Partition >> ((g_log2Size[maxCUSize] - cu->log2CUSize) * 2));
                 cu->depth = g_log2Size[maxCUSize] - log2CUSize;
                 cu->geomRecurId = cuIdx;
 
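Note on the cudata.cpp hunks: the removed `NUM_4x4_PARTITIONS` macro read the global `g_unitSizeDepth`, whereas `calcCTUGeoms` now derives the count from its `maxCUSize` argument. The arithmetic can be sketched as follows. This is a minimal illustration, assuming `LOG2_UNIT_SIZE` is 2 (4x4 units, as the macro's comment suggests) and using a hypothetical `log2Size` helper in place of the `g_log2Size` table.

```cpp
#include <cstdint>

// Assumption: 4x4 minimum units, so log2 of the unit size is 2.
constexpr uint32_t kLog2UnitSize = 2;

// Hypothetical stand-in for the g_log2Size lookup table; valid for powers of two.
constexpr uint32_t log2Size(uint32_t size)
{
    return size <= 1 ? 0 : 1 + log2Size(size >> 1);
}

// Number of 4x4 units in one CTU of the given size, same shape as the added line.
constexpr uint32_t num4x4Partitions(uint32_t maxCUSize)
{
    return 1U << ((log2Size(maxCUSize) - kLog2UnitSize) << 1);
}

// Partitions covered by a CU at a given depth: each depth step quarters the area.
constexpr uint32_t partitionsAtDepth(uint32_t maxCUSize, uint32_t depth)
{
    return num4x4Partitions(maxCUSize) >> (depth * 2);
}
```

Under these assumptions a 64x64 CTU holds 256 units, which lines up with `bcast256` being selected for `case 6` in `CUData::initialize`.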
x265_2.4.tar.gz/source/common/cudata.h -> x265_2.5.tar.gz/source/common/cudata.h Changed
@@ -161,8 +161,8 @@
 {
 public:
 
-    static cubcast_t s_partSet[NUM_FULL_DEPTH]; // pointer to broadcast set functions per absolute depth
-    static uint32_t  s_numPartInCUSize;
+    cubcast_t s_partSet[NUM_FULL_DEPTH]; // pointer to broadcast set functions per absolute depth
+    uint32_t  s_numPartInCUSize;
 
     bool          m_vbvAffected;
 
@@ -225,7 +225,7 @@
 
     CUData();
 
-    void     initialize(const CUDataMemPool& dataPool, uint32_t depth, int csp, int instance);
+    void     initialize(const CUDataMemPool& dataPool, uint32_t depth, const x265_param& param, int instance);
     static void calcCTUGeoms(uint32_t ctuWidth, uint32_t ctuHeight, uint32_t maxCUSize, uint32_t minCUSize, CUGeom cuDataArray[CUGeom::MAX_GEOMS]);
 
     void     initCTU(const Frame& frame, uint32_t cuAddr, int qp, uint32_t firstRowInSlice, uint32_t lastRowInSlice, uint32_t lastCUInSlice);
@@ -271,7 +271,7 @@
     void     getInterTUQtDepthRange(uint32_t tuDepthRange[2], uint32_t absPartIdx) const;
     uint32_t getBestRefIdx(uint32_t subPartIdx) const { return ((m_interDir[subPartIdx] & 1) << m_refIdx[0][subPartIdx]) |
                                                                (((m_interDir[subPartIdx] >> 1) & 1) << (m_refIdx[1][subPartIdx] + 16)); }
-    uint32_t getPUOffset(uint32_t puIdx, uint32_t absPartIdx) const { return (partAddrTable[(int)m_partSize[absPartIdx]][puIdx] << (g_unitSizeDepth - m_cuDepth[absPartIdx]) * 2) >> 4; }
+    uint32_t getPUOffset(uint32_t puIdx, uint32_t absPartIdx) const { return (partAddrTable[(int)m_partSize[absPartIdx]][puIdx] << (m_slice->m_param->unitSizeDepth - m_cuDepth[absPartIdx]) * 2) >> 4; }
 
     uint32_t getNumPartInter(uint32_t absPartIdx) const              { return nbPartsTable[(int)m_partSize[absPartIdx]]; }
     bool     isIntra(uint32_t absPartIdx) const   { return m_predMode[absPartIdx] == MODE_INTRA; }
@@ -285,7 +285,7 @@
     void     getAllowedChromaDir(uint32_t absPartIdx, uint32_t* modeList) const;
     int      getIntraDirLumaPredictor(uint32_t absPartIdx, uint32_t* intraDirPred) const;
 
-    uint32_t getSCUAddr() const                  { return (m_cuAddr << g_unitSizeDepth * 2) + m_absIdxInCTU; }
+    uint32_t getSCUAddr() const                  { return (m_cuAddr << m_slice->m_param->unitSizeDepth * 2) + m_absIdxInCTU; }
     uint32_t getCtxSplitFlag(uint32_t absPartIdx, uint32_t depth) const;
     uint32_t getCtxSkipFlag(uint32_t absPartIdx) const;
     void     getTUEntropyCodingParameters(TUEntropyCodingParameters &result, uint32_t absPartIdx, uint32_t log2TrSize, bool bIsLuma) const;
@@ -350,10 +350,10 @@
 
     CUDataMemPool() { charMemBlock = NULL; trCoeffMemBlock = NULL; mvMemBlock = NULL; distortionMemBlock = NULL; }
 
-    bool create(uint32_t depth, uint32_t csp, uint32_t numInstances)
+    bool create(uint32_t depth, uint32_t csp, uint32_t numInstances, const x265_param& param)
     {
-        uint32_t numPartition = NUM_4x4_PARTITIONS >> (depth * 2);
-        uint32_t cuSize = g_maxCUSize >> depth;
+        uint32_t numPartition = param.num4x4Partitions >> (depth * 2);
+        uint32_t cuSize = param.maxCUSize >> depth;
         uint32_t sizeL = cuSize * cuSize;
         if (csp == X265_CSP_I400)
         {
x265_2.4.tar.gz/source/common/frame.cpp -> x265_2.5.tar.gz/source/common/frame.cpp Changed
@@ -48,6 +48,11 @@
     m_rcData = NULL;
     m_encodeStartTime = 0;
     m_reconfigureRc = false;
+    m_ctuInfo = NULL;
+    m_prevCtuInfoChange = NULL;
+    m_addOnDepth = NULL;
+    m_addOnCtuInfo = NULL;
+    m_addOnPrevChange = NULL;
 }
 
 bool Frame::create(x265_param *param, float* quantOffsets)
@@ -56,11 +61,26 @@
     m_param = param;
     CHECKED_MALLOC_ZERO(m_rcData, RcStats, 1);
 
-    if (m_fencPic->create(param->sourceWidth, param->sourceHeight, param->internalCsp) &&
-        m_lowres.create(m_fencPic, param->bframes, !!param->rc.aqMode || !!param->bAQMotion, param->rc.qgSize))
+    if (param->bCTUInfo)
+    {
+        uint32_t widthInCTU = (m_param->sourceWidth + param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+        uint32_t heightInCTU = (m_param->sourceHeight +  param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+        uint32_t numCTUsInFrame = widthInCTU * heightInCTU;
+        CHECKED_MALLOC_ZERO(m_addOnDepth, uint8_t *, numCTUsInFrame);
+        CHECKED_MALLOC_ZERO(m_addOnCtuInfo, uint8_t *, numCTUsInFrame);
+        CHECKED_MALLOC_ZERO(m_addOnPrevChange, int *, numCTUsInFrame);
+        for (uint32_t i = 0; i < numCTUsInFrame; i++)
+        {
+            CHECKED_MALLOC_ZERO(m_addOnDepth[i], uint8_t, uint32_t(param->num4x4Partitions));
+            CHECKED_MALLOC_ZERO(m_addOnCtuInfo[i], uint8_t, uint32_t(param->num4x4Partitions));
+            CHECKED_MALLOC_ZERO(m_addOnPrevChange[i], int, uint32_t(param->num4x4Partitions));
+        }
+    }
+
+    if (m_fencPic->create(param) && m_lowres.create(m_fencPic, param->bframes, !!param->rc.aqMode || !!param->bAQMotion, param->rc.qgSize))
     {
         X265_CHECK((m_reconColCount == NULL), "m_reconColCount was initialized");
-        m_numRows = (m_fencPic->m_picHeight + g_maxCUSize - 1)  / g_maxCUSize;
+        m_numRows = (m_fencPic->m_picHeight + param->maxCUSize - 1)  / param->maxCUSize;
         m_reconRowFlag = new ThreadSafeInteger[m_numRows];
         m_reconColCount = new ThreadSafeInteger[m_numRows];
 
@@ -86,12 +106,12 @@
     m_reconPic = new PicYuv;
     m_param = param;
     m_encData->m_reconPic = m_reconPic;
-    bool ok = m_encData->create(*param, sps, m_fencPic->m_picCsp) && m_reconPic->create(param->sourceWidth, param->sourceHeight, param->internalCsp);
+    bool ok = m_encData->create(*param, sps, m_fencPic->m_picCsp) && m_reconPic->create(param);
     if (ok)
     {
         /* initialize right border of m_reconpicYuv as SAO may read beyond the
          * end of the picture accessing uninitialized pixels */
-        int maxHeight = sps.numCuInHeight * g_maxCUSize;
+        int maxHeight = sps.numCuInHeight * param->maxCUSize;
         memset(m_reconPic->m_picOrg[0], 0, sizeof(pixel)* m_reconPic->m_stride * maxHeight);
 
         /* use pre-calculated cu/pu offsets cached in the SPS structure */
@@ -166,6 +186,35 @@
         delete[] m_userSEI.payloads;
     }
 
+    if (m_ctuInfo)
+    {
+        uint32_t widthInCU = (m_param->sourceWidth + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+        uint32_t heightInCU = (m_param->sourceHeight + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+        uint32_t numCUsInFrame = widthInCU * heightInCU;
+        for (uint32_t i = 0; i < numCUsInFrame; i++)
+        {
+            X265_FREE((*m_ctuInfo + i)->ctuInfo);
+            (*m_ctuInfo + i)->ctuInfo = NULL;
+            X265_FREE(m_addOnDepth[i]);
+            m_addOnDepth[i] = NULL;
+            X265_FREE(m_addOnCtuInfo[i]);
+            m_addOnCtuInfo[i] = NULL;
+            X265_FREE(m_addOnPrevChange[i]);
+            m_addOnPrevChange[i] = NULL;
+        }
+        X265_FREE(*m_ctuInfo);
+        *m_ctuInfo = NULL;
+        X265_FREE(m_ctuInfo);
+        m_ctuInfo = NULL;
+        X265_FREE(m_prevCtuInfoChange);
+        m_prevCtuInfoChange = NULL;
+        X265_FREE(m_addOnDepth);
+        m_addOnDepth = NULL;
+        X265_FREE(m_addOnCtuInfo);
+        m_addOnCtuInfo = NULL;
+        X265_FREE(m_addOnPrevChange);
+        m_addOnPrevChange = NULL;
+    }
     m_lowres.destroy();
     X265_FREE(m_rcData);
 }
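Note on the frame.cpp hunks: both `Frame::create` and the cleanup path size the per-CTU arrays with `(dimension + maxCUSize - 1) >> maxLog2CUSize`, which is a ceiling division when `maxCUSize` equals `1 << maxLog2CUSize`. A small sketch of that arithmetic, with hypothetical helper names (`ctusAcross`, `ctusInFrame`) that do not appear in the diff:

```cpp
#include <cstdint>

// Number of CTUs needed to cover `samples` luma samples along one axis.
// Assumes maxCUSize == (1u << maxLog2CUSize), so the shift acts as a ceiling division.
constexpr uint32_t ctusAcross(uint32_t samples, uint32_t maxCUSize, uint32_t maxLog2CUSize)
{
    return (samples + maxCUSize - 1) >> maxLog2CUSize;
}

// Total CTUs per frame, mirroring widthInCTU * heightInCTU in the hunk.
constexpr uint32_t ctusInFrame(uint32_t width, uint32_t height,
                               uint32_t maxCUSize, uint32_t maxLog2CUSize)
{
    return ctusAcross(width, maxCUSize, maxLog2CUSize) *
           ctusAcross(height, maxCUSize, maxLog2CUSize);
}
```

For a 1920x1080 source with 64x64 CTUs this gives 30 columns and 17 rows (the last row is only partly covered by picture data), so 510 entries per array.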
x265_2.4.tar.gz/source/common/frame.h -> x265_2.5.tar.gz/source/common/frame.h Changed
@@ -66,6 +66,10 @@
     double   shortTermCplxCount;
     int64_t  totalBits;
     int64_t  encodedBits;
+    double   coeff[4];
+    double   count[4];
+    double   offset[4];
+    double   bufferFillFinal;
 };
 
 class Frame
@@ -108,7 +112,14 @@
     x265_analysis_2Pass    m_analysis2Pass;
     RcStats*               m_rcData;
 
+    x265_ctu_info_t**      m_ctuInfo;
+    Event                  m_copied;
+    int*                   m_prevCtuInfoChange;
     int64_t                m_encodeStartTime;
+
+    uint8_t**              m_addOnDepth;
+    uint8_t**              m_addOnCtuInfo;
+    int**                  m_addOnPrevChange;
     Frame();
 
     bool create(x265_param *param, float* quantOffsets);
x265_2.4.tar.gz/source/common/framedata.cpp -> x265_2.5.tar.gz/source/common/framedata.cpp Changed
@@ -41,9 +41,9 @@
     if (param.rc.bStatWrite)
         m_spsrps = const_cast<RPS*>(sps.spsrps);
 
-    m_cuMemPool.create(0, param.internalCsp, sps.numCUsInFrame);
+    m_cuMemPool.create(0, param.internalCsp, sps.numCUsInFrame, param);
     for (uint32_t ctuAddr = 0; ctuAddr < sps.numCUsInFrame; ctuAddr++)
-        m_picCTU[ctuAddr].initialize(m_cuMemPool, 0, param.internalCsp, ctuAddr);
+        m_picCTU[ctuAddr].initialize(m_cuMemPool, 0, param, ctuAddr);
 
     CHECKED_MALLOC_ZERO(m_cuStat, RCStatCU, sps.numCUsInFrame);
     CHECKED_MALLOC(m_rowStat, RCStatRow, sps.numCuInHeight);
x265_2.4.tar.gz/source/common/framedata.h -> x265_2.5.tar.gz/source/common/framedata.h Changed
@@ -62,6 +62,7 @@
     double      percentMergeCu[NUM_CU_DEPTH];
     double      percentIntraDistribution[NUM_CU_DEPTH][INTRA_MODES];
     double      percentInterDistribution[NUM_CU_DEPTH][3];           // 2Nx2N, RECT, AMP modes percentage
+    double      ipCostRatio;
 
     uint64_t    cntIntraNxN;
     uint64_t    totalCu;
@@ -78,6 +79,15 @@
     uint64_t    cuInterDistribution[NUM_CU_DEPTH][INTER_MODES];
     uint64_t    cuIntraDistribution[NUM_CU_DEPTH][INTRA_MODES];
 
+
+    uint64_t    totalPu[NUM_CU_DEPTH + 1];
+    uint64_t    cntSkipPu[NUM_CU_DEPTH];
+    uint64_t    cntIntraPu[NUM_CU_DEPTH];
+    uint64_t    cntAmp[NUM_CU_DEPTH];
+    uint64_t    cnt4x4;
+    uint64_t    cntInterPu[NUM_CU_DEPTH][INTER_MODES - 1];
+    uint64_t    cntMergePu[NUM_CU_DEPTH][INTER_MODES - 1];
+
     FrameStats()
     {
         memset(this, 0, sizeof(FrameStats));
x265_2.4.tar.gz/source/common/ipfilter.cpp -> x265_2.5.tar.gz/source/common/ipfilter.cpp Changed
@@ -123,9 +123,8 @@
     const int16_t* coeff = (N == 4) ? g_chromaFilter[coeffIdx] : g_lumaFilter[coeffIdx];
     int headRoom = IF_INTERNAL_PREC - X265_DEPTH;
     int shift = IF_FILTER_PREC - headRoom;
-    int offset = -IF_INTERNAL_OFFS << shift;
+    int offset = (unsigned)-IF_INTERNAL_OFFS << shift;
     int blkheight = height;
-
     src -= N / 2 - 1;
 
     if (isRowExt)
@@ -209,10 +208,8 @@
     const int16_t* c = (N == 4) ? g_chromaFilter[coeffIdx] : g_lumaFilter[coeffIdx];
     int headRoom = IF_INTERNAL_PREC - X265_DEPTH;
     int shift = IF_FILTER_PREC - headRoom;
-    int offset = -IF_INTERNAL_OFFS << shift;
-
+    int offset = (unsigned)-IF_INTERNAL_OFFS << shift;
     src -= (N / 2 - 1) * srcStride;
-
     int row, col;
     for (row = 0; row < height; row++)
     {
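Note on the ipfilter.cpp hunks: the only functional change is the `(unsigned)` cast before the shift. Left-shifting a negative signed value is undefined behaviour in C++ before C++20, while shifting the same bit pattern as `unsigned` is well defined (modulo 2^32). The sketch below reproduces the expression under assumed constants: the `k...` values are assumptions modelled on the identifiers in the hunk and should be checked against the real headers.

```cpp
// Assumed constants, for illustration only.
constexpr int kIfInternalPrec = 14;                          // intermediate precision in bits
constexpr int kIfFilterPrec   = 6;                           // log2 of the filter coefficient sum
constexpr int kIfInternalOffs = 1 << (kIfInternalPrec - 1);  // 8192

// Same shape as the patched line: convert to unsigned first, then shift,
// then convert back to int. On two's-complement targets the result equals
// the mathematically expected -kIfInternalOffs * 2^shift.
constexpr int filterOffset(int bitDepth)
{
    return static_cast<int>(static_cast<unsigned>(-kIfInternalOffs)
                            << (kIfFilterPrec - (kIfInternalPrec - bitDepth)));
}
```

With these assumed values the offset is -8192 at 8 bits (shift of 0) and -32768 at 10 bits (shift of 2), which is what the unpatched expression would be expected to produce, but without relying on undefined behaviour.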
x265_2.4.tar.gz/source/common/lowres.h -> x265_2.5.tar.gz/source/common/lowres.h Changed
@@ -118,6 +118,8 @@
     bool   bKeyframe;
     bool   bLastMiniGopBFrame;
 
+    double ipCostRatio;
+
     /* lookahead output data */
     int64_t   costEst[X265_BFRAME_MAX + 2][X265_BFRAME_MAX + 2];
     int64_t   costEstAq[X265_BFRAME_MAX + 2][X265_BFRAME_MAX + 2];
x265_2.4.tar.gz/source/common/param.cpp -> x265_2.5.tar.gz/source/common/param.cpp Changed
@@ -110,6 +110,7 @@
     param->frameNumThreads = 0;
 
     param->logLevel = X265_LOG_INFO;
+    param->csvLogLevel = 0;
     param->csvfn = NULL;
     param->rc.lambdaFileName = NULL;
     param->bLogCuStats = 0;
@@ -194,10 +195,10 @@
     param->rdPenalty = 0;
     param->psyRd = 2.0;
     param->psyRdoq = 0.0;
-    param->analysisMode = 0;
+    param->analysisReuseMode = 0;
     param->analysisMultiPassRefine = 0;
     param->analysisMultiPassDistortion = 0;
-    param->analysisFileName = NULL;
+    param->analysisReuseFileName = NULL;
     param->bIntraInBFrames = 0;
     param->bLossless = 0;
     param->bCULossless = 0;
@@ -236,6 +237,7 @@
     param->rc.bEnableGrain = 0;
     param->rc.qpMin = 0;
     param->rc.qpMax = QP_MAX_MAX;
+    param->rc.bEnableConstVbv = 0;
 
     /* Video Usability Information (VUI) */
     param->vui.aspectRatioIdc = 0;
@@ -271,10 +273,18 @@
     param->bOptCUDeltaQP        = 0;
     param->bAQMotion = 0;
     param->bHDROpt = 0;
-    param->analysisRefineLevel = 5;
+    param->analysisReuseLevel = 5;
 
     param->toneMapFile = NULL;
     param->bDhdr10opt = 0;
+    param->bCTUInfo = 0;
+    param->bUseRcStats = 0;
+    param->scaleFactor = 0;
+    param->intraRefine = 0;
+    param->interRefine = 0;
+    param->mvRefine = 0;
+    param->bUseAnalysisFile = 1;
+    param->csvfpt = NULL;
 }
 
 int x265_param_default_preset(x265_param* param, const char* preset, const char* tune)
@@ -494,6 +504,7 @@
             param->psyRd = 4.0;
             param->psyRdoq = 10.0;
             param->bEnableSAO = 0;
+            param->rc.bEnableConstVbv = 1;
         }
         else
             return -1;
@@ -828,7 +839,7 @@
         p->rc.bStrictCbr = atobool(value);
         p->rc.pbFactor = 1.0;
     }
-    OPT("analysis-mode") p->analysisMode = parseName(value, x265_analysis_names, bError);
+    OPT("analysis-reuse-mode") p->analysisReuseMode = parseName(value, x265_analysis_names, bError);
     OPT("sar")
     {
         p->vui.aspectRatioIdc = parseName(value, x265_sar_names, bError);
@@ -907,7 +918,7 @@
     OPT("scaling-list") p->scalingLists = strdup(value);
     OPT2("pools", "numa-pools") p->numaPools = strdup(value);
     OPT("lambda-file") p->rc.lambdaFileName = strdup(value);
-    OPT("analysis-file") p->analysisFileName = strdup(value);
+    OPT("analysis-reuse-file") p->analysisReuseFileName = strdup(value);
     OPT("qg-size") p->rc.qgSize = atoi(value);
     OPT("master-display") p->masteringDisplayColorVolume = strdup(value);
     OPT("max-cll") bError |= sscanf(value, "%hu,%hu", &p->maxCLL, &p->maxFALL) != 2;
@@ -921,6 +932,8 @@
     if (bExtraParams)
     {
         if (0) ;
+        OPT("csv") p->csvfn = strdup(value);
+        OPT("csv-log-level") p->csvLogLevel = atoi(value);
         OPT("qpmin") p->rc.qpMin = atoi(value);
         OPT("analyze-src-pics") p->bSourceReferenceEstimation = atobool(value);
         OPT("log2-max-poc-lsb") p->log2MaxPocLsb = atoi(value);
@@ -938,7 +951,7 @@
         OPT("multi-pass-opt-distortion") p->analysisMultiPassDistortion = atobool(value);
         OPT("aq-motion") p->bAQMotion = atobool(value);
         OPT("dynamic-rd") p->dynamicRd = atof(value);
-        OPT("refine-level") p->analysisRefineLevel = atoi(value);
+        OPT("analysis-reuse-level") p->analysisReuseLevel = atoi(value);
         OPT("ssim-rd")
         {
             int bval = atobool(value);
@@ -954,6 +967,12 @@
         OPT("limit-sao") p->bLimitSAO = atobool(value);
         OPT("dhdr10-info") p->toneMapFile = strdup(value);
         OPT("dhdr10-opt") p->bDhdr10opt = atobool(value);
+        OPT("const-vbv") p->rc.bEnableConstVbv = atobool(value);
+        OPT("ctu-info") p->bCTUInfo = atoi(value);
+        OPT("scale-factor") p->scaleFactor = atoi(value);
+        OPT("refine-intra")p->intraRefine = atoi(value);
+        OPT("refine-inter")p->interRefine = atobool(value);
+        OPT("refine-mv")p->mvRefine = atobool(value);
         else
             return X265_PARAM_BAD_NAME;
     }
@@ -1284,16 +1303,19 @@
           "Constant QP is incompatible with 2pass");
     CHECK(param->rc.bStrictCbr && (param->rc.bitrate <= 0 || param->rc.vbvBufferSize <=0),
           "Strict-cbr cannot be applied without specifying target bitrate or vbv bufsize");
-    CHECK(param->analysisMode && (param->analysisMode < X265_ANALYSIS_OFF || param->analysisMode > X265_ANALYSIS_LOAD),
+    CHECK(param->analysisReuseMode && (param->analysisReuseMode < X265_ANALYSIS_OFF || param->analysisReuseMode > X265_ANALYSIS_LOAD),
         "Invalid analysis mode. Analysis mode 0: OFF 1: SAVE : 2 LOAD");
-    CHECK(param->analysisMode && (param->analysisRefineLevel < 1 || param->analysisRefineLevel > 10),
+    CHECK(param->analysisReuseMode && (param->analysisReuseLevel < 1 || param->analysisReuseLevel > 10),
         "Invalid analysis refine level. Value must be between 1 and 10 (inclusive)");
+    CHECK(param->scaleFactor > 2, "Invalid scale-factor. Supports factor <= 2");
     CHECK(param->rc.qpMax < QP_MIN || param->rc.qpMax > QP_MAX_MAX,
         "qpmax exceeds supported range (0 to 69)");
     CHECK(param->rc.qpMin < QP_MIN || param->rc.qpMin > QP_MAX_MAX,
         "qpmin exceeds supported range (0 to 69)");
     CHECK(param->log2MaxPocLsb < 4 || param->log2MaxPocLsb > 16,
         "Supported range for log2MaxPocLsb is 4 to 16");
+    CHECK(param->bCTUInfo < 0 || (param->bCTUInfo != 0 && param->bCTUInfo != 1 && param->bCTUInfo != 2 && param->bCTUInfo != 4 && param->bCTUInfo != 6) || param->bCTUInfo > 6,
+        "Supported values for bCTUInfo are 0, 1, 2, 4, 6");
 #if !X86_64
     CHECK(param->searchMethod == X265_SEA && (param->sourceWidth > 840 || param->sourceHeight > 480),
         "SEA motion search does not support resolutions greater than 480p in 32 bit build");
@@ -1322,42 +1344,6 @@
     }
 }
 
-int x265_set_globals(x265_param* param)
-{
-    uint32_t maxLog2CUSize = (uint32_t)g_log2Size[param->maxCUSize];
-    uint32_t minLog2CUSize = (uint32_t)g_log2Size[param->minCUSize];
-
-    Lock gLock;
-    ScopedLock sLock(gLock);
-
-    if (++g_ctuSizeConfigured > 1)
-    {
-        if (g_maxCUSize != param->maxCUSize)
-        {
-            x265_log(param, X265_LOG_WARNING, "maxCUSize must be the same for all encoders in a single process");
-        }
-        if (g_maxCUDepth != maxLog2CUSize - minLog2CUSize)
-        {
-            x265_log(param, X265_LOG_WARNING, "maxCUDepth must be the same for all encoders in a single process");
-        }
-        param->maxCUSize = g_maxCUSize;
-        return x265_check_params(param); /* Check again, since param may have changed */
-    }
-    else
-    {
-        // set max CU width & height
-        g_maxCUSize     = param->maxCUSize;
-        g_maxLog2CUSize = maxLog2CUSize;
-
-        // compute actual CU depth with respect to config depth and max transform size
-        g_maxCUDepth    = maxLog2CUSize - minLog2CUSize;
-        g_unitSizeDepth = maxLog2CUSize - LOG2_UNIT_SIZE;
-    }
-
-    g_maxSlices = param->maxSlices;
-    return 0;
-}
-
 static void appendtool(x265_param* param, char* buf, size_t size, const char* toolstr)
 {
     static const int overhead = (int)strlen("x265 [info]: tools: ");
@@ -1457,6 +1443,7 @@
     TOOLOPT(param->bEnableStrongIntraSmoothing, "strong-intra-smoothing");
     TOOLVAL(param->lookaheadSlices, "lslices=%d");
     TOOLVAL(param->lookaheadThreads, "lthreads=%d")
+    TOOLVAL(param->bCTUInfo, "ctu-info=%d");
     if (param->maxSlices > 1)
         TOOLVAL(param->maxSlices, "slices=%d");
     if (param->bEnableLoopFilter)
@@ -1473,8 +1460,8 @@
     TOOLOPT(!param->bSaoNonDeblocked && param->bEnableSAO, "sao");
     TOOLOPT(param->rc.bStatWrite, "stats-write");
     TOOLOPT(param->rc.bStatRead,  "stats-read");
-#if ENABLE_DYNAMIC_HDR10
-    TOOLVAL(param->toneMapFile != NULL, "dhdr10-info");
+#if ENABLE_HDR10_PLUS
+    TOOLOPT(param->toneMapFile != NULL, "dhdr10-info");
 #endif
     x265_log(param, X265_LOG_INFO, "tools:%s\n", buf);
     fflush(stderr);
@@ -1501,6 +1488,8 @@
     BOOL(p->bEnablePsnr, "psnr");
     BOOL(p->bEnableSsim, "ssim");
     s += sprintf(s, " log-level=%d", p->logLevel);
+    if (p->csvfn)
+        s += sprintf(s, " csvfn=%s csv-log-level=%d", p->csvfn, p->csvLogLevel);
     s += sprintf(s, " bitdepth=%d", p->internalBitDepth);
     s += sprintf(s, " input-csp=%d", p->internalCsp);
     s += sprintf(s, " fps=%u/%u", p->fpsNum, p->fpsDenom);
@@ -1573,7 +1562,7 @@
201
     s += sprintf(s, " psy-rd=%.2f", p->psyRd);
202
     s += sprintf(s, " psy-rdoq=%.2f", p->psyRdoq);
203
     BOOL(p->bEnableRdRefine, "rd-refine");
204
-    s += sprintf(s, " analysis-mode=%d", p->analysisMode);
205
+    s += sprintf(s, " analysis-reuse-mode=%d", p->analysisReuseMode);
206
     BOOL(p->bLossless, "lossless");
207
     s += sprintf(s, " cbqpoffs=%d", p->cbQpOffset);
208
     s += sprintf(s, " crqpoffs=%d", p->crQpOffset);
209
@@ -1630,6 +1619,7 @@
210
     s += sprintf(s, " qg-size=%d", p->rc.qgSize);
211
     BOOL(p->rc.bEnableGrain, "rc-grain");
212
     s += sprintf(s, " qpmax=%d qpmin=%d", p->rc.qpMax, p->rc.qpMin);
213
+    BOOL(p->rc.bEnableConstVbv, "const-vbv");
214
     s += sprintf(s, " sar=%d", p->vui.aspectRatioIdc);
215
     if (p->vui.aspectRatioIdc == X265_EXTENDED_SAR)
216
         s += sprintf(s, " sar-width : sar-height=%d:%d", p->vui.sarWidth, p->vui.sarHeight);
217
@@ -1668,8 +1658,13 @@
218
     BOOL(p->bEmitHDRSEI, "hdr");
219
     BOOL(p->bHDROpt, "hdr-opt");
220
     BOOL(p->bDhdr10opt, "dhdr10-opt");
221
-    s += sprintf(s, " refine-level=%d", p->analysisRefineLevel);
222
+    s += sprintf(s, " analysis-reuse-level=%d", p->analysisReuseLevel);
223
+    s += sprintf(s, " scale-factor=%d", p->scaleFactor);
224
+    s += sprintf(s, " refine-intra=%d", p->intraRefine);
225
+    s += sprintf(s, " refine-inter=%d", p->interRefine);
226
+    s += sprintf(s, " refine-mv=%d", p->mvRefine);
227
     BOOL(p->bLimitSAO, "limit-sao");
228
+    s += sprintf(s, " ctu-info=%d", p->bCTUInfo);
229
 #undef BOOL
230
     return buf;
231
 }
232
x265_2.4.tar.gz/source/common/param.h -> x265_2.5.tar.gz/source/common/param.h Changed
@@ -28,7 +28,6 @@
 namespace X265_NS {
 
 int   x265_check_params(x265_param *param);
-int   x265_set_globals(x265_param *param);
 void  x265_print_params(x265_param *param);
 void  x265_param_apply_fastfirstpass(x265_param *p);
 char* x265_param2string(x265_param *param, int padx, int pady);
x265_2.4.tar.gz/source/common/picyuv.cpp -> x265_2.5.tar.gz/source/common/picyuv.cpp Changed
@@ -46,36 +46,62 @@
 
     m_maxLumaLevel = 0;
     m_avgLumaLevel = 0;
+
+    m_maxChromaULevel = 0;
+    m_avgChromaULevel = 0;
+
+    m_maxChromaVLevel = 0;
+    m_avgChromaVLevel = 0;
+
+#if (X265_DEPTH > 8)
+    m_minLumaLevel = 0xFFFF;
+    m_minChromaULevel = 0xFFFF;
+    m_minChromaVLevel = 0xFFFF;
+#else
+    m_minLumaLevel = 0xFF;
+    m_minChromaULevel = 0xFF;
+    m_minChromaVLevel = 0xFF;
+#endif
+
     m_stride = 0;
     m_strideC = 0;
     m_hChromaShift = 0;
     m_vChromaShift = 0;
 }
 
-bool PicYuv::create(uint32_t picWidth, uint32_t picHeight, uint32_t picCsp)
+bool PicYuv::create(x265_param* param, pixel *pixelbuf)
 {
+    m_param = param;
+    uint32_t picWidth = m_param->sourceWidth;
+    uint32_t picHeight = m_param->sourceHeight;
+    uint32_t picCsp = m_param->internalCsp;
     m_picWidth  = picWidth;
     m_picHeight = picHeight;
     m_hChromaShift = CHROMA_H_SHIFT(picCsp);
     m_vChromaShift = CHROMA_V_SHIFT(picCsp);
     m_picCsp = picCsp;
 
-    uint32_t numCuInWidth = (m_picWidth + g_maxCUSize - 1)  / g_maxCUSize;
-    uint32_t numCuInHeight = (m_picHeight + g_maxCUSize - 1) / g_maxCUSize;
+    uint32_t numCuInWidth = (m_picWidth + param->maxCUSize - 1)  / param->maxCUSize;
+    uint32_t numCuInHeight = (m_picHeight + param->maxCUSize - 1) / param->maxCUSize;
 
-    m_lumaMarginX = g_maxCUSize + 32; // search margin and 8-tap filter half-length, padded for 32-byte alignment
-    m_lumaMarginY = g_maxCUSize + 16; // margin for 8-tap filter and infinite padding
-    m_stride = (numCuInWidth * g_maxCUSize) + (m_lumaMarginX << 1);
+    m_lumaMarginX = param->maxCUSize + 32; // search margin and 8-tap filter half-length, padded for 32-byte alignment
+    m_lumaMarginY = param->maxCUSize + 16; // margin for 8-tap filter and infinite padding
+    m_stride = (numCuInWidth * param->maxCUSize) + (m_lumaMarginX << 1);
 
-    int maxHeight = numCuInHeight * g_maxCUSize;
-    CHECKED_MALLOC(m_picBuf[0], pixel, m_stride * (maxHeight + (m_lumaMarginY * 2)));
-    m_picOrg[0] = m_picBuf[0] + m_lumaMarginY * m_stride + m_lumaMarginX;
+    int maxHeight = numCuInHeight * param->maxCUSize;
+    if (pixelbuf)
+        m_picOrg[0] = pixelbuf;
+    else
+    {
+        CHECKED_MALLOC(m_picBuf[0], pixel, m_stride * (maxHeight + (m_lumaMarginY * 2)));
+        m_picOrg[0] = m_picBuf[0] + m_lumaMarginY * m_stride + m_lumaMarginX;
+    }
 
     if (picCsp != X265_CSP_I400)
     {
         m_chromaMarginX = m_lumaMarginX;  // keep 16-byte alignment for chroma CTUs
         m_chromaMarginY = m_lumaMarginY >> m_vChromaShift;
-        m_strideC = ((numCuInWidth * g_maxCUSize) >> m_hChromaShift) + (m_chromaMarginX * 2);
+        m_strideC = ((numCuInWidth * m_param->maxCUSize) >> m_hChromaShift) + (m_chromaMarginX * 2);
 
         CHECKED_MALLOC(m_picBuf[1], pixel, m_strideC * ((maxHeight >> m_vChromaShift) + (m_chromaMarginY * 2)));
         CHECKED_MALLOC(m_picBuf[2], pixel, m_strideC * ((maxHeight >> m_vChromaShift) + (m_chromaMarginY * 2)));
@@ -94,12 +120,33 @@
     return false;
 }
 
+int PicYuv::getLumaBufLen(uint32_t picWidth, uint32_t picHeight, uint32_t picCsp)
+{
+    m_picWidth = picWidth;
+    m_picHeight = picHeight;
+    m_hChromaShift = CHROMA_H_SHIFT(picCsp);
+    m_vChromaShift = CHROMA_V_SHIFT(picCsp);
+    m_picCsp = picCsp;
+
+    uint32_t numCuInWidth = (m_picWidth + m_param->maxCUSize - 1) / m_param->maxCUSize;
+    uint32_t numCuInHeight = (m_picHeight + m_param->maxCUSize - 1) / m_param->maxCUSize;
+
+    m_lumaMarginX = m_param->maxCUSize + 32; // search margin and 8-tap filter half-length, padded for 32-byte alignment
+    m_lumaMarginY = m_param->maxCUSize + 16; // margin for 8-tap filter and infinite padding
+    m_stride = (numCuInWidth * m_param->maxCUSize) + (m_lumaMarginX << 1);
+
+    int maxHeight = numCuInHeight * m_param->maxCUSize;
+    int bufLen = (int)(m_stride * (maxHeight + (m_lumaMarginY * 2)));
+
+    return bufLen;
+}
+
 /* the first picture allocated by the encoder will be asked to generate these
  * offset arrays. Once generated, they will be provided to all future PicYuv
  * allocated by the same encoder. */
 bool PicYuv::createOffsets(const SPS& sps)
 {
-    uint32_t numPartitions = 1 << (g_unitSizeDepth * 2);
+    uint32_t numPartitions = 1 << (m_param->unitSizeDepth * 2);
 
     if (m_picCsp != X265_CSP_I400)
     {
@@ -109,8 +156,8 @@
         {
             for (uint32_t cuCol = 0; cuCol < sps.numCuInWidth; cuCol++)
             {
-                m_cuOffsetY[cuRow * sps.numCuInWidth + cuCol] = m_stride * cuRow * g_maxCUSize + cuCol * g_maxCUSize;
-                m_cuOffsetC[cuRow * sps.numCuInWidth + cuCol] = m_strideC * cuRow * (g_maxCUSize >> m_vChromaShift) + cuCol * (g_maxCUSize >> m_hChromaShift);
+                m_cuOffsetY[cuRow * sps.numCuInWidth + cuCol] = m_stride * cuRow * m_param->maxCUSize + cuCol * m_param->maxCUSize;
+                m_cuOffsetC[cuRow * sps.numCuInWidth + cuCol] = m_strideC * cuRow * (m_param->maxCUSize >> m_vChromaShift) + cuCol * (m_param->maxCUSize >> m_hChromaShift);
             }
         }
 
@@ -129,7 +176,7 @@
         CHECKED_MALLOC(m_cuOffsetY, intptr_t, sps.numCuInWidth * sps.numCuInHeight);
         for (uint32_t cuRow = 0; cuRow < sps.numCuInHeight; cuRow++)
         for (uint32_t cuCol = 0; cuCol < sps.numCuInWidth; cuCol++)
-            m_cuOffsetY[cuRow * sps.numCuInWidth + cuCol] = m_stride * cuRow * g_maxCUSize + cuCol * g_maxCUSize;
+            m_cuOffsetY[cuRow * sps.numCuInWidth + cuCol] = m_stride * cuRow * m_param->maxCUSize + cuCol * m_param->maxCUSize;
 
         CHECKED_MALLOC(m_buOffsetY, intptr_t, (size_t)numPartitions);
         for (uint32_t idx = 0; idx < numPartitions; ++idx)
@@ -184,6 +231,11 @@
 
     X265_CHECK(pic.bitDepth >= 8, "pic.bitDepth check failure");
 
+    uint64_t lumaSum;
+    uint64_t cbSum;
+    uint64_t crSum;
+    lumaSum = cbSum = crSum = 0;
+
     if (pic.bitDepth == 8)
     {
 #if (X265_DEPTH > 8)
@@ -288,6 +340,47 @@
     pixel *U = m_picOrg[1];
     pixel *V = m_picOrg[2];
 
+    pixel *yPic = m_picOrg[0];
+    pixel *uPic = m_picOrg[1];
+    pixel *vPic = m_picOrg[2];
+
+    for (int r = 0; r < height; r++)
+    {
+        for (int c = 0; c < width; c++)
+        {
+            m_maxLumaLevel = X265_MAX(yPic[c], m_maxLumaLevel);
+            m_minLumaLevel = X265_MIN(yPic[c], m_minLumaLevel);
+            lumaSum += yPic[c];
+        }
+        yPic += m_stride;
+    }
+    m_avgLumaLevel = (double)lumaSum / (m_picHeight * m_picWidth);
+
+    if (param.csvLogLevel >= 2)
+    {
+        if (param.internalCsp != X265_CSP_I400)
+        {
+            for (int r = 0; r < height >> m_vChromaShift; r++)
+            {
+                for (int c = 0; c < width >> m_hChromaShift; c++)
+                {
+                    m_maxChromaULevel = X265_MAX(uPic[c], m_maxChromaULevel);
+                    m_minChromaULevel = X265_MIN(uPic[c], m_minChromaULevel);
+                    cbSum += uPic[c];
+
+                    m_maxChromaVLevel = X265_MAX(vPic[c], m_maxChromaVLevel);
+                    m_minChromaVLevel = X265_MIN(vPic[c], m_minChromaVLevel);
+                    crSum += vPic[c];
+                }
+
+                uPic += m_strideC;
+                vPic += m_strideC;
+            }
+            m_avgChromaULevel = (double)cbSum / ((height >> m_vChromaShift) * (width >> m_hChromaShift));
+            m_avgChromaVLevel = (double)crSum / ((height >> m_vChromaShift) * (width >> m_hChromaShift));
+        }
+    }
+
 #if HIGH_BIT_DEPTH
     bool calcHDRParams = !!param.minLuma || (param.maxLuma != PIXEL_MAX);
     /* Apply min/max luma bounds for HDR pixel manipulations */
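The luma buffer sizing that picyuv.cpp now derives from `param->maxCUSize` rather than the old `g_maxCUSize` global can be checked in isolation. The following is only a sketch of that arithmetic: `LumaLayout` and `lumaLayout` are hypothetical names, not part of x265, and the function takes the CTU size as a plain argument instead of reading it from an `x265_param`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type (not in x265): the padded luma plane geometry.
struct LumaLayout
{
    uint32_t marginX; // horizontal padding on each side, in pixels
    uint32_t marginY; // vertical padding above and below, in rows
    uint32_t stride;  // padded row length, in pixels
    uint32_t bufLen;  // total number of pixels in the padded plane
};

// Mirrors the arithmetic shown in PicYuv::create / PicYuv::getLumaBufLen,
// with the CTU size passed in explicitly (per-instance, no global).
LumaLayout lumaLayout(uint32_t picWidth, uint32_t picHeight, uint32_t maxCUSize)
{
    assert(maxCUSize > 0);

    // Round the picture dimensions up to a whole number of CTUs.
    uint32_t numCuInWidth  = (picWidth  + maxCUSize - 1) / maxCUSize;
    uint32_t numCuInHeight = (picHeight + maxCUSize - 1) / maxCUSize;

    LumaLayout l;
    l.marginX = maxCUSize + 32;
    l.marginY = maxCUSize + 16;
    l.stride  = numCuInWidth * maxCUSize + (l.marginX << 1);

    uint32_t maxHeight = numCuInHeight * maxCUSize;
    l.bufLen = l.stride * (maxHeight + l.marginY * 2);
    return l;
}
```

For a 1920x1080 source with 64x64 CTUs this gives a stride of 2112 pixels and a buffer of 2112 * 1248 pixels; two encoder instances with different CTU sizes would simply get different layouts.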
x265_2.4.tar.gz/source/common/picyuv.h -> x265_2.5.tar.gz/source/common/picyuv.h Changed
@@ -60,14 +60,25 @@
     uint32_t m_chromaMarginX;
     uint32_t m_chromaMarginY;
 
-    pixel m_maxLumaLevel;
-    double   m_avgLumaLevel;
+    pixel   m_maxLumaLevel;
+    pixel   m_minLumaLevel;
+    double  m_avgLumaLevel;
+
+    pixel   m_maxChromaULevel;
+    pixel   m_minChromaULevel;
+    double  m_avgChromaULevel;
+
+    pixel   m_maxChromaVLevel;
+    pixel   m_minChromaVLevel;
+    double  m_avgChromaVLevel;
+    x265_param *m_param;
 
     PicYuv();
 
-    bool  create(uint32_t picWidth, uint32_t picHeight, uint32_t csp);
+    bool  create(x265_param* param, pixel *pixelbuf = NULL);
     bool  createOffsets(const SPS& sps);
     void  destroy();
+    int   getLumaBufLen(uint32_t picWidth, uint32_t picHeight, uint32_t picCsp);
 
     void  copyFromPicture(const x265_picture&, const x265_param& param, int padx, int pady);
 
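The new min/max/avg members are filled by a per-plane scan in picyuv.cpp. As a rough illustration of that scan, here is a standalone sketch; `PlaneStats` and `planeStats` are hypothetical names, the pixel type is fixed to `uint16_t` for simplicity, and the initial minimum is passed in by the caller (the diff initialises it to 0xFFFF or 0xFF depending on `X265_DEPTH`).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical result type (not in x265): statistics for one picture plane.
struct PlaneStats
{
    uint16_t minLevel;
    uint16_t maxLevel;
    double   avgLevel;
};

// Scans a width x height region of a plane whose rows are `stride` pixels
// apart. `pixelMax` seeds the running minimum, as the constructor does.
PlaneStats planeStats(const uint16_t* plane, int width, int height,
                      intptr_t stride, uint16_t pixelMax)
{
    assert(width > 0 && height > 0);

    PlaneStats s = { pixelMax, 0, 0.0 };
    uint64_t sum = 0;

    for (int r = 0; r < height; r++)
    {
        for (int c = 0; c < width; c++)
        {
            s.maxLevel = std::max(plane[c], s.maxLevel);
            s.minLevel = std::min(plane[c], s.minLevel);
            sum += plane[c];
        }
        plane += stride; // skip any padding at the end of the row
    }

    s.avgLevel = (double)sum / ((uint64_t)width * (uint64_t)height);
    return s;
}
```

In the diff, the chroma planes go through the same kind of scan only when `csvLogLevel >= 2` and the colour space is not 4:0:0, with the dimensions shifted by the chroma subsampling factors.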
x265_2.4.tar.gz/source/common/primitives.cpp -> x265_2.5.tar.gz/source/common/primitives.cpp Changed
@@ -57,6 +57,7 @@
 void setupIntraPrimitives_c(EncoderPrimitives &p);
 void setupLoopFilterPrimitives_c(EncoderPrimitives &p);
 void setupSaoPrimitives_c(EncoderPrimitives &p);
+void setupSeaIntegralPrimitives_c(EncoderPrimitives &p);
 
 void setupCPrimitives(EncoderPrimitives &p)
 {
@@ -66,6 +67,7 @@
     setupIntraPrimitives_c(p);      // intrapred.cpp
     setupLoopFilterPrimitives_c(p); // loopfilter.cpp
     setupSaoPrimitives_c(p);        // sao.cpp
+    setupSeaIntegralPrimitives_c(p);  // framefilter.cpp
 }
 
 void setupAliasPrimitives(EncoderPrimitives &p)
x265_2.4.tar.gz/source/common/primitives.h -> x265_2.5.tar.gz/source/common/primitives.h Changed
@@ -110,6 +110,17 @@
     BLOCK_422_32x64
 };
 
+enum IntegralSize
+{
+    INTEGRAL_4,
+    INTEGRAL_8,
+    INTEGRAL_12,
+    INTEGRAL_16,
+    INTEGRAL_24,
+    INTEGRAL_32,
+    NUM_INTEGRAL_SIZE
+};
+
 typedef int  (*pixelcmp_t)(const pixel* fenc, intptr_t fencstride, const pixel* fref, intptr_t frefstride); // fenc is aligned
 typedef int  (*pixelcmp_ss_t)(const int16_t* fenc, intptr_t fencstride, const int16_t* fref, intptr_t frefstride);
 typedef sse_t (*pixel_sse_t)(const pixel* fenc, intptr_t fencstride, const pixel* fref, intptr_t frefstride); // fenc is aligned
@@ -203,6 +214,9 @@
 typedef void (*pelFilterLumaStrong_t)(pixel* src, intptr_t srcStep, intptr_t offset, int32_t tcP, int32_t tcQ);
 typedef void (*pelFilterChroma_t)(pixel* src, intptr_t srcStep, intptr_t offset, int32_t tc, int32_t maskP, int32_t maskQ);
 
+typedef void (*integralv_t)(uint32_t *sum, intptr_t stride);
+typedef void (*integralh_t)(uint32_t *sum, pixel *pix, intptr_t stride);
+
 /* Function pointers to optimized encoder primitives. Each pointer can reference
  * either an assembly routine, a SIMD intrinsic primitive, or a C function */
 struct EncoderPrimitives
@@ -342,6 +356,9 @@
     pelFilterLumaStrong_t pelFilterLumaStrong[2]; // EDGE_VER = 0, EDGE_HOR = 1
     pelFilterChroma_t     pelFilterChroma[2];     // EDGE_VER = 0, EDGE_HOR = 1
 
+    integralv_t            integral_initv[NUM_INTEGRAL_SIZE];
+    integralh_t            integral_inith[NUM_INTEGRAL_SIZE];
+
     /* There is one set of chroma primitives per color space. An encoder will
      * have just a single color space and thus it will only ever use one entry
      * in this array. However we always fill all entries in the array in case
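To make the new `integral_initv` table concrete, the sketch below shows one plausible scalar form of the vertical kernels and how a table indexed by `IntegralSize` could be filled. The kernel body is inferred from the AVX2 routines in seaintegral.asm in this same request (each one loads a row, loads the row N * stride entries further on, subtracts, and stores back over `stride` entries); it is not copied from the C reference in framefilter.cpp, which this diff excerpt does not show. `integral_initv_ref`, `IntegralTable` and `makeReferenceTable` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Same shape as the typedef and enum added to primitives.h.
typedef void (*integralv_t)(uint32_t* sum, intptr_t stride);

enum IntegralSize
{
    INTEGRAL_4, INTEGRAL_8, INTEGRAL_12,
    INTEGRAL_16, INTEGRAL_24, INTEGRAL_32,
    NUM_INTEGRAL_SIZE
};

// Assumed scalar equivalent of integralNv: replace each entry of the first
// row with the difference between the row N lines below and itself.
template<int N>
void integral_initv_ref(uint32_t* sum, intptr_t stride)
{
    assert(stride > 0);
    for (intptr_t x = 0; x < stride; x++)
        sum[x] = sum[x + N * stride] - sum[x];
}

// Hypothetical stand-in for the integral_initv member of EncoderPrimitives.
struct IntegralTable
{
    integralv_t initv[NUM_INTEGRAL_SIZE];
};

IntegralTable makeReferenceTable()
{
    IntegralTable t;
    t.initv[INTEGRAL_4]  = integral_initv_ref<4>;
    t.initv[INTEGRAL_8]  = integral_initv_ref<8>;
    t.initv[INTEGRAL_12] = integral_initv_ref<12>;
    t.initv[INTEGRAL_16] = integral_initv_ref<16>;
    t.initv[INTEGRAL_24] = integral_initv_ref<24>;
    t.initv[INTEGRAL_32] = integral_initv_ref<32>;
    return t;
}
```

An optimised build would overwrite individual entries with the assembly versions, as asm-primitives.cpp does for AVX2, while callers keep indexing the table by block size.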
x265_2.4.tar.gz/source/common/slice.cpp -> x265_2.5.tar.gz/source/common/slice.cpp Changed
@@ -185,22 +185,22 @@
 uint32_t Slice::realEndAddress(uint32_t endCUAddr) const
 {
     // Calculate end address
-    uint32_t internalAddress = (endCUAddr - 1) % NUM_4x4_PARTITIONS;
-    uint32_t externalAddress = (endCUAddr - 1) / NUM_4x4_PARTITIONS;
-    uint32_t xmax = m_sps->picWidthInLumaSamples - (externalAddress % m_sps->numCuInWidth) * g_maxCUSize;
-    uint32_t ymax = m_sps->picHeightInLumaSamples - (externalAddress / m_sps->numCuInWidth) * g_maxCUSize;
+    uint32_t internalAddress = (endCUAddr - 1) % m_param->num4x4Partitions;
+    uint32_t externalAddress = (endCUAddr - 1) / m_param->num4x4Partitions;
+    uint32_t xmax = m_sps->picWidthInLumaSamples - (externalAddress % m_sps->numCuInWidth) * m_param->maxCUSize;
+    uint32_t ymax = m_sps->picHeightInLumaSamples - (externalAddress / m_sps->numCuInWidth) * m_param->maxCUSize;
 
     while (g_zscanToPelX[internalAddress] >= xmax || g_zscanToPelY[internalAddress] >= ymax)
         internalAddress--;
 
     internalAddress++;
-    if (internalAddress == NUM_4x4_PARTITIONS)
+    if (internalAddress == m_param->num4x4Partitions)
     {
         internalAddress = 0;
         externalAddress++;
     }
 
-    return externalAddress * NUM_4x4_PARTITIONS + internalAddress;
+    return externalAddress * m_param->num4x4Partitions + internalAddress;
 }
 
 
x265_2.4.tar.gz/source/common/slice.h -> x265_2.5.tar.gz/source/common/slice.h Changed
@@ -360,6 +360,7 @@
     int         m_iPPSQpMinus26;
     int         numRefIdxDefault[2];
     int         m_iNumRPSInSPS;
+    const x265_param *m_param;
 
     Slice()
     {
x265_2.4.tar.gz/source/common/threadpool.cpp -> x265_2.5.tar.gz/source/common/threadpool.cpp Changed
@@ -253,6 +253,7 @@
     int cpusPerNode[MAX_NODE_NUM + 1];
     int threadsPerPool[MAX_NODE_NUM + 2];
     uint64_t nodeMaskPerPool[MAX_NODE_NUM + 2];
+    int totalNumThreads = 0;
 
     memset(cpusPerNode, 0, sizeof(cpusPerNode));
     memset(threadsPerPool, 0, sizeof(threadsPerPool));
@@ -388,9 +389,23 @@
         if (bNumaSupport)
             x265_log(p, X265_LOG_DEBUG, "NUMA node %d may use %d logical cores\n", i, cpusPerNode[i]);
         if (threadsPerPool[i])
+        {
             numPools += (threadsPerPool[i] + MAX_POOL_THREADS - 1) / MAX_POOL_THREADS;
+            totalNumThreads += threadsPerPool[i];
+        }
     }
+    if (!isThreadsReserved)
+    {
+        if (!numPools)
+        {
+            x265_log(p, X265_LOG_DEBUG, "No pool thread available. Deciding frame-threads based on detected CPU threads\n");
+            totalNumThreads = ThreadPool::getCpuCount(); // auto-detect frame threads
+        }
 
+        if (!p->frameNumThreads)
+            ThreadPool::getFrameThreadsCount(p, totalNumThreads);
+    }
+
     if (!numPools)
         return NULL;
 
@@ -412,7 +427,7 @@
                 node++;
             int numThreads = X265_MIN(MAX_POOL_THREADS, threadsPerPool[node]);
             int origNumThreads = numThreads;
-            if (p->lookaheadThreads > numThreads / 2)
+            if (i == 0 && p->lookaheadThreads > numThreads / 2)
             {
                 p->lookaheadThreads = numThreads / 2;
                 x265_log(p, X265_LOG_DEBUG, "Setting lookahead threads to a maximum of half the total number of threads\n");
@@ -423,7 +438,7 @@
                 maxProviders = 1;
             }
 
-            else
+            else if (i == 0)
                 numThreads -= p->lookaheadThreads;
             if (!pools[i].create(numThreads, maxProviders, nodeMaskPerPool[node]))
             {
@@ -643,4 +658,21 @@
 #endif
 }
 
+void ThreadPool::getFrameThreadsCount(x265_param* p, int cpuCount)
+{
+    int rows = (p->sourceHeight + p->maxCUSize - 1) >> g_log2Size[p->maxCUSize];
+    if (!p->bEnableWavefront)
+        p->frameNumThreads = X265_MIN3(cpuCount, (rows + 1) / 2, X265_MAX_FRAME_THREADS);
+    else if (cpuCount >= 32)
+        p->frameNumThreads = (p->sourceHeight > 2000) ? 6 : 5;
+    else if (cpuCount >= 16)
+        p->frameNumThreads = 4;
+    else if (cpuCount >= 8)
+        p->frameNumThreads = 3;
+    else if (cpuCount >= 4)
+        p->frameNumThreads = 2;
+    else
+        p->frameNumThreads = 1;
+}
+
 } // end namespace X265_NS
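The mapping in `ThreadPool::getFrameThreadsCount` is easier to reason about as a pure function of its inputs. The sketch below restates it without the `x265_param` struct; `frameThreadsFor` and `kMaxFrameThreads` are hypothetical names, and the value 16 for the cap is an assumption standing in for `X265_MAX_FRAME_THREADS`, which this diff does not define. The CTU row count uses a division, which matches the shift in the original for power-of-two CTU sizes.

```cpp
#include <algorithm>
#include <cassert>

// Assumption: stands in for X265_MAX_FRAME_THREADS (value not shown in the diff).
const int kMaxFrameThreads = 16;

// Returns the frame-thread count the diff's mapping would choose when the
// user has not set one, given the number of pool threads available.
int frameThreadsFor(int cpuCount, int sourceHeight, int maxCUSize, bool wavefront)
{
    assert(maxCUSize > 0);

    // Number of CTU rows in the picture, rounded up.
    int rows = (sourceHeight + maxCUSize - 1) / maxCUSize;

    if (!wavefront)
        return std::min({ cpuCount, (rows + 1) / 2, kMaxFrameThreads });
    if (cpuCount >= 32)
        return sourceHeight > 2000 ? 6 : 5;
    if (cpuCount >= 16)
        return 4;
    if (cpuCount >= 8)
        return 3;
    if (cpuCount >= 4)
        return 2;
    return 1;
}
```

Note that `cpuCount` here is the total from `--pools` when pools were created, and only falls back to the detected CPU count when no pool thread is available, which is the behaviour change described in the changelog.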
x265_2.4.tar.gz/source/common/threadpool.h -> x265_2.5.tar.gz/source/common/threadpool.h Changed
@@ -105,6 +105,7 @@
     static ThreadPool* allocThreadPools(x265_param* p, int& numPools, bool isThreadsReserved);
     static int  getCpuCount();
     static int  getNumaNodeCount();
+    static void getFrameThreadsCount(x265_param* p,int cpuCount);
 };
 
 /* Any worker thread may enlist the help of idle worker threads from the same
x265_2.4.tar.gz/source/common/x86/asm-primitives.cpp -> x265_2.5.tar.gz/source/common/x86/asm-primitives.cpp Changed
@@ -114,6 +114,7 @@
 #include "blockcopy8.h"
 #include "intrapred.h"
 #include "dct8.h"
+#include "seaintegral.h"
 }
 
 #define ALL_LUMA_CU_TYPED(prim, fncdef, fname, cpu) \
@@ -2157,6 +2158,17 @@
         p.fix8Unpack = PFX(cutree_fix8_unpack_avx2);
         p.fix8Pack = PFX(cutree_fix8_pack_avx2);
 
+        p.integral_initv[INTEGRAL_4] = PFX(integral4v_avx2);
+        p.integral_initv[INTEGRAL_8] = PFX(integral8v_avx2);
+        p.integral_initv[INTEGRAL_12] = PFX(integral12v_avx2);
+        p.integral_initv[INTEGRAL_16] = PFX(integral16v_avx2);
+        p.integral_initv[INTEGRAL_24] = PFX(integral24v_avx2);
+        p.integral_initv[INTEGRAL_32] = PFX(integral32v_avx2);
+        p.integral_inith[INTEGRAL_4] = PFX(integral4h_avx2);
+        p.integral_inith[INTEGRAL_8] = PFX(integral8h_avx2);
+        p.integral_inith[INTEGRAL_12] = PFX(integral12h_avx2);
+        p.integral_inith[INTEGRAL_16] = PFX(integral16h_avx2);
+
         /* TODO: This kernel needs to be modified to work with HIGH_BIT_DEPTH only 
         p.planeClipAndMax = PFX(planeClipAndMax_avx2); */
 
@@ -3695,6 +3707,19 @@
         p.fix8Unpack = PFX(cutree_fix8_unpack_avx2);
         p.fix8Pack = PFX(cutree_fix8_pack_avx2);
 
+        p.integral_initv[INTEGRAL_4] = PFX(integral4v_avx2);
+        p.integral_initv[INTEGRAL_8] = PFX(integral8v_avx2);
+        p.integral_initv[INTEGRAL_12] = PFX(integral12v_avx2);
+        p.integral_initv[INTEGRAL_16] = PFX(integral16v_avx2);
+        p.integral_initv[INTEGRAL_24] = PFX(integral24v_avx2);
+        p.integral_initv[INTEGRAL_32] = PFX(integral32v_avx2);
+        p.integral_inith[INTEGRAL_4] = PFX(integral4h_avx2);
+        p.integral_inith[INTEGRAL_8] = PFX(integral8h_avx2);
+        p.integral_inith[INTEGRAL_12] = PFX(integral12h_avx2);
+        p.integral_inith[INTEGRAL_16] = PFX(integral16h_avx2);
+        p.integral_inith[INTEGRAL_24] = PFX(integral24h_avx2);
+        p.integral_inith[INTEGRAL_32] = PFX(integral32h_avx2);
+
     }
 #endif
 }
x265_2.4.tar.gz/source/common/x86/loopfilter.asm -> x265_2.5.tar.gz/source/common/x86/loopfilter.asm Changed
@@ -1583,7 +1583,7 @@
     pshufb      m1, m4, m0
     pcmpgtb     m0, [pb_15]         ; m0 = [mask]
 
-    pblendvb    m6, m6, m1, m0      ; NOTE: don't use 3 parameters style, x264 macro have some bug!
+    pblendvb    m6, m1, m0
 
     pmovsxbw    m0, m6              ; offset
     punpckhbw   m6, m6
@@ -1630,7 +1630,7 @@
     pshufb      m6, m3, m1
     pshufb      m5, m4, m1
 
-    pblendvb    m6, m6, m5, m0    ; NOTE: don't use 3 parameters style, x264 macro have some bug!
+    pblendvb    m6, m5, m0
 
     pmovzxbw    m1, m2            ; rec
     punpckhbw   m2, m7
@@ -1904,7 +1904,7 @@
     sub         r3,     r4
     movu        xmm0,   [r3]
     movu        m3,     [r0]
-    pblendvb    m5,     m5,     m3,     xmm0
+    pblendvb    m5,     m3,     xmm0
     movu        [r0],   m5
 
 .end:
x265_2.4.tar.gz/source/common/x86/pixel-a.asm -> x265_2.5.tar.gz/source/common/x86/pixel-a.asm Changed
@@ -227,7 +227,7 @@
 ; clobber: m3..m7
 ; out: %1 = satd
 %macro SATD_4x4_MMX 3
-    %xdefine %%n n%1
+    %xdefine %%n nn%1
     %assign offset %2*SIZEOF_PIXEL
     LOAD_DIFF m4, m3, none, [r0+     offset], [r2+     offset]
     LOAD_DIFF m5, m3, none, [r0+  r1+offset], [r2+  r3+offset]
x265_2.4.tar.gz/source/common/x86/pixel-util8.asm -> x265_2.5.tar.gz/source/common/x86/pixel-util8.asm Changed
@@ -1597,7 +1597,7 @@
 
 .widthLess8:
     movu        m6, [r1]
-    pblendvb    m6, m6, m7, m0
+    pblendvb    m6, m7, m0
     movu        [r1], m6
 
 .nextH:
x265_2.5.tar.gz/source/common/x86/seaintegral.asm Added
@@ -0,0 +1,1062 @@
+;*****************************************************************************
+;* Copyright (C) 2013-2017 MulticoreWare, Inc
+;*
+;* Authors: Jayashri Murugan <jayashri@multicorewareinc.com>
+;*          Vignesh V Menon <vignesh@multicorewareinc.com>
+;*          Praveen Tiwari <praveen@multicorewareinc.com>
+;*
+;* This program is free software; you can redistribute it and/or modify
+;* it under the terms of the GNU General Public License as published by
+;* the Free Software Foundation; either version 2 of the License, or
+;* (at your option) any later version.
+;*
+;* This program is distributed in the hope that it will be useful,
+;* but WITHOUT ANY WARRANTY; without even the implied warranty of
+;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;* GNU General Public License for more details.
+;*
+;* You should have received a copy of the GNU General Public License
+;* along with this program; if not, write to the Free Software
+;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02111, USA.
+;*
+;* This program is also available under a commercial proprietary license.
+;* For more information, contact us at license @ x265.com.
+;*****************************************************************************/
+
+%include "x86inc.asm"
+%include "x86util.asm"
+
+SECTION .text 
+
+;-----------------------------------------------------------------------------
+;void integral_init4v_c(uint32_t *sum4, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral4v, 2, 3, 2
+    mov r2, r1
+    shl r2, 4
+
+.loop
+    movu    m0, [r0]
+    movu    m1, [r0 + r2]
+    psubd   m1, m0
+    movu    [r0], m1
+    add     r0, 32
+    sub     r1, 8
+    jnz     .loop
+    RET
+
+;-----------------------------------------------------------------------------
+;void integral_init8v_c(uint32_t *sum8, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral8v, 2, 3, 2
+    mov r2, r1
+    shl r2, 5
+
+.loop
+    movu    m0, [r0]
+    movu    m1, [r0 + r2]
+    psubd   m1, m0
+    movu    [r0], m1
+    add     r0, 32
+    sub     r1, 8
+    jnz     .loop
+    RET
+
+;-----------------------------------------------------------------------------
+;void integral_init12v_c(uint32_t *sum12, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral12v, 2, 4, 2
+    mov r2, r1
+    mov r3, r1
+    shl r2, 5
+    shl r3, 4
+    add r2, r3
+
+.loop
+    movu    m0, [r0]
+    movu    m1, [r0 + r2]
+    psubd   m1, m0
+    movu    [r0], m1
+    add     r0, 32
+    sub     r1, 8
+    jnz     .loop
+    RET
+
+;-----------------------------------------------------------------------------
+;void integral_init16v_c(uint32_t *sum16, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral16v, 2, 3, 2
+    mov r2, r1
+    shl r2, 6
+
+.loop
+    movu    m0, [r0]
+    movu    m1, [r0 + r2]
+    psubd   m1, m0
+    movu    [r0], m1
+    add     r0, 32
+    sub     r1, 8
+    jnz     .loop
+    RET
+
+;-----------------------------------------------------------------------------
+;void integral_init24v_c(uint32_t *sum24, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral24v, 2, 4, 2
+    mov r2, r1
+    mov r3, r1
+    shl r2, 6
+    shl r3, 5
+    add r2, r3
+
+.loop
+    movu    m0, [r0]
+    movu    m1, [r0 + r2]
+    psubd   m1, m0
+    movu    [r0], m1
+    add     r0, 32
+    sub     r1, 8
+    jnz     .loop
+    RET
+
+;-----------------------------------------------------------------------------
+;void integral_init32v_c(uint32_t *sum32, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral32v, 2, 3, 2
+    mov r2, r1
+    shl r2, 7
+
+.loop
+    movu    m0, [r0]
+    movu    m1, [r0 + r2]
+    psubd   m1, m0
+    movu    [r0], m1
+    add     r0, 32
+    sub     r1, 8
+    jnz     .loop
+    RET
+
+%macro INTEGRAL_FOUR_HORIZONTAL_16 0
+    pmovzxbw       m0, [r1]
+    pmovzxbw       m1, [r1 + 1]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 2]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 3]
+    paddw          m0, m1
+%endmacro
+
+%macro INTEGRAL_FOUR_HORIZONTAL_4 0
+    movd       xm0, [r1]
+    movd       xm1, [r1 + 1]
+    pmovzxbw   xm0, xm0
+    pmovzxbw   xm1, xm1
+    paddw      xm0, xm1
+    movd       xm1, [r1 + 2]
+    pmovzxbw   xm1, xm1
+    paddw      xm0, xm1
+    movd       xm1, [r1 + 3]
+    pmovzxbw   xm1, xm1
+    paddw      xm0, xm1
+%endmacro
+
+%macro INTEGRAL_FOUR_HORIZONTAL_8_HBD 0
+    pmovzxwd       m0, [r1]
+    pmovzxwd       m1, [r1 + 2]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 4]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 6]
+    paddd          m0, m1
+%endmacro
+
+%macro INTEGRAL_FOUR_HORIZONTAL_4_HBD 0
+    pmovzxwd       xm0, [r1]
+    pmovzxwd       xm1, [r1 + 2]
+    paddd          xm0, xm1
+    pmovzxwd       xm1, [r1 + 4]
+    paddd          xm0, xm1
+    pmovzxwd       xm1, [r1 + 6]
+    paddd          xm0, xm1
+%endmacro
+
+;-----------------------------------------------------------------------------
191
+;static void integral_init4h(uint32_t *sum, pixel *pix, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+%if HIGH_BIT_DEPTH
+cglobal integral4h, 3, 5, 3
+    lea            r3, [4 * r2]
+    sub            r0, r3
+    sub            r2, 4                      ;stride - 4
+    mov            r4, r2
+    shr            r4, 3
+
+.loop_8:
+    INTEGRAL_FOUR_HORIZONTAL_8_HBD
+    movu           m1, [r0]
+    paddd          m0, m1
+    movu           [r0 + r3], m0
+    add            r1, 16
+    add            r0, 32
+    sub            r2, 8
+    sub            r4, 1
+    jnz            .loop_8
+    INTEGRAL_FOUR_HORIZONTAL_4_HBD
+    movu           xm1, [r0]
+    paddd          xm0, xm1
+    movu           [r0 + r3], xm0
+    RET
+
+%else
+cglobal integral4h, 3, 5, 3
+    lea            r3, [4 * r2]
+    sub            r0, r3
+    sub            r2, 4                      ;stride - 4
+    mov            r4, r2
+    shr            r4, 4
+
+.loop_16:
+    INTEGRAL_FOUR_HORIZONTAL_16
+    vperm2i128     m2, m0, m0, 1
+    pmovzxwd       m2, xm2
+    pmovzxwd       m0, xm0
+    movu           m1, [r0]
+    paddd          m0, m1
+    movu           [r0 + r3], m0
+    movu           m1, [r0 + 32]
+    paddd          m2, m1
+    movu           [r0 + r3 + 32], m2
+    add            r1, 16
+    add            r0, 64
+    sub            r2, 16
+    sub            r4, 1
+    jnz            .loop_16
+    cmp            r2, 12
+    je             .loop_12
+    cmp            r2, 4
+    je             .loop_4
+
+.loop_12:
+    INTEGRAL_FOUR_HORIZONTAL_16
+    vperm2i128     m2, m0, m0, 1
+    pmovzxwd       xm2, xm2
+    pmovzxwd       m0, xm0
+    movu           m1, [r0]
+    paddd          m0, m1
+    movu           [r0 + r3], m0
+    movu           xm1, [r0 + 32]
+    paddd          xm2, xm1
+    movu           [r0 + r3 + 32], xm2
+    jmp             .end
+
+.loop_4:
+    INTEGRAL_FOUR_HORIZONTAL_4
+    pmovzxwd       xm0, xm0
+    movu           xm1, [r0]
+    paddd          xm0, xm1
+    movu           [r0 + r3], xm0
+    jmp            .end
+
+.end
+    RET
+%endif
+
+%macro INTEGRAL_EIGHT_HORIZONTAL_16 0
+    pmovzxbw       m0, [r1]
+    pmovzxbw       m1, [r1 + 1]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 2]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 3]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 4]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 5]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 6]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 7]
+    paddw          m0, m1
+%endmacro
+
+%macro INTEGRAL_EIGHT_HORIZONTAL_8 0
+    pmovzxbw       xm0, [r1]
+    pmovzxbw       xm1, [r1 + 1]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 2]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 3]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 4]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 5]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 6]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 7]
+    paddw          xm0, xm1
+%endmacro
+
+%macro INTEGRAL_EIGHT_HORIZONTAL_8_HBD 0
+    pmovzxwd       m0, [r1]
+    pmovzxwd       m1, [r1 + 2]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 4]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 6]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 8]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 10]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 12]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 14]
+    paddd          m0, m1
+%endmacro
+
+;-----------------------------------------------------------------------------
+;static void integral_init8h_c(uint32_t *sum, pixel *pix, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+%if HIGH_BIT_DEPTH
+cglobal integral8h, 3, 4, 3
+    lea            r3, [4 * r2]
+    sub            r0, r3
+    sub            r2, 8                      ;stride - 8
+
+.loop:
+    INTEGRAL_EIGHT_HORIZONTAL_8_HBD
+    movu           m1, [r0]
+    paddd          m0, m1
+    movu           [r0 + r3], m0
+    add            r1, 16
+    add            r0, 32
+    sub            r2, 8
+    jnz            .loop
+    RET
+
+%else
+cglobal integral8h, 3, 5, 3
+    lea            r3, [4 * r2]
+    sub            r0, r3
+    sub            r2, 8                      ;stride - 8
+    mov            r4, r2
+    shr            r4, 4
+
+.loop_16:
+    INTEGRAL_EIGHT_HORIZONTAL_16
+    vperm2i128     m2, m0, m0, 1
+    pmovzxwd       m2, xm2
+    pmovzxwd       m0, xm0
+    movu           m1, [r0]
+    paddd          m0, m1
+    movu           [r0 + r3], m0
+    movu           m1, [r0 + 32]
+    paddd          m2, m1
+    movu           [r0 + r3 + 32], m2
+    add            r1, 16
+    add            r0, 64
+    sub            r2, 16
+    sub            r4, 1
+    jnz            .loop_16
+    cmp            r2, 8
+    je             .loop_8
+    jmp             .end
+
+.loop_8:
+    INTEGRAL_EIGHT_HORIZONTAL_8
+    pmovzxwd       m0, xm0
+    movu           m1, [r0]
+    paddd          m0, m1
+    movu           [r0 + r3], m0
+    jmp             .end
+
+.end
+    RET
+%endif
+
+%macro INTEGRAL_TWELVE_HORIZONTAL_16 0
+    pmovzxbw       m0, [r1]
+    pmovzxbw       m1, [r1 + 1]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 2]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 3]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 4]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 5]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 6]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 7]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 8]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 9]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 10]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 11]
+    paddw          m0, m1
+%endmacro
+
+%macro INTEGRAL_TWELVE_HORIZONTAL_4 0
+    movd           xm0, [r1]
+    movd           xm1, [r1 + 1]
+    pmovzxbw       xm0, xm0
+    pmovzxbw       xm1, xm1
+    paddw          xm0, xm1
+    movd           xm1, [r1 + 2]
+    pmovzxbw       xm1, xm1
+    paddw          xm0, xm1
+    movd           xm1, [r1 + 3]
+    pmovzxbw       xm1, xm1
+    paddw          xm0, xm1
+    movd           xm1, [r1 + 4]
+    pmovzxbw       xm1, xm1
+    paddw          xm0, xm1
+    movd           xm1, [r1 + 5]
+    pmovzxbw       xm1, xm1
+    paddw          xm0, xm1
+    movd           xm1, [r1 + 6]
+    pmovzxbw       xm1, xm1
+    paddw          xm0, xm1
+    movd           xm1, [r1 + 7]
+    pmovzxbw       xm1, xm1
+    paddw          xm0, xm1
+    movd           xm1, [r1 + 8]
+    pmovzxbw       xm1, xm1
+    paddw          xm0, xm1
+    movd           xm1, [r1 + 9]
+    pmovzxbw       xm1, xm1
+    paddw          xm0, xm1
+    movd           xm1, [r1 + 10]
+    pmovzxbw       xm1, xm1
+    paddw          xm0, xm1
+    movd           xm1, [r1 + 11]
+    pmovzxbw       xm1, xm1
+    paddw          xm0, xm1
+%endmacro
+
+%macro INTEGRAL_TWELVE_HORIZONTAL_8_HBD 0
+    pmovzxwd       m0, [r1]
+    pmovzxwd       m1, [r1 + 2]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 4]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 6]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 8]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 10]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 12]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 14]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 16]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 18]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 20]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 22]
+    paddd          m0, m1
+%endmacro
+
+%macro INTEGRAL_TWELVE_HORIZONTAL_4_HBD 0
+    pmovzxwd       xm0, [r1]
+    pmovzxwd       xm1, [r1 + 2]
+    paddd          xm0, xm1
+    pmovzxwd       xm1, [r1 + 4]
+    paddd          xm0, xm1
+    pmovzxwd       xm1, [r1 + 6]
+    paddd          xm0, xm1
+    pmovzxwd       xm1, [r1 + 8]
+    paddd          xm0, xm1
+    pmovzxwd       xm1, [r1 + 10]
+    paddd          xm0, xm1
+    pmovzxwd       xm1, [r1 + 12]
+    paddd          xm0, xm1
+    pmovzxwd       xm1, [r1 + 14]
+    paddd          xm0, xm1
+    pmovzxwd       xm1, [r1 + 16]
+    paddd          xm0, xm1
+    pmovzxwd       xm1, [r1 + 18]
+    paddd          xm0, xm1
+    pmovzxwd       xm1, [r1 + 20]
+    paddd          xm0, xm1
+    pmovzxwd       xm1, [r1 + 22]
+    paddd          xm0, xm1
+%endmacro
+
+;-----------------------------------------------------------------------------
+;static void integral_init12h_c(uint32_t *sum, pixel *pix, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+%if HIGH_BIT_DEPTH
+cglobal integral12h, 3, 5, 3
+    lea            r3, [4 * r2]
+    sub            r0, r3
+    sub            r2, 12                      ;stride - 12
+    mov            r4, r2
+    shr            r4, 3
+
+.loop:
+    INTEGRAL_TWELVE_HORIZONTAL_8_HBD
+    movu           m1, [r0]
+    paddd          m0, m1
+    movu           [r0 + r3], m0
+    add            r1, 16
+    add            r0, 32
+    sub            r2, 8
+    sub            r4, 1
+    jnz            .loop
+    INTEGRAL_TWELVE_HORIZONTAL_4_HBD
+    movu           xm1, [r0]
+    paddd          xm0, xm1
+    movu           [r0 + r3], xm0
+    RET
+
+%else
+cglobal integral12h, 3, 5, 3
+    lea            r3, [4 * r2]
+    sub            r0, r3
+    sub            r2, 12                      ;stride - 12
+    mov            r4, r2
+    shr            r4, 4
+
+.loop_16:
+    INTEGRAL_TWELVE_HORIZONTAL_16
+    vperm2i128     m2, m0, m0, 1
+    pmovzxwd       m2, xm2
+    pmovzxwd       m0, xm0
+    movu           m1, [r0]
+    paddd          m0, m1
+    movu           [r0 + r3], m0
+    movu           m1, [r0 + 32]
+    paddd          m2, m1
+    movu           [r0 + r3 + 32], m2
+    add            r1, 16
+    add            r0, 64
+    sub            r2, 16
+    sub            r4, 1
+    jnz            .loop_16
+    cmp            r2, 12
+    je             .loop_12
+    cmp            r2, 4
+    je             .loop_4
+
+.loop_12:
+    INTEGRAL_TWELVE_HORIZONTAL_16
+    vperm2i128     m2, m0, m0, 1
+    pmovzxwd       xm2, xm2
+    pmovzxwd       m0, xm0
+    movu           m1, [r0]
+    paddd          m0, m1
+    movu           [r0 + r3], m0
+    movu           xm1, [r0 + 32]
+    paddd          xm2, xm1
+    movu           [r0 + r3 + 32], xm2
+    jmp             .end
+
+.loop_4:
+    INTEGRAL_TWELVE_HORIZONTAL_4
+    pmovzxwd       xm0, xm0
+    movu           xm1, [r0]
+    paddd          xm0, xm1
+    movu           [r0 + r3], xm0
+    jmp            .end
+
+.end
+    RET
+%endif
+
+%macro INTEGRAL_SIXTEEN_HORIZONTAL_16 0
+    pmovzxbw       m0, [r1]
+    pmovzxbw       m1, [r1 + 1]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 2]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 3]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 4]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 5]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 6]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 7]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 8]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 9]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 10]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 11]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 12]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 13]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 14]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 15]
+    paddw          m0, m1
+%endmacro
+
+%macro INTEGRAL_SIXTEEN_HORIZONTAL_8 0
+    pmovzxbw       xm0, [r1]
+    pmovzxbw       xm1, [r1 + 1]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 2]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 3]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 4]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 5]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 6]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 7]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 8]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 9]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 10]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 11]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 12]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 13]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 14]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 15]
+    paddw          xm0, xm1
+%endmacro
+
+%macro INTEGRAL_SIXTEEN_HORIZONTAL_8_HBD 0
+    pmovzxwd       m0, [r1]
+    pmovzxwd       m1, [r1 + 2]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 4]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 6]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 8]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 10]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 12]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 14]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 16]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 18]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 20]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 22]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 24]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 26]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 28]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 30]
+    paddd          m0, m1
+%endmacro
+
+;-----------------------------------------------------------------------------
+;static void integral_init16h_c(uint32_t *sum, pixel *pix, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+%if HIGH_BIT_DEPTH
+cglobal integral16h, 3, 4, 3
+    lea            r3, [4 * r2]
+    sub            r0, r3
+    sub            r2, 16                      ;stride - 16
+
+.loop:
+    INTEGRAL_SIXTEEN_HORIZONTAL_8_HBD
+    movu           m1, [r0]
+    paddd          m0, m1
+    movu           [r0 + r3], m0
+    add            r1, 16
+    add            r0, 32
+    sub            r2, 8
+    jnz            .loop
+    RET
+
+%else
+cglobal integral16h, 3, 5, 3
+    lea            r3, [4 * r2]
+    sub            r0, r3
+    sub            r2, 16                      ;stride - 16
+    mov            r4, r2
+    shr            r4, 4
+
+.loop_16:
+    INTEGRAL_SIXTEEN_HORIZONTAL_16
+    vperm2i128     m2, m0, m0, 1
+    pmovzxwd       m2, xm2
+    pmovzxwd       m0, xm0
+    movu           m1, [r0]
+    paddd          m0, m1
+    movu           [r0 + r3], m0
+    movu           m1, [r0 + 32]
+    paddd          m2, m1
+    movu           [r0 + r3 + 32], m2
+    add            r1, 16
+    add            r0, 64
+    sub            r2, 16
+    sub            r4, 1
+    jnz            .loop_16
+    cmp            r2, 8
+    je             .loop_8
+    jmp             .end
+
+.loop_8:
+    INTEGRAL_SIXTEEN_HORIZONTAL_8
+    pmovzxwd       m0, xm0
+    movu           m1, [r0]
+    paddd          m0, m1
+    movu           [r0 + r3], m0
+    jmp             .end
+
+.end
+    RET
+%endif
+
+%macro INTEGRAL_TWENTYFOUR_HORIZONTAL_16 0
+    pmovzxbw       m0, [r1]
+    pmovzxbw       m1, [r1 + 1]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 2]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 3]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 4]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 5]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 6]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 7]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 8]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 9]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 10]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 11]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 12]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 13]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 14]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 15]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 16]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 17]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 18]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 19]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 20]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 21]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 22]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 23]
+    paddw          m0, m1
+%endmacro
+
+%macro INTEGRAL_TWENTYFOUR_HORIZONTAL_8 0
+    pmovzxbw       xm0, [r1]
+    pmovzxbw       xm1, [r1 + 1]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 2]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 3]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 4]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 5]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 6]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 7]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 8]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 9]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 10]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 11]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 12]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 13]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 14]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 15]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 16]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 17]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 18]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 19]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 20]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 21]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 22]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 23]
+    paddw          xm0, xm1
+%endmacro
+
+;-----------------------------------------------------------------------------
+;static void integral_init24h_c(uint32_t *sum, pixel *pix, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral24h, 3, 5, 3
+    lea            r3, [4 * r2]
+    sub            r0, r3
+    sub            r2, 24                      ;stride - 24
+    mov            r4, r2
+    shr            r4, 4
+
+.loop_16:
+    INTEGRAL_TWENTYFOUR_HORIZONTAL_16
+    vperm2i128     m2, m0, m0, 1
+    pmovzxwd       m2, xm2
+    pmovzxwd       m0, xm0
+    movu           m1, [r0]
+    paddd          m0, m1
+    movu           [r0 + r3], m0
+    movu           m1, [r0 + 32]
+    paddd          m2, m1
+    movu           [r0 + r3 + 32], m2
+    add            r1, 16
+    add            r0, 64
+    sub            r2, 16
+    sub            r4, 1
+    jnz            .loop_16
+    cmp            r2, 8
+    je             .loop_8
+    jmp             .end
+
+.loop_8:
+    INTEGRAL_TWENTYFOUR_HORIZONTAL_8
+    pmovzxwd       m0, xm0
+    movu           m1, [r0]
+    paddd          m0, m1
+    movu           [r0 + r3], m0
+    jmp             .end
+
+.end
+    RET
+
+%macro INTEGRAL_THIRTYTWO_HORIZONTAL_16 0
+    pmovzxbw       m0, [r1]
+    pmovzxbw       m1, [r1 + 1]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 2]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 3]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 4]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 5]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 6]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 7]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 8]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 9]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 10]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 11]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 12]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 13]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 14]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 15]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 16]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 17]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 18]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 19]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 20]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 21]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 22]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 23]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 24]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 25]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 26]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 27]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 28]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 29]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 30]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 31]
+    paddw          m0, m1
+%endmacro
+
+
+%macro INTEGRAL_THIRTYTWO_HORIZONTAL_8 0
+    pmovzxbw       xm0, [r1]
+    pmovzxbw       xm1, [r1 + 1]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 2]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 3]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 4]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 5]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 6]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 7]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 8]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 9]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 10]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 11]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 12]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 13]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 14]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 15]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 16]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 17]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 18]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 19]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 20]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 21]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 22]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 23]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 24]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 25]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 26]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 27]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 28]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 29]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 30]
+    paddw          xm0, xm1
+    pmovzxbw       xm1, [r1 + 31]
+    paddw          xm0, xm1
+%endmacro
+
+;-----------------------------------------------------------------------------
+;static void integral_init32h_c(uint32_t *sum, pixel *pix, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral32h, 3, 5, 3
+    lea            r3, [4 * r2]
+    sub            r0, r3
+    sub            r2, 32                      ;stride - 32
+    mov            r4, r2
+    shr            r4, 4
+
+.loop_16:
+    INTEGRAL_THIRTYTWO_HORIZONTAL_16
+    vperm2i128     m2, m0, m0, 1
+    pmovzxwd       m2, xm2
+    pmovzxwd       m0, xm0
+    movu           m1, [r0]
+    paddd          m0, m1
+    movu           [r0 + r3], m0
+    movu           m1, [r0 + 32]
+    paddd          m2, m1
+    movu           [r0 + r3 + 32], m2
+    add            r1, 16
+    add            r0, 64
+    sub            r2, 16
+    sub            r4, 1
+    jnz            .loop_16
+    cmp            r2, 8
+    je             .loop_8
+    jmp             .end
+
+.loop_8:
+    INTEGRAL_THIRTYTWO_HORIZONTAL_8
+    pmovzxwd       m0, xm0
+    movu           m1, [r0]
+    paddd          m0, m1
+    movu           [r0 + r3], m0
+    jmp             .end
+
+.end
+    RET
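Reviewer note on the horizontal kernels in this file: read from the assembly (r0 = sum, r1 = pix, r2 = stride), each integralNh routine appears to write, for every x below stride - N, the sum of N consecutive pixels starting at x plus the value stored one row up in sum. The scalar C sketch below restates that reading as an aid for comparing against the AVX2 code. The name integral_row_h, the window parameter n and the 8-bit pixel typedef are illustrative assumptions, not x265 identifiers.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint8_t pixel; /* assumption: 8-bit build; HIGH_BIT_DEPTH builds use 16-bit pixels */

/* Hypothetical scalar counterpart of the integralNh kernels.
 * sum points at the row being written; sum - stride is the row above it. */
static void integral_row_h(uint32_t *sum, const pixel *pix, intptr_t stride, int n)
{
    uint32_t window = 0;

    /* Sum of the first n pixels: the window that starts at x = 0. */
    for (int i = 0; i < n; i++)
        window += pix[i];

    for (intptr_t x = 0; x < stride - n; x++)
    {
        /* Window sum at x plus the accumulated value from the previous row. */
        sum[x] = window + sum[x - stride];

        /* Slide the window one pixel to the right. */
        window += pix[x + n];
        window -= pix[x];
    }
}
```

With 8-bit pixels the widest window in this file sums at most 32 x 255 = 8160, which fits the 16-bit lanes that the paddw chains accumulate in before the pmovzxwd widening to 32 bits.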
x265_2.5.tar.gz/source/common/x86/seaintegral.h Added
@@ -0,0 +1,42 @@
+/*****************************************************************************
+* Copyright (C) 2013-2017 MulticoreWare, Inc
+*
+* Authors: Vignesh V Menon <vignesh@multicorewareinc.com>
+*          Jayashri Murugan <jayashri@multicorewareinc.com>
+*          Praveen Tiwari <praveen@multicorewareinc.com>
+*
+* This program is free software; you can redistribute it and/or modify
+* it under the terms of the GNU General Public License as published by
+* the Free Software Foundation; either version 2 of the License, or
+* (at your option) any later version.
+*
+* This program is distributed in the hope that it will be useful,
+* but WITHOUT ANY WARRANTY; without even the implied warranty of
+* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+* GNU General Public License for more details.
+*
+* You should have received a copy of the GNU General Public License
+* along with this program; if not, write to the Free Software
+* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02111, USA.
+*
+* This program is also available under a commercial proprietary license.
+* For more information, contact us at license @ x265.com.
+*****************************************************************************/
+
+#ifndef X265_SEAINTEGRAL_H
+#define X265_SEAINTEGRAL_H
+
+void PFX(integral4v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral8v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral12v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral16v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral24v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral32v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral4h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+void PFX(integral8h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+void PFX(integral12h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+void PFX(integral16h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+void PFX(integral24h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+void PFX(integral32h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+
+#endif //X265_SEAINTEGRAL_H
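Reviewer note on the two prototype shapes in seaintegral.h: the h functions take (sum, pix, stride) and the v functions take only (sum, stride), which suggests a two-pass scheme in which a horizontal pass accumulates windowed row sums down the image and a vertical pass then differences rows N apart to leave N x N box sums. The C sketch below illustrates that scheme. The vertical semantics are an assumption (the vertical kernels are not part of this excerpt), and the names sketch_integral_h and sketch_integral_v, the n parameter and the 8-bit pixel typedef are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint8_t pixel; /* assumption: 8-bit build */

/* Horizontal pass, same argument shape as the integralNh prototypes:
 * sum[x] = (pix[x] + ... + pix[x + n - 1]) + value one row above. */
static void sketch_integral_h(uint32_t *sum, const pixel *pix, intptr_t stride, int n)
{
    uint32_t window = 0;
    for (int i = 0; i < n; i++)
        window += pix[i];
    for (intptr_t x = 0; x < stride - n; x++)
    {
        sum[x] = window + sum[x - stride];
        window += pix[x + n];
        window -= pix[x];
    }
}

/* Vertical pass, same argument shape as the integralNv prototypes.
 * ASSUMED semantics: the difference of two accumulated rows n apart,
 * written in place, leaves the n x n box sum at each position. */
static void sketch_integral_v(uint32_t *sum, intptr_t stride, int n)
{
    for (intptr_t x = 0; x < stride; x++)
        sum[x] = sum[x + n * stride] - sum[x];
}
```

A caller would keep one zeroed row above the first image row, run the horizontal pass once per image row into the row below it, and then run the vertical pass over the rows in ascending order.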
x265_2.4.tar.gz/source/common/x86/x86inc.asm -> x265_2.5.tar.gz/source/common/x86/x86inc.asm Changed
@@ -76,10 +76,6 @@
     SECTION .rodata align=%1
 %endmacro
 
-%macro SECTION_TEXT 0-1 16
-    SECTION .text align=%1
-%endmacro
-
 %if WIN64
     %define PIC
 %elif ARCH_X86_64 == 0
@@ -139,6 +135,7 @@
     %define r%1w %2w
     %define r%1b %2b
     %define r%1h %2h
+    %define %2q %2
     %if %0 == 2
         %define r%1m  %2d
         %define r%1mp %2
@@ -163,9 +160,9 @@
     %define e%1h %3
     %define r%1b %2
     %define e%1b %2
-%if ARCH_X86_64 == 0
-    %define r%1  e%1
-%endif
+    %if ARCH_X86_64 == 0
+        %define r%1 e%1
+    %endif
 %endmacro
 
 DECLARE_REG_SIZE ax, al, ah
@@ -275,7 +272,7 @@
 
 %macro ASSERT 1
     %if (%1) == 0
-        %error assert failed
+        %error assertion ``%1'' failed
     %endif
 %endmacro
 
@@ -365,9 +362,19 @@
     %ifnum %1
         %if %1 != 0 && required_stack_alignment > STACK_ALIGNMENT
             %if %1 > 0
+                ; Reserve an additional register for storing the original stack pointer, but avoid using
+                ; eax/rax for this purpose since it can potentially get overwritten as a return value.
                 %assign regs_used (regs_used + 1)
-            %elif ARCH_X86_64 && regs_used == num_args && num_args <= 4 + UNIX64 * 2
-                %warning "Stack pointer will overwrite register argument"
+                %if ARCH_X86_64 && regs_used == 7
+                    %assign regs_used 8
+                %elif ARCH_X86_64 == 0 && regs_used == 1
+                    %assign regs_used 2
+                %endif
+            %endif
+            %if ARCH_X86_64 && regs_used < 5 + UNIX64 * 3
+                ; Ensure that we don't clobber any registers containing arguments. For UNIX64 we also preserve r6 (rax)
+                ; since it's used as a hidden argument in vararg functions to specify the number of vector registers used.
+                %assign regs_used 5 + UNIX64 * 3
             %endif
         %endif
     %endif
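Reviewer note on the hunk ending here: the new register-count adjustment can be restated in C as a reading aid. This is only a sketch of the preprocessor logic shown in the hunk. The function name and the boolean parameters are hypothetical stand-ins for the x86inc symbols ARCH_X86_64 and UNIX64 and for the required_stack_alignment > STACK_ALIGNMENT test.

```c
#include <stdbool.h>

/* Hypothetical restatement of the regs_used adjustment from the hunk.
 * stack_size plays the role of the macro argument %1. */
static int adjust_regs_used(int regs_used, int stack_size,
                            bool arch_x86_64, bool unix64, bool needs_realign)
{
    /* The hunk only applies when a non-zero stack size needs extra alignment. */
    if (stack_size == 0 || !needs_realign)
        return regs_used;

    if (stack_size > 0)
    {
        /* One more register to hold the original stack pointer... */
        regs_used += 1;

        /* ...but skip the slot that would land on eax/rax (per the diff comment). */
        if (arch_x86_64 && regs_used == 7)
            regs_used = 8;
        else if (!arch_x86_64 && regs_used == 1)
            regs_used = 2;
    }

    /* On x86-64, never go below the argument registers: 5 + UNIX64 * 3. */
    int min_regs = 5 + (unix64 ? 3 : 0);
    if (arch_x86_64 && regs_used < min_regs)
        regs_used = min_regs;

    return regs_used;
}
```

Compared with the removed lines, which only emitted a warning when the stack pointer was about to overwrite a register argument, the new logic raises regs_used so that the overwrite cannot happen.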
@@ -396,10 +403,10 @@
 DECLARE_REG 8,  rsi, 72
 DECLARE_REG 9,  rbx, 80
 DECLARE_REG 10, rbp, 88
-DECLARE_REG 11, R12, 96
-DECLARE_REG 12, R13, 104
-DECLARE_REG 13, R14, 112
-DECLARE_REG 14, R15, 120
+DECLARE_REG 11, R14, 96
+DECLARE_REG 12, R15, 104
+DECLARE_REG 13, R12, 112
+DECLARE_REG 14, R13, 120
 
 %macro PROLOGUE 2-5+ 0 ; #args, #regs, #xmm_regs, [stack_size,] arg_names...
     %assign num_args %1
@@ -445,45 +452,46 @@
     WIN64_PUSH_XMM
 %endmacro
 
-%macro WIN64_RESTORE_XMM_INTERNAL 1
+%macro WIN64_RESTORE_XMM_INTERNAL 0
     %assign %%pad_size 0
     %if xmm_regs_used > 8
         %assign %%i xmm_regs_used
         %rep xmm_regs_used-8
             %assign %%i %%i-1
-            movaps xmm %+ %%i, [%1 + (%%i-8)*16 + stack_size + 32]
+            movaps xmm %+ %%i, [rsp + (%%i-8)*16 + stack_size + 32]
         %endrep
     %endif
     %if stack_size_padded > 0
         %if stack_size > 0 && required_stack_alignment > STACK_ALIGNMENT
             mov rsp, rstkm
         %else
-            add %1, stack_size_padded
+            add rsp, stack_size_padded
             %assign %%pad_size stack_size_padded
         %endif
     %endif
     %if xmm_regs_used > 7
-        movaps xmm7, [%1 + stack_offset - %%pad_size + 24]
+        movaps xmm7, [rsp + stack_offset - %%pad_size + 24]
     %endif
     %if xmm_regs_used > 6
-        movaps xmm6, [%1 + stack_offset - %%pad_size +  8]
+        movaps xmm6, [rsp + stack_offset - %%pad_size +  8]
     %endif
 %endmacro
 
-%macro WIN64_RESTORE_XMM 1
-    WIN64_RESTORE_XMM_INTERNAL %1
+%macro WIN64_RESTORE_XMM 0
+    WIN64_RESTORE_XMM_INTERNAL
     %assign stack_offset (stack_offset-stack_size_padded)
+    %assign stack_size_padded 0
     %assign xmm_regs_used 0
 %endmacro
 
 %define has_epilogue regs_used > 7 || xmm_regs_used > 6 || mmsize == 32 || stack_size > 0
 
 %macro RET 0
-    WIN64_RESTORE_XMM_INTERNAL rsp
+    WIN64_RESTORE_XMM_INTERNAL
     POP_IF_USED 14, 13, 12, 11, 10, 9, 8, 7
-%if mmsize == 32
-    vzeroupper
-%endif
+    %if mmsize == 32
+        vzeroupper
+    %endif
     AUTO_REP_RET
 %endmacro
 
@@ -500,10 +508,10 @@
 DECLARE_REG 8,  R11, 24
 DECLARE_REG 9,  rbx, 32
 DECLARE_REG 10, rbp, 40
-DECLARE_REG 11, R12, 48
-DECLARE_REG 12, R13, 56
-DECLARE_REG 13, R14, 64
-DECLARE_REG 14, R15, 72
+DECLARE_REG 11, R14, 48
+DECLARE_REG 12, R15, 56
+DECLARE_REG 13, R12, 64
+DECLARE_REG 14, R13, 72
 
 %macro PROLOGUE 2-5+ ; #args, #regs, #xmm_regs, [stack_size,] arg_names...
     %assign num_args %1
@@ -520,17 +528,17 @@
153
 %define has_epilogue regs_used > 9 || mmsize == 32 || stack_size > 0
154
 
155
 %macro RET 0
156
-%if stack_size_padded > 0
157
-%if required_stack_alignment > STACK_ALIGNMENT
158
-    mov rsp, rstkm
159
-%else
160
-    add rsp, stack_size_padded
161
-%endif
162
-%endif
163
+    %if stack_size_padded > 0
164
+        %if required_stack_alignment > STACK_ALIGNMENT
165
+            mov rsp, rstkm
166
+        %else
167
+            add rsp, stack_size_padded
168
+        %endif
169
+    %endif
170
     POP_IF_USED 14, 13, 12, 11, 10, 9
171
-%if mmsize == 32
172
-    vzeroupper
173
-%endif
174
+    %if mmsize == 32
175
+        vzeroupper
176
+    %endif
177
     AUTO_REP_RET
178
 %endmacro
179
 
180
@@ -576,29 +584,29 @@
181
 %define has_epilogue regs_used > 3 || mmsize == 32 || stack_size > 0
182
 
183
 %macro RET 0
184
-%if stack_size_padded > 0
185
-%if required_stack_alignment > STACK_ALIGNMENT
186
-    mov rsp, rstkm
187
-%else
188
-    add rsp, stack_size_padded
189
-%endif
190
-%endif
191
+    %if stack_size_padded > 0
192
+        %if required_stack_alignment > STACK_ALIGNMENT
193
+            mov rsp, rstkm
194
+        %else
195
+            add rsp, stack_size_padded
196
+        %endif
197
+    %endif
198
     POP_IF_USED 6, 5, 4, 3
199
-%if mmsize == 32
200
-    vzeroupper
201
-%endif
202
+    %if mmsize == 32
203
+        vzeroupper
204
+    %endif
205
     AUTO_REP_RET
206
 %endmacro
207
 
208
 %endif ;======================================================================
209
 
210
 %if WIN64 == 0
211
-%macro WIN64_SPILL_XMM 1
212
-%endmacro
213
-%macro WIN64_RESTORE_XMM 1
214
-%endmacro
215
-%macro WIN64_PUSH_XMM 0
216
-%endmacro
217
+    %macro WIN64_SPILL_XMM 1
218
+    %endmacro
219
+    %macro WIN64_RESTORE_XMM 0
220
+    %endmacro
221
+    %macro WIN64_PUSH_XMM 0
222
+    %endmacro
223
 %endif
224
 
225
 ; On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either
226
@@ -615,10 +623,8 @@
227
 
228
 %define last_branch_adr $$
229
 %macro AUTO_REP_RET 0
230
-    %ifndef cpuflags
231
-        times ((last_branch_adr-$)>>31)+1 rep ; times 1 iff $ != last_branch_adr.
232
-    %elif notcpuflag(ssse3)
233
-        times ((last_branch_adr-$)>>31)+1 rep
234
+    %if notcpuflag(ssse3)
235
+        times ((last_branch_adr-$)>>31)+1 rep ; times 1 iff $ == last_branch_adr.
236
     %endif
237
     ret
238
 %endmacro
239
@@ -627,8 +633,10 @@
240
     %rep %0
241
         %macro %1 1-2 %1
242
             %2 %1
243
-            %%branch_instr:
244
-            %xdefine last_branch_adr %%branch_instr
245
+            %if notcpuflag(ssse3)
246
+                %%branch_instr equ $
247
+                %xdefine last_branch_adr %%branch_instr
248
+            %endif
249
         %endmacro
250
         %rotate 1
251
     %endrep
252
@@ -722,7 +730,7 @@
253
 ; This is needed for ELF, otherwise the GNU linker assumes the stack is
254
 ; executable by default.
255
 %ifidn __OUTPUT_FORMAT__,elf
256
-SECTION .note.GNU-stack noalloc noexec nowrite progbits
257
+    [SECTION .note.GNU-stack noalloc noexec nowrite progbits]
258
 %endif
259
 
260
 ; cpuflags
261
@@ -734,27 +742,28 @@
262
 %assign cpuflags_sse      (1<<4) | cpuflags_mmx2
263
 %assign cpuflags_sse2     (1<<5) | cpuflags_sse
264
 %assign cpuflags_sse2slow (1<<6) | cpuflags_sse2
265
-%assign cpuflags_sse3     (1<<7) | cpuflags_sse2
266
-%assign cpuflags_ssse3    (1<<8) | cpuflags_sse3
267
-%assign cpuflags_sse4     (1<<9) | cpuflags_ssse3
268
-%assign cpuflags_sse42    (1<<10)| cpuflags_sse4
269
-%assign cpuflags_avx      (1<<11)| cpuflags_sse42
270
-%assign cpuflags_xop      (1<<12)| cpuflags_avx
271
-%assign cpuflags_fma4     (1<<13)| cpuflags_avx
272
-%assign cpuflags_avx2     (1<<14)| cpuflags_avx
273
+%assign cpuflags_lzcnt    (1<<7) | cpuflags_sse2
274
+%assign cpuflags_sse3     (1<<8) | cpuflags_sse2
275
+%assign cpuflags_ssse3    (1<<9) | cpuflags_sse3
276
+%assign cpuflags_sse4     (1<<10)| cpuflags_ssse3
277
+%assign cpuflags_sse42    (1<<11)| cpuflags_sse4
278
+%assign cpuflags_avx      (1<<12)| cpuflags_sse42
279
+%assign cpuflags_xop      (1<<13)| cpuflags_avx
280
+%assign cpuflags_fma4     (1<<14)| cpuflags_avx
281
 %assign cpuflags_fma3     (1<<15)| cpuflags_avx
282
+%assign cpuflags_bmi1     (1<<16)| cpuflags_avx | cpuflags_lzcnt
283
+%assign cpuflags_bmi2     (1<<17)| cpuflags_bmi1
284
+%assign cpuflags_avx2     (1<<18)| cpuflags_fma3 | cpuflags_bmi2
285
 
286
-%assign cpuflags_cache32  (1<<16)
287
-%assign cpuflags_cache64  (1<<17)
288
-%assign cpuflags_slowctz  (1<<18)
289
-%assign cpuflags_lzcnt    (1<<19)
290
-%assign cpuflags_aligned  (1<<20) ; not a cpu feature, but a function variant
291
-%assign cpuflags_atom     (1<<21)
292
-%assign cpuflags_bmi1     (1<<22)|cpuflags_lzcnt
293
-%assign cpuflags_bmi2     (1<<23)|cpuflags_bmi1
294
+%assign cpuflags_cache32  (1<<19)
295
+%assign cpuflags_cache64  (1<<20)
296
+%assign cpuflags_slowctz  (1<<21)
297
+%assign cpuflags_aligned  (1<<22) ; not a cpu feature, but a function variant
298
+%assign cpuflags_atom     (1<<23)
299
 
300
-%define    cpuflag(x) ((cpuflags & (cpuflags_ %+ x)) == (cpuflags_ %+ x))
301
-%define notcpuflag(x) ((cpuflags & (cpuflags_ %+ x)) != (cpuflags_ %+ x))
302
+; Returns a boolean value expressing whether or not the specified cpuflag is enabled.
303
+%define    cpuflag(x) (((((cpuflags & (cpuflags_ %+ x)) ^ (cpuflags_ %+ x)) - 1) >> 31) & 1)
304
+%define notcpuflag(x) (cpuflag(x) ^ 1)
305
 
306
 ; Takes an arbitrary number of cpuflags from the above list.
307
 ; All subsequent functions (up to the next INIT_CPUFLAGS) is built for the specified cpu.
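Review note: the rewritten cpuflag(x) in the hunk above yields a strict 0/1 value using arithmetic only, so it can be used inside `times` expressions as well as in `%if`. The C++ sketch below replays that arithmetic on 32-bit unsigned values. The flag constants are simplified, hypothetical stand-ins rather than the full dependency chain from the file, and the trick assumes no flag uses bit 31.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified flag sets (the real file chains more levels).
constexpr uint32_t kSse2  = (1u << 5);
constexpr uint32_t kLzcnt = (1u << 7)  | kSse2;
constexpr uint32_t kAvx   = (1u << 12) | kSse2;
constexpr uint32_t kBmi1  = (1u << 16) | kAvx | kLzcnt;

// 1 if every bit of `flag` is present in `cpuflags`, else 0.
// (cpuflags & flag) ^ flag leaves only the missing bits; subtracting 1
// wraps to all-ones exactly when nothing is missing, which sets bit 31.
constexpr uint32_t cpuflag(uint32_t cpuflags, uint32_t flag) {
    return ((((cpuflags & flag) ^ flag) - 1u) >> 31) & 1u;
}

// Logical negation of cpuflag(), also a strict 0/1 value.
constexpr uint32_t notcpuflag(uint32_t cpuflags, uint32_t flag) {
    return cpuflag(cpuflags, flag) ^ 1u;
}

static_assert(cpuflag(kBmi1, kLzcnt) == 1u, "bmi1 implies lzcnt");
static_assert(cpuflag(kAvx, kLzcnt) == 0u, "avx alone lacks lzcnt");
```

Because the result is always 0 or 1, notcpuflag can be defined as a simple XOR, which matches the new definition in the diff.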
308
@@ -823,14 +832,14 @@
309
     %define movnta movntq
310
     %assign %%i 0
311
     %rep 8
312
-    CAT_XDEFINE m, %%i, mm %+ %%i
313
-    CAT_XDEFINE nmm, %%i, %%i
314
-    %assign %%i %%i+1
315
+        CAT_XDEFINE m, %%i, mm %+ %%i
316
+        CAT_XDEFINE nnmm, %%i, %%i
317
+        %assign %%i %%i+1
318
     %endrep
319
     %rep 8
320
-    CAT_UNDEF m, %%i
321
-    CAT_UNDEF nmm, %%i
322
-    %assign %%i %%i+1
323
+        CAT_UNDEF m, %%i
324
+        CAT_UNDEF nnmm, %%i
325
+        %assign %%i %%i+1
326
     %endrep
327
     INIT_CPUFLAGS %1
328
 %endmacro
329
@@ -841,7 +850,7 @@
330
     %define mmsize 16
331
     %define num_mmregs 8
332
     %if ARCH_X86_64
333
-    %define num_mmregs 16
334
+        %define num_mmregs 16
335
     %endif
336
     %define mova movdqa
337
     %define movu movdqu
338
@@ -849,9 +858,9 @@
339
     %define movnta movntdq
340
     %assign %%i 0
341
     %rep num_mmregs
342
-    CAT_XDEFINE m, %%i, xmm %+ %%i
343
-    CAT_XDEFINE nxmm, %%i, %%i
344
-    %assign %%i %%i+1
345
+        CAT_XDEFINE m, %%i, xmm %+ %%i
346
+        CAT_XDEFINE nnxmm, %%i, %%i
347
+        %assign %%i %%i+1
348
     %endrep
349
     INIT_CPUFLAGS %1
350
 %endmacro
351
@@ -862,7 +871,7 @@
352
     %define mmsize 32
353
     %define num_mmregs 8
354
     %if ARCH_X86_64
355
-    %define num_mmregs 16
356
+        %define num_mmregs 16
357
     %endif
358
     %define mova movdqa
359
     %define movu movdqu
360
@@ -870,9 +879,9 @@
361
     %define movnta movntdq
362
     %assign %%i 0
363
     %rep num_mmregs
364
-    CAT_XDEFINE m, %%i, ymm %+ %%i
365
-    CAT_XDEFINE nymm, %%i, %%i
366
-    %assign %%i %%i+1
367
+        CAT_XDEFINE m, %%i, ymm %+ %%i
368
+        CAT_XDEFINE nnymm, %%i, %%i
369
+        %assign %%i %%i+1
370
     %endrep
371
     INIT_CPUFLAGS %1
372
 %endmacro
373
@@ -889,8 +898,6 @@
374
     %define ymmmm%1   mm%1
375
     %define ymmxmm%1 xmm%1
376
     %define ymmymm%1 ymm%1
377
-    %define ymm%1xmm xmm%1
378
-    %define xmm%1ymm ymm%1
379
     %define xm%1 xmm %+ m%1
380
     %define ym%1 ymm %+ m%1
381
 %endmacro
382
@@ -898,7 +905,7 @@
383
 %assign i 0
384
 %rep 16
385
     DECLARE_MMCAST i
386
-%assign i i+1
387
+    %assign i i+1
388
 %endrep
389
 
390
 ; I often want to use macros that permute their arguments. e.g. there's no
391
@@ -916,23 +923,23 @@
392
 ; doesn't cost any cycles.
393
 
394
 %macro PERMUTE 2-* ; takes a list of pairs to swap
395
-%rep %0/2
396
-    %xdefine %%tmp%2 m%2
397
-    %rotate 2
398
-%endrep
399
-%rep %0/2
400
-    %xdefine m%1 %%tmp%2
401
-    CAT_XDEFINE n, m%1, %1
402
-    %rotate 2
403
-%endrep
404
+    %rep %0/2
405
+        %xdefine %%tmp%2 m%2
406
+        %rotate 2
407
+    %endrep
408
+    %rep %0/2
409
+        %xdefine m%1 %%tmp%2
410
+        CAT_XDEFINE nn, m%1, %1
411
+        %rotate 2
412
+    %endrep
413
 %endmacro
414
 
415
 %macro SWAP 2+ ; swaps a single chain (sometimes more concise than pairs)
416
-%ifnum %1 ; SWAP 0, 1, ...
417
-    SWAP_INTERNAL_NUM %1, %2
418
-%else ; SWAP m0, m1, ...
419
-    SWAP_INTERNAL_NAME %1, %2
420
-%endif
421
+    %ifnum %1 ; SWAP 0, 1, ...
422
+        SWAP_INTERNAL_NUM %1, %2
423
+    %else ; SWAP m0, m1, ...
424
+        SWAP_INTERNAL_NAME %1, %2
425
+    %endif
426
 %endmacro
427
 
428
 %macro SWAP_INTERNAL_NUM 2-*
429
@@ -940,17 +947,17 @@
430
         %xdefine %%tmp m%1
431
         %xdefine m%1 m%2
432
         %xdefine m%2 %%tmp
433
-        CAT_XDEFINE n, m%1, %1
434
-        CAT_XDEFINE n, m%2, %2
435
-    %rotate 1
436
+        CAT_XDEFINE nn, m%1, %1
437
+        CAT_XDEFINE nn, m%2, %2
438
+        %rotate 1
439
     %endrep
440
 %endmacro
441
 
442
 %macro SWAP_INTERNAL_NAME 2-*
443
-    %xdefine %%args n %+ %1
444
+    %xdefine %%args nn %+ %1
445
     %rep %0-1
446
-        %xdefine %%args %%args, n %+ %2
447
-    %rotate 1
448
+        %xdefine %%args %%args, nn %+ %2
449
+        %rotate 1
450
     %endrep
451
     SWAP_INTERNAL_NUM %%args
452
 %endmacro
453
@@ -967,7 +974,7 @@
454
     %assign %%i 0
455
     %rep num_mmregs
456
         CAT_XDEFINE %%f, %%i, m %+ %%i
457
-    %assign %%i %%i+1
458
+        %assign %%i %%i+1
459
     %endrep
460
 %endmacro
461
 
462
@@ -976,21 +983,25 @@
463
         %assign %%i 0
464
         %rep num_mmregs
465
             CAT_XDEFINE m, %%i, %1_m %+ %%i
466
-            CAT_XDEFINE n, m %+ %%i, %%i
467
-        %assign %%i %%i+1
468
+            CAT_XDEFINE nn, m %+ %%i, %%i
469
+            %assign %%i %%i+1
470
         %endrep
471
     %endif
472
 %endmacro
473
 
474
 ; Append cpuflags to the callee's name iff the appended name is known and the plain name isn't
475
 %macro call 1
476
-    call_internal %1, %1 %+ SUFFIX
477
+    %ifid %1
478
+        call_internal %1 %+ SUFFIX, %1
479
+    %else
480
+        call %1
481
+    %endif
482
 %endmacro
483
 %macro call_internal 2
484
-    %xdefine %%i %1
485
-    %ifndef cglobaled_%1
486
-        %ifdef cglobaled_%2
487
-            %xdefine %%i %2
488
+    %xdefine %%i %2
489
+    %ifndef cglobaled_%2
490
+        %ifdef cglobaled_%1
491
+            %xdefine %%i %1
492
         %endif
493
     %endif
494
     call %%i
495
@@ -1033,7 +1044,7 @@
496
     %endif
497
     CAT_XDEFINE sizeofxmm, i, 16
498
     CAT_XDEFINE sizeofymm, i, 32
499
-%assign i i+1
500
+    %assign i i+1
501
 %endrep
502
 %undef i
503
 
504
@@ -1051,7 +1062,7 @@
505
 ;%1 == instruction
506
 ;%2 == minimal instruction set
507
 ;%3 == 1 if float, 0 if int
508
-;%4 == 1 if non-destructive or 4-operand (xmm, xmm, xmm, imm), 0 otherwise
509
+;%4 == 1 if 4-operand emulation, 0 if 3-operand emulation, 255 otherwise (no emulation)
510
 ;%5 == 1 if commutative (i.e. doesn't matter which src arg is which), 0 if not
511
 ;%6+: operands
512
 %macro RUN_AVX_INSTR 6-9+
513
@@ -1075,6 +1086,8 @@
514
         %ifdef cpuname
515
             %if notcpuflag(%2)
516
                 %error use of ``%1'' %2 instruction in cpuname function: current_function
517
+            %elif cpuflags_%2 < cpuflags_sse && notcpuflag(sse2) && __sizeofreg > 8
518
+                %error use of ``%1'' sse2 instruction in cpuname function: current_function
519
             %endif
520
         %endif
521
     %endif
522
@@ -1082,14 +1095,12 @@
523
     %if __emulate_avx
524
         %xdefine __src1 %7
525
         %xdefine __src2 %8
526
-        %ifnidn %6, %7
527
-            %if %0 >= 9
528
-                CHECK_AVX_INSTR_EMU {%1 %6, %7, %8, %9}, %6, %8, %9
529
-            %else
530
-                CHECK_AVX_INSTR_EMU {%1 %6, %7, %8}, %6, %8
531
-            %endif
532
-            %if %5 && %4 == 0
533
-                %ifnid %8
534
+        %if %5 && %4 == 0
535
+            %ifnidn %6, %7
536
+                %ifidn %6, %8
537
+                    %xdefine __src1 %8
538
+                    %xdefine __src2 %7
539
+                %elifnnum sizeof%8
540
                     ; 3-operand AVX instructions with a memory arg can only have it in src2,
541
                     ; whereas SSE emulation prefers to have it in src1 (i.e. the mov).
542
                     ; So, if the instruction is commutative with a memory arg, swap them.
543
@@ -1097,6 +1108,13 @@
544
                     %xdefine __src2 %7
545
                 %endif
546
             %endif
547
+        %endif
548
+        %ifnidn %6, __src1
549
+            %if %0 >= 9
550
+                CHECK_AVX_INSTR_EMU {%1 %6, %7, %8, %9}, %6, __src2, %9
551
+            %else
552
+                CHECK_AVX_INSTR_EMU {%1 %6, %7, %8}, %6, __src2
553
+            %endif
554
             %if __sizeofreg == 8
555
                 MOVQ %6, __src1
556
             %elif %3
557
@@ -1124,9 +1142,9 @@
558
 ;%1 == instruction
559
 ;%2 == minimal instruction set
560
 ;%3 == 1 if float, 0 if int
561
-;%4 == 1 if non-destructive or 4-operand (xmm, xmm, xmm, imm), 0 otherwise
562
+;%4 == 1 if 4-operand emulation, 0 if 3-operand emulation, 255 otherwise (no emulation)
563
 ;%5 == 1 if commutative (i.e. doesn't matter which src arg is which), 0 if not
564
-%macro AVX_INSTR 1-5 fnord, 0, 1, 0
565
+%macro AVX_INSTR 1-5 fnord, 0, 255, 0
566
     %macro %1 1-10 fnord, fnord, fnord, fnord, %1, %2, %3, %4, %5
567
         %ifidn %2, fnord
568
             RUN_AVX_INSTR %6, %7, %8, %9, %10, %1
569
@@ -1146,8 +1164,8 @@
570
 ; Non-destructive instructions are written without parameters
571
 AVX_INSTR addpd, sse2, 1, 0, 1
572
 AVX_INSTR addps, sse, 1, 0, 1
573
-AVX_INSTR addsd, sse2, 1, 0, 1
574
-AVX_INSTR addss, sse, 1, 0, 1
575
+AVX_INSTR addsd, sse2, 1, 0, 0
576
+AVX_INSTR addss, sse, 1, 0, 0
577
 AVX_INSTR addsubpd, sse3, 1, 0, 0
578
 AVX_INSTR addsubps, sse3, 1, 0, 0
579
 AVX_INSTR aesdec, fnord, 0, 0, 0
580
@@ -1160,10 +1178,10 @@
581
 AVX_INSTR andnps, sse, 1, 0, 0
582
 AVX_INSTR andpd, sse2, 1, 0, 1
583
 AVX_INSTR andps, sse, 1, 0, 1
584
-AVX_INSTR blendpd, sse4, 1, 0, 0
585
-AVX_INSTR blendps, sse4, 1, 0, 0
586
-AVX_INSTR blendvpd, sse4, 1, 0, 0
587
-AVX_INSTR blendvps, sse4, 1, 0, 0
588
+AVX_INSTR blendpd, sse4, 1, 1, 0
589
+AVX_INSTR blendps, sse4, 1, 1, 0
590
+AVX_INSTR blendvpd, sse4 ; can't be emulated
591
+AVX_INSTR blendvps, sse4 ; can't be emulated
592
 AVX_INSTR cmppd, sse2, 1, 1, 0
593
 AVX_INSTR cmpps, sse, 1, 1, 0
594
 AVX_INSTR cmpsd, sse2, 1, 1, 0
595
@@ -1177,10 +1195,10 @@
596
 AVX_INSTR cvtps2dq, sse2
597
 AVX_INSTR cvtps2pd, sse2
598
 AVX_INSTR cvtsd2si, sse2
599
-AVX_INSTR cvtsd2ss, sse2
600
-AVX_INSTR cvtsi2sd, sse2
601
-AVX_INSTR cvtsi2ss, sse
602
-AVX_INSTR cvtss2sd, sse2
603
+AVX_INSTR cvtsd2ss, sse2, 1, 0, 0
604
+AVX_INSTR cvtsi2sd, sse2, 1, 0, 0
605
+AVX_INSTR cvtsi2ss, sse, 1, 0, 0
606
+AVX_INSTR cvtss2sd, sse2, 1, 0, 0
607
 AVX_INSTR cvtss2si, sse
608
 AVX_INSTR cvttpd2dq, sse2
609
 AVX_INSTR cvttps2dq, sse2
610
@@ -1203,15 +1221,15 @@
611
 AVX_INSTR maskmovdqu, sse2
612
 AVX_INSTR maxpd, sse2, 1, 0, 1
613
 AVX_INSTR maxps, sse, 1, 0, 1
614
-AVX_INSTR maxsd, sse2, 1, 0, 1
615
-AVX_INSTR maxss, sse, 1, 0, 1
616
+AVX_INSTR maxsd, sse2, 1, 0, 0
617
+AVX_INSTR maxss, sse, 1, 0, 0
618
 AVX_INSTR minpd, sse2, 1, 0, 1
619
 AVX_INSTR minps, sse, 1, 0, 1
620
-AVX_INSTR minsd, sse2, 1, 0, 1
621
-AVX_INSTR minss, sse, 1, 0, 1
622
+AVX_INSTR minsd, sse2, 1, 0, 0
623
+AVX_INSTR minss, sse, 1, 0, 0
624
 AVX_INSTR movapd, sse2
625
 AVX_INSTR movaps, sse
626
-AVX_INSTR movd
627
+AVX_INSTR movd, mmx
628
 AVX_INSTR movddup, sse3
629
 AVX_INSTR movdqa, sse2
630
 AVX_INSTR movdqu, sse2
631
@@ -1227,18 +1245,18 @@
632
 AVX_INSTR movntdqa, sse4
633
 AVX_INSTR movntpd, sse2
634
 AVX_INSTR movntps, sse
635
-AVX_INSTR movq
636
+AVX_INSTR movq, mmx
637
 AVX_INSTR movsd, sse2, 1, 0, 0
638
 AVX_INSTR movshdup, sse3
639
 AVX_INSTR movsldup, sse3
640
 AVX_INSTR movss, sse, 1, 0, 0
641
 AVX_INSTR movupd, sse2
642
 AVX_INSTR movups, sse
643
-AVX_INSTR mpsadbw, sse4
644
+AVX_INSTR mpsadbw, sse4, 0, 1, 0
645
 AVX_INSTR mulpd, sse2, 1, 0, 1
646
 AVX_INSTR mulps, sse, 1, 0, 1
647
-AVX_INSTR mulsd, sse2, 1, 0, 1
648
-AVX_INSTR mulss, sse, 1, 0, 1
649
+AVX_INSTR mulsd, sse2, 1, 0, 0
650
+AVX_INSTR mulss, sse, 1, 0, 0
651
 AVX_INSTR orpd, sse2, 1, 0, 1
652
 AVX_INSTR orps, sse, 1, 0, 1
653
 AVX_INSTR pabsb, ssse3
654
@@ -1256,14 +1274,18 @@
655
 AVX_INSTR paddsw, mmx, 0, 0, 1
656
 AVX_INSTR paddusb, mmx, 0, 0, 1
657
 AVX_INSTR paddusw, mmx, 0, 0, 1
658
-AVX_INSTR palignr, ssse3
659
+AVX_INSTR palignr, ssse3, 0, 1, 0
660
 AVX_INSTR pand, mmx, 0, 0, 1
661
 AVX_INSTR pandn, mmx, 0, 0, 0
662
 AVX_INSTR pavgb, mmx2, 0, 0, 1
663
 AVX_INSTR pavgw, mmx2, 0, 0, 1
664
-AVX_INSTR pblendvb, sse4, 0, 0, 0
665
-AVX_INSTR pblendw, sse4
666
-AVX_INSTR pclmulqdq
667
+AVX_INSTR pblendvb, sse4 ; can't be emulated
668
+AVX_INSTR pblendw, sse4, 0, 1, 0
669
+AVX_INSTR pclmulqdq, fnord, 0, 1, 0
670
+AVX_INSTR pclmulhqhqdq, fnord, 0, 0, 0
671
+AVX_INSTR pclmulhqlqdq, fnord, 0, 0, 0
672
+AVX_INSTR pclmullqhqdq, fnord, 0, 0, 0
673
+AVX_INSTR pclmullqlqdq, fnord, 0, 0, 0
674
 AVX_INSTR pcmpestri, sse42
675
 AVX_INSTR pcmpestrm, sse42
676
 AVX_INSTR pcmpistri, sse42
677
@@ -1287,10 +1309,10 @@
678
 AVX_INSTR phsubw, ssse3, 0, 0, 0
679
 AVX_INSTR phsubd, ssse3, 0, 0, 0
680
 AVX_INSTR phsubsw, ssse3, 0, 0, 0
681
-AVX_INSTR pinsrb, sse4
682
-AVX_INSTR pinsrd, sse4
683
-AVX_INSTR pinsrq, sse4
684
-AVX_INSTR pinsrw, mmx2
685
+AVX_INSTR pinsrb, sse4, 0, 1, 0
686
+AVX_INSTR pinsrd, sse4, 0, 1, 0
687
+AVX_INSTR pinsrq, sse4, 0, 1, 0
688
+AVX_INSTR pinsrw, mmx2, 0, 1, 0
689
 AVX_INSTR pmaddwd, mmx, 0, 0, 1
690
 AVX_INSTR pmaddubsw, ssse3, 0, 0, 0
691
 AVX_INSTR pmaxsb, sse4, 0, 0, 1
692
@@ -1362,18 +1384,18 @@
693
 AVX_INSTR punpckldq, mmx, 0, 0, 0
694
 AVX_INSTR punpcklqdq, sse2, 0, 0, 0
695
 AVX_INSTR pxor, mmx, 0, 0, 1
696
-AVX_INSTR rcpps, sse, 1, 0, 0
697
+AVX_INSTR rcpps, sse
698
 AVX_INSTR rcpss, sse, 1, 0, 0
699
 AVX_INSTR roundpd, sse4
700
 AVX_INSTR roundps, sse4
701
-AVX_INSTR roundsd, sse4
702
-AVX_INSTR roundss, sse4
703
-AVX_INSTR rsqrtps, sse, 1, 0, 0
704
+AVX_INSTR roundsd, sse4, 1, 1, 0
705
+AVX_INSTR roundss, sse4, 1, 1, 0
706
+AVX_INSTR rsqrtps, sse
707
 AVX_INSTR rsqrtss, sse, 1, 0, 0
708
 AVX_INSTR shufpd, sse2, 1, 1, 0
709
 AVX_INSTR shufps, sse, 1, 1, 0
710
-AVX_INSTR sqrtpd, sse2, 1, 0, 0
711
-AVX_INSTR sqrtps, sse, 1, 0, 0
712
+AVX_INSTR sqrtpd, sse2
713
+AVX_INSTR sqrtps, sse
714
 AVX_INSTR sqrtsd, sse2, 1, 0, 0
715
 AVX_INSTR sqrtss, sse, 1, 0, 0
716
 AVX_INSTR stmxcsr, sse
717
@@ -1408,7 +1430,7 @@
718
     %else
719
         CAT_XDEFINE q, j, i
720
     %endif
721
-%assign i i+1
722
+    %assign i i+1
723
 %endrep
724
 %undef i
725
 %undef j
726
@@ -1431,55 +1453,52 @@
727
 FMA_INSTR pmacsdql,  pmuldq, paddq ; sse4 emulation
728
 FMA_INSTR pmadcswd, pmaddwd, paddd
729
 
730
-; convert FMA4 to FMA3 if possible
731
-%macro FMA4_INSTR 4
732
-    %macro %1 4-8 %1, %2, %3, %4
733
-        %if cpuflag(fma4)
734
-            v%5 %1, %2, %3, %4
735
-        %elifidn %1, %2
736
-            v%6 %1, %4, %3 ; %1 = %1 * %3 + %4
737
-        %elifidn %1, %3
738
-            v%7 %1, %2, %4 ; %1 = %2 * %1 + %4
739
-        %elifidn %1, %4
740
-            v%8 %1, %2, %3 ; %1 = %2 * %3 + %1
741
+; Macros for consolidating FMA3 and FMA4 using 4-operand (dst, src1, src2, src3) syntax.
742
+; FMA3 is only possible if dst is the same as one of the src registers.
743
+; Either src2 or src3 can be a memory operand.
744
+%macro FMA4_INSTR 2-*
745
+    %push fma4_instr
746
+    %xdefine %$prefix %1
747
+    %rep %0 - 1
748
+        %macro %$prefix%2 4-6 %$prefix, %2
749
+            %if notcpuflag(fma3) && notcpuflag(fma4)
750
+                %error use of ``%5%6'' fma instruction in cpuname function: current_function
751
+            %elif cpuflag(fma4)
752
+                v%5%6 %1, %2, %3, %4
753
+            %elifidn %1, %2
754
+                ; If %3 or %4 is a memory operand it needs to be encoded as the last operand.
755
+                %ifid %3
756
+                    v%{5}213%6 %2, %3, %4
757
+                %else
758
+                    v%{5}132%6 %2, %4, %3
759
+                %endif
760
+            %elifidn %1, %3
761
+                v%{5}213%6 %3, %2, %4
762
+            %elifidn %1, %4
763
+                v%{5}231%6 %4, %2, %3
764
+            %else
765
+                %error fma3 emulation of ``%5%6 %1, %2, %3, %4'' is not supported
766
+            %endif
767
+        %endmacro
768
+        %rotate 1
769
+    %endrep
770
+    %pop
771
+%endmacro
772
+
773
+FMA4_INSTR fmadd,    pd, ps, sd, ss
774
+FMA4_INSTR fmaddsub, pd, ps
775
+FMA4_INSTR fmsub,    pd, ps, sd, ss
776
+FMA4_INSTR fmsubadd, pd, ps
777
+FMA4_INSTR fnmadd,   pd, ps, sd, ss
778
+FMA4_INSTR fnmsub,   pd, ps, sd, ss
779
+
780
+; workaround: vpbroadcastq is broken in x86_32 due to a yasm bug (fixed in 1.3.0)
781
+%if __YASM_VERSION_ID__ < 0x01030000 && ARCH_X86_64 == 0
782
+    %macro vpbroadcastq 2
783
+        %if sizeof%1 == 16
784
+            movddup %1, %2
785
         %else
786
-            %error fma3 emulation of ``%5 %1, %2, %3, %4'' is not supported
787
+            vbroadcastsd %1, %2
788
         %endif
789
     %endmacro
790
-%endmacro
791
-
792
-FMA4_INSTR fmaddpd, fmadd132pd, fmadd213pd, fmadd231pd
793
-FMA4_INSTR fmaddps, fmadd132ps, fmadd213ps, fmadd231ps
794
-FMA4_INSTR fmaddsd, fmadd132sd, fmadd213sd, fmadd231sd
795
-FMA4_INSTR fmaddss, fmadd132ss, fmadd213ss, fmadd231ss
796
-
797
-FMA4_INSTR fmaddsubpd, fmaddsub132pd, fmaddsub213pd, fmaddsub231pd
798
-FMA4_INSTR fmaddsubps, fmaddsub132ps, fmaddsub213ps, fmaddsub231ps
799
-FMA4_INSTR fmsubaddpd, fmsubadd132pd, fmsubadd213pd, fmsubadd231pd
800
-FMA4_INSTR fmsubaddps, fmsubadd132ps, fmsubadd213ps, fmsubadd231ps
801
-
802
-FMA4_INSTR fmsubpd, fmsub132pd, fmsub213pd, fmsub231pd
803
-FMA4_INSTR fmsubps, fmsub132ps, fmsub213ps, fmsub231ps
804
-FMA4_INSTR fmsubsd, fmsub132sd, fmsub213sd, fmsub231sd
805
-FMA4_INSTR fmsubss, fmsub132ss, fmsub213ss, fmsub231ss
806
-
807
-FMA4_INSTR fnmaddpd, fnmadd132pd, fnmadd213pd, fnmadd231pd
808
-FMA4_INSTR fnmaddps, fnmadd132ps, fnmadd213ps, fnmadd231ps
809
-FMA4_INSTR fnmaddsd, fnmadd132sd, fnmadd213sd, fnmadd231sd
810
-FMA4_INSTR fnmaddss, fnmadd132ss, fnmadd213ss, fnmadd231ss
811
-
812
-FMA4_INSTR fnmsubpd, fnmsub132pd, fnmsub213pd, fnmsub231pd
813
-FMA4_INSTR fnmsubps, fnmsub132ps, fnmsub213ps, fnmsub231ps
814
-FMA4_INSTR fnmsubsd, fnmsub132sd, fnmsub213sd, fnmsub231sd
815
-FMA4_INSTR fnmsubss, fnmsub132ss, fnmsub213ss, fnmsub231ss
816
-
817
-; workaround: vpbroadcastq is broken in x86_32 due to a yasm bug
818
-%if ARCH_X86_64 == 0
819
-%macro vpbroadcastq 2
820
-%if sizeof%1 == 16
821
-    movddup %1, %2
822
-%else
823
-    vbroadcastsd %1, %2
824
-%endif
825
-%endmacro
826
 %endif
827
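Review note: the new FMA4_INSTR macro picks an FMA3 form (132, 213 or 231) according to which source operand aliases the destination. The C++ sketch below models the three FMA3 operand orders with plain scalar arithmetic and checks that every branch of the macro computes src1 * src2 + src3. The function names and the `dstAlias` encoding are hypothetical, introduced only for illustration; scalar doubles stand in for vector registers.

```cpp
#include <cassert>

// FMA3 operand orders, written as "a = ..." where a is the first operand.
double fma132(double a, double b, double c) { return a * c + b; }
double fma213(double a, double b, double c) { return b * a + c; }
double fma231(double a, double b, double c) { return b * c + a; }

// Mirrors the macro's branches. dstAlias says which source shares the
// destination register (1 = src1, 2 = src2, 3 = src3). src2IsRegister is
// false when src2 is a memory operand, which must be encoded last.
double emulateFma4(int dstAlias, bool src2IsRegister,
                   double src1, double src2, double src3) {
    switch (dstAlias) {
    case 1:
        return src2IsRegister ? fma213(src1, src2, src3)   // v213 src1, src2, src3
                              : fma132(src1, src3, src2);  // v132 src1, src3, src2
    case 2:
        return fma213(src2, src1, src3);                   // v213 src2, src1, src3
    case 3:
        return fma231(src3, src1, src2);                   // v231 src3, src1, src2
    default:
        assert(false && "dst must alias one source for FMA3");
        return 0.0;
    }
}
```

With src1 = 2, src2 = 3 and src3 = 4, every supported branch should return 10, which is what the 4-operand FMA4 form would produce.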
x265_2.4.tar.gz/source/dynamicHDR10/BasicStructures.h -> x265_2.5.tar.gz/source/dynamicHDR10/BasicStructures.h Changed
@@ -35,16 +35,26 @@
     float maxRLuminance = 0.0;
     float maxGLuminance = 0.0;
     float maxBLuminance = 0.0;
-    int order;
+    int order = 0;
     std::vector<unsigned int> percentiles;
 };
 
 struct BezierCurveData
 {
-    int order;
-    int sPx;
-    int sPy;
+    int order = 0;
+    int sPx = 0;
+    int sPy = 0;
     std::vector<int> coeff;
 };
 
+struct PercentileLuminance{
+
+    float averageLuminance = 0.0;
+    float maxRLuminance = 0.0;
+    float maxGLuminance = 0.0;
+    float maxBLuminance = 0.0;
+    int order = 0;
+    std::vector<unsigned int> percentiles;
+};
+
 #endif // BASICSTRUCTURES_H
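Review note: the BasicStructures.h change gives every scalar member a default member initializer, so a default-constructed object no longer carries indeterminate values. A minimal sketch of the effect, reusing the BezierCurveData layout from the diff; the helper `isZeroed` is hypothetical and exists only for this example.

```cpp
#include <cassert>
#include <vector>

// Same layout as the updated struct in the diff.
struct BezierCurveData
{
    int order = 0;
    int sPx = 0;
    int sPy = 0;
    std::vector<int> coeff;
};

// Hypothetical helper: true when a curve still holds its default state.
bool isZeroed(const BezierCurveData& curve)
{
    return curve.order == 0 && curve.sPx == 0 && curve.sPy == 0 &&
           curve.coeff.empty();
}
```

A plain `BezierCurveData curve;` declared on the stack now satisfies `isZeroed(curve)`; before the change, reading `order`, `sPx` or `sPy` from such an object would have been undefined behavior.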
x265_2.4.tar.gz/source/dynamicHDR10/CMakeLists.txt -> x265_2.5.tar.gz/source/dynamicHDR10/CMakeLists.txt Changed
@@ -1,8 +1,8 @@
 # vim: syntax=cmake
-if(ENABLE_DYNAMIC_HDR10)
+if(ENABLE_HDR10_PLUS)
 
 add_library(dynamicHDR10 OBJECT 
-    BasicStructures.cpp BasicStructures.h
+    BasicStructures.h
     json11/json11.cpp json11/json11.h
     JsonHelper.cpp JsonHelper.h
     metadataFromJson.cpp metadataFromJson.h
@@ -10,7 +10,6 @@
     hdr10plus.h
     api.cpp )
 
-else()
 cmake_minimum_required (VERSION 2.8.11)
 project(dynamicHDR10)
 include(CheckIncludeFiles)
@@ -150,26 +149,5 @@
     
 option(ENABLE_SHARED "Build shared library" OFF)
 
-if(ENABLE_SHARED)
-    add_library(dynamicHDR10 SHARED
-        json11/json11.cpp json11/json11.h
-        BasicStructures.cpp BasicStructures.h
-        JsonHelper.cpp JsonHelper.h
-        metadataFromJson.cpp metadataFromJson.h
-        SeiMetadataDictionary.cpp SeiMetadataDictionary.h
-        hdr10plus.h api.cpp )
-else()
-    add_library(dynamicHDR10 STATIC
-    json11/json11.cpp json11/json11.h
-    BasicStructures.cpp BasicStructures.h
-    JsonHelper.cpp JsonHelper.h
-    metadataFromJson.cpp metadataFromJson.h
-    SeiMetadataDictionary.cpp SeiMetadataDictionary.h
-    hdr10plus.h api.cpp )
-endif()
-
-install (TARGETS dynamicHDR10
-    LIBRARY DESTINATION ${LIB_INSTALL_DIR}
-    ARCHIVE DESTINATION ${LIB_INSTALL_DIR})
 install(FILES hdr10plus.h DESTINATION include)
 endif()
\ No newline at end of file
x265_2.4.tar.gz/source/dynamicHDR10/json11/json11.cpp -> x265_2.5.tar.gz/source/dynamicHDR10/json11/json11.cpp Changed
@@ -26,6 +26,12 @@
 #include <cstdio>
 #include <limits>
 
+#if _MSC_VER
+#pragma warning(disable: 4510) //const member cannot be default initialized
+#pragma warning(disable: 4512) //assignment operator could not be generated
+#pragma warning(disable: 4610) //const member cannot be default initialized
+#endif
+
 namespace json11 {
 
 static const int max_depth = 200;
@@ -435,7 +441,7 @@
     char get_next_token() {
         consume_garbage();
         if (i == str.size())
-            return fail("unexpected end of input", 0);
+            return fail("unexpected end of input", '0');
 
         return str[i++];
     }
@@ -472,7 +478,7 @@
     string parse_string() {
         string out;
         long last_escaped_codepoint = -1;
-        while (true) {
+        for (;;) {
             if (i == str.size())
                 return fail("unexpected end of input in string", "");
 
@@ -665,7 +671,7 @@
             if (ch == '}')
                 return data;
 
-            while (1) {
+            for (;;) {
                 if (ch != '"')
                     return fail("expected '\"' in object, got " + esc(ch));
 
@@ -698,7 +704,7 @@
             if (ch == ']')
                 return data;
 
-            while (1) {
+            for (;;) {
                 i--;
                 data.push_back(parse_json(depth + 1));
                 if (failed)
x265_2.4.tar.gz/source/dynamicHDR10/metadataFromJson.cpp -> x265_2.5.tar.gz/source/dynamicHDR10/metadataFromJson.cpp Changed
@@ -168,7 +168,7 @@
     {
         int payloadBytes = 1;
 
-        for(;payload > 0xFF; payload -= 0xFF, ++payloadBytes);
+        for(;payload >= 0xFF; payload -= 0xFF, ++payloadBytes);
 
         if(payloadBytes > 1)
         {
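Review note: the one-character change from `>` to `>=` in metadataFromJson.cpp matters for payloads of exactly 255 bytes. The sketch below assumes the loop counts the bytes of an SEI-style size field, in which each 0xFF byte contributes 255 and a final byte below 0xFF terminates the field. Under that assumption a size of 255 needs two bytes (0xFF, 0x00), which the old comparison would have miscounted as one. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Counts the bytes of the size field, using the corrected loop from the diff.
int payloadSizeBytes(int payload)
{
    int payloadBytes = 1;
    for (; payload >= 0xFF; payload -= 0xFF, ++payloadBytes);
    return payloadBytes;
}

// Emits the size field itself: 0xFF per full 255, then the remainder.
std::vector<uint8_t> encodePayloadSize(int payload)
{
    std::vector<uint8_t> out;
    while (payload >= 0xFF)
    {
        out.push_back(0xFF);
        payload -= 0xFF;
    }
    out.push_back(static_cast<uint8_t>(payload));
    return out;
}
```

The byte count and the emitted field stay in agreement for every size, including the boundary values 254, 255 and 510.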
x265_2.4.tar.gz/source/encoder/CMakeLists.txt -> x265_2.5.tar.gz/source/encoder/CMakeLists.txt Changed
@@ -43,4 +43,5 @@
     reference.cpp reference.h
     encoder.cpp encoder.h
     api.cpp
-    weightPrediction.cpp)
+    weightPrediction.cpp
+    ../x265-extras.cpp ../x265-extras.h)
x265_2.4.tar.gz/source/encoder/analysis.cpp -> x265_2.5.tar.gz/source/encoder/analysis.cpp Changed
773
 
1
@@ -75,6 +75,7 @@
2
     m_reuseInterDataCTU = NULL;
3
     m_reuseRef = NULL;
4
     m_bHD = false;
5
+    m_evaluateInter = 0;
6
 }
7
 
8
 bool Analysis::create(ThreadLocalData *tld)
9
@@ -89,19 +90,19 @@
10
     cacheCost = X265_MALLOC(uint64_t, costArrSize);
11
 
12
     int csp = m_param->internalCsp;
13
-    uint32_t cuSize = g_maxCUSize;
14
+    uint32_t cuSize = m_param->maxCUSize;
15
 
16
     bool ok = true;
17
-    for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++, cuSize >>= 1)
18
+    for (uint32_t depth = 0; depth <= m_param->maxCUDepth; depth++, cuSize >>= 1)
19
     {
20
         ModeDepth &md = m_modeDepth[depth];
21
 
22
-        md.cuMemPool.create(depth, csp, MAX_PRED_TYPES);
23
+        md.cuMemPool.create(depth, csp, MAX_PRED_TYPES, *m_param);
24
         ok &= md.fencYuv.create(cuSize, csp);
25
 
26
         for (int j = 0; j < MAX_PRED_TYPES; j++)
27
         {
28
-            md.pred[j].cu.initialize(md.cuMemPool, depth, csp, j);
29
+            md.pred[j].cu.initialize(md.cuMemPool, depth, *m_param, j);
30
             ok &= md.pred[j].predYuv.create(cuSize, csp);
31
             ok &= md.pred[j].reconYuv.create(cuSize, csp);
32
             md.pred[j].fencYuv = &md.fencYuv;
33
@@ -115,7 +116,7 @@
34
 
35
 void Analysis::destroy()
36
 {
37
-    for (uint32_t i = 0; i <= g_maxCUDepth; i++)
38
+    for (uint32_t i = 0; i <= m_param->maxCUDepth; i++)
39
     {
40
         m_modeDepth[i].cuMemPool.destroy();
41
         m_modeDepth[i].fencYuv.destroy();
@@ -150,6 +151,41 @@
         calculateNormFactor(ctu, qp);
 
     uint32_t numPartition = ctu.m_numPartitions;
+    if (m_param->bCTUInfo && (*m_frame->m_ctuInfo + ctu.m_cuAddr))
+    {
+        x265_ctu_info_t* ctuTemp = *m_frame->m_ctuInfo + ctu.m_cuAddr;
+        if (ctuTemp->ctuPartitions)
+        {
+            int32_t depthIdx = 0;
+            uint32_t maxNum8x8Partitions = 64;
+            uint8_t* depthInfoPtr = m_frame->m_addOnDepth[ctu.m_cuAddr];
+            uint8_t* contentInfoPtr = m_frame->m_addOnCtuInfo[ctu.m_cuAddr];
+            int* prevCtuInfoChangePtr = m_frame->m_addOnPrevChange[ctu.m_cuAddr];
+            do
+            {
+                uint8_t depth = (uint8_t)ctuTemp->ctuPartitions[depthIdx];
+                uint8_t content = (uint8_t)(*((int32_t *)ctuTemp->ctuInfo + depthIdx));
+                int prevCtuInfoChange = m_frame->m_prevCtuInfoChange[ctu.m_cuAddr * maxNum8x8Partitions + depthIdx];
+                memset(depthInfoPtr, depth, sizeof(uint8_t) * numPartition >> 2 * depth);
+                memset(contentInfoPtr, content, sizeof(uint8_t) * numPartition >> 2 * depth);
+                memset(prevCtuInfoChangePtr, 0, sizeof(int) * numPartition >> 2 * depth);
+                for (uint32_t l = 0; l < numPartition >> 2 * depth; l++)
+                    prevCtuInfoChangePtr[l] = prevCtuInfoChange;
+                depthInfoPtr += ctu.m_numPartitions >> 2 * depth;
+                contentInfoPtr += ctu.m_numPartitions >> 2 * depth;
+                prevCtuInfoChangePtr += ctu.m_numPartitions >> 2 * depth;
+                depthIdx++;
+            } while (ctuTemp->ctuPartitions[depthIdx] != 0);
+
+            m_additionalCtuInfo = m_frame->m_addOnCtuInfo[ctu.m_cuAddr];
+            m_prevCtuInfoChange = m_frame->m_addOnPrevChange[ctu.m_cuAddr];
+            memcpy(ctu.m_cuDepth, m_frame->m_addOnDepth[ctu.m_cuAddr], sizeof(uint8_t) * numPartition);
+            //Calculate log2CUSize from depth
+            for (uint32_t i = 0; i < cuGeom.numPartitions; i++)
+                ctu.m_log2CUSize[i] = (uint8_t)m_param->maxLog2CUSize - ctu.m_cuDepth[i];
+        }
+    }
+
     if (m_param->analysisMultiPassRefine && m_param->rc.bStatRead)
     {
         m_multipassAnalysis = (analysis2PassFrameData*)m_frame->m_analysis2Pass.analysisFramedata;
@@ -167,19 +203,19 @@
         }
     }
 
-    if (m_param->analysisMode && m_slice->m_sliceType != I_SLICE && m_param->analysisRefineLevel > 1 && m_param->analysisRefineLevel < 10)
+    if (m_param->analysisReuseMode && m_slice->m_sliceType != I_SLICE && m_param->analysisReuseLevel > 1 && m_param->analysisReuseLevel < 10)
     {
         int numPredDir = m_slice->isInterP() ? 1 : 2;
         m_reuseInterDataCTU = (analysis_inter_data*)m_frame->m_analysisData.interData;
         m_reuseRef = &m_reuseInterDataCTU->ref[ctu.m_cuAddr * X265_MAX_PRED_MODE_PER_CTU * numPredDir];
         m_reuseDepth = &m_reuseInterDataCTU->depth[ctu.m_cuAddr * ctu.m_numPartitions];
         m_reuseModes = &m_reuseInterDataCTU->modes[ctu.m_cuAddr * ctu.m_numPartitions];
-        if (m_param->analysisRefineLevel > 4)
+        if (m_param->analysisReuseLevel > 4)
         {
             m_reusePartSize = &m_reuseInterDataCTU->partSize[ctu.m_cuAddr * ctu.m_numPartitions];
             m_reuseMergeFlag = &m_reuseInterDataCTU->mergeFlag[ctu.m_cuAddr * ctu.m_numPartitions];
         }
-        if (m_param->analysisMode == X265_ANALYSIS_SAVE)
+        if (m_param->analysisReuseMode == X265_ANALYSIS_SAVE)
             for (int i = 0; i < X265_MAX_PRED_MODE_PER_CTU * numPredDir; i++)
                 m_reuseRef[i] = -1;
     }
@@ -188,7 +224,7 @@
     if (m_slice->m_sliceType == I_SLICE)
     {
         analysis_intra_data* intraDataCTU = (analysis_intra_data*)m_frame->m_analysisData.intraData;
-        if (m_param->analysisMode == X265_ANALYSIS_LOAD && m_param->analysisRefineLevel > 1)
+        if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD && m_param->analysisReuseLevel > 1)
         {
             memcpy(ctu.m_cuDepth, &intraDataCTU->depth[ctu.m_cuAddr * numPartition], sizeof(uint8_t) * numPartition);
             memcpy(ctu.m_lumaIntraDir, &intraDataCTU->modes[ctu.m_cuAddr * numPartition], sizeof(uint8_t) * numPartition);
@@ -200,8 +236,8 @@
     else
     {
         if (m_param->bIntraRefresh && m_slice->m_sliceType == P_SLICE &&
-            ctu.m_cuPelX / g_maxCUSize >= frame.m_encData->m_pir.pirStartCol
-            && ctu.m_cuPelX / g_maxCUSize < frame.m_encData->m_pir.pirEndCol)
+            ctu.m_cuPelX / m_param->maxCUSize >= frame.m_encData->m_pir.pirStartCol
+            && ctu.m_cuPelX / m_param->maxCUSize < frame.m_encData->m_pir.pirEndCol)
             compressIntraCU(ctu, cuGeom, qp);
         else if (!m_param->rdLevel)
         {
@@ -214,7 +250,7 @@
             /* generate residual for entire CTU at once and copy to reconPic */
             encodeResidue(ctu, cuGeom);
         }
-        else if (m_param->analysisMode == X265_ANALYSIS_LOAD && m_param->analysisRefineLevel == 10)
+        else if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD && m_param->analysisReuseLevel == 10)
         {
             analysis_inter_data* interDataCTU = (analysis_inter_data*)m_frame->m_analysisData.interData;
             int posCTU = ctu.m_cuAddr * numPartition;
@@ -229,7 +265,7 @@
             }
             //Calculate log2CUSize from depth
             for (uint32_t i = 0; i < cuGeom.numPartitions; i++)
-                ctu.m_log2CUSize[i] = (uint8_t)g_maxLog2CUSize - ctu.m_cuDepth[i];
+                ctu.m_log2CUSize[i] = (uint8_t)m_param->maxLog2CUSize - ctu.m_cuDepth[i];
 
             qprdRefine (ctu, cuGeom, qp, qp);
             return *m_modeDepth[0].bestMode;
@@ -245,9 +281,69 @@
     if (m_param->bEnableRdRefine || m_param->bOptCUDeltaQP)
         qprdRefine(ctu, cuGeom, qp, qp);
 
+    if (m_param->csvLogLevel >= 2)
+        collectPUStatistics(ctu, cuGeom);
+
     return *m_modeDepth[0].bestMode;
 }
 
+void Analysis::collectPUStatistics(const CUData& ctu, const CUGeom& cuGeom)
+{
+    uint8_t depth = 0;
+    uint8_t partSize = 0;
+    for (uint32_t absPartIdx = 0; absPartIdx < ctu.m_numPartitions; absPartIdx += ctu.m_numPartitions >> (depth * 2))
+    {
+        depth = ctu.m_cuDepth[absPartIdx];
+        partSize = ctu.m_partSize[absPartIdx];
+        uint32_t numPU = nbPartsTable[(int)partSize];
+        int shift = 2 * (m_param->maxCUDepth + 1 - depth);
+        for (uint32_t puIdx = 0; puIdx < numPU; puIdx++)
+        {
+            PredictionUnit pu(ctu, cuGeom, puIdx);
+            int puabsPartIdx = ctu.getPUOffset(puIdx, absPartIdx);
+            int mode = 1;
+            if (ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_Nx2N || ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_2NxN)
+                mode = 2;
+            else if (ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_2NxnU || ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_2NxnD || ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_nLx2N || ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_nRx2N)
+                 mode = 3;
+
+            if (ctu.m_predMode[puabsPartIdx + absPartIdx] == MODE_SKIP)
+            {
+                ctu.m_encData->m_frameStats.cntSkipPu[depth] += (uint64_t)(1 << shift);
+                ctu.m_encData->m_frameStats.totalPu[depth] += (uint64_t)(1 << shift);
+            }
+            else if (ctu.m_predMode[puabsPartIdx + absPartIdx] == MODE_INTRA)
+            {
+                if (ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_NxN)
+                {
+                    ctu.m_encData->m_frameStats.cnt4x4++;
+                    ctu.m_encData->m_frameStats.totalPu[4]++;
+                }
+                else
+                {
+                    ctu.m_encData->m_frameStats.cntIntraPu[depth] += (uint64_t)(1 << shift);
+                    ctu.m_encData->m_frameStats.totalPu[depth] += (uint64_t)(1 << shift);
+                }
+            }
+            else if (mode == 3)
+            {
+                ctu.m_encData->m_frameStats.cntAmp[depth] += (uint64_t)(1 << shift);
+                ctu.m_encData->m_frameStats.totalPu[depth] += (uint64_t)(1 << shift);
+                break;
+            }
+            else
+            {
+                if (ctu.m_mergeFlag[puabsPartIdx + absPartIdx])
+                    ctu.m_encData->m_frameStats.cntMergePu[depth][ctu.m_partSize[puabsPartIdx + absPartIdx]] += (1 << shift) / mode;
+                else
+                    ctu.m_encData->m_frameStats.cntInterPu[depth][ctu.m_partSize[puabsPartIdx + absPartIdx]] += (1 << shift) / mode;
+
+                ctu.m_encData->m_frameStats.totalPu[depth] += (1 << shift) / mode;
+            }
+        }
+    }
+}
+
 int32_t Analysis::loadTUDepth(CUGeom cuGeom, CUData parentCTU)
 {
     float predDepth = 0;
@@ -336,7 +432,7 @@
     int lambdaQP = lqp;
 
     bool doQPRefine = (bDecidedDepth && depth <= m_slice->m_pps->maxCuDQPDepth) || (!bDecidedDepth && depth == m_slice->m_pps->maxCuDQPDepth);
-    if (m_param->analysisRefineLevel == 10)
+    if (m_param->analysisReuseLevel == 10)
         doQPRefine = false;
 
     if (doQPRefine)
@@ -400,6 +496,13 @@
 
     bool bAlreadyDecided = parentCTU.m_lumaIntraDir[cuGeom.absPartIdx] != (uint8_t)ALL_IDX;
     bool bDecidedDepth = parentCTU.m_cuDepth[cuGeom.absPartIdx] == depth;
+    int split = 0;
+    if (m_param->intraRefine)
+    {
+        split = ((cuGeom.log2CUSize == (uint32_t)(g_log2Size[m_param->minCUSize] + 1)) && bDecidedDepth);
+        if (cuGeom.log2CUSize == (uint32_t)(g_log2Size[m_param->minCUSize]) && !bDecidedDepth)
+            bAlreadyDecided = false;
+    }
 
     if (bAlreadyDecided)
     {
@@ -408,8 +511,11 @@
             Mode& mode = md.pred[0];
             md.bestMode = &mode;
             mode.cu.initSubCU(parentCTU, cuGeom, qp);
-            memcpy(mode.cu.m_lumaIntraDir, parentCTU.m_lumaIntraDir + cuGeom.absPartIdx, cuGeom.numPartitions);
-            memcpy(mode.cu.m_chromaIntraDir, parentCTU.m_chromaIntraDir + cuGeom.absPartIdx, cuGeom.numPartitions);
+            if (m_param->intraRefine != 2 || parentCTU.m_lumaIntraDir[cuGeom.absPartIdx] <= 1)
+            {
+                memcpy(mode.cu.m_lumaIntraDir, parentCTU.m_lumaIntraDir + cuGeom.absPartIdx, cuGeom.numPartitions);
+                memcpy(mode.cu.m_chromaIntraDir, parentCTU.m_chromaIntraDir + cuGeom.absPartIdx, cuGeom.numPartitions);
+            }
             checkIntra(mode, cuGeom, (PartSize)parentCTU.m_partSize[cuGeom.absPartIdx]);
 
             if (m_bTryLossless)
@@ -440,7 +546,7 @@
     }
 
     // stop recursion if we reach the depth of previous analysis decision
-    mightSplit &= !(bAlreadyDecided && bDecidedDepth);
+    mightSplit &= !(bAlreadyDecided && bDecidedDepth) || split;
 
     if (mightSplit)
     {
@@ -501,7 +607,7 @@
     }
 
     /* Save Intra CUs TU depth only when analysis mode is OFF */
-    if ((m_limitTU & X265_TU_LIMIT_NEIGH) && cuGeom.log2CUSize >= 4 && !m_param->analysisMode)
+    if ((m_limitTU & X265_TU_LIMIT_NEIGH) && cuGeom.log2CUSize >= 4 && !m_param->analysisReuseMode)
     {
         CUData* ctu = md.bestMode->cu.m_encData->getPicCTU(parentCTU.m_cuAddr);
         int8_t maxTUDepth = -1;
@@ -1017,11 +1123,21 @@
     bool mightSplit = !(cuGeom.flags & CUGeom::LEAF);
     bool mightNotSplit = !(cuGeom.flags & CUGeom::SPLIT_MANDATORY);
     uint32_t minDepth = topSkipMinDepth(parentCTU, cuGeom);
+    bool bDecidedDepth = parentCTU.m_cuDepth[cuGeom.absPartIdx] == depth;
     bool skipModes = false; /* Skip any remaining mode analyses at current depth */
     bool skipRecursion = false; /* Skip recursion */
     bool splitIntra = true;
     bool skipRectAmp = false;
     bool chooseMerge = false;
+    bool bCtuInfoCheck = false;
+    int sameContentRef = 0;
+
+    if (m_evaluateInter == 1)
+    {
+        skipRectAmp = !!md.bestMode;
+        mightSplit &= false;
+        minDepth = depth;
+    }
 
     if ((m_limitTU & X265_TU_LIMIT_NEIGH) && cuGeom.log2CUSize >= 4)
         m_maxTUDepth = loadTUDepth(cuGeom, parentCTU);
@@ -1040,7 +1156,54 @@
         md.pred[PRED_2Nx2N].sa8dCost = 0;
     }
 
-    if (m_param->analysisMode == X265_ANALYSIS_LOAD && m_param->analysisRefineLevel > 1)
+    if (m_param->bCTUInfo && depth <= parentCTU.m_cuDepth[cuGeom.absPartIdx])
+    {
+        if (bDecidedDepth && m_additionalCtuInfo[cuGeom.absPartIdx])
+            sameContentRef = findSameContentRefCount(parentCTU, cuGeom);
+        if (depth < parentCTU.m_cuDepth[cuGeom.absPartIdx])
+        {
+            mightNotSplit &= bDecidedDepth;
+            bCtuInfoCheck = skipRecursion = false;
+            skipModes = true;
+        }
+        else if (mightNotSplit && bDecidedDepth)
+        {
+            if (m_additionalCtuInfo[cuGeom.absPartIdx])
+            {
+                bCtuInfoCheck = skipRecursion = true;
+                md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom, qp);
+                md.pred[PRED_SKIP].cu.initSubCU(parentCTU, cuGeom, qp);
+                checkMerge2Nx2N_rd0_4(md.pred[PRED_SKIP], md.pred[PRED_MERGE], cuGeom);
+                if (!sameContentRef)
+                {
+                    if ((m_param->bCTUInfo & 2) && (m_slice->m_pps->bUseDQP && depth <= m_slice->m_pps->maxCuDQPDepth))
+                    {
+                        qp -= int32_t(0.04 * qp);
+                        setLambdaFromQP(parentCTU, qp);
+                    }
+                    if (m_param->bCTUInfo & 4)
+                        skipModes = false;
+                }
+                if (sameContentRef || (!sameContentRef && !(m_param->bCTUInfo & 4)))
+                {
+                    if (m_param->rdLevel)
+                        skipModes = m_param->bEnableEarlySkip && md.bestMode && md.bestMode->cu.isSkipped(0);
+                    if ((m_param->bCTUInfo & 4) && sameContentRef)
+                        skipModes = md.bestMode && true;
+                }
+            }
+            else
+            {
+                md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom, qp);
+                md.pred[PRED_SKIP].cu.initSubCU(parentCTU, cuGeom, qp);
+                checkMerge2Nx2N_rd0_4(md.pred[PRED_SKIP], md.pred[PRED_MERGE], cuGeom);
+                if (m_param->rdLevel)
+                    skipModes = m_param->bEnableEarlySkip && md.bestMode && md.bestMode->cu.isSkipped(0);
+            }
+            mightSplit &= !bDecidedDepth;
+        }
+    }
+    if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD && m_param->analysisReuseLevel > 1 && m_param->analysisReuseLevel != 10)
     {
         if (mightNotSplit && depth == m_reuseDepth[cuGeom.absPartIdx])
         {
@@ -1054,7 +1217,7 @@
                 if (m_param->rdLevel)
                     skipModes = m_param->bEnableEarlySkip && md.bestMode;
             }
-            if (m_param->analysisRefineLevel > 4 && m_reusePartSize[cuGeom.absPartIdx] == SIZE_2Nx2N)
+            if (m_param->analysisReuseLevel > 4 && m_reusePartSize[cuGeom.absPartIdx] == SIZE_2Nx2N)
             {
                 if (m_reuseModes[cuGeom.absPartIdx] != MODE_INTRA  && m_reuseModes[cuGeom.absPartIdx] != 4)
                 {
@@ -1082,7 +1245,7 @@
     }
 
     /* Step 1. Evaluate Merge/Skip candidates for likely early-outs, if skip mode was not set above */
-    if (mightNotSplit && depth >= minDepth && !md.bestMode) /* TODO: Re-evaluate if analysis load/save still works */
+    if (mightNotSplit && depth >= minDepth && !md.bestMode && !bCtuInfoCheck) /* TODO: Re-evaluate if analysis load/save still works */
     {
         /* Compute Merge Cost */
         md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom, qp);
@@ -1092,7 +1255,7 @@
             skipModes = m_param->bEnableEarlySkip && md.bestMode && md.bestMode->cu.isSkipped(0); // TODO: sa8d threshold per depth
     }
 
-    if (md.bestMode && m_param->bEnableRecursionSkip)
+    if (md.bestMode && m_param->bEnableRecursionSkip && !bCtuInfoCheck)
     {
         skipRecursion = md.bestMode->cu.isSkipped(0);
         if (mightSplit && depth >= minDepth && !skipRecursion)
@@ -1107,6 +1270,8 @@
     /* Step 2. Evaluate each of the 4 split sub-blocks in series */
     if (mightSplit && !skipRecursion)
     {
+        if (bCtuInfoCheck && m_param->bCTUInfo & 2)
+            qp = int((1 / 0.96) * qp + 0.5);
         Mode* splitPred = &md.pred[PRED_SPLIT];
         splitPred->initCosts();
         CUData* splitCU = &splitPred->cu;
@@ -1162,7 +1327,7 @@
      *   2  3 */
     uint32_t allSplitRefs = splitData[0].splitRefs | splitData[1].splitRefs | splitData[2].splitRefs | splitData[3].splitRefs;
     /* Step 3. Evaluate ME (2Nx2N, rect, amp) and intra modes at current depth */
-    if (mightNotSplit && depth >= minDepth)
+    if (mightNotSplit && (depth >= minDepth || (m_param->bCTUInfo && !md.bestMode)))
     {
         if (m_slice->m_pps->bUseDQP && depth <= m_slice->m_pps->maxCuDQPDepth && m_slice->m_pps->maxCuDQPDepth != 0)
             setLambdaFromQP(parentCTU, qp);
@@ -1346,7 +1511,7 @@
                     }
                 }
             }
-            bool bTryIntra = (m_slice->m_sliceType != B_SLICE || m_param->bIntraInBFrames) && cuGeom.log2CUSize != MAX_LOG2_CU_SIZE;
+            bool bTryIntra = (m_slice->m_sliceType != B_SLICE || m_param->bIntraInBFrames) && cuGeom.log2CUSize != MAX_LOG2_CU_SIZE && !((m_param->bCTUInfo & 4) && bCtuInfoCheck);
             if (m_param->rdLevel >= 3)
             {
                 /* Calculate RD cost of best inter option */
@@ -1584,10 +1749,19 @@
 
     bool mightSplit = !(cuGeom.flags & CUGeom::LEAF);
     bool mightNotSplit = !(cuGeom.flags & CUGeom::SPLIT_MANDATORY);
+    bool bDecidedDepth = parentCTU.m_cuDepth[cuGeom.absPartIdx] == depth;
     bool skipRecursion = false;
     bool skipModes = false;
     bool splitIntra = true;
     bool skipRectAmp = false;
+    bool bCtuInfoCheck = false;
+    int sameContentRef = 0;
+
+    if (m_evaluateInter == 1)
+    {
+        skipRectAmp = !!md.bestMode;
+        mightSplit &= false;
+    }
 
     // avoid uninitialize value in below reference
     if (m_param->limitModes)
@@ -1607,7 +1781,58 @@
     splitData[3].initSplitCUData();
     uint32_t allSplitRefs = splitData[0].splitRefs | splitData[1].splitRefs | splitData[2].splitRefs | splitData[3].splitRefs;
     uint32_t refMasks[2];
-    if (m_param->analysisMode == X265_ANALYSIS_LOAD && m_param->analysisRefineLevel > 1)
+    if (m_param->bCTUInfo && depth <= parentCTU.m_cuDepth[cuGeom.absPartIdx])
+    {
+        if (bDecidedDepth && m_additionalCtuInfo[cuGeom.absPartIdx])
+            sameContentRef = findSameContentRefCount(parentCTU, cuGeom);
+        if (depth < parentCTU.m_cuDepth[cuGeom.absPartIdx])
+        {
+            mightNotSplit &= bDecidedDepth;
+            bCtuInfoCheck = skipRecursion = false;
+            skipModes = true;
+        }
+        else if (mightNotSplit && bDecidedDepth)
+        {
+            if (m_additionalCtuInfo[cuGeom.absPartIdx])
+            {
+                bCtuInfoCheck = skipRecursion = true;
+                refMasks[0] = allSplitRefs;
+                md.pred[PRED_2Nx2N].cu.initSubCU(parentCTU, cuGeom, qp);
+                checkInter_rd5_6(md.pred[PRED_2Nx2N], cuGeom, SIZE_2Nx2N, refMasks);
+                checkBestMode(md.pred[PRED_2Nx2N], cuGeom.depth);
+                if (!sameContentRef)
+                {
+                    if ((m_param->bCTUInfo & 2) && (m_slice->m_pps->bUseDQP && depth <= m_slice->m_pps->maxCuDQPDepth))
+                    {
+                        qp -= int32_t(0.04 * qp);
+                        setLambdaFromQP(parentCTU, qp);
+                    }
+                    if (m_param->bCTUInfo & 4)
+                        skipModes = false;
+                }
+                if (sameContentRef || (!sameContentRef && !(m_param->bCTUInfo & 4)))
+                {
+                    if (m_param->rdLevel)
+                        skipModes = m_param->bEnableEarlySkip && md.bestMode && md.bestMode->cu.isSkipped(0);
+                    if ((m_param->bCTUInfo & 4) && sameContentRef)
+                        skipModes = md.bestMode && true;
+                }
+            }
+            else
+            {
+                md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom, qp);
+                md.pred[PRED_SKIP].cu.initSubCU(parentCTU, cuGeom, qp);
+                checkMerge2Nx2N_rd5_6(md.pred[PRED_SKIP], md.pred[PRED_MERGE], cuGeom);
+                skipModes = !!m_param->bEnableEarlySkip && md.bestMode;
+                refMasks[0] = allSplitRefs;
+                md.pred[PRED_2Nx2N].cu.initSubCU(parentCTU, cuGeom, qp);
+                checkInter_rd5_6(md.pred[PRED_2Nx2N], cuGeom, SIZE_2Nx2N, refMasks);
+                checkBestMode(md.pred[PRED_2Nx2N], cuGeom.depth);
+            }
+            mightSplit &= !bDecidedDepth;
+        }
+    }
+    if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD && m_param->analysisReuseLevel > 1 && m_param->analysisReuseLevel != 10)
     {
         if (mightNotSplit && depth == m_reuseDepth[cuGeom.absPartIdx])
         {
@@ -1625,7 +1850,7 @@
                 if (m_param->bEnableRecursionSkip && depth && m_modeDepth[depth - 1].bestMode)
                     skipRecursion = md.bestMode && !md.bestMode->cu.getQtRootCbf(0);
             }
-            if (m_param->analysisRefineLevel > 4 && m_reusePartSize[cuGeom.absPartIdx] == SIZE_2Nx2N)
+            if (m_param->analysisReuseLevel > 4 && m_reusePartSize[cuGeom.absPartIdx] == SIZE_2Nx2N)
                 skipRectAmp = true && !!md.bestMode;
         }
     }
@@ -1653,7 +1878,7 @@
     }
 
     /* Step 1. Evaluate Merge/Skip candidates for likely early-outs */
-    if (mightNotSplit && !md.bestMode)
+    if (mightNotSplit && !md.bestMode && !bCtuInfoCheck)
     {
         md.pred[PRED_SKIP].cu.initSubCU(parentCTU, cuGeom, qp);
         md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom, qp);
@@ -1672,6 +1897,8 @@
     /* Step 2. Evaluate each of the 4 split sub-blocks in series */
     if (mightSplit && !skipRecursion)
     {
+        if (bCtuInfoCheck && m_param->bCTUInfo & 2)
+            qp = int((1 / 0.96) * qp + 0.5);
         Mode* splitPred = &md.pred[PRED_SPLIT];
         splitPred->initCosts();
         CUData* splitCU = &splitPred->cu;
@@ -1908,7 +2135,7 @@
                 }
             }
 
-            if ((m_slice->m_sliceType != B_SLICE || m_param->bIntraInBFrames) && cuGeom.log2CUSize != MAX_LOG2_CU_SIZE)
+            if ((m_slice->m_sliceType != B_SLICE || m_param->bIntraInBFrames) && (cuGeom.log2CUSize != MAX_LOG2_CU_SIZE) && !((m_param->bCTUInfo & 4) && bCtuInfoCheck))
             {
                 if (!m_param->limitReferences || splitIntra)
                 {
@@ -2008,10 +2235,14 @@
     ModeDepth& md = m_modeDepth[depth];
     md.bestMode = NULL;
 
+    m_evaluateInter = 0;
     bool mightSplit = !(cuGeom.flags & CUGeom::LEAF);
     bool mightNotSplit = !(cuGeom.flags & CUGeom::SPLIT_MANDATORY);
     bool bDecidedDepth = parentCTU.m_cuDepth[cuGeom.absPartIdx] == depth;
 
+    int split = (m_param->interRefine && cuGeom.log2CUSize == (uint32_t)(g_log2Size[m_param->minCUSize] + 1)
+                && bDecidedDepth && parentCTU.m_predMode[cuGeom.absPartIdx] == MODE_SKIP);
+
     if (bDecidedDepth)
     {
         setLambdaFromQP(parentCTU, qp, lqp);
@@ -2022,8 +2253,11 @@
         PartSize size = (PartSize)parentCTU.m_partSize[cuGeom.absPartIdx];
         if (parentCTU.isIntra(cuGeom.absPartIdx))
         {
-            memcpy(mode.cu.m_lumaIntraDir, parentCTU.m_lumaIntraDir + cuGeom.absPartIdx, cuGeom.numPartitions);
-            memcpy(mode.cu.m_chromaIntraDir, parentCTU.m_chromaIntraDir + cuGeom.absPartIdx, cuGeom.numPartitions);
+            if (m_param->intraRefine != 2 || parentCTU.m_lumaIntraDir[cuGeom.absPartIdx] <= 1)
+            {
+                memcpy(mode.cu.m_lumaIntraDir, parentCTU.m_lumaIntraDir + cuGeom.absPartIdx, cuGeom.numPartitions);
+                memcpy(mode.cu.m_chromaIntraDir, parentCTU.m_chromaIntraDir + cuGeom.absPartIdx, cuGeom.numPartitions);
+            }
             checkIntra(mode, cuGeom, size);
         }
         else
@@ -2033,20 +2267,22 @@
             for (uint32_t part = 0; part < numPU; part++)
             {
                 PredictionUnit pu(mode.cu, cuGeom, part);
-                if (m_param->analysisRefineLevel == 10)
+                if (m_param->analysisReuseLevel == 10)
                 {
                     analysis_inter_data* interDataCTU = (analysis_inter_data*)m_frame->m_analysisData.interData;
                     int cuIdx = (mode.cu.m_cuAddr * parentCTU.m_numPartitions) + cuGeom.absPartIdx;
                     mode.cu.m_mergeFlag[pu.puAbsPartIdx] = interDataCTU->mergeFlag[cuIdx + part];
                     mode.cu.setPUInterDir(interDataCTU->interDir[cuIdx + part], pu.puAbsPartIdx, part);
-                    for (int dir = 0; dir < m_slice->isInterB() + 1; dir++)
+                    for (int list = 0; list < m_slice->isInterB() + 1; list++)
                     {
-                        mode.cu.setPUMv(dir, interDataCTU->mv[dir][cuIdx + part], pu.puAbsPartIdx, part);
-                        mode.cu.setPURefIdx(dir, interDataCTU->refIdx[dir][cuIdx + part], pu.puAbsPartIdx, part);
-                        mode.cu.m_mvpIdx[dir][pu.puAbsPartIdx] = interDataCTU->mvpIdx[dir][cuIdx + part];
+                        mode.cu.setPUMv(list, interDataCTU->mv[list][cuIdx + part], pu.puAbsPartIdx, part);
+                        mode.cu.setPURefIdx(list, interDataCTU->refIdx[list][cuIdx + part], pu.puAbsPartIdx, part);
+                        mode.cu.m_mvpIdx[list][pu.puAbsPartIdx] = interDataCTU->mvpIdx[list][cuIdx + part];
                     }
                     if (!mode.cu.m_mergeFlag[pu.puAbsPartIdx])
                     {
+                        if (m_param->mvRefine)
+                            m_me.setSourcePU(*mode.fencYuv, pu.ctuAddr, pu.cuAbsPartIdx, pu.puAbsPartIdx, pu.width, pu.height, m_param->searchMethod, m_param->subpelRefine, false);
                         //AMVP
                         MV mvc[(MD_ABOVE_LEFT + 1) * 2 + 2];
                         mode.cu.getNeighbourMV(part, pu.puAbsPartIdx, mode.interNeighbours);
@@ -2057,14 +2293,31 @@
                                 continue;
                             mode.cu.getPMV(mode.interNeighbours, list, ref, mode.amvpCand[list][ref], mvc);
                             MV mvp = mode.amvpCand[list][ref][mode.cu.m_mvpIdx[list][pu.puAbsPartIdx]];
+                            if (m_param->mvRefine)
+                            {
+                                MV outmv;
+                                searchMV(mode, pu, list, ref, outmv);
+                                mode.cu.setPUMv(list, outmv, pu.puAbsPartIdx, part);
+                            }
                             mode.cu.m_mvd[list][pu.puAbsPartIdx] = mode.cu.m_mv[list][pu.puAbsPartIdx] - mvp;
                         }
                     }
+                    else if(m_param->scaleFactor)
+                    {
+                        MVField candMvField[MRG_MAX_NUM_CANDS][2]; // double length for mv of both lists
+                        uint8_t candDir[MRG_MAX_NUM_CANDS];
+                        mode.cu.getInterMergeCandidates(pu.puAbsPartIdx, part, candMvField, candDir);
+                        uint8_t mvpIdx = mode.cu.m_mvpIdx[0][pu.puAbsPartIdx];
+                        mode.cu.setPUInterDir(candDir[mvpIdx], pu.puAbsPartIdx, part);
+                        mode.cu.setPUMv(0, candMvField[mvpIdx][0].mv, pu.puAbsPartIdx, part);
+                        mode.cu.setPUMv(1, candMvField[mvpIdx][1].mv, pu.puAbsPartIdx, part);
+                        mode.cu.setPURefIdx(0, (int8_t)candMvField[mvpIdx][0].refIdx, pu.puAbsPartIdx, part);
+                        mode.cu.setPURefIdx(1, (int8_t)candMvField[mvpIdx][1].refIdx, pu.puAbsPartIdx, part);
+                    }
                 }
                 motionCompensation(mode.cu, pu, mode.predYuv, true, (m_csp != X265_CSP_I400 && m_frame->m_fencPic->m_picCsp != X265_CSP_I400));
             }
-
-            if (parentCTU.isSkipped(cuGeom.absPartIdx))
+            if (!m_param->interRefine && parentCTU.isSkipped(cuGeom.absPartIdx))
                 encodeResAndCalcRdSkipCU(mode);
             else
                 encodeResAndCalcRdInterCU(mode, cuGeom);
@@ -2083,11 +2336,18 @@
 
         if (mightSplit && m_param->rdLevel < 5)
             checkDQPForSplitPred(*md.bestMode, cuGeom);
+
+        if (m_param->interRefine && parentCTU.m_predMode[cuGeom.absPartIdx] == MODE_SKIP  && !mode.cu.isSkipped(0))
+        {
+            m_evaluateInter = 1;
+            m_param->rdLevel > 4 ? compressInterCU_rd5_6(parentCTU, cuGeom, qp) : compressInterCU_rd0_4(parentCTU, cuGeom, qp);
+        }
     }
-    else
+    if (!bDecidedDepth || split)
     {
         Mode* splitPred = &md.pred[PRED_SPLIT];
-        md.bestMode = splitPred;
+        if (!split)
+            md.bestMode = splitPred;
         splitPred->initCosts();
         CUData* splitCU = &splitPred->cu;
         splitCU->initSubCU(parentCTU, cuGeom, qp);
@@ -2109,8 +2369,12 @@
                 if (m_slice->m_pps->bUseDQP && nextDepth <= m_slice->m_pps->maxCuDQPDepth)
                     nextQP = setLambdaFromQP(parentCTU, calculateQpforCuSize(parentCTU, childGeom));
 
-                int lamdaQP = m_param->analysisRefineLevel == 10 ? nextQP : lqp;
-                qprdRefine(parentCTU, childGeom, nextQP, lamdaQP);
+                int lamdaQP = m_param->analysisReuseLevel == 10 ? nextQP : lqp;
+
+                if (split)
+                    m_param->rdLevel > 4 ? compressInterCU_rd5_6(parentCTU, childGeom, nextQP) : compressInterCU_rd0_4(parentCTU, childGeom, nextQP);
+                else
+                    qprdRefine(parentCTU, childGeom, nextQP, lamdaQP);
 
                 // Save best CU and pred data for this sub CU
                 splitCU->copyPartFrom(nd.bestMode->cu, childGeom, subPartIdx);
@@ -2131,6 +2395,14 @@
         else
             updateModeCost(*splitPred);
 
+        if (m_param->interRefine)
+        {
+            if (m_param->rdLevel > 1)
+                checkBestMode(*splitPred, cuGeom.depth);
+            else if (splitPred->sa8dCost < md.bestMode->sa8dCost)
+                md.bestMode = splitPred;
+        }
+
         checkDQPForSplitPred(*splitPred, cuGeom);
 
         /* Copy best data to encData CTU and recon */
660
@@ -2174,7 +2446,7 @@
     int safeX, maxSafeMv;
     if (m_param->bIntraRefresh && m_slice->m_sliceType == P_SLICE)
     {
-        safeX = m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol * g_maxCUSize - 3;
+        safeX = m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol * m_param->maxCUSize - 3;
         maxSafeMv = (safeX - tempPred->cu.m_cuPelX) * 4;
     }
     for (uint32_t i = 0; i < numMergeCand; ++i)
@@ -2200,7 +2472,7 @@
         }
 
         if (m_param->bIntraRefresh && m_slice->m_sliceType == P_SLICE &&
-            tempPred->cu.m_cuPelX / g_maxCUSize < m_frame->m_encData->m_pir.pirEndCol &&
+            tempPred->cu.m_cuPelX / m_param->maxCUSize < m_frame->m_encData->m_pir.pirEndCol &&
             candMvField[i][0].mv.x > maxSafeMv)
             // skip merge candidates which reference beyond safe reference area
             continue;
@@ -2304,7 +2576,7 @@
     int safeX, maxSafeMv;
     if (m_param->bIntraRefresh && m_slice->m_sliceType == P_SLICE)
     {
-        safeX = m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol * g_maxCUSize - 3;
+        safeX = m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol * m_param->maxCUSize - 3;
         maxSafeMv = (safeX - tempPred->cu.m_cuPelX) * 4;
     }
     for (uint32_t i = 0; i < numMergeCand; i++)
@@ -2345,7 +2617,7 @@
             triedBZero = true;
         }
         if (m_param->bIntraRefresh && m_slice->m_sliceType == P_SLICE &&
-            tempPred->cu.m_cuPelX / g_maxCUSize < m_frame->m_encData->m_pir.pirEndCol &&
+            tempPred->cu.m_cuPelX / m_param->maxCUSize < m_frame->m_encData->m_pir.pirEndCol &&
             candMvField[i][0].mv.x > maxSafeMv)
             // skip merge candidates which reference beyond safe reference area
             continue;
@@ -2420,7 +2692,7 @@
     interMode.cu.setPredModeSubParts(MODE_INTER);
     int numPredDir = m_slice->isInterP() ? 1 : 2;
 
-    if (m_param->analysisMode == X265_ANALYSIS_LOAD && m_reuseInterDataCTU && m_param->analysisRefineLevel > 1)
+    if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD && m_reuseInterDataCTU && m_param->analysisReuseLevel > 1 && m_param->analysisReuseLevel != 10)
     {
         int refOffset = cuGeom.geomRecurId * 16 * numPredDir + partSize * numPredDir * 2;
         int index = 0;
@@ -2462,7 +2734,7 @@
     }
     interMode.sa8dCost = m_rdCost.calcRdSADCost((uint32_t)interMode.distortion, interMode.sa8dBits);
 
-    if (m_param->analysisMode == X265_ANALYSIS_SAVE && m_reuseInterDataCTU && m_param->analysisRefineLevel > 1)
+    if (m_param->analysisReuseMode == X265_ANALYSIS_SAVE && m_reuseInterDataCTU && m_param->analysisReuseLevel > 1)
     {
         int refOffset = cuGeom.geomRecurId * 16 * numPredDir + partSize * numPredDir * 2;
         int index = 0;
@@ -2484,7 +2756,7 @@
     interMode.cu.setPredModeSubParts(MODE_INTER);
     int numPredDir = m_slice->isInterP() ? 1 : 2;
 
-    if (m_param->analysisMode == X265_ANALYSIS_LOAD && m_reuseInterDataCTU && m_param->analysisRefineLevel > 1)
+    if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD && m_reuseInterDataCTU && m_param->analysisReuseLevel > 1 && m_param->analysisReuseLevel != 10)
     {
         int refOffset = cuGeom.geomRecurId * 16 * numPredDir + partSize * numPredDir * 2;
         int index = 0;
@@ -2518,7 +2790,7 @@
     /* predInterSearch sets interMode.sa8dBits, but this is ignored */
     encodeResAndCalcRdInterCU(interMode, cuGeom);
 
-    if (m_param->analysisMode == X265_ANALYSIS_SAVE && m_reuseInterDataCTU && m_param->analysisRefineLevel > 1)
+    if (m_param->analysisReuseMode == X265_ANALYSIS_SAVE && m_reuseInterDataCTU && m_param->analysisReuseLevel > 1)
     {
         int refOffset = cuGeom.geomRecurId * 16 * numPredDir + partSize * numPredDir * 2;
         int index = 0;
@@ -2671,7 +2943,7 @@
 
 void Analysis::encodeResidue(const CUData& ctu, const CUGeom& cuGeom)
 {
-    if (cuGeom.depth < ctu.m_cuDepth[cuGeom.absPartIdx] && cuGeom.depth < g_maxCUDepth)
+    if (cuGeom.depth < ctu.m_cuDepth[cuGeom.absPartIdx] && cuGeom.depth < ctu.m_encData->m_param->maxCUDepth)
     {
         for (uint32_t subPartIdx = 0; subPartIdx < 4; subPartIdx++)
         {
@@ -2970,7 +3242,7 @@
         uint32_t block_x = ctu.m_cuPelX + g_zscanToPelX[cuGeom.absPartIdx];
         uint32_t block_y = ctu.m_cuPelY + g_zscanToPelY[cuGeom.absPartIdx];
         uint32_t maxCols = (m_frame->m_fencPic->m_picWidth + (loopIncr - 1)) / loopIncr;
-        uint32_t blockSize = g_maxCUSize >> cuGeom.depth;
+        uint32_t blockSize = m_param->maxCUSize >> cuGeom.depth;
         double qp_offset = 0;
         uint32_t cnt = 0;
         uint32_t idx;
@@ -3064,3 +3336,22 @@
         normFactor(srcV, blockSizeC, ctu, qp, TEXT_CHROMA_V);
     }
 }
+
+int Analysis::findSameContentRefCount(const CUData& parentCTU, const CUGeom& cuGeom)
+{
+    int sameContentRef = 0;
+    int m_curPoc = parentCTU.m_slice->m_poc;
+    int prevChange = m_prevCtuInfoChange[cuGeom.absPartIdx];
+    int numPredDir = m_slice->isInterP() ? 1 : 2;
+    for (int list = 0; list < numPredDir; list++)
+    {
+        for (int i = 0; i < m_frame->m_encData->m_slice->m_numRefIdx[list]; i++)
+        {
+            int refPoc = m_frame->m_encData->m_slice->m_refFrameList[list][i]->m_poc;
+            int refPrevChange = m_frame->m_encData->m_slice->m_refFrameList[list][i]->m_addOnPrevChange[parentCTU.m_cuAddr][cuGeom.absPartIdx];
+            if ((refPoc < prevChange && refPoc < m_curPoc) || (refPoc > m_curPoc && prevChange < m_curPoc && refPrevChange > m_curPoc) || ((refPoc == prevChange) && (m_additionalCtuInfo[cuGeom.absPartIdx] == CTU_INFO_CHANGE)))
+                sameContentRef++;    /* Content changed */
+        }
+    }
+    return sameContentRef;
+}
x265_2.4.tar.gz/source/encoder/analysis.h -> x265_2.5.tar.gz/source/encoder/analysis.h Changed
@@ -137,6 +137,10 @@
     int*                    m_multipassMvpIdx[2];
     int32_t*                m_multipassRef[2];
     uint8_t*                m_multipassModes;
+
+    uint8_t                 m_evaluateInter;
+    uint8_t*                m_additionalCtuInfo;
+    int*                    m_prevCtuInfoChange;
     /* refine RD based on QP for rd-levels 5 and 6 */
     void qprdRefine(const CUData& parentCTU, const CUGeom& cuGeom, int32_t qp, int32_t lqp);
 
@@ -178,6 +182,9 @@
 
     void calculateNormFactor(CUData& ctu, int qp);
     void normFactor(const pixel* src, uint32_t blockSize, CUData& ctu, int qp, TextType ttype);
+
+    void collectPUStatistics(const CUData& ctu, const CUGeom& cuGeom);
+
     /* check whether current mode is the new best */
     inline void checkBestMode(Mode& mode, uint32_t depth)
     {
@@ -190,6 +197,7 @@
         else
             md.bestMode = &mode;
     }
+    int findSameContentRefCount(const CUData& parentCTU, const CUGeom& cuGeom);
 };
 
 struct ThreadLocalData
x265_2.4.tar.gz/source/encoder/api.cpp -> x265_2.5.tar.gz/source/encoder/api.cpp Changed
@@ -30,6 +30,7 @@
 #include "level.h"
 #include "nal.h"
 #include "bitcost.h"
+#include "x265-extras.h"
 
 /* multilib namespace reflectors */
 #if LINKED_8BIT
@@ -96,9 +97,6 @@
     if (x265_check_params(param))
         goto fail;
 
-    if (x265_set_globals(param))
-        goto fail;
-
     encoder = new Encoder;
     if (!param->rc.bEnableSlowFirstPass)
         PARAM_NS::x265_param_apply_fastfirstpass(param);
@@ -119,6 +117,17 @@
     }
 
     encoder->create();
+    /* Try to open CSV file handle */
+    if (encoder->m_param->csvfn)
+    {
+        encoder->m_param->csvfpt = x265_csvlog_open(*encoder->m_param, encoder->m_param->csvfn, encoder->m_param->csvLogLevel);
+        if (!encoder->m_param->csvfpt)
+        {
+            x265_log(encoder->m_param, X265_LOG_ERROR, "Unable to open CSV log file <%s>, aborting\n", encoder->m_param->csvfn);
+            encoder->m_aborted = true;
+        }
+    }
+
     encoder->m_latestParam = latestParam;
     memcpy(latestParam, param, sizeof(x265_param));
     if (encoder->m_aborted)
@@ -144,7 +153,10 @@
         if (encoder->m_param->rc.bStatRead && encoder->m_param->bMultiPassOptRPS)
         {
             if (!encoder->computeSPSRPSIndex())
+            {
+                encoder->m_aborted = true;
                 return -1;
+            }
         }
         encoder->getStreamHeaders(encoder->m_nalList, sbacCoder, bs);
         *pp_nal = &encoder->m_nalList.m_nal[0];
@@ -152,6 +164,11 @@
         return encoder->m_nalList.m_occupancy;
     }
 
+    if (enc)
+    {
+        Encoder *encoder = static_cast<Encoder*>(enc);
+        encoder->m_aborted = true;
+    }
     return -1;
 }
 
@@ -251,6 +268,12 @@
     else if (pi_nal)
         *pi_nal = 0;
 
+    if (numEncoded && encoder->m_param->csvLogLevel)
+        x265_csvlog_frame(encoder->m_param->csvfpt, *encoder->m_param, *pic_out, encoder->m_param->csvLogLevel);
+
+    if (numEncoded < 0)
+        encoder->m_aborted = true;
+
     return numEncoded;
 }
 
@@ -263,12 +286,17 @@
     }
 }
 
-void x265_encoder_log(x265_encoder* enc, int, char **)
+void x265_encoder_log(x265_encoder* enc, int argc, char **argv)
 {
     if (enc)
     {
         Encoder *encoder = static_cast<Encoder*>(enc);
-        x265_log(encoder->m_param, X265_LOG_WARNING, "x265_encoder_log is now deprecated\n");
+        x265_stats stats;
+        int padx = encoder->m_sps.conformanceWindow.rightOffset;
+        int pady = encoder->m_sps.conformanceWindow.bottomOffset;
+        encoder->fetchStats(&stats, sizeof(stats));
+        const x265_api * api = x265_api_get(0);
+        x265_csvlog_encode(encoder->m_param->csvfpt, api->version_str, *encoder->m_param, padx, pady, stats, encoder->m_param->csvLogLevel, argc, argv);
     }
 }
 
@@ -282,7 +310,6 @@
         encoder->printSummary();
         encoder->destroy();
         delete encoder;
-        ATOMIC_DEC(&g_ctuSizeConfigured);
     }
 }
 
@@ -295,14 +322,18 @@
     encoder->m_bQueuedIntraRefresh = 1;
     return 0;
 }
+int x265_encoder_ctu_info(x265_encoder *enc, int poc, x265_ctu_info_t** ctu)
+{
+    if (!ctu || !enc)
+        return -1;
+    Encoder* encoder = static_cast<Encoder*>(enc);
+    encoder->copyCtuInfo(ctu, poc);
+    return 0;
+}
 
 void x265_cleanup(void)
 {
-    if (!g_ctuSizeConfigured)
-    {
-        BitCost::destroy();
-        CUData::s_partSet[0] = NULL; /* allow CUData to adjust to new CTU size */
-    }
+    BitCost::destroy();
 }
 
 x265_picture *x265_picture_alloc()
@@ -321,14 +352,14 @@
     pic->userSEI.payloads = NULL;
     pic->userSEI.numPayloads = 0;
 
-    if (param->analysisMode)
+    if (param->analysisReuseMode)
     {
-        uint32_t widthInCU       = (param->sourceWidth  + g_maxCUSize - 1) >> g_maxLog2CUSize;
-        uint32_t heightInCU      = (param->sourceHeight + g_maxCUSize - 1) >> g_maxLog2CUSize;
+        uint32_t widthInCU = (param->sourceWidth + param->maxCUSize - 1) >> param->maxLog2CUSize;
+        uint32_t heightInCU = (param->sourceHeight + param->maxCUSize - 1) >> param->maxLog2CUSize;
 
         uint32_t numCUsInFrame   = widthInCU * heightInCU;
         pic->analysisData.numCUsInFrame = numCUsInFrame;
-        pic->analysisData.numPartitions = NUM_4x4_PARTITIONS;
+        pic->analysisData.numPartitions = param->num4x4Partitions;
     }
 }
 
@@ -372,6 +403,7 @@
 
     sizeof(x265_frame_stats),
     &x265_encoder_intra_refresh,
+    &x265_encoder_ctu_info,
 };
 
 typedef const x265_api* (*api_get_func)(int bitDepth);
x265_2.4.tar.gz/source/encoder/dpb.cpp -> x265_2.5.tar.gz/source/encoder/dpb.cpp Changed
@@ -105,6 +105,23 @@
                 }
             }
 
+            if (curFrame->m_ctuInfo != NULL)
+            {
+                uint32_t widthInCU = (curFrame->m_param->sourceWidth + curFrame->m_param->maxCUSize - 1) >> curFrame->m_param->maxLog2CUSize;
+                uint32_t heightInCU = (curFrame->m_param->sourceHeight + curFrame->m_param->maxCUSize - 1) >> curFrame->m_param->maxLog2CUSize;
+                uint32_t numCUsInFrame = widthInCU * heightInCU;
+                for (uint32_t i = 0; i < numCUsInFrame; i++)
+                {
+                    X265_FREE((*curFrame->m_ctuInfo + i)->ctuInfo);
+                    (*curFrame->m_ctuInfo + i)->ctuInfo = NULL;
+                }
+                X265_FREE(*curFrame->m_ctuInfo);
+                *(curFrame->m_ctuInfo) = NULL;
+                X265_FREE(curFrame->m_ctuInfo);
+                curFrame->m_ctuInfo = NULL;
+                X265_FREE(curFrame->m_prevCtuInfoChange);
+                curFrame->m_prevCtuInfoChange = NULL;
+            }
             curFrame->m_encData = NULL;
             curFrame->m_reconPic = NULL;
         }
@@ -187,7 +204,7 @@
     }
 
     // Disable Loopfilter in bound area, because we will do slice-parallelism in future
-    slice->m_sLFaseFlag = (g_maxSlices > 1) ? false : ((SLFASE_CONSTANT & (1 << (pocCurr % 31))) > 0);
+    slice->m_sLFaseFlag = (newFrame->m_param->maxSlices > 1) ? false : ((SLFASE_CONSTANT & (1 << (pocCurr % 31))) > 0);
 
     /* Increment reference count of all motion-referenced frames to prevent them
      * from being recycled. These counts are decremented at the end of
x265_2.4.tar.gz/source/encoder/encoder.cpp -> x265_2.5.tar.gz/source/encoder/encoder.cpp Changed
@@ -86,8 +86,10 @@
         m_frameEncoder[i] = NULL;
     MotionEstimate::initScales();
 
-#if ENABLE_DYNAMIC_HDR10
+#if ENABLE_HDR10_PLUS
     m_hdr10plus_api = hdr10plus_api_get();
+    numCimInfo = 0;
+    cim = NULL;
 #endif
 
     m_prevTonemapPayload.payload = NULL;
@@ -132,26 +134,19 @@
     if (!p->bEnableWavefront && !p->bDistributeModeAnalysis && !p->bDistributeMotionEstimation && !p->lookaheadSlices)
         allowPools = false;
 
-    if (!p->frameNumThreads)
-    {
-        // auto-detect frame threads
-        int cpuCount = ThreadPool::getCpuCount();
-        if (!p->bEnableWavefront)
-            p->frameNumThreads = X265_MIN3(cpuCount, (rows + 1) / 2, X265_MAX_FRAME_THREADS);
-        else if (cpuCount >= 32)
-            p->frameNumThreads = (p->sourceHeight > 2000) ? 8 : 6; // dual-socket 10-core IvyBridge or higher
-        else if (cpuCount >= 16)
-            p->frameNumThreads = 5; // 8 HT cores, or dual socket
-        else if (cpuCount >= 8)
-            p->frameNumThreads = 3; // 4 HT cores
-        else if (cpuCount >= 4)
-            p->frameNumThreads = 2; // Dual or Quad core
-        else
-            p->frameNumThreads = 1;
-    }
     m_numPools = 0;
     if (allowPools)
         m_threadPool = ThreadPool::allocThreadPools(p, m_numPools, 0);
+    else
+    {
+        if (!p->frameNumThreads)
+        {
+            // auto-detect frame threads
+            int cpuCount = ThreadPool::getCpuCount();
+            ThreadPool::getFrameThreadsCount(p, cpuCount);
+        }
+    }
+
     if (!m_numPools)
     {
         // issue warnings if any of these features were requested
@@ -320,8 +315,8 @@
     else
         m_scalingList.setupQuantMatrices(m_sps.chromaFormatIdc);
 
-    int numRows = (m_param->sourceHeight + g_maxCUSize - 1) / g_maxCUSize;
-    int numCols = (m_param->sourceWidth  + g_maxCUSize - 1) / g_maxCUSize;
+    int numRows = (m_param->sourceHeight + m_param->maxCUSize - 1) / m_param->maxCUSize;
+    int numCols = (m_param->sourceWidth  + m_param->maxCUSize - 1) / m_param->maxCUSize;
     for (int i = 0; i < m_param->frameNumThreads; i++)
     {
         if (!m_frameEncoder[i]->init(this, numRows, numCols))
@@ -346,12 +341,12 @@
 
     initRefIdx();
 
-    if (m_param->analysisMode)
+    if (m_param->analysisReuseMode)
     {
-        const char* name = m_param->analysisFileName;
+        const char* name = m_param->analysisReuseFileName;
         if (!name)
             name = defaultAnalysisFileName;
-        const char* mode = m_param->analysisMode == X265_ANALYSIS_LOAD ? "rb" : "wb";
+        const char* mode = m_param->analysisReuseMode == X265_ANALYSIS_LOAD ? "rb" : "wb";
         m_analysisFile = x265_fopen(name, mode);
         if (!m_analysisFile)
         {
@@ -362,7 +357,7 @@
 
     if (m_param->analysisMultiPassRefine || m_param->analysisMultiPassDistortion)
     {
-        const char* name = m_param->analysisFileName;
+        const char* name = m_param->analysisReuseFileName;
         if (!name)
             name = defaultAnalysisFileName;
         if (m_param->rc.bStatWrite)
@@ -431,6 +426,10 @@
 
 void Encoder::destroy()
 {
+#if ENABLE_HDR10_PLUS
+    m_hdr10plus_api->hdr10plus_clear_movie(cim, numCimInfo);
+#endif
+        
     if (m_exportedPic)
     {
         ATOMIC_DEC(&m_exportedPic->m_countRefEncoders);
@@ -482,7 +481,7 @@
     {
         int bError = 1;
         fclose(m_analysisFileOut);
-        const char* name = m_param->analysisFileName;
+        const char* name = m_param->analysisReuseFileName;
         if (!name)
             name = defaultAnalysisFileName;
         char* temp = strcatFilename(name, ".temp");
@@ -499,11 +498,14 @@
      }
     if (m_param)
     {
+        if (m_param->csvfpt)
+            fclose(m_param->csvfpt);
         /* release string arguments that were strdup'd */
         free((char*)m_param->rc.lambdaFileName);
         free((char*)m_param->rc.statFileName);
-        free((char*)m_param->analysisFileName);
+        free((char*)m_param->analysisReuseFileName);
         free((char*)m_param->scalingLists);
+        free((char*)m_param->csvfn);
         free((char*)m_param->numaPools);
         free((char*)m_param->masteringDisplayColorVolume);
         free((char*)m_param->toneMapFile);
@@ -518,7 +520,7 @@
         FrameEncoder *encoder = m_frameEncoder[i];
         if (encoder->m_rce.isActive && encoder->m_rce.poc != rc->m_curSlice->m_poc)
         {
-            int64_t bits = (int64_t) X265_MAX(encoder->m_rce.frameSizeEstimated, encoder->m_rce.frameSizePlanned);
+            int64_t bits = m_param->rc.bEnableConstVbv ? (int64_t)encoder->m_rce.frameSizePlanned : (int64_t)X265_MAX(encoder->m_rce.frameSizeEstimated, encoder->m_rce.frameSizePlanned);
             rc->m_bufferFill -= bits;
             rc->m_bufferFill = X265_MAX(rc->m_bufferFill, 0);
             rc->m_bufferFill += encoder->m_rce.bufferRate;
@@ -593,6 +595,8 @@
 
     if (m_exportedPic)
     {
+        if (!m_param->bUseAnalysisFile && m_param->analysisReuseMode == X265_ANALYSIS_SAVE)
+            freeAnalysis(&m_exportedPic->m_analysisData);
         ATOMIC_DEC(&m_exportedPic->m_countRefEncoders);
         m_exportedPic = NULL;
         m_dpb->recycleUnreferenced();
@@ -601,16 +605,22 @@
     {
         x265_sei_payload toneMap;
         toneMap.payload = NULL;
-#if ENABLE_DYNAMIC_HDR10
+#if ENABLE_HDR10_PLUS
         if (m_bToneMap)
         {
-            uint8_t *cim = NULL;
-            if (m_hdr10plus_api->hdr10plus_json_to_frame_cim(m_param->toneMapFile, pic_in->poc, cim))
-            {
-                toneMap.payload = (uint8_t*)x265_malloc(sizeof(uint8_t) * cim[0]);
-                toneMap.payloadSize = cim[0];
+            if (pic_in->poc == 0)
+                numCimInfo = m_hdr10plus_api->hdr10plus_json_to_movie_cim(m_param->toneMapFile, cim);
+            if (pic_in->poc < numCimInfo)
+            {
+                int32_t i = 0;
+                toneMap.payloadSize = 0;
+                while (cim[pic_in->poc][i] == 0xFF)
+                    toneMap.payloadSize += cim[pic_in->poc][i++];
+                toneMap.payloadSize += cim[pic_in->poc][i++];
+
+                toneMap.payload = (uint8_t*)x265_malloc(sizeof(uint8_t) * toneMap.payloadSize);
                 toneMap.payloadType = USER_DATA_REGISTERED_ITU_T_T35;
-                memcpy(toneMap.payload, cim, toneMap.payloadSize);
+                memcpy(toneMap.payload, cim[pic_in->poc] + i, toneMap.payloadSize);
             }
         }
 #endif
@@ -708,7 +718,7 @@
             for (int i = 0; i < numPayloads; i++)
             {
                 x265_sei_payload input;
-                if (i == (numPayloads - 1))
+                if ((i == (numPayloads - 1)) && toneMapEnable)
                     input = toneMap;
                 else
                     input = pic_in->userSEI.payloads[i];
@@ -754,24 +764,40 @@
 
         /* In analysisSave mode, x265_analysis_data is allocated in pic_in and inFrame points to this */
         /* Load analysis data before lookahead->addPicture, since sliceType has been decided */
-        if (m_param->analysisMode == X265_ANALYSIS_LOAD)
+        if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD)
         {
-            x265_picture* inputPic = const_cast<x265_picture*>(pic_in);
             /* readAnalysisFile reads analysis data for the frame and allocates memory based on slicetype */
-            readAnalysisFile(&inputPic->analysisData, inFrame->m_poc);
-            inFrame->m_analysisData.poc = inFrame->m_poc;
-            inFrame->m_analysisData.sliceType = inputPic->analysisData.sliceType;
-            inFrame->m_analysisData.bScenecut = inputPic->analysisData.bScenecut;
-            inFrame->m_analysisData.satdCost = inputPic->analysisData.satdCost;
-            inFrame->m_analysisData.numCUsInFrame = inputPic->analysisData.numCUsInFrame;
-            inFrame->m_analysisData.numPartitions = inputPic->analysisData.numPartitions;
-            inFrame->m_analysisData.wt = inputPic->analysisData.wt;
-            inFrame->m_analysisData.interData = inputPic->analysisData.interData;
-            inFrame->m_analysisData.intraData = inputPic->analysisData.intraData;
-            sliceType = inputPic->analysisData.sliceType;
+            readAnalysisFile(&inFrame->m_analysisData, inFrame->m_poc, pic_in);
+            sliceType = inFrame->m_analysisData.sliceType;
             inFrame->m_lowres.bScenecut = !!inFrame->m_analysisData.bScenecut;
             inFrame->m_lowres.satdCost = inFrame->m_analysisData.satdCost;
         }
+        if (m_param->bUseRcStats && pic_in->rcData)
+        {
+            RcStats* rc = (RcStats*)pic_in->rcData;
+            m_rateControl->m_accumPQp = rc->cumulativePQp;
+            m_rateControl->m_accumPNorm = rc->cumulativePNorm;
+            m_rateControl->m_isNextGop = true;
+            for (int j = 0; j < 3; j++)
+                m_rateControl->m_lastQScaleFor[j] = rc->lastQScaleFor[j];
+            m_rateControl->m_wantedBitsWindow = rc->wantedBitsWindow;
+            m_rateControl->m_cplxrSum = rc->cplxrSum;
+            m_rateControl->m_totalBits = rc->totalBits;
+            m_rateControl->m_encodedBits = rc->encodedBits;
+            m_rateControl->m_shortTermCplxSum = rc->shortTermCplxSum;
+            m_rateControl->m_shortTermCplxCount = rc->shortTermCplxCount;
+            if (m_rateControl->m_isVbv)
+            {
+                m_rateControl->m_bufferFillFinal = rc->bufferFillFinal;
+                for (int i = 0; i < 4; i++)
+                {
+                    m_rateControl->m_pred[i].coeff = rc->coeff[i];
+                    m_rateControl->m_pred[i].count = rc->count[i];
+                    m_rateControl->m_pred[i].offset = rc->offset[i];
+                }
+            }
+            m_param->bUseRcStats = 0;
+        }
         if (m_reconfigureRc)
             inFrame->m_reconfigureRc = true;
 
@@ -805,7 +831,7 @@
             x265_frame_stats* frameData = NULL;
 
             /* Free up pic_in->analysisData since it has already been used */
-            if (m_param->analysisMode == X265_ANALYSIS_LOAD)
+            if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD)
                 freeAnalysis(&outFrame->m_analysisData);
 
             if (pic_out)
@@ -819,20 +845,7 @@
 
                 pic_out->pts = outFrame->m_pts;
                 pic_out->dts = outFrame->m_dts;
-
-                switch (slice->m_sliceType)
-                {
-                case I_SLICE:
-                    pic_out->sliceType = outFrame->m_lowres.bKeyframe ? X265_TYPE_IDR : X265_TYPE_I;
-                    break;
-                case P_SLICE:
-                    pic_out->sliceType = X265_TYPE_P;
-                    break;
-                case B_SLICE:
-                    pic_out->sliceType = X265_TYPE_B;
-                    break;
-                }
-
+                pic_out->sliceType = outFrame->m_lowres.sliceType;
                 pic_out->planes[0] = recpic->m_picOrg[0];
                 pic_out->stride[0] = (int)(recpic->m_stride * sizeof(pixel));
                 if (m_param->internalCsp != X265_CSP_I400)
@@ -844,7 +857,7 @@
                 }
 
                 /* Dump analysis data from pic_out to file in save mode and free */
-                if (m_param->analysisMode == X265_ANALYSIS_SAVE)
+                if (m_param->analysisReuseMode == X265_ANALYSIS_SAVE)
                 {
                     pic_out->analysisData.poc = pic_out->poc;
                     pic_out->analysisData.sliceType = pic_out->sliceType;
@@ -856,7 +869,8 @@
                     pic_out->analysisData.interData = outFrame->m_analysisData.interData;
                     pic_out->analysisData.intraData = outFrame->m_analysisData.intraData;
                     writeAnalysisFile(&pic_out->analysisData, *outFrame->m_encData);
-                    freeAnalysis(&pic_out->analysisData);
+                    if (m_param->bUseAnalysisFile)
+                        freeAnalysis(&pic_out->analysisData);
                 }
             }
             if (m_param->rc.bStatWrite && (m_param->analysisMultiPassRefine || m_param->analysisMultiPassDistortion))
@@ -1012,16 +1026,17 @@
                 Slice* slice = frameEnc->m_encData->m_slice;
                 slice->m_sps = &m_sps;
                 slice->m_pps = &m_pps;
+                slice->m_param = m_param;
                 slice->m_maxNumMergeCand = m_param->maxNumMergeCand;
-                slice->m_endCUAddr = slice->realEndAddress(m_sps.numCUsInFrame * NUM_4x4_PARTITIONS);
+                slice->m_endCUAddr = slice->realEndAddress(m_sps.numCUsInFrame * m_param->num4x4Partitions);
             }
 
             if (m_param->searchMethod == X265_SEA && frameEnc->m_lowres.sliceType != X265_TYPE_B)
             {
-                int padX = g_maxCUSize + 32;
-                int padY = g_maxCUSize + 16;
-                uint32_t numCuInHeight = (frameEnc->m_encData->m_reconPic->m_picHeight + g_maxCUSize - 1) / g_maxCUSize;
-                int maxHeight = numCuInHeight * g_maxCUSize;
+                int padX = m_param->maxCUSize + 32;
+                int padY = m_param->maxCUSize + 16;
+                uint32_t numCuInHeight = (frameEnc->m_encData->m_reconPic->m_picHeight + m_param->maxCUSize - 1) / m_param->maxCUSize;
+                int maxHeight = numCuInHeight * m_param->maxCUSize;
                 for (int i = 0; i < INTEGRAL_PLANE_NUM; i++)
                 {
                     frameEnc->m_encData->m_meBuffer[i] = X265_MALLOC(uint32_t, frameEnc->m_reconPic->m_stride * (maxHeight + (2 * padY)));
@@ -1080,17 +1095,17 @@
                 frameEnc->m_dts = frameEnc->m_reorderedPts;
 
             /* Allocate analysis data before encode in save mode. This is allocated in frameEnc */
-            if (m_param->analysisMode == X265_ANALYSIS_SAVE)
+            if (m_param->analysisReuseMode == X265_ANALYSIS_SAVE)
             {
                 x265_analysis_data* analysis = &frameEnc->m_analysisData;
                 analysis->poc = frameEnc->m_poc;
                 analysis->sliceType = frameEnc->m_lowres.sliceType;
-                uint32_t widthInCU       = (m_param->sourceWidth  + g_maxCUSize - 1) >> g_maxLog2CUSize;
-                uint32_t heightInCU      = (m_param->sourceHeight + g_maxCUSize - 1) >> g_maxLog2CUSize;
+                uint32_t widthInCU       = (m_param->sourceWidth  + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+                uint32_t heightInCU      = (m_param->sourceHeight + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
 
                 uint32_t numCUsInFrame   = widthInCU * heightInCU;
                 analysis->numCUsInFrame  = numCUsInFrame;
-                analysis->numPartitions  = NUM_4x4_PARTITIONS;
+                analysis->numPartitions  = m_param->num4x4Partitions;
                 allocAnalysis(analysis);
             }
             /* determine references, setup RPS, etc */
@@ -1157,6 +1172,120 @@
329
     return x265_check_params(encParam);
330
 }
331
 
332
+void Encoder::copyCtuInfo(x265_ctu_info_t** frameCtuInfo, int poc)
333
+{
334
+    uint32_t widthInCU = (m_param->sourceWidth + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
335
+    uint32_t heightInCU = (m_param->sourceHeight + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
336
+    Frame* curFrame;
337
+    Frame* prevFrame = NULL;
338
+    int32_t* frameCTU;
339
+    uint32_t numCUsInFrame = widthInCU * heightInCU;
340
+    uint32_t maxNum8x8Partitions = 64;
341
+    bool copied = false;
342
+    do
343
+    {
344
+        curFrame = m_lookahead->m_inputQueue.getPOC(poc);
345
+        if (!curFrame)
346
+            curFrame = m_lookahead->m_outputQueue.getPOC(poc);
347
+
348
+        if (poc > 0)
349
+        {
350
+            prevFrame = m_lookahead->m_inputQueue.getPOC(poc - 1);
351
+            if (!prevFrame)
352
+                prevFrame = m_lookahead->m_outputQueue.getPOC(poc - 1);
+            if (!prevFrame)
+            {
+                FrameEncoder* prevEncoder;
+                for (int i = 0; i < m_param->frameNumThreads; i++)
+                {
+                    prevEncoder = m_frameEncoder[i];
+                    prevFrame = prevEncoder->m_frame;
+                    if (prevFrame && (prevEncoder->m_frame->m_poc == poc - 1))
+                    {
+                        prevFrame = prevEncoder->m_frame;
+                        break;
+                    }
+                }
+            }
+        }
+        x265_ctu_info_t* ctuTemp, *prevCtuTemp;
+        if (curFrame)
+        {
+            if (!curFrame->m_ctuInfo)
+                CHECKED_MALLOC(curFrame->m_ctuInfo, x265_ctu_info_t*, 1);
+            CHECKED_MALLOC(*curFrame->m_ctuInfo, x265_ctu_info_t, numCUsInFrame);
+            CHECKED_MALLOC_ZERO(curFrame->m_prevCtuInfoChange, int, numCUsInFrame * maxNum8x8Partitions);
+            for (uint32_t i = 0; i < numCUsInFrame; i++)
+            {
+                ctuTemp = *curFrame->m_ctuInfo + i;
+                CHECKED_MALLOC(frameCTU, int32_t, maxNum8x8Partitions);
+                ctuTemp->ctuInfo = (int32_t*)frameCTU;
+                ctuTemp->ctuAddress = frameCtuInfo[i]->ctuAddress;
+                memcpy(ctuTemp->ctuPartitions, frameCtuInfo[i]->ctuPartitions, sizeof(int32_t) * maxNum8x8Partitions);
+                memcpy(ctuTemp->ctuInfo, frameCtuInfo[i]->ctuInfo, sizeof(int32_t) * maxNum8x8Partitions);
+                if (prevFrame && curFrame->m_poc > 1)
+                {
+                    prevCtuTemp = *prevFrame->m_ctuInfo + i;
+                    for (uint32_t j = 0; j < maxNum8x8Partitions; j++)
+                        curFrame->m_prevCtuInfoChange[i * maxNum8x8Partitions + j] = (*((int32_t *)prevCtuTemp->ctuInfo + j) == 2) ? (poc - 1) : prevFrame->m_prevCtuInfoChange[i * maxNum8x8Partitions + j];
+                }
+            }
+            copied = true;
+            curFrame->m_copied.trigger();
+        }
+        else
+        {
+            FrameEncoder* curEncoder;
+            for (int i = 0; i < m_param->frameNumThreads; i++)
+            {
+                curEncoder = m_frameEncoder[i];
+                curFrame = curEncoder->m_frame;
+                if (curFrame)
+                {
+                    if (poc == curFrame->m_poc)
+                    {
+                        if (!curFrame->m_ctuInfo)
+                            CHECKED_MALLOC(curFrame->m_ctuInfo, x265_ctu_info_t*, 1);
+                        CHECKED_MALLOC(*curFrame->m_ctuInfo, x265_ctu_info_t, numCUsInFrame);
+                        CHECKED_MALLOC_ZERO(curFrame->m_prevCtuInfoChange, int, numCUsInFrame * maxNum8x8Partitions);
+                        for (uint32_t l = 0; l < numCUsInFrame; l++)
+                        {
+                            ctuTemp = *curFrame->m_ctuInfo + l;
+                            CHECKED_MALLOC(frameCTU, int32_t, maxNum8x8Partitions);
+                            ctuTemp->ctuInfo = (int32_t*)frameCTU;
+                            ctuTemp->ctuAddress = frameCtuInfo[l]->ctuAddress;
+                            memcpy(ctuTemp->ctuPartitions, frameCtuInfo[l]->ctuPartitions, sizeof(int32_t) * maxNum8x8Partitions);
+                            memcpy(ctuTemp->ctuInfo, frameCtuInfo[l]->ctuInfo, sizeof(int32_t) * maxNum8x8Partitions);
+                            if (prevFrame && curFrame->m_poc > 1)
+                            {
+                                prevCtuTemp = *prevFrame->m_ctuInfo + l;
+                                for (uint32_t j = 0; j < maxNum8x8Partitions; j++)
+                                    curFrame->m_prevCtuInfoChange[l * maxNum8x8Partitions + j] = (*((int32_t *)prevCtuTemp->ctuInfo + j) == CTU_INFO_CHANGE) ? (poc - 1) : prevFrame->m_prevCtuInfoChange[l * maxNum8x8Partitions + j];
+                            }
+                        }
+                        copied = true;
+                        curFrame->m_copied.trigger();
+                        break;
+                    }
+                }
+            }
+        }
+    } while (!copied);
+    return;
+fail:
+    for (uint32_t i = 0; i < numCUsInFrame; i++)
+    {
+        X265_FREE((*curFrame->m_ctuInfo + i)->ctuInfo);
+        (*curFrame->m_ctuInfo + i)->ctuInfo = NULL;
+    }
+    X265_FREE(*curFrame->m_ctuInfo);
+    *(curFrame->m_ctuInfo) = NULL;
+    X265_FREE(curFrame->m_ctuInfo);
+    curFrame->m_ctuInfo = NULL;
+    X265_FREE(curFrame->m_prevCtuInfoChange);
+    curFrame->m_prevCtuInfoChange = NULL;
+}
+
 void EncStats::addPsnr(double psnrY, double psnrU, double psnrV)
 {
     m_psnrSumY += psnrY;
@@ -1286,7 +1415,7 @@
     /* Summarize stats from all frame encoders */
     CUStats cuStats;
     for (int i = 0; i < m_param->frameNumThreads; i++)
-        cuStats.accumulate(m_frameEncoder[i]->m_cuStats);
+        cuStats.accumulate(m_frameEncoder[i]->m_cuStats, *m_param);
 
     if (!cuStats.totalCTUTime)
         return;
@@ -1307,7 +1436,7 @@
 
     int64_t interRDOTotalTime = 0, intraRDOTotalTime = 0;
     uint64_t interRDOTotalCount = 0, intraRDOTotalCount = 0;
-    for (uint32_t i = 0; i <= g_maxCUDepth; i++)
+    for (uint32_t i = 0; i <= m_param->maxCUDepth; i++)
     {
         interRDOTotalTime += cuStats.interRDOElapsedTime[i];
         intraRDOTotalTime += cuStats.intraRDOElapsedTime[i];
@@ -1417,7 +1546,7 @@
     }
 
     x265_log(m_param, X265_LOG_INFO, "CU: " X265_LL " %dX%d CTUs compressed in %.3lf seconds, %.3lf CTUs per worker-second\n",
-             cuStats.totalCTUs, g_maxCUSize, g_maxCUSize,
+             cuStats.totalCTUs, m_param->maxCUSize, m_param->maxCUSize,
              ELAPSED_SEC(totalWorkerTime),
              cuStats.totalCTUs / ELAPSED_SEC(totalWorkerTime));
 
@@ -1578,6 +1707,8 @@
         frameStats->qp = curEncData.m_avgQpAq;
         frameStats->bits = bits;
         frameStats->bScenecut = curFrame->m_lowres.bScenecut;
+        if (m_param->csvLogLevel >= 2)
+            frameStats->ipCostRatio = curFrame->m_lowres.ipCostRatio;
         frameStats->bufferFill = m_rateControl->m_bufferFillActual;
         frameStats->frameLatency = inPoc - poc;
         if (m_param->rc.rateControlMode == X265_RC_CRF)
@@ -1602,35 +1733,83 @@
 
 #define ELAPSED_MSEC(start, end) (((double)(end) - (start)) / 1000)
 
-        frameStats->decideWaitTime = ELAPSED_MSEC(0, curEncoder->m_slicetypeWaitTime);
-        frameStats->row0WaitTime = ELAPSED_MSEC(curEncoder->m_startCompressTime, curEncoder->m_row0WaitTime);
-        frameStats->wallTime = ELAPSED_MSEC(curEncoder->m_row0WaitTime, curEncoder->m_endCompressTime);
-        frameStats->refWaitWallTime = ELAPSED_MSEC(curEncoder->m_row0WaitTime, curEncoder->m_allRowsAvailableTime);
-        frameStats->totalCTUTime = ELAPSED_MSEC(0, curEncoder->m_totalWorkerElapsedTime);
-        frameStats->stallTime = ELAPSED_MSEC(0, curEncoder->m_totalNoWorkerTime);
-        frameStats->totalFrameTime = ELAPSED_MSEC(curFrame->m_encodeStartTime, x265_mdate());
-        if (curEncoder->m_totalActiveWorkerCount)
-            frameStats->avgWPP = (double)curEncoder->m_totalActiveWorkerCount / curEncoder->m_activeWorkerCountSamples;
-        else
-            frameStats->avgWPP = 1;
-        frameStats->countRowBlocks = curEncoder->m_countRowBlocks;
+        frameStats->maxLumaLevel = curFrame->m_fencPic->m_maxLumaLevel;
+        frameStats->minLumaLevel = curFrame->m_fencPic->m_minLumaLevel;
+        frameStats->avgLumaLevel = curFrame->m_fencPic->m_avgLumaLevel;
+
+        if (m_param->csvLogLevel >= 2)
+        {
+            frameStats->decideWaitTime = ELAPSED_MSEC(0, curEncoder->m_slicetypeWaitTime);
+            frameStats->row0WaitTime = ELAPSED_MSEC(curEncoder->m_startCompressTime, curEncoder->m_row0WaitTime);
+            frameStats->wallTime = ELAPSED_MSEC(curEncoder->m_row0WaitTime, curEncoder->m_endCompressTime);
+            frameStats->refWaitWallTime = ELAPSED_MSEC(curEncoder->m_row0WaitTime, curEncoder->m_allRowsAvailableTime);
+            frameStats->totalCTUTime = ELAPSED_MSEC(0, curEncoder->m_totalWorkerElapsedTime);
+            frameStats->stallTime = ELAPSED_MSEC(0, curEncoder->m_totalNoWorkerTime);
+            frameStats->totalFrameTime = ELAPSED_MSEC(curFrame->m_encodeStartTime, x265_mdate());
+            if (curEncoder->m_totalActiveWorkerCount)
+                frameStats->avgWPP = (double)curEncoder->m_totalActiveWorkerCount / curEncoder->m_activeWorkerCountSamples;
+            else
+                frameStats->avgWPP = 1;
+            frameStats->countRowBlocks = curEncoder->m_countRowBlocks;
+
+            frameStats->avgChromaDistortion = curFrame->m_encData->m_frameStats.avgChromaDistortion;
+            frameStats->avgLumaDistortion = curFrame->m_encData->m_frameStats.avgLumaDistortion;
+            frameStats->avgPsyEnergy = curFrame->m_encData->m_frameStats.avgPsyEnergy;
+            frameStats->avgResEnergy = curFrame->m_encData->m_frameStats.avgResEnergy;
+
+            frameStats->maxChromaULevel = curFrame->m_fencPic->m_maxChromaULevel;
+            frameStats->minChromaULevel = curFrame->m_fencPic->m_minChromaULevel;
+            frameStats->avgChromaULevel = curFrame->m_fencPic->m_avgChromaULevel;
+
+            frameStats->maxChromaVLevel = curFrame->m_fencPic->m_maxChromaVLevel;
+            frameStats->minChromaVLevel = curFrame->m_fencPic->m_minChromaVLevel;
+            frameStats->avgChromaVLevel = curFrame->m_fencPic->m_avgChromaVLevel;
+
+            if (curFrame->m_encData->m_frameStats.totalPu[4] == 0)
+                frameStats->puStats.percentNxN = 0;
+            else
+                frameStats->puStats.percentNxN = (double)(curFrame->m_encData->m_frameStats.cnt4x4 / (double)curFrame->m_encData->m_frameStats.totalPu[4]) * 100;
+            for (uint32_t depth = 0; depth <= m_param->maxCUDepth; depth++)
+            {
+                if (curFrame->m_encData->m_frameStats.totalPu[depth] == 0)
+                {
+                    frameStats->puStats.percentSkipPu[depth] = 0;
+                    frameStats->puStats.percentIntraPu[depth] = 0;
+                    frameStats->puStats.percentAmpPu[depth] = 0;
+                    for (int i = 0; i < INTER_MODES - 1; i++)
+                    {
+                        frameStats->puStats.percentInterPu[depth][i] = 0;
+                        frameStats->puStats.percentMergePu[depth][i] = 0;
+                    }
+                }
+                else
+                {
+                    frameStats->puStats.percentSkipPu[depth] = (double)(curFrame->m_encData->m_frameStats.cntSkipPu[depth] / (double)curFrame->m_encData->m_frameStats.totalPu[depth]) * 100;
+                    frameStats->puStats.percentIntraPu[depth] = (double)(curFrame->m_encData->m_frameStats.cntIntraPu[depth] / (double)curFrame->m_encData->m_frameStats.totalPu[depth]) * 100;
+                    frameStats->puStats.percentAmpPu[depth] = (double)(curFrame->m_encData->m_frameStats.cntAmp[depth] / (double)curFrame->m_encData->m_frameStats.totalPu[depth]) * 100;
+                    for (int i = 0; i < INTER_MODES - 1; i++)
+                    {
+                        frameStats->puStats.percentInterPu[depth][i] = (double)(curFrame->m_encData->m_frameStats.cntInterPu[depth][i] / (double)curFrame->m_encData->m_frameStats.totalPu[depth]) * 100;
+                        frameStats->puStats.percentMergePu[depth][i] = (double)(curFrame->m_encData->m_frameStats.cntMergePu[depth][i] / (double)curFrame->m_encData->m_frameStats.totalPu[depth]) * 100;
+                    }
+                }
+            }
+        }
+
+        if (m_param->csvLogLevel >= 1)
+        {
+            frameStats->cuStats.percentIntraNxN = curFrame->m_encData->m_frameStats.percentIntraNxN;
 
-        frameStats->cuStats.percentIntraNxN = curFrame->m_encData->m_frameStats.percentIntraNxN;
-        frameStats->avgChromaDistortion     = curFrame->m_encData->m_frameStats.avgChromaDistortion;
-        frameStats->avgLumaDistortion       = curFrame->m_encData->m_frameStats.avgLumaDistortion;
-        frameStats->avgPsyEnergy            = curFrame->m_encData->m_frameStats.avgPsyEnergy;
-        frameStats->avgResEnergy            = curFrame->m_encData->m_frameStats.avgResEnergy;
-        frameStats->avgLumaLevel            = curFrame->m_fencPic->m_avgLumaLevel;
-        frameStats->maxLumaLevel            = curFrame->m_fencPic->m_maxLumaLevel;
-        for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
-        {
-            frameStats->cuStats.percentSkipCu[depth]  = curFrame->m_encData->m_frameStats.percentSkipCu[depth];
-            frameStats->cuStats.percentMergeCu[depth] = curFrame->m_encData->m_frameStats.percentMergeCu[depth];
-            frameStats->cuStats.percentInterDistribution[depth][0] = curFrame->m_encData->m_frameStats.percentInterDistribution[depth][0];
-            frameStats->cuStats.percentInterDistribution[depth][1] = curFrame->m_encData->m_frameStats.percentInterDistribution[depth][1];
-            frameStats->cuStats.percentInterDistribution[depth][2] = curFrame->m_encData->m_frameStats.percentInterDistribution[depth][2];
-            for (int n = 0; n < INTRA_MODES; n++)
-                frameStats->cuStats.percentIntraDistribution[depth][n] = curFrame->m_encData->m_frameStats.percentIntraDistribution[depth][n];
+            for (uint32_t depth = 0; depth <= m_param->maxCUDepth; depth++)
+            {
+                frameStats->cuStats.percentSkipCu[depth] = curFrame->m_encData->m_frameStats.percentSkipCu[depth];
+                frameStats->cuStats.percentMergeCu[depth] = curFrame->m_encData->m_frameStats.percentMergeCu[depth];
+                frameStats->cuStats.percentInterDistribution[depth][0] = curFrame->m_encData->m_frameStats.percentInterDistribution[depth][0];
+                frameStats->cuStats.percentInterDistribution[depth][1] = curFrame->m_encData->m_frameStats.percentInterDistribution[depth][1];
+                frameStats->cuStats.percentInterDistribution[depth][2] = curFrame->m_encData->m_frameStats.percentInterDistribution[depth][2];
+                for (int n = 0; n < INTRA_MODES; n++)
+                    frameStats->cuStats.percentIntraDistribution[depth][n] = curFrame->m_encData->m_frameStats.percentIntraDistribution[depth][n];
+            }
         }
     }
 }
@@ -1803,16 +1982,16 @@
     sps->chromaFormatIdc = m_param->internalCsp;
     sps->picWidthInLumaSamples = m_param->sourceWidth;
     sps->picHeightInLumaSamples = m_param->sourceHeight;
-    sps->numCuInWidth = (m_param->sourceWidth + g_maxCUSize - 1) / g_maxCUSize;
-    sps->numCuInHeight = (m_param->sourceHeight + g_maxCUSize - 1) / g_maxCUSize;
+    sps->numCuInWidth = (m_param->sourceWidth + m_param->maxCUSize - 1) / m_param->maxCUSize;
+    sps->numCuInHeight = (m_param->sourceHeight + m_param->maxCUSize - 1) / m_param->maxCUSize;
     sps->numCUsInFrame = sps->numCuInWidth * sps->numCuInHeight;
-    sps->numPartitions = NUM_4x4_PARTITIONS;
-    sps->numPartInCUSize = 1 << g_unitSizeDepth;
+    sps->numPartitions = m_param->num4x4Partitions;
+    sps->numPartInCUSize = 1 << m_param->unitSizeDepth;
 
-    sps->log2MinCodingBlockSize = g_maxLog2CUSize - g_maxCUDepth;
-    sps->log2DiffMaxMinCodingBlockSize = g_maxCUDepth;
+    sps->log2MinCodingBlockSize = m_param->maxLog2CUSize - m_param->maxCUDepth;
+    sps->log2DiffMaxMinCodingBlockSize = m_param->maxCUDepth;
     uint32_t maxLog2TUSize = (uint32_t)g_log2Size[m_param->maxTUSize];
-    sps->quadtreeTULog2MaxSize = X265_MIN(g_maxLog2CUSize, maxLog2TUSize);
+    sps->quadtreeTULog2MaxSize = X265_MIN((uint32_t)m_param->maxLog2CUSize, maxLog2TUSize);
     sps->quadtreeTULog2MinSize = 2;
     sps->quadtreeTUMaxDepthInter = m_param->tuQTMaxInterDepth;
     sps->quadtreeTUMaxDepthIntra = m_param->tuQTMaxIntraDepth;
@@ -1820,7 +1999,7 @@
     sps->bUseSAO = m_param->bEnableSAO;
 
     sps->bUseAMP = m_param->bEnableAMP;
-    sps->maxAMPDepth = m_param->bEnableAMP ? g_maxCUDepth : 0;
+    sps->maxAMPDepth = m_param->bEnableAMP ? m_param->maxCUDepth : 0;
 
     sps->maxTempSubLayers = m_param->bEnableTemporalSubLayers ? 2 : 1;
     sps->maxDecPicBuffering = m_vps.maxDecPicBuffering;
@@ -2034,7 +2213,7 @@
         p->lookaheadDepth = p->totalFrames;
     if (p->bIntraRefresh)
     {
-        int numCuInWidth = (m_param->sourceWidth + g_maxCUSize - 1) / g_maxCUSize;
+        int numCuInWidth = (m_param->sourceWidth + m_param->maxCUSize - 1) / m_param->maxCUSize;
         if (p->maxNumReferences > 1)
         {
             x265_log(p,  X265_LOG_WARNING, "Max References > 1 + intra-refresh is not supported , setting max num references = 1\n");
@@ -2070,23 +2249,68 @@
         p->rc.rfConstantMin = 0;
     }
 
-    if (p->analysisMode && (p->bDistributeModeAnalysis || p->bDistributeMotionEstimation))
+    if (p->analysisReuseMode && (p->bDistributeModeAnalysis || p->bDistributeMotionEstimation))
     {
         x265_log(p, X265_LOG_WARNING, "Analysis load/save options incompatible with pmode/pme, Disabling pmode/pme\n");
         p->bDistributeMotionEstimation = p->bDistributeModeAnalysis = 0;
     }
 
-    if (p->analysisMode && p->rc.cuTree)
+    if (p->analysisReuseMode && p->rc.cuTree)
     {
         x265_log(p, X265_LOG_WARNING, "Analysis load/save options works only with cu-tree off, Disabling cu-tree\n");
         p->rc.cuTree = 0;
     }
 
-    if (p->analysisMode && (p->analysisMultiPassRefine || p->analysisMultiPassDistortion))
+    if (p->analysisReuseMode && (p->analysisMultiPassRefine || p->analysisMultiPassDistortion))
     {
         x265_log(p, X265_LOG_WARNING, "Cannot use Analysis load/save option and multi-pass-opt-analysis/multi-pass-opt-distortion together,"
             "Disabling Analysis load/save and multi-pass-opt-analysis/multi-pass-opt-distortion\n");
-        p->analysisMode = p->analysisMultiPassRefine = p->analysisMultiPassDistortion = 0;
+        p->analysisReuseMode = p->analysisMultiPassRefine = p->analysisMultiPassDistortion = 0;
+    }
+    if (p->scaleFactor)
+    {
+        if (p->scaleFactor == 1)
+        {
+            p->scaleFactor = 0;
+        }
+        else if (!p->analysisReuseMode || p->analysisReuseLevel < 10)
+        {
+            x265_log(p, X265_LOG_WARNING, "Input scaling works with analysis-reuse-mode, analysis-reuse-level 10. Disabling scale-factor.\n");
+            p->scaleFactor = 0;
+        }
+    }
+
+    if (p->intraRefine)
+    {
+        if (p->analysisReuseMode!= X265_ANALYSIS_LOAD || p->analysisReuseLevel < 10 || !p->scaleFactor)
+        {
+            x265_log(p, X265_LOG_WARNING, "Intra refinement requires analysis load, analysis-reuse-level 10, scale factor. Disabling intra refine.\n");
+            p->intraRefine = 0;
+        }
+    }
+
+    if (p->interRefine)
+    {
+        if (p->analysisReuseMode != X265_ANALYSIS_LOAD || p->analysisReuseLevel < 10 || !p->scaleFactor)
+        {
+            x265_log(p, X265_LOG_WARNING, "Inter refinement requires analysis load, analysis-reuse-level 10, scale factor. Disabling inter refine.\n");
+            p->interRefine = 0;
+        }
+    }
+
+    if (p->limitTU && p->interRefine)
+    {
+        x265_log(p, X265_LOG_WARNING, "Inter refinement does not support limitTU. Disabling limitTU.\n");
+        p->limitTU = 0;
+    }
+
+    if (p->mvRefine)
+    {
+        if (p->analysisReuseMode != X265_ANALYSIS_LOAD || p->analysisReuseLevel < 10 || !p->scaleFactor)
+        {
+            x265_log(p, X265_LOG_WARNING, "MV refinement requires analysis load, analysis-reuse-level 10, scale factor. Disabling MV refine.\n");
+            p->mvRefine = 0;
+        }
     }
 
     if ((p->analysisMultiPassRefine || p->analysisMultiPassDistortion) && (p->bDistributeModeAnalysis || p->bDistributeMotionEstimation))
@@ -2177,9 +2401,17 @@
     m_conformanceWindow.topOffset = 0;
     m_conformanceWindow.bottomOffset = 0;
     m_conformanceWindow.leftOffset = 0;
-
     /* set pad size if width is not multiple of the minimum CU size */
-    if (p->sourceWidth & (p->minCUSize - 1))
+    if (p->scaleFactor == 2 && ((p->sourceWidth / 2) & (p->minCUSize - 1)) && p->analysisReuseMode == X265_ANALYSIS_LOAD)
+    {
+        uint32_t rem = (p->sourceWidth / 2) & (p->minCUSize - 1);
+        uint32_t padsize = p->minCUSize - rem;
+        p->sourceWidth += padsize * 2;
+
+        m_conformanceWindow.bEnabled = true;
+        m_conformanceWindow.rightOffset = padsize * 2;
+    }
+    else if(p->sourceWidth & (p->minCUSize - 1))
     {
         uint32_t rem = p->sourceWidth & (p->minCUSize - 1);
         uint32_t padsize = p->minCUSize - rem;
@@ -2228,7 +2460,7 @@
         p->dynamicRd = 0;
         x265_log(p, X265_LOG_WARNING, "Dynamic-rd disabled, requires RD <= 4, VBV and aq-mode enabled\n");
     }
-#ifdef ENABLE_DYNAMIC_HDR10
+#ifdef ENABLE_HDR10_PLUS
     if (m_param->bDhdr10opt && m_param->toneMapFile == NULL)
     {
         x265_log(p, X265_LOG_WARNING, "Disabling dhdr10-opt. dhdr10-info must be enabled.\n");
@@ -2252,7 +2484,7 @@
 #else
     if (m_param->toneMapFile)
     {
-        x265_log(p, X265_LOG_WARNING, "--dhdr10-info disabled. Enable dynamic HDR in cmake.\n");
+        x265_log(p, X265_LOG_WARNING, "--dhdr10-info disabled. Enable HDR10_PLUS in cmake.\n");
         m_bToneMap = 0;
         m_param->toneMapFile = NULL;
     }
@@ -2358,9 +2590,16 @@
             x265_log(p, X265_LOG_ERROR, "uhd-bd: Disabled\n");
         }
     }
-
     /* set pad size if height is not multiple of the minimum CU size */
-    if (p->sourceHeight & (p->minCUSize - 1))
+    if (p->scaleFactor == 2 && ((p->sourceHeight / 2) & (p->minCUSize - 1)) && p->analysisReuseMode == X265_ANALYSIS_LOAD)
+    {
+        uint32_t rem = (p->sourceHeight / 2) & (p->minCUSize - 1);
+        uint32_t padsize = p->minCUSize - rem;
+        p->sourceHeight += padsize * 2;
+        m_conformanceWindow.bEnabled = true;
+        m_conformanceWindow.bottomOffset = padsize * 2;
+    }
+    else if(p->sourceHeight & (p->minCUSize - 1))
     {
         uint32_t rem = p->sourceHeight & (p->minCUSize - 1);
         uint32_t padsize = p->minCUSize - rem;
@@ -2372,9 +2611,6 @@
     if (p->bLogCuStats)
         x265_log(p, X265_LOG_WARNING, "--cu-stats option is now deprecated\n");
 
-    if (p->csvfn)
-        x265_log(p, X265_LOG_WARNING, "libx265 no longer supports CSV file statistics\n");
-
     if (p->log2MaxPocLsb < 4)
     {
         x265_log(p, X265_LOG_WARNING, "maximum of the picture order count can not be less than 4\n");
@@ -2406,6 +2642,20 @@
             p->bHDROpt = 0;
         }
     }
+
+    if (m_param->toneMapFile || p->bHDROpt || p->bEmitHDRSEI)
+    {
+        if (!p->bRepeatHeaders)
+        {
+            p->bRepeatHeaders = 1;
+            x265_log(p, X265_LOG_WARNING, "Turning on repeat-headers for HDR compatibility\n");
+        }
+    }
+
+    p->maxLog2CUSize = g_log2Size[p->maxCUSize];
+    p->maxCUDepth    = p->maxLog2CUSize - g_log2Size[p->minCUSize];
+    p->unitSizeDepth = p->maxLog2CUSize - LOG2_UNIT_SIZE;
+    p->num4x4Partitions = (1U << (p->unitSizeDepth << 1));
 }
 
 void Encoder::allocAnalysis(x265_analysis_data* analysis)
@@ -2414,7 +2664,7 @@
     analysis->interData = analysis->intraData = NULL;
     if (analysis->sliceType == X265_TYPE_IDR || analysis->sliceType == X265_TYPE_I)
     {
-        if (m_param->analysisRefineLevel < 2)
+        if (m_param->analysisReuseLevel < 2)
             return;
 
         analysis_intra_data *intraData = (analysis_intra_data*)analysis->intraData;
@@ -2430,27 +2680,27 @@
         int numDir = analysis->sliceType == X265_TYPE_P ? 1 : 2;
         uint32_t numPlanes = m_param->internalCsp == X265_CSP_I400 ? 1 : 3;
         CHECKED_MALLOC_ZERO(analysis->wt, WeightParam, numPlanes * numDir);
-        if (m_param->analysisRefineLevel < 2)
+        if (m_param->analysisReuseLevel < 2)
             return;
 
         analysis_inter_data *interData = (analysis_inter_data*)analysis->interData;
         CHECKED_MALLOC_ZERO(interData, analysis_inter_data, 1);
         CHECKED_MALLOC(interData->depth, uint8_t, analysis->numPartitions * analysis->numCUsInFrame);
         CHECKED_MALLOC(interData->modes, uint8_t, analysis->numPartitions * analysis->numCUsInFrame);
-        if (m_param->analysisRefineLevel > 4)
+        if (m_param->analysisReuseLevel > 4)
         {
             CHECKED_MALLOC(interData->partSize, uint8_t, analysis->numPartitions * analysis->numCUsInFrame);
             CHECKED_MALLOC(interData->mergeFlag, uint8_t, analysis->numPartitions * analysis->numCUsInFrame);
         }
 
-        if (m_param->analysisRefineLevel == 10)
+        if (m_param->analysisReuseLevel == 10)
         {
             CHECKED_MALLOC(interData->interDir, uint8_t, analysis->numPartitions * analysis->numCUsInFrame);
             for (int dir = 0; dir < numDir; dir++)
             {
                 CHECKED_MALLOC(interData->mvpIdx[dir], uint8_t, analysis->numPartitions * analysis->numCUsInFrame);
                 CHECKED_MALLOC(interData->refIdx[dir], int8_t, analysis->numPartitions * analysis->numCUsInFrame);
-               CHECKED_MALLOC(interData->mv[dir], MV, analysis->numPartitions * analysis->numCUsInFrame);
+                CHECKED_MALLOC(interData->mv[dir], MV, analysis->numPartitions * analysis->numCUsInFrame);
             }
 
             /* Allocate intra in inter */
@@ -2480,51 +2730,56 @@
     /* Early exit freeing weights alone if level is 1 (when there is no analysis inter/intra) */
     if (analysis->sliceType > X265_TYPE_I && analysis->wt)
         X265_FREE(analysis->wt);
-    if (m_param->analysisRefineLevel < 2)
+    if (m_param->analysisReuseLevel < 2)
         return;
 
-    if (analysis->intraData)
+    if (analysis->sliceType == X265_TYPE_IDR || analysis->sliceType == X265_TYPE_I)
     {
-        if (m_param->analysisRefineLevel < 2)
-            return;
-
-        X265_FREE(((analysis_intra_data*)analysis->intraData)->depth);
-        X265_FREE(((analysis_intra_data*)analysis->intraData)->modes);
-        X265_FREE(((analysis_intra_data*)analysis->intraData)->partSizes);
-        X265_FREE(((analysis_intra_data*)analysis->intraData)->chromaModes);
-        X265_FREE(analysis->intraData);
+        if (analysis->intraData)
+        {
+            X265_FREE(((analysis_intra_data*)analysis->intraData)->depth);
+            X265_FREE(((analysis_intra_data*)analysis->intraData)->modes);
+            X265_FREE(((analysis_intra_data*)analysis->intraData)->partSizes);
+            X265_FREE(((analysis_intra_data*)analysis->intraData)->chromaModes);
+            X265_FREE(analysis->intraData);
+            analysis->intraData = NULL;
+        }
     }
-    else if (analysis->interData)
+    else
     {
-        X265_FREE(((analysis_inter_data*)analysis->interData)->depth);
-        X265_FREE(((analysis_inter_data*)analysis->interData)->modes);
-        if (m_param->analysisRefineLevel > 4)
+        if (analysis->intraData)
         {
-            X265_FREE(((analysis_inter_data*)analysis->interData)->mergeFlag);
-            X265_FREE(((analysis_inter_data*)analysis->interData)->partSize);
+            X265_FREE(((analysis_intra_data*)analysis->intraData)->modes);
+            X265_FREE(((analysis_intra_data*)analysis->intraData)->chromaModes);
+            X265_FREE(analysis->intraData);
+            analysis->intraData = NULL;
         }
-
-        if (m_param->analysisRefineLevel == 10)
+        if (analysis->interData)
         {
-            X265_FREE(((analysis_inter_data*)analysis->interData)->interDir);
-            int numDir = analysis->sliceType == X265_TYPE_P ? 1 : 2;
-            for (int dir = 0; dir < numDir; dir++)
+            X265_FREE(((analysis_inter_data*)analysis->interData)->depth);
+            X265_FREE(((analysis_inter_data*)analysis->interData)->modes);
+            if (m_param->analysisReuseLevel > 4)
             {
-                X265_FREE(((analysis_inter_data*)analysis->interData)->mvpIdx[dir]);
-                X265_FREE(((analysis_inter_data*)analysis->interData)->refIdx[dir]);
-                X265_FREE(((analysis_inter_data*)analysis->interData)->mv[dir]);
-            }
-            if (analysis->sliceType == P_SLICE || m_param->bIntraInBFrames)
-            {
-                X265_FREE(((analysis_intra_data*)analysis->intraData)->modes);
-                X265_FREE(((analysis_intra_data*)analysis->intraData)->chromaModes);
-                X265_FREE(analysis->intraData);
+                X265_FREE(((analysis_inter_data*)analysis->interData)->mergeFlag);
+                X265_FREE(((analysis_inter_data*)analysis->interData)->partSize);
             }
-        }
-        else
-            X265_FREE(((analysis_inter_data*)analysis->interData)->ref);
+            if (m_param->analysisReuseLevel == 10)
+            {
+                X265_FREE(((analysis_inter_data*)analysis->interData)->interDir);
+                int numDir = analysis->sliceType == X265_TYPE_P ? 1 : 2;
+                for (int dir = 0; dir < numDir; dir++)
+                {
+                    X265_FREE(((analysis_inter_data*)analysis->interData)->mvpIdx[dir]);
+                    X265_FREE(((analysis_inter_data*)analysis->interData)->refIdx[dir]);
+                    X265_FREE(((analysis_inter_data*)analysis->interData)->mv[dir]);
+                }
+            }
+            else
+                X265_FREE(((analysis_inter_data*)analysis->interData)->ref);
 
-        X265_FREE(analysis->interData);
+            X265_FREE(analysis->interData);
+            analysis->interData = NULL;
+        }
     }
 }
 
@@ -2532,13 +2787,13 @@
 {
     analysis->analysisFramedata = NULL;
     analysis2PassFrameData *analysisFrameData = (analysis2PassFrameData*)analysis->analysisFramedata;
-    uint32_t widthInCU = (m_param->sourceWidth + g_maxCUSize - 1) >> g_maxLog2CUSize;
-    uint32_t heightInCU = (m_param->sourceHeight + g_maxCUSize - 1) >> g_maxLog2CUSize;
+    uint32_t widthInCU = (m_param->sourceWidth + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+    uint32_t heightInCU = (m_param->sourceHeight + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
 
     uint32_t numCUsInFrame = widthInCU * heightInCU;
     CHECKED_MALLOC_ZERO(analysisFrameData, analysis2PassFrameData, 1);
-    CHECKED_MALLOC_ZERO(analysisFrameData->depth, uint8_t, NUM_4x4_PARTITIONS * numCUsInFrame);
-    CHECKED_MALLOC_ZERO(analysisFrameData->distortion, sse_t, NUM_4x4_PARTITIONS * numCUsInFrame);
+    CHECKED_MALLOC_ZERO(analysisFrameData->depth, uint8_t, m_param->num4x4Partitions * numCUsInFrame);
+    CHECKED_MALLOC_ZERO(analysisFrameData->distortion, sse_t, m_param->num4x4Partitions * numCUsInFrame);
     if (m_param->rc.bStatRead)
     {
         CHECKED_MALLOC_ZERO(analysisFrameData->ctuDistortion, sse_t, numCUsInFrame);
@@ -2548,13 +2803,13 @@
     }
     if (!IS_X265_TYPE_I(sliceType))
     {
-        CHECKED_MALLOC_ZERO(analysisFrameData->m_mv[0], MV, NUM_4x4_PARTITIONS * numCUsInFrame);
-        CHECKED_MALLOC_ZERO(analysisFrameData->m_mv[1], MV, NUM_4x4_PARTITIONS * numCUsInFrame);
-        CHECKED_MALLOC_ZERO(analysisFrameData->mvpIdx[0], int, NUM_4x4_PARTITIONS * numCUsInFrame);
-        CHECKED_MALLOC_ZERO(analysisFrameData->mvpIdx[1], int, NUM_4x4_PARTITIONS * numCUsInFrame);
-        CHECKED_MALLOC_ZERO(analysisFrameData->ref[0], int32_t, NUM_4x4_PARTITIONS * numCUsInFrame);
-        CHECKED_MALLOC_ZERO(analysisFrameData->ref[1], int32_t, NUM_4x4_PARTITIONS * numCUsInFrame);
-        CHECKED_MALLOC(analysisFrameData->modes, uint8_t, NUM_4x4_PARTITIONS * numCUsInFrame);
+        CHECKED_MALLOC_ZERO(analysisFrameData->m_mv[0], MV, m_param->num4x4Partitions * numCUsInFrame);
+        CHECKED_MALLOC_ZERO(analysisFrameData->m_mv[1], MV, m_param->num4x4Partitions * numCUsInFrame);
+        CHECKED_MALLOC_ZERO(analysisFrameData->mvpIdx[0], int, m_param->num4x4Partitions * numCUsInFrame);
+        CHECKED_MALLOC_ZERO(analysisFrameData->mvpIdx[1], int, m_param->num4x4Partitions * numCUsInFrame);
+        CHECKED_MALLOC_ZERO(analysisFrameData->ref[0], int32_t, m_param->num4x4Partitions * numCUsInFrame);
+        CHECKED_MALLOC_ZERO(analysisFrameData->ref[1], int32_t, m_param->num4x4Partitions * numCUsInFrame);
+        CHECKED_MALLOC(analysisFrameData->modes, uint8_t, m_param->num4x4Partitions * numCUsInFrame);
     }
 
     analysis->analysisFramedata = analysisFrameData;
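The hunk above swaps the global CTU size for per-instance parameters but keeps the same round-up arithmetic: a frame whose width or height is not a multiple of the CTU size still needs one extra, partially covered CTU. A minimal standalone sketch of that computation follows; the helper names `ctusAcross` and `ctusInFrame` are hypothetical and do not exist in x265.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an x265 function): number of CTUs needed to cover
// `pixels` samples when one CTU spans (1 << log2CtuSize) samples.
// A partially covered CTU at the edge counts as a whole CTU.
static uint32_t ctusAcross(uint32_t pixels, uint32_t log2CtuSize)
{
    assert(log2CtuSize < 32);
    const uint32_t ctuSize = 1u << log2CtuSize;
    return (pixels + ctuSize - 1) >> log2CtuSize;
}

// Hypothetical helper: total CTUs in a frame, mirroring widthInCU * heightInCU.
static uint32_t ctusInFrame(uint32_t width, uint32_t height, uint32_t log2CtuSize)
{
    return ctusAcross(width, log2CtuSize) * ctusAcross(height, log2CtuSize);
}
```

With 64x64 CTUs, a 1920x1080 frame is 30 CTUs wide and 17 CTUs tall; the last row covers only 56 of its 64 lines.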
@@ -2593,11 +2848,15 @@
     }
 }
 
-void Encoder::readAnalysisFile(x265_analysis_data* analysis, int curPoc)
+void Encoder::readAnalysisFile(x265_analysis_data* analysis, int curPoc, const x265_picture* picIn)
 {
 
-#define X265_FREAD(val, size, readSize, fileOffset)\
-    if (fread(val, size, readSize, fileOffset) != readSize)\
+#define X265_FREAD(val, size, readSize, fileOffset, src)\
+    if (!m_param->bUseAnalysisFile)\
+    {\
+        memcpy(val, src, (size * readSize));\
+    }\
+    else if (fread(val, size, readSize, fileOffset) != readSize)\
     {\
         x265_log(NULL, X265_LOG_ERROR, "Error reading analysis data\n");\
         freeAnalysis(analysis);\
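The reworked `X265_FREAD` macro in this hunk takes an extra `src` argument: when `bUseAnalysisFile` is off, the value is copied from memory instead of being read from the analysis file. The same idea can be sketched as a plain function. The name `readOrCopy` and its `useFile` flag are illustrative stand-ins, not x265 API, and the logging and `freeAnalysis` cleanup done by the real macro are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical function-style equivalent of the two-source read:
// fills `dst` with `count` elements of `size` bytes, taken either from the
// in-memory buffer `src` (useFile == false) or from the stream `file`.
// Returns false only when the file read comes up short.
static bool readOrCopy(void* dst, size_t size, size_t count,
                       FILE* file, const void* src, bool useFile)
{
    assert(dst != nullptr);
    if (!useFile)
    {
        std::memcpy(dst, src, size * count);
        return true;
    }
    return std::fread(dst, size, count, file) == count;
}
```

Returning a status rather than bailing out inside a macro leaves the cleanup decision to the caller; that is a design choice of this sketch, not of the patch.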
@@ -2610,67 +2869,98 @@
     uint32_t depthBytes = 0;
     fseeko(m_analysisFile, totalConsumedBytes, SEEK_SET);
 
-    int poc; uint32_t frameRecordSize;
-    X265_FREAD(&frameRecordSize, sizeof(uint32_t), 1, m_analysisFile);
-    X265_FREAD(&depthBytes, sizeof(uint32_t), 1, m_analysisFile);
-    X265_FREAD(&poc, sizeof(int), 1, m_analysisFile);
+    const x265_analysis_data *picData = &(picIn->analysisData);
+    analysis_intra_data *intraPic = (analysis_intra_data *)picData->intraData;
+    analysis_inter_data *interPic = (analysis_inter_data *)picData->interData;
 
-    uint64_t currentOffset = totalConsumedBytes;
+    int poc; uint32_t frameRecordSize;
+    X265_FREAD(&frameRecordSize, sizeof(uint32_t), 1, m_analysisFile, &(picData->frameRecordSize));
+    X265_FREAD(&depthBytes, sizeof(uint32_t), 1, m_analysisFile, &(picData->depthBytes));
+    X265_FREAD(&poc, sizeof(int), 1, m_analysisFile, &(picData->poc));
 
-    /* Seeking to the right frame Record */
-    while (poc != curPoc && !feof(m_analysisFile))
+    if (m_param->bUseAnalysisFile)
     {
-        currentOffset += frameRecordSize;
-        fseeko(m_analysisFile, currentOffset, SEEK_SET);
-        X265_FREAD(&frameRecordSize, sizeof(uint32_t), 1, m_analysisFile);
-        X265_FREAD(&depthBytes, sizeof(uint32_t), 1, m_analysisFile);
-        X265_FREAD(&poc, sizeof(int), 1, m_analysisFile);
-    }
+        uint64_t currentOffset = totalConsumedBytes;
 
-    if (poc != curPoc || feof(m_analysisFile))
-    {
-        x265_log(NULL, X265_LOG_WARNING, "Error reading analysis data: Cannot find POC %d\n", curPoc);
-        freeAnalysis(analysis);
-        return;
+        /* Seeking to the right frame Record */
+        while (poc != curPoc && !feof(m_analysisFile))
+        {
+            currentOffset += frameRecordSize;
+            fseeko(m_analysisFile, currentOffset, SEEK_SET);
+            X265_FREAD(&frameRecordSize, sizeof(uint32_t), 1, m_analysisFile, &(picData->frameRecordSize));
+            X265_FREAD(&depthBytes, sizeof(uint32_t), 1, m_analysisFile, &(picData->depthBytes));
+            X265_FREAD(&poc, sizeof(int), 1, m_analysisFile, &(picData->poc));
+        }
+        if (poc != curPoc || feof(m_analysisFile))
+        {
+            x265_log(NULL, X265_LOG_WARNING, "Error reading analysis data: Cannot find POC %d\n", curPoc);
+            freeAnalysis(analysis);
+            return;
+        }
     }
 
     /* Now arrived at the right frame, read the record */
     analysis->poc = poc;
     analysis->frameRecordSize = frameRecordSize;
-    X265_FREAD(&analysis->sliceType, sizeof(int), 1, m_analysisFile);
-    X265_FREAD(&analysis->bScenecut, sizeof(int), 1, m_analysisFile);
-    X265_FREAD(&analysis->satdCost, sizeof(int64_t), 1, m_analysisFile);
-    X265_FREAD(&analysis->numCUsInFrame, sizeof(int), 1, m_analysisFile);
-    X265_FREAD(&analysis->numPartitions, sizeof(int), 1, m_analysisFile);
+    X265_FREAD(&analysis->sliceType, sizeof(int), 1, m_analysisFile, &(picData->sliceType));
+    X265_FREAD(&analysis->bScenecut, sizeof(int), 1, m_analysisFile, &(picData->bScenecut));
+    X265_FREAD(&analysis->satdCost, sizeof(int64_t), 1, m_analysisFile, &(picData->satdCost));
+    X265_FREAD(&analysis->numCUsInFrame, sizeof(int), 1, m_analysisFile, &(picData->numCUsInFrame));
+    X265_FREAD(&analysis->numPartitions, sizeof(int), 1, m_analysisFile, &(picData->numPartitions));
+    int scaledNumPartition = analysis->numPartitions;
+    int factor = 1 << m_param->scaleFactor;
+
+    if (m_param->scaleFactor)
+        analysis->numPartitions *= factor;
 
     /* Memory is allocated for inter and intra analysis data based on the slicetype */
     allocAnalysis(analysis);
 
     if (analysis->sliceType == X265_TYPE_IDR || analysis->sliceType == X265_TYPE_I)
     {
-        analysis->sliceType = X265_TYPE_I;
-        if (m_param->analysisRefineLevel < 2)
+        if (m_param->analysisReuseLevel < 2)
             return;
 
         uint8_t *tempBuf = NULL, *depthBuf = NULL, *modeBuf = NULL, *partSizes = NULL;
 
         tempBuf = X265_MALLOC(uint8_t, depthBytes * 3);
-        X265_FREAD(tempBuf, sizeof(uint8_t), depthBytes * 3, m_analysisFile);
-
         depthBuf = tempBuf;
         modeBuf = tempBuf + depthBytes;
         partSizes = tempBuf + 2 * depthBytes;
 
+        X265_FREAD(depthBuf, sizeof(uint8_t), depthBytes, m_analysisFile, intraPic->depth);
+        X265_FREAD(modeBuf, sizeof(uint8_t), depthBytes, m_analysisFile, intraPic->chromaModes);
+        X265_FREAD(partSizes, sizeof(uint8_t), depthBytes, m_analysisFile, intraPic->partSizes);
+
         size_t count = 0;
         for (uint32_t d = 0; d < depthBytes; d++)
         {
             int bytes = analysis->numPartitions >> (depthBuf[d] * 2);
+            if (m_param->scaleFactor)
+            {
+                if (depthBuf[d] == 0)
+                    depthBuf[d] = 1;
+                if (partSizes[d] == SIZE_NxN)
+                    partSizes[d] = SIZE_2Nx2N;
+            }
             memset(&((analysis_intra_data *)analysis->intraData)->depth[count], depthBuf[d], bytes);
             memset(&((analysis_intra_data *)analysis->intraData)->chromaModes[count], modeBuf[d], bytes);
             memset(&((analysis_intra_data *)analysis->intraData)->partSizes[count], partSizes[d], bytes);
             count += bytes;
         }
-        X265_FREAD(((analysis_intra_data *)analysis->intraData)->modes, sizeof(uint8_t), analysis->numCUsInFrame * analysis->numPartitions, m_analysisFile);
+
+        if (!m_param->scaleFactor)
+        {
+            X265_FREAD(((analysis_intra_data *)analysis->intraData)->modes, sizeof(uint8_t), analysis->numCUsInFrame * analysis->numPartitions, m_analysisFile, intraPic->modes);
+        }
+        else
+        {
+            uint8_t *tempLumaBuf = X265_MALLOC(uint8_t, analysis->numCUsInFrame * scaledNumPartition);
+            X265_FREAD(tempLumaBuf, sizeof(uint8_t), analysis->numCUsInFrame * scaledNumPartition, m_analysisFile, intraPic->modes);
+            for (uint32_t ctu32Idx = 0, cnt = 0; ctu32Idx < analysis->numCUsInFrame * scaledNumPartition; ctu32Idx++, cnt += factor)
+                memset(&((analysis_intra_data *)analysis->intraData)->modes[cnt], tempLumaBuf[ctu32Idx], factor);
+            X265_FREE(tempLumaBuf);
+        }
         X265_FREE(tempBuf);
         consumedBytes += frameRecordSize;
     }
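When `scaleFactor` is set, the hunk above reads the stored luma modes into a temporary buffer and writes each entry `factor` times (`factor = 1 << scaleFactor`) into the larger destination array. A minimal sketch of that replication step, using `std::vector` instead of the raw buffers in the patch; `replicateEach` is a hypothetical name, not an x265 function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: repeats every source entry `factor` times in order,
// the vector equivalent of the memset loop that advances `cnt` by `factor`.
static std::vector<uint8_t> replicateEach(const std::vector<uint8_t>& src, size_t factor)
{
    assert(factor > 0);
    std::vector<uint8_t> dst;
    dst.reserve(src.size() * factor);
    for (uint8_t value : src)
        dst.insert(dst.end(), factor, value);
    return dst;
}
```

For example, with `factor` equal to 4 the two-entry input `{1, 2}` becomes eight entries: four 1s followed by four 2s.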
@@ -2679,8 +2969,8 @@
     {
         uint32_t numDir = analysis->sliceType == X265_TYPE_P ? 1 : 2;
         uint32_t numPlanes = m_param->internalCsp == X265_CSP_I400 ? 1 : 3;
-        X265_FREAD((WeightParam*)analysis->wt, sizeof(WeightParam), numPlanes * numDir, m_analysisFile);
-        if (m_param->analysisRefineLevel < 2)
+        X265_FREAD((WeightParam*)analysis->wt, sizeof(WeightParam), numPlanes * numDir, m_analysisFile, (picIn->analysisData.wt));
+        if (m_param->analysisReuseLevel < 2)
             return;
 
         uint8_t *tempBuf = NULL, *depthBuf = NULL, *modeBuf = NULL, *partSize = NULL, *mergeFlag = NULL;
@@ -2688,9 +2978,9 @@
         MV* mv[2];
         int8_t* refIdx[2];
 
-        int numBuf = m_param->analysisRefineLevel > 4 ? 4 : 2;
+        int numBuf = m_param->analysisReuseLevel > 4 ? 4 : 2;
         bool bIntraInInter = false;
-        if (m_param->analysisRefineLevel == 10)
+        if (m_param->analysisReuseLevel == 10)
         {
             numBuf++;
             bIntraInInter = (analysis->sliceType == X265_TYPE_P || m_param->bIntraInBFrames);
@@ -2698,26 +2988,36 @@
         }
 
         tempBuf = X265_MALLOC(uint8_t, depthBytes * numBuf);
-        X265_FREAD(tempBuf, sizeof(uint8_t), depthBytes * numBuf, m_analysisFile);
-
         depthBuf = tempBuf;
         modeBuf = tempBuf + depthBytes;
-        if (m_param->analysisRefineLevel > 4)
+
+        X265_FREAD(depthBuf, sizeof(uint8_t), depthBytes, m_analysisFile, interPic->depth);
+        X265_FREAD(modeBuf, sizeof(uint8_t), depthBytes, m_analysisFile, interPic->modes);
+
+        if (m_param->analysisReuseLevel > 4)
         {
             partSize = modeBuf + depthBytes;
             mergeFlag = partSize + depthBytes;
-            if (m_param->analysisRefineLevel == 10)
+            X265_FREAD(partSize, sizeof(uint8_t), depthBytes, m_analysisFile, interPic->partSize);
+            X265_FREAD(mergeFlag, sizeof(uint8_t), depthBytes, m_analysisFile, interPic->mergeFlag);
+
+            if (m_param->analysisReuseLevel == 10)
             {
                 interDir = mergeFlag + depthBytes;
-                if (bIntraInInter) chromaDir = interDir + depthBytes;
+                X265_FREAD(interDir, sizeof(uint8_t), depthBytes, m_analysisFile, interPic->interDir);
+                if (bIntraInInter)
+                {
+                    chromaDir = interDir + depthBytes;
+                    X265_FREAD(chromaDir, sizeof(uint8_t), depthBytes, m_analysisFile, intraPic->chromaModes);
+                }
                 for (uint32_t i = 0; i < numDir; i++)
                 {
-                    mvpIdx[i] = X265_MALLOC(uint8_t, depthBytes * 3);
-                    X265_FREAD(mvpIdx[i], sizeof(uint8_t), depthBytes, m_analysisFile);
+                    mvpIdx[i] = X265_MALLOC(uint8_t, depthBytes);
                     refIdx[i] = X265_MALLOC(int8_t, depthBytes);
-                    X265_FREAD(refIdx[i], sizeof(int8_t), depthBytes, m_analysisFile);
                     mv[i] = X265_MALLOC(MV, depthBytes);
-                    X265_FREAD(mv[i], sizeof(MV), depthBytes, m_analysisFile);
+                    X265_FREAD(mvpIdx[i], sizeof(uint8_t), depthBytes, m_analysisFile, interPic->mvpIdx[i]);
+                    X265_FREAD(refIdx[i], sizeof(int8_t), depthBytes, m_analysisFile, interPic->refIdx[i]);
+                    X265_FREAD(mv[i], sizeof(MV), depthBytes, m_analysisFile, interPic->mv[i]);
                 }
             }
         }
@@ -2726,28 +3026,37 @@
         for (uint32_t d = 0; d < depthBytes; d++)
         {
             int bytes = analysis->numPartitions >> (depthBuf[d] * 2);
+            if (m_param->scaleFactor && modeBuf[d] == MODE_INTRA && depthBuf[d] == 0)
+                 depthBuf[d] = 1;
             memset(&((analysis_inter_data *)analysis->interData)->depth[count], depthBuf[d], bytes);
             memset(&((analysis_inter_data *)analysis->interData)->modes[count], modeBuf[d], bytes);
-            if (m_param->analysisRefineLevel > 4)
+            if (m_param->analysisReuseLevel > 4)
             {
+                if (m_param->scaleFactor && modeBuf[d] == MODE_INTRA && partSize[d] == SIZE_NxN)
+                     partSize[d] = SIZE_2Nx2N;
                 memset(&((analysis_inter_data *)analysis->interData)->partSize[count], partSize[d], bytes);
-                int numPU = nbPartsTable[(int)partSize[d]];
+                int numPU = (modeBuf[d] == MODE_INTRA) ? 1 : nbPartsTable[(int)partSize[d]];
                 for (int pu = 0; pu < numPU; pu++)
                 {
                     if (pu) d++;
                     ((analysis_inter_data *)analysis->interData)->mergeFlag[count + pu] = mergeFlag[d];
-                    if (m_param->analysisRefineLevel == 10)
+                    if (m_param->analysisReuseLevel == 10)
                     {
                         ((analysis_inter_data *)analysis->interData)->interDir[count + pu] = interDir[d];
                         for (uint32_t i = 0; i < numDir; i++)
                         {
                             ((analysis_inter_data *)analysis->interData)->mvpIdx[i][count + pu] = mvpIdx[i][d];
                             ((analysis_inter_data *)analysis->interData)->refIdx[i][count + pu] = refIdx[i][d];
+                            if (m_param->scaleFactor)
+                            {
+                                mv[i][d].x *= (int16_t)m_param->scaleFactor;
+                                mv[i][d].y *= (int16_t)m_param->scaleFactor;
+                            }
                             memcpy(&((analysis_inter_data *)analysis->interData)->mv[i][count + pu], &mv[i][d], sizeof(MV));
                         }
                     }
                 }
-                if (m_param->analysisRefineLevel == 10 && bIntraInInter)
+                if (m_param->analysisReuseLevel == 10 && bIntraInInter)
                     memset(&((analysis_intra_data *)analysis->intraData)->chromaModes[count], chromaDir[d], bytes);
             }
             count += bytes;
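The loop in this hunk expands compact per-CU records into per-partition arrays: a CU at depth `d` covers `numPartitions >> (d * 2)` partitions of its CTU, and each stored value is written across that run with `memset`. A minimal sketch of the expansion for a single field follows. The function name `expandPerCU` is hypothetical, and the per-PU handling for `mergeFlag`, `interDir` and motion data in the patch is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: turns one value per coded CU into one value per
// partition. depths[i] is the depth of CU i; a CU at depth d occupies
// numPartitions >> (2 * d) consecutive partition slots.
static std::vector<uint8_t> expandPerCU(const std::vector<uint8_t>& depths,
                                        const std::vector<uint8_t>& values,
                                        uint32_t numPartitions)
{
    assert(depths.size() == values.size());
    std::vector<uint8_t> out;
    for (size_t i = 0; i < depths.size(); i++)
    {
        assert(depths[i] < 16); // keep the shift amount well below 32 bits
        const uint32_t run = numPartitions >> (depths[i] * 2);
        out.insert(out.end(), run, values[i]);
    }
    return out;
}
```

With 16 partitions per CTU, four depth-1 CUs each fill a run of four slots, while a single depth-0 CU fills all 16.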
@@ -2755,7 +3064,7 @@
 
         X265_FREE(tempBuf);
 
-        if (m_param->analysisRefineLevel == 10)
+        if (m_param->analysisReuseLevel == 10)
         {
             for (uint32_t i = 0; i < numDir; i++)
             {
@@ -2764,10 +3073,23 @@
                 X265_FREE(mv[i]);
             }
             if (bIntraInInter)
-                X265_FREAD(((analysis_intra_data *)analysis->intraData)->modes, sizeof(uint8_t), analysis->numCUsInFrame * analysis->numPartitions, m_analysisFile);
+            {
+                if (!m_param->scaleFactor)
+                {
+                    X265_FREAD(((analysis_intra_data *)analysis->intraData)->modes, sizeof(uint8_t), analysis->numCUsInFrame * analysis->numPartitions, m_analysisFile, intraPic->modes);
+                }
+                else
+                {
+                    uint8_t *tempLumaBuf = X265_MALLOC(uint8_t, analysis->numCUsInFrame * scaledNumPartition);
+                    X265_FREAD(tempLumaBuf, sizeof(uint8_t), analysis->numCUsInFrame * scaledNumPartition, m_analysisFile, intraPic->modes);
+                    for (uint32_t ctu32Idx = 0, cnt = 0; ctu32Idx < analysis->numCUsInFrame * scaledNumPartition; ctu32Idx++, cnt += factor)
+                        memset(&((analysis_intra_data *)analysis->intraData)->modes[cnt], tempLumaBuf[ctu32Idx], factor);
+                    X265_FREE(tempLumaBuf);
+                }
+            }
         }
         else
-            X265_FREAD(((analysis_inter_data *)analysis->interData)->ref, sizeof(int32_t), analysis->numCUsInFrame * X265_MAX_PRED_MODE_PER_CTU * numDir, m_analysisFile);
+            X265_FREAD(((analysis_inter_data *)analysis->interData)->ref, sizeof(int32_t), analysis->numCUsInFrame * X265_MAX_PRED_MODE_PER_CTU * numDir, m_analysisFile, interPic->ref);
 
         consumedBytes += frameRecordSize;
         if (numDir == 1)
@@ -2789,8 +3111,8 @@
 }\
 
     uint32_t depthBytes = 0;
-    uint32_t widthInCU = (m_param->sourceWidth + g_maxCUSize - 1) >> g_maxLog2CUSize;
-    uint32_t heightInCU = (m_param->sourceHeight + g_maxCUSize - 1) >> g_maxLog2CUSize;
+    uint32_t widthInCU = (m_param->sourceWidth + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+    uint32_t heightInCU = (m_param->sourceHeight + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
     uint32_t numCUsInFrame = widthInCU * heightInCU;
 
     int poc; uint32_t frameRecordSize;
@@ -2820,12 +3142,12 @@
     double sum = 0, sqrSum = 0;
     for (uint32_t d = 0; d < depthBytes; d++)
     {
-        int bytes = NUM_4x4_PARTITIONS >> (depthBuf[d] * 2);
+        int bytes = m_param->num4x4Partitions >> (depthBuf[d] * 2);
         memset(&analysisFrameData->depth[count], depthBuf[d], bytes);
         analysisFrameData->distortion[count] = distortionBuf[d];
         analysisFrameData->ctuDistortion[ctuCount] += analysisFrameData->distortion[count];
         count += bytes;
-        if ((count % (size_t)NUM_4x4_PARTITIONS) == 0)
+        if ((count % (unsigned)m_param->num4x4Partitions) == 0)
         {
             analysisFrameData->scaledDistortion[ctuCount] = X265_LOG2(X265_MAX(analysisFrameData->ctuDistortion[ctuCount], 1));
             sum += analysisFrameData->scaledDistortion[ctuCount];
@@ -2873,7 +3195,7 @@
         count = 0;
         for (uint32_t d = 0; d < depthBytes; d++)
         {
-            size_t bytes = NUM_4x4_PARTITIONS >> (depthBuf[d] * 2);
+            size_t bytes = m_param->num4x4Partitions >> (depthBuf[d] * 2);
             for (int i = 0; i < numDir; i++)
             {
                 for (size_t j = count, k = 0; k < bytes; j++, k++)
@@ -2927,7 +3249,7 @@
         analysis->frameRecordSize += sizeof(WeightParam) * numPlanes * numDir;
     }
 
-    if (m_param->analysisRefineLevel > 1)
+    if (m_param->analysisReuseLevel > 1)
     {
         if (analysis->sliceType == X265_TYPE_IDR || analysis->sliceType == X265_TYPE_I)
         {
@@ -2975,25 +3297,25 @@
                     interDataCTU->depth[depthBytes] = depth;
 
                     predMode = ctu->m_predMode[absPartIdx];
-                    if (m_param->analysisRefineLevel != 10 && ctu->m_refIdx[1][absPartIdx] != -1)
+                    if (m_param->analysisReuseLevel != 10 && ctu->m_refIdx[1][absPartIdx] != -1)
                         predMode = 4; // used as indiacator if the block is coded as bidir
 
                     interDataCTU->modes[depthBytes] = predMode;
 
-                    if (m_param->analysisRefineLevel > 4)
+                    if (m_param->analysisReuseLevel > 4)
                     {
                         partSize = ctu->m_partSize[absPartIdx];
                         interDataCTU->partSize[depthBytes] = partSize;
 
                         /* Store per PU data */
-                        uint32_t numPU = nbPartsTable[(int)partSize];
+                        uint32_t numPU = (predMode == MODE_INTRA) ? 1 : nbPartsTable[(int)partSize];
                         for (uint32_t puIdx = 0; puIdx < numPU; puIdx++)
                         {
                             uint32_t puabsPartIdx = ctu->getPUOffset(puIdx, absPartIdx) + absPartIdx;
                             if (puIdx) depthBytes++;
                             interDataCTU->mergeFlag[depthBytes] = ctu->m_mergeFlag[puabsPartIdx];
 
-                            if (m_param->analysisRefineLevel == 10)
+                            if (m_param->analysisReuseLevel == 10)
                             {
                                 interDataCTU->interDir[depthBytes] = ctu->m_interDir[puabsPartIdx];
                                 for (uint32_t dir = 0; dir < numDir; dir++)
@@ -3004,12 +3326,12 @@
                                 }
                             }
                         }
-                        if (m_param->analysisRefineLevel == 10 && bIntraInInter)
+                        if (m_param->analysisReuseLevel == 10 && bIntraInInter)
                             intraDataCTU->chromaModes[depthBytes] = ctu->m_chromaIntraDir[absPartIdx];
                     }
                     absPartIdx += ctu->m_numPartitions >> (depth * 2);
                 }
-                if (m_param->analysisRefineLevel == 10 && bIntraInInter)
+                if (m_param->analysisReuseLevel == 10 && bIntraInInter)
                     memcpy(&intraDataCTU->modes[ctu->m_cuAddr * ctu->m_numPartitions], ctu->m_lumaIntraDir, sizeof(uint8_t)* ctu->m_numPartitions);
             }
         }
@@ -3020,10 +3342,10 @@
         {
             /* Add sizeof depth, modes, partSize, mergeFlag */
             analysis->frameRecordSize += depthBytes * 2;
-            if (m_param->analysisRefineLevel > 4)
+            if (m_param->analysisReuseLevel > 4)
                 analysis->frameRecordSize += (depthBytes * 2);
 
-            if (m_param->analysisRefineLevel == 10)
+            if (m_param->analysisReuseLevel == 10)
             {
                 /* Add Size of interDir, mvpIdx, refIdx, mv, luma and chroma modes */
                 analysis->frameRecordSize += depthBytes;
@@ -3036,7 +3358,12 @@
             else
                 analysis->frameRecordSize += sizeof(int32_t)* analysis->numCUsInFrame * X265_MAX_PRED_MODE_PER_CTU * numDir;
         }
+        analysis->depthBytes = depthBytes;
     }
+
+    if (!m_param->bUseAnalysisFile)
+        return;
+
     X265_FWRITE(&analysis->frameRecordSize, sizeof(uint32_t), 1, m_analysisFile);
     X265_FWRITE(&depthBytes, sizeof(uint32_t), 1, m_analysisFile);
     X265_FWRITE(&analysis->poc, sizeof(int), 1, m_analysisFile);
@@ -3048,7 +3375,7 @@
     if (analysis->sliceType > X265_TYPE_I)
         X265_FWRITE((WeightParam*)analysis->wt, sizeof(WeightParam), numPlanes * numDir, m_analysisFile);
 
-    if (m_param->analysisRefineLevel < 2)
+    if (m_param->analysisReuseLevel < 2)
         return;
 
     if (analysis->sliceType == X265_TYPE_IDR || analysis->sliceType == X265_TYPE_I)
@@ -3062,11 +3389,11 @@
     {
         X265_FWRITE(((analysis_inter_data*)analysis->interData)->depth, sizeof(uint8_t), depthBytes, m_analysisFile);
         X265_FWRITE(((analysis_inter_data*)analysis->interData)->modes, sizeof(uint8_t), depthBytes, m_analysisFile);
-        if (m_param->analysisRefineLevel > 4)
+        if (m_param->analysisReuseLevel > 4)
         {
             X265_FWRITE(((analysis_inter_data*)analysis->interData)->partSize, sizeof(uint8_t), depthBytes, m_analysisFile);
             X265_FWRITE(((analysis_inter_data*)analysis->interData)->mergeFlag, sizeof(uint8_t), depthBytes, m_analysisFile);
-            if (m_param->analysisRefineLevel == 10)
+            if (m_param->analysisReuseLevel == 10)
             {
                 X265_FWRITE(((analysis_inter_data*)analysis->interData)->interDir, sizeof(uint8_t), depthBytes, m_analysisFile);
                 if (bIntraInInter) X265_FWRITE(((analysis_intra_data*)analysis->intraData)->chromaModes, sizeof(uint8_t), depthBytes, m_analysisFile);
@@ -3080,7 +3407,7 @@
                     X265_FWRITE(((analysis_intra_data*)analysis->intraData)->modes, sizeof(uint8_t), analysis->numCUsInFrame * analysis->numPartitions, m_analysisFile);
             }
         }
-        if (m_param->analysisRefineLevel != 10)
+        if (m_param->analysisReuseLevel != 10)
             X265_FWRITE(((analysis_inter_data*)analysis->interData)->ref, sizeof(int32_t), analysis->numCUsInFrame * X265_MAX_PRED_MODE_PER_CTU * numDir, m_analysisFile);
 
     }
@@ -3099,8 +3426,8 @@
 }\
 
     uint32_t depthBytes = 0;
-    uint32_t widthInCU = (m_param->sourceWidth + g_maxCUSize - 1) >> g_maxLog2CUSize;
-    uint32_t heightInCU = (m_param->sourceHeight + g_maxCUSize - 1) >> g_maxLog2CUSize;
+    uint32_t widthInCU = (m_param->sourceWidth + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+    uint32_t heightInCU = (m_param->sourceHeight + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
     uint32_t numCUsInFrame = widthInCU * heightInCU;
     analysis2PassFrameData* analysisFrameData = (analysis2PassFrameData*)analysis2Pass->analysisFramedata;
 
x265_2.4.tar.gz/source/encoder/encoder.h -> x265_2.5.tar.gz/source/encoder/encoder.h Changed
@@ -31,11 +31,9 @@
 #include "x265.h"
 #include "nal.h"
 #include "framedata.h"
-
-#ifdef ENABLE_DYNAMIC_HDR10
-    #include "dynamicHDR10\hdr10plus.h"
+#ifdef ENABLE_HDR10_PLUS
+    #include "dynamicHDR10/hdr10plus.h"
 #endif
-
 struct x265_encoder {};
 namespace X265_NS {
 // private namespace
@@ -178,8 +176,10 @@
 
     int                     m_bToneMap; // Enables tone-mapping
 
-#ifdef ENABLE_DYNAMIC_HDR10
+#ifdef ENABLE_HDR10_PLUS
     const hdr10plus_api     *m_hdr10plus_api;
+    uint8_t                 **cim;
+    int                     numCimInfo;
 #endif
 
     x265_sei_payload        m_prevTonemapPayload;
@@ -187,7 +187,7 @@
     Encoder();
     ~Encoder()
     {
-#ifdef ENABLE_DYNAMIC_HDR10
+#ifdef ENABLE_HDR10_PLUS
         if (m_prevTonemapPayload.payload != NULL)
             X265_FREE(m_prevTonemapPayload.payload);
 #endif
@@ -201,6 +201,8 @@
 
     int reconfigureParam(x265_param* encParam, x265_param* param);
 
+    void copyCtuInfo(x265_ctu_info_t** frameCtuInfo, int poc);
+
     void getStreamHeaders(NALList& list, Entropy& sbacCoder, Bitstream& bs);
 
     void fetchStats(x265_stats* stats, size_t statsSizeBytes);
@@ -223,7 +225,7 @@
 
     void freeAnalysis2Pass(x265_analysis_2Pass* analysis, int sliceType);
 
-    void readAnalysisFile(x265_analysis_data* analysis, int poc);
+    void readAnalysisFile(x265_analysis_data* analysis, int poc, const x265_picture* picIn);
 
     void writeAnalysisFile(x265_analysis_data* pic, FrameData &curEncData);
     void readAnalysis2PassFile(x265_analysis_2Pass* analysis2Pass, int poc, int sliceType);
x265_2.4.tar.gz/source/encoder/entropy.cpp -> x265_2.5.tar.gz/source/encoder/entropy.cpp Changed
@@ -700,7 +700,7 @@
     // TODO: Enable when pps_loop_filter_across_slices_enabled_flag==1
     //       We didn't support filter across slice board, so disable it now
 
-    if (g_maxSlices <= 1)
+    if (encData.m_param->maxSlices <= 1)
     {
         bool isSAOEnabled = slice.m_sps->bUseSAO ? saoParam->bSaoFlag[0] || saoParam->bSaoFlag[1] : false;
         bool isDBFEnabled = !slice.m_pps->bPicDisableDeblockingFilter;
@@ -783,7 +783,7 @@
     if (cuSplitFlag) 
         codeSplitFlag(ctu, absPartIdx, depth);
 
-    if (depth < ctu.m_cuDepth[absPartIdx] && depth < g_maxCUDepth)
+    if (depth < ctu.m_cuDepth[absPartIdx] && depth < ctu.m_encData->m_param->maxCUDepth)
     {
         uint32_t qNumParts = cuGeom.numPartitions >> 2;
         if (depth == slice->m_pps->maxCuDQPDepth && slice->m_pps->bUseDQP)
@@ -863,7 +863,7 @@
     case SIZE_nRx2N:
         bits += bitsCodeBin(0, m_contextState[OFF_PART_SIZE_CTX + 0]);
         bits += bitsCodeBin(0, m_contextState[OFF_PART_SIZE_CTX + 1]);
-        if (depth == g_maxCUDepth && !(cu.m_log2CUSize[absPartIdx] == 3))
+        if (depth == cu.m_encData->m_param->maxCUDepth && !(cu.m_log2CUSize[absPartIdx] == 3))
             bits += bitsCodeBin(1, m_contextState[OFF_PART_SIZE_CTX + 2]);
         if (cu.m_slice->m_sps->maxAMPDepth > depth)
         {
@@ -888,7 +888,7 @@
     uint32_t cuAddr = ctu.getSCUAddr() + absPartIdx;
     X265_CHECK(realEndAddress == slice->realEndAddress(slice->m_endCUAddr), "real end address expected\n");
 
-    uint32_t granularityMask = g_maxCUSize - 1;
+    uint32_t granularityMask = ctu.m_encData->m_param->maxCUSize - 1;
     uint32_t cuSize = 1 << ctu.m_log2CUSize[absPartIdx];
35
     uint32_t rpelx = ctu.m_cuPelX + g_zscanToPelX[absPartIdx] + cuSize;
36
     uint32_t bpely = ctu.m_cuPelY + g_zscanToPelY[absPartIdx] + cuSize;
@@ -902,7 +902,7 @@
     {
         // Encode slice finish
         uint32_t bTerminateSlice = ctu.m_bLastCuInSlice;
-        if (cuAddr + (NUM_4x4_PARTITIONS >> (depth << 1)) == realEndAddress)
+        if (cuAddr + (slice->m_param->num4x4Partitions >> (depth << 1)) == realEndAddress)
             bTerminateSlice = 1;
 
         // The 1-terminating bit is added to all streams, so don't add it here when it's 1.
@@ -1512,7 +1512,7 @@
 
     if (cu.isIntra(absPartIdx))
     {
-        if (depth == g_maxCUDepth)
+        if (depth == cu.m_encData->m_param->maxCUDepth)
             encodeBin(partSize == SIZE_2Nx2N ? 1 : 0, m_contextState[OFF_PART_SIZE_CTX]);
         return;
     }
@@ -1541,7 +1541,7 @@
     case SIZE_nRx2N:
         encodeBin(0, m_contextState[OFF_PART_SIZE_CTX + 0]);
         encodeBin(0, m_contextState[OFF_PART_SIZE_CTX + 1]);
-        if (depth == g_maxCUDepth && !(cu.m_log2CUSize[absPartIdx] == 3))
+        if (depth == cu.m_encData->m_param->maxCUDepth && !(cu.m_log2CUSize[absPartIdx] == 3))
             encodeBin(1, m_contextState[OFF_PART_SIZE_CTX + 2]);
         if (cu.m_slice->m_sps->maxAMPDepth > depth)
         {
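Reviewer note on the entropy.cpp hunks above: every change swaps a process-wide global (g_maxSlices, g_maxCUDepth, g_maxCUSize, NUM_4x4_PARTITIONS) for a field read through the encoder's own param pointer. The sketch below illustrates why that lets two encoder instances use different CTU sizes in one process. ParamSketch, makeParam, granularityMask and partitionsAtDepth are hypothetical names invented for this note, and the derivation of maxCUDepth and num4x4Partitions from the CU sizes is an assumption, not something this diff shows.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the per-instance fields the diff reads via m_param.
struct ParamSketch
{
    uint32_t maxCUSize;        // CTU width in pixels
    uint32_t maxLog2CUSize;    // log2(maxCUSize)
    uint32_t maxCUDepth;       // quadtree splits from CTU down to the smallest CU (assumed rule)
    uint32_t num4x4Partitions; // number of 4x4 units in one CTU (assumed rule)
};

// log2 of a power of two.
inline uint32_t log2Exact(uint32_t v)
{
    assert(v != 0 && (v & (v - 1)) == 0);
    uint32_t n = 0;
    while (v > 1)
    {
        v >>= 1;
        n++;
    }
    return n;
}

// Derive the dependent fields once per instance instead of once per process.
inline ParamSketch makeParam(uint32_t maxCUSize, uint32_t minCUSize)
{
    assert(maxCUSize >= minCUSize && minCUSize >= 8);
    ParamSketch p;
    p.maxCUSize = maxCUSize;
    p.maxLog2CUSize = log2Exact(maxCUSize);
    p.maxCUDepth = p.maxLog2CUSize - log2Exact(minCUSize);
    const uint32_t unitsPerSide = maxCUSize / 4;
    p.num4x4Partitions = unitsPerSide * unitsPerSide;
    return p;
}

// Mirrors "granularityMask = ...->maxCUSize - 1" from the @@ -888 hunk.
inline uint32_t granularityMask(const ParamSketch& p)
{
    return p.maxCUSize - 1;
}

// Mirrors "num4x4Partitions >> (depth << 1)" from the @@ -902 hunk.
inline uint32_t partitionsAtDepth(const ParamSketch& p, uint32_t depth)
{
    return p.num4x4Partitions >> (depth << 1);
}
```

With this layout, makeParam(64, 8) and makeParam(32, 8) can be held by two encoders at the same time; each call site computes its mask and partition counts from the instance it was handed, which is the behaviour the changelog entry about instance-specific variables describes.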
x265_2.4.tar.gz/source/encoder/frameencoder.cpp -> x265_2.5.tar.gz/source/encoder/frameencoder.cpp Changed
@@ -124,7 +124,7 @@
     range += !!(m_param->searchMethod < 2);  /* diamond/hex range check lag */
     range += NTAPS_LUMA / 2;                 /* subpel filter half-length */
     range += 2 + (MotionEstimate::hpelIterationCount(m_param->subpelRefine) + 1) / 2; /* subpel refine steps */
-    m_refLagRows = /*(m_param->maxSlices > 1 ? 1 : 0) +*/ 1 + ((range + g_maxCUSize - 1) / g_maxCUSize);
+    m_refLagRows = /*(m_param->maxSlices > 1 ? 1 : 0) +*/ 1 + ((range + m_param->maxCUSize - 1) / m_param->maxCUSize);
 
     // NOTE: 2 times of numRows because both Encoder and Filter in same queue
     if (!WaveFront::init(m_numRows * 2))
@@ -295,6 +295,11 @@
 
     while (m_threadActive)
     {
+        if (m_param->bCTUInfo)
+        {
+            while (!m_frame->m_ctuInfo)
+                m_frame->m_copied.wait();
+        }
         compressFrame();
         m_done.trigger(); /* FrameEncoder::getEncodedPicture() blocks for this event */
         m_enable.wait();
@@ -383,7 +388,7 @@
     bool bUseWeightB = slice->m_sliceType == B_SLICE && slice->m_pps->bUseWeightedBiPred;
 
     WeightParam* reuseWP = NULL;
-    if (m_param->analysisMode && (bUseWeightP || bUseWeightB))
+    if (m_param->analysisReuseMode && (bUseWeightP || bUseWeightB))
         reuseWP = (WeightParam*)m_frame->m_analysisData.wt;
 
     if (bUseWeightP || bUseWeightB)
@@ -392,7 +397,7 @@
         m_cuStats.countWeightAnalyze++;
         ScopedElapsedTime time(m_cuStats.weightAnalyzeTime);
 #endif
-        if (m_param->analysisMode == X265_ANALYSIS_LOAD)
+        if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD)
         {
             for (int list = 0; list < slice->isInterB() + 1; list++)
             {
@@ -431,7 +436,7 @@
             slice->m_refReconPicList[l][ref] = slice->m_refFrameList[l][ref]->m_reconPic;
             m_mref[l][ref].init(slice->m_refReconPicList[l][ref], w, *m_param);
         }
-        if (m_param->analysisMode == X265_ANALYSIS_SAVE && (bUseWeightP || bUseWeightB))
+        if (m_param->analysisReuseMode == X265_ANALYSIS_SAVE && (bUseWeightP || bUseWeightB))
         {
             for (int i = 0; i < (m_param->internalCsp != X265_CSP_I400 ? 3 : 1); i++)
                 *(reuseWP++) = slice->m_weightPredTable[l][0][i];
@@ -664,7 +669,7 @@
             if (writeSei)
             {
                 SEICreativeIntentMeta sei;
-                sei.cim = payload->payload;
+                sei.m_payload = payload->payload;
                 m_bs.resetBits();
                 sei.setSize(payload->payloadSize);
                 sei.write(m_bs, *slice->m_sps);
@@ -832,7 +837,7 @@
         }
         else if (m_param->decodedPictureHashSEI == 3)
         {
-            uint32_t cuHeight = g_maxCUSize;
+            uint32_t cuHeight = m_param->maxCUSize;
 
             m_checksum[0] = 0;
 
@@ -872,43 +877,52 @@
         m_frame->m_encData->m_frameStats.percent8x8Inter = (double)totalP / totalCuCount;
         m_frame->m_encData->m_frameStats.percent8x8Skip  = (double)totalSkip / totalCuCount;
     }
-    for (uint32_t i = 0; i < m_numRows; i++)
+
+    if (m_param->csvLogLevel >= 1)
     {
-        m_frame->m_encData->m_frameStats.cntIntraNxN      += m_rows[i].rowStats.cntIntraNxN;
-        m_frame->m_encData->m_frameStats.totalCu          += m_rows[i].rowStats.totalCu;
-        m_frame->m_encData->m_frameStats.totalCtu         += m_rows[i].rowStats.totalCtu;
-        m_frame->m_encData->m_frameStats.lumaDistortion   += m_rows[i].rowStats.lumaDistortion;
-        m_frame->m_encData->m_frameStats.chromaDistortion += m_rows[i].rowStats.chromaDistortion;
-        m_frame->m_encData->m_frameStats.psyEnergy        += m_rows[i].rowStats.psyEnergy;
-        m_frame->m_encData->m_frameStats.ssimEnergy       += m_rows[i].rowStats.ssimEnergy;
-        m_frame->m_encData->m_frameStats.resEnergy        += m_rows[i].rowStats.resEnergy;
-        for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+        for (uint32_t i = 0; i < m_numRows; i++)
         {
-            m_frame->m_encData->m_frameStats.cntSkipCu[depth] += m_rows[i].rowStats.cntSkipCu[depth];
-            m_frame->m_encData->m_frameStats.cntMergeCu[depth] += m_rows[i].rowStats.cntMergeCu[depth];
-            for (int m = 0; m < INTER_MODES; m++)
-                m_frame->m_encData->m_frameStats.cuInterDistribution[depth][m] += m_rows[i].rowStats.cuInterDistribution[depth][m];
+            m_frame->m_encData->m_frameStats.cntIntraNxN += m_rows[i].rowStats.cntIntraNxN;
+            m_frame->m_encData->m_frameStats.totalCu += m_rows[i].rowStats.totalCu;
+            m_frame->m_encData->m_frameStats.totalCtu += m_rows[i].rowStats.totalCtu;
+            m_frame->m_encData->m_frameStats.lumaDistortion += m_rows[i].rowStats.lumaDistortion;
+            m_frame->m_encData->m_frameStats.chromaDistortion += m_rows[i].rowStats.chromaDistortion;
+            m_frame->m_encData->m_frameStats.psyEnergy += m_rows[i].rowStats.psyEnergy;
+            m_frame->m_encData->m_frameStats.ssimEnergy += m_rows[i].rowStats.ssimEnergy;
+            m_frame->m_encData->m_frameStats.resEnergy += m_rows[i].rowStats.resEnergy;
+            for (uint32_t depth = 0; depth <= m_param->maxCUDepth; depth++)
+            {
+                m_frame->m_encData->m_frameStats.cntSkipCu[depth] += m_rows[i].rowStats.cntSkipCu[depth];
+                m_frame->m_encData->m_frameStats.cntMergeCu[depth] += m_rows[i].rowStats.cntMergeCu[depth];
+                for (int m = 0; m < INTER_MODES; m++)
+                    m_frame->m_encData->m_frameStats.cuInterDistribution[depth][m] += m_rows[i].rowStats.cuInterDistribution[depth][m];
+                for (int n = 0; n < INTRA_MODES; n++)
+                    m_frame->m_encData->m_frameStats.cuIntraDistribution[depth][n] += m_rows[i].rowStats.cuIntraDistribution[depth][n];
+            }
+        }
+        m_frame->m_encData->m_frameStats.percentIntraNxN = (double)(m_frame->m_encData->m_frameStats.cntIntraNxN * 100) / m_frame->m_encData->m_frameStats.totalCu;
+
+        for (uint32_t depth = 0; depth <= m_param->maxCUDepth; depth++)
+        {
+            m_frame->m_encData->m_frameStats.percentSkipCu[depth] = (double)(m_frame->m_encData->m_frameStats.cntSkipCu[depth] * 100) / m_frame->m_encData->m_frameStats.totalCu;
+            m_frame->m_encData->m_frameStats.percentMergeCu[depth] = (double)(m_frame->m_encData->m_frameStats.cntMergeCu[depth] * 100) / m_frame->m_encData->m_frameStats.totalCu;
             for (int n = 0; n < INTRA_MODES; n++)
-                m_frame->m_encData->m_frameStats.cuIntraDistribution[depth][n] += m_rows[i].rowStats.cuIntraDistribution[depth][n];
+                m_frame->m_encData->m_frameStats.percentIntraDistribution[depth][n] = (double)(m_frame->m_encData->m_frameStats.cuIntraDistribution[depth][n] * 100) / m_frame->m_encData->m_frameStats.totalCu;
+            uint64_t cuInterRectCnt = 0; // sum of Nx2N, 2NxN counts
+            cuInterRectCnt += m_frame->m_encData->m_frameStats.cuInterDistribution[depth][1] + m_frame->m_encData->m_frameStats.cuInterDistribution[depth][2];
+            m_frame->m_encData->m_frameStats.percentInterDistribution[depth][0] = (double)(m_frame->m_encData->m_frameStats.cuInterDistribution[depth][0] * 100) / m_frame->m_encData->m_frameStats.totalCu;
+            m_frame->m_encData->m_frameStats.percentInterDistribution[depth][1] = (double)(cuInterRectCnt * 100) / m_frame->m_encData->m_frameStats.totalCu;
+            m_frame->m_encData->m_frameStats.percentInterDistribution[depth][2] = (double)(m_frame->m_encData->m_frameStats.cuInterDistribution[depth][3] * 100) / m_frame->m_encData->m_frameStats.totalCu;
         }
     }
-    m_frame->m_encData->m_frameStats.avgLumaDistortion   = (double)(m_frame->m_encData->m_frameStats.lumaDistortion) / m_frame->m_encData->m_frameStats.totalCtu;
-    m_frame->m_encData->m_frameStats.avgChromaDistortion = (double)(m_frame->m_encData->m_frameStats.chromaDistortion) / m_frame->m_encData->m_frameStats.totalCtu;
-    m_frame->m_encData->m_frameStats.avgPsyEnergy        = (double)(m_frame->m_encData->m_frameStats.psyEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
-    m_frame->m_encData->m_frameStats.avgSsimEnergy       = (double)(m_frame->m_encData->m_frameStats.ssimEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
-    m_frame->m_encData->m_frameStats.avgResEnergy        = (double)(m_frame->m_encData->m_frameStats.resEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
-    m_frame->m_encData->m_frameStats.percentIntraNxN     = (double)(m_frame->m_encData->m_frameStats.cntIntraNxN * 100) / m_frame->m_encData->m_frameStats.totalCu;
-    for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+
+    if (m_param->csvLogLevel >= 2)
     {
-        m_frame->m_encData->m_frameStats.percentSkipCu[depth]  = (double)(m_frame->m_encData->m_frameStats.cntSkipCu[depth] * 100) / m_frame->m_encData->m_frameStats.totalCu;
-        m_frame->m_encData->m_frameStats.percentMergeCu[depth] = (double)(m_frame->m_encData->m_frameStats.cntMergeCu[depth] * 100) / m_frame->m_encData->m_frameStats.totalCu;
-        for (int n = 0; n < INTRA_MODES; n++)
-            m_frame->m_encData->m_frameStats.percentIntraDistribution[depth][n] = (double)(m_frame->m_encData->m_frameStats.cuIntraDistribution[depth][n] * 100) / m_frame->m_encData->m_frameStats.totalCu;
-        uint64_t cuInterRectCnt = 0; // sum of Nx2N, 2NxN counts
-        cuInterRectCnt += m_frame->m_encData->m_frameStats.cuInterDistribution[depth][1] + m_frame->m_encData->m_frameStats.cuInterDistribution[depth][2];
-        m_frame->m_encData->m_frameStats.percentInterDistribution[depth][0] = (double)(m_frame->m_encData->m_frameStats.cuInterDistribution[depth][0] * 100) / m_frame->m_encData->m_frameStats.totalCu;
-        m_frame->m_encData->m_frameStats.percentInterDistribution[depth][1] = (double)(cuInterRectCnt * 100) / m_frame->m_encData->m_frameStats.totalCu;
-        m_frame->m_encData->m_frameStats.percentInterDistribution[depth][2] = (double)(m_frame->m_encData->m_frameStats.cuInterDistribution[depth][3] * 100) / m_frame->m_encData->m_frameStats.totalCu;
+        m_frame->m_encData->m_frameStats.avgLumaDistortion = (double)(m_frame->m_encData->m_frameStats.lumaDistortion) / m_frame->m_encData->m_frameStats.totalCtu;
+        m_frame->m_encData->m_frameStats.avgChromaDistortion = (double)(m_frame->m_encData->m_frameStats.chromaDistortion) / m_frame->m_encData->m_frameStats.totalCtu;
+        m_frame->m_encData->m_frameStats.avgPsyEnergy = (double)(m_frame->m_encData->m_frameStats.psyEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
+        m_frame->m_encData->m_frameStats.avgSsimEnergy = (double)(m_frame->m_encData->m_frameStats.ssimEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
+        m_frame->m_encData->m_frameStats.avgResEnergy = (double)(m_frame->m_encData->m_frameStats.resEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
     }
 
     m_bs.resetBits();
@@ -1096,7 +1110,7 @@
     /* Accumulate CU statistics from each worker thread, we could report
      * per-frame stats here, but currently we do not. */
     for (int i = 0; i < numTLD; i++)
-        m_cuStats.accumulate(m_tld[i].analysis.m_stats[m_jpId]);
+        m_cuStats.accumulate(m_tld[i].analysis.m_stats[m_jpId], *m_param);
 #endif
 
     m_endFrameTime = x265_mdate();
@@ -1106,7 +1120,7 @@
 {
     Slice* slice = m_frame->m_encData->m_slice;
     const uint32_t widthInLCUs = slice->m_sps->numCuInWidth;
-    const uint32_t lastCUAddr = (slice->m_endCUAddr + NUM_4x4_PARTITIONS - 1) / NUM_4x4_PARTITIONS;
+    const uint32_t lastCUAddr = (slice->m_endCUAddr + m_param->num4x4Partitions - 1) / m_param->num4x4Partitions;
     const uint32_t numSubstreams = m_param->bEnableWavefront ? slice->m_sps->numCuInHeight : 1;
 
     SAOParam* saoParam = slice->m_sps->bUseSAO ? m_frame->m_encData->m_saoParam : NULL;
@@ -1208,7 +1222,6 @@
     const uint32_t row = (uint32_t)intRow;
     CTURow& curRow = m_rows[row];
 
-    tld.analysis.m_param = m_param;
     if (m_param->bEnableWavefront)
     {
         ScopedLock self(curRow.lock);
@@ -1241,7 +1254,7 @@
 
     uint32_t maxBlockCols = (m_frame->m_fencPic->m_picWidth + (16 - 1)) / 16;
     uint32_t maxBlockRows = (m_frame->m_fencPic->m_picHeight + (16 - 1)) / 16;
-    uint32_t noOfBlocks = g_maxCUSize / 16;
+    uint32_t noOfBlocks = m_param->maxCUSize / 16;
     const uint32_t bFirstRowInSlice = ((row == 0) || (m_rows[row - 1].sliceId != curRow.sliceId)) ? 1 : 0;
     const uint32_t bLastRowInSlice = ((row == m_numRows - 1) || (m_rows[row + 1].sliceId != curRow.sliceId)) ? 1 : 0;
     const uint32_t sliceId = curRow.sliceId;
@@ -1320,8 +1333,8 @@
     // TODO: specially case handle on first and last row
 
     // Initialize restrict on MV range in slices
-    tld.analysis.m_sliceMinY = -(int16_t)(rowInSlice * g_maxCUSize * 4) + 3 * 4;
-    tld.analysis.m_sliceMaxY = (int16_t)((endRowInSlicePlus1 - 1 - row) * (g_maxCUSize * 4) - 4 * 4);
+    tld.analysis.m_sliceMinY = -(int16_t)(rowInSlice * m_param->maxCUSize * 4) + 3 * 4;
+    tld.analysis.m_sliceMaxY = (int16_t)((endRowInSlicePlus1 - 1 - row) * (m_param->maxCUSize * 4) - 4 * 4);
 
     // Handle single row slice
     if (tld.analysis.m_sliceMaxY < tld.analysis.m_sliceMinY)
@@ -1361,8 +1374,8 @@
                 cuStat.baseQp = curEncData.m_rowStat[row].rowQp;
 
             /* TODO: use defines from slicetype.h for lowres block size */
-            uint32_t block_y = (ctu->m_cuPelY >> g_maxLog2CUSize) * noOfBlocks;
-            uint32_t block_x = (ctu->m_cuPelX >> g_maxLog2CUSize) * noOfBlocks;
+            uint32_t block_y = (ctu->m_cuPelY >> m_param->maxLog2CUSize) * noOfBlocks;
+            uint32_t block_x = (ctu->m_cuPelX >> m_param->maxLog2CUSize) * noOfBlocks;
 
             cuStat.vbvCost = 0;
             cuStat.intraVbvCost = 0;
@@ -1473,11 +1486,11 @@
             curRow.rowStats.coeffBits += best.coeffBits;
             curRow.rowStats.miscBits  += best.totalBits - (best.mvBits + best.coeffBits);
 
-            for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+            for (uint32_t depth = 0; depth <= m_param->maxCUDepth; depth++)
             {
                 /* 1 << shift == number of 8x8 blocks at current depth */
-                int shift = 2 * (g_maxCUDepth - depth);
-                int cuSize = g_maxCUSize >> depth;
+                int shift = 2 * (m_param->maxCUDepth - depth);
+                int cuSize = m_param->maxCUSize >> depth;
 
                 if (cuSize == 8)
                     curRow.rowStats.intra8x8Cnt += (int)(frameLog.cntIntra[depth] + frameLog.cntIntraNxN);
@@ -1496,7 +1509,7 @@
         curRow.rowStats.resEnergy        += best.resEnergy;
         curRow.rowStats.cntIntraNxN      += frameLog.cntIntraNxN;
         curRow.rowStats.totalCu          += frameLog.totalCu;
-        for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+        for (uint32_t depth = 0; depth <= m_param->maxCUDepth; depth++)
         {
             curRow.rowStats.cntSkipCu[depth] += frameLog.cntSkipCu[depth];
             curRow.rowStats.cntMergeCu[depth] += frameLog.cntMergeCu[depth];
@@ -1510,14 +1523,17 @@
         x265_emms();
 
         if (bIsVbv)
-        {
-            // Update encoded bits, satdCost, baseQP for each CU
-            curEncData.m_rowStat[row].rowSatd      += curEncData.m_cuStat[cuAddr].vbvCost;
-            curEncData.m_rowStat[row].rowIntraSatd += curEncData.m_cuStat[cuAddr].intraVbvCost;
-            curEncData.m_rowStat[row].encodedBits   += curEncData.m_cuStat[cuAddr].totalBits;
-            curEncData.m_rowStat[row].sumQpRc       += curEncData.m_cuStat[cuAddr].baseQp;
-            curEncData.m_rowStat[row].numEncodedCUs = cuAddr;
-
+        {
+            // Update encoded bits, satdCost, baseQP for each CU if tune grain is disabled
+            if ((m_param->bEnableWavefront && (!cuAddr || !m_param->rc.bEnableConstVbv)) || !m_param->bEnableWavefront)
+            {
+                curEncData.m_rowStat[row].rowSatd += curEncData.m_cuStat[cuAddr].vbvCost;
+                curEncData.m_rowStat[row].rowIntraSatd += curEncData.m_cuStat[cuAddr].intraVbvCost;
+                curEncData.m_rowStat[row].encodedBits += curEncData.m_cuStat[cuAddr].totalBits;
+                curEncData.m_rowStat[row].sumQpRc += curEncData.m_cuStat[cuAddr].baseQp;
+                curEncData.m_rowStat[row].numEncodedCUs = cuAddr;
+            }
+
             // If current block is at row end checkpoint, call vbv ratecontrol.
 
             if (!m_param->bEnableWavefront && col == numCols - 1)
@@ -1553,6 +1569,24 @@
 
             else if (m_param->bEnableWavefront && row == col && row)
             {
+                if (m_param->rc.bEnableConstVbv)
+                {
+                    int32_t startCuAddr = numCols * row;
+                    int32_t EndCuAddr = startCuAddr + col;
+                    for (int32_t r = row; r >= 0; r--)
+                    {
+                        for (int32_t c = startCuAddr; c <= EndCuAddr && c <= (int32_t)numCols * (r + 1) - 1; c++)
+                        {
+                            curEncData.m_rowStat[r].rowSatd += curEncData.m_cuStat[c].vbvCost;
+                            curEncData.m_rowStat[r].rowIntraSatd += curEncData.m_cuStat[c].intraVbvCost;
+                            curEncData.m_rowStat[r].encodedBits += curEncData.m_cuStat[c].totalBits;
+                            curEncData.m_rowStat[r].sumQpRc += curEncData.m_cuStat[c].baseQp;
+                            curEncData.m_rowStat[r].numEncodedCUs = c;
+                        }
+                        startCuAddr = EndCuAddr - numCols;
+                        EndCuAddr = startCuAddr + 1;
+                    }
+                }
                 double qpBase = curEncData.m_cuStat[cuAddr].baseQp;
                 int reEncode = m_top->m_rateControl->rowVbvRateControl(m_frame, row, &m_rce, qpBase);
                 qpBase = x265_clip3((double)m_param->rc.qpMin, (double)m_param->rc.qpMax, qpBase);
@@ -1648,6 +1682,23 @@
     }
 
     /** this row of CTUs has been compressed **/
+    if (m_param->bEnableWavefront && m_param->rc.bEnableConstVbv)
+    {
+        if (row == m_numRows - 1)
+        {
+            for (int32_t r = 0; r < (int32_t)m_numRows; r++)
+            {
+                for (int32_t c = curEncData.m_rowStat[r].numEncodedCUs + 1; c < (int32_t)numCols * (r + 1); c++)
+                {
+                    curEncData.m_rowStat[r].rowSatd += curEncData.m_cuStat[c].vbvCost;
+                    curEncData.m_rowStat[r].rowIntraSatd += curEncData.m_cuStat[c].intraVbvCost;
+                    curEncData.m_rowStat[r].encodedBits += curEncData.m_cuStat[c].totalBits;
+                    curEncData.m_rowStat[r].sumQpRc += curEncData.m_cuStat[c].baseQp;
+                    curEncData.m_rowStat[r].numEncodedCUs = c;
+                }
+            }
+        }
+    }
 
     /* If encoding with ABR, update update bits and complexity in rate control
      * after a number of rows so the next frame's rateControlStart has more
@@ -1729,7 +1780,6 @@
         }
     }
 
-    tld.analysis.m_param = NULL;
     curRow.busy = false;
 
     // CHECK_ME: Does it always FALSE condition?
@@ -1741,73 +1791,36 @@
 int FrameEncoder::collectCTUStatistics(const CUData& ctu, FrameStats* log)
 {
     int totQP = 0;
-    if (ctu.m_slice->m_sliceType == I_SLICE)
+    uint32_t depth = 0;
+    for (uint32_t absPartIdx = 0; absPartIdx < ctu.m_numPartitions; absPartIdx += ctu.m_numPartitions >> (depth * 2))
     {
-        uint32_t depth = 0;
-        for (uint32_t absPartIdx = 0; absPartIdx < ctu.m_numPartitions; absPartIdx += ctu.m_numPartitions >> (depth * 2))
-        {
-            depth = ctu.m_cuDepth[absPartIdx];
-
-            log->totalCu++;
-            log->cntIntra[depth]++;
-            totQP += ctu.m_qp[absPartIdx] * (ctu.m_numPartitions >> (depth * 2));
-
-            if (ctu.m_predMode[absPartIdx] == MODE_NONE)
-            {
-                log->totalCu--;
-                log->cntIntra[depth]--;
-            }
-            else if (ctu.m_partSize[absPartIdx] != SIZE_2Nx2N)
-            {
-                /* TODO: log intra modes at absPartIdx +0 to +3 */
-                X265_CHECK(ctu.m_log2CUSize[absPartIdx] == 3 && ctu.m_slice->m_sps->quadtreeTULog2MinSize < 3, "Intra NxN found at improbable depth\n");
-                log->cntIntraNxN++;
-                log->cntIntra[depth]--;
-            }
-            else if (ctu.m_lumaIntraDir[absPartIdx] > 1)
-                log->cuIntraDistribution[depth][ANGULAR_MODE_ID]++;
-            else
-                log->cuIntraDistribution[depth][ctu.m_lumaIntraDir[absPartIdx]]++;
-        }
+        depth = ctu.m_cuDepth[absPartIdx];
+        totQP += ctu.m_qp[absPartIdx] * (ctu.m_numPartitions >> (depth * 2));
     }
-    else
+
+    if (m_param->csvLogLevel >= 1 || m_param->rc.bStatWrite)
     {
-        uint32_t depth = 0;
-        for (uint32_t absPartIdx = 0; absPartIdx < ctu.m_numPartitions; absPartIdx += ctu.m_numPartitions >> (depth * 2))
+        if (ctu.m_slice->m_sliceType == I_SLICE)
         {
-            depth = ctu.m_cuDepth[absPartIdx];
-
-            log->totalCu++;
-            totQP += ctu.m_qp[absPartIdx] * (ctu.m_numPartitions >> (depth * 2));
-
-            if (ctu.m_predMode[absPartIdx] == MODE_NONE)
-                log->totalCu--;
-            else if (ctu.isSkipped(absPartIdx))
+            depth = 0;
+            for (uint32_t absPartIdx = 0; absPartIdx < ctu.m_numPartitions; absPartIdx += ctu.m_numPartitions >> (depth * 2))
             {
-                if (ctu.m_mergeFlag[0])
-                    log->cntMergeCu[depth]++;
-                else
-                    log->cntSkipCu[depth]++;
-            }
-            else if (ctu.isInter(absPartIdx))
-            {
-                log->cntInter[depth]++;
+                depth = ctu.m_cuDepth[absPartIdx];
 
-                if (ctu.m_partSize[absPartIdx] < AMP_ID)
-                    log->cuInterDistribution[depth][ctu.m_partSize[absPartIdx]]++;
-                else
-                    log->cuInterDistribution[depth][AMP_ID]++;
-            }
-            else if (ctu.isIntra(absPartIdx))
-            {
+                log->totalCu++;
                 log->cntIntra[depth]++;
 
-                if (ctu.m_partSize[absPartIdx] != SIZE_2Nx2N)
+                if (ctu.m_predMode[absPartIdx] == MODE_NONE)
+                {
+                    log->totalCu--;
+                    log->cntIntra[depth]--;
+                }
+                else if (ctu.m_partSize[absPartIdx] != SIZE_2Nx2N)
                 {
+                    /* TODO: log intra modes at absPartIdx +0 to +3 */
                     X265_CHECK(ctu.m_log2CUSize[absPartIdx] == 3 && ctu.m_slice->m_sps->quadtreeTULog2MinSize < 3, "Intra NxN found at improbable depth\n");
                     log->cntIntraNxN++;
                     log->cntIntra[depth]--;
-                    /* TODO: log intra modes at absPartIdx +0 to +3 */
                 }
                 else if (ctu.m_lumaIntraDir[absPartIdx] > 1)
                     log->cuIntraDistribution[depth][ANGULAR_MODE_ID]++;
@@ -1815,6 +1828,51 @@
                     log->cuIntraDistribution[depth][ctu.m_lumaIntraDir[absPartIdx]]++;
             }
         }
+        else
+        {
+            depth = 0;
+            for (uint32_t absPartIdx = 0; absPartIdx < ctu.m_numPartitions; absPartIdx += ctu.m_numPartitions >> (depth * 2))
+            {
+                depth = ctu.m_cuDepth[absPartIdx];
+
+                log->totalCu++;
+
+                if (ctu.m_predMode[absPartIdx] == MODE_NONE)
+                    log->totalCu--;
+                else if (ctu.isSkipped(absPartIdx))
+                {
+                    if (ctu.m_mergeFlag[0])
+                        log->cntMergeCu[depth]++;
+                    else
+                        log->cntSkipCu[depth]++;
+                }
+                else if (ctu.isInter(absPartIdx))
+                {
+                    log->cntInter[depth]++;
+
+                    if (ctu.m_partSize[absPartIdx] < AMP_ID)
+                        log->cuInterDistribution[depth][ctu.m_partSize[absPartIdx]]++;
+                    else
+                        log->cuInterDistribution[depth][AMP_ID]++;
+                }
+                else if (ctu.isIntra(absPartIdx))
+                {
+                    log->cntIntra[depth]++;
+
+                    if (ctu.m_partSize[absPartIdx] != SIZE_2Nx2N)
+                    {
+                        X265_CHECK(ctu.m_log2CUSize[absPartIdx] == 3 && ctu.m_slice->m_sps->quadtreeTULog2MinSize < 3, "Intra NxN found at improbable depth\n");
+                        log->cntIntraNxN++;
+                        log->cntIntra[depth]--;
+                        /* TODO: log intra modes at absPartIdx +0 to +3 */
+                    }
+                    else if (ctu.m_lumaIntraDir[absPartIdx] > 1)
+                        log->cuIntraDistribution[depth][ANGULAR_MODE_ID]++;
+                    else
+                        log->cuIntraDistribution[depth][ctu.m_lumaIntraDir[absPartIdx]]++;
+                }
+            }
+        }
     }
 
     return totQP;
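Reviewer note on the collectCTUStatistics() hunks above: the QP sum is now computed unconditionally in its own loop, while the per-mode counters only run when csvLogLevel >= 1 or rc.bStatWrite is set. The loop steps through the CTU one coding unit at a time, using the depth stored at each CU's first partition to decide how many partitions to skip. The sketch below reproduces just that walk on plain vectors so the stepping is easier to follow. sumQpOverCtu and its flat-array inputs are hypothetical and only stand in for the m_cuDepth and m_qp arrays of CUData.

```cpp
#include <cstdint>
#include <vector>

// Sum QP over one CTU, weighting each CU by the number of partitions it covers.
// cuDepth[i] and qp[i] describe partition i in z-scan order; a CU at depth d
// spans numPartitions >> (2 * d) consecutive partitions.
inline int sumQpOverCtu(const std::vector<uint8_t>& cuDepth, const std::vector<int8_t>& qp)
{
    const uint32_t numPartitions = (uint32_t)cuDepth.size();
    int totQP = 0;
    uint32_t depth = 0;
    // The increment uses the depth read inside the body, as in the diff.
    for (uint32_t absPartIdx = 0; absPartIdx < numPartitions; absPartIdx += numPartitions >> (depth * 2))
    {
        depth = cuDepth[absPartIdx];
        totQP += qp[absPartIdx] * (int)(numPartitions >> (depth * 2));
    }
    return totQP;
}
```

For a toy CTU of 16 partitions in which three quadrants are single depth-1 CUs (QP 30, 32 and 28) and one quadrant is split into four depth-2 CUs (QP 20, 22, 24, 26), the loop visits seven CUs and returns 30*4 + 32*4 + (20+22+24+26) + 28*4 = 452; dividing by the partition count then gives the average QP for the CTU.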
x265_2.4.tar.gz/source/encoder/framefilter.cpp -> x265_2.5.tar.gz/source/encoder/framefilter.cpp Changed
@@ -35,107 +35,126 @@
 static uint64_t computeSSD(pixel *fenc, pixel *rec, intptr_t stride, uint32_t width, uint32_t height);
 static float calculateSSIM(pixel *pix1, intptr_t stride1, pixel *pix2, intptr_t stride2, uint32_t width, uint32_t height, void *buf, uint32_t& cnt);
 
-static void integral_init4h(uint32_t *sum, pixel *pix, intptr_t stride)
+namespace X265_NS
 {
-    int32_t v = pix[0] + pix[1] + pix[2] + pix[3];
-    for (int16_t x = 0; x < stride - 4; x++)
+    static void integral_init4h_c(uint32_t *sum, pixel *pix, intptr_t stride)
     {
-        sum[x] = v + sum[x - stride];
-        v += pix[x + 4] - pix[x];
+        int32_t v = pix[0] + pix[1] + pix[2] + pix[3];
+        for (int16_t x = 0; x < stride - 4; x++)
+        {
+            sum[x] = v + sum[x - stride];
+            v += pix[x + 4] - pix[x];
+        }
     }
-}
 
-static void integral_init8h(uint32_t *sum, pixel *pix, intptr_t stride)
-{
-    int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7];
-    for (int16_t x = 0; x < stride - 8; x++)
+    static void integral_init8h_c(uint32_t *sum, pixel *pix, intptr_t stride)
     {
-        sum[x] = v + sum[x - stride];
-        v += pix[x + 8] - pix[x];
+        int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7];
+        for (int16_t x = 0; x < stride - 8; x++)
+        {
+            sum[x] = v + sum[x - stride];
+            v += pix[x + 8] - pix[x];
+        }
     }
-}
 
-static void integral_init12h(uint32_t *sum, pixel *pix, intptr_t stride)
-{
-    int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
-        pix[8] + pix[9] + pix[10] + pix[11];
-    for (int16_t x = 0; x < stride - 12; x++)
+    static void integral_init12h_c(uint32_t *sum, pixel *pix, intptr_t stride)
     {
-        sum[x] = v + sum[x - stride];
-        v += pix[x + 12] - pix[x];
+        int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
+            pix[8] + pix[9] + pix[10] + pix[11];
+        for (int16_t x = 0; x < stride - 12; x++)
+        {
+            sum[x] = v + sum[x - stride];
+            v += pix[x + 12] - pix[x];
+        }
     }
-}
 
-static void integral_init16h(uint32_t *sum, pixel *pix, intptr_t stride)
-{
-    int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
-        pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15];
-    for (int16_t x = 0; x < stride - 16; x++)
+    static void integral_init16h_c(uint32_t *sum, pixel *pix, intptr_t stride)
     {
-        sum[x] = v + sum[x - stride];
-        v += pix[x + 16] - pix[x];
+        int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
+            pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15];
+        for (int16_t x = 0; x < stride - 16; x++)
+        {
+            sum[x] = v + sum[x - stride];
+            v += pix[x + 16] - pix[x];
+        }
     }
-}
 
-static void integral_init24h(uint32_t *sum, pixel *pix, intptr_t stride)
-{
-    int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
-        pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15] +
-        pix[16] + pix[17] + pix[18] + pix[19] + pix[20] + pix[21] + pix[22] + pix[23];
-    for (int16_t x = 0; x < stride - 24; x++)
+    static void integral_init24h_c(uint32_t *sum, pixel *pix, intptr_t stride)
     {
-        sum[x] = v + sum[x - stride];
-        v += pix[x + 24] - pix[x];
+        int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
+            pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15] +
+            pix[16] + pix[17] + pix[18] + pix[19] + pix[20] + pix[21] + pix[22] + pix[23];
+        for (int16_t x = 0; x < stride - 24; x++)
+        {
+            sum[x] = v + sum[x - stride];
+            v += pix[x + 24] - pix[x];
+        }
     }
-}
 
-static void integral_init32h(uint32_t *sum, pixel *pix, intptr_t stride)
-{
-    int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
-        pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15] +
-        pix[16] + pix[17] + pix[18] + pix[19] + pix[20] + pix[21] + pix[22] + pix[23] +
-        pix[24] + pix[25] + pix[26] + pix[27] + pix[28] + pix[29] + pix[30] + pix[31];
-    for (int16_t x = 0; x < stride - 32; x++)
+    static void integral_init32h_c(uint32_t *sum, pixel *pix, intptr_t stride)
     {
-        sum[x] = v + sum[x - stride];
-        v += pix[x + 32] - pix[x];
+        int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
+            pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15] +
+            pix[16] + pix[17] + pix[18] + pix[19] + pix[20] + pix[21] + pix[22] + pix[23] +
+            pix[24] + pix[25] + pix[26] + pix[27] + pix[28] + pix[29] + pix[30] + pix[31];
+        for (int16_t x = 0; x < stride - 32; x++)
+        for (int16_t x = 0; x < stride - 32; x++)
115
+        {
116
+            sum[x] = v + sum[x - stride];
117
+            v += pix[x + 32] - pix[x];
118
+        }
119
     }
120
-}
121
 
122
-static void integral_init4v(uint32_t *sum4, intptr_t stride)
123
-{
124
-    for (int x = 0; x < stride; x++)
125
-        sum4[x] = sum4[x + 4 * stride] - sum4[x];
126
-}
127
+    static void integral_init4v_c(uint32_t *sum4, intptr_t stride)
128
+    {
129
+        for (int x = 0; x < stride; x++)
130
+            sum4[x] = sum4[x + 4 * stride] - sum4[x];
131
+    }
132
 
133
-static void integral_init8v(uint32_t *sum8, intptr_t stride)
134
-{
135
-    for (int x = 0; x < stride; x++)
136
-        sum8[x] = sum8[x + 8 * stride] - sum8[x];
137
-}
138
+    static void integral_init8v_c(uint32_t *sum8, intptr_t stride)
139
+    {
140
+        for (int x = 0; x < stride; x++)
141
+            sum8[x] = sum8[x + 8 * stride] - sum8[x];
142
+    }
143
 
144
-static void integral_init12v(uint32_t *sum12, intptr_t stride)
145
-{
146
-    for (int x = 0; x < stride; x++)
147
-        sum12[x] = sum12[x + 12 * stride] - sum12[x];
148
-}
149
+    static void integral_init12v_c(uint32_t *sum12, intptr_t stride)
150
+    {
151
+        for (int x = 0; x < stride; x++)
152
+            sum12[x] = sum12[x + 12 * stride] - sum12[x];
153
+    }
154
 
155
-static void integral_init16v(uint32_t *sum16, intptr_t stride)
156
-{
157
-    for (int x = 0; x < stride; x++)
158
-        sum16[x] = sum16[x + 16 * stride] - sum16[x];
159
-}
160
+    static void integral_init16v_c(uint32_t *sum16, intptr_t stride)
161
+    {
162
+        for (int x = 0; x < stride; x++)
163
+            sum16[x] = sum16[x + 16 * stride] - sum16[x];
164
+    }
165
 
166
-static void integral_init24v(uint32_t *sum24, intptr_t stride)
167
-{
168
-    for (int x = 0; x < stride; x++)
169
-        sum24[x] = sum24[x + 24 * stride] - sum24[x];
170
-}
171
+    static void integral_init24v_c(uint32_t *sum24, intptr_t stride)
172
+    {
173
+        for (int x = 0; x < stride; x++)
174
+            sum24[x] = sum24[x + 24 * stride] - sum24[x];
175
+    }
176
 
177
-static void integral_init32v(uint32_t *sum32, intptr_t stride)
178
-{
179
-    for (int x = 0; x < stride; x++)
180
-        sum32[x] = sum32[x + 32 * stride] - sum32[x];
181
+    static void integral_init32v_c(uint32_t *sum32, intptr_t stride)
182
+    {
183
+        for (int x = 0; x < stride; x++)
184
+            sum32[x] = sum32[x + 32 * stride] - sum32[x];
185
+    }
186
+
187
+    void setupSeaIntegralPrimitives_c(EncoderPrimitives &p)
188
+    {
189
+        p.integral_initv[INTEGRAL_4] = integral_init4v_c;
190
+        p.integral_initv[INTEGRAL_8] = integral_init8v_c;
191
+        p.integral_initv[INTEGRAL_12] = integral_init12v_c;
192
+        p.integral_initv[INTEGRAL_16] = integral_init16v_c;
193
+        p.integral_initv[INTEGRAL_24] = integral_init24v_c;
194
+        p.integral_initv[INTEGRAL_32] = integral_init32v_c;
195
+        p.integral_inith[INTEGRAL_4] = integral_init4h_c;
196
+        p.integral_inith[INTEGRAL_8] = integral_init8h_c;
197
+        p.integral_inith[INTEGRAL_12] = integral_init12h_c;
198
+        p.integral_inith[INTEGRAL_16] = integral_init16h_c;
199
+        p.integral_inith[INTEGRAL_24] = integral_init24h_c;
200
+        p.integral_inith[INTEGRAL_32] = integral_init32h_c;
201
+    }
202
 }
203
 
204
 void FrameFilter::destroy()
205
@@ -166,8 +185,8 @@
206
     m_pad[0] = top->m_sps.conformanceWindow.rightOffset;
207
     m_pad[1] = top->m_sps.conformanceWindow.bottomOffset;
208
     m_saoRowDelay = m_param->bEnableLoopFilter ? 1 : 0;
209
-    m_lastHeight = (m_param->sourceHeight % g_maxCUSize) ? (m_param->sourceHeight % g_maxCUSize) : g_maxCUSize;
210
-    m_lastWidth = (m_param->sourceWidth % g_maxCUSize) ? (m_param->sourceWidth % g_maxCUSize) : g_maxCUSize;
211
+    m_lastHeight = (m_param->sourceHeight % m_param->maxCUSize) ? (m_param->sourceHeight % m_param->maxCUSize) : m_param->maxCUSize;
212
+    m_lastWidth = (m_param->sourceWidth % m_param->maxCUSize) ? (m_param->sourceWidth % m_param->maxCUSize) : m_param->maxCUSize;
213
     integralCompleted.set(0);
214
 
215
     if (m_param->bEnableSsim)
216
@@ -195,7 +214,7 @@
217
         for(int row = 0; row < numRows; row++)
218
         {
219
             // Setting maximum bound information
220
-            m_parallelFilter[row].m_rowHeight = (row == numRows - 1) ? m_lastHeight : g_maxCUSize;
221
+            m_parallelFilter[row].m_rowHeight = (row == numRows - 1) ? m_lastHeight : m_param->maxCUSize;
222
             m_parallelFilter[row].m_row = row;
223
             m_parallelFilter[row].m_rowAddr = row * numCols;
224
             m_parallelFilter[row].m_frameFilter = this;
225
@@ -281,7 +300,7 @@
226
 void FrameFilter::ParallelFilter::copySaoAboveRef(const CUData *ctu, PicYuv* reconPic, uint32_t cuAddr, int col)
227
 {
228
     // Copy SAO Top Reference Pixels
229
-    int ctuWidth  = g_maxCUSize;
230
+    int ctuWidth  = ctu->m_encData->m_param->maxCUSize;
231
     const pixel* recY = reconPic->getPlaneAddr(0, cuAddr) - (ctu->m_bFirstRowInSlice ? 0 : reconPic->m_stride);
232
 
233
     // Luma
234
@@ -682,8 +701,8 @@
235
         intptr_t stride2 = m_frame->m_fencPic->m_stride;
236
         uint32_t bEnd = ((row) == (this->m_numRows - 1));
237
         uint32_t bStart = (row == 0);
238
-        uint32_t minPixY = row * g_maxCUSize - 4 * !bStart;
239
-        uint32_t maxPixY = X265_MIN((row + 1) * g_maxCUSize - 4 * !bEnd, (uint32_t)m_param->sourceHeight);
240
+        uint32_t minPixY = row * m_param->maxCUSize - 4 * !bStart;
241
+        uint32_t maxPixY = X265_MIN((row + 1) * m_param->maxCUSize - 4 * !bEnd, (uint32_t)m_param->sourceHeight);
242
         uint32_t ssim_cnt;
243
         x265_emms();
244
 
245
@@ -749,7 +768,7 @@
246
             uint32_t width = reconPic->m_picWidth;
247
             uint32_t height = m_parallelFilter[row].getCUHeight();
248
             intptr_t stride = reconPic->m_stride;
249
-            uint32_t cuHeight = g_maxCUSize;
250
+            uint32_t cuHeight = m_param->maxCUSize;
251
 
252
             if (!row)
253
                 m_frameEncoder->m_checksum[0] = 0;
254
@@ -793,18 +812,18 @@
255
         }
256
 
257
         int stride = (int)m_frame->m_reconPic->m_stride;
258
-        int padX = g_maxCUSize + 32;
259
-        int padY = g_maxCUSize + 16;
260
+        int padX = m_param->maxCUSize + 32;
261
+        int padY = m_param->maxCUSize + 16;
262
         int numCuInHeight = m_frame->m_encData->m_slice->m_sps->numCuInHeight;
263
-        int maxHeight = numCuInHeight * g_maxCUSize;
264
+        int maxHeight = numCuInHeight * m_param->maxCUSize;
265
         int startRow = 0;
266
 
267
         if (m_param->interlaceMode)
268
-            startRow = (row * g_maxCUSize >> 1);
269
+            startRow = (row * m_param->maxCUSize >> 1);
270
         else
271
-            startRow = row * g_maxCUSize;
272
+            startRow = row * m_param->maxCUSize;
273
 
274
-        int height = lastRow ? (maxHeight + g_maxCUSize * m_param->interlaceMode) : (((row + m_param->interlaceMode) * g_maxCUSize) + g_maxCUSize);
275
+        int height = lastRow ? (maxHeight + m_param->maxCUSize * m_param->interlaceMode) : (((row + m_param->interlaceMode) * m_param->maxCUSize) + m_param->maxCUSize);
276
 
277
         if (!row)
278
         {
279
@@ -833,47 +852,47 @@
280
             uint32_t *sum4x4 = m_frame->m_encData->m_meIntegral[11] + (y + 1) * stride - padX;
281
 
282
             /*For width = 32 */
283
-            integral_init32h(sum32x32, pix, stride);
284
+            primitives.integral_inith[INTEGRAL_32](sum32x32, pix, stride);
285
             if (y >= 32 - padY)
286
-                integral_init32v(sum32x32 - 32 * stride, stride);
287
-            integral_init32h(sum32x24, pix, stride);
288
+                primitives.integral_initv[INTEGRAL_32](sum32x32 - 32 * stride, stride);
289
+            primitives.integral_inith[INTEGRAL_32](sum32x24, pix, stride);
290
             if (y >= 24 - padY)
291
-                integral_init24v(sum32x24 - 24 * stride, stride);
292
-            integral_init32h(sum32x8, pix, stride);
293
+                primitives.integral_initv[INTEGRAL_24](sum32x24 - 24 * stride, stride);
294
+            primitives.integral_inith[INTEGRAL_32](sum32x8, pix, stride);
295
             if (y >= 8 - padY)
296
-                integral_init8v(sum32x8 - 8 * stride, stride);
297
+                primitives.integral_initv[INTEGRAL_8](sum32x8 - 8 * stride, stride);
298
             /*For width = 24 */
299
-            integral_init24h(sum24x32, pix, stride);
300
+            primitives.integral_inith[INTEGRAL_24](sum24x32, pix, stride);
301
             if (y >= 32 - padY)
302
-                integral_init32v(sum24x32 - 32 * stride, stride);
303
+                primitives.integral_initv[INTEGRAL_32](sum24x32 - 32 * stride, stride);
304
             /*For width = 16 */
305
-            integral_init16h(sum16x16, pix, stride);
306
+            primitives.integral_inith[INTEGRAL_16](sum16x16, pix, stride);
307
             if (y >= 16 - padY)
308
-                integral_init16v(sum16x16 - 16 * stride, stride);
309
-            integral_init16h(sum16x12, pix, stride);
310
+                primitives.integral_initv[INTEGRAL_16](sum16x16 - 16 * stride, stride);
311
+            primitives.integral_inith[INTEGRAL_16](sum16x12, pix, stride);
312
             if (y >= 12 - padY)
313
-                integral_init12v(sum16x12 - 12 * stride, stride);
314
-            integral_init16h(sum16x4, pix, stride);
315
+                primitives.integral_initv[INTEGRAL_12](sum16x12 - 12 * stride, stride);
316
+            primitives.integral_inith[INTEGRAL_16](sum16x4, pix, stride);
317
             if (y >= 4 - padY)
318
-                integral_init4v(sum16x4 - 4 * stride, stride);
319
+                primitives.integral_initv[INTEGRAL_4](sum16x4 - 4 * stride, stride);
320
             /*For width = 12 */
321
-            integral_init12h(sum12x16, pix, stride);
322
+            primitives.integral_inith[INTEGRAL_12](sum12x16, pix, stride);
323
             if (y >= 16 - padY)
324
-                integral_init16v(sum12x16 - 16 * stride, stride);
325
+                primitives.integral_initv[INTEGRAL_16](sum12x16 - 16 * stride, stride);
326
             /*For width = 8 */
327
-            integral_init8h(sum8x32, pix, stride);
328
+            primitives.integral_inith[INTEGRAL_8](sum8x32, pix, stride);
329
             if (y >= 32 - padY)
330
-                integral_init32v(sum8x32 - 32 * stride, stride);
331
-            integral_init8h(sum8x8, pix, stride);
332
+                primitives.integral_initv[INTEGRAL_32](sum8x32 - 32 * stride, stride);
333
+            primitives.integral_inith[INTEGRAL_8](sum8x8, pix, stride);
334
             if (y >= 8 - padY)
335
-                integral_init8v(sum8x8 - 8 * stride, stride);
336
+                primitives.integral_initv[INTEGRAL_8](sum8x8 - 8 * stride, stride);
337
             /*For width = 4 */
338
-            integral_init4h(sum4x16, pix, stride);
339
+            primitives.integral_inith[INTEGRAL_4](sum4x16, pix, stride);
340
             if (y >= 16 - padY)
341
-                integral_init16v(sum4x16 - 16 * stride, stride);
342
-            integral_init4h(sum4x4, pix, stride);
343
+                primitives.integral_initv[INTEGRAL_16](sum4x16 - 16 * stride, stride);
344
+            primitives.integral_inith[INTEGRAL_4](sum4x4, pix, stride);
345
             if (y >= 4 - padY)
346
-                integral_init4v(sum4x4 - 4 * stride, stride);
347
+                primitives.integral_initv[INTEGRAL_4](sum4x4 - 4 * stride, stride);
348
         }
349
         m_parallelFilter[row].m_frameFilter->integralCompleted.set(1);
350
     }
351
x265_2.4.tar.gz/source/encoder/framefilter.h -> x265_2.5.tar.gz/source/encoder/framefilter.h Changed
10
 
1
@@ -123,7 +123,7 @@
2
 
3
     uint32_t getCUWidth(int colNum) const
4
     {
5
-        return (colNum == (int)m_numCols - 1) ? m_lastWidth : g_maxCUSize;
6
+        return (colNum == (int)m_numCols - 1) ? m_lastWidth : m_param->maxCUSize;
7
     }
8
 
9
     void init(Encoder *top, FrameEncoder *frame, int numRows, uint32_t numCols);
10
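Note on the g_maxCUSize -> m_param->maxCUSize replacements: the CTU grid arithmetic itself is unchanged, it just reads the size from per-instance parameters. A minimal sketch under that reading follows. EncoderParams, CtuGrid, makeParams, ctuGrid and cuWidth are hypothetical names for illustration only, not x265 types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the per-instance parameter struct.
struct EncoderParams
{
    uint32_t sourceWidth;
    uint32_t sourceHeight;
    uint32_t maxCUSize;
};

struct CtuGrid
{
    uint32_t numCols;
    uint32_t numRows;
    uint32_t lastWidth;  // width of the right-most CTU column
    uint32_t lastHeight; // height of the bottom CTU row
};

static EncoderParams makeParams(uint32_t w, uint32_t h, uint32_t cu)
{
    EncoderParams p = { w, h, cu };
    return p;
}

static CtuGrid ctuGrid(const EncoderParams& p)
{
    assert(p.maxCUSize > 0);
    CtuGrid g;
    // Round up so a partial CTU at the edge still gets a row/column.
    g.numCols = (p.sourceWidth + p.maxCUSize - 1) / p.maxCUSize;
    g.numRows = (p.sourceHeight + p.maxCUSize - 1) / p.maxCUSize;
    // A remainder of zero means the last CTU is full size.
    g.lastWidth = (p.sourceWidth % p.maxCUSize) ? (p.sourceWidth % p.maxCUSize) : p.maxCUSize;
    g.lastHeight = (p.sourceHeight % p.maxCUSize) ? (p.sourceHeight % p.maxCUSize) : p.maxCUSize;
    return g;
}

// Same shape as getCUWidth() in the hunk above, but on the toy types.
static uint32_t cuWidth(const CtuGrid& g, const EncoderParams& p, uint32_t col)
{
    return (col == g.numCols - 1) ? g.lastWidth : p.maxCUSize;
}
```

Because nothing here is global, two parameter sets with different maxCUSize values (for example 64 and 16) can be evaluated side by side, which is the behaviour the changelog describes for multiple encoder instances.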
x265_2.4.tar.gz/source/encoder/motion.cpp -> x265_2.5.tar.gz/source/encoder/motion.cpp Changed
158
 
1
@@ -598,6 +598,139 @@
2
     }
3
 }
4
 
5
+void MotionEstimate::refineMV(ReferencePlanes* ref,
6
+                              const MV&        mvmin,
7
+                              const MV&        mvmax,
8
+                              const MV&        qmvp,
9
+                              MV&              outQMv)
10
+{
11
+    ALIGN_VAR_16(int, costs[16]);
12
+    if (ctuAddr >= 0)
13
+        blockOffset = ref->reconPic->getLumaAddr(ctuAddr, absPartIdx) - ref->reconPic->getLumaAddr(0);
14
+    intptr_t stride = ref->lumaStride;
15
+    pixel* fenc = fencPUYuv.m_buf[0];
16
+    pixel* fref = ref->fpelPlane[0] + blockOffset;
17
+    
18
+    setMVP(qmvp);
19
+    
20
+    MV qmvmin = mvmin.toQPel();
21
+    MV qmvmax = mvmax.toQPel();
22
+   
23
+    /* The term cost used here means satd/sad values for that particular search.
24
+     * The costs used in ME integer search include only the SAD cost of motion
25
+     * residual and sqrtLambda times MVD bits.  The subpel refine steps use SATD
26
+     * cost of residual and sqrtLambda * MVD bits.
27
+    */
28
+             
29
+    // measure SATD cost at clipped QPEL MVP
30
+    MV pmv = qmvp.clipped(qmvmin, qmvmax);
31
+    MV bestpre = pmv;
32
+    int bprecost;
33
+
34
+    bprecost = subpelCompare(ref, pmv, sad);
35
+
36
+    /* re-measure full pel rounded MVP with SAD as search start point */
37
+    MV bmv = pmv.roundToFPel();
38
+    int bcost = bprecost;
39
+    if (pmv.isSubpel())
40
+        bcost = sad(fenc, FENC_STRIDE, fref + bmv.x + bmv.y * stride, stride) + mvcost(bmv << 2);
41
+
42
+    /* square refine */
43
+    int dir = 0;
44
+    COST_MV_X4_DIR(0, -1, 0, 1, -1, 0, 1, 0, costs);
45
+    if ((bmv.y - 1 >= mvmin.y) & (bmv.y - 1 <= mvmax.y))
46
+        COPY2_IF_LT(bcost, costs[0], dir, 1);
47
+    if ((bmv.y + 1 >= mvmin.y) & (bmv.y + 1 <= mvmax.y))
48
+        COPY2_IF_LT(bcost, costs[1], dir, 2);
49
+    COPY2_IF_LT(bcost, costs[2], dir, 3);
50
+    COPY2_IF_LT(bcost, costs[3], dir, 4);
51
+    COST_MV_X4_DIR(-1, -1, -1, 1, 1, -1, 1, 1, costs);
52
+    if ((bmv.y - 1 >= mvmin.y) & (bmv.y - 1 <= mvmax.y))
53
+        COPY2_IF_LT(bcost, costs[0], dir, 5);
54
+    if ((bmv.y + 1 >= mvmin.y) & (bmv.y + 1 <= mvmax.y))
55
+        COPY2_IF_LT(bcost, costs[1], dir, 6);
56
+    if ((bmv.y - 1 >= mvmin.y) & (bmv.y - 1 <= mvmax.y))
57
+        COPY2_IF_LT(bcost, costs[2], dir, 7);
58
+    if ((bmv.y + 1 >= mvmin.y) & (bmv.y + 1 <= mvmax.y))
59
+        COPY2_IF_LT(bcost, costs[3], dir, 8);
60
+    bmv += square1[dir];
61
+
62
+    if (bprecost < bcost)
63
+    {
64
+        bmv = bestpre;
65
+        bcost = bprecost;
66
+    }
67
+    else
68
+        bmv = bmv.toQPel(); // promote search bmv to qpel
69
+
70
+    // TO DO: Change SubpelWorkload to fine tune MV
71
+    // Now it is set to 5 for experiment.
72
+    // const SubpelWorkload& wl = workload[this->subpelRefine];
73
+    const SubpelWorkload& wl = workload[5];
74
+
75
+    pixelcmp_t hpelcomp;
76
+
77
+    if (wl.hpel_satd)
78
+    {
79
+        bcost = subpelCompare(ref, bmv, satd) + mvcost(bmv);
80
+        hpelcomp = satd;
81
+    }
82
+    else
83
+        hpelcomp = sad;
84
+
85
+    for (int iter = 0; iter < wl.hpel_iters; iter++)
86
+    {
87
+        int bdir = 0;
88
+        for (int i = 1; i <= wl.hpel_dirs; i++)
89
+        {
90
+            MV qmv = bmv + square1[i] * 2;            
91
+
92
+            // check mv range for slice bound
93
+            if ((qmv.y < qmvmin.y) | (qmv.y > qmvmax.y))
94
+                continue;
95
+
96
+            int cost = subpelCompare(ref, qmv, hpelcomp) + mvcost(qmv);
97
+            COPY2_IF_LT(bcost, cost, bdir, i);
98
+        }
99
+
100
+        if (bdir)
101
+            bmv += square1[bdir] * 2;            
102
+        else
103
+            break;
104
+    }
105
+
106
+    /* if HPEL search used SAD, remeasure with SATD before QPEL */
107
+    if (!wl.hpel_satd)
108
+        bcost = subpelCompare(ref, bmv, satd) + mvcost(bmv);
109
+
110
+    for (int iter = 0; iter < wl.qpel_iters; iter++)
111
+    {
112
+        int bdir = 0;
113
+        for (int i = 1; i <= wl.qpel_dirs; i++)
114
+        {
115
+            MV qmv = bmv + square1[i];
116
+            
117
+            // check mv range for slice bound
118
+            if ((qmv.y < qmvmin.y) | (qmv.y > qmvmax.y))
119
+                continue;
120
+
121
+            int cost = subpelCompare(ref, qmv, satd) + mvcost(qmv);
122
+            COPY2_IF_LT(bcost, cost, bdir, i);
123
+        }
124
+
125
+        if (bdir)
126
+            bmv += square1[bdir];
127
+        else
128
+            break;
129
+    }
130
+
131
+    // check mv range for slice bound
132
+    X265_CHECK(((pmv.y >= qmvmin.y) & (pmv.y <= qmvmax.y)), "mv beyond range!");
133
+    
134
+    x265_emms();
135
+    outQMv = bmv;
136
+}
137
+
138
 int MotionEstimate::motionEstimate(ReferencePlanes *ref,
139
                                    const MV &       mvmin,
140
                                    const MV &       mvmax,
141
@@ -606,6 +739,7 @@
142
                                    const MV *       mvc,
143
                                    int              merange,
144
                                    MV &             outQMv,
145
+                                   uint32_t         maxSlices,
146
                                    pixel *          srcReferencePlane)
147
 {
148
     ALIGN_VAR_16(int, costs[16]);
149
@@ -1306,7 +1440,7 @@
150
     const SubpelWorkload& wl = workload[this->subpelRefine];
151
 
152
     // check mv range for slice bound
153
-    if ((g_maxSlices > 1) & ((bmv.y < qmvmin.y) | (bmv.y > qmvmax.y)))
154
+    if ((maxSlices > 1) & ((bmv.y < qmvmin.y) | (bmv.y > qmvmax.y)))
155
     {
156
         bmv.y = x265_min(x265_max(bmv.y, qmvmin.y), qmvmax.y);
157
         bcost = subpelCompare(ref, bmv, satd) + mvcost(bmv);
158
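Note on refineMV() above: the half-pel and quarter-pel loops share one pattern, namely try the neighbours in a square pattern, skip candidates outside the vertical range, move to the best one, and stop when nothing improves. The following is a toy sketch of that pattern only. The cost function is a made-up squared distance rather than SAD/SATD plus MV bits, the offset table order is inferred from the COST_MV_X4_DIR arguments in the hunk, and MV, refinePass and refineToy are hypothetical names.

```cpp
#include <cassert>
#include <functional>

struct MV { int x, y; };

// Direction 0 is "stay"; 1..4 are the axis neighbours, 5..8 the diagonals.
static const MV kSquare1[9] = {
    { 0, 0 }, { 0, -1 }, { 0, 1 }, { -1, 0 }, { 1, 0 },
    { -1, -1 }, { -1, 1 }, { 1, -1 }, { 1, 1 }
};

typedef std::function<int(const MV&)> CostFn;

// One refinement stage: step = 2 mimics half-pel moves in quarter-pel units,
// step = 1 mimics quarter-pel moves.
static MV refinePass(MV best, int& bestCost, const CostFn& cost,
                     int step, int iters, int dirs, int ymin, int ymax)
{
    assert(dirs >= 1 && dirs <= 8);
    for (int iter = 0; iter < iters; iter++)
    {
        int bdir = 0;
        for (int i = 1; i <= dirs; i++)
        {
            MV cand = { best.x + kSquare1[i].x * step, best.y + kSquare1[i].y * step };
            if (cand.y < ymin || cand.y > ymax)
                continue; // outside the allowed vertical range
            int c = cost(cand);
            if (c < bestCost)
            {
                bestCost = c;
                bdir = i;
            }
        }
        if (!bdir)
            break; // no neighbour improved the cost
        best.x += kSquare1[bdir].x * step;
        best.y += kSquare1[bdir].y * step;
    }
    return best;
}

// Toy driver: squared distance to a target stands in for the real cost.
static MV refineToy(int sx, int sy, int tx, int ty, int ymin, int ymax)
{
    CostFn cost = [tx, ty](const MV& m) {
        int dx = m.x - tx, dy = m.y - ty;
        return dx * dx + dy * dy;
    };
    MV start = { sx, sy };
    int bestCost = cost(start);
    MV mv = refinePass(start, bestCost, cost, 2, 4, 8, ymin, ymax); // coarse stage
    return refinePass(mv, bestCost, cost, 1, 4, 8, ymin, ymax);     // fine stage
}
```

With an unrestricted range the toy search starting at (0, 0) reaches a target at (5, -3). With ymin = -2 it stops at (5, -2), which illustrates how the range check keeps the result inside the vertical bound.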
x265_2.4.tar.gz/source/encoder/motion.h -> x265_2.5.tar.gz/source/encoder/motion.h Changed
11
 
1
@@ -92,7 +92,8 @@
2
                chromaSatd(refYuv.getCrAddr(puPartIdx), refYuv.m_csize, fencPUYuv.m_buf[2], fencPUYuv.m_csize);
3
     }
4
 
5
-    int motionEstimate(ReferencePlanes* ref, const MV & mvmin, const MV & mvmax, const MV & qmvp, int numCandidates, const MV * mvc, int merange, MV & outQMv, pixel *srcReferencePlane = 0);
6
+    void refineMV(ReferencePlanes* ref, const MV& mvmin, const MV& mvmax, const MV& qmvp, MV& outQMv);
7
+    int motionEstimate(ReferencePlanes* ref, const MV & mvmin, const MV & mvmax, const MV & qmvp, int numCandidates, const MV * mvc, int merange, MV & outQMv, uint32_t maxSlices, pixel *srcReferencePlane = 0);
8
 
9
     int subpelCompare(ReferencePlanes* ref, const MV &qmv, pixelcmp_t);
10
 
11
x265_2.4.tar.gz/source/encoder/ratecontrol.cpp -> x265_2.5.tar.gz/source/encoder/ratecontrol.cpp Changed
52
 
1
@@ -2272,7 +2272,7 @@
2
             uint32_t refRowSatdCost = 0, refRowBits = 0, intraCostForPendingCus = 0;
3
             double refQScale = 0;
4
 
5
-            if (picType != I_SLICE)
6
+            if (picType != I_SLICE && !m_param->rc.bEnableConstVbv)
7
             {
8
                 FrameData& refEncData = *refFrame->m_encData;
9
                 uint32_t endCuAddr = maxCols * (row + 1);
10
@@ -2301,7 +2301,8 @@
11
                     && refFrame 
12
                     && refFrame->m_encData->m_slice->m_sliceType == picType
13
                     && refQScale > 0
14
-                    && refRowSatdCost > 0)
15
+                    && refRowBits > 0
16
+                    && !m_param->rc.bEnableConstVbv)
17
                 {
18
                     if (abs((int32_t)(refRowSatdCost - satdCostForPendingCus)) < (int32_t)satdCostForPendingCus / 2)
19
                     {
20
@@ -2343,7 +2344,7 @@
21
     }
22
     rowSatdCost >>= X265_DEPTH - 8;
23
     updatePredictor(rce->rowPred[0], qScaleVbv, (double)rowSatdCost, encodedBits);
24
-    if (curEncData.m_slice->m_sliceType != I_SLICE)
25
+    if (curEncData.m_slice->m_sliceType != I_SLICE && !m_param->rc.bEnableConstVbv)
26
     {
27
         Frame* refFrame = curEncData.m_slice->m_refFrameList[0][0];
28
         if (qpVbv < refFrame->m_encData->m_rowStat[row].rowQp)
29
@@ -2613,7 +2614,7 @@
30
             for (uint32_t i = 0; i < slice->m_sps->numCuInHeight; i++)
31
                 avgQpAq += curEncData.m_rowStat[i].sumQpAq;
32
 
33
-            avgQpAq /= (slice->m_sps->numCUsInFrame * NUM_4x4_PARTITIONS);
34
+            avgQpAq /= (slice->m_sps->numCUsInFrame * m_param->num4x4Partitions);
35
             curEncData.m_avgQpAq = avgQpAq;
36
         }
37
         else
38
@@ -2711,6 +2712,13 @@
39
     {
40
         *filler = updateVbv(actualBits, rce);
41
 
42
+        curFrame->m_rcData->bufferFillFinal = m_bufferFillFinal;
43
+        for (int i = 0; i < 4; i++)
44
+        {
45
+            curFrame->m_rcData->coeff[i] = m_pred[i].coeff;
46
+            curFrame->m_rcData->count[i] = m_pred[i].count;
47
+            curFrame->m_rcData->offset[i] = m_pred[i].offset;
48
+        }
49
         if (m_param->bEmitHRDSEI)
50
         {
51
             const VUI *vui = &curEncData.m_slice->m_sps->vuiParameters;
52
x265_2.4.tar.gz/source/encoder/reference.cpp -> x265_2.5.tar.gz/source/encoder/reference.cpp Changed
36
 
1
@@ -72,12 +72,12 @@
2
 
3
     if (wp)
4
     {
5
-        uint32_t numCUinHeight = (reconPic->m_picHeight + g_maxCUSize - 1) / g_maxCUSize;
6
+        uint32_t numCUinHeight = (reconPic->m_picHeight + p.maxCUSize - 1) / p.maxCUSize;
7
 
8
         int marginX = reconPic->m_lumaMarginX;
9
         int marginY = reconPic->m_lumaMarginY;
10
         intptr_t stride = reconPic->m_stride;
11
-        int cuHeight = g_maxCUSize;
12
+        int cuHeight = p.maxCUSize;
13
 
14
         for (int c = 0; c < (p.internalCsp != X265_CSP_I400 && recPic->m_picCsp != X265_CSP_I400 ? numInterpPlanes : 1); c++)
15
         {
16
@@ -127,15 +127,15 @@
17
     int marginY = reconPic->m_lumaMarginY;
18
     intptr_t stride = reconPic->m_stride;
19
     int width   = reconPic->m_picWidth;
20
-    int height  = (finishedRows - numWeightedRows) * g_maxCUSize;
21
+    int height  = (finishedRows - numWeightedRows) * reconPic->m_param->maxCUSize;
22
     /* the last row may be partial height */
23
     if (finishedRows == maxNumRows - 1)
24
     {
25
-        const int leftRows = (reconPic->m_picHeight & (g_maxCUSize - 1));
26
+        const int leftRows = (reconPic->m_picHeight & (reconPic->m_param->maxCUSize - 1));
27
 
28
-        height += leftRows ? leftRows : g_maxCUSize;
29
+        height += leftRows ? leftRows : reconPic->m_param->maxCUSize;
30
     }
31
-    int cuHeight = g_maxCUSize;
32
+    int cuHeight = reconPic->m_param->maxCUSize;
33
 
34
     for (int c = 0; c < numInterpPlanes; c++)
35
     {
36
x265_2.4.tar.gz/source/encoder/sao.cpp -> x265_2.5.tar.gz/source/encoder/sao.cpp Changed
118
 
1
@@ -98,8 +98,8 @@
2
     m_hChromaShift = CHROMA_H_SHIFT(param->internalCsp);
3
     m_vChromaShift = CHROMA_V_SHIFT(param->internalCsp);
4
 
5
-    m_numCuInWidth =  (m_param->sourceWidth + g_maxCUSize - 1) / g_maxCUSize;
6
-    m_numCuInHeight = (m_param->sourceHeight + g_maxCUSize - 1) / g_maxCUSize;
7
+    m_numCuInWidth =  (m_param->sourceWidth + m_param->maxCUSize - 1) / m_param->maxCUSize;
8
+    m_numCuInHeight = (m_param->sourceHeight + m_param->maxCUSize - 1) / m_param->maxCUSize;
9
 
10
     const pixel maxY = (1 << X265_DEPTH) - 1;
11
     const pixel rangeExt = maxY >> 1;
12
@@ -107,12 +107,12 @@
13
 
14
     for (int i = 0; i < (param->internalCsp != X265_CSP_I400 ? 3 : 1); i++)
15
     {
16
-        CHECKED_MALLOC(m_tmpL1[i], pixel, g_maxCUSize + 1);
17
-        CHECKED_MALLOC(m_tmpL2[i], pixel, g_maxCUSize + 1);
18
+        CHECKED_MALLOC(m_tmpL1[i], pixel, m_param->maxCUSize + 1);
19
+        CHECKED_MALLOC(m_tmpL2[i], pixel, m_param->maxCUSize + 1);
20
 
21
         // SAO asm code will read 1 pixel before and after, so pad by 2
22
         // NOTE: m_param->sourceWidth+2 enough, to avoid condition check in copySaoAboveRef(), I alloc more up to 63 bytes in here
23
-        CHECKED_MALLOC(m_tmpU[i], pixel, m_numCuInWidth * g_maxCUSize + 2 + 32);
24
+        CHECKED_MALLOC(m_tmpU[i], pixel, m_numCuInWidth * m_param->maxCUSize + 2 + 32);
25
         m_tmpU[i] += 1;
26
     }
27
 
28
@@ -279,8 +279,8 @@
29
     uint32_t picWidth  = m_param->sourceWidth;
30
     uint32_t picHeight = m_param->sourceHeight;
31
     const CUData* cu = m_frame->m_encData->getPicCTU(addr);
32
-    int ctuWidth = g_maxCUSize;
33
-    int ctuHeight = g_maxCUSize;
34
+    int ctuWidth = m_param->maxCUSize;
35
+    int ctuHeight = m_param->maxCUSize;
36
     uint32_t lpelx = cu->m_cuPelX;
37
     uint32_t tpely = cu->m_cuPelY;
38
     const uint32_t firstRowInSlice = cu->m_bFirstRowInSlice;
39
@@ -573,8 +573,8 @@
40
 {
41
     PicYuv* reconPic = m_frame->m_reconPic;
42
     intptr_t stride = reconPic->m_stride;
43
-    int ctuWidth  = g_maxCUSize;
44
-    int ctuHeight = g_maxCUSize;
45
+    int ctuWidth = m_param->maxCUSize;
46
+    int ctuHeight = m_param->maxCUSize;
47
 
48
     int addr = idxY * m_numCuInWidth + idxX;
49
     pixel* rec = reconPic->getLumaAddr(addr);
50
@@ -633,8 +633,8 @@
51
 {
52
     PicYuv* reconPic = m_frame->m_reconPic;
53
     intptr_t stride = reconPic->m_strideC;
54
-    int ctuWidth  = g_maxCUSize;
55
-    int ctuHeight = g_maxCUSize;
56
+    int ctuWidth  = m_param->maxCUSize;
57
+    int ctuHeight = m_param->maxCUSize;
58
 
59
     {
60
         ctuWidth  >>= m_hChromaShift;
61
@@ -744,8 +744,8 @@
62
     intptr_t stride = plane ? reconPic->m_strideC : reconPic->m_stride;
63
     uint32_t picWidth  = m_param->sourceWidth;
64
     uint32_t picHeight = m_param->sourceHeight;
65
-    int ctuWidth  = g_maxCUSize;
66
-    int ctuHeight = g_maxCUSize;
67
+    int ctuWidth  = m_param->maxCUSize;
68
+    int ctuHeight = m_param->maxCUSize;
69
     uint32_t lpelx = cu->m_cuPelX;
70
     uint32_t tpely = cu->m_cuPelY;
71
     const uint32_t firstRowInSlice = cu->m_bFirstRowInSlice;
72
@@ -791,9 +791,9 @@
73
         // WARNING: *) May read beyond bound on video than ctuWidth or ctuHeight is NOT multiple of cuSize
74
         X265_CHECK((ctuWidth == ctuHeight) || (m_chromaFormat != X265_CSP_I420), "video size check failure\n");
75
         if (plane)
76
-            primitives.chroma[m_chromaFormat].cu[g_maxLog2CUSize - 2].sub_ps(diff, MAX_CU_SIZE, fenc0, rec0, stride, stride);
77
+            primitives.chroma[m_chromaFormat].cu[m_param->maxLog2CUSize - 2].sub_ps(diff, MAX_CU_SIZE, fenc0, rec0, stride, stride);
78
         else
79
-           primitives.cu[g_maxLog2CUSize - 2].sub_ps(diff, MAX_CU_SIZE, fenc0, rec0, stride, stride);
80
+           primitives.cu[m_param->maxLog2CUSize - 2].sub_ps(diff, MAX_CU_SIZE, fenc0, rec0, stride, stride);
81
     }
82
     else
83
     {
84
@@ -928,8 +928,8 @@
85
     intptr_t stride = reconPic->m_stride;
86
     uint32_t picWidth  = m_param->sourceWidth;
87
     uint32_t picHeight = m_param->sourceHeight;
88
-    int ctuWidth  = g_maxCUSize;
89
-    int ctuHeight = g_maxCUSize;
90
+    int ctuWidth  = m_param->maxCUSize;
91
+    int ctuHeight = m_param->maxCUSize;
92
     uint32_t lpelx = cu->m_cuPelX;
93
     uint32_t tpely = cu->m_cuPelY;
94
     const uint32_t firstRowInSlice = cu->m_bFirstRowInSlice;
95
@@ -1553,14 +1553,17 @@
96
     }
97
 
98
     // Estimate Best Position
99
-    int64_t bestRDCostBO = MAX_INT64;
100
     int32_t bestClassBO  = 0;
101
+    int64_t currentRDCost = costClasses[0];
102
+    currentRDCost += costClasses[1];
103
+    currentRDCost += costClasses[2];
104
+    currentRDCost += costClasses[3];
105
+    int64_t bestRDCostBO = currentRDCost;
106
 
107
-    for (int i = 0; i < MAX_NUM_SAO_CLASS - SAO_NUM_OFFSET + 1; i++)
108
+    for (int i = 1; i < MAX_NUM_SAO_CLASS - SAO_NUM_OFFSET + 1; i++)
109
     {
110
-        int64_t currentRDCost = 0;
111
-        for (int j = i; j < i + SAO_NUM_OFFSET; j++)
112
-            currentRDCost += costClasses[j];
113
+        currentRDCost -= costClasses[i - 1];
114
+        currentRDCost += costClasses[i + 3];
115
 
116
         if (currentRDCost < bestRDCostBO)
117
         {
118
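Note on the last sao.cpp hunk: the nested loop that re-added SAO_NUM_OFFSET costs for every band position is replaced by a running sum that drops one class and adds the next. A minimal sketch comparing the two forms is given here. The constants 32 and 4 are assumptions about MAX_NUM_SAO_CLASS and SAO_NUM_OFFSET, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

static const int kNumBands = 32;  // assumed value of MAX_NUM_SAO_CLASS
static const int kNumOffsets = 4; // assumed value of SAO_NUM_OFFSET

// Old form: recompute the sum of four consecutive classes for every start.
static int bestBandNaive(const std::vector<int64_t>& cost)
{
    int best = 0;
    int64_t bestCost = std::numeric_limits<int64_t>::max();
    for (int i = 0; i < kNumBands - kNumOffsets + 1; i++)
    {
        int64_t cur = 0;
        for (int j = i; j < i + kNumOffsets; j++)
            cur += cost[j];
        if (cur < bestCost)
        {
            bestCost = cur;
            best = i;
        }
    }
    return best;
}

// New form: keep a running window sum, subtract the class that leaves the
// window and add the class that enters it.
static int bestBandSliding(const std::vector<int64_t>& cost)
{
    assert(cost.size() >= static_cast<std::size_t>(kNumBands));
    int64_t cur = 0;
    for (int j = 0; j < kNumOffsets; j++)
        cur += cost[j];
    int best = 0;
    int64_t bestCost = cur;
    for (int i = 1; i < kNumBands - kNumOffsets + 1; i++)
    {
        cur -= cost[i - 1];
        cur += cost[i + kNumOffsets - 1];
        if (cur < bestCost)
        {
            bestCost = cur;
            best = i;
        }
    }
    return best;
}

// Demo input: every class costs 100 except classes 10..13, which cost 1.
static std::vector<int64_t> demoCosts()
{
    std::vector<int64_t> c(kNumBands, 100);
    for (int i = 10; i < 14; i++)
        c[i] = 1;
    return c;
}
```

Both functions use a strict less-than comparison, so ties resolve to the same (first) start position. The sliding form does a constant amount of work per position instead of kNumOffsets additions.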
x265_2.4.tar.gz/source/encoder/search.cpp -> x265_2.5.tar.gz/source/encoder/search.cpp Changed
127
 
1
@@ -120,8 +120,8 @@
2
             CHECKED_MALLOC(m_rqt[i].coeffRQT[0], coeff_t, sizeL + sizeC * 2);
3
             m_rqt[i].coeffRQT[1] = m_rqt[i].coeffRQT[0] + sizeL;
4
             m_rqt[i].coeffRQT[2] = m_rqt[i].coeffRQT[0] + sizeL + sizeC;
5
-            ok &= m_rqt[i].reconQtYuv.create(g_maxCUSize, param.internalCsp);
6
-            ok &= m_rqt[i].resiQtYuv.create(g_maxCUSize, param.internalCsp);
7
+            ok &= m_rqt[i].reconQtYuv.create(param.maxCUSize, param.internalCsp);
8
+            ok &= m_rqt[i].resiQtYuv.create(param.maxCUSize, param.internalCsp);
9
         }
10
     }
11
     else
@@ -130,15 +130,15 @@
         {
             CHECKED_MALLOC(m_rqt[i].coeffRQT[0], coeff_t, sizeL);
             m_rqt[i].coeffRQT[1] = m_rqt[i].coeffRQT[2] = NULL;
-            ok &= m_rqt[i].reconQtYuv.create(g_maxCUSize, param.internalCsp);
-            ok &= m_rqt[i].resiQtYuv.create(g_maxCUSize, param.internalCsp);
+            ok &= m_rqt[i].reconQtYuv.create(param.maxCUSize, param.internalCsp);
+            ok &= m_rqt[i].resiQtYuv.create(param.maxCUSize, param.internalCsp);
         }
     }
 
     /* the rest of these buffers are indexed per-depth */
-    for (uint32_t i = 0; i <= g_maxCUDepth; i++)
+    for (uint32_t i = 0; i <= m_param->maxCUDepth; i++)
     {
-        int cuSize = g_maxCUSize >> i;
+        int cuSize = param.maxCUSize >> i;
         ok &= m_rqt[i].tmpResiYuv.create(cuSize, param.internalCsp);
         ok &= m_rqt[i].tmpPredYuv.create(cuSize, param.internalCsp);
         ok &= m_rqt[i].bidirPredYuv[0].create(cuSize, param.internalCsp);
@@ -186,7 +186,7 @@
         m_rqt[i].resiQtYuv.destroy();
     }
 
-    for (uint32_t i = 0; i <= g_maxCUDepth; i++)
+    for (uint32_t i = 0; i <= m_param->maxCUDepth; i++)
     {
         m_rqt[i].tmpResiYuv.destroy();
         m_rqt[i].tmpPredYuv.destroy();
@@ -2073,7 +2073,7 @@
     int mvpIdx = selectMVP(interMode.cu, pu, amvp, list, ref);
     MV mvmin, mvmax, outmv, mvp = amvp[mvpIdx];
 
-    if (!m_param->analysisMode) /* Prevents load/save outputs from diverging if lowresMV is not available */
+    if (!m_param->analysisReuseMode) /* Prevents load/save outputs from diverging if lowresMV is not available */
     {
         MV lmv = getLowresMV(interMode.cu, pu, list, ref);
         if (lmv.notZero())
@@ -2082,7 +2082,7 @@
 
     setSearchRange(interMode.cu, mvp, m_param->searchRange, mvmin, mvmax);
 
-    int satdCost = m_me.motionEstimate(&m_slice->m_mref[list][ref], mvmin, mvmax, mvp, numMvc, mvc, m_param->searchRange, outmv, 
+    int satdCost = m_me.motionEstimate(&m_slice->m_mref[list][ref], mvmin, mvmax, mvp, numMvc, mvc, m_param->searchRange, outmv, m_param->maxSlices, 
       m_param->bSourceReferenceEstimation ? m_slice->m_refFrameList[list][ref]->m_fencPic->getLumaAddr(0) : 0);
 
     /* Get total cost of partition, but only include MV bit cost once */
@@ -2108,6 +2108,17 @@
     }
 }
 
+void Search::searchMV(Mode& interMode, const PredictionUnit& pu, int list, int ref, MV& outmv)
+{
+    CUData& cu = interMode.cu;
+    const Slice *slice = m_slice;
+    MV mv = cu.m_mv[list][pu.puAbsPartIdx];
+    cu.clipMv(mv);
+    MV mvmin, mvmax;
+    setSearchRange(cu, mv, m_param->searchRange, mvmin, mvmax);
+    m_me.refineMV(&slice->m_mref[list][ref], mvmin, mvmax, mv, outmv);
+}
+
 /* find the best inter prediction for each PU of specified mode */
 void Search::predInterSearch(Mode& interMode, const CUGeom& cuGeom, bool bChromaMC, uint32_t refMasks[2])
 {
@@ -2150,7 +2161,7 @@
         cu.getNeighbourMV(puIdx, pu.puAbsPartIdx, interMode.interNeighbours);
 
         /* Uni-directional prediction */
-        if ((m_param->analysisMode == X265_ANALYSIS_LOAD && m_param->analysisRefineLevel > 1)
+        if ((m_param->analysisReuseMode == X265_ANALYSIS_LOAD && m_param->analysisReuseLevel > 1 && m_param->analysisReuseLevel != 10)
             || (m_param->analysisMultiPassRefine && m_param->rc.bStatRead))
         {
             for (int list = 0; list < numPredDir; list++)
@@ -2180,7 +2191,7 @@
                 if (m_param->analysisMultiPassRefine && m_param->rc.bStatRead && mvpIdx == bestME[list].mvpIdx)
                     mvpIn = bestME[list].mv;
                     
-                int satdCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvpIn, numMvc, mvc, m_param->searchRange, outmv,
+                int satdCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvpIn, numMvc, mvc, m_param->searchRange, outmv, m_param->maxSlices, 
                   m_param->bSourceReferenceEstimation ? m_slice->m_refFrameList[list][ref]->m_fencPic->getLumaAddr(0) : 0);
 
                 /* Get total cost of partition, but only include MV bit cost once */
@@ -2286,7 +2297,7 @@
                     int mvpIdx = selectMVP(cu, pu, amvp, list, ref);
                     MV mvmin, mvmax, outmv, mvp = amvp[mvpIdx];
 
-                    if (!m_param->analysisMode) /* Prevents load/save outputs from diverging when lowresMV is not available */
+                    if (!m_param->analysisReuseMode) /* Prevents load/save outputs from diverging when lowresMV is not available */
                     {
                         MV lmv = getLowresMV(cu, pu, list, ref);
                         if (lmv.notZero())
@@ -2300,7 +2311,7 @@
                             m_me.integral[planes] = interMode.fencYuv->m_integral[list][ref][planes] + puX * pu.width + puY * pu.height * m_slice->m_refFrameList[list][ref]->m_reconPic->m_stride;
                     }
                     setSearchRange(cu, mvp, m_param->searchRange, mvmin, mvmax);
-                    int satdCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvp, numMvc, mvc, m_param->searchRange, outmv, 
+                    int satdCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvp, numMvc, mvc, m_param->searchRange, outmv, m_param->maxSlices, 
                       m_param->bSourceReferenceEstimation ? m_slice->m_refFrameList[list][ref]->m_fencPic->getLumaAddr(0) : 0);
 
                     /* Get total cost of partition, but only include MV bit cost once */
@@ -2582,11 +2593,11 @@
     cu.clipMv(mvmax);
 
     if (cu.m_encData->m_param->bIntraRefresh && m_slice->m_sliceType == P_SLICE &&
-          cu.m_cuPelX / g_maxCUSize < m_frame->m_encData->m_pir.pirStartCol &&
+          cu.m_cuPelX / m_param->maxCUSize < m_frame->m_encData->m_pir.pirStartCol &&
           m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol < m_slice->m_sps->numCuInWidth)
     {
         int safeX, maxSafeMv;
-        safeX = m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol * g_maxCUSize - 3;
+        safeX = m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol * m_param->maxCUSize - 3;
         maxSafeMv = (safeX - cu.m_cuPelX) * 4;
         mvmax.x = X265_MIN(mvmax.x, maxSafeMv);
         mvmin.x = X265_MIN(mvmin.x, maxSafeMv);
x265_2.4.tar.gz/source/encoder/search.h -> x265_2.5.tar.gz/source/encoder/search.h Changed
@@ -204,9 +204,9 @@
         memset(this, 0, sizeof(*this));
     }
 
-    void accumulate(CUStats& other)
+    void accumulate(CUStats& other, x265_param& param)
     {
-        for (uint32_t i = 0; i <= g_maxCUDepth; i++)
+        for (uint32_t i = 0; i <= param.maxCUDepth; i++)
         {
             intraRDOElapsedTime[i] += other.intraRDOElapsedTime[i];
             interRDOElapsedTime[i] += other.interRDOElapsedTime[i];
@@ -311,6 +311,7 @@
     // estimation inter prediction (non-skip)
     void     predInterSearch(Mode& interMode, const CUGeom& cuGeom, bool bChromaMC, uint32_t masks[2]);
 
+    void     searchMV(Mode& interMode, const PredictionUnit& pu, int list, int ref, MV& outmv);
     // encode residual and compute rd-cost for inter mode
     void     encodeResAndCalcRdInterCU(Mode& interMode, const CUGeom& cuGeom);
     void     encodeResAndCalcRdSkipCU(Mode& interMode);
x265_2.4.tar.gz/source/encoder/sei.cpp -> x265_2.5.tar.gz/source/encoder/sei.cpp Changed
@@ -54,21 +54,23 @@
     }
     WRITE_CODE(type, 8, "payload_type");
     uint32_t payloadSize;
-    if (hrdTypes || m_payloadType == USER_DATA_UNREGISTERED)
+    if (hrdTypes || m_payloadType == USER_DATA_UNREGISTERED || m_payloadType == USER_DATA_REGISTERED_ITU_T_T35)
     {
         if (hrdTypes)
         {
             X265_CHECK(0 == (count.getNumberOfWrittenBits() & 7), "payload unaligned\n");
             payloadSize = count.getNumberOfWrittenBits() >> 3;
         }
-        else
+        else if (m_payloadType == USER_DATA_UNREGISTERED)
             payloadSize = m_payloadSize + 16;
+        else
+            payloadSize = m_payloadSize;
 
         for (; payloadSize >= 0xff; payloadSize -= 0xff)
             WRITE_CODE(0xff, 8, "payload_size");
         WRITE_CODE(payloadSize, 8, "payload_size");
     }
-    else if(m_payloadType != USER_DATA_REGISTERED_ITU_T_T35)
+    else
         WRITE_CODE(m_payloadSize, 8, "payload_size");
     /* virtual writeSEI method, write to bs */
     writeSEI(sps);
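Review note on the sei.cpp hunk above: the `for (; payloadSize >= 0xff; ...)` loop emits the payload size as a run of 0xFF bytes followed by one final byte below 0xFF, and the T.35 registered user data payload now goes through that same path. The sketch below only illustrates this byte layout on a plain vector. `encodePayloadSize` and `decodePayloadSize` are hypothetical helpers, not x265 functions, and the decoder is an assumption about how a reader would undo the encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Emit the size as N bytes of 0xFF plus one trailing byte in [0, 0xFE],
// mirroring the loop shape in the hunk.
std::vector<uint8_t> encodePayloadSize(uint32_t payloadSize)
{
    std::vector<uint8_t> out;
    for (; payloadSize >= 0xff; payloadSize -= 0xff)
        out.push_back(0xff);
    out.push_back((uint8_t)payloadSize);
    return out;
}

// Inverse operation (assumed reader side): add 255 per leading 0xFF byte,
// then add the first byte that is not 0xFF.
uint32_t decodePayloadSize(const std::vector<uint8_t>& bytes)
{
    assert(!bytes.empty());
    uint32_t size = 0;
    std::size_t i = 0;
    while (i < bytes.size() && bytes[i] == 0xff)
    {
        size += 0xff;
        i++;
    }
    if (i < bytes.size())
        size += bytes[i];
    return size;
}
```

A size of exactly 255 encodes as 0xFF followed by 0x00, which is why the trailing byte is always written even when the remainder is zero.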
x265_2.4.tar.gz/source/encoder/sei.h -> x265_2.5.tar.gz/source/encoder/sei.h Changed
@@ -276,27 +276,17 @@
         m_payloadSize = 0;
     }
 
-    uint8_t *cim;
+    uint8_t *m_payload;
 
     // daniel.vt@samsung.com :: for the Creative Intent Meta Data Encoding ( seongnam.oh@samsung.com )
     void writeSEI(const SPS&)
     {
-        if (!cim)
+        if (!m_payload)
             return;
 
-        int i = 0;
-        int payloadSize = m_payloadSize;
-        while (cim[i] == 0xFF)
-        {
-            i++;
-            payloadSize += cim[i];
-            WRITE_CODE(0xFF, 8, "payload_size");
-        }
-        WRITE_CODE(payloadSize, 8, "payload_size");
-        i++;
-        payloadSize += i;
-        for (; i < payloadSize; ++i)
-            WRITE_CODE(cim[i], 8, "creative_intent_metadata");
+        uint32_t i = 0;
+        for (; i < m_payloadSize; ++i)
+            WRITE_CODE(m_payload[i], 8, "creative_intent_metadata");
     }
 };
 }
x265_2.4.tar.gz/source/encoder/slicetype.cpp -> x265_2.5.tar.gz/source/encoder/slicetype.cpp Changed
@@ -893,7 +893,7 @@
     if (m_param->rc.cuTree && !m_param->rc.bStatRead)
         /* update row satds based on cutree offsets */
         curFrame->m_lowres.satdCost = frameCostRecalculate(frames, p0, p1, b);
-    else if (m_param->analysisMode != X265_ANALYSIS_LOAD)
+    else if (m_param->analysisReuseMode != X265_ANALYSIS_LOAD || m_param->scaleFactor)
     {
         if (m_param->rc.aqMode)
             curFrame->m_lowres.satdCost = curFrame->m_lowres.costEstAq[b - p0][p1 - b];
@@ -907,7 +907,7 @@
         curFrame->m_lowres.lowresCostForRc = curFrame->m_lowres.lowresCosts[b - p0][p1 - b];
         uint32_t lowresRow = 0, lowresCol = 0, lowresCuIdx = 0, sum = 0, intraSum = 0;
         uint32_t scale = m_param->maxCUSize / (2 * X265_LOWRES_CU_SIZE);
-        uint32_t numCuInHeight = (m_param->sourceHeight + g_maxCUSize - 1) / g_maxCUSize;
+        uint32_t numCuInHeight = (m_param->sourceHeight + m_param->maxCUSize - 1) / m_param->maxCUSize;
         uint32_t widthInLowresCu = (uint32_t)m_8x8Width, heightInLowresCu = (uint32_t)m_8x8Height;
         double *qp_offset = 0;
         /* Factor in qpoffsets based on Aq/Cutree in CU costs */
@@ -1638,6 +1638,13 @@
             m_isSceneTransition = false; /* Signal end of scene transitioning */
     }
 
+    if (m_param->csvLogLevel >= 2)
+    {
+        int64_t icost = frames[p1]->costEst[0][0];
+        int64_t pcost = frames[p1]->costEst[p1 - p0][0];
+        frames[p1]->ipCostRatio = (double)icost / pcost;
+    }
+
     /* A frame is always analysed with bRealScenecut = true first, and then bRealScenecut = false,
        the former for I decisions and the latter for P/B decisions. It's possible that the first 
        analysis detected scenecuts which were later nulled due to scene transitioning, in which 
@@ -1812,7 +1819,8 @@
                     MV *mvs = frames[b]->lowresMvs[list][listDist[list]];
                     int32_t x = mvs[cuIndex].x;
                     int32_t y = mvs[cuIndex].y;
-                    displacement += sqrt(pow(abs(x), 2) + pow(abs(y), 2));
+                    // NOTE: the dynamic range of abs(x) and abs(y) is 15-bits
+                    displacement += sqrt((double)(abs(x) * abs(x)) + (double)(abs(y) * abs(y)));
                 }
                 else
                     displacement += 0.0;
@@ -2400,7 +2408,7 @@
 
         /* ME will never return a cost larger than the cost @MVP, so we do not
          * have to check that ME cost is more than the estimated merge cost */
-        fencCost = tld.me.motionEstimate(fref, mvmin, mvmax, mvp, 0, NULL, s_merange, *fencMV);
+        fencCost = tld.me.motionEstimate(fref, mvmin, mvmax, mvp, 0, NULL, s_merange, *fencMV, m_lookahead.m_param->maxSlices);
         if (skipCost < 64 && skipCost < fencCost && bBidir)
         {
             fencCost = skipCost;
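Review note on the slicetype.cpp displacement hunk above: the `pow()` calls are replaced by integer squares, and the added comment states that `abs(x)` and `abs(y)` have a 15-bit dynamic range. Under that assumption each square stays below 2^30 and fits a 32-bit int before the conversion to double. The sketch below restates that reasoning in isolation. `mvMagnitude` is a hypothetical helper, not part of x265.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>

// Euclidean length of a motion vector using integer squares.
// Assumes |x| and |y| are below 2^15, as the comment in the hunk states,
// so ax * ax and ay * ay cannot overflow a 32-bit signed int.
double mvMagnitude(int32_t x, int32_t y)
{
    int32_t ax = std::abs(x);
    int32_t ay = std::abs(y);
    assert(ax < (1 << 15) && ay < (1 << 15));
    return std::sqrt((double)(ax * ax) + (double)(ay * ay));
}
```

Each square is converted to double before the addition, as in the hunk, so the sum of two values near 2^30 is formed in floating point rather than in 32-bit integer arithmetic.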
x265_2.4.tar.gz/source/test/ipfilterharness.cpp -> x265_2.5.tar.gz/source/test/ipfilterharness.cpp Changed
@@ -38,10 +38,8 @@
     {
         pixel_test_buff[0][i] = rand() & PIXEL_MAX;
         short_test_buff[0][i] = (rand() % (2 * SMAX)) - SMAX;
-
         pixel_test_buff[1][i] = PIXEL_MIN;
-        short_test_buff[1][i] = SMIN;
-
+        short_test_buff[1][i] = (int16_t)SMIN;
         pixel_test_buff[2][i] = PIXEL_MAX;
         short_test_buff[2][i] = SMAX;
     }
x265_2.4.tar.gz/source/test/ipfilterharness.h -> x265_2.5.tar.gz/source/test/ipfilterharness.h Changed
@@ -39,8 +39,7 @@
     enum { ITERS = 100 };
     enum { TEST_CASES = 3 };
     enum { SMAX = 1 << 12 };
-    enum { SMIN = -1 << 12 };
-
+    enum { SMIN = (unsigned)-1 << 12 };
     ALIGN_VAR_32(pixel, pixel_buff[TEST_BUF_SIZE]);
     int16_t short_buff[TEST_BUF_SIZE];
     int16_t IPF_vec_output_s[TEST_BUF_SIZE];
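Review note on the SMIN change in the two ipfilterharness hunks above: left-shifting a negative value (`-1 << 12`) is undefined behaviour in C++ before C++20, which is presumably why the enumerator is now computed on an unsigned value and narrowed with `(int16_t)` at the point of use. The sketch below checks that the two steps together still give -4096. It assumes a 32-bit `unsigned` and the usual two's-complement narrowing (guaranteed only from C++20 on, implementation-defined earlier). `narrowedSmin` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

enum { SMAX = 1 << 12 };
// Unsigned shift is well defined: with a 32-bit unsigned this is 0xFFFFF000.
enum { SMIN = (unsigned)-1 << 12 };

static_assert(sizeof(unsigned) == 4, "sketch assumes a 32-bit unsigned");

// Narrowing keeps the low 16 bits (0xF000), which reads as -4096 in
// two's complement; this matches the cast added in the .cpp hunk.
int16_t narrowedSmin()
{
    return (int16_t)SMIN;
}

// Small self-check tying the narrowed value back to SMAX.
void checkSmin()
{
    assert(narrowedSmin() == -SMAX);
}
```

Without the cast at the assignment, the unsigned enumerator value would no longer be a small negative number, so the header change and the `(int16_t)SMIN` casts in the .cpp files need to land together.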
x265_2.4.tar.gz/source/test/pixelharness.cpp -> x265_2.5.tar.gz/source/test/pixelharness.cpp Changed
@@ -44,9 +44,8 @@
         uchar_test_buff[0][i]   = rand() % ((1 << 8) - 1);
         residual_test_buff[0][i] = (rand() % (2 * RMAX + 1)) - RMAX - 1;// For sse_ss only
         double_test_buff[0][i]  = (double)(short_test_buff[0][i]) / 256.0;
-
         pixel_test_buff[1][i]   = PIXEL_MIN;
-        short_test_buff[1][i]   = SMIN;
+        short_test_buff[1][i]   = (int16_t)SMIN;
         short_test_buff1[1][i]  = PIXEL_MIN;
         short_test_buff2[1][i]  = -16384;
         int_test_buff[1][i]     = SHORT_MIN;
@@ -2003,6 +2002,76 @@
     return true;
 }
 
+bool PixelHarness::check_integral_initv(integralv_t ref, integralv_t opt)
+{
+    intptr_t srcStep = 64;
+    int j = 0;
+    uint32_t dst_ref[BUFFSIZE] = { 0 };
+    uint32_t dst_opt[BUFFSIZE] = { 0 };
+
+    for (int i = 0; i < 64; i++)
+    {
+        dst_ref[i] = pixel_test_buff[0][i];
+        dst_opt[i] = pixel_test_buff[0][i];
+    }
+
+    for (int i = 0, k = 0; i < BUFFSIZE; i++)
+    {
+        if (i % 64 == 0)
+            k++;
+        dst_ref[i] = dst_ref[i % 64] + k;
+        dst_opt[i] = dst_opt[i % 64] + k;
+    }
+
+    int padx = 4;
+    int pady = 4;
+    uint32_t *dst_ref_ptr = dst_ref + srcStep * pady + padx;
+    uint32_t *dst_opt_ptr = dst_opt + srcStep * pady + padx;
+    for (int i = 0; i < ITERS; i++)
+    {
+        ref(dst_ref_ptr, srcStep);
+        checked(opt, dst_opt_ptr, srcStep);
+
+        if (memcmp(dst_ref, dst_opt, sizeof(uint32_t) * BUFFSIZE))
+            return false;
+
+        reportfail()
+            j += INCR;
+    }
+    return true;
+}
+
+bool PixelHarness::check_integral_inith(integralh_t ref, integralh_t opt)
+{
+    /* Since stride is always a multiple of 8 and data movement in AVX2 is 16 elements at a time for 8 bit pixel, we need
+     * to check correctness for two cases: stride multiple of 16 and stride not a multiple of 16; fine for High bit depth
+     * where data movement in AVX2 is 8 elements at a time */
+    intptr_t srcStep[2] = { 56, 64 };
+    int j = 0;
+    uint32_t dst_ref[BUFFSIZE] = { 0 };
+    uint32_t dst_opt[BUFFSIZE] = { 0 };
+
+    int padx = 4;
+    int pady = 4;
+    for (int l = 0; l < 2; l++)
+    {
+        uint32_t *dst_ref_ptr = dst_ref + srcStep[l] * pady + padx;
+        uint32_t *dst_opt_ptr = dst_opt + srcStep[l] * pady + padx;
+        for (int k = 0; k < ITERS; k++)
+        {
+            ref(dst_ref_ptr, pixel_test_buff[0], srcStep[l]);
+            checked(opt, dst_opt_ptr, pixel_test_buff[0], srcStep[l]);
+
+            if (memcmp(dst_ref, dst_opt, sizeof(uint32_t) * BUFFSIZE))
+                return false;
+
+            reportfail()
+                j += INCR;
+        }
+    }
+    return true;
+}
+
 bool PixelHarness::testPU(int part, const EncoderPrimitives& ref, const EncoderPrimitives& opt)
 {
     if (opt.pu[part].satd)
@@ -2688,6 +2757,64 @@
         }
     }
 
+    for (int k = 0; k < NUM_INTEGRAL_SIZE; k++)
+    {
+        if (opt.integral_initv[k] && !check_integral_initv(ref.integral_initv[k], opt.integral_initv[k]))
+        {
+            switch (k)
+            {
+            case 0:
+                printf("Integral4v failed!\n");
+                break;
+            case 1:
+                printf("Integral8v failed!\n");
+                break;
+            case 2:
+                printf("Integral12v failed!\n");
+                break;
+            case 3:
+                printf("Integral16v failed!\n");
+                break;
+            case 4:
+                printf("Integral24v failed!\n");
+                break;
+            case 5:
+                printf("Integral32v failed!\n");
+                break;
+            }
+            return false;
+        }
+    }
+
+
+    for (int k = 0; k < NUM_INTEGRAL_SIZE; k++)
+    {
+        if (opt.integral_inith[k] && !check_integral_inith(ref.integral_inith[k], opt.integral_inith[k]))
+        {
+            switch (k)
+            {
+                case 0:
+                    printf("Integral4h failed!\n");
+                    break;
+                case 1:
+                    printf("Integral8h failed!\n");
+                    break;
+                case 2:
+                    printf("Integral12h failed!\n");
+                    break;
+                case 3:
+                    printf("Integral16h failed!\n");
+                    break;
+                case 4:
+                    printf("Integral24h failed!\n");
+                    break;
+                case 5:
+                    printf("Integral32h failed!\n");
+                    break;
+            }
+            return false;
+        }
+    }
     return true;
 }
 
@@ -3210,4 +3337,67 @@
         HEADER0("pelFilterChroma_Horizontal");
         REPORT_SPEEDUP(opt.pelFilterChroma[1], ref.pelFilterChroma[1], pbuf1, 1, STRIDE, tc, maskP, maskQ);
     }
+
+    for (int k = 0; k < NUM_INTEGRAL_SIZE; k++)
+    {
+        if (opt.integral_initv[k])
+        {
+            switch (k)
+            {
+                case 0:
+                    HEADER0("integral_init4v");
+                    break;
+                case 1:
+                    HEADER0("integral_init8v");
+                    break;
+                case 2:
+                    HEADER0("integral_init12v");
+                    break;
+                case 3:
+                    HEADER0("integral_init16v");
+                    break;
+                case 4:
+                    HEADER0("integral_init24v");
+                    break;
+                case 5:
+                    HEADER0("integral_init32v");
+                    break;
+                default:
+                    break;
+            }
+            REPORT_SPEEDUP(opt.integral_initv[k], ref.integral_initv[k], (uint32_t*)pbuf1, STRIDE);
+        }
+    }
+
+    for (int k = 0; k < NUM_INTEGRAL_SIZE; k++)
+    {
+        if (opt.integral_inith[k])
+        {
+            uint32_t dst_buf[BUFFSIZE] = { 0 };
+            switch (k)
+            {
+            case 0:
+                HEADER0("integral_init4h");
+                break;
+            case 1:
+                HEADER0("integral_init8h");
+                break;
+            case 2:
+                HEADER0("integral_init12h");
+                break;
+            case 3:
+                HEADER0("integral_init16h");
+                break;
+            case 4:
+                HEADER0("integral_init24h");
+                break;
+            case 5:
+                HEADER0("integral_init32h");
+                break;
+            default:
+                break;
+            }
+            REPORT_SPEEDUP(opt.integral_inith[k], ref.integral_inith[k], dst_buf, pbuf1, STRIDE);
+        }
+    }
 }
x265_2.4.tar.gz/source/test/pixelharness.h -> x265_2.5.tar.gz/source/test/pixelharness.h Changed
@@ -40,7 +40,7 @@
     enum { BUFFSIZE = STRIDE * (MAX_HEIGHT + PAD_ROWS) + INCR * ITERS };
     enum { TEST_CASES = 3 };
     enum { SMAX = 1 << 12 };
-    enum { SMIN = -1 << 12 };
+    enum { SMIN = (unsigned)-1 << 12 };
     enum { RMAX = PIXEL_MAX - PIXEL_MIN }; //The maximum value obtained by subtracting pixel values (residual max)
     enum { RMIN = PIXEL_MIN - PIXEL_MAX }; //The minimum value obtained by subtracting pixel values (residual min)
 
@@ -126,6 +126,8 @@
     bool check_pelFilterLumaStrong_H(pelFilterLumaStrong_t ref, pelFilterLumaStrong_t opt);
     bool check_pelFilterChroma_V(pelFilterChroma_t ref, pelFilterChroma_t opt);
     bool check_pelFilterChroma_H(pelFilterChroma_t ref, pelFilterChroma_t opt);
+    bool check_integral_initv(integralv_t ref, integralv_t opt);
+    bool check_integral_inith(integralh_t ref, integralh_t opt);
 
 public:
 
x265_2.4.tar.gz/source/test/regression-tests.txt -> x265_2.5.tar.gz/source/test/regression-tests.txt Changed
@@ -17,17 +17,17 @@
 BasketballDrive_1920x1080_50.y4m,--preset faster --aq-strength 2 --merange 190 --slices 3
 BasketballDrive_1920x1080_50.y4m,--preset medium --ctu 16 --max-tu-size 8 --subme 7 --qg-size 16 --cu-lossless --tu-inter-depth 3 --limit-tu 1
 BasketballDrive_1920x1080_50.y4m,--preset medium --keyint -1 --nr-inter 100 -F4 --no-sao
-BasketballDrive_1920x1080_50.y4m,--preset medium --no-cutree --analysis-mode=save --refine-level 2 --bitrate 7000 --limit-modes,--preset medium --no-cutree --analysis-mode=load --refine-level 2 --bitrate 7000 --limit-modes
+BasketballDrive_1920x1080_50.y4m,--preset medium --no-cutree --analysis-reuse-mode=save --analysis-reuse-level 2 --bitrate 7000 --limit-modes,--preset medium --no-cutree --analysis-reuse-mode=load --analysis-reuse-level 2 --bitrate 7000 --limit-modes
 BasketballDrive_1920x1080_50.y4m,--preset slow --nr-intra 100 -F4 --aq-strength 3 --qg-size 16 --limit-refs 1
 BasketballDrive_1920x1080_50.y4m,--preset slower --lossless --chromaloc 3 --subme 0 --limit-tu 4
-BasketballDrive_1920x1080_50.y4m,--preset slower --no-cutree --analysis-mode=save --refine-level 10 --bitrate 7000 --limit-tu 0,--preset slower --no-cutree --analysis-mode=load --refine-level 10 --bitrate 7000 --limit-tu 0
+BasketballDrive_1920x1080_50.y4m,--preset slower --no-cutree --analysis-reuse-mode=save --analysis-reuse-level 10 --bitrate 7000 --limit-tu 0,--preset slower --no-cutree --analysis-reuse-mode=load --analysis-reuse-level 10 --bitrate 7000 --limit-tu 0
 BasketballDrive_1920x1080_50.y4m,--preset veryslow --crf 4 --cu-lossless --pmode --limit-refs 1 --aq-mode 3 --limit-tu 3
-BasketballDrive_1920x1080_50.y4m,--preset veryslow --no-cutree --analysis-mode=save --bitrate 7000 --tskip-fast --limit-tu 4,--preset veryslow --no-cutree --analysis-mode=load --bitrate 7000  --tskip-fast --limit-tu 4
+BasketballDrive_1920x1080_50.y4m,--preset veryslow --no-cutree --analysis-reuse-mode=save --bitrate 7000 --tskip-fast --limit-tu 4,--preset veryslow --no-cutree --analysis-reuse-mode=load --bitrate 7000  --tskip-fast --limit-tu 4
 BasketballDrive_1920x1080_50.y4m,--preset veryslow --recon-y4m-exec "ffplay -i pipe:0 -autoexit"
 Coastguard-4k.y4m,--preset ultrafast --recon-y4m-exec "ffplay -i pipe:0 -autoexit"
 Coastguard-4k.y4m,--preset superfast --tune grain --overscan=crop
 Coastguard-4k.y4m,--preset superfast --tune grain --pme --aq-strength 2 --merange 190
-Coastguard-4k.y4m,--preset veryfast --no-cutree --analysis-mode=save --refine-level 1 --bitrate 15000,--preset veryfast --no-cutree --analysis-mode=load --refine-level 1 --bitrate 15000
+Coastguard-4k.y4m,--preset veryfast --no-cutree --analysis-reuse-mode=save --analysis-reuse-level 1 --bitrate 15000,--preset veryfast --no-cutree --analysis-reuse-mode=load --analysis-reuse-level 1 --bitrate 15000
 Coastguard-4k.y4m,--preset medium --rdoq-level 1 --tune ssim --no-signhide --me umh --slices 2
 Coastguard-4k.y4m,--preset slow --tune psnr --cbqpoffs -1 --crqpoffs 1 --limit-refs 1
 CrowdRun_1920x1080_50_10bit_422.yuv,--preset ultrafast --weightp --tune zerolatency --qg-size 16
@@ -51,7 +51,7 @@
 DucksAndLegs_1920x1080_60_10bit_444.yuv,--preset veryfast --weightp --nr-intra 1000 -F4
 DucksAndLegs_1920x1080_60_10bit_444.yuv,--preset medium --nr-inter 500 -F4 --no-psy-rdoq
 DucksAndLegs_1920x1080_60_10bit_444.yuv,--preset slower --no-weightp --rdoq-level 0 --limit-refs 3 --tu-inter-depth 4 --limit-tu 3
-DucksAndLegs_1920x1080_60_10bit_422.yuv,--preset fast --no-cutree --analysis-mode=save --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1,--preset fast --no-cutree --analysis-mode=load --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1
+DucksAndLegs_1920x1080_60_10bit_422.yuv,--preset fast --no-cutree --analysis-reuse-mode=save --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1,--preset fast --no-cutree --analysis-reuse-mode=load --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1
 FourPeople_1280x720_60.y4m,--preset superfast --no-wpp --lookahead-slices 2
 FourPeople_1280x720_60.y4m,--preset veryfast --aq-mode 2 --aq-strength 1.5 --qg-size 8
 FourPeople_1280x720_60.y4m,--preset medium --qp 38 --no-psy-rd
@@ -68,8 +68,8 @@
 KristenAndSara_1280x720_60.y4m,--preset slower --pmode --max-tu-size 8 --limit-refs 0 --limit-modes --limit-tu 1
 NebutaFestival_2560x1600_60_10bit_crop.yuv,--preset superfast --tune psnr
 NebutaFestival_2560x1600_60_10bit_crop.yuv,--preset medium --tune grain --limit-refs 2
-NebutaFestival_2560x1600_60_10bit_crop.yuv,--preset slow --no-cutree --analysis-mode=save --rd 5 --refine-level 10 --bitrate 9000,--preset slow --no-cutree --analysis-mode=load --rd 5 --refine-level 10 --bitrate 9000
-News-4k.y4m,--preset ultrafast --no-cutree --analysis-mode=save --refine-level 2 --bitrate 15000,--preset ultrafast --no-cutree --analysis-mode=load --refine-level 2 --bitrate 15000
+NebutaFestival_2560x1600_60_10bit_crop.yuv,--preset slow --no-cutree --analysis-reuse-mode=save --rd 5 --analysis-reuse-level 10 --bitrate 9000,--preset slow --no-cutree --analysis-reuse-mode=load --rd 5 --analysis-reuse-level 10 --bitrate 9000
+News-4k.y4m,--preset ultrafast --no-cutree --analysis-reuse-mode=save --analysis-reuse-level 2 --bitrate 15000,--preset ultrafast --no-cutree --analysis-reuse-mode=load --analysis-reuse-level 2 --bitrate 15000
 News-4k.y4m,--preset superfast --lookahead-slices 6 --aq-mode 0
 News-4k.y4m,--preset superfast --slices 4 --aq-mode 0 
 News-4k.y4m,--preset medium --tune ssim --no-sao --qg-size 16
@@ -123,7 +123,7 @@
 old_town_cross_444_720p50.y4m,--preset superfast --weightp --min-cu 16 --limit-modes
 old_town_cross_444_720p50.y4m,--preset veryfast --qp 1 --tune ssim
 old_town_cross_444_720p50.y4m,--preset faster --rd 1 --tune zero-latency
-old_town_cross_444_720p50.y4m,--preset fast --no-cutree --analysis-mode=save --refine-level 1 --bitrate 3000 --early-skip,--preset fast --no-cutree --analysis-mode=load --refine-level 1 --bitrate 3000 --early-skip
+old_town_cross_444_720p50.y4m,--preset fast --no-cutree --analysis-reuse-mode=save --analysis-reuse-level 1 --bitrate 3000 --early-skip,--preset fast --no-cutree --analysis-reuse-mode=load --analysis-reuse-level 1 --bitrate 3000 --early-skip
 old_town_cross_444_720p50.y4m,--preset medium --keyint -1 --no-weightp --ref 6
 old_town_cross_444_720p50.y4m,--preset slow --rdoq-level 1 --early-skip --ref 7 --no-b-pyramid
 old_town_cross_444_720p50.y4m,--preset slower --crf 4 --cu-lossless
x265_2.4.tar.gz/source/x265-extras.cpp -> x265_2.5.tar.gz/source/x265-extras.cpp Changed
@@ -25,7 +25,7 @@
 
 #include "x265.h"
 #include "x265-extras.h"
-
+#include "param.h"
 #include "common.h"
 
 using namespace X265_NS;
@@ -38,14 +38,8 @@
     "B count, B ave-QP, B kbps, B-PSNR Y, B-PSNR U, B-PSNR V, B-SSIM (dB), "
     "MaxCLL, MaxFALL, Version\n";
 
-FILE* x265_csvlog_open(const x265_api& api, const x265_param& param, const char* fname, int level)
+FILE* x265_csvlog_open(const x265_param& param, const char* fname, int level)
 {
-    if (sizeof(x265_stats) != api.sizeof_stats || sizeof(x265_picture) != api.sizeof_picture)
-    {
-        fprintf(stderr, "extras [error]: structure size skew, unable to create CSV logfile\n");
-        return NULL;
-    }
-
     FILE *csvfp = x265_fopen(fname, "r");
     if (csvfp)
     {
@@ -62,6 +56,8 @@
             if (level)
             {
                 fprintf(csvfp, "Encode Order, Type, POC, QP, Bits, Scenecut, ");
+                if (level >= 2)
+                    fprintf(csvfp, "I/P cost ratio, ");
                 if (param.rc.rateControlMode == X265_RC_CRF)
                     fprintf(csvfp, "RateFactor, ");
                 if (param.rc.vbvBufferSize)
@@ -73,7 +69,7 @@
                 fprintf(csvfp, "Latency, ");
                 fprintf(csvfp, "List 0, List 1");
                 uint32_t size = param.maxCUSize;
-                for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+                for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
                 {
                     fprintf(csvfp, ", Intra %dx%d DC, Intra %dx%d Planar, Intra %dx%d Ang", size, size, size, size, size, size);
                     size /= 2;
@@ -82,7 +78,7 @@
                 size = param.maxCUSize;
                 if (param.bEnableRectInter)
                 {
-                    for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+                    for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
                     {
                         fprintf(csvfp, ", Inter %dx%d, Inter %dx%d (Rect)", size, size, size, size);
                         if (param.bEnableAMP)
@@ -92,29 +88,56 @@
                 }
                 else
                 {
-                    for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+                    for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
                     {
                         fprintf(csvfp, ", Inter %dx%d", size, size);
                         size /= 2;
                     }
                 }
                 size = param.maxCUSize;
-                for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+                for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
                 {
                     fprintf(csvfp, ", Skip %dx%d", size, size);
                     size /= 2;
                 }
                 size = param.maxCUSize;
-                for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+                for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
                 {
                     fprintf(csvfp, ", Merge %dx%d", size, size);
                     size /= 2;
                 }
-                fprintf(csvfp, ", Avg Luma Distortion, Avg Chroma Distortion, Avg psyEnergy, Avg Luma Level, Max Luma Level, Avg Residual Energy");
 
-                /* detailed performance statistics */
                 if (level >= 2)
-                    fprintf(csvfp, ", DecideWait (ms), Row0Wait (ms), Wall time (ms), Ref Wait Wall (ms), Total CTU time (ms), Stall Time (ms), Total frame time (ms), Avg WPP, Row Blocks");
+                {
+                    fprintf(csvfp, ", Avg Luma Distortion, Avg Chroma Distortion, Avg psyEnergy, Avg Residual Energy,"
+                        " Min Luma Level, Max Luma Level, Avg Luma Level");
+
+                    if (param.internalCsp != X265_CSP_I400)
+                        fprintf(csvfp, ", Min Cb Level, Max Cb Level, Avg Cb Level, Min Cr Level, Max Cr Level, Avg Cr Level");
+
+                    /* PU statistics */
+                    size = param.maxCUSize;
+                    for (uint32_t i = 0; i< param.maxLog2CUSize - (uint32_t)g_log2Size[param.minCUSize] + 1; i++)
+                    {
+                        fprintf(csvfp, ", Intra %dx%d", size, size);
+                        fprintf(csvfp, ", Skip %dx%d", size, size);
+                        fprintf(csvfp, ", AMP %d", size);
+                        fprintf(csvfp, ", Inter %dx%d", size, size);
+                        fprintf(csvfp, ", Merge %dx%d", size, size);
+                        fprintf(csvfp, ", Inter %dx%d", size, size / 2);
+                        fprintf(csvfp, ", Merge %dx%d", size, size / 2);
+                        fprintf(csvfp, ", Inter %dx%d", size / 2, size);
+                        fprintf(csvfp, ", Merge %dx%d", size / 2, size);
+                        size /= 2;
+                    }
+
+                    if ((uint32_t)g_log2Size[param.minCUSize] == 3)
+                        fprintf(csvfp, ", 4x4");
108
+
109
+                    /* detailed performance statistics */
110
+                    fprintf(csvfp, ", DecideWait (ms), Row0Wait (ms), Wall time (ms), Ref Wait Wall (ms), Total CTU time (ms),"
111
+                    "Stall Time (ms), Total frame time (ms), Avg WPP, Row Blocks");
112
+                }
113
                 fprintf(csvfp, "\n");
114
             }
115
             else
116
@@ -131,7 +154,10 @@
117
         return;
118
 
119
     const x265_frame_stats* frameStats = &pic.frameData;
120
-    fprintf(csvfp, "%d, %c-SLICE, %4d, %2.2lf, %10d, %d,", frameStats->encoderOrder, frameStats->sliceType, frameStats->poc, frameStats->qp, (int)frameStats->bits, frameStats->bScenecut);
121
+    fprintf(csvfp, "%d, %c-SLICE, %4d, %2.2lf, %10d, %d,", frameStats->encoderOrder, frameStats->sliceType, frameStats->poc, 
122
+                                                           frameStats->qp, (int)frameStats->bits, frameStats->bScenecut);
123
+    if (level >= 2)
124
+        fprintf(csvfp, "%.2f,", frameStats->ipCostRatio);
125
     if (param.rc.rateControlMode == X265_RC_CRF)
126
         fprintf(csvfp, "%.3lf,", frameStats->rateFactor);
127
     if (param.rc.vbvBufferSize)
128
@@ -159,39 +185,76 @@
129
         else
130
             fputs(" -,", csvfp);
131
     }
132
-    for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
133
-        fprintf(csvfp, "%5.2lf%%, %5.2lf%%, %5.2lf%%,", frameStats->cuStats.percentIntraDistribution[depth][0], frameStats->cuStats.percentIntraDistribution[depth][1], frameStats->cuStats.percentIntraDistribution[depth][2]);
134
-    fprintf(csvfp, "%5.2lf%%", frameStats->cuStats.percentIntraNxN);
135
-    if (param.bEnableRectInter)
136
+
137
+    if (level)
138
     {
139
-        for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
140
+        for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
141
+            fprintf(csvfp, "%5.2lf%%, %5.2lf%%, %5.2lf%%,", frameStats->cuStats.percentIntraDistribution[depth][0],
142
+            frameStats->cuStats.percentIntraDistribution[depth][1],
143
+            frameStats->cuStats.percentIntraDistribution[depth][2]);
144
+        fprintf(csvfp, "%5.2lf%%", frameStats->cuStats.percentIntraNxN);
145
+        if (param.bEnableRectInter)
146
         {
147
-            fprintf(csvfp, ", %5.2lf%%, %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][0], frameStats->cuStats.percentInterDistribution[depth][1]);
148
-            if (param.bEnableAMP)
149
-                fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][2]);
150
+            for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
151
+            {
152
+                fprintf(csvfp, ", %5.2lf%%, %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][0],
153
+                    frameStats->cuStats.percentInterDistribution[depth][1]);
154
+                if (param.bEnableAMP)
155
+                    fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][2]);
156
+            }
157
         }
158
+        else
159
+        {
160
+            for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
161
+                fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][0]);
162
+        }
163
+        for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
164
+            fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentSkipCu[depth]);
165
+        for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
166
+            fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentMergeCu[depth]);
167
     }
168
-    else
169
-    {
170
-        for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
171
-            fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][0]);
172
-    }
173
-    for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
174
-        fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentSkipCu[depth]);
175
-    for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
176
-        fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentMergeCu[depth]);
177
-    fprintf(csvfp, ", %.2lf, %.2lf, %.2lf, %.2lf, %d, %.2lf", frameStats->avgLumaDistortion, frameStats->avgChromaDistortion, frameStats->avgPsyEnergy, frameStats->avgLumaLevel, frameStats->maxLumaLevel, frameStats->avgResEnergy);
178
 
179
     if (level >= 2)
180
     {
181
-        fprintf(csvfp, ", %.1lf, %.1lf, %.1lf, %.1lf, %.1lf, %.1lf, %.1lf,", frameStats->decideWaitTime, frameStats->row0WaitTime, frameStats->wallTime, frameStats->refWaitWallTime, frameStats->totalCTUTime, frameStats->stallTime, frameStats->totalFrameTime);
182
+        fprintf(csvfp, ", %.2lf, %.2lf, %.2lf, %.2lf ", frameStats->avgLumaDistortion,
183
+            frameStats->avgChromaDistortion,
184
+            frameStats->avgPsyEnergy,
185
+            frameStats->avgResEnergy);
186
+
187
+        fprintf(csvfp, ", %d, %d, %.2lf", frameStats->minLumaLevel, frameStats->maxLumaLevel, frameStats->avgLumaLevel);
188
+
189
+        if (param.internalCsp != X265_CSP_I400)
190
+        {
191
+            fprintf(csvfp, ", %d, %d, %.2lf", frameStats->minChromaULevel, frameStats->maxChromaULevel, frameStats->avgChromaULevel);
192
+            fprintf(csvfp, ", %d, %d, %.2lf", frameStats->minChromaVLevel, frameStats->maxChromaVLevel, frameStats->avgChromaVLevel);
193
+        }
194
+
195
+        for (uint32_t i = 0; i < param.maxLog2CUSize - (uint32_t)g_log2Size[param.minCUSize] + 1; i++)
196
+        {
197
+            fprintf(csvfp, ", %.2lf%%", frameStats->puStats.percentIntraPu[i]);
198
+            fprintf(csvfp, ", %.2lf%%", frameStats->puStats.percentSkipPu[i]);
199
+            fprintf(csvfp, ",%.2lf%%", frameStats->puStats.percentAmpPu[i]);
200
+            for (uint32_t j = 0; j < 3; j++)
201
+            {
202
+                fprintf(csvfp, ", %.2lf%%", frameStats->puStats.percentInterPu[i][j]);
203
+                fprintf(csvfp, ", %.2lf%%", frameStats->puStats.percentMergePu[i][j]);
204
+            }
205
+        }
206
+        if ((uint32_t)g_log2Size[param.minCUSize] == 3)
207
+            fprintf(csvfp, ",%.2lf%%", frameStats->puStats.percentNxN);
208
+
209
+        fprintf(csvfp, ", %.1lf, %.1lf, %.1lf, %.1lf, %.1lf, %.1lf, %.1lf,", frameStats->decideWaitTime, frameStats->row0WaitTime,
210
+                                                                             frameStats->wallTime, frameStats->refWaitWallTime,
211
+                                                                             frameStats->totalCTUTime, frameStats->stallTime,
212
+                                                                             frameStats->totalFrameTime);
213
+
214
         fprintf(csvfp, " %.3lf, %d", frameStats->avgWPP, frameStats->countRowBlocks);
215
     }
216
     fprintf(csvfp, "\n");
217
     fflush(stderr);
218
 }
219
 
220
-void x265_csvlog_encode(FILE* csvfp, const char* version, const x265_param& param, const x265_stats& stats, int level, int argc, char** argv)
221
+void x265_csvlog_encode(FILE* csvfp, const char* version, const x265_param& param, int padx, int pady, const x265_stats& stats, int level, int argc, char** argv)
222
 {
223
     if (!csvfp)
224
         return;
225
@@ -204,13 +267,27 @@
226
     }
227
 
228
     // CLI arguments or other
229
-    fputc('"', csvfp);
230
-    for (int i = 1; i < argc; i++)
231
+    if (argc)
232
     {
233
-        fputc(' ', csvfp);
234
-        fputs(argv[i], csvfp);
235
+        fputc('"', csvfp);
236
+        for (int i = 1; i < argc; i++)
237
+        {
238
+            fputc(' ', csvfp);
239
+            fputs(argv[i], csvfp);
240
+        }
241
+        fputc('"', csvfp);
242
+    }
243
+    else
244
+    {
245
+        const x265_param* paramTemp = &param;
246
+        char *opts = x265_param2string((x265_param*)paramTemp, padx, pady);
247
+        if (opts)
248
+        {
249
+            fputc('"', csvfp);
250
+            fputs(opts, csvfp);
251
+            fputc('"', csvfp);
252
+        }
253
     }
254
-    fputc('"', csvfp);
255
 
256
     // current date and time
257
     time_t now;
258
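
Review note: the new level-2 header loop in x265-extras.cpp emits nine PU columns per CU size, walking from param.maxCUSize down to param.minCUSize, plus a trailing "4x4" column when the minimum CU size is 8. The sketch below reproduces only that column arithmetic so the expected width of the CSV header can be checked by hand. It is a minimal illustration, not the library code: puHeaderColumns and log2Size are hypothetical names, and log2Size stands in for the g_log2Size lookup used in the diff.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Integer log2 for power-of-two CU sizes.
// Hypothetical stand-in for the g_log2Size[] table referenced in the diff.
static uint32_t log2Size(uint32_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0); // power of two only
    uint32_t log2 = 0;
    while (size > 1)
    {
        size >>= 1;
        ++log2;
    }
    return log2;
}

// Build the PU statistics column names in the same order as the header
// loop in the diff: Intra, Skip, AMP, then Inter/Merge pairs for
// 2Nx2N, 2NxN and Nx2N, repeated for each CU size.
std::vector<std::string> puHeaderColumns(uint32_t maxCUSize, uint32_t minCUSize)
{
    assert(maxCUSize >= minCUSize); // avoids unsigned underflow below
    std::vector<std::string> cols;
    const uint32_t depths = log2Size(maxCUSize) - log2Size(minCUSize) + 1;
    uint32_t size = maxCUSize;
    for (uint32_t i = 0; i < depths; i++)
    {
        const std::string s = std::to_string(size);
        const std::string h = std::to_string(size / 2);
        cols.push_back("Intra " + s + "x" + s);
        cols.push_back("Skip " + s + "x" + s);
        cols.push_back("AMP " + s);
        cols.push_back("Inter " + s + "x" + s);
        cols.push_back("Merge " + s + "x" + s);
        cols.push_back("Inter " + s + "x" + h);
        cols.push_back("Merge " + s + "x" + h);
        cols.push_back("Inter " + h + "x" + s);
        cols.push_back("Merge " + h + "x" + s);
        size /= 2;
    }
    // The diff appends a single "4x4" column only when minCUSize is 8.
    if (log2Size(minCUSize) == 3)
        cols.push_back("4x4");
    return cols;
}
```

With a 64x64 maximum and 8x8 minimum CU size this gives four sizes of nine columns each plus the "4x4" column. The per-frame writer in the same diff emits values in the matching order (intra, skip, AMP, then three inter/merge pairs), so header and data rows should stay aligned.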
x265_2.4.tar.gz/source/x265-extras.h -> x265_2.5.tar.gz/source/x265-extras.h Changed
19
 
1
@@ -44,7 +44,7 @@
2
  * closed by the caller using fclose(). If level is 0, then no frame logging
3
  * header is written to the file. This function will return NULL if it is unable
4
  * to open the file for write or if it detects a structure size skew */
5
-LIBAPI FILE* x265_csvlog_open(const x265_api& api, const x265_param& param, const char* fname, int level);
6
+LIBAPI FILE* x265_csvlog_open(const x265_param& param, const char* fname, int level);
7
 
8
 /* Log frame statistics to the CSV file handle. level should have been non-zero
9
  * in the call to x265_csvlog_open() if this function is called. */
10
@@ -53,7 +53,7 @@
11
 /* Log final encode statistics to the CSV file handle. 'argc' and 'argv' are
12
  * intended to be command line arguments passed to the encoder. Encode
13
  * statistics should be queried from the encoder just prior to closing it. */
14
-LIBAPI void x265_csvlog_encode(FILE* csvfp, const char* version, const x265_param& param, const x265_stats& stats, int level, int argc, char** argv);
15
+LIBAPI void x265_csvlog_encode(FILE* csvfp, const char* version, const x265_param& param, int padx, int pady, const x265_stats& stats, int level, int argc, char** argv);
16
 
17
 /* In-place downshift from a bit-depth greater than 8 to a bit-depth of 8, using
18
  * the residual bits to dither each row. */
19
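
Review note: x265_csvlog_encode now takes padx and pady and, when argc is zero, falls back to the string returned by x265_param2string instead of the command line. The sketch below mirrors only the branching and quoting of that field, assuming the parameter string has already been produced by the caller. csvOptionsField is a hypothetical helper name and it builds a std::string rather than writing to a FILE*.

```cpp
#include <string>

// Build the quoted "options" field of the summary CSV row.
// - argc != 0: quote the CLI arguments, each preceded by a space,
//   skipping argv[0] as the loop in the diff does.
// - argc == 0: quote the parameter string if one is available.
// - otherwise: emit nothing, as in the diff when opts is NULL.
std::string csvOptionsField(int argc, const char* const* argv, const char* paramString)
{
    std::string field;
    if (argc)
    {
        field += '"';
        for (int i = 1; i < argc; i++)
        {
            field += ' ';
            field += argv[i];
        }
        field += '"';
    }
    else if (paramString)
    {
        field += '"';
        field += paramString;
        field += '"';
    }
    return field;
}
```

A library user that has no command line would pass argc as 0, which is the case the new else branch in the diff appears to be designed for.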
x265_2.4.tar.gz/source/x265.cpp -> x265_2.5.tar.gz/source/x265.cpp Changed
124
 
1
@@ -73,15 +73,12 @@
2
     ReconFile* recon;
3
     OutputFile* output;
4
     FILE*       qpfile;
5
-    FILE*       csvfpt;
6
-    const char* csvfn;
7
     const char* reconPlayCmd;
8
     const x265_api* api;
9
     x265_param* param;
10
     bool bProgress;
11
     bool bForceY4m;
12
     bool bDither;
13
-    int csvLogLevel;
14
     uint32_t seek;              // number of frames to skip from the beginning
15
     uint32_t framesToBeEncoded; // number of frames to encode
16
     uint64_t totalbytes;
17
@@ -97,8 +94,6 @@
18
         recon = NULL;
19
         output = NULL;
20
         qpfile = NULL;
21
-        csvfpt = NULL;
22
-        csvfn = NULL;
23
         reconPlayCmd = NULL;
24
         api = NULL;
25
         param = NULL;
26
@@ -109,7 +104,6 @@
27
         startTime = x265_mdate();
28
         prevUpdateTime = 0;
29
         bDither = false;
30
-        csvLogLevel = 0;
31
     }
32
 
33
     void destroy();
34
@@ -129,9 +123,6 @@
35
     if (qpfile)
36
         fclose(qpfile);
37
     qpfile = NULL;
38
-    if (csvfpt)
39
-        fclose(csvfpt);
40
-    csvfpt = NULL;
41
     if (output)
42
         output->release();
43
     output = NULL;
44
@@ -292,8 +283,6 @@
45
             if (0) ;
46
             OPT2("frame-skip", "seek") this->seek = (uint32_t)x265_atoi(optarg, bError);
47
             OPT("frames") this->framesToBeEncoded = (uint32_t)x265_atoi(optarg, bError);
48
-            OPT("csv") this->csvfn = optarg;
49
-            OPT("csv-log-level") this->csvLogLevel = x265_atoi(optarg, bError);
50
             OPT("no-progress") this->bProgress = false;
51
             OPT("output") outputfn = optarg;
52
             OPT("input") inputfn = optarg;
53
@@ -530,8 +519,7 @@
54
  * 1 - unable to parse command line
55
  * 2 - unable to open encoder
56
  * 3 - unable to generate stream headers
57
- * 4 - encoder abort
58
- * 5 - unable to open csv file */
59
+ * 4 - encoder abort */
60
 
61
 int main(int argc, char **argv)
62
 {
63
@@ -586,28 +574,15 @@
64
     /* get the encoder parameters post-initialization */
65
     api->encoder_parameters(encoder, param);
66
 
67
-    if (cliopt.csvfn)
68
-    {
69
-        cliopt.csvfpt = x265_csvlog_open(*api, *param, cliopt.csvfn, cliopt.csvLogLevel);
70
-        if (!cliopt.csvfpt)
71
-        {
72
-            x265_log_file(param, X265_LOG_ERROR, "Unable to open CSV log file <%s>, aborting\n", cliopt.csvfn);
73
-            cliopt.destroy();
74
-            if (cliopt.api)
75
-                cliopt.api->param_free(cliopt.param);
76
-            exit(5);
77
-        }
78
-    }
79
-
80
-    /* Control-C handler */
81
+     /* Control-C handler */
82
     if (signal(SIGINT, sigint_handler) == SIG_ERR)
83
         x265_log(param, X265_LOG_ERROR, "Unable to register CTRL+C handler: %s\n", strerror(errno));
84
 
85
     x265_picture pic_orig, pic_out;
86
     x265_picture *pic_in = &pic_orig;
87
-    /* Allocate recon picture if analysisMode is enabled */
88
+    /* Allocate recon picture if analysisReuseMode is enabled */
89
     std::priority_queue<int64_t>* pts_queue = cliopt.output->needPTS() ? new std::priority_queue<int64_t>() : NULL;
90
-    x265_picture *pic_recon = (cliopt.recon || !!param->analysisMode || pts_queue || reconPlay || cliopt.csvLogLevel) ? &pic_out : NULL;
91
+    x265_picture *pic_recon = (cliopt.recon || !!param->analysisReuseMode || pts_queue || reconPlay || param->csvLogLevel) ? &pic_out : NULL;
92
     uint32_t inFrameCount = 0;
93
     uint32_t outFrameCount = 0;
94
     x265_nal *p_nal;
95
@@ -698,8 +673,6 @@
96
         }
97
 
98
         cliopt.printStatus(outFrameCount);
99
-        if (numEncoded && cliopt.csvLogLevel)
100
-            x265_csvlog_frame(cliopt.csvfpt, *param, *pic_recon, cliopt.csvLogLevel);
101
     }
102
 
103
     /* Flush the encoder */
104
@@ -730,8 +703,6 @@
105
         }
106
 
107
         cliopt.printStatus(outFrameCount);
108
-        if (numEncoded && cliopt.csvLogLevel)
109
-            x265_csvlog_frame(cliopt.csvfpt, *param, *pic_recon, cliopt.csvLogLevel);
110
 
111
         if (!numEncoded)
112
             break;
113
@@ -746,8 +717,8 @@
114
     delete reconPlay;
115
 
116
     api->encoder_get_stats(encoder, &stats, sizeof(stats));
117
-    if (cliopt.csvfpt && !b_ctrl_c)
118
-        x265_csvlog_encode(cliopt.csvfpt, api->version_str, *param, stats, cliopt.csvLogLevel, argc, argv);
119
+    if (param->csvfn && !b_ctrl_c)
120
+        api->encoder_log(encoder, argc, argv);
121
     api->encoder_close(encoder);
122
 
123
     int64_t second_largest_pts = 0;
124
x265_2.4.tar.gz/source/x265.h -> x265_2.5.tar.gz/source/x265.h Changed
232
 
1
@@ -24,10 +24,9 @@
2
 
3
 #ifndef X265_H
4
 #define X265_H
5
-
6
 #include <stdint.h>
7
+#include <stdio.h>
8
 #include "x265_config.h"
9
-
10
 #ifdef __cplusplus
11
 extern "C" {
12
 #endif
13
@@ -98,6 +97,7 @@
14
     uint32_t         sliceType;
15
     uint32_t         numCUsInFrame;
16
     uint32_t         numPartitions;
17
+    uint32_t         depthBytes;
18
     int              bScenecut;
19
     void*            wt;
20
     void*            interData;
21
@@ -117,6 +117,20 @@
22
 } x265_cu_stats;
23
 
24
 
25
+/* pu statistics */
26
+typedef struct x265_pu_stats
27
+{
28
+    double      percentSkipPu[4];               // Percentage of skip cu in all depths
29
+    double      percentIntraPu[4];              // Percentage of intra modes in all depths
30
+    double      percentAmpPu[4];                // Percentage of amp modes in all depths
31
+    double      percentInterPu[4][3];           // Percentage of inter 2nx2n, 2nxn and nx2n in all depths
32
+    double      percentMergePu[4][3];           // Percentage of merge 2nx2n, 2nxn and nx2n in all depth
33
+    double      percentNxN;
34
+
35
+    /* All the above values will add up to 100%. */
36
+} x265_pu_stats;
37
+
38
+
39
 typedef struct x265_analysis_2Pass
40
 {
41
     uint32_t      poc;
42
@@ -154,13 +168,41 @@
43
     int              list0POC[16];
44
     int              list1POC[16];
45
     uint16_t         maxLumaLevel;
46
+    uint16_t         minLumaLevel;
47
+
48
+    uint16_t         maxChromaULevel;
49
+    uint16_t         minChromaULevel;
50
+    double           avgChromaULevel;
51
+
52
+
53
+    uint16_t         maxChromaVLevel;
54
+    uint16_t         minChromaVLevel;
55
+    double           avgChromaVLevel;
56
+
57
     char             sliceType;
58
     int              bScenecut;
59
+    double           ipCostRatio;
60
     int              frameLatency;
61
     x265_cu_stats    cuStats;
62
+    x265_pu_stats    puStats;
63
     double           totalFrameTime;
64
 } x265_frame_stats;
65
 
66
+typedef struct x265_ctu_info_t
67
+{
68
+    int32_t ctuAddress;
69
+    int32_t ctuPartitions[64];
70
+    void*    ctuInfo;
71
+} x265_ctu_info_t;
72
+
73
+typedef enum
74
+{
75
+    NO_CTU_INFO = 0,
76
+    HAS_CTU_INFO = 1,
77
+    CTU_INFO_CHANGE = 2,
78
+}CTUInfo;
79
+
80
+
81
 /* Arbitrary User SEI
82
  * Payload size is in bytes and the payload pointer must be non-NULL. 
83
  * Payload types and syntax can be found in Annex D of the H.265 Specification.
84
@@ -258,15 +300,15 @@
85
      * to allow the encoder to determine base QP */
86
     int     forceqp;
87
 
88
-    /* If param.analysisMode is X265_ANALYSIS_OFF this field is ignored on input
89
+    /* If param.analysisReuseMode is X265_ANALYSIS_OFF this field is ignored on input
90
      * and output. Else the user must call x265_alloc_analysis_data() to
91
      * allocate analysis buffers for every picture passed to the encoder.
92
      *
93
-     * On input when param.analysisMode is X265_ANALYSIS_LOAD and analysisData
94
+     * On input when param.analysisReuseMode is X265_ANALYSIS_LOAD and analysisData
95
      * member pointers are valid, the encoder will use the data stored here to
96
      * reduce encoder work.
97
      *
98
-     * On output when param.analysisMode is X265_ANALYSIS_SAVE and analysisData
99
+     * On output when param.analysisReuseMode is X265_ANALYSIS_SAVE and analysisData
100
      * member pointers are valid, the encoder will write output analysis into
101
      * this data structure */
102
     x265_analysis_data analysisData;
103
@@ -612,7 +654,14 @@
104
      * X265_LOG_FULL, default is X265_LOG_INFO */
105
     int       logLevel;
106
 
107
-    /* Filename of CSV log. Now deprecated */
108
+    /* Level of csv logging. 0 is summary, 1 is frame level logging,
109
+     * 2 is frame level logging with performance statistics */
110
+    int       csvLogLevel;
111
+
112
+    /* filename of CSV log. If csvLogLevel is non-zero, the encoder will emit
113
+     * per-slice statistics to this log file in encode order. Otherwise the
114
+     * encoder will emit per-stream statistics into the log file when
115
+     * x265_encoder_log is called (presumably at the end of the encode) */
116
     const char* csvfn;
117
 
118
     /*== Internal Picture Specification ==*/
119
@@ -1057,10 +1106,10 @@
120
      * buffers.  if X265_ANALYSIS_LOAD, read analysis information into analysis
121
      * buffer and use this analysis information to reduce the amount of work
122
      * the encoder must perform. Default X265_ANALYSIS_OFF */
123
-    int       analysisMode;
124
+    int       analysisReuseMode;
125
 
126
-    /* Filename for analysisMode save/load. Default name is "x265_analysis.dat" */
127
-    const char* analysisFileName;
128
+    /* Filename for analysisReuseMode save/load. Default name is "x265_analysis.dat" */
129
+    const char* analysisReuseFileName;
130
 
131
     /*== Rate Control ==*/
132
 
133
@@ -1194,6 +1243,9 @@
134
 
135
         /* sets a hard lower limit on QP */
136
         int      qpMin;
137
+
138
+        /* internally enable if tune grain is set */
139
+        int      bEnableConstVbv;
140
     } rc;
141
 
142
     /*== Video Usability Information ==*/
143
@@ -1376,9 +1428,9 @@
144
     int       bHDROpt;
145
 
146
     /* A value between 1 and 10 (both inclusive) determines the level of
147
-    * information stored/reused in save/load analysis-mode. Higher the refine
148
-    * level higher the informtion stored/reused. Default is 5 */
149
-    int       analysisRefineLevel;
150
+    * information stored/reused in save/load analysis-reuse-mode. Higher the refine
151
+    * level higher the information stored/reused. Default is 5 */
152
+    int       analysisReuseLevel;
153
 
154
      /* Limit Sample Adaptive Offset filter computation by early terminating SAO
155
      * process based on inter prediction mode, CTU spatial-domain correlations,
156
@@ -1391,7 +1443,44 @@
157
     /* Insert tone mapping information only for IDR frames and when the 
158
      * tone mapping information changes. */
159
     int       bDhdr10opt;
160
+
161
+    /* Determine how x265 react to the content information recieved through the API */
162
+    int       bCTUInfo;
163
+
164
+    /* Use ratecontrol statistics from pic_in, if available*/
165
+    int       bUseRcStats;
166
+
167
+    /* Factor by which input video is scaled down for analysis save mode. Default is 0 */
168
+    int       scaleFactor;
169
+
170
+    /* Enable intra refinement in load mode*/
171
+    int       intraRefine;
172
+
173
+    /* Enable inter refinement in load mode*/
174
+    int       interRefine;
175
+
176
+    /* Enable motion vector refinement in load mode*/
177
+    int       mvRefine;
178
+
179
+    /* Log of maximum CTU size */
180
+    uint32_t  maxLog2CUSize;
181
+
182
+    /* Actual CU depth with respect to config depth */
183
+    uint32_t  maxCUDepth;
184
+
185
+    /* CU depth with respect to maximum transform size */
186
+    uint32_t  unitSizeDepth;
187
+
188
+    /* Number of 4x4 units in maximum CU size */
189
+    uint32_t  num4x4Partitions;
190
+
191
+    /* Specify if analysis mode uses file for data reuse */
192
+    int       bUseAnalysisFile;
193
+
194
+    /* File pointer for csv log */
195
+    FILE*     csvfpt;
196
 } x265_param;
197
+
198
 /* x265_param_alloc:
199
  *  Allocates an x265_param instance. The returned param structure is not
200
  *  special in any way, but using this method together with x265_param_free()
201
@@ -1558,7 +1647,8 @@
202
 void x265_encoder_get_stats(x265_encoder *encoder, x265_stats *, uint32_t statsSizeBytes);
203
 
204
 /* x265_encoder_log:
205
- *       This function is deprecated */
206
+ *       write a line to the configured CSV file.  If a CSV filename was not
207
+ *       configured, or file open failed, this function will perform no write. */
208
 void x265_encoder_log(x265_encoder *encoder, int argc, char **argv);
209
 
210
 /* x265_encoder_close:
211
@@ -1581,6 +1671,12 @@
212
 
213
 int x265_encoder_intra_refresh(x265_encoder *);
214
 
215
+/* x265_encoder_ctu_info:
216
+ *    Copy CTU information such as ctu address and ctu partition structure of all
217
+ *    CTUs in each frame. The function is invoked only if "--ctu-info" is enabled and
218
+ *    the encoder will wait for this copy to complete if enabled.
219
+ */
220
+int x265_encoder_ctu_info(x265_encoder *, int poc, x265_ctu_info_t** ctu);
221
 /* x265_cleanup:
222
  *       release library static allocations, reset configured CTU size */
223
 void x265_cleanup(void);
224
@@ -1629,6 +1725,7 @@
225
 
226
     int           sizeof_frame_stats;   /* sizeof(x265_frame_stats) */
227
     int           (*encoder_intra_refresh)(x265_encoder*);
228
+    int           (*encoder_ctu_info)(x265_encoder*, int, x265_ctu_info_t**);
229
     /* add new pointers to the end, or increment X265_MAJOR_VERSION */
230
 } x265_api;
231
 
232
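
Review note: the new x265_pu_stats struct in x265.h carries the comment that all of its values add up to 100%. An application reading frame statistics could sanity-check that with something like the sketch below. PuStats is a local mirror of the struct layout shown in the diff so that the example builds without x265.h, and puStatsTotal and puStatsConsistent are hypothetical helper names, not part of the x265 API.

```cpp
#include <cmath>

// Local mirror of the x265_pu_stats fields shown in the diff (assumption:
// same field names and array sizes; this is not the library type itself).
struct PuStats
{
    double percentSkipPu[4];
    double percentIntraPu[4];
    double percentAmpPu[4];
    double percentInterPu[4][3]; // 2Nx2N, 2NxN, Nx2N
    double percentMergePu[4][3]; // 2Nx2N, 2NxN, Nx2N
    double percentNxN;
};

// Sum every percentage field across all four depths.
double puStatsTotal(const PuStats& s)
{
    double total = s.percentNxN;
    for (int i = 0; i < 4; i++)
    {
        total += s.percentSkipPu[i] + s.percentIntraPu[i] + s.percentAmpPu[i];
        for (int j = 0; j < 3; j++)
            total += s.percentInterPu[i][j] + s.percentMergePu[i][j];
    }
    return total;
}

// True when the fields add up to 100% within a caller-chosen tolerance.
bool puStatsConsistent(const PuStats& s, double tolerance)
{
    return std::fabs(puStatsTotal(s) - 100.0) <= tolerance;
}
```

A small tolerance is advisable in practice because the CSV writer rounds each percentage to two decimals.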
x265_2.4.tar.gz/source/x265cli.h -> x265_2.5.tar.gz/source/x265cli.h Changed
94
 
1
@@ -122,6 +122,7 @@
2
     { "scenecut",       required_argument, NULL, 0 },
3
     { "no-scenecut",          no_argument, NULL, 0 },
4
     { "scenecut-bias",  required_argument, NULL, 0 },
5
+    { "ctu-info",       required_argument, NULL, 0 },
6
     { "intra-refresh",        no_argument, NULL, 0 },
7
     { "rc-lookahead",   required_argument, NULL, 0 },
8
     { "lookahead-slices", required_argument, NULL, 0 },
9
@@ -158,6 +159,8 @@
10
     { "qpstep",         required_argument, NULL, 0 },
11
     { "qpmin",          required_argument, NULL, 0 },
12
     { "qpmax",          required_argument, NULL, 0 },
13
+    { "const-vbv",            no_argument, NULL, 0 },
14
+    { "no-const-vbv",         no_argument, NULL, 0 },
15
     { "ratetol",        required_argument, NULL, 0 },
16
     { "cplxblur",       required_argument, NULL, 0 },
17
     { "qblur",          required_argument, NULL, 0 },
18
@@ -247,9 +250,13 @@
19
     { "no-slow-firstpass",    no_argument, NULL, 0 },
20
     { "multi-pass-opt-rps",   no_argument, NULL, 0 },
21
     { "no-multi-pass-opt-rps", no_argument, NULL, 0 },
22
-    { "analysis-mode",  required_argument, NULL, 0 },
23
-    { "analysis-file",  required_argument, NULL, 0 },
24
-    { "refine-level",   required_argument, NULL, 0 },
25
+    { "analysis-reuse-mode",  required_argument, NULL, 0 },
26
+    { "analysis-reuse-file",  required_argument, NULL, 0 },
27
+    { "analysis-reuse-level", required_argument, NULL, 0 },
28
+    { "scale-factor",   required_argument, NULL, 0 },
29
+    { "refine-intra",   required_argument, NULL, 0 },
30
+    { "refine-inter",   no_argument, NULL, 0 },
31
+    { "no-refine-inter",no_argument, NULL, 0 },
32
     { "strict-cbr",           no_argument, NULL, 0 },
33
     { "temporal-layers",      no_argument, NULL, 0 },
34
     { "no-temporal-layers",   no_argument, NULL, 0 },
35
@@ -271,6 +278,8 @@
36
     { "dhdr10-info",    required_argument, NULL, 0 },
37
     { "dhdr10-opt",           no_argument, NULL, 0},
38
     { "no-dhdr10-opt",        no_argument, NULL, 0},
39
+    { "refine-mv",            no_argument, NULL, 0 },
40
+    { "no-refine-mv",         no_argument, NULL, 0 },
41
     { 0, 0, 0, 0 },
42
     { 0, 0, 0, 0 },
43
     { 0, 0, 0, 0 },
44
@@ -316,9 +325,9 @@
45
     H1("                                 1 - i420 (4:2:0 default)\n");
46
     H1("                                 2 - i422 (4:2:2)\n");
47
     H1("                                 3 - i444 (4:4:4)\n");
48
-#if ENABLE_DYNAMIC_HDR10
49
-    H0("   --dhdr10-info <filename>      JSON file containing the Creative Intent Metadata to be encoded as Dynamic Tone Mapping \n");
50
-    H0("   --[no-]dhdr10-opt             Insert tone mapping SEI only for IDR frames and when the tone mapping information changes. Default disabled");
51
+#if ENABLE_HDR10_PLUS
52
+    H0("   --dhdr10-info <filename>      JSON file containing the Creative Intent Metadata to be encoded as Dynamic Tone Mapping\n");
53
+    H0("   --[no-]dhdr10-opt             Insert tone mapping SEI only for IDR frames and when the tone mapping information changes. Default disabled\n");
54
 #endif
55
     H0("-f/--frames <integer>            Maximum number of frames to encode. Default all\n");
56
     H0("   --seek <integer>              First frame to encode\n");
57
@@ -367,6 +376,11 @@
58
     H1("   --[no-]tskip-fast             Enable fast intra transform skipping. Default %s\n", OPT(param->bEnableTSkipFast));
59
     H1("   --nr-intra <integer>          An integer value in range of 0 to 2000, which denotes strength of noise reduction in intra CUs. Default 0\n");
60
     H1("   --nr-inter <integer>          An integer value in range of 0 to 2000, which denotes strength of noise reduction in inter CUs. Default 0\n");
61
+    H0("   --ctu-info <integer>          Enable receiving ctu information asynchronously and determine reaction to the CTU information (0, 1, 2, 4, 6) Default 0\n"
62
+       "                                    - 1: force the partitions if CTU information is present\n"
63
+       "                                    - 2: functionality of (1) and reduce qp if CTU information has changed\n"
64
+       "                                    - 4: functionality of (1) and force Inter modes when CTU Information has changed, merge/skip otherwise\n"
65
+       "                                    Enable this option only when planning to invoke the API function x265_encoder_ctu_info to copy ctu-info asynchronously\n");
66
     H0("\nCoding tools:\n");
67
     H0("-w/--[no-]weightp                Enable weighted prediction in P slices. Default %s\n", OPT(param->bEnableWeightedPred));
68
     H0("   --[no-]weightb                Enable weighted prediction in B slices. Default %s\n", OPT(param->bEnableWeightedBiPred));
69
@@ -431,9 +445,13 @@
70
     H0("   --[no-]analyze-src-pics       Motion estimation uses source frame planes. Default disable\n");
71
     H0("   --[no-]slow-firstpass         Enable a slow first pass in a multipass rate control mode. Default %s\n", OPT(param->rc.bEnableSlowFirstPass));
72
     H0("   --[no-]strict-cbr             Enable stricter conditions and tolerance for bitrate deviations in CBR mode. Default %s\n", OPT(param->rc.bStrictCbr));
73
-    H0("   --analysis-mode <string|int>  save - Dump analysis info into file, load - Load analysis buffers from the file. Default %d\n", param->analysisMode);
74
-    H0("   --analysis-file <filename>    Specify file name used for either dumping or reading analysis data.\n");
75
-    H0("   --refine-level <1..10>        Level of analysis refinement indicates amount of info stored/reused in save/load mode, 1:least....10:most. Default %d\n", param->analysisRefineLevel);
76
+    H0("   --analysis-reuse-mode <string|int>  save - Dump analysis info into file, load - Load analysis buffers from the file. Default %d\n", param->analysisReuseMode);
77
+    H0("   --analysis-reuse-file <filename>    Specify file name used for either dumping or reading analysis data. Deault x265_analysis.dat\n");
78
+    H0("   --analysis-reuse-level <1..10>      Level of analysis reuse indicates amount of info stored/reused in save/load mode, 1:least..10:most. Default %d\n", param->analysisReuseLevel);
79
+    H0("   --scale-factor <int>          Specify factor by which input video is scaled down for analysis save mode. Default %d\n", param->scaleFactor);
80
+    H0("   --refine-intra <int>          Enable intra refinement for load mode. Default %d\n", param->intraRefine);
81
+    H0("   --[no-]refine-inter           Enable inter refinement for load mode. Default %s\n", OPT(param->interRefine));
82
+    H0("   --[no-]refine-mv              Enable mv refinement for load mode. Default %s\n", OPT(param->mvRefine));
83
     H0("   --aq-mode <integer>           Mode for Adaptive Quantization - 0:none 1:uniform AQ 2:auto variance 3:auto variance with bias to dark scenes. Default %d\n", param->rc.aqMode);
84
     H0("   --aq-strength <float>         Reduces blocking and blurring in flat and textured areas (0 to 3.0). Default %.2f\n", param->rc.aqStrength);
85
     H0("   --[no-]aq-motion              Adaptive Quantization based on the relative motion of each CU w.r.t., frame. Default %s\n", OPT(param->bOptCUDeltaQP));
86
@@ -446,6 +464,7 @@
87
     H1("   --qpstep <integer>            The maximum single adjustment in QP allowed to rate control. Default %d\n", param->rc.qpStep);
88
     H1("   --qpmin <integer>             sets a hard lower limit on QP allowed to ratecontrol. Default %d\n", param->rc.qpMin);
89
     H1("   --qpmax <integer>             sets a hard upper limit on QP allowed to ratecontrol. Default %d\n", param->rc.qpMax);
90
+    H0("   --[no-]const-vbv              Enable consistent vbv. turned on with tune grain. Default %s\n", OPT(param->rc.bEnableConstVbv));
91
     H1("   --cbqpoffs <integer>          Chroma Cb QP Offset [-12..12]. Default %d\n", param->cbQpOffset);
92
     H1("   --crqpoffs <integer>          Chroma Cr QP Offset [-12..12]. Default %d\n", param->crQpOffset);
93
     H1("   --scaling-list <string>       Specify a file containing HM style quant scaling lists or 'default' or 'off'. Default: off\n");
94
Request History
enzokiel created request over 7 years ago

Update to version 2.5


enzokiel accepted request over 7 years ago