Packman Build Service PMBS

We truncated the diff of some files because they were too big. If you want to see the full diff for every file, click here.

Changes of Revision 24

x265.changes Changed

@@ -1,4 +1,57 @@
 -------------------------------------------------------------------
+Thu Jul 27 08:33:52 UTC 2017 - joerg.lorenzen@ki.tng.de
+
+- Update to version 2.5
+  Encoder enhancements
+  * Improved grain handling with --tune grain option by throttling
+    VBV operations to limit QP jumps.
+  * Frame threads are now decided based on number of threads
+    specified in the --pools, as opposed to the number of hardware
+    threads available. The mapping was also adjusted to improve
+    quality of the encodes with minimal impact to performance.
+  * CSV logging feature (enabled by --csv) is now part of the
+    library; it was previously part of the x265 application.
+    Applications that integrate libx265 can now extract frame level
+    statistics for their encodes by exercising this option in the
+    library.
+  * Globals that track min and max CU sizes, number of slices, and
+    other parameters have now been moved into instance-specific
+    variables. Consequently, applications that invoke multiple
+    instances of x265 library are no longer restricted to use the
+    same settings for these parameter options across the multiple
+    instances.
+  * x265 can now generate a seprate library that exports the HDR10+
+    parsing API. Other libraries that wish to use this API may do
+    so by linking against this library. Enable ENABLE_HDR10_PLUS in
+    CMake options and build to generate this library.
+  * SEA motion search receives a 10% performance boost from AVX2
+    optimization of its kernels.
+  * The CSV log is now more elaborate with additional fields such
+    as PU statistics, average-min-max luma and chroma values, etc.
+    Refer to documentation of --csv for details of all fields.
+  * x86inc.asm cleaned-up for improved instruction handling.
+  API changes
+  * New API x265_encoder_ctu_info() introduced to specify suggested
+    partition sizes for various CTUs in a frame. To be used in
+    conjunction with --ctu-info to react to the specified
+    partitions appropriately.
+  * Rate-control statistics passed through the x265_picture object
+    for an incoming frame are now used by the encoder.
+  * Options to scale, reuse, and refine analysis for incoming
+    analysis shared through the x265_analysis_data field in
+    x265_picture for runs that use --analysis-reuse-mode load; use
+    options --scale, --refine-mv, --refine-inter, and
+    --refine-intra to explore.
+  * VBV now has a deterministic mode. Use --const-vbv to exercise.
+  Bug fixes
+  * Several fixes for HDR10+ parsing code including incompatibility
+    with user-specific SEI, removal of warnings, linking issues in
+    linux, etc.
+  * SEI messages for HDR10 repeated every keyint when HDR options
+    (--hdr-opt, --master-display) specified.
+- soname bump to 130.
+
+-------------------------------------------------------------------
 Thu Apr 27 14:15:13 UTC 2017 - joerg.lorenzen@ki.tng.de
 
 - Update to version 2.4

​x
 
@@ -1,4 +1,57 @@
 -------------------------------------------------------------------
+Thu Jul 27 08:33:52 UTC 2017 - joerg.lorenzen@ki.tng.de
+
+- Update to version 2.5
+  Encoder enhancements
+  * Improved grain handling with --tune grain option by throttling
+    VBV operations to limit QP jumps.
+  * Frame threads are now decided based on number of threads
+    specified in the --pools, as opposed to the number of hardware
+    threads available. The mapping was also adjusted to improve
+    quality of the encodes with minimal impact to performance.
+  * CSV logging feature (enabled by --csv) is now part of the
+    library; it was previously part of the x265 application.
+    Applications that integrate libx265 can now extract frame level
+    statistics for their encodes by exercising this option in the
+    library.
+  * Globals that track min and max CU sizes, number of slices, and
+    other parameters have now been moved into instance-specific
+    variables. Consequently, applications that invoke multiple
+    instances of x265 library are no longer restricted to use the
+    same settings for these parameter options across the multiple
+    instances.
+  * x265 can now generate a seprate library that exports the HDR10+
+    parsing API. Other libraries that wish to use this API may do
+    so by linking against this library. Enable ENABLE_HDR10_PLUS in
+    CMake options and build to generate this library.
+  * SEA motion search receives a 10% performance boost from AVX2
+    optimization of its kernels.
+  * The CSV log is now more elaborate with additional fields such
+    as PU statistics, average-min-max luma and chroma values, etc.
+    Refer to documentation of --csv for details of all fields.
+  * x86inc.asm cleaned-up for improved instruction handling.
+  API changes
+  * New API x265_encoder_ctu_info() introduced to specify suggested
+    partition sizes for various CTUs in a frame. To be used in
+    conjunction with --ctu-info to react to the specified
+    partitions appropriately.
+  * Rate-control statistics passed through the x265_picture object
+    for an incoming frame are now used by the encoder.
+  * Options to scale, reuse, and refine analysis for incoming
+    analysis shared through the x265_analysis_data field in
+    x265_picture for runs that use --analysis-reuse-mode load; use
+    options --scale, --refine-mv, --refine-inter, and
+    --refine-intra to explore.
+  * VBV now has a deterministic mode. Use --const-vbv to exercise.
+  Bug fixes
+  * Several fixes for HDR10+ parsing code including incompatibility
+    with user-specific SEI, removal of warnings, linking issues in
+    linux, etc.
+  * SEI messages for HDR10 repeated every keyint when HDR options
+    (--hdr-opt, --master-display) specified.
+- soname bump to 130.
+
+-------------------------------------------------------------------
 Thu Apr 27 14:15:13 UTC 2017 - joerg.lorenzen@ki.tng.de
 
 - Update to version 2.4
​

x265.spec Changed

 
@@ -1,10 +1,10 @@
 # based on the spec file from https://build.opensuse.org/package/view_file/home:Simmphonie/libx265/
 
 Name:           x265
-%define soname  116
+%define soname  130
 %define libname lib%{name}
 %define libsoname %{libname}-%{soname}
-Version:        2.4
+Version:        2.5
 Release:        0
 License:        GPL-2.0+
 Summary:        A free h265/HEVC encoder - encoder binary
​

baselibs.conf Changed

 
@@ -1,1 +1,1 @@
-libx265-116
+libx265-130
​

x265_2.4.tar.gz/source/dynamicHDR10/BasicStructures.cpp Deleted

@@ -1,40 +0,0 @@
-/**
- * @file                       BasicStructures.cpp
- * @brief                      Defines the structure of metadata parameters
- * @author                     Daniel Maximiliano Valenzuela, Seongnam Oh.
- * @create date                03/01/2017
- * @version                    0.0.1
- *
- * Copyright @ 2017 Samsung Electronics, DMS Lab, Samsung Research America and Samsung Research Tijuana
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version 2
- * of the License, or (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
- * MA 02110-1301, USA.
-**/
-
-#include "BasicStructures.h"
-#include "vector"
-
-struct PercentileLuminance{
-
-    float averageLuminance = 0.0;
-    float maxRLuminance = 0.0;
-    float maxGLuminance = 0.0;
-    float maxBLuminance = 0.0;
-    int order;
-    std::vector<unsigned int> percentiles;
-};
-
-
-

 
@@ -1,40 +0,0 @@
-/**
- * @file                       BasicStructures.cpp
- * @brief                      Defines the structure of metadata parameters
- * @author                     Daniel Maximiliano Valenzuela, Seongnam Oh.
- * @create date                03/01/2017
- * @version                    0.0.1
- *
- * Copyright @ 2017 Samsung Electronics, DMS Lab, Samsung Research America and Samsung Research Tijuana
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version 2
- * of the License, or (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
- * MA 02110-1301, USA.
-**/
-
-#include "BasicStructures.h"
-#include "vector"
-
-struct PercentileLuminance{
-
-    float averageLuminance = 0.0;
-    float maxRLuminance = 0.0;
-    float maxGLuminance = 0.0;
-    float maxBLuminance = 0.0;
-    int order;
-    std::vector<unsigned int> percentiles;
-};
-
-
-
​

x265_2.4.tar.gz/.hg_archival.txt -> x265_2.5.tar.gz/.hg_archival.txt Changed

 
@@ -1,4 +1,4 @@
 repo: 09fe40627f03a0f9c3e6ac78b22ac93da23f9fdf
-node: e7a4dd48293b7956d4a20df257d23904cc78e376
+node: 64b2d0bf45a52511e57a6b7299160b961ca3d51c
 branch: stable
-tag: 2.4
+tag: 2.5
​

x265_2.4.tar.gz/.hgtags -> x265_2.5.tar.gz/.hgtags Changed

 
@@ -22,3 +22,4 @@
 981e3bfef16a997bce6f46ce1b15631a0e234747 2.1
 be14a7e9755e54f0fd34911c72bdfa66981220bc 2.2
 3037c1448549ca920967831482c653e5892fa8ed 2.3
+e7a4dd48293b7956d4a20df257d23904cc78e376 2.4
​

x265_2.4.tar.gz/doc/reST/api.rst -> x265_2.5.tar.gz/doc/reST/api.rst Changed

@@ -192,6 +192,12 @@
 	 *      presets is not recommended without a more fine-grained breakdown of
 	 *      parameters to take this into account. */
 	int x265_encoder_reconfig(x265_encoder *, x265_param *);
+**x265_encoder_ctu_info**
+       /* x265_encoder_ctu_info:
+        *    Copy CTU information such as ctu address and ctu partition structure of all
+        *    CTUs in each frame. The function is invoked only if "--ctu-info" is enabled and
+        *    the encoder will wait for this copy to complete if enabled.
+        */
 
 Pictures
 ========
@@ -341,6 +347,14 @@
 Cleanup
 =======
 
+At the end of the encode, the application will want to trigger logging
+of the final encode statistics, if :option:`--csv` had been specified::
+
+ 	/* x265_encoder_log:
+	 *       write a line to the configured CSV file. If a CSV filename was not
+	 *       configured, or file open failed, this function will perform no write. */
+ 	void x265_encoder_log(x265_encoder *encoder, int argc, char **argv);
+ 	
 Finally, the encoder must be closed in order to free all of its
 resources. An encoder that has been flushed cannot be restarted and
 reused. Once **x265_encoder_close()** has been called, the encoder

 
@@ -192,6 +192,12 @@
     *      presets is not recommended without a more fine-grained breakdown of
     *      parameters to take this into account. */
    int x265_encoder_reconfig(x265_encoder *, x265_param *);
+**x265_encoder_ctu_info**
+       /* x265_encoder_ctu_info:
+        *    Copy CTU information such as ctu address and ctu partition structure of all
+        *    CTUs in each frame. The function is invoked only if "--ctu-info" is enabled and
+        *    the encoder will wait for this copy to complete if enabled.
+        */
 
 Pictures
 ========
@@ -341,6 +347,14 @@
 Cleanup
 =======
 
+At the end of the encode, the application will want to trigger logging
+of the final encode statistics, if :option:`--csv` had been specified::
+
+   /* x265_encoder_log:
+    *       write a line to the configured CSV file. If a CSV filename was not
+    *       configured, or file open failed, this function will perform no write. */
+   void x265_encoder_log(x265_encoder *encoder, int argc, char **argv);
+   
 Finally, the encoder must be closed in order to free all of its
 resources. An encoder that has been flushed cannot be restarted and
 reused. Once **x265_encoder_close()** has been called, the encoder
​

x265_2.4.tar.gz/doc/reST/cli.rst -> x265_2.5.tar.gz/doc/reST/cli.rst Changed

@@ -52,8 +52,7 @@
 	2. unable to open encoder
 	3. unable to generate stream headers
 	4. encoder abort
-	5. unable to open csv file
-
+	
 Logging/Statistic Options
 =========================
 
@@ -83,9 +82,66 @@
 	it adds one line per run. If :option:`--csv-log-level` is greater than
 	0, it writes one line per frame. Default none
 
-	Several frame performance statistics are available when 
-	:option:`--csv-log-level` is greater than or equal to 2:
-
+	The following statistics are available when :option:`--csv-log-level` is
+	greater than or	equal to 1:
+	
+	**Encode Order** The frame order in which the encoder encodes.
+	
+	**Type** Slice type of the frame.
+	
+	**POC** Picture Order Count - The display order of the frames. 
+	
+	**QP** Quantization Parameter decided for the frame. 
+	
+	**Bits** Number of bits consumed by the frame.
+	
+	**Scenecut** 1 if the frame is a scenecut, 0 otherwise. 
+	
+	**RateFactor** Applicable only when CRF is enabled. The rate factor depends
+	on the CRF given by the user. This is used to determine the QP so as to 
+	target a certain quality.
+	
+	**BufferFill** Bits available for the next frame. Includes bits carried
+	over from the current frame.
+	
+	**Latency** Latency in terms of number of frames between when the frame 
+	was given in and when the frame is given out.
+	
+	**PSNR** Peak signal to noise ratio for Y, U and V planes.
+	
+	**SSIM** A quality metric that denotes the structural similarity between frames.
+	
+	**Ref lists** POC of references in lists 0 and 1 for the frame.
+	
+	Several statistics about the encoded bitstream and encoder performance are 
+	available when :option:`--csv-log-level` is greater than or equal to 2:
+	
+	**I/P cost ratio:** The ratio between the cost when a frame is decided as an
+	I frame to that when it is decided as a P frame as computed from the 
+	quarter-resolution frame in look-ahead. This, in combination with other parameters
+	such as position of the frame in the GOP, is used to decide scene transitions.
+	
+	**Analysis statistics:**
+	
+	**CU Statistics** percentage of CU modes.
+	
+	**Distortion** Average luma and chroma distortion. Calculated as
+	SSE is done on fenc and recon(after quantization).
+	
+	**Psy Energy**  Average psy energy calculated as the sum of absolute
+	difference between source and recon energy. Energy is measured by sa8d
+	minus SAD.
+	
+	**Residual Energy** Average residual energy. SSE is calculated on fenc 
+	and pred(before quantization).
+	
+	**Luma/Chroma Values** minumum, maximum and average(averaged by area)
+	luma and chroma values of source for each frame.
+	
+	**PU Statistics** percentage of PU modes at each depth.
+	
+	**Performance statistics:**
+	
 	**DecideWait ms** number of milliseconds the frame encoder had to
 	wait, since the previous frame was retrieved by the API thread,
 	before a new frame has been given to it. This is the latency
@@ -111,6 +167,8 @@
 	**Stall Time ms** the number of milliseconds of the reported wall
 	time that were spent with zero worker threads, aka all compression
 	was completely stalled.
+	
+	**Total frame time** Total time spent to encode the frame.
 
 	**Avg WPP** the average number of worker threads working on this
 	frame, at any given time. This value is sampled at the completion of
@@ -123,8 +181,6 @@
 	is more of a problem for P frames where some blocks are much more
 	expensive than others.
 	
-	**CLI ONLY**
-
 .. option:: --csv-log-level <integer>
 
     Controls the level of detail (and size) of --csv log files
@@ -133,8 +189,6 @@
     1. frame level logging
     2. frame level logging with performance statistics
 
-    **CLI ONLY**
-
 .. option:: --ssim, --no-ssim
 
 	Calculate and report Structural Similarity values. It is
@@ -795,33 +849,31 @@
 
 Analysis re-use options, to improve performance when encoding the same
 sequence multiple times (presumably at varying bitrates). The encoder
-will not reuse analysis if the resolution and slice type parameters do
-not match.
+will not reuse analysis if slice type parameters do not match.
 
-.. option:: --analysis-mode <string|int>
+.. option:: --analysis-reuse-mode <string|int>
 
-	Specify whether analysis information of each frame is output by encoder
-	or input for reuse. By reading the analysis data writen by an
-	earlier encode of the same sequence, substantial redundant work may
-	be avoided.
-
-	The following data may be stored and reused:
-	I frames   - split decisions and luma intra directions of all CUs.
-	P/B frames - motion vectors are dumped at each depth for all CUs.
+	This option allows reuse of analysis information from first pass to second pass.
+	:option:`--analysis-reuse-mode save` specifies that encoder outputs analysis information of each frame.
+	:option:`--analysis-reuse-mode load` specifies that encoder reuses analysis information from first pass.
+	There is no benefit using load mode without running encoder in save mode. Analysis data from save mode is
+	written to a file specified by :option:`--analysis-reuse-file`. The amount of analysis data stored/reused
+	is determined by :option:`--analysis-reuse-level`. By reading the analysis data writen by an earlier encode
+	of the same sequence, substantial redundant work may be avoided. Requires cutree, pmode to be off. Default 0.
 
 	**Values:** off(0), save(1): dump analysis data, load(2): read analysis data
 
-.. option:: --analysis-file <filename>
+.. option:: --analysis-reuse-file <filename>
 
-	Specify a filename for analysis data (see :option:`--analysis-mode`)
+	Specify a filename for analysis data (see :option:`--analysis-reuse-mode`)
 	If no filename is specified, x265_analysis.dat is used.
 
-.. option:: --refine-level <1..10>
+.. option:: --analysis-reuse-level <1..10>
 
-	Amount of information stored/reused in :option:`--analysis-mode` is distributed across levels.
+	Amount of information stored/reused in :option:`--analysis-reuse-mode` is distributed across levels.
 	Higher the value, higher the information stored/reused, faster the encode. Default 5.
 
-	Note that --refine-level must be paired with analysis-mode.
+	Note that --analysis-reuse-level must be paired with analysis-reuse-mode.
 
 	+--------+-----------------------------------------+
 	| Level  | Description                             |
@@ -835,6 +887,41 @@
 	| 10     | Level 5 + Full CU analysis-info         |
 	+--------+-----------------------------------------+
 
+.. option:: --scale-factor
+
+       Factor by which input video is scaled down for analysis save mode.
+       This option should be coupled with analysis-reuse-mode option, --analysis-reuse-level 10.
+       The ctu size of load should be double the size of save. Default 0.
+
+.. option:: --refine-intra <0|1|2>
+	
+	Enables refinement of intra blocks in current encode. 
+	
+	Level 0 - Forces both mode and depth from the previous encode.
+	
+	Level 1 - Evaluates all intra modes for blocks of size one smaller than 
+	the min-cu-size of the incoming analysis data from the previous encode, 
+	forces modes for blocks of larger size.
+	
+	Level 2 - Evaluates all intra modes for	blocks of size one smaller than 
+	the min-cu-size of the incoming analysis data from the previous encode. 
+	For larger blocks, force only depth when angular mode is chosen by the 
+	previous encode, force depth and mode when other intra modes are chosen.
+	
+	Default 0.
+	
+.. option:: --refine-inter-depth
+
+	Enables refinement of inter blocks in current encode. Evaluates all 
+	inter modes for blocks of size one smaller than the min-cu-size of the 
+	incoming analysis data from the previous encode. Default disabled.
+
+.. option:: --refine-mv
+	
+	Enables refinement of motion vector for scaled video. Evaluates the best 
+	motion vector by searching the surrounding eight integer and subpel pixel
+    positions.
+
 Options which affect the transform unit quad-tree, sometimes referred to
 as the residual quad-tree (RQT).
 
@@ -1221,7 +1308,16 @@
 	intra cost of a frame used in scenecut detection. For example, a value of 5 indicates,
 	if the inter cost of a frame is greater than or equal to 95 percent of the intra cost of the frame,

 
@@ -52,8 +52,7 @@
    2. unable to open encoder
    3. unable to generate stream headers
    4. encoder abort
-   5. unable to open csv file
-
+   
 Logging/Statistic Options
 =========================
 
@@ -83,9 +82,66 @@
    it adds one line per run. If :option:`--csv-log-level` is greater than
    0, it writes one line per frame. Default none
 
-   Several frame performance statistics are available when 
-   :option:`--csv-log-level` is greater than or equal to 2:
-
+   The following statistics are available when :option:`--csv-log-level` is
+   greater than or equal to 1:
+   
+   **Encode Order** The frame order in which the encoder encodes.
+   
+   **Type** Slice type of the frame.
+   
+   **POC** Picture Order Count - The display order of the frames. 
+   
+   **QP** Quantization Parameter decided for the frame. 
+   
+   **Bits** Number of bits consumed by the frame.
+   
+   **Scenecut** 1 if the frame is a scenecut, 0 otherwise. 
+   
+   **RateFactor** Applicable only when CRF is enabled. The rate factor depends
+   on the CRF given by the user. This is used to determine the QP so as to 
+   target a certain quality.
+   
+   **BufferFill** Bits available for the next frame. Includes bits carried
+   over from the current frame.
+   
+   **Latency** Latency in terms of number of frames between when the frame 
+   was given in and when the frame is given out.
+   
+   **PSNR** Peak signal to noise ratio for Y, U and V planes.
+   
+   **SSIM** A quality metric that denotes the structural similarity between frames.
+   
+   **Ref lists** POC of references in lists 0 and 1 for the frame.
+   
+   Several statistics about the encoded bitstream and encoder performance are 
+   available when :option:`--csv-log-level` is greater than or equal to 2:
+   
+   **I/P cost ratio:** The ratio between the cost when a frame is decided as an
+   I frame to that when it is decided as a P frame as computed from the 
+   quarter-resolution frame in look-ahead. This, in combination with other parameters
+   such as position of the frame in the GOP, is used to decide scene transitions.
+   
+   **Analysis statistics:**
+   
+   **CU Statistics** percentage of CU modes.
+   
+   **Distortion** Average luma and chroma distortion. Calculated as
+   SSE is done on fenc and recon(after quantization).
+   
+   **Psy Energy**  Average psy energy calculated as the sum of absolute
+   difference between source and recon energy. Energy is measured by sa8d
+   minus SAD.
+   
+   **Residual Energy** Average residual energy. SSE is calculated on fenc 
+   and pred(before quantization).
+   
+   **Luma/Chroma Values** minumum, maximum and average(averaged by area)
+   luma and chroma values of source for each frame.
+   
+   **PU Statistics** percentage of PU modes at each depth.
+   
+   **Performance statistics:**
+   
    **DecideWait ms** number of milliseconds the frame encoder had to
    wait, since the previous frame was retrieved by the API thread,
    before a new frame has been given to it. This is the latency
@@ -111,6 +167,8 @@
    **Stall Time ms** the number of milliseconds of the reported wall
    time that were spent with zero worker threads, aka all compression
    was completely stalled.
+   
+   **Total frame time** Total time spent to encode the frame.
 
    **Avg WPP** the average number of worker threads working on this
    frame, at any given time. This value is sampled at the completion of
@@ -123,8 +181,6 @@
    is more of a problem for P frames where some blocks are much more
    expensive than others.
    
-   **CLI ONLY**
-
 .. option:: --csv-log-level <integer>
 
     Controls the level of detail (and size) of --csv log files
@@ -133,8 +189,6 @@
     1. frame level logging
     2. frame level logging with performance statistics
 
-    **CLI ONLY**
-
 .. option:: --ssim, --no-ssim
 
    Calculate and report Structural Similarity values. It is
@@ -795,33 +849,31 @@
 
 Analysis re-use options, to improve performance when encoding the same
 sequence multiple times (presumably at varying bitrates). The encoder
-will not reuse analysis if the resolution and slice type parameters do
-not match.
+will not reuse analysis if slice type parameters do not match.
 
-.. option:: --analysis-mode <string|int>
+.. option:: --analysis-reuse-mode <string|int>
 
-   Specify whether analysis information of each frame is output by encoder
-   or input for reuse. By reading the analysis data writen by an
-   earlier encode of the same sequence, substantial redundant work may
-   be avoided.
-
-   The following data may be stored and reused:
-   I frames   - split decisions and luma intra directions of all CUs.
-   P/B frames - motion vectors are dumped at each depth for all CUs.
+   This option allows reuse of analysis information from first pass to second pass.
+   :option:`--analysis-reuse-mode save` specifies that encoder outputs analysis information of each frame.
+   :option:`--analysis-reuse-mode load` specifies that encoder reuses analysis information from first pass.
+   There is no benefit using load mode without running encoder in save mode. Analysis data from save mode is
+   written to a file specified by :option:`--analysis-reuse-file`. The amount of analysis data stored/reused
+   is determined by :option:`--analysis-reuse-level`. By reading the analysis data writen by an earlier encode
+   of the same sequence, substantial redundant work may be avoided. Requires cutree, pmode to be off. Default 0.
 
    **Values:** off(0), save(1): dump analysis data, load(2): read analysis data
 
-.. option:: --analysis-file <filename>
+.. option:: --analysis-reuse-file <filename>
 
-   Specify a filename for analysis data (see :option:`--analysis-mode`)
+   Specify a filename for analysis data (see :option:`--analysis-reuse-mode`)
    If no filename is specified, x265_analysis.dat is used.
 
-.. option:: --refine-level <1..10>
+.. option:: --analysis-reuse-level <1..10>
 
-   Amount of information stored/reused in :option:`--analysis-mode` is distributed across levels.
+   Amount of information stored/reused in :option:`--analysis-reuse-mode` is distributed across levels.
    Higher the value, higher the information stored/reused, faster the encode. Default 5.
 
-   Note that --refine-level must be paired with analysis-mode.
+   Note that --analysis-reuse-level must be paired with analysis-reuse-mode.
 
    +--------+-----------------------------------------+
    | Level  | Description                             |
@@ -835,6 +887,41 @@
    | 10     | Level 5 + Full CU analysis-info         |
    +--------+-----------------------------------------+
 
+.. option:: --scale-factor
+
+       Factor by which input video is scaled down for analysis save mode.
+       This option should be coupled with analysis-reuse-mode option, --analysis-reuse-level 10.
+       The ctu size of load should be double the size of save. Default 0.
+
+.. option:: --refine-intra <0|1|2>
+   
+   Enables refinement of intra blocks in current encode. 
+   
+   Level 0 - Forces both mode and depth from the previous encode.
+   
+   Level 1 - Evaluates all intra modes for blocks of size one smaller than 
+   the min-cu-size of the incoming analysis data from the previous encode, 
+   forces modes for blocks of larger size.
+   
+   Level 2 - Evaluates all intra modes for blocks of size one smaller than 
+   the min-cu-size of the incoming analysis data from the previous encode. 
+   For larger blocks, force only depth when angular mode is chosen by the 
+   previous encode, force depth and mode when other intra modes are chosen.
+   
+   Default 0.
+   
+.. option:: --refine-inter-depth
+
+   Enables refinement of inter blocks in current encode. Evaluates all 
+   inter modes for blocks of size one smaller than the min-cu-size of the 
+   incoming analysis data from the previous encode. Default disabled.
+
+.. option:: --refine-mv
+   
+   Enables refinement of motion vector for scaled video. Evaluates the best 
+   motion vector by searching the surrounding eight integer and subpel pixel
+    positions.
+
 Options which affect the transform unit quad-tree, sometimes referred to
 as the residual quad-tree (RQT).
 
@@ -1221,7 +1308,16 @@
    intra cost of a frame used in scenecut detection. For example, a value of 5 indicates,
    if the inter cost of a frame is greater than or equal to 95 percent of the intra cost of the frame,
​

x265_2.4.tar.gz/doc/reST/releasenotes.rst -> x265_2.5.tar.gz/doc/reST/releasenotes.rst Changed

@@ -2,8 +2,33 @@
 Release Notes
 *************
 
-Release Notes
-*************
+Version 2.5
+===========
+
+Release date - 13th July, 2017.
+
+Encoder enhancements
+--------------------
+1. Improved grain handling with :option:`--tune` grain option by throttling VBV operations to limit QP jumps.
+2. Frame threads are now decided based on number of threads specified in the :option:`--pools`, as opposed to the number of hardware threads available. The mapping was also adjusted to improve quality of the encodes with minimal impact to performance.
+3. CSV logging feature (enabled by :option:`--csv`) is now part of the library; it was previously part of the x265 application. Applications that integrate libx265 can now extract frame level statistics for their encodes by exercising this option in the library.
+4.  Globals that track min and max CU sizes, number of slices, and other parameters have now been moved into instance-specific variables. Consequently, applications that invoke multiple instances of x265 library are no longer restricted to use the same settings for these parameter options across the multiple instances.
+5. x265 can now generate a seprate library that exports the HDR10+ parsing API. Other libraries that wish to use this API may do so by linking against this library. Enable ENABLE_HDR10_PLUS in CMake options and build to generate this library.
+6. SEA motion search receives a 10% performance boost from AVX2 optimization of its kernels.
+7. The CSV log is now more elaborate with additional fields such as PU statistics, average-min-max luma and chroma values, etc. Refer to documentation of :option:`--csv` for details of all fields.
+8. x86inc.asm cleaned-up for improved instruction handling.
+
+API changes
+-----------
+1. New API x265_encoder_ctu_info() introduced to specify suggested partition sizes for various CTUs in a frame. To be used in conjunction with :option:`--ctu-info` to react to the specified partitions appropriately.
+2. Rate-control statistics passed through the x265_picture object for an incoming frame are now used by the encoder.
+3. Options to scale, reuse, and refine analysis for incoming analysis shared through the x265_analysis_data field in x265_picture for runs that use :option:`--analysis-reuse-mode` load; use options :option:`--scale`, :option:`--refine-mv`, :option:`--refine-inter`, and :option:`--refine-intra` to explore. 
+4. VBV now has a deterministic mode. Use :option:`--const-vbv` to exercise.
+
+Bug fixes
+---------
+1. Several fixes for HDR10+ parsing code including incompatibility with user-specific SEI, removal of warnings, linking issues in linux, etc.
+2. SEI messages for HDR10 repeated every keyint when HDR options (:option:`--hdr-opt`, :option:`--master-display`) specified.
 
 Version 2.4
 ===========

 
@@ -2,8 +2,33 @@
 Release Notes
 *************
 
-Release Notes
-*************
+Version 2.5
+===========
+
+Release date - 13th July, 2017.
+
+Encoder enhancements
+--------------------
+1. Improved grain handling with :option:`--tune` grain option by throttling VBV operations to limit QP jumps.
+2. Frame threads are now decided based on number of threads specified in the :option:`--pools`, as opposed to the number of hardware threads available. The mapping was also adjusted to improve quality of the encodes with minimal impact to performance.
+3. CSV logging feature (enabled by :option:`--csv`) is now part of the library; it was previously part of the x265 application. Applications that integrate libx265 can now extract frame level statistics for their encodes by exercising this option in the library.
+4.  Globals that track min and max CU sizes, number of slices, and other parameters have now been moved into instance-specific variables. Consequently, applications that invoke multiple instances of x265 library are no longer restricted to use the same settings for these parameter options across the multiple instances.
+5. x265 can now generate a seprate library that exports the HDR10+ parsing API. Other libraries that wish to use this API may do so by linking against this library. Enable ENABLE_HDR10_PLUS in CMake options and build to generate this library.
+6. SEA motion search receives a 10% performance boost from AVX2 optimization of its kernels.
+7. The CSV log is now more elaborate with additional fields such as PU statistics, average-min-max luma and chroma values, etc. Refer to documentation of :option:`--csv` for details of all fields.
+8. x86inc.asm cleaned-up for improved instruction handling.
+
+API changes
+-----------
+1. New API x265_encoder_ctu_info() introduced to specify suggested partition sizes for various CTUs in a frame. To be used in conjunction with :option:`--ctu-info` to react to the specified partitions appropriately.
+2. Rate-control statistics passed through the x265_picture object for an incoming frame are now used by the encoder.
+3. Options to scale, reuse, and refine analysis for incoming analysis shared through the x265_analysis_data field in x265_picture for runs that use :option:`--analysis-reuse-mode` load; use options :option:`--scale`, :option:`--refine-mv`, :option:`--refine-inter`, and :option:`--refine-intra` to explore. 
+4. VBV now has a deterministic mode. Use :option:`--const-vbv` to exercise.
+
+Bug fixes
+---------
+1. Several fixes for HDR10+ parsing code including incompatibility with user-specific SEI, removal of warnings, linking issues in linux, etc.
+2. SEI messages for HDR10 repeated every keyint when HDR options (:option:`--hdr-opt`, :option:`--master-display`) specified.
 
 Version 2.4
 ===========
​

x265_2.4.tar.gz/source/CMakeLists.txt -> x265_2.5.tar.gz/source/CMakeLists.txt Changed

@@ -29,7 +29,7 @@
 option(STATIC_LINK_CRT "Statically link C runtime for release builds" OFF)
 mark_as_advanced(FPROFILE_USE FPROFILE_GENERATE NATIVE_BUILD)
 # X265_BUILD must be incremented each time the public API is changed
-set(X265_BUILD 116)
+set(X265_BUILD 130)
 configure_file("${PROJECT_SOURCE_DIR}/x265.def.in"
                "${PROJECT_BINARY_DIR}/x265.def")
 configure_file("${PROJECT_SOURCE_DIR}/x265_config.h.in"
@@ -182,12 +182,19 @@
     add_definitions(-O3 -qstrict -qhot -qaltivec)
     add_definitions(-qinline=level=10 -qpath=IL:/data/video_files/latest.tpo/)
 endif()
-
-
+# this option is to enable the inclusion of dynamic HDR10 library to the libx265 compilation
+option(ENABLE_HDR10_PLUS "Enable dynamic HDR10 compilation" OFF)
 if(GCC)
     add_definitions(-Wall -Wextra -Wshadow)
     add_definitions(-D__STDC_LIMIT_MACROS=1)
-    add_definitions(-std=gnu++98)
+    if(ENABLE_HDR10_PLUS)
+        if(CMAKE_CXX_COMPILER_VERSION VERSION_LESS "4.8")
+            message(FATAL_ERROR "gcc version above 4.8 required to support hdr10plus")
+        endif()
+        add_definitions(-std=gnu++11)
+    else()
+        add_definitions(-std=gnu++98)
+    endif()
     if(ENABLE_PIC)
          add_definitions(-fPIC)
     endif(ENABLE_PIC)
@@ -363,14 +370,12 @@
 else(HIGH_BIT_DEPTH)
     add_definitions(-DHIGH_BIT_DEPTH=0 -DX265_DEPTH=8)
 endif(HIGH_BIT_DEPTH)
-# this option is to enable the inclusion of dynamic HDR10 library to the libx265 compilation
-option(ENABLE_DYNAMIC_HDR10 "Enable dynamic HDR10 compilation" OFF)
-if (ENABLE_DYNAMIC_HDR10)
-    add_subdirectory(dynamicHDR10)
-    include_directories(dynamicHDR10)
-    add_definitions(-DENABLE_DYNAMIC_HDR10)
-endif(ENABLE_DYNAMIC_HDR10)
 
+if (ENABLE_HDR10_PLUS)
+    include_directories(. dynamicHDR10 "${PROJECT_BINARY_DIR}")
+    add_subdirectory(dynamicHDR10)
+    add_definitions(-DENABLE_HDR10_PLUS)
+endif(ENABLE_HDR10_PLUS)
 # this option can only be used when linking multiple libx265 libraries
 # together, and some alternate API access method is implemented.
 option(EXPORT_C_API "Implement public C programming interface" ON)
@@ -510,8 +515,10 @@
     endif()
 endif()
 source_group(ASM FILES ${ASM_SRCS})
-if(ENABLE_DYNAMIC_HDR10)
+if(ENABLE_HDR10_PLUS)
     add_library(x265-static STATIC $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> $<TARGET_OBJECTS:dynamicHDR10> ${ASM_OBJS} ${ASM_SRCS})
+    add_library(hdr10plus-static STATIC $<TARGET_OBJECTS:dynamicHDR10>)
+    set_target_properties(hdr10plus-static PROPERTIES OUTPUT_NAME hdr10plus)
 else()
     add_library(x265-static STATIC $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> ${ASM_OBJS} ${ASM_SRCS})
 endif()
@@ -524,6 +531,12 @@
 install(TARGETS x265-static
     LIBRARY DESTINATION ${LIB_INSTALL_DIR}
     ARCHIVE DESTINATION ${LIB_INSTALL_DIR})
+
+if(ENABLE_HDR10_PLUS)
+    install(TARGETS hdr10plus-static
+        LIBRARY DESTINATION ${LIB_INSTALL_DIR}
+        ARCHIVE DESTINATION ${LIB_INSTALL_DIR})
+endif()
 install(FILES x265.h "${PROJECT_BINARY_DIR}/x265_config.h" DESTINATION include)
 
 if(CMAKE_RC_COMPILER)
@@ -547,10 +560,16 @@
 endif()
 option(ENABLE_SHARED "Build shared library" ON)
 if(ENABLE_SHARED)
-
-    if(ENABLE_DYNAMIC_HDR10)
+    if(ENABLE_HDR10_PLUS)
         add_library(x265-shared SHARED "${PROJECT_BINARY_DIR}/x265.def" ${ASM_OBJS}
                     ${X265_RC_FILE} $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> $<TARGET_OBJECTS:dynamicHDR10>)
+        add_library(hdr10plus-shared SHARED $<TARGET_OBJECTS:dynamicHDR10>)
+
+        if(MSVC)
+            set_target_properties(hdr10plus-shared PROPERTIES OUTPUT_NAME libhdr10plus)
+        else()
+            set_target_properties(hdr10plus-shared PROPERTIES OUTPUT_NAME hdr10plus)
+        endif()
     else()
         add_library(x265-shared SHARED "${PROJECT_BINARY_DIR}/x265.def" ${ASM_OBJS}
                    ${X265_RC_FILE} $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common>)
@@ -585,6 +604,11 @@
                 ARCHIVE DESTINATION ${LIB_INSTALL_DIR}
                 RUNTIME DESTINATION ${BIN_INSTALL_DIR})
     endif()
+    if(ENABLE_HDR10_PLUS)
+        install(TARGETS hdr10plus-shared
+            LIBRARY DESTINATION ${LIB_INSTALL_DIR}
+            ARCHIVE DESTINATION ${LIB_INSTALL_DIR})
+    endif()
     if(LINKER_OPTIONS)
         # set_target_properties can't do list expansion
         string(REPLACE ";" " " LINKER_OPTION_STR "${LINKER_OPTIONS}")
@@ -646,18 +670,18 @@
     endif(WIN32)
     if(XCODE)
         # Xcode seems unable to link the CLI with libs, so link as one targget
-        if(ENABLE_DYNAMIC_HDR10)
+        if(ENABLE_HDR10_PLUS)
         add_executable(cli ../COPYING ${InputFiles} ${OutputFiles} ${GETOPT}
-                        x265.cpp x265.h x265cli.h x265-extras.h x265-extras.cpp
+                        x265.cpp x265.h x265cli.h
                         $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> $<TARGET_OBJECTS:dynamicHDR10> ${ASM_OBJS} ${ASM_SRCS})
         else()
             add_executable(cli ../COPYING ${InputFiles} ${OutputFiles} ${GETOPT}
-                        x265.cpp x265.h x265cli.h x265-extras.h x265-extras.cpp
+                        x265.cpp x265.h x265cli.h
                         $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> ${ASM_OBJS} ${ASM_SRCS})
         endif()
     else()
         add_executable(cli ../COPYING ${InputFiles} ${OutputFiles} ${GETOPT} ${X265_RC_FILE}
-                       ${ExportDefs} x265.cpp x265.h x265cli.h x265-extras.h x265-extras.cpp)
+                       ${ExportDefs} x265.cpp x265.h x265cli.h)
         if(WIN32 OR NOT ENABLE_SHARED OR INTEL_CXX)
             # The CLI cannot link to the shared library on Windows, it
             # requires internal APIs not exported from the DLL

 
@@ -29,7 +29,7 @@
 option(STATIC_LINK_CRT "Statically link C runtime for release builds" OFF)
 mark_as_advanced(FPROFILE_USE FPROFILE_GENERATE NATIVE_BUILD)
 # X265_BUILD must be incremented each time the public API is changed
-set(X265_BUILD 116)
+set(X265_BUILD 130)
 configure_file("${PROJECT_SOURCE_DIR}/x265.def.in"
                "${PROJECT_BINARY_DIR}/x265.def")
 configure_file("${PROJECT_SOURCE_DIR}/x265_config.h.in"
@@ -182,12 +182,19 @@
     add_definitions(-O3 -qstrict -qhot -qaltivec)
     add_definitions(-qinline=level=10 -qpath=IL:/data/video_files/latest.tpo/)
 endif()
-
-
+# this option is to enable the inclusion of dynamic HDR10 library to the libx265 compilation
+option(ENABLE_HDR10_PLUS "Enable dynamic HDR10 compilation" OFF)
 if(GCC)
     add_definitions(-Wall -Wextra -Wshadow)
     add_definitions(-D__STDC_LIMIT_MACROS=1)
-    add_definitions(-std=gnu++98)
+    if(ENABLE_HDR10_PLUS)
+        if(CMAKE_CXX_COMPILER_VERSION VERSION_LESS "4.8")
+            message(FATAL_ERROR "gcc version above 4.8 required to support hdr10plus")
+        endif()
+        add_definitions(-std=gnu++11)
+    else()
+        add_definitions(-std=gnu++98)
+    endif()
     if(ENABLE_PIC)
          add_definitions(-fPIC)
     endif(ENABLE_PIC)
@@ -363,14 +370,12 @@
 else(HIGH_BIT_DEPTH)
     add_definitions(-DHIGH_BIT_DEPTH=0 -DX265_DEPTH=8)
 endif(HIGH_BIT_DEPTH)
-# this option is to enable the inclusion of dynamic HDR10 library to the libx265 compilation
-option(ENABLE_DYNAMIC_HDR10 "Enable dynamic HDR10 compilation" OFF)
-if (ENABLE_DYNAMIC_HDR10)
-    add_subdirectory(dynamicHDR10)
-    include_directories(dynamicHDR10)
-    add_definitions(-DENABLE_DYNAMIC_HDR10)
-endif(ENABLE_DYNAMIC_HDR10)
 
+if (ENABLE_HDR10_PLUS)
+    include_directories(. dynamicHDR10 "${PROJECT_BINARY_DIR}")
+    add_subdirectory(dynamicHDR10)
+    add_definitions(-DENABLE_HDR10_PLUS)
+endif(ENABLE_HDR10_PLUS)
 # this option can only be used when linking multiple libx265 libraries
 # together, and some alternate API access method is implemented.
 option(EXPORT_C_API "Implement public C programming interface" ON)
@@ -510,8 +515,10 @@
     endif()
 endif()
 source_group(ASM FILES ${ASM_SRCS})
-if(ENABLE_DYNAMIC_HDR10)
+if(ENABLE_HDR10_PLUS)
     add_library(x265-static STATIC $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> $<TARGET_OBJECTS:dynamicHDR10> ${ASM_OBJS} ${ASM_SRCS})
+    add_library(hdr10plus-static STATIC $<TARGET_OBJECTS:dynamicHDR10>)
+    set_target_properties(hdr10plus-static PROPERTIES OUTPUT_NAME hdr10plus)
 else()
     add_library(x265-static STATIC $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> ${ASM_OBJS} ${ASM_SRCS})
 endif()
@@ -524,6 +531,12 @@
 install(TARGETS x265-static
     LIBRARY DESTINATION ${LIB_INSTALL_DIR}
     ARCHIVE DESTINATION ${LIB_INSTALL_DIR})
+
+if(ENABLE_HDR10_PLUS)
+    install(TARGETS hdr10plus-static
+        LIBRARY DESTINATION ${LIB_INSTALL_DIR}
+        ARCHIVE DESTINATION ${LIB_INSTALL_DIR})
+endif()
 install(FILES x265.h "${PROJECT_BINARY_DIR}/x265_config.h" DESTINATION include)
 
 if(CMAKE_RC_COMPILER)
@@ -547,10 +560,16 @@
 endif()
 option(ENABLE_SHARED "Build shared library" ON)
 if(ENABLE_SHARED)
-
-    if(ENABLE_DYNAMIC_HDR10)
+    if(ENABLE_HDR10_PLUS)
         add_library(x265-shared SHARED "${PROJECT_BINARY_DIR}/x265.def" ${ASM_OBJS}
                     ${X265_RC_FILE} $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> $<TARGET_OBJECTS:dynamicHDR10>)
+        add_library(hdr10plus-shared SHARED $<TARGET_OBJECTS:dynamicHDR10>)
+
+        if(MSVC)
+            set_target_properties(hdr10plus-shared PROPERTIES OUTPUT_NAME libhdr10plus)
+        else()
+            set_target_properties(hdr10plus-shared PROPERTIES OUTPUT_NAME hdr10plus)
+        endif()
     else()
         add_library(x265-shared SHARED "${PROJECT_BINARY_DIR}/x265.def" ${ASM_OBJS}
                    ${X265_RC_FILE} $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common>)
@@ -585,6 +604,11 @@
                 ARCHIVE DESTINATION ${LIB_INSTALL_DIR}
                 RUNTIME DESTINATION ${BIN_INSTALL_DIR})
     endif()
+    if(ENABLE_HDR10_PLUS)
+        install(TARGETS hdr10plus-shared
+            LIBRARY DESTINATION ${LIB_INSTALL_DIR}
+            ARCHIVE DESTINATION ${LIB_INSTALL_DIR})
+    endif()
     if(LINKER_OPTIONS)
         # set_target_properties can't do list expansion
         string(REPLACE ";" " " LINKER_OPTION_STR "${LINKER_OPTIONS}")
@@ -646,18 +670,18 @@
     endif(WIN32)
     if(XCODE)
         # Xcode seems unable to link the CLI with libs, so link as one targget
-        if(ENABLE_DYNAMIC_HDR10)
+        if(ENABLE_HDR10_PLUS)
         add_executable(cli ../COPYING ${InputFiles} ${OutputFiles} ${GETOPT}
-                        x265.cpp x265.h x265cli.h x265-extras.h x265-extras.cpp
+                        x265.cpp x265.h x265cli.h
                         $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> $<TARGET_OBJECTS:dynamicHDR10> ${ASM_OBJS} ${ASM_SRCS})
         else()
             add_executable(cli ../COPYING ${InputFiles} ${OutputFiles} ${GETOPT}
-                        x265.cpp x265.h x265cli.h x265-extras.h x265-extras.cpp
+                        x265.cpp x265.h x265cli.h
                         $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> ${ASM_OBJS} ${ASM_SRCS})
         endif()
     else()
         add_executable(cli ../COPYING ${InputFiles} ${OutputFiles} ${GETOPT} ${X265_RC_FILE}
-                       ${ExportDefs} x265.cpp x265.h x265cli.h x265-extras.h x265-extras.cpp)
+                       ${ExportDefs} x265.cpp x265.h x265cli.h)
         if(WIN32 OR NOT ENABLE_SHARED OR INTEL_CXX)
             # The CLI cannot link to the shared library on Windows, it
             # requires internal APIs not exported from the DLL
​

x265_2.4.tar.gz/source/common/CMakeLists.txt -> x265_2.5.tar.gz/source/common/CMakeLists.txt Changed

 
@@ -57,10 +57,10 @@
     set(VEC_PRIMITIVES vec/vec-primitives.cpp ${PRIMITIVES})
     source_group(Intrinsics FILES ${VEC_PRIMITIVES})
 
-    set(C_SRCS asm-primitives.cpp pixel.h mc.h ipfilter8.h blockcopy8.h dct8.h loopfilter.h)
+    set(C_SRCS asm-primitives.cpp pixel.h mc.h ipfilter8.h blockcopy8.h dct8.h loopfilter.h seaintegral.h)
     set(A_SRCS pixel-a.asm const-a.asm cpu-a.asm ssd-a.asm mc-a.asm
                mc-a2.asm pixel-util8.asm blockcopy8.asm
-               pixeladd8.asm dct8.asm)
+               pixeladd8.asm dct8.asm seaintegral.asm)
     if(HIGH_BIT_DEPTH)
         set(A_SRCS ${A_SRCS} sad16-a.asm intrapred16.asm ipfilter16.asm loopfilter.asm)
     else()
​

x265_2.4.tar.gz/source/common/common.h -> x265_2.5.tar.gz/source/common/common.h Changed

 
@@ -259,7 +259,6 @@
 #define LOG2_RASTER_SIZE        (MAX_LOG2_CU_SIZE - LOG2_UNIT_SIZE)
 #define RASTER_SIZE             (1 << LOG2_RASTER_SIZE)
 #define MAX_NUM_PARTITIONS      (RASTER_SIZE * RASTER_SIZE)
-#define NUM_4x4_PARTITIONS      (1U << (g_unitSizeDepth << 1)) // number of 4x4 units in max CU size
 
 #define MIN_PU_SIZE             4
 #define MIN_TU_SIZE             4
​

x265_2.4.tar.gz/source/common/constants.cpp -> x265_2.5.tar.gz/source/common/constants.cpp Changed

 
@@ -161,7 +161,6 @@
     65535
 };
 
-int      g_ctuSizeConfigured = 0;
 uint32_t g_maxLog2CUSize = MAX_LOG2_CU_SIZE;
 uint32_t g_maxCUSize     = MAX_CU_SIZE;
 uint32_t g_unitSizeDepth = NUM_CU_DEPTH;
​

x265_2.4.tar.gz/source/common/constants.h -> x265_2.5.tar.gz/source/common/constants.h Changed

 
@@ -30,8 +30,6 @@
 namespace X265_NS {
 // private namespace
 
-extern int g_ctuSizeConfigured;
-
 extern double x265_lambda_tab[QP_MAX_MAX + 1];
 extern double x265_lambda2_tab[QP_MAX_MAX + 1];
 extern const uint16_t x265_chroma_lambda2_offset_tab[MAX_CHROMA_LAMBDA_OFFSET + 1];
​

x265_2.4.tar.gz/source/common/cpu.cpp -> x265_2.5.tar.gz/source/common/cpu.cpp Changed

@@ -69,6 +69,7 @@
     { "SSE2Slow",    SSE2 | X265_CPU_SSE2_IS_SLOW },
     { "SSE2",        SSE2 },
     { "SSE2Fast",    SSE2 | X265_CPU_SSE2_IS_FAST },
+    { "LZCNT", X265_CPU_LZCNT },
     { "SSE3",        SSE2 | X265_CPU_SSE3 },
     { "SSSE3",       SSE2 | X265_CPU_SSE3 | X265_CPU_SSSE3 },
     { "SSE4.1",      SSE2 | X265_CPU_SSE3 | X265_CPU_SSSE3 | X265_CPU_SSE4 },
@@ -78,16 +79,17 @@
     { "AVX",         AVX },
     { "XOP",         AVX | X265_CPU_XOP },
     { "FMA4",        AVX | X265_CPU_FMA4 },
-    { "AVX2",        AVX | X265_CPU_AVX2 },
     { "FMA3",        AVX | X265_CPU_FMA3 },
+    { "BMI1",        AVX | X265_CPU_LZCNT | X265_CPU_BMI1 },
+    { "BMI2",        AVX | X265_CPU_LZCNT | X265_CPU_BMI1 | X265_CPU_BMI2 },
+#define AVX2 AVX | X265_CPU_FMA3 | X265_CPU_LZCNT | X265_CPU_BMI1 | X265_CPU_BMI2 | X265_CPU_AVX2
+    { "AVX2", AVX2},
+#undef AVX2
 #undef AVX
 #undef SSE2
 #undef MMX2
     { "Cache32",         X265_CPU_CACHELINE_32 },
     { "Cache64",         X265_CPU_CACHELINE_64 },
-    { "LZCNT",           X265_CPU_LZCNT },
-    { "BMI1",            X265_CPU_BMI1 },
-    { "BMI2",            X265_CPU_BMI1 | X265_CPU_BMI2 },
     { "SlowCTZ",         X265_CPU_SLOW_CTZ },
     { "SlowAtom",        X265_CPU_SLOW_ATOM },
     { "SlowPshufb",      X265_CPU_SLOW_PSHUFB },

 
@@ -69,6 +69,7 @@
     { "SSE2Slow",    SSE2 | X265_CPU_SSE2_IS_SLOW },
     { "SSE2",        SSE2 },
     { "SSE2Fast",    SSE2 | X265_CPU_SSE2_IS_FAST },
+    { "LZCNT", X265_CPU_LZCNT },
     { "SSE3",        SSE2 | X265_CPU_SSE3 },
     { "SSSE3",       SSE2 | X265_CPU_SSE3 | X265_CPU_SSSE3 },
     { "SSE4.1",      SSE2 | X265_CPU_SSE3 | X265_CPU_SSSE3 | X265_CPU_SSE4 },
@@ -78,16 +79,17 @@
     { "AVX",         AVX },
     { "XOP",         AVX | X265_CPU_XOP },
     { "FMA4",        AVX | X265_CPU_FMA4 },
-    { "AVX2",        AVX | X265_CPU_AVX2 },
     { "FMA3",        AVX | X265_CPU_FMA3 },
+    { "BMI1",        AVX | X265_CPU_LZCNT | X265_CPU_BMI1 },
+    { "BMI2",        AVX | X265_CPU_LZCNT | X265_CPU_BMI1 | X265_CPU_BMI2 },
+#define AVX2 AVX | X265_CPU_FMA3 | X265_CPU_LZCNT | X265_CPU_BMI1 | X265_CPU_BMI2 | X265_CPU_AVX2
+    { "AVX2", AVX2},
+#undef AVX2
 #undef AVX
 #undef SSE2
 #undef MMX2
     { "Cache32",         X265_CPU_CACHELINE_32 },
     { "Cache64",         X265_CPU_CACHELINE_64 },
-    { "LZCNT",           X265_CPU_LZCNT },
-    { "BMI1",            X265_CPU_BMI1 },
-    { "BMI2",            X265_CPU_BMI1 | X265_CPU_BMI2 },
     { "SlowCTZ",         X265_CPU_SLOW_CTZ },
     { "SlowAtom",        X265_CPU_SLOW_ATOM },
     { "SlowPshufb",      X265_CPU_SLOW_PSHUFB },
​

x265_2.4.tar.gz/source/common/cudata.cpp -> x265_2.5.tar.gz/source/common/cudata.cpp Changed

@@ -28,6 +28,7 @@
 #include "picyuv.h"
 #include "mv.h"
 #include "cudata.h"
+#define MAX_MV 1 << 14
 
 using namespace X265_NS;
 
@@ -110,25 +111,23 @@
 
 }
 
-cubcast_t CUData::s_partSet[NUM_FULL_DEPTH] = { NULL, NULL, NULL, NULL, NULL };
-uint32_t CUData::s_numPartInCUSize;
-
 CUData::CUData()
 {
     memset(this, 0, sizeof(*this));
 }
 
-void CUData::initialize(const CUDataMemPool& dataPool, uint32_t depth, int csp, int instance)
+void CUData::initialize(const CUDataMemPool& dataPool, uint32_t depth, const x265_param& param, int instance)
 {
+    int csp = param.internalCsp;
     m_chromaFormat  = csp;
     m_hChromaShift  = CHROMA_H_SHIFT(csp);
     m_vChromaShift  = CHROMA_V_SHIFT(csp);
-    m_numPartitions = NUM_4x4_PARTITIONS >> (depth * 2);
+    m_numPartitions = param.num4x4Partitions >> (depth * 2);
 
     if (!s_partSet[0])
     {
-        s_numPartInCUSize = 1 << g_unitSizeDepth;
-        switch (g_maxLog2CUSize)
+        s_numPartInCUSize = 1 << param.unitSizeDepth;
+        switch (param.maxLog2CUSize)
         {
         case 6:
             s_partSet[0] = bcast256;
@@ -220,7 +219,7 @@
 
         m_distortion = dataPool.distortionMemBlock + instance * m_numPartitions;
 
-        uint32_t cuSize = g_maxCUSize >> depth;
+        uint32_t cuSize = param.maxCUSize >> depth;
         m_trCoeff[0] = dataPool.trCoeffMemBlock + instance * (cuSize * cuSize);
         m_trCoeff[1] = m_trCoeff[2] = 0;
         m_transformSkip[1] = m_transformSkip[2] = m_cbf[1] = m_cbf[2] = 0;
@@ -262,7 +261,7 @@
 
         m_distortion = dataPool.distortionMemBlock + instance * m_numPartitions;
 
-        uint32_t cuSize = g_maxCUSize >> depth;
+        uint32_t cuSize = param.maxCUSize >> depth;
         uint32_t sizeL = cuSize * cuSize;
         uint32_t sizeC = sizeL >> (m_hChromaShift + m_vChromaShift); // block chroma part
         m_trCoeff[0] = dataPool.trCoeffMemBlock + instance * (sizeL + sizeC * 2);
@@ -278,17 +277,17 @@
     m_encData       = frame.m_encData;
     m_slice         = m_encData->m_slice;
     m_cuAddr        = cuAddr;
-    m_cuPelX        = (cuAddr % m_slice->m_sps->numCuInWidth) << g_maxLog2CUSize;
-    m_cuPelY        = (cuAddr / m_slice->m_sps->numCuInWidth) << g_maxLog2CUSize;
+    m_cuPelX        = (cuAddr % m_slice->m_sps->numCuInWidth) << m_slice->m_param->maxLog2CUSize;
+    m_cuPelY        = (cuAddr / m_slice->m_sps->numCuInWidth) << m_slice->m_param->maxLog2CUSize;
     m_absIdxInCTU   = 0;
-    m_numPartitions = NUM_4x4_PARTITIONS;
+    m_numPartitions = m_encData->m_param->num4x4Partitions;
     m_bFirstRowInSlice = (uint8_t)firstRowInSlice;
     m_bLastRowInSlice  = (uint8_t)lastRowInSlice;
     m_bLastCuInSlice   = (uint8_t)lastCuInSlice;
 
     /* sequential memsets */
     m_partSet((uint8_t*)m_qp, (uint8_t)qp);
-    m_partSet(m_log2CUSize,   (uint8_t)g_maxLog2CUSize);
+    m_partSet(m_log2CUSize,   (uint8_t)m_slice->m_param->maxLog2CUSize);
     m_partSet(m_lumaIntraDir, (uint8_t)ALL_IDX);
     m_partSet(m_chromaIntraDir, (uint8_t)ALL_IDX);
     m_partSet(m_tqBypass,     (uint8_t)frame.m_encData->m_param->bLossless);
@@ -390,7 +389,7 @@
 
     memcpy(m_distortion + offset, subCU.m_distortion, childGeom.numPartitions * sizeof(sse_t));
 
-    uint32_t tmp = 1 << ((g_maxLog2CUSize - childGeom.depth) * 2);
+    uint32_t tmp = 1 << ((m_slice->m_param->maxLog2CUSize - childGeom.depth) * 2);
     uint32_t tmp2 = subPartIdx * tmp;
     memcpy(m_trCoeff[0] + tmp2, subCU.m_trCoeff[0], sizeof(coeff_t)* tmp);
 
@@ -489,7 +488,7 @@
 
     memcpy(ctu.m_distortion + m_absIdxInCTU, m_distortion, m_numPartitions * sizeof(sse_t));
 
-    uint32_t tmpY = 1 << ((g_maxLog2CUSize - depth) * 2);
+    uint32_t tmpY = 1 << ((m_slice->m_param->maxLog2CUSize - depth) * 2);
     uint32_t tmpY2 = m_absIdxInCTU << (LOG2_UNIT_SIZE * 2);
     memcpy(ctu.m_trCoeff[0] + tmpY2, m_trCoeff[0], sizeof(coeff_t)* tmpY);
 
@@ -568,7 +567,7 @@
     m_partCopy(ctu.m_tuDepth + m_absIdxInCTU, m_tuDepth);
     m_partCopy(ctu.m_cbf[0] + m_absIdxInCTU, m_cbf[0]);
 
-    uint32_t tmpY = 1 << ((g_maxLog2CUSize - depth) * 2);
+    uint32_t tmpY = 1 << ((m_slice->m_param->maxLog2CUSize - depth) * 2);
     uint32_t tmpY2 = m_absIdxInCTU << (LOG2_UNIT_SIZE * 2);
     memcpy(ctu.m_trCoeff[0] + tmpY2, m_trCoeff[0], sizeof(coeff_t)* tmpY);
 
@@ -656,7 +655,7 @@
         return m_cuLeft;
     }
 
-    alPartUnitIdx = NUM_4x4_PARTITIONS - 1;
+    alPartUnitIdx = m_encData->m_param->num4x4Partitions - 1;
     return m_cuAboveLeft;
 }
 
@@ -799,7 +798,7 @@
 /* Get left QpMinCu */
 const CUData* CUData::getQpMinCuLeft(uint32_t& lPartUnitIdx, uint32_t curAbsIdxInCTU) const
 {
-    uint32_t absZorderQpMinCUIdx = curAbsIdxInCTU & (0xFF << (g_unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2);
+    uint32_t absZorderQpMinCUIdx = curAbsIdxInCTU & (0xFF << (m_encData->m_param->unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2);
     uint32_t absRorderQpMinCUIdx = g_zscanToRaster[absZorderQpMinCUIdx];
 
     // check for left CTU boundary
@@ -816,7 +815,7 @@
 /* Get above QpMinCu */
 const CUData* CUData::getQpMinCuAbove(uint32_t& aPartUnitIdx, uint32_t curAbsIdxInCTU) const
 {
-    uint32_t absZorderQpMinCUIdx = curAbsIdxInCTU & (0xFF << (g_unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2);
+    uint32_t absZorderQpMinCUIdx = curAbsIdxInCTU & (0xFF << (m_encData->m_param->unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2);
     uint32_t absRorderQpMinCUIdx = g_zscanToRaster[absZorderQpMinCUIdx];
 
     // check for top CTU boundary
@@ -855,7 +854,7 @@
 
 int8_t CUData::getLastCodedQP(uint32_t absPartIdx) const
 {
-    uint32_t quPartIdxMask = 0xFF << (g_unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2;
+    uint32_t quPartIdxMask = 0xFF << (m_encData->m_param->unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2;
     int lastValidPartIdx = getLastValidPartIdx(absPartIdx & quPartIdxMask);
 
     if (lastValidPartIdx >= 0)
@@ -865,7 +864,7 @@
         if (m_absIdxInCTU)
             return m_encData->getPicCTU(m_cuAddr)->getLastCodedQP(m_absIdxInCTU);
         else if (m_cuAddr > 0 && !(m_slice->m_pps->bEntropyCodingSyncEnabled && !(m_cuAddr % m_slice->m_sps->numCuInWidth)))
-            return m_encData->getPicCTU(m_cuAddr - 1)->getLastCodedQP(NUM_4x4_PARTITIONS);
+            return m_encData->getPicCTU(m_cuAddr - 1)->getLastCodedQP(m_encData->m_param->num4x4Partitions);
         else
             return (int8_t)m_slice->m_sliceQp;
     }
@@ -997,7 +996,7 @@
 
 bool CUData::setQPSubCUs(int8_t qp, uint32_t absPartIdx, uint32_t depth)
 {
-    uint32_t curPartNumb = NUM_4x4_PARTITIONS >> (depth << 1);
+    uint32_t curPartNumb = m_encData->m_param->num4x4Partitions >> (depth << 1);
     uint32_t curPartNumQ = curPartNumb >> 2;
 
     if (m_cuDepth[absPartIdx] > depth)
@@ -1623,6 +1622,11 @@
                 dir |= (1 << list);
                 candMvField[count][list].mv = colmv;
                 candMvField[count][list].refIdx = refIdx;
+                if (m_encData->m_param->scaleFactor && m_encData->m_param->analysisReuseMode == X265_ANALYSIS_SAVE && m_log2CUSize[0] < 4)
+                {
+                    MV dist(MAX_MV, MAX_MV);
+                    candMvField[count][list].mv = dist;
+                }
             }
         }
 
@@ -1783,7 +1787,13 @@
             int curRefPOC = m_slice->m_refPOCList[picList][refIdx];
             int curPOC = m_slice->m_poc;
 
-            pmv[numMvc++] = amvpCand[num++] = scaleMvByPOCDist(neighbours[MD_COLLOCATED].mv[picList], curPOC, curRefPOC, colPOC, colRefPOC);
+            if (m_encData->m_param->scaleFactor && m_encData->m_param->analysisReuseMode == X265_ANALYSIS_SAVE && (m_log2CUSize[0] < 4))
+            {
+                MV dist(MAX_MV, MAX_MV);
+                pmv[numMvc++] = amvpCand[num++] = dist;
+            }
+            else
+                pmv[numMvc++] = amvpCand[num++] = scaleMvByPOCDist(neighbours[MD_COLLOCATED].mv[picList], curPOC, curRefPOC, colPOC, colRefPOC);
         }
     }
 
@@ -1905,10 +1915,10 @@
     uint32_t offset = 8;
 
     int16_t xmax = (int16_t)((m_slice->m_sps->picWidthInLumaSamples + offset - m_cuPelX - 1) << mvshift);
-    int16_t xmin = -(int16_t)((g_maxCUSize + offset + m_cuPelX - 1) << mvshift);
+    int16_t xmin = -(int16_t)((m_encData->m_param->maxCUSize + offset + m_cuPelX - 1) << mvshift);
 
     int16_t ymax = (int16_t)((m_slice->m_sps->picHeightInLumaSamples + offset - m_cuPelY - 1) << mvshift);
-    int16_t ymin = -(int16_t)((g_maxCUSize + offset + m_cuPelY - 1) << mvshift);
+    int16_t ymin = -(int16_t)((m_encData->m_param->maxCUSize + offset + m_cuPelY - 1) << mvshift);
 
     outMV.x = X265_MIN(xmax, X265_MAX(xmin, outMV.x));
     outMV.y = X265_MIN(ymax, X265_MAX(ymin, outMV.y));

 
@@ -28,6 +28,7 @@
 #include "picyuv.h"
 #include "mv.h"
 #include "cudata.h"
+#define MAX_MV 1 << 14
 
 using namespace X265_NS;
 
@@ -110,25 +111,23 @@
 
 }
 
-cubcast_t CUData::s_partSet[NUM_FULL_DEPTH] = { NULL, NULL, NULL, NULL, NULL };
-uint32_t CUData::s_numPartInCUSize;
-
 CUData::CUData()
 {
     memset(this, 0, sizeof(*this));
 }
 
-void CUData::initialize(const CUDataMemPool& dataPool, uint32_t depth, int csp, int instance)
+void CUData::initialize(const CUDataMemPool& dataPool, uint32_t depth, const x265_param& param, int instance)
 {
+    int csp = param.internalCsp;
     m_chromaFormat  = csp;
     m_hChromaShift  = CHROMA_H_SHIFT(csp);
     m_vChromaShift  = CHROMA_V_SHIFT(csp);
-    m_numPartitions = NUM_4x4_PARTITIONS >> (depth * 2);
+    m_numPartitions = param.num4x4Partitions >> (depth * 2);
 
     if (!s_partSet[0])
     {
-        s_numPartInCUSize = 1 << g_unitSizeDepth;
-        switch (g_maxLog2CUSize)
+        s_numPartInCUSize = 1 << param.unitSizeDepth;
+        switch (param.maxLog2CUSize)
         {
         case 6:
             s_partSet[0] = bcast256;
@@ -220,7 +219,7 @@
 
         m_distortion = dataPool.distortionMemBlock + instance * m_numPartitions;
 
-        uint32_t cuSize = g_maxCUSize >> depth;
+        uint32_t cuSize = param.maxCUSize >> depth;
         m_trCoeff[0] = dataPool.trCoeffMemBlock + instance * (cuSize * cuSize);
         m_trCoeff[1] = m_trCoeff[2] = 0;
         m_transformSkip[1] = m_transformSkip[2] = m_cbf[1] = m_cbf[2] = 0;
@@ -262,7 +261,7 @@
 
         m_distortion = dataPool.distortionMemBlock + instance * m_numPartitions;
 
-        uint32_t cuSize = g_maxCUSize >> depth;
+        uint32_t cuSize = param.maxCUSize >> depth;
         uint32_t sizeL = cuSize * cuSize;
         uint32_t sizeC = sizeL >> (m_hChromaShift + m_vChromaShift); // block chroma part
         m_trCoeff[0] = dataPool.trCoeffMemBlock + instance * (sizeL + sizeC * 2);
@@ -278,17 +277,17 @@
     m_encData       = frame.m_encData;
     m_slice         = m_encData->m_slice;
     m_cuAddr        = cuAddr;
-    m_cuPelX        = (cuAddr % m_slice->m_sps->numCuInWidth) << g_maxLog2CUSize;
-    m_cuPelY        = (cuAddr / m_slice->m_sps->numCuInWidth) << g_maxLog2CUSize;
+    m_cuPelX        = (cuAddr % m_slice->m_sps->numCuInWidth) << m_slice->m_param->maxLog2CUSize;
+    m_cuPelY        = (cuAddr / m_slice->m_sps->numCuInWidth) << m_slice->m_param->maxLog2CUSize;
     m_absIdxInCTU   = 0;
-    m_numPartitions = NUM_4x4_PARTITIONS;
+    m_numPartitions = m_encData->m_param->num4x4Partitions;
     m_bFirstRowInSlice = (uint8_t)firstRowInSlice;
     m_bLastRowInSlice  = (uint8_t)lastRowInSlice;
     m_bLastCuInSlice   = (uint8_t)lastCuInSlice;
 
     /* sequential memsets */
     m_partSet((uint8_t*)m_qp, (uint8_t)qp);
-    m_partSet(m_log2CUSize,   (uint8_t)g_maxLog2CUSize);
+    m_partSet(m_log2CUSize,   (uint8_t)m_slice->m_param->maxLog2CUSize);
     m_partSet(m_lumaIntraDir, (uint8_t)ALL_IDX);
     m_partSet(m_chromaIntraDir, (uint8_t)ALL_IDX);
     m_partSet(m_tqBypass,     (uint8_t)frame.m_encData->m_param->bLossless);
@@ -390,7 +389,7 @@
 
     memcpy(m_distortion + offset, subCU.m_distortion, childGeom.numPartitions * sizeof(sse_t));
 
-    uint32_t tmp = 1 << ((g_maxLog2CUSize - childGeom.depth) * 2);
+    uint32_t tmp = 1 << ((m_slice->m_param->maxLog2CUSize - childGeom.depth) * 2);
     uint32_t tmp2 = subPartIdx * tmp;
     memcpy(m_trCoeff[0] + tmp2, subCU.m_trCoeff[0], sizeof(coeff_t)* tmp);
 
@@ -489,7 +488,7 @@
 
     memcpy(ctu.m_distortion + m_absIdxInCTU, m_distortion, m_numPartitions * sizeof(sse_t));
 
-    uint32_t tmpY = 1 << ((g_maxLog2CUSize - depth) * 2);
+    uint32_t tmpY = 1 << ((m_slice->m_param->maxLog2CUSize - depth) * 2);
     uint32_t tmpY2 = m_absIdxInCTU << (LOG2_UNIT_SIZE * 2);
     memcpy(ctu.m_trCoeff[0] + tmpY2, m_trCoeff[0], sizeof(coeff_t)* tmpY);
 
@@ -568,7 +567,7 @@
     m_partCopy(ctu.m_tuDepth + m_absIdxInCTU, m_tuDepth);
     m_partCopy(ctu.m_cbf[0] + m_absIdxInCTU, m_cbf[0]);
 
-    uint32_t tmpY = 1 << ((g_maxLog2CUSize - depth) * 2);
+    uint32_t tmpY = 1 << ((m_slice->m_param->maxLog2CUSize - depth) * 2);
     uint32_t tmpY2 = m_absIdxInCTU << (LOG2_UNIT_SIZE * 2);
     memcpy(ctu.m_trCoeff[0] + tmpY2, m_trCoeff[0], sizeof(coeff_t)* tmpY);
 
@@ -656,7 +655,7 @@
         return m_cuLeft;
     }
 
-    alPartUnitIdx = NUM_4x4_PARTITIONS - 1;
+    alPartUnitIdx = m_encData->m_param->num4x4Partitions - 1;
     return m_cuAboveLeft;
 }
 
@@ -799,7 +798,7 @@
 /* Get left QpMinCu */
 const CUData* CUData::getQpMinCuLeft(uint32_t& lPartUnitIdx, uint32_t curAbsIdxInCTU) const
 {
-    uint32_t absZorderQpMinCUIdx = curAbsIdxInCTU & (0xFF << (g_unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2);
+    uint32_t absZorderQpMinCUIdx = curAbsIdxInCTU & (0xFF << (m_encData->m_param->unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2);
     uint32_t absRorderQpMinCUIdx = g_zscanToRaster[absZorderQpMinCUIdx];
 
     // check for left CTU boundary
@@ -816,7 +815,7 @@
 /* Get above QpMinCu */
 const CUData* CUData::getQpMinCuAbove(uint32_t& aPartUnitIdx, uint32_t curAbsIdxInCTU) const
 {
-    uint32_t absZorderQpMinCUIdx = curAbsIdxInCTU & (0xFF << (g_unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2);
+    uint32_t absZorderQpMinCUIdx = curAbsIdxInCTU & (0xFF << (m_encData->m_param->unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2);
     uint32_t absRorderQpMinCUIdx = g_zscanToRaster[absZorderQpMinCUIdx];
 
     // check for top CTU boundary
@@ -855,7 +854,7 @@
 
 int8_t CUData::getLastCodedQP(uint32_t absPartIdx) const
 {
-    uint32_t quPartIdxMask = 0xFF << (g_unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2;
+    uint32_t quPartIdxMask = 0xFF << (m_encData->m_param->unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2;
     int lastValidPartIdx = getLastValidPartIdx(absPartIdx & quPartIdxMask);
 
     if (lastValidPartIdx >= 0)
@@ -865,7 +864,7 @@
         if (m_absIdxInCTU)
             return m_encData->getPicCTU(m_cuAddr)->getLastCodedQP(m_absIdxInCTU);
         else if (m_cuAddr > 0 && !(m_slice->m_pps->bEntropyCodingSyncEnabled && !(m_cuAddr % m_slice->m_sps->numCuInWidth)))
-            return m_encData->getPicCTU(m_cuAddr - 1)->getLastCodedQP(NUM_4x4_PARTITIONS);
+            return m_encData->getPicCTU(m_cuAddr - 1)->getLastCodedQP(m_encData->m_param->num4x4Partitions);
         else
             return (int8_t)m_slice->m_sliceQp;
     }
@@ -997,7 +996,7 @@
 
 bool CUData::setQPSubCUs(int8_t qp, uint32_t absPartIdx, uint32_t depth)
 {
-    uint32_t curPartNumb = NUM_4x4_PARTITIONS >> (depth << 1);
+    uint32_t curPartNumb = m_encData->m_param->num4x4Partitions >> (depth << 1);
     uint32_t curPartNumQ = curPartNumb >> 2;
 
     if (m_cuDepth[absPartIdx] > depth)
@@ -1623,6 +1622,11 @@
                 dir |= (1 << list);
                 candMvField[count][list].mv = colmv;
                 candMvField[count][list].refIdx = refIdx;
+                if (m_encData->m_param->scaleFactor && m_encData->m_param->analysisReuseMode == X265_ANALYSIS_SAVE && m_log2CUSize[0] < 4)
+                {
+                    MV dist(MAX_MV, MAX_MV);
+                    candMvField[count][list].mv = dist;
+                }
             }
         }
 
@@ -1783,7 +1787,13 @@
             int curRefPOC = m_slice->m_refPOCList[picList][refIdx];
             int curPOC = m_slice->m_poc;
 
-            pmv[numMvc++] = amvpCand[num++] = scaleMvByPOCDist(neighbours[MD_COLLOCATED].mv[picList], curPOC, curRefPOC, colPOC, colRefPOC);
+            if (m_encData->m_param->scaleFactor && m_encData->m_param->analysisReuseMode == X265_ANALYSIS_SAVE && (m_log2CUSize[0] < 4))
+            {
+                MV dist(MAX_MV, MAX_MV);
+                pmv[numMvc++] = amvpCand[num++] = dist;
+            }
+            else
+                pmv[numMvc++] = amvpCand[num++] = scaleMvByPOCDist(neighbours[MD_COLLOCATED].mv[picList], curPOC, curRefPOC, colPOC, colRefPOC);
         }
     }
 
@@ -1905,10 +1915,10 @@
     uint32_t offset = 8;
 
     int16_t xmax = (int16_t)((m_slice->m_sps->picWidthInLumaSamples + offset - m_cuPelX - 1) << mvshift);
-    int16_t xmin = -(int16_t)((g_maxCUSize + offset + m_cuPelX - 1) << mvshift);
+    int16_t xmin = -(int16_t)((m_encData->m_param->maxCUSize + offset + m_cuPelX - 1) << mvshift);
 
     int16_t ymax = (int16_t)((m_slice->m_sps->picHeightInLumaSamples + offset - m_cuPelY - 1) << mvshift);
-    int16_t ymin = -(int16_t)((g_maxCUSize + offset + m_cuPelY - 1) << mvshift);
+    int16_t ymin = -(int16_t)((m_encData->m_param->maxCUSize + offset + m_cuPelY - 1) << mvshift);
 
     outMV.x = X265_MIN(xmax, X265_MAX(xmin, outMV.x));
     outMV.y = X265_MIN(ymax, X265_MAX(ymin, outMV.y));
​

x265_2.4.tar.gz/source/common/cudata.h -> x265_2.5.tar.gz/source/common/cudata.h Changed

@@ -161,8 +161,8 @@
 {
 public:
 
-    static cubcast_t s_partSet[NUM_FULL_DEPTH]; // pointer to broadcast set functions per absolute depth
-    static uint32_t  s_numPartInCUSize;
+    cubcast_t s_partSet[NUM_FULL_DEPTH]; // pointer to broadcast set functions per absolute depth
+    uint32_t  s_numPartInCUSize;
 
     bool          m_vbvAffected;
 
@@ -225,7 +225,7 @@
 
     CUData();
 
-    void     initialize(const CUDataMemPool& dataPool, uint32_t depth, int csp, int instance);
+    void     initialize(const CUDataMemPool& dataPool, uint32_t depth, const x265_param& param, int instance);
     static void calcCTUGeoms(uint32_t ctuWidth, uint32_t ctuHeight, uint32_t maxCUSize, uint32_t minCUSize, CUGeom cuDataArray[CUGeom::MAX_GEOMS]);
 
     void     initCTU(const Frame& frame, uint32_t cuAddr, int qp, uint32_t firstRowInSlice, uint32_t lastRowInSlice, uint32_t lastCUInSlice);
@@ -271,7 +271,7 @@
     void     getInterTUQtDepthRange(uint32_t tuDepthRange[2], uint32_t absPartIdx) const;
     uint32_t getBestRefIdx(uint32_t subPartIdx) const { return ((m_interDir[subPartIdx] & 1) << m_refIdx[0][subPartIdx]) | 
                                                               (((m_interDir[subPartIdx] >> 1) & 1) << (m_refIdx[1][subPartIdx] + 16)); }
-    uint32_t getPUOffset(uint32_t puIdx, uint32_t absPartIdx) const { return (partAddrTable[(int)m_partSize[absPartIdx]][puIdx] << (g_unitSizeDepth - m_cuDepth[absPartIdx]) * 2) >> 4; }
+    uint32_t getPUOffset(uint32_t puIdx, uint32_t absPartIdx) const { return (partAddrTable[(int)m_partSize[absPartIdx]][puIdx] << (m_slice->m_param->unitSizeDepth - m_cuDepth[absPartIdx]) * 2) >> 4; }
 
     uint32_t getNumPartInter(uint32_t absPartIdx) const              { return nbPartsTable[(int)m_partSize[absPartIdx]]; }
     bool     isIntra(uint32_t absPartIdx) const   { return m_predMode[absPartIdx] == MODE_INTRA; }
@@ -285,7 +285,7 @@
     void     getAllowedChromaDir(uint32_t absPartIdx, uint32_t* modeList) const;
     int      getIntraDirLumaPredictor(uint32_t absPartIdx, uint32_t* intraDirPred) const;
 
-    uint32_t getSCUAddr() const                  { return (m_cuAddr << g_unitSizeDepth * 2) + m_absIdxInCTU; }
+    uint32_t getSCUAddr() const                  { return (m_cuAddr << m_slice->m_param->unitSizeDepth * 2) + m_absIdxInCTU; }
     uint32_t getCtxSplitFlag(uint32_t absPartIdx, uint32_t depth) const;
     uint32_t getCtxSkipFlag(uint32_t absPartIdx) const;
     void     getTUEntropyCodingParameters(TUEntropyCodingParameters &result, uint32_t absPartIdx, uint32_t log2TrSize, bool bIsLuma) const;
@@ -350,10 +350,10 @@
 
     CUDataMemPool() { charMemBlock = NULL; trCoeffMemBlock = NULL; mvMemBlock = NULL; distortionMemBlock = NULL; }
 
-    bool create(uint32_t depth, uint32_t csp, uint32_t numInstances)
+    bool create(uint32_t depth, uint32_t csp, uint32_t numInstances, const x265_param& param)
     {
-        uint32_t numPartition = NUM_4x4_PARTITIONS >> (depth * 2);
-        uint32_t cuSize = g_maxCUSize >> depth;
+        uint32_t numPartition = param.num4x4Partitions >> (depth * 2);
+        uint32_t cuSize = param.maxCUSize >> depth;
         uint32_t sizeL = cuSize * cuSize;
         if (csp == X265_CSP_I400)
         {

 
@@ -161,8 +161,8 @@
 {
 public:
 
-    static cubcast_t s_partSet[NUM_FULL_DEPTH]; // pointer to broadcast set functions per absolute depth
-    static uint32_t  s_numPartInCUSize;
+    cubcast_t s_partSet[NUM_FULL_DEPTH]; // pointer to broadcast set functions per absolute depth
+    uint32_t  s_numPartInCUSize;
 
     bool          m_vbvAffected;
 
@@ -225,7 +225,7 @@
 
     CUData();
 
-    void     initialize(const CUDataMemPool& dataPool, uint32_t depth, int csp, int instance);
+    void     initialize(const CUDataMemPool& dataPool, uint32_t depth, const x265_param& param, int instance);
     static void calcCTUGeoms(uint32_t ctuWidth, uint32_t ctuHeight, uint32_t maxCUSize, uint32_t minCUSize, CUGeom cuDataArray[CUGeom::MAX_GEOMS]);
 
     void     initCTU(const Frame& frame, uint32_t cuAddr, int qp, uint32_t firstRowInSlice, uint32_t lastRowInSlice, uint32_t lastCUInSlice);
@@ -271,7 +271,7 @@
     void     getInterTUQtDepthRange(uint32_t tuDepthRange[2], uint32_t absPartIdx) const;
     uint32_t getBestRefIdx(uint32_t subPartIdx) const { return ((m_interDir[subPartIdx] & 1) << m_refIdx[0][subPartIdx]) | 
                                                               (((m_interDir[subPartIdx] >> 1) & 1) << (m_refIdx[1][subPartIdx] + 16)); }
-    uint32_t getPUOffset(uint32_t puIdx, uint32_t absPartIdx) const { return (partAddrTable[(int)m_partSize[absPartIdx]][puIdx] << (g_unitSizeDepth - m_cuDepth[absPartIdx]) * 2) >> 4; }
+    uint32_t getPUOffset(uint32_t puIdx, uint32_t absPartIdx) const { return (partAddrTable[(int)m_partSize[absPartIdx]][puIdx] << (m_slice->m_param->unitSizeDepth - m_cuDepth[absPartIdx]) * 2) >> 4; }
 
     uint32_t getNumPartInter(uint32_t absPartIdx) const              { return nbPartsTable[(int)m_partSize[absPartIdx]]; }
     bool     isIntra(uint32_t absPartIdx) const   { return m_predMode[absPartIdx] == MODE_INTRA; }
@@ -285,7 +285,7 @@
     void     getAllowedChromaDir(uint32_t absPartIdx, uint32_t* modeList) const;
     int      getIntraDirLumaPredictor(uint32_t absPartIdx, uint32_t* intraDirPred) const;
 
-    uint32_t getSCUAddr() const                  { return (m_cuAddr << g_unitSizeDepth * 2) + m_absIdxInCTU; }
+    uint32_t getSCUAddr() const                  { return (m_cuAddr << m_slice->m_param->unitSizeDepth * 2) + m_absIdxInCTU; }
     uint32_t getCtxSplitFlag(uint32_t absPartIdx, uint32_t depth) const;
     uint32_t getCtxSkipFlag(uint32_t absPartIdx) const;
     void     getTUEntropyCodingParameters(TUEntropyCodingParameters &result, uint32_t absPartIdx, uint32_t log2TrSize, bool bIsLuma) const;
@@ -350,10 +350,10 @@
 
     CUDataMemPool() { charMemBlock = NULL; trCoeffMemBlock = NULL; mvMemBlock = NULL; distortionMemBlock = NULL; }
 
-    bool create(uint32_t depth, uint32_t csp, uint32_t numInstances)
+    bool create(uint32_t depth, uint32_t csp, uint32_t numInstances, const x265_param& param)
     {
-        uint32_t numPartition = NUM_4x4_PARTITIONS >> (depth * 2);
-        uint32_t cuSize = g_maxCUSize >> depth;
+        uint32_t numPartition = param.num4x4Partitions >> (depth * 2);
+        uint32_t cuSize = param.maxCUSize >> depth;
         uint32_t sizeL = cuSize * cuSize;
         if (csp == X265_CSP_I400)
         {
​

x265_2.4.tar.gz/source/common/frame.cpp -> x265_2.5.tar.gz/source/common/frame.cpp Changed

@@ -48,6 +48,11 @@
     m_rcData = NULL;
     m_encodeStartTime = 0;
     m_reconfigureRc = false;
+    m_ctuInfo = NULL;
+    m_prevCtuInfoChange = NULL;
+    m_addOnDepth = NULL;
+    m_addOnCtuInfo = NULL;
+    m_addOnPrevChange = NULL;
 }
 
 bool Frame::create(x265_param *param, float* quantOffsets)
@@ -56,11 +61,26 @@
     m_param = param;
     CHECKED_MALLOC_ZERO(m_rcData, RcStats, 1);
 
-    if (m_fencPic->create(param->sourceWidth, param->sourceHeight, param->internalCsp) &&
-        m_lowres.create(m_fencPic, param->bframes, !!param->rc.aqMode || !!param->bAQMotion, param->rc.qgSize))
+    if (param->bCTUInfo)
+    {
+        uint32_t widthInCTU = (m_param->sourceWidth + param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+        uint32_t heightInCTU = (m_param->sourceHeight +  param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+        uint32_t numCTUsInFrame = widthInCTU * heightInCTU;
+        CHECKED_MALLOC_ZERO(m_addOnDepth, uint8_t *, numCTUsInFrame);
+        CHECKED_MALLOC_ZERO(m_addOnCtuInfo, uint8_t *, numCTUsInFrame);
+        CHECKED_MALLOC_ZERO(m_addOnPrevChange, int *, numCTUsInFrame);
+        for (uint32_t i = 0; i < numCTUsInFrame; i++)
+        {
+            CHECKED_MALLOC_ZERO(m_addOnDepth[i], uint8_t, uint32_t(param->num4x4Partitions));
+            CHECKED_MALLOC_ZERO(m_addOnCtuInfo[i], uint8_t, uint32_t(param->num4x4Partitions));
+            CHECKED_MALLOC_ZERO(m_addOnPrevChange[i], int, uint32_t(param->num4x4Partitions));
+        }
+    }
+
+    if (m_fencPic->create(param) && m_lowres.create(m_fencPic, param->bframes, !!param->rc.aqMode || !!param->bAQMotion, param->rc.qgSize))
     {
         X265_CHECK((m_reconColCount == NULL), "m_reconColCount was initialized");
-        m_numRows = (m_fencPic->m_picHeight + g_maxCUSize - 1)  / g_maxCUSize;
+        m_numRows = (m_fencPic->m_picHeight + param->maxCUSize - 1)  / param->maxCUSize;
         m_reconRowFlag = new ThreadSafeInteger[m_numRows];
         m_reconColCount = new ThreadSafeInteger[m_numRows];
 
@@ -86,12 +106,12 @@
     m_reconPic = new PicYuv;
     m_param = param;
     m_encData->m_reconPic = m_reconPic;
-    bool ok = m_encData->create(*param, sps, m_fencPic->m_picCsp) && m_reconPic->create(param->sourceWidth, param->sourceHeight, param->internalCsp);
+    bool ok = m_encData->create(*param, sps, m_fencPic->m_picCsp) && m_reconPic->create(param);
     if (ok)
     {
         /* initialize right border of m_reconpicYuv as SAO may read beyond the
          * end of the picture accessing uninitialized pixels */
-        int maxHeight = sps.numCuInHeight * g_maxCUSize;
+        int maxHeight = sps.numCuInHeight * param->maxCUSize;
         memset(m_reconPic->m_picOrg[0], 0, sizeof(pixel)* m_reconPic->m_stride * maxHeight);
 
         /* use pre-calculated cu/pu offsets cached in the SPS structure */
@@ -166,6 +186,35 @@
         delete[] m_userSEI.payloads;
     }
 
+    if (m_ctuInfo)
+    {
+        uint32_t widthInCU = (m_param->sourceWidth + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+        uint32_t heightInCU = (m_param->sourceHeight + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+        uint32_t numCUsInFrame = widthInCU * heightInCU;
+        for (uint32_t i = 0; i < numCUsInFrame; i++)
+        {
+            X265_FREE((*m_ctuInfo + i)->ctuInfo);
+            (*m_ctuInfo + i)->ctuInfo = NULL;
+            X265_FREE(m_addOnDepth[i]);
+            m_addOnDepth[i] = NULL;
+            X265_FREE(m_addOnCtuInfo[i]);
+            m_addOnCtuInfo[i] = NULL;
+            X265_FREE(m_addOnPrevChange[i]);
+            m_addOnPrevChange[i] = NULL;
+        }
+        X265_FREE(*m_ctuInfo);
+        *m_ctuInfo = NULL;
+        X265_FREE(m_ctuInfo);
+        m_ctuInfo = NULL;
+        X265_FREE(m_prevCtuInfoChange);
+        m_prevCtuInfoChange = NULL;
+        X265_FREE(m_addOnDepth);
+        m_addOnDepth = NULL;
+        X265_FREE(m_addOnCtuInfo);
+        m_addOnCtuInfo = NULL;
+        X265_FREE(m_addOnPrevChange);
+        m_addOnPrevChange = NULL;
+    }
     m_lowres.destroy();
     X265_FREE(m_rcData);
 }

 
@@ -48,6 +48,11 @@
     m_rcData = NULL;
     m_encodeStartTime = 0;
     m_reconfigureRc = false;
+    m_ctuInfo = NULL;
+    m_prevCtuInfoChange = NULL;
+    m_addOnDepth = NULL;
+    m_addOnCtuInfo = NULL;
+    m_addOnPrevChange = NULL;
 }
 
 bool Frame::create(x265_param *param, float* quantOffsets)
@@ -56,11 +61,26 @@
     m_param = param;
     CHECKED_MALLOC_ZERO(m_rcData, RcStats, 1);
 
-    if (m_fencPic->create(param->sourceWidth, param->sourceHeight, param->internalCsp) &&
-        m_lowres.create(m_fencPic, param->bframes, !!param->rc.aqMode || !!param->bAQMotion, param->rc.qgSize))
+    if (param->bCTUInfo)
+    {
+        uint32_t widthInCTU = (m_param->sourceWidth + param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+        uint32_t heightInCTU = (m_param->sourceHeight +  param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+        uint32_t numCTUsInFrame = widthInCTU * heightInCTU;
+        CHECKED_MALLOC_ZERO(m_addOnDepth, uint8_t *, numCTUsInFrame);
+        CHECKED_MALLOC_ZERO(m_addOnCtuInfo, uint8_t *, numCTUsInFrame);
+        CHECKED_MALLOC_ZERO(m_addOnPrevChange, int *, numCTUsInFrame);
+        for (uint32_t i = 0; i < numCTUsInFrame; i++)
+        {
+            CHECKED_MALLOC_ZERO(m_addOnDepth[i], uint8_t, uint32_t(param->num4x4Partitions));
+            CHECKED_MALLOC_ZERO(m_addOnCtuInfo[i], uint8_t, uint32_t(param->num4x4Partitions));
+            CHECKED_MALLOC_ZERO(m_addOnPrevChange[i], int, uint32_t(param->num4x4Partitions));
+        }
+    }
+
+    if (m_fencPic->create(param) && m_lowres.create(m_fencPic, param->bframes, !!param->rc.aqMode || !!param->bAQMotion, param->rc.qgSize))
     {
         X265_CHECK((m_reconColCount == NULL), "m_reconColCount was initialized");
-        m_numRows = (m_fencPic->m_picHeight + g_maxCUSize - 1)  / g_maxCUSize;
+        m_numRows = (m_fencPic->m_picHeight + param->maxCUSize - 1)  / param->maxCUSize;
         m_reconRowFlag = new ThreadSafeInteger[m_numRows];
         m_reconColCount = new ThreadSafeInteger[m_numRows];
 
@@ -86,12 +106,12 @@
     m_reconPic = new PicYuv;
     m_param = param;
     m_encData->m_reconPic = m_reconPic;
-    bool ok = m_encData->create(*param, sps, m_fencPic->m_picCsp) && m_reconPic->create(param->sourceWidth, param->sourceHeight, param->internalCsp);
+    bool ok = m_encData->create(*param, sps, m_fencPic->m_picCsp) && m_reconPic->create(param);
     if (ok)
     {
         /* initialize right border of m_reconpicYuv as SAO may read beyond the
          * end of the picture accessing uninitialized pixels */
-        int maxHeight = sps.numCuInHeight * g_maxCUSize;
+        int maxHeight = sps.numCuInHeight * param->maxCUSize;
         memset(m_reconPic->m_picOrg[0], 0, sizeof(pixel)* m_reconPic->m_stride * maxHeight);
 
         /* use pre-calculated cu/pu offsets cached in the SPS structure */
@@ -166,6 +186,35 @@
         delete[] m_userSEI.payloads;
     }
 
+    if (m_ctuInfo)
+    {
+        uint32_t widthInCU = (m_param->sourceWidth + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+        uint32_t heightInCU = (m_param->sourceHeight + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+        uint32_t numCUsInFrame = widthInCU * heightInCU;
+        for (uint32_t i = 0; i < numCUsInFrame; i++)
+        {
+            X265_FREE((*m_ctuInfo + i)->ctuInfo);
+            (*m_ctuInfo + i)->ctuInfo = NULL;
+            X265_FREE(m_addOnDepth[i]);
+            m_addOnDepth[i] = NULL;
+            X265_FREE(m_addOnCtuInfo[i]);
+            m_addOnCtuInfo[i] = NULL;
+            X265_FREE(m_addOnPrevChange[i]);
+            m_addOnPrevChange[i] = NULL;
+        }
+        X265_FREE(*m_ctuInfo);
+        *m_ctuInfo = NULL;
+        X265_FREE(m_ctuInfo);
+        m_ctuInfo = NULL;
+        X265_FREE(m_prevCtuInfoChange);
+        m_prevCtuInfoChange = NULL;
+        X265_FREE(m_addOnDepth);
+        m_addOnDepth = NULL;
+        X265_FREE(m_addOnCtuInfo);
+        m_addOnCtuInfo = NULL;
+        X265_FREE(m_addOnPrevChange);
+        m_addOnPrevChange = NULL;
+    }
     m_lowres.destroy();
     X265_FREE(m_rcData);
 }
​

x265_2.4.tar.gz/source/common/frame.h -> x265_2.5.tar.gz/source/common/frame.h Changed

 
@@ -66,6 +66,10 @@
     double   shortTermCplxCount;
     int64_t  totalBits;
     int64_t  encodedBits;
+    double   coeff[4];
+    double   count[4];
+    double   offset[4];
+    double   bufferFillFinal;
 };
 
 class Frame
@@ -108,7 +112,14 @@
     x265_analysis_2Pass    m_analysis2Pass;
     RcStats*               m_rcData;
 
+    x265_ctu_info_t**      m_ctuInfo;
+    Event                  m_copied;
+    int*                   m_prevCtuInfoChange;
     int64_t                m_encodeStartTime;
+
+    uint8_t**              m_addOnDepth;
+    uint8_t**              m_addOnCtuInfo;
+    int**                  m_addOnPrevChange;
     Frame();
 
     bool create(x265_param *param, float* quantOffsets);
​

x265_2.4.tar.gz/source/common/framedata.cpp -> x265_2.5.tar.gz/source/common/framedata.cpp Changed

 
@@ -41,9 +41,9 @@
     if (param.rc.bStatWrite)
         m_spsrps = const_cast<RPS*>(sps.spsrps);
 
-    m_cuMemPool.create(0, param.internalCsp, sps.numCUsInFrame);
+    m_cuMemPool.create(0, param.internalCsp, sps.numCUsInFrame, param);
     for (uint32_t ctuAddr = 0; ctuAddr < sps.numCUsInFrame; ctuAddr++)
-        m_picCTU[ctuAddr].initialize(m_cuMemPool, 0, param.internalCsp, ctuAddr);
+        m_picCTU[ctuAddr].initialize(m_cuMemPool, 0, param, ctuAddr);
 
     CHECKED_MALLOC_ZERO(m_cuStat, RCStatCU, sps.numCUsInFrame);
     CHECKED_MALLOC(m_rowStat, RCStatRow, sps.numCuInHeight);
​

x265_2.4.tar.gz/source/common/framedata.h -> x265_2.5.tar.gz/source/common/framedata.h Changed

 
@@ -62,6 +62,7 @@
     double      percentMergeCu[NUM_CU_DEPTH];
     double      percentIntraDistribution[NUM_CU_DEPTH][INTRA_MODES];
     double      percentInterDistribution[NUM_CU_DEPTH][3];           // 2Nx2N, RECT, AMP modes percentage
+    double      ipCostRatio;
 
     uint64_t    cntIntraNxN;
     uint64_t    totalCu;
@@ -78,6 +79,15 @@
     uint64_t    cuInterDistribution[NUM_CU_DEPTH][INTER_MODES];
     uint64_t    cuIntraDistribution[NUM_CU_DEPTH][INTRA_MODES];
 
+
+    uint64_t    totalPu[NUM_CU_DEPTH + 1];
+    uint64_t    cntSkipPu[NUM_CU_DEPTH];
+    uint64_t    cntIntraPu[NUM_CU_DEPTH];
+    uint64_t    cntAmp[NUM_CU_DEPTH];
+    uint64_t    cnt4x4;
+    uint64_t    cntInterPu[NUM_CU_DEPTH][INTER_MODES - 1];
+    uint64_t    cntMergePu[NUM_CU_DEPTH][INTER_MODES - 1];
+
     FrameStats()
     {
         memset(this, 0, sizeof(FrameStats));
​

x265_2.4.tar.gz/source/common/ipfilter.cpp -> x265_2.5.tar.gz/source/common/ipfilter.cpp Changed

 
@@ -123,9 +123,8 @@
     const int16_t* coeff = (N == 4) ? g_chromaFilter[coeffIdx] : g_lumaFilter[coeffIdx];
     int headRoom = IF_INTERNAL_PREC - X265_DEPTH;
     int shift = IF_FILTER_PREC - headRoom;
-    int offset = -IF_INTERNAL_OFFS << shift;
+    int offset = (unsigned)-IF_INTERNAL_OFFS << shift;
     int blkheight = height;
-
     src -= N / 2 - 1;
 
     if (isRowExt)
@@ -209,10 +208,8 @@
     const int16_t* c = (N == 4) ? g_chromaFilter[coeffIdx] : g_lumaFilter[coeffIdx];
     int headRoom = IF_INTERNAL_PREC - X265_DEPTH;
     int shift = IF_FILTER_PREC - headRoom;
-    int offset = -IF_INTERNAL_OFFS << shift;
-
+    int offset = (unsigned)-IF_INTERNAL_OFFS << shift;
     src -= (N / 2 - 1) * srcStride;
-
     int row, col;
     for (row = 0; row < height; row++)
     {
​

x265_2.4.tar.gz/source/common/lowres.h -> x265_2.5.tar.gz/source/common/lowres.h Changed

 
@@ -118,6 +118,8 @@
     bool   bKeyframe;
     bool   bLastMiniGopBFrame;
 
+    double ipCostRatio;
+
     /* lookahead output data */
     int64_t   costEst[X265_BFRAME_MAX + 2][X265_BFRAME_MAX + 2];
     int64_t   costEstAq[X265_BFRAME_MAX + 2][X265_BFRAME_MAX + 2];
​

x265_2.4.tar.gz/source/common/param.cpp -> x265_2.5.tar.gz/source/common/param.cpp Changed

@@ -110,6 +110,7 @@
     param->frameNumThreads = 0;
 
     param->logLevel = X265_LOG_INFO;
+    param->csvLogLevel = 0;
     param->csvfn = NULL;
     param->rc.lambdaFileName = NULL;
     param->bLogCuStats = 0;
@@ -194,10 +195,10 @@
     param->rdPenalty = 0;
     param->psyRd = 2.0;
     param->psyRdoq = 0.0;
-    param->analysisMode = 0;
+    param->analysisReuseMode = 0;
     param->analysisMultiPassRefine = 0;
     param->analysisMultiPassDistortion = 0;
-    param->analysisFileName = NULL;
+    param->analysisReuseFileName = NULL;
     param->bIntraInBFrames = 0;
     param->bLossless = 0;
     param->bCULossless = 0;
@@ -236,6 +237,7 @@
     param->rc.bEnableGrain = 0;
     param->rc.qpMin = 0;
     param->rc.qpMax = QP_MAX_MAX;
+    param->rc.bEnableConstVbv = 0;
 
     /* Video Usability Information (VUI) */
     param->vui.aspectRatioIdc = 0;
@@ -271,10 +273,18 @@
     param->bOptCUDeltaQP        = 0;
     param->bAQMotion = 0;
     param->bHDROpt = 0;
-    param->analysisRefineLevel = 5;
+    param->analysisReuseLevel = 5;
 
     param->toneMapFile = NULL;
     param->bDhdr10opt = 0;
+    param->bCTUInfo = 0;
+    param->bUseRcStats = 0;
+    param->scaleFactor = 0;
+    param->intraRefine = 0;
+    param->interRefine = 0;
+    param->mvRefine = 0;
+    param->bUseAnalysisFile = 1;
+    param->csvfpt = NULL;
 }
 
 int x265_param_default_preset(x265_param* param, const char* preset, const char* tune)
@@ -494,6 +504,7 @@
             param->psyRd = 4.0;
             param->psyRdoq = 10.0;
             param->bEnableSAO = 0;
+            param->rc.bEnableConstVbv = 1;
         }
         else
             return -1;
@@ -828,7 +839,7 @@
         p->rc.bStrictCbr = atobool(value);
         p->rc.pbFactor = 1.0;
     }
-    OPT("analysis-mode") p->analysisMode = parseName(value, x265_analysis_names, bError);
+    OPT("analysis-reuse-mode") p->analysisReuseMode = parseName(value, x265_analysis_names, bError);
     OPT("sar")
     {
         p->vui.aspectRatioIdc = parseName(value, x265_sar_names, bError);
@@ -907,7 +918,7 @@
     OPT("scaling-list") p->scalingLists = strdup(value);
     OPT2("pools", "numa-pools") p->numaPools = strdup(value);
     OPT("lambda-file") p->rc.lambdaFileName = strdup(value);
-    OPT("analysis-file") p->analysisFileName = strdup(value);
+    OPT("analysis-reuse-file") p->analysisReuseFileName = strdup(value);
     OPT("qg-size") p->rc.qgSize = atoi(value);
     OPT("master-display") p->masteringDisplayColorVolume = strdup(value);
     OPT("max-cll") bError |= sscanf(value, "%hu,%hu", &p->maxCLL, &p->maxFALL) != 2;
@@ -921,6 +932,8 @@
     if (bExtraParams)
     {
         if (0) ;
+        OPT("csv") p->csvfn = strdup(value);
+        OPT("csv-log-level") p->csvLogLevel = atoi(value);
         OPT("qpmin") p->rc.qpMin = atoi(value);
         OPT("analyze-src-pics") p->bSourceReferenceEstimation = atobool(value);
         OPT("log2-max-poc-lsb") p->log2MaxPocLsb = atoi(value);
@@ -938,7 +951,7 @@
         OPT("multi-pass-opt-distortion") p->analysisMultiPassDistortion = atobool(value);
         OPT("aq-motion") p->bAQMotion = atobool(value);
         OPT("dynamic-rd") p->dynamicRd = atof(value);
-        OPT("refine-level") p->analysisRefineLevel = atoi(value);
+        OPT("analysis-reuse-level") p->analysisReuseLevel = atoi(value);
         OPT("ssim-rd")
         {
             int bval = atobool(value);
@@ -954,6 +967,12 @@
         OPT("limit-sao") p->bLimitSAO = atobool(value);
         OPT("dhdr10-info") p->toneMapFile = strdup(value);
         OPT("dhdr10-opt") p->bDhdr10opt = atobool(value);
+        OPT("const-vbv") p->rc.bEnableConstVbv = atobool(value);
+        OPT("ctu-info") p->bCTUInfo = atoi(value);
+        OPT("scale-factor") p->scaleFactor = atoi(value);
+        OPT("refine-intra")p->intraRefine = atoi(value);
+        OPT("refine-inter")p->interRefine = atobool(value);
+        OPT("refine-mv")p->mvRefine = atobool(value);
         else
             return X265_PARAM_BAD_NAME;
     }
@@ -1284,16 +1303,19 @@
           "Constant QP is incompatible with 2pass");
     CHECK(param->rc.bStrictCbr && (param->rc.bitrate <= 0 || param->rc.vbvBufferSize <=0),
           "Strict-cbr cannot be applied without specifying target bitrate or vbv bufsize");
-    CHECK(param->analysisMode && (param->analysisMode < X265_ANALYSIS_OFF || param->analysisMode > X265_ANALYSIS_LOAD),
+    CHECK(param->analysisReuseMode && (param->analysisReuseMode < X265_ANALYSIS_OFF || param->analysisReuseMode > X265_ANALYSIS_LOAD),
         "Invalid analysis mode. Analysis mode 0: OFF 1: SAVE : 2 LOAD");
-    CHECK(param->analysisMode && (param->analysisRefineLevel < 1 || param->analysisRefineLevel > 10),
+    CHECK(param->analysisReuseMode && (param->analysisReuseLevel < 1 || param->analysisReuseLevel > 10),
         "Invalid analysis refine level. Value must be between 1 and 10 (inclusive)");
+    CHECK(param->scaleFactor > 2, "Invalid scale-factor. Supports factor <= 2");
     CHECK(param->rc.qpMax < QP_MIN || param->rc.qpMax > QP_MAX_MAX,
         "qpmax exceeds supported range (0 to 69)");
     CHECK(param->rc.qpMin < QP_MIN || param->rc.qpMin > QP_MAX_MAX,
         "qpmin exceeds supported range (0 to 69)");
     CHECK(param->log2MaxPocLsb < 4 || param->log2MaxPocLsb > 16,
         "Supported range for log2MaxPocLsb is 4 to 16");
+    CHECK(param->bCTUInfo < 0 || (param->bCTUInfo != 0 && param->bCTUInfo != 1 && param->bCTUInfo != 2 && param->bCTUInfo != 4 && param->bCTUInfo != 6) || param->bCTUInfo > 6,
+        "Supported values for bCTUInfo are 0, 1, 2, 4, 6");
 #if !X86_64
     CHECK(param->searchMethod == X265_SEA && (param->sourceWidth > 840 || param->sourceHeight > 480),
         "SEA motion search does not support resolutions greater than 480p in 32 bit build");
@@ -1322,42 +1344,6 @@
     }
 }
 
-int x265_set_globals(x265_param* param)
-{
-    uint32_t maxLog2CUSize = (uint32_t)g_log2Size[param->maxCUSize];
-    uint32_t minLog2CUSize = (uint32_t)g_log2Size[param->minCUSize];
-
-    Lock gLock;
-    ScopedLock sLock(gLock);
-
-    if (++g_ctuSizeConfigured > 1)
-    {
-        if (g_maxCUSize != param->maxCUSize)
-        {
-            x265_log(param, X265_LOG_WARNING, "maxCUSize must be the same for all encoders in a single process");
-        }
-        if (g_maxCUDepth != maxLog2CUSize - minLog2CUSize)
-        {
-            x265_log(param, X265_LOG_WARNING, "maxCUDepth must be the same for all encoders in a single process");
-        }
-        param->maxCUSize = g_maxCUSize;
-        return x265_check_params(param); /* Check again, since param may have changed */
-    }
-    else
-    {
-        // set max CU width & height
-        g_maxCUSize     = param->maxCUSize;
-        g_maxLog2CUSize = maxLog2CUSize;
-
-        // compute actual CU depth with respect to config depth and max transform size
-        g_maxCUDepth    = maxLog2CUSize - minLog2CUSize;
-        g_unitSizeDepth = maxLog2CUSize - LOG2_UNIT_SIZE;
-    }
-
-    g_maxSlices = param->maxSlices;
-    return 0;
-}
-
 static void appendtool(x265_param* param, char* buf, size_t size, const char* toolstr)
 {
     static const int overhead = (int)strlen("x265 [info]: tools: ");
@@ -1457,6 +1443,7 @@
     TOOLOPT(param->bEnableStrongIntraSmoothing, "strong-intra-smoothing");
     TOOLVAL(param->lookaheadSlices, "lslices=%d");
     TOOLVAL(param->lookaheadThreads, "lthreads=%d")
+    TOOLVAL(param->bCTUInfo, "ctu-info=%d");
     if (param->maxSlices > 1)
         TOOLVAL(param->maxSlices, "slices=%d");
     if (param->bEnableLoopFilter)
@@ -1473,8 +1460,8 @@
     TOOLOPT(!param->bSaoNonDeblocked && param->bEnableSAO, "sao");
     TOOLOPT(param->rc.bStatWrite, "stats-write");
     TOOLOPT(param->rc.bStatRead,  "stats-read");
-#if ENABLE_DYNAMIC_HDR10
-    TOOLVAL(param->toneMapFile != NULL, "dhdr10-info");
+#if ENABLE_HDR10_PLUS
+    TOOLOPT(param->toneMapFile != NULL, "dhdr10-info");
 #endif
     x265_log(param, X265_LOG_INFO, "tools:%s\n", buf);
     fflush(stderr);
@@ -1501,6 +1488,8 @@
     BOOL(p->bEnablePsnr, "psnr");
     BOOL(p->bEnableSsim, "ssim");
     s += sprintf(s, " log-level=%d", p->logLevel);
+    if (p->csvfn)
+        s += sprintf(s, " csvfn=%s csv-log-level=%d", p->csvfn, p->csvLogLevel);
     s += sprintf(s, " bitdepth=%d", p->internalBitDepth);
     s += sprintf(s, " input-csp=%d", p->internalCsp);
     s += sprintf(s, " fps=%u/%u", p->fpsNum, p->fpsDenom);
@@ -1573,7 +1562,7 @@

 
@@ -110,6 +110,7 @@
     param->frameNumThreads = 0;
 
     param->logLevel = X265_LOG_INFO;
+    param->csvLogLevel = 0;
     param->csvfn = NULL;
     param->rc.lambdaFileName = NULL;
     param->bLogCuStats = 0;
@@ -194,10 +195,10 @@
     param->rdPenalty = 0;
     param->psyRd = 2.0;
     param->psyRdoq = 0.0;
-    param->analysisMode = 0;
+    param->analysisReuseMode = 0;
     param->analysisMultiPassRefine = 0;
     param->analysisMultiPassDistortion = 0;
-    param->analysisFileName = NULL;
+    param->analysisReuseFileName = NULL;
     param->bIntraInBFrames = 0;
     param->bLossless = 0;
     param->bCULossless = 0;
@@ -236,6 +237,7 @@
     param->rc.bEnableGrain = 0;
     param->rc.qpMin = 0;
     param->rc.qpMax = QP_MAX_MAX;
+    param->rc.bEnableConstVbv = 0;
 
     /* Video Usability Information (VUI) */
     param->vui.aspectRatioIdc = 0;
@@ -271,10 +273,18 @@
     param->bOptCUDeltaQP        = 0;
     param->bAQMotion = 0;
     param->bHDROpt = 0;
-    param->analysisRefineLevel = 5;
+    param->analysisReuseLevel = 5;
 
     param->toneMapFile = NULL;
     param->bDhdr10opt = 0;
+    param->bCTUInfo = 0;
+    param->bUseRcStats = 0;
+    param->scaleFactor = 0;
+    param->intraRefine = 0;
+    param->interRefine = 0;
+    param->mvRefine = 0;
+    param->bUseAnalysisFile = 1;
+    param->csvfpt = NULL;
 }
 
 int x265_param_default_preset(x265_param* param, const char* preset, const char* tune)
@@ -494,6 +504,7 @@
             param->psyRd = 4.0;
             param->psyRdoq = 10.0;
             param->bEnableSAO = 0;
+            param->rc.bEnableConstVbv = 1;
         }
         else
             return -1;
@@ -828,7 +839,7 @@
         p->rc.bStrictCbr = atobool(value);
         p->rc.pbFactor = 1.0;
     }
-    OPT("analysis-mode") p->analysisMode = parseName(value, x265_analysis_names, bError);
+    OPT("analysis-reuse-mode") p->analysisReuseMode = parseName(value, x265_analysis_names, bError);
     OPT("sar")
     {
         p->vui.aspectRatioIdc = parseName(value, x265_sar_names, bError);
@@ -907,7 +918,7 @@
     OPT("scaling-list") p->scalingLists = strdup(value);
     OPT2("pools", "numa-pools") p->numaPools = strdup(value);
     OPT("lambda-file") p->rc.lambdaFileName = strdup(value);
-    OPT("analysis-file") p->analysisFileName = strdup(value);
+    OPT("analysis-reuse-file") p->analysisReuseFileName = strdup(value);
     OPT("qg-size") p->rc.qgSize = atoi(value);
     OPT("master-display") p->masteringDisplayColorVolume = strdup(value);
     OPT("max-cll") bError |= sscanf(value, "%hu,%hu", &p->maxCLL, &p->maxFALL) != 2;
@@ -921,6 +932,8 @@
     if (bExtraParams)
     {
         if (0) ;
+        OPT("csv") p->csvfn = strdup(value);
+        OPT("csv-log-level") p->csvLogLevel = atoi(value);
         OPT("qpmin") p->rc.qpMin = atoi(value);
         OPT("analyze-src-pics") p->bSourceReferenceEstimation = atobool(value);
         OPT("log2-max-poc-lsb") p->log2MaxPocLsb = atoi(value);
@@ -938,7 +951,7 @@
         OPT("multi-pass-opt-distortion") p->analysisMultiPassDistortion = atobool(value);
         OPT("aq-motion") p->bAQMotion = atobool(value);
         OPT("dynamic-rd") p->dynamicRd = atof(value);
-        OPT("refine-level") p->analysisRefineLevel = atoi(value);
+        OPT("analysis-reuse-level") p->analysisReuseLevel = atoi(value);
         OPT("ssim-rd")
         {
             int bval = atobool(value);
@@ -954,6 +967,12 @@
         OPT("limit-sao") p->bLimitSAO = atobool(value);
         OPT("dhdr10-info") p->toneMapFile = strdup(value);
         OPT("dhdr10-opt") p->bDhdr10opt = atobool(value);
+        OPT("const-vbv") p->rc.bEnableConstVbv = atobool(value);
+        OPT("ctu-info") p->bCTUInfo = atoi(value);
+        OPT("scale-factor") p->scaleFactor = atoi(value);
+        OPT("refine-intra")p->intraRefine = atoi(value);
+        OPT("refine-inter")p->interRefine = atobool(value);
+        OPT("refine-mv")p->mvRefine = atobool(value);
         else
             return X265_PARAM_BAD_NAME;
     }
@@ -1284,16 +1303,19 @@
           "Constant QP is incompatible with 2pass");
     CHECK(param->rc.bStrictCbr && (param->rc.bitrate <= 0 || param->rc.vbvBufferSize <=0),
           "Strict-cbr cannot be applied without specifying target bitrate or vbv bufsize");
-    CHECK(param->analysisMode && (param->analysisMode < X265_ANALYSIS_OFF || param->analysisMode > X265_ANALYSIS_LOAD),
+    CHECK(param->analysisReuseMode && (param->analysisReuseMode < X265_ANALYSIS_OFF || param->analysisReuseMode > X265_ANALYSIS_LOAD),
         "Invalid analysis mode. Analysis mode 0: OFF 1: SAVE : 2 LOAD");
-    CHECK(param->analysisMode && (param->analysisRefineLevel < 1 || param->analysisRefineLevel > 10),
+    CHECK(param->analysisReuseMode && (param->analysisReuseLevel < 1 || param->analysisReuseLevel > 10),
         "Invalid analysis refine level. Value must be between 1 and 10 (inclusive)");
+    CHECK(param->scaleFactor > 2, "Invalid scale-factor. Supports factor <= 2");
     CHECK(param->rc.qpMax < QP_MIN || param->rc.qpMax > QP_MAX_MAX,
         "qpmax exceeds supported range (0 to 69)");
     CHECK(param->rc.qpMin < QP_MIN || param->rc.qpMin > QP_MAX_MAX,
         "qpmin exceeds supported range (0 to 69)");
     CHECK(param->log2MaxPocLsb < 4 || param->log2MaxPocLsb > 16,
         "Supported range for log2MaxPocLsb is 4 to 16");
+    CHECK(param->bCTUInfo < 0 || (param->bCTUInfo != 0 && param->bCTUInfo != 1 && param->bCTUInfo != 2 && param->bCTUInfo != 4 && param->bCTUInfo != 6) || param->bCTUInfo > 6,
+        "Supported values for bCTUInfo are 0, 1, 2, 4, 6");
 #if !X86_64
     CHECK(param->searchMethod == X265_SEA && (param->sourceWidth > 840 || param->sourceHeight > 480),
         "SEA motion search does not support resolutions greater than 480p in 32 bit build");
@@ -1322,42 +1344,6 @@
     }
 }
 
-int x265_set_globals(x265_param* param)
-{
-    uint32_t maxLog2CUSize = (uint32_t)g_log2Size[param->maxCUSize];
-    uint32_t minLog2CUSize = (uint32_t)g_log2Size[param->minCUSize];
-
-    Lock gLock;
-    ScopedLock sLock(gLock);
-
-    if (++g_ctuSizeConfigured > 1)
-    {
-        if (g_maxCUSize != param->maxCUSize)
-        {
-            x265_log(param, X265_LOG_WARNING, "maxCUSize must be the same for all encoders in a single process");
-        }
-        if (g_maxCUDepth != maxLog2CUSize - minLog2CUSize)
-        {
-            x265_log(param, X265_LOG_WARNING, "maxCUDepth must be the same for all encoders in a single process");
-        }
-        param->maxCUSize = g_maxCUSize;
-        return x265_check_params(param); /* Check again, since param may have changed */
-    }
-    else
-    {
-        // set max CU width & height
-        g_maxCUSize     = param->maxCUSize;
-        g_maxLog2CUSize = maxLog2CUSize;
-
-        // compute actual CU depth with respect to config depth and max transform size
-        g_maxCUDepth    = maxLog2CUSize - minLog2CUSize;
-        g_unitSizeDepth = maxLog2CUSize - LOG2_UNIT_SIZE;
-    }
-
-    g_maxSlices = param->maxSlices;
-    return 0;
-}
-
 static void appendtool(x265_param* param, char* buf, size_t size, const char* toolstr)
 {
     static const int overhead = (int)strlen("x265 [info]: tools: ");
@@ -1457,6 +1443,7 @@
     TOOLOPT(param->bEnableStrongIntraSmoothing, "strong-intra-smoothing");
     TOOLVAL(param->lookaheadSlices, "lslices=%d");
     TOOLVAL(param->lookaheadThreads, "lthreads=%d")
+    TOOLVAL(param->bCTUInfo, "ctu-info=%d");
     if (param->maxSlices > 1)
         TOOLVAL(param->maxSlices, "slices=%d");
     if (param->bEnableLoopFilter)
@@ -1473,8 +1460,8 @@
     TOOLOPT(!param->bSaoNonDeblocked && param->bEnableSAO, "sao");
     TOOLOPT(param->rc.bStatWrite, "stats-write");
     TOOLOPT(param->rc.bStatRead,  "stats-read");
-#if ENABLE_DYNAMIC_HDR10
-    TOOLVAL(param->toneMapFile != NULL, "dhdr10-info");
+#if ENABLE_HDR10_PLUS
+    TOOLOPT(param->toneMapFile != NULL, "dhdr10-info");
 #endif
     x265_log(param, X265_LOG_INFO, "tools:%s\n", buf);
     fflush(stderr);
@@ -1501,6 +1488,8 @@
     BOOL(p->bEnablePsnr, "psnr");
     BOOL(p->bEnableSsim, "ssim");
     s += sprintf(s, " log-level=%d", p->logLevel);
+    if (p->csvfn)
+        s += sprintf(s, " csvfn=%s csv-log-level=%d", p->csvfn, p->csvLogLevel);
     s += sprintf(s, " bitdepth=%d", p->internalBitDepth);
     s += sprintf(s, " input-csp=%d", p->internalCsp);
     s += sprintf(s, " fps=%u/%u", p->fpsNum, p->fpsDenom);
@@ -1573,7 +1562,7 @@
​

x265_2.4.tar.gz/source/common/param.h -> x265_2.5.tar.gz/source/common/param.h Changed

 
@@ -28,7 +28,6 @@
 namespace X265_NS {
 
 int   x265_check_params(x265_param *param);
-int   x265_set_globals(x265_param *param);
 void  x265_print_params(x265_param *param);
 void  x265_param_apply_fastfirstpass(x265_param *p);
 char* x265_param2string(x265_param *param, int padx, int pady);
​

x265_2.4.tar.gz/source/common/picyuv.cpp -> x265_2.5.tar.gz/source/common/picyuv.cpp Changed

@@ -46,36 +46,62 @@
 
     m_maxLumaLevel = 0;
     m_avgLumaLevel = 0;
+
+    m_maxChromaULevel = 0;
+    m_avgChromaULevel = 0;
+
+    m_maxChromaVLevel = 0;
+    m_avgChromaVLevel = 0;
+
+#if (X265_DEPTH > 8)
+    m_minLumaLevel = 0xFFFF;
+    m_minChromaULevel = 0xFFFF;
+    m_minChromaVLevel = 0xFFFF;
+#else
+    m_minLumaLevel = 0xFF;
+    m_minChromaULevel = 0xFF;
+    m_minChromaVLevel = 0xFF;
+#endif
+
     m_stride = 0;
     m_strideC = 0;
     m_hChromaShift = 0;
     m_vChromaShift = 0;
 }
 
-bool PicYuv::create(uint32_t picWidth, uint32_t picHeight, uint32_t picCsp)
+bool PicYuv::create(x265_param* param, pixel *pixelbuf)
 {
+    m_param = param;
+    uint32_t picWidth = m_param->sourceWidth;
+    uint32_t picHeight = m_param->sourceHeight;
+    uint32_t picCsp = m_param->internalCsp;
     m_picWidth  = picWidth;
     m_picHeight = picHeight;
     m_hChromaShift = CHROMA_H_SHIFT(picCsp);
     m_vChromaShift = CHROMA_V_SHIFT(picCsp);
     m_picCsp = picCsp;
 
-    uint32_t numCuInWidth = (m_picWidth + g_maxCUSize - 1)  / g_maxCUSize;
-    uint32_t numCuInHeight = (m_picHeight + g_maxCUSize - 1) / g_maxCUSize;
+    uint32_t numCuInWidth = (m_picWidth + param->maxCUSize - 1)  / param->maxCUSize;
+    uint32_t numCuInHeight = (m_picHeight + param->maxCUSize - 1) / param->maxCUSize;
 
-    m_lumaMarginX = g_maxCUSize + 32; // search margin and 8-tap filter half-length, padded for 32-byte alignment
-    m_lumaMarginY = g_maxCUSize + 16; // margin for 8-tap filter and infinite padding
-    m_stride = (numCuInWidth * g_maxCUSize) + (m_lumaMarginX << 1);
+    m_lumaMarginX = param->maxCUSize + 32; // search margin and 8-tap filter half-length, padded for 32-byte alignment
+    m_lumaMarginY = param->maxCUSize + 16; // margin for 8-tap filter and infinite padding
+    m_stride = (numCuInWidth * param->maxCUSize) + (m_lumaMarginX << 1);
 
-    int maxHeight = numCuInHeight * g_maxCUSize;
-    CHECKED_MALLOC(m_picBuf[0], pixel, m_stride * (maxHeight + (m_lumaMarginY * 2)));
-    m_picOrg[0] = m_picBuf[0] + m_lumaMarginY * m_stride + m_lumaMarginX;
+    int maxHeight = numCuInHeight * param->maxCUSize;
+    if (pixelbuf)
+        m_picOrg[0] = pixelbuf;
+    else
+    {
+        CHECKED_MALLOC(m_picBuf[0], pixel, m_stride * (maxHeight + (m_lumaMarginY * 2)));
+        m_picOrg[0] = m_picBuf[0] + m_lumaMarginY * m_stride + m_lumaMarginX;
+    }
 
     if (picCsp != X265_CSP_I400)
     {
         m_chromaMarginX = m_lumaMarginX;  // keep 16-byte alignment for chroma CTUs
         m_chromaMarginY = m_lumaMarginY >> m_vChromaShift;
-        m_strideC = ((numCuInWidth * g_maxCUSize) >> m_hChromaShift) + (m_chromaMarginX * 2);
+        m_strideC = ((numCuInWidth * m_param->maxCUSize) >> m_hChromaShift) + (m_chromaMarginX * 2);
 
         CHECKED_MALLOC(m_picBuf[1], pixel, m_strideC * ((maxHeight >> m_vChromaShift) + (m_chromaMarginY * 2)));
         CHECKED_MALLOC(m_picBuf[2], pixel, m_strideC * ((maxHeight >> m_vChromaShift) + (m_chromaMarginY * 2)));
@@ -94,12 +120,33 @@
     return false;
 }
 
+int PicYuv::getLumaBufLen(uint32_t picWidth, uint32_t picHeight, uint32_t picCsp)
+{
+    m_picWidth = picWidth;
+    m_picHeight = picHeight;
+    m_hChromaShift = CHROMA_H_SHIFT(picCsp);
+    m_vChromaShift = CHROMA_V_SHIFT(picCsp);
+    m_picCsp = picCsp;
+
+    uint32_t numCuInWidth = (m_picWidth + m_param->maxCUSize - 1) / m_param->maxCUSize;
+    uint32_t numCuInHeight = (m_picHeight + m_param->maxCUSize - 1) / m_param->maxCUSize;
+
+    m_lumaMarginX = m_param->maxCUSize + 32; // search margin and 8-tap filter half-length, padded for 32-byte alignment
+    m_lumaMarginY = m_param->maxCUSize + 16; // margin for 8-tap filter and infinite padding
+    m_stride = (numCuInWidth * m_param->maxCUSize) + (m_lumaMarginX << 1);
+
+    int maxHeight = numCuInHeight * m_param->maxCUSize;
+    int bufLen = (int)(m_stride * (maxHeight + (m_lumaMarginY * 2)));
+
+    return bufLen;
+}
+
 /* the first picture allocated by the encoder will be asked to generate these
  * offset arrays. Once generated, they will be provided to all future PicYuv
  * allocated by the same encoder. */
 bool PicYuv::createOffsets(const SPS& sps)
 {
-    uint32_t numPartitions = 1 << (g_unitSizeDepth * 2);
+    uint32_t numPartitions = 1 << (m_param->unitSizeDepth * 2);
 
     if (m_picCsp != X265_CSP_I400)
     {
@@ -109,8 +156,8 @@
         {
             for (uint32_t cuCol = 0; cuCol < sps.numCuInWidth; cuCol++)
             {
-                m_cuOffsetY[cuRow * sps.numCuInWidth + cuCol] = m_stride * cuRow * g_maxCUSize + cuCol * g_maxCUSize;
-                m_cuOffsetC[cuRow * sps.numCuInWidth + cuCol] = m_strideC * cuRow * (g_maxCUSize >> m_vChromaShift) + cuCol * (g_maxCUSize >> m_hChromaShift);
+                m_cuOffsetY[cuRow * sps.numCuInWidth + cuCol] = m_stride * cuRow * m_param->maxCUSize + cuCol * m_param->maxCUSize;
+                m_cuOffsetC[cuRow * sps.numCuInWidth + cuCol] = m_strideC * cuRow * (m_param->maxCUSize >> m_vChromaShift) + cuCol * (m_param->maxCUSize >> m_hChromaShift);
             }
         }
 
@@ -129,7 +176,7 @@
         CHECKED_MALLOC(m_cuOffsetY, intptr_t, sps.numCuInWidth * sps.numCuInHeight);
         for (uint32_t cuRow = 0; cuRow < sps.numCuInHeight; cuRow++)
         for (uint32_t cuCol = 0; cuCol < sps.numCuInWidth; cuCol++)
-            m_cuOffsetY[cuRow * sps.numCuInWidth + cuCol] = m_stride * cuRow * g_maxCUSize + cuCol * g_maxCUSize;
+            m_cuOffsetY[cuRow * sps.numCuInWidth + cuCol] = m_stride * cuRow * m_param->maxCUSize + cuCol * m_param->maxCUSize;
 
         CHECKED_MALLOC(m_buOffsetY, intptr_t, (size_t)numPartitions);
         for (uint32_t idx = 0; idx < numPartitions; ++idx)
@@ -184,6 +231,11 @@
 
     X265_CHECK(pic.bitDepth >= 8, "pic.bitDepth check failure");
 
+    uint64_t lumaSum;
+    uint64_t cbSum;
+    uint64_t crSum;
+    lumaSum = cbSum = crSum = 0;
+
     if (pic.bitDepth == 8)
     {
 #if (X265_DEPTH > 8)
@@ -288,6 +340,47 @@
     pixel *U = m_picOrg[1];
     pixel *V = m_picOrg[2];
 
+    pixel *yPic = m_picOrg[0];
+    pixel *uPic = m_picOrg[1];
+    pixel *vPic = m_picOrg[2];
+
+    for (int r = 0; r < height; r++)
+    {
+        for (int c = 0; c < width; c++)
+        {
+            m_maxLumaLevel = X265_MAX(yPic[c], m_maxLumaLevel);
+            m_minLumaLevel = X265_MIN(yPic[c], m_minLumaLevel);
+            lumaSum += yPic[c];
+        }
+        yPic += m_stride;
+    }
+    m_avgLumaLevel = (double)lumaSum / (m_picHeight * m_picWidth);
+
+    if (param.csvLogLevel >= 2)
+    {
+        if (param.internalCsp != X265_CSP_I400)
+        {
+            for (int r = 0; r < height >> m_vChromaShift; r++)
+            {
+                for (int c = 0; c < width >> m_hChromaShift; c++)
+                {
+                    m_maxChromaULevel = X265_MAX(uPic[c], m_maxChromaULevel);
+                    m_minChromaULevel = X265_MIN(uPic[c], m_minChromaULevel);
+                    cbSum += uPic[c];
+
+                    m_maxChromaVLevel = X265_MAX(vPic[c], m_maxChromaVLevel);
+                    m_minChromaVLevel = X265_MIN(vPic[c], m_minChromaVLevel);
+                    crSum += vPic[c];
+                }
+
+                uPic += m_strideC;
+                vPic += m_strideC;
+            }
+            m_avgChromaULevel = (double)cbSum / ((height >> m_vChromaShift) * (width >> m_hChromaShift));
+            m_avgChromaVLevel = (double)crSum / ((height >> m_vChromaShift) * (width >> m_hChromaShift));
+        }
+    }
+
 #if HIGH_BIT_DEPTH
     bool calcHDRParams = !!param.minLuma || (param.maxLuma != PIXEL_MAX);
     /* Apply min/max luma bounds for HDR pixel manipulations */

 
@@ -46,36 +46,62 @@
 
     m_maxLumaLevel = 0;
     m_avgLumaLevel = 0;
+
+    m_maxChromaULevel = 0;
+    m_avgChromaULevel = 0;
+
+    m_maxChromaVLevel = 0;
+    m_avgChromaVLevel = 0;
+
+#if (X265_DEPTH > 8)
+    m_minLumaLevel = 0xFFFF;
+    m_minChromaULevel = 0xFFFF;
+    m_minChromaVLevel = 0xFFFF;
+#else
+    m_minLumaLevel = 0xFF;
+    m_minChromaULevel = 0xFF;
+    m_minChromaVLevel = 0xFF;
+#endif
+
     m_stride = 0;
     m_strideC = 0;
     m_hChromaShift = 0;
     m_vChromaShift = 0;
 }
 
-bool PicYuv::create(uint32_t picWidth, uint32_t picHeight, uint32_t picCsp)
+bool PicYuv::create(x265_param* param, pixel *pixelbuf)
 {
+    m_param = param;
+    uint32_t picWidth = m_param->sourceWidth;
+    uint32_t picHeight = m_param->sourceHeight;
+    uint32_t picCsp = m_param->internalCsp;
     m_picWidth  = picWidth;
     m_picHeight = picHeight;
     m_hChromaShift = CHROMA_H_SHIFT(picCsp);
     m_vChromaShift = CHROMA_V_SHIFT(picCsp);
     m_picCsp = picCsp;
 
-    uint32_t numCuInWidth = (m_picWidth + g_maxCUSize - 1)  / g_maxCUSize;
-    uint32_t numCuInHeight = (m_picHeight + g_maxCUSize - 1) / g_maxCUSize;
+    uint32_t numCuInWidth = (m_picWidth + param->maxCUSize - 1)  / param->maxCUSize;
+    uint32_t numCuInHeight = (m_picHeight + param->maxCUSize - 1) / param->maxCUSize;
 
-    m_lumaMarginX = g_maxCUSize + 32; // search margin and 8-tap filter half-length, padded for 32-byte alignment
-    m_lumaMarginY = g_maxCUSize + 16; // margin for 8-tap filter and infinite padding
-    m_stride = (numCuInWidth * g_maxCUSize) + (m_lumaMarginX << 1);
+    m_lumaMarginX = param->maxCUSize + 32; // search margin and 8-tap filter half-length, padded for 32-byte alignment
+    m_lumaMarginY = param->maxCUSize + 16; // margin for 8-tap filter and infinite padding
+    m_stride = (numCuInWidth * param->maxCUSize) + (m_lumaMarginX << 1);
 
-    int maxHeight = numCuInHeight * g_maxCUSize;
-    CHECKED_MALLOC(m_picBuf[0], pixel, m_stride * (maxHeight + (m_lumaMarginY * 2)));
-    m_picOrg[0] = m_picBuf[0] + m_lumaMarginY * m_stride + m_lumaMarginX;
+    int maxHeight = numCuInHeight * param->maxCUSize;
+    if (pixelbuf)
+        m_picOrg[0] = pixelbuf;
+    else
+    {
+        CHECKED_MALLOC(m_picBuf[0], pixel, m_stride * (maxHeight + (m_lumaMarginY * 2)));
+        m_picOrg[0] = m_picBuf[0] + m_lumaMarginY * m_stride + m_lumaMarginX;
+    }
 
     if (picCsp != X265_CSP_I400)
     {
         m_chromaMarginX = m_lumaMarginX;  // keep 16-byte alignment for chroma CTUs
         m_chromaMarginY = m_lumaMarginY >> m_vChromaShift;
-        m_strideC = ((numCuInWidth * g_maxCUSize) >> m_hChromaShift) + (m_chromaMarginX * 2);
+        m_strideC = ((numCuInWidth * m_param->maxCUSize) >> m_hChromaShift) + (m_chromaMarginX * 2);
 
         CHECKED_MALLOC(m_picBuf[1], pixel, m_strideC * ((maxHeight >> m_vChromaShift) + (m_chromaMarginY * 2)));
         CHECKED_MALLOC(m_picBuf[2], pixel, m_strideC * ((maxHeight >> m_vChromaShift) + (m_chromaMarginY * 2)));
@@ -94,12 +120,33 @@
     return false;
 }
 
+int PicYuv::getLumaBufLen(uint32_t picWidth, uint32_t picHeight, uint32_t picCsp)
+{
+    m_picWidth = picWidth;
+    m_picHeight = picHeight;
+    m_hChromaShift = CHROMA_H_SHIFT(picCsp);
+    m_vChromaShift = CHROMA_V_SHIFT(picCsp);
+    m_picCsp = picCsp;
+
+    uint32_t numCuInWidth = (m_picWidth + m_param->maxCUSize - 1) / m_param->maxCUSize;
+    uint32_t numCuInHeight = (m_picHeight + m_param->maxCUSize - 1) / m_param->maxCUSize;
+
+    m_lumaMarginX = m_param->maxCUSize + 32; // search margin and 8-tap filter half-length, padded for 32-byte alignment
+    m_lumaMarginY = m_param->maxCUSize + 16; // margin for 8-tap filter and infinite padding
+    m_stride = (numCuInWidth * m_param->maxCUSize) + (m_lumaMarginX << 1);
+
+    int maxHeight = numCuInHeight * m_param->maxCUSize;
+    int bufLen = (int)(m_stride * (maxHeight + (m_lumaMarginY * 2)));
+
+    return bufLen;
+}
+
 /* the first picture allocated by the encoder will be asked to generate these
  * offset arrays. Once generated, they will be provided to all future PicYuv
  * allocated by the same encoder. */
 bool PicYuv::createOffsets(const SPS& sps)
 {
-    uint32_t numPartitions = 1 << (g_unitSizeDepth * 2);
+    uint32_t numPartitions = 1 << (m_param->unitSizeDepth * 2);
 
     if (m_picCsp != X265_CSP_I400)
     {
@@ -109,8 +156,8 @@
         {
             for (uint32_t cuCol = 0; cuCol < sps.numCuInWidth; cuCol++)
             {
-                m_cuOffsetY[cuRow * sps.numCuInWidth + cuCol] = m_stride * cuRow * g_maxCUSize + cuCol * g_maxCUSize;
-                m_cuOffsetC[cuRow * sps.numCuInWidth + cuCol] = m_strideC * cuRow * (g_maxCUSize >> m_vChromaShift) + cuCol * (g_maxCUSize >> m_hChromaShift);
+                m_cuOffsetY[cuRow * sps.numCuInWidth + cuCol] = m_stride * cuRow * m_param->maxCUSize + cuCol * m_param->maxCUSize;
+                m_cuOffsetC[cuRow * sps.numCuInWidth + cuCol] = m_strideC * cuRow * (m_param->maxCUSize >> m_vChromaShift) + cuCol * (m_param->maxCUSize >> m_hChromaShift);
             }
         }
 
@@ -129,7 +176,7 @@
         CHECKED_MALLOC(m_cuOffsetY, intptr_t, sps.numCuInWidth * sps.numCuInHeight);
         for (uint32_t cuRow = 0; cuRow < sps.numCuInHeight; cuRow++)
         for (uint32_t cuCol = 0; cuCol < sps.numCuInWidth; cuCol++)
-            m_cuOffsetY[cuRow * sps.numCuInWidth + cuCol] = m_stride * cuRow * g_maxCUSize + cuCol * g_maxCUSize;
+            m_cuOffsetY[cuRow * sps.numCuInWidth + cuCol] = m_stride * cuRow * m_param->maxCUSize + cuCol * m_param->maxCUSize;
 
         CHECKED_MALLOC(m_buOffsetY, intptr_t, (size_t)numPartitions);
         for (uint32_t idx = 0; idx < numPartitions; ++idx)
@@ -184,6 +231,11 @@
 
     X265_CHECK(pic.bitDepth >= 8, "pic.bitDepth check failure");
 
+    uint64_t lumaSum;
+    uint64_t cbSum;
+    uint64_t crSum;
+    lumaSum = cbSum = crSum = 0;
+
     if (pic.bitDepth == 8)
     {
 #if (X265_DEPTH > 8)
@@ -288,6 +340,47 @@
     pixel *U = m_picOrg[1];
     pixel *V = m_picOrg[2];
 
+    pixel *yPic = m_picOrg[0];
+    pixel *uPic = m_picOrg[1];
+    pixel *vPic = m_picOrg[2];
+
+    for (int r = 0; r < height; r++)
+    {
+        for (int c = 0; c < width; c++)
+        {
+            m_maxLumaLevel = X265_MAX(yPic[c], m_maxLumaLevel);
+            m_minLumaLevel = X265_MIN(yPic[c], m_minLumaLevel);
+            lumaSum += yPic[c];
+        }
+        yPic += m_stride;
+    }
+    m_avgLumaLevel = (double)lumaSum / (m_picHeight * m_picWidth);
+
+    if (param.csvLogLevel >= 2)
+    {
+        if (param.internalCsp != X265_CSP_I400)
+        {
+            for (int r = 0; r < height >> m_vChromaShift; r++)
+            {
+                for (int c = 0; c < width >> m_hChromaShift; c++)
+                {
+                    m_maxChromaULevel = X265_MAX(uPic[c], m_maxChromaULevel);
+                    m_minChromaULevel = X265_MIN(uPic[c], m_minChromaULevel);
+                    cbSum += uPic[c];
+
+                    m_maxChromaVLevel = X265_MAX(vPic[c], m_maxChromaVLevel);
+                    m_minChromaVLevel = X265_MIN(vPic[c], m_minChromaVLevel);
+                    crSum += vPic[c];
+                }
+
+                uPic += m_strideC;
+                vPic += m_strideC;
+            }
+            m_avgChromaULevel = (double)cbSum / ((height >> m_vChromaShift) * (width >> m_hChromaShift));
+            m_avgChromaVLevel = (double)crSum / ((height >> m_vChromaShift) * (width >> m_hChromaShift));
+        }
+    }
+
 #if HIGH_BIT_DEPTH
     bool calcHDRParams = !!param.minLuma || (param.maxLuma != PIXEL_MAX);
     /* Apply min/max luma bounds for HDR pixel manipulations */
​

x265_2.4.tar.gz/source/common/picyuv.h -> x265_2.5.tar.gz/source/common/picyuv.h Changed

 
@@ -60,14 +60,25 @@
     uint32_t m_chromaMarginX;
     uint32_t m_chromaMarginY;
 
-    pixel m_maxLumaLevel;
-    double   m_avgLumaLevel;
+    pixel   m_maxLumaLevel;
+    pixel   m_minLumaLevel;
+    double  m_avgLumaLevel;
+
+    pixel   m_maxChromaULevel;
+    pixel   m_minChromaULevel;
+    double  m_avgChromaULevel;
+
+    pixel   m_maxChromaVLevel;
+    pixel   m_minChromaVLevel;
+    double  m_avgChromaVLevel;
+    x265_param *m_param;
 
     PicYuv();
 
-    bool  create(uint32_t picWidth, uint32_t picHeight, uint32_t csp);
+    bool  create(x265_param* param, pixel *pixelbuf = NULL);
     bool  createOffsets(const SPS& sps);
     void  destroy();
+    int   getLumaBufLen(uint32_t picWidth, uint32_t picHeight, uint32_t picCsp);
 
     void  copyFromPicture(const x265_picture&, const x265_param& param, int padx, int pady);
 
​

x265_2.4.tar.gz/source/common/primitives.cpp -> x265_2.5.tar.gz/source/common/primitives.cpp Changed

 
@@ -57,6 +57,7 @@
 void setupIntraPrimitives_c(EncoderPrimitives &p);
 void setupLoopFilterPrimitives_c(EncoderPrimitives &p);
 void setupSaoPrimitives_c(EncoderPrimitives &p);
+void setupSeaIntegralPrimitives_c(EncoderPrimitives &p);
 
 void setupCPrimitives(EncoderPrimitives &p)
 {
@@ -66,6 +67,7 @@
     setupIntraPrimitives_c(p);      // intrapred.cpp
     setupLoopFilterPrimitives_c(p); // loopfilter.cpp
     setupSaoPrimitives_c(p);        // sao.cpp
+    setupSeaIntegralPrimitives_c(p);  // framefilter.cpp
 }
 
 void setupAliasPrimitives(EncoderPrimitives &p)
​

x265_2.4.tar.gz/source/common/primitives.h -> x265_2.5.tar.gz/source/common/primitives.h Changed

@@ -110,6 +110,17 @@
     BLOCK_422_32x64
 };
 
+enum IntegralSize
+{
+    INTEGRAL_4,
+    INTEGRAL_8,
+    INTEGRAL_12,
+    INTEGRAL_16,
+    INTEGRAL_24,
+    INTEGRAL_32,
+    NUM_INTEGRAL_SIZE
+};
+
 typedef int  (*pixelcmp_t)(const pixel* fenc, intptr_t fencstride, const pixel* fref, intptr_t frefstride); // fenc is aligned
 typedef int  (*pixelcmp_ss_t)(const int16_t* fenc, intptr_t fencstride, const int16_t* fref, intptr_t frefstride);
 typedef sse_t (*pixel_sse_t)(const pixel* fenc, intptr_t fencstride, const pixel* fref, intptr_t frefstride); // fenc is aligned
@@ -203,6 +214,9 @@
 typedef void (*pelFilterLumaStrong_t)(pixel* src, intptr_t srcStep, intptr_t offset, int32_t tcP, int32_t tcQ);
 typedef void (*pelFilterChroma_t)(pixel* src, intptr_t srcStep, intptr_t offset, int32_t tc, int32_t maskP, int32_t maskQ);
 
+typedef void (*integralv_t)(uint32_t *sum, intptr_t stride);
+typedef void (*integralh_t)(uint32_t *sum, pixel *pix, intptr_t stride);
+
 /* Function pointers to optimized encoder primitives. Each pointer can reference
  * either an assembly routine, a SIMD intrinsic primitive, or a C function */
 struct EncoderPrimitives
@@ -342,6 +356,9 @@
     pelFilterLumaStrong_t pelFilterLumaStrong[2]; // EDGE_VER = 0, EDGE_HOR = 1
     pelFilterChroma_t     pelFilterChroma[2];     // EDGE_VER = 0, EDGE_HOR = 1
 
+    integralv_t            integral_initv[NUM_INTEGRAL_SIZE];
+    integralh_t            integral_inith[NUM_INTEGRAL_SIZE];
+
     /* There is one set of chroma primitives per color space. An encoder will
      * have just a single color space and thus it will only ever use one entry
      * in this array. However we always fill all entries in the array in case

 
@@ -110,6 +110,17 @@
     BLOCK_422_32x64
 };
 
+enum IntegralSize
+{
+    INTEGRAL_4,
+    INTEGRAL_8,
+    INTEGRAL_12,
+    INTEGRAL_16,
+    INTEGRAL_24,
+    INTEGRAL_32,
+    NUM_INTEGRAL_SIZE
+};
+
 typedef int  (*pixelcmp_t)(const pixel* fenc, intptr_t fencstride, const pixel* fref, intptr_t frefstride); // fenc is aligned
 typedef int  (*pixelcmp_ss_t)(const int16_t* fenc, intptr_t fencstride, const int16_t* fref, intptr_t frefstride);
 typedef sse_t (*pixel_sse_t)(const pixel* fenc, intptr_t fencstride, const pixel* fref, intptr_t frefstride); // fenc is aligned
@@ -203,6 +214,9 @@
 typedef void (*pelFilterLumaStrong_t)(pixel* src, intptr_t srcStep, intptr_t offset, int32_t tcP, int32_t tcQ);
 typedef void (*pelFilterChroma_t)(pixel* src, intptr_t srcStep, intptr_t offset, int32_t tc, int32_t maskP, int32_t maskQ);
 
+typedef void (*integralv_t)(uint32_t *sum, intptr_t stride);
+typedef void (*integralh_t)(uint32_t *sum, pixel *pix, intptr_t stride);
+
 /* Function pointers to optimized encoder primitives. Each pointer can reference
  * either an assembly routine, a SIMD intrinsic primitive, or a C function */
 struct EncoderPrimitives
@@ -342,6 +356,9 @@
     pelFilterLumaStrong_t pelFilterLumaStrong[2]; // EDGE_VER = 0, EDGE_HOR = 1
     pelFilterChroma_t     pelFilterChroma[2];     // EDGE_VER = 0, EDGE_HOR = 1
 
+    integralv_t            integral_initv[NUM_INTEGRAL_SIZE];
+    integralh_t            integral_inith[NUM_INTEGRAL_SIZE];
+
     /* There is one set of chroma primitives per color space. An encoder will
      * have just a single color space and thus it will only ever use one entry
      * in this array. However we always fill all entries in the array in case
​

x265_2.4.tar.gz/source/common/slice.cpp -> x265_2.5.tar.gz/source/common/slice.cpp Changed

@@ -185,22 +185,22 @@
 uint32_t Slice::realEndAddress(uint32_t endCUAddr) const
 {
     // Calculate end address
-    uint32_t internalAddress = (endCUAddr - 1) % NUM_4x4_PARTITIONS;
-    uint32_t externalAddress = (endCUAddr - 1) / NUM_4x4_PARTITIONS;
-    uint32_t xmax = m_sps->picWidthInLumaSamples - (externalAddress % m_sps->numCuInWidth) * g_maxCUSize;
-    uint32_t ymax = m_sps->picHeightInLumaSamples - (externalAddress / m_sps->numCuInWidth) * g_maxCUSize;
+    uint32_t internalAddress = (endCUAddr - 1) % m_param->num4x4Partitions;
+    uint32_t externalAddress = (endCUAddr - 1) / m_param->num4x4Partitions;
+    uint32_t xmax = m_sps->picWidthInLumaSamples - (externalAddress % m_sps->numCuInWidth) * m_param->maxCUSize;
+    uint32_t ymax = m_sps->picHeightInLumaSamples - (externalAddress / m_sps->numCuInWidth) * m_param->maxCUSize;
 
     while (g_zscanToPelX[internalAddress] >= xmax || g_zscanToPelY[internalAddress] >= ymax)
         internalAddress--;
 
     internalAddress++;
-    if (internalAddress == NUM_4x4_PARTITIONS)
+    if (internalAddress == m_param->num4x4Partitions)
     {
         internalAddress = 0;
         externalAddress++;
     }
 
-    return externalAddress * NUM_4x4_PARTITIONS + internalAddress;
+    return externalAddress * m_param->num4x4Partitions + internalAddress;
 }

 
@@ -185,22 +185,22 @@
 uint32_t Slice::realEndAddress(uint32_t endCUAddr) const
 {
     // Calculate end address
-    uint32_t internalAddress = (endCUAddr - 1) % NUM_4x4_PARTITIONS;
-    uint32_t externalAddress = (endCUAddr - 1) / NUM_4x4_PARTITIONS;
-    uint32_t xmax = m_sps->picWidthInLumaSamples - (externalAddress % m_sps->numCuInWidth) * g_maxCUSize;
-    uint32_t ymax = m_sps->picHeightInLumaSamples - (externalAddress / m_sps->numCuInWidth) * g_maxCUSize;
+    uint32_t internalAddress = (endCUAddr - 1) % m_param->num4x4Partitions;
+    uint32_t externalAddress = (endCUAddr - 1) / m_param->num4x4Partitions;
+    uint32_t xmax = m_sps->picWidthInLumaSamples - (externalAddress % m_sps->numCuInWidth) * m_param->maxCUSize;
+    uint32_t ymax = m_sps->picHeightInLumaSamples - (externalAddress / m_sps->numCuInWidth) * m_param->maxCUSize;
 
     while (g_zscanToPelX[internalAddress] >= xmax || g_zscanToPelY[internalAddress] >= ymax)
         internalAddress--;
 
     internalAddress++;
-    if (internalAddress == NUM_4x4_PARTITIONS)
+    if (internalAddress == m_param->num4x4Partitions)
     {
         internalAddress = 0;
         externalAddress++;
     }
 
-    return externalAddress * NUM_4x4_PARTITIONS + internalAddress;
+    return externalAddress * m_param->num4x4Partitions + internalAddress;
 }
 
 
​

x265_2.4.tar.gz/source/common/slice.h -> x265_2.5.tar.gz/source/common/slice.h Changed

 
@@ -360,6 +360,7 @@
     int         m_iPPSQpMinus26;
     int         numRefIdxDefault[2];
     int         m_iNumRPSInSPS;
+    const x265_param *m_param;
 
     Slice()
     {
​

x265_2.4.tar.gz/source/common/threadpool.cpp -> x265_2.5.tar.gz/source/common/threadpool.cpp Changed

@@ -253,6 +253,7 @@
     int cpusPerNode[MAX_NODE_NUM + 1];
     int threadsPerPool[MAX_NODE_NUM + 2];
     uint64_t nodeMaskPerPool[MAX_NODE_NUM + 2];
+    int totalNumThreads = 0;
 
     memset(cpusPerNode, 0, sizeof(cpusPerNode));
     memset(threadsPerPool, 0, sizeof(threadsPerPool));
@@ -388,9 +389,23 @@
         if (bNumaSupport)
             x265_log(p, X265_LOG_DEBUG, "NUMA node %d may use %d logical cores\n", i, cpusPerNode[i]);
         if (threadsPerPool[i])
+        {
             numPools += (threadsPerPool[i] + MAX_POOL_THREADS - 1) / MAX_POOL_THREADS;
+            totalNumThreads += threadsPerPool[i];
+        }
     }
+    if (!isThreadsReserved)
+    {
+        if (!numPools)
+        {
+            x265_log(p, X265_LOG_DEBUG, "No pool thread available. Deciding frame-threads based on detected CPU threads\n");
+            totalNumThreads = ThreadPool::getCpuCount(); // auto-detect frame threads
+        }
 
+        if (!p->frameNumThreads)
+            ThreadPool::getFrameThreadsCount(p, totalNumThreads);
+    }
+    
     if (!numPools)
         return NULL;
 
@@ -412,7 +427,7 @@
                 node++;
             int numThreads = X265_MIN(MAX_POOL_THREADS, threadsPerPool[node]);
             int origNumThreads = numThreads;
-            if (p->lookaheadThreads > numThreads / 2)
+            if (i == 0 && p->lookaheadThreads > numThreads / 2)
             {
                 p->lookaheadThreads = numThreads / 2;
                 x265_log(p, X265_LOG_DEBUG, "Setting lookahead threads to a maximum of half the total number of threads\n");
@@ -423,7 +438,7 @@
                 maxProviders = 1;
             }
 
-            else
+            else if (i == 0)
                 numThreads -= p->lookaheadThreads;
             if (!pools[i].create(numThreads, maxProviders, nodeMaskPerPool[node]))
             {
@@ -643,4 +658,21 @@
 #endif
 }
 
+void ThreadPool::getFrameThreadsCount(x265_param* p, int cpuCount)
+{
+    int rows = (p->sourceHeight + p->maxCUSize - 1) >> g_log2Size[p->maxCUSize];
+    if (!p->bEnableWavefront)
+        p->frameNumThreads = X265_MIN3(cpuCount, (rows + 1) / 2, X265_MAX_FRAME_THREADS);
+    else if (cpuCount >= 32)
+        p->frameNumThreads = (p->sourceHeight > 2000) ? 6 : 5; 
+    else if (cpuCount >= 16)
+        p->frameNumThreads = 4; 
+    else if (cpuCount >= 8)
+        p->frameNumThreads = 3;
+    else if (cpuCount >= 4)
+        p->frameNumThreads = 2;
+    else
+        p->frameNumThreads = 1;
+}
+
 } // end namespace X265_NS

 
@@ -253,6 +253,7 @@
     int cpusPerNode[MAX_NODE_NUM + 1];
     int threadsPerPool[MAX_NODE_NUM + 2];
     uint64_t nodeMaskPerPool[MAX_NODE_NUM + 2];
+    int totalNumThreads = 0;
 
     memset(cpusPerNode, 0, sizeof(cpusPerNode));
     memset(threadsPerPool, 0, sizeof(threadsPerPool));
@@ -388,9 +389,23 @@
         if (bNumaSupport)
             x265_log(p, X265_LOG_DEBUG, "NUMA node %d may use %d logical cores\n", i, cpusPerNode[i]);
         if (threadsPerPool[i])
+        {
             numPools += (threadsPerPool[i] + MAX_POOL_THREADS - 1) / MAX_POOL_THREADS;
+            totalNumThreads += threadsPerPool[i];
+        }
     }
+    if (!isThreadsReserved)
+    {
+        if (!numPools)
+        {
+            x265_log(p, X265_LOG_DEBUG, "No pool thread available. Deciding frame-threads based on detected CPU threads\n");
+            totalNumThreads = ThreadPool::getCpuCount(); // auto-detect frame threads
+        }
 
+        if (!p->frameNumThreads)
+            ThreadPool::getFrameThreadsCount(p, totalNumThreads);
+    }
+    
     if (!numPools)
         return NULL;
 
@@ -412,7 +427,7 @@
                 node++;
             int numThreads = X265_MIN(MAX_POOL_THREADS, threadsPerPool[node]);
             int origNumThreads = numThreads;
-            if (p->lookaheadThreads > numThreads / 2)
+            if (i == 0 && p->lookaheadThreads > numThreads / 2)
             {
                 p->lookaheadThreads = numThreads / 2;
                 x265_log(p, X265_LOG_DEBUG, "Setting lookahead threads to a maximum of half the total number of threads\n");
@@ -423,7 +438,7 @@
                 maxProviders = 1;
             }
 
-            else
+            else if (i == 0)
                 numThreads -= p->lookaheadThreads;
             if (!pools[i].create(numThreads, maxProviders, nodeMaskPerPool[node]))
             {
@@ -643,4 +658,21 @@
 #endif
 }
 
+void ThreadPool::getFrameThreadsCount(x265_param* p, int cpuCount)
+{
+    int rows = (p->sourceHeight + p->maxCUSize - 1) >> g_log2Size[p->maxCUSize];
+    if (!p->bEnableWavefront)
+        p->frameNumThreads = X265_MIN3(cpuCount, (rows + 1) / 2, X265_MAX_FRAME_THREADS);
+    else if (cpuCount >= 32)
+        p->frameNumThreads = (p->sourceHeight > 2000) ? 6 : 5; 
+    else if (cpuCount >= 16)
+        p->frameNumThreads = 4; 
+    else if (cpuCount >= 8)
+        p->frameNumThreads = 3;
+    else if (cpuCount >= 4)
+        p->frameNumThreads = 2;
+    else
+        p->frameNumThreads = 1;
+}
+
 } // end namespace X265_NS
​

x265_2.4.tar.gz/source/common/threadpool.h -> x265_2.5.tar.gz/source/common/threadpool.h Changed

 
@@ -105,6 +105,7 @@
     static ThreadPool* allocThreadPools(x265_param* p, int& numPools, bool isThreadsReserved);
     static int  getCpuCount();
     static int  getNumaNodeCount();
+    static void getFrameThreadsCount(x265_param* p,int cpuCount);
 };
 
 /* Any worker thread may enlist the help of idle worker threads from the same
​

x265_2.4.tar.gz/source/common/x86/asm-primitives.cpp -> x265_2.5.tar.gz/source/common/x86/asm-primitives.cpp Changed

@@ -114,6 +114,7 @@
 #include "blockcopy8.h"
 #include "intrapred.h"
 #include "dct8.h"
+#include "seaintegral.h"
 }
 
 #define ALL_LUMA_CU_TYPED(prim, fncdef, fname, cpu) \
@@ -2157,6 +2158,17 @@
         p.fix8Unpack = PFX(cutree_fix8_unpack_avx2);
         p.fix8Pack = PFX(cutree_fix8_pack_avx2);
 
+        p.integral_initv[INTEGRAL_4] = PFX(integral4v_avx2);
+        p.integral_initv[INTEGRAL_8] = PFX(integral8v_avx2);
+        p.integral_initv[INTEGRAL_12] = PFX(integral12v_avx2);
+        p.integral_initv[INTEGRAL_16] = PFX(integral16v_avx2);
+        p.integral_initv[INTEGRAL_24] = PFX(integral24v_avx2);
+        p.integral_initv[INTEGRAL_32] = PFX(integral32v_avx2);
+        p.integral_inith[INTEGRAL_4] = PFX(integral4h_avx2);
+        p.integral_inith[INTEGRAL_8] = PFX(integral8h_avx2);
+        p.integral_inith[INTEGRAL_12] = PFX(integral12h_avx2);
+        p.integral_inith[INTEGRAL_16] = PFX(integral16h_avx2);
+
         /* TODO: This kernel needs to be modified to work with HIGH_BIT_DEPTH only 
         p.planeClipAndMax = PFX(planeClipAndMax_avx2); */
 
@@ -3695,6 +3707,19 @@
         p.fix8Unpack = PFX(cutree_fix8_unpack_avx2);
         p.fix8Pack = PFX(cutree_fix8_pack_avx2);
 
+        p.integral_initv[INTEGRAL_4] = PFX(integral4v_avx2);
+        p.integral_initv[INTEGRAL_8] = PFX(integral8v_avx2);
+        p.integral_initv[INTEGRAL_12] = PFX(integral12v_avx2);
+        p.integral_initv[INTEGRAL_16] = PFX(integral16v_avx2);
+        p.integral_initv[INTEGRAL_24] = PFX(integral24v_avx2);
+        p.integral_initv[INTEGRAL_32] = PFX(integral32v_avx2);
+        p.integral_inith[INTEGRAL_4] = PFX(integral4h_avx2);
+        p.integral_inith[INTEGRAL_8] = PFX(integral8h_avx2);
+        p.integral_inith[INTEGRAL_12] = PFX(integral12h_avx2);
+        p.integral_inith[INTEGRAL_16] = PFX(integral16h_avx2);
+        p.integral_inith[INTEGRAL_24] = PFX(integral24h_avx2);
+        p.integral_inith[INTEGRAL_32] = PFX(integral32h_avx2);
+
     }
 #endif
 }

 
@@ -114,6 +114,7 @@
 #include "blockcopy8.h"
 #include "intrapred.h"
 #include "dct8.h"
+#include "seaintegral.h"
 }
 
 #define ALL_LUMA_CU_TYPED(prim, fncdef, fname, cpu) \
@@ -2157,6 +2158,17 @@
         p.fix8Unpack = PFX(cutree_fix8_unpack_avx2);
         p.fix8Pack = PFX(cutree_fix8_pack_avx2);
 
+        p.integral_initv[INTEGRAL_4] = PFX(integral4v_avx2);
+        p.integral_initv[INTEGRAL_8] = PFX(integral8v_avx2);
+        p.integral_initv[INTEGRAL_12] = PFX(integral12v_avx2);
+        p.integral_initv[INTEGRAL_16] = PFX(integral16v_avx2);
+        p.integral_initv[INTEGRAL_24] = PFX(integral24v_avx2);
+        p.integral_initv[INTEGRAL_32] = PFX(integral32v_avx2);
+        p.integral_inith[INTEGRAL_4] = PFX(integral4h_avx2);
+        p.integral_inith[INTEGRAL_8] = PFX(integral8h_avx2);
+        p.integral_inith[INTEGRAL_12] = PFX(integral12h_avx2);
+        p.integral_inith[INTEGRAL_16] = PFX(integral16h_avx2);
+
         /* TODO: This kernel needs to be modified to work with HIGH_BIT_DEPTH only 
         p.planeClipAndMax = PFX(planeClipAndMax_avx2); */
 
@@ -3695,6 +3707,19 @@
         p.fix8Unpack = PFX(cutree_fix8_unpack_avx2);
         p.fix8Pack = PFX(cutree_fix8_pack_avx2);
 
+        p.integral_initv[INTEGRAL_4] = PFX(integral4v_avx2);
+        p.integral_initv[INTEGRAL_8] = PFX(integral8v_avx2);
+        p.integral_initv[INTEGRAL_12] = PFX(integral12v_avx2);
+        p.integral_initv[INTEGRAL_16] = PFX(integral16v_avx2);
+        p.integral_initv[INTEGRAL_24] = PFX(integral24v_avx2);
+        p.integral_initv[INTEGRAL_32] = PFX(integral32v_avx2);
+        p.integral_inith[INTEGRAL_4] = PFX(integral4h_avx2);
+        p.integral_inith[INTEGRAL_8] = PFX(integral8h_avx2);
+        p.integral_inith[INTEGRAL_12] = PFX(integral12h_avx2);
+        p.integral_inith[INTEGRAL_16] = PFX(integral16h_avx2);
+        p.integral_inith[INTEGRAL_24] = PFX(integral24h_avx2);
+        p.integral_inith[INTEGRAL_32] = PFX(integral32h_avx2);
+
     }
 #endif
 }
​

x265_2.4.tar.gz/source/common/x86/loopfilter.asm -> x265_2.5.tar.gz/source/common/x86/loopfilter.asm Changed

 
@@ -1583,7 +1583,7 @@
     pshufb      m1, m4, m0
     pcmpgtb     m0, [pb_15]         ; m0 = [mask]
 
-    pblendvb    m6, m6, m1, m0      ; NOTE: don't use 3 parameters style, x264 macro have some bug!
+    pblendvb    m6, m1, m0
 
     pmovsxbw    m0, m6              ; offset
     punpckhbw   m6, m6
@@ -1630,7 +1630,7 @@
     pshufb      m6, m3, m1
     pshufb      m5, m4, m1
 
-    pblendvb    m6, m6, m5, m0    ; NOTE: don't use 3 parameters style, x264 macro have some bug!
+    pblendvb    m6, m5, m0
 
     pmovzxbw    m1, m2            ; rec
     punpckhbw   m2, m7
@@ -1904,7 +1904,7 @@
     sub         r3,     r4
     movu        xmm0,   [r3]
     movu        m3,     [r0]
-    pblendvb    m5,     m5,     m3,     xmm0
+    pblendvb    m5,     m3,     xmm0
     movu        [r0],   m5
 
 .end:
​

x265_2.4.tar.gz/source/common/x86/pixel-a.asm -> x265_2.5.tar.gz/source/common/x86/pixel-a.asm Changed

 
@@ -227,7 +227,7 @@
 ; clobber: m3..m7
 ; out: %1 = satd
 %macro SATD_4x4_MMX 3
-    %xdefine %%n n%1
+    %xdefine %%n nn%1
     %assign offset %2*SIZEOF_PIXEL
     LOAD_DIFF m4, m3, none, [r0+     offset], [r2+     offset]
     LOAD_DIFF m5, m3, none, [r0+  r1+offset], [r2+  r3+offset]
​

x265_2.4.tar.gz/source/common/x86/pixel-util8.asm -> x265_2.5.tar.gz/source/common/x86/pixel-util8.asm Changed

 
@@ -1597,7 +1597,7 @@
 
 .widthLess8:
     movu        m6, [r1]
-    pblendvb    m6, m6, m7, m0
+    pblendvb    m6, m7, m0
     movu        [r1], m6
 
 .nextH:
​

x265_2.5.tar.gz/source/common/x86/seaintegral.asm Added

@@ -0,0 +1,1062 @@
+;*****************************************************************************
+;* Copyright (C) 2013-2017 MulticoreWare, Inc
+;*
+;* Authors: Jayashri Murugan <jayashri@multicorewareinc.com>
+;*          Vignesh V Menon <vignesh@multicorewareinc.com>
+;*          Praveen Tiwari <praveen@multicorewareinc.com>
+;*
+;* This program is free software; you can redistribute it and/or modify
+;* it under the terms of the GNU General Public License as published by
+;* the Free Software Foundation; either version 2 of the License, or
+;* (at your option) any later version.
+;*
+;* This program is distributed in the hope that it will be useful,
+;* but WITHOUT ANY WARRANTY; without even the implied warranty of
+;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;* GNU General Public License for more details.
+;*
+;* You should have received a copy of the GNU General Public License
+;* along with this program; if not, write to the Free Software
+;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02111, USA.
+;*
+;* This program is also available under a commercial proprietary license.
+;* For more information, contact us at license @ x265.com.
+;*****************************************************************************/
+
+%include "x86inc.asm"
+%include "x86util.asm"
+
+SECTION .text 
+
+;-----------------------------------------------------------------------------
+;void integral_init4v_c(uint32_t *sum4, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral4v, 2, 3, 2
+    mov r2, r1
+    shl r2, 4
+
+.loop
+    movu    m0, [r0]
+    movu    m1, [r0 + r2]
+    psubd   m1, m0
+    movu    [r0], m1
+    add     r0, 32
+    sub     r1, 8
+    jnz     .loop
+    RET
+
+;-----------------------------------------------------------------------------
+;void integral_init8v_c(uint32_t *sum8, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral8v, 2, 3, 2
+    mov r2, r1
+    shl r2, 5
+
+.loop
+    movu    m0, [r0]
+    movu    m1, [r0 + r2]
+    psubd   m1, m0
+    movu    [r0], m1
+    add     r0, 32
+    sub     r1, 8
+    jnz     .loop
+    RET
+
+;-----------------------------------------------------------------------------
+;void integral_init12v_c(uint32_t *sum12, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral12v, 2, 4, 2
+    mov r2, r1
+    mov r3, r1
+    shl r2, 5
+    shl r3, 4
+    add r2, r3
+
+.loop
+    movu    m0, [r0]
+    movu    m1, [r0 + r2]
+    psubd   m1, m0
+    movu    [r0], m1
+    add     r0, 32
+    sub     r1, 8
+    jnz     .loop
+    RET
+
+;-----------------------------------------------------------------------------
+;void integral_init16v_c(uint32_t *sum16, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral16v, 2, 3, 2
+    mov r2, r1
+    shl r2, 6
+
+.loop
+    movu    m0, [r0]
+    movu    m1, [r0 + r2]
+    psubd   m1, m0
+    movu    [r0], m1
+    add     r0, 32
+    sub     r1, 8
+    jnz     .loop
+    RET
+
+;-----------------------------------------------------------------------------
+;void integral_init24v_c(uint32_t *sum24, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral24v, 2, 4, 2
+    mov r2, r1
+    mov r3, r1
+    shl r2, 6
+    shl r3, 5
+    add r2, r3
+
+.loop
+    movu    m0, [r0]
+    movu    m1, [r0 + r2]
+    psubd   m1, m0
+    movu    [r0], m1
+    add     r0, 32
+    sub     r1, 8
+    jnz     .loop
+    RET
+
+;-----------------------------------------------------------------------------
+;void integral_init32v_c(uint32_t *sum32, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral32v, 2, 3, 2
+    mov r2, r1
+    shl r2, 7
+
+.loop
+    movu    m0, [r0]
+    movu    m1, [r0 + r2]
+    psubd   m1, m0
+    movu    [r0], m1
+    add     r0, 32
+    sub     r1, 8
+    jnz     .loop
+    RET
+
+%macro INTEGRAL_FOUR_HORIZONTAL_16 0
+    pmovzxbw       m0, [r1]
+    pmovzxbw       m1, [r1 + 1]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 2]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 3]
+    paddw          m0, m1
+%endmacro
+
+%macro INTEGRAL_FOUR_HORIZONTAL_4 0
+    movd       xm0, [r1]
+    movd       xm1, [r1 + 1]
+    pmovzxbw   xm0, xm0
+    pmovzxbw   xm1, xm1
+    paddw      xm0, xm1
+    movd       xm1, [r1 + 2]
+    pmovzxbw   xm1, xm1
+    paddw      xm0, xm1
+    movd       xm1, [r1 + 3]
+    pmovzxbw   xm1, xm1
+    paddw      xm0, xm1
+%endmacro
+
+%macro INTEGRAL_FOUR_HORIZONTAL_8_HBD 0
+    pmovzxwd       m0, [r1]
+    pmovzxwd       m1, [r1 + 2]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 4]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 6]
+    paddd          m0, m1
+%endmacro
+
+%macro INTEGRAL_FOUR_HORIZONTAL_4_HBD 0
+    pmovzxwd       xm0, [r1]
+    pmovzxwd       xm1, [r1 + 2]
+    paddd          xm0, xm1
+    pmovzxwd       xm1, [r1 + 4]
+    paddd          xm0, xm1
+    pmovzxwd       xm1, [r1 + 6]
+    paddd          xm0, xm1
+%endmacro
+
+;-----------------------------------------------------------------------------
+;static void integral_init4h(uint32_t *sum, pixel *pix, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+%if HIGH_BIT_DEPTH
+cglobal integral4h, 3, 5, 3
+    lea            r3, [4 * r2]
+    sub            r0, r3
+    sub            r2, 4                      ;stride - 4
+    mov            r4, r2
+    shr            r4, 3

 
@@ -0,0 +1,1062 @@
+;*****************************************************************************
+;* Copyright (C) 2013-2017 MulticoreWare, Inc
+;*
+;* Authors: Jayashri Murugan <jayashri@multicorewareinc.com>
+;*          Vignesh V Menon <vignesh@multicorewareinc.com>
+;*          Praveen Tiwari <praveen@multicorewareinc.com>
+;*
+;* This program is free software; you can redistribute it and/or modify
+;* it under the terms of the GNU General Public License as published by
+;* the Free Software Foundation; either version 2 of the License, or
+;* (at your option) any later version.
+;*
+;* This program is distributed in the hope that it will be useful,
+;* but WITHOUT ANY WARRANTY; without even the implied warranty of
+;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;* GNU General Public License for more details.
+;*
+;* You should have received a copy of the GNU General Public License
+;* along with this program; if not, write to the Free Software
+;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02111, USA.
+;*
+;* This program is also available under a commercial proprietary license.
+;* For more information, contact us at license @ x265.com.
+;*****************************************************************************/
+
+%include "x86inc.asm"
+%include "x86util.asm"
+
+SECTION .text 
+
+;-----------------------------------------------------------------------------
+;void integral_init4v_c(uint32_t *sum4, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral4v, 2, 3, 2
+    mov r2, r1
+    shl r2, 4
+
+.loop
+    movu    m0, [r0]
+    movu    m1, [r0 + r2]
+    psubd   m1, m0
+    movu    [r0], m1
+    add     r0, 32
+    sub     r1, 8
+    jnz     .loop
+    RET
+
+;-----------------------------------------------------------------------------
+;void integral_init8v_c(uint32_t *sum8, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral8v, 2, 3, 2
+    mov r2, r1
+    shl r2, 5
+
+.loop
+    movu    m0, [r0]
+    movu    m1, [r0 + r2]
+    psubd   m1, m0
+    movu    [r0], m1
+    add     r0, 32
+    sub     r1, 8
+    jnz     .loop
+    RET
+
+;-----------------------------------------------------------------------------
+;void integral_init12v_c(uint32_t *sum12, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral12v, 2, 4, 2
+    mov r2, r1
+    mov r3, r1
+    shl r2, 5
+    shl r3, 4
+    add r2, r3
+
+.loop
+    movu    m0, [r0]
+    movu    m1, [r0 + r2]
+    psubd   m1, m0
+    movu    [r0], m1
+    add     r0, 32
+    sub     r1, 8
+    jnz     .loop
+    RET
+
+;-----------------------------------------------------------------------------
+;void integral_init16v_c(uint32_t *sum16, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral16v, 2, 3, 2
+    mov r2, r1
+    shl r2, 6
+
+.loop
+    movu    m0, [r0]
+    movu    m1, [r0 + r2]
+    psubd   m1, m0
+    movu    [r0], m1
+    add     r0, 32
+    sub     r1, 8
+    jnz     .loop
+    RET
+
+;-----------------------------------------------------------------------------
+;void integral_init24v_c(uint32_t *sum24, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral24v, 2, 4, 2
+    mov r2, r1
+    mov r3, r1
+    shl r2, 6
+    shl r3, 5
+    add r2, r3
+
+.loop
+    movu    m0, [r0]
+    movu    m1, [r0 + r2]
+    psubd   m1, m0
+    movu    [r0], m1
+    add     r0, 32
+    sub     r1, 8
+    jnz     .loop
+    RET
+
+;-----------------------------------------------------------------------------
+;void integral_init32v_c(uint32_t *sum32, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral32v, 2, 3, 2
+    mov r2, r1
+    shl r2, 7
+
+.loop
+    movu    m0, [r0]
+    movu    m1, [r0 + r2]
+    psubd   m1, m0
+    movu    [r0], m1
+    add     r0, 32
+    sub     r1, 8
+    jnz     .loop
+    RET
+
+%macro INTEGRAL_FOUR_HORIZONTAL_16 0
+    pmovzxbw       m0, [r1]
+    pmovzxbw       m1, [r1 + 1]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 2]
+    paddw          m0, m1
+    pmovzxbw       m1, [r1 + 3]
+    paddw          m0, m1
+%endmacro
+
+%macro INTEGRAL_FOUR_HORIZONTAL_4 0
+    movd       xm0, [r1]
+    movd       xm1, [r1 + 1]
+    pmovzxbw   xm0, xm0
+    pmovzxbw   xm1, xm1
+    paddw      xm0, xm1
+    movd       xm1, [r1 + 2]
+    pmovzxbw   xm1, xm1
+    paddw      xm0, xm1
+    movd       xm1, [r1 + 3]
+    pmovzxbw   xm1, xm1
+    paddw      xm0, xm1
+%endmacro
+
+%macro INTEGRAL_FOUR_HORIZONTAL_8_HBD 0
+    pmovzxwd       m0, [r1]
+    pmovzxwd       m1, [r1 + 2]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 4]
+    paddd          m0, m1
+    pmovzxwd       m1, [r1 + 6]
+    paddd          m0, m1
+%endmacro
+
+%macro INTEGRAL_FOUR_HORIZONTAL_4_HBD 0
+    pmovzxwd       xm0, [r1]
+    pmovzxwd       xm1, [r1 + 2]
+    paddd          xm0, xm1
+    pmovzxwd       xm1, [r1 + 4]
+    paddd          xm0, xm1
+    pmovzxwd       xm1, [r1 + 6]
+    paddd          xm0, xm1
+%endmacro
+
+;-----------------------------------------------------------------------------
+;static void integral_init4h(uint32_t *sum, pixel *pix, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+%if HIGH_BIT_DEPTH
+cglobal integral4h, 3, 5, 3
+    lea            r3, [4 * r2]
+    sub            r0, r3
+    sub            r2, 4                      ;stride - 4
+    mov            r4, r2
+    shr            r4, 3
​

x265_2.5.tar.gz/source/common/x86/seaintegral.h Added

@@ -0,0 +1,42 @@
+/*****************************************************************************
+* Copyright (C) 2013-2017 MulticoreWare, Inc
+*
+* Authors: Vignesh V Menon <vignesh@multicorewareinc.com>
+*          Jayashri Murugan <jayashri@multicorewareinc.com>
+*          Praveen Tiwari <praveen@multicorewareinc.com>
+*
+* This program is free software; you can redistribute it and/or modify
+* it under the terms of the GNU General Public License as published by
+* the Free Software Foundation; either version 2 of the License, or
+* (at your option) any later version.
+*
+* This program is distributed in the hope that it will be useful,
+* but WITHOUT ANY WARRANTY; without even the implied warranty of
+* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+* GNU General Public License for more details.
+*
+* You should have received a copy of the GNU General Public License
+* along with this program; if not, write to the Free Software
+* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02111, USA.
+*
+* This program is also available under a commercial proprietary license.
+* For more information, contact us at license @ x265.com.
+*****************************************************************************/
+
+#ifndef X265_SEAINTEGRAL_H
+#define X265_SEAINTEGRAL_H
+
+void PFX(integral4v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral8v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral12v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral16v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral24v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral32v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral4h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+void PFX(integral8h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+void PFX(integral12h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+void PFX(integral16h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+void PFX(integral24h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+void PFX(integral32h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+
+#endif //X265_SEAINTEGRAL_H

 
@@ -0,0 +1,42 @@
+/*****************************************************************************
+* Copyright (C) 2013-2017 MulticoreWare, Inc
+*
+* Authors: Vignesh V Menon <vignesh@multicorewareinc.com>
+*          Jayashri Murugan <jayashri@multicorewareinc.com>
+*          Praveen Tiwari <praveen@multicorewareinc.com>
+*
+* This program is free software; you can redistribute it and/or modify
+* it under the terms of the GNU General Public License as published by
+* the Free Software Foundation; either version 2 of the License, or
+* (at your option) any later version.
+*
+* This program is distributed in the hope that it will be useful,
+* but WITHOUT ANY WARRANTY; without even the implied warranty of
+* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+* GNU General Public License for more details.
+*
+* You should have received a copy of the GNU General Public License
+* along with this program; if not, write to the Free Software
+* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02111, USA.
+*
+* This program is also available under a commercial proprietary license.
+* For more information, contact us at license @ x265.com.
+*****************************************************************************/
+
+#ifndef X265_SEAINTEGRAL_H
+#define X265_SEAINTEGRAL_H
+
+void PFX(integral4v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral8v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral12v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral16v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral24v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral32v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral4h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+void PFX(integral8h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+void PFX(integral12h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+void PFX(integral16h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+void PFX(integral24h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+void PFX(integral32h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+
+#endif //X265_SEAINTEGRAL_H
​

x265_2.4.tar.gz/source/common/x86/x86inc.asm -> x265_2.5.tar.gz/source/common/x86/x86inc.asm Changed

@@ -76,10 +76,6 @@
     SECTION .rodata align=%1
 %endmacro
 
-%macro SECTION_TEXT 0-1 16
-    SECTION .text align=%1
-%endmacro
-
 %if WIN64
     %define PIC
 %elif ARCH_X86_64 == 0
@@ -139,6 +135,7 @@
     %define r%1w %2w
     %define r%1b %2b
     %define r%1h %2h
+    %define %2q %2
     %if %0 == 2
         %define r%1m  %2d
         %define r%1mp %2
@@ -163,9 +160,9 @@
     %define e%1h %3
     %define r%1b %2
     %define e%1b %2
-%if ARCH_X86_64 == 0
-    %define r%1  e%1
-%endif
+    %if ARCH_X86_64 == 0
+        %define r%1 e%1
+    %endif
 %endmacro
 
 DECLARE_REG_SIZE ax, al, ah
@@ -275,7 +272,7 @@
 
 %macro ASSERT 1
     %if (%1) == 0
-        %error assert failed
+        %error assertion ``%1'' failed
     %endif
 %endmacro
 
@@ -365,9 +362,19 @@
     %ifnum %1
         %if %1 != 0 && required_stack_alignment > STACK_ALIGNMENT
             %if %1 > 0
+                ; Reserve an additional register for storing the original stack pointer, but avoid using
+                ; eax/rax for this purpose since it can potentially get overwritten as a return value.
                 %assign regs_used (regs_used + 1)
-            %elif ARCH_X86_64 && regs_used == num_args && num_args <= 4 + UNIX64 * 2
-                %warning "Stack pointer will overwrite register argument"
+                %if ARCH_X86_64 && regs_used == 7
+                    %assign regs_used 8
+                %elif ARCH_X86_64 == 0 && regs_used == 1
+                    %assign regs_used 2
+                %endif
+            %endif
+            %if ARCH_X86_64 && regs_used < 5 + UNIX64 * 3
+                ; Ensure that we don't clobber any registers containing arguments. For UNIX64 we also preserve r6 (rax)
+                ; since it's used as a hidden argument in vararg functions to specify the number of vector registers used.
+                %assign regs_used 5 + UNIX64 * 3
             %endif
         %endif
     %endif
@@ -396,10 +403,10 @@
 DECLARE_REG 8,  rsi, 72
 DECLARE_REG 9,  rbx, 80
 DECLARE_REG 10, rbp, 88
-DECLARE_REG 11, R12, 96
-DECLARE_REG 12, R13, 104
-DECLARE_REG 13, R14, 112
-DECLARE_REG 14, R15, 120
+DECLARE_REG 11, R14, 96
+DECLARE_REG 12, R15, 104
+DECLARE_REG 13, R12, 112
+DECLARE_REG 14, R13, 120
 
 %macro PROLOGUE 2-5+ 0 ; #args, #regs, #xmm_regs, [stack_size,] arg_names...
     %assign num_args %1
@@ -445,45 +452,46 @@
     WIN64_PUSH_XMM
 %endmacro
 
-%macro WIN64_RESTORE_XMM_INTERNAL 1
+%macro WIN64_RESTORE_XMM_INTERNAL 0
     %assign %%pad_size 0
     %if xmm_regs_used > 8
         %assign %%i xmm_regs_used
         %rep xmm_regs_used-8
             %assign %%i %%i-1
-            movaps xmm %+ %%i, [%1 + (%%i-8)*16 + stack_size + 32]
+            movaps xmm %+ %%i, [rsp + (%%i-8)*16 + stack_size + 32]
         %endrep
     %endif
     %if stack_size_padded > 0
         %if stack_size > 0 && required_stack_alignment > STACK_ALIGNMENT
             mov rsp, rstkm
         %else
-            add %1, stack_size_padded
+            add rsp, stack_size_padded
             %assign %%pad_size stack_size_padded
         %endif
     %endif
     %if xmm_regs_used > 7
-        movaps xmm7, [%1 + stack_offset - %%pad_size + 24]
+        movaps xmm7, [rsp + stack_offset - %%pad_size + 24]
     %endif
     %if xmm_regs_used > 6
-        movaps xmm6, [%1 + stack_offset - %%pad_size +  8]
+        movaps xmm6, [rsp + stack_offset - %%pad_size +  8]
     %endif
 %endmacro
 
-%macro WIN64_RESTORE_XMM 1
-    WIN64_RESTORE_XMM_INTERNAL %1
+%macro WIN64_RESTORE_XMM 0
+    WIN64_RESTORE_XMM_INTERNAL
     %assign stack_offset (stack_offset-stack_size_padded)
+    %assign stack_size_padded 0
     %assign xmm_regs_used 0
 %endmacro
 
 %define has_epilogue regs_used > 7 || xmm_regs_used > 6 || mmsize == 32 || stack_size > 0
 
 %macro RET 0
-    WIN64_RESTORE_XMM_INTERNAL rsp
+    WIN64_RESTORE_XMM_INTERNAL
     POP_IF_USED 14, 13, 12, 11, 10, 9, 8, 7
-%if mmsize == 32
-    vzeroupper
-%endif
+    %if mmsize == 32
+        vzeroupper
+    %endif
     AUTO_REP_RET
 %endmacro
 
@@ -500,10 +508,10 @@
 DECLARE_REG 8,  R11, 24
 DECLARE_REG 9,  rbx, 32
 DECLARE_REG 10, rbp, 40
-DECLARE_REG 11, R12, 48
-DECLARE_REG 12, R13, 56
-DECLARE_REG 13, R14, 64
-DECLARE_REG 14, R15, 72
+DECLARE_REG 11, R14, 48
+DECLARE_REG 12, R15, 56
+DECLARE_REG 13, R12, 64
+DECLARE_REG 14, R13, 72
 
 %macro PROLOGUE 2-5+ ; #args, #regs, #xmm_regs, [stack_size,] arg_names...
     %assign num_args %1
@@ -520,17 +528,17 @@
 %define has_epilogue regs_used > 9 || mmsize == 32 || stack_size > 0
 
 %macro RET 0
-%if stack_size_padded > 0
-%if required_stack_alignment > STACK_ALIGNMENT
-    mov rsp, rstkm
-%else
-    add rsp, stack_size_padded
-%endif
-%endif
+    %if stack_size_padded > 0
+        %if required_stack_alignment > STACK_ALIGNMENT
+            mov rsp, rstkm
+        %else
+            add rsp, stack_size_padded
+        %endif
+    %endif
     POP_IF_USED 14, 13, 12, 11, 10, 9
-%if mmsize == 32
-    vzeroupper
-%endif
+    %if mmsize == 32
+        vzeroupper
+    %endif
     AUTO_REP_RET
 %endmacro
 
@@ -576,29 +584,29 @@
 %define has_epilogue regs_used > 3 || mmsize == 32 || stack_size > 0
 
 %macro RET 0
-%if stack_size_padded > 0
-%if required_stack_alignment > STACK_ALIGNMENT
-    mov rsp, rstkm
-%else
-    add rsp, stack_size_padded
-%endif
-%endif
+    %if stack_size_padded > 0
+        %if required_stack_alignment > STACK_ALIGNMENT
+            mov rsp, rstkm
+        %else
+            add rsp, stack_size_padded
+        %endif
+    %endif
     POP_IF_USED 6, 5, 4, 3
-%if mmsize == 32
-    vzeroupper

 
@@ -76,10 +76,6 @@
     SECTION .rodata align=%1
 %endmacro
 
-%macro SECTION_TEXT 0-1 16
-    SECTION .text align=%1
-%endmacro
-
 %if WIN64
     %define PIC
 %elif ARCH_X86_64 == 0
@@ -139,6 +135,7 @@
     %define r%1w %2w
     %define r%1b %2b
     %define r%1h %2h
+    %define %2q %2
     %if %0 == 2
         %define r%1m  %2d
         %define r%1mp %2
@@ -163,9 +160,9 @@
     %define e%1h %3
     %define r%1b %2
     %define e%1b %2
-%if ARCH_X86_64 == 0
-    %define r%1  e%1
-%endif
+    %if ARCH_X86_64 == 0
+        %define r%1 e%1
+    %endif
 %endmacro
 
 DECLARE_REG_SIZE ax, al, ah
@@ -275,7 +272,7 @@
 
 %macro ASSERT 1
     %if (%1) == 0
-        %error assert failed
+        %error assertion ``%1'' failed
     %endif
 %endmacro
 
@@ -365,9 +362,19 @@
     %ifnum %1
         %if %1 != 0 && required_stack_alignment > STACK_ALIGNMENT
             %if %1 > 0
+                ; Reserve an additional register for storing the original stack pointer, but avoid using
+                ; eax/rax for this purpose since it can potentially get overwritten as a return value.
                 %assign regs_used (regs_used + 1)
-            %elif ARCH_X86_64 && regs_used == num_args && num_args <= 4 + UNIX64 * 2
-                %warning "Stack pointer will overwrite register argument"
+                %if ARCH_X86_64 && regs_used == 7
+                    %assign regs_used 8
+                %elif ARCH_X86_64 == 0 && regs_used == 1
+                    %assign regs_used 2
+                %endif
+            %endif
+            %if ARCH_X86_64 && regs_used < 5 + UNIX64 * 3
+                ; Ensure that we don't clobber any registers containing arguments. For UNIX64 we also preserve r6 (rax)
+                ; since it's used as a hidden argument in vararg functions to specify the number of vector registers used.
+                %assign regs_used 5 + UNIX64 * 3
             %endif
         %endif
     %endif
@@ -396,10 +403,10 @@
 DECLARE_REG 8,  rsi, 72
 DECLARE_REG 9,  rbx, 80
 DECLARE_REG 10, rbp, 88
-DECLARE_REG 11, R12, 96
-DECLARE_REG 12, R13, 104
-DECLARE_REG 13, R14, 112
-DECLARE_REG 14, R15, 120
+DECLARE_REG 11, R14, 96
+DECLARE_REG 12, R15, 104
+DECLARE_REG 13, R12, 112
+DECLARE_REG 14, R13, 120
 
 %macro PROLOGUE 2-5+ 0 ; #args, #regs, #xmm_regs, [stack_size,] arg_names...
     %assign num_args %1
@@ -445,45 +452,46 @@
     WIN64_PUSH_XMM
 %endmacro
 
-%macro WIN64_RESTORE_XMM_INTERNAL 1
+%macro WIN64_RESTORE_XMM_INTERNAL 0
     %assign %%pad_size 0
     %if xmm_regs_used > 8
         %assign %%i xmm_regs_used
         %rep xmm_regs_used-8
             %assign %%i %%i-1
-            movaps xmm %+ %%i, [%1 + (%%i-8)*16 + stack_size + 32]
+            movaps xmm %+ %%i, [rsp + (%%i-8)*16 + stack_size + 32]
         %endrep
     %endif
     %if stack_size_padded > 0
         %if stack_size > 0 && required_stack_alignment > STACK_ALIGNMENT
             mov rsp, rstkm
         %else
-            add %1, stack_size_padded
+            add rsp, stack_size_padded
             %assign %%pad_size stack_size_padded
         %endif
     %endif
     %if xmm_regs_used > 7
-        movaps xmm7, [%1 + stack_offset - %%pad_size + 24]
+        movaps xmm7, [rsp + stack_offset - %%pad_size + 24]
     %endif
     %if xmm_regs_used > 6
-        movaps xmm6, [%1 + stack_offset - %%pad_size +  8]
+        movaps xmm6, [rsp + stack_offset - %%pad_size +  8]
     %endif
 %endmacro
 
-%macro WIN64_RESTORE_XMM 1
-    WIN64_RESTORE_XMM_INTERNAL %1
+%macro WIN64_RESTORE_XMM 0
+    WIN64_RESTORE_XMM_INTERNAL
     %assign stack_offset (stack_offset-stack_size_padded)
+    %assign stack_size_padded 0
     %assign xmm_regs_used 0
 %endmacro
 
 %define has_epilogue regs_used > 7 || xmm_regs_used > 6 || mmsize == 32 || stack_size > 0
 
 %macro RET 0
-    WIN64_RESTORE_XMM_INTERNAL rsp
+    WIN64_RESTORE_XMM_INTERNAL
     POP_IF_USED 14, 13, 12, 11, 10, 9, 8, 7
-%if mmsize == 32
-    vzeroupper
-%endif
+    %if mmsize == 32
+        vzeroupper
+    %endif
     AUTO_REP_RET
 %endmacro
 
@@ -500,10 +508,10 @@
 DECLARE_REG 8,  R11, 24
 DECLARE_REG 9,  rbx, 32
 DECLARE_REG 10, rbp, 40
-DECLARE_REG 11, R12, 48
-DECLARE_REG 12, R13, 56
-DECLARE_REG 13, R14, 64
-DECLARE_REG 14, R15, 72
+DECLARE_REG 11, R14, 48
+DECLARE_REG 12, R15, 56
+DECLARE_REG 13, R12, 64
+DECLARE_REG 14, R13, 72
 
 %macro PROLOGUE 2-5+ ; #args, #regs, #xmm_regs, [stack_size,] arg_names...
     %assign num_args %1
@@ -520,17 +528,17 @@
 %define has_epilogue regs_used > 9 || mmsize == 32 || stack_size > 0
 
 %macro RET 0
-%if stack_size_padded > 0
-%if required_stack_alignment > STACK_ALIGNMENT
-    mov rsp, rstkm
-%else
-    add rsp, stack_size_padded
-%endif
-%endif
+    %if stack_size_padded > 0
+        %if required_stack_alignment > STACK_ALIGNMENT
+            mov rsp, rstkm
+        %else
+            add rsp, stack_size_padded
+        %endif
+    %endif
     POP_IF_USED 14, 13, 12, 11, 10, 9
-%if mmsize == 32
-    vzeroupper
-%endif
+    %if mmsize == 32
+        vzeroupper
+    %endif
     AUTO_REP_RET
 %endmacro
 
@@ -576,29 +584,29 @@
 %define has_epilogue regs_used > 3 || mmsize == 32 || stack_size > 0
 
 %macro RET 0
-%if stack_size_padded > 0
-%if required_stack_alignment > STACK_ALIGNMENT
-    mov rsp, rstkm
-%else
-    add rsp, stack_size_padded
-%endif
-%endif
+    %if stack_size_padded > 0
+        %if required_stack_alignment > STACK_ALIGNMENT
+            mov rsp, rstkm
+        %else
+            add rsp, stack_size_padded
+        %endif
+    %endif
     POP_IF_USED 6, 5, 4, 3
-%if mmsize == 32
-    vzeroupper
​

x265_2.4.tar.gz/source/dynamicHDR10/BasicStructures.h -> x265_2.5.tar.gz/source/dynamicHDR10/BasicStructures.h Changed

 
@@ -35,16 +35,26 @@
     float maxRLuminance = 0.0;
     float maxGLuminance = 0.0;
     float maxBLuminance = 0.0;
-    int order;
+    int order = 0;
     std::vector<unsigned int> percentiles;
 };
 
 struct BezierCurveData
 {
-    int order;
-    int sPx;
-    int sPy;
+    int order = 0;
+    int sPx = 0;
+    int sPy = 0;
     std::vector<int> coeff;
 };
 
+struct PercentileLuminance{
+
+    float averageLuminance = 0.0;
+    float maxRLuminance = 0.0;
+    float maxGLuminance = 0.0;
+    float maxBLuminance = 0.0;
+    int order = 0;
+    std::vector<unsigned int> percentiles;
+};
+
 #endif // BASICSTRUCTURES_H
​

x265_2.4.tar.gz/source/dynamicHDR10/CMakeLists.txt -> x265_2.5.tar.gz/source/dynamicHDR10/CMakeLists.txt Changed

@@ -1,8 +1,8 @@
 # vim: syntax=cmake
-if(ENABLE_DYNAMIC_HDR10)
+if(ENABLE_HDR10_PLUS)
 
 add_library(dynamicHDR10 OBJECT 
-    BasicStructures.cpp BasicStructures.h
+    BasicStructures.h
     json11/json11.cpp json11/json11.h
     JsonHelper.cpp JsonHelper.h
     metadataFromJson.cpp metadataFromJson.h
@@ -10,7 +10,6 @@
     hdr10plus.h
     api.cpp )
 
-else()
 cmake_minimum_required (VERSION 2.8.11)
 project(dynamicHDR10)
 include(CheckIncludeFiles)
@@ -150,26 +149,5 @@
     
 option(ENABLE_SHARED "Build shared library" OFF)
 
-if(ENABLE_SHARED)
-    add_library(dynamicHDR10 SHARED
-        json11/json11.cpp json11/json11.h
-        BasicStructures.cpp BasicStructures.h
-        JsonHelper.cpp JsonHelper.h
-        metadataFromJson.cpp metadataFromJson.h
-        SeiMetadataDictionary.cpp SeiMetadataDictionary.h
-        hdr10plus.h api.cpp )
-else()
-    add_library(dynamicHDR10 STATIC
-    json11/json11.cpp json11/json11.h
-    BasicStructures.cpp BasicStructures.h
-    JsonHelper.cpp JsonHelper.h
-    metadataFromJson.cpp metadataFromJson.h
-    SeiMetadataDictionary.cpp SeiMetadataDictionary.h
-    hdr10plus.h api.cpp )
-endif()
-
-install (TARGETS dynamicHDR10
-    LIBRARY DESTINATION ${LIB_INSTALL_DIR}
-    ARCHIVE DESTINATION ${LIB_INSTALL_DIR})
 install(FILES hdr10plus.h DESTINATION include)
 endif()
\ No newline at end of file

 
@@ -1,8 +1,8 @@
 # vim: syntax=cmake
-if(ENABLE_DYNAMIC_HDR10)
+if(ENABLE_HDR10_PLUS)
 
 add_library(dynamicHDR10 OBJECT 
-    BasicStructures.cpp BasicStructures.h
+    BasicStructures.h
     json11/json11.cpp json11/json11.h
     JsonHelper.cpp JsonHelper.h
     metadataFromJson.cpp metadataFromJson.h
@@ -10,7 +10,6 @@
     hdr10plus.h
     api.cpp )
 
-else()
 cmake_minimum_required (VERSION 2.8.11)
 project(dynamicHDR10)
 include(CheckIncludeFiles)
@@ -150,26 +149,5 @@
     
 option(ENABLE_SHARED "Build shared library" OFF)
 
-if(ENABLE_SHARED)
-    add_library(dynamicHDR10 SHARED
-        json11/json11.cpp json11/json11.h
-        BasicStructures.cpp BasicStructures.h
-        JsonHelper.cpp JsonHelper.h
-        metadataFromJson.cpp metadataFromJson.h
-        SeiMetadataDictionary.cpp SeiMetadataDictionary.h
-        hdr10plus.h api.cpp )
-else()
-    add_library(dynamicHDR10 STATIC
-    json11/json11.cpp json11/json11.h
-    BasicStructures.cpp BasicStructures.h
-    JsonHelper.cpp JsonHelper.h
-    metadataFromJson.cpp metadataFromJson.h
-    SeiMetadataDictionary.cpp SeiMetadataDictionary.h
-    hdr10plus.h api.cpp )
-endif()
-
-install (TARGETS dynamicHDR10
-    LIBRARY DESTINATION ${LIB_INSTALL_DIR}
-    ARCHIVE DESTINATION ${LIB_INSTALL_DIR})
 install(FILES hdr10plus.h DESTINATION include)
 endif()
\ No newline at end of file
​

x265_2.4.tar.gz/source/dynamicHDR10/json11/json11.cpp -> x265_2.5.tar.gz/source/dynamicHDR10/json11/json11.cpp Changed

@@ -26,6 +26,12 @@
 #include <cstdio>
 #include <limits>
 
+#if _MSC_VER
+#pragma warning(disable: 4510) //const member cannot be default initialized
+#pragma warning(disable: 4512) //assignment operator could not be generated
+#pragma warning(disable: 4610) //const member cannot be default initialized
+#endif
+
 namespace json11 {
 
 static const int max_depth = 200;
@@ -435,7 +441,7 @@
     char get_next_token() {
         consume_garbage();
         if (i == str.size())
-            return fail("unexpected end of input", 0);
+            return fail("unexpected end of input", '0');
 
         return str[i++];
     }
@@ -472,7 +478,7 @@
     string parse_string() {
         string out;
         long last_escaped_codepoint = -1;
-        while (true) {
+        for (;;) {
             if (i == str.size())
                 return fail("unexpected end of input in string", "");
 
@@ -665,7 +671,7 @@
             if (ch == '}')
                 return data;
 
-            while (1) {
+            for (;;) {
                 if (ch != '"')
                     return fail("expected '\"' in object, got " + esc(ch));
 
@@ -698,7 +704,7 @@
             if (ch == ']')
                 return data;
 
-            while (1) {
+            for (;;) {
                 i--;
                 data.push_back(parse_json(depth + 1));
                 if (failed)

 
@@ -26,6 +26,12 @@
 #include <cstdio>
 #include <limits>
 
+#if _MSC_VER
+#pragma warning(disable: 4510) //const member cannot be default initialized
+#pragma warning(disable: 4512) //assignment operator could not be generated
+#pragma warning(disable: 4610) //const member cannot be default initialized
+#endif
+
 namespace json11 {
 
 static const int max_depth = 200;
@@ -435,7 +441,7 @@
     char get_next_token() {
         consume_garbage();
         if (i == str.size())
-            return fail("unexpected end of input", 0);
+            return fail("unexpected end of input", '0');
 
         return str[i++];
     }
@@ -472,7 +478,7 @@
     string parse_string() {
         string out;
         long last_escaped_codepoint = -1;
-        while (true) {
+        for (;;) {
             if (i == str.size())
                 return fail("unexpected end of input in string", "");
 
@@ -665,7 +671,7 @@
             if (ch == '}')
                 return data;
 
-            while (1) {
+            for (;;) {
                 if (ch != '"')
                     return fail("expected '\"' in object, got " + esc(ch));
 
@@ -698,7 +704,7 @@
             if (ch == ']')
                 return data;
 
-            while (1) {
+            for (;;) {
                 i--;
                 data.push_back(parse_json(depth + 1));
                 if (failed)
​

x265_2.4.tar.gz/source/dynamicHDR10/metadataFromJson.cpp -> x265_2.5.tar.gz/source/dynamicHDR10/metadataFromJson.cpp Changed

 
@@ -168,7 +168,7 @@
     {
         int payloadBytes = 1;
 
-        for(;payload > 0xFF; payload -= 0xFF, ++payloadBytes);
+        for(;payload >= 0xFF; payload -= 0xFF, ++payloadBytes);
 
         if(payloadBytes > 1)
         {
​

x265_2.4.tar.gz/source/encoder/CMakeLists.txt -> x265_2.5.tar.gz/source/encoder/CMakeLists.txt Changed

 
@@ -43,4 +43,5 @@
     reference.cpp reference.h
     encoder.cpp encoder.h
     api.cpp
-    weightPrediction.cpp)
+    weightPrediction.cpp
+    ../x265-extras.cpp ../x265-extras.h)
​

x265_2.4.tar.gz/source/encoder/analysis.cpp -> x265_2.5.tar.gz/source/encoder/analysis.cpp Changed

@@ -75,6 +75,7 @@
     m_reuseInterDataCTU = NULL;
     m_reuseRef = NULL;
     m_bHD = false;
+    m_evaluateInter = 0;
 }
 
 bool Analysis::create(ThreadLocalData *tld)
@@ -89,19 +90,19 @@
     cacheCost = X265_MALLOC(uint64_t, costArrSize);
 
     int csp = m_param->internalCsp;
-    uint32_t cuSize = g_maxCUSize;
+    uint32_t cuSize = m_param->maxCUSize;
 
     bool ok = true;
-    for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++, cuSize >>= 1)
+    for (uint32_t depth = 0; depth <= m_param->maxCUDepth; depth++, cuSize >>= 1)
     {
         ModeDepth &md = m_modeDepth[depth];
 
-        md.cuMemPool.create(depth, csp, MAX_PRED_TYPES);
+        md.cuMemPool.create(depth, csp, MAX_PRED_TYPES, *m_param);
         ok &= md.fencYuv.create(cuSize, csp);
 
         for (int j = 0; j < MAX_PRED_TYPES; j++)
         {
-            md.pred[j].cu.initialize(md.cuMemPool, depth, csp, j);
+            md.pred[j].cu.initialize(md.cuMemPool, depth, *m_param, j);
             ok &= md.pred[j].predYuv.create(cuSize, csp);
             ok &= md.pred[j].reconYuv.create(cuSize, csp);
             md.pred[j].fencYuv = &md.fencYuv;
@@ -115,7 +116,7 @@
 
 void Analysis::destroy()
 {
-    for (uint32_t i = 0; i <= g_maxCUDepth; i++)
+    for (uint32_t i = 0; i <= m_param->maxCUDepth; i++)
     {
         m_modeDepth[i].cuMemPool.destroy();
         m_modeDepth[i].fencYuv.destroy();
@@ -150,6 +151,41 @@
         calculateNormFactor(ctu, qp);
 
     uint32_t numPartition = ctu.m_numPartitions;
+    if (m_param->bCTUInfo && (*m_frame->m_ctuInfo + ctu.m_cuAddr))
+    {
+        x265_ctu_info_t* ctuTemp = *m_frame->m_ctuInfo + ctu.m_cuAddr;
+        if (ctuTemp->ctuPartitions)
+        {
+            int32_t depthIdx = 0;
+            uint32_t maxNum8x8Partitions = 64;
+            uint8_t* depthInfoPtr = m_frame->m_addOnDepth[ctu.m_cuAddr];
+            uint8_t* contentInfoPtr = m_frame->m_addOnCtuInfo[ctu.m_cuAddr];
+            int* prevCtuInfoChangePtr = m_frame->m_addOnPrevChange[ctu.m_cuAddr];
+            do
+            {
+                uint8_t depth = (uint8_t)ctuTemp->ctuPartitions[depthIdx];
+                uint8_t content = (uint8_t)(*((int32_t *)ctuTemp->ctuInfo + depthIdx));
+                int prevCtuInfoChange = m_frame->m_prevCtuInfoChange[ctu.m_cuAddr * maxNum8x8Partitions + depthIdx];
+                memset(depthInfoPtr, depth, sizeof(uint8_t) * numPartition >> 2 * depth);
+                memset(contentInfoPtr, content, sizeof(uint8_t) * numPartition >> 2 * depth);
+                memset(prevCtuInfoChangePtr, 0, sizeof(int) * numPartition >> 2 * depth);
+                for (uint32_t l = 0; l < numPartition >> 2 * depth; l++)
+                    prevCtuInfoChangePtr[l] = prevCtuInfoChange;
+                depthInfoPtr += ctu.m_numPartitions >> 2 * depth;
+                contentInfoPtr += ctu.m_numPartitions >> 2 * depth;
+                prevCtuInfoChangePtr += ctu.m_numPartitions >> 2 * depth;
+                depthIdx++;
+            } while (ctuTemp->ctuPartitions[depthIdx] != 0);
+
+            m_additionalCtuInfo = m_frame->m_addOnCtuInfo[ctu.m_cuAddr];
+            m_prevCtuInfoChange = m_frame->m_addOnPrevChange[ctu.m_cuAddr];
+            memcpy(ctu.m_cuDepth, m_frame->m_addOnDepth[ctu.m_cuAddr], sizeof(uint8_t) * numPartition);
+            //Calculate log2CUSize from depth
+            for (uint32_t i = 0; i < cuGeom.numPartitions; i++)
+                ctu.m_log2CUSize[i] = (uint8_t)m_param->maxLog2CUSize - ctu.m_cuDepth[i];
+        }
+    }
+
     if (m_param->analysisMultiPassRefine && m_param->rc.bStatRead)
     {
         m_multipassAnalysis = (analysis2PassFrameData*)m_frame->m_analysis2Pass.analysisFramedata;
@@ -167,19 +203,19 @@
         }
     }
 
-    if (m_param->analysisMode && m_slice->m_sliceType != I_SLICE && m_param->analysisRefineLevel > 1 && m_param->analysisRefineLevel < 10)
+    if (m_param->analysisReuseMode && m_slice->m_sliceType != I_SLICE && m_param->analysisReuseLevel > 1 && m_param->analysisReuseLevel < 10)
     {
         int numPredDir = m_slice->isInterP() ? 1 : 2;
         m_reuseInterDataCTU = (analysis_inter_data*)m_frame->m_analysisData.interData;
         m_reuseRef = &m_reuseInterDataCTU->ref[ctu.m_cuAddr * X265_MAX_PRED_MODE_PER_CTU * numPredDir];
         m_reuseDepth = &m_reuseInterDataCTU->depth[ctu.m_cuAddr * ctu.m_numPartitions];
         m_reuseModes = &m_reuseInterDataCTU->modes[ctu.m_cuAddr * ctu.m_numPartitions];
-        if (m_param->analysisRefineLevel > 4)
+        if (m_param->analysisReuseLevel > 4)
         {
             m_reusePartSize = &m_reuseInterDataCTU->partSize[ctu.m_cuAddr * ctu.m_numPartitions];
             m_reuseMergeFlag = &m_reuseInterDataCTU->mergeFlag[ctu.m_cuAddr * ctu.m_numPartitions];
         }
-        if (m_param->analysisMode == X265_ANALYSIS_SAVE)
+        if (m_param->analysisReuseMode == X265_ANALYSIS_SAVE)
             for (int i = 0; i < X265_MAX_PRED_MODE_PER_CTU * numPredDir; i++)
                 m_reuseRef[i] = -1;
     }
@@ -188,7 +224,7 @@
     if (m_slice->m_sliceType == I_SLICE)
     {
         analysis_intra_data* intraDataCTU = (analysis_intra_data*)m_frame->m_analysisData.intraData;
-        if (m_param->analysisMode == X265_ANALYSIS_LOAD && m_param->analysisRefineLevel > 1)
+        if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD && m_param->analysisReuseLevel > 1)
         {
             memcpy(ctu.m_cuDepth, &intraDataCTU->depth[ctu.m_cuAddr * numPartition], sizeof(uint8_t) * numPartition);
             memcpy(ctu.m_lumaIntraDir, &intraDataCTU->modes[ctu.m_cuAddr * numPartition], sizeof(uint8_t) * numPartition);
@@ -200,8 +236,8 @@
     else
     {
         if (m_param->bIntraRefresh && m_slice->m_sliceType == P_SLICE &&
-            ctu.m_cuPelX / g_maxCUSize >= frame.m_encData->m_pir.pirStartCol
-            && ctu.m_cuPelX / g_maxCUSize < frame.m_encData->m_pir.pirEndCol)
+            ctu.m_cuPelX / m_param->maxCUSize >= frame.m_encData->m_pir.pirStartCol
+            && ctu.m_cuPelX / m_param->maxCUSize < frame.m_encData->m_pir.pirEndCol)
             compressIntraCU(ctu, cuGeom, qp);
         else if (!m_param->rdLevel)
         {
@@ -214,7 +250,7 @@
             /* generate residual for entire CTU at once and copy to reconPic */
             encodeResidue(ctu, cuGeom);
         }
-        else if (m_param->analysisMode == X265_ANALYSIS_LOAD && m_param->analysisRefineLevel == 10)
+        else if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD && m_param->analysisReuseLevel == 10)
         {
             analysis_inter_data* interDataCTU = (analysis_inter_data*)m_frame->m_analysisData.interData;
             int posCTU = ctu.m_cuAddr * numPartition;
@@ -229,7 +265,7 @@
             }
             //Calculate log2CUSize from depth
             for (uint32_t i = 0; i < cuGeom.numPartitions; i++)
-                ctu.m_log2CUSize[i] = (uint8_t)g_maxLog2CUSize - ctu.m_cuDepth[i];
+                ctu.m_log2CUSize[i] = (uint8_t)m_param->maxLog2CUSize - ctu.m_cuDepth[i];
 
             qprdRefine (ctu, cuGeom, qp, qp);
             return *m_modeDepth[0].bestMode;
@@ -245,9 +281,69 @@
     if (m_param->bEnableRdRefine || m_param->bOptCUDeltaQP)
         qprdRefine(ctu, cuGeom, qp, qp);
 
+    if (m_param->csvLogLevel >= 2)
+        collectPUStatistics(ctu, cuGeom);
+
     return *m_modeDepth[0].bestMode;
 }
 
+void Analysis::collectPUStatistics(const CUData& ctu, const CUGeom& cuGeom)
+{
+    uint8_t depth = 0;
+    uint8_t partSize = 0;
+    for (uint32_t absPartIdx = 0; absPartIdx < ctu.m_numPartitions; absPartIdx += ctu.m_numPartitions >> (depth * 2))
+    {
+        depth = ctu.m_cuDepth[absPartIdx];
+        partSize = ctu.m_partSize[absPartIdx];
+        uint32_t numPU = nbPartsTable[(int)partSize];
+        int shift = 2 * (m_param->maxCUDepth + 1 - depth);
+        for (uint32_t puIdx = 0; puIdx < numPU; puIdx++)
+        {
+            PredictionUnit pu(ctu, cuGeom, puIdx);
+            int puabsPartIdx = ctu.getPUOffset(puIdx, absPartIdx);
+            int mode = 1;
+            if (ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_Nx2N || ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_2NxN)
+                mode = 2;
+            else if (ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_2NxnU || ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_2NxnD || ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_nLx2N || ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_nRx2N)
+                 mode = 3;
+
+            if (ctu.m_predMode[puabsPartIdx + absPartIdx] == MODE_SKIP)
+            {
+                ctu.m_encData->m_frameStats.cntSkipPu[depth] += (uint64_t)(1 << shift);
+                ctu.m_encData->m_frameStats.totalPu[depth] += (uint64_t)(1 << shift);
+            }
+            else if (ctu.m_predMode[puabsPartIdx + absPartIdx] == MODE_INTRA)
+            {
+                if (ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_NxN)
+                {
+                    ctu.m_encData->m_frameStats.cnt4x4++;
+                    ctu.m_encData->m_frameStats.totalPu[4]++;
+                }
+                else
+                {
+                    ctu.m_encData->m_frameStats.cntIntraPu[depth] += (uint64_t)(1 << shift);
+                    ctu.m_encData->m_frameStats.totalPu[depth] += (uint64_t)(1 << shift);
+                }
+            }
+            else if (mode == 3)
+            {
+                ctu.m_encData->m_frameStats.cntAmp[depth] += (uint64_t)(1 << shift);
+                ctu.m_encData->m_frameStats.totalPu[depth] += (uint64_t)(1 << shift);
+                break;
+            }
+            else
+            {

 
@@ -75,6 +75,7 @@
     m_reuseInterDataCTU = NULL;
     m_reuseRef = NULL;
     m_bHD = false;
+    m_evaluateInter = 0;
 }
 
 bool Analysis::create(ThreadLocalData *tld)
@@ -89,19 +90,19 @@
     cacheCost = X265_MALLOC(uint64_t, costArrSize);
 
     int csp = m_param->internalCsp;
-    uint32_t cuSize = g_maxCUSize;
+    uint32_t cuSize = m_param->maxCUSize;
 
     bool ok = true;
-    for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++, cuSize >>= 1)
+    for (uint32_t depth = 0; depth <= m_param->maxCUDepth; depth++, cuSize >>= 1)
     {
         ModeDepth &md = m_modeDepth[depth];
 
-        md.cuMemPool.create(depth, csp, MAX_PRED_TYPES);
+        md.cuMemPool.create(depth, csp, MAX_PRED_TYPES, *m_param);
         ok &= md.fencYuv.create(cuSize, csp);
 
         for (int j = 0; j < MAX_PRED_TYPES; j++)
         {
-            md.pred[j].cu.initialize(md.cuMemPool, depth, csp, j);
+            md.pred[j].cu.initialize(md.cuMemPool, depth, *m_param, j);
             ok &= md.pred[j].predYuv.create(cuSize, csp);
             ok &= md.pred[j].reconYuv.create(cuSize, csp);
             md.pred[j].fencYuv = &md.fencYuv;
@@ -115,7 +116,7 @@
 
 void Analysis::destroy()
 {
-    for (uint32_t i = 0; i <= g_maxCUDepth; i++)
+    for (uint32_t i = 0; i <= m_param->maxCUDepth; i++)
     {
         m_modeDepth[i].cuMemPool.destroy();
         m_modeDepth[i].fencYuv.destroy();
@@ -150,6 +151,41 @@
         calculateNormFactor(ctu, qp);
 
     uint32_t numPartition = ctu.m_numPartitions;
+    if (m_param->bCTUInfo && (*m_frame->m_ctuInfo + ctu.m_cuAddr))
+    {
+        x265_ctu_info_t* ctuTemp = *m_frame->m_ctuInfo + ctu.m_cuAddr;
+        if (ctuTemp->ctuPartitions)
+        {
+            int32_t depthIdx = 0;
+            uint32_t maxNum8x8Partitions = 64;
+            uint8_t* depthInfoPtr = m_frame->m_addOnDepth[ctu.m_cuAddr];
+            uint8_t* contentInfoPtr = m_frame->m_addOnCtuInfo[ctu.m_cuAddr];
+            int* prevCtuInfoChangePtr = m_frame->m_addOnPrevChange[ctu.m_cuAddr];
+            do
+            {
+                uint8_t depth = (uint8_t)ctuTemp->ctuPartitions[depthIdx];
+                uint8_t content = (uint8_t)(*((int32_t *)ctuTemp->ctuInfo + depthIdx));
+                int prevCtuInfoChange = m_frame->m_prevCtuInfoChange[ctu.m_cuAddr * maxNum8x8Partitions + depthIdx];
+                memset(depthInfoPtr, depth, sizeof(uint8_t) * numPartition >> 2 * depth);
+                memset(contentInfoPtr, content, sizeof(uint8_t) * numPartition >> 2 * depth);
+                memset(prevCtuInfoChangePtr, 0, sizeof(int) * numPartition >> 2 * depth);
+                for (uint32_t l = 0; l < numPartition >> 2 * depth; l++)
+                    prevCtuInfoChangePtr[l] = prevCtuInfoChange;
+                depthInfoPtr += ctu.m_numPartitions >> 2 * depth;
+                contentInfoPtr += ctu.m_numPartitions >> 2 * depth;
+                prevCtuInfoChangePtr += ctu.m_numPartitions >> 2 * depth;
+                depthIdx++;
+            } while (ctuTemp->ctuPartitions[depthIdx] != 0);
+
+            m_additionalCtuInfo = m_frame->m_addOnCtuInfo[ctu.m_cuAddr];
+            m_prevCtuInfoChange = m_frame->m_addOnPrevChange[ctu.m_cuAddr];
+            memcpy(ctu.m_cuDepth, m_frame->m_addOnDepth[ctu.m_cuAddr], sizeof(uint8_t) * numPartition);
+            //Calculate log2CUSize from depth
+            for (uint32_t i = 0; i < cuGeom.numPartitions; i++)
+                ctu.m_log2CUSize[i] = (uint8_t)m_param->maxLog2CUSize - ctu.m_cuDepth[i];
+        }
+    }
+
     if (m_param->analysisMultiPassRefine && m_param->rc.bStatRead)
     {
         m_multipassAnalysis = (analysis2PassFrameData*)m_frame->m_analysis2Pass.analysisFramedata;
@@ -167,19 +203,19 @@
         }
     }
 
-    if (m_param->analysisMode && m_slice->m_sliceType != I_SLICE && m_param->analysisRefineLevel > 1 && m_param->analysisRefineLevel < 10)
+    if (m_param->analysisReuseMode && m_slice->m_sliceType != I_SLICE && m_param->analysisReuseLevel > 1 && m_param->analysisReuseLevel < 10)
     {
         int numPredDir = m_slice->isInterP() ? 1 : 2;
         m_reuseInterDataCTU = (analysis_inter_data*)m_frame->m_analysisData.interData;
         m_reuseRef = &m_reuseInterDataCTU->ref[ctu.m_cuAddr * X265_MAX_PRED_MODE_PER_CTU * numPredDir];
         m_reuseDepth = &m_reuseInterDataCTU->depth[ctu.m_cuAddr * ctu.m_numPartitions];
         m_reuseModes = &m_reuseInterDataCTU->modes[ctu.m_cuAddr * ctu.m_numPartitions];
-        if (m_param->analysisRefineLevel > 4)
+        if (m_param->analysisReuseLevel > 4)
         {
             m_reusePartSize = &m_reuseInterDataCTU->partSize[ctu.m_cuAddr * ctu.m_numPartitions];
             m_reuseMergeFlag = &m_reuseInterDataCTU->mergeFlag[ctu.m_cuAddr * ctu.m_numPartitions];
         }
-        if (m_param->analysisMode == X265_ANALYSIS_SAVE)
+        if (m_param->analysisReuseMode == X265_ANALYSIS_SAVE)
             for (int i = 0; i < X265_MAX_PRED_MODE_PER_CTU * numPredDir; i++)
                 m_reuseRef[i] = -1;
     }
@@ -188,7 +224,7 @@
     if (m_slice->m_sliceType == I_SLICE)
     {
         analysis_intra_data* intraDataCTU = (analysis_intra_data*)m_frame->m_analysisData.intraData;
-        if (m_param->analysisMode == X265_ANALYSIS_LOAD && m_param->analysisRefineLevel > 1)
+        if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD && m_param->analysisReuseLevel > 1)
         {
             memcpy(ctu.m_cuDepth, &intraDataCTU->depth[ctu.m_cuAddr * numPartition], sizeof(uint8_t) * numPartition);
             memcpy(ctu.m_lumaIntraDir, &intraDataCTU->modes[ctu.m_cuAddr * numPartition], sizeof(uint8_t) * numPartition);
@@ -200,8 +236,8 @@
     else
     {
         if (m_param->bIntraRefresh && m_slice->m_sliceType == P_SLICE &&
-            ctu.m_cuPelX / g_maxCUSize >= frame.m_encData->m_pir.pirStartCol
-            && ctu.m_cuPelX / g_maxCUSize < frame.m_encData->m_pir.pirEndCol)
+            ctu.m_cuPelX / m_param->maxCUSize >= frame.m_encData->m_pir.pirStartCol
+            && ctu.m_cuPelX / m_param->maxCUSize < frame.m_encData->m_pir.pirEndCol)
             compressIntraCU(ctu, cuGeom, qp);
         else if (!m_param->rdLevel)
         {
@@ -214,7 +250,7 @@
             /* generate residual for entire CTU at once and copy to reconPic */
             encodeResidue(ctu, cuGeom);
         }
-        else if (m_param->analysisMode == X265_ANALYSIS_LOAD && m_param->analysisRefineLevel == 10)
+        else if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD && m_param->analysisReuseLevel == 10)
         {
             analysis_inter_data* interDataCTU = (analysis_inter_data*)m_frame->m_analysisData.interData;
             int posCTU = ctu.m_cuAddr * numPartition;
@@ -229,7 +265,7 @@
             }
             //Calculate log2CUSize from depth
             for (uint32_t i = 0; i < cuGeom.numPartitions; i++)
-                ctu.m_log2CUSize[i] = (uint8_t)g_maxLog2CUSize - ctu.m_cuDepth[i];
+                ctu.m_log2CUSize[i] = (uint8_t)m_param->maxLog2CUSize - ctu.m_cuDepth[i];
 
             qprdRefine (ctu, cuGeom, qp, qp);
             return *m_modeDepth[0].bestMode;
@@ -245,9 +281,69 @@
     if (m_param->bEnableRdRefine || m_param->bOptCUDeltaQP)
         qprdRefine(ctu, cuGeom, qp, qp);
 
+    if (m_param->csvLogLevel >= 2)
+        collectPUStatistics(ctu, cuGeom);
+
     return *m_modeDepth[0].bestMode;
 }
 
+void Analysis::collectPUStatistics(const CUData& ctu, const CUGeom& cuGeom)
+{
+    uint8_t depth = 0;
+    uint8_t partSize = 0;
+    for (uint32_t absPartIdx = 0; absPartIdx < ctu.m_numPartitions; absPartIdx += ctu.m_numPartitions >> (depth * 2))
+    {
+        depth = ctu.m_cuDepth[absPartIdx];
+        partSize = ctu.m_partSize[absPartIdx];
+        uint32_t numPU = nbPartsTable[(int)partSize];
+        int shift = 2 * (m_param->maxCUDepth + 1 - depth);
+        for (uint32_t puIdx = 0; puIdx < numPU; puIdx++)
+        {
+            PredictionUnit pu(ctu, cuGeom, puIdx);
+            int puabsPartIdx = ctu.getPUOffset(puIdx, absPartIdx);
+            int mode = 1;
+            if (ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_Nx2N || ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_2NxN)
+                mode = 2;
+            else if (ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_2NxnU || ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_2NxnD || ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_nLx2N || ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_nRx2N)
+                 mode = 3;
+
+            if (ctu.m_predMode[puabsPartIdx + absPartIdx] == MODE_SKIP)
+            {
+                ctu.m_encData->m_frameStats.cntSkipPu[depth] += (uint64_t)(1 << shift);
+                ctu.m_encData->m_frameStats.totalPu[depth] += (uint64_t)(1 << shift);
+            }
+            else if (ctu.m_predMode[puabsPartIdx + absPartIdx] == MODE_INTRA)
+            {
+                if (ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_NxN)
+                {
+                    ctu.m_encData->m_frameStats.cnt4x4++;
+                    ctu.m_encData->m_frameStats.totalPu[4]++;
+                }
+                else
+                {
+                    ctu.m_encData->m_frameStats.cntIntraPu[depth] += (uint64_t)(1 << shift);
+                    ctu.m_encData->m_frameStats.totalPu[depth] += (uint64_t)(1 << shift);
+                }
+            }
+            else if (mode == 3)
+            {
+                ctu.m_encData->m_frameStats.cntAmp[depth] += (uint64_t)(1 << shift);
+                ctu.m_encData->m_frameStats.totalPu[depth] += (uint64_t)(1 << shift);
+                break;
+            }
+            else
+            {
​

x265_2.4.tar.gz/source/encoder/analysis.h -> x265_2.5.tar.gz/source/encoder/analysis.h Changed

@@ -137,6 +137,10 @@
     int*                    m_multipassMvpIdx[2];
     int32_t*                m_multipassRef[2];
     uint8_t*                m_multipassModes;
+
+    uint8_t                 m_evaluateInter;
+    uint8_t*                m_additionalCtuInfo;
+    int*                    m_prevCtuInfoChange;
     /* refine RD based on QP for rd-levels 5 and 6 */
     void qprdRefine(const CUData& parentCTU, const CUGeom& cuGeom, int32_t qp, int32_t lqp);
 
@@ -178,6 +182,9 @@
 
     void calculateNormFactor(CUData& ctu, int qp);
     void normFactor(const pixel* src, uint32_t blockSize, CUData& ctu, int qp, TextType ttype);
+
+    void collectPUStatistics(const CUData& ctu, const CUGeom& cuGeom);
+
     /* check whether current mode is the new best */
     inline void checkBestMode(Mode& mode, uint32_t depth)
     {
@@ -190,6 +197,7 @@
         else
             md.bestMode = &mode;
     }
+    int findSameContentRefCount(const CUData& parentCTU, const CUGeom& cuGeom);
 };
 
 struct ThreadLocalData

 
@@ -137,6 +137,10 @@
     int*                    m_multipassMvpIdx[2];
     int32_t*                m_multipassRef[2];
     uint8_t*                m_multipassModes;
+
+    uint8_t                 m_evaluateInter;
+    uint8_t*                m_additionalCtuInfo;
+    int*                    m_prevCtuInfoChange;
     /* refine RD based on QP for rd-levels 5 and 6 */
     void qprdRefine(const CUData& parentCTU, const CUGeom& cuGeom, int32_t qp, int32_t lqp);
 
@@ -178,6 +182,9 @@
 
     void calculateNormFactor(CUData& ctu, int qp);
     void normFactor(const pixel* src, uint32_t blockSize, CUData& ctu, int qp, TextType ttype);
+
+    void collectPUStatistics(const CUData& ctu, const CUGeom& cuGeom);
+
     /* check whether current mode is the new best */
     inline void checkBestMode(Mode& mode, uint32_t depth)
     {
@@ -190,6 +197,7 @@
         else
             md.bestMode = &mode;
     }
+    int findSameContentRefCount(const CUData& parentCTU, const CUGeom& cuGeom);
 };
 
 struct ThreadLocalData
​

x265_2.4.tar.gz/source/encoder/api.cpp -> x265_2.5.tar.gz/source/encoder/api.cpp Changed

@@ -30,6 +30,7 @@
 #include "level.h"
 #include "nal.h"
 #include "bitcost.h"
+#include "x265-extras.h"
 
 /* multilib namespace reflectors */
 #if LINKED_8BIT
@@ -96,9 +97,6 @@
     if (x265_check_params(param))
         goto fail;
 
-    if (x265_set_globals(param))
-        goto fail;
-
     encoder = new Encoder;
     if (!param->rc.bEnableSlowFirstPass)
         PARAM_NS::x265_param_apply_fastfirstpass(param);
@@ -119,6 +117,17 @@
     }
 
     encoder->create();
+    /* Try to open CSV file handle */
+    if (encoder->m_param->csvfn)
+    {
+        encoder->m_param->csvfpt = x265_csvlog_open(*encoder->m_param, encoder->m_param->csvfn, encoder->m_param->csvLogLevel);
+        if (!encoder->m_param->csvfpt)
+        {
+            x265_log(encoder->m_param, X265_LOG_ERROR, "Unable to open CSV log file <%s>, aborting\n", encoder->m_param->csvfn);
+            encoder->m_aborted = true;
+        }
+    }
+
     encoder->m_latestParam = latestParam;
     memcpy(latestParam, param, sizeof(x265_param));
     if (encoder->m_aborted)
@@ -144,7 +153,10 @@
         if (encoder->m_param->rc.bStatRead && encoder->m_param->bMultiPassOptRPS)
         {
             if (!encoder->computeSPSRPSIndex())
+            {
+                encoder->m_aborted = true;
                 return -1;
+            }
         }
         encoder->getStreamHeaders(encoder->m_nalList, sbacCoder, bs);
         *pp_nal = &encoder->m_nalList.m_nal[0];
@@ -152,6 +164,11 @@
         return encoder->m_nalList.m_occupancy;
     }
 
+    if (enc)
+    {
+        Encoder *encoder = static_cast<Encoder*>(enc);
+        encoder->m_aborted = true;
+    }
     return -1;
 }
 
@@ -251,6 +268,12 @@
     else if (pi_nal)
         *pi_nal = 0;
 
+    if (numEncoded && encoder->m_param->csvLogLevel)
+        x265_csvlog_frame(encoder->m_param->csvfpt, *encoder->m_param, *pic_out, encoder->m_param->csvLogLevel);
+
+    if (numEncoded < 0)
+        encoder->m_aborted = true;
+
     return numEncoded;
 }
 
@@ -263,12 +286,17 @@
     }
 }
 
-void x265_encoder_log(x265_encoder* enc, int, char **)
+void x265_encoder_log(x265_encoder* enc, int argc, char **argv)
 {
     if (enc)
     {
         Encoder *encoder = static_cast<Encoder*>(enc);
-        x265_log(encoder->m_param, X265_LOG_WARNING, "x265_encoder_log is now deprecated\n");
+        x265_stats stats;
+        int padx = encoder->m_sps.conformanceWindow.rightOffset;
+        int pady = encoder->m_sps.conformanceWindow.bottomOffset;
+        encoder->fetchStats(&stats, sizeof(stats));
+        const x265_api * api = x265_api_get(0);
+        x265_csvlog_encode(encoder->m_param->csvfpt, api->version_str, *encoder->m_param, padx, pady, stats, encoder->m_param->csvLogLevel, argc, argv);
     }
 }
 
@@ -282,7 +310,6 @@
         encoder->printSummary();
         encoder->destroy();
         delete encoder;
-        ATOMIC_DEC(&g_ctuSizeConfigured);
     }
 }
 
@@ -295,14 +322,18 @@
     encoder->m_bQueuedIntraRefresh = 1;
     return 0;
 }
+int x265_encoder_ctu_info(x265_encoder *enc, int poc, x265_ctu_info_t** ctu)
+{
+    if (!ctu || !enc)
+        return -1;
+    Encoder* encoder = static_cast<Encoder*>(enc);
+    encoder->copyCtuInfo(ctu, poc);
+    return 0;
+}
 
 void x265_cleanup(void)
 {
-    if (!g_ctuSizeConfigured)
-    {
-        BitCost::destroy();
-        CUData::s_partSet[0] = NULL; /* allow CUData to adjust to new CTU size */
-    }
+    BitCost::destroy();
 }
 
 x265_picture *x265_picture_alloc()
@@ -321,14 +352,14 @@
     pic->userSEI.payloads = NULL;
     pic->userSEI.numPayloads = 0;
 
-    if (param->analysisMode)
+    if (param->analysisReuseMode)
     {
-        uint32_t widthInCU       = (param->sourceWidth  + g_maxCUSize - 1) >> g_maxLog2CUSize;
-        uint32_t heightInCU      = (param->sourceHeight + g_maxCUSize - 1) >> g_maxLog2CUSize;
+        uint32_t widthInCU = (param->sourceWidth + param->maxCUSize - 1) >> param->maxLog2CUSize;
+        uint32_t heightInCU = (param->sourceHeight + param->maxCUSize - 1) >> param->maxLog2CUSize;
 
         uint32_t numCUsInFrame   = widthInCU * heightInCU;
         pic->analysisData.numCUsInFrame = numCUsInFrame;
-        pic->analysisData.numPartitions = NUM_4x4_PARTITIONS;
+        pic->analysisData.numPartitions = param->num4x4Partitions;
     }
 }
 
@@ -372,6 +403,7 @@
 
     sizeof(x265_frame_stats),
     &x265_encoder_intra_refresh,
+    &x265_encoder_ctu_info,
 };
 
 typedef const x265_api* (*api_get_func)(int bitDepth);

 
@@ -30,6 +30,7 @@
 #include "level.h"
 #include "nal.h"
 #include "bitcost.h"
+#include "x265-extras.h"
 
 /* multilib namespace reflectors */
 #if LINKED_8BIT
@@ -96,9 +97,6 @@
     if (x265_check_params(param))
         goto fail;
 
-    if (x265_set_globals(param))
-        goto fail;
-
     encoder = new Encoder;
     if (!param->rc.bEnableSlowFirstPass)
         PARAM_NS::x265_param_apply_fastfirstpass(param);
@@ -119,6 +117,17 @@
     }
 
     encoder->create();
+    /* Try to open CSV file handle */
+    if (encoder->m_param->csvfn)
+    {
+        encoder->m_param->csvfpt = x265_csvlog_open(*encoder->m_param, encoder->m_param->csvfn, encoder->m_param->csvLogLevel);
+        if (!encoder->m_param->csvfpt)
+        {
+            x265_log(encoder->m_param, X265_LOG_ERROR, "Unable to open CSV log file <%s>, aborting\n", encoder->m_param->csvfn);
+            encoder->m_aborted = true;
+        }
+    }
+
     encoder->m_latestParam = latestParam;
     memcpy(latestParam, param, sizeof(x265_param));
     if (encoder->m_aborted)
@@ -144,7 +153,10 @@
         if (encoder->m_param->rc.bStatRead && encoder->m_param->bMultiPassOptRPS)
         {
             if (!encoder->computeSPSRPSIndex())
+            {
+                encoder->m_aborted = true;
                 return -1;
+            }
         }
         encoder->getStreamHeaders(encoder->m_nalList, sbacCoder, bs);
         *pp_nal = &encoder->m_nalList.m_nal[0];
@@ -152,6 +164,11 @@
         return encoder->m_nalList.m_occupancy;
     }
 
+    if (enc)
+    {
+        Encoder *encoder = static_cast<Encoder*>(enc);
+        encoder->m_aborted = true;
+    }
     return -1;
 }
 
@@ -251,6 +268,12 @@
     else if (pi_nal)
         *pi_nal = 0;
 
+    if (numEncoded && encoder->m_param->csvLogLevel)
+        x265_csvlog_frame(encoder->m_param->csvfpt, *encoder->m_param, *pic_out, encoder->m_param->csvLogLevel);
+
+    if (numEncoded < 0)
+        encoder->m_aborted = true;
+
     return numEncoded;
 }
 
@@ -263,12 +286,17 @@
     }
 }
 
-void x265_encoder_log(x265_encoder* enc, int, char **)
+void x265_encoder_log(x265_encoder* enc, int argc, char **argv)
 {
     if (enc)
     {
         Encoder *encoder = static_cast<Encoder*>(enc);
-        x265_log(encoder->m_param, X265_LOG_WARNING, "x265_encoder_log is now deprecated\n");
+        x265_stats stats;
+        int padx = encoder->m_sps.conformanceWindow.rightOffset;
+        int pady = encoder->m_sps.conformanceWindow.bottomOffset;
+        encoder->fetchStats(&stats, sizeof(stats));
+        const x265_api * api = x265_api_get(0);
+        x265_csvlog_encode(encoder->m_param->csvfpt, api->version_str, *encoder->m_param, padx, pady, stats, encoder->m_param->csvLogLevel, argc, argv);
     }
 }
 
@@ -282,7 +310,6 @@
         encoder->printSummary();
         encoder->destroy();
         delete encoder;
-        ATOMIC_DEC(&g_ctuSizeConfigured);
     }
 }
 
@@ -295,14 +322,18 @@
     encoder->m_bQueuedIntraRefresh = 1;
     return 0;
 }
+int x265_encoder_ctu_info(x265_encoder *enc, int poc, x265_ctu_info_t** ctu)
+{
+    if (!ctu || !enc)
+        return -1;
+    Encoder* encoder = static_cast<Encoder*>(enc);
+    encoder->copyCtuInfo(ctu, poc);
+    return 0;
+}
 
 void x265_cleanup(void)
 {
-    if (!g_ctuSizeConfigured)
-    {
-        BitCost::destroy();
-        CUData::s_partSet[0] = NULL; /* allow CUData to adjust to new CTU size */
-    }
+    BitCost::destroy();
 }
 
 x265_picture *x265_picture_alloc()
@@ -321,14 +352,14 @@
     pic->userSEI.payloads = NULL;
     pic->userSEI.numPayloads = 0;
 
-    if (param->analysisMode)
+    if (param->analysisReuseMode)
     {
-        uint32_t widthInCU       = (param->sourceWidth  + g_maxCUSize - 1) >> g_maxLog2CUSize;
-        uint32_t heightInCU      = (param->sourceHeight + g_maxCUSize - 1) >> g_maxLog2CUSize;
+        uint32_t widthInCU = (param->sourceWidth + param->maxCUSize - 1) >> param->maxLog2CUSize;
+        uint32_t heightInCU = (param->sourceHeight + param->maxCUSize - 1) >> param->maxLog2CUSize;
 
         uint32_t numCUsInFrame   = widthInCU * heightInCU;
         pic->analysisData.numCUsInFrame = numCUsInFrame;
-        pic->analysisData.numPartitions = NUM_4x4_PARTITIONS;
+        pic->analysisData.numPartitions = param->num4x4Partitions;
     }
 }
 
@@ -372,6 +403,7 @@
 
     sizeof(x265_frame_stats),
     &x265_encoder_intra_refresh,
+    &x265_encoder_ctu_info,
 };
 
 typedef const x265_api* (*api_get_func)(int bitDepth);
​

x265_2.4.tar.gz/source/encoder/dpb.cpp -> x265_2.5.tar.gz/source/encoder/dpb.cpp Changed

@@ -105,6 +105,23 @@
                 }
             }
 
+            if (curFrame->m_ctuInfo != NULL)
+            {
+                uint32_t widthInCU = (curFrame->m_param->sourceWidth + curFrame->m_param->maxCUSize - 1) >> curFrame->m_param->maxLog2CUSize;
+                uint32_t heightInCU = (curFrame->m_param->sourceHeight + curFrame->m_param->maxCUSize - 1) >> curFrame->m_param->maxLog2CUSize;
+                uint32_t numCUsInFrame = widthInCU * heightInCU;
+                for (uint32_t i = 0; i < numCUsInFrame; i++)
+                {
+                    X265_FREE((*curFrame->m_ctuInfo + i)->ctuInfo);
+                    (*curFrame->m_ctuInfo + i)->ctuInfo = NULL;
+                }
+                X265_FREE(*curFrame->m_ctuInfo);
+                *(curFrame->m_ctuInfo) = NULL;
+                X265_FREE(curFrame->m_ctuInfo);
+                curFrame->m_ctuInfo = NULL;
+                X265_FREE(curFrame->m_prevCtuInfoChange);
+                curFrame->m_prevCtuInfoChange = NULL;
+            }
             curFrame->m_encData = NULL;
             curFrame->m_reconPic = NULL;
         }
@@ -187,7 +204,7 @@
     }
 
     // Disable Loopfilter in bound area, because we will do slice-parallelism in future
-    slice->m_sLFaseFlag = (g_maxSlices > 1) ? false : ((SLFASE_CONSTANT & (1 << (pocCurr % 31))) > 0);
+    slice->m_sLFaseFlag = (newFrame->m_param->maxSlices > 1) ? false : ((SLFASE_CONSTANT & (1 << (pocCurr % 31))) > 0);
 
     /* Increment reference count of all motion-referenced frames to prevent them
      * from being recycled. These counts are decremented at the end of

 
@@ -105,6 +105,23 @@
                 }
             }
 
+            if (curFrame->m_ctuInfo != NULL)
+            {
+                uint32_t widthInCU = (curFrame->m_param->sourceWidth + curFrame->m_param->maxCUSize - 1) >> curFrame->m_param->maxLog2CUSize;
+                uint32_t heightInCU = (curFrame->m_param->sourceHeight + curFrame->m_param->maxCUSize - 1) >> curFrame->m_param->maxLog2CUSize;
+                uint32_t numCUsInFrame = widthInCU * heightInCU;
+                for (uint32_t i = 0; i < numCUsInFrame; i++)
+                {
+                    X265_FREE((*curFrame->m_ctuInfo + i)->ctuInfo);
+                    (*curFrame->m_ctuInfo + i)->ctuInfo = NULL;
+                }
+                X265_FREE(*curFrame->m_ctuInfo);
+                *(curFrame->m_ctuInfo) = NULL;
+                X265_FREE(curFrame->m_ctuInfo);
+                curFrame->m_ctuInfo = NULL;
+                X265_FREE(curFrame->m_prevCtuInfoChange);
+                curFrame->m_prevCtuInfoChange = NULL;
+            }
             curFrame->m_encData = NULL;
             curFrame->m_reconPic = NULL;
         }
@@ -187,7 +204,7 @@
     }
 
     // Disable Loopfilter in bound area, because we will do slice-parallelism in future
-    slice->m_sLFaseFlag = (g_maxSlices > 1) ? false : ((SLFASE_CONSTANT & (1 << (pocCurr % 31))) > 0);
+    slice->m_sLFaseFlag = (newFrame->m_param->maxSlices > 1) ? false : ((SLFASE_CONSTANT & (1 << (pocCurr % 31))) > 0);
 
     /* Increment reference count of all motion-referenced frames to prevent them
      * from being recycled. These counts are decremented at the end of
​

x265_2.4.tar.gz/source/encoder/encoder.cpp -> x265_2.5.tar.gz/source/encoder/encoder.cpp Changed

@@ -86,8 +86,10 @@
         m_frameEncoder[i] = NULL;
     MotionEstimate::initScales();
 
-#if ENABLE_DYNAMIC_HDR10
+#if ENABLE_HDR10_PLUS
     m_hdr10plus_api = hdr10plus_api_get();
+    numCimInfo = 0;
+    cim = NULL;
 #endif
 
     m_prevTonemapPayload.payload = NULL;
@@ -132,26 +134,19 @@
     if (!p->bEnableWavefront && !p->bDistributeModeAnalysis && !p->bDistributeMotionEstimation && !p->lookaheadSlices)
         allowPools = false;
 
-    if (!p->frameNumThreads)
-    {
-        // auto-detect frame threads
-        int cpuCount = ThreadPool::getCpuCount();
-        if (!p->bEnableWavefront)
-            p->frameNumThreads = X265_MIN3(cpuCount, (rows + 1) / 2, X265_MAX_FRAME_THREADS);
-        else if (cpuCount >= 32)
-            p->frameNumThreads = (p->sourceHeight > 2000) ? 8 : 6; // dual-socket 10-core IvyBridge or higher
-        else if (cpuCount >= 16)
-            p->frameNumThreads = 5; // 8 HT cores, or dual socket
-        else if (cpuCount >= 8)
-            p->frameNumThreads = 3; // 4 HT cores
-        else if (cpuCount >= 4)
-            p->frameNumThreads = 2; // Dual or Quad core
-        else
-            p->frameNumThreads = 1;
-    }
     m_numPools = 0;
     if (allowPools)
         m_threadPool = ThreadPool::allocThreadPools(p, m_numPools, 0);
+    else
+    {
+        if (!p->frameNumThreads)
+        {
+            // auto-detect frame threads
+            int cpuCount = ThreadPool::getCpuCount();
+            ThreadPool::getFrameThreadsCount(p, cpuCount);
+        }
+    }
+
     if (!m_numPools)
     {
         // issue warnings if any of these features were requested
@@ -320,8 +315,8 @@
     else
         m_scalingList.setupQuantMatrices(m_sps.chromaFormatIdc);
 
-    int numRows = (m_param->sourceHeight + g_maxCUSize - 1) / g_maxCUSize;
-    int numCols = (m_param->sourceWidth  + g_maxCUSize - 1) / g_maxCUSize;
+    int numRows = (m_param->sourceHeight + m_param->maxCUSize - 1) / m_param->maxCUSize;
+    int numCols = (m_param->sourceWidth  + m_param->maxCUSize - 1) / m_param->maxCUSize;
     for (int i = 0; i < m_param->frameNumThreads; i++)
     {
         if (!m_frameEncoder[i]->init(this, numRows, numCols))
@@ -346,12 +341,12 @@
 
     initRefIdx();
 
-    if (m_param->analysisMode)
+    if (m_param->analysisReuseMode)
     {
-        const char* name = m_param->analysisFileName;
+        const char* name = m_param->analysisReuseFileName;
         if (!name)
             name = defaultAnalysisFileName;
-        const char* mode = m_param->analysisMode == X265_ANALYSIS_LOAD ? "rb" : "wb";
+        const char* mode = m_param->analysisReuseMode == X265_ANALYSIS_LOAD ? "rb" : "wb";
         m_analysisFile = x265_fopen(name, mode);
         if (!m_analysisFile)
         {
@@ -362,7 +357,7 @@
 
     if (m_param->analysisMultiPassRefine || m_param->analysisMultiPassDistortion)
     {
-        const char* name = m_param->analysisFileName;
+        const char* name = m_param->analysisReuseFileName;
         if (!name)
             name = defaultAnalysisFileName;
         if (m_param->rc.bStatWrite)
@@ -431,6 +426,10 @@
 
 void Encoder::destroy()
 {
+#if ENABLE_HDR10_PLUS
+    m_hdr10plus_api->hdr10plus_clear_movie(cim, numCimInfo);
+#endif
+        
     if (m_exportedPic)
     {
         ATOMIC_DEC(&m_exportedPic->m_countRefEncoders);
@@ -482,7 +481,7 @@
     {
         int bError = 1;
         fclose(m_analysisFileOut);
-        const char* name = m_param->analysisFileName;
+        const char* name = m_param->analysisReuseFileName;
         if (!name)
             name = defaultAnalysisFileName;
         char* temp = strcatFilename(name, ".temp");
@@ -499,11 +498,14 @@
      }
     if (m_param)
     {
+        if (m_param->csvfpt)
+            fclose(m_param->csvfpt);
         /* release string arguments that were strdup'd */
         free((char*)m_param->rc.lambdaFileName);
         free((char*)m_param->rc.statFileName);
-        free((char*)m_param->analysisFileName);
+        free((char*)m_param->analysisReuseFileName);
         free((char*)m_param->scalingLists);
+        free((char*)m_param->csvfn);
         free((char*)m_param->numaPools);
         free((char*)m_param->masteringDisplayColorVolume);
         free((char*)m_param->toneMapFile);
@@ -518,7 +520,7 @@
         FrameEncoder *encoder = m_frameEncoder[i];
         if (encoder->m_rce.isActive && encoder->m_rce.poc != rc->m_curSlice->m_poc)
         {
-            int64_t bits = (int64_t) X265_MAX(encoder->m_rce.frameSizeEstimated, encoder->m_rce.frameSizePlanned);
+            int64_t bits = m_param->rc.bEnableConstVbv ? (int64_t)encoder->m_rce.frameSizePlanned : (int64_t)X265_MAX(encoder->m_rce.frameSizeEstimated, encoder->m_rce.frameSizePlanned);
             rc->m_bufferFill -= bits;
             rc->m_bufferFill = X265_MAX(rc->m_bufferFill, 0);
             rc->m_bufferFill += encoder->m_rce.bufferRate;
@@ -593,6 +595,8 @@
 
     if (m_exportedPic)
     {
+        if (!m_param->bUseAnalysisFile && m_param->analysisReuseMode == X265_ANALYSIS_SAVE)
+            freeAnalysis(&m_exportedPic->m_analysisData);
         ATOMIC_DEC(&m_exportedPic->m_countRefEncoders);
         m_exportedPic = NULL;
         m_dpb->recycleUnreferenced();
@@ -601,16 +605,22 @@
     {
         x265_sei_payload toneMap;
         toneMap.payload = NULL;
-#if ENABLE_DYNAMIC_HDR10
+#if ENABLE_HDR10_PLUS
         if (m_bToneMap)
         {
-            uint8_t *cim = NULL;
-            if (m_hdr10plus_api->hdr10plus_json_to_frame_cim(m_param->toneMapFile, pic_in->poc, cim))
-            {
-                toneMap.payload = (uint8_t*)x265_malloc(sizeof(uint8_t) * cim[0]);
-                toneMap.payloadSize = cim[0];
+            if (pic_in->poc == 0)
+                numCimInfo = m_hdr10plus_api->hdr10plus_json_to_movie_cim(m_param->toneMapFile, cim);
+            if (pic_in->poc < numCimInfo)
+            {
+                int32_t i = 0;
+                toneMap.payloadSize = 0;
+                while (cim[pic_in->poc][i] == 0xFF)
+                    toneMap.payloadSize += cim[pic_in->poc][i++];
+                toneMap.payloadSize += cim[pic_in->poc][i++];
+
+                toneMap.payload = (uint8_t*)x265_malloc(sizeof(uint8_t) * toneMap.payloadSize);
                 toneMap.payloadType = USER_DATA_REGISTERED_ITU_T_T35;
-                memcpy(toneMap.payload, cim, toneMap.payloadSize);
+                memcpy(toneMap.payload, cim[pic_in->poc] + i, toneMap.payloadSize);
             }
         }
 #endif
@@ -708,7 +718,7 @@
             for (int i = 0; i < numPayloads; i++)
             {
                 x265_sei_payload input;
-                if (i == (numPayloads - 1))
+                if ((i == (numPayloads - 1)) && toneMapEnable)
                     input = toneMap;
                 else
                     input = pic_in->userSEI.payloads[i];
@@ -754,24 +764,40 @@
 
         /* In analysisSave mode, x265_analysis_data is allocated in pic_in and inFrame points to this */
         /* Load analysis data before lookahead->addPicture, since sliceType has been decided */
-        if (m_param->analysisMode == X265_ANALYSIS_LOAD)
+        if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD)
         {
-            x265_picture* inputPic = const_cast<x265_picture*>(pic_in);
             /* readAnalysisFile reads analysis data for the frame and allocates memory based on slicetype */
-            readAnalysisFile(&inputPic->analysisData, inFrame->m_poc);
-            inFrame->m_analysisData.poc = inFrame->m_poc;
-            inFrame->m_analysisData.sliceType = inputPic->analysisData.sliceType;
-            inFrame->m_analysisData.bScenecut = inputPic->analysisData.bScenecut;
-            inFrame->m_analysisData.satdCost = inputPic->analysisData.satdCost;
-            inFrame->m_analysisData.numCUsInFrame = inputPic->analysisData.numCUsInFrame;
-            inFrame->m_analysisData.numPartitions = inputPic->analysisData.numPartitions;
-            inFrame->m_analysisData.wt = inputPic->analysisData.wt;
-            inFrame->m_analysisData.interData = inputPic->analysisData.interData;
-            inFrame->m_analysisData.intraData = inputPic->analysisData.intraData;
-            sliceType = inputPic->analysisData.sliceType;
+            readAnalysisFile(&inFrame->m_analysisData, inFrame->m_poc, pic_in);
+            sliceType = inFrame->m_analysisData.sliceType;

 
@@ -86,8 +86,10 @@
         m_frameEncoder[i] = NULL;
     MotionEstimate::initScales();
 
-#if ENABLE_DYNAMIC_HDR10
+#if ENABLE_HDR10_PLUS
     m_hdr10plus_api = hdr10plus_api_get();
+    numCimInfo = 0;
+    cim = NULL;
 #endif
 
     m_prevTonemapPayload.payload = NULL;
@@ -132,26 +134,19 @@
     if (!p->bEnableWavefront && !p->bDistributeModeAnalysis && !p->bDistributeMotionEstimation && !p->lookaheadSlices)
         allowPools = false;
 
-    if (!p->frameNumThreads)
-    {
-        // auto-detect frame threads
-        int cpuCount = ThreadPool::getCpuCount();
-        if (!p->bEnableWavefront)
-            p->frameNumThreads = X265_MIN3(cpuCount, (rows + 1) / 2, X265_MAX_FRAME_THREADS);
-        else if (cpuCount >= 32)
-            p->frameNumThreads = (p->sourceHeight > 2000) ? 8 : 6; // dual-socket 10-core IvyBridge or higher
-        else if (cpuCount >= 16)
-            p->frameNumThreads = 5; // 8 HT cores, or dual socket
-        else if (cpuCount >= 8)
-            p->frameNumThreads = 3; // 4 HT cores
-        else if (cpuCount >= 4)
-            p->frameNumThreads = 2; // Dual or Quad core
-        else
-            p->frameNumThreads = 1;
-    }
     m_numPools = 0;
     if (allowPools)
         m_threadPool = ThreadPool::allocThreadPools(p, m_numPools, 0);
+    else
+    {
+        if (!p->frameNumThreads)
+        {
+            // auto-detect frame threads
+            int cpuCount = ThreadPool::getCpuCount();
+            ThreadPool::getFrameThreadsCount(p, cpuCount);
+        }
+    }
+
     if (!m_numPools)
     {
         // issue warnings if any of these features were requested
@@ -320,8 +315,8 @@
     else
         m_scalingList.setupQuantMatrices(m_sps.chromaFormatIdc);
 
-    int numRows = (m_param->sourceHeight + g_maxCUSize - 1) / g_maxCUSize;
-    int numCols = (m_param->sourceWidth  + g_maxCUSize - 1) / g_maxCUSize;
+    int numRows = (m_param->sourceHeight + m_param->maxCUSize - 1) / m_param->maxCUSize;
+    int numCols = (m_param->sourceWidth  + m_param->maxCUSize - 1) / m_param->maxCUSize;
     for (int i = 0; i < m_param->frameNumThreads; i++)
     {
         if (!m_frameEncoder[i]->init(this, numRows, numCols))
@@ -346,12 +341,12 @@
 
     initRefIdx();
 
-    if (m_param->analysisMode)
+    if (m_param->analysisReuseMode)
     {
-        const char* name = m_param->analysisFileName;
+        const char* name = m_param->analysisReuseFileName;
         if (!name)
             name = defaultAnalysisFileName;
-        const char* mode = m_param->analysisMode == X265_ANALYSIS_LOAD ? "rb" : "wb";
+        const char* mode = m_param->analysisReuseMode == X265_ANALYSIS_LOAD ? "rb" : "wb";
         m_analysisFile = x265_fopen(name, mode);
         if (!m_analysisFile)
         {
@@ -362,7 +357,7 @@
 
     if (m_param->analysisMultiPassRefine || m_param->analysisMultiPassDistortion)
     {
-        const char* name = m_param->analysisFileName;
+        const char* name = m_param->analysisReuseFileName;
         if (!name)
             name = defaultAnalysisFileName;
         if (m_param->rc.bStatWrite)
@@ -431,6 +426,10 @@
 
 void Encoder::destroy()
 {
+#if ENABLE_HDR10_PLUS
+    m_hdr10plus_api->hdr10plus_clear_movie(cim, numCimInfo);
+#endif
+        
     if (m_exportedPic)
     {
         ATOMIC_DEC(&m_exportedPic->m_countRefEncoders);
@@ -482,7 +481,7 @@
     {
         int bError = 1;
         fclose(m_analysisFileOut);
-        const char* name = m_param->analysisFileName;
+        const char* name = m_param->analysisReuseFileName;
         if (!name)
             name = defaultAnalysisFileName;
         char* temp = strcatFilename(name, ".temp");
@@ -499,11 +498,14 @@
      }
     if (m_param)
     {
+        if (m_param->csvfpt)
+            fclose(m_param->csvfpt);
         /* release string arguments that were strdup'd */
         free((char*)m_param->rc.lambdaFileName);
         free((char*)m_param->rc.statFileName);
-        free((char*)m_param->analysisFileName);
+        free((char*)m_param->analysisReuseFileName);
         free((char*)m_param->scalingLists);
+        free((char*)m_param->csvfn);
         free((char*)m_param->numaPools);
         free((char*)m_param->masteringDisplayColorVolume);
         free((char*)m_param->toneMapFile);
@@ -518,7 +520,7 @@
         FrameEncoder *encoder = m_frameEncoder[i];
         if (encoder->m_rce.isActive && encoder->m_rce.poc != rc->m_curSlice->m_poc)
         {
-            int64_t bits = (int64_t) X265_MAX(encoder->m_rce.frameSizeEstimated, encoder->m_rce.frameSizePlanned);
+            int64_t bits = m_param->rc.bEnableConstVbv ? (int64_t)encoder->m_rce.frameSizePlanned : (int64_t)X265_MAX(encoder->m_rce.frameSizeEstimated, encoder->m_rce.frameSizePlanned);
             rc->m_bufferFill -= bits;
             rc->m_bufferFill = X265_MAX(rc->m_bufferFill, 0);
             rc->m_bufferFill += encoder->m_rce.bufferRate;
@@ -593,6 +595,8 @@
 
     if (m_exportedPic)
     {
+        if (!m_param->bUseAnalysisFile && m_param->analysisReuseMode == X265_ANALYSIS_SAVE)
+            freeAnalysis(&m_exportedPic->m_analysisData);
         ATOMIC_DEC(&m_exportedPic->m_countRefEncoders);
         m_exportedPic = NULL;
         m_dpb->recycleUnreferenced();
@@ -601,16 +605,22 @@
     {
         x265_sei_payload toneMap;
         toneMap.payload = NULL;
-#if ENABLE_DYNAMIC_HDR10
+#if ENABLE_HDR10_PLUS
         if (m_bToneMap)
         {
-            uint8_t *cim = NULL;
-            if (m_hdr10plus_api->hdr10plus_json_to_frame_cim(m_param->toneMapFile, pic_in->poc, cim))
-            {
-                toneMap.payload = (uint8_t*)x265_malloc(sizeof(uint8_t) * cim[0]);
-                toneMap.payloadSize = cim[0];
+            if (pic_in->poc == 0)
+                numCimInfo = m_hdr10plus_api->hdr10plus_json_to_movie_cim(m_param->toneMapFile, cim);
+            if (pic_in->poc < numCimInfo)
+            {
+                int32_t i = 0;
+                toneMap.payloadSize = 0;
+                while (cim[pic_in->poc][i] == 0xFF)
+                    toneMap.payloadSize += cim[pic_in->poc][i++];
+                toneMap.payloadSize += cim[pic_in->poc][i++];
+
+                toneMap.payload = (uint8_t*)x265_malloc(sizeof(uint8_t) * toneMap.payloadSize);
                 toneMap.payloadType = USER_DATA_REGISTERED_ITU_T_T35;
-                memcpy(toneMap.payload, cim, toneMap.payloadSize);
+                memcpy(toneMap.payload, cim[pic_in->poc] + i, toneMap.payloadSize);
             }
         }
 #endif
@@ -708,7 +718,7 @@
             for (int i = 0; i < numPayloads; i++)
             {
                 x265_sei_payload input;
-                if (i == (numPayloads - 1))
+                if ((i == (numPayloads - 1)) && toneMapEnable)
                     input = toneMap;
                 else
                     input = pic_in->userSEI.payloads[i];
@@ -754,24 +764,40 @@
 
         /* In analysisSave mode, x265_analysis_data is allocated in pic_in and inFrame points to this */
         /* Load analysis data before lookahead->addPicture, since sliceType has been decided */
-        if (m_param->analysisMode == X265_ANALYSIS_LOAD)
+        if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD)
         {
-            x265_picture* inputPic = const_cast<x265_picture*>(pic_in);
             /* readAnalysisFile reads analysis data for the frame and allocates memory based on slicetype */
-            readAnalysisFile(&inputPic->analysisData, inFrame->m_poc);
-            inFrame->m_analysisData.poc = inFrame->m_poc;
-            inFrame->m_analysisData.sliceType = inputPic->analysisData.sliceType;
-            inFrame->m_analysisData.bScenecut = inputPic->analysisData.bScenecut;
-            inFrame->m_analysisData.satdCost = inputPic->analysisData.satdCost;
-            inFrame->m_analysisData.numCUsInFrame = inputPic->analysisData.numCUsInFrame;
-            inFrame->m_analysisData.numPartitions = inputPic->analysisData.numPartitions;
-            inFrame->m_analysisData.wt = inputPic->analysisData.wt;
-            inFrame->m_analysisData.interData = inputPic->analysisData.interData;
-            inFrame->m_analysisData.intraData = inputPic->analysisData.intraData;
-            sliceType = inputPic->analysisData.sliceType;
+            readAnalysisFile(&inFrame->m_analysisData, inFrame->m_poc, pic_in);
+            sliceType = inFrame->m_analysisData.sliceType;
​

x265_2.4.tar.gz/source/encoder/encoder.h -> x265_2.5.tar.gz/source/encoder/encoder.h Changed

@@ -31,11 +31,9 @@
 #include "x265.h"
 #include "nal.h"
 #include "framedata.h"
-
-#ifdef ENABLE_DYNAMIC_HDR10
-    #include "dynamicHDR10\hdr10plus.h"
+#ifdef ENABLE_HDR10_PLUS
+    #include "dynamicHDR10/hdr10plus.h"
 #endif
-
 struct x265_encoder {};
 namespace X265_NS {
 // private namespace
@@ -178,8 +176,10 @@
 
     int                     m_bToneMap; // Enables tone-mapping
 
-#ifdef ENABLE_DYNAMIC_HDR10
+#ifdef ENABLE_HDR10_PLUS
     const hdr10plus_api     *m_hdr10plus_api;
+    uint8_t                 **cim;
+    int                     numCimInfo;
 #endif
 
     x265_sei_payload        m_prevTonemapPayload;
@@ -187,7 +187,7 @@
     Encoder();
     ~Encoder()
     {
-#ifdef ENABLE_DYNAMIC_HDR10
+#ifdef ENABLE_HDR10_PLUS
         if (m_prevTonemapPayload.payload != NULL)
             X265_FREE(m_prevTonemapPayload.payload);
 #endif
@@ -201,6 +201,8 @@
 
     int reconfigureParam(x265_param* encParam, x265_param* param);
 
+    void copyCtuInfo(x265_ctu_info_t** frameCtuInfo, int poc);
+
     void getStreamHeaders(NALList& list, Entropy& sbacCoder, Bitstream& bs);
 
     void fetchStats(x265_stats* stats, size_t statsSizeBytes);
@@ -223,7 +225,7 @@
 
     void freeAnalysis2Pass(x265_analysis_2Pass* analysis, int sliceType);
 
-    void readAnalysisFile(x265_analysis_data* analysis, int poc);
+    void readAnalysisFile(x265_analysis_data* analysis, int poc, const x265_picture* picIn);
 
     void writeAnalysisFile(x265_analysis_data* pic, FrameData &curEncData);
     void readAnalysis2PassFile(x265_analysis_2Pass* analysis2Pass, int poc, int sliceType);

 
@@ -31,11 +31,9 @@
 #include "x265.h"
 #include "nal.h"
 #include "framedata.h"
-
-#ifdef ENABLE_DYNAMIC_HDR10
-    #include "dynamicHDR10\hdr10plus.h"
+#ifdef ENABLE_HDR10_PLUS
+    #include "dynamicHDR10/hdr10plus.h"
 #endif
-
 struct x265_encoder {};
 namespace X265_NS {
 // private namespace
@@ -178,8 +176,10 @@
 
     int                     m_bToneMap; // Enables tone-mapping
 
-#ifdef ENABLE_DYNAMIC_HDR10
+#ifdef ENABLE_HDR10_PLUS
     const hdr10plus_api     *m_hdr10plus_api;
+    uint8_t                 **cim;
+    int                     numCimInfo;
 #endif
 
     x265_sei_payload        m_prevTonemapPayload;
@@ -187,7 +187,7 @@
     Encoder();
     ~Encoder()
     {
-#ifdef ENABLE_DYNAMIC_HDR10
+#ifdef ENABLE_HDR10_PLUS
         if (m_prevTonemapPayload.payload != NULL)
             X265_FREE(m_prevTonemapPayload.payload);
 #endif
@@ -201,6 +201,8 @@
 
     int reconfigureParam(x265_param* encParam, x265_param* param);
 
+    void copyCtuInfo(x265_ctu_info_t** frameCtuInfo, int poc);
+
     void getStreamHeaders(NALList& list, Entropy& sbacCoder, Bitstream& bs);
 
     void fetchStats(x265_stats* stats, size_t statsSizeBytes);
@@ -223,7 +225,7 @@
 
     void freeAnalysis2Pass(x265_analysis_2Pass* analysis, int sliceType);
 
-    void readAnalysisFile(x265_analysis_data* analysis, int poc);
+    void readAnalysisFile(x265_analysis_data* analysis, int poc, const x265_picture* picIn);
 
     void writeAnalysisFile(x265_analysis_data* pic, FrameData &curEncData);
     void readAnalysis2PassFile(x265_analysis_2Pass* analysis2Pass, int poc, int sliceType);
​

x265_2.4.tar.gz/source/encoder/entropy.cpp -> x265_2.5.tar.gz/source/encoder/entropy.cpp Changed

@@ -700,7 +700,7 @@
     // TODO: Enable when pps_loop_filter_across_slices_enabled_flag==1
     //       We didn't support filter across slice board, so disable it now
 
-    if (g_maxSlices <= 1)
+    if (encData.m_param->maxSlices <= 1)
     {
         bool isSAOEnabled = slice.m_sps->bUseSAO ? saoParam->bSaoFlag[0] || saoParam->bSaoFlag[1] : false;
         bool isDBFEnabled = !slice.m_pps->bPicDisableDeblockingFilter;
@@ -783,7 +783,7 @@
     if (cuSplitFlag) 
         codeSplitFlag(ctu, absPartIdx, depth);
 
-    if (depth < ctu.m_cuDepth[absPartIdx] && depth < g_maxCUDepth)
+    if (depth < ctu.m_cuDepth[absPartIdx] && depth < ctu.m_encData->m_param->maxCUDepth)
     {
         uint32_t qNumParts = cuGeom.numPartitions >> 2;
         if (depth == slice->m_pps->maxCuDQPDepth && slice->m_pps->bUseDQP)
@@ -863,7 +863,7 @@
     case SIZE_nRx2N:
         bits += bitsCodeBin(0, m_contextState[OFF_PART_SIZE_CTX + 0]);
         bits += bitsCodeBin(0, m_contextState[OFF_PART_SIZE_CTX + 1]);
-        if (depth == g_maxCUDepth && !(cu.m_log2CUSize[absPartIdx] == 3))
+        if (depth == cu.m_encData->m_param->maxCUDepth && !(cu.m_log2CUSize[absPartIdx] == 3))
             bits += bitsCodeBin(1, m_contextState[OFF_PART_SIZE_CTX + 2]);
         if (cu.m_slice->m_sps->maxAMPDepth > depth)
         {
@@ -888,7 +888,7 @@
     uint32_t cuAddr = ctu.getSCUAddr() + absPartIdx;
     X265_CHECK(realEndAddress == slice->realEndAddress(slice->m_endCUAddr), "real end address expected\n");
 
-    uint32_t granularityMask = g_maxCUSize - 1;
+    uint32_t granularityMask = ctu.m_encData->m_param->maxCUSize - 1;
     uint32_t cuSize = 1 << ctu.m_log2CUSize[absPartIdx];
     uint32_t rpelx = ctu.m_cuPelX + g_zscanToPelX[absPartIdx] + cuSize;
     uint32_t bpely = ctu.m_cuPelY + g_zscanToPelY[absPartIdx] + cuSize;
@@ -902,7 +902,7 @@
     {
         // Encode slice finish
         uint32_t bTerminateSlice = ctu.m_bLastCuInSlice;
-        if (cuAddr + (NUM_4x4_PARTITIONS >> (depth << 1)) == realEndAddress)
+        if (cuAddr + (slice->m_param->num4x4Partitions >> (depth << 1)) == realEndAddress)
             bTerminateSlice = 1;
 
         // The 1-terminating bit is added to all streams, so don't add it here when it's 1.
@@ -1512,7 +1512,7 @@
 
     if (cu.isIntra(absPartIdx))
     {
-        if (depth == g_maxCUDepth)
+        if (depth == cu.m_encData->m_param->maxCUDepth)
             encodeBin(partSize == SIZE_2Nx2N ? 1 : 0, m_contextState[OFF_PART_SIZE_CTX]);
         return;
     }
@@ -1541,7 +1541,7 @@
     case SIZE_nRx2N:
         encodeBin(0, m_contextState[OFF_PART_SIZE_CTX + 0]);
         encodeBin(0, m_contextState[OFF_PART_SIZE_CTX + 1]);
-        if (depth == g_maxCUDepth && !(cu.m_log2CUSize[absPartIdx] == 3))
+        if (depth == cu.m_encData->m_param->maxCUDepth && !(cu.m_log2CUSize[absPartIdx] == 3))
             encodeBin(1, m_contextState[OFF_PART_SIZE_CTX + 2]);
         if (cu.m_slice->m_sps->maxAMPDepth > depth)
         {

 
@@ -700,7 +700,7 @@
     // TODO: Enable when pps_loop_filter_across_slices_enabled_flag==1
     //       We didn't support filter across slice board, so disable it now
 
-    if (g_maxSlices <= 1)
+    if (encData.m_param->maxSlices <= 1)
     {
         bool isSAOEnabled = slice.m_sps->bUseSAO ? saoParam->bSaoFlag[0] || saoParam->bSaoFlag[1] : false;
         bool isDBFEnabled = !slice.m_pps->bPicDisableDeblockingFilter;
@@ -783,7 +783,7 @@
     if (cuSplitFlag) 
         codeSplitFlag(ctu, absPartIdx, depth);
 
-    if (depth < ctu.m_cuDepth[absPartIdx] && depth < g_maxCUDepth)
+    if (depth < ctu.m_cuDepth[absPartIdx] && depth < ctu.m_encData->m_param->maxCUDepth)
     {
         uint32_t qNumParts = cuGeom.numPartitions >> 2;
         if (depth == slice->m_pps->maxCuDQPDepth && slice->m_pps->bUseDQP)
@@ -863,7 +863,7 @@
     case SIZE_nRx2N:
         bits += bitsCodeBin(0, m_contextState[OFF_PART_SIZE_CTX + 0]);
         bits += bitsCodeBin(0, m_contextState[OFF_PART_SIZE_CTX + 1]);
-        if (depth == g_maxCUDepth && !(cu.m_log2CUSize[absPartIdx] == 3))
+        if (depth == cu.m_encData->m_param->maxCUDepth && !(cu.m_log2CUSize[absPartIdx] == 3))
             bits += bitsCodeBin(1, m_contextState[OFF_PART_SIZE_CTX + 2]);
         if (cu.m_slice->m_sps->maxAMPDepth > depth)
         {
@@ -888,7 +888,7 @@
     uint32_t cuAddr = ctu.getSCUAddr() + absPartIdx;
     X265_CHECK(realEndAddress == slice->realEndAddress(slice->m_endCUAddr), "real end address expected\n");
 
-    uint32_t granularityMask = g_maxCUSize - 1;
+    uint32_t granularityMask = ctu.m_encData->m_param->maxCUSize - 1;
     uint32_t cuSize = 1 << ctu.m_log2CUSize[absPartIdx];
     uint32_t rpelx = ctu.m_cuPelX + g_zscanToPelX[absPartIdx] + cuSize;
     uint32_t bpely = ctu.m_cuPelY + g_zscanToPelY[absPartIdx] + cuSize;
@@ -902,7 +902,7 @@
     {
         // Encode slice finish
         uint32_t bTerminateSlice = ctu.m_bLastCuInSlice;
-        if (cuAddr + (NUM_4x4_PARTITIONS >> (depth << 1)) == realEndAddress)
+        if (cuAddr + (slice->m_param->num4x4Partitions >> (depth << 1)) == realEndAddress)
             bTerminateSlice = 1;
 
         // The 1-terminating bit is added to all streams, so don't add it here when it's 1.
@@ -1512,7 +1512,7 @@
 
     if (cu.isIntra(absPartIdx))
     {
-        if (depth == g_maxCUDepth)
+        if (depth == cu.m_encData->m_param->maxCUDepth)
             encodeBin(partSize == SIZE_2Nx2N ? 1 : 0, m_contextState[OFF_PART_SIZE_CTX]);
         return;
     }
@@ -1541,7 +1541,7 @@
     case SIZE_nRx2N:
         encodeBin(0, m_contextState[OFF_PART_SIZE_CTX + 0]);
         encodeBin(0, m_contextState[OFF_PART_SIZE_CTX + 1]);
-        if (depth == g_maxCUDepth && !(cu.m_log2CUSize[absPartIdx] == 3))
+        if (depth == cu.m_encData->m_param->maxCUDepth && !(cu.m_log2CUSize[absPartIdx] == 3))
             encodeBin(1, m_contextState[OFF_PART_SIZE_CTX + 2]);
         if (cu.m_slice->m_sps->maxAMPDepth > depth)
         {
​

x265_2.4.tar.gz/source/encoder/frameencoder.cpp -> x265_2.5.tar.gz/source/encoder/frameencoder.cpp Changed

@@ -124,7 +124,7 @@
     range += !!(m_param->searchMethod < 2);  /* diamond/hex range check lag */
     range += NTAPS_LUMA / 2;                 /* subpel filter half-length */
     range += 2 + (MotionEstimate::hpelIterationCount(m_param->subpelRefine) + 1) / 2; /* subpel refine steps */
-    m_refLagRows = /*(m_param->maxSlices > 1 ? 1 : 0) +*/ 1 + ((range + g_maxCUSize - 1) / g_maxCUSize);
+    m_refLagRows = /*(m_param->maxSlices > 1 ? 1 : 0) +*/ 1 + ((range + m_param->maxCUSize - 1) / m_param->maxCUSize);
 
     // NOTE: 2 times of numRows because both Encoder and Filter in same queue
     if (!WaveFront::init(m_numRows * 2))
@@ -295,6 +295,11 @@
 
     while (m_threadActive)
     {
+        if (m_param->bCTUInfo)
+        {
+            while (!m_frame->m_ctuInfo)
+                m_frame->m_copied.wait();
+        }
         compressFrame();
         m_done.trigger(); /* FrameEncoder::getEncodedPicture() blocks for this event */
         m_enable.wait();
@@ -383,7 +388,7 @@
     bool bUseWeightB = slice->m_sliceType == B_SLICE && slice->m_pps->bUseWeightedBiPred;
 
     WeightParam* reuseWP = NULL;
-    if (m_param->analysisMode && (bUseWeightP || bUseWeightB))
+    if (m_param->analysisReuseMode && (bUseWeightP || bUseWeightB))
         reuseWP = (WeightParam*)m_frame->m_analysisData.wt;
 
     if (bUseWeightP || bUseWeightB)
@@ -392,7 +397,7 @@
         m_cuStats.countWeightAnalyze++;
         ScopedElapsedTime time(m_cuStats.weightAnalyzeTime);
 #endif
-        if (m_param->analysisMode == X265_ANALYSIS_LOAD)
+        if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD)
         {
             for (int list = 0; list < slice->isInterB() + 1; list++) 
             {
@@ -431,7 +436,7 @@
             slice->m_refReconPicList[l][ref] = slice->m_refFrameList[l][ref]->m_reconPic;
             m_mref[l][ref].init(slice->m_refReconPicList[l][ref], w, *m_param);
         }
-        if (m_param->analysisMode == X265_ANALYSIS_SAVE && (bUseWeightP || bUseWeightB))
+        if (m_param->analysisReuseMode == X265_ANALYSIS_SAVE && (bUseWeightP || bUseWeightB))
         {
             for (int i = 0; i < (m_param->internalCsp != X265_CSP_I400 ? 3 : 1); i++)
                 *(reuseWP++) = slice->m_weightPredTable[l][0][i];
@@ -664,7 +669,7 @@
             if (writeSei)
             {
                 SEICreativeIntentMeta sei;
-                sei.cim = payload->payload;
+                sei.m_payload = payload->payload;
                 m_bs.resetBits();
                 sei.setSize(payload->payloadSize);
                 sei.write(m_bs, *slice->m_sps);
@@ -832,7 +837,7 @@
         }
         else if (m_param->decodedPictureHashSEI == 3)
         {
-            uint32_t cuHeight = g_maxCUSize;
+            uint32_t cuHeight = m_param->maxCUSize;
 
             m_checksum[0] = 0;
 
@@ -872,43 +877,52 @@
         m_frame->m_encData->m_frameStats.percent8x8Inter = (double)totalP / totalCuCount;
         m_frame->m_encData->m_frameStats.percent8x8Skip  = (double)totalSkip / totalCuCount;
     }
-    for (uint32_t i = 0; i < m_numRows; i++)
+
+    if (m_param->csvLogLevel >= 1)
     {
-        m_frame->m_encData->m_frameStats.cntIntraNxN      += m_rows[i].rowStats.cntIntraNxN;
-        m_frame->m_encData->m_frameStats.totalCu          += m_rows[i].rowStats.totalCu;
-        m_frame->m_encData->m_frameStats.totalCtu         += m_rows[i].rowStats.totalCtu;
-        m_frame->m_encData->m_frameStats.lumaDistortion   += m_rows[i].rowStats.lumaDistortion;
-        m_frame->m_encData->m_frameStats.chromaDistortion += m_rows[i].rowStats.chromaDistortion;
-        m_frame->m_encData->m_frameStats.psyEnergy        += m_rows[i].rowStats.psyEnergy;
-        m_frame->m_encData->m_frameStats.ssimEnergy       += m_rows[i].rowStats.ssimEnergy;
-        m_frame->m_encData->m_frameStats.resEnergy        += m_rows[i].rowStats.resEnergy;
-        for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+        for (uint32_t i = 0; i < m_numRows; i++)
         {
-            m_frame->m_encData->m_frameStats.cntSkipCu[depth] += m_rows[i].rowStats.cntSkipCu[depth];
-            m_frame->m_encData->m_frameStats.cntMergeCu[depth] += m_rows[i].rowStats.cntMergeCu[depth];
-            for (int m = 0; m < INTER_MODES; m++)
-                m_frame->m_encData->m_frameStats.cuInterDistribution[depth][m] += m_rows[i].rowStats.cuInterDistribution[depth][m];
+            m_frame->m_encData->m_frameStats.cntIntraNxN += m_rows[i].rowStats.cntIntraNxN;
+            m_frame->m_encData->m_frameStats.totalCu += m_rows[i].rowStats.totalCu;
+            m_frame->m_encData->m_frameStats.totalCtu += m_rows[i].rowStats.totalCtu;
+            m_frame->m_encData->m_frameStats.lumaDistortion += m_rows[i].rowStats.lumaDistortion;
+            m_frame->m_encData->m_frameStats.chromaDistortion += m_rows[i].rowStats.chromaDistortion;
+            m_frame->m_encData->m_frameStats.psyEnergy += m_rows[i].rowStats.psyEnergy;
+            m_frame->m_encData->m_frameStats.ssimEnergy += m_rows[i].rowStats.ssimEnergy;
+            m_frame->m_encData->m_frameStats.resEnergy += m_rows[i].rowStats.resEnergy;
+            for (uint32_t depth = 0; depth <= m_param->maxCUDepth; depth++)
+            {
+                m_frame->m_encData->m_frameStats.cntSkipCu[depth] += m_rows[i].rowStats.cntSkipCu[depth];
+                m_frame->m_encData->m_frameStats.cntMergeCu[depth] += m_rows[i].rowStats.cntMergeCu[depth];
+                for (int m = 0; m < INTER_MODES; m++)
+                    m_frame->m_encData->m_frameStats.cuInterDistribution[depth][m] += m_rows[i].rowStats.cuInterDistribution[depth][m];
+                for (int n = 0; n < INTRA_MODES; n++)
+                    m_frame->m_encData->m_frameStats.cuIntraDistribution[depth][n] += m_rows[i].rowStats.cuIntraDistribution[depth][n];
+            }
+        }
+        m_frame->m_encData->m_frameStats.percentIntraNxN = (double)(m_frame->m_encData->m_frameStats.cntIntraNxN * 100) / m_frame->m_encData->m_frameStats.totalCu;
+
+        for (uint32_t depth = 0; depth <= m_param->maxCUDepth; depth++)
+        {
+            m_frame->m_encData->m_frameStats.percentSkipCu[depth] = (double)(m_frame->m_encData->m_frameStats.cntSkipCu[depth] * 100) / m_frame->m_encData->m_frameStats.totalCu;
+            m_frame->m_encData->m_frameStats.percentMergeCu[depth] = (double)(m_frame->m_encData->m_frameStats.cntMergeCu[depth] * 100) / m_frame->m_encData->m_frameStats.totalCu;
             for (int n = 0; n < INTRA_MODES; n++)
-                m_frame->m_encData->m_frameStats.cuIntraDistribution[depth][n] += m_rows[i].rowStats.cuIntraDistribution[depth][n];
+                m_frame->m_encData->m_frameStats.percentIntraDistribution[depth][n] = (double)(m_frame->m_encData->m_frameStats.cuIntraDistribution[depth][n] * 100) / m_frame->m_encData->m_frameStats.totalCu;
+            uint64_t cuInterRectCnt = 0; // sum of Nx2N, 2NxN counts
+            cuInterRectCnt += m_frame->m_encData->m_frameStats.cuInterDistribution[depth][1] + m_frame->m_encData->m_frameStats.cuInterDistribution[depth][2];
+            m_frame->m_encData->m_frameStats.percentInterDistribution[depth][0] = (double)(m_frame->m_encData->m_frameStats.cuInterDistribution[depth][0] * 100) / m_frame->m_encData->m_frameStats.totalCu;
+            m_frame->m_encData->m_frameStats.percentInterDistribution[depth][1] = (double)(cuInterRectCnt * 100) / m_frame->m_encData->m_frameStats.totalCu;
+            m_frame->m_encData->m_frameStats.percentInterDistribution[depth][2] = (double)(m_frame->m_encData->m_frameStats.cuInterDistribution[depth][3] * 100) / m_frame->m_encData->m_frameStats.totalCu;
         }
     }
-    m_frame->m_encData->m_frameStats.avgLumaDistortion   = (double)(m_frame->m_encData->m_frameStats.lumaDistortion) / m_frame->m_encData->m_frameStats.totalCtu;
-    m_frame->m_encData->m_frameStats.avgChromaDistortion = (double)(m_frame->m_encData->m_frameStats.chromaDistortion) / m_frame->m_encData->m_frameStats.totalCtu;
-    m_frame->m_encData->m_frameStats.avgPsyEnergy        = (double)(m_frame->m_encData->m_frameStats.psyEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
-    m_frame->m_encData->m_frameStats.avgSsimEnergy       = (double)(m_frame->m_encData->m_frameStats.ssimEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
-    m_frame->m_encData->m_frameStats.avgResEnergy        = (double)(m_frame->m_encData->m_frameStats.resEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
-    m_frame->m_encData->m_frameStats.percentIntraNxN     = (double)(m_frame->m_encData->m_frameStats.cntIntraNxN * 100) / m_frame->m_encData->m_frameStats.totalCu;
-    for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+
+    if (m_param->csvLogLevel >= 2)
     {
-        m_frame->m_encData->m_frameStats.percentSkipCu[depth]  = (double)(m_frame->m_encData->m_frameStats.cntSkipCu[depth] * 100) / m_frame->m_encData->m_frameStats.totalCu;
-        m_frame->m_encData->m_frameStats.percentMergeCu[depth] = (double)(m_frame->m_encData->m_frameStats.cntMergeCu[depth] * 100) / m_frame->m_encData->m_frameStats.totalCu;
-        for (int n = 0; n < INTRA_MODES; n++)
-            m_frame->m_encData->m_frameStats.percentIntraDistribution[depth][n] = (double)(m_frame->m_encData->m_frameStats.cuIntraDistribution[depth][n] * 100) / m_frame->m_encData->m_frameStats.totalCu;
-        uint64_t cuInterRectCnt = 0; // sum of Nx2N, 2NxN counts
-        cuInterRectCnt += m_frame->m_encData->m_frameStats.cuInterDistribution[depth][1] + m_frame->m_encData->m_frameStats.cuInterDistribution[depth][2];
-        m_frame->m_encData->m_frameStats.percentInterDistribution[depth][0] = (double)(m_frame->m_encData->m_frameStats.cuInterDistribution[depth][0] * 100) / m_frame->m_encData->m_frameStats.totalCu;
-        m_frame->m_encData->m_frameStats.percentInterDistribution[depth][1] = (double)(cuInterRectCnt * 100) / m_frame->m_encData->m_frameStats.totalCu;
-        m_frame->m_encData->m_frameStats.percentInterDistribution[depth][2] = (double)(m_frame->m_encData->m_frameStats.cuInterDistribution[depth][3] * 100) / m_frame->m_encData->m_frameStats.totalCu;
+        m_frame->m_encData->m_frameStats.avgLumaDistortion = (double)(m_frame->m_encData->m_frameStats.lumaDistortion) / m_frame->m_encData->m_frameStats.totalCtu;
+        m_frame->m_encData->m_frameStats.avgChromaDistortion = (double)(m_frame->m_encData->m_frameStats.chromaDistortion) / m_frame->m_encData->m_frameStats.totalCtu;
+        m_frame->m_encData->m_frameStats.avgPsyEnergy = (double)(m_frame->m_encData->m_frameStats.psyEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
+        m_frame->m_encData->m_frameStats.avgSsimEnergy = (double)(m_frame->m_encData->m_frameStats.ssimEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
+        m_frame->m_encData->m_frameStats.avgResEnergy = (double)(m_frame->m_encData->m_frameStats.resEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
     }
 
     m_bs.resetBits();
@@ -1096,7 +1110,7 @@
     /* Accumulate CU statistics from each worker thread, we could report
      * per-frame stats here, but currently we do not. */
     for (int i = 0; i < numTLD; i++)
-        m_cuStats.accumulate(m_tld[i].analysis.m_stats[m_jpId]);
+        m_cuStats.accumulate(m_tld[i].analysis.m_stats[m_jpId], *m_param);
 #endif
 
     m_endFrameTime = x265_mdate();
@@ -1106,7 +1120,7 @@
 {
     Slice* slice = m_frame->m_encData->m_slice;
     const uint32_t widthInLCUs = slice->m_sps->numCuInWidth;
-    const uint32_t lastCUAddr = (slice->m_endCUAddr + NUM_4x4_PARTITIONS - 1) / NUM_4x4_PARTITIONS;
+    const uint32_t lastCUAddr = (slice->m_endCUAddr + m_param->num4x4Partitions - 1) / m_param->num4x4Partitions;
     const uint32_t numSubstreams = m_param->bEnableWavefront ? slice->m_sps->numCuInHeight : 1;
 
     SAOParam* saoParam = slice->m_sps->bUseSAO ? m_frame->m_encData->m_saoParam : NULL;
@@ -1208,7 +1222,6 @@
     const uint32_t row = (uint32_t)intRow;
     CTURow& curRow = m_rows[row];
 
-    tld.analysis.m_param = m_param;
     if (m_param->bEnableWavefront)
     {
         ScopedLock self(curRow.lock);
@@ -1241,7 +1254,7 @@
 
     uint32_t maxBlockCols = (m_frame->m_fencPic->m_picWidth + (16 - 1)) / 16;
     uint32_t maxBlockRows = (m_frame->m_fencPic->m_picHeight + (16 - 1)) / 16;
-    uint32_t noOfBlocks = g_maxCUSize / 16;
+    uint32_t noOfBlocks = m_param->maxCUSize / 16;
     const uint32_t bFirstRowInSlice = ((row == 0) || (m_rows[row - 1].sliceId != curRow.sliceId)) ? 1 : 0;
     const uint32_t bLastRowInSlice = ((row == m_numRows - 1) || (m_rows[row + 1].sliceId != curRow.sliceId)) ? 1 : 0;
     const uint32_t sliceId = curRow.sliceId;
@@ -1320,8 +1333,8 @@
     // TODO: specially case handle on first and last row
 
     // Initialize restrict on MV range in slices
-    tld.analysis.m_sliceMinY = -(int16_t)(rowInSlice * g_maxCUSize * 4) + 3 * 4;
-    tld.analysis.m_sliceMaxY = (int16_t)((endRowInSlicePlus1 - 1 - row) * (g_maxCUSize * 4) - 4 * 4);
+    tld.analysis.m_sliceMinY = -(int16_t)(rowInSlice * m_param->maxCUSize * 4) + 3 * 4;
+    tld.analysis.m_sliceMaxY = (int16_t)((endRowInSlicePlus1 - 1 - row) * (m_param->maxCUSize * 4) - 4 * 4);
 
     // Handle single row slice
     if (tld.analysis.m_sliceMaxY < tld.analysis.m_sliceMinY)
@@ -1361,8 +1374,8 @@
                 cuStat.baseQp = curEncData.m_rowStat[row].rowQp;
 
             /* TODO: use defines from slicetype.h for lowres block size */

 
@@ -124,7 +124,7 @@
     range += !!(m_param->searchMethod < 2);  /* diamond/hex range check lag */
     range += NTAPS_LUMA / 2;                 /* subpel filter half-length */
     range += 2 + (MotionEstimate::hpelIterationCount(m_param->subpelRefine) + 1) / 2; /* subpel refine steps */
-    m_refLagRows = /*(m_param->maxSlices > 1 ? 1 : 0) +*/ 1 + ((range + g_maxCUSize - 1) / g_maxCUSize);
+    m_refLagRows = /*(m_param->maxSlices > 1 ? 1 : 0) +*/ 1 + ((range + m_param->maxCUSize - 1) / m_param->maxCUSize);
 
     // NOTE: 2 times of numRows because both Encoder and Filter in same queue
     if (!WaveFront::init(m_numRows * 2))
@@ -295,6 +295,11 @@
 
     while (m_threadActive)
     {
+        if (m_param->bCTUInfo)
+        {
+            while (!m_frame->m_ctuInfo)
+                m_frame->m_copied.wait();
+        }
         compressFrame();
         m_done.trigger(); /* FrameEncoder::getEncodedPicture() blocks for this event */
         m_enable.wait();
@@ -383,7 +388,7 @@
     bool bUseWeightB = slice->m_sliceType == B_SLICE && slice->m_pps->bUseWeightedBiPred;
 
     WeightParam* reuseWP = NULL;
-    if (m_param->analysisMode && (bUseWeightP || bUseWeightB))
+    if (m_param->analysisReuseMode && (bUseWeightP || bUseWeightB))
         reuseWP = (WeightParam*)m_frame->m_analysisData.wt;
 
     if (bUseWeightP || bUseWeightB)
@@ -392,7 +397,7 @@
         m_cuStats.countWeightAnalyze++;
         ScopedElapsedTime time(m_cuStats.weightAnalyzeTime);
 #endif
-        if (m_param->analysisMode == X265_ANALYSIS_LOAD)
+        if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD)
         {
             for (int list = 0; list < slice->isInterB() + 1; list++) 
             {
@@ -431,7 +436,7 @@
             slice->m_refReconPicList[l][ref] = slice->m_refFrameList[l][ref]->m_reconPic;
             m_mref[l][ref].init(slice->m_refReconPicList[l][ref], w, *m_param);
         }
-        if (m_param->analysisMode == X265_ANALYSIS_SAVE && (bUseWeightP || bUseWeightB))
+        if (m_param->analysisReuseMode == X265_ANALYSIS_SAVE && (bUseWeightP || bUseWeightB))
         {
             for (int i = 0; i < (m_param->internalCsp != X265_CSP_I400 ? 3 : 1); i++)
                 *(reuseWP++) = slice->m_weightPredTable[l][0][i];
@@ -664,7 +669,7 @@
             if (writeSei)
             {
                 SEICreativeIntentMeta sei;
-                sei.cim = payload->payload;
+                sei.m_payload = payload->payload;
                 m_bs.resetBits();
                 sei.setSize(payload->payloadSize);
                 sei.write(m_bs, *slice->m_sps);
@@ -832,7 +837,7 @@
         }
         else if (m_param->decodedPictureHashSEI == 3)
         {
-            uint32_t cuHeight = g_maxCUSize;
+            uint32_t cuHeight = m_param->maxCUSize;
 
             m_checksum[0] = 0;
 
@@ -872,43 +877,52 @@
         m_frame->m_encData->m_frameStats.percent8x8Inter = (double)totalP / totalCuCount;
         m_frame->m_encData->m_frameStats.percent8x8Skip  = (double)totalSkip / totalCuCount;
     }
-    for (uint32_t i = 0; i < m_numRows; i++)
+
+    if (m_param->csvLogLevel >= 1)
     {
-        m_frame->m_encData->m_frameStats.cntIntraNxN      += m_rows[i].rowStats.cntIntraNxN;
-        m_frame->m_encData->m_frameStats.totalCu          += m_rows[i].rowStats.totalCu;
-        m_frame->m_encData->m_frameStats.totalCtu         += m_rows[i].rowStats.totalCtu;
-        m_frame->m_encData->m_frameStats.lumaDistortion   += m_rows[i].rowStats.lumaDistortion;
-        m_frame->m_encData->m_frameStats.chromaDistortion += m_rows[i].rowStats.chromaDistortion;
-        m_frame->m_encData->m_frameStats.psyEnergy        += m_rows[i].rowStats.psyEnergy;
-        m_frame->m_encData->m_frameStats.ssimEnergy       += m_rows[i].rowStats.ssimEnergy;
-        m_frame->m_encData->m_frameStats.resEnergy        += m_rows[i].rowStats.resEnergy;
-        for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+        for (uint32_t i = 0; i < m_numRows; i++)
         {
-            m_frame->m_encData->m_frameStats.cntSkipCu[depth] += m_rows[i].rowStats.cntSkipCu[depth];
-            m_frame->m_encData->m_frameStats.cntMergeCu[depth] += m_rows[i].rowStats.cntMergeCu[depth];
-            for (int m = 0; m < INTER_MODES; m++)
-                m_frame->m_encData->m_frameStats.cuInterDistribution[depth][m] += m_rows[i].rowStats.cuInterDistribution[depth][m];
+            m_frame->m_encData->m_frameStats.cntIntraNxN += m_rows[i].rowStats.cntIntraNxN;
+            m_frame->m_encData->m_frameStats.totalCu += m_rows[i].rowStats.totalCu;
+            m_frame->m_encData->m_frameStats.totalCtu += m_rows[i].rowStats.totalCtu;
+            m_frame->m_encData->m_frameStats.lumaDistortion += m_rows[i].rowStats.lumaDistortion;
+            m_frame->m_encData->m_frameStats.chromaDistortion += m_rows[i].rowStats.chromaDistortion;
+            m_frame->m_encData->m_frameStats.psyEnergy += m_rows[i].rowStats.psyEnergy;
+            m_frame->m_encData->m_frameStats.ssimEnergy += m_rows[i].rowStats.ssimEnergy;
+            m_frame->m_encData->m_frameStats.resEnergy += m_rows[i].rowStats.resEnergy;
+            for (uint32_t depth = 0; depth <= m_param->maxCUDepth; depth++)
+            {
+                m_frame->m_encData->m_frameStats.cntSkipCu[depth] += m_rows[i].rowStats.cntSkipCu[depth];
+                m_frame->m_encData->m_frameStats.cntMergeCu[depth] += m_rows[i].rowStats.cntMergeCu[depth];
+                for (int m = 0; m < INTER_MODES; m++)
+                    m_frame->m_encData->m_frameStats.cuInterDistribution[depth][m] += m_rows[i].rowStats.cuInterDistribution[depth][m];
+                for (int n = 0; n < INTRA_MODES; n++)
+                    m_frame->m_encData->m_frameStats.cuIntraDistribution[depth][n] += m_rows[i].rowStats.cuIntraDistribution[depth][n];
+            }
+        }
+        m_frame->m_encData->m_frameStats.percentIntraNxN = (double)(m_frame->m_encData->m_frameStats.cntIntraNxN * 100) / m_frame->m_encData->m_frameStats.totalCu;
+
+        for (uint32_t depth = 0; depth <= m_param->maxCUDepth; depth++)
+        {
+            m_frame->m_encData->m_frameStats.percentSkipCu[depth] = (double)(m_frame->m_encData->m_frameStats.cntSkipCu[depth] * 100) / m_frame->m_encData->m_frameStats.totalCu;
+            m_frame->m_encData->m_frameStats.percentMergeCu[depth] = (double)(m_frame->m_encData->m_frameStats.cntMergeCu[depth] * 100) / m_frame->m_encData->m_frameStats.totalCu;
             for (int n = 0; n < INTRA_MODES; n++)
-                m_frame->m_encData->m_frameStats.cuIntraDistribution[depth][n] += m_rows[i].rowStats.cuIntraDistribution[depth][n];
+                m_frame->m_encData->m_frameStats.percentIntraDistribution[depth][n] = (double)(m_frame->m_encData->m_frameStats.cuIntraDistribution[depth][n] * 100) / m_frame->m_encData->m_frameStats.totalCu;
+            uint64_t cuInterRectCnt = 0; // sum of Nx2N, 2NxN counts
+            cuInterRectCnt += m_frame->m_encData->m_frameStats.cuInterDistribution[depth][1] + m_frame->m_encData->m_frameStats.cuInterDistribution[depth][2];
+            m_frame->m_encData->m_frameStats.percentInterDistribution[depth][0] = (double)(m_frame->m_encData->m_frameStats.cuInterDistribution[depth][0] * 100) / m_frame->m_encData->m_frameStats.totalCu;
+            m_frame->m_encData->m_frameStats.percentInterDistribution[depth][1] = (double)(cuInterRectCnt * 100) / m_frame->m_encData->m_frameStats.totalCu;
+            m_frame->m_encData->m_frameStats.percentInterDistribution[depth][2] = (double)(m_frame->m_encData->m_frameStats.cuInterDistribution[depth][3] * 100) / m_frame->m_encData->m_frameStats.totalCu;
         }
     }
-    m_frame->m_encData->m_frameStats.avgLumaDistortion   = (double)(m_frame->m_encData->m_frameStats.lumaDistortion) / m_frame->m_encData->m_frameStats.totalCtu;
-    m_frame->m_encData->m_frameStats.avgChromaDistortion = (double)(m_frame->m_encData->m_frameStats.chromaDistortion) / m_frame->m_encData->m_frameStats.totalCtu;
-    m_frame->m_encData->m_frameStats.avgPsyEnergy        = (double)(m_frame->m_encData->m_frameStats.psyEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
-    m_frame->m_encData->m_frameStats.avgSsimEnergy       = (double)(m_frame->m_encData->m_frameStats.ssimEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
-    m_frame->m_encData->m_frameStats.avgResEnergy        = (double)(m_frame->m_encData->m_frameStats.resEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
-    m_frame->m_encData->m_frameStats.percentIntraNxN     = (double)(m_frame->m_encData->m_frameStats.cntIntraNxN * 100) / m_frame->m_encData->m_frameStats.totalCu;
-    for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+
+    if (m_param->csvLogLevel >= 2)
     {
-        m_frame->m_encData->m_frameStats.percentSkipCu[depth]  = (double)(m_frame->m_encData->m_frameStats.cntSkipCu[depth] * 100) / m_frame->m_encData->m_frameStats.totalCu;
-        m_frame->m_encData->m_frameStats.percentMergeCu[depth] = (double)(m_frame->m_encData->m_frameStats.cntMergeCu[depth] * 100) / m_frame->m_encData->m_frameStats.totalCu;
-        for (int n = 0; n < INTRA_MODES; n++)
-            m_frame->m_encData->m_frameStats.percentIntraDistribution[depth][n] = (double)(m_frame->m_encData->m_frameStats.cuIntraDistribution[depth][n] * 100) / m_frame->m_encData->m_frameStats.totalCu;
-        uint64_t cuInterRectCnt = 0; // sum of Nx2N, 2NxN counts
-        cuInterRectCnt += m_frame->m_encData->m_frameStats.cuInterDistribution[depth][1] + m_frame->m_encData->m_frameStats.cuInterDistribution[depth][2];
-        m_frame->m_encData->m_frameStats.percentInterDistribution[depth][0] = (double)(m_frame->m_encData->m_frameStats.cuInterDistribution[depth][0] * 100) / m_frame->m_encData->m_frameStats.totalCu;
-        m_frame->m_encData->m_frameStats.percentInterDistribution[depth][1] = (double)(cuInterRectCnt * 100) / m_frame->m_encData->m_frameStats.totalCu;
-        m_frame->m_encData->m_frameStats.percentInterDistribution[depth][2] = (double)(m_frame->m_encData->m_frameStats.cuInterDistribution[depth][3] * 100) / m_frame->m_encData->m_frameStats.totalCu;
+        m_frame->m_encData->m_frameStats.avgLumaDistortion = (double)(m_frame->m_encData->m_frameStats.lumaDistortion) / m_frame->m_encData->m_frameStats.totalCtu;
+        m_frame->m_encData->m_frameStats.avgChromaDistortion = (double)(m_frame->m_encData->m_frameStats.chromaDistortion) / m_frame->m_encData->m_frameStats.totalCtu;
+        m_frame->m_encData->m_frameStats.avgPsyEnergy = (double)(m_frame->m_encData->m_frameStats.psyEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
+        m_frame->m_encData->m_frameStats.avgSsimEnergy = (double)(m_frame->m_encData->m_frameStats.ssimEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
+        m_frame->m_encData->m_frameStats.avgResEnergy = (double)(m_frame->m_encData->m_frameStats.resEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
     }
 
     m_bs.resetBits();
@@ -1096,7 +1110,7 @@
     /* Accumulate CU statistics from each worker thread, we could report
      * per-frame stats here, but currently we do not. */
     for (int i = 0; i < numTLD; i++)
-        m_cuStats.accumulate(m_tld[i].analysis.m_stats[m_jpId]);
+        m_cuStats.accumulate(m_tld[i].analysis.m_stats[m_jpId], *m_param);
 #endif
 
     m_endFrameTime = x265_mdate();
@@ -1106,7 +1120,7 @@
 {
     Slice* slice = m_frame->m_encData->m_slice;
     const uint32_t widthInLCUs = slice->m_sps->numCuInWidth;
-    const uint32_t lastCUAddr = (slice->m_endCUAddr + NUM_4x4_PARTITIONS - 1) / NUM_4x4_PARTITIONS;
+    const uint32_t lastCUAddr = (slice->m_endCUAddr + m_param->num4x4Partitions - 1) / m_param->num4x4Partitions;
     const uint32_t numSubstreams = m_param->bEnableWavefront ? slice->m_sps->numCuInHeight : 1;
 
     SAOParam* saoParam = slice->m_sps->bUseSAO ? m_frame->m_encData->m_saoParam : NULL;
@@ -1208,7 +1222,6 @@
     const uint32_t row = (uint32_t)intRow;
     CTURow& curRow = m_rows[row];
 
-    tld.analysis.m_param = m_param;
     if (m_param->bEnableWavefront)
     {
         ScopedLock self(curRow.lock);
@@ -1241,7 +1254,7 @@
 
     uint32_t maxBlockCols = (m_frame->m_fencPic->m_picWidth + (16 - 1)) / 16;
     uint32_t maxBlockRows = (m_frame->m_fencPic->m_picHeight + (16 - 1)) / 16;
-    uint32_t noOfBlocks = g_maxCUSize / 16;
+    uint32_t noOfBlocks = m_param->maxCUSize / 16;
     const uint32_t bFirstRowInSlice = ((row == 0) || (m_rows[row - 1].sliceId != curRow.sliceId)) ? 1 : 0;
     const uint32_t bLastRowInSlice = ((row == m_numRows - 1) || (m_rows[row + 1].sliceId != curRow.sliceId)) ? 1 : 0;
     const uint32_t sliceId = curRow.sliceId;
@@ -1320,8 +1333,8 @@
     // TODO: specially case handle on first and last row
 
     // Initialize restrict on MV range in slices
-    tld.analysis.m_sliceMinY = -(int16_t)(rowInSlice * g_maxCUSize * 4) + 3 * 4;
-    tld.analysis.m_sliceMaxY = (int16_t)((endRowInSlicePlus1 - 1 - row) * (g_maxCUSize * 4) - 4 * 4);
+    tld.analysis.m_sliceMinY = -(int16_t)(rowInSlice * m_param->maxCUSize * 4) + 3 * 4;
+    tld.analysis.m_sliceMaxY = (int16_t)((endRowInSlicePlus1 - 1 - row) * (m_param->maxCUSize * 4) - 4 * 4);
 
     // Handle single row slice
     if (tld.analysis.m_sliceMaxY < tld.analysis.m_sliceMinY)
@@ -1361,8 +1374,8 @@
                 cuStat.baseQp = curEncData.m_rowStat[row].rowQp;
 
             /* TODO: use defines from slicetype.h for lowres block size */
​

x265_2.4.tar.gz/source/encoder/framefilter.cpp -> x265_2.5.tar.gz/source/encoder/framefilter.cpp Changed

@@ -35,107 +35,126 @@
 static uint64_t computeSSD(pixel *fenc, pixel *rec, intptr_t stride, uint32_t width, uint32_t height);
 static float calculateSSIM(pixel *pix1, intptr_t stride1, pixel *pix2, intptr_t stride2, uint32_t width, uint32_t height, void *buf, uint32_t& cnt);
 
-static void integral_init4h(uint32_t *sum, pixel *pix, intptr_t stride)
+namespace X265_NS
 {
-    int32_t v = pix[0] + pix[1] + pix[2] + pix[3];
-    for (int16_t x = 0; x < stride - 4; x++)
+    static void integral_init4h_c(uint32_t *sum, pixel *pix, intptr_t stride)
     {
-        sum[x] = v + sum[x - stride];
-        v += pix[x + 4] - pix[x];
+        int32_t v = pix[0] + pix[1] + pix[2] + pix[3];
+        for (int16_t x = 0; x < stride - 4; x++)
+        {
+            sum[x] = v + sum[x - stride];
+            v += pix[x + 4] - pix[x];
+        }
     }
-}
 
-static void integral_init8h(uint32_t *sum, pixel *pix, intptr_t stride)
-{
-    int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7];
-    for (int16_t x = 0; x < stride - 8; x++)
+    static void integral_init8h_c(uint32_t *sum, pixel *pix, intptr_t stride)
     {
-        sum[x] = v + sum[x - stride];
-        v += pix[x + 8] - pix[x];
+        int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7];
+        for (int16_t x = 0; x < stride - 8; x++)
+        {
+            sum[x] = v + sum[x - stride];
+            v += pix[x + 8] - pix[x];
+        }
     }
-}
 
-static void integral_init12h(uint32_t *sum, pixel *pix, intptr_t stride)
-{
-    int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
-        pix[8] + pix[9] + pix[10] + pix[11];
-    for (int16_t x = 0; x < stride - 12; x++)
+    static void integral_init12h_c(uint32_t *sum, pixel *pix, intptr_t stride)
     {
-        sum[x] = v + sum[x - stride];
-        v += pix[x + 12] - pix[x];
+        int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
+            pix[8] + pix[9] + pix[10] + pix[11];
+        for (int16_t x = 0; x < stride - 12; x++)
+        {
+            sum[x] = v + sum[x - stride];
+            v += pix[x + 12] - pix[x];
+        }
     }
-}
 
-static void integral_init16h(uint32_t *sum, pixel *pix, intptr_t stride)
-{
-    int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
-        pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15];
-    for (int16_t x = 0; x < stride - 16; x++)
+    static void integral_init16h_c(uint32_t *sum, pixel *pix, intptr_t stride)
     {
-        sum[x] = v + sum[x - stride];
-        v += pix[x + 16] - pix[x];
+        int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
+            pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15];
+        for (int16_t x = 0; x < stride - 16; x++)
+        {
+            sum[x] = v + sum[x - stride];
+            v += pix[x + 16] - pix[x];
+        }
     }
-}
 
-static void integral_init24h(uint32_t *sum, pixel *pix, intptr_t stride)
-{
-    int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
-        pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15] +
-        pix[16] + pix[17] + pix[18] + pix[19] + pix[20] + pix[21] + pix[22] + pix[23];
-    for (int16_t x = 0; x < stride - 24; x++)
+    static void integral_init24h_c(uint32_t *sum, pixel *pix, intptr_t stride)
     {
-        sum[x] = v + sum[x - stride];
-        v += pix[x + 24] - pix[x];
+        int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
+            pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15] +
+            pix[16] + pix[17] + pix[18] + pix[19] + pix[20] + pix[21] + pix[22] + pix[23];
+        for (int16_t x = 0; x < stride - 24; x++)
+        {
+            sum[x] = v + sum[x - stride];
+            v += pix[x + 24] - pix[x];
+        }
     }
-}
 
-static void integral_init32h(uint32_t *sum, pixel *pix, intptr_t stride)
-{
-    int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
-        pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15] +
-        pix[16] + pix[17] + pix[18] + pix[19] + pix[20] + pix[21] + pix[22] + pix[23] +
-        pix[24] + pix[25] + pix[26] + pix[27] + pix[28] + pix[29] + pix[30] + pix[31];
-    for (int16_t x = 0; x < stride - 32; x++)
+    static void integral_init32h_c(uint32_t *sum, pixel *pix, intptr_t stride)
     {
-        sum[x] = v + sum[x - stride];
-        v += pix[x + 32] - pix[x];
+        int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
+            pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15] +
+            pix[16] + pix[17] + pix[18] + pix[19] + pix[20] + pix[21] + pix[22] + pix[23] +
+            pix[24] + pix[25] + pix[26] + pix[27] + pix[28] + pix[29] + pix[30] + pix[31];
+        for (int16_t x = 0; x < stride - 32; x++)
+        {
+            sum[x] = v + sum[x - stride];
+            v += pix[x + 32] - pix[x];
+        }
     }
-}
 
-static void integral_init4v(uint32_t *sum4, intptr_t stride)
-{
-    for (int x = 0; x < stride; x++)
-        sum4[x] = sum4[x + 4 * stride] - sum4[x];
-}
+    static void integral_init4v_c(uint32_t *sum4, intptr_t stride)
+    {
+        for (int x = 0; x < stride; x++)
+            sum4[x] = sum4[x + 4 * stride] - sum4[x];
+    }
 
-static void integral_init8v(uint32_t *sum8, intptr_t stride)
-{
-    for (int x = 0; x < stride; x++)
-        sum8[x] = sum8[x + 8 * stride] - sum8[x];
-}
+    static void integral_init8v_c(uint32_t *sum8, intptr_t stride)
+    {
+        for (int x = 0; x < stride; x++)
+            sum8[x] = sum8[x + 8 * stride] - sum8[x];
+    }
 
-static void integral_init12v(uint32_t *sum12, intptr_t stride)
-{
-    for (int x = 0; x < stride; x++)
-        sum12[x] = sum12[x + 12 * stride] - sum12[x];
-}
+    static void integral_init12v_c(uint32_t *sum12, intptr_t stride)
+    {
+        for (int x = 0; x < stride; x++)
+            sum12[x] = sum12[x + 12 * stride] - sum12[x];
+    }
 
-static void integral_init16v(uint32_t *sum16, intptr_t stride)
-{
-    for (int x = 0; x < stride; x++)
-        sum16[x] = sum16[x + 16 * stride] - sum16[x];
-}
+    static void integral_init16v_c(uint32_t *sum16, intptr_t stride)
+    {
+        for (int x = 0; x < stride; x++)
+            sum16[x] = sum16[x + 16 * stride] - sum16[x];
+    }
 
-static void integral_init24v(uint32_t *sum24, intptr_t stride)
-{
-    for (int x = 0; x < stride; x++)
-        sum24[x] = sum24[x + 24 * stride] - sum24[x];
-}
+    static void integral_init24v_c(uint32_t *sum24, intptr_t stride)
+    {
+        for (int x = 0; x < stride; x++)
+            sum24[x] = sum24[x + 24 * stride] - sum24[x];
+    }
 
-static void integral_init32v(uint32_t *sum32, intptr_t stride)
-{
-    for (int x = 0; x < stride; x++)
-        sum32[x] = sum32[x + 32 * stride] - sum32[x];
+    static void integral_init32v_c(uint32_t *sum32, intptr_t stride)
+    {
+        for (int x = 0; x < stride; x++)
+            sum32[x] = sum32[x + 32 * stride] - sum32[x];
+    }
+
+    void setupSeaIntegralPrimitives_c(EncoderPrimitives &p)
+    {
+        p.integral_initv[INTEGRAL_4] = integral_init4v_c;
+        p.integral_initv[INTEGRAL_8] = integral_init8v_c;
+        p.integral_initv[INTEGRAL_12] = integral_init12v_c;
+        p.integral_initv[INTEGRAL_16] = integral_init16v_c;
+        p.integral_initv[INTEGRAL_24] = integral_init24v_c;
+        p.integral_initv[INTEGRAL_32] = integral_init32v_c;
+        p.integral_inith[INTEGRAL_4] = integral_init4h_c;
+        p.integral_inith[INTEGRAL_8] = integral_init8h_c;
+        p.integral_inith[INTEGRAL_12] = integral_init12h_c;
+        p.integral_inith[INTEGRAL_16] = integral_init16h_c;
+        p.integral_inith[INTEGRAL_24] = integral_init24h_c;
+        p.integral_inith[INTEGRAL_32] = integral_init32h_c;

 
@@ -35,107 +35,126 @@
 static uint64_t computeSSD(pixel *fenc, pixel *rec, intptr_t stride, uint32_t width, uint32_t height);
 static float calculateSSIM(pixel *pix1, intptr_t stride1, pixel *pix2, intptr_t stride2, uint32_t width, uint32_t height, void *buf, uint32_t& cnt);
 
-static void integral_init4h(uint32_t *sum, pixel *pix, intptr_t stride)
+namespace X265_NS
 {
-    int32_t v = pix[0] + pix[1] + pix[2] + pix[3];
-    for (int16_t x = 0; x < stride - 4; x++)
+    static void integral_init4h_c(uint32_t *sum, pixel *pix, intptr_t stride)
     {
-        sum[x] = v + sum[x - stride];
-        v += pix[x + 4] - pix[x];
+        int32_t v = pix[0] + pix[1] + pix[2] + pix[3];
+        for (int16_t x = 0; x < stride - 4; x++)
+        {
+            sum[x] = v + sum[x - stride];
+            v += pix[x + 4] - pix[x];
+        }
     }
-}
 
-static void integral_init8h(uint32_t *sum, pixel *pix, intptr_t stride)
-{
-    int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7];
-    for (int16_t x = 0; x < stride - 8; x++)
+    static void integral_init8h_c(uint32_t *sum, pixel *pix, intptr_t stride)
     {
-        sum[x] = v + sum[x - stride];
-        v += pix[x + 8] - pix[x];
+        int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7];
+        for (int16_t x = 0; x < stride - 8; x++)
+        {
+            sum[x] = v + sum[x - stride];
+            v += pix[x + 8] - pix[x];
+        }
     }
-}
 
-static void integral_init12h(uint32_t *sum, pixel *pix, intptr_t stride)
-{
-    int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
-        pix[8] + pix[9] + pix[10] + pix[11];
-    for (int16_t x = 0; x < stride - 12; x++)
+    static void integral_init12h_c(uint32_t *sum, pixel *pix, intptr_t stride)
     {
-        sum[x] = v + sum[x - stride];
-        v += pix[x + 12] - pix[x];
+        int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
+            pix[8] + pix[9] + pix[10] + pix[11];
+        for (int16_t x = 0; x < stride - 12; x++)
+        {
+            sum[x] = v + sum[x - stride];
+            v += pix[x + 12] - pix[x];
+        }
     }
-}
 
-static void integral_init16h(uint32_t *sum, pixel *pix, intptr_t stride)
-{
-    int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
-        pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15];
-    for (int16_t x = 0; x < stride - 16; x++)
+    static void integral_init16h_c(uint32_t *sum, pixel *pix, intptr_t stride)
     {
-        sum[x] = v + sum[x - stride];
-        v += pix[x + 16] - pix[x];
+        int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
+            pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15];
+        for (int16_t x = 0; x < stride - 16; x++)
+        {
+            sum[x] = v + sum[x - stride];
+            v += pix[x + 16] - pix[x];
+        }
     }
-}
 
-static void integral_init24h(uint32_t *sum, pixel *pix, intptr_t stride)
-{
-    int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
-        pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15] +
-        pix[16] + pix[17] + pix[18] + pix[19] + pix[20] + pix[21] + pix[22] + pix[23];
-    for (int16_t x = 0; x < stride - 24; x++)
+    static void integral_init24h_c(uint32_t *sum, pixel *pix, intptr_t stride)
     {
-        sum[x] = v + sum[x - stride];
-        v += pix[x + 24] - pix[x];
+        int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
+            pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15] +
+            pix[16] + pix[17] + pix[18] + pix[19] + pix[20] + pix[21] + pix[22] + pix[23];
+        for (int16_t x = 0; x < stride - 24; x++)
+        {
+            sum[x] = v + sum[x - stride];
+            v += pix[x + 24] - pix[x];
+        }
     }
-}
 
-static void integral_init32h(uint32_t *sum, pixel *pix, intptr_t stride)
-{
-    int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
-        pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15] +
-        pix[16] + pix[17] + pix[18] + pix[19] + pix[20] + pix[21] + pix[22] + pix[23] +
-        pix[24] + pix[25] + pix[26] + pix[27] + pix[28] + pix[29] + pix[30] + pix[31];
-    for (int16_t x = 0; x < stride - 32; x++)
+    static void integral_init32h_c(uint32_t *sum, pixel *pix, intptr_t stride)
     {
-        sum[x] = v + sum[x - stride];
-        v += pix[x + 32] - pix[x];
+        int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
+            pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15] +
+            pix[16] + pix[17] + pix[18] + pix[19] + pix[20] + pix[21] + pix[22] + pix[23] +
+            pix[24] + pix[25] + pix[26] + pix[27] + pix[28] + pix[29] + pix[30] + pix[31];
+        for (int16_t x = 0; x < stride - 32; x++)
+        {
+            sum[x] = v + sum[x - stride];
+            v += pix[x + 32] - pix[x];
+        }
     }
-}
 
-static void integral_init4v(uint32_t *sum4, intptr_t stride)
-{
-    for (int x = 0; x < stride; x++)
-        sum4[x] = sum4[x + 4 * stride] - sum4[x];
-}
+    static void integral_init4v_c(uint32_t *sum4, intptr_t stride)
+    {
+        for (int x = 0; x < stride; x++)
+            sum4[x] = sum4[x + 4 * stride] - sum4[x];
+    }
 
-static void integral_init8v(uint32_t *sum8, intptr_t stride)
-{
-    for (int x = 0; x < stride; x++)
-        sum8[x] = sum8[x + 8 * stride] - sum8[x];
-}
+    static void integral_init8v_c(uint32_t *sum8, intptr_t stride)
+    {
+        for (int x = 0; x < stride; x++)
+            sum8[x] = sum8[x + 8 * stride] - sum8[x];
+    }
 
-static void integral_init12v(uint32_t *sum12, intptr_t stride)
-{
-    for (int x = 0; x < stride; x++)
-        sum12[x] = sum12[x + 12 * stride] - sum12[x];
-}
+    static void integral_init12v_c(uint32_t *sum12, intptr_t stride)
+    {
+        for (int x = 0; x < stride; x++)
+            sum12[x] = sum12[x + 12 * stride] - sum12[x];
+    }
 
-static void integral_init16v(uint32_t *sum16, intptr_t stride)
-{
-    for (int x = 0; x < stride; x++)
-        sum16[x] = sum16[x + 16 * stride] - sum16[x];
-}
+    static void integral_init16v_c(uint32_t *sum16, intptr_t stride)
+    {
+        for (int x = 0; x < stride; x++)
+            sum16[x] = sum16[x + 16 * stride] - sum16[x];
+    }
 
-static void integral_init24v(uint32_t *sum24, intptr_t stride)
-{
-    for (int x = 0; x < stride; x++)
-        sum24[x] = sum24[x + 24 * stride] - sum24[x];
-}
+    static void integral_init24v_c(uint32_t *sum24, intptr_t stride)
+    {
+        for (int x = 0; x < stride; x++)
+            sum24[x] = sum24[x + 24 * stride] - sum24[x];
+    }
 
-static void integral_init32v(uint32_t *sum32, intptr_t stride)
-{
-    for (int x = 0; x < stride; x++)
-        sum32[x] = sum32[x + 32 * stride] - sum32[x];
+    static void integral_init32v_c(uint32_t *sum32, intptr_t stride)
+    {
+        for (int x = 0; x < stride; x++)
+            sum32[x] = sum32[x + 32 * stride] - sum32[x];
+    }
+
+    void setupSeaIntegralPrimitives_c(EncoderPrimitives &p)
+    {
+        p.integral_initv[INTEGRAL_4] = integral_init4v_c;
+        p.integral_initv[INTEGRAL_8] = integral_init8v_c;
+        p.integral_initv[INTEGRAL_12] = integral_init12v_c;
+        p.integral_initv[INTEGRAL_16] = integral_init16v_c;
+        p.integral_initv[INTEGRAL_24] = integral_init24v_c;
+        p.integral_initv[INTEGRAL_32] = integral_init32v_c;
+        p.integral_inith[INTEGRAL_4] = integral_init4h_c;
+        p.integral_inith[INTEGRAL_8] = integral_init8h_c;
+        p.integral_inith[INTEGRAL_12] = integral_init12h_c;
+        p.integral_inith[INTEGRAL_16] = integral_init16h_c;
+        p.integral_inith[INTEGRAL_24] = integral_init24h_c;
+        p.integral_inith[INTEGRAL_32] = integral_init32h_c;
​

x265_2.4.tar.gz/source/encoder/framefilter.h -> x265_2.5.tar.gz/source/encoder/framefilter.h Changed

 
@@ -123,7 +123,7 @@
 
     uint32_t getCUWidth(int colNum) const
     {
-        return (colNum == (int)m_numCols - 1) ? m_lastWidth : g_maxCUSize;
+        return (colNum == (int)m_numCols - 1) ? m_lastWidth : m_param->maxCUSize;
     }
 
     void init(Encoder *top, FrameEncoder *frame, int numRows, uint32_t numCols);
​

x265_2.4.tar.gz/source/encoder/motion.cpp -> x265_2.5.tar.gz/source/encoder/motion.cpp Changed

@@ -598,6 +598,139 @@
     }
 }
 
+void MotionEstimate::refineMV(ReferencePlanes* ref,
+                              const MV&        mvmin,
+                              const MV&        mvmax,
+                              const MV&        qmvp,
+                              MV&              outQMv)
+{
+    ALIGN_VAR_16(int, costs[16]);
+    if (ctuAddr >= 0)
+        blockOffset = ref->reconPic->getLumaAddr(ctuAddr, absPartIdx) - ref->reconPic->getLumaAddr(0);
+    intptr_t stride = ref->lumaStride;
+    pixel* fenc = fencPUYuv.m_buf[0];
+    pixel* fref = ref->fpelPlane[0] + blockOffset;
+    
+    setMVP(qmvp);
+    
+    MV qmvmin = mvmin.toQPel();
+    MV qmvmax = mvmax.toQPel();
+   
+    /* The term cost used here means satd/sad values for that particular search.
+     * The costs used in ME integer search only includes the SAD cost of motion
+     * residual and sqrtLambda times MVD bits.  The subpel refine steps use SATD
+     * cost of residual and sqrtLambda * MVD bits.
+    */
+             
+    // measure SATD cost at clipped QPEL MVP
+    MV pmv = qmvp.clipped(qmvmin, qmvmax);
+    MV bestpre = pmv;
+    int bprecost;
+
+    bprecost = subpelCompare(ref, pmv, sad);
+
+    /* re-measure full pel rounded MVP with SAD as search start point */
+    MV bmv = pmv.roundToFPel();
+    int bcost = bprecost;
+    if (pmv.isSubpel())
+        bcost = sad(fenc, FENC_STRIDE, fref + bmv.x + bmv.y * stride, stride) + mvcost(bmv << 2);
+
+    /* square refine */
+    int dir = 0;
+    COST_MV_X4_DIR(0, -1, 0, 1, -1, 0, 1, 0, costs);
+    if ((bmv.y - 1 >= mvmin.y) & (bmv.y - 1 <= mvmax.y))
+        COPY2_IF_LT(bcost, costs[0], dir, 1);
+    if ((bmv.y + 1 >= mvmin.y) & (bmv.y + 1 <= mvmax.y))
+        COPY2_IF_LT(bcost, costs[1], dir, 2);
+    COPY2_IF_LT(bcost, costs[2], dir, 3);
+    COPY2_IF_LT(bcost, costs[3], dir, 4);
+    COST_MV_X4_DIR(-1, -1, -1, 1, 1, -1, 1, 1, costs);
+    if ((bmv.y - 1 >= mvmin.y) & (bmv.y - 1 <= mvmax.y))
+        COPY2_IF_LT(bcost, costs[0], dir, 5);
+    if ((bmv.y + 1 >= mvmin.y) & (bmv.y + 1 <= mvmax.y))
+        COPY2_IF_LT(bcost, costs[1], dir, 6);
+    if ((bmv.y - 1 >= mvmin.y) & (bmv.y - 1 <= mvmax.y))
+        COPY2_IF_LT(bcost, costs[2], dir, 7);
+    if ((bmv.y + 1 >= mvmin.y) & (bmv.y + 1 <= mvmax.y))
+        COPY2_IF_LT(bcost, costs[3], dir, 8);
+    bmv += square1[dir];
+
+    if (bprecost < bcost)
+    {
+        bmv = bestpre;
+        bcost = bprecost;
+    }
+    else
+        bmv = bmv.toQPel(); // promote search bmv to qpel
+
+    // TO DO: Change SubpelWorkload to fine tune MV
+    // Now it is set to 5 for experiment.
+    // const SubpelWorkload& wl = workload[this->subpelRefine];
+    const SubpelWorkload& wl = workload[5];
+
+    pixelcmp_t hpelcomp;
+
+    if (wl.hpel_satd)
+    {
+        bcost = subpelCompare(ref, bmv, satd) + mvcost(bmv);
+        hpelcomp = satd;
+    }
+    else
+        hpelcomp = sad;
+
+    for (int iter = 0; iter < wl.hpel_iters; iter++)
+    {
+        int bdir = 0;
+        for (int i = 1; i <= wl.hpel_dirs; i++)
+        {
+            MV qmv = bmv + square1[i] * 2;            
+
+            // check mv range for slice bound
+            if ((qmv.y < qmvmin.y) | (qmv.y > qmvmax.y))
+                continue;
+
+            int cost = subpelCompare(ref, qmv, hpelcomp) + mvcost(qmv);
+            COPY2_IF_LT(bcost, cost, bdir, i);
+        }
+
+        if (bdir)
+            bmv += square1[bdir] * 2;            
+        else
+            break;
+    }
+
+    /* if HPEL search used SAD, remeasure with SATD before QPEL */
+    if (!wl.hpel_satd)
+        bcost = subpelCompare(ref, bmv, satd) + mvcost(bmv);
+
+    for (int iter = 0; iter < wl.qpel_iters; iter++)
+    {
+        int bdir = 0;
+        for (int i = 1; i <= wl.qpel_dirs; i++)
+        {
+            MV qmv = bmv + square1[i];
+            
+            // check mv range for slice bound
+            if ((qmv.y < qmvmin.y) | (qmv.y > qmvmax.y))
+                continue;
+
+            int cost = subpelCompare(ref, qmv, satd) + mvcost(qmv);
+            COPY2_IF_LT(bcost, cost, bdir, i);
+        }
+
+        if (bdir)
+            bmv += square1[bdir];
+        else
+            break;
+    }
+
+    // check mv range for slice bound
+    X265_CHECK(((pmv.y >= qmvmin.y) & (pmv.y <= qmvmax.y)), "mv beyond range!");
+    
+    x265_emms();
+    outQMv = bmv;
+}
+
 int MotionEstimate::motionEstimate(ReferencePlanes *ref,
                                    const MV &       mvmin,
                                    const MV &       mvmax,
@@ -606,6 +739,7 @@
                                    const MV *       mvc,
                                    int              merange,
                                    MV &             outQMv,
+                                   uint32_t         maxSlices,
                                    pixel *          srcReferencePlane)
 {
     ALIGN_VAR_16(int, costs[16]);
@@ -1306,7 +1440,7 @@
     const SubpelWorkload& wl = workload[this->subpelRefine];
 
     // check mv range for slice bound
-    if ((g_maxSlices > 1) & ((bmv.y < qmvmin.y) | (bmv.y > qmvmax.y)))
+    if ((maxSlices > 1) & ((bmv.y < qmvmin.y) | (bmv.y > qmvmax.y)))
     {
         bmv.y = x265_min(x265_max(bmv.y, qmvmin.y), qmvmax.y);
         bcost = subpelCompare(ref, bmv, satd) + mvcost(bmv);

 
@@ -598,6 +598,139 @@
     }
 }
 
+void MotionEstimate::refineMV(ReferencePlanes* ref,
+                              const MV&        mvmin,
+                              const MV&        mvmax,
+                              const MV&        qmvp,
+                              MV&              outQMv)
+{
+    ALIGN_VAR_16(int, costs[16]);
+    if (ctuAddr >= 0)
+        blockOffset = ref->reconPic->getLumaAddr(ctuAddr, absPartIdx) - ref->reconPic->getLumaAddr(0);
+    intptr_t stride = ref->lumaStride;
+    pixel* fenc = fencPUYuv.m_buf[0];
+    pixel* fref = ref->fpelPlane[0] + blockOffset;
+    
+    setMVP(qmvp);
+    
+    MV qmvmin = mvmin.toQPel();
+    MV qmvmax = mvmax.toQPel();
+   
+    /* The term cost used here means satd/sad values for that particular search.
+     * The costs used in ME integer search only includes the SAD cost of motion
+     * residual and sqrtLambda times MVD bits.  The subpel refine steps use SATD
+     * cost of residual and sqrtLambda * MVD bits.
+    */
+             
+    // measure SATD cost at clipped QPEL MVP
+    MV pmv = qmvp.clipped(qmvmin, qmvmax);
+    MV bestpre = pmv;
+    int bprecost;
+
+    bprecost = subpelCompare(ref, pmv, sad);
+
+    /* re-measure full pel rounded MVP with SAD as search start point */
+    MV bmv = pmv.roundToFPel();
+    int bcost = bprecost;
+    if (pmv.isSubpel())
+        bcost = sad(fenc, FENC_STRIDE, fref + bmv.x + bmv.y * stride, stride) + mvcost(bmv << 2);
+
+    /* square refine */
+    int dir = 0;
+    COST_MV_X4_DIR(0, -1, 0, 1, -1, 0, 1, 0, costs);
+    if ((bmv.y - 1 >= mvmin.y) & (bmv.y - 1 <= mvmax.y))
+        COPY2_IF_LT(bcost, costs[0], dir, 1);
+    if ((bmv.y + 1 >= mvmin.y) & (bmv.y + 1 <= mvmax.y))
+        COPY2_IF_LT(bcost, costs[1], dir, 2);
+    COPY2_IF_LT(bcost, costs[2], dir, 3);
+    COPY2_IF_LT(bcost, costs[3], dir, 4);
+    COST_MV_X4_DIR(-1, -1, -1, 1, 1, -1, 1, 1, costs);
+    if ((bmv.y - 1 >= mvmin.y) & (bmv.y - 1 <= mvmax.y))
+        COPY2_IF_LT(bcost, costs[0], dir, 5);
+    if ((bmv.y + 1 >= mvmin.y) & (bmv.y + 1 <= mvmax.y))
+        COPY2_IF_LT(bcost, costs[1], dir, 6);
+    if ((bmv.y - 1 >= mvmin.y) & (bmv.y - 1 <= mvmax.y))
+        COPY2_IF_LT(bcost, costs[2], dir, 7);
+    if ((bmv.y + 1 >= mvmin.y) & (bmv.y + 1 <= mvmax.y))
+        COPY2_IF_LT(bcost, costs[3], dir, 8);
+    bmv += square1[dir];
+
+    if (bprecost < bcost)
+    {
+        bmv = bestpre;
+        bcost = bprecost;
+    }
+    else
+        bmv = bmv.toQPel(); // promote search bmv to qpel
+
+    // TO DO: Change SubpelWorkload to fine tune MV
+    // Now it is set to 5 for experiment.
+    // const SubpelWorkload& wl = workload[this->subpelRefine];
+    const SubpelWorkload& wl = workload[5];
+
+    pixelcmp_t hpelcomp;
+
+    if (wl.hpel_satd)
+    {
+        bcost = subpelCompare(ref, bmv, satd) + mvcost(bmv);
+        hpelcomp = satd;
+    }
+    else
+        hpelcomp = sad;
+
+    for (int iter = 0; iter < wl.hpel_iters; iter++)
+    {
+        int bdir = 0;
+        for (int i = 1; i <= wl.hpel_dirs; i++)
+        {
+            MV qmv = bmv + square1[i] * 2;            
+
+            // check mv range for slice bound
+            if ((qmv.y < qmvmin.y) | (qmv.y > qmvmax.y))
+                continue;
+
+            int cost = subpelCompare(ref, qmv, hpelcomp) + mvcost(qmv);
+            COPY2_IF_LT(bcost, cost, bdir, i);
+        }
+
+        if (bdir)
+            bmv += square1[bdir] * 2;            
+        else
+            break;
+    }
+
+    /* if HPEL search used SAD, remeasure with SATD before QPEL */
+    if (!wl.hpel_satd)
+        bcost = subpelCompare(ref, bmv, satd) + mvcost(bmv);
+
+    for (int iter = 0; iter < wl.qpel_iters; iter++)
+    {
+        int bdir = 0;
+        for (int i = 1; i <= wl.qpel_dirs; i++)
+        {
+            MV qmv = bmv + square1[i];
+            
+            // check mv range for slice bound
+            if ((qmv.y < qmvmin.y) | (qmv.y > qmvmax.y))
+                continue;
+
+            int cost = subpelCompare(ref, qmv, satd) + mvcost(qmv);
+            COPY2_IF_LT(bcost, cost, bdir, i);
+        }
+
+        if (bdir)
+            bmv += square1[bdir];
+        else
+            break;
+    }
+
+    // check mv range for slice bound
+    X265_CHECK(((pmv.y >= qmvmin.y) & (pmv.y <= qmvmax.y)), "mv beyond range!");
+    
+    x265_emms();
+    outQMv = bmv;
+}
+
 int MotionEstimate::motionEstimate(ReferencePlanes *ref,
                                    const MV &       mvmin,
                                    const MV &       mvmax,
@@ -606,6 +739,7 @@
                                    const MV *       mvc,
                                    int              merange,
                                    MV &             outQMv,
+                                   uint32_t         maxSlices,
                                    pixel *          srcReferencePlane)
 {
     ALIGN_VAR_16(int, costs[16]);
@@ -1306,7 +1440,7 @@
     const SubpelWorkload& wl = workload[this->subpelRefine];
 
     // check mv range for slice bound
-    if ((g_maxSlices > 1) & ((bmv.y < qmvmin.y) | (bmv.y > qmvmax.y)))
+    if ((maxSlices > 1) & ((bmv.y < qmvmin.y) | (bmv.y > qmvmax.y)))
     {
         bmv.y = x265_min(x265_max(bmv.y, qmvmin.y), qmvmax.y);
         bcost = subpelCompare(ref, bmv, satd) + mvcost(bmv);
​

x265_2.4.tar.gz/source/encoder/motion.h -> x265_2.5.tar.gz/source/encoder/motion.h Changed

 
@@ -92,7 +92,8 @@
                chromaSatd(refYuv.getCrAddr(puPartIdx), refYuv.m_csize, fencPUYuv.m_buf[2], fencPUYuv.m_csize);
     }
 
-    int motionEstimate(ReferencePlanes* ref, const MV & mvmin, const MV & mvmax, const MV & qmvp, int numCandidates, const MV * mvc, int merange, MV & outQMv, pixel *srcReferencePlane = 0);
+    void refineMV(ReferencePlanes* ref, const MV& mvmin, const MV& mvmax, const MV& qmvp, MV& outQMv);
+    int motionEstimate(ReferencePlanes* ref, const MV & mvmin, const MV & mvmax, const MV & qmvp, int numCandidates, const MV * mvc, int merange, MV & outQMv, uint32_t maxSlices, pixel *srcReferencePlane = 0);
 
     int subpelCompare(ReferencePlanes* ref, const MV &qmv, pixelcmp_t);
 
​

x265_2.4.tar.gz/source/encoder/ratecontrol.cpp -> x265_2.5.tar.gz/source/encoder/ratecontrol.cpp Changed

@@ -2272,7 +2272,7 @@
             uint32_t refRowSatdCost = 0, refRowBits = 0, intraCostForPendingCus = 0;
             double refQScale = 0;
 
-            if (picType != I_SLICE)
+            if (picType != I_SLICE && !m_param->rc.bEnableConstVbv)
             {
                 FrameData& refEncData = *refFrame->m_encData;
                 uint32_t endCuAddr = maxCols * (row + 1);
@@ -2301,7 +2301,8 @@
                     && refFrame 
                     && refFrame->m_encData->m_slice->m_sliceType == picType
                     && refQScale > 0
-                    && refRowSatdCost > 0)
+                    && refRowBits > 0
+                    && !m_param->rc.bEnableConstVbv)
                 {
                     if (abs((int32_t)(refRowSatdCost - satdCostForPendingCus)) < (int32_t)satdCostForPendingCus / 2)
                     {
@@ -2343,7 +2344,7 @@
     }
     rowSatdCost >>= X265_DEPTH - 8;
     updatePredictor(rce->rowPred[0], qScaleVbv, (double)rowSatdCost, encodedBits);
-    if (curEncData.m_slice->m_sliceType != I_SLICE)
+    if (curEncData.m_slice->m_sliceType != I_SLICE && !m_param->rc.bEnableConstVbv)
     {
         Frame* refFrame = curEncData.m_slice->m_refFrameList[0][0];
         if (qpVbv < refFrame->m_encData->m_rowStat[row].rowQp)
@@ -2613,7 +2614,7 @@
             for (uint32_t i = 0; i < slice->m_sps->numCuInHeight; i++)
                 avgQpAq += curEncData.m_rowStat[i].sumQpAq;
 
-            avgQpAq /= (slice->m_sps->numCUsInFrame * NUM_4x4_PARTITIONS);
+            avgQpAq /= (slice->m_sps->numCUsInFrame * m_param->num4x4Partitions);
             curEncData.m_avgQpAq = avgQpAq;
         }
         else
@@ -2711,6 +2712,13 @@
     {
         *filler = updateVbv(actualBits, rce);
 
+        curFrame->m_rcData->bufferFillFinal = m_bufferFillFinal;
+        for (int i = 0; i < 4; i++)
+        {
+            curFrame->m_rcData->coeff[i] = m_pred[i].coeff;
+            curFrame->m_rcData->count[i] = m_pred[i].count;
+            curFrame->m_rcData->offset[i] = m_pred[i].offset;
+        }
         if (m_param->bEmitHRDSEI)
         {
             const VUI *vui = &curEncData.m_slice->m_sps->vuiParameters;

 
@@ -2272,7 +2272,7 @@
             uint32_t refRowSatdCost = 0, refRowBits = 0, intraCostForPendingCus = 0;
             double refQScale = 0;
 
-            if (picType != I_SLICE)
+            if (picType != I_SLICE && !m_param->rc.bEnableConstVbv)
             {
                 FrameData& refEncData = *refFrame->m_encData;
                 uint32_t endCuAddr = maxCols * (row + 1);
@@ -2301,7 +2301,8 @@
                     && refFrame 
                     && refFrame->m_encData->m_slice->m_sliceType == picType
                     && refQScale > 0
-                    && refRowSatdCost > 0)
+                    && refRowBits > 0
+                    && !m_param->rc.bEnableConstVbv)
                 {
                     if (abs((int32_t)(refRowSatdCost - satdCostForPendingCus)) < (int32_t)satdCostForPendingCus / 2)
                     {
@@ -2343,7 +2344,7 @@
     }
     rowSatdCost >>= X265_DEPTH - 8;
     updatePredictor(rce->rowPred[0], qScaleVbv, (double)rowSatdCost, encodedBits);
-    if (curEncData.m_slice->m_sliceType != I_SLICE)
+    if (curEncData.m_slice->m_sliceType != I_SLICE && !m_param->rc.bEnableConstVbv)
     {
         Frame* refFrame = curEncData.m_slice->m_refFrameList[0][0];
         if (qpVbv < refFrame->m_encData->m_rowStat[row].rowQp)
@@ -2613,7 +2614,7 @@
             for (uint32_t i = 0; i < slice->m_sps->numCuInHeight; i++)
                 avgQpAq += curEncData.m_rowStat[i].sumQpAq;
 
-            avgQpAq /= (slice->m_sps->numCUsInFrame * NUM_4x4_PARTITIONS);
+            avgQpAq /= (slice->m_sps->numCUsInFrame * m_param->num4x4Partitions);
             curEncData.m_avgQpAq = avgQpAq;
         }
         else
@@ -2711,6 +2712,13 @@
     {
         *filler = updateVbv(actualBits, rce);
 
+        curFrame->m_rcData->bufferFillFinal = m_bufferFillFinal;
+        for (int i = 0; i < 4; i++)
+        {
+            curFrame->m_rcData->coeff[i] = m_pred[i].coeff;
+            curFrame->m_rcData->count[i] = m_pred[i].count;
+            curFrame->m_rcData->offset[i] = m_pred[i].offset;
+        }
         if (m_param->bEmitHRDSEI)
         {
             const VUI *vui = &curEncData.m_slice->m_sps->vuiParameters;
​

x265_2.4.tar.gz/source/encoder/reference.cpp -> x265_2.5.tar.gz/source/encoder/reference.cpp Changed

@@ -72,12 +72,12 @@
 
     if (wp)
     {
-        uint32_t numCUinHeight = (reconPic->m_picHeight + g_maxCUSize - 1) / g_maxCUSize;
+        uint32_t numCUinHeight = (reconPic->m_picHeight + p.maxCUSize - 1) / p.maxCUSize;
 
         int marginX = reconPic->m_lumaMarginX;
         int marginY = reconPic->m_lumaMarginY;
         intptr_t stride = reconPic->m_stride;
-        int cuHeight = g_maxCUSize;
+        int cuHeight = p.maxCUSize;
 
         for (int c = 0; c < (p.internalCsp != X265_CSP_I400 && recPic->m_picCsp != X265_CSP_I400 ? numInterpPlanes : 1); c++)
         {
@@ -127,15 +127,15 @@
     int marginY = reconPic->m_lumaMarginY;
     intptr_t stride = reconPic->m_stride;
     int width   = reconPic->m_picWidth;
-    int height  = (finishedRows - numWeightedRows) * g_maxCUSize;
+    int height  = (finishedRows - numWeightedRows) * reconPic->m_param->maxCUSize;
     /* the last row may be partial height */
     if (finishedRows == maxNumRows - 1)
     {
-        const int leftRows = (reconPic->m_picHeight & (g_maxCUSize - 1));
+        const int leftRows = (reconPic->m_picHeight & (reconPic->m_param->maxCUSize - 1));
 
-        height += leftRows ? leftRows : g_maxCUSize;
+        height += leftRows ? leftRows : reconPic->m_param->maxCUSize;
     }
-    int cuHeight = g_maxCUSize;
+    int cuHeight = reconPic->m_param->maxCUSize;
 
     for (int c = 0; c < numInterpPlanes; c++)
     {

 
@@ -72,12 +72,12 @@
 
     if (wp)
     {
-        uint32_t numCUinHeight = (reconPic->m_picHeight + g_maxCUSize - 1) / g_maxCUSize;
+        uint32_t numCUinHeight = (reconPic->m_picHeight + p.maxCUSize - 1) / p.maxCUSize;
 
         int marginX = reconPic->m_lumaMarginX;
         int marginY = reconPic->m_lumaMarginY;
         intptr_t stride = reconPic->m_stride;
-        int cuHeight = g_maxCUSize;
+        int cuHeight = p.maxCUSize;
 
         for (int c = 0; c < (p.internalCsp != X265_CSP_I400 && recPic->m_picCsp != X265_CSP_I400 ? numInterpPlanes : 1); c++)
         {
@@ -127,15 +127,15 @@
     int marginY = reconPic->m_lumaMarginY;
     intptr_t stride = reconPic->m_stride;
     int width   = reconPic->m_picWidth;
-    int height  = (finishedRows - numWeightedRows) * g_maxCUSize;
+    int height  = (finishedRows - numWeightedRows) * reconPic->m_param->maxCUSize;
     /* the last row may be partial height */
     if (finishedRows == maxNumRows - 1)
     {
-        const int leftRows = (reconPic->m_picHeight & (g_maxCUSize - 1));
+        const int leftRows = (reconPic->m_picHeight & (reconPic->m_param->maxCUSize - 1));
 
-        height += leftRows ? leftRows : g_maxCUSize;
+        height += leftRows ? leftRows : reconPic->m_param->maxCUSize;
     }
-    int cuHeight = g_maxCUSize;
+    int cuHeight = reconPic->m_param->maxCUSize;
 
     for (int c = 0; c < numInterpPlanes; c++)
     {
​

x265_2.4.tar.gz/source/encoder/sao.cpp -> x265_2.5.tar.gz/source/encoder/sao.cpp Changed

@@ -98,8 +98,8 @@
     m_hChromaShift = CHROMA_H_SHIFT(param->internalCsp);
     m_vChromaShift = CHROMA_V_SHIFT(param->internalCsp);
 
-    m_numCuInWidth =  (m_param->sourceWidth + g_maxCUSize - 1) / g_maxCUSize;
-    m_numCuInHeight = (m_param->sourceHeight + g_maxCUSize - 1) / g_maxCUSize;
+    m_numCuInWidth =  (m_param->sourceWidth + m_param->maxCUSize - 1) / m_param->maxCUSize;
+    m_numCuInHeight = (m_param->sourceHeight + m_param->maxCUSize - 1) / m_param->maxCUSize;
 
     const pixel maxY = (1 << X265_DEPTH) - 1;
     const pixel rangeExt = maxY >> 1;
@@ -107,12 +107,12 @@
 
     for (int i = 0; i < (param->internalCsp != X265_CSP_I400 ? 3 : 1); i++)
     {
-        CHECKED_MALLOC(m_tmpL1[i], pixel, g_maxCUSize + 1);
-        CHECKED_MALLOC(m_tmpL2[i], pixel, g_maxCUSize + 1);
+        CHECKED_MALLOC(m_tmpL1[i], pixel, m_param->maxCUSize + 1);
+        CHECKED_MALLOC(m_tmpL2[i], pixel, m_param->maxCUSize + 1);
 
         // SAO asm code will read 1 pixel before and after, so pad by 2
         // NOTE: m_param->sourceWidth+2 enough, to avoid condition check in copySaoAboveRef(), I alloc more up to 63 bytes in here
-        CHECKED_MALLOC(m_tmpU[i], pixel, m_numCuInWidth * g_maxCUSize + 2 + 32);
+        CHECKED_MALLOC(m_tmpU[i], pixel, m_numCuInWidth * m_param->maxCUSize + 2 + 32);
         m_tmpU[i] += 1;
     }
 
@@ -279,8 +279,8 @@
     uint32_t picWidth  = m_param->sourceWidth;
     uint32_t picHeight = m_param->sourceHeight;
     const CUData* cu = m_frame->m_encData->getPicCTU(addr);
-    int ctuWidth = g_maxCUSize;
-    int ctuHeight = g_maxCUSize;
+    int ctuWidth = m_param->maxCUSize;
+    int ctuHeight = m_param->maxCUSize;
     uint32_t lpelx = cu->m_cuPelX;
     uint32_t tpely = cu->m_cuPelY;
     const uint32_t firstRowInSlice = cu->m_bFirstRowInSlice;
@@ -573,8 +573,8 @@
 {
     PicYuv* reconPic = m_frame->m_reconPic;
     intptr_t stride = reconPic->m_stride;
-    int ctuWidth  = g_maxCUSize;
-    int ctuHeight = g_maxCUSize;
+    int ctuWidth = m_param->maxCUSize;
+    int ctuHeight = m_param->maxCUSize;
 
     int addr = idxY * m_numCuInWidth + idxX;
     pixel* rec = reconPic->getLumaAddr(addr);
@@ -633,8 +633,8 @@
 {
     PicYuv* reconPic = m_frame->m_reconPic;
     intptr_t stride = reconPic->m_strideC;
-    int ctuWidth  = g_maxCUSize;
-    int ctuHeight = g_maxCUSize;
+    int ctuWidth  = m_param->maxCUSize;
+    int ctuHeight = m_param->maxCUSize;
 
     {
         ctuWidth  >>= m_hChromaShift;
@@ -744,8 +744,8 @@
     intptr_t stride = plane ? reconPic->m_strideC : reconPic->m_stride;
     uint32_t picWidth  = m_param->sourceWidth;
     uint32_t picHeight = m_param->sourceHeight;
-    int ctuWidth  = g_maxCUSize;
-    int ctuHeight = g_maxCUSize;
+    int ctuWidth  = m_param->maxCUSize;
+    int ctuHeight = m_param->maxCUSize;
     uint32_t lpelx = cu->m_cuPelX;
     uint32_t tpely = cu->m_cuPelY;
     const uint32_t firstRowInSlice = cu->m_bFirstRowInSlice;
@@ -791,9 +791,9 @@
         // WARNING: *) May read beyond bound on video than ctuWidth or ctuHeight is NOT multiple of cuSize
         X265_CHECK((ctuWidth == ctuHeight) || (m_chromaFormat != X265_CSP_I420), "video size check failure\n");
         if (plane)
-            primitives.chroma[m_chromaFormat].cu[g_maxLog2CUSize - 2].sub_ps(diff, MAX_CU_SIZE, fenc0, rec0, stride, stride);
+            primitives.chroma[m_chromaFormat].cu[m_param->maxLog2CUSize - 2].sub_ps(diff, MAX_CU_SIZE, fenc0, rec0, stride, stride);
         else
-           primitives.cu[g_maxLog2CUSize - 2].sub_ps(diff, MAX_CU_SIZE, fenc0, rec0, stride, stride);
+           primitives.cu[m_param->maxLog2CUSize - 2].sub_ps(diff, MAX_CU_SIZE, fenc0, rec0, stride, stride);
     }
     else
     {
@@ -928,8 +928,8 @@
     intptr_t stride = reconPic->m_stride;
     uint32_t picWidth  = m_param->sourceWidth;
     uint32_t picHeight = m_param->sourceHeight;
-    int ctuWidth  = g_maxCUSize;
-    int ctuHeight = g_maxCUSize;
+    int ctuWidth  = m_param->maxCUSize;
+    int ctuHeight = m_param->maxCUSize;
     uint32_t lpelx = cu->m_cuPelX;
     uint32_t tpely = cu->m_cuPelY;
     const uint32_t firstRowInSlice = cu->m_bFirstRowInSlice;
@@ -1553,14 +1553,17 @@
     }
 
     // Estimate Best Position
-    int64_t bestRDCostBO = MAX_INT64;
     int32_t bestClassBO  = 0;
+    int64_t currentRDCost = costClasses[0];
+    currentRDCost += costClasses[1];
+    currentRDCost += costClasses[2];
+    currentRDCost += costClasses[3];
+    int64_t bestRDCostBO = currentRDCost;
 
-    for (int i = 0; i < MAX_NUM_SAO_CLASS - SAO_NUM_OFFSET + 1; i++)
+    for (int i = 1; i < MAX_NUM_SAO_CLASS - SAO_NUM_OFFSET + 1; i++)
     {
-        int64_t currentRDCost = 0;
-        for (int j = i; j < i + SAO_NUM_OFFSET; j++)
-            currentRDCost += costClasses[j];
+        currentRDCost -= costClasses[i - 1];
+        currentRDCost += costClasses[i + 3];
 
         if (currentRDCost < bestRDCostBO)
         {

 
@@ -98,8 +98,8 @@
     m_hChromaShift = CHROMA_H_SHIFT(param->internalCsp);
     m_vChromaShift = CHROMA_V_SHIFT(param->internalCsp);
 
-    m_numCuInWidth =  (m_param->sourceWidth + g_maxCUSize - 1) / g_maxCUSize;
-    m_numCuInHeight = (m_param->sourceHeight + g_maxCUSize - 1) / g_maxCUSize;
+    m_numCuInWidth =  (m_param->sourceWidth + m_param->maxCUSize - 1) / m_param->maxCUSize;
+    m_numCuInHeight = (m_param->sourceHeight + m_param->maxCUSize - 1) / m_param->maxCUSize;
 
     const pixel maxY = (1 << X265_DEPTH) - 1;
     const pixel rangeExt = maxY >> 1;
@@ -107,12 +107,12 @@
 
     for (int i = 0; i < (param->internalCsp != X265_CSP_I400 ? 3 : 1); i++)
     {
-        CHECKED_MALLOC(m_tmpL1[i], pixel, g_maxCUSize + 1);
-        CHECKED_MALLOC(m_tmpL2[i], pixel, g_maxCUSize + 1);
+        CHECKED_MALLOC(m_tmpL1[i], pixel, m_param->maxCUSize + 1);
+        CHECKED_MALLOC(m_tmpL2[i], pixel, m_param->maxCUSize + 1);
 
         // SAO asm code will read 1 pixel before and after, so pad by 2
         // NOTE: m_param->sourceWidth+2 enough, to avoid condition check in copySaoAboveRef(), I alloc more up to 63 bytes in here
-        CHECKED_MALLOC(m_tmpU[i], pixel, m_numCuInWidth * g_maxCUSize + 2 + 32);
+        CHECKED_MALLOC(m_tmpU[i], pixel, m_numCuInWidth * m_param->maxCUSize + 2 + 32);
         m_tmpU[i] += 1;
     }
 
@@ -279,8 +279,8 @@
     uint32_t picWidth  = m_param->sourceWidth;
     uint32_t picHeight = m_param->sourceHeight;
     const CUData* cu = m_frame->m_encData->getPicCTU(addr);
-    int ctuWidth = g_maxCUSize;
-    int ctuHeight = g_maxCUSize;
+    int ctuWidth = m_param->maxCUSize;
+    int ctuHeight = m_param->maxCUSize;
     uint32_t lpelx = cu->m_cuPelX;
     uint32_t tpely = cu->m_cuPelY;
     const uint32_t firstRowInSlice = cu->m_bFirstRowInSlice;
@@ -573,8 +573,8 @@
 {
     PicYuv* reconPic = m_frame->m_reconPic;
     intptr_t stride = reconPic->m_stride;
-    int ctuWidth  = g_maxCUSize;
-    int ctuHeight = g_maxCUSize;
+    int ctuWidth = m_param->maxCUSize;
+    int ctuHeight = m_param->maxCUSize;
 
     int addr = idxY * m_numCuInWidth + idxX;
     pixel* rec = reconPic->getLumaAddr(addr);
@@ -633,8 +633,8 @@
 {
     PicYuv* reconPic = m_frame->m_reconPic;
     intptr_t stride = reconPic->m_strideC;
-    int ctuWidth  = g_maxCUSize;
-    int ctuHeight = g_maxCUSize;
+    int ctuWidth  = m_param->maxCUSize;
+    int ctuHeight = m_param->maxCUSize;
 
     {
         ctuWidth  >>= m_hChromaShift;
@@ -744,8 +744,8 @@
     intptr_t stride = plane ? reconPic->m_strideC : reconPic->m_stride;
     uint32_t picWidth  = m_param->sourceWidth;
     uint32_t picHeight = m_param->sourceHeight;
-    int ctuWidth  = g_maxCUSize;
-    int ctuHeight = g_maxCUSize;
+    int ctuWidth  = m_param->maxCUSize;
+    int ctuHeight = m_param->maxCUSize;
     uint32_t lpelx = cu->m_cuPelX;
     uint32_t tpely = cu->m_cuPelY;
     const uint32_t firstRowInSlice = cu->m_bFirstRowInSlice;
@@ -791,9 +791,9 @@
         // WARNING: *) May read beyond bound on video than ctuWidth or ctuHeight is NOT multiple of cuSize
         X265_CHECK((ctuWidth == ctuHeight) || (m_chromaFormat != X265_CSP_I420), "video size check failure\n");
         if (plane)
-            primitives.chroma[m_chromaFormat].cu[g_maxLog2CUSize - 2].sub_ps(diff, MAX_CU_SIZE, fenc0, rec0, stride, stride);
+            primitives.chroma[m_chromaFormat].cu[m_param->maxLog2CUSize - 2].sub_ps(diff, MAX_CU_SIZE, fenc0, rec0, stride, stride);
         else
-           primitives.cu[g_maxLog2CUSize - 2].sub_ps(diff, MAX_CU_SIZE, fenc0, rec0, stride, stride);
+           primitives.cu[m_param->maxLog2CUSize - 2].sub_ps(diff, MAX_CU_SIZE, fenc0, rec0, stride, stride);
     }
     else
     {
@@ -928,8 +928,8 @@
     intptr_t stride = reconPic->m_stride;
     uint32_t picWidth  = m_param->sourceWidth;
     uint32_t picHeight = m_param->sourceHeight;
-    int ctuWidth  = g_maxCUSize;
-    int ctuHeight = g_maxCUSize;
+    int ctuWidth  = m_param->maxCUSize;
+    int ctuHeight = m_param->maxCUSize;
     uint32_t lpelx = cu->m_cuPelX;
     uint32_t tpely = cu->m_cuPelY;
     const uint32_t firstRowInSlice = cu->m_bFirstRowInSlice;
@@ -1553,14 +1553,17 @@
     }
 
     // Estimate Best Position
-    int64_t bestRDCostBO = MAX_INT64;
     int32_t bestClassBO  = 0;
+    int64_t currentRDCost = costClasses[0];
+    currentRDCost += costClasses[1];
+    currentRDCost += costClasses[2];
+    currentRDCost += costClasses[3];
+    int64_t bestRDCostBO = currentRDCost;
 
-    for (int i = 0; i < MAX_NUM_SAO_CLASS - SAO_NUM_OFFSET + 1; i++)
+    for (int i = 1; i < MAX_NUM_SAO_CLASS - SAO_NUM_OFFSET + 1; i++)
     {
-        int64_t currentRDCost = 0;
-        for (int j = i; j < i + SAO_NUM_OFFSET; j++)
-            currentRDCost += costClasses[j];
+        currentRDCost -= costClasses[i - 1];
+        currentRDCost += costClasses[i + 3];
 
         if (currentRDCost < bestRDCostBO)
         {
​

x265_2.4.tar.gz/source/encoder/search.cpp -> x265_2.5.tar.gz/source/encoder/search.cpp Changed

@@ -120,8 +120,8 @@
             CHECKED_MALLOC(m_rqt[i].coeffRQT[0], coeff_t, sizeL + sizeC * 2);
             m_rqt[i].coeffRQT[1] = m_rqt[i].coeffRQT[0] + sizeL;
             m_rqt[i].coeffRQT[2] = m_rqt[i].coeffRQT[0] + sizeL + sizeC;
-            ok &= m_rqt[i].reconQtYuv.create(g_maxCUSize, param.internalCsp);
-            ok &= m_rqt[i].resiQtYuv.create(g_maxCUSize, param.internalCsp);
+            ok &= m_rqt[i].reconQtYuv.create(param.maxCUSize, param.internalCsp);
+            ok &= m_rqt[i].resiQtYuv.create(param.maxCUSize, param.internalCsp);
         }
     }
     else
@@ -130,15 +130,15 @@
         {
             CHECKED_MALLOC(m_rqt[i].coeffRQT[0], coeff_t, sizeL);
             m_rqt[i].coeffRQT[1] = m_rqt[i].coeffRQT[2] = NULL;
-            ok &= m_rqt[i].reconQtYuv.create(g_maxCUSize, param.internalCsp);
-            ok &= m_rqt[i].resiQtYuv.create(g_maxCUSize, param.internalCsp);
+            ok &= m_rqt[i].reconQtYuv.create(param.maxCUSize, param.internalCsp);
+            ok &= m_rqt[i].resiQtYuv.create(param.maxCUSize, param.internalCsp);
         }
     }
 
     /* the rest of these buffers are indexed per-depth */
-    for (uint32_t i = 0; i <= g_maxCUDepth; i++)
+    for (uint32_t i = 0; i <= m_param->maxCUDepth; i++)
     {
-        int cuSize = g_maxCUSize >> i;
+        int cuSize = param.maxCUSize >> i;
         ok &= m_rqt[i].tmpResiYuv.create(cuSize, param.internalCsp);
         ok &= m_rqt[i].tmpPredYuv.create(cuSize, param.internalCsp);
         ok &= m_rqt[i].bidirPredYuv[0].create(cuSize, param.internalCsp);
@@ -186,7 +186,7 @@
         m_rqt[i].resiQtYuv.destroy();
     }
 
-    for (uint32_t i = 0; i <= g_maxCUDepth; i++)
+    for (uint32_t i = 0; i <= m_param->maxCUDepth; i++)
     {
         m_rqt[i].tmpResiYuv.destroy();
         m_rqt[i].tmpPredYuv.destroy();
@@ -2073,7 +2073,7 @@
     int mvpIdx = selectMVP(interMode.cu, pu, amvp, list, ref);
     MV mvmin, mvmax, outmv, mvp = amvp[mvpIdx];
 
-    if (!m_param->analysisMode) /* Prevents load/save outputs from diverging if lowresMV is not available */
+    if (!m_param->analysisReuseMode) /* Prevents load/save outputs from diverging if lowresMV is not available */
     {
         MV lmv = getLowresMV(interMode.cu, pu, list, ref);
         if (lmv.notZero())
@@ -2082,7 +2082,7 @@
 
     setSearchRange(interMode.cu, mvp, m_param->searchRange, mvmin, mvmax);
 
-    int satdCost = m_me.motionEstimate(&m_slice->m_mref[list][ref], mvmin, mvmax, mvp, numMvc, mvc, m_param->searchRange, outmv, 
+    int satdCost = m_me.motionEstimate(&m_slice->m_mref[list][ref], mvmin, mvmax, mvp, numMvc, mvc, m_param->searchRange, outmv, m_param->maxSlices, 
       m_param->bSourceReferenceEstimation ? m_slice->m_refFrameList[list][ref]->m_fencPic->getLumaAddr(0) : 0);
 
     /* Get total cost of partition, but only include MV bit cost once */
@@ -2108,6 +2108,17 @@
     }
 }
 
+void Search::searchMV(Mode& interMode, const PredictionUnit& pu, int list, int ref, MV& outmv)
+{
+    CUData& cu = interMode.cu;
+    const Slice *slice = m_slice;
+    MV mv = cu.m_mv[list][pu.puAbsPartIdx];
+    cu.clipMv(mv);
+    MV mvmin, mvmax;
+    setSearchRange(cu, mv, m_param->searchRange, mvmin, mvmax);
+    m_me.refineMV(&slice->m_mref[list][ref], mvmin, mvmax, mv, outmv);
+}
+
 /* find the best inter prediction for each PU of specified mode */
 void Search::predInterSearch(Mode& interMode, const CUGeom& cuGeom, bool bChromaMC, uint32_t refMasks[2])
 {
@@ -2150,7 +2161,7 @@
         cu.getNeighbourMV(puIdx, pu.puAbsPartIdx, interMode.interNeighbours);
 
         /* Uni-directional prediction */
-        if ((m_param->analysisMode == X265_ANALYSIS_LOAD && m_param->analysisRefineLevel > 1)
+        if ((m_param->analysisReuseMode == X265_ANALYSIS_LOAD && m_param->analysisReuseLevel > 1 && m_param->analysisReuseLevel != 10)
             || (m_param->analysisMultiPassRefine && m_param->rc.bStatRead))
         {
             for (int list = 0; list < numPredDir; list++)
@@ -2180,7 +2191,7 @@
                 if (m_param->analysisMultiPassRefine && m_param->rc.bStatRead && mvpIdx == bestME[list].mvpIdx)
                     mvpIn = bestME[list].mv;
                     
-                int satdCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvpIn, numMvc, mvc, m_param->searchRange, outmv,
+                int satdCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvpIn, numMvc, mvc, m_param->searchRange, outmv, m_param->maxSlices, 
                   m_param->bSourceReferenceEstimation ? m_slice->m_refFrameList[list][ref]->m_fencPic->getLumaAddr(0) : 0);
 
                 /* Get total cost of partition, but only include MV bit cost once */
@@ -2286,7 +2297,7 @@
                     int mvpIdx = selectMVP(cu, pu, amvp, list, ref);
                     MV mvmin, mvmax, outmv, mvp = amvp[mvpIdx];
 
-                    if (!m_param->analysisMode) /* Prevents load/save outputs from diverging when lowresMV is not available */
+                    if (!m_param->analysisReuseMode) /* Prevents load/save outputs from diverging when lowresMV is not available */
                     {
                         MV lmv = getLowresMV(cu, pu, list, ref);
                         if (lmv.notZero())
@@ -2300,7 +2311,7 @@
                             m_me.integral[planes] = interMode.fencYuv->m_integral[list][ref][planes] + puX * pu.width + puY * pu.height * m_slice->m_refFrameList[list][ref]->m_reconPic->m_stride;
                     }
                     setSearchRange(cu, mvp, m_param->searchRange, mvmin, mvmax);
-                    int satdCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvp, numMvc, mvc, m_param->searchRange, outmv, 
+                    int satdCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvp, numMvc, mvc, m_param->searchRange, outmv, m_param->maxSlices, 
                       m_param->bSourceReferenceEstimation ? m_slice->m_refFrameList[list][ref]->m_fencPic->getLumaAddr(0) : 0);
 
                     /* Get total cost of partition, but only include MV bit cost once */
@@ -2582,11 +2593,11 @@
     cu.clipMv(mvmax);
 
     if (cu.m_encData->m_param->bIntraRefresh && m_slice->m_sliceType == P_SLICE &&
-          cu.m_cuPelX / g_maxCUSize < m_frame->m_encData->m_pir.pirStartCol &&
+          cu.m_cuPelX / m_param->maxCUSize < m_frame->m_encData->m_pir.pirStartCol &&
           m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol < m_slice->m_sps->numCuInWidth)
     {
         int safeX, maxSafeMv;
-        safeX = m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol * g_maxCUSize - 3;
+        safeX = m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol * m_param->maxCUSize - 3;
         maxSafeMv = (safeX - cu.m_cuPelX) * 4;
         mvmax.x = X265_MIN(mvmax.x, maxSafeMv);
         mvmin.x = X265_MIN(mvmin.x, maxSafeMv);

 
@@ -120,8 +120,8 @@
             CHECKED_MALLOC(m_rqt[i].coeffRQT[0], coeff_t, sizeL + sizeC * 2);
             m_rqt[i].coeffRQT[1] = m_rqt[i].coeffRQT[0] + sizeL;
             m_rqt[i].coeffRQT[2] = m_rqt[i].coeffRQT[0] + sizeL + sizeC;
-            ok &= m_rqt[i].reconQtYuv.create(g_maxCUSize, param.internalCsp);
-            ok &= m_rqt[i].resiQtYuv.create(g_maxCUSize, param.internalCsp);
+            ok &= m_rqt[i].reconQtYuv.create(param.maxCUSize, param.internalCsp);
+            ok &= m_rqt[i].resiQtYuv.create(param.maxCUSize, param.internalCsp);
         }
     }
     else
@@ -130,15 +130,15 @@
         {
             CHECKED_MALLOC(m_rqt[i].coeffRQT[0], coeff_t, sizeL);
             m_rqt[i].coeffRQT[1] = m_rqt[i].coeffRQT[2] = NULL;
-            ok &= m_rqt[i].reconQtYuv.create(g_maxCUSize, param.internalCsp);
-            ok &= m_rqt[i].resiQtYuv.create(g_maxCUSize, param.internalCsp);
+            ok &= m_rqt[i].reconQtYuv.create(param.maxCUSize, param.internalCsp);
+            ok &= m_rqt[i].resiQtYuv.create(param.maxCUSize, param.internalCsp);
         }
     }
 
     /* the rest of these buffers are indexed per-depth */
-    for (uint32_t i = 0; i <= g_maxCUDepth; i++)
+    for (uint32_t i = 0; i <= m_param->maxCUDepth; i++)
     {
-        int cuSize = g_maxCUSize >> i;
+        int cuSize = param.maxCUSize >> i;
         ok &= m_rqt[i].tmpResiYuv.create(cuSize, param.internalCsp);
         ok &= m_rqt[i].tmpPredYuv.create(cuSize, param.internalCsp);
         ok &= m_rqt[i].bidirPredYuv[0].create(cuSize, param.internalCsp);
@@ -186,7 +186,7 @@
         m_rqt[i].resiQtYuv.destroy();
     }
 
-    for (uint32_t i = 0; i <= g_maxCUDepth; i++)
+    for (uint32_t i = 0; i <= m_param->maxCUDepth; i++)
     {
         m_rqt[i].tmpResiYuv.destroy();
         m_rqt[i].tmpPredYuv.destroy();
@@ -2073,7 +2073,7 @@
     int mvpIdx = selectMVP(interMode.cu, pu, amvp, list, ref);
     MV mvmin, mvmax, outmv, mvp = amvp[mvpIdx];
 
-    if (!m_param->analysisMode) /* Prevents load/save outputs from diverging if lowresMV is not available */
+    if (!m_param->analysisReuseMode) /* Prevents load/save outputs from diverging if lowresMV is not available */
     {
         MV lmv = getLowresMV(interMode.cu, pu, list, ref);
         if (lmv.notZero())
@@ -2082,7 +2082,7 @@
 
     setSearchRange(interMode.cu, mvp, m_param->searchRange, mvmin, mvmax);
 
-    int satdCost = m_me.motionEstimate(&m_slice->m_mref[list][ref], mvmin, mvmax, mvp, numMvc, mvc, m_param->searchRange, outmv, 
+    int satdCost = m_me.motionEstimate(&m_slice->m_mref[list][ref], mvmin, mvmax, mvp, numMvc, mvc, m_param->searchRange, outmv, m_param->maxSlices, 
       m_param->bSourceReferenceEstimation ? m_slice->m_refFrameList[list][ref]->m_fencPic->getLumaAddr(0) : 0);
 
     /* Get total cost of partition, but only include MV bit cost once */
@@ -2108,6 +2108,17 @@
     }
 }
 
+void Search::searchMV(Mode& interMode, const PredictionUnit& pu, int list, int ref, MV& outmv)
+{
+    CUData& cu = interMode.cu;
+    const Slice *slice = m_slice;
+    MV mv = cu.m_mv[list][pu.puAbsPartIdx];
+    cu.clipMv(mv);
+    MV mvmin, mvmax;
+    setSearchRange(cu, mv, m_param->searchRange, mvmin, mvmax);
+    m_me.refineMV(&slice->m_mref[list][ref], mvmin, mvmax, mv, outmv);
+}
+
 /* find the best inter prediction for each PU of specified mode */
 void Search::predInterSearch(Mode& interMode, const CUGeom& cuGeom, bool bChromaMC, uint32_t refMasks[2])
 {
@@ -2150,7 +2161,7 @@
         cu.getNeighbourMV(puIdx, pu.puAbsPartIdx, interMode.interNeighbours);
 
         /* Uni-directional prediction */
-        if ((m_param->analysisMode == X265_ANALYSIS_LOAD && m_param->analysisRefineLevel > 1)
+        if ((m_param->analysisReuseMode == X265_ANALYSIS_LOAD && m_param->analysisReuseLevel > 1 && m_param->analysisReuseLevel != 10)
             || (m_param->analysisMultiPassRefine && m_param->rc.bStatRead))
         {
             for (int list = 0; list < numPredDir; list++)
@@ -2180,7 +2191,7 @@
                 if (m_param->analysisMultiPassRefine && m_param->rc.bStatRead && mvpIdx == bestME[list].mvpIdx)
                     mvpIn = bestME[list].mv;
                     
-                int satdCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvpIn, numMvc, mvc, m_param->searchRange, outmv,
+                int satdCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvpIn, numMvc, mvc, m_param->searchRange, outmv, m_param->maxSlices, 
                   m_param->bSourceReferenceEstimation ? m_slice->m_refFrameList[list][ref]->m_fencPic->getLumaAddr(0) : 0);
 
                 /* Get total cost of partition, but only include MV bit cost once */
@@ -2286,7 +2297,7 @@
                     int mvpIdx = selectMVP(cu, pu, amvp, list, ref);
                     MV mvmin, mvmax, outmv, mvp = amvp[mvpIdx];
 
-                    if (!m_param->analysisMode) /* Prevents load/save outputs from diverging when lowresMV is not available */
+                    if (!m_param->analysisReuseMode) /* Prevents load/save outputs from diverging when lowresMV is not available */
                     {
                         MV lmv = getLowresMV(cu, pu, list, ref);
                         if (lmv.notZero())
@@ -2300,7 +2311,7 @@
                             m_me.integral[planes] = interMode.fencYuv->m_integral[list][ref][planes] + puX * pu.width + puY * pu.height * m_slice->m_refFrameList[list][ref]->m_reconPic->m_stride;
                     }
                     setSearchRange(cu, mvp, m_param->searchRange, mvmin, mvmax);
-                    int satdCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvp, numMvc, mvc, m_param->searchRange, outmv, 
+                    int satdCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvp, numMvc, mvc, m_param->searchRange, outmv, m_param->maxSlices, 
                       m_param->bSourceReferenceEstimation ? m_slice->m_refFrameList[list][ref]->m_fencPic->getLumaAddr(0) : 0);
 
                     /* Get total cost of partition, but only include MV bit cost once */
@@ -2582,11 +2593,11 @@
     cu.clipMv(mvmax);
 
     if (cu.m_encData->m_param->bIntraRefresh && m_slice->m_sliceType == P_SLICE &&
-          cu.m_cuPelX / g_maxCUSize < m_frame->m_encData->m_pir.pirStartCol &&
+          cu.m_cuPelX / m_param->maxCUSize < m_frame->m_encData->m_pir.pirStartCol &&
           m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol < m_slice->m_sps->numCuInWidth)
     {
         int safeX, maxSafeMv;
-        safeX = m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol * g_maxCUSize - 3;
+        safeX = m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol * m_param->maxCUSize - 3;
         maxSafeMv = (safeX - cu.m_cuPelX) * 4;
         mvmax.x = X265_MIN(mvmax.x, maxSafeMv);
         mvmin.x = X265_MIN(mvmin.x, maxSafeMv);
​

x265_2.4.tar.gz/source/encoder/search.h -> x265_2.5.tar.gz/source/encoder/search.h Changed

 
@@ -204,9 +204,9 @@
         memset(this, 0, sizeof(*this));
     }
 
-    void accumulate(CUStats& other)
+    void accumulate(CUStats& other, x265_param& param)
     {
-        for (uint32_t i = 0; i <= g_maxCUDepth; i++)
+        for (uint32_t i = 0; i <= param.maxCUDepth; i++)
         {
             intraRDOElapsedTime[i] += other.intraRDOElapsedTime[i];
             interRDOElapsedTime[i] += other.interRDOElapsedTime[i];
@@ -311,6 +311,7 @@
     // estimation inter prediction (non-skip)
     void     predInterSearch(Mode& interMode, const CUGeom& cuGeom, bool bChromaMC, uint32_t masks[2]);
 
+    void     searchMV(Mode& interMode, const PredictionUnit& pu, int list, int ref, MV& outmv);
     // encode residual and compute rd-cost for inter mode
     void     encodeResAndCalcRdInterCU(Mode& interMode, const CUGeom& cuGeom);
     void     encodeResAndCalcRdSkipCU(Mode& interMode);
​

x265_2.4.tar.gz/source/encoder/sei.cpp -> x265_2.5.tar.gz/source/encoder/sei.cpp Changed

@@ -54,21 +54,23 @@
     }
     WRITE_CODE(type, 8, "payload_type");
     uint32_t payloadSize;
-    if (hrdTypes || m_payloadType == USER_DATA_UNREGISTERED)
+    if (hrdTypes || m_payloadType == USER_DATA_UNREGISTERED || m_payloadType == USER_DATA_REGISTERED_ITU_T_T35)
     {
         if (hrdTypes)
         {
             X265_CHECK(0 == (count.getNumberOfWrittenBits() & 7), "payload unaligned\n");
             payloadSize = count.getNumberOfWrittenBits() >> 3;
         }
-        else
+        else if (m_payloadType == USER_DATA_UNREGISTERED)
             payloadSize = m_payloadSize + 16;
+        else
+            payloadSize = m_payloadSize;
 
         for (; payloadSize >= 0xff; payloadSize -= 0xff)
             WRITE_CODE(0xff, 8, "payload_size");
         WRITE_CODE(payloadSize, 8, "payload_size");
     }
-    else if(m_payloadType != USER_DATA_REGISTERED_ITU_T_T35)
+    else
         WRITE_CODE(m_payloadSize, 8, "payload_size");
     /* virtual writeSEI method, write to bs */
     writeSEI(sps);

 
@@ -54,21 +54,23 @@
     }
     WRITE_CODE(type, 8, "payload_type");
     uint32_t payloadSize;
-    if (hrdTypes || m_payloadType == USER_DATA_UNREGISTERED)
+    if (hrdTypes || m_payloadType == USER_DATA_UNREGISTERED || m_payloadType == USER_DATA_REGISTERED_ITU_T_T35)
     {
         if (hrdTypes)
         {
             X265_CHECK(0 == (count.getNumberOfWrittenBits() & 7), "payload unaligned\n");
             payloadSize = count.getNumberOfWrittenBits() >> 3;
         }
-        else
+        else if (m_payloadType == USER_DATA_UNREGISTERED)
             payloadSize = m_payloadSize + 16;
+        else
+            payloadSize = m_payloadSize;
 
         for (; payloadSize >= 0xff; payloadSize -= 0xff)
             WRITE_CODE(0xff, 8, "payload_size");
         WRITE_CODE(payloadSize, 8, "payload_size");
     }
-    else if(m_payloadType != USER_DATA_REGISTERED_ITU_T_T35)
+    else
         WRITE_CODE(m_payloadSize, 8, "payload_size");
     /* virtual writeSEI method, write to bs */
     writeSEI(sps);
​

x265_2.4.tar.gz/source/encoder/sei.h -> x265_2.5.tar.gz/source/encoder/sei.h Changed

 
@@ -276,27 +276,17 @@
         m_payloadSize = 0;
     }
 
-    uint8_t *cim;
+    uint8_t *m_payload;
 
     // daniel.vt@samsung.com :: for the Creative Intent Meta Data Encoding ( seongnam.oh@samsung.com )
     void writeSEI(const SPS&)
     {
-        if (!cim)
+        if (!m_payload)
             return;
 
-        int i = 0;
-        int payloadSize = m_payloadSize;
-        while (cim[i] == 0xFF)
-        {
-            i++;
-            payloadSize += cim[i];
-            WRITE_CODE(0xFF, 8, "payload_size");
-        }
-        WRITE_CODE(payloadSize, 8, "payload_size");
-        i++;
-        payloadSize += i;
-        for (; i < payloadSize; ++i)
-            WRITE_CODE(cim[i], 8, "creative_intent_metadata");
+        uint32_t i = 0;
+        for (; i < m_payloadSize; ++i)
+            WRITE_CODE(m_payload[i], 8, "creative_intent_metadata");
     }
 };
 }
​

x265_2.4.tar.gz/source/encoder/slicetype.cpp -> x265_2.5.tar.gz/source/encoder/slicetype.cpp Changed

@@ -893,7 +893,7 @@
     if (m_param->rc.cuTree && !m_param->rc.bStatRead)
         /* update row satds based on cutree offsets */
         curFrame->m_lowres.satdCost = frameCostRecalculate(frames, p0, p1, b);
-    else if (m_param->analysisMode != X265_ANALYSIS_LOAD)
+    else if (m_param->analysisReuseMode != X265_ANALYSIS_LOAD || m_param->scaleFactor)
     {
         if (m_param->rc.aqMode)
             curFrame->m_lowres.satdCost = curFrame->m_lowres.costEstAq[b - p0][p1 - b];
@@ -907,7 +907,7 @@
         curFrame->m_lowres.lowresCostForRc = curFrame->m_lowres.lowresCosts[b - p0][p1 - b];
         uint32_t lowresRow = 0, lowresCol = 0, lowresCuIdx = 0, sum = 0, intraSum = 0;
         uint32_t scale = m_param->maxCUSize / (2 * X265_LOWRES_CU_SIZE);
-        uint32_t numCuInHeight = (m_param->sourceHeight + g_maxCUSize - 1) / g_maxCUSize;
+        uint32_t numCuInHeight = (m_param->sourceHeight + m_param->maxCUSize - 1) / m_param->maxCUSize;
         uint32_t widthInLowresCu = (uint32_t)m_8x8Width, heightInLowresCu = (uint32_t)m_8x8Height;
         double *qp_offset = 0;
         /* Factor in qpoffsets based on Aq/Cutree in CU costs */
@@ -1638,6 +1638,13 @@
             m_isSceneTransition = false; /* Signal end of scene transitioning */
     }
 
+    if (m_param->csvLogLevel >= 2)
+    {
+        int64_t icost = frames[p1]->costEst[0][0];
+        int64_t pcost = frames[p1]->costEst[p1 - p0][0];
+        frames[p1]->ipCostRatio = (double)icost / pcost;
+    }
+
     /* A frame is always analysed with bRealScenecut = true first, and then bRealScenecut = false,
        the former for I decisions and the latter for P/B decisions. It's possible that the first 
        analysis detected scenecuts which were later nulled due to scene transitioning, in which 
@@ -1812,7 +1819,8 @@
                     MV *mvs = frames[b]->lowresMvs[list][listDist[list]];
                     int32_t x = mvs[cuIndex].x;
                     int32_t y = mvs[cuIndex].y;
-                    displacement += sqrt(pow(abs(x), 2) + pow(abs(y), 2));
+                    // NOTE: the dynamic range of abs(x) and abs(y) is 15-bits
+                    displacement += sqrt((double)(abs(x) * abs(x)) + (double)(abs(y) * abs(y)));
                 }
                 else
                     displacement += 0.0;
@@ -2400,7 +2408,7 @@
 
         /* ME will never return a cost larger than the cost @MVP, so we do not
          * have to check that ME cost is more than the estimated merge cost */
-        fencCost = tld.me.motionEstimate(fref, mvmin, mvmax, mvp, 0, NULL, s_merange, *fencMV);
+        fencCost = tld.me.motionEstimate(fref, mvmin, mvmax, mvp, 0, NULL, s_merange, *fencMV, m_lookahead.m_param->maxSlices);
         if (skipCost < 64 && skipCost < fencCost && bBidir)
         {
             fencCost = skipCost;

 
@@ -893,7 +893,7 @@
     if (m_param->rc.cuTree && !m_param->rc.bStatRead)
         /* update row satds based on cutree offsets */
         curFrame->m_lowres.satdCost = frameCostRecalculate(frames, p0, p1, b);
-    else if (m_param->analysisMode != X265_ANALYSIS_LOAD)
+    else if (m_param->analysisReuseMode != X265_ANALYSIS_LOAD || m_param->scaleFactor)
     {
         if (m_param->rc.aqMode)
             curFrame->m_lowres.satdCost = curFrame->m_lowres.costEstAq[b - p0][p1 - b];
@@ -907,7 +907,7 @@
         curFrame->m_lowres.lowresCostForRc = curFrame->m_lowres.lowresCosts[b - p0][p1 - b];
         uint32_t lowresRow = 0, lowresCol = 0, lowresCuIdx = 0, sum = 0, intraSum = 0;
         uint32_t scale = m_param->maxCUSize / (2 * X265_LOWRES_CU_SIZE);
-        uint32_t numCuInHeight = (m_param->sourceHeight + g_maxCUSize - 1) / g_maxCUSize;
+        uint32_t numCuInHeight = (m_param->sourceHeight + m_param->maxCUSize - 1) / m_param->maxCUSize;
         uint32_t widthInLowresCu = (uint32_t)m_8x8Width, heightInLowresCu = (uint32_t)m_8x8Height;
         double *qp_offset = 0;
         /* Factor in qpoffsets based on Aq/Cutree in CU costs */
@@ -1638,6 +1638,13 @@
             m_isSceneTransition = false; /* Signal end of scene transitioning */
     }
 
+    if (m_param->csvLogLevel >= 2)
+    {
+        int64_t icost = frames[p1]->costEst[0][0];
+        int64_t pcost = frames[p1]->costEst[p1 - p0][0];
+        frames[p1]->ipCostRatio = (double)icost / pcost;
+    }
+
     /* A frame is always analysed with bRealScenecut = true first, and then bRealScenecut = false,
        the former for I decisions and the latter for P/B decisions. It's possible that the first 
        analysis detected scenecuts which were later nulled due to scene transitioning, in which 
@@ -1812,7 +1819,8 @@
                     MV *mvs = frames[b]->lowresMvs[list][listDist[list]];
                     int32_t x = mvs[cuIndex].x;
                     int32_t y = mvs[cuIndex].y;
-                    displacement += sqrt(pow(abs(x), 2) + pow(abs(y), 2));
+                    // NOTE: the dynamic range of abs(x) and abs(y) is 15-bits
+                    displacement += sqrt((double)(abs(x) * abs(x)) + (double)(abs(y) * abs(y)));
                 }
                 else
                     displacement += 0.0;
@@ -2400,7 +2408,7 @@
 
         /* ME will never return a cost larger than the cost @MVP, so we do not
          * have to check that ME cost is more than the estimated merge cost */
-        fencCost = tld.me.motionEstimate(fref, mvmin, mvmax, mvp, 0, NULL, s_merange, *fencMV);
+        fencCost = tld.me.motionEstimate(fref, mvmin, mvmax, mvp, 0, NULL, s_merange, *fencMV, m_lookahead.m_param->maxSlices);
         if (skipCost < 64 && skipCost < fencCost && bBidir)
         {
             fencCost = skipCost;
​

x265_2.4.tar.gz/source/test/ipfilterharness.cpp -> x265_2.5.tar.gz/source/test/ipfilterharness.cpp Changed

 
@@ -38,10 +38,8 @@
     {
         pixel_test_buff[0][i] = rand() & PIXEL_MAX;
         short_test_buff[0][i] = (rand() % (2 * SMAX)) - SMAX;
-
         pixel_test_buff[1][i] = PIXEL_MIN;
-        short_test_buff[1][i] = SMIN;
-
+        short_test_buff[1][i] = (int16_t)SMIN;
         pixel_test_buff[2][i] = PIXEL_MAX;
         short_test_buff[2][i] = SMAX;
     }
​

x265_2.4.tar.gz/source/test/ipfilterharness.h -> x265_2.5.tar.gz/source/test/ipfilterharness.h Changed

 
@@ -39,8 +39,7 @@
     enum { ITERS = 100 };
     enum { TEST_CASES = 3 };
     enum { SMAX = 1 << 12 };
-    enum { SMIN = -1 << 12 };
-
+    enum { SMIN = (unsigned)-1 << 12 };
     ALIGN_VAR_32(pixel, pixel_buff[TEST_BUF_SIZE]);
     int16_t short_buff[TEST_BUF_SIZE];
     int16_t IPF_vec_output_s[TEST_BUF_SIZE];
​

x265_2.4.tar.gz/source/test/pixelharness.cpp -> x265_2.5.tar.gz/source/test/pixelharness.cpp Changed

@@ -44,9 +44,8 @@
         uchar_test_buff[0][i]   = rand() % ((1 << 8) - 1);
         residual_test_buff[0][i] = (rand() % (2 * RMAX + 1)) - RMAX - 1;// For sse_ss only
         double_test_buff[0][i]  = (double)(short_test_buff[0][i]) / 256.0;
-
         pixel_test_buff[1][i]   = PIXEL_MIN;
-        short_test_buff[1][i]   = SMIN;
+        short_test_buff[1][i]   = (int16_t)SMIN;
         short_test_buff1[1][i]  = PIXEL_MIN;
         short_test_buff2[1][i]  = -16384;
         int_test_buff[1][i]     = SHORT_MIN;
@@ -2003,6 +2002,76 @@
     return true;
 }
 
+bool PixelHarness::check_integral_initv(integralv_t ref, integralv_t opt)
+{
+    intptr_t srcStep = 64;
+    int j = 0;
+    uint32_t dst_ref[BUFFSIZE] = { 0 };
+    uint32_t dst_opt[BUFFSIZE] = { 0 };
+
+    for (int i = 0; i < 64; i++)
+    {
+        dst_ref[i] = pixel_test_buff[0][i];
+        dst_opt[i] = pixel_test_buff[0][i];
+    }
+
+    for (int i = 0, k = 0; i < BUFFSIZE; i++)
+    {
+        if (i % 64 == 0)
+            k++;
+        dst_ref[i] = dst_ref[i % 64] + k;
+        dst_opt[i] = dst_opt[i % 64] + k;
+    }
+
+    int padx = 4;
+    int pady = 4;
+    uint32_t *dst_ref_ptr = dst_ref + srcStep * pady + padx;
+    uint32_t *dst_opt_ptr = dst_opt + srcStep * pady + padx;
+    for (int i = 0; i < ITERS; i++)
+    {
+        ref(dst_ref_ptr, srcStep);
+        checked(opt, dst_opt_ptr, srcStep);
+
+        if (memcmp(dst_ref, dst_opt, sizeof(uint32_t) * BUFFSIZE))
+            return false;
+
+        reportfail()
+            j += INCR;
+    }
+    return true;
+}
+
+bool PixelHarness::check_integral_inith(integralh_t ref, integralh_t opt)
+{
+    /* Since stride is always a multiple of 8 and data movement in AVX2 is 16 elements at a time for 8 bit pixel, we need
+     * to check correctness for two cases: stride multiple of 16 and stride not a multiple of 16; fine for High bit depth
+     * where data movement in AVX2 is 8 elements at a time */
+    intptr_t srcStep[2] = { 56, 64 };
+    int j = 0;
+    uint32_t dst_ref[BUFFSIZE] = { 0 };
+    uint32_t dst_opt[BUFFSIZE] = { 0 };
+
+    int padx = 4;
+    int pady = 4;
+    for (int l = 0; l < 2; l++)
+    {
+        uint32_t *dst_ref_ptr = dst_ref + srcStep[l] * pady + padx;
+        uint32_t *dst_opt_ptr = dst_opt + srcStep[l] * pady + padx;
+        for (int k = 0; k < ITERS; k++)
+        {
+            ref(dst_ref_ptr, pixel_test_buff[0], srcStep[l]);
+            checked(opt, dst_opt_ptr, pixel_test_buff[0], srcStep[l]);
+
+            if (memcmp(dst_ref, dst_opt, sizeof(uint32_t) * BUFFSIZE))
+                return false;
+
+            reportfail()
+                j += INCR;
+        }
+    }
+    return true;
+}
+
 bool PixelHarness::testPU(int part, const EncoderPrimitives& ref, const EncoderPrimitives& opt)
 {
     if (opt.pu[part].satd)
@@ -2688,6 +2757,64 @@
         }
     }
 
+    for (int k = 0; k < NUM_INTEGRAL_SIZE; k++)
+    {
+        if (opt.integral_initv[k] && !check_integral_initv(ref.integral_initv[k], opt.integral_initv[k]))
+        {
+            switch (k)
+            {
+            case 0:
+                printf("Integral4v failed!\n");
+                break;
+            case 1:
+                printf("Integral8v failed!\n");
+                break;
+            case 2:
+                printf("Integral12v failed!\n");
+                break;
+            case 3:
+                printf("Integral16v failed!\n");
+                break;
+            case 4:
+                printf("Integral24v failed!\n");
+                break;
+            case 5:
+                printf("Integral32v failed!\n");
+                break;
+            }
+            return false;
+        }
+    }
+
+
+    for (int k = 0; k < NUM_INTEGRAL_SIZE; k++)
+    {
+        if (opt.integral_inith[k] && !check_integral_inith(ref.integral_inith[k], opt.integral_inith[k]))
+        {
+            switch (k)
+            {
+                case 0:
+                    printf("Integral4h failed!\n");
+                    break;
+                case 1:
+                    printf("Integral8h failed!\n");
+                    break;
+                case 2:
+                    printf("Integral12h failed!\n");
+                    break;
+                case 3:
+                    printf("Integral16h failed!\n");
+                    break;
+                case 4:
+                    printf("Integral24h failed!\n");
+                    break;
+                case 5:
+                    printf("Integral32h failed!\n");
+                    break;
+            }
+            return false;
+        }
+    }
     return true;
 }
 
@@ -3210,4 +3337,67 @@
         HEADER0("pelFilterChroma_Horizontal");
         REPORT_SPEEDUP(opt.pelFilterChroma[1], ref.pelFilterChroma[1], pbuf1, 1, STRIDE, tc, maskP, maskQ);
     }
+
+    for (int k = 0; k < NUM_INTEGRAL_SIZE; k++)
+    {
+        if (opt.integral_initv[k])
+        {
+            switch (k)
+            {
+                case 0:
+                    HEADER0("integral_init4v");
+                    break;
+                case 1:
+                    HEADER0("integral_init8v");
+                    break;
+                case 2:
+                    HEADER0("integral_init12v");
+                    break;
+                case 3:
+                    HEADER0("integral_init16v");
+                    break;
+                case 4:
+                    HEADER0("integral_init24v");
+                    break;
+                case 5:
+                    HEADER0("integral_init32v");
+                    break;
+                default:
+                    break;
+            }
+            REPORT_SPEEDUP(opt.integral_initv[k], ref.integral_initv[k], (uint32_t*)pbuf1, STRIDE);
+        }
+    }
+
+    for (int k = 0; k < NUM_INTEGRAL_SIZE; k++)
+    {
+        if (opt.integral_inith[k])
+        {
+            uint32_t dst_buf[BUFFSIZE] = { 0 };
+            switch (k)
+            {
+            case 0:
+                HEADER0("integral_init4h");
+                break;
+            case 1:

 
@@ -44,9 +44,8 @@
         uchar_test_buff[0][i]   = rand() % ((1 << 8) - 1);
         residual_test_buff[0][i] = (rand() % (2 * RMAX + 1)) - RMAX - 1;// For sse_ss only
         double_test_buff[0][i]  = (double)(short_test_buff[0][i]) / 256.0;
-
         pixel_test_buff[1][i]   = PIXEL_MIN;
-        short_test_buff[1][i]   = SMIN;
+        short_test_buff[1][i]   = (int16_t)SMIN;
         short_test_buff1[1][i]  = PIXEL_MIN;
         short_test_buff2[1][i]  = -16384;
         int_test_buff[1][i]     = SHORT_MIN;
@@ -2003,6 +2002,76 @@
     return true;
 }
 
+bool PixelHarness::check_integral_initv(integralv_t ref, integralv_t opt)
+{
+    intptr_t srcStep = 64;
+    int j = 0;
+    uint32_t dst_ref[BUFFSIZE] = { 0 };
+    uint32_t dst_opt[BUFFSIZE] = { 0 };
+
+    for (int i = 0; i < 64; i++)
+    {
+        dst_ref[i] = pixel_test_buff[0][i];
+        dst_opt[i] = pixel_test_buff[0][i];
+    }
+
+    for (int i = 0, k = 0; i < BUFFSIZE; i++)
+    {
+        if (i % 64 == 0)
+            k++;
+        dst_ref[i] = dst_ref[i % 64] + k;
+        dst_opt[i] = dst_opt[i % 64] + k;
+    }
+
+    int padx = 4;
+    int pady = 4;
+    uint32_t *dst_ref_ptr = dst_ref + srcStep * pady + padx;
+    uint32_t *dst_opt_ptr = dst_opt + srcStep * pady + padx;
+    for (int i = 0; i < ITERS; i++)
+    {
+        ref(dst_ref_ptr, srcStep);
+        checked(opt, dst_opt_ptr, srcStep);
+
+        if (memcmp(dst_ref, dst_opt, sizeof(uint32_t) * BUFFSIZE))
+            return false;
+
+        reportfail()
+            j += INCR;
+    }
+    return true;
+}
+
+bool PixelHarness::check_integral_inith(integralh_t ref, integralh_t opt)
+{
+    /* Since stride is always a multiple of 8 and data movement in AVX2 is 16 elements at a time for 8 bit pixel, we need
+     * to check correctness for two cases: stride multiple of 16 and stride not a multiple of 16; fine for High bit depth
+     * where data movement in AVX2 is 8 elements at a time */
+    intptr_t srcStep[2] = { 56, 64 };
+    int j = 0;
+    uint32_t dst_ref[BUFFSIZE] = { 0 };
+    uint32_t dst_opt[BUFFSIZE] = { 0 };
+
+    int padx = 4;
+    int pady = 4;
+    for (int l = 0; l < 2; l++)
+    {
+        uint32_t *dst_ref_ptr = dst_ref + srcStep[l] * pady + padx;
+        uint32_t *dst_opt_ptr = dst_opt + srcStep[l] * pady + padx;
+        for (int k = 0; k < ITERS; k++)
+        {
+            ref(dst_ref_ptr, pixel_test_buff[0], srcStep[l]);
+            checked(opt, dst_opt_ptr, pixel_test_buff[0], srcStep[l]);
+
+            if (memcmp(dst_ref, dst_opt, sizeof(uint32_t) * BUFFSIZE))
+                return false;
+
+            reportfail()
+                j += INCR;
+        }
+    }
+    return true;
+}
+
 bool PixelHarness::testPU(int part, const EncoderPrimitives& ref, const EncoderPrimitives& opt)
 {
     if (opt.pu[part].satd)
@@ -2688,6 +2757,64 @@
         }
     }
 
+    for (int k = 0; k < NUM_INTEGRAL_SIZE; k++)
+    {
+        if (opt.integral_initv[k] && !check_integral_initv(ref.integral_initv[k], opt.integral_initv[k]))
+        {
+            switch (k)
+            {
+            case 0:
+                printf("Integral4v failed!\n");
+                break;
+            case 1:
+                printf("Integral8v failed!\n");
+                break;
+            case 2:
+                printf("Integral12v failed!\n");
+                break;
+            case 3:
+                printf("Integral16v failed!\n");
+                break;
+            case 4:
+                printf("Integral24v failed!\n");
+                break;
+            case 5:
+                printf("Integral32v failed!\n");
+                break;
+            }
+            return false;
+        }
+    }
+
+
+    for (int k = 0; k < NUM_INTEGRAL_SIZE; k++)
+    {
+        if (opt.integral_inith[k] && !check_integral_inith(ref.integral_inith[k], opt.integral_inith[k]))
+        {
+            switch (k)
+            {
+                case 0:
+                    printf("Integral4h failed!\n");
+                    break;
+                case 1:
+                    printf("Integral8h failed!\n");
+                    break;
+                case 2:
+                    printf("Integral12h failed!\n");
+                    break;
+                case 3:
+                    printf("Integral16h failed!\n");
+                    break;
+                case 4:
+                    printf("Integral24h failed!\n");
+                    break;
+                case 5:
+                    printf("Integral32h failed!\n");
+                    break;
+            }
+            return false;
+        }
+    }
     return true;
 }
 
@@ -3210,4 +3337,67 @@
         HEADER0("pelFilterChroma_Horizontal");
         REPORT_SPEEDUP(opt.pelFilterChroma[1], ref.pelFilterChroma[1], pbuf1, 1, STRIDE, tc, maskP, maskQ);
     }
+
+    for (int k = 0; k < NUM_INTEGRAL_SIZE; k++)
+    {
+        if (opt.integral_initv[k])
+        {
+            switch (k)
+            {
+                case 0:
+                    HEADER0("integral_init4v");
+                    break;
+                case 1:
+                    HEADER0("integral_init8v");
+                    break;
+                case 2:
+                    HEADER0("integral_init12v");
+                    break;
+                case 3:
+                    HEADER0("integral_init16v");
+                    break;
+                case 4:
+                    HEADER0("integral_init24v");
+                    break;
+                case 5:
+                    HEADER0("integral_init32v");
+                    break;
+                default:
+                    break;
+            }
+            REPORT_SPEEDUP(opt.integral_initv[k], ref.integral_initv[k], (uint32_t*)pbuf1, STRIDE);
+        }
+    }
+
+    for (int k = 0; k < NUM_INTEGRAL_SIZE; k++)
+    {
+        if (opt.integral_inith[k])
+        {
+            uint32_t dst_buf[BUFFSIZE] = { 0 };
+            switch (k)
+            {
+            case 0:
+                HEADER0("integral_init4h");
+                break;
+            case 1:
​

x265_2.4.tar.gz/source/test/pixelharness.h -> x265_2.5.tar.gz/source/test/pixelharness.h Changed

 
@@ -40,7 +40,7 @@
     enum { BUFFSIZE = STRIDE * (MAX_HEIGHT + PAD_ROWS) + INCR * ITERS };
     enum { TEST_CASES = 3 };
     enum { SMAX = 1 << 12 };
-    enum { SMIN = -1 << 12 };
+    enum { SMIN = (unsigned)-1 << 12 };
     enum { RMAX = PIXEL_MAX - PIXEL_MIN }; //The maximum value obtained by subtracting pixel values (residual max)
     enum { RMIN = PIXEL_MIN - PIXEL_MAX }; //The minimum value obtained by subtracting pixel values (residual min)
 
@@ -126,6 +126,8 @@
     bool check_pelFilterLumaStrong_H(pelFilterLumaStrong_t ref, pelFilterLumaStrong_t opt);
     bool check_pelFilterChroma_V(pelFilterChroma_t ref, pelFilterChroma_t opt);
     bool check_pelFilterChroma_H(pelFilterChroma_t ref, pelFilterChroma_t opt);
+    bool check_integral_initv(integralv_t ref, integralv_t opt);
+    bool check_integral_inith(integralh_t ref, integralh_t opt);
 
 public:
 
​

x265_2.4.tar.gz/source/test/regression-tests.txt -> x265_2.5.tar.gz/source/test/regression-tests.txt Changed

@@ -17,17 +17,17 @@
 BasketballDrive_1920x1080_50.y4m,--preset faster --aq-strength 2 --merange 190 --slices 3
 BasketballDrive_1920x1080_50.y4m,--preset medium --ctu 16 --max-tu-size 8 --subme 7 --qg-size 16 --cu-lossless --tu-inter-depth 3 --limit-tu 1
 BasketballDrive_1920x1080_50.y4m,--preset medium --keyint -1 --nr-inter 100 -F4 --no-sao
-BasketballDrive_1920x1080_50.y4m,--preset medium --no-cutree --analysis-mode=save --refine-level 2 --bitrate 7000 --limit-modes,--preset medium --no-cutree --analysis-mode=load --refine-level 2 --bitrate 7000 --limit-modes
+BasketballDrive_1920x1080_50.y4m,--preset medium --no-cutree --analysis-reuse-mode=save --analysis-reuse-level 2 --bitrate 7000 --limit-modes,--preset medium --no-cutree --analysis-reuse-mode=load --analysis-reuse-level 2 --bitrate 7000 --limit-modes
 BasketballDrive_1920x1080_50.y4m,--preset slow --nr-intra 100 -F4 --aq-strength 3 --qg-size 16 --limit-refs 1
 BasketballDrive_1920x1080_50.y4m,--preset slower --lossless --chromaloc 3 --subme 0 --limit-tu 4
-BasketballDrive_1920x1080_50.y4m,--preset slower --no-cutree --analysis-mode=save --refine-level 10 --bitrate 7000 --limit-tu 0,--preset slower --no-cutree --analysis-mode=load --refine-level 10 --bitrate 7000 --limit-tu 0
+BasketballDrive_1920x1080_50.y4m,--preset slower --no-cutree --analysis-reuse-mode=save --analysis-reuse-level 10 --bitrate 7000 --limit-tu 0,--preset slower --no-cutree --analysis-reuse-mode=load --analysis-reuse-level 10 --bitrate 7000 --limit-tu 0
 BasketballDrive_1920x1080_50.y4m,--preset veryslow --crf 4 --cu-lossless --pmode --limit-refs 1 --aq-mode 3 --limit-tu 3
-BasketballDrive_1920x1080_50.y4m,--preset veryslow --no-cutree --analysis-mode=save --bitrate 7000 --tskip-fast --limit-tu 4,--preset veryslow --no-cutree --analysis-mode=load --bitrate 7000  --tskip-fast --limit-tu 4
+BasketballDrive_1920x1080_50.y4m,--preset veryslow --no-cutree --analysis-reuse-mode=save --bitrate 7000 --tskip-fast --limit-tu 4,--preset veryslow --no-cutree --analysis-reuse-mode=load --bitrate 7000  --tskip-fast --limit-tu 4
 BasketballDrive_1920x1080_50.y4m,--preset veryslow --recon-y4m-exec "ffplay -i pipe:0 -autoexit"
 Coastguard-4k.y4m,--preset ultrafast --recon-y4m-exec "ffplay -i pipe:0 -autoexit"
 Coastguard-4k.y4m,--preset superfast --tune grain --overscan=crop
 Coastguard-4k.y4m,--preset superfast --tune grain --pme --aq-strength 2 --merange 190
-Coastguard-4k.y4m,--preset veryfast --no-cutree --analysis-mode=save --refine-level 1 --bitrate 15000,--preset veryfast --no-cutree --analysis-mode=load --refine-level 1 --bitrate 15000
+Coastguard-4k.y4m,--preset veryfast --no-cutree --analysis-reuse-mode=save --analysis-reuse-level 1 --bitrate 15000,--preset veryfast --no-cutree --analysis-reuse-mode=load --analysis-reuse-level 1 --bitrate 15000
 Coastguard-4k.y4m,--preset medium --rdoq-level 1 --tune ssim --no-signhide --me umh --slices 2
 Coastguard-4k.y4m,--preset slow --tune psnr --cbqpoffs -1 --crqpoffs 1 --limit-refs 1
 CrowdRun_1920x1080_50_10bit_422.yuv,--preset ultrafast --weightp --tune zerolatency --qg-size 16
@@ -51,7 +51,7 @@
 DucksAndLegs_1920x1080_60_10bit_444.yuv,--preset veryfast --weightp --nr-intra 1000 -F4
 DucksAndLegs_1920x1080_60_10bit_444.yuv,--preset medium --nr-inter 500 -F4 --no-psy-rdoq
 DucksAndLegs_1920x1080_60_10bit_444.yuv,--preset slower --no-weightp --rdoq-level 0 --limit-refs 3 --tu-inter-depth 4 --limit-tu 3
-DucksAndLegs_1920x1080_60_10bit_422.yuv,--preset fast --no-cutree --analysis-mode=save --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1,--preset fast --no-cutree --analysis-mode=load --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1
+DucksAndLegs_1920x1080_60_10bit_422.yuv,--preset fast --no-cutree --analysis-reuse-mode=save --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1,--preset fast --no-cutree --analysis-reuse-mode=load --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1
 FourPeople_1280x720_60.y4m,--preset superfast --no-wpp --lookahead-slices 2
 FourPeople_1280x720_60.y4m,--preset veryfast --aq-mode 2 --aq-strength 1.5 --qg-size 8
 FourPeople_1280x720_60.y4m,--preset medium --qp 38 --no-psy-rd
@@ -68,8 +68,8 @@
 KristenAndSara_1280x720_60.y4m,--preset slower --pmode --max-tu-size 8 --limit-refs 0 --limit-modes --limit-tu 1
 NebutaFestival_2560x1600_60_10bit_crop.yuv,--preset superfast --tune psnr
 NebutaFestival_2560x1600_60_10bit_crop.yuv,--preset medium --tune grain --limit-refs 2
-NebutaFestival_2560x1600_60_10bit_crop.yuv,--preset slow --no-cutree --analysis-mode=save --rd 5 --refine-level 10 --bitrate 9000,--preset slow --no-cutree --analysis-mode=load --rd 5 --refine-level 10 --bitrate 9000
-News-4k.y4m,--preset ultrafast --no-cutree --analysis-mode=save --refine-level 2 --bitrate 15000,--preset ultrafast --no-cutree --analysis-mode=load --refine-level 2 --bitrate 15000
+NebutaFestival_2560x1600_60_10bit_crop.yuv,--preset slow --no-cutree --analysis-reuse-mode=save --rd 5 --analysis-reuse-level 10 --bitrate 9000,--preset slow --no-cutree --analysis-reuse-mode=load --rd 5 --analysis-reuse-level 10 --bitrate 9000
+News-4k.y4m,--preset ultrafast --no-cutree --analysis-reuse-mode=save --analysis-reuse-level 2 --bitrate 15000,--preset ultrafast --no-cutree --analysis-reuse-mode=load --analysis-reuse-level 2 --bitrate 15000
 News-4k.y4m,--preset superfast --lookahead-slices 6 --aq-mode 0
 News-4k.y4m,--preset superfast --slices 4 --aq-mode 0 
 News-4k.y4m,--preset medium --tune ssim --no-sao --qg-size 16
@@ -123,7 +123,7 @@
 old_town_cross_444_720p50.y4m,--preset superfast --weightp --min-cu 16 --limit-modes
 old_town_cross_444_720p50.y4m,--preset veryfast --qp 1 --tune ssim
 old_town_cross_444_720p50.y4m,--preset faster --rd 1 --tune zero-latency
-old_town_cross_444_720p50.y4m,--preset fast --no-cutree --analysis-mode=save --refine-level 1 --bitrate 3000 --early-skip,--preset fast --no-cutree --analysis-mode=load --refine-level 1 --bitrate 3000 --early-skip
+old_town_cross_444_720p50.y4m,--preset fast --no-cutree --analysis-reuse-mode=save --analysis-reuse-level 1 --bitrate 3000 --early-skip,--preset fast --no-cutree --analysis-reuse-mode=load --analysis-reuse-level 1 --bitrate 3000 --early-skip
 old_town_cross_444_720p50.y4m,--preset medium --keyint -1 --no-weightp --ref 6
 old_town_cross_444_720p50.y4m,--preset slow --rdoq-level 1 --early-skip --ref 7 --no-b-pyramid
 old_town_cross_444_720p50.y4m,--preset slower --crf 4 --cu-lossless

 
@@ -17,17 +17,17 @@
 BasketballDrive_1920x1080_50.y4m,--preset faster --aq-strength 2 --merange 190 --slices 3
 BasketballDrive_1920x1080_50.y4m,--preset medium --ctu 16 --max-tu-size 8 --subme 7 --qg-size 16 --cu-lossless --tu-inter-depth 3 --limit-tu 1
 BasketballDrive_1920x1080_50.y4m,--preset medium --keyint -1 --nr-inter 100 -F4 --no-sao
-BasketballDrive_1920x1080_50.y4m,--preset medium --no-cutree --analysis-mode=save --refine-level 2 --bitrate 7000 --limit-modes,--preset medium --no-cutree --analysis-mode=load --refine-level 2 --bitrate 7000 --limit-modes
+BasketballDrive_1920x1080_50.y4m,--preset medium --no-cutree --analysis-reuse-mode=save --analysis-reuse-level 2 --bitrate 7000 --limit-modes,--preset medium --no-cutree --analysis-reuse-mode=load --analysis-reuse-level 2 --bitrate 7000 --limit-modes
 BasketballDrive_1920x1080_50.y4m,--preset slow --nr-intra 100 -F4 --aq-strength 3 --qg-size 16 --limit-refs 1
 BasketballDrive_1920x1080_50.y4m,--preset slower --lossless --chromaloc 3 --subme 0 --limit-tu 4
-BasketballDrive_1920x1080_50.y4m,--preset slower --no-cutree --analysis-mode=save --refine-level 10 --bitrate 7000 --limit-tu 0,--preset slower --no-cutree --analysis-mode=load --refine-level 10 --bitrate 7000 --limit-tu 0
+BasketballDrive_1920x1080_50.y4m,--preset slower --no-cutree --analysis-reuse-mode=save --analysis-reuse-level 10 --bitrate 7000 --limit-tu 0,--preset slower --no-cutree --analysis-reuse-mode=load --analysis-reuse-level 10 --bitrate 7000 --limit-tu 0
 BasketballDrive_1920x1080_50.y4m,--preset veryslow --crf 4 --cu-lossless --pmode --limit-refs 1 --aq-mode 3 --limit-tu 3
-BasketballDrive_1920x1080_50.y4m,--preset veryslow --no-cutree --analysis-mode=save --bitrate 7000 --tskip-fast --limit-tu 4,--preset veryslow --no-cutree --analysis-mode=load --bitrate 7000  --tskip-fast --limit-tu 4
+BasketballDrive_1920x1080_50.y4m,--preset veryslow --no-cutree --analysis-reuse-mode=save --bitrate 7000 --tskip-fast --limit-tu 4,--preset veryslow --no-cutree --analysis-reuse-mode=load --bitrate 7000  --tskip-fast --limit-tu 4
 BasketballDrive_1920x1080_50.y4m,--preset veryslow --recon-y4m-exec "ffplay -i pipe:0 -autoexit"
 Coastguard-4k.y4m,--preset ultrafast --recon-y4m-exec "ffplay -i pipe:0 -autoexit"
 Coastguard-4k.y4m,--preset superfast --tune grain --overscan=crop
 Coastguard-4k.y4m,--preset superfast --tune grain --pme --aq-strength 2 --merange 190
-Coastguard-4k.y4m,--preset veryfast --no-cutree --analysis-mode=save --refine-level 1 --bitrate 15000,--preset veryfast --no-cutree --analysis-mode=load --refine-level 1 --bitrate 15000
+Coastguard-4k.y4m,--preset veryfast --no-cutree --analysis-reuse-mode=save --analysis-reuse-level 1 --bitrate 15000,--preset veryfast --no-cutree --analysis-reuse-mode=load --analysis-reuse-level 1 --bitrate 15000
 Coastguard-4k.y4m,--preset medium --rdoq-level 1 --tune ssim --no-signhide --me umh --slices 2
 Coastguard-4k.y4m,--preset slow --tune psnr --cbqpoffs -1 --crqpoffs 1 --limit-refs 1
 CrowdRun_1920x1080_50_10bit_422.yuv,--preset ultrafast --weightp --tune zerolatency --qg-size 16
@@ -51,7 +51,7 @@
 DucksAndLegs_1920x1080_60_10bit_444.yuv,--preset veryfast --weightp --nr-intra 1000 -F4
 DucksAndLegs_1920x1080_60_10bit_444.yuv,--preset medium --nr-inter 500 -F4 --no-psy-rdoq
 DucksAndLegs_1920x1080_60_10bit_444.yuv,--preset slower --no-weightp --rdoq-level 0 --limit-refs 3 --tu-inter-depth 4 --limit-tu 3
-DucksAndLegs_1920x1080_60_10bit_422.yuv,--preset fast --no-cutree --analysis-mode=save --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1,--preset fast --no-cutree --analysis-mode=load --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1
+DucksAndLegs_1920x1080_60_10bit_422.yuv,--preset fast --no-cutree --analysis-reuse-mode=save --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1,--preset fast --no-cutree --analysis-reuse-mode=load --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1
 FourPeople_1280x720_60.y4m,--preset superfast --no-wpp --lookahead-slices 2
 FourPeople_1280x720_60.y4m,--preset veryfast --aq-mode 2 --aq-strength 1.5 --qg-size 8
 FourPeople_1280x720_60.y4m,--preset medium --qp 38 --no-psy-rd
@@ -68,8 +68,8 @@
 KristenAndSara_1280x720_60.y4m,--preset slower --pmode --max-tu-size 8 --limit-refs 0 --limit-modes --limit-tu 1
 NebutaFestival_2560x1600_60_10bit_crop.yuv,--preset superfast --tune psnr
 NebutaFestival_2560x1600_60_10bit_crop.yuv,--preset medium --tune grain --limit-refs 2
-NebutaFestival_2560x1600_60_10bit_crop.yuv,--preset slow --no-cutree --analysis-mode=save --rd 5 --refine-level 10 --bitrate 9000,--preset slow --no-cutree --analysis-mode=load --rd 5 --refine-level 10 --bitrate 9000
-News-4k.y4m,--preset ultrafast --no-cutree --analysis-mode=save --refine-level 2 --bitrate 15000,--preset ultrafast --no-cutree --analysis-mode=load --refine-level 2 --bitrate 15000
+NebutaFestival_2560x1600_60_10bit_crop.yuv,--preset slow --no-cutree --analysis-reuse-mode=save --rd 5 --analysis-reuse-level 10 --bitrate 9000,--preset slow --no-cutree --analysis-reuse-mode=load --rd 5 --analysis-reuse-level 10 --bitrate 9000
+News-4k.y4m,--preset ultrafast --no-cutree --analysis-reuse-mode=save --analysis-reuse-level 2 --bitrate 15000,--preset ultrafast --no-cutree --analysis-reuse-mode=load --analysis-reuse-level 2 --bitrate 15000
 News-4k.y4m,--preset superfast --lookahead-slices 6 --aq-mode 0
 News-4k.y4m,--preset superfast --slices 4 --aq-mode 0 
 News-4k.y4m,--preset medium --tune ssim --no-sao --qg-size 16
@@ -123,7 +123,7 @@
 old_town_cross_444_720p50.y4m,--preset superfast --weightp --min-cu 16 --limit-modes
 old_town_cross_444_720p50.y4m,--preset veryfast --qp 1 --tune ssim
 old_town_cross_444_720p50.y4m,--preset faster --rd 1 --tune zero-latency
-old_town_cross_444_720p50.y4m,--preset fast --no-cutree --analysis-mode=save --refine-level 1 --bitrate 3000 --early-skip,--preset fast --no-cutree --analysis-mode=load --refine-level 1 --bitrate 3000 --early-skip
+old_town_cross_444_720p50.y4m,--preset fast --no-cutree --analysis-reuse-mode=save --analysis-reuse-level 1 --bitrate 3000 --early-skip,--preset fast --no-cutree --analysis-reuse-mode=load --analysis-reuse-level 1 --bitrate 3000 --early-skip
 old_town_cross_444_720p50.y4m,--preset medium --keyint -1 --no-weightp --ref 6
 old_town_cross_444_720p50.y4m,--preset slow --rdoq-level 1 --early-skip --ref 7 --no-b-pyramid
 old_town_cross_444_720p50.y4m,--preset slower --crf 4 --cu-lossless
​

x265_2.4.tar.gz/source/x265-extras.cpp -> x265_2.5.tar.gz/source/x265-extras.cpp Changed

@@ -25,7 +25,7 @@
 
 #include "x265.h"
 #include "x265-extras.h"
-
+#include "param.h"
 #include "common.h"
 
 using namespace X265_NS;
@@ -38,14 +38,8 @@
     "B count, B ave-QP, B kbps, B-PSNR Y, B-PSNR U, B-PSNR V, B-SSIM (dB), "
     "MaxCLL, MaxFALL, Version\n";
 
-FILE* x265_csvlog_open(const x265_api& api, const x265_param& param, const char* fname, int level)
+FILE* x265_csvlog_open(const x265_param& param, const char* fname, int level)
 {
-    if (sizeof(x265_stats) != api.sizeof_stats || sizeof(x265_picture) != api.sizeof_picture)
-    {
-        fprintf(stderr, "extras [error]: structure size skew, unable to create CSV logfile\n");
-        return NULL;
-    }
-
     FILE *csvfp = x265_fopen(fname, "r");
     if (csvfp)
     {
@@ -62,6 +56,8 @@
             if (level)
             {
                 fprintf(csvfp, "Encode Order, Type, POC, QP, Bits, Scenecut, ");
+                if (level >= 2)
+                    fprintf(csvfp, "I/P cost ratio, ");
                 if (param.rc.rateControlMode == X265_RC_CRF)
                     fprintf(csvfp, "RateFactor, ");
                 if (param.rc.vbvBufferSize)
@@ -73,7 +69,7 @@
                 fprintf(csvfp, "Latency, ");
                 fprintf(csvfp, "List 0, List 1");
                 uint32_t size = param.maxCUSize;
-                for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+                for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
                 {
                     fprintf(csvfp, ", Intra %dx%d DC, Intra %dx%d Planar, Intra %dx%d Ang", size, size, size, size, size, size);
                     size /= 2;
@@ -82,7 +78,7 @@
                 size = param.maxCUSize;
                 if (param.bEnableRectInter)
                 {
-                    for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+                    for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
                     {
                         fprintf(csvfp, ", Inter %dx%d, Inter %dx%d (Rect)", size, size, size, size);
                         if (param.bEnableAMP)
@@ -92,29 +88,56 @@
                 }
                 else
                 {
-                    for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+                    for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
                     {
                         fprintf(csvfp, ", Inter %dx%d", size, size);
                         size /= 2;
                     }
                 }
                 size = param.maxCUSize;
-                for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+                for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
                 {
                     fprintf(csvfp, ", Skip %dx%d", size, size);
                     size /= 2;
                 }
                 size = param.maxCUSize;
-                for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+                for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
                 {
                     fprintf(csvfp, ", Merge %dx%d", size, size);
                     size /= 2;
                 }
-                fprintf(csvfp, ", Avg Luma Distortion, Avg Chroma Distortion, Avg psyEnergy, Avg Luma Level, Max Luma Level, Avg Residual Energy");
 
-                /* detailed performance statistics */
                 if (level >= 2)
-                    fprintf(csvfp, ", DecideWait (ms), Row0Wait (ms), Wall time (ms), Ref Wait Wall (ms), Total CTU time (ms), Stall Time (ms), Total frame time (ms), Avg WPP, Row Blocks");
+                {
+                    fprintf(csvfp, ", Avg Luma Distortion, Avg Chroma Distortion, Avg psyEnergy, Avg Residual Energy,"
+                        " Min Luma Level, Max Luma Level, Avg Luma Level");
+
+                    if (param.internalCsp != X265_CSP_I400)
+                        fprintf(csvfp, ", Min Cb Level, Max Cb Level, Avg Cb Level, Min Cr Level, Max Cr Level, Avg Cr Level");
+
+                    /* PU statistics */
+                    size = param.maxCUSize;
+                    for (uint32_t i = 0; i< param.maxLog2CUSize - (uint32_t)g_log2Size[param.minCUSize] + 1; i++)
+                    {
+                        fprintf(csvfp, ", Intra %dx%d", size, size);
+                        fprintf(csvfp, ", Skip %dx%d", size, size);
+                        fprintf(csvfp, ", AMP %d", size);
+                        fprintf(csvfp, ", Inter %dx%d", size, size);
+                        fprintf(csvfp, ", Merge %dx%d", size, size);
+                        fprintf(csvfp, ", Inter %dx%d", size, size / 2);
+                        fprintf(csvfp, ", Merge %dx%d", size, size / 2);
+                        fprintf(csvfp, ", Inter %dx%d", size / 2, size);
+                        fprintf(csvfp, ", Merge %dx%d", size / 2, size);
+                        size /= 2;
+                    }
+
+                    if ((uint32_t)g_log2Size[param.minCUSize] == 3)
+                        fprintf(csvfp, ", 4x4");
+
+                    /* detailed performance statistics */
+                    fprintf(csvfp, ", DecideWait (ms), Row0Wait (ms), Wall time (ms), Ref Wait Wall (ms), Total CTU time (ms),"
+                    "Stall Time (ms), Total frame time (ms), Avg WPP, Row Blocks");
+                }
                 fprintf(csvfp, "\n");
             }
             else
@@ -131,7 +154,10 @@
         return;
 
     const x265_frame_stats* frameStats = &pic.frameData;
-    fprintf(csvfp, "%d, %c-SLICE, %4d, %2.2lf, %10d, %d,", frameStats->encoderOrder, frameStats->sliceType, frameStats->poc, frameStats->qp, (int)frameStats->bits, frameStats->bScenecut);
+    fprintf(csvfp, "%d, %c-SLICE, %4d, %2.2lf, %10d, %d,", frameStats->encoderOrder, frameStats->sliceType, frameStats->poc, 
+                                                           frameStats->qp, (int)frameStats->bits, frameStats->bScenecut);
+    if (level >= 2)
+        fprintf(csvfp, "%.2f,", frameStats->ipCostRatio);
     if (param.rc.rateControlMode == X265_RC_CRF)
         fprintf(csvfp, "%.3lf,", frameStats->rateFactor);
     if (param.rc.vbvBufferSize)
@@ -159,39 +185,76 @@
         else
             fputs(" -,", csvfp);
     }
-    for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
-        fprintf(csvfp, "%5.2lf%%, %5.2lf%%, %5.2lf%%,", frameStats->cuStats.percentIntraDistribution[depth][0], frameStats->cuStats.percentIntraDistribution[depth][1], frameStats->cuStats.percentIntraDistribution[depth][2]);
-    fprintf(csvfp, "%5.2lf%%", frameStats->cuStats.percentIntraNxN);
-    if (param.bEnableRectInter)
+
+    if (level)
     {
-        for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+        for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
+            fprintf(csvfp, "%5.2lf%%, %5.2lf%%, %5.2lf%%,", frameStats->cuStats.percentIntraDistribution[depth][0],
+            frameStats->cuStats.percentIntraDistribution[depth][1],
+            frameStats->cuStats.percentIntraDistribution[depth][2]);
+        fprintf(csvfp, "%5.2lf%%", frameStats->cuStats.percentIntraNxN);
+        if (param.bEnableRectInter)
         {
-            fprintf(csvfp, ", %5.2lf%%, %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][0], frameStats->cuStats.percentInterDistribution[depth][1]);
-            if (param.bEnableAMP)
-                fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][2]);
+            for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
+            {
+                fprintf(csvfp, ", %5.2lf%%, %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][0],
+                    frameStats->cuStats.percentInterDistribution[depth][1]);
+                if (param.bEnableAMP)
+                    fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][2]);
+            }
         }
+        else
+        {
+            for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
+                fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][0]);
+        }
+        for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
+            fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentSkipCu[depth]);
+        for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
+            fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentMergeCu[depth]);
     }
-    else
-    {
-        for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
-            fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][0]);
-    }
-    for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
-        fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentSkipCu[depth]);
-    for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
-        fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentMergeCu[depth]);
-    fprintf(csvfp, ", %.2lf, %.2lf, %.2lf, %.2lf, %d, %.2lf", frameStats->avgLumaDistortion, frameStats->avgChromaDistortion, frameStats->avgPsyEnergy, frameStats->avgLumaLevel, frameStats->maxLumaLevel, frameStats->avgResEnergy);
 
     if (level >= 2)
     {
-        fprintf(csvfp, ", %.1lf, %.1lf, %.1lf, %.1lf, %.1lf, %.1lf, %.1lf,", frameStats->decideWaitTime, frameStats->row0WaitTime, frameStats->wallTime, frameStats->refWaitWallTime, frameStats->totalCTUTime, frameStats->stallTime, frameStats->totalFrameTime);
+        fprintf(csvfp, ", %.2lf, %.2lf, %.2lf, %.2lf ", frameStats->avgLumaDistortion,
+            frameStats->avgChromaDistortion,
+            frameStats->avgPsyEnergy,
+            frameStats->avgResEnergy);
+
+        fprintf(csvfp, ", %d, %d, %.2lf", frameStats->minLumaLevel, frameStats->maxLumaLevel, frameStats->avgLumaLevel);
+
+        if (param.internalCsp != X265_CSP_I400)
+        {
+            fprintf(csvfp, ", %d, %d, %.2lf", frameStats->minChromaULevel, frameStats->maxChromaULevel, frameStats->avgChromaULevel);
+            fprintf(csvfp, ", %d, %d, %.2lf", frameStats->minChromaVLevel, frameStats->maxChromaVLevel, frameStats->avgChromaVLevel);
+        }
+
+        for (uint32_t i = 0; i < param.maxLog2CUSize - (uint32_t)g_log2Size[param.minCUSize] + 1; i++)
+        {
+            fprintf(csvfp, ", %.2lf%%", frameStats->puStats.percentIntraPu[i]);
+            fprintf(csvfp, ", %.2lf%%", frameStats->puStats.percentSkipPu[i]);
+            fprintf(csvfp, ",%.2lf%%", frameStats->puStats.percentAmpPu[i]);
+            for (uint32_t j = 0; j < 3; j++)

 
@@ -25,7 +25,7 @@
 
 #include "x265.h"
 #include "x265-extras.h"
-
+#include "param.h"
 #include "common.h"
 
 using namespace X265_NS;
@@ -38,14 +38,8 @@
     "B count, B ave-QP, B kbps, B-PSNR Y, B-PSNR U, B-PSNR V, B-SSIM (dB), "
     "MaxCLL, MaxFALL, Version\n";
 
-FILE* x265_csvlog_open(const x265_api& api, const x265_param& param, const char* fname, int level)
+FILE* x265_csvlog_open(const x265_param& param, const char* fname, int level)
 {
-    if (sizeof(x265_stats) != api.sizeof_stats || sizeof(x265_picture) != api.sizeof_picture)
-    {
-        fprintf(stderr, "extras [error]: structure size skew, unable to create CSV logfile\n");
-        return NULL;
-    }
-
     FILE *csvfp = x265_fopen(fname, "r");
     if (csvfp)
     {
@@ -62,6 +56,8 @@
             if (level)
             {
                 fprintf(csvfp, "Encode Order, Type, POC, QP, Bits, Scenecut, ");
+                if (level >= 2)
+                    fprintf(csvfp, "I/P cost ratio, ");
                 if (param.rc.rateControlMode == X265_RC_CRF)
                     fprintf(csvfp, "RateFactor, ");
                 if (param.rc.vbvBufferSize)
@@ -73,7 +69,7 @@
                 fprintf(csvfp, "Latency, ");
                 fprintf(csvfp, "List 0, List 1");
                 uint32_t size = param.maxCUSize;
-                for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+                for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
                 {
                     fprintf(csvfp, ", Intra %dx%d DC, Intra %dx%d Planar, Intra %dx%d Ang", size, size, size, size, size, size);
                     size /= 2;
@@ -82,7 +78,7 @@
                 size = param.maxCUSize;
                 if (param.bEnableRectInter)
                 {
-                    for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+                    for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
                     {
                         fprintf(csvfp, ", Inter %dx%d, Inter %dx%d (Rect)", size, size, size, size);
                         if (param.bEnableAMP)
@@ -92,29 +88,56 @@
                 }
                 else
                 {
-                    for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+                    for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
                     {
                         fprintf(csvfp, ", Inter %dx%d", size, size);
                         size /= 2;
                     }
                 }
                 size = param.maxCUSize;
-                for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+                for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
                 {
                     fprintf(csvfp, ", Skip %dx%d", size, size);
                     size /= 2;
                 }
                 size = param.maxCUSize;
-                for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+                for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
                 {
                     fprintf(csvfp, ", Merge %dx%d", size, size);
                     size /= 2;
                 }
-                fprintf(csvfp, ", Avg Luma Distortion, Avg Chroma Distortion, Avg psyEnergy, Avg Luma Level, Max Luma Level, Avg Residual Energy");
 
-                /* detailed performance statistics */
                 if (level >= 2)
-                    fprintf(csvfp, ", DecideWait (ms), Row0Wait (ms), Wall time (ms), Ref Wait Wall (ms), Total CTU time (ms), Stall Time (ms), Total frame time (ms), Avg WPP, Row Blocks");
+                {
+                    fprintf(csvfp, ", Avg Luma Distortion, Avg Chroma Distortion, Avg psyEnergy, Avg Residual Energy,"
+                        " Min Luma Level, Max Luma Level, Avg Luma Level");
+
+                    if (param.internalCsp != X265_CSP_I400)
+                        fprintf(csvfp, ", Min Cb Level, Max Cb Level, Avg Cb Level, Min Cr Level, Max Cr Level, Avg Cr Level");
+
+                    /* PU statistics */
+                    size = param.maxCUSize;
+                    for (uint32_t i = 0; i< param.maxLog2CUSize - (uint32_t)g_log2Size[param.minCUSize] + 1; i++)
+                    {
+                        fprintf(csvfp, ", Intra %dx%d", size, size);
+                        fprintf(csvfp, ", Skip %dx%d", size, size);
+                        fprintf(csvfp, ", AMP %d", size);
+                        fprintf(csvfp, ", Inter %dx%d", size, size);
+                        fprintf(csvfp, ", Merge %dx%d", size, size);
+                        fprintf(csvfp, ", Inter %dx%d", size, size / 2);
+                        fprintf(csvfp, ", Merge %dx%d", size, size / 2);
+                        fprintf(csvfp, ", Inter %dx%d", size / 2, size);
+                        fprintf(csvfp, ", Merge %dx%d", size / 2, size);
+                        size /= 2;
+                    }
+
+                    if ((uint32_t)g_log2Size[param.minCUSize] == 3)
+                        fprintf(csvfp, ", 4x4");
+
+                    /* detailed performance statistics */
+                    fprintf(csvfp, ", DecideWait (ms), Row0Wait (ms), Wall time (ms), Ref Wait Wall (ms), Total CTU time (ms),"
+                    "Stall Time (ms), Total frame time (ms), Avg WPP, Row Blocks");
+                }
                 fprintf(csvfp, "\n");
             }
             else
@@ -131,7 +154,10 @@
         return;
 
     const x265_frame_stats* frameStats = &pic.frameData;
-    fprintf(csvfp, "%d, %c-SLICE, %4d, %2.2lf, %10d, %d,", frameStats->encoderOrder, frameStats->sliceType, frameStats->poc, frameStats->qp, (int)frameStats->bits, frameStats->bScenecut);
+    fprintf(csvfp, "%d, %c-SLICE, %4d, %2.2lf, %10d, %d,", frameStats->encoderOrder, frameStats->sliceType, frameStats->poc, 
+                                                           frameStats->qp, (int)frameStats->bits, frameStats->bScenecut);
+    if (level >= 2)
+        fprintf(csvfp, "%.2f,", frameStats->ipCostRatio);
     if (param.rc.rateControlMode == X265_RC_CRF)
         fprintf(csvfp, "%.3lf,", frameStats->rateFactor);
     if (param.rc.vbvBufferSize)
@@ -159,39 +185,76 @@
         else
             fputs(" -,", csvfp);
     }
-    for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
-        fprintf(csvfp, "%5.2lf%%, %5.2lf%%, %5.2lf%%,", frameStats->cuStats.percentIntraDistribution[depth][0], frameStats->cuStats.percentIntraDistribution[depth][1], frameStats->cuStats.percentIntraDistribution[depth][2]);
-    fprintf(csvfp, "%5.2lf%%", frameStats->cuStats.percentIntraNxN);
-    if (param.bEnableRectInter)
+
+    if (level)
     {
-        for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+        for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
+            fprintf(csvfp, "%5.2lf%%, %5.2lf%%, %5.2lf%%,", frameStats->cuStats.percentIntraDistribution[depth][0],
+            frameStats->cuStats.percentIntraDistribution[depth][1],
+            frameStats->cuStats.percentIntraDistribution[depth][2]);
+        fprintf(csvfp, "%5.2lf%%", frameStats->cuStats.percentIntraNxN);
+        if (param.bEnableRectInter)
         {
-            fprintf(csvfp, ", %5.2lf%%, %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][0], frameStats->cuStats.percentInterDistribution[depth][1]);
-            if (param.bEnableAMP)
-                fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][2]);
+            for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
+            {
+                fprintf(csvfp, ", %5.2lf%%, %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][0],
+                    frameStats->cuStats.percentInterDistribution[depth][1]);
+                if (param.bEnableAMP)
+                    fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][2]);
+            }
         }
+        else
+        {
+            for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
+                fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][0]);
+        }
+        for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
+            fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentSkipCu[depth]);
+        for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
+            fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentMergeCu[depth]);
     }
-    else
-    {
-        for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
-            fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][0]);
-    }
-    for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
-        fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentSkipCu[depth]);
-    for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
-        fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentMergeCu[depth]);
-    fprintf(csvfp, ", %.2lf, %.2lf, %.2lf, %.2lf, %d, %.2lf", frameStats->avgLumaDistortion, frameStats->avgChromaDistortion, frameStats->avgPsyEnergy, frameStats->avgLumaLevel, frameStats->maxLumaLevel, frameStats->avgResEnergy);
 
     if (level >= 2)
     {
-        fprintf(csvfp, ", %.1lf, %.1lf, %.1lf, %.1lf, %.1lf, %.1lf, %.1lf,", frameStats->decideWaitTime, frameStats->row0WaitTime, frameStats->wallTime, frameStats->refWaitWallTime, frameStats->totalCTUTime, frameStats->stallTime, frameStats->totalFrameTime);
+        fprintf(csvfp, ", %.2lf, %.2lf, %.2lf, %.2lf ", frameStats->avgLumaDistortion,
+            frameStats->avgChromaDistortion,
+            frameStats->avgPsyEnergy,
+            frameStats->avgResEnergy);
+
+        fprintf(csvfp, ", %d, %d, %.2lf", frameStats->minLumaLevel, frameStats->maxLumaLevel, frameStats->avgLumaLevel);
+
+        if (param.internalCsp != X265_CSP_I400)
+        {
+            fprintf(csvfp, ", %d, %d, %.2lf", frameStats->minChromaULevel, frameStats->maxChromaULevel, frameStats->avgChromaULevel);
+            fprintf(csvfp, ", %d, %d, %.2lf", frameStats->minChromaVLevel, frameStats->maxChromaVLevel, frameStats->avgChromaVLevel);
+        }
+
+        for (uint32_t i = 0; i < param.maxLog2CUSize - (uint32_t)g_log2Size[param.minCUSize] + 1; i++)
+        {
+            fprintf(csvfp, ", %.2lf%%", frameStats->puStats.percentIntraPu[i]);
+            fprintf(csvfp, ", %.2lf%%", frameStats->puStats.percentSkipPu[i]);
+            fprintf(csvfp, ",%.2lf%%", frameStats->puStats.percentAmpPu[i]);
+            for (uint32_t j = 0; j < 3; j++)
​

x265_2.4.tar.gz/source/x265-extras.h -> x265_2.5.tar.gz/source/x265-extras.h Changed

@@ -44,7 +44,7 @@
  * closed by the caller using fclose(). If level is 0, then no frame logging
  * header is written to the file. This function will return NULL if it is unable
  * to open the file for write or if it detects a structure size skew */
-LIBAPI FILE* x265_csvlog_open(const x265_api& api, const x265_param& param, const char* fname, int level);
+LIBAPI FILE* x265_csvlog_open(const x265_param& param, const char* fname, int level);
 
 /* Log frame statistics to the CSV file handle. level should have been non-zero
  * in the call to x265_csvlog_open() if this function is called. */
@@ -53,7 +53,7 @@
 /* Log final encode statistics to the CSV file handle. 'argc' and 'argv' are
  * intended to be command line arguments passed to the encoder. Encode
  * statistics should be queried from the encoder just prior to closing it. */
-LIBAPI void x265_csvlog_encode(FILE* csvfp, const char* version, const x265_param& param, const x265_stats& stats, int level, int argc, char** argv);
+LIBAPI void x265_csvlog_encode(FILE* csvfp, const char* version, const x265_param& param, int padx, int pady, const x265_stats& stats, int level, int argc, char** argv);
 
 /* In-place downshift from a bit-depth greater than 8 to a bit-depth of 8, using
  * the residual bits to dither each row. */

 
@@ -44,7 +44,7 @@
  * closed by the caller using fclose(). If level is 0, then no frame logging
  * header is written to the file. This function will return NULL if it is unable
  * to open the file for write or if it detects a structure size skew */
-LIBAPI FILE* x265_csvlog_open(const x265_api& api, const x265_param& param, const char* fname, int level);
+LIBAPI FILE* x265_csvlog_open(const x265_param& param, const char* fname, int level);
 
 /* Log frame statistics to the CSV file handle. level should have been non-zero
  * in the call to x265_csvlog_open() if this function is called. */
@@ -53,7 +53,7 @@
 /* Log final encode statistics to the CSV file handle. 'argc' and 'argv' are
  * intended to be command line arguments passed to the encoder. Encode
  * statistics should be queried from the encoder just prior to closing it. */
-LIBAPI void x265_csvlog_encode(FILE* csvfp, const char* version, const x265_param& param, const x265_stats& stats, int level, int argc, char** argv);
+LIBAPI void x265_csvlog_encode(FILE* csvfp, const char* version, const x265_param& param, int padx, int pady, const x265_stats& stats, int level, int argc, char** argv);
 
 /* In-place downshift from a bit-depth greater than 8 to a bit-depth of 8, using
  * the residual bits to dither each row. */
​

x265_2.4.tar.gz/source/x265.cpp -> x265_2.5.tar.gz/source/x265.cpp Changed

@@ -73,15 +73,12 @@
     ReconFile* recon;
     OutputFile* output;
     FILE*       qpfile;
-    FILE*       csvfpt;
-    const char* csvfn;
     const char* reconPlayCmd;
     const x265_api* api;
     x265_param* param;
     bool bProgress;
     bool bForceY4m;
     bool bDither;
-    int csvLogLevel;
     uint32_t seek;              // number of frames to skip from the beginning
     uint32_t framesToBeEncoded; // number of frames to encode
     uint64_t totalbytes;
@@ -97,8 +94,6 @@
         recon = NULL;
         output = NULL;
         qpfile = NULL;
-        csvfpt = NULL;
-        csvfn = NULL;
         reconPlayCmd = NULL;
         api = NULL;
         param = NULL;
@@ -109,7 +104,6 @@
         startTime = x265_mdate();
         prevUpdateTime = 0;
         bDither = false;
-        csvLogLevel = 0;
     }
 
     void destroy();
@@ -129,9 +123,6 @@
     if (qpfile)
         fclose(qpfile);
     qpfile = NULL;
-    if (csvfpt)
-        fclose(csvfpt);
-    csvfpt = NULL;
     if (output)
         output->release();
     output = NULL;
@@ -292,8 +283,6 @@
             if (0) ;
             OPT2("frame-skip", "seek") this->seek = (uint32_t)x265_atoi(optarg, bError);
             OPT("frames") this->framesToBeEncoded = (uint32_t)x265_atoi(optarg, bError);
-            OPT("csv") this->csvfn = optarg;
-            OPT("csv-log-level") this->csvLogLevel = x265_atoi(optarg, bError);
             OPT("no-progress") this->bProgress = false;
             OPT("output") outputfn = optarg;
             OPT("input") inputfn = optarg;
@@ -530,8 +519,7 @@
  * 1 - unable to parse command line
  * 2 - unable to open encoder
  * 3 - unable to generate stream headers
- * 4 - encoder abort
- * 5 - unable to open csv file */
+ * 4 - encoder abort */
 
 int main(int argc, char **argv)
 {
@@ -586,28 +574,15 @@
     /* get the encoder parameters post-initialization */
     api->encoder_parameters(encoder, param);
 
-    if (cliopt.csvfn)
-    {
-        cliopt.csvfpt = x265_csvlog_open(*api, *param, cliopt.csvfn, cliopt.csvLogLevel);
-        if (!cliopt.csvfpt)
-        {
-            x265_log_file(param, X265_LOG_ERROR, "Unable to open CSV log file <%s>, aborting\n", cliopt.csvfn);
-            cliopt.destroy();
-            if (cliopt.api)
-                cliopt.api->param_free(cliopt.param);
-            exit(5);
-        }
-    }
-
-    /* Control-C handler */
+     /* Control-C handler */
     if (signal(SIGINT, sigint_handler) == SIG_ERR)
         x265_log(param, X265_LOG_ERROR, "Unable to register CTRL+C handler: %s\n", strerror(errno));
 
     x265_picture pic_orig, pic_out;
     x265_picture *pic_in = &pic_orig;
-    /* Allocate recon picture if analysisMode is enabled */
+    /* Allocate recon picture if analysisReuseMode is enabled */
     std::priority_queue<int64_t>* pts_queue = cliopt.output->needPTS() ? new std::priority_queue<int64_t>() : NULL;
-    x265_picture *pic_recon = (cliopt.recon || !!param->analysisMode || pts_queue || reconPlay || cliopt.csvLogLevel) ? &pic_out : NULL;
+    x265_picture *pic_recon = (cliopt.recon || !!param->analysisReuseMode || pts_queue || reconPlay || param->csvLogLevel) ? &pic_out : NULL;
     uint32_t inFrameCount = 0;
     uint32_t outFrameCount = 0;
     x265_nal *p_nal;
@@ -698,8 +673,6 @@
         }
 
         cliopt.printStatus(outFrameCount);
-        if (numEncoded && cliopt.csvLogLevel)
-            x265_csvlog_frame(cliopt.csvfpt, *param, *pic_recon, cliopt.csvLogLevel);
     }
 
     /* Flush the encoder */
@@ -730,8 +703,6 @@
         }
 
         cliopt.printStatus(outFrameCount);
-        if (numEncoded && cliopt.csvLogLevel)
-            x265_csvlog_frame(cliopt.csvfpt, *param, *pic_recon, cliopt.csvLogLevel);
 
         if (!numEncoded)
             break;
@@ -746,8 +717,8 @@
     delete reconPlay;
 
     api->encoder_get_stats(encoder, &stats, sizeof(stats));
-    if (cliopt.csvfpt && !b_ctrl_c)
-        x265_csvlog_encode(cliopt.csvfpt, api->version_str, *param, stats, cliopt.csvLogLevel, argc, argv);
+    if (param->csvfn && !b_ctrl_c)
+        api->encoder_log(encoder, argc, argv);
     api->encoder_close(encoder);
 
     int64_t second_largest_pts = 0;

 
@@ -73,15 +73,12 @@
     ReconFile* recon;
     OutputFile* output;
     FILE*       qpfile;
-    FILE*       csvfpt;
-    const char* csvfn;
     const char* reconPlayCmd;
     const x265_api* api;
     x265_param* param;
     bool bProgress;
     bool bForceY4m;
     bool bDither;
-    int csvLogLevel;
     uint32_t seek;              // number of frames to skip from the beginning
     uint32_t framesToBeEncoded; // number of frames to encode
     uint64_t totalbytes;
@@ -97,8 +94,6 @@
         recon = NULL;
         output = NULL;
         qpfile = NULL;
-        csvfpt = NULL;
-        csvfn = NULL;
         reconPlayCmd = NULL;
         api = NULL;
         param = NULL;
@@ -109,7 +104,6 @@
         startTime = x265_mdate();
         prevUpdateTime = 0;
         bDither = false;
-        csvLogLevel = 0;
     }
 
     void destroy();
@@ -129,9 +123,6 @@
     if (qpfile)
         fclose(qpfile);
     qpfile = NULL;
-    if (csvfpt)
-        fclose(csvfpt);
-    csvfpt = NULL;
     if (output)
         output->release();
     output = NULL;
@@ -292,8 +283,6 @@
             if (0) ;
             OPT2("frame-skip", "seek") this->seek = (uint32_t)x265_atoi(optarg, bError);
             OPT("frames") this->framesToBeEncoded = (uint32_t)x265_atoi(optarg, bError);
-            OPT("csv") this->csvfn = optarg;
-            OPT("csv-log-level") this->csvLogLevel = x265_atoi(optarg, bError);
             OPT("no-progress") this->bProgress = false;
             OPT("output") outputfn = optarg;
             OPT("input") inputfn = optarg;
@@ -530,8 +519,7 @@
  * 1 - unable to parse command line
  * 2 - unable to open encoder
  * 3 - unable to generate stream headers
- * 4 - encoder abort
- * 5 - unable to open csv file */
+ * 4 - encoder abort */
 
 int main(int argc, char **argv)
 {
@@ -586,28 +574,15 @@
     /* get the encoder parameters post-initialization */
     api->encoder_parameters(encoder, param);
 
-    if (cliopt.csvfn)
-    {
-        cliopt.csvfpt = x265_csvlog_open(*api, *param, cliopt.csvfn, cliopt.csvLogLevel);
-        if (!cliopt.csvfpt)
-        {
-            x265_log_file(param, X265_LOG_ERROR, "Unable to open CSV log file <%s>, aborting\n", cliopt.csvfn);
-            cliopt.destroy();
-            if (cliopt.api)
-                cliopt.api->param_free(cliopt.param);
-            exit(5);
-        }
-    }
-
-    /* Control-C handler */
+     /* Control-C handler */
     if (signal(SIGINT, sigint_handler) == SIG_ERR)
         x265_log(param, X265_LOG_ERROR, "Unable to register CTRL+C handler: %s\n", strerror(errno));
 
     x265_picture pic_orig, pic_out;
     x265_picture *pic_in = &pic_orig;
-    /* Allocate recon picture if analysisMode is enabled */
+    /* Allocate recon picture if analysisReuseMode is enabled */
     std::priority_queue<int64_t>* pts_queue = cliopt.output->needPTS() ? new std::priority_queue<int64_t>() : NULL;
-    x265_picture *pic_recon = (cliopt.recon || !!param->analysisMode || pts_queue || reconPlay || cliopt.csvLogLevel) ? &pic_out : NULL;
+    x265_picture *pic_recon = (cliopt.recon || !!param->analysisReuseMode || pts_queue || reconPlay || param->csvLogLevel) ? &pic_out : NULL;
     uint32_t inFrameCount = 0;
     uint32_t outFrameCount = 0;
     x265_nal *p_nal;
@@ -698,8 +673,6 @@
         }
 
         cliopt.printStatus(outFrameCount);
-        if (numEncoded && cliopt.csvLogLevel)
-            x265_csvlog_frame(cliopt.csvfpt, *param, *pic_recon, cliopt.csvLogLevel);
     }
 
     /* Flush the encoder */
@@ -730,8 +703,6 @@
         }
 
         cliopt.printStatus(outFrameCount);
-        if (numEncoded && cliopt.csvLogLevel)
-            x265_csvlog_frame(cliopt.csvfpt, *param, *pic_recon, cliopt.csvLogLevel);
 
         if (!numEncoded)
             break;
@@ -746,8 +717,8 @@
     delete reconPlay;
 
     api->encoder_get_stats(encoder, &stats, sizeof(stats));
-    if (cliopt.csvfpt && !b_ctrl_c)
-        x265_csvlog_encode(cliopt.csvfpt, api->version_str, *param, stats, cliopt.csvLogLevel, argc, argv);
+    if (param->csvfn && !b_ctrl_c)
+        api->encoder_log(encoder, argc, argv);
     api->encoder_close(encoder);
 
     int64_t second_largest_pts = 0;
​

x265_2.4.tar.gz/source/x265.h -> x265_2.5.tar.gz/source/x265.h Changed

@@ -24,10 +24,9 @@
 
 #ifndef X265_H
 #define X265_H
-
 #include <stdint.h>
+#include <stdio.h>
 #include "x265_config.h"
-
 #ifdef __cplusplus
 extern "C" {
 #endif
@@ -98,6 +97,7 @@
     uint32_t         sliceType;
     uint32_t         numCUsInFrame;
     uint32_t         numPartitions;
+    uint32_t         depthBytes;
     int              bScenecut;
     void*            wt;
     void*            interData;
@@ -117,6 +117,20 @@
 } x265_cu_stats;
 
 
+/* pu statistics */
+typedef struct x265_pu_stats
+{
+    double      percentSkipPu[4];               // Percentage of skip cu in all depths
+    double      percentIntraPu[4];              // Percentage of intra modes in all depths
+    double      percentAmpPu[4];                // Percentage of amp modes in all depths
+    double      percentInterPu[4][3];           // Percentage of inter 2nx2n, 2nxn and nx2n in all depths
+    double      percentMergePu[4][3];           // Percentage of merge 2nx2n, 2nxn and nx2n in all depth
+    double      percentNxN;
+
+    /* All the above values will add up to 100%. */
+} x265_pu_stats;
+
+
 typedef struct x265_analysis_2Pass
 {
     uint32_t      poc;
@@ -154,13 +168,41 @@
     int              list0POC[16];
     int              list1POC[16];
     uint16_t         maxLumaLevel;
+    uint16_t         minLumaLevel;
+
+    uint16_t         maxChromaULevel;
+    uint16_t         minChromaULevel;
+    double           avgChromaULevel;
+
+
+    uint16_t         maxChromaVLevel;
+    uint16_t         minChromaVLevel;
+    double           avgChromaVLevel;
+
     char             sliceType;
     int              bScenecut;
+    double           ipCostRatio;
     int              frameLatency;
     x265_cu_stats    cuStats;
+    x265_pu_stats    puStats;
     double           totalFrameTime;
 } x265_frame_stats;
 
+typedef struct x265_ctu_info_t
+{
+    int32_t ctuAddress;
+    int32_t ctuPartitions[64];
+    void*    ctuInfo;
+} x265_ctu_info_t;
+
+typedef enum
+{
+    NO_CTU_INFO = 0,
+    HAS_CTU_INFO = 1,
+    CTU_INFO_CHANGE = 2,
+}CTUInfo;
+
+
 /* Arbitrary User SEI
  * Payload size is in bytes and the payload pointer must be non-NULL. 
  * Payload types and syntax can be found in Annex D of the H.265 Specification.
@@ -258,15 +300,15 @@
      * to allow the encoder to determine base QP */
     int     forceqp;
 
-    /* If param.analysisMode is X265_ANALYSIS_OFF this field is ignored on input
+    /* If param.analysisReuseMode is X265_ANALYSIS_OFF this field is ignored on input
      * and output. Else the user must call x265_alloc_analysis_data() to
      * allocate analysis buffers for every picture passed to the encoder.
      *
-     * On input when param.analysisMode is X265_ANALYSIS_LOAD and analysisData
+     * On input when param.analysisReuseMode is X265_ANALYSIS_LOAD and analysisData
      * member pointers are valid, the encoder will use the data stored here to
      * reduce encoder work.
      *
-     * On output when param.analysisMode is X265_ANALYSIS_SAVE and analysisData
+     * On output when param.analysisReuseMode is X265_ANALYSIS_SAVE and analysisData
      * member pointers are valid, the encoder will write output analysis into
      * this data structure */
     x265_analysis_data analysisData;
@@ -612,7 +654,14 @@
      * X265_LOG_FULL, default is X265_LOG_INFO */
     int       logLevel;
 
-    /* Filename of CSV log. Now deprecated */
+    /* Level of csv logging. 0 is summary, 1 is frame level logging,
+     * 2 is frame level logging with performance statistics */
+    int       csvLogLevel;
+
+    /* filename of CSV log. If csvLogLevel is non-zero, the encoder will emit
+     * per-slice statistics to this log file in encode order. Otherwise the
+     * encoder will emit per-stream statistics into the log file when
+     * x265_encoder_log is called (presumably at the end of the encode) */
     const char* csvfn;
 
     /*== Internal Picture Specification ==*/
@@ -1057,10 +1106,10 @@
      * buffers.  if X265_ANALYSIS_LOAD, read analysis information into analysis
      * buffer and use this analysis information to reduce the amount of work
      * the encoder must perform. Default X265_ANALYSIS_OFF */
-    int       analysisMode;
+    int       analysisReuseMode;
 
-    /* Filename for analysisMode save/load. Default name is "x265_analysis.dat" */
-    const char* analysisFileName;
+    /* Filename for analysisReuseMode save/load. Default name is "x265_analysis.dat" */
+    const char* analysisReuseFileName;
 
     /*== Rate Control ==*/
 
@@ -1194,6 +1243,9 @@
 
         /* sets a hard lower limit on QP */
         int      qpMin;
+
+        /* internally enable if tune grain is set */
+        int      bEnableConstVbv;
     } rc;
 
     /*== Video Usability Information ==*/
@@ -1376,9 +1428,9 @@
     int       bHDROpt;
 
     /* A value between 1 and 10 (both inclusive) determines the level of
-    * information stored/reused in save/load analysis-mode. Higher the refine
-    * level higher the informtion stored/reused. Default is 5 */
-    int       analysisRefineLevel;
+    * information stored/reused in save/load analysis-reuse-mode. Higher the refine
+    * level higher the information stored/reused. Default is 5 */
+    int       analysisReuseLevel;
 
      /* Limit Sample Adaptive Offset filter computation by early terminating SAO
      * process based on inter prediction mode, CTU spatial-domain correlations,
@@ -1391,7 +1443,44 @@
     /* Insert tone mapping information only for IDR frames and when the 
      * tone mapping information changes. */
     int       bDhdr10opt;
+
+    /* Determine how x265 react to the content information recieved through the API */
+    int       bCTUInfo;
+
+    /* Use ratecontrol statistics from pic_in, if available*/
+    int       bUseRcStats;
+
+    /* Factor by which input video is scaled down for analysis save mode. Default is 0 */
+    int       scaleFactor;
+
+    /* Enable intra refinement in load mode*/
+    int       intraRefine;
+
+    /* Enable inter refinement in load mode*/
+    int       interRefine;
+
+    /* Enable motion vector refinement in load mode*/
+    int       mvRefine;
+
+    /* Log of maximum CTU size */
+    uint32_t  maxLog2CUSize;
+
+    /* Actual CU depth with respect to config depth */
+    uint32_t  maxCUDepth;
+
+    /* CU depth with respect to maximum transform size */
+    uint32_t  unitSizeDepth;
+
+    /* Number of 4x4 units in maximum CU size */
+    uint32_t  num4x4Partitions;
+
+    /* Specify if analysis mode uses file for data reuse */
+    int       bUseAnalysisFile;
+
+    /* File pointer for csv log */
+    FILE*     csvfpt;
 } x265_param;
+
 /* x265_param_alloc:
  *  Allocates an x265_param instance. The returned param structure is not
  *  special in any way, but using this method together with x265_param_free()

 
@@ -24,10 +24,9 @@
 
 #ifndef X265_H
 #define X265_H
-
 #include <stdint.h>
+#include <stdio.h>
 #include "x265_config.h"
-
 #ifdef __cplusplus
 extern "C" {
 #endif
@@ -98,6 +97,7 @@
     uint32_t         sliceType;
     uint32_t         numCUsInFrame;
     uint32_t         numPartitions;
+    uint32_t         depthBytes;
     int              bScenecut;
     void*            wt;
     void*            interData;
@@ -117,6 +117,20 @@
 } x265_cu_stats;
 
 
+/* pu statistics */
+typedef struct x265_pu_stats
+{
+    double      percentSkipPu[4];               // Percentage of skip cu in all depths
+    double      percentIntraPu[4];              // Percentage of intra modes in all depths
+    double      percentAmpPu[4];                // Percentage of amp modes in all depths
+    double      percentInterPu[4][3];           // Percentage of inter 2nx2n, 2nxn and nx2n in all depths
+    double      percentMergePu[4][3];           // Percentage of merge 2nx2n, 2nxn and nx2n in all depth
+    double      percentNxN;
+
+    /* All the above values will add up to 100%. */
+} x265_pu_stats;
+
+
 typedef struct x265_analysis_2Pass
 {
     uint32_t      poc;
@@ -154,13 +168,41 @@
     int              list0POC[16];
     int              list1POC[16];
     uint16_t         maxLumaLevel;
+    uint16_t         minLumaLevel;
+
+    uint16_t         maxChromaULevel;
+    uint16_t         minChromaULevel;
+    double           avgChromaULevel;
+
+
+    uint16_t         maxChromaVLevel;
+    uint16_t         minChromaVLevel;
+    double           avgChromaVLevel;
+
     char             sliceType;
     int              bScenecut;
+    double           ipCostRatio;
     int              frameLatency;
     x265_cu_stats    cuStats;
+    x265_pu_stats    puStats;
     double           totalFrameTime;
 } x265_frame_stats;
 
+typedef struct x265_ctu_info_t
+{
+    int32_t ctuAddress;
+    int32_t ctuPartitions[64];
+    void*    ctuInfo;
+} x265_ctu_info_t;
+
+typedef enum
+{
+    NO_CTU_INFO = 0,
+    HAS_CTU_INFO = 1,
+    CTU_INFO_CHANGE = 2,
+}CTUInfo;
+
+
 /* Arbitrary User SEI
  * Payload size is in bytes and the payload pointer must be non-NULL. 
  * Payload types and syntax can be found in Annex D of the H.265 Specification.
@@ -258,15 +300,15 @@
      * to allow the encoder to determine base QP */
     int     forceqp;
 
-    /* If param.analysisMode is X265_ANALYSIS_OFF this field is ignored on input
+    /* If param.analysisReuseMode is X265_ANALYSIS_OFF this field is ignored on input
      * and output. Else the user must call x265_alloc_analysis_data() to
      * allocate analysis buffers for every picture passed to the encoder.
      *
-     * On input when param.analysisMode is X265_ANALYSIS_LOAD and analysisData
+     * On input when param.analysisReuseMode is X265_ANALYSIS_LOAD and analysisData
      * member pointers are valid, the encoder will use the data stored here to
      * reduce encoder work.
      *
-     * On output when param.analysisMode is X265_ANALYSIS_SAVE and analysisData
+     * On output when param.analysisReuseMode is X265_ANALYSIS_SAVE and analysisData
      * member pointers are valid, the encoder will write output analysis into
      * this data structure */
     x265_analysis_data analysisData;
@@ -612,7 +654,14 @@
      * X265_LOG_FULL, default is X265_LOG_INFO */
     int       logLevel;
 
-    /* Filename of CSV log. Now deprecated */
+    /* Level of csv logging. 0 is summary, 1 is frame level logging,
+     * 2 is frame level logging with performance statistics */
+    int       csvLogLevel;
+
+    /* filename of CSV log. If csvLogLevel is non-zero, the encoder will emit
+     * per-slice statistics to this log file in encode order. Otherwise the
+     * encoder will emit per-stream statistics into the log file when
+     * x265_encoder_log is called (presumably at the end of the encode) */
     const char* csvfn;
 
     /*== Internal Picture Specification ==*/
@@ -1057,10 +1106,10 @@
      * buffers.  if X265_ANALYSIS_LOAD, read analysis information into analysis
      * buffer and use this analysis information to reduce the amount of work
      * the encoder must perform. Default X265_ANALYSIS_OFF */
-    int       analysisMode;
+    int       analysisReuseMode;
 
-    /* Filename for analysisMode save/load. Default name is "x265_analysis.dat" */
-    const char* analysisFileName;
+    /* Filename for analysisReuseMode save/load. Default name is "x265_analysis.dat" */
+    const char* analysisReuseFileName;
 
     /*== Rate Control ==*/
 
@@ -1194,6 +1243,9 @@
 
         /* sets a hard lower limit on QP */
         int      qpMin;
+
+        /* internally enable if tune grain is set */
+        int      bEnableConstVbv;
     } rc;
 
     /*== Video Usability Information ==*/
@@ -1376,9 +1428,9 @@
     int       bHDROpt;
 
     /* A value between 1 and 10 (both inclusive) determines the level of
-    * information stored/reused in save/load analysis-mode. Higher the refine
-    * level higher the informtion stored/reused. Default is 5 */
-    int       analysisRefineLevel;
+    * information stored/reused in save/load analysis-reuse-mode. Higher the refine
+    * level higher the information stored/reused. Default is 5 */
+    int       analysisReuseLevel;
 
      /* Limit Sample Adaptive Offset filter computation by early terminating SAO
      * process based on inter prediction mode, CTU spatial-domain correlations,
@@ -1391,7 +1443,44 @@
     /* Insert tone mapping information only for IDR frames and when the 
      * tone mapping information changes. */
     int       bDhdr10opt;
+
+    /* Determine how x265 react to the content information recieved through the API */
+    int       bCTUInfo;
+
+    /* Use ratecontrol statistics from pic_in, if available*/
+    int       bUseRcStats;
+
+    /* Factor by which input video is scaled down for analysis save mode. Default is 0 */
+    int       scaleFactor;
+
+    /* Enable intra refinement in load mode*/
+    int       intraRefine;
+
+    /* Enable inter refinement in load mode*/
+    int       interRefine;
+
+    /* Enable motion vector refinement in load mode*/
+    int       mvRefine;
+
+    /* Log of maximum CTU size */
+    uint32_t  maxLog2CUSize;
+
+    /* Actual CU depth with respect to config depth */
+    uint32_t  maxCUDepth;
+
+    /* CU depth with respect to maximum transform size */
+    uint32_t  unitSizeDepth;
+
+    /* Number of 4x4 units in maximum CU size */
+    uint32_t  num4x4Partitions;
+
+    /* Specify if analysis mode uses file for data reuse */
+    int       bUseAnalysisFile;
+
+    /* File pointer for csv log */
+    FILE*     csvfpt;
 } x265_param;
+
 /* x265_param_alloc:
  *  Allocates an x265_param instance. The returned param structure is not
  *  special in any way, but using this method together with x265_param_free()
​

x265_2.4.tar.gz/source/x265cli.h -> x265_2.5.tar.gz/source/x265cli.h Changed

@@ -122,6 +122,7 @@
     { "scenecut",       required_argument, NULL, 0 },
     { "no-scenecut",          no_argument, NULL, 0 },
     { "scenecut-bias",  required_argument, NULL, 0 },
+    { "ctu-info",       required_argument, NULL, 0 },
     { "intra-refresh",        no_argument, NULL, 0 },
     { "rc-lookahead",   required_argument, NULL, 0 },
     { "lookahead-slices", required_argument, NULL, 0 },
@@ -158,6 +159,8 @@
     { "qpstep",         required_argument, NULL, 0 },
     { "qpmin",          required_argument, NULL, 0 },
     { "qpmax",          required_argument, NULL, 0 },
+    { "const-vbv",            no_argument, NULL, 0 },
+    { "no-const-vbv",         no_argument, NULL, 0 },
     { "ratetol",        required_argument, NULL, 0 },
     { "cplxblur",       required_argument, NULL, 0 },
     { "qblur",          required_argument, NULL, 0 },
@@ -247,9 +250,13 @@
     { "no-slow-firstpass",    no_argument, NULL, 0 },
     { "multi-pass-opt-rps",   no_argument, NULL, 0 },
     { "no-multi-pass-opt-rps", no_argument, NULL, 0 },
-    { "analysis-mode",  required_argument, NULL, 0 },
-    { "analysis-file",  required_argument, NULL, 0 },
-    { "refine-level",   required_argument, NULL, 0 },
+    { "analysis-reuse-mode",  required_argument, NULL, 0 },
+    { "analysis-reuse-file",  required_argument, NULL, 0 },
+    { "analysis-reuse-level", required_argument, NULL, 0 },
+    { "scale-factor",   required_argument, NULL, 0 },
+    { "refine-intra",   required_argument, NULL, 0 },
+    { "refine-inter",   no_argument, NULL, 0 },
+    { "no-refine-inter",no_argument, NULL, 0 },
     { "strict-cbr",           no_argument, NULL, 0 },
     { "temporal-layers",      no_argument, NULL, 0 },
     { "no-temporal-layers",   no_argument, NULL, 0 },
@@ -271,6 +278,8 @@
     { "dhdr10-info",    required_argument, NULL, 0 },
     { "dhdr10-opt",           no_argument, NULL, 0},
     { "no-dhdr10-opt",        no_argument, NULL, 0},
+    { "refine-mv",            no_argument, NULL, 0 },
+    { "no-refine-mv",         no_argument, NULL, 0 },
     { 0, 0, 0, 0 },
     { 0, 0, 0, 0 },
     { 0, 0, 0, 0 },
@@ -316,9 +325,9 @@
     H1("                                 1 - i420 (4:2:0 default)\n");
     H1("                                 2 - i422 (4:2:2)\n");
     H1("                                 3 - i444 (4:4:4)\n");
-#if ENABLE_DYNAMIC_HDR10
-    H0("   --dhdr10-info <filename>      JSON file containing the Creative Intent Metadata to be encoded as Dynamic Tone Mapping \n");
-    H0("   --[no-]dhdr10-opt             Insert tone mapping SEI only for IDR frames and when the tone mapping information changes. Default disabled");
+#if ENABLE_HDR10_PLUS
+    H0("   --dhdr10-info <filename>      JSON file containing the Creative Intent Metadata to be encoded as Dynamic Tone Mapping\n");
+    H0("   --[no-]dhdr10-opt             Insert tone mapping SEI only for IDR frames and when the tone mapping information changes. Default disabled\n");
 #endif
     H0("-f/--frames <integer>            Maximum number of frames to encode. Default all\n");
     H0("   --seek <integer>              First frame to encode\n");
@@ -367,6 +376,11 @@
     H1("   --[no-]tskip-fast             Enable fast intra transform skipping. Default %s\n", OPT(param->bEnableTSkipFast));
     H1("   --nr-intra <integer>          An integer value in range of 0 to 2000, which denotes strength of noise reduction in intra CUs. Default 0\n");
     H1("   --nr-inter <integer>          An integer value in range of 0 to 2000, which denotes strength of noise reduction in inter CUs. Default 0\n");
+    H0("   --ctu-info <integer>          Enable receiving ctu information asynchronously and determine reaction to the CTU information (0, 1, 2, 4, 6) Default 0\n"
+       "                                    - 1: force the partitions if CTU information is present\n"
+       "                                    - 2: functionality of (1) and reduce qp if CTU information has changed\n"
+       "                                    - 4: functionality of (1) and force Inter modes when CTU Information has changed, merge/skip otherwise\n"
+       "                                    Enable this option only when planning to invoke the API function x265_encoder_ctu_info to copy ctu-info asynchronously\n");
     H0("\nCoding tools:\n");
     H0("-w/--[no-]weightp                Enable weighted prediction in P slices. Default %s\n", OPT(param->bEnableWeightedPred));
     H0("   --[no-]weightb                Enable weighted prediction in B slices. Default %s\n", OPT(param->bEnableWeightedBiPred));
@@ -431,9 +445,13 @@
     H0("   --[no-]analyze-src-pics       Motion estimation uses source frame planes. Default disable\n");
     H0("   --[no-]slow-firstpass         Enable a slow first pass in a multipass rate control mode. Default %s\n", OPT(param->rc.bEnableSlowFirstPass));
     H0("   --[no-]strict-cbr             Enable stricter conditions and tolerance for bitrate deviations in CBR mode. Default %s\n", OPT(param->rc.bStrictCbr));
-    H0("   --analysis-mode <string|int>  save - Dump analysis info into file, load - Load analysis buffers from the file. Default %d\n", param->analysisMode);
-    H0("   --analysis-file <filename>    Specify file name used for either dumping or reading analysis data.\n");
-    H0("   --refine-level <1..10>        Level of analysis refinement indicates amount of info stored/reused in save/load mode, 1:least....10:most. Default %d\n", param->analysisRefineLevel);
+    H0("   --analysis-reuse-mode <string|int>  save - Dump analysis info into file, load - Load analysis buffers from the file. Default %d\n", param->analysisReuseMode);
+    H0("   --analysis-reuse-file <filename>    Specify file name used for either dumping or reading analysis data. Deault x265_analysis.dat\n");
+    H0("   --analysis-reuse-level <1..10>      Level of analysis reuse indicates amount of info stored/reused in save/load mode, 1:least..10:most. Default %d\n", param->analysisReuseLevel);
+    H0("   --scale-factor <int>          Specify factor by which input video is scaled down for analysis save mode. Default %d\n", param->scaleFactor);
+    H0("   --refine-intra <int>          Enable intra refinement for load mode. Default %d\n", param->intraRefine);
+    H0("   --[no-]refine-inter           Enable inter refinement for load mode. Default %s\n", OPT(param->interRefine));
+    H0("   --[no-]refine-mv              Enable mv refinement for load mode. Default %s\n", OPT(param->mvRefine));
     H0("   --aq-mode <integer>           Mode for Adaptive Quantization - 0:none 1:uniform AQ 2:auto variance 3:auto variance with bias to dark scenes. Default %d\n", param->rc.aqMode);
     H0("   --aq-strength <float>         Reduces blocking and blurring in flat and textured areas (0 to 3.0). Default %.2f\n", param->rc.aqStrength);
     H0("   --[no-]aq-motion              Adaptive Quantization based on the relative motion of each CU w.r.t., frame. Default %s\n", OPT(param->bOptCUDeltaQP));
@@ -446,6 +464,7 @@
     H1("   --qpstep <integer>            The maximum single adjustment in QP allowed to rate control. Default %d\n", param->rc.qpStep);
     H1("   --qpmin <integer>             sets a hard lower limit on QP allowed to ratecontrol. Default %d\n", param->rc.qpMin);
     H1("   --qpmax <integer>             sets a hard upper limit on QP allowed to ratecontrol. Default %d\n", param->rc.qpMax);
+    H0("   --[no-]const-vbv              Enable consistent vbv. turned on with tune grain. Default %s\n", OPT(param->rc.bEnableConstVbv));
     H1("   --cbqpoffs <integer>          Chroma Cb QP Offset [-12..12]. Default %d\n", param->cbQpOffset);
     H1("   --crqpoffs <integer>          Chroma Cr QP Offset [-12..12]. Default %d\n", param->crQpOffset);
     H1("   --scaling-list <string>       Specify a file containing HM style quant scaling lists or 'default' or 'off'. Default: off\n");

 
@@ -122,6 +122,7 @@
     { "scenecut",       required_argument, NULL, 0 },
     { "no-scenecut",          no_argument, NULL, 0 },
     { "scenecut-bias",  required_argument, NULL, 0 },
+    { "ctu-info",       required_argument, NULL, 0 },
     { "intra-refresh",        no_argument, NULL, 0 },
     { "rc-lookahead",   required_argument, NULL, 0 },
     { "lookahead-slices", required_argument, NULL, 0 },
@@ -158,6 +159,8 @@
     { "qpstep",         required_argument, NULL, 0 },
     { "qpmin",          required_argument, NULL, 0 },
     { "qpmax",          required_argument, NULL, 0 },
+    { "const-vbv",            no_argument, NULL, 0 },
+    { "no-const-vbv",         no_argument, NULL, 0 },
     { "ratetol",        required_argument, NULL, 0 },
     { "cplxblur",       required_argument, NULL, 0 },
     { "qblur",          required_argument, NULL, 0 },
@@ -247,9 +250,13 @@
     { "no-slow-firstpass",    no_argument, NULL, 0 },
     { "multi-pass-opt-rps",   no_argument, NULL, 0 },
     { "no-multi-pass-opt-rps", no_argument, NULL, 0 },
-    { "analysis-mode",  required_argument, NULL, 0 },
-    { "analysis-file",  required_argument, NULL, 0 },
-    { "refine-level",   required_argument, NULL, 0 },
+    { "analysis-reuse-mode",  required_argument, NULL, 0 },
+    { "analysis-reuse-file",  required_argument, NULL, 0 },
+    { "analysis-reuse-level", required_argument, NULL, 0 },
+    { "scale-factor",   required_argument, NULL, 0 },
+    { "refine-intra",   required_argument, NULL, 0 },
+    { "refine-inter",   no_argument, NULL, 0 },
+    { "no-refine-inter",no_argument, NULL, 0 },
     { "strict-cbr",           no_argument, NULL, 0 },
     { "temporal-layers",      no_argument, NULL, 0 },
     { "no-temporal-layers",   no_argument, NULL, 0 },
@@ -271,6 +278,8 @@
     { "dhdr10-info",    required_argument, NULL, 0 },
     { "dhdr10-opt",           no_argument, NULL, 0},
     { "no-dhdr10-opt",        no_argument, NULL, 0},
+    { "refine-mv",            no_argument, NULL, 0 },
+    { "no-refine-mv",         no_argument, NULL, 0 },
     { 0, 0, 0, 0 },
     { 0, 0, 0, 0 },
     { 0, 0, 0, 0 },
@@ -316,9 +325,9 @@
     H1("                                 1 - i420 (4:2:0 default)\n");
     H1("                                 2 - i422 (4:2:2)\n");
     H1("                                 3 - i444 (4:4:4)\n");
-#if ENABLE_DYNAMIC_HDR10
-    H0("   --dhdr10-info <filename>      JSON file containing the Creative Intent Metadata to be encoded as Dynamic Tone Mapping \n");
-    H0("   --[no-]dhdr10-opt             Insert tone mapping SEI only for IDR frames and when the tone mapping information changes. Default disabled");
+#if ENABLE_HDR10_PLUS
+    H0("   --dhdr10-info <filename>      JSON file containing the Creative Intent Metadata to be encoded as Dynamic Tone Mapping\n");
+    H0("   --[no-]dhdr10-opt             Insert tone mapping SEI only for IDR frames and when the tone mapping information changes. Default disabled\n");
 #endif
     H0("-f/--frames <integer>            Maximum number of frames to encode. Default all\n");
     H0("   --seek <integer>              First frame to encode\n");
@@ -367,6 +376,11 @@
     H1("   --[no-]tskip-fast             Enable fast intra transform skipping. Default %s\n", OPT(param->bEnableTSkipFast));
     H1("   --nr-intra <integer>          An integer value in range of 0 to 2000, which denotes strength of noise reduction in intra CUs. Default 0\n");
     H1("   --nr-inter <integer>          An integer value in range of 0 to 2000, which denotes strength of noise reduction in inter CUs. Default 0\n");
+    H0("   --ctu-info <integer>          Enable receiving ctu information asynchronously and determine reaction to the CTU information (0, 1, 2, 4, 6) Default 0\n"
+       "                                    - 1: force the partitions if CTU information is present\n"
+       "                                    - 2: functionality of (1) and reduce qp if CTU information has changed\n"
+       "                                    - 4: functionality of (1) and force Inter modes when CTU Information has changed, merge/skip otherwise\n"
+       "                                    Enable this option only when planning to invoke the API function x265_encoder_ctu_info to copy ctu-info asynchronously\n");
     H0("\nCoding tools:\n");
     H0("-w/--[no-]weightp                Enable weighted prediction in P slices. Default %s\n", OPT(param->bEnableWeightedPred));
     H0("   --[no-]weightb                Enable weighted prediction in B slices. Default %s\n", OPT(param->bEnableWeightedBiPred));
@@ -431,9 +445,13 @@
     H0("   --[no-]analyze-src-pics       Motion estimation uses source frame planes. Default disable\n");
     H0("   --[no-]slow-firstpass         Enable a slow first pass in a multipass rate control mode. Default %s\n", OPT(param->rc.bEnableSlowFirstPass));
     H0("   --[no-]strict-cbr             Enable stricter conditions and tolerance for bitrate deviations in CBR mode. Default %s\n", OPT(param->rc.bStrictCbr));
-    H0("   --analysis-mode <string|int>  save - Dump analysis info into file, load - Load analysis buffers from the file. Default %d\n", param->analysisMode);
-    H0("   --analysis-file <filename>    Specify file name used for either dumping or reading analysis data.\n");
-    H0("   --refine-level <1..10>        Level of analysis refinement indicates amount of info stored/reused in save/load mode, 1:least....10:most. Default %d\n", param->analysisRefineLevel);
+    H0("   --analysis-reuse-mode <string|int>  save - Dump analysis info into file, load - Load analysis buffers from the file. Default %d\n", param->analysisReuseMode);
+    H0("   --analysis-reuse-file <filename>    Specify file name used for either dumping or reading analysis data. Deault x265_analysis.dat\n");
+    H0("   --analysis-reuse-level <1..10>      Level of analysis reuse indicates amount of info stored/reused in save/load mode, 1:least..10:most. Default %d\n", param->analysisReuseLevel);
+    H0("   --scale-factor <int>          Specify factor by which input video is scaled down for analysis save mode. Default %d\n", param->scaleFactor);
+    H0("   --refine-intra <int>          Enable intra refinement for load mode. Default %d\n", param->intraRefine);
+    H0("   --[no-]refine-inter           Enable inter refinement for load mode. Default %s\n", OPT(param->interRefine));
+    H0("   --[no-]refine-mv              Enable mv refinement for load mode. Default %s\n", OPT(param->mvRefine));
     H0("   --aq-mode <integer>           Mode for Adaptive Quantization - 0:none 1:uniform AQ 2:auto variance 3:auto variance with bias to dark scenes. Default %d\n", param->rc.aqMode);
     H0("   --aq-strength <float>         Reduces blocking and blurring in flat and textured areas (0 to 3.0). Default %.2f\n", param->rc.aqStrength);
     H0("   --[no-]aq-motion              Adaptive Quantization based on the relative motion of each CU w.r.t., frame. Default %s\n", OPT(param->bOptCUDeltaQP));
@@ -446,6 +464,7 @@
     H1("   --qpstep <integer>            The maximum single adjustment in QP allowed to rate control. Default %d\n", param->rc.qpStep);
     H1("   --qpmin <integer>             sets a hard lower limit on QP allowed to ratecontrol. Default %d\n", param->rc.qpMin);
     H1("   --qpmax <integer>             sets a hard upper limit on QP allowed to ratecontrol. Default %d\n", param->rc.qpMax);
+    H0("   --[no-]const-vbv              Enable consistent vbv. turned on with tune grain. Default %s\n", OPT(param->rc.bEnableConstVbv));
     H1("   --cbqpoffs <integer>          Chroma Cb QP Offset [-12..12]. Default %d\n", param->cbQpOffset);
     H1("   --crqpoffs <integer>          Chroma Cr QP Offset [-12..12]. Default %d\n", param->crQpOffset);
     H1("   --scaling-list <string>       Specify a file containing HM style quant scaling lists or 'default' or 'off'. Default: off\n");
​