Packman Build Service PMBS

Changes of Revision 20

x265.changes Changed

@@ -1,8 +1,40 @@
 -------------------------------------------------------------------
+Fri Feb 24 14:03:24 UTC 2017 - ismail@i10z.com
+
+- Update to version 2.3
+  Encoder enhancements
+  * New SSIM-based RD-cost computation for improved visual quality,
+    and efficiency; use --ssim-rd to exercise.
+  * Multi-pass encoding can now share analysis information from
+    prior passes.
+  * A dedicated thread pool for lookahead can now be specified
+    with --lookahead-threads.
+  * option:–dynamic-rd dynamically increase analysis in areas
+    where the bitrate is being capped by VBV; works for both
+    CRF and ABR encodes with VBV settings.
+  * The number of bits used to signal the delta-QP can be
+    optimized with the --opt-cu-delta-qp option.
+  * Experimental feature option:–aq-motion adds new QP offsets
+    based on relative motion of a block with respect to the
+    movement of the frame.
+  API changes
+  * Reconfigure API now supports signalling new scaling lists.
+  * x265 application’s csv functionality now reports time
+    (in milliseconds) taken to encode each frame.
+  * --strict-cbr enables stricter bitrate adherence by adding
+    filler bits when achieved bitrate is lower than the target.
+  * --hdr can be used to ensure that max-cll and max-fall values
+    are always signaled (even if 0,0).
+  Bug fixes
+  * Fixed scaling lists support for 4:4:4 videos.
+  * Inconsistent output fix for --opt-qp-pss by removing last
+    slice’s QP from cost calculation.
+
+-------------------------------------------------------------------
 Sun Jan  1 20:32:07 UTC 2017 - idonmez@suse.com
 
 -  Update to version 2.2
-   Encode enhancements
+   Encoder enhancements
    * Enhancements to TU selection algorithm with early-outs for
      improved speed; use --limit-tu to exercise.
    * New motion search method SEA (Successive Elimination Algorithm)

x265.spec Changed

 
@@ -1,10 +1,10 @@
 # based on the spec file from https://build.opensuse.org/package/view_file/home:Simmphonie/libx265/
 
 Name:           x265
-%define soname  102
+%define soname  110
 %define libname lib%{name}
 %define libsoname %{libname}-%{soname}
-Version:        2.2
+Version:        2.3
 Release:        0
 License:        GPL-2.0+
 Summary:        A free h265/HEVC encoder - encoder binary
​

baselibs.conf Changed

 
@@ -1,1 +1,1 @@
-libx265-102
+libx265-110
​

x265_2.2.tar.gz/.hg_archival.txt -> x265_2.3.tar.gz/.hg_archival.txt Changed

 
@@ -1,4 +1,4 @@
 repo: 09fe40627f03a0f9c3e6ac78b22ac93da23f9fdf
-node: be14a7e9755e54f0fd34911c72bdfa66981220bc
+node: 3037c1448549ca920967831482c653e5892fa8ed
 branch: stable
-tag: 2.2
+tag: 2.3
​

x265_2.2.tar.gz/.hgtags -> x265_2.3.tar.gz/.hgtags Changed

 
@@ -20,3 +20,4 @@
 1d3b6e448e01ec40b392ef78b7e55a86249fbe68 1.9
 960c9991d0dcf46559c32e070418d3cbb7e8aa2f 2.0
 981e3bfef16a997bce6f46ce1b15631a0e234747 2.1
+be14a7e9755e54f0fd34911c72bdfa66981220bc 2.2
​

x265_2.2.tar.gz/doc/reST/cli.rst -> x265_2.3.tar.gz/doc/reST/cli.rst Changed

@@ -872,6 +872,7 @@
 .. option:: --limit-tu <0..4>
 
 	Enables early exit from TU depth recursion, for inter coded blocks.
+	
 	Level 1 - decides to recurse to next higher depth based on cost 
 	comparison of full size TU and split TU.
 	
@@ -943,6 +944,26 @@
 	quad-tree begins at the same depth of the coded tree unit, but if the
 	maximum TU size is smaller than the CU size then transform QT begins 
 	at the depth of the max-tu-size. Default: 32.
+	
+.. option:: --dynamic-rd <0..4>
+	
+	Increases the RD level at points where quality drops due to VBV rate 
+	control enforcement. The number of CUs for which the RD is reconfigured 
+	is determined based on the strength. Strength 1 gives the best FPS, 
+	strength 4 gives the best SSIM. Strength 0 switches this feature off. 
+	Default: 0.
+	
+	Effective for RD levels 4 and below.
+
+.. option:: --ssim-rd, --no-ssim-rd
+
+    Enable/Disable SSIM RDO. SSIM is a better perceptual quality assessment
+    method as compared to MSE. SSIM based RDO calculation is based on residual
+    divisive normalization scheme. This normalization is consistent with the 
+    luminance and contrast masking effect of Human Visual System. It is used
+    for mode selection during analysis of CTUs and can achieve significant 
+    gain in terms of objective quality metrics SSIM and PSNR. It only has effect
+    on presets which use RDO-based mode decisions (:option:`--rd` 3 and above).
 
 Temporal / motion search options
 ================================
@@ -1227,8 +1248,18 @@
     Default: 8 for ultrafast, superfast, faster, fast, medium
              4 for slow, slower
              disabled for veryslow, slower
+			 
+.. option:: --lookahead-threads <integer>
 
+    Use multiple worker threads dedicated to doing only lookahead instead of sharing
+    the worker threads with frame Encoders. A dedicated lookahead threadpool is created with the
+    specified number of worker threads. This can range from 0 upto half the
+    hardware threads available for encoding. Using too many threads for lookahead can starve
+    resources for frame Encoder and can harm performance. Default is 0 - disabled, Lookahead 
+	shares worker threads with other FrameEncoders . 
 
+    **Values:** 0 - disabled(default). Max - Half of available hardware threads.
+	
 .. option:: --b-adapt <integer>
 
 	Set the level of effort in determining B frame placement.
@@ -1372,6 +1403,12 @@
 	Default 1.0.
 	**Range of values:** 0.0 to 3.0
 
+.. option:: --aq-motion, --no-aq-motion
+
+	Adjust the AQ offsets based on the relative motion of each block with
+	respect to the motion of the frame. The more the relative motion of the block,
+	the more quantization is used. Default disabled. **Experimental Feature**
+
 .. option:: --qg-size <64|32|16|8>
 
 	Enable adaptive quantization for sub-CTUs. This parameter specifies 
@@ -1428,6 +1465,23 @@
 	* :option:`--subme` = MIN(2, :option:`--subme`)
 	* :option:`--rd` = MIN(2, :option:`--rd`)
 
+.. option:: --multi-pass-opt-analysis, --no-multi-pass-opt-analysis
+
+    Enable/Disable multipass analysis refinement along with multipass ratecontrol. Based on 
+    the information stored in pass 1, in subsequent passes analysis data is refined 
+    and also redundant steps are skipped.
+    In pass 1 analysis information like motion vector, depth, reference and prediction
+    modes of the final best CTU partition is stored for each CTU.
+    Default disabled.
+
+.. option:: --multi-pass-opt-distortion, --no-multi-pass-opt-distortion
+
+    Enable/Disable multipass refinement of qp based on distortion data along with multipass
+    ratecontrol. In pass 1 distortion of best CTU partition is stored. CTUs with high
+    distortion get lower(negative)qp offsets and vice-versa for low distortion CTUs in pass 2.
+    This helps to improve the subjective quality.
+    Default disabled.
+
 .. option:: --strict-cbr, --no-strict-cbr
 	
 	Enables stricter conditions to control bitrate deviance from the 
@@ -1753,7 +1807,8 @@
 	where %hu are unsigned 16bit integers and %u are unsigned 32bit
 	integers. The SEI includes X,Y display primaries for RGB channels
 	and white point (WP) in units of 0.00002 and max,min luminance (L)
-	values in units of 0.0001 candela per meter square. (HDR)
+	values in units of 0.0001 candela per meter square. Applicable for HDR
+	content.
 
 	Example for a P3D65 1000-nits monitor, where G(x=0.265, y=0.690),
 	B(x=0.150, y=0.060), R(x=0.680, y=0.320), WP(x=0.3127, y=0.3290),
@@ -1774,7 +1829,7 @@
 	emitted. The string format is "%hu,%hu" where %hu are unsigned 16bit
 	integers. The first value is the max content light level (or 0 if no
 	maximum is indicated), the second value is the maximum picture
-	average light level (or 0). (HDR)
+	average light level (or 0). Applicable for HDR content.
 
 	Example for MaxCLL=1000 candela per square meter, MaxFALL=400
 	candela per square meter:
@@ -1784,6 +1839,13 @@
 	Note that this string value will need to be escaped or quoted to
 	protect against shell expansion on many platforms. No default.
 
+.. option:: --hdr, --no-hdr
+
+	Force signalling of HDR parameters in SEI packets. Enabled
+	automatically when :option`--master-display` or :option`--max-cll` is
+	specified. Useful when there is a desire to signal 0 values for max-cll
+	and max-fall. Default disabled.
+
 .. option:: --min-luma <integer>
 
 	Minimum luma value allowed for input pictures. Any values below min-luma
@@ -1862,29 +1924,36 @@
 
   Maximum of the picture order count. Default 8
 
-.. option:: --[no-]vui-timing-info
+.. option:: --vui-timing-info, --no-vui-timing-info
 
 	Emit VUI timing info in bitstream. Default enabled.
 
-.. option:: --[no-]vui-hrd-info
+.. option:: --vui-hrd-info, --no-vui-hrd-info
 
 	Emit VUI HRD info in  bitstream. Default enabled when
 	:option:`--hrd` is enabled.
 
-.. option:: --[no-]opt-qp-pps
+.. option:: --opt-qp-pps, --no-opt-qp-pps
 
 	Optimize QP in PPS (instead of default value of 26) based on the QP values
 	observed in last GOP. Default enabled.
 
-.. option:: --[no-]opt-ref-list-length-pps
+.. option:: --opt-ref-list-length-pps, --no-opt-ref-list-length-pps
 
 	Optimize L0 and L1 ref list length in PPS (instead of default value of 0)
 	based on the lengths observed in the last GOP. Default enabled.
 
-.. option:: --[no-]multi-pass-opt-rps
+.. option:: --multi-pass-opt-rps, --no-multi-pass-opt-rps
 
 	Enable storing commonly used RPS in SPS in multi pass mode. Default disabled.
 
+.. option:: --opt-cu-delta-qp, --no-opt-cu-delta-qp
+
+	Optimize CU level QPs by pulling up lower QPs to value close to meanQP thereby
+	minimizing fluctuations in deltaQP signalling. Default disabled.
+
+	Only effective at RD levels 5 and 6
+
 
 Debugging options
 =================

 
​

x265_2.2.tar.gz/doc/reST/releasenotes.rst -> x265_2.3.tar.gz/doc/reST/releasenotes.rst Changed

@@ -2,6 +2,34 @@
 Release Notes
 *************
 
+Version 2.3
+===========
+
+Release date - 15th February, 2017.
+
+Encoder enhancements
+--------------------
+1. New SSIM-based RD-cost computation for improved visual quality, and efficiency; use :option:`--ssim-rd` to exercise.
+2. Multi-pass encoding can now share analysis information from prior passes (in addition to rate-control information) to improve performance and quality of subsequent passes; to your multi-pass command-lines that use the :option:`--pass` option, add :option:`--multi-pass-opt-distortion` to share distortion information, and :option:`--multi-pass-opt-analysis` to share other analysis information.
+3. A dedicated thread pool for lookahead can now be specified with :option:`--lookahead-threads`.
+4. option:`--dynamic-rd` dynamically increase analysis in areas where the bitrate is being capped by VBV; works for both CRF and ABR encodes with VBV settings.
+5. The number of bits used to signal the delta-QP can be optimized with the :option:`--opt-cu-delta-qp` option; found to be useful in some scenarios for lower bitrate targets.
+6. Experimental feature option:`--aq-motion` adds new QP offsets based on relative motion of a block with respect to the movement of the frame.
+
+API changes
+-----------
+1. Reconfigure API now supports signalling new scaling lists.
+2. x265 application's csv functionality now reports time (in milliseconds) taken to encode each frame.
+3. :option:`--strict-cbr` enables stricter bitrate adherence by adding filler bits when achieved bitrate is lower than the target; earlier, it was only reacting when the achieved rate was higher.
+4. :option:`--hdr` can be used to ensure that max-cll and max-fall values are always signaled (even if 0,0).
+
+Bug fixes
+---------
+1. Fixed incorrect HW thread counting on MacOS platform.
+2. Fixed scaling lists support for 4:4:4 videos.
+3. Inconsistent output fix for :option:`--opt-qp-pss` by removing last slice's QP from cost calculation.
+4. VTune profiling (enabled using ENABLE_VTUNE CMake option) now also works with 2017 VTune builds.
+
 Version 2.2
 ===========
 
@@ -11,7 +39,7 @@
 --------------------
 1. Enhancements to TU selection algorithm with early-outs for improved speed; use :option:`--limit-tu` to exercise.
 2. New motion search method SEA (Successive Elimination Algorithm) supported now as :option: `--me` 4
-3. Bit-stream optimizations to improve fields in PPS and SPS for bit-rate savings through :option:`--[no-]opt-qp-pps`, :option:`--[no-]opt-ref-list-length-pps`, and :option:`--[no-]multi-pass-opt-rps`.
+3. Bit-stream optimizations to improve fields in PPS and SPS for bit-rate savings through :option:`--opt-qp-pps`, :option:`--opt-ref-list-length-pps`, and :option:`--multi-pass-opt-rps`.
 4. Enabled using VBV constraints when encoding without WPP.
 5. All param options dumped in SEI packet in bitstream when info selected.
 6. x265 now supports POWERPC-based systems. Several key functions also have optimized ALTIVEC kernels.

 
​

x265_2.2.tar.gz/source/CMakeLists.txt -> x265_2.3.tar.gz/source/CMakeLists.txt Changed

 
@@ -28,9 +28,8 @@
 option(NATIVE_BUILD "Target the build CPU" OFF)
 option(STATIC_LINK_CRT "Statically link C runtime for release builds" OFF)
 mark_as_advanced(FPROFILE_USE FPROFILE_GENERATE NATIVE_BUILD)
-
 # X265_BUILD must be incremented each time the public API is changed
-set(X265_BUILD 102)
+set(X265_BUILD 110)
 configure_file("${PROJECT_SOURCE_DIR}/x265.def.in"
                "${PROJECT_BINARY_DIR}/x265.def")
 configure_file("${PROJECT_SOURCE_DIR}/x265_config.h.in"
@@ -123,7 +122,7 @@
   set(XCODE 1)
 endif()
 if(APPLE)
-  add_definitions(-DMACOS)
+  add_definitions(-DMACOS=1)
 endif()
 
 if(${CMAKE_CXX_COMPILER_ID} STREQUAL "Clang")
​

x265_2.2.tar.gz/source/cmake/FindVtune.cmake -> x265_2.3.tar.gz/source/cmake/FindVtune.cmake Changed

 
@@ -15,7 +15,7 @@
     else()
         NAMES amplxe-vars.bat
     endif(UNIX)
-    HINTS $ENV{VTUNE_AMPLIFIER_XE_2016_DIR} $ENV{VTUNE_AMPLIFIER_XE_2015_DIR}
+    HINTS $ENV{VTUNE_AMPLIFIER_XE_2017_DIR} $ENV{VTUNE_AMPLIFIER_XE_2016_DIR} $ENV{VTUNE_AMPLIFIER_XE_2015_DIR}
     DOC "Vtune root directory")
 
 set (VTUNE_INCLUDE_DIR ${VTUNE_DIR}/include)
​

x265_2.2.tar.gz/source/common/common.h -> x265_2.3.tar.gz/source/common/common.h Changed

 
@@ -330,6 +330,10 @@
 
 #define INTEGRAL_PLANE_NUM          12 // 12 integral planes for 32x32, 32x24, 32x8, 24x32, 16x16, 16x12, 16x4, 12x16, 8x32, 8x8, 4x16 and 4x4.
 
+#define NAL_TYPE_OVERHEAD 2
+#define START_CODE_OVERHEAD 3 
+#define FILLER_OVERHEAD (NAL_TYPE_OVERHEAD + START_CODE_OVERHEAD + 1)
+
 namespace X265_NS {
 
 enum { SAO_NUM_OFFSET = 4 };
​

x265_2.2.tar.gz/source/common/cudata.cpp -> x265_2.3.tar.gz/source/common/cudata.cpp Changed

@@ -218,10 +218,13 @@
         m_mvd[0] = m_mv[1] +  m_numPartitions;
         m_mvd[1] = m_mvd[0] + m_numPartitions;
 
+        m_distortion = dataPool.distortionMemBlock + instance * m_numPartitions;
+
         uint32_t cuSize = g_maxCUSize >> depth;
         m_trCoeff[0] = dataPool.trCoeffMemBlock + instance * (cuSize * cuSize);
         m_trCoeff[1] = m_trCoeff[2] = 0;
         m_transformSkip[1] = m_transformSkip[2] = m_cbf[1] = m_cbf[2] = 0;
+        m_fAc_den[0] = m_fDc_den[0] = 0;
     }
     else
     {
@@ -257,12 +260,16 @@
         m_mvd[0] = m_mv[1] +  m_numPartitions;
         m_mvd[1] = m_mvd[0] + m_numPartitions;
 
+        m_distortion = dataPool.distortionMemBlock + instance * m_numPartitions;
+
         uint32_t cuSize = g_maxCUSize >> depth;
         uint32_t sizeL = cuSize * cuSize;
         uint32_t sizeC = sizeL >> (m_hChromaShift + m_vChromaShift); // block chroma part
         m_trCoeff[0] = dataPool.trCoeffMemBlock + instance * (sizeL + sizeC * 2);
         m_trCoeff[1] = m_trCoeff[0] + sizeL;
         m_trCoeff[2] = m_trCoeff[0] + sizeL + sizeC;
+        for (int i = 0; i < 3; i++)
+            m_fAc_den[i] = m_fDc_den[i] = 0;
     }
 }
 
@@ -299,11 +306,14 @@
     for (int8_t i = 0; i < NUM_TU_DEPTH; i++)
         m_refTuDepth[i] = -1;
 
+    m_vbvAffected = false;
+
     uint32_t widthInCU = m_slice->m_sps->numCuInWidth;
     m_cuLeft = (m_cuAddr % widthInCU) ? m_encData->getPicCTU(m_cuAddr - 1) : NULL;
     m_cuAbove = (m_cuAddr >= widthInCU) && !m_bFirstRowInSlice ? m_encData->getPicCTU(m_cuAddr - widthInCU) : NULL;
     m_cuAboveLeft = (m_cuLeft && m_cuAbove) ? m_encData->getPicCTU(m_cuAddr - widthInCU - 1) : NULL;
     m_cuAboveRight = (m_cuAbove && ((m_cuAddr % widthInCU) < (widthInCU - 1))) ? m_encData->getPicCTU(m_cuAddr - widthInCU + 1) : NULL;
+    memset(m_distortion, 0, m_numPartitions * sizeof(sse_t));
 }
 
 // initialize Sub partition
@@ -322,6 +332,11 @@
     m_bFirstRowInSlice = ctu.m_bFirstRowInSlice;
     m_bLastRowInSlice = ctu.m_bLastRowInSlice;
     m_bLastCuInSlice = ctu.m_bLastCuInSlice;
+    for (int i = 0; i < 3; i++)
+    {
+        m_fAc_den[i] = ctu.m_fAc_den[i];
+        m_fDc_den[i] = ctu.m_fDc_den[i];
+    }
 
     X265_CHECK(m_numPartitions == cuGeom.numPartitions, "initSubCU() size mismatch\n");
 
@@ -337,6 +352,7 @@
 
     /* initialize the remaining CU data in one memset */
     memset(m_predMode, 0, (ctu.m_chromaFormat == X265_CSP_I400 ? BytesPerPartition - 12 : BytesPerPartition - 8) * m_numPartitions);
+    memset(m_distortion, 0, m_numPartitions * sizeof(sse_t));
 }
 
 /* Copy the results of a sub-part (split) CU to the parent CU */
@@ -372,6 +388,8 @@
     memcpy(m_mvd[0] + offset, subCU.m_mvd[0], childGeom.numPartitions * sizeof(MV));
     memcpy(m_mvd[1] + offset, subCU.m_mvd[1], childGeom.numPartitions * sizeof(MV));
 
+    memcpy(m_distortion + offset, subCU.m_distortion, childGeom.numPartitions * sizeof(sse_t));
+
     uint32_t tmp = 1 << ((g_maxLog2CUSize - childGeom.depth) * 2);
     uint32_t tmp2 = subPartIdx * tmp;
     memcpy(m_trCoeff[0] + tmp2, subCU.m_trCoeff[0], sizeof(coeff_t)* tmp);
@@ -421,6 +439,7 @@
     memcpy(m_mv[1],  cu.m_mv[1],  m_numPartitions * sizeof(MV));
     memcpy(m_mvd[0], cu.m_mvd[0], m_numPartitions * sizeof(MV));
     memcpy(m_mvd[1], cu.m_mvd[1], m_numPartitions * sizeof(MV));
+    memcpy(m_distortion, cu.m_distortion, m_numPartitions * sizeof(sse_t));
 
     /* force TQBypass to true */
     m_partSet(m_tqBypass, true);
@@ -468,6 +487,8 @@
     memcpy(ctu.m_mvd[0] + m_absIdxInCTU, m_mvd[0], m_numPartitions * sizeof(MV));
     memcpy(ctu.m_mvd[1] + m_absIdxInCTU, m_mvd[1], m_numPartitions * sizeof(MV));
 
+    memcpy(ctu.m_distortion + m_absIdxInCTU, m_distortion, m_numPartitions * sizeof(sse_t));
+
     uint32_t tmpY = 1 << ((g_maxLog2CUSize - depth) * 2);
     uint32_t tmpY2 = m_absIdxInCTU << (LOG2_UNIT_SIZE * 2);
     memcpy(ctu.m_trCoeff[0] + tmpY2, m_trCoeff[0], sizeof(coeff_t)* tmpY);
@@ -520,6 +541,8 @@
     memcpy(m_mvd[0], ctu.m_mvd[0] + m_absIdxInCTU, m_numPartitions * sizeof(MV));
     memcpy(m_mvd[1], ctu.m_mvd[1] + m_absIdxInCTU, m_numPartitions * sizeof(MV));
 
+    memcpy(m_distortion, ctu.m_distortion + m_absIdxInCTU, m_numPartitions * sizeof(sse_t));
+
     /* clear residual coding flags */
     m_partSet(m_tuDepth, 0);
     m_partSet(m_transformSkip[0], 0);

 

x265_2.2.tar.gz/source/common/cudata.h -> x265_2.3.tar.gz/source/common/cudata.h Changed

@@ -164,6 +164,8 @@
     static cubcast_t s_partSet[NUM_FULL_DEPTH]; // pointer to broadcast set functions per absolute depth
     static uint32_t  s_numPartInCUSize;
 
+    bool          m_vbvAffected;
+
     FrameData*    m_encData;
     const Slice*  m_slice;
 
@@ -205,6 +207,7 @@
     uint8_t*      m_chromaIntraDir;   // array of intra directions (chroma)
     enum { BytesPerPartition = 21 };  // combined sizeof() of all per-part data
 
+    sse_t*        m_distortion;
     coeff_t*      m_trCoeff[3];       // transformed coefficient buffer per plane
     int8_t        m_refTuDepth[NUM_TU_DEPTH];   // TU depth of CU at depths 0, 1 and 2
 
@@ -216,6 +219,9 @@
     const CUData* m_cuAboveRight;     // pointer to above-right neighbor CTU
     const CUData* m_cuAbove;          // pointer to above neighbor CTU
     const CUData* m_cuLeft;           // pointer to left neighbor CTU
+    double m_meanQP;
+    uint64_t      m_fAc_den[3];
+    uint64_t      m_fDc_den[3];
 
     CUData();
 
@@ -340,8 +346,9 @@
     uint8_t* charMemBlock;
     coeff_t* trCoeffMemBlock;
     MV*      mvMemBlock;
+    sse_t*   distortionMemBlock;
 
-    CUDataMemPool() { charMemBlock = NULL; trCoeffMemBlock = NULL; mvMemBlock = NULL; }
+    CUDataMemPool() { charMemBlock = NULL; trCoeffMemBlock = NULL; mvMemBlock = NULL; distortionMemBlock = NULL; }
 
     bool create(uint32_t depth, uint32_t csp, uint32_t numInstances)
     {
@@ -359,6 +366,7 @@
         }
         CHECKED_MALLOC(charMemBlock, uint8_t, numPartition * numInstances * CUData::BytesPerPartition);
         CHECKED_MALLOC_ZERO(mvMemBlock, MV, numPartition * 4 * numInstances);
+        CHECKED_MALLOC(distortionMemBlock, sse_t, numPartition * numInstances);
         return true;
     fail:
         return false;
@@ -369,6 +377,7 @@
         X265_FREE(trCoeffMemBlock);
         X265_FREE(mvMemBlock);
         X265_FREE(charMemBlock);
+        X265_FREE(distortionMemBlock);
     }
 };
 }
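
Note on the cudata.cpp and cudata.h hunks: they carve a single pooled `sse_t` allocation into per-instance slices (`distortionMemBlock + instance * m_numPartitions`) and copy a sub-CU's values into its parent at a partition offset. The following is a minimal standalone sketch of that layout, not the x265 implementation. `DistortionPool`, `slice` and `copyToParent` are hypothetical names, `sse_t` is assumed to be a 32-bit unsigned type, and `std::vector` (which zero-fills) stands in for `CHECKED_MALLOC`/`X265_FREE`, whereas the diff zeroes the buffer explicitly with `memset` during CU initialization.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

typedef uint32_t sse_t; // assumption: the real width depends on the build's bit depth

// Hypothetical stand-in for the distortion part of CUDataMemPool.
struct DistortionPool
{
    uint32_t numPartitions;
    std::vector<sse_t> block; // one contiguous allocation shared by all instances

    DistortionPool(uint32_t parts, uint32_t numInstances)
        : numPartitions(parts), block(size_t(parts) * numInstances, 0) {}

    // Mirrors: m_distortion = dataPool.distortionMemBlock + instance * m_numPartitions
    sse_t* slice(uint32_t instance)
    {
        assert((size_t(instance) + 1) * numPartitions <= block.size());
        return block.data() + size_t(instance) * numPartitions;
    }
};

// Mirrors the memcpy added to the sub-CU -> parent copy:
// the child's per-partition values land at 'offset' inside the parent's slice.
inline void copyToParent(sse_t* parent, const sse_t* child, uint32_t offset, uint32_t childParts)
{
    std::memcpy(parent + offset, child, childParts * sizeof(sse_t));
}
```

Because every instance owns `numPartitions` consecutive entries, adding the new field costs one extra allocation per pool rather than one per CU object.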

 

x265_2.2.tar.gz/source/common/frame.cpp -> x265_2.3.tar.gz/source/common/frame.cpp Changed

 
@@ -46,6 +46,7 @@
     m_userSEI.payloads = NULL;
     memset(&m_lowres, 0, sizeof(m_lowres));
     m_rcData = NULL;
+    m_encodeStartTime = 0;
 }
 
 bool Frame::create(x265_param *param, float* quantOffsets)
@@ -55,7 +56,7 @@
     CHECKED_MALLOC_ZERO(m_rcData, RcStats, 1);
 
     if (m_fencPic->create(param->sourceWidth, param->sourceHeight, param->internalCsp) &&
-        m_lowres.create(m_fencPic, param->bframes, !!param->rc.aqMode, param->rc.qgSize))
+        m_lowres.create(m_fencPic, param->bframes, !!param->rc.aqMode || !!param->bAQMotion, param->rc.qgSize))
     {
         X265_CHECK((m_reconColCount == NULL), "m_reconColCount was initialized");
         m_numRows = (m_fencPic->m_picHeight + g_maxCUSize - 1)  / g_maxCUSize;

x265_2.2.tar.gz/source/common/frame.h -> x265_2.3.tar.gz/source/common/frame.h Changed

 
@@ -98,7 +98,10 @@
     Frame*                 m_prev;
     x265_param*            m_param;              // Points to the latest param set for the frame.
     x265_analysis_data     m_analysisData;
+    x265_analysis_2Pass    m_analysis2Pass;
     RcStats*               m_rcData;
+
+    int64_t                m_encodeStartTime;
     Frame();
 
     bool create(x265_param *param, float* quantOffsets);

x265_2.2.tar.gz/source/common/framedata.h -> x265_2.3.tar.gz/source/common/framedata.h Changed

@@ -55,6 +55,7 @@
     double      avgLumaDistortion;
     double      avgChromaDistortion;
     double      avgPsyEnergy;
+    double      avgSsimEnergy;
     double      avgResEnergy;
     double      percentIntraNxN;
     double      percentSkipCu[NUM_CU_DEPTH];
@@ -68,6 +69,7 @@
     uint64_t    lumaDistortion;
     uint64_t    chromaDistortion;
     uint64_t    psyEnergy;
+    int64_t     ssimEnergy;
     uint64_t    resEnergy;
     uint64_t    cntSkipCu[NUM_CU_DEPTH];
     uint64_t    cntMergeCu[NUM_CU_DEPTH];
@@ -181,5 +183,24 @@
     uint8_t*    partSize;
     uint8_t*    mergeFlag;
 };
+
+struct analysis2PassFrameData
+{
+    uint8_t*      depth;
+    MV*           m_mv[2];
+    int*          mvpIdx[2];
+    int32_t*      ref[2];
+    uint8_t*      modes;
+    sse_t*        distortion;
+    sse_t*        ctuDistortion;
+    double*       scaledDistortion;
+    double        averageDistortion;
+    double        sdDistortion;
+    uint32_t      highDistortionCtuCount;
+    uint32_t      lowDistortionCtuCount;
+    double*       offset;
+    double*       threshold;
+};
+
 }
 #endif // ifndef X265_FRAMEDATA_H

 

x265_2.2.tar.gz/source/common/lowres.cpp -> x265_2.3.tar.gz/source/common/lowres.cpp Changed

 
@@ -56,6 +56,7 @@
     if (bAQEnabled)
     {
         CHECKED_MALLOC_ZERO(qpAqOffset, double, cuCountFullRes);
+        CHECKED_MALLOC_ZERO(qpAqMotionOffset, double, cuCountFullRes);
         CHECKED_MALLOC_ZERO(invQscaleFactor, int, cuCountFullRes);
         CHECKED_MALLOC_ZERO(qpCuTreeOffset, double, cuCountFullRes);
         CHECKED_MALLOC_ZERO(blockVariance, uint32_t, cuCountFullRes);
@@ -124,8 +125,8 @@
         X265_FREE(lowresMvCosts[0][i]);
         X265_FREE(lowresMvCosts[1][i]);
     }
-
     X265_FREE(qpAqOffset);
+    X265_FREE(qpAqMotionOffset);
     X265_FREE(invQscaleFactor);
     X265_FREE(qpCuTreeOffset);
     X265_FREE(propagateCost);

x265_2.2.tar.gz/source/common/lowres.h -> x265_2.3.tar.gz/source/common/lowres.h Changed

 
@@ -144,6 +144,7 @@
     /* rate control / adaptive quant data */
     double*   qpAqOffset;      // AQ QP offset values for each 16x16 CU
     double*   qpCuTreeOffset;  // cuTree QP offset values for each 16x16 CU
+    double*   qpAqMotionOffset;
     int*      invQscaleFactor; // qScale values for qp Aq Offsets
     int*      invQscaleFactor8x8; // temporary buffer for qg-size 8
     uint32_t* blockVariance;

x265_2.2.tar.gz/source/common/param.cpp -> x265_2.3.tar.gz/source/common/param.cpp Changed

@@ -131,6 +131,7 @@
     param->bEnableAccessUnitDelimiters = 0;
     param->bEmitHRDSEI = 0;
     param->bEmitInfoSEI = 1;
+    param->bEmitHDRSEI = 0;
 
     /* CU definitions */
     param->maxCUSize = 64;
@@ -149,8 +150,8 @@
     param->bBPyramid = 1;
     param->scenecutThreshold = 40; /* Magic number pulled in from x264 */
     param->lookaheadSlices = 8;
+    param->lookaheadThreads = 0;
     param->scenecutBias = 5.0;
-
     /* Intra Coding Tools */
     param->bEnableConstrainedIntra = 0;
     param->bEnableStrongIntraSmoothing = 1;
@@ -178,6 +179,7 @@
     param->bEnableTemporalMvp = 1;
     param->bSourceReferenceEstimation = 0;
     param->limitTU = 0;
+    param->dynamicRd = 0;
 
     /* Loop Filter */
     param->bEnableLoopFilter = 1;
@@ -193,6 +195,8 @@
     param->psyRd = 2.0;
     param->psyRdoq = 0.0;
     param->analysisMode = 0;
+    param->analysisMultiPassRefine = 0;
+    param->analysisMultiPassDistortion = 0;
     param->analysisFileName = NULL;
     param->bIntraInBFrames = 0;
     param->bLossless = 0;
@@ -200,6 +204,7 @@
     param->bEnableTemporalSubLayers = 0;
     param->bEnableRdRefine = 0;
     param->bMultiPassOptRPS = 0;
+    param->bSsimRd = 0;
 
     /* Rate control options */
     param->rc.vbvMaxBitrate = 0;
@@ -263,6 +268,8 @@
     param->bEmitVUIHRDInfo      = 1;
     param->bOptQpPPS            = 1;
     param->bOptRefListLengthPPS = 1;
+    param->bOptCUDeltaQP        = 0;
+    param->bAQMotion = 0;
 
 }
 
@@ -919,7 +926,23 @@
         OPT("opt-ref-list-length-pps") p->bOptRefListLengthPPS = atobool(value);
         OPT("multi-pass-opt-rps") p->bMultiPassOptRPS = atobool(value);
         OPT("scenecut-bias") p->scenecutBias = atof(value);
-
+        OPT("lookahead-threads") p->lookaheadThreads = atoi(value);
+        OPT("opt-cu-delta-qp") p->bOptCUDeltaQP = atobool(value);
+        OPT("multi-pass-opt-analysis") p->analysisMultiPassRefine = atobool(value);
+        OPT("multi-pass-opt-distortion") p->analysisMultiPassDistortion = atobool(value);
+        OPT("aq-motion") p->bAQMotion = atobool(value);
+        OPT("dynamic-rd") p->dynamicRd = atof(value);
+        OPT("ssim-rd")
+        {
+            int bval = atobool(value);
+            if (bError || bval)
+            {
+                bError = false;
+                p->psyRd = 0.0;
+                p->bSsimRd = atobool(value);
+            }
+        }
+        OPT("hdr") p->bEmitHDRSEI = atobool(value);
         else
             return X265_PARAM_BAD_NAME;
     }
@@ -1148,6 +1171,8 @@
           "RD Level is out of range");
     CHECK(param->rdoqLevel < 0 || param->rdoqLevel > 2,
         "RDOQ Level is out of range");
+    CHECK(param->dynamicRd < 0 || param->dynamicRd > x265_ADAPT_RD_STRENGTH,
+        "Dynamic RD strength must be between 0 and 4");
     CHECK(param->bframes && param->bframes >= param->lookaheadDepth && !param->rc.bStatRead,
           "Lookahead depth must be greater than the max consecutive bframe count");
     CHECK(param->bframes < 0,
@@ -1260,6 +1285,10 @@
     CHECK(param->searchMethod == X265_SEA && (param->sourceWidth > 840 || param->sourceHeight > 480),
         "SEA motion search does not support resolutions greater than 480p in 32 bit build");
 #endif
+
+    if (param->masteringDisplayColorVolume || param->maxFALL || param->maxCLL)
+        param->bEmitHDRSEI = 1;
+
     return check_failed;
 }
 
@@ -1393,6 +1422,7 @@
     TOOLOPT(param->bEnableAMP, "amp");
     TOOLOPT(param->limitModes, "limit-modes");
     TOOLVAL(param->rdLevel, "rd=%d");
+    TOOLVAL(param->dynamicRd, "dynamic-rd=%.2f");
     TOOLVAL(param->psyRd, "psy-rd=%.2lf");
     TOOLVAL(param->rdoqLevel, "rdoq=%d");
     TOOLVAL(param->psyRdoq, "psy-rdoq=%.2lf");
@@ -1412,6 +1442,7 @@
     TOOLOPT(param->bEnableFastIntra, "fast-intra");
     TOOLOPT(param->bEnableStrongIntraSmoothing, "strong-intra-smoothing");
     TOOLVAL(param->lookaheadSlices, "lslices=%d");
+    TOOLVAL(param->lookaheadThreads, "lthreads=%d")
     if (param->maxSlices > 1)
         TOOLVAL(param->maxSlices, "slices=%d");
     if (param->bEnableLoopFilter)
@@ -1491,6 +1522,7 @@
     s += sprintf(s, " tu-intra-depth=%d", p->tuQTMaxIntraDepth);
     s += sprintf(s, " limit-tu=%d", p->limitTU);
     s += sprintf(s, " rdoq-level=%d", p->rdoqLevel);
+    s += sprintf(s, " dynamic-rd=%.2f", p->dynamicRd);
     BOOL(p->bEnableSignHiding, "signhide");
     BOOL(p->bEnableTransformSkip, "tskip");
     s += sprintf(s, " nr-intra=%d", p->noiseReductionIntra);
@@ -1613,6 +1645,9 @@
     BOOL(p->bOptRefListLengthPPS, "opt-ref-list-length-pps");
     BOOL(p->bMultiPassOptRPS, "multi-pass-opt-rps");
     s += sprintf(s, " scenecut-bias=%.2f", p->scenecutBias);
+    BOOL(p->bOptCUDeltaQP, "opt-cu-delta-qp");
+    BOOL(p->bAQMotion, "aq-motion");
+    BOOL(p->bEmitHDRSEI, "hdr");
 #undef BOOL
     return buf;
 }
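
Note on the param.cpp hunks: switching on `ssim-rd` also forces `psyRd` to 0, and the hunk ending in `return check_failed;` turns on `bEmitHDRSEI` whenever mastering-display, max-FALL or max-CLL data is present. The sketch below restates those two rules in isolation. `MiniParam`, `setSsimRd` and `finalizeHdr` are hypothetical names, the field types are assumptions, and the `bError` handling around `atobool` is not reproduced.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of x265_param: only the fields these hunks touch.
struct MiniParam
{
    double      psyRd = 2.0;   // default set in the diff's defaults hunk
    int         bSsimRd = 0;
    int         bEmitHDRSEI = 0;
    const char* masteringDisplayColorVolume = nullptr;
    uint16_t    maxCLL = 0;    // assumed type
    uint16_t    maxFALL = 0;   // assumed type
};

// Enabling SSIM RD disables psy-rd. A false value leaves both fields untouched,
// which matches the 'if (bError || bval)' guard for well-formed input.
inline void setSsimRd(MiniParam& p, bool enable)
{
    if (enable)
    {
        p.psyRd = 0.0;
        p.bSsimRd = 1;
    }
}

// Any HDR metadata implies the HDR SEI flag.
inline void finalizeHdr(MiniParam& p)
{
    if (p.masteringDisplayColorVolume || p.maxFALL || p.maxCLL)
        p.bEmitHDRSEI = 1;
}
```

One consequence worth noting when reading the diff: passing a false value for `ssim-rd` does not restore a previously zeroed `psyRd`.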

 

x265_2.2.tar.gz/source/common/quant.cpp -> x265_2.3.tar.gz/source/common/quant.cpp Changed

@@ -479,6 +479,83 @@
     }
 }
 
+uint64_t Quant::ssimDistortion(const CUData& cu, const pixel* fenc, uint32_t fStride, const pixel* recon, intptr_t rstride, uint32_t log2TrSize, TextType ttype, uint32_t absPartIdx)
+{
+    static const int ssim_c1 = (int)(.01 * .01 * PIXEL_MAX * PIXEL_MAX * 64 + .5); // 416
+    static const int ssim_c2 = (int)(.03 * .03 * PIXEL_MAX * PIXEL_MAX * 64 * 63 + .5); // 235963
+    int shift = (X265_DEPTH - 8);
+
+    int trSize = 1 << log2TrSize;
+    uint64_t ssDc = 0, ssBlock = 0, ssAc = 0;
+
+    // Calculation of (X(0) - Y(0)) * (X(0) - Y(0)), DC
+    ssDc = 0;
+    for (int y = 0; y < trSize; y += 4)
+    {
+        for (int x = 0; x < trSize; x += 4)
+        {
+            int temp = fenc[y * fStride + x] - recon[y * rstride + x]; // copy of residual coeff
+            ssDc += temp * temp;
+        }
+    }
+
+    // Calculation of (X(k) - Y(k)) * (X(k) - Y(k)), AC
+    ssBlock = 0;
+    for (int y = 0; y < trSize; y++)
+    {
+        for (int x = 0; x < trSize; x++)
+        {
+            int temp = fenc[y * fStride + x] - recon[y * rstride + x]; // copy of residual coeff
+            ssBlock += temp * temp;
+        }
+    }
+
+    ssAc = ssBlock - ssDc;
+
+    // 1. Calculation of fdc'
+    // Calculate numerator of dc normalization factor
+    uint64_t fDc_num = 0;
+
+    // 2. Calculate dc component
+    uint64_t dc_k = 0;
+    for (int block_yy = 0; block_yy < trSize; block_yy += 4)
+    {
+        for (int block_xx = 0; block_xx < trSize; block_xx += 4)
+        {
+            uint32_t temp = fenc[block_yy * fStride + block_xx] >> shift;
+            dc_k += temp * temp;
+        }
+    }
+
+    fDc_num = (2 * dc_k)  + (trSize * trSize * ssim_c1); // 16 pixels -> for each 4x4 block
+    fDc_num /= ((trSize >> 2) * (trSize >> 2));
+
+    // 1. Calculation of fac'
+    // Calculate numerator of ac normalization factor
+    uint64_t fAc_num = 0;
+
+    // 2. Calculate ac component
+    uint64_t ac_k = 0;
+    for (int block_yy = 0; block_yy < trSize; block_yy += 1)
+    {
+        for (int block_xx = 0; block_xx < trSize; block_xx += 1)
+        {
+            uint32_t temp = fenc[block_yy * fStride + block_xx] >> shift;
+            ac_k += temp * temp;
+        }
+    }
+    ac_k -= dc_k;
+
+    double s = 1 + 0.005 * cu.m_qp[absPartIdx];
+
+    fAc_num = ac_k + uint64_t(s * ac_k) + ssim_c2;
+    fAc_num /= ((trSize >> 2) * (trSize >> 2));
+
+    // Calculate dc and ac normalization factor
+    uint64_t ssim_distortion = ((ssDc * cu.m_fDc_den[ttype]) / fDc_num) + ((ssAc * cu.m_fAc_den[ttype]) / fAc_num);
+    return ssim_distortion;
+}
+
 void Quant::invtransformNxN(const CUData& cu, int16_t* residual, uint32_t resiStride, const coeff_t* coeff,
                             uint32_t log2TrSize, TextType ttype, bool bIntra, bool useTransformSkip, uint32_t numSig)
 {
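
Note on `Quant::ssimDistortion`: the function splits the squared error into a "DC" term, sampled at the top-left pixel of every 4x4 sub-block, and an "AC" term, which is the full-block sum minus that DC term. It then weights each term by a denominator/numerator pair. The sketch below isolates the split and the final weighting for an assumed 8-bit build (so the diff's `shift` is 0). `sampledSsd`, `SsdSplit`, `splitSsd` and `combine` are hypothetical helper names, and the normalization factors are passed in rather than derived from the source block and QP as the diff does.

```cpp
#include <cassert>
#include <cstdint>

typedef uint8_t pixel; // assumption: 8-bit pixels

// Sum of squared differences sampled every 'step' pixels in both directions.
// step == 4 gives the diff's ssDc, step == 1 gives ssBlock.
inline uint64_t sampledSsd(const pixel* fenc, intptr_t fStride,
                           const pixel* recon, intptr_t rStride,
                           int trSize, int step)
{
    uint64_t sum = 0;
    for (int y = 0; y < trSize; y += step)
        for (int x = 0; x < trSize; x += step)
        {
            int d = int(fenc[y * fStride + x]) - int(recon[y * rStride + x]);
            sum += uint64_t(d * d);
        }
    return sum;
}

struct SsdSplit { uint64_t dc; uint64_t ac; };

// ssDc and ssAc = ssBlock - ssDc, as at the top of the function in the diff.
inline SsdSplit splitSsd(const pixel* fenc, intptr_t fStride,
                         const pixel* recon, intptr_t rStride, int log2TrSize)
{
    int trSize = 1 << log2TrSize;
    SsdSplit s;
    s.dc = sampledSsd(fenc, fStride, recon, rStride, trSize, 4);
    s.ac = sampledSsd(fenc, fStride, recon, rStride, trSize, 1) - s.dc;
    return s;
}

// Final line of the diff: each term scaled by den / num using integer division.
inline uint64_t combine(const SsdSplit& s, uint64_t fDcDen, uint64_t fDcNum,
                        uint64_t fAcDen, uint64_t fAcNum)
{
    assert(fDcNum != 0 && fAcNum != 0);
    return (s.dc * fDcDen) / fDcNum + (s.ac * fAcDen) / fAcNum;
}
```

For a 4x4 block whose source pixels are all 10 and whose reconstruction is all 8, the split gives `dc == 4` and `ac == 60`, so unit weights reproduce the plain SSD of 64.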

 

x265_2.2.tar.gz/source/common/quant.h -> x265_2.3.tar.gz/source/common/quant.h Changed

 
@@ -111,6 +111,8 @@
 
     void invtransformNxN(const CUData& cu, int16_t* residual, uint32_t resiStride, const coeff_t* coeff,
                          uint32_t log2TrSize, TextType ttype, bool bIntra, bool useTransformSkip, uint32_t numSig);
+    uint64_t ssimDistortion(const CUData& cu, const pixel* fenc, uint32_t fStride, const pixel* recon, intptr_t rstride,
+                            uint32_t log2TrSize, TextType ttype, uint32_t absPartIdx);
 
     /* Pattern decision for context derivation process of significant_coeff_flag */
     static uint32_t calcPatternSigCtx(uint64_t sigCoeffGroupFlag64, uint32_t cgPosX, uint32_t cgPosY, uint32_t cgBlkPos, uint32_t trSizeCG)

x265_2.2.tar.gz/source/common/scalinglist.cpp -> x265_2.3.tar.gz/source/common/scalinglist.cpp Changed

@@ -339,7 +339,7 @@
 }
 
 /** set quantized matrix coefficient for encode */
-void ScalingList::setupQuantMatrices()
+void ScalingList::setupQuantMatrices(int internalCsp)
 {
     for (int size = 0; size < NUM_SIZES; size++)
     {
@@ -360,6 +360,21 @@
 
                 if (m_bEnabled)
                 {
+                    if (internalCsp == X265_CSP_I444)
+                    {
+                        for (int i = 0; i < 64; i++)
+                        {
+                            m_scalingListCoef[BLOCK_32x32][1][i] = m_scalingListCoef[BLOCK_16x16][1][i];
+                            m_scalingListCoef[BLOCK_32x32][2][i] = m_scalingListCoef[BLOCK_16x16][2][i];
+                            m_scalingListCoef[BLOCK_32x32][4][i] = m_scalingListCoef[BLOCK_16x16][4][i];
+                            m_scalingListCoef[BLOCK_32x32][5][i] = m_scalingListCoef[BLOCK_16x16][5][i];
+                        }
+
+                        m_scalingListDC[BLOCK_32x32][1] = m_scalingListDC[BLOCK_16x16][1];
+                        m_scalingListDC[BLOCK_32x32][2] = m_scalingListDC[BLOCK_16x16][2];
+                        m_scalingListDC[BLOCK_32x32][4] = m_scalingListDC[BLOCK_16x16][4];
+                        m_scalingListDC[BLOCK_32x32][5] = m_scalingListDC[BLOCK_16x16][5];
+                    }
                     processScalingListEnc(coeff, quantCoeff, s_quantScales[rem] << 4, width, width, ratio, stride, dc);
                     processScalingListDec(coeff, dequantCoeff, s_invQuantScales[rem], width, width, ratio, stride, dc);
                 }
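
Note on the scalinglist.cpp hunk: for 4:4:4 input, the 32x32 entries at list indices 1, 2, 4 and 5 (which appear to be the chroma lists) are filled from the corresponding 16x16 lists before the quantization tables are built. A compact sketch of that copy follows. `MiniScalingList` and `inheritChroma32x32` are hypothetical names, and the enum values and array shapes are assumptions chosen only to make the example compile.

```cpp
#include <cassert>
#include <cstdint>

// Assumed size indices and dimensions; only the 16x16 -> 32x32 relation matters here.
enum { BLOCK_16x16 = 2, BLOCK_32x32 = 3, NUM_SIZES = 4, NUM_LISTS = 6, MAX_COEF = 64 };

struct MiniScalingList
{
    int32_t coef[NUM_SIZES][NUM_LISTS][MAX_COEF];
    int32_t dc[NUM_SIZES][NUM_LISTS];
};

// Copies the four lists named in the diff from the 16x16 slot to the 32x32 slot,
// coefficients and DC value alike. Lists 0 and 3 are left untouched.
inline void inheritChroma32x32(MiniScalingList& sl)
{
    static const int lists[4] = { 1, 2, 4, 5 };
    for (int k = 0; k < 4; k++)
    {
        int list = lists[k];
        for (int i = 0; i < MAX_COEF; i++)
            sl.coef[BLOCK_32x32][list][i] = sl.coef[BLOCK_16x16][list][i];
        sl.dc[BLOCK_32x32][list] = sl.dc[BLOCK_16x16][list];
    }
}
```

The loop form is equivalent to the eight unrolled assignments in the hunk; the diff performs the copy inside the per-size, per-list setup loop, guarded by `internalCsp == X265_CSP_I444`.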

 

x265_2.2.tar.gz/source/common/scalinglist.h -> x265_2.3.tar.gz/source/common/scalinglist.h Changed

 
@@ -60,7 +60,7 @@
     bool     init();
     void     setDefaultScalingList();
     bool     parseScalingList(const char* filename);
-    void     setupQuantMatrices();
+    void     setupQuantMatrices(int internalCsp);
 
     /* used during SPS coding */
     int      checkPredMode(int sizeId, int listId) const;

x265_2.2.tar.gz/source/common/threadpool.cpp -> x265_2.3.tar.gz/source/common/threadpool.cpp Changed

@@ -57,7 +57,10 @@
 
 #endif
 
-#if MACOS
+/* TODO FIX: Macro __MACH__ ideally should be part of MacOS definition, but adding to Cmake
+   behaving is not as expected, need to fix this. */
+
+#if MACOS && __MACH__
 #include <sys/param.h>
 #include <sys/sysctl.h>
 #endif
@@ -244,8 +247,7 @@
 
     return bondCount;
 }
-
-ThreadPool* ThreadPool::allocThreadPools(x265_param* p, int& numPools)
+ThreadPool* ThreadPool::allocThreadPools(x265_param* p, int& numPools, bool isThreadsReserved)
 {
     enum { MAX_NODE_NUM = 127 };
     int cpusPerNode[MAX_NODE_NUM + 1];
@@ -397,17 +399,32 @@
         x265_log(p, X265_LOG_DEBUG, "Reducing number of thread pools for frame thread count\n");
         numPools = X265_MAX(p->frameNumThreads / 2, 1);
     }
-
+    if (isThreadsReserved)
+        numPools = 1;
     ThreadPool *pools = new ThreadPool[numPools];
     if (pools)
     {
-        int maxProviders = (p->frameNumThreads + numPools - 1) / numPools + 1; /* +1 is Lookahead, always assigned to threadpool 0 */
+        int maxProviders = (p->frameNumThreads + numPools - 1) / numPools + !isThreadsReserved; /* +1 is Lookahead, always assigned to threadpool 0 */
         int node = 0;
         for (int i = 0; i < numPools; i++)
         {
             while (!threadsPerPool[node])
                 node++;
             int numThreads = X265_MIN(MAX_POOL_THREADS, threadsPerPool[node]);
+            int origNumThreads = numThreads;
+            if (p->lookaheadThreads > numThreads / 2)
+            {
+                p->lookaheadThreads = numThreads / 2;
+                x265_log(p, X265_LOG_DEBUG, "Setting lookahead threads to a maximum of half the total number of threads\n");
+            }
+            if (isThreadsReserved)
+            {
+                numThreads = p->lookaheadThreads;
+                maxProviders = 1;
+            }
+
+            else
+                numThreads -= p->lookaheadThreads;
             if (!pools[i].create(numThreads, maxProviders, nodeMaskPerPool[node]))
             {
                 X265_FREE(pools);
@@ -425,7 +442,7 @@
             }
             else
                 x265_log(p, X265_LOG_INFO, "Thread pool created using %d threads\n", numThreads);
-            threadsPerPool[node] -= numThreads;
+            threadsPerPool[node] -= origNumThreads;
         }
     }
     else
@@ -603,7 +620,7 @@
     return sysconf(_SC_NPROCESSORS_CONF);
 #elif __unix__
     return sysconf(_SC_NPROCESSORS_ONLN);
-#elif MACOS
+#elif MACOS && __MACH__
     int nm[2];
     size_t len = 4;
     uint32_t count;
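
The thread-count arithmetic in the `allocThreadPools` hunk can be read as a small pure function. The sketch below is an illustration under assumptions: `PoolSplit` and `splitThreads` are hypothetical names, and `nodeThreads` stands for the value the patch computes as `X265_MIN(MAX_POOL_THREADS, threadsPerPool[node])`.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical result type; the field names are not taken from x265.
struct PoolSplit
{
    int lookaheadThreads; // lookahead request after clamping
    int poolThreads;      // threads handed to the pool being created
};

// Mirrors the arithmetic in the hunk: the lookahead request is capped at half
// of the threads available on the node. A reserved (lookahead-only) pool gets
// exactly that many threads; a regular pool gets whatever is left over.
inline PoolSplit splitThreads(int nodeThreads, int requestedLookahead, bool isThreadsReserved)
{
    int lookahead = std::min(requestedLookahead, nodeThreads / 2);
    int pool = isThreadsReserved ? lookahead : nodeThreads - lookahead;
    return { lookahead, pool };
}
```

For example, with 16 threads on a node and 4 requested for lookahead, the regular pool is created with 12 threads and the reserved pool with 4. The patch also subtracts the original, unsplit count from `threadsPerPool[node]`, which the sketch leaves out.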

 

x265_2.2.tar.gz/source/common/threadpool.h -> x265_2.3.tar.gz/source/common/threadpool.h Changed

 
@@ -102,9 +102,7 @@
     void setThreadNodeAffinity(void *numaMask);
     int  tryAcquireSleepingThread(sleepbitmap_t firstTryBitmap, sleepbitmap_t secondTryBitmap);
     int  tryBondPeers(int maxPeers, sleepbitmap_t peerBitmap, BondedTaskGroup& master);
-
-    static ThreadPool* allocThreadPools(x265_param* p, int& numPools);
-
+    static ThreadPool* allocThreadPools(x265_param* p, int& numPools, bool isThreadsReserved);
     static int  getCpuCount();
     static int  getNumaNodeCount();
 };

x265_2.2.tar.gz/source/encoder/analysis.cpp -> x265_2.3.tar.gz/source/encoder/analysis.cpp Changed

@@ -76,6 +76,7 @@
     m_reuseRef = NULL;
     m_bHD = false;
 }
+
 bool Analysis::create(ThreadLocalData *tld)
 {
     m_tld = tld;
@@ -142,9 +143,30 @@
     ctu.setQPSubParts((int8_t)qp, 0, 0);
 
     m_rqt[0].cur.load(initialContext);
+    ctu.m_meanQP = initialContext.m_meanQP;
     m_modeDepth[0].fencYuv.copyFromPicYuv(*m_frame->m_fencPic, ctu.m_cuAddr, 0);
 
+    if (m_param->bSsimRd)
+        calculateNormFactor(ctu, qp);
+
     uint32_t numPartition = ctu.m_numPartitions;
+    if (m_param->analysisMultiPassRefine && m_param->rc.bStatRead)
+    {
+        m_multipassAnalysis = (analysis2PassFrameData*)m_frame->m_analysis2Pass.analysisFramedata;
+        m_multipassDepth = &m_multipassAnalysis->depth[ctu.m_cuAddr * ctu.m_numPartitions];
+        if (m_slice->m_sliceType != I_SLICE)
+        {
+            int numPredDir = m_slice->isInterP() ? 1 : 2;
+            for (int dir = 0; dir < numPredDir; dir++)
+            {
+                m_multipassMv[dir] = &m_multipassAnalysis->m_mv[dir][ctu.m_cuAddr * ctu.m_numPartitions];
+                m_multipassMvpIdx[dir] = &m_multipassAnalysis->mvpIdx[dir][ctu.m_cuAddr * ctu.m_numPartitions];
+                m_multipassRef[dir] = &m_multipassAnalysis->ref[dir][ctu.m_cuAddr * ctu.m_numPartitions];
+            }
+            m_multipassModes = &m_multipassAnalysis->modes[ctu.m_cuAddr * ctu.m_numPartitions];
+        }
+    }
+
     if (m_param->analysisMode && m_slice->m_sliceType != I_SLICE)
     {
         int numPredDir = m_slice->isInterP() ? 1 : 2;
@@ -197,7 +219,7 @@
             compressInterCU_rd5_6(ctu, cuGeom, qp);
     }
 
-    if (m_param->bEnableRdRefine)
+    if (m_param->bEnableRdRefine || m_param->bOptCUDeltaQP)
         qprdRefine(ctu, cuGeom, qp, qp);
 
     return *m_modeDepth[0].bestMode;
@@ -299,8 +321,13 @@
         int cuIdx = (cuGeom.childOffset - 1) / 3;
         bestCUCost = origCUCost = cacheCost[cuIdx];
 
-        for (int dir = 2; dir >= -2; dir -= 4)
+        int direction = m_param->bOptCUDeltaQP ? 1 : 2;
+
+        for (int dir = direction; dir >= -direction; dir -= (direction * 2))
         {
+            if (m_param->bOptCUDeltaQP && ((dir != 1) || ((qp + 3) >= (int32_t)parentCTU.m_meanQP)))
+                break;
+
             int threshold = 1;
             int failure = 0;
             cuPrevCost = origCUCost;
@@ -308,6 +335,9 @@
             int modCUQP = qp + dir;
             while (modCUQP >= m_param->rc.qpMin && modCUQP <= QP_MAX_SPEC)
             {
+                if (m_param->bOptCUDeltaQP && modCUQP > (int32_t)parentCTU.m_meanQP)
+                    break;
+
                 recodeCU(parentCTU, cuGeom, modCUQP, qp);
                 cuCost = md.bestMode->rdCost;
 
@@ -939,6 +969,9 @@
 
 SplitData Analysis::compressInterCU_rd0_4(const CUData& parentCTU, const CUGeom& cuGeom, int32_t qp)
 {
+    if (parentCTU.m_vbvAffected && calculateQpforCuSize(parentCTU, cuGeom, 1))
+        return compressInterCU_rd5_6(parentCTU, cuGeom, qp);
+
     uint32_t depth = cuGeom.depth;
     uint32_t cuAddr = parentCTU.m_cuAddr;
     ModeDepth& md = m_modeDepth[depth];
@@ -1006,6 +1039,22 @@
             }
         }
     }
+    if (m_param->analysisMultiPassRefine && m_param->rc.bStatRead && m_multipassAnalysis)
+    {
+        if (mightNotSplit && depth == m_multipassDepth[cuGeom.absPartIdx])
+        {
+            if (m_multipassModes[cuGeom.absPartIdx] == MODE_SKIP)
+            {
+                md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom, qp);
+                md.pred[PRED_SKIP].cu.initSubCU(parentCTU, cuGeom, qp);
+                checkMerge2Nx2N_rd0_4(md.pred[PRED_SKIP], md.pred[PRED_MERGE], cuGeom);
+
+                skipRecursion = !!m_param->bEnableRecursionSkip && md.bestMode;
+                if (m_param->rdLevel)
+                    skipModes = m_param->bEnableEarlySkip && md.bestMode;
+            }
+        }
+    }
 
     /* Step 1. Evaluate Merge/Skip candidates for likely early-outs, if skip mode was not set above */
     if (mightNotSplit && depth >= minDepth && !md.bestMode) /* TODO: Re-evaluate if analysis load/save still works */
@@ -1491,6 +1540,9 @@
 
 SplitData Analysis::compressInterCU_rd5_6(const CUData& parentCTU, const CUGeom& cuGeom, int32_t qp)
 {
+    if (parentCTU.m_vbvAffected && !calculateQpforCuSize(parentCTU, cuGeom, 1))
+        return compressInterCU_rd0_4(parentCTU, cuGeom, qp);
+
     uint32_t depth = cuGeom.depth;
     ModeDepth& md = m_modeDepth[depth];
     md.bestMode = NULL;
@@ -1553,6 +1605,28 @@
         }
     }
 
+    if (m_param->analysisMultiPassRefine && m_param->rc.bStatRead && m_multipassAnalysis)
+    {
+        if (mightNotSplit && depth == m_multipassDepth[cuGeom.absPartIdx])
+        {
+            if (m_multipassModes[cuGeom.absPartIdx] == MODE_SKIP)
+            {
+                md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom, qp);
+                md.pred[PRED_SKIP].cu.initSubCU(parentCTU, cuGeom, qp);
+                checkMerge2Nx2N_rd0_4(md.pred[PRED_SKIP], md.pred[PRED_MERGE], cuGeom);
+
+                skipModes = !!m_param->bEnableEarlySkip && md.bestMode;
+                refMasks[0] = allSplitRefs;
+                md.pred[PRED_2Nx2N].cu.initSubCU(parentCTU, cuGeom, qp);
+                checkInter_rd5_6(md.pred[PRED_2Nx2N], cuGeom, SIZE_2Nx2N, refMasks);
+                checkBestMode(md.pred[PRED_2Nx2N], cuGeom.depth);
+
+                if (m_param->bEnableRecursionSkip && depth && m_modeDepth[depth - 1].bestMode)
+                    skipRecursion = md.bestMode && !md.bestMode->cu.getQtRootCbf(0);
+            }
+        }
+    }
+
     /* Step 1. Evaluate Merge/Skip candidates for likely early-outs */
     if (mightNotSplit && !md.bestMode)
     {
@@ -2301,6 +2375,21 @@
                 bestME[i].ref = m_reuseRef[refOffset + index++];
         }
     }
+
+    if (m_param->analysisMultiPassRefine && m_param->rc.bStatRead && m_multipassAnalysis)
+    {
+        uint32_t numPU = interMode.cu.getNumPartInter(0);
+        for (uint32_t part = 0; part < numPU; part++)
+        {
+            MotionData* bestME = interMode.bestME[part];
+            for (int32_t i = 0; i < numPredDir; i++)
+            {
+                bestME[i].ref = m_multipassRef[i][cuGeom.absPartIdx];
+                bestME[i].mv = m_multipassMv[i][cuGeom.absPartIdx];
+                bestME[i].mvpIdx = m_multipassMvpIdx[i][cuGeom.absPartIdx];
+            }
+        }
+    }
     predInterSearch(interMode, cuGeom, m_bChromaSa8d && (m_csp != X265_CSP_I400 && m_frame->m_fencPic->m_picCsp != X265_CSP_I400), refMask);
 
     /* predInterSearch sets interMode.sa8dBits */
@@ -2350,6 +2439,22 @@
                 bestME[i].ref = m_reuseRef[refOffset + index++];
         }
     }
+
+    if (m_param->analysisMultiPassRefine && m_param->rc.bStatRead && m_multipassAnalysis)
+    {
+        uint32_t numPU = interMode.cu.getNumPartInter(0);
+        for (uint32_t part = 0; part < numPU; part++)
+        {
+            MotionData* bestME = interMode.bestME[part];
+            for (int32_t i = 0; i < numPredDir; i++)
+            {
+                bestME[i].ref = m_multipassRef[i][cuGeom.absPartIdx];
+                bestME[i].mv = m_multipassMv[i][cuGeom.absPartIdx];
+                bestME[i].mvpIdx = m_multipassMvpIdx[i][cuGeom.absPartIdx];
+            }
+        }
+    }
+
     predInterSearch(interMode, cuGeom, m_csp != X265_CSP_I400 && m_frame->m_fencPic->m_picCsp != X265_CSP_I400, refMask);
 
     /* predInterSearch sets interMode.sa8dBits, but this is ignored */
@@ -2775,7 +2880,7 @@
     return false;
 }
 
-int Analysis::calculateQpforCuSize(const CUData& ctu, const CUGeom& cuGeom, double baseQp)
+int Analysis::calculateQpforCuSize(const CUData& ctu, const CUGeom& cuGeom, int32_t complexCheck, double baseQp)
 {
     FrameData& curEncData = *m_frame->m_encData;
     double qp = baseQp >= 0 ? baseQp : curEncData.m_cuStat[ctu.m_cuAddr].baseQp;
@@ -2786,7 +2891,11 @@
         loopIncr = 16;
     /* Use cuTree offsets if cuTree enabled and frame is referenced, else use AQ offsets */
     bool isReferenced = IS_REFERENCED(m_frame);
-    double *qpoffs = (isReferenced && m_param->rc.cuTree) ? m_frame->m_lowres.qpCuTreeOffset : m_frame->m_lowres.qpAqOffset;
+    double *qpoffs;
+    if (complexCheck)
+        qpoffs = m_frame->m_lowres.qpAqOffset;
+    else
+        qpoffs = (isReferenced && m_param->rc.cuTree) ? m_frame->m_lowres.qpCuTreeOffset : m_frame->m_lowres.qpAqOffset;
     if (qpoffs)
     {
         uint32_t width = m_frame->m_fencPic->m_picWidth;
@@ -2811,7 +2920,80 @@
 
         qp_offset /= cnt;
         qp += qp_offset;
+        if (complexCheck)
+        {
+            int32_t offset = (int32_t)(qp_offset * 100 + .5);
+            double threshold = (1 - ((x265_ADAPT_RD_STRENGTH - m_param->dynamicRd) * 0.5));
+            int32_t max_threshold = (int32_t)(threshold * 100 + .5);
+            if (offset < max_threshold)
+                return 1;
+            else
+                return 0;
+        }
     }
 
     return x265_clip3(m_param->rc.qpMin, m_param->rc.qpMax, (int)(qp + 0.5));
 }
+
+void Analysis::normFactor(const pixel* src, uint32_t blockSize, CUData& ctu, int qp, TextType ttype)
+{
+    static const int ssim_c1 = (int)(.01 * .01 * PIXEL_MAX * PIXEL_MAX * 64 + .5); // 416
+    static const int ssim_c2 = (int)(.03 * .03 * PIXEL_MAX * PIXEL_MAX * 64 * 63 + .5); // 235963
+    int shift = (X265_DEPTH - 8);
+
+    double s = 1 + 0.005 * qp;
+
+    // Calculate denominator of normalization factor
+    uint64_t fDc_den = 0, fAc_den = 0;
+
+    // 1. Calculate dc component
+    uint64_t z_o = 0;
+    for (uint32_t block_yy = 0; block_yy < blockSize; block_yy += 4)
+    {
+        for (uint32_t block_xx = 0; block_xx < blockSize; block_xx += 4)
+        {
+            uint32_t temp = src[block_yy * blockSize + block_xx] >> shift;
+            z_o += temp * temp; // 2 * (Z(0)) pow(2)
+        }
+    }
+    fDc_den = (2 * z_o)  + (blockSize * blockSize * ssim_c1); // 2 * (Z(0)) pow(2) + N * C1
+    fDc_den /= ((blockSize >> 2) * (blockSize >> 2));
+
+    // 2. Calculate ac component
+    uint64_t z_k = 0;
+    for (uint32_t block_yy = 0; block_yy < blockSize; block_yy += 1)
+    {
+        for (uint32_t block_xx = 0; block_xx < blockSize; block_xx += 1)
+        {
+            uint32_t temp = src[block_yy * blockSize + block_xx] >> shift;
+            z_k += temp * temp;
+        }
+    }
+
+    // Remove the DC part
+    z_k -= z_o;
+
+    fAc_den = z_k + int(s * z_k) + ssim_c2;
+    fAc_den /= ((blockSize >> 2) * (blockSize >> 2));
+
+    ctu.m_fAc_den[ttype] = fAc_den;
+    ctu.m_fDc_den[ttype] = fDc_den;
+}
+
+void Analysis::calculateNormFactor(CUData& ctu, int qp)
+{
+    const pixel* srcY = m_modeDepth[0].fencYuv.m_buf[0];
+    uint32_t blockSize = m_modeDepth[0].fencYuv.m_size;
+
+    normFactor(srcY, blockSize, ctu, qp, TEXT_LUMA);
+
+    if (m_csp != X265_CSP_I400 && m_frame->m_fencPic->m_picCsp != X265_CSP_I400)
+    {
+        const pixel* srcU = m_modeDepth[0].fencYuv.m_buf[1];
+        const pixel* srcV = m_modeDepth[0].fencYuv.m_buf[2];
+        uint32_t blockSizeC = m_modeDepth[0].fencYuv.m_csize;
+
+        normFactor(srcU, blockSizeC, ctu, qp, TEXT_CHROMA_U);
+        normFactor(srcV, blockSizeC, ctu, qp, TEXT_CHROMA_V);
+    }
+}
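
The new `normFactor` routine computes two denominators per plane for the SSIM-based RD cost. The standalone sketch below follows the same steps for 8-bit samples only (so the bit-depth shift is zero and the two constants take the values given in the patch comments, 416 and 235963). `NormDenominators` and `normDenominators` are hypothetical names, and the sketch widens the scaled AC term to 64 bits where the patch uses `int`. It assumes `blockSize` is at least 4 and that `src` holds `blockSize * blockSize` samples.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical container for the two results; not an x265 type.
struct NormDenominators
{
    uint64_t dc;
    uint64_t ac;
};

// Sketch for 8-bit input: src is a blockSize x blockSize plane in raster order.
inline NormDenominators normDenominators(const std::vector<uint8_t>& src, uint32_t blockSize, int qp)
{
    const uint64_t c1 = 416;     // value noted in the patch comment for ssim_c1
    const uint64_t c2 = 235963;  // value noted in the patch comment for ssim_c2
    const double s = 1 + 0.005 * qp;
    const uint64_t groups = uint64_t(blockSize >> 2) * (blockSize >> 2);

    // DC part: squared value of the top-left sample of every 4x4 group.
    uint64_t zDc = 0;
    for (uint32_t y = 0; y < blockSize; y += 4)
        for (uint32_t x = 0; x < blockSize; x += 4)
        {
            uint64_t v = src[y * blockSize + x];
            zDc += v * v;
        }

    // Sum of squares over every sample, then remove the DC part.
    uint64_t zAll = 0;
    for (uint32_t y = 0; y < blockSize; y++)
        for (uint32_t x = 0; x < blockSize; x++)
        {
            uint64_t v = src[y * blockSize + x];
            zAll += v * v;
        }
    const uint64_t zAc = zAll - zDc;

    NormDenominators out;
    out.dc = (2 * zDc + uint64_t(blockSize) * blockSize * c1) / groups;
    out.ac = (zAc + uint64_t(s * zAc) + c2) / groups;
    return out;
}
```

In the patch the results are stored per plane in `ctu.m_fDc_den[ttype]` and `ctu.m_fAc_den[ttype]`, and the routine is called for luma and, unless the picture is 4:0:0, for both chroma planes.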

 

x265_2.2.tar.gz/source/encoder/analysis.h -> x265_2.3.tar.gz/source/encoder/analysis.h Changed

@@ -130,6 +130,13 @@
     uint32_t             m_splitRefIdx[4];
     uint64_t*            cacheCost;
 
+
+    analysis2PassFrameData* m_multipassAnalysis;
+    uint8_t*                m_multipassDepth;
+    MV*                     m_multipassMv[2];
+    int*                    m_multipassMvpIdx[2];
+    int32_t*                m_multipassRef[2];
+    uint8_t*                m_multipassModes;
     /* refine RD based on QP for rd-levels 5 and 6 */
     void qprdRefine(const CUData& parentCTU, const CUGeom& cuGeom, int32_t qp, int32_t lqp);
 
@@ -167,8 +174,10 @@
     /* generate residual and recon pixels for an entire CTU recursively (RD0) */
     void encodeResidue(const CUData& parentCTU, const CUGeom& cuGeom);
 
-    int calculateQpforCuSize(const CUData& ctu, const CUGeom& cuGeom, double baseQP = -1);
+    int calculateQpforCuSize(const CUData& ctu, const CUGeom& cuGeom, int32_t complexCheck = 0, double baseQP = -1);
 
+    void calculateNormFactor(CUData& ctu, int qp);
+    void normFactor(const pixel* src, uint32_t blockSize, CUData& ctu, int qp, TextType ttype);
     /* check whether current mode is the new best */
     inline void checkBestMode(Mode& mode, uint32_t depth)
     {
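
The `complexCheck` argument added to `calculateQpforCuSize` turns the function into a yes/no test that `compressInterCU_rd0_4` and `compressInterCU_rd5_6` use to hand a CU over to each other when the CTU is affected by VBV. The sketch below isolates that threshold comparison. The function name `needsHigherRd` is hypothetical, and `maxStrength` stands in for the `x265_ADAPT_RD_STRENGTH` constant, whose value is not shown in this diff; the value 4 mentioned in the usage note is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when the averaged AQ offset of the CU falls below the threshold
// derived from the dynamic-rd strength, using the same fixed-point rounding as
// the hunk (scale by 100, add 0.5, truncate toward zero).
// maxStrength plays the role of x265_ADAPT_RD_STRENGTH (assumed parameter).
inline bool needsHigherRd(double qpOffset, double dynamicRd, double maxStrength)
{
    int32_t offset = (int32_t)(qpOffset * 100 + .5);
    double threshold = 1 - ((maxStrength - dynamicRd) * 0.5);
    int32_t maxThreshold = (int32_t)(threshold * 100 + .5);
    return offset < maxThreshold;
}
```

Assuming a maximum strength of 4, a dynamic-rd setting of 4 gives a threshold of 1.0, so any CU with an average offset below 1.0 qualifies; lowering the setting moves the threshold down by 0.5 per step, so fewer CUs qualify.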

 

x265_2.2.tar.gz/source/encoder/api.cpp -> x265_2.3.tar.gz/source/encoder/api.cpp Changed

@@ -183,6 +183,20 @@
     }
     else
     {
+        if (encoder->m_latestParam->scalingLists && encoder->m_latestParam->scalingLists != encoder->m_param->scalingLists)
+        {
+            if (encoder->m_param->bRepeatHeaders)
+            {
+                if (encoder->m_scalingList.parseScalingList(encoder->m_latestParam->scalingLists))
+                    return -1;
+                encoder->m_scalingList.setupQuantMatrices(encoder->m_param->internalCsp);
+            }
+            else
+            {
+                x265_log(encoder->m_param, X265_LOG_ERROR, "Repeat headers is turned OFF, cannot reconfigure scalinglists\n");
+                return -1;
+            }
+        }
         encoder->m_reconfigure = true;
         encoder->printReconfigureParams();
     }
@@ -210,6 +224,7 @@
     {
         pic_in->analysisData.intraData = NULL;
         pic_in->analysisData.interData = NULL;
+        pic_in->analysis2Pass.analysisFramedata = NULL;
     }
 
     if (pp_nal && numEncoded > 0)
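
The reconfigure hunk above accepts a new scaling list only when repeat headers are enabled, and otherwise fails with -1. A minimal sketch of that decision follows, assuming a hypothetical helper name and result enum that do not exist in x265. Like the diff, it compares the two pointers rather than the string contents.

```cpp
// Hypothetical result type; x265 itself just returns 0 or -1.
enum class ScalingListReconfig { Unchanged, Applied, Rejected };

// Sketch of the guard added to the reconfigure path.
// current / latest stand in for m_param->scalingLists and
// m_latestParam->scalingLists; repeatHeaders for bRepeatHeaders.
ScalingListReconfig checkScalingListReconfig(const char* current,
                                             const char* latest,
                                             bool repeatHeaders)
{
    // No new list, or the very same pointer: nothing to do.
    if (!latest || latest == current)
        return ScalingListReconfig::Unchanged;

    // Without repeated headers the new list cannot be applied,
    // so the request is refused.
    if (!repeatHeaders)
        return ScalingListReconfig::Rejected;

    // The real code would parse the list and rebuild the quant matrices here.
    return ScalingListReconfig::Applied;
}
```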

 

x265_2.2.tar.gz/source/encoder/encoder.cpp -> x265_2.3.tar.gz/source/encoder/encoder.cpp Changed

@@ -73,10 +73,11 @@
     m_latestParam = NULL;
     m_threadPool = NULL;
     m_analysisFile = NULL;
+    m_analysisFileIn = NULL;
+    m_analysisFileOut = NULL;
     m_offsetEmergency = NULL;
     m_iFrameNum = 0;
     m_iPPSQpMinus26 = 0;
-    m_iLastSliceQp = 0;
     m_rpsInSpsCount = 0;
     for (int i = 0; i < X265_MAX_FRAME_THREADS; i++)
         m_frameEncoder[i] = NULL;
@@ -84,6 +85,19 @@
     MotionEstimate::initScales();
 }
 
+inline char *strcatFilename(const char *input, const char *suffix)
+{
+    char *output = X265_MALLOC(char, strlen(input) + strlen(suffix) + 1);
+    if (!output)
+    {
+        x265_log(NULL, X265_LOG_ERROR, "unable to allocate memory for filename\n");
+        return NULL;
+    }
+    strcpy(output, input);
+    strcat(output, suffix);
+    return output;
+}
+
 void Encoder::create()
 {
     if (!primitives.pu[0].sad)
@@ -128,11 +142,9 @@
         else
             p->frameNumThreads = 1;
     }
-
     m_numPools = 0;
     if (allowPools)
-        m_threadPool = ThreadPool::allocThreadPools(p, m_numPools);
-
+        m_threadPool = ThreadPool::allocThreadPools(p, m_numPools, 0);
     if (!m_numPools)
     {
         // issue warnings if any of these features were requested
@@ -201,17 +213,26 @@
         m_scalingList.setDefaultScalingList();
     else if (m_scalingList.parseScalingList(m_param->scalingLists))
         m_aborted = true;
-
-    m_lookahead = new Lookahead(m_param, m_threadPool);
-    if (m_numPools)
+    int pools = m_numPools;
+    ThreadPool* lookAheadThreadPool = 0;
+    if (m_param->lookaheadThreads > 0)
     {
-        m_lookahead->m_jpId = m_threadPool[0].m_numProviders++;
-        m_threadPool[0].m_jpTable[m_lookahead->m_jpId] = m_lookahead;
+        lookAheadThreadPool = ThreadPool::allocThreadPools(p, pools, 1);
     }
-
+    else
+        lookAheadThreadPool = m_threadPool;
+    m_lookahead = new Lookahead(m_param, lookAheadThreadPool);
+    if (pools)
+    {
+        m_lookahead->m_jpId = lookAheadThreadPool[0].m_numProviders++;
+        lookAheadThreadPool[0].m_jpTable[m_lookahead->m_jpId] = m_lookahead;
+    }
+    if (m_param->lookaheadThreads > 0)
+        for (int i = 0; i < pools; i++)
+            lookAheadThreadPool[i].start();
+    m_lookahead->m_numPools = pools;
     m_dpb = new DPB(m_param);
     m_rateControl = new RateControl(*m_param);
-
     initVPS(&m_vps);
     initSPS(&m_sps);
     initPPS(&m_pps);
@@ -230,10 +251,10 @@
         if (!scalingEnabled)
         {
             m_scalingList.setDefaultScalingList();
-            m_scalingList.setupQuantMatrices();
+            m_scalingList.setupQuantMatrices(m_sps.chromaFormatIdc);
         }
         else
-            m_scalingList.setupQuantMatrices();
+            m_scalingList.setupQuantMatrices(m_sps.chromaFormatIdc);
 
         for (int q = 0; q < QP_MAX_MAX - QP_MAX_SPEC; q++)
         {
@@ -286,11 +307,11 @@
         {
             m_scalingList.m_bEnabled = false;
             m_scalingList.m_bDataPresent = false;
-            m_scalingList.setupQuantMatrices();
+            m_scalingList.setupQuantMatrices(m_sps.chromaFormatIdc);
         }
     }
     else
-        m_scalingList.setupQuantMatrices();
+        m_scalingList.setupQuantMatrices(m_sps.chromaFormatIdc);
 
     int numRows = (m_param->sourceHeight + g_maxCUSize - 1) / g_maxCUSize;
     int numCols = (m_param->sourceWidth  + g_maxCUSize - 1) / g_maxCUSize;
@@ -332,6 +353,38 @@
         }
     }
 
+    if (m_param->analysisMultiPassRefine || m_param->analysisMultiPassDistortion)
+    {
+        const char* name = m_param->analysisFileName;
+        if (!name)
+            name = defaultAnalysisFileName;
+        if (m_param->rc.bStatWrite)
+        {
+            char* temp = strcatFilename(name, ".temp");
+            if (!temp)
+                m_aborted = true;
+            else
+            {
+                m_analysisFileOut = fopen(temp, "wb");
+                X265_FREE(temp);
+            }
+            if (!m_analysisFileOut)
+            {
+                x265_log(NULL, X265_LOG_ERROR, "Analysis 2 pass: failed to open file %s\n", temp);
+                m_aborted = true;
+            }
+        }
+        if (m_param->rc.bStatRead)
+        {
+            m_analysisFileIn = fopen(name, "rb");
+            if (!m_analysisFileIn)
+            {
+                x265_log(NULL, X265_LOG_ERROR, "Analysis 2 pass: failed to open file %s\n", name);
+                m_aborted = true;
+            }
+        }
+    }
+
     m_bZeroLatency = !m_param->bframes && !m_param->lookaheadDepth && m_param->frameNumThreads == 1;
 
     m_aborted |= parseLambdaFile(m_param);
@@ -408,6 +461,35 @@
     if (m_analysisFile)
         fclose(m_analysisFile);
 
+    if (m_latestParam != NULL && m_latestParam != m_param)
+    {
+        if (m_latestParam->scalingLists != m_param->scalingLists)
+            free((char*)m_latestParam->scalingLists);
+
+        PARAM_NS::x265_param_free(m_latestParam);
+    }
+    if (m_analysisFileIn)
+        fclose(m_analysisFileIn);
+
+    if (m_analysisFileOut)
+    {
+        int bError = 1;
+        fclose(m_analysisFileOut);
+        const char* name = m_param->analysisFileName;
+        if (!name)
+            name = defaultAnalysisFileName;
+        char* temp = strcatFilename(name, ".temp");
+        if (temp)
+        {
+            x265_unlink(name);
+            bError = x265_rename(temp, name);
+        }
+        if (bError)
+        {
+            x265_log(m_param, X265_LOG_ERROR, "failed to rename analysis stats file to \"%s\"\n", name);
+        }
+        X265_FREE(temp);
+     }
     if (m_param)
     {
         /* release string arguments that were strdup'd */
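
The two hunks above write the multi-pass analysis data to `<name>.temp` and, on teardown, unlink the old file and rename the temporary one into place. A minimal standalone sketch of that pattern follows, using only the C standard library. The function name and the string payload are assumptions for illustration, not x265 code. The sketch also keeps the temporary file name alive until every error path has finished with it. In the open-failure branch of the hunk above, the name appears to be logged after it has been freed.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: write payload to "<name>.temp", then replace <name>.
// Returns true only if the final file is in place.
bool writeViaTempFile(const std::string& name, const std::string& payload)
{
    const std::string temp = name + ".temp";

    std::FILE* out = std::fopen(temp.c_str(), "wb");
    if (!out)
    {
        // temp is still valid here, so it is safe to print.
        std::fprintf(stderr, "failed to open file %s\n", temp.c_str());
        return false;
    }

    const size_t written = std::fwrite(payload.data(), 1, payload.size(), out);
    const bool closed = std::fclose(out) == 0;
    if (written != payload.size() || !closed)
    {
        std::remove(temp.c_str());
        return false;
    }

    // Mirror the unlink-then-rename sequence from the diff; the result of
    // remove() is ignored because the target may not exist yet.
    std::remove(name.c_str());
    if (std::rename(temp.c_str(), name.c_str()) != 0)
    {
        std::fprintf(stderr, "failed to rename %s to %s\n", temp.c_str(), name.c_str());
        return false;
    }
    return true;
}
```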
@@ -420,8 +502,6 @@
 
         PARAM_NS::x265_param_free(m_param);
     }
-
-    PARAM_NS::x265_param_free(m_latestParam);
 }
 
 void Encoder::updateVbvPlan(RateControl* rc)
@@ -524,6 +604,7 @@
         if (m_dpb->m_freeList.empty())
         {
             inFrame = new Frame;
+            inFrame->m_encodeStartTime = x265_mdate();
             x265_param* p = m_reconfigure ? m_latestParam : m_param;
             if (inFrame->create(p, pic_in->quantOffsets))
             {
@@ -576,6 +657,7 @@
         else
         {
             inFrame = m_dpb->m_freeList.popBack();
+            inFrame->m_encodeStartTime = x265_mdate();
             /* Set lowres scencut and satdCost here to aovid overwriting ANALYSIS_READ
                decision by lowres init*/
             inFrame->m_lowres.bScenecut = false;
@@ -739,6 +821,17 @@
                     freeAnalysis(&pic_out->analysisData);
                 }
             }
+            if (m_param->rc.bStatWrite && (m_param->analysisMultiPassRefine || m_param->analysisMultiPassDistortion))
+            {
+                if (pic_out)
+                {
+                    pic_out->analysis2Pass.poc = pic_out->poc;
+                    pic_out->analysis2Pass.analysisFramedata = outFrame->m_analysis2Pass.analysisFramedata;
+                }
+                writeAnalysis2PassFile(&outFrame->m_analysis2Pass, *outFrame->m_encData, outFrame->m_lowres.sliceType);
+            }
+            if (m_param->analysisMultiPassRefine || m_param->analysisMultiPassDistortion)
+                freeAnalysis2Pass(&outFrame->m_analysis2Pass, outFrame->m_lowres.sliceType);
             if (m_param->internalCsp == X265_CSP_I400)
             {
                 if (slice->m_sliceType == P_SLICE)
@@ -836,6 +929,13 @@
             frameEnc = m_lookahead->getDecidedPicture();
         if (frameEnc && !pass)
         {
+            if (m_param->analysisMultiPassRefine || m_param->analysisMultiPassDistortion)
+            {
+                allocAnalysis2Pass(&frameEnc->m_analysis2Pass, frameEnc->m_lowres.sliceType);
+                frameEnc->m_analysis2Pass.poc = frameEnc->m_poc;
+                if (m_param->rc.bStatRead)
+                    readAnalysis2PassFile(&frameEnc->m_analysis2Pass, frameEnc->m_poc, frameEnc->m_lowres.sliceType);
+             }
             if (curEncoder->m_reconfigure)
             {
                 /* One round robin cycle of FE reconfigure is complete */
@@ -904,10 +1004,9 @@
                             iLeastCost = m_iBitsCostSum[i];
                         }
                     }
-
                     /* If last slice Qp is close to (26 + m_iPPSQpMinus26) or outputs is all I-frame video,
                        we don't need to change m_iPPSQpMinus26. */
-                    if ((abs(m_iLastSliceQp - (26 + m_iPPSQpMinus26)) > 1) && (m_iFrameNum > 1))
+                    if (m_iFrameNum > 1)
                         m_iPPSQpMinus26 = (iLeastId + 1) - 26;
                     m_iFrameNum = 0;
                 }
@@ -947,7 +1046,6 @@
                 analysis->numPartitions  = NUM_4x4_PARTITIONS;
                 allocAnalysis(analysis);
             }
-
             /* determine references, setup RPS, etc */
             m_dpb->prepareEncode(frameEnc);
 
@@ -986,6 +1084,8 @@
     encParam->bEnableRectInter = param->bEnableRectInter;
     encParam->maxNumMergeCand = param->maxNumMergeCand;
     encParam->bIntraInBFrames = param->bIntraInBFrames;
+    if (param->scalingLists && !encParam->scalingLists)
+        encParam->scalingLists = strdup(param->scalingLists);
     /* To add: Loop Filter/deblocking controls, transform skip, signhide require PPS to be resent */
     /* To add: SAO, temporal MVP, AMP, TU depths require SPS to be resent, at every CVS boundary */
     return x265_check_params(encParam);
@@ -1442,6 +1542,7 @@
         frameStats->refWaitWallTime = ELAPSED_MSEC(curEncoder->m_row0WaitTime, curEncoder->m_allRowsAvailableTime);
         frameStats->totalCTUTime = ELAPSED_MSEC(0, curEncoder->m_totalWorkerElapsedTime);
         frameStats->stallTime = ELAPSED_MSEC(0, curEncoder->m_totalNoWorkerTime);
+        frameStats->totalFrameTime = ELAPSED_MSEC(curFrame->m_encodeStartTime, x265_mdate());
         if (curEncoder->m_totalActiveWorkerCount)
             frameStats->avgWPP = (double)curEncoder->m_totalActiveWorkerCount / curEncoder->m_activeWorkerCountSamples;
         else
@@ -1553,21 +1654,7 @@
     bs.writeByteAlignment();
     list.serialize(NAL_UNIT_PPS, bs);
 
-    if (m_param->masteringDisplayColorVolume)
-    {
-        SEIMasteringDisplayColorVolume mdsei;
-        if (mdsei.parse(m_param->masteringDisplayColorVolume))
-        {
-            bs.resetBits();
-            mdsei.write(bs, m_sps);
-            bs.writeByteAlignment();
-            list.serialize(NAL_UNIT_PREFIX_SEI, bs);
-        }
-        else
-            x265_log(m_param, X265_LOG_WARNING, "unable to parse mastering display color volume info\n");
-    }
-
-    if (m_emitCLLSEI)
+    if (m_param->bEmitHDRSEI)
     {
         SEIContentLightLevel cllsei;
         cllsei.max_content_light_level = m_param->maxCLL;
@@ -1576,6 +1663,20 @@
         cllsei.write(bs, m_sps);
         bs.writeByteAlignment();
         list.serialize(NAL_UNIT_PREFIX_SEI, bs);
+
+        if (m_param->masteringDisplayColorVolume)
+        {
+            SEIMasteringDisplayColorVolume mdsei;
+            if (mdsei.parse(m_param->masteringDisplayColorVolume))
+            {
+                bs.resetBits();
+                mdsei.write(bs, m_sps);
+                bs.writeByteAlignment();
+                list.serialize(NAL_UNIT_PREFIX_SEI, bs);
+            }
+            else
+                x265_log(m_param, X265_LOG_WARNING, "unable to parse mastering display color volume info\n");
+        }
     }
 
     if (m_param->bEmitInfoSEI)
@@ -1714,7 +1815,7 @@
 {
     bool bIsVbv = m_param->rc.vbvBufferSize > 0 && m_param->rc.vbvMaxBitrate > 0;
 
-    if (!m_param->bLossless && (m_param->rc.aqMode || bIsVbv))
+    if (!m_param->bLossless && (m_param->rc.aqMode || bIsVbv || m_param->bAQMotion))
     {
         pps->bUseDQP = true;
         pps->maxCuDQPDepth = g_log2Size[m_param->maxCUSize] - g_log2Size[m_param->rc.qgSize];
@@ -1894,12 +1995,6 @@
     }
 
 
-    if (p->scalingLists && p->internalCsp == X265_CSP_I444)
-    {
-        x265_log(p, X265_LOG_WARNING, "Scaling lists are not yet supported for 4:4:4 chroma subsampling\n");
-        p->scalingLists = 0;
-    }
-
     if (p->interlaceMode)
         x265_log(p, X265_LOG_WARNING, "Support for interlaced video is experimental\n");
 
@@ -1921,6 +2016,19 @@
         p->rc.cuTree = 0;
     }
 
+    if (p->analysisMode && (p->analysisMultiPassRefine || p->analysisMultiPassDistortion))
+    {
+        x265_log(p, X265_LOG_WARNING, "Cannot use Analysis load/save option and multi-pass-opt-analysis/multi-pass-opt-distortion together,"
+            "Disabling Analysis load/save and multi-pass-opt-analysis/multi-pass-opt-distortion\n");
+        p->analysisMode = p->analysisMultiPassRefine = p->analysisMultiPassDistortion = 0;
+    }
+
+    if ((p->analysisMultiPassRefine || p->analysisMultiPassDistortion) && (p->bDistributeModeAnalysis || p->bDistributeMotionEstimation))
+    {
+        x265_log(p, X265_LOG_WARNING, "multi-pass-opt-analysis/multi-pass-opt-distortion incompatible with pmode/pme, Disabling pmode/pme\n");
+        p->bDistributeMotionEstimation = p->bDistributeModeAnalysis = 0;
+    }
+
     if (p->rc.bEnableGrain)
     {
         x265_log(p, X265_LOG_WARNING, "Rc Grain removes qp fluctuations caused by aq/cutree, Disabling aq,cu-tree\n");
@@ -1981,6 +2089,18 @@
         p->bDistributeModeAnalysis = 0;
     }
 
+    if (!p->rc.bStatWrite && !p->rc.bStatRead && (p->analysisMultiPassRefine || p->analysisMultiPassDistortion))
+    {
+        x265_log(p, X265_LOG_WARNING, "analysis-multi-pass/distortion is enabled only when rc multi pass is enabled. Disabling multi-pass-opt-analysis and multi-pass-opt-distortion");
+        p->analysisMultiPassRefine = 0;
+        p->analysisMultiPassDistortion = 0;
+    }
+    if (p->analysisMultiPassRefine && p->rc.bStatWrite && p->rc.bStatRead)
+    {
+        x265_log(p, X265_LOG_WARNING, "--multi-pass-opt-analysis doesn't support refining analysis through multiple-passes; it only reuses analysis from the second-to-last pass to the last pass.Disabling reading\n");
+        p->rc.bStatRead = 0;
+    }
+
     /* some options make no sense if others are disabled */
     p->bSaoNonDeblocked &= p->bEnableSAO;
     p->bEnableTSkipFast &= p->bEnableTransformSkip;
@@ -2009,13 +2129,19 @@
         x265_log(p, X265_LOG_WARNING, "--rd-refine disabled, requires RD level > 4 and adaptive quant\n");
     }
 
+    if (p->bOptCUDeltaQP && p->rdLevel < 5)
+    {
+        p->bOptCUDeltaQP = false;
+        x265_log(p, X265_LOG_WARNING, "--opt-cu-delta-qp disabled, requires RD level > 4\n");
+    }
+
     if (p->limitTU && p->tuQTMaxInterDepth < 2)
     {
         p->limitTU = 0;
         x265_log(p, X265_LOG_WARNING, "limit-tu disabled, requires tu-inter-depth > 1\n");
     }
     bool bIsVbv = m_param->rc.vbvBufferSize > 0 && m_param->rc.vbvMaxBitrate > 0;
-    if (!m_param->bLossless && (m_param->rc.aqMode || bIsVbv))
+    if (!m_param->bLossless && (m_param->rc.aqMode || bIsVbv || m_param->bAQMotion))
     {
         if (p->rc.qgSize < X265_MAX(8, p->minCUSize))
         {
@@ -2031,6 +2157,12 @@
     else
         m_param->rc.qgSize = p->maxCUSize;
 
+    if (m_param->dynamicRd && (!bIsVbv || !p->rc.aqMode || p->rdLevel > 4))
+    {
+        p->dynamicRd = 0;
+        x265_log(p, X265_LOG_WARNING, "Dynamic-rd disabled, requires RD <= 4, VBV and aq-mode enabled\n");
+    }
+
     if (p->uhdBluray)
     {
         p->bEnableAccessUnitDelimiters = 1;
@@ -2220,6 +2352,71 @@
     }
 }
 
+void Encoder::allocAnalysis2Pass(x265_analysis_2Pass* analysis, int sliceType)
+{
+    analysis->analysisFramedata = NULL;
+    analysis2PassFrameData *analysisFrameData = (analysis2PassFrameData*)analysis->analysisFramedata;
+    uint32_t widthInCU = (m_param->sourceWidth + g_maxCUSize - 1) >> g_maxLog2CUSize;
+    uint32_t heightInCU = (m_param->sourceHeight + g_maxCUSize - 1) >> g_maxLog2CUSize;
+
+    uint32_t numCUsInFrame = widthInCU * heightInCU;
+    CHECKED_MALLOC_ZERO(analysisFrameData, analysis2PassFrameData, 1);
+    CHECKED_MALLOC_ZERO(analysisFrameData->depth, uint8_t, NUM_4x4_PARTITIONS * numCUsInFrame);
+    CHECKED_MALLOC_ZERO(analysisFrameData->distortion, sse_t, NUM_4x4_PARTITIONS * numCUsInFrame);
+    if (m_param->rc.bStatRead)
+    {
+        CHECKED_MALLOC_ZERO(analysisFrameData->ctuDistortion, sse_t, numCUsInFrame);
+        CHECKED_MALLOC_ZERO(analysisFrameData->scaledDistortion, double, numCUsInFrame);
+        CHECKED_MALLOC_ZERO(analysisFrameData->offset, double, numCUsInFrame);
+        CHECKED_MALLOC_ZERO(analysisFrameData->threshold, double, numCUsInFrame);
+    }
+    if (!IS_X265_TYPE_I(sliceType))
+    {
+        CHECKED_MALLOC_ZERO(analysisFrameData->m_mv[0], MV, NUM_4x4_PARTITIONS * numCUsInFrame);
+        CHECKED_MALLOC_ZERO(analysisFrameData->m_mv[1], MV, NUM_4x4_PARTITIONS * numCUsInFrame);
+        CHECKED_MALLOC_ZERO(analysisFrameData->mvpIdx[0], int, NUM_4x4_PARTITIONS * numCUsInFrame);
+        CHECKED_MALLOC_ZERO(analysisFrameData->mvpIdx[1], int, NUM_4x4_PARTITIONS * numCUsInFrame);
+        CHECKED_MALLOC_ZERO(analysisFrameData->ref[0], int32_t, NUM_4x4_PARTITIONS * numCUsInFrame);
+        CHECKED_MALLOC_ZERO(analysisFrameData->ref[1], int32_t, NUM_4x4_PARTITIONS * numCUsInFrame);
+        CHECKED_MALLOC(analysisFrameData->modes, uint8_t, NUM_4x4_PARTITIONS * numCUsInFrame);
+    }
+
+    analysis->analysisFramedata = analysisFrameData;
+
+    return;
+
+fail:
+    freeAnalysis2Pass(analysis, sliceType);
+    m_aborted = true;
+}
+
+void Encoder::freeAnalysis2Pass(x265_analysis_2Pass* analysis, int sliceType)
+{
+    if (analysis->analysisFramedata)
+    {
+        X265_FREE(((analysis2PassFrameData*)analysis->analysisFramedata)->depth);
+        X265_FREE(((analysis2PassFrameData*)analysis->analysisFramedata)->distortion);
+        if (m_param->rc.bStatRead)
+        {
+            X265_FREE(((analysis2PassFrameData*)analysis->analysisFramedata)->ctuDistortion);
+            X265_FREE(((analysis2PassFrameData*)analysis->analysisFramedata)->scaledDistortion);
+            X265_FREE(((analysis2PassFrameData*)analysis->analysisFramedata)->offset);
+            X265_FREE(((analysis2PassFrameData*)analysis->analysisFramedata)->threshold);
+        }
+        if (!IS_X265_TYPE_I(sliceType))
+        {
+            X265_FREE(((analysis2PassFrameData*)analysis->analysisFramedata)->m_mv[0]);
+            X265_FREE(((analysis2PassFrameData*)analysis->analysisFramedata)->m_mv[1]);
+            X265_FREE(((analysis2PassFrameData*)analysis->analysisFramedata)->mvpIdx[0]);
+            X265_FREE(((analysis2PassFrameData*)analysis->analysisFramedata)->mvpIdx[1]);
+            X265_FREE(((analysis2PassFrameData*)analysis->analysisFramedata)->ref[0]);
+            X265_FREE(((analysis2PassFrameData*)analysis->analysisFramedata)->ref[1]);
+            X265_FREE(((analysis2PassFrameData*)analysis->analysisFramedata)->modes);
+        }
+        X265_FREE(analysis->analysisFramedata);
+    }
+}
+
 void Encoder::readAnalysisFile(x265_analysis_data* analysis, int curPoc)
 {
 
@@ -2335,6 +2532,131 @@
 #undef X265_FREAD
 }
 
+void Encoder::readAnalysis2PassFile(x265_analysis_2Pass* analysis2Pass, int curPoc, int sliceType)
+{
+
+#define X265_FREAD(val, size, readSize, fileOffset)\
+    if (fread(val, size, readSize, fileOffset) != readSize)\
+    {\
+    x265_log(NULL, X265_LOG_ERROR, "Error reading analysis 2 pass data\n"); \
+    freeAnalysis2Pass(analysis2Pass, sliceType); \
+    m_aborted = true; \
+    return; \
+}\
+
+    uint32_t depthBytes = 0;
+    uint32_t widthInCU = (m_param->sourceWidth + g_maxCUSize - 1) >> g_maxLog2CUSize;
+    uint32_t heightInCU = (m_param->sourceHeight + g_maxCUSize - 1) >> g_maxLog2CUSize;
+    uint32_t numCUsInFrame = widthInCU * heightInCU;
+
+    int poc; uint32_t frameRecordSize;
+    X265_FREAD(&frameRecordSize, sizeof(uint32_t), 1, m_analysisFileIn);
+    X265_FREAD(&depthBytes, sizeof(uint32_t), 1, m_analysisFileIn);
+    X265_FREAD(&poc, sizeof(int), 1, m_analysisFileIn);
+
+    if (poc != curPoc || feof(m_analysisFileIn))
+    {
+        x265_log(NULL, X265_LOG_WARNING, "Error reading analysis 2 pass data: Cannot find POC %d\n", curPoc);
+        freeAnalysis2Pass(analysis2Pass, sliceType);
+        return;
+    }
+    /* Now arrived at the right frame, read the record */
+    analysis2Pass->frameRecordSize = frameRecordSize;
+    uint8_t* tempBuf = NULL, *depthBuf = NULL;
+    sse_t *tempdistBuf = NULL, *distortionBuf = NULL;
+    tempBuf = X265_MALLOC(uint8_t, depthBytes);
+    X265_FREAD(tempBuf, sizeof(uint8_t), depthBytes, m_analysisFileIn);
+    tempdistBuf = X265_MALLOC(sse_t, depthBytes);
+    X265_FREAD(tempdistBuf, sizeof(sse_t), depthBytes, m_analysisFileIn);
+    depthBuf = tempBuf;
+    distortionBuf = tempdistBuf;
+    analysis2PassFrameData* analysisFrameData = (analysis2PassFrameData*)analysis2Pass->analysisFramedata;
+    size_t count = 0;
+    uint32_t ctuCount = 0;
+    double sum = 0, sqrSum = 0;
+    for (uint32_t d = 0; d < depthBytes; d++)
+    {
+        int bytes = NUM_4x4_PARTITIONS >> (depthBuf[d] * 2);
+        memset(&analysisFrameData->depth[count], depthBuf[d], bytes);
+        analysisFrameData->distortion[count] = distortionBuf[d];
+        analysisFrameData->ctuDistortion[ctuCount] += analysisFrameData->distortion[count];
+        count += bytes;
+        if ((count % (size_t)NUM_4x4_PARTITIONS) == 0)
+        {
+            analysisFrameData->scaledDistortion[ctuCount] = X265_LOG2(X265_MAX(analysisFrameData->ctuDistortion[ctuCount], 1));
+            sum += analysisFrameData->scaledDistortion[ctuCount];
+            sqrSum += analysisFrameData->scaledDistortion[ctuCount] * analysisFrameData->scaledDistortion[ctuCount];
+            ctuCount++;
+        }
+    }
+    double avg = sum / numCUsInFrame;
+    analysisFrameData->sdDistortion = pow(((sqrSum / numCUsInFrame) - (avg * avg)), 0.5);
+    analysisFrameData->averageDistortion = avg;
+    analysisFrameData->highDistortionCtuCount = analysisFrameData->lowDistortionCtuCount = 0;
+    for (uint32_t i = 0; i < numCUsInFrame; ++i)
+    {
+        analysisFrameData->threshold[i] = analysisFrameData->scaledDistortion[i] / analysisFrameData->averageDistortion;
+        analysisFrameData->offset[i] = (analysisFrameData->averageDistortion - analysisFrameData->scaledDistortion[i]) / analysisFrameData->sdDistortion;
+        if (analysisFrameData->threshold[i] < 0.9 && analysisFrameData->offset[i] >= 1)
+            analysisFrameData->lowDistortionCtuCount++;
+        else if (analysisFrameData->threshold[i] > 1.1 && analysisFrameData->offset[i] <= -1)
+            analysisFrameData->highDistortionCtuCount++;
+    }
+    if (!IS_X265_TYPE_I(sliceType))
+    {
+        MV *tempMVBuf[2], *MVBuf[2];
+        int32_t *tempRefBuf[2], *refBuf[2];
+        int *tempMvpBuf[2], *mvpBuf[2];
+        uint8_t* tempModeBuf = NULL, *modeBuf = NULL;
+
+        int numDir = sliceType == X265_TYPE_P ? 1 : 2;
+        for (int i = 0; i < numDir; i++)
+        {
+            tempMVBuf[i] = X265_MALLOC(MV, depthBytes);
+            X265_FREAD(tempMVBuf[i], sizeof(MV), depthBytes, m_analysisFileIn);
+            MVBuf[i] = tempMVBuf[i];
+            tempMvpBuf[i] = X265_MALLOC(int, depthBytes);
+            X265_FREAD(tempMvpBuf[i], sizeof(int), depthBytes, m_analysisFileIn);
+            mvpBuf[i] = tempMvpBuf[i];
+            tempRefBuf[i] = X265_MALLOC(int32_t, depthBytes);
+            X265_FREAD(tempRefBuf[i], sizeof(int32_t), depthBytes, m_analysisFileIn);
+            refBuf[i] = tempRefBuf[i];
+        }
+        tempModeBuf = X265_MALLOC(uint8_t, depthBytes);
+        X265_FREAD(tempModeBuf, sizeof(uint8_t), depthBytes, m_analysisFileIn);
+        modeBuf = tempModeBuf;
+
+        count = 0;
+        for (uint32_t d = 0; d < depthBytes; d++)
+        {
+            size_t bytes = NUM_4x4_PARTITIONS >> (depthBuf[d] * 2);
+            for (int i = 0; i < numDir; i++)
+            {
+                for (size_t j = count, k = 0; k < bytes; j++, k++)
+                {
+                    memcpy(&((analysis2PassFrameData*)analysis2Pass->analysisFramedata)->m_mv[i][j], MVBuf[i] + d, sizeof(MV));
+                    memcpy(&((analysis2PassFrameData*)analysis2Pass->analysisFramedata)->mvpIdx[i][j], mvpBuf[i] + d, sizeof(int));
+                    memcpy(&((analysis2PassFrameData*)analysis2Pass->analysisFramedata)->ref[i][j], refBuf[i] + d, sizeof(int32_t));
+                }
+            }
+            memset(&((analysis2PassFrameData *)analysis2Pass->analysisFramedata)->modes[count], modeBuf[d], bytes);
+            count += bytes;
+        }
+
+        for (int i = 0; i < numDir; i++)
+        {
+            X265_FREE(tempMVBuf[i]);
+            X265_FREE(tempMvpBuf[i]);
+            X265_FREE(tempRefBuf[i]);
+        }
+        X265_FREE(tempModeBuf);
+    }
+    X265_FREE(tempBuf);
+    X265_FREE(tempdistBuf);
+
+#undef X265_FREAD
+}
+
 void Encoder::writeAnalysisFile(x265_analysis_data* analysis, FrameData &curEncData)
 {
 
@@ -2450,6 +2772,101 @@
     }
 #undef X265_FWRITE
 }
+void Encoder::writeAnalysis2PassFile(x265_analysis_2Pass* analysis2Pass, FrameData &curEncData, int slicetype)
+{
+#define X265_FWRITE(val, size, writeSize, fileOffset)\
+    if (fwrite(val, size, writeSize, fileOffset) < writeSize)\
+    {\
+    x265_log(NULL, X265_LOG_ERROR, "Error writing analysis 2 pass data\n"); \
+    freeAnalysis2Pass(analysis2Pass, slicetype); \
+    m_aborted = true; \
+    return; \
+}\
+
+    uint32_t depthBytes = 0;
+    uint32_t widthInCU = (m_param->sourceWidth + g_maxCUSize - 1) >> g_maxLog2CUSize;
+    uint32_t heightInCU = (m_param->sourceHeight + g_maxCUSize - 1) >> g_maxLog2CUSize;
+    uint32_t numCUsInFrame = widthInCU * heightInCU;
+    analysis2PassFrameData* analysisFrameData = (analysis2PassFrameData*)analysis2Pass->analysisFramedata;
+
+    for (uint32_t cuAddr = 0; cuAddr < numCUsInFrame; cuAddr++)
+    {
+        uint8_t depth = 0;
+
+        CUData* ctu = curEncData.getPicCTU(cuAddr);
+
+        for (uint32_t absPartIdx = 0; absPartIdx < ctu->m_numPartitions; depthBytes++)
+        {
+            depth = ctu->m_cuDepth[absPartIdx];
+            analysisFrameData->depth[depthBytes] = depth;
+            analysisFrameData->distortion[depthBytes] = ctu->m_distortion[absPartIdx];
+            absPartIdx += ctu->m_numPartitions >> (depth * 2);
+        }
+    }
+
+    if (curEncData.m_slice->m_sliceType != I_SLICE)
+    {
+        depthBytes = 0;
+        for (uint32_t cuAddr = 0; cuAddr < numCUsInFrame; cuAddr++)
+        {
+            uint8_t depth = 0;
+            uint8_t predMode = 0;
+
+            CUData* ctu = curEncData.getPicCTU(cuAddr);
+
+            for (uint32_t absPartIdx = 0; absPartIdx < ctu->m_numPartitions; depthBytes++)
+            {
+                depth = ctu->m_cuDepth[absPartIdx];
+                analysisFrameData->m_mv[0][depthBytes] = ctu->m_mv[0][absPartIdx];
+                analysisFrameData->mvpIdx[0][depthBytes] = ctu->m_mvpIdx[0][absPartIdx];
+                analysisFrameData->ref[0][depthBytes] = ctu->m_refIdx[0][absPartIdx];
+                predMode = ctu->m_predMode[absPartIdx];
+                if (ctu->m_refIdx[1][absPartIdx] != -1)
+                {
+                    analysisFrameData->m_mv[1][depthBytes] = ctu->m_mv[1][absPartIdx];
+                    analysisFrameData->mvpIdx[1][depthBytes] = ctu->m_mvpIdx[1][absPartIdx];
+                    analysisFrameData->ref[1][depthBytes] = ctu->m_refIdx[1][absPartIdx];
+                    predMode = 4; // used as indiacator if the block is coded as bidir
+                }
+                analysisFrameData->modes[depthBytes] = predMode;
+
+                absPartIdx += ctu->m_numPartitions >> (depth * 2);
+            }
+        }
+    }
+
+    /* calculate frameRecordSize */
+    analysis2Pass->frameRecordSize = sizeof(analysis2Pass->frameRecordSize) + sizeof(depthBytes) + sizeof(analysis2Pass->poc);
+
+    analysis2Pass->frameRecordSize += depthBytes * sizeof(uint8_t);
+    analysis2Pass->frameRecordSize += depthBytes * sizeof(sse_t);
+    if (curEncData.m_slice->m_sliceType != I_SLICE)
+    {
+        int numDir = (curEncData.m_slice->m_sliceType == P_SLICE) ? 1 : 2;
+        analysis2Pass->frameRecordSize += depthBytes * sizeof(MV) * numDir;
+        analysis2Pass->frameRecordSize += depthBytes * sizeof(int32_t) * numDir;
+        analysis2Pass->frameRecordSize += depthBytes * sizeof(int) * numDir;
+        analysis2Pass->frameRecordSize += depthBytes * sizeof(uint8_t);
+    }
+    X265_FWRITE(&analysis2Pass->frameRecordSize, sizeof(uint32_t), 1, m_analysisFileOut);
+    X265_FWRITE(&depthBytes, sizeof(uint32_t), 1, m_analysisFileOut);
+    X265_FWRITE(&analysis2Pass->poc, sizeof(uint32_t), 1, m_analysisFileOut);
+
+    X265_FWRITE(analysisFrameData->depth, sizeof(uint8_t), depthBytes, m_analysisFileOut);
+    X265_FWRITE(analysisFrameData->distortion, sizeof(sse_t), depthBytes, m_analysisFileOut);
+    if (curEncData.m_slice->m_sliceType != I_SLICE)
+    {
+        int numDir = curEncData.m_slice->m_sliceType == P_SLICE ? 1 : 2;
+        for (int i = 0; i < numDir; i++)
+        {
+            X265_FWRITE(analysisFrameData->m_mv[i], sizeof(MV), depthBytes, m_analysisFileOut);
+            X265_FWRITE(analysisFrameData->mvpIdx[i], sizeof(int), depthBytes, m_analysisFileOut);
+            X265_FWRITE(analysisFrameData->ref[i], sizeof(int32_t), depthBytes, m_analysisFileOut);
+        }
+        X265_FWRITE(analysisFrameData->modes, sizeof(uint8_t), depthBytes, m_analysisFileOut);
+    }
+#undef X265_FWRITE
+}
 
 void Encoder::printReconfigureParams()
 {
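
writeAnalysis2PassFile stores one entry per coding unit instead of one per 4x4 partition, and readAnalysis2PassFile expands each entry back over `NUM_4x4_PARTITIONS >> (depth * 2)` partitions. The round trip can be sketched as below. The function names are hypothetical, and the constant 256 assumes a 64x64 CTU measured in 4x4 units; neither is taken from the diff.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: 64x64 CTU, i.e. 16 * 16 = 256 partitions of 4x4 pixels.
constexpr uint32_t kPartitionsPerCtu = 256;

// Pack: keep one depth value per CU, skipping the partitions that CU covers.
std::vector<uint8_t> packDepths(const std::vector<uint8_t>& perPartitionDepth)
{
    std::vector<uint8_t> packed;
    for (size_t idx = 0; idx < perPartitionDepth.size();)
    {
        const uint8_t depth = perPartitionDepth[idx];
        packed.push_back(depth);
        // A CU at depth d covers 256 >> (2 * d) partitions.
        idx += kPartitionsPerCtu >> (depth * 2);
    }
    return packed;
}

// Expand: repeat each packed depth over the partitions of its CU.
std::vector<uint8_t> expandDepths(const std::vector<uint8_t>& packed)
{
    std::vector<uint8_t> expanded;
    for (uint8_t depth : packed)
    {
        const size_t count = kPartitionsPerCtu >> (depth * 2);
        expanded.resize(expanded.size() + count, depth);
    }
    return expanded;
}
```

The same packed index drives the motion vector, reference and mode arrays in the diff, which is why every per-CU buffer in the file is sized by `depthBytes`.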
@@ -2460,7 +2877,7 @@
     
     x265_log(newParam, X265_LOG_DEBUG, "Reconfigured param options, input Frame: %d\n", m_pocLast + 1);
 
-    char tmp[40];
+    char tmp[60];
 #define TOOLCMP(COND1, COND2, STR)  if (COND1 != COND2) { sprintf(tmp, STR, COND1, COND2); x265_log(newParam, X265_LOG_DEBUG, tmp); }
     TOOLCMP(oldParam->maxNumReferences, newParam->maxNumReferences, "ref=%d to %d\n");
     TOOLCMP(oldParam->bEnableFastIntra, newParam->bEnableFastIntra, "fast-intra=%d to %d\n");
@@ -2474,6 +2891,7 @@
     TOOLCMP(oldParam->bEnableRectInter, newParam->bEnableRectInter, "rect=%d to %d\n");
     TOOLCMP(oldParam->maxNumMergeCand, newParam->maxNumMergeCand, "max-merge=%d to %d\n");
     TOOLCMP(oldParam->bIntraInBFrames, newParam->bIntraInBFrames, "b-intra=%d to %d\n");
+    TOOLCMP(oldParam->scalingLists, newParam->scalingLists, "scalinglists=%s to %s\n");
 }
 
 bool Encoder::computeSPSRPSIndex()

 

x265_2.2.tar.gz/source/encoder/encoder.h -> x265_2.3.tar.gz/source/encoder/encoder.h Changed

@@ -30,6 +30,7 @@
 #include "scalinglist.h"
 #include "x265.h"
 #include "nal.h"
+#include "framedata.h"
 
 struct x265_encoder {};
 
@@ -129,6 +130,8 @@
     DPB*               m_dpb;
     Frame*             m_exportedPic;
     FILE*              m_analysisFile;
+    FILE*              m_analysisFileIn;
+    FILE*              m_analysisFileOut;
     x265_param*        m_param;
     x265_param*        m_latestParam;     // Holds latest param during a reconfigure
     RateControl*       m_rateControl;
@@ -159,9 +162,7 @@
     Lock               m_sliceQpLock;
     int                m_iFrameNum;   
     int                m_iPPSQpMinus26;
-    int                m_iLastSliceQp;
     int64_t            m_iBitsCostSum[QP_MAX_MAX + 1];
-
     Lock               m_sliceRefIdxLock;
     RefIdxLastGOP      m_refIdxLastGOP;
 
@@ -197,10 +198,15 @@
 
     void freeAnalysis(x265_analysis_data* analysis);
 
+    void allocAnalysis2Pass(x265_analysis_2Pass* analysis, int sliceType);
+
+    void freeAnalysis2Pass(x265_analysis_2Pass* analysis, int sliceType);
+
     void readAnalysisFile(x265_analysis_data* analysis, int poc);
 
     void writeAnalysisFile(x265_analysis_data* pic, FrameData &curEncData);
-
+    void readAnalysis2PassFile(x265_analysis_2Pass* analysis2Pass, int poc, int sliceType);
+    void writeAnalysis2PassFile(x265_analysis_2Pass* analysis2Pass, FrameData &curEncData, int slicetype);
     void finishFrameStats(Frame* pic, FrameEncoder *curEncoder, x265_frame_stats* frameStats, int inPoc);
 
     void calcRefreshInterval(Frame* frameEnc);

 
​

x265_2.2.tar.gz/source/encoder/entropy.cpp -> x265_2.3.tar.gz/source/encoder/entropy.cpp Changed

 
@@ -226,6 +226,7 @@
     markValid();
     m_fracBits = 0;
     m_pad = 0;
+    m_meanQP = 0;
     X265_CHECK(sizeof(m_contextState) >= sizeof(m_contextState[0]) * MAX_OFF_CTX_MOD, "context state table is too small\n");
 }
 
​

x265_2.2.tar.gz/source/encoder/entropy.h -> x265_2.3.tar.gz/source/encoder/entropy.h Changed

 
@@ -111,6 +111,7 @@
     int           m_bitsLeft;
     uint64_t      m_fracBits;
     EstBitsSbac   m_estBitsSbac;
+    double        m_meanQP;
 
     Entropy();
 
​

x265_2.2.tar.gz/source/encoder/frameencoder.cpp -> x265_2.3.tar.gz/source/encoder/frameencoder.cpp Changed

@@ -494,9 +494,7 @@
             m_top->m_iBitsCostSum[i] += codeLength;
         }
         m_top->m_iFrameNum++;
-        m_top->m_iLastSliceQp = slice->m_sliceQp;
     }
-
     m_initSliceContext.resetEntropy(*slice);
 
     m_frameFilter.start(m_frame, m_initSliceContext);
@@ -827,6 +825,7 @@
         m_frame->m_encData->m_frameStats.lumaDistortion   += m_rows[i].rowStats.lumaDistortion;
         m_frame->m_encData->m_frameStats.chromaDistortion += m_rows[i].rowStats.chromaDistortion;
         m_frame->m_encData->m_frameStats.psyEnergy        += m_rows[i].rowStats.psyEnergy;
+        m_frame->m_encData->m_frameStats.ssimEnergy       += m_rows[i].rowStats.ssimEnergy;
         m_frame->m_encData->m_frameStats.resEnergy        += m_rows[i].rowStats.resEnergy;
         for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
         {
@@ -841,6 +840,7 @@
     m_frame->m_encData->m_frameStats.avgLumaDistortion   = (double)(m_frame->m_encData->m_frameStats.lumaDistortion) / m_frame->m_encData->m_frameStats.totalCtu;
     m_frame->m_encData->m_frameStats.avgChromaDistortion = (double)(m_frame->m_encData->m_frameStats.chromaDistortion) / m_frame->m_encData->m_frameStats.totalCtu;
     m_frame->m_encData->m_frameStats.avgPsyEnergy        = (double)(m_frame->m_encData->m_frameStats.psyEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
+    m_frame->m_encData->m_frameStats.avgSsimEnergy       = (double)(m_frame->m_encData->m_frameStats.ssimEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
     m_frame->m_encData->m_frameStats.avgResEnergy        = (double)(m_frame->m_encData->m_frameStats.resEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
     m_frame->m_encData->m_frameStats.percentIntraNxN     = (double)(m_frame->m_encData->m_frameStats.cntIntraNxN * 100) / m_frame->m_encData->m_frameStats.totalCu;
     for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
@@ -969,12 +969,29 @@
     }
     m_accessUnitBits = bytes << 3;
 
-    m_endCompressTime = x265_mdate();
-
+    int filler = 0;
     /* rateControlEnd may also block for earlier frames to call rateControlUpdateStats */
-    if (m_top->m_rateControl->rateControlEnd(m_frame, m_accessUnitBits, &m_rce) < 0)
+    if (m_top->m_rateControl->rateControlEnd(m_frame, m_accessUnitBits, &m_rce, &filler) < 0)
         m_top->m_aborted = true;
 
+    if (filler > 0)
+    {
+        filler = (filler - FILLER_OVERHEAD * 8) >> 3;
+        m_bs.resetBits();
+        while (filler > 0)
+        {
+            m_bs.write(0xff, 8);
+            filler--;
+        }
+        m_bs.writeByteAlignment();
+        m_nalList.serialize(NAL_UNIT_FILLER_DATA, m_bs);
+        bytes += m_nalList.m_nal[m_nalList.m_numNal - 1].sizeBytes;
+        bytes -= 3; //exclude start code prefix
+        m_accessUnitBits = bytes << 3;
+    }
+
+    m_endCompressTime = x265_mdate();
+
     /* Decrement referenced frame reference counts, allow them to be recycled */
     for (int l = 0; l < numPredDir; l++)
     {
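The hunk above converts the filler bit count reported by rate control into a run of 0xff payload bytes before serializing a filler-data NAL unit. A minimal sketch of that arithmetic follows. `makeFillerPayload` is a hypothetical helper, not x265 code, and the header overhead is passed as a parameter because the value of `FILLER_OVERHEAD` is not shown in this diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of x265): build the payload of a filler NAL.
// fillerBits    - bit count returned by rate control, header overhead included
// overheadBytes - stand-in for FILLER_OVERHEAD; its real value is an assumption
std::vector<uint8_t> makeFillerPayload(int fillerBits, int overheadBytes)
{
    assert(overheadBytes >= 0);
    std::vector<uint8_t> payload;

    // Remove the header overhead first; nothing is left to pad if this is <= 0.
    const int payloadBits = fillerBits - overheadBytes * 8;
    if (payloadBits <= 0)
        return payload;

    // Same rounding as the diff: whole bytes only (bits >> 3).
    const int payloadBytes = payloadBits >> 3;
    payload.assign(static_cast<std::size_t>(payloadBytes), 0xff);
    return payload;
}
```

With an assumed overhead of 6 bytes, a request for 128 filler bits leaves 80 payload bits, so the sketch returns 10 bytes of 0xff. Requests smaller than the overhead yield an empty payload.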
@@ -1182,6 +1199,65 @@
         //m_rows[row - 1].bufferedEntropy.loadContexts(m_initSliceContext);
     }
 
+    // calculate mean QP for consistent deltaQP signalling calculation
+    if (m_param->bOptCUDeltaQP)
+    {
+        ScopedLock self(curRow.lock);
+        if (!curRow.avgQPComputed)
+        {
+            if (m_param->bEnableWavefront || !row)
+            {
+                double meanQPOff = 0;
+                uint32_t loopIncr, count = 0;
+                bool isReferenced = IS_REFERENCED(m_frame);
+                double *qpoffs = (isReferenced && m_param->rc.cuTree) ? m_frame->m_lowres.qpCuTreeOffset : m_frame->m_lowres.qpAqOffset;
+                if (qpoffs)
+                {
+                    if (m_param->rc.qgSize == 8)
+                        loopIncr = 8;
+                    else
+                        loopIncr = 16;
+                    uint32_t cuYStart = 0, height = m_frame->m_fencPic->m_picHeight;
+                    if (m_param->bEnableWavefront)
+                    {
+                        cuYStart = intRow * m_param->maxCUSize;
+                        height = cuYStart + m_param->maxCUSize;
+                    }
+
+                    uint32_t qgSize = m_param->rc.qgSize, width = m_frame->m_fencPic->m_picWidth;
+                    uint32_t maxOffsetCols = (m_frame->m_fencPic->m_picWidth + (loopIncr - 1)) / loopIncr;
+                    for (uint32_t cuY = cuYStart; cuY < height && (cuY < m_frame->m_fencPic->m_picHeight); cuY += qgSize)
+                    {
+                        for (uint32_t cuX = 0; cuX < width; cuX += qgSize)
+                        {
+                            double qp_offset = 0;
+                            uint32_t cnt = 0;
+
+                            for (uint32_t block_yy = cuY; block_yy < cuY + qgSize && block_yy < m_frame->m_fencPic->m_picHeight; block_yy += loopIncr)
+                            {
+                                for (uint32_t block_xx = cuX; block_xx < cuX + qgSize && block_xx < width; block_xx += loopIncr)
+                                {
+                                    int idx = ((block_yy / loopIncr) * (maxOffsetCols)) + (block_xx / loopIncr);
+                                    qp_offset += qpoffs[idx];
+                                    cnt++;
+                                }
+                            }
+                            qp_offset /= cnt;
+                            meanQPOff += qp_offset;
+                            count++;
+                        }
+                    }
+                    meanQPOff /= count;
+                }
+                rowCoder.m_meanQP = slice->m_sliceQp + meanQPOff;
+            }
+            else
+            {
+                rowCoder.m_meanQP = m_rows[0].rowGoOnCoder.m_meanQP;
+            }
+            curRow.avgQPComputed = 1;
+        }
+    }
 
     // TODO: specially case handle on first and last row
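The mean-QP hunk above averages the per-block QP offsets inside each quantization group and then averages those group means over the row (or over the whole picture when wavefront is off). The sketch below restates that loop as a standalone function under stated assumptions: `meanQpOffset` and its parameter names are hypothetical, the offsets are held in a `std::vector` rather than the lowres buffers, and a guard for an empty range is added that the diff does not have.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical restatement of the averaging loop, not x265 code.
// offsets holds one value per step x step block, stored row-major.
double meanQpOffset(const std::vector<double>& offsets,
                    uint32_t width, uint32_t height, uint32_t qgSize,
                    uint32_t yStart, uint32_t yEnd)
{
    assert(qgSize > 0);
    const uint32_t step = (qgSize == 8) ? 8 : 16;      // loopIncr in the diff
    const uint32_t cols = (width + step - 1) / step;   // maxOffsetCols in the diff
    assert(offsets.size() >= static_cast<std::size_t>(cols) * ((height + step - 1) / step));

    double sum = 0;
    uint32_t groups = 0;
    for (uint32_t y = yStart; y < yEnd && y < height; y += qgSize)
    {
        for (uint32_t x = 0; x < width; x += qgSize)
        {
            double groupSum = 0;
            uint32_t n = 0;
            // Average every offset block covered by this quantization group.
            for (uint32_t by = y; by < y + qgSize && by < height; by += step)
                for (uint32_t bx = x; bx < x + qgSize && bx < width; bx += step)
                {
                    groupSum += offsets[(by / step) * cols + (bx / step)];
                    n++;
                }
            sum += groupSum / n;   // n >= 1 because (x, y) itself is inside the picture
            groups++;
        }
    }
    return groups ? sum / groups : 0.0;   // guard added in this sketch only
}
```

For a 32x32 picture with 16x16 quantization groups and offsets {1, 2, 3, 4}, the whole-picture mean is 2.5, while restricting the range to the first 16 rows gives 1.5.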
 
@@ -1252,6 +1328,13 @@
             rowCoder.copyState(m_initSliceContext);
             rowCoder.loadContexts(m_rows[row - 1].bufferedEntropy);
         }
+        analysis2PassFrameData* analysisFrameData = (analysis2PassFrameData*)(m_frame->m_analysis2Pass).analysisFramedata;
+        if (analysisFrameData && m_param->rc.bStatRead && m_param->analysisMultiPassDistortion && (analysisFrameData->threshold[cuAddr] < 0.9 || analysisFrameData->threshold[cuAddr] > 1.1)
+            && analysisFrameData->highDistortionCtuCount && analysisFrameData->lowDistortionCtuCount)
+            curEncData.m_cuStat[cuAddr].baseQp += analysisFrameData->offset[cuAddr];
+
+        if (m_param->dynamicRd && (int32_t)(m_rce.qpaRc - m_rce.qpNoVbv) > 0)
+            ctu->m_vbvAffected = true;
 
         // Does all the CU analysis, returns best top level mode decision
         Mode& best = tld.analysis.compressCTU(*ctu, *m_frame, m_cuGeoms[m_ctuGeomMap[cuAddr]], rowCoder);
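The two conditions added in this hunk are easy to misread, so here is a small sketch of each as a pure predicate. Both function names are hypothetical, and the sketch assumes `threshold`, `qpaRc` and `qpNoVbv` are doubles; only the comparisons themselves are taken from the diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical predicate: the per-CTU QP offset from the earlier pass is applied
// only when the CTU's threshold lies outside the [0.9, 1.1] band and the frame
// has both high- and low-distortion CTUs.
bool shouldApplyPassOffset(double threshold, uint32_t highCount, uint32_t lowCount)
{
    const bool outsideBand = threshold < 0.9 || threshold > 1.1;
    return outsideBand && highCount > 0 && lowCount > 0;
}

// Hypothetical predicate: mirrors (int32_t)(qpaRc - qpNoVbv) > 0 from the diff.
// The cast truncates toward zero, so a QP increase below 1.0 does not count.
bool vbvRaisedQp(double qpaRc, double qpNoVbv)
{
    return static_cast<int32_t>(qpaRc - qpNoVbv) > 0;
}
```

Note the effect of the integer cast: a difference of 0.5 QP is truncated to 0 and the CTU is not flagged, whereas a difference of 1.0 or more is.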
@@ -1356,6 +1439,7 @@
         curRow.rowStats.lumaDistortion   += best.lumaDistortion;
         curRow.rowStats.chromaDistortion += best.chromaDistortion;
         curRow.rowStats.psyEnergy        += best.psyEnergy;
+        curRow.rowStats.ssimEnergy       += best.ssimEnergy;
         curRow.rowStats.resEnergy        += best.resEnergy;
         curRow.rowStats.cntIntraNxN      += frameLog.cntIntraNxN;
         curRow.rowStats.totalCu          += frameLog.totalCu;

 
​

x265_2.2.tar.gz/source/encoder/frameencoder.h -> x265_2.3.tar.gz/source/encoder/frameencoder.h Changed

 
@@ -95,6 +95,7 @@
 
     /* count of completed CUs in this row */
     volatile uint32_t completed;
+    volatile uint32_t avgQPComputed;
 
     /* called at the start of each frame to initialize state */
     void init(Entropy& initContext, unsigned int sid)
@@ -102,6 +103,7 @@
         active = false;
         busy = false;
         completed = 0;
+        avgQPComputed = 0;
         sliceId = sid;
         memset(&rowStats, 0, sizeof(rowStats));
         rowGoOnCoder.load(initContext);
​

x265_2.2.tar.gz/source/encoder/framefilter.cpp -> x265_2.3.tar.gz/source/encoder/framefilter.cpp Changed

@@ -642,6 +642,9 @@
     const uint32_t numCols = m_frame->m_encData->m_slice->m_sps->numCuInWidth;
     const uint32_t lineStartCUAddr = row * numCols;
 
+    /* Generate integral planes for SEA motion search */
+    if(m_param->searchMethod == X265_SEA)
+        computeMEIntegral(row);
     // Notify other FrameEncoders that this row of reconstructed pixels is available
     m_frame->m_reconRowFlag[row].set(1);
 
@@ -768,10 +771,16 @@
         }
     } // end of (m_param->maxSlices == 1)
 
-    int lastRow = row == (int)m_frame->m_encData->m_slice->m_sps->numCuInHeight - 1;
+    if (ATOMIC_INC(&m_frameEncoder->m_completionCount) == 2 * (int)m_frameEncoder->m_numRows)
+    {
+        m_frameEncoder->m_completionEvent.trigger();
+    }
+}
 
-    /* generate integral planes for SEA motion search */
-    if (m_param->searchMethod == X265_SEA && m_frame->m_encData->m_meIntegral && m_frame->m_lowres.sliceType != X265_TYPE_B)
+void FrameFilter::computeMEIntegral(int row)
+{
+    int lastRow = row == (int)m_frame->m_encData->m_slice->m_sps->numCuInHeight - 1;
+    if (m_frame->m_encData->m_meIntegral && m_frame->m_lowres.sliceType != X265_TYPE_B)
     {
         /* If WPP, other than first row, integral calculation for current row needs to wait till the
         * integral for the previous row is computed */
@@ -868,11 +877,6 @@
         }
         m_parallelFilter[row].m_frameFilter->integralCompleted.set(1);
     }
-
-    if (ATOMIC_INC(&m_frameEncoder->m_completionCount) == 2 * (int)m_frameEncoder->m_numRows)
-    {
-        m_frameEncoder->m_completionEvent.trigger();
-    }
 }
 
 static uint64_t computeSSD(pixel *fenc, pixel *rec, intptr_t stride, uint32_t width, uint32_t height)
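The completion check that this diff moves ahead of the new `computeMEIntegral` function relies on an atomic counter: the event fires for the single caller whose increment brings the count to twice the number of rows. A minimal sketch of that pattern with `std::atomic` follows. `CompletionTracker` is a hypothetical type, and the sketch assumes `ATOMIC_INC` returns the incremented value and that each row reports two completions, which the diff suggests but does not state.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for m_completionCount / m_numRows; not x265 code.
struct CompletionTracker
{
    std::atomic<int> count{0};
    int numRows;

    explicit CompletionTracker(int rows) : numRows(rows) { assert(rows > 0); }

    // Returns true for exactly one caller: the one whose increment reaches
    // 2 * numRows. fetch_add returns the old value, hence the + 1.
    bool markStepDone()
    {
        return count.fetch_add(1) + 1 == 2 * numRows;
    }
};
```

Because the comparison uses the value produced by the increment itself, no second read of the counter is needed and two threads cannot both observe the final count.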

 
​

x265_2.2.tar.gz/source/encoder/framefilter.h -> x265_2.3.tar.gz/source/encoder/framefilter.h Changed

 
@@ -133,6 +133,7 @@
 
     void processRow(int row);
     void processPostRow(int row);
+    void computeMEIntegral(int row);
 };
 }
 
​

x265_2.2.tar.gz/source/encoder/ratecontrol.cpp -> x265_2.3.tar.gz/source/encoder/ratecontrol.cpp Changed

@@ -454,6 +454,28 @@
                               m_param->fpsNum, m_param->fpsDenom, k, l);
                     return false;
                 }
+                if (m_param->analysisMultiPassRefine)
+                {
+                    p = strstr(opts, "ref=");
+                    sscanf(p, "ref=%d", &i);
+                    if (i > m_param->maxNumReferences)
+                    {
+                        x265_log(m_param, X265_LOG_ERROR, "maxNumReferences cannot be less than 1st pass (%d vs %d)\n",
+                            i, m_param->maxNumReferences);
+                        return false;
+                    }
+                }
+                if (m_param->analysisMultiPassRefine || m_param->analysisMultiPassDistortion)
+                {
+                    p = strstr(opts, "ctu=");
+                    sscanf(p, "ctu=%u", &k);
+                    if (k != m_param->maxCUSize)
+                    {
+                        x265_log(m_param, X265_LOG_ERROR, "maxCUSize mismatch with 1st pass (%u vs %u)\n",
+                            k, m_param->maxCUSize);
+                        return false;
+                    }
+                }
                 CMP_OPT_FIRST_PASS("bitdepth", m_param->internalBitDepth);
                 CMP_OPT_FIRST_PASS("weightp", m_param->bEnableWeightedPred);
                 CMP_OPT_FIRST_PASS("bframes", m_param->bframes);
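The new first-pass checks locate `ref=` and `ctu=` in the stats-file option string with `strstr` and pass the result straight to `sscanf`. If the key were missing, `strstr` would return a null pointer. The sketch below shows one defensive way to write such a lookup. `readIntOption` is a hypothetical helper, not an x265 function, and the option string in the usage note is made up for illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical helper: read the integer that follows `key` (for example "ref=")
// in an option string. Returns false when the key is absent or has no number.
// Caveat: strstr matches the first occurrence of the substring, so a key that
// is also the tail of a longer option name would need a stricter search.
bool readIntOption(const char* opts, const char* key, int& value)
{
    assert(opts && key);
    const char* p = std::strstr(opts, key);
    if (!p)
        return false;                        // key not present: do not call sscanf
    return std::sscanf(p + std::strlen(key), "%d", &value) == 1;
}
```

Called on the made-up string `"bitdepth=8 ref=3 ctu=64"` with key `"ref="`, the helper stores 3 and returns true; with a string that lacks the key it returns false and leaves the output untouched.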
@@ -2457,14 +2479,16 @@
     p->offset += new_offset;
 }
 
-void RateControl::updateVbv(int64_t bits, RateControlEntry* rce)
+int RateControl::updateVbv(int64_t bits, RateControlEntry* rce)
 {
     int predType = rce->sliceType;
+    int filler = 0;
+    double bufferBits;
     predType = rce->sliceType == B_SLICE && rce->keptAsRef ? 3 : predType;
     if (rce->lastSatd >= m_ncu && rce->encodeOrder >= m_lastPredictorReset)
         updatePredictor(&m_pred[predType], x265_qp2qScale(rce->qpaRc), (double)rce->lastSatd, (double)bits);
     if (!m_isVbv)
-        return;
+        return 0;
 
     m_bufferFillFinal -= bits;
 
@@ -2473,15 +2497,32 @@
 
     m_bufferFillFinal = X265_MAX(m_bufferFillFinal, 0);
     m_bufferFillFinal += m_bufferRate;
-    m_bufferFillFinal = X265_MIN(m_bufferFillFinal, m_bufferSize);
-    double bufferBits = X265_MIN(bits + m_bufferExcess, m_bufferRate);
-    m_bufferExcess = X265_MAX(m_bufferExcess - bufferBits + bits, 0);
-    m_bufferFillActual += bufferBits - bits;
-    m_bufferFillActual = X265_MIN(m_bufferFillActual, m_bufferSize);
+
+    if (m_bufferFillFinal > m_bufferSize) 
+    {
+        if (m_param->rc.bStrictCbr)
+        {
+            filler = (int)(m_bufferFillFinal - m_bufferSize);
+            filler += FILLER_OVERHEAD * 8;
+            m_bufferFillFinal -= filler;
+            bufferBits = X265_MIN(bits + filler + m_bufferExcess, m_bufferRate);
+            m_bufferExcess = X265_MAX(m_bufferExcess - bufferBits + bits + filler, 0);
+            m_bufferFillActual += bufferBits - bits - filler;
+        }
+        else
+        {
+            m_bufferFillFinal = X265_MIN(m_bufferFillFinal, m_bufferSize);
+            bufferBits = X265_MIN(bits + m_bufferExcess, m_bufferRate);
+            m_bufferExcess = X265_MAX(m_bufferExcess - bufferBits + bits, 0);
+            m_bufferFillActual += bufferBits - bits;
+            m_bufferFillActual = X265_MIN(m_bufferFillActual, m_bufferSize);
+        }
+    }
+    return filler;
 }
 
 /* After encoding one frame, update rate control state */
-int RateControl::rateControlEnd(Frame* curFrame, int64_t bits, RateControlEntry* rce)
+int RateControl::rateControlEnd(Frame* curFrame, int64_t bits, RateControlEntry* rce, int *filler)
 {
     int orderValue = m_startEndOrder.get();
     int endOrdinal = (rce->encodeOrder + m_param->frameNumThreads) * 2 - 1;
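The reworked `updateVbv` decides how many filler bits to request when the buffer would overflow. The sketch below isolates that decision under stated assumptions: `VbvState` and `strictCbrFiller` are hypothetical names, only the final-fill bookkeeping is modelled (the `m_bufferExcess` and `m_bufferFillActual` updates are left out), and the overhead is a parameter because `FILLER_OVERHEAD` is not defined in this diff.

```cpp
#include <cassert>

// Hypothetical reduced model of the VBV state touched by the hunk above.
struct VbvState
{
    double fillFinal;   // buffer fill after adding this frame's refill
    double size;        // buffer capacity in bits
};

// Returns the number of filler bits to emit (0 when none are needed).
// overheadBytes is a stand-in for FILLER_OVERHEAD; its value is an assumption.
int strictCbrFiller(VbvState& vbv, bool strictCbr, int overheadBytes)
{
    assert(overheadBytes >= 0);
    if (vbv.fillFinal <= vbv.size)
        return 0;                            // no overflow, nothing to pad

    if (!strictCbr)
    {
        vbv.fillFinal = vbv.size;            // non-strict path: clamp to capacity
        return 0;
    }

    // Strict CBR: pad by the overflow plus the NAL header overhead, and
    // charge those bits against the buffer.
    int filler = static_cast<int>(vbv.fillFinal - vbv.size) + overheadBytes * 8;
    vbv.fillFinal -= filler;
    return filler;
}
```

With an assumed 6-byte overhead, a buffer of 1000 bits filled to 1100 requests 148 filler bits and ends at 952. The frame encoder later subtracts the same overhead before writing payload bytes, so the two sides stay consistent.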
@@ -2499,7 +2540,7 @@
     int64_t actualBits = bits;
     Slice *slice = curEncData.m_slice;
 
-    if (m_param->rc.aqMode || m_isVbv)
+    if (m_param->rc.aqMode || m_isVbv || m_param->bAQMotion)
     {
         if (m_isVbv && !(m_2pass && m_param->rc.rateControlMode == X265_RC_CRF))
         {
@@ -2513,7 +2554,7 @@
             rce->qpaRc = curEncData.m_avgQpRc;
         }
 
-        if (m_param->rc.aqMode)
+        if (m_param->rc.aqMode || m_param->bAQMotion)
         {
             double avgQpAq = 0;
             /* determine actual avg encoded QP, after AQ/cutree adjustments */
@@ -2612,7 +2653,7 @@
 
     if (m_isVbv)
     {
-        updateVbv(actualBits, rce);
+        *filler = updateVbv(actualBits, rce);
 
         if (m_param->bEmitHRDSEI)
         {
@@ -2634,9 +2675,9 @@
 
                 rce->hrdTiming->cpbInitialAT = hrd->cbrFlag ? m_prevCpbFinalAT : X265_MAX(m_prevCpbFinalAT, cpbEarliestAT);
             }
-
+            int filler_bits = *filler ? (*filler - START_CODE_OVERHEAD * 8)  : 0; 
             uint32_t cpbsizeUnscale = hrd->cpbSizeValue << (hrd->cpbSizeScale + CPB_SHIFT);
-            rce->hrdTiming->cpbFinalAT = m_prevCpbFinalAT = rce->hrdTiming->cpbInitialAT + actualBits / cpbsizeUnscale;
+            rce->hrdTiming->cpbFinalAT = m_prevCpbFinalAT = rce->hrdTiming->cpbInitialAT + (actualBits + filler_bits)/ cpbsizeUnscale;
             rce->hrdTiming->dpbOutputTime = (double)rce->picTimingSEI->m_picDpbOutputDelay * time->numUnitsInTick / time->timeScale + rce->hrdTiming->cpbRemovalTime;
         }
     }

 
@@ -454,6 +454,28 @@
                               m_param->fpsNum, m_param->fpsDenom, k, l);
                     return false;
                 }
+                if (m_param->analysisMultiPassRefine)
+                {
+                    p = strstr(opts, "ref=");
+                    sscanf(p, "ref=%d", &i);
+                    if (i > m_param->maxNumReferences)
+                    {
+                        x265_log(m_param, X265_LOG_ERROR, "maxNumReferences cannot be less than 1st pass (%d vs %d)\n",
+                            i, m_param->maxNumReferences);
+                        return false;
+                    }
+                }
+                if (m_param->analysisMultiPassRefine || m_param->analysisMultiPassDistortion)
+                {
+                    p = strstr(opts, "ctu=");
+                    sscanf(p, "ctu=%u", &k);
+                    if (k != m_param->maxCUSize)
+                    {
+                        x265_log(m_param, X265_LOG_ERROR, "maxCUSize mismatch with 1st pass (%u vs %u)\n",
+                            k, m_param->maxCUSize);
+                        return false;
+                    }
+                }
                 CMP_OPT_FIRST_PASS("bitdepth", m_param->internalBitDepth);
                 CMP_OPT_FIRST_PASS("weightp", m_param->bEnableWeightedPred);
                 CMP_OPT_FIRST_PASS("bframes", m_param->bframes);
@@ -2457,14 +2479,16 @@
     p->offset += new_offset;
 }
 
-void RateControl::updateVbv(int64_t bits, RateControlEntry* rce)
+int RateControl::updateVbv(int64_t bits, RateControlEntry* rce)
 {
     int predType = rce->sliceType;
+    int filler = 0;
+    double bufferBits;
     predType = rce->sliceType == B_SLICE && rce->keptAsRef ? 3 : predType;
     if (rce->lastSatd >= m_ncu && rce->encodeOrder >= m_lastPredictorReset)
         updatePredictor(&m_pred[predType], x265_qp2qScale(rce->qpaRc), (double)rce->lastSatd, (double)bits);
     if (!m_isVbv)
-        return;
+        return 0;
 
     m_bufferFillFinal -= bits;
 
@@ -2473,15 +2497,32 @@
 
     m_bufferFillFinal = X265_MAX(m_bufferFillFinal, 0);
     m_bufferFillFinal += m_bufferRate;
-    m_bufferFillFinal = X265_MIN(m_bufferFillFinal, m_bufferSize);
-    double bufferBits = X265_MIN(bits + m_bufferExcess, m_bufferRate);
-    m_bufferExcess = X265_MAX(m_bufferExcess - bufferBits + bits, 0);
-    m_bufferFillActual += bufferBits - bits;
-    m_bufferFillActual = X265_MIN(m_bufferFillActual, m_bufferSize);
+
+    if (m_bufferFillFinal > m_bufferSize) 
+    {
+        if (m_param->rc.bStrictCbr)
+        {
+            filler = (int)(m_bufferFillFinal - m_bufferSize);
+            filler += FILLER_OVERHEAD * 8;
+            m_bufferFillFinal -= filler;
+            bufferBits = X265_MIN(bits + filler + m_bufferExcess, m_bufferRate);
+            m_bufferExcess = X265_MAX(m_bufferExcess - bufferBits + bits + filler, 0);
+            m_bufferFillActual += bufferBits - bits - filler;
+        }
+        else
+        {
+            m_bufferFillFinal = X265_MIN(m_bufferFillFinal, m_bufferSize);
+            bufferBits = X265_MIN(bits + m_bufferExcess, m_bufferRate);
+            m_bufferExcess = X265_MAX(m_bufferExcess - bufferBits + bits, 0);
+            m_bufferFillActual += bufferBits - bits;
+            m_bufferFillActual = X265_MIN(m_bufferFillActual, m_bufferSize);
+        }
+    }
+    return filler;
 }
 
 /* After encoding one frame, update rate control state */
-int RateControl::rateControlEnd(Frame* curFrame, int64_t bits, RateControlEntry* rce)
+int RateControl::rateControlEnd(Frame* curFrame, int64_t bits, RateControlEntry* rce, int *filler)
 {
     int orderValue = m_startEndOrder.get();
     int endOrdinal = (rce->encodeOrder + m_param->frameNumThreads) * 2 - 1;
@@ -2499,7 +2540,7 @@
     int64_t actualBits = bits;
     Slice *slice = curEncData.m_slice;
 
-    if (m_param->rc.aqMode || m_isVbv)
+    if (m_param->rc.aqMode || m_isVbv || m_param->bAQMotion)
     {
         if (m_isVbv && !(m_2pass && m_param->rc.rateControlMode == X265_RC_CRF))
         {
@@ -2513,7 +2554,7 @@
             rce->qpaRc = curEncData.m_avgQpRc;
         }
 
-        if (m_param->rc.aqMode)
+        if (m_param->rc.aqMode || m_param->bAQMotion)
         {
             double avgQpAq = 0;
             /* determine actual avg encoded QP, after AQ/cutree adjustments */
@@ -2612,7 +2653,7 @@
 
     if (m_isVbv)
     {
-        updateVbv(actualBits, rce);
+        *filler = updateVbv(actualBits, rce);
 
         if (m_param->bEmitHRDSEI)
         {
@@ -2634,9 +2675,9 @@
 
                 rce->hrdTiming->cpbInitialAT = hrd->cbrFlag ? m_prevCpbFinalAT : X265_MAX(m_prevCpbFinalAT, cpbEarliestAT);
             }
-
+            int filler_bits = *filler ? (*filler - START_CODE_OVERHEAD * 8)  : 0; 
             uint32_t cpbsizeUnscale = hrd->cpbSizeValue << (hrd->cpbSizeScale + CPB_SHIFT);
-            rce->hrdTiming->cpbFinalAT = m_prevCpbFinalAT = rce->hrdTiming->cpbInitialAT + actualBits / cpbsizeUnscale;
+            rce->hrdTiming->cpbFinalAT = m_prevCpbFinalAT = rce->hrdTiming->cpbInitialAT + (actualBits + filler_bits)/ cpbsizeUnscale;
             rce->hrdTiming->dpbOutputTime = (double)rce->picTimingSEI->m_picDpbOutputDelay * time->numUnitsInTick / time->timeScale + rce->hrdTiming->cpbRemovalTime;
         }
     }
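
Review note on the hunk above: the CPB final arrival time now includes filler bits as well as the coded picture bits. The following is a minimal sketch of that arithmetic, not the upstream implementation. The helper names, the value 3 for the start-code overhead, and the `double` type of `cpbInitialAT` are assumptions made for illustration; only the shape of the expressions is taken from the diff.

```cpp
#include <cstdint>

// Assumed value, for illustration only: the hunk uses START_CODE_OVERHEAD
// (a byte count) without showing its definition.
constexpr int kStartCodeOverheadBytes = 3;

// Filler bits counted toward CPB arrival, following the ternary in the hunk:
// a non-zero filler has the start-code overhead (in bits) subtracted.
inline int fillerBitsForCpb(int filler)
{
    return filler ? (filler - kStartCodeOverheadBytes * 8) : 0;
}

// Unscaled CPB size, as in `cpbSizeValue << (cpbSizeScale + CPB_SHIFT)`.
// cpbShift is a parameter because CPB_SHIFT is not defined in the lines shown.
inline uint32_t cpbSizeUnscaled(uint32_t cpbSizeValue, uint32_t cpbSizeScale, uint32_t cpbShift)
{
    return cpbSizeValue << (cpbSizeScale + cpbShift);
}

// Final arrival time after the change. All operands of the division are
// integers, so the quotient truncates before it is added to cpbInitialAT.
inline double cpbFinalArrival(double cpbInitialAT, int64_t actualBits, int filler, uint32_t cpbSize)
{
    return cpbInitialAT + (actualBits + fillerBitsForCpb(filler)) / cpbSize;
}
```

With no filler (`filler == 0`) the expression reduces to the pre-2.3 form, `cpbInitialAT + actualBits / cpbsizeUnscale`.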

x265_2.2.tar.gz/source/encoder/ratecontrol.h -> x265_2.3.tar.gz/source/encoder/ratecontrol.h Changed

@@ -242,7 +242,7 @@
     // to be called for each curFrame to process RateControl and set QP
     int  rateControlStart(Frame* curFrame, RateControlEntry* rce, Encoder* enc);
     void rateControlUpdateStats(RateControlEntry* rce);
-    int  rateControlEnd(Frame* curFrame, int64_t bits, RateControlEntry* rce);
+    int  rateControlEnd(Frame* curFrame, int64_t bits, RateControlEntry* rce, int *filler);
     int  rowVbvRateControl(Frame* curFrame, uint32_t row, RateControlEntry* rce, double& qpVbv);
     int  rateControlSliceType(int frameNum);
     bool cuTreeReadFor2Pass(Frame* curFrame);
@@ -269,7 +269,7 @@
     void   accumPQpUpdate();
 
     int    getPredictorType(int lowresSliceType, int sliceType);
-    void   updateVbv(int64_t bits, RateControlEntry* rce);
+    int    updateVbv(int64_t bits, RateControlEntry* rce);
     void   updatePredictor(Predictor *p, double q, double var, double bits);
     double clipQscale(Frame* pic, RateControlEntry* rce, double q);
     void   updateVbvPlan(Encoder* enc);
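
Review note on the two declaration changes above: `updateVbv` now returns an `int`, and `rateControlEnd` forwards that value to its caller through the new `int *filler` parameter. The sketch below shows that plumbing together with the buffer bookkeeping visible at the tail of `updateVbv` in the ratecontrol.cpp hunk. `VbvSketch` is a hypothetical stand-in, the member types are assumptions, the bookkeeping runs unconditionally here, and the rule that decides how much filler to emit is outside the lines shown, so it is left as a placeholder.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for the RateControl members touched by the hunk;
// the integer member types are an assumption for this sketch.
struct VbvSketch
{
    int64_t bufferRate;
    int64_t bufferSize;
    int64_t bufferExcess;
    int64_t bufferFillActual;

    // Returns the filler size, matching the new `int updateVbv(...)` declaration.
    int updateVbv(int64_t bits)
    {
        int filler = 0; // placeholder: the rule that sets filler is not in the lines shown

        // The four bookkeeping statements from the ratecontrol.cpp hunk.
        int64_t bufferBits = std::min(bits + bufferExcess, bufferRate);
        bufferExcess = std::max<int64_t>(bufferExcess - bufferBits + bits, 0);
        bufferFillActual += bufferBits - bits;
        bufferFillActual = std::min(bufferFillActual, bufferSize);
        return filler;
    }

    // Forwards the filler size through the out-parameter, as in
    // `*filler = updateVbv(actualBits, rce);`.
    int rateControlEnd(int64_t bits, int* filler)
    {
        *filler = updateVbv(bits);
        return 0; // placeholder status code
    }
};
```

A frame larger than `bufferRate` leaves its surplus in `bufferExcess`, and a later, smaller frame drains that surplus back out.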

 

x265_2.2.tar.gz/source/encoder/rdcost.h -> x265_2.3.tar.gz/source/encoder/rdcost.h Changed

@@ -41,9 +41,11 @@
     uint32_t  m_chromaDistWeight[2];
     uint32_t  m_psyRdBase;
     uint32_t  m_psyRd;
+    uint32_t  m_ssimRd;
     int       m_qp; /* QP used to configure lambda, may be higher than QP_MAX_SPEC but <= QP_MAX_MAX */
 
     void setPsyRdScale(double scale)                { m_psyRdBase = (uint32_t)floor(65536.0 * scale * 0.33); }
+    void setSsimRd(int ssimRd) { m_ssimRd = ssimRd; };
 
     void setQP(const Slice& slice, int qp)
     {
@@ -129,6 +131,20 @@
         return distortion + ((m_lambda * m_psyRd * psycost) >> 24) + ((bits * m_lambda2) >> 8);
     }
 
+    inline uint64_t calcSsimRdCost(uint64_t distortion, uint32_t bits, uint32_t ssimCost) const
+    {
+#if X265_DEPTH < 10
+        X265_CHECK((bits <= (UINT64_MAX / m_lambda2)) && (ssimCost <= UINT64_MAX / m_lambda),
+                   "calcPsyRdCost wrap detected dist: " X265_LL " bits: %u, lambda: " X265_LL ", lambda2: " X265_LL "\n",
+                   distortion, bits, m_lambda, m_lambda2);
+#else
+        X265_CHECK((bits <= (UINT64_MAX / m_lambda2)) && (ssimCost <= UINT64_MAX / m_lambda),
+                   "calcPsyRdCost wrap detected dist: " X265_LL ", bits: %u, lambda: " X265_LL ", lambda2: " X265_LL "\n",
+                   distortion, bits, m_lambda, m_lambda2);
+#endif
+        return distortion + ((m_lambda * ssimCost) >> 14) + ((bits * m_lambda2) >> 8);
+    }
+
     inline uint64_t calcRdSADCost(uint32_t sadCost, uint32_t bits) const
     {
         X265_CHECK(bits <= (UINT64_MAX - 128) / m_lambda,
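
Review note on the new `calcSsimRdCost`: it keeps the rate term of `calcPsyRdCost` and replaces the psycho-visual term with `(m_lambda * ssimCost) >> 14`. The sketch below reproduces the two return expressions from the diff and the selection order used throughout search.cpp (psy-rd first, then ssim-rd, then plain RD). `RdCostSketch` and `selectCost` are hypothetical names, the overflow checks are omitted, and the plain fallback is a simplified placeholder because the body of `calcRdCost` is not part of the lines shown.

```cpp
#include <cstdint>

// Hypothetical stand-in for the RDCost members used by the cost formulas.
struct RdCostSketch
{
    uint64_t lambda;   // stand-in for m_lambda
    uint64_t lambda2;  // stand-in for m_lambda2
    uint32_t psyRd;    // stand-in for m_psyRd
    uint32_t ssimRd;   // stand-in for m_ssimRd

    // Return expression of calcPsyRdCost as shown in the diff context.
    uint64_t calcPsyRdCost(uint64_t distortion, uint32_t bits, uint32_t psycost) const
    {
        return distortion + ((lambda * psyRd * psycost) >> 24) + ((bits * lambda2) >> 8);
    }

    // Return expression of the new calcSsimRdCost.
    uint64_t calcSsimRdCost(uint64_t distortion, uint32_t bits, uint32_t ssimCost) const
    {
        return distortion + ((lambda * ssimCost) >> 14) + ((bits * lambda2) >> 8);
    }

    // Placeholder for the plain cost: distortion plus the shared rate term.
    // The upstream calcRdCost body is not shown and may differ.
    uint64_t calcRdCost(uint64_t distortion, uint32_t bits) const
    {
        return distortion + ((bits * lambda2) >> 8);
    }

    // Selection order used by the search.cpp hunks: psy-rd, then ssim-rd, then plain.
    uint64_t selectCost(uint64_t distortion, uint32_t bits, uint32_t energy) const
    {
        if (psyRd)
            return calcPsyRdCost(distortion, bits, energy);
        else if (ssimRd)
            return calcSsimRdCost(distortion, bits, energy);
        return calcRdCost(distortion, bits);
    }
};
```

Because psy-rd is tested first, the SSIM term only contributes when `m_psyRd` is zero.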

 

x265_2.2.tar.gz/source/encoder/search.cpp -> x265_2.3.tar.gz/source/encoder/search.cpp Changed

@@ -78,6 +78,7 @@
     m_numLayers = g_log2Size[param.maxCUSize] - 2;
 
     m_rdCost.setPsyRdScale(param.psyRd);
+    m_rdCost.setSsimRd(param.bSsimRd);
     m_me.init(param.internalCsp);
 
     bool ok = m_quant.init(param.psyRdoq, scalingList, m_entropyCoder);
@@ -417,6 +418,11 @@
             fullCost.energy = m_rdCost.psyCost(sizeIdx, fenc, mode.fencYuv->m_size, reconQt, reconQtStride);
             fullCost.rdcost = m_rdCost.calcPsyRdCost(fullCost.distortion, fullCost.bits, fullCost.energy);
         }
+        else if(m_rdCost.m_ssimRd)
+        {
+            fullCost.energy = m_quant.ssimDistortion(cu, fenc, stride, reconQt, reconQtStride, log2TrSize, TEXT_LUMA, absPartIdx);
+            fullCost.rdcost = m_rdCost.calcSsimRdCost(fullCost.distortion, fullCost.bits, fullCost.energy);
+        }
         else
             fullCost.rdcost = m_rdCost.calcRdCost(fullCost.distortion, fullCost.bits);
     }
@@ -460,6 +466,8 @@
 
             if (m_rdCost.m_psyRd)
                 splitCost.rdcost = m_rdCost.calcPsyRdCost(splitCost.distortion, splitCost.bits, splitCost.energy);
+            else if(m_rdCost.m_ssimRd)
+                splitCost.rdcost = m_rdCost.calcSsimRdCost(splitCost.distortion, splitCost.bits, splitCost.energy);
             else
                 splitCost.rdcost = m_rdCost.calcRdCost(splitCost.distortion, splitCost.bits);
         }
@@ -625,6 +633,11 @@
             tmpEnergy = m_rdCost.psyCost(sizeIdx, fenc, fencYuv->m_size, tmpRecon, tmpReconStride);
             tmpCost = m_rdCost.calcPsyRdCost(tmpDist, tmpBits, tmpEnergy);
         }
+        else if(m_rdCost.m_ssimRd)
+        {
+            tmpEnergy = m_quant.ssimDistortion(cu, fenc, stride, tmpRecon, tmpReconStride, log2TrSize, TEXT_LUMA, absPartIdx);
+            tmpCost = m_rdCost.calcSsimRdCost(tmpDist, tmpBits, tmpEnergy);
+        }
         else
             tmpCost = m_rdCost.calcRdCost(tmpDist, tmpBits);
 
@@ -899,6 +912,8 @@
 
             if (m_rdCost.m_psyRd)
                 outCost.energy += m_rdCost.psyCost(sizeIdxC, fenc, stride, reconQt, reconQtStride);
+            else if(m_rdCost.m_ssimRd)
+                outCost.energy += m_quant.ssimDistortion(cu, fenc, stride, reconQt, reconQtStride, log2TrSizeC, ttype, absPartIdxC);
 
             primitives.cu[sizeIdxC].copy_pp(picReconC, picStride, reconQt, reconQtStride);
         }
@@ -1016,6 +1031,11 @@
                     tmpEnergy = m_rdCost.psyCost(sizeIdxC, fenc, stride, reconQt, reconQtStride);
                     tmpCost = m_rdCost.calcPsyRdCost(tmpDist, tmpBits, tmpEnergy);
                 }
+                else if(m_rdCost.m_ssimRd)
+                {
+                    tmpEnergy = m_quant.ssimDistortion(cu, fenc, stride, reconQt, reconQtStride, log2TrSizeC, ttype, absPartIdxC);
+                    tmpCost = m_rdCost.calcSsimRdCost(tmpDist, tmpBits, tmpEnergy);
+                }
                 else
                     tmpCost = m_rdCost.calcRdCost(tmpDist, tmpBits);
 
@@ -1207,7 +1227,7 @@
     }
     else
         intraMode.distortion += intraMode.lumaDistortion;
-
+    cu.m_distortion[0] = intraMode.distortion;
     m_entropyCoder.resetBits();
     if (m_slice->m_pps->bTransquantBypassEnabled)
         m_entropyCoder.codeCUTransquantBypassFlag(cu.m_tqBypass[0]);
@@ -1229,11 +1249,12 @@
     m_entropyCoder.store(intraMode.contexts);
     intraMode.totalBits = m_entropyCoder.getNumberOfWrittenBits();
     intraMode.coeffBits = intraMode.totalBits - intraMode.mvBits - skipFlagBits;
+    const Yuv* fencYuv = intraMode.fencYuv;
     if (m_rdCost.m_psyRd)
-    {
-        const Yuv* fencYuv = intraMode.fencYuv;
         intraMode.psyEnergy = m_rdCost.psyCost(cuGeom.log2CUSize - 2, fencYuv->m_buf[0], fencYuv->m_size, intraMode.reconYuv.m_buf[0], intraMode.reconYuv.m_size);
-    }
+    else if(m_rdCost.m_ssimRd)
+        intraMode.ssimEnergy = m_quant.ssimDistortion(cu, fencYuv->m_buf[0], fencYuv->m_size, intraMode.reconYuv.m_buf[0], intraMode.reconYuv.m_size, cuGeom.log2CUSize, TEXT_LUMA, 0);
+
     intraMode.resEnergy = primitives.cu[cuGeom.log2CUSize - 2].sse_pp(intraMode.fencYuv->m_buf[0], intraMode.fencYuv->m_size, intraMode.predYuv.m_buf[0], intraMode.predYuv.m_size);
 
     updateModeCost(intraMode);
@@ -1448,12 +1469,13 @@
 
     intraMode.totalBits = m_entropyCoder.getNumberOfWrittenBits();
     intraMode.coeffBits = intraMode.totalBits - intraMode.mvBits - skipFlagBits;
+    const Yuv* fencYuv = intraMode.fencYuv;
     if (m_rdCost.m_psyRd)
-    {
-        const Yuv* fencYuv = intraMode.fencYuv;
         intraMode.psyEnergy = m_rdCost.psyCost(cuGeom.log2CUSize - 2, fencYuv->m_buf[0], fencYuv->m_size, reconYuv->m_buf[0], reconYuv->m_size);
-    }
-    intraMode.resEnergy = primitives.cu[cuGeom.log2CUSize - 2].sse_pp(intraMode.fencYuv->m_buf[0], intraMode.fencYuv->m_size, intraMode.predYuv.m_buf[0], intraMode.predYuv.m_size);
+    else if(m_rdCost.m_ssimRd)
+        intraMode.ssimEnergy = m_quant.ssimDistortion(cu, fencYuv->m_buf[0], fencYuv->m_size, reconYuv->m_buf[0], reconYuv->m_size, cuGeom.log2CUSize, TEXT_LUMA, 0);
+
+    intraMode.resEnergy = primitives.cu[cuGeom.log2CUSize - 2].sse_pp(fencYuv->m_buf[0], fencYuv->m_size, intraMode.predYuv.m_buf[0], intraMode.predYuv.m_size);
     m_entropyCoder.store(intraMode.contexts);
     updateModeCost(intraMode);
     checkDQP(intraMode, cuGeom);
@@ -1778,7 +1800,7 @@
             codeCoeffQTChroma(cu, initTuDepth, absPartIdxC, TEXT_CHROMA_U);
             codeCoeffQTChroma(cu, initTuDepth, absPartIdxC, TEXT_CHROMA_V);
             uint32_t bits = m_entropyCoder.getNumberOfWrittenBits();
-            uint64_t cost = m_rdCost.m_psyRd ? m_rdCost.calcPsyRdCost(outCost.distortion, bits, outCost.energy)
+            uint64_t cost = m_rdCost.m_psyRd ? m_rdCost.calcPsyRdCost(outCost.distortion, bits, outCost.energy) : m_rdCost.m_ssimRd ? m_rdCost.calcSsimRdCost(outCost.distortion, bits, outCost.energy)
                                              : m_rdCost.calcRdCost(outCost.distortion, bits);
 
             if (cost < bestCost)
@@ -2128,7 +2150,7 @@
         cu.getNeighbourMV(puIdx, pu.puAbsPartIdx, interMode.interNeighbours);
 
         /* Uni-directional prediction */
-        if (m_param->analysisMode == X265_ANALYSIS_LOAD)
+        if (m_param->analysisMode == X265_ANALYSIS_LOAD || (m_param->analysisMultiPassRefine && m_param->rc.bStatRead))
         {
             for (int list = 0; list < numPredDir; list++)
             {
@@ -2153,7 +2175,11 @@
                         m_me.integral[planes] = interMode.fencYuv->m_integral[list][ref][planes] + puX * pu.width + puY * pu.height * m_slice->m_refFrameList[list][ref]->m_reconPic->m_stride;
                 }
                 setSearchRange(cu, mvp, m_param->searchRange, mvmin, mvmax);
-                int satdCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvp, numMvc, mvc, m_param->searchRange, outmv,
+                MV mvpIn = mvp;
+                if (m_param->analysisMultiPassRefine && m_param->rc.bStatRead && mvpIdx == bestME[list].mvpIdx)
+                    mvpIn = bestME[list].mv;
+                    
+                int satdCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvpIn, numMvc, mvc, m_param->searchRange, outmv,
                   m_param->bSourceReferenceEstimation ? m_slice->m_refFrameList[list][ref]->m_fencPic->getLumaAddr(0) : 0);
 
                 /* Get total cost of partition, but only include MV bit cost once */
@@ -2162,7 +2188,22 @@
                 uint32_t cost = (satdCost - mvCost) + m_rdCost.getCost(bits);
 
                 /* Refine MVP selection, updates: mvpIdx, bits, cost */
-                mvp = checkBestMVP(amvp, outmv, mvpIdx, bits, cost);
+                if (!m_param->analysisMultiPassRefine)
+                    mvp = checkBestMVP(amvp, outmv, mvpIdx, bits, cost);
+                else
+                {
+                    /* It is more accurate to compare with actual mvp that was used in motionestimate than amvp[mvpIdx]. Here 
+                      the actual mvp is bestME from pass 1 for that mvpIdx */
+                    int diffBits = m_me.bitcost(outmv, amvp[!mvpIdx]) - m_me.bitcost(outmv, mvpIn);
+                    if (diffBits < 0)
+                    {
+                        mvpIdx = !mvpIdx;
+                        uint32_t origOutBits = bits;
+                        bits = origOutBits + diffBits;
+                        cost = (cost - m_rdCost.getCost(origOutBits)) + m_rdCost.getCost(bits);
+                    }
+                    mvp = amvp[mvpIdx];
+                }
 
                 if (cost < bestME[list].cost)
                 {
@@ -2605,6 +2646,7 @@
         interMode.chromaDistortion += m_rdCost.scaleChromaDist(2, primitives.chroma[m_csp].cu[part].sse_pp(fencYuv->m_buf[2], fencYuv->m_csize, reconYuv->m_buf[2], reconYuv->m_csize));
         interMode.distortion += interMode.chromaDistortion;
     }
+    cu.m_distortion[0] = interMode.distortion;
     m_entropyCoder.load(m_rqt[depth].cur);
     m_entropyCoder.resetBits();
     if (m_slice->m_pps->bTransquantBypassEnabled)
@@ -2617,6 +2659,9 @@
     interMode.totalBits = interMode.mvBits + skipFlagBits;
     if (m_rdCost.m_psyRd)
         interMode.psyEnergy = m_rdCost.psyCost(part, fencYuv->m_buf[0], fencYuv->m_size, reconYuv->m_buf[0], reconYuv->m_size);
+    else if(m_rdCost.m_ssimRd)
+        interMode.ssimEnergy = m_quant.ssimDistortion(cu, fencYuv->m_buf[0], fencYuv->m_size, reconYuv->m_buf[0], reconYuv->m_size, cu.m_log2CUSize[0], TEXT_LUMA, 0);
+
     interMode.resEnergy = primitives.cu[part].sse_pp(fencYuv->m_buf[0], fencYuv->m_size, predYuv->m_buf[0], predYuv->m_size);
     updateModeCost(interMode);
     m_entropyCoder.store(interMode.contexts);
@@ -2687,13 +2732,17 @@
         m_entropyCoder.codeQtRootCbfZero();
         uint32_t cbf0Bits = m_entropyCoder.getNumberOfWrittenBits();
 
-        uint64_t cbf0Cost;
-        uint32_t cbf0Energy;
+        uint32_t cbf0Energy; uint64_t cbf0Cost;
         if (m_rdCost.m_psyRd)
         {
             cbf0Energy = m_rdCost.psyCost(log2CUSize - 2, fencYuv->m_buf[0], fencYuv->m_size, predYuv->m_buf[0], predYuv->m_size);
             cbf0Cost = m_rdCost.calcPsyRdCost(cbf0Dist, cbf0Bits, cbf0Energy);
         }
+        else if(m_rdCost.m_ssimRd)
+        {
+            cbf0Energy = m_quant.ssimDistortion(cu, fencYuv->m_buf[0], fencYuv->m_size, predYuv->m_buf[0], predYuv->m_size, log2CUSize, TEXT_LUMA, 0);
+            cbf0Cost = m_rdCost.calcSsimRdCost(cbf0Dist, cbf0Bits, cbf0Energy);
+        }
         else
             cbf0Cost = m_rdCost.calcRdCost(cbf0Dist, cbf0Bits);
 
@@ -2762,11 +2811,15 @@
     }
     if (m_rdCost.m_psyRd)
         interMode.psyEnergy = m_rdCost.psyCost(sizeIdx, fencYuv->m_buf[0], fencYuv->m_size, reconYuv->m_buf[0], reconYuv->m_size);
+    else if(m_rdCost.m_ssimRd)
+        interMode.ssimEnergy = m_quant.ssimDistortion(cu, fencYuv->m_buf[0], fencYuv->m_size, reconYuv->m_buf[0], reconYuv->m_size, cu.m_log2CUSize[0], TEXT_LUMA, 0);
+
     interMode.resEnergy = primitives.cu[sizeIdx].sse_pp(fencYuv->m_buf[0], fencYuv->m_size, predYuv->m_buf[0], predYuv->m_size);
     interMode.totalBits = bits;
     interMode.lumaDistortion = bestLumaDist;
     interMode.coeffBits = coeffBits;
     interMode.mvBits = mvBits;
+    cu.m_distortion[0] = interMode.distortion;
     updateModeCost(interMode);
     checkDQP(interMode, cuGeom);
 }
@@ -2908,12 +2961,14 @@
     }
 }
 
-uint64_t Search::estimateNullCbfCost(sse_t dist, uint32_t psyEnergy, uint32_t tuDepth, TextType compId)
+uint64_t Search::estimateNullCbfCost(sse_t dist, uint32_t energy, uint32_t tuDepth, TextType compId)
 {
     uint32_t nullBits = m_entropyCoder.estimateCbfBits(0, compId, tuDepth);
 
     if (m_rdCost.m_psyRd)
-        return m_rdCost.calcPsyRdCost(dist, nullBits, psyEnergy);
+        return m_rdCost.calcPsyRdCost(dist, nullBits, energy);
+    else if(m_rdCost.m_ssimRd)
+        return m_rdCost.calcSsimRdCost(dist, nullBits, energy);
     else
         return m_rdCost.calcRdCost(dist, nullBits);
 }
@@ -2962,6 +3017,8 @@
 
     if (m_rdCost.m_psyRd)
         splitCost.rdcost = m_rdCost.calcPsyRdCost(splitCost.distortion, splitCost.bits, splitCost.energy);
+    else if(m_rdCost.m_ssimRd)
+        splitCost.rdcost = m_rdCost.calcSsimRdCost(splitCost.distortion, splitCost.bits, splitCost.energy);
     else
         splitCost.rdcost = m_rdCost.calcRdCost(splitCost.distortion, splitCost.bits);
         
@@ -3034,7 +3091,7 @@
     uint32_t numSig[MAX_NUM_COMPONENT][2 /*0 = top (or whole TU for non-4:2:2) sub-TU, 1 = bottom sub-TU*/] = { { 0, 0 }, {0, 0}, {0, 0} };
     uint32_t singleBits[MAX_NUM_COMPONENT][2 /*0 = top (or whole TU for non-4:2:2) sub-TU, 1 = bottom sub-TU*/] = { { 0, 0 }, { 0, 0 }, { 0, 0 } };
     sse_t singleDist[MAX_NUM_COMPONENT][2 /*0 = top (or whole TU for non-4:2:2) sub-TU, 1 = bottom sub-TU*/] = { { 0, 0 }, { 0, 0 }, { 0, 0 } };
-    uint32_t singlePsyEnergy[MAX_NUM_COMPONENT][2 /*0 = top (or whole TU for non-4:2:2) sub-TU, 1 = bottom sub-TU*/] = { { 0, 0 }, { 0, 0 }, { 0, 0 } };
+    uint32_t singleEnergy[MAX_NUM_COMPONENT][2 /*0 = top (or whole TU for non-4:2:2) sub-TU, 1 = bottom sub-TU*/] = { { 0, 0 }, { 0, 0 }, { 0, 0 } };
     uint32_t bestTransformMode[MAX_NUM_COMPONENT][2 /*0 = top (or whole TU for non-4:2:2) sub-TU, 1 = bottom sub-TU*/] = { { 0, 0 }, { 0, 0 }, { 0, 0 } };
     uint64_t minCost[MAX_NUM_COMPONENT][2 /*0 = top (or whole TU for non-4:2:2) sub-TU, 1 = bottom sub-TU*/] = { { MAX_INT64, MAX_INT64 }, {MAX_INT64, MAX_INT64}, {MAX_INT64, MAX_INT64} };
 
@@ -3083,9 +3140,11 @@
 
         //Assuming zero residual 
         sse_t zeroDistY = primitives.cu[partSize].sse_pp(fenc, fencYuv->m_size, mode.predYuv.getLumaAddr(absPartIdx), mode.predYuv.m_size);
-        uint32_t zeroPsyEnergyY = 0;
+        uint32_t zeroEnergyY = 0;
         if (m_rdCost.m_psyRd)
-            zeroPsyEnergyY = m_rdCost.psyCost(partSize, fenc, fencYuv->m_size, mode.predYuv.getLumaAddr(absPartIdx), mode.predYuv.m_size);
+            zeroEnergyY = m_rdCost.psyCost(partSize, fenc, fencYuv->m_size, mode.predYuv.getLumaAddr(absPartIdx), mode.predYuv.m_size);
+        else if(m_rdCost.m_ssimRd)
+            zeroEnergyY = m_quant.ssimDistortion(cu, fenc, fencYuv->m_size, mode.predYuv.getLumaAddr(absPartIdx), mode.predYuv.m_size, log2TrSize, TEXT_LUMA, absPartIdx);
 
         int16_t* curResiY = m_rqt[qtLayer].resiQtYuv.getLumaAddr(absPartIdx);
         uint32_t strideResiY = m_rqt[qtLayer].resiQtYuv.m_size;
@@ -3102,11 +3161,16 @@
 
             const sse_t nonZeroDistY = primitives.cu[partSize].sse_pp(fenc, fencYuv->m_size, curReconY, strideReconY);
             uint32_t nzCbfBitsY = m_entropyCoder.estimateCbfBits(cbfFlag[TEXT_LUMA][0], TEXT_LUMA, tuDepth);
-            uint32_t nonZeroPsyEnergyY = 0; uint64_t singleCostY = 0;
+            uint32_t nonZeroEnergyY = 0; uint64_t singleCostY = 0;
             if (m_rdCost.m_psyRd)
             {
-                nonZeroPsyEnergyY = m_rdCost.psyCost(partSize, fenc, fencYuv->m_size, curReconY, strideReconY);
-                singleCostY = m_rdCost.calcPsyRdCost(nonZeroDistY, nzCbfBitsY + singleBits[TEXT_LUMA][0], nonZeroPsyEnergyY);
+                nonZeroEnergyY = m_rdCost.psyCost(partSize, fenc, fencYuv->m_size, curReconY, strideReconY);
+                singleCostY = m_rdCost.calcPsyRdCost(nonZeroDistY, nzCbfBitsY + singleBits[TEXT_LUMA][0], nonZeroEnergyY);
+            }
+            else if(m_rdCost.m_ssimRd)
+            {
+                nonZeroEnergyY = m_quant.ssimDistortion(cu, fenc, fencYuv->m_size, curReconY, strideReconY, log2TrSize, TEXT_LUMA, absPartIdx);
+                singleCostY = m_rdCost.calcSsimRdCost(nonZeroDistY, nzCbfBitsY + singleBits[TEXT_LUMA][0], nonZeroEnergyY);
             }
             else
                 singleCostY = m_rdCost.calcRdCost(nonZeroDistY, nzCbfBitsY + singleBits[TEXT_LUMA][0]);
@@ -3114,14 +3178,14 @@
             if (cu.m_tqBypass[0])
             {
                 singleDist[TEXT_LUMA][0] = nonZeroDistY;
-                singlePsyEnergy[TEXT_LUMA][0] = nonZeroPsyEnergyY;
+                singleEnergy[TEXT_LUMA][0] = nonZeroEnergyY;
             }
             else
             {
                 // zero-cost calculation for luma. This is an approximation
                 // Initial cost calculation was also an approximation. First resetting the bit counter and then encoding zero cbf.
                 // Now encoding the zero cbf without writing into bitstream, keeping m_fracBits unchanged. The same is valid for chroma.
-                uint64_t nullCostY = estimateNullCbfCost(zeroDistY, zeroPsyEnergyY, tuDepth, TEXT_LUMA);
+                uint64_t nullCostY = estimateNullCbfCost(zeroDistY, zeroEnergyY, tuDepth, TEXT_LUMA);
 
                 if (nullCostY < singleCostY)
                 {
@@ -3135,25 +3199,25 @@
                     if (checkTransformSkipY)
                         minCost[TEXT_LUMA][0] = nullCostY;
                     singleDist[TEXT_LUMA][0] = zeroDistY;
-                    singlePsyEnergy[TEXT_LUMA][0] = zeroPsyEnergyY;
+                    singleEnergy[TEXT_LUMA][0] = zeroEnergyY;
                 }
                 else
                 {
                     if (checkTransformSkipY)
                         minCost[TEXT_LUMA][0] = singleCostY;
                     singleDist[TEXT_LUMA][0] = nonZeroDistY;
-                    singlePsyEnergy[TEXT_LUMA][0] = nonZeroPsyEnergyY;
+                    singleEnergy[TEXT_LUMA][0] = nonZeroEnergyY;
                 }
             }
         }
         else
         {
             if (checkTransformSkipY)
-                minCost[TEXT_LUMA][0] = estimateNullCbfCost(zeroDistY, zeroPsyEnergyY, tuDepth, TEXT_LUMA);
+                minCost[TEXT_LUMA][0] = estimateNullCbfCost(zeroDistY, zeroEnergyY, tuDepth, TEXT_LUMA);
             primitives.cu[partSize].blockfill_s(curResiY, strideResiY, 0);
             singleDist[TEXT_LUMA][0] = zeroDistY;
             singleBits[TEXT_LUMA][0] = 0;
-            singlePsyEnergy[TEXT_LUMA][0] = zeroPsyEnergyY;
+            singleEnergy[TEXT_LUMA][0] = zeroEnergyY;
         }
 
         cu.setCbfSubParts(cbfFlag[TEXT_LUMA][0] << tuDepth, TEXT_LUMA, absPartIdx, depth);
@@ -3165,7 +3229,7 @@
             for (uint32_t chromaId = TEXT_CHROMA_U; chromaId <= TEXT_CHROMA_V; chromaId++)
             {
                 sse_t zeroDistC = 0;
-                uint32_t zeroPsyEnergyC = 0;
+                uint32_t zeroEnergyC = 0;
                 coeff_t* coeffCurC = m_rqt[qtLayer].coeffRQT[chromaId] + coeffOffsetC;
                 TURecurse tuIterator(splitIntoSubTUs ? VERTICAL_SPLIT : DONT_SPLIT, absPartIdxStep, absPartIdx);
 
@@ -3193,9 +3257,11 @@
                     int16_t* curResiC = m_rqt[qtLayer].resiQtYuv.getChromaAddr(chromaId, absPartIdxC);
                     zeroDistC = m_rdCost.scaleChromaDist(chromaId, primitives.cu[log2TrSizeC - 2].sse_pp(fenc, fencYuv->m_csize, mode.predYuv.getChromaAddr(chromaId, absPartIdxC), mode.predYuv.m_csize));
 
+                    // Assuming zero residual 
                     if (m_rdCost.m_psyRd)
-                    //Assuming zero residual 
-                        zeroPsyEnergyC = m_rdCost.psyCost(partSizeC, fenc, fencYuv->m_csize, mode.predYuv.getChromaAddr(chromaId, absPartIdxC), mode.predYuv.m_csize);
+                        zeroEnergyC = m_rdCost.psyCost(partSizeC, fenc, fencYuv->m_csize, mode.predYuv.getChromaAddr(chromaId, absPartIdxC), mode.predYuv.m_csize);
+                    else if(m_rdCost.m_ssimRd)
+                        zeroEnergyC = m_quant.ssimDistortion(cu, fenc, fencYuv->m_csize, mode.predYuv.getChromaAddr(chromaId, absPartIdxC), mode.predYuv.m_csize, log2TrSizeC, (TextType)chromaId, absPartIdxC);
 
                     if (cbfFlag[chromaId][tuIterator.section])
                     {
@@ -3209,11 +3275,16 @@
                         primitives.cu[partSizeC].add_ps(curReconC, strideReconC, mode.predYuv.getChromaAddr(chromaId, absPartIdxC), curResiC, mode.predYuv.m_csize, strideResiC);
                         sse_t nonZeroDistC = m_rdCost.scaleChromaDist(chromaId, primitives.cu[partSizeC].sse_pp(fenc, fencYuv->m_csize, curReconC, strideReconC));
                         uint32_t nzCbfBitsC = m_entropyCoder.estimateCbfBits(cbfFlag[chromaId][tuIterator.section], (TextType)chromaId, tuDepth);
-                        uint32_t nonZeroPsyEnergyC = 0; uint64_t singleCostC = 0;
+                        uint32_t nonZeroEnergyC = 0; uint64_t singleCostC = 0;
                         if (m_rdCost.m_psyRd)
                         {
-                            nonZeroPsyEnergyC = m_rdCost.psyCost(partSizeC, fenc, fencYuv->m_csize, curReconC, strideReconC);
-                            singleCostC = m_rdCost.calcPsyRdCost(nonZeroDistC, nzCbfBitsC + singleBits[chromaId][tuIterator.section], nonZeroPsyEnergyC);
+                            nonZeroEnergyC = m_rdCost.psyCost(partSizeC, fenc, fencYuv->m_csize, curReconC, strideReconC);
+                            singleCostC = m_rdCost.calcPsyRdCost(nonZeroDistC, nzCbfBitsC + singleBits[chromaId][tuIterator.section], nonZeroEnergyC);
+                        }
+                        else if(m_rdCost.m_ssimRd)
+                        {
+                            nonZeroEnergyC = m_quant.ssimDistortion(cu, fenc, fencYuv->m_csize, curReconC, strideReconC, log2TrSizeC, (TextType)chromaId, absPartIdxC);
+                            singleCostC = m_rdCost.calcSsimRdCost(nonZeroDistC, nzCbfBitsC + singleBits[chromaId][tuIterator.section], nonZeroEnergyC);
                         }
                         else
                             singleCostC = m_rdCost.calcRdCost(nonZeroDistC, nzCbfBitsC + singleBits[chromaId][tuIterator.section]);
@@ -3221,12 +3292,12 @@
                         if (cu.m_tqBypass[0])
                         {
                             singleDist[chromaId][tuIterator.section] = nonZeroDistC;
-                            singlePsyEnergy[chromaId][tuIterator.section] = nonZeroPsyEnergyC;
+                            singleEnergy[chromaId][tuIterator.section] = nonZeroEnergyC;
                         }
                         else
                         {
                             //zero-cost calculation for chroma. This is an approximation
-                            uint64_t nullCostC = estimateNullCbfCost(zeroDistC, zeroPsyEnergyC, tuDepth, (TextType)chromaId);
+                            uint64_t nullCostC = estimateNullCbfCost(zeroDistC, zeroEnergyC, tuDepth, (TextType)chromaId);
 
                             if (nullCostC < singleCostC)
                             {
@@ -3240,25 +3311,25 @@
                                 if (checkTransformSkipC)
                                     minCost[chromaId][tuIterator.section] = nullCostC;
                                 singleDist[chromaId][tuIterator.section] = zeroDistC;
-                                singlePsyEnergy[chromaId][tuIterator.section] = zeroPsyEnergyC;
+                                singleEnergy[chromaId][tuIterator.section] = zeroEnergyC;
                             }
                             else
                             {
                                 if (checkTransformSkipC)
                                     minCost[chromaId][tuIterator.section] = singleCostC;
                                 singleDist[chromaId][tuIterator.section] = nonZeroDistC;
-                                singlePsyEnergy[chromaId][tuIterator.section] = nonZeroPsyEnergyC;
+                                singleEnergy[chromaId][tuIterator.section] = nonZeroEnergyC;
                             }
                         }
                     }
                     else
                     {
                         if (checkTransformSkipC)
-                            minCost[chromaId][tuIterator.section] = estimateNullCbfCost(zeroDistC, zeroPsyEnergyC, tuDepthC, (TextType)chromaId);
+                            minCost[chromaId][tuIterator.section] = estimateNullCbfCost(zeroDistC, zeroEnergyC, tuDepthC, (TextType)chromaId);
                         primitives.cu[partSizeC].blockfill_s(curResiC, strideResiC, 0);
                         singleBits[chromaId][tuIterator.section] = 0;
                         singleDist[chromaId][tuIterator.section] = zeroDistC;
-                        singlePsyEnergy[chromaId][tuIterator.section] = zeroPsyEnergyC;
+                        singleEnergy[chromaId][tuIterator.section] = zeroEnergyC;
                     }
 
                     cu.setCbfPartRange(cbfFlag[chromaId][tuIterator.section] << tuDepth, (TextType)chromaId, absPartIdxC, tuIterator.absPartIdxStep);
@@ -3283,7 +3354,7 @@
         if (checkTransformSkipY)
         {
             sse_t nonZeroDistY = 0;
-            uint32_t nonZeroPsyEnergyY = 0;
+            uint32_t nonZeroEnergyY = 0;
             uint64_t singleCostY = MAX_INT64;
 
             m_entropyCoder.load(m_rqt[depth].rqtRoot);
@@ -3311,8 +3382,13 @@
 
                 if (m_rdCost.m_psyRd)
                 {
-                    nonZeroPsyEnergyY = m_rdCost.psyCost(partSize, fenc, fencYuv->m_size, m_tsRecon, trSize);
-                    singleCostY = m_rdCost.calcPsyRdCost(nonZeroDistY, skipSingleBitsY, nonZeroPsyEnergyY);
+                    nonZeroEnergyY = m_rdCost.psyCost(partSize, fenc, fencYuv->m_size, m_tsRecon, trSize);
+                    singleCostY = m_rdCost.calcPsyRdCost(nonZeroDistY, skipSingleBitsY, nonZeroEnergyY);
+                }
+                else if(m_rdCost.m_ssimRd)
+                {
+                    nonZeroEnergyY = m_quant.ssimDistortion(cu, fenc, fencYuv->m_size, m_tsRecon, trSize, log2TrSize, TEXT_LUMA, absPartIdx);
+                    singleCostY = m_rdCost.calcSsimRdCost(nonZeroDistY, skipSingleBitsY, nonZeroEnergyY);
                 }
                 else
                     singleCostY = m_rdCost.calcRdCost(nonZeroDistY, skipSingleBitsY);
@@ -3323,7 +3399,7 @@
             else
             {
                 singleDist[TEXT_LUMA][0] = nonZeroDistY;
-                singlePsyEnergy[TEXT_LUMA][0] = nonZeroPsyEnergyY;
+                singleEnergy[TEXT_LUMA][0] = nonZeroEnergyY;
                 cbfFlag[TEXT_LUMA][0] = !!numSigTSkipY;
                 bestTransformMode[TEXT_LUMA][0] = 1;
                 if (m_param->limitTU)
@@ -3339,7 +3415,7 @@
         if (codeChroma && checkTransformSkipC)
         {
             sse_t nonZeroDistC = 0;
-            uint32_t nonZeroPsyEnergyC = 0;
+            uint32_t nonZeroEnergyC = 0;
             uint64_t singleCostC = MAX_INT64;
             uint32_t strideResiC = m_rqt[qtLayer].resiQtYuv.m_csize;
             uint32_t coeffOffsetC = coeffOffsetY >> (m_hChromaShift + m_vChromaShift);
@@ -3382,9 +3458,13 @@
                         nonZeroDistC = m_rdCost.scaleChromaDist(chromaId, primitives.cu[partSizeC].sse_pp(fenc, fencYuv->m_csize, m_tsRecon, trSizeC));
                         if (m_rdCost.m_psyRd)
                         {
-
-                            nonZeroPsyEnergyC = m_rdCost.psyCost(partSizeC, fenc, fencYuv->m_csize, m_tsRecon, trSizeC);
-                            singleCostC = m_rdCost.calcPsyRdCost(nonZeroDistC, singleBits[chromaId][tuIterator.section], nonZeroPsyEnergyC);
+                            nonZeroEnergyC = m_rdCost.psyCost(partSizeC, fenc, fencYuv->m_csize, m_tsRecon, trSizeC);
+                            singleCostC = m_rdCost.calcPsyRdCost(nonZeroDistC, singleBits[chromaId][tuIterator.section], nonZeroEnergyC);
+                        }
+                        else if(m_rdCost.m_ssimRd)
+                        {
+                            nonZeroEnergyC = m_quant.ssimDistortion(cu, fenc, mode.fencYuv->m_csize, m_tsRecon, trSizeC, log2TrSizeC, (TextType)chromaId, absPartIdxC);
+                            singleCostC = m_rdCost.calcSsimRdCost(nonZeroDistC, singleBits[chromaId][tuIterator.section], nonZeroEnergyC);
                         }
                         else
                             singleCostC = m_rdCost.calcRdCost(nonZeroDistC, singleBits[chromaId][tuIterator.section]);
@@ -3395,7 +3475,7 @@
                     else
                     {
                         singleDist[chromaId][tuIterator.section] = nonZeroDistC;
-                        singlePsyEnergy[chromaId][tuIterator.section] = nonZeroPsyEnergyC;
+                        singleEnergy[chromaId][tuIterator.section] = nonZeroEnergyC;
                         cbfFlag[chromaId][tuIterator.section] = !!numSigTSkipC;
                         bestTransformMode[chromaId][tuIterator.section] = 1;
                         uint32_t numCoeffC = 1 << (log2TrSizeC << 1);
@@ -3454,7 +3534,7 @@
         fullCost.bits = bSplitPresentFlag ? cbfBits + coeffBits : coeffBits;
 
         fullCost.distortion += singleDist[TEXT_LUMA][0];
-        fullCost.energy += singlePsyEnergy[TEXT_LUMA][0];// need to check we need to add chroma also
+        fullCost.energy += singleEnergy[TEXT_LUMA][0];// need to check we need to add chroma also
         for (uint32_t subTUIndex = 0; subTUIndex < 2; subTUIndex++)
         {
             fullCost.distortion += singleDist[TEXT_CHROMA_U][subTUIndex];
@@ -3463,6 +3543,8 @@
 
         if (m_rdCost.m_psyRd)
             fullCost.rdcost = m_rdCost.calcPsyRdCost(fullCost.distortion, fullCost.bits, fullCost.energy);
+        else if(m_rdCost.m_ssimRd)
+            fullCost.rdcost = m_rdCost.calcSsimRdCost(fullCost.distortion, fullCost.bits, fullCost.energy);
         else
             fullCost.rdcost = m_rdCost.calcRdCost(fullCost.distortion, fullCost.bits);
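Reader's note (not part of the upstream patch): most hunks in the search.cpp diff above add the same `else if(m_rdCost.m_ssimRd)` branch between the existing psy-rd path and the plain RD path. The sketch below only illustrates that branch order. `ToyRdCost` and its `select` helper are hypothetical names introduced for illustration, and the three cost formulas are placeholders, not x265's actual computations.

```cpp
#include <cstdint>

// Hypothetical stand-in for the encoder's RD-cost object. The flag names
// echo m_psyRd / m_ssimRd from the diff; everything else is an assumption.
struct ToyRdCost
{
    bool     psyRd    = false;
    bool     ssimRd   = false;
    uint64_t lambda   = 4;  // placeholder Lagrange multiplier
    uint64_t psyScale = 2;  // placeholder weight applied to psy energy

    // Placeholder: distortion plus weighted bit cost
    uint64_t calcRdCost(uint64_t dist, uint32_t bits) const
    { return dist + lambda * bits; }

    // Placeholder: plain cost plus scaled psycho-visual energy
    uint64_t calcPsyRdCost(uint64_t dist, uint32_t bits, uint32_t energy) const
    { return dist + lambda * bits + psyScale * energy; }

    // Placeholder: plain cost plus the SSIM-derived energy term
    uint64_t calcSsimRdCost(uint64_t dist, uint32_t bits, uint32_t energy) const
    { return dist + lambda * bits + energy; }

    // Same precedence as the patched call sites:
    // psy-rd first, then ssim-rd, otherwise the plain RD cost.
    uint64_t select(uint64_t dist, uint32_t bits, uint32_t energy) const
    {
        if (psyRd)
            return calcPsyRdCost(dist, bits, energy);
        if (ssimRd)
            return calcSsimRdCost(dist, bits, energy);
        return calcRdCost(dist, bits);
    }
};
```

With this ordering, a call site that sees both flags set takes the psy-rd path, which is what the `if` / `else if` / `else` chains in the diff imply. The patch itself repeats the chain at each call site rather than using a shared helper like `select`.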

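Reader's note (not part of the upstream patch): the multi-pass hunk in the search.cpp diff (the one guarded by `analysisMultiPassRefine`) replaces the `checkBestMVP` call with an inline comparison against the predictor that was actually passed to motion estimation. A minimal sketch of that arithmetic follows. `toyBitcost`, `toyGetCost`, `Refined` and `refineMvp` are hypothetical names, and the bit and cost estimates are placeholders rather than x265's tables.

```cpp
#include <cstdint>
#include <cstdlib>

struct MV { int x, y; };

// Placeholder MV bit estimate (assumption): sum of absolute component differences
inline int toyBitcost(MV mv, MV pred)
{ return std::abs(mv.x - pred.x) + std::abs(mv.y - pred.y); }

// Placeholder lambda-weighted bit cost (assumption)
inline uint32_t toyGetCost(uint32_t bits) { return 4 * bits; }

struct Refined { int mvpIdx; uint32_t bits; uint32_t cost; };

// Mirrors the structure of the added else-branch: compare the other AMVP
// candidate against the predictor actually used (mvpIn), and switch the
// index only when the other candidate signals outmv in fewer bits.
Refined refineMvp(const MV amvp[2], MV mvpIn, MV outmv,
                  int mvpIdx, uint32_t bits, uint32_t cost)
{
    int diffBits = toyBitcost(outmv, amvp[!mvpIdx]) - toyBitcost(outmv, mvpIn);
    if (diffBits < 0)
    {
        mvpIdx = !mvpIdx;
        uint32_t origBits = bits;
        bits = origBits + diffBits;  // diffBits is negative here, so bits shrinks
        cost = (cost - toyGetCost(origBits)) + toyGetCost(bits);
    }
    return { mvpIdx, bits, cost };
}
```

For example, with candidates `{0,0}` and `{10,10}`, `mvpIn = {0,0}`, and `outmv = {9,9}`, the second candidate is cheaper under the placeholder estimate, so the index flips and the bit cost is rebased onto the new bit count.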
 

x265_2.2.tar.gz/source/encoder/search.h -> x265_2.3.tar.gz/source/encoder/search.h Changed

@@ -118,6 +118,7 @@
     uint64_t    sa8dCost;   // sum of partition sa8d distortion costs   (sa8d(fenc, pred) + lambda * bits)
     uint32_t    sa8dBits;   // signal bits used in sa8dCost calculation
     uint32_t    psyEnergy;  // sum of partition psycho-visual energy difference
+    uint32_t    ssimEnergy;
     sse_t   resEnergy;  // sum of partition residual energy after motion prediction
     sse_t   lumaDistortion;
     sse_t   chromaDistortion;
@@ -132,6 +133,7 @@
         sa8dCost = 0;
         sa8dBits = 0;
         psyEnergy = 0;
+        ssimEnergy = 0;
         resEnergy = 0;
         lumaDistortion = 0;
         chromaDistortion = 0;
@@ -147,6 +149,7 @@
         sa8dCost += subMode.sa8dCost;
         sa8dBits += subMode.sa8dBits;
         psyEnergy += subMode.psyEnergy;
+        ssimEnergy += subMode.ssimEnergy;
         resEnergy += subMode.resEnergy;
         lumaDistortion += subMode.lumaDistortion;
         chromaDistortion += subMode.chromaDistortion;
@@ -390,7 +393,7 @@
         Entropy rqtStore[NUM_SUBPART];
     } m_cacheTU;
 
-    uint64_t estimateNullCbfCost(sse_t dist, uint32_t psyEnergy, uint32_t tuDepth, TextType compId);
+    uint64_t estimateNullCbfCost(sse_t dist, uint32_t energy, uint32_t tuDepth, TextType compId);
     bool     splitTU(Mode& mode, const CUGeom& cuGeom, uint32_t absPartIdx, uint32_t tuDepth, ShortYuv& resiYuv, Cost& splitCost, const uint32_t depthRange[2], int32_t splitMore);
     void     estimateResidualQT(Mode& mode, const CUGeom& cuGeom, uint32_t absPartIdx, uint32_t depth, ShortYuv& resiYuv, Cost& costs, const uint32_t depthRange[2], int32_t splitMore = -1);
 
@@ -430,7 +433,9 @@
     // get most probable luma modes for CU part, and bit cost of all non mpm modes
     uint32_t getIntraRemModeBits(CUData & cu, uint32_t absPartIdx, uint32_t mpmModes[3], uint64_t& mpms) const;
 
-    void updateModeCost(Mode& m) const { m.rdCost = m_rdCost.m_psyRd ? m_rdCost.calcPsyRdCost(m.distortion, m.totalBits, m.psyEnergy) : m_rdCost.calcRdCost(m.distortion, m.totalBits); }
+    void updateModeCost(Mode& m) const { m.rdCost = m_rdCost.m_psyRd ? m_rdCost.calcPsyRdCost(m.distortion, m.totalBits, m.psyEnergy)
+                                                : (m_rdCost.m_ssimRd ? m_rdCost.calcSsimRdCost(m.distortion, m.totalBits, m.ssimEnergy) 
+                                                : m_rdCost.calcRdCost(m.distortion, m.totalBits)); }
 };
 }
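
Note on the updateModeCost() change above: the nested ternary gives psy-rd priority over ssim-rd, with plain RD cost as the fallback. The following is a minimal sketch of that selection order only. CostModel, its three cost functions, and the lambda value are hypothetical placeholders, since the real formulas behind calcPsyRdCost() and calcSsimRdCost() are not part of this diff.

```cpp
#include <cstdint>

// Hypothetical stand-in for the encoder's RD-cost object. The weights
// below are placeholders, not the formulas used by x265.
struct CostModel
{
    bool     psyRd  = false;
    bool     ssimRd = false;
    uint64_t lambda = 4; // placeholder Lagrange multiplier

    uint64_t plainCost(uint64_t dist, uint32_t bits) const
    {
        return dist + lambda * bits;
    }
    uint64_t psyCost(uint64_t dist, uint32_t bits, uint32_t energy) const
    {
        return plainCost(dist, bits) + 2 * static_cast<uint64_t>(energy);
    }
    uint64_t ssimCost(uint64_t dist, uint32_t bits, uint32_t energy) const
    {
        return plainCost(dist, bits) + 3 * static_cast<uint64_t>(energy);
    }
};

enum class CostKind { Plain, Psy, Ssim };

// Same precedence as the nested ternary in the diff: psy-rd first,
// then ssim-rd, then the plain RD cost.
CostKind selectCostKind(const CostModel& m)
{
    if (m.psyRd)
        return CostKind::Psy;
    if (m.ssimRd)
        return CostKind::Ssim;
    return CostKind::Plain;
}

// Each mode carries a separate energy term per metric, mirroring the
// psyEnergy and ssimEnergy fields in the diff.
uint64_t modeCost(const CostModel& m, uint64_t dist, uint32_t bits,
                  uint32_t psyEnergy, uint32_t ssimEnergy)
{
    switch (selectCostKind(m))
    {
    case CostKind::Psy:  return m.psyCost(dist, bits, psyEnergy);
    case CostKind::Ssim: return m.ssimCost(dist, bits, ssimEnergy);
    default:             return m.plainCost(dist, bits);
    }
}
```

Because psy-rd is tested first, ssim-rd only takes effect when psy-rd is off, which is consistent with the regression test line that pairs --no-psy-rd with --ssim-rd.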

 

x265_2.2.tar.gz/source/encoder/slicetype.cpp -> x265_2.3.tar.gz/source/encoder/slicetype.cpp Changed

@@ -563,7 +563,7 @@
     m_lastKeyframe = -m_param->keyframeMax;
     m_sliceTypeBusy = false;
     m_fullQueueSize = X265_MAX(1, m_param->lookaheadDepth);
-    m_bAdaptiveQuant = m_param->rc.aqMode || m_param->bEnableWeightedPred || m_param->bEnableWeightedBiPred;
+    m_bAdaptiveQuant = m_param->rc.aqMode || m_param->bEnableWeightedPred || m_param->bEnableWeightedBiPred || m_param->bAQMotion;
 
     /* If we have a thread pool and are using --b-adapt 2, it is generally
      * preferable to perform all motion searches for each lowres frame in large
@@ -656,8 +656,12 @@
         if (wait)
             m_outputSignal.wait();
     }
+    if (m_pool && m_param->lookaheadThreads > 0)
+    {
+        for (int i = 0; i < m_numPools; i++)
+            m_pool[i].stopWorkers();
+    }
 }
-
 void Lookahead::destroy()
 {
     // these two queues will be empty unless the encode was aborted
@@ -676,10 +680,10 @@
     }
 
     X265_FREE(m_scratch);
-
     delete [] m_tld;
+    if (m_param->lookaheadThreads > 0)
+        delete [] m_pool;
 }
-
 /* The synchronization of slicetypeDecide is managed here.  The findJob() method
  * polls the occupancy of the input queue. If the queue is
  * full, it will run slicetypeDecide() and output a mini-gop of frames to the
@@ -868,7 +872,7 @@
         uint32_t widthInLowresCu = (uint32_t)m_8x8Width, heightInLowresCu = (uint32_t)m_8x8Height;
         double *qp_offset = 0;
         /* Factor in qpoffsets based on Aq/Cutree in CU costs */
-        if (m_param->rc.aqMode)
+        if (m_param->rc.aqMode || m_param->bAQMotion)
             qp_offset = (frames[b]->sliceType == X265_TYPE_B || !m_param->rc.cuTree) ? frames[b]->qpAqOffset : frames[b]->qpCuTreeOffset;
 
         for (uint32_t row = 0; row < numCuInHeight; row++)
@@ -1262,7 +1266,7 @@
     CostEstimateGroup estGroup(*this, frames);
     int64_t cost = estGroup.singleCost(p0, p1, b);
 
-    if (m_param->rc.aqMode)
+    if (m_param->rc.aqMode || m_param->bAQMotion)
     {
         if (m_param->rc.cuTree)
             return frameCostRecalculate(frames, p0, p1, b);
@@ -1491,6 +1495,8 @@
 
         resetStart = bKeyframe ? 1 : 2;
     }
+    if (m_param->bAQMotion)
+        aqMotion(frames, bKeyframe);
 
     if (m_param->rc.cuTree)
         cuTree(frames, X265_MIN(numFrames, m_param->keyframeMax), bKeyframe);
@@ -1720,6 +1726,88 @@
 
     return cost;
 }
+void Lookahead::aqMotion(Lowres **frames, bool bIntra)
+{
+    if (!bIntra)
+    {
+        int curnonb = 0, lastnonb = 1;
+        int bframes = 0, i = 1;
+        while (frames[lastnonb]->sliceType != X265_TYPE_P)
+            lastnonb++;
+        bframes = lastnonb - 1;
+        if (m_param->bBPyramid && bframes > 1)
+        {
+            int middle = (bframes + 1) / 2;
+            for (i = 1; i < lastnonb; i++)
+            {
+                int p0 = i > middle ? middle : curnonb;
+                int p1 = i < middle ? middle : lastnonb;
+                if (i != middle)
+                    calcMotionAdaptiveQuantFrame(frames, p0, p1, i);
+            }
+            calcMotionAdaptiveQuantFrame(frames, curnonb, lastnonb, middle);
+        }
+        else
+            for (i = 1; i < lastnonb; i++)
+                calcMotionAdaptiveQuantFrame(frames, curnonb, lastnonb, i);
+        calcMotionAdaptiveQuantFrame(frames, curnonb, lastnonb, lastnonb);
+    }
+}
+
+void Lookahead::calcMotionAdaptiveQuantFrame(Lowres **frames, int p0, int p1, int b)
+{
+    int listDist[2] = { b - p0 - 1, p1 - b - 1 };
+    int32_t strideInCU = m_8x8Width;
+    double qp_adj = 0, avg_adj = 0, avg_adj_pow2 = 0, sd;
+    for (uint16_t blocky = 0; blocky < m_8x8Height; blocky++)
+    {
+        int cuIndex = blocky * strideInCU;
+        for (uint16_t blockx = 0; blockx < m_8x8Width; blockx++, cuIndex++)
+        {
+            int32_t lists_used = frames[b]->lowresCosts[b - p0][p1 - b][cuIndex] >> LOWRES_COST_SHIFT;
+            double displacement = 0;
+            for (uint16_t list = 0; list < 2; list++)
+            {
+                if ((lists_used >> list) & 1)
+                {
+                    MV *mvs = frames[b]->lowresMvs[list][listDist[list]];
+                    int32_t x = mvs[cuIndex].x;
+                    int32_t y = mvs[cuIndex].y;
+                    displacement += sqrt(pow(abs(x), 2) + pow(abs(y), 2));
+                }
+                else
+                    displacement += 0.0;
+            }
+            if (lists_used == 3)
+                displacement = displacement / 2;
+            qp_adj = pow(displacement, 0.1);
+            frames[b]->qpAqMotionOffset[cuIndex] = qp_adj;
+            avg_adj += qp_adj;
+            avg_adj_pow2 += qp_adj * qp_adj;
+        }
+    }
+    avg_adj /= m_cuCount;
+    avg_adj_pow2 /= m_cuCount;
+    sd = sqrt((avg_adj_pow2 - (avg_adj * avg_adj)));
+    if (sd > 0)
+    {
+        for (uint16_t blocky = 0; blocky < m_8x8Height; blocky++)
+        {
+            int cuIndex = blocky * strideInCU;
+            for (uint16_t blockx = 0; blockx < m_8x8Width; blockx++, cuIndex++)
+            {
+                qp_adj = frames[b]->qpAqMotionOffset[cuIndex];
+                qp_adj = (qp_adj - avg_adj) / sd;
+                if (qp_adj > 1)
+                {
+                    frames[b]->qpAqOffset[cuIndex] += qp_adj;
+                    frames[b]->qpCuTreeOffset[cuIndex] += qp_adj;
+                    frames[b]->invQscaleFactor[cuIndex] += x265_exp2fix8(qp_adj);
+                }
+            }
+        }
+    }
+}
 
 void Lookahead::cuTree(Lowres **frames, int numframes, bool bIntra)
 {
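
Note on calcMotionAdaptiveQuantFrame() above: each block's displacement is raised to the power 0.1, the results are standardized over the frame, and only blocks more than one standard deviation above the mean receive an offset. The sketch below reproduces just that normalization step under simplifying assumptions: MotionVec and motionQpOffsets are hypothetical names, each block has a single motion vector (the diff averages the two lists for bi-predicted blocks), and the offsets are returned rather than added to qpAqOffset, qpCuTreeOffset and invQscaleFactor.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-block motion vector; not the encoder's MV type.
struct MotionVec
{
    int x;
    int y;
};

// Returns one QP offset per block: the z-score of displacement^0.1 when it
// exceeds 1, otherwise 0.
std::vector<double> motionQpOffsets(const std::vector<MotionVec>& mvs)
{
    const std::size_t n = mvs.size();
    std::vector<double> raw(n, 0.0);
    std::vector<double> offsets(n, 0.0);
    if (n == 0)
        return offsets;

    double sum = 0.0, sumSq = 0.0;
    for (std::size_t i = 0; i < n; i++)
    {
        double displacement = std::hypot(static_cast<double>(mvs[i].x),
                                         static_cast<double>(mvs[i].y));
        raw[i] = std::pow(displacement, 0.1);
        sum += raw[i];
        sumSq += raw[i] * raw[i];
    }

    const double mean = sum / n;
    const double variance = sumSq / n - mean * mean;
    // The diff only applies offsets when sd > 0; testing the variance
    // first also avoids sqrt of a tiny negative rounding error.
    if (variance <= 0.0)
        return offsets;

    const double sd = std::sqrt(variance);
    for (std::size_t i = 0; i < n; i++)
    {
        double z = (raw[i] - mean) / sd;
        if (z > 1.0)
            offsets[i] = z;
    }
    return offsets;
}
```

With nine static blocks and one moving block, only the moving block is more than one standard deviation above the mean, so it is the only one that gets a non-zero offset.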

 

x265_2.2.tar.gz/source/encoder/slicetype.h -> x265_2.3.tar.gz/source/encoder/slicetype.h Changed

 
@@ -129,8 +129,8 @@
     bool          m_bBatchFrameCosts;
     bool          m_filled;
     bool          m_isSceneTransition;
+    int           m_numPools;
     Lookahead(x265_param *param, ThreadPool *pool);
-
 #if DETAILED_CU_STATS
     int64_t       m_slicetypeDecideElapsedTime;
     int64_t       m_preLookaheadElapsedTime;
@@ -165,7 +165,8 @@
     int64_t slicetypePathCost(Lowres **frames, char *path, int64_t threshold);
     int64_t vbvFrameCost(Lowres **frames, int p0, int p1, int b);
     void    vbvLookahead(Lowres **frames, int numFrames, int keyframes);
-
+    void    aqMotion(Lowres **frames, bool bintra);
+    void    calcMotionAdaptiveQuantFrame(Lowres **frames, int p0, int p1, int b);
     /* called by slicetypeAnalyse() to effect cuTree adjustments to adaptive
      * quant offsets */
     void    cuTree(Lowres **frames, int numframes, bool bintra);
​

x265_2.2.tar.gz/source/test/rate-control-tests.txt -> x265_2.3.tar.gz/source/test/rate-control-tests.txt Changed

@@ -24,6 +24,7 @@
 BasketballDrive_1920x1080_50.y4m,--preset ultrafast --bitrate 3000 --vbv-bufsize 3000 --vbv-maxrate 3000 --no-wpp
 big_buck_bunny_360p24.y4m,--preset medium --bitrate 400 --vbv-bufsize 600 --vbv-maxrate 600 --no-wpp --aud --hrd --tune fast-decode
 sita_1920x1080_30.yuv,--preset superfast --bitrate 3000 --vbv-bufsize 3000 --vbv-maxrate 3000 --aud --strict-cbr --no-wpp
+sintel_trailer_2k_480p24.y4m, --preset slow --crf 24 --vbv-bufsize 150 --vbv-maxrate 150 --dynamic-rd 1.53
 
 
 
@@ -43,3 +44,8 @@
 RaceHorses_416x240_30_10bit.yuv,--preset medium --crf 26 --vbv-maxrate 1000 --vbv-bufsize 1000 --pass 1,--preset fast --bitrate 1000  --vbv-maxrate 1000 --vbv-bufsize 700 --pass 3 -F4,--preset slow --bitrate 500 --vbv-maxrate 500  --vbv-bufsize 700 --pass 2 -F4
 sita_1920x1080_30.yuv, --preset ultrafast --crf 20 --no-cutree --keyint 50 --min-keyint 50 --no-open-gop --pass 1 --vbv-bufsize 7000 --vbv-maxrate 5000, --preset ultrafast --crf 20 --no-cutree --keyint 50 --min-keyint 50 --no-open-gop --pass 2 --vbv-bufsize 7000 --vbv-maxrate 5000 --repeat-headers
 sita_1920x1080_30.yuv, --preset medium --crf 20 --no-cutree --keyint 50 --min-keyint 50 --no-open-gop --pass 1 --vbv-bufsize 7000 --vbv-maxrate 5000 --repeat-headers --multi-pass-opt-rps, --preset medium --crf 20 --no-cutree --keyint 50 --min-keyint 50 --no-open-gop --pass 2 --vbv-bufsize 7000 --vbv-maxrate 5000 --repeat-headers --multi-pass-opt-rps
+
+# multi-pass rate control and analysis
+ducks_take_off_1080p50.y4m,--bitrate 6000 --pass 1  --multi-pass-opt-analysis  --hash 1 --ssim --psnr, --bitrate 6000 --pass 2  --multi-pass-opt-analysis  --hash 1 --ssim --psnr
+big_buck_bunny_360p24.y4m,--preset veryslow --bitrate 600 --pass 1  --multi-pass-opt-analysis  --multi-pass-opt-distortion --hash 1 --ssim --psnr, --preset veryslow --bitrate 600 --pass 2  --multi-pass-opt-analysis  --multi-pass-opt-distortion --hash 1 --ssim --psnr
+parkrun_ter_720p50.y4m, --bitrate 3500 --pass 1 --multi-pass-opt-distortion --hash 1 --ssim --psnr, --bitrate 3500 --pass 3 --multi-pass-opt-distortion --hash 1 --ssim --psnr, --bitrate 3500 --pass 2 --multi-pass-opt-distortion --hash 1 --ssim --psnr
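
Note on the test-list format above: each entry appears to be a sequence file name followed by one or more comma-separated option strings, one per encode (for example one per pass). The sketch below splits such a line on that assumption. TestCase and parseTestLine are hypothetical names and are not part of the x265 test harness.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container for one parsed test-list entry.
struct TestCase
{
    std::string              sequence;     // input file name
    std::vector<std::string> commandLines; // one option string per encode
};

static std::string trim(const std::string& s)
{
    const char* ws = " \t\r\n";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos)
        return "";
    std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Returns false for blank lines, '#' comment lines, and lines that carry
// no option string after the file name.
bool parseTestLine(const std::string& line, TestCase& out)
{
    std::string t = trim(line);
    if (t.empty() || t[0] == '#')
        return false;

    out = TestCase();
    std::size_t start = 0;
    bool first = true;
    while (true)
    {
        std::size_t comma = t.find(',', start);
        std::size_t len = (comma == std::string::npos) ? std::string::npos
                                                       : comma - start;
        std::string field = trim(t.substr(start, len));
        if (first)
        {
            out.sequence = field;
            first = false;
        }
        else if (!field.empty())
            out.commandLines.push_back(field);

        if (comma == std::string::npos)
            break;
        start = comma + 1;
    }
    return !out.sequence.empty() && !out.commandLines.empty();
}
```

The option strings are kept in file order. The parkrun entry above lists its passes as 1, 3, 2, so a consumer should not sort them.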

 

x265_2.2.tar.gz/source/test/regression-tests.txt -> x265_2.3.tar.gz/source/test/regression-tests.txt Changed

@@ -43,6 +43,8 @@
 CrowdRun_1920x1080_50_10bit_444.yuv,--preset veryfast --temporal-layers --repeat-headers --limit-refs 2
 CrowdRun_1920x1080_50_10bit_444.yuv,--preset medium --dither --keyint -1 --rdoq-level 1 --limit-modes
 CrowdRun_1920x1080_50_10bit_444.yuv,--preset veryslow --tskip --tskip-fast --no-scenecut --limit-tu 1
+CrowdRun_1920x1080_50_10bit_444.yuv,--preset veryslow --aq-mode 3 --aq-strength 1.5 --aq-motion --bitrate 5000
+CrowdRun_1920x1080_50_10bit_444.yuv,--preset veryslow --aq-mode 3 --aq-strength 1.5 --no-psy-rd --ssim-rd
 DucksAndLegs_1920x1080_60_10bit_422.yuv,--preset superfast --weightp --qg-size 16
 DucksAndLegs_1920x1080_60_10bit_422.yuv,--preset medium --tune psnr --bframes 16 --limit-modes
 DucksAndLegs_1920x1080_60_10bit_422.yuv,--preset slow --temporal-layers --no-psy-rd --qg-size 32 --limit-refs 0 --cu-lossless
@@ -71,6 +73,7 @@
 News-4k.y4m,--preset superfast --lookahead-slices 6 --aq-mode 0
 News-4k.y4m,--preset superfast --slices 4 --aq-mode 0 
 News-4k.y4m,--preset medium --tune ssim --no-sao --qg-size 16
+News-4k.y4m,--preset slower --opt-cu-delta-qp
 News-4k.y4m,--preset veryslow --no-rskip
 News-4k.y4m,--preset veryslow --pme --crf 40
 OldTownCross_1920x1080_50_10bit_422.yuv,--preset superfast --weightp
@@ -100,6 +103,7 @@
 city_4cif_60fps.y4m,--preset superfast --rdpenalty 1 --tu-intra-depth 2
 city_4cif_60fps.y4m,--preset medium --crf 4 --cu-lossless --sao-non-deblock
 city_4cif_60fps.y4m,--preset slower --scaling-list default
+city_4cif_60fps.y4m,--preset veryslow --opt-cu-delta-qp
 city_4cif_60fps.y4m,--preset veryslow --rdpenalty 2 --sao-non-deblock --no-b-intra --limit-refs 0
 ducks_take_off_420_720p50.y4m,--preset ultrafast --constrained-intra --rd 1
 ducks_take_off_444_720p50.y4m,--preset superfast --weightp --limit-refs 2
@@ -153,6 +157,6 @@
 CrowdRun_1920x1080_50_10bit_422.yuv,--preset fast --interlace bff
 
 #SEA Implementation Test
-silent_cif_420.y4m,--preset veryslow --me 4
-big_buck_bunny_360p24.y4m,--preset superfast --me 4
+silent_cif_420.y4m,--preset veryslow --me sea
+big_buck_bunny_360p24.y4m,--preset superfast --me sea
 # vim: tw=200

 

x265_2.2.tar.gz/source/test/smoke-tests.txt -> x265_2.3.tar.gz/source/test/smoke-tests.txt Changed

 
@@ -13,7 +13,7 @@
 old_town_cross_444_720p50.y4m,--preset=fast --keyint 20 --min-cu-size 16
 old_town_cross_444_720p50.y4m,--preset=slow --sao-non-deblock --pmode --qg-size 32
 RaceHorses_416x240_30_10bit.yuv,--preset=veryfast --max-tu-size 8
-RaceHorses_416x240_30_10bit.yuv,--preset=slower --bitrate 500 -F4 --rdoq-level 1
+RaceHorses_416x240_30_10bit.yuv,--preset=slower --bitrate 500 -F4 --rdoq-level 1 --opt-cu-delta-qp
 CrowdRun_1920x1080_50_10bit_444.yuv,--preset=ultrafast --constrained-intra --min-keyint 5 --keyint 10
 CrowdRun_1920x1080_50_10bit_444.yuv,--preset=medium --max-tu-size 16 --tu-inter-depth 2 --limit-tu 3
 DucksAndLegs_1920x1080_60_10bit_422.yuv,--preset=veryfast --min-cu 16
​

x265_2.2.tar.gz/source/x265-extras.cpp -> x265_2.3.tar.gz/source/x265-extras.cpp Changed

@@ -114,7 +114,7 @@
 
                 /* detailed performance statistics */
                 if (level >= 2)
-                    fprintf(csvfp, ", DecideWait (ms), Row0Wait (ms), Wall time (ms), Ref Wait Wall (ms), Total CTU time (ms), Stall Time (ms), Avg WPP, Row Blocks");
+                    fprintf(csvfp, ", DecideWait (ms), Row0Wait (ms), Wall time (ms), Ref Wait Wall (ms), Total CTU time (ms), Stall Time (ms), Total frame time (ms), Avg WPP, Row Blocks");
                 fprintf(csvfp, "\n");
             }
             else
@@ -184,7 +184,7 @@
 
     if (level >= 2)
     {
-        fprintf(csvfp, ", %.1lf, %.1lf, %.1lf, %.1lf, %.1lf, %.1lf,", frameStats->decideWaitTime, frameStats->row0WaitTime, frameStats->wallTime, frameStats->refWaitWallTime, frameStats->totalCTUTime, frameStats->stallTime);
+        fprintf(csvfp, ", %.1lf, %.1lf, %.1lf, %.1lf, %.1lf, %.1lf, %.1lf,", frameStats->decideWaitTime, frameStats->row0WaitTime, frameStats->wallTime, frameStats->refWaitWallTime, frameStats->totalCTUTime, frameStats->stallTime, frameStats->totalFrameTime);
         fprintf(csvfp, " %.3lf, %d", frameStats->avgWPP, frameStats->countRowBlocks);
     }
     fprintf(csvfp, "\n");

 

x265_2.2.tar.gz/source/x265.h -> x265_2.3.tar.gz/source/x265.h Changed

@@ -115,6 +115,14 @@
     /* All the above values will add up to 100%. */
 } x265_cu_stats;
 
+
+typedef struct x265_analysis_2Pass
+{
+    uint32_t      poc;
+    uint32_t      frameRecordSize;
+    void*         analysisFramedata;
+}x265_analysis_2Pass;
+
 /* Frame level statistics */
 typedef struct x265_frame_stats
 {
@@ -149,6 +157,7 @@
     int              bScenecut;
     int              frameLatency;
     x265_cu_stats    cuStats;
+    double           totalFrameTime;
 } x265_frame_stats;
 
 /* Arbitrary User SEI
@@ -282,6 +291,8 @@
     uint64_t framesize;
 
     int    height;
+
+    x265_analysis_2Pass analysis2Pass;
 } x265_picture;
 
 typedef enum
@@ -379,6 +390,8 @@
 #define X265_AQ_AUTO_VARIANCE        2
 #define X265_AQ_AUTO_VARIANCE_BIASED 3
 
+#define x265_ADAPT_RD_STRENGTH   4
+
 /* NOTE! For this release only X265_CSP_I420 and X265_CSP_I444 are supported */
 
 /* Supported internal color space types (according to semantics of chroma_format_idc) */
@@ -1335,6 +1348,37 @@
     * intra cost of a frame used in scenecut detection. Default 5. */
     double     scenecutBias;
 
+    /* Use multiple worker threads dedicated to doing only lookahead instead of sharing
+    * the worker threads with Frame Encoders. A dedicated lookahead threadpool is created with the
+    * specified number of worker threads. This can range from 0 upto half the
+    * hardware threads available for encoding. Using too many threads for lookahead can starve
+    * resources for frame Encoder and can harm performance. Default is 0 - disabled. */
+    int       lookaheadThreads;
+
+    /* Optimize CU level QPs to signal consistent deltaQPs in frame for rd level > 4 */
+    int        bOptCUDeltaQP;
+
+    /* Refine analysis in multipass ratecontrol based on analysis information stored */
+    int         analysisMultiPassRefine;
+
+    /* Refine analysis in multipass ratecontrol based on distortion data stored */
+    int         analysisMultiPassDistortion;
+
+    /* Adaptive Quantization based on relative motion */
+    int        bAQMotion;
+
+    /* SSIM based RDO, based on residual divisive normalization scheme. Used for mode
+    * selection during analysis of CTUs, can achieve significant gain in terms of 
+    * objective quality metrics SSIM and PSNR */
+    int       bSsimRd;
+
+    /* Increase RD at points where bitrate drops due to vbv. Default 0 */
+    double    dynamicRd;
+
+    /* Enables emission of HDR SEI packets, which contain HDR-specific params.
+     * Auto-enabled when max-cll, max-fall, or mastering display info is specified.
+     * Default is disabled */
+    int       bEmitHDRSEI;
 } x265_param;
 
 /* x265_param_alloc:
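
The comment on lookaheadThreads above gives a range of 0 up to half the hardware threads. The diff does not show how the encoder enforces that range, so the following is only a minimal sketch of one way a caller might pre-validate a value before storing it in the field. The helper name demo_clamp_lookahead_threads and the choice to clamp rather than reject are assumptions, not part of x265.

```c
#include <assert.h>

/* Hypothetical helper (not part of x265): clamp a requested lookahead
 * thread count into the documented range [0, hwThreads / 2].
 * Clamping instead of rejecting out-of-range values is an assumption. */
static int demo_clamp_lookahead_threads(int requested, int hwThreads)
{
    int maxThreads;

    assert(hwThreads >= 1);       /* caller must report at least one thread */
    maxThreads = hwThreads / 2;   /* "up to half the hardware threads" */

    if (requested < 0)
        return 0;                 /* 0 means the dedicated pool is disabled */
    if (requested > maxThreads)
        return maxThreads;
    return requested;
}
```

On a machine reporting 8 hardware threads, a request for 10 would be reduced to 4 under this sketch, and a single-threaded machine would always end up with 0, leaving the dedicated pool disabled.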

 

x265_2.2.tar.gz/source/x265cli.h -> x265_2.3.tar.gz/source/x265cli.h Changed

@@ -125,6 +125,7 @@
     { "intra-refresh",        no_argument, NULL, 0 },
     { "rc-lookahead",   required_argument, NULL, 0 },
     { "lookahead-slices", required_argument, NULL, 0 },
+    { "lookahead-threads", required_argument, NULL, 0 },
     { "bframes",        required_argument, NULL, 'b' },
     { "bframe-bias",    required_argument, NULL, 0 },
     { "b-adapt",        required_argument, NULL, 0 },
@@ -165,6 +166,7 @@
     { "rd",             required_argument, NULL, 0 },
     { "rdoq-level",     required_argument, NULL, 0 },
     { "no-rdoq-level",        no_argument, NULL, 0 },
+    { "dynamic-rd",     required_argument, NULL, 0 },
     { "psy-rd",         required_argument, NULL, 0 },
     { "psy-rdoq",       required_argument, NULL, 0 },
     { "no-psy-rd",            no_argument, NULL, 0 },
@@ -218,6 +220,8 @@
     { "no-opt-qp-pps",        no_argument, NULL, 0 },
     { "opt-ref-list-length-pps",         no_argument, NULL, 0 },
     { "no-opt-ref-list-length-pps",      no_argument, NULL, 0 },
+    { "opt-cu-delta-qp",      no_argument, NULL, 0 },
+    { "no-opt-cu-delta-qp",   no_argument, NULL, 0 },
     { "no-dither",            no_argument, NULL, 0 },
     { "dither",               no_argument, NULL, 0 },
     { "no-repeat-headers",    no_argument, NULL, 0 },
@@ -235,6 +239,10 @@
     { "nr-inter",       required_argument, NULL, 0 },
     { "stats",          required_argument, NULL, 0 },
     { "pass",           required_argument, NULL, 0 },
+    { "multi-pass-opt-analysis", no_argument, NULL, 0 },
+    { "no-multi-pass-opt-analysis",    no_argument, NULL, 0 },
+    { "multi-pass-opt-distortion",     no_argument, NULL, 0 },
+    { "no-multi-pass-opt-distortion",  no_argument, NULL, 0 },
     { "slow-firstpass",       no_argument, NULL, 0 },
     { "no-slow-firstpass",    no_argument, NULL, 0 },
     { "multi-pass-opt-rps",   no_argument, NULL, 0 },
@@ -249,6 +257,12 @@
     { "analyze-src-pics", no_argument, NULL, 0 },
     { "no-analyze-src-pics", no_argument, NULL, 0 },
     { "slices",         required_argument, NULL, 0 },
+    { "aq-motion",            no_argument, NULL, 0 },
+    { "no-aq-motion",         no_argument, NULL, 0 },
+    { "ssim-rd",              no_argument, NULL, 0 },
+    { "no-ssim-rd",           no_argument, NULL, 0 },
+    { "hdr",                  no_argument, NULL, 0 },
+    { "no-hdr",               no_argument, NULL, 0 },
     { 0, 0, 0, 0 },
     { 0, 0, 0, 0 },
     { 0, 0, 0, 0 },
@@ -333,6 +347,8 @@
     H0("   --[no-]psy-rd <0..5.0>        Strength of psycho-visual rate distortion optimization, 0 to disable. Default %.1f\n", param->psyRd);
     H0("   --[no-]rdoq-level <0|1|2>     Level of RDO in quantization 0:none, 1:levels, 2:levels & coding groups. Default %d\n", param->rdoqLevel);
     H0("   --[no-]psy-rdoq <0..50.0>     Strength of psycho-visual optimization in RDO quantization, 0 to disable. Default %.1f\n", param->psyRdoq);
+    H0("   --dynamic-rd <0..4.0>         Strength of dynamic RD, 0 to disable. Default %.2f\n", param->dynamicRd);
+    H0("   --[no-]ssim-rd                Enable SSIM-based rate distortion optimization. Default %s\n", OPT(param->bSsimRd));
     H0("   --[no-]rd-refine              Enable QP based RD refinement for rd levels 5 and 6. Default %s\n", OPT(param->bEnableRdRefine));
     H0("   --[no-]early-skip             Enable early SKIP detection. Default %s\n", OPT(param->bEnableEarlySkip));
     H0("   --[no-]rskip                  Enable early exit from recursion. Default %s\n", OPT(param->bEnableRecursionSkip));
@@ -372,6 +388,7 @@
     H0("   --intra-refresh               Use Periodic Intra Refresh instead of IDR frames\n");
     H0("   --rc-lookahead <integer>      Number of frames for frame-type lookahead (determines encoder latency) Default %d\n", param->lookaheadDepth);
     H1("   --lookahead-slices <0..16>    Number of slices to use per lookahead cost estimate. Default %d\n", param->lookaheadSlices);
+    H0("   --lookahead-threads <integer> Number of threads to be dedicated to perform lookahead only. Default %d\n", param->lookaheadThreads);
     H0("   --bframes <integer>           Maximum number of consecutive b-frames (now it only enables B GOP structure) Default %d\n", param->bframes);
     H1("   --bframe-bias <integer>       Bias towards B frame decisions. Default %d\n", param->bFrameBias);
     H0("   --b-adapt <0..2>              0 - none, 1 - fast, 2 - full (trellis) adaptive B frame scheduling. Default %d\n", param->bFrameAdaptive);
@@ -396,6 +413,8 @@
        "                                   - 1 : First pass, creates stats file\n"
        "                                   - 2 : Last pass, does not overwrite stats file\n"
        "                                   - 3 : Nth pass, overwrites stats file\n");
+    H0("   --[no-]multi-pass-opt-analysis   Refine analysis in 2 pass based on analysis information from pass 1\n");
+    H0("   --[no-]multi-pass-opt-distortion Use distortion of CTU from pass 1 to refine qp in 2 pass\n");
     H0("   --stats                       Filename for stats file in multipass pass rate control. Default x265_2pass.log\n");
     H0("   --[no-]analyze-src-pics       Motion estimation uses source frame planes. Default disable\n");
     H0("   --[no-]slow-firstpass         Enable a slow first pass in a multipass rate control mode. Default %s\n", OPT(param->rc.bEnableSlowFirstPass));
@@ -404,6 +423,7 @@
     H0("   --analysis-file <filename>    Specify file name used for either dumping or reading analysis data.\n");
     H0("   --aq-mode <integer>           Mode for Adaptive Quantization - 0:none 1:uniform AQ 2:auto variance 3:auto variance with bias to dark scenes. Default %d\n", param->rc.aqMode);
     H0("   --aq-strength <float>         Reduces blocking and blurring in flat and textured areas (0 to 3.0). Default %.2f\n", param->rc.aqStrength);
+    H0("   --[no-]aq-motion              Adaptive Quantization based on the relative motion of each CU w.r.t. the frame. Default %s\n", OPT(param->bAQMotion));
     H0("   --qg-size <int>               Specifies the size of the quantization group (64, 32, 16, 8). Default %d\n", param->rc.qgSize);
     H0("   --[no-]cutree                 Enable cutree for Adaptive Quantization. Default %s\n", OPT(param->rc.cuTree));
     H0("   --[no-]rc-grain               Enable ratecontrol mode to handle grains specifically. turned on with tune grain. Default %s\n", OPT(param->rc.bEnableGrain));
@@ -450,6 +470,7 @@
     H0("   --master-display <string>     SMPTE ST 2086 master display color volume info SEI (HDR)\n");
     H0("                                    format: G(x,y)B(x,y)R(x,y)WP(x,y)L(max,min)\n");
     H0("   --max-cll <string>            Emit content light level info SEI as \"cll,fall\" (HDR)\n");
+    H0("   --[no-]hdr                    Control dumping of HDR SEI packet. If max-cll or master-display has non-zero values, this is enabled. Default %s\n", OPT(param->bEmitHDRSEI));
     H0("   --min-luma <integer>          Minimum luma plane value of input source picture\n");
     H0("   --max-luma <integer>          Maximum luma plane value of input source picture\n");
     H0("\nBitstream options:\n");
@@ -465,6 +486,7 @@
     H0("   --[no-]opt-qp-pps             Dynamically optimize QP in PPS (instead of default 26) based on QPs in previous GOP. Default %s\n", OPT(param->bOptQpPPS));
     H0("   --[no-]opt-ref-list-length-pps  Dynamically set L0 and L1 ref list length in PPS (instead of default 0) based on values in last GOP. Default %s\n", OPT(param->bOptRefListLengthPPS));
     H0("   --[no-]multi-pass-opt-rps     Enable storing commonly used RPS in SPS in multi pass mode. Default %s\n", OPT(param->bMultiPassOptRPS));
+    H0("   --[no-]opt-cu-delta-qp        Optimize to signal consistent CU level delta QPs in frame. Default %s\n", OPT(param->bOptCUDeltaQP));
     H1("\nReconstructed video options (debugging):\n");
     H1("-r/--recon <filename>            Reconstructed raw image YUV or Y4M output file name\n");
     H1("   --recon-depth <integer>       Bit-depth of reconstructed raw image file. Defaults to input bit depth, or 8 if Y4M\n");
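
The option table in this diff registers each new boolean switch twice, once as "name" and once as "no-name", both with no_argument. A minimal sketch of how such a pair could be folded into a single integer field is shown below. The demo_flags struct and demo_apply_flag function are hypothetical and only mirror three of the fields added to x265_param in this revision; this is not the parser x265 actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical holder mirroring three boolean fields added in 2.3. */
typedef struct demo_flags
{
    int bAQMotion;
    int bSsimRd;
    int bEmitHDRSEI;
} demo_flags;

/* Apply one long option name (without the leading "--") to flags.
 * A "no-" prefix clears the flag; the bare name sets it.
 * Returns 0 on success, -1 if the name is not one of the demo flags. */
static int demo_apply_flag(demo_flags *flags, const char *name)
{
    int value = 1;

    assert(flags != NULL && name != NULL);

    if (strncmp(name, "no-", 3) == 0)
    {
        value = 0;   /* negated form disables the feature */
        name += 3;   /* continue matching on the base name */
    }

    if (strcmp(name, "aq-motion") == 0)
        flags->bAQMotion = value;
    else if (strcmp(name, "ssim-rd") == 0)
        flags->bSsimRd = value;
    else if (strcmp(name, "hdr") == 0)
        flags->bEmitHDRSEI = value;
    else
        return -1;

    return 0;
}
```

Handling the prefix once, before the name lookup, keeps the lookup itself to one entry per feature even though the option table lists two.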

 