Submit package home:enzokiel:branches:Essentials / x265 to package Essentials / x265
x265.changes
Changed
 -------------------------------------------------------------------
+Thu Jul 27 08:33:52 UTC 2017 - joerg.lorenzen@ki.tng.de
+
+- Update to version 2.5
+ Encoder enhancements
+ * Improved grain handling with --tune grain option by throttling
+ VBV operations to limit QP jumps.
+ * Frame threads are now decided based on number of threads
+ specified in the --pools, as opposed to the number of hardware
+ threads available. The mapping was also adjusted to improve
+ quality of the encodes with minimal impact to performance.
+ * CSV logging feature (enabled by --csv) is now part of the
+ library; it was previously part of the x265 application.
+ Applications that integrate libx265 can now extract frame level
+ statistics for their encodes by exercising this option in the
+ library.
+ * Globals that track min and max CU sizes, number of slices, and
+ other parameters have now been moved into instance-specific
+ variables. Consequently, applications that invoke multiple
+ instances of x265 library are no longer restricted to use the
+ same settings for these parameter options across the multiple
+ instances.
+ * x265 can now generate a separate library that exports the HDR10+
+ parsing API. Other libraries that wish to use this API may do
+ so by linking against this library. Enable ENABLE_HDR10_PLUS in
+ CMake options and build to generate this library.
+ * SEA motion search receives a 10% performance boost from AVX2
+ optimization of its kernels.
+ * The CSV log is now more elaborate with additional fields such
+ as PU statistics, average-min-max luma and chroma values, etc.
+ Refer to documentation of --csv for details of all fields.
+ * x86inc.asm cleaned-up for improved instruction handling.
+ API changes
+ * New API x265_encoder_ctu_info() introduced to specify suggested
+ partition sizes for various CTUs in a frame. To be used in
+ conjunction with --ctu-info to react to the specified
+ partitions appropriately.
+ * Rate-control statistics passed through the x265_picture object
+ for an incoming frame are now used by the encoder.
+ * Options to scale, reuse, and refine analysis for incoming
+ analysis shared through the x265_analysis_data field in
+ x265_picture for runs that use --analysis-reuse-mode load; use
+ options --scale, --refine-mv, --refine-inter, and
+ --refine-intra to explore.
+ * VBV now has a deterministic mode. Use --const-vbv to exercise.
+ Bug fixes
+ * Several fixes for HDR10+ parsing code including incompatibility
+ with user-specific SEI, removal of warnings, linking issues in
+ linux, etc.
+ * SEI messages for HDR10 repeated every keyint when HDR options
+ (--hdr-opt, --master-display) specified.
+- soname bump to 130.
+
+-------------------------------------------------------------------
 Thu Apr 27 14:15:13 UTC 2017 - joerg.lorenzen@ki.tng.de

 - Update to version 2.4
x265.spec
Changed
 # based on the spec file from https://build.opensuse.org/package/view_file/home:Simmphonie/libx265/

 Name: x265
-%define soname 116
+%define soname 130
 %define libname lib%{name}
 %define libsoname %{libname}-%{soname}
-Version: 2.4
+Version: 2.5
 Release: 0
 License: GPL-2.0+
 Summary: A free h265/HEVC encoder - encoder binary
baselibs.conf
Changed
-libx265-116
+libx265-130
x265_2.4.tar.gz/source/dynamicHDR10/BasicStructures.cpp
Deleted
-/**
- * @file BasicStructures.cpp
- * @brief Defines the structure of metadata parameters
- * @author Daniel Maximiliano Valenzuela, Seongnam Oh.
- * @create date 03/01/2017
- * @version 0.0.1
- *
- * Copyright @ 2017 Samsung Electronics, DMS Lab, Samsung Research America and Samsung Research Tijuana
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version 2
- * of the License, or (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
- * MA 02110-1301, USA.
-**/
-
-#include "BasicStructures.h"
-#include "vector"
-
-struct PercentileLuminance{
-
- float averageLuminance = 0.0;
- float maxRLuminance = 0.0;
- float maxGLuminance = 0.0;
- float maxBLuminance = 0.0;
- int order;
- std::vector<unsigned int> percentiles;
-};
-
-
-
x265_2.4.tar.gz/.hg_archival.txt -> x265_2.5.tar.gz/.hg_archival.txt
Changed
 repo: 09fe40627f03a0f9c3e6ac78b22ac93da23f9fdf
-node: e7a4dd48293b7956d4a20df257d23904cc78e376
+node: 64b2d0bf45a52511e57a6b7299160b961ca3d51c
 branch: stable
-tag: 2.4
+tag: 2.5
x265_2.4.tar.gz/.hgtags -> x265_2.5.tar.gz/.hgtags
Changed
 981e3bfef16a997bce6f46ce1b15631a0e234747 2.1
 be14a7e9755e54f0fd34911c72bdfa66981220bc 2.2
 3037c1448549ca920967831482c653e5892fa8ed 2.3
+e7a4dd48293b7956d4a20df257d23904cc78e376 2.4
x265_2.4.tar.gz/doc/reST/api.rst -> x265_2.5.tar.gz/doc/reST/api.rst
Changed
 * presets is not recommended without a more fine-grained breakdown of
 * parameters to take this into account. */
 int x265_encoder_reconfig(x265_encoder *, x265_param *);
+**x265_encoder_ctu_info**
+ /* x265_encoder_ctu_info:
+ * Copy CTU information such as ctu address and ctu partition structure of all
+ * CTUs in each frame. The function is invoked only if "--ctu-info" is enabled and
+ * the encoder will wait for this copy to complete if enabled.
+ */

 Pictures
 ========

 Cleanup
 =======

+At the end of the encode, the application will want to trigger logging
+of the final encode statistics, if :option:`--csv` had been specified::
+
+ /* x265_encoder_log:
+ * write a line to the configured CSV file. If a CSV filename was not
+ * configured, or file open failed, this function will perform no write. */
+ void x265_encoder_log(x265_encoder *encoder, int argc, char **argv);
+
 Finally, the encoder must be closed in order to free all of its
 resources. An encoder that has been flushed cannot be restarted and
 reused. Once **x265_encoder_close()** has been called, the encoder
x265_2.4.tar.gz/doc/reST/cli.rst -> x265_2.5.tar.gz/doc/reST/cli.rst
Changed
 2. unable to open encoder
 3. unable to generate stream headers
 4. encoder abort
- 5. unable to open csv file
-
+
 Logging/Statistic Options
 =========================

 it adds one line per run. If :option:`--csv-log-level` is greater than
 0, it writes one line per frame. Default none

- Several frame performance statistics are available when
- :option:`--csv-log-level` is greater than or equal to 2:
-
+ The following statistics are available when :option:`--csv-log-level` is
+ greater than or equal to 1:
+
+ **Encode Order** The frame order in which the encoder encodes.
+
+ **Type** Slice type of the frame.
+
+ **POC** Picture Order Count - The display order of the frames.
+
+ **QP** Quantization Parameter decided for the frame.
+
+ **Bits** Number of bits consumed by the frame.
+
+ **Scenecut** 1 if the frame is a scenecut, 0 otherwise.
+
+ **RateFactor** Applicable only when CRF is enabled. The rate factor depends
+ on the CRF given by the user. This is used to determine the QP so as to
+ target a certain quality.
+
+ **BufferFill** Bits available for the next frame. Includes bits carried
+ over from the current frame.
+
+ **Latency** Latency in terms of number of frames between when the frame
+ was given in and when the frame is given out.
+
+ **PSNR** Peak signal to noise ratio for Y, U and V planes.
+
+ **SSIM** A quality metric that denotes the structural similarity between frames.
+
+ **Ref lists** POC of references in lists 0 and 1 for the frame.
+
+ Several statistics about the encoded bitstream and encoder performance are
+ available when :option:`--csv-log-level` is greater than or equal to 2:
+
+ **I/P cost ratio:** The ratio between the cost when a frame is decided as an
+ I frame to that when it is decided as a P frame as computed from the
+ quarter-resolution frame in look-ahead. This, in combination with other parameters
+ such as position of the frame in the GOP, is used to decide scene transitions.
+
+ **Analysis statistics:**
+
+ **CU Statistics** percentage of CU modes.
+
+ **Distortion** Average luma and chroma distortion. Calculated as
+ SSE is done on fenc and recon(after quantization).
+
+ **Psy Energy** Average psy energy calculated as the sum of absolute
+ difference between source and recon energy. Energy is measured by sa8d
+ minus SAD.
+
+ **Residual Energy** Average residual energy. SSE is calculated on fenc
+ and pred(before quantization).
+
+ **Luma/Chroma Values** minimum, maximum and average(averaged by area)
+ luma and chroma values of source for each frame.
+
+ **PU Statistics** percentage of PU modes at each depth.
+
+ **Performance statistics:**
+
 **DecideWait ms** number of milliseconds the frame encoder had to
 wait, since the previous frame was retrieved by the API thread,
 before a new frame has been given to it. This is the latency

 **Stall Time ms** the number of milliseconds of the reported wall
 time that were spent with zero worker threads, aka all compression
 was completely stalled.
+
+ **Total frame time** Total time spent to encode the frame.

 **Avg WPP** the average number of worker threads working on this
 frame, at any given time. This value is sampled at the completion of

 is more of a problem for P frames where some blocks are much more
 expensive than others.

- **CLI ONLY**
-
 .. option:: --csv-log-level <integer>

 Controls the level of detail (and size) of --csv log files

 1. frame level logging
 2. frame level logging with performance statistics

- **CLI ONLY**
-
 .. option:: --ssim, --no-ssim

 Calculate and report Structural Similarity values. It is

 Analysis re-use options, to improve performance when encoding the same
 sequence multiple times (presumably at varying bitrates). The encoder
-will not reuse analysis if the resolution and slice type parameters do
-not match.
+will not reuse analysis if slice type parameters do not match.

-.. option:: --analysis-mode <string|int>
+.. option:: --analysis-reuse-mode <string|int>

- Specify whether analysis information of each frame is output by encoder
- or input for reuse. By reading the analysis data writen by an
- earlier encode of the same sequence, substantial redundant work may
- be avoided.
-
- The following data may be stored and reused:
- I frames - split decisions and luma intra directions of all CUs.
- P/B frames - motion vectors are dumped at each depth for all CUs.
+ This option allows reuse of analysis information from first pass to second pass.
+ :option:`--analysis-reuse-mode save` specifies that encoder outputs analysis information of each frame.
+ :option:`--analysis-reuse-mode load` specifies that encoder reuses analysis information from first pass.
+ There is no benefit using load mode without running encoder in save mode. Analysis data from save mode is
+ written to a file specified by :option:`--analysis-reuse-file`. The amount of analysis data stored/reused
+ is determined by :option:`--analysis-reuse-level`. By reading the analysis data written by an earlier encode
+ of the same sequence, substantial redundant work may be avoided. Requires cutree, pmode to be off. Default 0.

 **Values:** off(0), save(1): dump analysis data, load(2): read analysis data

-.. option:: --analysis-file <filename>
+.. option:: --analysis-reuse-file <filename>

- Specify a filename for analysis data (see :option:`--analysis-mode`)
+ Specify a filename for analysis data (see :option:`--analysis-reuse-mode`)
 If no filename is specified, x265_analysis.dat is used.

-.. option:: --refine-level <1..10>
+.. option:: --analysis-reuse-level <1..10>

- Amount of information stored/reused in :option:`--analysis-mode` is distributed across levels.
+ Amount of information stored/reused in :option:`--analysis-reuse-mode` is distributed across levels.
 Higher the value, higher the information stored/reused, faster the encode. Default 5.

- Note that --refine-level must be paired with analysis-mode.
+ Note that --analysis-reuse-level must be paired with analysis-reuse-mode.

 +--------+-----------------------------------------+
 | Level | Description |

 | 10 | Level 5 + Full CU analysis-info |
 +--------+-----------------------------------------+

+.. option:: --scale-factor
+
+ Factor by which input video is scaled down for analysis save mode.
+ This option should be coupled with analysis-reuse-mode option, --analysis-reuse-level 10.
+ The ctu size of load should be double the size of save. Default 0.
+
+.. option:: --refine-intra <0|1|2>
+
+ Enables refinement of intra blocks in current encode.
+
+ Level 0 - Forces both mode and depth from the previous encode.
+
+ Level 1 - Evaluates all intra modes for blocks of size one smaller than
+ the min-cu-size of the incoming analysis data from the previous encode,
+ forces modes for blocks of larger size.
+
+ Level 2 - Evaluates all intra modes for blocks of size one smaller than
+ the min-cu-size of the incoming analysis data from the previous encode.
+ For larger blocks, force only depth when angular mode is chosen by the
+ previous encode, force depth and mode when other intra modes are chosen.
+
+ Default 0.
+
+.. option:: --refine-inter-depth
+
+ Enables refinement of inter blocks in current encode. Evaluates all
+ inter modes for blocks of size one smaller than the min-cu-size of the
+ incoming analysis data from the previous encode. Default disabled.
+
+.. option:: --refine-mv
+
+ Enables refinement of motion vector for scaled video. Evaluates the best
+ motion vector by searching the surrounding eight integer and subpel pixel
+ positions.
+
 Options which affect the transform unit quad-tree, sometimes referred to
 as the residual quad-tree (RQT).

 intra cost of a frame used in scenecut detection. For example, a value of 5 indicates,
 if the inter cost of a frame is greater than or equal to 95 percent of the intra cost of the frame,
 then detect this frame as scenecut. Values between 5 and 15 are recommended. Default 5.
-
+
+.. option:: --ctu-info <0, 1, 2, 4, 6>
+
+ This value enables receiving CTU information asynchronously and determine reaction to the CTU information. Default 0.
+ 1: force the partitions if CTU information is present.
+ 2: functionality of (1) and reduce qp if CTU information has changed.
+ 4: functionality of (1) and force Inter modes when CTU Information has changed, merge/skip otherwise.
+ This option should be enabled only when planning to invoke the API function x265_encoder_ctu_info to copy ctu-info asynchronously.
+ If enabled without calling the API function, the encoder will wait indefinitely.
+
 .. option:: --intra-refresh

 Enables Periodic Intra Refresh(PIR) instead of keyframe insertion.

 and also redundant steps are skipped.
 In pass 1 analysis information like motion vector, depth, reference and prediction
 modes of the final best CTU partition is stored for each CTU.
- Default disabled.
+ Multipass analysis refinement cannot be enabled when 'analysis-save/analysis-load' option
+ is enabled and both will be disabled when enabled together. This feature requires 'pmode/pme'
+ to be disabled and hence pmode/pme will be disabled when enabled at the same time.
+
+ Default: disabled.

 .. option:: --multi-pass-opt-distortion, --no-multi-pass-opt-distortion

 ratecontrol. In pass 1 distortion of best CTU partition is stored. CTUs with high
 distortion get lower(negative)qp offsets and vice-versa for low distortion CTUs in pass 2.
 This helps to improve the subjective quality.
- Default disabled.
+ Multipass refinement of qp cannot be enabled when 'analysis-save/analysis-load' option
+ is enabled and both will be disabled when enabled together. 'multi-pass-opt-distortion'
+ requires 'pmode/pme' to be disabled and hence pmode/pme will be disabled when enabled along with it.
+
+ Default: disabled.

 .. option:: --strict-cbr, --no-strict-cbr

 that this option is used through the tune grain feature where a combination
 of param options are used to improve visual quality.

+.. option:: --const-vbv, --no-const-vbv
+
+ Enables VBV algorithm to be consistent across runs. Default disabled.
+ Enabled when :option:`--tune` grain is applied.
+
 .. option:: --qblur <float>

 Temporally blur quants. Default 0.5

 .. option:: --dhdr10-info <filename>

- Inserts tone mapping information as an SEI message.
+ Inserts tone mapping information as an SEI message. It takes as input,
+ the path to the JSON file containing the Creative Intent Metadata
+ to be encoded as Dynamic Tone Mapping into the bitstream.
+
+ Click `here <https://www.sra.samsung.com/assets/User-data-registered-itu-t-t35-SEI-message-for-ST-2094-40-v1.1.pdf>`_
+ for the syntax of the metadata file. A sample JSON file is available in `the downloads page <https://bitbucket.org/multicoreware/x265/downloads/DCIP3_4K_to_400_dynamic.json>`_

 .. option:: --dhdr10-opt, --no-dhdr10-opt
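Review note on the scenecut threshold text quoted as context in cli.rst above: the rule "a value of 5 means the inter cost must reach 95 percent of the intra cost" can be written as a single comparison. The sketch below is only an illustration of that sentence. The function name isScenecut and the parameter biasPercent are hypothetical and are not taken from libx265, and the encoder's real decision also weighs other inputs that this diff does not show.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not an x265 function. Applies the rule quoted in
// cli.rst: flag a scenecut when the inter cost is at least
// (100 - biasPercent) percent of the intra cost. Integer math keeps the
// comparison exact at the threshold.
constexpr bool isScenecut(int64_t interCost, int64_t intraCost, int64_t biasPercent)
{
    return interCost * 100 >= intraCost * (100 - biasPercent);
}

// Compile-time checks for the documented example (5 means 95 percent).
static_assert(isScenecut(950, 1000, 5), "95 percent of the intra cost qualifies");
static_assert(!isScenecut(949, 1000, 5), "just below 95 percent does not");
```

With the recommended upper value of 15, the same call accepts any inter cost of at least 85 percent of the intra cost, so a larger value makes detection more permissive.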
x265_2.4.tar.gz/doc/reST/releasenotes.rst -> x265_2.5.tar.gz/doc/reST/releasenotes.rst
Changed
 Release Notes
 *************

-Release Notes
-*************
+Version 2.5
+===========
+
+Release date - 13th July, 2017.
+
+Encoder enhancements
+--------------------
+1. Improved grain handling with :option:`--tune` grain option by throttling VBV operations to limit QP jumps.
+2. Frame threads are now decided based on number of threads specified in the :option:`--pools`, as opposed to the number of hardware threads available. The mapping was also adjusted to improve quality of the encodes with minimal impact to performance.
+3. CSV logging feature (enabled by :option:`--csv`) is now part of the library; it was previously part of the x265 application. Applications that integrate libx265 can now extract frame level statistics for their encodes by exercising this option in the library.
+4. Globals that track min and max CU sizes, number of slices, and other parameters have now been moved into instance-specific variables. Consequently, applications that invoke multiple instances of x265 library are no longer restricted to use the same settings for these parameter options across the multiple instances.
+5. x265 can now generate a separate library that exports the HDR10+ parsing API. Other libraries that wish to use this API may do so by linking against this library. Enable ENABLE_HDR10_PLUS in CMake options and build to generate this library.
+6. SEA motion search receives a 10% performance boost from AVX2 optimization of its kernels.
+7. The CSV log is now more elaborate with additional fields such as PU statistics, average-min-max luma and chroma values, etc. Refer to documentation of :option:`--csv` for details of all fields.
+8. x86inc.asm cleaned-up for improved instruction handling.
+
+API changes
+-----------
+1. New API x265_encoder_ctu_info() introduced to specify suggested partition sizes for various CTUs in a frame. To be used in conjunction with :option:`--ctu-info` to react to the specified partitions appropriately.
+2. Rate-control statistics passed through the x265_picture object for an incoming frame are now used by the encoder.
+3. Options to scale, reuse, and refine analysis for incoming analysis shared through the x265_analysis_data field in x265_picture for runs that use :option:`--analysis-reuse-mode` load; use options :option:`--scale`, :option:`--refine-mv`, :option:`--refine-inter`, and :option:`--refine-intra` to explore.
+4. VBV now has a deterministic mode. Use :option:`--const-vbv` to exercise.
+
+Bug fixes
+---------
+1. Several fixes for HDR10+ parsing code including incompatibility with user-specific SEI, removal of warnings, linking issues in linux, etc.
+2. SEI messages for HDR10 repeated every keyint when HDR options (:option:`--hdr-opt`, :option:`--master-display`) specified.

 Version 2.4
 ===========
x265_2.4.tar.gz/source/CMakeLists.txt -> x265_2.5.tar.gz/source/CMakeLists.txt
Changed
 option(STATIC_LINK_CRT "Statically link C runtime for release builds" OFF)
 mark_as_advanced(FPROFILE_USE FPROFILE_GENERATE NATIVE_BUILD)
 # X265_BUILD must be incremented each time the public API is changed
-set(X265_BUILD 116)
+set(X265_BUILD 130)
 configure_file("${PROJECT_SOURCE_DIR}/x265.def.in"
 "${PROJECT_BINARY_DIR}/x265.def")
 configure_file("${PROJECT_SOURCE_DIR}/x265_config.h.in"

 add_definitions(-O3 -qstrict -qhot -qaltivec)
 add_definitions(-qinline=level=10 -qpath=IL:/data/video_files/latest.tpo/)
 endif()
-
-
+# this option is to enable the inclusion of dynamic HDR10 library to the libx265 compilation
+option(ENABLE_HDR10_PLUS "Enable dynamic HDR10 compilation" OFF)
 if(GCC)
 add_definitions(-Wall -Wextra -Wshadow)
 add_definitions(-D__STDC_LIMIT_MACROS=1)
- add_definitions(-std=gnu++98)
+ if(ENABLE_HDR10_PLUS)
+ if(CMAKE_CXX_COMPILER_VERSION VERSION_LESS "4.8")
+ message(FATAL_ERROR "gcc version above 4.8 required to support hdr10plus")
+ endif()
+ add_definitions(-std=gnu++11)
+ else()
+ add_definitions(-std=gnu++98)
+ endif()
 if(ENABLE_PIC)
 add_definitions(-fPIC)
 endif(ENABLE_PIC)

 else(HIGH_BIT_DEPTH)
 add_definitions(-DHIGH_BIT_DEPTH=0 -DX265_DEPTH=8)
 endif(HIGH_BIT_DEPTH)
-# this option is to enable the inclusion of dynamic HDR10 library to the libx265 compilation
-option(ENABLE_DYNAMIC_HDR10 "Enable dynamic HDR10 compilation" OFF)
-if (ENABLE_DYNAMIC_HDR10)
- add_subdirectory(dynamicHDR10)
- include_directories(dynamicHDR10)
- add_definitions(-DENABLE_DYNAMIC_HDR10)
-endif(ENABLE_DYNAMIC_HDR10)

+if (ENABLE_HDR10_PLUS)
+ include_directories(. dynamicHDR10 "${PROJECT_BINARY_DIR}")
+ add_subdirectory(dynamicHDR10)
+ add_definitions(-DENABLE_HDR10_PLUS)
+endif(ENABLE_HDR10_PLUS)
 # this option can only be used when linking multiple libx265 libraries
 # together, and some alternate API access method is implemented.
 option(EXPORT_C_API "Implement public C programming interface" ON)

 endif()
 endif()
 source_group(ASM FILES ${ASM_SRCS})
-if(ENABLE_DYNAMIC_HDR10)
+if(ENABLE_HDR10_PLUS)
 add_library(x265-static STATIC $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> $<TARGET_OBJECTS:dynamicHDR10> ${ASM_OBJS} ${ASM_SRCS})
+ add_library(hdr10plus-static STATIC $<TARGET_OBJECTS:dynamicHDR10>)
+ set_target_properties(hdr10plus-static PROPERTIES OUTPUT_NAME hdr10plus)
 else()
 add_library(x265-static STATIC $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> ${ASM_OBJS} ${ASM_SRCS})
 endif()

 install(TARGETS x265-static
 LIBRARY DESTINATION ${LIB_INSTALL_DIR}
 ARCHIVE DESTINATION ${LIB_INSTALL_DIR})
+
+if(ENABLE_HDR10_PLUS)
+ install(TARGETS hdr10plus-static
+ LIBRARY DESTINATION ${LIB_INSTALL_DIR}
+ ARCHIVE DESTINATION ${LIB_INSTALL_DIR})
+endif()
 install(FILES x265.h "${PROJECT_BINARY_DIR}/x265_config.h" DESTINATION include)

 if(CMAKE_RC_COMPILER)

 endif()
 option(ENABLE_SHARED "Build shared library" ON)
 if(ENABLE_SHARED)
-
- if(ENABLE_DYNAMIC_HDR10)
+ if(ENABLE_HDR10_PLUS)
 add_library(x265-shared SHARED "${PROJECT_BINARY_DIR}/x265.def" ${ASM_OBJS}
 ${X265_RC_FILE} $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> $<TARGET_OBJECTS:dynamicHDR10>)
+ add_library(hdr10plus-shared SHARED $<TARGET_OBJECTS:dynamicHDR10>)
+
+ if(MSVC)
+ set_target_properties(hdr10plus-shared PROPERTIES OUTPUT_NAME libhdr10plus)
+ else()
+ set_target_properties(hdr10plus-shared PROPERTIES OUTPUT_NAME hdr10plus)
+ endif()
 else()
 add_library(x265-shared SHARED "${PROJECT_BINARY_DIR}/x265.def" ${ASM_OBJS}
 ${X265_RC_FILE} $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common>)

 ARCHIVE DESTINATION ${LIB_INSTALL_DIR}
 RUNTIME DESTINATION ${BIN_INSTALL_DIR})
 endif()
+ if(ENABLE_HDR10_PLUS)
+ install(TARGETS hdr10plus-shared
+ LIBRARY DESTINATION ${LIB_INSTALL_DIR}
+ ARCHIVE DESTINATION ${LIB_INSTALL_DIR})
+ endif()
 if(LINKER_OPTIONS)
 # set_target_properties can't do list expansion
 string(REPLACE ";" " " LINKER_OPTION_STR "${LINKER_OPTIONS}")

 endif(WIN32)
 if(XCODE)
 # Xcode seems unable to link the CLI with libs, so link as one targget
- if(ENABLE_DYNAMIC_HDR10)
+ if(ENABLE_HDR10_PLUS)
 add_executable(cli ../COPYING ${InputFiles} ${OutputFiles} ${GETOPT}
- x265.cpp x265.h x265cli.h x265-extras.h x265-extras.cpp
+ x265.cpp x265.h x265cli.h
 $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> $<TARGET_OBJECTS:dynamicHDR10> ${ASM_OBJS} ${ASM_SRCS})
 else()
 add_executable(cli ../COPYING ${InputFiles} ${OutputFiles} ${GETOPT}
- x265.cpp x265.h x265cli.h x265-extras.h x265-extras.cpp
+ x265.cpp x265.h x265cli.h
 $<TARGET_OBJECTS:encoder> $<TARGET_OBJECTS:common> ${ASM_OBJS} ${ASM_SRCS})
 endif()
 else()
 add_executable(cli ../COPYING ${InputFiles} ${OutputFiles} ${GETOPT} ${X265_RC_FILE}
- ${ExportDefs} x265.cpp x265.h x265cli.h x265-extras.h x265-extras.cpp)
+ ${ExportDefs} x265.cpp x265.h x265cli.h)
 if(WIN32 OR NOT ENABLE_SHARED OR INTEL_CXX)
 # The CLI cannot link to the shared library on Windows, it
 # requires internal APIs not exported from the DLL
x265_2.4.tar.gz/source/common/CMakeLists.txt -> x265_2.5.tar.gz/source/common/CMakeLists.txt
Changed
 set(VEC_PRIMITIVES vec/vec-primitives.cpp ${PRIMITIVES})
 source_group(Intrinsics FILES ${VEC_PRIMITIVES})

- set(C_SRCS asm-primitives.cpp pixel.h mc.h ipfilter8.h blockcopy8.h dct8.h loopfilter.h)
+ set(C_SRCS asm-primitives.cpp pixel.h mc.h ipfilter8.h blockcopy8.h dct8.h loopfilter.h seaintegral.h)
 set(A_SRCS pixel-a.asm const-a.asm cpu-a.asm ssd-a.asm mc-a.asm
 mc-a2.asm pixel-util8.asm blockcopy8.asm
- pixeladd8.asm dct8.asm)
+ pixeladd8.asm dct8.asm seaintegral.asm)
 if(HIGH_BIT_DEPTH)
 set(A_SRCS ${A_SRCS} sad16-a.asm intrapred16.asm ipfilter16.asm loopfilter.asm)
 else()
x265_2.4.tar.gz/source/common/common.h -> x265_2.5.tar.gz/source/common/common.h
Changed
 #define LOG2_RASTER_SIZE (MAX_LOG2_CU_SIZE - LOG2_UNIT_SIZE)
 #define RASTER_SIZE (1 << LOG2_RASTER_SIZE)
 #define MAX_NUM_PARTITIONS (RASTER_SIZE * RASTER_SIZE)
-#define NUM_4x4_PARTITIONS (1U << (g_unitSizeDepth << 1)) // number of 4x4 units in max CU size

 #define MIN_PU_SIZE 4
 #define MIN_TU_SIZE 4
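Review note on the removed NUM_4x4_PARTITIONS macro: its value depended on the global g_unitSizeDepth, which is why it could not survive the move to per-instance settings. The sketch below recomputes the same quantity from a per-instance CTU size. It assumes a unit size of 4 (log2 = 2) and that the unit-size depth equals the CTU log2 size minus the unit log2 size; neither definition appears in the hunks shown here, and the names kLog2UnitSize, unitSizeDepth, num4x4Partitions and partitionsAtDepth are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: units are 4x4 samples, so log2 of the unit size is 2.
constexpr uint32_t kLog2UnitSize = 2;

// Assumption: unit-size depth = log2(CTU size) - log2(unit size).
constexpr uint32_t unitSizeDepth(uint32_t maxLog2CUSize)
{
    return maxLog2CUSize - kLog2UnitSize;
}

// Same expression as the removed macro, 1U << (unitSizeDepth << 1),
// but driven by a per-instance value instead of a global.
constexpr uint32_t num4x4Partitions(uint32_t maxLog2CUSize)
{
    return 1U << (unitSizeDepth(maxLog2CUSize) << 1);
}

// Number of 4x4 units covered by one CU at the given depth, using the
// same ">> (depth * 2)" shift that cudata.cpp applies in this diff.
constexpr uint32_t partitionsAtDepth(uint32_t maxLog2CUSize, uint32_t depth)
{
    return num4x4Partitions(maxLog2CUSize) >> (depth * 2);
}

// A 64x64 CTU holds 16 x 16 = 256 units of 4x4 samples.
static_assert(num4x4Partitions(6) == 256, "64x64 CTU");
static_assert(partitionsAtDepth(6, 1) == 64, "32x32 CU inside a 64x64 CTU");
```

The 256 result for a log2 size of 6 lines up with the bcast256 selection under "case 6" in the cudata.cpp hunk of this request.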
x265_2.4.tar.gz/source/common/constants.cpp -> x265_2.5.tar.gz/source/common/constants.cpp
Changed
 65535
 };

-int g_ctuSizeConfigured = 0;
 uint32_t g_maxLog2CUSize = MAX_LOG2_CU_SIZE;
 uint32_t g_maxCUSize = MAX_CU_SIZE;
 uint32_t g_unitSizeDepth = NUM_CU_DEPTH;
x265_2.4.tar.gz/source/common/constants.h -> x265_2.5.tar.gz/source/common/constants.h
Changed
 namespace X265_NS {
 // private namespace

-extern int g_ctuSizeConfigured;
-
 extern double x265_lambda_tab[QP_MAX_MAX + 1];
 extern double x265_lambda2_tab[QP_MAX_MAX + 1];
 extern const uint16_t x265_chroma_lambda2_offset_tab[MAX_CHROMA_LAMBDA_OFFSET + 1];
x265_2.4.tar.gz/source/common/cpu.cpp -> x265_2.5.tar.gz/source/common/cpu.cpp
Changed
 { "SSE2Slow", SSE2 | X265_CPU_SSE2_IS_SLOW },
 { "SSE2", SSE2 },
 { "SSE2Fast", SSE2 | X265_CPU_SSE2_IS_FAST },
+ { "LZCNT", X265_CPU_LZCNT },
 { "SSE3", SSE2 | X265_CPU_SSE3 },
 { "SSSE3", SSE2 | X265_CPU_SSE3 | X265_CPU_SSSE3 },
 { "SSE4.1", SSE2 | X265_CPU_SSE3 | X265_CPU_SSSE3 | X265_CPU_SSE4 },

 { "AVX", AVX },
 { "XOP", AVX | X265_CPU_XOP },
 { "FMA4", AVX | X265_CPU_FMA4 },
- { "AVX2", AVX | X265_CPU_AVX2 },
 { "FMA3", AVX | X265_CPU_FMA3 },
+ { "BMI1", AVX | X265_CPU_LZCNT | X265_CPU_BMI1 },
+ { "BMI2", AVX | X265_CPU_LZCNT | X265_CPU_BMI1 | X265_CPU_BMI2 },
+#define AVX2 AVX | X265_CPU_FMA3 | X265_CPU_LZCNT | X265_CPU_BMI1 | X265_CPU_BMI2 | X265_CPU_AVX2
+ { "AVX2", AVX2},
+#undef AVX2
 #undef AVX
 #undef SSE2
 #undef MMX2
 { "Cache32", X265_CPU_CACHELINE_32 },
 { "Cache64", X265_CPU_CACHELINE_64 },
- { "LZCNT", X265_CPU_LZCNT },
- { "BMI1", X265_CPU_BMI1 },
- { "BMI2", X265_CPU_BMI1 | X265_CPU_BMI2 },
 { "SlowCTZ", X265_CPU_SLOW_CTZ },
 { "SlowAtom", X265_CPU_SLOW_ATOM },
 { "SlowPshufb", X265_CPU_SLOW_PSHUFB },
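Review note on the cpu.cpp table change: the "AVX2" entry now ORs in FMA3, LZCNT, BMI1 and BMI2 in addition to AVX and AVX2. If the table is consumed with an "all required bits present" test (an assumption, the consumer is not part of this diff), a CPU mask that reports AVX2 without BMI2 would stop matching the "AVX2" name. The sketch below shows that effect with made-up bit values; the CPU_* constants and the matches helper are hypothetical stand-ins, and the real X265_CPU_* values live in x265.h.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit values for illustration only.
enum : uint32_t {
    CPU_AVX   = 1u << 0,
    CPU_FMA3  = 1u << 1,
    CPU_LZCNT = 1u << 2,
    CPU_BMI1  = 1u << 3,
    CPU_BMI2  = 1u << 4,
    CPU_AVX2  = 1u << 5,
};

// Mask for the "AVX2" name before and after this change (simplified:
// the real AVX macro expands to more bits than a single flag).
constexpr uint32_t kAvx2Old = CPU_AVX | CPU_AVX2;
constexpr uint32_t kAvx2New = CPU_AVX | CPU_FMA3 | CPU_LZCNT | CPU_BMI1 | CPU_BMI2 | CPU_AVX2;

// Assumed lookup rule: a name applies only if every required bit is set.
constexpr bool matches(uint32_t detected, uint32_t required)
{
    return (detected & required) == required;
}

// A mask with AVX and AVX2 only satisfies the old entry but not the new one.
static_assert(matches(CPU_AVX | CPU_AVX2, kAvx2Old), "old entry matches");
static_assert(!matches(CPU_AVX | CPU_AVX2, kAvx2New), "new entry needs more bits");
```

Under that assumption the stricter mask is a deliberate tightening: code selected under the "AVX2" name may then rely on BMI and LZCNT instructions being available.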
x265_2.4.tar.gz/source/common/cudata.cpp -> x265_2.5.tar.gz/source/common/cudata.cpp
Changed
219
1
2
#include "picyuv.h"
3
#include "mv.h"
4
#include "cudata.h"
5
+#define MAX_MV 1 << 14
6
7
using namespace X265_NS;
8
9
10
11
}
12
13
-cubcast_t CUData::s_partSet[NUM_FULL_DEPTH] = { NULL, NULL, NULL, NULL, NULL };
14
-uint32_t CUData::s_numPartInCUSize;
15
-
16
CUData::CUData()
17
{
18
memset(this, 0, sizeof(*this));
19
}
20
21
-void CUData::initialize(const CUDataMemPool& dataPool, uint32_t depth, int csp, int instance)
22
+void CUData::initialize(const CUDataMemPool& dataPool, uint32_t depth, const x265_param& param, int instance)
23
{
24
+ int csp = param.internalCsp;
25
m_chromaFormat = csp;
26
m_hChromaShift = CHROMA_H_SHIFT(csp);
27
m_vChromaShift = CHROMA_V_SHIFT(csp);
28
- m_numPartitions = NUM_4x4_PARTITIONS >> (depth * 2);
29
+ m_numPartitions = param.num4x4Partitions >> (depth * 2);
30
31
if (!s_partSet[0])
32
{
33
- s_numPartInCUSize = 1 << g_unitSizeDepth;
34
- switch (g_maxLog2CUSize)
35
+ s_numPartInCUSize = 1 << param.unitSizeDepth;
36
+ switch (param.maxLog2CUSize)
37
{
38
case 6:
39
s_partSet[0] = bcast256;
40
41
42
m_distortion = dataPool.distortionMemBlock + instance * m_numPartitions;
43
44
- uint32_t cuSize = g_maxCUSize >> depth;
45
+ uint32_t cuSize = param.maxCUSize >> depth;
46
m_trCoeff[0] = dataPool.trCoeffMemBlock + instance * (cuSize * cuSize);
47
m_trCoeff[1] = m_trCoeff[2] = 0;
48
m_transformSkip[1] = m_transformSkip[2] = m_cbf[1] = m_cbf[2] = 0;
49
50
m_distortion = dataPool.distortionMemBlock + instance * m_numPartitions;

- uint32_t cuSize = g_maxCUSize >> depth;
+ uint32_t cuSize = param.maxCUSize >> depth;
uint32_t sizeL = cuSize * cuSize;
uint32_t sizeC = sizeL >> (m_hChromaShift + m_vChromaShift); // block chroma part
m_trCoeff[0] = dataPool.trCoeffMemBlock + instance * (sizeL + sizeC * 2);

m_encData = frame.m_encData;
m_slice = m_encData->m_slice;
m_cuAddr = cuAddr;
- m_cuPelX = (cuAddr % m_slice->m_sps->numCuInWidth) << g_maxLog2CUSize;
- m_cuPelY = (cuAddr / m_slice->m_sps->numCuInWidth) << g_maxLog2CUSize;
+ m_cuPelX = (cuAddr % m_slice->m_sps->numCuInWidth) << m_slice->m_param->maxLog2CUSize;
+ m_cuPelY = (cuAddr / m_slice->m_sps->numCuInWidth) << m_slice->m_param->maxLog2CUSize;
m_absIdxInCTU = 0;
- m_numPartitions = NUM_4x4_PARTITIONS;
+ m_numPartitions = m_encData->m_param->num4x4Partitions;
m_bFirstRowInSlice = (uint8_t)firstRowInSlice;
m_bLastRowInSlice = (uint8_t)lastRowInSlice;
m_bLastCuInSlice = (uint8_t)lastCuInSlice;

/* sequential memsets */
m_partSet((uint8_t*)m_qp, (uint8_t)qp);
- m_partSet(m_log2CUSize, (uint8_t)g_maxLog2CUSize);
+ m_partSet(m_log2CUSize, (uint8_t)m_slice->m_param->maxLog2CUSize);
m_partSet(m_lumaIntraDir, (uint8_t)ALL_IDX);
m_partSet(m_chromaIntraDir, (uint8_t)ALL_IDX);
m_partSet(m_tqBypass, (uint8_t)frame.m_encData->m_param->bLossless);

memcpy(m_distortion + offset, subCU.m_distortion, childGeom.numPartitions * sizeof(sse_t));

- uint32_t tmp = 1 << ((g_maxLog2CUSize - childGeom.depth) * 2);
+ uint32_t tmp = 1 << ((m_slice->m_param->maxLog2CUSize - childGeom.depth) * 2);
uint32_t tmp2 = subPartIdx * tmp;
memcpy(m_trCoeff[0] + tmp2, subCU.m_trCoeff[0], sizeof(coeff_t)* tmp);

memcpy(ctu.m_distortion + m_absIdxInCTU, m_distortion, m_numPartitions * sizeof(sse_t));

- uint32_t tmpY = 1 << ((g_maxLog2CUSize - depth) * 2);
+ uint32_t tmpY = 1 << ((m_slice->m_param->maxLog2CUSize - depth) * 2);
uint32_t tmpY2 = m_absIdxInCTU << (LOG2_UNIT_SIZE * 2);
memcpy(ctu.m_trCoeff[0] + tmpY2, m_trCoeff[0], sizeof(coeff_t)* tmpY);

m_partCopy(ctu.m_tuDepth + m_absIdxInCTU, m_tuDepth);
m_partCopy(ctu.m_cbf[0] + m_absIdxInCTU, m_cbf[0]);

- uint32_t tmpY = 1 << ((g_maxLog2CUSize - depth) * 2);
+ uint32_t tmpY = 1 << ((m_slice->m_param->maxLog2CUSize - depth) * 2);
uint32_t tmpY2 = m_absIdxInCTU << (LOG2_UNIT_SIZE * 2);
memcpy(ctu.m_trCoeff[0] + tmpY2, m_trCoeff[0], sizeof(coeff_t)* tmpY);

return m_cuLeft;
}

- alPartUnitIdx = NUM_4x4_PARTITIONS - 1;
+ alPartUnitIdx = m_encData->m_param->num4x4Partitions - 1;
return m_cuAboveLeft;
}

/* Get left QpMinCu */
const CUData* CUData::getQpMinCuLeft(uint32_t& lPartUnitIdx, uint32_t curAbsIdxInCTU) const
{
- uint32_t absZorderQpMinCUIdx = curAbsIdxInCTU & (0xFF << (g_unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2);
+ uint32_t absZorderQpMinCUIdx = curAbsIdxInCTU & (0xFF << (m_encData->m_param->unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2);
uint32_t absRorderQpMinCUIdx = g_zscanToRaster[absZorderQpMinCUIdx];

// check for left CTU boundary

/* Get above QpMinCu */
const CUData* CUData::getQpMinCuAbove(uint32_t& aPartUnitIdx, uint32_t curAbsIdxInCTU) const
{
- uint32_t absZorderQpMinCUIdx = curAbsIdxInCTU & (0xFF << (g_unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2);
+ uint32_t absZorderQpMinCUIdx = curAbsIdxInCTU & (0xFF << (m_encData->m_param->unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2);
uint32_t absRorderQpMinCUIdx = g_zscanToRaster[absZorderQpMinCUIdx];

// check for top CTU boundary

int8_t CUData::getLastCodedQP(uint32_t absPartIdx) const
{
- uint32_t quPartIdxMask = 0xFF << (g_unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2;
+ uint32_t quPartIdxMask = 0xFF << (m_encData->m_param->unitSizeDepth - m_slice->m_pps->maxCuDQPDepth) * 2;
int lastValidPartIdx = getLastValidPartIdx(absPartIdx & quPartIdxMask);

if (lastValidPartIdx >= 0)

if (m_absIdxInCTU)
return m_encData->getPicCTU(m_cuAddr)->getLastCodedQP(m_absIdxInCTU);
else if (m_cuAddr > 0 && !(m_slice->m_pps->bEntropyCodingSyncEnabled && !(m_cuAddr % m_slice->m_sps->numCuInWidth)))
- return m_encData->getPicCTU(m_cuAddr - 1)->getLastCodedQP(NUM_4x4_PARTITIONS);
+ return m_encData->getPicCTU(m_cuAddr - 1)->getLastCodedQP(m_encData->m_param->num4x4Partitions);
else
return (int8_t)m_slice->m_sliceQp;
}

bool CUData::setQPSubCUs(int8_t qp, uint32_t absPartIdx, uint32_t depth)
{
- uint32_t curPartNumb = NUM_4x4_PARTITIONS >> (depth << 1);
+ uint32_t curPartNumb = m_encData->m_param->num4x4Partitions >> (depth << 1);
uint32_t curPartNumQ = curPartNumb >> 2;

if (m_cuDepth[absPartIdx] > depth)

dir |= (1 << list);
candMvField[count][list].mv = colmv;
candMvField[count][list].refIdx = refIdx;
+ if (m_encData->m_param->scaleFactor && m_encData->m_param->analysisReuseMode == X265_ANALYSIS_SAVE && m_log2CUSize[0] < 4)
+ {
+ MV dist(MAX_MV, MAX_MV);
+ candMvField[count][list].mv = dist;
+ }
}
}

int curRefPOC = m_slice->m_refPOCList[picList][refIdx];
int curPOC = m_slice->m_poc;

- pmv[numMvc++] = amvpCand[num++] = scaleMvByPOCDist(neighbours[MD_COLLOCATED].mv[picList], curPOC, curRefPOC, colPOC, colRefPOC);
+ if (m_encData->m_param->scaleFactor && m_encData->m_param->analysisReuseMode == X265_ANALYSIS_SAVE && (m_log2CUSize[0] < 4))
+ {
+ MV dist(MAX_MV, MAX_MV);
+ pmv[numMvc++] = amvpCand[num++] = dist;
+ }
+ else
+ pmv[numMvc++] = amvpCand[num++] = scaleMvByPOCDist(neighbours[MD_COLLOCATED].mv[picList], curPOC, curRefPOC, colPOC, colRefPOC);
}
}

uint32_t offset = 8;

int16_t xmax = (int16_t)((m_slice->m_sps->picWidthInLumaSamples + offset - m_cuPelX - 1) << mvshift);
- int16_t xmin = -(int16_t)((g_maxCUSize + offset + m_cuPelX - 1) << mvshift);
+ int16_t xmin = -(int16_t)((m_encData->m_param->maxCUSize + offset + m_cuPelX - 1) << mvshift);

int16_t ymax = (int16_t)((m_slice->m_sps->picHeightInLumaSamples + offset - m_cuPelY - 1) << mvshift);
- int16_t ymin = -(int16_t)((g_maxCUSize + offset + m_cuPelY - 1) << mvshift);
+ int16_t ymin = -(int16_t)((m_encData->m_param->maxCUSize + offset + m_cuPelY - 1) << mvshift);

outMV.x = X265_MIN(xmax, X265_MAX(xmin, outMV.x));
outMV.y = X265_MIN(ymax, X265_MAX(ymin, outMV.y));

void CUData::calcCTUGeoms(uint32_t ctuWidth, uint32_t ctuHeight, uint32_t maxCUSize, uint32_t minCUSize, CUGeom cuDataArray[CUGeom::MAX_GEOMS])
{
+ uint32_t num4x4Partition = (1U << ((g_log2Size[maxCUSize] - LOG2_UNIT_SIZE) << 1));
+
// Initialize the coding blocks inside the CTB
for (uint32_t log2CUSize = g_log2Size[maxCUSize], rangeCUIdx = 0; log2CUSize >= g_log2Size[minCUSize]; log2CUSize--)
{

cu->log2CUSize = log2CUSize;
cu->childOffset = childIdx - cuIdx;
cu->absPartIdx = g_depthScanIdx[yOffset][xOffset] * 4;
- cu->numPartitions = (NUM_4x4_PARTITIONS >> ((g_maxLog2CUSize - cu->log2CUSize) * 2));
+ cu->numPartitions = (num4x4Partition >> ((g_log2Size[maxCUSize] - cu->log2CUSize) * 2));
cu->depth = g_log2Size[maxCUSize] - log2CUSize;
cu->geomRecurId = cuIdx;

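The cudata.cpp hunks replace the compile-time NUM_4x4_PARTITIONS constant and the g_* globals with values read from x265_param. As a rough sketch of the arithmetic, mirroring the num4x4Partition expression added to calcCTUGeoms, the helpers below derive the partition count from the CTU size. The names log2Size, num4x4Partitions and partitionsAtDepth, and the value kLog2UnitSize = 2 (a 4x4 minimum unit), are assumptions made for this illustration and are not x265 API.

```cpp
#include <cassert>
#include <cstdint>

// Assumed for illustration: a 4x4 minimum unit, i.e. log2(4) = 2.
static const uint32_t kLog2UnitSize = 2;

// Integer log2 for power-of-two sizes (hypothetical stand-in for the g_log2Size[] table).
inline uint32_t log2Size(uint32_t size)
{
    uint32_t n = 0;
    while (size > 1)
    {
        size >>= 1;
        ++n;
    }
    return n;
}

// Number of 4x4 units in one CTU of the given size.
inline uint32_t num4x4Partitions(uint32_t maxCUSize)
{
    return 1U << ((log2Size(maxCUSize) - kLog2UnitSize) << 1);
}

// Each depth level splits a CU into four, so the count drops by a factor of 4 per level.
inline uint32_t partitionsAtDepth(uint32_t maxCUSize, uint32_t depth)
{
    return num4x4Partitions(maxCUSize) >> (depth * 2);
}
```

Under these assumptions a 64x64 CTU yields 256 partitions at depth 0, which is consistent with the bcast256 case selected for maxLog2CUSize == 6 in initialize().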
x265_2.4.tar.gz/source/common/cudata.h -> x265_2.5.tar.gz/source/common/cudata.h
Changed
{
public:

- static cubcast_t s_partSet[NUM_FULL_DEPTH]; // pointer to broadcast set functions per absolute depth
- static uint32_t s_numPartInCUSize;
+ cubcast_t s_partSet[NUM_FULL_DEPTH]; // pointer to broadcast set functions per absolute depth
+ uint32_t s_numPartInCUSize;

bool m_vbvAffected;

CUData();

- void initialize(const CUDataMemPool& dataPool, uint32_t depth, int csp, int instance);
+ void initialize(const CUDataMemPool& dataPool, uint32_t depth, const x265_param& param, int instance);
static void calcCTUGeoms(uint32_t ctuWidth, uint32_t ctuHeight, uint32_t maxCUSize, uint32_t minCUSize, CUGeom cuDataArray[CUGeom::MAX_GEOMS]);

void initCTU(const Frame& frame, uint32_t cuAddr, int qp, uint32_t firstRowInSlice, uint32_t lastRowInSlice, uint32_t lastCUInSlice);

void getInterTUQtDepthRange(uint32_t tuDepthRange[2], uint32_t absPartIdx) const;
uint32_t getBestRefIdx(uint32_t subPartIdx) const { return ((m_interDir[subPartIdx] & 1) << m_refIdx[0][subPartIdx]) |
(((m_interDir[subPartIdx] >> 1) & 1) << (m_refIdx[1][subPartIdx] + 16)); }
- uint32_t getPUOffset(uint32_t puIdx, uint32_t absPartIdx) const { return (partAddrTable[(int)m_partSize[absPartIdx]][puIdx] << (g_unitSizeDepth - m_cuDepth[absPartIdx]) * 2) >> 4; }
+ uint32_t getPUOffset(uint32_t puIdx, uint32_t absPartIdx) const { return (partAddrTable[(int)m_partSize[absPartIdx]][puIdx] << (m_slice->m_param->unitSizeDepth - m_cuDepth[absPartIdx]) * 2) >> 4; }

uint32_t getNumPartInter(uint32_t absPartIdx) const { return nbPartsTable[(int)m_partSize[absPartIdx]]; }
bool isIntra(uint32_t absPartIdx) const { return m_predMode[absPartIdx] == MODE_INTRA; }

void getAllowedChromaDir(uint32_t absPartIdx, uint32_t* modeList) const;
int getIntraDirLumaPredictor(uint32_t absPartIdx, uint32_t* intraDirPred) const;

- uint32_t getSCUAddr() const { return (m_cuAddr << g_unitSizeDepth * 2) + m_absIdxInCTU; }
+ uint32_t getSCUAddr() const { return (m_cuAddr << m_slice->m_param->unitSizeDepth * 2) + m_absIdxInCTU; }
uint32_t getCtxSplitFlag(uint32_t absPartIdx, uint32_t depth) const;
uint32_t getCtxSkipFlag(uint32_t absPartIdx) const;
void getTUEntropyCodingParameters(TUEntropyCodingParameters &result, uint32_t absPartIdx, uint32_t log2TrSize, bool bIsLuma) const;

CUDataMemPool() { charMemBlock = NULL; trCoeffMemBlock = NULL; mvMemBlock = NULL; distortionMemBlock = NULL; }

- bool create(uint32_t depth, uint32_t csp, uint32_t numInstances)
+ bool create(uint32_t depth, uint32_t csp, uint32_t numInstances, const x265_param& param)
{
- uint32_t numPartition = NUM_4x4_PARTITIONS >> (depth * 2);
- uint32_t cuSize = g_maxCUSize >> depth;
+ uint32_t numPartition = param.num4x4Partitions >> (depth * 2);
+ uint32_t cuSize = param.maxCUSize >> depth;
uint32_t sizeL = cuSize * cuSize;
if (csp == X265_CSP_I400)
{

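With s_partSet and s_numPartInCUSize no longer static, and g_unitSizeDepth replaced by a lookup through the parameter set, the address arithmetic in getSCUAddr() depends on the settings of the owning encoder instance rather than on process-wide state. The sketch below illustrates that idea only; InstanceParams and scuAddr are hypothetical names, not part of x265.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the per-encoder field reached via m_slice->m_param.
struct InstanceParams
{
    uint32_t unitSizeDepth; // maxLog2CUSize - LOG2_UNIT_SIZE in the removed x265_set_globals()
};

// Same arithmetic as getSCUAddr(): every CTU owns 1 << (2 * unitSizeDepth) smallest units,
// so the CTU address is shifted by that many bits before the in-CTU index is added.
inline uint32_t scuAddr(const InstanceParams& p, uint32_t cuAddr, uint32_t absIdxInCTU)
{
    return (cuAddr << (p.unitSizeDepth * 2)) + absIdxInCTU;
}
```

Two instances with different CTU sizes then produce different addresses for the same (cuAddr, absIdxInCTU) pair, for example unitSizeDepth 4 versus 2, which would correspond to 64x64 and 16x16 CTUs if LOG2_UNIT_SIZE is 2.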
x265_2.4.tar.gz/source/common/frame.cpp -> x265_2.5.tar.gz/source/common/frame.cpp
Changed
m_rcData = NULL;
m_encodeStartTime = 0;
m_reconfigureRc = false;
+ m_ctuInfo = NULL;
+ m_prevCtuInfoChange = NULL;
+ m_addOnDepth = NULL;
+ m_addOnCtuInfo = NULL;
+ m_addOnPrevChange = NULL;
}

bool Frame::create(x265_param *param, float* quantOffsets)

m_param = param;
CHECKED_MALLOC_ZERO(m_rcData, RcStats, 1);

- if (m_fencPic->create(param->sourceWidth, param->sourceHeight, param->internalCsp) &&
- m_lowres.create(m_fencPic, param->bframes, !!param->rc.aqMode || !!param->bAQMotion, param->rc.qgSize))
+ if (param->bCTUInfo)
+ {
+ uint32_t widthInCTU = (m_param->sourceWidth + param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+ uint32_t heightInCTU = (m_param->sourceHeight + param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+ uint32_t numCTUsInFrame = widthInCTU * heightInCTU;
+ CHECKED_MALLOC_ZERO(m_addOnDepth, uint8_t *, numCTUsInFrame);
+ CHECKED_MALLOC_ZERO(m_addOnCtuInfo, uint8_t *, numCTUsInFrame);
+ CHECKED_MALLOC_ZERO(m_addOnPrevChange, int *, numCTUsInFrame);
+ for (uint32_t i = 0; i < numCTUsInFrame; i++)
+ {
+ CHECKED_MALLOC_ZERO(m_addOnDepth[i], uint8_t, uint32_t(param->num4x4Partitions));
+ CHECKED_MALLOC_ZERO(m_addOnCtuInfo[i], uint8_t, uint32_t(param->num4x4Partitions));
+ CHECKED_MALLOC_ZERO(m_addOnPrevChange[i], int, uint32_t(param->num4x4Partitions));
+ }
+ }
+
+ if (m_fencPic->create(param) && m_lowres.create(m_fencPic, param->bframes, !!param->rc.aqMode || !!param->bAQMotion, param->rc.qgSize))
{
X265_CHECK((m_reconColCount == NULL), "m_reconColCount was initialized");
- m_numRows = (m_fencPic->m_picHeight + g_maxCUSize - 1) / g_maxCUSize;
+ m_numRows = (m_fencPic->m_picHeight + param->maxCUSize - 1) / param->maxCUSize;
m_reconRowFlag = new ThreadSafeInteger[m_numRows];
m_reconColCount = new ThreadSafeInteger[m_numRows];

m_reconPic = new PicYuv;
m_param = param;
m_encData->m_reconPic = m_reconPic;
- bool ok = m_encData->create(*param, sps, m_fencPic->m_picCsp) && m_reconPic->create(param->sourceWidth, param->sourceHeight, param->internalCsp);
+ bool ok = m_encData->create(*param, sps, m_fencPic->m_picCsp) && m_reconPic->create(param);
if (ok)
{
/* initialize right border of m_reconpicYuv as SAO may read beyond the
* end of the picture accessing uninitialized pixels */
- int maxHeight = sps.numCuInHeight * g_maxCUSize;
+ int maxHeight = sps.numCuInHeight * param->maxCUSize;
memset(m_reconPic->m_picOrg[0], 0, sizeof(pixel)* m_reconPic->m_stride * maxHeight);

/* use pre-calculated cu/pu offsets cached in the SPS structure */

delete[] m_userSEI.payloads;
}

+ if (m_ctuInfo)
+ {
+ uint32_t widthInCU = (m_param->sourceWidth + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+ uint32_t heightInCU = (m_param->sourceHeight + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+ uint32_t numCUsInFrame = widthInCU * heightInCU;
+ for (uint32_t i = 0; i < numCUsInFrame; i++)
+ {
+ X265_FREE((*m_ctuInfo + i)->ctuInfo);
+ (*m_ctuInfo + i)->ctuInfo = NULL;
+ X265_FREE(m_addOnDepth[i]);
+ m_addOnDepth[i] = NULL;
+ X265_FREE(m_addOnCtuInfo[i]);
+ m_addOnCtuInfo[i] = NULL;
+ X265_FREE(m_addOnPrevChange[i]);
+ m_addOnPrevChange[i] = NULL;
+ }
+ X265_FREE(*m_ctuInfo);
+ *m_ctuInfo = NULL;
+ X265_FREE(m_ctuInfo);
+ m_ctuInfo = NULL;
+ X265_FREE(m_prevCtuInfoChange);
+ m_prevCtuInfoChange = NULL;
+ X265_FREE(m_addOnDepth);
+ m_addOnDepth = NULL;
+ X265_FREE(m_addOnCtuInfo);
+ m_addOnCtuInfo = NULL;
+ X265_FREE(m_addOnPrevChange);
+ m_addOnPrevChange = NULL;
+ }
m_lowres.destroy();
X265_FREE(m_rcData);
}

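Both the allocation in Frame::create() and the cleanup path in frame.cpp size their per-CTU arrays with the same round-up-then-shift expression. A minimal sketch of that computation, assuming maxCUSize is a power of two equal to 1 << maxLog2CUSize; the function name ctusInFrame is hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of CTUs needed to cover a frame.
// Adding (maxCUSize - 1) before the shift rounds partially covered CTUs up.
inline uint32_t ctusInFrame(uint32_t width, uint32_t height,
                            uint32_t maxCUSize, uint32_t maxLog2CUSize)
{
    uint32_t widthInCTU  = (width  + maxCUSize - 1) >> maxLog2CUSize;
    uint32_t heightInCTU = (height + maxCUSize - 1) >> maxLog2CUSize;
    return widthInCTU * heightInCTU;
}
```

For a 1920x1080 frame with 64x64 CTUs this gives 30 x 17 = 510 CTUs. The height is not a multiple of 64, so the last CTU row is only partly covered by picture data.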
x265_2.4.tar.gz/source/common/frame.h -> x265_2.5.tar.gz/source/common/frame.h
Changed
double shortTermCplxCount;
int64_t totalBits;
int64_t encodedBits;
+ double coeff[4];
+ double count[4];
+ double offset[4];
+ double bufferFillFinal;
};

class Frame

x265_analysis_2Pass m_analysis2Pass;
RcStats* m_rcData;

+ x265_ctu_info_t** m_ctuInfo;
+ Event m_copied;
+ int* m_prevCtuInfoChange;
int64_t m_encodeStartTime;
+
+ uint8_t** m_addOnDepth;
+ uint8_t** m_addOnCtuInfo;
+ int** m_addOnPrevChange;
Frame();

bool create(x265_param *param, float* quantOffsets);

x265_2.4.tar.gz/source/common/framedata.cpp -> x265_2.5.tar.gz/source/common/framedata.cpp
Changed
if (param.rc.bStatWrite)
m_spsrps = const_cast<RPS*>(sps.spsrps);

- m_cuMemPool.create(0, param.internalCsp, sps.numCUsInFrame);
+ m_cuMemPool.create(0, param.internalCsp, sps.numCUsInFrame, param);
for (uint32_t ctuAddr = 0; ctuAddr < sps.numCUsInFrame; ctuAddr++)
- m_picCTU[ctuAddr].initialize(m_cuMemPool, 0, param.internalCsp, ctuAddr);
+ m_picCTU[ctuAddr].initialize(m_cuMemPool, 0, param, ctuAddr);

CHECKED_MALLOC_ZERO(m_cuStat, RCStatCU, sps.numCUsInFrame);
CHECKED_MALLOC(m_rowStat, RCStatRow, sps.numCuInHeight);

x265_2.4.tar.gz/source/common/framedata.h -> x265_2.5.tar.gz/source/common/framedata.h
Changed
double percentMergeCu[NUM_CU_DEPTH];
double percentIntraDistribution[NUM_CU_DEPTH][INTRA_MODES];
double percentInterDistribution[NUM_CU_DEPTH][3]; // 2Nx2N, RECT, AMP modes percentage
+ double ipCostRatio;

uint64_t cntIntraNxN;
uint64_t totalCu;

uint64_t cuInterDistribution[NUM_CU_DEPTH][INTER_MODES];
uint64_t cuIntraDistribution[NUM_CU_DEPTH][INTRA_MODES];

+
+ uint64_t totalPu[NUM_CU_DEPTH + 1];
+ uint64_t cntSkipPu[NUM_CU_DEPTH];
+ uint64_t cntIntraPu[NUM_CU_DEPTH];
+ uint64_t cntAmp[NUM_CU_DEPTH];
+ uint64_t cnt4x4;
+ uint64_t cntInterPu[NUM_CU_DEPTH][INTER_MODES - 1];
+ uint64_t cntMergePu[NUM_CU_DEPTH][INTER_MODES - 1];
+
FrameStats()
{
memset(this, 0, sizeof(FrameStats));

x265_2.4.tar.gz/source/common/ipfilter.cpp -> x265_2.5.tar.gz/source/common/ipfilter.cpp
Changed
const int16_t* coeff = (N == 4) ? g_chromaFilter[coeffIdx] : g_lumaFilter[coeffIdx];
int headRoom = IF_INTERNAL_PREC - X265_DEPTH;
int shift = IF_FILTER_PREC - headRoom;
- int offset = -IF_INTERNAL_OFFS << shift;
+ int offset = (unsigned)-IF_INTERNAL_OFFS << shift;
int blkheight = height;
-
src -= N / 2 - 1;

if (isRowExt)

const int16_t* c = (N == 4) ? g_chromaFilter[coeffIdx] : g_lumaFilter[coeffIdx];
int headRoom = IF_INTERNAL_PREC - X265_DEPTH;
int shift = IF_FILTER_PREC - headRoom;
- int offset = -IF_INTERNAL_OFFS << shift;
-
+ int offset = (unsigned)-IF_INTERNAL_OFFS << shift;
src -= (N / 2 - 1) * srcStride;
-
int row, col;
for (row = 0; row < height; row++)
{

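The ipfilter.cpp change casts the negated offset to unsigned before shifting. Left-shifting a negative signed integer is undefined behaviour in C++ before C++20, whereas shifting an unsigned value is always well defined, so the cast is a plausible reason for the edit (the diff itself does not state the motivation). The sketch below reproduces the pattern in isolation; kInternalOffs = 1 << 13 is an assumed value chosen for illustration, and filterOffset is a hypothetical name.

```cpp
#include <cassert>

// Assumed value for illustration only; the real IF_INTERNAL_OFFS is defined elsewhere in x265.
static const int kInternalOffs = 1 << 13;

// Hypothetical helper mirroring "(unsigned)-IF_INTERNAL_OFFS << shift".
inline int filterOffset(int shift)
{
    // Convert the negative value to unsigned first, so the shift has an unsigned left operand.
    unsigned shifted = static_cast<unsigned>(-kInternalOffs) << shift;
    // Converting back to int wraps modulo 2^N on common compilers
    // (implementation-defined before C++20, guaranteed since C++20).
    return static_cast<int>(shifted);
}
```

On two's-complement targets the result equals -kInternalOffs multiplied by 2^shift, which is the value the original signed expression was intended to produce.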
x265_2.4.tar.gz/source/common/lowres.h -> x265_2.5.tar.gz/source/common/lowres.h
Changed
bool bKeyframe;
bool bLastMiniGopBFrame;

+ double ipCostRatio;
+
/* lookahead output data */
int64_t costEst[X265_BFRAME_MAX + 2][X265_BFRAME_MAX + 2];
int64_t costEstAq[X265_BFRAME_MAX + 2][X265_BFRAME_MAX + 2];

x265_2.4.tar.gz/source/common/param.cpp -> x265_2.5.tar.gz/source/common/param.cpp
Changed
param->frameNumThreads = 0;

param->logLevel = X265_LOG_INFO;
+ param->csvLogLevel = 0;
param->csvfn = NULL;
param->rc.lambdaFileName = NULL;
param->bLogCuStats = 0;

param->rdPenalty = 0;
param->psyRd = 2.0;
param->psyRdoq = 0.0;
- param->analysisMode = 0;
+ param->analysisReuseMode = 0;
param->analysisMultiPassRefine = 0;
param->analysisMultiPassDistortion = 0;
- param->analysisFileName = NULL;
+ param->analysisReuseFileName = NULL;
param->bIntraInBFrames = 0;
param->bLossless = 0;
param->bCULossless = 0;

param->rc.bEnableGrain = 0;
param->rc.qpMin = 0;
param->rc.qpMax = QP_MAX_MAX;
+ param->rc.bEnableConstVbv = 0;

/* Video Usability Information (VUI) */
param->vui.aspectRatioIdc = 0;

param->bOptCUDeltaQP = 0;
param->bAQMotion = 0;
param->bHDROpt = 0;
- param->analysisRefineLevel = 5;
+ param->analysisReuseLevel = 5;

param->toneMapFile = NULL;
param->bDhdr10opt = 0;
+ param->bCTUInfo = 0;
+ param->bUseRcStats = 0;
+ param->scaleFactor = 0;
+ param->intraRefine = 0;
+ param->interRefine = 0;
+ param->mvRefine = 0;
+ param->bUseAnalysisFile = 1;
+ param->csvfpt = NULL;
}

int x265_param_default_preset(x265_param* param, const char* preset, const char* tune)

param->psyRd = 4.0;
param->psyRdoq = 10.0;
param->bEnableSAO = 0;
+ param->rc.bEnableConstVbv = 1;
}
else
return -1;

p->rc.bStrictCbr = atobool(value);
p->rc.pbFactor = 1.0;
}
- OPT("analysis-mode") p->analysisMode = parseName(value, x265_analysis_names, bError);
+ OPT("analysis-reuse-mode") p->analysisReuseMode = parseName(value, x265_analysis_names, bError);
OPT("sar")
{
p->vui.aspectRatioIdc = parseName(value, x265_sar_names, bError);

OPT("scaling-list") p->scalingLists = strdup(value);
OPT2("pools", "numa-pools") p->numaPools = strdup(value);
OPT("lambda-file") p->rc.lambdaFileName = strdup(value);
- OPT("analysis-file") p->analysisFileName = strdup(value);
+ OPT("analysis-reuse-file") p->analysisReuseFileName = strdup(value);
OPT("qg-size") p->rc.qgSize = atoi(value);
OPT("master-display") p->masteringDisplayColorVolume = strdup(value);
OPT("max-cll") bError |= sscanf(value, "%hu,%hu", &p->maxCLL, &p->maxFALL) != 2;

if (bExtraParams)
{
if (0) ;
+ OPT("csv") p->csvfn = strdup(value);
+ OPT("csv-log-level") p->csvLogLevel = atoi(value);
OPT("qpmin") p->rc.qpMin = atoi(value);
OPT("analyze-src-pics") p->bSourceReferenceEstimation = atobool(value);
OPT("log2-max-poc-lsb") p->log2MaxPocLsb = atoi(value);

OPT("multi-pass-opt-distortion") p->analysisMultiPassDistortion = atobool(value);
OPT("aq-motion") p->bAQMotion = atobool(value);
OPT("dynamic-rd") p->dynamicRd = atof(value);
- OPT("refine-level") p->analysisRefineLevel = atoi(value);
+ OPT("analysis-reuse-level") p->analysisReuseLevel = atoi(value);
OPT("ssim-rd")
{
int bval = atobool(value);

OPT("limit-sao") p->bLimitSAO = atobool(value);
OPT("dhdr10-info") p->toneMapFile = strdup(value);
OPT("dhdr10-opt") p->bDhdr10opt = atobool(value);
+ OPT("const-vbv") p->rc.bEnableConstVbv = atobool(value);
+ OPT("ctu-info") p->bCTUInfo = atoi(value);
+ OPT("scale-factor") p->scaleFactor = atoi(value);
+ OPT("refine-intra")p->intraRefine = atoi(value);
+ OPT("refine-inter")p->interRefine = atobool(value);
+ OPT("refine-mv")p->mvRefine = atobool(value);
else
return X265_PARAM_BAD_NAME;
}

"Constant QP is incompatible with 2pass");
CHECK(param->rc.bStrictCbr && (param->rc.bitrate <= 0 || param->rc.vbvBufferSize <=0),
"Strict-cbr cannot be applied without specifying target bitrate or vbv bufsize");
- CHECK(param->analysisMode && (param->analysisMode < X265_ANALYSIS_OFF || param->analysisMode > X265_ANALYSIS_LOAD),
+ CHECK(param->analysisReuseMode && (param->analysisReuseMode < X265_ANALYSIS_OFF || param->analysisReuseMode > X265_ANALYSIS_LOAD),
"Invalid analysis mode. Analysis mode 0: OFF 1: SAVE : 2 LOAD");
- CHECK(param->analysisMode && (param->analysisRefineLevel < 1 || param->analysisRefineLevel > 10),
+ CHECK(param->analysisReuseMode && (param->analysisReuseLevel < 1 || param->analysisReuseLevel > 10),
"Invalid analysis refine level. Value must be between 1 and 10 (inclusive)");
+ CHECK(param->scaleFactor > 2, "Invalid scale-factor. Supports factor <= 2");
CHECK(param->rc.qpMax < QP_MIN || param->rc.qpMax > QP_MAX_MAX,
"qpmax exceeds supported range (0 to 69)");
CHECK(param->rc.qpMin < QP_MIN || param->rc.qpMin > QP_MAX_MAX,
"qpmin exceeds supported range (0 to 69)");
CHECK(param->log2MaxPocLsb < 4 || param->log2MaxPocLsb > 16,
"Supported range for log2MaxPocLsb is 4 to 16");
+ CHECK(param->bCTUInfo < 0 || (param->bCTUInfo != 0 && param->bCTUInfo != 1 && param->bCTUInfo != 2 && param->bCTUInfo != 4 && param->bCTUInfo != 6) || param->bCTUInfo > 6,
+ "Supported values for bCTUInfo are 0, 1, 2, 4, 6");
#if !X86_64
CHECK(param->searchMethod == X265_SEA && (param->sourceWidth > 840 || param->sourceHeight > 480),
"SEA motion search does not support resolutions greater than 480p in 32 bit build");

}
}

-int x265_set_globals(x265_param* param)
-{
- uint32_t maxLog2CUSize = (uint32_t)g_log2Size[param->maxCUSize];
- uint32_t minLog2CUSize = (uint32_t)g_log2Size[param->minCUSize];
-
- Lock gLock;
- ScopedLock sLock(gLock);
-
- if (++g_ctuSizeConfigured > 1)
- {
- if (g_maxCUSize != param->maxCUSize)
- {
- x265_log(param, X265_LOG_WARNING, "maxCUSize must be the same for all encoders in a single process");
- }
- if (g_maxCUDepth != maxLog2CUSize - minLog2CUSize)
- {
- x265_log(param, X265_LOG_WARNING, "maxCUDepth must be the same for all encoders in a single process");
- }
- param->maxCUSize = g_maxCUSize;
- return x265_check_params(param); /* Check again, since param may have changed */
- }
- else
- {
- // set max CU width & height
- g_maxCUSize = param->maxCUSize;
- g_maxLog2CUSize = maxLog2CUSize;
-
- // compute actual CU depth with respect to config depth and max transform size
- g_maxCUDepth = maxLog2CUSize - minLog2CUSize;
- g_unitSizeDepth = maxLog2CUSize - LOG2_UNIT_SIZE;
- }
-
- g_maxSlices = param->maxSlices;
- return 0;
-}
-
static void appendtool(x265_param* param, char* buf, size_t size, const char* toolstr)
{
static const int overhead = (int)strlen("x265 [info]: tools: ");

TOOLOPT(param->bEnableStrongIntraSmoothing, "strong-intra-smoothing");
TOOLVAL(param->lookaheadSlices, "lslices=%d");
TOOLVAL(param->lookaheadThreads, "lthreads=%d")
+ TOOLVAL(param->bCTUInfo, "ctu-info=%d");
if (param->maxSlices > 1)
TOOLVAL(param->maxSlices, "slices=%d");
if (param->bEnableLoopFilter)

TOOLOPT(!param->bSaoNonDeblocked && param->bEnableSAO, "sao");
TOOLOPT(param->rc.bStatWrite, "stats-write");
TOOLOPT(param->rc.bStatRead, "stats-read");
-#if ENABLE_DYNAMIC_HDR10
- TOOLVAL(param->toneMapFile != NULL, "dhdr10-info");
+#if ENABLE_HDR10_PLUS
+ TOOLOPT(param->toneMapFile != NULL, "dhdr10-info");
#endif
x265_log(param, X265_LOG_INFO, "tools:%s\n", buf);
fflush(stderr);

BOOL(p->bEnablePsnr, "psnr");
BOOL(p->bEnableSsim, "ssim");
s += sprintf(s, " log-level=%d", p->logLevel);
+ if (p->csvfn)
+ s += sprintf(s, " csvfn=%s csv-log-level=%d", p->csvfn, p->csvLogLevel);
s += sprintf(s, " bitdepth=%d", p->internalBitDepth);
s += sprintf(s, " input-csp=%d", p->internalCsp);
s += sprintf(s, " fps=%u/%u", p->fpsNum, p->fpsDenom);

s += sprintf(s, " psy-rd=%.2f", p->psyRd);
s += sprintf(s, " psy-rdoq=%.2f", p->psyRdoq);
BOOL(p->bEnableRdRefine, "rd-refine");
- s += sprintf(s, " analysis-mode=%d", p->analysisMode);
+ s += sprintf(s, " analysis-reuse-mode=%d", p->analysisReuseMode);
BOOL(p->bLossless, "lossless");
s += sprintf(s, " cbqpoffs=%d", p->cbQpOffset);
s += sprintf(s, " crqpoffs=%d", p->crQpOffset);

s += sprintf(s, " qg-size=%d", p->rc.qgSize);
BOOL(p->rc.bEnableGrain, "rc-grain");
s += sprintf(s, " qpmax=%d qpmin=%d", p->rc.qpMax, p->rc.qpMin);
+ BOOL(p->rc.bEnableConstVbv, "const-vbv");
s += sprintf(s, " sar=%d", p->vui.aspectRatioIdc);
if (p->vui.aspectRatioIdc == X265_EXTENDED_SAR)
s += sprintf(s, " sar-width : sar-height=%d:%d", p->vui.sarWidth, p->vui.sarHeight);

BOOL(p->bEmitHDRSEI, "hdr");
BOOL(p->bHDROpt, "hdr-opt");
BOOL(p->bDhdr10opt, "dhdr10-opt");
- s += sprintf(s, " refine-level=%d", p->analysisRefineLevel);
+ s += sprintf(s, " analysis-reuse-level=%d", p->analysisReuseLevel);
+ s += sprintf(s, " scale-factor=%d", p->scaleFactor);
+ s += sprintf(s, " refine-intra=%d", p->intraRefine);
+ s += sprintf(s, " refine-inter=%d", p->interRefine);
+ s += sprintf(s, " refine-mv=%d", p->mvRefine);
BOOL(p->bLimitSAO, "limit-sao");
+ s += sprintf(s, " ctu-info=%d", p->bCTUInfo);
#undef BOOL
return buf;
}

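The new bCTUInfo CHECK in param.cpp spells out its accepted values through a chain of comparisons; the range tests on either side are already implied by the membership test in the middle. An equivalent membership check could be sketched as follows. The helper name isSupportedCtuInfo is hypothetical, and the accepted set {0, 1, 2, 4, 6} is taken from the CHECK message in the diff.

```cpp
#include <cassert>

// Hypothetical helper: true only for the values the CHECK message lists as supported.
inline bool isSupportedCtuInfo(int value)
{
    switch (value)
    {
    case 0:
    case 1:
    case 2:
    case 4:
    case 6:
        return true;
    default:
        return false;
    }
}
```

The CHECK in the diff fires when its condition is true, so it corresponds to !isSupportedCtuInfo(param->bCTUInfo) in this sketch.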
x265_2.4.tar.gz/source/common/param.h -> x265_2.5.tar.gz/source/common/param.h
Changed
namespace X265_NS {

int x265_check_params(x265_param *param);
-int x265_set_globals(x265_param *param);
void x265_print_params(x265_param *param);
void x265_param_apply_fastfirstpass(x265_param *p);
char* x265_param2string(x265_param *param, int padx, int pady);

x265_2.4.tar.gz/source/common/picyuv.cpp -> x265_2.5.tar.gz/source/common/picyuv.cpp
Changed
m_maxLumaLevel = 0;
m_avgLumaLevel = 0;
+
+ m_maxChromaULevel = 0;
+ m_avgChromaULevel = 0;
+
+ m_maxChromaVLevel = 0;
+ m_avgChromaVLevel = 0;
+
+#if (X265_DEPTH > 8)
+ m_minLumaLevel = 0xFFFF;
+ m_minChromaULevel = 0xFFFF;
+ m_minChromaVLevel = 0xFFFF;
+#else
+ m_minLumaLevel = 0xFF;
+ m_minChromaULevel = 0xFF;
+ m_minChromaVLevel = 0xFF;
+#endif
+
m_stride = 0;
m_strideC = 0;
m_hChromaShift = 0;
m_vChromaShift = 0;
}

-bool PicYuv::create(uint32_t picWidth, uint32_t picHeight, uint32_t picCsp)
+bool PicYuv::create(x265_param* param, pixel *pixelbuf)
{
+ m_param = param;
+ uint32_t picWidth = m_param->sourceWidth;
+ uint32_t picHeight = m_param->sourceHeight;
+ uint32_t picCsp = m_param->internalCsp;
m_picWidth = picWidth;
m_picHeight = picHeight;
m_hChromaShift = CHROMA_H_SHIFT(picCsp);
m_vChromaShift = CHROMA_V_SHIFT(picCsp);
m_picCsp = picCsp;

- uint32_t numCuInWidth = (m_picWidth + g_maxCUSize - 1) / g_maxCUSize;
- uint32_t numCuInHeight = (m_picHeight + g_maxCUSize - 1) / g_maxCUSize;
+ uint32_t numCuInWidth = (m_picWidth + param->maxCUSize - 1) / param->maxCUSize;
+ uint32_t numCuInHeight = (m_picHeight + param->maxCUSize - 1) / param->maxCUSize;

- m_lumaMarginX = g_maxCUSize + 32; // search margin and 8-tap filter half-length, padded for 32-byte alignment
- m_lumaMarginY = g_maxCUSize + 16; // margin for 8-tap filter and infinite padding
- m_stride = (numCuInWidth * g_maxCUSize) + (m_lumaMarginX << 1);
+ m_lumaMarginX = param->maxCUSize + 32; // search margin and 8-tap filter half-length, padded for 32-byte alignment
+ m_lumaMarginY = param->maxCUSize + 16; // margin for 8-tap filter and infinite padding
+ m_stride = (numCuInWidth * param->maxCUSize) + (m_lumaMarginX << 1);

- int maxHeight = numCuInHeight * g_maxCUSize;
- CHECKED_MALLOC(m_picBuf[0], pixel, m_stride * (maxHeight + (m_lumaMarginY * 2)));
- m_picOrg[0] = m_picBuf[0] + m_lumaMarginY * m_stride + m_lumaMarginX;
+ int maxHeight = numCuInHeight * param->maxCUSize;
+ if (pixelbuf)
+ m_picOrg[0] = pixelbuf;
+ else
+ {
+ CHECKED_MALLOC(m_picBuf[0], pixel, m_stride * (maxHeight + (m_lumaMarginY * 2)));
+ m_picOrg[0] = m_picBuf[0] + m_lumaMarginY * m_stride + m_lumaMarginX;
+ }

if (picCsp != X265_CSP_I400)
{
m_chromaMarginX = m_lumaMarginX; // keep 16-byte alignment for chroma CTUs
m_chromaMarginY = m_lumaMarginY >> m_vChromaShift;
- m_strideC = ((numCuInWidth * g_maxCUSize) >> m_hChromaShift) + (m_chromaMarginX * 2);
+ m_strideC = ((numCuInWidth * m_param->maxCUSize) >> m_hChromaShift) + (m_chromaMarginX * 2);

CHECKED_MALLOC(m_picBuf[1], pixel, m_strideC * ((maxHeight >> m_vChromaShift) + (m_chromaMarginY * 2)));
CHECKED_MALLOC(m_picBuf[2], pixel, m_strideC * ((maxHeight >> m_vChromaShift) + (m_chromaMarginY * 2)));

return false;
}

+int PicYuv::getLumaBufLen(uint32_t picWidth, uint32_t picHeight, uint32_t picCsp)
+{
+ m_picWidth = picWidth;
+ m_picHeight = picHeight;
+ m_hChromaShift = CHROMA_H_SHIFT(picCsp);
+ m_vChromaShift = CHROMA_V_SHIFT(picCsp);
+ m_picCsp = picCsp;
+
+ uint32_t numCuInWidth = (m_picWidth + m_param->maxCUSize - 1) / m_param->maxCUSize;
+ uint32_t numCuInHeight = (m_picHeight + m_param->maxCUSize - 1) / m_param->maxCUSize;
+
+ m_lumaMarginX = m_param->maxCUSize + 32; // search margin and 8-tap filter half-length, padded for 32-byte alignment
+ m_lumaMarginY = m_param->maxCUSize + 16; // margin for 8-tap filter and infinite padding
+ m_stride = (numCuInWidth * m_param->maxCUSize) + (m_lumaMarginX << 1);
+
+ int maxHeight = numCuInHeight * m_param->maxCUSize;
+ int bufLen = (int)(m_stride * (maxHeight + (m_lumaMarginY * 2)));
+
+ return bufLen;
+}
+
/* the first picture allocated by the encoder will be asked to generate these
* offset arrays. Once generated, they will be provided to all future PicYuv
* allocated by the same encoder. */
bool PicYuv::createOffsets(const SPS& sps)
{
- uint32_t numPartitions = 1 << (g_unitSizeDepth * 2);
+ uint32_t numPartitions = 1 << (m_param->unitSizeDepth * 2);

if (m_picCsp != X265_CSP_I400)
{

{
for (uint32_t cuCol = 0; cuCol < sps.numCuInWidth; cuCol++)
{
- m_cuOffsetY[cuRow * sps.numCuInWidth + cuCol] = m_stride * cuRow * g_maxCUSize + cuCol * g_maxCUSize;
- m_cuOffsetC[cuRow * sps.numCuInWidth + cuCol] = m_strideC * cuRow * (g_maxCUSize >> m_vChromaShift) + cuCol * (g_maxCUSize >> m_hChromaShift);
+ m_cuOffsetY[cuRow * sps.numCuInWidth + cuCol] = m_stride * cuRow * m_param->maxCUSize + cuCol * m_param->maxCUSize;
+ m_cuOffsetC[cuRow * sps.numCuInWidth + cuCol] = m_strideC * cuRow * (m_param->maxCUSize >> m_vChromaShift) + cuCol * (m_param->maxCUSize >> m_hChromaShift);
}
}

CHECKED_MALLOC(m_cuOffsetY, intptr_t, sps.numCuInWidth * sps.numCuInHeight);
for (uint32_t cuRow = 0; cuRow < sps.numCuInHeight; cuRow++)
for (uint32_t cuCol = 0; cuCol < sps.numCuInWidth; cuCol++)
- m_cuOffsetY[cuRow * sps.numCuInWidth + cuCol] = m_stride * cuRow * g_maxCUSize + cuCol * g_maxCUSize;
+ m_cuOffsetY[cuRow * sps.numCuInWidth + cuCol] = m_stride * cuRow * m_param->maxCUSize + cuCol * m_param->maxCUSize;

CHECKED_MALLOC(m_buOffsetY, intptr_t, (size_t)numPartitions);
for (uint32_t idx = 0; idx < numPartitions; ++idx)

131
X265_CHECK(pic.bitDepth >= 8, "pic.bitDepth check failure");
132
133
+ uint64_t lumaSum;
134
+ uint64_t cbSum;
135
+ uint64_t crSum;
136
+ lumaSum = cbSum = crSum = 0;
137
+
138
if (pic.bitDepth == 8)
139
{
140
#if (X265_DEPTH > 8)
141
142
pixel *U = m_picOrg[1];
143
pixel *V = m_picOrg[2];
144
145
+ pixel *yPic = m_picOrg[0];
146
+ pixel *uPic = m_picOrg[1];
147
+ pixel *vPic = m_picOrg[2];
148
+
149
+ for (int r = 0; r < height; r++)
150
+ {
151
+ for (int c = 0; c < width; c++)
152
+ {
153
+ m_maxLumaLevel = X265_MAX(yPic[c], m_maxLumaLevel);
154
+ m_minLumaLevel = X265_MIN(yPic[c], m_minLumaLevel);
155
+ lumaSum += yPic[c];
156
+ }
157
+ yPic += m_stride;
158
+ }
159
+ m_avgLumaLevel = (double)lumaSum / (m_picHeight * m_picWidth);
160
+
161
+ if (param.csvLogLevel >= 2)
162
+ {
163
+ if (param.internalCsp != X265_CSP_I400)
164
+ {
165
+ for (int r = 0; r < height >> m_vChromaShift; r++)
166
+ {
167
+ for (int c = 0; c < width >> m_hChromaShift; c++)
168
+ {
169
+ m_maxChromaULevel = X265_MAX(uPic[c], m_maxChromaULevel);
170
+ m_minChromaULevel = X265_MIN(uPic[c], m_minChromaULevel);
171
+ cbSum += uPic[c];
172
+
173
+ m_maxChromaVLevel = X265_MAX(vPic[c], m_maxChromaVLevel);
174
+ m_minChromaVLevel = X265_MIN(vPic[c], m_minChromaVLevel);
175
+ crSum += vPic[c];
176
+ }
177
+
178
+ uPic += m_strideC;
179
+ vPic += m_strideC;
180
+ }
181
+ m_avgChromaULevel = (double)cbSum / ((height >> m_vChromaShift) * (width >> m_hChromaShift));
182
+ m_avgChromaVLevel = (double)crSum / ((height >> m_vChromaShift) * (width >> m_hChromaShift));
183
+ }
184
+ }
185
+
186
#if HIGH_BIT_DEPTH
187
bool calcHDRParams = !!param.minLuma || (param.maxLuma != PIXEL_MAX);
188
/* Apply min/max luma bounds for HDR pixel manipulations */
189
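Review note: the buffer-length arithmetic added in getLumaBufLen() can be checked in isolation. The following is a minimal sketch, not the x265 implementation; `LumaLayout` and `lumaLayout` are hypothetical names, `maxCUSize` stands in for `m_param->maxCUSize`, and the result is widened to 64 bits (the diff itself returns an `int`).

```cpp
#include <cstdint>

// Hypothetical container for the values the diff stores in PicYuv members.
struct LumaLayout
{
    uint32_t marginX;      // mirrors m_lumaMarginX
    uint32_t marginY;      // mirrors m_lumaMarginY
    uint32_t stride;       // mirrors m_stride
    uint32_t paddedHeight; // CU-aligned height plus top and bottom margins
    uint64_t bufLen;       // stride * paddedHeight, in samples
};

// Assumption: maxCUSize is a power of two such as 16, 32 or 64.
inline LumaLayout lumaLayout(uint32_t picWidth, uint32_t picHeight, uint32_t maxCUSize)
{
    // Round the picture up to a whole number of CUs in each dimension.
    uint32_t numCuInWidth  = (picWidth  + maxCUSize - 1) / maxCUSize;
    uint32_t numCuInHeight = (picHeight + maxCUSize - 1) / maxCUSize;

    LumaLayout l;
    l.marginX      = maxCUSize + 32;
    l.marginY      = maxCUSize + 16;
    l.stride       = numCuInWidth * maxCUSize + 2 * l.marginX;
    l.paddedHeight = numCuInHeight * maxCUSize + 2 * l.marginY;
    l.bufLen       = (uint64_t)l.stride * l.paddedHeight;
    return l;
}
```

For a 1920x1080 picture with 64x64 CUs this gives a stride of 2112 samples and 1248 padded rows.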
x265_2.4.tar.gz/source/common/picyuv.h -> x265_2.5.tar.gz/source/common/picyuv.h
Changed
     uint32_t m_chromaMarginX;
     uint32_t m_chromaMarginY;

-    pixel m_maxLumaLevel;
-    double m_avgLumaLevel;
+    pixel m_maxLumaLevel;
+    pixel m_minLumaLevel;
+    double m_avgLumaLevel;
+
+    pixel m_maxChromaULevel;
+    pixel m_minChromaULevel;
+    double m_avgChromaULevel;
+
+    pixel m_maxChromaVLevel;
+    pixel m_minChromaVLevel;
+    double m_avgChromaVLevel;
+    x265_param *m_param;

     PicYuv();

-    bool create(uint32_t picWidth, uint32_t picHeight, uint32_t csp);
+    bool create(x265_param* param, pixel *pixelbuf = NULL);
     bool createOffsets(const SPS& sps);
     void destroy();
+    int getLumaBufLen(uint32_t picWidth, uint32_t picHeight, uint32_t picCsp);

     void copyFromPicture(const x265_picture&, const x265_param& param, int padx, int pady);
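Review note: the new min/max/average members are filled by the per-plane loops added to copyFromPicture() in picyuv.cpp; per that diff, the chroma statistics are only gathered when `csvLogLevel >= 2` and the color space is not 4:0:0. A stand-alone sketch of one such plane scan might look like the following. `PlaneStats` and `planeStats` are hypothetical names, and the stride is assumed to be measured in samples.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical result type; the diff stores these values in PicYuv members.
struct PlaneStats
{
    uint16_t minLevel;
    uint16_t maxLevel;
    double   avgLevel;
};

// Scan a width x height region of a plane whose rows are `stride` samples apart.
template <typename Pixel>
PlaneStats planeStats(const Pixel* plane, int width, int height, std::ptrdiff_t stride)
{
    PlaneStats s = { 0xFFFF, 0, 0.0 };
    uint64_t sum = 0; // 64-bit accumulator, as in the diff

    for (int r = 0; r < height; r++)
    {
        for (int c = 0; c < width; c++)
        {
            uint16_t v = plane[c];
            s.maxLevel = std::max(s.maxLevel, v);
            s.minLevel = std::min(s.minLevel, v);
            sum += v;
        }
        plane += stride; // skip the padding at the end of each row
    }

    if (width > 0 && height > 0)
        s.avgLevel = (double)sum / ((double)width * height);
    return s;
}
```

For chroma, the same scan would be called with `width >> m_hChromaShift`, `height >> m_vChromaShift` and the chroma stride.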
x265_2.4.tar.gz/source/common/primitives.cpp -> x265_2.5.tar.gz/source/common/primitives.cpp
Changed
 void setupIntraPrimitives_c(EncoderPrimitives &p);
 void setupLoopFilterPrimitives_c(EncoderPrimitives &p);
 void setupSaoPrimitives_c(EncoderPrimitives &p);
+void setupSeaIntegralPrimitives_c(EncoderPrimitives &p);

 void setupCPrimitives(EncoderPrimitives &p)
 {

     setupIntraPrimitives_c(p);      // intrapred.cpp
     setupLoopFilterPrimitives_c(p); // loopfilter.cpp
     setupSaoPrimitives_c(p);        // sao.cpp
+    setupSeaIntegralPrimitives_c(p);  // framefilter.cpp
 }

 void setupAliasPrimitives(EncoderPrimitives &p)
x265_2.4.tar.gz/source/common/primitives.h -> x265_2.5.tar.gz/source/common/primitives.h
Changed
     BLOCK_422_32x64
 };

+enum IntegralSize
+{
+    INTEGRAL_4,
+    INTEGRAL_8,
+    INTEGRAL_12,
+    INTEGRAL_16,
+    INTEGRAL_24,
+    INTEGRAL_32,
+    NUM_INTEGRAL_SIZE
+};
+
 typedef int (*pixelcmp_t)(const pixel* fenc, intptr_t fencstride, const pixel* fref, intptr_t frefstride); // fenc is aligned
 typedef int (*pixelcmp_ss_t)(const int16_t* fenc, intptr_t fencstride, const int16_t* fref, intptr_t frefstride);
 typedef sse_t (*pixel_sse_t)(const pixel* fenc, intptr_t fencstride, const pixel* fref, intptr_t frefstride); // fenc is aligned

 typedef void (*pelFilterLumaStrong_t)(pixel* src, intptr_t srcStep, intptr_t offset, int32_t tcP, int32_t tcQ);
 typedef void (*pelFilterChroma_t)(pixel* src, intptr_t srcStep, intptr_t offset, int32_t tc, int32_t maskP, int32_t maskQ);

+typedef void (*integralv_t)(uint32_t *sum, intptr_t stride);
+typedef void (*integralh_t)(uint32_t *sum, pixel *pix, intptr_t stride);
+
 /* Function pointers to optimized encoder primitives. Each pointer can reference
  * either an assembly routine, a SIMD intrinsic primitive, or a C function */
 struct EncoderPrimitives

     pelFilterLumaStrong_t pelFilterLumaStrong[2]; // EDGE_VER = 0, EDGE_HOR = 1
     pelFilterChroma_t pelFilterChroma[2]; // EDGE_VER = 0, EDGE_HOR = 1

+    integralv_t integral_initv[NUM_INTEGRAL_SIZE];
+    integralh_t integral_inith[NUM_INTEGRAL_SIZE];
+
     /* There is one set of chroma primitives per color space. An encoder will
      * have just a single color space and thus it will only ever use one entry
      * in this array. However we always fill all entries in the array in case
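Review note: the two new function-pointer types correspond to the horizontal and vertical passes of the SEA integral computation. The sketch below is a plain C++ model inferred from the signatures and from the AVX2 kernels in seaintegral.asm in this request; `integralInitH` and `integralInitV` are hypothetical names, not the C fallbacks in framefilter.cpp. It assumes that `sum` has a valid row above it for the horizontal pass and `N` rows below it for the vertical pass.

```cpp
#include <cstddef>
#include <cstdint>

// Horizontal pass: sum[x] = (pix[x] + ... + pix[x + N - 1]) + sum one row up.
// Only the first (stride - N) entries of the row are written.
template <int N, typename Pixel>
void integralInitH(uint32_t* sum, const Pixel* pix, std::ptrdiff_t stride)
{
    uint32_t v = 0;
    for (int i = 0; i < N; i++)
        v += pix[i];

    for (std::ptrdiff_t x = 0; x < stride - N; x++)
    {
        sum[x] = v + sum[x - stride];
        v += pix[x + N] - pix[x]; // slide the N-wide window one sample right
    }
}

// Vertical pass: replace each entry with the difference to the entry N rows
// below, which leaves the sum over an N-row band.
template <int N>
void integralInitV(uint32_t* sum, std::ptrdiff_t stride)
{
    for (std::ptrdiff_t x = 0; x < stride; x++)
        sum[x] = sum[x + N * stride] - sum[x];
}
```

The AVX2 vertical kernels consume eight `uint32_t` values per iteration and loop until the counter reaches zero, so they appear to rely on the stride being a multiple of 8; the scalar model has no such restriction.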
x265_2.4.tar.gz/source/common/slice.cpp -> x265_2.5.tar.gz/source/common/slice.cpp
Changed
 uint32_t Slice::realEndAddress(uint32_t endCUAddr) const
 {
     // Calculate end address
-    uint32_t internalAddress = (endCUAddr - 1) % NUM_4x4_PARTITIONS;
-    uint32_t externalAddress = (endCUAddr - 1) / NUM_4x4_PARTITIONS;
-    uint32_t xmax = m_sps->picWidthInLumaSamples - (externalAddress % m_sps->numCuInWidth) * g_maxCUSize;
-    uint32_t ymax = m_sps->picHeightInLumaSamples - (externalAddress / m_sps->numCuInWidth) * g_maxCUSize;
+    uint32_t internalAddress = (endCUAddr - 1) % m_param->num4x4Partitions;
+    uint32_t externalAddress = (endCUAddr - 1) / m_param->num4x4Partitions;
+    uint32_t xmax = m_sps->picWidthInLumaSamples - (externalAddress % m_sps->numCuInWidth) * m_param->maxCUSize;
+    uint32_t ymax = m_sps->picHeightInLumaSamples - (externalAddress / m_sps->numCuInWidth) * m_param->maxCUSize;

     while (g_zscanToPelX[internalAddress] >= xmax || g_zscanToPelY[internalAddress] >= ymax)
         internalAddress--;

     internalAddress++;
-    if (internalAddress == NUM_4x4_PARTITIONS)
+    if (internalAddress == m_param->num4x4Partitions)
     {
         internalAddress = 0;
         externalAddress++;
     }

-    return externalAddress * NUM_4x4_PARTITIONS + internalAddress;
+    return externalAddress * m_param->num4x4Partitions + internalAddress;
 }
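Review note: realEndAddress() now reads the partition count and CU size from the per-instance `m_param` instead of compile-time constants and globals. The following self-contained sketch reproduces the address arithmetic so it can be tried with different CU sizes. `PicGeometry`, `compactBits`, `zscanToPelX` and `zscanToPelY` are hypothetical stand-ins; in particular the Z-scan lookup is assumed to be standard Z-order over 4x4 units (x in the even bits, y in the odd bits) rather than the `g_zscanToPelX`/`g_zscanToPelY` tables.

```cpp
#include <cstdint>

// Gather bits 0, 2, 4, ... of z into a compact integer.
inline uint32_t compactBits(uint32_t z)
{
    uint32_t v = 0;
    for (int b = 0; b < 16; b++)
        v |= ((z >> (2 * b)) & 1u) << b;
    return v;
}

// Assumed Z-order mapping: 4x4-unit index -> luma sample offset inside the CTU.
inline uint32_t zscanToPelX(uint32_t z) { return compactBits(z) * 4; }
inline uint32_t zscanToPelY(uint32_t z) { return compactBits(z >> 1) * 4; }

// Hypothetical bundle of the SPS and param fields used by the function.
struct PicGeometry
{
    uint32_t picWidth, picHeight, numCuInWidth, maxCUSize, num4x4Partitions;
};

// Precondition: endCUAddr > 0.
inline uint32_t realEndAddress(const PicGeometry& g, uint32_t endCUAddr)
{
    uint32_t internalAddress = (endCUAddr - 1) % g.num4x4Partitions;
    uint32_t externalAddress = (endCUAddr - 1) / g.num4x4Partitions;
    uint32_t xmax = g.picWidth  - (externalAddress % g.numCuInWidth) * g.maxCUSize;
    uint32_t ymax = g.picHeight - (externalAddress / g.numCuInWidth) * g.maxCUSize;

    // Walk back to the last 4x4 unit that lies inside the picture.
    while (zscanToPelX(internalAddress) >= xmax || zscanToPelY(internalAddress) >= ymax)
        internalAddress--;

    internalAddress++;
    if (internalAddress == g.num4x4Partitions)
    {
        internalAddress = 0;
        externalAddress++;
    }
    return externalAddress * g.num4x4Partitions + internalAddress;
}
```

With 64x64 CUs (256 partitions), a 32x32 picture ends at address 64, since only the first Z-order quadrant of the single CTU is inside the picture.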
x265_2.4.tar.gz/source/common/slice.h -> x265_2.5.tar.gz/source/common/slice.h
Changed
     int m_iPPSQpMinus26;
     int numRefIdxDefault[2];
     int m_iNumRPSInSPS;
+    const x265_param *m_param;

     Slice()
     {
x265_2.4.tar.gz/source/common/threadpool.cpp -> x265_2.5.tar.gz/source/common/threadpool.cpp
Changed
     int cpusPerNode[MAX_NODE_NUM + 1];
     int threadsPerPool[MAX_NODE_NUM + 2];
     uint64_t nodeMaskPerPool[MAX_NODE_NUM + 2];
+    int totalNumThreads = 0;

     memset(cpusPerNode, 0, sizeof(cpusPerNode));
     memset(threadsPerPool, 0, sizeof(threadsPerPool));

         if (bNumaSupport)
             x265_log(p, X265_LOG_DEBUG, "NUMA node %d may use %d logical cores\n", i, cpusPerNode[i]);
         if (threadsPerPool[i])
+        {
             numPools += (threadsPerPool[i] + MAX_POOL_THREADS - 1) / MAX_POOL_THREADS;
+            totalNumThreads += threadsPerPool[i];
+        }
     }
+    if (!isThreadsReserved)
+    {
+        if (!numPools)
+        {
+            x265_log(p, X265_LOG_DEBUG, "No pool thread available. Deciding frame-threads based on detected CPU threads\n");
+            totalNumThreads = ThreadPool::getCpuCount(); // auto-detect frame threads
+        }

+        if (!p->frameNumThreads)
+            ThreadPool::getFrameThreadsCount(p, totalNumThreads);
+    }
+
     if (!numPools)
         return NULL;

                 node++;
             int numThreads = X265_MIN(MAX_POOL_THREADS, threadsPerPool[node]);
             int origNumThreads = numThreads;
-            if (p->lookaheadThreads > numThreads / 2)
+            if (i == 0 && p->lookaheadThreads > numThreads / 2)
             {
                 p->lookaheadThreads = numThreads / 2;
                 x265_log(p, X265_LOG_DEBUG, "Setting lookahead threads to a maximum of half the total number of threads\n");

                 maxProviders = 1;
             }

-            else
+            else if (i == 0)
                 numThreads -= p->lookaheadThreads;
             if (!pools[i].create(numThreads, maxProviders, nodeMaskPerPool[node]))
             {

 #endif
 }

+void ThreadPool::getFrameThreadsCount(x265_param* p, int cpuCount)
+{
+    int rows = (p->sourceHeight + p->maxCUSize - 1) >> g_log2Size[p->maxCUSize];
+    if (!p->bEnableWavefront)
+        p->frameNumThreads = X265_MIN3(cpuCount, (rows + 1) / 2, X265_MAX_FRAME_THREADS);
+    else if (cpuCount >= 32)
+        p->frameNumThreads = (p->sourceHeight > 2000) ? 6 : 5;
+    else if (cpuCount >= 16)
+        p->frameNumThreads = 4;
+    else if (cpuCount >= 8)
+        p->frameNumThreads = 3;
+    else if (cpuCount >= 4)
+        p->frameNumThreads = 2;
+    else
+        p->frameNumThreads = 1;
+}
+
 } // end namespace X265_NS
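Review note: getFrameThreadsCount() is a small pure mapping from the thread budget to a frame-thread count, so it is easy to exercise outside the encoder. Below is a minimal stand-alone sketch of the same decision table. `FrameThreadInputs`, `frameThreadsFor` and `log2Pow2` are hypothetical names, and `kMaxFrameThreads = 16` is an assumed value standing in for `X265_MAX_FRAME_THREADS`.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical subset of x265_param used by the heuristic.
struct FrameThreadInputs
{
    int      sourceHeight;
    uint32_t maxCUSize;   // power of two
    bool     wavefront;   // mirrors bEnableWavefront
};

const int kMaxFrameThreads = 16; // assumption, stands in for X265_MAX_FRAME_THREADS

// log2 of a power of two; stands in for the g_log2Size[] lookup.
inline int log2Pow2(uint32_t v)
{
    int n = 0;
    while (v > 1) { v >>= 1; n++; }
    return n;
}

inline int frameThreadsFor(const FrameThreadInputs& in, int cpuCount)
{
    // Number of CTU rows in the picture.
    int rows = (in.sourceHeight + (int)in.maxCUSize - 1) >> log2Pow2(in.maxCUSize);

    if (!in.wavefront)
        return std::min({cpuCount, (rows + 1) / 2, kMaxFrameThreads});
    if (cpuCount >= 32)
        return in.sourceHeight > 2000 ? 6 : 5;
    if (cpuCount >= 16)
        return 4;
    if (cpuCount >= 8)
        return 3;
    if (cpuCount >= 4)
        return 2;
    return 1;
}
```

Per the changelog, `cpuCount` is now the total number of threads configured through --pools; the diff falls back to the detected CPU count only when no pool threads are available.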
x265_2.4.tar.gz/source/common/threadpool.h -> x265_2.5.tar.gz/source/common/threadpool.h
Changed
     static ThreadPool* allocThreadPools(x265_param* p, int& numPools, bool isThreadsReserved);
     static int getCpuCount();
     static int getNumaNodeCount();
+    static void getFrameThreadsCount(x265_param* p,int cpuCount);
 };

 /* Any worker thread may enlist the help of idle worker threads from the same
x265_2.4.tar.gz/source/common/x86/asm-primitives.cpp -> x265_2.5.tar.gz/source/common/x86/asm-primitives.cpp
Changed
 #include "blockcopy8.h"
 #include "intrapred.h"
 #include "dct8.h"
+#include "seaintegral.h"
 }

 #define ALL_LUMA_CU_TYPED(prim, fncdef, fname, cpu) \

         p.fix8Unpack = PFX(cutree_fix8_unpack_avx2);
         p.fix8Pack = PFX(cutree_fix8_pack_avx2);

+        p.integral_initv[INTEGRAL_4] = PFX(integral4v_avx2);
+        p.integral_initv[INTEGRAL_8] = PFX(integral8v_avx2);
+        p.integral_initv[INTEGRAL_12] = PFX(integral12v_avx2);
+        p.integral_initv[INTEGRAL_16] = PFX(integral16v_avx2);
+        p.integral_initv[INTEGRAL_24] = PFX(integral24v_avx2);
+        p.integral_initv[INTEGRAL_32] = PFX(integral32v_avx2);
+        p.integral_inith[INTEGRAL_4] = PFX(integral4h_avx2);
+        p.integral_inith[INTEGRAL_8] = PFX(integral8h_avx2);
+        p.integral_inith[INTEGRAL_12] = PFX(integral12h_avx2);
+        p.integral_inith[INTEGRAL_16] = PFX(integral16h_avx2);
+
         /* TODO: This kernel needs to be modified to work with HIGH_BIT_DEPTH only
         p.planeClipAndMax = PFX(planeClipAndMax_avx2); */

         p.fix8Unpack = PFX(cutree_fix8_unpack_avx2);
         p.fix8Pack = PFX(cutree_fix8_pack_avx2);

+        p.integral_initv[INTEGRAL_4] = PFX(integral4v_avx2);
+        p.integral_initv[INTEGRAL_8] = PFX(integral8v_avx2);
+        p.integral_initv[INTEGRAL_12] = PFX(integral12v_avx2);
+        p.integral_initv[INTEGRAL_16] = PFX(integral16v_avx2);
+        p.integral_initv[INTEGRAL_24] = PFX(integral24v_avx2);
+        p.integral_initv[INTEGRAL_32] = PFX(integral32v_avx2);
+        p.integral_inith[INTEGRAL_4] = PFX(integral4h_avx2);
+        p.integral_inith[INTEGRAL_8] = PFX(integral8h_avx2);
+        p.integral_inith[INTEGRAL_12] = PFX(integral12h_avx2);
+        p.integral_inith[INTEGRAL_16] = PFX(integral16h_avx2);
+        p.integral_inith[INTEGRAL_24] = PFX(integral24h_avx2);
+        p.integral_inith[INTEGRAL_32] = PFX(integral32h_avx2);
+
     }
 #endif
 }
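Review note: the two hunks above register different subsets of the horizontal kernels. The first hunk (which appears to be the high-bit-depth setup path) stops at INTEGRAL_16, while the second also installs the 24 and 32 variants. The sketch below illustrates the general pattern of filling a table with portable defaults and then overriding selected entries. Everything in it is hypothetical: the kernels are stubs that only record which implementation ran, and `setupPortable`/`setupAccelerated` are not x265 functions.

```cpp
#include <cstddef>
#include <cstdint>

enum IntegralSize
{
    INTEGRAL_4, INTEGRAL_8, INTEGRAL_12, INTEGRAL_16, INTEGRAL_24, INTEGRAL_32,
    NUM_INTEGRAL_SIZE
};

typedef void (*integralh_t)(uint32_t* sum, uint8_t* pix, std::ptrdiff_t stride);

struct PrimitiveTable
{
    integralh_t integral_inith[NUM_INTEGRAL_SIZE];
};

// Stub kernels: write a tag so a caller can tell which one was dispatched.
static void inithPortable(uint32_t* sum, uint8_t*, std::ptrdiff_t)    { sum[0] = 1; }
static void inithAccelerated(uint32_t* sum, uint8_t*, std::ptrdiff_t) { sum[0] = 2; }

// Step 1: every slot gets a portable default.
inline void setupPortable(PrimitiveTable& p)
{
    for (int i = 0; i < NUM_INTEGRAL_SIZE; i++)
        p.integral_inith[i] = inithPortable;
}

// Step 2: override only the slots for which an accelerated kernel exists.
// Slots that are not overridden keep their portable default.
inline void setupAccelerated(PrimitiveTable& p, bool cpuHasAvx2, bool highBitDepth)
{
    if (!cpuHasAvx2)
        return;
    int last = highBitDepth ? INTEGRAL_16 : INTEGRAL_32;
    for (int i = INTEGRAL_4; i <= last; i++)
        p.integral_inith[i] = inithAccelerated;
}
```

The design point is that callers always go through the table, so a missing accelerated variant degrades to the portable kernel rather than to a null pointer.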
x265_2.4.tar.gz/source/common/x86/loopfilter.asm -> x265_2.5.tar.gz/source/common/x86/loopfilter.asm
Changed
     pshufb m1, m4, m0
     pcmpgtb m0, [pb_15] ; m0 = [mask]

-    pblendvb m6, m6, m1, m0 ; NOTE: don't use 3 parameters style, x264 macro have some bug!
+    pblendvb m6, m1, m0

     pmovsxbw m0, m6 ; offset
     punpckhbw m6, m6

     pshufb m6, m3, m1
     pshufb m5, m4, m1

-    pblendvb m6, m6, m5, m0 ; NOTE: don't use 3 parameters style, x264 macro have some bug!
+    pblendvb m6, m5, m0

     pmovzxbw m1, m2 ; rec
     punpckhbw m2, m7

     sub r3, r4
     movu xmm0, [r3]
     movu m3, [r0]
-    pblendvb m5, m5, m3, xmm0
+    pblendvb m5, m3, xmm0
     movu [r0], m5

 .end:
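Review note: the pblendvb changes switch from the four-operand spelling (destination repeated as first source) to the three-operand spelling; in both forms the destination keeps its own byte unless the mask selects the other source. A scalar model of that per-byte selection, offered only as an illustration of the instruction's semantics (`blendBytes` is a hypothetical helper, not part of x265):

```cpp
#include <cstddef>
#include <cstdint>

// Byte-wise variable blend: where the top bit of mask[i] is set, take src[i];
// otherwise leave dst[i] unchanged.
inline void blendBytes(uint8_t* dst, const uint8_t* src, const uint8_t* mask, std::size_t n)
{
    for (std::size_t i = 0; i < n; i++)
    {
        if (mask[i] & 0x80)
            dst[i] = src[i];
    }
}
```

Only bit 7 of each mask byte matters, so a mask byte of 0x7F leaves the destination untouched while 0x80 and 0xFF both select the source.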
x265_2.4.tar.gz/source/common/x86/pixel-a.asm -> x265_2.5.tar.gz/source/common/x86/pixel-a.asm
Changed
 ; clobber: m3..m7
 ; out: %1 = satd
 %macro SATD_4x4_MMX 3
-    %xdefine %%n n%1
+    %xdefine %%n nn%1
     %assign offset %2*SIZEOF_PIXEL
     LOAD_DIFF m4, m3, none, [r0+ offset], [r2+ offset]
     LOAD_DIFF m5, m3, none, [r0+ r1+offset], [r2+ r3+offset]
x265_2.4.tar.gz/source/common/x86/pixel-util8.asm -> x265_2.5.tar.gz/source/common/x86/pixel-util8.asm
Changed

 .widthLess8:
     movu m6, [r1]
-    pblendvb m6, m6, m7, m0
+    pblendvb m6, m7, m0
     movu [r1], m6

 .nextH:
x265_2.5.tar.gz/source/common/x86/seaintegral.asm
Added
+;*****************************************************************************
+;* Copyright (C) 2013-2017 MulticoreWare, Inc
+;*
+;* Authors: Jayashri Murugan <jayashri@multicorewareinc.com>
+;* Vignesh V Menon <vignesh@multicorewareinc.com>
+;* Praveen Tiwari <praveen@multicorewareinc.com>
+;*
+;* This program is free software; you can redistribute it and/or modify
+;* it under the terms of the GNU General Public License as published by
+;* the Free Software Foundation; either version 2 of the License, or
+;* (at your option) any later version.
+;*
+;* This program is distributed in the hope that it will be useful,
+;* but WITHOUT ANY WARRANTY; without even the implied warranty of
+;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+;* GNU General Public License for more details.
+;*
+;* You should have received a copy of the GNU General Public License
+;* along with this program; if not, write to the Free Software
+;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA.
+;*
+;* This program is also available under a commercial proprietary license.
+;* For more information, contact us at license @ x265.com.
+;*****************************************************************************/
+
+%include "x86inc.asm"
+%include "x86util.asm"
+
+SECTION .text
+
+;-----------------------------------------------------------------------------
+;void integral_init4v_c(uint32_t *sum4, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral4v, 2, 3, 2
+    mov r2, r1
+    shl r2, 4
+
+.loop
+    movu m0, [r0]
+    movu m1, [r0 + r2]
+    psubd m1, m0
+    movu [r0], m1
+    add r0, 32
+    sub r1, 8
+    jnz .loop
+    RET
+
+;-----------------------------------------------------------------------------
+;void integral_init8v_c(uint32_t *sum8, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral8v, 2, 3, 2
+    mov r2, r1
+    shl r2, 5
+
+.loop
+    movu m0, [r0]
+    movu m1, [r0 + r2]
+    psubd m1, m0
+    movu [r0], m1
+    add r0, 32
+    sub r1, 8
+    jnz .loop
+    RET
+
+;-----------------------------------------------------------------------------
+;void integral_init12v_c(uint32_t *sum12, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral12v, 2, 4, 2
+    mov r2, r1
+    mov r3, r1
+    shl r2, 5
+    shl r3, 4
+    add r2, r3
+
+.loop
+    movu m0, [r0]
+    movu m1, [r0 + r2]
+    psubd m1, m0
+    movu [r0], m1
+    add r0, 32
+    sub r1, 8
+    jnz .loop
+    RET
+
+;-----------------------------------------------------------------------------
+;void integral_init16v_c(uint32_t *sum16, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral16v, 2, 3, 2
+    mov r2, r1
+    shl r2, 6
+
+.loop
+    movu m0, [r0]
+    movu m1, [r0 + r2]
+    psubd m1, m0
+    movu [r0], m1
+    add r0, 32
+    sub r1, 8
+    jnz .loop
+    RET
+
+;-----------------------------------------------------------------------------
+;void integral_init24v_c(uint32_t *sum24, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral24v, 2, 4, 2
+    mov r2, r1
+    mov r3, r1
+    shl r2, 6
+    shl r3, 5
+    add r2, r3
+
+.loop
+    movu m0, [r0]
+    movu m1, [r0 + r2]
+    psubd m1, m0
+    movu [r0], m1
+    add r0, 32
+    sub r1, 8
+    jnz .loop
+    RET
+
+;-----------------------------------------------------------------------------
+;void integral_init32v_c(uint32_t *sum32, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal integral32v, 2, 3, 2
+    mov r2, r1
+    shl r2, 7
+
+.loop
+    movu m0, [r0]
+    movu m1, [r0 + r2]
+    psubd m1, m0
+    movu [r0], m1
+    add r0, 32
+    sub r1, 8
+    jnz .loop
+    RET
+
+%macro INTEGRAL_FOUR_HORIZONTAL_16 0
+    pmovzxbw m0, [r1]
+    pmovzxbw m1, [r1 + 1]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 2]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 3]
+    paddw m0, m1
+%endmacro
+
+%macro INTEGRAL_FOUR_HORIZONTAL_4 0
+    movd xm0, [r1]
+    movd xm1, [r1 + 1]
+    pmovzxbw xm0, xm0
+    pmovzxbw xm1, xm1
+    paddw xm0, xm1
+    movd xm1, [r1 + 2]
+    pmovzxbw xm1, xm1
+    paddw xm0, xm1
+    movd xm1, [r1 + 3]
+    pmovzxbw xm1, xm1
+    paddw xm0, xm1
+%endmacro
+
+%macro INTEGRAL_FOUR_HORIZONTAL_8_HBD 0
+    pmovzxwd m0, [r1]
+    pmovzxwd m1, [r1 + 2]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 4]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 6]
+    paddd m0, m1
+%endmacro
+
+%macro INTEGRAL_FOUR_HORIZONTAL_4_HBD 0
+    pmovzxwd xm0, [r1]
+    pmovzxwd xm1, [r1 + 2]
+    paddd xm0, xm1
+    pmovzxwd xm1, [r1 + 4]
+    paddd xm0, xm1
+    pmovzxwd xm1, [r1 + 6]
+    paddd xm0, xm1
+%endmacro
+
+;-----------------------------------------------------------------------------
+;static void integral_init4h(uint32_t *sum, pixel *pix, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+%if HIGH_BIT_DEPTH
+cglobal integral4h, 3, 5, 3
+    lea r3, [4 * r2]
+    sub r0, r3
+    sub r2, 4 ;stride - 4
+    mov r4, r2
+    shr r4, 3
+
+.loop_8:
+    INTEGRAL_FOUR_HORIZONTAL_8_HBD
+    movu m1, [r0]
+    paddd m0, m1
+    movu [r0 + r3], m0
+    add r1, 16
+    add r0, 32
+    sub r2, 8
+    sub r4, 1
+    jnz .loop_8
+    INTEGRAL_FOUR_HORIZONTAL_4_HBD
+    movu xm1, [r0]
+    paddd xm0, xm1
+    movu [r0 + r3], xm0
+    RET
+
+%else
+cglobal integral4h, 3, 5, 3
+    lea r3, [4 * r2]
+    sub r0, r3
+    sub r2, 4 ;stride - 4
+    mov r4, r2
+    shr r4, 4
+
+.loop_16:
+    INTEGRAL_FOUR_HORIZONTAL_16
+    vperm2i128 m2, m0, m0, 1
+    pmovzxwd m2, xm2
+    pmovzxwd m0, xm0
+    movu m1, [r0]
+    paddd m0, m1
+    movu [r0 + r3], m0
+    movu m1, [r0 + 32]
+    paddd m2, m1
+    movu [r0 + r3 + 32], m2
+    add r1, 16
+    add r0, 64
+    sub r2, 16
+    sub r4, 1
+    jnz .loop_16
+    cmp r2, 12
+    je .loop_12
+    cmp r2, 4
+    je .loop_4
+
+.loop_12:
+    INTEGRAL_FOUR_HORIZONTAL_16
+    vperm2i128 m2, m0, m0, 1
+    pmovzxwd xm2, xm2
+    pmovzxwd m0, xm0
+    movu m1, [r0]
+    paddd m0, m1
+    movu [r0 + r3], m0
+    movu xm1, [r0 + 32]
+    paddd xm2, xm1
+    movu [r0 + r3 + 32], xm2
+    jmp .end
+
+.loop_4:
+    INTEGRAL_FOUR_HORIZONTAL_4
+    pmovzxwd xm0, xm0
+    movu xm1, [r0]
+    paddd xm0, xm1
+    movu [r0 + r3], xm0
+    jmp .end
+
+.end
+    RET
+%endif
+
+%macro INTEGRAL_EIGHT_HORIZONTAL_16 0
+    pmovzxbw m0, [r1]
+    pmovzxbw m1, [r1 + 1]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 2]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 3]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 4]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 5]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 6]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 7]
+    paddw m0, m1
+%endmacro
+
+%macro INTEGRAL_EIGHT_HORIZONTAL_8 0
+    pmovzxbw xm0, [r1]
+    pmovzxbw xm1, [r1 + 1]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 2]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 3]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 4]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 5]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 6]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 7]
+    paddw xm0, xm1
+%endmacro
+
+%macro INTEGRAL_EIGHT_HORIZONTAL_8_HBD 0
+    pmovzxwd m0, [r1]
+    pmovzxwd m1, [r1 + 2]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 4]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 6]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 8]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 10]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 12]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 14]
+    paddd m0, m1
+%endmacro
+
+;-----------------------------------------------------------------------------
+;static void integral_init8h_c(uint32_t *sum, pixel *pix, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+%if HIGH_BIT_DEPTH
+cglobal integral8h, 3, 4, 3
+    lea r3, [4 * r2]
+    sub r0, r3
+    sub r2, 8 ;stride - 8
+
+.loop:
+    INTEGRAL_EIGHT_HORIZONTAL_8_HBD
+    movu m1, [r0]
+    paddd m0, m1
+    movu [r0 + r3], m0
+    add r1, 16
+    add r0, 32
+    sub r2, 8
+    jnz .loop
+    RET
+
+%else
+cglobal integral8h, 3, 5, 3
+    lea r3, [4 * r2]
+    sub r0, r3
+    sub r2, 8 ;stride - 8
+    mov r4, r2
+    shr r4, 4
+
+.loop_16:
+    INTEGRAL_EIGHT_HORIZONTAL_16
+    vperm2i128 m2, m0, m0, 1
+    pmovzxwd m2, xm2
+    pmovzxwd m0, xm0
+    movu m1, [r0]
+    paddd m0, m1
+    movu [r0 + r3], m0
+    movu m1, [r0 + 32]
+    paddd m2, m1
+    movu [r0 + r3 + 32], m2
+    add r1, 16
+    add r0, 64
+    sub r2, 16
+    sub r4, 1
+    jnz .loop_16
+    cmp r2, 8
+    je .loop_8
+    jmp .end
+
+.loop_8:
+    INTEGRAL_EIGHT_HORIZONTAL_8
+    pmovzxwd m0, xm0
+    movu m1, [r0]
+    paddd m0, m1
+    movu [r0 + r3], m0
+    jmp .end
+
+.end
+    RET
+%endif
+
+%macro INTEGRAL_TWELVE_HORIZONTAL_16 0
+    pmovzxbw m0, [r1]
+    pmovzxbw m1, [r1 + 1]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 2]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 3]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 4]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 5]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 6]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 7]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 8]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 9]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 10]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 11]
+    paddw m0, m1
+%endmacro
+
+%macro INTEGRAL_TWELVE_HORIZONTAL_4 0
+    movd xm0, [r1]
+    movd xm1, [r1 + 1]
+    pmovzxbw xm0, xm0
+    pmovzxbw xm1, xm1
+    paddw xm0, xm1
+    movd xm1, [r1 + 2]
+    pmovzxbw xm1, xm1
+    paddw xm0, xm1
+    movd xm1, [r1 + 3]
+    pmovzxbw xm1, xm1
+    paddw xm0, xm1
+    movd xm1, [r1 + 4]
+    pmovzxbw xm1, xm1
+    paddw xm0, xm1
+    movd xm1, [r1 + 5]
+    pmovzxbw xm1, xm1
+    paddw xm0, xm1
+    movd xm1, [r1 + 6]
+    pmovzxbw xm1, xm1
+    paddw xm0, xm1
+    movd xm1, [r1 + 7]
+    pmovzxbw xm1, xm1
+    paddw xm0, xm1
+    movd xm1, [r1 + 8]
+    pmovzxbw xm1, xm1
+    paddw xm0, xm1
+    movd xm1, [r1 + 9]
+    pmovzxbw xm1, xm1
+    paddw xm0, xm1
+    movd xm1, [r1 + 10]
+    pmovzxbw xm1, xm1
+    paddw xm0, xm1
+    movd xm1, [r1 + 11]
+    pmovzxbw xm1, xm1
+    paddw xm0, xm1
+%endmacro
+
+%macro INTEGRAL_TWELVE_HORIZONTAL_8_HBD 0
+    pmovzxwd m0, [r1]
+    pmovzxwd m1, [r1 + 2]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 4]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 6]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 8]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 10]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 12]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 14]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 16]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 18]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 20]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 22]
+    paddd m0, m1
+%endmacro
+
+%macro INTEGRAL_TWELVE_HORIZONTAL_4_HBD 0
+    pmovzxwd xm0, [r1]
+    pmovzxwd xm1, [r1 + 2]
+    paddd xm0, xm1
+    pmovzxwd xm1, [r1 + 4]
+    paddd xm0, xm1
+    pmovzxwd xm1, [r1 + 6]
+    paddd xm0, xm1
+    pmovzxwd xm1, [r1 + 8]
+    paddd xm0, xm1
+    pmovzxwd xm1, [r1 + 10]
+    paddd xm0, xm1
+    pmovzxwd xm1, [r1 + 12]
+    paddd xm0, xm1
+    pmovzxwd xm1, [r1 + 14]
+    paddd xm0, xm1
+    pmovzxwd xm1, [r1 + 16]
+    paddd xm0, xm1
+    pmovzxwd xm1, [r1 + 18]
+    paddd xm0, xm1
+    pmovzxwd xm1, [r1 + 20]
+    paddd xm0, xm1
+    pmovzxwd xm1, [r1 + 22]
+    paddd xm0, xm1
+%endmacro
+
+;-----------------------------------------------------------------------------
+;static void integral_init12h_c(uint32_t *sum, pixel *pix, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+%if HIGH_BIT_DEPTH
+cglobal integral12h, 3, 5, 3
+    lea r3, [4 * r2]
+    sub r0, r3
+    sub r2, 12 ;stride - 12
+    mov r4, r2
+    shr r4, 3
+
+.loop:
+    INTEGRAL_TWELVE_HORIZONTAL_8_HBD
+    movu m1, [r0]
+    paddd m0, m1
+    movu [r0 + r3], m0
+    add r1, 16
+    add r0, 32
+    sub r2, 8
+    sub r4, 1
+    jnz .loop
+    INTEGRAL_TWELVE_HORIZONTAL_4_HBD
+    movu xm1, [r0]
+    paddd xm0, xm1
+    movu [r0 + r3], xm0
+    RET
+
+%else
+cglobal integral12h, 3, 5, 3
+    lea r3, [4 * r2]
+    sub r0, r3
+    sub r2, 12 ;stride - 12
+    mov r4, r2
+    shr r4, 4
+
+.loop_16:
+    INTEGRAL_TWELVE_HORIZONTAL_16
+    vperm2i128 m2, m0, m0, 1
+    pmovzxwd m2, xm2
+    pmovzxwd m0, xm0
+    movu m1, [r0]
+    paddd m0, m1
+    movu [r0 + r3], m0
+    movu m1, [r0 + 32]
+    paddd m2, m1
+    movu [r0 + r3 + 32], m2
+    add r1, 16
+    add r0, 64
+    sub r2, 16
+    sub r4, 1
+    jnz .loop_16
+    cmp r2, 12
+    je .loop_12
+    cmp r2, 4
+    je .loop_4
+
+.loop_12:
+    INTEGRAL_TWELVE_HORIZONTAL_16
+    vperm2i128 m2, m0, m0, 1
+    pmovzxwd xm2, xm2
+    pmovzxwd m0, xm0
+    movu m1, [r0]
+    paddd m0, m1
+    movu [r0 + r3], m0
+    movu xm1, [r0 + 32]
+    paddd xm2, xm1
+    movu [r0 + r3 + 32], xm2
+    jmp .end
+
+.loop_4:
+    INTEGRAL_TWELVE_HORIZONTAL_4
+    pmovzxwd xm0, xm0
+    movu xm1, [r0]
+    paddd xm0, xm1
+    movu [r0 + r3], xm0
+    jmp .end
+
+.end
+    RET
+%endif
+
+%macro INTEGRAL_SIXTEEN_HORIZONTAL_16 0
+    pmovzxbw m0, [r1]
+    pmovzxbw m1, [r1 + 1]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 2]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 3]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 4]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 5]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 6]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 7]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 8]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 9]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 10]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 11]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 12]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 13]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 14]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 15]
+    paddw m0, m1
+%endmacro
+
+%macro INTEGRAL_SIXTEEN_HORIZONTAL_8 0
+    pmovzxbw xm0, [r1]
+    pmovzxbw xm1, [r1 + 1]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 2]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 3]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 4]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 5]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 6]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 7]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 8]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 9]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 10]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 11]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 12]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 13]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 14]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 15]
+    paddw xm0, xm1
+%endmacro
+
+%macro INTEGRAL_SIXTEEN_HORIZONTAL_8_HBD 0
+    pmovzxwd m0, [r1]
+    pmovzxwd m1, [r1 + 2]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 4]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 6]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 8]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 10]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 12]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 14]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 16]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 18]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 20]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 22]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 24]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 26]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 28]
+    paddd m0, m1
+    pmovzxwd m1, [r1 + 30]
+    paddd m0, m1
+%endmacro
+
+;-----------------------------------------------------------------------------
+;static void integral_init16h_c(uint32_t *sum, pixel *pix, intptr_t stride)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+%if HIGH_BIT_DEPTH
+cglobal integral16h, 3, 4, 3
+    lea r3, [4 * r2]
+    sub r0, r3
+    sub r2, 16 ;stride - 16
+
+.loop:
+    INTEGRAL_SIXTEEN_HORIZONTAL_8_HBD
+    movu m1, [r0]
+    paddd m0, m1
+    movu [r0 + r3], m0
+    add r1, 16
+    add r0, 32
+    sub r2, 8
+    jnz .loop
+    RET
+
+%else
+cglobal integral16h, 3, 5, 3
+    lea r3, [4 * r2]
+    sub r0, r3
+    sub r2, 16 ;stride - 16
+    mov r4, r2
+    shr r4, 4
+
+.loop_16:
+    INTEGRAL_SIXTEEN_HORIZONTAL_16
+    vperm2i128 m2, m0, m0, 1
+    pmovzxwd m2, xm2
+    pmovzxwd m0, xm0
+    movu m1, [r0]
+    paddd m0, m1
+    movu [r0 + r3], m0
+    movu m1, [r0 + 32]
+    paddd m2, m1
+    movu [r0 + r3 + 32], m2
+    add r1, 16
+    add r0, 64
+    sub r2, 16
+    sub r4, 1
+    jnz .loop_16
+    cmp r2, 8
+    je .loop_8
+    jmp .end
+
+.loop_8:
+    INTEGRAL_SIXTEEN_HORIZONTAL_8
+    pmovzxwd m0, xm0
+    movu m1, [r0]
+    paddd m0, m1
+    movu [r0 + r3], m0
+    jmp .end
+
+.end
+    RET
+%endif
+
+%macro INTEGRAL_TWENTYFOUR_HORIZONTAL_16 0
+    pmovzxbw m0, [r1]
+    pmovzxbw m1, [r1 + 1]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 2]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 3]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 4]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 5]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 6]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 7]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 8]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 9]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 10]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 11]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 12]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 13]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 14]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 15]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 16]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 17]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 18]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 19]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 20]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 21]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 22]
+    paddw m0, m1
+    pmovzxbw m1, [r1 + 23]
+    paddw m0, m1
+%endmacro
+
+%macro INTEGRAL_TWENTYFOUR_HORIZONTAL_8 0
+    pmovzxbw xm0, [r1]
+    pmovzxbw xm1, [r1 + 1]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 2]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 3]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 4]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 5]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 6]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 7]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 8]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 9]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 10]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 11]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 12]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 13]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 14]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 15]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 16]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 17]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 18]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 19]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 20]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 21]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 22]
+    paddw xm0, xm1
+    pmovzxbw xm1, [r1 + 23]
+    paddw xm0, xm1
+%endmacro
+
848
+;-----------------------------------------------------------------------------
849
+;static void integral_init24h_c(uint32_t *sum, pixel *pix, intptr_t stride)
850
+;-----------------------------------------------------------------------------
851
+INIT_YMM avx2
852
+cglobal integral24h, 3, 5, 3
853
+ lea r3, [4 * r2]
854
+ sub r0, r3
855
+ sub r2, 24 ;stride - 24
856
+ mov r4, r2
857
+ shr r4, 4
858
+
859
+.loop_16:
860
+ INTEGRAL_TWENTYFOUR_HORIZONTAL_16
861
+ vperm2i128 m2, m0, m0, 1
862
+ pmovzxwd m2, xm2
863
+ pmovzxwd m0, xm0
864
+ movu m1, [r0]
865
+ paddd m0, m1
866
+ movu [r0 + r3], m0
867
+ movu m1, [r0 + 32]
868
+ paddd m2, m1
869
+ movu [r0 + r3 + 32], m2
870
+ add r1, 16
871
+ add r0, 64
872
+ sub r2, 16
873
+ sub r4, 1
874
+ jnz .loop_16
875
+ cmp r2, 8
876
+ je .loop_8
877
+ jmp .end
878
+
879
+.loop_8:
880
+ INTEGRAL_TWENTYFOUR_HORIZONTAL_8
881
+ pmovzxwd m0, xm0
882
+ movu m1, [r0]
883
+ paddd m0, m1
884
+ movu [r0 + r3], m0
885
+ jmp .end
886
+
887
+.end
888
+ RET
889
+
890
+%macro INTEGRAL_THIRTYTWO_HORIZONTAL_16 0
891
+ pmovzxbw m0, [r1]
892
+ pmovzxbw m1, [r1 + 1]
893
+ paddw m0, m1
894
+ pmovzxbw m1, [r1 + 2]
895
+ paddw m0, m1
896
+ pmovzxbw m1, [r1 + 3]
897
+ paddw m0, m1
898
+ pmovzxbw m1, [r1 + 4]
899
+ paddw m0, m1
900
+ pmovzxbw m1, [r1 + 5]
901
+ paddw m0, m1
902
+ pmovzxbw m1, [r1 + 6]
903
+ paddw m0, m1
904
+ pmovzxbw m1, [r1 + 7]
905
+ paddw m0, m1
906
+ pmovzxbw m1, [r1 + 8]
907
+ paddw m0, m1
908
+ pmovzxbw m1, [r1 + 9]
909
+ paddw m0, m1
910
+ pmovzxbw m1, [r1 + 10]
911
+ paddw m0, m1
912
+ pmovzxbw m1, [r1 + 11]
913
+ paddw m0, m1
914
+ pmovzxbw m1, [r1 + 12]
915
+ paddw m0, m1
916
+ pmovzxbw m1, [r1 + 13]
917
+ paddw m0, m1
918
+ pmovzxbw m1, [r1 + 14]
919
+ paddw m0, m1
920
+ pmovzxbw m1, [r1 + 15]
921
+ paddw m0, m1
922
+ pmovzxbw m1, [r1 + 16]
923
+ paddw m0, m1
924
+ pmovzxbw m1, [r1 + 17]
925
+ paddw m0, m1
926
+ pmovzxbw m1, [r1 + 18]
927
+ paddw m0, m1
928
+ pmovzxbw m1, [r1 + 19]
929
+ paddw m0, m1
930
+ pmovzxbw m1, [r1 + 20]
931
+ paddw m0, m1
932
+ pmovzxbw m1, [r1 + 21]
933
+ paddw m0, m1
934
+ pmovzxbw m1, [r1 + 22]
935
+ paddw m0, m1
936
+ pmovzxbw m1, [r1 + 23]
937
+ paddw m0, m1
938
+ pmovzxbw m1, [r1 + 24]
939
+ paddw m0, m1
940
+ pmovzxbw m1, [r1 + 25]
941
+ paddw m0, m1
942
+ pmovzxbw m1, [r1 + 26]
943
+ paddw m0, m1
944
+ pmovzxbw m1, [r1 + 27]
945
+ paddw m0, m1
946
+ pmovzxbw m1, [r1 + 28]
947
+ paddw m0, m1
948
+ pmovzxbw m1, [r1 + 29]
949
+ paddw m0, m1
950
+ pmovzxbw m1, [r1 + 30]
951
+ paddw m0, m1
952
+ pmovzxbw m1, [r1 + 31]
953
+ paddw m0, m1
954
+%endmacro
955
+
956
+
957
+%macro INTEGRAL_THIRTYTWO_HORIZONTAL_8 0
958
+ pmovzxbw xm0, [r1]
959
+ pmovzxbw xm1, [r1 + 1]
960
+ paddw xm0, xm1
961
+ pmovzxbw xm1, [r1 + 2]
962
+ paddw xm0, xm1
963
+ pmovzxbw xm1, [r1 + 3]
964
+ paddw xm0, xm1
965
+ pmovzxbw xm1, [r1 + 4]
966
+ paddw xm0, xm1
967
+ pmovzxbw xm1, [r1 + 5]
968
+ paddw xm0, xm1
969
+ pmovzxbw xm1, [r1 + 6]
970
+ paddw xm0, xm1
971
+ pmovzxbw xm1, [r1 + 7]
972
+ paddw xm0, xm1
973
+ pmovzxbw xm1, [r1 + 8]
974
+ paddw xm0, xm1
975
+ pmovzxbw xm1, [r1 + 9]
976
+ paddw xm0, xm1
977
+ pmovzxbw xm1, [r1 + 10]
978
+ paddw xm0, xm1
979
+ pmovzxbw xm1, [r1 + 11]
980
+ paddw xm0, xm1
981
+ pmovzxbw xm1, [r1 + 12]
982
+ paddw xm0, xm1
983
+ pmovzxbw xm1, [r1 + 13]
984
+ paddw xm0, xm1
985
+ pmovzxbw xm1, [r1 + 14]
986
+ paddw xm0, xm1
987
+ pmovzxbw xm1, [r1 + 15]
988
+ paddw xm0, xm1
989
+ pmovzxbw xm1, [r1 + 16]
990
+ paddw xm0, xm1
991
+ pmovzxbw xm1, [r1 + 17]
992
+ paddw xm0, xm1
993
+ pmovzxbw xm1, [r1 + 18]
994
+ paddw xm0, xm1
995
+ pmovzxbw xm1, [r1 + 19]
996
+ paddw xm0, xm1
997
+ pmovzxbw xm1, [r1 + 20]
998
+ paddw xm0, xm1
999
+ pmovzxbw xm1, [r1 + 21]
1000
+ paddw xm0, xm1
1001
+ pmovzxbw xm1, [r1 + 22]
1002
+ paddw xm0, xm1
1003
+ pmovzxbw xm1, [r1 + 23]
1004
+ paddw xm0, xm1
1005
+ pmovzxbw xm1, [r1 + 24]
1006
+ paddw xm0, xm1
1007
+ pmovzxbw xm1, [r1 + 25]
1008
+ paddw xm0, xm1
1009
+ pmovzxbw xm1, [r1 + 26]
1010
+ paddw xm0, xm1
1011
+ pmovzxbw xm1, [r1 + 27]
1012
+ paddw xm0, xm1
1013
+ pmovzxbw xm1, [r1 + 28]
1014
+ paddw xm0, xm1
1015
+ pmovzxbw xm1, [r1 + 29]
1016
+ paddw xm0, xm1
1017
+ pmovzxbw xm1, [r1 + 30]
1018
+ paddw xm0, xm1
1019
+ pmovzxbw xm1, [r1 + 31]
1020
+ paddw xm0, xm1
1021
+%endmacro
1022
+
1023
+;-----------------------------------------------------------------------------
1024
+;static void integral_init32h_c(uint32_t *sum, pixel *pix, intptr_t stride)
1025
+;-----------------------------------------------------------------------------
1026
+INIT_YMM avx2
1027
+cglobal integral32h, 3, 5, 3
1028
+ lea r3, [4 * r2]
1029
+ sub r0, r3
1030
+ sub r2, 32 ;stride - 32
1031
+ mov r4, r2
1032
+ shr r4, 4
1033
+
1034
+.loop_16:
1035
+ INTEGRAL_THIRTYTWO_HORIZONTAL_16
1036
+ vperm2i128 m2, m0, m0, 1
1037
+ pmovzxwd m2, xm2
1038
+ pmovzxwd m0, xm0
1039
+ movu m1, [r0]
1040
+ paddd m0, m1
1041
+ movu [r0 + r3], m0
1042
+ movu m1, [r0 + 32]
1043
+ paddd m2, m1
1044
+ movu [r0 + r3 + 32], m2
1045
+ add r1, 16
1046
+ add r0, 64
1047
+ sub r2, 16
1048
+ sub r4, 1
1049
+ jnz .loop_16
1050
+ cmp r2, 8
1051
+ je .loop_8
1052
+ jmp .end
1053
+
1054
+.loop_8:
1055
+ INTEGRAL_THIRTYTWO_HORIZONTAL_8
1056
+ pmovzxwd m0, xm0
1057
+ movu m1, [r0]
1058
+ paddd m0, m1
1059
+ movu [r0 + r3], m0
1060
+ jmp .end
1061
+
1062
+.end
1063
+ RET
1064
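The integralNh kernels above all follow one pattern: step `sum` back by one row, load that previous row, add the N-pixel window sum taken from `pix`, and store the result into the current row. Below is a minimal scalar sketch of that pattern in C, inferred from the assembly shown in this diff rather than copied from the x265 C reference. The name `integral_h_sketch`, the runtime `width` parameter and the use of 8-bit pixels are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of the integralNh kernels (8-bit pixels assumed).
 * For every x in [0, stride - width):
 *   sum[x] = sum[x - stride] + pix[x] + pix[x + 1] + ... + pix[x + width - 1]
 * `sum` must have a valid row directly above it (at sum - stride). */
static void integral_h_sketch(uint32_t *sum, const uint8_t *pix,
                              intptr_t stride, int width)
{
    assert(width > 0 && stride >= width);

    /* Window sum for x == 0. */
    uint32_t v = 0;
    for (int i = 0; i < width; i++)
        v += pix[i];

    for (intptr_t x = 0; x < stride - width; x++)
    {
        sum[x] = v + sum[x - stride]; /* previous row + current window */
        v += pix[x + width];          /* slide the window one pixel right */
        v -= pix[x];
    }
}
```

The AVX2 code computes each window sum directly with a chain of `pmovzxbw`/`paddw` loads instead of sliding a running total, which trades extra loads for independence between output lanes.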
x265_2.5.tar.gz/source/common/x86/seaintegral.h
Added
+/*****************************************************************************
+* Copyright (C) 2013-2017 MulticoreWare, Inc
+*
+* Authors: Vignesh V Menon <vignesh@multicorewareinc.com>
+* Jayashri Murugan <jayashri@multicorewareinc.com>
+* Praveen Tiwari <praveen@multicorewareinc.com>
+*
+* This program is free software; you can redistribute it and/or modify
+* it under the terms of the GNU General Public License as published by
+* the Free Software Foundation; either version 2 of the License, or
+* (at your option) any later version.
+*
+* This program is distributed in the hope that it will be useful,
+* but WITHOUT ANY WARRANTY; without even the implied warranty of
+* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+* GNU General Public License for more details.
+*
+* You should have received a copy of the GNU General Public License
+* along with this program; if not, write to the Free Software
+* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA.
+*
+* This program is also available under a commercial proprietary license.
+* For more information, contact us at license @ x265.com.
+*****************************************************************************/
+
+#ifndef X265_SEAINTEGRAL_H
+#define X265_SEAINTEGRAL_H
+
+void PFX(integral4v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral8v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral12v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral16v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral24v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral32v_avx2)(uint32_t *sum, intptr_t stride);
+void PFX(integral4h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+void PFX(integral8h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+void PFX(integral12h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+void PFX(integral16h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+void PFX(integral24h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+void PFX(integral32h_avx2)(uint32_t *sum, pixel *pix, intptr_t stride);
+
+#endif //X265_SEAINTEGRAL_H
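The header declares vertical (`integralNv`) kernels that take only `sum` and `stride`, which suggests they operate in place on rows already filled by the horizontal pass. The sketch below assumes the usual integral-image step: each entry becomes the difference between the row N lines further down and the current row. That behaviour is an assumption, since the vertical assembly is not part of this excerpt, and `integral_v_sketch` with its runtime `height` parameter is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of an integralNv kernel (assumed behaviour):
 * sum[x] = sum[x + height * stride] - sum[x] for every x in one row.
 * The buffer must contain at least height + 1 rows of `stride` entries. */
static void integral_v_sketch(uint32_t *sum, intptr_t stride, int height)
{
    assert(height > 0 && stride > 0);

    for (intptr_t x = 0; x < stride; x++)
        sum[x] = sum[x + (intptr_t)height * stride] - sum[x];
}
```

Because the arithmetic is unsigned, the subtraction wraps modulo 2^32, which matches how prefix-sum differences are normally handled for `uint32_t` integral buffers.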
x265_2.4.tar.gz/source/common/x86/x86inc.asm -> x265_2.5.tar.gz/source/common/x86/x86inc.asm
Changed
827
1
2
SECTION .rodata align=%1
3
%endmacro
4
5
-%macro SECTION_TEXT 0-1 16
6
- SECTION .text align=%1
7
-%endmacro
8
-
9
%if WIN64
10
%define PIC
11
%elif ARCH_X86_64 == 0
12
13
%define r%1w %2w
14
%define r%1b %2b
15
%define r%1h %2h
16
+ %define %2q %2
17
%if %0 == 2
18
%define r%1m %2d
19
%define r%1mp %2
20
21
%define e%1h %3
22
%define r%1b %2
23
%define e%1b %2
24
-%if ARCH_X86_64 == 0
25
- %define r%1 e%1
26
-%endif
27
+ %if ARCH_X86_64 == 0
28
+ %define r%1 e%1
29
+ %endif
30
%endmacro
31
32
DECLARE_REG_SIZE ax, al, ah
33
34
35
%macro ASSERT 1
36
%if (%1) == 0
37
- %error assert failed
38
+ %error assertion ``%1'' failed
39
%endif
40
%endmacro
41
42
43
%ifnum %1
44
%if %1 != 0 && required_stack_alignment > STACK_ALIGNMENT
45
%if %1 > 0
46
+ ; Reserve an additional register for storing the original stack pointer, but avoid using
47
+ ; eax/rax for this purpose since it can potentially get overwritten as a return value.
48
%assign regs_used (regs_used + 1)
49
- %elif ARCH_X86_64 && regs_used == num_args && num_args <= 4 + UNIX64 * 2
50
- %warning "Stack pointer will overwrite register argument"
51
+ %if ARCH_X86_64 && regs_used == 7
52
+ %assign regs_used 8
53
+ %elif ARCH_X86_64 == 0 && regs_used == 1
54
+ %assign regs_used 2
55
+ %endif
56
+ %endif
57
+ %if ARCH_X86_64 && regs_used < 5 + UNIX64 * 3
58
+ ; Ensure that we don't clobber any registers containing arguments. For UNIX64 we also preserve r6 (rax)
59
+ ; since it's used as a hidden argument in vararg functions to specify the number of vector registers used.
60
+ %assign regs_used 5 + UNIX64 * 3
61
%endif
62
%endif
63
%endif
64
65
DECLARE_REG 8, rsi, 72
66
DECLARE_REG 9, rbx, 80
67
DECLARE_REG 10, rbp, 88
68
-DECLARE_REG 11, R12, 96
69
-DECLARE_REG 12, R13, 104
70
-DECLARE_REG 13, R14, 112
71
-DECLARE_REG 14, R15, 120
72
+DECLARE_REG 11, R14, 96
73
+DECLARE_REG 12, R15, 104
74
+DECLARE_REG 13, R12, 112
75
+DECLARE_REG 14, R13, 120
76
77
%macro PROLOGUE 2-5+ 0 ; #args, #regs, #xmm_regs, [stack_size,] arg_names...
78
%assign num_args %1
79
80
WIN64_PUSH_XMM
81
%endmacro
82
83
-%macro WIN64_RESTORE_XMM_INTERNAL 1
84
+%macro WIN64_RESTORE_XMM_INTERNAL 0
85
%assign %%pad_size 0
86
%if xmm_regs_used > 8
87
%assign %%i xmm_regs_used
88
%rep xmm_regs_used-8
89
%assign %%i %%i-1
90
- movaps xmm %+ %%i, [%1 + (%%i-8)*16 + stack_size + 32]
91
+ movaps xmm %+ %%i, [rsp + (%%i-8)*16 + stack_size + 32]
92
%endrep
93
%endif
94
%if stack_size_padded > 0
95
%if stack_size > 0 && required_stack_alignment > STACK_ALIGNMENT
96
mov rsp, rstkm
97
%else
98
- add %1, stack_size_padded
99
+ add rsp, stack_size_padded
100
%assign %%pad_size stack_size_padded
101
%endif
102
%endif
103
%if xmm_regs_used > 7
104
- movaps xmm7, [%1 + stack_offset - %%pad_size + 24]
105
+ movaps xmm7, [rsp + stack_offset - %%pad_size + 24]
106
%endif
107
%if xmm_regs_used > 6
108
- movaps xmm6, [%1 + stack_offset - %%pad_size + 8]
109
+ movaps xmm6, [rsp + stack_offset - %%pad_size + 8]
110
%endif
111
%endmacro
112
113
-%macro WIN64_RESTORE_XMM 1
114
- WIN64_RESTORE_XMM_INTERNAL %1
115
+%macro WIN64_RESTORE_XMM 0
116
+ WIN64_RESTORE_XMM_INTERNAL
117
%assign stack_offset (stack_offset-stack_size_padded)
118
+ %assign stack_size_padded 0
119
%assign xmm_regs_used 0
120
%endmacro
121
122
%define has_epilogue regs_used > 7 || xmm_regs_used > 6 || mmsize == 32 || stack_size > 0
123
124
%macro RET 0
125
- WIN64_RESTORE_XMM_INTERNAL rsp
126
+ WIN64_RESTORE_XMM_INTERNAL
127
POP_IF_USED 14, 13, 12, 11, 10, 9, 8, 7
128
-%if mmsize == 32
129
- vzeroupper
130
-%endif
131
+ %if mmsize == 32
132
+ vzeroupper
133
+ %endif
134
AUTO_REP_RET
135
%endmacro
136
137
138
DECLARE_REG 8, R11, 24
139
DECLARE_REG 9, rbx, 32
140
DECLARE_REG 10, rbp, 40
141
-DECLARE_REG 11, R12, 48
142
-DECLARE_REG 12, R13, 56
143
-DECLARE_REG 13, R14, 64
144
-DECLARE_REG 14, R15, 72
145
+DECLARE_REG 11, R14, 48
146
+DECLARE_REG 12, R15, 56
147
+DECLARE_REG 13, R12, 64
148
+DECLARE_REG 14, R13, 72
149
150
%macro PROLOGUE 2-5+ ; #args, #regs, #xmm_regs, [stack_size,] arg_names...
151
%assign num_args %1
152
153
%define has_epilogue regs_used > 9 || mmsize == 32 || stack_size > 0
154
155
%macro RET 0
156
-%if stack_size_padded > 0
157
-%if required_stack_alignment > STACK_ALIGNMENT
158
- mov rsp, rstkm
159
-%else
160
- add rsp, stack_size_padded
161
-%endif
162
-%endif
163
+ %if stack_size_padded > 0
164
+ %if required_stack_alignment > STACK_ALIGNMENT
165
+ mov rsp, rstkm
166
+ %else
167
+ add rsp, stack_size_padded
168
+ %endif
169
+ %endif
170
POP_IF_USED 14, 13, 12, 11, 10, 9
171
-%if mmsize == 32
172
- vzeroupper
173
-%endif
174
+ %if mmsize == 32
175
+ vzeroupper
176
+ %endif
177
AUTO_REP_RET
178
%endmacro
179
180
181
%define has_epilogue regs_used > 3 || mmsize == 32 || stack_size > 0
182
183
%macro RET 0
184
-%if stack_size_padded > 0
185
-%if required_stack_alignment > STACK_ALIGNMENT
186
- mov rsp, rstkm
187
-%else
188
- add rsp, stack_size_padded
189
-%endif
190
-%endif
191
+ %if stack_size_padded > 0
192
+ %if required_stack_alignment > STACK_ALIGNMENT
193
+ mov rsp, rstkm
194
+ %else
195
+ add rsp, stack_size_padded
196
+ %endif
197
+ %endif
198
POP_IF_USED 6, 5, 4, 3
199
-%if mmsize == 32
200
- vzeroupper
201
-%endif
202
+ %if mmsize == 32
203
+ vzeroupper
204
+ %endif
205
AUTO_REP_RET
206
%endmacro
207
208
%endif ;======================================================================
209
210
%if WIN64 == 0
211
-%macro WIN64_SPILL_XMM 1
212
-%endmacro
213
-%macro WIN64_RESTORE_XMM 1
214
-%endmacro
215
-%macro WIN64_PUSH_XMM 0
216
-%endmacro
217
+ %macro WIN64_SPILL_XMM 1
218
+ %endmacro
219
+ %macro WIN64_RESTORE_XMM 0
220
+ %endmacro
221
+ %macro WIN64_PUSH_XMM 0
222
+ %endmacro
223
%endif
224
225
; On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either
226
227
228
%define last_branch_adr $$
229
%macro AUTO_REP_RET 0
230
- %ifndef cpuflags
231
- times ((last_branch_adr-$)>>31)+1 rep ; times 1 iff $ != last_branch_adr.
232
- %elif notcpuflag(ssse3)
233
- times ((last_branch_adr-$)>>31)+1 rep
234
+ %if notcpuflag(ssse3)
235
+ times ((last_branch_adr-$)>>31)+1 rep ; times 1 iff $ == last_branch_adr.
236
%endif
237
ret
238
%endmacro
239
240
%rep %0
241
%macro %1 1-2 %1
242
%2 %1
243
- %%branch_instr:
244
- %xdefine last_branch_adr %%branch_instr
245
+ %if notcpuflag(ssse3)
246
+ %%branch_instr equ $
247
+ %xdefine last_branch_adr %%branch_instr
248
+ %endif
249
%endmacro
250
%rotate 1
251
%endrep
252
253
; This is needed for ELF, otherwise the GNU linker assumes the stack is
254
; executable by default.
255
%ifidn __OUTPUT_FORMAT__,elf
256
-SECTION .note.GNU-stack noalloc noexec nowrite progbits
257
+ [SECTION .note.GNU-stack noalloc noexec nowrite progbits]
258
%endif
259
260
; cpuflags
261
262
%assign cpuflags_sse (1<<4) | cpuflags_mmx2
263
%assign cpuflags_sse2 (1<<5) | cpuflags_sse
264
%assign cpuflags_sse2slow (1<<6) | cpuflags_sse2
265
-%assign cpuflags_sse3 (1<<7) | cpuflags_sse2
266
-%assign cpuflags_ssse3 (1<<8) | cpuflags_sse3
267
-%assign cpuflags_sse4 (1<<9) | cpuflags_ssse3
268
-%assign cpuflags_sse42 (1<<10)| cpuflags_sse4
269
-%assign cpuflags_avx (1<<11)| cpuflags_sse42
270
-%assign cpuflags_xop (1<<12)| cpuflags_avx
271
-%assign cpuflags_fma4 (1<<13)| cpuflags_avx
272
-%assign cpuflags_avx2 (1<<14)| cpuflags_avx
273
+%assign cpuflags_lzcnt (1<<7) | cpuflags_sse2
274
+%assign cpuflags_sse3 (1<<8) | cpuflags_sse2
275
+%assign cpuflags_ssse3 (1<<9) | cpuflags_sse3
276
+%assign cpuflags_sse4 (1<<10)| cpuflags_ssse3
277
+%assign cpuflags_sse42 (1<<11)| cpuflags_sse4
278
+%assign cpuflags_avx (1<<12)| cpuflags_sse42
279
+%assign cpuflags_xop (1<<13)| cpuflags_avx
280
+%assign cpuflags_fma4 (1<<14)| cpuflags_avx
281
%assign cpuflags_fma3 (1<<15)| cpuflags_avx
282
+%assign cpuflags_bmi1 (1<<16)| cpuflags_avx | cpuflags_lzcnt
283
+%assign cpuflags_bmi2 (1<<17)| cpuflags_bmi1
284
+%assign cpuflags_avx2 (1<<18)| cpuflags_fma3 | cpuflags_bmi2
285
286
-%assign cpuflags_cache32 (1<<16)
287
-%assign cpuflags_cache64 (1<<17)
288
-%assign cpuflags_slowctz (1<<18)
289
-%assign cpuflags_lzcnt (1<<19)
290
-%assign cpuflags_aligned (1<<20) ; not a cpu feature, but a function variant
291
-%assign cpuflags_atom (1<<21)
292
-%assign cpuflags_bmi1 (1<<22)|cpuflags_lzcnt
293
-%assign cpuflags_bmi2 (1<<23)|cpuflags_bmi1
294
+%assign cpuflags_cache32 (1<<19)
295
+%assign cpuflags_cache64 (1<<20)
296
+%assign cpuflags_slowctz (1<<21)
297
+%assign cpuflags_aligned (1<<22) ; not a cpu feature, but a function variant
298
+%assign cpuflags_atom (1<<23)
299
300
-%define cpuflag(x) ((cpuflags & (cpuflags_ %+ x)) == (cpuflags_ %+ x))
301
-%define notcpuflag(x) ((cpuflags & (cpuflags_ %+ x)) != (cpuflags_ %+ x))
302
+; Returns a boolean value expressing whether or not the specified cpuflag is enabled.
303
+%define cpuflag(x) (((((cpuflags & (cpuflags_ %+ x)) ^ (cpuflags_ %+ x)) - 1) >> 31) & 1)
304
+%define notcpuflag(x) (cpuflag(x) ^ 1)
305
306
; Takes an arbitrary number of cpuflags from the above list.
307
; All subsequent functions (up to the next INIT_CPUFLAGS) is built for the specified cpu.
308
309
%define movnta movntq
310
%assign %%i 0
311
%rep 8
312
- CAT_XDEFINE m, %%i, mm %+ %%i
313
- CAT_XDEFINE nmm, %%i, %%i
314
- %assign %%i %%i+1
315
+ CAT_XDEFINE m, %%i, mm %+ %%i
316
+ CAT_XDEFINE nnmm, %%i, %%i
317
+ %assign %%i %%i+1
318
%endrep
319
%rep 8
320
- CAT_UNDEF m, %%i
321
- CAT_UNDEF nmm, %%i
322
- %assign %%i %%i+1
323
+ CAT_UNDEF m, %%i
324
+ CAT_UNDEF nnmm, %%i
325
+ %assign %%i %%i+1
326
%endrep
327
INIT_CPUFLAGS %1
328
%endmacro
329
330
%define mmsize 16
331
%define num_mmregs 8
332
%if ARCH_X86_64
333
- %define num_mmregs 16
334
+ %define num_mmregs 16
335
%endif
336
%define mova movdqa
337
%define movu movdqu
338
339
%define movnta movntdq
340
%assign %%i 0
341
%rep num_mmregs
342
- CAT_XDEFINE m, %%i, xmm %+ %%i
343
- CAT_XDEFINE nxmm, %%i, %%i
344
- %assign %%i %%i+1
345
+ CAT_XDEFINE m, %%i, xmm %+ %%i
346
+ CAT_XDEFINE nnxmm, %%i, %%i
347
+ %assign %%i %%i+1
348
%endrep
349
INIT_CPUFLAGS %1
350
%endmacro
351
352
%define mmsize 32
353
%define num_mmregs 8
354
%if ARCH_X86_64
355
- %define num_mmregs 16
356
+ %define num_mmregs 16
357
%endif
358
%define mova movdqa
359
%define movu movdqu
360
361
%define movnta movntdq
362
%assign %%i 0
363
%rep num_mmregs
364
- CAT_XDEFINE m, %%i, ymm %+ %%i
365
- CAT_XDEFINE nymm, %%i, %%i
366
- %assign %%i %%i+1
367
+ CAT_XDEFINE m, %%i, ymm %+ %%i
368
+ CAT_XDEFINE nnymm, %%i, %%i
369
+ %assign %%i %%i+1
370
%endrep
371
INIT_CPUFLAGS %1
372
%endmacro
373
374
%define ymmmm%1 mm%1
375
%define ymmxmm%1 xmm%1
376
%define ymmymm%1 ymm%1
377
- %define ymm%1xmm xmm%1
378
- %define xmm%1ymm ymm%1
379
%define xm%1 xmm %+ m%1
380
%define ym%1 ymm %+ m%1
381
%endmacro
382
383
%assign i 0
384
%rep 16
385
DECLARE_MMCAST i
386
-%assign i i+1
387
+ %assign i i+1
388
%endrep
389
390
; I often want to use macros that permute their arguments. e.g. there's no
391
392
; doesn't cost any cycles.
393
394
%macro PERMUTE 2-* ; takes a list of pairs to swap
395
-%rep %0/2
396
- %xdefine %%tmp%2 m%2
397
- %rotate 2
398
-%endrep
399
-%rep %0/2
400
- %xdefine m%1 %%tmp%2
401
- CAT_XDEFINE n, m%1, %1
402
- %rotate 2
403
-%endrep
404
+ %rep %0/2
405
+ %xdefine %%tmp%2 m%2
406
+ %rotate 2
407
+ %endrep
408
+ %rep %0/2
409
+ %xdefine m%1 %%tmp%2
410
+ CAT_XDEFINE nn, m%1, %1
411
+ %rotate 2
412
+ %endrep
413
%endmacro
414
415
%macro SWAP 2+ ; swaps a single chain (sometimes more concise than pairs)
416
-%ifnum %1 ; SWAP 0, 1, ...
417
- SWAP_INTERNAL_NUM %1, %2
418
-%else ; SWAP m0, m1, ...
419
- SWAP_INTERNAL_NAME %1, %2
420
-%endif
421
+ %ifnum %1 ; SWAP 0, 1, ...
422
+ SWAP_INTERNAL_NUM %1, %2
423
+ %else ; SWAP m0, m1, ...
424
+ SWAP_INTERNAL_NAME %1, %2
425
+ %endif
426
%endmacro
427
428
%macro SWAP_INTERNAL_NUM 2-*
429
430
%xdefine %%tmp m%1
431
%xdefine m%1 m%2
432
%xdefine m%2 %%tmp
433
- CAT_XDEFINE n, m%1, %1
434
- CAT_XDEFINE n, m%2, %2
435
- %rotate 1
436
+ CAT_XDEFINE nn, m%1, %1
437
+ CAT_XDEFINE nn, m%2, %2
438
+ %rotate 1
439
%endrep
440
%endmacro
441
442
%macro SWAP_INTERNAL_NAME 2-*
443
- %xdefine %%args n %+ %1
444
+ %xdefine %%args nn %+ %1
445
%rep %0-1
446
- %xdefine %%args %%args, n %+ %2
447
- %rotate 1
448
+ %xdefine %%args %%args, nn %+ %2
449
+ %rotate 1
450
%endrep
451
SWAP_INTERNAL_NUM %%args
452
%endmacro
453
454
%assign %%i 0
455
%rep num_mmregs
456
CAT_XDEFINE %%f, %%i, m %+ %%i
457
- %assign %%i %%i+1
458
+ %assign %%i %%i+1
459
%endrep
460
%endmacro
461
462
463
%assign %%i 0
464
%rep num_mmregs
465
CAT_XDEFINE m, %%i, %1_m %+ %%i
466
- CAT_XDEFINE n, m %+ %%i, %%i
467
- %assign %%i %%i+1
468
+ CAT_XDEFINE nn, m %+ %%i, %%i
469
+ %assign %%i %%i+1
470
%endrep
471
%endif
472
%endmacro
473
474
; Append cpuflags to the callee's name iff the appended name is known and the plain name isn't
475
%macro call 1
476
- call_internal %1, %1 %+ SUFFIX
477
+ %ifid %1
478
+ call_internal %1 %+ SUFFIX, %1
479
+ %else
480
+ call %1
481
+ %endif
482
%endmacro
483
%macro call_internal 2
484
- %xdefine %%i %1
485
- %ifndef cglobaled_%1
486
- %ifdef cglobaled_%2
487
- %xdefine %%i %2
488
+ %xdefine %%i %2
489
+ %ifndef cglobaled_%2
490
+ %ifdef cglobaled_%1
491
+ %xdefine %%i %1
492
%endif
493
%endif
494
call %%i
495
496
%endif
497
CAT_XDEFINE sizeofxmm, i, 16
498
CAT_XDEFINE sizeofymm, i, 32
499
-%assign i i+1
500
+ %assign i i+1
501
%endrep
502
%undef i
503
504
505
;%1 == instruction
506
;%2 == minimal instruction set
507
;%3 == 1 if float, 0 if int
508
-;%4 == 1 if non-destructive or 4-operand (xmm, xmm, xmm, imm), 0 otherwise
509
+;%4 == 1 if 4-operand emulation, 0 if 3-operand emulation, 255 otherwise (no emulation)
510
;%5 == 1 if commutative (i.e. doesn't matter which src arg is which), 0 if not
511
;%6+: operands
512
%macro RUN_AVX_INSTR 6-9+
513
514
%ifdef cpuname
515
%if notcpuflag(%2)
516
%error use of ``%1'' %2 instruction in cpuname function: current_function
517
+ %elif cpuflags_%2 < cpuflags_sse && notcpuflag(sse2) && __sizeofreg > 8
518
+ %error use of ``%1'' sse2 instruction in cpuname function: current_function
519
%endif
520
%endif
521
%endif
522
523
%if __emulate_avx
524
%xdefine __src1 %7
525
%xdefine __src2 %8
526
- %ifnidn %6, %7
527
- %if %0 >= 9
528
- CHECK_AVX_INSTR_EMU {%1 %6, %7, %8, %9}, %6, %8, %9
529
- %else
530
- CHECK_AVX_INSTR_EMU {%1 %6, %7, %8}, %6, %8
531
- %endif
532
- %if %5 && %4 == 0
533
- %ifnid %8
534
+ %if %5 && %4 == 0
535
+ %ifnidn %6, %7
536
+ %ifidn %6, %8
537
+ %xdefine __src1 %8
538
+ %xdefine __src2 %7
539
+ %elifnnum sizeof%8
540
; 3-operand AVX instructions with a memory arg can only have it in src2,
541
; whereas SSE emulation prefers to have it in src1 (i.e. the mov).
542
; So, if the instruction is commutative with a memory arg, swap them.
543
544
%xdefine __src2 %7
545
%endif
546
%endif
547
+ %endif
548
+ %ifnidn %6, __src1
549
+ %if %0 >= 9
550
+ CHECK_AVX_INSTR_EMU {%1 %6, %7, %8, %9}, %6, __src2, %9
551
+ %else
552
+ CHECK_AVX_INSTR_EMU {%1 %6, %7, %8}, %6, __src2
553
+ %endif
554
%if __sizeofreg == 8
555
MOVQ %6, __src1
556
%elif %3
557
558
;%1 == instruction
559
;%2 == minimal instruction set
560
;%3 == 1 if float, 0 if int
561
-;%4 == 1 if non-destructive or 4-operand (xmm, xmm, xmm, imm), 0 otherwise
562
+;%4 == 1 if 4-operand emulation, 0 if 3-operand emulation, 255 otherwise (no emulation)
563
;%5 == 1 if commutative (i.e. doesn't matter which src arg is which), 0 if not
564
-%macro AVX_INSTR 1-5 fnord, 0, 1, 0
565
+%macro AVX_INSTR 1-5 fnord, 0, 255, 0
566
%macro %1 1-10 fnord, fnord, fnord, fnord, %1, %2, %3, %4, %5
567
%ifidn %2, fnord
568
RUN_AVX_INSTR %6, %7, %8, %9, %10, %1
569
570
; Non-destructive instructions are written without parameters
571
AVX_INSTR addpd, sse2, 1, 0, 1
572
AVX_INSTR addps, sse, 1, 0, 1
573
-AVX_INSTR addsd, sse2, 1, 0, 1
574
-AVX_INSTR addss, sse, 1, 0, 1
575
+AVX_INSTR addsd, sse2, 1, 0, 0
576
+AVX_INSTR addss, sse, 1, 0, 0
577
AVX_INSTR addsubpd, sse3, 1, 0, 0
578
AVX_INSTR addsubps, sse3, 1, 0, 0
579
AVX_INSTR aesdec, fnord, 0, 0, 0
580
581
AVX_INSTR andnps, sse, 1, 0, 0
582
AVX_INSTR andpd, sse2, 1, 0, 1
583
AVX_INSTR andps, sse, 1, 0, 1
584
-AVX_INSTR blendpd, sse4, 1, 0, 0
585
-AVX_INSTR blendps, sse4, 1, 0, 0
586
-AVX_INSTR blendvpd, sse4, 1, 0, 0
587
-AVX_INSTR blendvps, sse4, 1, 0, 0
588
+AVX_INSTR blendpd, sse4, 1, 1, 0
589
+AVX_INSTR blendps, sse4, 1, 1, 0
590
+AVX_INSTR blendvpd, sse4 ; can't be emulated
591
+AVX_INSTR blendvps, sse4 ; can't be emulated
592
AVX_INSTR cmppd, sse2, 1, 1, 0
593
AVX_INSTR cmpps, sse, 1, 1, 0
594
AVX_INSTR cmpsd, sse2, 1, 1, 0
595
596
AVX_INSTR cvtps2dq, sse2
597
AVX_INSTR cvtps2pd, sse2
598
AVX_INSTR cvtsd2si, sse2
599
-AVX_INSTR cvtsd2ss, sse2
600
-AVX_INSTR cvtsi2sd, sse2
601
-AVX_INSTR cvtsi2ss, sse
602
-AVX_INSTR cvtss2sd, sse2
603
+AVX_INSTR cvtsd2ss, sse2, 1, 0, 0
604
+AVX_INSTR cvtsi2sd, sse2, 1, 0, 0
605
+AVX_INSTR cvtsi2ss, sse, 1, 0, 0
606
+AVX_INSTR cvtss2sd, sse2, 1, 0, 0
607
AVX_INSTR cvtss2si, sse
608
AVX_INSTR cvttpd2dq, sse2
609
AVX_INSTR cvttps2dq, sse2
610
611
AVX_INSTR maskmovdqu, sse2
612
AVX_INSTR maxpd, sse2, 1, 0, 1
613
AVX_INSTR maxps, sse, 1, 0, 1
614
-AVX_INSTR maxsd, sse2, 1, 0, 1
615
-AVX_INSTR maxss, sse, 1, 0, 1
616
+AVX_INSTR maxsd, sse2, 1, 0, 0
617
+AVX_INSTR maxss, sse, 1, 0, 0
618
AVX_INSTR minpd, sse2, 1, 0, 1
619
AVX_INSTR minps, sse, 1, 0, 1
620
-AVX_INSTR minsd, sse2, 1, 0, 1
621
-AVX_INSTR minss, sse, 1, 0, 1
622
+AVX_INSTR minsd, sse2, 1, 0, 0
623
+AVX_INSTR minss, sse, 1, 0, 0
624
AVX_INSTR movapd, sse2
625
AVX_INSTR movaps, sse
626
-AVX_INSTR movd
627
+AVX_INSTR movd, mmx
628
AVX_INSTR movddup, sse3
629
AVX_INSTR movdqa, sse2
630
AVX_INSTR movdqu, sse2
631
632
AVX_INSTR movntdqa, sse4
633
AVX_INSTR movntpd, sse2
634
AVX_INSTR movntps, sse
635
-AVX_INSTR movq
636
+AVX_INSTR movq, mmx
637
AVX_INSTR movsd, sse2, 1, 0, 0
638
AVX_INSTR movshdup, sse3
639
AVX_INSTR movsldup, sse3
640
AVX_INSTR movss, sse, 1, 0, 0
641
AVX_INSTR movupd, sse2
642
AVX_INSTR movups, sse
643
-AVX_INSTR mpsadbw, sse4
644
+AVX_INSTR mpsadbw, sse4, 0, 1, 0
645
AVX_INSTR mulpd, sse2, 1, 0, 1
646
AVX_INSTR mulps, sse, 1, 0, 1
647
-AVX_INSTR mulsd, sse2, 1, 0, 1
648
-AVX_INSTR mulss, sse, 1, 0, 1
649
+AVX_INSTR mulsd, sse2, 1, 0, 0
650
+AVX_INSTR mulss, sse, 1, 0, 0
651
AVX_INSTR orpd, sse2, 1, 0, 1
652
AVX_INSTR orps, sse, 1, 0, 1
653
AVX_INSTR pabsb, ssse3
654
655
AVX_INSTR paddsw, mmx, 0, 0, 1
656
AVX_INSTR paddusb, mmx, 0, 0, 1
657
AVX_INSTR paddusw, mmx, 0, 0, 1
658
-AVX_INSTR palignr, ssse3
659
+AVX_INSTR palignr, ssse3, 0, 1, 0
660
AVX_INSTR pand, mmx, 0, 0, 1
661
AVX_INSTR pandn, mmx, 0, 0, 0
662
AVX_INSTR pavgb, mmx2, 0, 0, 1
663
AVX_INSTR pavgw, mmx2, 0, 0, 1
664
-AVX_INSTR pblendvb, sse4, 0, 0, 0
665
-AVX_INSTR pblendw, sse4
666
-AVX_INSTR pclmulqdq
667
+AVX_INSTR pblendvb, sse4 ; can't be emulated
668
+AVX_INSTR pblendw, sse4, 0, 1, 0
669
+AVX_INSTR pclmulqdq, fnord, 0, 1, 0
670
+AVX_INSTR pclmulhqhqdq, fnord, 0, 0, 0
671
+AVX_INSTR pclmulhqlqdq, fnord, 0, 0, 0
672
+AVX_INSTR pclmullqhqdq, fnord, 0, 0, 0
673
+AVX_INSTR pclmullqlqdq, fnord, 0, 0, 0
674
AVX_INSTR pcmpestri, sse42
675
AVX_INSTR pcmpestrm, sse42
676
AVX_INSTR pcmpistri, sse42
677
678
AVX_INSTR phsubw, ssse3, 0, 0, 0
679
AVX_INSTR phsubd, ssse3, 0, 0, 0
680
AVX_INSTR phsubsw, ssse3, 0, 0, 0
681
-AVX_INSTR pinsrb, sse4
682
-AVX_INSTR pinsrd, sse4
683
-AVX_INSTR pinsrq, sse4
684
-AVX_INSTR pinsrw, mmx2
685
+AVX_INSTR pinsrb, sse4, 0, 1, 0
686
+AVX_INSTR pinsrd, sse4, 0, 1, 0
687
+AVX_INSTR pinsrq, sse4, 0, 1, 0
688
+AVX_INSTR pinsrw, mmx2, 0, 1, 0
689
AVX_INSTR pmaddwd, mmx, 0, 0, 1
690
AVX_INSTR pmaddubsw, ssse3, 0, 0, 0
691
AVX_INSTR pmaxsb, sse4, 0, 0, 1
692
693
AVX_INSTR punpckldq, mmx, 0, 0, 0
694
AVX_INSTR punpcklqdq, sse2, 0, 0, 0
695
AVX_INSTR pxor, mmx, 0, 0, 1
696
-AVX_INSTR rcpps, sse, 1, 0, 0
697
+AVX_INSTR rcpps, sse
698
AVX_INSTR rcpss, sse, 1, 0, 0
699
AVX_INSTR roundpd, sse4
700
AVX_INSTR roundps, sse4
701
-AVX_INSTR roundsd, sse4
702
-AVX_INSTR roundss, sse4
703
-AVX_INSTR rsqrtps, sse, 1, 0, 0
704
+AVX_INSTR roundsd, sse4, 1, 1, 0
705
+AVX_INSTR roundss, sse4, 1, 1, 0
706
+AVX_INSTR rsqrtps, sse
707
AVX_INSTR rsqrtss, sse, 1, 0, 0
708
AVX_INSTR shufpd, sse2, 1, 1, 0
709
AVX_INSTR shufps, sse, 1, 1, 0
710
-AVX_INSTR sqrtpd, sse2, 1, 0, 0
711
-AVX_INSTR sqrtps, sse, 1, 0, 0
712
+AVX_INSTR sqrtpd, sse2
713
+AVX_INSTR sqrtps, sse
714
AVX_INSTR sqrtsd, sse2, 1, 0, 0
715
AVX_INSTR sqrtss, sse, 1, 0, 0
716
AVX_INSTR stmxcsr, sse
717
718
%else
719
CAT_XDEFINE q, j, i
720
%endif
721
-%assign i i+1
722
+ %assign i i+1
723
%endrep
724
%undef i
725
%undef j
726
727
FMA_INSTR pmacsdql, pmuldq, paddq ; sse4 emulation
728
FMA_INSTR pmadcswd, pmaddwd, paddd
729
730
-; convert FMA4 to FMA3 if possible
731
-%macro FMA4_INSTR 4
732
- %macro %1 4-8 %1, %2, %3, %4
733
- %if cpuflag(fma4)
734
- v%5 %1, %2, %3, %4
- %elifidn %1, %2
- v%6 %1, %4, %3 ; %1 = %1 * %3 + %4
- %elifidn %1, %3
- v%7 %1, %2, %4 ; %1 = %2 * %1 + %4
- %elifidn %1, %4
- v%8 %1, %2, %3 ; %1 = %2 * %3 + %1
+; Macros for consolidating FMA3 and FMA4 using 4-operand (dst, src1, src2, src3) syntax.
+; FMA3 is only possible if dst is the same as one of the src registers.
+; Either src2 or src3 can be a memory operand.
+%macro FMA4_INSTR 2-*
+ %push fma4_instr
+ %xdefine %$prefix %1
+ %rep %0 - 1
+ %macro %$prefix%2 4-6 %$prefix, %2
+ %if notcpuflag(fma3) && notcpuflag(fma4)
+ %error use of ``%5%6'' fma instruction in cpuname function: current_function
+ %elif cpuflag(fma4)
+ v%5%6 %1, %2, %3, %4
+ %elifidn %1, %2
+ ; If %3 or %4 is a memory operand it needs to be encoded as the last operand.
+ %ifid %3
+ v%{5}213%6 %2, %3, %4
+ %else
+ v%{5}132%6 %2, %4, %3
+ %endif
+ %elifidn %1, %3
+ v%{5}213%6 %3, %2, %4
+ %elifidn %1, %4
+ v%{5}231%6 %4, %2, %3
+ %else
+ %error fma3 emulation of ``%5%6 %1, %2, %3, %4'' is not supported
+ %endif
+ %endmacro
+ %rotate 1
+ %endrep
+ %pop
+%endmacro
+
+FMA4_INSTR fmadd, pd, ps, sd, ss
+FMA4_INSTR fmaddsub, pd, ps
+FMA4_INSTR fmsub, pd, ps, sd, ss
+FMA4_INSTR fmsubadd, pd, ps
+FMA4_INSTR fnmadd, pd, ps, sd, ss
+FMA4_INSTR fnmsub, pd, ps, sd, ss
+
+; workaround: vpbroadcastq is broken in x86_32 due to a yasm bug (fixed in 1.3.0)
+%if __YASM_VERSION_ID__ < 0x01030000 && ARCH_X86_64 == 0
+ %macro vpbroadcastq 2
+ %if sizeof%1 == 16
+ movddup %1, %2
%else
- %error fma3 emulation of ``%5 %1, %2, %3, %4'' is not supported
+ vbroadcastsd %1, %2
%endif
%endmacro
-%endmacro
-
-FMA4_INSTR fmaddpd, fmadd132pd, fmadd213pd, fmadd231pd
-FMA4_INSTR fmaddps, fmadd132ps, fmadd213ps, fmadd231ps
-FMA4_INSTR fmaddsd, fmadd132sd, fmadd213sd, fmadd231sd
-FMA4_INSTR fmaddss, fmadd132ss, fmadd213ss, fmadd231ss
-
-FMA4_INSTR fmaddsubpd, fmaddsub132pd, fmaddsub213pd, fmaddsub231pd
-FMA4_INSTR fmaddsubps, fmaddsub132ps, fmaddsub213ps, fmaddsub231ps
-FMA4_INSTR fmsubaddpd, fmsubadd132pd, fmsubadd213pd, fmsubadd231pd
-FMA4_INSTR fmsubaddps, fmsubadd132ps, fmsubadd213ps, fmsubadd231ps
-
-FMA4_INSTR fmsubpd, fmsub132pd, fmsub213pd, fmsub231pd
-FMA4_INSTR fmsubps, fmsub132ps, fmsub213ps, fmsub231ps
-FMA4_INSTR fmsubsd, fmsub132sd, fmsub213sd, fmsub231sd
-FMA4_INSTR fmsubss, fmsub132ss, fmsub213ss, fmsub231ss
-
-FMA4_INSTR fnmaddpd, fnmadd132pd, fnmadd213pd, fnmadd231pd
-FMA4_INSTR fnmaddps, fnmadd132ps, fnmadd213ps, fnmadd231ps
-FMA4_INSTR fnmaddsd, fnmadd132sd, fnmadd213sd, fnmadd231sd
-FMA4_INSTR fnmaddss, fnmadd132ss, fnmadd213ss, fnmadd231ss
-
-FMA4_INSTR fnmsubpd, fnmsub132pd, fnmsub213pd, fnmsub231pd
-FMA4_INSTR fnmsubps, fnmsub132ps, fnmsub213ps, fnmsub231ps
-FMA4_INSTR fnmsubsd, fnmsub132sd, fnmsub213sd, fnmsub231sd
-FMA4_INSTR fnmsubss, fnmsub132ss, fnmsub213ss, fnmsub231ss
-
-; workaround: vpbroadcastq is broken in x86_32 due to a yasm bug
-%if ARCH_X86_64 == 0
-%macro vpbroadcastq 2
-%if sizeof%1 == 16
- movddup %1, %2
-%else
- vbroadcastsd %1, %2
-%endif
-%endmacro
%endif
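Review note on the FMA4_INSTR rewrite: the new macro chooses the FMA3 form from whichever source register aliases dst, and keeps a memory operand in the last position. The C++ sketch below only models that branch order so the operand shuffling is easier to follow. `Fma3Choice`, `isMemory`, and `chooseFma3` are hypothetical names, and treating any operand that starts with `[` as memory is an assumption of the sketch, not a description of how the assembler's `%ifid` test works.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of one FMA3 instruction: the numeric form plus its operands.
struct Fma3Choice {
    std::string form;           // "132", "213", "231", or "" if FMA3 cannot encode it
    std::string op1, op2, op3;  // operands in the order they would be emitted
};

// Assumption for this sketch: memory operands are written like "[r0]".
inline bool isMemory(const std::string& operand) {
    return !operand.empty() && operand.front() == '[';
}

// Models dst = src1 * src2 + src3 using the same branch order as the macro.
// FMA3 forms: 132: op1 = op1*op3 + op2
//             213: op1 = op2*op1 + op3
//             231: op1 = op2*op3 + op1
inline Fma3Choice chooseFma3(const std::string& dst, const std::string& src1,
                             const std::string& src2, const std::string& src3) {
    // Per the macro's comment, only src2 or src3 may be a memory operand.
    assert(!isMemory(dst) && !isMemory(src1));
    if (dst == src1) {
        if (!isMemory(src2))
            return {"213", src1, src2, src3};  // src3 (possibly memory) stays last
        return {"132", src1, src3, src2};      // src2 is memory, so it goes last
    }
    if (dst == src2)
        return {"213", src2, src1, src3};
    if (dst == src3)
        return {"231", src3, src1, src2};
    return {"", "", "", ""};                   // dst aliases no source: the %error case
}
```

For example, `chooseFma3("m0", "m0", "[r0]", "m1")` selects the 132 form with the memory operand last, which corresponds to the `%else` branch under `%elifidn %1, %2`.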
x265_2.4.tar.gz/source/dynamicHDR10/BasicStructures.h -> x265_2.5.tar.gz/source/dynamicHDR10/BasicStructures.h
float maxRLuminance = 0.0;
float maxGLuminance = 0.0;
float maxBLuminance = 0.0;
- int order;
+ int order = 0;
std::vector<unsigned int> percentiles;
};

struct BezierCurveData
{
- int order;
- int sPx;
- int sPy;
+ int order = 0;
+ int sPx = 0;
+ int sPy = 0;
std::vector<int> coeff;
};

+struct PercentileLuminance{
+
+ float averageLuminance = 0.0;
+ float maxRLuminance = 0.0;
+ float maxGLuminance = 0.0;
+ float maxBLuminance = 0.0;
+ int order = 0;
+ std::vector<unsigned int> percentiles;
+};
+
#endif // BASICSTRUCTURES_H
x265_2.4.tar.gz/source/dynamicHDR10/CMakeLists.txt -> x265_2.5.tar.gz/source/dynamicHDR10/CMakeLists.txt
# vim: syntax=cmake
-if(ENABLE_DYNAMIC_HDR10)
+if(ENABLE_HDR10_PLUS)

add_library(dynamicHDR10 OBJECT
- BasicStructures.cpp BasicStructures.h
+ BasicStructures.h
json11/json11.cpp json11/json11.h
JsonHelper.cpp JsonHelper.h
metadataFromJson.cpp metadataFromJson.h

hdr10plus.h
api.cpp )

-else()
cmake_minimum_required (VERSION 2.8.11)
project(dynamicHDR10)
include(CheckIncludeFiles)


option(ENABLE_SHARED "Build shared library" OFF)

-if(ENABLE_SHARED)
- add_library(dynamicHDR10 SHARED
- json11/json11.cpp json11/json11.h
- BasicStructures.cpp BasicStructures.h
- JsonHelper.cpp JsonHelper.h
- metadataFromJson.cpp metadataFromJson.h
- SeiMetadataDictionary.cpp SeiMetadataDictionary.h
- hdr10plus.h api.cpp )
-else()
- add_library(dynamicHDR10 STATIC
- json11/json11.cpp json11/json11.h
- BasicStructures.cpp BasicStructures.h
- JsonHelper.cpp JsonHelper.h
- metadataFromJson.cpp metadataFromJson.h
- SeiMetadataDictionary.cpp SeiMetadataDictionary.h
- hdr10plus.h api.cpp )
-endif()
-
-install (TARGETS dynamicHDR10
- LIBRARY DESTINATION ${LIB_INSTALL_DIR}
- ARCHIVE DESTINATION ${LIB_INSTALL_DIR})
install(FILES hdr10plus.h DESTINATION include)
endif()
\ No newline at end of file
x265_2.4.tar.gz/source/dynamicHDR10/json11/json11.cpp -> x265_2.5.tar.gz/source/dynamicHDR10/json11/json11.cpp
#include <cstdio>
#include <limits>

+#if _MSC_VER
+#pragma warning(disable: 4510) //const member cannot be default initialized
+#pragma warning(disable: 4512) //assignment operator could not be generated
+#pragma warning(disable: 4610) //const member cannot be default initialized
+#endif
+
namespace json11 {

static const int max_depth = 200;

char get_next_token() {
consume_garbage();
if (i == str.size())
- return fail("unexpected end of input", 0);
+ return fail("unexpected end of input", '0');

return str[i++];
}

string parse_string() {
string out;
long last_escaped_codepoint = -1;
- while (true) {
+ for (;;) {
if (i == str.size())
return fail("unexpected end of input in string", "");


if (ch == '}')
return data;

- while (1) {
+ for (;;) {
if (ch != '"')
return fail("expected '\"' in object, got " + esc(ch));


if (ch == ']')
return data;

- while (1) {
+ for (;;) {
i--;
data.push_back(parse_json(depth + 1));
if (failed)
x265_2.4.tar.gz/source/dynamicHDR10/metadataFromJson.cpp -> x265_2.5.tar.gz/source/dynamicHDR10/metadataFromJson.cpp
{
int payloadBytes = 1;

- for(;payload > 0xFF; payload -= 0xFF, ++payloadBytes);
+ for(;payload >= 0xFF; payload -= 0xFF, ++payloadBytes);

if(payloadBytes > 1)
{
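Review note on the `>` to `>=` change: assuming `payload` here is an SEI payload size coded the usual H.264/HEVC way (a run of 0xFF bytes, each worth 255, followed by one final byte that is strictly less than 0xFF), a size of exactly 255 needs two bytes, which the old comparison undercounted. The sketch below is a standalone illustration under that assumption; `payloadSizeBytes` and `encodePayloadSize` are hypothetical helpers, not functions from metadataFromJson.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same counting loop as the patched line: one byte, plus one per 0xFF chunk.
inline int payloadSizeBytes(unsigned payload) {
    int payloadBytes = 1;
    for (; payload >= 0xFF; payload -= 0xFF, ++payloadBytes);
    return payloadBytes;
}

// Emits the size as 0xFF bytes followed by a final byte in [0, 0xFE].
inline std::vector<uint8_t> encodePayloadSize(unsigned payload) {
    std::vector<uint8_t> bytes;
    while (payload >= 0xFF) {
        bytes.push_back(0xFF);
        payload -= 0xFF;
    }
    bytes.push_back(static_cast<uint8_t>(payload));
    // The byte count must agree with the counting loop above.
    assert(static_cast<int>(bytes.size()) ==
           payloadSizeBytes(0xFFu * (static_cast<unsigned>(bytes.size()) - 1) + bytes.back()));
    return bytes;
}
```

With the old `>` test, a payload of 255 would have been counted as a single byte, and that byte would be 0xFF, which a reader of this coding treats as "more size bytes follow".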
x265_2.4.tar.gz/source/encoder/CMakeLists.txt -> x265_2.5.tar.gz/source/encoder/CMakeLists.txt
reference.cpp reference.h
encoder.cpp encoder.h
api.cpp
- weightPrediction.cpp)
+ weightPrediction.cpp
+ ../x265-extras.cpp ../x265-extras.h)
x265_2.4.tar.gz/source/encoder/analysis.cpp -> x265_2.5.tar.gz/source/encoder/analysis.cpp
1
2
m_reuseInterDataCTU = NULL;
3
m_reuseRef = NULL;
4
m_bHD = false;
5
+ m_evaluateInter = 0;
6
}
7
8
bool Analysis::create(ThreadLocalData *tld)
9
10
cacheCost = X265_MALLOC(uint64_t, costArrSize);
11
12
int csp = m_param->internalCsp;
13
- uint32_t cuSize = g_maxCUSize;
14
+ uint32_t cuSize = m_param->maxCUSize;
15
16
bool ok = true;
17
- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++, cuSize >>= 1)
18
+ for (uint32_t depth = 0; depth <= m_param->maxCUDepth; depth++, cuSize >>= 1)
19
{
20
ModeDepth &md = m_modeDepth[depth];
21
22
- md.cuMemPool.create(depth, csp, MAX_PRED_TYPES);
23
+ md.cuMemPool.create(depth, csp, MAX_PRED_TYPES, *m_param);
24
ok &= md.fencYuv.create(cuSize, csp);
25
26
for (int j = 0; j < MAX_PRED_TYPES; j++)
27
{
28
- md.pred[j].cu.initialize(md.cuMemPool, depth, csp, j);
29
+ md.pred[j].cu.initialize(md.cuMemPool, depth, *m_param, j);
30
ok &= md.pred[j].predYuv.create(cuSize, csp);
31
ok &= md.pred[j].reconYuv.create(cuSize, csp);
32
md.pred[j].fencYuv = &md.fencYuv;
33
34
35
void Analysis::destroy()
36
{
37
- for (uint32_t i = 0; i <= g_maxCUDepth; i++)
38
+ for (uint32_t i = 0; i <= m_param->maxCUDepth; i++)
39
{
40
m_modeDepth[i].cuMemPool.destroy();
41
m_modeDepth[i].fencYuv.destroy();
42
43
calculateNormFactor(ctu, qp);
44
45
uint32_t numPartition = ctu.m_numPartitions;
46
+ if (m_param->bCTUInfo && (*m_frame->m_ctuInfo + ctu.m_cuAddr))
47
+ {
48
+ x265_ctu_info_t* ctuTemp = *m_frame->m_ctuInfo + ctu.m_cuAddr;
49
+ if (ctuTemp->ctuPartitions)
50
+ {
51
+ int32_t depthIdx = 0;
52
+ uint32_t maxNum8x8Partitions = 64;
53
+ uint8_t* depthInfoPtr = m_frame->m_addOnDepth[ctu.m_cuAddr];
54
+ uint8_t* contentInfoPtr = m_frame->m_addOnCtuInfo[ctu.m_cuAddr];
55
+ int* prevCtuInfoChangePtr = m_frame->m_addOnPrevChange[ctu.m_cuAddr];
56
+ do
57
+ {
58
+ uint8_t depth = (uint8_t)ctuTemp->ctuPartitions[depthIdx];
59
+ uint8_t content = (uint8_t)(*((int32_t *)ctuTemp->ctuInfo + depthIdx));
60
+ int prevCtuInfoChange = m_frame->m_prevCtuInfoChange[ctu.m_cuAddr * maxNum8x8Partitions + depthIdx];
61
+ memset(depthInfoPtr, depth, sizeof(uint8_t) * numPartition >> 2 * depth);
62
+ memset(contentInfoPtr, content, sizeof(uint8_t) * numPartition >> 2 * depth);
63
+ memset(prevCtuInfoChangePtr, 0, sizeof(int) * numPartition >> 2 * depth);
64
+ for (uint32_t l = 0; l < numPartition >> 2 * depth; l++)
65
+ prevCtuInfoChangePtr[l] = prevCtuInfoChange;
66
+ depthInfoPtr += ctu.m_numPartitions >> 2 * depth;
67
+ contentInfoPtr += ctu.m_numPartitions >> 2 * depth;
68
+ prevCtuInfoChangePtr += ctu.m_numPartitions >> 2 * depth;
69
+ depthIdx++;
70
+ } while (ctuTemp->ctuPartitions[depthIdx] != 0);
71
+
72
+ m_additionalCtuInfo = m_frame->m_addOnCtuInfo[ctu.m_cuAddr];
73
+ m_prevCtuInfoChange = m_frame->m_addOnPrevChange[ctu.m_cuAddr];
74
+ memcpy(ctu.m_cuDepth, m_frame->m_addOnDepth[ctu.m_cuAddr], sizeof(uint8_t) * numPartition);
75
+ //Calculate log2CUSize from depth
76
+ for (uint32_t i = 0; i < cuGeom.numPartitions; i++)
77
+ ctu.m_log2CUSize[i] = (uint8_t)m_param->maxLog2CUSize - ctu.m_cuDepth[i];
78
+ }
79
+ }
80
+
81
if (m_param->analysisMultiPassRefine && m_param->rc.bStatRead)
82
{
83
m_multipassAnalysis = (analysis2PassFrameData*)m_frame->m_analysis2Pass.analysisFramedata;
84
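Review note on the `memset` sizes in the bCTUInfo block above: `sizeof(T) * numPartition >> 2 * depth` parses as `(sizeof(T) * numPartition) >> (2 * depth)`, because `*` binds tighter than `>>`. Each extra depth level therefore quarters the number of partitions covered. The sketch below spells that out; `partitionsAtDepth` and `bytesAtDepth` are hypothetical helpers, and the value 256 used in the example is an assumption about a typical CTU, not something this hunk states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of partitions covered by one CU at the given depth.
inline uint32_t partitionsAtDepth(uint32_t numPartitions, uint32_t depth) {
    assert(2 * depth < 32);                 // keep the shift well defined
    return numPartitions >> (2 * depth);    // quarter the count per depth level
}

// Byte count written exactly as in the diff, without extra parentheses.
inline size_t bytesAtDepth(size_t elemSize, uint32_t numPartitions, uint32_t depth) {
    return elemSize * numPartitions >> 2 * depth;
}
```

When `numPartitions` is divisible by 4 to the power `depth`, `bytesAtDepth(sizeof(int), n, d)` equals `sizeof(int) * partitionsAtDepth(n, d)`, so the unparenthesized form in the patch gives the intended size.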
85
}
86
}
87
88
- if (m_param->analysisMode && m_slice->m_sliceType != I_SLICE && m_param->analysisRefineLevel > 1 && m_param->analysisRefineLevel < 10)
89
+ if (m_param->analysisReuseMode && m_slice->m_sliceType != I_SLICE && m_param->analysisReuseLevel > 1 && m_param->analysisReuseLevel < 10)
90
{
91
int numPredDir = m_slice->isInterP() ? 1 : 2;
92
m_reuseInterDataCTU = (analysis_inter_data*)m_frame->m_analysisData.interData;
93
m_reuseRef = &m_reuseInterDataCTU->ref[ctu.m_cuAddr * X265_MAX_PRED_MODE_PER_CTU * numPredDir];
94
m_reuseDepth = &m_reuseInterDataCTU->depth[ctu.m_cuAddr * ctu.m_numPartitions];
95
m_reuseModes = &m_reuseInterDataCTU->modes[ctu.m_cuAddr * ctu.m_numPartitions];
96
- if (m_param->analysisRefineLevel > 4)
97
+ if (m_param->analysisReuseLevel > 4)
98
{
99
m_reusePartSize = &m_reuseInterDataCTU->partSize[ctu.m_cuAddr * ctu.m_numPartitions];
100
m_reuseMergeFlag = &m_reuseInterDataCTU->mergeFlag[ctu.m_cuAddr * ctu.m_numPartitions];
101
}
102
- if (m_param->analysisMode == X265_ANALYSIS_SAVE)
103
+ if (m_param->analysisReuseMode == X265_ANALYSIS_SAVE)
104
for (int i = 0; i < X265_MAX_PRED_MODE_PER_CTU * numPredDir; i++)
105
m_reuseRef[i] = -1;
106
}
107
108
if (m_slice->m_sliceType == I_SLICE)
109
{
110
analysis_intra_data* intraDataCTU = (analysis_intra_data*)m_frame->m_analysisData.intraData;
111
- if (m_param->analysisMode == X265_ANALYSIS_LOAD && m_param->analysisRefineLevel > 1)
112
+ if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD && m_param->analysisReuseLevel > 1)
113
{
114
memcpy(ctu.m_cuDepth, &intraDataCTU->depth[ctu.m_cuAddr * numPartition], sizeof(uint8_t) * numPartition);
115
memcpy(ctu.m_lumaIntraDir, &intraDataCTU->modes[ctu.m_cuAddr * numPartition], sizeof(uint8_t) * numPartition);
116
117
else
118
{
119
if (m_param->bIntraRefresh && m_slice->m_sliceType == P_SLICE &&
120
- ctu.m_cuPelX / g_maxCUSize >= frame.m_encData->m_pir.pirStartCol
121
- && ctu.m_cuPelX / g_maxCUSize < frame.m_encData->m_pir.pirEndCol)
122
+ ctu.m_cuPelX / m_param->maxCUSize >= frame.m_encData->m_pir.pirStartCol
123
+ && ctu.m_cuPelX / m_param->maxCUSize < frame.m_encData->m_pir.pirEndCol)
124
compressIntraCU(ctu, cuGeom, qp);
125
else if (!m_param->rdLevel)
126
{
127
128
/* generate residual for entire CTU at once and copy to reconPic */
129
encodeResidue(ctu, cuGeom);
130
}
131
- else if (m_param->analysisMode == X265_ANALYSIS_LOAD && m_param->analysisRefineLevel == 10)
132
+ else if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD && m_param->analysisReuseLevel == 10)
133
{
134
analysis_inter_data* interDataCTU = (analysis_inter_data*)m_frame->m_analysisData.interData;
135
int posCTU = ctu.m_cuAddr * numPartition;
136
137
}
138
//Calculate log2CUSize from depth
139
for (uint32_t i = 0; i < cuGeom.numPartitions; i++)
140
- ctu.m_log2CUSize[i] = (uint8_t)g_maxLog2CUSize - ctu.m_cuDepth[i];
141
+ ctu.m_log2CUSize[i] = (uint8_t)m_param->maxLog2CUSize - ctu.m_cuDepth[i];
142
143
qprdRefine (ctu, cuGeom, qp, qp);
144
return *m_modeDepth[0].bestMode;
145
146
if (m_param->bEnableRdRefine || m_param->bOptCUDeltaQP)
147
qprdRefine(ctu, cuGeom, qp, qp);
148
149
+ if (m_param->csvLogLevel >= 2)
150
+ collectPUStatistics(ctu, cuGeom);
151
+
152
return *m_modeDepth[0].bestMode;
153
}
154
155
+void Analysis::collectPUStatistics(const CUData& ctu, const CUGeom& cuGeom)
156
+{
157
+ uint8_t depth = 0;
158
+ uint8_t partSize = 0;
159
+ for (uint32_t absPartIdx = 0; absPartIdx < ctu.m_numPartitions; absPartIdx += ctu.m_numPartitions >> (depth * 2))
160
+ {
161
+ depth = ctu.m_cuDepth[absPartIdx];
162
+ partSize = ctu.m_partSize[absPartIdx];
163
+ uint32_t numPU = nbPartsTable[(int)partSize];
164
+ int shift = 2 * (m_param->maxCUDepth + 1 - depth);
165
+ for (uint32_t puIdx = 0; puIdx < numPU; puIdx++)
166
+ {
167
+ PredictionUnit pu(ctu, cuGeom, puIdx);
168
+ int puabsPartIdx = ctu.getPUOffset(puIdx, absPartIdx);
169
+ int mode = 1;
170
+ if (ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_Nx2N || ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_2NxN)
171
+ mode = 2;
172
+ else if (ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_2NxnU || ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_2NxnD || ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_nLx2N || ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_nRx2N)
173
+ mode = 3;
174
+
175
+ if (ctu.m_predMode[puabsPartIdx + absPartIdx] == MODE_SKIP)
176
+ {
177
+ ctu.m_encData->m_frameStats.cntSkipPu[depth] += (uint64_t)(1 << shift);
178
+ ctu.m_encData->m_frameStats.totalPu[depth] += (uint64_t)(1 << shift);
179
+ }
180
+ else if (ctu.m_predMode[puabsPartIdx + absPartIdx] == MODE_INTRA)
181
+ {
182
+ if (ctu.m_partSize[puabsPartIdx + absPartIdx] == SIZE_NxN)
183
+ {
184
+ ctu.m_encData->m_frameStats.cnt4x4++;
185
+ ctu.m_encData->m_frameStats.totalPu[4]++;
186
+ }
187
+ else
188
+ {
189
+ ctu.m_encData->m_frameStats.cntIntraPu[depth] += (uint64_t)(1 << shift);
190
+ ctu.m_encData->m_frameStats.totalPu[depth] += (uint64_t)(1 << shift);
191
+ }
192
+ }
193
+ else if (mode == 3)
194
+ {
195
+ ctu.m_encData->m_frameStats.cntAmp[depth] += (uint64_t)(1 << shift);
196
+ ctu.m_encData->m_frameStats.totalPu[depth] += (uint64_t)(1 << shift);
197
+ break;
198
+ }
199
+ else
200
+ {
201
+ if (ctu.m_mergeFlag[puabsPartIdx + absPartIdx])
202
+ ctu.m_encData->m_frameStats.cntMergePu[depth][ctu.m_partSize[puabsPartIdx + absPartIdx]] += (1 << shift) / mode;
203
+ else
204
+ ctu.m_encData->m_frameStats.cntInterPu[depth][ctu.m_partSize[puabsPartIdx + absPartIdx]] += (1 << shift) / mode;
205
+
206
+ ctu.m_encData->m_frameStats.totalPu[depth] += (1 << shift) / mode;
207
+ }
208
+ }
209
+ }
210
+}
211
+
212
int32_t Analysis::loadTUDepth(CUGeom cuGeom, CUData parentCTU)
213
{
214
float predDepth = 0;
215
216
int lambdaQP = lqp;
217
218
bool doQPRefine = (bDecidedDepth && depth <= m_slice->m_pps->maxCuDQPDepth) || (!bDecidedDepth && depth == m_slice->m_pps->maxCuDQPDepth);
219
- if (m_param->analysisRefineLevel == 10)
220
+ if (m_param->analysisReuseLevel == 10)
221
doQPRefine = false;
222
223
if (doQPRefine)
224
225
226
bool bAlreadyDecided = parentCTU.m_lumaIntraDir[cuGeom.absPartIdx] != (uint8_t)ALL_IDX;
227
bool bDecidedDepth = parentCTU.m_cuDepth[cuGeom.absPartIdx] == depth;
228
+ int split = 0;
229
+ if (m_param->intraRefine)
230
+ {
231
+ split = ((cuGeom.log2CUSize == (uint32_t)(g_log2Size[m_param->minCUSize] + 1)) && bDecidedDepth);
232
+ if (cuGeom.log2CUSize == (uint32_t)(g_log2Size[m_param->minCUSize]) && !bDecidedDepth)
233
+ bAlreadyDecided = false;
234
+ }
235
236
if (bAlreadyDecided)
237
{
238
239
Mode& mode = md.pred[0];
240
md.bestMode = &mode;
241
mode.cu.initSubCU(parentCTU, cuGeom, qp);
242
- memcpy(mode.cu.m_lumaIntraDir, parentCTU.m_lumaIntraDir + cuGeom.absPartIdx, cuGeom.numPartitions);
243
- memcpy(mode.cu.m_chromaIntraDir, parentCTU.m_chromaIntraDir + cuGeom.absPartIdx, cuGeom.numPartitions);
244
+ if (m_param->intraRefine != 2 || parentCTU.m_lumaIntraDir[cuGeom.absPartIdx] <= 1)
245
+ {
246
+ memcpy(mode.cu.m_lumaIntraDir, parentCTU.m_lumaIntraDir + cuGeom.absPartIdx, cuGeom.numPartitions);
247
+ memcpy(mode.cu.m_chromaIntraDir, parentCTU.m_chromaIntraDir + cuGeom.absPartIdx, cuGeom.numPartitions);
248
+ }
249
checkIntra(mode, cuGeom, (PartSize)parentCTU.m_partSize[cuGeom.absPartIdx]);
250
251
if (m_bTryLossless)
252
253
}
254
255
// stop recursion if we reach the depth of previous analysis decision
256
- mightSplit &= !(bAlreadyDecided && bDecidedDepth);
257
+ mightSplit &= !(bAlreadyDecided && bDecidedDepth) || split;
258
259
if (mightSplit)
260
{
261
262
}
263
264
/* Save Intra CUs TU depth only when analysis mode is OFF */
265
- if ((m_limitTU & X265_TU_LIMIT_NEIGH) && cuGeom.log2CUSize >= 4 && !m_param->analysisMode)
266
+ if ((m_limitTU & X265_TU_LIMIT_NEIGH) && cuGeom.log2CUSize >= 4 && !m_param->analysisReuseMode)
267
{
268
CUData* ctu = md.bestMode->cu.m_encData->getPicCTU(parentCTU.m_cuAddr);
269
int8_t maxTUDepth = -1;
270
271
bool mightSplit = !(cuGeom.flags & CUGeom::LEAF);
272
bool mightNotSplit = !(cuGeom.flags & CUGeom::SPLIT_MANDATORY);
273
uint32_t minDepth = topSkipMinDepth(parentCTU, cuGeom);
274
+ bool bDecidedDepth = parentCTU.m_cuDepth[cuGeom.absPartIdx] == depth;
275
bool skipModes = false; /* Skip any remaining mode analyses at current depth */
276
bool skipRecursion = false; /* Skip recursion */
277
bool splitIntra = true;
278
bool skipRectAmp = false;
279
bool chooseMerge = false;
280
+ bool bCtuInfoCheck = false;
281
+ int sameContentRef = 0;
282
+
283
+ if (m_evaluateInter == 1)
284
+ {
285
+ skipRectAmp = !!md.bestMode;
286
+ mightSplit &= false;
287
+ minDepth = depth;
288
+ }
289
290
if ((m_limitTU & X265_TU_LIMIT_NEIGH) && cuGeom.log2CUSize >= 4)
291
m_maxTUDepth = loadTUDepth(cuGeom, parentCTU);
292
293
md.pred[PRED_2Nx2N].sa8dCost = 0;
294
}
295
296
- if (m_param->analysisMode == X265_ANALYSIS_LOAD && m_param->analysisRefineLevel > 1)
297
+ if (m_param->bCTUInfo && depth <= parentCTU.m_cuDepth[cuGeom.absPartIdx])
298
+ {
299
+ if (bDecidedDepth && m_additionalCtuInfo[cuGeom.absPartIdx])
300
+ sameContentRef = findSameContentRefCount(parentCTU, cuGeom);
301
+ if (depth < parentCTU.m_cuDepth[cuGeom.absPartIdx])
302
+ {
303
+ mightNotSplit &= bDecidedDepth;
304
+ bCtuInfoCheck = skipRecursion = false;
305
+ skipModes = true;
306
+ }
307
+ else if (mightNotSplit && bDecidedDepth)
308
+ {
309
+ if (m_additionalCtuInfo[cuGeom.absPartIdx])
310
+ {
311
+ bCtuInfoCheck = skipRecursion = true;
312
+ md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom, qp);
313
+ md.pred[PRED_SKIP].cu.initSubCU(parentCTU, cuGeom, qp);
314
+ checkMerge2Nx2N_rd0_4(md.pred[PRED_SKIP], md.pred[PRED_MERGE], cuGeom);
315
+ if (!sameContentRef)
316
+ {
317
+ if ((m_param->bCTUInfo & 2) && (m_slice->m_pps->bUseDQP && depth <= m_slice->m_pps->maxCuDQPDepth))
318
+ {
319
+ qp -= int32_t(0.04 * qp);
320
+ setLambdaFromQP(parentCTU, qp);
321
+ }
322
+ if (m_param->bCTUInfo & 4)
323
+ skipModes = false;
324
+ }
325
+ if (sameContentRef || (!sameContentRef && !(m_param->bCTUInfo & 4)))
326
+ {
327
+ if (m_param->rdLevel)
328
+ skipModes = m_param->bEnableEarlySkip && md.bestMode && md.bestMode->cu.isSkipped(0);
329
+ if ((m_param->bCTUInfo & 4) && sameContentRef)
330
+ skipModes = md.bestMode && true;
331
+ }
332
+ }
333
+ else
334
+ {
335
+ md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom, qp);
336
+ md.pred[PRED_SKIP].cu.initSubCU(parentCTU, cuGeom, qp);
337
+ checkMerge2Nx2N_rd0_4(md.pred[PRED_SKIP], md.pred[PRED_MERGE], cuGeom);
338
+ if (m_param->rdLevel)
339
+ skipModes = m_param->bEnableEarlySkip && md.bestMode && md.bestMode->cu.isSkipped(0);
340
+ }
341
+ mightSplit &= !bDecidedDepth;
342
+ }
343
+ }
344
+ if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD && m_param->analysisReuseLevel > 1 && m_param->analysisReuseLevel != 10)
345
{
346
if (mightNotSplit && depth == m_reuseDepth[cuGeom.absPartIdx])
347
{
348
349
if (m_param->rdLevel)
350
skipModes = m_param->bEnableEarlySkip && md.bestMode;
351
}
352
- if (m_param->analysisRefineLevel > 4 && m_reusePartSize[cuGeom.absPartIdx] == SIZE_2Nx2N)
353
+ if (m_param->analysisReuseLevel > 4 && m_reusePartSize[cuGeom.absPartIdx] == SIZE_2Nx2N)
354
{
355
if (m_reuseModes[cuGeom.absPartIdx] != MODE_INTRA && m_reuseModes[cuGeom.absPartIdx] != 4)
356
{
357
358
}
359
360
/* Step 1. Evaluate Merge/Skip candidates for likely early-outs, if skip mode was not set above */
361
- if (mightNotSplit && depth >= minDepth && !md.bestMode) /* TODO: Re-evaluate if analysis load/save still works */
362
+ if (mightNotSplit && depth >= minDepth && !md.bestMode && !bCtuInfoCheck) /* TODO: Re-evaluate if analysis load/save still works */
363
{
364
/* Compute Merge Cost */
365
md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom, qp);
366
367
skipModes = m_param->bEnableEarlySkip && md.bestMode && md.bestMode->cu.isSkipped(0); // TODO: sa8d threshold per depth
368
}
369
370
- if (md.bestMode && m_param->bEnableRecursionSkip)
371
+ if (md.bestMode && m_param->bEnableRecursionSkip && !bCtuInfoCheck)
372
{
373
skipRecursion = md.bestMode->cu.isSkipped(0);
374
if (mightSplit && depth >= minDepth && !skipRecursion)
375
376
/* Step 2. Evaluate each of the 4 split sub-blocks in series */
377
if (mightSplit && !skipRecursion)
378
{
379
+ if (bCtuInfoCheck && m_param->bCTUInfo & 2)
380
+ qp = int((1 / 0.96) * qp + 0.5);
381
Mode* splitPred = &md.pred[PRED_SPLIT];
382
splitPred->initCosts();
383
CUData* splitCU = &splitPred->cu;
384
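Review note on the `bCTUInfo & 2` path: the earlier hunk lowers `qp` by a truncated 4% (`qp -= int32_t(0.04 * qp)`), and the hunk above scales it back with `int((1 / 0.96) * qp + 0.5)` before recursing into the split. The sketch below reproduces just that arithmetic; `lowerQp` and `restoreQp` are hypothetical names, and reading the second expression as an inverse of the first is an interpretation of the diff, not something the source states.

```cpp
#include <cassert>
#include <cstdint>

// Reduction as written in the patch: subtract 4% of qp, truncated toward zero.
inline int32_t lowerQp(int32_t qp) {
    assert(qp >= 0);
    return qp - int32_t(0.04 * qp);
}

// Scale-back as written in the patch: divide by 0.96 and round to nearest.
inline int32_t restoreQp(int32_t qp) {
    assert(qp >= 0);
    return int((1 / 0.96) * qp + 0.5);
}
```

The two steps are not exact inverses because of the truncation and rounding: starting from 30 the pair returns to 30, but starting from 40 the value goes to 39 and then to 41.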
385
* 2 3 */
386
uint32_t allSplitRefs = splitData[0].splitRefs | splitData[1].splitRefs | splitData[2].splitRefs | splitData[3].splitRefs;
387
/* Step 3. Evaluate ME (2Nx2N, rect, amp) and intra modes at current depth */
388
- if (mightNotSplit && depth >= minDepth)
389
+ if (mightNotSplit && (depth >= minDepth || (m_param->bCTUInfo && !md.bestMode)))
390
{
391
if (m_slice->m_pps->bUseDQP && depth <= m_slice->m_pps->maxCuDQPDepth && m_slice->m_pps->maxCuDQPDepth != 0)
392
setLambdaFromQP(parentCTU, qp);
393
394
}
395
}
396
}
397
- bool bTryIntra = (m_slice->m_sliceType != B_SLICE || m_param->bIntraInBFrames) && cuGeom.log2CUSize != MAX_LOG2_CU_SIZE;
398
+ bool bTryIntra = (m_slice->m_sliceType != B_SLICE || m_param->bIntraInBFrames) && cuGeom.log2CUSize != MAX_LOG2_CU_SIZE && !((m_param->bCTUInfo & 4) && bCtuInfoCheck);
399
if (m_param->rdLevel >= 3)
400
{
401
/* Calculate RD cost of best inter option */
402
403
404
bool mightSplit = !(cuGeom.flags & CUGeom::LEAF);
405
bool mightNotSplit = !(cuGeom.flags & CUGeom::SPLIT_MANDATORY);
406
+ bool bDecidedDepth = parentCTU.m_cuDepth[cuGeom.absPartIdx] == depth;
407
bool skipRecursion = false;
408
bool skipModes = false;
409
bool splitIntra = true;
410
bool skipRectAmp = false;
411
+ bool bCtuInfoCheck = false;
412
+ int sameContentRef = 0;
413
+
414
+ if (m_evaluateInter == 1)
415
+ {
416
+ skipRectAmp = !!md.bestMode;
417
+ mightSplit &= false;
418
+ }
419
420
// avoid uninitialize value in below reference
421
if (m_param->limitModes)
422
423
splitData[3].initSplitCUData();
424
uint32_t allSplitRefs = splitData[0].splitRefs | splitData[1].splitRefs | splitData[2].splitRefs | splitData[3].splitRefs;
425
uint32_t refMasks[2];
426
- if (m_param->analysisMode == X265_ANALYSIS_LOAD && m_param->analysisRefineLevel > 1)
427
+ if (m_param->bCTUInfo && depth <= parentCTU.m_cuDepth[cuGeom.absPartIdx])
428
+ {
429
+ if (bDecidedDepth && m_additionalCtuInfo[cuGeom.absPartIdx])
430
+ sameContentRef = findSameContentRefCount(parentCTU, cuGeom);
431
+ if (depth < parentCTU.m_cuDepth[cuGeom.absPartIdx])
432
+ {
433
+ mightNotSplit &= bDecidedDepth;
434
+ bCtuInfoCheck = skipRecursion = false;
435
+ skipModes = true;
436
+ }
437
+ else if (mightNotSplit && bDecidedDepth)
438
+ {
439
+ if (m_additionalCtuInfo[cuGeom.absPartIdx])
440
+ {
441
+ bCtuInfoCheck = skipRecursion = true;
442
+ refMasks[0] = allSplitRefs;
443
+ md.pred[PRED_2Nx2N].cu.initSubCU(parentCTU, cuGeom, qp);
444
+ checkInter_rd5_6(md.pred[PRED_2Nx2N], cuGeom, SIZE_2Nx2N, refMasks);
445
+ checkBestMode(md.pred[PRED_2Nx2N], cuGeom.depth);
446
+ if (!sameContentRef)
447
+ {
448
+ if ((m_param->bCTUInfo & 2) && (m_slice->m_pps->bUseDQP && depth <= m_slice->m_pps->maxCuDQPDepth))
449
+ {
450
+ qp -= int32_t(0.04 * qp);
451
+ setLambdaFromQP(parentCTU, qp);
452
+ }
453
+ if (m_param->bCTUInfo & 4)
454
+ skipModes = false;
455
+ }
456
+ if (sameContentRef || (!sameContentRef && !(m_param->bCTUInfo & 4)))
457
+ {
458
+ if (m_param->rdLevel)
459
+ skipModes = m_param->bEnableEarlySkip && md.bestMode && md.bestMode->cu.isSkipped(0);
460
+ if ((m_param->bCTUInfo & 4) && sameContentRef)
461
+ skipModes = md.bestMode && true;
462
+ }
463
+ }
464
+ else
465
+ {
466
+ md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom, qp);
467
+ md.pred[PRED_SKIP].cu.initSubCU(parentCTU, cuGeom, qp);
468
+ checkMerge2Nx2N_rd5_6(md.pred[PRED_SKIP], md.pred[PRED_MERGE], cuGeom);
469
+ skipModes = !!m_param->bEnableEarlySkip && md.bestMode;
470
+ refMasks[0] = allSplitRefs;
471
+ md.pred[PRED_2Nx2N].cu.initSubCU(parentCTU, cuGeom, qp);
472
+ checkInter_rd5_6(md.pred[PRED_2Nx2N], cuGeom, SIZE_2Nx2N, refMasks);
473
+ checkBestMode(md.pred[PRED_2Nx2N], cuGeom.depth);
474
+ }
475
+ mightSplit &= !bDecidedDepth;
476
+ }
477
+ }
478
+ if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD && m_param->analysisReuseLevel > 1 && m_param->analysisReuseLevel != 10)
479
{
480
if (mightNotSplit && depth == m_reuseDepth[cuGeom.absPartIdx])
481
{
482
483
if (m_param->bEnableRecursionSkip && depth && m_modeDepth[depth - 1].bestMode)
484
skipRecursion = md.bestMode && !md.bestMode->cu.getQtRootCbf(0);
485
}
486
- if (m_param->analysisRefineLevel > 4 && m_reusePartSize[cuGeom.absPartIdx] == SIZE_2Nx2N)
487
+ if (m_param->analysisReuseLevel > 4 && m_reusePartSize[cuGeom.absPartIdx] == SIZE_2Nx2N)
488
skipRectAmp = true && !!md.bestMode;
489
}
490
}
491
492
}
493
494
/* Step 1. Evaluate Merge/Skip candidates for likely early-outs */
495
- if (mightNotSplit && !md.bestMode)
496
+ if (mightNotSplit && !md.bestMode && !bCtuInfoCheck)
497
{
498
md.pred[PRED_SKIP].cu.initSubCU(parentCTU, cuGeom, qp);
499
md.pred[PRED_MERGE].cu.initSubCU(parentCTU, cuGeom, qp);
500
501
/* Step 2. Evaluate each of the 4 split sub-blocks in series */
502
if (mightSplit && !skipRecursion)
503
{
504
+ if (bCtuInfoCheck && m_param->bCTUInfo & 2)
505
+ qp = int((1 / 0.96) * qp + 0.5);
506
Mode* splitPred = &md.pred[PRED_SPLIT];
507
splitPred->initCosts();
508
CUData* splitCU = &splitPred->cu;
509
510
}
511
}
512
513
- if ((m_slice->m_sliceType != B_SLICE || m_param->bIntraInBFrames) && cuGeom.log2CUSize != MAX_LOG2_CU_SIZE)
514
+ if ((m_slice->m_sliceType != B_SLICE || m_param->bIntraInBFrames) && (cuGeom.log2CUSize != MAX_LOG2_CU_SIZE) && !((m_param->bCTUInfo & 4) && bCtuInfoCheck))
515
{
516
if (!m_param->limitReferences || splitIntra)
517
{
518
519
ModeDepth& md = m_modeDepth[depth];
520
md.bestMode = NULL;
521
522
+ m_evaluateInter = 0;
523
bool mightSplit = !(cuGeom.flags & CUGeom::LEAF);
524
bool mightNotSplit = !(cuGeom.flags & CUGeom::SPLIT_MANDATORY);
525
bool bDecidedDepth = parentCTU.m_cuDepth[cuGeom.absPartIdx] == depth;
526
527
+ int split = (m_param->interRefine && cuGeom.log2CUSize == (uint32_t)(g_log2Size[m_param->minCUSize] + 1)
528
+ && bDecidedDepth && parentCTU.m_predMode[cuGeom.absPartIdx] == MODE_SKIP);
529
+
530
if (bDecidedDepth)
531
{
532
setLambdaFromQP(parentCTU, qp, lqp);
533
534
PartSize size = (PartSize)parentCTU.m_partSize[cuGeom.absPartIdx];
535
if (parentCTU.isIntra(cuGeom.absPartIdx))
536
{
537
- memcpy(mode.cu.m_lumaIntraDir, parentCTU.m_lumaIntraDir + cuGeom.absPartIdx, cuGeom.numPartitions);
538
- memcpy(mode.cu.m_chromaIntraDir, parentCTU.m_chromaIntraDir + cuGeom.absPartIdx, cuGeom.numPartitions);
539
+ if (m_param->intraRefine != 2 || parentCTU.m_lumaIntraDir[cuGeom.absPartIdx] <= 1)
540
+ {
541
+ memcpy(mode.cu.m_lumaIntraDir, parentCTU.m_lumaIntraDir + cuGeom.absPartIdx, cuGeom.numPartitions);
542
+ memcpy(mode.cu.m_chromaIntraDir, parentCTU.m_chromaIntraDir + cuGeom.absPartIdx, cuGeom.numPartitions);
543
+ }
544
checkIntra(mode, cuGeom, size);
545
}
546
else
547
548
for (uint32_t part = 0; part < numPU; part++)
549
{
550
PredictionUnit pu(mode.cu, cuGeom, part);
551
- if (m_param->analysisRefineLevel == 10)
552
+ if (m_param->analysisReuseLevel == 10)
553
{
554
analysis_inter_data* interDataCTU = (analysis_inter_data*)m_frame->m_analysisData.interData;
555
int cuIdx = (mode.cu.m_cuAddr * parentCTU.m_numPartitions) + cuGeom.absPartIdx;
556
mode.cu.m_mergeFlag[pu.puAbsPartIdx] = interDataCTU->mergeFlag[cuIdx + part];
557
mode.cu.setPUInterDir(interDataCTU->interDir[cuIdx + part], pu.puAbsPartIdx, part);
558
- for (int dir = 0; dir < m_slice->isInterB() + 1; dir++)
559
+ for (int list = 0; list < m_slice->isInterB() + 1; list++)
560
{
561
- mode.cu.setPUMv(dir, interDataCTU->mv[dir][cuIdx + part], pu.puAbsPartIdx, part);
562
- mode.cu.setPURefIdx(dir, interDataCTU->refIdx[dir][cuIdx + part], pu.puAbsPartIdx, part);
563
- mode.cu.m_mvpIdx[dir][pu.puAbsPartIdx] = interDataCTU->mvpIdx[dir][cuIdx + part];
564
+ mode.cu.setPUMv(list, interDataCTU->mv[list][cuIdx + part], pu.puAbsPartIdx, part);
565
+ mode.cu.setPURefIdx(list, interDataCTU->refIdx[list][cuIdx + part], pu.puAbsPartIdx, part);
566
+ mode.cu.m_mvpIdx[list][pu.puAbsPartIdx] = interDataCTU->mvpIdx[list][cuIdx + part];
567
}
568
if (!mode.cu.m_mergeFlag[pu.puAbsPartIdx])
569
{
570
+ if (m_param->mvRefine)
571
+ m_me.setSourcePU(*mode.fencYuv, pu.ctuAddr, pu.cuAbsPartIdx, pu.puAbsPartIdx, pu.width, pu.height, m_param->searchMethod, m_param->subpelRefine, false);
572
//AMVP
573
MV mvc[(MD_ABOVE_LEFT + 1) * 2 + 2];
574
mode.cu.getNeighbourMV(part, pu.puAbsPartIdx, mode.interNeighbours);
575
576
continue;
577
mode.cu.getPMV(mode.interNeighbours, list, ref, mode.amvpCand[list][ref], mvc);
578
MV mvp = mode.amvpCand[list][ref][mode.cu.m_mvpIdx[list][pu.puAbsPartIdx]];
579
+ if (m_param->mvRefine)
580
+ {
581
+ MV outmv;
582
+ searchMV(mode, pu, list, ref, outmv);
583
+ mode.cu.setPUMv(list, outmv, pu.puAbsPartIdx, part);
584
+ }
585
mode.cu.m_mvd[list][pu.puAbsPartIdx] = mode.cu.m_mv[list][pu.puAbsPartIdx] - mvp;
586
}
587
}
588
+ else if(m_param->scaleFactor)
589
+ {
590
+ MVField candMvField[MRG_MAX_NUM_CANDS][2]; // double length for mv of both lists
591
+ uint8_t candDir[MRG_MAX_NUM_CANDS];
592
+ mode.cu.getInterMergeCandidates(pu.puAbsPartIdx, part, candMvField, candDir);
593
+ uint8_t mvpIdx = mode.cu.m_mvpIdx[0][pu.puAbsPartIdx];
594
+ mode.cu.setPUInterDir(candDir[mvpIdx], pu.puAbsPartIdx, part);
595
+ mode.cu.setPUMv(0, candMvField[mvpIdx][0].mv, pu.puAbsPartIdx, part);
596
+ mode.cu.setPUMv(1, candMvField[mvpIdx][1].mv, pu.puAbsPartIdx, part);
597
+ mode.cu.setPURefIdx(0, (int8_t)candMvField[mvpIdx][0].refIdx, pu.puAbsPartIdx, part);
598
+ mode.cu.setPURefIdx(1, (int8_t)candMvField[mvpIdx][1].refIdx, pu.puAbsPartIdx, part);
599
+ }
600
}
601
motionCompensation(mode.cu, pu, mode.predYuv, true, (m_csp != X265_CSP_I400 && m_frame->m_fencPic->m_picCsp != X265_CSP_I400));
602
}
603
-
604
- if (parentCTU.isSkipped(cuGeom.absPartIdx))
605
+ if (!m_param->interRefine && parentCTU.isSkipped(cuGeom.absPartIdx))
606
encodeResAndCalcRdSkipCU(mode);
607
else
608
encodeResAndCalcRdInterCU(mode, cuGeom);
609
610
611
if (mightSplit && m_param->rdLevel < 5)
612
checkDQPForSplitPred(*md.bestMode, cuGeom);
613
+
614
+ if (m_param->interRefine && parentCTU.m_predMode[cuGeom.absPartIdx] == MODE_SKIP && !mode.cu.isSkipped(0))
615
+ {
616
+ m_evaluateInter = 1;
617
+ m_param->rdLevel > 4 ? compressInterCU_rd5_6(parentCTU, cuGeom, qp) : compressInterCU_rd0_4(parentCTU, cuGeom, qp);
618
+ }
619
}
620
- else
621
+ if (!bDecidedDepth || split)
622
{
623
Mode* splitPred = &md.pred[PRED_SPLIT];
624
- md.bestMode = splitPred;
625
+ if (!split)
626
+ md.bestMode = splitPred;
627
splitPred->initCosts();
628
CUData* splitCU = &splitPred->cu;
629
splitCU->initSubCU(parentCTU, cuGeom, qp);
630
631
if (m_slice->m_pps->bUseDQP && nextDepth <= m_slice->m_pps->maxCuDQPDepth)
632
nextQP = setLambdaFromQP(parentCTU, calculateQpforCuSize(parentCTU, childGeom));
633
634
- int lamdaQP = m_param->analysisRefineLevel == 10 ? nextQP : lqp;
635
- qprdRefine(parentCTU, childGeom, nextQP, lamdaQP);
636
+ int lamdaQP = m_param->analysisReuseLevel == 10 ? nextQP : lqp;
637
+
638
+ if (split)
639
+ m_param->rdLevel > 4 ? compressInterCU_rd5_6(parentCTU, childGeom, nextQP) : compressInterCU_rd0_4(parentCTU, childGeom, nextQP);
640
+ else
641
+ qprdRefine(parentCTU, childGeom, nextQP, lamdaQP);
642
643
// Save best CU and pred data for this sub CU
644
splitCU->copyPartFrom(nd.bestMode->cu, childGeom, subPartIdx);
645
646
else
647
updateModeCost(*splitPred);
648
649
+ if (m_param->interRefine)
650
+ {
651
+ if (m_param->rdLevel > 1)
652
+ checkBestMode(*splitPred, cuGeom.depth);
653
+ else if (splitPred->sa8dCost < md.bestMode->sa8dCost)
654
+ md.bestMode = splitPred;
655
+ }
656
+
657
checkDQPForSplitPred(*splitPred, cuGeom);
658
659
/* Copy best data to encData CTU and recon */
660
661
int safeX, maxSafeMv;
662
if (m_param->bIntraRefresh && m_slice->m_sliceType == P_SLICE)
663
{
664
- safeX = m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol * g_maxCUSize - 3;
665
+ safeX = m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol * m_param->maxCUSize - 3;
666
maxSafeMv = (safeX - tempPred->cu.m_cuPelX) * 4;
667
}
668
for (uint32_t i = 0; i < numMergeCand; ++i)
669
670
}
671
672
if (m_param->bIntraRefresh && m_slice->m_sliceType == P_SLICE &&
673
- tempPred->cu.m_cuPelX / g_maxCUSize < m_frame->m_encData->m_pir.pirEndCol &&
674
+ tempPred->cu.m_cuPelX / m_param->maxCUSize < m_frame->m_encData->m_pir.pirEndCol &&
675
candMvField[i][0].mv.x > maxSafeMv)
676
// skip merge candidates which reference beyond safe reference area
677
continue;
678
679
int safeX, maxSafeMv;
680
if (m_param->bIntraRefresh && m_slice->m_sliceType == P_SLICE)
681
{
682
- safeX = m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol * g_maxCUSize - 3;
683
+ safeX = m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol * m_param->maxCUSize - 3;
684
maxSafeMv = (safeX - tempPred->cu.m_cuPelX) * 4;
685
}
686
for (uint32_t i = 0; i < numMergeCand; i++)
687
688
triedBZero = true;
689
}
690
if (m_param->bIntraRefresh && m_slice->m_sliceType == P_SLICE &&
691
- tempPred->cu.m_cuPelX / g_maxCUSize < m_frame->m_encData->m_pir.pirEndCol &&
692
+ tempPred->cu.m_cuPelX / m_param->maxCUSize < m_frame->m_encData->m_pir.pirEndCol &&
693
candMvField[i][0].mv.x > maxSafeMv)
694
// skip merge candidates which reference beyond safe reference area
695
continue;
696
697
interMode.cu.setPredModeSubParts(MODE_INTER);
698
int numPredDir = m_slice->isInterP() ? 1 : 2;
699
700
- if (m_param->analysisMode == X265_ANALYSIS_LOAD && m_reuseInterDataCTU && m_param->analysisRefineLevel > 1)
701
+ if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD && m_reuseInterDataCTU && m_param->analysisReuseLevel > 1 && m_param->analysisReuseLevel != 10)
702
{
703
int refOffset = cuGeom.geomRecurId * 16 * numPredDir + partSize * numPredDir * 2;
704
int index = 0;
705
706
}
707
interMode.sa8dCost = m_rdCost.calcRdSADCost((uint32_t)interMode.distortion, interMode.sa8dBits);
708
709
- if (m_param->analysisMode == X265_ANALYSIS_SAVE && m_reuseInterDataCTU && m_param->analysisRefineLevel > 1)
710
+ if (m_param->analysisReuseMode == X265_ANALYSIS_SAVE && m_reuseInterDataCTU && m_param->analysisReuseLevel > 1)
711
{
712
int refOffset = cuGeom.geomRecurId * 16 * numPredDir + partSize * numPredDir * 2;
713
int index = 0;
714
715
interMode.cu.setPredModeSubParts(MODE_INTER);
716
int numPredDir = m_slice->isInterP() ? 1 : 2;
717
718
- if (m_param->analysisMode == X265_ANALYSIS_LOAD && m_reuseInterDataCTU && m_param->analysisRefineLevel > 1)
719
+ if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD && m_reuseInterDataCTU && m_param->analysisReuseLevel > 1 && m_param->analysisReuseLevel != 10)
720
{
721
int refOffset = cuGeom.geomRecurId * 16 * numPredDir + partSize * numPredDir * 2;
722
int index = 0;
723
724
/* predInterSearch sets interMode.sa8dBits, but this is ignored */
725
encodeResAndCalcRdInterCU(interMode, cuGeom);
726
727
- if (m_param->analysisMode == X265_ANALYSIS_SAVE && m_reuseInterDataCTU && m_param->analysisRefineLevel > 1)
728
+ if (m_param->analysisReuseMode == X265_ANALYSIS_SAVE && m_reuseInterDataCTU && m_param->analysisReuseLevel > 1)
729
{
730
int refOffset = cuGeom.geomRecurId * 16 * numPredDir + partSize * numPredDir * 2;
731
int index = 0;
732
733
734
void Analysis::encodeResidue(const CUData& ctu, const CUGeom& cuGeom)
735
{
736
- if (cuGeom.depth < ctu.m_cuDepth[cuGeom.absPartIdx] && cuGeom.depth < g_maxCUDepth)
737
+ if (cuGeom.depth < ctu.m_cuDepth[cuGeom.absPartIdx] && cuGeom.depth < ctu.m_encData->m_param->maxCUDepth)
738
{
739
for (uint32_t subPartIdx = 0; subPartIdx < 4; subPartIdx++)
740
{
741
742
uint32_t block_x = ctu.m_cuPelX + g_zscanToPelX[cuGeom.absPartIdx];
743
uint32_t block_y = ctu.m_cuPelY + g_zscanToPelY[cuGeom.absPartIdx];
744
uint32_t maxCols = (m_frame->m_fencPic->m_picWidth + (loopIncr - 1)) / loopIncr;
745
- uint32_t blockSize = g_maxCUSize >> cuGeom.depth;
746
+ uint32_t blockSize = m_param->maxCUSize >> cuGeom.depth;
747
double qp_offset = 0;
748
uint32_t cnt = 0;
749
uint32_t idx;
750
751
normFactor(srcV, blockSizeC, ctu, qp, TEXT_CHROMA_V);
752
}
753
}
+
+int Analysis::findSameContentRefCount(const CUData& parentCTU, const CUGeom& cuGeom)
+{
+    int sameContentRef = 0;
+    int m_curPoc = parentCTU.m_slice->m_poc;
+    int prevChange = m_prevCtuInfoChange[cuGeom.absPartIdx];
+    int numPredDir = m_slice->isInterP() ? 1 : 2;
+    for (int list = 0; list < numPredDir; list++)
+    {
+        for (int i = 0; i < m_frame->m_encData->m_slice->m_numRefIdx[list]; i++)
+        {
+            int refPoc = m_frame->m_encData->m_slice->m_refFrameList[list][i]->m_poc;
+            int refPrevChange = m_frame->m_encData->m_slice->m_refFrameList[list][i]->m_addOnPrevChange[parentCTU.m_cuAddr][cuGeom.absPartIdx];
+            if ((refPoc < prevChange && refPoc < m_curPoc) || (refPoc > m_curPoc && prevChange < m_curPoc && refPrevChange > m_curPoc) || ((refPoc == prevChange) && (m_additionalCtuInfo[cuGeom.absPartIdx] == CTU_INFO_CHANGE)))
+                sameContentRef++;    /* Content changed */
+        }
+    }
+    return sameContentRef;
+}
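The three-way condition inside findSameContentRefCount is easier to check in isolation. The following is only a sketch, not the encoder's code: `RefInfo`, `matchesRefCondition`, `countMatchingRefs` and `kCtuInfoChange` are hypothetical names, and the value 2 for CTU_INFO_CHANGE is an assumption taken from the literal that Encoder::copyCtuInfo compares against elsewhere in this diff. The sketch reproduces the test as written and takes no position on whether a match means "same" or "changed" content.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the per-reference values the real function reads
// from m_refFrameList; the field names are illustrative only.
struct RefInfo
{
    int poc;        // POC of the reference frame
    int prevChange; // the reference's m_addOnPrevChange entry for this partition
};

// Assumed value of CTU_INFO_CHANGE (not confirmed by the hunk itself).
const int kCtuInfoChange = 2;

// Mirrors the three clauses of the `if` in findSameContentRefCount.
inline bool matchesRefCondition(const RefInfo& ref, int curPoc, int prevChange, int ctuInfo)
{
    bool clauseA = ref.poc < prevChange && ref.poc < curPoc;
    bool clauseB = ref.poc > curPoc && prevChange < curPoc && ref.prevChange > curPoc;
    bool clauseC = ref.poc == prevChange && ctuInfo == kCtuInfoChange;
    return clauseA || clauseB || clauseC;
}

// Counts the references that satisfy the condition, as the nested loops do.
inline int countMatchingRefs(const std::vector<RefInfo>& refs, int curPoc, int prevChange, int ctuInfo)
{
    assert(curPoc >= 0);
    int count = 0;
    for (const RefInfo& ref : refs)
        if (matchesRefCondition(ref, curPoc, prevChange, ctuInfo))
            ++count;
    return count;
}
```

With a current POC of 8 and a previous change at POC 5, references at POC 2 and POC 5 and a future reference at POC 10 (whose own previous change is 12) all match when the partition is flagged as changed; a reference at POC 6 does not.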
x265_2.4.tar.gz/source/encoder/analysis.h -> x265_2.5.tar.gz/source/encoder/analysis.h
Changed
int* m_multipassMvpIdx[2];
3
int32_t* m_multipassRef[2];
4
uint8_t* m_multipassModes;
5
+
6
+ uint8_t m_evaluateInter;
7
+ uint8_t* m_additionalCtuInfo;
8
+ int* m_prevCtuInfoChange;
9
/* refine RD based on QP for rd-levels 5 and 6 */
10
void qprdRefine(const CUData& parentCTU, const CUGeom& cuGeom, int32_t qp, int32_t lqp);
11
12
13
14
void calculateNormFactor(CUData& ctu, int qp);
15
void normFactor(const pixel* src, uint32_t blockSize, CUData& ctu, int qp, TextType ttype);
16
+
17
+ void collectPUStatistics(const CUData& ctu, const CUGeom& cuGeom);
18
+
19
/* check whether current mode is the new best */
20
inline void checkBestMode(Mode& mode, uint32_t depth)
21
{
22
23
else
24
md.bestMode = &mode;
25
}
26
+ int findSameContentRefCount(const CUData& parentCTU, const CUGeom& cuGeom);
27
};
28
29
struct ThreadLocalData
30
x265_2.4.tar.gz/source/encoder/api.cpp -> x265_2.5.tar.gz/source/encoder/api.cpp
Changed
#include "level.h"
3
#include "nal.h"
4
#include "bitcost.h"
5
+#include "x265-extras.h"
6
7
/* multilib namespace reflectors */
8
#if LINKED_8BIT
9
10
if (x265_check_params(param))
11
goto fail;
12
13
- if (x265_set_globals(param))
14
- goto fail;
15
-
16
encoder = new Encoder;
17
if (!param->rc.bEnableSlowFirstPass)
18
PARAM_NS::x265_param_apply_fastfirstpass(param);
19
20
}
21
22
encoder->create();
23
+ /* Try to open CSV file handle */
24
+ if (encoder->m_param->csvfn)
25
+ {
26
+ encoder->m_param->csvfpt = x265_csvlog_open(*encoder->m_param, encoder->m_param->csvfn, encoder->m_param->csvLogLevel);
27
+ if (!encoder->m_param->csvfpt)
28
+ {
29
+ x265_log(encoder->m_param, X265_LOG_ERROR, "Unable to open CSV log file <%s>, aborting\n", encoder->m_param->csvfn);
30
+ encoder->m_aborted = true;
31
+ }
32
+ }
33
+
34
encoder->m_latestParam = latestParam;
35
memcpy(latestParam, param, sizeof(x265_param));
36
if (encoder->m_aborted)
37
38
if (encoder->m_param->rc.bStatRead && encoder->m_param->bMultiPassOptRPS)
39
{
40
if (!encoder->computeSPSRPSIndex())
41
+ {
42
+ encoder->m_aborted = true;
43
return -1;
44
+ }
45
}
46
encoder->getStreamHeaders(encoder->m_nalList, sbacCoder, bs);
47
*pp_nal = &encoder->m_nalList.m_nal[0];
48
49
return encoder->m_nalList.m_occupancy;
50
}
51
52
+ if (enc)
53
+ {
54
+ Encoder *encoder = static_cast<Encoder*>(enc);
55
+ encoder->m_aborted = true;
56
+ }
57
return -1;
58
}
59
60
61
else if (pi_nal)
62
*pi_nal = 0;
63
64
+ if (numEncoded && encoder->m_param->csvLogLevel)
65
+ x265_csvlog_frame(encoder->m_param->csvfpt, *encoder->m_param, *pic_out, encoder->m_param->csvLogLevel);
66
+
67
+ if (numEncoded < 0)
68
+ encoder->m_aborted = true;
69
+
70
return numEncoded;
71
}
72
73
74
}
75
}
76
77
-void x265_encoder_log(x265_encoder* enc, int, char **)
78
+void x265_encoder_log(x265_encoder* enc, int argc, char **argv)
79
{
80
if (enc)
81
{
82
Encoder *encoder = static_cast<Encoder*>(enc);
83
- x265_log(encoder->m_param, X265_LOG_WARNING, "x265_encoder_log is now deprecated\n");
84
+ x265_stats stats;
85
+ int padx = encoder->m_sps.conformanceWindow.rightOffset;
86
+ int pady = encoder->m_sps.conformanceWindow.bottomOffset;
87
+ encoder->fetchStats(&stats, sizeof(stats));
88
+ const x265_api * api = x265_api_get(0);
89
+ x265_csvlog_encode(encoder->m_param->csvfpt, api->version_str, *encoder->m_param, padx, pady, stats, encoder->m_param->csvLogLevel, argc, argv);
90
}
91
}
92
93
94
encoder->printSummary();
95
encoder->destroy();
96
delete encoder;
97
- ATOMIC_DEC(&g_ctuSizeConfigured);
98
}
99
}
100
101
102
encoder->m_bQueuedIntraRefresh = 1;
103
return 0;
104
}
+int x265_encoder_ctu_info(x265_encoder *enc, int poc, x265_ctu_info_t** ctu)
+{
+    if (!ctu || !enc)
+        return -1;
+    Encoder* encoder = static_cast<Encoder*>(enc);
+    encoder->copyCtuInfo(ctu, poc);
+    return 0;
+}

114
void x265_cleanup(void)
115
{
116
- if (!g_ctuSizeConfigured)
117
- {
118
- BitCost::destroy();
119
- CUData::s_partSet[0] = NULL; /* allow CUData to adjust to new CTU size */
120
- }
121
+ BitCost::destroy();
122
}
123
124
x265_picture *x265_picture_alloc()
125
126
pic->userSEI.payloads = NULL;
127
pic->userSEI.numPayloads = 0;
128
129
- if (param->analysisMode)
130
+ if (param->analysisReuseMode)
131
{
132
- uint32_t widthInCU = (param->sourceWidth + g_maxCUSize - 1) >> g_maxLog2CUSize;
133
- uint32_t heightInCU = (param->sourceHeight + g_maxCUSize - 1) >> g_maxLog2CUSize;
134
+ uint32_t widthInCU = (param->sourceWidth + param->maxCUSize - 1) >> param->maxLog2CUSize;
135
+ uint32_t heightInCU = (param->sourceHeight + param->maxCUSize - 1) >> param->maxLog2CUSize;
136
137
uint32_t numCUsInFrame = widthInCU * heightInCU;
138
pic->analysisData.numCUsInFrame = numCUsInFrame;
139
- pic->analysisData.numPartitions = NUM_4x4_PARTITIONS;
140
+ pic->analysisData.numPartitions = param->num4x4Partitions;
141
}
142
}
143
144
145
146
sizeof(x265_frame_stats),
147
&x265_encoder_intra_refresh,
148
+ &x265_encoder_ctu_info,
149
};
150
151
typedef const x265_api* (*api_get_func)(int bitDepth);
152
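Several hunks in this diff (the api.cpp hunk that fills pic->analysisData, dpb.cpp, Encoder::copyCtuInfo) repeat the same CTU-grid computation, now driven by per-instance parameters instead of the removed globals. A minimal standalone sketch of that arithmetic follows; `CtuGrid` and `ctuGrid` are hypothetical names, and the 16/32/64 size check is an assumption added for the sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type; the encoder keeps these as separate locals.
struct CtuGrid
{
    uint32_t widthInCU;
    uint32_t heightInCU;
    uint32_t numCUsInFrame;
};

// Rounds each dimension up to a whole number of CTUs, as in
// (sourceWidth + maxCUSize - 1) >> maxLog2CUSize.
inline CtuGrid ctuGrid(uint32_t sourceWidth, uint32_t sourceHeight, uint32_t maxLog2CUSize)
{
    // Assumption for this sketch: CTU sizes of 16, 32 or 64 only.
    assert(maxLog2CUSize >= 4 && maxLog2CUSize <= 6);
    uint32_t maxCUSize = 1u << maxLog2CUSize;

    CtuGrid g;
    g.widthInCU = (sourceWidth + maxCUSize - 1) >> maxLog2CUSize;
    g.heightInCU = (sourceHeight + maxCUSize - 1) >> maxLog2CUSize;
    g.numCUsInFrame = g.widthInCU * g.heightInCU;
    return g;
}
```

For a 1920x1080 source with 64x64 CTUs this gives 30 columns and 17 rows (the last row is partial), so 510 CTUs per frame.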
x265_2.4.tar.gz/source/encoder/dpb.cpp -> x265_2.5.tar.gz/source/encoder/dpb.cpp
Changed
}
3
}
4
5
+ if (curFrame->m_ctuInfo != NULL)
6
+ {
7
+ uint32_t widthInCU = (curFrame->m_param->sourceWidth + curFrame->m_param->maxCUSize - 1) >> curFrame->m_param->maxLog2CUSize;
8
+ uint32_t heightInCU = (curFrame->m_param->sourceHeight + curFrame->m_param->maxCUSize - 1) >> curFrame->m_param->maxLog2CUSize;
9
+ uint32_t numCUsInFrame = widthInCU * heightInCU;
10
+ for (uint32_t i = 0; i < numCUsInFrame; i++)
11
+ {
12
+ X265_FREE((*curFrame->m_ctuInfo + i)->ctuInfo);
13
+ (*curFrame->m_ctuInfo + i)->ctuInfo = NULL;
14
+ }
15
+ X265_FREE(*curFrame->m_ctuInfo);
16
+ *(curFrame->m_ctuInfo) = NULL;
17
+ X265_FREE(curFrame->m_ctuInfo);
18
+ curFrame->m_ctuInfo = NULL;
19
+ X265_FREE(curFrame->m_prevCtuInfoChange);
20
+ curFrame->m_prevCtuInfoChange = NULL;
21
+ }
22
curFrame->m_encData = NULL;
23
curFrame->m_reconPic = NULL;
24
}
25
26
}
27
28
// Disable Loopfilter in bound area, because we will do slice-parallelism in future
29
- slice->m_sLFaseFlag = (g_maxSlices > 1) ? false : ((SLFASE_CONSTANT & (1 << (pocCurr % 31))) > 0);
30
+ slice->m_sLFaseFlag = (newFrame->m_param->maxSlices > 1) ? false : ((SLFASE_CONSTANT & (1 << (pocCurr % 31))) > 0);
31
32
/* Increment reference count of all motion-referenced frames to prevent them
33
* from being recycled. These counts are decremented at the end of
34
x265_2.4.tar.gz/source/encoder/encoder.cpp -> x265_2.5.tar.gz/source/encoder/encoder.cpp
Changed
m_frameEncoder[i] = NULL;
3
MotionEstimate::initScales();
4
5
-#if ENABLE_DYNAMIC_HDR10
6
+#if ENABLE_HDR10_PLUS
7
m_hdr10plus_api = hdr10plus_api_get();
8
+ numCimInfo = 0;
9
+ cim = NULL;
10
#endif
11
12
m_prevTonemapPayload.payload = NULL;
13
14
if (!p->bEnableWavefront && !p->bDistributeModeAnalysis && !p->bDistributeMotionEstimation && !p->lookaheadSlices)
15
allowPools = false;
16
17
- if (!p->frameNumThreads)
18
- {
19
- // auto-detect frame threads
20
- int cpuCount = ThreadPool::getCpuCount();
21
- if (!p->bEnableWavefront)
22
- p->frameNumThreads = X265_MIN3(cpuCount, (rows + 1) / 2, X265_MAX_FRAME_THREADS);
23
- else if (cpuCount >= 32)
24
- p->frameNumThreads = (p->sourceHeight > 2000) ? 8 : 6; // dual-socket 10-core IvyBridge or higher
25
- else if (cpuCount >= 16)
26
- p->frameNumThreads = 5; // 8 HT cores, or dual socket
27
- else if (cpuCount >= 8)
28
- p->frameNumThreads = 3; // 4 HT cores
29
- else if (cpuCount >= 4)
30
- p->frameNumThreads = 2; // Dual or Quad core
31
- else
32
- p->frameNumThreads = 1;
33
- }
34
m_numPools = 0;
35
if (allowPools)
36
m_threadPool = ThreadPool::allocThreadPools(p, m_numPools, 0);
37
+ else
38
+ {
39
+ if (!p->frameNumThreads)
40
+ {
41
+ // auto-detect frame threads
42
+ int cpuCount = ThreadPool::getCpuCount();
43
+ ThreadPool::getFrameThreadsCount(p, cpuCount);
44
+ }
45
+ }
46
+
47
if (!m_numPools)
48
{
49
// issue warnings if any of these features were requested
50
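For reference, the frame-thread heuristic removed from encoder.cpp in the hunk above can be written as a pure function. This is a sketch of the deleted 2.4 logic only: the body of ThreadPool::getFrameThreadsCount is not shown in this diff and the changelog says the mapping was adjusted, so the replacement may behave differently. `autoFrameThreads` and `kMaxFrameThreads` are hypothetical names, and 16 is an assumed value for X265_MAX_FRAME_THREADS.

```cpp
#include <algorithm>
#include <cassert>

// Assumed value of X265_MAX_FRAME_THREADS; not confirmed by this diff.
const int kMaxFrameThreads = 16;

// Reproduces the removed auto-detection branches for frameNumThreads == 0.
inline int autoFrameThreads(int cpuCount, int rows, bool wavefront, int sourceHeight)
{
    assert(cpuCount > 0 && rows > 0);
    if (!wavefront)
        return std::min({cpuCount, (rows + 1) / 2, kMaxFrameThreads});
    if (cpuCount >= 32)
        return sourceHeight > 2000 ? 8 : 6;
    if (cpuCount >= 16)
        return 5;
    if (cpuCount >= 8)
        return 3;
    if (cpuCount >= 4)
        return 2;
    return 1;
}
```

Under these assumptions an 8-thread machine with wavefront enabled gets 3 frame threads, and a 32-thread machine gets 8 only when the source is taller than 2000 rows.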
51
else
52
m_scalingList.setupQuantMatrices(m_sps.chromaFormatIdc);
53
54
- int numRows = (m_param->sourceHeight + g_maxCUSize - 1) / g_maxCUSize;
55
- int numCols = (m_param->sourceWidth + g_maxCUSize - 1) / g_maxCUSize;
56
+ int numRows = (m_param->sourceHeight + m_param->maxCUSize - 1) / m_param->maxCUSize;
57
+ int numCols = (m_param->sourceWidth + m_param->maxCUSize - 1) / m_param->maxCUSize;
58
for (int i = 0; i < m_param->frameNumThreads; i++)
59
{
60
if (!m_frameEncoder[i]->init(this, numRows, numCols))
61
62
63
initRefIdx();
64
65
- if (m_param->analysisMode)
66
+ if (m_param->analysisReuseMode)
67
{
68
- const char* name = m_param->analysisFileName;
69
+ const char* name = m_param->analysisReuseFileName;
70
if (!name)
71
name = defaultAnalysisFileName;
72
- const char* mode = m_param->analysisMode == X265_ANALYSIS_LOAD ? "rb" : "wb";
73
+ const char* mode = m_param->analysisReuseMode == X265_ANALYSIS_LOAD ? "rb" : "wb";
74
m_analysisFile = x265_fopen(name, mode);
75
if (!m_analysisFile)
76
{
77
78
79
if (m_param->analysisMultiPassRefine || m_param->analysisMultiPassDistortion)
80
{
81
- const char* name = m_param->analysisFileName;
82
+ const char* name = m_param->analysisReuseFileName;
83
if (!name)
84
name = defaultAnalysisFileName;
85
if (m_param->rc.bStatWrite)
86
87
88
void Encoder::destroy()
89
{
90
+#if ENABLE_HDR10_PLUS
91
+ m_hdr10plus_api->hdr10plus_clear_movie(cim, numCimInfo);
92
+#endif
93
+
94
if (m_exportedPic)
95
{
96
ATOMIC_DEC(&m_exportedPic->m_countRefEncoders);
97
98
{
99
int bError = 1;
100
fclose(m_analysisFileOut);
101
- const char* name = m_param->analysisFileName;
102
+ const char* name = m_param->analysisReuseFileName;
103
if (!name)
104
name = defaultAnalysisFileName;
105
char* temp = strcatFilename(name, ".temp");
106
107
}
108
if (m_param)
109
{
110
+ if (m_param->csvfpt)
111
+ fclose(m_param->csvfpt);
112
/* release string arguments that were strdup'd */
113
free((char*)m_param->rc.lambdaFileName);
114
free((char*)m_param->rc.statFileName);
115
- free((char*)m_param->analysisFileName);
116
+ free((char*)m_param->analysisReuseFileName);
117
free((char*)m_param->scalingLists);
118
+ free((char*)m_param->csvfn);
119
free((char*)m_param->numaPools);
120
free((char*)m_param->masteringDisplayColorVolume);
121
free((char*)m_param->toneMapFile);
122
123
FrameEncoder *encoder = m_frameEncoder[i];
124
if (encoder->m_rce.isActive && encoder->m_rce.poc != rc->m_curSlice->m_poc)
125
{
126
- int64_t bits = (int64_t) X265_MAX(encoder->m_rce.frameSizeEstimated, encoder->m_rce.frameSizePlanned);
127
+ int64_t bits = m_param->rc.bEnableConstVbv ? (int64_t)encoder->m_rce.frameSizePlanned : (int64_t)X265_MAX(encoder->m_rce.frameSizeEstimated, encoder->m_rce.frameSizePlanned);
128
rc->m_bufferFill -= bits;
129
rc->m_bufferFill = X265_MAX(rc->m_bufferFill, 0);
130
rc->m_bufferFill += encoder->m_rce.bufferRate;
131
132
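The changed line above affects only how many bits are charged to the VBV buffer for frames still in flight. A minimal sketch of the visible steps, assuming double-valued buffer quantities (the member types are not shown in this hunk) and using the hypothetical name `updateBufferFill`:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Applies the steps visible in the hunk: pick the bit count, drain the
// buffer, clamp at zero, then refill by the per-frame buffer rate.
inline double updateBufferFill(double bufferFill, double frameSizeEstimated,
                               double frameSizePlanned, double bufferRate, bool constVbv)
{
    assert(bufferRate >= 0.0);
    // With const-vbv only the planned size is charged; otherwise the larger
    // of the estimated and planned sizes is charged.
    int64_t bits = constVbv ? (int64_t)frameSizePlanned
                            : (int64_t)std::max(frameSizeEstimated, frameSizePlanned);
    bufferFill -= (double)bits;
    bufferFill = std::max(bufferFill, 0.0);
    bufferFill += bufferRate;
    return bufferFill;
}
```

Any later clamping against the buffer size falls outside the lines shown here and is deliberately left out of the sketch.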
133
if (m_exportedPic)
134
{
135
+ if (!m_param->bUseAnalysisFile && m_param->analysisReuseMode == X265_ANALYSIS_SAVE)
136
+ freeAnalysis(&m_exportedPic->m_analysisData);
137
ATOMIC_DEC(&m_exportedPic->m_countRefEncoders);
138
m_exportedPic = NULL;
139
m_dpb->recycleUnreferenced();
140
141
{
142
x265_sei_payload toneMap;
143
toneMap.payload = NULL;
144
-#if ENABLE_DYNAMIC_HDR10
145
+#if ENABLE_HDR10_PLUS
146
if (m_bToneMap)
147
{
148
- uint8_t *cim = NULL;
149
- if (m_hdr10plus_api->hdr10plus_json_to_frame_cim(m_param->toneMapFile, pic_in->poc, cim))
150
- {
151
- toneMap.payload = (uint8_t*)x265_malloc(sizeof(uint8_t) * cim[0]);
152
- toneMap.payloadSize = cim[0];
153
+ if (pic_in->poc == 0)
154
+ numCimInfo = m_hdr10plus_api->hdr10plus_json_to_movie_cim(m_param->toneMapFile, cim);
155
+ if (pic_in->poc < numCimInfo)
156
+ {
157
+ int32_t i = 0;
158
+ toneMap.payloadSize = 0;
159
+ while (cim[pic_in->poc][i] == 0xFF)
160
+ toneMap.payloadSize += cim[pic_in->poc][i++];
161
+ toneMap.payloadSize += cim[pic_in->poc][i++];
162
+
163
+ toneMap.payload = (uint8_t*)x265_malloc(sizeof(uint8_t) * toneMap.payloadSize);
164
toneMap.payloadType = USER_DATA_REGISTERED_ITU_T_T35;
165
- memcpy(toneMap.payload, cim, toneMap.payloadSize);
166
+ memcpy(toneMap.payload, cim[pic_in->poc] + i, toneMap.payloadSize);
167
}
168
}
169
#endif
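The new tone-map loop reads the payload length from the front of each cim entry: every leading 0xFF byte adds 255 and the first byte that is not 0xFF adds its own value, after which the payload itself begins. A standalone sketch of that decoding follows; `PayloadHeader` and `readPayloadSize` are hypothetical names, and the bounds check is an addition of the sketch (the loop in the diff has no length to check against).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical result type for the sketch.
struct PayloadHeader
{
    size_t payloadSize; // decoded payload length in bytes
    size_t headerBytes; // number of bytes consumed by the length field
    bool ok;            // false if the buffer ended inside the length field
};

// Decodes a byte-extended length: 0xFF bytes accumulate, the next byte ends it.
inline PayloadHeader readPayloadSize(const uint8_t* buf, size_t len)
{
    PayloadHeader h = { 0, 0, false };
    size_t size = 0;
    size_t i = 0;
    while (i < len && buf[i] == 0xFF)
        size += buf[i++];
    if (i == len)
        return h; // ran out of data before the terminating byte
    size += buf[i++];

    h.payloadSize = size;
    h.headerBytes = i;
    h.ok = true;
    return h;
}
```

In the diff, the payload is then copied from `cim[pic_in->poc] + i`, which corresponds to `buf + headerBytes` here.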
170
171
for (int i = 0; i < numPayloads; i++)
172
{
173
x265_sei_payload input;
174
- if (i == (numPayloads - 1))
175
+ if ((i == (numPayloads - 1)) && toneMapEnable)
176
input = toneMap;
177
else
178
input = pic_in->userSEI.payloads[i];
179
180
181
/* In analysisSave mode, x265_analysis_data is allocated in pic_in and inFrame points to this */
182
/* Load analysis data before lookahead->addPicture, since sliceType has been decided */
183
- if (m_param->analysisMode == X265_ANALYSIS_LOAD)
184
+ if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD)
185
{
186
- x265_picture* inputPic = const_cast<x265_picture*>(pic_in);
187
/* readAnalysisFile reads analysis data for the frame and allocates memory based on slicetype */
188
- readAnalysisFile(&inputPic->analysisData, inFrame->m_poc);
189
- inFrame->m_analysisData.poc = inFrame->m_poc;
190
- inFrame->m_analysisData.sliceType = inputPic->analysisData.sliceType;
191
- inFrame->m_analysisData.bScenecut = inputPic->analysisData.bScenecut;
192
- inFrame->m_analysisData.satdCost = inputPic->analysisData.satdCost;
193
- inFrame->m_analysisData.numCUsInFrame = inputPic->analysisData.numCUsInFrame;
194
- inFrame->m_analysisData.numPartitions = inputPic->analysisData.numPartitions;
195
- inFrame->m_analysisData.wt = inputPic->analysisData.wt;
196
- inFrame->m_analysisData.interData = inputPic->analysisData.interData;
197
- inFrame->m_analysisData.intraData = inputPic->analysisData.intraData;
198
- sliceType = inputPic->analysisData.sliceType;
199
+ readAnalysisFile(&inFrame->m_analysisData, inFrame->m_poc, pic_in);
200
+ sliceType = inFrame->m_analysisData.sliceType;
201
inFrame->m_lowres.bScenecut = !!inFrame->m_analysisData.bScenecut;
202
inFrame->m_lowres.satdCost = inFrame->m_analysisData.satdCost;
203
}
204
+ if (m_param->bUseRcStats && pic_in->rcData)
205
+ {
206
+ RcStats* rc = (RcStats*)pic_in->rcData;
207
+ m_rateControl->m_accumPQp = rc->cumulativePQp;
208
+ m_rateControl->m_accumPNorm = rc->cumulativePNorm;
209
+ m_rateControl->m_isNextGop = true;
210
+ for (int j = 0; j < 3; j++)
211
+ m_rateControl->m_lastQScaleFor[j] = rc->lastQScaleFor[j];
212
+ m_rateControl->m_wantedBitsWindow = rc->wantedBitsWindow;
213
+ m_rateControl->m_cplxrSum = rc->cplxrSum;
214
+ m_rateControl->m_totalBits = rc->totalBits;
215
+ m_rateControl->m_encodedBits = rc->encodedBits;
216
+ m_rateControl->m_shortTermCplxSum = rc->shortTermCplxSum;
217
+ m_rateControl->m_shortTermCplxCount = rc->shortTermCplxCount;
218
+ if (m_rateControl->m_isVbv)
219
+ {
220
+ m_rateControl->m_bufferFillFinal = rc->bufferFillFinal;
221
+ for (int i = 0; i < 4; i++)
222
+ {
223
+ m_rateControl->m_pred[i].coeff = rc->coeff[i];
224
+ m_rateControl->m_pred[i].count = rc->count[i];
225
+ m_rateControl->m_pred[i].offset = rc->offset[i];
226
+ }
227
+ }
228
+ m_param->bUseRcStats = 0;
229
+ }
230
if (m_reconfigureRc)
231
inFrame->m_reconfigureRc = true;
232
233
234
x265_frame_stats* frameData = NULL;
235
236
/* Free up pic_in->analysisData since it has already been used */
237
- if (m_param->analysisMode == X265_ANALYSIS_LOAD)
238
+ if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD)
239
freeAnalysis(&outFrame->m_analysisData);
240
241
if (pic_out)
242
243
244
pic_out->pts = outFrame->m_pts;
245
pic_out->dts = outFrame->m_dts;
246
-
247
- switch (slice->m_sliceType)
248
- {
249
- case I_SLICE:
250
- pic_out->sliceType = outFrame->m_lowres.bKeyframe ? X265_TYPE_IDR : X265_TYPE_I;
251
- break;
252
- case P_SLICE:
253
- pic_out->sliceType = X265_TYPE_P;
254
- break;
255
- case B_SLICE:
256
- pic_out->sliceType = X265_TYPE_B;
257
- break;
258
- }
259
-
260
+ pic_out->sliceType = outFrame->m_lowres.sliceType;
261
pic_out->planes[0] = recpic->m_picOrg[0];
262
pic_out->stride[0] = (int)(recpic->m_stride * sizeof(pixel));
263
if (m_param->internalCsp != X265_CSP_I400)
264
265
}
266
267
/* Dump analysis data from pic_out to file in save mode and free */
268
- if (m_param->analysisMode == X265_ANALYSIS_SAVE)
269
+ if (m_param->analysisReuseMode == X265_ANALYSIS_SAVE)
270
{
271
pic_out->analysisData.poc = pic_out->poc;
272
pic_out->analysisData.sliceType = pic_out->sliceType;
273
274
pic_out->analysisData.interData = outFrame->m_analysisData.interData;
275
pic_out->analysisData.intraData = outFrame->m_analysisData.intraData;
276
writeAnalysisFile(&pic_out->analysisData, *outFrame->m_encData);
277
- freeAnalysis(&pic_out->analysisData);
278
+ if (m_param->bUseAnalysisFile)
279
+ freeAnalysis(&pic_out->analysisData);
280
}
281
}
282
if (m_param->rc.bStatWrite && (m_param->analysisMultiPassRefine || m_param->analysisMultiPassDistortion))
283
284
Slice* slice = frameEnc->m_encData->m_slice;
285
slice->m_sps = &m_sps;
286
slice->m_pps = &m_pps;
287
+ slice->m_param = m_param;
288
slice->m_maxNumMergeCand = m_param->maxNumMergeCand;
289
- slice->m_endCUAddr = slice->realEndAddress(m_sps.numCUsInFrame * NUM_4x4_PARTITIONS);
290
+ slice->m_endCUAddr = slice->realEndAddress(m_sps.numCUsInFrame * m_param->num4x4Partitions);
291
}
292
293
if (m_param->searchMethod == X265_SEA && frameEnc->m_lowres.sliceType != X265_TYPE_B)
294
{
295
- int padX = g_maxCUSize + 32;
296
- int padY = g_maxCUSize + 16;
297
- uint32_t numCuInHeight = (frameEnc->m_encData->m_reconPic->m_picHeight + g_maxCUSize - 1) / g_maxCUSize;
298
- int maxHeight = numCuInHeight * g_maxCUSize;
299
+ int padX = m_param->maxCUSize + 32;
300
+ int padY = m_param->maxCUSize + 16;
301
+ uint32_t numCuInHeight = (frameEnc->m_encData->m_reconPic->m_picHeight + m_param->maxCUSize - 1) / m_param->maxCUSize;
302
+ int maxHeight = numCuInHeight * m_param->maxCUSize;
303
for (int i = 0; i < INTEGRAL_PLANE_NUM; i++)
304
{
305
frameEnc->m_encData->m_meBuffer[i] = X265_MALLOC(uint32_t, frameEnc->m_reconPic->m_stride * (maxHeight + (2 * padY)));
306
307
frameEnc->m_dts = frameEnc->m_reorderedPts;
308
309
/* Allocate analysis data before encode in save mode. This is allocated in frameEnc */
310
- if (m_param->analysisMode == X265_ANALYSIS_SAVE)
311
+ if (m_param->analysisReuseMode == X265_ANALYSIS_SAVE)
312
{
313
x265_analysis_data* analysis = &frameEnc->m_analysisData;
314
analysis->poc = frameEnc->m_poc;
315
analysis->sliceType = frameEnc->m_lowres.sliceType;
316
- uint32_t widthInCU = (m_param->sourceWidth + g_maxCUSize - 1) >> g_maxLog2CUSize;
317
- uint32_t heightInCU = (m_param->sourceHeight + g_maxCUSize - 1) >> g_maxLog2CUSize;
318
+ uint32_t widthInCU = (m_param->sourceWidth + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
319
+ uint32_t heightInCU = (m_param->sourceHeight + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
320
321
uint32_t numCUsInFrame = widthInCU * heightInCU;
322
analysis->numCUsInFrame = numCUsInFrame;
323
- analysis->numPartitions = NUM_4x4_PARTITIONS;
324
+ analysis->numPartitions = m_param->num4x4Partitions;
325
allocAnalysis(analysis);
326
}
327
/* determine references, setup RPS, etc */
328
329
return x265_check_params(encParam);
330
}
331
+void Encoder::copyCtuInfo(x265_ctu_info_t** frameCtuInfo, int poc)
+{
+    uint32_t widthInCU = (m_param->sourceWidth + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+    uint32_t heightInCU = (m_param->sourceHeight + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+    Frame* curFrame;
+    Frame* prevFrame = NULL;
+    int32_t* frameCTU;
+    uint32_t numCUsInFrame = widthInCU * heightInCU;
+    uint32_t maxNum8x8Partitions = 64;
+    bool copied = false;
+    do
+    {
+        curFrame = m_lookahead->m_inputQueue.getPOC(poc);
+        if (!curFrame)
+            curFrame = m_lookahead->m_outputQueue.getPOC(poc);
+
+        if (poc > 0)
+        {
+            prevFrame = m_lookahead->m_inputQueue.getPOC(poc - 1);
+            if (!prevFrame)
+                prevFrame = m_lookahead->m_outputQueue.getPOC(poc - 1);
+            if (!prevFrame)
+            {
+                FrameEncoder* prevEncoder;
+                for (int i = 0; i < m_param->frameNumThreads; i++)
+                {
+                    prevEncoder = m_frameEncoder[i];
+                    prevFrame = prevEncoder->m_frame;
+                    if (prevFrame && (prevEncoder->m_frame->m_poc == poc - 1))
+                    {
+                        prevFrame = prevEncoder->m_frame;
+                        break;
+                    }
+                }
+            }
+        }
+        x265_ctu_info_t* ctuTemp, *prevCtuTemp;
+        if (curFrame)
+        {
+            if (!curFrame->m_ctuInfo)
+                CHECKED_MALLOC(curFrame->m_ctuInfo, x265_ctu_info_t*, 1);
+            CHECKED_MALLOC(*curFrame->m_ctuInfo, x265_ctu_info_t, numCUsInFrame);
+            CHECKED_MALLOC_ZERO(curFrame->m_prevCtuInfoChange, int, numCUsInFrame * maxNum8x8Partitions);
+            for (uint32_t i = 0; i < numCUsInFrame; i++)
+            {
+                ctuTemp = *curFrame->m_ctuInfo + i;
+                CHECKED_MALLOC(frameCTU, int32_t, maxNum8x8Partitions);
+                ctuTemp->ctuInfo = (int32_t*)frameCTU;
+                ctuTemp->ctuAddress = frameCtuInfo[i]->ctuAddress;
+                memcpy(ctuTemp->ctuPartitions, frameCtuInfo[i]->ctuPartitions, sizeof(int32_t) * maxNum8x8Partitions);
+                memcpy(ctuTemp->ctuInfo, frameCtuInfo[i]->ctuInfo, sizeof(int32_t) * maxNum8x8Partitions);
+                if (prevFrame && curFrame->m_poc > 1)
+                {
+                    prevCtuTemp = *prevFrame->m_ctuInfo + i;
+                    for (uint32_t j = 0; j < maxNum8x8Partitions; j++)
+                        curFrame->m_prevCtuInfoChange[i * maxNum8x8Partitions + j] = (*((int32_t *)prevCtuTemp->ctuInfo + j) == 2) ? (poc - 1) : prevFrame->m_prevCtuInfoChange[i * maxNum8x8Partitions + j];
+                }
+            }
+            copied = true;
+            curFrame->m_copied.trigger();
+        }
+        else
+        {
+            FrameEncoder* curEncoder;
+            for (int i = 0; i < m_param->frameNumThreads; i++)
+            {
+                curEncoder = m_frameEncoder[i];
+                curFrame = curEncoder->m_frame;
+                if (curFrame)
+                {
+                    if (poc == curFrame->m_poc)
+                    {
+                        if (!curFrame->m_ctuInfo)
+                            CHECKED_MALLOC(curFrame->m_ctuInfo, x265_ctu_info_t*, 1);
+                        CHECKED_MALLOC(*curFrame->m_ctuInfo, x265_ctu_info_t, numCUsInFrame);
+                        CHECKED_MALLOC_ZERO(curFrame->m_prevCtuInfoChange, int, numCUsInFrame * maxNum8x8Partitions);
+                        for (uint32_t l = 0; l < numCUsInFrame; l++)
+                        {
+                            ctuTemp = *curFrame->m_ctuInfo + l;
+                            CHECKED_MALLOC(frameCTU, int32_t, maxNum8x8Partitions);
+                            ctuTemp->ctuInfo = (int32_t*)frameCTU;
+                            ctuTemp->ctuAddress = frameCtuInfo[l]->ctuAddress;
+                            memcpy(ctuTemp->ctuPartitions, frameCtuInfo[l]->ctuPartitions, sizeof(int32_t) * maxNum8x8Partitions);
+                            memcpy(ctuTemp->ctuInfo, frameCtuInfo[l]->ctuInfo, sizeof(int32_t) * maxNum8x8Partitions);
+                            if (prevFrame && curFrame->m_poc > 1)
+                            {
+                                prevCtuTemp = *prevFrame->m_ctuInfo + l;
+                                for (uint32_t j = 0; j < maxNum8x8Partitions; j++)
+                                    curFrame->m_prevCtuInfoChange[l * maxNum8x8Partitions + j] = (*((int32_t *)prevCtuTemp->ctuInfo + j) == CTU_INFO_CHANGE) ? (poc - 1) : prevFrame->m_prevCtuInfoChange[l * maxNum8x8Partitions + j];
+                            }
+                        }
+                        copied = true;
+                        curFrame->m_copied.trigger();
+                        break;
+                    }
+                }
+            }
+        }
+    } while (!copied);
+    return;
+fail:
+    for (uint32_t i = 0; i < numCUsInFrame; i++)
+    {
+        X265_FREE((*curFrame->m_ctuInfo + i)->ctuInfo);
+        (*curFrame->m_ctuInfo + i)->ctuInfo = NULL;
+    }
+    X265_FREE(*curFrame->m_ctuInfo);
+    *(curFrame->m_ctuInfo) = NULL;
+    X265_FREE(curFrame->m_ctuInfo);
+    curFrame->m_ctuInfo = NULL;
+    X265_FREE(curFrame->m_prevCtuInfoChange);
+    curFrame->m_prevCtuInfoChange = NULL;
+}
+
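The innermost loop of Encoder::copyCtuInfo carries a "last change" POC forward per 8x8 partition: if the previous frame flagged the partition as changed, the entry becomes poc - 1, otherwise the previous frame's entry is inherited. A minimal sketch for one CTU follows, using std::vector instead of the encoder's raw arrays; `propagatePrevChange` and `kChangeFlag` are hypothetical names, and the flag value 2 is taken from the literal used in one of the two branches (the other branch spells it CTU_INFO_CHANGE).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed numeric value of the "changed" flag, per the literal in the diff.
const int32_t kChangeFlag = 2;

// Builds the current frame's per-partition "previous change" POCs from the
// previous frame's flags and the previous frame's own table.
inline std::vector<int> propagatePrevChange(const std::vector<int32_t>& prevCtuInfo,
                                            const std::vector<int>& prevChange, int poc)
{
    assert(prevCtuInfo.size() == prevChange.size());
    std::vector<int> cur(prevChange.size());
    for (size_t j = 0; j < cur.size(); j++)
        cur[j] = (prevCtuInfo[j] == kChangeFlag) ? (poc - 1) : prevChange[j];
    return cur;
}
```

In the encoder the same update runs over 64 partitions per CTU and only when a previous frame was found and the current POC is greater than 1.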
446
void EncStats::addPsnr(double psnrY, double psnrU, double psnrV)
447
{
448
m_psnrSumY += psnrY;
449
450
/* Summarize stats from all frame encoders */
451
CUStats cuStats;
452
for (int i = 0; i < m_param->frameNumThreads; i++)
453
- cuStats.accumulate(m_frameEncoder[i]->m_cuStats);
454
+ cuStats.accumulate(m_frameEncoder[i]->m_cuStats, *m_param);
455
456
if (!cuStats.totalCTUTime)
457
return;
458
459
460
int64_t interRDOTotalTime = 0, intraRDOTotalTime = 0;
461
uint64_t interRDOTotalCount = 0, intraRDOTotalCount = 0;
462
- for (uint32_t i = 0; i <= g_maxCUDepth; i++)
463
+ for (uint32_t i = 0; i <= m_param->maxCUDepth; i++)
464
{
465
interRDOTotalTime += cuStats.interRDOElapsedTime[i];
466
intraRDOTotalTime += cuStats.intraRDOElapsedTime[i];
467
468
}
469
470
x265_log(m_param, X265_LOG_INFO, "CU: " X265_LL " %dX%d CTUs compressed in %.3lf seconds, %.3lf CTUs per worker-second\n",
471
- cuStats.totalCTUs, g_maxCUSize, g_maxCUSize,
472
+ cuStats.totalCTUs, m_param->maxCUSize, m_param->maxCUSize,
473
ELAPSED_SEC(totalWorkerTime),
474
cuStats.totalCTUs / ELAPSED_SEC(totalWorkerTime));
475
476
477
frameStats->qp = curEncData.m_avgQpAq;
478
frameStats->bits = bits;
479
frameStats->bScenecut = curFrame->m_lowres.bScenecut;
480
+ if (m_param->csvLogLevel >= 2)
481
+ frameStats->ipCostRatio = curFrame->m_lowres.ipCostRatio;
482
frameStats->bufferFill = m_rateControl->m_bufferFillActual;
483
frameStats->frameLatency = inPoc - poc;
484
if (m_param->rc.rateControlMode == X265_RC_CRF)

#define ELAPSED_MSEC(start, end) (((double)(end) - (start)) / 1000)

- frameStats->decideWaitTime = ELAPSED_MSEC(0, curEncoder->m_slicetypeWaitTime);
- frameStats->row0WaitTime = ELAPSED_MSEC(curEncoder->m_startCompressTime, curEncoder->m_row0WaitTime);
- frameStats->wallTime = ELAPSED_MSEC(curEncoder->m_row0WaitTime, curEncoder->m_endCompressTime);
- frameStats->refWaitWallTime = ELAPSED_MSEC(curEncoder->m_row0WaitTime, curEncoder->m_allRowsAvailableTime);
- frameStats->totalCTUTime = ELAPSED_MSEC(0, curEncoder->m_totalWorkerElapsedTime);
- frameStats->stallTime = ELAPSED_MSEC(0, curEncoder->m_totalNoWorkerTime);
- frameStats->totalFrameTime = ELAPSED_MSEC(curFrame->m_encodeStartTime, x265_mdate());
- if (curEncoder->m_totalActiveWorkerCount)
- frameStats->avgWPP = (double)curEncoder->m_totalActiveWorkerCount / curEncoder->m_activeWorkerCountSamples;
- else
- frameStats->avgWPP = 1;
- frameStats->countRowBlocks = curEncoder->m_countRowBlocks;
+ frameStats->maxLumaLevel = curFrame->m_fencPic->m_maxLumaLevel;
+ frameStats->minLumaLevel = curFrame->m_fencPic->m_minLumaLevel;
+ frameStats->avgLumaLevel = curFrame->m_fencPic->m_avgLumaLevel;
+
+ if (m_param->csvLogLevel >= 2)
+ {
+ frameStats->decideWaitTime = ELAPSED_MSEC(0, curEncoder->m_slicetypeWaitTime);
+ frameStats->row0WaitTime = ELAPSED_MSEC(curEncoder->m_startCompressTime, curEncoder->m_row0WaitTime);
+ frameStats->wallTime = ELAPSED_MSEC(curEncoder->m_row0WaitTime, curEncoder->m_endCompressTime);
+ frameStats->refWaitWallTime = ELAPSED_MSEC(curEncoder->m_row0WaitTime, curEncoder->m_allRowsAvailableTime);
+ frameStats->totalCTUTime = ELAPSED_MSEC(0, curEncoder->m_totalWorkerElapsedTime);
+ frameStats->stallTime = ELAPSED_MSEC(0, curEncoder->m_totalNoWorkerTime);
+ frameStats->totalFrameTime = ELAPSED_MSEC(curFrame->m_encodeStartTime, x265_mdate());
+ if (curEncoder->m_totalActiveWorkerCount)
+ frameStats->avgWPP = (double)curEncoder->m_totalActiveWorkerCount / curEncoder->m_activeWorkerCountSamples;
+ else
+ frameStats->avgWPP = 1;
+ frameStats->countRowBlocks = curEncoder->m_countRowBlocks;
+
+ frameStats->avgChromaDistortion = curFrame->m_encData->m_frameStats.avgChromaDistortion;
+ frameStats->avgLumaDistortion = curFrame->m_encData->m_frameStats.avgLumaDistortion;
+ frameStats->avgPsyEnergy = curFrame->m_encData->m_frameStats.avgPsyEnergy;
+ frameStats->avgResEnergy = curFrame->m_encData->m_frameStats.avgResEnergy;
+
+ frameStats->maxChromaULevel = curFrame->m_fencPic->m_maxChromaULevel;
+ frameStats->minChromaULevel = curFrame->m_fencPic->m_minChromaULevel;
+ frameStats->avgChromaULevel = curFrame->m_fencPic->m_avgChromaULevel;
+
+ frameStats->maxChromaVLevel = curFrame->m_fencPic->m_maxChromaVLevel;
+ frameStats->minChromaVLevel = curFrame->m_fencPic->m_minChromaVLevel;
+ frameStats->avgChromaVLevel = curFrame->m_fencPic->m_avgChromaVLevel;
+
+ if (curFrame->m_encData->m_frameStats.totalPu[4] == 0)
+ frameStats->puStats.percentNxN = 0;
+ else
+ frameStats->puStats.percentNxN = (double)(curFrame->m_encData->m_frameStats.cnt4x4 / (double)curFrame->m_encData->m_frameStats.totalPu[4]) * 100;
+ for (uint32_t depth = 0; depth <= m_param->maxCUDepth; depth++)
+ {
+ if (curFrame->m_encData->m_frameStats.totalPu[depth] == 0)
+ {
+ frameStats->puStats.percentSkipPu[depth] = 0;
+ frameStats->puStats.percentIntraPu[depth] = 0;
+ frameStats->puStats.percentAmpPu[depth] = 0;
+ for (int i = 0; i < INTER_MODES - 1; i++)
+ {
+ frameStats->puStats.percentInterPu[depth][i] = 0;
+ frameStats->puStats.percentMergePu[depth][i] = 0;
+ }
+ }
+ else
+ {
+ frameStats->puStats.percentSkipPu[depth] = (double)(curFrame->m_encData->m_frameStats.cntSkipPu[depth] / (double)curFrame->m_encData->m_frameStats.totalPu[depth]) * 100;
+ frameStats->puStats.percentIntraPu[depth] = (double)(curFrame->m_encData->m_frameStats.cntIntraPu[depth] / (double)curFrame->m_encData->m_frameStats.totalPu[depth]) * 100;
+ frameStats->puStats.percentAmpPu[depth] = (double)(curFrame->m_encData->m_frameStats.cntAmp[depth] / (double)curFrame->m_encData->m_frameStats.totalPu[depth]) * 100;
+ for (int i = 0; i < INTER_MODES - 1; i++)
+ {
+ frameStats->puStats.percentInterPu[depth][i] = (double)(curFrame->m_encData->m_frameStats.cntInterPu[depth][i] / (double)curFrame->m_encData->m_frameStats.totalPu[depth]) * 100;
+ frameStats->puStats.percentMergePu[depth][i] = (double)(curFrame->m_encData->m_frameStats.cntMergePu[depth][i] / (double)curFrame->m_encData->m_frameStats.totalPu[depth]) * 100;
+ }
+ }
+ }
+ }
+
+ if (m_param->csvLogLevel >= 1)
+ {
+ frameStats->cuStats.percentIntraNxN = curFrame->m_encData->m_frameStats.percentIntraNxN;

- frameStats->cuStats.percentIntraNxN = curFrame->m_encData->m_frameStats.percentIntraNxN;
- frameStats->avgChromaDistortion = curFrame->m_encData->m_frameStats.avgChromaDistortion;
- frameStats->avgLumaDistortion = curFrame->m_encData->m_frameStats.avgLumaDistortion;
- frameStats->avgPsyEnergy = curFrame->m_encData->m_frameStats.avgPsyEnergy;
- frameStats->avgResEnergy = curFrame->m_encData->m_frameStats.avgResEnergy;
- frameStats->avgLumaLevel = curFrame->m_fencPic->m_avgLumaLevel;
- frameStats->maxLumaLevel = curFrame->m_fencPic->m_maxLumaLevel;
- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
- {
- frameStats->cuStats.percentSkipCu[depth] = curFrame->m_encData->m_frameStats.percentSkipCu[depth];
- frameStats->cuStats.percentMergeCu[depth] = curFrame->m_encData->m_frameStats.percentMergeCu[depth];
- frameStats->cuStats.percentInterDistribution[depth][0] = curFrame->m_encData->m_frameStats.percentInterDistribution[depth][0];
- frameStats->cuStats.percentInterDistribution[depth][1] = curFrame->m_encData->m_frameStats.percentInterDistribution[depth][1];
- frameStats->cuStats.percentInterDistribution[depth][2] = curFrame->m_encData->m_frameStats.percentInterDistribution[depth][2];
- for (int n = 0; n < INTRA_MODES; n++)
- frameStats->cuStats.percentIntraDistribution[depth][n] = curFrame->m_encData->m_frameStats.percentIntraDistribution[depth][n];
+ for (uint32_t depth = 0; depth <= m_param->maxCUDepth; depth++)
+ {
+ frameStats->cuStats.percentSkipCu[depth] = curFrame->m_encData->m_frameStats.percentSkipCu[depth];
+ frameStats->cuStats.percentMergeCu[depth] = curFrame->m_encData->m_frameStats.percentMergeCu[depth];
+ frameStats->cuStats.percentInterDistribution[depth][0] = curFrame->m_encData->m_frameStats.percentInterDistribution[depth][0];
+ frameStats->cuStats.percentInterDistribution[depth][1] = curFrame->m_encData->m_frameStats.percentInterDistribution[depth][1];
+ frameStats->cuStats.percentInterDistribution[depth][2] = curFrame->m_encData->m_frameStats.percentInterDistribution[depth][2];
+ for (int n = 0; n < INTRA_MODES; n++)
+ frameStats->cuStats.percentIntraDistribution[depth][n] = curFrame->m_encData->m_frameStats.percentIntraDistribution[depth][n];
+ }
}
}
}

sps->chromaFormatIdc = m_param->internalCsp;
sps->picWidthInLumaSamples = m_param->sourceWidth;
sps->picHeightInLumaSamples = m_param->sourceHeight;
- sps->numCuInWidth = (m_param->sourceWidth + g_maxCUSize - 1) / g_maxCUSize;
- sps->numCuInHeight = (m_param->sourceHeight + g_maxCUSize - 1) / g_maxCUSize;
+ sps->numCuInWidth = (m_param->sourceWidth + m_param->maxCUSize - 1) / m_param->maxCUSize;
+ sps->numCuInHeight = (m_param->sourceHeight + m_param->maxCUSize - 1) / m_param->maxCUSize;
sps->numCUsInFrame = sps->numCuInWidth * sps->numCuInHeight;
- sps->numPartitions = NUM_4x4_PARTITIONS;
- sps->numPartInCUSize = 1 << g_unitSizeDepth;
+ sps->numPartitions = m_param->num4x4Partitions;
+ sps->numPartInCUSize = 1 << m_param->unitSizeDepth;

- sps->log2MinCodingBlockSize = g_maxLog2CUSize - g_maxCUDepth;
- sps->log2DiffMaxMinCodingBlockSize = g_maxCUDepth;
+ sps->log2MinCodingBlockSize = m_param->maxLog2CUSize - m_param->maxCUDepth;
+ sps->log2DiffMaxMinCodingBlockSize = m_param->maxCUDepth;
uint32_t maxLog2TUSize = (uint32_t)g_log2Size[m_param->maxTUSize];
- sps->quadtreeTULog2MaxSize = X265_MIN(g_maxLog2CUSize, maxLog2TUSize);
+ sps->quadtreeTULog2MaxSize = X265_MIN((uint32_t)m_param->maxLog2CUSize, maxLog2TUSize);
sps->quadtreeTULog2MinSize = 2;
sps->quadtreeTUMaxDepthInter = m_param->tuQTMaxInterDepth;
sps->quadtreeTUMaxDepthIntra = m_param->tuQTMaxIntraDepth;

sps->bUseSAO = m_param->bEnableSAO;

sps->bUseAMP = m_param->bEnableAMP;
- sps->maxAMPDepth = m_param->bEnableAMP ? g_maxCUDepth : 0;
+ sps->maxAMPDepth = m_param->bEnableAMP ? m_param->maxCUDepth : 0;

sps->maxTempSubLayers = m_param->bEnableTemporalSubLayers ? 2 : 1;
sps->maxDecPicBuffering = m_vps.maxDecPicBuffering;

p->lookaheadDepth = p->totalFrames;
if (p->bIntraRefresh)
{
- int numCuInWidth = (m_param->sourceWidth + g_maxCUSize - 1) / g_maxCUSize;
+ int numCuInWidth = (m_param->sourceWidth + m_param->maxCUSize - 1) / m_param->maxCUSize;
if (p->maxNumReferences > 1)
{
x265_log(p, X265_LOG_WARNING, "Max References > 1 + intra-refresh is not supported , setting max num references = 1\n");

p->rc.rfConstantMin = 0;
}

- if (p->analysisMode && (p->bDistributeModeAnalysis || p->bDistributeMotionEstimation))
+ if (p->analysisReuseMode && (p->bDistributeModeAnalysis || p->bDistributeMotionEstimation))
{
x265_log(p, X265_LOG_WARNING, "Analysis load/save options incompatible with pmode/pme, Disabling pmode/pme\n");
p->bDistributeMotionEstimation = p->bDistributeModeAnalysis = 0;
}

- if (p->analysisMode && p->rc.cuTree)
+ if (p->analysisReuseMode && p->rc.cuTree)
{
x265_log(p, X265_LOG_WARNING, "Analysis load/save options works only with cu-tree off, Disabling cu-tree\n");
p->rc.cuTree = 0;
}

- if (p->analysisMode && (p->analysisMultiPassRefine || p->analysisMultiPassDistortion))
+ if (p->analysisReuseMode && (p->analysisMultiPassRefine || p->analysisMultiPassDistortion))
{
x265_log(p, X265_LOG_WARNING, "Cannot use Analysis load/save option and multi-pass-opt-analysis/multi-pass-opt-distortion together,"
"Disabling Analysis load/save and multi-pass-opt-analysis/multi-pass-opt-distortion\n");
- p->analysisMode = p->analysisMultiPassRefine = p->analysisMultiPassDistortion = 0;
+ p->analysisReuseMode = p->analysisMultiPassRefine = p->analysisMultiPassDistortion = 0;
+ }
+ if (p->scaleFactor)
+ {
+ if (p->scaleFactor == 1)
+ {
+ p->scaleFactor = 0;
+ }
+ else if (!p->analysisReuseMode || p->analysisReuseLevel < 10)
+ {
+ x265_log(p, X265_LOG_WARNING, "Input scaling works with analysis-reuse-mode, analysis-reuse-level 10. Disabling scale-factor.\n");
+ p->scaleFactor = 0;
+ }
+ }
+
+ if (p->intraRefine)
+ {
+ if (p->analysisReuseMode!= X265_ANALYSIS_LOAD || p->analysisReuseLevel < 10 || !p->scaleFactor)
+ {
+ x265_log(p, X265_LOG_WARNING, "Intra refinement requires analysis load, analysis-reuse-level 10, scale factor. Disabling intra refine.\n");
+ p->intraRefine = 0;
+ }
+ }
+
+ if (p->interRefine)
+ {
+ if (p->analysisReuseMode != X265_ANALYSIS_LOAD || p->analysisReuseLevel < 10 || !p->scaleFactor)
+ {
+ x265_log(p, X265_LOG_WARNING, "Inter refinement requires analysis load, analysis-reuse-level 10, scale factor. Disabling inter refine.\n");
+ p->interRefine = 0;
+ }
+ }
+
+ if (p->limitTU && p->interRefine)
+ {
+ x265_log(p, X265_LOG_WARNING, "Inter refinement does not support limitTU. Disabling limitTU.\n");
+ p->limitTU = 0;
+ }
+
+ if (p->mvRefine)
+ {
+ if (p->analysisReuseMode != X265_ANALYSIS_LOAD || p->analysisReuseLevel < 10 || !p->scaleFactor)
+ {
+ x265_log(p, X265_LOG_WARNING, "MV refinement requires analysis load, analysis-reuse-level 10, scale factor. Disabling MV refine.\n");
+ p->mvRefine = 0;
+ }
}

if ((p->analysisMultiPassRefine || p->analysisMultiPassDistortion) && (p->bDistributeModeAnalysis || p->bDistributeMotionEstimation))

m_conformanceWindow.topOffset = 0;
m_conformanceWindow.bottomOffset = 0;
m_conformanceWindow.leftOffset = 0;
-
/* set pad size if width is not multiple of the minimum CU size */
- if (p->sourceWidth & (p->minCUSize - 1))
+ if (p->scaleFactor == 2 && ((p->sourceWidth / 2) & (p->minCUSize - 1)) && p->analysisReuseMode == X265_ANALYSIS_LOAD)
+ {
+ uint32_t rem = (p->sourceWidth / 2) & (p->minCUSize - 1);
+ uint32_t padsize = p->minCUSize - rem;
+ p->sourceWidth += padsize * 2;
+
+ m_conformanceWindow.bEnabled = true;
+ m_conformanceWindow.rightOffset = padsize * 2;
+ }
+ else if(p->sourceWidth & (p->minCUSize - 1))
{
uint32_t rem = p->sourceWidth & (p->minCUSize - 1);
uint32_t padsize = p->minCUSize - rem;

p->dynamicRd = 0;
x265_log(p, X265_LOG_WARNING, "Dynamic-rd disabled, requires RD <= 4, VBV and aq-mode enabled\n");
}
-#ifdef ENABLE_DYNAMIC_HDR10
+#ifdef ENABLE_HDR10_PLUS
if (m_param->bDhdr10opt && m_param->toneMapFile == NULL)
{
x265_log(p, X265_LOG_WARNING, "Disabling dhdr10-opt. dhdr10-info must be enabled.\n");

#else
if (m_param->toneMapFile)
{
- x265_log(p, X265_LOG_WARNING, "--dhdr10-info disabled. Enable dynamic HDR in cmake.\n");
+ x265_log(p, X265_LOG_WARNING, "--dhdr10-info disabled. Enable HDR10_PLUS in cmake.\n");
m_bToneMap = 0;
m_param->toneMapFile = NULL;
}

x265_log(p, X265_LOG_ERROR, "uhd-bd: Disabled\n");
}
}
-
/* set pad size if height is not multiple of the minimum CU size */
- if (p->sourceHeight & (p->minCUSize - 1))
+ if (p->scaleFactor == 2 && ((p->sourceHeight / 2) & (p->minCUSize - 1)) && p->analysisReuseMode == X265_ANALYSIS_LOAD)
+ {
+ uint32_t rem = (p->sourceHeight / 2) & (p->minCUSize - 1);
+ uint32_t padsize = p->minCUSize - rem;
+ p->sourceHeight += padsize * 2;
+ m_conformanceWindow.bEnabled = true;
+ m_conformanceWindow.bottomOffset = padsize * 2;
+ }
+ else if(p->sourceHeight & (p->minCUSize - 1))
{
uint32_t rem = p->sourceHeight & (p->minCUSize - 1);
uint32_t padsize = p->minCUSize - rem;

if (p->bLogCuStats)
x265_log(p, X265_LOG_WARNING, "--cu-stats option is now deprecated\n");

- if (p->csvfn)
- x265_log(p, X265_LOG_WARNING, "libx265 no longer supports CSV file statistics\n");
-
if (p->log2MaxPocLsb < 4)
{
x265_log(p, X265_LOG_WARNING, "maximum of the picture order count can not be less than 4\n");

p->bHDROpt = 0;
}
}
+
+ if (m_param->toneMapFile || p->bHDROpt || p->bEmitHDRSEI)
+ {
+ if (!p->bRepeatHeaders)
+ {
+ p->bRepeatHeaders = 1;
+ x265_log(p, X265_LOG_WARNING, "Turning on repeat-headers for HDR compatibility\n");
+ }
+ }
+
+ p->maxLog2CUSize = g_log2Size[p->maxCUSize];
+ p->maxCUDepth = p->maxLog2CUSize - g_log2Size[p->minCUSize];
+ p->unitSizeDepth = p->maxLog2CUSize - LOG2_UNIT_SIZE;
+ p->num4x4Partitions = (1U << (p->unitSizeDepth << 1));
}

void Encoder::allocAnalysis(x265_analysis_data* analysis)

analysis->interData = analysis->intraData = NULL;
if (analysis->sliceType == X265_TYPE_IDR || analysis->sliceType == X265_TYPE_I)
{
- if (m_param->analysisRefineLevel < 2)
+ if (m_param->analysisReuseLevel < 2)
return;

analysis_intra_data *intraData = (analysis_intra_data*)analysis->intraData;

int numDir = analysis->sliceType == X265_TYPE_P ? 1 : 2;
uint32_t numPlanes = m_param->internalCsp == X265_CSP_I400 ? 1 : 3;
CHECKED_MALLOC_ZERO(analysis->wt, WeightParam, numPlanes * numDir);
- if (m_param->analysisRefineLevel < 2)
+ if (m_param->analysisReuseLevel < 2)
return;

analysis_inter_data *interData = (analysis_inter_data*)analysis->interData;
CHECKED_MALLOC_ZERO(interData, analysis_inter_data, 1);
CHECKED_MALLOC(interData->depth, uint8_t, analysis->numPartitions * analysis->numCUsInFrame);
CHECKED_MALLOC(interData->modes, uint8_t, analysis->numPartitions * analysis->numCUsInFrame);
- if (m_param->analysisRefineLevel > 4)
+ if (m_param->analysisReuseLevel > 4)
{
CHECKED_MALLOC(interData->partSize, uint8_t, analysis->numPartitions * analysis->numCUsInFrame);
CHECKED_MALLOC(interData->mergeFlag, uint8_t, analysis->numPartitions * analysis->numCUsInFrame);
}

- if (m_param->analysisRefineLevel == 10)
+ if (m_param->analysisReuseLevel == 10)
{
CHECKED_MALLOC(interData->interDir, uint8_t, analysis->numPartitions * analysis->numCUsInFrame);
for (int dir = 0; dir < numDir; dir++)
{
CHECKED_MALLOC(interData->mvpIdx[dir], uint8_t, analysis->numPartitions * analysis->numCUsInFrame);
CHECKED_MALLOC(interData->refIdx[dir], int8_t, analysis->numPartitions * analysis->numCUsInFrame);
- CHECKED_MALLOC(interData->mv[dir], MV, analysis->numPartitions * analysis->numCUsInFrame);
+ CHECKED_MALLOC(interData->mv[dir], MV, analysis->numPartitions * analysis->numCUsInFrame);
}

/* Allocate intra in inter */

/* Early exit freeing weights alone if level is 1 (when there is no analysis inter/intra) */
if (analysis->sliceType > X265_TYPE_I && analysis->wt)
X265_FREE(analysis->wt);
- if (m_param->analysisRefineLevel < 2)
+ if (m_param->analysisReuseLevel < 2)
return;

- if (analysis->intraData)
+ if (analysis->sliceType == X265_TYPE_IDR || analysis->sliceType == X265_TYPE_I)
{
- if (m_param->analysisRefineLevel < 2)
- return;
-
- X265_FREE(((analysis_intra_data*)analysis->intraData)->depth);
- X265_FREE(((analysis_intra_data*)analysis->intraData)->modes);
- X265_FREE(((analysis_intra_data*)analysis->intraData)->partSizes);
- X265_FREE(((analysis_intra_data*)analysis->intraData)->chromaModes);
- X265_FREE(analysis->intraData);
+ if (analysis->intraData)
+ {
+ X265_FREE(((analysis_intra_data*)analysis->intraData)->depth);
+ X265_FREE(((analysis_intra_data*)analysis->intraData)->modes);
+ X265_FREE(((analysis_intra_data*)analysis->intraData)->partSizes);
+ X265_FREE(((analysis_intra_data*)analysis->intraData)->chromaModes);
+ X265_FREE(analysis->intraData);
+ analysis->intraData = NULL;
+ }
}
- else if (analysis->interData)
+ else
{
- X265_FREE(((analysis_inter_data*)analysis->interData)->depth);
- X265_FREE(((analysis_inter_data*)analysis->interData)->modes);
- if (m_param->analysisRefineLevel > 4)
+ if (analysis->intraData)
{
- X265_FREE(((analysis_inter_data*)analysis->interData)->mergeFlag);
- X265_FREE(((analysis_inter_data*)analysis->interData)->partSize);
+ X265_FREE(((analysis_intra_data*)analysis->intraData)->modes);
+ X265_FREE(((analysis_intra_data*)analysis->intraData)->chromaModes);
+ X265_FREE(analysis->intraData);
+ analysis->intraData = NULL;
}
-
- if (m_param->analysisRefineLevel == 10)
+ if (analysis->interData)
{
- X265_FREE(((analysis_inter_data*)analysis->interData)->interDir);
- int numDir = analysis->sliceType == X265_TYPE_P ? 1 : 2;
- for (int dir = 0; dir < numDir; dir++)
+ X265_FREE(((analysis_inter_data*)analysis->interData)->depth);
+ X265_FREE(((analysis_inter_data*)analysis->interData)->modes);
+ if (m_param->analysisReuseLevel > 4)
{
- X265_FREE(((analysis_inter_data*)analysis->interData)->mvpIdx[dir]);
- X265_FREE(((analysis_inter_data*)analysis->interData)->refIdx[dir]);
- X265_FREE(((analysis_inter_data*)analysis->interData)->mv[dir]);
- }
- if (analysis->sliceType == P_SLICE || m_param->bIntraInBFrames)
- {
- X265_FREE(((analysis_intra_data*)analysis->intraData)->modes);
- X265_FREE(((analysis_intra_data*)analysis->intraData)->chromaModes);
- X265_FREE(analysis->intraData);
+ X265_FREE(((analysis_inter_data*)analysis->interData)->mergeFlag);
+ X265_FREE(((analysis_inter_data*)analysis->interData)->partSize);
}
- }
- else
- X265_FREE(((analysis_inter_data*)analysis->interData)->ref);
+ if (m_param->analysisReuseLevel == 10)
+ {
+ X265_FREE(((analysis_inter_data*)analysis->interData)->interDir);
+ int numDir = analysis->sliceType == X265_TYPE_P ? 1 : 2;
+ for (int dir = 0; dir < numDir; dir++)
+ {
+ X265_FREE(((analysis_inter_data*)analysis->interData)->mvpIdx[dir]);
+ X265_FREE(((analysis_inter_data*)analysis->interData)->refIdx[dir]);
+ X265_FREE(((analysis_inter_data*)analysis->interData)->mv[dir]);
+ }
+ }
+ else
+ X265_FREE(((analysis_inter_data*)analysis->interData)->ref);

- X265_FREE(analysis->interData);
+ X265_FREE(analysis->interData);
+ analysis->interData = NULL;
+ }
}
}

{
analysis->analysisFramedata = NULL;
analysis2PassFrameData *analysisFrameData = (analysis2PassFrameData*)analysis->analysisFramedata;
- uint32_t widthInCU = (m_param->sourceWidth + g_maxCUSize - 1) >> g_maxLog2CUSize;
- uint32_t heightInCU = (m_param->sourceHeight + g_maxCUSize - 1) >> g_maxLog2CUSize;
+ uint32_t widthInCU = (m_param->sourceWidth + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+ uint32_t heightInCU = (m_param->sourceHeight + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;

uint32_t numCUsInFrame = widthInCU * heightInCU;
CHECKED_MALLOC_ZERO(analysisFrameData, analysis2PassFrameData, 1);
- CHECKED_MALLOC_ZERO(analysisFrameData->depth, uint8_t, NUM_4x4_PARTITIONS * numCUsInFrame);
- CHECKED_MALLOC_ZERO(analysisFrameData->distortion, sse_t, NUM_4x4_PARTITIONS * numCUsInFrame);
+ CHECKED_MALLOC_ZERO(analysisFrameData->depth, uint8_t, m_param->num4x4Partitions * numCUsInFrame);
+ CHECKED_MALLOC_ZERO(analysisFrameData->distortion, sse_t, m_param->num4x4Partitions * numCUsInFrame);
if (m_param->rc.bStatRead)
{
CHECKED_MALLOC_ZERO(analysisFrameData->ctuDistortion, sse_t, numCUsInFrame);

}
if (!IS_X265_TYPE_I(sliceType))
{
- CHECKED_MALLOC_ZERO(analysisFrameData->m_mv[0], MV, NUM_4x4_PARTITIONS * numCUsInFrame);
- CHECKED_MALLOC_ZERO(analysisFrameData->m_mv[1], MV, NUM_4x4_PARTITIONS * numCUsInFrame);
- CHECKED_MALLOC_ZERO(analysisFrameData->mvpIdx[0], int, NUM_4x4_PARTITIONS * numCUsInFrame);
- CHECKED_MALLOC_ZERO(analysisFrameData->mvpIdx[1], int, NUM_4x4_PARTITIONS * numCUsInFrame);
- CHECKED_MALLOC_ZERO(analysisFrameData->ref[0], int32_t, NUM_4x4_PARTITIONS * numCUsInFrame);
- CHECKED_MALLOC_ZERO(analysisFrameData->ref[1], int32_t, NUM_4x4_PARTITIONS * numCUsInFrame);
- CHECKED_MALLOC(analysisFrameData->modes, uint8_t, NUM_4x4_PARTITIONS * numCUsInFrame);
+ CHECKED_MALLOC_ZERO(analysisFrameData->m_mv[0], MV, m_param->num4x4Partitions * numCUsInFrame);
+ CHECKED_MALLOC_ZERO(analysisFrameData->m_mv[1], MV, m_param->num4x4Partitions * numCUsInFrame);
+ CHECKED_MALLOC_ZERO(analysisFrameData->mvpIdx[0], int, m_param->num4x4Partitions * numCUsInFrame);
+ CHECKED_MALLOC_ZERO(analysisFrameData->mvpIdx[1], int, m_param->num4x4Partitions * numCUsInFrame);
+ CHECKED_MALLOC_ZERO(analysisFrameData->ref[0], int32_t, m_param->num4x4Partitions * numCUsInFrame);
+ CHECKED_MALLOC_ZERO(analysisFrameData->ref[1], int32_t, m_param->num4x4Partitions * numCUsInFrame);
+ CHECKED_MALLOC(analysisFrameData->modes, uint8_t, m_param->num4x4Partitions * numCUsInFrame);
}

analysis->analysisFramedata = analysisFrameData;

}
}

-void Encoder::readAnalysisFile(x265_analysis_data* analysis, int curPoc)
+void Encoder::readAnalysisFile(x265_analysis_data* analysis, int curPoc, const x265_picture* picIn)
{

-#define X265_FREAD(val, size, readSize, fileOffset)\
- if (fread(val, size, readSize, fileOffset) != readSize)\
+#define X265_FREAD(val, size, readSize, fileOffset, src)\
+ if (!m_param->bUseAnalysisFile)\
+ {\
+ memcpy(val, src, (size * readSize));\
+ }\
+ else if (fread(val, size, readSize, fileOffset) != readSize)\
{\
x265_log(NULL, X265_LOG_ERROR, "Error reading analysis data\n");\
freeAnalysis(analysis);\

uint32_t depthBytes = 0;
fseeko(m_analysisFile, totalConsumedBytes, SEEK_SET);

- int poc; uint32_t frameRecordSize;
- X265_FREAD(&frameRecordSize, sizeof(uint32_t), 1, m_analysisFile);
- X265_FREAD(&depthBytes, sizeof(uint32_t), 1, m_analysisFile);
- X265_FREAD(&poc, sizeof(int), 1, m_analysisFile);
+ const x265_analysis_data *picData = &(picIn->analysisData);
+ analysis_intra_data *intraPic = (analysis_intra_data *)picData->intraData;
+ analysis_inter_data *interPic = (analysis_inter_data *)picData->interData;

- uint64_t currentOffset = totalConsumedBytes;
+ int poc; uint32_t frameRecordSize;
+ X265_FREAD(&frameRecordSize, sizeof(uint32_t), 1, m_analysisFile, &(picData->frameRecordSize));
+ X265_FREAD(&depthBytes, sizeof(uint32_t), 1, m_analysisFile, &(picData->depthBytes));
+ X265_FREAD(&poc, sizeof(int), 1, m_analysisFile, &(picData->poc));

- /* Seeking to the right frame Record */
- while (poc != curPoc && !feof(m_analysisFile))
+ if (m_param->bUseAnalysisFile)
{
- currentOffset += frameRecordSize;
- fseeko(m_analysisFile, currentOffset, SEEK_SET);
- X265_FREAD(&frameRecordSize, sizeof(uint32_t), 1, m_analysisFile);
- X265_FREAD(&depthBytes, sizeof(uint32_t), 1, m_analysisFile);
- X265_FREAD(&poc, sizeof(int), 1, m_analysisFile);
- }
+ uint64_t currentOffset = totalConsumedBytes;

- if (poc != curPoc || feof(m_analysisFile))
- {
- x265_log(NULL, X265_LOG_WARNING, "Error reading analysis data: Cannot find POC %d\n", curPoc);
- freeAnalysis(analysis);
- return;
+ /* Seeking to the right frame Record */
+ while (poc != curPoc && !feof(m_analysisFile))
+ {
+ currentOffset += frameRecordSize;
+ fseeko(m_analysisFile, currentOffset, SEEK_SET);
+ X265_FREAD(&frameRecordSize, sizeof(uint32_t), 1, m_analysisFile, &(picData->frameRecordSize));
+ X265_FREAD(&depthBytes, sizeof(uint32_t), 1, m_analysisFile, &(picData->depthBytes));
+ X265_FREAD(&poc, sizeof(int), 1, m_analysisFile, &(picData->poc));
+ }
+ if (poc != curPoc || feof(m_analysisFile))
+ {
+ x265_log(NULL, X265_LOG_WARNING, "Error reading analysis data: Cannot find POC %d\n", curPoc);
+ freeAnalysis(analysis);
+ return;
+ }
}

/* Now arrived at the right frame, read the record */
analysis->poc = poc;
analysis->frameRecordSize = frameRecordSize;
- X265_FREAD(&analysis->sliceType, sizeof(int), 1, m_analysisFile);
- X265_FREAD(&analysis->bScenecut, sizeof(int), 1, m_analysisFile);
- X265_FREAD(&analysis->satdCost, sizeof(int64_t), 1, m_analysisFile);
- X265_FREAD(&analysis->numCUsInFrame, sizeof(int), 1, m_analysisFile);
- X265_FREAD(&analysis->numPartitions, sizeof(int), 1, m_analysisFile);
+ X265_FREAD(&analysis->sliceType, sizeof(int), 1, m_analysisFile, &(picData->sliceType));
+ X265_FREAD(&analysis->bScenecut, sizeof(int), 1, m_analysisFile, &(picData->bScenecut));
+ X265_FREAD(&analysis->satdCost, sizeof(int64_t), 1, m_analysisFile, &(picData->satdCost));
+ X265_FREAD(&analysis->numCUsInFrame, sizeof(int), 1, m_analysisFile, &(picData->numCUsInFrame));
+ X265_FREAD(&analysis->numPartitions, sizeof(int), 1, m_analysisFile, &(picData->numPartitions));
+ int scaledNumPartition = analysis->numPartitions;
+ int factor = 1 << m_param->scaleFactor;
+
+ if (m_param->scaleFactor)
+ analysis->numPartitions *= factor;

/* Memory is allocated for inter and intra analysis data based on the slicetype */
allocAnalysis(analysis);

if (analysis->sliceType == X265_TYPE_IDR || analysis->sliceType == X265_TYPE_I)
{
- analysis->sliceType = X265_TYPE_I;
- if (m_param->analysisRefineLevel < 2)
+ if (m_param->analysisReuseLevel < 2)
return;

uint8_t *tempBuf = NULL, *depthBuf = NULL, *modeBuf = NULL, *partSizes = NULL;

tempBuf = X265_MALLOC(uint8_t, depthBytes * 3);
- X265_FREAD(tempBuf, sizeof(uint8_t), depthBytes * 3, m_analysisFile);
-
depthBuf = tempBuf;
modeBuf = tempBuf + depthBytes;
partSizes = tempBuf + 2 * depthBytes;

+ X265_FREAD(depthBuf, sizeof(uint8_t), depthBytes, m_analysisFile, intraPic->depth);
+ X265_FREAD(modeBuf, sizeof(uint8_t), depthBytes, m_analysisFile, intraPic->chromaModes);
+ X265_FREAD(partSizes, sizeof(uint8_t), depthBytes, m_analysisFile, intraPic->partSizes);
+
size_t count = 0;
for (uint32_t d = 0; d < depthBytes; d++)
{
int bytes = analysis->numPartitions >> (depthBuf[d] * 2);
+ if (m_param->scaleFactor)
+ {
+ if (depthBuf[d] == 0)
+ depthBuf[d] = 1;
+ if (partSizes[d] == SIZE_NxN)
+ partSizes[d] = SIZE_2Nx2N;
+ }
memset(&((analysis_intra_data *)analysis->intraData)->depth[count], depthBuf[d], bytes);
memset(&((analysis_intra_data *)analysis->intraData)->chromaModes[count], modeBuf[d], bytes);
memset(&((analysis_intra_data *)analysis->intraData)->partSizes[count], partSizes[d], bytes);
count += bytes;
}
- X265_FREAD(((analysis_intra_data *)analysis->intraData)->modes, sizeof(uint8_t), analysis->numCUsInFrame * analysis->numPartitions, m_analysisFile);
+
+ if (!m_param->scaleFactor)
+ {
+ X265_FREAD(((analysis_intra_data *)analysis->intraData)->modes, sizeof(uint8_t), analysis->numCUsInFrame * analysis->numPartitions, m_analysisFile, intraPic->modes);
+ }
+ else
+ {
+ uint8_t *tempLumaBuf = X265_MALLOC(uint8_t, analysis->numCUsInFrame * scaledNumPartition);
+ X265_FREAD(tempLumaBuf, sizeof(uint8_t), analysis->numCUsInFrame * scaledNumPartition, m_analysisFile, intraPic->modes);
+ for (uint32_t ctu32Idx = 0, cnt = 0; ctu32Idx < analysis->numCUsInFrame * scaledNumPartition; ctu32Idx++, cnt += factor)
+ memset(&((analysis_intra_data *)analysis->intraData)->modes[cnt], tempLumaBuf[ctu32Idx], factor);
+ X265_FREE(tempLumaBuf);
+ }
X265_FREE(tempBuf);
consumedBytes += frameRecordSize;
}
1117
1118
{
1119
uint32_t numDir = analysis->sliceType == X265_TYPE_P ? 1 : 2;
1120
uint32_t numPlanes = m_param->internalCsp == X265_CSP_I400 ? 1 : 3;
1121
- X265_FREAD((WeightParam*)analysis->wt, sizeof(WeightParam), numPlanes * numDir, m_analysisFile);
1122
- if (m_param->analysisRefineLevel < 2)
1123
+ X265_FREAD((WeightParam*)analysis->wt, sizeof(WeightParam), numPlanes * numDir, m_analysisFile, (picIn->analysisData.wt));
1124
+ if (m_param->analysisReuseLevel < 2)
1125
return;
1126
1127
uint8_t *tempBuf = NULL, *depthBuf = NULL, *modeBuf = NULL, *partSize = NULL, *mergeFlag = NULL;
1128
1129
MV* mv[2];
1130
int8_t* refIdx[2];
1131
1132
- int numBuf = m_param->analysisRefineLevel > 4 ? 4 : 2;
1133
+ int numBuf = m_param->analysisReuseLevel > 4 ? 4 : 2;
1134
bool bIntraInInter = false;
1135
- if (m_param->analysisRefineLevel == 10)
1136
+ if (m_param->analysisReuseLevel == 10)
1137
{
1138
numBuf++;
1139
bIntraInInter = (analysis->sliceType == X265_TYPE_P || m_param->bIntraInBFrames);
1140
1141
}
1142
1143
tempBuf = X265_MALLOC(uint8_t, depthBytes * numBuf);
1144
- X265_FREAD(tempBuf, sizeof(uint8_t), depthBytes * numBuf, m_analysisFile);
1145
-
1146
depthBuf = tempBuf;
modeBuf = tempBuf + depthBytes;
- if (m_param->analysisRefineLevel > 4)
+
+ X265_FREAD(depthBuf, sizeof(uint8_t), depthBytes, m_analysisFile, interPic->depth);
+ X265_FREAD(modeBuf, sizeof(uint8_t), depthBytes, m_analysisFile, interPic->modes);
+
+ if (m_param->analysisReuseLevel > 4)
{
partSize = modeBuf + depthBytes;
mergeFlag = partSize + depthBytes;
- if (m_param->analysisRefineLevel == 10)
+ X265_FREAD(partSize, sizeof(uint8_t), depthBytes, m_analysisFile, interPic->partSize);
+ X265_FREAD(mergeFlag, sizeof(uint8_t), depthBytes, m_analysisFile, interPic->mergeFlag);
+
+ if (m_param->analysisReuseLevel == 10)
{
interDir = mergeFlag + depthBytes;
- if (bIntraInInter) chromaDir = interDir + depthBytes;
+ X265_FREAD(interDir, sizeof(uint8_t), depthBytes, m_analysisFile, interPic->interDir);
+ if (bIntraInInter)
+ {
+ chromaDir = interDir + depthBytes;
+ X265_FREAD(chromaDir, sizeof(uint8_t), depthBytes, m_analysisFile, intraPic->chromaModes);
+ }
for (uint32_t i = 0; i < numDir; i++)
{
- mvpIdx[i] = X265_MALLOC(uint8_t, depthBytes * 3);
- X265_FREAD(mvpIdx[i], sizeof(uint8_t), depthBytes, m_analysisFile);
+ mvpIdx[i] = X265_MALLOC(uint8_t, depthBytes);
refIdx[i] = X265_MALLOC(int8_t, depthBytes);
- X265_FREAD(refIdx[i], sizeof(int8_t), depthBytes, m_analysisFile);
mv[i] = X265_MALLOC(MV, depthBytes);
- X265_FREAD(mv[i], sizeof(MV), depthBytes, m_analysisFile);
+ X265_FREAD(mvpIdx[i], sizeof(uint8_t), depthBytes, m_analysisFile, interPic->mvpIdx[i]);
+ X265_FREAD(refIdx[i], sizeof(int8_t), depthBytes, m_analysisFile, interPic->refIdx[i]);
+ X265_FREAD(mv[i], sizeof(MV), depthBytes, m_analysisFile, interPic->mv[i]);
}
}
}

for (uint32_t d = 0; d < depthBytes; d++)
{
int bytes = analysis->numPartitions >> (depthBuf[d] * 2);
+ if (m_param->scaleFactor && modeBuf[d] == MODE_INTRA && depthBuf[d] == 0)
+ depthBuf[d] = 1;
memset(&((analysis_inter_data *)analysis->interData)->depth[count], depthBuf[d], bytes);
memset(&((analysis_inter_data *)analysis->interData)->modes[count], modeBuf[d], bytes);
- if (m_param->analysisRefineLevel > 4)
+ if (m_param->analysisReuseLevel > 4)
{
+ if (m_param->scaleFactor && modeBuf[d] == MODE_INTRA && partSize[d] == SIZE_NxN)
+ partSize[d] = SIZE_2Nx2N;
memset(&((analysis_inter_data *)analysis->interData)->partSize[count], partSize[d], bytes);
- int numPU = nbPartsTable[(int)partSize[d]];
+ int numPU = (modeBuf[d] == MODE_INTRA) ? 1 : nbPartsTable[(int)partSize[d]];
for (int pu = 0; pu < numPU; pu++)
{
if (pu) d++;
((analysis_inter_data *)analysis->interData)->mergeFlag[count + pu] = mergeFlag[d];
- if (m_param->analysisRefineLevel == 10)
+ if (m_param->analysisReuseLevel == 10)
{
((analysis_inter_data *)analysis->interData)->interDir[count + pu] = interDir[d];
for (uint32_t i = 0; i < numDir; i++)
{
((analysis_inter_data *)analysis->interData)->mvpIdx[i][count + pu] = mvpIdx[i][d];
((analysis_inter_data *)analysis->interData)->refIdx[i][count + pu] = refIdx[i][d];
+ if (m_param->scaleFactor)
+ {
+ mv[i][d].x *= (int16_t)m_param->scaleFactor;
+ mv[i][d].y *= (int16_t)m_param->scaleFactor;
+ }
memcpy(&((analysis_inter_data *)analysis->interData)->mv[i][count + pu], &mv[i][d], sizeof(MV));
}
}
}
- if (m_param->analysisRefineLevel == 10 && bIntraInInter)
+ if (m_param->analysisReuseLevel == 10 && bIntraInInter)
memset(&((analysis_intra_data *)analysis->intraData)->chromaModes[count], chromaDir[d], bytes);
}
count += bytes;


X265_FREE(tempBuf);

- if (m_param->analysisRefineLevel == 10)
+ if (m_param->analysisReuseLevel == 10)
{
for (uint32_t i = 0; i < numDir; i++)
{

X265_FREE(mv[i]);
}
if (bIntraInInter)
- X265_FREAD(((analysis_intra_data *)analysis->intraData)->modes, sizeof(uint8_t), analysis->numCUsInFrame * analysis->numPartitions, m_analysisFile);
+ {
+ if (!m_param->scaleFactor)
+ {
+ X265_FREAD(((analysis_intra_data *)analysis->intraData)->modes, sizeof(uint8_t), analysis->numCUsInFrame * analysis->numPartitions, m_analysisFile, intraPic->modes);
+ }
+ else
+ {
+ uint8_t *tempLumaBuf = X265_MALLOC(uint8_t, analysis->numCUsInFrame * scaledNumPartition);
+ X265_FREAD(tempLumaBuf, sizeof(uint8_t), analysis->numCUsInFrame * scaledNumPartition, m_analysisFile, intraPic->modes);
+ for (uint32_t ctu32Idx = 0, cnt = 0; ctu32Idx < analysis->numCUsInFrame * scaledNumPartition; ctu32Idx++, cnt += factor)
+ memset(&((analysis_intra_data *)analysis->intraData)->modes[cnt], tempLumaBuf[ctu32Idx], factor);
+ X265_FREE(tempLumaBuf);
+ }
+ }
}
else
- X265_FREAD(((analysis_inter_data *)analysis->interData)->ref, sizeof(int32_t), analysis->numCUsInFrame * X265_MAX_PRED_MODE_PER_CTU * numDir, m_analysisFile);
+ X265_FREAD(((analysis_inter_data *)analysis->interData)->ref, sizeof(int32_t), analysis->numCUsInFrame * X265_MAX_PRED_MODE_PER_CTU * numDir, m_analysisFile, interPic->ref);

consumedBytes += frameRecordSize;
if (numDir == 1)

}\

uint32_t depthBytes = 0;
- uint32_t widthInCU = (m_param->sourceWidth + g_maxCUSize - 1) >> g_maxLog2CUSize;
- uint32_t heightInCU = (m_param->sourceHeight + g_maxCUSize - 1) >> g_maxLog2CUSize;
+ uint32_t widthInCU = (m_param->sourceWidth + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+ uint32_t heightInCU = (m_param->sourceHeight + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
uint32_t numCUsInFrame = widthInCU * heightInCU;

int poc; uint32_t frameRecordSize;

double sum = 0, sqrSum = 0;
for (uint32_t d = 0; d < depthBytes; d++)
{
- int bytes = NUM_4x4_PARTITIONS >> (depthBuf[d] * 2);
+ int bytes = m_param->num4x4Partitions >> (depthBuf[d] * 2);
memset(&analysisFrameData->depth[count], depthBuf[d], bytes);
analysisFrameData->distortion[count] = distortionBuf[d];
analysisFrameData->ctuDistortion[ctuCount] += analysisFrameData->distortion[count];
count += bytes;
- if ((count % (size_t)NUM_4x4_PARTITIONS) == 0)
+ if ((count % (unsigned)m_param->num4x4Partitions) == 0)
{
analysisFrameData->scaledDistortion[ctuCount] = X265_LOG2(X265_MAX(analysisFrameData->ctuDistortion[ctuCount], 1));
sum += analysisFrameData->scaledDistortion[ctuCount];

count = 0;
for (uint32_t d = 0; d < depthBytes; d++)
{
- size_t bytes = NUM_4x4_PARTITIONS >> (depthBuf[d] * 2);
+ size_t bytes = m_param->num4x4Partitions >> (depthBuf[d] * 2);
for (int i = 0; i < numDir; i++)
{
for (size_t j = count, k = 0; k < bytes; j++, k++)

analysis->frameRecordSize += sizeof(WeightParam) * numPlanes * numDir;
}

- if (m_param->analysisRefineLevel > 1)
+ if (m_param->analysisReuseLevel > 1)
{
if (analysis->sliceType == X265_TYPE_IDR || analysis->sliceType == X265_TYPE_I)
{

interDataCTU->depth[depthBytes] = depth;

predMode = ctu->m_predMode[absPartIdx];
- if (m_param->analysisRefineLevel != 10 && ctu->m_refIdx[1][absPartIdx] != -1)
+ if (m_param->analysisReuseLevel != 10 && ctu->m_refIdx[1][absPartIdx] != -1)
predMode = 4; // used as indiacator if the block is coded as bidir

interDataCTU->modes[depthBytes] = predMode;

- if (m_param->analysisRefineLevel > 4)
+ if (m_param->analysisReuseLevel > 4)
{
partSize = ctu->m_partSize[absPartIdx];
interDataCTU->partSize[depthBytes] = partSize;

/* Store per PU data */
- uint32_t numPU = nbPartsTable[(int)partSize];
+ uint32_t numPU = (predMode == MODE_INTRA) ? 1 : nbPartsTable[(int)partSize];
for (uint32_t puIdx = 0; puIdx < numPU; puIdx++)
{
uint32_t puabsPartIdx = ctu->getPUOffset(puIdx, absPartIdx) + absPartIdx;
if (puIdx) depthBytes++;
interDataCTU->mergeFlag[depthBytes] = ctu->m_mergeFlag[puabsPartIdx];

- if (m_param->analysisRefineLevel == 10)
+ if (m_param->analysisReuseLevel == 10)
{
interDataCTU->interDir[depthBytes] = ctu->m_interDir[puabsPartIdx];
for (uint32_t dir = 0; dir < numDir; dir++)

}
}
}
- if (m_param->analysisRefineLevel == 10 && bIntraInInter)
+ if (m_param->analysisReuseLevel == 10 && bIntraInInter)
intraDataCTU->chromaModes[depthBytes] = ctu->m_chromaIntraDir[absPartIdx];
}
absPartIdx += ctu->m_numPartitions >> (depth * 2);
}
- if (m_param->analysisRefineLevel == 10 && bIntraInInter)
+ if (m_param->analysisReuseLevel == 10 && bIntraInInter)
memcpy(&intraDataCTU->modes[ctu->m_cuAddr * ctu->m_numPartitions], ctu->m_lumaIntraDir, sizeof(uint8_t)* ctu->m_numPartitions);
}
}

{
/* Add sizeof depth, modes, partSize, mergeFlag */
analysis->frameRecordSize += depthBytes * 2;
- if (m_param->analysisRefineLevel > 4)
+ if (m_param->analysisReuseLevel > 4)
analysis->frameRecordSize += (depthBytes * 2);

- if (m_param->analysisRefineLevel == 10)
+ if (m_param->analysisReuseLevel == 10)
{
/* Add Size of interDir, mvpIdx, refIdx, mv, luma and chroma modes */
analysis->frameRecordSize += depthBytes;

else
analysis->frameRecordSize += sizeof(int32_t)* analysis->numCUsInFrame * X265_MAX_PRED_MODE_PER_CTU * numDir;
}
+ analysis->depthBytes = depthBytes;
}
+
+ if (!m_param->bUseAnalysisFile)
+ return;
+
X265_FWRITE(&analysis->frameRecordSize, sizeof(uint32_t), 1, m_analysisFile);
X265_FWRITE(&depthBytes, sizeof(uint32_t), 1, m_analysisFile);
X265_FWRITE(&analysis->poc, sizeof(int), 1, m_analysisFile);

if (analysis->sliceType > X265_TYPE_I)
X265_FWRITE((WeightParam*)analysis->wt, sizeof(WeightParam), numPlanes * numDir, m_analysisFile);

- if (m_param->analysisRefineLevel < 2)
+ if (m_param->analysisReuseLevel < 2)
return;

if (analysis->sliceType == X265_TYPE_IDR || analysis->sliceType == X265_TYPE_I)

{
X265_FWRITE(((analysis_inter_data*)analysis->interData)->depth, sizeof(uint8_t), depthBytes, m_analysisFile);
X265_FWRITE(((analysis_inter_data*)analysis->interData)->modes, sizeof(uint8_t), depthBytes, m_analysisFile);
- if (m_param->analysisRefineLevel > 4)
+ if (m_param->analysisReuseLevel > 4)
{
X265_FWRITE(((analysis_inter_data*)analysis->interData)->partSize, sizeof(uint8_t), depthBytes, m_analysisFile);
X265_FWRITE(((analysis_inter_data*)analysis->interData)->mergeFlag, sizeof(uint8_t), depthBytes, m_analysisFile);
- if (m_param->analysisRefineLevel == 10)
+ if (m_param->analysisReuseLevel == 10)
{
X265_FWRITE(((analysis_inter_data*)analysis->interData)->interDir, sizeof(uint8_t), depthBytes, m_analysisFile);
if (bIntraInInter) X265_FWRITE(((analysis_intra_data*)analysis->intraData)->chromaModes, sizeof(uint8_t), depthBytes, m_analysisFile);

X265_FWRITE(((analysis_intra_data*)analysis->intraData)->modes, sizeof(uint8_t), analysis->numCUsInFrame * analysis->numPartitions, m_analysisFile);
}
}
- if (m_param->analysisRefineLevel != 10)
+ if (m_param->analysisReuseLevel != 10)
X265_FWRITE(((analysis_inter_data*)analysis->interData)->ref, sizeof(int32_t), analysis->numCUsInFrame * X265_MAX_PRED_MODE_PER_CTU * numDir, m_analysisFile);

}

}\

uint32_t depthBytes = 0;
- uint32_t widthInCU = (m_param->sourceWidth + g_maxCUSize - 1) >> g_maxLog2CUSize;
- uint32_t heightInCU = (m_param->sourceHeight + g_maxCUSize - 1) >> g_maxLog2CUSize;
+ uint32_t widthInCU = (m_param->sourceWidth + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
+ uint32_t heightInCU = (m_param->sourceHeight + m_param->maxCUSize - 1) >> m_param->maxLog2CUSize;
uint32_t numCUsInFrame = widthInCU * heightInCU;
analysis2PassFrameData* analysisFrameData = (analysis2PassFrameData*)analysis2Pass->analysisFramedata;
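Several encoder.cpp hunks replace the globals g_maxCUSize and g_maxLog2CUSize and the NUM_4x4_PARTITIONS macro with per-instance fields read through m_param. The arithmetic in those expressions can be sketched in isolation as follows. This is only an illustration: the struct CtuGeometry and the helpers makeGeometry, ctusInFrame and partitionsAtDepth are hypothetical names that do not exist in x265, and the accepted CTU sizes (16, 32, 64) are an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the per-instance fields the diff reads via m_param.
struct CtuGeometry
{
    uint32_t maxCUSize;        // CTU edge length in pixels
    uint32_t maxLog2CUSize;    // log2(maxCUSize)
    uint32_t num4x4Partitions; // number of 4x4 blocks in one CTU
};

// Derive all three fields from log2 of the CTU size (assumed range: 16..64 pixels).
inline CtuGeometry makeGeometry(uint32_t log2Size)
{
    assert(log2Size >= 4 && log2Size <= 6);
    const uint32_t size = 1u << log2Size;
    return { size, log2Size, (size >> 2) * (size >> 2) };
}

// Ceiling division by the CTU size, mirroring the widthInCU / heightInCU lines.
inline uint32_t ctusInFrame(const CtuGeometry& g, uint32_t width, uint32_t height)
{
    const uint32_t widthInCU = (width + g.maxCUSize - 1) >> g.maxLog2CUSize;
    const uint32_t heightInCU = (height + g.maxCUSize - 1) >> g.maxLog2CUSize;
    return widthInCU * heightInCU;
}

// 4x4 partitions covered by one CU at the given depth; each quadtree level
// quarters the area, hence the shift by depth * 2.
inline uint32_t partitionsAtDepth(const CtuGeometry& g, uint32_t depth)
{
    return g.num4x4Partitions >> (depth * 2);
}
```

Because the geometry is passed in rather than read from a global, two callers can use different CTU sizes in the same process, for example makeGeometry(6) for one frame size and makeGeometry(5) for another.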
x265_2.4.tar.gz/source/encoder/encoder.h -> x265_2.5.tar.gz/source/encoder/encoder.h
Changed
54
#include "x265.h"
#include "nal.h"
#include "framedata.h"
-
-#ifdef ENABLE_DYNAMIC_HDR10
- #include "dynamicHDR10\hdr10plus.h"
+#ifdef ENABLE_HDR10_PLUS
+ #include "dynamicHDR10/hdr10plus.h"
#endif
-
struct x265_encoder {};
namespace X265_NS {
// private namespace


int m_bToneMap; // Enables tone-mapping

-#ifdef ENABLE_DYNAMIC_HDR10
+#ifdef ENABLE_HDR10_PLUS
const hdr10plus_api *m_hdr10plus_api;
+ uint8_t **cim;
+ int numCimInfo;
#endif

x265_sei_payload m_prevTonemapPayload;

Encoder();
~Encoder()
{
-#ifdef ENABLE_DYNAMIC_HDR10
+#ifdef ENABLE_HDR10_PLUS
if (m_prevTonemapPayload.payload != NULL)
X265_FREE(m_prevTonemapPayload.payload);
#endif


int reconfigureParam(x265_param* encParam, x265_param* param);

+ void copyCtuInfo(x265_ctu_info_t** frameCtuInfo, int poc);
+
void getStreamHeaders(NALList& list, Entropy& sbacCoder, Bitstream& bs);

void fetchStats(x265_stats* stats, size_t statsSizeBytes);


void freeAnalysis2Pass(x265_analysis_2Pass* analysis, int sliceType);

- void readAnalysisFile(x265_analysis_data* analysis, int poc);
+ void readAnalysisFile(x265_analysis_data* analysis, int poc, const x265_picture* picIn);

void writeAnalysisFile(x265_analysis_data* pic, FrameData &curEncData);
void readAnalysis2PassFile(x265_analysis_2Pass* analysis2Pass, int poc, int sliceType);
x265_2.4.tar.gz/source/encoder/entropy.cpp -> x265_2.5.tar.gz/source/encoder/entropy.cpp
Changed
64
// TODO: Enable when pps_loop_filter_across_slices_enabled_flag==1
// We didn't support filter across slice board, so disable it now

- if (g_maxSlices <= 1)
+ if (encData.m_param->maxSlices <= 1)
{
bool isSAOEnabled = slice.m_sps->bUseSAO ? saoParam->bSaoFlag[0] || saoParam->bSaoFlag[1] : false;
bool isDBFEnabled = !slice.m_pps->bPicDisableDeblockingFilter;

if (cuSplitFlag)
codeSplitFlag(ctu, absPartIdx, depth);

- if (depth < ctu.m_cuDepth[absPartIdx] && depth < g_maxCUDepth)
+ if (depth < ctu.m_cuDepth[absPartIdx] && depth < ctu.m_encData->m_param->maxCUDepth)
{
uint32_t qNumParts = cuGeom.numPartitions >> 2;
if (depth == slice->m_pps->maxCuDQPDepth && slice->m_pps->bUseDQP)

case SIZE_nRx2N:
bits += bitsCodeBin(0, m_contextState[OFF_PART_SIZE_CTX + 0]);
bits += bitsCodeBin(0, m_contextState[OFF_PART_SIZE_CTX + 1]);
- if (depth == g_maxCUDepth && !(cu.m_log2CUSize[absPartIdx] == 3))
+ if (depth == cu.m_encData->m_param->maxCUDepth && !(cu.m_log2CUSize[absPartIdx] == 3))
bits += bitsCodeBin(1, m_contextState[OFF_PART_SIZE_CTX + 2]);
if (cu.m_slice->m_sps->maxAMPDepth > depth)
{

uint32_t cuAddr = ctu.getSCUAddr() + absPartIdx;
X265_CHECK(realEndAddress == slice->realEndAddress(slice->m_endCUAddr), "real end address expected\n");

- uint32_t granularityMask = g_maxCUSize - 1;
+ uint32_t granularityMask = ctu.m_encData->m_param->maxCUSize - 1;
uint32_t cuSize = 1 << ctu.m_log2CUSize[absPartIdx];
uint32_t rpelx = ctu.m_cuPelX + g_zscanToPelX[absPartIdx] + cuSize;
uint32_t bpely = ctu.m_cuPelY + g_zscanToPelY[absPartIdx] + cuSize;

{
// Encode slice finish
uint32_t bTerminateSlice = ctu.m_bLastCuInSlice;
- if (cuAddr + (NUM_4x4_PARTITIONS >> (depth << 1)) == realEndAddress)
+ if (cuAddr + (slice->m_param->num4x4Partitions >> (depth << 1)) == realEndAddress)
bTerminateSlice = 1;

// The 1-terminating bit is added to all streams, so don't add it here when it's 1.


if (cu.isIntra(absPartIdx))
{
- if (depth == g_maxCUDepth)
+ if (depth == cu.m_encData->m_param->maxCUDepth)
encodeBin(partSize == SIZE_2Nx2N ? 1 : 0, m_contextState[OFF_PART_SIZE_CTX]);
return;
}

case SIZE_nRx2N:
encodeBin(0, m_contextState[OFF_PART_SIZE_CTX + 0]);
encodeBin(0, m_contextState[OFF_PART_SIZE_CTX + 1]);
- if (depth == cu.m_encData->m_param->maxCUDepth && !(cu.m_log2CUSize[absPartIdx] == 3))
+ if (depth == cu.m_encData->m_param->maxCUDepth && !(cu.m_log2CUSize[absPartIdx] == 3))
encodeBin(1, m_contextState[OFF_PART_SIZE_CTX + 2]);
if (cu.m_slice->m_sps->maxAMPDepth > depth)
{
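The entropy.cpp hunks follow the changelog item about moving globals into instance-specific variables: g_maxSlices, g_maxCUDepth and g_maxCUSize are now read through the encoder's own parameter set. A minimal sketch of why that matters when two instances use different settings is given below. InstanceParam, canSplit and alignedToCtu are hypothetical names for illustration only, not x265 types or functions, and the size/depth pairs in the usage note are assumed example values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-instance parameter block (stand-in for the fields the
// diff reads through m_param instead of through process-wide globals).
struct InstanceParam
{
    uint32_t maxCUSize;  // CTU edge length in pixels, assumed to be a power of two
    uint32_t maxCUDepth; // deepest quadtree level this instance may split to
};

// Mirrors the "depth < maxCUDepth" test: a CU may split only above the deepest level.
inline bool canSplit(const InstanceParam& p, uint32_t depth)
{
    return depth < p.maxCUDepth;
}

// Mirrors the granularityMask idea: with a power-of-two CTU size,
// (maxCUSize - 1) masks the offset of a pixel coordinate inside its CTU,
// so a zero result means the coordinate lies on a CTU boundary.
inline bool alignedToCtu(const InstanceParam& p, uint32_t pel)
{
    assert((p.maxCUSize & (p.maxCUSize - 1)) == 0);
    const uint32_t granularityMask = p.maxCUSize - 1;
    return (pel & granularityMask) == 0;
}
```

With one instance configured as {64, 3} and another as {32, 2}, the same coordinate 32 is a CTU boundary only for the second, and depth 2 can still split only in the first. A single shared global could not represent both at once.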
x265_2.4.tar.gz/source/encoder/frameencoder.cpp -> x265_2.5.tar.gz/source/encoder/frameencoder.cpp
Changed
459
range += !!(m_param->searchMethod < 2); /* diamond/hex range check lag */
range += NTAPS_LUMA / 2; /* subpel filter half-length */
range += 2 + (MotionEstimate::hpelIterationCount(m_param->subpelRefine) + 1) / 2; /* subpel refine steps */
- m_refLagRows = /*(m_param->maxSlices > 1 ? 1 : 0) +*/ 1 + ((range + g_maxCUSize - 1) / g_maxCUSize);
+ m_refLagRows = /*(m_param->maxSlices > 1 ? 1 : 0) +*/ 1 + ((range + m_param->maxCUSize - 1) / m_param->maxCUSize);

// NOTE: 2 times of numRows because both Encoder and Filter in same queue
if (!WaveFront::init(m_numRows * 2))


while (m_threadActive)
{
+ if (m_param->bCTUInfo)
+ {
+ while (!m_frame->m_ctuInfo)
+ m_frame->m_copied.wait();
+ }
compressFrame();
m_done.trigger(); /* FrameEncoder::getEncodedPicture() blocks for this event */
m_enable.wait();

bool bUseWeightB = slice->m_sliceType == B_SLICE && slice->m_pps->bUseWeightedBiPred;

WeightParam* reuseWP = NULL;
- if (m_param->analysisMode && (bUseWeightP || bUseWeightB))
+ if (m_param->analysisReuseMode && (bUseWeightP || bUseWeightB))
reuseWP = (WeightParam*)m_frame->m_analysisData.wt;

if (bUseWeightP || bUseWeightB)

m_cuStats.countWeightAnalyze++;
ScopedElapsedTime time(m_cuStats.weightAnalyzeTime);
#endif
- if (m_param->analysisMode == X265_ANALYSIS_LOAD)
+ if (m_param->analysisReuseMode == X265_ANALYSIS_LOAD)
{
for (int list = 0; list < slice->isInterB() + 1; list++)
{

slice->m_refReconPicList[l][ref] = slice->m_refFrameList[l][ref]->m_reconPic;
m_mref[l][ref].init(slice->m_refReconPicList[l][ref], w, *m_param);
}
- if (m_param->analysisMode == X265_ANALYSIS_SAVE && (bUseWeightP || bUseWeightB))
+ if (m_param->analysisReuseMode == X265_ANALYSIS_SAVE && (bUseWeightP || bUseWeightB))
{
for (int i = 0; i < (m_param->internalCsp != X265_CSP_I400 ? 3 : 1); i++)
*(reuseWP++) = slice->m_weightPredTable[l][0][i];

if (writeSei)
{
SEICreativeIntentMeta sei;
- sei.cim = payload->payload;
+ sei.m_payload = payload->payload;
m_bs.resetBits();
sei.setSize(payload->payloadSize);
sei.write(m_bs, *slice->m_sps);

}
else if (m_param->decodedPictureHashSEI == 3)
{
- uint32_t cuHeight = g_maxCUSize;
+ uint32_t cuHeight = m_param->maxCUSize;

m_checksum[0] = 0;


m_frame->m_encData->m_frameStats.percent8x8Inter = (double)totalP / totalCuCount;
m_frame->m_encData->m_frameStats.percent8x8Skip = (double)totalSkip / totalCuCount;
}
- for (uint32_t i = 0; i < m_numRows; i++)
+
+ if (m_param->csvLogLevel >= 1)
{
- m_frame->m_encData->m_frameStats.cntIntraNxN += m_rows[i].rowStats.cntIntraNxN;
- m_frame->m_encData->m_frameStats.totalCu += m_rows[i].rowStats.totalCu;
- m_frame->m_encData->m_frameStats.totalCtu += m_rows[i].rowStats.totalCtu;
- m_frame->m_encData->m_frameStats.lumaDistortion += m_rows[i].rowStats.lumaDistortion;
- m_frame->m_encData->m_frameStats.chromaDistortion += m_rows[i].rowStats.chromaDistortion;
- m_frame->m_encData->m_frameStats.psyEnergy += m_rows[i].rowStats.psyEnergy;
- m_frame->m_encData->m_frameStats.ssimEnergy += m_rows[i].rowStats.ssimEnergy;
- m_frame->m_encData->m_frameStats.resEnergy += m_rows[i].rowStats.resEnergy;
- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+ for (uint32_t i = 0; i < m_numRows; i++)
{
- m_frame->m_encData->m_frameStats.cntSkipCu[depth] += m_rows[i].rowStats.cntSkipCu[depth];
- m_frame->m_encData->m_frameStats.cntMergeCu[depth] += m_rows[i].rowStats.cntMergeCu[depth];
- for (int m = 0; m < INTER_MODES; m++)
- m_frame->m_encData->m_frameStats.cuInterDistribution[depth][m] += m_rows[i].rowStats.cuInterDistribution[depth][m];
+ m_frame->m_encData->m_frameStats.cntIntraNxN += m_rows[i].rowStats.cntIntraNxN;
+ m_frame->m_encData->m_frameStats.totalCu += m_rows[i].rowStats.totalCu;
+ m_frame->m_encData->m_frameStats.totalCtu += m_rows[i].rowStats.totalCtu;
+ m_frame->m_encData->m_frameStats.lumaDistortion += m_rows[i].rowStats.lumaDistortion;
+ m_frame->m_encData->m_frameStats.chromaDistortion += m_rows[i].rowStats.chromaDistortion;
+ m_frame->m_encData->m_frameStats.psyEnergy += m_rows[i].rowStats.psyEnergy;
+ m_frame->m_encData->m_frameStats.ssimEnergy += m_rows[i].rowStats.ssimEnergy;
+ m_frame->m_encData->m_frameStats.resEnergy += m_rows[i].rowStats.resEnergy;
+ for (uint32_t depth = 0; depth <= m_param->maxCUDepth; depth++)
+ {
+ m_frame->m_encData->m_frameStats.cntSkipCu[depth] += m_rows[i].rowStats.cntSkipCu[depth];
+ m_frame->m_encData->m_frameStats.cntMergeCu[depth] += m_rows[i].rowStats.cntMergeCu[depth];
+ for (int m = 0; m < INTER_MODES; m++)
+ m_frame->m_encData->m_frameStats.cuInterDistribution[depth][m] += m_rows[i].rowStats.cuInterDistribution[depth][m];
+ for (int n = 0; n < INTRA_MODES; n++)
+ m_frame->m_encData->m_frameStats.cuIntraDistribution[depth][n] += m_rows[i].rowStats.cuIntraDistribution[depth][n];
+ }
+ }
+ m_frame->m_encData->m_frameStats.percentIntraNxN = (double)(m_frame->m_encData->m_frameStats.cntIntraNxN * 100) / m_frame->m_encData->m_frameStats.totalCu;
+
+ for (uint32_t depth = 0; depth <= m_param->maxCUDepth; depth++)
+ {
+ m_frame->m_encData->m_frameStats.percentSkipCu[depth] = (double)(m_frame->m_encData->m_frameStats.cntSkipCu[depth] * 100) / m_frame->m_encData->m_frameStats.totalCu;
+ m_frame->m_encData->m_frameStats.percentMergeCu[depth] = (double)(m_frame->m_encData->m_frameStats.cntMergeCu[depth] * 100) / m_frame->m_encData->m_frameStats.totalCu;
for (int n = 0; n < INTRA_MODES; n++)
- m_frame->m_encData->m_frameStats.cuIntraDistribution[depth][n] += m_rows[i].rowStats.cuIntraDistribution[depth][n];
+ m_frame->m_encData->m_frameStats.percentIntraDistribution[depth][n] = (double)(m_frame->m_encData->m_frameStats.cuIntraDistribution[depth][n] * 100) / m_frame->m_encData->m_frameStats.totalCu;
+ uint64_t cuInterRectCnt = 0; // sum of Nx2N, 2NxN counts
+ cuInterRectCnt += m_frame->m_encData->m_frameStats.cuInterDistribution[depth][1] + m_frame->m_encData->m_frameStats.cuInterDistribution[depth][2];
+ m_frame->m_encData->m_frameStats.percentInterDistribution[depth][0] = (double)(m_frame->m_encData->m_frameStats.cuInterDistribution[depth][0] * 100) / m_frame->m_encData->m_frameStats.totalCu;
+ m_frame->m_encData->m_frameStats.percentInterDistribution[depth][1] = (double)(cuInterRectCnt * 100) / m_frame->m_encData->m_frameStats.totalCu;
+ m_frame->m_encData->m_frameStats.percentInterDistribution[depth][2] = (double)(m_frame->m_encData->m_frameStats.cuInterDistribution[depth][3] * 100) / m_frame->m_encData->m_frameStats.totalCu;
}
}
- m_frame->m_encData->m_frameStats.avgLumaDistortion = (double)(m_frame->m_encData->m_frameStats.lumaDistortion) / m_frame->m_encData->m_frameStats.totalCtu;
- m_frame->m_encData->m_frameStats.avgChromaDistortion = (double)(m_frame->m_encData->m_frameStats.chromaDistortion) / m_frame->m_encData->m_frameStats.totalCtu;
- m_frame->m_encData->m_frameStats.avgPsyEnergy = (double)(m_frame->m_encData->m_frameStats.psyEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
- m_frame->m_encData->m_frameStats.avgSsimEnergy = (double)(m_frame->m_encData->m_frameStats.ssimEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
- m_frame->m_encData->m_frameStats.avgResEnergy = (double)(m_frame->m_encData->m_frameStats.resEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
- m_frame->m_encData->m_frameStats.percentIntraNxN = (double)(m_frame->m_encData->m_frameStats.cntIntraNxN * 100) / m_frame->m_encData->m_frameStats.totalCu;
- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+
+ if (m_param->csvLogLevel >= 2)
{
- m_frame->m_encData->m_frameStats.percentSkipCu[depth] = (double)(m_frame->m_encData->m_frameStats.cntSkipCu[depth] * 100) / m_frame->m_encData->m_frameStats.totalCu;
- m_frame->m_encData->m_frameStats.percentMergeCu[depth] = (double)(m_frame->m_encData->m_frameStats.cntMergeCu[depth] * 100) / m_frame->m_encData->m_frameStats.totalCu;
- for (int n = 0; n < INTRA_MODES; n++)
- m_frame->m_encData->m_frameStats.percentIntraDistribution[depth][n] = (double)(m_frame->m_encData->m_frameStats.cuIntraDistribution[depth][n] * 100) / m_frame->m_encData->m_frameStats.totalCu;
- uint64_t cuInterRectCnt = 0; // sum of Nx2N, 2NxN counts
- cuInterRectCnt += m_frame->m_encData->m_frameStats.cuInterDistribution[depth][1] + m_frame->m_encData->m_frameStats.cuInterDistribution[depth][2];
- m_frame->m_encData->m_frameStats.percentInterDistribution[depth][0] = (double)(m_frame->m_encData->m_frameStats.cuInterDistribution[depth][0] * 100) / m_frame->m_encData->m_frameStats.totalCu;
- m_frame->m_encData->m_frameStats.percentInterDistribution[depth][1] = (double)(cuInterRectCnt * 100) / m_frame->m_encData->m_frameStats.totalCu;
- m_frame->m_encData->m_frameStats.percentInterDistribution[depth][2] = (double)(m_frame->m_encData->m_frameStats.cuInterDistribution[depth][3] * 100) / m_frame->m_encData->m_frameStats.totalCu;
+ m_frame->m_encData->m_frameStats.avgLumaDistortion = (double)(m_frame->m_encData->m_frameStats.lumaDistortion) / m_frame->m_encData->m_frameStats.totalCtu;
+ m_frame->m_encData->m_frameStats.avgChromaDistortion = (double)(m_frame->m_encData->m_frameStats.chromaDistortion) / m_frame->m_encData->m_frameStats.totalCtu;
+ m_frame->m_encData->m_frameStats.avgPsyEnergy = (double)(m_frame->m_encData->m_frameStats.psyEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
+ m_frame->m_encData->m_frameStats.avgSsimEnergy = (double)(m_frame->m_encData->m_frameStats.ssimEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
+ m_frame->m_encData->m_frameStats.avgResEnergy = (double)(m_frame->m_encData->m_frameStats.resEnergy) / m_frame->m_encData->m_frameStats.totalCtu;
}

m_bs.resetBits();

/* Accumulate CU statistics from each worker thread, we could report
* per-frame stats here, but currently we do not. */
for (int i = 0; i < numTLD; i++)
- m_cuStats.accumulate(m_tld[i].analysis.m_stats[m_jpId]);
+ m_cuStats.accumulate(m_tld[i].analysis.m_stats[m_jpId], *m_param);
#endif

m_endFrameTime = x265_mdate();

{
Slice* slice = m_frame->m_encData->m_slice;
const uint32_t widthInLCUs = slice->m_sps->numCuInWidth;
- const uint32_t lastCUAddr = (slice->m_endCUAddr + NUM_4x4_PARTITIONS - 1) / NUM_4x4_PARTITIONS;
+ const uint32_t lastCUAddr = (slice->m_endCUAddr + m_param->num4x4Partitions - 1) / m_param->num4x4Partitions;
const uint32_t numSubstreams = m_param->bEnableWavefront ? slice->m_sps->numCuInHeight : 1;

SAOParam* saoParam = slice->m_sps->bUseSAO ? m_frame->m_encData->m_saoParam : NULL;

const uint32_t row = (uint32_t)intRow;
CTURow& curRow = m_rows[row];

- tld.analysis.m_param = m_param;
if (m_param->bEnableWavefront)
{
ScopedLock self(curRow.lock);


uint32_t maxBlockCols = (m_frame->m_fencPic->m_picWidth + (16 - 1)) / 16;
uint32_t maxBlockRows = (m_frame->m_fencPic->m_picHeight + (16 - 1)) / 16;
- uint32_t noOfBlocks = g_maxCUSize / 16;
+ uint32_t noOfBlocks = m_param->maxCUSize / 16;
const uint32_t bFirstRowInSlice = ((row == 0) || (m_rows[row - 1].sliceId != curRow.sliceId)) ? 1 : 0;
const uint32_t bLastRowInSlice = ((row == m_numRows - 1) || (m_rows[row + 1].sliceId != curRow.sliceId)) ? 1 : 0;
const uint32_t sliceId = curRow.sliceId;

// TODO: specially case handle on first and last row

// Initialize restrict on MV range in slices
- tld.analysis.m_sliceMinY = -(int16_t)(rowInSlice * g_maxCUSize * 4) + 3 * 4;
- tld.analysis.m_sliceMaxY = (int16_t)((endRowInSlicePlus1 - 1 - row) * (g_maxCUSize * 4) - 4 * 4);
+ tld.analysis.m_sliceMinY = -(int16_t)(rowInSlice * m_param->maxCUSize * 4) + 3 * 4;
+ tld.analysis.m_sliceMaxY = (int16_t)((endRowInSlicePlus1 - 1 - row) * (m_param->maxCUSize * 4) - 4 * 4);

// Handle single row slice
if (tld.analysis.m_sliceMaxY < tld.analysis.m_sliceMinY)

cuStat.baseQp = curEncData.m_rowStat[row].rowQp;

/* TODO: use defines from slicetype.h for lowres block size */
- uint32_t block_y = (ctu->m_cuPelY >> g_maxLog2CUSize) * noOfBlocks;
- uint32_t block_x = (ctu->m_cuPelX >> g_maxLog2CUSize) * noOfBlocks;
+ uint32_t block_y = (ctu->m_cuPelY >> m_param->maxLog2CUSize) * noOfBlocks;
+ uint32_t block_x = (ctu->m_cuPelX >> m_param->maxLog2CUSize) * noOfBlocks;

cuStat.vbvCost = 0;
cuStat.intraVbvCost = 0;

curRow.rowStats.coeffBits += best.coeffBits;
curRow.rowStats.miscBits += best.totalBits - (best.mvBits + best.coeffBits);

- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+ for (uint32_t depth = 0; depth <= m_param->maxCUDepth; depth++)
{
/* 1 << shift == number of 8x8 blocks at current depth */
- int shift = 2 * (g_maxCUDepth - depth);
- int cuSize = g_maxCUSize >> depth;
+ int shift = 2 * (m_param->maxCUDepth - depth);
+ int cuSize = m_param->maxCUSize >> depth;

if (cuSize == 8)
curRow.rowStats.intra8x8Cnt += (int)(frameLog.cntIntra[depth] + frameLog.cntIntraNxN);

curRow.rowStats.resEnergy += best.resEnergy;
curRow.rowStats.cntIntraNxN += frameLog.cntIntraNxN;
curRow.rowStats.totalCu += frameLog.totalCu;
- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+ for (uint32_t depth = 0; depth <= m_param->maxCUDepth; depth++)
{
curRow.rowStats.cntSkipCu[depth] += frameLog.cntSkipCu[depth];
curRow.rowStats.cntMergeCu[depth] += frameLog.cntMergeCu[depth];

x265_emms();

if (bIsVbv)
- {
- // Update encoded bits, satdCost, baseQP for each CU
- curEncData.m_rowStat[row].rowSatd += curEncData.m_cuStat[cuAddr].vbvCost;
- curEncData.m_rowStat[row].rowIntraSatd += curEncData.m_cuStat[cuAddr].intraVbvCost;
- curEncData.m_rowStat[row].encodedBits += curEncData.m_cuStat[cuAddr].totalBits;
- curEncData.m_rowStat[row].sumQpRc += curEncData.m_cuStat[cuAddr].baseQp;
- curEncData.m_rowStat[row].numEncodedCUs = cuAddr;
-
+ {
+ // Update encoded bits, satdCost, baseQP for each CU if tune grain is disabled
+ if ((m_param->bEnableWavefront && (!cuAddr || !m_param->rc.bEnableConstVbv)) || !m_param->bEnableWavefront)
+ {
+ curEncData.m_rowStat[row].rowSatd += curEncData.m_cuStat[cuAddr].vbvCost;
+ curEncData.m_rowStat[row].rowIntraSatd += curEncData.m_cuStat[cuAddr].intraVbvCost;
+ curEncData.m_rowStat[row].encodedBits += curEncData.m_cuStat[cuAddr].totalBits;
+ curEncData.m_rowStat[row].sumQpRc += curEncData.m_cuStat[cuAddr].baseQp;
+ curEncData.m_rowStat[row].numEncodedCUs = cuAddr;
+ }
+
// If current block is at row end checkpoint, call vbv ratecontrol.

if (!m_param->bEnableWavefront && col == numCols - 1)


else if (m_param->bEnableWavefront && row == col && row)
{
+ if (m_param->rc.bEnableConstVbv)
+ {
+ int32_t startCuAddr = numCols * row;
+ int32_t EndCuAddr = startCuAddr + col;
+ for (int32_t r = row; r >= 0; r--)
+ {
+ for (int32_t c = startCuAddr; c <= EndCuAddr && c <= (int32_t)numCols * (r + 1) - 1; c++)
+ {
+ curEncData.m_rowStat[r].rowSatd += curEncData.m_cuStat[c].vbvCost;
+ curEncData.m_rowStat[r].rowIntraSatd += curEncData.m_cuStat[c].intraVbvCost;
+ curEncData.m_rowStat[r].encodedBits += curEncData.m_cuStat[c].totalBits;
+ curEncData.m_rowStat[r].sumQpRc += curEncData.m_cuStat[c].baseQp;
+ curEncData.m_rowStat[r].numEncodedCUs = c;
+ }
+ startCuAddr = EndCuAddr - numCols;
+ EndCuAddr = startCuAddr + 1;
+ }
+ }
double qpBase = curEncData.m_cuStat[cuAddr].baseQp;
int reEncode = m_top->m_rateControl->rowVbvRateControl(m_frame, row, &m_rce, qpBase);
qpBase = x265_clip3((double)m_param->rc.qpMin, (double)m_param->rc.qpMax, qpBase);

}

/** this row of CTUs has been compressed **/
287
+ if (m_param->bEnableWavefront && m_param->rc.bEnableConstVbv)
288
+ {
289
+ if (row == m_numRows - 1)
290
+ {
291
+ for (int32_t r = 0; r < (int32_t)m_numRows; r++)
292
+ {
293
+ for (int32_t c = curEncData.m_rowStat[r].numEncodedCUs + 1; c < (int32_t)numCols * (r + 1); c++)
294
+ {
295
+ curEncData.m_rowStat[r].rowSatd += curEncData.m_cuStat[c].vbvCost;
296
+ curEncData.m_rowStat[r].rowIntraSatd += curEncData.m_cuStat[c].intraVbvCost;
297
+ curEncData.m_rowStat[r].encodedBits += curEncData.m_cuStat[c].totalBits;
298
+ curEncData.m_rowStat[r].sumQpRc += curEncData.m_cuStat[c].baseQp;
299
+ curEncData.m_rowStat[r].numEncodedCUs = c;
300
+ }
301
+ }
302
+ }
303
+ }
304
305
/* If encoding with ABR, update update bits and complexity in rate control
306
* after a number of rows so the next frame's rateControlStart has more
307
308
}
309
}
310
311
- tld.analysis.m_param = NULL;
312
curRow.busy = false;
313
314
// CHECK_ME: Does it always FALSE condition?
315
316
int FrameEncoder::collectCTUStatistics(const CUData& ctu, FrameStats* log)
317
{
318
int totQP = 0;
319
- if (ctu.m_slice->m_sliceType == I_SLICE)
320
+ uint32_t depth = 0;
321
+ for (uint32_t absPartIdx = 0; absPartIdx < ctu.m_numPartitions; absPartIdx += ctu.m_numPartitions >> (depth * 2))
322
{
323
- uint32_t depth = 0;
324
- for (uint32_t absPartIdx = 0; absPartIdx < ctu.m_numPartitions; absPartIdx += ctu.m_numPartitions >> (depth * 2))
325
- {
326
- depth = ctu.m_cuDepth[absPartIdx];
327
-
328
- log->totalCu++;
329
- log->cntIntra[depth]++;
330
- totQP += ctu.m_qp[absPartIdx] * (ctu.m_numPartitions >> (depth * 2));
331
-
332
- if (ctu.m_predMode[absPartIdx] == MODE_NONE)
333
- {
334
- log->totalCu--;
335
- log->cntIntra[depth]--;
336
- }
337
- else if (ctu.m_partSize[absPartIdx] != SIZE_2Nx2N)
338
- {
339
- /* TODO: log intra modes at absPartIdx +0 to +3 */
340
- X265_CHECK(ctu.m_log2CUSize[absPartIdx] == 3 && ctu.m_slice->m_sps->quadtreeTULog2MinSize < 3, "Intra NxN found at improbable depth\n");
341
- log->cntIntraNxN++;
342
- log->cntIntra[depth]--;
343
- }
344
- else if (ctu.m_lumaIntraDir[absPartIdx] > 1)
345
- log->cuIntraDistribution[depth][ANGULAR_MODE_ID]++;
346
- else
347
- log->cuIntraDistribution[depth][ctu.m_lumaIntraDir[absPartIdx]]++;
348
- }
349
+ depth = ctu.m_cuDepth[absPartIdx];
350
+ totQP += ctu.m_qp[absPartIdx] * (ctu.m_numPartitions >> (depth * 2));
351
}
352
- else
353
+
354
+ if (m_param->csvLogLevel >= 1 || m_param->rc.bStatWrite)
355
{
356
- uint32_t depth = 0;
357
- for (uint32_t absPartIdx = 0; absPartIdx < ctu.m_numPartitions; absPartIdx += ctu.m_numPartitions >> (depth * 2))
358
+ if (ctu.m_slice->m_sliceType == I_SLICE)
359
{
360
- depth = ctu.m_cuDepth[absPartIdx];
361
-
362
- log->totalCu++;
363
- totQP += ctu.m_qp[absPartIdx] * (ctu.m_numPartitions >> (depth * 2));
364
-
365
- if (ctu.m_predMode[absPartIdx] == MODE_NONE)
366
- log->totalCu--;
367
- else if (ctu.isSkipped(absPartIdx))
368
+ depth = 0;
369
+ for (uint32_t absPartIdx = 0; absPartIdx < ctu.m_numPartitions; absPartIdx += ctu.m_numPartitions >> (depth * 2))
370
{
371
- if (ctu.m_mergeFlag[0])
372
- log->cntMergeCu[depth]++;
373
- else
374
- log->cntSkipCu[depth]++;
375
- }
376
- else if (ctu.isInter(absPartIdx))
377
- {
378
- log->cntInter[depth]++;
379
+ depth = ctu.m_cuDepth[absPartIdx];
380
381
- if (ctu.m_partSize[absPartIdx] < AMP_ID)
382
- log->cuInterDistribution[depth][ctu.m_partSize[absPartIdx]]++;
383
- else
384
- log->cuInterDistribution[depth][AMP_ID]++;
385
- }
386
- else if (ctu.isIntra(absPartIdx))
387
- {
388
+ log->totalCu++;
389
log->cntIntra[depth]++;
390
391
- if (ctu.m_partSize[absPartIdx] != SIZE_2Nx2N)
392
+ if (ctu.m_predMode[absPartIdx] == MODE_NONE)
393
+ {
394
+ log->totalCu--;
395
+ log->cntIntra[depth]--;
396
+ }
397
+ else if (ctu.m_partSize[absPartIdx] != SIZE_2Nx2N)
398
{
399
+ /* TODO: log intra modes at absPartIdx +0 to +3 */
400
X265_CHECK(ctu.m_log2CUSize[absPartIdx] == 3 && ctu.m_slice->m_sps->quadtreeTULog2MinSize < 3, "Intra NxN found at improbable depth\n");
401
log->cntIntraNxN++;
402
log->cntIntra[depth]--;
403
- /* TODO: log intra modes at absPartIdx +0 to +3 */
404
}
405
else if (ctu.m_lumaIntraDir[absPartIdx] > 1)
406
log->cuIntraDistribution[depth][ANGULAR_MODE_ID]++;
407
408
log->cuIntraDistribution[depth][ctu.m_lumaIntraDir[absPartIdx]]++;
409
}
410
}
411
+ else
412
+ {
413
+ depth = 0;
414
+ for (uint32_t absPartIdx = 0; absPartIdx < ctu.m_numPartitions; absPartIdx += ctu.m_numPartitions >> (depth * 2))
415
+ {
416
+ depth = ctu.m_cuDepth[absPartIdx];
417
+
418
+ log->totalCu++;
419
+
420
+ if (ctu.m_predMode[absPartIdx] == MODE_NONE)
421
+ log->totalCu--;
422
+ else if (ctu.isSkipped(absPartIdx))
423
+ {
424
+ if (ctu.m_mergeFlag[0])
425
+ log->cntMergeCu[depth]++;
426
+ else
427
+ log->cntSkipCu[depth]++;
428
+ }
429
+ else if (ctu.isInter(absPartIdx))
430
+ {
431
+ log->cntInter[depth]++;
432
+
433
+ if (ctu.m_partSize[absPartIdx] < AMP_ID)
434
+ log->cuInterDistribution[depth][ctu.m_partSize[absPartIdx]]++;
435
+ else
436
+ log->cuInterDistribution[depth][AMP_ID]++;
437
+ }
438
+ else if (ctu.isIntra(absPartIdx))
439
+ {
440
+ log->cntIntra[depth]++;
441
+
442
+ if (ctu.m_partSize[absPartIdx] != SIZE_2Nx2N)
443
+ {
444
+ X265_CHECK(ctu.m_log2CUSize[absPartIdx] == 3 && ctu.m_slice->m_sps->quadtreeTULog2MinSize < 3, "Intra NxN found at improbable depth\n");
445
+ log->cntIntraNxN++;
446
+ log->cntIntra[depth]--;
447
+ /* TODO: log intra modes at absPartIdx +0 to +3 */
448
+ }
449
+ else if (ctu.m_lumaIntraDir[absPartIdx] > 1)
450
+ log->cuIntraDistribution[depth][ANGULAR_MODE_ID]++;
451
+ else
452
+ log->cuIntraDistribution[depth][ctu.m_lumaIntraDir[absPartIdx]]++;
453
+ }
454
+ }
455
+ }
456
}
457
458
return totQP;
459
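Note on the collectCTUStatistics() hunk above: the reworked loop walks the CTU in z-order and advances absPartIdx by the number of 4x4 partitions covered by the CU it just visited, so each CU is counted once and its QP is weighted by its area. The following is only a minimal sketch of that traversal. `CtuSketch` and `sumWeightedQp` are hypothetical names invented for illustration and are not part of x265.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the per-CTU arrays read by the loop in the diff.
struct CtuSketch
{
    uint32_t numPartitions;        // number of 4x4 partitions in the CTU
    std::vector<uint8_t> cuDepth;  // quadtree depth stored per partition, z-order
    std::vector<int8_t>  qp;       // QP stored per partition, z-order
};

// Visit each CU once and return the QP sum weighted by CU area,
// where area is measured in 4x4 partitions.
inline int sumWeightedQp(const CtuSketch& ctu)
{
    assert(ctu.cuDepth.size() == ctu.numPartitions);
    assert(ctu.qp.size() == ctu.numPartitions);

    int totQP = 0;
    uint32_t absPartIdx = 0;
    while (absPartIdx < ctu.numPartitions)
    {
        uint32_t depth = ctu.cuDepth[absPartIdx];
        // Each extra depth level quarters the CU, hence the shift by depth * 2.
        uint32_t partsInCu = ctu.numPartitions >> (depth * 2);
        totQP += ctu.qp[absPartIdx] * (int)partsInCu;
        absPartIdx += partsInCu; // jump to the first partition of the next CU
    }
    return totQP;
}
```

With 16 partitions and every CU at depth 1, the loop visits partitions 0, 4, 8 and 12 only, which is why the diff can hoist the QP sum out of the statistics branch without changing its value.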
x265_2.4.tar.gz/source/encoder/framefilter.cpp -> x265_2.5.tar.gz/source/encoder/framefilter.cpp
Changed
351
1
2
static uint64_t computeSSD(pixel *fenc, pixel *rec, intptr_t stride, uint32_t width, uint32_t height);
3
static float calculateSSIM(pixel *pix1, intptr_t stride1, pixel *pix2, intptr_t stride2, uint32_t width, uint32_t height, void *buf, uint32_t& cnt);
4
5
-static void integral_init4h(uint32_t *sum, pixel *pix, intptr_t stride)
6
+namespace X265_NS
7
{
8
- int32_t v = pix[0] + pix[1] + pix[2] + pix[3];
9
- for (int16_t x = 0; x < stride - 4; x++)
10
+ static void integral_init4h_c(uint32_t *sum, pixel *pix, intptr_t stride)
11
{
12
- sum[x] = v + sum[x - stride];
13
- v += pix[x + 4] - pix[x];
14
+ int32_t v = pix[0] + pix[1] + pix[2] + pix[3];
15
+ for (int16_t x = 0; x < stride - 4; x++)
16
+ {
17
+ sum[x] = v + sum[x - stride];
18
+ v += pix[x + 4] - pix[x];
19
+ }
20
}
21
-}
22
23
-static void integral_init8h(uint32_t *sum, pixel *pix, intptr_t stride)
24
-{
25
- int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7];
26
- for (int16_t x = 0; x < stride - 8; x++)
27
+ static void integral_init8h_c(uint32_t *sum, pixel *pix, intptr_t stride)
28
{
29
- sum[x] = v + sum[x - stride];
30
- v += pix[x + 8] - pix[x];
31
+ int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7];
32
+ for (int16_t x = 0; x < stride - 8; x++)
33
+ {
34
+ sum[x] = v + sum[x - stride];
35
+ v += pix[x + 8] - pix[x];
36
+ }
37
}
38
-}
39
40
-static void integral_init12h(uint32_t *sum, pixel *pix, intptr_t stride)
41
-{
42
- int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
43
- pix[8] + pix[9] + pix[10] + pix[11];
44
- for (int16_t x = 0; x < stride - 12; x++)
45
+ static void integral_init12h_c(uint32_t *sum, pixel *pix, intptr_t stride)
46
{
47
- sum[x] = v + sum[x - stride];
48
- v += pix[x + 12] - pix[x];
49
+ int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
50
+ pix[8] + pix[9] + pix[10] + pix[11];
51
+ for (int16_t x = 0; x < stride - 12; x++)
52
+ {
53
+ sum[x] = v + sum[x - stride];
54
+ v += pix[x + 12] - pix[x];
55
+ }
56
}
57
-}
58
59
-static void integral_init16h(uint32_t *sum, pixel *pix, intptr_t stride)
60
-{
61
- int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
62
- pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15];
63
- for (int16_t x = 0; x < stride - 16; x++)
64
+ static void integral_init16h_c(uint32_t *sum, pixel *pix, intptr_t stride)
65
{
66
- sum[x] = v + sum[x - stride];
67
- v += pix[x + 16] - pix[x];
68
+ int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
69
+ pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15];
70
+ for (int16_t x = 0; x < stride - 16; x++)
71
+ {
72
+ sum[x] = v + sum[x - stride];
73
+ v += pix[x + 16] - pix[x];
74
+ }
75
}
76
-}
77
78
-static void integral_init24h(uint32_t *sum, pixel *pix, intptr_t stride)
79
-{
80
- int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
81
- pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15] +
82
- pix[16] + pix[17] + pix[18] + pix[19] + pix[20] + pix[21] + pix[22] + pix[23];
83
- for (int16_t x = 0; x < stride - 24; x++)
84
+ static void integral_init24h_c(uint32_t *sum, pixel *pix, intptr_t stride)
85
{
86
- sum[x] = v + sum[x - stride];
87
- v += pix[x + 24] - pix[x];
88
+ int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
89
+ pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15] +
90
+ pix[16] + pix[17] + pix[18] + pix[19] + pix[20] + pix[21] + pix[22] + pix[23];
91
+ for (int16_t x = 0; x < stride - 24; x++)
92
+ {
93
+ sum[x] = v + sum[x - stride];
94
+ v += pix[x + 24] - pix[x];
95
+ }
96
}
97
-}
98
99
-static void integral_init32h(uint32_t *sum, pixel *pix, intptr_t stride)
100
-{
101
- int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
102
- pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15] +
103
- pix[16] + pix[17] + pix[18] + pix[19] + pix[20] + pix[21] + pix[22] + pix[23] +
104
- pix[24] + pix[25] + pix[26] + pix[27] + pix[28] + pix[29] + pix[30] + pix[31];
105
- for (int16_t x = 0; x < stride - 32; x++)
106
+ static void integral_init32h_c(uint32_t *sum, pixel *pix, intptr_t stride)
107
{
108
- sum[x] = v + sum[x - stride];
109
- v += pix[x + 32] - pix[x];
110
+ int32_t v = pix[0] + pix[1] + pix[2] + pix[3] + pix[4] + pix[5] + pix[6] + pix[7] +
111
+ pix[8] + pix[9] + pix[10] + pix[11] + pix[12] + pix[13] + pix[14] + pix[15] +
112
+ pix[16] + pix[17] + pix[18] + pix[19] + pix[20] + pix[21] + pix[22] + pix[23] +
113
+ pix[24] + pix[25] + pix[26] + pix[27] + pix[28] + pix[29] + pix[30] + pix[31];
114
+ for (int16_t x = 0; x < stride - 32; x++)
115
+ {
116
+ sum[x] = v + sum[x - stride];
117
+ v += pix[x + 32] - pix[x];
118
+ }
119
}
120
-}
121
122
-static void integral_init4v(uint32_t *sum4, intptr_t stride)
123
-{
124
- for (int x = 0; x < stride; x++)
125
- sum4[x] = sum4[x + 4 * stride] - sum4[x];
126
-}
127
+ static void integral_init4v_c(uint32_t *sum4, intptr_t stride)
128
+ {
129
+ for (int x = 0; x < stride; x++)
130
+ sum4[x] = sum4[x + 4 * stride] - sum4[x];
131
+ }
132
133
-static void integral_init8v(uint32_t *sum8, intptr_t stride)
134
-{
135
- for (int x = 0; x < stride; x++)
136
- sum8[x] = sum8[x + 8 * stride] - sum8[x];
137
-}
138
+ static void integral_init8v_c(uint32_t *sum8, intptr_t stride)
139
+ {
140
+ for (int x = 0; x < stride; x++)
141
+ sum8[x] = sum8[x + 8 * stride] - sum8[x];
142
+ }
143
144
-static void integral_init12v(uint32_t *sum12, intptr_t stride)
145
-{
146
- for (int x = 0; x < stride; x++)
147
- sum12[x] = sum12[x + 12 * stride] - sum12[x];
148
-}
149
+ static void integral_init12v_c(uint32_t *sum12, intptr_t stride)
150
+ {
151
+ for (int x = 0; x < stride; x++)
152
+ sum12[x] = sum12[x + 12 * stride] - sum12[x];
153
+ }
154
155
-static void integral_init16v(uint32_t *sum16, intptr_t stride)
156
-{
157
- for (int x = 0; x < stride; x++)
158
- sum16[x] = sum16[x + 16 * stride] - sum16[x];
159
-}
160
+ static void integral_init16v_c(uint32_t *sum16, intptr_t stride)
161
+ {
162
+ for (int x = 0; x < stride; x++)
163
+ sum16[x] = sum16[x + 16 * stride] - sum16[x];
164
+ }
165
166
-static void integral_init24v(uint32_t *sum24, intptr_t stride)
167
-{
168
- for (int x = 0; x < stride; x++)
169
- sum24[x] = sum24[x + 24 * stride] - sum24[x];
170
-}
171
+ static void integral_init24v_c(uint32_t *sum24, intptr_t stride)
172
+ {
173
+ for (int x = 0; x < stride; x++)
174
+ sum24[x] = sum24[x + 24 * stride] - sum24[x];
175
+ }
176
177
-static void integral_init32v(uint32_t *sum32, intptr_t stride)
178
-{
179
- for (int x = 0; x < stride; x++)
180
- sum32[x] = sum32[x + 32 * stride] - sum32[x];
181
+ static void integral_init32v_c(uint32_t *sum32, intptr_t stride)
182
+ {
183
+ for (int x = 0; x < stride; x++)
184
+ sum32[x] = sum32[x + 32 * stride] - sum32[x];
185
+ }
186
+
187
+ void setupSeaIntegralPrimitives_c(EncoderPrimitives &p)
188
+ {
189
+ p.integral_initv[INTEGRAL_4] = integral_init4v_c;
190
+ p.integral_initv[INTEGRAL_8] = integral_init8v_c;
191
+ p.integral_initv[INTEGRAL_12] = integral_init12v_c;
192
+ p.integral_initv[INTEGRAL_16] = integral_init16v_c;
193
+ p.integral_initv[INTEGRAL_24] = integral_init24v_c;
194
+ p.integral_initv[INTEGRAL_32] = integral_init32v_c;
195
+ p.integral_inith[INTEGRAL_4] = integral_init4h_c;
196
+ p.integral_inith[INTEGRAL_8] = integral_init8h_c;
197
+ p.integral_inith[INTEGRAL_12] = integral_init12h_c;
198
+ p.integral_inith[INTEGRAL_16] = integral_init16h_c;
199
+ p.integral_inith[INTEGRAL_24] = integral_init24h_c;
200
+ p.integral_inith[INTEGRAL_32] = integral_init32h_c;
201
+ }
202
}
203
204
void FrameFilter::destroy()
205
206
m_pad[0] = top->m_sps.conformanceWindow.rightOffset;
207
m_pad[1] = top->m_sps.conformanceWindow.bottomOffset;
208
m_saoRowDelay = m_param->bEnableLoopFilter ? 1 : 0;
209
- m_lastHeight = (m_param->sourceHeight % g_maxCUSize) ? (m_param->sourceHeight % g_maxCUSize) : g_maxCUSize;
210
- m_lastWidth = (m_param->sourceWidth % g_maxCUSize) ? (m_param->sourceWidth % g_maxCUSize) : g_maxCUSize;
211
+ m_lastHeight = (m_param->sourceHeight % m_param->maxCUSize) ? (m_param->sourceHeight % m_param->maxCUSize) : m_param->maxCUSize;
212
+ m_lastWidth = (m_param->sourceWidth % m_param->maxCUSize) ? (m_param->sourceWidth % m_param->maxCUSize) : m_param->maxCUSize;
213
integralCompleted.set(0);
214
215
if (m_param->bEnableSsim)
216
217
for(int row = 0; row < numRows; row++)
218
{
219
// Setting maximum bound information
220
- m_parallelFilter[row].m_rowHeight = (row == numRows - 1) ? m_lastHeight : g_maxCUSize;
221
+ m_parallelFilter[row].m_rowHeight = (row == numRows - 1) ? m_lastHeight : m_param->maxCUSize;
222
m_parallelFilter[row].m_row = row;
223
m_parallelFilter[row].m_rowAddr = row * numCols;
224
m_parallelFilter[row].m_frameFilter = this;
225
226
void FrameFilter::ParallelFilter::copySaoAboveRef(const CUData *ctu, PicYuv* reconPic, uint32_t cuAddr, int col)
227
{
228
// Copy SAO Top Reference Pixels
229
- int ctuWidth = g_maxCUSize;
230
+ int ctuWidth = ctu->m_encData->m_param->maxCUSize;
231
const pixel* recY = reconPic->getPlaneAddr(0, cuAddr) - (ctu->m_bFirstRowInSlice ? 0 : reconPic->m_stride);
232
233
// Luma
234
235
intptr_t stride2 = m_frame->m_fencPic->m_stride;
236
uint32_t bEnd = ((row) == (this->m_numRows - 1));
237
uint32_t bStart = (row == 0);
238
- uint32_t minPixY = row * g_maxCUSize - 4 * !bStart;
239
- uint32_t maxPixY = X265_MIN((row + 1) * g_maxCUSize - 4 * !bEnd, (uint32_t)m_param->sourceHeight);
240
+ uint32_t minPixY = row * m_param->maxCUSize - 4 * !bStart;
241
+ uint32_t maxPixY = X265_MIN((row + 1) * m_param->maxCUSize - 4 * !bEnd, (uint32_t)m_param->sourceHeight);
242
uint32_t ssim_cnt;
243
x265_emms();
244
245
246
uint32_t width = reconPic->m_picWidth;
247
uint32_t height = m_parallelFilter[row].getCUHeight();
248
intptr_t stride = reconPic->m_stride;
249
- uint32_t cuHeight = g_maxCUSize;
250
+ uint32_t cuHeight = m_param->maxCUSize;
251
252
if (!row)
253
m_frameEncoder->m_checksum[0] = 0;
254
255
}
256
257
int stride = (int)m_frame->m_reconPic->m_stride;
258
- int padX = g_maxCUSize + 32;
259
- int padY = g_maxCUSize + 16;
260
+ int padX = m_param->maxCUSize + 32;
261
+ int padY = m_param->maxCUSize + 16;
262
int numCuInHeight = m_frame->m_encData->m_slice->m_sps->numCuInHeight;
263
- int maxHeight = numCuInHeight * g_maxCUSize;
264
+ int maxHeight = numCuInHeight * m_param->maxCUSize;
265
int startRow = 0;
266
267
if (m_param->interlaceMode)
268
- startRow = (row * g_maxCUSize >> 1);
269
+ startRow = (row * m_param->maxCUSize >> 1);
270
else
271
- startRow = row * g_maxCUSize;
272
+ startRow = row * m_param->maxCUSize;
273
274
- int height = lastRow ? (maxHeight + g_maxCUSize * m_param->interlaceMode) : (((row + m_param->interlaceMode) * g_maxCUSize) + g_maxCUSize);
275
+ int height = lastRow ? (maxHeight + m_param->maxCUSize * m_param->interlaceMode) : (((row + m_param->interlaceMode) * m_param->maxCUSize) + m_param->maxCUSize);
276
277
if (!row)
278
{
279
280
uint32_t *sum4x4 = m_frame->m_encData->m_meIntegral[11] + (y + 1) * stride - padX;
281
282
/*For width = 32 */
283
- integral_init32h(sum32x32, pix, stride);
284
+ primitives.integral_inith[INTEGRAL_32](sum32x32, pix, stride);
285
if (y >= 32 - padY)
286
- integral_init32v(sum32x32 - 32 * stride, stride);
287
- integral_init32h(sum32x24, pix, stride);
288
+ primitives.integral_initv[INTEGRAL_32](sum32x32 - 32 * stride, stride);
289
+ primitives.integral_inith[INTEGRAL_32](sum32x24, pix, stride);
290
if (y >= 24 - padY)
291
- integral_init24v(sum32x24 - 24 * stride, stride);
292
- integral_init32h(sum32x8, pix, stride);
293
+ primitives.integral_initv[INTEGRAL_24](sum32x24 - 24 * stride, stride);
294
+ primitives.integral_inith[INTEGRAL_32](sum32x8, pix, stride);
295
if (y >= 8 - padY)
296
- integral_init8v(sum32x8 - 8 * stride, stride);
297
+ primitives.integral_initv[INTEGRAL_8](sum32x8 - 8 * stride, stride);
298
/*For width = 24 */
299
- integral_init24h(sum24x32, pix, stride);
300
+ primitives.integral_inith[INTEGRAL_24](sum24x32, pix, stride);
301
if (y >= 32 - padY)
302
- integral_init32v(sum24x32 - 32 * stride, stride);
303
+ primitives.integral_initv[INTEGRAL_32](sum24x32 - 32 * stride, stride);
304
/*For width = 16 */
305
- integral_init16h(sum16x16, pix, stride);
306
+ primitives.integral_inith[INTEGRAL_16](sum16x16, pix, stride);
307
if (y >= 16 - padY)
308
- integral_init16v(sum16x16 - 16 * stride, stride);
309
- integral_init16h(sum16x12, pix, stride);
310
+ primitives.integral_initv[INTEGRAL_16](sum16x16 - 16 * stride, stride);
311
+ primitives.integral_inith[INTEGRAL_16](sum16x12, pix, stride);
312
if (y >= 12 - padY)
313
- integral_init12v(sum16x12 - 12 * stride, stride);
314
- integral_init16h(sum16x4, pix, stride);
315
+ primitives.integral_initv[INTEGRAL_12](sum16x12 - 12 * stride, stride);
316
+ primitives.integral_inith[INTEGRAL_16](sum16x4, pix, stride);
317
if (y >= 4 - padY)
318
- integral_init4v(sum16x4 - 4 * stride, stride);
319
+ primitives.integral_initv[INTEGRAL_4](sum16x4 - 4 * stride, stride);
320
/*For width = 12 */
321
- integral_init12h(sum12x16, pix, stride);
322
+ primitives.integral_inith[INTEGRAL_12](sum12x16, pix, stride);
323
if (y >= 16 - padY)
324
- integral_init16v(sum12x16 - 16 * stride, stride);
325
+ primitives.integral_initv[INTEGRAL_16](sum12x16 - 16 * stride, stride);
326
/*For width = 8 */
327
- integral_init8h(sum8x32, pix, stride);
328
+ primitives.integral_inith[INTEGRAL_8](sum8x32, pix, stride);
329
if (y >= 32 - padY)
330
- integral_init32v(sum8x32 - 32 * stride, stride);
331
- integral_init8h(sum8x8, pix, stride);
332
+ primitives.integral_initv[INTEGRAL_32](sum8x32 - 32 * stride, stride);
333
+ primitives.integral_inith[INTEGRAL_8](sum8x8, pix, stride);
334
if (y >= 8 - padY)
335
- integral_init8v(sum8x8 - 8 * stride, stride);
336
+ primitives.integral_initv[INTEGRAL_8](sum8x8 - 8 * stride, stride);
337
/*For width = 4 */
338
- integral_init4h(sum4x16, pix, stride);
339
+ primitives.integral_inith[INTEGRAL_4](sum4x16, pix, stride);
340
if (y >= 16 - padY)
341
- integral_init16v(sum4x16 - 16 * stride, stride);
342
- integral_init4h(sum4x4, pix, stride);
343
+ primitives.integral_initv[INTEGRAL_16](sum4x16 - 16 * stride, stride);
344
+ primitives.integral_inith[INTEGRAL_4](sum4x4, pix, stride);
345
if (y >= 4 - padY)
346
- integral_init4v(sum4x4 - 4 * stride, stride);
347
+ primitives.integral_initv[INTEGRAL_4](sum4x4 - 4 * stride, stride);
348
}
349
m_parallelFilter[row].m_frameFilter->integralCompleted.set(1);
350
}
351
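Note on the framefilter.cpp hunk above: the integral_init*h_c / integral_init*v_c functions are now registered in a primitives table and called through it, presumably so that optimized versions can replace the C ones. The sketch below shows the same two-pass idea for a single window size and a two-entry table. It assumes 8-bit pixels, and `PrimitivesSketch`, `setupSketch`, `blockSums4x4` and the templated helpers are hypothetical names, not the x265 definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef uint8_t pixel; // assumption: 8-bit build

// Horizontal pass: sum[x] = (pix[x] + ... + pix[x + N - 1]) + the entry one row up.
template <int N>
void integralInitH(uint32_t* sum, const pixel* pix, intptr_t stride)
{
    int32_t v = 0;
    for (int i = 0; i < N; i++)
        v += pix[i];
    for (intptr_t x = 0; x < stride - N; x++)
    {
        sum[x] = v + sum[x - stride]; // caller must provide a valid row above
        v += pix[x + N] - pix[x];     // slide the window one pixel to the right
    }
}

// Vertical pass: turn running column totals into N-row window sums.
template <int N>
void integralInitV(uint32_t* sum, intptr_t stride)
{
    for (intptr_t x = 0; x < stride; x++)
        sum[x] = sum[x + N * stride] - sum[x];
}

enum IntegralSize { INTEGRAL_4, INTEGRAL_8, NUM_INTEGRAL_SIZE };
typedef void (*integralh_t)(uint32_t*, const pixel*, intptr_t);
typedef void (*integralv_t)(uint32_t*, intptr_t);

// Hypothetical, much reduced stand-in for the encoder primitives table.
struct PrimitivesSketch
{
    integralh_t inith[NUM_INTEGRAL_SIZE];
    integralv_t initv[NUM_INTEGRAL_SIZE];
};

inline void setupSketch(PrimitivesSketch& p)
{
    p.inith[INTEGRAL_4] = integralInitH<4>;
    p.inith[INTEGRAL_8] = integralInitH<8>;
    p.initv[INTEGRAL_4] = integralInitV<4>;
    p.initv[INTEGRAL_8] = integralInitV<8>;
}

// Returns a plane whose entry (y, x) is the sum of the 4x4 block with top-left
// corner (y, x), valid for y <= height - 4 and x < width - 4.
inline std::vector<uint32_t> blockSums4x4(const std::vector<pixel>& img, int width, int height)
{
    assert(width > 4 && height >= 4 && img.size() == (size_t)width * height);
    PrimitivesSketch p;
    setupSketch(p);
    const intptr_t stride = width;
    // One extra row: row 0 holds zeros until the vertical pass overwrites it.
    std::vector<uint32_t> sum((size_t)(height + 1) * width, 0);
    for (int y = 0; y < height; y++)
    {
        uint32_t* row = sum.data() + (y + 1) * stride;
        p.inith[INTEGRAL_4](row, img.data() + y * stride, stride);
        if (y >= 3)
            p.initv[INTEGRAL_4](row - 4 * stride, stride);
    }
    return sum;
}
```

The call order mirrors the diff: the horizontal pass fills the current row, then the vertical pass finalizes the row four lines earlier.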
x265_2.4.tar.gz/source/encoder/framefilter.h -> x265_2.5.tar.gz/source/encoder/framefilter.h
Changed
uint32_t getCUWidth(int colNum) const
{
- return (colNum == (int)m_numCols - 1) ? m_lastWidth : g_maxCUSize;
+ return (colNum == (int)m_numCols - 1) ? m_lastWidth : m_param->maxCUSize;
}

void init(Encoder *top, FrameEncoder *frame, int numRows, uint32_t numCols);
x265_2.4.tar.gz/source/encoder/motion.cpp -> x265_2.5.tar.gz/source/encoder/motion.cpp
Changed
158
1
2
}
3
}
4
5
+void MotionEstimate::refineMV(ReferencePlanes* ref,
6
+ const MV& mvmin,
7
+ const MV& mvmax,
8
+ const MV& qmvp,
9
+ MV& outQMv)
10
+{
11
+ ALIGN_VAR_16(int, costs[16]);
12
+ if (ctuAddr >= 0)
13
+ blockOffset = ref->reconPic->getLumaAddr(ctuAddr, absPartIdx) - ref->reconPic->getLumaAddr(0);
14
+ intptr_t stride = ref->lumaStride;
15
+ pixel* fenc = fencPUYuv.m_buf[0];
16
+ pixel* fref = ref->fpelPlane[0] + blockOffset;
17
+
18
+ setMVP(qmvp);
19
+
20
+ MV qmvmin = mvmin.toQPel();
21
+ MV qmvmax = mvmax.toQPel();
22
+
23
+ /* The term cost used here means satd/sad values for that particular search.
24
+ * The costs used in ME integer search only includes the SAD cost of motion
25
+ * residual and sqrtLambda times MVD bits. The subpel refine steps use SATD
26
+ * cost of residual and sqrtLambda * MVD bits.
27
+ */
28
+
29
+ // measure SATD cost at clipped QPEL MVP
30
+ MV pmv = qmvp.clipped(qmvmin, qmvmax);
31
+ MV bestpre = pmv;
32
+ int bprecost;
33
+
34
+ bprecost = subpelCompare(ref, pmv, sad);
35
+
36
+ /* re-measure full pel rounded MVP with SAD as search start point */
37
+ MV bmv = pmv.roundToFPel();
38
+ int bcost = bprecost;
39
+ if (pmv.isSubpel())
40
+ bcost = sad(fenc, FENC_STRIDE, fref + bmv.x + bmv.y * stride, stride) + mvcost(bmv << 2);
41
+
42
+ /* square refine */
43
+ int dir = 0;
44
+ COST_MV_X4_DIR(0, -1, 0, 1, -1, 0, 1, 0, costs);
45
+ if ((bmv.y - 1 >= mvmin.y) & (bmv.y - 1 <= mvmax.y))
46
+ COPY2_IF_LT(bcost, costs[0], dir, 1);
47
+ if ((bmv.y + 1 >= mvmin.y) & (bmv.y + 1 <= mvmax.y))
48
+ COPY2_IF_LT(bcost, costs[1], dir, 2);
49
+ COPY2_IF_LT(bcost, costs[2], dir, 3);
50
+ COPY2_IF_LT(bcost, costs[3], dir, 4);
51
+ COST_MV_X4_DIR(-1, -1, -1, 1, 1, -1, 1, 1, costs);
52
+ if ((bmv.y - 1 >= mvmin.y) & (bmv.y - 1 <= mvmax.y))
53
+ COPY2_IF_LT(bcost, costs[0], dir, 5);
54
+ if ((bmv.y + 1 >= mvmin.y) & (bmv.y + 1 <= mvmax.y))
55
+ COPY2_IF_LT(bcost, costs[1], dir, 6);
56
+ if ((bmv.y - 1 >= mvmin.y) & (bmv.y - 1 <= mvmax.y))
57
+ COPY2_IF_LT(bcost, costs[2], dir, 7);
58
+ if ((bmv.y + 1 >= mvmin.y) & (bmv.y + 1 <= mvmax.y))
59
+ COPY2_IF_LT(bcost, costs[3], dir, 8);
60
+ bmv += square1[dir];
61
+
62
+ if (bprecost < bcost)
63
+ {
64
+ bmv = bestpre;
65
+ bcost = bprecost;
66
+ }
67
+ else
68
+ bmv = bmv.toQPel(); // promote search bmv to qpel
69
+
70
+ // TO DO: Change SubpelWorkload to fine tune MV
71
+ // Now it is set to 5 for experiment.
72
+ // const SubpelWorkload& wl = workload[this->subpelRefine];
73
+ const SubpelWorkload& wl = workload[5];
74
+
75
+ pixelcmp_t hpelcomp;
76
+
77
+ if (wl.hpel_satd)
78
+ {
79
+ bcost = subpelCompare(ref, bmv, satd) + mvcost(bmv);
80
+ hpelcomp = satd;
81
+ }
82
+ else
83
+ hpelcomp = sad;
84
+
85
+ for (int iter = 0; iter < wl.hpel_iters; iter++)
86
+ {
87
+ int bdir = 0;
88
+ for (int i = 1; i <= wl.hpel_dirs; i++)
89
+ {
90
+ MV qmv = bmv + square1[i] * 2;
91
+
92
+ // check mv range for slice bound
93
+ if ((qmv.y < qmvmin.y) | (qmv.y > qmvmax.y))
94
+ continue;
95
+
96
+ int cost = subpelCompare(ref, qmv, hpelcomp) + mvcost(qmv);
97
+ COPY2_IF_LT(bcost, cost, bdir, i);
98
+ }
99
+
100
+ if (bdir)
101
+ bmv += square1[bdir] * 2;
102
+ else
103
+ break;
104
+ }
105
+
106
+ /* if HPEL search used SAD, remeasure with SATD before QPEL */
107
+ if (!wl.hpel_satd)
108
+ bcost = subpelCompare(ref, bmv, satd) + mvcost(bmv);
109
+
110
+ for (int iter = 0; iter < wl.qpel_iters; iter++)
111
+ {
112
+ int bdir = 0;
113
+ for (int i = 1; i <= wl.qpel_dirs; i++)
114
+ {
115
+ MV qmv = bmv + square1[i];
116
+
117
+ // check mv range for slice bound
118
+ if ((qmv.y < qmvmin.y) | (qmv.y > qmvmax.y))
119
+ continue;
120
+
121
+ int cost = subpelCompare(ref, qmv, satd) + mvcost(qmv);
122
+ COPY2_IF_LT(bcost, cost, bdir, i);
123
+ }
124
+
125
+ if (bdir)
126
+ bmv += square1[bdir];
127
+ else
128
+ break;
129
+ }
130
+
131
+ // check mv range for slice bound
132
+ X265_CHECK(((pmv.y >= qmvmin.y) & (pmv.y <= qmvmax.y)), "mv beyond range!");
133
+
134
+ x265_emms();
135
+ outQMv = bmv;
136
+}
137
+
138
int MotionEstimate::motionEstimate(ReferencePlanes *ref,
139
const MV & mvmin,
140
const MV & mvmax,
141
142
const MV * mvc,
143
int merange,
144
MV & outQMv,
145
+ uint32_t maxSlices,
146
pixel * srcReferencePlane)
147
{
148
ALIGN_VAR_16(int, costs[16]);
149
150
const SubpelWorkload& wl = workload[this->subpelRefine];
151
152
// check mv range for slice bound
153
- if ((g_maxSlices > 1) & ((bmv.y < qmvmin.y) | (bmv.y > qmvmax.y)))
154
+ if ((maxSlices > 1) & ((bmv.y < qmvmin.y) | (bmv.y > qmvmax.y)))
155
{
156
bmv.y = x265_min(x265_max(bmv.y, qmvmin.y), qmvmax.y);
157
bcost = subpelCompare(ref, bmv, satd) + mvcost(bmv);
158
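Note on the refineMV() hunk above: the half-pel and quarter-pel stages both follow the same greedy pattern. They test the neighbours on a square around the current best vector, skip candidates whose vertical component is out of range, move to the best improvement, and stop once no neighbour is cheaper. The sketch below isolates that pattern under simplifying assumptions: the cost is an arbitrary callback rather than SAD/SATD plus MV bits, and `MvSketch`, `kSquare` and `refineSquare` are hypothetical names.

```cpp
#include <cassert>
#include <functional>

struct MvSketch { int x, y; };

// Offsets on a unit square; index 0 means "stay where you are".
static const MvSketch kSquare[9] = {
    { 0, 0 }, { 0, -1 }, { 0, 1 }, { -1, 0 }, { 1, 0 },
    { -1, -1 }, { -1, 1 }, { 1, -1 }, { 1, 1 }
};

// Greedy square refinement: repeat up to maxIters times, each time moving to
// the cheapest in-range neighbour at distance `step`, or stop if none improves.
inline MvSketch refineSquare(MvSketch start, int step, int maxIters, int minY, int maxY,
                             const std::function<int(MvSketch)>& cost)
{
    assert(step > 0 && minY <= maxY);
    MvSketch best = start;
    int bestCost = cost(best);
    for (int iter = 0; iter < maxIters; iter++)
    {
        int bestDir = 0;
        for (int i = 1; i <= 8; i++)
        {
            MvSketch cand = { best.x + kSquare[i].x * step, best.y + kSquare[i].y * step };
            if (cand.y < minY || cand.y > maxY)
                continue; // outside the allowed vertical range
            int c = cost(cand);
            if (c < bestCost)
            {
                bestCost = c;
                bestDir = i;
            }
        }
        if (!bestDir)
            break; // no neighbour is cheaper, so the search has converged
        best.x += kSquare[bestDir].x * step;
        best.y += kSquare[bestDir].y * step;
    }
    return best;
}
```

In quarter-pel units, a half-pel stage would correspond to step 2 and a quarter-pel stage to step 1, matching the `square1[i] * 2` and `square1[i]` offsets in the diff.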
x265_2.4.tar.gz/source/encoder/motion.h -> x265_2.5.tar.gz/source/encoder/motion.h
Changed
chromaSatd(refYuv.getCrAddr(puPartIdx), refYuv.m_csize, fencPUYuv.m_buf[2], fencPUYuv.m_csize);
}

- int motionEstimate(ReferencePlanes* ref, const MV & mvmin, const MV & mvmax, const MV & qmvp, int numCandidates, const MV * mvc, int merange, MV & outQMv, pixel *srcReferencePlane = 0);
+ void refineMV(ReferencePlanes* ref, const MV& mvmin, const MV& mvmax, const MV& qmvp, MV& outQMv);
+ int motionEstimate(ReferencePlanes* ref, const MV & mvmin, const MV & mvmax, const MV & qmvp, int numCandidates, const MV * mvc, int merange, MV & outQMv, uint32_t maxSlices, pixel *srcReferencePlane = 0);

int subpelCompare(ReferencePlanes* ref, const MV &qmv, pixelcmp_t);
x265_2.4.tar.gz/source/encoder/ratecontrol.cpp -> x265_2.5.tar.gz/source/encoder/ratecontrol.cpp
Changed
uint32_t refRowSatdCost = 0, refRowBits = 0, intraCostForPendingCus = 0;
double refQScale = 0;

- if (picType != I_SLICE)
+ if (picType != I_SLICE && !m_param->rc.bEnableConstVbv)
{
FrameData& refEncData = *refFrame->m_encData;
uint32_t endCuAddr = maxCols * (row + 1);

&& refFrame
&& refFrame->m_encData->m_slice->m_sliceType == picType
&& refQScale > 0
- && refRowSatdCost > 0)
+ && refRowBits > 0
+ && !m_param->rc.bEnableConstVbv)
{
if (abs((int32_t)(refRowSatdCost - satdCostForPendingCus)) < (int32_t)satdCostForPendingCus / 2)
{

}
rowSatdCost >>= X265_DEPTH - 8;
updatePredictor(rce->rowPred[0], qScaleVbv, (double)rowSatdCost, encodedBits);
- if (curEncData.m_slice->m_sliceType != I_SLICE)
+ if (curEncData.m_slice->m_sliceType != I_SLICE && !m_param->rc.bEnableConstVbv)
{
Frame* refFrame = curEncData.m_slice->m_refFrameList[0][0];
if (qpVbv < refFrame->m_encData->m_rowStat[row].rowQp)

for (uint32_t i = 0; i < slice->m_sps->numCuInHeight; i++)
avgQpAq += curEncData.m_rowStat[i].sumQpAq;

- avgQpAq /= (slice->m_sps->numCUsInFrame * NUM_4x4_PARTITIONS);
+ avgQpAq /= (slice->m_sps->numCUsInFrame * m_param->num4x4Partitions);
curEncData.m_avgQpAq = avgQpAq;
}
else

{
*filler = updateVbv(actualBits, rce);

+ curFrame->m_rcData->bufferFillFinal = m_bufferFillFinal;
+ for (int i = 0; i < 4; i++)
+ {
+ curFrame->m_rcData->coeff[i] = m_pred[i].coeff;
+ curFrame->m_rcData->count[i] = m_pred[i].count;
+ curFrame->m_rcData->offset[i] = m_pred[i].offset;
+ }
if (m_param->bEmitHRDSEI)
{
const VUI *vui = &curEncData.m_slice->m_sps->vuiParameters;
x265_2.4.tar.gz/source/encoder/reference.cpp -> x265_2.5.tar.gz/source/encoder/reference.cpp
Changed
36
1
2
3
if (wp)
4
{
5
- uint32_t numCUinHeight = (reconPic->m_picHeight + g_maxCUSize - 1) / g_maxCUSize;
6
+ uint32_t numCUinHeight = (reconPic->m_picHeight + p.maxCUSize - 1) / p.maxCUSize;
7
8
int marginX = reconPic->m_lumaMarginX;
9
int marginY = reconPic->m_lumaMarginY;
10
intptr_t stride = reconPic->m_stride;
11
- int cuHeight = g_maxCUSize;
12
+ int cuHeight = p.maxCUSize;
13
14
for (int c = 0; c < (p.internalCsp != X265_CSP_I400 && recPic->m_picCsp != X265_CSP_I400 ? numInterpPlanes : 1); c++)
15
{
16
17
int marginY = reconPic->m_lumaMarginY;
18
intptr_t stride = reconPic->m_stride;
19
int width = reconPic->m_picWidth;
20
- int height = (finishedRows - numWeightedRows) * g_maxCUSize;
21
+ int height = (finishedRows - numWeightedRows) * reconPic->m_param->maxCUSize;
22
/* the last row may be partial height */
23
if (finishedRows == maxNumRows - 1)
24
{
25
- const int leftRows = (reconPic->m_picHeight & (g_maxCUSize - 1));
26
+ const int leftRows = (reconPic->m_picHeight & (reconPic->m_param->maxCUSize - 1));
27
28
- height += leftRows ? leftRows : g_maxCUSize;
29
+ height += leftRows ? leftRows : reconPic->m_param->maxCUSize;
30
}
31
- int cuHeight = g_maxCUSize;
32
+ int cuHeight = reconPic->m_param->maxCUSize;
33
34
for (int c = 0; c < numInterpPlanes; c++)
35
{
36
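Note on the reference.cpp hunk above: with g_maxCUSize replaced by a per-instance maxCUSize, the CTU row count and the height of the last, possibly partial, row are derived from the parameter set of each encoder. A minimal sketch of that arithmetic follows. The function names are hypothetical, and the mask form assumes maxCUSize is a power of two, as the `& (maxCUSize - 1)` expression in the diff implies.

```cpp
#include <cassert>
#include <cstdint>

// Number of CTU rows needed to cover picHeight (ceiling division).
inline uint32_t numCtuRows(uint32_t picHeight, uint32_t maxCUSize)
{
    assert(maxCUSize != 0 && (maxCUSize & (maxCUSize - 1)) == 0); // power of two
    return (picHeight + maxCUSize - 1) / maxCUSize;
}

// Height in pixels of the last CTU row: the remainder if there is one,
// otherwise a full CTU.
inline uint32_t lastCtuRowHeight(uint32_t picHeight, uint32_t maxCUSize)
{
    assert(maxCUSize != 0 && (maxCUSize & (maxCUSize - 1)) == 0); // power of two
    // For a power of two, masking with (maxCUSize - 1) equals picHeight % maxCUSize.
    uint32_t leftRows = picHeight & (maxCUSize - 1);
    return leftRows ? leftRows : maxCUSize;
}
```

For a 1080-line picture and 64x64 CTUs this gives 17 rows, the last of which is 56 lines tall.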
x265_2.4.tar.gz/source/encoder/sao.cpp -> x265_2.5.tar.gz/source/encoder/sao.cpp
Changed
118
1
2
m_hChromaShift = CHROMA_H_SHIFT(param->internalCsp);
3
m_vChromaShift = CHROMA_V_SHIFT(param->internalCsp);
4
5
- m_numCuInWidth = (m_param->sourceWidth + g_maxCUSize - 1) / g_maxCUSize;
6
- m_numCuInHeight = (m_param->sourceHeight + g_maxCUSize - 1) / g_maxCUSize;
7
+ m_numCuInWidth = (m_param->sourceWidth + m_param->maxCUSize - 1) / m_param->maxCUSize;
8
+ m_numCuInHeight = (m_param->sourceHeight + m_param->maxCUSize - 1) / m_param->maxCUSize;
9
const pixel maxY = (1 << X265_DEPTH) - 1;
const pixel rangeExt = maxY >> 1;

for (int i = 0; i < (param->internalCsp != X265_CSP_I400 ? 3 : 1); i++)
{
- CHECKED_MALLOC(m_tmpL1[i], pixel, g_maxCUSize + 1);
- CHECKED_MALLOC(m_tmpL2[i], pixel, g_maxCUSize + 1);
+ CHECKED_MALLOC(m_tmpL1[i], pixel, m_param->maxCUSize + 1);
+ CHECKED_MALLOC(m_tmpL2[i], pixel, m_param->maxCUSize + 1);

// SAO asm code will read 1 pixel before and after, so pad by 2
// NOTE: m_param->sourceWidth+2 enough, to avoid condition check in copySaoAboveRef(), I alloc more up to 63 bytes in here
- CHECKED_MALLOC(m_tmpU[i], pixel, m_numCuInWidth * g_maxCUSize + 2 + 32);
+ CHECKED_MALLOC(m_tmpU[i], pixel, m_numCuInWidth * m_param->maxCUSize + 2 + 32);
m_tmpU[i] += 1;
}

uint32_t picWidth = m_param->sourceWidth;
uint32_t picHeight = m_param->sourceHeight;
const CUData* cu = m_frame->m_encData->getPicCTU(addr);
- int ctuWidth = g_maxCUSize;
- int ctuHeight = g_maxCUSize;
+ int ctuWidth = m_param->maxCUSize;
+ int ctuHeight = m_param->maxCUSize;
uint32_t lpelx = cu->m_cuPelX;
uint32_t tpely = cu->m_cuPelY;
const uint32_t firstRowInSlice = cu->m_bFirstRowInSlice;

{
PicYuv* reconPic = m_frame->m_reconPic;
intptr_t stride = reconPic->m_stride;
- int ctuWidth = g_maxCUSize;
- int ctuHeight = g_maxCUSize;
+ int ctuWidth = m_param->maxCUSize;
+ int ctuHeight = m_param->maxCUSize;

int addr = idxY * m_numCuInWidth + idxX;
pixel* rec = reconPic->getLumaAddr(addr);

{
PicYuv* reconPic = m_frame->m_reconPic;
intptr_t stride = reconPic->m_strideC;
- int ctuWidth = g_maxCUSize;
- int ctuHeight = g_maxCUSize;
+ int ctuWidth = m_param->maxCUSize;
+ int ctuHeight = m_param->maxCUSize;

{
ctuWidth >>= m_hChromaShift;

intptr_t stride = plane ? reconPic->m_strideC : reconPic->m_stride;
uint32_t picWidth = m_param->sourceWidth;
uint32_t picHeight = m_param->sourceHeight;
- int ctuWidth = g_maxCUSize;
- int ctuHeight = g_maxCUSize;
+ int ctuWidth = m_param->maxCUSize;
+ int ctuHeight = m_param->maxCUSize;
uint32_t lpelx = cu->m_cuPelX;
uint32_t tpely = cu->m_cuPelY;
const uint32_t firstRowInSlice = cu->m_bFirstRowInSlice;

// WARNING: *) May read beyond bound on video than ctuWidth or ctuHeight is NOT multiple of cuSize
X265_CHECK((ctuWidth == ctuHeight) || (m_chromaFormat != X265_CSP_I420), "video size check failure\n");
if (plane)
- primitives.chroma[m_chromaFormat].cu[g_maxLog2CUSize - 2].sub_ps(diff, MAX_CU_SIZE, fenc0, rec0, stride, stride);
+ primitives.chroma[m_chromaFormat].cu[m_param->maxLog2CUSize - 2].sub_ps(diff, MAX_CU_SIZE, fenc0, rec0, stride, stride);
else
- primitives.cu[g_maxLog2CUSize - 2].sub_ps(diff, MAX_CU_SIZE, fenc0, rec0, stride, stride);
+ primitives.cu[m_param->maxLog2CUSize - 2].sub_ps(diff, MAX_CU_SIZE, fenc0, rec0, stride, stride);
}
else
{

intptr_t stride = reconPic->m_stride;
uint32_t picWidth = m_param->sourceWidth;
uint32_t picHeight = m_param->sourceHeight;
- int ctuWidth = g_maxCUSize;
- int ctuHeight = g_maxCUSize;
+ int ctuWidth = m_param->maxCUSize;
+ int ctuHeight = m_param->maxCUSize;
uint32_t lpelx = cu->m_cuPelX;
uint32_t tpely = cu->m_cuPelY;
const uint32_t firstRowInSlice = cu->m_bFirstRowInSlice;

}
// Estimate Best Position
- int64_t bestRDCostBO = MAX_INT64;
int32_t bestClassBO = 0;
+ int64_t currentRDCost = costClasses[0];
+ currentRDCost += costClasses[1];
+ currentRDCost += costClasses[2];
+ currentRDCost += costClasses[3];
+ int64_t bestRDCostBO = currentRDCost;

- for (int i = 0; i < MAX_NUM_SAO_CLASS - SAO_NUM_OFFSET + 1; i++)
+ for (int i = 1; i < MAX_NUM_SAO_CLASS - SAO_NUM_OFFSET + 1; i++)
{
- int64_t currentRDCost = 0;
- for (int j = i; j < i + SAO_NUM_OFFSET; j++)
- currentRDCost += costClasses[j];
+ currentRDCost -= costClasses[i - 1];
+ currentRDCost += costClasses[i + 3];

if (currentRDCost < bestRDCostBO)
{
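The band-offset hunk above replaces a nested loop, which re-summed four consecutive class costs for every start position, with a running sum that drops the class leaving the window and adds the class entering it. A minimal sketch of that sliding-window minimum follows, assuming a plain vector of costs; `bestWindowStart` and its parameters are illustrative names, not x265 identifiers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative only: returns the start index of the run of `windowLen`
// consecutive entries with the smallest sum (first one wins on ties).
inline size_t bestWindowStart(const std::vector<int64_t>& cost, size_t windowLen)
{
    assert(windowLen > 0 && cost.size() >= windowLen);

    // Sum of the first window, computed once.
    int64_t current = 0;
    for (size_t j = 0; j < windowLen; j++)
        current += cost[j];

    int64_t best = current;
    size_t bestStart = 0;

    // Slide by one: remove the element that left, add the one that entered.
    for (size_t i = 1; i + windowLen <= cost.size(); i++)
    {
        current -= cost[i - 1];
        current += cost[i + windowLen - 1];
        if (current < best)
        {
            best = current;
            bestStart = i;
        }
    }
    return bestStart;
}
```

With a window of four this matches the `costClasses[i - 1]` / `costClasses[i + 3]` update in the diff, and the work per start position is constant rather than proportional to the window length.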
x265_2.4.tar.gz/source/encoder/search.cpp -> x265_2.5.tar.gz/source/encoder/search.cpp
CHECKED_MALLOC(m_rqt[i].coeffRQT[0], coeff_t, sizeL + sizeC * 2);
m_rqt[i].coeffRQT[1] = m_rqt[i].coeffRQT[0] + sizeL;
m_rqt[i].coeffRQT[2] = m_rqt[i].coeffRQT[0] + sizeL + sizeC;
- ok &= m_rqt[i].reconQtYuv.create(g_maxCUSize, param.internalCsp);
- ok &= m_rqt[i].resiQtYuv.create(g_maxCUSize, param.internalCsp);
+ ok &= m_rqt[i].reconQtYuv.create(param.maxCUSize, param.internalCsp);
+ ok &= m_rqt[i].resiQtYuv.create(param.maxCUSize, param.internalCsp);
}
}
else

{
CHECKED_MALLOC(m_rqt[i].coeffRQT[0], coeff_t, sizeL);
m_rqt[i].coeffRQT[1] = m_rqt[i].coeffRQT[2] = NULL;
- ok &= m_rqt[i].reconQtYuv.create(g_maxCUSize, param.internalCsp);
- ok &= m_rqt[i].resiQtYuv.create(g_maxCUSize, param.internalCsp);
+ ok &= m_rqt[i].reconQtYuv.create(param.maxCUSize, param.internalCsp);
+ ok &= m_rqt[i].resiQtYuv.create(param.maxCUSize, param.internalCsp);
}
}

/* the rest of these buffers are indexed per-depth */
- for (uint32_t i = 0; i <= g_maxCUDepth; i++)
+ for (uint32_t i = 0; i <= m_param->maxCUDepth; i++)
{
- int cuSize = g_maxCUSize >> i;
+ int cuSize = param.maxCUSize >> i;
ok &= m_rqt[i].tmpResiYuv.create(cuSize, param.internalCsp);
ok &= m_rqt[i].tmpPredYuv.create(cuSize, param.internalCsp);
ok &= m_rqt[i].bidirPredYuv[0].create(cuSize, param.internalCsp);

m_rqt[i].resiQtYuv.destroy();
}

- for (uint32_t i = 0; i <= g_maxCUDepth; i++)
+ for (uint32_t i = 0; i <= m_param->maxCUDepth; i++)
{
m_rqt[i].tmpResiYuv.destroy();
m_rqt[i].tmpPredYuv.destroy();
int mvpIdx = selectMVP(interMode.cu, pu, amvp, list, ref);
MV mvmin, mvmax, outmv, mvp = amvp[mvpIdx];

- if (!m_param->analysisMode) /* Prevents load/save outputs from diverging if lowresMV is not available */
+ if (!m_param->analysisReuseMode) /* Prevents load/save outputs from diverging if lowresMV is not available */
{
MV lmv = getLowresMV(interMode.cu, pu, list, ref);
if (lmv.notZero())

setSearchRange(interMode.cu, mvp, m_param->searchRange, mvmin, mvmax);

- int satdCost = m_me.motionEstimate(&m_slice->m_mref[list][ref], mvmin, mvmax, mvp, numMvc, mvc, m_param->searchRange, outmv,
+ int satdCost = m_me.motionEstimate(&m_slice->m_mref[list][ref], mvmin, mvmax, mvp, numMvc, mvc, m_param->searchRange, outmv, m_param->maxSlices,
m_param->bSourceReferenceEstimation ? m_slice->m_refFrameList[list][ref]->m_fencPic->getLumaAddr(0) : 0);

/* Get total cost of partition, but only include MV bit cost once */

}
}

+void Search::searchMV(Mode& interMode, const PredictionUnit& pu, int list, int ref, MV& outmv)
+{
+ CUData& cu = interMode.cu;
+ const Slice *slice = m_slice;
+ MV mv = cu.m_mv[list][pu.puAbsPartIdx];
+ cu.clipMv(mv);
+ MV mvmin, mvmax;
+ setSearchRange(cu, mv, m_param->searchRange, mvmin, mvmax);
+ m_me.refineMV(&slice->m_mref[list][ref], mvmin, mvmax, mv, outmv);
+}
+
/* find the best inter prediction for each PU of specified mode */
void Search::predInterSearch(Mode& interMode, const CUGeom& cuGeom, bool bChromaMC, uint32_t refMasks[2])
{

cu.getNeighbourMV(puIdx, pu.puAbsPartIdx, interMode.interNeighbours);

/* Uni-directional prediction */
- if ((m_param->analysisMode == X265_ANALYSIS_LOAD && m_param->analysisRefineLevel > 1)
+ if ((m_param->analysisReuseMode == X265_ANALYSIS_LOAD && m_param->analysisReuseLevel > 1 && m_param->analysisReuseLevel != 10)
|| (m_param->analysisMultiPassRefine && m_param->rc.bStatRead))
{
for (int list = 0; list < numPredDir; list++)

if (m_param->analysisMultiPassRefine && m_param->rc.bStatRead && mvpIdx == bestME[list].mvpIdx)
mvpIn = bestME[list].mv;

- int satdCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvpIn, numMvc, mvc, m_param->searchRange, outmv,
+ int satdCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvpIn, numMvc, mvc, m_param->searchRange, outmv, m_param->maxSlices,
m_param->bSourceReferenceEstimation ? m_slice->m_refFrameList[list][ref]->m_fencPic->getLumaAddr(0) : 0);

/* Get total cost of partition, but only include MV bit cost once */

int mvpIdx = selectMVP(cu, pu, amvp, list, ref);
MV mvmin, mvmax, outmv, mvp = amvp[mvpIdx];

- if (!m_param->analysisMode) /* Prevents load/save outputs from diverging when lowresMV is not available */
+ if (!m_param->analysisReuseMode) /* Prevents load/save outputs from diverging when lowresMV is not available */
{
MV lmv = getLowresMV(cu, pu, list, ref);
if (lmv.notZero())

m_me.integral[planes] = interMode.fencYuv->m_integral[list][ref][planes] + puX * pu.width + puY * pu.height * m_slice->m_refFrameList[list][ref]->m_reconPic->m_stride;
}
setSearchRange(cu, mvp, m_param->searchRange, mvmin, mvmax);
- int satdCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvp, numMvc, mvc, m_param->searchRange, outmv,
+ int satdCost = m_me.motionEstimate(&slice->m_mref[list][ref], mvmin, mvmax, mvp, numMvc, mvc, m_param->searchRange, outmv, m_param->maxSlices,
m_param->bSourceReferenceEstimation ? m_slice->m_refFrameList[list][ref]->m_fencPic->getLumaAddr(0) : 0);

/* Get total cost of partition, but only include MV bit cost once */

cu.clipMv(mvmax);

if (cu.m_encData->m_param->bIntraRefresh && m_slice->m_sliceType == P_SLICE &&
- cu.m_cuPelX / g_maxCUSize < m_frame->m_encData->m_pir.pirStartCol &&
+ cu.m_cuPelX / m_param->maxCUSize < m_frame->m_encData->m_pir.pirStartCol &&
m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol < m_slice->m_sps->numCuInWidth)
{
int safeX, maxSafeMv;
- safeX = m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol * g_maxCUSize - 3;
+ safeX = m_slice->m_refFrameList[0][0]->m_encData->m_pir.pirEndCol * m_param->maxCUSize - 3;
maxSafeMv = (safeX - cu.m_cuPelX) * 4;
mvmax.x = X265_MIN(mvmax.x, maxSafeMv);
mvmin.x = X265_MIN(mvmin.x, maxSafeMv);
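The last hunk of search.cpp limits the horizontal search range during periodic intra refresh so that a motion vector cannot reach past the already refreshed columns of the reference frame. The sketch below reproduces only the arithmetic visible in the diff: a pixel bound derived from the refresh column and CTU size, less the 3-pixel margin used in the source, converted to quarter-pel units by the factor of 4. `MvRangeX` and `clampToRefreshedArea` are hypothetical names, and the surrounding slice-type and column checks are omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical container for a horizontal MV search range, in quarter-pel units.
struct MvRangeX
{
    int32_t minX;
    int32_t maxX;
};

// Illustrative re-statement of the clamp in the hunk above (not x265 API).
inline MvRangeX clampToRefreshedArea(MvRangeX range, uint32_t pirEndCol,
                                     uint32_t maxCUSize, uint32_t cuPelX)
{
    assert(maxCUSize > 0);
    // Rightmost safe pixel column in the reference, with the source's 3-pixel margin.
    int32_t safeX = (int32_t)(pirEndCol * maxCUSize) - 3;
    // Distance from this CU to the safe column, in quarter-pel units.
    int32_t maxSafeMv = (safeX - (int32_t)cuPelX) * 4;
    range.maxX = std::min(range.maxX, maxSafeMv);
    range.minX = std::min(range.minX, maxSafeMv);
    return range;
}
```

For example, with `pirEndCol = 2`, a CTU size of 64 and a CU at x = 64, the bound is (125 - 64) * 4 = 244 quarter-pels.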
x265_2.4.tar.gz/source/encoder/search.h -> x265_2.5.tar.gz/source/encoder/search.h
memset(this, 0, sizeof(*this));
}

- void accumulate(CUStats& other)
+ void accumulate(CUStats& other, x265_param& param)
{
- for (uint32_t i = 0; i <= g_maxCUDepth; i++)
+ for (uint32_t i = 0; i <= param.maxCUDepth; i++)
{
intraRDOElapsedTime[i] += other.intraRDOElapsedTime[i];
interRDOElapsedTime[i] += other.interRDOElapsedTime[i];

// estimation inter prediction (non-skip)
void predInterSearch(Mode& interMode, const CUGeom& cuGeom, bool bChromaMC, uint32_t masks[2]);

+ void searchMV(Mode& interMode, const PredictionUnit& pu, int list, int ref, MV& outmv);
// encode residual and compute rd-cost for inter mode
void encodeResAndCalcRdInterCU(Mode& interMode, const CUGeom& cuGeom);
void encodeResAndCalcRdSkipCU(Mode& interMode);
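The `accumulate` change is one instance of the pattern the changelog describes: a loop bound that used to come from a process-wide global (`g_maxCUDepth`) is now passed in with the parameters of the encoder instance that owns the statistics. The following is a simplified sketch of that pattern only. `ParamSketch`, `StatsSketch` and `kMaxDepthSlots` are stand-ins invented for this example and do not mirror the real `x265_param` or `CUStats` layouts.

```cpp
#include <cassert>
#include <cstdint>

enum { kMaxDepthSlots = 4 };  // hypothetical array capacity for this sketch

// Stand-in for the per-instance parameter struct (assumption, not x265_param).
struct ParamSketch
{
    uint32_t maxCUDepth;
};

// Stand-in for per-thread statistics that are merged at the end of an encode.
struct StatsSketch
{
    int64_t elapsed[kMaxDepthSlots] = {};

    // The depth bound comes from the caller's parameters, so two encoder
    // instances with different CU depths can merge their stats independently.
    void accumulate(const StatsSketch& other, const ParamSketch& param)
    {
        assert(param.maxCUDepth < kMaxDepthSlots);
        for (uint32_t i = 0; i <= param.maxCUDepth; i++)
            elapsed[i] += other.elapsed[i];
    }
};
```

Passing the bound explicitly is what lets several instances with different settings coexist in one process, at the cost of threading the parameter through each call site, as the rest of this diff shows.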
x265_2.4.tar.gz/source/encoder/sei.cpp -> x265_2.5.tar.gz/source/encoder/sei.cpp
}
WRITE_CODE(type, 8, "payload_type");
uint32_t payloadSize;
- if (hrdTypes || m_payloadType == USER_DATA_UNREGISTERED)
+ if (hrdTypes || m_payloadType == USER_DATA_UNREGISTERED || m_payloadType == USER_DATA_REGISTERED_ITU_T_T35)
{
if (hrdTypes)
{
X265_CHECK(0 == (count.getNumberOfWrittenBits() & 7), "payload unaligned\n");
payloadSize = count.getNumberOfWrittenBits() >> 3;
}
- else
+ else if (m_payloadType == USER_DATA_UNREGISTERED)
payloadSize = m_payloadSize + 16;
+ else
+ payloadSize = m_payloadSize;

for (; payloadSize >= 0xff; payloadSize -= 0xff)
WRITE_CODE(0xff, 8, "payload_size");
WRITE_CODE(payloadSize, 8, "payload_size");
}
- else if(m_payloadType != USER_DATA_REGISTERED_ITU_T_T35)
+ else
WRITE_CODE(m_payloadSize, 8, "payload_size");
/* virtual writeSEI method, write to bs */
writeSEI(sps);
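With this change the registered ITU-T T.35 payload goes through the same size-writing path as the other variable-length payloads: the size is emitted as a run of 0xFF bytes, each standing for 255, followed by one final byte below 0xFF. A minimal sketch of that byte sequence follows; `encodePayloadSize` is a hypothetical helper that returns the bytes instead of writing them to a bitstream.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative only: produces the payload_size bytes that the loop in the
// hunk above would write for a given size.
inline std::vector<uint8_t> encodePayloadSize(uint32_t payloadSize)
{
    std::vector<uint8_t> out;
    // Each 0xFF byte accounts for 255 bytes of payload.
    for (; payloadSize >= 0xff; payloadSize -= 0xff)
        out.push_back(0xff);
    // The terminating byte carries the remainder and is always below 0xFF.
    out.push_back((uint8_t)payloadSize);
    assert(out.back() < 0xff);
    return out;
}
```

A size of 600 is therefore written as `FF FF 5A` (255 + 255 + 90), and a size of exactly 255 as `FF 00`.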
x265_2.4.tar.gz/source/encoder/sei.h -> x265_2.5.tar.gz/source/encoder/sei.h
m_payloadSize = 0;
}

- uint8_t *cim;
+ uint8_t *m_payload;

// daniel.vt@samsung.com :: for the Creative Intent Meta Data Encoding ( seongnam.oh@samsung.com )
void writeSEI(const SPS&)
{
- if (!cim)
+ if (!m_payload)
return;

- int i = 0;
- int payloadSize = m_payloadSize;
- while (cim[i] == 0xFF)
- {
- i++;
- payloadSize += cim[i];
- WRITE_CODE(0xFF, 8, "payload_size");
- }
- WRITE_CODE(payloadSize, 8, "payload_size");
- i++;
- payloadSize += i;
- for (; i < payloadSize; ++i)
- WRITE_CODE(cim[i], 8, "creative_intent_metadata");
+ uint32_t i = 0;
+ for (; i < m_payloadSize; ++i)
+ WRITE_CODE(m_payload[i], 8, "creative_intent_metadata");
}
};
}
x265_2.4.tar.gz/source/encoder/slicetype.cpp -> x265_2.5.tar.gz/source/encoder/slicetype.cpp
if (m_param->rc.cuTree && !m_param->rc.bStatRead)
/* update row satds based on cutree offsets */
curFrame->m_lowres.satdCost = frameCostRecalculate(frames, p0, p1, b);
- else if (m_param->analysisMode != X265_ANALYSIS_LOAD)
+ else if (m_param->analysisReuseMode != X265_ANALYSIS_LOAD || m_param->scaleFactor)
{
if (m_param->rc.aqMode)
curFrame->m_lowres.satdCost = curFrame->m_lowres.costEstAq[b - p0][p1 - b];

curFrame->m_lowres.lowresCostForRc = curFrame->m_lowres.lowresCosts[b - p0][p1 - b];
uint32_t lowresRow = 0, lowresCol = 0, lowresCuIdx = 0, sum = 0, intraSum = 0;
uint32_t scale = m_param->maxCUSize / (2 * X265_LOWRES_CU_SIZE);
- uint32_t numCuInHeight = (m_param->sourceHeight + g_maxCUSize - 1) / g_maxCUSize;
+ uint32_t numCuInHeight = (m_param->sourceHeight + m_param->maxCUSize - 1) / m_param->maxCUSize;
uint32_t widthInLowresCu = (uint32_t)m_8x8Width, heightInLowresCu = (uint32_t)m_8x8Height;
double *qp_offset = 0;
/* Factor in qpoffsets based on Aq/Cutree in CU costs */

m_isSceneTransition = false; /* Signal end of scene transitioning */
}

+ if (m_param->csvLogLevel >= 2)
+ {
+ int64_t icost = frames[p1]->costEst[0][0];
+ int64_t pcost = frames[p1]->costEst[p1 - p0][0];
+ frames[p1]->ipCostRatio = (double)icost / pcost;
+ }
+
/* A frame is always analysed with bRealScenecut = true first, and then bRealScenecut = false,
the former for I decisions and the latter for P/B decisions. It's possible that the first
analysis detected scenecuts which were later nulled due to scene transitioning, in which

MV *mvs = frames[b]->lowresMvs[list][listDist[list]];
int32_t x = mvs[cuIndex].x;
int32_t y = mvs[cuIndex].y;
- displacement += sqrt(pow(abs(x), 2) + pow(abs(y), 2));
+ // NOTE: the dynamic range of abs(x) and abs(y) is 15-bits
+ displacement += sqrt((double)(abs(x) * abs(x)) + (double)(abs(y) * abs(y)));
}
else
displacement += 0.0;

/* ME will never return a cost larger than the cost @MVP, so we do not
* have to check that ME cost is more than the estimated merge cost */
- fencCost = tld.me.motionEstimate(fref, mvmin, mvmax, mvp, 0, NULL, s_merange, *fencMV);
+ fencCost = tld.me.motionEstimate(fref, mvmin, mvmax, mvp, 0, NULL, s_merange, *fencMV, m_lookahead.m_param->maxSlices);
if (skipCost < 64 && skipCost < fencCost && bBidir)
{
fencCost = skipCost;
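The displacement hunk in slicetype.cpp swaps two `pow(..., 2)` calls for integer multiplications. The added source comment states that `abs(x)` and `abs(y)` fit in 15 bits, so each square stays below 2^30 and cannot overflow a 32-bit `int` before the conversion to `double`. A small sketch of the same computation follows; `mvDisplacement` is a hypothetical name and the assertion encodes the 15-bit assumption from that comment.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>

// Illustrative only: Euclidean length of a motion vector using integer
// squares, as in the changed line, instead of pow().
inline double mvDisplacement(int32_t x, int32_t y)
{
    int32_t ax = std::abs(x);
    int32_t ay = std::abs(y);
    // Assumption taken from the source comment: magnitudes fit in 15 bits,
    // so ax * ax and ay * ay fit in a 32-bit signed integer.
    assert(ax < (1 << 15) && ay < (1 << 15));
    return std::sqrt((double)(ax * ax) + (double)(ay * ay));
}
```

For the vector (3, -4) this yields 5.0.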
x265_2.4.tar.gz/source/test/ipfilterharness.cpp -> x265_2.5.tar.gz/source/test/ipfilterharness.cpp
{
pixel_test_buff[0][i] = rand() & PIXEL_MAX;
short_test_buff[0][i] = (rand() % (2 * SMAX)) - SMAX;
-
pixel_test_buff[1][i] = PIXEL_MIN;
- short_test_buff[1][i] = SMIN;
-
+ short_test_buff[1][i] = (int16_t)SMIN;
pixel_test_buff[2][i] = PIXEL_MAX;
short_test_buff[2][i] = SMAX;
}
x265_2.4.tar.gz/source/test/ipfilterharness.h -> x265_2.5.tar.gz/source/test/ipfilterharness.h
enum { ITERS = 100 };
enum { TEST_CASES = 3 };
enum { SMAX = 1 << 12 };
- enum { SMIN = -1 << 12 };
-
+ enum { SMIN = (unsigned)-1 << 12 };
ALIGN_VAR_32(pixel, pixel_buff[TEST_BUF_SIZE]);
int16_t short_buff[TEST_BUF_SIZE];
int16_t IPF_vec_output_s[TEST_BUF_SIZE];
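The `SMIN` change, together with the `(int16_t)SMIN` casts in the harness sources, avoids left-shifting a negative signed value, which is undefined behaviour in C++ before C++20. That is presumably the motivation, although the diff itself does not say so. Shifting the unsigned all-ones value is well defined, and narrowing the result to 16 bits recovers -4096 on the two's-complement targets x265 builds for. The sketch below illustrates this with hypothetical names; it does not reproduce the harness enum.

```cpp
#include <cassert>
#include <cstdint>

// 0xFFFFFFFF << 12 on an unsigned type is well defined and gives 0xFFFFF000.
constexpr uint32_t kSminBits = (uint32_t)-1 << 12;

// Narrowing keeps the low 16 bits (0xF000), which reads as -4096 in int16_t.
// Before C++20 this conversion is implementation-defined; mainstream
// compilers use the modular result shown here.
inline int16_t toInt16(uint32_t bits)
{
    return (int16_t)bits;
}
```

`toInt16(kSminBits)` therefore matches the value the old `-1 << 12` expression was meant to express.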
x265_2.4.tar.gz/source/test/pixelharness.cpp -> x265_2.5.tar.gz/source/test/pixelharness.cpp
uchar_test_buff[0][i] = rand() % ((1 << 8) - 1);
residual_test_buff[0][i] = (rand() % (2 * RMAX + 1)) - RMAX - 1;// For sse_ss only
double_test_buff[0][i] = (double)(short_test_buff[0][i]) / 256.0;
-
pixel_test_buff[1][i] = PIXEL_MIN;
- short_test_buff[1][i] = SMIN;
+ short_test_buff[1][i] = (int16_t)SMIN;
short_test_buff1[1][i] = PIXEL_MIN;
short_test_buff2[1][i] = -16384;
int_test_buff[1][i] = SHORT_MIN;

return true;
}

+bool PixelHarness::check_integral_initv(integralv_t ref, integralv_t opt)
+{
+ intptr_t srcStep = 64;
+ int j = 0;
+ uint32_t dst_ref[BUFFSIZE] = { 0 };
+ uint32_t dst_opt[BUFFSIZE] = { 0 };
+
+ for (int i = 0; i < 64; i++)
+ {
+ dst_ref[i] = pixel_test_buff[0][i];
+ dst_opt[i] = pixel_test_buff[0][i];
+ }
+
+ for (int i = 0, k = 0; i < BUFFSIZE; i++)
+ {
+ if (i % 64 == 0)
+ k++;
+ dst_ref[i] = dst_ref[i % 64] + k;
+ dst_opt[i] = dst_opt[i % 64] + k;
+ }
+
+ int padx = 4;
+ int pady = 4;
+ uint32_t *dst_ref_ptr = dst_ref + srcStep * pady + padx;
+ uint32_t *dst_opt_ptr = dst_opt + srcStep * pady + padx;
+ for (int i = 0; i < ITERS; i++)
+ {
+ ref(dst_ref_ptr, srcStep);
+ checked(opt, dst_opt_ptr, srcStep);
+
+ if (memcmp(dst_ref, dst_opt, sizeof(uint32_t) * BUFFSIZE))
+ return false;
+
+ reportfail()
+ j += INCR;
+ }
+ return true;
+}
+
+bool PixelHarness::check_integral_inith(integralh_t ref, integralh_t opt)
+{
+ /* Since stride is always a multiple of 8 and data movement in AVX2 is 16 elements at a time for 8 bit pixel, we need
+ * to check correctness for two cases: stride multiple of 16 and stride not a multiple of 16; fine for High bit depth
+ * where data movement in AVX2 is 8 elements at a time */
+ intptr_t srcStep[2] = { 56, 64 };
+ int j = 0;
+ uint32_t dst_ref[BUFFSIZE] = { 0 };
+ uint32_t dst_opt[BUFFSIZE] = { 0 };
+
+ int padx = 4;
+ int pady = 4;
+ for (int l = 0; l < 2; l++)
+ {
+ uint32_t *dst_ref_ptr = dst_ref + srcStep[l] * pady + padx;
+ uint32_t *dst_opt_ptr = dst_opt + srcStep[l] * pady + padx;
+ for (int k = 0; k < ITERS; k++)
+ {
+ ref(dst_ref_ptr, pixel_test_buff[0], srcStep[l]);
+ checked(opt, dst_opt_ptr, pixel_test_buff[0], srcStep[l]);
+
+ if (memcmp(dst_ref, dst_opt, sizeof(uint32_t) * BUFFSIZE))
+ return false;
+
+ reportfail()
+ j += INCR;
+ }
+ }
+ return true;
+}
+
bool PixelHarness::testPU(int part, const EncoderPrimitives& ref, const EncoderPrimitives& opt)
{
if (opt.pu[part].satd)

}
}

+ for (int k = 0; k < NUM_INTEGRAL_SIZE; k++)
+ {
+ if (opt.integral_initv[k] && !check_integral_initv(ref.integral_initv[k], opt.integral_initv[k]))
+ {
+ switch (k)
+ {
+ case 0:
+ printf("Integral4v failed!\n");
+ break;
+ case 1:
+ printf("Integral8v failed!\n");
+ break;
+ case 2:
+ printf("Integral12v failed!\n");
+ break;
+ case 3:
+ printf("Integral16v failed!\n");
+ break;
+ case 4:
+ printf("Integral24v failed!\n");
+ break;
+ case 5:
+ printf("Integral32v failed!\n");
+ break;
+ }
+ return false;
+ }
+ }
+
+
+ for (int k = 0; k < NUM_INTEGRAL_SIZE; k++)
+ {
+ if (opt.integral_inith[k] && !check_integral_inith(ref.integral_inith[k], opt.integral_inith[k]))
+ {
+ switch (k)
+ {
+ case 0:
+ printf("Integral4h failed!\n");
+ break;
+ case 1:
+ printf("Integral8h failed!\n");
+ break;
+ case 2:
+ printf("Integral12h failed!\n");
+ break;
+ case 3:
+ printf("Integral16h failed!\n");
+ break;
+ case 4:
+ printf("Integral24h failed!\n");
+ break;
+ case 5:
+ printf("Integral32h failed!\n");
+ break;
+ }
+ return false;
+ }
+ }
return true;
}
HEADER0("pelFilterChroma_Horizontal");
REPORT_SPEEDUP(opt.pelFilterChroma[1], ref.pelFilterChroma[1], pbuf1, 1, STRIDE, tc, maskP, maskQ);
}
+
+ for (int k = 0; k < NUM_INTEGRAL_SIZE; k++)
+ {
+ if (opt.integral_initv[k])
+ {
+ switch (k)
+ {
+ case 0:
+ HEADER0("integral_init4v");
+ break;
+ case 1:
+ HEADER0("integral_init8v");
+ break;
+ case 2:
+ HEADER0("integral_init12v");
+ break;
+ case 3:
+ HEADER0("integral_init16v");
+ break;
+ case 4:
+ HEADER0("integral_init24v");
+ break;
+ case 5:
+ HEADER0("integral_init32v");
+ break;
+ default:
+ break;
+ }
+ REPORT_SPEEDUP(opt.integral_initv[k], ref.integral_initv[k], (uint32_t*)pbuf1, STRIDE);
+ }
+ }
+
+ for (int k = 0; k < NUM_INTEGRAL_SIZE; k++)
+ {
+ if (opt.integral_inith[k])
+ {
+ uint32_t dst_buf[BUFFSIZE] = { 0 };
+ switch (k)
+ {
+ case 0:
+ HEADER0("integral_init4h");
+ break;
+ case 1:
+ HEADER0("integral_init8h");
+ break;
+ case 2:
+ HEADER0("integral_init12h");
+ break;
+ case 3:
+ HEADER0("integral_init16h");
+ break;
+ case 4:
+ HEADER0("integral_init24h");
+ break;
+ case 5:
+ HEADER0("integral_init32h");
+ break;
+ default:
+ break;
+ }
+ REPORT_SPEEDUP(opt.integral_inith[k], ref.integral_inith[k], dst_buf, pbuf1, STRIDE);
+ }
+ }
}
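The new `check_integral_initv` and `check_integral_inith` tests follow the harness's usual pattern: run the reference C primitive and the optimized primitive on identical input, then compare the output buffers byte for byte. The horizontal test repeats this for strides 56 and 64 so that a stride that is not a multiple of 16 is also covered. The sketch below shows that pattern in isolation. The two row-sum kernels are simple stand-ins written for this example and are not the x265 integral kernels; all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

typedef void (*rowsum_t)(uint32_t* dst, const uint8_t* src, size_t stride);

// Stand-in reference kernel: dst[i] = sum of the 4 source samples starting at i.
inline void rowsum4_ref(uint32_t* dst, const uint8_t* src, size_t stride)
{
    for (size_t i = 0; i + 4 <= stride; i++)
        dst[i] = (uint32_t)src[i] + src[i + 1] + src[i + 2] + src[i + 3];
}

// Stand-in "optimized" kernel: same result via a running sum.
inline void rowsum4_opt(uint32_t* dst, const uint8_t* src, size_t stride)
{
    if (stride < 4)
        return;
    uint32_t sum = (uint32_t)src[0] + src[1] + src[2] + src[3];
    dst[0] = sum;
    for (size_t i = 1; i + 4 <= stride; i++)
    {
        sum += src[i + 3];
        sum -= src[i - 1];
        dst[i] = sum;
    }
}

// Compare two kernels on the same input for an unaligned and an aligned stride.
inline bool checkKernels(rowsum_t ref, rowsum_t opt, const std::vector<uint8_t>& src)
{
    assert(ref && opt);
    const size_t strides[2] = { 56, 64 };
    for (size_t stride : strides)
    {
        if (src.size() < stride)
            return false;
        std::vector<uint32_t> outRef(stride, 0), outOpt(stride, 0);
        ref(outRef.data(), src.data(), stride);
        opt(outOpt.data(), src.data(), stride);
        if (std::memcmp(outRef.data(), outOpt.data(), sizeof(uint32_t) * stride))
            return false;
    }
    return true;
}
```

Comparing whole buffers, rather than only the elements a kernel is expected to write, also catches an optimized kernel that writes outside its intended range.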
x265_2.4.tar.gz/source/test/pixelharness.h -> x265_2.5.tar.gz/source/test/pixelharness.h
enum { BUFFSIZE = STRIDE * (MAX_HEIGHT + PAD_ROWS) + INCR * ITERS };
enum { TEST_CASES = 3 };
enum { SMAX = 1 << 12 };
- enum { SMIN = -1 << 12 };
+ enum { SMIN = (unsigned)-1 << 12 };
enum { RMAX = PIXEL_MAX - PIXEL_MIN }; //The maximum value obtained by subtracting pixel values (residual max)
enum { RMIN = PIXEL_MIN - PIXEL_MAX }; //The minimum value obtained by subtracting pixel values (residual min)

bool check_pelFilterLumaStrong_H(pelFilterLumaStrong_t ref, pelFilterLumaStrong_t opt);
bool check_pelFilterChroma_V(pelFilterChroma_t ref, pelFilterChroma_t opt);
bool check_pelFilterChroma_H(pelFilterChroma_t ref, pelFilterChroma_t opt);
+ bool check_integral_initv(integralv_t ref, integralv_t opt);
+ bool check_integral_inith(integralh_t ref, integralh_t opt);

public:
x265_2.4.tar.gz/source/test/regression-tests.txt -> x265_2.5.tar.gz/source/test/regression-tests.txt
BasketballDrive_1920x1080_50.y4m,--preset faster --aq-strength 2 --merange 190 --slices 3
BasketballDrive_1920x1080_50.y4m,--preset medium --ctu 16 --max-tu-size 8 --subme 7 --qg-size 16 --cu-lossless --tu-inter-depth 3 --limit-tu 1
BasketballDrive_1920x1080_50.y4m,--preset medium --keyint -1 --nr-inter 100 -F4 --no-sao
-BasketballDrive_1920x1080_50.y4m,--preset medium --no-cutree --analysis-mode=save --refine-level 2 --bitrate 7000 --limit-modes,--preset medium --no-cutree --analysis-mode=load --refine-level 2 --bitrate 7000 --limit-modes
+BasketballDrive_1920x1080_50.y4m,--preset medium --no-cutree --analysis-reuse-mode=save --analysis-reuse-level 2 --bitrate 7000 --limit-modes,--preset medium --no-cutree --analysis-reuse-mode=load --analysis-reuse-level 2 --bitrate 7000 --limit-modes
BasketballDrive_1920x1080_50.y4m,--preset slow --nr-intra 100 -F4 --aq-strength 3 --qg-size 16 --limit-refs 1
BasketballDrive_1920x1080_50.y4m,--preset slower --lossless --chromaloc 3 --subme 0 --limit-tu 4
-BasketballDrive_1920x1080_50.y4m,--preset slower --no-cutree --analysis-mode=save --refine-level 10 --bitrate 7000 --limit-tu 0,--preset slower --no-cutree --analysis-mode=load --refine-level 10 --bitrate 7000 --limit-tu 0
+BasketballDrive_1920x1080_50.y4m,--preset slower --no-cutree --analysis-reuse-mode=save --analysis-reuse-level 10 --bitrate 7000 --limit-tu 0,--preset slower --no-cutree --analysis-reuse-mode=load --analysis-reuse-level 10 --bitrate 7000 --limit-tu 0
BasketballDrive_1920x1080_50.y4m,--preset veryslow --crf 4 --cu-lossless --pmode --limit-refs 1 --aq-mode 3 --limit-tu 3
-BasketballDrive_1920x1080_50.y4m,--preset veryslow --no-cutree --analysis-mode=save --bitrate 7000 --tskip-fast --limit-tu 4,--preset veryslow --no-cutree --analysis-mode=load --bitrate 7000 --tskip-fast --limit-tu 4
+BasketballDrive_1920x1080_50.y4m,--preset veryslow --no-cutree --analysis-reuse-mode=save --bitrate 7000 --tskip-fast --limit-tu 4,--preset veryslow --no-cutree --analysis-reuse-mode=load --bitrate 7000 --tskip-fast --limit-tu 4
BasketballDrive_1920x1080_50.y4m,--preset veryslow --recon-y4m-exec "ffplay -i pipe:0 -autoexit"
Coastguard-4k.y4m,--preset ultrafast --recon-y4m-exec "ffplay -i pipe:0 -autoexit"
Coastguard-4k.y4m,--preset superfast --tune grain --overscan=crop
Coastguard-4k.y4m,--preset superfast --tune grain --pme --aq-strength 2 --merange 190
-Coastguard-4k.y4m,--preset veryfast --no-cutree --analysis-mode=save --refine-level 1 --bitrate 15000,--preset veryfast --no-cutree --analysis-mode=load --refine-level 1 --bitrate 15000
+Coastguard-4k.y4m,--preset veryfast --no-cutree --analysis-reuse-mode=save --analysis-reuse-level 1 --bitrate 15000,--preset veryfast --no-cutree --analysis-reuse-mode=load --analysis-reuse-level 1 --bitrate 15000
Coastguard-4k.y4m,--preset medium --rdoq-level 1 --tune ssim --no-signhide --me umh --slices 2
Coastguard-4k.y4m,--preset slow --tune psnr --cbqpoffs -1 --crqpoffs 1 --limit-refs 1
CrowdRun_1920x1080_50_10bit_422.yuv,--preset ultrafast --weightp --tune zerolatency --qg-size 16

DucksAndLegs_1920x1080_60_10bit_444.yuv,--preset veryfast --weightp --nr-intra 1000 -F4
DucksAndLegs_1920x1080_60_10bit_444.yuv,--preset medium --nr-inter 500 -F4 --no-psy-rdoq
DucksAndLegs_1920x1080_60_10bit_444.yuv,--preset slower --no-weightp --rdoq-level 0 --limit-refs 3 --tu-inter-depth 4 --limit-tu 3
-DucksAndLegs_1920x1080_60_10bit_422.yuv,--preset fast --no-cutree --analysis-mode=save --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1,--preset fast --no-cutree --analysis-mode=load --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1
+DucksAndLegs_1920x1080_60_10bit_422.yuv,--preset fast --no-cutree --analysis-reuse-mode=save --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1,--preset fast --no-cutree --analysis-reuse-mode=load --bitrate 3000 --early-skip --tu-inter-depth 3 --limit-tu 1
FourPeople_1280x720_60.y4m,--preset superfast --no-wpp --lookahead-slices 2
FourPeople_1280x720_60.y4m,--preset veryfast --aq-mode 2 --aq-strength 1.5 --qg-size 8
FourPeople_1280x720_60.y4m,--preset medium --qp 38 --no-psy-rd

KristenAndSara_1280x720_60.y4m,--preset slower --pmode --max-tu-size 8 --limit-refs 0 --limit-modes --limit-tu 1
NebutaFestival_2560x1600_60_10bit_crop.yuv,--preset superfast --tune psnr
NebutaFestival_2560x1600_60_10bit_crop.yuv,--preset medium --tune grain --limit-refs 2
-NebutaFestival_2560x1600_60_10bit_crop.yuv,--preset slow --no-cutree --analysis-mode=save --rd 5 --refine-level 10 --bitrate 9000,--preset slow --no-cutree --analysis-mode=load --rd 5 --refine-level 10 --bitrate 9000
-News-4k.y4m,--preset ultrafast --no-cutree --analysis-mode=save --refine-level 2 --bitrate 15000,--preset ultrafast --no-cutree --analysis-mode=load --refine-level 2 --bitrate 15000
+NebutaFestival_2560x1600_60_10bit_crop.yuv,--preset slow --no-cutree --analysis-reuse-mode=save --rd 5 --analysis-reuse-level 10 --bitrate 9000,--preset slow --no-cutree --analysis-reuse-mode=load --rd 5 --analysis-reuse-level 10 --bitrate 9000
+News-4k.y4m,--preset ultrafast --no-cutree --analysis-reuse-mode=save --analysis-reuse-level 2 --bitrate 15000,--preset ultrafast --no-cutree --analysis-reuse-mode=load --analysis-reuse-level 2 --bitrate 15000
News-4k.y4m,--preset superfast --lookahead-slices 6 --aq-mode 0
News-4k.y4m,--preset superfast --slices 4 --aq-mode 0
News-4k.y4m,--preset medium --tune ssim --no-sao --qg-size 16

old_town_cross_444_720p50.y4m,--preset superfast --weightp --min-cu 16 --limit-modes
old_town_cross_444_720p50.y4m,--preset veryfast --qp 1 --tune ssim
old_town_cross_444_720p50.y4m,--preset faster --rd 1 --tune zero-latency
-old_town_cross_444_720p50.y4m,--preset fast --no-cutree --analysis-mode=save --refine-level 1 --bitrate 3000 --early-skip,--preset fast --no-cutree --analysis-mode=load --refine-level 1 --bitrate 3000 --early-skip
+old_town_cross_444_720p50.y4m,--preset fast --no-cutree --analysis-reuse-mode=save --analysis-reuse-level 1 --bitrate 3000 --early-skip,--preset fast --no-cutree --analysis-reuse-mode=load --analysis-reuse-level 1 --bitrate 3000 --early-skip
old_town_cross_444_720p50.y4m,--preset medium --keyint -1 --no-weightp --ref 6
old_town_cross_444_720p50.y4m,--preset slow --rdoq-level 1 --early-skip --ref 7 --no-b-pyramid
old_town_cross_444_720p50.y4m,--preset slower --crf 4 --cu-lossless
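Each entry in regression-tests.txt, as it appears in the lines above, is a comma-separated record: the input clip first, then one command line, and for the analysis save/load cases a second command line that re-encodes using the saved data. A minimal sketch of splitting such a record follows. `RegressionCase` and `parseRegressionLine` are hypothetical names, and the sketch assumes that no command line contains a comma, which holds for the entries shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical representation of one line of the test list.
struct RegressionCase
{
    std::string input;                      // clip file name
    std::vector<std::string> commandLines;  // one entry, or two for save/load pairs
};

// Illustrative parser: splits on every comma; the first field is the input.
inline RegressionCase parseRegressionLine(const std::string& line)
{
    assert(!line.empty());
    RegressionCase result;
    size_t start = 0;
    bool first = true;
    while (true)
    {
        size_t comma = line.find(',', start);
        std::string field = (comma == std::string::npos)
                                ? line.substr(start)
                                : line.substr(start, comma - start);
        if (first)
        {
            result.input = field;
            first = false;
        }
        else
            result.commandLines.push_back(field);
        if (comma == std::string::npos)
            break;
        start = comma + 1;
    }
    return result;
}
```

With this split, the renamed `--analysis-reuse-mode=save` and `--analysis-reuse-mode=load` options end up in the first and second command line of the same case.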
x265_2.4.tar.gz/source/x265-extras.cpp -> x265_2.5.tar.gz/source/x265-extras.cpp
#include "x265.h"
#include "x265-extras.h"
-
+#include "param.h"
#include "common.h"

using namespace X265_NS;

"B count, B ave-QP, B kbps, B-PSNR Y, B-PSNR U, B-PSNR V, B-SSIM (dB), "
"MaxCLL, MaxFALL, Version\n";

-FILE* x265_csvlog_open(const x265_api& api, const x265_param& param, const char* fname, int level)
+FILE* x265_csvlog_open(const x265_param& param, const char* fname, int level)
{
- if (sizeof(x265_stats) != api.sizeof_stats || sizeof(x265_picture) != api.sizeof_picture)
- {
- fprintf(stderr, "extras [error]: structure size skew, unable to create CSV logfile\n");
- return NULL;
- }
-
FILE *csvfp = x265_fopen(fname, "r");
if (csvfp)
{

if (level)
{
fprintf(csvfp, "Encode Order, Type, POC, QP, Bits, Scenecut, ");
+ if (level >= 2)
+ fprintf(csvfp, "I/P cost ratio, ");
if (param.rc.rateControlMode == X265_RC_CRF)
fprintf(csvfp, "RateFactor, ");
if (param.rc.vbvBufferSize)

fprintf(csvfp, "Latency, ");
fprintf(csvfp, "List 0, List 1");
uint32_t size = param.maxCUSize;
- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+ for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
{
fprintf(csvfp, ", Intra %dx%d DC, Intra %dx%d Planar, Intra %dx%d Ang", size, size, size, size, size, size);
size /= 2;

size = param.maxCUSize;
if (param.bEnableRectInter)
{
- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+ for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
{
fprintf(csvfp, ", Inter %dx%d, Inter %dx%d (Rect)", size, size, size, size);
if (param.bEnableAMP)

}
else
{
- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+ for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
{
fprintf(csvfp, ", Inter %dx%d", size, size);
size /= 2;
}
}
size = param.maxCUSize;
- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+ for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
{
fprintf(csvfp, ", Skip %dx%d", size, size);
size /= 2;
}
size = param.maxCUSize;
- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+ for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
{
fprintf(csvfp, ", Merge %dx%d", size, size);
size /= 2;
}
- fprintf(csvfp, ", Avg Luma Distortion, Avg Chroma Distortion, Avg psyEnergy, Avg Luma Level, Max Luma Level, Avg Residual Energy");

- /* detailed performance statistics */
if (level >= 2)
- fprintf(csvfp, ", DecideWait (ms), Row0Wait (ms), Wall time (ms), Ref Wait Wall (ms), Total CTU time (ms), Stall Time (ms), Total frame time (ms), Avg WPP, Row Blocks");
+ {
+ fprintf(csvfp, ", Avg Luma Distortion, Avg Chroma Distortion, Avg psyEnergy, Avg Residual Energy,"
+ " Min Luma Level, Max Luma Level, Avg Luma Level");
+
+ if (param.internalCsp != X265_CSP_I400)
+ fprintf(csvfp, ", Min Cb Level, Max Cb Level, Avg Cb Level, Min Cr Level, Max Cr Level, Avg Cr Level");
+
+ /* PU statistics */
+ size = param.maxCUSize;
+ for (uint32_t i = 0; i< param.maxLog2CUSize - (uint32_t)g_log2Size[param.minCUSize] + 1; i++)
+ {
+ fprintf(csvfp, ", Intra %dx%d", size, size);
+ fprintf(csvfp, ", Skip %dx%d", size, size);
+ fprintf(csvfp, ", AMP %d", size);
+ fprintf(csvfp, ", Inter %dx%d", size, size);
+ fprintf(csvfp, ", Merge %dx%d", size, size);
+ fprintf(csvfp, ", Inter %dx%d", size, size / 2);
+ fprintf(csvfp, ", Merge %dx%d", size, size / 2);
+ fprintf(csvfp, ", Inter %dx%d", size / 2, size);
+ fprintf(csvfp, ", Merge %dx%d", size / 2, size);
+ size /= 2;
+ }
+
+ if ((uint32_t)g_log2Size[param.minCUSize] == 3)
+ fprintf(csvfp, ", 4x4");
+
+ /* detailed performance statistics */
+ fprintf(csvfp, ", DecideWait (ms), Row0Wait (ms), Wall time (ms), Ref Wait Wall (ms), Total CTU time (ms),"
+ "Stall Time (ms), Total frame time (ms), Avg WPP, Row Blocks");
+ }
fprintf(csvfp, "\n");
}
else
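The header-writing loops in `x265_csvlog_open` all share one shape: start at the maximum CU size, emit one column label per depth, and halve the size on each step, with the depth bound now taken from `param.maxCUDepth` rather than a global. A minimal sketch for the "Skip" columns follows. `skipColumns` is a hypothetical helper that builds a string instead of writing to a `FILE*`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Illustrative only: builds the ", Skip NxN" header fragment for every CU
// depth from 0 to maxCUDepth, halving the block size at each depth.
inline std::string skipColumns(uint32_t maxCUSize, uint32_t maxCUDepth)
{
    assert(maxCUSize > 0);
    std::string header;
    uint32_t size = maxCUSize;
    for (uint32_t depth = 0; depth <= maxCUDepth; depth++)
    {
        const std::string s = std::to_string(size);
        header += ", Skip " + s + "x" + s;
        size /= 2;
    }
    return header;
}
```

For a maximum CU size of 64 and a maximum depth of 3 the fragment is `, Skip 64x64, Skip 32x32, Skip 16x16, Skip 8x8`. Because the column count depends on these two parameters, CSV files from encodes with different CTU settings have different widths.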
return;

const x265_frame_stats* frameStats = &pic.frameData;
- fprintf(csvfp, "%d, %c-SLICE, %4d, %2.2lf, %10d, %d,", frameStats->encoderOrder, frameStats->sliceType, frameStats->poc, frameStats->qp, (int)frameStats->bits, frameStats->bScenecut);
+ fprintf(csvfp, "%d, %c-SLICE, %4d, %2.2lf, %10d, %d,", frameStats->encoderOrder, frameStats->sliceType, frameStats->poc,
+ frameStats->qp, (int)frameStats->bits, frameStats->bScenecut);
+ if (level >= 2)
+ fprintf(csvfp, "%.2f,", frameStats->ipCostRatio);
if (param.rc.rateControlMode == X265_RC_CRF)
fprintf(csvfp, "%.3lf,", frameStats->rateFactor);
if (param.rc.vbvBufferSize)

else
fputs(" -,", csvfp);
}
- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
- fprintf(csvfp, "%5.2lf%%, %5.2lf%%, %5.2lf%%,", frameStats->cuStats.percentIntraDistribution[depth][0], frameStats->cuStats.percentIntraDistribution[depth][1], frameStats->cuStats.percentIntraDistribution[depth][2]);
- fprintf(csvfp, "%5.2lf%%", frameStats->cuStats.percentIntraNxN);
- if (param.bEnableRectInter)
+
+ if (level)
{
- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
+ for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
+ fprintf(csvfp, "%5.2lf%%, %5.2lf%%, %5.2lf%%,", frameStats->cuStats.percentIntraDistribution[depth][0],
+ frameStats->cuStats.percentIntraDistribution[depth][1],
+ frameStats->cuStats.percentIntraDistribution[depth][2]);
+ fprintf(csvfp, "%5.2lf%%", frameStats->cuStats.percentIntraNxN);
+ if (param.bEnableRectInter)
{
- fprintf(csvfp, ", %5.2lf%%, %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][0], frameStats->cuStats.percentInterDistribution[depth][1]);
- if (param.bEnableAMP)
- fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][2]);
+ for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
+ {
+ fprintf(csvfp, ", %5.2lf%%, %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][0],
+ frameStats->cuStats.percentInterDistribution[depth][1]);
+ if (param.bEnableAMP)
+ fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][2]);
+ }
}
158
+ else
159
+ {
160
+ for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
161
+ fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][0]);
162
+ }
163
+ for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
164
+ fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentSkipCu[depth]);
165
+ for (uint32_t depth = 0; depth <= param.maxCUDepth; depth++)
166
+ fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentMergeCu[depth]);
167
}
168
- else
169
- {
170
- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
171
- fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentInterDistribution[depth][0]);
172
- }
173
- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
174
- fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentSkipCu[depth]);
175
- for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
176
- fprintf(csvfp, ", %5.2lf%%", frameStats->cuStats.percentMergeCu[depth]);
177
- fprintf(csvfp, ", %.2lf, %.2lf, %.2lf, %.2lf, %d, %.2lf", frameStats->avgLumaDistortion, frameStats->avgChromaDistortion, frameStats->avgPsyEnergy, frameStats->avgLumaLevel, frameStats->maxLumaLevel, frameStats->avgResEnergy);
178
179
if (level >= 2)
180
{
181
- fprintf(csvfp, ", %.1lf, %.1lf, %.1lf, %.1lf, %.1lf, %.1lf, %.1lf,", frameStats->decideWaitTime, frameStats->row0WaitTime, frameStats->wallTime, frameStats->refWaitWallTime, frameStats->totalCTUTime, frameStats->stallTime, frameStats->totalFrameTime);
182
+ fprintf(csvfp, ", %.2lf, %.2lf, %.2lf, %.2lf ", frameStats->avgLumaDistortion,
183
+ frameStats->avgChromaDistortion,
184
+ frameStats->avgPsyEnergy,
185
+ frameStats->avgResEnergy);
186
+
187
+ fprintf(csvfp, ", %d, %d, %.2lf", frameStats->minLumaLevel, frameStats->maxLumaLevel, frameStats->avgLumaLevel);
188
+
189
+ if (param.internalCsp != X265_CSP_I400)
190
+ {
191
+ fprintf(csvfp, ", %d, %d, %.2lf", frameStats->minChromaULevel, frameStats->maxChromaULevel, frameStats->avgChromaULevel);
192
+ fprintf(csvfp, ", %d, %d, %.2lf", frameStats->minChromaVLevel, frameStats->maxChromaVLevel, frameStats->avgChromaVLevel);
193
+ }
194
+
195
+ for (uint32_t i = 0; i < param.maxLog2CUSize - (uint32_t)g_log2Size[param.minCUSize] + 1; i++)
196
+ {
197
+ fprintf(csvfp, ", %.2lf%%", frameStats->puStats.percentIntraPu[i]);
198
+ fprintf(csvfp, ", %.2lf%%", frameStats->puStats.percentSkipPu[i]);
199
+ fprintf(csvfp, ",%.2lf%%", frameStats->puStats.percentAmpPu[i]);
200
+ for (uint32_t j = 0; j < 3; j++)
201
+ {
202
+ fprintf(csvfp, ", %.2lf%%", frameStats->puStats.percentInterPu[i][j]);
203
+ fprintf(csvfp, ", %.2lf%%", frameStats->puStats.percentMergePu[i][j]);
204
+ }
205
+ }
206
+ if ((uint32_t)g_log2Size[param.minCUSize] == 3)
207
+ fprintf(csvfp, ",%.2lf%%", frameStats->puStats.percentNxN);
208
+
209
+ fprintf(csvfp, ", %.1lf, %.1lf, %.1lf, %.1lf, %.1lf, %.1lf, %.1lf,", frameStats->decideWaitTime, frameStats->row0WaitTime,
210
+ frameStats->wallTime, frameStats->refWaitWallTime,
211
+ frameStats->totalCTUTime, frameStats->stallTime,
212
+ frameStats->totalFrameTime);
213
+
214
fprintf(csvfp, " %.3lf, %d", frameStats->avgWPP, frameStats->countRowBlocks);
215
}
216
fprintf(csvfp, "\n");
217
fflush(stderr);
218
}
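Reviewer note: the number of PU-statistics columns at CSV log level 2 now depends on the configured CU size range, so downstream CSV parsers can no longer assume a fixed width. The sketch below rebuilds the column labels in the same order as the header loop in this diff. It is a minimal illustration, not x265 code: `log2Size` and `puHeaderColumns` are hypothetical names, and `log2Size` stands in for the `g_log2Size[]` lookup.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the g_log2Size[] table: integer log2 of a
// power-of-two CU size.
static uint32_t log2Size(uint32_t size)
{
    uint32_t result = 0;
    while (size > 1)
    {
        size >>= 1;
        ++result;
    }
    return result;
}

// Hypothetical helper: returns the PU-statistics column labels in the order
// the header loop in the diff prints them.
std::vector<std::string> puHeaderColumns(uint32_t maxCUSize, uint32_t minCUSize)
{
    assert(maxCUSize >= minCUSize && minCUSize >= 8);

    std::vector<std::string> cols;
    const uint32_t levels = log2Size(maxCUSize) - log2Size(minCUSize) + 1;
    uint32_t size = maxCUSize;
    for (uint32_t i = 0; i < levels; i++)
    {
        const std::string full = std::to_string(size);
        const std::string half = std::to_string(size / 2);
        cols.push_back("Intra " + full + "x" + full);
        cols.push_back("Skip " + full + "x" + full);
        cols.push_back("AMP " + full);
        cols.push_back("Inter " + full + "x" + full);
        cols.push_back("Merge " + full + "x" + full);
        cols.push_back("Inter " + full + "x" + half);
        cols.push_back("Merge " + full + "x" + half);
        cols.push_back("Inter " + half + "x" + full);
        cols.push_back("Merge " + half + "x" + full);
        size /= 2;
    }
    // The extra "4x4" column appears only when the minimum CU size is 8x8.
    if (log2Size(minCUSize) == 3)
        cols.push_back("4x4");
    return cols;
}
```

With a 64x64 maximum and an 8x8 minimum CU this yields four size levels of nine columns each plus the `4x4` column. The per-frame rows are written in the matching order (intra, skip, AMP, then inter/merge pairs for each of the three partition shapes).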
219
220
-void x265_csvlog_encode(FILE* csvfp, const char* version, const x265_param& param, const x265_stats& stats, int level, int argc, char** argv)
221
+void x265_csvlog_encode(FILE* csvfp, const char* version, const x265_param& param, int padx, int pady, const x265_stats& stats, int level, int argc, char** argv)
222
{
223
if (!csvfp)
224
return;
225
226
}
227
228
// CLI arguments or other
229
- fputc('"', csvfp);
230
- for (int i = 1; i < argc; i++)
231
+ if (argc)
232
{
233
- fputc(' ', csvfp);
234
- fputs(argv[i], csvfp);
235
+ fputc('"', csvfp);
236
+ for (int i = 1; i < argc; i++)
237
+ {
238
+ fputc(' ', csvfp);
239
+ fputs(argv[i], csvfp);
240
+ }
241
+ fputc('"', csvfp);
242
+ }
243
+ else
244
+ {
245
+ const x265_param* paramTemp = &param;
246
+ char *opts = x265_param2string((x265_param*)paramTemp, padx, pady);
247
+ if (opts)
248
+ {
249
+ fputc('"', csvfp);
250
+ fputs(opts, csvfp);
251
+ fputc('"', csvfp);
252
+ }
253
}
254
- fputc('"', csvfp);
255
256
// current date and time
257
time_t now;
258
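Reviewer note: the options field in the summary row is now produced in one of two ways. When `argc` is non-zero, the command-line arguments are joined and quoted as before. When it is zero (a library caller with no command line), the parameter string from `x265_param2string()` is quoted instead, and nothing is written if that string is NULL. The following sketch models only that branching. `csvOptionsField` is a hypothetical name, and `paramString` is an assumed stand-in for the `x265_param2string()` result.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of x265): returns the text the updated
// x265_csvlog_encode() would write for the options field.
std::string csvOptionsField(int argc, const char* const* argv, const char* paramString)
{
    assert(argc == 0 || argv != nullptr);

    std::string field;
    if (argc)
    {
        field += '"';
        for (int i = 1; i < argc; i++)   // argv[0], the program name, is skipped
        {
            field += ' ';
            field += argv[i];
        }
        field += '"';
    }
    else if (paramString)
    {
        // No command line available: fall back to the serialized parameters.
        field += '"';
        field += paramString;
        field += '"';
    }
    return field;
}
```

Note that the command-line form keeps the leading space before the first argument, exactly as the loop in the diff does.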
x265_2.4.tar.gz/source/x265-extras.h -> x265_2.5.tar.gz/source/x265-extras.h
Changed
19
1
2
* closed by the caller using fclose(). If level is 0, then no frame logging
3
* header is written to the file. This function will return NULL if it is unable
4
* to open the file for write or if it detects a structure size skew */
5
-LIBAPI FILE* x265_csvlog_open(const x265_api& api, const x265_param& param, const char* fname, int level);
6
+LIBAPI FILE* x265_csvlog_open(const x265_param& param, const char* fname, int level);
7
8
/* Log frame statistics to the CSV file handle. level should have been non-zero
9
* in the call to x265_csvlog_open() if this function is called. */
10
11
/* Log final encode statistics to the CSV file handle. 'argc' and 'argv' are
12
* intended to be command line arguments passed to the encoder. Encode
13
* statistics should be queried from the encoder just prior to closing it. */
14
-LIBAPI void x265_csvlog_encode(FILE* csvfp, const char* version, const x265_param& param, const x265_stats& stats, int level, int argc, char** argv);
15
+LIBAPI void x265_csvlog_encode(FILE* csvfp, const char* version, const x265_param& param, int padx, int pady, const x265_stats& stats, int level, int argc, char** argv);
16
17
/* In-place downshift from a bit-depth greater than 8 to a bit-depth of 8, using
18
* the residual bits to dither each row. */
19
x265_2.4.tar.gz/source/x265.cpp -> x265_2.5.tar.gz/source/x265.cpp
Changed
124
1
2
ReconFile* recon;
3
OutputFile* output;
4
FILE* qpfile;
5
- FILE* csvfpt;
6
- const char* csvfn;
7
const char* reconPlayCmd;
8
const x265_api* api;
9
x265_param* param;
10
bool bProgress;
11
bool bForceY4m;
12
bool bDither;
13
- int csvLogLevel;
14
uint32_t seek; // number of frames to skip from the beginning
15
uint32_t framesToBeEncoded; // number of frames to encode
16
uint64_t totalbytes;
17
18
recon = NULL;
19
output = NULL;
20
qpfile = NULL;
21
- csvfpt = NULL;
22
- csvfn = NULL;
23
reconPlayCmd = NULL;
24
api = NULL;
25
param = NULL;
26
27
startTime = x265_mdate();
28
prevUpdateTime = 0;
29
bDither = false;
30
- csvLogLevel = 0;
31
}
32
33
void destroy();
34
35
if (qpfile)
36
fclose(qpfile);
37
qpfile = NULL;
38
- if (csvfpt)
39
- fclose(csvfpt);
40
- csvfpt = NULL;
41
if (output)
42
output->release();
43
output = NULL;
44
45
if (0) ;
46
OPT2("frame-skip", "seek") this->seek = (uint32_t)x265_atoi(optarg, bError);
47
OPT("frames") this->framesToBeEncoded = (uint32_t)x265_atoi(optarg, bError);
48
- OPT("csv") this->csvfn = optarg;
49
- OPT("csv-log-level") this->csvLogLevel = x265_atoi(optarg, bError);
50
OPT("no-progress") this->bProgress = false;
51
OPT("output") outputfn = optarg;
52
OPT("input") inputfn = optarg;
53
54
* 1 - unable to parse command line
55
* 2 - unable to open encoder
56
* 3 - unable to generate stream headers
57
- * 4 - encoder abort
58
- * 5 - unable to open csv file */
59
+ * 4 - encoder abort */
60
61
int main(int argc, char **argv)
62
{
63
64
/* get the encoder parameters post-initialization */
65
api->encoder_parameters(encoder, param);
66
67
- if (cliopt.csvfn)
68
- {
69
- cliopt.csvfpt = x265_csvlog_open(*api, *param, cliopt.csvfn, cliopt.csvLogLevel);
70
- if (!cliopt.csvfpt)
71
- {
72
- x265_log_file(param, X265_LOG_ERROR, "Unable to open CSV log file <%s>, aborting\n", cliopt.csvfn);
73
- cliopt.destroy();
74
- if (cliopt.api)
75
- cliopt.api->param_free(cliopt.param);
76
- exit(5);
77
- }
78
- }
79
-
80
- /* Control-C handler */
81
+ /* Control-C handler */
82
if (signal(SIGINT, sigint_handler) == SIG_ERR)
83
x265_log(param, X265_LOG_ERROR, "Unable to register CTRL+C handler: %s\n", strerror(errno));
84
85
x265_picture pic_orig, pic_out;
86
x265_picture *pic_in = &pic_orig;
87
- /* Allocate recon picture if analysisMode is enabled */
88
+ /* Allocate recon picture if analysisReuseMode is enabled */
89
std::priority_queue<int64_t>* pts_queue = cliopt.output->needPTS() ? new std::priority_queue<int64_t>() : NULL;
90
- x265_picture *pic_recon = (cliopt.recon || !!param->analysisMode || pts_queue || reconPlay || cliopt.csvLogLevel) ? &pic_out : NULL;
91
+ x265_picture *pic_recon = (cliopt.recon || !!param->analysisReuseMode || pts_queue || reconPlay || param->csvLogLevel) ? &pic_out : NULL;
92
uint32_t inFrameCount = 0;
93
uint32_t outFrameCount = 0;
94
x265_nal *p_nal;
95
96
}
97
98
cliopt.printStatus(outFrameCount);
99
- if (numEncoded && cliopt.csvLogLevel)
100
- x265_csvlog_frame(cliopt.csvfpt, *param, *pic_recon, cliopt.csvLogLevel);
101
}
102
103
/* Flush the encoder */
104
105
}
106
107
cliopt.printStatus(outFrameCount);
108
- if (numEncoded && cliopt.csvLogLevel)
109
- x265_csvlog_frame(cliopt.csvfpt, *param, *pic_recon, cliopt.csvLogLevel);
110
111
if (!numEncoded)
112
break;
113
114
delete reconPlay;
115
116
api->encoder_get_stats(encoder, &stats, sizeof(stats));
117
- if (cliopt.csvfpt && !b_ctrl_c)
118
- x265_csvlog_encode(cliopt.csvfpt, api->version_str, *param, stats, cliopt.csvLogLevel, argc, argv);
119
+ if (param->csvfn && !b_ctrl_c)
120
+ api->encoder_log(encoder, argc, argv);
121
api->encoder_close(encoder);
122
123
int64_t second_largest_pts = 0;
124
x265_2.4.tar.gz/source/x265.h -> x265_2.5.tar.gz/source/x265.h
Changed
232
1
2
3
#ifndef X265_H
4
#define X265_H
5
-
6
#include <stdint.h>
7
+#include <stdio.h>
8
#include "x265_config.h"
9
-
10
#ifdef __cplusplus
11
extern "C" {
12
#endif
13
14
uint32_t sliceType;
15
uint32_t numCUsInFrame;
16
uint32_t numPartitions;
17
+ uint32_t depthBytes;
18
int bScenecut;
19
void* wt;
20
void* interData;
21
22
} x265_cu_stats;
23
24
25
+/* pu statistics */
26
+typedef struct x265_pu_stats
27
+{
28
+ double percentSkipPu[4]; // Percentage of skip cu in all depths
29
+ double percentIntraPu[4]; // Percentage of intra modes in all depths
30
+ double percentAmpPu[4]; // Percentage of amp modes in all depths
31
+ double percentInterPu[4][3]; // Percentage of inter 2nx2n, 2nxn and nx2n in all depths
32
+ double percentMergePu[4][3]; // Percentage of merge 2nx2n, 2nxn and nx2n in all depth
33
+ double percentNxN;
34
+
35
+ /* All the above values will add up to 100%. */
36
+} x265_pu_stats;
37
+
38
+
39
typedef struct x265_analysis_2Pass
40
{
41
uint32_t poc;
42
43
int list0POC[16];
44
int list1POC[16];
45
uint16_t maxLumaLevel;
46
+ uint16_t minLumaLevel;
47
+
48
+ uint16_t maxChromaULevel;
49
+ uint16_t minChromaULevel;
50
+ double avgChromaULevel;
51
+
52
+
53
+ uint16_t maxChromaVLevel;
54
+ uint16_t minChromaVLevel;
55
+ double avgChromaVLevel;
56
+
57
char sliceType;
58
int bScenecut;
59
+ double ipCostRatio;
60
int frameLatency;
61
x265_cu_stats cuStats;
62
+ x265_pu_stats puStats;
63
double totalFrameTime;
64
} x265_frame_stats;
65
66
+typedef struct x265_ctu_info_t
67
+{
68
+ int32_t ctuAddress;
69
+ int32_t ctuPartitions[64];
70
+ void* ctuInfo;
71
+} x265_ctu_info_t;
72
+
73
+typedef enum
74
+{
75
+ NO_CTU_INFO = 0,
76
+ HAS_CTU_INFO = 1,
77
+ CTU_INFO_CHANGE = 2,
78
+}CTUInfo;
79
+
80
+
81
/* Arbitrary User SEI
82
* Payload size is in bytes and the payload pointer must be non-NULL.
83
* Payload types and syntax can be found in Annex D of the H.265 Specification.
84
85
* to allow the encoder to determine base QP */
86
int forceqp;
87
88
- /* If param.analysisMode is X265_ANALYSIS_OFF this field is ignored on input
89
+ /* If param.analysisReuseMode is X265_ANALYSIS_OFF this field is ignored on input
90
* and output. Else the user must call x265_alloc_analysis_data() to
91
* allocate analysis buffers for every picture passed to the encoder.
92
*
93
- * On input when param.analysisMode is X265_ANALYSIS_LOAD and analysisData
94
+ * On input when param.analysisReuseMode is X265_ANALYSIS_LOAD and analysisData
95
* member pointers are valid, the encoder will use the data stored here to
96
* reduce encoder work.
97
*
98
- * On output when param.analysisMode is X265_ANALYSIS_SAVE and analysisData
99
+ * On output when param.analysisReuseMode is X265_ANALYSIS_SAVE and analysisData
100
* member pointers are valid, the encoder will write output analysis into
101
* this data structure */
102
x265_analysis_data analysisData;
103
104
* X265_LOG_FULL, default is X265_LOG_INFO */
105
int logLevel;
106
107
- /* Filename of CSV log. Now deprecated */
108
+ /* Level of csv logging. 0 is summary, 1 is frame level logging,
109
+ * 2 is frame level logging with performance statistics */
110
+ int csvLogLevel;
111
+
112
+ /* filename of CSV log. If csvLogLevel is non-zero, the encoder will emit
113
+ * per-slice statistics to this log file in encode order. Otherwise the
114
+ * encoder will emit per-stream statistics into the log file when
115
+ * x265_encoder_log is called (presumably at the end of the encode) */
116
const char* csvfn;
117
118
/*== Internal Picture Specification ==*/
119
120
* buffers. if X265_ANALYSIS_LOAD, read analysis information into analysis
121
* buffer and use this analysis information to reduce the amount of work
122
* the encoder must perform. Default X265_ANALYSIS_OFF */
123
- int analysisMode;
124
+ int analysisReuseMode;
125
126
- /* Filename for analysisMode save/load. Default name is "x265_analysis.dat" */
127
- const char* analysisFileName;
128
+ /* Filename for analysisReuseMode save/load. Default name is "x265_analysis.dat" */
129
+ const char* analysisReuseFileName;
130
131
/*== Rate Control ==*/
132
133
134
135
/* sets a hard lower limit on QP */
136
int qpMin;
137
+
138
+ /* internally enable if tune grain is set */
139
+ int bEnableConstVbv;
140
} rc;
141
142
/*== Video Usability Information ==*/
143
144
int bHDROpt;
145
146
/* A value between 1 and 10 (both inclusive) determines the level of
147
- * information stored/reused in save/load analysis-mode. Higher the refine
148
- * level higher the informtion stored/reused. Default is 5 */
149
- int analysisRefineLevel;
150
+ * information stored/reused in save/load analysis-reuse-mode. Higher the refine
151
+ * level higher the information stored/reused. Default is 5 */
152
+ int analysisReuseLevel;
153
154
/* Limit Sample Adaptive Offset filter computation by early terminating SAO
155
* process based on inter prediction mode, CTU spatial-domain correlations,
156
157
/* Insert tone mapping information only for IDR frames and when the
158
* tone mapping information changes. */
159
int bDhdr10opt;
160
+
161
+ /* Determine how x265 react to the content information recieved through the API */
162
+ int bCTUInfo;
163
+
164
+ /* Use ratecontrol statistics from pic_in, if available*/
165
+ int bUseRcStats;
166
+
167
+ /* Factor by which input video is scaled down for analysis save mode. Default is 0 */
168
+ int scaleFactor;
169
+
170
+ /* Enable intra refinement in load mode*/
171
+ int intraRefine;
172
+
173
+ /* Enable inter refinement in load mode*/
174
+ int interRefine;
175
+
176
+ /* Enable motion vector refinement in load mode*/
177
+ int mvRefine;
178
+
179
+ /* Log of maximum CTU size */
180
+ uint32_t maxLog2CUSize;
181
+
182
+ /* Actual CU depth with respect to config depth */
183
+ uint32_t maxCUDepth;
184
+
185
+ /* CU depth with respect to maximum transform size */
186
+ uint32_t unitSizeDepth;
187
+
188
+ /* Number of 4x4 units in maximum CU size */
189
+ uint32_t num4x4Partitions;
190
+
191
+ /* Specify if analysis mode uses file for data reuse */
192
+ int bUseAnalysisFile;
193
+
194
+ /* File pointer for csv log */
195
+ FILE* csvfpt;
196
} x265_param;
197
+
198
/* x265_param_alloc:
199
* Allocates an x265_param instance. The returned param structure is not
200
* special in any way, but using this method together with x265_param_free()
201
202
void x265_encoder_get_stats(x265_encoder *encoder, x265_stats *, uint32_t statsSizeBytes);
203
204
/* x265_encoder_log:
205
- * This function is deprecated */
206
+ * write a line to the configured CSV file. If a CSV filename was not
207
+ * configured, or file open failed, this function will perform no write. */
208
void x265_encoder_log(x265_encoder *encoder, int argc, char **argv);
209
210
/* x265_encoder_close:
211
212
213
int x265_encoder_intra_refresh(x265_encoder *);
214
215
+/* x265_encoder_ctu_info:
216
+ * Copy CTU information such as ctu address and ctu partition structure of all
217
+ * CTUs in each frame. The function is invoked only if "--ctu-info" is enabled and
218
+ * the encoder will wait for this copy to complete if enabled.
219
+ */
220
+int x265_encoder_ctu_info(x265_encoder *, int poc, x265_ctu_info_t** ctu);
221
/* x265_cleanup:
222
* release library static allocations, reset configured CTU size */
223
void x265_cleanup(void);
224
225
226
int sizeof_frame_stats; /* sizeof(x265_frame_stats) */
227
int (*encoder_intra_refresh)(x265_encoder*);
228
+ int (*encoder_ctu_info)(x265_encoder*, int, x265_ctu_info_t**);
229
/* add new pointers to the end, or increment X265_MAJOR_VERSION */
230
} x265_api;
231
232
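Reviewer note: the comment in the new `x265_pu_stats` struct states that all of its values add up to 100%. A consumer of the frame statistics could sanity-check that invariant roughly as sketched below. This is an illustration under assumptions: `PuStats` is a local mirror of the struct layout shown in the diff so the example compiles without `x265.h`, the function names are hypothetical, and the 0.01 tolerance is an arbitrary choice to absorb floating-point rounding.

```cpp
#include <cassert>
#include <cmath>

// Local mirror of the x265_pu_stats fields from the diff (assumption: same
// layout; the real definition lives in x265.h).
struct PuStats
{
    double percentSkipPu[4];
    double percentIntraPu[4];
    double percentAmpPu[4];
    double percentInterPu[4][3];
    double percentMergePu[4][3];
    double percentNxN;
};

// Sum every percentage over the first `levels` CU size levels plus NxN.
double puStatsTotal(const PuStats& s, unsigned levels)
{
    assert(levels <= 4);
    double total = s.percentNxN;
    for (unsigned i = 0; i < levels; i++)
    {
        total += s.percentSkipPu[i] + s.percentIntraPu[i] + s.percentAmpPu[i];
        for (unsigned j = 0; j < 3; j++)
            total += s.percentInterPu[i][j] + s.percentMergePu[i][j];
    }
    return total;
}

// True when the percentages add up to 100 within the given tolerance.
bool puStatsConsistent(const PuStats& s, unsigned levels, double tolerance = 0.01)
{
    return std::fabs(puStatsTotal(s, levels) - 100.0) <= tolerance;
}
```

The arrays are fixed at four levels, which matches CU sizes from 64x64 down to 8x8. Entries for unused levels are assumed to be zero.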
x265_2.4.tar.gz/source/x265cli.h -> x265_2.5.tar.gz/source/x265cli.h
Changed
94
1
2
{ "scenecut", required_argument, NULL, 0 },
3
{ "no-scenecut", no_argument, NULL, 0 },
4
{ "scenecut-bias", required_argument, NULL, 0 },
5
+ { "ctu-info", required_argument, NULL, 0 },
6
{ "intra-refresh", no_argument, NULL, 0 },
7
{ "rc-lookahead", required_argument, NULL, 0 },
8
{ "lookahead-slices", required_argument, NULL, 0 },
9
10
{ "qpstep", required_argument, NULL, 0 },
11
{ "qpmin", required_argument, NULL, 0 },
12
{ "qpmax", required_argument, NULL, 0 },
13
+ { "const-vbv", no_argument, NULL, 0 },
14
+ { "no-const-vbv", no_argument, NULL, 0 },
15
{ "ratetol", required_argument, NULL, 0 },
16
{ "cplxblur", required_argument, NULL, 0 },
17
{ "qblur", required_argument, NULL, 0 },
18
19
{ "no-slow-firstpass", no_argument, NULL, 0 },
20
{ "multi-pass-opt-rps", no_argument, NULL, 0 },
21
{ "no-multi-pass-opt-rps", no_argument, NULL, 0 },
22
- { "analysis-mode", required_argument, NULL, 0 },
23
- { "analysis-file", required_argument, NULL, 0 },
24
- { "refine-level", required_argument, NULL, 0 },
25
+ { "analysis-reuse-mode", required_argument, NULL, 0 },
26
+ { "analysis-reuse-file", required_argument, NULL, 0 },
27
+ { "analysis-reuse-level", required_argument, NULL, 0 },
28
+ { "scale-factor", required_argument, NULL, 0 },
29
+ { "refine-intra", required_argument, NULL, 0 },
30
+ { "refine-inter", no_argument, NULL, 0 },
31
+ { "no-refine-inter",no_argument, NULL, 0 },
32
{ "strict-cbr", no_argument, NULL, 0 },
33
{ "temporal-layers", no_argument, NULL, 0 },
34
{ "no-temporal-layers", no_argument, NULL, 0 },
35
36
{ "dhdr10-info", required_argument, NULL, 0 },
37
{ "dhdr10-opt", no_argument, NULL, 0},
38
{ "no-dhdr10-opt", no_argument, NULL, 0},
39
+ { "refine-mv", no_argument, NULL, 0 },
40
+ { "no-refine-mv", no_argument, NULL, 0 },
41
{ 0, 0, 0, 0 },
42
{ 0, 0, 0, 0 },
43
{ 0, 0, 0, 0 },
44
45
H1(" 1 - i420 (4:2:0 default)\n");
46
H1(" 2 - i422 (4:2:2)\n");
47
H1(" 3 - i444 (4:4:4)\n");
48
-#if ENABLE_DYNAMIC_HDR10
49
- H0(" --dhdr10-info <filename> JSON file containing the Creative Intent Metadata to be encoded as Dynamic Tone Mapping \n");
50
- H0(" --[no-]dhdr10-opt Insert tone mapping SEI only for IDR frames and when the tone mapping information changes. Default disabled");
51
+#if ENABLE_HDR10_PLUS
52
+ H0(" --dhdr10-info <filename> JSON file containing the Creative Intent Metadata to be encoded as Dynamic Tone Mapping\n");
53
+ H0(" --[no-]dhdr10-opt Insert tone mapping SEI only for IDR frames and when the tone mapping information changes. Default disabled\n");
54
#endif
55
H0("-f/--frames <integer> Maximum number of frames to encode. Default all\n");
56
H0(" --seek <integer> First frame to encode\n");
57
58
H1(" --[no-]tskip-fast Enable fast intra transform skipping. Default %s\n", OPT(param->bEnableTSkipFast));
59
H1(" --nr-intra <integer> An integer value in range of 0 to 2000, which denotes strength of noise reduction in intra CUs. Default 0\n");
60
H1(" --nr-inter <integer> An integer value in range of 0 to 2000, which denotes strength of noise reduction in inter CUs. Default 0\n");
61
+ H0(" --ctu-info <integer> Enable receiving ctu information asynchronously and determine reaction to the CTU information (0, 1, 2, 4, 6) Default 0\n"
62
+ " - 1: force the partitions if CTU information is present\n"
63
+ " - 2: functionality of (1) and reduce qp if CTU information has changed\n"
64
+ " - 4: functionality of (1) and force Inter modes when CTU Information has changed, merge/skip otherwise\n"
65
+ " Enable this option only when planning to invoke the API function x265_encoder_ctu_info to copy ctu-info asynchronously\n");
66
H0("\nCoding tools:\n");
67
H0("-w/--[no-]weightp Enable weighted prediction in P slices. Default %s\n", OPT(param->bEnableWeightedPred));
68
H0(" --[no-]weightb Enable weighted prediction in B slices. Default %s\n", OPT(param->bEnableWeightedBiPred));
69
70
H0(" --[no-]analyze-src-pics Motion estimation uses source frame planes. Default disable\n");
71
H0(" --[no-]slow-firstpass Enable a slow first pass in a multipass rate control mode. Default %s\n", OPT(param->rc.bEnableSlowFirstPass));
72
H0(" --[no-]strict-cbr Enable stricter conditions and tolerance for bitrate deviations in CBR mode. Default %s\n", OPT(param->rc.bStrictCbr));
73
- H0(" --analysis-mode <string|int> save - Dump analysis info into file, load - Load analysis buffers from the file. Default %d\n", param->analysisMode);
74
- H0(" --analysis-file <filename> Specify file name used for either dumping or reading analysis data.\n");
75
- H0(" --refine-level <1..10> Level of analysis refinement indicates amount of info stored/reused in save/load mode, 1:least....10:most. Default %d\n", param->analysisRefineLevel);
76
+ H0(" --analysis-reuse-mode <string|int> save - Dump analysis info into file, load - Load analysis buffers from the file. Default %d\n", param->analysisReuseMode);
77
+ H0(" --analysis-reuse-file <filename> Specify file name used for either dumping or reading analysis data. Deault x265_analysis.dat\n");
78
+ H0(" --analysis-reuse-level <1..10> Level of analysis reuse indicates amount of info stored/reused in save/load mode, 1:least..10:most. Default %d\n", param->analysisReuseLevel);
79
+ H0(" --scale-factor <int> Specify factor by which input video is scaled down for analysis save mode. Default %d\n", param->scaleFactor);
80
+ H0(" --refine-intra <int> Enable intra refinement for load mode. Default %d\n", param->intraRefine);
81
+ H0(" --[no-]refine-inter Enable inter refinement for load mode. Default %s\n", OPT(param->interRefine));
82
+ H0(" --[no-]refine-mv Enable mv refinement for load mode. Default %s\n", OPT(param->mvRefine));
83
H0(" --aq-mode <integer> Mode for Adaptive Quantization - 0:none 1:uniform AQ 2:auto variance 3:auto variance with bias to dark scenes. Default %d\n", param->rc.aqMode);
84
H0(" --aq-strength <float> Reduces blocking and blurring in flat and textured areas (0 to 3.0). Default %.2f\n", param->rc.aqStrength);
85
H0(" --[no-]aq-motion Adaptive Quantization based on the relative motion of each CU w.r.t., frame. Default %s\n", OPT(param->bOptCUDeltaQP));
86
87
H1(" --qpstep <integer> The maximum single adjustment in QP allowed to rate control. Default %d\n", param->rc.qpStep);
88
H1(" --qpmin <integer> sets a hard lower limit on QP allowed to ratecontrol. Default %d\n", param->rc.qpMin);
89
H1(" --qpmax <integer> sets a hard upper limit on QP allowed to ratecontrol. Default %d\n", param->rc.qpMax);
90
+ H0(" --[no-]const-vbv Enable consistent vbv. turned on with tune grain. Default %s\n", OPT(param->rc.bEnableConstVbv));
91
H1(" --cbqpoffs <integer> Chroma Cb QP Offset [-12..12]. Default %d\n", param->cbQpOffset);
92
H1(" --crqpoffs <integer> Chroma Cr QP Offset [-12..12]. Default %d\n", param->crQpOffset);
93
H1(" --scaling-list <string> Specify a file containing HM style quant scaling lists or 'default' or 'off'. Default: off\n");
94
Request History
enzokiel created request over 7 years ago
Update to version 2.5
enzokiel accepted request over 7 years ago