We truncated the diff of some files because they were too big.
Changes of Revision 10
x265.changes
Changed
-------------------------------------------------------------------
+Fri May 29 09:11:02 UTC 2015 - aloisio@gmx.com
+
+- soname bump to 59
+- Update to version 1.7
+ * large amount of assembly code optimizations
+ * some preliminary support for high dynamic range content
+ * improvements for multi-library support
+ * some new quality features
+ (full documentation at: http://x265.readthedocs.org/en/1.7)
+ * This release simplifies the multi-library support introduced
+ in version 1.6. Any libx265 can now forward API requests to
+ other installed libx265 libraries (by name) so applications
+ like ffmpeg and the x265 CLI can select between 8bit and 10bit
+ encodes at runtime without the need of a shim library or
+ library load path hacks. See --output-depth, and
+ http://x265.readthedocs.org/en/1.7/api.html#multi-library-interface
+ * For quality, x265 now allows you to configure the quantization
+ group size smaller than the CTU size (for finer grained AQ
+ adjustments). See --qg-size.
+ * x265 now supports limited mid-encode reconfigure via a new public
+ method: x265_encoder_reconfig()
+ * For HDR, x265 now supports signaling the SMPTE 2084 color transfer
+ function, the SMPTE 2086 mastering display color primaries, and the
+ content light levels. See --master-display, --max-cll
+ * x265 will no longer emit any non-conformant bitstreams unless
+ --allow-non-conformance is specified.
+ * The x265 CLI now supports a simple encode preview feature. See
+ --recon-y4m-exec.
+ * The AnnexB NAL headers can now be configured off, via x265_param.bAnnexB
+ This is not configurable via the CLI because it is a function of the
+ muxer being used, and the CLI only supports raw output files. See
+ --annexb
+ Misc:
+ * --lossless encodes are now signaled as level 8.5
+ * --profile now has a -P short option
+ * The regression scripts used by x265 are now public, and can be found at:
+ https://bitbucket.org/sborho/test-harness
+ * x265's cmake scripts now support PGO builds, the test-harness can be
+ used to drive the profile-guided build process.
+
+-------------------------------------------------------------------
Tue Apr 28 20:08:06 UTC 2015 - aloisio@gmx.com

- soname bumped to 51
x265.spec
Changed
# based on the spec file from https://build.opensuse.org/package/view_file/home:Simmphonie/libx265/

Name: x265
-%define soname 51
+%define soname 59
%define libname lib%{name}
%define libsoname %{libname}-%{soname}
-Version: 1.6
+Version: 1.7
Release: 0
License: GPL-2.0+
Summary: A free h265/HEVC encoder - encoder binary
baselibs.conf
Changed
-libx265-51
+libx265-59
x265_1.6.tar.gz/.hg_archival.txt -> x265_1.7.tar.gz/.hg_archival.txt
Changed
repo: 09fe40627f03a0f9c3e6ac78b22ac93da23f9fdf
-node: cbeb7d8a4880e4020c4545dd8e498432c3c6cad3
+node: 8425278def1edf0931dc33fc518e1950063e76b0
branch: stable
-tag: 1.6
+tag: 1.7
x265_1.6.tar.gz/.hgtags -> x265_1.7.tar.gz/.hgtags
Changed
c1e4fc0162c14fdb84f5c3bd404fb28cfe10a17f 1.3
5e604833c5aa605d0b6efbe5234492b5e7d8ac61 1.4
9f0324125f53a12f766f6ed6f98f16e2f42337f4 1.5
+cbeb7d8a4880e4020c4545dd8e498432c3c6cad3 1.6
x265_1.6.tar.gz/doc/reST/api.rst -> x265_1.7.tar.gz/doc/reST/api.rst
Changed
* how x265_encoder_open has changed the parameters.
* note that the data accessible through pointers in the returned param struct
* (e.g. filenames) should not be modified by the calling application. */
- void x265_encoder_parameters(x265_encoder *, x265_param *);
-
+ void x265_encoder_parameters(x265_encoder *, x265_param *);
+
+**x265_encoder_reconfig()** may be used to reconfigure encoder parameters mid-encode::
+
+ /* x265_encoder_reconfig:
+ * used to modify encoder parameters.
+ * various parameters from x265_param are copied.
+ * this takes effect immediately, on whichever frame is encoded next;
+ * returns 0 on success, negative on parameter validation error.
+ *
+ * not all parameters can be changed; see the actual function for a
+ * detailed breakdown. since not all parameters can be changed, moving
+ * from preset to preset may not always fully copy all relevant parameters,
+ * but should still work usably in practice. however, more so than for
+ * other presets, many of the speed shortcuts used in ultrafast cannot be
+ * switched out of; using reconfig to switch between ultrafast and other
+ * presets is not recommended without a more fine-grained breakdown of
+ * parameters to take this into account. */
+ int x265_encoder_reconfig(x265_encoder *, x265_param *);
+
Pictures
========

Multi-library Interface
=======================

-If your application might want to make a runtime selection between among
+If your application might want to make a runtime selection between
a number of libx265 libraries (perhaps 8bpp and 16bpp), then you will
want to use the multi-library interface.

* libx265 */
const x265_api* x265_api_get(int bitDepth);

-The general idea is to request the API for the bitDepth you would prefer
-the encoder to use (8 or 10), and if that returns NULL you request the
-API for bitDepth=0, which returns the system default libx265.
-
Note that using this multi-library API in your application is only the
-first step. Next your application must dynamically link to libx265 and
-then you must build and install a multi-lib configuration of libx265,
-which includes 8bpp and 16bpp builds of libx265 and a shim library which
-forwards x265_api_get() calls to the appropriate library using dynamic
-loading and binding.
+first step.
+
+Your application must link to one build of libx265 (statically or
+dynamically) and this linked version of libx265 will support one
+bit-depth (8 or 10 bits).
+
+Your application must now request the API for the bitDepth you would
+prefer the encoder to use (8 or 10). If the requested bitdepth is zero,
+or if it matches the bitdepth of the system default libx265 (the
+currently linked library), then this library will be used for encode.
+If you request a different bit-depth, the linked libx265 will attempt
+to dynamically bind a shared library with a name appropriate for the
+requested bit-depth:
+
+ 8-bit: libx265_main.dll
+ 10-bit: libx265_main10.dll
+
+ (the shared library extension is obviously platform specific. On
+ Linux it is .so while on Mac it is .dylib)
+
+For example on Windows, one could package together an x265.exe
+statically linked against the 8bpp libx265 together with a
+libx265_main10.dll in the same folder, and this executable would be able
+to encode main and main10 bitstreams.
+
+On Linux, x265 packagers could install 8bpp static and shared libraries
+under the name libx265 (so all applications link against 8bpp libx265)
+and then also install libx265_main10.so (symlinked to its numbered solib).
+Thus applications which use x265_api_get() will be able to generate main
+or main10 bitstreams.
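The bit-depth dispatch rule described in the api.rst hunk (use the linked library when the requested depth is 0 or matches it, otherwise try to bind libx265_main or libx265_main10 with the platform's shared-library extension) can be sketched as a small C++ helper. `libraryToBind` and `Platform` are hypothetical names used only for illustration; they are not part of the libx265 API, and the sketch ignores numbered sonames and the actual dynamic-loading step.

```cpp
#include <cassert>
#include <string>

// Hypothetical illustration, not part of libx265.
enum class Platform { Windows, Linux, MacOS };

// Returns the shared-library name the linked libx265 would try to bind for
// requestedDepth, or an empty string when no extra library is needed
// (depth 0 or same depth as the linked build) or the depth is unsupported.
std::string libraryToBind(int requestedDepth, int linkedDepth, Platform platform)
{
    assert(linkedDepth == 8 || linkedDepth == 10);

    // Depth 0, or the depth the linked library was built for: use it directly.
    if (requestedDepth == 0 || requestedDepth == linkedDepth)
        return std::string();

    std::string base;
    if (requestedDepth == 8)
        base = "libx265_main";
    else if (requestedDepth == 10)
        base = "libx265_main10";
    else
        return std::string(); // unsupported depth: nothing to bind

    // The extension is platform specific, as the documentation notes.
    switch (platform)
    {
    case Platform::Windows: return base + ".dll";
    case Platform::MacOS:   return base + ".dylib";
    case Platform::Linux:   return base + ".so";
    }
    return std::string();
}
```

In an application the real entry point remains `x265_api_get(bitDepth)`; the helper only makes the naming convention explicit.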
x265_1.6.tar.gz/doc/reST/cli.rst -> x265_1.7.tar.gz/doc/reST/cli.rst
Changed
handled implicitly.

One may also directly supply the CPU capability bitmap as an integer.
+
+ Note that by specifying this option you are overriding x265's CPU
+ detection and it is possible to do this wrong. You can cause encoder
+ crashes by specifying SIMD architectures which are not supported on
+ your CPU.
+
+ Default: auto-detected SIMD architectures

.. option:: --frame-threads, -F <integer>

Over-allocation of frame threads will not improve performance, it
will generally just increase memory use.

- **Values:** any value between 8 and 16. Default is 0, auto-detect
+ **Values:** any value between 0 and 16. Default is 0, auto-detect

.. option:: --pools <string>, --numa-pools <string>

their node, they will not be allowed to migrate between nodes, but they
will be allowed to move between CPU cores within their node.

- If the three pool features: :option:`--wpp` :option:`--pmode` and
- :option:`--pme` are all disabled, then :option:`--pools` is ignored
- and no thread pools are created.
+ If the four pool features: :option:`--wpp`, :option:`--pmode`,
+ :option:`--pme` and :option:`--lookahead-slices` are all disabled,
+ then :option:`--pools` is ignored and no thread pools are created.

- If "none" is specified, then all three of the thread pool features are
+ If "none" is specified, then all four of the thread pool features are
implicitly disabled.

Multiple thread pools will be allocated for any NUMA node with more than

:option:`--frame-threads`. The pools are used for WPP and for
distributed analysis and motion search.

+ On Windows, the native APIs offer sufficient functionality to
+ discover the NUMA topology and enforce the thread affinity that
+ libx265 needs (so long as you have not chosen to target XP or
+ Vista), but on POSIX systems it relies on libnuma for this
+ functionality. If your target POSIX system is single socket, then
+ building without libnuma is a perfectly reasonable option, as it
+ will have no effect on the runtime behavior. On a multiple-socket
+ system, a POSIX build of libx265 without libnuma will be less work
+ efficient. See :ref:`thread pools <pools>` for more detail.
+
Default "", one thread is allocated per detected hardware thread
(logical CPU cores) and one thread pool per NUMA node.

+ Note that the string value will need to be escaped or quoted to
+ protect against shell expansion on many platforms
+
.. option:: --wpp, --no-wpp

Enable Wavefront Parallel Processing. The encoder may begin encoding

**CLI ONLY**

+.. option:: --output-depth, -D 8|10
+
+ Bitdepth of output HEVC bitstream, which is also the internal bit
+ depth of the encoder. If the requested bit depth is not the bit
+ depth of the linked libx265, it will attempt to bind libx265_main
+ for an 8bit encoder, or libx265_main10 for a 10bit encoder, with the
+ same API version as the linked libx265.
+
+ **CLI ONLY**
+
Profile, Level, Tier
====================

-.. option:: --profile <string>
+.. option:: --profile, -P <string>

Enforce the requirements of the specified profile, ensuring the
output stream will be decodable by a decoder which supports that

times 10, for example level **5.1** is specified as "5.1" or "51",
and level **5.0** is specified as "5.0" or "50".

- Annex A levels: 1, 2, 2.1, 3, 3.1, 4, 4.1, 5, 5.1, 5.2, 6, 6.1, 6.2
+ Annex A levels: 1, 2, 2.1, 3, 3.1, 4, 4.1, 5, 5.1, 5.2, 6, 6.1, 6.2, 8.5

.. option:: --high-tier, --no-high-tier

HEVC specification. If x265 detects that the total reference count
is greater than 8, it will issue a warning that the resulting stream
is non-compliant and it signals the stream as profile NONE and level
- NONE but still allows the encode to continue. Compliant HEVC
+ NONE and will abort the encode unless
+ :option:`--allow-non-conformance` it specified. Compliant HEVC
decoders may refuse to decode such streams.

Default 3

+.. option:: --allow-non-conformance, --no-allow-non-conformance
+
+ Allow libx265 to generate a bitstream with profile and level NONE.
+ By default it will abort any encode which does not meet strict level
+ compliance. The two most likely causes for non-conformance are
+ :option:`--ctu` being too small, :option:`--ref` being too high,
+ or the bitrate or resolution being out of specification.
+
+ Default: disabled
+
.. note::
:option:`--profile`, :option:`--level-idc`, and
:option:`--high-tier` are only intended for use when you are

limitations and must constrain the bitstream within those limits.
Specifying a profile or level may lower the encode quality
parameters to meet those requirements but it will never raise
- them.
+ them. It may enable VBV constraints on a CRF encode.

Mode decision / Analysis
========================

**Range of values:** 0.0 to 3.0

+.. option:: --qg-size <64|32|16>
+
+ Enable adaptive quantization for sub-CTUs. This parameter specifies
+ the minimum CU size at which QP can be adjusted, ie. Quantization Group
+ size. Allowed range of values are 64, 32, 16 provided this falls within
+ the inclusive range [maxCUSize, minCUSize]. Experimental.
+ Default: same as maxCUSize
+
.. option:: --cutree, --no-cutree

Enable the use of lookahead's lowres motion vector fields to

.. option:: --strict-cbr, --no-strict-cbr

Enables stricter conditions to control bitrate deviance from the
- target bitrate in CBR mode. Bitrate adherence is prioritised
+ target bitrate in ABR mode. Bit rate adherence is prioritised
over quality. Rate tolerance is reduced to 50%. Default disabled.

This option is for use-cases which require the final average bitrate
- to be within very strict limits of the target - preventing overshoots
- completely, and achieve bitrates within 5% of target bitrate,
+ to be within very strict limits of the target; preventing overshoots,
+ while keeping the bit rate within 5% of the target setting,
especially in short segment encodes. Typically, the encoder stays
conservative, waiting until there is enough feedback in terms of
encoded frames to control QP. strict-cbr allows the encoder to be

lookahead). Default value is 0.6. Increasing it to 1 will
effectively generate CQP

-.. option:: --qstep <integer>
+.. option:: --qpstep <integer>

The maximum single adjustment in QP allowed to rate control. Default
4

specification for a description of these values. Default undefined
(not signaled)

+.. option:: --master-display <string>
+
+ SMPTE ST 2086 mastering display color volume SEI info, specified as
+ a string which is parsed when the stream header SEI are emitted. The
+ string format is "G(%hu,%hu)B(%hu,%hu)R(%hu,%hu)WP(%hu,%hu)L(%u,%u)"
+ where %hu are unsigned 16bit integers and %u are unsigned 32bit
+ integers. The SEI includes X,Y display primaries for RGB channels,
+ white point X,Y and max,min luminance values. (HDR)
+
+ Example for P65D3 1000-nits:
+
+ G(13200,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1)
+
+ Note that this string value will need to be escaped or quoted to
+ protect against shell expansion on many platforms. No default.
+
+.. option:: --max-cll <string>
+
+ Maximum content light level and maximum frame average light level as
+ required by the Consumer Electronics Association 861.3 specification.
+
+ Specified as a string which is parsed when the stream header SEI are
+ emitted. The string format is "%hu,%hu" where %hu are unsigned 16bit
+ integers. The first value is the max content light level (or 0 if no
+ maximum is indicated), the second value is the maximum picture
+ average light level (or 0). (HDR)
+
+ Note that this string value will need to be escaped or quoted to
+ protect against shell expansion on many platforms. No default.
+
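The `--master-display` and `--max-cll` formats quoted in the cli.rst hunk are plain `scanf`-style patterns, so a front end could validate them before handing them to the encoder. The following is a minimal sketch under that assumption: `MasteringDisplay`, `ContentLightLevel` and the two parse functions are hypothetical helpers, not libx265's own parser, and they check only that every field was converted, not that the values are in range.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical containers for the two HDR SEI strings.
struct MasteringDisplay
{
    unsigned short gx, gy, bx, by, rx, ry; // display primaries (X,Y per channel)
    unsigned short wpx, wpy;               // white point X,Y
    unsigned int maxLuma, minLuma;         // max,min luminance
};

struct ContentLightLevel
{
    unsigned short maxCLL;  // max content light level (0 = not indicated)
    unsigned short maxFALL; // max frame-average light level (0 = not indicated)
};

// Parses "G(%hu,%hu)B(%hu,%hu)R(%hu,%hu)WP(%hu,%hu)L(%u,%u)".
// sscanf returns the number of converted fields; all ten must be present.
bool parseMasterDisplay(const char* s, MasteringDisplay& md)
{
    return std::sscanf(s, "G(%hu,%hu)B(%hu,%hu)R(%hu,%hu)WP(%hu,%hu)L(%u,%u)",
                       &md.gx, &md.gy, &md.bx, &md.by, &md.rx, &md.ry,
                       &md.wpx, &md.wpy, &md.maxLuma, &md.minLuma) == 10;
}

// Parses "%hu,%hu" (max content light level, max frame-average light level).
bool parseMaxCll(const char* s, ContentLightLevel& cll)
{
    return std::sscanf(s, "%hu,%hu", &cll.maxCLL, &cll.maxFALL) == 2;
}
```

On a shell command line the strings still need to be quoted or escaped, as the option text notes, because of the parentheses.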
x265_1.6.tar.gz/doc/reST/threading.rst -> x265_1.7.tar.gz/doc/reST/threading.rst
Changed
Threading
*********

+.. _pools:
+
Thread Pools
============

expected to drop that job so the worker thread may go back to the pool
and find more work.

+On Windows, the native APIs offer sufficient functionality to discover
+the NUMA topology and enforce the thread affinity that libx265 needs (so
+long as you have not chosen to target XP or Vista), but on POSIX systems
+it relies on libnuma for this functionality. If your target POSIX system
+is single socket, then building without libnuma is a perfectly
+reasonable option, as it will have no effect on the runtime behavior. On
+a multiple-socket system, a POSIX build of libx265 without libnuma will
+be less work efficient, but will still function correctly. You lose the
+work isolation effect that keeps each frame encoder from only using the
+threads of a single socket and so you incur a heavier context switching
+cost.
+
Wavefront Parallel Processing
=============================

lowres cost analysis to worker threads. It will use bonded task groups
to perform batches of frame cost estimates, and it may optionally use
bonded task groups to measure single frame cost estimates using slices.
+(see :option:`--lookahead-slices`)

The function slicetypeDecide() itself is also be performed by a worker
thread if your encoder has a thread pool, else it runs within the
x265_1.6.tar.gz/readme.rst -> x265_1.7.tar.gz/readme.rst
Changed
=================

| **Read:** | Online `documentation <http://x265.readthedocs.org/en/default/>`_ | Developer `wiki <http://bitbucket.org/multicoreware/x265/wiki/>`_
-| **Download:** | `releases <http://bitbucket.org/multicoreware/x265/downloads/>`_
+| **Download:** | `releases <http://ftp.videolan.org/pub/videolan/x265/>`_
| **Interact:** | #x265 on freenode.irc.net | `x265-devel@videolan.org <http://mailman.videolan.org/listinfo/x265-devel>`_ | `Report an issue <https://bitbucket.org/multicoreware/x265/issues?status=new&status=open>`_

`x265 <https://www.videolan.org/developers/x265.html>`_ is an open
x265_1.6.tar.gz/source/CMakeLists.txt -> x265_1.7.tar.gz/source/CMakeLists.txt
Changed
mark_as_advanced(FPROFILE_USE FPROFILE_GENERATE NATIVE_BUILD)

# X265_BUILD must be incremented each time the public API is changed
-set(X265_BUILD 51)
+set(X265_BUILD 59)
configure_file("${PROJECT_SOURCE_DIR}/x265.def.in"
"${PROJECT_BINARY_DIR}/x265.def")
configure_file("${PROJECT_SOURCE_DIR}/x265_config.h.in"

if(LIBRT)
list(APPEND PLATFORM_LIBS rt)
endif()
+ find_library(LIBDL dl)
+ if(LIBDL)
+ list(APPEND PLATFORM_LIBS dl)
+ endif()
find_package(Numa)
if(NUMA_FOUND)
- list(APPEND CMAKE_REQUIRED_LIBRARIES ${NUMA_LIBRARY})
+ link_directories(${NUMA_LIBRARY_DIR})
+ list(APPEND CMAKE_REQUIRED_LIBRARIES numa)
check_symbol_exists(numa_node_of_cpu numa.h NUMA_V2)
if(NUMA_V2)
add_definitions(-DHAVE_LIBNUMA)
message(STATUS "libnuma found, building with support for NUMA nodes")
- list(APPEND PLATFORM_LIBS ${NUMA_LIBRARY})
- link_directories(${NUMA_LIBRARY_DIR})
+ list(APPEND PLATFORM_LIBS numa)
include_directories(${NUMA_INCLUDE_DIR})
endif()
endif()

if(CMAKE_GENERATOR STREQUAL "Xcode")
set(XCODE 1)
endif()
-if (APPLE)
+if(APPLE)
add_definitions(-DMACOS)
endif()

add_definitions(-static)
list(APPEND LINKER_OPTIONS "-static")
endif(STATIC_LINK_CRT)
+ check_cxx_compiler_flag(-Wno-strict-overflow CC_HAS_NO_STRICT_OVERFLOW)
check_cxx_compiler_flag(-Wno-narrowing CC_HAS_NO_NARROWING)
check_cxx_compiler_flag(-Wno-array-bounds CC_HAS_NO_ARRAY_BOUNDS)
if (CC_HAS_NO_ARRAY_BOUNDS)

endif()
endif(WARNINGS_AS_ERRORS)

-if (WIN32)
+if(WIN32)
# Visual leak detector
find_package(VLD QUIET)
if(VLD_FOUND)

list(APPEND PLATFORM_LIBS ${VLD_LIBRARIES})
link_directories(${VLD_LIBRARY_DIRS})
endif()
- option(WINXP_SUPPORT "Make binaries compatible with Windows XP" OFF)
+ option(WINXP_SUPPORT "Make binaries compatible with Windows XP and Vista" OFF)
if(WINXP_SUPPORT)
# force use of workarounds for CONDITION_VARIABLE and atomic
# intrinsics introduced after XP
- add_definitions(-D_WIN32_WINNT=_WIN32_WINNT_WINXP)
- endif()
+ add_definitions(-D_WIN32_WINNT=_WIN32_WINNT_WINXP -D_WIN32_WINNT_WIN7=0x0601)
+ else(WINXP_SUPPORT)
+ # default to targeting Windows 7 for the NUMA APIs
+ add_definitions(-D_WIN32_WINNT=_WIN32_WINNT_WIN7)
+ endif(WINXP_SUPPORT)
endif()

include(version) # determine X265_VERSION and X265_LATEST_TAG

# Main CLI application
option(ENABLE_CLI "Build standalone CLI application" ON)
if(ENABLE_CLI)
- file(GLOB InputFiles input/*.cpp input/*.h)
- file(GLOB OutputFiles output/*.cpp output/*.h)
+ file(GLOB InputFiles input/input.cpp input/yuv.cpp input/y4m.cpp input/*.h)
+ file(GLOB OutputFiles output/output.cpp output/reconplay.cpp output/*.h
+ output/yuv.cpp output/y4m.cpp # recon
+ output/raw.cpp) # muxers
file(GLOB FilterFiles filters/*.cpp filters/*.h)
source_group(input FILES ${InputFiles})
source_group(output FILES ${OutputFiles})
x265_1.6.tar.gz/source/common/common.cpp -> x265_1.7.tar.gz/source/common/common.cpp
Changed
return (x265_exp2_lut[i & 63] + 256) << (i >> 6) >> 8;
}

-void x265_log(const x265_param *param, int level, const char *fmt, ...)
+void general_log(const x265_param* param, const char* caller, int level, const char* fmt, ...)
{
if (param && level > param->logLevel)
return;
- const char *log_level;
+ const int bufferSize = 4096;
+ char buffer[bufferSize];
+ int p = 0;
+ const char* log_level;
switch (level)
{
case X265_LOG_ERROR:

break;
}

- fprintf(stderr, "x265 [%s]: ", log_level);
+ if (caller)
+ p += sprintf(buffer, "%-4s [%s]: ", caller, log_level);
va_list arg;
va_start(arg, fmt);
- vfprintf(stderr, fmt, arg);
+ vsnprintf(buffer + p, bufferSize - p, fmt, arg);
va_end(arg);
+ fputs(buffer, stderr);
}

double x265_ssim2dB(double ssim)
x265_1.6.tar.gz/source/common/common.h -> x265_1.7.tar.gz/source/common/common.h
Changed
/* outside x265 namespace, but prefixed. defined in common.cpp */
int64_t x265_mdate(void);
-void x265_log(const x265_param *param, int level, const char *fmt, ...);
+#define x265_log(param, ...) general_log(param, "x265", __VA_ARGS__)
+void general_log(const x265_param* param, const char* caller, int level, const char* fmt, ...);
int x265_exp2fix8(double x);

double x265_ssim2dB(double ssim);
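The common.cpp and common.h hunks replace the direct `fprintf` calls with a `general_log()` that formats a `caller [level]: ` prefix and the message into one 4096-byte buffer and writes it with a single `fputs`, while a variadic macro keeps existing `x265_log()` call sites unchanged. A self-contained sketch of that pattern follows. `formatLog` and `demo_log` are hypothetical names; the level name is passed as a string rather than derived from an `X265_LOG_*` constant, the text is returned instead of written to `stderr` so it can be checked, and the prefix uses bounded `snprintf` where the diffed code uses `sprintf`.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical stand-in for general_log(): builds "<caller> [<level>]: <message>"
// in one fixed-size buffer and returns it (the real function fputs it to stderr).
std::string formatLog(const char* caller, const char* levelName, const char* fmt, ...)
{
    const int bufferSize = 4096;
    char buffer[bufferSize];
    buffer[0] = '\0';
    int p = 0;
    if (caller)
    {
        // Bounded prefix; clamp p so the message write below stays in range.
        p = std::snprintf(buffer, bufferSize, "%-4s [%s]: ", caller, levelName);
        if (p < 0)
            p = 0;
        else if (p >= bufferSize)
            p = bufferSize - 1;
    }
    va_list arg;
    va_start(arg, fmt);
    std::vsnprintf(buffer + p, bufferSize - p, fmt, arg); // message after prefix
    va_end(arg);
    return std::string(buffer);
}

// Same idea as the new x265_log macro: inject a fixed "x265" caller tag.
#define demo_log(levelName, ...) formatLog("x265", levelName, __VA_ARGS__)
```

Keeping `x265_log` as a macro means only the definition changes; every caller still writes `x265_log(param, level, fmt, ...)`.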
x265_1.6.tar.gz/source/common/constants.cpp -> x265_1.7.tar.gz/source/common/constants.cpp
Changed
4, 12, 20, 28, 5, 13, 21, 29, 6, 14, 22, 30, 7, 15, 23, 31, 36, 44, 52, 60, 37, 45, 53, 61, 38, 46, 54, 62, 39, 47, 55, 63 }
};

-const uint16_t g_scan4x4[NUM_SCAN_TYPE][4 * 4] =
+ALIGN_VAR_16(const uint16_t, g_scan4x4[NUM_SCAN_TYPE][4 * 4]) =
{
{ 0, 4, 1, 8, 5, 2, 12, 9, 6, 3, 13, 10, 7, 14, 11, 15 },
{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 },
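The constants.cpp hunk wraps `g_scan4x4` in `ALIGN_VAR_16`, presumably so that SIMD code can read the scan tables with 16-byte aligned loads. The macro's real definition is compiler specific and is not shown in this diff; the sketch below assumes it is equivalent to C++11 `alignas(16)` and uses a hypothetical `DEMO_ALIGN_VAR_16` with the first two table rows from the hunk.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical portable stand-in for ALIGN_VAR_16(type, name).
#define DEMO_ALIGN_VAR_16(T, var) alignas(16) T var

// First two rows of g_scan4x4 as shown in the hunk. Each row is
// 16 x uint16_t = 32 bytes, so every row also starts on a 16-byte boundary.
DEMO_ALIGN_VAR_16(const uint16_t, demo_scan4x4[2][4 * 4]) =
{
    { 0, 4, 1, 8, 5, 2, 12, 9, 6, 3, 13, 10, 7, 14, 11, 15 },
    { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 },
};

// True when ptr could be used with a 16-byte aligned SIMD load.
bool isAligned16(const void* ptr)
{
    return reinterpret_cast<std::uintptr_t>(ptr) % 16 == 0;
}
```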
x265_1.6.tar.gz/source/common/contexts.h -> x265_1.7.tar.gz/source/common/contexts.h
Changed
// private namespace

extern const uint32_t g_entropyBits[128];
+extern const uint32_t g_entropyStateBits[128];
extern const uint8_t g_nextState[128][2];

#define sbacGetMps(S) ((S) & 1)
x265_1.6.tar.gz/source/common/cudata.cpp -> x265_1.7.tar.gz/source/common/cudata.cpp
Changed
}

// initialize Sub partition
-void CUData::initSubCU(const CUData& ctu, const CUGeom& cuGeom)
+void CUData::initSubCU(const CUData& ctu, const CUGeom& cuGeom, int qp)
{
m_absIdxInCTU = cuGeom.absPartIdx;
m_encData = ctu.m_encData;

m_cuAboveRight = ctu.m_cuAboveRight;
X265_CHECK(m_numPartitions == cuGeom.numPartitions, "initSubCU() size mismatch\n");

- /* sequential memsets */
- m_partSet((uint8_t*)m_qp, (uint8_t)ctu.m_qp[0]);
+ m_partSet((uint8_t*)m_qp, (uint8_t)qp);
+
m_partSet(m_log2CUSize, (uint8_t)cuGeom.log2CUSize);
m_partSet(m_lumaIntraDir, (uint8_t)DC_IDX);
m_partSet(m_tqBypass, (uint8_t)m_encData->m_param->bLossless);

}
}

+/* Clip motion vector to within slightly padded boundary of picture (the
+ * MV may reference a block that is completely within the padded area).
+ * Note this function is unaware of how much of this picture is actually
+ * available for use (re: frame parallelism) */
void CUData::clipMv(MV& outMV) const
{
const uint32_t mvshift = 2;

uint32_t blockSize = 1 << log2CUSize;
uint32_t sbWidth = 1 << (g_log2Size[maxCUSize] - log2CUSize);
int32_t lastLevelFlag = log2CUSize == g_log2Size[minCUSize];
+
for (uint32_t sbY = 0; sbY < sbWidth; sbY++)
{
for (uint32_t sbX = 0; sbX < sbWidth; sbX++)
x265_1.6.tar.gz/source/common/cudata.h -> x265_1.7.tar.gz/source/common/cudata.h
Changed
uint32_t childOffset; // offset of the first child CU from current CU
uint32_t absPartIdx; // Part index of this CU in terms of 4x4 blocks.
uint32_t numPartitions; // Number of 4x4 blocks in the CU
- uint32_t depth; // depth of this CU relative from CTU
uint32_t flags; // CU flags.
+ uint32_t depth; // depth of this CU relative from CTU
};

struct MVField

static void calcCTUGeoms(uint32_t ctuWidth, uint32_t ctuHeight, uint32_t maxCUSize, uint32_t minCUSize, CUGeom cuDataArray[CUGeom::MAX_GEOMS]);

void initCTU(const Frame& frame, uint32_t cuAddr, int qp);
- void initSubCU(const CUData& ctu, const CUGeom& cuGeom);
+ void initSubCU(const CUData& ctu, const CUGeom& cuGeom, int qp);
void initLosslessCU(const CUData& cu, const CUGeom& cuGeom);

void copyPartFrom(const CUData& cu, const CUGeom& childGeom, uint32_t subPartIdx);
x265_1.6.tar.gz/source/common/dct.cpp -> x265_1.7.tar.gz/source/common/dct.cpp
Changed
}
}

-int findPosLast_c(const uint16_t *scan, const coeff_t *coeff, uint16_t *coeffSign, uint16_t *coeffFlag, uint8_t *coeffNum, int numSig)
+int scanPosLast_c(const uint16_t *scan, const coeff_t *coeff, uint16_t *coeffSign, uint16_t *coeffFlag, uint8_t *coeffNum, int numSig, const uint16_t* /*scanCG4x4*/, const int /*trSize*/)
{
memset(coeffNum, 0, MLS_GRP_NUM * sizeof(*coeffNum));
memset(coeffFlag, 0, MLS_GRP_NUM * sizeof(*coeffFlag));

return scanPosLast - 1;
}

+uint32_t findPosFirstLast_c(const int16_t *dstCoeff, const intptr_t trSize, const uint16_t scanTbl[16])
+{
+ int n;
+
+ for (n = SCAN_SET_SIZE - 1; n >= 0; --n)
+ {
+ const uint32_t idx = scanTbl[n];
+ const uint32_t idxY = idx / MLS_CG_SIZE;
+ const uint32_t idxX = idx % MLS_CG_SIZE;
+ if (dstCoeff[idxY * trSize + idxX])
+ break;
+ }
+
+ X265_CHECK(n >= 0, "non-zero coeff scan failuare!\n");
+
+ uint32_t lastNZPosInCG = (uint32_t)n;
+
+ for (n = 0;; n++)
+ {
+ const uint32_t idx = scanTbl[n];
+ const uint32_t idxY = idx / MLS_CG_SIZE;
+ const uint32_t idxX = idx % MLS_CG_SIZE;
+ if (dstCoeff[idxY * trSize + idxX])
+ break;
+ }
+
+ uint32_t firstNZPosInCG = (uint32_t)n;
+
+ return ((lastNZPosInCG << 16) | firstNZPosInCG);
+}
+
} // closing - anonymous file-static namespace

namespace x265 {

p.cu[BLOCK_16x16].copy_cnt = copy_count<16>;
p.cu[BLOCK_32x32].copy_cnt = copy_count<32>;

- p.findPosLast = findPosLast_c;
+ p.scanPosLast = scanPosLast_c;
+ p.findPosFirstLast = findPosFirstLast_c;
}
}
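`findPosFirstLast_c`, added in the dct.cpp hunk, walks one 4x4 coefficient group in scan order and packs the scan position of the last non-zero coefficient into the upper 16 bits and the first into the lower 16 bits. The standalone version below restates that logic so it compiles on its own. `MLS_CG_SIZE = 4` and `SCAN_SET_SIZE = 16` are assumed values for a 4x4 group (they are used but not defined in this diff), and `assert` stands in for `X265_CHECK`.

```cpp
#include <cassert>
#include <cstdint>

// Assumed constants for one 4x4 coefficient group.
const int MLS_CG_SIZE = 4;    // group width and height
const int SCAN_SET_SIZE = 16; // coefficients per group

// Returns (last non-zero scan position << 16) | first non-zero scan position.
// dstCoeff points at the group's top-left coefficient inside a block whose
// row stride is trSize; the group must contain at least one non-zero value.
uint32_t findPosFirstLast(const int16_t* dstCoeff, intptr_t trSize, const uint16_t scanTbl[16])
{
    int n;
    // Backward pass: highest scan position holding a non-zero coefficient.
    for (n = SCAN_SET_SIZE - 1; n >= 0; --n)
    {
        const uint32_t idx = scanTbl[n];
        if (dstCoeff[(idx / MLS_CG_SIZE) * trSize + idx % MLS_CG_SIZE])
            break;
    }
    assert(n >= 0 && "group has no non-zero coefficient");
    const uint32_t last = (uint32_t)n;

    // Forward pass: guaranteed to stop because at least one value is non-zero.
    for (n = 0;; n++)
    {
        const uint32_t idx = scanTbl[n];
        if (dstCoeff[(idx / MLS_CG_SIZE) * trSize + idx % MLS_CG_SIZE])
            break;
    }
    const uint32_t first = (uint32_t)n;

    return (last << 16) | first;
}
```

Only the visiting order depends on `scanTbl`, so the same two non-zero coefficients report different first/last scan positions with the first `g_scan4x4` row than with the raster row.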
x265_1.6.tar.gz/source/common/frame.cpp -> x265_1.7.tar.gz/source/common/frame.cpp
Changed
Frame::Frame()
{
m_bChromaExtended = false;
+ m_lowresInit = false;
m_reconRowCount.set(0);
m_countRefEncoders = 0;
m_encData = NULL;
m_reconPic = NULL;
m_next = NULL;
m_prev = NULL;
+ m_param = NULL;
memset(&m_lowres, 0, sizeof(m_lowres));
}

bool Frame::create(x265_param *param)
{
m_fencPic = new PicYuv;
+ m_param = param;

return m_fencPic->create(param->sourceWidth, param->sourceHeight, param->internalCsp) &&
m_lowres.create(m_fencPic, param->bframes, !!param->rc.aqMode);
x265_1.6.tar.gz/source/common/frame.h -> x265_1.7.tar.gz/source/common/frame.h
Changed
void* m_userData; // user provided pointer passed in with this picture

Lowres m_lowres;
+ bool m_lowresInit; // lowres init complete (pre-analysis)
bool m_bChromaExtended; // orig chroma planes motion extended for weight analysis

/* Frame Parallelism - notification between FrameEncoders of available motion reference rows */

Frame* m_next; // PicList doubly linked list pointers
Frame* m_prev;
-
+ x265_param* m_param; // Points to the latest param set for the frame.
x265_analysis_data m_analysisData;
Frame();
x265_1.6.tar.gz/source/common/framedata.h -> x265_1.7.tar.gz/source/common/framedata.h
Changed
uint32_t numEncodedCUs; /* ctuAddr of last encoded CTU in row */
uint32_t encodedBits; /* sum of 'totalBits' of encoded CTUs */
uint32_t satdForVbv; /* sum of lowres (estimated) costs for entire row */
+ uint32_t intraSatdForVbv; /* sum of lowres (estimated) intra costs for entire row */
uint32_t diagSatd;
uint32_t diagIntraSatd;
double diagQp;
x265_1.6.tar.gz/source/common/ipfilter.cpp -> x265_1.7.tar.gz/source/common/ipfilter.cpp
Changed
87
1
2
#endif
3
4
namespace {
5
-template<int dstStride, int width, int height>
6
-void pixelToShort_c(const pixel* src, intptr_t srcStride, int16_t* dst)
7
-{
8
- int shift = IF_INTERNAL_PREC - X265_DEPTH;
9
- int row, col;
10
-
11
- for (row = 0; row < height; row++)
12
- {
13
- for (col = 0; col < width; col++)
14
- {
15
- int16_t val = src[col] << shift;
16
- dst[col] = val - (int16_t)IF_INTERNAL_OFFS;
17
- }
18
-
19
- src += srcStride;
20
- dst += dstStride;
21
- }
22
-}
23
-
24
-template<int dstStride>
25
-void filterPixelToShort_c(const pixel* src, intptr_t srcStride, int16_t* dst, int width, int height)
26
+template<int width, int height>
27
+void filterPixelToShort_c(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride)
{
int shift = IF_INTERNAL_PREC - X265_DEPTH;
int row, col;

p.chroma[X265_CSP_I420].pu[CHROMA_420_ ## W ## x ## H].filter_vps = interp_vert_ps_c<4, W, H>; \
p.chroma[X265_CSP_I420].pu[CHROMA_420_ ## W ## x ## H].filter_vsp = interp_vert_sp_c<4, W, H>; \
p.chroma[X265_CSP_I420].pu[CHROMA_420_ ## W ## x ## H].filter_vss = interp_vert_ss_c<4, W, H>; \
- p.chroma[X265_CSP_I420].pu[CHROMA_420_ ## W ## x ## H].chroma_p2s = pixelToShort_c<MAX_CU_SIZE / 2, W, H>;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_ ## W ## x ## H].p2s = filterPixelToShort_c<W, H>;

#define CHROMA_422(W, H) \
p.chroma[X265_CSP_I422].pu[CHROMA_422_ ## W ## x ## H].filter_hpp = interp_horiz_pp_c<4, W, H>; \

p.chroma[X265_CSP_I422].pu[CHROMA_422_ ## W ## x ## H].filter_vps = interp_vert_ps_c<4, W, H>; \
p.chroma[X265_CSP_I422].pu[CHROMA_422_ ## W ## x ## H].filter_vsp = interp_vert_sp_c<4, W, H>; \
p.chroma[X265_CSP_I422].pu[CHROMA_422_ ## W ## x ## H].filter_vss = interp_vert_ss_c<4, W, H>; \
- p.chroma[X265_CSP_I422].pu[CHROMA_422_ ## W ## x ## H].chroma_p2s = pixelToShort_c<MAX_CU_SIZE / 2, W, H>;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_ ## W ## x ## H].p2s = filterPixelToShort_c<W, H>;

#define CHROMA_444(W, H) \
p.chroma[X265_CSP_I444].pu[LUMA_ ## W ## x ## H].filter_hpp = interp_horiz_pp_c<4, W, H>; \

p.chroma[X265_CSP_I444].pu[LUMA_ ## W ## x ## H].filter_vps = interp_vert_ps_c<4, W, H>; \
p.chroma[X265_CSP_I444].pu[LUMA_ ## W ## x ## H].filter_vsp = interp_vert_sp_c<4, W, H>; \
p.chroma[X265_CSP_I444].pu[LUMA_ ## W ## x ## H].filter_vss = interp_vert_ss_c<4, W, H>; \
- p.chroma[X265_CSP_I444].pu[LUMA_ ## W ## x ## H].chroma_p2s = pixelToShort_c<MAX_CU_SIZE, W, H>;
+ p.chroma[X265_CSP_I444].pu[LUMA_ ## W ## x ## H].p2s = filterPixelToShort_c<W, H>;

#define LUMA(W, H) \
p.pu[LUMA_ ## W ## x ## H].luma_hpp = interp_horiz_pp_c<8, W, H>; \

p.pu[LUMA_ ## W ## x ## H].luma_vsp = interp_vert_sp_c<8, W, H>; \
p.pu[LUMA_ ## W ## x ## H].luma_vss = interp_vert_ss_c<8, W, H>; \
p.pu[LUMA_ ## W ## x ## H].luma_hvpp = interp_hv_pp_c<8, W, H>; \
- p.pu[LUMA_ ## W ## x ## H].filter_p2s = pixelToShort_c<MAX_CU_SIZE, W, H>
+ p.pu[LUMA_ ## W ## x ## H].convert_p2s = filterPixelToShort_c<W, H>;

void setupFilterPrimitives_c(EncoderPrimitives& p)
{

CHROMA_422(4, 8);
CHROMA_422(4, 4);
+ CHROMA_422(2, 4);
CHROMA_422(2, 8);
CHROMA_422(8, 16);
CHROMA_422(8, 8);

CHROMA_444(48, 64);
CHROMA_444(64, 16);
CHROMA_444(16, 64);
- p.luma_p2s = filterPixelToShort_c<MAX_CU_SIZE>;
-
- p.chroma[X265_CSP_I444].p2s = filterPixelToShort_c<MAX_CU_SIZE>;
- p.chroma[X265_CSP_I420].p2s = filterPixelToShort_c<MAX_CU_SIZE / 2>;
- p.chroma[X265_CSP_I422].p2s = filterPixelToShort_c<MAX_CU_SIZE / 2>;

p.extendRowBorder = extendCURowColBorder;
}
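The diff elides the body of filterPixelToShort_c. The sketch below shows what a pixel-to-short conversion of this shape does: each sample is shifted up to the internal filter precision and re-centred around zero. The block size comes from the W, H template parameters, and the destination stride is explicit, matching the new filter_p2s_t signature.

The sketch assumes an 8-bit build, IF_INTERNAL_PREC = 14 and an internal offset of 1 << 13. These are stand-in constants, not values taken from the diff.

```cpp
#include <cstdint>

// Assumed 8-bit build; these stand in for x265's pixel, X265_DEPTH,
// IF_INTERNAL_PREC and IF_INTERNAL_OFFS (values are assumptions).
typedef uint8_t pixel;
const int kDepth = 8;
const int kInternalPrec = 14;
const int kInternalOffs = 1 << (kInternalPrec - 1); // 8192

// Convert a W x H block of pixels to the signed intermediate format used by
// the interpolation filters: scale to internal precision, then subtract the
// offset so values are centred on zero.
template<int W, int H>
void pixelToShortSketch(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride)
{
    const int shift = kInternalPrec - kDepth;
    for (int row = 0; row < H; row++)
    {
        for (int col = 0; col < W; col++)
            dst[col] = (int16_t)((src[col] << shift) - kInternalOffs);
        src += srcStride;
        dst += dstStride;
    }
}
```

With these constants, an 8-bit 0 maps to -8192, 128 maps to 0 and 255 maps to 8128.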
x265_1.6.tar.gz/source/common/loopfilter.cpp -> x265_1.7.tar.gz/source/common/loopfilter.cpp
Changed
dst[x] = signOf(src1[x] - src2[x]);
}

-void processSaoCUE0(pixel * rec, int8_t * offsetEo, int width, int8_t signLeft)
+void processSaoCUE0(pixel * rec, int8_t * offsetEo, int width, int8_t* signLeft, intptr_t stride)
{
- int x;
- int8_t signRight;
+ int x, y;
+ int8_t signRight, signLeft0;
int8_t edgeType;

- for (x = 0; x < width; x++)
+ for (y = 0; y < 2; y++)
{
- signRight = ((rec[x] - rec[x + 1]) < 0) ? -1 : ((rec[x] - rec[x + 1]) > 0) ? 1 : 0;
- edgeType = signRight + signLeft + 2;
- signLeft = -signRight;
- rec[x] = x265_clip(rec[x] + offsetEo[edgeType]);
+ signLeft0 = signLeft[y];
+ for (x = 0; x < width; x++)
+ {
+ signRight = ((rec[x] - rec[x + 1]) < 0) ? -1 : ((rec[x] - rec[x + 1]) > 0) ? 1 : 0;
+ edgeType = signRight + signLeft0 + 2;
+ signLeft0 = -signRight;
+ rec[x] = x265_clip(rec[x] + offsetEo[edgeType]);
+ }
+ rec += stride;
}
}

}
}

+void processSaoCUE1_2Rows(pixel* rec, int8_t* upBuff1, int8_t* offsetEo, intptr_t stride, int width)
+{
+ int x, y;
+ int8_t signDown;
+ int edgeType;
+
+ for (y = 0; y < 2; y++)
+ {
+ for (x = 0; x < width; x++)
+ {
+ signDown = signOf(rec[x] - rec[x + stride]);
+ edgeType = signDown + upBuff1[x] + 2;
+ upBuff1[x] = -signDown;
+ rec[x] = x265_clip(rec[x] + offsetEo[edgeType]);
+ }
+ rec += stride;
+ }
+}
+
void processSaoCUE2(pixel * rec, int8_t * bufft, int8_t * buff1, int8_t * offsetEo, int width, intptr_t stride)
{
int x;

{
p.saoCuOrgE0 = processSaoCUE0;
p.saoCuOrgE1 = processSaoCUE1;
- p.saoCuOrgE2 = processSaoCUE2;
- p.saoCuOrgE3 = processSaoCUE3;
+ p.saoCuOrgE1_2Rows = processSaoCUE1_2Rows;
+ p.saoCuOrgE2[0] = processSaoCUE2;
+ p.saoCuOrgE2[1] = processSaoCUE2;
+ p.saoCuOrgE3[0] = processSaoCUE3;
+ p.saoCuOrgE3[1] = processSaoCUE3;
p.saoCuOrgB0 = processSaoCUB0;
p.sign = calSign;
}
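In 1.7, processSaoCUE0 processes two rows per call. It takes one left-neighbour sign per row plus a row stride. The sketch below restates that loop as a self-contained function for an assumed 8-bit build.

The helpers signOf and clipPixel are local stand-ins for x265's signOf and x265_clip. Each row reads one sample past width (rec[x + 1]), so the caller must provide width + 1 readable samples per row.

```cpp
#include <algorithm>
#include <cstdint>

typedef uint8_t pixel; // assumed 8-bit build

static inline int8_t signOf(int v) { return (int8_t)((v > 0) - (v < 0)); }
static inline pixel clipPixel(int v) { return (pixel)std::min(std::max(v, 0), 255); }

// Edge-offset class 0 (horizontal neighbours) over two rows, following the
// 1.7 processSaoCUE0 loop. signLeft[y] is sign(rec[0] - rec[-1]) for row y;
// offsetEo is indexed by edgeType in [0, 4].
void saoE0TwoRows(pixel* rec, const int8_t* offsetEo, int width, const int8_t* signLeft, intptr_t stride)
{
    for (int y = 0; y < 2; y++)
    {
        int8_t signLeft0 = signLeft[y];
        for (int x = 0; x < width; x++)
        {
            // Compare against the unmodified right neighbour.
            int8_t signRight = signOf(rec[x] - rec[x + 1]);
            int edgeType = signRight + signLeft0 + 2;
            // The right neighbour's "left sign" is the negation of this comparison.
            signLeft0 = (int8_t)-signRight;
            rec[x] = clipPixel(rec[x] + offsetEo[edgeType]);
        }
        rec += stride;
    }
}
```

A local minimum (edgeType 0) receives offsetEo[0] and a local maximum (edgeType 4) receives offsetEo[4]. Because the sign for the next position is derived before rec[x] is updated, writing back in place does not disturb later comparisons.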
x265_1.6.tar.gz/source/common/param.cpp -> x265_1.7.tar.gz/source/common/param.cpp
Changed
extern "C"
void x265_param_free(x265_param* p)
{
- return x265_free(p);
+ x265_free(p);
}

extern "C"

param->levelIdc = 0;
param->bHighTier = 0;
param->interlaceMode = 0;
+ param->bAnnexB = 1;
param->bRepeatHeaders = 0;
param->bEnableAccessUnitDelimiters = 0;
param->bEmitHRDSEI = 0;

param->rc.zones = NULL;
param->rc.bEnableSlowFirstPass = 0;
param->rc.bStrictCbr = 0;
+ param->rc.qgSize = 64; /* Same as maxCUSize */

/* Video Usability Information (VUI) */
param->vui.aspectRatioIdc = 0;

param->rc.aqStrength = 0.0;
param->rc.aqMode = X265_AQ_NONE;
param->rc.cuTree = 0;
+ param->rc.qgSize = 32;
param->bEnableFastIntra = 1;
}
else if (!strcmp(preset, "superfast"))

param->rc.aqStrength = 0.0;
param->rc.aqMode = X265_AQ_NONE;
param->rc.cuTree = 0;
+ param->rc.qgSize = 32;
param->bEnableSAO = 0;
param->bEnableFastIntra = 1;
}

param->rdLevel = 2;
param->maxNumReferences = 1;
param->rc.cuTree = 0;
+ param->rc.qgSize = 32;
param->bEnableFastIntra = 1;
}
else if (!strcmp(preset, "faster"))

p->levelIdc = atoi(value);
}
OPT("high-tier") p->bHighTier = atobool(value);
+ OPT("allow-non-conformance") p->bAllowNonConformance = atobool(value);
OPT2("log-level", "log")
{
p->logLevel = atoi(value);

}
}
OPT("cu-stats") p->bLogCuStats = atobool(value);
+ OPT("annexb") p->bAnnexB = atobool(value);
OPT("repeat-headers") p->bRepeatHeaders = atobool(value);
OPT("wpp") p->bEnableWavefront = atobool(value);
OPT("ctu") p->maxCUSize = (uint32_t)atoi(value);

OPT2("pools", "numa-pools") p->numaPools = strdup(value);
OPT("lambda-file") p->rc.lambdaFileName = strdup(value);
OPT("analysis-file") p->analysisFileName = strdup(value);
+ OPT("qg-size") p->rc.qgSize = atoi(value);
+ OPT("master-display") p->masteringDisplayColorVolume = strdup(value);
+ OPT("max-cll") p->contentLightLevelInfo = strdup(value);
else
return X265_PARAM_BAD_NAME;
#undef OPT

uint32_t maxLog2CUSize = (uint32_t)g_log2Size[param->maxCUSize];
uint32_t minLog2CUSize = (uint32_t)g_log2Size[param->minCUSize];

- if (g_ctuSizeConfigured || ATOMIC_INC(&g_ctuSizeConfigured) > 1)
+ if (ATOMIC_INC(&g_ctuSizeConfigured) > 1)
{
if (g_maxCUSize != param->maxCUSize)
{

x265_log(param, X265_LOG_INFO, "b-pyramid / weightp / weightb / refs: %d / %d / %d / %d\n",
param->bBPyramid, param->bEnableWeightedPred, param->bEnableWeightedBiPred, param->maxNumReferences);

+ if (param->rc.aqMode)
+ x265_log(param, X265_LOG_INFO, "AQ: mode / str / qg-size / cu-tree : %d / %0.1f / %d / %d\n", param->rc.aqMode,
+ param->rc.aqStrength, param->rc.qgSize, param->rc.cuTree);
+
if (param->bLossless)
x265_log(param, X265_LOG_INFO, "Rate Control : Lossless\n");
else switch (param->rc.rateControlMode)
{
case X265_RC_ABR:
- x265_log(param, X265_LOG_INFO, "Rate Control / AQ-Strength / CUTree : ABR-%d kbps / %0.1f / %d\n", param->rc.bitrate,
- param->rc.aqStrength, param->rc.cuTree);
- break;
+ x265_log(param, X265_LOG_INFO, "Rate Control / qCompress : ABR-%d kbps / %0.2f\n", param->rc.bitrate, param->rc.qCompress); break;
case X265_RC_CQP:
- x265_log(param, X265_LOG_INFO, "Rate Control / AQ-Strength / CUTree : CQP-%d / %0.1f / %d\n", param->rc.qp, param->rc.aqStrength,
- param->rc.cuTree);
- break;
+ x265_log(param, X265_LOG_INFO, "Rate Control : CQP-%d\n", param->rc.qp); break;
case X265_RC_CRF:
- x265_log(param, X265_LOG_INFO, "Rate Control / AQ-Strength / CUTree : CRF-%0.1f / %0.1f / %d\n", param->rc.rfConstant,
- param->rc.aqStrength, param->rc.cuTree);
- break;
+ x265_log(param, X265_LOG_INFO, "Rate Control / qCompress : CRF-%0.1f / %0.2f\n", param->rc.rfConstant, param->rc.qCompress); break;
}

if (param->rc.vbvBufferSize)

fflush(stderr);
}

+void x265_print_reconfigured_params(x265_param* param, x265_param* reconfiguredParam)
+{
+ if (!param || !reconfiguredParam)
+ return;
+
+ x265_log(param,X265_LOG_INFO, "Reconfigured param options :\n");
+
+ char buf[80] = { 0 };
+ char tmp[40];
+#define TOOLCMP(COND1, COND2, STR, VAL) if (COND1 != COND2) { sprintf(tmp, STR, VAL); appendtool(param, buf, sizeof(buf), tmp); }
+ TOOLCMP(param->maxNumReferences, reconfiguredParam->maxNumReferences, "ref=%d", reconfiguredParam->maxNumReferences);
+ TOOLCMP(param->maxTUSize, reconfiguredParam->maxTUSize, "max-tu-size=%d", reconfiguredParam->maxTUSize);
+ TOOLCMP(param->searchRange, reconfiguredParam->searchRange, "merange=%d", reconfiguredParam->searchRange);
+ TOOLCMP(param->subpelRefine, reconfiguredParam->subpelRefine, "subme= %d", reconfiguredParam->subpelRefine);
+ TOOLCMP(param->rdLevel, reconfiguredParam->rdLevel, "rd=%d", reconfiguredParam->rdLevel);
+ TOOLCMP(param->psyRd, reconfiguredParam->psyRd, "psy-rd=%.2lf", reconfiguredParam->psyRd);
+ TOOLCMP(param->rdoqLevel, reconfiguredParam->rdoqLevel, "rdoq=%d", reconfiguredParam->rdoqLevel);
+ TOOLCMP(param->psyRdoq, reconfiguredParam->psyRdoq, "psy-rdoq=%.2lf", reconfiguredParam->psyRdoq);
+ TOOLCMP(param->noiseReductionIntra, reconfiguredParam->noiseReductionIntra, "nr-intra=%d", reconfiguredParam->noiseReductionIntra);
+ TOOLCMP(param->noiseReductionInter, reconfiguredParam->noiseReductionInter, "nr-inter=%d", reconfiguredParam->noiseReductionInter);
+ TOOLCMP(param->bEnableTSkipFast, reconfiguredParam->bEnableTSkipFast, "tskip-fast=%d", reconfiguredParam->bEnableTSkipFast);
+ TOOLCMP(param->bEnableSignHiding, reconfiguredParam->bEnableSignHiding, "signhide=%d", reconfiguredParam->bEnableSignHiding);
+ TOOLCMP(param->bEnableFastIntra, reconfiguredParam->bEnableFastIntra, "fast-intra=%d", reconfiguredParam->bEnableFastIntra);
+ if (param->bEnableLoopFilter && (param->deblockingFilterBetaOffset != reconfiguredParam->deblockingFilterBetaOffset
+ || param->deblockingFilterTCOffset != reconfiguredParam->deblockingFilterTCOffset))
+ {
+ sprintf(tmp, "deblock(tC=%d:B=%d)", param->deblockingFilterTCOffset, param->deblockingFilterBetaOffset);
+ appendtool(param, buf, sizeof(buf), tmp);
+ }
+ else
+ TOOLCMP(param->bEnableLoopFilter, reconfiguredParam->bEnableLoopFilter, "deblock=%d", reconfiguredParam->bEnableLoopFilter);
+
+ TOOLCMP(param->bEnableTemporalMvp, reconfiguredParam->bEnableTemporalMvp, "tmvp=%d", reconfiguredParam->bEnableTemporalMvp);
+ TOOLCMP(param->bEnableEarlySkip, reconfiguredParam->bEnableEarlySkip, "early-skip=%d", reconfiguredParam->bEnableEarlySkip);
+ x265_log(param, X265_LOG_INFO, "tools:%s\n", buf);
+}
+
char *x265_param2string(x265_param* p)
{
char *buf, *s;
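x265_print_reconfigured_params compares two parameter sets field by field and logs only the fields that changed. The sketch below applies the same compare-and-append pattern to a hypothetical three-field Settings struct.

appendToken is a simplified stand-in for x265's appendtool(), whose exact formatting is not shown in the diff. The sketch also uses snprintf instead of sprintf, so writes into the 40-byte scratch buffer stay bounded.

```cpp
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical subset of encoder settings (x265_param has many more fields).
struct Settings
{
    int maxNumReferences;
    int rdLevel;
    double psyRd;
};

// Append " token" to a fixed-size, NUL-terminated buffer; tokens that would
// not fit are dropped. Simplified stand-in for x265's appendtool().
static void appendToken(char* buf, size_t size, const char* token)
{
    size_t used = strlen(buf);
    size_t len = strlen(token);
    if (used + 1 + len < size)
    {
        buf[used] = ' ';
        memcpy(buf + used + 1, token, len + 1); // copies the terminating NUL too
    }
}

// List only the fields whose value changed, formatted as name=newValue.
std::string describeChanges(const Settings& before, const Settings& after)
{
    char buf[80] = { 0 };
    char tmp[40];
#define CMP_FIELD(FIELD, FMT) \
    if (before.FIELD != after.FIELD) \
    { \
        snprintf(tmp, sizeof(tmp), FMT, after.FIELD); \
        appendToken(buf, sizeof(buf), tmp); \
    }
    CMP_FIELD(maxNumReferences, "ref=%d")
    CMP_FIELD(rdLevel, "rd=%d")
    CMP_FIELD(psyRd, "psy-rd=%.2f")
#undef CMP_FIELD
    return std::string("tools:") + buf;
}
```

Changing refs from 3 to 4 and psy-rd from 2.0 to 1.5 yields "tools: ref=4 psy-rd=1.50". Unchanged fields contribute nothing.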
x265_1.6.tar.gz/source/common/param.h -> x265_1.7.tar.gz/source/common/param.h
Changed
int x265_check_params(x265_param *param);
int x265_set_globals(x265_param *param);
void x265_print_params(x265_param *param);
+void x265_print_reconfigured_params(x265_param* param, x265_param* reconfiguredParam);
void x265_param_apply_fastfirstpass(x265_param *p);
char* x265_param2string(x265_param *param);
int x265_atoi(const char *str, bool& bError);
x265_1.6.tar.gz/source/common/picyuv.cpp -> x265_1.7.tar.gz/source/common/picyuv.cpp
Changed
for (int r = 0; r < height; r++)
{
- for (int c = 0; c < width; c++)
- yPixel[c] = (pixel)yChar[c];
+ memcpy(yPixel, yChar, width * sizeof(pixel));

yPixel += m_stride;
yChar += pic.stride[0] / sizeof(*yChar);

for (int r = 0; r < height >> m_vChromaShift; r++)
{
- for (int c = 0; c < width >> m_hChromaShift; c++)
- {
- uPixel[c] = (pixel)uChar[c];
- vPixel[c] = (pixel)vChar[c];
- }
+ memcpy(uPixel, uChar, (width >> m_hChromaShift) * sizeof(pixel));
+ memcpy(vPixel, vChar, (width >> m_hChromaShift) * sizeof(pixel));

uPixel += m_strideC;
vPixel += m_strideC;
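picyuv.cpp now copies each row with memcpy instead of casting sample by sample. The two forms are only equivalent when the input sample type and pixel have the same size. The condition that guards this path is not visible in the diff.

A minimal sketch of the row-wise strided copy, assuming 8-bit input and an 8-bit pixel type:

```cpp
#include <cstdint>
#include <cstring>

typedef uint8_t pixel; // assumed 8-bit build with 8-bit input samples

// Copy a width x height plane between buffers with different strides,
// one memcpy per row instead of a per-sample cast loop.
void copyPlane(pixel* dst, intptr_t dstStride, const pixel* src, intptr_t srcStride, int width, int height)
{
    for (int r = 0; r < height; r++)
    {
        memcpy(dst, src, width * sizeof(pixel));
        dst += dstStride;
        src += srcStride;
    }
}
```

Only the first width samples of each row are touched. Padding beyond width in either buffer is left as it was.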
x265_1.6.tar.gz/source/common/pixel.cpp -> x265_1.7.tar.gz/source/common/pixel.cpp
Changed
}
}

-void scale1D_128to64(pixel *dst, const pixel *src, intptr_t /*stride*/)
+void scale1D_128to64(pixel *dst, const pixel *src)
{
int x;
const pixel* src1 = src;
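scale1D_128to64 drops its stride argument, which the 1.6 signature had already commented out as /*stride*/. A 1-D scaler only walks contiguous samples, so it has no use for a stride.

The sketch below is a generic 2:1 line downscale by rounded pair averaging, assuming 8-bit samples. It illustrates the idea only and is not a copy of x265's routine, whose body is elided here.

```cpp
#include <cstdint>

typedef uint8_t pixel; // assumed 8-bit samples

// 2:1 downscale of a contiguous run of samples: each output is the rounded
// mean of two adjacent inputs. Walking contiguous memory needs no stride.
// srcCount is assumed to be even.
void halveLine(pixel* dst, const pixel* src, int srcCount)
{
    for (int x = 0; x < srcCount; x += 2)
        dst[x >> 1] = (pixel)((src[x] + src[x + 1] + 1) >> 1);
}
```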
x265_1.6.tar.gz/source/common/predict.cpp -> x265_1.7.tar.gz/source/common/predict.cpp
Changed
void Predict::predInterLumaShort(const PredictionUnit& pu, ShortYuv& dstSYuv, const PicYuv& refPic, const MV& mv) const
{
int16_t* dst = dstSYuv.getLumaAddr(pu.puAbsPartIdx);
- int dstStride = dstSYuv.m_size;
+ intptr_t dstStride = dstSYuv.m_size;

intptr_t srcStride = refPic.m_stride;
intptr_t srcOffset = (mv.x >> 2) + (mv.y >> 2) * srcStride;

X265_CHECK(dstStride == MAX_CU_SIZE, "stride expected to be max cu size\n");

if (!(yFrac | xFrac))
- primitives.luma_p2s(src, srcStride, dst, pu.width, pu.height);
+ primitives.pu[partEnum].convert_p2s(src, srcStride, dst, dstStride);
else if (!yFrac)
primitives.pu[partEnum].luma_hps(src, srcStride, dst, dstStride, xFrac, 0);
else if (!xFrac)

int partEnum = partitionFromSizes(pu.width, pu.height);

uint32_t cxWidth = pu.width >> m_hChromaShift;
- uint32_t cxHeight = pu.height >> m_vChromaShift;

- X265_CHECK(((cxWidth | cxHeight) % 2) == 0, "chroma block size expected to be multiple of 2\n");
+ X265_CHECK(((cxWidth | (pu.height >> m_vChromaShift)) % 2) == 0, "chroma block size expected to be multiple of 2\n");

if (!(yFrac | xFrac))
{
- primitives.chroma[m_csp].p2s(refCb, refStride, dstCb, cxWidth, cxHeight);
- primitives.chroma[m_csp].p2s(refCr, refStride, dstCr, cxWidth, cxHeight);
+ primitives.chroma[m_csp].pu[partEnum].p2s(refCb, refStride, dstCb, dstStride);
+ primitives.chroma[m_csp].pu[partEnum].p2s(refCr, refStride, dstCr, dstStride);
}
else if (!yFrac)
{

const pixel refSample = *pAdiLineNext;
// Pad unavailable samples with new value
int nextOrTop = X265_MIN(next, leftUnits);
+
// fill left column
+#if HIGH_BIT_DEPTH
while (curr < nextOrTop)
{
for (int i = 0; i < unitHeight; i++)

adi += unitWidth;
curr++;
}
+#else
+ X265_CHECK(curr <= nextOrTop, "curr must be less than or equal to nextOrTop\n");
+ if (curr < nextOrTop)
+ {
+ const int fillSize = unitHeight * (nextOrTop - curr);
+ memset(adi, refSample, fillSize * sizeof(pixel));
+ curr = nextOrTop;
+ adi += fillSize;
+ }
+
+ if (curr < next)
+ {
+ const int fillSize = unitWidth * (next - curr);
+ memset(adi, refSample, fillSize * sizeof(pixel));
+ curr = next;
+ adi += fillSize;
+ }
+#endif
}

// pad all other reference samples.
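predict.cpp keeps the sample-by-sample while loop under HIGH_BIT_DEPTH and switches to memset otherwise. The reason is that memset stores single bytes. For one-byte samples that is an exact fill, but for 16-bit samples it would replicate the low byte into both halves of every sample.

A small sketch that makes the distinction explicit, using a hypothetical fillSamples helper:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>

// Fill `count` samples with one value. memset stores single bytes, so it is an
// exact fill only for one-byte samples (8-bit builds); wider samples, such as
// uint16_t in a high-bit-depth build, need an element-wise fill.
template<typename Sample>
void fillSamples(Sample* dst, Sample value, int count)
{
    if (sizeof(Sample) == 1)
        memset(dst, (int)value, count * sizeof(Sample));
    else
        std::fill_n(dst, count, value);
}
```

Calling memset on a uint16_t buffer with the value 0x0102 would store 0x02 in every byte, giving 0x0202 per sample. That is why the high-bit-depth path cannot use it.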
x265_1.6.tar.gz/source/common/primitives.cpp -> x265_1.7.tar.gz/source/common/primitives.cpp
Changed
/* alias chroma 4:4:4 from luma primitives (all but chroma filters) */

- p.chroma[X265_CSP_I444].p2s = p.luma_p2s;
p.chroma[X265_CSP_I444].cu[BLOCK_4x4].sa8d = NULL;

for (int i = 0; i < NUM_PU_SIZES; i++)

p.chroma[X265_CSP_I444].pu[i].copy_pp = p.pu[i].copy_pp;
p.chroma[X265_CSP_I444].pu[i].addAvg = p.pu[i].addAvg;
p.chroma[X265_CSP_I444].pu[i].satd = p.pu[i].satd;
- p.chroma[X265_CSP_I444].pu[i].chroma_p2s = p.pu[i].filter_p2s;
+ p.chroma[X265_CSP_I444].pu[i].p2s = p.pu[i].convert_p2s;
}

for (int i = 0; i < NUM_CU_SIZES; i++)
x265_1.6.tar.gz/source/common/primitives.h -> x265_1.7.tar.gz/source/common/primitives.h
Changed
typedef int(*count_nonzero_t)(const int16_t* quantCoeff);
typedef void (*weightp_pp_t)(const pixel* src, pixel* dst, intptr_t stride, int width, int height, int w0, int round, int shift, int offset);
typedef void (*weightp_sp_t)(const int16_t* src, pixel* dst, intptr_t srcStride, intptr_t dstStride, int width, int height, int w0, int round, int shift, int offset);
-typedef void (*scale_t)(pixel* dst, const pixel* src, intptr_t stride);
+typedef void (*scale1D_t)(pixel* dst, const pixel* src);
+typedef void (*scale2D_t)(pixel* dst, const pixel* src, intptr_t stride);
typedef void (*downscale_t)(const pixel* src0, pixel* dstf, pixel* dsth, pixel* dstv, pixel* dstc,
intptr_t src_stride, intptr_t dst_stride, int width, int height);
typedef void (*extendCURowBorder_t)(pixel* txt, intptr_t stride, int width, int height, int marginX);

typedef void (*filter_sp_t) (const int16_t* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int coeffIdx);
typedef void (*filter_ss_t) (const int16_t* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx);
typedef void (*filter_hv_pp_t) (const pixel* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int idxX, int idxY);
-typedef void (*filter_p2s_wxh_t)(const pixel* src, intptr_t srcStride, int16_t* dst, int width, int height);
-typedef void (*filter_p2s_t)(const pixel* src, intptr_t srcStride, int16_t* dst);
+typedef void (*filter_p2s_t)(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);

typedef void (*copy_pp_t)(pixel* dst, intptr_t dstStride, const pixel* src, intptr_t srcStride); // dst is aligned
typedef void (*copy_sp_t)(pixel* dst, intptr_t dstStride, const int16_t* src, intptr_t srcStride);

typedef void (*pixelavg_pp_t)(pixel* dst, intptr_t dstride, const pixel* src0, intptr_t sstride0, const pixel* src1, intptr_t sstride1, int weight);
typedef void (*addAvg_t)(const int16_t* src0, const int16_t* src1, pixel* dst, intptr_t src0Stride, intptr_t src1Stride, intptr_t dstStride);

-typedef void (*saoCuOrgE0_t)(pixel* rec, int8_t* offsetEo, int width, int8_t signLeft);
+typedef void (*saoCuOrgE0_t)(pixel* rec, int8_t* offsetEo, int width, int8_t* signLeft, intptr_t stride);
typedef void (*saoCuOrgE1_t)(pixel* rec, int8_t* upBuff1, int8_t* offsetEo, intptr_t stride, int width);
typedef void (*saoCuOrgE2_t)(pixel* rec, int8_t* pBufft, int8_t* pBuff1, int8_t* offsetEo, int lcuWidth, intptr_t stride);
typedef void (*saoCuOrgE3_t)(pixel* rec, int8_t* upBuff1, int8_t* m_offsetEo, intptr_t stride, int startX, int endX);

typedef void (*cutree_propagate_cost) (int* dst, const uint16_t* propagateIn, const int32_t* intraCosts, const uint16_t* interCosts, const int32_t* invQscales, const double* fpsFactor, int len);

-typedef int (*findPosLast_t)(const uint16_t *scan, const coeff_t *coeff, uint16_t *coeffSign, uint16_t *coeffFlag, uint8_t *coeffNum, int numSig);
+typedef int (*scanPosLast_t)(const uint16_t *scan, const coeff_t *coeff, uint16_t *coeffSign, uint16_t *coeffFlag, uint8_t *coeffNum, int numSig, const uint16_t* scanCG4x4, const int trSize);
+typedef uint32_t (*findPosFirstLast_t)(const int16_t *dstCoeff, const intptr_t trSize, const uint16_t scanTbl[16]);

/* Function pointers to optimized encoder primitives. Each pointer can reference
* either an assembly routine, a SIMD intrinsic primitive, or a C function */

addAvg_t addAvg; // bidir motion compensation, uses 16bit values

copy_pp_t copy_pp;
- filter_p2s_t filter_p2s;
+ filter_p2s_t convert_p2s;
}
pu[NUM_PU_SIZES];

dequant_scaling_t dequant_scaling;
dequant_normal_t dequant_normal;
denoiseDct_t denoiseDct;
- scale_t scale1D_128to64;
- scale_t scale2D_64to32;
+ scale1D_t scale1D_128to64;
+ scale2D_t scale2D_64to32;

ssim_4x4x2_core_t ssim_4x4x2_core;
ssim_end4_t ssim_end_4;

sign_t sign;
saoCuOrgE0_t saoCuOrgE0;
- saoCuOrgE1_t saoCuOrgE1;
- saoCuOrgE2_t saoCuOrgE2;
- saoCuOrgE3_t saoCuOrgE3;
+
+ /* To avoid the overhead in avx2 optimization in handling width=16, SAO_E0_1 is split
+ * into two parts: saoCuOrgE1, saoCuOrgE1_2Rows */
+ saoCuOrgE1_t saoCuOrgE1, saoCuOrgE1_2Rows;
+
+ // saoCuOrgE2[0] is used for width<=16 and saoCuOrgE2[1] is used for width > 16.
+ saoCuOrgE2_t saoCuOrgE2[2];
+
+ /* In avx2 optimization, two rows cannot be handled simultaneously since it requires
+ * a pixel from the previous row. So, saoCuOrgE3[0] is used for width<=16 and
+ * saoCuOrgE3[1] is used for width > 16. */
+ saoCuOrgE3_t saoCuOrgE3[2];
saoCuOrgB0_t saoCuOrgB0;

downscale_t frameInitLowres;

weightp_sp_t weight_sp;
weightp_pp_t weight_pp;

- filter_p2s_wxh_t luma_p2s;

- findPosLast_t findPosLast;
+ scanPosLast_t scanPosLast;
+ findPosFirstLast_t findPosFirstLast;

/* There is one set of chroma primitives per color space. An encoder will
* have just a single color space and thus it will only ever use one entry

filter_hps_t filter_hps;
addAvg_t addAvg;
copy_pp_t copy_pp;
- filter_p2s_t chroma_p2s;
+ filter_p2s_t p2s;

}
pu[NUM_PU_SIZES];

}
cu[NUM_CU_SIZES];

- filter_p2s_wxh_t p2s; // takes width/height as arguments
}
chroma[X265_CSP_COUNT];
};
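primitives.h turns saoCuOrgE2 and saoCuOrgE3 into two-entry tables. Per the header comments, slot 0 serves widths up to 16 and slot 1 serves wider blocks, so an AVX2 build can install a separate kernel for each case. The C setup in loopfilter.cpp points both slots at the same function.

A sketch of that dispatch, using hypothetical kernels and a hypothetical table struct:

```cpp
// Hypothetical kernel type and table modelled on saoCuOrgE2[2]: slot 0 serves
// blocks up to 16 samples wide, slot 1 serves wider blocks.
typedef int (*sao_kernel_t)(int width);

// Stand-ins that report which slot ran, or -1 if handed the wrong width.
static int narrowKernel(int width) { return width <= 16 ? 0 : -1; }
static int wideKernel(int width) { return width > 16 ? 1 : -1; }

struct SaoTable
{
    sao_kernel_t saoE2[2];
};

// The bool (width > 16) converts to index 0 or 1, matching the header comment.
int runSaoE2(const SaoTable& table, int width)
{
    return table.saoE2[width > 16](width);
}
```

Filling both slots with one generic function, as the C primitives do, remains correct. The split only matters when a width-specialised kernel is installed in one of the slots.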
x265_1.6.tar.gz/source/common/quant.cpp -> x265_1.7.tar.gz/source/common/quant.cpp
Changed
{
m_entropyCoder = &entropy;
m_rdoqLevel = rdoqLevel;
- m_psyRdoqScale = (int64_t)(psyScale * 256.0);
+ m_psyRdoqScale = (int32_t)(psyScale * 256.0);
+ X265_CHECK((psyScale * 256.0) < (double)MAX_INT, "psyScale value too large\n");
m_scalingList = &scalingList;
m_resiDctCoeff = X265_MALLOC(int16_t, MAX_TR_SIZE * MAX_TR_SIZE * 2);
m_fencDctCoeff = m_resiDctCoeff + (MAX_TR_SIZE * MAX_TR_SIZE);

X265_FREE(m_fencShortBuf);
}

-void Quant::setQPforQuant(const CUData& cu)
+void Quant::setQPforQuant(const CUData& ctu, int qp)
{
- m_tqBypass = !!cu.m_tqBypass[0];
+ m_tqBypass = !!ctu.m_tqBypass[0];
if (m_tqBypass)
return;
- m_nr = m_frameNr ? &m_frameNr[cu.m_encData->m_frameEncoderID] : NULL;
- int qpy = cu.m_qp[0];
- m_qpParam[TEXT_LUMA].setQpParam(qpy + QP_BD_OFFSET);
- setChromaQP(qpy + cu.m_slice->m_pps->chromaQpOffset[0], TEXT_CHROMA_U, cu.m_chromaFormat);
- setChromaQP(qpy + cu.m_slice->m_pps->chromaQpOffset[1], TEXT_CHROMA_V, cu.m_chromaFormat);
+ m_nr = m_frameNr ? &m_frameNr[ctu.m_encData->m_frameEncoderID] : NULL;
+ m_qpParam[TEXT_LUMA].setQpParam(qp + QP_BD_OFFSET);
+ setChromaQP(qp + ctu.m_slice->m_pps->chromaQpOffset[0], TEXT_CHROMA_U, ctu.m_chromaFormat);
+ setChromaQP(qp + ctu.m_slice->m_pps->chromaQpOffset[1], TEXT_CHROMA_V, ctu.m_chromaFormat);
}

void Quant::setChromaQP(int qpin, TextType ttype, int chFmt)

{
int transformShift = MAX_TR_DYNAMIC_RANGE - X265_DEPTH - log2TrSize; /* Represents scaling through forward transform */
int scalingListType = (cu.isIntra(absPartIdx) ? 0 : 3) + ttype;
+ const uint32_t usePsyMask = usePsy ? -1 : 0;

X265_CHECK(scalingListType < 6, "scaling list type out of range\n");

X265_CHECK((int)numSig == primitives.cu[log2TrSize - 2].count_nonzero(dstCoeff), "numSig differ\n");
if (!numSig)
return 0;
+
uint32_t trSize = 1 << log2TrSize;
int64_t lambda2 = m_qpParam[ttype].lambda2;
- int64_t psyScale = (m_psyRdoqScale * m_qpParam[ttype].lambda);
+ const int64_t psyScale = ((int64_t)m_psyRdoqScale * m_qpParam[ttype].lambda);

/* unquant constants for measuring distortion. Scaling list quant coefficients have a (1 << 4)
* scale applied that must be removed during unquant. Note that in real dequant there is clipping

#define UNQUANT(lvl) (((lvl) * (unquantScale[blkPos] << per) + unquantRound) >> unquantShift)
#define SIGCOST(bits) ((lambda2 * (bits)) >> 8)
#define RDCOST(d, bits) ((((int64_t)d * d) << scaleBits) + SIGCOST(bits))
-#define PSYVALUE(rec) ((psyScale * (rec)) >> (16 - scaleBits))
+#define PSYVALUE(rec) ((psyScale * (rec)) >> (2 * transformShift + 1))

int64_t costCoeff[32 * 32]; /* d*d + lambda * bits */
int64_t costUncoded[32 * 32]; /* d*d + lambda * 0 */

int64_t costCoeffGroupSig[MLS_GRP_NUM]; /* lambda * bits of group coding cost */
uint64_t sigCoeffGroupFlag64 = 0;

- uint32_t ctxSet = 0;
- int c1 = 1;
- int c2 = 0;
- uint32_t goRiceParam = 0;
- uint32_t c1Idx = 0;
- uint32_t c2Idx = 0;
- int cgLastScanPos = -1;
- int lastScanPos = -1;
const uint32_t cgSize = (1 << MLS_CG_SIZE); /* 4x4 num coef = 16 */
bool bIsLuma = ttype == TEXT_LUMA;

TUEntropyCodingParameters codeParams;
cu.getTUEntropyCodingParameters(codeParams, absPartIdx, log2TrSize, bIsLuma);
const uint32_t cgNum = 1 << (codeParams.log2TrSizeCG * 2);
+ const uint32_t cgStride = (trSize >> MLS_CG_LOG2_SIZE);
+
+ uint8_t coeffNum[MLS_GRP_NUM]; // value range[0, 16]
+ uint16_t coeffSign[MLS_GRP_NUM]; // bit mask map for non-zero coeff sign
+ uint16_t coeffFlag[MLS_GRP_NUM]; // bit mask map for non-zero coeff
+
+#if CHECKED_BUILD || _DEBUG
+ // clean output buffer, the asm version of scanPosLast Never output anything after latest non-zero coeff group
+ memset(coeffNum, 0, sizeof(coeffNum));
+ memset(coeffSign, 0, sizeof(coeffNum));
+ memset(coeffFlag, 0, sizeof(coeffNum));
+#endif
+ const int lastScanPos = primitives.scanPosLast(codeParams.scan, dstCoeff, coeffSign, coeffFlag, coeffNum, numSig, g_scan4x4[codeParams.scanType], trSize);
+ const int cgLastScanPos = (lastScanPos >> LOG2_SCAN_SET_SIZE);
+

/* TODO: update bit estimates if dirty */
EstBitsSbac& estBitsSbac = m_entropyCoder->m_estBitsSbac;

- uint32_t scanPos;
- coeffGroupRDStats cgRdStats;
+ uint32_t scanPos = 0;
+ uint32_t c1 = 1;
+
+ // process trail all zero Coeff Group
+
+ /* coefficients after lastNZ have no distortion signal cost */
+ const int zeroCG = cgNum - 1 - cgLastScanPos;
+ memset(&costCoeff[(cgLastScanPos + 1) << MLS_CG_SIZE], 0, zeroCG * MLS_CG_BLK_SIZE * sizeof(int64_t));
+ memset(&costSig[(cgLastScanPos + 1) << MLS_CG_SIZE], 0, zeroCG * MLS_CG_BLK_SIZE * sizeof(int64_t));
+
+ /* sum zero coeff (uncodec) cost */
+
+ // TODO: does we need these cost?
+ if (usePsyMask)
+ {
+ for (int cgScanPos = cgLastScanPos + 1; cgScanPos < (int)cgNum ; cgScanPos++)
+ {
+ X265_CHECK(coeffNum[cgScanPos] == 0, "count of coeff failure\n");
+
+ uint32_t scanPosBase = (cgScanPos << MLS_CG_SIZE);
+ uint32_t blkPos = codeParams.scan[scanPosBase];
+
+ // TODO: we can't SIMD optimize because PSYVALUE need 64-bits multiplication, convert to Double can work faster by FMA
+ for (int y = 0; y < MLS_CG_SIZE; y++)
+ {
+ for (int x = 0; x < MLS_CG_SIZE; x++)
+ {
+ int signCoef = m_resiDctCoeff[blkPos + x]; /* pre-quantization DCT coeff */
+ int predictedCoef = m_fencDctCoeff[blkPos + x] - signCoef; /* predicted DCT = source DCT - residual DCT*/
+
+ costUncoded[blkPos + x] = ((int64_t)signCoef * signCoef) << scaleBits;
+
+ /* when no residual coefficient is coded, predicted coef == recon coef */
+ costUncoded[blkPos + x] -= PSYVALUE(predictedCoef);
+
+ totalUncodedCost += costUncoded[blkPos + x];
+ totalRdCost += costUncoded[blkPos + x];
+ }
+ blkPos += trSize;
+ }
+ }
+ }
+ else
+ {
+ // non-psy path
+ for (int cgScanPos = cgLastScanPos + 1; cgScanPos < (int)cgNum ; cgScanPos++)
+ {
+ X265_CHECK(coeffNum[cgScanPos] == 0, "count of coeff failure\n");
+
+ uint32_t scanPosBase = (cgScanPos << MLS_CG_SIZE);
+ uint32_t blkPos = codeParams.scan[scanPosBase];
+
+ for (int y = 0; y < MLS_CG_SIZE; y++)
+ {
+ for (int x = 0; x < MLS_CG_SIZE; x++)
+ {
+ int signCoef = m_resiDctCoeff[blkPos + x]; /* pre-quantization DCT coeff */
+ costUncoded[blkPos + x] = ((int64_t)signCoef * signCoef) << scaleBits;
+
+ totalUncodedCost += costUncoded[blkPos + x];
+ totalRdCost += costUncoded[blkPos + x];
+ }
+ blkPos += trSize;
+ }
+ }
+ }
+
+ static const uint8_t table_cnt[5][SCAN_SET_SIZE] =
+ {
+ // patternSigCtx = 0
+ {
+ 2, 1, 1, 0,
+ 1, 1, 0, 0,
+ 1, 0, 0, 0,
+ 0, 0, 0, 0,
+ },
+ // patternSigCtx = 1
+ {
+ 2, 2, 2, 2,
+ 1, 1, 1, 1,
+ 0, 0, 0, 0,
+ 0, 0, 0, 0,
+ },
+ // patternSigCtx = 2
+ {
+ 2, 1, 0, 0,
+ 2, 1, 0, 0,
+ 2, 1, 0, 0,
+ 2, 1, 0, 0,
+ },
+ // patternSigCtx = 3
+ {
+ 2, 2, 2, 2,
+ 2, 2, 2, 2,
+ 2, 2, 2, 2,
+ 2, 2, 2, 2,
+ },
+ // 4x4
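In 1.7, quant.h narrows lambda and m_psyRdoqScale to int32_t, and quant.cpp widens one operand to 64 bits before multiplying them. Using the ranges stated in the new comments, a 14-bit psy scale ([0,50] * 256) times a lambda of up to 20 bits needs up to 34 bits, which overflows a 32-bit product.

A worked check of that arithmetic:

```cpp
#include <cstdint>

// Ranges from the 1.7 comments: psy scale <= 50 * 256 (14 bits), lambda up to
// 20 bits at high bit depth. Their product can need ~34 bits, so one operand is
// widened to 64 bits before the multiply, as in the new psyScale line.
int64_t psyScaleProduct(int32_t psyRdoqScale, int32_t lambda)
{
    return (int64_t)psyRdoqScale * lambda;
}
```

With the maximum psy scale 12800 and a 20-bit lambda of 1048575, the product is 13421760000. That is well above INT32_MAX (2147483647), so the old int64_t storage is no longer needed for the inputs but is still required for the result.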
x265_1.6.tar.gz/source/common/quant.h -> x265_1.7.tar.gz/source/common/quant.h
Changed
int per;
int qp;
int64_t lambda2; /* FIX8 */
- int64_t lambda; /* FIX8 */
+ int32_t lambda; /* FIX8, dynamic range is 18-bits in 8bpp and 20-bits in 16bpp */

QpParam() : qp(MAX_INT) {}

per = qpScaled / 6;
qp = qpScaled;
lambda2 = (int64_t)(x265_lambda2_tab[qp - QP_BD_OFFSET] * 256. + 0.5);
- lambda = (int64_t)(x265_lambda_tab[qp - QP_BD_OFFSET] * 256. + 0.5);
+ lambda = (int32_t)(x265_lambda_tab[qp - QP_BD_OFFSET] * 256. + 0.5);
+ X265_CHECK((x265_lambda_tab[qp - QP_BD_OFFSET] * 256. + 0.5) < (double)MAX_INT, "x265_lambda_tab[] value too large\n");
}
}
};

QpParam m_qpParam[3];

int m_rdoqLevel;
- int64_t m_psyRdoqScale;
+ int32_t m_psyRdoqScale; // dynamic range [0,50] * 256 = 14-bits
int16_t* m_resiDctCoeff;
int16_t* m_fencDctCoeff;
int16_t* m_fencShortBuf;

bool allocNoiseReduction(const x265_param& param);

/* CU setup */
- void setQPforQuant(const CUData& cu);
+ void setQPforQuant(const CUData& ctu, int qp);

uint32_t transformNxN(const CUData& cu, const pixel* fenc, uint32_t fencStride, const int16_t* residual, uint32_t resiStride, coeff_t* coeff,
uint32_t log2TrSize, TextType ttype, uint32_t absPartIdx, bool useTransformSkip);

void invtransformNxN(int16_t* residual, uint32_t resiStride, const coeff_t* coeff,
uint32_t log2TrSize, TextType ttype, bool bIntra, bool useTransformSkip, uint32_t numSig);

+ /* Pattern decision for context derivation process of significant_coeff_flag */
+ static uint32_t calcPatternSigCtx(uint64_t sigCoeffGroupFlag64, uint32_t cgPosX, uint32_t cgPosY, uint32_t cgBlkPos, uint32_t trSizeCG)
+ {
+ if (trSizeCG == 1)
+ return 0;
+
+ X265_CHECK(trSizeCG <= 8, "transform CG is too large\n");
+ X265_CHECK(cgBlkPos < 64, "cgBlkPos is too large\n");
+ // NOTE: cgBlkPos+1 may more than 63, it is invalid for shift,
+ // but in this case, both cgPosX and cgPosY equal to (trSizeCG - 1),
+ // the sigRight and sigLower will clear value to zero, the final result will be correct
+ const uint32_t sigPos = (uint32_t)(sigCoeffGroupFlag64 >> (cgBlkPos + 1)); // just need lowest 7-bits valid
+
+ // TODO: instruction BT is faster, but _bittest64 still generate instruction 'BT m, r' in VS2012
+ const uint32_t sigRight = ((int32_t)(cgPosX - (trSizeCG - 1)) >> 31) & (sigPos & 1);
+ const uint32_t sigLower = ((int32_t)(cgPosY - (trSizeCG - 1)) >> 31) & (sigPos >> (trSizeCG - 2)) & 2;
+ return sigRight + sigLower;
+ }
+
+ /* Context derivation process of coeff_abs_significant_flag */
+ static uint32_t getSigCoeffGroupCtxInc(uint64_t cgGroupMask, uint32_t cgPosX, uint32_t cgPosY, uint32_t cgBlkPos, uint32_t trSizeCG)
+ {
+ X265_CHECK(cgBlkPos < 64, "cgBlkPos is too large\n");
+ // NOTE: unsafe shift operator, see NOTE in calcPatternSigCtx
+ const uint32_t sigPos = (uint32_t)(cgGroupMask >> (cgBlkPos + 1)); // just need lowest 8-bits valid
+ const uint32_t sigRight = ((int32_t)(cgPosX - (trSizeCG - 1)) >> 31) & sigPos;
+ const uint32_t sigLower = ((int32_t)(cgPosY - (trSizeCG - 1)) >> 31) & (sigPos >> (trSizeCG - 1));
+
+ return (sigRight | sigLower) & 1;
+ }
+
/* static methods shared with entropy.cpp */
- static uint32_t calcPatternSigCtx(uint64_t sigCoeffGroupFlag64, uint32_t cgPosX, uint32_t cgPosY, uint32_t log2TrSizeCG);
static uint32_t getSigCtxInc(uint32_t patternSigCtx, uint32_t log2TrSize, uint32_t trSize, uint32_t blkPos, bool bIsLuma, uint32_t firstSignificanceMapContext);
- static uint32_t getSigCoeffGroupCtxInc(uint64_t sigCoeffGroupFlag64, uint32_t cgPosX, uint32_t cgPosY, uint32_t log2TrSizeCG);

protected:
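calcPatternSigCtx and getSigCoeffGroupCtxInc read neighbouring coefficient-group flags from a 64-bit mask. Reading the shifts, bit cgBlkPos + 1 is the group to the right and bit cgBlkPos + trSizeCG is the group below, so the mask is in raster order.

The harness below copies both functions from the diff as free functions, with X265_CHECK replaced by assert. This lets the bit manipulation be exercised in isolation.

The diff's NOTE accepts that cgBlkPos + 1 can reach 64. That is an out-of-range shift for a 64-bit operand in standard C++, so the checks below avoid cgBlkPos = 63.

```cpp
#include <cassert>
#include <cstdint>

// calcPatternSigCtx as added to quant.h (X265_CHECK -> assert).
// Result bit 0: right neighbour CG coded; bit 1: lower neighbour CG coded.
static uint32_t calcPatternSigCtx(uint64_t sigCoeffGroupFlag64, uint32_t cgPosX, uint32_t cgPosY, uint32_t cgBlkPos, uint32_t trSizeCG)
{
    if (trSizeCG == 1)
        return 0;

    assert(trSizeCG <= 8);
    assert(cgBlkPos < 64);
    const uint32_t sigPos = (uint32_t)(sigCoeffGroupFlag64 >> (cgBlkPos + 1));
    // (int32_t)(pos - (trSizeCG - 1)) >> 31 is all ones unless pos is the last column/row.
    const uint32_t sigRight = ((int32_t)(cgPosX - (trSizeCG - 1)) >> 31) & (sigPos & 1);
    const uint32_t sigLower = ((int32_t)(cgPosY - (trSizeCG - 1)) >> 31) & (sigPos >> (trSizeCG - 2)) & 2;
    return sigRight + sigLower;
}

// getSigCoeffGroupCtxInc as added to quant.h: 1 if either neighbour CG is coded.
static uint32_t getSigCoeffGroupCtxInc(uint64_t cgGroupMask, uint32_t cgPosX, uint32_t cgPosY, uint32_t cgBlkPos, uint32_t trSizeCG)
{
    assert(cgBlkPos < 64);
    const uint32_t sigPos = (uint32_t)(cgGroupMask >> (cgBlkPos + 1));
    const uint32_t sigRight = ((int32_t)(cgPosX - (trSizeCG - 1)) >> 31) & sigPos;
    const uint32_t sigLower = ((int32_t)(cgPosY - (trSizeCG - 1)) >> 31) & (sigPos >> (trSizeCG - 1));
    return (sigRight | sigLower) & 1;
}
```

For a 4x4 grid of groups (trSizeCG = 4), the group at (0, 0) has its right neighbour at bit 1 and its lower neighbour at bit 4. In the last column, the right-neighbour test is masked off even if the next bit (which belongs to the following row) is set.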
x265_1.6.tar.gz/source/common/slice.h -> x265_1.7.tar.gz/source/common/slice.h
Changed
LEVEL6 = 180,
LEVEL6_1 = 183,
LEVEL6_2 = 186,
+ LEVEL8_5 = 255,
};
}
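The level enum values follow the HEVC rule that general_level_idc is 30 times the level number, so level 6.2 is 186. Level 8.5 maps to 255, the largest value the 8-bit field can hold. Per the changelog, x265 1.7 uses it to signal --lossless encodes.

A compile-time check of the rule:

```cpp
// HEVC general_level_idc is 30 times the level number; for level major.minor
// that is 30 * major + 3 * minor.
constexpr int levelIdc(int major, int minor)
{
    return 30 * major + 3 * minor;
}

static_assert(levelIdc(6, 2) == 186, "matches LEVEL6_2");
static_assert(levelIdc(8, 5) == 255, "LEVEL8_5 fills the 8-bit field");
```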
x265_1.6.tar.gz/source/common/threading.h -> x265_1.7.tar.gz/source/common/threading.h
Changed
LeaveCriticalSection(&m_cs);
}

+ void poke(void)
+ {
+ /* awaken all waiting threads, but make no change */
+ EnterCriticalSection(&m_cs);
+ WakeAllConditionVariable(&m_cv);
+ LeaveCriticalSection(&m_cs);
+ }
+
void incr()
{
EnterCriticalSection(&m_cs);

pthread_mutex_unlock(&m_mutex);
}

+ void poke(void)
+ {
+ /* awaken all waiting threads, but make no change */
+ pthread_mutex_lock(&m_mutex);
+ pthread_cond_broadcast(&m_cond);
+ pthread_mutex_unlock(&m_mutex);
+ }
+
void incr()
{
pthread_mutex_lock(&m_mutex);
x265_1.6.tar.gz/source/common/threadpool.cpp -> x265_1.7.tar.gz/source/common/threadpool.cpp
Changed
int cpuCount = getCpuCount();
bool bNumaSupport = false;

-#if _WIN32_WINNT >= 0x0601
+#if defined(_WIN32_WINNT) && _WIN32_WINNT >= _WIN32_WINNT_WIN7
bNumaSupport = true;
#elif HAVE_LIBNUMA
bNumaSupport = numa_available() >= 0;

for (int i = 0; i < cpuCount; i++)
{
-#if _WIN32_WINNT >= 0x0601
+#if defined(_WIN32_WINNT) && _WIN32_WINNT >= _WIN32_WINNT_WIN7
UCHAR node;
if (GetNumaProcessorNode((UCHAR)i, &node))
- cpusPerNode[X265_MIN(node, MAX_NODE_NUM)]++;
+ cpusPerNode[X265_MIN(node, (UCHAR)MAX_NODE_NUM)]++;
else
#elif HAVE_LIBNUMA
if (bNumaSupport >= 0)

/* limit nodes based on param->numaPools */
if (p->numaPools && *p->numaPools)
{
- char *nodeStr = p->numaPools;
+ const char *nodeStr = p->numaPools;
for (int i = 0; i < numNumaNodes; i++)
{
if (!*nodeStr)

return true;
}

-void ThreadPool::stop()
+void ThreadPool::stopWorkers()
{
if (m_workers)
{

/* static */
void ThreadPool::setThreadNodeAffinity(int numaNode)
{
-#if _WIN32_WINNT >= 0x0601
+#if defined(_WIN32_WINNT) && _WIN32_WINNT >= _WIN32_WINNT_WIN7
GROUP_AFFINITY groupAffinity;
if (GetNumaNodeProcessorMaskEx((USHORT)numaNode, &groupAffinity))
{

/* static */
int ThreadPool::getNumaNodeCount()
{
-#if _WIN32_WINNT >= 0x0601
+#if defined(_WIN32_WINNT) && _WIN32_WINNT >= _WIN32_WINNT_WIN7
ULONG num = 1;
if (GetNumaHighestNodeNumber(&num))
num++;
x265_1.6.tar.gz/source/common/threadpool.h -> x265_1.7.tar.gz/source/common/threadpool.h
Changed
bool create(int numThreads, int maxProviders, int node);
bool start();
- void stop();
+ void stopWorkers();
void setCurrentThreadAffinity();
int tryAcquireSleepingThread(sleepbitmap_t firstTryBitmap, sleepbitmap_t secondTryBitmap);
int tryBondPeers(int maxPeers, sleepbitmap_t peerBitmap, BondedTaskGroup& master);
x265_1.6.tar.gz/source/common/x86/asm-primitives.cpp -> x265_1.7.tar.gz/source/common/x86/asm-primitives.cpp
Changed
#error "Unsupported build configuration (32bit x86 and HIGH_BIT_DEPTH), you must configure ENABLE_ASSEMBLY=OFF"
#endif

+#if X86_64
+ p.scanPosLast = x265_scanPosLast_x64;
+#endif
+
if (cpuMask & X265_CPU_SSE2)
{
/* We do not differentiate CPUs which support MMX and not SSE2. We only check

PIXEL_AVG_W4(mmx2);
LUMA_VAR(sse2);

- p.luma_p2s = x265_luma_p2s_sse2;
- p.chroma[X265_CSP_I420].p2s = x265_chroma_p2s_sse2;
- p.chroma[X265_CSP_I422].p2s = x265_chroma_p2s_sse2;

ALL_LUMA_TU(blockfill_s, blockfill_s, sse2);
ALL_LUMA_TU_S(cpy1Dto2D_shr, cpy1Dto2D_shr_, sse2);

ALL_LUMA_TU_S(calcresidual, getResidual, sse2);
ALL_LUMA_TU_S(transpose, transpose, sse2);

- p.cu[BLOCK_4x4].intra_pred[DC_IDX] = x265_intra_pred_dc4_sse2;
- p.cu[BLOCK_8x8].intra_pred[DC_IDX] = x265_intra_pred_dc8_sse2;
- p.cu[BLOCK_16x16].intra_pred[DC_IDX] = x265_intra_pred_dc16_sse2;
- p.cu[BLOCK_32x32].intra_pred[DC_IDX] = x265_intra_pred_dc32_sse2;
-
- p.cu[BLOCK_4x4].intra_pred[PLANAR_IDX] = x265_intra_pred_planar4_sse2;
- p.cu[BLOCK_8x8].intra_pred[PLANAR_IDX] = x265_intra_pred_planar8_sse2;
- p.cu[BLOCK_16x16].intra_pred[PLANAR_IDX] = x265_intra_pred_planar16_sse2;
- p.cu[BLOCK_32x32].intra_pred[PLANAR_IDX] = x265_intra_pred_planar32_sse2;
+ ALL_LUMA_TU_S(intra_pred[PLANAR_IDX], intra_pred_planar, sse2);
+ ALL_LUMA_TU_S(intra_pred[DC_IDX], intra_pred_dc, sse2);
+
+ p.cu[BLOCK_4x4].intra_pred[2] = x265_intra_pred_ang4_2_sse2;
+ p.cu[BLOCK_4x4].intra_pred[3] = x265_intra_pred_ang4_3_sse2;
+ p.cu[BLOCK_4x4].intra_pred[4] = x265_intra_pred_ang4_4_sse2;
+ p.cu[BLOCK_4x4].intra_pred[5] = x265_intra_pred_ang4_5_sse2;
+ p.cu[BLOCK_4x4].intra_pred[6] = x265_intra_pred_ang4_6_sse2;
+ p.cu[BLOCK_4x4].intra_pred[7] = x265_intra_pred_ang4_7_sse2;
+ p.cu[BLOCK_4x4].intra_pred[8] = x265_intra_pred_ang4_8_sse2;
+ p.cu[BLOCK_4x4].intra_pred[9] = x265_intra_pred_ang4_9_sse2;
+ p.cu[BLOCK_4x4].intra_pred[10] = x265_intra_pred_ang4_10_sse2;
+ p.cu[BLOCK_4x4].intra_pred[11] = x265_intra_pred_ang4_11_sse2;
+ p.cu[BLOCK_4x4].intra_pred[12] = x265_intra_pred_ang4_12_sse2;
+ p.cu[BLOCK_4x4].intra_pred[13] = x265_intra_pred_ang4_13_sse2;
+ p.cu[BLOCK_4x4].intra_pred[14] = x265_intra_pred_ang4_14_sse2;
+ p.cu[BLOCK_4x4].intra_pred[15] = x265_intra_pred_ang4_15_sse2;
+ p.cu[BLOCK_4x4].intra_pred[16] = x265_intra_pred_ang4_16_sse2;
+ p.cu[BLOCK_4x4].intra_pred[17] = x265_intra_pred_ang4_17_sse2;
+ p.cu[BLOCK_4x4].intra_pred[18] = x265_intra_pred_ang4_18_sse2;
+ p.cu[BLOCK_4x4].intra_pred[19] = x265_intra_pred_ang4_17_sse2;
+ p.cu[BLOCK_4x4].intra_pred[20] = x265_intra_pred_ang4_16_sse2;
+ p.cu[BLOCK_4x4].intra_pred[21] = x265_intra_pred_ang4_15_sse2;
+ p.cu[BLOCK_4x4].intra_pred[22] = x265_intra_pred_ang4_14_sse2;
+ p.cu[BLOCK_4x4].intra_pred[23] = x265_intra_pred_ang4_13_sse2;
+ p.cu[BLOCK_4x4].intra_pred[24] = x265_intra_pred_ang4_12_sse2;
+ p.cu[BLOCK_4x4].intra_pred[25] = x265_intra_pred_ang4_11_sse2;
+ p.cu[BLOCK_4x4].intra_pred[26] = x265_intra_pred_ang4_26_sse2;
+ p.cu[BLOCK_4x4].intra_pred[27] = x265_intra_pred_ang4_9_sse2;
+ p.cu[BLOCK_4x4].intra_pred[28] = x265_intra_pred_ang4_8_sse2;
+ p.cu[BLOCK_4x4].intra_pred[29] = x265_intra_pred_ang4_7_sse2;
+ p.cu[BLOCK_4x4].intra_pred[30] = x265_intra_pred_ang4_6_sse2;
+ p.cu[BLOCK_4x4].intra_pred[31] = x265_intra_pred_ang4_5_sse2;
+ p.cu[BLOCK_4x4].intra_pred[32] = x265_intra_pred_ang4_4_sse2;
+ p.cu[BLOCK_4x4].intra_pred[33] = x265_intra_pred_ang4_3_sse2;

p.cu[BLOCK_4x4].sse_ss = x265_pixel_ssd_ss_4x4_mmx2;
ALL_LUMA_CU(sse_ss, pixel_ssd_ss, sse2);

p.cu[BLOCK_16x16].count_nonzero = x265_count_nonzero_16x16_ssse3;
p.cu[BLOCK_32x32].count_nonzero = x265_count_nonzero_32x32_ssse3;
p.frameInitLowres = x265_frame_init_lowres_core_ssse3;
+
+ p.pu[LUMA_4x4].convert_p2s = x265_filterPixelToShort_4x4_ssse3;
+ p.pu[LUMA_4x8].convert_p2s = x265_filterPixelToShort_4x8_ssse3;
+ p.pu[LUMA_4x16].convert_p2s = x265_filterPixelToShort_4x16_ssse3;
+ p.pu[LUMA_8x4].convert_p2s = x265_filterPixelToShort_8x4_ssse3;
+ p.pu[LUMA_8x8].convert_p2s = x265_filterPixelToShort_8x8_ssse3;
+ p.pu[LUMA_8x16].convert_p2s = x265_filterPixelToShort_8x16_ssse3;
+ p.pu[LUMA_8x32].convert_p2s = x265_filterPixelToShort_8x32_ssse3;
+ p.pu[LUMA_16x4].convert_p2s = x265_filterPixelToShort_16x4_ssse3;
+ p.pu[LUMA_16x8].convert_p2s = x265_filterPixelToShort_16x8_ssse3;
+ p.pu[LUMA_16x12].convert_p2s = x265_filterPixelToShort_16x12_ssse3;
+ p.pu[LUMA_16x16].convert_p2s = x265_filterPixelToShort_16x16_ssse3;
+ p.pu[LUMA_16x32].convert_p2s = x265_filterPixelToShort_16x32_ssse3;
+ p.pu[LUMA_16x64].convert_p2s = x265_filterPixelToShort_16x64_ssse3;
+ p.pu[LUMA_32x8].convert_p2s = x265_filterPixelToShort_32x8_ssse3;
+ p.pu[LUMA_32x16].convert_p2s = x265_filterPixelToShort_32x16_ssse3;
+ p.pu[LUMA_32x24].convert_p2s = x265_filterPixelToShort_32x24_ssse3;
+ p.pu[LUMA_32x32].convert_p2s = x265_filterPixelToShort_32x32_ssse3;
+ p.pu[LUMA_32x64].convert_p2s = x265_filterPixelToShort_32x64_ssse3;
+ p.pu[LUMA_64x16].convert_p2s = x265_filterPixelToShort_64x16_ssse3;
+ p.pu[LUMA_64x32].convert_p2s = x265_filterPixelToShort_64x32_ssse3;
+ p.pu[LUMA_64x48].convert_p2s = x265_filterPixelToShort_64x48_ssse3;
+ p.pu[LUMA_64x64].convert_p2s = x265_filterPixelToShort_64x64_ssse3;
+ p.pu[LUMA_24x32].convert_p2s = x265_filterPixelToShort_24x32_ssse3;
+ p.pu[LUMA_12x16].convert_p2s = x265_filterPixelToShort_12x16_ssse3;
+ p.pu[LUMA_48x64].convert_p2s = x265_filterPixelToShort_48x64_ssse3;
+
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_4x4].p2s = x265_filterPixelToShort_4x4_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_4x8].p2s = x265_filterPixelToShort_4x8_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_4x16].p2s = x265_filterPixelToShort_4x16_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_8x4].p2s = x265_filterPixelToShort_8x4_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_8x8].p2s = x265_filterPixelToShort_8x8_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_8x16].p2s = x265_filterPixelToShort_8x16_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_8x32].p2s = x265_filterPixelToShort_8x32_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_16x4].p2s = x265_filterPixelToShort_16x4_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_16x8].p2s = x265_filterPixelToShort_16x8_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_16x12].p2s = x265_filterPixelToShort_16x12_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_16x16].p2s = x265_filterPixelToShort_16x16_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_16x32].p2s = x265_filterPixelToShort_16x32_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_32x8].p2s = x265_filterPixelToShort_32x8_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_32x16].p2s = x265_filterPixelToShort_32x16_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_32x24].p2s = x265_filterPixelToShort_32x24_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_32x32].p2s = x265_filterPixelToShort_32x32_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_4x4].p2s = x265_filterPixelToShort_4x4_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_4x8].p2s = x265_filterPixelToShort_4x8_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_4x16].p2s = x265_filterPixelToShort_4x16_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_4x32].p2s = x265_filterPixelToShort_4x32_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x4].p2s = x265_filterPixelToShort_8x4_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x8].p2s = x265_filterPixelToShort_8x8_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x12].p2s = x265_filterPixelToShort_8x12_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x16].p2s = x265_filterPixelToShort_8x16_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x32].p2s = x265_filterPixelToShort_8x32_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_8x64].p2s = x265_filterPixelToShort_8x64_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_12x32].p2s = x265_filterPixelToShort_12x32_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x8].p2s = x265_filterPixelToShort_16x8_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x16].p2s = x265_filterPixelToShort_16x16_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x24].p2s = x265_filterPixelToShort_16x24_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x32].p2s = x265_filterPixelToShort_16x32_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_16x64].p2s = x265_filterPixelToShort_16x64_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_24x64].p2s = x265_filterPixelToShort_24x64_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_32x16].p2s = x265_filterPixelToShort_32x16_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_32x32].p2s = x265_filterPixelToShort_32x32_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_32x48].p2s = x265_filterPixelToShort_32x48_ssse3;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_32x64].p2s = x265_filterPixelToShort_32x64_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_4x2].p2s = x265_filterPixelToShort_4x2_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_8x2].p2s = x265_filterPixelToShort_8x2_ssse3;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_8x6].p2s = x265_filterPixelToShort_8x6_ssse3;
+ p.findPosFirstLast = x265_findPosFirstLast_ssse3;
}
if (cpuMask & X265_CPU_SSE4)
{

ALL_LUMA_TU_S(copy_cnt, copy_cnt_, sse4);
ALL_LUMA_CU(psy_cost_pp, psyCost_pp, sse4);
ALL_LUMA_CU(psy_cost_ss, psyCost_ss, sse4);
+
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_2x4].p2s = x265_filterPixelToShort_2x4_sse4;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_2x8].p2s = x265_filterPixelToShort_2x8_sse4;
+ p.chroma[X265_CSP_I420].pu[CHROMA_420_6x8].p2s = x265_filterPixelToShort_6x8_sse4;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_2x8].p2s = x265_filterPixelToShort_2x8_sse4;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_2x16].p2s = x265_filterPixelToShort_2x16_sse4;
+ p.chroma[X265_CSP_I422].pu[CHROMA_422_6x16].p2s = x265_filterPixelToShort_6x16_sse4;
}
if (cpuMask & X265_CPU_AVX)
{

}
if (cpuMask & X265_CPU_AVX2)
{
+ p.pu[LUMA_48x64].satd = x265_pixel_satd_48x64_avx2;
+
+ p.pu[LUMA_64x16].satd = x265_pixel_satd_64x16_avx2;
+ p.pu[LUMA_64x32].satd = x265_pixel_satd_64x32_avx2;
+ p.pu[LUMA_64x48].satd = x265_pixel_satd_64x48_avx2;
+ p.pu[LUMA_64x64].satd = x265_pixel_satd_64x64_avx2;
+
+ p.pu[LUMA_32x8].satd = x265_pixel_satd_32x8_avx2;
+ p.pu[LUMA_32x16].satd = x265_pixel_satd_32x16_avx2;
+ p.pu[LUMA_32x24].satd = x265_pixel_satd_32x24_avx2;
+ p.pu[LUMA_32x32].satd = x265_pixel_satd_32x32_avx2;
+ p.pu[LUMA_32x64].satd = x265_pixel_satd_32x64_avx2;
+
+ p.pu[LUMA_16x4].satd = x265_pixel_satd_16x4_avx2;
+ p.pu[LUMA_16x8].satd = x265_pixel_satd_16x8_avx2;
+ p.pu[LUMA_16x12].satd = x265_pixel_satd_16x12_avx2;
+ p.pu[LUMA_16x16].satd = x265_pixel_satd_16x16_avx2;
+ p.pu[LUMA_16x32].satd = x265_pixel_satd_16x32_avx2;
+ p.pu[LUMA_16x64].satd = x265_pixel_satd_16x64_avx2;
+
p.cu[BLOCK_32x32].ssd_s = x265_pixel_ssd_s_32_avx2;
p.cu[BLOCK_16x16].sse_ss = x265_pixel_ssd_ss_16x16_avx2;

p.dequant_normal = x265_dequant_normal_avx2;

p.scale1D_128to64 = x265_scale1D_128to64_avx2;
+ p.scale2D_64to32 = x265_scale2D_64to32_avx2;
// p.weight_pp = x265_weight_pp_avx2; fails tests

p.cu[BLOCK_16x16].calcresidual = x265_getResidual16_avx2;

ALL_LUMA_PU(luma_vps, interp_8tap_vert_ps, avx2);
ALL_LUMA_PU(luma_vsp, interp_8tap_vert_sp, avx2);
ALL_LUMA_PU(luma_vss, interp_8tap_vert_ss, avx2);
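In the new SSE2 table for 4x4 angular prediction, modes 19 to 25 and 27 to 33 do not get kernels of their own. Each reuses the kernel of its mirror mode across the diagonal mode 18, and the kernel index is always 36 minus the mode. Modes 2 to 18 and the vertical mode 26 keep their own kernels. The following lookup reproduces that assignment so the table can be checked at a glance. The diff does not show how the shared kernels handle the mirrored case. Presumably they use their dirMode argument to choose between the transposed and untransposed output, but that is an assumption here. The function name is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: returns k such that the SSE2 table in this diff assigns
// x265_intra_pred_ang4_<k>_sse2 to angular mode `mode` (2..33).
static int ang4KernelForMode(int mode)
{
    if (mode <= 18 || mode == 26)
        return mode;       // modes with a dedicated kernel
    return 36 - mode;      // mirror of the mode across diagonal mode 18
}
```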
x265_1.6.tar.gz/source/common/x86/const-a.asm -> x265_1.7.tar.gz/source/common/x86/const-a.asm
Changed
SECTION_RODATA 32

-const pb_1, times 32 db 1
+;; 8-bit constants

-const hsub_mul, times 16 db 1, -1
-const pw_1, times 16 dw 1
-const pw_16, times 16 dw 16
-const pw_32, times 16 dw 32
-const pw_128, times 16 dw 128
-const pw_256, times 16 dw 256
-const pw_257, times 16 dw 257
-const pw_512, times 16 dw 512
-const pw_1023, times 8 dw 1023
-ALIGN 32
-const pw_1024, times 16 dw 1024
-const pw_4096, times 16 dw 4096
-const pw_00ff, times 16 dw 0x00ff
-ALIGN 32
-const pw_pixel_max,times 16 dw ((1 << BIT_DEPTH)-1)
-const deinterleave_shufd, dd 0,4,1,5,2,6,3,7
-const pb_unpackbd1, times 2 db 0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3
-const pb_unpackbd2, times 2 db 4,4,4,4,5,5,5,5,6,6,6,6,7,7,7,7
-const pb_unpackwq1, db 0,1,0,1,0,1,0,1,2,3,2,3,2,3,2,3
-const pb_unpackwq2, db 4,5,4,5,4,5,4,5,6,7,6,7,6,7,6,7
-const pw_swap, times 2 db 6,7,4,5,2,3,0,1
+const pb_0, times 16 db 0
+const pb_1, times 32 db 1
+const pb_2, times 32 db 2
+const pb_3, times 16 db 3
+const pb_4, times 32 db 4
+const pb_8, times 32 db 8
+const pb_15, times 32 db 15
+const pb_16, times 32 db 16
+const pb_32, times 32 db 32
+const pb_64, times 32 db 64
+const pb_128, times 16 db 128
+const pb_a1, times 16 db 0xa1

-const pb_2, times 32 db 2
-const pb_4, times 32 db 4
-const pb_16, times 32 db 16
-const pb_64, times 32 db 64
-const pb_01, times 8 db 0,1
-const pb_0, times 16 db 0
-const pb_a1, times 16 db 0xa1
-const pb_3, times 16 db 3
-const pb_8, times 32 db 8
-const pb_32, times 32 db 32
-const pb_128, times 16 db 128
-const pb_shuf8x8c, db 0,0,0,0,2,2,2,2,4,4,4,4,6,6,6,6
+const pb_01, times 8 db 0, 1
+const hsub_mul, times 16 db 1, -1
+const pw_swap, times 2 db 6, 7, 4, 5, 2, 3, 0, 1
+const pb_unpackbd1, times 2 db 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3
+const pb_unpackbd2, times 2 db 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7
+const pb_unpackwq1, times 1 db 0, 1, 0, 1, 0, 1, 0, 1, 2, 3, 2, 3, 2, 3, 2, 3
+const pb_unpackwq2, times 1 db 4, 5, 4, 5, 4, 5, 4, 5, 6, 7, 6, 7, 6, 7, 6, 7
+const pb_shuf8x8c, times 1 db 0, 0, 0, 0, 2, 2, 2, 2, 4, 4, 4, 4, 6, 6, 6, 6
+const pb_movemask, times 16 db 0x00
+ times 16 db 0xFF
+const pb_0000000000000F0F, times 2 db 0xff, 0x00
+ times 12 db 0x00
+const pb_000000000000000F, db 0xff
+ times 15 db 0x00

-const pw_0_15, times 2 dw 0, 1, 2, 3, 4, 5, 6, 7
-const pw_2, times 8 dw 2
-const pw_m2, times 8 dw -2
-const pw_4, times 8 dw 4
-const pw_8, times 8 dw 8
-const pw_64, times 8 dw 64
-const pw_256, times 8 dw 256
-const pw_32_0, times 4 dw 32,
- times 4 dw 0
-const pw_2000, times 16 dw 0x2000
-const pw_8000, times 8 dw 0x8000
-const pw_3fff, times 8 dw 0x3fff
-const pw_ppppmmmm, dw 1,1,1,1,-1,-1,-1,-1
-const pw_ppmmppmm, dw 1,1,-1,-1,1,1,-1,-1
-const pw_pmpmpmpm, dw 1,-1,1,-1,1,-1,1,-1
-const pw_pmmpzzzz, dw 1,-1,-1,1,0,0,0,0
-const pd_1, times 8 dd 1
-const pd_2, times 8 dd 2
-const pd_4, times 4 dd 4
-const pd_8, times 4 dd 8
-const pd_16, times 4 dd 16
-const pd_32, times 4 dd 32
-const pd_64, times 4 dd 64
-const pd_128, times 4 dd 128
-const pd_256, times 4 dd 256
-const pd_512, times 4 dd 512
-const pd_1024, times 4 dd 1024
-const pd_2048, times 4 dd 2048
-const pd_ffff, times 4 dd 0xffff
-const pd_32767, times 4 dd 32767
-const pd_n32768, times 4 dd 0xffff8000
-const pw_ff00, times 8 dw 0xff00
+;; 16-bit constants

-const multi_2Row, dw 1, 2, 3, 4, 1, 2, 3, 4
-const multiL, dw 1, 2, 3, 4, 5, 6, 7, 8
-const multiH, dw 9, 10, 11, 12, 13, 14, 15, 16
-const multiH2, dw 17, 18, 19, 20, 21, 22, 23, 24
-const multiH3, dw 25, 26, 27, 28, 29, 30, 31, 32
+const pw_1, times 16 dw 1
+const pw_2, times 8 dw 2
+const pw_m2, times 8 dw -2
+const pw_4, times 8 dw 4
+const pw_8, times 8 dw 8
+const pw_16, times 16 dw 16
+const pw_15, times 16 dw 15
+const pw_31, times 16 dw 31
+const pw_32, times 16 dw 32
+const pw_64, times 8 dw 64
+const pw_128, times 16 dw 128
+const pw_256, times 16 dw 256
+const pw_257, times 16 dw 257
+const pw_512, times 16 dw 512
+const pw_1023, times 8 dw 1023
+const pw_1024, times 16 dw 1024
+const pw_4096, times 16 dw 4096
+const pw_00ff, times 16 dw 0x00ff
+const pw_ff00, times 8 dw 0xff00
+const pw_2000, times 16 dw 0x2000
+const pw_8000, times 8 dw 0x8000
+const pw_3fff, times 8 dw 0x3fff
+const pw_32_0, times 4 dw 32,
+ times 4 dw 0
+const pw_pixel_max, times 16 dw ((1 << BIT_DEPTH)-1)
+
+const pw_0_15, times 2 dw 0, 1, 2, 3, 4, 5, 6, 7
+const pw_ppppmmmm, times 1 dw 1, 1, 1, 1, -1, -1, -1, -1
+const pw_ppmmppmm, times 1 dw 1, 1, -1, -1, 1, 1, -1, -1
+const pw_pmpmpmpm, times 1 dw 1, -1, 1, -1, 1, -1, 1, -1
+const pw_pmmpzzzz, times 1 dw 1, -1, -1, 1, 0, 0, 0, 0
+const multi_2Row, times 1 dw 1, 2, 3, 4, 1, 2, 3, 4
+const multiH, times 1 dw 9, 10, 11, 12, 13, 14, 15, 16
+const multiH3, times 1 dw 25, 26, 27, 28, 29, 30, 31, 32
+const multiL, times 1 dw 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
+const multiH2, times 1 dw 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32
+const pw_planar16_mul, times 1 dw 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
+const pw_planar32_mul, times 1 dw 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16
+const pw_FFFFFFFFFFFFFFF0, dw 0x00
+ times 7 dw 0xff
+
+
+;; 32-bit constants
+
+const pd_1, times 8 dd 1
+const pd_2, times 8 dd 2
+const pd_4, times 4 dd 4
+const pd_8, times 4 dd 8
+const pd_16, times 4 dd 16
+const pd_32, times 4 dd 32
+const pd_64, times 4 dd 64
+const pd_128, times 4 dd 128
+const pd_256, times 4 dd 256
+const pd_512, times 4 dd 512
+const pd_1024, times 4 dd 1024
+const pd_2048, times 4 dd 2048
+const pd_ffff, times 4 dd 0xffff
+const pd_32767, times 4 dd 32767
+const pd_n32768, times 4 dd 0xffff8000
+
+const trans8_shuf, times 1 dd 0, 4, 1, 5, 2, 6, 3, 7
+const deinterleave_shufd, times 1 dd 0, 4, 1, 5, 2, 6, 3, 7

const popcnt_table
%assign x 0
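The new pb_movemask constant is 16 bytes of 0x00 followed by 16 bytes of 0xFF. A common way to use such a table is an unaligned 16-byte load that starts n bytes into it. That load yields a vector whose last n bytes are 0xFF and whose other bytes are 0x00, so it builds a byte mask of any length without a branch. This is an assumption, because the code that loads pb_movemask is not part of this diff. The following scalar model shows that load. kMoveMask and tailMask are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Scalar copy of the pb_movemask layout: 16 x 0x00 followed by 16 x 0xFF.
static const uint8_t kMoveMask[32] = {
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
};

// Models an unaligned 16-byte load from kMoveMask + n, for 0 <= n <= 16:
// bytes [16 - n, 16) of the result are 0xFF and the rest are 0x00.
static std::array<uint8_t, 16> tailMask(unsigned n)
{
    assert(n <= 16);
    std::array<uint8_t, 16> out{};
    std::memcpy(out.data(), kMoveMask + n, out.size());
    return out;
}
```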
x265_1.6.tar.gz/source/common/x86/dct8.asm -> x265_1.7.tar.gz/source/common/x86/dct8.asm
Changed
times 2 dw 84, -29, -74, 55
times 2 dw 55, -84, 74, -29

+pw_dst4_tab: times 4 dw 29, 55, 74, 84
+ times 4 dw 74, 74, 0, -74
+ times 4 dw 84, -29, -74, 55
+ times 4 dw 55, -84, 74, -29
+
tab_idst4: times 4 dw 29, +84
times 4 dw +74, +55
times 4 dw 55, -29

times 4 dw 84, +55
times 4 dw -74, -29

+pw_idst4_tab: times 4 dw 29, 84
+ times 4 dw 55, -29
+ times 4 dw 74, 55
+ times 4 dw 74, -84
+ times 4 dw 74, -74
+ times 4 dw 84, 55
+ times 4 dw 0, 74
+ times 4 dw -74, -29
+pb_idst4_shuf: times 2 db 0, 1, 8, 9, 2, 3, 10, 11, 4, 5, 12, 13, 6, 7, 14, 15
+
tab_dct8_1: times 2 dw 89, 50, 75, 18
times 2 dw 75, -89, -18, -50
times 2 dw 50, 18, -89, 75

cextern pd_1024
cextern pd_2048
cextern pw_ppppmmmm
-
+cextern trans8_shuf
;------------------------------------------------------
;void dct4(const int16_t* src, int16_t* dst, intptr_t srcStride)
;------------------------------------------------------

RET

+;------------------------------------------------------------------
+;void dst4(const int16_t* src, int16_t* dst, intptr_t srcStride)
+;------------------------------------------------------------------
+INIT_YMM avx2
+cglobal dst4, 3, 4, 6
+%if BIT_DEPTH == 8
+ %define DST_SHIFT 1
+ vpbroadcastd m5, [pd_1]
+%elif BIT_DEPTH == 10
+ %define DST_SHIFT 3
+ vpbroadcastd m5, [pd_4]
+%endif
+ mova m4, [trans8_shuf]
+ add r2d, r2d
+ lea r3, [pw_dst4_tab]
+
+ movq xm0, [r0 + 0 * r2]
+ movhps xm0, [r0 + 1 * r2]
+ lea r0, [r0 + 2 * r2]
+ movq xm1, [r0]
+ movhps xm1, [r0 + r2]
+
+ vinserti128 m0, m0, xm1, 1 ; m0 = src[0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
+
+ pmaddwd m2, m0, [r3 + 0 * 32]
+ pmaddwd m1, m0, [r3 + 1 * 32]
+ phaddd m2, m1
+ paddd m2, m5
+ psrad m2, DST_SHIFT
+ pmaddwd m3, m0, [r3 + 2 * 32]
+ pmaddwd m1, m0, [r3 + 3 * 32]
+ phaddd m3, m1
+ paddd m3, m5
+ psrad m3, DST_SHIFT
+ packssdw m2, m3
+ vpermd m2, m4, m2
+
+ vpbroadcastd m5, [pd_128]
+ pmaddwd m0, m2, [r3 + 0 * 32]
+ pmaddwd m1, m2, [r3 + 1 * 32]
+ phaddd m0, m1
+ paddd m0, m5
+ psrad m0, 8
+ pmaddwd m3, m2, [r3 + 2 * 32]
+ pmaddwd m2, m2, [r3 + 3 * 32]
+ phaddd m3, m2
+ paddd m3, m5
+ psrad m3, 8
+ packssdw m0, m3
+ vpermd m0, m4, m0
+ movu [r1], m0
+ RET
+
;-------------------------------------------------------
;void idst4(const int16_t* src, int16_t* dst, intptr_t dstStride)
;-------------------------------------------------------

movhps [r1 + r2], m1
RET

+;-----------------------------------------------------------------
+;void idst4(const int16_t* src, int16_t* dst, intptr_t dstStride)
+;-----------------------------------------------------------------
+INIT_YMM avx2
+cglobal idst4, 3, 4, 6
+%if BIT_DEPTH == 8
+ vpbroadcastd m4, [pd_2048]
+ %define IDCT4_SHIFT 12
+%elif BIT_DEPTH == 10
+ vpbroadcastd m4, [pd_512]
+ %define IDCT4_SHIFT 10
+%else
+ %error Unsupported BIT_DEPTH!
+%endif
+ add r2d, r2d
+ lea r3, [pw_idst4_tab]
+
+ movu xm0, [r0 + 0 * 16]
+ movu xm1, [r0 + 1 * 16]
+
+ punpcklwd m2, m0, m1
+ punpckhwd m0, m1
+
+ vinserti128 m2, m2, xm2, 1
+ vinserti128 m0, m0, xm0, 1
+
+ vpbroadcastd m5, [pd_64]
+ pmaddwd m1, m2, [r3 + 0 * 32]
+ pmaddwd m3, m0, [r3 + 1 * 32]
+ paddd m1, m3
+ paddd m1, m5
+ psrad m1, 7
+ pmaddwd m3, m2, [r3 + 2 * 32]
+ pmaddwd m0, [r3 + 3 * 32]
+ paddd m3, m0
+ paddd m3, m5
+ psrad m3, 7
+
+ packssdw m0, m1, m3
+ pshufb m0, [pb_idst4_shuf]
+ vpermq m1, m0, 11101110b
+
+ punpcklwd m2, m0, m1
+ punpckhwd m0, m1
+ punpcklwd m1, m2, m0
+ punpckhwd m2, m0
+
+ vpermq m1, m1, 01000100b
+ vpermq m2, m2, 01000100b
+
+ pmaddwd m0, m1, [r3 + 0 * 32]
+ pmaddwd m3, m2, [r3 + 1 * 32]
+ paddd m0, m3
+ paddd m0, m4
+ psrad m0, IDCT4_SHIFT
+ pmaddwd m3, m1, [r3 + 2 * 32]
+ pmaddwd m2, m2, [r3 + 3 * 32]
+ paddd m3, m2
+ paddd m3, m4
+ psrad m3, IDCT4_SHIFT
+
+ packssdw m0, m3
+ pshufb m1, m0, [pb_idst4_shuf]
+ vpermq m0, m1, 11101110b
+
+ punpcklwd m2, m1, m0
+ movq [r1 + 0 * r2], xm2
+ movhps [r1 + 1 * r2], xm2
+
+ punpckhwd m1, m0
+ movq [r1 + 2 * r2], xm1
+ lea r1, [r1 + 2 * r2]
+ movhps [r1 + r2], xm1
+ RET
+
;-------------------------------------------------------
; void dct8(const int16_t* src, int16_t* dst, intptr_t srcStride)
;-------------------------------------------------------
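The new AVX2 dst4 makes two passes of a 4-point transform using the coefficient rows stored in pw_dst4_tab. After the first pass it rounds and shifts by DST_SHIFT, which is 1 for 8-bit and 3 for 10-bit. After the second pass it rounds and shifts by 8. packssdw saturates the results of each pass to 16 bits. The following scalar sketch mirrors that arithmetic and is useful for checking results by hand. The function names are hypothetical. The output layout, with basis k of input row j stored at dst[k * 4 + j] so that each pass writes its result transposed, is an assumption modeled on the usual two-pass reference transform rather than something this diff shows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Coefficient rows as laid out in pw_dst4_tab (the HEVC 4x4 DST basis).
static const int kDst4Matrix[4][4] = {
    { 29,  55,  74,  84 },
    { 74,  74,   0, -74 },
    { 84, -29, -74,  55 },
    { 55, -84,  74, -29 },
};

// One 1-D pass over the rows of a 4x4 block. The result is written transposed,
// so two passes transform the rows and then the columns.
static void dst4Pass(const int16_t* src, int16_t* dst, int shift)
{
    const int round = 1 << (shift - 1);
    for (int j = 0; j < 4; j++)          // input row
        for (int k = 0; k < 4; k++)      // basis function
        {
            int sum = 0;
            for (int n = 0; n < 4; n++)
                sum += kDst4Matrix[k][n] * src[j * 4 + n];
            // Arithmetic right shift like psrad, then saturate like packssdw.
            const int v = (sum + round) >> shift;
            dst[k * 4 + j] = (int16_t)std::min(32767, std::max(-32768, v));
        }
}

// Hypothetical scalar model of dst4: shift 1 + (bitDepth - 8), then shift 8.
static void dst4Reference(const int16_t* src, int16_t* dst, intptr_t srcStride, int bitDepth)
{
    int16_t block[16], tmp[16];
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++)
            block[r * 4 + c] = src[r * srcStride + c];
    dst4Pass(block, tmp, 1 + (bitDepth - 8));
    dst4Pass(tmp, dst, 8);
}
```

The inverse kernel idst4 follows the same two-pass pattern with pw_idst4_tab. Its first stage shifts by 7, and its second stage shifts by IDCT4_SHIFT, which is 12 for 8-bit and 10 for 10-bit.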
x265_1.6.tar.gz/source/common/x86/dct8.h -> x265_1.7.tar.gz/source/common/x86/dct8.h
Changed
void x265_dct4_sse2(const int16_t* src, int16_t* dst, intptr_t srcStride);
void x265_dct8_sse2(const int16_t* src, int16_t* dst, intptr_t srcStride);
void x265_dst4_ssse3(const int16_t* src, int16_t* dst, intptr_t srcStride);
+void x265_dst4_avx2(const int16_t* src, int16_t* dst, intptr_t srcStride);
void x265_dct8_sse4(const int16_t* src, int16_t* dst, intptr_t srcStride);
void x265_dct4_avx2(const int16_t* src, int16_t* dst, intptr_t srcStride);
void x265_dct8_avx2(const int16_t* src, int16_t* dst, intptr_t srcStride);

void x265_dct32_avx2(const int16_t* src, int16_t* dst, intptr_t srcStride);

void x265_idst4_sse2(const int16_t* src, int16_t* dst, intptr_t dstStride);
+void x265_idst4_avx2(const int16_t* src, int16_t* dst, intptr_t dstStride);
void x265_idct4_sse2(const int16_t* src, int16_t* dst, intptr_t dstStride);
void x265_idct4_avx2(const int16_t* src, int16_t* dst, intptr_t dstStride);
void x265_idct8_sse2(const int16_t* src, int16_t* dst, intptr_t dstStride);
x265_1.6.tar.gz/source/common/x86/intrapred.h -> x265_1.7.tar.gz/source/common/x86/intrapred.h
Changed
void x265_intra_pred_dc8_sse4(pixel* dst, intptr_t dstStride, const pixel* srcPix, int, int filter);
void x265_intra_pred_dc16_sse4(pixel* dst, intptr_t dstStride, const pixel* srcPix, int, int filter);
void x265_intra_pred_dc32_sse4(pixel* dst, intptr_t dstStride, const pixel* srcPix, int, int filter);
+void x265_intra_pred_dc32_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int, int filter);

void x265_intra_pred_planar4_sse2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int, int);
void x265_intra_pred_planar8_sse2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int, int);

void x265_intra_pred_planar8_sse4(pixel* dst, intptr_t dstStride, const pixel* srcPix, int, int);
void x265_intra_pred_planar16_sse4(pixel* dst, intptr_t dstStride, const pixel* srcPix, int, int);
void x265_intra_pred_planar32_sse4(pixel* dst, intptr_t dstStride, const pixel* srcPix, int, int);
+void x265_intra_pred_planar16_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int, int);
+void x265_intra_pred_planar32_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int, int);

#define DECL_ANG(bsize, mode, cpu) \
void x265_intra_pred_ang ## bsize ## _ ## mode ## _ ## cpu(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);

DECL_ANG(4, 7, sse2);
DECL_ANG(4, 8, sse2);
DECL_ANG(4, 9, sse2);
+DECL_ANG(4, 10, sse2);
+DECL_ANG(4, 11, sse2);
+DECL_ANG(4, 12, sse2);
+DECL_ANG(4, 13, sse2);
+DECL_ANG(4, 14, sse2);
+DECL_ANG(4, 15, sse2);
+DECL_ANG(4, 16, sse2);
+DECL_ANG(4, 17, sse2);
+DECL_ANG(4, 18, sse2);
+DECL_ANG(4, 26, sse2);

DECL_ANG(4, 2, ssse3);
DECL_ANG(4, 3, sse4);

DECL_ANG(32, 33, sse4);

#undef DECL_ANG
+void x265_intra_pred_ang4_3_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang4_4_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang4_5_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang4_6_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang4_7_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang4_8_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang4_9_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang4_11_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang4_12_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang4_13_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang4_14_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang4_15_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang4_16_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang4_17_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang4_19_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang4_20_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang4_21_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang4_22_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang4_23_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang4_24_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang4_25_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang4_27_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang4_28_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang4_29_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang4_30_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang4_31_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang4_32_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang4_33_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
void x265_intra_pred_ang8_3_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
void x265_intra_pred_ang8_33_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
void x265_intra_pred_ang8_4_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);

void x265_intra_pred_ang8_12_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
void x265_intra_pred_ang8_24_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
void x265_intra_pred_ang8_11_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang8_13_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang8_14_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang8_15_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang8_16_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang8_20_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang8_21_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang8_22_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang8_23_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang16_3_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang16_4_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang16_5_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang16_6_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang16_7_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
+void x265_intra_pred_ang16_8_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
88
+void x265_intra_pred_ang16_9_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
89
+void x265_intra_pred_ang16_12_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
90
+void x265_intra_pred_ang16_11_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
91
+void x265_intra_pred_ang16_13_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
92
void x265_intra_pred_ang16_25_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
93
void x265_intra_pred_ang16_28_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
94
void x265_intra_pred_ang16_27_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
95
96
void x265_intra_pred_ang32_30_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
97
void x265_intra_pred_ang32_31_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
98
void x265_intra_pred_ang32_32_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
99
+void x265_intra_pred_ang32_33_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
100
+void x265_intra_pred_ang32_25_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
101
+void x265_intra_pred_ang32_24_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
102
+void x265_intra_pred_ang32_23_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
103
+void x265_intra_pred_ang32_22_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
104
+void x265_intra_pred_ang32_21_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
105
+void x265_intra_pred_ang32_18_avx2(pixel* dst, intptr_t dstStride, const pixel* srcPix, int dirMode, int bFilter);
106
+void x265_all_angs_pred_4x4_sse2(pixel *dest, pixel *refPix, pixel *filtPix, int bLuma);
107
void x265_all_angs_pred_4x4_sse4(pixel *dest, pixel *refPix, pixel *filtPix, int bLuma);
108
void x265_all_angs_pred_8x8_sse4(pixel *dest, pixel *refPix, pixel *filtPix, int bLuma);
109
void x265_all_angs_pred_16x16_sse4(pixel *dest, pixel *refPix, pixel *filtPix, int bLuma);
110
void x265_all_angs_pred_32x32_sse4(pixel *dest, pixel *refPix, pixel *filtPix, int bLuma);
111
+void x265_all_angs_pred_4x4_avx2(pixel *dest, pixel *refPix, pixel *filtPix, int bLuma);
112
#endif // ifndef X265_INTRAPRED_H
113
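The prototypes above declare AVX2 kernels for individual angular modes. The sketch below is a scalar reference for what each 4x4 kernel computes, restricted to modes with a non-negative angle. It is not x265's own C primitive, and several details are assumptions:

- The names `ref_intra_pred_ang4` and `kIntraPredAngle` are hypothetical.
- `pixel` is taken to be 16-bit.
- The neighbour layout (corner, then 8 above samples, then 8 left samples) is inferred from the byte offsets used by the SSE2 `intra_pred_ang4_2` and `intra_pred_ang4_3` routines in intrapred16.asm.
- Edge filtering (`bFilter`) is not modelled.

```c
#include <stdint.h>

typedef uint16_t pixel; /* assumption: high-bit-depth build (16-bit samples) */

/* HEVC intraPredAngle for modes 2..34; index = dirMode - 2. */
static const int8_t kIntraPredAngle[33] = {
    32, 26, 21, 17, 13, 9, 5, 2, 0, -2, -5, -9, -13, -17, -21, -26,
    -32, -26, -21, -17, -13, -9, -5, -2, 0, 2, 5, 9, 13, 17, 21, 26, 32
};

/* Hypothetical reference sketch of 4x4 angular prediction for the
 * non-negative-angle modes (2..10 and 26..34).
 * Assumed neighbour layout: srcPix[0] = top-left corner,
 * srcPix[1..8] = above row, srcPix[9..16] = left column.
 * Negative-angle modes (11..25) need the projected reference extension
 * and are rejected. Edge filtering is not modelled.
 * Returns 0 on success, -1 for an unsupported mode. */
static int ref_intra_pred_ang4(pixel *dst, intptr_t dstStride,
                               const pixel *srcPix, int dirMode)
{
    if (dirMode < 2 || dirMode > 34)
        return -1;
    int angle = kIntraPredAngle[dirMode - 2];
    if (angle < 0)
        return -1;

    /* Horizontal modes (< 18) predict from the left column, then transpose. */
    int horMode = dirMode < 18;
    pixel ref[9];                      /* ref[0] = corner, ref[1..8] = main reference */
    ref[0] = srcPix[0];
    for (int k = 1; k <= 8; k++)
        ref[k] = horMode ? srcPix[8 + k] : srcPix[k];

    pixel pred[4][4];
    for (int y = 0; y < 4; y++) {
        int pos = (y + 1) * angle;     /* displacement in 1/32-sample units */
        int idx = pos >> 5;            /* integer part */
        int f = pos & 31;              /* fractional part = interpolation weight */
        for (int x = 0; x < 4; x++) {
            int a = ref[x + idx + 1];
            /* Two-tap filter ((32 - f) * a + f * b + 16) >> 5; b is not read when f == 0. */
            pred[y][x] = f ? (pixel)(((32 - f) * a + f * ref[x + idx + 2] + 16) >> 5)
                           : (pixel)a;
        }
    }

    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y * dstStride + x] = horMode ? pred[x][y] : pred[y][x];
    return 0;
}
```

For mode 2 the result depends only on `x + y` (`dst[y][x] = ref[x + y + 2]`), so transposing it changes nothing. This agrees with `intra_pred_ang4_2`, which stores four successively shifted rows without a transpose step.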
x265_1.6.tar.gz/source/common/x86/intrapred16.asm -> x265_1.7.tar.gz/source/common/x86/intrapred16.asm
1
2
%endrep
3
RET
4
5
+;-----------------------------------------------------------------------------------------
6
+; void intraPredAng4(pixel* dst, intptr_t dstStride, pixel* src, int dirMode, int bFilter)
7
+;-----------------------------------------------------------------------------------------
8
+INIT_XMM sse2
9
+cglobal intra_pred_ang4_2, 3,5,4
10
+ lea r4, [r2 + 4]
11
+ add r2, 20
12
+ cmp r3m, byte 34
13
+ cmove r2, r4
14
+
15
+ add r1, r1
16
+ movu m0, [r2]
17
+ movh [r0], m0
18
+ psrldq m0, 2
19
+ movh [r0 + r1], m0
20
+ psrldq m0, 2
21
+ movh [r0 + r1 * 2], m0
22
+ lea r1, [r1 * 3]
23
+ psrldq m0, 2
24
+ movh [r0 + r1], m0
25
+ RET
26
+
27
+cglobal intra_pred_ang4_3, 3,5,8
28
+ mov r4d, 2
29
+ cmp r3m, byte 33
30
+ mov r3d, 18
31
+ cmove r3d, r4d
32
+
33
+ movu m0, [r2 + r3] ; [8 7 6 5 4 3 2 1]
34
+
35
+ mova m2, m0
36
+ psrldq m0, 2
37
+ punpcklwd m2, m0 ; [5 4 4 3 3 2 2 1]
38
+ mova m3, m0
39
+ psrldq m0, 2
40
+ punpcklwd m3, m0 ; [6 5 5 4 4 3 3 2]
41
+ mova m4, m0
42
+ psrldq m0, 2
43
+ punpcklwd m4, m0 ; [7 6 6 5 5 4 4 3]
44
+ mova m5, m0
45
+ psrldq m0, 2
46
+ punpcklwd m5, m0 ; [8 7 7 6 6 5 5 4]
47
+
48
+
49
+ lea r3, [ang_table + 20 * 16]
50
+ mova m0, [r3 + 6 * 16] ; [26]
51
+ mova m1, [r3] ; [20]
52
+ mova m6, [r3 - 6 * 16] ; [14]
53
+ mova m7, [r3 - 12 * 16] ; [ 8]
54
+ jmp .do_filter4x4
55
+
56
+
57
+ALIGN 16
58
+.do_filter4x4:
59
+ lea r4, [pd_16]
60
+ pmaddwd m2, m0
61
+ paddd m2, [r4]
62
+ psrld m2, 5
63
+
64
+ pmaddwd m3, m1
65
+ paddd m3, [r4]
66
+ psrld m3, 5
67
+ packssdw m2, m3
68
+
69
+ pmaddwd m4, m6
70
+ paddd m4, [r4]
71
+ psrld m4, 5
72
+
73
+ pmaddwd m5, m7
74
+ paddd m5, [r4]
75
+ psrld m5, 5
76
+ packssdw m4, m5
77
+
78
+ jz .store
79
+
80
+ ; transpose 4x4
81
+ punpckhwd m0, m2, m4
82
+ punpcklwd m2, m4
83
+ punpckhwd m4, m2, m0
84
+ punpcklwd m2, m0
85
+
86
+.store:
87
+ add r1, r1
88
+ movh [r0], m2
89
+ movhps [r0 + r1], m2
90
+ movh [r0 + r1 * 2], m4
91
+ lea r1, [r1 * 3]
92
+ movhps [r0 + r1], m4
93
+ RET
94
+
95
+cglobal intra_pred_ang4_4, 3,5,8
96
+ mov r4d, 2
97
+ cmp r3m, byte 32
98
+ mov r3d, 18
99
+ cmove r3d, r4d
100
+
101
+ movu m0, [r2 + r3] ; [8 7 6 5 4 3 2 1]
102
+ mova m2, m0
103
+ psrldq m0, 2
104
+ punpcklwd m2, m0 ; [5 4 4 3 3 2 2 1]
105
+ mova m3, m0
106
+ psrldq m0, 2
107
+ punpcklwd m3, m0 ; [6 5 5 4 4 3 3 2]
108
+ mova m4, m3
109
+ mova m5, m0
110
+ psrldq m0, 2
111
+ punpcklwd m5, m0 ; [7 6 6 5 5 4 4 3]
112
+
113
+ lea r3, [ang_table + 18 * 16]
114
+ mova m0, [r3 + 3 * 16] ; [21]
115
+ mova m1, [r3 - 8 * 16] ; [10]
116
+ mova m6, [r3 + 13 * 16] ; [31]
117
+ mova m7, [r3 + 2 * 16] ; [20]
118
+ jmp mangle(private_prefix %+ _ %+ intra_pred_ang4_3 %+ SUFFIX %+ .do_filter4x4)
119
+
120
+cglobal intra_pred_ang4_5, 3,5,8
121
+ mov r4d, 2
122
+ cmp r3m, byte 31
123
+ mov r3d, 18
124
+ cmove r3d, r4d
125
+
126
+ movu m0, [r2 + r3] ; [8 7 6 5 4 3 2 1]
127
+ mova m2, m0
128
+ psrldq m0, 2
129
+ punpcklwd m2, m0 ; [5 4 4 3 3 2 2 1]
130
+ mova m3, m0
131
+ psrldq m0, 2
132
+ punpcklwd m3, m0 ; [6 5 5 4 4 3 3 2]
133
+ mova m4, m3
134
+ mova m5, m0
135
+ psrldq m0, 2
136
+ punpcklwd m5, m0 ; [7 6 6 5 5 4 4 3]
137
+
138
+ lea r3, [ang_table + 10 * 16]
139
+ mova m0, [r3 + 7 * 16] ; [17]
140
+ mova m1, [r3 - 8 * 16] ; [ 2]
141
+ mova m6, [r3 + 9 * 16] ; [19]
142
+ mova m7, [r3 - 6 * 16] ; [ 4]
143
+ jmp mangle(private_prefix %+ _ %+ intra_pred_ang4_3 %+ SUFFIX %+ .do_filter4x4)
144
+
145
+cglobal intra_pred_ang4_6, 3,5,8
146
+ mov r4d, 2
147
+ cmp r3m, byte 30
148
+ mov r3d, 18
149
+ cmove r3d, r4d
150
+
151
+ movu m0, [r2 + r3] ; [8 7 6 5 4 3 2 1]
152
+ mova m2, m0
153
+ psrldq m0, 2
154
+ punpcklwd m2, m0 ; [5 4 4 3 3 2 2 1]
155
+ mova m3, m2
156
+ mova m4, m0
157
+ psrldq m0, 2
158
+ punpcklwd m4, m0 ; [6 5 5 4 4 3 3 2]
159
+ mova m5, m4
160
+
161
+ lea r3, [ang_table + 19 * 16]
162
+ mova m0, [r3 - 6 * 16] ; [13]
163
+ mova m1, [r3 + 7 * 16] ; [26]
164
+ mova m6, [r3 - 12 * 16] ; [ 7]
165
+ mova m7, [r3 + 1 * 16] ; [20]
166
+ jmp mangle(private_prefix %+ _ %+ intra_pred_ang4_3 %+ SUFFIX %+ .do_filter4x4)
167
+
168
+cglobal intra_pred_ang4_7, 3,5,8
169
+ mov r4d, 2
170
+ cmp r3m, byte 29
171
+ mov r3d, 18
172
+ cmove r3d, r4d
173
+
174
+ movu m0, [r2 + r3] ; [8 7 6 5 4 3 2 1]
175
+ mova m2, m0
176
+ psrldq m0, 2
177
+ punpcklwd m2, m0 ; [5 4 4 3 3 2 2 1]
178
+ mova m3, m2
179
+ mova m4, m2
180
+ mova m5, m0
181
+ psrldq m0, 2
182
+ punpcklwd m5, m0 ; [6 5 5 4 4 3 3 2]
183
+
184
+ lea r3, [ang_table + 20 * 16]
185
+ mova m0, [r3 - 11 * 16] ; [ 9]
186
+ mova m1, [r3 - 2 * 16] ; [18]
187
+ mova m6, [r3 + 7 * 16] ; [27]
188
+ mova m7, [r3 - 16 * 16] ; [ 4]
189
+ jmp mangle(private_prefix %+ _ %+ intra_pred_ang4_3 %+ SUFFIX %+ .do_filter4x4)
190
+
191
+cglobal intra_pred_ang4_8, 3,5,8
192
+ mov r4d, 2
193
+ cmp r3m, byte 28
194
+ mov r3d, 18
195
+ cmove r3d, r4d
196
+
197
+ movu m0, [r2 + r3] ; [8 7 6 5 4 3 2 1]
198
+ mova m2, m0
199
+ psrldq m0, 2
200
+ punpcklwd m2, m0 ; [5 4 4 3 3 2 2 1]
201
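The SSE2 `intra_pred_ang4_3` routine above serves two modes: mode 3 and its vertical mirror, mode 33, which share the angle 26. The sketch below is a scalar model of its data flow, with each step mapped to the corresponding instructions. It is illustrative only, not x265 code, and rests on these assumptions:

- The function name is hypothetical.
- Each `ang_table` entry is assumed to hold `(32 - f, f)` word pairs, as the `[26]`, `[20]` comments suggest.
- The signed saturation of `packssdw` is ignored. It cannot trigger for in-range 10- or 12-bit samples, because the two weights always sum to 32.

```c
#include <stdint.h>

/* Hypothetical scalar model of intra_pred_ang4_3 (intrapred16.asm).
 * Assumes 16-bit samples with srcPix[1..8] = above row (byte offset 2)
 * and srcPix[9..16] = left column (byte offset 18). */
static void model_intra_pred_ang4_3(uint16_t *dst, intptr_t dstStride,
                                    const uint16_t *srcPix, int dirMode)
{
    /* cmp r3m, 33 sets ZF; cmove then selects the above row for mode 33. */
    int zf = (dirMode == 33);
    const uint16_t *r = srcPix + (zf ? 1 : 9);   /* movu m0, [r2 + r3] */

    /* Weights in m0, m1, m6, m7, labelled [26] [20] [14] [8] in the source. */
    static const int fract[4] = { 26, 20, 14, 8 };

    uint16_t row[4][4];
    for (int y = 0; y < 4; y++) {
        /* m2..m5 hold interleaved pairs (r[y + x], r[y + x + 1]) built by punpcklwd. */
        for (int x = 0; x < 4; x++) {
            /* pmaddwd with (32 - f, f), then paddd [pd_16] and psrld 5 */
            int32_t acc = (int32_t)r[y + x] * (32 - fract[y])
                        + (int32_t)r[y + x + 1] * fract[y];
            row[y][x] = (uint16_t)((acc + 16) >> 5);
        }
    }

    /* jz .store skips the 4x4 transpose for mode 33; mode 3 is transposed. */
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y * dstStride + x] = zf ? row[y][x] : row[x][y];
}
```

Routines `intra_pred_ang4_4` through `intra_pred_ang4_7` follow the same pattern and differ in only two respects:

- which shifted reference pairs they place in `m2`..`m5` (some reuse a register, e.g. `mova m4, m3`);
- which weights they load into `m0`, `m1`, `m6`, `m7`.

Each then jumps to the shared `.do_filter4x4` tail.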
x265_1.6.tar.gz/source/common/x86/intrapred8.asm -> x265_1.7.tar.gz/source/common/x86/intrapred8.asm
1
2
SECTION_RODATA 32
3
4
intra_pred_shuff_0_8: times 2 db 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8
5
+intra_pred_shuff_15_0: times 2 db 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
6
7
pb_0_8 times 8 db 0, 8
8
pb_unpackbw1 times 2 db 1, 8, 2, 8, 3, 8, 4, 8
9
10
c_mode16_18: db 0, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1
11
12
ALIGN 32
13
-trans8_shuf: dd 0, 4, 1, 5, 2, 6, 3, 7
14
c_ang8_src1_9_2_10: db 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9
15
c_ang8_26_20: db 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20
16
c_ang8_src3_11_4_12: db 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11
17
18
db 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4
19
db 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0
20
21
+ALIGN 32
22
+c_ang16_mode_11: db 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14
23
+ db 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12
24
+ db 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10
25
+ db 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8
26
+ db 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6
27
+ db 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4
28
+ db 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2
29
+ db 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0
30
+
31
+
32
+ALIGN 32
33
+c_ang16_mode_12: db 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19
34
+ db 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14
35
+ db 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9
36
+ db 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4
37
+ db 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31
38
+ db 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26
39
+ db 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21
40
+ db 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16
41
+
42
+
43
+ALIGN 32
44
+c_ang16_mode_13: db 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15
45
+ db 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6
46
+ db 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29
47
+ db 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20
48
+ db 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11
49
+ db 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2
50
+ db 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25
51
+ db 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16
52
53
ALIGN 32
54
c_ang16_mode_28: db 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10
55
56
db 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6
57
db 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16
58
59
+ALIGN 32
60
+c_ang16_mode_9: db 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18
61
+ db 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20
62
+ db 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22
63
+ db 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24
64
+ db 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26
65
+ db 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28
66
+ db 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30
67
+ db 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0
68
69
ALIGN 32
70
c_ang16_mode_27: db 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4
71
72
ALIGN 32
73
intra_pred_shuff_0_15: db 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15, 15
74
75
+ALIGN 32
76
+c_ang16_mode_8: db 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13
77
+ db 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18
78
+ db 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23
79
+ db 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28
80
+ db 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1
81
+ db 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6
82
+ db 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11
83
+ db 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16
84
85
ALIGN 32
86
c_ang16_mode_29: db 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18
87
88
db 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30
89
db 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16
90
91
+ALIGN 32
92
+c_ang16_mode_7: db 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17
93
+ db 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26
94
+ db 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3
95
+ db 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12
96
+ db 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21
97
+ db 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30
98
+ db 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7
99
+ db 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16
100
101
ALIGN 32
102
c_ang16_mode_30: db 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26
103
104
db 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16
105
106
107
+
108
+ALIGN 32
109
+c_ang16_mode_6: db 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21
110
+ db 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2
111
+ db 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15
112
+ db 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28
113
+ db 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9
114
+ db 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22
115
+ db 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3
116
+ db 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16
117
+
118
ALIGN 32
119
c_ang16_mode_31: db 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17
120
db 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19
121
122
db 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31
123
db 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16
124
125
+
126
+ALIGN 32
127
+c_ang16_mode_5: db 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25
128
+ db 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10
129
+ db 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27
130
+ db 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12
131
+ db 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29
132
+ db 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14
133
+ db 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31
134
+ db 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16
135
+
136
ALIGN 32
137
c_ang16_mode_32: db 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21
138
db 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31
139
140
db 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16
141
142
ALIGN 32
143
+c_ang16_mode_4: db 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29, 3, 29
144
+ db 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18
145
+ db 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 1, 31, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7
146
+ db 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28
147
+ db 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 23, 9, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17
148
+ db 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6
149
+ db 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 13, 19, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27
150
+ db 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16
151
+
152
+ALIGN 32
153
c_ang16_mode_33: db 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26
154
db 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20
155
db 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14
156
157
db 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0
158
159
ALIGN 32
160
+c_ang16_mode_3: db 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10
161
+ db 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4
162
+ db 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30
163
+ db 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24
164
+ db 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18
165
+ db 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12
166
+ db 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6
167
+ db 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0
168
+
169
+ALIGN 32
170
c_ang16_mode_24: db 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 5, 27, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22
171
db 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 15, 17, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12
172
db 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 25, 7, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2
173
174
db 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11, 21, 11
175
db 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0
176
177
+
178
+ALIGN 32
179
+c_ang32_mode_33: db 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26
180
+ db 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20
181
+ db 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14
182
+ db 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8
183
+ db 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28
184
+ db 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22
185
+ db 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16
186
+ db 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10
187
+ db 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30
188
+ db 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24
189
+ db 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18
190
+ db 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12
191
+ db 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6
192
+ db 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 32, 0, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26, 6, 26
193
+ db 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20, 12, 20
194
+ db 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14, 18, 14
195
+ db 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8, 24, 8
196
+ db 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28
197
+ db 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22
198
+ db 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16
199
+ db 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10, 22, 10
200
+ db 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 28, 4, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30, 2, 30
201
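The new `c_ang16_mode_N` weight tables for the horizontal 16x16 modes (3 to 9 and 11 to 13) appear to be derivable from each mode's HEVC prediction angle. The layout below is inferred from the listed values, not taken from x265 documentation:

- 32-byte line `i` holds the byte pair `(32 - f, f)` for row `i`, repeated eight times.
- It then holds the corresponding pair for row `i + 8`.
- Here `f = ((row + 1) * angle) & 31`.

The generator name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical generator reproducing the layout of the c_ang16_mode_N
 * tables for horizontal modes: 8 lines of 32 bytes, line i = pair for
 * row i (x8) followed by pair for row i + 8 (x8). f == 0 gives (32, 0).
 * For negative angles, & 31 relies on two's-complement integers. */
static void build_ang16_weights(int angle, uint8_t table[8][32])
{
    for (int i = 0; i < 8; i++) {
        for (int half = 0; half < 2; half++) {
            int row = i + 8 * half;
            int f = ((row + 1) * angle) & 31;   /* fractional position in 1/32 units */
            for (int k = 0; k < 8; k++) {
                table[i][16 * half + 2 * k]     = (uint8_t)(32 - f);
                table[i][16 * half + 2 * k + 1] = (uint8_t)f;
            }
        }
    }
}
```

Two checks against the listings above:

- Mode 3 (angle 26) yields `6, 26, ...` followed by `22, 10, ...`, matching the first line of `c_ang16_mode_3`.
- Mode 12 (angle -5) yields `25, 7, ...` followed by `1, 31, ...` on line 4, matching the fifth line of `c_ang16_mode_12`.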
x265_1.6.tar.gz/source/common/x86/intrapred8_allangs.asm -> x265_1.7.tar.gz/source/common/x86/intrapred8_allangs.asm
1
2
;* Copyright (C) 2013 x265 project
3
;*
4
;* Authors: Min Chen <chenm003@163.com> <min.chen@multicorewareinc.com>
5
-;* Praveen Tiwari <praveen@multicorewareinc.com>
6
+;* Praveen Tiwari <praveen@multicorewareinc.com>
7
;*
8
;* This program is free software; you can redistribute it and/or modify
9
;* it under the terms of the GNU General Public License as published by
10
11
12
SECTION_RODATA 32
13
14
+all_ang4_shuff: db 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6, 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6
15
+ db 0, 1, 1, 2, 2, 3, 3, 4, 1, 2, 2, 3, 3, 4, 4, 5, 2, 3, 3, 4, 4, 5, 5, 6, 3, 4, 4, 5, 5, 6, 6, 7
16
+ db 0, 1, 1, 2, 2, 3, 3, 4, 1, 2, 2, 3, 3, 4, 4, 5, 1, 2, 2, 3, 3, 4, 4, 5, 2, 3, 3, 4, 4, 5, 5, 6
17
+ db 0, 1, 1, 2, 2, 3, 3, 4, 0, 1, 1, 2, 2, 3, 3, 4, 1, 2, 2, 3, 3, 4, 4, 5, 1, 2, 2, 3, 3, 4, 4, 5
18
+ db 0, 1, 1, 2, 2, 3, 3, 4, 0, 1, 1, 2, 2, 3, 3, 4, 0, 1, 1, 2, 2, 3, 3, 4, 1, 2, 2, 3, 3, 4, 4, 5
19
+ db 0, 1, 1, 2, 2, 3, 3, 4, 0, 1, 1, 2, 2, 3, 3, 4, 0, 1, 1, 2, 2, 3, 3, 4, 0, 1, 1, 2, 2, 3, 3, 4
20
+ db 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3
21
+ db 0, 9, 9, 10, 10, 11, 11, 12, 0, 9, 9, 10, 10, 11, 11, 12, 0, 9, 9, 10, 10, 11, 11, 12, 0, 9, 9, 10, 10, 11, 11, 12
22
+ db 0, 9, 9, 10, 10, 11, 11, 12, 0, 9, 9, 10, 10, 11, 11, 12, 0, 9, 9, 10, 10, 11, 11, 12, 4, 0, 0, 9, 9, 10, 10, 11
23
+ db 0, 9, 9, 10, 10, 11, 11, 12, 0, 9, 9, 10, 10, 11, 11, 12, 2, 0, 0, 9, 9, 10, 10, 11, 2, 0, 0, 9, 9, 10, 10, 11
24
+ db 0, 9, 9, 10, 10, 11, 11, 12, 2, 0, 0, 9, 9, 10, 10, 11, 2, 0, 0, 9, 9, 10, 10, 11, 4, 2, 2, 0, 0, 9, 9, 10
25
+ db 0, 9, 9, 10, 10, 11, 11, 12, 2, 0, 0, 9, 9, 10, 10, 11, 2, 0, 0, 9, 9, 10, 10, 11, 3, 2, 2, 0, 0, 9, 9, 10
26
+ db 0, 9, 9, 10, 10, 11, 11, 12, 1, 0, 0, 9, 9, 10, 10, 11, 2, 1, 1, 0, 0, 9, 9, 10, 4, 2, 2, 1, 1, 0, 0, 9
27
+ db 0, 1, 2, 3, 9, 0, 1, 2, 10, 9, 0, 1, 11, 10, 9, 0, 0, 1, 2, 3, 9, 0, 1, 2, 10, 9, 0, 1, 11, 10, 9, 0
28
+ db 0, 1, 1, 2, 2, 3, 3, 4, 9, 0, 0, 1, 1, 2, 2, 3, 10, 9, 9, 0, 0, 1, 1, 2, 12, 10, 10, 9, 9, 0, 0, 1
29
+ db 0, 1, 1, 2, 2, 3, 3, 4, 10, 0, 0, 1, 1, 2, 2, 3, 10, 0, 0, 1, 1, 2, 2, 3, 11, 10, 10, 0, 0, 1, 1, 2
+ db 0, 1, 1, 2, 2, 3, 3, 4, 10, 0, 0, 1, 1, 2, 2, 3, 10, 0, 0, 1, 1, 2, 2, 3, 12, 10, 10, 0, 0, 1, 1, 2
+ db 0, 1, 1, 2, 2, 3, 3, 4, 0, 1, 1, 2, 2, 3, 3, 4, 10, 0, 0, 1, 1, 2, 2, 3, 10, 0, 0, 1, 1, 2, 2, 3
+ db 0, 1, 1, 2, 2, 3, 3, 4, 0, 1, 1, 2, 2, 3, 3, 4, 0, 1, 1, 2, 2, 3, 3, 4, 12, 0, 0, 1, 1, 2, 2, 3
+ db 0, 1, 1, 2, 2, 3, 3, 4, 0, 1, 1, 2, 2, 3, 3, 4, 0, 1, 1, 2, 2, 3, 3, 4, 0, 1, 1, 2, 2, 3, 3, 4
+ db 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4
+ db 1, 2, 2, 3, 3, 4, 4, 5, 1, 2, 2, 3, 3, 4, 4, 5, 1, 2, 2, 3, 3, 4, 4, 5, 1, 2, 2, 3, 3, 4, 4, 5
+ db 1, 2, 2, 3, 3, 4, 4, 5, 1, 2, 2, 3, 3, 4, 4, 5, 1, 2, 2, 3, 3, 4, 4, 5, 2, 3, 3, 4, 4, 5, 5, 6
+ db 1, 2, 2, 3, 3, 4, 4, 5, 1, 2, 2, 3, 3, 4, 4, 5, 2, 3, 3, 4, 4, 5, 5, 6, 2, 3, 3, 4, 4, 5, 5, 6
+ db 1, 2, 2, 3, 3, 4, 4, 5, 2, 3, 3, 4, 4, 5, 5, 6, 2, 3, 3, 4, 4, 5, 5, 6, 3, 4, 4, 5, 5, 6, 6, 7
+ db 1, 2, 2, 3, 3, 4, 4, 5, 2, 3, 3, 4, 4, 5, 5, 6, 3, 4, 4, 5, 5, 6, 6, 7, 4, 5, 5, 6, 6, 7, 7, 8
+ db 2, 3, 4, 5, 3, 4, 5, 6, 4, 5, 6, 7, 5, 6, 7, 8, 2, 3, 4, 5, 3, 4, 5, 6, 4, 5, 6, 7, 5, 6, 7, 8
+
+all_ang4: db 6, 26, 6, 26, 6, 26, 6, 26, 12, 20, 12, 20, 12, 20, 12, 20, 18, 14, 18, 14, 18, 14, 18, 14, 24, 8, 24, 8, 24, 8, 24, 8
+ db 11, 21, 11, 21, 11, 21, 11, 21, 22, 10, 22, 10, 22, 10, 22, 10, 1, 31, 1, 31, 1, 31, 1, 31, 12, 20, 12, 20, 12, 20, 12, 20
+ db 15, 17, 15, 17, 15, 17, 15, 17, 30, 2, 30, 2, 30, 2, 30, 2, 13, 19, 13, 19, 13, 19, 13, 19, 28, 4, 28, 4, 28, 4, 28, 4
+ db 19, 13, 19, 13, 19, 13, 19, 13, 6, 26, 6, 26, 6, 26, 6, 26, 25, 7, 25, 7, 25, 7, 25, 7, 12, 20, 12, 20, 12, 20, 12, 20
+ db 23, 9, 23, 9, 23, 9, 23, 9, 14, 18, 14, 18, 14, 18, 14, 18, 5, 27, 5, 27, 5, 27, 5, 27, 28, 4, 28, 4, 28, 4, 28, 4
+ db 27, 5, 27, 5, 27, 5, 27, 5, 22, 10, 22, 10, 22, 10, 22, 10, 17, 15, 17, 15, 17, 15, 17, 15, 12, 20, 12, 20, 12, 20, 12, 20
+ db 30, 2, 30, 2, 30, 2, 30, 2, 28, 4, 28, 4, 28, 4, 28, 4, 26, 6, 26, 6, 26, 6, 26, 6, 24, 8, 24, 8, 24, 8, 24, 8
+ db 2, 30, 2, 30, 2, 30, 2, 30, 4, 28, 4, 28, 4, 28, 4, 28, 6, 26, 6, 26, 6, 26, 6, 26, 8, 24, 8, 24, 8, 24, 8, 24
+ db 5, 27, 5, 27, 5, 27, 5, 27, 10, 22, 10, 22, 10, 22, 10, 22, 15, 17, 15, 17, 15, 17, 15, 17, 20, 12, 20, 12, 20, 12, 20, 12
+ db 9, 23, 9, 23, 9, 23, 9, 23, 18, 14, 18, 14, 18, 14, 18, 14, 27, 5, 27, 5, 27, 5, 27, 5, 4, 28, 4, 28, 4, 28, 4, 28
+ db 13, 19, 13, 19, 13, 19, 13, 19, 26, 6, 26, 6, 26, 6, 26, 6, 7, 25, 7, 25, 7, 25, 7, 25, 20, 12, 20, 12, 20, 12, 20, 12
+ db 17, 15, 17, 15, 17, 15, 17, 15, 2, 30, 2, 30, 2, 30, 2, 30, 19, 13, 19, 13, 19, 13, 19, 13, 4, 28, 4, 28, 4, 28, 4, 28
+ db 21, 11, 21, 11, 21, 11, 21, 11, 10, 22, 10, 22, 10, 22, 10, 22, 31, 1, 31, 1, 31, 1, 31, 1, 20, 12, 20, 12, 20, 12, 20, 12
+ db 26, 6, 26, 6, 26, 6, 26, 6, 20, 12, 20, 12, 20, 12, 20, 12, 14, 18, 14, 18, 14, 18, 14, 18, 8, 24, 8, 24, 8, 24, 8, 24
+ db 26, 6, 26, 6, 26, 6, 26, 6, 20, 12, 20, 12, 20, 12, 20, 12, 14, 18, 14, 18, 14, 18, 14, 18, 8, 24, 8, 24, 8, 24, 8, 24
+ db 21, 11, 21, 11, 21, 11, 21, 11, 10, 22, 10, 22, 10, 22, 10, 22, 31, 1, 31, 1, 31, 1, 31, 1, 20, 12, 20, 12, 20, 12, 20, 12
+ db 17, 15, 17, 15, 17, 15, 17, 15, 2, 30, 2, 30, 2, 30, 2, 30, 19, 13, 19, 13, 19, 13, 19, 13, 4, 28, 4, 28, 4, 28, 4, 28
+ db 13, 19, 13, 19, 13, 19, 13, 19, 26, 6, 26, 6, 26, 6, 26, 6, 7, 25, 7, 25, 7, 25, 7, 25, 20, 12, 20, 12, 20, 12, 20, 12
+ db 9, 23, 9, 23, 9, 23, 9, 23, 18, 14, 18, 14, 18, 14, 18, 14, 27, 5, 27, 5, 27, 5, 27, 5, 4, 28, 4, 28, 4, 28, 4, 28
+ db 5, 27, 5, 27, 5, 27, 5, 27, 10, 22, 10, 22, 10, 22, 10, 22, 15, 17, 15, 17, 15, 17, 15, 17, 20, 12, 20, 12, 20, 12, 20, 12
+ db 2, 30, 2, 30, 2, 30, 2, 30, 4, 28, 4, 28, 4, 28, 4, 28, 6, 26, 6, 26, 6, 26, 6, 26, 8, 24, 8, 24, 8, 24, 8, 24
+ db 30, 2, 30, 2, 30, 2, 30, 2, 28, 4, 28, 4, 28, 4, 28, 4, 26, 6, 26, 6, 26, 6, 26, 6, 24, 8, 24, 8, 24, 8, 24, 8
+ db 27, 5, 27, 5, 27, 5, 27, 5, 22, 10, 22, 10, 22, 10, 22, 10, 17, 15, 17, 15, 17, 15, 17, 15, 12, 20, 12, 20, 12, 20, 12, 20
+ db 23, 9, 23, 9, 23, 9, 23, 9, 14, 18, 14, 18, 14, 18, 14, 18, 5, 27, 5, 27, 5, 27, 5, 27, 28, 4, 28, 4, 28, 4, 28, 4
+ db 19, 13, 19, 13, 19, 13, 19, 13, 6, 26, 6, 26, 6, 26, 6, 26, 25, 7, 25, 7, 25, 7, 25, 7, 12, 20, 12, 20, 12, 20, 12, 20
+ db 15, 17, 15, 17, 15, 17, 15, 17, 30, 2, 30, 2, 30, 2, 30, 2, 13, 19, 13, 19, 13, 19, 13, 19, 28, 4, 28, 4, 28, 4, 28, 4
+ db 11, 21, 11, 21, 11, 21, 11, 21, 22, 10, 22, 10, 22, 10, 22, 10, 1, 31, 1, 31, 1, 31, 1, 31, 12, 20, 12, 20, 12, 20, 12, 20
+ db 6, 26, 6, 26, 6, 26, 6, 26, 12, 20, 12, 20, 12, 20, 12, 20, 18, 14, 18, 14, 18, 14, 18, 14, 24, 8, 24, 8, 24, 8, 24, 8
+
+
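The `all_ang4` rows above look like HEVC angular-interpolation weights: each pair sums to 32 and reads as (32 - frac, frac). For example, the first row's pairs 6, 26 / 12, 20 / 18, 14 / 24, 8 are what an intraPredAngle of 26 (mode 3) gives for rows y = 0..3, and the second row's 11, 21 / 22, 10 / 1, 31 / 12, 20 match angle 21 (mode 4). The asm applies them with `pmaddubsw` and rounds with `pmulhrsw` against `pw_1024`. The scalar sketch below models both steps under that reading, for positive angles only; `ang_frac`, `ang_sample` and `pmulhrsw_lane` are hypothetical names, not x265 functions.

```c
#include <assert.h>
#include <stdint.h>

/* Fractional position (in 1/32 sample) of row y for a positive intraPredAngle. */
static int ang_frac(int angle, int y)
{
    assert(angle >= 0 && y >= 0);
    return ((y + 1) * angle) & 31;
}

/* One predicted sample: weights (32 - frac, frac) on two neighbouring
 * references, rounded with +16 and >> 5. ref[0] is assumed to be the corner. */
static uint8_t ang_sample(const uint8_t *ref, int angle, int x, int y)
{
    int pos  = (y + 1) * angle;   /* displacement in 1/32 sample units */
    int idx  = pos >> 5;          /* integer part */
    int frac = pos & 31;          /* fractional part */
    return (uint8_t)(((32 - frac) * ref[x + idx + 1] + frac * ref[x + idx + 2] + 16) >> 5);
}

/* Scalar model of one PMULHRSW lane: ((a * b >> 14) + 1) >> 1.
 * With b = 1024 this equals (a + 16) >> 5 (tested here for a >= 0 only). */
static int16_t pmulhrsw_lane(int16_t a, int16_t b)
{
    int32_t t = ((int32_t)a * b) >> 14;
    return (int16_t)((t + 1) >> 1);
}
```

In the asm, `pshufb` gathers the (ref[i], ref[i+1]) byte pairs, `pmaddubsw` forms the weighted sums, and the multiply-high-with-rounding against 1024 replaces a separate add-16 and shift-by-5.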
SECTION .text

; global constant


; common constant with intrapred8.asm
cextern ang_table
+cextern pw_ang_table
cextern tab_S1
cextern tab_S2
cextern tab_Si
+cextern pw_16
+cextern pb_000000000000000F
+cextern pb_0000000000000F0F
+cextern pw_FFFFFFFFFFFFFFF0


;-----------------------------------------------------------------------------

palignr m4, m2, m1, 14
movu [r0 + 2111 * 16], m4
RET
+
+
+;-----------------------------------------------------------------------------
+; void all_angs_pred_4x4(pixel *dest, pixel *refPix, pixel *filtPix, int bLuma)
+;-----------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal all_angs_pred_4x4, 4, 4, 6
+
+ mova m5, [pw_1024]
+ lea r2, [all_ang4]
+ lea r3, [all_ang4_shuff]
+
+; mode 2
+
+ vbroadcasti128 m0, [r1 + 9]
+ mova xm1, xm0
+ psrldq xm1, 1
+ pshufb xm1, [r3]
+ movu [r0], xm1
+
+; mode 3
+
+ pshufb m1, m0, [r3 + 1 * mmsize]
+ pmaddubsw m1, [r2]
+ pmulhrsw m1, m5
+
+; mode 4
+
+ pshufb m2, m0, [r3 + 2 * mmsize]
+ pmaddubsw m2, [r2 + 1 * mmsize]
+ pmulhrsw m2, m5
+ packuswb m1, m2
+ vpermq m1, m1, 11011000b
+ movu [r0 + (3 - 2) * 16], m1
+
+; mode 5
+
+ pshufb m1, m0, [r3 + 2 * mmsize]
+ pmaddubsw m1, [r2 + 2 * mmsize]
+ pmulhrsw m1, m5
+
+; mode 6
+
+ pshufb m2, m0, [r3 + 3 * mmsize]
+ pmaddubsw m2, [r2 + 3 * mmsize]
+ pmulhrsw m2, m5
+ packuswb m1, m2
+ vpermq m1, m1, 11011000b
+ movu [r0 + (5 - 2) * 16], m1
+
+ add r3, 4 * mmsize
+ add r2, 4 * mmsize
+
+; mode 7
+
+ pshufb m1, m0, [r3 + 0 * mmsize]
+ pmaddubsw m1, [r2 + 0 * mmsize]
+ pmulhrsw m1, m5
+
+; mode 8
+
+ pshufb m2, m0, [r3 + 1 * mmsize]
+ pmaddubsw m2, [r2 + 1 * mmsize]
+ pmulhrsw m2, m5
+ packuswb m1, m2
+ vpermq m1, m1, 11011000b
+ movu [r0 + (7 - 2) * 16], m1
+
+; mode 9
+
+ pshufb m1, m0, [r3 + 1 * mmsize]
+ pmaddubsw m1, [r2 + 2 * mmsize]
+ pmulhrsw m1, m5
+ packuswb m1, m1
+ vpermq m1, m1, 11011000b
+ movu [r0 + (9 - 2) * 16], xm1
+
+; mode 10
+
+ pshufb xm1, xm0, [r3 + 2 * mmsize]
+ movu [r0 + (10 - 2) * 16], xm1
+
+ pxor xm1, xm1
+ movd xm2, [r1 + 1]
+ pshufd xm3, xm2, 0
+ punpcklbw xm3, xm1
+ pinsrb xm2, [r1], 0
+ pshufb xm4, xm2, xm1
+ punpcklbw xm4, xm1
+ psubw xm3, xm4
+ psraw xm3, 1
+ pshufb xm4, xm0, xm1
+ punpcklbw xm4, xm1
+ paddw xm3, xm4
+ packuswb xm3, xm1
+
+ pextrb [r0 + 128], xm3, 0
+ pextrb [r0 + 132], xm3, 1
+ pextrb [r0 + 136], xm3, 2
+ pextrb [r0 + 140], xm3, 3
+
+; mode 11
+
+ vbroadcasti128 m0, [r1]
+ pshufb m1, m0, [r3 + 3 * mmsize]
+ pmaddubsw m1, [r2 + 3 * mmsize]
+ pmulhrsw m1, m5
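The mode 10 block of `all_angs_pred_4x4` above loads the four above neighbours (`[r1 + 1]`), broadcasts the corner (`[r1]`) and the first left neighbour (`[r1 + 9]`), computes left0 + ((above[x] - corner) >> 1) with an arithmetic `psraw`, and saturates to 8 bits with `packuswb`. That matches the edge smoothing HEVC applies to the first row of the pure horizontal mode. A scalar sketch, assuming 8-bit pixels; `mode10_edge_row` and its helpers are hypothetical names. The asm writes the four results with a 4-byte stride (offsets 128, 132, 136 and 140 of the output).

```c
#include <assert.h>
#include <stdint.h>

/* floor(d / 2), i.e. what psraw by 1 gives, without relying on
 * implementation-defined >> of negative ints in C. */
static int floor_half(int d)
{
    return d >= 0 ? d >> 1 : -((1 - d) >> 1);
}

static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Edge-filtered first row of a 4x4 horizontal (mode 10) prediction.
 * above[0..3]: top neighbours, corner: top-left neighbour,
 * left0: left neighbour of row 0 (the value every sample of row 0 copies). */
static void mode10_edge_row(const uint8_t above[4], uint8_t corner, uint8_t left0, uint8_t out[4])
{
    for (int x = 0; x < 4; x++)
        out[x] = clip_u8(left0 + floor_half(above[x] - corner));
}
```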
x265_1.6.tar.gz/source/common/x86/ipfilter16.asm -> x265_1.7.tar.gz/source/common/x86/ipfilter16.asm
Changed
times 8 dw 58, -10
times 8 dw 4, -1

+const interp8_hps_shuf, dd 0, 4, 1, 5, 2, 6, 3, 7
+
SECTION .text
cextern pd_32
cextern pw_pixel_max
cextern pd_n32768
+cextern pw_2000

;------------------------------------------------------------------------------------------------------------
; void interp_8tap_horiz_pp_4x4(pixel *src, intptr_t srcStride, pixel *dst, intptr_t dstStride, int coeffIdx)

FILTER_VER_LUMA_SS 64, 16
FILTER_VER_LUMA_SS 16, 64

-;--------------------------------------------------------------------------------------------------
-; void filterConvertPelToShort(pixel *src, intptr_t srcStride, int16_t *dst, int width, int height)
-;--------------------------------------------------------------------------------------------------
-INIT_XMM sse2
-cglobal luma_p2s, 3, 7, 5
+;-----------------------------------------------------------------------------
+; void filterPixelToShort(pixel *src, intptr_t srcStride, int16_t *dst, intptr_t dstStride)
+;-----------------------------------------------------------------------------
+%macro P2S_H_2xN 1
+INIT_XMM sse4
+cglobal filterPixelToShort_2x%1, 3, 6, 2
+ add r1d, r1d
+ mov r3d, r3m
+ add r3d, r3d
+ lea r4, [r1 * 3]
+ lea r5, [r3 * 3]

- add r1, r1
+ ; load constant
+ mova m1, [pw_2000]

- ; load width and height
- mov r3d, r3m
- mov r4d, r4m
+%rep %1/4
+ movd m0, [r0]
+ movhps m0, [r0 + r1]
+ psllw m0, 4
+ psubw m0, m1
+
+ movd [r2 + r3 * 0], m0
+ pextrd [r2 + r3 * 1], m0, 2
+
+ movd m0, [r0 + r1 * 2]
+ movhps m0, [r0 + r4]
+ psllw m0, 4
+ psubw m0, m1
+
+ movd [r2 + r3 * 2], m0
+ pextrd [r2 + r5], m0, 2
+
+ lea r0, [r0 + r1 * 4]
+ lea r2, [r2 + r3 * 4]
+%endrep
+ RET
+%endmacro
+P2S_H_2xN 4
+P2S_H_2xN 8
+P2S_H_2xN 16
+
+;-----------------------------------------------------------------------------
+; void filterPixelToShort(pixel *src, intptr_t srcStride, int16_t *dst, intptr_t dstStride)
+;-----------------------------------------------------------------------------
+%macro P2S_H_4xN 1
+INIT_XMM ssse3
+cglobal filterPixelToShort_4x%1, 3, 6, 2
+ add r1d, r1d
+ mov r3d, r3m
+ add r3d, r3d
+ lea r4, [r3 * 3]
+ lea r5, [r1 * 3]

; load constant
- mova m4, [tab_c_n8192]
+ mova m1, [pw_2000]

-.loopH:
+%rep %1/4
+ movh m0, [r0]
+ movhps m0, [r0 + r1]
+ psllw m0, 4
+ psubw m0, m1
+ movh [r2 + r3 * 0], m0
+ movhps [r2 + r3 * 1], m0
+
+ movh m0, [r0 + r1 * 2]
+ movhps m0, [r0 + r5]
+ psllw m0, 4
+ psubw m0, m1
+ movh [r2 + r3 * 2], m0
+ movhps [r2 + r4], m0

- xor r5d, r5d
-.loopW:
- lea r6, [r0 + r5 * 2]
+ lea r0, [r0 + r1 * 4]
+ lea r2, [r2 + r3 * 4]
+%endrep
+ RET
+%endmacro
+P2S_H_4xN 4
+P2S_H_4xN 8
+P2S_H_4xN 16
+P2S_H_4xN 32

- movu m0, [r6]
- psllw m0, 4
- paddw m0, m4
+;-----------------------------------------------------------------------------
+; void filterPixelToShort(pixel *src, intptr_t srcStride, int16_t *dst, intptr_t dstStride)
+;-----------------------------------------------------------------------------
+INIT_XMM ssse3
+cglobal filterPixelToShort_4x2, 3, 4, 1
+ add r1d, r1d
+ mov r3d, r3m
+ add r3d, r3d

- movu m1, [r6 + r1]
- psllw m1, 4
- paddw m1, m4
+ movh m0, [r0]
+ movhps m0, [r0 + r1]
+ psllw m0, 4
+ psubw m0, [pw_2000]
+ movh [r2 + r3 * 0], m0
+ movhps [r2 + r3 * 1], m0

- movu m2, [r6 + r1 * 2]
- psllw m2, 4
- paddw m2, m4
-
- lea r6, [r6 + r1 * 2]
- movu m3, [r6 + r1]
- psllw m3, 4
- paddw m3, m4
+ RET

- add r5, 8
- cmp r5, r3
- jg .width4
- movu [r2 + r5 * 2 + FENC_STRIDE * 0 - 16], m0
- movu [r2 + r5 * 2 + FENC_STRIDE * 2 - 16], m1
- movu [r2 + r5 * 2 + FENC_STRIDE * 4 - 16], m2
- movu [r2 + r5 * 2 + FENC_STRIDE * 6 - 16], m3
- je .nextH
- jmp .loopW
+;-----------------------------------------------------------------------------
+; void filterPixelToShort(pixel *src, intptr_t srcStride, int16_t *dst, intptr_t dstStride)
+;-----------------------------------------------------------------------------
+%macro P2S_H_6xN 1
+INIT_XMM sse4
+cglobal filterPixelToShort_6x%1, 3, 7, 3
+ add r1d, r1d
+ mov r3d, r3m
+ add r3d, r3d
+ lea r4, [r3 * 3]
+ lea r5, [r1 * 3]

-.width4:
- movh [r2 + r5 * 2 + FENC_STRIDE * 0 - 16], m0
- movh [r2 + r5 * 2 + FENC_STRIDE * 2 - 16], m1
- movh [r2 + r5 * 2 + FENC_STRIDE * 4 - 16], m2
- movh [r2 + r5 * 2 + FENC_STRIDE * 6 - 16], m3
+ ; load height
+ mov r6d, %1/4

-.nextH:
- lea r0, [r0 + r1 * 4]
- add r2, FENC_STRIDE * 8
+ ; load constant
+ mova m2, [pw_2000]

- sub r4d, 4
- jnz .loopH
+.loop
+ movu m0, [r0]
+ movu m1, [r0 + r1]
+ psllw m0, 4
+ psubw m0, m2
+ psllw m1, 4
+ psubw m1, m2
+
+ movh [r2 + r3 * 0], m0
+ pextrd [r2 + r3 * 0 + 8], m0, 2
+ movh [r2 + r3 * 1], m1
+ pextrd [r2 + r3 * 1 + 8], m1, 2
+
+ movu m0, [r0 + r1 * 2]
+ movu m1, [r0 + r5]
+ psllw m0, 4
+ psubw m0, m2
+ psllw m1, 4
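Every `filterPixelToShort` kernel in the high-bit-depth file above does the same per-sample arithmetic: the strides are doubled to turn 16-bit element counts into byte offsets (`add r1d, r1d`, `add r3d, r3d`), then each sample is shifted left by 4 (`psllw m0, 4`) and biased by `pw_2000`, i.e. 0x2000 = 8192 (`psubw`). A scalar sketch of that conversion for an arbitrary width and height; `pixel_to_short_c` is a hypothetical name, and reading the shift of 4 as "14-bit intermediates from 10-bit input" is an assumption, not something the diff states.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the HBD filterPixelToShort kernels: dst = (src << 4) - 0x2000.
 * Strides are in elements; the asm doubles them to get byte offsets. */
static void pixel_to_short_c(const uint16_t *src, intptr_t srcStride,
                             int16_t *dst, intptr_t dstStride,
                             int width, int height)
{
    const int shift = 4;        /* psllw m0, 4 */
    const int offset = 0x2000;  /* pw_2000 = 8192 */
    assert(width > 0 && height > 0);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            dst[x] = (int16_t)((src[x] << shift) - offset);
        src += srcStride;
        dst += dstStride;
    }
}
```

`psubw` wraps instead of saturating, but for 10-bit input the result stays within [-8192, 8176], so no wrap occurs.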
x265_1.6.tar.gz/source/common/x86/ipfilter8.asm -> x265_1.7.tar.gz/source/common/x86/ipfilter8.asm
Changed
%include "x86util.asm"

SECTION_RODATA 32
-tab_Tm: db 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6
- db 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10
- db 8, 9,10,11, 9,10,11,12,10,11,12,13,11,12,13, 14
+const tab_Tm, db 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6
+ db 4, 5, 6, 7, 5, 6, 7, 8, 6, 7, 8, 9, 7, 8, 9, 10
+ db 8, 9,10,11, 9,10,11,12,10,11,12,13,11,12,13, 14

-ALIGN 32
const interp4_vpp_shuf, times 2 db 0, 4, 1, 5, 2, 6, 3, 7, 8, 12, 9, 13, 10, 14, 11, 15

-ALIGN 32
const interp_vert_shuf, times 2 db 0, 2, 1, 3, 2, 4, 3, 5, 4, 6, 5, 7, 6, 8, 7, 9
times 2 db 4, 6, 5, 7, 6, 8, 7, 9, 8, 10, 9, 11, 10, 12, 11, 13

-ALIGN 32
const interp4_vpp_shuf1, dd 0, 1, 1, 2, 2, 3, 3, 4
dd 2, 3, 3, 4, 4, 5, 5, 6

-ALIGN 32
const pb_8tap_hps_0, times 2 db 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8
times 2 db 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9,10
times 2 db 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9,10,10,11,11,12
times 2 db 6, 7, 7, 8, 8, 9, 9,10,10,11,11,12,12,13,13,14

-ALIGN 32
-tab_Lm: db 0, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 8
- db 2, 3, 4, 5, 6, 7, 8, 9, 3, 4, 5, 6, 7, 8, 9, 10
- db 4, 5, 6, 7, 8, 9, 10, 11, 5, 6, 7, 8, 9, 10, 11, 12
- db 6, 7, 8, 9, 10, 11, 12, 13, 7, 8, 9, 10, 11, 12, 13, 14
-
-tab_Vm: db 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1
- db 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3
-
-tab_Cm: db 0, 2, 1, 3, 0, 2, 1, 3, 0, 2, 1, 3, 0, 2, 1, 3
-
-tab_c_526336: times 4 dd 8192*64+2048
-
-pd_526336: times 8 dd 8192*64+2048
-
-tab_ChromaCoeff: db 0, 64, 0, 0
- db -2, 58, 10, -2
- db -4, 54, 16, -2
- db -6, 46, 28, -4
- db -4, 36, 36, -4
- db -4, 28, 46, -6
- db -2, 16, 54, -4
- db -2, 10, 58, -2
-ALIGN 32
-tab_ChromaCoeff_V: times 8 db 0, 64
- times 8 db 0, 0
+const tab_Lm, db 0, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 8
+ db 2, 3, 4, 5, 6, 7, 8, 9, 3, 4, 5, 6, 7, 8, 9, 10
+ db 4, 5, 6, 7, 8, 9, 10, 11, 5, 6, 7, 8, 9, 10, 11, 12
+ db 6, 7, 8, 9, 10, 11, 12, 13, 7, 8, 9, 10, 11, 12, 13, 14

- times 8 db -2, 58
- times 8 db 10, -2
+const tab_Vm, db 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1
+ db 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3

- times 8 db -4, 54
- times 8 db 16, -2
+const tab_Cm, db 0, 2, 1, 3, 0, 2, 1, 3, 0, 2, 1, 3, 0, 2, 1, 3

- times 8 db -6, 46
- times 8 db 28, -4
+const pd_526336, times 8 dd 8192*64+2048

- times 8 db -4, 36
- times 8 db 36, -4
+const tab_ChromaCoeff, db 0, 64, 0, 0
+ db -2, 58, 10, -2
+ db -4, 54, 16, -2
+ db -6, 46, 28, -4
+ db -4, 36, 36, -4
+ db -4, 28, 46, -6
+ db -2, 16, 54, -4
+ db -2, 10, 58, -2

- times 8 db -4, 28
- times 8 db 46, -6
+const tabw_ChromaCoeff, dw 0, 64, 0, 0
+ dw -2, 58, 10, -2
+ dw -4, 54, 16, -2
+ dw -6, 46, 28, -4
+ dw -4, 36, 36, -4
+ dw -4, 28, 46, -6
+ dw -2, 16, 54, -4
+ dw -2, 10, 58, -2

- times 8 db -2, 16
- times 8 db 54, -4
+const tab_ChromaCoeff_V, times 8 db 0, 64
+ times 8 db 0, 0

- times 8 db -2, 10
- times 8 db 58, -2
+ times 8 db -2, 58
+ times 8 db 10, -2

-tab_ChromaCoeffV: times 4 dw 0, 64
- times 4 dw 0, 0
+ times 8 db -4, 54
+ times 8 db 16, -2

- times 4 dw -2, 58
- times 4 dw 10, -2
+ times 8 db -6, 46
+ times 8 db 28, -4

- times 4 dw -4, 54
- times 4 dw 16, -2
+ times 8 db -4, 36
+ times 8 db 36, -4

- times 4 dw -6, 46
- times 4 dw 28, -4
+ times 8 db -4, 28
+ times 8 db 46, -6

- times 4 dw -4, 36
- times 4 dw 36, -4
+ times 8 db -2, 16
+ times 8 db 54, -4

- times 4 dw -4, 28
- times 4 dw 46, -6
+ times 8 db -2, 10
+ times 8 db 58, -2

- times 4 dw -2, 16
- times 4 dw 54, -4
+const tab_ChromaCoeffV, times 4 dw 0, 64
+ times 4 dw 0, 0

- times 4 dw -2, 10
- times 4 dw 58, -2
+ times 4 dw -2, 58
+ times 4 dw 10, -2

-ALIGN 32
-pw_ChromaCoeffV: times 8 dw 0, 64
- times 8 dw 0, 0
+ times 4 dw -4, 54
+ times 4 dw 16, -2

- times 8 dw -2, 58
- times 8 dw 10, -2
+ times 4 dw -6, 46
+ times 4 dw 28, -4

- times 8 dw -4, 54
- times 8 dw 16, -2
+ times 4 dw -4, 36
+ times 4 dw 36, -4

- times 8 dw -6, 46
- times 8 dw 28, -4
-
- times 8 dw -4, 36
- times 8 dw 36, -4
-
- times 8 dw -4, 28
- times 8 dw 46, -6
-
- times 8 dw -2, 16
- times 8 dw 54, -4
-
- times 8 dw -2, 10
- times 8 dw 58, -2
-
-tab_LumaCoeff: db 0, 0, 0, 64, 0, 0, 0, 0
- db -1, 4, -10, 58, 17, -5, 1, 0
- db -1, 4, -11, 40, 40, -11, 4, -1
- db 0, 1, -5, 17, 58, -10, 4, -1
-
-tab_LumaCoeffV: times 4 dw 0, 0
- times 4 dw 0, 64
- times 4 dw 0, 0
- times 4 dw 0, 0
-
- times 4 dw -1, 4
- times 4 dw -10, 58
- times 4 dw 17, -5
- times 4 dw 1, 0
-
- times 4 dw -1, 4
- times 4 dw -11, 40
- times 4 dw 40, -11
- times 4 dw 4, -1
-
- times 4 dw 0, 1
- times 4 dw -5, 17
- times 4 dw 58, -10
- times 4 dw 4, -1
+ times 4 dw -4, 28
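Every row of `tab_ChromaCoeff` (and of its word-sized twin `tabw_ChromaCoeff`) sums to 64, so the 4-tap chroma filters have 6 bits of gain, and `pd_526336` is 8192 * 64 + 2048 = 526336. One consistent reading, assumed here rather than stated in the diff, is an 8-bit two-pass pipeline: the horizontal "ps" pass produces int16 intermediates biased by -8192, and the vertical "sp" pass adds back 8192 * 64 (the bias after a gain of 64) plus 2048 (rounding for a shift of 12) before shifting right by 12. The sketch below, with hypothetical function names, checks that a flat block survives that round trip for every coefficient row.

```c
#include <assert.h>
#include <stdint.h>

/* Rows copied from tab_ChromaCoeff; each row sums to 64. */
static const int8_t chroma_coeff[8][4] = {
    {  0, 64,  0,  0 }, { -2, 58, 10, -2 }, { -4, 54, 16, -2 }, { -6, 46, 28, -4 },
    { -4, 36, 36, -4 }, { -4, 28, 46, -6 }, { -2, 16, 54, -4 }, { -2, 10, 58, -2 },
};

/* Horizontal pass, pixel -> short: taps at src[-1..2], result biased by -8192. */
static int16_t chroma_h_ps(const uint8_t *src, int coeffIdx)
{
    assert(coeffIdx >= 0 && coeffIdx < 8);
    const int8_t *c = chroma_coeff[coeffIdx];
    int sum = c[0] * src[-1] + c[1] * src[0] + c[2] * src[1] + c[3] * src[2];
    return (int16_t)(sum - 8192);
}

/* Vertical pass, short -> pixel: taps at s[-stride .. 2 * stride],
 * add 526336 (= 8192 * 64 + 2048), shift right by 12, clip to 8 bits. */
static uint8_t chroma_v_sp(const int16_t *s, intptr_t stride, int coeffIdx)
{
    assert(coeffIdx >= 0 && coeffIdx < 8);
    const int8_t *c = chroma_coeff[coeffIdx];
    int sum = c[0] * s[-stride] + c[1] * s[0] + c[2] * s[stride] + c[3] * s[2 * stride];
    int v = sum + 526336;
    v = v < 0 ? 0 : v >> 12;   /* negative totals clip to 0 anyway */
    return (uint8_t)(v > 255 ? 255 : v);
}
```

For a flat input p, the ps pass yields 64p - 8192, and the sp pass gives (4096p - 524288 + 526336) >> 12 = (4096p + 2048) >> 12 = p.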
x265_1.6.tar.gz/source/common/x86/ipfilter8.h -> x265_1.7.tar.gz/source/common/x86/ipfilter8.h
Changed
SETUP_CHROMA_420_HORIZ_FUNC_DEF(64, 16, cpu); \
SETUP_CHROMA_420_HORIZ_FUNC_DEF(16, 64, cpu)

-void x265_chroma_p2s_sse2(const pixel* src, intptr_t srcStride, int16_t* dst, int width, int height);
-void x265_luma_p2s_sse2(const pixel* src, intptr_t srcStride, int16_t* dst, int width, int height);
+void x265_filterPixelToShort_4x4_ssse3(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_4x8_ssse3(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_4x16_ssse3(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_8x4_ssse3(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_8x8_ssse3(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_8x16_ssse3(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_8x32_ssse3(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_16x4_ssse3(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_16x8_ssse3(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_16x12_ssse3(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_16x16_ssse3(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_16x32_ssse3(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_16x64_ssse3(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_32x8_ssse3(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_32x16_ssse3(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_32x24_ssse3(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_32x32_ssse3(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_32x64_ssse3(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_64x16_ssse3(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_64x32_ssse3(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_64x48_ssse3(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_64x64_ssse3(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_24x32_ssse3(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_12x16_ssse3(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_48x64_ssse3(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_16x4_avx2(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_16x8_avx2(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_16x12_avx2(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_16x16_avx2(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_16x32_avx2(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_16x64_avx2(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_32x8_avx2(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_32x16_avx2(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_32x24_avx2(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_32x32_avx2(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_32x64_avx2(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_64x16_avx2(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_64x32_avx2(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_64x48_avx2(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_64x64_avx2(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_24x32_avx2(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+void x265_filterPixelToShort_48x64_avx2(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+
+#define SETUP_CHROMA_P2S_FUNC_DEF(W, H, cpu) \
+ void x265_filterPixelToShort_ ## W ## x ## H ## cpu(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+
+#define CHROMA_420_P2S_FILTERS_SSSE3(cpu) \
+ SETUP_CHROMA_P2S_FUNC_DEF(4, 2, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(8, 2, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(8, 6, cpu);
+
+#define CHROMA_420_P2S_FILTERS_SSE4(cpu) \
+ SETUP_CHROMA_P2S_FUNC_DEF(2, 4, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(2, 8, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(6, 8, cpu);
+
+#define CHROMA_422_P2S_FILTERS_SSSE3(cpu) \
+ SETUP_CHROMA_P2S_FUNC_DEF(4, 32, cpu) \
+ SETUP_CHROMA_P2S_FUNC_DEF(8, 12, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(8, 64, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(12, 32, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(16, 24, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(16, 64, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(24, 64, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(32, 48, cpu);
+
+#define CHROMA_422_P2S_FILTERS_SSE4(cpu) \
+ SETUP_CHROMA_P2S_FUNC_DEF(2, 8, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(2, 16, cpu) \
+ SETUP_CHROMA_P2S_FUNC_DEF(6, 16, cpu);
+
+#define CHROMA_420_P2S_FILTERS_AVX2(cpu) \
+ SETUP_CHROMA_P2S_FUNC_DEF(16, 4, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(16, 8, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(16, 12, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(16, 16, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(16, 32, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(24, 32, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(32, 8, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(32, 16, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(32, 24, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(32, 32, cpu);
+
+#define CHROMA_422_P2S_FILTERS_AVX2(cpu) \
+ SETUP_CHROMA_P2S_FUNC_DEF(16, 8, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(16, 16, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(16, 24, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(16, 32, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(16, 64, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(24, 64, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(32, 16, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(32, 32, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(32, 48, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(32, 64, cpu);

CHROMA_420_VERT_FILTERS(_sse2);
CHROMA_420_HORIZ_FILTERS(_sse4);
CHROMA_420_VERT_FILTERS_SSE4(_sse4);
+CHROMA_420_P2S_FILTERS_SSSE3(_ssse3);
+CHROMA_420_P2S_FILTERS_SSE4(_sse4);
+CHROMA_420_P2S_FILTERS_AVX2(_avx2);

CHROMA_422_VERT_FILTERS(_sse2);
CHROMA_422_HORIZ_FILTERS(_sse4);
CHROMA_422_VERT_FILTERS_SSE4(_sse4);
+CHROMA_422_P2S_FILTERS_SSE4(_sse4);
+CHROMA_422_P2S_FILTERS_SSSE3(_ssse3);
+CHROMA_422_P2S_FILTERS_AVX2(_avx2);

CHROMA_444_VERT_FILTERS(_sse2);
CHROMA_444_HORIZ_FILTERS(_sse4);

SETUP_CHROMA_SS_FUNC_DEF(64, 16, cpu); \
SETUP_CHROMA_SS_FUNC_DEF(16, 64, cpu);

+#define SETUP_CHROMA_P2S_FUNC_DEF(W, H, cpu) \
+ void x265_filterPixelToShort_ ## W ## x ## H ## cpu(const pixel* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride);
+
+#define CHROMA_420_P2S_FILTERS_SSE4(cpu) \
+ SETUP_CHROMA_P2S_FUNC_DEF(2, 4, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(2, 8, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(4, 2, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(6, 8, cpu);
+
+#define CHROMA_420_P2S_FILTERS_SSSE3(cpu) \
+ SETUP_CHROMA_P2S_FUNC_DEF(8, 2, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(8, 6, cpu);
+
+#define CHROMA_422_P2S_FILTERS_SSE4(cpu) \
+ SETUP_CHROMA_P2S_FUNC_DEF(2, 8, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(2, 16, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(6, 16, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(4, 32, cpu);
+
+#define CHROMA_422_P2S_FILTERS_SSSE3(cpu) \
+ SETUP_CHROMA_P2S_FUNC_DEF(8, 12, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(8, 64, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(12, 32, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(16, 24, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(16, 64, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(24, 64, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(32, 48, cpu);
+
+#define CHROMA_420_P2S_FILTERS_AVX2(cpu) \
+ SETUP_CHROMA_P2S_FUNC_DEF(24, 32, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(32, 8, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(32, 16, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(32, 24, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(32, 32, cpu);
+
+#define CHROMA_422_P2S_FILTERS_AVX2(cpu) \
+ SETUP_CHROMA_P2S_FUNC_DEF(24, 64, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(32, 16, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(32, 32, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(32, 48, cpu); \
+ SETUP_CHROMA_P2S_FUNC_DEF(32, 64, cpu);
+
CHROMA_420_FILTERS(_sse4);
CHROMA_420_FILTERS(_avx2);
CHROMA_420_SP_FILTERS(_sse2);

CHROMA_420_SS_FILTERS_SSE4(_sse4);
CHROMA_420_SS_FILTERS(_avx2);
CHROMA_420_SS_FILTERS_SSE4(_avx2);
+CHROMA_420_P2S_FILTERS_SSE4(_sse4);
+CHROMA_420_P2S_FILTERS_SSSE3(_ssse3);
+CHROMA_420_P2S_FILTERS_AVX2(_avx2);

CHROMA_422_FILTERS(_sse4);
CHROMA_422_FILTERS(_avx2);
CHROMA_422_SP_FILTERS(_sse2);
+CHROMA_422_SP_FILTERS(_avx2);
CHROMA_422_SP_FILTERS_SSE4(_sse4);
+CHROMA_422_SP_FILTERS_SSE4(_avx2);
CHROMA_422_SS_FILTERS(_sse2);
+CHROMA_422_SS_FILTERS(_avx2);
CHROMA_422_SS_FILTERS_SSE4(_sse4);
+CHROMA_422_SS_FILTERS_SSE4(_avx2);
+CHROMA_422_P2S_FILTERS_SSE4(_sse4);
+CHROMA_422_P2S_FILTERS_SSSE3(_ssse3);
+CHROMA_422_P2S_FILTERS_AVX2(_avx2);
+void x265_interp_4tap_vert_ss_2x4_avx2(const int16_t* src, intptr_t srcStride, int16_t* dst, intptr_t dstStride, int coeffIdx);
+void x265_interp_4tap_vert_sp_2x4_avx2(const int16_t* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int coeffIdx);

CHROMA_444_FILTERS(_sse4);
CHROMA_444_SP_FILTERS(_sse4);
CHROMA_444_SS_FILTERS(_sse2);
-
-void x265_chroma_p2s_ssse3(const pixel* src, intptr_t srcStride, int16_t* dst, int width, int height);
+CHROMA_444_FILTERS(_avx2);
+CHROMA_444_SP_FILTERS(_avx2);
+CHROMA_444_SS_FILTERS(_avx2);

#undef SETUP_CHROMA_FUNC_DEF
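`SETUP_CHROMA_P2S_FUNC_DEF` builds each prototype name by token pasting: `x265_filterPixelToShort_ ## W ## x ## H ## cpu` turns `(4, 2, _ssse3)` into `x265_filterPixelToShort_4x2_ssse3`, with the literal `x` pasted between width and height. The standalone sketch below reuses the same pasting pattern but stringizes the result so the expansion can be checked; `P2S_NAME`, `STR` and `XSTR` are hypothetical helpers, not part of the header.

```c
#include <assert.h>
#include <string.h>

/* Stringize a token after it has been formed by pasting. */
#define STR(s) #s
#define XSTR(s) STR(s)

/* Same pasting pattern as SETUP_CHROMA_P2S_FUNC_DEF in ipfilter8.h. */
#define P2S_NAME(W, H, cpu) XSTR(x265_filterPixelToShort_ ## W ## x ## H ## cpu)

/* Names produced for the first CHROMA_420_P2S_FILTERS_SSSE3 list above. */
static const char *p2s_420_ssse3_names[] = {
    P2S_NAME(4, 2, _ssse3),
    P2S_NAME(8, 2, _ssse3),
    P2S_NAME(8, 6, _ssse3),
};
```

The two list entries in the header without a trailing `;` (`(4, 32, cpu)` and `(2, 16, cpu)`) still produce valid declarations, because the macro's own expansion already ends in `;`.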
x265_1.6.tar.gz/source/common/x86/loopfilter.asm -> x265_1.7.tar.gz/source/common/x86/loopfilter.asm
Changed
%include "x86inc.asm"

SECTION_RODATA 32
-pb_31: times 16 db 31
-pb_15: times 16 db 15
+pb_31: times 32 db 31
+pb_15: times 32 db 15
+pb_movemask_32: times 32 db 0x00
+ times 32 db 0xFF
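The new `pb_movemask_32` constant is 32 zero bytes followed by 32 0xFF bytes. A table with that layout supports a common sliding-window trick: an unaligned 32-byte load starting k bytes into the table yields a mask whose last k bytes are 0xFF, which can then select between two vectors byte by byte. The diff does not show where the table is loaded, so this use is an assumption; the scalar sketch below models the load with `memcpy`, and `window_mask` and `blend_bytes` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar copy of pb_movemask_32: 32 x 0x00 followed by 32 x 0xFF. */
static uint8_t movemask_table[64];

static void init_movemask_table(void)
{
    memset(movemask_table, 0x00, 32);
    memset(movemask_table + 32, 0xFF, 32);
}

/* Models an unaligned 32-byte load at byte offset k (0..32):
 * mask[j] is 0xFF exactly when j >= 32 - k. */
static void window_mask(int k, uint8_t mask[32])
{
    assert(k >= 0 && k <= 32);
    memcpy(mask, movemask_table + k, 32);
}

/* Byte-wise select: b where the mask byte is 0xFF, a where it is 0x00
 * (the effect of pblendvb with such a mask). */
static void blend_bytes(const uint8_t *a, const uint8_t *b, const uint8_t *mask,
                        uint8_t *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = (uint8_t)((mask[i] & b[i]) | (~mask[i] & a[i]));
}
```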
12
SECTION .text
13
cextern pb_1
14
cextern pb_128
15
cextern pb_2
16
cextern pw_2
17
+cextern pb_movemask
18
19
20
;============================================================================================================
21
-; void saoCuOrgE0(pixel * rec, int8_t * offsetEo, int lcuWidth, int8_t signLeft)
22
+; void saoCuOrgE0(pixel * rec, int8_t * offsetEo, int lcuWidth, int8_t* signLeft, intptr_t stride)
23
;============================================================================================================
24
INIT_XMM sse4
25
-cglobal saoCuOrgE0, 4, 4, 8, rec, offsetEo, lcuWidth, signLeft
26
+cglobal saoCuOrgE0, 5, 5, 8, rec, offsetEo, lcuWidth, signLeft, stride
27
28
- neg r3 ; r3 = -signLeft
29
- movzx r3d, r3b
30
- movd m0, r3d
31
- mova m4, [pb_128] ; m4 = [80]
32
- pxor m5, m5 ; m5 = 0
33
- movu m6, [r1] ; m6 = offsetEo
34
+ mov r4d, r4m
35
+ mova m4, [pb_128] ; m4 = [80]
36
+ pxor m5, m5 ; m5 = 0
37
+ movu m6, [r1] ; m6 = offsetEo
38
+
39
+ movzx r1d, byte [r3]
40
+ inc r3
41
+ neg r1b
42
+ movd m0, r1d
43
+ lea r1, [r0 + r4]
44
+ mov r4d, r2d
45
46
.loop:
47
- movu m7, [r0] ; m1 = rec[x]
48
+ movu m7, [r0] ; m7 = rec[x]
49
movu m2, [r0 + 1] ; m2 = rec[x+1]
50
51
pxor m1, m7, m4
52
53
pxor m0, m0
54
palignr m0, m2, 15
55
paddb m2, m3
56
- paddb m2, [pb_2] ; m1 = uiEdgeType
57
+ paddb m2, [pb_2] ; m2 = uiEdgeType
58
pshufb m3, m6, m2
59
pmovzxbw m2, m7 ; rec
60
punpckhbw m7, m5
61
62
add r0q, 16
63
sub r2d, 16
64
jnz .loop
65
+
66
+ movzx r3d, byte [r3]
67
+ neg r3b
68
+ movd m0, r3d
69
+.loopH:
70
+ movu m7, [r1] ; m7 = rec[x]
71
+ movu m2, [r1 + 1] ; m2 = rec[x+1]
72
+
73
+ pxor m1, m7, m4
74
+ pxor m3, m2, m4
+ pcmpgtb m2, m1, m3
+ pcmpgtb m3, m1
+ pand m2, [pb_1]
+ por m2, m3
+
+ pslldq m3, m2, 1
+ por m3, m0
+
+ psignb m3, m4 ; m3 = signLeft
+ pxor m0, m0
+ palignr m0, m2, 15
+ paddb m2, m3
+ paddb m2, [pb_2] ; m2 = uiEdgeType
+ pshufb m3, m6, m2
+ pmovzxbw m2, m7 ; rec
+ punpckhbw m7, m5
+ pmovsxbw m1, m3 ; offsetEo
+ punpckhbw m3, m3
+ psraw m3, 8
+ paddw m2, m1
+ paddw m7, m3
+ packuswb m2, m7
+ movu [r1], m2
+
+ add r1q, 16
+ sub r4d, 16
+ jnz .loopH
+ RET
+
+INIT_YMM avx2
+cglobal saoCuOrgE0, 5, 5, 7, rec, offsetEo, lcuWidth, signLeft, stride
+
+ mov r4d, r4m
+ vbroadcasti128 m4, [pb_128] ; m4 = [80]
+ vbroadcasti128 m6, [r1] ; m6 = offsetEo
+ movzx r1d, byte [r3]
+ neg r1b
+ movd xm0, r1d
+ movzx r1d, byte [r3 + 1]
+ neg r1b
+ movd xm1, r1d
+ vinserti128 m0, m0, xm1, 1
+
+.loop:
+ movu xm5, [r0] ; xm5 = rec[x]
+ movu xm2, [r0 + 1] ; xm2 = rec[x + 1]
+ vinserti128 m5, m5, [r0 + r4], 1
+ vinserti128 m2, m2, [r0 + r4 + 1], 1
+
+ pxor m1, m5, m4
+ pxor m3, m2, m4
+ pcmpgtb m2, m1, m3
+ pcmpgtb m3, m1
+ pand m2, [pb_1]
+ por m2, m3
+
+ pslldq m3, m2, 1
+ por m3, m0
+
+ psignb m3, m4 ; m3 = signLeft
+ pxor m0, m0
+ palignr m0, m2, 15
+ paddb m2, m3
+ paddb m2, [pb_2] ; m2 = uiEdgeType
+ pshufb m3, m6, m2
+ pmovzxbw m2, xm5 ; rec
+ vextracti128 xm5, m5, 1
+ pmovzxbw m5, xm5
+ pmovsxbw m1, xm3 ; offsetEo
+ vextracti128 xm3, m3, 1
+ pmovsxbw m3, xm3
+ paddw m2, m1
+ paddw m5, m3
+ packuswb m2, m5
+ vpermq m2, m2, 11011000b
+ movu [r0], xm2
+ vextracti128 [r0 + r4], m2, 1
+
+ add r0q, 16
+ sub r2d, 16
+ jnz .loop
RET

;==================================================================================================

mov r3d, r3m
mov r4d, r4m
pxor m0, m0 ; m0 = 0
- movu m6, [pb_2] ; m6 = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
+ mova m6, [pb_2] ; m6 = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
mova m7, [pb_128]
shr r4d, 4
- .loop
- movu m1, [r0] ; m1 = pRec[x]
- movu m2, [r0 + r3] ; m2 = pRec[x + iStride]
-
- pxor m3, m1, m7
- pxor m4, m2, m7
- pcmpgtb m2, m3, m4
- pcmpgtb m4, m3
- pand m2, [pb_1]
- por m2, m4
-
- movu m3, [r1] ; m3 = m_iUpBuff1
-
- paddb m3, m2
- paddb m3, m6
-
- movu m4, [r2] ; m4 = m_iOffsetEo
- pshufb m5, m4, m3
-
- psubb m3, m0, m2
- movu [r1], m3
-
- pmovzxbw m2, m1
- punpckhbw m1, m0
- pmovsxbw m3, m5
- punpckhbw m5, m5
- psraw m5, 8
-
- paddw m2, m3
- paddw m1, m5
- packuswb m2, m1
- movu [r0], m2
-
- add r0, 16

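The hunk above changes `saoCuOrgE0` to take a per-row `int8_t* signLeft` and a `stride`, and the new AVX2 version filters two rows per call (`[r0]` and `[r0 + r4]`). The following is a rough scalar model of what these kernels appear to compute for 8-bit pixels. The function and helper names are hypothetical; they are not x265's own C primitives. For each pixel, the edge type is `sign(rec[x] - rec[x+1]) + signLeft + 2`. The offset for that type is added with clipping, and the negated right sign becomes the next pixel's left sign. Like the assembly, the model reads one pixel past `width` in each row.

```c
#include <assert.h>
#include <stdint.h>

/* sign(a - b) as -1, 0 or +1 */
static int8_t sign3(int a, int b)
{
    return (int8_t)((a > b) - (a < b));
}

static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical scalar model of saoCuOrgE0 (horizontal SAO edge offset,
 * 8-bit pixels): filters two rows, each seeded with its own signLeft. */
static void sao_e0_ref(uint8_t *rec, const int8_t offsetEo[5], int width,
                       const int8_t signLeft[2], intptr_t stride)
{
    for (int y = 0; y < 2; y++) {
        int8_t left = signLeft[y];
        for (int x = 0; x < width; x++) {
            /* rec[x + 1] is still unmodified here; rec[width] is only read */
            int8_t right = sign3(rec[x], rec[x + 1]);
            int edgeType = right + left + 2;            /* 0..4 */
            left = (int8_t)-right;                      /* = sign(rec[x+1] - rec[x]) */
            rec[x] = clip_u8(rec[x] + offsetEo[edgeType]);
        }
        rec += stride;
    }
}
```

The SIMD code appears to reach the same result by shifting the sign vector one byte (`pslldq`), OR-ing in the carried left sign and negating with `psignb`. `palignr ..., 15` then carries the last right sign into the next 16-pixel block.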
x265_1.6.tar.gz/source/common/x86/loopfilter.h -> x265_1.7.tar.gz/source/common/x86/loopfilter.h

#ifndef X265_LOOPFILTER_H
#define X265_LOOPFILTER_H

-void x265_saoCuOrgE0_sse4(pixel * rec, int8_t * offsetEo, int endX, int8_t signLeft);
+void x265_saoCuOrgE0_sse4(pixel * rec, int8_t * offsetEo, int endX, int8_t* signLeft, intptr_t stride);
+void x265_saoCuOrgE0_avx2(pixel * rec, int8_t * offsetEo, int endX, int8_t* signLeft, intptr_t stride);
void x265_saoCuOrgE1_sse4(pixel* rec, int8_t* upBuff1, int8_t* offsetEo, intptr_t stride, int width);
+void x265_saoCuOrgE1_avx2(pixel* rec, int8_t* upBuff1, int8_t* offsetEo, intptr_t stride, int width);
+void x265_saoCuOrgE1_2Rows_sse4(pixel* rec, int8_t* upBuff1, int8_t* offsetEo, intptr_t stride, int width);
+void x265_saoCuOrgE1_2Rows_avx2(pixel* rec, int8_t* upBuff1, int8_t* offsetEo, intptr_t stride, int width);
void x265_saoCuOrgE2_sse4(pixel* rec, int8_t* pBufft, int8_t* pBuff1, int8_t* offsetEo, int lcuWidth, intptr_t stride);
+void x265_saoCuOrgE2_avx2(pixel* rec, int8_t* pBufft, int8_t* pBuff1, int8_t* offsetEo, int lcuWidth, intptr_t stride);
+void x265_saoCuOrgE2_32_avx2(pixel* rec, int8_t* pBufft, int8_t* pBuff1, int8_t* offsetEo, int lcuWidth, intptr_t stride);
void x265_saoCuOrgE3_sse4(pixel *rec, int8_t *upBuff1, int8_t *m_offsetEo, intptr_t stride, int startX, int endX);
+void x265_saoCuOrgE3_avx2(pixel *rec, int8_t *upBuff1, int8_t *m_offsetEo, intptr_t stride, int startX, int endX);
+void x265_saoCuOrgE3_32_avx2(pixel *rec, int8_t *upBuff1, int8_t *m_offsetEo, intptr_t stride, int startX, int endX);
void x265_saoCuOrgB0_sse4(pixel* rec, const int8_t* offsetBo, int ctuWidth, int ctuHeight, intptr_t stride);
+void x265_saoCuOrgB0_avx2(pixel* rec, const int8_t* offsetBo, int ctuWidth, int ctuHeight, intptr_t stride);
void x265_calSign_sse4(int8_t *dst, const pixel *src1, const pixel *src2, const int endX);
+void x265_calSign_avx2(int8_t *dst, const pixel *src1, const pixel *src2, const int endX);

#endif // ifndef X265_LOOPFILTER_H

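The header also declares AVX2 and `_2Rows` variants of `saoCuOrgE1`. The SSE4 loop removed at the end of the `loopfilter.asm` hunk shows the per-pixel step:

- compare each pixel with the one a `stride` below;
- add the carried `upBuff1` sign and look up the offset;
- store the negated down-sign back into `upBuff1` for the next row.

The following scalar sketch models that step for 8-bit pixels, using hypothetical names. It does not model the `_2Rows` entry points, which presumably apply the same step to two consecutive rows; that is an assumption.

```c
#include <assert.h>
#include <stdint.h>

static int8_t sign3(int a, int b) { return (int8_t)((a > b) - (a < b)); }

static uint8_t clip_u8(int v) { return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v)); }

/* Hypothetical scalar model of one saoCuOrgE1 row (vertical SAO edge
 * offset, 8-bit). On entry upBuff1[x] holds sign(rec[x] - rec[x - stride]);
 * on exit it holds the same quantity for the row below. */
static void sao_e1_ref(uint8_t *rec, int8_t *upBuff1, const int8_t offsetEo[5],
                       intptr_t stride, int width)
{
    for (int x = 0; x < width; x++) {
        int8_t signDown = sign3(rec[x], rec[x + stride]);   /* pcmpgtb pair */
        int edgeType = signDown + upBuff1[x] + 2;           /* paddb ..., [pb_2] */
        upBuff1[x] = (int8_t)-signDown;                     /* psubb m3, m0, m2 */
        rec[x] = clip_u8(rec[x] + offsetEo[edgeType]);      /* pshufb lookup + pack */
    }
}
```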
x265_1.6.tar.gz/source/common/x86/mc-a.asm -> x265_1.7.tar.gz/source/common/x86/mc-a.asm



ADDAVG_W8_H4_AVX2 4
ADDAVG_W8_H4_AVX2 8
+ADDAVG_W8_H4_AVX2 12
ADDAVG_W8_H4_AVX2 16
ADDAVG_W8_H4_AVX2 32
+ADDAVG_W8_H4_AVX2 64

%macro ADDAVG_W12_H4_AVX2 1
INIT_YMM avx2

%endmacro

ADDAVG_W12_H4_AVX2 16
+ADDAVG_W12_H4_AVX2 32

%macro ADDAVG_W16_H4_AVX2 1
INIT_YMM avx2

ADDAVG_W16_H4_AVX2 8
ADDAVG_W16_H4_AVX2 12
ADDAVG_W16_H4_AVX2 16
+ADDAVG_W16_H4_AVX2 24
ADDAVG_W16_H4_AVX2 32
ADDAVG_W16_H4_AVX2 64


%endmacro

ADDAVG_W24_H2_AVX2 32
+ADDAVG_W24_H2_AVX2 64

%macro ADDAVG_W32_H2_AVX2 1
INIT_YMM avx2

ADDAVG_W32_H2_AVX2 16
ADDAVG_W32_H2_AVX2 24
ADDAVG_W32_H2_AVX2 32
+ADDAVG_W32_H2_AVX2 48
ADDAVG_W32_H2_AVX2 64

%macro ADDAVG_W64_H2_AVX2 1

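These lines only add new `ADDAVG_*_AVX2` instantiations: `addAvg_8x12`, `8x64`, `12x32`, `16x24`, `24x64` and `32x48`. The macro bodies are not part of the hunk.

The following sketch assumes these kernels implement the usual bi-prediction average for 8-bit output:

- Both inputs are 14-bit intermediates biased by -8192.
- The result is `(src0 + src1 + offset) >> shift`, clipped, with `shift = 7` and `offset = 64 + 2 * 8192`.

The constants are written out by hand rather than taken from x265 headers. `to_internal` is a hypothetical helper that maps an 8-bit pixel into that domain.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants for 8-bit output with 14-bit intermediate precision. */
enum {
    INTERNAL_PREC = 14,
    INTERNAL_OFFS = 1 << (INTERNAL_PREC - 1),                  /* 8192 */
    AVG_SHIFT     = INTERNAL_PREC + 1 - 8,                     /* 7 */
    AVG_OFFSET    = (1 << (AVG_SHIFT - 1)) + 2 * INTERNAL_OFFS /* 64 + 16384 */
};

static uint8_t clip_u8(int v) { return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v)); }

/* Map an 8-bit pixel into the 14-bit intermediate domain (hypothetical helper). */
static int16_t to_internal(uint8_t p)
{
    return (int16_t)((p << (INTERNAL_PREC - 8)) - INTERNAL_OFFS);
}

/* Hypothetical scalar model of addAvg_WxH: rounded average of two predictions. */
static void add_avg_ref(const int16_t *src0, const int16_t *src1, uint8_t *dst,
                        intptr_t src0Stride, intptr_t src1Stride, intptr_t dstStride,
                        int width, int height)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            dst[x] = clip_u8((src0[x] + src1[x] + AVG_OFFSET) >> AVG_SHIFT);
        src0 += src0Stride;
        src1 += src1Stride;
        dst += dstStride;
    }
}
```

With these constants, averaging the intermediates of pixels 100 and 101 gives 101, which is `(100 + 101 + 1) >> 1`.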
x265_1.6.tar.gz/source/common/x86/pixel-a.asm -> x265_1.7.tar.gz/source/common/x86/pixel-a.asm

.end:
RET

+; Input 16bpp, Output 8bpp
+;-------------------------------------------------------------------------------------------------------------------------------------
+;void planecopy_sp(uint16_t *src, intptr_t srcStride, pixel *dst, intptr_t dstStride, int width, int height, int shift, uint16_t mask)
+;-------------------------------------------------------------------------------------------------------------------------------------
+INIT_YMM avx2
+cglobal downShift_16, 6,7,3
+ movd xm0, r6m ; m0 = shift
+ add r1d, r1d
+ dec r5d
+.loopH:
+ xor r6, r6
+.loopW:
+ movu m1, [r0 + r6 * 2 + 0]
+ movu m2, [r0 + r6 * 2 + 32]
+ vpsrlw m1, xm0
+ vpsrlw m2, xm0
+ packuswb m1, m2
+ vpermq m1, m1, 11011000b
+ movu [r2 + r6], m1
+
+ add r6d, mmsize
+ cmp r6d, r4d
+ jl .loopW
+
+ ; move to next row
+ add r0, r1
+ add r2, r3
+ dec r5d
+ jnz .loopH
+
+; processing last row of every frame [to handle a width that is not a multiple of 32]
+ mov r6d, r4d
+ and r4d, 31
+ shr r6d, 5
+
+.loop32:
+ movu m1, [r0]
+ movu m2, [r0 + 32]
+ psrlw m1, xm0
+ psrlw m2, xm0
+ packuswb m1, m2
+ vpermq m1, m1, 11011000b
+ movu [r2], m1
+
+ add r0, 2*mmsize
+ add r2, mmsize
+ dec r6d
+ jnz .loop32
+
+ cmp r4d, 16
+ jl .process8
+ movu m1, [r0]
+ psrlw m1, xm0
+ packuswb m1, m1
+ vpermq m1, m1, 10001000b
+ movu [r2], xm1
+
+ add r0, mmsize
+ add r2, 16
+ sub r4d, 16
+ jz .end
+
+.process8:
+ cmp r4d, 8
+ jl .process4
+ movu m1, [r0]
+ psrlw m1, xm0
+ packuswb m1, m1
+ movq [r2], xm1
+
+ add r0, 16
+ add r2, 8
+ sub r4d, 8
+ jz .end
+
+.process4:
+ cmp r4d, 4
+ jl .process2
+ movq xm1,[r0]
+ psrlw m1, xm0
+ packuswb m1, m1
+ movd [r2], xm1
+
+ add r0, 8
+ add r2, 4
+ sub r4d, 4
+ jz .end
+
+.process2:
+ cmp r4d, 2
+ jl .process1
+ movd xm1, [r0]
+ psrlw m1, xm0
+ packuswb m1, m1
+ movd r6d, xm1
+ mov [r2], r6w
+
+ add r0, 4
+ add r2, 2
+ sub r4d, 2
+ jz .end
+
+.process1:
+ movd xm1, [r0]
+ psrlw m1, xm0
+ packuswb m1, m1
+ movd r3d, xm1
+ mov [r2], r3b
+.end:
+ RET
+
; Input 8bpp, Output 16bpp
;---------------------------------------------------------------------------------------------------------------------
;void planecopy_cp(uint8_t *src, intptr_t srcStride, pixel *dst, intptr_t dstStride, int width, int height, int shift)

mov rsp, r5
RET
%endif
+
+;;---------------------------------------------------------------
+;; SATD AVX2
+;; int pixel_satd(const pixel*, intptr_t, const pixel*, intptr_t)
+;;---------------------------------------------------------------
+;; r0 - pix0
+;; r1 - pix0Stride
+;; r2 - pix1
+;; r3 - pix1Stride
+
+%if ARCH_X86_64 == 1 && HIGH_BIT_DEPTH == 0
+INIT_YMM avx2
+cglobal calc_satd_16x8 ; function to compute satd cost for 16 columns, 8 rows
+ pxor m6, m6
+ vbroadcasti128 m0, [r0]
+ vbroadcasti128 m4, [r2]
+ vbroadcasti128 m1, [r0 + r1]
+ vbroadcasti128 m5, [r2 + r3]
+ pmaddubsw m4, m7
+ pmaddubsw m0, m7
+ pmaddubsw m5, m7
+ pmaddubsw m1, m7
+ psubw m0, m4
+ psubw m1, m5
+ vbroadcasti128 m2, [r0 + r1 * 2]
+ vbroadcasti128 m4, [r2 + r3 * 2]
+ vbroadcasti128 m3, [r0 + r4]
+ vbroadcasti128 m5, [r2 + r5]
+ pmaddubsw m4, m7
+ pmaddubsw m2, m7
+ pmaddubsw m5, m7
+ pmaddubsw m3, m7
+ psubw m2, m4
+ psubw m3, m5
+ lea r0, [r0 + r1 * 4]
+ lea r2, [r2 + r3 * 4]
+ paddw m4, m0, m1
+ psubw m1, m1, m0
+ paddw m0, m2, m3
+ psubw m3, m2
+ paddw m2, m4, m0
+ psubw m0, m4
+ paddw m4, m1, m3
+ psubw m3, m1
+ pabsw m2, m2
+ pabsw m0, m0
+ pabsw m4, m4
+ pabsw m3, m3
+ pblendw m1, m2, m0, 10101010b
+ pslld m0, 16
+ psrld m2, 16
+ por m0, m2
+ pmaxsw m1, m0
+ paddw m6, m1
+ pblendw m2, m4, m3, 10101010b
+ pslld m3, 16
+ psrld m4, 16
+ por m3, m4
+ pmaxsw m2, m3
+ paddw m6, m2
+ vbroadcasti128 m1, [r0]
+ vbroadcasti128 m4, [r2]
+ vbroadcasti128 m2, [r0 + r1]
+ vbroadcasti128 m5, [r2 + r3]
+ pmaddubsw m4, m7
+ pmaddubsw m1, m7
+ pmaddubsw m5, m7
+ pmaddubsw m2, m7
+ psubw m1, m4
+ psubw m2, m5
+ vbroadcasti128 m0, [r0 + r1 * 2]
+ vbroadcasti128 m4, [r2 + r3 * 2]
+ vbroadcasti128 m3, [r0 + r4]
+ vbroadcasti128 m5, [r2 + r5]
+ lea r0, [r0 + r1 * 4]
+ lea r2, [r2 + r3 * 4]
+ pmaddubsw m4, m7
+ pmaddubsw m0, m7

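`downShift_16` (the `planecopy_sp` primitive) narrows 16-bit samples to 8-bit pixels. The following scalar sketch follows the prototype in the comment, with strides counted in elements; the kernel itself doubles `srcStride` with `add r1d, r1d`. The function name is hypothetical.

Two differences from the AVX2 kernel are worth noting:

- The AVX2 code never loads `mask` and relies on `packuswb` saturation instead. The two therefore agree only for in-range input, such as 10-bit samples shifted right by 2.
- The AVX2 code processes all rows except the last in whole 32-pixel steps and handles the remainder only on the final row (`.process8` through `.process1`). This suggests it relies on padded rows; that is an inference from the code, not something the diff states.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of downShift_16 for 8-bit output.
 * srcStride and dstStride are in elements, not bytes. */
static void down_shift_16_ref(const uint16_t *src, intptr_t srcStride,
                              uint8_t *dst, intptr_t dstStride,
                              int width, int height, int shift, uint16_t mask)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            dst[x] = (uint8_t)((src[x] >> shift) & mask);
        src += srcStride;
        dst += dstStride;
    }
}
```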
x265_1.6.tar.gz/source/common/x86/pixel-util.h -> x265_1.7.tar.gz/source/common/x86/pixel-util.h

float x265_pixel_ssim_end4_sse2(int sum0[5][4], int sum1[5][4], int width);
float x265_pixel_ssim_end4_avx(int sum0[5][4], int sum1[5][4], int width);

-void x265_scale1D_128to64_ssse3(pixel*, const pixel*, intptr_t);
-void x265_scale1D_128to64_avx2(pixel*, const pixel*, intptr_t);
+void x265_scale1D_128to64_ssse3(pixel*, const pixel*);
+void x265_scale1D_128to64_avx2(pixel*, const pixel*);
void x265_scale2D_64to32_ssse3(pixel*, const pixel*, intptr_t);
+void x265_scale2D_64to32_avx2(pixel*, const pixel*, intptr_t);

-int x265_findPosLast_x64(const uint16_t *scan, const coeff_t *coeff, uint16_t *coeffSign, uint16_t *coeffFlag, uint8_t *coeffNum, int numSig);
+int x265_scanPosLast_x64(const uint16_t *scan, const coeff_t *coeff, uint16_t *coeffSign, uint16_t *coeffFlag, uint8_t *coeffNum, int numSig, const uint16_t* scanCG4x4, const int trSize);
+int x265_scanPosLast_avx2_bmi2(const uint16_t *scan, const coeff_t *coeff, uint16_t *coeffSign, uint16_t *coeffFlag, uint8_t *coeffNum, int numSig, const uint16_t* scanCG4x4, const int trSize);
+uint32_t x265_findPosFirstLast_ssse3(const int16_t *dstCoeff, const intptr_t trSize, const uint16_t scanTbl[16]);

#define SETUP_CHROMA_PIXELSUB_PS_FUNC(W, H, cpu) \
void x265_pixel_sub_ps_ ## W ## x ## H ## cpu(int16_t* dest, intptr_t destride, const pixel* src0, const pixel* src1, intptr_t srcstride0, intptr_t srcstride1); \
- void x265_pixel_add_ps_ ## W ## x ## H ## cpu(pixel* dest, intptr_t destride, const pixel* src0, const int16_t* scr1, intptr_t srcStride0, intptr_t srcStride1);
+ void x265_pixel_add_ps_ ## W ## x ## H ## cpu(pixel* dest, intptr_t destride, const pixel* src0, const int16_t* src1, intptr_t srcStride0, intptr_t srcStride1);

#define CHROMA_420_PIXELSUB_DEF(cpu) \
SETUP_CHROMA_PIXELSUB_PS_FUNC(4, 4, cpu); \


#define SETUP_LUMA_PIXELSUB_PS_FUNC(W, H, cpu) \
void x265_pixel_sub_ps_ ## W ## x ## H ## cpu(int16_t* dest, intptr_t destride, const pixel* src0, const pixel* src1, intptr_t srcstride0, intptr_t srcstride1); \
- void x265_pixel_add_ps_ ## W ## x ## H ## cpu(pixel* dest, intptr_t destride, const pixel* src0, const int16_t* scr1, intptr_t srcStride0, intptr_t srcStride1);
+ void x265_pixel_add_ps_ ## W ## x ## H ## cpu(pixel* dest, intptr_t destride, const pixel* src0, const int16_t* src1, intptr_t srcStride0, intptr_t srcStride1);

#define LUMA_PIXELSUB_DEF(cpu) \
SETUP_LUMA_PIXELSUB_PS_FUNC(8, 8, cpu); \

x265_1.6.tar.gz/source/common/x86/pixel-util8.asm -> x265_1.7.tar.gz/source/common/x86/pixel-util8.asm

ssim_c1: times 4 dd 416 ; .01*.01*255*255*64
ssim_c2: times 4 dd 235963 ; .03*.03*255*255*64*63
%endif
-mask_ff: times 16 db 0xff
- times 16 db 0
-deinterleave_shuf: db 0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15
-deinterleave_word_shuf: db 0, 1, 4, 5, 8, 9, 12, 13, 2, 3, 6, 7, 10, 11, 14, 15
-hmul_16p: times 16 db 1
- times 8 db 1, -1
-hmulw_16p: times 8 dw 1
- times 4 dw 1, -1

-trans8_shuf: dd 0, 4, 1, 5, 2, 6, 3, 7
+mask_ff: times 16 db 0xff
+ times 16 db 0
+deinterleave_shuf: times 2 db 0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15
+deinterleave_word_shuf: times 2 db 0, 1, 4, 5, 8, 9, 12, 13, 2, 3, 6, 7, 10, 11, 14, 15
+hmul_16p: times 16 db 1
+ times 8 db 1, -1
+hmulw_16p: times 8 dw 1
+ times 4 dw 1, -1
+
+trans8_shuf: dd 0, 4, 1, 5, 2, 6, 3, 7

SECTION .text


cextern pb_2
cextern pb_4
cextern pb_8
+cextern pb_15
cextern pb_16
cextern pb_32
cextern pb_64


%if ARCH_X86_64 == 1
INIT_YMM avx2
-cglobal quant, 5,5,10
+cglobal quant, 5,6,9
; fill qbits
movd xm4, r4d ; m4 = qbits


; fill offset
vpbroadcastd m5, r5m ; m5 = add

- vpbroadcastw m9, [pw_1] ; m9 = word [1]
+ lea r5, [pw_1]

mov r4d, r6m
shr r4d, 4


; count non-zero coeff
; TODO: popcnt is faster, but some CPUs don't support it
- pminuw m2, m9
+ pminuw m2, [r5]
paddw m7, m2

add r0, mmsize

mov r6d, r6m
shl r6d, 16
or r6d, r5d ; assuming both (w0<<6) and round are using maximum of 16 bits each.
- movd xm0, r6d
- pshufd xm0, xm0, 0 ; m0 = [w0<<6, round]
- vinserti128 m0, m0, xm0, 1 ; document says (pshufd + vinserti128) can be replaced with vpbroadcastd m0, xm0, but having build problem, need to investigate
+
+ vpbroadcastd m0, r6d

movd xm1, r7m
vpbroadcastd m2, r8m

dec r5d
jnz .loopH
RET
+
+%if ARCH_X86_64
+INIT_YMM avx2
+cglobal weight_sp, 6, 9, 7
+ mov r7d, r7m
+ shl r7d, 16
+ or r7d, r6m
+ vpbroadcastd m0, r7d ; m0 = times 8 dw w0, round
+ movd xm1, r8m ; m1 = [shift]
+ vpbroadcastd m2, r9m ; m2 = times 16 dw offset
+ vpbroadcastw m3, [pw_1]
+ vpbroadcastw m4, [pw_2000]
+
+ add r2d, r2d ; 2 * srcstride
+
+ mov r7, r0
+ mov r8, r1
+.loopH:
+ mov r6d, r4d ; width
+
+ ; save old src and dst
+ mov r0, r7 ; src
+ mov r1, r8 ; dst
+.loopW:
+ movu m5, [r0]
+ paddw m5, m4
+
+ punpcklwd m6,m5, m3
+ pmaddwd m6, m0
+ psrad m6, xm1
+ paddd m6, m2
+
+ punpckhwd m5, m3
+ pmaddwd m5, m0
+ psrad m5, xm1
+ paddd m5, m2
+
+ packssdw m6, m5
+ packuswb m6, m6
+ vpermq m6, m6, 10001000b
+
+ sub r6d, 16
+ jl .width8
+ movu [r1], xm6
+ je .nextH
+ add r0, 32
+ add r1, 16
+ jmp .loopW
+
+.width8:
+ add r6d, 16
+ cmp r6d, 8
+ jl .width4
+ movq [r1], xm6
+ je .nextH
+ psrldq m6, 8
+ sub r6d, 8
+ add r1, 8
+
+.width4:
+ cmp r6d, 4
+ jl .width2
+ movd [r1], xm6
+ je .nextH
+ add r1, 4
+ pshufd m6, m6, 1
+
+.width2:
+ pextrw [r1], xm6, 0
+
+.nextH:
+ lea r7, [r7 + r2]
+ lea r8, [r8 + r3]
+
+ dec r5d
+ jnz .loopH
+ RET
+%endif
%endif ; end of (HIGH_BIT_DEPTH == 0)



RET
%endif

+;-----------------------------------------------------------------
+; void scale2D_64to32(pixel *dst, pixel *src, intptr_t stride)
+;-----------------------------------------------------------------
+%if HIGH_BIT_DEPTH
+INIT_YMM avx2
+cglobal scale2D_64to32, 3, 4, 5, dest, src, stride
+ mov r3d, 32
+ add r2d, r2d
+ mova m4, [pw_2000]
+
+.loop:
+ movu m0, [r1]
+ movu m1, [r1 + 1 * mmsize]
+ movu m2, [r1 + r2]
+ movu m3, [r1 + r2 + 1 * mmsize]
+
+ paddw m0, m2
+ paddw m1, m3
+ phaddw m0, m1
+
+ pmulhrsw m0, m4
+ vpermq m0, m0, q3120
+ movu [r0], m0
+
+ movu m0, [r1 + 2 * mmsize]
+ movu m1, [r1 + 3 * mmsize]
+ movu m2, [r1 + r2 + 2 * mmsize]
+ movu m3, [r1 + r2 + 3 * mmsize]
+
+ paddw m0, m2
+ paddw m1, m3
+ phaddw m0, m1
+
+ pmulhrsw m0, m4
+ vpermq m0, m0, q3120
+ movu [r0 + mmsize], m0
+

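The new high-bit-depth `scale2D_64to32` averages each 2x2 block:

- `paddw` sums vertical pairs.
- `phaddw` sums horizontal pairs.
- `pmulhrsw` against `pw_2000` performs the rounded divide-by-four, because `(s * 0x2000 + 0x4000) >> 15` equals `(s + 2) >> 2`.

The following scalar sketch models this. It assumes 16-bit pixel storage, a source stride in elements, and a densely packed 32x32 destination. The loop tail that advances `r0` and `r1` is not shown in the hunk, so the destination layout is an assumption. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One 16-bit lane of PMULHRSW: (a * b + 0x4000) >> 15 */
static int16_t pmulhrsw_lane(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * b + 0x4000) >> 15);
}

/* Hypothetical scalar model of the high-bit-depth scale2D_64to32:
 * 64x64 source (stride in elements) -> packed 32x32 destination. */
static void scale2d_64to32_ref(uint16_t *dst, const uint16_t *src, intptr_t stride)
{
    for (int y = 0; y < 32; y++) {
        const uint16_t *row0 = src + 2 * y * stride;
        const uint16_t *row1 = row0 + stride;
        for (int x = 0; x < 32; x++) {
            /* sum of one 2x2 block (paddw for the rows, phaddw for the pair) */
            int sum = row0[2 * x] + row0[2 * x + 1]
                    + row1[2 * x] + row1[2 * x + 1];
            dst[y * 32 + x] = (uint16_t)((sum + 2) >> 2);   /* == pmulhrsw(sum, 0x2000) */
        }
    }
}
```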
x265_1.6.tar.gz/source/common/x86/pixel.h -> x265_1.7.tar.gz/source/common/x86/pixel.h

ADDAVG(addAvg_32x48)

void x265_downShift_16_sse2(const uint16_t* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int width, int height, int shift, uint16_t mask);
+void x265_downShift_16_avx2(const uint16_t* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int width, int height, int shift, uint16_t mask);
void x265_upShift_8_sse4(const uint8_t* src, intptr_t srcStride, pixel* dst, intptr_t dstStride, int width, int height, int shift);
int x265_psyCost_pp_4x4_sse4(const pixel* source, intptr_t sstride, const pixel* recon, intptr_t rstride);
int x265_psyCost_pp_8x8_sse4(const pixel* source, intptr_t sstride, const pixel* recon, intptr_t rstride);

void x265_pixel_add_ps_16x16_avx2(pixel* a, intptr_t dstride, const pixel* b0, const int16_t* b1, intptr_t sstride0, intptr_t sstride1);
void x265_pixel_add_ps_32x32_avx2(pixel* a, intptr_t dstride, const pixel* b0, const int16_t* b1, intptr_t sstride0, intptr_t sstride1);
void x265_pixel_add_ps_64x64_avx2(pixel* a, intptr_t dstride, const pixel* b0, const int16_t* b1, intptr_t sstride0, intptr_t sstride1);
+void x265_pixel_add_ps_16x32_avx2(pixel* a, intptr_t dstride, const pixel* b0, const int16_t* b1, intptr_t sstride0, intptr_t sstride1);
+void x265_pixel_add_ps_32x64_avx2(pixel* a, intptr_t dstride, const pixel* b0, const int16_t* b1, intptr_t sstride0, intptr_t sstride1);

void x265_pixel_sub_ps_16x16_avx2(int16_t* a, intptr_t dstride, const pixel* b0, const pixel* b1, intptr_t sstride0, intptr_t sstride1);
void x265_pixel_sub_ps_32x32_avx2(int16_t* a, intptr_t dstride, const pixel* b0, const pixel* b1, intptr_t sstride0, intptr_t sstride1);
void x265_pixel_sub_ps_64x64_avx2(int16_t* a, intptr_t dstride, const pixel* b0, const pixel* b1, intptr_t sstride0, intptr_t sstride1);
+void x265_pixel_sub_ps_16x32_avx2(int16_t* a, intptr_t dstride, const pixel* b0, const pixel* b1, intptr_t sstride0, intptr_t sstride1);
+void x265_pixel_sub_ps_32x64_avx2(int16_t* a, intptr_t dstride, const pixel* b0, const pixel* b1, intptr_t sstride0, intptr_t sstride1);

int x265_psyCost_pp_4x4_avx2(const pixel* source, intptr_t sstride, const pixel* recon, intptr_t rstride);
int x265_psyCost_pp_8x8_avx2(const pixel* source, intptr_t sstride, const pixel* recon, intptr_t rstride);

int x265_psyCost_ss_16x16_avx2(const int16_t* source, intptr_t sstride, const int16_t* recon, intptr_t rstride);
int x265_psyCost_ss_32x32_avx2(const int16_t* source, intptr_t sstride, const int16_t* recon, intptr_t rstride);
int x265_psyCost_ss_64x64_avx2(const int16_t* source, intptr_t sstride, const int16_t* recon, intptr_t rstride);
+void x265_weight_sp_avx2(const int16_t* src, pixel* dst, intptr_t srcStride, intptr_t dstStride, int width, int height, int w0, int round, int shift, int offset);

#undef DECL_PIXELS
#undef DECL_HEVC_SSD

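`pixel.h` now declares `x265_weight_sp_avx2`. In the AVX2 kernel in `pixel-util8.asm`, each 16-bit input is processed as follows:

1. It is biased by `0x2000` (`pw_2000`).
2. It is interleaved with the constant 1, so that one `pmaddwd` against the packed `[w0, round]` pair yields `w0 * (src + 0x2000) + round`.
3. The result is shifted right by `shift`, `offset` is added, and the value is saturated to 8 bits.

The following scalar model reproduces that arithmetic. The names are hypothetical; it assumes 8-bit output and a `srcStride` counted in `int16_t` elements, since the kernel doubles it with `add r2d, r2d`.

```c
#include <assert.h>
#include <stdint.h>

enum { BIAS_14BIT = 0x2000 };   /* the pw_2000 constant added before weighting */

static uint8_t clip_u8(int v) { return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v)); }

/* Hypothetical scalar model of weight_sp: weighted prediction from the 14-bit
 * intermediate domain to 8-bit pixels. Assumes arithmetic >> for negative
 * values, as psrad does. */
static void weight_sp_ref(const int16_t *src, uint8_t *dst, intptr_t srcStride,
                          intptr_t dstStride, int width, int height,
                          int w0, int round, int shift, int offset)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int v = w0 * (src[x] + BIAS_14BIT) + round;   /* pmaddwd of [src+0x2000, 1] by [w0, round] */
            dst[x] = clip_u8((v >> shift) + offset);      /* psrad, paddd, packssdw + packuswb */
        }
        src += srcStride;
        dst += dstStride;
    }
}
```

With `w0 = 1`, `round = 32`, `shift = 6` and `offset = 0`, the sketch maps the intermediate values of 8-bit pixels back to the same pixels. For example, -1792 maps back to 100.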
x265_1.6.tar.gz/source/common/x86/pixeladd8.asm -> x265_1.7.tar.gz/source/common/x86/pixeladd8.asm


jnz .loop
RET
+%endif
+%endmacro
+PIXEL_ADD_PS_W16_H4 16, 16
+PIXEL_ADD_PS_W16_H4 16, 32

+;-----------------------------------------------------------------------------
+; void pixel_add_ps_16x16(pixel *dest, intptr_t destride, pixel *src0, int16_t *src1, intptr_t srcStride0, intptr_t srcStride1)
+;-----------------------------------------------------------------------------
+%macro PIXEL_ADD_PS_W16_H4_avx2 1
+%if HIGH_BIT_DEPTH
+%if ARCH_X86_64
INIT_YMM avx2
-cglobal pixel_add_ps_16x%2, 6, 7, 8, dest, destride, src0, scr1, srcStride0, srcStride1
- mov r6d, %2/4
+cglobal pixel_add_ps_16x%1, 6, 10, 4, dest, destride, src0, scr1, srcStride0, srcStride1
+ mova m3, [pw_pixel_max]
+ pxor m2, m2
+ mov r6d, %1/4
+ add r4d, r4d
+ add r5d, r5d
+ add r1d, r1d
+ lea r7, [r4 * 3]
+ lea r8, [r5 * 3]
+ lea r9, [r1 * 3]
+
+.loop:
+ movu m0, [r2]
+ movu m1, [r3]
+ paddw m0, m1
+ CLIPW m0, m2, m3
+ movu [r0], m0
+
+ movu m0, [r2 + r4]
+ movu m1, [r3 + r5]
+ paddw m0, m1
+ CLIPW m0, m2, m3
+ movu [r0 + r1], m0
+
+ movu m0, [r2 + r4 * 2]
+ movu m1, [r3 + r5 * 2]
+ paddw m0, m1
+ CLIPW m0, m2, m3
+ movu [r0 + r1 * 2], m0
+
+ movu m0, [r2 + r7]
+ movu m1, [r3 + r8]
+ paddw m0, m1
+ CLIPW m0, m2, m3
+ movu [r0 + r9], m0
+
+ dec r6d
+ lea r0, [r0 + r1 * 4]
+ lea r2, [r2 + r4 * 4]
+ lea r3, [r3 + r5 * 4]
+ jnz .loop
+ RET
+%endif
+%else
+INIT_YMM avx2
+cglobal pixel_add_ps_16x%1, 6, 7, 8, dest, destride, src0, scr1, srcStride0, srcStride1
+ mov r6d, %1/4
add r5, r5
.loop:


%endif
%endmacro

-PIXEL_ADD_PS_W16_H4 16, 16
-PIXEL_ADD_PS_W16_H4 16, 32
+PIXEL_ADD_PS_W16_H4_avx2 16
+PIXEL_ADD_PS_W16_H4_avx2 32


;-----------------------------------------------------------------------------


jnz .loop
RET
+%endif
+%endmacro
+PIXEL_ADD_PS_W32_H2 32, 32
+PIXEL_ADD_PS_W32_H2 32, 64

+;-----------------------------------------------------------------------------
+; void pixel_add_ps_32x32(pixel *dest, intptr_t destride, pixel *src0, int16_t *src1, intptr_t srcStride0, intptr_t srcStride1)
+;-----------------------------------------------------------------------------
+%macro PIXEL_ADD_PS_W32_H4_avx2 1
+%if HIGH_BIT_DEPTH
+%if ARCH_X86_64
INIT_YMM avx2
-cglobal pixel_add_ps_32x%2, 6, 7, 8, dest, destride, src0, scr1, srcStride0, srcStride1
- mov r6d, %2/4
+cglobal pixel_add_ps_32x%1, 6, 10, 6, dest, destride, src0, scr1, srcStride0, srcStride1
+ mova m5, [pw_pixel_max]
+ pxor m4, m4
+ mov r6d, %1/4
+ add r4d, r4d
+ add r5d, r5d
+ add r1d, r1d
+ lea r7, [r4 * 3]
+ lea r8, [r5 * 3]
+ lea r9, [r1 * 3]
+
+.loop:
+ movu m0, [r2]
+ movu m2, [r2 + 32]
+ movu m1, [r3]
+ movu m3, [r3 + 32]
+ paddw m0, m1
+ paddw m2, m3
+ CLIPW2 m0, m2, m4, m5
+
+ movu [r0], m0
+ movu [r0 + 32], m2
+
+ movu m0, [r2 + r4]
+ movu m2, [r2 + r4 + 32]
+ movu m1, [r3 + r5]
+ movu m3, [r3 + r5 + 32]
+ paddw m0, m1
+ paddw m2, m3
+ CLIPW2 m0, m2, m4, m5
+
+ movu [r0 + r1], m0
+ movu [r0 + r1 + 32], m2
+
+ movu m0, [r2 + r4 * 2]
+ movu m2, [r2 + r4 * 2 + 32]
+ movu m1, [r3 + r5 * 2]
+ movu m3, [r3 + r5 * 2 + 32]
+ paddw m0, m1
+ paddw m2, m3
+ CLIPW2 m0, m2, m4, m5
+
+ movu [r0 + r1 * 2], m0
+ movu [r0 + r1 * 2 + 32], m2
+
+ movu m0, [r2 + r7]
+ movu m2, [r2 + r7 + 32]
+ movu m1, [r3 + r8]
+ movu m3, [r3 + r8 + 32]
+ paddw m0, m1
+ paddw m2, m3
+ CLIPW2 m0, m2, m4, m5
+
+ movu [r0 + r9], m0
+ movu [r0 + r9 + 32], m2
+
+ dec r6d
+ lea r0, [r0 + r1 * 4]
+ lea r2, [r2 + r4 * 4]
+ lea r3, [r3 + r5 * 4]
+ jnz .loop
+ RET
+%endif
+%else
+%if ARCH_X86_64
+INIT_YMM avx2
+cglobal pixel_add_ps_32x%1, 6, 10, 8, dest, destride, src0, scr1, srcStride0, srcStride1
+ mov r6d, %1/4
add r5, r5
+ lea r7, [r4 * 3]
+ lea r8, [r5 * 3]
+ lea r9, [r1 * 3]
.loop:
pmovzxbw m0, [r2] ; first half of row 0 of src0
pmovzxbw m1, [r2 + 16] ; second half of row 0 of src0

vpermq m0, m0, 11011000b
movu [r0 + r1], m0 ; row 1 of dst

- lea r2, [r2 + r4 * 2]
- lea r3, [r3 + r5 * 2]
- lea r0, [r0 + r1 * 2]
-
- pmovzxbw m0, [r2] ; first half of row 2 of src0
- pmovzxbw m1, [r2 + 16] ; second half of row 2 of src0
- movu m2, [r3] ; first half of row 2 of src1
- movu m3, [r3 + 32] ; second half of row 2 of src1
+ pmovzxbw m0, [r2 + r4 * 2] ; first half of row 2 of src0
+ pmovzxbw m1, [r2 + r4 * 2 + 16] ; second half of row 2 of src0
+ movu m2, [r3 + r5 * 2] ; first half of row 2 of src1
+ movu m3, [r3 + + r5 * 2 + 32]; second half of row 2 of src1

paddw m0, m2
paddw m1, m3
packuswb m0, m1
vpermq m0, m0, 11011000b
- movu [r0], m0 ; row 2 of dst
+ movu [r0 + r1 * 2], m0 ; row 2 of dst

- pmovzxbw m0, [r2 + r4] ; first half of row 3 of src0
- pmovzxbw m1, [r2 + r4 + 16] ; second half of row 3 of src0
- movu m2, [r3 + r5] ; first half of row 3 of src1
- movu m3, [r3 + r5 + 32] ; second half of row 3 of src1

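The new high-bit-depth `pixel_add_ps` paths add a 16-bit residual to the prediction and clamp the result to `[0, pw_pixel_max]` with `CLIPW`/`CLIPW2`. The 8-bit path gets the same clamp to `[0, 255]` from `packuswb`. The high-bit-depth paths also double all three strides up front (`add r1d, r1d` and so on), which indicates the strides are passed in elements.

The following scalar model uses a hypothetical name and takes the maximum pixel value as a parameter. For example, 1023 would be the maximum for 10-bit content.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of pixel_add_ps for 16-bit pixel storage:
 * dst = clamp(src0 + src1, 0, pixelMax); all strides in elements. */
static void pixel_add_ps_ref(uint16_t *dst, intptr_t dstStride,
                             const uint16_t *src0, const int16_t *src1,
                             intptr_t srcStride0, intptr_t srcStride1,
                             int width, int height, int pixelMax)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int v = src0[x] + src1[x];                                    /* paddw */
            dst[x] = (uint16_t)(v < 0 ? 0 : (v > pixelMax ? pixelMax : v)); /* CLIPW */
        }
        dst += dstStride;
        src0 += srcStride0;
        src1 += srcStride1;
    }
}
```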
x265_1.6.tar.gz/source/common/x86/sad-a.asm -> x265_1.7.tar.gz/source/common/x86/sad-a.asm

RET

INIT_YMM avx2
-cglobal pixel_sad_32x24, 4,5,6
+cglobal pixel_sad_32x24, 4,7,6
xorps m0, m0
xorps m5, m5
mov r4d, 6
+ lea r5, [r1 * 3]
+ lea r6, [r3 * 3]
.loop
movu m1, [r0] ; row 0 of pix0
movu m2, [r2] ; row 0 of pix1

paddd m0, m1
paddd m5, m3

- lea r2, [r2 + 2 * r3]
- lea r0, [r0 + 2 * r1]
-
- movu m1, [r0] ; row 2 of pix0
- movu m2, [r2] ; row 2 of pix1
- movu m3, [r0 + r1] ; row 3 of pix0
- movu m4, [r2 + r3] ; row 3 of pix1
+ movu m1, [r0 + 2 * r1] ; row 2 of pix0
+ movu m2, [r2 + 2 * r3] ; row 2 of pix1
+ movu m3, [r0 + r5] ; row 3 of pix0
+ movu m4, [r2 + r6] ; row 3 of pix1

psadbw m1, m2
psadbw m3, m4
paddd m0, m1
paddd m5, m3

- lea r2, [r2 + 2 * r3]
- lea r0, [r0 + 2 * r1]
+ lea r2, [r2 + 4 * r3]
+ lea r0, [r0 + 4 * r1]

dec r4d
jnz .loop

RET

INIT_YMM avx2
-cglobal pixel_sad_64x48, 4,5,6
+cglobal pixel_sad_64x48, 4,7,6
xorps m0, m0
xorps m5, m5
- mov r4d, 24
+ mov r4d, 12
+ lea r5, [r1 * 3]
+ lea r6, [r3 * 3]
.loop
movu m1, [r0] ; first 32 of row 0 of pix0
movu m2, [r2] ; first 32 of row 0 of pix1

paddd m0, m1
paddd m5, m3

- lea r2, [r2 + 2 * r3]
- lea r0, [r0 + 2 * r1]
+ movu m1, [r0 + 2 * r1] ; first 32 of row 2 of pix0
+ movu m2, [r2 + 2 * r3] ; first 32 of row 2 of pix1
+ movu m3, [r0 + 2 * r1 + 32] ; second 32 of row 2 of pix0
+ movu m4, [r2 + 2 * r3 + 32] ; second 32 of row 2 of pix1
+
+ psadbw m1, m2
+ psadbw m3, m4
+ paddd m0, m1
+ paddd m5, m3
+
+ movu m1, [r0 + r5] ; first 32 of row 3 of pix0
+ movu m2, [r2 + r6] ; first 32 of row 3 of pix1
+ movu m3, [r0 + 32 + r5] ; second 32 of row 3 of pix0
+ movu m4, [r2 + 32 + r6] ; second 32 of row 3 of pix1
+
+ psadbw m1, m2
+ psadbw m3, m4
+ paddd m0, m1
+ paddd m5, m3
+
+ lea r2, [r2 + 4 * r3]
+ lea r0, [r0 + 4 * r1]

dec r4d
jnz .loop

RET

INIT_YMM avx2
-cglobal pixel_sad_64x64, 4,5,6
+cglobal pixel_sad_64x64, 4,7,6
xorps m0, m0
xorps m5, m5
mov r4d, 8
+ lea r5, [r1 * 3]
+ lea r6, [r3 * 3]
.loop
movu m1, [r0] ; first 32 of row 0 of pix0
movu m2, [r2] ; first 32 of row 0 of pix1

paddd m0, m1
paddd m5, m3

- lea r2, [r2 + 2 * r3]
- lea r0, [r0 + 2 * r1]
-
- movu m1, [r0] ; first 32 of row 2 of pix0
- movu m2, [r2] ; first 32 of row 2 of pix1
- movu m3, [r0 + 32] ; second 32 of row 2 of pix0
- movu m4, [r2 + 32] ; second 32 of row 2 of pix1
+ movu m1, [r0 + 2 * r1] ; first 32 of row 2 of pix0
+ movu m2, [r2 + 2 * r3] ; first 32 of row 2 of pix1
+ movu m3, [r0 + 2 * r1 + 32] ; second 32 of row 2 of pix0
+ movu m4, [r2 + 2 * r3 + 32] ; second 32 of row 2 of pix1

psadbw m1, m2
psadbw m3, m4
paddd m0, m1
paddd m5, m3

- movu m1, [r0 + r1] ; first 32 of row 3 of pix0
- movu m2, [r2 + r3] ; first 32 of row 3 of pix1
- movu m3, [r0 + 32 + r1] ; second 32 of row 3 of pix0
- movu m4, [r2 + 32 + r3] ; second 32 of row 3 of pix1
+ movu m1, [r0 + r5] ; first 32 of row 3 of pix0
+ movu m2, [r2 + r6] ; first 32 of row 3 of pix1
+ movu m3, [r0 + 32 + r5] ; second 32 of row 3 of pix0
+ movu m4, [r2 + 32 + r6] ; second 32 of row 3 of pix1

psadbw m1, m2
psadbw m3, m4
paddd m0, m1
paddd m5, m3

- lea r2, [r2 + 2 * r3]
- lea r0, [r0 + 2 * r1]
+ lea r2, [r2 + 4 * r3]
+ lea r0, [r0 + 4 * r1]

movu m1, [r0] ; first 32 of row 4 of pix0
movu m2, [r2] ; first 32 of row 4 of pix1

paddd m0, m1
paddd m5, m3

- lea r2, [r2 + 2 * r3]
- lea r0, [r0 + 2 * r1]
-
- movu m1, [r0] ; first 32 of row 6 of pix0
- movu m2, [r2] ; first 32 of row 6 of pix1
- movu m3, [r0 + 32] ; second 32 of row 6 of pix0
- movu m4, [r2 + 32] ; second 32 of row 6 of pix1
+ movu m1, [r0 + 2 * r1] ; first 32 of row 6 of pix0
+ movu m2, [r2 + 2 * r3] ; first 32 of row 6 of pix1
+ movu m3, [r0 + 2 * r1 + 32] ; second 32 of row 6 of pix0
+ movu m4, [r2 + 2 * r3 + 32] ; second 32 of row 6 of pix1

psadbw m1, m2
psadbw m3, m4
paddd m0, m1
paddd m5, m3

- movu m1, [r0 + r1] ; first 32 of row 7 of pix0
- movu m2, [r2 + r3] ; first 32 of row 7 of pix1
- movu m3, [r0 + 32 + r1] ; second 32 of row 7 of pix0
- movu m4, [r2 + 32 + r3] ; second 32 of row 7 of pix1
+ movu m1, [r0 + r5] ; first 32 of row 7 of pix0
+ movu m2, [r2 + r6] ; first 32 of row 7 of pix1
+ movu m3, [r0 + 32 + r5] ; second 32 of row 7 of pix0
+ movu m4, [r2 + 32 + r6] ; second 32 of row 7 of pix1

psadbw m1, m2
psadbw m3, m4
paddd m0, m1
paddd m5, m3

- lea r2, [r2 + 2 * r3]
- lea r0, [r0 + 2 * r1]
+ lea r2, [r2 + 4 * r3]
+ lea r0, [r0 + 4 * r1]

dec r4d
jnz .loop

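The `sad-a.asm` changes keep the arithmetic and change only the addressing. `r5` and `r6` hold `3 * stride`, so each loop iteration reads four rows (`[r0]`, `[r0 + r1]`, `[r0 + 2 * r1]`, `[r0 + r5]`) and then advances the pointers once by `4 * stride`. This is why `pixel_sad_64x48` now runs 12 iterations instead of 24.

The following scalar sketch, with hypothetical names and 8-bit pixels, checks this four-row walk against a plain SAD.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Plain sum of absolute differences over a width x height block. */
static int sad_ref(const uint8_t *p0, intptr_t s0, const uint8_t *p1, intptr_t s1,
                   int width, int height)
{
    int sum = 0;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            sum += abs(p0[x] - p1[x]);
        p0 += s0;
        p1 += s1;
    }
    return sum;
}

/* Same result, four rows per iteration with a precomputed 3 * stride,
 * like the revised AVX2 loops. height must be a multiple of 4. */
static int sad_4rows(const uint8_t *p0, intptr_t s0, const uint8_t *p1, intptr_t s1,
                     int width, int height)
{
    const intptr_t s0x3 = 3 * s0, s1x3 = 3 * s1;   /* lea r5, [r1 * 3] / lea r6, [r3 * 3] */
    int sum = 0;
    for (int y = 0; y < height; y += 4) {
        for (int x = 0; x < width; x++) {
            sum += abs(p0[x] - p1[x]);
            sum += abs(p0[x + s0] - p1[x + s1]);
            sum += abs(p0[x + 2 * s0] - p1[x + 2 * s1]);
            sum += abs(p0[x + s0x3] - p1[x + s1x3]);
        }
        p0 += 4 * s0;   /* one pointer bump per four rows */
        p1 += 4 * s1;
    }
    return sum;
}
```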
x265_1.6.tar.gz/source/common/x86/sad16-a.asm -> x265_1.7.tar.gz/source/common/x86/sad16-a.asm
Changed
132
1
2
ABSW2 m3, m4, m3, m4, m7, m5
3
paddw m1, m2
4
paddw m3, m4
5
- paddw m3, m1
6
- pmaddwd m3, [pw_1]
7
- paddd m0, m3
8
+ paddw m0, m1
9
+ paddw m0, m3
10
%else
11
movu m1, [r2]
12
movu m2, [r2+2*r3]
13
14
ABSW2 m1, m2, m1, m2, m3, m4
15
lea r0, [r0+4*r1]
16
lea r2, [r2+4*r3]
17
- paddw m2, m1
18
- pmaddwd m2, [pw_1]
19
- paddd m0, m2
20
+ paddw m0, m1
21
+ paddw m0, m2
22
%endif
23
%endmacro
24
25
-;-----------------------------------------------------------------------------
26
-; int pixel_sad_NxM( uint16_t *, intptr_t, uint16_t *, intptr_t )
27
-;-----------------------------------------------------------------------------
+%macro SAD_INC_2ROW_Nx64 1
+%if 2*%1 > mmsize
+ movu m1, [r2 + 0]
+ movu m2, [r2 + 16]
+ movu m3, [r2 + 2 * r3 + 0]
+ movu m4, [r2 + 2 * r3 + 16]
+ psubw m1, [r0 + 0]
+ psubw m2, [r0 + 16]
+ psubw m3, [r0 + 2 * r1 + 0]
+ psubw m4, [r0 + 2 * r1 + 16]
+ ABSW2 m1, m2, m1, m2, m5, m6
+ lea r0, [r0 + 4 * r1]
+ lea r2, [r2 + 4 * r3]
+ ABSW2 m3, m4, m3, m4, m7, m5
+ paddw m1, m2
+ paddw m3, m4
+ paddw m0, m1
+ paddw m8, m3
+%else
+ movu m1, [r2]
+ movu m2, [r2 + 2 * r3]
+ psubw m1, [r0]
+ psubw m2, [r0 + 2 * r1]
+ ABSW2 m1, m2, m1, m2, m3, m4
+ lea r0, [r0 + 4 * r1]
+ lea r2, [r2 + 4 * r3]
+ paddw m0, m1
+ paddw m8, m2
+%endif
+%endmacro
+
+; ---------------------------------------------------------------------------- -
+; int pixel_sad_NxM(uint16_t *, intptr_t, uint16_t *, intptr_t)
+; ---------------------------------------------------------------------------- -
%macro SAD 2
cglobal pixel_sad_%1x%2, 4,5-(%2&4/4),8*(%1/mmsize)
pxor m0, m0

dec r4d
jg .loop
%endif
+%if %2 == 32
+ HADDUWD m0, m1
+ HADDD m0, m1
+%else
+ HADDW m0, m1
+%endif
+ movd eax, xm0
+ RET
+%endmacro

+; ---------------------------------------------------------------------------- -
+; int pixel_sad_Nx64(uint16_t *, intptr_t, uint16_t *, intptr_t)
+; ---------------------------------------------------------------------------- -
+%macro SAD_Nx64 1
+cglobal pixel_sad_%1x64, 4,5-(64&4/4), 9
+ pxor m0, m0
+ pxor m8, m8
+ mov r4d, 64 / 2
+.loop:
+ SAD_INC_2ROW_Nx64 %1
+ dec r4d
+ jg .loop
+
+ HADDUWD m0, m1
+ HADDUWD m8, m1
HADDD m0, m1
+ HADDD m8, m1
+ paddd m0, m8
+
movd eax, xm0
RET
%endmacro

SAD 16, 12
SAD 16, 16
SAD 16, 32
-SAD 16, 64
+SAD_Nx64 16

INIT_XMM sse2
SAD 8, 4

SAD 8, 16
SAD 8, 32

+INIT_YMM avx2
+SAD 16, 4
+SAD 16, 8
+SAD 16, 12
+SAD 16, 16
+SAD 16, 32
+
;------------------------------------------------------------------
; int pixel_sad_32xN( uint16_t *, intptr_t, uint16_t *, intptr_t )
;------------------------------------------------------------------

%endif
movd eax, xm0
RET
-
;-----------------------------------------------------------------------------
; void pixel_sad_xN_WxH( uint16_t *fenc, uint16_t *pix0, uint16_t *pix1,
; uint16_t *pix2, intptr_t i_stride, int scores[3] )
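The new SAD_Nx64 macro above uses two 16-bit accumulators: m0 takes the first row of each pair and m8 the second. It widens both with HADDUWD before adding them. A plausible reason is word-lane headroom. This assumes 10-bit samples, so the largest absolute difference is 1023, and 16-byte XMM registers with eight 16-bit lanes. In the 16x64 case, each lane of one accumulator collects 2 differences per row pair over 32 row pairs, which just fits in an unsigned 16-bit word. A single shared accumulator would collect twice as many and could wrap. The C++ sketch below checks that arithmetic at compile time. It also provides a scalar reference (the hypothetical `sadReference`, not x265's own C primitive) that a 16x64 kernel result could be compared against.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Assumption: 10-bit samples, so |fenc - ref| is at most 1023.
constexpr uint32_t kMaxAbsDiff = (1u << 10) - 1;

// 16x64 with eight 16-bit lanes per register: each row adds two differences
// to a lane (columns x and x + 8), and the loop runs over 32 row pairs.
constexpr uint32_t kDiffsPerLaneSplit = 2 * 32;                   // m0 or m8 alone
constexpr uint32_t kDiffsPerLaneSingle = 2 * kDiffsPerLaneSplit;  // one shared accumulator

static_assert(kMaxAbsDiff * kDiffsPerLaneSplit <= 0xFFFF,
              "two accumulators: every lane fits in an unsigned 16-bit word");
static_assert(kMaxAbsDiff * kDiffsPerLaneSingle > 0xFFFF,
              "one accumulator: a lane could exceed 16 bits");

// Scalar reference SAD for 16-bit pixels. Strides are in pixels; the asm
// multiplies r1/r3 by 2 because it addresses bytes.
int sadReference(const uint16_t* fenc, intptr_t fencStride,
                 const uint16_t* ref, intptr_t refStride,
                 int width, int height)
{
    int sum = 0;
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
            sum += std::abs(int(fenc[x]) - int(ref[x]));
        fenc += fencStride;
        ref += refStride;
    }
    return sum;
}
```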
x265_1.6.tar.gz/source/common/x86/x86inc.asm -> x265_1.7.tar.gz/source/common/x86/x86inc.asm
Changed
%define mangle(x) x
%endif

-%macro SECTION_RODATA 0-1 16
+%macro SECTION_RODATA 0-1 32
SECTION .rodata align=%1
%endmacro

%else
global %1
%endif
+ ALIGN 32
%1: %2
%endmacro
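SECTION_RODATA now defaults to 32-byte alignment, and every const label is preceded by ALIGN 32. This presumably serves the AVX2 (INIT_YMM) kernels added in this release. A 256-bit YMM register holds 32 bytes, and aligned 256-bit loads such as vmovdqa fault on addresses that are not 32-byte aligned. A C++ analogue uses `alignas(32)` on a constant table; the table name below is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constant table: 16 x uint16_t = 32 bytes, one YMM register.
// alignas(32) plays the role of "ALIGN 32" before the label.
alignas(32) static const uint16_t kDemoWordOnes[16] = {
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
};

static_assert(sizeof(kDemoWordOnes) == 32, "exactly one 256-bit register of data");

// True when p satisfies the 32-byte alignment an aligned 256-bit load needs.
bool isYmmAligned(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 32 == 0;
}
```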
x265_1.6.tar.gz/source/encoder/CMakeLists.txt -> x265_1.7.tar.gz/source/encoder/CMakeLists.txt
Changed
# vim: syntax=cmake

if(GCC)
- add_definitions(-Wno-uninitialized)
+ add_definitions(-Wno-uninitialized)
+ if(CC_HAS_NO_STRICT_OVERFLOW)
+ # GCC 4.9.2 gives warnings we know we can ignore in this file
+ set_source_files_properties(slicetype.cpp PROPERTIES COMPILE_FLAGS -Wno-strict-overflow)
+ endif(CC_HAS_NO_STRICT_OVERFLOW)
endif()
if(MSVC)
add_definitions(/wd4701) # potentially uninitialized local variable 'foo' used
x265_1.6.tar.gz/source/encoder/analysis.cpp -> x265_1.7.tar.gz/source/encoder/analysis.cpp
Changed
for (uint32_t i = 0; i <= g_maxCUDepth; i++)
for (uint32_t j = 0; j < MAX_PRED_TYPES; j++)
m_modeDepth[i].pred[j].invalidate();
-#endif
invalidateContexts(0);
- m_quant.setQPforQuant(ctu);
+#endif
+
+ int qp = setLambdaFromQP(ctu, m_slice->m_pps->bUseDQP ? calculateQpforCuSize(ctu, cuGeom) : m_slice->m_sliceQp);
+ ctu.setQPSubParts((int8_t)qp, 0, 0);
+
m_rqt[0].cur.load(initialContext);
m_modeDepth[0].fencYuv.copyFromPicYuv(*m_frame->m_fencPic, ctu.m_cuAddr, 0);

if (m_param->analysisMode)
{
if (m_slice->m_sliceType == I_SLICE)
- m_reuseIntraDataCTU = (analysis_intra_data *)m_frame->m_analysisData.intraData;
+ m_reuseIntraDataCTU = (analysis_intra_data*)m_frame->m_analysisData.intraData;
else
{
int numPredDir = m_slice->isInterP() ? 1 : 2;
- m_reuseInterDataCTU = (analysis_inter_data *)m_frame->m_analysisData.interData;
+ m_reuseInterDataCTU = (analysis_inter_data*)m_frame->m_analysisData.interData;
m_reuseRef = &m_reuseInterDataCTU->ref[ctu.m_cuAddr * X265_MAX_PRED_MODE_PER_CTU * numPredDir];
m_reuseBestMergeCand = &m_reuseInterDataCTU->bestMergeCand[ctu.m_cuAddr * CUGeom::MAX_GEOMS];
}

uint32_t zOrder = 0;
if (m_slice->m_sliceType == I_SLICE)
{
- compressIntraCU(ctu, cuGeom, zOrder);
+ compressIntraCU(ctu, cuGeom, zOrder, qp);
if (m_param->analysisMode == X265_ANALYSIS_SAVE && m_frame->m_analysisData.intraData)
{
- CUData *bestCU = &m_modeDepth[0].bestMode->cu;
+ CUData* bestCU = &m_modeDepth[0].bestMode->cu;
memcpy(&m_reuseIntraDataCTU->depth[ctu.m_cuAddr * numPartition], bestCU->m_cuDepth, sizeof(uint8_t) * numPartition);
memcpy(&m_reuseIntraDataCTU->modes[ctu.m_cuAddr * numPartition], bestCU->m_lumaIntraDir, sizeof(uint8_t) * numPartition);
memcpy(&m_reuseIntraDataCTU->partSizes[ctu.m_cuAddr * numPartition], bestCU->m_partSize, sizeof(uint8_t) * numPartition);

* they are available for intra predictions */
m_modeDepth[0].fencYuv.copyToPicYuv(*m_frame->m_reconPic, ctu.m_cuAddr, 0);

- compressInterCU_rd0_4(ctu, cuGeom);
+ compressInterCU_rd0_4(ctu, cuGeom, qp);

/* generate residual for entire CTU at once and copy to reconPic */
encodeResidue(ctu, cuGeom);
}
else if (m_param->bDistributeModeAnalysis && m_param->rdLevel >= 2)
- compressInterCU_dist(ctu, cuGeom);
+ compressInterCU_dist(ctu, cuGeom, qp);
else if (m_param->rdLevel <= 4)
- compressInterCU_rd0_4(ctu, cuGeom);
+ compressInterCU_rd0_4(ctu, cuGeom, qp);
else
{
- compressInterCU_rd5_6(ctu, cuGeom, zOrder);
+ compressInterCU_rd5_6(ctu, cuGeom, zOrder, qp);
if (m_param->analysisMode == X265_ANALYSIS_SAVE && m_frame->m_analysisData.interData)
{
- CUData *bestCU = &m_modeDepth[0].bestMode->cu;
+ CUData* bestCU = &m_modeDepth[0].bestMode->cu;
memcpy(&m_reuseInterDataCTU->depth[ctu.m_cuAddr * numPartition], bestCU->m_cuDepth, sizeof(uint8_t) * numPartition);
memcpy(&m_reuseInterDataCTU->modes[ctu.m_cuAddr * numPartition], bestCU->m_predMode, sizeof(uint8_t) * numPartition);
}

return;
else if (md.bestMode->cu.isIntra(0))
{
+ m_quant.m_tqBypass = true;
md.pred[PRED_LOSSLESS].initCosts();
md.pred[PRED_LOSSLESS].cu.initLosslessCU(md.bestMode->cu, cuGeom);
PartSize size = (PartSize)md.pred[PRED_LOSSLESS].cu.m_partSize[0];
uint8_t* modes = md.pred[PRED_LOSSLESS].cu.m_lumaIntraDir;
checkIntra(md.pred[PRED_LOSSLESS], cuGeom, size, modes, NULL);
checkBestMode(md.pred[PRED_LOSSLESS], cuGeom.depth);
+ m_quant.m_tqBypass = false;
}
else
{
+ m_quant.m_tqBypass = true;
md.pred[PRED_LOSSLESS].initCosts();
md.pred[PRED_LOSSLESS].cu.initLosslessCU(md.bestMode->cu, cuGeom);
md.pred[PRED_LOSSLESS].predYuv.copyFromYuv(md.bestMode->predYuv);
encodeResAndCalcRdInterCU(md.pred[PRED_LOSSLESS], cuGeom);
checkBestMode(md.pred[PRED_LOSSLESS], cuGeom.depth);
+ m_quant.m_tqBypass = false;
}
}

-void Analysis::compressIntraCU(const CUData& parentCTU, const CUGeom& cuGeom, uint32_t& zOrder)
+void Analysis::compressIntraCU(const CUData& parentCTU, const CUGeom& cuGeom, uint32_t& zOrder, int32_t qp)
{
uint32_t depth = cuGeom.depth;
ModeDepth& md = m_modeDepth[depth];

if (mightNotSplit && depth == reuseDepth[zOrder] && zOrder == cuGeom.absPartIdx)
{
- m_quant.setQPforQuant(parentCTU);
-
PartSize size = (PartSize)reusePartSizes[zOrder];
Mode& mode = size == SIZE_2Nx2N ? md.pred[PRED_INTRA] : md.pred[PRED_INTRA_NxN];
- mode.cu.initSubCU(parentCTU, cuGeom);
+ mode.cu.initSubCU(parentCTU, cuGeom, qp);
checkIntra(mode, cuGeom, size, &reuseModes[zOrder], &reuseChromaModes[zOrder]);
checkBestMode(mode, depth);

}
else if (mightNotSplit)
{
- m_quant.setQPforQuant(parentCTU);
-
- md.pred[PRED_INTRA].cu.initSubCU(parentCTU, cuGeom);
+ md.pred[PRED_INTRA].cu.initSubCU(parentCTU, cuGeom, qp);
checkIntra(md.pred[PRED_INTRA], cuGeom, SIZE_2Nx2N, NULL, NULL);
checkBestMode(md.pred[PRED_INTRA], depth);

if (cuGeom.log2CUSize == 3 && m_slice->m_sps->quadtreeTULog2MinSize < 3)
{
- md.pred[PRED_INTRA_NxN].cu.initSubCU(parentCTU, cuGeom);
+ md.pred[PRED_INTRA_NxN].cu.initSubCU(parentCTU, cuGeom, qp);
checkIntra(md.pred[PRED_INTRA_NxN], cuGeom, SIZE_NxN, NULL, NULL);
checkBestMode(md.pred[PRED_INTRA_NxN], depth);
}

Mode* splitPred = &md.pred[PRED_SPLIT];
splitPred->initCosts();
CUData* splitCU = &splitPred->cu;
- splitCU->initSubCU(parentCTU, cuGeom);
+ splitCU->initSubCU(parentCTU, cuGeom, qp);

uint32_t nextDepth = depth + 1;
ModeDepth& nd = m_modeDepth[nextDepth];
invalidateContexts(nextDepth);
Entropy* nextContext = &m_rqt[depth].cur;
+ int32_t nextQP = qp;

for (uint32_t subPartIdx = 0; subPartIdx < 4; subPartIdx++)
{

{
m_modeDepth[0].fencYuv.copyPartToYuv(nd.fencYuv, childGeom.absPartIdx);
m_rqt[nextDepth].cur.load(*nextContext);
- compressIntraCU(parentCTU, childGeom, zOrder);
+
+ if (m_slice->m_pps->bUseDQP && nextDepth <= m_slice->m_pps->maxCuDQPDepth)
+ nextQP = setLambdaFromQP(parentCTU, calculateQpforCuSize(parentCTU, childGeom));
+
+ compressIntraCU(parentCTU, childGeom, zOrder, nextQP);

// Save best CU and pred data for this sub CU
splitCU->copyPartFrom(nd.bestMode->cu, childGeom, subPartIdx);

else
updateModeCost(*splitPred);

- checkDQPForSplitPred(splitPred->cu, cuGeom);
+ checkDQPForSplitPred(*splitPred, cuGeom);
checkBestMode(*splitPred, depth);
}

}

ModeDepth& md = m_modeDepth[pmode.cuGeom.depth];
- bool bMergeOnly = pmode.cuGeom.log2CUSize == 6;

/* setup slave Analysis */
if (&slave != this)
{
slave.m_slice = m_slice;
slave.m_frame = m_frame;
- slave.setQP(*m_slice, m_rdCost.m_qp);
+ slave.m_param = m_param;
+ slave.setLambdaFromQP(md.pred[PRED_2Nx2N].cu, m_rdCost.m_qp);
slave.invalidateContexts(0);
-
- if (m_param->rdLevel >= 5)
- {
- slave.m_rqt[pmode.cuGeom.depth].cur.load(m_rqt[pmode.cuGeom.depth].cur);
- slave.m_quant.setQPforQuant(md.pred[PRED_2Nx2N].cu);
- }
+ slave.m_rqt[pmode.cuGeom.depth].cur.load(m_rqt[pmode.cuGeom.depth].cur);
}

-
/* perform Mode task, repeat until no more work is available */
do
{

switch (pmode.modes[task])
{
case PRED_INTRA:
- if (&slave != this)
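In compressIntraCU(), the QP is now passed down the recursion as an argument (for example `initSubCU(parentCTU, cuGeom, qp)`) instead of being re-read through `m_quant.setQPforQuant()`. A child CU gets a newly computed QP only when delta QP is enabled and `nextDepth <= maxCuDQPDepth`; otherwise it inherits the parent's QP. The sketch below models only that control flow. `CuNode` and `QpRecord` are hypothetical types, and the caller-supplied `qpForCu` stands in for `calculateQpforCuSize()`. Lambda setup and mode decisions are left out.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for CUGeom: depth plus grid position at that depth.
struct CuNode { int depth; int x; int y; };

// Records the QP each CU would be analysed with.
struct QpRecord { int depth; int x; int y; int qp; };

void analyseIntraCu(const CuNode& cu, int qp, int maxDepth,
                    bool useDQP, int maxCuDQPDepth,
                    const std::function<int(const CuNode&)>& qpForCu,
                    std::vector<QpRecord>& out)
{
    out.push_back({cu.depth, cu.x, cu.y, qp});   // this CU uses the qp it was given
    if (cu.depth == maxDepth)
        return;                                  // smallest CU size reached, no split

    int nextDepth = cu.depth + 1;
    int nextQP = qp;                             // children inherit by default
    for (int i = 0; i < 4; i++)
    {
        CuNode child{nextDepth, cu.x * 2 + (i & 1), cu.y * 2 + (i >> 1)};
        if (useDQP && nextDepth <= maxCuDQPDepth)
            nextQP = qpForCu(child);             // child is within the delta-QP depth limit
        analyseIntraCu(child, nextQP, maxDepth, useDQP, maxCuDQPDepth, qpForCu, out);
    }
}
```

With `maxCuDQPDepth = 1`, depth-1 CUs get their own QP and depth-2 CUs reuse their parent's. With delta QP off, every CU keeps the value passed in at the top (the slice QP in the diff).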
x265_1.6.tar.gz/source/encoder/analysis.h -> x265_1.7.tar.gz/source/encoder/analysis.h
Changed
uint32_t* m_reuseBestMergeCand;

/* full analysis for an I-slice CU */
- void compressIntraCU(const CUData& parentCTU, const CUGeom& cuGeom, uint32_t &zOrder);
+ void compressIntraCU(const CUData& parentCTU, const CUGeom& cuGeom, uint32_t &zOrder, int32_t qp);

/* full analysis for a P or B slice CU */
- void compressInterCU_dist(const CUData& parentCTU, const CUGeom& cuGeom);
- void compressInterCU_rd0_4(const CUData& parentCTU, const CUGeom& cuGeom);
- void compressInterCU_rd5_6(const CUData& parentCTU, const CUGeom& cuGeom, uint32_t &zOrder);
+ void compressInterCU_dist(const CUData& parentCTU, const CUGeom& cuGeom, int32_t qp);
+ void compressInterCU_rd0_4(const CUData& parentCTU, const CUGeom& cuGeom, int32_t qp);
+ void compressInterCU_rd5_6(const CUData& parentCTU, const CUGeom& cuGeom, uint32_t &zOrder, int32_t qp);

/* measure merge and skip */
void checkMerge2Nx2N_rd0_4(Mode& skip, Mode& merge, const CUGeom& cuGeom);

/* measure inter options */
void checkInter_rd0_4(Mode& interMode, const CUGeom& cuGeom, PartSize partSize);
- void checkInter_rd5_6(Mode& interMode, const CUGeom& cuGeom, PartSize partSize, bool bMergeOnly);
+ void checkInter_rd5_6(Mode& interMode, const CUGeom& cuGeom, PartSize partSize);

void checkBidir2Nx2N(Mode& inter2Nx2N, Mode& bidir2Nx2N, const CUGeom& cuGeom);

/* generate residual and recon pixels for an entire CTU recursively (RD0) */
void encodeResidue(const CUData& parentCTU, const CUGeom& cuGeom);

- int calculateQpforCuSize(CUData& ctu, const CUGeom& cuGeom);
+ int calculateQpforCuSize(const CUData& ctu, const CUGeom& cuGeom);

/* check whether current mode is the new best */
inline void checkBestMode(Mode& mode, uint32_t depth)
x265_1.6.tar.gz/source/encoder/api.cpp -> x265_1.7.tar.gz/source/encoder/api.cpp
Changed
if (!p)
return NULL;

- x265_param *param = X265_MALLOC(x265_param, 1);
- if (!param)
- return NULL;
+ Encoder* encoder = NULL;
+ x265_param* param = x265_param_alloc();
+ x265_param* latestParam = x265_param_alloc();
+ if (!param || !latestParam)
+ goto fail;

memcpy(param, p, sizeof(x265_param));
x265_log(param, X265_LOG_INFO, "HEVC encoder version %s\n", x265_version_str);

x265_setup_primitives(param, param->cpuid);

if (x265_check_params(param))
- return NULL;
+ goto fail;

if (x265_set_globals(param))
- return NULL;
+ goto fail;

- Encoder *encoder = new Encoder;
+ encoder = new Encoder;
if (!param->rc.bEnableSlowFirstPass)
x265_param_apply_fastfirstpass(param);

// may change params for auto-detect, etc
encoder->configure(param);
-
// may change rate control and CPB params
if (!enforceLevel(*param, encoder->m_vps))
- {
- delete encoder;
- return NULL;
- }
+ goto fail;

// will detect and set profile/tier/level in VPS
determineLevel(*param, encoder->m_vps);

- encoder->create();
- if (encoder->m_aborted)
+ if (!param->bAllowNonConformance && encoder->m_vps.ptl.profileIdc == Profile::NONE)
{
- delete encoder;
- return NULL;
+ x265_log(param, X265_LOG_INFO, "non-conformant bitstreams not allowed (--allow-non-conformance)\n");
+ goto fail;
}

- x265_print_params(param);
+ encoder->create();
+ encoder->m_latestParam = latestParam;
+ memcpy(latestParam, param, sizeof(x265_param));
+ if (encoder->m_aborted)
+ goto fail;

+ x265_print_params(param);
return encoder;
+
+fail:
+ delete encoder;
+ x265_param_free(param);
+ x265_param_free(latestParam);
+ return NULL;
}

extern "C"

}

extern "C"
+int x265_encoder_reconfig(x265_encoder* enc, x265_param* param_in)
+{
+ if (!enc || !param_in)
+ return -1;
+
+ x265_param save;
+ Encoder* encoder = static_cast<Encoder*>(enc);
+ memcpy(&save, encoder->m_latestParam, sizeof(x265_param));
+ int ret = encoder->reconfigureParam(encoder->m_latestParam, param_in);
+ if (ret)
+ /* reconfigure failed, recover saved param set */
+ memcpy(encoder->m_latestParam, &save, sizeof(x265_param));
+ else
+ {
+ encoder->m_reconfigured = true;
+ x265_print_reconfigured_params(&save, encoder->m_latestParam);
+ }
+ return ret;
+}
+
+extern "C"
int x265_encoder_encode(x265_encoder *enc, x265_nal **pp_nal, uint32_t *pi_nal, x265_picture *pic_in, x265_picture *pic_out)
{
if (!enc)

{
Encoder *encoder = static_cast<Encoder*>(enc);

- encoder->stop();
+ encoder->stopJobs();
encoder->printSummary();
encoder->destroy();
delete encoder;
+ ATOMIC_DEC(&g_ctuSizeConfigured);
}
}

extern "C"
void x265_cleanup(void)
{
- BitCost::destroy();
- CUData::s_partSet[0] = NULL; /* allow CUData to adjust to new CTU size */
- g_ctuSizeConfigured = 0;
+ if (!g_ctuSizeConfigured)
+ {
+ BitCost::destroy();
+ CUData::s_partSet[0] = NULL; /* allow CUData to adjust to new CTU size */
+ }
}

extern "C"

&x265_picture_init,
&x265_encoder_open,
&x265_encoder_parameters,
+ &x265_encoder_reconfig,
&x265_encoder_headers,
&x265_encoder_encode,
&x265_encoder_get_stats,

x265_max_bit_depth,
};

+typedef const x265_api* (*api_get_func)(int bitDepth);
+
+#define xstr(s) str(s)
+#define str(s) #s
+
+#if _WIN32
+#define ext ".dll"
+#elif MACOS
+#include <dlfcn.h>
+#define ext ".dylib"
+#else
+#include <dlfcn.h>
+#define ext ".so"
+#endif
+
extern "C"
const x265_api* x265_api_get(int bitDepth)
{
if (bitDepth && bitDepth != X265_DEPTH)
- return NULL;
+ {
+ const char* libname = NULL;
+ const char* method = "x265_api_get_" xstr(X265_BUILD);
+
+ if (bitDepth == 12)
+ libname = "libx265_main12" ext;
+ else if (bitDepth == 10)
+ libname = "libx265_main10" ext;
+ else if (bitDepth == 8)
+ libname = "libx265_main" ext;
+ else
+ return NULL;
+
+ const x265_api* api = NULL;
+
+#if _WIN32
+ HMODULE h = LoadLibraryA(libname);
+ if (h)
+ {
+ api_get_func get = (api_get_func)GetProcAddress(h, method);
+ if (get)
+ api = get(0);
+ }
+#else
+ void* h = dlopen(libname, RTLD_LAZY | RTLD_LOCAL);
+ if (h)
+ {
+ api_get_func get = (api_get_func)dlsym(h, method);
+ if (get)
+ api = get(0);
+ }
+#endif
+
+ if (api && bitDepth != api->max_bit_depth)
+ {
+ x265_log(NULL, X265_LOG_WARNING, "%s does not support requested bitDepth %d\n", libname, bitDepth);
+ return NULL;
+ }
+
+ return api;
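x265_api_get() now forwards a request for a different bit depth to a sibling library, chosen by name. Two preprocessor details make this work. First, the xstr()/str() pair expands X265_BUILD before stringizing it, so the looked-up entry point is the numbered symbol rather than the literal text "X265_BUILD". Second, adjacent string literals concatenate the library base name with the platform suffix. The sketch below reproduces only the name construction, with these changes from the diff:

- The macros are renamed (DEMO_STR, DEMO_XSTR, DEMO_EXT) so they cannot clash with identifiers in standard headers.
- DEMO_BUILD is a hypothetical value of 59, picked to match the soname bump in the changelog.
- `__APPLE__` replaces the MACOS define.
- The dlopen()/LoadLibraryA() step is omitted.

```cpp
#include <cassert>
#include <cstring>

// Two-level stringification, the same pattern as xstr()/str() in the diff.
#define DEMO_STR(s) #s
#define DEMO_XSTR(s) DEMO_STR(s)

#define DEMO_BUILD 59   // hypothetical stand-in for X265_BUILD

#if defined(_WIN32)
#define DEMO_EXT ".dll"
#elif defined(__APPLE__)
#define DEMO_EXT ".dylib"
#else
#define DEMO_EXT ".so"
#endif

// DEMO_XSTR expands DEMO_BUILD first; DEMO_STR alone stringizes the token as written.
static const char* const kMethod = "x265_api_get_" DEMO_XSTR(DEMO_BUILD);
static const char* const kUnexpanded = "x265_api_get_" DEMO_STR(DEMO_BUILD);

// Same depth-to-library mapping as the diff; nullptr for unsupported depths.
const char* libraryForDepth(int bitDepth)
{
    if (bitDepth == 12)
        return "libx265_main12" DEMO_EXT;
    if (bitDepth == 10)
        return "libx265_main10" DEMO_EXT;
    if (bitDepth == 8)
        return "libx265_main" DEMO_EXT;
    return nullptr;
}
```

In the diff, these strings are then passed to dlopen()/dlsym() (LoadLibraryA()/GetProcAddress() on Windows). The returned table's max_bit_depth is checked against the requested depth before it is handed back.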
x265_1.6.tar.gz/source/encoder/encoder.cpp -> x265_1.7.tar.gz/source/encoder/encoder.cpp
Changed
Encoder::Encoder()
{
m_aborted = false;
+ m_reconfigured = false;
m_encodedFrameNum = 0;
m_pocLast = -1;
m_curEncoder = 0;

m_outputCount = 0;
m_csvfpt = NULL;
m_param = NULL;
+ m_latestParam = NULL;
m_cuOffsetY = NULL;
m_cuOffsetC = NULL;
m_buOffsetY = NULL;

bool allowPools = !p->numaPools || strcmp(p->numaPools, "none");

// Trim the thread pool if --wpp, --pme, and --pmode are disabled
- if (!p->bEnableWavefront && !p->bDistributeModeAnalysis && !p->bDistributeMotionEstimation)
+ if (!p->bEnableWavefront && !p->bDistributeModeAnalysis && !p->bDistributeMotionEstimation && !p->lookaheadSlices)
allowPools = false;

if (!p->frameNumThreads)

x265_log(p, X265_LOG_WARNING, "No thread pool allocated, --pme disabled\n");
if (p->bDistributeModeAnalysis)
x265_log(p, X265_LOG_WARNING, "No thread pool allocated, --pmode disabled\n");
+ if (p->lookaheadSlices)
+ x265_log(p, X265_LOG_WARNING, "No thread pool allocated, --lookahead-slices disabled\n");

// disable all pool features if the thread pool is disabled or unusable.
- p->bEnableWavefront = p->bDistributeModeAnalysis = p->bDistributeMotionEstimation = 0;
+ p->bEnableWavefront = p->bDistributeModeAnalysis = p->bDistributeMotionEstimation = p->lookaheadSlices = 0;
}

char buf[128];

x265_log(p, X265_LOG_INFO, "frame threads / pool features : %d / %s\n", p->frameNumThreads, buf);

for (int i = 0; i < m_param->frameNumThreads; i++)
+ {
m_frameEncoder[i] = new FrameEncoder;
+ m_frameEncoder[i]->m_nalList.m_annexB = !!m_param->bAnnexB;
+ }

if (m_numPools)
{

m_aborted |= parseLambdaFile(m_param);

m_encodeStartTime = x265_mdate();
+
+ m_nalList.m_annexB = !!m_param->bAnnexB;
}

-void Encoder::stop()
+void Encoder::stopJobs()
{
if (m_rateControl)
m_rateControl->terminate(); // unblock all blocked RC calls

if (m_lookahead)
- m_lookahead->stop();
+ m_lookahead->stopJobs();

for (int i = 0; i < m_param->frameNumThreads; i++)
{

}

if (m_threadPool)
- m_threadPool->stop();
+ m_threadPool->stopWorkers();
}

void Encoder::destroy()

if (m_param)
{
- free((void*)m_param->rc.lambdaFileName); // allocs by strdup
- free(m_param->rc.statFileName);
- free(m_param->analysisFileName);
- free((void*)m_param->scalingLists);
- free(m_param->csvfn);
- free(m_param->numaPools);
+ /* release string arguments that were strdup'd */
+ free((char*)m_param->rc.lambdaFileName);
+ free((char*)m_param->rc.statFileName);
+ free((char*)m_param->analysisFileName);
+ free((char*)m_param->scalingLists);
+ free((char*)m_param->csvfn);
+ free((char*)m_param->numaPools);
+ free((char*)m_param->masteringDisplayColorVolume);
+ free((char*)m_param->contentLightLevelInfo);

- X265_FREE(m_param);
+ x265_param_free(m_param);
}
+
+ x265_param_free(m_latestParam);
}

void Encoder::updateVbvPlan(RateControl* rc)

if (m_dpb->m_freeList.empty())
{
inFrame = new Frame;
- if (inFrame->create(m_param))
+ x265_param* p = m_reconfigured? m_latestParam : m_param;
+ if (inFrame->create(p))
{
/* the first PicYuv created is asked to generate the CU and block unit offset
* arrays which are then shared with all subsequent PicYuv (orig and recon)

}
}
else
+ {
inFrame = m_dpb->m_freeList.popBack();
+ inFrame->m_lowresInit = false;
+ }

/* Copy input picture into a Frame and PicYuv, send to lookahead */
inFrame->m_fencPic->copyFromPicture(*pic_in, m_sps.conformanceWindow.rightOffset, m_sps.conformanceWindow.bottomOffset);

inFrame->m_userData = pic_in->userData;
inFrame->m_pts = pic_in->pts;
inFrame->m_forceqp = pic_in->forceqp;
+ inFrame->m_param = m_reconfigured ? m_latestParam : m_param;

if (m_pocLast == 0)
m_firstPts = inFrame->m_pts;

return ret;
}

+int Encoder::reconfigureParam(x265_param* encParam, x265_param* param)
+{
+ encParam->maxNumReferences = param->maxNumReferences; // never uses more refs than specified in stream headers
+ encParam->bEnableLoopFilter = param->bEnableLoopFilter;
+ encParam->deblockingFilterTCOffset = param->deblockingFilterTCOffset;
+ encParam->deblockingFilterBetaOffset = param->deblockingFilterBetaOffset;
+ encParam->bEnableFastIntra = param->bEnableFastIntra;
+ encParam->bEnableEarlySkip = param->bEnableEarlySkip;
+ encParam->bEnableTemporalMvp = param->bEnableTemporalMvp;
+ /* Scratch buffer prevents me_range from being increased for esa/tesa
+ if (param->searchMethod < X265_FULL_SEARCH || param->searchMethod < encParam->searchRange)
+ encParam->searchRange = param->searchRange; */
+ encParam->noiseReductionInter = param->noiseReductionInter;
+ encParam->noiseReductionIntra = param->noiseReductionIntra;
+ /* We can't switch out of subme=0 during encoding. */
+ if (encParam->subpelRefine)
+ encParam->subpelRefine = param->subpelRefine;
+ encParam->rdoqLevel = param->rdoqLevel;
+ encParam->rdLevel = param->rdLevel;
+ encParam->bEnableTSkipFast = param->bEnableTSkipFast;
+ encParam->psyRd = param->psyRd;
+ encParam->psyRdoq = param->psyRdoq;
+ encParam->bEnableSignHiding = param->bEnableSignHiding;
+ encParam->bEnableFastIntra = param->bEnableFastIntra;
+ encParam->maxTUSize = param->maxTUSize;
+ return x265_check_params(encParam);
+}
+
void EncStats::addPsnr(double psnrY, double psnrU, double psnrV)
{
m_psnrSumY += psnrY;

bs.writeByteAlignment();
list.serialize(NAL_UNIT_PPS, bs);

+ if (m_param->masteringDisplayColorVolume)
+ {
+ SEIMasteringDisplayColorVolume mdsei;
+ if (mdsei.parse(m_param->masteringDisplayColorVolume))
+ {
+ bs.resetBits();
+ mdsei.write(bs, m_sps);
+ bs.writeByteAlignment();
+ list.serialize(NAL_UNIT_PREFIX_SEI, bs);
+ }
+ else
+ x265_log(m_param, X265_LOG_WARNING, "unable to parse mastering display color volume info\n");
+ }
+
+ if (m_param->contentLightLevelInfo)
+ {
+ SEIContentLightLevel cllsei;
+ if (cllsei.parse(m_param->contentLightLevelInfo))
+ {
+ bs.resetBits();
+ cllsei.write(bs, m_sps);
+ bs.writeByteAlignment();
+ list.serialize(NAL_UNIT_PREFIX_SEI, bs);
+ }
+ else
+ x265_log(m_param, X265_LOG_WARNING, "unable to parse content light level info\n");
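x265_encoder_reconfig() and Encoder::reconfigureParam() together implement a snapshot, apply and roll-back update. The current m_latestParam is copied, and a whitelist of fields is overwritten from the caller's parameters. x265_check_params() then validates the result, and the snapshot is restored if validation fails. Frames created afterwards pick up m_latestParam through inFrame->m_param. The sketch below applies the same pattern to a hypothetical three-field `Params` struct with an illustrative `checkParams()` validator. Its fields and limits are assumptions, not x265's, though the subme=0 rule mirrors the diff.

```cpp
#include <cassert>

// Hypothetical, heavily reduced parameter set.
struct Params
{
    int rdLevel;       // may change mid-encode
    int subpelRefine;  // may change mid-encode, but never away from 0
    int ctuSize;       // fixed once the encoder is open
};

// Illustrative validator standing in for x265_check_params(): 0 on success.
int checkParams(const Params& p)
{
    bool bad = p.rdLevel < 0 || p.rdLevel > 6 || p.subpelRefine < 0 || p.subpelRefine > 7;
    return bad ? -1 : 0;
}

// Copy only whitelisted fields, then validate the merged result.
int reconfigureParam(Params& enc, const Params& req)
{
    enc.rdLevel = req.rdLevel;
    if (enc.subpelRefine)              // cannot switch out of subme=0 during encoding
        enc.subpelRefine = req.subpelRefine;
    // enc.ctuSize is deliberately not copied
    return checkParams(enc);
}

// Snapshot, apply, and roll back on failure, mirroring x265_encoder_reconfig().
int reconfig(Params& latest, const Params& req, bool& reconfigured)
{
    Params save = latest;
    int ret = reconfigureParam(latest, req);
    if (ret)
        latest = save;                 // validation failed: restore previous settings
    else
        reconfigured = true;
    return ret;
}
```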
x265_1.6.tar.gz/source/encoder/encoder.h -> x265_1.7.tar.gz/source/encoder/encoder.h
Changed
uint32_t m_numDelayedPic;

x265_param* m_param;
+ x265_param* m_latestParam;
RateControl* m_rateControl;
Lookahead* m_lookahead;
Window m_conformanceWindow;

bool m_bZeroLatency; // x265_encoder_encode() returns NALs for the input picture, zero lag
bool m_aborted; // fatal error detected
+ bool m_reconfigured; // reconfigure of encoder detected

Encoder();
~Encoder() {}

void create();
- void stop();
+ void stopJobs();
void destroy();

int encode(const x265_picture* pic, x265_picture *pic_out);

+ int reconfigureParam(x265_param* encParam, x265_param* param);
+
void getStreamHeaders(NALList& list, Entropy& sbacCoder, Bitstream& bs);

void fetchStats(x265_stats* stats, size_t statsSizeBytes);
x265_1.6.tar.gz/source/encoder/entropy.cpp -> x265_1.7.tar.gz/source/encoder/entropy.cpp
Changed
if (ctu.isSkipped(absPartIdx))
{
codeMergeIndex(ctu, absPartIdx);
- finishCU(ctu, absPartIdx, depth);
+ finishCU(ctu, absPartIdx, depth, bEncodeDQP);
return;
}
codePredMode(ctu.m_predMode[absPartIdx]);

codeCoeff(ctu, absPartIdx, bEncodeDQP, tuDepthRange);

// --- write terminating bit ---
- finishCU(ctu, absPartIdx, depth);
+ finishCU(ctu, absPartIdx, depth, bEncodeDQP);
}

/* Return bit count of signaling inter mode */

}

/* finish encoding a cu and handle end-of-slice conditions */
-void Entropy::finishCU(const CUData& ctu, uint32_t absPartIdx, uint32_t depth)
+void Entropy::finishCU(const CUData& ctu, uint32_t absPartIdx, uint32_t depth, bool bCodeDQP)
{
const Slice* slice = ctu.m_slice;
uint32_t realEndAddress = slice->m_endCUAddr;

bool granularityBoundary = (((rpelx & granularityMask) == 0 || (rpelx == slice->m_sps->picWidthInLumaSamples )) &&
((bpely & granularityMask) == 0 || (bpely == slice->m_sps->picHeightInLumaSamples)));

+ if (slice->m_pps->bUseDQP)
+ const_cast<CUData&>(ctu).setQPSubParts(bCodeDQP ? ctu.getRefQP(absPartIdx) : ctu.m_qp[absPartIdx], absPartIdx, depth);
+
if (granularityBoundary)
{
// Encode slice finish

{
length = 0;
codeNumber = (codeNumber >> absGoRice) - COEF_REMAIN_BIN_REDUCTION;
- if (codeNumber != 0)
{
unsigned long idx;
CLZ(idx, codeNumber + 1);
length = idx;
+ X265_CHECK((codeNumber != 0) || (length == 0), "length check failure\n");
codeNumber -= (1 << idx) - 1;
}
codeNumber = (codeNumber << absGoRice) + codeRemain;

//const uint32_t maskPosXY = ((uint32_t)~0 >> (31 - log2TrSize + MLS_CG_LOG2_SIZE)) >> 1;
X265_CHECK((uint32_t)((1 << (log2TrSize - MLS_CG_LOG2_SIZE)) - 1) == (((uint32_t)~0 >> (31 - log2TrSize + MLS_CG_LOG2_SIZE)) >> 1), "maskPosXY fault\n");

- scanPosLast = primitives.findPosLast(codingParameters.scan, coeff, coeffSign, coeffFlag, coeffNum, numSig);
+ scanPosLast = primitives.scanPosLast(codingParameters.scan, coeff, coeffSign, coeffFlag, coeffNum, numSig, g_scan4x4[codingParameters.scanType], trSize);
posLast = codingParameters.scan[scanPosLast];

const int lastScanSet = scanPosLast >> MLS_CG_SIZE;

uint8_t * const baseCoeffGroupCtx = &m_contextState[OFF_SIG_CG_FLAG_CTX + (bIsLuma ? 0 : NUM_SIG_CG_FLAG_CTX)];
uint8_t * const baseCtx = bIsLuma ? &m_contextState[OFF_SIG_FLAG_CTX] : &m_contextState[OFF_SIG_FLAG_CTX + NUM_SIG_FLAG_CTX_LUMA];
uint32_t c1 = 1;
- uint32_t goRiceParam = 0;
int scanPosSigOff = scanPosLast - (lastScanSet << MLS_CG_SIZE) - 1;
int absCoeff[1 << MLS_CG_SIZE];
int numNonZero = 1;

const uint32_t subCoeffFlag = coeffFlag[subSet];
uint32_t scanFlagMask = subCoeffFlag;
int subPosBase = subSet << MLS_CG_SIZE;
- goRiceParam = 0;

if (subSet == lastScanSet)
{

else
{
uint32_t sigCoeffGroup = ((sigCoeffGroupFlag64 & cgBlkPosMask) != 0);
- uint32_t ctxSig = Quant::getSigCoeffGroupCtxInc(sigCoeffGroupFlag64, cgPosX, cgPosY, codingParameters.log2TrSizeCG);
+ uint32_t ctxSig = Quant::getSigCoeffGroupCtxInc(sigCoeffGroupFlag64, cgPosX, cgPosY, cgBlkPos, (trSize >> MLS_CG_LOG2_SIZE));
encodeBin(sigCoeffGroup, baseCoeffGroupCtx[ctxSig]);
}

if (sigCoeffGroupFlag64 & cgBlkPosMask)
{
X265_CHECK((log2TrSize != 2) || (log2TrSize == 2 && subSet == 0), "log2TrSize and subSet mistake!\n");
- const int patternSigCtx = Quant::calcPatternSigCtx(sigCoeffGroupFlag64, cgPosX, cgPosY, codingParameters.log2TrSizeCG);
+ const int patternSigCtx = Quant::calcPatternSigCtx(sigCoeffGroupFlag64, cgPosX, cgPosY, cgBlkPos, (trSize >> MLS_CG_LOG2_SIZE));
+ const uint32_t posOffset = (bIsLuma && subSet) ? 3 : 0;

static const uint8_t ctxIndMap4x4[16] =
{

7, 7, 8, 8
};
// NOTE: [patternSigCtx][posXinSubset][posYinSubset]
- static const uint8_t table_cnt[4][4][4] =
+ static const uint8_t table_cnt[4][SCAN_SET_SIZE] =
{
// patternSigCtx = 0
{
- { 2, 1, 1, 0 },
- { 1, 1, 0, 0 },
- { 1, 0, 0, 0 },
- { 0, 0, 0, 0 },
+ 2, 1, 1, 0,
+ 1, 1, 0, 0,
+ 1, 0, 0, 0,
+ 0, 0, 0, 0,
},
// patternSigCtx = 1
{
- { 2, 1, 0, 0 },
- { 2, 1, 0, 0 },
- { 2, 1, 0, 0 },
- { 2, 1, 0, 0 },
+ 2, 2, 2, 2,
+ 1, 1, 1, 1,
+ 0, 0, 0, 0,
+ 0, 0, 0, 0,
},
// patternSigCtx = 2
{
- { 2, 2, 2, 2 },
- { 1, 1, 1, 1 },
- { 0, 0, 0, 0 },
- { 0, 0, 0, 0 },
+ 2, 1, 0, 0,
+ 2, 1, 0, 0,
+ 2, 1, 0, 0,
+ 2, 1, 0, 0,
},
// patternSigCtx = 3
{
- { 2, 2, 2, 2 },
- { 2, 2, 2, 2 },
- { 2, 2, 2, 2 },
- { 2, 2, 2, 2 },
+ 2, 2, 2, 2,
+ 2, 2, 2, 2,
+ 2, 2, 2, 2,
+ 2, 2, 2, 2,
}
};
+
+ const int offset = codingParameters.firstSignificanceMapContext;
+ ALIGN_VAR_32(uint16_t, tmpCoeff[SCAN_SET_SIZE]);
+ // TODO: accelerate by PABSW
+ const uint32_t blkPosBase = codingParameters.scan[subPosBase];
+ for (int i = 0; i < MLS_CG_SIZE; i++)
+ {
+ tmpCoeff[i * MLS_CG_SIZE + 0] = (uint16_t)abs(coeff[blkPosBase + i * trSize + 0]);
+ tmpCoeff[i * MLS_CG_SIZE + 1] = (uint16_t)abs(coeff[blkPosBase + i * trSize + 1]);
+ tmpCoeff[i * MLS_CG_SIZE + 2] = (uint16_t)abs(coeff[blkPosBase + i * trSize + 2]);
+ tmpCoeff[i * MLS_CG_SIZE + 3] = (uint16_t)abs(coeff[blkPosBase + i * trSize + 3]);
+ }
+
if (m_bitIf)
{
if (log2TrSize == 2)

uint32_t blkPos, sig, ctxSig;
for (; scanPosSigOff >= 0; scanPosSigOff--)
{
- blkPos = codingParameters.scan[subPosBase + scanPosSigOff];
+ blkPos = g_scan4x4[codingParameters.scanType][scanPosSigOff];
sig = scanFlagMask & 1;
scanFlagMask >>= 1;
- X265_CHECK((uint32_t)(coeff[blkPos] != 0) == sig, "sign bit mistake\n");
+ X265_CHECK((uint32_t)(tmpCoeff[blkPos] != 0) == sig, "sign bit mistake\n");
{
ctxSig = ctxIndMap4x4[blkPos];
X265_CHECK(ctxSig == Quant::getSigCtxInc(patternSigCtx, log2TrSize, trSize, blkPos, bIsLuma, codingParameters.firstSignificanceMapContext), "sigCtx mistake!\n");;
encodeBin(sig, baseCtx[ctxSig]);
}
- absCoeff[numNonZero] = int(abs(coeff[blkPos]));
+ absCoeff[numNonZero] = tmpCoeff[blkPos];
numNonZero += sig;
}
}

{
X265_CHECK((log2TrSize > 2), "log2TrSize must be more than 2 in this path!\n");

- const uint8_t (*tabSigCtx)[4] = table_cnt[(uint32_t)patternSigCtx];
- const int offset = codingParameters.firstSignificanceMapContext;
- const uint32_t lumaMask = bIsLuma ? ~0 : 0;
- static const uint32_t posXY4Mask[] = {0x024, 0x0CC, 0x39C};
- const uint32_t posGT4Mask = posXY4Mask[log2TrSize - 3] & lumaMask;
+ const uint8_t *tabSigCtx = table_cnt[(uint32_t)patternSigCtx];

uint32_t blkPos, sig, ctxSig;
for (; scanPosSigOff >= 0; scanPosSigOff--)
{
- blkPos = codingParameters.scan[subPosBase + scanPosSigOff];
- X265_CHECK(blkPos || (subPosBase + scanPosSigOff == 0), "blkPos==0 must be at scan[0]\n");
+ blkPos = g_scan4x4[codingParameters.scanType][scanPosSigOff];
const uint32_t posZeroMask = (subPosBase + scanPosSigOff) ? ~0 : 0;
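The table_cnt rewrite changes the table layout, not the contexts it selects. The old table was indexed [patternSigCtx][posX][posY]. The new 16-entry rows are indexed by blkPos from g_scan4x4. Assuming blkPos is the raster position posY * 4 + posX inside the 4x4 group, each new row is the transpose of the old 4x4 block. That is why patterns 1 and 2 appear to trade shapes in the diff. The check below copies both tables verbatim from the diff and compares every entry under that indexing assumption.

```cpp
#include <cassert>
#include <cstdint>

// Old layout, copied from the "-" lines: [patternSigCtx][posX][posY].
static const uint8_t oldTable[4][4][4] = {
    { {2, 1, 1, 0}, {1, 1, 0, 0}, {1, 0, 0, 0}, {0, 0, 0, 0} },
    { {2, 1, 0, 0}, {2, 1, 0, 0}, {2, 1, 0, 0}, {2, 1, 0, 0} },
    { {2, 2, 2, 2}, {1, 1, 1, 1}, {0, 0, 0, 0}, {0, 0, 0, 0} },
    { {2, 2, 2, 2}, {2, 2, 2, 2}, {2, 2, 2, 2}, {2, 2, 2, 2} },
};

// New layout, copied from the "+" lines: [patternSigCtx][blkPos], with
// blkPos assumed to be posY * 4 + posX inside the 4x4 coefficient group.
static const uint8_t newTable[4][16] = {
    { 2, 1, 1, 0,  1, 1, 0, 0,  1, 0, 0, 0,  0, 0, 0, 0 },
    { 2, 2, 2, 2,  1, 1, 1, 1,  0, 0, 0, 0,  0, 0, 0, 0 },
    { 2, 1, 0, 0,  2, 1, 0, 0,  2, 1, 0, 0,  2, 1, 0, 0 },
    { 2, 2, 2, 2,  2, 2, 2, 2,  2, 2, 2, 2,  2, 2, 2, 2 },
};

// True when every (pattern, posX, posY) selects the same context increment
// from both tables.
bool layoutsAgree()
{
    for (int p = 0; p < 4; p++)
        for (int posY = 0; posY < 4; posY++)
            for (int posX = 0; posX < 4; posX++)
                if (newTable[p][posY * 4 + posX] != oldTable[p][posX][posY])
                    return false;
    return true;
}
```

Reading the old table row by row as if it were already flat would not match. For example, the second entry of pattern 1 is 1 in the old order but 2 in the new one. The transposition is therefore what keeps the encoded contexts unchanged.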
x265_1.6.tar.gz/source/encoder/entropy.h -> x265_1.7.tar.gz/source/encoder/entropy.h
Changed
struct EstBitsSbac
{
int significantCoeffGroupBits[NUM_SIG_CG_FLAG_CTX][2];
- int significantBits[NUM_SIG_FLAG_CTX][2];
+ int significantBits[2][NUM_SIG_FLAG_CTX];
int lastBits[2][10];
int greaterOneBits[NUM_ONE_FLAG_CTX][2];
int levelAbsBits[NUM_ABS_FLAG_CTX][2];

inline void codeQtCbfChroma(uint32_t cbf, uint32_t tuDepth) { encodeBin(cbf, m_contextState[OFF_QT_CBF_CTX + 2 + tuDepth]); }
inline void codeQtRootCbf(uint32_t cbf) { encodeBin(cbf, m_contextState[OFF_QT_ROOT_CBF_CTX]); }
inline void codeTransformSkipFlags(uint32_t transformSkip, TextType ttype) { encodeBin(transformSkip, m_contextState[OFF_TRANSFORMSKIP_FLAG_CTX + (ttype ? NUM_TRANSFORMSKIP_FLAG_CTX : 0)]); }
-
+ void codeDeltaQP(const CUData& cu, uint32_t absPartIdx);
void codeSaoOffset(const SaoCtuParam& ctuParam, int plane);

/* RDO functions */

}

void encodeCU(const CUData& ctu, const CUGeom &cuGeom, uint32_t absPartIdx, uint32_t depth, bool& bEncodeDQP);
- void finishCU(const CUData& ctu, uint32_t absPartIdx, uint32_t depth);
+ void finishCU(const CUData& ctu, uint32_t absPartIdx, uint32_t depth, bool bEncodeDQP);

void writeOut();

void codeSaoMaxUvlc(uint32_t code, uint32_t maxSymbol);

- void codeDeltaQP(const CUData& cu, uint32_t absPartIdx);
void codeLastSignificantXY(uint32_t posx, uint32_t posy, uint32_t log2TrSize, bool bIsLuma, uint32_t scanIdx);

void encodeTransform(const CUData& cu, uint32_t absPartIdx, uint32_t tuDepth, uint32_t log2TrSize,
x265_1.6.tar.gz/source/encoder/frameencoder.cpp -> x265_1.7.tar.gz/source/encoder/frameencoder.cpp
Changed
201
1
2
{
3
m_slicetypeWaitTime = x265_mdate() - m_prevOutputTime;
4
m_frame = curFrame;
5
+ m_param = curFrame->m_param;
6
m_sliceType = curFrame->m_lowres.sliceType;
7
curFrame->m_encData->m_frameEncoderID = m_jpId;
8
curFrame->m_encData->m_jobProvider = this;
9
10
uint32_t row = (uint32_t)intRow;
11
CTURow& curRow = m_rows[row];
12
13
+ tld.analysis.m_param = m_param;
14
if (m_param->bEnableWavefront)
15
{
16
ScopedLock self(curRow.lock);
17
18
const uint32_t lineStartCUAddr = row * numCols;
19
bool bIsVbv = m_param->rc.vbvBufferSize > 0 && m_param->rc.vbvMaxBitrate > 0;
20
21
+ /* These store the count of inter, intra and skip cus within quad tree structure of each CTU */
22
+ uint32_t qTreeInterCnt[NUM_CU_DEPTH];
23
+ uint32_t qTreeIntraCnt[NUM_CU_DEPTH];
24
+ uint32_t qTreeSkipCnt[NUM_CU_DEPTH];
25
+ for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
26
+ qTreeIntraCnt[depth] = qTreeInterCnt[depth] = qTreeSkipCnt[depth] = 0;
27
+
28
while (curRow.completed < numCols)
29
{
30
ProfileScopeEvent(encodeCTU);
31
32
curEncData.m_rowStat[row].diagQpScale = x265_qp2qScale(curEncData.m_avgQpRc);
33
}
34
35
+ FrameData::RCStatCU& cuStat = curEncData.m_cuStat[cuAddr];
36
if (row >= col && row && m_vbvResetTriggerRow != intRow)
37
- curEncData.m_cuStat[cuAddr].baseQp = curEncData.m_cuStat[cuAddr - numCols + 1].baseQp;
38
+ cuStat.baseQp = curEncData.m_cuStat[cuAddr - numCols + 1].baseQp;
39
else
40
- curEncData.m_cuStat[cuAddr].baseQp = curEncData.m_rowStat[row].diagQp;
41
- }
42
- else
43
- curEncData.m_cuStat[cuAddr].baseQp = curEncData.m_avgQpRc;
44
+ cuStat.baseQp = curEncData.m_rowStat[row].diagQp;
45
+
46
+ /* TODO: use defines from slicetype.h for lowres block size */
47
+ uint32_t maxBlockCols = (m_frame->m_fencPic->m_picWidth + (16 - 1)) / 16;
48
+ uint32_t maxBlockRows = (m_frame->m_fencPic->m_picHeight + (16 - 1)) / 16;
49
+ uint32_t noOfBlocks = g_maxCUSize / 16;
50
+ uint32_t block_y = (cuAddr / curEncData.m_slice->m_sps->numCuInWidth) * noOfBlocks;
51
+ uint32_t block_x = (cuAddr * noOfBlocks) - block_y * curEncData.m_slice->m_sps->numCuInWidth;
52
+
53
+ cuStat.vbvCost = 0;
54
+ cuStat.intraVbvCost = 0;
55
+ for (uint32_t h = 0; h < noOfBlocks && block_y < maxBlockRows; h++, block_y++)
56
+ {
57
+ uint32_t idx = block_x + (block_y * maxBlockCols);
58
59
- if (m_param->rc.aqMode || bIsVbv)
60
- {
61
- int qp = calcQpForCu(cuAddr, curEncData.m_cuStat[cuAddr].baseQp);
62
- tld.analysis.setQP(*slice, qp);
63
- qp = x265_clip3(QP_MIN, QP_MAX_SPEC, qp);
64
- ctu->setQPSubParts((int8_t)qp, 0, 0);
65
- curEncData.m_rowStat[row].sumQpAq += qp;
66
+ for (uint32_t w = 0; w < noOfBlocks && (block_x + w) < maxBlockCols; w++, idx++)
67
+ {
68
+ cuStat.vbvCost += m_frame->m_lowres.lowresCostForRc[idx] & LOWRES_COST_MASK;
69
+ cuStat.intraVbvCost += m_frame->m_lowres.intraCost[idx];
70
+ }
71
+ }
72
}
73
else
74
- tld.analysis.setQP(*slice, slice->m_sliceQp);
75
+ curEncData.m_cuStat[cuAddr].baseQp = curEncData.m_avgQpRc;
76
77
if (m_param->bEnableWavefront && !col && row)
78
{
79
80
curRow.completed++;
81
82
if (m_param->bLogCuStats || m_param->rc.bStatWrite)
83
- collectCTUStatistics(*ctu);
84
+ curEncData.m_rowStat[row].sumQpAq += collectCTUStatistics(*ctu, qTreeInterCnt, qTreeIntraCnt, qTreeSkipCnt);
85
+ else if (m_param->rc.aqMode)
86
+ curEncData.m_rowStat[row].sumQpAq += calcCTUQP(*ctu);
87
88
// copy no. of intra, inter Cu cnt per row into frame stats for 2 pass
89
if (m_param->rc.bStatWrite)
90
91
curRow.rowStats.mvBits += best.mvBits;
92
curRow.rowStats.coeffBits += best.coeffBits;
93
curRow.rowStats.miscBits += best.totalBits - (best.mvBits + best.coeffBits);
94
- StatisticLog* log = &m_sliceTypeLog[slice->m_sliceType];
95
96
for (uint32_t depth = 0; depth <= g_maxCUDepth; depth++)
97
{
98
/* 1 << shift == number of 8x8 blocks at current depth */
99
int shift = 2 * (g_maxCUDepth - depth);
100
- curRow.rowStats.iCuCnt += log->qTreeIntraCnt[depth] << shift;
101
- curRow.rowStats.pCuCnt += log->qTreeInterCnt[depth] << shift;
102
- curRow.rowStats.skipCuCnt += log->qTreeSkipCnt[depth] << shift;
103
+ curRow.rowStats.iCuCnt += qTreeIntraCnt[depth] << shift;
104
+ curRow.rowStats.pCuCnt += qTreeInterCnt[depth] << shift;
105
+ curRow.rowStats.skipCuCnt += qTreeSkipCnt[depth] << shift;
106
107
// clear the row cu data from thread local object
108
- log->qTreeIntraCnt[depth] = log->qTreeInterCnt[depth] = log->qTreeSkipCnt[depth] = 0;
109
+ qTreeIntraCnt[depth] = qTreeInterCnt[depth] = qTreeSkipCnt[depth] = 0;
110
}
111
}
112
113
114
}
115
}
116
117
+ tld.analysis.m_param = NULL;
118
curRow.busy = false;
119
120
if (ATOMIC_INC(&m_completionCount) == 2 * (int)m_numRows)
121
m_completionEvent.trigger();
122
}
123
124
-void FrameEncoder::collectCTUStatistics(CUData& ctu)
125
+/* collect statistics about CU coding decisions, return total QP */
126
+int FrameEncoder::collectCTUStatistics(const CUData& ctu, uint32_t* qtreeInterCnt, uint32_t* qtreeIntraCnt, uint32_t* qtreeSkipCnt)
127
{
128
StatisticLog* log = &m_sliceTypeLog[ctu.m_slice->m_sliceType];
129
+ int totQP = 0;
130
131
if (ctu.m_slice->m_sliceType == I_SLICE)
132
{
133
134
135
log->totalCu++;
136
log->cntIntra[depth]++;
137
- log->qTreeIntraCnt[depth]++;
138
+ qtreeIntraCnt[depth]++;
139
+ totQP += ctu.m_qp[absPartIdx] * (ctu.m_numPartitions >> (depth * 2));
140
141
if (ctu.m_predMode[absPartIdx] == MODE_NONE)
142
{
143
log->totalCu--;
144
log->cntIntra[depth]--;
145
- log->qTreeIntraCnt[depth]--;
146
+ qtreeIntraCnt[depth]--;
147
}
148
else if (ctu.m_partSize[absPartIdx] != SIZE_2Nx2N)
149
{
150
151
152
log->totalCu++;
153
log->cntTotalCu[depth]++;
154
+ totQP += ctu.m_qp[absPartIdx] * (ctu.m_numPartitions >> (depth * 2));
155
156
if (ctu.m_predMode[absPartIdx] == MODE_NONE)
157
{
158
159
{
160
log->totalCu--;
161
log->cntSkipCu[depth]++;
162
- log->qTreeSkipCnt[depth]++;
163
+ qtreeSkipCnt[depth]++;
164
}
165
else if (ctu.isInter(absPartIdx))
166
{
167
log->cntInter[depth]++;
168
- log->qTreeInterCnt[depth]++;
169
+ qtreeInterCnt[depth]++;
170
171
if (ctu.m_partSize[absPartIdx] < AMP_ID)
172
log->cuInterDistribution[depth][ctu.m_partSize[absPartIdx]]++;
173
174
else if (ctu.isIntra(absPartIdx))
175
{
176
log->cntIntra[depth]++;
177
- log->qTreeIntraCnt[depth]++;
178
+ qtreeIntraCnt[depth]++;
179
180
if (ctu.m_partSize[absPartIdx] != SIZE_2Nx2N)
181
{
182
X265_CHECK(ctu.m_log2CUSize[absPartIdx] == 3 && ctu.m_slice->m_sps->quadtreeTULog2MinSize < 3, "Intra NxN found at improbable depth\n");
183
log->cntIntraNxN++;
184
+ log->cntIntra[depth]--;
185
/* TODO: log intra modes at absPartIdx +0 to +3 */
186
}
187
else if (ctu.m_lumaIntraDir[absPartIdx] > 1)
188
189
}
190
}
191
}
192
+
193
+ return totQP;
194
+}
195
+
196
+/* iterate over coded CUs and determine total QP */
197
+int FrameEncoder::calcCTUQP(const CUData& ctu)
198
+{
199
+ int totQP = 0;
200
+ uint32_t depth = 0, numParts = ctu.m_numPartitions;
201
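The new VBV block in the frameencoder.cpp hunk above sums, into cuStat.vbvCost and cuStat.intraVbvCost, the costs of the 16x16 lookahead blocks that each CTU covers. The sketch below reproduces only the index arithmetic: block_y is derived from the CTU row, block_x = cuAddr * noOfBlocks - block_y * numCuInWidth, and both are clipped at the picture edge. It uses hypothetical names and plain vectors, and it omits the LOWRES_COST_MASK step that the real code applies to lowresCostForRc.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct CtuLowresCost
{
    uint32_t vbvCost;    // sum of rate-control (inter) costs
    uint32_t intraCost;  // sum of intra costs
};

// Hypothetical helper: sum the 16x16 lowres block costs covered by CTU cuAddr.
CtuLowresCost sumCtuLowresCost(const std::vector<uint32_t>& rcCost,
                               const std::vector<uint32_t>& intraCost,
                               uint32_t picWidth, uint32_t picHeight,
                               uint32_t ctuSize, uint32_t ctusPerRow, uint32_t cuAddr)
{
    assert(ctuSize % 16 == 0);
    const uint32_t maxBlockCols = (picWidth + 15) / 16;   // lowres blocks per row (rounded up)
    const uint32_t maxBlockRows = (picHeight + 15) / 16;
    const uint32_t blocksPerCtu = ctuSize / 16;
    uint32_t blockY = (cuAddr / ctusPerRow) * blocksPerCtu;
    // Same as (cuAddr % ctusPerRow) * blocksPerCtu, written as in the hunk.
    const uint32_t blockX = cuAddr * blocksPerCtu - blockY * ctusPerRow;

    CtuLowresCost sum = { 0, 0 };
    for (uint32_t h = 0; h < blocksPerCtu && blockY < maxBlockRows; h++, blockY++)
    {
        uint32_t idx = blockX + blockY * maxBlockCols;
        for (uint32_t w = 0; w < blocksPerCtu && blockX + w < maxBlockCols; w++, idx++)
        {
            sum.vbvCost += rcCost[idx];
            sum.intraCost += intraCost[idx];
        }
    }
    return sum;
}
```

For a 48x40 picture with 32x32 CTUs, the lowres grid is 3x3 and each CTU covers up to 2x2 blocks. The right and bottom CTUs cover only the blocks that actually exist.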
x265_1.6.tar.gz/source/encoder/frameencoder.h -> x265_1.7.tar.gz/source/encoder/frameencoder.h
Changed
24
1
2
uint64_t cntTotalCu[4];
3
uint64_t totalCu;
4
5
- /* These states store the count of inter,intra and skip ctus within quad tree structure of each CU */
6
- uint32_t qTreeInterCnt[4];
7
- uint32_t qTreeIntraCnt[4];
8
- uint32_t qTreeSkipCnt[4];
9
-
10
StatisticLog()
11
{
12
memset(this, 0, sizeof(StatisticLog));
13
14
void encodeSlice();
15
16
void threadMain();
17
- int calcQpForCu(uint32_t cuAddr, double baseQp);
18
- void collectCTUStatistics(CUData& ctu);
19
+ int collectCTUStatistics(const CUData& ctu, uint32_t* qtreeInterCnt, uint32_t* qtreeIntraCnt, uint32_t* qtreeSkipCnt);
20
+ int calcCTUQP(const CUData& ctu);
21
void noiseReductionUpdate();
22
23
/* Called by WaveFront::findJob() */
24
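The qTree*Cnt arrays have moved out of StatisticLog and into per-row locals. The row statistics loop in frameencoder.cpp then converts the per-depth CU counts into 8x8-block units using `count << 2 * (g_maxCUDepth - depth)`. Below is a minimal sketch of that conversion. It assumes depth g_maxCUDepth corresponds to an 8x8 CU, as the loop's own comment states, and countIn8x8Units is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: total area, in 8x8 blocks, of the CUs counted per depth.
// A CU at `depth` spans 1 << (2 * (maxDepth - depth)) 8x8 blocks.
uint64_t countIn8x8Units(const uint32_t* cntPerDepth, uint32_t maxDepth)
{
    assert(cntPerDepth != nullptr);
    uint64_t total = 0;
    for (uint32_t depth = 0; depth <= maxDepth; depth++)
        total += (uint64_t)cntPerDepth[depth] << (2 * (maxDepth - depth));
    return total;
}
```

With 64x64 CTUs (maxDepth 3), one depth-0 CU counts as 64 units, so intra, inter and skip counts from different depths can be summed on a common area scale.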
x265_1.6.tar.gz/source/encoder/level.cpp -> x265_1.7.tar.gz/source/encoder/level.cpp
Changed
138
1
2
{ 35651584, 1069547520, 60000, 240000, 60000, 240000, 8, Level::LEVEL6, "6", 60 },
3
{ 35651584, 2139095040, 120000, 480000, 120000, 480000, 8, Level::LEVEL6_1, "6.1", 61 },
4
{ 35651584, 4278190080U, 240000, 800000, 240000, 800000, 6, Level::LEVEL6_2, "6.2", 62 },
5
+ { MAX_UINT, MAX_UINT, MAX_UINT, MAX_UINT, MAX_UINT, MAX_UINT, 1, Level::LEVEL8_5, "8.5", 85 },
6
};
7
8
/* determine minimum decoder level required to decode the described video */
9
void determineLevel(const x265_param &param, VPS& vps)
10
{
11
vps.maxTempSubLayers = param.bEnableTemporalSubLayers ? 2 : 1;
12
- if (param.bLossless)
13
- vps.ptl.profileIdc = Profile::NONE;
14
- else if (param.internalCsp == X265_CSP_I420)
15
+ if (param.internalCsp == X265_CSP_I420)
16
{
17
if (param.internalBitDepth == 8)
18
{
19
20
21
const size_t NumLevels = sizeof(levels) / sizeof(levels[0]);
22
uint32_t i;
23
- for (i = 0; i < NumLevels; i++)
24
+ if (param.bLossless)
25
+ {
26
+ i = 13;
27
+ vps.ptl.minCrForLevel = 1;
28
+ vps.ptl.maxLumaSrForLevel = MAX_UINT;
29
+ vps.ptl.levelIdc = Level::LEVEL8_5;
30
+ vps.ptl.tierFlag = Level::MAIN;
31
+ }
32
+ else for (i = 0; i < NumLevels; i++)
33
{
34
if (lumaSamples > levels[i].maxLumaSamples)
35
continue;
36
37
extern "C"
38
int x265_param_apply_profile(x265_param *param, const char *profile)
39
{
40
- if (!profile)
41
+ if (!param || !profile)
42
return 0;
43
- if (!strcmp(profile, "main"))
44
- {
45
- /* SPSs shall have chroma_format_idc equal to 1 only */
46
- param->internalCsp = X265_CSP_I420;
47
48
#if HIGH_BIT_DEPTH
49
- /* SPSs shall have bit_depth_luma_minus8 equal to 0 only */
50
- x265_log(param, X265_LOG_ERROR, "Main profile not supported, compiled for Main10.\n");
51
+ if (!strcmp(profile, "main") || !strcmp(profile, "mainstillpicture") || !strcmp(profile, "msp") || !strcmp(profile, "main444-8"))
52
+ {
53
+ x265_log(param, X265_LOG_ERROR, "%s profile not supported, compiled for Main10.\n", profile);
54
return -1;
55
-#endif
56
}
57
- else if (!strcmp(profile, "main10"))
58
+#else
59
+ if (!strcmp(profile, "main10") || !strcmp(profile, "main422-10") || !strcmp(profile, "main444-10"))
60
{
61
- /* SPSs shall have chroma_format_idc equal to 1 only */
62
- param->internalCsp = X265_CSP_I420;
63
-
64
- /* SPSs shall have bit_depth_luma_minus8 in the range of 0 to 2, inclusive
65
- * this covers all builds of x265, currently */
66
+ x265_log(param, X265_LOG_ERROR, "%s profile not supported, compiled for Main.\n", profile);
67
+ return -1;
68
+ }
69
+#endif
70
+
71
+ if (!strcmp(profile, "main"))
72
+ {
73
+ if (!(param->internalCsp & X265_CSP_I420))
74
+ {
75
+ x265_log(param, X265_LOG_ERROR, "%s profile not compatible with %s input color space.\n",
76
+ profile, x265_source_csp_names[param->internalCsp]);
77
+ return -1;
78
+ }
79
}
80
else if (!strcmp(profile, "mainstillpicture") || !strcmp(profile, "msp"))
81
{
82
- /* SPSs shall have chroma_format_idc equal to 1 only */
83
- param->internalCsp = X265_CSP_I420;
84
+ if (!(param->internalCsp & X265_CSP_I420))
85
+ {
86
+ x265_log(param, X265_LOG_ERROR, "%s profile not compatible with %s input color space.\n",
87
+ profile, x265_source_csp_names[param->internalCsp]);
88
+ return -1;
89
+ }
90
91
/* SPSs shall have sps_max_dec_pic_buffering_minus1[ sps_max_sub_layers_minus1 ] equal to 0 only */
92
param->maxNumReferences = 1;
93
94
param->rc.cuTree = 0;
95
param->bEnableWeightedPred = 0;
96
param->bEnableWeightedBiPred = 0;
97
-
98
-#if HIGH_BIT_DEPTH
99
- /* SPSs shall have bit_depth_luma_minus8 equal to 0 only */
100
- x265_log(param, X265_LOG_ERROR, "Mainstillpicture profile not supported, compiled for Main10.\n");
101
- return -1;
102
-#endif
103
+ }
104
+ else if (!strcmp(profile, "main10"))
105
+ {
106
+ if (!(param->internalCsp & X265_CSP_I420))
107
+ {
108
+ x265_log(param, X265_LOG_ERROR, "%s profile not compatible with %s input color space.\n",
109
+ profile, x265_source_csp_names[param->internalCsp]);
110
+ return -1;
111
+ }
112
}
113
else if (!strcmp(profile, "main422-10"))
114
- param->internalCsp = X265_CSP_I422;
115
- else if (!strcmp(profile, "main444-8"))
116
{
117
- param->internalCsp = X265_CSP_I444;
118
-#if HIGH_BIT_DEPTH
119
- x265_log(param, X265_LOG_ERROR, "Main 4:4:4 8 profile not supported, compiled for Main10.\n");
120
- return -1;
121
-#endif
122
+ if (!(param->internalCsp & (X265_CSP_I420 | X265_CSP_I422)))
123
+ {
124
+ x265_log(param, X265_LOG_ERROR, "%s profile not compatible with %s input color space.\n",
125
+ profile, x265_source_csp_names[param->internalCsp]);
126
+ return -1;
127
+ }
128
+ }
129
+ else if (!strcmp(profile, "main444-8") || !strcmp(profile, "main444-10"))
130
+ {
131
+ /* any color space allowed */
132
}
133
- else if (!strcmp(profile, "main444-10"))
134
- param->internalCsp = X265_CSP_I444;
135
else
136
{
137
x265_log(param, X265_LOG_ERROR, "unknown profile <%s>\n", profile);
138
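The rewritten x265_param_apply_profile() no longer forces internalCsp. Instead, it rejects color spaces that the profile cannot carry, using tests such as `param->internalCsp & X265_CSP_I420`. A bitwise AND only works as a membership test when the constants are distinct bit flags. In x265.h the color-space constants follow chroma_format_idc semantics and are sequential integers (X265_CSP_I420 is 1, X265_CSP_I422 is 2, X265_CSP_I444 is 3), so `X265_CSP_I444 & X265_CSP_I420` is also non-zero. The sketch below expresses the same per-profile rules with explicit comparisons. The enum and profileAllowsCsp are hypothetical stand-ins, and their values are assumed to mirror x265.h.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-ins, assumed equal to x265.h's X265_CSP_* values.
enum CspSketch { CSP_I400 = 0, CSP_I420 = 1, CSP_I422 = 2, CSP_I444 = 3 };

// Returns true if `profile` can carry color space `csp`, following the rules
// in the 1.7 hunk above but using equality instead of bitwise tests.
bool profileAllowsCsp(const char* profile, int csp)
{
    assert(profile != nullptr);
    if (!std::strcmp(profile, "main") || !std::strcmp(profile, "main10") ||
        !std::strcmp(profile, "mainstillpicture") || !std::strcmp(profile, "msp"))
        return csp == CSP_I420;
    if (!std::strcmp(profile, "main422-10"))
        return csp == CSP_I420 || csp == CSP_I422;
    if (!std::strcmp(profile, "main444-8") || !std::strcmp(profile, "main444-10"))
        return true;   // "any color space allowed"
    return false;      // unknown profile
}
```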
x265_1.6.tar.gz/source/encoder/motion.cpp -> x265_1.7.tar.gz/source/encoder/motion.cpp
Changed
77
1
2
pix_base + (m1x) + (m1y) * stride, \
3
pix_base + (m2x) + (m2y) * stride, \
4
stride, costs); \
5
- (costs)[0] += mvcost((bmv + MV(m0x, m0y)) << 2); \
6
- (costs)[1] += mvcost((bmv + MV(m1x, m1y)) << 2); \
7
- (costs)[2] += mvcost((bmv + MV(m2x, m2y)) << 2); \
8
+ const uint16_t *base_mvx = &m_cost_mvx[(bmv.x + (m0x)) << 2]; \
9
+ const uint16_t *base_mvy = &m_cost_mvy[(bmv.y + (m0y)) << 2]; \
10
+ X265_CHECK(mvcost((bmv + MV(m0x, m0y)) << 2) == (base_mvx[((m0x) - (m0x)) << 2] + base_mvy[((m0y) - (m0y)) << 2]), "mvcost() check failure\n"); \
11
+ X265_CHECK(mvcost((bmv + MV(m1x, m1y)) << 2) == (base_mvx[((m1x) - (m0x)) << 2] + base_mvy[((m1y) - (m0y)) << 2]), "mvcost() check failure\n"); \
12
+ X265_CHECK(mvcost((bmv + MV(m2x, m2y)) << 2) == (base_mvx[((m2x) - (m0x)) << 2] + base_mvy[((m2y) - (m0y)) << 2]), "mvcost() check failure\n"); \
13
+ (costs)[0] += (base_mvx[((m0x) - (m0x)) << 2] + base_mvy[((m0y) - (m0y)) << 2]); \
14
+ (costs)[1] += (base_mvx[((m1x) - (m0x)) << 2] + base_mvy[((m1y) - (m0y)) << 2]); \
15
+ (costs)[2] += (base_mvx[((m2x) - (m0x)) << 2] + base_mvy[((m2y) - (m0y)) << 2]); \
16
}
17
18
#define COST_MV_PT_DIST_X4(m0x, m0y, p0, d0, m1x, m1y, p1, d1, m2x, m2y, p2, d2, m3x, m3y, p3, d3) \
19
20
fref + (m2x) + (m2y) * stride, \
21
fref + (m3x) + (m3y) * stride, \
22
stride, costs); \
23
- costs[0] += mvcost(MV(m0x, m0y) << 2); \
24
- costs[1] += mvcost(MV(m1x, m1y) << 2); \
25
- costs[2] += mvcost(MV(m2x, m2y) << 2); \
26
- costs[3] += mvcost(MV(m3x, m3y) << 2); \
27
+ (costs)[0] += mvcost(MV(m0x, m0y) << 2); \
28
+ (costs)[1] += mvcost(MV(m1x, m1y) << 2); \
29
+ (costs)[2] += mvcost(MV(m2x, m2y) << 2); \
30
+ (costs)[3] += mvcost(MV(m3x, m3y) << 2); \
31
COPY4_IF_LT(bcost, costs[0], bmv, MV(m0x, m0y), bPointNr, p0, bDistance, d0); \
32
COPY4_IF_LT(bcost, costs[1], bmv, MV(m1x, m1y), bPointNr, p1, bDistance, d1); \
33
COPY4_IF_LT(bcost, costs[2], bmv, MV(m2x, m2y), bPointNr, p2, bDistance, d2); \
34
35
pix_base + (m2x) + (m2y) * stride, \
36
pix_base + (m3x) + (m3y) * stride, \
37
stride, costs); \
38
- costs[0] += mvcost((omv + MV(m0x, m0y)) << 2); \
39
- costs[1] += mvcost((omv + MV(m1x, m1y)) << 2); \
40
- costs[2] += mvcost((omv + MV(m2x, m2y)) << 2); \
41
- costs[3] += mvcost((omv + MV(m3x, m3y)) << 2); \
42
+ const uint16_t *base_mvx = &m_cost_mvx[(omv.x << 2)]; \
43
+ const uint16_t *base_mvy = &m_cost_mvy[(omv.y << 2)]; \
44
+ X265_CHECK(mvcost((omv + MV(m0x, m0y)) << 2) == (base_mvx[(m0x) << 2] + base_mvy[(m0y) << 2]), "mvcost() check failure\n"); \
45
+ X265_CHECK(mvcost((omv + MV(m1x, m1y)) << 2) == (base_mvx[(m1x) << 2] + base_mvy[(m1y) << 2]), "mvcost() check failure\n"); \
46
+ X265_CHECK(mvcost((omv + MV(m2x, m2y)) << 2) == (base_mvx[(m2x) << 2] + base_mvy[(m2y) << 2]), "mvcost() check failure\n"); \
47
+ X265_CHECK(mvcost((omv + MV(m3x, m3y)) << 2) == (base_mvx[(m3x) << 2] + base_mvy[(m3y) << 2]), "mvcost() check failure\n"); \
48
+ costs[0] += (base_mvx[(m0x) << 2] + base_mvy[(m0y) << 2]); \
49
+ costs[1] += (base_mvx[(m1x) << 2] + base_mvy[(m1y) << 2]); \
50
+ costs[2] += (base_mvx[(m2x) << 2] + base_mvy[(m2y) << 2]); \
51
+ costs[3] += (base_mvx[(m3x) << 2] + base_mvy[(m3y) << 2]); \
52
COPY2_IF_LT(bcost, costs[0], bmv, omv + MV(m0x, m0y)); \
53
COPY2_IF_LT(bcost, costs[1], bmv, omv + MV(m1x, m1y)); \
54
COPY2_IF_LT(bcost, costs[2], bmv, omv + MV(m2x, m2y)); \
55
56
pix_base + (m2x) + (m2y) * stride, \
57
pix_base + (m3x) + (m3y) * stride, \
58
stride, costs); \
59
- (costs)[0] += mvcost((bmv + MV(m0x, m0y)) << 2); \
60
- (costs)[1] += mvcost((bmv + MV(m1x, m1y)) << 2); \
61
- (costs)[2] += mvcost((bmv + MV(m2x, m2y)) << 2); \
62
- (costs)[3] += mvcost((bmv + MV(m3x, m3y)) << 2); \
63
+ /* TODO: use restrict keyword in ICL */ \
64
+ const uint16_t *base_mvx = &m_cost_mvx[(bmv.x << 2)]; \
65
+ const uint16_t *base_mvy = &m_cost_mvy[(bmv.y << 2)]; \
66
+ X265_CHECK(mvcost((bmv + MV(m0x, m0y)) << 2) == (base_mvx[(m0x) << 2] + base_mvy[(m0y) << 2]), "mvcost() check failure\n"); \
67
+ X265_CHECK(mvcost((bmv + MV(m1x, m1y)) << 2) == (base_mvx[(m1x) << 2] + base_mvy[(m1y) << 2]), "mvcost() check failure\n"); \
68
+ X265_CHECK(mvcost((bmv + MV(m2x, m2y)) << 2) == (base_mvx[(m2x) << 2] + base_mvy[(m2y) << 2]), "mvcost() check failure\n"); \
69
+ X265_CHECK(mvcost((bmv + MV(m3x, m3y)) << 2) == (base_mvx[(m3x) << 2] + base_mvy[(m3y) << 2]), "mvcost() check failure\n"); \
70
+ (costs)[0] += (base_mvx[(m0x) << 2] + base_mvy[(m0y) << 2]); \
71
+ (costs)[1] += (base_mvx[(m1x) << 2] + base_mvy[(m1y) << 2]); \
72
+ (costs)[2] += (base_mvx[(m2x) << 2] + base_mvy[(m2y) << 2]); \
73
+ (costs)[3] += (base_mvx[(m3x) << 2] + base_mvy[(m3y) << 2]); \
74
}
75
76
#define DIA1_ITER(mx, my) \
77
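The motion.cpp macros now perform the cost-table lookup once per call. base_mvx/base_mvy point at the entry for the search center (bmv or omv), and each candidate adds `base_mvx[dx << 2] + base_mvy[dy << 2]`, with X265_CHECK confirming the result equals mvcost(). The sketch below applies the same idea to a hypothetical separable cost table (cost = center[qx] + center[qy], in quarter-pel units). It uses a placeholder cost model and does not model the motion vector predictor. It multiplies by 4 instead of shifting, because left-shifting a negative int is undefined behavior before C++20.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical separable MV cost table in quarter-pel units:
// cost(qx, qy) = center[qx] + center[qy], valid for |qx|, |qy| <= range.
struct MvCostTable
{
    int range;
    std::vector<uint16_t> storage;
    const uint16_t* center;

    explicit MvCostTable(int r) : range(r), storage(2 * r + 1)
    {
        for (int v = -r; v <= r; v++)
            storage[v + r] = (uint16_t)(2 * std::abs(v) + 1);  // placeholder cost model
        center = &storage[r];
    }
    MvCostTable(const MvCostTable&) = delete;             // `center` points into `storage`
    MvCostTable& operator=(const MvCostTable&) = delete;

    uint32_t cost(int qx, int qy) const
    {
        assert(std::abs(qx) <= range && std::abs(qy) <= range);
        return center[qx] + center[qy];
    }
};

// Add the cost of candidates (bx + dx[i], by + dy[i]), given in full-pel units,
// to costs[i], locating the search center in the table only once.
// The caller must keep every quarter-pel index within the table's range.
void addMvCostsX3(const MvCostTable& t, int bx, int by,
                  const int dx[3], const int dy[3], uint32_t costs[3])
{
    const uint16_t* baseX = t.center + bx * 4;  // full-pel -> quarter-pel
    const uint16_t* baseY = t.center + by * 4;
    for (int i = 0; i < 3; i++)
    {
        uint32_t c = baseX[dx[i] * 4] + baseY[dy[i] * 4];
        assert(c == t.cost((bx + dx[i]) * 4, (by + dy[i]) * 4));  // like the X265_CHECKs
        costs[i] += c;
    }
}
```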
x265_1.6.tar.gz/source/encoder/nal.cpp -> x265_1.7.tar.gz/source/encoder/nal.cpp
Changed
40
1
2
, m_extraBuffer(NULL)
3
, m_extraOccupancy(0)
4
, m_extraAllocSize(0)
5
+ , m_annexB(true)
6
{}
7
8
void NALList::takeContents(NALList& other)
9
10
uint8_t *out = m_buffer + m_occupancy;
11
uint32_t bytes = 0;
12
13
- if (!m_numNal || nalUnitType == NAL_UNIT_VPS || nalUnitType == NAL_UNIT_SPS || nalUnitType == NAL_UNIT_PPS)
14
+ if (!m_annexB)
15
+ {
16
+ /* Will write size later */
17
+ bytes += 4;
18
+ }
19
+ else if (!m_numNal || nalUnitType == NAL_UNIT_VPS || nalUnitType == NAL_UNIT_SPS || nalUnitType == NAL_UNIT_PPS)
20
{
21
memcpy(out, startCodePrefix, 4);
22
bytes += 4;
23
24
* to 0x03 is appended to the end of the data. */
25
if (!out[bytes - 1])
26
out[bytes++] = 0x03;
27
+
28
+ if (!m_annexB)
29
+ {
30
+ uint32_t dataSize = bytes - 4;
31
+ out[0] = (uint8_t)(dataSize >> 24);
32
+ out[1] = (uint8_t)(dataSize >> 16);
33
+ out[2] = (uint8_t)(dataSize >> 8);
34
+ out[3] = (uint8_t)dataSize;
35
+ }
36
+
37
m_occupancy += bytes;
38
39
X265_CHECK(m_numNal < (uint32_t)MAX_NAL_UNITS, "NAL count overflow\n");
40
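When m_annexB is cleared, the NAL serialization code above reserves four bytes and writes the escaped payload. It then back-fills the payload size as a 32-bit big-endian length instead of emitting a start code. This length-prefixed framing is the one used when HEVC is stored in MP4-style containers. The simplified sketch below shows both framings. appendNal is hypothetical and assumes the payload has already had emulation prevention applied. It also always uses the 4-byte start code, whereas the condition shown above reserves that form for the first NAL and for VPS/SPS/PPS.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: append one already-escaped NAL unit to `out`, framed
// either with an Annex B start code or with a 4-byte big-endian size.
void appendNal(std::vector<uint8_t>& out, const std::vector<uint8_t>& payload, bool annexB)
{
    assert(!payload.empty());
    if (annexB)
    {
        static const uint8_t startCode[4] = { 0, 0, 0, 1 };
        out.insert(out.end(), startCode, startCode + 4);
    }
    else
    {
        uint32_t size = (uint32_t)payload.size();
        out.push_back((uint8_t)(size >> 24));
        out.push_back((uint8_t)(size >> 16));
        out.push_back((uint8_t)(size >> 8));
        out.push_back((uint8_t)size);
    }
    out.insert(out.end(), payload.begin(), payload.end());
}
```

A muxer that writes length-prefixed samples can then copy x265's output directly, without having to scan for start codes.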
x265_1.6.tar.gz/source/encoder/nal.h -> x265_1.7.tar.gz/source/encoder/nal.h
Changed
9
1
2
uint8_t* m_extraBuffer;
3
uint32_t m_extraOccupancy;
4
uint32_t m_extraAllocSize;
5
+ bool m_annexB;
6
7
NALList();
8
~NALList() { X265_FREE(m_buffer); X265_FREE(m_extraBuffer); }
9
x265_1.6.tar.gz/source/encoder/ratecontrol.cpp -> x265_1.7.tar.gz/source/encoder/ratecontrol.cpp
Changed
201
1
2
}
3
}
4
5
- /* qstep - value set as encoder specific */
6
+ /* qpstep - value set as encoder specific */
7
m_lstep = pow(2, m_param->rc.qpStep / 6.0);
8
9
for (int i = 0; i < 2; i++)
10
11
m_accumPQp = (m_param->rc.rateControlMode == X265_RC_CRF ? CRF_INIT_QP : ABR_INIT_QP_MIN) * m_accumPNorm;
12
13
/* Frame Predictors and Row predictors used in vbv */
14
- for (int i = 0; i < 5; i++)
15
+ for (int i = 0; i < 4; i++)
16
{
17
- m_pred[i].coeff = 1.5;
18
+ m_pred[i].coeff = 1.0;
19
m_pred[i].count = 1.0;
20
m_pred[i].decay = 0.5;
21
m_pred[i].offset = 0.0;
22
}
23
- m_pred[0].coeff = 1.0;
24
+ m_pred[0].coeff = m_pred[3].coeff = 0.75;
25
+ if (m_param->rc.qCompress >= 0.8) // when tuned for grain
26
+ {
27
+ m_pred[1].coeff = 0.75;
28
+ m_pred[0].coeff = m_pred[3].coeff = 0.50;
29
+ }
30
if (!m_statFileOut && (m_param->rc.bStatWrite || m_param->rc.bStatRead))
31
{
32
/* If the user hasn't defined the stat filename, use the default value */
33
34
m_curSlice = curEncData.m_slice;
35
m_sliceType = m_curSlice->m_sliceType;
36
rce->sliceType = m_sliceType;
37
+ if (!m_2pass)
38
+ rce->keptAsRef = IS_REFERENCED(curFrame);
39
+ m_predType = getPredictorType(curFrame->m_lowres.sliceType, m_sliceType);
40
rce->poc = m_curSlice->m_poc;
41
if (m_param->rc.bStatRead)
42
{
43
44
m_lastQScaleFor[m_sliceType] = x265_qp2qScale(rce->qpaRc);
45
if (rce->poc == 0)
46
m_lastQScaleFor[P_SLICE] = m_lastQScaleFor[m_sliceType] * fabs(m_param->rc.ipFactor);
47
- rce->frameSizePlanned = predictSize(&m_pred[m_sliceType], m_qp, (double)m_currentSatd);
48
+ rce->frameSizePlanned = predictSize(&m_pred[m_predType], m_qp, (double)m_currentSatd);
49
}
50
}
51
m_framesDone++;
52
53
m_accumPQp += m_qp;
54
}
55
56
+int RateControl::getPredictorType(int lowresSliceType, int sliceType)
57
+{
58
+ /* Use a different predictor for B Ref and B frames for vbv frame size predictions */
59
+ if (lowresSliceType == X265_TYPE_BREF)
60
+ return 3;
61
+ return sliceType;
62
+}
63
+
64
double RateControl::getDiffLimitedQScale(RateControlEntry *rce, double q)
65
{
66
// force I/B quants as a function of P quants
67
68
q += m_pbOffset;
69
70
double qScale = x265_qp2qScale(q);
71
+ rce->qpNoVbv = q;
72
double lmin = 0, lmax = 0;
73
if (m_isVbv)
74
{
75
76
qScale = x265_clip3(lmin, lmax, qScale);
77
q = x265_qScale2qp(qScale);
78
}
79
- rce->qpNoVbv = q;
80
if (!m_2pass)
81
{
82
qScale = clipQscale(curFrame, rce, qScale);
83
/* clip qp to permissible range after vbv-lookahead estimation to avoid possible
84
* mispredictions by initial frame size predictors */
85
- if (m_pred[m_sliceType].count == 1)
86
+ if (m_pred[m_predType].count == 1)
87
qScale = x265_clip3(lmin, lmax, qScale);
88
m_lastQScaleFor[m_sliceType] = qScale;
89
- rce->frameSizePlanned = predictSize(&m_pred[m_sliceType], qScale, (double)m_currentSatd);
90
+ rce->frameSizePlanned = predictSize(&m_pred[m_predType], qScale, (double)m_currentSatd);
91
}
92
else
93
rce->frameSizePlanned = qScale2bits(rce, qScale);
94
95
q = clipQscale(curFrame, rce, q);
96
/* clip qp to permissible range after vbv-lookahead estimation to avoid possible
97
* mispredictions by initial frame size predictors */
98
- if (!m_2pass && m_isVbv && m_pred[m_sliceType].count == 1)
99
+ if (!m_2pass && m_isVbv && m_pred[m_predType].count == 1)
100
q = x265_clip3(lqmin, lqmax, q);
101
}
102
m_lastQScaleFor[m_sliceType] = q;
103
104
if (m_2pass && m_isVbv)
105
rce->frameSizePlanned = qScale2bits(rce, q);
106
else
107
- rce->frameSizePlanned = predictSize(&m_pred[m_sliceType], q, (double)m_currentSatd);
108
+ rce->frameSizePlanned = predictSize(&m_pred[m_predType], q, (double)m_currentSatd);
109
110
/* Always use up the whole VBV in this case. */
111
if (m_singleFrameVbv)
112
113
{
114
double frameQ[3];
115
double curBits;
116
- curBits = predictSize(&m_pred[m_sliceType], q, (double)m_currentSatd);
117
+ curBits = predictSize(&m_pred[m_predType], q, (double)m_currentSatd);
118
double bufferFillCur = m_bufferFill - curBits;
119
double targetFill;
120
double totalDuration = m_frameDuration;
121
122
bufferFillCur += wantedFrameSize;
123
int64_t satd = curFrame->m_lowres.plannedSatd[j] >> (X265_DEPTH - 8);
124
type = IS_X265_TYPE_I(type) ? I_SLICE : IS_X265_TYPE_B(type) ? B_SLICE : P_SLICE;
125
- curBits = predictSize(&m_pred[type], frameQ[type], (double)satd);
126
+ int predType = getPredictorType(curFrame->m_lowres.plannedType[j], type);
127
+ curBits = predictSize(&m_pred[predType], frameQ[type], (double)satd);
128
bufferFillCur -= curBits;
129
}
130
131
132
}
133
// Now a hard threshold to make sure the frame fits in VBV.
134
// This one is mostly for I-frames.
135
- double bits = predictSize(&m_pred[m_sliceType], q, (double)m_currentSatd);
136
+ double bits = predictSize(&m_pred[m_predType], q, (double)m_currentSatd);
137
138
// For small VBVs, allow the frame to use up the entire VBV.
139
double maxFillFactor;
140
141
bits *= qf;
142
if (bits < m_bufferRate / minFillFactor)
143
q *= bits * minFillFactor / m_bufferRate;
144
- bits = predictSize(&m_pred[m_sliceType], q, (double)m_currentSatd);
145
+ bits = predictSize(&m_pred[m_predType], q, (double)m_currentSatd);
146
}
147
148
q = X265_MAX(q0, q);
149
}
150
151
/* Apply MinCR restrictions */
152
- double pbits = predictSize(&m_pred[m_sliceType], q, (double)m_currentSatd);
153
+ double pbits = predictSize(&m_pred[m_predType], q, (double)m_currentSatd);
154
if (pbits > rce->frameSizeMaximum)
155
q *= pbits / rce->frameSizeMaximum;
156
-
157
- if (!m_isCbr || (m_isAbr && m_currentSatd >= rce->movingAvgSum && q <= q0 / 2))
158
+ /* To detect frames that are more complex in SATD costs compared to prev window, yet
159
+ * lookahead vbv reduces its qscale by half its value. Be on safer side and avoid drastic
160
+ * qscale reductions for frames high in complexity */
161
+ bool mispredCheck = rce->movingAvgSum && m_currentSatd >= rce->movingAvgSum && q <= q0 / 2;
162
+ if (!m_isCbr || (m_isAbr && mispredCheck))
163
q = X265_MAX(q0, q);
164
165
if (m_rateFactorMaxIncrement)
166
167
if (satdCostForPendingCus > 0)
168
{
169
double pred_s = predictSize(rce->rowPred[0], qScale, satdCostForPendingCus);
170
- uint32_t refRowSatdCost = 0, refRowBits = 0, intraCost = 0;
171
+ uint32_t refRowSatdCost = 0, refRowBits = 0, intraCostForPendingCus = 0;
172
double refQScale = 0;
173
174
if (picType != I_SLICE)
175
{
176
FrameData& refEncData = *refFrame->m_encData;
177
uint32_t endCuAddr = maxCols * (row + 1);
178
- for (uint32_t cuAddr = curEncData.m_rowStat[row].numEncodedCUs + 1; cuAddr < endCuAddr; cuAddr++)
179
+ uint32_t startCuAddr = curEncData.m_rowStat[row].numEncodedCUs;
180
+ if (startCuAddr)
181
{
182
- refRowSatdCost += refEncData.m_cuStat[cuAddr].vbvCost;
183
- refRowBits += refEncData.m_cuStat[cuAddr].totalBits;
184
- intraCost += curEncData.m_cuStat[cuAddr].intraVbvCost;
185
+ for (uint32_t cuAddr = startCuAddr + 1 ; cuAddr < endCuAddr; cuAddr++)
186
+ {
187
+ refRowSatdCost += refEncData.m_cuStat[cuAddr].vbvCost;
188
+ refRowBits += refEncData.m_cuStat[cuAddr].totalBits;
189
+ }
190
+ }
191
+ else
192
+ {
193
+ refRowBits = refEncData.m_rowStat[row].encodedBits;
194
+ refRowSatdCost = refEncData.m_rowStat[row].satdForVbv;
195
}
196
197
refRowSatdCost >>= X265_DEPTH - 8;
198
199
if (picType == I_SLICE || qScale >= refQScale)
200
{
201
x265_1.6.tar.gz/source/encoder/ratecontrol.h -> x265_1.7.tar.gz/source/encoder/ratecontrol.h
Changed
22
1
2
double m_rateFactorMaxIncrement; /* Don't allow RF above (CRF + this value). */
3
double m_rateFactorMaxDecrement; /* don't allow RF below (this value). */
4
5
- Predictor m_pred[5];
6
- Predictor m_predBfromP;
7
-
8
+ Predictor m_pred[4]; /* Slice predictors to preidct bits for each Slice type - I,P,Bref and B */
9
int64_t m_leadingNoBSatd;
10
+ int m_predType; /* Type of slice predictors to be used - depends on the slice type */
11
double m_ipOffset;
12
double m_pbOffset;
13
int64_t m_bframeBits;
14
15
double tuneAbrQScaleFromFeedback(double qScale);
16
void accumPQpUpdate();
17
18
+ int getPredictorType(int lowresSliceType, int sliceType);
19
void updateVbv(int64_t bits, RateControlEntry* rce);
20
void updatePredictor(Predictor *p, double q, double var, double bits);
21
double clipQscale(Frame* pic, RateControlEntry* rce, double q);
22
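Rate control in 1.7 keeps four frame-size predictors, selected by getPredictorType(). The slice type is used for I, P and non-reference B frames, and index 3 is used for frames that the lookahead marked X265_TYPE_BREF. The sketch below mirrors that selection and the new initial coefficients. It assumes x265's SliceType ordering (B_SLICE = 0, P_SLICE = 1, I_SLICE = 2), under which the constructor's m_pred[0] and m_pred[3] are the B and B-reference predictors. It also assumes an x264-style size model for predictSize: bits = (coeff * satd + offset) / (qscale * count). The enum, predictorType and initPredictors are hypothetical names.

```cpp
#include <cassert>

// Assumed to match x265's SliceType enum; index 3 is the B-reference predictor.
enum SliceTypeSketch { B_SLICE = 0, P_SLICE = 1, I_SLICE = 2 };
const int PRED_BREF = 3;

struct Predictor { double coeff, count, decay, offset; };

// Mirrors getPredictorType(): B-reference frames use their own predictor.
int predictorType(bool lowresIsBref, int sliceType)
{
    assert(sliceType >= B_SLICE && sliceType <= I_SLICE);
    return lowresIsBref ? PRED_BREF : sliceType;
}

// Initial state as set in the 1.7 constructor hunk.
void initPredictors(Predictor pred[4], double qCompress)
{
    for (int i = 0; i < 4; i++)
        pred[i] = Predictor{ 1.0, 1.0, 0.5, 0.0 };
    pred[B_SLICE].coeff = pred[PRED_BREF].coeff = 0.75;
    if (qCompress >= 0.8)  // "when tuned for grain"
    {
        pred[P_SLICE].coeff = 0.75;
        pred[B_SLICE].coeff = pred[PRED_BREF].coeff = 0.50;
    }
}

// Assumed model: predicted bits = (coeff * satd + offset) / (qScale * count).
double predictSize(const Predictor& p, double qScale, double satd)
{
    return (p.coeff * satd + p.offset) / (qScale * p.count);
}
```

Giving B-reference frames a separate predictor keeps their typically larger sizes from skewing the estimate for non-reference B frames.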
x265_1.6.tar.gz/source/encoder/rdcost.h -> x265_1.7.tar.gz/source/encoder/rdcost.h
Changed
47
1
2
uint32_t m_chromaDistWeight[2];
3
uint32_t m_psyRdBase;
4
uint32_t m_psyRd;
5
- int m_qp;
6
+ int m_qp; /* QP used to configure lambda, may be higher than QP_MAX_SPEC but <= QP_MAX_MAX */
7
8
void setPsyRdScale(double scale) { m_psyRdBase = (uint32_t)floor(65536.0 * scale * 0.33); }
9
10
void setQP(const Slice& slice, int qp)
11
{
12
+ x265_emms(); /* TODO: if the lambda tables were ints, this would not be necessary */
13
m_qp = qp;
14
+ setLambda(x265_lambda2_tab[qp], x265_lambda_tab[qp]);
15
16
/* Scale PSY RD factor by a slice type factor */
17
static const uint32_t psyScaleFix8[3] = { 300, 256, 96 }; /* B, P, I */
18
19
}
20
21
int qpCb, qpCr;
22
- setLambda(x265_lambda2_tab[qp], x265_lambda_tab[qp]);
23
if (slice.m_sps->chromaFormatIdc == X265_CSP_I420)
24
- qpCb = x265_clip3(QP_MIN, QP_MAX_MAX, (int)g_chromaScale[qp + slice.m_pps->chromaQpOffset[0]]);
25
+ {
26
+ qpCb = (int)g_chromaScale[x265_clip3(QP_MIN, QP_MAX_MAX, qp + slice.m_pps->chromaQpOffset[0])];
27
+ qpCr = (int)g_chromaScale[x265_clip3(QP_MIN, QP_MAX_MAX, qp + slice.m_pps->chromaQpOffset[1])];
28
+ }
29
else
30
- qpCb = X265_MIN(qp + slice.m_pps->chromaQpOffset[0], QP_MAX_SPEC);
31
+ {
32
+ qpCb = x265_clip3(QP_MIN, QP_MAX_SPEC, qp + slice.m_pps->chromaQpOffset[0]);
33
+ qpCr = x265_clip3(QP_MIN, QP_MAX_SPEC, qp + slice.m_pps->chromaQpOffset[1]);
34
+ }
35
+
36
int chroma_offset_idx = X265_MIN(qp - qpCb + 12, MAX_CHROMA_LAMBDA_OFFSET);
37
uint16_t lambdaOffset = m_psyRd ? x265_chroma_lambda2_offset_tab[chroma_offset_idx] : 256;
38
m_chromaDistWeight[0] = lambdaOffset;
39
40
- if (slice.m_sps->chromaFormatIdc == X265_CSP_I420)
41
- qpCr = x265_clip3(QP_MIN, QP_MAX_MAX, (int)g_chromaScale[qp + slice.m_pps->chromaQpOffset[0]]);
42
- else
43
- qpCr = X265_MIN(qp + slice.m_pps->chromaQpOffset[0], QP_MAX_SPEC);
44
chroma_offset_idx = X265_MIN(qp - qpCr + 12, MAX_CHROMA_LAMBDA_OFFSET);
45
lambdaOffset = m_psyRd ? x265_chroma_lambda2_offset_tab[chroma_offset_idx] : 256;
46
m_chromaDistWeight[1] = lambdaOffset;
47
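setQP() now clamps qp + chromaQpOffset before indexing g_chromaScale, and it computes Cr from chromaQpOffset[1]. The 1.6 code indexed the table with the unclamped sum and used chromaQpOffset[0] for both components. For 4:2:0, the mapping is the HEVC luma-to-chroma QP table. The sketch below shows that order of operations. chromaQp420 and chromaQps are hypothetical helpers, and qpMin/qpMax stand in for QP_MIN and QP_MAX_MAX. The table follows the HEVC formula (qPi - 6 above 43), which is assumed here to agree with g_chromaScale.

```cpp
#include <algorithm>
#include <cassert>

// HEVC 4:2:0 mapping from qPi to QpC.
int chromaQp420(int qPi)
{
    static const int mid[14] = { 29, 30, 31, 32, 33, 33, 34, 34, 35, 35, 36, 36, 37, 37 };
    if (qPi < 30)
        return qPi;
    if (qPi > 43)
        return qPi - 6;
    return mid[qPi - 30];
}

// Clamp first, then map; Cb uses offset[0] and Cr uses offset[1].
void chromaQps(int qp, const int offset[2], int qpMin, int qpMax, int& qpCb, int& qpCr)
{
    assert(qpMin <= qpMax);
    qpCb = chromaQp420(std::min(std::max(qp + offset[0], qpMin), qpMax));
    qpCr = chromaQp420(std::min(std::max(qp + offset[1], qpMin), qpMax));
}
```

Clamping before the lookup keeps the table index in range even when a large PPS offset pushes qp + offset past the top of the table.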
x265_1.6.tar.gz/source/encoder/sao.cpp -> x265_1.7.tar.gz/source/encoder/sao.cpp
Changed
149
1
2
pixel* tmpL;
3
pixel* tmpU;
4
5
- int8_t _upBuff1[MAX_CU_SIZE + 2], *upBuff1 = _upBuff1 + 1;
6
+ int8_t _upBuff1[MAX_CU_SIZE + 2], *upBuff1 = _upBuff1 + 1, signLeft1[2];
7
int8_t _upBufft[MAX_CU_SIZE + 2], *upBufft = _upBufft + 1;
8
9
memset(_upBuff1 + MAX_CU_SIZE, 0, 2 * sizeof(int8_t)); /* avoid valgrind uninit warnings */
10
11
{
12
case SAO_EO_0: // dir: -
13
{
14
- pixel firstPxl = 0, lastPxl = 0;
15
+ pixel firstPxl = 0, lastPxl = 0, row1FirstPxl = 0, row1LastPxl = 0;
16
startX = !lpelx;
17
endX = (rpelx == picWidth) ? ctuWidth - 1 : ctuWidth;
18
if (ctuWidth & 15)
19
20
}
21
else
22
{
23
- for (y = 0; y < ctuHeight; y++)
24
+ for (y = 0; y < ctuHeight; y += 2)
25
{
26
- int signLeft = signOf(rec[startX] - tmpL[y]);
27
+ signLeft1[0] = signOf(rec[startX] - tmpL[y]);
28
+ signLeft1[1] = signOf(rec[stride + startX] - tmpL[y + 1]);
29
30
if (!lpelx)
31
+ {
32
firstPxl = rec[0];
33
+ row1FirstPxl = rec[stride];
34
+ }
35
36
if (rpelx == picWidth)
37
+ {
38
lastPxl = rec[ctuWidth - 1];
39
+ row1LastPxl = rec[stride + ctuWidth - 1];
40
+ }
41
42
- primitives.saoCuOrgE0(rec, m_offsetEo, ctuWidth, (int8_t)signLeft);
43
+ primitives.saoCuOrgE0(rec, m_offsetEo, ctuWidth, signLeft1, stride);
44
45
if (!lpelx)
46
+ {
47
rec[0] = firstPxl;
48
+ rec[stride] = row1FirstPxl;
49
+ }
50
51
if (rpelx == picWidth)
52
+ {
53
rec[ctuWidth - 1] = lastPxl;
54
+ rec[stride + ctuWidth - 1] = row1LastPxl;
55
+ }
56
57
- rec += stride;
58
+ rec += 2 * stride;
59
}
60
}
61
break;
62
63
{
64
primitives.sign(upBuff1, rec, tmpU, ctuWidth);
65
66
- for (y = startY; y < endY; y++)
67
+ int diff = (endY - startY) % 2;
68
+ for (y = startY; y < endY - diff; y += 2)
69
{
70
- primitives.saoCuOrgE1(rec, upBuff1, m_offsetEo, stride, ctuWidth);
71
- rec += stride;
72
+ primitives.saoCuOrgE1_2Rows(rec, upBuff1, m_offsetEo, stride, ctuWidth);
73
+ rec += 2 * stride;
74
}
75
+ if (diff & 1)
76
+ primitives.saoCuOrgE1(rec, upBuff1, m_offsetEo, stride, ctuWidth);
77
}
78
79
break;
80
81
for (y = startY; y < endY; y++)
82
{
83
int8_t iSignDown2 = signOf(rec[stride + startX] - tmpL[y]);
84
- pixel firstPxl = rec[0]; // copy first Pxl
85
- pixel lastPxl = rec[ctuWidth - 1];
86
- int8_t one = upBufft[1];
87
- int8_t two = upBufft[endX + 1];
88
89
- primitives.saoCuOrgE2(rec, upBufft, upBuff1, m_offsetEo, ctuWidth, stride);
90
- if (!lpelx)
91
- {
92
- rec[0] = firstPxl;
93
- upBufft[1] = one;
94
- }
95
-
96
- if (rpelx == picWidth)
97
- {
98
- rec[ctuWidth - 1] = lastPxl;
99
- upBufft[endX + 1] = two;
100
- }
101
+ primitives.saoCuOrgE2[endX > 16](rec + startX, upBufft + startX, upBuff1 + startX, m_offsetEo, endX - startX, stride);
102
103
upBufft[startX] = iSignDown2;
104
105
106
upBuff1[x - 1] = -signDown;
107
rec[x] = m_clipTable[rec[x] + m_offsetEo[edgeType]];
108
109
- primitives.saoCuOrgE3(rec, upBuff1, m_offsetEo, stride - 1, startX, endX);
110
+ primitives.saoCuOrgE3[endX > 16](rec, upBuff1, m_offsetEo, stride - 1, startX, endX);
111
112
upBuff1[endX - 1] = signOf(rec[endX - 1 + stride] - rec[endX]);
113
114
115
rec += stride;
116
}
117
118
- if (!(ctuWidth & 15))
119
- primitives.sign(upBuff1, rec, &rec[- stride], ctuWidth);
120
- else
121
- {
122
- for (x = 0; x < ctuWidth; x++)
123
- upBuff1[x] = signOf(rec[x] - rec[x - stride]);
124
- }
125
+ primitives.sign(upBuff1, rec, &rec[- stride], ctuWidth);
126
127
for (y = startY; y < endY; y++)
128
{
129
130
rec += stride;
131
}
132
133
- for (x = startX; x < endX; x++)
134
- upBuff1[x] = signOf(rec[x] - rec[x - stride - 1]);
135
+ primitives.sign(&upBuff1[startX], &rec[startX], &rec[startX - stride - 1], (endX - startX));
136
137
for (y = startY; y < endY; y++)
138
{
139
140
rec += stride;
141
}
142
143
- for (x = startX - 1; x < endX; x++)
144
- upBuff1[x] = signOf(rec[x] - rec[x - stride + 1]);
145
+ primitives.sign(&upBuff1[startX - 1], &rec[startX - 1], &rec[startX - 1 - stride + 1], (endX - startX + 1));
146
147
for (y = startY; y < endY; y++)
148
{
149
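The SAO hunk above makes two changes. Scalar sign loops such as `upBuff1[x] = signOf(rec[x] - rec[x - stride])` become calls to `primitives.sign`. The EO_1 row loop now steps two rows at a time and handles an odd leftover row with the single-row kernel, using `diff = (endY - startY) % 2`. The sketch below is a scalar model of both ideas. It assumes 8-bit pixels, and `signDiffRow`, `RowCalls` and `splitRows` are hypothetical names, not x265 primitives.

```cpp
#include <cassert>
#include <cstdint>

// Sign of an integer: -1, 0 or +1.
static inline int8_t signOf(int v) { return (int8_t)((v > 0) - (v < 0)); }

// Scalar model of a "sign of difference" row kernel (assumes 8-bit pixels):
// dst[x] = signOf(cur[x] - ref[x]), like the removed per-pixel loops.
void signDiffRow(int8_t* dst, const uint8_t* cur, const uint8_t* ref, int width)
{
    for (int x = 0; x < width; x++)
        dst[x] = signOf((int)cur[x] - (int)ref[x]);
}

// Hypothetical driver showing the "two rows per step, then an optional tail row"
// split. It only counts how often each kernel would be called.
struct RowCalls { int pairs; int singles; };

RowCalls splitRows(int startY, int endY)
{
    RowCalls calls = { 0, 0 };
    int diff = (endY - startY) % 2;           // 1 when the row count is odd
    for (int y = startY; y < endY - diff; y += 2)
        calls.pairs++;                        // stands in for a 2-row kernel call
    if (diff & 1)
        calls.singles++;                      // stands in for the 1-row kernel call
    return calls;
}
```

With seven rows (`startY = 1`, `endY = 8`) this yields three two-row calls plus one single-row call. An even row count never touches the tail path.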
x265_1.6.tar.gz/source/encoder/search.cpp -> x265_1.7.tar.gz/source/encoder/search.cpp
Changed
201
1
2
X265_FREE(m_tsRecon);
3
}
4
5
-void Search::setQP(const Slice& slice, int qp)
6
+int Search::setLambdaFromQP(const CUData& ctu, int qp)
7
{
8
- x265_emms(); /* TODO: if the lambda tables were ints, this would not be necessary */
9
+ X265_CHECK(qp >= QP_MIN && qp <= QP_MAX_MAX, "QP used for lambda is out of range\n");
10
+
11
m_me.setQP(qp);
12
- m_rdCost.setQP(slice, qp);
13
+ m_rdCost.setQP(*m_slice, qp);
14
+
15
+ int quantQP = x265_clip3(QP_MIN, QP_MAX_SPEC, qp);
16
+ m_quant.setQPforQuant(ctu, quantQP);
17
+ return quantQP;
18
}
19
20
#if CHECKED_BUILD || _DEBUG
21
22
intraMode.psyEnergy = m_rdCost.psyCost(cuGeom.log2CUSize - 2, fencYuv->m_buf[0], fencYuv->m_size, intraMode.reconYuv.m_buf[0], intraMode.reconYuv.m_size);
23
}
24
updateModeCost(intraMode);
25
- checkDQP(cu, cuGeom);
26
+ checkDQP(intraMode, cuGeom);
27
}
28
29
/* Note that this function does not save the best intra prediction, it must
30
31
32
pixel nScale[129];
33
intraNeighbourBuf[1][0] = intraNeighbourBuf[0][0];
34
- primitives.scale1D_128to64(nScale + 1, intraNeighbourBuf[0] + 1, 0);
35
+ primitives.scale1D_128to64(nScale + 1, intraNeighbourBuf[0] + 1);
36
37
// we do not estimate filtering for downscaled samples
38
- for (int x = 1; x < 65; x++)
39
- {
40
- intraNeighbourBuf[0][x] = nScale[x]; // Top pixel
41
- intraNeighbourBuf[0][x + 64] = nScale[x + 64]; // Left pixel
42
- intraNeighbourBuf[1][x] = nScale[x]; // Top pixel
43
- intraNeighbourBuf[1][x + 64] = nScale[x + 64]; // Left pixel
44
- }
45
+ memcpy(&intraNeighbourBuf[0][1], &nScale[1], 2 * 64 * sizeof(pixel)); // Top & Left pixels
46
+ memcpy(&intraNeighbourBuf[1][1], &nScale[1], 2 * 64 * sizeof(pixel));
47
48
scaleTuSize = 32;
49
scaleStride = 32;
50
51
X265_CHECK(cu.m_partSize[0] == SIZE_2Nx2N, "encodeIntraInInter does not expect NxN intra\n");
52
X265_CHECK(!m_slice->isIntra(), "encodeIntraInInter does not expect to be used in I slices\n");
53
54
- m_quant.setQPforQuant(cu);
55
-
56
uint32_t tuDepthRange[2];
57
cu.getIntraTUQtDepthRange(tuDepthRange, 0);
58
59
60
61
m_entropyCoder.store(intraMode.contexts);
62
updateModeCost(intraMode);
63
- checkDQP(intraMode.cu, cuGeom);
64
+ checkDQP(intraMode, cuGeom);
65
}
66
67
uint32_t Search::estIntraPredQT(Mode &intraMode, const CUGeom& cuGeom, const uint32_t depthRange[2], uint8_t* sharedModes)
68
69
70
pixel nScale[129];
71
intraNeighbourBuf[1][0] = intraNeighbourBuf[0][0];
72
- primitives.scale1D_128to64(nScale + 1, intraNeighbourBuf[0] + 1, 0);
73
+ primitives.scale1D_128to64(nScale + 1, intraNeighbourBuf[0] + 1);
74
75
- // TO DO: primitive
76
- for (int x = 1; x < 65; x++)
77
- {
78
- intraNeighbourBuf[0][x] = nScale[x]; // Top pixel
79
- intraNeighbourBuf[0][x + 64] = nScale[x + 64]; // Left pixel
80
- intraNeighbourBuf[1][x] = nScale[x]; // Top pixel
81
- intraNeighbourBuf[1][x + 64] = nScale[x + 64]; // Left pixel
82
- }
83
+ memcpy(&intraNeighbourBuf[0][1], &nScale[1], 2 * 64 * sizeof(pixel));
84
+ memcpy(&intraNeighbourBuf[1][1], &nScale[1], 2 * 64 * sizeof(pixel));
85
86
scaleTuSize = 32;
87
scaleStride = 32;
88
89
return outCost;
90
}
91
92
+/* Pick between the two AMVP candidates which is the best one to use as
93
+ * MVP for the motion search, based on SAD cost */
94
+int Search::selectMVP(const CUData& cu, const PredictionUnit& pu, const MV amvp[AMVP_NUM_CANDS], int list, int ref)
95
+{
96
+ if (amvp[0] == amvp[1])
97
+ return 0;
98
+
99
+ Yuv& tmpPredYuv = m_rqt[cu.m_cuDepth[0]].tmpPredYuv;
100
+ uint32_t costs[AMVP_NUM_CANDS];
101
+
102
+ for (int i = 0; i < AMVP_NUM_CANDS; i++)
103
+ {
104
+ MV mvCand = amvp[i];
105
+
106
+ // NOTE: skip mvCand if Y is > merange and -FN>1
107
+ if (m_bFrameParallel && (mvCand.y >= (m_param->searchRange + 1) * 4))
108
+ costs[i] = m_me.COST_MAX;
109
+ else
110
+ {
111
+ cu.clipMv(mvCand);
112
+ predInterLumaPixel(pu, tmpPredYuv, *m_slice->m_refPicList[list][ref]->m_reconPic, mvCand);
113
+ costs[i] = m_me.bufSAD(tmpPredYuv.getLumaAddr(pu.puAbsPartIdx), tmpPredYuv.m_size);
114
+ }
115
+ }
116
+
117
+ return costs[0] <= costs[1] ? 0 : 1;
118
+}
119
+
120
void Search::PME::processTasks(int workerThreadId)
121
{
122
#if DETAILED_CU_STATS
123
124
/* Setup slave Search instance for ME for master's CU */
125
if (&slave != this)
126
{
127
- slave.setQP(*m_slice, m_rdCost.m_qp);
128
slave.m_slice = m_slice;
129
slave.m_frame = m_frame;
130
-
131
+ slave.m_param = m_param;
132
+ slave.setLambdaFromQP(pme.mode.cu, m_rdCost.m_qp);
133
slave.m_me.setSourcePU(*pme.mode.fencYuv, pme.pu.ctuAddr, pme.pu.cuAbsPartIdx, pme.pu.puAbsPartIdx, pme.pu.width, pme.pu.height);
134
}
135
136
137
do
138
{
139
if (meId < m_slice->m_numRefIdx[0])
140
- slave.singleMotionEstimation(*this, pme.mode, pme.cuGeom, pme.pu, pme.puIdx, 0, meId);
141
+ slave.singleMotionEstimation(*this, pme.mode, pme.pu, pme.puIdx, 0, meId);
142
else
143
- slave.singleMotionEstimation(*this, pme.mode, pme.cuGeom, pme.pu, pme.puIdx, 1, meId - m_slice->m_numRefIdx[0]);
144
+ slave.singleMotionEstimation(*this, pme.mode, pme.pu, pme.puIdx, 1, meId - m_slice->m_numRefIdx[0]);
145
146
meId = -1;
147
pme.m_lock.acquire();
148
149
while (meId >= 0);
150
}
151
152
-void Search::singleMotionEstimation(Search& master, Mode& interMode, const CUGeom& cuGeom, const PredictionUnit& pu,
153
- int part, int list, int ref)
154
+void Search::singleMotionEstimation(Search& master, Mode& interMode, const PredictionUnit& pu, int part, int list, int ref)
155
{
156
uint32_t bits = master.m_listSelBits[list] + MVP_IDX_BITS;
157
bits += getTUBits(ref, m_slice->m_numRefIdx[list]);
158
159
- MV mvc[(MD_ABOVE_LEFT + 1) * 2 + 1];
160
- int numMvc = interMode.cu.getPMV(interMode.interNeighbours, list, ref, interMode.amvpCand[list][ref], mvc);
161
-
162
- int mvpIdx = 0;
163
- int merange = m_param->searchRange;
164
MotionData* bestME = interMode.bestME[part];
165
166
- if (interMode.amvpCand[list][ref][0] != interMode.amvpCand[list][ref][1])
167
- {
168
- uint32_t bestCost = MAX_INT;
169
- for (int i = 0; i < AMVP_NUM_CANDS; i++)
170
- {
171
- MV mvCand = interMode.amvpCand[list][ref][i];
172
-
173
- // NOTE: skip mvCand if Y is > merange and -FN>1
174
- if (m_bFrameParallel && (mvCand.y >= (merange + 1) * 4))
175
- continue;
176
-
177
- interMode.cu.clipMv(mvCand);
178
-
179
- Yuv& tmpPredYuv = m_rqt[cuGeom.depth].tmpPredYuv;
180
- predInterLumaPixel(pu, tmpPredYuv, *m_slice->m_refPicList[list][ref]->m_reconPic, mvCand);
181
- uint32_t cost = m_me.bufSAD(tmpPredYuv.getLumaAddr(pu.puAbsPartIdx), tmpPredYuv.m_size);
182
+ MV mvc[(MD_ABOVE_LEFT + 1) * 2 + 1];
183
+ int numMvc = interMode.cu.getPMV(interMode.interNeighbours, list, ref, interMode.amvpCand[list][ref], mvc);
184
185
- if (bestCost > cost)
186
- {
187
- bestCost = cost;
188
- mvpIdx = i;
189
- }
190
- }
191
- }
192
+ const MV* amvp = interMode.amvpCand[list][ref];
193
+ int mvpIdx = selectMVP(interMode.cu, pu, amvp, list, ref);
194
+ MV mvmin, mvmax, outmv, mvp = amvp[mvpIdx];
195
196
- MV mvmin, mvmax, outmv, mvp = interMode.amvpCand[list][ref][mvpIdx];
197
- setSearchRange(interMode.cu, mvp, merange, mvmin, mvmax);
198
+ setSearchRange(interMode.cu, mvp, m_param->searchRange, mvmin, mvmax);
199
200
- int satdCost = m_me.motionEstimate(&m_slice->m_mref[list][ref], mvmin, mvmax, mvp, numMvc, mvc, merange, outmv);
201
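The new `Search::selectMVP()` moves the AMVP candidate choice out of `singleMotionEstimation()`. When both candidates are identical it returns 0 straight away. In frame-parallel mode, a candidate whose vertical component reaches `(searchRange + 1) * 4` gets the maximum cost. Otherwise the candidate is costed by luma SAD, and ties go to candidate 0. The sketch below reproduces only that decision rule. `MV`, the `sad` callback and the use of `UINT32_MAX` in place of `m_me.COST_MAX` are assumptions, and the real code also clips each MV before predicting.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <limits>

// Hypothetical motion vector type (quarter-pel units), not x265's MV class.
struct MV
{
    int16_t x, y;
    bool operator==(const MV& o) const { return x == o.x && y == o.y; }
};

// Mirrors the selection rule of Search::selectMVP(); 'sad' is a caller-supplied
// cost function standing in for predInterLumaPixel() + bufSAD().
int selectMvpSketch(const MV amvp[2], bool frameParallel, int searchRange,
                    const std::function<uint32_t(const MV&)>& sad)
{
    if (amvp[0] == amvp[1])
        return 0;                             // nothing to choose between

    uint32_t costs[2];
    for (int i = 0; i < 2; i++)
    {
        // With frame threading, rows far below may not be reconstructed yet.
        if (frameParallel && amvp[i].y >= (searchRange + 1) * 4)
            costs[i] = std::numeric_limits<uint32_t>::max();
        else
            costs[i] = sad(amvp[i]);
    }
    return costs[0] <= costs[1] ? 0 : 1;      // ties favour candidate 0
}
```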
x265_1.6.tar.gz/source/encoder/search.h -> x265_1.7.tar.gz/source/encoder/search.h
Changed
51
1
2
~Search();
3
4
bool initSearch(const x265_param& param, ScalingList& scalingList);
5
- void setQP(const Slice& slice, int qp);
6
+ int setLambdaFromQP(const CUData& ctu, int qp); /* returns real quant QP in valid spec range */
7
8
// mark temp RD entropy contexts as uninitialized; useful for finding loads without stores
9
void invalidateContexts(int fromDepth);
10
11
void encodeIntraInInter(Mode& intraMode, const CUGeom& cuGeom);
12
13
// estimation inter prediction (non-skip)
14
- void predInterSearch(Mode& interMode, const CUGeom& cuGeom, bool bMergeOnly, bool bChroma);
15
+ void predInterSearch(Mode& interMode, const CUGeom& cuGeom, bool bChromaMC);
16
17
// encode residual and compute rd-cost for inter mode
18
void encodeResAndCalcRdInterCU(Mode& interMode, const CUGeom& cuGeom);
19
20
void getBestIntraModeChroma(Mode& intraMode, const CUGeom& cuGeom);
21
22
/* update CBF flags and QP values to be internally consistent */
23
- void checkDQP(CUData& cu, const CUGeom& cuGeom);
24
- void checkDQPForSplitPred(CUData& cu, const CUGeom& cuGeom);
25
+ void checkDQP(Mode& mode, const CUGeom& cuGeom);
26
+ void checkDQPForSplitPred(Mode& mode, const CUGeom& cuGeom);
27
28
class PME : public BondedTaskGroup
29
{
30
31
};
32
33
void processPME(PME& pme, Search& slave);
34
- void singleMotionEstimation(Search& master, Mode& interMode, const CUGeom& cuGeom, const PredictionUnit& pu, int part, int list, int ref);
35
+ void singleMotionEstimation(Search& master, Mode& interMode, const PredictionUnit& pu, int part, int list, int ref);
36
37
protected:
38
39
40
};
41
42
/* inter/ME helper functions */
43
- void checkBestMVP(MV* amvpCand, MV cMv, MV& mvPred, int& mvpIdx, uint32_t& outBits, uint32_t& outCost) const;
44
- void setSearchRange(const CUData& cu, MV mvp, int merange, MV& mvmin, MV& mvmax) const;
45
+ int selectMVP(const CUData& cu, const PredictionUnit& pu, const MV amvp[AMVP_NUM_CANDS], int list, int ref);
46
+ const MV& checkBestMVP(const MV amvpCand[2], const MV& mv, int& mvpIdx, uint32_t& outBits, uint32_t& outCost) const;
47
+ void setSearchRange(const CUData& cu, const MV& mvp, int merange, MV& mvmin, MV& mvmax) const;
48
uint32_t mergeEstimation(CUData& cu, const CUGeom& cuGeom, const PredictionUnit& pu, int puIdx, MergeData& m);
49
static void getBlkBits(PartSize cuMode, bool bPSlice, int puIdx, uint32_t lastMode, uint32_t blockBit[3]);
50
51
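`setLambdaFromQP()` replaces `setQP()`, and the header comment says it "returns real quant QP in valid spec range". In the search.cpp hunk, the QP passed in drives the lambda tables, while `x265_clip3(QP_MIN, QP_MAX_SPEC, qp)` is what reaches the quantizer and is returned. A minimal sketch of that clipping follows. The constants are assumed stand-ins for `QP_MIN` and `QP_MAX_SPEC`, so check x265's headers for the real values.

```cpp
#include <cassert>

// Assumed constants for illustration only.
const int kQpMin = 0;       // stands in for QP_MIN
const int kQpMaxSpec = 51;  // stands in for QP_MAX_SPEC (HEVC's upper QP for 8-bit)

// Same argument order as x265_clip3(min, max, value).
template <typename T>
T clip3(T minVal, T maxVal, T v) { return v < minVal ? minVal : (v > maxVal ? maxVal : v); }

// The lambda may be derived from any QP the caller passes, but the quantizer
// only ever sees a QP inside the range the spec allows.
int quantQpFor(int lambdaQp)
{
    return clip3(kQpMin, kQpMaxSpec, lambdaQp);
}
```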
x265_1.6.tar.gz/source/encoder/sei.h -> x265_1.7.tar.gz/source/encoder/sei.h
Changed
84
1
2
DECODED_PICTURE_HASH = 132,
3
SCALABLE_NESTING = 133,
4
REGION_REFRESH_INFO = 134,
5
+ MASTERING_DISPLAY_INFO = 137,
6
+ CONTENT_LIGHT_LEVEL_INFO = 144,
7
};
8
9
virtual PayloadType payloadType() const = 0;
10
11
}
12
};
13
14
+class SEIMasteringDisplayColorVolume : public SEI
15
+{
16
+public:
17
+
18
+ uint16_t displayPrimaryX[3];
19
+ uint16_t displayPrimaryY[3];
20
+ uint16_t whitePointX, whitePointY;
21
+ uint32_t maxDisplayMasteringLuminance;
22
+ uint32_t minDisplayMasteringLuminance;
23
+
24
+ PayloadType payloadType() const { return MASTERING_DISPLAY_INFO; }
25
+
26
+ bool parse(const char* value)
27
+ {
28
+ return sscanf(value, "G(%hu,%hu)B(%hu,%hu)R(%hu,%hu)WP(%hu,%hu)L(%u,%u)",
29
+ &displayPrimaryX[0], &displayPrimaryY[0],
30
+ &displayPrimaryX[1], &displayPrimaryY[1],
31
+ &displayPrimaryX[2], &displayPrimaryY[2],
32
+ &whitePointX, &whitePointY,
33
+ &maxDisplayMasteringLuminance, &minDisplayMasteringLuminance) == 10;
34
+ }
35
+
36
+ void write(Bitstream& bs, const SPS&)
37
+ {
38
+ m_bitIf = &bs;
39
+
40
+ WRITE_CODE(MASTERING_DISPLAY_INFO, 8, "payload_type");
41
+ WRITE_CODE(8 * 2 + 2 * 4, 8, "payload_size");
42
+
43
+ for (uint32_t i = 0; i < 3; i++)
44
+ {
45
+ WRITE_CODE(displayPrimaryX[i], 16, "display_primaries_x[ c ]");
46
+ WRITE_CODE(displayPrimaryY[i], 16, "display_primaries_y[ c ]");
47
+ }
48
+ WRITE_CODE(whitePointX, 16, "white_point_x");
49
+ WRITE_CODE(whitePointY, 16, "white_point_y");
50
+ WRITE_CODE(maxDisplayMasteringLuminance, 32, "max_display_mastering_luminance");
51
+ WRITE_CODE(minDisplayMasteringLuminance, 32, "min_display_mastering_luminance");
52
+ }
53
+};
54
+
55
+class SEIContentLightLevel : public SEI
56
+{
57
+public:
58
+
59
+ uint16_t max_content_light_level;
60
+ uint16_t max_pic_average_light_level;
61
+
62
+ PayloadType payloadType() const { return CONTENT_LIGHT_LEVEL_INFO; }
63
+
64
+ bool parse(const char* value)
65
+ {
66
+ return sscanf(value, "%hu,%hu",
67
+ &max_content_light_level, &max_pic_average_light_level) == 2;
68
+ }
69
+
70
+ void write(Bitstream& bs, const SPS&)
71
+ {
72
+ m_bitIf = &bs;
73
+
74
+ WRITE_CODE(CONTENT_LIGHT_LEVEL_INFO, 8, "payload_type");
75
+ WRITE_CODE(4, 8, "payload_size");
76
+ WRITE_CODE(max_content_light_level, 16, "max_content_light_level");
77
+ WRITE_CODE(max_pic_average_light_level, 16, "max_pic_average_light_level");
78
+ }
79
+};
80
+
81
class SEIDecodedPictureHash : public SEI
82
{
83
public:
84
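The two new SEI classes parse their CLI strings with `sscanf`. The mastering-display string lists primaries in G, B, R order, then the white point, then max/min luminance. The content-light-level string is two comma-separated values. The fixed `payload_size` values follow from the field widths: eight 16-bit fields plus two 32-bit fields give `8 * 2 + 2 * 4 = 24` bytes, and two 16-bit fields give 4 bytes. The standalone structs below are hypothetical copies that reuse the same format strings. The sample strings are illustrative values, not recommendations.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Mirror of SEIMasteringDisplayColorVolume::parse() (hypothetical stand-in type).
struct MasteringDisplay
{
    uint16_t displayPrimaryX[3], displayPrimaryY[3];  // [0]=G, [1]=B, [2]=R
    uint16_t whitePointX, whitePointY;
    uint32_t maxLuma, minLuma;

    bool parse(const char* value)
    {
        return sscanf(value, "G(%hu,%hu)B(%hu,%hu)R(%hu,%hu)WP(%hu,%hu)L(%u,%u)",
                      &displayPrimaryX[0], &displayPrimaryY[0],
                      &displayPrimaryX[1], &displayPrimaryY[1],
                      &displayPrimaryX[2], &displayPrimaryY[2],
                      &whitePointX, &whitePointY, &maxLuma, &minLuma) == 10;
    }
};

// Mirror of SEIContentLightLevel::parse() (hypothetical stand-in type).
struct ContentLightLevel
{
    uint16_t maxContentLightLevel, maxPicAverageLightLevel;

    bool parse(const char* value)
    {
        return sscanf(value, "%hu,%hu", &maxContentLightLevel, &maxPicAverageLightLevel) == 2;
    }
};

// payload_size values written by the two write() methods.
constexpr int kMasteringPayloadBytes = 8 * 2 + 2 * 4;  // 24
constexpr int kCllPayloadBytes = 2 * 2;                // 4
```

A partial string such as `G(1,2)B(3,4)` matches only four fields, so `parse()` returns false rather than signalling half-filled metadata.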
x265_1.6.tar.gz/source/encoder/slicetype.cpp -> x265_1.7.tar.gz/source/encoder/slicetype.cpp
Changed
201
1
2
3
namespace {
4
5
-inline int16_t median(int16_t a, int16_t b, int16_t c)
6
-{
7
- int16_t t = (a - b) & ((a - b) >> 31);
8
-
9
- a -= t;
10
- b += t;
11
- b -= (b - c) & ((b - c) >> 31);
12
- b += (a - b) & ((a - b) >> 31);
13
- return b;
14
-}
15
-
16
-inline void median_mv(MV &dst, MV a, MV b, MV c)
17
-{
18
- dst.x = median(a.x, b.x, c.x);
19
- dst.y = median(a.y, b.y, c.y);
20
-}
21
-
22
/* Compute variance to derive AC energy of each block */
23
inline uint32_t acEnergyVar(Frame *curFrame, uint64_t sum_ssd, int shift, int plane)
24
{
25
26
m_8x8Blocks = m_8x8Width > 2 && m_8x8Height > 2 ? (m_8x8Width - 2) * (m_8x8Height - 2) : m_8x8Width * m_8x8Height;
27
28
m_lastKeyframe = -m_param->keyframeMax;
29
- memset(m_preframes, 0, sizeof(m_preframes));
30
- m_preTotal = m_preAcquired = m_preCompleted = 0;
31
m_sliceTypeBusy = false;
32
m_fullQueueSize = X265_MAX(1, m_param->lookaheadDepth);
33
m_bAdaptiveQuant = m_param->rc.aqMode || m_param->bEnableWeightedPred || m_param->bEnableWeightedBiPred;
34
35
return m_tld && m_scratch;
36
}
37
38
-void Lookahead::stop()
39
+void Lookahead::stopJobs()
40
{
41
if (m_pool && !m_inputQueue.empty())
42
{
43
- m_preLookaheadLock.acquire();
44
+ m_inputLock.acquire();
45
m_isActive = false;
46
bool wait = m_outputSignalRequired = m_sliceTypeBusy;
47
- m_preLookaheadLock.release();
48
+ m_inputLock.release();
49
50
if (wait)
51
m_outputSignal.wait();
52
53
m_filled = true; /* full capacity plus mini-gop lag */
54
}
55
56
- m_preLookaheadLock.acquire();
57
-
58
m_inputLock.acquire();
59
m_inputQueue.pushBack(curFrame);
60
- m_inputLock.release();
61
-
62
- m_preframes[m_preTotal++] = &curFrame;
63
- X265_CHECK(m_preTotal <= X265_LOOKAHEAD_MAX, "prelookahead overflow\n");
64
-
65
- m_preLookaheadLock.release();
66
-
67
- if (m_pool)
68
+ if (m_pool && m_inputQueue.size() >= m_fullQueueSize)
69
tryWakeOne();
70
+ m_inputLock.release();
71
}
72
73
/* Called by API thread */
74
75
m_filled = true;
76
}
77
78
-void Lookahead::findJob(int workerThreadID)
79
+void Lookahead::findJob(int /*workerThreadID*/)
80
{
81
- Frame* preFrame;
82
- bool doDecide;
83
-
84
- if (!m_isActive)
85
- return;
86
-
87
- int tld = workerThreadID;
88
- if (workerThreadID < 0)
89
- tld = m_pool ? m_pool->m_numWorkers : 0;
90
+ bool doDecide;
91
92
- m_preLookaheadLock.acquire();
93
- do
94
- {
95
- preFrame = NULL;
96
- doDecide = false;
97
+ m_inputLock.acquire();
98
+ if (m_inputQueue.size() >= m_fullQueueSize && !m_sliceTypeBusy && m_isActive)
99
+ doDecide = m_sliceTypeBusy = true;
100
+ else
101
+ doDecide = m_helpWanted = false;
102
+ m_inputLock.release();
103
104
- if (m_preTotal > m_preAcquired)
105
- preFrame = m_preframes[m_preAcquired++];
106
- else
107
- {
108
- if (m_preTotal == m_preCompleted)
109
- m_preAcquired = m_preTotal = m_preCompleted = 0;
110
-
111
- /* the worker thread that performs the last pre-lookahead will generally get to run
112
- * slicetypeDecide() */
113
- m_inputLock.acquire();
114
- if (!m_sliceTypeBusy && !m_preTotal && m_inputQueue.size() >= m_fullQueueSize && m_isActive)
115
- doDecide = m_sliceTypeBusy = true;
116
- else
117
- m_helpWanted = false;
118
- m_inputLock.release();
119
- }
120
- m_preLookaheadLock.release();
121
+ if (!doDecide)
122
+ return;
123
124
- if (preFrame)
125
- {
126
- ProfileLookaheadTime(m_preLookaheadElapsedTime, m_countPreLookahead);
127
- ProfileScopeEvent(prelookahead);
128
-
129
- preFrame->m_lowres.init(preFrame->m_fencPic, preFrame->m_poc);
130
- if (m_param->rc.bStatRead && m_param->rc.cuTree && IS_REFERENCED(preFrame))
131
- /* cu-tree offsets were read from stats file */;
132
- else if (m_bAdaptiveQuant)
133
- m_tld[tld].calcAdaptiveQuantFrame(preFrame, m_param);
134
- m_tld[tld].lowresIntraEstimate(preFrame->m_lowres);
135
-
136
- m_preLookaheadLock.acquire(); /* re-acquire for next pass */
137
- m_preCompleted++;
138
- }
139
- else if (doDecide)
140
- {
141
- ProfileLookaheadTime(m_slicetypeDecideElapsedTime, m_countSlicetypeDecide);
142
- ProfileScopeEvent(slicetypeDecideEV);
143
+ ProfileLookaheadTime(m_slicetypeDecideElapsedTime, m_countSlicetypeDecide);
144
+ ProfileScopeEvent(slicetypeDecideEV);
145
146
- slicetypeDecide();
147
+ slicetypeDecide();
148
149
- m_preLookaheadLock.acquire(); /* re-acquire for next pass */
150
- if (m_outputSignalRequired)
151
- {
152
- m_outputSignal.trigger();
153
- m_outputSignalRequired = false;
154
- }
155
- m_sliceTypeBusy = false;
156
- }
157
+ m_inputLock.acquire();
158
+ if (m_outputSignalRequired)
159
+ {
160
+ m_outputSignal.trigger();
161
+ m_outputSignalRequired = false;
162
}
163
- while (preFrame || doDecide);
164
+ m_sliceTypeBusy = false;
165
+ m_inputLock.release();
166
}
167
168
/* Called by API thread */
169
170
if (out)
171
return out;
172
173
- /* process all pending pre-lookahead frames and run slicetypeDecide() if
174
- * necessary */
175
- findJob(-1);
176
+ findJob(-1); /* run slicetypeDecide() if necessary */
177
178
- m_preLookaheadLock.acquire();
179
- bool wait = m_outputSignalRequired = m_sliceTypeBusy || m_preTotal;
180
- m_preLookaheadLock.release();
181
+ m_inputLock.acquire();
182
+ bool wait = m_outputSignalRequired = m_sliceTypeBusy;
183
+ m_inputLock.release();
184
185
if (wait)
186
m_outputSignal.wait();
187
188
{
189
/* aggregate lowres row satds to CTU resolution */
190
curFrame->m_lowres.lowresCostForRc = curFrame->m_lowres.lowresCosts[b - p0][p1 - b];
191
- uint32_t lowresRow = 0, lowresCol = 0, lowresCuIdx = 0, sum = 0;
192
+ uint32_t lowresRow = 0, lowresCol = 0, lowresCuIdx = 0, sum = 0, intraSum = 0;
193
uint32_t scale = m_param->maxCUSize / (2 * X265_LOWRES_CU_SIZE);
194
uint32_t numCuInHeight = (m_param->sourceHeight + g_maxCUSize - 1) / g_maxCUSize;
195
uint32_t widthInLowresCu = (uint32_t)m_8x8Width, heightInLowresCu = (uint32_t)m_8x8Height;
196
197
lowresRow = row * scale;
198
for (uint32_t cnt = 0; cnt < scale && lowresRow < heightInLowresCu; lowresRow++, cnt++)
199
{
200
- sum = 0;
201
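In 1.7, `Lookahead::findJob()` no longer runs pre-lookahead work. A single lock (`m_inputLock`) now guards the input queue and the `m_sliceTypeBusy`/`m_isActive` flags. A caller claims slice-type decision only when the queue has reached `m_fullQueueSize`, no decision is in progress and the lookahead is still active. `addPicture()` also wakes a worker only once the queue is full. The class below is a hypothetical, simplified model of that gate using `std::mutex`. It omits `m_outputSignal` and the actual decision work.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical model of the findJob() gate; m_lock stands in for m_inputLock.
class LookaheadGate
{
public:
    explicit LookaheadGate(size_t fullQueueSize) : m_fullQueueSize(fullQueueSize) {}

    void addFrame() { std::lock_guard<std::mutex> lk(m_lock); m_queued++; }
    void stop()     { std::lock_guard<std::mutex> lk(m_lock); m_isActive = false; }

    // Returns true if this caller claimed the decision job.
    bool tryClaimDecide()
    {
        std::lock_guard<std::mutex> lk(m_lock);
        if (m_queued >= m_fullQueueSize && !m_busy && m_isActive)
        {
            m_busy = true;
            return true;
        }
        return false;
    }

    // Called after the (omitted) decision work; releases the busy flag.
    void finishDecide(size_t framesConsumed)
    {
        std::lock_guard<std::mutex> lk(m_lock);
        m_queued -= framesConsumed;
        m_busy = false;
    }

private:
    std::mutex m_lock;
    size_t m_queued = 0;
    size_t m_fullQueueSize;
    bool m_busy = false;
    bool m_isActive = true;
};
```

Keeping the busy flag and the queue size under one lock means the check and the claim happen atomically, so two workers can never both enter the decision at once.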
x265_1.6.tar.gz/source/encoder/slicetype.h -> x265_1.7.tar.gz/source/encoder/slicetype.h
Changed
50
1
2
Lock m_outputLock;
3
4
/* pre-lookahead */
5
- Frame* m_preframes[X265_LOOKAHEAD_MAX];
6
- int m_preTotal, m_preAcquired, m_preCompleted;
7
int m_fullQueueSize;
8
bool m_isActive;
9
bool m_sliceTypeBusy;
10
11
bool m_outputSignalRequired;
12
bool m_bBatchMotionSearch;
13
bool m_bBatchFrameCosts;
14
- Lock m_preLookaheadLock;
15
Event m_outputSignal;
16
17
LookaheadTLD* m_tld;
18
19
20
bool create();
21
void destroy();
22
- void stop();
23
+ void stopJobs();
24
25
void addPicture(Frame&, int sliceType);
26
void flush();
27
28
int64_t frameCostRecalculate(Lowres **frames, int p0, int p1, int b);
29
};
30
31
+class PreLookaheadGroup : public BondedTaskGroup
32
+{
33
+public:
34
+
35
+ Frame* m_preframes[X265_LOOKAHEAD_MAX];
36
+ Lookahead& m_lookahead;
37
+
38
+ PreLookaheadGroup(Lookahead& l) : m_lookahead(l) {}
39
+
40
+ void processTasks(int workerThreadID);
41
+
42
+protected:
43
+
44
+ PreLookaheadGroup& operator=(const PreLookaheadGroup&);
45
+};
46
+
47
class CostEstimateGroup : public BondedTaskGroup
48
{
49
public:
50
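The pre-lookahead state (`m_preframes`) moves out of `Lookahead` into the new `PreLookaheadGroup`, a `BondedTaskGroup`. The removed 1.6 code shows the per-frame work: lowres init, optional adaptive-quant analysis and lowres intra estimation. The sketch below models only the distribution idea. Several workers pull frame indices from a shared atomic counter, so each frame is handled exactly once. `processBatch` is hypothetical and does not reproduce x265's `BondedTaskGroup` API.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical work-sharing loop: returns how many times each frame was processed.
std::vector<int> processBatch(int numFrames, int numWorkers)
{
    std::vector<int> timesProcessed(numFrames, 0);
    std::atomic<int> next(0);

    auto worker = [&]() {
        // fetch_add hands out each index to exactly one worker.
        for (int i = next.fetch_add(1); i < numFrames; i = next.fetch_add(1))
            timesProcessed[i]++;  // stands in for the per-frame pre-lookahead work
    };

    std::vector<std::thread> pool;
    for (int w = 0; w < numWorkers; w++)
        pool.emplace_back(worker);
    for (auto& t : pool)
        t.join();
    return timesProcessed;
}
```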
x265_1.6.tar.gz/source/input/input.cpp -> x265_1.7.tar.gz/source/input/input.cpp
Changed
10
1
2
3
using namespace x265;
4
5
-Input* Input::open(InputFileInfo& info, bool bForceY4m)
6
+InputFile* InputFile::open(InputFileInfo& info, bool bForceY4m)
7
{
8
const char * s = strrchr(info.filename, '.');
9
10
x265_1.6.tar.gz/source/input/input.h -> x265_1.7.tar.gz/source/input/input.h
Changed
31
1
2
int sarWidth;
3
int sarHeight;
4
int frameCount;
5
+ int timebaseNum;
6
+ int timebaseDenom;
7
8
/* user supplied */
9
int skipFrames;
10
const char *filename;
11
};
12
13
-class Input
14
+class InputFile
15
{
16
protected:
17
18
- virtual ~Input() {}
19
+ virtual ~InputFile() {}
20
21
public:
22
23
- Input() {}
24
+ InputFile() {}
25
26
- static Input* open(InputFileInfo& info, bool bForceY4m);
27
+ static InputFile* open(InputFileInfo& info, bool bForceY4m);
28
29
virtual void startReader() = 0;
30
31
x265_1.6.tar.gz/source/input/y4m.cpp -> x265_1.7.tar.gz/source/input/y4m.cpp
Changed
29
1
2
for (int i = 0; i < QUEUE_SIZE; i++)
3
buf[i] = NULL;
4
5
- readCount.set(0);
6
- writeCount.set(0);
7
-
8
threadActive = false;
9
colorSpace = info.csp;
10
sarWidth = info.sarWidth;
11
12
void Y4MInput::release()
13
{
14
threadActive = false;
15
- readCount.set(readCount.get()); // unblock file reader
16
+ readCount.poke();
17
stop();
18
delete this;
19
}
20
21
while (threadActive);
22
23
threadActive = false;
24
- writeCount.set(writeCount.get()); // unblock readPicture
25
+ writeCount.poke();
26
}
27
28
bool Y4MInput::populateFrameQueue()
29
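The Y4M and YUV readers now unblock their threads with `readCount.poke()` and `writeCount.poke()`. In 1.6 they called `set(get())`, which rewrote the same value just to wake any waiter. The class below is a hypothetical model of that behaviour built on `std::condition_variable`, and it is not x265's `ThreadSafeInteger`. `poke()` wakes waiters without changing the value. A woken thread must then re-check its own exit condition (in the readers, `threadActive`).

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical waitable counter with a value-preserving wake-up.
class WaitableCounter
{
public:
    int get()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_value;
    }

    void set(int value)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_value = value;
        m_cond.notify_all();
    }

    // Wake every waiter without changing the value.
    void poke()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_cond.notify_all();
    }

    // Waits at most once while the value equals 'prev'; set() or poke() ends the wait.
    int waitForChange(int prev)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        if (m_value == prev)
            m_cond.wait(lock);
        return m_value;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cond;
    int m_value = 0;
};
```

A notification sent before the other thread starts waiting is lost. A shutdown path therefore has to keep poking until the blocked thread reports that it has woken, or has to pair the poke with a flag the thread re-checks.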
x265_1.6.tar.gz/source/input/y4m.h -> x265_1.7.tar.gz/source/input/y4m.h
Changed
10
1
2
namespace x265 {
3
// x265 private namespace
4
5
-class Y4MInput : public Input, public Thread
6
+class Y4MInput : public InputFile, public Thread
7
{
8
protected:
9
10
x265_1.6.tar.gz/source/input/yuv.cpp -> x265_1.7.tar.gz/source/input/yuv.cpp
Changed
28
1
2
for (int i = 0; i < QUEUE_SIZE; i++)
3
buf[i] = NULL;
4
5
- readCount.set(0);
6
- writeCount.set(0);
7
depth = info.depth;
8
width = info.width;
9
height = info.height;
10
11
void YUVInput::release()
12
{
13
threadActive = false;
14
- readCount.set(readCount.get()); // unblock read thread
15
+ readCount.poke();
16
stop();
17
delete this;
18
}
19
20
}
21
22
threadActive = false;
23
- writeCount.set(writeCount.get()); // unblock readPicture
24
+ writeCount.poke();
25
}
26
27
bool YUVInput::populateFrameQueue()
28
x265_1.6.tar.gz/source/input/yuv.h -> x265_1.7.tar.gz/source/input/yuv.h
Changed
10
1
2
namespace x265 {
3
// private x265 namespace
4
5
-class YUVInput : public Input, public Thread
6
+class YUVInput : public InputFile, public Thread
7
{
8
protected:
9
10
x265_1.6.tar.gz/source/output/output.cpp -> x265_1.7.tar.gz/source/output/output.cpp
Changed
33
1
2
/*****************************************************************************
3
- * Copyright (C) 2013 x265 project
4
+ * Copyright (C) 2013-2015 x265 project
5
*
6
* Authors: Steve Borho <steve@borho.org>
7
+ * Xinyue Lu <i@7086.in>
8
*
9
* This program is free software; you can redistribute it and/or modify
10
* it under the terms of the GNU General Public License as published by
11
12
#include "yuv.h"
13
#include "y4m.h"
14
15
+#include "raw.h"
16
+
17
using namespace x265;
18
19
-Output* Output::open(const char *fname, int width, int height, uint32_t bitdepth, uint32_t fpsNum, uint32_t fpsDenom, int csp)
20
+ReconFile* ReconFile::open(const char *fname, int width, int height, uint32_t bitdepth, uint32_t fpsNum, uint32_t fpsDenom, int csp)
21
{
22
const char * s = strrchr(fname, '.');
23
24
25
else
26
return new YUVOutput(fname, width, height, bitdepth, csp);
27
}
28
+
29
+OutputFile* OutputFile::open(const char *fname, InputFileInfo& inputInfo)
30
+{
31
+ return new RAWOutput(fname, inputInfo);
32
+}
33
x265_1.6.tar.gz/source/output/output.h -> x265_1.7.tar.gz/source/output/output.h
Changed
76
1
2
/*****************************************************************************
3
- * Copyright (C) 2013 x265 project
4
+ * Copyright (C) 2013-2015 x265 project
5
*
6
* Authors: Steve Borho <steve@borho.org>
7
+ * Xinyue Lu <i@7086.in>
8
*
9
* This program is free software; you can redistribute it and/or modify
10
* it under the terms of the GNU General Public License as published by
11
12
#define X265_OUTPUT_H
13
14
#include "x265.h"
15
+#include "input/input.h"
16
17
namespace x265 {
18
// private x265 namespace
19
20
-class Output
21
+class ReconFile
22
{
23
protected:
24
25
- virtual ~Output() {}
26
+ virtual ~ReconFile() {}
27
28
public:
29
30
- Output() {}
31
+ ReconFile() {}
32
33
- static Output* open(const char *fname, int width, int height, uint32_t bitdepth,
34
- uint32_t fpsNum, uint32_t fpsDenom, int csp);
35
+ static ReconFile* open(const char *fname, int width, int height, uint32_t bitdepth,
36
+ uint32_t fpsNum, uint32_t fpsDenom, int csp);
37
38
virtual bool isFail() const = 0;
39
40
41
42
virtual const char *getName() const = 0;
43
};
44
+
45
+class OutputFile
46
+{
47
+protected:
48
+
49
+ virtual ~OutputFile() {}
50
+
51
+public:
52
+
53
+ OutputFile() {}
54
+
55
+ static OutputFile* open(const char* fname, InputFileInfo& inputInfo);
56
+
57
+ virtual bool isFail() const = 0;
58
+
59
+ virtual bool needPTS() const = 0;
60
+
61
+ virtual void release() = 0;
62
+
63
+ virtual const char* getName() const = 0;
64
+
65
+ virtual void setParam(x265_param* param) = 0;
66
+
67
+ virtual int writeHeaders(const x265_nal* nal, uint32_t nalcount) = 0;
68
+
69
+ virtual int writeFrame(const x265_nal* nal, uint32_t nalcount, x265_picture& pic) = 0;
70
+
71
+ virtual void closeFile(int64_t largest_pts, int64_t second_largest_pts) = 0;
72
+};
73
}
74
75
#endif // ifndef X265_OUTPUT_H
76
x265_1.7.tar.gz/source/output/raw.cpp
Added
82
1
2
+/*****************************************************************************
3
+ * Copyright (C) 2013-2015 x265 project
4
+ *
5
+ * Authors: Steve Borho <steve@borho.org>
6
+ * Xinyue Lu <i@7086.in>
7
+ *
8
+ * This program is free software; you can redistribute it and/or modify
9
+ * it under the terms of the GNU General Public License as published by
10
+ * the Free Software Foundation; either version 2 of the License, or
11
+ * (at your option) any later version.
12
+ *
13
+ * This program is distributed in the hope that it will be useful,
14
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
15
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
16
+ * GNU General Public License for more details.
17
+ *
18
+ * You should have received a copy of the GNU General Public License
19
+ * along with this program; if not, write to the Free Software
20
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA.
21
+ *
22
+ * This program is also available under a commercial proprietary license.
23
+ * For more information, contact us at license @ x265.com.
24
+ *****************************************************************************/
25
+
26
+#include "raw.h"
27
+
28
+using namespace x265;
29
+using namespace std;
30
+
31
+RAWOutput::RAWOutput(const char* fname, InputFileInfo&)
32
+{
33
+ b_fail = false;
34
+ if (!strcmp(fname, "-"))
35
+ {
36
+ ofs = &cout;
37
+ return;
38
+ }
39
+ ofs = new ofstream(fname, ios::binary | ios::out);
40
+ if (ofs->fail())
41
+ b_fail = true;
42
+}
43
+
44
+void RAWOutput::setParam(x265_param* param)
45
+{
46
+ param->bAnnexB = true;
47
+}
48
+
49
+int RAWOutput::writeHeaders(const x265_nal* nal, uint32_t nalcount)
50
+{
51
+ uint32_t bytes = 0;
52
+
53
+ for (uint32_t i = 0; i < nalcount; i++)
54
+ {
55
+ ofs->write((const char*)nal->payload, nal->sizeBytes);
56
+ bytes += nal->sizeBytes;
57
+ nal++;
58
+ }
59
+
60
+ return bytes;
61
+}
62
+
63
+int RAWOutput::writeFrame(const x265_nal* nal, uint32_t nalcount, x265_picture&)
64
+{
65
+ uint32_t bytes = 0;
66
+
67
+ for (uint32_t i = 0; i < nalcount; i++)
68
+ {
69
+ ofs->write((const char*)nal->payload, nal->sizeBytes);
70
+ bytes += nal->sizeBytes;
71
+ nal++;
72
+ }
73
+
74
+ return bytes;
75
+}
76
+
77
+void RAWOutput::closeFile(int64_t, int64_t)
78
+{
79
+ if (ofs != &cout)
80
+ delete ofs;
81
+}
82
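`RAWOutput` writes NAL payloads back to back with no container framing. It writes to `std::cout` when the file name is `-` and to a binary `ofstream` otherwise. Because nothing else delimits the units, `setParam()` forces `param->bAnnexB = true` so that every payload carries its own Annex B start code. The function below copies the loop shape of `writeHeaders()`/`writeFrame()` onto a hypothetical `NalSketch` type that holds only the two `x265_nal` fields the writer uses. The byte arrays in the usage check are dummy payloads.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for x265_nal: only the fields RAWOutput reads.
struct NalSketch
{
    const uint8_t* payload;
    uint32_t sizeBytes;
};

// Writes each payload verbatim and returns the total byte count, like
// RAWOutput::writeHeaders() and RAWOutput::writeFrame().
int writeNals(std::ostream& os, const NalSketch* nal, uint32_t nalcount)
{
    uint32_t bytes = 0;
    for (uint32_t i = 0; i < nalcount; i++, nal++)
    {
        os.write(reinterpret_cast<const char*>(nal->payload), nal->sizeBytes);
        bytes += nal->sizeBytes;
    }
    return (int)bytes;
}
```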
x265_1.7.tar.gz/source/output/raw.h
Added
66
1
2
+/*****************************************************************************
3
+ * Copyright (C) 2013-2015 x265 project
4
+ *
5
+ * Authors: Steve Borho <steve@borho.org>
6
+ * Xinyue Lu <i@7086.in>
7
+ *
8
+ * This program is free software; you can redistribute it and/or modify
9
+ * it under the terms of the GNU General Public License as published by
10
+ * the Free Software Foundation; either version 2 of the License, or
11
+ * (at your option) any later version.
12
+ *
13
+ * This program is distributed in the hope that it will be useful,
14
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
15
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
16
+ * GNU General Public License for more details.
17
+ *
18
+ * You should have received a copy of the GNU General Public License
19
+ * along with this program; if not, write to the Free Software
20
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA.
21
+ *
22
+ * This program is also available under a commercial proprietary license.
23
+ * For more information, contact us at license @ x265.com.
24
+ *****************************************************************************/
25
+
26
+#ifndef X265_HEVC_RAW_H
27
+#define X265_HEVC_RAW_H
28
+
29
+#include "output.h"
30
+#include "common.h"
31
+#include <fstream>
32
+#include <iostream>
33
+
34
+namespace x265 {
35
+class RAWOutput : public OutputFile
36
+{
37
+protected:
38
+
39
+ std::ostream* ofs;
40
+
41
+ bool b_fail;
42
+
43
+public:
44
+
45
+ RAWOutput(const char* fname, InputFileInfo&);
46
+
47
+ bool isFail() const { return b_fail; }
48
+
49
+ bool needPTS() const { return false; }
50
+
51
+ void release() { delete this; }
52
+
53
+ const char* getName() const { return "raw"; }
54
+
55
+ void setParam(x265_param* param);
56
+
57
+ int writeHeaders(const x265_nal* nal, uint32_t nalcount);
58
+
59
+ int writeFrame(const x265_nal* nal, uint32_t nalcount, x265_picture&);
60
+
61
+ void closeFile(int64_t largest_pts, int64_t second_largest_pts);
62
+};
63
+}
64
+
65
+#endif // ifndef X265_HEVC_RAW_H
66
x265_1.7.tar.gz/source/output/reconplay.cpp
Added
199
+/*****************************************************************************
+ * Copyright (C) 2013 x265 project
+ *
+ * Authors: Peixuan Zhang <zhangpeixuancn@gmail.com>
+ * Chunli Zhang <chunli@multicorewareinc.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA.
+ *
+ * This program is also available under a commercial proprietary license.
+ * For more information, contact us at license @ x265.com.
+ *****************************************************************************/
+
+#include "common.h"
+#include "reconplay.h"
+
+#include <signal.h>
+
+using namespace x265;
+
+#if _WIN32
+#define popen _popen
+#define pclose _pclose
+#define pipemode "wb"
+#else
+#define pipemode "w"
+#endif
+
+bool ReconPlay::pipeValid;
+
+#ifndef _WIN32
+static void sigpipe_handler(int)
+{
+    if (ReconPlay::pipeValid)
+        general_log(NULL, "exec", X265_LOG_ERROR, "pipe closed\n");
+    ReconPlay::pipeValid = false;
+}
+#endif
+
+ReconPlay::ReconPlay(const char* commandLine, x265_param& param)
+{
+#ifndef _WIN32
+    if (signal(SIGPIPE, sigpipe_handler) == SIG_ERR)
+        general_log(&param, "exec", X265_LOG_ERROR, "Unable to register SIGPIPE handler: %s\n", strerror(errno));
+#endif
+
+    width = param.sourceWidth;
+    height = param.sourceHeight;
+    colorSpace = param.internalCsp;
+
+    frameSize = 0;
+    for (int i = 0; i < x265_cli_csps[colorSpace].planes; i++)
+        frameSize += (uint32_t)((width >> x265_cli_csps[colorSpace].width[i]) * (height >> x265_cli_csps[colorSpace].height[i]));
+
+    for (int i = 0; i < RECON_BUF_SIZE; i++)
+    {
+        poc[i] = -1;
+        CHECKED_MALLOC(frameData[i], pixel, frameSize);
+    }
+
+    outputPipe = popen(commandLine, pipemode);
+    if (outputPipe)
+    {
+        const char* csp = (colorSpace >= X265_CSP_I444) ? "444" : (colorSpace >= X265_CSP_I422) ? "422" : "420";
+        const char* depth = (param.internalBitDepth == 10) ? "p10" : "";
+
+        fprintf(outputPipe, "YUV4MPEG2 W%d H%d F%d:%d Ip C%s%s\n", width, height, param.fpsNum, param.fpsDenom, csp, depth);
+
+        pipeValid = true;
+        threadActive = true;
+        start();
+        return;
+    }
+    else
+        general_log(&param, "exec", X265_LOG_ERROR, "popen(%s) failed\n", commandLine);
+
+fail:
+    threadActive = false;
+}
+
+ReconPlay::~ReconPlay()
+{
+    if (threadActive)
+    {
+        threadActive = false;
+        writeCount.poke();
+        stop();
+    }
+
+    if (outputPipe)
+        pclose(outputPipe);
+
+    for (int i = 0; i < RECON_BUF_SIZE; i++)
+        X265_FREE(frameData[i]);
+}
+
+bool ReconPlay::writePicture(const x265_picture& pic)
+{
+    if (!threadActive || !pipeValid)
+        return false;
+
+    int written = writeCount.get();
+    int read = readCount.get();
+    int currentCursor = pic.poc % RECON_BUF_SIZE;
+
+    /* TODO: it's probably better to drop recon pictures when the ring buffer is
+     * backed up on the display app */
+    while (written - read > RECON_BUF_SIZE - 2 || poc[currentCursor] != -1)
+    {
+        read = readCount.waitForChange(read);
+        if (!threadActive)
+            return false;
+    }
+
+    X265_CHECK(pic.colorSpace == colorSpace, "invalid color space\n");
+    X265_CHECK(pic.bitDepth == X265_DEPTH, "invalid bit depth\n");
+
+    pixel* buf = frameData[currentCursor];
+    for (int i = 0; i < x265_cli_csps[colorSpace].planes; i++)
+    {
+        char* src = (char*)pic.planes[i];
+        int pwidth = width >> x265_cli_csps[colorSpace].width[i];
+
+        for (int h = 0; h < height >> x265_cli_csps[colorSpace].height[i]; h++)
+        {
+            memcpy(buf, src, pwidth * sizeof(pixel));
+            src += pic.stride[i];
+            buf += pwidth;
+        }
+    }
+
+    poc[currentCursor] = pic.poc;
+    writeCount.incr();
+
+    return true;
+}
+
+void ReconPlay::threadMain()
+{
+    THREAD_NAME("ReconPlayOutput", 0);
+
+    do
+    {
+        /* extract the next output picture in display order and write to pipe */
+        if (!outputFrame())
+            break;
+    }
+    while (threadActive);
+
+    threadActive = false;
+    readCount.poke();
+}
+
+bool ReconPlay::outputFrame()
+{
+    int written = writeCount.get();
+    int read = readCount.get();
+    int currentCursor = read % RECON_BUF_SIZE;
+
+    while (poc[currentCursor] != read)
+    {
+        written = writeCount.waitForChange(written);
+        if (!threadActive)
+            return false;
+    }
+
+    char* buf = (char*)frameData[currentCursor];
+    intptr_t remainSize = frameSize * sizeof(pixel);
+
+    fprintf(outputPipe, "FRAME\n");
+    while (remainSize > 0)
+    {
+        intptr_t retCount = (intptr_t)fwrite(buf, sizeof(char), remainSize, outputPipe);
+
+        if (retCount < 0 || !pipeValid)
+            /* pipe failure, stop writing and start dropping recon pictures */
+            return false;
+
+        buf += retCount;
+        remainSize -= retCount;
+    }
+
+    poc[currentCursor] = -1;
+    readCount.incr();
+    return true;
+}
x265_1.7.tar.gz/source/output/reconplay.h
Added
76
+/*****************************************************************************
+ * Copyright (C) 2013 x265 project
+ *
+ * Authors: Peixuan Zhang <zhangpeixuancn@gmail.com>
+ * Chunli Zhang <chunli@multicorewareinc.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02111, USA.
+ *
+ * This program is also available under a commercial proprietary license.
+ * For more information, contact us at license @ x265.com.
+ *****************************************************************************/
+
+#ifndef X265_RECONPLAY_H
+#define X265_RECONPLAY_H
+
+#include "x265.h"
+#include "threading.h"
+#include <cstdio>
+
+namespace x265 {
+// private x265 namespace
+
+class ReconPlay : public Thread
+{
+public:
+
+    ReconPlay(const char* commandLine, x265_param& param);
+
+    virtual ~ReconPlay();
+
+    bool writePicture(const x265_picture& pic);
+
+    static bool pipeValid;
+
+protected:
+
+    enum { RECON_BUF_SIZE = 40 };
+
+    FILE* outputPipe; /* The output pipe for player */
+    size_t frameSize; /* size of one frame in pixels */
+    bool threadActive; /* worker thread is active */
+    int width; /* width of frame */
+    int height; /* height of frame */
+    int colorSpace; /* color space of frame */
+
+    int poc[RECON_BUF_SIZE];
+    pixel* frameData[RECON_BUF_SIZE];
+
+    /* Note that the class uses read and write counters to signal that reads and
+     * writes have occurred in the ring buffer, but writes into the buffer
+     * happen in decode order and the reader must check that the POC it next
+     * needs to send to the pipe is in fact present. The counters are used to
+     * prevent the writer from getting too far ahead of the reader */
+    ThreadSafeInteger readCount;
+    ThreadSafeInteger writeCount;
+
+    void threadMain();
+    bool outputFrame();
+};
+}
+
+#endif // ifndef X265_RECONPLAY_H
x265_1.6.tar.gz/source/output/y4m.h -> x265_1.7.tar.gz/source/output/y4m.h
Changed
10
 namespace x265 {
 // private x265 namespace

-class Y4MOutput : public Output
+class Y4MOutput : public ReconFile
 {
 protected:
x265_1.6.tar.gz/source/output/yuv.h -> x265_1.7.tar.gz/source/output/yuv.h
Changed
10
 namespace x265 {
 // private x265 namespace

-class YUVOutput : public Output
+class YUVOutput : public ReconFile
 {
 protected:
x265_1.6.tar.gz/source/test/ipfilterharness.cpp -> x265_1.7.tar.gz/source/test/ipfilterharness.cpp
Changed
201
     }
 }

-bool IPFilterHarness::check_IPFilter_primitive(filter_p2s_wxh_t ref, filter_p2s_wxh_t opt, int isChroma, int csp)
-{
-    intptr_t rand_srcStride;
-    int min_size = isChroma ? 2 : 4;
-    int max_size = isChroma ? (MAX_CU_SIZE >> 1) : MAX_CU_SIZE;
-
-    if (isChroma && (csp == X265_CSP_I444))
-    {
-        min_size = 4;
-        max_size = MAX_CU_SIZE;
-    }
-
-    for (int i = 0; i < ITERS; i++)
-    {
-        int index = i % TEST_CASES;
-        int rand_height = (int16_t)rand() % 100;
-        int rand_width = (int16_t)rand() % 100;
-
-        rand_srcStride = rand_width + rand() % 100;
-        if (rand_srcStride < rand_width)
-            rand_srcStride = rand_width;
-
-        rand_width &= ~(min_size - 1);
-        rand_width = x265_clip3(min_size, max_size, rand_width);
-
-        rand_height &= ~(min_size - 1);
-        rand_height = x265_clip3(min_size, max_size, rand_height);
-
-        ref(pixel_test_buff[index],
-            rand_srcStride,
-            IPF_C_output_s,
-            rand_width,
-            rand_height);
-
-        checked(opt, pixel_test_buff[index],
-                rand_srcStride,
-                IPF_vec_output_s,
-                rand_width,
-                rand_height);
-
-        if (memcmp(IPF_vec_output_s, IPF_C_output_s, TEST_BUF_SIZE * sizeof(int16_t)))
-            return false;
-
-        reportfail();
-    }
-
-    return true;
-}
-
 bool IPFilterHarness::check_IPFilterChroma_primitive(filter_pp_t ref, filter_pp_t opt)
 {
     intptr_t rand_srcStride, rand_dstStride;

     {
         intptr_t rand_srcStride = rand() % 100;
         int index = i % TEST_CASES;
+        intptr_t dstStride = rand() % 100 + 64;

-        ref(pixel_test_buff[index] + i, rand_srcStride, IPF_C_output_s);
+        ref(pixel_test_buff[index] + i, rand_srcStride, IPF_C_output_s, dstStride);

-        checked(opt, pixel_test_buff[index] + i, rand_srcStride, IPF_vec_output_s);
+        checked(opt, pixel_test_buff[index] + i, rand_srcStride, IPF_vec_output_s, dstStride);

-        if (memcmp(IPF_vec_output_s, IPF_C_output_s, TEST_BUF_SIZE * sizeof(pixel)))
+        if (memcmp(IPF_vec_output_s, IPF_C_output_s, TEST_BUF_SIZE * sizeof(int16_t)))
             return false;

         reportfail();

     {
         intptr_t rand_srcStride = rand() % 100;
         int index = i % TEST_CASES;
+        intptr_t dstStride = rand() % 100 + 64;

-        ref(pixel_test_buff[index] + i, rand_srcStride, IPF_C_output_s);
+        ref(pixel_test_buff[index] + i, rand_srcStride, IPF_C_output_s, dstStride);

-        checked(opt, pixel_test_buff[index] + i, rand_srcStride, IPF_vec_output_s);
+        checked(opt, pixel_test_buff[index] + i, rand_srcStride, IPF_vec_output_s, dstStride);

-        if (memcmp(IPF_vec_output_s, IPF_C_output_s, TEST_BUF_SIZE * sizeof(pixel)))
+        if (memcmp(IPF_vec_output_s, IPF_C_output_s, TEST_BUF_SIZE * sizeof(int16_t)))
             return false;

         reportfail();


 bool IPFilterHarness::testCorrectness(const EncoderPrimitives& ref, const EncoderPrimitives& opt)
 {
-    if (opt.luma_p2s)
-    {
-        // last parameter does not matter in case of luma
-        if (!check_IPFilter_primitive(ref.luma_p2s, opt.luma_p2s, 0, 1))
-        {
-            printf("luma_p2s failed\n");
-            return false;
-        }
-    }

     for (int value = 0; value < NUM_PU_SIZES; value++)
     {

                 return false;
             }
         }
-        if (opt.pu[value].filter_p2s)
+        if (opt.pu[value].convert_p2s)
         {
-            if (!check_IPFilterLumaP2S_primitive(ref.pu[value].filter_p2s, opt.pu[value].filter_p2s))
+            if (!check_IPFilterLumaP2S_primitive(ref.pu[value].convert_p2s, opt.pu[value].convert_p2s))
             {
-                printf("filter_p2s[%s]", lumaPartStr[value]);
+                printf("convert_p2s[%s]", lumaPartStr[value]);
                 return false;
             }
         }


     for (int csp = X265_CSP_I420; csp < X265_CSP_COUNT; csp++)
     {
-        if (opt.chroma[csp].p2s)
-        {
-            if (!check_IPFilter_primitive(ref.chroma[csp].p2s, opt.chroma[csp].p2s, 1, csp))
-            {
-                printf("chroma_p2s[%s]", x265_source_csp_names[csp]);
-                return false;
-            }
-        }
         for (int value = 0; value < NUM_PU_SIZES; value++)
         {
             if (opt.chroma[csp].pu[value].filter_hpp)

                     return false;
                 }
             }
-            if (opt.chroma[csp].pu[value].chroma_p2s)
+            if (opt.chroma[csp].pu[value].p2s)
             {
-                if (!check_IPFilterChromaP2S_primitive(ref.chroma[csp].pu[value].chroma_p2s, opt.chroma[csp].pu[value].chroma_p2s))
+                if (!check_IPFilterChromaP2S_primitive(ref.chroma[csp].pu[value].p2s, opt.chroma[csp].pu[value].p2s))
                 {
                     printf("chroma_p2s[%s]", chromaPartStr[csp][value]);
                     return false;


 void IPFilterHarness::measureSpeed(const EncoderPrimitives& ref, const EncoderPrimitives& opt)
 {
-    int height = 64;
-    int width = 64;
     int16_t srcStride = 96;
     int16_t dstStride = 96;
     int maxVerticalfilterHalfDistance = 3;

-    if (opt.luma_p2s)
-    {
-        printf("luma_p2s\t");
-        REPORT_SPEEDUP(opt.luma_p2s, ref.luma_p2s,
-                       pixel_buff, srcStride, IPF_vec_output_s, width, height);
-    }
-
     for (int value = 0; value < NUM_PU_SIZES; value++)
     {
         if (opt.pu[value].luma_hpp)

                            pixel_buff + 3 * srcStride, srcStride, IPF_vec_output_p, srcStride, 1, 3);
         }

-        if (opt.pu[value].filter_p2s)
+        if (opt.pu[value].convert_p2s)
         {
-            printf("filter_p2s [%s]\t", lumaPartStr[value]);
-            REPORT_SPEEDUP(opt.pu[value].filter_p2s, ref.pu[value].filter_p2s,
-                           pixel_buff, srcStride, IPF_vec_output_s);
+            printf("convert_p2s[%s]\t", lumaPartStr[value]);
+            REPORT_SPEEDUP(opt.pu[value].convert_p2s, ref.pu[value].convert_p2s,
+                           pixel_buff, srcStride,
+                           IPF_vec_output_s, dstStride);
         }
     }

     for (int csp = X265_CSP_I420; csp < X265_CSP_COUNT; csp++)
     {
         printf("= Color Space %s =\n", x265_source_csp_names[csp]);
-        if (opt.chroma[csp].p2s)
-        {
-            printf("chroma_p2s\t");
-            REPORT_SPEEDUP(opt.chroma[csp].p2s, ref.chroma[csp].p2s,
-                           pixel_buff, srcStride, IPF_vec_output_s, width, height);
-        }
         for (int value = 0; value < NUM_PU_SIZES; value++)
         {
             if (opt.chroma[csp].pu[value].filter_hpp)

                                short_buff + maxVerticalfilterHalfDistance * srcStride, srcStride,
                                IPF_vec_output_s, dstStride, 1);
x265_1.6.tar.gz/source/test/ipfilterharness.h -> x265_1.7.tar.gz/source/test/ipfilterharness.h
Changed
9
     pixel pixel_test_buff[TEST_CASES][TEST_BUF_SIZE];
     int16_t short_test_buff[TEST_CASES][TEST_BUF_SIZE];

-    bool check_IPFilter_primitive(filter_p2s_wxh_t ref, filter_p2s_wxh_t opt, int isChroma, int csp);
     bool check_IPFilterChroma_primitive(filter_pp_t ref, filter_pp_t opt);
     bool check_IPFilterChroma_ps_primitive(filter_ps_t ref, filter_ps_t opt);
     bool check_IPFilterChroma_hps_primitive(filter_hps_t ref, filter_hps_t opt);
x265_1.6.tar.gz/source/test/pixelharness.cpp -> x265_1.7.tar.gz/source/test/pixelharness.cpp
Changed
201
     return true;
 }

-bool PixelHarness::check_scale_pp(scale_t ref, scale_t opt)
+bool PixelHarness::check_scale1D_pp(scale1D_t ref, scale1D_t opt)
+{
+    ALIGN_VAR_16(pixel, ref_dest[64 * 64]);
+    ALIGN_VAR_16(pixel, opt_dest[64 * 64]);
+
+    memset(ref_dest, 0, sizeof(ref_dest));
+    memset(opt_dest, 0, sizeof(opt_dest));
+
+    int j = 0;
+    for (int i = 0; i < ITERS; i++)
+    {
+        int index = i % TEST_CASES;
+        checked(opt, opt_dest, pixel_test_buff[index] + j);
+        ref(ref_dest, pixel_test_buff[index] + j);
+
+        if (memcmp(ref_dest, opt_dest, 64 * 64 * sizeof(pixel)))
+            return false;
+
+        reportfail();
+        j += INCR;
+    }
+
+    return true;
+}
+
+bool PixelHarness::check_scale2D_pp(scale2D_t ref, scale2D_t opt)
 {
     ALIGN_VAR_16(pixel, ref_dest[64 * 64]);
     ALIGN_VAR_16(pixel, opt_dest[64 * 64]);


 bool PixelHarness::check_calSign(sign_t ref, sign_t opt)
 {
-    ALIGN_VAR_16(int8_t, ref_dest[64 * 64]);
-    ALIGN_VAR_16(int8_t, opt_dest[64 * 64]);
+    ALIGN_VAR_16(int8_t, ref_dest[64 * 2]);
+    ALIGN_VAR_16(int8_t, opt_dest[64 * 2]);

     memset(ref_dest, 0xCD, sizeof(ref_dest));
     memset(opt_dest, 0xCD, sizeof(opt_dest));


     for (int i = 0; i < ITERS; i++)
     {
-        int width = 16 * (rand() % 4 + 1);
+        int width = (rand() % 64) + 1;

         ref(ref_dest, pbuf2 + j, pbuf3 + j, width);
         checked(opt, opt_dest, pbuf2 + j, pbuf3 + j, width);

-        if (memcmp(ref_dest, opt_dest, 64 * 64 * sizeof(int8_t)))
+        if (memcmp(ref_dest, opt_dest, sizeof(ref_dest)))
             return false;

         reportfail();

     for (int i = 0; i < ITERS; i++)
     {
         int width = 16 * (rand() % 4 + 1);
-        int8_t sign = rand() % 3;
-        if (sign == 2)
-            sign = -1;
+        int stride = width + 1;

-        ref(ref_dest, psbuf1 + j, width, sign);
-        checked(opt, opt_dest, psbuf1 + j, width, sign);
+        ref(ref_dest, psbuf1 + j, width, psbuf2 + j, stride);
+        checked(opt, opt_dest, psbuf1 + j, width, psbuf5 + j, stride);

         if (memcmp(ref_dest, opt_dest, 64 * 64 * sizeof(pixel)))
             return false;

     return true;
 }

-bool PixelHarness::check_saoCuOrgE2_t(saoCuOrgE2_t ref, saoCuOrgE2_t opt)
+bool PixelHarness::check_saoCuOrgE2_t(saoCuOrgE2_t ref[2], saoCuOrgE2_t opt[2])
+{
+    ALIGN_VAR_16(pixel, ref_dest[64 * 64]);
+    ALIGN_VAR_16(pixel, opt_dest[64 * 64]);
+
+    memset(ref_dest, 0xCD, sizeof(ref_dest));
+    memset(opt_dest, 0xCD, sizeof(opt_dest));
+
+    for (int id = 0; id < 2; id++)
+    {
+        int j = 0;
+        if (opt[id])
+        {
+            for (int i = 0; i < ITERS; i++)
+            {
+                int width = 16 * (1 << (id * (rand() % 2 + 1))) - (rand() % 2);
+                int stride = width + 1;
+
+                ref[width > 16](ref_dest, psbuf1 + j, psbuf2 + j, psbuf3 + j, width, stride);
+                checked(opt[width > 16], opt_dest, psbuf4 + j, psbuf2 + j, psbuf3 + j, width, stride);
+
+                if (memcmp(psbuf1 + j, psbuf4 + j, width * sizeof(int8_t)))
+                    return false;
+
+                if (memcmp(ref_dest, opt_dest, 64 * 64 * sizeof(pixel)))
+                    return false;
+
+                reportfail();
+                j += INCR;
+            }
+        }
+    }
+
+    return true;
+}
+
+bool PixelHarness::check_saoCuOrgE3_t(saoCuOrgE3_t ref, saoCuOrgE3_t opt)
 {
     ALIGN_VAR_16(pixel, ref_dest[64 * 64]);
     ALIGN_VAR_16(pixel, opt_dest[64 * 64]);


     for (int i = 0; i < ITERS; i++)
     {
-        int width = 16 * (rand() % 4 + 1);
-        int stride = width + 1;
-
-        ref(ref_dest, psbuf1 + j, psbuf2 + j, psbuf3 + j, width, stride);
-        checked(opt, opt_dest, psbuf4 + j, psbuf2 + j, psbuf3 + j, width, stride);
+        int stride = 16 * (rand() % 4 + 1);
+        int start = rand() % 2;
+        int end = 16 - rand() % 2;

-        if (memcmp(psbuf1 + j, psbuf4 + j, width * sizeof(int8_t)))
-            return false;
+        ref(ref_dest, psbuf2 + j, psbuf1 + j, stride, start, end);
+        checked(opt, opt_dest, psbuf5 + j, psbuf1 + j, stride, start, end);

-        if (memcmp(ref_dest, opt_dest, 64 * 64 * sizeof(pixel)))
+        if (memcmp(ref_dest, opt_dest, 64 * 64 * sizeof(pixel)) || memcmp(psbuf2, psbuf5, BUFFSIZE))
             return false;

         reportfail();

     return true;
 }

-bool PixelHarness::check_saoCuOrgE3_t(saoCuOrgE3_t ref, saoCuOrgE3_t opt)
+bool PixelHarness::check_saoCuOrgE3_32_t(saoCuOrgE3_t ref, saoCuOrgE3_t opt)
 {
     ALIGN_VAR_16(pixel, ref_dest[64 * 64]);
     ALIGN_VAR_16(pixel, opt_dest[64 * 64]);


     for (int i = 0; i < ITERS; i++)
     {
-        int stride = 16 * (rand() % 4 + 1);
+        int stride = 32 * (rand() % 2 + 1);
         int start = rand() % 2;
-        int end = (16 * (rand() % 4 + 1)) - rand() % 2;
+        int end = (32 * (rand() % 2 + 1)) - rand() % 2;

         ref(ref_dest, psbuf2 + j, psbuf1 + j, stride, start, end);
         checked(opt, opt_dest, psbuf5 + j, psbuf1 + j, stride, start, end);


     memset(ref_dest, 0xCD, sizeof(ref_dest));
     memset(opt_dest, 0xCD, sizeof(opt_dest));
-
-    int width = 16 + rand() % 48;
-    int height = 16 + rand() % 48;
+    int width = 32 + rand() % 32;
+    int height = 32 + rand() % 32;
     intptr_t srcStride = 64;
     intptr_t dstStride = width;
     int j = 0;

     for (int i = 0; i < ITERS; i++)
     {
         int width = 16 * (rand() % 4 + 1);
-        int height = rand() % 64 +1;
-        int stride = rand() % 65;
+        int height = rand() % 63 + 2;
+        int stride = width;

         ref(ref_dest, psbuf1 + j, width, height, stride);
         checked(opt, opt_dest, psbuf1 + j, width, height, stride);

     return true;
 }

-bool PixelHarness::check_findPosLast(findPosLast_t ref, findPosLast_t opt)
+bool PixelHarness::check_scanPosLast(scanPosLast_t ref, scanPosLast_t opt)
 {
     ALIGN_VAR_16(coeff_t, ref_src[32 * 32 + ITERS * 2]);
     uint8_t ref_coeffNum[MLS_GRP_NUM], opt_coeffNum[MLS_GRP_NUM]; // value range[0, 16]

     for (int i = 0; i < 32 * 32; i++)
     {
x265_1.6.tar.gz/source/test/pixelharness.h -> x265_1.7.tar.gz/source/test/pixelharness.h
Changed
32
     bool check_pixelavg_pp(pixelavg_pp_t ref, pixelavg_pp_t opt);
     bool check_pixel_sub_ps(pixel_sub_ps_t ref, pixel_sub_ps_t opt);
     bool check_pixel_add_ps(pixel_add_ps_t ref, pixel_add_ps_t opt);
-    bool check_scale_pp(scale_t ref, scale_t opt);
+    bool check_scale1D_pp(scale1D_t ref, scale1D_t opt);
+    bool check_scale2D_pp(scale2D_t ref, scale2D_t opt);
     bool check_ssd_s(pixel_ssd_s_t ref, pixel_ssd_s_t opt);
     bool check_blockfill_s(blockfill_s_t ref, blockfill_s_t opt);
     bool check_calresidual(calcresidual_t ref, calcresidual_t opt);

     bool check_addAvg(addAvg_t, addAvg_t);
     bool check_saoCuOrgE0_t(saoCuOrgE0_t ref, saoCuOrgE0_t opt);
     bool check_saoCuOrgE1_t(saoCuOrgE1_t ref, saoCuOrgE1_t opt);
-    bool check_saoCuOrgE2_t(saoCuOrgE2_t ref, saoCuOrgE2_t opt);
+    bool check_saoCuOrgE2_t(saoCuOrgE2_t ref[], saoCuOrgE2_t opt[]);
     bool check_saoCuOrgE3_t(saoCuOrgE3_t ref, saoCuOrgE3_t opt);
+    bool check_saoCuOrgE3_32_t(saoCuOrgE3_t ref, saoCuOrgE3_t opt);
     bool check_saoCuOrgB0_t(saoCuOrgB0_t ref, saoCuOrgB0_t opt);
     bool check_planecopy_sp(planecopy_sp_t ref, planecopy_sp_t opt);
     bool check_planecopy_cp(planecopy_cp_t ref, planecopy_cp_t opt);

     bool check_psyCost_pp(pixelcmp_t ref, pixelcmp_t opt);
     bool check_psyCost_ss(pixelcmp_ss_t ref, pixelcmp_ss_t opt);
     bool check_calSign(sign_t ref, sign_t opt);
-    bool check_findPosLast(findPosLast_t ref, findPosLast_t opt);
+    bool check_scanPosLast(scanPosLast_t ref, scanPosLast_t opt);
+    bool check_findPosFirstLast(findPosFirstLast_t ref, findPosFirstLast_t opt);

 public:
x265_1.6.tar.gz/source/test/rate-control-tests.txt -> x265_1.7.tar.gz/source/test/rate-control-tests.txt
Changed
72
-# List of command lines to be run by rate control regression tests, see https://bitbucket.org/sborho/test-harness
-
-# This test is listed first since it currently reproduces bugs
-big_buck_bunny_360p24.y4m,--preset medium --bitrate 1000 --pass 1 -F4,--preset medium --bitrate 1000 --pass 2 -F4
-
-# VBV tests, non-deterministic so testing for correctness and bitrate
-# fluctuations - up to 1% bitrate fluctuation is allowed between runs
-RaceHorses_416x240_30_10bit.yuv,--preset medium --bitrate 700 --vbv-bufsize 900 --vbv-maxrate 700
-RaceHorses_416x240_30_10bit.yuv,--preset superfast --bitrate 600 --vbv-bufsize 600 --vbv-maxrate 600
-RaceHorses_416x240_30_10bit.yuv,--preset veryslow --bitrate 1100 --vbv-bufsize 1100 --vbv-maxrate 1200
-112_1920x1080_25.yuv,--preset medium --bitrate 1000 --vbv-maxrate 1500 --vbv-bufsize 1500 --aud
-112_1920x1080_25.yuv,--preset medium --bitrate 10000 --vbv-maxrate 10000 --vbv-bufsize 15000 --hrd
-112_1920x1080_25.yuv,--preset medium --bitrate 4000 --vbv-maxrate 12000 --vbv-bufsize 12000 --repeat-headers
-112_1920x1080_25.yuv,--preset superfast --bitrate 1000 --vbv-maxrate 1000 --vbv-bufsize 1500 --hrd --strict-cbr
-112_1920x1080_25.yuv,--preset superfast --bitrate 30000 --vbv-maxrate 30000 --vbv-bufsize 30000 --repeat-headers
-112_1920x1080_25.yuv,--preset superfast --bitrate 4000 --vbv-maxrate 6000 --vbv-bufsize 6000 --aud
-112_1920x1080_25.yuv,--preset veryslow --bitrate 1000 --vbv-maxrate 3000 --vbv-bufsize 3000 --repeat-headers
-big_buck_bunny_360p24.y4m,--preset medium --bitrate 1000 --vbv-bufsize 3000 --vbv-maxrate 3000 --repeat-headers
-big_buck_bunny_360p24.y4m,--preset medium --bitrate 3000 --vbv-bufsize 3000 --vbv-maxrate 3000 --hrd
-big_buck_bunny_360p24.y4m,--preset medium --bitrate 400 --vbv-bufsize 600 --vbv-maxrate 600 --aud
-big_buck_bunny_360p24.y4m,--preset medium --crf 1 --vbv-bufsize 3000 --vbv-maxrate 3000 --hrd
-big_buck_bunny_360p24.y4m,--preset superfast --bitrate 1000 --vbv-bufsize 1000 --vbv-maxrate 1000 --aud --strict-cbr
-big_buck_bunny_360p24.y4m,--preset superfast --bitrate 3000 --vbv-bufsize 9000 --vbv-maxrate 9000 --repeat-headers
-big_buck_bunny_360p24.y4m,--preset superfast --bitrate 400 --vbv-bufsize 600 --vbv-maxrate 400 --hrd
-big_buck_bunny_360p24.y4m,--preset superfast --crf 6 --vbv-bufsize 1000 --vbv-maxrate 1000 --aud
-
-# multi-pass rate control tests
-big_buck_bunny_360p24.y4m,--preset slow --crf 40 --pass 1,--preset slow --bitrate 200 --pass 2
-big_buck_bunny_360p24.y4m,--preset medium --bitrate 700 --pass 1 -F4 --slow-firstpass,--preset medium --bitrate 700 --vbv-bufsize 900 --vbv-maxrate 700 --pass 2 -F4
-112_1920x1080_25.yuv,--preset slow --bitrate 1000 --pass 1 -F4,--preset slow --bitrate 1000 --pass 2 -F4
-112_1920x1080_25.yuv,--preset superfast --crf 12 --pass 1,--preset superfast --bitrate 4000 --pass 2 -F4
-RaceHorses_416x240_30_10bit.yuv,--preset veryslow --crf 40 --pass 1, --preset veryslow --bitrate 200 --pass 2 -F4
-RaceHorses_416x240_30_10bit.yuv,--preset superfast --bitrate 600 --pass 1 -F4 --slow-firstpass,--preset superfast --bitrate 600 --pass 2 -F4
-RaceHorses_416x240_30_10bit.yuv,--preset medium --crf 26 --pass 1,--preset medium --bitrate 500 --pass 3 -F4,--preset medium --bitrate 500 --pass 2 -F4
+# List of command lines to be run by rate control regression tests, see https://bitbucket.org/sborho/test-harness
+
+# These tests should yield deterministic results
+# This test is listed first since it currently reproduces bugs
+big_buck_bunny_360p24.y4m,--preset medium --bitrate 1000 --pass 1 -F4,--preset medium --bitrate 1000 --pass 2 -F4
+fire_1920x1080_30.yuv, --preset slow --bitrate 2000 --tune zero-latency
+
+
+# VBV tests, non-deterministic so testing for correctness and bitrate
+# fluctuations - up to 1% bitrate fluctuation is allowed between runs
+night_cars_1920x1080_30.yuv,--preset medium --crf 25 --vbv-bufsize 5000 --vbv-maxrate 5000 -F6 --crf-max 34 --crf-min 22
+ducks_take_off_420_720p50.y4m,--preset slow --bitrate 1600 --vbv-bufsize 1600 --vbv-maxrate 1600 --strict-cbr --aq-mode 2 --aq-strength 0.5
+CrowdRun_1920x1080_50_10bit_422.yuv,--preset veryslow --bitrate 4000 --vbv-bufsize 3000 --vbv-maxrate 4000 --tune grain
+fire_1920x1080_30.yuv,--preset medium --bitrate 1000 --vbv-maxrate 1500 --vbv-bufsize 1500 --aud --pmode --tune ssim
+112_1920x1080_25.yuv,--preset ultrafast --bitrate 10000 --vbv-maxrate 10000 --vbv-bufsize 15000 --hrd --strict-cbr
+Traffic_4096x2048_30.yuv,--preset superfast --bitrate 20000 --vbv-maxrate 20000 --vbv-bufsize 20000 --repeat-headers --strict-cbr
+Traffic_4096x2048_30.yuv,--preset faster --bitrate 8000 --vbv-maxrate 8000 --vbv-bufsize 6000 --aud --repeat-headers --no-open-gop --hrd --pmode --pme
+News-4k.y4m,--preset veryfast --bitrate 3000 --vbv-maxrate 5000 --vbv-bufsize 5000 --repeat-headers --temporal-layers
+NebutaFestival_2560x1600_60_10bit_crop.yuv,--preset medium --bitrate 18000 --vbv-bufsize 20000 --vbv-maxrate 18000 --strict-cbr
+NebutaFestival_2560x1600_60_10bit_crop.yuv,--preset medium --bitrate 8000 --vbv-bufsize 12000 --vbv-maxrate 10000 --tune grain
+big_buck_bunny_360p24.y4m,--preset medium --bitrate 400 --vbv-bufsize 600 --vbv-maxrate 600 --aud --hrd --tune fast-decode
+sita_1920x1080_30.yuv,--preset superfast --crf 25 --vbv-bufsize 3000 --vbv-maxrate 4000 --vbv-bufsize 5000 --hrd --crf-max 30
+sita_1920x1080_30.yuv,--preset superfast --bitrate 3000 --vbv-bufsize 3000 --vbv-maxrate 3000 --aud --strict-cbr
+
+
+
+# multi-pass rate control tests
+big_buck_bunny_360p24.y4m,--preset slow --crf 40 --pass 1 -f 5000,--preset slow --bitrate 200 --pass 2 -f 5000
+big_buck_bunny_360p24.y4m,--preset medium --bitrate 700 --pass 1 -F4 --slow-firstpass -f 5000 ,--preset medium --bitrate 700 --vbv-bufsize 900 --vbv-maxrate 700 --pass 2 -F4 -f 5000
+112_1920x1080_25.yuv,--preset fast --bitrate 1000 --vbv-maxrate 1000 --vbv-bufsize 1000 --strict-cbr --pass 1 -F4,--preset fast --bitrate 1000 --vbv-maxrate 3000 --vbv-bufsize 3000 --pass 2 -F4
+pine_tree_1920x1080_30.yuv,--preset veryfast --crf 12 --pass 1 -F4,--preset faster --bitrate 4000 --pass 2 -F4
+SteamLocomotiveTrain_2560x1600_60_10bit_crop.yuv, --tune grain --preset ultrafast --bitrate 5000 --vbv-maxrate 5000 --vbv-bufsize 8000 --strict-cbr -F4 --pass 1, --tune grain --preset ultrafast --bitrate 8000 --vbv-maxrate 8000 --vbv-bufsize 8000 -F4 --pass 2
+RaceHorses_416x240_30_10bit.yuv,--preset medium --crf 40 --pass 1, --preset faster --bitrate 200 --pass 2 -F4
+CrowdRun_1920x1080_50_10bit_422.yuv,--preset superfast --bitrate 2500 --pass 1 -F4 --slow-firstpass,--preset superfast --bitrate 2500 --pass 2 -F4
+RaceHorses_416x240_30_10bit.yuv,--preset medium --crf 26 --vbv-maxrate 1000 --vbv-bufsize 1000 --pass 1,--preset fast --bitrate 1000 --vbv-maxrate 1000 --vbv-bufsize 700 --pass 3 -F4,--preset slow --bitrate 500 --vbv-maxrate 500 --vbv-bufsize 700 --pass 2 -F4
+
x265_1.6.tar.gz/source/test/regression-tests.txt -> x265_1.7.tar.gz/source/test/regression-tests.txt
Changed
# not auto-detected.
BasketballDrive_1920x1080_50.y4m,--preset faster --aq-strength 2 --merange 190
-BasketballDrive_1920x1080_50.y4m,--preset medium --ctu 16 --max-tu-size 8 --subme 7
+BasketballDrive_1920x1080_50.y4m,--preset medium --ctu 16 --max-tu-size 8 --subme 7 --qg-size 32
BasketballDrive_1920x1080_50.y4m,--preset medium --keyint -1 --nr-inter 100 -F4 --no-sao
-BasketballDrive_1920x1080_50.y4m,--preset slow --nr-intra 100 -F4 --aq-strength 3
+BasketballDrive_1920x1080_50.y4m,--preset slow --nr-intra 100 -F4 --aq-strength 3 --qg-size 16
BasketballDrive_1920x1080_50.y4m,--preset slower --lossless --chromaloc 3 --subme 0
BasketballDrive_1920x1080_50.y4m,--preset superfast --psy-rd 1 --ctu 16 --no-wpp
BasketballDrive_1920x1080_50.y4m,--preset ultrafast --signhide --colormatrix bt709
CrowdRun_1920x1080_50_10bit_422.yuv,--preset slow --no-wpp --tune ssim --transfer smpte240m
CrowdRun_1920x1080_50_10bit_422.yuv,--preset slower --tune ssim --tune fastdecode
CrowdRun_1920x1080_50_10bit_422.yuv,--preset superfast --weightp --no-wpp --sao
-CrowdRun_1920x1080_50_10bit_422.yuv,--preset ultrafast --weightp --tune zerolatency
+CrowdRun_1920x1080_50_10bit_422.yuv,--preset ultrafast --weightp --tune zerolatency --qg-size 16
CrowdRun_1920x1080_50_10bit_422.yuv,--preset veryfast --temporal-layers --tune grain
CrowdRun_1920x1080_50_10bit_444.yuv,--preset medium --dither --keyint -1 --rdoq-level 1
CrowdRun_1920x1080_50_10bit_444.yuv,--preset superfast --weightp --dither --no-psy-rd
CrowdRun_1920x1080_50_10bit_444.yuv,--preset veryfast --temporal-layers --repeat-headers
CrowdRun_1920x1080_50_10bit_444.yuv,--preset veryslow --tskip --tskip-fast --no-scenecut
DucksAndLegs_1920x1080_60_10bit_422.yuv,--preset medium --tune psnr --bframes 16
-DucksAndLegs_1920x1080_60_10bit_422.yuv,--preset slow --temporal-layers --no-psy-rd
-DucksAndLegs_1920x1080_60_10bit_422.yuv,--preset superfast --weightp
+DucksAndLegs_1920x1080_60_10bit_422.yuv,--preset slow --temporal-layers --no-psy-rd --qg-size 32
+DucksAndLegs_1920x1080_60_10bit_422.yuv,--preset superfast --weightp --qg-size 16
DucksAndLegs_1920x1080_60_10bit_444.yuv,--preset medium --nr-inter 500 -F4 --no-psy-rdoq
DucksAndLegs_1920x1080_60_10bit_444.yuv,--preset slower --no-weightp --rdoq-level 0
DucksAndLegs_1920x1080_60_10bit_444.yuv,--preset veryfast --weightp --nr-intra 1000 -F4
Kimono1_1920x1080_24_10bit_444.yuv,--preset superfast --weightb
KristenAndSara_1280x720_60.y4m,--preset medium --no-cutree --max-tu-size 16
KristenAndSara_1280x720_60.y4m,--preset slower --pmode --max-tu-size 8
-KristenAndSara_1280x720_60.y4m,--preset superfast --min-cu-size 16
+KristenAndSara_1280x720_60.y4m,--preset superfast --min-cu-size 16 --qg-size 16
KristenAndSara_1280x720_60.y4m,--preset ultrafast --strong-intra-smoothing
NebutaFestival_2560x1600_60_10bit_crop.yuv,--preset medium --tune grain
NebutaFestival_2560x1600_60_10bit_crop.yuv,--preset superfast --tune psnr
-News-4k.y4m,--preset medium --tune ssim --no-sao
+News-4k.y4m,--preset medium --tune ssim --no-sao --qg-size 32
News-4k.y4m,--preset superfast --lookahead-slices 6 --aq-mode 0
OldTownCross_1920x1080_50_10bit_422.yuv,--preset medium --no-weightp
OldTownCross_1920x1080_50_10bit_422.yuv,--preset slower --tune fastdecode
parkrun_ter_720p50.y4m,--preset slower --fast-intra --no-rect --tune grain
silent_cif_420.y4m,--preset medium --me full --rect --amp
silent_cif_420.y4m,--preset superfast --weightp --rect
-silent_cif_420.y4m,--preset placebo --ctu 32 --no-sao
+silent_cif_420.y4m,--preset placebo --ctu 32 --no-sao --qg-size 16
vtc1nw_422_ntsc.y4m,--preset medium --scaling-list default --ctu 16 --ref 5
-vtc1nw_422_ntsc.y4m,--preset slower --nr-inter 1000 -F4 --tune fast-decode
+vtc1nw_422_ntsc.y4m,--preset slower --nr-inter 1000 -F4 --tune fast-decode --qg-size 16
vtc1nw_422_ntsc.y4m,--preset superfast --weightp --nr-intra 100 -F4
washdc_422_ntsc.y4m,--preset faster --rdoq-level 1 --max-merge 5
washdc_422_ntsc.y4m,--preset medium --no-weightp --max-tu-size 4
-washdc_422_ntsc.y4m,--preset slower --psy-rdoq 2.0 --rdoq-level 2
+washdc_422_ntsc.y4m,--preset slower --psy-rdoq 2.0 --rdoq-level 2 --qg-size 32
washdc_422_ntsc.y4m,--preset superfast --psy-rd 1 --tune zerolatency
washdc_422_ntsc.y4m,--preset ultrafast --weightp --tu-intra-depth 4
washdc_422_ntsc.y4m,--preset veryfast --tu-inter-depth 4
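Many of the updated regression entries add --qg-size. The x265.h comment in this release limits the quantization group size to 64, 32 or 16, and requires it to lie within the inclusive range of CU sizes in use. A minimal sketch of that documented rule follows; `isValidQgSize` is a hypothetical helper, not libx265's own validation code.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the documented --qg-size constraint: the value must be one of
// 64, 32 or 16 and must lie within [minCUSize, maxCUSize] (inclusive).
static bool isValidQgSize(uint32_t qgSize, uint32_t minCUSize, uint32_t maxCUSize)
{
    const bool allowed = qgSize == 64 || qgSize == 32 || qgSize == 16;
    return allowed && qgSize >= minCUSize && qgSize <= maxCUSize;
}
```

At least one entry above combines --ctu 16 with --qg-size 32, which falls outside this documented range. This diff does not show whether libx265 rejects such a combination or adjusts it.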
x265_1.6.tar.gz/source/test/smoke-tests.txt -> x265_1.7.tar.gz/source/test/smoke-tests.txt
Changed
# List of command lines to be run by smoke tests, see https://bitbucket.org/sborho/test-harness
+# consider VBV tests a failure if new bitrate is more than 5% different
+# from the old bitrate
+# vbv-tolerance = 0.05
+
big_buck_bunny_360p24.y4m,--preset=superfast --bitrate 400 --vbv-bufsize 600 --vbv-maxrate 400 --hrd --aud --repeat-headers
big_buck_bunny_360p24.y4m,--preset=medium --bitrate 1000 -F4 --cu-lossless --scaling-list default
-big_buck_bunny_360p24.y4m,--preset=slower --no-weightp --cu-stats --pme
-washdc_422_ntsc.y4m,--preset=faster --no-strong-intra-smoothing --keyint 1
+big_buck_bunny_360p24.y4m,--preset=slower --no-weightp --cu-stats --pme --qg-size 16
+washdc_422_ntsc.y4m,--preset=faster --no-strong-intra-smoothing --keyint 1 --qg-size 16
washdc_422_ntsc.y4m,--preset=medium --qp 40 --nr-inter 400 -F4
washdc_422_ntsc.y4m,--preset=veryslow --pmode --tskip --rdoq-level 0
old_town_cross_444_720p50.y4m,--preset=ultrafast --weightp --keyint -1
old_town_cross_444_720p50.y4m,--preset=fast --keyint 20 --min-cu-size 16
-old_town_cross_444_720p50.y4m,--preset=slow --sao-non-deblock --pmode
+old_town_cross_444_720p50.y4m,--preset=slow --sao-non-deblock --pmode --qg-size 32
RaceHorses_416x240_30_10bit.yuv,--preset=veryfast --cu-stats --max-tu-size 8
RaceHorses_416x240_30_10bit.yuv,--preset=slower --bitrate 500 -F4 --rdoq-level 1
CrowdRun_1920x1080_50_10bit_444.yuv,--preset=ultrafast --constrained-intra --min-keyint 5 --keyint 10
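The comment added to smoke-tests.txt defines the VBV pass criterion as a relative bitrate difference of at most 5% (vbv-tolerance = 0.05). The following is a minimal sketch of that comparison. It assumes the old bitrate is the reference in the denominator. `vbvBitrateWithinTolerance` is a hypothetical helper, not code from the test harness.

```cpp
#include <cassert>
#include <cmath>

// Returns true when the new bitrate is within `tolerance` (0.05 = 5%) of the
// old bitrate, measured relative to the old bitrate (assumed reference).
static bool vbvBitrateWithinTolerance(double oldKbps, double newKbps, double tolerance = 0.05)
{
    if (oldKbps <= 0.0)
        return false; // no meaningful reference bitrate to compare against
    return std::fabs(newKbps - oldKbps) / oldKbps <= tolerance;
}
```

For example, under this rule a run that previously produced 400 kbps and now produces 415 kbps (3.75% higher) passes, while 430 kbps (7.5% higher) fails.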
x265_1.6.tar.gz/source/test/testbench.cpp -> x265_1.7.tar.gz/source/test/testbench.cpp
Changed
{ "AVX", X265_CPU_AVX },
{ "XOP", X265_CPU_XOP },
{ "AVX2", X265_CPU_AVX2 },
+ { "BMI2", X265_CPU_AVX2 | X265_CPU_BMI1 | X265_CPU_BMI2 },
{ "", 0 },
};
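The new "BMI2" row in testbench.cpp combines AVX2, BMI1 and BMI2 into a single mask. The following is a minimal sketch of how such a combined row can be gated. It assumes that the testbench only exercises a row when all of its bits are set in the detected CPU flags, because the check itself is outside this hunk. The CPU_* values below are placeholders, not the real X265_CPU_* constants.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder bit values for illustration only; the real X265_CPU_*
// constants are defined by x265 and are not reproduced here.
static const uint32_t CPU_AVX2 = 1u << 0;
static const uint32_t CPU_BMI1 = 1u << 1;
static const uint32_t CPU_BMI2 = 1u << 2;

// A row whose flag combines several capabilities is usable only when every
// requested bit is also present in the detected capability mask.
static bool archEntrySupported(uint32_t entryMask, uint32_t detected)
{
    return (entryMask & detected) == entryMask;
}
```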
x265_1.6.tar.gz/source/x265.cpp -> x265_1.7.tar.gz/source/x265.cpp
Changed
#include "input/input.h"
#include "output/output.h"
+#include "output/reconplay.h"
#include "filters/filters.h"
#include "common.h"
#include "param.h"
#include <string>
#include <ostream>
#include <fstream>
+#include <queue>
+#define CONSOLE_TITLE_SIZE 200
#ifdef _WIN32
#include <windows.h>
+static char orgConsoleTitle[CONSOLE_TITLE_SIZE] = "";
#else
#define GetConsoleTitle(t, n)
#define SetConsoleTitle(t)
+#define SetThreadExecutionState(es)
#endif
using namespace x265;
struct CLIOptions
{
- Input* input;
- Output* recon;
- std::fstream bitstreamFile;
+ InputFile* input;
+ ReconFile* recon;
+ OutputFile* output;
+ FILE* qpfile;
+ const char* reconPlayCmd;
+ const x265_api* api;
+ x265_param* param;
bool bProgress;
bool bForceY4m;
bool bDither;
-
uint32_t seek; // number of frames to skip from the beginning
uint32_t framesToBeEncoded; // number of frames to encode
uint64_t totalbytes;
- size_t analysisRecordSize; // number of bytes read from or dumped into file
- int analysisHeaderSize;
-
int64_t startTime;
int64_t prevUpdateTime;
- float frameRate;
- FILE* qpfile;
- FILE* analysisFile;
/* in microseconds */
static const int UPDATE_INTERVAL = 250000;
CLIOptions()
{
- frameRate = 0.f;
input = NULL;
recon = NULL;
+ output = NULL;
+ qpfile = NULL;
+ reconPlayCmd = NULL;
+ api = NULL;
+ param = NULL;
framesToBeEncoded = seek = 0;
totalbytes = 0;
bProgress = true;
startTime = x265_mdate();
prevUpdateTime = 0;
bDither = false;
- qpfile = NULL;
- analysisFile = NULL;
- analysisRecordSize = 0;
- analysisHeaderSize = 0;
}
void destroy();
- void writeNALs(const x265_nal* nal, uint32_t nalcount);
- void printStatus(uint32_t frameNum, x265_param *param);
- bool parse(int argc, char **argv, x265_param* param);
+ void printStatus(uint32_t frameNum);
+ bool parse(int argc, char **argv);
bool parseQPFile(x265_picture &pic_org);
- bool validateFanout(x265_param*);
};
void CLIOptions::destroy()
if (qpfile)
fclose(qpfile);
qpfile = NULL;
- if (analysisFile)
- fclose(analysisFile);
- analysisFile = NULL;
+ if (output)
+ output->release();
+ output = NULL;
}
-void CLIOptions::writeNALs(const x265_nal* nal, uint32_t nalcount)
-{
- ProfileScopeEvent(bitstreamWrite);
- for (uint32_t i = 0; i < nalcount; i++)
- {
- bitstreamFile.write((const char*)nal->payload, nal->sizeBytes);
- totalbytes += nal->sizeBytes;
- nal++;
- }
-}
-
-void CLIOptions::printStatus(uint32_t frameNum, x265_param *param)
+void CLIOptions::printStatus(uint32_t frameNum)
{
char buf[200];
int64_t time = x265_mdate();
prevUpdateTime = time;
}
-bool CLIOptions::parse(int argc, char **argv, x265_param* param)
+bool CLIOptions::parse(int argc, char **argv)
{
bool bError = 0;
int help = 0;
int inputBitDepth = 8;
+ int outputBitDepth = 0;
int reconFileBitDepth = 0;
const char *inputfn = NULL;
const char *reconfn = NULL;
- const char *bitstreamfn = NULL;
+ const char *outputfn = NULL;
const char *preset = NULL;
const char *tune = NULL;
const char *profile = NULL;
int c = getopt_long(argc, argv, short_options, long_options, NULL);
if (c == -1)
break;
- if (c == 'p')
+ else if (c == 'p')
preset = optarg;
- if (c == 't')
+ else if (c == 't')
tune = optarg;
+ else if (c == 'D')
+ outputBitDepth = atoi(optarg);
else if (c == '?')
showHelp(param);
}
- if (x265_param_default_preset(param, preset, tune) < 0)
+ api = x265_api_get(outputBitDepth);
+ if (!api)
+ {
+ x265_log(NULL, X265_LOG_WARNING, "falling back to default bit-depth\n");
+ api = x265_api_get(0);
+ }
+
+ param = api->param_alloc();
+ if (!param)
+ {
+ x265_log(NULL, X265_LOG_ERROR, "param alloc failed\n");
+ return true;
+ }
+
+ if (api->param_default_preset(param, preset, tune) < 0)
{
x265_log(NULL, X265_LOG_ERROR, "preset or tune unrecognized\n");
return true;
int long_options_index = -1;
int c = getopt_long(argc, argv, short_options, long_options, &long_options_index);
if (c == -1)
- {
break;
- }
switch (c)
{
OPT2("frame-skip", "seek") this->seek = (uint32_t)x265_atoi(optarg, bError);
OPT("frames") this->framesToBeEncoded = (uint32_t)x265_atoi(optarg, bError);
OPT("no-progress") this->bProgress = false;
- OPT("output") bitstreamfn = optarg;
+ OPT("output") outputfn = optarg;
OPT("input") inputfn = optarg;
OPT("recon") reconfn = optarg;
OPT("input-depth") inputBitDepth = (uint32_t)x265_atoi(optarg, bError);
OPT("profile") profile = optarg; /* handled last */
OPT("preset") /* handled above */;
OPT("tune") /* handled above */;
+ OPT("output-depth") /* handled above */;
+ OPT("recon-y4m-exec") reconPlayCmd = optarg;
OPT("qpfile")
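In the updated CLIOptions::parse(), -D/--output-depth is read in the first getopt pass. x265_api_get() is then asked for that depth, and the CLI falls back to x265_api_get(0) with a warning when no matching library is found. The sketch below models only that decision and the library names listed in the new x265_api_get() comment: libx265_main or libx265_main10, plus a platform-specific extension. It does not load any library. `resolveOutputDepth`, `libraryNameForDepth` and the `available` callback are hypothetical stand-ins for the dynamic binding that x265_api_get() performs.

```cpp
#include <cassert>
#include <string>

// Library name for a bit depth, following the convention documented for
// x265_api_get(); `ext` is the platform's shared-library extension.
static std::string libraryNameForDepth(int bitDepth, const std::string& ext)
{
    switch (bitDepth)
    {
    case 8:  return "libx265_main" + ext;
    case 10: return "libx265_main10" + ext;
    default: return ""; // no library name documented for other depths
    }
}

// Models the CLI's choice: 0 means "use the linked library's depth"; any
// other depth is used if a matching library is available, else fall back.
static int resolveOutputDepth(int requested, int linkedDepth, bool (*available)(int))
{
    if (requested == 0 || requested == linkedDepth)
        return linkedDepth;
    if (available(requested))
        return requested;
    return linkedDepth; // corresponds to "falling back to default bit-depth"
}
```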
x265_1.6.tar.gz/source/x265.def.in -> x265_1.7.tar.gz/source/x265.def.in
Changed
x265_build_info_str
x265_encoder_headers
x265_encoder_parameters
+x265_encoder_reconfig
x265_encoder_encode
x265_encoder_get_stats
x265_encoder_log
x265_1.6.tar.gz/source/x265.h -> x265_1.7.tar.gz/source/x265.h
Changed
*
* Frame encoders are distributed between the available thread pools, and
* the encoder will never generate more thread pools than frameNumThreads */
- char* numaPools;
+ const char* numaPools;
/* Enable wavefront parallel processing, greatly increases parallelism for
* less than 1% compression efficiency loss. Requires a thread pool, enabled
* order. Otherwise the encoder will emit per-stream statistics into the log
* file when x265_encoder_log is called (presumably at the end of the
* encode) */
- char* csvfn;
+ const char* csvfn;
/*== Internal Picture Specification ==*/
* performance. Value must be between 1 and 16, default is 3 */
int maxNumReferences;
+ /* Allow libx265 to emit HEVC bitstreams which do not meet strict level
+ * requirements. Defaults to false */
+ int bAllowNonConformance;
+
/*== Bitstream Options ==*/
/* Flag indicating whether VPS, SPS and PPS headers should be output with
* each keyframe. Default false */
int bRepeatHeaders;
+ /* Flag indicating whether the encoder should generate start codes (Annex B
+ * format) or length (file format) before NAL units. Default true, Annex B.
+ * Muxers should set this to the correct value */
+ int bAnnexB;
+
/* Flag indicating whether the encoder should emit an Access Unit Delimiter
* NAL at the start of every access unit. Default false */
int bEnableAccessUnitDelimiters;
int analysisMode;
/* Filename for analysisMode save/load. Default name is "x265_analysis.dat" */
- char* analysisFileName;
+ const char* analysisFileName;
/*== Rate Control ==*/
/* Filename of the 2pass output/input stats file, if unspecified the
* encoder will default to using x265_2pass.log */
- char* statFileName;
+ const char* statFileName;
/* temporally blur quants */
double qblur;
/* Enable stricter conditions to check bitrate deviations in CBR mode. May compromise
* quality to maintain bitrate adherence */
int bStrictCbr;
+
+ /* Enable adaptive quantization at CU granularity. This parameter specifies
+ * the minimum CU size at which QP can be adjusted, i.e. Quantization Group
+ * (QG) size. Allowed values are 64, 32, 16 provided it falls within the
+ * inclusive range [minCUSize, maxCUSize]. Experimental, default: maxCUSize */
+ uint32_t qgSize;
} rc;
/*== Video Usability Information ==*/
* conformance cropping window to further crop the displayed window */
int defDispWinBottomOffset;
} vui;
+
+ /* SMPTE ST 2086 mastering display color volume SEI info, specified as a
+ * string which is parsed when the stream header SEI are emitted. The string
+ * format is "G(%hu,%hu)B(%hu,%hu)R(%hu,%hu)WP(%hu,%hu)L(%u,%u)" where %hu
+ * are unsigned 16bit integers and %u are unsigned 32bit integers. The SEI
+ * includes X,Y display primaries for RGB channels, white point X,Y and
+ * max,min luminance values. */
+ const char* masteringDisplayColorVolume;
+
+ /* Content light level info SEI, specified as a string which is parsed when
+ * the stream header SEI are emitted. The string format is "%hu,%hu" where
+ * %hu are unsigned 16bit integers. The first value is the max content light
+ * level (or 0 if no maximum is indicated), the second value is the maximum
+ * picture average light level (or 0). */
+ const char* contentLightLevelInfo;
+
} x265_param;
/* x265_param_alloc:
void x265_picture_init(x265_param *param, x265_picture *pic);
/* x265_max_bit_depth:
- * Specifies the maximum number of bits per pixel that x265 can input. This
- * is also the max bit depth that x265 encodes in. When x265_max_bit_depth
- * is 8, the internal and input bit depths must be 8. When
- * x265_max_bit_depth is 12, the internal and input bit depths can be
- * either 8, 10, or 12. Note that the internal bit depth must be the same
- * for all encoders allocated in the same process. */
+ * Specifies the number of bits per pixel that x265 uses internally to
+ * represent a pixel, and the bit depth of the output bitstream.
+ * param->internalBitDepth must be set to this value. x265_max_bit_depth
+ * will be 8 for default builds, 10 for HIGH_BIT_DEPTH builds. */
X265_API extern const int x265_max_bit_depth;
/* x265_version_str:
* Once flushing has begun, all subsequent calls must pass pic_in as NULL. */
int x265_encoder_encode(x265_encoder *encoder, x265_nal **pp_nal, uint32_t *pi_nal, x265_picture *pic_in, x265_picture *pic_out);
+/* x265_encoder_reconfig:
+ * various parameters from x265_param are copied.
+ * this takes effect immediately, on whichever frame is encoded next;
+ * returns 0 on success, negative on parameter validation error.
+ *
+ * not all parameters can be changed; see the actual function for a
+ * detailed breakdown. since not all parameters can be changed, moving
+ * from preset to preset may not always fully copy all relevant parameters,
+ * but should still work usably in practice. however, more so than for
+ * other presets, many of the speed shortcuts used in ultrafast cannot be
+ * switched out of; using reconfig to switch between ultrafast and other
+ * presets is not recommended without a more fine-grained breakdown of
+ * parameters to take this into account. */
+int x265_encoder_reconfig(x265_encoder *, x265_param *);
+
/* x265_encoder_get_stats:
* returns encoder statistics */
void x265_encoder_get_stats(x265_encoder *encoder, x265_stats *, uint32_t statsSizeBytes);
void (*picture_init)(x265_param*, x265_picture*);
x265_encoder* (*encoder_open)(x265_param*);
void (*encoder_parameters)(x265_encoder*, x265_param*);
+ int (*encoder_reconfig)(x265_encoder*, x265_param*);
int (*encoder_headers)(x265_encoder*, x265_nal**, uint32_t*);
int (*encoder_encode)(x265_encoder*, x265_nal**, uint32_t*, x265_picture*, x265_picture*);
void (*encoder_get_stats)(x265_encoder*, x265_stats*, uint32_t);
* Retrieve the programming interface for a linked x265 library.
* May return NULL if no library is available that supports the
* requested bit depth. If bitDepth is 0 the function is guaranteed
- * to return a non-NULL x265_api pointer, from the system default
- * libx265 */
+ * to return a non-NULL x265_api pointer, from the linked libx265.
+ *
+ * If the requested bitDepth is not supported by the linked libx265,
+ * it will attempt to dynamically bind x265_api_get() from a shared
+ * library with an appropriate name:
+ * 8bit: libx265_main.so
+ * 10bit: libx265_main10.so
+ * Obviously the shared library file extension is platform specific */
const x265_api* x265_api_get(int bitDepth);
#ifdef __cplusplus
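x265_param::masteringDisplayColorVolume is passed as a string in the format given in the header comment above. The following is a minimal sketch of reading it with std::sscanf and exactly that format string. `MasteringDisplay` and `parseMasterDisplay` are illustrative names, not x265 API, and the values in the test below are arbitrary sample inputs rather than recommended settings.

```cpp
#include <cassert>
#include <cstdio>

// Fields of the SMPTE ST 2086 string documented for masteringDisplayColorVolume.
struct MasteringDisplay
{
    unsigned short gx, gy, bx, by, rx, ry; // display primaries (X,Y) for G, B, R
    unsigned short wpx, wpy;               // white point (X,Y)
    unsigned int maxLuma, minLuma;         // L(max,min) luminance values
};

// Returns true only if all ten values were parsed. Like any sscanf-based
// parser, it does not reject trailing characters after "L(...)".
static bool parseMasterDisplay(const char* s, MasteringDisplay& md)
{
    const int n = std::sscanf(s, "G(%hu,%hu)B(%hu,%hu)R(%hu,%hu)WP(%hu,%hu)L(%u,%u)",
                              &md.gx, &md.gy, &md.bx, &md.by, &md.rx, &md.ry,
                              &md.wpx, &md.wpy, &md.maxLuma, &md.minLuma);
    return n == 10;
}
```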
x265_1.6.tar.gz/source/x265cli.h -> x265_1.7.tar.gz/source/x265cli.h
Changed
namespace x265 {
#endif
-static const char short_options[] = "o:p:f:F:r:I:i:b:s:t:q:m:hwV?";
+static const char short_options[] = "o:D:P:p:f:F:r:I:i:b:s:t:q:m:hwV?";
static const struct option long_options[] =
{
{ "help", no_argument, NULL, 'h' },
{ "no-pme", no_argument, NULL, 0 },
{ "pme", no_argument, NULL, 0 },
{ "log-level", required_argument, NULL, 0 },
- { "profile", required_argument, NULL, 0 },
+ { "profile", required_argument, NULL, 'P' },
{ "level-idc", required_argument, NULL, 0 },
{ "high-tier", no_argument, NULL, 0 },
{ "no-high-tier", no_argument, NULL, 0 },
+ { "allow-non-conformance",no_argument, NULL, 0 },
+ { "no-allow-non-conformance",no_argument, NULL, 0 },
{ "csv", required_argument, NULL, 0 },
{ "no-cu-stats", no_argument, NULL, 0 },
{ "cu-stats", no_argument, NULL, 0 },
{ "y4m", no_argument, NULL, 0 },
{ "no-progress", no_argument, NULL, 0 },
{ "output", required_argument, NULL, 'o' },
+ { "output-depth", required_argument, NULL, 'D' },
{ "input", required_argument, NULL, 0 },
{ "input-depth", required_argument, NULL, 0 },
{ "input-res", required_argument, NULL, 0 },
{ "colormatrix", required_argument, NULL, 0 },
{ "chromaloc", required_argument, NULL, 0 },
{ "crop-rect", required_argument, NULL, 0 },
+ { "master-display", required_argument, NULL, 0 },
+ { "max-cll", required_argument, NULL, 0 },
{ "no-dither", no_argument, NULL, 0 },
{ "dither", no_argument, NULL, 0 },
{ "no-repeat-headers", no_argument, NULL, 0 },
{ "strict-cbr", no_argument, NULL, 0 },
{ "temporal-layers", no_argument, NULL, 0 },
{ "no-temporal-layers", no_argument, NULL, 0 },
+ { "qg-size", required_argument, NULL, 0 },
+ { "recon-y4m-exec", required_argument, NULL, 0 },
{ 0, 0, 0, 0 },
{ 0, 0, 0, 0 },
{ 0, 0, 0, 0 },
H0("-V/--version Show version info and exit\n");
H0("\nOutput Options:\n");
H0("-o/--output <filename> Bitstream output file name\n");
+ H0("-D/--output-depth 8|10 Output bit depth (also internal bit depth). Default %d\n", param->internalBitDepth);
H0(" --log-level <string> Logging level: none error warning info debug full. Default %s\n", x265::logLevelNames[param->logLevel + 1]);
H0(" --no-progress Disable CLI progress reports\n");
H0(" --[no-]cu-stats Enable logging stats about distribution of cu across all modes. Default %s\n",OPT(param->bLogCuStats));
H0(" --[no-]ssim Enable reporting SSIM metric scores. Default %s\n", OPT(param->bEnableSsim));
H0(" --[no-]psnr Enable reporting PSNR metric scores. Default %s\n", OPT(param->bEnablePsnr));
H0("\nProfile, Level, Tier:\n");
- H0(" --profile <string> Enforce an encode profile: main, main10, mainstillpicture\n");
+ H0("-P/--profile <string> Enforce an encode profile: main, main10, mainstillpicture\n");
H0(" --level-idc <integer|float> Force a minimum required decoder level (as '5.0' or '50')\n");
H0(" --[no-]high-tier If a decoder level is specified, this modifier selects High tier of that level\n");
+ H0(" --[no-]allow-non-conformance Allow the encoder to generate profile NONE bitstreams. Default %s\n", OPT(param->bAllowNonConformance));
H0("\nThreading, performance:\n");
H0(" --pools <integer,...> Comma separated thread count per thread pool (pool per NUMA node)\n");
H0(" '-' implies no threads on node, '+' implies one thread per core on node\n");
H0(" --analysis-file <filename> Specify file name used for either dumping or reading analysis data.\n");
H0(" --aq-mode <integer> Mode for Adaptive Quantization - 0:none 1:uniform AQ 2:auto variance. Default %d\n", param->rc.aqMode);
H0(" --aq-strength <float> Reduces blocking and blurring in flat and textured areas (0 to 3.0). Default %.2f\n", param->rc.aqStrength);
+ H0(" --qg-size <int> Specifies the size of the quantization group (64, 32, 16). Default %d\n", param->rc.qgSize);
H0(" --[no-]cutree Enable cutree for Adaptive Quantization. Default %s\n", OPT(param->rc.cuTree));
H1(" --ipratio <float> QP factor between I and P. Default %.2f\n", param->rc.ipFactor);
H1(" --pbratio <float> QP factor between P and B. Default %.2f\n", param->rc.pbFactor);
H1(" --qcomp <float> Weight given to predicted complexity. Default %.2f\n", param->rc.qCompress);
- H1(" --cbqpoffs <integer> Chroma Cb QP Offset. Default %d\n", param->cbQpOffset);
- H1(" --crqpoffs <integer> Chroma Cr QP Offset. Default %d\n", param->crQpOffset);
+ H1(" --qpstep <integer> The maximum single adjustment in QP allowed to rate control. Default %d\n", param->rc.qpStep);
+ H1(" --cbqpoffs <integer> Chroma Cb QP Offset [-12..12]. Default %d\n", param->cbQpOffset);
+ H1(" --crqpoffs <integer> Chroma Cr QP Offset [-12..12]. Default %d\n", param->crQpOffset);
H1(" --scaling-list <string> Specify a file containing HM style quant scaling lists or 'default' or 'off'. Default: off\n");
H1(" --lambda-file <string> Specify a file containing replacement values for the lambda tables\n");
H1(" MAX_MAX_QP+1 floats for lambda table, then again for lambda2 table\n");
H1(" --colormatrix <string> Specify color matrix setting from undef, bt709, fcc, bt470bg, smpte170m,\n");
H1(" smpte240m, GBR, YCgCo, bt2020nc, bt2020c. Default undef\n");
H1(" --chromaloc <integer> Specify chroma sample location (0 to 5). Default of %d\n", param->vui.chromaSampleLocTypeTopField);
+ H0(" --master-display <string> SMPTE ST 2086 master display color volume info SEI (HDR)\n");
+ H0(" format: G(x,y)B(x,y)R(x,y)WP(x,y)L(max,min)\n");
+ H0(" --max-cll <string> Emit content light level info SEI as \"cll,fall\" (HDR)\n");
H0("\nBitstream options:\n");
H0(" --[no-]repeat-headers Emit SPS and PPS headers at each keyframe. Default %s\n", OPT(param->bRepeatHeaders));
H0(" --[no-]info Emit SEI identifying encoder and parameters. Default %s\n", OPT(param->bEmitInfoSEI));
H1("\nReconstructed video options (debugging):\n");
H1("-r/--recon <filename> Reconstructed raw image YUV or Y4M output file name\n");
H1(" --recon-depth <integer> Bit-depth of reconstructed raw image file. Defaults to input bit depth, or 8 if Y4M\n");
+ H1(" --recon-y4m-exec <string> pipe reconstructed frames to Y4M viewer, ex:\"ffplay -i pipe:0 -autoexit\"\n");
H1("\nExecutable return codes:\n");
H1(" 0 - encode successful\n");
H1(" 1 - unable to parse command line\n");
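The new --max-cll option takes a "cll,fall" pair. In x265.h this pair is stored in contentLightLevelInfo using the "%hu,%hu" format: first the maximum content light level, then the maximum picture-average light level, with 0 meaning not indicated. The following is a matching sketch that uses the same std::sscanf approach. `ContentLightLevel` and `parseMaxCll` are hypothetical names, not part of the x265 API.

```cpp
#include <cassert>
#include <cstdio>

// The two unsigned 16-bit values documented for contentLightLevelInfo.
struct ContentLightLevel
{
    unsigned short maxCLL;  // max content light level (0 = not indicated)
    unsigned short maxFALL; // max picture-average light level (0 = not indicated)
};

// Returns true only when both comma-separated values were parsed.
static bool parseMaxCll(const char* s, ContentLightLevel& cll)
{
    return std::sscanf(s, "%hu,%hu", &cll.maxCLL, &cll.maxFALL) == 2;
}
```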