home:TheBlackCat
libbml
Changes of Revision 3
baselibs.conf
Changed
@@ -1,1428 +1 @@ -ATLAS 3.9.75 released 05/23/12, changes from 3.9.74: - * Switched more archs to using gcc4.7.0: - + Some archs use gcc4.7.0 using 4.6.2 archdefs fine: - - AMD64K10h64SSE3, Core264SSE3, Corei264AVX, Corei264SSE3, ARMv7, - AMDDOZER64AVXFMA4, Core232SSE3, PPCG532AltiVec - + Some archs need new archdefs for gcc 4.7.0: - - Corei164SSE3, Corei132SSE3, PPCG564AltiVec - * Added a configure option to detect macports gcc as gcc - * Some string length extensions for better flag handling -ATLAS 3.9.75 released 05/17/12, changes from 3.9.74: - * add --force-tids= flag to configure to allow manual override of thread - affinity IDs (so you can ignore virtual processors) - * Switched POWER7 configure to prefer gcc 4.7.0 (4.6.2 fails full tester) - * New POWER7 archdefs for gcc 4.7.0 (pass full tester) -ATLAS 3.9.74 released 05/05/12, changes from 3.9.73: - * Improved Corei264SSE3 defaults for OS X sandy bridge machines - * Improved POWER764VSX defaults - * git snafu means some of fixes shown in 73 may only show up in 74 -ATLAS 3.9.73 released 04/03/12, changes from 3.9.72: - * Fixed bug where non-x86 archs couldn't build threaded libs - * Made it so ISA extension flags (eg., -msse) are added gfortran as well - * Added archdefs for HAMMER32SSE3 - * Added archdefs for Corei264SSE3 for use on OS X sandy bridge machines - * Fixed bug in emit_mm.c where GetUserCase did not initialize MCC/MMFLAGS - * Updated power7 gcc flags to work with gcc 4.6.2 - * New archdefs for POWER764VSX -ATLAS 3.9.72 released 03/30/12, changes from 3.9.71: - * Added missing [s,c] files in Dozer64 archdefs - * Provided new fpu probe (?MULADD files) that works better with modern gcc - * Added new archdefs for P4E64SSE3, HAMMER64SSE3 - * Made it so -msse/avx/etc autoadded to gcc default flags - * Fixed it so archdef install doesn't rerun gmmsearch unnecessarily -ATLAS 3.9.71 released 03/24/12, changes from 3.9.70: - * Added code to enforce in-order writes or not use PCA for weakly ordered - 
memory systems like IA64, PPC, POWER & ARM. - - These insts don't work on PPC/ARM, so turned off PCA on all these archs - * Added some support for AMD Bulldozer (K15h family): - - configure support recognizes FX chips - - Added probe for FMA4 ISA extension - - New FMA4-enabled kernels (all 4 precisions) - - Archdefs for AMDDozer - * Stopped L2timers from exceeding Cachelen in no-align upper limit - * Made it so BLAS testers are compiled w/o optimization - * Changed L2/3 BLAS testers to call ATLAS's lamch to compute EPS, since newer - gfortran will yield 80-bit eps in unsafe old loop compution using x87 -ATLAS 3.9.70 released 03/16/12, changes from 3.9.69: - * Fixed bugs that caused sporadic seg faults when tuning L2BLAS kernels - where ALIGNX2A was set - * Changed ATL_tge[qr,ql]2 to use LARFG rather than unstable LARFP - * config fixed to accept -Ss pmake XXX flag again - * Added ISA extensions to xprint_enums - * Added -# arg to slvtst - * Added archdefs for: - - Corei232AVX - - Corei132SSE3 - - Core232SSE3 - - PPCG532AltiVec -ATLAS 3.9.69 released 03/09/12, changes from 3.9.68: - * Improved ranking of possible gccs during configure - * Fixed buffer overrun in config.c that caused seg fault on Windows - * Added DRVOPTS to defs in lapack_test.tar.gz Makefiles - * Fixed config.c to define F77NOOPTS to include F77FLAGS - * Fixed buffer overrun in config.c that caused seg fault on Windows - * Fixed stack overwrites in: - - ATL_cmm4x4x128_av.c - - ATL_dmm4x4x2pf_av.c - - ATL_smm4x4x128_av.c - * Added archdefs for PPCG432AltiVec and USIII64 - * got rid of unused "OBJdir/include/atlas_?[t]xover_ge[Q-type,lu]r.h" files -ATLAS 3.9.68 released 02/23/12, changes from 3.9.67: - * Fixed ATL_smm4x4x128_av.c so it can use gcc's non-standard VRSAVE inst - * Did crappy adaptation of ATL_smm4x4x128_av.c to complex ATL_cmm4x4x128_av.c - * Fixed possible seg fault in atlconf_misc.c's CompIsIBMXL - * Updated flags & architectural defaults for PPCG564AltiVec -ATLAS 3.9.67 released 
02/14/12, changes from 3.9.66: - * Fixed error in Core264SSE3's gemvN archdef - * Put in call to serial code for small threaded syr2k to avoid subtractive - cancellation caused by lapack tester sdrvst (DST of dsep.out) -ATLAS 3.9.66 released 02/08/12, changes from 3.9.65: - * Changed a lot of L3BLAS/auxil integer computations to size_t in order - to avoid overflow on very large matrices (N=47,000) -ATLAS 3.9.65 released 02/07/12, changes from 3.9.64: - * Improved single-precision ARM GEMM kernel. - * Improved s/c ARM archdefaults - * Fixed L2 threaded bugs by casting ldamul to size_t -ATLAS 3.9.64 released 01/31/12, changes from 3.9.63: - * Deleted MATGEN/*.o from lapack tester tarfile - * Commented out nonsensical Q-type LWORK testing in error exit tests - * Attempted to guard all x86 ISA extensions with appropriate ifdefs - * Added new generic x86 architectures: - - x86x87, x86SSE1, x86SSE2, x86SSE3 - * Added (crappy) architectural defaults for generic archs: - - x86x8732, x86SSE132SSE1, x86SSE232SSE2 - * Fixed it so flushCacheByAddr depends on SSE2, not SSE1 - * Added section on building generic libs in atlas_install - * Added -M handling to gmmsearch.c's GetFlags - * Fixed error when CacheEdge is 0 in threaded Level 2 BLAS and recursive - Q-type factorizations - * Changed makefile so rec Q-type factorizations depend on atlas_qrrmeth.h - * added lapack_test_pt_pt to test atlas threaded lapack + threaded blas - * Added new archdefs for PIII32SSE1/PPRO2 for debian guys - -> Gcc 4.6.1 x87 performance is terrible, and gfortran has compiler bug - that causes all blas testers to fail unless -O1 or lower opt thrown - * Add new configure flag -Si ieee 0, which allows non-IEEE crap like - ARM NEON to be used when set to 0 - * Added ARM NEON kernels for s/cGEMM, sGEMVT, sGER2K -ATLAS 3.9.63 released 01/11/12, changes from 3.9.62: - * Fixed unitialized variable in ProbeOS - * Modified all QR-related routines to call LARFG, eliminated LARFP from lib - to follow reversal 
done in mainline LAPACK - * Modified single precision LAPY[2,3] to call sqrtf rather than sqrt, so - that answers are directly comparable to F77 implementations -ATLAS 3.9.62 released 01/03/12, changes from 3.9.61: - * Fixed error in atlas_mvtesttime.h where No-trans applied align args - to wrong vector - * Fixed alignment restriction on ATL_cgemvN_8x4_sse3.c so alignY=16 - -> alignY really applies to X for axpy-based implemntations - -> Updated bunch of archdefs to fix this error - * Updated ATLAS's LAPACK tester to that of lapack 3.4 to get around LAPACK's - API changes - -> Won't work with older LAPACK, but I can't do anything about LAPACK - changing the API -ATLAS 3.9.61 released 01/01/12, changes from 3.9.60: - * Fixed inadequate workspace bug in GELS - * Fixed src/auxil/ATL_geset to properly handle non-square matrices -ATLAS 3.9.60 released 12/31/11, changes from 3.9.59: - * Fixed failure to check for M or N < 1 in genned ATL_[ger,ger2]k_Mlt16 - * Fixed ATL_getf2 to return first non-zero pivot instead of last - * Fixed error in QR,QL where non-square matrices get wrong value for M - * Fixed error in atlas_qrmeth.h, where method was not assigned (serial) - * Fixed several errors in malloc/handling of ge[lq,rq]f's ws_CPRaw -ATLAS 3.9.59 released 12/21/11, changes from 3.9.58: - * Fixed FLAGS= to CFLAGS= in all L2 index files - * Removed a *bunch* of buffer overrun poss in config & archinfo files - --> still need to adapt emit_buildinfo -ATLAS 3.9.58 released 12/14/11, changes from 3.9.57: - * Fixed errors in TRSM for non-SSE/AVX kernels - * Added BETA=0 case to AVX cgemvT kernel (caused AVX to fail sanity tests) -ATLAS 3.9.57 released 12/09/11, changes from 3.9.56: - * Fixed error involving declaration of ln (line 711) config.c - * Fixed divide-by-zero error for small threaded SYRK - * New archdefs for AMD64K10h64SSE3 - * Got rid of obsolete (and now bad) PowerPC archdefs - * Changed archinfo so it recognizes model 46 or Xeon X7560 as Corei1 - * Fixed 
dependence in Make.aux to run IRun_nthr rather than IRun_aff -ATLAS 3.9.56 released 12/07/11, changes from 3.9.55: - * Added kludge so that ATLAS can autobuild new lapack 3.4.0 - * Added HOME/local to searched paths - * Found & fixed another possible buffer overrun in FindGoodGcc/Gfortran - * Added kludge so that ATLAS can autobuild new lapack 3.4.0 - * Added HOME/local to searched paths - * Added check for NULL return of GetGE in bin/ testers - * Added AVX cgemvT kernel - * New Corei264AVX arch defs for gcc 4.6.2 -ATLAS 3.9.55 released 12/02/11, changes from 3.9.54: - * Rewrite of config to avoid buffer overruns caused by long flags/paths -ATLAS 3.9.54 released 10/24/11, changes from 3.9.53: - * Improvements to config's compiler handling: - - config can now search various gcc's for best version - - config now searches for full path for gcc and gfortran - - config now searches for libgfortran.[so,dll,dylib] for dynamic build - - config now searches and finds path for goodgcc - * config --shared now works on OS X assuming gnu gcc and gfortran - * atlas_contrib's L2 tuning section partially updated - * Improved double complex GER2 kernels for AVX and SSE - - Updated only 64-bit AVX archdefs -ATLAS 3.9.53 released 10/12/11, changes from 3.9.52: - * Removed ATLAS/pthreads from library - * Added AVX kernels for ZAXPY and ICAMAX -ATLAS 3.9.52 released 09/29/11, changes from 3.9.51: - * Improved complex TRSM performance, particularly for small L/U, large RHS - * Fixed bug in complex ATLAS/tune/blas/level3/invtrsm.c - * Accepted series of patches & arch defs to add ATLAS support for IBM Z9, - Z10, and z196 mainframe computers. - Patches submitted by Christian Borntraeger of IBM. 
-ATLAS 3.9.51 released 09/13/11, changes from 3.9.50: - * Improved AVX kernels 10% faster for all precisions - * Improved reporting in results/, updated docs in atlas_install - * Fixed bug in mmsearch when user case forces a change in NB -ATLAS 3.9.50 released 09/02/11, changes from 3.9.49: - * Fixed typo causing seg fault in l2 kernel searches - * Fixed a bunch of warnings coming from clang -ATLAS 3.9.49 released 09/01/11, changes from 3.9.48: - * Fixed unitialized var in all l2 kernel searches - * Fixed out-of-mem bugs in GERC and GER2C - * Fixed a bunch of warnings coming from clang -ATLAS 3.9.48 released 08/31/11, changes from 3.9.47: - * Architectural defaults for Atom64SSE3 - * Improved Real TRSM performance, particularly for small triangle, large RHS - - Improves Invers, Cholesky, LU (in perf order), part. for SREAL on x8664 - * Fixed bug in gerk assembly reported by Blooox - * Added Xeon E5645 detection to configure -ATLAS 3.9.47 released 08/05/11, changes from 3.9.46: - * Improve parallel performance for LU & QR. - * Improved performance for serial LQ and RQ. - * Architectural defaults for ARMv732 - * Made it so config recognizes Atom, and suggests good compiler flags - * Added ability to chart all QR and Cholesky variants in results/ - * Added a lot of charting options, including charting more than 4 lines - * Added ability to use -# <nsamp> in l3blastst -ATLAS 3.9.46 released 07/09/11, changes from 3.9.45: - * Bug fixes in qrtst.c - * QR-related routines cleaned up - * Better PCA crossover rules improve parallel QR performance - * Fixed error in Core232SSE dMVTK.sum (missing \ from CFLAGS line) - * Fixed bad return values in ATL_getf2 -ATLAS 3.9.45 released 07/06/11, changes from 3.9.44: - * New chart creating targets (see ATLAS/doc/atlas_install.pdf) - * Fix bug in all L2 kernel searches where lda was set < M sometimes - in MU search. 
- * Found workaround to ATL_dgemvT_2x8_sse3.c Windows compiler bug (-Os) - * Removed goparallel_prank (unused) to avoid problems wt dynamic linking - * Architecural defaults for: - + P4E32SSE3 (gcc 4.2.1) - + AMD64K10h32SSE3 (gcc 4.4.5) - + Corei132SSE3 (gcc 4.4.5) - + Corei232AVX (gcc 4.4.5) -ATLAS 3.9.44 released 06/30/11, changes from 3.9.43: - * Fixed errors in ATL_tgemm_bigMN_Kp.c & ATL_tgemm_rkK.c where cleanup - was called with K > KB (usually causing seg faults). - * Several fixes for 32-bit windows. -ATLAS 3.9.43 released 06/29/11, changes from 3.9.42: - * Fixed errors in threaded GEMV and GER - * Bunch of fixes to make it possible to build 64-bit lib on Win64 - -> can build, but executables don't work, probably lib issue - * Changed windows Mhz probe to look in cygwin-provided cpuinfo rather - than use QueryPerformanceFrequency, which is not always set to clock rate - * Fixed lutst to print "fail" on failure. - * Updated full tester to call QR as well - * Updated sanity_checks to call QR - * Increased size of sanity checks for threaded code - * Added GEMM NaN tester to EXtest - * Improved charting functions in results/ -ATLAS 3.9.42 released 06/22/11, changes from 3.9.41: - * Added ability to autobuild performance charts in results/ - * Added EXtest/ and all-aligment testing for GER and GEMV - * Fixed bug in BETA=0 case of ATL_cgemvN_8x4_sse3.c - * Added results/ directory that can autobuild performance charts - * numerous fixes to qrtest and some fixes for the QR fact routines - * Added missing $(F77SYSLIB) in Make.lib's dylib and ptdylib targets - * Added chapter in atlas_install explaining how to use mmflagsearch - * Fixed uninitialized memory read caused by copying data I don't reference - in parallel GEMM. 
- * Fixed unitialized memory read in gemvT - * Changed extendedmodel=2, model=5 from Corei2 to Corei1 in archinfo_x86 -ATLAS 3.9.41 released 05/14/11, changes from 3.9.40: - * Bug fix in EmitMakefile for L2 that should fix some dynamic lib errors - * Fixed yet another C/Z GEMM JITcp bug where C was read when BETA=0 - * Fixed BETA=0, KB=1 bug in: ATL_mm4x4x2_1_prefCU.c & ATL_mm4x4x2US.c - * Configure support, kernels, and architectural defaults for ARMv7. - - Tom Wallace supplied a comprehensive patch for configure support - * Added single & double precision ARM kernels (single not very good) -ATLAS 3.9.40 released 04/21/11, changes from 3.9.39: - * Added beta versions of simple threaded GEMV & GER - * Added threaded L2 testing to tester - * Fixed bug in axpby where it called SCAL with alpha=0, which fixes GEMM - error for BETA=0 case. - * Fixed several simple buffer overruns in full tester - * Added dynamically scheduled tgemm that is used whenever all dimensions - are large. - * Added support for complex types for both dynamic cases (rank-K, large) - * Fixed several errors in GEMM that occur when K dim is cut -ATLAS 3.9.39 released 03/18/11, changes from 3.9.38: - * Basic AVX GEMM kernels and new Corei264AVX arch defs. - * Now use dynamically scheduled parallel rank-K updates for real types - * Complete rewrite of all threaded routines to use goparallel, and thus - dynamic spawn. - * OpenMP now uses same codebase as windows & pthreads forall threading. - * Thread tune now creates atlas_tsumm.h for summation of threaded tuning - * Added ATL_thread_yield function - * If affinity is not set, dynamic funcs now yield thread execution when - waiting for their peers to signal completion of a stage - -> Otherwise, active poller prevents thread running on same core from exec -ATLAS 3.9.38 released 03/03/11, changes from 3.9.37: - * Translated ptflushcache to use new goparallel framework - * Fixed bug in Make.ttune causing systems w/o affinity (eg. 
OSX) to fail - to build the AtomicCounter symbols - * Fixed error in ATL_gemaxnrm.c - * Added probe to see if assembly mutexes supply speedup, and use system - mutex when they don't - - Now time with P local counters instead of one global counter. - * Renamed Corei7 to Corei1 (1st gen Corei) - - Corei5/i7 all same to ATLAS if 1st gen - * Added configure support for Corei2 (2nd generation Corei, eg. sandy bridge) - * Added probes for AVX and AVXMAC (AVX including multiply/accumulate inst) - * Added architectural defaults for: - - Corei2AVX - - US[IV,III][32,64] - * Fixed gemmtst to handle parallel timing correctly - * Added x_mmtst_[aff,noaff] targets so we can see difference in perf - * Several dynamically scheduled tGEMMs now in library, but not called -ATLAS 3.9.37 released 02/15/11, changes from 3.9.36: - * Fixed bug in all L2 kernel timers where timing loop not even entered, - resulting in bogus time being returned - * Fixed bug in gmmsearch.c; lat was set to bad value after K-unroll search - * Added xtune_spawn_fp to study spawning strategies under load -ATLAS 3.9.36 released 02/13/11, changes from 3.9.35: - * ATLAS now only uses affinity when it provides speedup (empirically tuned) - * Fixed bug in all ATL_PAFF_SELF implementations of affinity - * Fixed bug in ATL_PAFF_SCHED affinity implementation - * Fixed several bugs for when your affinity IDs are not contiguous - * Fixed bug in gmmsearch.c, where lat was set to bad value in K-unroll search - * Fixed l2install targs to ignore problems in deleting old values -ATLAS 3.9.35 released 02/08/11, changes from 3.9.34: - * Fixed bug in Upper complex case of ATLAS/src/auxil/ATL_trscal.c - * Fixed numerous bugs relating to transpose & row-major interface in - GBMV and GEMV. - * Fixed bugs involving aligning Y and applying BETA in ATL_gemvCN. - * Fixed bugs in SYR2 and GER, where N-cleanup was calling the Mlt func, - rather than the Nlt/axpy func, and using an index of 'N' rather than 'n'. 
-ATLAS 3.9.34 released 02/06/11, changes from 3.9.33: - * First release from github basefiles - * Affinity now auto-probed for instead of assumed (tested only Linux). - -> If affinity works on P=0, probes for all legal affinity IDs - * Implemented serial & dynamic launch; serial is best most of time - - On PPC, dynamic & log2 are faster with P=32 - * Several threading-related probes added to ATLAS/tune/threads - * Added -m64/-m32 to flags on POWER7 and POWER6 - * Addition of AtomicCount routines for later threading use. - -> just x86 & mutex presently, can do for PPC,SPARC,MIPS as well. -ATLAS 3.9.33 released 01/21/11, changes from 3.9.32: - * Large number of bug patches reported by Tom Wallace applied - * Important bug patches submitted by Mike Kistler for L2 tuning: - - Allowing prefetch tuning flags when C flags not specified - - Allow timings to work in the face of low-resolution timers - * Cast threading BLAS indexing to size_t to avoid overflow - * Rewrite of lapack libs, so we have threaded/serial versions. - - Now liblapack.a and libptlapack.a! - * Addition of PCA codes for LU and QR, but not yet used by default - * Added section on ATLAS coding style to ATLAS/doc/atlas_devel.pdf - * Rewrite of dynamic lib build, so they always build one monolithic lib: - - libtatlas.[so,dylib,dll] : threaded lapack, threaded blas - - libsatlas.[so,dylib,dll] : serial lapack, serial blas - * Updated P4ESSE3 and PPCG564AltiVec arch defs -ATLAS 3.9.32 released 11/02/10, changes from 3.9.31: - * Fixed error in ATL_cgemvN_8x4_sse3.c, causing seg fault if BETA=0 - - Updated arch defs for Core264SSE & AMD64K10h64SSE3 -ATLAS 3.9.31 released 10/29/10, changes from 3.9.30: - * Made it so L2 searches print out error output on fatal kernel tests or - failed timings - * Made it so that unrestricted L2 timings force all operands to be aligned - to sizeof(TYPE) (and no greater), in order to get real worst-case - performance for vector codes. 
- * Fixed L2 timers so they allow complex arrays to be aligned to underlying - type size, rather than full complex size - * New L2 archdefs using improved timers for: - + Core264SSE3, AMD64K10h64SSE3, Corei764SSE3 -ATLAS 3.9.30 released 10/28/10, changes from 3.9.29: - * Made it so prefetchw is not tried by l2searches if 3DNow! is not detected - * Fixed error in AMD64K10h64SSE3 arch defs causing incomplete timings - * Had Level 2 BLAS use serial cacheedge rather than parallel -ATLAS 3.9.29 released 10/27/10, changes from 3.9.28: - * Removed workspace error check from all QR variant interface routs - * Fixed bug where GE[LQ,RQ]f in case where N>128 && M==N returned TAU - with the diagonal elements conjugated from what they should have been - * Fixed error in neg files used in "make ArchNew" - * Updated architectural defaults for Corei764SSE3 - * Fixed error in AMD64K10h64SSE3 arch defs causing negative "make time" -ATLAS 3.9.28 released 10/25/10, changes from 3.9.27: - * Several changes so that dynamic libs will build w/o missing symbols: - + Changed SPR, HPR so they just call the reference packed blas. - + Removed prototypes for MV kernels that no longer exist from atlas_lvl2.h - + Removed build of src/blas/level2/kernel, since no longer needed - * Fixed all L2 kernel searches so TimeMyKernel returns mflop rate - * Fixed bug in ATL_sgemvN_8x4_sse.c where $Y$ was read when BETA=0 - * Changed it so generated Makefiles for mvN, mvT, r1, r2 kerenls use GOODGCC - if 'gcc' is specified (so they inherit flags like -pg, -m64, etc). 
- * Added VSX GER from Mike Kistler to kernels & Power7 arch defs -ATLAS 3.9.27 released 10/20/10, changes from 3.9.26: - * Fixed several bugs to allow L2 BLAS to install using a low-res timer - * Fixed bug in x8664 kernel description causing seg faults for CGER - * Fixed bug in r1hgen and r2hgen where first kernel's minM was ignored - * Fixed bug in ATL_her/her2 where j-loop max was N rather than NN - * Fixed bug so that h2gen.c generates ATL_GENGERK as a function that - can handle all operations, not just least restricted kernel. - * Fixed bug in ATL_syr & syr2 where nr computed incorrectly - * Fixed cblas_[nrm2,asum,iamax,scal] so that they return with no - operations if incX < 1 (this matches f77 behavior) - * Fixed bug with extra spaces in configure's OSX libtool finding script -ATLAS 3.9.26 released 10/18/10, changes from 3.9.25: - * Much improved GEMV & GER performance for x86-64: - + Addition of SSE/x86-64 GER/GEMV generators - + Complete rewrite of GEMV tuning infrastructure - + Change to GER kernel API to minimize parameter passing - + Arch defaults for Core264SSE3 & AMD64K10h64SSE3 updated - * ANSI C code generators for MVT, MVN, R1 and R2. - + should improve non-x86/x86-64 performance - * Started rewrite of all L2BLAS: - + GEMVT, GEMVN, GER, SYR, SYR2, HER, HER2 built from optimized kernels - - TRMV, TRSV, SYMV, HEMV just call reference implementation - * Bug fixes: - + Fixed kernel testers & timers to correctly handle alignments, - particularly ALIGNX2A. - * Basic support for POWER7 - + VSX detection - + VSX GEMM, GEMVN & GEMVT kernels provided by Mike Kistler of IBM - + Arch defs - * Fixed it so GEMM kernel files use $(GOODGCC) instead of flat gcc - if 'gcc' is specified (so they inherit flags like -pg, -m64, etc). 
-ATLAS 3.9.25 released 06/04/10, changes from 3.9.24: - * Fixed bug causing x_tfindCE/txover to use CPU rather than WALL time - * Got rid of test -e in Makefiles, since Solaris /bin/sh disallows - * Fixed lack of return statement in lanbsrch's findNBByN(). - * Hid bug in ATL_thrdecompMM_rMNK exposed by p=128 by not calling when - K is small - * Fixed bug in r[1,2]ksearch that prevented arch defs from working - * Fixed typos when setting self affinity in threading (SunOS) - * Fixed R1SUMM/R1K.sum error in ArchNew target for creating arch defs - * Added R2K.sum files to archdef Makefile, and to Core264 & k10h64 arch defs. - * Added configure support for UltraSPARC T2 - + Turned off affinity for T2, where it decreases parallel performance - * Added architectural defaults for UST264 & UST232 - * Fixed errors in GER2 handling - * Fixed repeated GER/GER2 symbols that prevented shared lib build -ATLAS 3.9.24 released 04/21/10, changes from 3.9.23: - * Should see a roughly doubling in performance of L2's SYR2/HER2 - * Addition of new BLAS2.5-like kernel, GER2 (rank-2 update) - - A = alpha*x*y + beta*w*z + A - * Native ATLAS support for xGELS and all subsidiary routines, including - C and Fortran interfaces for GELS - - Internal routs not yet exposed in C/F77 iface include: - + ORM[[QL,QR,LQ,RQ] -> UNM* called ORM for complex - + GE[QL,QR,LQ,RQ]2 (unblocked QR) - + GE[QL,QR,LQ,RQ]R (recursive QR) - + LADIV, LAPY2, LAPY3 - + LARFB, LARFT (F77 ifaces, but no C ifaces) - + LARF, LARFG, LARFP - + LASCL (not supported for banded matrices) - - Of these, should definitely expose UNM/ORM at iface level - * Addition of [D,S]LAMCH for both C & F77 interfaces - * Fixed slvtst (LU & QR) to use norm of original A, not factored matrix - in computing solve residual - * Chad fixed a bug in the SSE generator in type casting for stores - * Changed it so unknown LAPACK routs are given ATLAS's NB for NB, - rather than 1 - * Fixed bug in r1hgen.c where Level 1 & 2 blocking were hugely inflated 
- (leading to no effective blocking) - * Updated archinfo_linux to recognize "PPC970MP" as a G5 -ATLAS 3.9.23 released 02/07/10, changes from 3.9.22: - * Fixed dependency error in ATLAS/makes/Make.mmtune - * Improved mmflagsearch, so we now have O(N) greedy search as default - -> if you pass -f gcc, will gen most opt-related gcc flags in gccflags.txt - * Improved flags used on PowerPC G4 & G5 - * Updated some architectural defaults: - - Corei764SSE3, PPCG564AltiVec, PPCG4AltiVec, MIPSICE964 -ATLAS 3.9.22 released 02/05/10, changes from 3.9.21 - * Fixed long-standing bug in cleanup code generation -- this bug has been - in package since we've generated cleanup, and it causes malformed ifs - that select cleanup code; most commonly it creates uncompilable code, - but it could also result in using a suboptimal cleanup kernel. - * Fixed another long-standing bug in cleanup code generation, this - one involving not building enough fixed=1 clean cases if there are - higher imult cleanup cases in the Q. This resulted in errors in - cleanup answers. - * Complete rewrite of search for finding best generated kernel to use - new test/time infrastructure. See ATLAS/tune/blas/gemm/gmmsearch for - new search. Cleanup and no-copy still uses old search, which is renamed - ATLAS/tune/blas/gemm/mmcuncpsearch.c. New search driver is mmsearch.c - * Chad fixed several bugs in the SSE generator relating to type casting - * Fixed genparse's DupString to handle NULL pointers - * Fixed erroneous include of atlas_misc.h in clapack.h - * Added a compiler flag search to ease job of finding good flags. 
- - ATLAS/tune/blas/gemm/mmflagsearch.c - * Arch def changes: - - Updated G4 defs -- reduced perf due to gcc PPC performance bug - - Corei7464SSE3: negated ?MMRES.sum mflop values - - AMD64K10h64SSE3 : updated to new style - - Core264SSE3 : updated to new style - * Some PowerPC-specific fixes: - - Fixed it so configure can autodetect clock speed on G4/Linux - - Fixed it so ATLAS always assumes gnu gcc altivec handling on PowerPC - - Renamed vector registers to numbers just like GPRs (fixes Linux/PPC - assembly, and related altivec probe) -ATLAS 3.9.21 released 01/11/10, changes from 3.9.20 - * Fixed error in threaded SYMM, where recursion had bad pointer - * Created ability to tune threaded/serial crossover points, see - ATLAS/tune/blas/gemm/txover.c - * Improved CacheEdge detection - * Fixed bug in configure for --shared on archs w/o f77 compiler - * Updated lanbtst to work wt new QR naming scheme, and to compile - correctly for lanbtime (was not using lapack's ILAENV in this case) -ATLAS 3.9.20 released 12/21/09, changes from 3.9.19 - * Fixed bug in call to memcpy by casting all MulBySize to size_t - * Fixed several ilaenv-related errors, including QR always using serial parms - * Made it so ORMQR and UNMQR variants use QR's tuned NB - * Fixed error in complex gemoveT & gemoveC (src/auxil) - * Made gemoveT & C TLB-aware - * Added src/auxil/ATL_sqtrans to do TLB-aware in-place square transpose - * If M==N, then RQ & LQ (row-major) do in-place transpose and call - QL or QR (column-major). This gives ~10% performance improvement. 
- * Added F77 interface for xLARFT and xLARFB -ATLAS 3.9.19 released 12/08/09, changes from 3.9.18 - * Got rid of files in C2F now being provided natively by ATLAS: - - larft, geqrf, geqlf, gerqf, gelqf, geqrf, - * Fixed duplication of unmqr_wrk symbols - * Removed use of SAFMIN global variable in larfb/larfg -ATLAS 3.9.18 released 12/05/09, changes from 3.9.17 - * Found & fixed error in threaded GEMM - * Fixed bug where lanbtst_pt didn't set NB - * Modified mmksearch_sse.c to try gcc & sse flags if native compiler - can't handle the generated files. - * Rewrote LAPACK/QR NB tuning - - now uses ATLAS/tune/lapack/lanbsrch rather than bin/lanbtst (faster) - - Now done by default - * Numerous errors fixed involving architecture default timing (all levels) - * Modified atlas_install to keep track of times for every part of install, - so we can see where time is spent - * Architectural default related changes: - - Fixed ArchNew target in building arch defs to negate .sum files - - Core264SSE & AMD64K10h64SSE needed negative values in .sum files - - Updated Core264SSE, AMD64K10h64SSE, HAMMER64SSE3 to get new threaded - lapack, and full .sum support -ATLAS 3.9.17 released 11/15/09, changes from 3.9.16 - * Chad's SSE GEMM generator now works for CGEMM - - Provides faster (CGEMM) arch defs for Core264SSE3 - * Addition of householder factorizations (mostly written by Siju Samuel): - - F77 & C interface, C supports row/col- major - - GEQRF GEQLF GERQF GELQF - - tester is qrtst.c in ATLAS/bin/ - - Retuned LAPACK's QR NB arch defs for AMD64K10h64SSE3 & Core264SSE3 - * Fixed seg fault in ummsearch caused by mmksearch_sse failure - * Rewrote Write[MM,MV,R1]File to get around gcc bug - * Fixed bugs in ATLAS/src/auxil/[ge,tr]collapse - * Fixed bug in ATLAS/tune/blas/ger/CASES/ATL_zgerk_1x4_sse3.c - * Renamed xatlas_install -> xatlas_build, to get around Windows 7 - "security-through-stupidity" misfeature -ATLAS 3.9.16 released 10/17/09 (bugfix release), changes from 3.9.15 - * 
Fixed bugs in mmksearch_sse.c for machines w/o SSE3 - * Fixed errors in C2F preventing full lapack install - * Fixed error in atlas_install trying to open wrong filename in latune - * Fixed error in mmsearch's FindNoCopyNB where latency computed incorrectly - * Numerous errors related to new architectural default handling - * New architectural defaults for: - - AMD64K10h64SSE3 - - Core264SSE3 - - Corei764SSE3 -ATLAS 3.9.15 released 10/10/09, changes from 3.9.14 - * Addition of Chad Zalkin's SSE GEMM generator to ATLAS - * Support for external searches and use of standard matmul search routs in: - - include/atlas_mmparse.h - - include/atlas_mmtesttime.h - * Numerous search changes to incorporate above in ATLAS matmul install - - Changed matmul install to be much quieter -ATLAS 3.9.14 released 08/19/09 (bugfix release), changes from 3.9.13 - * Fixed complex indexing errors in ATL_ger.c & ATL_zgerk_1x4_sse3.c - * Fixed error in config.c where using LAPACK caused OpenMP to be built - * Made it so C2F LAPACK interface only built if F77 LAPACK is provided - * Basic --shared install now works (tested Linux build only) -ATLAS 3.9.13 released 08/17/09 (bugfix release), changes from 3.9.12 - * Fixed ATL_smm14x1x84_sseCU.c so it won't be used when NB > 84 - - fixed AMD64 arch def not to use it - * Fixed 1-character memory overwrite in atlas_genparse.h's DupString - * Added prototype to r1ktest.c - * 3.9.12 showed version of 3.9.11; this version shows correct 3.9.13 -ATLAS 3.9.12 released 08/06/09, changes from 3.9.11 - * Complete rewrite of GER, SYR/HER and SYR2/HER2: - - New tuning mechinism tunes GER for in-L1, in-L2, and out-of-cache - * Call ATL_<pre>ger_L1 if data known to be in L1 cache - * Call ATL_<pre>ger_L2 if data known to be in L2 cache - - Most architectures now lack GER arch defs - * Provided GER archdefs 64-bit K10h and Core2 - - atlas_devel not yet updated - * Relatively untested standard timing/tester code available for all - tuned kernels (GER fairly well 
tested) - - atlas_[mv,r1,mm]parse.h reads standard input/output files - - atlas_[mv,r1,mm]testtime.h provides tester/timer calls for kernels - * Can compile both lapack 3.2 and 3.1 with --with-netlib-lapack-tarfile - - Removed support for other ways of building lapack - - atlas_install mostly updated - * Bug fixes - - Fixed BETA=0 SCAL NaN-propogation bug (no more call to ATL_set) - - Fixed C/Z GEMM JITcp bug where C was read when BETA=0 - - Fixed threaded LAPACK calling serial ilaenv (QR speedup) -ATLAS 3.9.11 released 04/07/09, changes from 3.9.10 - * Added flags -Si [omp,antthr] 0/1/2 to allow ease of building ATLAS - with alternative threading implementations - * Fixed prototypes in atlas_f77wrap.h so that all thread interfaces - are properly prototyped when they are selected by the above flags - * Fixed missing TRMM prototype in atlas_tlvl3.h that caused STRSM - to fail tests in xsl3blastst_pt -ATLAS 3.9.10 released 03/11/09, changes from 3.9.9 - * Rewrote tgemm's combine routine to work on arbitrary partitionings - combined in arbitrary orders (necessary for non-power-of-2 processors) - - Restricted fix for SYRK (not general, as it isn't needed yet) - * Fixed bug in EnforceNonPwr2LO caused by failure to rename moved - structure in the Cinfp array - * Fixed makefile problem that caused ATLAS to re-archive the L3BLAS for - every tester compile - * On windows, added -lkernel32 to LIBS macro to enable shared lib build -ATLAS 3.9.9 released 02/26/09, changes from 3.9.8 - * Fixed bug in Xtsyrk's ATL_tsyrkdecomp_K, both on when the algorithm - is used, and correctness for when K is not large enough to give all - processors NB of work. 
- * Fixed bug in lanbtst, where single precision (S/C) used double values - rather than single values when determining workspace requirements - * Changed atlas_install to have a final library build phase - - Was not rebuilding lib after post-build tuning - -> Caused lapack and poss other files to be untuned unless user rebuilds - by invoking tester/timer for each subpiece - -> Caused dynamic libs to be built from badly tuned libs - * Added missing lapack arch defs for Corei764 and MIPSICE9 -ATLAS 3.9.8 released 02/23/09, changes from 3.9.7 - * Fixed bug in ATL_Xtgemm where ATL_thrdecompMM failed to return the - number of processors on non-power-of-2 processor systems - * Fixed bug in ATL_tsyrk where I was calling the K-splitting routine - when the required workspace was large, rather than when it was small. - * Fixed analogous problem in ATL_tsyrk as 3.9.7 did for ATL_tgemm; - however, tsyrk bug could not have been exercised by current decomposition. - * Introduced some fixes & workarounds for SiCortex/MIPSICE9: - - Changed default MIPSICE9 compiler back to gcc, since pathcc produces - bad ATL_tsyrk when optimization is above -O1 (confirmed compiler error) - * Added dependence on atlas_ptalias3.h in cblas interface Makefile.
-ATLAS 3.9.7 released 02/20/09, changes from 3.9.6: - * Fixed bug in ATL_tgemm that caused seg faults for some small-M tGEMMs - * Added architectural defaults for K7323DNow (Athlon "classic") -ATLAS 3.9.6 released 02/01/09, Changes from 3.9.5: - * Made it so LAPACK is tuned specifically for threading as well as for serial - - Added threaded lapack arch defs for: - + Core264SSE3, P4E64SSE3, Corei764SSE3 - * Made it so LAPACK NB-tuning is mu/nu aware - * MIPSICE9 (sicortex) improvements: - - added pathcc arch defs - - updated gcc arch defs to better values - --> Still getting errors on this platform - * Some bug fixes: - - Detect model 29 as Core2 - - Rewrote ptFlushAreasByCL to use new thread framework - - Fixed handling of non-power-of-2 number of threads - - Better dependencies for building ilaenv -ATLAS 3.9.5 released 12/11/08, Changes from 3.9.4: - * Complete rewrite of ATLAS threading system: - - Now supports native windows threads in addition to pthreads - - Use of master-last and affinity increases threaded performance, with - an advantage that grows with P (almost no advantage for P=2, but for - instance LU is more than 60% faster asymptotically on a P=8 Core2) - + OS X and FreeBSD don't support processor affinity, and so their - performance is still bad - - Cacheedge specifically tuned for threading (another 5%) - * Changed emit_buildinfo so that it replaces all control characters with - spaces (prevents errors under windows). - * Added dependency info for ATL_ilaenv so that it is recompiled once - lapack tuning is complete - * Fixed error in configure where it issues commands in wrong directory - when the user builds lapack directly from a tarfile - * Fixed typos in config.c where I used 'comp' rather than 'comps'. - * Added mmtime_pt.c, which can allow us to find kernels that do well - in parallel operation. 
- * Various small configure fixes for windows -ATLAS 3.9.4 released 09/06/08, Changes from 3.9.3: - * Improved Windows/cygwin configure with addition of archinfo_win.c - * Added basic support for Windows/interix - - Did not pursue much due to widespread seg fault in gcc, hundreds of - hard-to-get "hot fixes", and ancient gnu tools that can't assemble SSE3 - * Removed special "no-need-to-copy" cases from ATLmm_JIK/IJK.c, since they - occasionally seem to cause large performance drops. - * Changed it so JIK matmul always called for rank-K update, in order to - reduce access costs on C. - * Fixed several errors in ATLAS's ILAENV. - * Fixed several errors in configure - * Fixed error when -Ss lasrc is given as relative rather than absolute path - * Added BETA support for auto-building shared/dynamic libraries when the - user passes --shared to configure (no need to explicitly set compiler - flags [eg., -fPIC] for any of the known compilers): - - Not fully tested, but appears to work for Windows, OS X and Linux - - Now referenced in make install, but present process is crude - - with --nof77, get clapack rather than lapack; eventually probably want - a logical link of lapack -ATLAS 3.9.3 released 08/13/08, Changes from 3.9.2: - * Added much more extensive testing capability: - - make full_test / make scope_full_test - + Added Antoine's testing scripts to ATLAS. Just do a "make full_test" - to run them ("make full_test_nh" to use nohup for remote connection). - - make lapack_test[a,s,f]l_[ab,sb,fb,pt] / make scope_lapack_test_?l_?? - + Runs lapack testers linked against indicated lapack & BLAS - - See INSTALL.txt/"EXTENDED ATLAS TESTING" for further details - * Added several missing symbols from full LAPACK build - * Fixed it so ?lamch are compiled wt no optimization. - * Added ATLAS/src/blas/f77reference for ease of testing. Made it so by - default Make.inc's FBLASlib and BLASlib macros are set to this lib.
- * Fixed errors in arch default creation for LAPACK defaults - * Changed test in LAPACK build Makefile to get around solaris shell - incompatibility - * Added architectural defaults for LAPACK QR tuning for: - - AMD64K10h32SSE3 (first time 32-bit archdefs are given for this arch) - - AMD64K10h64SSE3 - - PPCG564AltiVec - - Core232SSE3 - - HAMMER64SSE3 -ATLAS 3.9.2 released 08/09/08, Changes from 3.9.1: - * Improved Core2 performance, particularly for 32 bit and/or single precision - * Changed Core2Duo arch name to Core2, since we use this description for - the entire Core2 family (including Xeon, Core2Quad, etc). - * Bug fixes: - - Fixed cycle of dependencies in L1 Makefiles causing an endless stack - of make processes (wt assoc hang) to spawn when tuning the L1 BLAS - - Fixed compile probs for archs w/o cacheline flush in assembly - - Fixed error in configure caused by change in CPUID usage - - Added missing f77 wrappers for GERQF and GEQLF - - Avoided CPP division in assembly on Solaris, due to binutils/solaris bug -ATLAS 3.9.1 released 07/22/08, Changes from 3.9.0: - * Fixed several small bugs in ATLAS/src/auxil/ATL_ptflushcache.c - * Fixed f77wrap ilaenv renaming errors - * Fixed error in ATLAS/src/test/ATL_f77geqrf.c that messed up --nof77 builds - * Fixed failure to quote MVCC, which messed up MVTUNE on some systems - * Fixed these errataed errors: - - http://math-atlas.sourceforge.net/errata.html#trsmNB - - http://math-atlas.sourceforge.net/errata.html#mipsK -ATLAS 3.9.0 released 07/17/08, Changes from 3.8.2: - * Added ATLAS/bin/lanbtst.c, which can be used to time lapack routines, - as well as tune the NB returned by ILAENV - - Ability to autotune LAPACK QR factorization performance by varying NB - + Added QR NB choice header files to Core2Duo arch defs - * Started producing standard C wrappers for F77LAPACK.
See: - ATLAS/include/C_lapack.h - * Much improved DGEMM performance for Core2Duo and AMD K10h - * Configure improvements: - - Added '--with-netlib-lapack-tarfile' and '-Ss lasrc' flags to configure - so that the full F77LAPACK can be built w/o having to compile anything - external to ATLAS (no more fiddling with LAPACK's make.inc!) - - Added -Si latune [0,1] to autotune LAPACK QR performance - -ATLAS 3.8.3 released 02/21/09, Changes from 3.8.2: - * Fixed bugs: - - Numerous improvements to configure's architecture recognition - - Fixed D/ZGEMM cleanup error on MIPS - - Fixed TRMV tuning Makefile error - - Fixed Makefile error preventing TRSM tuning - - Worked around gcc's Solaris division bug - * Backported Core2 and K10h GEMM kernels - - Makes a *huge* perf diff on Intel boxes, slight improvement for K10h - * Added arch defs for Corei7 (64 bit only) -ATLAS 3.8.2 released 06/06/08, Changes from 3.8.1: - * Fixed bugs: - - Pervasive performance bug in GEMM, affecting all architectures - - Occasional access of C when BETA=0 - * Configure improvements: - - Improved freebsd architecture probe - - Improved linux cpu throttling probe - * Added mu=4 SSE M cleanup for extra performance -ATLAS 3.8.1 released 02/22/08, Changes from 3.8.0: - * Fixed bug in slvtst that counted complex flops same as real - * Fixed bug causing wrong answer for row-major gemm C=A*A' or A'A - * Fixed bug in configure causing Pentium-M to be IDed as CoreDuo - * Fixed bug in tfc.c causing memory overwrite when too many samples taken - * Improved L1 BLAS timers so they work like the rest of the package, and - thus don't die all the time on tolerance failures - * Improved ATLAS/tune/blas/gemm/mmsearch.c: - - for x86, tried more registers, since smart compiler can reduce A & B - regs to 2 (and possibly even 1) - - Made it so search tries both load-C-at-top and load-C-at-bottom of - M loop. Bottom is superior for error, and ATLAS originally defaulted - to load-C-at-top. 
- * Added configure support for new K10h platform from AMD, as well as - basic architectural defaults (no new kernels, just good search) - -ATLAS 3.8.0 released 10/10/07, changes from 3.6.0: - * Improved installation support: now works with 5-step standard install: - - configure, build, check, time, install - - Support for easily building 32 or 64 bit libraries - - Support for building dynamic (shared) libraries - - Can build in any directory - * Added detailed installation guide (ATLAS/doc/atlas_install.pdf), - indicating how to build ATLAS, as well as describing how you can - ensure that the produced libraries get adequate performance as well - as the correct answers. - * Improved GEMM performance on most platforms: - - HAMMER (Opteron/Athlon-64), P4, P4E, Core2Duo, CoreDuo, MIPS, - G5/PowerPC970, POWER4, POWER5, etc. - - Better handling of long-thin matrices (K >> M,N) and rank-K, K<=4 shapes - - Improved complex performance on some platforms - - Further reduced error on some platforms - + ATLAS error bound always <= reference BLAS before reduction - * More OS support: - - OSX/x86, Solaris/x86, Linux/MIPS, modern Windows, - * A lot of other changes, see developer ChangeLog below for further details -ATLAS 3.8.0 released 10/10/07, changes from 3.7.40: - * Updated some documentation -ATLAS 3.7.40 released 10/10/07, changes from 3.7.39: - * Fixed configure, where lack of \n after GOODGCC caused errors on Itanium - * Increased MAXALLOC in tfc.c to allow larger malloc in CacheEdge detection - * Replaced nonportable == with -eq (int) or = (str) in test lines of - configure - * Rewrote config's handling of 32/64 compiler flags to be more robust - to get around error found when trying to install 32bit SunOS libs - * Added USIII architectural defaults and config support - * Updated atlas_devel and atlas_contrib -ATLAS 3.7.39 released 10/07/07, changes from 3.7.38: - * Updated configure to handle AIX 64-bit flags automatically - * Expanded and corrected PowerPC ABI 
section in atlas_contrib - * Fixed PowerPC assembly kernels to work under AIX for 64 & 32 bit ABIs -ATLAS 3.7.38 released 10/05/07, changes from 3.7.37: - * Added new install guide, ATLAS/doc/atlas_install.pdf - * Updated docs - * Added F77 testing wrappers for POSV and GESV, so slvtst can test F77 iface - * Expanded configure support for AIX, but build still dies - * Configure support and flags for G4 - * Added arch defaults for: - - Pentium III - - G4 using apple's hacked gcc 3.1 - - HAMMER32SSE3 - - HAMMER32SSE2 -ATLAS 3.7.37 released 08/10/07, changes from 3.7.36: - * Fixed error in gemm, so we call SYRK for A*A^T only when beta=0 -ATLAS 3.7.36 released 08/09/07, changes from 3.7.35: - * Some smoothing ops allowing easier use of windows compilers - * Fixed error in mmsearch causing PPC searches to die wt latency problems - * Fixed error where wrong flags caused snrm2 to be incorrect on Core2Duo - * Changed GER to heavily favor applying alpha to X, in order to keep LAPACK - from barfing up a lung on those tiny matrix test cases - * Fixed error in complex syreflect causing wrong answers in [c,z]gemm when - gemm is used to do a syrk -ATLAS 3.7.35 released 07/26/07, changes from 3.7.34: - * Changed it so pthread calls assert zero return value (debugging aid) - * Improved threaded GEMM performance for cases where two dim < NB - * Increased default MaxMalloc to 64MB - * Improved Windows support: - - Added support for building Windows ATLAS with Intel's ifort - - Added support for building on Windows without the cygwin library - - Added ability to get cycle accurate timer when using Windows compiler - * Improved POWER4 & P4SSE2 arch defaults. 
- * Removed duplicate symbols in Make.mmsrc messing up shared library building -ATLAS 3.7.34 released 06/25/07, changes from 3.7.33: - * Fixed error causing read of C for beta=0 in ATL_mmJITcp - * Added 64 bit single precision Core2Duo kernel, added to arch defs - * Added gcc4.2/P432SSE2 arch defs - * Changed all Makefiles so ICC compiles only interface routines, and - [S,D]KC compiles the bulk of the non-kernel library - * Added support for POWER4/Linux, including 64 & 32 bit arch defs using gcc - - No xlc support or single precision assembly yet - * Install using gnu compilers now works under Windows - * Now works correctly for Linux/POWER5/gcc -ATLAS 3.7.33 released 05/01/07, changes from 3.7.32: - * Made it so ATLAS builds on Solaris x86: - - Had to remove all constant divides in integer expressions in assembly, - as Sun geniuses decided to change comment character to '/' - + \/ is supposed to work, but doesn't - - Had to touch every x86 assembly file to change assembly comments to /**/ -ATLAS 3.7.32 released 04/27/07, changes from 3.7.31: - * Adapted MIPS double prec kernel to single - * Added 32-bit support (n32) to MIPS (assembly & config) - * Ported UltraSPARC assembly kernels used by arch defs to v9 ABI - * Added arch defs to build 64 bit (v9) ABI for Solaris/UltraSPARC - * Documented these new interfaces in atlas_contrib. -ATLAS 3.7.31 released 04/17/07, changes from 3.7.30: - * Fixed bug in atlas_prefetch found by David Cournapeau. - * Added MIPSICE9 prefetch option, d/zgemm assembly kernels and arch defaults.
- - These should work on most MIPS platforms - - Assembly kernels work under IRIX, but no way to get cc to do prefetch - + could not make cc's pragma work with ATLAS's atlas_prefetch defs - * Added support for OSX/PowerPC970: - - Double precision assembly kernel getting 82.5% of peak (4*Mhz) - - Single precision assembly kernel getting 79% of peak (8*Mhz) - - Arch defaults for 64 & 32 bit installs - - Config support for random-ass apple flag extravaganza -ATLAS 3.7.30 released 03/25/07, changes from 3.7.29: - * Bug fixes - - fixed error in building --nof77 dynamic libs - - fixed dynamic lib link for f77 interface libs - - Updated L1 kernel testers in tune/ for function routs to call the test - func first (so correct answer not on stack), and to check for NaN - - Fixed it so error report genned again. - - Fixed error causing real JITcp to copy all the time, and then fixed - error in func ptr when this was selected. - * Wrote special Just In Time Copy (JITcp) gemm for complex that copies A&B - a block at a time, and calls the real kernel for complex matmul - - Speeds up large-case z/cgemm on some platforms (5-10%) - - Speeds up long-K case for some platforms (as much as doubles perf) - * Fixed miscalculation of CacheEdge, where I stopped using it for large K. - This fix reduces memory usage, and speeds up asymptotic case a bit. -ATLAS 3.7.29 released 02/28/07, changes from 3.7.28: - * Wrote special routines (mmBPP and mmMNK) for handling small M, N and - large K case. For M = N <= NB can double performance. Presently works - for real precisions (s,d) only. - * Translated x87 Athlon-64 kernel to 32-bit assembly. - * Put in special code to handle SYRK call to GEMM by calling SYRK and - reflecting the triangular matrix. Doubles speed, and avoid fp error - on reflection. 
- * Added arch defaults for Core2Duo32SSE3 - * Fixed some problems with -b 32 in configure and building dynamic libs - * Fixed ATLAS/bin Makefile to correctly link x?l1blastst_dyn - * Enlarged MaxMalloc -ATLAS 3.7.28 released 02/11/07, changes from 3.7.27: - * bugfix release on 3.7.27 on configure/compiler behavior: - - Fixed possible infinite loop in probing for f77libs - - Made gnu arch defaults work for gnu compilers regardless of compiler name -ATLAS 3.7.27 released 02/10/07, changes from 3.7.26: - * Support for building ATLAS to .so! See INSTALL.txt for details. - * Expanded support for appending compiler flags: - - Can specify flags to be appended to gcc in user-contributed index files - - Can append flags to only C compilers - - Can append flags to only C+usergcc, all+usergcc, etc. - * Configure now recognizes gnu compilers as gnu compiler regardless of - compiler name when looking for default flags for user-override compilers -ATLAS 3.7.26 released 01/30/07, changes from 3.7.25: - * Added line to all assembly files to declare them as not requiring an - executable stack for Linux (apparently, lack broke SELinux). - * Numerous assembly fixes, particularly forced use of .text and asmdecor - in all x86 assembly files. - * Fixed dnrm2's to call sqrtl to avoid gcc round-down. -ATLAS 3.7.25 released 01/22/07, changes from 3.7.24: - * Added x87 nrm2 assembly kernels to avoid gcc probs, changed old - gcc-compiled nrm21 kernels to use double native precision for - accumulator (breaks dnrm2 due to gcc's spurious round-down). - * Changed Athlon64 and Core2Duo arch defaults to use load-at-bottom gemm - kernels, which should reduce GEMM error - * Changed configure to error out if ran in ATLAS source directory. - * Changed all ATLAS/doc postscript files to .pdf -ATLAS 3.7.24 released 12/18/06, changes from 3.7.23: - * Fixed alignment problem in x87 hammer kernel causing large performance - losses for AMD64 machines. 
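The 3.7.25 note above mentions using a wider accumulator in the nrm2 kernels. The following is not ATLAS code. It is a small Python analogy, assuming single-precision input data, that compares accumulating the sum of squares in single precision against accumulating it in double precision. The helper names (to_f32, nrm2_narrow, nrm2_wide) are hypothetical and exist only for this illustration.

```python
import math
import struct


def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def nrm2_narrow(xs):
    """2-norm of single-precision data with a single-precision accumulator.

    Every intermediate result is rounded back to binary32, which mimics
    an accumulator no wider than the data (an assumption of this sketch).
    """
    acc = 0.0
    for x in xs:
        x = to_f32(x)
        acc = to_f32(acc + to_f32(x * x))
    return to_f32(math.sqrt(acc))


def nrm2_wide(xs):
    """Same norm, but the running sum is kept in double precision.

    Only the inputs and the final result are rounded to binary32.
    """
    acc = 0.0
    for x in xs:
        x = to_f32(x)
        acc += x * x
    return to_f32(math.sqrt(acc))


if __name__ == "__main__":
    data = [1.0] + [1e-4] * 10000
    print("narrow accumulator:", nrm2_narrow(data))
    print("wide accumulator:  ", nrm2_wide(data))
```

With one large element followed by many small ones, the narrow accumulator discards every small contribution because each square is below half a unit in the last place of the running sum, while the wide accumulator retains them.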
-ATLAS 3.7.23 released 12/07/06, changes from 3.7.22: - * Fixed bug in Makefile causing repeated path - * Added basic config support for Irix - * Added basic arch defaults for MIPS R1[2,4,6]K using MIPSpro cc - * Several small bug/compatibility fixes found by MIPS/cc install - * Modified handling of MAFLAGS to prevent compiler hang for gcc3/Itan - and cc/MIPS. -ATLAS 3.7.22 released 11/26/06, changes from 3.7.21: - * Fixed bug in mmsearch's ProbeFPU that gave advantage to muladd=0, not =1. - * Added support for Itanium's to config - - Added extra lines with gcc 4's best flags to ?cases.flg - - gcc 3 still produces best code by slight margin - - Found arch defaults that do well for both gcc 3 & 4 - * Fixed complex C = A A' bug: - https://sourceforge.net/tracker/index.php?func=detail&aid=1598272& \ - group_id=23725&atid=379482 -ATLAS 3.7.21 released 11/18/06, changes from 3.7.20: - * Made gemm call axpy-based GEMM when K < 4 && M >= 40 and - no-copy code would be used -- should help bottom of LU recursion perf - * Changed it so all F2C probes linked by Fortran do all I/O in Fortran, - instead of printing from C (some platforms seem to have problems - redirecting C I/O from a Fortran-linked program). - * Several bug fixes - * Added config support for solaris install -ATLAS 3.7.20 released 11/11/06, changes from 3.7.19: - * Added ability to use Cij = instead of Cij += on first iteration of loop - in emit_mm.c: - - Max K unrolling where this is done is set by cpp macro MAX_CASG_KU - to avoid code bloat (always works for full unroll) - - For muladd=1, doesn't work if K is unknown at compile-time - - Speeds up load-at-bottom and beta=0 code - * Added ability to prefetch C when prefA selected and doing load-at-bottom - or beta=0. 
Gives nice speedup on HammerX2, need to test other machines - * Added -falign-loops=4 to x87-using flags - - big speedup on Hammer, need to test on Intel - * Several bug fixes to allow config/install to work on OSX/Core2Duo: - - Fixed userindex so that it substitutes $(GOODGCC) for gcc in .SSE & .3DN - files as well as in .flg - - Made user override of 64 bits switch the probed assembly if it was - probed to be x8632 - - Fixed freebsd archinfo syntax error (typo in code that fixed overflow). - - Fixed typo in iamax_SSE.c - - Replaced binary constant with hex in Core2Duo gemm kernel - - For portability, rewrote saxpy_sse.c to avoid indirect jumps -ATLAS 3.7.19 released 10/14/06, changes from 3.7.18: - * Fixed config so it defines [S,D]MAFLAGS, and changed muladd probe - to use them - * Fixed a couple more assembly files to work with OS X - * User can now successfully override 32/64 bit choice on the configure - line using -b [32,64]. - - Made config append -m32/-m64 to gnu compiler collection when ptrbits - is overridden by the user on the configure line - - Fixed error in userflag.c - - Fixed lack of ' ' around C compiler names in GEMM files - - After probes finished in config, made 32-bit override change detected - asmb to 32 if it was presently 64 -ATLAS 3.7.18 released 10/12/06, changes from 3.7.17: - * Bugfix release only: - - Fixed configure so that multiple compiler flags can be passed to config. - - Adapted x86 assembly kernels in Level 1 & src directories so that they - will also run under OS X - - Added needed #define to ATLAS/src/invtst.c - - Added fix to disambiguate int & long in f77/C interface -ATLAS 3.7.17 released 09/09/06, changes from 3.7.16: - * Added ability to generate non-diagonally dominant positive definite - matrices to Cholesky-based testers if POSDEFGEN is defined - * Added new Core2Duo kernel (also think good for P4E64). - * New Core2Duo arch defaults. 
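The 3.7.17 note above says the Cholesky-based testers can generate positive definite matrices that are not diagonally dominant when POSDEFGEN is defined. The changelog does not say how ATLAS builds them. As an illustration only, one standard construction is A = L * L^T for a nonsingular lower-triangular L, which is positive definite by construction and need not be diagonally dominant. All function names below are hypothetical, and the all-ones factor is chosen purely so the result is easy to check by hand.

```python
import math


def lower_ones(n):
    """Lower-triangular n x n factor with ones on and below the diagonal."""
    return [[1.0 if j <= i else 0.0 for j in range(n)] for i in range(n)]


def gram(l):
    """Return A = L * L^T, symmetric positive definite when L is nonsingular."""
    n = len(l)
    return [[sum(l[i][k] * l[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def is_diagonally_dominant(a):
    """True if every diagonal entry is at least the sum of the absolute
    values of the other entries in its row."""
    n = len(a)
    return all(abs(a[i][i]) >= sum(abs(a[i][j]) for j in range(n) if j != i)
               for i in range(n))


def cholesky(a):
    """Unblocked Cholesky factorization; raises if A is not positive definite."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = (a[i][j] - s) / l[j][j]
    return l


if __name__ == "__main__":
    a = gram(lower_ones(5))
    print("diagonally dominant:", is_diagonally_dominant(a))
    print("Cholesky succeeds:  ", cholesky(a) == lower_ones(5))
```

A tester that only ever sees diagonally dominant inputs exercises a numerically easy case, so a generator of this kind gives the factorization a less forgiving input while still guaranteeing that a Cholesky factor exists.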
-ATLAS 3.7.16 released 08/30/06, changes from 3.7.15: - * Added flag --with-netlib-lapack to configure - * Added src/testing f77 wrapper for QR - - Still must write LU wrapper and test LLt - * Added crude ability to call QR from slvtst - * Added config support for Core2Duo and Core2Solo - * Added architectural defaults for Core2Duo64SSE3 - - Hand-tuned cases not yet optimized; presently using P4-tuned kernels - * Made "make install" allow copy of fortran interface to fail w/o dying - (for users w/o fortran compiler) -ATLAS 3.7.15 released 08/22/06, changes from 3.7.14: - * New x87 kernel that achieves over 90% of peak for double precision - Opteron/Athlon-64. Gemm runs at roughly same speed as old SSE kernel, - but LU and Cholesky actually get a speedup. The fp stack usage - of this kernel was suggested by the new gcc. - - New arch defaults for HAMMER64SSE[2,3] - * Modified ILAENV so small problems aren't told to use the full ATLAS NB. - * Fixed error in mmsearch.c that often caused complex performance to be - misreported - * Fixes/updates to ATLAS config system: - - Added support for DESTDIR system on install target as in gnu - - Made config kill any genned core and object files after run - - Made "make build" delete all config executables - - Added --nof77 to configure - - Added "make check" as sanity test instead of "make test" - + If --nof77 has been thrown, "make check" only calls C interface testers - - Added probe for 3DNow, merged 3DNow 1 & 2. -ATLAS 3.7.14 released 08/17/06, changes from 3.7.13: - * Fixes/updates to ATLAS config system: - - Improved cpu throttling probe - - Added compiler test so only compilers that work are chosen from defaults - - Added simple C interoperation test - - Fixed frontend/backend tmpnam collision prob (config[1,0].tmp) - - Re-enabled parallel make support - - Fixed buildinfo support - - Added clock speed probe to config - - Enabled "make time" to produce performance summary!
- - Added "make check" as alias to "make test" to make more like gnu - -- Alias not working, need to check! - - Fixed error in -Si nof77 1, which caused config to die w/o f77 compiler - * Added new arch defaults for P4E[32,64]SSE3 and HAMMER64SSE3, which get - better performance for gcc 4.2 (perf should still be OK for gcc 3). -ATLAS 3.7.13 released 07/26/06, changes from 3.7.12: - * Mainly, fixes/updates to ATLAS config: - - Added cpu throttling test to linux, and enabled it - - Added "make install" to copy libs and includes - - Fixed basic "make error_report" - - Added 32/64 bit distinguishing in x86 arch def - - Added "-Si nof77 1" to enable easier build wt no f77 compiler - - Added "--help" handling to configure - - Added "-Si archdef 0" to suppress use of architectural defaults - - Added "-Si cputhrchk 0" to suppress CPU throttling error exit -ATLAS 3.7.12 released 07/19/06, changes from 3.7.11: - * Completely rewrote configure handling to make ATLAS act more like - gnu configure - - You now build ATLAS in an arbitrary build directory - + /path/to/ATLAS/configure ; make build ; make test - - Read ATLAS/INSTALL.txt for directions, everything is changed! - - Presently, only supported OSes are Linux and FreeBSD (OSX). - Will be adding more in subsequent developer releases. - * Added support for prefetch in generator, mmsearch.c, fc.c, etc. - * Improved broken GetUserNB in ummsearch.c, which prevented good user cases - from being found on many systems - * mmsearch.c improvements: - - Added prefetch searching - - Updated FindMUNU to suggest 1-D vals on x86 boxes (2-op assembler). - - Made sure GetNO1D always returns false for x86 boxes (2-op assembler) - - Added special case for large number of registers (eg. Itanium) to - speed up munu search (searches near-square only) - + Untested, and likely needs fixing - - several small error-handling issues - * Improved masearch.c & L1CacheSize.c to make loop-removal by compiler - less likely. 
-ATLAS 3.7.11 released 07/21/05, changes from 3.7.10: - * This is a bugfix release: - - Fixed doc path errors caught by Kate Minola - - Fixed f77getrf/getri FunkyInts declaration - - Fixed Level 1 ref stX/StX typo in ATL_[dz,sc]refnrm2 caught - by Neil James - - Fixed assembly typo in ATL_dmm6x1x72_sse2 caught by Simon Perreault - - Added Dean's x86 assembly probe as backup for uname x8664 probe, - as Kate Minola reports uname probe doesn't work under solaris/x8664 -ATLAS 3.7.10 released 04/24/05, changes from 3.7.9: - * Updated config.c to use Dean Gaudet's contributed CPUID probe to get - relatively OS-independent x86 arch info. - * Fixed problem where altivec makes config think not using arch def flags. - * Added support for EM64T: - - Updated config to search for x86_64 independent hammer arch - - Updated P4E assembly kernels to run under x86_64 - - Updated hammer kernels to not use 3DNow inst if compiled on Intel - + cpp macro ATL_Has3DNow is now defined on sys possessing 3DNow!, - even if SSE is the selected SIMD paradigm - - Generated P4E64 arch defaults - * Added support for 64 bit ABI PowerPC Linux: - - Updated config to search for 64 bit PPC - - New macro ATL_USE64BITS set for all 64 bit ABI - - Updated G4 assembler kernel to handle 64 and 32 bit Linux ABIs - - Updated G5 assembler kernel to handle 64 and 32 bit Linux ABIs -ATLAS 3.7.9 released 04/22/05, changes from 3.7.8: - * In order to get icc to auto-vectorize, changed all ref L1 for loops: - for (i=0; i != N; i++) ---> for (i=0; i < N; i++) - also changed code generator (only if ATL_SSE1 defined): - for (k=N; k; k--) ---> for (k=0; k < N; k++) - * icc arch defaults for P4e (using autovectorization) - * Fixed errors in FA_malloc - * Changed mmsearch to use median of CPU times and min of WALL (no more tol) - * Updated config to recognize the G5 (PPC970FX) and handle apple gcc - * Updated AltiVec kernel to use line fetch for G5 - * Added G5-specific DGEMM assembly kernel - * Arch defaults for G5 -ATLAS
3.7.8 released 07/24/04, changes from 3.7.7: - * Better [d,z]GEMM kernel for Transmeta Efficeon -ATLAS 3.7.7 released 07/17/04, changes from 3.7.6: - * Better [d,z]GEMM kernel for Transmeta Efficeon -ATLAS 3.7.6 released 07/16/04, changes from 3.7.5: - * Arch defaults & config support for Transmeta Efficeon. - * New single prec SSE kernel, added to P4E arch defaults. -ATLAS 3.7.5 released 06/27/04, changes from 3.7.4: - * Added PA-RISC 2.0 config support, arch defaults, & assembly kernels -ATLAS 3.7.4 released 06/12/04, changes from 3.7.3: - * Modified L1 testers so they all take same flags - * Modified L1 timers so they all take same flags (not same as testers) - * Modified L1 & L2 tester & timers so they all take force-alignment flags: - -Fa 16 -Fx -32 : force 16-byte align for A, misalign X to 32 bytes -ATLAS 3.7.3 released 03/20/04, changes from 3.7.2: - * Added P4E (prescott) support - * Changed config to distinguish between P4 implementations based on model - number; presently knows about P4 (models 0-2) and prescott (model 3) - * Added SSE3 to ISA probe - * Updated s/d P4 kernels (not cleanup yet) to work with SSE3, and smaller - block sizes that prescott likes - * Added architectural defaults for P4E (prescott) -ATLAS 3.7.2 released 02/29/04, changes from 3.7.1: - * Added empirical tuning of TRSM_NB parameter -ATLAS 3.7.1 released 02/21/04, changes from 3.7.0: - * Increased 32-bit hammer single precision gemm to 64 bit speed -ATLAS 3.7.0 released 02/14/04 (I love optimization), changes from 3.6.0: - * Increased 32-bit hammer double precision gemm to 64 bit speed - -ATLAS 3.6.0 released 12/22/03, changes from 3.4.2: - * Gemm speedups for most architectures - - Hammer (Opteron/Athlon-64) - - IA64 family - - P4 - - PIII - - UltraSparc II & III - - single precision real Athlon3DNow! 
by Tim Mattox & Hank Dietz - * Faster Level 2 for P4/PIII due to improved gemv/ger kernels - by Camm Maguire - * Faster SYRK/HERK & dependent Cholesky - * New arch defaults for most architectures - * Many config changes, including command-line selection of compilers & flags - * Better complex row-major Cholesky factor & solve - * Several new architectures and compilers supported with arch defaults - - Explicit support for Intel compilers on P4 & PIII - - IBM Power 4 arch defaults included - *** See developer ChangeLog below for details - -ATLAS 3.6.0 released 12/21/03, changes from 3.5.22: - * Forced all non-x86 archs to have max TRSM_NB of 8, to prevent massive - Cholesky performance dropoff (essentially a performance bug) -ATLAS 3.5.22 released 12/20/03, changes from 3.5.21: - * Added ifort support under Windows - * Small fixes for the timers - * Made config default to not searching for BLAS -ATLAS 3.5.21 released 12/19/03, changes from 3.5.20: - * Added MVC support, plus non-gemm arch defaults for P4 - (thus './xconfig -b 0 -c mvc -f cvf' now gets you very good CVF lib) - * Defined symbols required for dynamic library - * Fixed bug in GetSysSum - * Numerous small config changes, mainly to make things smoother under windows -ATLAS 3.5.20 released 12/18/03, changes from 3.5.19: - * Config fixes - * Bunch of changes necessary to make CVF/icl work under windows -ATLAS 3.5.19 released 12/17/03, changes from 3.5.18: - * Numerous config bug fixes - * Added dummy ATL_cpmmJIKF symbol to lib (.so workaround) - * Arch defaults for US5 cc & gcc (missing L1 defaults for cc) - * Arch defaults for US2/4 gcc & cc - * Possible overflow & unnecessary division removed from ATL_walltime.c - * Added back winf77 stuff for Windows - - missing __alloca prevents CVS from linking, may be compiler bug: - http://gcc.gnu.org/bugzilla/show_bug.cgi?id=8750 -ATLAS 3.5.18 released 12/15/03, changes from 3.5.17: - * Fixed bug killing multithreaded ATHLON - - Replaced Peter adaptation of Julian's 
kernel with my athlon kernel
-     for all cleanup and all precisions other than double real
-   * Rewrote compiler and flag handling in config, again
-      - do ./xconfig --help for new options
-   * Better compiler flags for gcc on IA64 (3.5.16 "improvement" was mistake)
-ATLAS 3.5.17 released 12/13/03, changes from 3.5.16:
-   * Numerous small config fixes
-   * Removed compiler & flag mentions from GER cases files
-   * Architectural defaults & config flags for intel compilers on IA64 & PIII
-      - IA64/icc *much* faster than IA64/gcc for normal-size problems
-         + same asymptotic GEMM speed due to hand-tuned kernel
-   * Workarounds for icc bugs on IA64Itan2:
-      - Fixes errors in [d,s]TRSM, [c,z]HER, [c,z]HPR, [c,z]HER2K, [c,z]SYR2K
-      - fgrep code for ATL_IntelIccBugs:
-         + ATLAS/src/blas/level2/ATL_[hpr,her].c
-         + ATLAS/src/blas/level3/kernel/ATL_syrk2_put[L,U].c
-         + ATLAS/src/blas/level3/ATL_trsm.c
-      - If you don't use arch defaults, other icc bugs can kill you
-ATLAS 3.5.16 released 12/10/03, changes from 3.5.15:
-   * Added command-line selection of compilers for config
-   * Added pthread options to compile flags for MP FreeBSD
-   * Better compiler flags boosts Itanium 2 performance
-   * Fixed bug in GEMV makefile generation that prevented ATL_gemvS that
-     required special compiler and flags from working
-   * Added some icc support to config (Linux ONLY)
-   * Add arch defaults for Pentium 4/icc
-   * Added arch defaults for IA64Itan2/icc:
-      - Don't use: presently they fail tester, probably compiler error
-   * New AthlonSSE1 defaults, courtesy of Tim Mattox
-   * Fixed bug causing hangs for installs with large NB and small CacheEdge
-ATLAS 3.5.15 released 12/08/03, changes from 3.5.14:
-   * Added arch defaults and config support for IBM Power4
-   * New PIIISSE1 arch defaults
-   * Updated L1CacheSize for crude timer resolution fix
-   * Changed cygwin cp fix from @ - cp to -@ cp (AIX Make requirement)
-ATLAS 3.5.14 released 12/07/03, changes from 3.5.13:
-   * Improved L1 and CacheEdge detection
-   * All of Camm's new stuff in and working:
-      - CGEMV improved for Pentium 4 defaults
-      - All of Level 2 improved for 32 bit Hammer
-      - Improved Level 3 cleanup for 32 bit Hammer
-   * Updated 32 bit Hammer arch defaults
-      - Improved Level 2 from Camm's stuff
-      - Improved Level 3 from Camm and my P4 cleanup
-   * Improved 64 bit Hammer [d,z]GEMM M cleanup using new 1x14 kernel
-ATLAS 3.5.13 released 11/30/03, changes from 3.5.12:
-   * Row-major, complex Cholesky error fixes
-   * New, and *much* more efficient Athlon 3Dnow! kernel from
-     Tim Mattox & Hank Dietz
-   * New P4 gemm cleanup cases, speeding up small-to-medium size problems
-     for double precision (real & complex)
-   * New P4 Level 2 kernels from Camm Maguire, speeding up Level 2 and
-     fixing massive compiler warnings
-   * More arch defaults, including BOZOL1, to allow skipping L1 tuning
-   * Added version number to Make.ARCH and install log files.
-   * Improved still-crappy cleanup search
-ATLAS 3.5.12 released 10/05/03, changes from 3.5.11:
-   * New assembly UltraSparc kernels for both Ultra2 & 3.
-   * New arch defaults for UltraSparcs
-ATLAS 3.5.11 released 09/27/03, changes from 3.5.10:
-   * Windows-specific makefile changes to match new cygwin behavior
-ATLAS 3.5.10 released 09/13/03, changes from 3.5.9:
-   * Opteron speedups, all precisions Level 3
-   * SPRK bug fixes
-ATLAS 3.5.9 released 08/27/03, changes from 3.5.8:
-   * Recursive partitioning algorithm for when we can't copy A up front in
-     SYRK/HERK
-   * Itanium 2 gemm kernel, speeding up entire Level 3 BLAS
-   * Arch defaults and config support for Itanium 2
-   * Arch defaults & config support for USIII (presently fails sanity test)
-   * Various bug fixes
-ATLAS 3.5.8 released 08/09/03, changes from 3.5.7:
-   * Direct gemm-kernel [c,z]SYRK and xHERK implementation significantly
-     boosts SYRK/HERK and Cholesky performance
-   * Numerous bug fixes
-ATLAS 3.5.7 released 07/15/03, changes from 3.5.6:
-   * Direct gemm-kernel implementation of SYRK significantly boosts SYRK and
-     Cholesky performance (only in real precisions so far).
-   * Fixed some errors that occur when using Solaris make rather than gnu.
-ATLAS 3.5.6 released 06/26/03, changes from 3.5.5:
-   * Opteron speedups:
-      - Full cleanup for Opteron [d,z]GEMM
-      - Better CacheEdge improves threaded GEMM speed
-   * Bug fixes:
-      - Removed some extraneous characters my windows changes introduced
-        in assembler kernels
-      - Fixed errataed error in clapack.h
-ATLAS 3.5.5 released 06/22/03, changes from 3.5.4:
-   * More Opteron [d,z]GEMM speedups
-   * Small Pentium 4 [d,z]GEMM speedup
-   * Fixes to support cygwin/windows compilation
-      - Removed reliance on case-sensitive archiver
-      - Workaround for windows assembly name-mangling
-      - Forced config to look for gcc-2
-ATLAS 3.5.4 released 06/15/03, changes from 3.5.3:
-   * Opteron [d,z]GEMM speedup
-ATLAS 3.5.3 released 06/14/03, changes from 3.5.2:
-   * Fixed Athlon STRSM so sLU is sped up by new SGEMM from 3.5.2
-   * Fixed aligned access error in iamax_sse
-ATLAS 3.5.2 released 05/03/03, changes from 3.5.1:
-   * Athlon GEMM speedups for all precisions
-ATLAS 3.5.1 released 04/21/03, changes from 3.5.0:
-   * Added AltiVec support via gcc 3.3 or newer (older gcc buggy)
-      - This gives Linux AltiVec speedups for first time
-   * Added support for OSX and Linux PPC assembler dialects to config
-ATLAS 3.5.0 released 01/21/03, selected changes from 3.4.0:
-   * Added support for finding assembly dialect to config
-   * Redirected ISA extension output in config
-   * Added x86-64 support to config
-   * Added machinery so Level 1 kernels may be in assembly
-   * Miscellaneous x86 Level 1 speedups
-   * Assembly GEMM kernels improving performance for:
-      - x86-64 SSE2, all precisions (85% of peak for real, 83-84 for complex)
-      - SSE2 for Pentium 4, double real and complex
-      - Pentium III, all precisions
-      - UltraSparc, big boost for single precision
-
-ATLAS 3.4.2 released 08/19/03, bugfix release.
-ATLAS 3.4.1 released 06/17/02, bugfix release.
-ATLAS 3.4.0 released 05/11/02, selected changes from 3.2.1:
-   * Optimization of Level 1 BLAS
-   * Additional architecture-specific support:
-      - OS X and AltiVec support
-      - IA64 prefetch
-      - Julian Ruhe's Athlon kernel boosts performance to ~80% of peak
-   * New LAPACK routines:
-      - xTRTRI
-      - XGETRI
-      - XPOTRI
-      - xLAUUM
-   * User callable info function ATL_buildinfo()
-   * User callable sanity check
-   * Numerous small speedups and error corrections, see below for details
-
-ATLAS 3.3.15, changes from 3.3.14:
-   * Fixed PPCG4 arch defaults
-   * Made it so Linux_21164 does not use GOTO gemm
-   * Fixed config hang when using Solaris make
-   * Relaxed too-strict residual tests in lapack testers
-   * Updated atlas_contrib to point at SourceForge rather than atlas-comm
-   * Fixed error in no-copy case of aliased gemm (SSE&3DNOW [s,c]TRMM/TRSM)
-   * Fixed GETRI workspace query
-ATLAS 3.3.14, changes from 3.3.13:
-   * Got rid of duplicate ger and gemv symbols in libatlas
-ATLAS 3.3.13, changes from 3.3.12:
-   * Bug fixes release:
-      - error in dsdot tester
-      - g77 flags for compiler error on Itanium
-      - Error in emit_mm (K cleanup)
-      - Error in threaded syrk
-      - Error in Ultra5/10 arch defaults
-ATLAS 3.3.12, changes from 3.3.11:
-   * Bug fixes, including:
-      - Error in Level 1 tester
-      - Error in Level 1 routine
-      - Error in threaded SYMM
-      - Error in fc.c
-   * Addition of ATLAS/doc/atlas_devel.ps, with description of how to use
-     the ATLAS tester.
-ATLAS 3.3.11, changes from 3.3.10:
-   * With Peter's extension to Julian's Athlon code, 80% of peak on all
-     precisions, providing massively improved Athlon performance
-   * Additional arch defaults, and config changes
-ATLAS 3.3.10, changes from 3.3.9:
-   * Boatload of bug fixes
-   * Applied Goto's Linux patch
-   * New arch defaults
-ATLAS 3.3.9, changes from 3.3.8:
-   * Slightly improved [Z,D]GEMM on PIIISSE1 (prefetched kernels)
-   * Slightly improved DGEMM kernel for Athlon
-   * Updated ATLAS/tune/blas/[gemv,ger] to match other levels
-      - All kernels now have ID
-      - All kernels can now extend line and give compiler and flags
-      - If compiler line is given as +, get default compiler with added flags
-        (useful for changing prefetch distances, etc)
-      - gcc sub is done, as for other levels
-      - basic infrastructure for xccobj is in place (untested)
-   * SYMV update
-      - SYMV now tuned seperately from GEMV
-      - Slightly improved GetPartSYMV
-ATLAS 3.3.8, changes from 3.3.7:
-   * Addition of Julian Ruhe's double precision Athlon kernel
-   * Addition of sanity_test build check
-   * Addition of LAPACK routines xGETRI & xPOTRI (row & col-major versions)
-   * Addition of recursive version of LAPACK routine xLAUUM
-   * Ability to tune xROT
-   * Bunch of bug fixes.
-ATLAS 3.3.7, changes from 3.3.6:
-   * Bug fix release:
-      - AltiVec support had been messed up since change to CVS at 3.3.3
-      - Fix in CacheEdge printing of ATL_buildinfo
-ATLAS 3.3.6, changes from 3.3.5:
-   * Peter Soendergaard's recursive TRTRI now built into lapack lib.
-   * Version and build informational routine, ATL_buildinfo
-   * Config supports avoiding gcc 3.0 on x86 archs, whenever possible
-ATLAS 3.3.5, changes from 3.3.4:
-   * Removes dummy TRTRI from lapack lib
-   * Improves IA64 complex gemm performance (removes prefetching)
-ATLAS 3.3.4, changes from 3.3.3:
-   * Bug fix release, fixing P4 and Athlon archs.
-ATLAS 3.3.3, changes from 3.3.2:
-   * First release based on SourceForge CVS, rather than my home area
-   * IA64 prefetch added, speeding up all levels
-ATLAS 3.3.2, changes from 3.3.1:
-   * Index files for user-contributed GEMM kernels take ID parameter
-   * Updated ATLAS/doc/atlas_contrib.ps to include changed GEMM index and
-     ability to tune Level 1
-   * Added OS X support to config
-   * Added AltiVec support to ATLAS, speedup up all precisions, all levels
-   * Bug fixes for Level 1 tuning
-ATLAS 3.3.1, changes from 3.3.0:
-   * Tuning and kernel contribution for Level 1
-   * Level 1 tuned decently well for Athlon classic
-ATLAS 3.3.0, changes from stable:
-   * Camm & Peter's SSE2 GEMM kernel
-   * Small-case LU & Cholesky speedup
-   * Complex TRSM speedup
-
-ATLAS 3.2.1, released 03/23/01, bugfix release.
-ATLAS 3.2 (stable), released 12/20/00. The highlights of
-changes from v3.0Beta are:
-   ** SMP support via posix threads for Level 3 BLAS
-   ** Addition of infrastructure for contribution of kernels, thus allowing:
-      ** SSE support
-      ** 3DNow! support
-      ** Speedups on ev6x, ev5x, UltraSparcs, IA64, PowerPC archs
-   ** Level 1 BLAS tester/timer added
-   ** Additional OS and architectural support
-   ** Bug fixes and misc. speedups
-
-ATLAS version 3.0Beta (stable), released December 1999. The highlights of
-changes from v2.0 are:
-   ** ATLAS now supplies complete BLAS, although some level 1 and 2 BLAS not
-      fully optimized on all architectures
-   ** Some LAPACK routines explicitly supported (LU, Cholesky and related
-      routines)
-   ** Standard C and Fortran77 APIs for all BLAS and provided LAPACK routines;
-      C routines support both row- & column-major access
-   ** Improved small-case GEMM performance made possible by code generator that
-      can generate all transpose cases (and thus avoid data copy), with
-      associated speed boost in many Level 3 BLAS routines.
-   ** Support for complex matrix multiplication without copying user data
-   ** Support for additional looping structures for complex GEMM, providing
-      better performance and reducing memory usage for many cases
-
-ATLAS version 2.0, released February 1999. The highlights of changes
-from 1.1 are:
-   ** Support for all 4 types/precisions
-   ** All Level 3 BLAS routines now supported
-   ** Fortran77 is not required for installation
-   ** Install & configure steps are now automated & logged
-   ** Timer/tester for all Level 3 BLAS now included
-   ** C interface to BLAS supported, and tester provided
-   ** Improved small-case matrix multiply performance
-
-ATLAS version 1.0, released September 1998. The highlights of changes
-from version 0.1 are:
-   ** Support for entire real Level 3 BLAS via the Superscalar gemm-based
-      BLAS (written in Fortran77)
-   ** Improved matmul generator, including support for explicit
-      register blocking in GEMM
-
-First ATLAS release, version 0.1, released December 1997. Provided:
-   ** Optimized, real matrix multiplication
-   ** Real GEMM tester/timer
+libbml0
\ No newline at end of file