
Last month, AMD launched its new server architecture, codenamed Epyc, in single-chip configurations of up to 32 cores. We already knew back then that Intel was prepping a massive Xeon refresh, starting with the Core X-Series (working on that, for the record) and following it up with a new lineup of Xeon parts with up to 28 cores, 56 threads, and a new L2 cache structure that quadrupled the amount of L2 while slashing the amount of L3 allocated per core.

On Monday, Intel launched roughly 50 SKUs in total, with top-end 28-core prices reaching $10K to $11K per physical CPU. Intel's new Xeon "Purley" Skylake-SP CPUs support AVX-512, Intel's own mesh topology, and the aforementioned larger L2 cache, so the chips are significantly different (with both gains and losses) relative to previous Xeon products.

Over at Anandtech, the indelible Johan De Gelas (once of Ace's Hardware, for you longtime tech readers) has joined up with Ian Cutress to provide preliminary information on how AMD's Epyc and these new Skylake-SP Xeons compare with one another, with previous Xeon chips thrown in for good measure.

A few points before we dive in. Anandtech acknowledges having had just one week with its AMD testbed and two weeks with the Intel system. Server testing is far more complicated than desktop testing, the benchmarks themselves are often more arcane and trickier to fine-tune, and performance can be very dependent on the presence or absence of such optimizations. Anandtech makes prominent note of the fact that they had only a very limited window in which to test and apply or discover such optimizations, particularly in AMD's case.

Second, the performance scenarios and relative rankings of Epyc versus Xeon are themselves highly dependent on the tests in question. This is the first time in at least six years that AMD has had a server part that could take the fight to Intel in any context, but testing has shown some distinct strong and weak points to AMD's architecture. While we'll provide an overview of the findings, there's no substitute for a close reading of the original article if you want to completely understand the subtleties.

Inherent strengths, weaknesses, and differences

Compared with Intel's new chips, AMD's Epyc uses its own CCX and Infinity Fabric, doesn't implement AVX-512, and has the same cache structure as Ryzen. This proves critical to understanding how Intel and AMD compare in a number of benchmarks (more on that in a moment).

AMD-Epyc

AMD has a significant advantage in base price; the top-end Epyc 7601 (180W TDP) is a 32-core chip with a 2.2GHz base / 3.2GHz max clock speed and a $4,200 price tag. Intel's Xeon 8180 is a 28-core chip with a 2.5GHz base / 3.8GHz max clock and a $10,009 price tag (the same chip in a 165W TDP with support for 1.5TB of DRAM per socket, and a 2.1GHz base clock, retails for $11,722). Anandtech tested the Xeon 8176 — 28 cores, 2.1GHz base, with a maximum of 768GB of RAM per socket and a price tag of $8,719. Intel's new Platinum/Gold/Silver/Bronze format looks nothing short of nightmarishly complicated, with vastly different specs swept into the same "families" in some cases. Other designations contain a number of exceptions to the rules that are supposed to govern which chips are placed in which brackets.

See? The only difference between 51xx and 61xx is the number of QPI links, AVX-512 FMA units per core, core counts, RAM support, and scalability. They're practically identical!

We noted when news of the rebrand hit that it wasn't clear how this structure would clarify Intel's product lines, and a muddle is precisely what's emerged from these results.
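For a rough sense of the price gap, the list prices quoted above work out as follows. This is a simple illustration using only the quoted figures; street pricing and per-core performance obviously differ.

```python
# Hypothetical price-per-core comparison using the list prices quoted above.
chips = {
    "Epyc 7601": {"price": 4200, "cores": 32},
    "Xeon 8180": {"price": 10009, "cores": 28},
    "Xeon 8176": {"price": 8719, "cores": 28},
}

for name, c in chips.items():
    # Divide list price by physical core count for a naive per-core cost.
    print(f"{name}: ${c['price'] / c['cores']:.0f} per core")
# Epyc 7601: $131 per core
# Xeon 8180: $357 per core
# Xeon 8176: $311 per core
```

Even against the cheaper 8176, AMD's flagship comes in at well under half the cost per core, which frames much of the perf-per-dollar discussion below.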

Cache changes

I want to take a moment to talk about cache architecture differences between the new Skylake-SP Xeons and previous CPUs, as well as between Intel and AMD. Skylake-S processors have a 256KB L2 cache that's 4-way set associative (see our L1 vs. L2 cache explainer for details on what this means) with an 11-cycle latency. Previous Xeons used a large inclusive L3 cache with ~2.5MB of L3 cache allocated per core, up to 16-way set associativity, and a 44-cycle latency.

Skylake-SP, on the other hand, has a 1MB L2 cache that's 16-way associative, but has higher (13-cycle) latency. Less L3 cache is integrated per core (1.375MB), the cache is 11-way set associative instead of 16-way, it has a 77-cycle latency (up from 44), and it's a non-inclusive cache.

An inclusive cache is a cache that is guaranteed to contain the data found within higher-level caches. The advantage of inclusive caches is that you can search the highest level of cache (L3 in Intel's case) and determine whether data is located in L1. If you can't find it in L3, you know it's not in L1, which means you know you need to load it. This reduces the miss latency penalty (searching main memory is still much slower than searching L3). The disadvantage of an inclusive cache is that it offers less real space for storing data, since each cache level must contain all the data in the cache level above it. Intel's use of very large L3 caches in previous Broadwell and Skylake-S chips mitigated this issue by providing a large absolute amount of cache space.
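To make the inclusion property concrete, here's a toy Python model (purely illustrative; this is not Intel's actual cache logic). Because every line in L1 is guaranteed to also be in L3, a single L3 lookup is enough to rule out a hit anywhere on-chip.

```python
# Toy model of an inclusive cache hierarchy: every line in L1 must also be in L3.
class InclusiveHierarchy:
    def __init__(self):
        self.l1 = set()
        self.l3 = set()

    def fill(self, addr):
        # On a fill, the line enters both levels, preserving the inclusion invariant.
        self.l1.add(addr)
        self.l3.add(addr)

    def lookup(self, addr):
        # Checking L3 alone is sufficient: a miss there means the line
        # cannot be in L1 either, so we must go to main memory.
        if addr not in self.l3:
            return "memory"
        return "l1" if addr in self.l1 else "l3"

cache = InclusiveHierarchy()
cache.fill(0x100)
print(cache.lookup(0x100))  # hit somewhere on-chip
print(cache.lookup(0x200))  # guaranteed miss: fetch from memory
```

The invariant is the whole point: one lookup in the bottom level answers "is this anywhere on the chip?", at the cost of duplicating every L1/L2 line in L3.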

Skylake-SP transforms the L3 cache into what is often called a victim cache, because data lines present in L2 aren't copied to L3 until they are moved or evicted. Data can be read back from L3 into L2 but also remain in the L3. Anandtech doesn't believe Skylake-SP can prefetch into L3, which means it serves as a home for "evicted" data. It's not used as much as the inclusive Broadwell and earlier Xeon L3 cache, which is why Intel can relax its latency and performance.
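The eviction behavior can be sketched the same way. In this toy model (again illustrative only; the class, capacity, and LRU policy are made up for the example), a line only reaches L3 when L2 pushes it out:

```python
# Toy model of a non-inclusive victim L3: lines reach L3 only on eviction from L2.
from collections import OrderedDict

class VictimHierarchy:
    def __init__(self, l2_capacity):
        self.l2 = OrderedDict()   # LRU-ordered: oldest entry is first
        self.l3 = set()           # the victim cache
        self.l2_capacity = l2_capacity

    def access(self, addr):
        if addr in self.l2:
            self.l2.move_to_end(addr)
            return "l2 hit"
        if addr in self.l3:
            # Read back into L2; the copy may also remain in L3.
            self._insert_l2(addr)
            return "l3 hit"
        self._insert_l2(addr)
        return "miss"

    def _insert_l2(self, addr):
        self.l2[addr] = True
        self.l2.move_to_end(addr)
        if len(self.l2) > self.l2_capacity:
            victim, _ = self.l2.popitem(last=False)
            self.l3.add(victim)   # the evicted line becomes an L3 "victim"

h = VictimHierarchy(l2_capacity=2)
h.access(1); h.access(2)
h.access(3)                # evicts line 1 into L3
print(h.access(1))         # "l3 hit": the victim is recovered from L3
```

Nothing is duplicated up front, which is why Intel can get away with a smaller, slower L3 per core: it only holds data the (now much larger) L2 has already given up on.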

Meanwhile, AMD uses its own distinct CPU Complex (CCX) design, which combines four CPU cores and an 8MB L3 cache. Two CCXes make up one Zeppelin die, and AMD's own Epyc diagrams show up to four dies per CPU package. The L3 is a mostly exclusive victim cache, but AMD's reliance on the CCX architecture for cross-communication between cores means there are some tangible penalties and impacts. Local data movement within the same CCX is quite quick, but there's a significant latency penalty for moving data across CCX complexes. AMD states that a Naples CPU (4 Zeppelin dies) has 64MB of L3, but that's not really accurate. What Epyc has is better described as 8x8MB L3s, in much the same way that a pair of GPUs in SLI mode with 4GB of RAM each are better described as 2x4GB GPUs as opposed to an 8GB GPU.

These cache structure differences account for a substantial part of why Epyc, pre-Skylake-SP Xeons, and the new Purley Xeons perform differently from one another. But they're scarcely the only factor in play. The chart below shows how complex the comparisons between Epyc, Broadwell-EP, and Skylake-SP can get in memory bandwidth alone, depending on test conditions.

MemHierarchy

There's no "wrong" test result here, and all these test types are used by shipping software to varying degrees.

AMD's Epyc 7601 has 0.42x of Skylake-SP's bandwidth in some tests, but 2.26x more bandwidth in others, depending on how threads are pinned across the CPUs. Raw bandwidth for Broadwell-EP is higher than Skylake-SP in nearly every case except when eight threads are running, which is where Skylake-SP finally pulls ahead. Relative memory latencies are also different between AMD and Intel, with AMD competing extremely well at or below 4MB of L3 and poorly above that point. Accessing more than 8MB of L3 is a worst-case scenario for Epyc; its latency is worse than Intel's DRAM access latency.

latencyepyc_xeonv5_tinymembench

Ouch — but not as formative of overall performance as you might think, given the scope of the gap.

Performance Overview

AT runs through SPEC2006 (single-thread, SMT, multi-core), database and transactional performance, Java, big data number crunching, and floating point performance. AMD's FPU performance is surprisingly excellent compared with Intel. There are several reasons for this, but a number of them come down to various aspects of AVX and its impact on turbo clocks. For the last few product cycles, Intel has publicly stated that its Turbo Mode frequency figures depend on whether AVX is active, with AVX clocks being substantially lower than non-AVX clocks. Intel's Xeon 8176 has a non-AVX 28-core maximum turbo frequency of 2.8GHz, an AVX 2.0 28-core maximum turbo frequency of 2.4GHz, and an AVX-512 28-core maximum turbo frequency of just 1.9GHz.

NAMD MolDyn

Intel talks up its use of 256-bit and 512-bit FMACs compared with AMD's 128-bit implementation of AVX. But AMD may have taken the wiser route here (it wins all the FPU benchmarks AT ran). Intel takes a 20 percent clock penalty compared with 256-bit AVX when running AVX-512. While higher efficiency means AVX-512 should theoretically still show significant performance improvements, they're only going to happen with substantial performance tuning. Not all software vendors or buyers can afford that kind of work, but it'll be critical for AVX-512 to be a success.
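Some back-of-the-envelope arithmetic shows the tradeoff. The sketch below assumes two FMA units per core (as on the top-tier parts) and FP64 operands, using the 28-core turbo figures quoted above; it's a theoretical peak, not a measured result.

```python
# Theoretical FP64 GFLOPS per core:
# clock (GHz) x FMA units x doubles per vector x 2 (an FMA counts as mul + add)
def peak_gflops(clock_ghz, fma_units, vector_doubles):
    return clock_ghz * fma_units * vector_doubles * 2

avx2 = peak_gflops(2.4, 2, 4)     # 256-bit vectors at the AVX2 turbo clock
avx512 = peak_gflops(1.9, 2, 8)   # 512-bit vectors at the lower AVX-512 turbo clock

print(f"AVX2:    {avx2:.1f} GFLOPS/core")   # 38.4
print(f"AVX-512: {avx512:.1f} GFLOPS/core") # 60.8
print(f"Speedup: {avx512 / avx2:.2f}x")     # ~1.58x, but only for fully vectorized code
```

On paper the wider vectors still win comfortably despite the lower clock, but that ~1.58x only materializes for code that keeps the 512-bit units busy. Lightly vectorized code pays the clock penalty without reaping the width, which is exactly why the tuning burden matters.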

FPU performance is, surprisingly, AMD's best overall showing. It's a mediocre database server, beats Intel in Java performance (but not by the same margins as in FPU code), and is extremely competitive in Big Data tests given price and clock differentials. Power consumption varies substantially by workload; the Xeon 8176 has extremely high idle power consumption, but vastly better MySQL perf/watt than Broadwell and modestly better perf/watt in this test than the Epyc 7601. In POV-Ray testing, AMD turns the tables on Intel, with higher performance at a huge power differential (327W for Epyc versus 453W for Skylake-SP).

Conclusions

The bottom line is this: AMD's Epyc isn't the better choice in every situation or environment. But a combination of lower prices, competitive performance, and some solid test wins shows AMD can hang with Intel again, even at the top of the market. For hardware price-conscious companies, or vendors that can afford to optimize heavily for Ryzen (cloud providers like MS, for instance), Epyc is a very strong contender. But Skylake-SP shows some formidable performance gains of its own, has a better-scaling mesh topology, and a stronger overall level of performance. If your TCO is dominated more by software costs than by hardware pricing, Intel and its proven track record may still be the better option here.

Finally, I'd like to echo some comments Johan makes. After years of watching Intel's only competition being its own previous generation of products, it's really nice to see some genuine performance back-and-forth. One of the grand ironies of reviewing is that people regularly accuse reviewers of using various tricks or indulging biases to tilt reviews deliberately towards AMD or Intel when, in reality, we're probably the people who most want to see exciting performance matches. Articles like this (or, of course, AT's vastly larger review) don't write themselves; they take considerable time and effort. It's boring to watch the same company win over and over. Nobody likes a slugfest better than a reviewer, and this review is worth a read.