
The microarchitecture of the Apple M3, M3 Pro and M3 Max, explained: a huge leap for the GPU, a more modest one for the CPU


Apple’s M3 processor family is here, and it looks promising. The Cupertino company has presented three chips with different features and a common microarchitecture: the M3, M3 Pro and M3 Max. For now there is no sign of the very likely M3 Ultra, although we can be reasonably sure that it will arrive in the future, possibly when Apple decides to renew its Mac Pro and Mac Studio desktop families.

These are the first microprocessors for laptop and desktop computers manufactured on TSMC’s 3 nm photolithography, currently the most advanced node the Taiwanese company has in large-scale production. The microarchitecture of these chips also introduces several very important improvements that, on paper, should allow them to clearly outperform the equivalent chips of the M2 family.

The subsystem that incorporates the most relevant improvements is the graphics logic. For the first time a Mac processor implements hardware acceleration for ray tracing, and it is also the first time the integrated GPU relies on a dynamic cache. In addition, the performance of the CPU cores and of the Neural Engine for artificial intelligence is, according to Apple, much higher than in the M2 processors, and the M3 chips are designed to work with a larger unified memory map (up to 128 GB on the M3 Max).

The Apple M3, M3 Pro and M3 Max microprocessors, in figures

| | M3 Max | M3 Pro | M3 | M2 | M1 |
|---|---|---|---|---|---|
| Photolithography | 3 nm | 3 nm | 3 nm | 5 nm (2nd generation) | 5 nm |
| Number of transistors | 92 billion | 37 billion | 25 billion | 20 billion | 16 billion |
| Manufacturer | TSMC | TSMC | TSMC | TSMC | TSMC |
| Number of CPU cores | Up to 16 | Up to 12 | 8 | 8 | 8 |
| High-performance cores | 12 | 6 | 4 | 4 | 4 |
| High-efficiency cores | 4 | 6 | 4 | 4 | 4 |
| Number of graphics cores | Up to 40 | Up to 18 | 10 | 10 | 8 |
| Neural Engine cores | 16 | 16 | 16 | 16 | 16 |
| Maximum unified memory | 128 GB | 36 GB | 24 GB | 24 GB | 16 GB |
| Main memory technology | LPDDR5-6400 | LPDDR5-6400 | LPDDR5-6400 | LPDDR5-6400 | LPDDR4X-4266 |
| Unified memory bandwidth | Up to 400 GB/s | 150 GB/s | 100 GB/s | 100 GB/s | 68 GB/s |
| Video encoding and decoding | Hardware-accelerated H.264, HEVC (H.265), ProRes, ProRes RAW and AV1 | Hardware-accelerated H.264, HEVC (H.265), ProRes, ProRes RAW and AV1 | Hardware-accelerated H.264, HEVC (H.265), ProRes, ProRes RAW and AV1 | 8K H.264, H.265, ProRes and ProRes RAW | 4K H.264 and H.265 |
| Connectivity | 3 x Thunderbolt 4/USB 4 | 3 x Thunderbolt 4/USB 4 | 2 x Thunderbolt 3/USB 4 | 2 x Thunderbolt 3/USB 4 | 2 x Thunderbolt 3/USB 4 |

This is the performance of the M3 processors that Apple promises us

The core configuration of the M3 family is consistent with what the M1 and M2 series offered. The entry-level processor, the M3, incorporates 8 CPU cores (4 high-performance and 4 high-efficiency) and 10 GPU cores. The M3 Pro is available in two versions: with 11 CPU cores and 14 GPU cores, or with 12 CPU cores (6 high-performance and 6 high-efficiency) and 18 GPU cores. Finally, the M3 Max brings together up to 16 CPU cores (12 high-performance and 4 high-efficiency) and up to 40 graphics cores.


As on other occasions, Apple has revealed only some details about the architecture of the M3 processors. As mentioned a few lines above, the most relevant improvements are in the graphics logic (we will look into it in the next section of this article). Apple has, however, told us what to expect from the performance of its new processors by testing them with video editing, image processing, code compilation and productivity applications.

The prudent approach is to take the performance figures that vendors give us with a grain of salt, since they are obviously an interested party, but they can help us form a rough idea of what the new chips can do. Of course, as soon as a new MacBook Pro or iMac equipped with an M3 chip falls into our hands we will analyze its performance in depth. According to Apple’s chart, the M3 processor is noticeably faster than the M2 and M1 chips when upscaling images with Photomator.

When rendering images with Redshift, the M3 Pro outperforms the M2 Pro and M1 Pro by a wider margin than the M3 did over its predecessors in the previous test. The higher clock frequency at which the CPU and GPU cores of the M3 Pro presumably run works in its favor, although it is very likely that Apple’s engineers have also introduced important optimizations in the microarchitecture of the M3 chips compared to the M2 processors.

On its website Apple has published many more performance tests, but the three we have selected illustrate quite well what the M3 chips promise. In Apple’s chart for the third test, the M3 Max, the most capable chip until an M3 Ultra arrives, is much faster at rendering images in Redshift than the M2 Max and M1 Max.

This improvement is, a priori, the result of higher clock frequencies and the optimizations Apple has introduced in the microarchitecture of the M3 processors. Of course, TSMC’s 3 nm lithography in theory works in their favor, and not only in performance per watt: it also determines the maximum clock frequencies at which the CPU cores can run.

According to Apple, the high-performance cores of the M3 processors are 30% faster than the comparable cores of the M1 chips. And the high-efficiency cores are, again according to Cupertino, 50% faster than the comparable cores of the M1 chips.

Unfortunately, Apple has given us barely a couple of hints about the efficiency of the M3 microprocessors, although it has promised something revealing: in multithreaded scenarios they deliver the same performance as comparable M1 chips while consuming half the energy. In this area, 3 nm lithography makes the difference.

Hardware ray tracing and dynamic cache come to the Mac GPU

Apple claims that the graphics of the M3 processors play in another league. That may be so. We will know for sure when we have the opportunity to thoroughly analyze the first Macs equipped with these chips, but on paper the improvements to the graphics logic look very good. One of the most important is dynamic caching, a strategy that lets the GPU decide in real time how much local memory to reserve for each task. In theory this maximizes the utilization of the graphics logic and its performance while also optimizing the use of local memory.


Additionally, the integrated GPU in the M3 processors implements hardware specifically dedicated to ray tracing acceleration. PC gamers know well that this rendering strategy is much more demanding on graphics hardware than traditional rasterization, so the presence of dedicated hardware is welcome. According to Apple, the M3 chips are up to 2.5 times faster than the M1 processors when rendering. Sounds good.

However, the graphics logic of the M3 processors implements one more improvement worth noting: hardware acceleration of mesh shading. Broadly speaking, this technique acts on the geometry of the scene in order to transform complex geometry into a set of simpler meshes that can be rendered with much less effort. According to Apple, the GPU of the M3 processors can deliver the same performance as the graphics logic of the M1 chips while consuming half the energy.

Up to 128 GB of unified memory

A unified memory map accessed by both the CPU cores and the GPU is a strategy for reducing latency, increasing energy efficiency and, at the same time, improving transfer speed. This technique is still present in the M3 processors, but these chips can work with a larger unified memory map. The M3 keeps the 24 GB maximum of the M2, but the M3 Pro raises this figure to 36 GB, and the M3 Max to a quite impressive 128 GB.
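The bandwidth figures listed earlier can be sanity-checked with a little arithmetic: peak bandwidth is the memory transfer rate multiplied by the bus width. The sketch below is only an illustration. The transfer rates come from this article, but the bus widths are assumptions (commonly reported values that the article does not state), and `peak_bandwidth_gbs` is a hypothetical helper name.

```python
# Peak theoretical bandwidth = transfer rate (MT/s) x bus width (bits) / 8.
# Transfer rates are the ones quoted in the article; bus widths are ASSUMED.


def peak_bandwidth_gbs(transfers_mt_s: float, bus_width_bits: int) -> float:
    """Return the peak bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfers_mt_s * 1e6 * bytes_per_transfer / 1e9


# name: (MT/s from the article, assumed bus width in bits)
CHIPS = {
    "M3 Max": (6400, 512),
    "M3 Pro": (6400, 192),
    "M3": (6400, 128),
    "M2": (6400, 128),
    "M1": (4266, 128),
}

if __name__ == "__main__":
    for name, (rate, width) in CHIPS.items():
        print(f"{name}: {peak_bandwidth_gbs(rate, width):.1f} GB/s")
```

Under those assumed bus widths the results land close to the rounded figures Apple quotes (400, 150, 100 and 68 GB/s), which suggests the official numbers are theoretical peaks rather than measured throughput.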


New Mod for Switch provides real-time CPU, GPU and temperature monitoring


Some time ago, everything was done with FRAPS. These days, however, Riva Tuner Statistics Server and OCAT are the go-to tools. For decades, PC users have relied on on-screen frame rate information and monitoring of various parameters to get a sense of how their hardware is being used. What if similar tools were also available to console gamers? Well, a recent development in Switch modding has made this possible. Frame rates, CPU and GPU usage percentages, temperature monitoring, fan speed: all these parameters are now exposed and give us a fascinating insight into how various titles use the Switch hardware during gameplay.

Obviously, all of this is only possible on the early versions of the Switch, which were vulnerable to a hardware exploit in recovery mode, on top of which custom firmware was developed. Yes, you can run these tools yourself, but they also open a door to piracy, so it is no wonder that modified consoles are regularly banned when they connect to Nintendo’s online services. The interesting part for us at Digital Foundry, however, is the proliferation of homebrew software. One recent release is the Tesla framework, code that runs on the CPU of the Switch’s SoC and can bring an interactive overlay to the screen during any gaming session. Tesla was immediately followed by the release of the Switch overlay mod, which essentially brings much of the functionality of Riva Tuner Statistics Server to Tesla. So here we are with the full analysis: what does it tell us?

Basically, you get instant confirmation that Nintendo reserves an entire processor core for the OS and UI: the overlay shows that cores 0, 1 and 2 are dormant while navigating the menus, with only core 3 active. Likewise, the on-screen information indicates that in the docked configuration the clock frequencies are fully unlocked during gameplay: 1020 MHz for the CPU, 768 MHz for the GPU and 1600 MHz for the EMC (the memory controller).

Anyway, we now have the opportunity (as we have, to some extent, in the past) to see how the hardware behaves in real time with boost mode. This is the ability of some games to temporarily overclock the CPU to improve load times. For example, when you die in Mario Odyssey, the screen turns black and the game returns you to the previous checkpoint. This would normally be a relatively quick operation anyway, but in Mario Odyssey it is faster still thanks to boost mode. While loading, the CPU is temporarily overclocked to 1785 MHz, 75% above its normal frequency. In contrast, the GPU is underclocked to 76.8 MHz, 1/10 of its maximum frequency. Nintendo essentially balances the heat output inside the SoC by pushing the CPU to its maximum and dropping the GPU to its minimum.
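The boost-mode percentages quoted above follow directly from the clock values. A minimal sketch of the arithmetic, using only the frequencies from this article; `percent_change` and the constant names are hypothetical helpers introduced for illustration:

```python
# Clock values (MHz) as quoted in the article.
CPU_NORMAL_MHZ = 1020.0
CPU_BOOST_MHZ = 1785.0
GPU_DOCKED_MHZ = 768.0
GPU_BOOST_MHZ = 76.8


def percent_change(base_mhz: float, new_mhz: float) -> float:
    """Relative change from base_mhz to new_mhz, in percent."""
    return (new_mhz - base_mhz) / base_mhz * 100


if __name__ == "__main__":
    cpu_gain = percent_change(CPU_NORMAL_MHZ, CPU_BOOST_MHZ)
    gpu_fraction = GPU_BOOST_MHZ / GPU_DOCKED_MHZ
    print(f"CPU in boost mode: {cpu_gain:+.0f}% over normal")
    print(f"GPU in boost mode: {gpu_fraction:.0%} of its docked maximum")
```

The same helper applies to any other pair of clocks mentioned in the article, such as a stock CPU frequency against an overclocked one.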

This technique is used in many modern games: Wolfenstein Youngblood and Crash Team Racing exploit it, while Zelda: Breath of the Wild and Super Mario Odyssey have been patched to include it. Load times are determined not only by the speed of the internal NAND and your SD card, but also by the CPU, which has to decompress the assets in the background. With the screen black or displaying a static image, the graphics component does not need to run at full power. As soon as gameplay resumes, the system restores the default clock frequencies. Boost mode certainly works well: we measured a 7-second gain when loading from the main menu to the Great Plateau in Breath of the Wild (23s vs 30s).

The system monitor overlay also reveals how some titles have pushed the Switch’s hardware to its limits, to the point that Nintendo was forced to step in by providing an OS-level performance mode (something separate from boost mode, and one that applies only to the portable configuration). When the Switch’s clock frequencies were first revealed, the CPU was locked at 1020 MHz and the GPU at 307.2 MHz. Right before launch, portable mode saw the GPU raised to a more reasonable 384 MHz. These days some more demanding titles push the GPU to 460 MHz, but that is only part of the story.


Mortal Kombat 11 is a prime example. Once the arena is loaded, the GPU rises to 460 MHz, from the opening cutscenes through gameplay. This is an exceptionally high clock rate, but it is limited to gameplay only; the menu reverts to 384 MHz. Super Mario Odyssey uses the same improved clock mode but, surprisingly, some titles do not. Hellblade: Senua’s Sacrifice would have benefited tremendously: its dynamic resolution would be higher and its frame rate more solid, but it runs at the standard GPU clock of 384 MHz.

We find the same situation in Link’s Awakening, which suffered from frame rate issues, and past tests have shown huge benefits from overclocking the console in some scenarios. The developers may have opted for standard frequencies to conserve battery life, as users are more prone to playing RPGs in long, continuous sessions. But there is an interesting point to note about this game. GPU overclocking certainly helps solidify the frame rate, but CPU and GPU monitoring shows that a lot of SoC resources go unused when the stuttering occurs, suggesting that the problem lies elsewhere.

One of the most fascinating results from this monitoring tool is the dynamic clock in portable mode. Few games use it, and Luigi’s Mansion 3 is among them. Its GPU varies between 307.2 MHz and 384 MHz depending on the scene, dropping to the lower value in lighter ones to preserve battery life. However, in the id Tech 6 ports developed by Panic Button, the GPU moves across the full range of available frequencies: 307.2 MHz, 384 MHz and also 460 MHz. Patches that improve the performance of the older id Tech ports have been released recently, and we wonder whether they are related to this.
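One way to look at the GPU frequencies quoted so far is to divide each by the 76.8 MHz boost-mode floor mentioned earlier. The sketch below does that with the article’s numbers only; whether the hardware really steps in 76.8 MHz increments is an inference from these figures, not something the article states, and `steps_of_base` is a hypothetical helper.

```python
# GPU clocks (MHz) quoted in the article, keyed by where they appear.
BASE_STEP_MHZ = 76.8  # the GPU floor during boost mode

GPU_CLOCKS_MHZ = {
    "portable, original spec": 307.2,
    "portable, launch": 384.0,
    "portable, raised": 460.0,
    "docked": 768.0,
}


def steps_of_base(clock_mhz: float, base_mhz: float = BASE_STEP_MHZ) -> float:
    """Express a clock as a multiple of the base step."""
    return clock_mhz / base_mhz


if __name__ == "__main__":
    for label, clock in GPU_CLOCKS_MHZ.items():
        print(f"{label}: {clock} MHz = {steps_of_base(clock):.2f} x {BASE_STEP_MHZ} MHz")
```

Three of the four values come out as whole multiples (4, 5 and 10), while 460 MHz lands just under 6, which hints that the article’s 460 MHz may be a rounded 460.8 MHz. That reading is a guess from the arithmetic alone.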

The system monitor overlay also gives us detailed information about the internal temperatures of the Switch. In docked mode, Doom and Wolfenstein tend to be the titles that stress the hardware most, pushing the fan hard. In an air-conditioned office at 22 °C, these two titles generated a lot of heat in the console, bringing the SoC to 60 °C and 55 °C respectively, and all this with the fan running at 47%. Higher speeds can obviously be reached, but in our experience these two titles were the ones that stressed the hardware the most, along with Luigi’s Mansion 3, which strangely drove the fan to 100%. Considering that these are technically complex titles and all of them push the CPU to 90%, that makes sense. At the same time, it indicates that there is a lot of room for overclocking: since the TJmax of the SoC is 100 °C, 60 °C is very safe. The biggest problem with overclocking is definitely fan noise, which gets very annoying above 60 °C.


But maybe higher clock rates are somehow part of Nintendo’s future plans. We know Nintendo has a developer mode that sets the CPU to 1220 MHz, a 19.6% increase in clock frequency. Our tests with the homebrew overclocking tool sys-clk show that this frequency has no noticeable impact on battery life and helps a lot with the performance issues that plague many titles.

The system monitor overlay shows that titles like Smash Bros Ultimate, Doom, Wolfenstein and Luigi’s Mansion 3 use over 90% of the CPU, so more power would definitely help improve performance. A quick test in Wolfenstein Youngblood shows big improvements in fluidity in the initial part of the first level, for example. Nintendo has shown that it is willing to change the Switch’s performance profiles: we have seen dynamic frequencies for the GPU, a boost mode for loading and a GPU set to 460 MHz in portable mode. There is therefore a good chance that the company will continue down this path.

Whatever the reason, whether for monitoring, overclocking or game mods (as we saw recently with The Witcher 3), low-level access to the SoC has allowed us to understand more fully how Nintendo’s hybrid console works and how the company continues to improve its performance. In detail, the system monitor overlay illustrates the machine’s versatility and the areas of the hardware that can be pushed further by balancing temperature, fan speed, GPU load and performance. This is the most comprehensive analysis we have done so far of the behavior of a current-generation console, and it will be interesting to see what Nintendo’s next move will be.

Source: Reddit


The dizzying evolution of the iPhone CPU and GPU over the years, on the eve of the iPhone 12


The processor Apple designs for each generation of iPhone has been a symbol of its power for a decade. Just weeks before the launch of the iPhone 12, which will presumably carry the A14 processor under its hood, we examine the dizzying evolution of these chips. Over that time CPU power has multiplied more than 160-fold and GPU power nearly 2,000-fold.

All iPhone chips and their technical evolution

| Processor | iPhone model | Year | CPU increase | GPU increase | Frequency | RAM | Process |
|---|---|---|---|---|---|---|---|
| Samsung APL0098 | Original iPhone (EDGE) | 2007 | - | - | 412 MHz | 128 MB | 90 nm |
| Samsung APL0098 | iPhone 3G | 2008 | 0% | 0% | 412 MHz | 128 MB | 90 nm |
| Samsung APL0298 | iPhone 3GS | 2009 | 100% | 100% | 600 MHz | 256 MB | 65 nm |
| A4 | iPhone 4 | 2010 | 100% | 100% | 800 MHz | 512 MB | 45 nm |
| A5 | iPhone 4s | 2011 | 100% | 800% | 800 MHz | 512 MB | 45 nm |
| A6 | iPhone 5 | 2012 | 100% | 100% | 1.3 GHz | 512 MB | 32 nm |
| A7 | iPhone 5s | 2013 | 100% | 100% | 1.4 GHz | 1 GB | 28 nm |
| A8 | iPhone 6 and 6 Plus | 2014 | 25% | 50% | 1.5 GHz | 1 GB | 20 nm |
| A9 | iPhone 6s and 6s Plus | 2015 | 70% | 90% | 1.85 GHz | 2 GB | 14 to 16 nm |
| A10 Fusion | iPhone 7 and 7 Plus | 2016 | 40% | 50% | 2.34 GHz | 2 to 3 GB | 16 nm |
| A11 Bionic | iPhone 8, 8 Plus and X | 2017 | 25% | 70% | 2.39 GHz | 2 to 3 GB | 10 nm |
| A12 Bionic | iPhone XR, XS and XS Max | 2018 | 15% | 50% | 2.49 GHz | 3 to 4 GB | 7 nm |
| A13 Bionic | iPhone 11, 11 Pro and 11 Pro Max | 2019 | 20% | 20% | 2.66 GHz | 4 GB | 7 nm |

CPU power multiplied by 160, GPU power by 1,900

Early iPhone models had a processor designed and manufactured by Samsung. The original iPhone (EDGE) and the iPhone 3G used the same processor, the Samsung APL0098, running at 412 MHz, paired with 128 MB of RAM and manufactured on a 90 nm process. Fast forward to 2019, when Apple introduced the iPhone 11 family and its in-house A13 Bionic, with dizzying specifications: 2.66 GHz, 4 GB of RAM and 7 nm.

Between the two lies a path marked by several milestones unique in the industry. The first was undoubtedly the A4, launched in 2010, since it was the first design developed by Apple itself rather than by a third party. It debuted on the original iPad and, months later, on the iPhone 4 (although underclocked). Three years later the A7 arrived in the iPhone 5s: the first 64-bit mobile chip, which caused panic in the industry.

Apple processors consistently rank among the top mobile chips for performance

Later came the A10 Fusion, which introduced cores of different sizes for different tasks (efficiency and power) to the platform. The Bionic generations then brought the Neural Engine for machine learning tasks, leading up to last year’s A13 Bionic.


If we compare the evolution and the increases in power of both CPU and GPU, we get charts like the ones accompanying this section. The CPU has multiplied its power by 164.2, while the GPU has done so by 1883.7 (the A5 alone brought a 9x GPU improvement over its predecessor). Apple tends to accompany its iPhone and iPad presentations with similar charts, especially when the jump is considerable.
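The cumulative multipliers can be reproduced by compounding the year-over-year increases from the table, starting with the iPhone 3G. This is a minimal sketch of that arithmetic; the percentages are the article’s own, while `cumulative_multiplier` and the list names are hypothetical.

```python
from math import prod

# Year-over-year increases (%) from the table, iPhone 3G (2008) to A13 (2019).
CPU_INCREASE_PCT = [0, 100, 100, 100, 100, 100, 25, 70, 40, 25, 15, 20]
GPU_INCREASE_PCT = [0, 100, 100, 800, 100, 100, 50, 90, 50, 70, 50, 20]


def cumulative_multiplier(increases_pct: list[float]) -> float:
    """Compound a sequence of percentage increases into one multiplier."""
    return prod(1 + pct / 100 for pct in increases_pct)


if __name__ == "__main__":
    print(f"CPU: x{cumulative_multiplier(CPU_INCREASE_PCT):.1f}")
    print(f"GPU: x{cumulative_multiplier(GPU_INCREASE_PCT):.1f}")
```

Because the increases compound rather than add, the single 800% GPU jump of the A5 accounts for a factor of 9 in the final figure on its own.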

An A14 processor for 2020 iPhones: 40% more CPU and 50% more GPU

The logical evolution leads to more powerful chips every year, although of course not every year brings great progress. According to some leaks, the A14 chip we will see this year will offer the following increases over the A13 Bionic:

  • CPU: 40% more, for up to 230 times the power of the original iPhone.
  • GPU: 50% more, for up to 2,825 times the power of the original iPhone.
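The projected totals follow from applying the rumored increases to the A13 multipliers given earlier (164.2 for the CPU, 1883.7 for the GPU). A short sketch of that step, where the 40% and 50% figures are the leaked values quoted above and `project` is a hypothetical helper:

```python
# Multipliers of the A13 over the original iPhone, as stated in the article.
A13_CPU_MULTIPLIER = 164.2
A13_GPU_MULTIPLIER = 1883.7


def project(current_multiplier: float, rumored_increase_pct: float) -> float:
    """Apply one more generational increase to a cumulative multiplier."""
    return current_multiplier * (1 + rumored_increase_pct / 100)


if __name__ == "__main__":
    print(f"A14 CPU (rumored): x{project(A13_CPU_MULTIPLIER, 40):.2f}")
    print(f"A14 GPU (rumored): x{project(A13_GPU_MULTIPLIER, 50):.2f}")
```

The results land within a unit of the article’s rounded 230 and 2,825; the small difference comes from starting with the already rounded A13 multipliers.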

If this forecast comes true, we would be facing a very significant rise on both fronts, one that, on the CPU side, has not been seen since the A10 Fusion in 2016. In recent years, Apple’s “A” processors have rivaled those of Intel in terms of performance, so it would be no wonder to see the company use the A14 in its first Macs with Apple Silicon.


Intel’s straitjacket and the journey to Apple Silicon

A most interesting fall lies ahead on the Apple front. Around the launch of the iPhone 12 we expect the first Mac model with Apple Silicon, with which the company will test the waters with its first in-house processor for its conventional computers. In a few weeks, our doubts will be settled.