The Intel 1.4GHz and 1.7GHz Pentium 4 processors and the D850GB mainboard
Friday, April 27, 2001
Advanced Dynamic Execution
Intel's development team pulled out all the stops in the battle to alleviate the effects of branch misprediction within the P4's Hyper Pipeline. To that end, they implemented a very aggressive out-of-order instruction handling architecture, together with a more efficient branch-prediction algorithm. The Pentium 4 can have as many as 126 instructions in flight, compared to the 42 of the Pentium III. This is combined with a "Branch Prediction Buffer" of 4K entries, compared to the 512 entries of the Pentium III. The net effect is that the Advanced Dynamic Execution engine reduces the number of branch mispredictions by about 33% compared with the branch prediction of the P6 generation of processors.
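To illustrate why a larger prediction buffer helps, here is a minimal sketch in Python. It is not Intel's algorithm, which this article does not describe; it uses the classic textbook scheme of 2-bit saturating counters indexed by branch address. The class name, the function name, the table sizes (512 and 4096 entries) and the toy branch trace are all assumptions made for illustration.

```python
class TwoBitPredictor:
    """Textbook 2-bit saturating-counter predictor (illustrative, not the P4's)."""

    def __init__(self, entries):
        self.entries = entries
        # Counter values: 0-1 predict "not taken", 2-3 predict "taken".
        self.counters = [1] * entries

    def predict(self, address):
        return self.counters[address % self.entries] >= 2

    def update(self, address, taken):
        i = address % self.entries
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(entries, trace):
    """Run a trace of (branch_address, taken) pairs and count wrong guesses."""
    predictor = TwoBitPredictor(entries)
    misses = 0
    for address, taken in trace:
        if predictor.predict(address) != taken:
            misses += 1
        predictor.update(address, taken)
    return misses


# Hypothetical trace: two branches with opposite behaviour whose addresses
# collide in a 512-entry table but not in a 4096-entry one.
trace = [(0, True), (512, False)] * 100

small = count_mispredictions(512, trace)    # 200: the branches overwrite each other
large = count_mispredictions(4096, trace)   # 1: each branch keeps its own counter
```

In this contrived case the two branches share one counter in the small table and keep undoing each other's history, while the larger table tracks them separately. Real workloads are far less extreme, so the sketch shows the mechanism only and says nothing about the 33% figure quoted above.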
The Rapid Execution Engine
Another interesting feature of the Pentium 4 is its pair of Arithmetic Logic Units (ALUs) that operate at twice the frequency of the rest of the processor. If the chip itself is clocked at 1400MHz, the twin ALUs run at 2800MHz. This arrangement goes a long way towards boosting the raw number-crunching abilities of the P4.
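The arithmetic is simple enough to check for both chips reviewed here. The helper below is a hypothetical name used only for this illustration; the factor of two is the "twice the frequency" figure given above.

```python
def alu_clock_mhz(core_mhz, multiplier=2):
    """Effective clock of the double-speed ALUs for a given core clock."""
    return core_mhz * multiplier


# The two processors covered in this article.
alu_1400 = alu_clock_mhz(1400)  # 2800
alu_1700 = alu_clock_mhz(1700)  # 3400
```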
Execution Trace Cache
The Pentium 4 also implements a different type of L1 cache than its predecessors. Instead of the split data/instruction L1 caches of the Pentium III, the Pentium 4's L1 is purely a data cache of 8KB.
With the Pentium 4, instructions are cached within the Execution Trace Cache. One of the most important aspects of this arrangement is that the Trace Cache does not store x86 instructions, but rather micro-ops that can be read directly by the processor. This matters because one of the most resource-intensive operations of an x86 CPU is the translation of x86 instructions into a format that is comprehensible to the inner workings of the CPU.
The Execution Trace Cache serves as a buffer between the decoding stage and the execution phase of an instruction's lifetime. Once an instruction has been decoded, it can immediately be stored in the Trace Cache, which avoids the work of decoding it again the next time it is called. All of this helps to reduce the penalties of flushing the 20-stage pipeline.
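The idea of caching decoded micro-ops rather than raw x86 instructions can be sketched in a few lines of Python. This is a heavily simplified model under stated assumptions: the real Trace Cache stores traces of micro-ops and has a limited capacity, whereas this sketch uses an unbounded dictionary keyed by instruction address. The class name, the fake `decode` step and the sample instructions are all hypothetical.

```python
class TraceCacheSketch:
    """Toy model: decode each instruction once, then reuse the micro-ops."""

    def __init__(self):
        self.cache = {}        # instruction address -> list of micro-ops
        self.decode_count = 0  # how many times the costly decoder ran

    def decode(self, instruction):
        # Stand-in for the expensive x86-to-micro-op translation.
        # The micro-op strings are made up for illustration.
        self.decode_count += 1
        return [f"uop{i}:{instruction}" for i in range(2)]

    def fetch(self, address, instruction):
        # On a miss, decode and store; on a hit, skip the decoder entirely.
        if address not in self.cache:
            self.cache[address] = self.decode(instruction)
        return self.cache[address]


def run_loop(iterations):
    """Execute a three-instruction loop body and report the decoder activity."""
    program = [(0, "mov eax, [esi]"), (1, "add eax, ebx"), (2, "jnz loop")]
    tc = TraceCacheSketch()
    fetches = 0
    for _ in range(iterations):
        for address, instruction in program:
            tc.fetch(address, instruction)
            fetches += 1
    return fetches, tc.decode_count


fetches, decodes = run_loop(10)  # 30 fetches, but only 3 decodes
```

After the first pass through the loop every fetch is served from the cache, so the decoder runs once per instruction no matter how many times the loop repeats.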