Tuesday, August 14, 2007

Power of Itanium 2 (Montecito)

Itanium was one of the most talked-about processors even before it hit the market in 2001. It competes with enterprise server processors such as IBM's Power and Sun's UltraSPARC. But when it reached the market, its sales did not soar as expected. This is mainly because many people are unaware of its architecture and of how to get the best out of it. Some time back, when I was in college, some people sarcastically criticized our college for buying an Itanium-based symmetric multiprocessor (SMP) system. Now I would like to clear the dirt off the byte-crunching Itanium processor.

Itanium is designed around the Explicitly Parallel Instruction Computing (EPIC) architecture, a completely new design that is neither RISC nor CISC. EPIC means that the Itanium processor does not carry the complex on-chip hardware that other processors use to identify instruction dependencies for parallel execution. This frees up chip area, which can be used to accommodate additional registers. All the parallelism has to be decided by the compiler at compile time. The resulting binary can run on the Itanium processor at mind-boggling speeds. Imagine having 128 general-purpose registers (64-bit), 128 floating-point registers (82-bit), 64 predicate registers, 8 branch registers, 128 application registers and more working on your data. Really stupendous! It follows that if your compiler does not parallelize the code for Itanium, and simply compiles it the way it would for other processors, you will find Itanium lagging, which leads people to doubt its capacity. Itanium is best suited to places where applications are fine-tuned for it; only then does one realize its sheer speed in computation.
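To make the "compiler decides the parallelism" idea concrete, here is a minimal sketch in Python. It is only a toy model and not how a real Itanium compiler or IA-64 bundling works: the `Instr` type, the `group_instructions` function and the register names are all made up for illustration. It walks a straight-line program in order and starts a new group whenever an instruction reads or writes a register that was already written in the current group, or when the group reaches the assumed width of 6.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    """Hypothetical three-address instruction used only for this sketch."""
    text: str     # human-readable form
    dest: str     # register written
    srcs: tuple   # registers read


def group_instructions(instrs, width=6):
    """Greedy, in-order packing of instructions into independent groups.

    A new group starts when the next instruction reads or writes a register
    already written in the current group, or when the group is full.
    """
    groups = []
    current = []
    written = set()   # registers written by the current group
    for ins in instrs:
        depends = ins.dest in written or any(s in written for s in ins.srcs)
        if current and (depends or len(current) >= width):
            groups.append(current)
            current, written = [], set()
        current.append(ins)
        written.add(ins.dest)
    if current:
        groups.append(current)
    return groups


program = [
    Instr("r1 = r10 + r11", "r1", ("r10", "r11")),
    Instr("r2 = r12 + r13", "r2", ("r12", "r13")),
    Instr("r3 = r1 * r2", "r3", ("r1", "r2")),
    Instr("r4 = r14 - r15", "r4", ("r14", "r15")),
    Instr("r5 = r3 + r4", "r5", ("r3", "r4")),
]

for number, group in enumerate(group_instructions(program)):
    print(number, [ins.text for ins in group])
```

In this toy run the five instructions fall into three groups of sizes 2, 2 and 1. A smarter scheduler could hoist `r4 = r14 - r15` into the first group and do better, which is exactly the kind of work that is left to the compiler rather than the chip.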

Itanium can theoretically execute 6 instructions per cycle and in practice is known to sustain about 3, while x86 processors have to discover whatever parallelism they exploit in hardware at run time. Itanium's latest release, Montecito, has 2 cores, each with 2 threads (coarse-grained multithreading: the threads do not run concurrently; one steps in when the other stalls on a memory lookup), so it appears as 4 processors. The secret of Thunder and NASA's Columbia supercomputer lies with Itanium! Apart from this, Itanium has built-in support for virtualization, which means it can run multiple operating systems concurrently. Its huge 24 MB on-die (internal) L3 cache adds to its efficiency in fetching data as quickly as possible. Itanium's robust speculative execution also adds to its performance. Itanium 2 has improved a lot in power consumption and in price/performance. Itanium's sales are picking up, and a recent survey indicated that its revenue is over 63% of SPARC system revenue and over 54% of Power system revenue.
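The coarse-grained multithreading described above can be sketched with a small simulation. This is a toy model under loud assumptions: the `run_core` function, the op strings and the 4-cycle miss latency are invented for illustration (real memory latencies are far longer, and real hardware uses a more elaborate switching policy). Each thread is a string where `C` is a one-cycle compute op and `M` is a memory access that misses; the core keeps running one thread and only switches when that thread is waiting on memory.

```python
def run_core(threads, miss_latency=4):
    """Simulate one core that runs a single thread at a time.

    threads: list of strings, 'C' = compute op, 'M' = memory access that misses.
    Returns the cycle at which every op has issued and every miss has resolved.
    """
    pc = [0] * len(threads)         # next op index for each thread
    ready_at = [0] * len(threads)   # cycle at which each thread may issue again
    cycle = 0
    active = 0                      # thread currently owning the core
    while any(pc[i] < len(t) for i, t in enumerate(threads)):
        runnable = [
            i for i, t in enumerate(threads)
            if pc[i] < len(t) and ready_at[i] <= cycle
        ]
        if not runnable:
            cycle += 1              # every unfinished thread is waiting on memory
            continue
        if active not in runnable:
            active = runnable[0]    # switch only when the active thread is stalled or done
        op = threads[active][pc[active]]
        pc[active] += 1
        cycle += 1
        if op == "M":
            ready_at[active] = cycle + miss_latency
    return max([cycle] + ready_at)


alone_a = run_core(["CMCC"])
alone_b = run_core(["CCMC"])
together = run_core(["CMCC", "CCMC"])
print(alone_a + alone_b, together)
```

In this toy setup the two threads take 8 cycles each when run one after the other (16 in total) but only 10 cycles when they share the core, because one thread computes while the other waits on memory. The threads never execute in the same cycle, which is the difference from simultaneous multithreading.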

As Tukwila (Itanium's next generation) is around the corner, packed with four cores, simultaneous multithreading (SMT, Intel's next lethal weapon), the Common System Interface and other features up its sleeve, I can smell competition brewing in the processor arena with IBM's Power7, Sun's UltraSPARC Rock (16 cores, with the number of threads yet to be decided) and AMD's Fusion.


Ananth said...

Floating-point performance is currently the only reason why I would choose an Itanium over a Niagara!

But when Rock comes out with 8 cores, 8 FPUs, and 64 threads, it will enter the HPC bastion, helped by the Magnum IB switch and Software Transactional Memory.

But what HP, IBM and Intel should be *really* worried about is this:
A 2048 thread monster coming out next year, with the only OS in the world that can scale that out of the box :-)

Praveen Krishnamoorthy said...

I heard that Rock has got 16 cores.

With the race for threads under way, will they really be exploited? I saw some benchmarks where Niagara suffered a lot even against Opteron (in single-threaded performance). With T2 having one FPU per core, I am looking forward to the SPEC benchmarks to find out whether that has really helped.

Ralph said...

Thanks for the explanation of why the Itanium 2 is completely and utterly unsuited (not to mention overpriced) for general-purpose computing. Why HP decided to push the architecture as a business solution for any typical number of processors (SMP >= 64 being an open question) defies logic and economics. One of the greatest digital engineering boondoggles of our time, for sure. Do the dismal profits on a sliver of expected sales to date even cover the development costs?