Archive for the ‘General’ Category

nVidia’s supercomputer for DARPA

Tuesday, August 10th, 2010

DARPA is giving $25 million to nVidia to research exascale computing, which should offer 1000X the performance of today's supercomputers by 2018, using future generations of nVidia’s Fermi GPU.

Again, this shows the incredible momentum that GPGPU is gaining: the US government is funding nVidia to research new architectural evolutions that raise the performance, reliability, and programmability of its GPUs.

Debunking the 100X GPU vs. CPU Myth: an Intel Paper

Monday, June 28th, 2010

Download and read the Intel paper debunking the myth that GPUs are 100X faster than CPUs on some tasks.

This paper is interesting. There are flaws in it, but I truly think that on the whole it is honest, and this honesty goes as far as Intel saying that nVidia GPUs are 14X faster than Intel’s CPUs at best, and 2.5X faster on average. I think that, depending on the selected set of tasks, this might be true or a little conservative. In any case, Intel honestly said that its CPUs are lagging far behind GPUs.

They compared a $600 CPU with a $300 GPU, and the GPU was 2.5X faster on average. Depending on whether money or performance matters most to you, you could draw different conclusions…

Upgrading an old PC

To upgrade an old PC with a PCI-Express slot, you could add a $100-$120 graphics card to reach the performance level of a brand-new Core i7 960 PC. The Core i7 960 needs a new socket, so an old PC won’t be compatible, and you will have to spend $1000 (or more!) on a new one. Assuming you resell your old PC for $300, the Intel CPU route is just 6X-7X more expensive…

High-Performance PC

You may prefer to invest in a high-performance computer because you need one. You could choose, for example, a Mac Pro with 8 cores at 2.93GHz (2 quad-core Nehalem CPUs), which is 1.8X faster than the Core i7 960 in pure computing, for a price tag of $5899. Or, for the same average computing power, take a $500 desktop and add a $200 GeForce GTX260, a $700 investment in total. The Intel CPU route is 8.4X more expensive.
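The cost ratios quoted in the two scenarios above can be checked with a few lines of arithmetic. This is only a sketch in Python using the prices given in the post; the function and variable names are made up for illustration, and it assumes both routes deliver roughly the same throughput, as the text does.

```python
def cost_ratio(cpu_route_cost, gpu_route_cost):
    """How many times more the CPU route costs than the GPU route."""
    return cpu_route_cost / gpu_route_cost

# Upgrade scenario: a new $1000 Core i7 PC minus the $300 resale of the
# old PC, versus a $100-$120 graphics card added to the old PC.
upgrade_net = 1000 - 300
upgrade_low = cost_ratio(upgrade_net, 120)   # with the most expensive card
upgrade_high = cost_ratio(upgrade_net, 100)  # with the cheapest card

# High-performance scenario: a $5899 Mac Pro versus a $500 desktop
# plus a $200 GeForce GTX260.
highend = cost_ratio(5899, 500 + 200)

print(f"upgrade: {upgrade_low:.1f}X to {upgrade_high:.1f}X")  # upgrade: 5.8X to 7.0X
print(f"high-end: {highend:.1f}X")                            # high-end: 8.4X
```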

Desktop Supercomputing

Or maybe compare a high-performance 8-core PC/Mac with what the same $5899 could give you in terms of performance, assuming you buy a basic Mac Pro 4×2.66 GHz at $2499 and upgrade it with $3400 worth of GPUs and a second power supply: $1000 for a superb 1000W power supply, plus 3 GeForce GTX 480 cards (each at least 1.5X faster than a GTX 285) and a PCI-Express expander.

On my left, 8×2.93 GHz of Intel CPU power; on my right, 3 GeForce GTX 480 cards. For the same budget in high-end computing, the GPUs give 5X to 6X more performance on average, and sometimes 20X to 30X if your task is GPU-oriented. Ouch!

Conclusions

Whether you want to upgrade an old PC to double its computing power or more on CUDA-enabled tasks, are considering buying a new high-end PC/Mac, or want the fastest desktop supercomputer, and whether it’s price/performance or raw performance that drives you, in every case nothing competes with the GPU.

Intel isn’t competing anymore in high-end supercomputing when it comes to massively scalable tasks that GPUs handle well. Whatever your metric is, nVidia GPUs are far better.

Danke sehr, Herr Srdja (thank you very much, Mr. Srdja)

Sunday, June 20th, 2010

I am in communication with a German student who is working on an OpenCL implementation of a chess engine (read his blog about Chess on OpenCL), and it’s truly interesting in many ways.

We are not following the same path: we are from 2 different generations, with different backgrounds, working on similar (or derived) technologies, with the same goal of writing a chess engine. Sharing ideas and exchanging documents with him makes my mind much more creative: I indirectly solved my move generation problem, and even the memory contention and branch divergence that I had, all at once.

It seems incredible, but exchanging ideas with people trying to solve the same problem not only enriches you directly, but also gives you the ability to mix those ideas and finally come up with surprising things! Thanks :-)

I also posted a ZIP containing a list of open-source chess engines that inspired me, as well as PDF documentation and interviews about chess computing and the people who made it possible.

It was originally for Srdja, but I think that the time I took to find them might make it really useful for anyone interested in chess computing. I will probably update it regularly to include more documents and code.

Memory limits and new developer generation…

Monday, June 14th, 2010

The new generation of developers is in the habit of counting in GB (gigabyte = 1 billion bytes) for main memory and TB (1000 billion bytes!) for storage space!

I began programming in the 70’s, so my generation had the habit of counting memory in bytes, and of not thinking of external storage as a means to avoid memory overload, just as a means to access data and store its state, keeping in mind that this data should be as close to the CPU or execution unit as possible!

My first own programmable computer was a TI-57, the one with the LED display, 50 program steps, and 8 registers (1 dedicated to decrementing loops, 1 to comparison). There were no useless instructions, execution was really slow too, and you could not afford such luxuries as external storage or non-optimized code. A great lesson in using each resource to its fullest.

Chess programmers in the 70’s had to cope with these kinds of limitations: imagine a full chess engine in ‘72 on a 4-bit micro-controller (that is, a 4-bit CPU plus peripherals on one chip), with 2KB of ROM and 80 bytes of memory (yes, 80, organized as 160 x 4 bits). David Levy and his team did that! Incredible to me!

Today, most developers of the new generation think that these limits of the past, and the know-how that old developers (like me) acquired to live with them and produce useful applications with such limited resources, are useless and should be put in a museum…

But if you look at chess on CUDA, you will discover that these limits are still there, and you’ll have to cope with them. You had better not waste a single byte of storage, ’cause you may regret it:

On each SM, you have 8 SPs (Scalar Processors), which need at least 32 threads to be kept fully busy on basic instructions, and only 16KB of shared RAM. Yes, that is 512 bytes of RAM for each thread, in a world where you usually allocate megabytes to any thread just to get it started! You could use the video card’s main memory, but you will be limited by total memory bandwidth and will face scary latency. You will even have to launch more threads to hide the latency and use your GPU’s processing power, ending up with memory as a total bottleneck: the more threads you launch, the less shared memory each one has, and the more each thread will use main memory. An exponential problem!
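To make the squeeze concrete, here is a small Python sketch of the per-thread share of shared memory as more threads are launched on one SM. The 16KB and 32-thread figures come from the paragraph above; the names are illustrative, the even split between threads is an assumption, and launching threads in multiples of 32 (one CUDA warp) is simply the obvious way to step the count.

```python
SHARED_MEM_PER_SM = 16 * 1024  # bytes of shared memory per SM (figure from the text)
WARP_SIZE = 32                 # minimum threads per SM quoted in the text (one warp)

def shared_bytes_per_thread(threads_per_sm):
    """Shared memory available to each thread if it is split evenly (assumption)."""
    return SHARED_MEM_PER_SM // threads_per_sm

# More threads hide more latency, but each one gets a smaller slice.
for warps in (1, 2, 4, 8):
    threads = warps * WARP_SIZE
    print(threads, "threads ->", shared_bytes_per_thread(threads), "bytes each")
```

With 32 threads each one gets the 512 bytes mentioned above; at 256 threads the share drops to 64 bytes, which is exactly the size of a one-byte-per-square board and leaves no room for anything else.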

So you will have to cope with 512 bytes of memory per thread if you want to use each GPU cycle efficiently on basic instructions. And it’s the same whether you consider a 2SM/16SP GeForce 9400M IGP or a 16SM/128SP GeForce 9800! The problem scales perfectly, although main memory bandwidth doesn’t on high-end cards!

Now be prepared to code like David Levy, Dan & Kathe Spracklen, and some other famous chess developers of the 70’s did: your resources are so limited that you may even struggle just to hold the list of moves in a given position. 64 bytes for the chess board, 218 possible moves at worst, 2 bytes per move (packed): you are at 500 bytes for your thread, with just 12 bytes (3 32-bit words) left! Ouch!

So how to overcome these limitations? And avoid using the video card’s main memory?
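The budget above is easy to verify, and the "2 bytes per move (packed)" figure can be illustrated with one possible encoding. The post does not say how the moves are packed, so the layout below (6 bits for the origin square, 6 bits for the destination, 4 bits of flags) is only an assumption, and `pack_move` / `unpack_move` are hypothetical helper names, written in Python for readability rather than as CUDA code.

```python
BOARD_BYTES = 64       # one byte per square, as in the text
MAX_MOVES = 218        # worst-case number of moves, as in the text
BYTES_PER_MOVE = 2     # packed move size, as in the text
THREAD_BUDGET = 512    # bytes of shared memory per thread

def pack_move(from_sq, to_sq, flags=0):
    """Pack a move into 16 bits (assumed layout: 6 bits from, 6 bits to, 4 bits flags)."""
    assert 0 <= from_sq < 64 and 0 <= to_sq < 64 and 0 <= flags < 16
    return from_sq | (to_sq << 6) | (flags << 12)

def unpack_move(move):
    """Inverse of pack_move: return (from_sq, to_sq, flags)."""
    return move & 0x3F, (move >> 6) & 0x3F, (move >> 12) & 0xF

used = BOARD_BYTES + MAX_MOVES * BYTES_PER_MOVE
left = THREAD_BUDGET - used
print(used, "bytes used,", left, "bytes left")  # 500 bytes used, 12 bytes left

move = pack_move(12, 28)       # square 12 to square 28 (e2 to e4 if a1 = 0)
print(unpack_move(move))       # (12, 28, 0)
```

Any packed move fits in one 16-bit word, so the full worst-case list takes 436 bytes; the 4 flag bits are one way to leave room for things like promotions or castling, but that use is an illustration, not something the post specifies.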

Mobile Fermi!

Wednesday, May 26th, 2010

nVidia just presented the GeForce GTX 480M Fermi Mobile GPU. It’s more of an underclocked GeForce GTX 465 (to be presented soon), and will typically offer 50% of a desktop GeForce GTX 480’s performance.

Still, it’s far ahead of the last-generation GeForce GTX 285M, which was just a GeForce 9800GTX in disguise, showing that the GT200 GPU (GTX260..295) is simply inadequate for mobile platforms or for downsizing into mid-range gamer video cards!

The most interesting thing is not the gaming performance, which will be impressive, on a par with my desktop GTX 260 or better, but Fermi being available on (huge) laptops, with a really impressive real-world OpenCL & CUDA performance level.

If you compare this laptop GPU to desktop CPUs, for example with the Folding@home distributed supercomputing project, created for CPUs and ported to GPUs on nVidia’s CUDA and ATI’s brooke(n) technology, you will have to compare it with:

- 3 high-end desktop Core i7 CPUs (or 6 laptop Core i7 Mobile CPUs!)

- 2 Radeon 5870 desktop GPUs (or a 5970 desktop GPU)

The raw numbers are more impressive on Radeon HD 5xxx GPUs, but the real-life OpenCL performance (and CUDA too) is almost unbeatable when you take CPU-developed programs ported to the GPU!

And it’s a mobile GPU! I would like to see a PC card with it, consuming 2X less than my GTX 260 while offering a better performance level and Fermi computing ability :-)
