Archive for October, 2009

New Apple iMac with Radeon 4xxx GPU

Tuesday, October 20th, 2009

Apple just unveiled a new lineup of iMacs with Radeon 4670 and 4850 GPUs (except the entry model, which has a GeForce 9400M IGP).

I don’t understand this move, because the Radeon 4xxx are under-performers on OpenCL, except on some benchmarks that just do multiply-add (MADD) or bandwidth tests. The Radeon 5xxx might have been a better choice, but they are not available yet, and it seems that nVidia and Apple divorced after one good year of successful collaboration…

I just wonder why Apple chose ATI Radeons, other than for the GPU price?

They are under-performers on OpenCL, a big novelty in Snow Leopard, and unable to play games correctly at native screen resolution (Radeon 4670 at 1920×1080 and Radeon 4850 at 2560×1440). Clearly it’s a dumb move; nobody will be satisfied by these choices, neither gamers nor OpenCL developers nor software users!

CUDA vs. CPU

Friday, October 16th, 2009

I found this title on nVidia’s forums, and I think it is misleading.

Misleading because you don’t have to consider CUDA per se anymore, but should instead consider writing code for OpenCL, which is natively supported on the Mac OS X platform (CPU, ATI GPU, nVidia GPU), by nVidia on Windows and Linux (nVidia GPU), by AMD on Windows (any-vendor CPU), and by ATI on Windows and Linux (ATI GPU). You still have to know CUDA and do some CUDA development and optimization to be able to write good OpenCL GPU code. OpenCL is now the way to go, definitely.

OpenCL vs. CPU?

OpenCL enables you to code once, then have your code running on each available GPU (if supported) and each available CPU core or CPU thread (hyper-threading), simultaneously. OpenCL eases the use of many-core CPUs and the use of GPUs at the same time, aggregating all their potential instead of opposing them.

With OpenCL, your code will run as fast as possible whether you have a single-core CPU, a 16-core 32-thread workstation or server, a computer with 3 TFlops of GPUs, or even, like me, a laptop with a dual-core CPU and 2 GPUs.

OpenCL is a way to use your multi-core CPU at 100% while aggregating the computing power of your GPU. It’s intended to unleash the full potential of modern computers, and make them run 2x to 5x faster than before :-)
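To make the "aggregating" idea concrete, here is a minimal sketch in plain Python (not OpenCL host code). It splits a batch of work items across devices in proportion to their throughput. The device names and relative speeds are made-up assumptions for illustration, not measurements, and `split_work` and `estimated_time` are hypothetical helpers.

```python
# Hypothetical devices with relative throughputs (work items per second).
# Names and numbers are illustrative assumptions, not benchmark results.
DEVICES = {
    "cpu_core_0": 1.0,
    "cpu_core_1": 1.0,
    "gpu_0": 4.0,
    "gpu_1": 2.0,
}


def split_work(total_items, throughputs):
    """Give each device a share of the items proportional to its throughput."""
    total_rate = sum(throughputs.values())
    shares = {name: int(total_items * rate / total_rate)
              for name, rate in throughputs.items()}
    # Integer rounding can leave a few items unassigned; hand them to the
    # fastest device so every item is processed exactly once.
    fastest = max(throughputs, key=throughputs.get)
    shares[fastest] += total_items - sum(shares.values())
    return shares


def estimated_time(shares, throughputs):
    """Devices run concurrently, so the slowest share sets the total time."""
    return max(shares[name] / throughputs[name] for name in shares)


cpu_only = {"cpu_core_0": 1.0, "cpu_core_1": 1.0}
all_shares = split_work(8000, DEVICES)
cpu_shares = split_work(8000, cpu_only)

print(all_shares)
print("CPU + GPU time:", estimated_time(all_shares, DEVICES))
print("CPU only time: ", estimated_time(cpu_shares, cpu_only))
```

With these assumed numbers the combined run takes a quarter of the CPU-only time. A real scheduler would also have to account for transfer costs between host and GPU memory, which this sketch ignores.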

OpenCL Sandra 2009 benchmarks results

Friday, October 9th, 2009

The first nVidia OpenCL drivers were slow, to say the least (not to mention buggy), but the latest ones are stable and really, really fast. They provide an incredible real-world level of performance:

Comparing the Core2 Quad QX9650 and Core i7 to the GeForce 9600M GT (mobile GPU) and GeForce 9600 GT shows that the current GPUs found in laptops or in low-end desktop computers will crush CPUs on many tasks and, moreover, enable you to double or triple the performance level of many applications:

SiSoftware integrated OpenCL into their PC test suite, Sandra 2009 SP4, which benchmarks CPUs with OpenCL as well as GPUs with OpenCL.

I won’t compare apples and oranges (albeit I prefer a Mac on my lap and oranges in my glass), so the best is to compare OpenCL GPU code and SSE2-optimized CPU code, with the current ForceWare 190.89 video drivers: the fastest hand-coded CPU implementation against a great GPU implementation with adapted algorithms (but no hand-coded PTX).

- GeForce 9600M GT delivers 60 Mpix/s, approximately the level of a Core2 Quad QX9650, 4×3.0GHz

- GeForce 9600 GT (under $100) delivers 170 Mpix/s, 50% over a Core i7 965, 4×3.2GHz with 8 threads, that costs $1000+!
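As a quick check on what these two figures imply, here is a small worked calculation. It only uses the numbers quoted in the list above; the prices are treated as bounds (the GPU costs at most 100 dollars, the CPU at least 1000), so the ratio it prints is a lower bound under that assumption.

```python
# Figures quoted in the list above.
gpu_mpix = 170.0     # GeForce 9600 GT throughput, Mpix/s
gpu_lead = 0.50      # the GPU is "50% over" the Core i7 965
gpu_price = 100.0    # upper bound on the GPU price, in dollars
cpu_price = 1000.0   # lower bound on the CPU price, in dollars

# Throughput implied for the Core i7 965 by the 50% lead.
cpu_mpix = gpu_mpix / (1.0 + gpu_lead)

# Throughput per dollar for each part, and how they compare.
gpu_per_dollar = gpu_mpix / gpu_price
cpu_per_dollar = cpu_mpix / cpu_price
ratio = gpu_per_dollar / cpu_per_dollar

print(f"implied Core i7 965 throughput: {cpu_mpix:.1f} Mpix/s")
print(f"GPU: {gpu_per_dollar:.2f} Mpix/s per dollar")
print(f"CPU: {cpu_per_dollar:.3f} Mpix/s per dollar")
print(f"price/performance advantage: at least {ratio:.0f}x")
```

So the Core i7 965 lands at roughly 113 Mpix/s, and on this benchmark the GeForce 9600 GT delivers at least 15 times more throughput per dollar.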

I will finish by talking about the MCP79, the GeForce 9400M integrated in most Macs. In this benchmark it offers the same level of performance as a Core2 Duo 2×2.0GHz with SSE2-optimized code, so it’s the opportunity to double (2x) performance on the MacBook Air by aggregating CPU and GPU, and to add at least 60% performance on the 2.53GHz MacBook Pro based on the GeForce 9400M.

Seems interesting, but imagine that a MacBook Air could be compared to a Core2 Duo 2×3.6GHz desktop, or a 2.53GHz MacBook Pro to a Core2 Quad 4×2.2GHz desktop?

Or a MacBook Pro like mine, 2×2.8GHz with GeForce 9600M GT, with the computational power of an overclocked Core2 Quad 4×4.5GHz desktop while running on battery, or, to compare Apple and Apple, faster than a current Mac Pro’s 4×2.26GHz CPU???

PS: Oops, I made a mistake: on a MacBook Pro in Performance Mode, the GeForce 9400M MCP79 and GeForce 9600M GT are BOTH active, and the overall performance on the SiSoftware Sandra 2009 SP4 benchmark is on a par with a quad-core (8 threads) 2.66GHz Mac Pro CPU!!!
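The desktop comparisons above can be reproduced with a naive "core-GHz" model. This is a rough sketch, assuming performance scales linearly with core count and clock speed (real scaling is not that clean). `equivalent_quad_clock` is a hypothetical helper, and each GPU is counted as the CPU it was rated against earlier in this post.

```python
def equivalent_quad_clock(cpu_cores, cpu_ghz, gpu_equiv_cores, gpu_equiv_ghz):
    """Clock (GHz) of a hypothetical quad-core with the same core-GHz total.

    Assumes performance is proportional to cores x clock, which is only a
    rough approximation.
    """
    core_ghz = cpu_cores * cpu_ghz + gpu_equiv_cores * gpu_equiv_ghz
    return core_ghz / 4


# MacBook Pro 2x2.53 GHz + GeForce 9400M (rated like a Core2 Duo 2x2.0 GHz).
mbp_9400m = equivalent_quad_clock(2, 2.53, 2, 2.0)

# MacBook Pro 2x2.8 GHz + GeForce 9600M GT (rated like a Core2 Quad 4x3.0 GHz).
mbp_9600m = equivalent_quad_clock(2, 2.8, 4, 3.0)

print(f"2.53 GHz MacBook Pro + 9400M: quad-core at about {mbp_9400m:.1f} GHz")
print(f"2.8 GHz MacBook Pro + 9600M GT: quad-core at about {mbp_9600m:.1f} GHz")
```

Under this model the two laptops come out at roughly 2.3 GHz and 4.4 GHz quad-core equivalents, close to the 4×2.2GHz and 4×4.5GHz figures given above.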

Fermi: the revolution

Friday, October 2nd, 2009

nVidia’s Fermi is a real revolution, and frankly much more interesting than Intel’s Larrabee from an OpenCL perspective.

A massive 16 TFlops of computing power on a PC, that you may use in C, Fortran or C++, with a computing model that enables you to port current code flawlessly and then optimize it for the cGPU architecture to unleash its incredible power.

I have to recall that the most powerful PC workstations of these days may reach 0.3 TFlops with their Intel CPUs, and we are talking about 50x more, available transparently in current languages, with code that will automagically run on any available CPU or cGPU in the system, on Mac OS X or Windows.
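For the record, the ratio between the two figures quoted in this post works out as follows. This only divides the numbers given above; it says nothing about how either figure was measured.

```python
fermi_tflops = 16.0        # Fermi figure quoted in this post
workstation_tflops = 0.3   # CPU workstation figure quoted in this post

speedup = fermi_tflops / workstation_tflops
print(f"about {speedup:.0f}x")
```

That gives a factor of about 53, which is where the rounded "50x" comes from.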

OpenCL is a revolution in itself, that needed Fermi to reveal its potential. Welcome to a new world!!!

Fermi / GT300 : a new era!

Thursday, October 1st, 2009

nVidia just introduced its new GPGPU architecture, code-named GT300 and officially presented as “Fermi”.

This new architecture isn’t an evolution of the existing one, as GT200 (GTX series) was over G80 (GeForce 8800), but a revolution in itself, with L1+L2 caches, a unified memory space, a hardware stack, the ability to execute full C++ code, and the ability to execute multiple DIFFERENT kernels at once. Incredible!

CPU-oriented code will run more easily on this new cGPU architecture, without headaches (or at least with fewer), and with far better performance.

Within the next 2 years, CUDA code (GPGPU-oriented) will disappear, replaced by generic OpenCL code that will run on both CPU and cGPU! :-)