ATI Mobility Radeon HD 5870: OpenCL Teraflop on a laptop

January 8th, 2010

The new ATI Mobility Radeon HD 5870 uses the same 5xxx-generation architecture as the desktop HD 5xxx cards, with full OpenCL 1.0 support (on par with nVidia's GeForce 8xxx and later), and it breaks the Teraflop barrier for a mobile GPU: 1120 Gigaflops of multiply-add (MAD) throughput!!!

Yes, you read that right: before spring you will be able to find laptops with 1 Teraflop of raw power inside their GPU, at least 20X the processing power of those laptops' multi-core CPUs!

As a reminder, the ASCI Red supercomputer broke the 1 Teraflop barrier in December 1996 with 4510 Pentium Pro processors. Only 13 years later, you can compute at a comparable pace on a laptop, at least on paper! Ouch!
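
The 1120 Gigaflops figure is a theoretical peak: number of ALUs × clock × floating-point operations per ALU per clock, with one MAD counted as two operations. The sketch below shows that arithmetic; the 800 stream processors and 700 MHz clock are assumptions supplied for illustration, not figures given in this post.

```python
# Theoretical peak single-precision throughput of a GPU, counting one
# multiply-add (MAD) as two floating-point operations per ALU per clock.
# ASSUMPTION: 800 stream processors at 700 MHz for the Mobility Radeon HD 5870;
# neither figure is stated in this article.

def peak_gflops(alus: int, clock_mhz: int, flops_per_clock: int = 2) -> float:
    """Peak GFLOPS = ALUs x clock (MHz) x FLOPs per ALU per clock / 1000."""
    return alus * clock_mhz * flops_per_clock / 1000

mobility_5870 = peak_gflops(alus=800, clock_mhz=700)
print(f"{mobility_5870:.0f} GFLOPS")  # prints "1120 GFLOPS"
```

This peak assumes every ALU issues a MAD on every clock; real kernels rarely sustain that, since memory accesses and non-MAD instructions also take cycles.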

ATI Radeon 4xxx OpenCL benchmarks

November 10th, 2009

There are some OpenCL benchmarks out there, and on the ones that test real GPGPU computation, instead of raw processing power on theoretical computation, the Radeon 4xxx series lags far, far behind nVidia's current GPUs.

ATI Radeon 3xxx and 2xxx are not supported at all, whereas nVidia's GPUs have been supported since the 2006 G80 (GeForce 8800, and any GeForce 8 series or later GPU), and the Radeon 4xxx simply underperforms because it lacks shared memory (fast on-chip memory inside each compute unit, which OpenCL calls local memory).

Lacking "shared memory" means that for every data access a Radeon 4xxx has to go to the card's global video memory, which is usually 20X to 30X slower; worse, memory bandwidth on Radeon graphics cards is 2X to 3X lower than on nVidia's. This is not a handicap for games, where Radeons are really great graphics cards, but it is for GPGPU and OpenCL.

The result of lacking shared memory and having slow graphics memory: a Radeon 4870 (around $200 street price) cannot compete with the GeForce 9400M IGP (found in the Mac Mini, MacBook Air, MacBook…), and a GeForce 9400M iMac will beat any ATI Radeon 4850 iMac when it comes to OpenCL performance! :-(
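
What CUDA calls shared memory is called local memory in OpenCL (`__shared__` in CUDA C, `__local` in OpenCL C). A toy count of global-memory reads shows why it matters. The example below is a sketch under stated assumptions: a 1D three-point stencil, a work-group size of 64 and no hardware cache are illustrative choices, not measurements of any Radeon or GeForce. Without local memory every work-item fetches its three inputs from global memory; with local memory each work-group loads its tile plus a two-element halo once, and its work-items then read from that tile.

```python
# Toy model: how many global-memory reads a 1D three-point stencil
# (out[i] = in[i-1] + in[i] + in[i+1]) needs over n elements.
# ASSUMPTIONS: no hardware cache, n is a multiple of the work-group size,
# and array edges are ignored; illustrative only, not a GPU measurement.

def global_reads_without_local(n: int) -> int:
    # Every work-item fetches its three inputs straight from global memory.
    return 3 * n

def global_reads_with_local(n: int, group_size: int) -> int:
    # Each work-group copies its tile plus one halo element on each side into
    # local ("shared") memory once; the three reads per item then hit local memory.
    groups = n // group_size
    return groups * (group_size + 2)

n, group_size = 1 << 20, 64
naive = global_reads_without_local(n)
tiled = global_reads_with_local(n, group_size)
print(naive, tiled, round(naive / tiled, 2))  # prints "3145728 1081344 2.91"
```

Every read the tiled version avoids is one that would otherwise pay the global-memory cost described above; kernels that reuse each value more often than a three-point stencil save proportionally more.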

How to use CUDA for H.264 encoding?

November 3rd, 2009

CUDA is a powerful technology: incredibly powerful GPUs and a superb suite of development, debugging and profiling tools. The x264 project tried to make it work in their excellent H.264 video encoder (which is blazingly fast, with great video quality, on the CPU).

They failed or, put differently, they chose not to use it and are considering other ways to accelerate encoding, such as dedicated hardware accelerators (like the Elgato Turbo.264 HD that I use on my laptop).

There are several ways to use CUDA as an H.264 encoding accelerator:

  • Move some CPU-hungry parts of the algorithm to the GPU. This was their first choice, but in their implementation these parts ran slower on the GPU than their CPU counterparts. FAILED!
  • Move the whole encoding chain to the GPU; but since the most computing-intensive part is actually slower on the GPU (the way they tried to implement it), it's a loss. FAILED!
  • Move the whole encoding chain to the GPU, *BUT* dynamically give it its own part of the movie to encode, so that instead of shuttling work back and forth between CPU and GPU, both work side by side on the whole encoding.

The third option is a different way to see the GPU: not as a co-processor in the middle of a CPU algorithm, but as an asymmetric computing resource, able to bring a 10% to 30% performance gain on the whole process.

This is the path I am currently exploring, aiming for a gain in H.264 encoding over pure CPU, and to be able to port it to OpenCL with dedicated algorithms for the CPU and the GPU :-)
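
The third option above can be sketched as a pull-based queue: CPU and GPU each encode whole segments and take the next one as soon as they are free, so neither waits on the other. This is a hypothetical simulation, not x264 or CUDA code. It assumes equal-cost, independently encodable segments (a real encoder would have to cut them at keyframes), and the GPU time of 5.0 per segment (one fifth of the CPU's speed) is invented purely so the result lands inside the 10% to 30% range quoted above.

```python
import heapq

# Toy simulation of the third option: CPU and GPU each encode whole, independent
# segments of the movie and pull the next segment from a shared queue as soon
# as they are free. ASSUMPTIONS: all segments cost the same, segments can be
# encoded independently, and the per-segment times are invented numbers.

def schedule(num_segments: int, seconds_per_segment: dict[str, float]):
    """Greedy dynamic scheduling: whichever worker is free first takes the next
    segment. Returns (total encoding time, segments encoded per worker)."""
    free_at = [(0.0, name) for name in seconds_per_segment]  # (time free, worker)
    heapq.heapify(free_at)
    encoded = {name: 0 for name in seconds_per_segment}
    finish = 0.0
    for _ in range(num_segments):
        t, name = heapq.heappop(free_at)     # earliest-free worker
        t += seconds_per_segment[name]       # it encodes one segment
        encoded[name] += 1
        finish = max(finish, t)
        heapq.heappush(free_at, (t, name))
    return finish, encoded

cpu_only, _ = schedule(120, {"cpu": 1.0})
both, split = schedule(120, {"cpu": 1.0, "gpu": 5.0})
print(cpu_only, both, split)               # prints "120.0 100.0 {'cpu': 100, 'gpu': 20}"
print(f"gain: {cpu_only / both - 1:.0%}")  # prints "gain: 20%"
```

Because work is pulled rather than pre-assigned, a slower or faster GPU simply ends up encoding fewer or more segments; the CPU never stalls waiting for GPU results.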

New Apple iMac with Radeon 4xxx GPU

October 20th, 2009

Apple just unveiled a new iMac lineup with Radeon 4670 & 4850 GPUs (except the entry model, which has a GeForce 9400M IGP).

I don't understand this move, because the Radeon 4xxx series just underperforms on OpenCL, except on some benchmarks that only do multiply-add (MADD) or bandwidth tests. Radeon 5xxx might have been a better choice, but they are not available yet; instead, it seems that nVidia and Apple divorced after one good year of successful collaboration…

I just wonder why Apple chose ATI Radeons, other than for the GPU price?

They underperform on OpenCL, a big novelty in Snow Leopard, and are unable to play games properly at the native screen resolution (Radeon 4670 at 1920×1080 and Radeon 4850 at 2560×1440). Clearly it's a dumb move: nobody will be satisfied by these choices, neither gamers nor OpenCL developers nor software users!

CUDA vs. CPU

October 16th, 2009

I found this title on nVidia's forums, and I think it misleads some people.

Misleading, because you no longer have to consider CUDA per se; instead, consider writing code for OpenCL, which is natively supported on the Mac OS X platform (CPU, ATI GPU, nVidia GPU), by nVidia on Windows and Linux (nVidia GPU), by AMD on Windows (any-vendor CPU), and by ATI on Windows and Linux (ATI GPU). That said, you still need to know CUDA, and to have done some CUDA development and optimization, to be able to write good OpenCL GPU code. OpenCL is now the way to go, definitely.

OpenCL vs. CPU?

OpenCL lets you code once, then run your code on every available GPU (if supported) and every available CPU core or CPU thread (hyper-threading), simultaneously. OpenCL eases the use of many-core CPUs and of GPUs at the same time, aggregating all their potential instead of pitting them against each other.

With OpenCL, your code will run as fast as possible whether you have a single-core CPU, a 16-core, 32-thread workstation or server, a computer with 3 TFlops of GPUs, or even, like me, a laptop with a dual-core CPU and 2 GPUs.

OpenCL is a way to use your multi-core CPU at 100% while aggregating the computing power of your GPU. It's intended to unleash the full potential of modern computers, and make them run 2X to 5X faster than before :-)
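
OpenCL 1.0 does not spread one kernel across devices by itself: the host program creates a command queue per device and decides how much work to enqueue on each. One simple policy for aggregating a CPU and two GPUs is to split the work in proportion to each device's throughput, so that all devices finish at about the same time. The sketch below assumes hypothetical device names and invented relative throughputs; a real host program would have to measure them.

```python
# ASSUMPTION: the device names and relative throughputs are invented for
# illustration; a real host program would measure them, e.g. with a short
# calibration run on each device.

def split_work(total_items: int, throughput: dict[str, float]) -> dict[str, int]:
    """Split total_items across devices in proportion to their throughput so that
    all devices finish at about the same time. Items lost to rounding down go,
    one each, to the fastest devices."""
    total_rate = sum(throughput.values())
    shares = {dev: int(total_items * rate / total_rate) for dev, rate in throughput.items()}
    leftover = total_items - sum(shares.values())  # always < number of devices
    for dev in sorted(throughput, key=throughput.get, reverse=True)[:leftover]:
        shares[dev] += 1
    return shares

# Hypothetical laptop: dual-core CPU plus two GPUs (an IGP and a discrete GPU).
laptop = {"cpu": 1.0, "igp": 2.0, "gpu": 5.0}
shares = split_work(1_000_000, laptop)
print(shares)  # prints "{'cpu': 125000, 'igp': 250000, 'gpu': 625000}"
# Time per device = items / throughput: 125000.0 time units on every device.
print({dev: shares[dev] / laptop[dev] for dev in laptop})
```

A static split like this relies on knowing the throughputs up front; a dynamic, pull-based split, as in the third H.264 option, adapts at run time instead.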
