Posts Tagged ‘CUDA’

Mobile Fermi!

Wednesday, May 26th, 2010

nVidia just presented the GeForce GTX 480M, a Fermi mobile GPU. It’s more of an underclocked GeForce GTX 465 (to be presented soon), and will typically offer about 50% of the performance of a desktop GeForce GTX 480.

Still, it’s well ahead of the last-generation GeForce GTX 285M, which was just a GeForce 9800 GTX in disguise, showing that the GT200 GPU (GTX 260 to 295) is simply unsuited to mobile platforms, or to being downsized for mid-range gaming video cards!

The most interesting thing is not the gaming performance, which will be impressive (on a par with my desktop GTX 260 or better), but Fermi being available on (huge) laptops, with really impressive real-world OpenCL & CUDA performance.

If you compare this laptop GPU to desktop CPUs, for example with the Folding@home distributed supercomputing project, created for CPUs and ported to GPUs on nVidia’s CUDA and ATI’s brooke(n) technology, you will have to compare it with:

- 3 high-end desktop Core i7 CPUs (or 6 laptop Core i7 Mobile CPUs!)

- 2 Radeon 5870 desktop GPUs (or one 5970 desktop GPU)

The raw numbers are more impressive on Radeon HD 5xxx GPUs, but Fermi’s real-life OpenCL performance (and CUDA too) is almost unbeatable when you take programs developed for CPUs and port them to GPUs!

And it’s a mobile GPU! I would like to see a PC card with it, consuming half the power of my GTX 260 while offering better performance and Fermi computing ability :-)

How to use CUDA for H.264 encoding?

Tuesday, November 3rd, 2009

CUDA is a powerful technology, combining incredibly powerful GPUs with a superb suite of development, debugging and profiling tools. The x264 project tried to make it work in their excellent H.264 video encoder (which is blazingly fast on CPU, with great video quality).

They failed, or to put it differently, they chose not to use it and to consider other ways to accelerate encoding instead, such as dedicated hardware accelerators (like the Elgato Turbo.264 HD that I use on my laptop).

There are several ways to use CUDA as an H.264 encoding accelerator:

  • Move some CPU-hungry part of the algorithm to the GPU. This was their first choice, but in their implementation that part seems slower on the GPU than its CPU counterpart. FAILED!
  • Move the whole encoding chain to the GPU, but as the most compute-intensive part is actually slower on the GPU (as they tried to implement it), it’s a loss. FAILED!
  • Put the whole encoding chain on the GPU, *BUT* give it a different part of the movie to encode, dynamically, and instead of switching over to the GPU, combine the CPU and GPU to do the whole encoding.

The third option is a different way to consider the GPU: not as a co-processor in the middle of a CPU algorithm, but as an asymmetric computing resource, able to give a 10% to 30% performance gain on the whole process.
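To make the third option more concrete, here is a minimal sketch in Python of the scheduling idea only, not of any real encoder. It assumes the movie can be cut into independent segments (for example at keyframe boundaries), and that each device needs a fixed time per segment. The function name `schedule_segments`, the device names and the timings (1 second per segment on the CPU, 4 seconds on the GPU) are all hypothetical, chosen purely for illustration. Each segment is handed to whichever device becomes free first, so the slower device simply takes fewer segments.

```python
import heapq


def schedule_segments(num_segments, seconds_per_segment):
    """Greedy dynamic dispatch: each segment goes to the device that frees up first.

    seconds_per_segment maps a device name to its (hypothetical) encode time
    per segment. Returns (makespan_seconds, {device: [segment indices]}).
    """
    # Min-heap of (time at which the device becomes free, device name).
    free_at = [(0.0, name) for name in sorted(seconds_per_segment)]
    heapq.heapify(free_at)
    assigned = {name: [] for name in seconds_per_segment}
    makespan = 0.0
    for segment in range(num_segments):
        start, name = heapq.heappop(free_at)
        done = start + seconds_per_segment[name]
        assigned[name].append(segment)
        makespan = max(makespan, done)
        heapq.heappush(free_at, (done, name))
    return makespan, assigned


# Hypothetical timings: the GPU encoder is 4x slower than the CPU encoder.
cpu_only, _ = schedule_segments(100, {"cpu": 1.0})
combined, assigned = schedule_segments(100, {"cpu": 1.0, "gpu": 4.0})
speedup = cpu_only / combined
print(f"CPU only: {cpu_only:.0f}s, CPU+GPU: {combined:.0f}s, speedup: {speedup:.2f}x")
print({name: len(segments) for name, segments in assigned.items()})
```

With these made-up numbers, a GPU encoder that is four times slower than the CPU one still takes 20 of the 100 segments and brings the total from 100 to 80 simulated seconds, a 1.25x speedup. The point of the sketch is that a GPU path that loses to the CPU on its own can still contribute when it works on its own share of the movie in parallel.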

This is the approach I am currently exploring, with the aim of getting a gain in H.264 encoding over pure CPU, and of being able to port it to OpenCL with dedicated algorithms for CPU and GPU :-)