Posts Tagged ‘OpenCL’

OpenCL performance surprise on MacBook Pro

Wednesday, March 3rd, 2010

I own a 17″ MacBook Pro with a 2.8 GHz Core 2 Duo, an MCP79 IGP (GeForce 9400M with 16 SPs), and a GeForce 9600M GT (32 SPs), which I use to develop for CUDA and OpenCL. I like being able to use the GeForce 9400M and the 9600M GT in parallel: the 9400M to try pinned mapped memory exchange with the CPU while kernels are running, the 9600M GT for 2.5X more performance and its dedicated 512 MB, and both together to improve dynamic load balancing between GPUs, or between the CPU and GPUs.

This computer can be set so that only the GeForce 9400M is active and the 9600M GT is shut down; this is called “Better battery life” in System Preferences. I don’t use it, since I want the best performance for OpenCL and CUDA, and the ability to use either GPU at any time for computing, and sometimes both.

The “Better Performance” setting

In this default mode, which I use every day, both GPUs are active and visible to both OpenCL and CUDA. They also appear together on the “About this Mac” page as graphics displays.

So you can run your CUDA or OpenCL code on either of these GPUs, although the GeForce 9600M GT runs at 1.25 GHz and the GeForce 9400M at only around 400 MHz. But they are both usable.

In this mode, the Galaxy OpenCL Benchmark gives 20 GFLOPS on the CPU, 7 GFLOPS on the 9400M (400 MHz) and 43 GFLOPS on the 9600M GT.

The “Better battery life” setting

The GeForce 9600M GT disappears from the “About this Mac” video-card list, so it seems to be deactivated completely… but… taddammmmmm

-edit- on the latest system version, 10.6.2, they both appear again in the “Better battery life” setting!

In the Galaxy OpenCL Benchmark, and in the OpenCL list of devices, it reappears, fully usable. Moreover, the GeForce 9600M GT runs at full speed, 1.25 GHz, and so does the GeForce 9400M, at 1.1 GHz (instead of 400 MHz in the “Better Performance” setting!).
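To check which devices OpenCL exposes in each mode, something like the following sketch could be used. It assumes the third-party `pyopencl` binding is installed (it is not mentioned in the post), and `list_opencl_devices` is a hypothetical helper name. Note that the clock value OpenCL reports is the maximum configured frequency, so whether it follows the power setting depends on the driver.

```python
try:
    import pyopencl as cl  # assumption: third-party binding, not part of the post
except ImportError:
    cl = None


def list_opencl_devices():
    """Return one tuple per OpenCL device:
    (platform name, device name, device type, compute units, max clock in MHz).
    Returns an empty list if pyopencl is missing or enumeration fails."""
    if cl is None:
        return []
    devices = []
    try:
        for platform in cl.get_platforms():
            for device in platform.get_devices():
                devices.append((
                    platform.name,
                    device.name,
                    cl.device_type.to_string(device.type),
                    device.max_compute_units,
                    device.max_clock_frequency,  # maximum configured clock, in MHz
                ))
    except cl.Error:
        # No usable OpenCL platform: keep whatever was collected so far.
        pass
    return devices


for row in list_opencl_devices():
    print(row)
```

Running it once in each System Preferences setting and comparing the two lists would show whether both GPUs stay visible.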

And here are the Galaxy OpenCL Benchmark results: CPU 22 GFLOPS (+10%), 9400M 19 GFLOPS (2.7X faster) and 9600M GT 45 GFLOPS (+5%). Yes, it’s faster by every metric than the “Better Performance” mode, at least for OpenCL development, with a total gain of 16 GFLOPS (+23% overall).

The GeForce 9600M GT is faster because it doesn’t have to handle graphics anymore; the CPU is faster because the IGP runs at 1.1 GHz instead of 400 MHz, which improves memory I/O; and the 9400M is far faster running at 1100 MHz instead of 400 MHz, even though it has to drive the video output and the OpenGL display!
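The percentages above can be rechecked with a few lines of arithmetic. This sketch only uses the benchmark figures quoted in this post; the variable and function names are made up for illustration.

```python
# Galaxy OpenCL Benchmark figures quoted in the post, in GFLOPS.
better_performance = {"CPU": 20, "9400M": 7, "9600M GT": 43}
better_battery_life = {"CPU": 22, "9400M": 19, "9600M GT": 45}


def summarize(before, after):
    """Return per-device speed ratios and the two totals."""
    ratios = {name: after[name] / before[name] for name in before}
    return ratios, sum(before.values()), sum(after.values())


ratios, total_before, total_after = summarize(better_performance, better_battery_life)

for name, ratio in ratios.items():
    print(f"{name}: x{ratio:.2f}")
# CPU: x1.10
# 9400M: x2.71
# 9600M GT: x1.05

gain = total_after - total_before
percent = 100 * (total_after / total_before - 1)
print(f"total: {total_before} -> {total_after} GFLOPS (+{gain}, +{percent:.0f}%)")
# total: 70 -> 86 GFLOPS (+16, +23%)
```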

And unplugged on battery?

Performance is identical, although the GPUs take more time to reach their maximum frequencies, due to the energy-saving policy. So battery or AC power doesn’t matter from a performance point of view, in either “Better battery life” or “Better performance” mode.

So which mode to choose?

It’s clear: if you have OpenCL-enabled software, go for the “Better battery life” setting, because the CPU is faster anyway (10% more, a FREE upgrade for your Mac! lol!) and OpenCL is faster too, whichever GPU the application uses!

Notice that a laptop can provide 86 GFLOPS of processing power on the Galaxy benchmark, which is a real-world astrophysics application, not a simple MAD benchmark that only favors the number of cores on a GPU. These 86 GFLOPS are well above a current 8-core 2.66 GHz Mac Pro with 16 threads (2 quad-core Xeon processors).

I want to see more and more OpenCL-enabled applications!

Apple Aperture 3 and OpenCL

Thursday, February 11th, 2010

Apple Aperture 3 is probably the first mainstream application to use OpenCL technology. It’s not in the specifications or technical information, but it uses OpenCL for RAW decoding and processing, from start to finish, and it’s a brilliant idea, even if the software is not as fast as I expected.

I discovered this after reading some forums and trying Aperture 3, doing the same tasks on the GeForce 9400M IGP of my 17″ MacBook Pro and on the GeForce 9600M GT GPU (approx. 3X faster). Simple tasks such as thumbnail generation are really faster with the latter, showing real use of the GPU as a resource. This is not definitive proof that OpenCL is used, but since Aperture 3 only supports Snow Leopard, and Snow Leopard’s Core Image technology switched from OpenGL shaders to OpenCL, it is highly probable.

Anyway, besides all the drawbacks of Aperture 3 (memory usage, CPU usage, stupid multi-threading implementation…) that let Lightroom rule the market, it’s cool to see a new technology in use, and the turbo boost that OpenCL may give to mainstream applications!

As I stated on some forums about choosing between a MacBook Pro with the GeForce 9400M IGP and one with a “real” GPU, the GeForce 9600M GT: with OpenCL in use, the former will stay slow, albeit with a fast GPU, while the latter will be faster with new applications, giving it a longer life as a useful production tool!

ATI Radeon 4xxx OpenCL benchmarks

Tuesday, November 10th, 2009

There are some OpenCL benchmarks out there, and on OpenCL Benchmark, which tests real GPGPU computation instead of pure processing power on theoretical computation, the Radeon 4xxx series lags far, far behind current nVidia GPUs.

The ATI Radeon 3xxx and 2xxx are not supported, whereas nVidia’s GPUs have been supported since the 2006 G80 (GeForce 8800 and any GeForce 8 series or later GPU), and the Radeon 4xxx are just underperforming, lacking shared memory (memory inside each processor core).

Lacking “shared memory” means that for any data access the Radeon 4xxx has to go to global video card memory, which is usually 20X to 30X slower, and worse, memory bandwidth on Radeon graphics cards is 2X to 3X lower than on nVidia’s. This is not a handicap for games, where Radeons are really great graphics cards, but it is for GPGPU and OpenCL.

The result of the lack of shared memory and the slow graphics card memory: a Radeon 4870 (around $200 street) cannot compete with the GeForce 9400M IGP (found in the Mac Mini, MacBook Air, MacBook…), and a GeForce 9400M iMac will beat any ATI Radeon 4850 iMac when it’s time to compare OpenCL performance! :-(
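To get a feel for why this matters, here is a deliberately simplified toy model, not a measurement. It assumes each value is read `reuse` times by a kernel, that a shared-memory read costs 1 unit, and that a global-memory read costs 25 units (the middle of the 20X to 30X range above). It ignores caches, coalescing and latency hiding, and all names are hypothetical.

```python
def average_read_cost(reuse, has_shared_memory, global_cost=25.0, shared_cost=1.0):
    """Average cost of one read, in arbitrary units, when each value is read `reuse` times."""
    if not has_shared_memory:
        # Every single read goes to global video memory.
        return global_cost
    # One global read stages the value into shared memory,
    # then all `reuse` reads are served from shared memory.
    return (global_cost + reuse * shared_cost) / reuse


with_shared = average_read_cost(16, has_shared_memory=True)
without_shared = average_read_cost(16, has_shared_memory=False)
print(with_shared, without_shared)  # 2.5625 25.0
```

Under these assumptions, a kernel that reuses each value 16 times pays roughly ten times less per read when shared memory is available.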

How to use CUDA for H.264 encoding?

Tuesday, November 3rd, 2009

CUDA is a powerful technology, with incredibly powerful GPUs and a superb suite of development, debugging and profiling tools. The x264 project tried to make it work in their excellent H.264 video encoder (which is blazingly fast on the CPU, with great video quality).

They failed, or to put it differently, they chose not to use it and to consider other ways to accelerate encoding, such as dedicated hardware accelerators (like the ElGato Turbo.264 HD that I use on my laptop).

There are several ways to use CUDA as an H.264 encoder accelerator:

  • Move some CPU-hungry part of the algorithm to the GPU. This was their first choice, but that part seems slower in their GPU implementation than its CPU counterpart. FAILED!
  • Move the whole encoding chain to the GPU, but as the most compute-intensive part is actually slower on the GPU (as they tried to implement it), it’s a loss. FAILED!
  • Move the whole encoding chain to the GPU, *BUT* give it a different part of the movie to encode, dynamically, and instead of swapping to the GPU, combine the CPU and GPU to do the whole encoding.

The third option is a different way to consider the GPU: not as a co-processor in the middle of a CPU algorithm, but as an asymmetric computing resource, able to give a 10% to 30% performance gain on the whole process.

This is the way I am currently exploring, aiming to obtain a gain in H.264 encoding over pure CPU, and to be able to port it to OpenCL with dedicated algorithms for the CPU and GPU :-)
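The scheduling idea behind the third option can be sketched with a shared work queue: each worker pulls the next movie segment as soon as it is free, so the faster device naturally takes more segments. This is only an illustration, not the actual implementation. The two encoders are stand-ins that just sleep, the 4:1 timing ratio is an assumption, and segments are assumed to be independently encodable.

```python
import queue
import threading
import time


def encode_segments(segments, workers):
    """Encode every segment, letting each worker pull the next one when it is free.

    `workers` maps a worker name to a function that encodes one segment.
    Returns a dict {segment: name of the worker that encoded it}.
    """
    pending = queue.Queue()
    for segment in segments:
        pending.put(segment)

    done = {}
    lock = threading.Lock()

    def run(name, encode):
        while True:
            try:
                segment = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to encode
            encode(segment)
            with lock:
                done[segment] = name

    threads = [threading.Thread(target=run, args=item) for item in workers.items()]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return done


# Hypothetical stand-ins for real encoders: the "GPU" path is assumed 4X slower here.
def fake_cpu_encode(segment):
    time.sleep(0.010)


def fake_gpu_encode(segment):
    time.sleep(0.040)


assignment = encode_segments(range(20), {"cpu": fake_cpu_encode, "gpu": fake_gpu_encode})
print(sorted(assignment.items()))
```

With these made-up timings the slower worker still takes a share of the segments, which is where a gain in the 10% to 30% range would come from; a real encoder would also need segment boundaries that can be encoded independently.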

ATI’s OpenCL CPU-Only!

Friday, August 7th, 2009

While nVidia currently supports OpenCL on its GPUs, but not on the main CPU, ATI offers its own drivers that support the main CPU but not its GPUs! Anyway, ATI’s GPUs are not really designed for GPGPU and will lag far, far behind nVidia’s on real OpenCL implementations!

The purpose of OpenCL is to enable code to run on both CPUs and GPUs (even a mix of ATI and nVidia), not to run either on the CPU only (what’s the novelty???) or restricted to a proprietary GPU!!!

At this time, CUDA seems to be the technology path to follow before switching to OpenCL in 2010 or 2011…