Archive for July, 2009

GTS 240/250 : The return of GeForce 8800

Monday, July 27th, 2009

nVidia launched the GTS 250 months ago, based in fact on the GeForce 9800 GTX+ design (that is, an overclocked 8800 inside!!!), and now comes the GTS 240, which is a G92b GPU from the GeForce 8800 lineup.

I am happy to have a “real” GeForce 8800 to play with, but I wonder if consumers will pay twice its value for repackaged two-year-old hardware?

For CUDA, I regret that nVidia is launching old hardware instead of pushing devices with CUDA compute capability 1.3, even at low clock frequencies, because we all need CUDA 1.3 devices (shared-memory atomics, double-precision floating point, twice the register count …) to write better CUDA or OpenCL software.

Censorship on nVidia’s Forums

Wednesday, July 22nd, 2009

I wrote an answer to a post on nVidia’s Forums (OpenCL category), where a user asked whether he should use OpenCL or CUDA for development right now.

My answer went straight to some points about OpenCL’s immaturity, its lack of support, the shortcomings of the current (nVidia’s) implementation, and its undelivered promises (heterogeneous computing between CPU and GPUs), and finished with the learning curve for OpenCL.

This was just factual, and it was censored by the nVidia team :-(

So expect to find this discussion here in the following days!

PS: the post is back, so I wonder whether it was an error on my side or someone put it back online??? Anyway, as with any technology, we should be able to discuss it frankly, challenge it, and push nVidia to make it better and better!

CUDA 2.3 SDK

Tuesday, July 21st, 2009

The new CUDA SDK 2.3 is here. I just downloaded it for my 32-bit Vista and Mac OS X machines, and I want to try them ASAP.

A good evening, after seeing Armstrong walking on the moon :-)

Inter-GPU communication in CUDA 2.2

Monday, July 20th, 2009

The CUDA 2.2 device driver added mapped pinned-memory support to CUDA: pages of the computer’s main memory that are physically allocated (page-locked) and readable and writable by any CUDA-enabled GPU.

This is the feature we all needed to enable multi-GPU asynchronous communication!!!

Until now, the only way for multiple GPUs (say, the 2 GPUs of a GTX 295) to communicate was to stop the current GPU kernel execution, return to the CPU host, wait for the other GPU to stop its kernel execution, and then send them the information via the PCIe bus. You ended up stopping kernel execution all the time and switching back to the CPU even when no communication was pending!!!

Now you just have to maintain queues in mapped pinned memory, one for each ordered producer/consumer pair (n × (n-1)): 2 for 2 GPUs, 6 for 3 GPUs, … and 56 for 8 GPUs (each queue can still be over 64 KB, even on a low-end computer!), and you could even use the macro-threading technology to handle them :-)
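To make the n × (n-1) count concrete, here is a minimal sketch in plain C. The helper names `queue_count` and `queue_index` are hypothetical (they are not part of CUDA), and the layout is only one possible way to number the queues: one slot per ordered (producer, consumer) pair, skipping the pair where a GPU would talk to itself.

```c
#include <assert.h>

/* Number of one-way queues needed so that every GPU can send to every
 * other GPU: one queue per ordered (producer, consumer) pair. */
int queue_count(int gpus)
{
    assert(gpus >= 1);
    return gpus * (gpus - 1);
}

/* Hypothetical layout: index of the queue written by GPU `src` and read
 * by GPU `dst` in a flat array of queue_count(gpus) queues. Each producer
 * owns (gpus - 1) consecutive slots; the slot for dst == src is skipped. */
int queue_index(int src, int dst, int gpus)
{
    assert(src >= 0 && src < gpus);
    assert(dst >= 0 && dst < gpus);
    assert(src != dst); /* a GPU needs no queue to itself */
    return src * (gpus - 1) + (dst < src ? dst : dst - 1);
}
```

With this numbering, `queue_count(8)` gives the 56 queues mentioned above, and each queue has exactly one writer and one reader, so no two GPUs ever write to the same queue.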

PS: It seems to be available only on IGPs (integrated graphics such as the MacBook’s 9400M) and GT200 (GTX 260+) graphics cards :-(

sha1 contest : win iPhone, MacBook Pro and more…

Saturday, July 18th, 2009

I saw a SHA-1 contest that will start next Monday and run for 30 hours.

Too late for me, but as it is relatively simple to do a SHA-1 hashed-password attack with CUDA, I plan to write a little tool for that, just for fun…

You just have to avoid generating the useless intermediate 2560-bit buffer (80 words of 32 bits), using 16 registers instead, plus 8 for computing, and, as my last post stated, with macro-threading you could occupy nearly 100% of the GPU cycles of each scalar processor while checking a global-memory entry that flags whether any thread has found the password.

It will be simpler using macros and manually unrolling the whole process, as the best CPU implementations do! (I think it’s on BSD?)

No thread synchronization (stopping!) needed, no communication penalty, no need to access shared memory (anyway, coalescing will be easy by interleaving data :-) )
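As a rough illustration of the 16-word trick, here is a sketch in plain C (not CUDA, and not the tool itself). It keeps a rolling window of 16 message-schedule words instead of the full 80-word array. The function names `sha1_block` and `sha1_short` are my own for this example, and `sha1_short` assumes a message of at most 55 bytes so that it fits in a single block, which is the usual case for a password.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t rotl32(uint32_t x, int n)
{
    return (x << n) | (x >> (32 - n));
}

/* Compress one 64-byte block into h[5]. Only 16 schedule words are kept:
 * w[t & 15] holds W[t-16] until it is overwritten with W[t]. */
void sha1_block(uint32_t h[5], const uint8_t block[64])
{
    uint32_t w[16];
    uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];

    for (int t = 0; t < 80; ++t) {
        uint32_t wt, f, k, tmp;
        if (t < 16) {
            /* First 16 words come straight from the block, big-endian. */
            wt = ((uint32_t)block[4 * t] << 24) |
                 ((uint32_t)block[4 * t + 1] << 16) |
                 ((uint32_t)block[4 * t + 2] << 8) |
                 (uint32_t)block[4 * t + 3];
        } else {
            /* W[t] = rotl1(W[t-3] ^ W[t-8] ^ W[t-14] ^ W[t-16]) */
            wt = rotl32(w[(t - 3) & 15] ^ w[(t - 8) & 15] ^
                        w[(t - 14) & 15] ^ w[t & 15], 1);
        }
        w[t & 15] = wt;

        if (t < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
        else if (t < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
        else if (t < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }

        tmp = rotl32(a, 5) + f + e + k + wt;
        e = d; d = c; c = rotl32(b, 30); b = a; a = tmp;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

/* Hash a short message (at most 55 bytes) that fits in one padded block. */
void sha1_short(const uint8_t *msg, size_t len, uint32_t digest[5])
{
    uint8_t block[64] = {0};
    uint64_t bits = (uint64_t)len * 8;

    assert(len <= 55);
    memcpy(block, msg, len);
    block[len] = 0x80;                 /* padding marker */
    block[62] = (uint8_t)(bits >> 8);  /* length in bits, big-endian; */
    block[63] = (uint8_t)bits;         /* at most 440, so 2 bytes suffice */

    digest[0] = 0x67452301u; digest[1] = 0xEFCDAB89u;
    digest[2] = 0x98BADCFEu; digest[3] = 0x10325476u;
    digest[4] = 0xC3D2E1F0u;
    sha1_block(digest, block);
}
```

Calling `sha1_short` on the 3 bytes of "abc" should give the standard test digest a9993e36 4706816a ba3e2571 7850c26c 9cd0d89d. On a GPU the loop would be fully unrolled so that the 16 words and the working variables can stay in registers; the loop form here is only for readability.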

SHA-1 is typical of the algorithms that can be ported easily to CUDA!