The current MacBook Pro line has 2 GPUs: a 9400M IGP (MCP79) w/ 256MB of shared DDR3, and a real GPU, the 9600M GT, w/ 512MB of video RAM.
But Apple advertised that you could only use one at a time: either the 9400M IGP, in maximum battery life mode, or the 9600M GT, in maximum performance mode, selected in the Energy Saver preferences…
The reality is quite different…
When you run on the 9400M IGP, the 9600M GT is disabled and doesn’t appear in CUDA’s deviceQuery. That is to be expected: shutting down the 9600M GT reduces power consumption.
But when you run on the 9600M GT, in maximum performance mode, the 9400M, which is part of the chipset, is not disabled: it appears in deviceQuery and, moreover, it can be tested using bandwidthTest -device=1. Here is the deviceQuery output:
CUDA Device Query (Runtime API) version (CUDART static linking)
There are 2 devices supporting CUDA
Device 0: "GeForce 9600M GT"
CUDA Driver Version: 2.30
CUDA Runtime Version: 2.30
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 1
Total amount of global memory: 536543232 bytes
Number of multiprocessors: 4
Number of cores: 32
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 0.78 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: Yes
Integrated: No
Support host page-locked memory mapping: No
Compute mode: Default (multiple host threads can use this device simultaneously)
Device 1: "GeForce 9400M"
CUDA Driver Version: 2.30
CUDA Runtime Version: 2.30
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 1
Total amount of global memory: 266010624 bytes
Number of multiprocessors: 2
Number of cores: 16
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 0.40 GHz
Concurrent copy and execution: No
Run time limit on kernels: Yes
Integrated: Yes
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Test PASSED
Needless to say, this is great news for me: it lets me check real GPU code against the IGP and, moreover, start using both of them together to get a GPU load-balancer working, as a kind of asymmetrical SLI.
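To give an idea of what such a load-balancer has to decide, here is a minimal sketch in Python that splits a batch of work items between the two devices. It assumes that cores x clock rate (taken from the deviceQuery output above) is a usable first guess at relative throughput, which is only an assumption: a real balancer would more likely measure each device, since memory bandwidth differs a lot between the IGP and the 9600M GT. The function name split_work and the DEVICES table are hypothetical, made up for this illustration.

```python
# Hypothetical helper: split a batch of work items between CUDA devices
# in proportion to a rough throughput estimate (cores x clock).

def split_work(total_items, devices):
    """Return one item count per device, summing to total_items.

    devices is a list of (name, cores, clock_mhz) tuples. The weight
    cores * clock_mhz is an assumed proxy for throughput, not a benchmark.
    """
    weights = [cores * clock_mhz for _name, cores, clock_mhz in devices]
    total_weight = sum(weights)
    # Integer share for each device, rounded down.
    shares = [total_items * w // total_weight for w in weights]
    # Items lost to rounding down go to the device with the largest weight.
    leftover = total_items - sum(shares)
    shares[weights.index(max(weights))] += leftover
    return shares


# Core counts and clock rates as reported by deviceQuery.
DEVICES = [
    ("GeForce 9600M GT", 32, 780),  # 0.78 GHz
    ("GeForce 9400M", 16, 400),     # 0.40 GHz
]

if __name__ == "__main__":
    for (name, _cores, _clock), share in zip(DEVICES, split_work(1000, DEVICES)):
        print(f"{name}: {share} items")
```

With these weights, roughly four fifths of the items go to the 9600M GT and the rest to the 9400M. Each share would then be handed to a host thread bound to its own device.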