<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>CUDA Chess &#187; OpenCL</title>
	<atom:link href="http://blog.cudachess.org/tag/opencl/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.cudachess.org</link>
	<description>CUDA Open-Source Chess Software and general considerations</description>
	<lastBuildDate>Wed, 11 Aug 2010 22:58:39 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9-rare</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
		<item>
		<title>OpenCL performance surprise on MacBook Pro</title>
		<link>http://blog.cudachess.org/2010/03/opencl-performance-surprise-on-macbook-pro/</link>
		<comments>http://blog.cudachess.org/2010/03/opencl-performance-surprise-on-macbook-pro/#comments</comments>
		<pubDate>Wed, 03 Mar 2010 19:14:18 +0000</pubDate>
		<dc:creator>iAPX</dc:creator>
				<category><![CDATA[OpenCL]]></category>
		<category><![CDATA[Galaxy OpenCL Benchmark]]></category>

		<guid isPermaLink="false">http://blog.cudachess.org/?p=148</guid>
		<description><![CDATA[I own a 17&#8243; MacBook Pro with a 2.8GHz Core 2 Duo, an MCP79 IGP (GeForce 9400M with 16 SPs), and a GeForce 9600M GT (32 SPs), which I use to develop for CUDA and OpenCL. I like being able to use the GeForce 9400M and 9600M GT in parallel: the GeForce 9400M to try pinned mapped memory exchange with the CPU while kernels are running, the 9600M [...]]]></description>
			<content:encoded><![CDATA[<p>I own a 17&#8243; MacBook Pro with a 2.8GHz Core 2 Duo, an MCP79 IGP (GeForce 9400M with 16 SPs), and a GeForce 9600M GT (32 SPs), which I use to develop for CUDA and OpenCL. I like being able to use the GeForce 9400M and 9600M GT in parallel: the GeForce 9400M to try pinned mapped memory exchange with the CPU while kernels are running, the 9600M GT for 2.5X more performance and its dedicated 512MB, and both together to improve dynamic load-balancing between GPUs, or between the CPU and the GPUs.</p>
<p>This computer can be set so that only the GeForce 9400M is active and the 9600M GT is shut down; this is the &#8220;Better battery life&#8221; setting in System Preferences. I don&#8217;t use it, since I want the best performance for OpenCL and CUDA, and the ability to use either GPU at any time for computing, and sometimes both.</p>
<p><strong>The &#8220;Better Performance&#8221; setting</strong></p>
<p>In this default mode, which I use every day, each GPU is active and visible to both OpenCL and CUDA. They also appear together on the &#8220;About this Mac&#8221; page as graphics displays.</p>
<p>So you can run your CUDA or OpenCL code on either GPU, although the GeForce 9600M GT runs at 1.25GHz and the GeForce 9400M at only around 400MHz. Both are usable.</p>
<p>In this mode, <a href="http://www.insanelymac.com/forum/index.php?showtopic=182874">Galaxy OpenCL Benchmark</a> reports 20 Gflops for the CPU, 7 Gflops for the 9400M (at 400MHz) and 43 Gflops for the 9600M GT.</p>
<p><strong>The &#8220;Better battery life&#8221; setting</strong></p>
<p>The GeForce 9600M GT disappears from the &#8220;About this Mac&#8221; video-card list, so it seems to be completely deactivated&#8230; but&#8230; ta-da!</p>
<p><em>-edit- on the latest system version, 10.6.2, they both appear again in the &#8220;Better battery life&#8221; setting!</em></p>
<p>In the Galaxy OpenCL Benchmark, and in the OpenCL list of devices, it reappears, fully usable. Moreover, the GeForce 9600M GT runs at full speed, 1.25GHz, and so does the GeForce 9400M, at 1.1GHz (instead of 400MHz in the &#8220;Better Performance&#8221; setting!).</p>
<p>And here are the Galaxy OpenCL Benchmark results: CPU 22 Gflops (+10%), 9400M 19 Gflops (2.7X faster) and 9600M GT 45 Gflops (+5%). Yes, it&#8217;s faster on every metric than the &#8220;Better Performance&#8221; mode, at least for OpenCL development, with a total gain of 16 Gflops (+23% overall).</p>
<p>The GeForce 9600M GT is faster because it doesn&#8217;t have to handle graphics anymore; the CPU is faster because the IGP runs at 1.1GHz instead of 400MHz, which improves memory I/O; and the 9400M is far faster running at 1.1GHz instead of 400MHz, even though it still has to drive the video output and the OpenGL display!</p>
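<p>For anyone who wants to recheck the percentages, here is a minimal Python sketch that recomputes them from the Gflops figures quoted in this post. The dictionary and function names are only illustrative; nothing here comes from the benchmark itself.</p>

```python
# Galaxy OpenCL Benchmark figures quoted in this post, in Gflops.
better_performance = {"CPU": 20, "9400M": 7, "9600M GT": 43}
better_battery_life = {"CPU": 22, "9400M": 19, "9600M GT": 45}


def summarize(before, after):
    """Return per-device speedup ratios, both totals, the gain and its percentage."""
    ratios = {name: after[name] / before[name] for name in before}
    total_before = sum(before.values())
    total_after = sum(after.values())
    gain = total_after - total_before
    return ratios, total_before, total_after, gain, 100.0 * gain / total_before


ratios, total_before, total_after, gain, gain_pct = summarize(
    better_performance, better_battery_life
)

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
print(f"total: {total_before} to {total_after} Gflops (+{gain}, +{gain_pct:.0f}%)")
```

<p>The totals come out at 70 and 86 Gflops, which is where the 16 Gflops gain and the +23% come from.</p>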
<p><strong>And unplugged, on battery?</strong></p>
<p>Performance is identical, although the GPUs take longer to reach their maximum frequencies because of the energy-saving policy. So running on battery or on AC power doesn&#8217;t matter from a performance point of view, in either &#8220;Better battery life&#8221; or &#8220;Better performance&#8221; mode.</p>
<p><strong>So which mode to choose?</strong></p>
<p>It&#8217;s clear: if you have OpenCL-enabled software, go for the &#8220;Better battery life&#8221; setting, because the CPU is faster anyway (10% more, a FREE upgrade for your Mac! lol!) and OpenCL is faster too, whichever GPU the application uses!</p>
<p>Note that a laptop can deliver 86 Gflops of processing power on the Galaxy benchmark, which is a real-world astrophysics application, not a simple MAD benchmark that only rewards the number of cores on a GPU. These 86 Gflops are well above what a current 8-core 2.66GHz Mac Pro with 16 threads (2 quad-core Xeon processors) delivers.</p>
<p>I want to see more and more OpenCL-enabled applications!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cudachess.org/2010/03/opencl-performance-surprise-on-macbook-pro/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Apple Aperture 3 and OpenCL</title>
		<link>http://blog.cudachess.org/2010/02/apple-aperture-3-and-opencl/</link>
		<comments>http://blog.cudachess.org/2010/02/apple-aperture-3-and-opencl/#comments</comments>
		<pubDate>Fri, 12 Feb 2010 00:07:23 +0000</pubDate>
		<dc:creator>iAPX</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Aperture 3]]></category>
		<category><![CDATA[Apple]]></category>
		<category><![CDATA[OpenCL]]></category>

		<guid isPermaLink="false">http://blog.cudachess.org/?p=145</guid>
		<description><![CDATA[Apple Aperture 3 is probably the first mainstream application to use OpenCL technology. It&#8217;s not mentioned in the specifications or technical information, but it uses OpenCL for RAW decoding and processing, from start to finish, and it&#8217;s a brilliant idea, even if the software is not as fast as I expected.
I discovered this after some forum [...]]]></description>
			<content:encoded><![CDATA[<p>Apple Aperture 3 is probably the first mainstream application to use OpenCL technology. It&#8217;s not mentioned in the specifications or technical information, but it uses OpenCL for RAW decoding and processing, from start to finish, and it&#8217;s a brilliant idea, even if the software is not as fast as I expected.</p>
<p>I discovered this after some forum reading and after trying Aperture 3, doing the same tasks with the GeForce 9400M IGP on my 17&#8243; MacBook Pro and with the GeForce 9600M GT GPU (approx. 3X faster). Simple tasks such as thumbnail generation are noticeably faster with the latter, showing real use of the GPU as a resource. This is not a true demonstration that OpenCL is used, but since Aperture 3 only supports Snow Leopard, and Snow Leopard&#8217;s Core Image technology switched from OpenGL shaders to OpenCL, it is highly probable.</p>
<p>Anyway, despite all the drawbacks of Aperture 3 (memory usage, CPU usage, a stupid multi-threading implementation&#8230;), which let Lightroom rule the market, it&#8217;s cool to see new technology in use, and the turbo boost that OpenCL may give to mainstream applications!</p>
<p>As I said on some forums about choosing between a MacBook Pro with the GeForce 9400M IGP and one with a &#8220;real&#8221; GeForce 9600M GT GPU: with OpenCL in use, the former will stay slow, albeit with a fast GPU, while the latter will get faster with new applications, giving it a longer life as a useful production tool!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cudachess.org/2010/02/apple-aperture-3-and-opencl/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ATI Radeon 4xxx OpenCL benchmarks</title>
		<link>http://blog.cudachess.org/2009/11/ati-radeon-4opencl-benchmarks/</link>
		<comments>http://blog.cudachess.org/2009/11/ati-radeon-4opencl-benchmarks/#comments</comments>
		<pubDate>Wed, 11 Nov 2009 02:18:42 +0000</pubDate>
		<dc:creator>iAPX</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[ATI]]></category>
		<category><![CDATA[benchmark]]></category>
		<category><![CDATA[OpenCL]]></category>
		<category><![CDATA[Radeon 4870]]></category>

		<guid isPermaLink="false">http://blog.cudachess.org/?p=132</guid>
		<description><![CDATA[There are some OpenCL benchmarks out there, and on an OpenCL benchmark that tests real GPGPU computation, rather than raw processing power on theoretical computation, the Radeon 4xxx series lags far, far behind nVidia&#8217;s current GPUs.
The ATI Radeon 3xxx and 2xxx are not supported, whereas nVidia&#8217;s GPUs have been supported since the 2006 G80 (GeForce 8800 and any GeForce 8 [...]]]></description>
			<content:encoded><![CDATA[<p>There are some OpenCL benchmarks out there, and on an <a href="http://www.insanelymac.com/forum/index.php?s=c84e17c2d8d0848db27bd8c4624da5d2&amp;showtopic=181590&amp;st=80">OpenCL benchmark that tests real GPGPU computation</a>, rather than raw processing power on theoretical computation, the Radeon 4xxx series lags far, far behind nVidia&#8217;s current GPUs.</p>
<p>The ATI Radeon 3xxx and 2xxx are not supported, whereas nVidia&#8217;s GPUs have been supported since the 2006 G80 (GeForce 8800 and any GeForce 8 series or later GPU), and the Radeon 4xxx cards simply underperform because they lack shared memory (memory inside each processor core).</p>
<p>Lacking &#8220;shared memory&#8221; means that for every data access the Radeon 4xxx has to go to global video card memory, which is usually 20X to 30X slower, and worse, memory bandwidth on Radeon graphics cards is 2X to 3X lower than on nVidia&#8217;s. This is not a handicap for games, where Radeons are really great graphics cards, but it is for GPGPU and OpenCL.</p>
<p>The result of lacking shared memory and having slow graphics card memory: a Radeon 4870 (around $200 street price) cannot compete with the GeForce 9400M IGP (found in the Mac Mini, MacBook Air, MacBook&#8230;), and a GeForce 9400M iMac will beat any ATI Radeon 4850 iMac when it comes to OpenCL performance! <img src='http://blog.cudachess.org/wp-includes/images/smilies/face-sad.png' alt=':-(' class='wp-smiley' /> </p>
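<p>To give a rough feel for why the missing shared memory hurts, here is a toy Python cost model. It is only a sketch under assumptions of mine: the reuse count of 16 is hypothetical, a shared-memory access is taken as the unit of cost, and caches, latency hiding and memory coalescing are ignored. Only the 20X to 30X range comes from the text above.</p>

```python
def relative_cost(reuse_count, global_cost, shared_cost=1.0, has_shared_memory=True):
    """Cost of reading one value reuse_count times, in shared-memory access units.

    Hypothetical model: with shared memory the value is fetched from global
    memory once, staged, and then read from shared memory; without it, every
    read goes to global memory.
    """
    if has_shared_memory:
        return global_cost + reuse_count * shared_cost
    return reuse_count * global_cost


REUSE = 16  # assumed number of times each value is read by a work-group

for global_cost in (20.0, 30.0):  # the 20X to 30X range quoted above
    with_shared = relative_cost(REUSE, global_cost)
    without_shared = relative_cost(REUSE, global_cost, has_shared_memory=False)
    print(f"global memory {global_cost:.0f}x slower: "
          f"{without_shared / with_shared:.1f}x more time without shared memory")
```

<p>The more often a kernel reuses the same data, the bigger the penalty gets in this model, which is why it matters so little for streaming work and so much for real GPGPU computation.</p>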
]]></content:encoded>
			<wfw:commentRss>http://blog.cudachess.org/2009/11/ati-radeon-4opencl-benchmarks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to use CUDA for H.264 encoding?</title>
		<link>http://blog.cudachess.org/2009/11/cuda-h264-encoding-x264/</link>
		<comments>http://blog.cudachess.org/2009/11/cuda-h264-encoding-x264/#comments</comments>
		<pubDate>Tue, 03 Nov 2009 17:28:27 +0000</pubDate>
		<dc:creator>iAPX</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[CUDA]]></category>
		<category><![CDATA[full hd]]></category>
		<category><![CDATA[h.264]]></category>
		<category><![CDATA[OpenCL]]></category>
		<category><![CDATA[video encoding]]></category>

		<guid isPermaLink="false">http://blog.cudachess.org/?p=128</guid>
		<description><![CDATA[CUDA is a powerful technology, combining incredibly powerful GPUs with a superb suite of development, debugging and profiling tools. The x264 project tried to make it work with their excellent H.264 video encoder (which is blazingly fast, with great video quality, on the CPU).
They failed or, put differently, they chose not to use it and to consider other [...]]]></description>
			<content:encoded><![CDATA[<p>CUDA is a powerful technology, combining incredibly powerful GPUs with a superb suite of development, debugging and profiling tools. The <a href="http://www.videolan.org/developers/x264.html">x264 project</a> tried to make it work with their excellent H.264 video encoder (which is blazingly fast, with great video quality, on the CPU).</p>
<p>They failed or, put differently, they chose not to use it and to consider other ways to accelerate encoding, such as dedicated hardware accelerators (for example the <a href="http://www.elgato.com/elgato/int/mainmenu/products/Accessories/Turbo264HD/product1.en.html">ElGato Turbo.264 HD</a> that I use with my laptop).</p>
<p>There are several ways to use CUDA as an H.264 encoder accelerator:</p>
<ul>
<li>Move some CPU-hungry part of the algorithm to the GPU. This was their first choice, but in their implementation this part seems slower than its CPU counterpart. FAILED!</li>
<li>Move the whole encoding chain to the GPU, but since the most compute-intensive part is actually slower on the GPU (as they tried to implement it), it&#8217;s a loss. FAILED!</li>
<li>Move the whole encoding chain to the GPU, *BUT* dynamically give it a different part of the movie to encode and, instead of switching over to the GPU, combine the CPU and GPU to do the whole encoding.</li>
</ul>
<p>The third option is a different way to look at the GPU: not as a co-processor in the middle of a CPU algorithm, but as an asymmetric computing resource, able to give a 10% to 30% performance gain on the whole process.</p>
<p>This is the approach I am currently exploring, with the aim of getting a gain in H.264 encoding over pure CPU, and of being able to port it to OpenCL with dedicated algorithms for the CPU and GPU <img src='http://blog.cudachess.org/wp-includes/images/smilies/face-smile.png' alt=':-)' class='wp-smiley' /> </p>
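<p>The scheduling idea behind the third option can be sketched in a few lines of Python. This is only an illustration, not encoder code: the function name, the chunk count and the per-chunk timings are all hypothetical, the &#8220;encoding&#8221; is a sleep, and it assumes the movie can be cut into parts that encode independently. Each worker pulls the next part from a shared queue as soon as it is free, so the faster resource naturally takes more parts.</p>

```python
import queue
import threading
import time


def encode_chunks(num_chunks, workers):
    """Let each worker pull chunks from a shared queue until it is empty.

    workers maps a worker name to a simulated per-chunk encoding time in
    seconds. Returns a dict mapping each worker name to the chunks it encoded.
    """
    pending = queue.Queue()
    for chunk in range(num_chunks):
        pending.put(chunk)

    done = {name: [] for name in workers}

    def run(name, seconds_per_chunk):
        while True:
            try:
                chunk = pending.get_nowait()
            except queue.Empty:
                return  # no movie parts left
            time.sleep(seconds_per_chunk)  # stand-in for real encoding work
            done[name].append(chunk)  # each worker appends only to its own list

    threads = [threading.Thread(target=run, args=item) for item in workers.items()]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return done


# Hypothetical speeds: the GPU path is assumed 4X slower per chunk than the CPU path.
result = encode_chunks(40, {"cpu": 0.001, "gpu": 0.004})
for name, chunks in result.items():
    print(f"{name} encoded {len(chunks)} chunks")
```

<p>With a GPU path assumed to be 4X slower than the CPU path, the combined throughput is about 1 + 1/4 = 1.25 times the CPU alone, which falls inside the 10% to 30% range mentioned above. The exact split between workers varies from run to run.</p>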
]]></content:encoded>
			<wfw:commentRss>http://blog.cudachess.org/2009/11/cuda-h264-encoding-x264/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ATI&#8217;s OpenCL CPU-Only!</title>
		<link>http://blog.cudachess.org/2009/08/atis-opencl-cpu-only/</link>
		<comments>http://blog.cudachess.org/2009/08/atis-opencl-cpu-only/#comments</comments>
		<pubDate>Fri, 07 Aug 2009 14:06:39 +0000</pubDate>
		<dc:creator>iAPX</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[ATI]]></category>
		<category><![CDATA[OpenCL]]></category>

		<guid isPermaLink="false">http://blog.cudachess.org/?p=76</guid>
		<description><![CDATA[While nVidia currently supports OpenCL on its GPUs but not on the main CPU, ATI offers its own drivers that support the main CPU but not its GPUs! Anyway, ATI&#8217;s GPUs are not really designed for GPGPU and will lag far, far behind nVidia&#8217;s on real OpenCL implementations!
The purpose of OpenCL is to enable code to run [...]]]></description>
			<content:encoded><![CDATA[<p>While nVidia currently supports OpenCL on its GPUs but not on the main CPU, ATI offers its own drivers that support the main CPU but not its GPUs! Anyway, ATI&#8217;s GPUs are not really designed for GPGPU and will lag far, far behind nVidia&#8217;s on real OpenCL implementations!</p>
<p>The purpose of OpenCL is to enable code to run on both CPUs and GPUs (even a mix of ATI and nVidia), not to run either on the CPU only (what&#8217;s the novelty???) or restricted to a proprietary GPU!!!</p>
<p>At this time, CUDA seems to be the technology path to follow before switching to OpenCL in 2010 or 2011&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cudachess.org/2009/08/atis-opencl-cpu-only/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
