Harnessing the Power of the GPU
Rocketick’s technology breaks the simulator’s dependency barrier by mapping the Verilog code into an elaborate dependency graph and parsing its expressions into semi-independent threads that can be processed in parallel. In principle, Rocketick’s parallel processing could be carried out by a powerful multi-core CPU; however, GPUs offer massive parallelism that even the strongest CPUs cannot match. The GPU’s unique architecture offers dramatic advantages, but only to programs designed to exploit its architectural characteristics.

Unique Characteristics of GPUs
GPUs have evolved into highly parallel, multithreaded, many-core processors with tremendous computational horsepower and very high memory bandwidth. They are specialized for compute-intensive, highly parallel computation – exactly what graphics rendering is about – and are therefore designed so that more transistors are devoted to parallel data processing than to data caching and flow control. They are especially well suited to problems that can be expressed as data-parallel computations – the same program is executed on many data elements in parallel – with high arithmetic intensity (the ratio of arithmetic operations to memory operations). Because the same program is executed for each data element, there is less need for sophisticated flow control; and because it is executed on many data elements with high arithmetic intensity, memory access latency can be hidden with calculations instead of large data caches.
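The two ideas in the paragraph above, data-parallel execution and arithmetic intensity, can be illustrated with a minimal Python sketch. It is not GPU code: the function names (`saxpy_kernel`, `run_data_parallel`, `arithmetic_intensity`) and the operation counts are assumptions chosen for illustration, and the loop only models what a GPU would spread across many hardware threads.

```python
# Illustrative only: one small "kernel" applied independently to every element,
# plus a rough arithmetic-intensity estimate for that kernel.

def saxpy_kernel(a, x, y):
    """Per-element work: one multiply and one add."""
    return a * x + y

def run_data_parallel(a, xs, ys):
    # Each output depends only on its own inputs, so on a GPU every
    # element could be handled by a separate thread.
    return [saxpy_kernel(a, x, y) for x, y in zip(xs, ys)]

def arithmetic_intensity(arith_ops, mem_ops):
    """Ratio of arithmetic operations to memory operations."""
    return arith_ops / mem_ops

result = run_data_parallel(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])

# Per element: 2 arithmetic ops (multiply, add) and 3 memory ops
# (load x, load y, store the result).
intensity = arithmetic_intensity(2, 3)
```

With only two arithmetic operations per three memory operations, this kernel has low arithmetic intensity; a kernel that does more computation per value loaded leaves more work available to overlap with memory latency.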

Making Verilog Models Suitable for Massively Parallel GPU Computing
Verilog models are a typical example of code that is not inherently suitable for GPU processing. They are extremely complex, with numerous dependencies and rules. To make them suitable for massively parallel computing, Rocketick’s patent-pending technology analyzes the Verilog design and maps the dependencies among its expressions into an elaborate dependency graph. It then partitions the dependency graph into many semi-independent threads, each comprising a series of processing elements. The threads and their processing elements are organized so that they can run continuously in parallel with minimal synchronization among threads.
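The general idea of turning expression dependencies into groups of independent work can be sketched as follows. This is not Rocketick's partitioning method, which the text does not describe in detail; it is plain levelization using Kahn's topological-sort algorithm, a much simpler stand-in. The function name `levelize` and the three-expression design are hypothetical.

```python
from collections import defaultdict

def levelize(deps):
    """Group expressions into levels with Kahn's algorithm.

    deps maps each expression name to the names it reads from.
    Expressions in the same level do not depend on each other.
    """
    nodes = set(deps)
    for srcs in deps.values():
        nodes.update(srcs)

    # Number of unresolved inputs per node, and reverse edges to its users.
    pending = {n: len(set(deps.get(n, ()))) for n in nodes}
    users = defaultdict(list)
    for node, srcs in deps.items():
        for src in set(srcs):
            users[src].append(node)

    level = sorted(n for n in nodes if pending[n] == 0)
    levels = []
    while level:
        levels.append(level)
        ready = []
        for n in level:
            for user in users[n]:
                pending[user] -= 1
                if pending[user] == 0:
                    ready.append(user)
        level = sorted(ready)

    if sum(len(lv) for lv in levels) != len(nodes):
        raise ValueError("dependency graph contains a cycle")
    return levels

# Hypothetical design: w1 = a & b; w2 = b | c; out = w1 ^ w2
design = {"w1": ["a", "b"], "w2": ["b", "c"], "out": ["w1", "w2"]}
levels = levelize(design)
```

Here `w1` and `w2` land in the same level and could be evaluated in parallel, while `out` must wait for both. A production partitioner would also have to balance thread lengths and limit cross-thread synchronization, which this sketch ignores.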
In addition, Rocketick has developed an efficient virtual machine that runs on the GPU and executes the threads, while maintaining the consistency of the design rules. The virtual machine uses the GPU’s standard APIs, scheduling the threads in a way that utilizes the GPU’s power and memory bandwidth to accelerate the simulation.
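As a rough illustration of a virtual machine that executes threads while keeping their results consistent, the sketch below interprets a hypothetical three-operand instruction set in plain Python. Nothing here reflects Rocketick's actual virtual machine or any GPU API: the instruction format, the names `run_thread` and `step`, and the double-buffering scheme (read from the current state, write to the next, commit all at once) are assumptions made for the example.

```python
import operator

# Hypothetical instruction format: (op, dest, src1, src2).
OPS = {"and": operator.and_, "or": operator.or_, "xor": operator.xor}

def run_thread(program, current, nxt):
    """Execute one thread: read from `current`, write to `nxt`."""
    for op, dest, src1, src2 in program:
        nxt[dest] = OPS[op](current[src1], current[src2])

def step(threads, state):
    """Advance every thread by one step, then commit all writes at once."""
    nxt = dict(state)
    for program in threads:
        # Every thread reads only the old state and writes a distinct
        # destination, so the execution order does not affect the result.
        run_thread(program, state, nxt)
    return nxt

# Two threads that together form a 2-bit counter.
threads = [
    [("xor", "q0", "q0", "one")],  # q0 toggles on every step
    [("xor", "q1", "q1", "q0")],   # q1 toggles when the old q0 is 1
]
state = {"q0": 0, "q1": 0, "one": 1}
trace = []
for _ in range(4):
    state = step(threads, state)
    trace.append((state["q1"], state["q0"]))
```

Because each thread sees only the state from before the step, running the threads in any order, or truly in parallel, gives the same counter sequence. This is similar in spirit to how Verilog nonblocking assignments evaluate all right-hand sides before updating any target.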