Optical Chip is Faster than GPU

A typical GPU setup can solve the Ising problem with ease, but a silicon photonics accelerator can now do the same about a hundred times faster. The optical computing startup Lightelligence has demonstrated this feat.

Pace, the photonic arithmetic computing engine from Lightelligence, is an integrated optical computing system. It consists of about twelve thousand photonic devices, each running at 1 GHz. Compared to Comet, the 100-device prototype that Lightelligence unveiled in 2019, Pace is 1 million times faster. This is the first time that Lightelligence has demonstrated a use case on its hardware that goes beyond AI acceleration.

Lightelligence has designed Pace to run algorithms for problems in the NP-complete class. These are among the most difficult computational problems and call for much faster systems than existing accelerators. Pace did not demonstrate optical superiority for all applications. However, it beat a typical GPU by a factor of 100 when solving the Ising problem. It even beat, by a factor of 25, a system that Toshiba built specifically for the Ising problem: the simulated bifurcation machine running on FPGAs.

With their huge state spaces, NP-complete problems require very large computing resources. No known algorithm solves them in polynomial time; with the best known exact methods, computing time grows exponentially with the size of the problem. This class includes the Ising problem, the traveling salesman problem, and the graph max-cut/min-cut problem. In practice, NP-complete problems arise in scheduling, bioinformatics, material discovery, circuit design, power grid optimization, and cryptography applications.
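To make the size of the state space concrete, here is a minimal sketch of an exhaustive Ising solver. The energy convention (E = -sum of J[i][j] * s_i * s_j over pairs i < j, with spins of -1 or +1), the function names, and the 4-spin coupling matrix are all assumptions chosen for illustration; none of them come from Lightelligence.

```python
from itertools import product

def ising_energy(J, spins):
    """Energy E = -sum_{i<j} J[i][j] * s_i * s_j, with each spin in {-1, +1}."""
    n = len(spins)
    return -sum(J[i][j] * spins[i] * spins[j]
                for i in range(n) for j in range(i + 1, n))

def brute_force_ground_state(J):
    """Check all 2**n spin assignments; only feasible for very small n."""
    n = len(J)
    best = min(product((-1, 1), repeat=n), key=lambda s: ising_energy(J, s))
    return best, ising_energy(J, best)

# Hypothetical example: 4 spins on a ring with antiferromagnetic couplings.
J = [[0, -1, 0, -1],
     [-1, 0, -1, 0],
     [0, -1, 0, -1],
     [-1, 0, -1, 0]]

state, energy = brute_force_ground_state(J)
states_checked = 2 ** len(J)  # doubles with every added spin
print(state, energy, states_checked)
```

With 4 spins the search visits 16 states; with 64 spins it would have to visit 2**64, which is why practical solvers rely on heuristics rather than enumeration.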

According to CEO Yichen Shen, Lightelligence chose to demonstrate the acceleration of NP-complete problems because this best illustrated the advantages of optical computing.

The chief advantage of the optical compute engine from Lightelligence is that it can compute matrix multiplications much faster than GPUs can. A GPU typically takes several hundred clock cycles to complete a 64 x 64 matrix multiplication. According to Lightelligence, Pace can do it in about 5 ns. As algorithms for NP-complete problems require many iterative matrix multiplications, Pace has the upper hand. Lightelligence wanted a problem that best demonstrated the superiority of this new technology.
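As a rough back-of-the-envelope check on these figures, the sketch below converts clock cycles into time. Only the 5 ns figure comes from the article; the 300-cycle count and the 1 GHz GPU clock are assumed values picked to stand in for "several hundred clock cycles", so the resulting ratio is illustrative only.

```python
# From the article: Pace completes a 64 x 64 matrix multiplication in about 5 ns.
PACE_TIME_S = 5e-9

# Assumptions for illustration only (not from the article).
GPU_CYCLES = 300        # stand-in for "several hundred clock cycles"
GPU_CLOCK_HZ = 1.0e9    # hypothetical 1 GHz GPU clock

def cycles_to_seconds(cycles, clock_hz):
    """Time taken by a given number of clock cycles at a given clock rate."""
    return cycles / clock_hz

gpu_time = cycles_to_seconds(GPU_CYCLES, GPU_CLOCK_HZ)
speedup = gpu_time / PACE_TIME_S

print(f"GPU: {gpu_time * 1e9:.0f} ns, Pace: {PACE_TIME_S * 1e9:.0f} ns, "
      f"ratio: {speedup:.0f}x")
```

Under these assumptions the per-multiplication ratio comes out at roughly 60x. It covers the arithmetic alone and ignores any time spent moving data to and from memory.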

The key factor in Pace's favor is the iterative nature of the algorithms used for NP-complete problems: each successive matrix multiplication depends on the result of the previous one. In GPU systems, the electronics become the bottleneck, as data must shuttle to and from memory between multiplications. With bigger commercial use cases, the number of read and write cycles in digital electronics increases so much that the entire computing system slows down. Lightelligence is confident it will be able to demonstrate a speed advantage of at least several times, if not 100 times.
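The data dependency described here can be sketched with a toy iteration. The code below uses a simple synchronous sign-update heuristic, s <- sign(J s), purely to show that each matrix-vector product needs the output of the previous one. It is not the algorithm that Lightelligence or Toshiba uses, and the function names and the 3-spin example are hypothetical.

```python
def matvec(J, x):
    """Plain matrix-vector product J @ x."""
    return [sum(j_ij * x_j for j_ij, x_j in zip(row, x)) for row in J]

def sign(value, fallback):
    """Sign of value; keep the current spin (fallback) when value is zero."""
    if value > 0:
        return 1
    if value < 0:
        return -1
    return fallback

def iterate_spins(J, spins, steps):
    """Apply s <- sign(J @ s) repeatedly and record every intermediate state.

    Step k cannot start until step k-1 has finished, so the products
    cannot be batched or run in parallel across iterations.
    """
    history = [list(spins)]
    for _ in range(steps):
        field = matvec(J, spins)
        spins = [sign(f, s) for f, s in zip(field, spins)]
        history.append(list(spins))
    return history

# Hypothetical example: 3 spins with ferromagnetic couplings.
J = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]
history = iterate_spins(J, [1, 1, -1], steps=2)
print(history)
```

In a digital system each pass through the loop implies reading the spin vector and writing it back, which is the memory traffic the paragraph above identifies as the bottleneck.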

Optical computing has numerous advantages. For this system, which is based on silicon photonics, the main ones are improvements of several orders of magnitude in power efficiency and computing speed. Basically, the system directs modulated infrared light through silicon wires, or waveguides. Scientists accomplished this using standard CMOS processes.

Low-Power GPU for IoT

The Mali graphics processing units (GPUs) from ARM are popular because of their cost efficiency. ARM has optimized them to provide energy-efficient, high-performance graphics in the smallest possible area of silicon. As a result, not only low- to mid-range smartphones but also tablets and DTVs use the cost-efficient Mali GPUs. ARM offers a diverse selection of scalable solutions involving both graphics-only and graphics-plus-GPU-compute technology.

ARM offers the Mali-400MP, which is the first OpenGL ES 2.0 multi-core GPU with leading area efficiency, and the Mali-450MP, which offers approximately twice the performance of the Mali-400MP. However, these are not suitable for the Internet of Things (IoT), as IoT devices require extremely low levels of energy consumption. For the IoT, ARM has released a new low-power GPU. Aimed at wearables and other IoT gadgets, the new 32-bit Mali-470MP is claimed by ARM to deliver smartphone-quality graphics while requiring only half the power used by the Mali-400MP on the same process geometry.

To cut power consumption in the Mali GPUs, ARM targeted three prime areas and made a range of micro-architectural optimizations. They updated most of the processing blocks within the chip to scheduling pipelines that operate on quads. They reduced the frequency of control and state-update operations. They also increased the amount of clock gating in areas including L1 caches and completed the bypass blocks.

In general, most graphics processors use floating-point arithmetic for better performance. However, floating-point arithmetic consumes a lot of power. In the Mali-470MP, ARM uses fixed-point arithmetic wherever it does not affect performance. By scrutinizing every milliwatt across the entire SoC, ARM was able to tune the efficiency of the Mali-470MP, making it suitable for devices that operate on low power budgets but require sophisticated graphics, such as wearables, IoT devices, and entry-level smartphones.
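For readers unfamiliar with fixed-point arithmetic, the following minimal sketch shows the basic idea: fractional values are stored as scaled integers, so a multiplication needs only an integer multiply and a shift. The Q16.16 format (16 fractional bits) and the helper names are assumptions for illustration; the article does not say which fixed-point formats the Mali-470MP uses.

```python
FRAC_BITS = 16           # assumed Q16.16 format: 16 fractional bits
ONE = 1 << FRAC_BITS     # fixed-point representation of 1.0

def to_fixed(x):
    """Convert a float to a scaled integer."""
    return int(round(x * ONE))

def to_float(q):
    """Convert a scaled integer back to a float."""
    return q / ONE

def fixed_mul(a, b):
    """Multiply two fixed-point values using integer operations only."""
    return (a * b) >> FRAC_BITS

a = to_fixed(1.5)
b = to_fixed(2.25)
result = to_float(fixed_mul(a, b))
print(result)
```

The trade-off is range and precision: values that do not fit the chosen format lose accuracy, which is consistent with using fixed point only where the result is unaffected.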

According to Dan Wilson, Product Manager at ARM, the Mali-470MP is highly power-efficient because it is optimized for the OpenGL ES 2.0 API and its drivers. As most devices running Android, Android Wear, and Tizen use the OpenGL ES API, the Mali-470MP can replace the previous generation of GPUs from ARM. Additionally, there is no need to re-optimize applications for the new GPU.

Just as users are accustomed to vibrant displays and touch interfaces on smartphones, the Mali-470MP is expected to bring immersive experiences to wearables, thanks to its greater power efficiency and support for OpenGL ES 2.0.

Designers are free to use the multi-core configurable Mali-470MP with both 32- and 64-bit CPUs, such as the ARM Cortex-A7 and the Cortex-A53. As IoT devices do not need to address more than 4 GB of memory, ARM has designed the new GPU as a 32-bit device. The Mali-470MP offers optimal energy efficiency at screen resolutions up to 640x640 in single-core configurations and up to 1080p in multi-core configurations.
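The numbers in this paragraph can be checked with a few lines of arithmetic. The sketch assumes that "32-bit" refers to byte addresses and that 1080p means 1920 x 1080 pixels; neither detail is spelled out in the article.

```python
ADDRESS_BITS = 32                        # assumed: 32-bit byte addressing
addressable_bytes = 2 ** ADDRESS_BITS
addressable_gib = addressable_bytes / 2 ** 30

single_core_pixels = 640 * 640           # single-core resolution from the article
multi_core_pixels = 1920 * 1080          # assumed meaning of "1080p"
pixel_ratio = multi_core_pixels / single_core_pixels

print(addressable_gib, single_core_pixels, multi_core_pixels, pixel_ratio)
```

A 32-bit address reaches 4 GiB, and under the assumed dimensions a 1080p frame holds about five times as many pixels as a 640 x 640 one.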

The new GPU from ARM is not yet available in the market, however, and licensees will most likely be able to ship products based on the Mali-470MP only by the end of 2016.