
Tesla applies for series of patents for new AI chip in Autopilot Hardware 3.0

Tesla is working on an important new product that it claims will enable it to bring full self-driving capability to its vehicles: a new AI chip, or “neural net accelerator,” to be released as part of the Autopilot Hardware 3.0 computer upgrade.

We have now uncovered a series of new patent applications from Tesla about this new computer.

As previously reported, Tesla hired a team of chip architects and executives from AMD back in 2016. The team, now led by former Apple chip architect Peter Bannon, is tasked with delivering the new computer that will power Tesla’s self-driving system.

Bannon and several of those former AMD engineers, including Emil Talpes, a longtime AMD chip architect who worked on the K12 ARM core, and Debjit Das Sarma, AMD’s former lead CPU architect, are all named on a series of patent applications related to the new computer.

In one of the patent applications made public today, Tesla explains why they wanted to move away from CPUs and GPUs to power their machine learning system:

“Processing for machine learning and artificial intelligence typically requires performing mathematical operations on large sets of data and often involves solving multiple convolution layers and pooling layers. Machine learning and artificial intelligence techniques typically utilize matrix operations and non-linear functions such as activation functions. Applications of machine learning include self-driving and driver-assisted automobiles. In some scenarios, computer processors are utilized to perform machine learning training and inference. Traditional computer processors are able to perform a single mathematical operation very quickly but typically can only operate on a limited amount of data simultaneously. As an alternative, graphical processing units (GPUs) may be utilized and are capable of performing the same mathematical operations but on a larger set of data in parallel. By utilizing multiple processor cores, GPUs may perform multiple tasks in parallel and are typically capable of completing large graphics processing tasks that utilized parallelism faster than a traditional computer processor. However, neither GPUs nor traditional computer processors were originally designed for machine learning or artificial intelligence operations. Machine learning and artificial intelligence operations often rely on the repeated application of a set of specific machine learning processor operations over very large datasets. Therefore, there exists a need for a microprocessor system that supports performing machine learning and artificial intelligence specific processing operations on large datasets in parallel without the overhead of multiple processing cores for each parallel operation.”
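To make the patent’s point about matrix operations concrete, here is a rough NumPy sketch, our own illustration rather than anything from the filing, of how a convolution layer reduces to a single large matrix multiplication, which is exactly the kind of workload such an accelerator is built to parallelize:

```python
import numpy as np

def im2col(image, k):
    """Unroll every k x k patch of a 2D image into one row of a matrix."""
    h, w = image.shape
    rows = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            rows.append(image[i:i + k, j:j + k].ravel())
    return np.array(rows)

image = np.arange(36, dtype=np.float32).reshape(6, 6)
kernel = np.ones((3, 3), dtype=np.float32) / 9.0  # 3x3 box filter

patches = im2col(image, 3)          # (16, 9) matrix of image patches
out = patches @ kernel.ravel()      # 16 dot products in one matmul
print(out.reshape(4, 4))            # the convolved 4x4 output
```

On a conventional CPU, those 16 dot products would largely run one after another; the whole point of a dedicated accelerator is to perform them all at once.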

The series of patents describes a microprocessor system designed to address this issue.

Tesla’s new AI chip patents

Accelerated Mathematical Engine

Tesla describes the invention in the patent application:

“Various embodiments of the disclosure relate to an accelerated mathematical engine. In certain embodiments, the accelerated mathematical engine is applied to image processing such that convolution of an image is accelerated by using a two-dimensional matrix processor comprising sub-circuits that include an ALU, output register and shadow register. This architecture supports a clocked, two-dimensional architecture in which image data and weights are multiplied in a synchronized manner to allow a large number of mathematical operations to be performed in parallel.”
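The description reads a lot like a systolic array. As a purely speculative illustration, here is a simplified Python model of a single compute cell with an output register and a shadow register; the class and method names are ours, not Tesla’s:

```python
class ComputeCell:
    """One sub-circuit of the 2D array: a multiply-accumulate (MAC) unit
    plus two registers. The output register accumulates the running dot
    product; when a pass completes, the result is latched into the shadow
    register so it can be read out while the next accumulation begins
    (double buffering). Naming is speculative, based on the abstract only.
    """
    def __init__(self):
        self.output_reg = 0.0   # live accumulator
        self.shadow_reg = 0.0   # latched copy, stable during readout

    def clock(self, data, weight):
        # Each clock tick: multiply the incoming data/weight pair
        # and accumulate into the output register.
        self.output_reg += data * weight

    def latch(self):
        # End of a pass: move the result into the shadow register
        # and clear the accumulator for the next set of operands.
        self.shadow_reg = self.output_reg
        self.output_reg = 0.0

# Dot product of image data with weights, one MAC per clock tick:
cell = ComputeCell()
for d, w in zip([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]):
    cell.clock(d, w)
cell.latch()
print(cell.shadow_reg)  # 3.0, readable while a new pass accumulates
```

In the patent, thousands of such cells would run in lockstep, which is what lets image data and weights be multiplied “in a synchronized manner.”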

Here are a few drawings and schematics from the patent application:

Here’s the patent application in full:

[scribd id=398220774 key=key-edaiGfUrqgpzXH0fntqt mode=scroll]

Computational Array Microprocessor system with variable latency memory access

Tesla describes the invention in the patent application:

“A microprocessor system comprises a computational array and a hardware arbiter. The computational array includes a plurality of computation units. Each of the plurality of computation units operates on a corresponding value addressed from memory. The hardware arbiter is configured to control issuing of at least one memory request for one or more of the corresponding values addressed from the memory for the computation units. The hardware arbiter is also configured to schedule a control signal to be issued based on the issuing of the memory requests.”
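In plain terms, the arbiter tracks when each memory request will return and only signals the computational array to proceed once the data is ready. Here is a toy Python model of that scheduling idea; the names and the cycle-based scheme are our assumptions, as the patent does not publish an implementation:

```python
import heapq

class HardwareArbiter:
    """Toy model of the arbiter from the abstract: it issues memory
    requests with variable latency and schedules the control signal
    that releases values to the computation units only once every
    outstanding request has completed. Purely illustrative."""
    def __init__(self):
        self.completions = []  # min-heap of (finish_cycle, address)

    def issue(self, address, current_cycle, latency):
        # Record when this request's data will arrive from memory.
        heapq.heappush(self.completions, (current_cycle + latency, address))

    def control_signal_cycle(self):
        # The compute array may only be signaled to start once the
        # slowest outstanding request has returned.
        return max(cycle for cycle, _ in self.completions)

arbiter = HardwareArbiter()
arbiter.issue(address=0x100, current_cycle=0, latency=4)   # fast access
arbiter.issue(address=0x200, current_cycle=0, latency=40)  # slow access
print(arbiter.control_signal_cycle())  # 40: compute starts at cycle 40
```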

Here are a few drawings and schematics from the patent application:

Here’s the patent application in full:

[scribd id=398220661 key=key-HwuHE0nzW7oHjgYseeOK mode=scroll]

Computational array microprocessor system using non-consecutive data formatting

Tesla describes the invention in the patent application:

“A microprocessor system comprises a computational array and a hardware data formatter. The computational array includes a plurality of computation units that each operates on a corresponding value addressed from memory. The values operated by the computation units are synchronously provided together to the computational array as a group of values to be processed in parallel. The hardware data formatter is configured to gather the group of values, wherein the group of values includes a first subset of values located consecutively in memory and a second subset of values located consecutively in memory. The first subset of values is not required to be located consecutively in the memory from the second subset of values.”
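The idea is that the formatter can gather several runs of consecutive values, each from a different place in memory, into one group that is fed to the array at once. Here is a short Python sketch of that gathering step, with illustrative addresses of our own choosing:

```python
import numpy as np

def format_group(memory, subsets):
    """Toy model of the hardware data formatter: gather several runs of
    consecutive values (each given as a start address and length) into
    one contiguous group for the computational array. The runs need not
    be adjacent to one another in memory."""
    return np.concatenate([memory[start:start + length]
                           for start, length in subsets])

memory = np.arange(100, dtype=np.float32)  # stand-in for linear memory

# Two consecutive runs that are NOT consecutive with each other,
# e.g. the same columns taken from two different image rows:
group = format_group(memory, [(10, 4), (50, 4)])
print(group)  # [10. 11. 12. 13. 50. 51. 52. 53.], fed to the array together
```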

Here are a few drawings and schematics from the patent application:

Here’s the patent application in full:

[scribd id=398220565 key=key-x9aDGCJDczdb1AZghtJE mode=scroll]

Vector Computational Unit

Tesla describes the invention in the patent application:

“A microprocessor system comprises a computational array and a vector computational unit. The computational array includes a plurality of computation units. The vector computational unit is in communication with the computational array and includes a plurality of processing elements. The processing elements are configured to receive output data elements from the computational array and process in parallel the received output data elements.”
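In other words, results streaming out of the matrix processor get a second stage of parallel, per-element processing. Here is a minimal sketch where that per-element work is a ReLU activation; that choice is our assumption, since the abstract does not specify the actual operations:

```python
import numpy as np

def vector_unit(array_outputs):
    """Toy model of the vector computational unit: receive the output
    data elements from the computational array and process them in
    parallel. The ReLU here is illustrative only."""
    return np.maximum(array_outputs, 0.0)  # applied to every lane at once

# Pretend these are accumulator results streamed out of the 2D array:
array_outputs = np.array([-1.5, 0.0, 2.5, 7.0], dtype=np.float32)
print(vector_unit(array_outputs))  # [0.  0.  2.5 7. ]
```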

Here are a few drawings and schematics from the patent application:

Here’s the patent application in full:

[scribd id=398220022 key=key-WHzFpy0FkpEiMcVkx6q2 mode=scroll]
