SHAKTI Processor Family – India’s Processor

9/4/2018

Introduction

General-purpose on-chip processors have become ubiquitous today. These processors range from extremely small, low-power micro-controllers (used in motor controls, robotic platforms, home appliances, etc.) to hefty, high-performance multi-core processors (used in servers and supercomputers). However, the growth of modern domain-specific frameworks (like Caffe, TensorFlow, etc.) and the need for more specialized features like machine learning, enhanced security, etc. have forced the industry to look beyond general-purpose solutions and towards mass customization. While a large number of companies today can develop custom ASICs (Application-Specific Integrated Circuits) and license specific silicon blocks from chip vendors to develop customized SoCs (Systems on Chip), at the heart of every design is the processor and its associated hardware. To serve modern workloads better, these processors also need to be customized, upgraded, re-designed and augmented suitably. This requires that vendors and consumers have access to relevant processor variants and the flexibility to make modifications and ship them at an affordable cost.

Today, a large share of the processor market is dominated by just a few giants like Intel, ARM, AMD, etc. Each of these companies has an impressive IP portfolio of processors catering to various market trends. Almost all of the IP offerings of these companies fall under a licensing model, and the terms vary significantly. For example, Intel licenses its ISA only to a limited set of users like AMD. ARM, on the other hand, offers a broad range of licenses, from ISA to architectural licenses. Apart from license fees, these companies also charge royalties on devices using their IPs. With this IP model well established, some of these licenses today can cost up to $1-10 million, in addition to strict NDAs which may restrict the user from making any proprietary changes or even publishing relevant numbers. All these aspects of the licensing model, while benefiting the respective companies, have made it difficult for consumers to develop truly customized solutions for modern-day workloads. Some of these customizations cater to too small a market segment for the giants themselves to invest in, thereby prohibiting growth and novelty.

In essence, the closed-source IP model in the processor community is proving to be a hindrance to building scalable solutions. A similar struggle in the software industry against closed-source IP led to the rise of the open-source Linux kernel in the 1990s. Since then, the software community has seen a plethora of open-source software and tool-chains which have been adopted by both industry and academia. The hardware community, however, hasn't seen such a revolution yet and is probably in dire need of one. An open-source processor eco-system will not only boost customization but also allow bright minds from industry and academia to collaborate and provide a stable and viable framework that is competitive with modern-day products. SHAKTI, an open-source initiative by IIT-Madras (Indian Institute of Technology Madras), is primarily aimed at building such open-source processor development eco-systems, which can equip the community with enough ammunition to build custom, industrial-grade processors without the hassle of licensing, NDAs, royalties or any other sort of restrictions.

The SHAKTI Program

The SHAKTI Processor Program was started as an academic initiative back in 2014 by the RISE group at IIT-Madras. Recognizing the limitations of the processor industry mentioned above, the initiative aimed not only at creating open-source, industrial-grade processors but also at building the associated components of a bigger eco-system (interconnect fabrics, scalable verification platforms, peripheral IPs, etc.) which enable rapid adoption of the processors. Some of the major highlights of the program which make it a viable option for adoption are:

Source code of all the components of the SHAKTI eco-system is open under the 3-clause BSD license. This means a user can freely use, modify and circulate the source code without having to sign any NDAs or licenses, or even notify the authors, as long as the license header remains in place. The SHAKTI program itself will not assert any patents, which removes the burden of paying royalties as well.

The processors of the SHAKTI eco-system are built using the open-source RISC-V ISA. RISC-V has been designed for modularity and extensions, and therefore fits the goal of "customization" perfectly. The ISA also comes with a complete software stack, including compilers, operating systems, and debuggers, which are open source and thus also modifiable. Since the ISA does not dictate micro-architectural features, the software and hardware can be maintained by two completely different entities and still be compatible. This allows for great re-usability and sharing of code across the community.

The SHAKTI processors and the front-end (RTL) designs are developed using the open-source High Level Synthesis (HLS) language Bluespec System Verilog (BSV). BSV equips the user to develop extremely modular and parameterized modules with well-defined interfaces. This lets the user focus on and modify only the designs of interest without breaking the rest of the flow. Today there exists a free bsv-parser which the community can use to develop open or proprietary compilers for BSV.

Academia now has access to real-world working prototypes of processors which they can experiment with for free. This enables researchers to depart from the world of "simulators" and "emulation models" and try out their research and ideas in practice. They are no longer tied down by strict NDAs on publishing and can thereby participate more actively in shaping the future of the processor industry.

A typical process of acquiring ISA or architectural licenses from companies like ARM can take anywhere between 6 and 12 months. This increases the time-to-market for consumers. SHAKTI can immensely reduce this time by avoiding such formalities and by providing a powerful modular framework that allows small tech start-ups to modify only the components of interest rather than building a solution from scratch.

With minds from all over the community pouring in ideas and solutions, SHAKTI has the potential to become a state-of-the-art offering quickly.

An open-source eco-system such as SHAKTI promotes a mix-and-match environment where users can plug in different open-source or proprietary IPs and innovate on new ideas and projects.

Because the design is completely open-source, it is close to impossible for external entities to add back-doors and black-boxes. This is of particular interest to strategic sectors of countries like India, which today depend on black-box solutions provided by companies headquartered in foreign countries.

SHAKTI can also greatly benefit the software community. Fearing strong patent lawsuits, software developers who work with licensed hardware IPs are forced to release only binaries rather than source code, and to provide minimal documentation. This leaves the Libre software community in a dangling state, spending months and even years "picking up the pieces".

In addition to the above arguments, a combination of an open-source processor eco-system such as SHAKTI and a fabrication entity like TSMC, which is offering up to 100 small test chips on its latest technology node for only $30,000, can enable virtually any project to validate its final design on real chips at drastically lower cost and in less time.
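The modularity of the RISC-V ISA noted among the highlights above is visible in the way RISC-V targets are conventionally named: a base ISA followed by single-letter standard extensions. The sketch below is only an illustration of that naming convention. The function `parse_isa_string` and the `STANDARD_EXTENSIONS` table are hypothetical names introduced here, not part of the SHAKTI code base, and only a handful of single-letter extensions are handled.

```python
# Illustrative only: parse_isa_string and STANDARD_EXTENSIONS are
# hypothetical names, not part of SHAKTI or any RISC-V tool.
STANDARD_EXTENSIONS = {
    "M": "integer multiply/divide",
    "A": "atomic instructions",
    "F": "single-precision floating point",
    "D": "double-precision floating point",
    "C": "compressed instructions",
}


def parse_isa_string(isa):
    """Split a string such as 'RV64IMAC' into width, base ISA and extensions."""
    isa = isa.upper()
    if not isa.startswith("RV"):
        raise ValueError(f"not a RISC-V ISA string: {isa!r}")

    # Collect the register width that follows the 'RV' prefix.
    rest = isa[2:]
    digits = ""
    while rest and rest[0].isdigit():
        digits += rest[0]
        rest = rest[1:]
    if digits not in ("32", "64", "128"):
        raise ValueError(f"unsupported register width: {digits!r}")

    # The base integer ISA is 'I' (or 'E' for the reduced embedded base).
    if not rest or rest[0] not in ("I", "E"):
        raise ValueError("missing base integer ISA (I or E)")
    base = "RV" + digits + rest[0]

    # Every remaining letter names one optional standard extension.
    extensions = []
    for letter in rest[1:]:
        if letter not in STANDARD_EXTENSIONS:
            raise ValueError(f"extension not handled by this sketch: {letter!r}")
        extensions.append(letter)

    return {"xlen": int(digits), "base": base, "extensions": extensions}


if __name__ == "__main__":
    info = parse_isa_string("RV64IMAC")
    print(info["base"], [STANDARD_EXTENSIONS[e] for e in info["extensions"]])
```

A core that implements only the base ISA and a core that adds further extensions share the same naming scheme, which is what lets hardware and software evolve independently while staying compatible.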

Source Code

A majority of the front-end design of SHAKTI is done using Bluespec System Verilog. The Bluespec compiler can generate a cycle-accurate C model, which in simulation is 8-10x faster than state-of-the-art Verilog simulators. This drastically speeds up verification of designs. Additionally, the Verilog generated from BSV is not only well structured and human readable/maintainable but is also 100% synthesizable, enabling users to start prototyping on FPGAs from day one. BSV also prevents whole classes of design errors, such as race conditions and type errors, thereby obviating the need for verification in these areas. This represents a paradigm change in the CPU architecture design flow. A large part of the verification tools and auxiliary components are developed using Python.

Members of the SHAKTI Processor Team: G. S. Madhusudan, Vishvesh Sundararaman, Arjun Menon, Vinod Ganesan, Shankar Raman, Neel Gala, Deepa N Sarma, Gopinathan M., Rahul Bodduna

Ecosystem Components

SHAKTI has envisioned a family of processors as part of its road-map, catering to different segments of the market. They have been broadly categorized into "Base Processors", "Multi-Core Processors" and "Experimental Processors".

Base Processors
E Class
This is our embedded class processor, built around a 3-stage in-order core. It is aimed at low-power, low-compute applications and is capable of running basic RTOSs like FreeRTOS, Zephyr and eChronos. Market segments include smart cards, IoT sensors, motor controls and robotic platforms.

C Class
The C Class is a controller class of processors, aimed at mid-range application workloads. The core is a highly optimized, 5-stage in-order design with MMU support and the capability to run operating systems such as Linux and seL4. These processors are targeted at compute/control applications in the 500 MHz - 1.5 GHz range. The C Class will support the full RISC-V ISA (Instruction Set Architecture). The C Class is also the basis for our Tagged-ISA and fault-tolerant cores.

I Class
Equipped with performance-oriented features like out-of-order execution, multi-threading, aggressive branch prediction, non-blocking caches and deep pipeline stages, the I Class processors are targeted at the compute, mobile, storage and networking segments. Target operating range: 1.5-2.5 GHz.

Multi-Core Processors

This category consists of multi-core variants with auxiliary computational units meant to serve high-performance compute requirements.

M Class
This is a mobile class processor with a maximum of 8 cores, the cores being a combination of C and I Class cores. TileLink is used as the cache-coherent interconnect, along with transaction adapters/bridges to AXI4/AHB to connect to fast and/or slow peripherals. The TileLink topology is customizable to allow optimizations for various power/performance targets. In typical configurations, it is expected that a core complex of 2 or 4 cores will share an L2 cache. L3 caches are optional and are typically expected to be used in desktop-type applications.

S Class
The S Class is aimed at workstation and enterprise server workloads. The base core is an enhanced version of the I Class, with quad-issue and multi-threading support. A TileLink-based cache-coherent mesh fabric is the interconnect of choice. Cores are expected to use dedicated L2 caches and segmented L3 caches. A maximum core count of 32 will be supported. The external interconnect is expected to be Gen-Z, and we are considering supporting multi-socket cache coherency based on a MOESIF-style protocol running on top of Gen-Z.

H Class
An SoC configuration aimed at highly parallel enterprise, HPC and analytics workloads. The cores can be a combination of C or I Class, with single-thread performance driving the core choice. Optional L4 caches and an optimized memory hierarchy are key to achieving high memory bandwidth. The architectural thrust is on accelerators, VPU and AI/ML, and on a mesh SoC fabric optimized for up to 128 cores with multiple accelerators per core. Close integration with an external Gen-Z fabric is a key part of the design, as is support for storage class memory. This aspect of the design is crucial since I/O and memory bandwidth are often the bottleneck for these classes of processors.

Experimental Processors

These categories of cores are experimental in nature and will include variants of the base-class processors modified to meet specific criteria.

T Class
A variant of the C Class that explores tag-based ISAs for object-level security. We plan to support coarse-grain and fine-grain tags. Coarse-grain tags will be used to realize micro-VM-like functionality to mitigate software attacks like buffer overflows.
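To make the idea of tag-based protection concrete, the following is a minimal software sketch of generic memory tagging: every memory word carries a tag identifying the object it belongs to, and a store is allowed only when the tag carried by the pointer matches. This is not the T Class design, whose tag format and semantics are not described here. The class `TaggedMemory`, the exception `TagViolation` and the tag values are all hypothetical.

```python
# Generic memory-tagging illustration. TaggedMemory, TagViolation and the
# tag values are hypothetical and do not describe the actual T Class ISA.
class TagViolation(Exception):
    """Raised when a pointer's tag does not match the tag of the target word."""


class TaggedMemory:
    def __init__(self, size):
        self.data = [0] * size
        self.tags = [0] * size  # tag 0 marks unallocated words

    def allocate(self, start, length, tag):
        """Mark the words [start, start + length) as belonging to object `tag`."""
        for addr in range(start, start + length):
            self.tags[addr] = tag

    def store(self, addr, value, pointer_tag):
        """Write `value` only if the pointer is tagged for the target word."""
        if self.tags[addr] != pointer_tag:
            raise TagViolation(
                f"pointer tag {pointer_tag} cannot write word {addr} "
                f"(tagged {self.tags[addr]})"
            )
        self.data[addr] = value


if __name__ == "__main__":
    mem = TaggedMemory(8)
    mem.allocate(0, 4, tag=1)  # a 4-word buffer
    mem.allocate(4, 4, tag=2)  # an adjacent, unrelated object

    # Writing one word past the end of the buffer is caught by the tag check.
    try:
        for offset in range(5):
            mem.store(offset, 0xAA, pointer_tag=1)
    except TagViolation as err:
        print("overflow blocked:", err)
```

In hardware the comparison would happen as part of the memory access itself, so an overflowing write is stopped before it can corrupt the neighbouring object.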

F Class
F Class processors are fault-tolerant versions of the base processors. Features include redundant compute blocks (like DMR and TMR), temporal redundancy modules to detect permanent faults, lock-step core configurations, fault localization circuits, ECC for critical memory blocks and redundant bus fabrics. These are also a key component of our ASIL-D solutions and autonomous vehicle compute blocks.
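Triple modular redundancy (TMR), one of the techniques listed above, can be sketched in a few lines: the same computation runs on three replicas and a bitwise majority vote masks a fault in any single one. The sketch below is a software illustration under simplifying assumptions, not the F Class hardware. The names `tmr_vote` and `tmr_execute` and the fault-injection parameter are hypothetical.

```python
# Software illustration of TMR voting. tmr_vote, tmr_execute and the
# fault-injection parameter are hypothetical, not SHAKTI interfaces.
def tmr_vote(a, b, c):
    """Bitwise majority of three integer results."""
    return (a & b) | (b & c) | (a & c)


def tmr_execute(compute, x, fault=None):
    """Run `compute` on three replicas and vote on the result.

    `fault` is an optional (replica_index, bitmask) pair that flips bits in
    one replica's result to emulate a hardware fault.
    Returns the voted value and the indices of replicas that disagreed.
    """
    results = [compute(x) for _ in range(3)]
    if fault is not None:
        replica, bitmask = fault
        results[replica] ^= bitmask

    voted = tmr_vote(*results)
    # A replica whose output differs from the vote is the likely fault site.
    disagreeing = [i for i, r in enumerate(results) if r != voted]
    return voted, disagreeing


if __name__ == "__main__":
    value, suspects = tmr_execute(lambda v: v + 1, 41, fault=(1, 0b100))
    print(value, suspects)
```

With a single corrupted replica the vote still yields the correct result, and the list of disagreeing replicas gives a crude form of fault localization. Dual modular redundancy (DMR) can only detect such a mismatch, not correct it.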

According to sources, SHAKTI is already going into production, with the first design being used in the control system of an experimental civilian nuclear reactor (the Prototype Fast Breeder Reactor).

Source :- https://shaktiproject.bitbucket.io/ [Official Website of Shakti Program]
