General-purpose on-chip processors have become ubiquitous today. These processors range from extremely small, low-power micro-controllers (used in motor controls, robotic platforms, home appliances, etc.) to hefty, high-performance multi-core processors (used in servers and supercomputers). However, the growth of modern domain-specific languages and frameworks (like Caffe, TensorFlow, etc.) and the need for more specialized features such as machine learning and enhanced security have forced the industry to look beyond general-purpose solutions and towards mass customization. While a large number of companies today can develop custom ASICs (Application-Specific Integrated Circuits) and license specific silicon blocks from chip vendors to develop customized SoCs (Systems on Chip), at the heart of every design is the processor and its associated hardware. To serve modern workloads better, these processors also need to be customized, upgraded, re-designed and augmented suitably. This requires that vendors and consumers have access to relevant processor variants, along with the flexibility to modify them and ship them at an affordable cost.
Today, a fair share of the processor market is dominated by just a few giants like Intel, ARM, AMD, etc. Each of these companies has an impressive IP portfolio of processors catering to various market trends. Almost all of their IP offerings fall under licensing models, which vary significantly from company to company. For example, Intel licenses its ISA only to a limited set of users like AMD. ARM, on the other hand, offers a broad range of licenses, from ISA licenses to architectural licenses. Apart from license fees, these companies also charge royalties on devices using their IPs. Having sustained a successful IP model, some of these licenses today can cost up to \$1-10 million, in addition to strict NDAs which may restrict the user from making any proprietary changes or even publishing relevant numbers. All these aspects of the licensing model, while benefiting the respective companies, have made it difficult for consumers to develop truly customized solutions for modern-day workloads. Some of these customizations cater to too small a market segment for the giants themselves to invest in, which inhibits growth and novelty.
In essence, the closed-source IP model in the processor community is proving to be a hindrance to building scalable solutions. A similar struggle against closed-source IP in the software industry led to the rise of the open-source Linux kernel in the 1990s. Since then, the software community has seen a plethora of open-source software and tool-chains which have been adopted by both industry and academia. The hardware community, however, has not yet seen such a revolution and is probably in dire need of one. An open-source processor eco-system will not only boost customization but also allow bright minds from industry and academia to collaborate and provide a stable and viable framework that is competitive with modern-day products. SHAKTI, an open-source initiative by IIT-Madras (Indian Institute of Technology Madras), is primarily aimed at building such open-source processor development eco-systems, which can equip the community with enough ammunition to build custom, industrial-grade processors without the hassle of licensing, NDAs, royalties or any other sort of restriction.
The SHAKTI Program
The SHAKTI Processor Program was started as an academic initiative back in 2014 by the RISE group at IIT-Madras. Realizing the limitations of the processor industry mentioned above, the initiative aimed at not only creating open-source industrial grade processors but also building associated components of a bigger eco-system - like interconnect fabrics, scalable verification platforms, peripheral IPs, etc. - which enables rapid adoption of the processors. Some of the major highlights of the program which make it a viable option for adoption are:
In addition to the above arguments, the combination of an open-source processor eco-system such as SHAKTI and a fabrication entity like TSMC, which is offering up to 100 small test chips on its latest technology node for only \$30,000, can enable virtually any project to obtain real chips for final validation at drastically lower cost and in less time.
A majority of the front-end design of SHAKTI is done using Bluespec SystemVerilog (BSV). The Bluespec compiler can generate a cycle-accurate C model, which in simulation is 8-10x faster than state-of-the-art Verilog simulators. This drastically speeds up verification of designs. Additionally, the BSV-generated Verilog is not only well structured and human readable/maintainable but is also 100% synthesizable, enabling users to start prototyping on FPGAs from day one. BSV also prevents classes of design errors like race conditions and type errors from happening, thereby obviating the need for verification in these areas. This represents a paradigm change in the CPU architecture design flow. A large part of the verification tools and auxiliary components are developed using Python.
Members of the Shakti Processor Team : G. S. Madhusudan, Vishvesh Sundararaman, Arjun Menon, Vinod Ganesan, Shankar Raman, Neel Gala, Deepa N Sarma, Gopinathan M., Rahul Bodduna
Ecosystem Components
SHAKTI has envisioned a family of processors as part of its road-map, catering to different segments of the market. They have been broadly categorized into "Base Processors", "Multi-Core Processors" and "Experimental Processors".
This is our embedded class processor, built around a 3-stage in-order core. It is aimed at low-power, low-compute applications and is capable of running basic RTOSs like FreeRTOS, Zephyr and eChronos. Market segments include smart cards, IoT sensors, motor controls and robotic platforms.
The C-Class is a controller class of processors, aimed at mid-range application workloads. The core is a highly optimized, 5-stage in-order design with MMU support and the capability to run operating systems such as Linux and seL4. These processors are targeted at compute/control applications in the 500 MHz - 1.5 GHz range. The C-Class will support the full RISC-V ISA (Instruction Set Architecture). The C-Class is also the basis for our Tagged-ISA and fault-tolerant cores.
Equipped with performance-oriented features like out-of-order execution, multi-threading, aggressive branch prediction, non-blocking caches and deep pipeline stages, the I-Class processors are targeted at the compute, mobile, storage and networking segments. Target operating range: 1.5-2.5 GHz.
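The base classes described above can be summarized as a small lookup table. The following is a minimal Python sketch and not part of the SHAKTI code base: the `CoreClass` type, the `classes_for_frequency` helper and the label "embedded" for the 3-stage core are hypothetical names chosen for illustration. The numbers are copied from the descriptions above; a field is `None` wherever the text gives no figure.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CoreClass:
    name: str                                      # label used in this sketch only
    pipeline_stages: Optional[int]                 # None when the text gives no number
    out_of_order: bool
    freq_range_ghz: Optional[Tuple[float, float]]  # None when the text gives no range


BASE_CLASSES = [
    CoreClass("embedded", 3, False, None),
    CoreClass("C-Class", 5, False, (0.5, 1.5)),
    CoreClass("I-Class", None, True, (1.5, 2.5)),
]


def classes_for_frequency(target_ghz: float) -> List[str]:
    """Return the names of classes whose stated range covers target_ghz."""
    return [
        c.name
        for c in BASE_CLASSES
        if c.freq_range_ghz is not None
        and c.freq_range_ghz[0] <= target_ghz <= c.freq_range_ghz[1]
    ]


print(classes_for_frequency(1.0))  # ['C-Class']
print(classes_for_frequency(1.5))  # ['C-Class', 'I-Class'], the two ranges meet at 1.5 GHz
print(classes_for_frequency(3.0))  # []
```

A real selection would also weigh power, area and operating-system requirements, which this sketch leaves out.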
Multi-Core Processors
This category consists of multi-core variants with auxiliary computational units meant to serve high-performance compute requirements.
This is a mobile class processor with a maximum of 8 cores, the cores being a combination of C-Class and I-Class cores. TileLink is used as the cache-coherent interconnect, along with transaction adapters/bridges to AXI4/AHB to connect to fast and/or slow peripherals. The TileLink topology is customizable to allow optimizations for various power/performance targets. In typical configurations, it is expected that a core complex of 2 or 4 cores will share an L2 cache. L3 caches are optional and are typically expected to be used in desktop-type applications.
The S-Class is aimed at workstation and enterprise server workloads. The base core is an enhanced version of the I-Class, with quad-issue and multi-threading support. A TileLink-based cache-coherent mesh fabric is the interconnect of choice. Cores are expected to use dedicated L2 caches and segmented L3 caches. A maximum core count of 32 will be supported. The external interconnect is expected to be Gen-Z, and we are considering supporting multi-socket cache coherency based on a MOESIF-style protocol running on top of Gen-Z.
This is an SoC configuration aimed at highly parallel enterprise, HPC and analytics workloads. The cores can be a combination of C-Class or I-Class cores, with single-thread performance requirements driving the core choice. Optional L4 caches and an optimized memory hierarchy are key to achieving high memory bandwidth. The architectural thrust is on accelerators, VPU and AI/ML, and on a mesh SoC fabric optimized for up to 128 cores with multiple accelerators per core. Close integration with an external Gen-Z fabric is a key part of the design, as is support for storage-class memory. This aspect of the design is crucial since I/O and memory bandwidth are often the bottleneck for these classes of processors.
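The core-count and core-mix constraints stated for the three multi-core variants can be expressed as a simple configuration check. This is a hedged Python sketch for illustration only: the `LIMITS` table, the `validate` function and the keys "mobile" and "hpc" are hypothetical (the text names only the S-Class), and the enhanced I-Class base core of the S-Class is modelled simply as "I".

```python
from typing import Dict, List

# Limits copied from the descriptions above. The keys "mobile" and "hpc" are
# labels invented for this sketch; only "S-Class" is named in the text.
LIMITS: Dict[str, dict] = {
    "mobile": {"max_cores": 8, "allowed": {"C", "I"}},
    "S-Class": {"max_cores": 32, "allowed": {"I"}},  # enhanced I-Class base core
    "hpc": {"max_cores": 128, "allowed": {"C", "I"}},
}


def validate(variant: str, cores: List[str]) -> List[str]:
    """Return a list of problems with the proposed core mix (empty if none)."""
    limit = LIMITS[variant]
    problems = []
    if not cores:
        problems.append("at least one core is required")
    if len(cores) > limit["max_cores"]:
        problems.append(
            f"{len(cores)} cores exceeds the maximum of {limit['max_cores']}"
        )
    unsupported = sorted(set(cores) - limit["allowed"])
    if unsupported:
        problems.append(f"unsupported core classes: {unsupported}")
    return problems


print(validate("mobile", ["C"] * 4 + ["I"] * 4))  # []
print(validate("mobile", ["I"] * 12))             # ['12 cores exceeds the maximum of 8']
print(validate("S-Class", ["C", "I"]))            # ["unsupported core classes: ['C']"]
```

Cache sharing, fabric topology and accelerator attachment are deliberately omitted; the sketch captures only the limits the text states explicitly.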
Experimental Processors
These categories of cores are experimental in nature and will include variants of the base-class processors modified to meet specific criteria.
A variant of the C-Class that explores tag-based ISAs for object-level security. We plan to support coarse-grain and fine-grain tags. Coarse-grain tags will be used to realize micro-VM-like functionality to mitigate software attacks like buffer overflows.
T-Class processors are fault-tolerant versions of the base processors. Features include redundant compute blocks (like DMR and TMR), temporal redundancy modules to detect permanent faults, lock-step core configurations, fault localization circuits, ECC for critical memory blocks and redundant bus fabrics. These are also a key component of our ASIL-D solutions and autonomous vehicle compute blocks.
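To illustrate the difference between the DMR and TMR schemes mentioned above, here is a minimal software model in Python. It is a sketch of the general technique, not of SHAKTI's hardware: the function names `tmr_vote` and `dmr_check` and the test values are assumptions made for this example.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three redundant results (triple modular redundancy)."""
    return (a & b) | (b & c) | (a & c)


def dmr_check(a: int, b: int) -> bool:
    """True when two redundant results agree (dual modular redundancy).

    DMR can detect a mismatch but cannot tell which copy is wrong.
    """
    return a == b


golden = 0b1011_0010
faulty = golden ^ 0b0000_1000  # one copy suffers a single-bit upset

print(bin(tmr_vote(golden, faulty, golden)))  # 0b10110010, the fault is masked
print(dmr_check(golden, faulty))              # False, the fault is only detected
```

With three copies, a fault in any single copy is outvoted by the other two, whereas two copies can only flag the disagreement and leave recovery to another mechanism.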
According to sources, SHAKTI is already going into production, with the first design being used in the control system of an experimental civilian nuclear reactor (the Prototype Fast Breeder Reactor).
Source: https://shaktiproject.bitbucket.io/ (official website of the SHAKTI program)