A COMPLETE TUTOR FOR ASIC DESIGN FLOW (RTL TO GDS II): PROJECTS

VEDIC MULTIPLIER AND FAULT TOLERANT SPARSE KOGGE STONE ADDER BASED MULTIPLY AND ACCUMULATE UNIT

Abstract:- In most digital signal processing (DSP) applications, the critical operations are multiplication and accumulation. Real-time signal processing requires a high-speed, high-throughput multiplier-accumulator (MAC) unit that consumes little power, which is key to achieving a high-performance DSP system. The multiplier used inside the MAC unit is based on the Sutra "Urdhava Tiryagbhyam" (Vertically and Crosswise), one of the Sutras of Vedic mathematics. Vedic mathematics is based mainly on sixteen Sutras and was rediscovered in the early twentieth century. In ancient times this Sutra was traditionally used to multiply decimal numbers quickly. The same concept is applied here to the multiplication of binary numbers so that it can be used in digital hardware. The adder used in the MAC unit is a fault-tolerant sparse Kogge-Stone adder. The sparse Kogge-Stone adder is very fast, and it is made fault tolerant so that the design remains robust even in critical environments. The combined use of the Vedic multiplier and the fault-tolerant sparse Kogge-Stone adder makes this MAC unit both robust and high-performance.

I. INTRODUCTION

DSP functions make extensive use of the multiply-accumulate (MAC) operation, so it is central to high-performance digital signal processing systems. Research on MAC design focuses on increasing its speed. The main motivation behind this work is to achieve high speed through the VLSI design and implementation of a MAC unit architecture built from a Vedic multiplier and a fault-tolerant fast adder. Vedic mathematics was reconstructed from the Vedas by Sri Bharati Krishna Tirthaji (1884-1960) after eight years of research on the Vedas. According to him, Vedic mathematics rests mainly on sixteen important principles, or word-formulae, known as Sutras. The appeal of Vedic mathematics is that it reduces calculations that are cumbersome in conventional mathematics to very simple steps. This is attributed to the Vedic formulae being developed along the natural way in which the human mind works. The most important feature of Vedic mathematics is its coherence: the entire system is interrelated and unified. For example, the general multiplication scheme can easily be reversed to obtain one-line divisions, and the simple squaring scheme can be reversed to produce one-line square roots. These methods are easy to understand. The Sutras are powerful enough to be used even for astrological calculations. This is a very interesting field, and it offers effective algorithms that can be applied to various branches of engineering, such as computing and digital signal processing.

Generally, digital adders work faster when the carries are generated before the addition is actually performed, as in carry-lookahead adders. The disadvantage of carry-lookahead adders is that the circuit logic becomes more complicated as the number of operand bits increases. This increases the area and the delay of computing the carries for the most significant bits. In 1973, Peter Kogge and Harold Stone proposed the Kogge-Stone adder, which generates and propagates the carries in an orderly series of log2(n) prefix stages (four stages for 16 bits), which speeds up the calculation.

An enhanced version of the Kogge-Stone adder, called the sparse Kogge-Stone adder, was proposed later. It is similar to the Kogge-Stone adder except that it computes only some intermediate carries (every fourth carry in the 16-bit design used here) instead of all of them. The remaining carries are generated by ripple-carry adders. This reduces the area compared with the Kogge-Stone adder while keeping a delay comparable to it. A 16-bit sparse Kogge-Stone adder is used here, and a fault-tolerant circuit is added to it.

A conventional MAC unit consists of a fast multiplier and an accumulator that holds the sum of the previous products. The main goal of DSP processor design is to increase the speed of the MAC unit. We have designed the MAC unit using a Vedic multiplier and a fault-tolerant sparse Kogge-Stone adder to achieve high computational capability.

II. THE MULTIPLIER ARCHITECTURE

The Vedic Sutra called Urdhava Tiryagbhyam (Vertically and Crosswise) deals with the multiplication of numbers. This Sutra has traditionally been used to multiply decimal numbers. We apply the same idea to binary numbers to make it suitable for digital hardware. Let us first illustrate the Sutra with an example in which two decimal numbers, 592 and 687, are multiplied.
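The figure for this example is not reproduced here, so the following Python sketch walks through the vertical-and-crosswise steps for 592 × 687. It is an illustration under our own naming (`to_digits` and `urdhva_multiply` are hypothetical helpers, not the author's code): step k adds up every digit product whose positions sum to k, adds the carry from the previous step, keeps the least significant digit as the result digit and passes the rest on as the carry.

```python
def to_digits(n, base):
    """Digits of a non-negative integer n in the given base, least significant first."""
    digits = []
    while True:
        n, d = divmod(n, base)
        digits.append(d)
        if n == 0:
            return digits


def urdhva_multiply(x, y, base=10):
    """Vertically-and-crosswise multiplication of two non-negative integers.

    Returns (product, steps), where each step is (k, crosswise_sum, digit, carry).
    """
    xd, yd = to_digits(x, base), to_digits(y, base)
    n = max(len(xd), len(yd))
    xd += [0] * (n - len(xd))          # pad both operands to the same length
    yd += [0] * (n - len(yd))
    steps, result, carry = [], [], 0
    for k in range(2 * n - 1):
        # Crosswise products: every digit pair whose positions add up to k.
        crosswise = sum(xd[i] * yd[k - i] for i in range(n) if 0 <= k - i < n)
        total = crosswise + carry
        digit, carry = total % base, total // base   # keep one digit, carry the rest
        steps.append((k, crosswise, digit, carry))
        result.append(digit)
    while carry:                        # leftover carry becomes the leading digits
        carry, d = divmod(carry, base)
        result.append(d)
    product = sum(d * base ** i for i, d in enumerate(result))
    return product, steps


product, steps = urdhva_multiply(592, 687)
for step in steps:
    print(step)
# (0, 14, 4, 1), (1, 79, 0, 8), (2, 119, 7, 12), (3, 94, 6, 10), (4, 30, 0, 4)
print(product)  # 406704
```

Reading the steps: 2×7 = 14 gives digit 4 and carry 1; 9×7 + 2×8 = 79, plus 1, gives digit 0 and carry 8; and so on, until the final carry 4 becomes the leading digit of 406704. The same function accepts `base=2`, which is the binary case discussed next.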

We now extend the Vedic multiplication algorithm to the binary number system. Consider the multiplication of two 4-bit binary numbers A3A2A1A0 and B3B2B1B0. Since the product can be up to 8 bits wide, we express it as R7R6R5...R0. The line diagram for multiplying two 4-bit numbers is shown in the diagram below; it is simply the binary counterpart of the decimal example in the figure above.

The least significant bit R0 is obtained by multiplying the least significant bits of the multiplicand and the multiplier. The process then follows the steps shown in the figure above. The bits at the two ends of each line are multiplied and added to the carry from the previous step. This generates one bit of the result (Rn) and a carry (Cn), which is added in the next step, and so on. If a step contains more than one line, all the products are added to the previous carry. In each step, the least significant bit of the sum becomes the result bit and all the remaining bits form the carry. This gives the following expressions, where CnRn denotes carry Cn concatenated with result bit Rn:

R0=B0A0;

C1R1=B0A1+B1A0;

C2R2=C1+B0A2+B1A1+B2A0;

C3R3=C2+B0A3+B1A2+B2A1+B3A0;

C4R4=C3+B1A3+B2A2+B3A1;

C5R5=C4+B2A3+B3A2;

C6R6=C5+B3A3;

R7=C6;
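A behavioral Python sketch of the step-by-step scheme above may help check it. Step k adds the carry from step k-1 to every crosswise bit product BiAj with i + j = k; the LSB of that sum is Rk and the remaining bits are the carry. The function name `vedic_4x4` is ours; the author's Verilog is not reproduced in this post.

```python
def vedic_4x4(a, b):
    """4x4 Urdhava Tiryagbhyam multiplication following the step equations.

    a, b: 4-bit unsigned integers. Returns the 8-bit product R7..R0.
    """
    A = [(a >> i) & 1 for i in range(4)]
    B = [(b >> i) & 1 for i in range(4)]
    r = [0] * 8
    carry = 0                                   # no carry enters step 0 (R0 = B0A0)
    for k in range(7):                          # steps 0..6 produce R0..R6
        crosswise = sum(B[i] * A[k - i] for i in range(4) if 0 <= k - i < 4)
        total = carry + crosswise
        r[k] = total & 1                        # LSB -> result bit Rk
        carry = total >> 1                      # remaining bits -> carry Ck
    r[7] = carry                                # R7 = C6 (at most 1 for 4-bit inputs)
    return sum(bit << k for k, bit in enumerate(r))


print(vedic_4x4(15, 15))  # 225
```

An exhaustive comparison against ordinary multiplication over all 256 input pairs is a quick way to confirm the equations before writing the HDL.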

III. 4X4 MULTIPLIER MODULE

2X2 VEDIC MULTIPLIER MODULE FOR BINARY NUMBERS.

The 2x2 Vedic multiplier is implemented using two half-adder modules, as shown in the figure below. Once the bit products are generated, the total delay is two half-adder delays.

The implementation equations of the 2x2 Vedic multiplier module are:

R0(1-BIT)=B0A0;

C1R1(2-BITS)=B0A1+B1A0;

R2(2-BITS)=B1A1+C1;

PRODUCT = R2&R1&R0; (& denotes concatenation, giving a 4-bit product)
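The two-half-adder structure can be sketched in Python as follows. The helper names `half_adder` and `vedic_2x2` are hypothetical; the first half adder produces the sum bit R1 and carry C1 of B0A1 + B1A0, and the second adds B1A1 and C1 to give the 2-bit upper result R2.

```python
def half_adder(x, y):
    """Return (sum, carry) of two single bits."""
    return x ^ y, x & y


def vedic_2x2(a, b):
    """2x2 Vedic multiplier built from two half adders; a, b are 2-bit values."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    r0 = b0 & a0                               # R0 = B0A0
    r1, c1 = half_adder(b0 & a1, b1 & a0)      # sum bit R1 and carry C1 of B0A1 + B1A0
    r2_lo, r2_hi = half_adder(b1 & a1, c1)     # 2-bit R2 = B1A1 + C1
    # PRODUCT = R2 & R1 & R0 (concatenation, MSB first)
    return (r2_hi << 3) | (r2_lo << 2) | (r1 << 1) | r0


print(vedic_2x2(3, 3))  # 9
```

Each output bit sits behind at most two half adders after the AND gates, which matches the two-half-adder delay stated above.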

The 4x4 Vedic multiplier is implemented using four 2x2 Vedic multiplier modules, as shown in the diagram below. Here, partial product generation and addition are performed concurrently.
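The adder arrangement between the four 2x2 blocks appears only in the diagram, so this Python sketch assumes the common one: the operands are split into 2-bit halves, the four 2x2 products are weighted by 2^0, 2^2, 2^2 and 2^4, and the two low product bits pass straight through from the AL×BL block. The names `vedic_2x2` and `vedic_4x4_from_2x2` are hypothetical.

```python
def vedic_2x2(a, b):
    """2x2 Vedic block made of two half adders (a, b are 2-bit values)."""
    a0, a1, b0, b1 = a & 1, (a >> 1) & 1, b & 1, (b >> 1) & 1
    s1, c1 = (b0 & a1) ^ (b1 & a0), (b0 & a1) & (b1 & a0)   # first half adder
    s2, c2 = (b1 & a1) ^ c1, (b1 & a1) & c1                 # second half adder
    return (c2 << 3) | (s2 << 2) | (s1 << 1) | (b0 & a0)


def vedic_4x4_from_2x2(a, b):
    """4x4 multiplier assembled from four 2x2 blocks (assumed adder arrangement)."""
    a_lo, a_hi = a & 0b11, a >> 2
    b_lo, b_hi = b & 0b11, b >> 2
    q0 = vedic_2x2(a_lo, b_lo)          # weight 2^0
    q1 = vedic_2x2(a_hi, b_lo)          # weight 2^2
    q2 = vedic_2x2(a_lo, b_hi)          # weight 2^2
    q3 = vedic_2x2(a_hi, b_hi)          # weight 2^4
    middle = q1 + q2                    # add the two middle partial products
    upper = (q3 << 2) + middle + (q0 >> 2)   # everything from bit 2 upward
    return (upper << 2) | (q0 & 0b11)   # bits 1..0 come directly from q0


print(vedic_4x4_from_2x2(13, 11))  # 143
```

Because the four 2x2 blocks depend only on the inputs, they can all run in parallel, which is the concurrency the text refers to; only the final additions are sequential.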

IV. DESCRIPTION OF TRIPLE MODULAR REDUNDANCY

Triple modular redundancy is a general technique for detecting and masking faults in a device. To check whether a core component is faulty, two more identical copies of it are made, and the inputs supplied to the core component are also supplied to these two copies, which are assumed to be working correctly. Of the three outputs obtained, the value on which at least two agree is passed on to the following stages. In this way a fault in the core component can be detected, and the faulty component can then be replaced by a redundant one until the original is repaired. Although the additional circuits increase area and power dissipation, they make the circuit highly reliable, which matters in critical applications.
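A minimal Python sketch of the voting step, assuming the usual bitwise 2-of-3 majority voter (function names are hypothetical). Whenever at least two of the three output words are equal, the bitwise majority returns exactly that word; the extra flag marks the case where the core disagrees with two matching copies.

```python
def majority_vote(x, y, z):
    """Bitwise 2-of-3 majority: each bit takes the value at least two inputs share."""
    return (x & y) | (y & z) | (x & z)


def tmr(core_out, copy1_out, copy2_out):
    """Return (voted output, core_faulty).

    core_faulty is True when both copies agree with each other but not with the core.
    """
    voted = majority_vote(core_out, copy1_out, copy2_out)
    core_faulty = copy1_out == copy2_out and core_out != copy1_out
    return voted, core_faulty


print(tmr(0b1010, 0b1010, 0b1010))  # (10, False)
print(tmr(0b1110, 0b1010, 0b1010))  # (10, True): the faulty core is outvoted
```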

V. DESCRIPTION OF 16-BIT KOGGE STONE ADDER

The diagram above shows how a 16-bit Kogge-Stone adder generates the carries in advance. The top PG block calculates the generate and propagate values for each bit position of the operands:

Gi = Ai and Bi

Pi = Ai xor Bi

The black boxes then calculate the intermediate (group) generate and propagate values as follows, where i:j denotes the span of bits from i down to j and i > k ≥ j:

Pi:j = Pi:k+1 and Pk:j

Gi:j = Gi:k+1 or (Pi:k+1 and Gk:j)

Finally, the white boxes produce the carries, which are simply the group generate values given by the equation above.

Using these carries, each sum bit is calculated as:

Si = Pi xor C(i-1)
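A bit-level Python sketch of a 16-bit Kogge-Stone adder that follows the Gi/Pi, black-box and sum equations above. The function name is hypothetical, and folding the external carry-in into G0 (treating it as G(-1)) is our modeling choice; after the four prefix stages, G[i] holds the carry out of bit i.

```python
def kogge_stone_add(a, b, cin=0, width=16):
    """Kogge-Stone prefix adder modeled at bit level. Returns (sum, carry_out)."""
    # PG block: Gi = Ai and Bi, Pi = Ai xor Bi
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]
    G, P = g.copy(), p.copy()
    G[0] = g[0] | (p[0] & cin)          # assumption: carry-in folded into bit 0
    # log2(width) stages of black boxes; at each stage a cell combines its
    # group with the group `dist` bits below it.
    dist = 1
    while dist < width:
        G_next, P_next = G.copy(), P.copy()
        for i in range(dist, width):
            G_next[i] = G[i] | (P[i] & G[i - dist])   # Gi:j = Gi:k+1 or (Pi:k+1 and Gk:j)
            P_next[i] = P[i] & P[i - dist]            # Pi:j = Pi:k+1 and Pk:j
        G, P = G_next, P_next
        dist *= 2
    # G[i] now covers bits i..0 and is the carry out of bit i.
    s = 0
    for i in range(width):
        carry_in_i = cin if i == 0 else G[i - 1]
        s |= (p[i] ^ carry_in_i) << i                 # Si = Pi xor C(i-1)
    return s, G[width - 1]


print(kogge_stone_add(0xFFFF, 0x0001))  # (0, 1)
```

For 16 bits the loop runs with dist = 1, 2, 4 and 8, i.e. the four prefix stages of the diagram.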

VI. DESCRIPTION OF 16-BIT SPARSE KOGGE STONE ADDER

As shown in the previous diagram, only the carries out of the 3rd, 7th and 11th bits (c3, c7 and c11) are computed in advance. The remaining carries are produced inside the ripple-carry adders RC0, RC1, RC2 and RC3. These are 4-bit adders built from full adders, and they take the precomputed carries (and, for RC0, the external carry-in) as their carry inputs. Using this adder reduces the logic complexity while keeping the delay close to that of the full Kogge-Stone adder.
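The following Python sketch models the sparse structure: the bits are grouped into 4-bit blocks, a Kogge-Stone prefix over the four block-level (G, P) pairs yields only c3, c7 and c11, and four 4-bit ripple-carry adders produce the sum. The grouping follows the text; how the block-level G/P values are formed is our assumption because the diagram is not reproduced, and all function names are hypothetical.

```python
def dot(hi, lo):
    """Black-box operator: combine (G, P) of a higher group with the group below it."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def ripple_carry_4(a4, b4, cin):
    """4-bit ripple-carry adder built from full adders. Returns (sum, carry_out)."""
    s, c = 0, cin
    for i in range(4):
        ai, bi = (a4 >> i) & 1, (b4 >> i) & 1
        s |= (ai ^ bi ^ c) << i
        c = (ai & bi) | (c & (ai ^ bi))
    return s, c


def sparse_kogge_stone_16(a, b, cin=0):
    """16-bit sparse Kogge-Stone adder: the tree makes only c3, c7, c11."""
    pg = [(((a >> i) & 1) & ((b >> i) & 1), ((a >> i) & 1) ^ ((b >> i) & 1))
          for i in range(16)]
    # One (G, P) pair per 4-bit block.
    blocks = []
    for k in range(4):
        grp = pg[4 * k]
        for i in range(4 * k + 1, 4 * k + 4):
            grp = dot(pg[i], grp)
        blocks.append(grp)
    g0, p0 = blocks[0]
    blocks[0] = (g0 | (p0 & cin), p0)          # fold the external carry-in into block 0
    dist = 1
    while dist < 4:                            # two Kogge-Stone stages over 4 blocks
        blocks = [dot(blocks[k], blocks[k - dist]) if k >= dist else blocks[k]
                  for k in range(4)]
        dist *= 2
    c3, c7, c11 = blocks[0][0], blocks[1][0], blocks[2][0]
    # RC0..RC3 produce the sum nibbles from cin, c3, c7, c11.
    total, cout = 0, 0
    for k, c in enumerate((cin, c3, c7, c11)):
        nibble, cout = ripple_carry_4((a >> 4 * k) & 0xF, (b >> 4 * k) & 0xF, c)
        total |= nibble << 4 * k
    return total, cout


print(sparse_kogge_stone_16(0x1234, 0x4321))  # (21845, 0), i.e. 0x5555
```

Compared with the full Kogge-Stone sketch, the prefix tree here works on 4 block signals instead of 16 bit signals, which is where the area saving comes from.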

VII. INCLUSION OF TRIPLE MODULAR REDUNDANCY FOR LOWER HALF FAULT DETECTION

In the diagram above, the sparse Kogge-Stone adder is extended with additional components for fault detection in the lower half of the circuit (the ripple-carry adders):

· As soon as the inputs are supplied, the sparse carry tree generates the three carries c3, c7 and c11, the carry-in is supplied externally, and RC0 to RC3 generate the sum.

· A 2-bit counter selects the RC under test; each time the counter increments, the corresponding carry and operand bits are applied to two test adders, test-RC1 and test-RC2.

· Depending on the counter value, the sum from the selected RC is taken together with the sums from the two test RCs. A comparator compares them, and the value on which at least two of the outputs agree is sent on as the tested sum.

· The sums from the remaining RCs are concatenated directly with the tested sum to produce the final sum.

· If the two test RCs produce equal sums that differ from the sum of the RC under test, an error signal is generated.

· The error signal stops the counter from incrementing.

· Incorrect outputs may appear for 1 to 4 clock cycles before the error is detected, because a faulty RC is checked only when the counter reaches it. Once a particular RC is found to be faulty, suitable measures have to be taken to correct it.

· The correction measure suggested in this paper is to switch to a redundant, non-faulty RC once the error is detected.

Testing methodology for lower half circuit:

The lower-half circuit is expected to behave as follows: if a fault is introduced into any one of the RCs, detecting the error should take 1 to 4 clock cycles, depending on the counter value and on which RC is currently being tested, after which the circuit immediately switches to a redundant RC.

To verify this behavior, a fault was intentionally injected into the RCs through a MUX so that the selected RC produces an erroneous output. As soon as the error signal is raised, the design switches immediately to a non-faulty backup RC. Fault-injection logic of this kind added to the design can be classified as post-validation logic.
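The checker and the fault-injection test of this section can be sketched as a cycle-level Python model. Everything here is an assumption for illustration: the function names are hypothetical, the MUX-injected fault is modeled as an inverted LSB in one RC's sum, the carries feeding the RCs are taken as fault-free, and the switch to the backup RC is not modeled. The model reproduces the 1 to 4 cycle detection latency of a 2-bit counter.

```python
def rc4(a4, b4, cin):
    """Fault-free 4-bit ripple-carry adder: returns (4-bit sum, carry out)."""
    s, c = 0, cin
    for i in range(4):
        ai, bi = (a4 >> i) & 1, (b4 >> i) & 1
        s |= (ai ^ bi ^ c) << i
        c = (ai & bi) | (c & (ai ^ bi))
    return s, c


def reference_carries(a, b, cin=0):
    """[cin, c3, c7, c11]: the carry inputs of RC0..RC3 (assumed fault-free)."""
    return [cin] + [((a & m) + (b & m) + cin) >> w
                    for w, m in ((4, 0xF), (8, 0xFF), (12, 0xFFF))]


def lower_half_cycle(a, b, carries, count, faulty_rc=None):
    """One clock cycle with RC number `count` under test.

    The injected fault (hypothetical MUX) inverts the LSB of RC `faulty_rc`.
    Returns (16-bit sum, error signal).
    """
    nibbles = []
    for k in range(4):
        s, _ = rc4((a >> 4 * k) & 0xF, (b >> 4 * k) & 0xF, carries[k])
        nibbles.append(s ^ 1 if k == faulty_rc else s)
    # test-RC1 and test-RC2 recompute the nibble selected by the counter.
    a4, b4 = (a >> 4 * count) & 0xF, (b >> 4 * count) & 0xF
    t1, _ = rc4(a4, b4, carries[count])
    t2, _ = rc4(a4, b4, carries[count])
    under_test = nibbles[count]
    # 2-of-3 choice (falls back to t2 if all three differ).
    nibbles[count] = t1 if (t1 == t2 or t1 == under_test) else t2
    error = t1 == t2 and t1 != under_test
    return sum(n << 4 * k for k, n in enumerate(nibbles)), error


def cycles_until_error(a, b, start_count, faulty_rc):
    """Clock the checker; the 2-bit counter advances only while no error is
    flagged. Returns the cycle (1..4) in which the error signal rises."""
    carries = reference_carries(a, b)
    count = start_count
    for cycle in range(1, 5):
        _, error = lower_half_cycle(a, b, carries, count, faulty_rc)
        if error:
            return cycle
        count = (count + 1) % 4
    return None


print(cycles_until_error(0x1234, 0x0FFF, start_count=0, faulty_rc=3))  # 4
```

While the counter points at a healthy RC, the faulty nibble reaches the output unchecked, which is why wrong sums are visible until the counter reaches the faulty RC.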

VIII. FAULT DETECTION AND CORRECTION FOR UPPER HALF CIRCUIT

Observe that the upper half carry generator logic is divided into GREEN, PURPLE & BLUE sections.

To detect a fault in one of these sections, the carry outputs of the ripple-carry adders are used as references:

· A fault in the GREEN section can be detected by comparing the carry c3 it generates with the carry out of ripple-carry adder RC0.

· Similarly, the carry c7 from the PURPLE section is compared with the carry out of RC1.

· Finally, the carry c11 from the BLUE section is compared with the carry out of RC2.

If a fault is introduced into any one of these sections, incorrect outputs are visible for 1 to 3 clock cycles until the comparator provided for the carry comparison detects the mismatch and raises the error signal. The error signal stops the counter, whose value then selects the backup section that replaces the faulty one.
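A Python sketch of the upper-half carry check, under stated assumptions: a counter steps through the three sections (GREEN, PURPLE, BLUE as 0, 1, 2) and wraps around, which gives the 1 to 3 cycle latency; a faulty section is modeled as inverting only its own output carry; and all names are hypothetical. The replacement by the backup section is not modeled.

```python
def rc4_cout(a4, b4, cin):
    """Carry out of a fault-free 4-bit ripple-carry adder."""
    c = cin
    for i in range(4):
        ai, bi = (a4 >> i) & 1, (b4 >> i) & 1
        c = (ai & bi) | (c & (ai ^ bi))
    return c


def section_carries(a, b, cin=0, faulty_section=None):
    """[c3, c7, c11] from the GREEN, PURPLE and BLUE sections.
    Assumption: a faulty section inverts only its own output carry."""
    carries = []
    for k in range(3):
        width = 4 * (k + 1)
        mask = (1 << width) - 1
        c = ((a & mask) + (b & mask) + cin) >> width
        carries.append(c ^ 1 if k == faulty_section else c)
    return carries


def carry_mismatch(a, b, cin, carries, section):
    """Compare one section's carry with the carry out of RC0, RC1 or RC2.
    RC k takes its carry-in from the tree, so an upstream carry fault can
    also disturb this comparison when nibble k only propagates a carry."""
    rc_cin = [cin, carries[0], carries[1]][section]
    a4, b4 = (a >> 4 * section) & 0xF, (b >> 4 * section) & 0xF
    return rc4_cout(a4, b4, rc_cin) != carries[section]


def detect(a, b, cin, faulty_section, start):
    """Step the counter over the sections until the first mismatch.
    Returns (cycle, section) or None if no error is seen in 3 cycles."""
    carries = section_carries(a, b, cin, faulty_section)
    section = start
    for cycle in range(1, 4):
        if carry_mismatch(a, b, cin, carries, section):
            return cycle, section
        section = (section + 1) % 3
    return None


print(detect(0x1234, 0x0FFF, 0, faulty_section=1, start=2))  # (3, 1)
```

The operands in the example make every 4-bit block generate its own carry, so each RC's carry out is independent of its carry-in and the mismatch is attributed to the right section.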

IX. DESIGNING MAC UNIT FROM VEDIC MULTIPLIER AND FAULT TOLERANT SPARSE KOGGE STONE ADDER

A basic MAC architecture consists of a multiplier and an accumulator. The MAC unit computes the product of two numbers and adds it to an accumulator register, whose output is fed back to one input of the adder, as shown in the figure above. On each active clock edge (positive or negative, depending on the design), the multiplier output is added to the previous sum by the adder.

The Vedic multiplier has two 4-bit inputs and produces an 8-bit product P. P is one input of the fault-tolerant sparse Kogge-Stone adder, which adds it to the previous sum value and produces MAC OUT.
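A cycle-by-cycle Python sketch of the MAC datapath can serve as a reference model when checking simulation waveforms. Assumptions (not stated in the post): the accumulator resets to 0, it is 16 bits wide to match the adder, the adder's carry out is dropped, and plain `*` and `+` stand in for the Vedic multiplier and the fault-tolerant adder. `mac_run` is a hypothetical name.

```python
def mac_run(pairs, acc_width=16):
    """Return mac_out after each clock edge for a sequence of 4-bit (a, b) pairs."""
    mask = (1 << acc_width) - 1
    acc = 0                              # assumption: accumulator resets to 0
    outputs = []
    for a, b in pairs:
        if not (0 <= a < 16 and 0 <= b < 16):
            raise ValueError("multiplier inputs are 4-bit")
        p = a * b                        # stand-in for the Vedic multiplier (8-bit P)
        acc = (acc + p) & mask           # stand-in for the 16-bit adder; carry out dropped
        outputs.append(acc)
    return outputs


print(mac_run([(3, 5), (15, 15), (7, 2)]))  # [15, 240, 254]
```

With the maximum product 225 added every cycle, the 16-bit accumulator in this model wraps after 292 cycles, so a real design may need guard bits or saturation depending on how many terms are accumulated.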

X. IMPLEMENTATION

The complete MAC unit based on the Vedic multiplier and the fault-tolerant sparse Kogge-Stone adder was implemented in Verilog HDL and simulated with ModelSim PE Student Edition 10.1c.

XI. VERIFICATION RESULTS

The simulation result above shows that, for the applied inputs, the Vedic multiplier produces the product P, and the fault-tolerant sparse Kogge-Stone adder adds P to the previous sum (mac_out) to produce the new mac_out value.

Performance calculation of the MAC

CLOCK TIME PERIOD : 2 PICOSECONDS

CLOCK FREQUENCY : 500 GIGAHERTZ

INFERENCE

VEDIC MULTIPLIER DELAY : LESS THAN 2 PICOSECONDS

FAULT TOLERANT SPARSE KOGGE STONE ADDER DELAY : LESS THAN 2 PICOSECONDS

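The clock period used in the simulation is 2 picoseconds, i.e., the design is clocked at 500 GHz. Within each 2 ps period the inputs change, the Vedic multiplier produces the product value and the fault-tolerant sparse Kogge-Stone adder computes the mac_out value. A functional (RTL) simulation does not model gate delays unless timing data is back-annotated, so these figures confirm cycle-by-cycle correctness; the achievable clock frequency of a synthesized design has to come from synthesis and static timing analysis.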
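For reference, the conversion between the 2 ps clock period and the 500 GHz clock frequency quoted above is:

```latex
f_{\text{clk}} = \frac{1}{T_{\text{clk}}} = \frac{1}{2 \times 10^{-12}\ \text{s}} = 5 \times 10^{11}\ \text{Hz} = 500\ \text{GHz}
```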

XII. CONCLUSION

The obtained simulation results clearly indicate that the MAC unit based on the Vedic multiplier and the fault-tolerant sparse Kogge-Stone adder functions properly. The delay calculations indicate that the MAC operates at high performance and can be used in high-speed DSP applications.

**************PHOTOS WILL BE UPLOADED SOON**************************
