VEDIC MULTIPLIER AND FAULT
TOLERANT SPARSE KOGGE STONE ADDER BASED MULTIPLY AND ACCUMULATE UNIT
Abstract:- In most digital signal processing (DSP) applications, the
critical operations are multiplication and accumulation. Real-time signal
processing requires a high speed, high throughput Multiplier-Accumulator (MAC)
unit that consumes low power, which is key to achieving a high performance
digital signal processing system. The multiplier used inside the MAC unit is
based on the Sutra "Urdhava Tiryagbhyam" (Vertically and Crosswise), which is
one of the Sutras of Vedic mathematics. Vedic mathematics is mainly based on
sixteen Sutras and was rediscovered in the early twentieth century. In ancient
times this Sutra was traditionally used to multiply decimal numbers quickly.
The same concept is applied to the multiplication of binary numbers to make it
useful in digital hardware. The adder used in the design of the MAC unit is a
fault tolerant sparse Kogge-Stone adder. The sparse Kogge-Stone adder is a very
fast adder, and it is made fault tolerant to keep the design robust even in
critical environments. The combined use of the Vedic multiplier and the fault
tolerant sparse Kogge-Stone adder makes this MAC unit highly robust with high
performance.
I. INTRODUCTION
DSP functions make extensive use of the multiply-accumulate (MAC) operation
for high performance digital signal processing systems. Research on MAC design
focuses on enhancing its speed. The main motivation behind this work is to
achieve high speed through VLSI design and implementation of a MAC unit
architecture that uses a Vedic multiplier and a fault tolerant fast adder.
Vedic mathematics was reconstructed from the Vedas by Sri Bharati Krishna
Tirthaji (1884-1960) after eight years of research on the Vedas. In his view,
Vedic mathematics is built on sixteen very important principles or
word-formulae, otherwise known as Sutras. The appeal of Vedic mathematics lies
in the fact that it reduces cumbersome calculations in conventional mathematics
to very simple ones, because the Vedic formulae follow the natural way in which
the human mind works. The most important feature of Vedic mathematics is its
coherence: the entire system is interrelated and unified. The general
multiplication scheme can easily be reversed to achieve one-line divisions.
Similarly, the simple squaring scheme can easily be reversed to produce
one-line square roots. These methods are easy to understand. The Sutras are
powerful and useful even for astrological calculations. This is a very
interesting field, and it presents some effective algorithms that can be
applied to various branches of engineering such as computing and digital signal
processing.
Generally, digital adders work faster when the carries are generated
before the addition is actually performed. This is the case for carry
look-ahead adders. The disadvantage of carry look-ahead adders is that the
logic of the circuit becomes more complicated as the number of bits in the
operands increases. This increases the area and the delay in calculating the
carries for the most significant bits. In the 1970s, Peter Kogge and Harold
Stone introduced the Kogge-Stone adder, which generates and propagates the
carries in successive stages in an orderly fashion. This speeds up the
calculation.
Later, an enhanced version of the Kogge-Stone adder was proposed, called the
sparse Kogge-Stone adder. It is similar to the Kogge-Stone adder except that it
calculates only some intermediate carries instead of all the carries. The
remaining carries are generated by the ripple carry adders themselves. This
reduces the area compared to the Kogge-Stone adder while keeping nearly the
same delay. In this work, a 16-bit sparse Kogge-Stone adder is taken and a
fault tolerance circuit is added to it.
A conventional MAC unit consists of a fast multiplier and an accumulator that
holds the sum of the previous consecutive products. The main goal of DSP
processor design is to enhance the speed of the MAC unit. We have designed the
MAC unit based on the Vedic multiplier and the fault tolerant sparse
Kogge-Stone adder to achieve high computational capability.
II. THE MULTIPLIER ARCHITECTURE
The Vedic Sutra called Urdhava Tiryagbhyam (Vertically and Crosswise) deals
with the multiplication of numbers. This Sutra has traditionally been used for
the multiplication of decimal numbers. We have applied the same idea to binary
numbers to make it compatible with digital hardware. Let us first illustrate
this Sutra with the help of an example in which two decimal numbers, 592 and
687, are multiplied.
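The decimal example can also be traced in a few lines of Python. This is only an illustrative sketch of the vertically and crosswise idea, not the authors' figure; the function name urdhva_decimal is hypothetical. Each column adds the crosswise digit products to the carry from the previous column, keeps the last digit as the result digit and carries the rest forward.

```python
def urdhva_decimal(x: int, y: int) -> int:
    """Multiply two non-negative integers column by column
    (vertically and crosswise). Illustrative sketch only."""
    a = [int(d) for d in reversed(str(x))]  # least significant digit first
    b = [int(d) for d in reversed(str(y))]
    n = max(len(a), len(b))
    a += [0] * (n - len(a))                 # pad to equal length
    b += [0] * (n - len(b))

    digits = []
    carry = 0
    for k in range(2 * n - 1):
        # Crosswise products whose digit positions add up to k.
        column = sum(a[i] * b[k - i] for i in range(n) if 0 <= k - i < n)
        total = column + carry
        digits.append(total % 10)   # result digit of this column
        carry = total // 10         # the rest is carried to the next column
    # The final carry supplies the leading digit(s) of the product.
    return int(str(carry) + "".join(str(d) for d in reversed(digits)))


print(urdhva_decimal(592, 687))  # 406704
```

For 592 x 687 the five columns give the sums 14, 79, 119, 94 and 30 before the carries are added, and the result digits come out, from right to left, as 4, 0, 7, 6, 0 with a final carry of 4.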
We now extend the Vedic multiplication algorithm to the binary number
system. Let us consider the multiplication of two 4-bit binary numbers
A3A2A1A0 and B3B2B1B0. As the result can be more than 4 bits wide, we express
it as R7R6R5......R0. The line diagram for the multiplication of two 4-bit
numbers is shown in the diagram below, which is simply a mapping of the above
figure to the binary system.
The least significant bit R0 is obtained by multiplying the least significant
bits of the multiplicand and the multiplier. The process then follows the steps
shown in the above figure. The digits on both sides of a line are multiplied
and added to the carry from the previous step. This generates one of the bits
of the result (Rn) and a carry (say Cn). This carry is added in the next step,
and so the process goes on. If there is more than one line in a step, all the
results are added to the previous carry. In each step, the least significant
bit acts as the result bit and all the other bits act as the carry. Thus the
following expressions are obtained:
R0=B0A0;
C1R1=B0A1+B1A0+C0;
C2R2=C1+B0A2+B1A1+B2A0;
C3R3=C2+B0A3+B1A2+B2A1+B3A0;
C4R4=C3+B1A3+B2A2+B3A1;
C5R5=C4+B2A3+B3A2;
C6R6=C5+B3A3;
R7=C6;
Here C0 = 0, because the single-bit product B0A0 produces no carry, and R7 is
the final carry out.
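The column expressions can be checked with a small behavioural model. The sketch below is in Python rather than the Verilog used for the actual design, and the function name urdhva_4bit is hypothetical. It assumes C0 = 0, reads the fourth expression as the R3 column, and takes R7 as the carry left over after the R6 column.

```python
def urdhva_4bit(a: int, b: int) -> int:
    """Behavioural model of the 4x4 column expressions R0..R7."""
    A = [(a >> i) & 1 for i in range(4)]  # A0..A3
    B = [(b >> i) & 1 for i in range(4)]  # B0..B3
    result = 0
    carry = 0                             # C0 = 0
    for k in range(7):                    # columns R0 .. R6
        # Sum of all crosswise bit products Ai*Bj with i + j == k.
        column = sum(A[i] & B[k - i] for i in range(4) if 0 <= k - i <= 3)
        total = column + carry
        result |= (total & 1) << k        # least significant bit is Rk
        carry = total >> 1                # remaining bits form the next carry
    result |= carry << 7                  # R7 is the final carry
    return result


# Exhaustive check against ordinary multiplication for all 4-bit operands.
assert all(urdhva_4bit(a, b) == a * b for a in range(16) for b in range(16))
print(bin(urdhva_4bit(0b1111, 0b1111)))  # 0b11100001
```

Note that the intermediate carries can be wider than one bit (the R3 column can add up to four bit products plus a carry), which is why the model keeps the carry as an integer.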
III. 4X4 MULTIPLIER MODULE
2X2 VEDIC MULTIPLIER MODULE FOR BINARY NUMBERS.
The 2x2 Vedic multiplier is implemented using two half-adder modules, as
shown in the figure below. Once the bit products are generated, the total
delay is two half-adder delays.
The implementation equations of the 2x2 Vedic multiplier module are:
R0(1-BIT)=B0A0;
R1(1-BIT)=B0A1+B1A0;
R2(2-BITS)=B1A1+C1;
PRODUCT = R2&R1&R0;
Here C1 is the carry out of the R1 addition and & denotes concatenation.
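A gate-level reading of these equations can be sketched as follows. The Python function names half_adder and vedic_2x2 are hypothetical, and the design itself is written in Verilog. The first half adder produces R1 and the carry C1, and the second produces the two bits of R2.

```python
def half_adder(x: int, y: int):
    """Return (sum, carry) of two single bits."""
    return x ^ y, x & y


def vedic_2x2(a: int, b: int) -> int:
    """2x2 Vedic multiplier built from four AND gates and two half adders."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    r0 = a0 & b0                              # R0 = B0A0
    r1, c1 = half_adder(a1 & b0, a0 & b1)     # R1 = B0A1 + B1A0, carry C1
    r2_lo, r2_hi = half_adder(a1 & b1, c1)    # R2 (2 bits) = B1A1 + C1
    # PRODUCT = R2 & R1 & R0 (concatenation)
    return (r2_hi << 3) | (r2_lo << 2) | (r1 << 1) | r0


assert all(vedic_2x2(a, b) == a * b for a in range(4) for b in range(4))
print(vedic_2x2(3, 3))  # 9
```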
The 4x4 Vedic multiplier architecture is implemented using four 2x2 Vedic
multiplier modules, as shown in the diagram below. Here, partial product
generation and addition are done concurrently.
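One common way to compose four 2x2 modules into a 4x4 multiplier is sketched below in Python. The exact adder arrangement of the original diagram is not available here, so the way the four partial products are summed (plain integer addition with shifts) is an assumption, as are the function names.

```python
def vedic_2x2(a: int, b: int) -> int:
    """2x2 Vedic multiplier: 2-bit inputs, 4-bit product."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    r0 = a0 & b0
    x, y = a1 & b0, a0 & b1
    r1, c1 = x ^ y, x & y          # first half adder
    z = a1 & b1
    r2, r3 = z ^ c1, z & c1        # second half adder
    return (r3 << 3) | (r2 << 2) | (r1 << 1) | r0


def vedic_4x4(a: int, b: int) -> int:
    """4x4 multiplier from four 2x2 modules (assumed composition)."""
    a_lo, a_hi = a & 0b11, (a >> 2) & 0b11
    b_lo, b_hi = b & 0b11, (b >> 2) & 0b11
    # The four 2x2 products are independent, so hardware can form them
    # concurrently.
    p_ll = vedic_2x2(a_lo, b_lo)
    p_hl = vedic_2x2(a_hi, b_lo)
    p_lh = vedic_2x2(a_lo, b_hi)
    p_hh = vedic_2x2(a_hi, b_hi)
    # Align each partial product by the weight of its operand halves.
    return p_ll + ((p_hl + p_lh) << 2) + (p_hh << 4)


assert all(vedic_4x4(a, b) == a * b for a in range(16) for b in range(16))
```

In hardware the final additions would be carried out by dedicated adders rather than the `+` operator used in this model.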
IV. DESCRIPTION OF TRIPLE MODE REDUNDANCY
Triple mode redundancy (more widely known as triple modular redundancy, TMR)
is a general method for fault detection in a device. To detect whether there is
a fault in some core component, two more components identical to the core
component are built, and the inputs supplied to the core component are also
supplied to these two components, which are assumed to be working correctly. Of
the three outputs obtained, if any two are equal, that output is passed on to
the further stages. In this way a fault in the core component can be detected,
and the core component is then replaced by a redundant component until the
original one is corrected.
Although these additional circuits increase the area and power dissipation,
they give the circuit the high reliability needed in critical applications.
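The voting rule described above can be written as a small function. This is a minimal Python sketch with hypothetical names (tmr_vote, core, copy1, copy2); returning None when all three outputs disagree is a choice made for this example, since the text does not cover that case.

```python
def tmr_vote(core, copy1, copy2):
    """Return (voted_output, core_fault) for three redundant outputs."""
    if core == copy1 or core == copy2:
        # The core agrees with at least one copy: its output is accepted.
        return core, False
    if copy1 == copy2:
        # Both copies agree with each other but not with the core.
        return copy1, True
    # No two outputs agree: no majority exists (case not covered by the text).
    return None, True


print(tmr_vote(9, 9, 9))   # (9, False)
print(tmr_vote(8, 9, 9))   # (9, True)  -> core component is faulty
print(tmr_vote(9, 9, 7))   # (9, False) -> one copy disagrees, core accepted
```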
V. DESCRIPTION OF 16-BIT KOGGE STONE ADDER
The above diagram describes the way in which a 16-bit Kogge-Stone adder
generates the carries beforehand. The top PG block calculates the generate and
propagate values for the corresponding bits of the operands:
Gi = Ai and Bi
Pi = Ai xor Bi
Then the
black boxes calculate intermediate generate & propagate values in the
following fashion:
Pi:j = Pi:k+1 and Pk:j
Gi:j = Gi:k+1 or (Pi:k+1 and Gk:j)
Finally, the white boxes produce the carries, which are simply the group
generate values Gi:0 given by the above equation (for a carry-in of zero).
Using the carry values generated above, the sum can be calculated using the
equation below:
Si = Pi xor C(i-1)
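The generate, propagate and sum equations above can be exercised with a bit-level Python model. This is a sketch under the assumption of a zero carry-in and the usual Kogge-Stone wiring, in which each stage combines bit i with bit i - 1, i - 2, i - 4 and i - 8 in turn; the function name is hypothetical.

```python
def kogge_stone_add(a: int, b: int, width: int = 16):
    """Return (sum, carry_out) using a Kogge-Stone prefix network."""
    A = [(a >> i) & 1 for i in range(width)]
    B = [(b >> i) & 1 for i in range(width)]
    P = [x ^ y for x, y in zip(A, B)]   # Pi = Ai xor Bi
    G = [x & y for x, y in zip(A, B)]   # Gi = Ai and Bi

    gp, gg = P[:], G[:]                 # running group propagate / generate
    dist = 1
    while dist < width:                 # stages with distance 1, 2, 4, 8
        new_p, new_g = gp[:], gg[:]
        for i in range(dist, width):
            # Gi:j = Gi:k+1 or (Pi:k+1 and Gk:j);  Pi:j = Pi:k+1 and Pk:j
            new_g[i] = gg[i] | (gp[i] & gg[i - dist])
            new_p[i] = gp[i] & gp[i - dist]
        gp, gg = new_p, new_g
        dist *= 2

    # gg[i] is now Gi:0, i.e. the carry out of bit i (carry-in assumed 0).
    carry_in = [0] + gg[:-1]
    total = 0
    for i in range(width):
        total |= (P[i] ^ carry_in[i]) << i   # Si = Pi xor C(i-1)
    return total, gg[width - 1]


print(kogge_stone_add(0xFFFF, 0x0001))  # (0, 1)
```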
VI. DESCRIPTION OF 16-BIT SPARSE KOGGE STONE ADDER
As shown in the previous diagram, only the carries out of the 3rd, 7th and
11th bits are produced beforehand for the addition. The remaining carries are
produced by the ripple carry adders RC0, RC1, RC2 and RC3. These are 4-bit
ripple carry adders that take the generated carries (or, for RC0, the external
carry-in) as their carry inputs. The logic complexity is reduced when we use
this adder, while nearly the same delay is maintained.
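The division of work between the sparse carry logic and the four ripple carry adders can be modelled functionally as below. This Python sketch uses hypothetical names and combines the block-level generate and propagate values serially for readability; the real adder would compute the same three carries with a parallel prefix tree.

```python
def combine(hi, lo):
    """Prefix operator: merge (G, P) of an upper span with the span below it."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def ripple_carry_4bit(a: int, b: int, cin: int):
    """4-bit ripple carry adder made of full adders: returns (sum, carry_out)."""
    s, c = 0, cin
    for i in range(4):
        x, y = (a >> i) & 1, (b >> i) & 1
        s |= (x ^ y ^ c) << i
        c = (x & y) | (c & (x ^ y))
    return s, c


def sparse_kogge_stone_add(a: int, b: int, cin: int = 0):
    """16-bit add: only c3, c7, c11 come from the carry logic."""
    bits = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1)
            for i in range(16)]                  # per-bit (G, P)
    blocks = []
    for k in range(4):                           # (G, P) of each 4-bit block
        span = bits[4 * k]
        for i in range(4 * k + 1, 4 * k + 4):
            span = combine(bits[i], span)
        blocks.append(span)

    carries = [cin]                              # carry into RC0
    prefix = (cin, 0)                            # cin acts as a generate
    for k in range(3):                           # c3, c7, c11
        prefix = combine(blocks[k], prefix)
        carries.append(prefix[0])

    total, cout = 0, 0
    for k in range(4):                           # RC0 .. RC3
        s, cout = ripple_carry_4bit((a >> (4 * k)) & 0xF,
                                    (b >> (4 * k)) & 0xF, carries[k])
        total |= s << (4 * k)
    return total, cout


print(sparse_kogge_stone_add(0x0FFF, 0x0001))  # (4096, 0)
```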
VII. INCLUSION OF TRIPLE MODE REDUNDANCY FOR LOWER HALF FAULT DETECTION
In the above diagram, the sparse Kogge-Stone adder is shown together with the
additional components used for fault detection in the lower half of the
circuit:
· As soon as the inputs are supplied, the sparse adder generates the 3
carries, a carry-in is supplied from outside, and the sum is generated by RC0
to RC3.
· A 2-bit counter is used; each time the counter increments, the
corresponding carry and other inputs are supplied to test-RC1 and test-RC2.
· Depending on the counter value, the sum from the corresponding RC is taken,
along with the sums from the two test-RCs. They are compared in the comparator,
and if any two of the outputs are equal, that value is sent on as the tested
sum.
· The sums from the remaining RCs are concatenated directly with the tested
sum to produce the final sum.
· If the two test-RCs produce an equal sum that differs from the sum obtained
from the RC under test, an error signal is generated.
· The error signal stops the counter from incrementing.
· Incorrect output may appear for 1 to 4 clock cycles before the error signal
is raised; once a particular RC is found to be faulty, suitable measures have
to be taken to correct it.
· The correction measure suggested in this paper is to switch to a redundant,
non-faulty RC after the error is detected.
Testing methodology for lower half circuit:
The lower half circuit is expected to work as follows: if a fault is
introduced in any one of the RCs, then, depending on the counter value and the
RC being tested, it must take 1 to 4 clock cycles to detect the error, and an
immediate correction is provided by switching to a redundant RC.
To verify this, a fault was introduced intentionally into the RCs using a MUX
so that the chosen RC produces an erroneous output. As soon as the error signal
is raised, there is an immediate switch to a non-faulty backup RC. Fault
testing logic introduced into the design in this way can be classified as
post-validation logic.
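The lower half scheme and its fault injection test can be approximated with a cycle-by-cycle Python model. Everything here is a simplified assumption for illustration: the names are hypothetical, the injected fault simply inverts the least significant sum bit of the chosen RC, the block carries are rippled instead of coming from the sparse carry logic, and the error signal stays latched after detection.

```python
def rc4(a: int, b: int, cin: int):
    """4-bit ripple carry adder modelled arithmetically: (sum, carry_out)."""
    total = a + b + cin
    return total & 0xF, total >> 4


def simulate_lower_half(a: int, b: int, faulty_rc, max_clocks: int = 8):
    """Return (clock_of_detection, final_sum_per_clock).

    faulty_rc is the index (0..3) of the RC with the injected fault,
    or None for a fault-free run.
    """
    counter = 0            # 2-bit counter selecting the RC under test
    use_backup = False     # True once the faulty RC has been replaced
    detected_at = None
    outputs = []
    for clk in range(1, max_clocks + 1):
        total, carry = 0, 0
        for k in range(4):
            x, y = (a >> (4 * k)) & 0xF, (b >> (4 * k)) & 0xF
            good_sum, next_carry = rc4(x, y, carry)
            rc_sum = good_sum
            if k == faulty_rc and not use_backup:
                rc_sum ^= 0x1                  # fault injected through the MUX
            if k == counter and detected_at is None:
                test1, _ = rc4(x, y, carry)    # test-RC1
                test2, _ = rc4(x, y, carry)    # test-RC2
                if test1 == test2 and test1 != rc_sum:
                    detected_at = clk          # error signal raised
                    use_backup = True          # switch to the redundant RC
                    rc_sum = test1             # tested sum = agreeing value
            total |= rc_sum << (4 * k)
            carry = next_carry
        outputs.append(total)
        if detected_at is None:                # error signal stops the counter
            counter = (counter + 1) % 4
    return detected_at, outputs


for rc in range(4):
    clk, _ = simulate_lower_half(0x1234, 0x0FCD, rc)
    print("fault in RC%d detected at clock %d" % (rc, clk))
```

In this model a fault in RC0 is caught on the first clock and a fault in RC3 on the fourth, which matches the 1 to 4 clock cycle window stated above when the counter starts at zero.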
VIII. FAULT DETECTION AND CORRECTION FOR UPPER HALF CIRCUIT
Observe that the upper half carry generator logic is divided into GREEN,
PURPLE and BLUE sections.
To detect whether there is a fault in one of these sections, we consider the
outputs of the ripple carry adders. That is:
· A fault in the carry generated by the GREEN section can be detected by
comparing the carry c3 generated by it with the carry generated by ripple carry
adder RC0.
· Similarly, the carry c7 from the PURPLE section can be compared with the
carry from RC1.
· The final comparison is between the carry c11 from the BLUE section and the
carry from RC2.
If a fault is introduced in any one of these sections, incorrect outputs are
visible for 1 to 3 clock cycles, after which another comparator provided for
the carry comparison detects the mismatch and the error signal is generated.
This stops the counter. The faulty section is then replaced with the
corresponding backup section.
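The carry comparison for the upper half can be sketched in the same style. This Python model is an assumption-laden illustration: the names are hypothetical, the carries of the three sections are computed arithmetically instead of by the prefix logic, the injected fault inverts the carry of the chosen section, and one section is compared per clock in the order GREEN, PURPLE, BLUE.

```python
def rc4_carry(x: int, y: int, cin: int) -> int:
    """Carry out of a 4-bit ripple carry adder."""
    return (x + y + cin) >> 4


def detect_upper_half_fault(a: int, b: int, faulty_section=None, cin: int = 0):
    """Return (clock, section) of the first carry mismatch, or None."""
    sections = ["GREEN", "PURPLE", "BLUE"]       # produce c3, c7, c11
    tree = []
    for k, name in enumerate(sections):
        mask = (1 << (4 * (k + 1))) - 1
        c = (((a & mask) + (b & mask) + cin) >> (4 * (k + 1))) & 1
        if name == faulty_section:
            c ^= 1                               # hypothetical injected fault
        tree.append(c)

    # RC0 receives the external carry-in; RC1 and RC2 receive c3 and c7.
    rc_cin = [cin, tree[0], tree[1]]
    for clk, name in enumerate(sections, start=1):
        k = clk - 1
        x, y = (a >> (4 * k)) & 0xF, (b >> (4 * k)) & 0xF
        if rc4_carry(x, y, rc_cin[k]) != tree[k]:
            return clk, name                     # error signal, counter stops
    return None


print(detect_upper_half_fault(0x0ABC, 0x0777, "PURPLE"))  # (2, 'PURPLE')
```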
IX. DESIGNING MAC UNIT FROM VEDIC MULTIPLIER AND FAULT TOLERANT SPARSE KOGGE
STONE ADDER
A basic MAC architecture consists of a multiplier and an accumulator. The MAC
unit computes the product of two numbers and adds the product to an accumulator
register. The output of the register is fed back to one input of the adder, as
shown in the above figure. On each active clock edge (positive or negative,
depending on the design), the output of the multiplier is added to the previous
sum value using the adder present in the design.
The Vedic multiplier has two 4-bit inputs and gives the 8-bit product P as
output. P is given as one of the inputs to the fault tolerant sparse
Kogge-Stone adder, which adds it to the previous sum value and gives MAC OUT.
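The data flow of the MAC unit can be summarised with a behavioural Python model. The class name MacUnit is hypothetical, the multiplier and adder are replaced by ordinary arithmetic, and the 16-bit accumulator width is an assumption based on the 16-bit adder described earlier.

```python
class MacUnit:
    """Behavioural model: 4-bit x 4-bit multiply, 16-bit accumulate."""

    def __init__(self):
        self.mac_out = 0                      # accumulator register

    def clock(self, a: int, b: int) -> int:
        """One clock edge: add the new product to the previous sum."""
        p = (a & 0xF) * (b & 0xF)             # 8-bit product P
        self.mac_out = (self.mac_out + p) & 0xFFFF   # assumed 16-bit wrap
        return self.mac_out


mac = MacUnit()
for a, b in [(3, 5), (15, 15), (7, 2)]:
    print(a, b, mac.clock(a, b))
# 3 5 15
# 15 15 240
# 7 2 254
```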
X. IMPLEMENTATION
The whole MAC unit, based on the Vedic multiplier and the fault tolerant
sparse Kogge-Stone adder, is implemented in Verilog HDL. Simulation is done
with ModelSim PE Student Edition, version 10.1c.
XI. VERIFICATION RESULTS
The above simulation result shows that, based on the inputs, the Vedic
multiplier gives the product P, and based on P and the previous sum value
(mac_out), the fault tolerant sparse Kogge-Stone adder gives the new mac_out
value.
Performance calculation of the MAC
CLOCK TIME PERIOD : 2 PICO SECONDS.
CLOCK FREQUENCY : 500 GIGA HERTZ.
INFERENCE
VEDIC MULTIPLIER DELAY : LESS THAN 2 PICO SECONDS
FAULT TOLERANT SPARSE KOGGE STONE ADDER : LESS THAN 2 PICO SECONDS
The clock period for the design is 2 picoseconds, i.e., the design is
operating at a frequency of 500 GHz. The inputs change at this 500 GHz rate,
and within each 2 picosecond period the Vedic multiplier gives the product
value and the fault tolerant sparse Kogge-Stone adder calculates the mac_out
value.
XII. CONCLUSION
The simulation results obtained above indicate the proper functioning of the
MAC unit based on the Vedic multiplier and the fault tolerant sparse
Kogge-Stone adder. The delay calculations indicate that the MAC operates at
high performance and can be implemented in high speed DSP applications.
**************PHOTOS WILL BE UPLOADED SOON**************************