CRITICAL DESIGN ISSUES FOR GALLIUM ARSENIDE VLSI CIRCUITS

E. BUSHEHRI

Ph.D. 1992
CRITICAL DESIGN ISSUES FOR GALLIUM ARSENIDE VLSI CIRCUITS

A thesis submitted to the Council for National Academic Awards by

Ebrahim Bushehri

in partial fulfilment of the requirements for the degree of
Doctor of Philosophy

April 1992

Microelectronics Centre, Middlesex Polytechnic
INDEX

ABSTRACT

ACKNOWLEDGEMENTS

GLOSSARY

CHAPTER 1 Introduction

1.1 Review of Silicon Technology
1.2 Limitations of Silicon Technology for High Speed Applications
1.3 Gallium Arsenide as an Alternative Substrate
1.4 Current Developments and Future Trends
1.5 Scope of this Thesis

CHAPTER 2 GaAs Device Fabrication and Modelling

2.1 Suitable Devices for VLSI Implementation
2.2 GaAs MESFET Structure
2.3 Planar Processing Steps for GaAs MESFETs
2.4 Self-Aligned Gate Process Technology
2.5 GaAs MESFET Design Rules and Layer Representation
2.6 An Appropriate Device Model for GaAs VLSI
2.7 Important Effects Included in the Device Model
2.8 Interconnect Modelling

CHAPTER 3 MESFET Logic Families for GaAs VLSI Circuits

3.1 Types of MESFET Logic Gate
3.2 Normally-ON Logic Gates
3.3 Normally-OFF Logic Gates
3.4 Suitable Logic Gates for GaAs VLSI
3.5 First Order Design of DCFL and SDCFL Gates
3.6 Definition of Design Parameters
3.7 Detailed Analysis of DCFL and SDCFL Gates
3.8 Design of Buffering Schemes for GaAs VLSI Circuits

CHAPTER 4 Analysis of Adder Circuits for GaAs VLSI Implementation

4.1 Adder Design Approach
4.2 Types of Adder
4.3 Evaluation of Adder Circuits for GaAs VLSI
4.4 Summary of Important Points

CHAPTER 5 A High Speed GaAs Multiplier

5.1 A Suitable Multiplier for GaAs Implementation
5.2 The Algorithm
5.3 The Overall Architecture
5.4 Implementation Issues
5.5 Performance Evaluation

CHAPTER 6 A Novel Design and Layout Approach for GaAs VLSI Circuits

6.1 Architectural Decomposition of GaAs VLSI Circuits
6.2 Ring Notation for the Layout of GaAs VLSI Circuits
6.3 Important Issues in Ring Notation Layouts
6.4 Design of BLC Adders using the Ring Notation
6.5 Evaluation of the Ring Notation Adders and Multiplier Circuits

CHAPTER 7 Conclusions

7.1 Summary and Conclusions
7.2 Recommendations

Appendices

A Layer Representation and Design Rules
B Derivation of Gate Delay Formula
C Brief Description of the Design Tool
D CPS and CPW Models for the Estimation of the Inductances in the Supply Rails
E Logic and Ring Notation Diagrams of the DCFL and SDCFL BLC Adders

References
Critical Design Issues for Gallium Arsenide VLSI Circuits

Abstract

The aim of this research was to design and evaluate various Gallium Arsenide circuit elements such as logic gates, adders and multipliers suitable for high speed VLSI circuits. The issues addressed are the logic gate design and optimisation, evaluation of various buffering schemes and the impact of the algorithm on adder and multiplier performance for digital signal processing applications. This has led to the development of a design approach to produce high speed and low power dissipation Gallium Arsenide VLSI circuits. This is achieved by:

- Evaluating the well-established Direct Coupled FET Logic (DCFL) gates and proposing an alternative gate, namely the Source Follower DCFL (SDCFL), to improve the noise margin and speed.
- Suggesting various buffering schemes to maintain high speed in areas where the fanout loading is high (e.g. clock drivers).
- Comparing various adder types in terms of delay-power and delay-area products to arrive at a suitable architecture for Gallium Arsenide implementation and to determine the influence of the algorithm and layout approach on circuit performance. To investigate this further, a multiplier was also designed to assess the performance at higher levels of integration.
- Applying a new layout approach, called the 'ring notation', to the adder and multiplier circuits in order to improve their delay-area product.

Finally, the critical factors influencing the performance of the circuits are reviewed and a number of suggestions are given to maintain reliable operation at high speed.
Acknowledgements

I would like to express my gratitude to the following people at Middlesex Polytechnic.

Professor John Butcher, my director of studies, for his valuable guidance and support throughout this research programme. His much needed comments and constructive criticisms on the draft of the key chapters are also greatly appreciated.

My supervisors Mr Richard Bayford and Dr Robert Paul Camp for their help, advice and technical input to the project.

Mr Paul Burn, managing director of the Integrated Circuit Design Centre (ICDC), for his support and encouragement throughout the project.

My colleagues Mr Divya Pujara and Mr Majid Saber with whom I had many useful discussions on various aspects of the project.

I would like to acknowledge the support and help of the following people at the University of Adelaide, South Australia.

Dr Kamran Eshraghian, Head of the Centre for Gallium Arsenide VLSI Technology, for his direct influence on many areas of the project. His novel idea of 'ring notation layout' methodology has formed the basis of the results presented in chapter 6 of this thesis.

Mr Derek Abbott, research officer, for his guidance, especially in the initial stages of the project.

Mr Andrew Beaumont-Smith for providing much of the software support for the design tools and the design rules for the particular GaAs process used in this research.

This work has been partially supported by the Sir Keith and Sir Ross Smith Foundation of the Australian Council for Research.
GLOSSARY

\( \tau_d \)  
Gate delay (ps)

\( L_{\text{eff}} \)  
Effective channel length (\( \mu \)m)

\( W \)  
Width of the FET channel (\( \mu \)m)

\( \rho \)  
Resistivity (\( \Omega \)cm)

\( \mu_n, \mu_p \)  
Electron and hole mobilities (\( \text{cm}^2/\text{Vs} \))

\( V_{\text{bi}} \)  
Schottky barrier height (V)

\( V_t \)  
Threshold voltage (V)

\( V_p \)  
Pinch-off voltage (V)

\( \lambda \)  
Channel length modulation parameter (1/V)

\( \beta \)  
Transconductance parameter (A/\( \text{V}^2 \))

\( N \)  
Effective channel doping density (atoms/cm\(^3\))

\( \varepsilon = \varepsilon_0 \cdot \varepsilon_r \)  
where \( \varepsilon_0 \) is the permittivity of free space (\( \text{F/cm} \)) and \( \varepsilon_r \) is the relative permittivity of GaAs (13.1)

\( C_{\text{go}} \)  
Zero bias gate capacitance (\( \text{F} \))

\( C_{gd}, C_{gs} \)  
Gate-drain and gate-source capacitances (\( \text{F} \))

\( R_d, R_s \)  
Drain and source resistances (ohm)

\( a \)  
Effective channel implant depth (\( \text{Å} \))

\( q \)  
Electron charge (C)

\( \alpha \)  
Hyperbolic tangent drain multiplier (1/V)

\( F_c \)  
Average clocking frequency

\( F_i \)  
Fanin
<table>
<thead>
<tr>
<th>Abbreviation</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>F<sub>o</sub></td>
<td>Fanout</td>
</tr>
<tr>
<td>BFL</td>
<td>Buffered FET Logic</td>
</tr>
<tr>
<td>SDFL</td>
<td>Schottky Diode FET Logic</td>
</tr>
<tr>
<td>CCFL</td>
<td>Capacitor-Coupled FET Logic</td>
</tr>
<tr>
<td>QFL</td>
<td>Quasi-FET Logic</td>
</tr>
<tr>
<td>DCFL</td>
<td>Direct Coupled FET Logic</td>
</tr>
<tr>
<td>SDCFL</td>
<td>Source Follower DCFL</td>
</tr>
<tr>
<td>RDCFL</td>
<td>Ring notation DCFL</td>
</tr>
<tr>
<td>RSDCFL</td>
<td>Ring notation SDCFL</td>
</tr>
</tbody>
</table>
CHAPTER 1
INTRODUCTION

1.1 Review of Silicon Technology

Silicon is the most widely used semiconductor material for integrated circuits. The main reasons for this choice are the ease of purification, the ease of forming single crystals and the device considerations such as the ease of epitaxial growth and the growth of high integrity oxide [1]. As a result many device types have been proposed in silicon for integrated circuits. Initially the main workhorse in the IC industry was the bipolar technology and more recently the MOS process.

MOS integrated circuit technology has progressed tremendously because of the huge demand for digital electronics applications. As shown in Table 1.1, it is now possible to fabricate integrated circuits containing 1 million or more transistors [2]. This trend is likely to continue (Moore's law), such that by the end of the 1990s the level of complexity will probably exceed 10 million transistors per chip.

The advantages of this increased level of integration are reflected in the lower cost, higher reliability, higher speed and lower power dissipation of systems which are also extremely small and lightweight. To achieve these results, the process technology has been improved systematically and major efforts have been directed towards solving the problems of device scaling. Apart from the higher packing densities achievable from the fabrication of smaller devices, it is possible to make devices with higher operating frequencies in order to fulfil the speed requirements of state-of-the-art computer systems [3] [4] [5].
<table>
<thead>
<tr>
<th>Year</th>
<th>Technology</th>
<th>No. of Trans. per Chip</th>
<th>Typical Products</th>
</tr>
</thead>
<tbody>
<tr>
<td>1950</td>
<td>Discrete Components</td>
<td>1</td>
<td>Junction Trans. and diodes.</td>
</tr>
<tr>
<td>1961</td>
<td>SSI</td>
<td>10</td>
<td>Logic gates, Flip-Flops.</td>
</tr>
<tr>
<td>1966</td>
<td>MSI</td>
<td>100-1000</td>
<td>Counters, Adders, Multiplexers.</td>
</tr>
<tr>
<td>1971</td>
<td>LSI</td>
<td>1000-20,000</td>
<td>8 bit microprocessors, ROM, RAM.</td>
</tr>
<tr>
<td>1980</td>
<td>VLSI</td>
<td>20,000-500,000</td>
<td>16 and 32 bit Microprocessors.</td>
</tr>
<tr>
<td>1985</td>
<td>ULSI</td>
<td>&gt; 500,000</td>
<td>Special Processors, Real time image processors.</td>
</tr>
<tr>
<td>1990</td>
<td>GSI</td>
<td>&gt;10,000,000</td>
<td>WSI</td>
</tr>
</tbody>
</table>

Table 1.1 Microelectronics Evolution.

1.2 Limitations of Silicon for High Speed Applications

Super fast computers with sub-nanosecond cycle times, and multi-gigabit per second telecommunication and instrumentation systems are the driving forces behind the development of high speed VLSI circuits. The emphasis is on increasing the level of integration and the speed of these circuits to achieve the computational power required by the application areas mentioned above [6].

The principal requirements of high speed VLSI circuits are small feature size, high process yield and, most important of all, extremely low dynamic switching energy [7] [8] [9].

The origins of the first two requirements are obvious. Clearly, large numbers of gates cannot be placed on a reasonably sized chip unless the gate areas are small. For instance, if a 1cm² chip is to accommodate 100,000 transistors, the size of the individual gates must be less than 1000\(\mu\text{m}^2\). The process yield should also be sufficient to produce such complex parts economically.
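The area budget quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the function name is hypothetical, and it assumes the whole chip area is shared equally between the devices, ignoring pads and routing.

```python
UM2_PER_CM2 = 1.0e8  # 1 cm = 1e4 um, so 1 cm^2 = 1e8 um^2


def area_per_device_um2(chip_area_cm2, n_devices):
    """Average area available to each device, in square micrometres."""
    return chip_area_cm2 * UM2_PER_CM2 / n_devices


if __name__ == "__main__":
    # 1 cm^2 chip shared by 100,000 devices, as in the text
    print(area_per_device_um2(1.0, 100_000), "um^2 per device")
```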

The dynamic switching energy or power-delay product, \(2P_d \times \tau_d\), is the minimum energy that a gate can dissipate during a clock cycle. The power dissipation for a chip with \(N_g\) gates with an average gate clocking frequency \(F_c\) will therefore be:

\[
P_{\text{(CHIP)}} = 2 \times N_g \times F_c \times (P_d \times \tau_d) \qquad (1.1)
\]

This relation is illustrated in Figure 1.1 for a typical 'large' total input power of 2 Watts [10].

Figure 1.1 Switching energy as a function of the number of gates per chip for a practical power of 2 Watts.
The requirement on dynamic switching energy for high speed VLSI is quite severe. Even allowing for the fact that power dissipation for large chips could safely be somewhat higher than 2 Watts, dynamic switching energies of much less than 0.1pJ appear essential for achieving practical very high speed VLSI [11]. Therefore, it is of critical importance to evaluate the existing technologies and choose the one with the lowest speed-power product in order to be able to combine high levels of integration with high speed performance.
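As a rough illustration of equation (1.1), the Python sketch below rearranges it to give the largest switching energy \(P_d \times \tau_d\) that fits within a given chip power budget. The function name and the 1GHz average clocking frequency are assumptions chosen for illustration; they are not taken from Figure 1.1.

```python
def max_switching_energy(p_chip_w, n_gates, f_clock_hz):
    """Rearranged equation (1.1): P_d * tau_d = P_chip / (2 * N_g * F_c)."""
    return p_chip_w / (2.0 * n_gates * f_clock_hz)


if __name__ == "__main__":
    P_CHIP = 2.0     # total chip power in watts, as in the text
    F_CLOCK = 1.0e9  # ASSUMED average gate clocking frequency (1 GHz)
    for n_gates in (1_000, 10_000, 100_000):
        energy_j = max_switching_energy(P_CHIP, n_gates, F_CLOCK)
        print(f"{n_gates:>7} gates: {energy_j * 1e12:.3f} pJ per gate")
```

Under these assumed figures, a 10,000 gate chip already needs a switching energy of about 0.1pJ, and each tenfold increase in gate count lowers the allowable energy by the same factor.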

As mentioned in section 1.1, MOS is by far the most often used technology for VLSI circuits and will continue to fill this role. In order to obtain high speed and high density MOS ICs, the device geometries need to be continuously scaled to smaller sizes [12]. This means that the theoretical and practical limits associated with the scaling of MOS circuits must be investigated to find the limitations of existing technologies.

Figure 1.2 shows the gate propagation delay and power dissipation against the channel length of fabricated CMOS inverters [13] [14]. At 0.5μm (state-of-the-art commercial device size) and a standard power supply of 5V, the delay is about 120ps with a power dissipation of 1.1mW. The speed-power product of the gate is therefore about 0.1pJ, enabling the realisation of high speed, medium scale integrated circuits. The expected circuit performance with scaling for different technologies has also been investigated by P.A.H. Hart et al. [15]. They considered a range of devices such as ECL, I\(^2\)L and MOS. The scaling process most benefits the MOS technology, with speeds higher than that of ECL and a speed-power product even lower than that of I\(^2\)L. Below 1μm gate width, a delay time of 100ps and a power-delay product of 0.02pJ should theoretically be possible. However, when device miniaturisation is continued, the second order effects on device characteristics become so significant that simple scaling of the technology becomes a non-viable approach at a certain geometry [16]. For example, the encroachment of the field oxide (the so-called bird's beak created during the local oxidation stage of the normal silicon process) makes the effective channel width smaller than the design size and degrades the drain current significantly. In addition, hot carriers generated by the high electric field across the channel and the drain pinch-off region cause unacceptable device instabilities unless the power supply voltage is scaled down along with the channel length reduction. Scaling down the supply voltage results in the loss of a marked distinction between the logic 'low' and logic 'high' levels. For example, scaling a 2\( \mu \text{m} \) technology to 0.2\( \mu \text{m} \) would require the supply voltage to be lowered from 5 to 0.5V with a consequent narrow noise margin and high sensitivity to variations in the supply voltage.
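The two numerical examples in this paragraph can be reproduced with the short Python sketch below. The function names are hypothetical, and the supply scaling assumes the voltage is reduced in direct proportion to the channel length, which is what the 5V to 0.5V example implies.

```python
def power_delay_product(delay_s, power_w):
    """Speed-power product of a gate in joules."""
    return delay_s * power_w


def scaled_supply(v_supply, l_old_um, l_new_um):
    """Supply voltage after scaling, ASSUMING it scales linearly with channel length."""
    return v_supply * (l_new_um / l_old_um)


if __name__ == "__main__":
    # 0.5 um CMOS inverter at 5 V: 120 ps delay, 1.1 mW dissipation
    pdp = power_delay_product(120e-12, 1.1e-3)
    print(f"speed-power product: {pdp * 1e12:.3f} pJ")

    # scaling a 2 um technology to 0.2 um
    print(f"scaled supply: {scaled_supply(5.0, 2.0, 0.2):.2f} V")
```

The first result, roughly 0.13pJ, is what the text rounds to "about 0.1pJ".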

Figure 1.2 Delay and power dissipation of scaled inverters for power supplies of 3 and 5 volts.

Another problem encountered in CMOS is the latch-up susceptibility, which becomes a serious drawback in sub-micron geometries.

Therefore, as the device geometry is reduced, we are quickly reaching the limits of silicon technology for ultra-high speed VLSI circuits. We are hence prompted to seek other technologies that provide faster devices, which will be a prerequisite for even more sophisticated system design capabilities.

1.3 Gallium Arsenide as an Alternative Substrate

Before assessing the suitability of GaAs as a substrate for VLSI circuits it is important to note that our concern is only with ultra-high speed applications. Then, in order to explore the potential of the technology, it is necessary to make a direct comparison between GaAs and silicon. First we concentrate on the two materials and their electrical properties, a summary of which is given in Table 1.2 [17].

<table>
<thead>
<tr>
<th>Properties</th>
<th>GaAs</th>
<th>silicon</th>
</tr>
</thead>
<tbody>
<tr>
<td>Electron mobility (cm$^2$/Vs)</td>
<td>5000</td>
<td>800</td>
</tr>
<tr>
<td>Maximum electron drift velocity (cm/s)</td>
<td>$2 \times 10^7$</td>
<td>$1 \times 10^7$</td>
</tr>
<tr>
<td>Hole mobility (cm$^2$/Vs)</td>
<td>250</td>
<td>350</td>
</tr>
<tr>
<td>Energy gap (eV)</td>
<td>1.43</td>
<td>1.12</td>
</tr>
<tr>
<td>Type of gap</td>
<td>Direct</td>
<td>Indirect</td>
</tr>
<tr>
<td>Density of states in conduction band (cm$^{-3}$)</td>
<td>$5 \times 10^{17}$</td>
<td>$3 \times 10^{19}$</td>
</tr>
<tr>
<td>Maximum resistivity ($\Omega$cm)</td>
<td>$10^9$</td>
<td>$10^5$</td>
</tr>
<tr>
<td>Minority carrier life time (s)</td>
<td>$10^{-8}$</td>
<td>$10^{-3}$</td>
</tr>
<tr>
<td>Breakdown field (V/cm)</td>
<td>$4 \times 10^5$</td>
<td>$3 \times 10^5$</td>
</tr>
<tr>
<td>Schottky barrier height (V)</td>
<td>0.7-0.8</td>
<td>0.4-0.6</td>
</tr>
</tbody>
</table>

Table 1.2 Properties of GaAs and silicon at 300 K.
The advantages of GaAs over silicon as a base material for ICs are [18] [19] [20]:

a) At normal doping levels the saturated drift velocities for GaAs and silicon are almost equal, with values of $1.4\times 10^7$ and $1\times 10^7$ cm/s respectively. However, the saturation velocity in GaAs is achieved at electric fields about four times lower than in silicon.

b) Electron mobility in GaAs is six to seven times higher than in silicon. Therefore, transit times as short as 10-15ps, corresponding to current gain-bandwidth products in the range 15-25GHz, can be obtained for GaAs transistors for typical gate lengths of 0.5-1µm (a three to five times improvement over silicon devices).

c) The semi-insulating property of GaAs material (resistivity in the range of $10^7$-$10^9\Omega \text{cm}$ at room temperature) is another advantage for high performance devices. It not only minimises the parasitic capacitances but also allows for easy electrical isolation of multiple devices on a single substrate.

d) Schottky barriers can be realised on GaAs with a large variety of metals (e.g. aluminium, platinum, titanium) leading to high quality Schottky junctions with excellent ideality factors ($n$ less than 1.1) and fairly low reverse currents ($J_s < 1\,\mu\text{A/cm}^2$).

e) GaAs is more radiation resistant than silicon due to the absence of gate oxide and can operate over a wider temperature range (-200 to 200°C) because of its larger band gap, and finally:

f) The direct band gap of GaAs allows efficient radiative recombination of electrons and holes, meaning that forward-biased pn junctions can be used as light emitters. Thus, efficient integration of electrical and optical functions is possible.

The expected higher performance of GaAs compared with silicon should be studied not only on the basis of the material properties but also in terms of the actual logic gates and integrated circuits implemented in either technology. As explained in section 1.2, the most important figure of merit for logic gates in high-speed VLSI circuit applications is the dynamic switching energy. Figure 1.3 shows the calculated dynamic switching energy versus propagation delay relationships for GaAs and silicon MESFETs (W = 10μm, L = 1μm), with a load capacitance of 30fF [21].

Figure 1.3 Optimised switching performances of silicon and GaAs MESFETs with a load capacitance of 30fF.

It is evident that the logic switching speeds and speed-power products of the FET gate are dramatically improved in GaAs. For the same logic voltage swing, a GaAs MESFET (L = 1μm) would give about 4-6 times higher switching speeds than its silicon counterpart. For a logic voltage swing of 3.5V, the silicon MESFET should achieve a switching speed of 183ps. With the same gate length, a GaAs MESFET should achieve the same switching speed with only a 300mV logic swing. This is reflected in the figures for the dynamic switching energies of the gates. For the GaAs MESFET, it is only about 3fJ, whereas for the silicon MESFET its value is about 150 times higher (0.45pJ), restricting the level of integration.
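The strong dependence of switching energy on logic swing can be seen from a first-order estimate, \(E \approx C_L \times \Delta V^2\), for the energy needed to charge the load capacitance through the logic swing. This formula is an assumption introduced here for illustration and is not necessarily the model used in [21]; the function name is likewise hypothetical. With the 30fF load and the two swings quoted above, it gives values of the same order as the 3fJ and 0.45pJ figures.

```python
def switching_energy_estimate(c_load_f, v_swing):
    """First-order estimate E ~ C_L * dV^2 (an ASSUMED model, not taken from [21])."""
    return c_load_f * v_swing ** 2


if __name__ == "__main__":
    C_LOAD = 30e-15  # 30 fF load capacitance, as in Figure 1.3
    e_gaas = switching_energy_estimate(C_LOAD, 0.3)  # 300 mV swing (GaAs)
    e_si = switching_energy_estimate(C_LOAD, 3.5)    # 3.5 V swing (silicon)
    print(f"GaAs estimate:    {e_gaas * 1e15:.1f} fJ")
    print(f"silicon estimate: {e_si * 1e12:.2f} pJ")
    print(f"ratio:            {e_si / e_gaas:.0f}")
```

Because the estimate is quadratic in the swing, reducing the swing by a factor of about twelve reduces the energy by more than two orders of magnitude.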

Having discussed the superior performance potential of GaAs material and logic gates compared with silicon, we must also consider the performance of GaAs integrated circuits of reasonable complexity, and compare them with their silicon counterparts. Tables 1.3 through 1.5 list some of the GaAs and Si multipliers, memories and gate arrays [22] [23] [24]. The performance trade-off between speed and power is evident within each technology, as is the effect of design rules. For the same device dimensions, GaAs devices perform better either in terms of power dissipation or propagation delay. The results show that GaAs IC technology will have a significant impact on the performance of digital signal processing systems. An increase in system clock frequency by a factor of 2 to 5 over present systems is projected for digital GaAs ICs.

<table>
<thead>
<tr>
<th>Technology</th>
<th>Size (bits)</th>
<th>Delay (ns)</th>
<th>Power (mW)</th>
<th>Comments</th>
</tr>
</thead>
<tbody>
<tr>
<td>Si NMOS (TRW)</td>
<td>8x8</td>
<td>45</td>
<td>1000</td>
<td>2μm design rule</td>
</tr>
<tr>
<td>Si ECL (NEC)</td>
<td>8x8</td>
<td>5</td>
<td>1400</td>
<td>2x6μm emitter</td>
</tr>
<tr>
<td>Si NMOS (BELL)</td>
<td>16x16</td>
<td>20</td>
<td>1000</td>
<td>1.5μm design rule</td>
</tr>
<tr>
<td>Si SOS (TOSHIBA)</td>
<td>16x16</td>
<td>27</td>
<td>150</td>
<td></td>
</tr>
<tr>
<td>Si CMOS (NEC)</td>
<td>16x16</td>
<td>45</td>
<td>100</td>
<td></td>
</tr>
<tr>
<td>GaAs DCFL (FUJITSU)</td>
<td>16x16</td>
<td>10.5</td>
<td>952</td>
<td>2μm gate length</td>
</tr>
<tr>
<td>GaAs DCFL (TOSHIBA)</td>
<td>8x8</td>
<td>12</td>
<td>160</td>
<td></td>
</tr>
<tr>
<td>GaAs SDCFL (ROCKWELL)</td>
<td>8x8</td>
<td>5.25</td>
<td>2200</td>
<td></td>
</tr>
</tbody>
</table>

Table 1.3 IC technologies comparison (for multiplier circuit).


<table>
<thead>
<tr>
<th>Technology</th>
<th>Size (bits)</th>
<th>Access time (ns)</th>
<th>Power (mW per 1K)</th>
<th>Comments</th>
</tr>
</thead>
<tbody>
<tr>
<td>Si ECL (FUJITSU)</td>
<td>4K</td>
<td>3.2</td>
<td>750</td>
<td></td>
</tr>
<tr>
<td>Si ECL (NEC)</td>
<td>4K</td>
<td>2.3</td>
<td>400</td>
<td></td>
</tr>
<tr>
<td>Si NMOS (BELL)</td>
<td>4K</td>
<td>5.0</td>
<td>100</td>
<td>1μm design rule</td>
</tr>
<tr>
<td>Si CMOS (NIPPON)</td>
<td>1K</td>
<td>25.0</td>
<td>low</td>
<td>1.5μm design rule</td>
</tr>
<tr>
<td>GaAs DCFL (FUJITSU)</td>
<td>1K</td>
<td>1.3</td>
<td>300</td>
<td>2μm gate length</td>
</tr>
<tr>
<td></td>
<td>4K</td>
<td>3.0</td>
<td>175</td>
<td></td>
</tr>
<tr>
<td>GaAs DCFL (NIPPON)</td>
<td>1K</td>
<td>2.0</td>
<td>459</td>
<td>1μm gate length</td>
</tr>
<tr>
<td></td>
<td></td>
<td>6.0</td>
<td>38</td>
<td></td>
</tr>
<tr>
<td>GaAs DCFL (McDONNELL DOUGLAS)</td>
<td>256</td>
<td>5.0</td>
<td>35</td>
<td></td>
</tr>
<tr>
<td>GaAs HEMT (FUJITSU)</td>
<td>1K</td>
<td>3.4</td>
<td>290</td>
<td>JFET technology</td>
</tr>
<tr>
<td></td>
<td></td>
<td>0.9</td>
<td>360</td>
<td></td>
</tr>
</tbody>
</table>

Table 1.4 IC technologies comparison (for memory circuit).

<table>
<thead>
<tr>
<th>Technology</th>
<th>Size (gates)</th>
<th>Gate delay (ps)</th>
<th>Power (mW/gate)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Si ECL (NIPPON)</td>
<td>5000</td>
<td>500 (average)</td>
<td>1.0</td>
</tr>
<tr>
<td>Si BIPOLAR (IBM)</td>
<td>10000</td>
<td>1700 loaded</td>
<td>0.34</td>
</tr>
<tr>
<td></td>
<td></td>
<td>1400 loaded</td>
<td>0.57</td>
</tr>
<tr>
<td>Si SOS (TOSHIBA)</td>
<td>8000</td>
<td>870 loaded</td>
<td>0.45</td>
</tr>
<tr>
<td>Si ECL (COMMERCIAL)</td>
<td>170-1500</td>
<td>3500-1500</td>
<td>29-0.85</td>
</tr>
<tr>
<td>GaAs DCFL (TOSHIBA)</td>
<td>1000</td>
<td>300 loaded</td>
<td>0.2</td>
</tr>
<tr>
<td>GaAs DCFL (TEKTRONIX)</td>
<td>1224</td>
<td>100 (fo=1)*</td>
<td>0.25</td>
</tr>
<tr>
<td></td>
<td></td>
<td>200-250 (fo=3)</td>
<td></td>
</tr>
<tr>
<td>GaAs SDCFL (HONEYWELL)</td>
<td>432</td>
<td>250 (r.o)+</td>
<td>3.0</td>
</tr>
<tr>
<td>GaAs SDCFL (LOCKHEED)</td>
<td>320</td>
<td>184 (r.o)</td>
<td>&gt; 1.0 (Est.)</td>
</tr>
</tbody>
</table>

* (fo=N) is a gate with fanout of N.
+ (r.o) denotes results obtained from ring oscillators.

Table 1.5 IC technologies comparison (for gate array).
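To relate Table 1.5 to the switching energy requirement of section 1.2, the Python sketch below computes the power-delay product per gate for the rows that quote a single delay and power figure. The function and variable names are hypothetical. The delays in the table were measured under different conditions (average, loaded, ring oscillator), so the results are only a rough guide and not a like-for-like ranking.

```python
# (gate delay in ps, power in mW per gate) for single-valued rows of Table 1.5
GATE_ARRAYS = {
    "Si ECL (NIPPON)": (500, 1.0),
    "Si SOS (TOSHIBA)": (870, 0.45),
    "GaAs DCFL (TOSHIBA)": (300, 0.2),
    "GaAs DCFL (TEKTRONIX, fo=1)": (100, 0.25),
    "GaAs SDCFL (HONEYWELL)": (250, 3.0),
}


def power_delay_pj(delay_ps, power_mw):
    """ps x mW gives fJ, so divide by 1000 to express the product in pJ."""
    return delay_ps * power_mw / 1000.0


if __name__ == "__main__":
    for name, (delay_ps, power_mw) in GATE_ARRAYS.items():
        print(f"{name:<30} {power_delay_pj(delay_ps, power_mw):.3f} pJ")
```

On these figures the two GaAs DCFL arrays fall below the 0.1pJ level discussed in section 1.2, while the silicon entries and the higher-power SDCFL array lie above it.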
1.4 Current Developments and Future Trends

GaAs technology maturity in the processing of digital integrated circuits in 1991 is equivalent to silicon technology maturity of the mid 1970's. However, improvements seen with GaAs processing technology are occurring at a rate which is three times that which occurred in silicon processing during the 1970's and early 1980's [25]. The turning point came in 1986 with the development of a new method of manufacturing digital GaAs ICs. The process employs the usual metal-semiconductor field-effect transistors (MESFETs), except that a refractory metal replaces gold in the MESFET self-aligned gates [26]. This innovation not only eases manufacture but also permits the use of a logic family which trades off some of gallium arsenide's high speed for lower power consumption. The result is a high yield and relatively low cost solution to the needs of very high speed digital integrated circuits.

The market for digital GaAs ICs is growing very fast. Figure 1.4 shows the perceived European GaAs IC market in 1984, 1989 and 1994 [27]. This demonstrates that the leading sector until the late 1980's was analogue MMICs, but that both digital and optoelectronic ICs will be employed increasingly in systems. By the end of 1994, the European market will mostly be devoted to GaAs digital applications. The same progress is happening world-wide, with most of the newly available VLSI products in GaAs being application specific integrated circuits (ASICs). The most dramatic impact on the computer market will occur when GaAs microprocessors begin to appear. These chips will bring the power of today's supercomputer to the desktop workstation. Because of their relatively low power dissipation, clock frequencies in excess of 250MHz could be accommodated in an office environment enclosure which contains only a fan for cooling [28]. In sharp contrast, today's supercomputers require exotic liquid or refrigerated-air cooling.
1.5 Scope of this Thesis

This chapter has shown the superior performance of digital GaAs circuits in terms of speed and power dissipation and has predicted an ever growing use of this technology for high speed digital applications.

The ultimate success of GaAs as a base for digital integrated circuits depends on various factors, the most important of which are the process and design issues.

The process maturity of GaAs is reaching the stage where the implementation of true VLSI circuits (≥20,000 transistors) is possible. This is brought about by the constant improvement in the preparation of defect free crystals as well as in production of devices with very small parameter variations. At such levels of integration, a design approach must be developed to ensure reliable operation whilst maintaining the high speed and low power dissipation offered by the technology.

The aim of this thesis is to identify the critical design issues, ranging from the optimisation of basic gates to the impact of the algorithms and overall architecture on the performance of GaAs VLSI circuits. This is achieved by designing a range of test circuits such as logic gates, buffers, storage elements, adders and multipliers based on existing design ideas to identify potential problem areas. The data provided from this design exercise are then used to develop novel techniques to improve the performance of GaAs circuits at high levels of integration. Although the designs are primarily targeted at image processing applications, in principle they could have much wider applications.

In chapter 2 various GaAs devices are introduced and their suitability for VLSI applications is assessed. The manufacturing sequence of the devices is then explained to provide a better understanding of their structures. The layers and their associated layout rules are subsequently defined in order to be able to identify them on the circuit layouts and to show the minimum feature sizes for the GaAs process used. Also, in this chapter, the device models and process parameters are discussed in some detail. These are important issues as they directly determine the validity of the simulation results.

The GaAs MESFET logic families are discussed in chapter 3. A detailed comparison between the logic gates is presented to select the most appropriate one for GaAs VLSI applications, namely the Direct Coupled FET Logic (DCFL) gate. An alternative gate configuration called the Source Follower DCFL (SDCFL) is also proposed in an attempt to improve the noise margin and speed of GaAs circuits. A number of buffering schemes are then suggested to improve the speed where the fanout loading is high. This is particularly important for the clock drivers required in any synchronous VLSI circuit.

The fourth chapter gives a review of various adder circuits. These adders are designed, laid out and simulated to find the best adder architecture for GaAs implementation. The effects of algorithm and design technique on the performance of the adder circuits are fully demonstrated. The effects of various interconnect technologies on the overall delay are also investigated to suggest adder architectures which would be least sensitive to interconnect.

The design and evaluation of a GaAs multiplier circuit are presented in chapter 5. This is a natural progression towards the implementation of a VLSI circuit for digital signal processing applications. The multiplier circuit is used to demonstrate further the effectiveness and identify the limitations of conventional circuit design approaches for GaAs digital circuits.

A hierarchical design procedure and a novel layout method are proposed in chapter 6 to minimise the delay and area of circuits. This novel design technique is applied to the same circuit examples in chapters 4 and 5 which are then re-evaluated. A comparison between the results obtained from the circuits in this chapter and those achieved by using the conventional design techniques is given to show the improvements in performance.

Finally, the overall objectives and the work carried out during the course of the project are summarised in chapter 7. The outcomes together with the conclusions drawn from the research are also presented.
2.1 Suitable Devices for VLSI Implementation

A number of different devices have been developed for GaAs. They fall into two categories, the first and second generation devices [30]. First generation devices are the Depletion-mode MESFET (DFET), Enhancement-mode MESFET (EFET), Enhancement-mode Junction FET (EJFET) and Complementary EJFET (CE-JFET). The second generation devices include the High Electron Mobility Transistor (HEMT) and Heterojunction Bipolar Transistor (HBT). Second generation devices are faster than the first generation devices because they exploit the properties of GaAs more fully. For example, the operating frequency of DFETs is, in general, between 20 and 80 GHz, whereas for HEMTs it can vary from 70 to 100 GHz [31].

There are also more exotic devices being invented in the research labs which attempt to reach the ultimate performance of GaAs. However for high speed VLSI circuits the most important factor, apart from high operating frequency, is the maturity of the process. At present the first generation MESFETs are the most widely used devices for VLSI applications. Even at sub-micron level they can still be easily manufactured and provide high operating frequencies.

The designs and analyses of the circuits presented in this thesis are based on MESFETs. Therefore the results and the final conclusions are specific to MESFETs, although the fundamental design and implementation issues are believed to be applicable to circuits using other GaAs devices.

The following section presents a detailed description of MESFETs, their fabrication process and design rules as well as the equivalent circuit models used in all the simulations.
2.2 GaAs MESFET Structure

Figure 2.1 shows the basic structure of a GaAs MESFET. It consists of a chromium doped, semi-insulating substrate into which source, drain and channel are made by n-type dopant implantation [32].

![Cross section of an ion-implanted MESFET.](image)

The gate is formed when a metal such as aluminium is deposited over the channel. Conduction in the channel is confined to the region between the gate depletion-edge and the substrate and may be modulated by the gate voltage.

GaAs MESFETs are somewhat similar to silicon MOSFETs. The major difference is the presence of a Schottky diode at the gate-channel interface. The detailed device operation is also different in that in GaAs the electron velocity saturates at an electric field roughly ten times lower than in silicon. Thus the saturation of the drain current in GaAs MESFETs is due to carrier-velocity saturation, whereas in silicon it is caused by channel pinch-off [33].

The threshold voltage of the GaAs MESFET can be adjusted by varying the channel thickness and the concentration of the implanted impurity. The normally 'ON' DFET is characterised by its thick and highly doped channel exhibiting a negative threshold voltage. By reducing the channel thickness a normally 'OFF' EFET with positive threshold voltage can be fabricated. For the DFETs the channel thickness is in the range of 1000 to 2000 Å, whereas for the EFETs it ranges from 500 to 1000 Å.
There are many ways of fabricating MESFETs and the process can be adapted to the application for which they are intended. For high performance GaAs VLSI circuits the most dominant approaches in device fabrication are the planar and self-aligned gate processes [34].

2.3 Planar Processing Steps for GaAs MESFETs

Figure 2.2 shows a generalised manufacturing sequence for a discrete planar GaAs DFET process. It is presented here to show the steps in transistor fabrication without the complications of simultaneous fabrication of other components (the same process applies to EFETs).

As shown in Figure 2.2a, initially the GaAs substrate is coated with the first level of insulator which is a thin layer of silicon nitride (Si$_3$N$_4$). This thin film of insulator remains on the wafer throughout the processing steps that are to follow. A photoresist is then applied and selectively removed to define a shallow high resistivity n-channel layer. The channel is formed by direct implantation of silicon ions through the silicon nitride layer, into the GaAs substrate.

Figure 2.2b shows the formation of the deep and heavily doped n$^+$ layer for the source and drain regions, after a second application of photoresist and the selective removal process. The resultant channel resistance is in the range of 1000 to 2500Ω/square, which is too high for source and drain contacts. Therefore the surface concentration of the n$^+$ is kept relatively high to minimise the resistance seen by the ohmic metal contacts.

In the next step, namely the cap and anneal process (Figure 2.2c), the wafer is capped with a suitable material such as silicon dioxide (SiO$_2$) by chemical vapour deposition. This layer of silicon dioxide is particularly important as it prevents arsenic out-diffusion, brought about by the high vapour pressure associated with GaAs when subject to temperatures in excess of about 600°C, during the anneal step. The anneal step is performed in a hydrogen ambient to activate electrically the implanted regions.

The ohmic contact metallisation step, in which contact areas for the source and drain are formed, uses a process known as the lift off technique (Figure 2.2d).

In the lift off process the deposited metal adheres to the underlying material where there is no cap layer while the remaining metal on the cap layer is removed when the layer is stripped. This allows precise metal definition without an etch back process. The metals used in the ohmic metallisation are gold-germanium-nickel or gold-germanium-platinum alloy.

An important point to note is that the semi-insulating nature of the GaAs substrate cannot be used alone to provide good isolation between devices (back-gating) [35] [36]. In fact, it is usual to implant $H^+$ ions into the field areas to reduce the effect of the parasitic interactions between the nearby devices.

One of the most critical steps in the fabrication process is the gate metallisation. Schottky gates together with the first level interconnect are formed by multi-layer gold and refractory metal thin films such as titanium/platinum/gold alloy, deposited by electron beam evaporation (Figure 2.2e). Second and higher level metals are not in contact with the GaAs substrate, therefore platinum which is used to prevent the interaction of gold with the GaAs surface can sometimes be eliminated from this step.

The final step of the process is the passivation step which is used to protect against moisture and contamination (Figure 2.2f). This entails a thick layer of silicon nitride being deposited on the gate, source and drain metallisation, using a low temperature plasma enhanced chemical vapour deposition process.
Figure 2.2  A typical planar manufacturing process for a GaAs MESFET.
2.4 Self-Aligned Gate Process Technology

In order to improve fabrication technology, the self-aligned gate method was borrowed from the silicon NMOS process. In this method, the Schottky gate is used as a mask for implanting the source and drain regions of the devices. The n$^+$ source and drain layers are therefore embedded close to the gates. As a result the parasitic source resistance of the FETs is greatly reduced and the transconductance of the device is increased. In addition the process offers improved pinch-off voltage uniformity, which is of crucial importance for the manufacture of VLSI circuits based on normally-off EFETs.

The fabrication steps for a self-aligned gate process are shown in Figure 2.3. Just as for the planar process the first step is to form the channel area by selective implantation of silicon ions into the GaAs substrate (Figure 2.3a). Next, a high temperature stable material such as Tungsten Nitride is deposited over the substrate and is patterned by an etching process to define the gate area (Figure 2.3b). The gate acts as a mask for the next step in the process which is the formation of source and drain by the high dose implantation of ions (Figure 2.3c). This step is followed by capping of the substrate with silicon dioxide so that the sample can be annealed without any arsenic out-diffusion due to the high vapour pressure.

It is important to note that the gate material must withstand the high temperatures (about 800°C) during the annealing process. Tungsten Nitride has been found to be satisfactory as a gate material. It has a typical film resistivity of 70\(\mu\Omega\cdot\text{cm}\) and Schottky barrier height of 0.8 V to n-type GaAs.

After the annealing (Figure 2.3d), the final stage of the process is the ohmic metallisation of the source and drain regions (Figure 2.3e). As in the case of the planar process, the metals used in the ohmic metallisation are gold-germanium-nickel alloy or gold-germanium-platinum.
Figure 2.3 A typical manufacturing process for a self-aligned GaAs MESFET.
The formation of the second and higher level metals together with the final passivation stage is similar to that of the planar process, described in the previous section.

2.5 GaAs MESFET Design Rules and Layer Representation

The layout and design rules are intended to ensure reliable circuits with optimum yield and size. They are set by the designer and the process engineer to provide the best compromise between yield and performance.

The layout rules must define: a) the geometry of the features that can be reproduced by the mask and lithography process and, b) the interaction between different layers. There are two main approaches to achieve this: the lambda-based and micron-based rules. In lambda-based rules, every feature is expressed in terms of the parameter lambda. The micron rules, on the other hand, are given as a list of minimum feature sizes and spacings, according to the capabilities of the process technology.

The lambda-based rules are simple and somewhat relaxed to ensure high yield circuits. This, however, results in performance degradation due to the increase in area. For high speed GaAs VLSI circuits, micron-based rules must be used to achieve optimum performance [37].

The layout rule set used throughout the work presented in this thesis is given in appendix A, so that it can be used for further circuit design and implementation work, if required. The set includes the width and spacing rules for different layers together with some special rules for MESFETs. The colour coding of the layers together with the layer patterns are also provided so that each layer in the circuit can easily be identified [38] [39].

2.6 An Appropriate Device Model for GaAs VLSI

In the following chapters a considerable amount of computer simulation is described, in order to present a novel design approach for GaAs MESFETs. The validity of the results and final conclusions depends entirely on: a) the accuracy of the model for the individual devices and b) the accuracy of the parameters extracted for the model [40]. The deciding factor in choosing a particular model must be how closely the simulated results agree with the measured data, so that the results are reliable.

For VLSI circuit simulation, another important factor in choosing a particular model is that it should be CPU time efficient. Clearly complex models cannot be used for circuits with many thousands of MESFETs. On the other hand MESFETs are complex internally and simple equations cannot describe their behaviour under all possible conditions.

The most commonly used MESFET model is based on the JFET model, consisting of a parallel diode and capacitor between gate-source ($D_{gs}$, $C_{gs}$) and gate-drain ($D_{gd}$, $C_{gd}$), plus a controlled current source ($I_{ds}$) between drain-source. For anything other than the most approximate simulations it is necessary to add resistors $R_d$, $R_s$ and $R_g$ in series with the drain, source and gate respectively, add a drain-source resistor ($R_{ds}$) and drain-source capacitor ($C_{ds}$). The complete equivalent circuit model is shown in Figure 2.4 [41] [42].

![Figure 2.4 The equivalent circuit model for GaAs MESFETs.](image-url)
The problem is to define a formula for the $I_{ds}$ current. The simplest formula is given by the Shichman and Hodges model [43], which is implemented in most versions of SPICE programs.

The model has a number of inadequacies when it comes to modelling short channel MESFETs (which is the case for most MESFETs) [44].

These are as follows.

a) The square-law relationship of $I_{ds}$ to $V_{gs}$ is often significantly different from the behaviour of the actual device.

b) The approximately linear dependence of output conductance on $I_{ds}$ is often not observed (they are more often independent).

c) The saturation of $I_{ds}$ is assumed to be at $V_{ds} = V_{gs} - V_t$, whereas the actual device exhibits early saturation at a significantly lower voltage than the formula suggests.

A simple, more accurate, model was proposed by W.R. Curtice in 1980 [45], which incorporates a tanh function in the formula. It allows the linear and saturation regions to be modelled by the same equation. This model is used for all the simulations presented in this thesis and apart from the accuracy and simplicity, having access to the foundry measured parameters for this model was the main reason for choosing it.

The drain-to-source current ($I_{ds}$) described by the Curtice equation is as follows:

$$I_{ds} = \beta (V_{gs} - V_t)^2 (1 + \lambda V_{ds}) \tanh(\alpha V_{ds})$$

(2.1)

where $\beta$ is the transconductance parameter, $V_{gs}$ is the gate-source voltage, $V_t$ is the threshold voltage, $\lambda$ is the channel length modulation parameter, $\alpha$ is the hyperbolic tangent drain voltage multiplier and $V_{ds}$ is the drain-source voltage.
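Equation 2.1 can be evaluated directly, as in the minimal sketch below. The parameter values are hypothetical and chosen only for illustration (they are not the foundry values of Table 2.1), and the clamp to zero current below threshold is an assumption added here rather than part of equation 2.1.

```python
import math

def curtice_ids(v_gs, v_ds, beta, v_t, lam, alpha):
    """Drain-source current in amperes from equation 2.1 (Curtice model)."""
    v_ov = v_gs - v_t
    if v_ov <= 0.0:
        # Assumption: the device is treated as cut off below threshold.
        return 0.0
    return beta * v_ov ** 2 * (1.0 + lam * v_ds) * math.tanh(alpha * v_ds)

# Hypothetical EFET-like parameters, for illustration only.
PARAMS = dict(beta=2.0e-3, v_t=0.2, lam=0.1, alpha=5.0)

# Sweep V_ds at a fixed V_gs: the single tanh expression covers both the
# linear region (small V_ds) and the saturation region (large V_ds).
for v_ds in (0.0, 0.1, 0.2, 0.4, 0.8):
    i_ds = curtice_ids(0.6, v_ds, **PARAMS)
    print(f"V_ds = {v_ds:.1f} V  ->  I_ds = {i_ds * 1e6:7.1f} uA")
```

Because the same expression is used for all values of $V_{ds}$, no region test is needed inside the function, which is part of what makes the model cheap to evaluate.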

DC characteristics are defined by the model parameters $V_t$ and $\beta$ (which determine the variation of drain current with gate voltage), by $\lambda$ (which determines the output conductance) and by the saturation current of the two gate junctions.

The following equations describe the threshold voltage and transconductance parameters [46]:

\[
V_t = V_{bi} - \frac{qNa^2}{2\varepsilon}
\]
(2.2)

\[
\beta = \left( \frac{\mu_n \varepsilon}{2a} \right) \left( \frac{W}{L} \right)
\]
(2.3)

where \( V_{bi} \) is the built-in potential, \( N \) is the effective channel doping density, \( q \) is the electron charge, \( a \) is the effective channel implant depth, \( \varepsilon \) is the permittivity, \( \mu_n \) is the electron mobility, \( W \) is the gate width and \( L \) is the channel length.
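As a rough numerical illustration of equations 2.2 and 2.3, the sketch below evaluates the threshold voltage and transconductance parameter in SI units, writing the effective channel implant depth as `a_m`. The doping density, mobility, device dimensions and the relative permittivity of 12.9 used for GaAs are assumed values for illustration, not parameters taken from this thesis.

```python
Q = 1.602e-19            # electron charge, C
EPS0 = 8.854e-12         # permittivity of free space, F/m
EPS_GAAS = 12.9 * EPS0   # assumed GaAs permittivity

def threshold_voltage(v_bi, n_m3, a_m, eps=EPS_GAAS):
    """Equation 2.2: V_t = V_bi - q N a^2 / (2 eps), in volts."""
    return v_bi - Q * n_m3 * a_m ** 2 / (2.0 * eps)

def transconductance_parameter(mu_n, a_m, w_m, l_m, eps=EPS_GAAS):
    """Equation 2.3: beta = (mu_n eps / 2a)(W / L), in A/V^2."""
    return (mu_n * eps / (2.0 * a_m)) * (w_m / l_m)

# Hypothetical values: N = 1e17 cm^-3 (1e23 m^-3) and V_bi = 0.8 V.
N = 1.0e23
V_BI = 0.8

vt_thin = threshold_voltage(V_BI, N, 1.0e-7)    # 1000 angstrom channel
vt_thick = threshold_voltage(V_BI, N, 1.5e-7)   # 1500 angstrom channel
print(f"thin channel : V_t = {vt_thin:+.3f} V")
print(f"thick channel: V_t = {vt_thick:+.3f} V")

# Hypothetical 10 um x 1 um gate with mu_n = 0.4 m^2/Vs.
beta = transconductance_parameter(0.4, 1.0e-7, 10.0e-6, 1.0e-6)
print(f"beta = {beta * 1e3:.2f} mA/V^2")
```

With these assumed numbers the thinner channel gives a positive threshold voltage and the thicker channel a negative one, which matches the EFET and DFET behaviour described in section 2.2.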

Charge storage is modelled by non-linear capacitances, defined by the parameters \( C_{gs} \) and \( C_{gd} \). They are considered as Schottky-barrier diodes and modelled as:

\[
C_{gs} = \frac{C_{go}}{\sqrt{1 - \frac{V_{gs}}{V_{bi}}}}
\]  
(2.4a)

\[
C_{gd} = \frac{C_{go}}{\sqrt{1 - \frac{V_{gd}}{V_{bi}}}}
\]  
(2.4b)

where \( V_{gs} \) and \( V_{gd} \) are the gate-source and gate-drain voltages, and \( C_{go} \) is the zero bias capacitance.
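The bias dependence in equations 2.4a and 2.4b can be sketched as a single function, since both capacitances share the same form. The zero bias capacitance of 1 fF used below is a hypothetical value, and the guard against voltages at or above \( V_{bi} \) is an assumption added to keep the square root defined.

```python
import math

def junction_capacitance(v, c_go, v_bi):
    """Equations 2.4a/2.4b: C = C_go / sqrt(1 - V / V_bi), in farads."""
    if v >= v_bi:
        # Assumption: the expression is only used below the built-in potential.
        raise ValueError("junction voltage must be below the built-in potential")
    return c_go / math.sqrt(1.0 - v / v_bi)

C_GO = 1.0e-15   # hypothetical zero bias capacitance, F
V_BI = 0.8       # built-in potential, V

# Reverse bias lowers the capacitance, forward bias raises it.
for v in (-2.4, -0.8, 0.0, 0.4, 0.6):
    c = junction_capacitance(v, C_GO, V_BI)
    print(f"V = {v:+.1f} V  ->  C = {c * 1e15:.2f} fF")
```

The same function serves for \( C_{gs} \) (called with \( V_{gs} \)) and \( C_{gd} \) (called with \( V_{gd} \)).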

The parameter values used in the model are given in Table 2.1 [47]. They are derived from an n-channel self-aligned GaAs MESFET process.
Table 2.1 Parameter values used in the MESFET model.

2.7 Important Effects Included in the Device Model

Having introduced the equations for the $I_{ds}$ current and the gate capacitances, there are two important effects which have to be modelled.

a) Transit-time effects

Transit-time effects are brought about by the finite delay in the change of $I_{ds}$ when the voltage at the gate is changed. This is due to the fact that charge transport occurs at a maximum velocity of $10^7$ cm/s. Therefore, for a 1 $\mu$m channel length, it takes about 10 ps for the current to change when the gate voltage is altered. This time delay is very important in the delay calculation of GaAs circuits and can be included in the model by substituting $V_{gs}(t-\tau)$ for $V_{gs}$, where $\tau$ is the time delay.

b) Dispersion effects [48] [49]

There are a number of undesirable effects in GaAs MESFETs which may be significant in the performance of the overall circuits. One of the most dominant effects is the transconductance dispersion which is brought about by the non-ideal semi-insulating substrate and surface. This results in higher output conductance (order of 2-3 times) in saturation for high frequency signals than would be predicted from curve tracer or parameter analyser measurements.

One of the easiest ways to model this effect is simply to increase the value of $\lambda$ in the Curtice model from the value extracted for the low-frequency measurement to its high frequency value. Typically the high-frequency value is three orders of magnitude larger than the low-frequency value. Although this simple model ignores the effect of overshoot and phase shift due to dispersion effects, it is adequate for the performance evaluation of digital circuits presented in this thesis.
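The transit-time estimate in (a) is a simple ratio of channel length to maximum carrier velocity, as the sketch below shows. The helper `delayed_gate_voltage` illustrates one way of applying the delay, namely evaluating the gate waveform at $t - \tau$. The ramp waveform and the function names are hypothetical.

```python
V_MAX_CM_PER_S = 1.0e7   # maximum carrier velocity quoted in the text

def transit_time_s(channel_length_um, v_max_cm_per_s=V_MAX_CM_PER_S):
    """Time for carriers to cross the channel, in seconds."""
    length_cm = channel_length_um * 1.0e-4   # 1 um = 1e-4 cm
    return length_cm / v_max_cm_per_s

def delayed_gate_voltage(v_gs_of_t, t, tau):
    """Gate voltage seen by the current source: the waveform delayed by tau."""
    return v_gs_of_t(t - tau)

def ramp(t):
    """Hypothetical gate waveform: 0 V to 0.6 V linear ramp over 50 ps."""
    if t <= 0.0:
        return 0.0
    return min(0.6, 0.6 * t / 50.0e-12)

tau = transit_time_s(1.0)
print(f"tau for a 1 um channel: {tau * 1e12:.1f} ps")
print(f"V_gs at 25 ps: {ramp(25.0e-12):.3f} V, "
      f"delayed: {delayed_gate_voltage(ramp, 25.0e-12, tau):.3f} V")
```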

2.8 Interconnect Modelling

The switching speed of MESFET circuits depends on both the device and interconnect lines. The propagation of a signal along an interconnect line is dependent on a number of factors. They include the distributed line resistance, capacitance and inductance, the impedance of the driving source and the cross-talk between the lines [50].

The interconnect for digital GaAs circuits can still be treated as purely capacitive provided the effective ON resistance of the driver gate is larger than that of the line by at least 2 orders of magnitude [51]. This is the case with the MESFET gates used in our circuits (see chapter three for design and analysis of logic gates).

The capacitance of the lines can be derived using the parallel plate model [52], but this simple model ignores the influence of the cross-talk (coupling) which can severely degrade the speed of GaAs VLSI circuits. There are several methods to reduce the effect of cross-talk. For example using a thick interlayer with low dielectric constant between the lines and the GaAs substrate can reduce the cross-talk by as much as 13%. A further 8% reduction can be achieved by using an air bridge technology where the interconnect lines are suspended in the air [53].
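A minimal sketch of the parallel plate estimate, together with the check that the driver ON resistance exceeds the line resistance by at least two orders of magnitude, is given below. The line geometry, dielectric constant and resistance values are hypothetical, and fringing and coupling capacitances are deliberately left out, as in the simple model itself.

```python
EPS0 = 8.854e-12   # permittivity of free space, F/m

def parallel_plate_capacitance(width_m, length_m, thickness_m, eps_r):
    """C = eps0 * eps_r * (width * length) / dielectric thickness, in farads."""
    return EPS0 * eps_r * width_m * length_m / thickness_m

def is_purely_capacitive(r_driver_on, r_line, min_ratio=100.0):
    """True if the driver ON resistance is at least min_ratio times the line resistance."""
    return r_driver_on >= min_ratio * r_line

# Hypothetical line: 2 um wide, 1 mm long, over 1 um of dielectric (eps_r = 4).
c_line = parallel_plate_capacitance(2.0e-6, 1.0e-3, 1.0e-6, 4.0)
print(f"line capacitance: {c_line * 1e15:.1f} fF")

# Hypothetical resistances in ohms.
print(is_purely_capacitive(r_driver_on=2000.0, r_line=10.0))
print(is_purely_capacitive(r_driver_on=2000.0, r_line=50.0))
```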

![Diagram of line capacitance calculation](image)

**Figure 2.5 Line capacitance calculation.**

In order to be able to predict accurately the performance of the overall GaAs circuits the effect of coupling must be included in the computer simulation. One effective method is to use Green’s function to provide an electrode capacitance matrix for self and mutual capacitances of the lines by determining their total electron charge. This method provides accurate values for the capacitance of both the device and interconnect lines [54]. However as the number of conductors increases, the size of the capacitance matrix continues to grow and results in excessive CPU time and memory allocation to compute the capacitances and store the final values. Therefore in the computer simulation of the circuits presented in the following chapters the parasitic capacitances due to coupling are manually added to the capacitance of the lines in the critical paths and are based on the calculated results given in Figure 2.5. This provides a crude estimation, but sufficiently accurate results without any sacrifice in CPU time or memory allocation [55].

2.9 Effect of Process Variations

Another important issue is the effect of process variation on circuit performance. The simulations performed in this research are all based on parameters for a commercial GaAs process. The parameters were also varied by as much as 50% to ensure that the results were valid for a large change in parameters. Therefore the proposed design approaches are believed to show a good tolerance to process spread. A detailed analysis of the process parameter spread is beyond the scope of this thesis and is not presented.
CHAPTER 3
MESFET LOGIC FAMILIES FOR GaAs VLSI CIRCUITS

3.1 Types of MESFET Logic Gate [56] [57]

There are two main approaches to the design of MESFET logic gates. They are categorised as either Normally-ON or Normally-OFF logic gates. The Normally-ON logic gates consist of DFETs and were the first generation of gates developed for GaAs digital circuits. The main reason for the development of this class of logic was the process maturity of DFETs. Later, when the yield and threshold voltage uniformity of EFETs were improved, the Normally-OFF logic gates were introduced. They consist of both types of device (DFETs and EFETs) and possess characteristics essential for the implementation of VLSI circuits on GaAs (e.g. small area and low power dissipation).

Gate configurations based on these logic classes are described in this chapter. They are intended to show the trends and developments in GaAs logic design and further aid the choosing of a particular gate configuration best suited to VLSI implementation.

3.2 Normally-ON Logic Gates

A number of approaches have been proposed for the design of this class of logic. They are: the Buffered FET Logic (BFL), Schottky Diode FET Logic (SDFL) and Capacitor-Coupled FET Logic (CCFL).

a) Buffered FET Logic (BFL) [58] [59] [60]

The basic structure for the BFL gate is shown in Figure 3.1a. It consists of two sections, the logic input and the driver/level shifter output. Different logic functions are implemented by modifying the logic input. The output driver is used to ensure input and output logic level compatibility between the gates. Also, in order to be able to turn off the DFET logic switch ($T_s$) of the driven gate, a negative supply voltage (VSS) is required which adds to the complexity of the gate.

This type of gate is considered to be one of the fastest, but is expensive in terms of power and area. Most of the power is dissipated in the driver section, therefore to reduce the power it is possible to remove the load driver DFET ($T_D$) in the output stage of the BFL gate, as shown in Figure 3.1b. This new configuration is called the Unbuffered FET Logic (UFL) and is more suitable for LSI applications. The absence of $T_D$, however, reduces the speed and fanout capability of the gate.

![Figure 3.1](image)

**Figure 3.1** (a) BFL gate with the load driver. (b) UFL gate without the load driver.

b) Schottky Diode-FET Logic (SDFL) [61] [62]

In this logic approach Schottky diodes are used to perform the logic operations. They are followed by a Schottky diode for level shifting and a buffer stage. A possible configuration of the gate is shown in Figure 3.2a. The power consumption and area of this type of gate are less than the BFL gate but with lower speed and drive capability.

It is possible to increase the drive capability of the gate without excessive increase in power dissipation by adding a push-pull source follower at the output, as shown in Figure 3.2b. To improve the noise immunity of the gate, the power supply for the logic is normally isolated from the source follower.

Figure 3.2 (a) The basic SDFL gate. (b) SDFL gate with a source follower output stage.

c) Capacitor-Coupled FET Logic (CCFL) [63] [64]

In order to overcome the problem of level shifting in the Normally-ON gates the natural choice is to use a capacitor to couple the input and output stages. Figure 3.3a shows a typical CCFL gate, where a reverse-biased diode is used as the capacitor ($D_{cap}$).

The gate has a very simple structure and requires only one supply rail. In addition, the power dissipation of the gate is low compared with BFL and SDFL gates. This is due to the fact that there is no power consumed in the capacitors. As soon as they are charged, the action thereafter is to transfer the charge between successive stages. Also, as the capacitor is placed in series with the DFET gate ($T_{PD}$), the capacitive loading is reduced and hence the speed of the gate is improved.

The use of a capacitor implies a minimum operational frequency of the circuit. This frequency is determined by the leakage currents and relative sizes of the coupling capacitor and reverse biased gate-source junction of the $T_{PD}$. For applications where the low frequency cutoff point is not acceptable, a combination of reverse and forward biased diodes is used to provide both the level shifting and capacitive coupling between the stages [65]. Figure 3.3b shows the basic structure of such a gate, called Capacitor-Diode FET Logic (CDFL). The gate area is increased as a result of adding the level shifting diodes but the low power dissipation is still maintained since the current through them can be made very small.

Figure 3.3 (a) CCFL gate configuration. (b) CDFL gate configuration.
3.3 Normally-OFF Logic Gates

Normally-OFF logic includes Quasi-FET Logic (QFL) and the Direct-Coupled FET Logic (DCFL). These utilise EFETs as switching devices and have become increasingly popular as their yield is constantly being improved.

a) Quasi-FET Logic (QFL) [66]

The development of the Normally-OFF logic gates was hampered by the lack of maturity of GaAs processing in the 1970s and early 1980s. The major obstacle was the variation in threshold voltage across the wafer. The QFL gate was invented to allow for a wider spread in threshold voltage (-0.4 to 0.1V) with little effect on the noise margin of the gate. The gate consists of a logic and level shift circuit, as shown in Figure 3.4. The insensitivity of the gate performance to process variation is due to the level shift circuit. However, the circuit is operated in strong overdrive, with the supply voltage set at 2.5V, resulting in an increase in power dissipation. Unlike the Normally-ON logic gates (with the exception of the CCFL gate), the QFL gate requires only one supply rail but achieves comparable dynamic performance.

![Figure 3.4 QFL gate configuration.](image-url)
b) Direct-Coupled FET Logic (DCFL) [67] [68] [69] [70]

Figure 3.5a shows the basic structure of a DCFL gate. It consists of a DFET load (pull-up, $T_L$) and an EFET switch (pull-down, $T_s$), and closely resembles an nMOS gate. DCFL is much simpler than the other gates mentioned so far, which leads to a higher packing density. DCFL gates with faster switching speeds (about 15ps) than any other GaAs logic gate have been fabricated. These results were, however, obtained with a large power supply voltage of 4V, which causes the pull-down FET to be heavily forward biased, reducing the reliability of the gate. At more realistic supply voltages of between 1 and 2V, DCFL gate delays are slightly greater than those of the BFL gate. The main drawback with this type of gate is that the allowable output voltage swing is about 0.8V, equal to the barrier height of the Schottky gate diode of the driven EFET. Therefore only a small voltage swing can be expected from DCFL circuits, resulting in small noise margins. Also DCFL gates have a poor load drive capability which could severely limit the performance of large circuits with high fanout and long interconnect lines.

A possible solution to low noise margin and poor fanout capability is to use a super-buffer configuration as shown in Figure 3.5b. The output stage consists of a load driver ($T_D$, connected as a source follower) and a pull-down ($T_{PD}$) EFET. They can be appropriately sized to drive a given capacitive load. The problem with the super-buffer configuration is that when the output logic level is to switch from a logic 'high' to a logic 'low', both the $T_D$ and $T_{PD}$ transistors are hard ON for a short period of time. Therefore a current spike appears with a momentary voltage drop in the supply line [71]. With many of these gates in a VLSI circuit switching at the same time, large voltage drops could be observed in the supply rail, giving rise to an incorrect logic operation. Therefore the use of super-buffer configuration necessitates a careful design of the supply lines.

Another approach to improving the noise margin and fanout of the DCFL gate is to use the Source follower DCFL (SDCFL) gate [72]. Figure 3.5c shows the SDCFL gate configuration. The source follower stage can be sized to drive a given load and due to the action of the $T_D$ high values of noise margin can be obtained.

Figure 3.5 (a) DCFL gate configuration. (b) Super-buffer inverter. (c) SDCFL inverter.
3.4 Suitable Logic Gates for GaAs VLSI

The logic gate requirements for high speed VLSI circuits are explained in chapter one. They are, apart from high speed, low power dissipation and small area. The prospects of such gates for VLSI implementation are summarised by K. Lehovec et al. [46]. Taking the area of the logic gates into consideration, BFL and CCFL (> 1000μm²) are limited to MSI complexity and the SDFL (> 500μm²) gate can be used only for LSI structures. In other words Normally-ON logic gates are not suitable for VLSI on the basis of area alone.

Even with a larger chip area, these gates cannot satisfy the power requirements for VLSI. The high power dissipation of the BFL gate (40mW) limits the integration level to MSI. CCFL and SDFL gates, with power dissipations of 2.5mW and 3.5mW respectively, can achieve only LSI complexity. According to H.C. Josephs [73] the power restriction for a high speed VLSI circuit would require logic swings of less than 1.8V. Further increase in the level of integration to Ultra Large Scale would require a voltage swing of 0.8V or less.

Therefore the DCFL gate with small area (=200μm²), low power dissipation (0.1-0.2mW) and low voltage supply level (1-2V), as well as circuit simplicity, is by far the strongest contender for GaAs VLSI implementation. The SDCFL gate, with comparable delay and power dissipation, can also be used in conjunction with the DCFL gate to improve the fanout and interconnect drive capability. To show this, a detailed analysis of the SDCFL and DCFL gates is presented in section 3.7. They form the basis of the designs presented in the following chapters.
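The scale of the difference between the logic families can be seen by multiplying the per-gate figures quoted above by a gate count. The sketch below does this for a hypothetical circuit of 10,000 gates. The gate count is an assumption, the areas quoted as lower bounds for BFL, CCFL and SDFL are used as if they were exact, and the upper DCFL power figure of 0.2mW is taken.

```python
# Per-gate figures quoted in section 3.4 (areas for BFL, CCFL and SDFL are
# lower bounds; the DCFL power is the upper end of the quoted range).
GATES = {
    "BFL":  {"power_mw": 40.0, "area_um2": 1000.0},
    "CCFL": {"power_mw": 2.5,  "area_um2": 1000.0},
    "SDFL": {"power_mw": 3.5,  "area_um2": 500.0},
    "DCFL": {"power_mw": 0.2,  "area_um2": 200.0},
}

def budget(family, n_gates):
    """Return (total power in W, total gate area in mm^2) for n_gates gates."""
    gate = GATES[family]
    power_w = gate["power_mw"] * n_gates / 1000.0
    area_mm2 = gate["area_um2"] * n_gates * 1.0e-6
    return power_w, area_mm2

N_GATES = 10_000   # hypothetical circuit size

for family in GATES:
    power_w, area_mm2 = budget(family, N_GATES)
    print(f"{family:5s}: {power_w:7.1f} W, {area_mm2:5.1f} mm^2")
```

The figures ignore interconnect area and any activity-dependent power, so they indicate relative ranking only.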

3.5 First Order Design of DCFL and SDCFL Gates

The design of logic gates involves the determination of optimum transistor sizes. This stage is very important in the design process as the performance of the overall circuit is directly determined by the performance of the logic gates.

We begin by using the device model to give a first order approximation and an insight into the parameters influencing the choice of transistor sizes for DCFL and SDCFL gates. This is followed by a detailed computer simulation for various input/output conditions, supply voltages, etc. to find the optimum transistor ratios.

Figure 3.6a shows two basic DCFL inverters, with their typical interconnections. The current equations for the load DFET ($I_L$) and the switch EFET ($I_s$) are as follows [74]:

\[ I_L = \beta_L (-V_{tL})^2 \tanh(\alpha [V_{DD} - V_o]) \]
\[ I_s = \beta_s (V_{in} - V_{tS})^2 \tanh(\alpha V_o) \]
(3.1)

Equating the two currents and using equation 2.3, with equal gate lengths for the two devices so that \( \beta_s / \beta_L = (W_S / W_L)(a_L / a_s) \), we obtain:

\[ \frac{W_S}{W_L} = \frac{a_s}{a_L} \frac{(-V_{tL})^2}{(V_{in} - V_{tS})^2} \frac{\tanh(\alpha [V_{DD} - V_o])}{\tanh(\alpha V_o)} \]
(3.2)

For \( V_{in} = V_o = \frac{V_{DD}}{2} = 0.4V \), the two tanh terms cancel and equation 3.2 reduces to the form:

\[ \frac{W_S}{W_L} = \frac{a_s}{a_L} \frac{(-V_{tL})^2}{(0.4 - V_{tS})^2} \]
(3.3)

From equation 3.3 the ratio of the transistor widths can be determined for various values of load and switch threshold voltages, as shown in Figure 3.7. For an implant depth ratio \( (a_L/a_s) \) of 2:1 the transistor width ratio is reduced by a factor of three when the switch threshold voltage is varied from 250 to 150mV. The same effect is observed when the load threshold voltage is reduced from 900 to 500mV. The smaller device ratio results in smaller logic gates and ultimately a smaller overall circuit. This justifies the choice of the threshold voltages given in Table 2.1.
Figure 3.6 (a) Two DCFL inverters with their typical interconnections. (b) Two SDCFL inverters with their typical interconnections.
Figure 3.7 The gate width ratio ($W_S/W_L$) as a function of $V_t$. The solid lines are for the implant depth ratio of 2:1 and the dashed lines are for a ratio of 4:1.
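
The factor-of-three sensitivity quoted above can be checked by evaluating the two threshold-dependent terms of equation 3.3 on their own. The following is a minimal sketch rather than part of the original analysis: the function names are illustrative, and the load threshold is entered as a negative voltage (depletion mode), which is an assumption about the sign convention.

```python
# Threshold-dependent terms of equation 3.3, evaluated at V_in = V_DD/2 = 0.4 V.
# Function names are illustrative; the sign convention for V_tL is an assumption.

V_IN = 0.4  # input voltage in volts, as used in equation 3.3


def switch_term(v_ts):
    """(V_in - V_tS)^2 term of equation 3.3, with v_ts in volts."""
    return (V_IN - v_ts) ** 2


def load_term(v_tl):
    """(-V_tL)^2 term of equation 3.3, with v_tl in volts (negative for a DFET)."""
    return (-v_tl) ** 2


# Change in each term over the threshold ranges quoted in the text.
switch_change = switch_term(0.150) / switch_term(0.250)
load_change = load_term(-0.900) / load_term(-0.500)

print(f"switch term changes by a factor of {switch_change:.2f}")
print(f"load term changes by a factor of {load_change:.2f}")
```

Both terms change by roughly a factor of three over the quoted ranges, which is the sensitivity described for Figure 3.7.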

The effect of the supply voltage, derived from equation 3.2, is also shown in Figure 3.7 (dashed-dotted line). Above the gate built-in potential (0.8V) the effect of the supply voltage is minimal, so the supply voltage could be set at 0.8V. In practice, however, it is set to a higher value (1-2V) to allow for supply voltage variations.

Figure 3.6b shows two SDCFL inverters, with their typical interconnections. The logic part is the same as the DCFL gate and equation 3.2 can be used to determine the ratio of the active load ($T_L$) to the logic switch ($T_S$). The driver is added to improve the noise margin and the speed of the gate. The size of this stage is determined by the output drive requirement. Therefore, the input transistor sizing is independent of the output drive requirements. However, the size ratio of the input switch to that of the driver load influences the gate intrinsic delay. The smaller the ratio, the longer the gate intrinsic delay.

3.6 Definition of Design Parameters

In the following section the gates are evaluated in terms of noise margin, propagation delay and power dissipation. There are various definitions for these parameters. In order to avoid confusion, the definitions used in our analysis are given below.

a) Noise margin

In the evaluation of the gates, we are interested in the worst case noise margin. Therefore only the static noise margin is considered, which is found graphically using the 'mirror-and-maximum-square' method [75] [76]. In this approach, noise of equal and opposite amplitude is applied to the inputs of a flip-flop and the noise margin is measured as shown in Figure 3.8.

![Figure 3.8 Noise margin calculation.](image)

There are several other definitions of noise margin which can give results slightly conflicting with the above method [77] [78]. In our analysis however, a detailed comparison of the gates is presented and only the relative values of the noise margins are of interest. Therefore, irrespective of the method used, the final conclusions should be the same. Indeed, the absolute values should also be confirmed by measurements on real devices.

b) Propagation delay

The propagation delay is defined as the average of $t_r$ and $t_f$ ($t_\text{d} = \frac{t_r + t_f}{2}$), where $t_r$ and $t_f$ are shown graphically in Figure 3.9 [79].

\begin{figure}[h]
\centering
\includegraphics[width=\textwidth]{delay_time_calculation.pdf}
\caption{Delay time calculation.}
\end{figure}

c) Power dissipation

The power dissipation consists of static and dynamic components. For high speed circuits, the dynamic component of the power dissipation is significant and must be included in the calculations [80].
A general formula for the power dissipation of a DCFL gate is:

\[
P_{(DCFL\, gate)} = V_{DD} \times I_D + C_1 \times (V_{bi})^2 \times f \tag{3.4}
\]

where \(V_{DD}\) is the supply voltage, \(I_D\) is the DC current supplied by \(V_{DD}\), \(C_1\) is the load capacitance, \(V_{bi}\) is the output voltage swing and \(f\) is the operating frequency.

For the SDCFL gate, the power dissipation through the source follower stage must be added to the above expression:

\[
P_{(SDCFL\, gate)} = V_{DD} \times (I_{D1} + I_{D2}) + f \times (V_{bi})^2 \times (4C_1 + C_2) \tag{3.5}
\]

where \(I_{D1}, C_1\) and \(I_{D2}, C_2\) are the currents and load capacitances of the logic and the source follower stages, respectively. The above equation is derived under the assumption that the voltage swing at the output of the logic stage is twice the built-in voltage.
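
The factor of four in equation 3.5 follows directly from that assumption. As a short worked step, using the symbols of equations 3.4 and 3.5 and assuming that the source follower output swings by \(V_{bi}\) while the logic stage output swings by \(2V_{bi}\), the dynamic term is:

\[
C_1 \times (2V_{bi})^2 \times f + C_2 \times (V_{bi})^2 \times f = f \times (V_{bi})^2 \times (4C_1 + C_2)
\]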

The term average power dissipation, used in the following chapters, is derived by taking the average of the instantaneous power dissipation over one clock period, which includes both the static and dynamic components of the power dissipation.

3.7 Detailed Analysis of DCFL and SDCFL Gates

Having introduced the terms used in the analysis of the logic gates, the following gives the results of detailed SPICE simulations performed to evaluate the suitability of DCFL and SDCFL gates for VLSI.

a) Effect of device width ratio on gate performance

Figure 3.10 shows the effect of the load-to-switch gate width ratio of the DCFL gate, and of the driver-load to logic-switch gate width ratio of the SDCFL gate, on noise margin and propagation delay. An increase in device width ratio degrades the noise margin and improves the speed of both the DCFL and SDCFL gates. For the entire range of device ratios the noise margin of the SDCFL gate is at least twice that of the DCFL gate. For the same propagation delay of about 60ps, the SDCFL gate shows a fourfold improvement in noise margin over the DCFL gate.

Figure 3.10 Noise margin and propagation delay of the DCFL (solid lines) and SDCFL (dashed lines) gates as a function of the gatewidth ratios. Horizontal axes: SDCFL driver-load to logic-switch gatewidth ratio ($W_D/W_S$) and DCFL load to switch gatewidth ratio ($W_L/W_S$).

The most important criteria in the design and evaluation of the gates are the noise margin and the propagation delay. The former will guarantee the correct functionality of the circuit and the latter determines the dynamic performance of the overall circuit. The power dissipation is given a lesser priority since its value for DCFL and SDCFL gates is very low compared with other logic families.
For optimum gate performance in terms of noise margin and delay, the width ratio of the driver-load ($T_D$) to logic-switch ($T_S$) of the SDCFL gate is set to 8:10. In order to optimise the area, the logic-load ($T_L$) and current-sink ($T_{CS}$) gate widths are set to minimum geometry. For the same criteria the load ($T_L$) to switch ($T_S$) ratio of the DCFL gate is set to 4:16, with minimum geometry load gate width. The absolute values of the transistor sizes are given in Figures 3.6a and 3.6b.

b) Effect of supply voltage on the gate performance.

The relationship between the propagation delay and power dissipation of the gates is given in Figure 3.11.

![Figure 3.11 The propagation delay of DCFL and SDCFL gates versus their power dissipation for different values of the supply voltage.](image)
Since the output voltage swing is limited by the Schottky barrier height of the driven FET, high values of the supply voltage will result in higher power dissipation without any useful increase in speed. The same is observed for the noise margin of the gates. As shown in Figure 3.12, the noise margin of the DCFL gate remains constant for supply voltages above 1V. For the SDCFL gate, the noise margin is improved by 30mV for an increase in supply voltage from 1.4 to 2V. This, however, doubles the power dissipation with only 15ps reduction in delay.

![Figure 3.12](image)

**Figure 3.12** The noise margin of the DCFL and SDCFL gates as a function of the supply voltage.

In order to maintain the constant current supplied by the pull-up FETs (the load in DCFL, and the logic-load and driver-load in SDCFL), the supply rail voltages for DCFL and SDCFL gates are set to a minimum of 1 and 1.4V, respectively. This is to account for any voltage variations in the supply rail.

c) Fanout and fanin sensitivity of the gates

The drive capability of the gates is important in large circuits since the fanout loading increases due to circuit complexity. As the number of driven gates is increased, the current into the gates of the switch FETs is further subdivided. Therefore there is less voltage across them, resulting in a degradation of the logic high level. This subsequently limits the fanout of the gate. The effect of fanout on noise margin and delay of the gates is shown in Figure 3.13.

![Figure 3.13 Noise margin and propagation delay of the DCFL (solid lines) and SDCFL (dashed lines) gates as a function of fanout.](image)

The SDCFL gate maintains a noise margin which is at least twice that of the DCFL gate for a fanout range of 1 to 5. Table 3.1 shows that, in terms of fanout, the delay and noise margin of the SDCFL gate can be further improved by increasing the width of the FETs in the driver stage while maintaining the nominal ratio of 2:1. This will however increase the area and power dissipation of the gate and should only be considered for heavy fanout loading.

<table>
<thead>
<tr>
<th rowspan="2">Driver ratio (W_D/W_J)</th>
<th colspan="3">Noise margin (mV)</th>
<th colspan="3">Delay (ps)</th>
</tr>
<tr>
<th>FO=1</th>
<th>FO=3</th>
<th>FO=5</th>
<th>FO=1</th>
<th>FO=3</th>
<th>FO=5</th>
</tr>
</thead>
<tbody>
<tr>
<td>8/4</td>
<td>127</td>
<td>105</td>
<td>91</td>
<td>72</td>
<td>185</td>
<td>290</td>
</tr>
<tr>
<td>12/6</td>
<td>140</td>
<td>110</td>
<td>101</td>
<td>75</td>
<td>120</td>
<td>205</td>
</tr>
</tbody>
</table>

Table 3.1 Effect of varying the width of the FETs in the driver stage (while maintaining the same ratio) of the SDCFL gate.

Both gates are very sensitive to fanin loading. This is due to the low OFF resistance of the MESFETs, which results in a leakage current through the pull-down FETs, degrading the noise margin of the gates. The delay also increases with fanin as a result of the added stray capacitances. The effect of fanin on the delay of the gates is given in Table 3.2. In order to avoid overall performance degradation the fanin is set to a maximum of 3.

<table>
<thead>
<tr>
<th rowspan="2">Type of gate</th>
<th colspan="2">Delay (ps)</th>
</tr>
<tr>
<th>FI=1</th>
<th>FI=3</th>
</tr>
</thead>
<tbody>
<tr>
<td>DCFL</td>
<td>100</td>
<td>133</td>
</tr>
<tr>
<td>SDCFL</td>
<td>72</td>
<td>128</td>
</tr>
</tbody>
</table>

Table 3.2 Effect of fanin on the delay of the SDCFL and DCFL gates.

The analyses show that the DCFL gate should be used for the basic logic elements within a GaAs VLSI circuit. Small area and low power dissipation are the main reasons for this choice. As demonstrated in Figure 3.13, the gate is very sensitive to fanout loading. In fact, the maximum tolerable fanout is 5, beyond which the noise margin becomes too small for reliable circuit operation.

On the other hand, the SDCFL gate shows superior performance to the DCFL gate in terms of noise margin and speed, but it consumes more power and area. A noise margin improvement of better than fourfold is possible with a power dissipation of three to five times that of the DCFL gate. Therefore, the use of the SDCFL gate is particularly advantageous where the fanout loading is high. Both gates should be utilised to complement each other in high speed, low power and reliable GaAs VLSI circuits.

3.8 Design of Buffering Schemes for GaAs VLSI Circuits

Having introduced the basic gates for GaAs VLSI, the next step is to design appropriate buffering schemes for driving large loads. This is particularly important for the clock drivers required in any synchronous VLSI circuit. There are two important issues which must be addressed, namely the effect of wiring and high fanout count.

The former accounts for up to 50% of the total delay in large GaAs circuits [55] [81] [82]. As the length of the interconnect lines increases relative to circuit complexity, the RC time delay of the lines can seriously degrade the performance. For 'sufficiently small' wire lengths, RC delays can be ignored. The lines can then be treated as one electrical node and modelled as simple capacitive loads. This assumption holds if either of the following inequalities is true [83]:

$$\tau_w < \tau_g$$  \hspace{1cm} (3.6)

or:

$$R_{on} \geq 2.3 \times R_{int}$$  \hspace{1cm} (3.7)

where $\tau_w$ is the delay through the wire, $\tau_g$ is the gate delay, $R_{on}$ is the ON resistance of the driver FET and $R_{int}$ is the resistance of the interconnect line.

The interconnect delay can be estimated by:

$$\tau_w = \frac{r \times c \times l^2}{2}$$  \hspace{1cm} (3.8)

where $r$ is the resistance per unit length, $c$ is the capacitance per unit length and $l$ is the length of the wire.

Substituting equation 3.8 into equation 3.6, gives:

$$l < \sqrt{\frac{2 \times \tau_g}{r \times c}}$$  \hspace{1cm} (3.9)

Substituting the typical values for $r$ (~ 0.023 $\Omega/\mu m$) and $c$ (~ 0.05 $fF/\mu m$) and an average gate delay of 100ps gives a maximum line length of about 13mm. As a conservative design guide, the maximum line length with capacitive behaviour should be set to 4mm. The same order of magnitude for the line length can be obtained using equation 3.7. Typical values for $R_{on}$ are in the range of 40 to 400 $\Omega$, depending on the bias voltages and the frequency of operation. For MESFETs with 10GHz operating frequency and dimensions of $W=10 \mu m$, $L=1 \mu m$ and with the typical bias conditions required in DCFL and SDCFL gates, the value of $R_{on}$ is about 250 $\Omega$. Using equation 3.7 and the previous value of $r$, the maximum length of a capacitive line would be of the order of 4.3mm.
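
The two line-length estimates can be reproduced numerically. The sketch below is an illustration only: the function names are hypothetical, and the second estimate assumes that $R_{int}$ is simply $r \times l$, so that the resistance inequality can be solved for $l$.

```python
import math

# Typical values quoted in the text.
R_PER_UM = 0.023      # interconnect resistance, ohms per micron
C_PER_UM = 0.05e-15   # interconnect capacitance, farads per micron
GATE_DELAY = 100e-12  # average gate delay, seconds
R_ON = 250.0          # ON resistance of the driver FET, ohms


def max_length_from_delay(tau_g, r, c):
    """Longest line (microns) whose RC delay r*c*l^2/2 stays below tau_g."""
    return math.sqrt(2.0 * tau_g / (r * c))


def max_length_from_resistance(r_on, r):
    """Longest line (microns) satisfying R_on >= 2.3 * R_int, assuming R_int = r * l."""
    return r_on / (2.3 * r)


l_delay_mm = max_length_from_delay(GATE_DELAY, R_PER_UM, C_PER_UM) / 1000.0
l_res_mm = max_length_from_resistance(R_ON, R_PER_UM) / 1000.0

print(f"delay criterion:      {l_delay_mm:.1f} mm")
print(f"resistance criterion: {l_res_mm:.1f} mm")
```

With these inputs the delay criterion gives about 13mm and the resistance criterion between 4 and 5mm, the same order as the figures quoted above.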

For VLSI applications the length of interconnections can often be longer than 4mm, therefore the effect of the RC time delays of the lines must be considered in delay calculations. As demonstrated by H.B. Bakoglu [84], this effect can seriously degrade the performance of large circuits and should be avoided in practice. The solution is to break up these long lines into segments and add buffers at every stage so as to transform the lines into capacitive loads. These buffers are commonly termed repeaters and can be sized for optimum speed performance. The size of these buffers must be carefully adjusted to drive other gates as well as the interconnect lines. The following section attempts to define a buffering scheme suitable for GaAs VLSI implementation.

a) Some useful concepts [85] [86]

The conventional unit of drive capability is that produced by an inverter. One method of increasing the drive capability is to WIRE-OR the unit inverters in parallel. For example the drive strength of the buffer in Figure 3.14 is 3.

![Figure 3.14 Three inverters WIRE-ORed to form a buffer with drive strength of 3.](image)

More inverters can be added to the chain to achieve the required signal rise and fall times. This however, loads the previous stage which decreases its operating speed. Therefore the drive strength of all the previous stages must also be increased. The number of inverters in each stage must be determined to achieve optimum speed. This can be done by defining a relative fanout for the overall buffer, given by:

\[
\text{relative fanout} = \frac{\text{absolute fanout}}{\text{drive strength}}
\]

where the absolute fanout is defined as the sum of loads imposed by the driven gates and drive strength is the number of gates which are WIRE-ORed.
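
To make the definition concrete, the sketch below computes the drive strength of each stage in a buffer chain for a chosen relative fanout. It is an illustration under stated assumptions, not the sizing procedure used in this work: the load of 64 unit inverters is hypothetical, each WIRE-ORed inverter is assumed to present one unit load to the preceding stage, and the function names are invented for the example.

```python
def relative_fanout(absolute_fanout, drive_strength):
    """Relative fanout as defined in the text."""
    return absolute_fanout / drive_strength


def stage_drive_strengths(load, rel_fanout):
    """Drive strength of each stage, from the first stage to the output stage.

    Assumes `load` (in unit-inverter loads) is an exact power of `rel_fanout`,
    so that every stage sees the same relative fanout.
    """
    strengths = []
    strength = load // rel_fanout  # output stage: load / relative fanout
    while strength >= 1:
        strengths.append(strength)
        if strength == 1:
            break
        strength //= rel_fanout  # the previous stage drives this stage
    return strengths[::-1]


# Hypothetical load of 64 unit inverters, for relative fanouts of 2, 4 and 8.
for rf in (2, 4, 8):
    print(rf, stage_drive_strengths(64, rf))
```

A lower relative fanout needs more stages and more inverters in total, which is the trade-off between delay, area and power examined later in this section.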

b) An optimum relative fanout for GaAs buffers

The basic gate configurations used to arrive at an optimum value for the relative fanout of GaAs buffers are the DCFL, super-buffer (SU) and SDCFL gates (see Figure 3.5).

Three ring oscillators, based on the above gates were simulated in SPICE. The oscillation periods were made equal by adjusting the dimensions of the FETs. The delay of each gate was set at about 100ps. The gates were then evaluated in terms of noise margin, power dissipation and area. The results for the noise margin are given in Figure 3.15.

![Noise Margin Graph](image)

**Figure 3.15** Noise margin of the SDCFL, SU and DCFL buffers with fanout loading.

It is evident that the DCFL, SU and SDCFL gates should be used in low, medium and high fanout situations respectively, to ensure adequate noise margins.

The results for power dissipation and area of the gates are given in Table 3.3. The power dissipation of the SU gate is about one third of that of the SDCFL gate; hence it can be used as a logic element within a VLSI circuit to provide buffering for high fanout and long interconnect lines.

The SU gate is also less sensitive to the capacitance of the high-impedance node (output of the logic stage in Figures 3.5b and 3.5c). As shown in Figure 3.16 the delay of the SU gate is about 150ps whereas the delay of the SDCFL gate is about 200ps for a high-impedance node capacitance of 40fF. In other words, in terms of delay, it is more advantageous to implement the logic functions with medium to high fanout load in SU gates.

<table>
<thead>
<tr>
<th>Type of gate</th>
<th>Power dissipation (mW)</th>
<th>Area (µm²)</th>
</tr>
</thead>
<tbody>
<tr>
<td>DCFL</td>
<td>0.06</td>
<td>480</td>
</tr>
<tr>
<td>SU</td>
<td>0.5</td>
<td>1404</td>
</tr>
<tr>
<td>SDCFL</td>
<td>1.4</td>
<td>1560</td>
</tr>
</tbody>
</table>

Table 3.3 Comparison of power dissipation and area of the DCFL, SU and SDCFL gates.

To find an optimum value for the relative fanout of the above gates, the buffers in Figure 3.17 were simulated in SPICE and evaluated in terms of delay, area and power dissipation.

Figure 3.18 shows the delay of the buffers as a function of relative fanout. In terms of delay, the optimum relative fanout of the DCFL buffer is 4, for which the delay is about 850ps. Beyond this point the delay is increased due to the high sensitivity of DCFL gates to fanout loading. For the SU and SDCFL buffers, an increase in relative fanout from 4 to 8 reduces the delay from 725 to 700ps and 580 to 535ps respectively. However, this improvement is insignificant compared with the sharp reduction in delay from the relative fanout of 2 to 4 (320ps for SU gate and 350ps for the SDCFL gate).

![Figure 3.16 Delay sensitivity of the SU and SDCFL buffers to the capacitance of the high impedance node.](image)

A very important issue in the design of the buffers (especially for the clock drivers) is to ensure equal signal rise and fall times at the output of the buffers. The differences between the rise and fall times (skew) for all three types of buffer are given in Figure 3.19. Minimum skew is achieved with a relative fanout of 4. The skews of the DCFL, SU and SDCFL buffers are 110, 90 and 12ps respectively.

Figure 3.17 Three buffering schemes with relative fanouts of 8, 4 and 2.

Figure 3.18 Delay versus relative fanout for different buffering schemes.

Figure 3.19 Skew versus relative fanout for different buffering schemes.
The area of the buffers decreases with increasing relative fanout. As shown in Figure 3.20, there is a sharp decrease in area for a change of relative fanout from 2 to 4. However, the reduction in area is very small for relative fanouts greater than 4. At a relative fanout of 4, the areas of the DCFL, SU and SDCFL buffers are $14 \times 10^3$, $39 \times 10^3$ and $45 \times 10^3 \mu \text{m}^2$ respectively.

The buffers were also evaluated in terms of power dissipation and the results are shown in Figure 3.21. The power dissipation of the DCFL buffer is almost constant. The power dissipation of the SDCFL buffer is most affected by the change in relative fanout and is reduced from 31 to 13mW for an increase in relative fanout from 2 to 4.

Based on the above, the optimum relative fanout of all three buffers is 4. A relative fanout of 8 shows a slight improvement in the delay, area and power dissipation of the SU and SDCFL buffers, whereas only the area of the DCFL buffer is improved. Once the important issue of equal rise and fall times is considered (Figure 3.19), a relative fanout of 4 is the best compromise. Finally, when the buffers are used as clock drivers, the lines to the driven gates are usually long and their lengths may vary significantly. If the buffers are sensitive to this variation, the well known problem of clock skew may occur. Figure 3.22 shows the sensitivity of the buffers to this loading. For a large increase in load capacitance from 0.5 to 2pF, the delays of the DCFL, SU and SDCFL buffers are increased by 150, 32 and 48ps respectively.

Based on the results obtained in this chapter, the design of the large circuits presented hereafter is based on DCFL gates. Where a clear advantage in using the SDCFL gate is expected, the circuits are also implemented in SDCFL and their performance is compared to that of the DCFL counterpart. Super-buffers are also used as an extension to DCFL elements to improve the speed and noise margin of the overall circuit. The clock drivers are implemented in SDCFL, with a relative fanout of four to drive a particular fanout and interconnect load.
Figure 3.20 Area versus relative fanout of different buffering schemes.

Figure 3.21 Power versus relative fanout of different buffering schemes.
Figure 3.22 Delay versus interconnect capacitance (relative fanout=4).
4.1 Adder Design Approach [87]

Addition is an essential element in computer arithmetic and is considered the workhorse in most digital signal processing systems. At a VLSI level of complexity, adder cells are required to be physically small, operate at high speed and dissipate minimum power.

The purpose of this chapter is to evaluate various adder configurations for GaAs VLSI implementation. The circuits are based on DCFL gates and are fully optimised in terms of speed for a given area and power allocation.

A one-bit full adder adds two binary digits \(a_i\) and \(b_i\), and a carry input \(c_i\), to produce a sum output \(s_i\) and a carry output \(c_{i+1}\). The outputs are related to the inputs by the following boolean equations:

\[
s_i = a_i \oplus b_i \oplus c_i \quad (4.1)
\]

\[
c_{i+1} = a_i b_i + b_i c_i + c_i a_i \quad (4.2)
\]

To implement the one bit adder in GaAs DCFL, the above logical expressions must be represented in the equivalent NOR functions:

\[
s_{i+1} = \overline{(a_i b_i + b_i c_i + c_i a_i)} \quad (4.3)
\]

\[
c_{i+1} = \overline{\overline{(a_i + b_i)} + \overline{(b_i + c_i)} + \overline{(a_i + c_i)}} \quad (4.4)
\]

These equations can be mapped directly into DCFL using NOR gates. As discussed in the previous chapter, the high sensitivity of the DCFL gates to fanin and fanout loading can severely degrade the performance. To show this effect, two design techniques have been employed. The first approach is to design for a minimum number of gates with high fanin and fanout counts in order to optimise the area. The only limit imposed on the design is a maximum fanout of 6, so as to achieve a positive noise margin under the worst case conditions. This design is called the unbuffered adder. The fanin and fanout limits are then reduced to achieve optimum speed performance. This is termed the buffered adder. Figures 4.1a and 4.1b show the circuit diagrams of the unbuffered and buffered one-bit adders respectively. The former is the direct implementation of equations 4.3 and 4.4 while the latter modifies the equations to accommodate a maximum fanin and fanout of 3.
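
As a quick sanity check on the adder logic, the sketch below verifies equations 4.1 and 4.2 exhaustively and compares the carry with a two-level NOR realisation (the NOR of the three pairwise NORs). The helper names are illustrative, and the two-level NOR structure is an assumption about how the carry maps onto NOR gates rather than a description of the circuits in Figure 4.1.

```python
from itertools import product


def nor(*inputs):
    """NOR of any number of 0/1 inputs."""
    return 0 if any(inputs) else 1


def full_adder(a, b, c):
    """Sum and carry outputs from equations 4.1 and 4.2."""
    s = a ^ b ^ c
    carry = (a & b) | (b & c) | (c & a)
    return s, carry


def carry_nor(a, b, c):
    """Assumed two-level NOR form of the carry: NOR of the pairwise NORs."""
    return nor(nor(a, b), nor(b, c), nor(a, c))


# Exhaustive check over all eight input combinations.
for a, b, c in product((0, 1), repeat=3):
    s, carry = full_adder(a, b, c)
    assert a + b + c == s + 2 * carry   # arithmetic meaning of sum and carry
    assert carry == carry_nor(a, b, c)  # NOR-only form gives the same carry
```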

The delay through the carry chain, \( \tau_{c_{i-1}} \) is given by:

\[
\tau_{c_{i-1}(\text{unbuffered})} = \tau_{G1(1,5)} + \tau_{G2(2,1)} + \tau_{G3(3,1)} \quad (4.5)
\]

\[
\tau_{c_{i-1}(\text{buffered})} = \tau_{G1(1,2)} + \tau_{G2(2,1)} + \tau_{G3(3,1)} \quad (4.6)
\]

where \( \tau_{G_n(F_i,F_o)} \) is the delay through the nth gate with fanin of \( F_i \) and fanout of \( F_o \).

A general formula was derived (see Appendix B) for the delay of DCFL gates [88]:

\[
\tau_{G(F_i,F_o)} = 40 \times [1 + 0.28 \times F_i + 1.2 \times F_o] + 1840 \times C_l \quad (4.7)
\]

where \( C_l \) is the loading capacitance of the gate in femto farads.

Substituting equation 4.7 into equations 4.5 and 4.6 gives a carry chain delay of 536 and 435 ps for the unbuffered and buffered adders respectively. Clearly, if the one-bit adder is to be cascaded to form a long ripple carry chain, the buffered adder should be used for optimum speed. Both designs should also be evaluated in terms of power dissipation, area and sensitivity to interconnect to achieve the best compromise. For example, in the case of the ripple-carry adder, a fanout limit should be imposed on the carry block to improve the speed. The unbuffered sum block in Figure 4.1a may be used to reduce the overall area and power dissipation.
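
The fanin and fanout contribution to these delays can be evaluated directly from equations 4.5 to 4.7. The sketch below sets the loading capacitance to zero, which is an assumption made only because the individual capacitance values are not listed here, so its results are somewhat lower than the quoted totals. The function and constant names are illustrative.

```python
def gate_delay_ps(fanin, fanout, c_load=0.0):
    """DCFL gate delay in ps from equation 4.7.

    c_load is in the capacitance units of equation 4.7; it defaults to zero
    here as an assumption, so only the fanin and fanout terms are evaluated.
    """
    return 40.0 * (1.0 + 0.28 * fanin + 1.2 * fanout) + 1840.0 * c_load


# (fanin, fanout) of gates G1 to G3 on the carry path, from equations 4.5 and 4.6.
UNBUFFERED = [(1, 5), (2, 1), (3, 1)]
BUFFERED = [(1, 2), (2, 1), (3, 1)]


def carry_chain_delay_ps(gates):
    """Sum of the gate delays along the carry path."""
    return sum(gate_delay_ps(fi, fo) for fi, fo in gates)


unbuffered_ps = carry_chain_delay_ps(UNBUFFERED)
buffered_ps = carry_chain_delay_ps(BUFFERED)
print(f"unbuffered: {unbuffered_ps:.1f} ps, buffered: {buffered_ps:.1f} ps")
```

Even without the capacitive term, reducing the fanout of the first gate from 5 to 2 accounts for a saving of well over 100ps in the carry path.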

![Figure 4.1 Logic diagram of the one-bit RC adder a) unbuffered b) buffered.](image)

This design technique is used in the implementation of the adders discussed in this chapter, and forms a basis for selecting a particular type of adder suitable for GaAs VLSI.

4.2 Types of Adder [89] [90]

Adder circuit configurations are presented in this section. They range from the simple and slow versions like the Ripple-Carry adders to the high speed and more complex implementations such as the Carry-Look-ahead adders. Furthermore, the buffered and unbuffered versions of each adder type are given to show the trade-offs in speed, power and area.

a) Ripple-Carry adder

The block diagram of a Ripple-Carry (RC) adder is shown in Figure 4.2. The logic diagrams for the Sum and Carry generator blocks of the unbuffered RC adder are given in Figure 4.1a. The buffered version is realised by a fanout reduction on the Carry generator block as shown in Figure 4.1b.

![Figure 4.2 Block diagram of the RC adder.](image)

b) Carry-Look-ahead adder [91] [92]

The speed of the RC adder can be improved by calculating the carries to each stage in parallel. In other words, the carries are generated simultaneously resulting in a constant addition time irrespective of the number of bits.
The circuitry required to generate the parallel carries is derived using the following equations:

\[ S_i = P_i \oplus C_{i-1} \] (4.8)

\[ C_i = G_i + P_i \cdot C_{i-1} \] (4.9)

where:

\[ G_i = a_i \cdot b_i \] (4.10)

\[ P_i = a_i \oplus b_i \] (4.11)

\( G_i \) and \( P_i \) are called the carry generate and propagate functions and they are derived directly from the inputs \( a_i \) and \( b_i \). The recursive equation 4.9 can be applied repeatedly to obtain the required set of carry signals.

The equations for an \( n \)-bit Carry-Look-ahead (CL) adder are as follows:

\[ C_1 = G_1 + C_0 P_1 \]
\[ C_2 = G_2 + G_1 P_2 + C_0 P_2 P_1 \]
\[ \vdots \]
\[ C_k = G_k + G_{k-1} P_k + G_{k-2} P_{k-1} P_k + \ldots + G_1 P_2 \ldots P_k + C_0 P_1 P_2 \ldots P_k \]
\[ \vdots \]
\[ C_n = G_n + G_{n-1} P_n + \ldots + C_0 P_1 P_2 \ldots P_n \] (4.12)

These equations should be transformed into their equivalent NOR form for GaAs DCFL implementation. The logic diagram of a 4-bit CL generator is given in Figure 4.3a. As the size of the CL generator is expanded, the fanin and fanout limitations of the DCFL gates are quickly reached. Therefore the number of carry-look-ahead bits should be limited to 2, 4 or 8 depending on the speed requirement. For GaAs DCFL implementation, this limit is set to 4 (section 4.3). The 4-bit CL blocks are then abutted as illustrated in Figure 4.3b, to form an \( n \)-bit adder.
Figure 4.3 a) Logic diagram of a 4-bit CL generator. b) An n-bit adder constructed using the 4-bit CL generators.
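
The equivalence between the recursive carry of equation 4.9 and the expanded look-ahead form can be checked exhaustively for a 4-bit block. This is a minimal sketch with illustrative function names; list index 0 holds stage 1 of the text, and the gate-level NOR mapping of Figure 4.3a is not modelled.

```python
from itertools import product


def ripple_carries(g, p, c0):
    """Carries from the recursion C_i = G_i + P_i * C_{i-1} (equation 4.9)."""
    carries, c = [], c0
    for gi, pi in zip(g, p):
        c = gi | (pi & c)
        carries.append(c)
    return carries


def lookahead_carries(g, p, c0):
    """Carries from the expanded form, each computed independently of the others."""
    carries = []
    for k in range(len(g)):
        # Term C_0 * P_1 * ... * P_k
        term = c0
        for pj in p[: k + 1]:
            term &= pj
        ck = term
        # Terms G_j * P_{j+1} * ... * P_k for every stage j up to k
        for j in range(k + 1):
            term = g[j]
            for pj in p[j + 1 : k + 1]:
                term &= pj
            ck |= term
        carries.append(ck)
    return carries


def generate_propagate(a, b, bits=4):
    """Generate and propagate signals from equations 4.10 and 4.11."""
    g = [(a >> i) & (b >> i) & 1 for i in range(bits)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(bits)]
    return g, p


# Exhaustive check for all 4-bit operands and both carry inputs.
for a, b, c0 in product(range(16), range(16), (0, 1)):
    g, p = generate_propagate(a, b)
    assert lookahead_carries(g, p, c0) == ripple_carries(g, p, c0)
    assert lookahead_carries(g, p, c0)[-1] == (a + b + c0) >> 4
```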

c) Carry Select adder [93] [94]

Another approach to speeding up the addition cycle is to use the Carry Select (CS) scheme. The basic structure of a CS adder is shown in Figure 4.4. Two n-bit ripple-carry adders are built, one with a carry input of zero and the other with a carry input of one. The carry from the previous stage is used to select the output of the appropriate adder using a multiplexer. The carry output to the next stage is determined from the previous carry and the carry outputs of the two n-bit adders. The value of n was set to 4, in order to be able to easily expand the adder from 4 to 32 bits. The buffered CS adder is also implemented by applying a fanout reduction on the 4-bit adders.

Figure 4.4 Schematic diagram of a Carry Select adder.
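
The selection principle can be illustrated with a short behavioural model. This sketch only mirrors the description above (two ripple-carry results per 4-bit block and a multiplexer driven by the incoming carry); the function names and default widths are illustrative and no gate-level or timing behaviour is modelled.

```python
def ripple_add(a, b, carry_in, width):
    """Add two `width`-bit integers bit by bit; returns (sum, carry_out)."""
    total, carry = 0, carry_in
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (bi & carry) | (carry & ai)
    return total, carry


def carry_select_add(a, b, carry_in=0, bits=32, block=4):
    """Carry Select addition built from `block`-bit ripple-carry adders."""
    mask = (1 << block) - 1
    result, carry = 0, carry_in
    for lo in range(0, bits, block):
        a_blk, b_blk = (a >> lo) & mask, (b >> lo) & mask
        # Both candidate results are computed without waiting for the carry.
        sum0, cout0 = ripple_add(a_blk, b_blk, 0, block)
        sum1, cout1 = ripple_add(a_blk, b_blk, 1, block)
        # Multiplexer: the incoming carry selects the sum and the outgoing carry.
        blk_sum, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= blk_sum << lo
    return result, carry


print(carry_select_add(0xFFFFFFFF, 1))
```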

d) Binary Look-ahead Carry adder [95] [96] [97]

The Binary Look-ahead Carry (BLC) adder, like the CL adder, is based on the parallel computation of the carries. It uses an associative operator 'O' which computes the carry signals in a binary tree structure. The function of the 'O' operator is as follows:

\[(g,p) \circ (g',p') = (g + (p \cdot g'), p \cdot p')\]  \hspace{1cm} (4.13)

where \(g\), \(p\), \(g'\) and \(p'\) are boolean variables.

The carry signals can be computed as follows:

\[C_i = G_i\]

where

\[
(G_i, P_i) = \begin{cases} 
(g_0, p_0) & \text{if } i = 0 \\
(g_i, p_i) \circ (G_{i-1}, P_{i-1}) & \text{if } 1 \leq i \leq n
\end{cases}
\]  \hspace{1cm} (4.14)

and

\[
(g_i, p_i) \circ (G_{i-1}, P_{i-1}) = (g_i, p_i) \circ (g_{i-1}, p_{i-1}) \circ \ldots \circ (g_0, p_0)
\]  \hspace{1cm} (4.15)

where \( n \) is the number of bits.

Therefore the \( G_i \)'s and \( P_i \)'s of each consecutive stage are computed using the same function. In other words, identical circuit elements arranged in a binary tree structure can be used to implement the carry bits.

For example, consider the equations for an 8-bit carry generator:

\[
\begin{align*}
C_0 &= g_0 \\
C_1 &= g_1 + p_1 \cdot g_0 \\
C_2 &= g_2 + p_2 \cdot C_1 \\
C_3 &= g_3 + p_3 \cdot g_2 + C_1 \cdot p_2 \cdot p_3 \\
C_4 &= g_4 + p_4 \cdot C_3 \\
C_5 &= g_5 + p_5 \cdot g_4 + C_3 \cdot p_4 \cdot p_5 \\
C_6 &= g_6 + p_6 \cdot C_5
\end{align*}
\]  

(4.16)
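
The operator of equation 4.13 and the resulting carries can be checked with a small behavioural model. This is a sketch under stated assumptions: the propagate signal is taken as the XOR of the inputs and the generate signal as their AND, there is no external carry input, and the function names are illustrative. It confirms associativity, which is what allows the terms to be grouped in a binary tree, but it does not model the tree layout of the cells itself.

```python
from functools import reduce
from itertools import product


def op(left, right):
    """The operator of equation 4.13: (g, p) o (g', p') = (g + p.g', p.p')."""
    g, p = left
    g2, p2 = right
    return (g | (p & g2), p & p2)


def blc_carries(a, b, bits):
    """Carries C_i = G_i, with (G_i, P_i) = (g_i, p_i) o ... o (g_0, p_0)."""
    # Assumed precondition signals: g_i = a_i AND b_i, p_i = a_i XOR b_i.
    gp = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(bits)]
    carries = []
    for i in range(bits):
        # Fold the operator over (g_i, p_i), (g_{i-1}, p_{i-1}), ..., (g_0, p_0).
        big_g, _ = reduce(op, reversed(gp[: i + 1]))
        carries.append(big_g)
    return carries


# Associativity over every combination of three (g, p) pairs.
pairs = list(product((0, 1), repeat=2))
for x, y, z in product(pairs, repeat=3):
    assert op(op(x, y), z) == op(x, op(y, z))

# Each C_i equals the carry out of bit i when the low i+1 bits are added.
for a, b in product(range(64), repeat=2):
    for i, c in enumerate(blc_carries(a, b, 6)):
        mask = (1 << (i + 1)) - 1
        assert c == ((a & mask) + (b & mask)) >> (i + 1)
```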

The eight bit BLC adder can now be constructed. The complete structure is illustrated in Figure 4.5. The similarity in the equations results in a simple carry generator block consisting of only three cells. They are the 'black', 'half-black' and the 'white' processors. The black processors perform the 'O' operation defined in equation 4.13 and the white cells transmit the data. The function performed by each of the processors is also shown in Figure 4.5. The variables \( g_i \) and \( p_i \) are the \( g \)'s and \( p \)'s from the previous stages. The 'precondition' cells provide the inputs to the carry generator block and the sum cells perform the XOR function on the carries (\( C_i \)) and the propagate signals (\( p_i \)) from the precondition cells to generate the sum output.
The logic diagrams of the cells within the 8-bit adder are shown in Figure 4.6. They are the NOR equivalents of the equations given in Figure 4.5 and can be directly implemented in GaAs DCFL. As for the other adders, buffers must also be included in the carry block to exploit the speed performance of GaAs. With this objective in mind, the buffers are placed in the critical path of the carry block. For the 8-bit adder, minimum geometry inverters are added at positions (T₂, C₁) and (T₄, C₃) to reduce the fanout loading (Figure 4.5). The positions of the buffers for 8, 16 and 32-bit buffered BLC adders were calculated to minimise the delay through the critical path, bearing in mind the unique timing characteristics of DCFL gates (Table 4.1).

![Logic diagrams of the cells in a BLC adder.](image)

**Table 4.1** Location of the buffers for 8, 16 and 32-bit BLC adder.

4.3 Evaluation of Adder Circuits for GaAs VLSI

The adders were implemented using a full-custom approach, in order to optimise the area of the circuits. The layouts of all the adders were handcrafted using the Phase1 layout tool (Plan, Appendix C). From the layouts, a set of SPICE input files was generated using the Phase1 net list extractor (GaAsnet, Appendix C). They include the transistor models, the nodal capacitances and transistor connectivity. From the SPICE simulation results, the delay and power dissipation of the adders were accurately determined. The area of the adders can be extracted directly from the layout. Also the customised buffering schemes proposed in the previous section were evaluated for each type of adder. Comparison of the adders in terms of delay, power dissipation and area forms the basis for selecting a particular adder type for GaAs VLSI.

In section 4.1b, it was mentioned that the number of carry-look-ahead bits in the CL adder is limited to 4. Due to the high fanin and fanout sensitivity of the DCFL gates (demonstrated in chapter 3), the expected speed improvements will not be achieved if the number of carry-look-ahead bits is expanded beyond 4. This can be shown by implementing a 32-bit CL adder with carry-look-ahead blocks of 2, 4 and 8 bits. The SPICE simulation results are shown in Figure 4.7. The delay of the adder with 2-bit carry-look-ahead blocks is 13.5ns. Increasing the carry-look-ahead bits from 2 to 8 reduces the delay by 5.3ns, i.e. an improvement in speed of only 39%. However, the area is increased from 0.9mm$^2$ to 3.9mm$^2$. This rather unexpected increase in area is the result of having to add extra gates to fulfil the fanin and fanout requirements of the DCFL gates. The best compromise is to use 4-bit carry-look-ahead blocks, with a delay and area of 10.28ns and 1.9mm$^2$ respectively. In this section, the adders referred to as CL adders consist of 4-bit carry-look-ahead blocks.
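
The trade-off quoted above can be checked with a few lines of arithmetic. The sketch below uses only the 2-bit and 8-bit figures given in the text; the variable names are illustrative.

```python
# Figures quoted in the text for the 32-bit CL adder.
delay_2bit_ns = 13.5        # delay with 2-bit carry-look-ahead blocks
delay_reduction_ns = 5.3    # reduction when moving to 8-bit blocks
area_2bit_mm2 = 0.9
area_8bit_mm2 = 3.9

delay_8bit_ns = delay_2bit_ns - delay_reduction_ns
delay_saving_pct = 100 * delay_reduction_ns / delay_2bit_ns
area_ratio = area_8bit_mm2 / area_2bit_mm2

print(f"8-bit block delay: {delay_8bit_ns:.1f} ns")
print(f"delay saving:      {delay_saving_pct:.0f} %")
print(f"area ratio:        {area_ratio:.1f}x")
```

The delay falls by roughly 39% while the area grows by a factor of more than four, which is why the larger blocks are not considered worthwhile.
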
The following is the evaluation of the buffered and unbuffered versions of RC, CL, CS (using 4-bit ripple-carry blocks) and BLC adders introduced in the previous section.

Figure 4.8 shows the delay of the unbuffered adders as a function of the number of bits (dashed lines). For the 2 and 4-bit adders, there is no clear advantage in using the carry speed-up techniques and the simple RC adder can be used since the 4-bit CL, CS and BLC adders give the same performance in terms of delay (about 2.6ns). As the number of bits is increased the adder delays begin to diverge. The delays for 32-bit RC, CL, CS and BLC adders are 17.16, 10.28, 6.91 and 5.92ns respectively. Therefore in terms of delay, there is a clear advantage in using the BLC or CS adders for a high number of bits (i.e. 24-32 bits).

The solid lines in Figure 4.8 show the delay of the buffered adders. The benefit of including the buffers as proposed in the previous section is evident from the graph. In the case of 32-bit BLC and CS adders, the delays are reduced from 5.92 down to 4.61ns and from 6.91 down to 5.40ns respectively (a 22% improvement).
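
The quoted improvement follows directly from the 32-bit delays above. A minimal check, with all four delays taken in the same unit and with illustrative dictionary names:

```python
# 32-bit delays quoted in the text (same unit for all four values).
unbuffered = {"BLC": 5.92, "CS": 6.91}
buffered = {"BLC": 4.61, "CS": 5.40}

improvement_pct = {
    name: 100 * (unbuffered[name] - buffered[name]) / unbuffered[name]
    for name in unbuffered
}

for name, pct in improvement_pct.items():
    print(f"{name}: {pct:.1f} % reduction in delay")
```

Both adders come out at about 22%, in agreement with the figure quoted.
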

Figure 4.8 Buffered (solid lines) and unbuffered (dashed lines) adder delays for different numbers of bits.

The delays of 8 to 32-bit RC and CL buffered adders are only 5% less than their unbuffered counterparts. This is due to the relatively low fanout loading in the critical paths of the RC and CL adders in comparison with the CS and BLC versions. Also, the interconnects in the carry chain of the RC and CL adders were short in comparison. As a result the capacitance loading due to the lines was not significant.

The area of the adder circuits is another important issue for VLSI application. Figure 4.9 shows a comparison of the areas of the buffered adders. The unbuffered adders are not included in the graph since their area is almost equal to that of the buffered versions. In fact, the extra gates required to implement the buffered adders result in less than a 5% increase in area.

The RC, CL, BLC and CS adders occupy almost the same area as the number of bits is varied from 2 to 4 (about 0.2mm² for 4-bit adders). They begin to differ significantly as the number of bits exceeds 16. In fact, a 16-bit CS adder with an area of 1.3mm² is almost twice the size of its RC counterpart. At 32 bits the areas of the RC, CL, BLC and CS adders are 1.50, 1.98, 2.73 and 2.84mm² respectively. Therefore in terms of size, the RC and CL adders are the most suitable for GaAs VLSI, especially where the number of bits is more than 16. However, for VLSI, a generally accepted measure of performance is the delay-area product. The circuit with the lowest delay-area product is taken to be the optimal design.
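
As a rough illustration of the delay-area measure, the 32-bit figures quoted so far can be combined as follows. This is a sketch under one stated assumption: the unbuffered delays are paired with the areas of the buffered adders, which the text says differ from the unbuffered areas by less than 5%. The dictionary names are illustrative.

```python
# 32-bit figures quoted in the text.
delay_ns = {"RC": 17.16, "CL": 10.28, "CS": 6.91, "BLC": 5.92}  # unbuffered
area_mm2 = {"RC": 1.50, "CL": 1.98, "BLC": 2.73, "CS": 2.84}    # buffered

# Delay-area product in ns.mm^2 (lower is better).
delay_area = {name: delay_ns[name] * area_mm2[name] for name in delay_ns}
ranking = sorted(delay_area, key=delay_area.get)

for name in ranking:
    print(f"{name:>3}: {delay_area[name]:6.2f} ns.mm^2")
```

On these numbers the ordering is BLC, CS, CL, RC: the smaller area of the RC and CL adders does not compensate for their longer delays at 32 bits.
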

For up to 8 bits, the performance of the adders is closely matched and any one of the above adders can be selected. It could be argued that since the RC adder is the easiest to implement, given its simple structure, it can be used for a low number of bits. For a high number of bits, the time-area optimal circuit is the BLC adder, closely followed by the CS adder. To further justify this claim, the area of a CL adder with 8-bit carry-look-ahead blocks (CL8) is also included in the graph of Figure 4.9. The delay of this adder is comparable with the delay of the BLC adder. At 32 bits the delays are equal, but the area of the CL8 adder is 1.5 times that of the BLC adder.

Figure 4.9 Adder area for different numbers of bits.

Although delay and area are normally used to evaluate a particular circuit configuration for VLSI, in high speed applications power dissipation of circuits is another criterion which must be considered before selecting a particular design style. In fact one of the limiting factors in increasing the level of integration for high speed circuits is power dissipation. The average power dissipation of the buffered adders against the number of bits is shown in Figure 4.10. Again, the results for the unbuffered adders are not shown as the excess power due to the buffers is less than 2% of the total power dissipation.

Up to 8 bits, the power dissipations of the adders are comparable. For a higher number of bits, the CS and CL adders dissipate the most power, about 56mW for 32-bit addition. This is due to the fact that a relatively large number of gates is required to implement the CL and CS adders, especially in the case of the CS adder, where blocks of 4-bit RC adders are duplicated to generate the carry into the next stage. The power dissipation of the BLC adder is as low as the RC adder. At 32 bits, the average power dissipation for RC and BLC adders is about 40mW.
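
Combining these power figures with the 32-bit delays quoted earlier gives an approximate delay-power product, the second figure of merit used in the summary of this chapter. The sketch below is indicative only: the power values are the rounded ("about") figures for the buffered adders, the delays are the unbuffered ones, and the dictionary names are illustrative.

```python
# 32-bit figures quoted in the text.
delay_ns = {"RC": 17.16, "CL": 10.28, "CS": 6.91, "BLC": 5.92}  # unbuffered
power_mw = {"RC": 40.0, "CL": 56.0, "CS": 56.0, "BLC": 40.0}    # approximate

# ns x mW = pJ, so the product is an energy per addition (lower is better).
delay_power_pj = {name: delay_ns[name] * power_mw[name] for name in delay_ns}
ranking = sorted(delay_power_pj, key=delay_power_pj.get)

for name in ranking:
    print(f"{name:>3}: {delay_power_pj[name]:6.1f} pJ")
```

The BLC adder combines the shortest delay with the lowest power, so it has the lowest product by a clear margin, followed by the CS adder.
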

The average power dissipation has static and dynamic components. The static power dissipation is proportional to the total number of transistors in a circuit. The dynamic power dissipation however, is directly related to the number of gates switching at a given time. The BLC adder exhibits a comparatively low average power dissipation because it has a particularly low dynamic dissipation. This is due to the fact that only one row of the carry block is activated at a given time. Since each processor consists of only a few basic gates, the total number of switching devices is low. Furthermore, the interconnect lines are short and the fanout loading is kept low.

The final issue to consider is the effect of interconnect on the delay of the overall circuits. There has been a major effort to improve the existing interconnect technology. This has led to the development of low impedance lines such as second and higher level metallisation and, more recently, the air bridge technology. This, however, adds to the cost and reduces the yield.

![Image of power dissipation graph with multiple lines representing different adders: RC adder, CL adder, BLC adder, and CS adder.](image)

**Figure 4.10** Power dissipation of various adders.

**Figure 4.11** Delay sensitivity of 32-bit adders on interconnect.

For a given delay, power and area, a design which is less sensitive to interconnect should be considered a better candidate for GaAs VLSI implementation. Figure 4.11 shows the delay of 32-bit adders with increasing line capacitance. It attempts to show the effect of different interconnect technologies on the delay of the adders. The low capacitance values (<0.02 fF/$\mu$m$^2$) correspond to the air bridge technology; second and higher level metals are given a line capacitance of 0.06 down to 0.02 fF/$\mu$m$^2$. The capacitance values higher than 0.06 fF/$\mu$m$^2$ are used to show the performance of the adders implemented using the first level metal only.

As shown in Figure 4.11, the advantage of using the air bridge and/or a high level metal (second or third) is quite evident. For instance, the delay of the 32-bit unbuffered CS adder is doubled from 6 to 12.5 ns as the line capacitance is raised from 0.02 to 0.1 fF/$\mu$m$^2$. The graph also shows the effect of buffers in reducing the adder sensitivity to interconnect. For example, with a line capacitance of 0.1 fF/$\mu$m$^2$, the unbuffered CS adder has a delay of 12.5 ns whereas the delay is about 8.9 ns for the buffered version.
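
The sensitivity described above can be quantified from the quoted end points. The sketch below uses only the CS adder figures given in the text; the average slope assumes, purely for illustration, that the delay varies linearly between the two capacitance values, which the text does not state.

```python
# 32-bit CS adder figures quoted in the text.
c_low, c_high = 0.02, 0.10          # line capacitance, fF per um^2
unbuffered_low_ns = 6.0             # delay at c_low
unbuffered_high_ns = 12.5           # delay at c_high
buffered_high_ns = 8.9              # buffered delay at c_high

growth = unbuffered_high_ns / unbuffered_low_ns
# Average slope only; linearity between the end points is an assumption.
avg_slope = (unbuffered_high_ns - unbuffered_low_ns) / (c_high - c_low)
buffer_saving_pct = (
    100 * (unbuffered_high_ns - buffered_high_ns) / unbuffered_high_ns
)

print(f"delay growth:      {growth:.2f}x")
print(f"average slope:     {avg_slope:.1f} ns per fF/um^2")
print(f"saving by buffers: {buffer_saving_pct:.1f} %")
```

At the highest line capacitance the buffers recover a little under 30% of the delay.
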

Another important point is the effect of interconnect on design styles. The BLC adder is less sensitive to interconnect than the other designs. The worst case delay for the buffered BLC adder is about 7.3 ns. This is followed by the buffered CS adder with a worst case delay of 8.9 ns.

4.4 Summary of Important Points

In this chapter various adder circuits have been evaluated for GaAs VLSI implementation. The following points can be derived from the analysis.

a) For a low number of bits (up to 8), the traditionally slow RC adder may well be adequate for high speed GaAs applications. However, as the number of bits is increased, the BLC adder followed by the CS adder show far superior performance to that of the RC and CL adders. This performance is measured by the delay-power and delay-area products, which are lowest for the BLC and CS adders (Figures 4.12 and 4.13).

Figure 4.12 The delay-power product of adders for different numbers of bits.

b) The proposed buffering scheme is an effective method of speeding up the logic elements (e.g., adders). The buffers improve the speed by as much as 30%, but occupy less than 5% of the total area and result in less than a 2% increase in power dissipation. This is achieved by reducing the fanout and breaking up the interconnect lines into smaller segments. Therefore the designs are more tolerant to interconnect loading and cross talk.

**Figure 4.13** Delay-area product of the adders for different numbers of bits.

c) The effect of the original algorithm and overall architecture on the performance of the final design should not be overlooked. For example, the binary tree structure of the BLC adder, resulting from the associative property of the algorithm, produces a regular layout of processing elements, connected over short interconnect lines. This is particularly useful for GaAs DCFL implementation as the fanout and interconnect loading are reduced.

Having introduced a practical approach to the design of optimal GaAs adders, a more complex circuit example is required to show the effectiveness or limitations of our design approach. A natural progression is to implement a multiplier which makes extensive use of the optimal BLC adder. In the next chapter, a modified Booth's multiplier is designed and implemented to be used as a vehicle for the evaluation of a new design and layout technique for GaAs.