PAPER Special Section on Analog Technologies in Submicron Era

# A Concept of Analog-Digital Merged Circuit Architecture for Future VLSI's

Atsushi IWATA† and Makoto NAGATA††, Members

SUMMARY This paper describes a new analog-digital merged circuit architecture that utilizes pulse modulation signals. By reconsidering the principles of information representation and processing, and the circuit operations governed by physical laws, the new circuit architecture is proposed to overcome the limitations of existing VLSI technologies. The proposed architecture utilizes the pulse width modulation (PWM) signal, which carries analog information in the time domain, and is constructed with novel PWM circuits that carry out multi-input arithmetic operations, signal conversions, and data storage. It has the potential to exploit the high-speed switching capability of deep sub-µm devices, and to reduce the number of devices and the power dissipation to one-tenth of those of binary digital circuits. Therefore it will effectively implement intelligent processing systems utilizing 0.5-0.2 µm scaled CMOS devices.

key words: pulse width modulation (PWM), switched current integrator, PWM adder, PWM signal converter

# 1. Introduction

Rapid progress in silicon LSI technologies has made it possible to implement multi-function, high-performance electronic circuits on a single chip. The largest scale of integration has been demonstrated by the experimental 1 Gb DRAM using 0.25 µm MOS devices. The clock frequency of microprocessors has reached 500 MHz on the 32 b RISC architecture [2]. The highest accuracy of analog circuits was obtained by the 20 bit A-to-D converter with a 120 dB dynamic range [3]. Wide-band analog circuit performance was demonstrated by the development of 19 GHz bandwidth SiGe bipolar amplifiers for a 20 Gb/s optical receiver [4]. Future systems for multimedia, human-friendly man-machine interfaces, and intelligent processing demand the integration of a huge amount of digital circuits and flexible analog circuits on the same chip.

Although many efforts have been made to reduce device sizes and power dissipation, conventional digital VLSI technologies are reaching various physical and practical limitations. In particular, power dissipation has been recognized as the most severe problem. A power-hungry digital LSI that requires an expensive package or a cooling system might lose its application field, even if it has superior processing capability. For battery-operated portable systems that provide high processing capability anywhere and anytime, power dissipation has to be reduced to the utmost. This is why low-power design techniques in architecture, logic, circuits, and devices have become key to enhancing the value of future LSI's. Increases in the development and production costs of deep sub-µm technology will also make it difficult to sustain rapid progress. We have to develop a new technology that overcomes these limitations and restores rapid progress. One promising solution would be a new approach utilizing parallel processing of analog signals. The purpose of this paper is to propose an innovative analog-digital merged circuit architecture using pulse modulation signals.

# 2. Progress in Si Devices and Analog Circuits

### 2.1 Scaled MOS Devices and Circuit Architecture

Up to the present, bipolar devices have been applied to high-speed IC's and high-accuracy analog IC's. Their cutoff frequency $(f_T)$, however, has almost reached its physical limit. CMOS devices have been applied to low-power digital LSI's, and their application fields have rapidly expanded to high-speed logic LSI's and analog-digital (A-D) mixed LSI's. Bipolar IIL and BiCMOS devices have also been applied to A-D mixed LSI's that require high analog performance. From the research and development standpoint, the value of these complex devices will gradually decline toward the deep sub-µm era, because the superiority of bipolar devices will deteriorate in digital applications and the high drivability of BiCMOS gates will also become less attractive. Therefore CMOS devices will strengthen their superiority in the 21st century.

The gate delay of scaled CMOS devices has been improved as shown in Fig. 1. The MOS devices were scaled down to 0.8 µm while preserving the supply voltage. The supply voltage of 0.5 µm MOS circuits was scaled to 3.3 V to prevent hot-electron degradation. One effect of the scaling is a drastic reduction of the gate delay. Beyond 0.5 µm, although the no-load gate delay will be reduced, the wire delay will limit the critical path speed on an LSI chip. The reduction of power dissipation is not enough to cope with the rapid increase of integration scales and clock frequencies. Accordingly, processor architectures with localized interconnections and design techniques for low-power circuits are essential to utilize the deep sub-µm devices. In line with these technology trends, device scaling will continue to below 0.1 µm with some refinements in device structures and circuit configurations.

Manuscript received August 20, 1995.

<sup>†</sup> The author is with the Faculty of Engineering, Hiroshima University, Higashi-Hiroshima-shi, 739 Japan.

<sup>††</sup> The author is with the Research Center for Integrated Systems, Hiroshima University, Higashi-Hiroshima-shi, 739 Japan.

Fig. 1 Gate delay of scaled CMOS devices.

The analog performance of scaled MOS devices was studied based on a first-order analysis [5]. The quasi-constant voltage (QCV) scaling brings the best overall performance among the three well-known scaling laws: the constant field (CF), constant voltage (CV), and QCV scaling laws. Key parameters of a basic amplifier, namely the low-frequency gain (Av), unity-gain frequency $(f_0)$, and signal-to-noise ratio (S/N), were estimated considering the higher-order effects, as shown in Fig. 2 [6]. Av is almost constant or decreases slightly with scaling, while it rapidly decreases for the CV scaling law. The $f_0$ increases following the first-order theory $(k^{1.5})$ for a small scaling factor (k), but the increase slows for large $k\ (>2)$. This is caused by the mobility degradation. The S/N ratio decreases with scaling as $k^{-1}$ for small k (<2), and as $k^{-0.5}$ for large k (>5). For analog circuits using QCV scaled 0.25 µm MOS devices, the S/N ratio is estimated to be 74 dB, which corresponds to a 12 bit resolution. Therefore improvements in the accuracy of analog circuits cannot be expected as long as the conventional circuit scheme is employed.
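The correspondence between a 74 dB S/N ratio and a 12 bit resolution can be checked with the standard relation for an ideal quantizer driven by a full-scale sine wave, S/N = 6.02 N + 1.76 dB. The following is a minimal sketch under that assumption; the function names are illustrative and do not come from the paper.

```python
def snr_db_to_bits(snr_db: float) -> float:
    """Equivalent resolution N in bits, assuming S/N = 6.02*N + 1.76 dB
    (ideal quantizer, full-scale sine input)."""
    return (snr_db - 1.76) / 6.02


def bits_to_snr_db(bits: float) -> float:
    """Inverse relation: ideal S/N in dB for an N-bit quantizer."""
    return 6.02 * bits + 1.76


if __name__ == "__main__":
    # 74 dB is the S/N estimate quoted for QCV scaled 0.25 um devices.
    print(f"{snr_db_to_bits(74.0):.2f} bits")
```

Under this relation, each additional bit of resolution costs about 6 dB of S/N, which is why a falling S/N ratio directly caps the attainable analog accuracy.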

Fig. 2 Analog performance of QCV scaled CMOS.

As devices were scaled down, the analog-digital mixed circuit architecture changed drastically. Reductions in chip size and power dissipation are compared between analog signal processing (ASP) and digital signal processing (DSP). The estimated values for the voice PCM codec LSI and the ISDN equalizer LSI are shown in Figs. 3(a) and (b), respectively. In the 5 µm era, ASP had the advantages of small chip area and low power. In the 2 µm era, however, these advantages disappeared in voice-band applications, because the power reduction and integration density of the digital circuits exceeded those of the analog circuits. Up to the 0.5 µm era, analog circuits have been replaced by digital circuits.

Fig. 3 Comparison between ASP and DSP: (a) voice PCM codec, (b) ISDN equalizer.

Fig. 4 Circuit architecture trends.

The guideline of analog-digital mixed LSI design in each feature size is shown in Fig. 4. The technology trend from analog to digital will continue to the 0.5  $\mu$ m era. Beyond 0.5  $\mu$ m, the design concept of large scale A-D mixed LSI's would be changed due to the limitations of analog accuracy and logic power dissipation.

### 2.2 Integrated Analog Circuit Techniques

To discuss the new design concept, let us review the key technologies of the analog integrated circuits during the past 20 years.

### (1) Amplifiers

OP amplifiers are the most popular analog IC's. In the 1960s many bipolar amplifier circuit techniques were developed [7]. The basic concept of integrated amplifiers is the differential amplifier scheme. This circuit fully utilizes the property that integrated elements have high matching accuracy (0.2-1%) in spite of large deviations in absolute value (10-20%). Therefore circuits and pattern layouts must be designed to keep their symmetry. To enhance bandwidth and gain, the differential cascode amplifier was developed. For a wide input voltage range, the common-mode feedback technique was also proposed. MOS amplifiers were also designed based on the same concept [8]. The chopper-stabilization technique was employed in order to reduce the 1/f noise of MOS devices. Fully differential configurations are widely employed for A-D mixed LSI's, because they have high tolerance to substrate coupling noise and power supply noise.

900 MHz CMOS RF amplifiers were developed for wireless applications. A one-chip transceiver has been studied as a future target for wireless applications.

Fig. 5 Oversampling delta-sigma A-to-D converters: (a) MASH A-to-D converter, (b) swing suppression delta-sigma A-to-D converter. H<sub>i</sub>: integrator, H<sub>d</sub>: differentiator, Q: one-bit quantizer.

### (2) A-D Converters

A-D conversion is an essential function for A-D mixed LSI's. In the first stage, the bipolar Successive Approximation A-to-D converters were developed using a binary weighted current source array or an R-2R weighting network. The capacitor array A-to-D converter based on the charge redistribution principle was developed [10]. It consists of highly accurate capacitors and analog switches which are easily implemented with standard MOS technologies. The well-designed binary capacitor array realizes a 13 bit accuracy and a 75 dB dynamic range without any trimming.

Over-sampling delta-sigma A-to-D converters, which consist of a high-speed one-bit quantizer and noise shaping filters, have been developed with MOS technologies below 2 µm. These converters are very suitable for scaled MOS devices because they utilize high-frequency sampled low-bit signals instead of highly accurate analog signals. In order to realize higher-order noise shaping without oscillation problems, the multi-stage noise shaping (MASH) architecture, which consists of a cascade connection of multiple first-order noise shaping loops, was developed as shown in Fig. 5(a) [11]. A MASH A-to-D converter with an S/N ratio of over 90 dB for the standard audio band was obtained with 2 µm CMOS devices. Extending the MASH configuration, a 3rd-order delta-sigma converter using the cascade connection of a 2nd-order first stage and a first-order second stage was developed. This is called the 2-1 architecture, and it improves the sensitivity to analog components. For low-voltage applications, the voltage swing suppression over-sampling scheme shown in Fig. 5(b) was proposed [12]. Since the output of the delta-sigma A-to-D converter is a pulse density modulation (PDM) signal, it is ordinarily converted to a Nyquist-rate PCM signal by a decimation filter. However, it seems useful to process the PDM signal as it is. Other A-D converter architectures, namely the parallel, the pipelined cascade, and the sub-ranging types, have been studied to realize a high conversion speed with lower power dissipation and smaller chip area.
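To illustrate why the output of a delta-sigma converter is a pulse density modulation signal, the sketch below models a generic textbook first-order loop (an integrator followed by a one-bit quantizer with feedback). It is not the MASH or swing suppression circuit of Fig. 5, and all names are illustrative assumptions.

```python
def delta_sigma_first_order(samples):
    """Generic first-order one-bit delta-sigma modulator.

    samples: input values in the range [-1, 1].
    Returns a list of +1/-1 output bits whose local density tracks the input.
    """
    integrator = 0.0
    feedback = 0.0  # previous quantizer output fed back to the input
    bits = []
    for x in samples:
        integrator += x - feedback          # accumulate the error
        bit = 1 if integrator >= 0 else -1  # one-bit quantizer
        feedback = float(bit)
        bits.append(bit)
    return bits


def pulse_density(bits):
    """Fraction of output bits that are +1."""
    return sum(1 for b in bits if b > 0) / len(bits)


if __name__ == "__main__":
    stream = delta_sigma_first_order([0.5] * 1000)
    # A constant input of 0.5 should give a density close to (0.5 + 1) / 2.
    print(pulse_density(stream))
```

A decimation filter would average such a stream to recover a multi-bit PCM value; processing the stream directly, as the text suggests, skips that step.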

### (3) Analog Filters

In 1987, integrated analog filter techniques were revolutionized by the MOS switched capacitor filter (SCF) [13]. Its main feature is that an equivalent high resistance can be realized with periodically operated analog switches and a capacitor. The equivalent RC time constant is precisely determined by the high accuracy of a capacitor ratio. A CMOS voice coder/decoder with low-pass and band-pass SCF's was developed [14]. These SCF's realized a sharp cutoff and a dynamic range of over 80 dB without any trimming. Since the SCF is a kind of discrete-time sampled filter and its aliasing noise reduces the S/N ratio, pre- and post-continuous-time filters are required. Continuous-time integrated RC active filters have been intensively studied [15]. To realize the equivalent R, three types of integrators, gm-C, gm-OTA-C, and MOSFET-C, were proposed. Frequency tuning circuits, in which an equivalent resistance value is controlled by the bias voltages of a feedback loop circuit, are indispensable. This feedback circuit is a key technology for the new circuit architecture. Since these amplifier-based filter circuits consume large power, a circuit scheme without any amplifier will be required in the future. The current-mode circuit scheme using dependent current sources or switched current sources is expected to satisfy this requirement.
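The statement that the SCF time constant depends only on a capacitor ratio follows from the standard switched-capacitor approximation R_eq = 1/(f_clk C_s), valid for signal frequencies well below the clock frequency. The sketch below works through that arithmetic; the component values and function names are illustrative assumptions, not values from the paper.

```python
def sc_equivalent_resistance(f_clk_hz: float, c_switched_f: float) -> float:
    """Equivalent resistance of a capacitor switched at f_clk: R = 1/(f*C)."""
    return 1.0 / (f_clk_hz * c_switched_f)


def sc_time_constant(f_clk_hz: float, c_switched_f: float,
                     c_integrating_f: float) -> float:
    """RC time constant with the switched capacitor acting as R.

    tau = R_eq * C_i = (C_i / C_s) / f_clk, so it depends only on the
    capacitor ratio and the clock frequency.
    """
    return c_integrating_f / (f_clk_hz * c_switched_f)


if __name__ == "__main__":
    # Assumed example: 128 kHz clock, 1 pF switched, 10 pF integrating.
    print(sc_equivalent_resistance(128e3, 1e-12))   # ohms
    print(sc_time_constant(128e3, 1e-12, 10e-12))   # seconds
```

Doubling both capacitors leaves the time constant unchanged, which is why the large absolute deviations of integrated elements do not disturb the filter response.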

### (4) Phase Locked Loop

The phase locked loop (PLL) was originally developed for frequency synthesizers used as local oscillators in wireless transceivers. Its extended applications are clock recovery for synchronous data transmission and clock generation for high-speed microprocessors [16]. Recently the delay locked loop (DLL) has been studied intensively. It generates multi-phase clocks with less than 1 ns timing resolution by using a ring oscillator with variable delay cells, as shown in Fig. 6. It realizes lower jitter and higher noise immunity than the conventional VCO-type PLL. A CMOS clock generator with a 140 ps peak-to-peak jitter was realized [17]. Owing to the progress of DLL technologies, new circuit techniques that process highly accurate information in the time domain will become available.

### (5) Noise Cancellation Techniques

The offset cancel circuit can be easily realized with switched capacitor circuits. This circuit is also effective in suppressing the 1/f noise components whose frequency is sufficiently lower than the clock frequency. Another large noise component is the crosstalk noise from digital circuits to analog circuits. The major noise coupling paths are the power supply line, the ground line, and the LSI substrate. The power supply and ground lines are properly separated just inside a bonding pad, or separated including the bonding pads. The substrate coupling noise is the most serious problem. A noise canceling technique has already been studied. It consists of a noise detection circuit and an active noise cancellation circuit [18].

Fig. 6 Delay locked loop.

# 3. Information Processing Principles and Circuit Architectures

### 3.1 Representations of Information

Current integrated electronics has been built on the binary digital architecture, in which information is represented by discrete-time binary data and processed by Boolean algebra. This data form is suitable for representing symbol data and numerical data. A binary bit stream in the time domain is regarded as a pulse code modulation (PCM) signal, which is widely applied to digital signal processing and digital communications. Recently, image processing and pattern processing have become very important for intelligent systems that perform feature extraction and recognition. Since we cannot understand the meaning of these data, it is essentially unsuitable and redundant to process them as binary data with a stored binary program. A more efficient representation should be studied for future systems. Another representation is analog, which is continuous in both time and amplitude. This is an essential representation for auditory and vision systems and real-world environments.

Between the analog and digital representations, different data forms exist. One is multi-valued digital data, which represents multiple bits by a multi-level voltage or current. The others are modulation signals. As a carrier waveform, a sinusoidal wave or a square pulse wave is available. Several kinds of pulse modulation signals were studied from the viewpoint of neural network implementation [19]. This paper will discuss pulse width modulation (PWM), pulse phase modulation (PPM), and pulse density modulation (PDM), as shown in Fig. 7, in comparison with the PCM signal.

Fig. 7 Pulse modulation signals.

In the conventional electronics, information is represented by the physical quantities: voltage, current, or charge. For example, binary digital bit "1" and "0" are represented by the voltage levels of high and low, respectively. These electrical quantities are determined by the supply voltage or the element values, based on the well-known Ohm's law (v=iR) or the basic physical principle  $(v=q/C=\int idt/C)$ .

In binary digital systems, the dynamic range and the S/N ratio can be enhanced by increasing the bit length without any influence of element mismatches and various noises. The error rate of digital processing is negligibly small because the signal energy (qV) is sufficiently larger than the thermal noise energy (kT), where q is the electron charge, V is the signal voltage swing, k is the Boltzmann constant, and T is the absolute temperature. As for the conventional analog representation, the dynamic range is practically limited by the thermal noise, the 1/f noise, the harmonic distortions, and the crosstalk noise.
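The margin between the signal energy qV and the thermal energy kT can be made concrete with a short calculation. The sketch below assumes a 3.3 V swing (the supply voltage quoted earlier for 0.5 µm circuits) and a temperature of 300 K; both values and the function name are assumptions chosen for illustration.

```python
Q_ELECTRON = 1.602176634e-19  # electron charge in coulombs
K_BOLTZMANN = 1.380649e-23    # Boltzmann constant in joules per kelvin


def signal_to_thermal_energy_ratio(v_swing: float, temp_k: float = 300.0) -> float:
    """Ratio qV / kT for a given signal voltage swing and temperature."""
    return (Q_ELECTRON * v_swing) / (K_BOLTZMANN * temp_k)


if __name__ == "__main__":
    # Assumed operating point: 3.3 V swing at 300 K.
    print(signal_to_thermal_energy_ratio(3.3))
```

With these assumed values the ratio is above one hundred, which is consistent with the negligible error rate claimed for binary digital signals.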

### 3.2 Data Processing

The basic functions of information processing are storage, arithmetic, and communication. Long-term storage is essential for stored-program computers. The widely used binary memories, DRAM and SRAM, feature large capacity, long retention time, and high reliability. To store one datum, however, these memories require at least as many MOS devices as the number of bits. On the other hand, an analog memory can be realized with a single storage element such as a capacitor. In practice, however, the accuracy and reliability of analog memories are still insufficient because of leakage of the stored charge.

As for arithmetic operations, multiplication and addition are the most widely used. Many efforts have been made to implement high-speed, low-power multipliers. Analog arithmetic operations are realized with simple circuits based on physical laws. Multiplication is carried out with a single resistor based on Ohm's law. Current summation and charge summation are carried out based on Kirchhoff's law and the charge conservation principle, respectively. Therefore these analog arithmetic circuits have the potential to process data with considerably lower power consumption.

Data communication has been a bottleneck in conventional electronic systems. In MPU chips, high-frequency clocks and control signals are distributed over the whole chip area for synchronous operation. Furthermore, a future MPU chip with several processors will require wide-bandwidth crossbar switches that connect memories and CPU's. A highly parallel processor system will require ultra-wideband channels that interconnect the processors. The power efficiency of data transfer should be drastically improved by utilizing coding and modulation techniques and an asynchronous system architecture.


### 3.3 New Circuit Architectures and Devices

Recently, new circuit techniques have been studied for introduction into conventional binary logic circuits. For example, pass transistor logic families such as Complementary Pass Transistor Logic (CPL) have been proposed to save transistor count and power dissipation [20]. In order to improve the power efficiency of data transmission, a new bus interface circuit using charge recycling was proposed [21]. The power dissipation of a CMOS logic LSI is dominated by two types of energy loss: one occurs when the load capacitance is charged, and the other when it is discharged. Both are $0.5CV^2$, where C is the load capacitance and V is the voltage swing. If the capacitor is charged by a slowly ramped power supply, the energy loss can be reduced by a factor on the order of tc/T, where tc is the time constant of the output resistance and the node capacitance, and T is the period of the ramp waveform. This is called adiabatic charging [22]. If the charge is returned to the power supply, the energy loss at discharging does not occur. This is called an energy recycle power supply. Although the operating speed is limited by the frequency of the ramped pulse power supply, the power dissipation will be reduced to 1/10~1/100 of that of conventional CMOS logic.
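The benefit of ramped charging can be estimated with the commonly used first-order approximation for adiabatic charging, E ≈ (tc/T)·C·V², valid when the ramp period T is much longer than the time constant tc = RC. The sketch below compares this with the conventional loss of 0.5·C·V²; the component values and function names are assumptions for illustration only.

```python
def conventional_charging_loss(c_load_f: float, v_swing: float) -> float:
    """Energy lost when charging C through a switch from a fixed supply."""
    return 0.5 * c_load_f * v_swing ** 2


def adiabatic_charging_loss(c_load_f: float, v_swing: float,
                            r_out_ohm: float, ramp_period_s: float) -> float:
    """First-order loss for a linear supply ramp of duration T >> RC.

    E ~= (RC / T) * C * V^2, with tc = RC as defined in the text.
    """
    tc = r_out_ohm * c_load_f
    if ramp_period_s < 10 * tc:
        raise ValueError("approximation assumes ramp period >> RC")
    return (tc / ramp_period_s) * c_load_f * v_swing ** 2


if __name__ == "__main__":
    # Assumed example: 1 pF load, 3.3 V swing, 1 kOhm switch, 100 ns ramp.
    e_conv = conventional_charging_loss(1e-12, 3.3)
    e_adia = adiabatic_charging_loss(1e-12, 3.3, 1e3, 100e-9)
    print(e_adia / e_conv)  # ratio of the two losses
```

Under this approximation the ratio of the two losses is 2·tc/T, so a ramp one hundred times slower than the RC time constant lands within the 1/10~1/100 range quoted above, at the cost of operating speed.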

For future integrated systems with intelligence, a functional device with programmability is strongly required. Such a device also works as a memory by itself. The four-terminal neu-MOS device, which operates based on the charge redistribution principle on floating gate capacitors, was proposed [23]. The main feature of this device is that the threshold voltage can be controlled by the fourth terminal and memorized as floating gate charge. By using this device, a synaptic circuit that calculates the weighted sums of multiple inputs and memorizes the weighting coefficients was implemented. The neu-MOS has also been applied to the winner-take-all circuit and the CAM, which are useful for intelligent systems. This device has been studied to build digital, analog, and multi-valued merged integrated electronics [24].

Neuromorphic processing, which has the potential to overcome the bottleneck of the von Neumann architecture, is one of the promising architectures. Its fundamental differences from conventional architectures lie in the elementary functions, the representation of information, and the organizing principle. These realize massively parallel and flexible processing with human-like intelligence. Neural processing is carried out by the connection of a huge number of neurons, in which information is basically represented in a distributed form. This representation seems inefficient, but it is very flexible and robust. Flexible and intelligent functions are organized by learning mechanisms. Many types of neural network LSI's were developed, which employed analog, digital, or A-D mixed representations. The digital approach employed the virtual neuron architecture, in which arithmetic units and connections are used in time-sharing operation. In the analog approaches, the neural nets are implemented by real neurons that correspond to the network topology and work in real time. An on-chip backpropagation learning function was also realized with a fully analog approach [25]. Silicon retina VLSI's have been developed utilizing the neuromorphic architecture and MOS analog circuits operated in the weak inversion region [26]. The simple functions of early vision are realized with a parallel scheme and low power dissipation. Pulse stream neural chips using pulse modulation signals were developed by using A-D mixed circuits [19].

# 4. Analog-Digital Merged Circuit Architecture Using Pulse Width Modulation Signals

### 4.1 Circuit Architecture Using Pulse Modulation

In order to contrive an analog-digital merged circuit, it is necessary to develop a new representation of information and a circuit implementation of its arithmetic that realize high accuracy and a wide dynamic range with deep sub-µm devices. Since the switching speed of these devices is expected to improve to below 100 ps, time-domain information processing becomes effective. Pulse modulation signals using a square pulse wave as a carrier can be implemented with conventional CMOS devices; here information is represented by the rising and falling edges of pulses instead of their voltage levels. Features of the various pulse modulation signals are summarized in Table 1. The data transmission bit rate and power efficiency of the signals are estimated assuming that the interconnecting wire is 1 µm wide and 20 mm long. Figure 8 shows the result of the estimation. Although the PCM is known as the most efficient representation, its power efficiency is relatively low because of its high transition probability between "1" and "0." Since the PPM handles multi-bit data with a single pulse, a higher power efficiency is attained. A data transmission system using the PPM has been proposed [27]. A bit rate of 160 Mb/s was attained using a 20 MHz clock, which can be handled with a 1 µm CMOS technology. But it requires a synchronization circuit for extracting the reference clock phase. On the other hand, the PWM signal can transfer multi-bit data in the asynchronous mode. Therefore it can simplify the data receiver circuits compared with those of the PPM. Recently, a data bus architecture using the asynchronously compressed PWM representation was proposed. A bit rate 12 times higher than that of the conventional 100 MHz bus was demonstrated [28].

Table 1 Features of pulse modulation signals.

| Signal | Amplitude axis    | Bits per time slot                          | Bits (for the conditions below) | Transfer       |
|--------|-------------------|---------------------------------------------|---------------------------------|----------------|
| PCM    | Discrete          | Tslot / Tmin                                | 7                               | Sync.          |
| PWM    | Analog / discrete | log<sub>2</sub>((Tslot - 2 Tmin) / Δt)      |                                 | Sync. / Async. |
| PPM    | Analog / discrete | 2 log<sub>2</sub>((Tslot / 2 - Tmin) / Δt)  | 8                               | Sync.          |
| PDM    | Discrete          | log<sub>2</sub>(Tslot / 2Tmin)              | 1.8                             | Async.         |

Conditions: Tmin = 5 ns, Δt = 0.7 ns, Tslot = 35 ns.

Fig. 8 Comparison of bit rate and dissipation power.
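The bits-per-time-slot expressions of Table 1 can be evaluated directly for the stated conditions (Tmin = 5 ns, Δt = 0.7 ns, Tslot = 35 ns). The sketch below is a plain transcription of those formulas, reading "Tslot / 2Tmin" as Tslot / (2·Tmin); the function names are illustrative, and the PWM value it prints is a computed figure rather than one quoted in the table.

```python
import math

T_MIN_NS = 5.0    # minimum pulse width (Table 1 condition)
DT_NS = 0.7       # timing resolution (Table 1 condition)
T_SLOT_NS = 35.0  # time slot length (Table 1 condition)


def bits_pcm(t_slot: float, t_min: float) -> float:
    """PCM: one bit per minimum-width pulse position."""
    return t_slot / t_min


def bits_pwm(t_slot: float, t_min: float, dt: float) -> float:
    """PWM: log2 of the number of distinguishable pulse widths."""
    return math.log2((t_slot - 2 * t_min) / dt)


def bits_ppm(t_slot: float, t_min: float, dt: float) -> float:
    """PPM: formula of Table 1, 2*log2((Tslot/2 - Tmin)/dt)."""
    return 2 * math.log2((t_slot / 2 - t_min) / dt)


def bits_pdm(t_slot: float, t_min: float) -> float:
    """PDM: log2 of the number of pulses that fit in the slot."""
    return math.log2(t_slot / (2 * t_min))


if __name__ == "__main__":
    print("PCM", round(bits_pcm(T_SLOT_NS, T_MIN_NS), 2))
    print("PWM", round(bits_pwm(T_SLOT_NS, T_MIN_NS, DT_NS), 2))
    print("PPM", round(bits_ppm(T_SLOT_NS, T_MIN_NS, DT_NS), 2))
    print("PDM", round(bits_pdm(T_SLOT_NS, T_MIN_NS), 2))
```

The PCM, PPM, and PDM results agree with the tabulated 7, 8, and 1.8 after rounding, which supports this reading of the formulas.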

Although the amplitude of a PWM signal is a binary digital value, its pulse width represents analog information. Since the PWM signal has the properties of both analog and digital signals, it is useful for combining digital and analog signals. To merge the signals at a fine grain size, new circuit techniques for PWM arithmetic operations and mutual signal conversions are required. If amplifier-based circuits are employed, they consume much power because of the bias currents of the amplifiers. The number of amplifiers has to be reduced as far as possible to save power dissipation. The proposed scheme allows us to merge the PWM signal with binary digital and analog signals, using each where it is most suitable.

The PWM representation takes a long time period of $T=2^n\times \Delta t$, where n is the data length in the binary representation and $\Delta t$ is the timing resolution. Assuming that n=8 and $\Delta t=1$ ns, one arithmetic operation requires a period of T=256 ns. To reduce the period, the multi-line PWM is proposed. The original binary data are divided into two parts, the higher bits (H) and the lower bits (L), and each part is represented by a PWM signal. Assuming that H=4 bits and L=4 bits, the period decreases to 16 ns. By using the dual-line representation, the period and the processing time are reduced to $2^{0.5n}\times \Delta t$, that is, to $2^{-0.5n}$ of the original period.
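The period arithmetic for single-line and multi-line PWM can be reproduced in a few lines. This is a minimal sketch that assumes the n bits are split evenly across the lines; the function name and the `lines` parameter are illustrative and not taken from the paper.

```python
def pwm_period_ns(n_bits: int, dt_ns: float, lines: int = 1) -> float:
    """Time slot needed to represent n_bits as PWM with resolution dt_ns.

    With `lines` parallel PWM signals, each line carries n_bits / lines bits,
    so the period is 2**(n_bits / lines) * dt_ns.
    """
    if lines < 1 or n_bits % lines != 0:
        raise ValueError("n_bits must divide evenly across the lines")
    return (2 ** (n_bits // lines)) * dt_ns


if __name__ == "__main__":
    print(pwm_period_ns(8, 1.0))           # single-line PWM: 256.0
    print(pwm_period_ns(8, 1.0, lines=2))  # dual-line PWM (H=4, L=4): 16.0
```

The same function shows the trade-off for other splits, for example four lines of two bits each, where the period shrinks further at the cost of more wires and converters.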

### 4.2 Arithmetic Circuit for PWM Signals

The most important arithmetic operations are the multi-input addition and the multiplication and accumulation (MAC) operation. The PWM arithmetic operations can be implemented by simple logic gates, as shown in Fig. 9 [19]. Addition is carried out by an OR gate, under the condition that the input PWM signals do not overlap in the time domain. Subtraction is performed by an XOR gate, under the condition that one of the input PWM signals is included in the other. Multiplication (PWM×PWM) can be implemented with only one AND gate. Although the output signal becomes a pulse stream, it can be converted to a single PWM pulse by the integration circuit and the analog-to-PWM converter, which will be described later. These circuits, however, are not suitable for parallel operations on a large number of PWM inputs, because it is difficult to remove the timing overlap between the inputs.

Fig. 9 PWM arithmetic circuits using logic gates.

Fig. 10 PWM adder circuit using switched current sources.
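The gate-level PWM arithmetic can be checked with a discrete-time model in which each signal is a list of 0/1 samples and the represented value is the total high time. The sketch below is one possible interpretation: the OR and XOR cases follow the stated timing conditions, while the AND example assumes that one operand is a repetitive pulse train with a fixed duty ratio, which is an assumption made here and not a detail given in the text.

```python
def pulse(start: int, width: int, slot: int) -> list:
    """Single PWM pulse of `width` ticks starting at `start` in a slot."""
    return [1 if start <= t < start + width else 0 for t in range(slot)]


def high_time(signal: list) -> int:
    """Represented value: total number of ticks at logic high."""
    return sum(signal)


def pwm_add(a: list, b: list) -> list:
    """OR gate; valid only if the pulses do not overlap in time."""
    if any(x and y for x, y in zip(a, b)):
        raise ValueError("inputs overlap in time")
    return [x | y for x, y in zip(a, b)]


def pwm_sub(a: list, b: list) -> list:
    """XOR gate; valid only if pulse b lies entirely inside pulse a."""
    if any(y and not x for x, y in zip(a, b)):
        raise ValueError("b is not contained in a")
    return [x ^ y for x, y in zip(a, b)]


def pwm_and(a: list, b: list) -> list:
    """AND gate; the output is a pulse stream in general."""
    return [x & y for x, y in zip(a, b)]


if __name__ == "__main__":
    slot = 64
    print(high_time(pwm_add(pulse(0, 10, slot), pulse(20, 7, slot))))  # 17
    print(high_time(pwm_sub(pulse(0, 30, slot), pulse(5, 12, slot))))  # 18
    # Assumed second operand: pulse train with period 8 and duty ratio 2/8.
    train = [1 if t % 8 < 2 else 0 for t in range(slot)]
    print(high_time(pwm_and(pulse(0, 32, slot), train)))               # 8
```

The guard clauses make the limitation discussed above explicit: with many inputs, keeping every pulse non-overlapping requires extra delay circuits.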

In order to realize parallel operations, an adder governed by Kirchhoff's law is useful. By connecting a large number of PWM signals to a multi-input RC integrator, all PWM signals are summed in parallel. Since this circuit utilizes an operational amplifier, it consumes large power. Our circuit solution without an amplifier is the switched current integrator (SCI), which consists of switched current sources (SCS's) and a capacitor, as shown in Fig. 10. When the input PWM signal is low, the constant current flows to the capacitor. By connecting a number of SCS's to a single capacitor, multi-input addition can be carried out in parallel. If we use binary weighted SCS's, the multiplication of a PWM signal by binary digital data can be realized, as shown in Fig. 11. Thus these PWM arithmetic circuits work in a highly parallel scheme.
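The SCI operation reduces to charge summation on one capacitor, V = Σ I_k·W_k / C. The sketch below is an idealized model that assumes each current source conducts for a time equal to its input pulse width and has infinite output impedance; the component values and function names are illustrative assumptions.

```python
def sci_output_voltage(widths_s, currents_a, capacitance_f: float) -> float:
    """Ideal SCI: each source k delivers I_k for a time W_k onto one capacitor."""
    if len(widths_s) != len(currents_a):
        raise ValueError("one current source per PWM input is required")
    charge = sum(i * w for i, w in zip(currents_a, widths_s))
    return charge / capacitance_f


def sci_multiply(width_s: float, digital_word: int, n_bits: int,
                 unit_current_a: float, capacitance_f: float) -> float:
    """PWM x binary data using binary weighted sources.

    Bit k of digital_word enables a source of 2**k * unit_current_a, so the
    total current is digital_word * unit_current_a while the pulse is applied.
    """
    if not 0 <= digital_word < 2 ** n_bits:
        raise ValueError("digital_word does not fit in n_bits")
    current = sum(((digital_word >> k) & 1) * (2 ** k) * unit_current_a
                  for k in range(n_bits))
    return current * width_s / capacitance_f


if __name__ == "__main__":
    # Assumed example: three inputs of 10, 20, 30 ns, 1 uA each, 1 pF.
    print(sci_output_voltage([10e-9, 20e-9, 30e-9], [1e-6] * 3, 1e-12))
    # Assumed example: 10 ns pulse times the 4 bit word 5.
    print(sci_multiply(10e-9, 5, 4, 1e-6, 1e-12))
```

Because all sources feed the same node, adding an input costs one more switched source rather than another amplifier, which is the point of the amplifier-free scheme.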

If the output voltage of the SCI increases, the arithmetic accuracy is limited to around 0.3% by the finite output impedance of the SCS. The output impedance can be improved by the use of a current buffer or a feedback control circuit. Furthermore, if the integrated charge is successively converted to digital data, the voltage of the integrator does not increase, and therefore accurate analog processing is realized with the relatively low supply voltage of deep sub-µm scaled devices. This circuit will be designed on the same principle as the APWC described in Table 2.

### 4.3 Conversion Circuits for PWM Signals

In order to merge the PWM circuit with existing analog and digital circuits at a fine grain size, the mutual signal conversion functions should be realized with simple and small circuits. Table 2 shows block diagrams of the mutual converters. The analog-to-PWM converter (APWC) operates by measuring the integration time, using a reference current source, an integrator, and a voltage comparator. This operating principle is the same as that of the integrating A-to-D converter. The PWM-to-analog converter (PWAC) is implemented by an SCI, as shown in Fig. 10. The digital-to-PWM converter (DPWC) is implemented by a programmable down counter. The input digital data are loaded into the counter, and the count-down time corresponds to the output PWM signal. The PWM-to-digital converter (PWDC) is also realized with a binary counter operating with a clock gated by the input PWM signal. Although the DPWC and PWDC are simple, they require a high-frequency clock and therefore consume much power.

Fig. 11 PWM multiplier using binary weighted switched current sources.

Table 2 Mutual signal converters.
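The counter-based converters can be modeled behaviorally at the clock-tick level. The sketch below assumes the DPWC output stays high while the down counter is non-zero and that the PWDC increments once per clock tick while its input is high; the APWC helper uses the ideal integration relation W = C·V/I_ref. These are modeling assumptions for illustration, and all names are hypothetical.

```python
def dpwc(value: int, n_bits: int) -> list:
    """Digital-to-PWM: load a down counter, output high until it reaches zero."""
    if not 0 <= value < 2 ** n_bits:
        raise ValueError("value does not fit in n_bits")
    count = value
    waveform = []
    for _ in range(2 ** n_bits):      # one time slot of 2**n_bits clock ticks
        waveform.append(1 if count > 0 else 0)
        if count > 0:
            count -= 1
    return waveform


def pwdc(waveform: list) -> int:
    """PWM-to-digital: binary counter clocked only while the input is high."""
    count = 0
    for level in waveform:
        if level:
            count += 1
    return count


def apwc_width_s(v_in: float, capacitance_f: float, i_ref_a: float) -> float:
    """Analog-to-PWM: time for I_ref to integrate up to v_in on C."""
    return capacitance_f * v_in / i_ref_a


if __name__ == "__main__":
    print(dpwc(3, 3))                 # [1, 1, 1, 0, 0, 0, 0, 0]
    print(pwdc(dpwc(200, 8)))         # 200
    print(apwc_width_s(1.0, 1e-12, 10e-6))  # pulse width in seconds
```

The model makes the power concern visible: an n bit conversion needs up to 2^n clock ticks, so the clock frequency must rise with both resolution and conversion rate.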

If a high-frequency global clock is used for the conversion circuits, it consumes much power because large stray capacitances are charged and discharged. The low-power design guideline is to use a local clock for each converter. Timing deviations among the local clocks are allowable because the PWM arithmetic circuit using the SCI is insensitive to the pulse timing. These local clocks can be generated by using the delay locked loop (DLL), and their accuracy is controlled by the period of a low-frequency frame clock signal.


### 4.4 PWM Memories

For the simple PWM arithmetic circuit using a simple gate, a PWM delay circuit is necessary, as shown in Fig. 9. The PWM delay circuit can be implemented with a combination of the PWAC, an analog memory, and the APWC. Figure 12 shows one example of the PWM memory. If the delay time becomes long, the leakage current of the switching devices degrades the accuracy of the output pulse width. For long-term memory, we have to use digital memories, such as shift registers or RAM's. Figure 13 shows conceptual block diagrams of the two long-term PWM memories. The PWM data are stored in the registers or RAM cells as a bit stream of consecutive "1"s whose length is proportional to the pulse width. The timing accuracy of the memory is determined by the resolution and accuracy of the local clock. In the shift register type, since data are shifted at high frequency, the clocking power dissipation is dominant and large. On the other hand, in the memory cell type, data are fixed in the cells and only one cell is addressed at a time. Therefore the power dissipation is considerably reduced compared with the shift register type.
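Both memory styles can be captured in a small behavioral sketch. The analog case assumes a constant leakage current and the same reference current for writing (PWAC) and reading (APWC), which gives a pulse width error of I_leak·t_hold / I_ref; the digital case stores the width as a run of consecutive "1" bits. These modeling assumptions, the values, and the names are illustrative only.

```python
def analog_memory_width_error_s(i_leak_a: float, hold_time_s: float,
                                i_ref_a: float) -> float:
    """Pulse width lost to leakage in a PWAC / capacitor / APWC delay circuit.

    Writing stores V = I_ref * W / C; leakage removes I_leak * t_hold / C;
    reading with the same I_ref gives W' = W - I_leak * t_hold / I_ref.
    The capacitance cancels out under these assumptions.
    """
    return i_leak_a * hold_time_s / i_ref_a


def store_pwm_as_bits(width_ticks: int, n_cells: int) -> list:
    """Digital PWM memory: a run of consecutive 1s proportional to the width."""
    if not 0 <= width_ticks <= n_cells:
        raise ValueError("pulse width exceeds the memory length")
    return [1] * width_ticks + [0] * (n_cells - width_ticks)


def read_pwm_width(cells: list) -> int:
    """Reading the cells in sequence at the local clock rate restores the width."""
    return sum(cells)


if __name__ == "__main__":
    # Assumed example: 1 pA leakage, 1 ms hold, 10 uA reference current.
    print(analog_memory_width_error_s(1e-12, 1e-3, 10e-6))  # seconds
    print(read_pwm_width(store_pwm_as_bits(37, 256)))        # 37
```

The analog error grows linearly with hold time, whereas the digital error is fixed by the local clock resolution, which matches the division between short-term and long-term memories described above.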

### 4.5 System Architecture and Canceling Device Deviations and Noises

(b) PWM Memory using RAM Cells

Fig. 13 PWM memory using digital storage.

A general concept of the PWM A-D merged architecture is shown in Fig. 14(a). Figure 14(b) shows the relation between accuracy and processing time for each type of circuit. The PWM signal is applied to the intelligent processing engine for feature extraction and overall decisions, which require multi-input additions and MAC operations but do not require high accuracy. These circuits will dominate the chip area in future VLSI's with intelligence. The circuits which require high accuracy are implemented with binary digital circuits. Analog signals are used only in limited roles: the temporary memory for PWM signal conversion and the external analog interfaces.

Many reference circuits, which compensate for the fabrication inhomogeneities of a huge number of devices, should be distributed over the whole chip area. The reference ramp circuits generate the bias voltage for the SCS's, which compensates for capacitance deviations. The PLL technique is applicable to the reference circuit as shown in Fig. 15. The frequency of the voltage-controlled multivibrator (VCM) is locked to *n* times that of the low-frequency reference frame signal, and the loop filter output can be used as the control voltage of each SCS.

### 4.6 Estimated Performance of the PWM Circuits

Fig. 12 PWM memory using analog storage.

(a) Basic Configuration and Data Flow

(b) Accuracy and Processing Time of PWM, Binary Digital and Analog

Fig. 14 PWM A-D merged circuit architecture.

Fig. 15 Reference ramp generator.

The advantages of the proposed PWM circuits are shown in Table 3 in comparison with conventional binary digital circuits. Addition and MAC operations with 2 and 16 inputs are estimated under the conditions shown in the table. The energy of the SCI is estimated by $CV_{dd}V_c$, where $C$ is the integration capacitance, $V_{dd}$ is the supply voltage, and $V_c$ is the maximum integration voltage. The processing performance of the PWM circuit improves as the number of inputs increases, because parallel operations are carried out more efficiently. The device counts for the PWM arithmetic circuits are around one-tenth of those of the binary digital circuits. The energy dissipation of the 16-input arithmetic operations is smaller than one-tenth of that of the binary digital circuits. These estimates show the advantage of the PWM circuit architecture.
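As a rough check of these estimates, the sketch below evaluates the SCI energy expression $CV_{dd}V_c$ and the binary-to-PWM ratios for the Table 3 entries. The supply voltage (3 V) and maximum integration voltage (1 V) are taken from the table notes; the 10 pF capacitance is an assumption, read from a partly garbled note. The function and dictionary names are hypothetical.

```python
"""Energy estimate and PWM vs. binary digital ratios from Table 3."""


def sci_energy_pj(c_pf, v_dd, v_c):
    """Energy of one switched current integration, C * Vdd * Vc.
    With C in pF and voltages in V the result is in pJ."""
    return c_pf * v_dd * v_c


# (PWM, binary digital) values copied from Table 3 (b)-(d).
TABLE3 = {
    "16-input adder": {"devices": (118, 350), "energy_pj": (46, 700)},
    "2-input multiplier": {"devices": (62, 1730), "energy_pj": (39, 430)},
    "16-input multiplier": {"devices": (454, 2240), "energy_pj": (174, 3460)},
}


def binary_to_pwm_ratio(entry, key):
    """How many times larger the binary digital figure is than the PWM one."""
    pwm, binary = entry[key]
    return binary / pwm


if __name__ == "__main__":
    # Assumed C = 10 pF; Vdd = 3 V and Vc = 1 V as in the table notes.
    print("SCI energy (pJ):", sci_energy_pj(10, 3, 1))
    for name, entry in TABLE3.items():
        print(
            name,
            round(binary_to_pwm_ratio(entry, "devices"), 1),
            round(binary_to_pwm_ratio(entry, "energy_pj"), 1),
        )
```

Under these assumptions the integrator itself accounts for most of the PWM energy figures in the table, with the remainder attributable to the surrounding gates.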

Table 3 Comparison of circuit performance (PWM vs. Digital).

(a) Two Input Adder (8 bit)

|                    | PWM                         | Dual PWM | Binary Digital |
|--------------------|-----------------------------|----------|----------------|
| Circuit Scheme     | Switched Current Integrator |          | Ripple Carry   |
| Device Count       | 20                          | 30       | 192            |
| Operation Time(ns) | 256                         | 16       | 4.7            |
| Energy (pJ/OP)     | 32                          | 64       | 48             |

(b) Sixteen Input Adder (8 bit)

|                    | PWM                         | Dual PWM | Binary Digital |
|--------------------|-----------------------------|----------|----------------|
| Circuit Scheme     | Switched Current Integrator |          | Ripple Carry   |
| Device Count       | 118                         | 128      | 350            |
| Operation Time(ns) | 256                         | 16       | 96             |
| Energy (pJ/OP)     | 46                          | 92       | 700            |

(c) Two Input Multiplier (8 bit)

|                    | PWM                         | Dual PWM | Binary Digital |
|--------------------|-----------------------------|----------|----------------|
| Circuit Scheme     | Switched Current Integrator |          | Array Type     |
| Device Count       | 62                          | 72       | 1730           |
| Operation Time(ns) | 256                         | 16       | 15             |
| Energy (pJ/OP)     | 39                          | 78       | 430            |

(d) Sixteen Input Multiplier (8 bit)

|                    | PWM                         | Dual PWM | Binary Digital |
|--------------------|-----------------------------|----------|----------------|
| Circuit Scheme     | Switched Current Integrator |          | Array Type     |
| Device Count       | 454                         | 464      | 2240           |
| Operation Time(ns) | 256                         | 16       | 320            |
| Energy (pJ/OP)     | 174                         | 350      | 3460           |

Notes: PWM time resolution = 1 ns; integration capacitance = 10 pF; max. integration voltage = 1 V; gate delay = 0.3 ns; switching energy per gate = 2 pJ; supply voltage = 3 V; registers not included.

### 4.7 Applications of the A-D Merged Circuit Architecture

Since the A-D merged circuit is expected to have a linearity of over 9 bits, it can be generally introduced into A-D mixed systems. Multi-input digital processors and analog processors can be effectively replaced by the PWM arithmetic circuits.

Neural processing can be implemented well by the proposed architecture. For example, a block diagram of Kohonen's network is shown in Fig. 16. For simplification, the absolute value of the difference between the input and the reference vectors, in other words the Manhattan distance, is used. The calculation of the Manhattan distance of the PWM data is realized with only one XOR gate, and the summation of the distances is performed with the multi-input SCI. The winner-take-all function is realized with a circuit that finds the widest pulse in parallel. This network is also applicable to feature extraction circuits or to motion estimation circuits for picture coding such as the MPEG II algorithm. Multi-layer neural networks and the Hopfield network are also easily realized with the proposed A-D merged circuit architecture.
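The XOR-based distance calculation can be checked with a small behavioral sketch. It assumes that the two PWM pulses share a common rising edge and are sampled in unit time slots, so that their XOR is high exactly for the difference of their widths. All names are hypothetical, and selecting the smallest summed distance as the winner is an assumption of this sketch; how the circuit maps that choice onto a widest-pulse search is not modeled.

```python
"""Manhattan distance of PWM-coded vectors using XOR, as a behavioral sketch."""


def pwm_wave(width, n_slots):
    """Pulse that rises at slot 0 and stays high for `width` slots."""
    return [1 if t < width else 0 for t in range(n_slots)]


def xor_width(a, b, n_slots=256):
    """Width of the XOR of two edge-aligned pulses, equal to |a - b|."""
    wave_a = pwm_wave(a, n_slots)
    wave_b = pwm_wave(b, n_slots)
    return sum(x ^ y for x, y in zip(wave_a, wave_b))


def manhattan_distance(x, ref, n_slots=256):
    """Sum of the XOR pulse widths, standing in for the multi-input SCI."""
    return sum(xor_width(a, b, n_slots) for a, b in zip(x, ref))


def best_match(x, refs, n_slots=256):
    """Index of the reference vector with the smallest summed distance
    (assumed winner criterion), together with all distances."""
    distances = [manhattan_distance(x, r, n_slots) for r in refs]
    winner = min(range(len(refs)), key=distances.__getitem__)
    return winner, distances


if __name__ == "__main__":
    x = [10, 200, 30]
    refs = [[20, 180, 30], [10, 190, 35], [250, 0, 0]]
    print(best_match(x, refs))
```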

### 5. Future Challenges

Fig. 16 Kohonen network using A-D merged circuit architecture.

The proposed architecture utilizes the PWM signals for its compatibility with conventional CMOS devices. More sophisticated modulation techniques have been developed for telecommunications, where the development objective was to implement efficient data transfer over a channel with limited bandwidth and a low S/N ratio. More sophisticated modulation methods which handle both amplitude and phase, such as multibit QAM (Quadrature Amplitude Modulation), are expected to be introduced into future VLSI's. Thus, telecommunication technologies will resolve the communication crisis in gigantic systems. In introducing these technologies, novel analog circuit techniques are also necessary. Optical communication technologies will also be applied to the long interconnections on future ULSI's [29]. The feature of optical interconnection is freedom from parasitic capacitance. Therefore the transfer speed is as high as the speed of light, without being affected by increases in fan-out and wire length. This concept was proposed as the Ultra Opto Electronic Integrated Circuit (U-OEIC).

In the 21st century, the von Neumann computer will still be employed in electronics, and the multi-GOPS (Giga-Operations per Second) MPU will be realized by the binary digital architecture. However, the digital architecture will be changing to a new one which is suitable for intelligent processing. One candidate is the memory-based architecture, which utilizes functional memories as processing elements and operates with a data-driven scheme. The proposed A-D merged circuit will also be useful for the memory-based architecture, because the ultimate functional memory will be realized with functional devices and will handle analog information.


### 6. Conclusion

The progress in devices and circuits for the existent binary digital and A-D mixed LSI's is reviewed. These techniques are reaching their limits in power dissipation and analog accuracy. The new technological challenges to overcome these limitations are described. By reconsidering the information representation and processing principles, and the circuit operations based on physical law, a new analog-digital merged circuit architecture is proposed. It exploits the PWM signal, which has the properties of both analog and digital signals, and consists of novel PWM circuits which carry out the arithmetic operations, the signal conversions, and the data storage. These PWM circuits are introduced into the A-D merged architecture as the intelligent processing engine, which carries out addition and MAC operations for a large number of inputs. Highly accurate processing is implemented by binary digital circuits. The circuit architecture has the potential to exploit the high-speed switching capability of deep sub-µm devices, and to reduce the number of devices and the power dissipation to one-tenth of those of binary digital circuits. Therefore it will effectively implement intelligent processing systems utilizing 0.5-0.2 µm scaled CMOS devices.

### Acknowledgement

This work was partially supported by the Ministry of Education, Science, Sports, and Culture under a Grant-in-Aid for Scientific Research on Priority Areas, "Ultimate Integration of Intelligence on Silicon Electronic Systems" (Head Investigator: Tadahiro Ohmi, Tohoku University).

#### References

- [1] T. Sugibayashi, I. Naritake, S. Utsugi, K. Shibahara, R. Oikawa, H. Mori, S. Iwao, T. Murotani, K. Koyama, S. Fukuzawa, T. Itani, K. Kasama, T. Okuda, S. Ohya, and M. Ogawa, "A 1 Gb DRAM for file applications," Digest of ISSCC95, pp. 254-255, 1995.
- [2] K. Suzuki, M. Yamashina, T. Nakayama, M. Izumikawa, M. Nomura, H. Igura, H. Heiuchi, J. Goto, T. Inoue, Y. Koseki, H. Abiko, K. Okabe, A. Ono, Y. Yano, and H. Yamada, "A 500 MHz 32 b 0.4 μm CMOS RISC processor LSI," Digest of ISSCC94, pp. 214-215, 1994.
- [3] B. Del Signore, D. Kerth, N. Sooch, and E. Swanson, "A monolithic 20-b delta-sigma A/D converter," IEEE J. of Solid-State Circuits, vol. 25, no. 6, pp. 1311-1317, 1990.

- [4] M. Soda, H. Tezuka, F. Sato, T. Hashimoto, S. Nakamura, T. Tatsumi, T. Suzaki, and T. Tashiro, "Si-analog ICs for 20 Gb/s optical receiver," Digest of ISSCC94, pp. 170-171, 1994.
- [5] S. Wong and A. Salama, "Impact of scaling on MOS analog performance," IEEE J. of Solid-State Circuits, vol. SC-18, no. 1, pp. 106-114, 1983.
- [6] E. Sano, T. Tsukahara, and A. Iwata, "Performance limits of mixed analog/digital circuits with scaled MOSFETs," IEEE J. of Solid-State Circuits, vol. SC-23, no. 4, pp. 942-949, 1988.
- [7] R. Widlar, "Design techniques of monolithic operational amplifier," IEEE J. of Solid-State Circuits, vol. SC-4, no. 8, pp. 184-191, 1969.
- [8] P. Gray and R. Meyer, "MOS operational amplifier design-A tutorial," IEEE J. of Solid-State Circuits, vol. SC-17, no. 12, pp. 969-983, 1982.
- [9] A. Abidi, "Low-power radio-frequency IC's for portable communications," Proc. of IEEE, vol. 83, no. 4, pp. 544-569, 1995.
- [10] J. McCreary and P. Gray, "All-MOS charge redistribution analog-to-digital conversion techniques—part II," IEEE J. of Solid-State Circuits, vol. SC-10, no. 6, pp. 371-379, 1975.
- [11] K. Uchimura, T. Hayashi, T. Kimura, and A. Iwata, "Oversampling A-to-D and D-to-A converters with multistage noise shaping modulators," IEEE Trans. ASSP, vol. 36, no. 12, pp. 1899-1905, 1988.
- [12] Y. Matsuya and J. Yamada, "1 V power supply, low-power consumption A/D conversion technique with swing-suppression noise shaping," IEEE J. of Solid-State Circuits, vol. SC-29, no. 12, pp. 1524-1530, 1994.
- [13] B. Hosticka, R. Brodersen, and P. Gray, "MOS sampled data recursive filters using switched capacitor integrators," IEEE J. of Solid-State Circuits, vol. SC-12, no. 6, pp. 600-608, 1977.
- [14] A. Iwata, H. Kikuchi, K. Uchimura, A. Morino, and M. Nakajima, "A single-chip CODEC with switched-capacitor filters," IEEE J. of Solid-State Circuits, vol. SC-16, no. 4, pp. 315-321, 1981.
- [15] Y. Tsividis, "Integrated continuous-time filter design - an overview," IEEE J. of Solid-State Circuits, vol. SC-29, no. 3, pp. 166-176, 1994.
- [16] J. Alvarez, H. Sanchez, G. Gerosa, and R. Countryman, "A wide-bandwidth low-voltage PLL for powerPC microprocessors," IEEE J. of Solid-State Circuits, vol. SC-30, no. 4, pp. 383-391, 1995.
- [17] T. Lee, K. Donnelly, J. Ho, M. Zerbe, and T. Ishikawa, "A 2.5V delay-locked loop for an 18 Mb 500 MB/s DRAM," Digest of ISSCC94, pp. 300-301, 1994.

- [18] K. Fukuda, S. Maeda, T. Tsukada, and T. Matsuura, "Substrate noise reduction using active guard band filters in mixed-signal integrated circuits," 1995 Symp. on VLSI Circuits, pp. 33-34, 1995.
- [19] K. Yano, T. Yamanaka, T. Nishida, M. Saito, K. Shimohigashi, and A. Shimizu, "A 3.8-ns CMOS 16×16-b multiplier using complementary pass-transistor logic," IEEE J. of Solid-State Circuits, vol. SC-25, no. 2, pp. 388-395, 1990.
- [20] H. Yamauchi, H. Akamatsu, and T. Fujita, "An asymptotically zero power charge-recycling bus architecture for battery-operated ultrahigh data rate ULSI's," IEEE J. of Solid-State Circuits, vol. SC-30, no. 4, pp. 423-431, 1995.
- [21] A. Kramer, J. Denker, S. Avery, G. Dickson, and T. Wik, "Adiabatic computing with the 2N-2N2D logic family," 1994 Symp. on VLSI Circuits, pp. 25-26, 1994.
- [22] T. Ohmi and T. Shibata, "The concept of four-terminal devices and its significance in the implementation of intelligent integrated circuits," IEICE Trans. Electron. vol. E77-C, no. 7, pp. 1032-1041, 1994.
- [23] T. Ohmi, "Integrating intelligence on silicon electronic systems," 1995 Symposium on VLSI Circuits Digest of Technical Papers, pp. 1-2, 1995.
- [24] T. Morie and Y. Amemiya, "An all-analog expandable neural network LSI with on-chip backpropagation learning," IEEE J. of Solid-State Circuits, vol. SC-29, no. 9, pp. 1086-1093, 1994.
- [25] C. Mead and M. Mahowald, "A silicon model of early visual processing," Neural Networks, vol. 1, pp. 91-97, 1988.
- [26] A. Murray, D. Del Corso, and L. Tarassenko, "Pulse-stream VLSI neural networks mixing analog and digital techniques," IEEE Trans. on Neural Networks, vol. 2, no. 2, pp. 193-204, 1991.
- [27] K. Nogami and A. El Gamal, "A CMOS 160 Mb/s phase modulation I/O interface circuits," ISSCC Digest of Technical Papers, pp. 108-109, Feb. 1994.
- [28] T. Yamauchi, Y. Morooka, and H. Ozaki, "A low power and high speed data transfer scheme with asynchronous compressed pulse width modulation for AS-memory," 1995 Symp. on VLSI Circuits, pp. 27-28, 1995.
- [29] A. Iwata and I. Hayashi, "Optical interconnections as a new LSI technology," IEICE Trans. Electron. vol. E76-C, no. 1, pp. 90-99, 1993.



Atsushi Iwata received the B.E., M.S. and Ph.D. degrees in electronics engineering from Nagoya University, Nagoya, Japan, in 1968, 1970, and 1994 respectively. From 1970 to 1993, he was at the Electrical Communications Laboratories, Nippon Telegraph and Telephone Corporation, Atsugi, Japan. Since 1994 he has been a Professor of Electronics Engineering at Hiroshima University, Higashi-Hiroshima, Japan. His research is in the field of integrated circuit design, where his interests have included circuit architecture and design techniques for digital signal processors, analog-to-digital and digital-to-analog converters, analog-digital merged signal processors, Opto-Electronic IC's, and neural network implementations. He received an Outstanding Panelist Award for the 1990 International Solid-State Circuits Conference. He is a member of the Institute of Electrical and Electronics Engineers.



Makoto Nagata received the B.S. and M.S. degrees in physics from Gakushuin University, Tokyo, Japan, in 1991 and 1993, respectively. He is currently a Research Associate at the Research Center for Integrated Systems, Hiroshima University. His main research interests include the development of the analog-digital merged circuit architecture using pulse modulation signals. He is a member of the IEEE.
