Slew-Aware Clock Tree Design for Reliable Subthreshold Circuits Jeremy R. Tolbert, Xin Zhao, Sung Kyu Lim, and Saibal Mukhopadhyay

Similar documents
Robust Subthreshold Circuit Design to Manufacturing and Environmental Variability. Masanori Hashimoto a

Design of Low-Power CMOS Cell Structures Using Subthreshold Conduction Region

Low-Power 1-bit CMOS Full Adder Using Subthreshold Conduction Region

Impact of Local Interconnects on Timing and Power in a High Performance Microprocessor

Bulk CMOS Device Optimization for High-Speed and Ultra-Low Power Operations

Design Methodology for Fine-Grained Leakage Control in MTCMOS

Reduction of Subthreshold Leakage Current in MOS Transistors

Timing-Based Delay Test for Screening Small Delay Defects

DATE 2006 Session 5B: Timing and Noise Analysis

Post-Synthesis Sleep Transistor Insertion for Leakage Power Optimization in Clock Tree Networks

Lecture 10: Efficient Design of Current-Starved VCO

Shannon Expansion Based Supply-Gated Logic for Improved Power and Testability

Comparative Analysis of Voltage Controlled Oscillator using CMOS

A High-Density Subthreshold SRAM with Data-Independent Bitline Leakage and Virtual Ground Replica Scheme

A 2.4 GHZ FULLY INTEGRATED LC VCO DESIGN USING 130 NM CMOS TECHNOLOGY

Viability of Low Temperature Deep and Ultra Deep Submicron Scaled Bulk nmosfets on Ultra Low Power Applications

Statistical Analysis of MUX-based Physical Unclonable Functions

SUPPLEMENTAL MATERIAL

Review Article Device and Circuit Design Challenges in the Digital Subthreshold Region for Ultralow-Power Applications

Chapter VI Development of ISFET model. simulation

Enhancement of VCO Linearity and Phase Noise by Implementing Frequency Locked Loop

TOTAL IONIZING DOSE TEST REPORT

Application of Op-amp Fixators in Analog Circuits

Position Paper: How Certain is Recommended Trust-Information?

5. Low-Power OpAmps. Francesc Serra Graells

VLSI Design, Fall Design of Adders Design of Adders. Jacob Abraham

IN RECENT years, the demand for power-sensitive designs

TOTAL IONIZING DOSE TEST REPORT

A low supply voltage and wide-tuned CMOS Colpitts VCO

Genetically Generated Neural Networks I: Representational Effects

Statistical Analysis of MUX-based Physical Unclonable Functions

Project: Feedback Systems for Alternative Treatment of Obstructive Sleep Apnea

SUBTHRESHOLD CIRCUIT DESIGN AND OPTIMIZATION. Sungil Kim

MPS Baseline proposal

Variation Aware Sleep Vector Selection in Dual Dynamic OR Circuits for Low Leakage Register File Design

A BIST Scheme for Testing and Repair of Multi-Mode Power Switches

Perceptual Plasticity in Spatial Auditory Displays

SSO Simulation with IBIS

Texas A&M University Electrical Engineering Department. ELEN 665 RF Communication Circuits Laboratory Fall 2010

KINGS COLLEGE OF ENGINEERING DEPARTMENT OF MECHANICAL ENGINEERING QUESTION BANK. Subject Name: ELECTRONICS AND MICRIPROCESSORS UNIT I

EXERCISE: HOW TO DO POWER CALCULATIONS IN OPTIMAL DESIGN SOFTWARE

A Multiple-Crystal Interface PLL With VCO Realignment to Reduce Phase Noise

Chapter 1. Introduction

Power Turn-On. IEEE 802.3af DTE Power via MDI March Dieter Knollman AVAYA

Student Performance Q&A:

The Reach Profiler (REAPER): Enabling the Mitigation of DRAM Retention Failures via Profiling at Aggressive Conditions

SO14 IC Serial Output IC For 14 Bits of Output

Total Ionizing Dose Test Report. No. 16T-RT3PE3000L-CG896-QMLPK

Neuromorhic Silicon Neuron and Synapse: Analog VLSI Implemetation of Biological Structures

AUTOMATIONWORX. User Manual. UM EN IBS SRE 1A Order No.: INTERBUS Register Expansion Chip IBS SRE 1A

More skilled internet users behave (a little) more securely

MAC Sleep Mode Control Considering Downlink Traffic Pattern and Mobility

Impact of Particle Mass Distribution on the Measurement Accuracy of Low-Cost PM-Sensors

AND BIOMEDICAL SYSTEMS Rahul Sarpeshkar

Electromyography II Laboratory (Hand Dynamometer Transducer)

E2 Programming Start Up Guide for the M400 drive.

Conditional Distributions and the Bivariate Normal Distribution. James H. Steiger

Analog Design and System Integration Lab. Graduate Institute of Electronics Engineering National Taiwan University

Fig. 1: Typical PLL System

Enzyme Analysis using Tyrosinase. Evaluation copy

TEPZZ 85_Z 4A_T EP A1 (19) (11) EP A1 (12) EUROPEAN PATENT APPLICATION

Assessment of Reliability of Hamilton-Tompkins Algorithm to ECG Parameter Detection

Data processing software for TGI/TGE series

Digital. hearing instruments have burst on the

Signing IBIS model against DDR4 spec

Pitfalls in Linear Regression Analysis

PHYSIOEX 3.0 EXERCISE 33B: CARDIOVASCULAR DYNAMICS

An Energy Efficient Sensor for Thyroid Monitoring through the IoT

7 Grip aperture and target shape

Adaptive Mode Control: A Static-Power-Efficient Cache Design

INFLUENCE OF AFTERBODY SHAPE ANGLE OF TRAPEZOIDAL BLUFF BODY ON MEASURED SIGNAL PARAMETERS

When designing with these products it is important that the Designer take note of the following considerations:

The Effect of Code Coverage on Fault Detection under Different Testing Profiles

CCS Concepts Security and privacy Security in hardware Hardware Application-specific VLSI designs.

PoDL- Decoupling Network Presentation Andy Gardner. May 2014

LATEST TRENDS on SYSTEMS (Volume I) New Seismocardiographic Measuring System with Separate QRS Detection. M. Stork 1, Z. Trefny 2

Implementation of a Multi bit VCO based Quantizer using Frequency to Digital Converter

PLL. The system blocks and their input and output signals are as follows:

Part I. Boolean modelling exercises

PCA Enhanced Kalman Filter for ECG Denoising

Unified Padring Design Flow

Design of the HRV Analysis System Based on AD8232

Current Testing Procedure for Deep Submicron Devices

ULTRA LOW POWER SUBTHRESHOLD DEVICE DESIGN USING NEW ION IMPLANTATION PROFILE. A DISSERTATION IN Electrical and Computer Engineering and Physics

FUSE TECHNICAL REPORT

Test Equipment Depot Washington Street Melrose, MA TestEquipmentDepot.com

PHASE-LOCKED loops (PLLs) are one of the key building

Learning Classifier Systems (LCS/XCSF)

ENVIRONMENTAL REINFORCEMENT LEARNING: A Real-time Learning Architecture for Primitive Behavior Refinement

Genetic Algorithms and their Application to Continuum Generation

Comparability Study of Online and Paper and Pencil Tests Using Modified Internally and Externally Matched Criteria

Mike Davies Director, Neuromorphic Computing Lab Intel Labs

Spreading of Epidemic Based on Human and Animal Mobility Pattern

INVESTIGATION OF ROUNDOFF NOISE IN IIR DIGITAL FILTERS USING MATLAB

IEEE SIGNAL PROCESSING LETTERS, VOL. 13, NO. 3, MARCH A Self-Structured Adaptive Decision Feedback Equalizer

SOLUTIONS Homework #3. Introduction to Engineering in Medicine and Biology ECEN 1001 Due Tues. 9/30/03

What is a stepper motor?

The Long Tail of Recommender Systems and How to Leverage It

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES

Transcription:

Slew-Aware Clock Tree Design for Reliable Subthreshold Circuits Jeremy R. Tolbert, Xin Zhao, Sung Kyu Lim, and Saibal Mukhopadhyay Georgia Institute of Technology, Atlanta, GA, 3332 jeremy.r.tolbert@gatech.edu, {xinzhao, limsk, saibal}@ece.gatech.edu ABSTRACT In the paper, we analyze the effect of clock slew in subthreshold circuits. Specifically, we address the issue that variations in clock slew at the register control can cause serious timing violations. We show that clock slew variations can cause latch timing metrics such as setup, hold and clock-to-q times to deviate by 9% from the design goals. Based on these observations, we recognize the importance of clock slew control in subthreshold circuits. We propose a systematic approach to design the clock tree for subthreshold circuits to reduce the clock slew variations while minimizing the power dissipation in the tree. We show that a tighter nodal capacitance control is necessary to control the slew in a subthreshold clock tree, which can increase the power dissipation. Recognizing that the wire resistances have a negligible effect in subthreshold circuits, we show proper wire sizing is necessary to reduce the clock power. Finally, we propose a dynamic nodal capacitance control technique that allows larger slew at the earlier nets of the tree while controlling it more aggressively near the sink nodes. The combined approach, including the wire sizing and dynamic nodal capacitance control, can achieve better slew control (and better timing control) at lower power in subthreshold circuits. Categories and Subject Descriptors B.8.2 [Performance and Reliability]: General General Terms Performance, Design, Reliability Keywords Subthreshold, Slew, Clocks. I TRODUCTIO Transistors operating in the subthreshold region constitute an attractive technology for ultra-low power mobile applications such as micro-sensors and biomedical devices. When the primary goal is to save energy, subthreshold logic can allow for maximum power reduction by simply reducing the supply voltage. Even though low power is the main focus, it is still innate for the Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISLPED 9, August 9-2, 9, San Francisco, CA, USA. Copyright 9 ACM 978--6558-684-7/9/8 $.. designer to optimize secondary parameters such as robustness and performance. As a result of these efforts, works have been presented to optimize energy-delay product, while performing computations with minimal error []. In addressing the design of an optimal energy-delay subthreshold system, the clock network plays a significant role. Delivering robust clock signals to hundreds or even thousands of flip-flops requires the clock tree to be optimally designed to handle issues of delay, skew, and jitter. In subthreshold, the signal slew also has capacity to affect system performance. Additionally, because the clock network is the work horse of sequential logic, with the highest switching activity, it can contribute up to 4% of the total dynamic power [2]. The same trend is expected when an above threshold system is scaled to subthreshold voltages. Thus, designing a low-power yet robust clock tree is a critical challenge to implement large scale subthreshold systems. This challenge is increasingly difficult because subthreshold designs are always constrained by the requirement of robustness. The inherent variations in process parameters and other sources of random circuit noises severely degrade the robustness of the subthreshold circuits. Hence, the deterministic variation sources (i.e. clock slew) that can be modeled during clock tree design time need to be accurately considered and methods need to be developed to reduce them. The purpose of this work is to analyze the impact of clock slew in subthreshold design and propose a technique for a lowpower slew-controlled clock-tree design. We examine the inherent slew variations in clock tree. We show that the slew variations can contribute to as much as 9% variation in parameters such as setup time, hold time, and clock-to-q. As the slew increases, these timing metrics deviate further from the best case, increasing the risk of violating cycle and hold time margins. By designing a subthreshold clock tree that causes less slew variations at the registers, the timing violations will be reduced, creating a more robust clock design. Experiments in predictive 65nm technology shows that by employing the proposed techniques, it is possible to reduce the clock slew variation induced errors in latch timing while maintaining or reducing the power dissipation of the clock tree. 2. THE IMPORTA TA CE OF SUBTHRESHOLD SLEW CO TROL As the supply voltage of digital circuits is scaled below the device threshold, the characteristics of the transistor change. An immediate observation is that the current in the subthreshold regime has an exponential dependence on gate voltage, threshold voltage, and additional parameters that are functions of the process. This is in contrary to the above threshold design, whose dependence has been noted to be linear or square [3]. As a result 5

3 Sub V TH σ / µ =.239 4 6 8 Slew Variations, ns Load Cap, ff Figure : (Left) Slew variations at the sink nodes of a subthreshold clock tree designed using above threshold concepts. (Right) ormalized output slew contours and their dependence on input slew and total load capacitance for an inverter. of these effects, small variations in the subthreshold regime have been known to follow this exponential dependence, and care has been taken to control these effects [4-8]. In above threshold circuits, proper modeling has given insight into physical device parameters and how they interact with each other. It has been shown that the output slew of a logic gate has a strong dependence on the device dimensions as well as the load capacitance. The input slew of a logic gate can cause the output delay to change in the range of 5-% [9]. More recently, this effect has been designated a concern for flip-flop design in the subthreshold region [8]. The purpose of this section is to demonstrate to the reader the effects of clock slew, and how severely it can impact important timing metrics. When designed with above threshold methods, subthreshold clock trees exhibit significant slew variations at the sink nodes (i.e. the nodes directly connected to the latches) that will increase the probability of timing violations. As an example, the focus of this experimental section will be based on a clock network designed using an above threshold, zero skew clock tree design algorithm, with a slew control method that limits the maximum capacitance driven by each internal and external clock node (i.e. hereafter, referred to as the nodal capacitance). The power supply of this design was then scaled to below the device threshold. In this clock tree, inverting buffers were used to reduced the number of devices and thus save on power. The small scale tree was designed using a 65nm Predictive Technology Model (PTM) with V TP = -378mV and V TN = 429mV. The power supply was 3mV and the clock tree has 267 sink nodes, each driving multiple flip-flops []. 2. Clock Tree Impact on Slew Variations Variations can come from sources of random noise, imperfect device processing, and design choices. While it is possible to corral variations, it is impossible to eliminate them in general. Slew variations are no different, yet little attention has been designated to this problem. Figure (left) depicts the slew distribution at the sink nodes for the subthreshold clock tree described above. The slew at the clock sink nodes is important because they are the control for pipeline registers, which constitute a significant amount of latches and flip-flops in a chip design. The coefficient of variation, CV, is a normalized measure of dispersion of a probability distribution and is defined as the ratio of standard deviation (σ) to mean (µ): σ = µ 8 6 4.5 2.5 CV () 2 3.5 5 Normalized Input Slew 3 4 In the subthreshold clock tree the CV is 23 percent, showing there is a wide distribution of slew. Well controlled CVs are in the range of -5 percent. This distribution of slew shown in Figure (left) corresponds to the slew variation across different sink nodes of the tree. This slew variation is caused by variation in the output slew of the inverter stages of the clock tree. The right plot in Figure shows the effect of input slew and load capacitance on the output slew of an inverter. The line contours represent how the output slew of an inverter has a strong dependence on the input slew and load capacitance []. This is important for clock tree paths, because it is composed of numerous inverters, and the slew and capacitance vary from node to node. For a smaller output slew, a smaller input slew and load capacitance are required. The recovery of the slew (i.e. output slew smaller than the input slew) through an inverter stage is an important consideration for the clock tree path. If not controlled, the slew can become progressively higher (as much as 3X) as signals progress down a clock path. The slew recovery is a strong function of the load capacitance. Figure shows that if the load capacitance is reasonable, the inverter can recover slew even for a large input slew. Hence, controlling the capacitance driven by each node in the clock tree is very important to control slew propagation through the tree and reduce the slew variation at the sink nodes. 2.2 Slew Impact on Timing Violations The primary concern that can arise from clock slew variations is the impact on sequential circuit timing. Robust latches and flip-flops are required to minimize the functionality errors, but their design alone can not account for the impact of clock slew. In this section we will characterize the clock slew impact on timing violations and show its direct influence on a commonly used flip-flop. The operating frequency of a pipelined processor is limited by the logic that can be performed between latches and is defined as the inverse of the cycle time, T: + δ tc q + tlogic t su (2) T + Where t c-q denotes the maximum propagation delay of the register (clock-to-q), t logic is the maximum delay of the combinational logic, t su is the setup time for the registers and δ the clock skew. When the cycle time is not met, functionality errors arise as the result of a computation has not been completed before the next cycle begins. While maximum delay is important, the hold time of the destination register must be shorter than the minimum propagation delay through the network. It is defined as, δ + t hold < tc q, cd + tlogic, cd (3) where the subscript cd refers to the contamination delay, or minimum delay. Essentially, if the hold time of a logic path is violated, the data presented at the destination register will be overwritten by new data, before it has an opportunity to write to the latch. Initially we ignore skew to understand how slew independently impacts the timing metrics. To understand the effect of clock slew we have analyzed how these variations impact the most important timing metrics; setup 6

T + δ tc q + tlogic + tsu (4) Normalization.5.5 t SETUP t HOLD t CQ CD 4 6 8 Slew, ns t CQ Figure 2: Timing variations for a clock tree designed in above threshold lowered to subthreshold. Timing metrics are normalized by the best case value. time, hold time and clock-to-q. The clock slew values obtained from Figure were used to simulate slew variations on a flip-flop (Figure 2). The authors realize that understanding the slew impact is a complex problem [8], so to simplify, the data slew is assumed to be zero to isolate the effects of clock slew. Figure 2 depicts a commonly used flip-flop configuration that can be used in subthreshold operation [2]. The plot above in Figure 2 shows how the slew distribution directly affects each of the four timing metrics described above. They have been normalized since the focus of this study is on the general behavior. In the next subsections we will explain how the overall timing margins are changed by a degrading slew. 2.2. Slew Variations affecting cycle time margins When slew variations are applied to clock, the setup time is directly proportional to it. In severe cases the setup time can vary by 9% worse than the best case achieved. This variation in setup will reduce the time that is available to compute logic, and in some cases will cause errors in the logic by violating the setup requirements. When the worst case clock-to-q (t CQ ) is considered, a similar trend will exist in relation to the clock slew rate. For the given slew distributions, the clock-to-q will be in the range of 25% worse than the best case clock-to-q. Overall, the clock slew distribution shows that there will be a distribution of cycle time as well, based on each logic path. Referring now to (4), designers should consider the setup and clock-to-q variations when designing for a target frequency. The delta denotes that there is a variation in setup and clockto-q times that will ultimately contribute to the cycle time. From our understanding, a smaller slew variation with translate in to smaller setup and clock-to-q variations. By reducing the clock slew variation, this timing metrics will reduce and the cycle time will be better controlled. 2.2.2 Slew Variations affecting hold margins Looking at the hold time curve of Figure 2, we see the curve with a negative slope. This leads us to believe that increasing the clock slew will reduce the hold time. Upon further investigation, it is known than all hold times are negative; thus the smaller ratio of hold time to best case hold time means the hold time at a larger slew is less negative (i.e hold time is increased). The small negative number means the requirement of the hold edge has been pulled to a time closer to the clock edge. This range of hold times can vary as much as 5% of the best case. Much like the worse case clock-to-q, the contamination delay clock-to-q (t CQ CD ) is directly dependent on the slew distribution. For an increasing clock slew, the variation of contamination delay can be as much as 25%. The combination of hold and contamination times will come into play when the hold time margin is considered. In our flip-flop configuration, the hold time and contamination delay increased with increasing clock slew distribution. Using this information and (3), we can rearrange the equation to get a better understanding of the hold requirement. To ensure that a race condition does not occur, the following expression should hold true, even with variations: δ (5) t hold tc q, cd < + tlogic, cd Since the hold time and contamination delay increases with slew it is possible to violate the requirement of (5). If the hold time plus variations are greater than the contamination delay plus variations, a violation occurs and there is a logic failure. Additionally, if the clock tree has been designed with a recognized clock skew, any change in the left hand expression could cause a race condition. If the slew variation is minimal, this will reduce or even eliminate this problem. By considering this point, it further justifies the need for clock slew control in subthreshold design. 2.3 Clock Tree Impact on Timing Violations In subsections 2. and 2.2 we have discussed that () a clock tree design will cause slew variations and (2) slew variations have the potential to cause severe timing violations in subthreshold. In this section, we make the connection that the design of the clock tree contributes to these timing variations. Figures 3 (a) and (b) shows the distribution of the timing metrics that impact cycle time. Figure 3 (a) reiterates the concept that the setup time is worsened by increasing clock slew with a variation of up to 9%. Recall that the clock-to-q trends were similar (Figure 3 (b)), as the clock slew increases the cycle time will increase as well. Figures 3 (c) and (d) summarize the trends on the hold time requirements. In subthreshold, the hold time is increasing (becoming less negative) while the contamination delay is increasing. The subthreshold flip-flop will be more sensitive to clock slews, because it will increase the chance of violating a hold time. 7

3 5 5.5 2 Setup Time Variations 3 3. TECH IQUES FOR LOW POWER SLEW CO TROL I SUBTHRESHOLD The findings from section 2 show that it is necessary to control and reduce the variations associated with clock slew for robust subthreshold operation. The focus of this section will be to investigate techniques to design a clock tree in subthreshold that provide a smaller slew variation. 3. Conventional Methods that limit C MAX Conventional above threshold methods for controlling clock slew currently exist and rely on limiting the maximum nodal capacitance, C MAX a buffer can drive [3-6]. In our analysis of an above threshold clock tree scaled to operate in subthreshold, the C MAX was defined by above threshold methods. In above threshold, there is more available current to drain the charge from the node, therefore we expect a smaller C MAX to be required for optimal subthreshold clock trees. Figure 4(a) shows the results of a clock tree designed in subthreshold (for the same netlist), with varying C MAX. Keep in mind that this figure of merit C MAX, represents the maximum nodal capacitance a node can have, and that in most cases it is less than that. The results show that it is Normalized Power.9 5 5 (a).6.8 Hold Time Variations X Wire 4X Wire 25 Normalization..2.3.4 C Q Variations.8.4 # INVs 25 3 3 4 5 6 C, ff MAX (a) (b) Figure 4: Summary of experimental methods to reduce the slew variations in subthreshold. (a) shows that increasing C MAX increases slew and reduces power. The numbers indicate the fixed C MAX in ff. (b) The impact of C MAX constraint on wire length, power and buffer count..8.6 (b)..2.3.4 C Q CD Variations (c) (d) Figure 3: Timing metric distributions for a clock tree designed in above threshold and then the power supply is scaled to subthreshold voltages. The variations are normalized by the best case scenario. Hold time distributions are for negative hold times. 3 Wire Length Power Figure 5: Distribute RC Line used to model wires possible to better control the slew in subthreshold reducing the average rise slew from 6ns to 3ns. At the same time we are reducing the slew, we are increasing the power by nearly %. This behavior is exhibited because as we reduce the C MAX each node can drive, the total interconnect capacitance remains constant and we need to insert more buffers to compensate for the reduced C MAX. Figure 4 (b) shows this trend of power and number of inverters as a function of C MAX. Additionally, the total wire length of the design has subtle changes, because a small wire will take the place on removed inverters. Note that going from a C MAX of ff to 3fF, the number of inverters decreases by 6%, but the power only decreases by nearly %. A large portion of this is unaffected power comes from the large interconnect capacitance. In summary to design a subthreshold clock tree, we require a smaller C MAX than in above threshold. 3.2 Minimum Wire Width in Subthreshold The wire interconnect contributes to a large portion of the power in a clock network. Reducing the capacitance in interconnect without sacrificing delay in the clock path can help to address this power component. Figure 5 shows the distributed RC line that is often used to model wires. The line of length L is broken into smaller networks each of L, with multiple resistor and capacitive paths. Based on [2] the design rule of thumb is that rc wire delays should only be considered when the line being modeled has reached a critical length, L crit : L crit td >> (6).38rc where td is the gate delay, r the resistance per unit length and c the capacitance per unit length. Based on the above equation, the L crit for a ns gate in 65nm (using the PTM model) should be much greater than 4mm. This value also assumes a minimum wire width to reduce the interconnect capacitance. A recent work showed a 65nm subthreshold chip with a 2.29mm x.86mm area [7]. Using this as a benchmark we could expect a worse case wire length of just over 4mm. Since we expect the length of the wire to be much less than the critical length, it may be possible to neglect the propagation delay across the wire. When L << L crit, the wire delay can respond to changes just as quickly as they arrive on the line. This is possible in subthreshold as the device resistance is much higher than the wire resistance and we are operating at lower frequencies. Since rc wire delays are negligible, all wires can be designed with minimum width as well. In above threshold it is not possible to neglect rc wire delays, therefore the wires are usually larger than minimum width. Additionally, because above threshold circuits operate at higher frequencies, the wire can not respond as quickly to the changes at the driver output. In subthreshold, a decrease in physical wire width will also reduce the interconnect capacitance, because the area of the wire is reduced. 8

Figure 6: Dynamic C MAX selection is the proposed technique to reduce slew variations at the sink nodes, while ensuring the power does not increase. Referring back to Figure 4 (a), we can show how reducing the wire width in a subthreshold clock tree can reduce the power without sacrificing slew. The above threshold clock tree used 4X minimum wires to ensure the rc delays were handled correctly. When this tree is lowered to subthreshold voltages, the 4X wire width only adds capacitance and thus power to the system. This result can allow us to design the clock tree in subthreshold with minimum widths to reduce wire cap and neglect the wire resistance compared to the driver resistance. 3.3 Dynamic C MAX Selection The previous two methods have provided insight into reducing the slew (and variation) while at the same time considering power. While they each have their pros, alone they can not provide an optimal subthreshold clock tree for slew control. As a result, the authors propose a new technique for optimal design in subthreshold, based on regressive C MAX near clock sinks. From the results of Figures 4 (a), we know that we can reduce the power in the clock network by increasing the C MAX that each node drives. At the other end of the spectrum, a smaller C MAX will control slew better. For the purposes of controlling slew, we are most interested in reducing variations at the sink nodes, which directly attach to flip-flops. We still need to control slew at other nodes, but we can relax constraint in order to save power (fewer or smaller buffers). To reduce power while achieving a proper slew at the sinks, C MAX should vary at each level of the clock tree, reducing the closer we get to the sink. Additionally, if we ensure the wire widths as minimum, we should receive more power savings. Figure 6 summarizes the proposed technique and the previous method with fixed C MAX. Figure 7 shows the potential performance that Dynamic C MAX selection has compared to fixed CMAX methods. For the given clock tree, there were ten buffer levels with ascending C MAX (3fF to ff) as clock signal gets closer to the sink node. It is apparent that the axes in the figure represent a power vs. robustness plane. In an optimal case, there is zero slew and zero power, leading us toward the origin. Based on this, curves that are closer to the origin represent a better power-robustness tradeoff. Figure 7 (a) shows the trend for a selection of points, while Figure 7 (b) zooms into a region where slew is small. In Figure 7 (b), we see that it is possible to reduce the power in the clock network by nearly 6% while maintaining the same slew, by employing Dynamic C MAX Selection. Table. shows a comparison of the nodal capacitances at each level of a clock tree employing a fixed vs. dynamic C MAX. The objective is to tighten the constraint of C MAX near the sink node and relax the constraint near the source. The dynamic trees Normalized Power.9.8 5 Dyn C MAX Fixed C MAX 3 4 5 6 25.95 3 3.5 4 4 and 5 were determined by a selected as a random C MAX pattern and shows that it is possible to achieve a power and robustness (slew) improvement. This demonstrates the possibilities of dynamic C MAX selection and the authors there exists a methodology to optimally decide the dynamic C MAX values at different levels to co-optimize the slew variations and clock power. 4. FI AL CLOCK TREE RESULTS To check the results of the combined methods researched above, we have implemented several subthreshold clock trees in 65nm CMOS, based off the PTM Model at V DD = 3mV. Figure 8 (a) represents a plane of transitions as a new technique was employed at each number to reduce the slew variations. The results are depicted for rise slew information, but a similar trend exists for fall slews. We started with an above threshold clock tree (with Fixed C MAX ) that was scaled to subthreshold voltages. The transitions can be described as follows: ( to 2) the C MAX was reduced from 25 to ff and the tree was redesigned using fixed C MAX ; (2 to 3) the wire length was reduced from 4X to X minimum sized, and then redesigned with fixed C MAX, (3 to 4) one possibility for Dynamic C MAX Selection was used, and (4 to 5) another possibility for Dynamic C MAX Selection is used. With a starting point of and ending point of 5, this proves that it is possible to reduce the slew without increasing power in a subthreshold clock tree. Figure 8 (a) and (b) show that our final clock tree, 5, has smaller slew variations compared to the original above threshold design, scaled to subthreshold. Figure 8 (b) shows that the clock tree designed with Dynamic C MAX selection has reduced the variations in slew from the range of 3- ns to the range of 3-6ns. Recall that in Figure 2, a slew of ns would cause a setup time variation of 9% from the design target. By using the dynamic C MAX selection, the slew of 6ns would cause the setup time variations to be only 4% of the design target. This is a 5% reduction in setup time deviations and shows a significant improvement in using Dynamic C MAX Selection over simply using above threshold methods. Lastly, the coefficient of variation for the new clock tree is.579 compared to the larger dispersion of.239. Lastly, Figure 8 (c) shows the final clock tree (5) implemented with the Dynamic C MAX Selection. 5. CO CLUSIO In this paper, we have explained how the design of a subthreshold clock tree can impact the slew variations. These slew variations.9 Dyn C MAX Fixed C MAX (a) (b) Figure 7: Preliminary results of slew control and power savings of Dynamic C MAX Selection. The numbers indicate the fixed C MAX in ff. By traversing the Dynamic C MAX path, we can achieve better slew control at targeted power. 5 9

Tree # FIXED #4 DY #5 DY Normalized Power SOURCE.3.2..9 Table. Summary of Fixed C MAX vs. Dynamic C MAX for combined methods 9 8 7 6 5 4 3 2 SI K have the potential to cause setup and hold time violations that will corrupt the flow of data in a logic path. We showed that for the selected flip-flop, as the slew increases, setup, hold and clock-to-q times worsened. This notion provided the motive to design an optimal subthreshold clock tree with slew control. We conclude that the following guidelines should be used when designing an optimal clock tree in subthreshold with slew control:.) The maximum allowable nodal capacitance should be reduced to control the slew in the subthreshold clock tree at the expense of higher power. 2.) The minimum wire size should be used to design the clock tree as wire resistance is insignificant compared to the device resistance. 3.) The maximum nodal capacitance can be controlled dynamically to allow more slew propagation near the root of the tree while saving power. On the other hand, near the sink-nodes (i.e. leaves) the maximum nodal capacitance should be reduced to better control the slew. We have presented a systematic approach combining the above three guidelines for subthreshold clock tree design that has the potential to reduce the timing metric variations by 5% or more while maintaining the power advantage of subthreshold design. The concept of slew control to reduce the timing errors in subthreshold is very complex and the focus of this study was designated towards controlling clock slew. This work inspires the authors to further investigate the effects of slew in both clock and data. Additionally, the flip-flop design will also have an impact on how severe the timing variations are. By co-designing these efforts, it will be possible to further reduce the variations in slew; thus reducing the probability of timing violations. 6. REFERE CES [] A. Wang et. al, Optimal Supply and Threshold Scaling for Subthreshold CMOS Circuits, Symposium of VLSI, 2 [2] N. Magen, A. Kolodny, et. al, Interconnect-power dissipation in a microprocessor, International Workshop on SLIP, 4 orm. Power 25fF 25fF 25fF 25fF 25fF 25fF 25fF 25fF 25fF 25fF. 5.92ns 3fF 3fF 3fF 3fF 3fF 3fF 3fF 5fF 5fF 5fF.2 4.5ns 3fF 3fF 3fF 3fF 3fF ff ff ff ff ff.97 5.5ns 3 2 4 5 2 4 6 8 Variations 8 6 4 4 6 8 Slew Distribution Scaled Tree Dyn C MAX σ / µ (new) =.579 (a) (b) (c) Figure 8: (a) Summary of combined approaches to reduce slew variations in subthreshold. (b) Dynamic C MAX slew distribution compared with the original clock tree. The new slew distribution has a coefficient of variation of.579, better than the original.239 of Figure. (c) Clock tree (5) routed using Dynamic C MAX and combined methods. Avg. Slew [3] T. Sakurai and A.R. Newton, "Alpha-power model, and its application to CMOS inverter delay and other formulas", IEEE JSSC vol. 25, pp. 584-594, April 99. [4] B. Zhai, S. Hanson, D. Blaauw, D. Slyvester, Analysis and Mitigation of Variability in Subthreshold Design, International Symposium on Low Power Electronic Design, August 5 [5] J. Kwong, A. Chandrakasan, Variation-driven device sizing for minimum energy sub-threshold circuits, ISLPED, October 4-6, 6, Tegernsee, Bavaria, Germany [6] N. Jayakumar, S.P. Khatri, A Variation-tolerant Sub-threshold Design Approach, Design Automation Conference (5), [7] N. Lotze, M. Ortmanns, Y. Manoli, Variability of flip-flop timing at sub-threshold voltages, ISLPED, Bangalore, India, August 8 [8] N. Verma and A. Chandrakasan, Nanometer MOSFET Variation in Minimum Energy Subthreshold Circuits, JSSC January 8 [9] N. Hedenstierna and K.. Jeppson, CMOS circuit speed and buffer optimization, IEEE TCAD, no. 2, pp. 27-28, Mar. 987. [] J. R. Tolbert, S. Mukhopadhyay, Accurate Buffer Modeling with Slew Propagation in Subthreshold Circuits, ISQED, March 9 [] Predictive Technology Model, http://www.eas.asu.edu/~ptm/ [2] J. Rabaey et. al, Digital Integrated Circuits, 3 [3] G. E. Tellez and M. Sarrafzadeh, "Minimal buffer insertion in clock trees with skew and slew rate constraints," IEEE TCAD, vol. 6, no. 4, pp. 333-342, April 997. [4] C. J. Alpert, A. B. Kahng, L. Bao, I. I. Mandoiu, and A. Z. Zelikovsky, "Minimum buffered routing with bounded capacitive load for slew rate and reliability control," IEEE TCAD, vol. 22, no. 3, pp. 24-253, March 3. [5] C. Albrecht, A. B. Kahng, L. Bao, I. I. Mandoiu, and A. Z. Zelikovsky, "On the skew-bounded minimum-buffer routing tree problem," IEEE TCAD, vol. 22, no. 7, pp. 937-945, July 3. [6] S. Hu, C. Alpert, J. Hu, S. Karandikar, Z. Li, W. Shi, and C. Sze, "Fast algorithms for slew-constrained minimum cost buffering," IEEE TCAD, vol. 26, no., pp. 9-22, Nov. 7. [7] J. Kwong and et. al, A 65 nm Sub-Vt Microcontroller With Integrated SRAM and Switched Capacitor DC-DC Converter, IEEE JSSC, January 9