Systolic and Pipelined. Processors

Similar documents
INDIVIDUALIZATION FEATURE OF HEAD-RELATED TRANSFER FUNCTIONS BASED ON SUBJECTIVE EVALUATION. Satoshi Yairi, Yukio Iwaya and Yôiti Suzuki

Stochastic Extension of the Attention-Selection System for the icub

APPLICATION OF THE WALSH TRANSFORM IN AN INTEGRATED ALGORITHM FO R THE DETECTION OF INTERICTAL SPIKES

Instructions For Living a Healthy Life

Joint Iterative Equalization and Decoding for 2-D ISI Channels

Uncoordinated Checkpointing. Rollback-Recovery. The Domino Effect. The Domino Effect. Easy to understand No synchronization overhead Flexible p

Chapter 16. Simple patterns of inheritance

SMARTPHONE-BASED USER ACTIVITY RECOGNITION METHOD FOR HEALTH REMOTE MONITORING APPLICATIONS

Radiation Protection Dosimetry, Volume II: Technical and Management System Requirements for Dosimetry Services. REGDOC-2.7.

Measurement uncertainty of ester number, acid number and patchouli alcohol of patchouli oil produced in Yogyakarta

Objective Find the Coefficient of Determination and be able to interpret it. Be able to read and use computer printouts to do regression.

Multiscale Model of Oxygen transport in Diabetes

HaLoop: Efficient Iterative Data Processing On Large Clusters

OPTIMUM AUTOFRETTAGE PRESSURE IN THICK CYLINDERS

HTGR simulations in PSI using MELCOR 2.2

Simulation of the Human Glucose Metabolism Using Fuzzy Arithmetic

A Brain-Machine Interface Enables Bimanual Arm Movements in Monkeys

Interpreting Effect Sizes in Contrast Analysis

Analytical and Numerical Investigation of FGM Pressure Vessel Reinforced by Laminated Composite Materials

Shunting Inhibition Controls the Gain Modulation Mediated by Asynchronous Neurotransmitter Release in Early Development

Why do we remember some things and not others? Consider

Coach on Call. Thank you for your interest in When the Scale Does Not Budge. I hope you find this tip sheet helpful.

An Ethological and Emotional Basis for Human-Robot Interaction

FUNCTIONAL MAGNETIC RESONANCE IMAGING OF LANGUAGE PROCESSING AND ITS PHARMACOLOGICAL MODULATION DISSERTATION

Numerical Investigations of Open-water Performance of Contra-Rotating Propellers

F1 generation: The first set of offspring from the original parents being crossed. F2 generation: The second generation of offspring.

Structural Safety. Copula-based approaches for evaluating slope reliability under incomplete probability information

The Egyptian Journal of Hospital Medicine (January 2018) Vol. 70, Page

Test Retest and Between-Site Reliability in a Multicenter fmri Study

Realizing Undelayed n-step TD Prediction with Neural Networks

The Probability of Disease. William J. Long. Cambridge, MA hospital admitting door (or doctors oce, or appropriate

Coupling of Substructures for Dynamic Analyses

Word and tree-based similarities for textual entailment

Design of a Fully Differential Current Mode Operational Amplifier with Improved Input-Output Impedances and Its Filter Applications

A Mathematical Model of The Effect of Immuno-Stimulants On The Immune Response To HIV Infection

Protecting Location Privacy: Optimal Strategy against Localization Attacks

Influencing Factors on Fertility Intention of Women University Students: Based on the Theory of Planned Behavior

Collaborative Evaluation of a Fluorometric Method for Measuring Alkaline Phosphatase Activity in Cow s, Sheep s, and Goat s Milk

METHODS FOR DETERMINING THE TAKE-OFF SPEED OF LAUNCHERS FOR UNMANNED AERIAL VEHICLES

GLOBAL PARTNERSHIP FOR YOUTH EMPLOYMENT. Testing What Works. Evaluating Kenya s Ninaweza Program

Analysis on Retrospective Cardiac Disorder Using Statistical Analysis and Data Mining Techniques

Visual stimulus locking of EEG is modulated by temporal congruency of auditory stimuli

Multi View Cluster Approach to Explore Multi Objective Attributes based on Similarity Measure for High Dimensional Data

If your child is poorly

Technical and Economic Analyses of Poultry Production in the UAE: Utilizing an Evaluation of Poultry Industry Feeds and a Cross-Section Survey

Effect of Hydroxymethylated Experiment on Complexing Capacity of Sodium Lignosulfonate Ting Zhou

Predictors of Maternal Identity of Korean Primiparas

OPRA: Optimal Relay Assignment for Capacity Maximization in Cooperative Networks

Nadine Gaab, 1,2 * John D.E. Gabrieli, 1 and Gary H. Glover 2 INTRODUCTION. Human Brain Mapping 28: (2007) r

COHESION OF COMPACTED UNSATURATED SANDY SOILS AND AN EQUATION FOR PREDICTING COHESION WITH RESPECT TO INITIAL DEGREE OF SATURATION

CH 11: Mendel / The Gene. Concept 11.1: Mendel used the scientific approach to identify two laws of inheritance

Heritability of Small-World Networks in the Brain: A Graph Theoretical Analysis of Resting-State EEG Functional Connectivity

Common and Specific Contributions of the Intraparietal Sulci to Numerosity and Length Processing

Abstract. Background and objectives

Probing feature selectivity of neurons in primary visual cortex with natural stimuli.

Adaptive and Context-Aware Privacy Preservation Schemes Exploiting User Interactions in Pervasive Environments

QUEEN CONCH STOCK RESTORATION

Application Of Fuzzy Logic Controller For Intensive Insulin Therapy In Tpye 1 Diabetic Mellitus Patient

Disrupted Rich Club Network in Behavioral Variant Frontotemporal Dementia and Early-Onset Alzheimer s Disease

Relationship of Mammographic Parenchymal Patterns with Breast Cancer Risk Factors and Risk of Breast Cancer in a Prospective Study

Activation of the Caudal Anterior Cingulate Cortex Due to Task-Related Interference in an Auditory Stroop Paradigm

lsokinetic Measurements of Trunk Extension and Flexion Performance Collected with the Biodex Clinical Data Station

NUMERICAL INVESTIGATION OF BVI MODELING EFFECTS ON HELICOPTER ROTOR FREE WAKE SIMULATIONS

sfo ffi FORM TP CARIBBEAN EXAMINATIONS COUNCIL

Two universal runoff yield models: SCS vs. LCM

Advanced Placement Psychology Grades 11 or 12

Re-entrant Projections Modulate Visual Cortex in Affective Perception: Evidence From Granger Causality Analysis

Journal of Applied Science and Agriculture

Frequency Domain Connectivity Identification: An Application of Partial Directed Coherence in fmri

Altered Sleep Brain Functional Connectivity in Acutely Depressed Patients

Stacy R. Tomas, David Scott and John L. Crompton. Department of Recreation, Park and Tourism Sciences, Texas A&M University, USA

Micromethod for the Methyl Red Test

The Engagement of Mid-Ventrolateral Prefrontal Cortex and Posterior Brain Regions in Intentional Cognitive Activity

Evaluation of the accuracy of Lachman and Anterior Drawer Tests with KT1000 ın the follow-up of anterior cruciate ligament surgery

The Neural Signature of Phosphene Perception

BEFORE PRACTICE BE PROACTIVE

Effect of Using and Reusing Melt-Blown, Microwavable Materials to Heat Frozen Fish Filets: Objective and Sensory Perspectives

Equine Biosecurity Principles and Best Practices

Causal Beliefs Influence the Perception of Temporal Order

I Can PresCribE A Drug: Mnemonic-based Teaching of Rational Prescribing

By: Charlene K. Baker, Fran H. Norris, Eric C. Jones and Arthur D. Murphy

Your Golden Guide G to Fundraising Success. Together we can make every moment count for sick kids

An Efficient Method for Correlation of Vapor Pressure of Gaseous Compounds Containing C-H-O

Dietary Assessment in Epidemiology: Comparison of a Food Frequency and a Diet History Questionnaire with a 7-Day Food Record

Structural Brain Magnetic Resonance Imaging of Pediatric Twins

Neural Correlates of Strategic Memory Retrieval: Differentiating Between Spatial-Associative and Temporal-Associative Strategies

Could changes in national tuberculosis vaccination policies be ill-informed?

A Novel Current-Mode Min-Max Circuit

National Institutes of Health, Bethesda, Maryland 4 Genomic Imaging Unit, Department of Psychiatry, University of Würzburg, Germany

PEKKA KANNUS, MD* Downloaded from at on April 8, For personal use only. No other uses without permission.

CHAPTER 1 BACKGROUND ON THE EPIDEMIOLOGY AND MODELING OF BIV/AIDS

The Effects of Rear-Wheel Camber on Maximal Effort Mobility Performance in Wheelchair Athletes

Detecting Marker-Disease Association by Testing for Hardy-Weinberg Disequilibrium at a Marker Locus

Lifespan Psychology. Chapter 3 Biology of Development. Outline. Infertility. Infertility/Reproductive technologies. Heredity/DNA/Genes

An Efficient RLS Algorithm For Output-Error Adaptive IIR Filtering And Its Application To Acoustic Echo Cancellation

The Regional Economics Applications Laboratory (REAL) of the University of Illinois focuses on the development and use of analytical models for urban

Real-Time fmri Using Brain-State Classification

CORE CONSOLIDATION OF HERITAGE STRUCTURE MASONRY WALLS & FOUNDATIONS USING GROUTING TECHNIQUES - CANADIAN CASE STUDIES. Paul A.

Imperfect Vaccine Aggravates the Long-Standing Dilemma of Voluntary Vaccination

The Impact of College Experience on Future Job Seekers Diversity Readiness

Transcription:

Gaduate Institute of Electonics Engineeing, NTU Systolic and Pipelined Why Systolic Pocessos Achitectue? Souces: VLSI Signal Pocessing ÜÜ Ande Hon, Jason Handube, Michelle Gunning, Beman, J. Kim, Heiko Schoede, Hai Tao, Shaaban ACCESS IC LAB

Systolic Pocessos Discussed on Monday, May 26 1. Achitectue fo Matix Matix multiplication 2. Achitectue fo Matix Vecto Multiplication 3. Convolution, one-dimensional and polynomial multiplication 4. Pipelined Systolic matche fo many gene pattens in a long chomosome. Count 100%, 75% and 50 % matches. 5. Systolic pocesso that solves Petick Function, SAT, and SAT with minimal liteal cost poblems. (not shown hee, if you ae inteested, ask Newton fo the solution).

Methodology of designing systolic pocessos. 1. Wite full equation fo esults. 2. Skew the input flows o put inputs to pocessos. 3. The inputs may come fom two diffeent diections, these diections may be opposite 4. Sometimes the pocessos must be sepaated by othe types of pocessos o just delays (do-nothing egistes clocked by geneal clock). Howeve, sometimes skewing input data is sufficient fo this. 5. Daw the stuctue of pocessos. 6. Design data path of each pocesso. 7.Veify the coect opeation. Daw snapshots, use colo pencils. 8. Design the contolle of each pocesso o a SIMD global contolle. Remembe about synchonization. Each pocesso must complete its wok in the exactly same moment (clock pulse). It is somehow simila to Sieve of Eatostenes but has inteesting ealization of logic opeatos and use of andom numbe geneato..

Pipelined Computations Pipelined pogam divided into a seies of tasks that have to be completed one afte the othe. Each task executed by a sepaate pipeline stage Data steamed fom stage to stage to fom computation f, e, d, c, b, a P1 P2 P3 P4 P5

Pipelined Computations Computation consists of data steaming though pipeline stages Execution Time = Time to fill pipeline (P-1) P = # of pocessos N = # of data items (assume P < N) + Time to un in steady state (N-P+1) + Time to empty pipeline (P-1) f, e, d, c, b, a P1 P2 P3 P4 P5 P5 P4 P3 P2 P1 a b a b c a b c d a b c d e a b c d e f c d e f d e f e f f time

Pipelined Example: Sieve of Eatosthenes Goal is to take a list of integes geate than 1 and poduce a list of pimes E.g. Fo input 2 3 4 5 6 7 8 9 10, output is 2 3 5 7 Fan s pipelined appoach (a little diffeent than the book): Pocesso P_i divides each input by the ith pime If the input is divisible (and not equal to the diviso), it is maked (with a negative sign) and fowaded If the input is not divisible, it is fowaded Last pocesso only fowads unmaked (positive) data [pimes]

Sieve of Eatosthenes Pseudo-Code Code fo pocesso Pi (and pime p_i): x=ecv(data,p_(i-1)) If (x>0) then If (p_i divides x and p_i = x ) then send(-x,p_(i+1) If (p_i does not divide x o p_i = x) then send(x, P_(i+1)) Else Send(x,P_(i+1)) / Code fo last pocesso x=ecv(data,p_(i-1)) If x>0 then send(x,output) P2 P3 P5 P7 out

Simple Sting Matching Steam text past set of matches =c1? =c2? =c3? & & &

Matix Vecto Multiplication Conside multiplying a 3x2 X 2x1 matix:

Systolic Aays

Y values goes left, X values go ight, A values fan in T0 T1 T2 T3 T4 T5 T6 T7

The pocesso 1. The pocesso was designed on Monday by students. 2. Ask fo thei notes o design you own. 3. You need to emembe data path design and egiste tansfe design, as well as FSM synthesis, in geneal.

Systolic Aay Example: 3x3 Systolic Aay Matix Multiplication Evey cell is like this: ai bi egiste * clea + egiste Clock to all thee egistes egiste 1. Obseve multiplie/adde combination, so typical 2. Obseve accumulation of esults in-place. Evey cell of final matix is a pocesso. Results emain in these pocessos, next they should be shifted-out, if necessay.

Systolic Aay Example: 3x3 Systolic Aay Matix Multiplication Pocessos aanged in a 2-D gid Each pocesso accumulates one element of the poduct Alignments in time b2,2 b2,1 b1,2 b2,0 b1,1 b0,2 b1,0 b0,1 b0,0 Columns of B Rows of A a0,2 a0,1 a0,0 a1,2 a1,1 a1,0 a2,2 a2,1 a2,0 T = 0 Example souce: http://www.cs.hmc.edu/couses/2001/sping/cs156/

Systolic Aay Example: 3x3 Systolic Aay Matix Multiplication Pocessos aanged in a 2-D gid Each pocesso accumulates one element of the poduct Alignments in time b2,2 b2,1 b1,2 b2,0 b1,1 b0,2 b1,0 b0,1 b0,0 a0,2 a0,1 a0,0 a0,0*b0,0 a1,2 a1,1 a1,0 a2,2 a2,1 a2,0 T = 1 Example souce: http://www.cs.hmc.edu/couses/2001/sping/cs156/

Systolic Aay Example: 3x3 Systolic Aay Matix Multiplication Pocessos aanged in a 2-D gid Each pocesso accumulates one element of the poduct Alignments in time b2,2 b2,1 b1,2 b2,0 b1,1 b0,2 b1,0 b0,1 a0,2 a0,1 a0,0*b0,0 + a0,1*b1,0 a0,0 a0,0*b0,1 b0,0 a1,2 a1,1 a1,0 a1,0*b0,0 a2,2 a2,1 a2,0 T = 2 Example souce: http://www.cs.hmc.edu/couses/2001/sping/cs156/

Systolic Aay Example: 3x3 Systolic Aay Matix Multiplication Pocessos aanged in a 2-D gid Each pocesso accumulates one element of the poduct Alignments in time b2,2 b2,1 b1,2 b2,0 b1,1 b0,2 a0,2 a0,0*b0,0 + a0,1*b1,0 + a0,2*b2,0 a0,1 a0,0*b0,1 + a0,1*b1,1 a0,0 a0,0*b0,2 b1,0 b0,1 a1,2 a1,1 a1,0*b0,0 + a1,1*b1,0 a1,0 a1,0*b0,1 b0,0 a2,2 a2,1 a2,0 a2,0*b0,0 T = 3 Example souce: http://www.cs.hmc.edu/couses/2001/sping/cs156/

Systolic Aay Example: 3x3 Systolic Aay Matix Multiplication Pocessos aanged in a 2-D gid Each pocesso accumulates one element of the poduct Alignments in time b2,2 b2,1 b1,2 a0,0*b0,0 + a0,1*b1,0 + a0,2*b2,0 a0,2 a0,0*b0,1 + a0,1*b1,1 + a0,2*b2,1 a0,1 a0,0*b0,2 + a0,1*b1,2 b2,0 b1,1 b0,2 a1,2 a1,0*b0,0 + a1,1*b1,0 + a1,2*a2,0 a1,1 a1,0*b0,1 +a1,1*b1,1 a1,0 a1,0*b0,2 a2,2 a2,1 b1,0 a2,0*b0,0 + a2,1*b1,0 b0,1 a2,0 a2,0*b0,1 T = 4 Example souce: http://www.cs.hmc.edu/couses/2001/sping/cs156/

Systolic Aay Example: 3x3 Systolic Aay Matix Multiplication Pocessos aanged in a 2-D gid Each pocesso accumulates one element of the poduct Alignments in time b2,2 a0,0*b0,0 + a0,1*b1,0 + a0,2*b2,0 a0,0*b0,1 + a0,1*b1,1 + a0,2*b2,1 a0,2 a0,0*b0,2 + a0,1*b1,2 + a0,2*b2,2 b2,1 b1,2 a1,0*b0,0 + a1,1*b1,0 + a1,2*a2,0 a1,2 a1,0*b0,1 +a1,1*b1,1 + a1,2*b2,1 a1,1 a1,0*b0,2 + a1,1*b1,2 a2,2 b2,0 a2,0*b0,0 + a2,1*b1,0 + a2,2*b2,0 b1,1 a2,1 a2,0*b0,1 + a2,1*b1,1 b0,2 a2,0 a2,0*b0,2 T = 5 Example souce: http://www.cs.hmc.edu/couses/2001/sping/cs156/

Systolic Aay Example: 3x3 Systolic Aay Matix Multiplication Pocessos aanged in a 2-D gid Each pocesso accumulates one element of the poduct Alignments in time a0,0*b0,0 + a0,1*b1,0 + a0,2*b2,0 a0,0*b0,1 + a0,1*b1,1 + a0,2*b2,1 a0,0*b0,2 + a0,1*b1,2 + a0,2*b2,2 b2,2 a1,0*b0,0 + a1,1*b1,0 + a1,2*a2,0 a1,0*b0,1 +a1,1*b1,1 + a1,2*b2,1 a1,2 a1,0*b0,2 + a1,1*b1,2 + a1,2*b2,2 a2,0*b0,0 + a2,1*b1,0 + a2,2*b2,0 b2,1 a2,2 a2,0*b0,1 + a2,1*b1,1 + a2,2*b2,1 b1,2 a2,1 a2,0*b0,2 + a2,1*b1,2 T = 6 Example souce: http://www.cs.hmc.edu/couses/2001/sping/cs156/

Systolic Aay Example: 3x3 Systolic Aay Matix Multiplication Pocessos aanged in a 2-D gid Each pocesso accumulates one element of the poduct Alignments in time a0,0*b0,0 + a0,1*b1,0 + a0,2*b2,0 a0,0*b0,1 + a0,1*b1,1 + a0,2*b2,1 a0,0*b0,2 + a0,1*b1,2 + a0,2*b2,2 Done a1,0*b0,0 + a1,1*b1,0 + a1,2*a2,0 a1,0*b0,1 +a1,1*b1,1 + a1,2*b2,1 a1,0*b0,2 + a1,1*b1,2 + a1,2*b2,2 b2,2 a2,0*b0,0 + a2,1*b1,0 + a2,2*b2,0 a2,0*b0,1 + a2,1*b1,1 + a2,2*b2,1 a2,2 a2,0*b0,2 + a2,1*b1,2 + a2,2*b2,2 T = 7 Example souce: http://www.cs.hmc.edu/couses/2001/sping/cs156/

Hexagonal Systolic Aay fo matix-matix multiplication This is an inteesting modification of pevious design in which movement is in thee diections and the esults ae not stoed but flow out It is you task to design each pocesso in detail

Example Systolic Algoithm: Matix Multiplication Poblem: multiply two nxn matices A ={a_ij} and B={b_ij}. Poduct matix will be R={_ij}. Systolic solution uses 2D aay with NxN cells, 2 input steams and 2 output steams

Systolic Matix Multiplication a44 a34 a24 a14 == == == b41 b42 b43 b44 a43 a33 a23 a13 == == b31 b32 b33 b34 a42 a32 a22 a12 === b21 b22 b23 b24 a41 a31 a21 a11 -- -- -- P11 b11 b12 b13 b14 -- -- -- -- -- P21 P12 -- -- P31 P22 P13 -- -- P41 P32 P23 P14 P42 P33 P24 P43 P34 P44

k, Opeation at each cell Each cell updates at each time step as shown i j below 0 i, j initialized to 0 a i, k b k, j k 1 i, j k i, j + a = i, k b k, j b k, j a i, k

Data Flow fo Systolic MM Beat 1 Beat 2 a11 b11 1 1,1 a 21 a 12 1 2,1 2 1,1 b 21 1 1,2 b 12

Data Flow fo Systolic MM Beat 3 Beat 4 a 31 a 22 1 3,1 a 13 2 2,1 3 1,1 1 2,2 b 31 2 1,2 b 22 1 1,3 b 13 a 41 a 32 1 4,1 a 23 2 3,1 a 14 3 2,1 1 3, 2 4 1,1 2 2,2 b 41 3 1,2 1 2,3 b 32 2 1,3 b 23 1 1,4 b 14

Data Flow fo Systolic MM Beat 5 Beat 6 a 42 a 33 2 4,1 a 24 3 3,1 4 2,1 2 3,2 1,1 3 2,2 4 1,2 2 2,3 b 42 3 1,3 b 33 2 1,4 b 24 a 43 a 34 3 4,1 4 3,1 2,1 1,1 1, 2 3 3,2 4 2,2 3 2,3 4 1,3 b 43 3 1,4 b 34 1 4,2 1 3,3 1 2,4 2 4,2 1 4,3 2 3,3 1 3,4 2 2,4

Data Flow fo Systolic MM Beat 7 Beat 8 1,1 1,1 1, 2 2,1 1, 2 2,1 a 44 4 4,1 3,1 2, 2 1, 3 4 4 3, 2 2,3 3 4,2 2 4,3 3 3,3 2 3,4 3 2,4 4 1,4 b 44 3,1 2, 2 1, 3 4,1 3, 2 4 4,2 2, 3 3 4,3 4 3,3 3 3,4 4 2,4 1,4 1 4,4 2 4,4

Data Flow fo Systolic MM Beat 9 Beats 10 and 11 1,1 1, 2 2,1 2,1 1,1 1, 2 3,1 2, 2 1, 3 4,1 3, 2 2, 3 1,4 3,1 2, 2 1, 3 4,2 3, 3 2, 4 4,1 3, 2 2, 3 4,2 3, 3 2, 4 4 4 4,3 3,4 3 4,4 1,4 2,1 1,1 1, 2 3,1 2, 2 1, 3 2 4,3 3, 4 4 4,4 4,1 2, 3 3, 1,4 4,2 3, 3 2, 4 4,3 3, 4 4,4

Gaduate Institute of Electonics Engineeing, NTU 1. Use this method fo 1D polynomial multiplication 2. Genealize to two dimensions. ÜÜ ACCESS IC LAB

Design B1 Peviously poposed fo cicuits to implement a patten matching pocesso and = fo a cicuit to implement polynomial multiplication. - Boadcast input, move esults, weights stay - (Semi-systolic convolution aays with global data communication

Design B2 = = The path fo moving y i s is wide then w i s because of y i s cay moe bits then w i s in numeical accuacy. The use of multiplie-accumulatos may also help incease pecision of the esult, since exta bit can be kept in these accumulatos with modest cost. Boadcast input, move weights, esults stay [(Semi-) systolic convolution aays with global data communication]

Design F When numbe of cell is lage, the adde can be implemented as a pipelined adde tee to avoid lage delay. Design of this type using unbounded fan-in. - Fan-in esults, move inputs, weights stay - Semi-systolic convolution aays with global data communication

Design R1 Design R1 has the advantage that it dose not equie a bus, o any othe global netwok, fo collecting output fom cells. The basic ideal of this de-sign has been used to implement a patten matching chip. - Results stay, inputs and weights move in opposite diections - Pue-systolic convolution aays with global data communication

Design R2 Multiplie-accumulato can be used effectively and so can tag bit method to signal the output of each cell. Compaed with R1, all cells wok all the time when additional egiste in each cell to hold a w value. - Results stay, inputs and weights move in the same diection but at diffeent speeds - Pue-systolic convolution aays with global data communication

Design W1 This design is fundamental in the sense that it can be natually extend to pefom ecusive filteing. This design suffes the same dawback as R1, only appoximately 1/2 cells wok at any given time unless two independent computation ae inteleaved in the same aay. -Weights stay, inputs and esults move in opposite diection - Pue-systolic convolution aays with global data communication

Design W2 This design lose one advan-tage of W1, the constant esponse time. This design has been extended to implement 2-D convolution, whee high thoughputs athe than fast esponse ae of concen. -Weights stay, inputs and esults move in the same diection but at diffeent speeds - Pue-systolic convolution aays with global data communication

Remaks Above designs ae all possible systolic designs fo the convolution poblem. Using a systolic contol path, weight can be selected onthe-fly to implement intepolation o adaptive filteing. We need to undestand pecisely the stengths and dawbacks of each design so that an appopiate design can be selected fo a given envionment. Fo impoving thoughput, it may be wothwhile to implement multiplie and adde sepaately to allow ovelapping of thei execution. (Such as next page show) When chip pin is consideed, pue-systolic equies fou; semi-systolic equies thee I/O pots.

Ovelapping the executions of multiply-and-add in design W1

Citeia and advantages The design makes multiple use of each input data item Because of this popety, systolic systems can achieve high thoughputs with modest I/O bandwidths fo outside communication. The design uses extensive concuency Concuency can be obtained by pipelining the stages involved in the computation of each single esult, by multipocessing many esults in paallel, o by both.

Citeia and advantages Thee ae only a few types of simple cells To achieve pefomance goals, a systolic system is likely to use a lage numbe of cells which must be simple and of only a few types to cutail design and implementation cost. Data and contol flow ae simple and egula Pue systolic system totally avoid long-distance o iegula wies fo data communication.

Othogonal Tiangulaization.. On-the-fly least-squaes solutions using one and two dimensional systolic aay, with p=4.

paallel mege initial situation: x1 x2 x3 x4 x5 x6 x7...... x17 x18 y1 y2 y3 y4 y5 y6 y7...... y17 y18 1.) sot columns (odd-even-tansposition sot) 2.) sot ows (odd-even-tansposition sot) soted!!!!

0-1 pinciple The 0-1 pinciple states that if all sequences of 0 and 1 ae soted popely than this is a coect sote. The sote must be based on moving data. 0s 1s 0s 0s initially afte vetical sot 1s 0s afte hoizontal sot 1s

MIMD-mesh (clocked) min max Time: 2n

systolic mege 1 3 3 4 5 5 6 7 9 8 8 7 4 4 3 2

systolic mege 1 3 3 4 5 5 6 7 9 8 8 7 4 4 3 2

systolic mege 1 3 3 4 5 5 6 7 9 8 8 7 4 4 3 2

systolic mege 1 3 3 4 5 5 6 7 9 8 8 7 4 4 3 2

systolic mege 1 3 3 4 5 5 6 7 94 84 83 72 49 48 38 27

systolic mege 1 3 3 4 5 5 6 7 4 4 3 2 9 8 8 7

systolic mege 1 3 3 4 4 4 3 2 5 5 6 7 9 8 8 7

systolic mege 1 3 3 4 4 4 3 2 5 5 6 7 9 8 8 7

systolic mege 1 3 3 2 4 4 3 4 5 5 6 7 9 8 8 7

systolic mege 1 3 3 2 4 4 3 4 5 5 6 7 9 8 8 7

systolic mege 1 3 3 2 4 4 3 4 5 5 6 7 9 8 8 7

systolic mege 1 3 2 3 4 3 4 4 5 5 6 7 9 8 8 7

systolic mege 1 3 2 3 4 3 4 4 5 5 6 7 9 8 8 7

systolic mege 1 2 3 3 3 4 4 4 5 5 6 7 9 8 8 7

systolic mege 1 2 3 3 3 4 4 4 5 5 6 7 9 8 8 7

systolic mege 1 2 3 3 3 4 4 4 5 5 6 7 9 8 8 7

systolic mege 1 2 3 3 3 4 4 4 5 5 6 7 8 9 7 8

systolic mege 1 2 3 3 3 4 4 4 5 5 6 7 8 9 7 8

systolic mege 1 2 3 3 3 4 4 4 5 5 6 7 8 7 9 8

systolic mege 1 2 3 3 3 4 4 4 5 5 6 7 8 7 9 8

systolic mege systolic mege 1 2 3 3 3 4 4 4 5 5 6 7 7 8 8 9 soted!!!

Poblems to think about 1. Compae these designs with sotes shown ealie in class 2. Use these ideas to edesign sotes/absobes and sotes that emove epeated elements. 3. Think about othe applications of the ideas shown hee fo sot and mege.

Applications of Systolic Aay - Signal and image pocessing: FTR, IIR filteing, and 1-D convolution. 2-D convolution and coelation. Discete Fuie tansfom Intepolation 1-D and 2-D median filteing Geometic waping

Applications of Systolic Aay - Matix aithmetic: Matix-vecto multiplication Matix-matix multiplication Matix tiangulaization (solution of linea systems, matix invesion) QR decomposition (eigenvalue, least-squae computation) Solution of tiangula linea systems

Applications of Systolic Aay - Non-numeic applications: Data stuctue Gaph algoithm Language ecognition Dynamic pogamming Encode (polynomial division) Relational data-base opeations