High Performance Memory Communication Architectures for Coarse-grained Reconfigurable Computing Systems
Ph.D. Thesis

Table of Contents

Abstract V

Table of Contents VII

1. Introduction 1

2. Coarse-grained Configurable Architectures 5

2.1 The DP-FPGA Architecture 6

2.2 The KressArray-1 7

2.3 The RaPiD Architecture 9

2.4 The MATRIX Architecture 10

2.5 The Raw Machine 13

2.6 The Pleiades System 14

2.7 The KressArray-3 15

2.8 MorphoSys Reconfigurable Cell Array 17

2.9 The CHESS Array 19

2.10 The DReAM Array 21

2.11 Conclusions 22

3. Memory Technologies 25

3.1 Current Memory Devices Suitable to Form Large Data Memories 25

3.1.1 Conclusions 27

3.2 The Data Memory Organization of Reconfigurable Systems 28

3.2.1 Reconfigurable System With External Memory 29

3.2.2 Reconfigurable Architecture With Virtual Internal Memory 31

3.2.3 Reconfigurable Architecture With Embedded Memory 31

3.2.4 Reconfigurable Architecture Surrounded by Memory 34

3.2.5 Conclusions 35

4. Earlier Address Generators 37

4.1 The Structured Memory Access Machine 38

4.2 The Address Generator of the Map-oriented Machine 1 42

4.3 A Synthesis Method for Address Generators 43

4.4 The Address Generator of the Map-oriented Machine 2 44

4.5 The Video Signal Processor 47

4.6 The Address Generator of the Map-oriented Machine 3 51

4.7 The Texas Instruments TMS320C54x DSP 57

4.8 The Adopt Project 62

4.9 The Intersil HSP45240 Address Sequencer 66

4.10 Conclusions 70

5. Reconfigurable Systems Using Data Sequencing 73

5.1 The PRISM-II System 75

5.2 Riley-2 76

5.3 An EPLD Based Transient Recorder for Video Signals 77

5.4 The RaPiD Data Sequencing Method 79

5.5 The PAR-1 79

5.6 The REACT System 81

5.7 The CHAMP Architecture 82

5.8 The Map-oriented Machine 1 83

5.9 The Map-oriented Machine 2 83

5.10 The Map-oriented Machine 3 84

5.11 Conclusions 87

6. A Novel Data Sequencing Concept 89

6.1 The Basic Data Sequencing Principles 90

6.1.1 The Basic Xputer Architecture 90

6.1.2 A Two-Level Address Generation Method 92

6.1.3 The Principal Data Sequencer Architecture 94

6.2 Address Generation with the Slider Method 96

6.2.1 The Video Scan 98

6.2.2 The Compound Scan 104

6.2.3 The Nested Scan 105

6.2.4 The Meshed Scan 106

6.2.5 The Complex Scan 107

6.2.6 Summary of the Classification 107

6.3 The Hardware Implementation of the Slider Method 109

6.3.1 The Stepper 109

6.3.2 The One-dimensional Generic Address Generator 111

6.3.3 The Two-dimensional Video Scan Generator 113

6.4 A Stack Mechanism for the Generation of Complex Address Sequences 114

6.5 The Data Sequencer Mapped to the KressArray 121

6.5.1 Memory Communication Models for the KressArray 123

6.5.2 The KressArray Implementation of the Data Sequencer 125

6.5.3 The Generation of Application Specific Data Sequencers 133

6.6 Chapter Summary 142

7. Data Sequencer Use for Higher Memory Bandwidth 145

7.1 The Memory Architecture 147

7.1.1 Row Major Mapping 148

7.1.2 Parallel Memory Banks 148

7.1.2.1 Row Major Mapping With Parallel Memory Banks 149

7.1.2.2 The Dynamic Assignment of Relative Scan Window Positions to Parallel Memory Banks 150

7.1.2.3 A Mapping Scheme to Solve Data Locality Coherence Problems 151

7.1.3 Consequences of the Presented Memory Organization 152

7.1.4 The Hardware Level Support for Memory Access Optimization 153

7.1.4.1 Concurrent Access to Parallel Memory Banks 153

7.1.4.2 The Scan Window Overlap Optimization 155

7.1.4.3 The Burst Access Optimization 158

7.1.4.4 Summary of Hardware Level Optimizations 160

7.2 Loop Transformations 162

7.2.1 Inner Scan Line Loop Unrolling 162

7.2.2 Scan Line Unrolling 163

7.3 The Modification of Storage Schemes 165

7.3.1 The Low Level Storage Scheme Modification 166

7.3.2 The High Level Storage Scheme Modification 168

7.4 Scheduling of Memory Accesses 174

7.4.1 The Optimum Data Access Scheduling 175

7.4.2 Scheduling Trade-off 177

7.5 The Optimization Method 179

7.6 Chapter Summary 181

8. An Application Example 183

8.1 The Linear Filter Application 184

8.1.1 The Design Specification Using the XMDS 186

8.1.2 MoPL Description of the Merged Buffer Linear Filter Application 188

8.1.3 Required Memory Cycles of the Merged Buffer Linear Filter Design 189

8.2 The Hardware Level Memory Access Optimization of the Merged Buffer Linear Filter Application 189

8.3 The Software Level Memory Access Optimization of the Merged Buffer Linear Filter Application 192

8.4 Memory Access Optimization Results for the Parallelized Merged Buffer Linear Filter Application 201

8.5 Data Sequencer Performance Evaluation of the Parallelized Merged Buffer Linear Filter Application for the CPLD-based Data Sequencer Solution 202

8.6 A Parallelized Linear Filter Implementation Using the KressArray-3 207

8.6.1 Example Implementation With Datapath Units of Conventional Functionality 209

8.6.2 Example Implementation With Highly Integrated Multi-function Datapath Units 215

8.6.3 The Linear Filter Example Mapping Results 217

8.7 Chapter Summary 217

9. Conclusions 219

Appendix

A. A Development Framework for Data Sequencers 225

A.1 The Internet-based Implementation 225

A.2 The Xputer Multimedia Development System Tools 229

A.3 Chapter Summary 238

B. The MoPL-3 Grammar 241

B.1 Program Definition 241

B.2 Boundary Declarations 241

B.3 Scan Window Declarations 242

B.4 rALU Set-up Declarations 242

B.5 Scan Pattern Declarations 244

B.6 Scan Statement Declarations 245

B.7 Scan Action Declarations 246

B.8 Expression Declarations 248

B.9 Lexical Declarations 249

B.10 Common Production Rules 250

C. Multibank DRAM Technology 253

C.1 The Internal Structure and Interfaces 253

C.2 The MDRAM Commands and Functionality 255

C.3 The Memory Refresh 259

D. The Map-oriented Machine with Parallel Data Access 261

D.1 The MoM-PDA Overall Architecture 262

D.2 The MoM-PDA Data Sequencer 263

D.2.1 The MoM-PDA Handle Position Generation 264

D.2.2 The MoM-PDA Scan Window Generation 269

D.2.3 The MoM-PDA Memory Mapping 273

D.2.4 Interfacing Multibank DRAM 275

D.2.5 Multitasking 281

D.3 The Hardware Components of the MoM-PDA 282

D.3.1 The PCI Interface Board 282

D.3.2 The MoM-PDA Board 283

D.3.2.1 The Data Sequencer 284

D.3.2.2 The Burst Control Unit 286

D.3.2.3 The Reconfigurable ALU Port 287

D.3.3 The KressArray Emulator 293

Acknowledgments 295

Curriculum Vitae 297

List of Figures 299

List of Tables 311

List of Definitions 313

List of Symbols and Acronyms 315

References 325

Index 347