Page last updated on 2021 March 21

*Enrollment code:* 13185

*Prerequisite:* ECE 254A (can be waived, but ECE 154B is required)

*Class meetings:* None (3-4 hours of pre-recorded lectures per week)

*Instructor:* Professor Behrooz Parhami

*Open office hours:* MW 10:00-11:00 (via Zoom, link to be provided)

**Course announcements:** Listed in reverse chronological order

**Course calendar:** Schedule of lectures, homework, exams, research

**Homework assignments:** Five assignments, worth a total of 40%

**Exams:** None for winter 2021

**Research paper and poster:** Worth 60%

**Research paper guidelines:** Brief guide to format and contents

**Poster presentation tips:** Brief guide to format and structure

**Policy on academic integrity:** Please read very carefully

**Grade statistics:** Range, mean, etc. for homework and exam grades

**References:** Textbook and other sources (Textbook's web page)

**Lecture slides:** Available on the textbook's web page

**Miscellaneous information:** Motivation, catalog entry, history

In addition to submitting your research poster on 3/10, please also include it as an appendix to your completed PDF paper, which will be due on Wednesday, March 17 (any time). This will allow me to deal with a single document in assessing all aspects of your research. The March 17 deadline is firm, because I will need about a week for careful reading and comparative assessment of your research papers, before my 3/23 grading deadline.

A possible venue for publishing your research paper is the 32nd IEEE Int'l Conf. Application-Specific Systems, Architectures, and Processors (ASAP 2021), held virtually from July 7 to July 9, 2021. Abstract submission deadline is March 29 and full papers are due April 5.

I will be talking in an upcoming IEEE Central Coast Section Zoom meeting, as an IEEE Distinguished Lecturer, under the title "Eight Key Ideas in Computer Architecture from Eight Decades of Innovation": Wednesday, March 17, 2021, 6:30 PM PDT. [Registration link (free for everyone)]

IEEE Central Coast Section is also sponsoring a Video-Essay Contest for students with cash prizes, but this one is limited to IEEE student members. [Details & Registration]

A just-published article in

I will maintain my MW 10:00-11:00 AM Zoom office hours through the end of the quarter (March 17).

[P.S.: UCSB talk of possible interest: Peng Gu, PhD defense, "Architecture Supports and Optimizations for Memory-Centric Processing System," Zoom Meeting: https://ucsb.zoom.us/j/7147136139; Meeting ID 714 713 6139; Passcode 123456]

Course lectures and homework assignments have been scheduled as follows. This schedule will be strictly observed; in particular, no extensions are possible for homework due dates, so please start work on the assignments early. Each lecture covers topics in 1-2 chapters of the textbook. Chapter numbers are provided in parentheses after the day and date. PowerPoint and PDF files of the lecture slides can be found on the textbook's web page.

**Day & Date (book chapters) Lecture topic [HW posted/due] {Research milestones}**

M 01/04 (1) Introduction to parallel processing Lecture 1

W 01/06 (2) A taste of parallel algorithms [HW1 posted; chs. 1-4] Lecture 2

M 01/11 (3-4) Complexity and parallel computation models Lecture 3

W 01/13 (5) The PRAM shared-memory model and basic algorithms {Research topics issued} Lecture 4

M 01/18 No lecture: Martin Luther King Holiday [HW1 due] [HW2 posted; chs. 5-6C]

W 01/20 (6A) More shared-memory algorithms {Research topic preferences due} Lecture 5

M 01/25 (6B-6C) Shared memory implementations and abstractions {Research assignments} Lecture 6

W 01/27 (7) Sorting and selection networks [HW2 due] [HW3 posted; chs. 7-8C] Lecture 7

M 02/01 (8A) Search acceleration circuits Lecture 8

W 02/03 (8B-8C) Other circuit-level examples Lecture 9

M 02/08 (9) Sorting on 2D mesh or torus architectures [HW3 due] Lecture 10

W 02/10 (10) Routing on 2D mesh or torus architectures {Preliminary references due} Lecture 11

M 02/15 No lecture: President's Day Holiday [HW4 posted; chs. 9-12]

W 02/17 No lecture: Research focus week

M 02/22 (11-12) Other mesh/torus concepts Lecture 12

W 02/24 (13) Hypercubes and their algorithms [HW4 due] [HW5 posted; chs. 13-16] Lecture 13

M 03/01 (14) Sorting and routing on hypercubes {Final references & provisional abstract due} Lecture 14

W 03/03 (15-16) Other interconnection architectures Lecture 15

M 03/08 (17) Network embedding and task scheduling [HW5 due] Lecture 16

W 03/10 No lecture: Research focus {PDF of research poster due}

W 03/17 {Research paper PDF file due; include submitted poster as an appendix}

T 03/23 {Course grades due by midnight}

- Because solutions will be handed out on the due date, no extension can be granted.

- Include your name, course name, and assignment number at the top of the first page.

- If homework is handwritten and scanned, make sure that the PDF is clean and legible.

- Although some cooperation is permitted, direct copying will have severe consequences.

**Homework 1: Introduction, complexity, and models** (due M 2021/01/18, 10:00 AM)

Do the following problems from the textbook or defined below: 1.9, 1.16, 1.19, 2.6, 3.8, 4.5

A multiprocessor is built out of 5-GFLOPS processors. What is the maximum allowable sequential fraction of a job in Amdahl's formulation if the multiprocessor's performance is to exceed 1 TFLOPS?
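As a quick aid for experimenting with Amdahl's formulation, here is a minimal helper (a sketch; the function name and sample numbers are illustrative, not part of the assignment). Note that exceeding 1 TFLOPS with 5-GFLOPS processors requires a speedup greater than 200.

```python
def amdahl_speedup(f, p):
    """Amdahl's law: speedup on p processors of a job whose
    sequential (unparallelizable) fraction is f."""
    return 1.0 / (f + (1.0 - f) / p)

# Example: a 10% sequential fraction caps speedup well below p.
print(amdahl_speedup(0.10, 16))    # 6.4
print(amdahl_speedup(0.10, 1024))  # approaches the limit 1/f = 10 as p grows
```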

Show that the asymptotic time complexity for the parallel addition of

**Homework 2: Shared-memory parallel processing** (due W 2021/01/27, 10:00 AM)

Do the following problems from the textbook or defined below: 5.4, 5.8c, 6.3, 6.14 (corrected), 16.19, 16.25

Consider the butterfly memory access network depicted in Fig. 6.9, but with only 8 processors and 8 memory banks, one connected to each circular node in columns 0 and 3.

a. Show that there exist permutations that are not realizable.

b. Show that the shift-permutation, where processor
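For part a, a counting argument can be checked by brute force. The sketch below models the 8×8 butterfly as three stages of four 2×2 switches, each set straight or cross; the pairing rule (stage s pairs rows differing in bit s) is an assumption about the layout of Fig. 6.9, but the count-versus-8! comparison does not depend on it.

```python
from itertools import product
from math import factorial

def butterfly_permutations():
    """Enumerate the input-to-output mappings realizable by an 8x8
    butterfly modeled as 3 stages of four 2x2 switches."""
    perms = set()
    for setting in product((0, 1), repeat=12):  # one straight/cross bit per switch
        rows = list(range(8))  # rows[i] = current row of the message from input i
        for s in range(3):
            bit = 1 << s
            # Switch index within the stage: the row number with bit s squeezed out.
            rows = [r ^ bit
                    if setting[4 * s + (((r >> (s + 1)) << s) | (r & (bit - 1)))]
                    else r
                    for r in rows]
        perms.add(tuple(rows))
    return perms

realizable = butterfly_permutations()
print(len(realizable), "of", factorial(8))  # far fewer than 8! = 40320
```

Since at most 2^12 = 4096 switch settings exist while there are 8! = 40320 permutations of eight inputs, some permutations must be unrealizable.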

Show that the diameter of a

A Hoffman-Singleton graph is built from ten copies of a 5-node ring as follows. The rings are named P0-P4 and Q0-Q4. The nodes in each ring are labeled 0 through 4 and interconnected as a standard (bidirectional) ring. Additionally, node

[According to Eric Weisstein's World of Mathematics, this graph was described by Robertson 1969, Bondy and Murty 1976, and Wong 1982, with other constructions given by Benson and Losey 1971 and Biggs 1993.]

a. Determine the node degree of the Hoffman-Singleton graph.

b. Determine the diameter of the Hoffman-Singleton graph.

c. Prove that the Hoffman-Singleton graph is a Moore graph.

d. Show that the Hoffman-Singleton graph contains many copies of the Petersen graph.

**Homework 3: Circuit model of parallel processing** (due M 2021/02/08, 10:00 AM)

Do the following problems from the textbook or defined below: 7.3, 7.7, 7.20, 8.6, 8.13, 8.21

Let

a. Prove or disprove:

b. Prove or disprove:

In the divide-and-conquer scheme of Fig. 8.7 for designing a parallel prefix network, one may observe that all but one of the outputs of the right block can be produced one time unit later without affecting the overall latency of the network. Show that this observation leads to a linear-cost circuit for

[Ladn80] Ladner, R.E. and M.J. Fischer, "Parallel Prefix Computation,"
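The divide-and-conquer recurrence of Fig. 8.7 (compute prefixes of the two halves independently, then fold the left half's final result into every right-half output) can be sketched as follows. This sequential sketch only illustrates the data dependence the network exploits; it is not a circuit description.

```python
def prefix_sums(x):
    """Divide-and-conquer parallel prefix (Ladner-Fischer style),
    computed sequentially for illustration."""
    if len(x) == 1:
        return list(x)
    mid = len(x) // 2
    left = prefix_sums(x[:mid])    # prefixes of the left half
    right = prefix_sums(x[mid:])   # prefixes of the right half
    # Combine: every right-half output also absorbs the left half's total.
    return left + [left[-1] + r for r in right]

print(prefix_sums([1, 2, 3, 4, 5]))  # [1, 3, 6, 10, 15]
```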

**Homework 4: Mesh/torus-connected parallel computers** (due W 2021/02/24, 10:00 AM)

Do the following problems from the textbook or defined below: 9.3, 9.9, 10.17, 11.2, 11.22, 12.2

An

a. Find the routing time and required queue size with store-and-forward row-first routing.

b. Discuss the routing time with column-first wormhole routing, assuming 20-flit packets.

We used the odd-even reduction method to solve a tridiagonal system of

a. What would the complexity of odd-even reduction be on an

b. Can you make your algorithm more efficient by using fewer processors?
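As a refresher on the method itself, here is a sequential sketch of odd-even (cyclic) reduction: each even-indexed equation absorbs its odd-indexed neighbors, the half-size system over the even unknowns is solved recursively, and the odd unknowns are recovered by back-substitution. This illustrates the elimination pattern only, not the processor mapping the problem asks about; the function name is made up.

```python
def solve_tridiagonal_cr(a, b, c, d):
    """Odd-even (cyclic) reduction for a tridiagonal system.
    Equation i is a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i],
    with a[0] = c[-1] = 0.  Returns the solution as a list."""
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]
    # Eliminate odd-indexed unknowns from each even-indexed equation.
    ea, eb, ec, ed = [], [], [], []
    for i in range(0, n, 2):
        ai, bi, ci, di = a[i], b[i], c[i], d[i]
        if i > 0:                      # absorb equation i-1
            m = -ai / b[i - 1]
            ai = m * a[i - 1]
            bi += m * c[i - 1]
            di += m * d[i - 1]
        else:
            ai = 0.0
        if i < n - 1:                  # absorb equation i+1
            m2 = -ci / b[i + 1]
            ci = m2 * c[i + 1]
            bi += m2 * a[i + 1]
            di += m2 * d[i + 1]
        else:
            ci = 0.0
        ea.append(ai); eb.append(bi); ec.append(ci); ed.append(di)
    xe = solve_tridiagonal_cr(ea, eb, ec, ed)  # solve the even-indexed system
    x = [0.0] * n
    for j, i in enumerate(range(0, n, 2)):
        x[i] = xe[j]
    # Back-substitute the odd-indexed unknowns.
    for i in range(1, n, 2):
        s = d[i] - a[i] * x[i - 1]
        if i < n - 1:
            s -= c[i] * x[i + 1]
        x[i] = s / b[i]
    return x
```

Each reduction step halves the system, which is what gives the method its logarithmic depth when the eliminations within a step are performed in parallel.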

**Homework 5: Hypercubic & other parallel computers** (due M 2021/03/08, 10:00 AM)

Do the following problems from the textbook or defined below: 13.3, 13.17, 14.10, 15.7, 15.19, 16.7abc

a. Show that a 2D multigrid whose base is a 2^(

b. Show that the 21-node 2D multigrid with a 4×4 base can be embedded in a 5-cube with dilation 2 and congestion 1 but that the 21-node pyramid cannot.

Show edge-disjoint paths for each of the following permutations of the inputs (0, 1, 2, 3, 4, 5, 6, 7) on an 8-input Benes network.

a. (6, 2, 3, 4, 5, 1, 7, 0)

b. (5, 4, 2, 7, 1, 6, 3, 0)

c. (1, 7, 3, 4, 2, 6, 0, 5)

The following sample exams, using problems from the textbook, are meant to indicate the types and levels of problems rather than the coverage (which is outlined in the course calendar). Students are responsible for all sections and topics in the textbook, lecture slides, and class handouts that are not explicitly excluded in the study guide following the sample exams, even if the material was not covered in class lectures.

**Sample Midterm Exam (105 minutes)** (Chapters 8A-8C do not apply to this year's midterm)

Textbook problems 2.3, 3.5, 5.5 (with

*Midterm Exam Study Guide*

The midterm exam covers Chapters 1-7 of the textbook, including the three new chapters 6A-6C (which expand on Chapter 6), with the following sections excluded:

3.5, 4.5, 4.6, 6A.6, 6B.3, 6B.5, 6C.3, 6C.4, 6C.5, 6C.6, 7.6

**Sample Final Exam (150 minutes)** (Chapters 1-7 do not apply to this year's final)

Textbook problems 1.10, 6.14, 9.5, 10.5, 13.5a, 14.10, 16.1; note that problem statements might change a bit for a closed-book exam.

*Final Exam Study Guide*

The final exam covers Chapters 8A-19 of the textbook, with the following sections excluded:

8A.5, 8A.6, 8B.2, 8B.5, 8B.6, 9.6, 11.6, 12.5, 12.6, 13.5, 15.5, 16.5, 16.6, 17.1, 17.2, 17.6, 18.6, 19

Here are a handful of general references on machine learning and associated hardware architectures.

[Jord15] M. I. Jordan and T. M. Mitchell, "Machine Learning: Trends, Perspectives, and Prospects," *Science*, Vol. 349, No. 6245, 2015, pp. 255-260.

[Shaf18] M. Shafique et al., "An Overview of Next-Generation Architectures for Machine Learning: Roadmap, Opportunities and Challenges in the IoT Era," *Proc. Design Automation & Test Europe Conf.*, 2018, pp. 827-832.

[Kham19] A. Khamparia and K. M. Singh, "A Systematic Review on Deep Learning Architectures and Applications," *Expert Systems*, Vol. 36, No. 3, 2019, p. e12400.

Important concerns in machine learning have to do with direct & indirect impacts on humans: Fairness, bias, and security/reliability/safety. We will tie our studies into "UCSB Reads 2021" by having each student include a section entitled "Bias and Fairness Considerations" in his/her paper. A free copy of the "UCSB Reads 2021" selection, cited below, will be provided to each student. I have also included in the following an introductory reference on bias and fairness issues in machine learning.

[Khan18] P. Khan-Cullors and A. Bandele, *When They Call You a Terrorist: A Black Lives Matter Memoir*, St. Martin's Press, 2018 ("UCSB Reads 2021" book selection). [My book review on GoodReads]

[Mehr19] N. Mehrabi, F. Morstatter, N. Saxena, K. Lerman, and A. Galstyan, "A Survey of Bias and Fairness in Machine Learning," arXiv document 1908.09635, 2019/09/17.

Here is a list of research paper titles. In topic selection, you can either propose a topic of your choosing, which I will review and help you refine into something that would be manageable and acceptable for a term paper, or you can give me your first to third choices among the topics that follow by the "Research topic preferences due" date. I will then assign a topic to you within a few days, based on your preferences and those of your classmates. Sample references follow the titles to help define the topics and serve as starting points.

1. Parallel Processing in Machine Learning Using Tabular Schemes (Assigned to: **TBD**)

[Jian18] H. Jiang, L. Liu, P. P. Jonker, D. G. Elliott, F. Lombardi, and J. Han, "A High-Performance and Energy-Efficient FIR Adaptive Filter Using Approximate Distributed Arithmetic Circuits," *IEEE Trans. Circuits and Systems I*, Vol. 66, No. 1, 2018, pp. 313-326.

[Parh19] B. Parhami, "Tabular Computation," In *Encyclopedia of Big Data Technologies*, S. Sakr and A. Zomaya (eds.), Springer, 2019.

2. Parallel Processing in Machine Learning Using Neuromorphic Chips (Assigned to: **Trenton Rochelle**)

[Gree20] S. Greengard, "Neuromorphic Chips Take Shape," *Communications of the ACM*, Vol. 63, No. 8, July 2020, pp. 9-11.

[Sung18] C. Sung, H. Hwang, and I. K. Yoo, "Perspective: A Review on Memristive Hardware for Neuromorphic Computation," *J. Applied Physics*, Vol. 124, No. 15, 2018, p. 151903.

3. Parallel Processing in Machine Learning Using Biological Methods (Assigned to: **Jennifer E. Volk**)

[Bell19] G. Bellec, F. Scherr, E. Hajek, D. Salaj, R. Legenstein, and W. Maass, "Biologically Inspired Alternatives to Backpropagation Through Time for Learning in Recurrent Neural Nets," arXiv document 1901.09049, 2019/02/21.

[Purc14] O. Purcell and T. K. Lu, "Synthetic Analog and Digital Circuits for Cellular Computation and Memory," *Current Opinion in Biotechnology*, Vol. 29, October 2014, pp. 146-155.

4. Parallel Processing in Machine Learning Using FPGA Assist (Assigned to: **Varshika Mirtini**)

[Lace16] G. Lacey, G. W. Taylor, and S. Areibi, "Deep Learning on FPGAs: Past, Present, and Future," arXiv document 1602.04283, 2016/02/13.

[Menc20] O. Mencer et al., "The History, Status, and Future of FPGAs," *Communications of the ACM*, Vol. 63, No. 10, October 2020, pp. 36-39.

5. Parallel Processing in Machine Learning Using Photonic Accelerators (Assigned to: **Guyue Huang**)

[Kita19] K. I. Kitayama, M. Notomi, M. Naruse, K. Inoue, S. Kawakami, and A. Uchida, "Novel Frontier of Photonics for Data Processing—Photonic Accelerator," *APL Photonics*, Vol. 4, No. 9, 2019, p. 090901.

[Mehr18] A. Mehrabian, Y. Al-Kabani, V. J. Sorger, and T. El-Ghazawi, "PCNNA: A Photonic Convolutional Neural Network Accelerator," In *Proc. 31st IEEE Int'l System-on-Chip Conf.*, 2018, pp. 169-173.

6. Parallel Processing in Machine Learning Using Reversible Circuits (Assigned to: **TBD**)

[Gari13] R. Garipelly, P. M. Kiran, and A. S. Kumar, "A Review on Reversible Logic Gates and Their Implementation," *Int'l J. Emerging Technology and Advanced Engineering*, Vol. 3, No. 3, March 2013, pp. 417-423.

[Saee13] M. Saeedi, and I. L. Markov, "Synthesis and Optimization of Reversible Circuits—A Survey," *ACM Computing Surveys*, Vol. 45, No. 2, 2013, pp. 1-34.

7. Parallel Processing in Machine Learning Using Neural Networks (Assigned to: **Karthi N. Suryanarayana**)

[Shar12] V. Sharma, S. Rai, and A. Dev, "A Comprehensive Study of Artificial Neural Networks," *Int'l J. Advanced Research in Computer Science and Software Engineering*, Vol. 2, No. 10, October 2012, pp. 278-284.

[Sze17] V. Sze, Y. H. Chen, T. J. Yang, and J. S. Emer, "Efficient Processing of Deep Neural Networks: A Tutorial and Survey," *Proceedings of the IEEE*, Vol. 105, No. 12, 2017, pp. 2295-2329.

8. Parallel Processing in Machine Learning Using Approximate Arithmetic (Assigned to: **TBD**)

[Jian17] H. Jiang, C. Liu, L. Liu, F. Lombardi, and J. Han, "A Review, Classification, and Comparative Evaluation of Approximate Arithmetic Circuits," *ACM J. Emerging Technologies in Computing Systems*, Vol. 13, No. 4, 2017, pp. 1-34.

[Jian20] H. Jiang, F. J. Santiago, H. Mo, L. Liu, and J. Han, "Approximate Arithmetic Circuits: A Survey, Characterization, and Recent Applications," *Proceedings of the IEEE*, Vol. 108, No. 12, 2020, pp. 2108-2135.

9. Parallel Processing in Machine Learning Using Stochastic Computation (Assigned to: **TBD**)

[Alag13] A. Alaghi and J. P. Hayes, "Survey of Stochastic Computing," *ACM Trans. Embedded Computing Systems*, Vol. 12, No. 2s, May 2013, pp. 1-19.

[Alag17] A. Alaghi, W. Qian, and J. P. Hayes, "The Promise and Challenge of Stochastic Computing," *IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems*, Vol. 37, No. 8, November 2017, pp. 1515-1531.

10. Parallel Processing in Machine Learning Using Analog/Digital Components (Assigned to: **Boning Dong**)

[Hasl20] J. Hasler, "Large-Scale Field-Programmable Analog Arrays," *Proceedings of the IEEE*, Vol. 108, No. 8, August 2020, pp. 1283-1302.

[Lei19] T. Lei et al., "Low-Voltage High-Performance Flexible Digital and Analog Circuits Based on Ultrahigh-Purity Semiconducting Carbon Nanotubes," *Nature Communications*, Vol. 10, No. 1, 2019, pp. 1-10.

11. Parallel Processing in Machine Learning Using Graph Neural Networks (Assigned to: **Zhaodong Chen**)

[Dakk19] A. Dakkak, C. Li, J. Xiong, I. Gelado, and W. M. Hwu, "Accelerating Reduction and Scan Using Tensor Core Units," *Proc. ACM Int'l Conf. Supercomputing*, 2019, pp. 46-57.

[Raih19] M. A. Raihan, N. Goli, and T. M. Aamodt, "Modeling Deep Learning Accelerator Enabled GPUs," *Proc. Int'l Symp. Performance Analysis of Systems and Software*, 2019, pp. 79-92.

Posters prepared for conferences must be colorful and eye-catching, as they typically compete with dozens of other posters for the attendees' attention. Here are some tips and examples. Such posters are often mounted on a colored cardboard base, even if the pages themselves are standard PowerPoint slides. You can also prepare a "plain" poster (loose sheets, to be taped or pinned to a wall) that conveys your message in a simple and direct way. Eight to 10 pages, each resembling a PowerPoint slide, would be an appropriate goal. You can organize the pages into a 2 x 4 (2 columns, 4 rows), 2 x 5, or 3 x 3 array on the wall. The top two pages might contain the project title, your name, course name and number, and a very short (50-word) abstract. The final two can perhaps contain your conclusions and directions for further work (including work that does not appear in the poster but will be included in your research report). The rest should contain brief descriptions of ideas, with emphasis on diagrams, graphs, tables, and the like, rather than text, which is very difficult for a visitor to absorb in a short time span.

HW1 grades: Range = [B+, A+], Mean = 3.6, Median = B+

HW2 grades: Range = [C, A], Mean = 3.2, Median = B+

HW3 grades: Range = [B+, A], Mean = 3.8, Median = ~A

HW4 grades: Range = [B, A], Mean = 3.5, Median = B+/A–

HW5 grades: Range = [B–, A], Mean = 3.6, Median = ~A

Overall homework grades (percent): Range = [85, 102], Mean = 89, Median = 87

Research paper/poster grades (percent): Range = [70, 95], Mean = 83, Median = 83

Course letter grades: Range = [B, A], Mean = 3.6, Median = A–

**Required text:** B. Parhami,

The following journals contain a wealth of information on new developments in parallel processing:

The following are the main conferences of the field: Int'l Symp. Computer Architecture (ISCA, since 1973), Int'l Conf. Parallel Processing (ICPP, since 1972), Int'l Parallel & Distributed Processing Symp. (IPDPS, formed in 1998 by merging IPPS/SPDP, which were held since 1987/1989), and ACM Symp. Parallel Algorithms and Architectures (SPAA, since 1988).

UCSB library's electronic journals, collections, and other resources

**Motivation:** The ultimate efficiency in parallel systems is to achieve a computation speedup factor of

*Catalog entry:* 254B. Advanced Computer Architecture: Parallel Processing (4) PARHAMI. *Prerequisites: ECE 254A. Lecture, 4 hours.* The nature of concurrent computations. Idealized models of parallel systems. Practical realization of concurrency. Interconnection networks. Building-block parallel algorithms. Algorithm design, optimality, and efficiency. Mapping and scheduling of computations. Example multiprocessors and multicomputers.

** History:** The graduate course ECE 254B was created by Dr. Parhami, shortly after he joined UCSB in 1988. It was first taught in spring 1989 as ECE 594L, Special Topics in Computer Architecture: Parallel and Distributed Computations. A year later, it was converted to ECE 251, a regular graduate course. In 1991, Dr. Parhami led an effort to restructure and update UCSB's graduate course offerings in the area of computer architecture. The result was the creation of the three-course sequence ECE 254A/B/C to replace ECE 250 (Adv. Computer Architecture) and ECE 251. The three new courses were designed to cover high-performance uniprocessing, parallel computing, and distributed computer systems, respectively. In 1999, based on a decade of experience in teaching ECE 254B, Dr. Parhami published the textbook

Offering of ECE 254B in winter 2020 (Link)

Offering of ECE 254B in winter 2019 (Link)

Offering of ECE 254B in winter 2018 (Link)

Offering of ECE 254B in winter 2017 (Link)

Offering of ECE 254B in winter 2016 (PDF file)

Offering of ECE 254B in winter 2014 (PDF file)

Offering of ECE 254B in winter 2013 (PDF file)

Offering of ECE 254B in fall 2010 (PDF file)

Offering of ECE 254B in fall 2008 (PDF file)

Offerings of ECE 254B from 2000 to 2006 (PDF file)