|
ECE 254B: Information on Previous Offerings
Behrooz Parhami: 2007/06/19 || E-mail: parhami@ece.ucsb.edu || Problems: webadmin@ece.ucsb.edu Other contact info at: Bottom of this page || Go up to: B. Parhami's course syllabi or his home page
On June 19, 2007, Professor Parhami's UCSB ECE website moved to a new location. For an up-to-date version of this page, visit it at the new address: http://www.ece.ucsb.edu/~parhami/ece_254b_old.htm Link to the most recent offering of ECE 254B: Parallel ProcessingPrevious offerings of the course
Return to:
Top of this page CE 254B: Spring Quarter 2005This area reserved for important course announcements: Research paper and final course grades have been assigned. An e-mail message, with comments on the research paper, will be sent to each student no later than Friday, June 17, 2005. Have a nice summer!
Calendar: Course
lectures, homework assignments, and exams have been scheduled as follows. This
schedule will be strictly observed. Textbook chapters are specified for lectures
and exams. It is a good idea to look over the covered chapter(s) before
each lecture if possible.
Homework: General Requirements Solutions should be brought to class on the due date
and handed in before the lecture begins. Late
homework will not be accepted, so plan to start work on your assignments early. Use a cover page that includes your name, course and assignment number for your solutions. Staple
the sheets and write your name on top of every sheet in case sheets are
separated. Although
some cooperation is permitted, direct copying will have severe consequences. Course Grade Statistics
* Course = 25%(HW avg.) + 25%(Midterm) + 50%(Research) ECE 254B s2005, Homework #1: Fundamental concepts (chapters 1-4, due Mon. 4/11/05) Do the following problems from the textbook or defined below: 1.6 [20 pts.], 2.1 [20 pts.], 2.J [15 points], 3.9 [10 pts.], 3.13 [15 pts.], 4.12 [20 pts.] Problem 2.J An integer sequence S of length n is stored in the n leaves of a complete binary tree. Develop an efficient parallel algorithm that determines whether the sequence is sorted in nondescending order from left to right. If the sequence is not nondescending, the algorithm should also flag each out-of-order element (one that is less than the element to its left). ECE 254B s2005, Homework #2: Shared-memory and circuit models (chapters 5-8, due Wed. 4/20/05) Do the following problems from the textbook or defined below: 5.1 [10 pts.], 5.8 [15 pts], 6.7 [20 pts.], 7.5 [20 pts.], 7.E [15 pts.], 8.4 [20 pts.] Problem 7.E Selection networks – An (n, k)-selector is an n-input, single-output network of comparators that selects the kth smallest of the n inputs. For example, an (n, 1)-selector outputs the smallest element and an (n, n)-selector yields the largest element. (a) Prove or disprove that the zero-one principle is also valid for selection networks. (b) Design delay-minimal and cost-minimal (5, 2)- and (5, 3)-selectors. (c) Prove or disprove that the delay of an (n, k)-selector is strictly less than that of an n-sorter. (d) Prove or disprove that the cost of an (n, k)-selector is strictly less than that of an n-sorter. ECE
254B s2005, Homework #3:
Do the following problems from the textbook or defined below: 9.4 [20 pts.], 9.8 [15 pts.], 10.11 [15 pts], 10.B [20 pts.], 11.E [15 pts.], 12.1 [15 pts.] Clarification for Problem 12.1: In part b, the assumption is that in any given step, all processors must communicate along the same dimension. This is sometimes referred to as uniaxis communication. Problem 10.B Interval routing in 2D mesh – Consider an interval routing scheme, as defined in problem 10.14, with the intervals attached to the four mesh links for node (i, j) corresponding to the submeshes defined by bounds on row and column numbers for the destination node (k, l) as follows: above, if k < i, l ³ j; right, if k ³ i, l > j; below, if k > i, l £ j; left, if k £ i, l < j. (a) Draw a few sample paths with various source destination pairs for the above mesh routing algorithm and state the general properties of the paths selected. (b) Compare the algorithm to row-first routing with respect to routing delay and buffer requirements in the worst case. (c) Compared to row-first routing, is the traffic going through the links more balanced or less balanced in this algorithm? (d) Is the algorithm deadlock-free if used with wormhole routing? Why or why not? (e) Modify the above regions or intervals for the case of a square 2D torus to achieve balance in the traffic for the various links.
Problem 11.E FIR filter implemented as a linear array – A finite impulse response (FIR) filter produces a sequence of outputs yj in response to the sequence of inputs xi such that yj = å akxj–k, where the ak are the filter’s coefficients and the first p – 1 outputs are ignored. Show how a linear array of p processors can perform this computation, with the input xt accepted, and the output yt produced, in cycle 2t. ECE 254B s2005, Homework #4: Interconnection networks (chapters 13-17, due Wed. 5/25/05) Do the following problems from the textbook or defined below: 13.7 [20 pts.], 13.F [15 pts.], 14.10 [15 pts.], 15.13 [15 pts.], 16.5 [20 pts.], 17.13 [15 pts.] Problem 13.F Embedding parameters – The three examples in Fig. 13.2 suggest that dilation, congestion, and load factor are orthogonal parameters in the sense that knowing two of them does not provide much, if any, information about the third. Reinforce this conclusion by constructing, when possible, additional examples for which dilation, congestion, and load factor are (respectively): (a) 1, 1, 2; (b) 1, 2, 1; (c) 2, 1, 1; (d) 2, 1, 2; (e) 2, 2, 2 ECE 254B s2005, Reading list for midterm exam Remember that the midterm exam ends at 2:00 PM (not 1:30 as our normal classes). The following sections are excluded from the first 12 chapters of the textbook to be covered in the midterm exam: 2.6, 3.5, 4.5, 4.6, 6.5, 7.6, 8.5, 8.6, 9.6, 11.4, 12.6 (pp. 250-253). ECE 254B s2005, Reading list for final exam
Not applicable to this quarter. ECE 254B s2005, Research Project Notes Due to exchange of original ideas that are as yet unpublished, most research project notes will be handed out in hard-copy format rather than posted to this Web page.
Return to:
Top of this page ECE 254B: Winter Quarter 2004This area reserved for important course announcements: Homework 5 (last homework) has been posted below.
Calendar: Course
lectures, homework assignments, and exams have been scheduled as follows. This
schedule will be strictly observed. Textbook chapters are specified for lectures
and exams. It is a good idea to look over the covered chapter(s) before
each lecture if possible.
Homework: General Requirements Solutions should be brought to class on the due date
and handed in before the lecture begins. Late
homework will not be accepted, so plan to start work on your assignments early. Use a cover page that includes your name, course and assignment number for your solutions. Staple
the sheets and write your name on top of every sheet in case sheets are
separated. Although
some cooperation is permitted, direct copying will have severe consequences. ECE 254B w2004, Homework #1: Fundamental concepts (chapters 1-4, due Wed. 1/21/03) Do the following problems from the textbook or defined below: 1.D, 2.2, 2.5, 3.7ab, 4.7 In Problem 3.7b, assume that n is an even power of 2. In Problem 4.7c, assume that p is divisible by s; i.e., p = ks. Problem 1.D: Parallel and pipelined processing -- In a particular plant, laptop computers are assembled in a 3-stage pipelined process, with stage delays all being the same. There are three separate assembly lines, each with an input-to-output latency of 30 minutes. Plot the speed-up due to having three assembly lines, relative to using a single assembly line, when n laptop computers (1 £ n £ 10) are to be assembled. Then derive a general speed-up formula for a p-pipeline, h-segment system relative to a single-pipeline system. ECE 254B w2004, Homework #2: Abstract shared memory (chapters 5-6, due Mon. 2/2/03) Do the following problems from the textbook or defined below: 5.2, 5.5, 5.10, 6.2, 6.B. Problem 6.B: Plurality voting on the PRAM -- Devise an efficient PRAM algorithm for “plurality voting”. There are p processors and a vector x[1..p] of data values. The algorithm should compute a pair of results y and w such that the value y = x[j] appears w times in x and no other value has more appearances. For example, x = 5 7 4 5 4 should yield y = 5 or y = 4 with w = 2, since both 5 and 4 appear twice in x, whereas 7 appears only once. (a) Describe your algorithm in high-level terms, using known subalgorithms if needed. (b) Analyze the worst-case running time of the algorithm. (c) How much simpler can the algorithm become if standard “majority voting” is desired? Assume that a majority always exists and that only the single result y is to be computed. ECE 254B w2004, Homework #3: Circuit model of parallelism (chapters 7-8, due Mon. 2/9/03) Do the following problems from the textbook or defined below: 7.7, 7.D, 7.O, 8.6abc, 8.C.
Problem 7.D: Delay and cost of sorting networks -- Let Cmin(n)
and Dmin(n)
be the minimal cost and minimal delay for an n-sorter constructed only of
2-sorters; the two lower bounds may not be realizable in a single circuit.
(a) Prove or disprove: D Problem 7.O: Batcher sorting networks -- Verify that based on the cost recurrence given in Section 7.4, the cost of a Batcher sorting network is n(log2n)2/4 to within lower-order terms. Then, using the substitution method of Section 3.6, find an exact expression for C(n) that includes all lower-order terms. Problem 8.C: Distributed dictionary machines -- The p processors of a linear array hold n records in sorted order such that the leftmost processor holds the records with the smallest keys. As the processors compute, they produce new records or cause records to be deleted. Discuss in high-level terms a protocol to be followed by the processors for insertion and deletion of records such that the sorted order of the records is maintained and the processor loads (number of records held) are roughly balanced over the long run. ECE 254B w2004, Homework #4: Mesh and mesh-based architectures (chapters 9-12, due Mon. 2/23/03) Do the following problems from the textbook or defined below: 9.3, 10.5, 10.F, 11.2, 12.2. Problem 10.F: k-to-1 routing on a 2D mesh -- Show that if each processor of p-processor square mesh begins with one packet and if up to k packets can have the same destination, the greedy routing algorithm can take Q(min(kp1/2, p)) steps in the worst case. ECE 254B w2004, Homework #5: Low-diameter architectures (chapters 13-16, due Wed. 3/3/03) Do the following problems from the textbook or defined below: 13.8, 13.B, 14.12, 15.1, 16.1.
Problem 13.B: Embedding multigrids and pyramids into hypercubes --
(a)
Note on Problem 15.1: In part f, ignoring the hint can lead to a simpler solution. ECE 254B w2004, Reading list for midterm exam Remember that the midterm exam ends at 6:00 PM (not 5:30 as our normal classes). The following sections are excluded from the first 10 chapters of the textbook to be covered in the midterm exam: 2.6, 3.5, 4.5, 4.6, 6.5, 7.6, 8.5, 8.6, 9.6, 10.6. ECE 254B w2004, Reading list for final exam
Not applicable to this quarter. ECE 254B w2004, Research Project Notes Due to exchange of original ideas that are as yet unpublished, most research project notes will be handed out in hard-copy format rather than posted to this Web page.
Return to:
Top of this page ECE 254B: Winter Quarter 2003This area reserved for important course announcements: The final exam and course grade stats have been posted below. If you would like to be notified of your final and course grades, please send e-mail with an explicit request for e-mail transmission of your grades. To look at your final exam paper and my solutions, stop by during my office hours (same as this quarter for spring 2003). I hope that you enjoyed the course and took with you knowledge that is of lasting value in our fast-moving field.
Calendar: Course
lectures, homework assignments, and exams have been scheduled as follows. This
schedule will be strictly observed. Textbook chapters are specified for lectures
and exams. It is a good idea to look over the covered chapter(s) before
each lecture if possible.
Homework: General Requirements Solutions should be brought to class on the due date
and handed in before the lecture begins. Late
homework will not be accepted, so plan to start work on your assignments early. Use a cover page that includes your name, course and assignment number for your solutions. Staple
the sheets and write your name on top of every sheet in case sheets are
separated. Although
some cooperation is permitted, direct copying will have severe consequences. Course Grade Statistics
* Course = 25%(HW total / 7) + 25%(Midterm) + 50%(Final or Paper) ECE 254B w2003, Homework #1: Fundamental concepts (chapters 1-4, due Wed. 1/22/03) Do the following problems from the textbook or defined below: 1.1, 1.8, 2.7, 3.8, 3.A Problem
3.A
Comparing algorithms a. T(n) = 2T(n/2) + n/(log2n)2 b. T(n) = T(n/2) + n(log2n)2 ECE 254B w2003, Homework #2: Abstract shared memory (chapters 5-6, due Mon. 1/27/03) Do the following problems from the textbook or defined below: 5.3, 5.12, 6.3, 6.4, 6.D Problem
6.D
Geometric problems on circles a. Design an efficient parallel algorithm to determine if any pair of these circles intersect. b. Evaluate the worst-case complexity and the efficiency of your algorithm. c. Specify the PRAM submodel used for your algorithm and discuss simplifications or complications if other submodels are assumed. d. Discuss if the solution can be speeded up by using more than n processors. e. Discuss how the solution is slowed down if p processor are used, where p < n. ECE 254B w2003, Homework #3: Circuit model of parallelism (chapters 7-8, due Mon. 2/3/03) Do the following problems from the textbook or defined below: 7.2ab, 7.3, 7.9, 8.9, 8.A Problem
8.A
PRAM parallel search revisited a. Propose and analyze an efficient parallel algorithm for adding an item to the sorted list. b. Repeat part a for the deletion operation. c. What are the speedups in parts a and b compared to the sequential solutions? d. An application involves a stream of add, delete, and search operations. These operations are randomly intermixed, with a fraction a being adds, a fraction d being deletes, and the rest (i.e., 1 – a – d) consisting of search operations. What is the expected overall speedup for this application if run on the PRAM using the algorithms above? e. Under what conditions is linear O(p) speedup possible in part d? f. An alternative to keeping the list in sorted order is to keep it in arbitrary order. Hence we get four implementation options corresponding to sorted/unsorted list and sequential/parallel solution. Redo part d, computing the speed-up of each of the two parallel alternatives with respect to the best sequential solution. Special notes on the solution to Problem 8.A: A common error in the solutions offered for this homework was averaging of speedups for various parts of an algorithm to derive the overall speedup. The following example shows why this leads to incorrect results. Consider an algorithm with two parts. Alternative A does the first part in 5 s and the second part in 1 s. Alternative B does the first part in 1 s and the second part in 5 s. Both solutions take 6 s, assuming the two parts are equally weighted (are always done together). Speedup of A over B is thus 1.0. Speedups for the two parts are 0.2 and 5.0, when viewed in isolation. The average of the two speedups is 2.6 which is quite different from the correct answer of 1.0. ECE 254B w2003, Homework #4: Data movement on 2D arrays (chapters 9-10, due Mon. 2/10/03) Do the following problems from the textbook or defined below: 9.10a, 9.Ba, 10.4, 10.12, 10.14abc Problem 9.B Selection on a 2D mesh a. Show why the following proposed algorithm for selection on a 2D mesh is not viable. The mesh is divided into four quadrants. The median of the elements in each quadrant is determined using the same selection algorithm recursively. Then the second largest value s among the 4 medians is found using two semigroup computations. Next, each processor determines if its value is <, >, or equal to s. The number of elements in each class is found using 3 semigroup computations. Finally, it is decided how the computation should continue, the elements among which selection must be performed are compressed into a square submesh in the upper left corner, and the next phase is begun. b. Would adding a global bus, or row/column buses, to the mesh change the conclusion of part a? ECE 254B w2003, Homework #5: Mesh algorithms and variants (chapters 11-12, due Mon. 2/24/03) Do the following problems from the textbook or defined above/below: 9.Bb, 11.5, 11.9, 12.3, 12.C Problem 12.C Meshes with diagonal links Consider an m ´ m mesh augmented with 2 diagonal links, one connecting processors (0, 0) and (m – 1, m – 1) and the other connecting processors (0, m – 1) and (m – 1, 0). a. Show that the diameter of such an augmented mesh is no more than 1.5m. b. Can you derive the exact diameter of the m ´ m mesh augmented with diagonal links? c. Derive the bisection width and suggest a lower bound on the time for sorting. d. Formulate an efficient algorithm for finding the maximum of m2 numbers, stored one per processor. The algorithm should take advantage of the diagonal links. ECE 254B w2003, Homework #6: The hypercube architecture (chapters 13-14, due Mon. 3/3/03) Do the following problems from the textbook or defined below: 13.2, 13.A, 13.C, 14.5, 14.15 Problem 13.A Embedding X-trees into hypercubes Extend the result of Problem 13.6 to X-trees. [Correction to the statement of Problem 13.6: Node labels begin from 1 and end with 2q – 1.] Problem 13.C Embedding into hypercubes a. Show that even after removing a quarter of the leaf nodes from a (2q – 1)-node complete binary tree, where q ³ 5, the resulting graph is not a subgraph of the q-cube. b. Show that a forest (collection of independent trees) composed of 2h complete binary trees, each having 2q–h – 1 nodes, is a subgraph of the q-cube when h ³ 1. ECE 254B w2003, Homework #7: Other architectures and topics (chapters 15-18, due Mon. 3/10/03) Do the following problems from the textbook or defined below: 15.2, 15.A, 16.12, 17.12, 18.12 Problem 15.A Shuffle-exchange graphs a. Derive the diameter of an n-node shuffle-exchange graph. b. Consider the (n/2)-node graph formed by merging each node u in the n-node shuffle-exchange graph with its binary complement node u'. Show that the resulting graph has degree 3 and diameter at most 1.5 log2n. c. Show that r perfect shuffles return an n-card deck (n even) to its original order iff 2r = 1 mod (n – 1). d. Prove that 8 perfect shuffles are required and sufficient to restore a deck of 52 cards to its original order. ECE 254B w2003, Reading list for midterm exam Remember that the midterm exam ends at 6:00 PM (not 5:30 as our normal classes). The following sections are excluded from the first 10 chapters of the textbook to be covered in the midterm exam: 2.6, 3.5, 4.5, 4.6, 6.5, 7.6, 8.5, 8.6, 9.6. ECE 254B w2003, Reading list for final exam The final exam will emphasize the material covered after the midterm (Chapters 11-20). All midterm exclusions apply for the final as well. Additionally, the following sections are excluded from the final exam: 11.4, 11.6, 12.6, 13.5, last page of 14.3, 15.6, 16.3, 16.5, 16.6, last page of 17.2, 17.6, 18.6, 19.6, all of Chapter 20. ECE 254B w2003, Possible Research Topics For list of possible topics, please see the previous offerings of the course. An updated list of research topics will not be produced unless some students indicate an interest in pursuing the research option during this quarter. Return to:
Top of this page ECE 254B: Winter Quarter 2002This area reserved for important course announcements: I hope that you enjoyed the course. Course grades have been sent to the Registrar. Send e-mail to the instructor to request your final exam and course grades.
Calendar: Course
lectures, homework assignments, and exams have been scheduled as follows. This
schedule will be strictly observed. Textbook chapters are specified for lectures
and exams. It is a good idea to look over the covered chapter(s) before
each lecture if possible.
Homework: General Requirements Solutions should be brought to class on the due date
and handed in before the lecture begins. Late
homework will not be accepted, so plan to start work on your assignments early. Use a cover page that includes your name, course and assignment number for your solutions. Staple
the sheets and write your name on top of every sheet in case sheets are
separated. Although
some cooperation is permitted, direct copying will have severe consequences. Course Grades
* Course = 25%(HW total / 7) + 25%(Midterm) + 50%(Final or Paper) ECE 254B w2002, Homework #1: Fundamental concepts (chapters 1-4, due Fri. Jan. 25) Do the following problems; all but problem 1.C, given below, are from the textbook. 1.9, 1.C, 2.11, 3.2, 3.6, 4.2 Problem 1.C Amdahl's law A multiprocessor is built out of 5 GFLOPS processors. What is the maximum allowable sequential fraction of a job in Amdahl's formulation if the multiprocessor's performance is to exceed 1 TFLOPS? ECE 254B w2002, Homework #2: Abstract shared memory (chapters 5-6, due Mon. Jan. 28) Do the following problems; all but problem 5.C, given below, are from the textbook. 5.4, 5.11, 5.C, 6.14 (note the correction below) Problem 6.14 -- replace the problem introduction with the following: Consider the butterfly memory access network depicted in Fig. 6.9, but with only 8 processors and 8 memory banks, one connected to each circular node in columns 0 and 3 of the butterfly. Problem 5.C Parallel prefix computation on a linked list a. Outline the design of an efficient parallel prefix algorithm for the PRAM (no actual code is needed). The n data items are stored in the form of a doubly linked list. The array D[1..n] contains the data items while the integer arrays F[1..n] and B[1..n] hold the forward and backward links; i.e., the next and previous elements in sequence. The terminal list element in either direction points to itself. The computed prefix results should be stored in P[1..n] in the same linked format (i.e., with sequencing links given by F and B). b. Which PRAM submodel is implied by your algorithm? c. If the answer to part b is not EREW, then how can the algorithm be modified to run on this “weaker” submodel with little or no performance penalty? ECE 254B w2002, Homework #3: Circuit model of parallelism (chapters 7-8, due Mon. Feb. 4) Do the following problems; all but problem 7.L, given below, are from the textbook. 7.4, 7.13, 7.L, 8.5, 8.8 Problem 7.L Synthesizing sorting networks Show that given valid n-sorter, the following transformations lead to a valid 2n-sorter. (1) Replace each horizontal line by a pair of lines. (2) Replace each 2-sorter connected to a pair of the original lines by a 4-sorter connected to both pairs of lines replacing them. (3) Replace each 4-soter by the 5-comparator network of Fig. 7.4. ECE 254B w2002, Homework #4: Data movement on 2D arrays (chapters 9-10, due Mon. Feb. 11) Do the following problems; all but problems 9.A and 10.E, given below, are from the textbook. 9.1, 9.6, 9.A, 10.7, 10.E Problem 9.A The shearsort algorithm Using the result of the analysis for optimized shearsort in Section 9.3: a. Determine conditions under which shearsort would be faster for 2m^2 elements, stored one per processor, when run on a 2m-by-m mesh (2m rows, m columns) compared to an m-by-2m mesh (m rows, 2m columns). b. Given p elements to be sorted, find the optimal mesh dimensions (r and p/r) that would minimize the sorting time. Problem 10.E Greedy routing on a 2D torus Show that on an r-row p-processor 2D torus, the greedy row-first routing algorithm solves a 1-1 routing problem in no more than r/2 + p/(2r) steps. ECE 254B w2002, Homework #5: Mesh algorithms and variants (chapters 11-12, due Mon. Feb. 25) Do the following problems; all but problem 11.B, given below, are from the textbook. 11.8, 11.B, 12.4, 12.8, 12.11 Problem 11.B Matrix algorithms on a linear array a. A b-banded or b-diagonal matrix is a matrix for which all the entries not contained in a single band of b diagonals are zero. Show how to multiply an m-vector by an m ´ m b-diagonal matrix in O(m) steps on a b-cell linear array. b. Draw a linear array and its associated data flow for solving an upper-triangular system of five linear equations. c. Draw a linear array and its associated data flow for solving a strictly-lower-triangular system of 5 linear equations (i.e., the main diagonal elements are also 0s). ECE 254B w2002, Homework #6: The hypercube architecture (chapters 13-14, due Mon. Mar. 4) Do the following problems from the textbook. 13.1, 13.6, 13.14, 14.4, 14.13 ECE 254B w2002, Homework #7: Other architectures and topics (chapters 15-18, due Fri. Mar. 15) Do the following problems; all but problem 15.D, given below, are from the textbook. 15.3, 15.D, 16.7, 17.2, 18.11 Problem 15.D Routing on Benes networks Show edge-disjoint paths for each of the following permutations of the inputs (0, 1, 2, 3, 4, 5, 6, 7) on an 8-input Benes network. a. (6, 2, 3, 4, 5, 1, 7, 0) b. (5, 4, 2, 7, 1, 6, 3, 0) c. (1, 7, 3, 4, 2, 6, 0, 5)ECE 254B w2002, Reading list for midterm exam Remember that the midterm exam ends at 6:00 PM (not 5:30 as our normal classes). The following sections are excluded from the first 10 chapters of the textbook to be covered in the midterm exam: 2.6, 3.5, 4.5, 4.6, 6.5, 7.6, 8.5, 8.6, 9.6. ECE 254B w2002, Reading list for final exam The final exam will emphasize the material covered after the midterm (Chapters 11-18). All midterm exclusions apply for the final as well. Additionally, the following sections are excluded from the final exam: 12.6, 13.5, last page of 14.3, 15.6, 16.3, 16.5, 16.6, last page of 17.2, 17.6, 18.6 ECE 254B w2002, Possible Research Topics For list of possible topics, please see the previous offerings of the course. An updated list of research topics has not been produced because no student indicated an interest in pursuing the research option during this quarter. Return to:
Top of this page ECE 254B: Spring Quarter 2001
Calendar: Course
lectures, homework assignments, and exams have been scheduled as follows. This
schedule will be strictly observed. Textbook chapters are specified for lectures
and exams. It is a good idea to look over the covered chapter(s) before
each lecture if possible.
Homework: General Requirements Solutions should be brought to class on the due date
and handed in before the lecture begins. Late
homework will not be accepted, so plan to start work on your assignments early. Use a cover page that includes your name, course and assignment number for your solutions. Staple
the sheets and write your name on top of every sheet in case sheets are
separated. Although
some cooperation is permitted, direct copying will have severe consequences. Course Grades
* Course = 25%(HW total / 7) + 25%(Midterm) + 50%(Final or Paper) ECE 254B s2001, Homework #1: Fundamental concepts (chapters 1-4, due Mon. Apr. 16) Do the following problems from the textbook, plus Problem 1.A given below. 1.4 (p. 21), 2.6 (p. 42), 3.3 (p. 61), 4.1 (p. 82) Problem 1.A on parallel processing effectiveness: A task graph is built from back-to-back complete binary trees (two complete binary trees merged at their leaves). One root is the starting node and the other root is the final node, with dependency arrows all going in the direction from the former to the latter. There are 2^h nodes at the widest point of the task graph (leaves of the original trees) and n = 3 ´ 2^h – 2 nodes overall. Nodes correspond to unit-time tasks to be scheduled for processing on a parallel machine with p processors and arrows indicate data dependencies or required execution order. Ignore the overhead for scheduling and communication throughout. a. What is the maximum attainable speedup, given an unlimited supply of processors? What is the minimum number of processors required to achieve this speedup? b. With p = 2^q processors (q < h), find the speed-up and average processor utilization as functions of h and q. c. What is the acceptable range for p if we were to achieve asymptotically linear speedup? d. Define an equivalent Amdahl task graph as having one "input" and one "output" task with computation times fn/2 and p parallel tasks executable only after the input task and before the output task, each having computation time (1 – f)n/p, where f (0 £ f £ 1) is the inherently sequential fraction of the corresponding Amdahl job. Compute the fraction f to achieve the same speedup as in part a. ECE 254B s2001, Homework #2: Abstract shared memory (chapters 5-6, due Mon. Apr. 23) Do the following problems from the textbook, plus Problem 5.C given below. 5.10 (p. 107), 5.16 (p. 108), 6.1 (p. 125), 6.9b (p. 126) Note: In Problem 5.16, please do not assume the CRCW PRAM submodel that writes the maximum or minimum of the offered values in memory! Problem 5.C on parallel prefix computation on a linked list: a. Outline the design of an efficient parallel prefix algorithm for the PRAM (no actual code needed), assuming that the n data items are stored in the form of a doubly linked list. The array D[1..n] contains the data items while the integer arrays F[1..n] and B[1..n] hold the forward and backward links; i.e., F[i] and B[i] point to the next and previous elements in sequence. The terminal list element in either direction points to itself. The computed prefix results should be stored in P[1..n] in the same linked format (i.e., with sequencing links given by F and B). b. Which PRAM submodel is implied by your algorithm? c. If the answer to part b is not EREW, then how can the algorithm be modified to run on this "weaker" submodel with little or no performance penalty? ECE 254B s2001, Homework #3: Circuit model of parallelism (chapters 7-8, due Wed. May 2) Do the following problems from the textbook, plus Problem 7.H given below. 7.12 (p. 146), 7.14 (p. 146), 8.1 (p. 165), 8.6 (p. 166) Problem 7.H on sorting networks with adjacent comparators: A comparator in a sorting network is "adjacent" if it compares values on lines i and i + 1 for some i in the range [0, n - 2]. a. Prove that any n-input sorting network must contain at least one instance of each of the n - 1 possible adjacent comparators. b. Let a(n) be the least possible number of comparators in an n-sorter composed solely of adjacent comparators. Establish upper and lower bounds for a(n); that is, a(n) = O(?) and a(n) = W(?). ECE 254B s2001, Homework #4: Data movement on 2D arrays (chapters 9-10, due Mon. May 7) Do the following problems from the textbook, plus Problems 9.C and 10.D given below. 9.5 (p. 188), 9.11 (p. 189), 10.2 (p. 208) Optional problem for extra credit: 9.16 (p. 189) Problem 9.C on the analysis of shearsort: On a p-processor 2D mesh with one dimension equal to x, where x < p^(1/2), is shearsort faster when x is the number of rows or the number of columns? Problem 10.D on reversing a sequence on a mesh: Consider a sequence of p values, x_0, x_1, . . . , x_(p–1), stored on a mesh, one value per processor. a. Outline an algorithm for reversing the sequence if the storage order is row-major. b. Analyze the complexity of the algorithm given in your answer to part a. c. Repeat part a for snake-like row-major order. ECE 254B s2001, Homework #5: Mesh algorithms and variants (chapters 11-12, due Mon. May 21) Do the following problems from the textbook. 11.3 (pp. 231-232), 11.4* (p. 232), 11.7 (p. 232), 12.7 (p. 254), 12.14 (p. 255) * Correct "back substitution" to "forward substitution". ECE 254B s2001, Homework #6: The hypercube architecture (chapters 13-14, due Wed. May 30) Do the following problems from the textbook. 13.3 (p. 275), 13.4 (pp. 275-276), 13.7 (p. 276), 14.9 (p. 296), 14.11 (p. 297) ECE 254B s2001, Homework #7: Other networks and topics (chapters 15-17, due Wed. June 6) Do the following problems from the textbook. 15.8 (p. 318), 15.12 (p. 319), 16.1 (p. 340), 16.12 (p. 342), 17.9 (p. 365) Optional problem for extra credit: Problem 16.E on Clos networks Consider a Clos network with rs inputs, rs outputs, and three columns (0-2) of switches. Columns 0 and 2 contain r switches, each of which is an s ´ s crossbar. Column 1 contains s switches that are r ´ r crossbars. Zero-origin top-to-bottom indexing is used to identify the switches in each column and their input/output lines. Switch terminals are identified by in(c, b, a) and out(c, b, a), with c being the column index, b the switch (block) index, and a the line index. Intercolumn connections are as follows: for all x and y (0 £ x £ r – 1, 0 £ y £ s – 1), out(0, x, y) is connected to in(1, y, x) and a. Prove that the Clos network can realize any rs ´ rs permutation. [Hint: You may find the result on perfect matching in Section 17.1 useful.] b. If the cost of an m ´ m crossbar switch is m^2 units, what are the optimal values of r and s for a given total number n = rs of inputs? c. Try to prove a more general optimality result that holds for a class of cost functions rather than only for the one given in part b. ECE 254B s2001, Reading list for midterm exam Remember that the midterm exam starts at 8:00 AM. The following sections are excluded from the first 11 chapters of the textbook to be covered in the midterm exam: 2.6, 3.5, 4.5, 5.6, 6.5, 7.6, 9.6, 11.4 ECE 254B s2001, Reading list for final exam The final exam will emphasize the material covered after the midterm (Chapters 12-18). All midterm exclusions apply for the final as well. Additionally, the following sections are excluded from the final exam: 12.6, 13.5, last page of 14.3, 16.5, 16.6, 17.6, 18.6 ECE 254B s2001, Possible Research Topics The following are some suggested topics for research; they will be assigned on a first-come, first-served basis. In each case, one or two references are given to familiarize you with the subject area and the nature of the work expected. The paper may deal with a subset of the topic area, to be defined through further discussion after work has begun. If, after reviewing the cited reference(s), you are interested in a particular topic, please discuss your interest with the instructor for additional references and/or guidance. You are expected to find most of the needed references on your own. Searching the Internet and consulting the reference sources listed in the course outline are possible avenues. Topic 1. Mesh-connected computers with hierarchically organized row/column buses [network] [Pan01] Pan, Y., K. Li, and H. Shen, "An Improved Generalization of Mesh-Connected Computers with Multiple Buses," IEEE Trans. Parallel and Distributed Systems, Vol. 12, No. 3, pp. 293-305, Mar. 2001. [Serr93] Serrano, M.J. and B. Parhami, "Optimal Architectures and Algorithms for Mesh-Connected Parallel Computers with Separable Row/Column Buses," IEEE Trans. Parallel and Distributed Systems, Vol. 4, No. 10, pp. 1073-1080, Oct. 1993. Topic 2. Extended fault diameter of torus networks [mathematical] [Parh00] Parhami, B., "Extended Fault Diameter of Mesh Networks," Proc. Int'l Conf. Parallel and Distributed Processing Techniques and Applications, pp. 1035-1039, June 2000. Topic 3. Building-block computational algorithms for periodically regular chordal rings [algorithmic] [Parh99] Parhami, B. and D.-M. Kwai, "Periodically Regular Chordal Rings," IEEE Trans. Parallel and Distributed Systems, Vol. 10, No. 6, pp. 658-672, June 1999. [Nass93] Nassimi, D., "Parallel Algorithms for the Class of +/-(2^b) DESCEND and ASCEND Computations on a SIMD Hypercube," IEEE Trans. Parallel and Distributed Systems, Vol. 10, No. 12, pp. 1372-1381, Dec. 1993. Topic 4. Reconfigurable parallel processor systems [architectural] [Li91] Li, H. and Q.F. Stout (Eds.), Reconfigurable Massively Parallel Computers, Prentice-Hall, 1991. Topic 5. Building special-purpose parallel systems on FPGAs or FPGA-like arrays [hardware] [Thib99] Thibeault, C. and G. Begin, "A Scan-Based Configurable, Programmable, and Scalable Architecture for Sliding Window-Based Operations," IEEE Trans. Computers, Vol. 48, No. 6, pp. 615-627, June 1999. [Vill98] Villasenor, J. and B. Hutchings, "The Flexibility of Configurable Computing", IEEE Signal Processing Magazine, Vol. 15, No. 5, pp. 67-84, Sep. 1998. Topic 6. Coordination and scheduling in networks/clusters of workstations [software] [Amir00]
Amir, Y. et al, "An Opportunity Cost Approach for Job Assignment in a
Scalable Computing Cluster,"
IEEE Trans. Computers, Vol. 11, No.
7, pp. 760-768, July 2000. Return to:
Top of this page ECE 254B: Spring Quarter 2000 offering
Calendar: Following, is a lecture-by-lecture course calendar, including homework assignment and due dates as well as research deadlines. This schedule will be strictly observed. It is a good idea to look over the specified course material before each lecture if possible.
Return to: Top of this page || Go up to: B. Parhami's course syllabi or his home page
|
|