CSC 724: Advanced Distributed Systems

Spring 2010
Credits: 3
Meeting Times: Tuesday/Thursday, 2:20 - 3:35pm
Meeting Location: EBII 1228
Wolfware Course Web

Instructor Information:

Course Objectives: 

This course explores design and implementation principles in modern distributed systems. In particular, the course will emphasize recent techniques used by real-world distributed systems such as peer-to-peer file sharing (e.g., BitTorrent), enterprise data centers, and Internet search engines (e.g., Google). Students will learn the state of the art in distributed system architectures, algorithms, and performance evaluation methodologies. Topics include canonical distributed computing concepts such as remote procedure call, distributed objects, replication, distributed system security, and consensus protocols, as well as recent distributed system technologies such as peer-to-peer systems, Grid, autonomic computing, distributed massive data processing (Google MapReduce), system machine learning, distributed system debugging, multi-core systems, and distributed virtualization. On completing this course, the student should be able to do the following:

Text Books:

There are no assigned textbooks for this course. Topics will be covered during in-class lectures, and through course notes made available on this web page.

Links to supplementary material, in the form of research papers related to each topic, are included in this syllabus. PDFs of most papers are available through the NCSU library web site, which has full-text access to most recent ACM and IEEE journals and conferences. A number of supplemental distributed system textbooks are also available:

Distributed Systems: Concepts and Design, (4th Edition), G. Coulouris, J. Dollimore, and T. Kindberg
Distributed Systems (2nd Edition), Sape Mullender
Distributed Systems: Principles and Paradigms, Andrew S. Tanenbaum, Maarten van Steen

Course Description

Distributed systems have become the fundamental computing infrastructure for many important real-world applications such as Internet search engines, media streaming servers, online file sharing, information analytics, and scientific exploration. This course explores design and implementation principles in modern distributed systems. In particular, the course will emphasize recent techniques used by real-world distributed systems such as peer-to-peer file sharing (e.g., BitTorrent), enterprise data centers, and Internet search engines (e.g., Google). Students will learn the state of the art in distributed system architectures, algorithms, and performance evaluation methodologies. Topics include i) traditional distributed computing concepts (e.g., distributed objects, middleware, replication, distributed system security, and consensus protocols); and ii) recently emerged distributed system techniques such as peer-to-peer systems, massive data processing, Grid, and autonomic computing. Students will not only learn the common design methodology of many important distributed systems, but also gain hands-on experience through project implementations. The majority of course materials will be drawn from classic papers and current state-of-the-art work. The instructor will lecture for the first half of the semester, and students will present papers and projects in the second half. Students will read and review papers ahead of time, participate in class discussions, present at least one research topic during the course, and do a term project individually or in a two-member team. Students will also write a paper describing their project, review other students' papers, and present their work at the end of the course in a "conference" format designed to give students an experience similar to that of participating in a professional conference.

Prerequisites:


CSC 501, CSC 246, or equivalents. Programming experience in C++ or Java in a Unix environment. If you are not sure whether you are prepared to take this course, please consult the instructor.

Tentative Grading Policy

Written reviews 20%, class participation 20% (presentation 10%, discussion 10%), project 60% (proposal write-up 10%, demo 20%, presentation 10%, final write-up 20%).

Late policy: Lateness is calculated from the time recorded on the assignment email received by the instructor. Students will lose 25% for each 24-hour period they are late on reviews, the project, or the paper.

Paper Review:

Review guidelines: Provide a one-paragraph summary of the paper, a paragraph on 2-3 strong points of the paper (i.e., why the paper should be accepted), a paragraph on 2-3 weak points of the paper (i.e., why the paper should be rejected), and, optionally, ideas for new research related to the work described in the paper.

Project:

Suggested Term Project Topics (NCSU unity ID required).

Both the project proposal and the final report should follow typical paper requirements and use the ACM double-column paper format. The project proposal should include an abstract, an introduction, and proposed approaches. The final project report should be a full paper, including an abstract, introduction, design and algorithms, experimental evaluation, related work, and conclusion. We will organize a mini-conference for students to present their project work. The three best papers will be selected during the mini-conference.

Class Schedule (Tentative):


Week

 Date

Topic

Assigned Readings

Assignments










1



  01/12


Class is cancelled due to NSF site visit.


Investigate your term project idea and prepare for it. A list of candidate project topics will also be provided in class. Talk to the instructor about your project idea, and talk to other students about forming a group of two to three members. Email the instructor to set up an appointment.


Sunday midnight: review due for L. Lamport, "Time, Clocks, and the Ordering of Events in a Distributed System", Communications of the ACM, 1978, and Chandy and Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems", ACM TOCS, 1985.

 





  01/14

Introduction

[slides]


Chapter 1, Distributed Systems: Concepts and Design

2

01/19

Consensus Protocol
[slides]


 

 

Investigate your term project idea and prepare for it. Talk to the instructor about your project idea, and talk to other students about forming a group if you would like to work in one.

Sunday midnight: review due for A. Rowstron and P. Druschel, "Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems", Middleware, 2001, and Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan, "Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications", Proc. of SIGCOMM, 2001.

Sunday midnight: paper presentation signup due. Please send an email to the instructor bidding on two papers from the list below, with your choices listed in decreasing order of preference. You will be allocated one paper to present based on a first-come, first-served (FCFS) policy and paper availability.

01/21

RPC, Distributed Objects, Middleware
[slides]


3

01/26

Replication
[slides]

Sunday midnight: review due for D. Andersen, H. Balakrishnan, F. Kaashoek, and R. Morris, "Resilient Overlay Networks", Proc. 18th ACM SOSP, 2001, and Xiaohui Gu, Klara Nahrstedt, Bin Yu, "SpiderNet: An Integrated Peer-to-Peer Service Composition Framework", Proc. of IEEE International Symposium on High-Performance Distributed Computing (HPDC), Honolulu, Hawaii, June, 2004.


02/2


SysMD & IBM Stream System
[slides1]
[slides2]

   
  • Presented by Yongmin Tan, Juan Du, Vinay Venketash

  

4

02/02


Xen and VCL

  • Presented by Yongmin Tan and Vinay Venketash

 

02/07 midnight: project proposal due.


02/04


Distributed System Security
[slides]

  • Chapter 9, Distributed Systems: Concepts and Design

5

02/09

Peer-to-Peer Systems
[slides]

Sunday midnight: reviews due for Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters", Proc. of OSDI 2004, and Buğra Gedik, Henrique Andrade, Kun-Lung Wu, Philip S. Yu, and MyungCheol Doo, "SPADE: The System S Declarative Stream Processing Engine", International Conference on Management of Data, ACM SIGMOD, 2008.

 

02/11

Overlay Networks
[slides]
  • D. Andersen and H. Balakrishnan and F. Kaashoek and R. Morris, Resilient Overlay Networks, Proc. 18th ACM SOSP, 2001.
  • Y. Chu and S. G. Rao and S. Seshan and H. Zhang, A Case For End System Multicast, IEEE Journal on Selected Areas in Communications (JSAC), Special Issue on Networking Support for Multicast, 2002.

6

02/16

Service Composition in Service Oriented Architecture

[slides]

   

 

Sunday midnight: reviews due for I. Cohen, S. Zhang, M. Goldszmidt, J. Symons, T. Kelly, and A. Fox, "Capturing, indexing, clustering, and retrieving system history", Proc. of SOSP, 2005, and Xiaohui Gu, Haixun Wang, "Online Anomaly Prediction for Robust Cluster Systems", IEEE International Conference on Data Engineering (ICDE), Shanghai, China, April, 2009.

 

02/18

Project Proposal Presentation

7

02/23




Data-Intensive Computing
[slides]





Sunday midnight: reviews due for Z. Gong, P. Ramaswamy, X. Gu, X. Ma, "SigLM: Signature-Driven Load Management for Cloud Computing Infrastructures", Proc. of IEEE International Conference on Quality of Service (IWQoS), Charleston, South Carolina, July, 2009, and J. Du, W. Wei, X. Gu, T. Yu, "RunTest: Assuring Integrity of Dataflow Processing in Cloud Computing Infrastructures", ACM Symposium on Information, Computer and Communications Security (ASIACCS), Beijing, China, April, 2010.




02/25




Massive Data Stream Processing
[slides]




8

03/02


Autonomic Computing
[slides]


  • J. Kephart and D. Chess, The Vision of Autonomic Computing, Computer Magazine, IEEE, 2003.
  • Jeffrey O. Kephart: Research challenges of autonomic computing. ICSE 2005: 15-22.

Sunday midnight: reviews due for
Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz, and Ion Stoica, "Improving MapReduce Performance in Heterogeneous Environments", OSDI 2008, and
Wei Xu, Ling Huang, Armando Fox, David Patterson, Michael Jordan, "Detecting Large-Scale System Problems by Mining Console Logs", Proc. of SOSP 2009.


03/04

System Machine Learning
[slides]


9

03/09

Cloud Computing
[slides]

 

Sunday midnight: reviews due for Jed Liu, Michael George, K. Vikram, Xin Qi, Lucas Waye, Andrew C. Myers, "Fabric: A Platform for Secure Distributed Computation and Storage", Proc. of SOSP 2009, and

Bryan Parno, Jonathan M. McCune, Dan Wendlandt, David G. Andersen, Adrian Perrig, CLAMP: Practical Prevention of Large-Scale Data Leaks, Proc. of IEEE Symposium on Security and Privacy 2009.



03/11

System Research Methodology (moved to 03/09 due to NSF panel trip)
[slides]


10

03/16

Spring Break
  • No Class.

 

 



03/18

Spring Break
  • No Class.


11

03/23

Student presentation

No paper reading assigned. You should spend time on your term projects.

03/25

Student presentation


12

03/30

Student presentation

 

 

No paper reading assigned. You should spend time on your term projects.

 



04/01

Spring Holiday

  • No Class.


13

04/06

Student Presentation



 

No paper reading assigned. You should spend time on your term projects.

 



04/08


Project Mid-Review



14

04/13

Student presentation

  • P. Barham et al., Xen and the Art of Virtualization, Proceedings of the ACM Symposium on Operating Systems Principles (SOSP), October 2003. (presented by Aditya Rao)

 

 

No paper reading assigned. You should spend time on your term projects.

 



04/15

Student presentation




15

04/20

Student presentation

No paper reading assigned. You should spend time on your term projects.





04/22

Mini-Conference for Project Presentation











16
 

  04/27

Mini-Conference for Project Presentation

May 11 midnight: final project report, project source code, and documentation due

Your project source code and documentation should be submitted as a single zip file. The zip file should include your system source code together with all dependent packages, the experimental subjects used in the project report, instructions on how to set up and use the system to reproduce the experimental results, and any other documents that help others understand your tool's source code.



 
 


 04/29



Mini-Conference for Project Presentation





 


 

Suggested Topics for Student Presentations (you may also propose to the instructor papers that are not on this list but that you would like to present):


System Anomaly Diagnosis

Data-Intensive Computing

Distributed System Security

Virtualization & VM security monitoring

Distributed Systems in Real World

Large-Scale System Monitoring

Green Computing

Academic Integrity

The university provides a detailed policy on academic integrity. This policy can be found in the Code of Student Conduct. It is understood that when you submit your homework, you are implicitly agreeing to the university honor pledge: "I have neither given nor received unauthorized aid on this test or assignment."

Academic dishonesty (e.g., cheating or plagiarism) will not be tolerated under any circumstances. If you are having difficulty with any part of the course material, please see me as soon as possible. I will do everything I can to help you with any course-related problems you may be having. If you are found to be guilty of academic dishonesty, however, I will then do everything I can to see that you are punished as forcefully as possible. This may include asking to have you suspended or expelled from the course, the program, and/or the university. At a minimum, you will receive -50% for the assignment in question, and your name will be placed on record with the university as having committed an academic offence. Multiple offences during your academic career will result in suspension or expulsion from the university. I take absolutely no pleasure in pursuing cases of academic misconduct, and would ask that you please not put me in this position.

Students With Disabilities

All effort will be made to ensure that no students with disabilities are denied any opportunity to successfully complete this course. If you have specific requirements that need to be addressed, please contact me immediately. Possible changes can include (but are not necessarily limited to) rescheduling classes from inaccessible to accessible buildings, or providing access to auxiliary aids such as tape recorders, special lab equipment, or other services such as readers, note takers, or interpreters. This may also include oral or taped tests, readers, scribes, separate testing rooms, or extension of time limits.

Lab Safety Issues

None.

Pass-Through Costs

None.