CSC 541
Advanced Data Structures
MWF 8:30-9:20  2236 EB-III

Introduction

CSC 541 focuses on advanced data structures and systems that utilize these data structures. My goals for you in CSC 541 are to:

Enrollment

Because of the popularity of CSC 541, no audits are permitted. I have no control over enrollment. This is managed by MyPack, so any requests for entry into the class or the waitlist will be directed there. Openings in the class will be filled from the waitlist in order of entry onto the waitlist. Finally, students from outside the Computer Science Department will not be permitted to request enrollment until the first day of the semester, which normally means there will not be any availability either in the class or on the waitlist.

For students not on the waitlist, the only option is to continue to check for room on the waitlist through MyPack.

Having said this, if you know that you WILL NOT maintain enrollment in CSC 541, please drop the class from your schedule as soon as possible. This will allow other students who intend to take the class to enroll. Waiting until the drop deadline when you know you have no intention of staying in the class is extremely selfish, since it's essentially impossible for new students to enter the class at that date and successfully catch up with the material that's been covered. Please be considerate of your fellow classmates.

Instructor Information

Both the TA and I are available during office hours each week. If you need to meet either of us outside of office hours, please email to arrange an appointment. The TA is responsible for all grading of assignments and tests, so if you have any questions about grading, please start by talking with the TA. If you cannot resolve the issue, you should then contact me to discuss it further.

Instructor. Christopher G. Healey
Contact. healey@ncsu.edu
Office Hours. M 9:30–10:30 in 2266 EB-II, or by appointment
TA. Kornelia Bastin
TA Contact. kabastin@ncsu.edu
TA Office Hours. Th 11:00–12:00 in 2231 EB-II, or by appointment

Assignments

The four assignments in CSC 541 will investigate different searching and sorting techniques. The topics and due dates of the four assignments are as follows:

All assignments must be completed individually. No collaboration is allowed during the implementation of any of the assignments. Each assignment is worth 10% of your final grade.

In-Class Quizzes

There will be two unannounced in-class quizzes during the first few weeks of class. The quizzes are will be short, and will focus on material discussed during the 2–3 lectures prior to each quiz.

Please ensure you attend the lectures so you are present for the quizzes. Each quiz is worth 2.5% of your final grade.

Textbooks

All instruction in this year's CSC 541 will use material from Disk-Based Algorithms for Big Data. In addition to this resource, Donald Knuth's Sorting and Searching reference can be used to provide an alternative presentation to some of the different topics we will discuss.

Course Schedule

Below is a tentative course schedule, including topics, a brief overview of each topic, and links to the slides on topics presented in class. Please note that time frames and topics will be confirmed in class and are subject to possible changes.

Topic Description Slides
Physical Disk Storage physical hard disk drives, cluster and block allocation, access cost, logical-to-physical data storage, buffer management Chapter 01
Order Notation theta notation, big-O notation, big-Omega notation, practical algorithm analysis Appendix A
Logical File Organization logical components, target record identification Chapter 02
File Access sequential access, direct access
File Management record addition, record update, fixed-length record deletion, variable-length record deletion
File Indexing simple indices, index management, large index files, secondary key indices
In-Memory Sorting heapsort, merge sort, Timsort Chapter 03
In-Memory Searching linear search, binary search, BST, k-d tree, hashing Chapter 04
Disk-Based Sorting mergesort, mergesort analysis, mergesort improvements Chapter 05
B-Trees paged BST, B-Tree, B* Tree, B+ Tree Chapter 06
Extendible Hashing extendible hashing, tries, hash tries
Storage Technology optical drives, SSD, holographic storage, molecular memory, MRAM Chapter 07
Distributed Hash Tables DHT history, keyspaces, keyspace partitioning, overlay networks, Chord Chapter 08
Large File Systems RAID, ZFS, GFS, Hadoop/HFS, Cassandra, Presto Chapter 09
NoSQL Storage RDBMS, graph databases, Neo4j, document databases, Mongo Chapter 10

Schedule of Reading Assignments

Apart from material in the class slides related to individual lectures, no additional readings will be assigned. Students will be informed in class at the beginning of each week which sections of the slides will be covered during that week's lectures.

Assignment and Test Schedule

All assignments will be submitted with Moodle, the university's web-based assignment submission system.

Grading

Grades for the course will be made up from four assignments, the in-class quizzes, a midterm, and a final exam. You are expected to attend all lectures, and read all relevant class slides and programs I provide. Missed exams cannot be made up without an official university excuse. Homework should be submitted via Moodle by 11:59:59pm on the due date.

Final grades will be calculated as follows, using +/- grading:

Assignments. 40%
In-Class Quizzes. 5%
Midterm Exam. 20%
Final Exam. 35%

Because CSC 541 is oversubscribed, no audits will be allowed in this class.

Late or Missed Assignments

Missed assignments and exams cannot be made up without a university-recommended excuse. There is no late assignment policy. Assignments will not be accepted after their due date. Contact me as soon as possible if you need to discuss reasons for late or missed assignments or exams.

Class Absences

If you miss (or plan to miss) class(es), you are expected to "make up" the material by reading the appropriate section(s) in the class slides, and meeting with me if necessary to discuss the material.

Prerequisites

Registered as a graduate computer science (CSC) student, or with permission of the instructor. You are also expected to have a solid foundation in C programming, data structures, and analysis of algorithms.

Academic Integrity

The university provides a detailed policy on academic integrity. This policy can be found in the Code of Student Conduct. It is understood that when you sign and submit your homework, midterm exam, and final exam, you are implicitly agreeing to the university honor pledge: "I have neither given nor received unauthorized aid on this test or assignment."

Academic dishonesty (e.g., cheating or plagiarism) will not be tolerated under any circumstances. If you are having difficultly with any part of the course material, please see me or the TA as soon as possible. I will do everything I can to help you with any course-related problems you may be having. If you are found to be guilty of academic dishonesty, however, I will then follow standard university procedure to ensure the standard consequence for academic dishonisty at the university. Please do not ask me to "give you a break." I am required by university policy to document every case of verified cheating. And, to be honest, I am rarely sympathetic to this type of behaviour.

At a minimum, you will receive -25% for the assignment or exam in question, and your name will be placed on record with the university as having committed an academic offense. Multiple offenses during your academic career will result in suspension or expulsion from the university. Depending on the severity of the offense, the penalty could be increased to an F in the course, or a request to have a student removed from the program.

I take absolutely no pleasure in pursuing cases of academic misconduct, and would ask that you please do not put me in this position.

Compliance will be monitored by the MOSS software, which is very good at detecting similarities in programs. MOSS has been used to successfully identify cases of copying or plagiarism in a number of CSC courses, and will be applied to all programming assignments you complete.

Note: A recently common tactic for attempting to falsify assignments in this course is to use GitHub or some other online resource to obtain a solution to the data structure being implemented, then attempt to modify that solution to "fit" the assignment requirements. We actively monitor available online solutions and include them in our MOSS testing. I consider this strategy worse than standard cheating or aiding and abetting, since your are subverting someone's work that was provided for the benefit of the programming community to your own personal gain. Because of this, if you are found to be cheating based off external resources, the penalty will be increased to a minimum of -50%, with a reasonable likelihood of a recommendation of an F for the course barring extenuating circumstances. I strongly encourage you to dissuade yourself of considering this avenue for assignment completion.

Students With Disabilities

All effort will be made to ensure that no students with disabilities are denied any opportunity to successfully complete this course. If you have specific requirements that need to be addressed, please contact me immediately. Possible changes can include (but are not limited to) rescheduling classes from inaccessible to accessible buildings, or providing access to auxiliary aids such as tape recorders, special lab equipment, or other services such as readers, note takers, or interpreters. This may also include oral or taped tests, readers, scribes, separate testing rooms, or extension of time limits.