This course is self-contained, and provides the essential foundation in natural language processing. It identifies the key concepts underlying NLP applications as well as the main NLP paradigms and techniques.
This course combines the core ideas developed in linguistics and in artificial intelligence to show how to understand language. Key topics include regular expressions, unigrams, and n-grams; word embeddings; syntactic (phrase-structure) and dependency parsing; semantic role labeling; language modeling; sentiment and affect analysis; question answering; text-based dialogue; discourse processing; and applications of machine learning to language processing.
The course provides the necessary background in linguistics and artificial intelligence. This course is suitable for high-performing undergraduates who are willing and able to learn abstract concepts, complete programming assignments, and develop a student-selected project.
This course is being offered in two editions, as CSC 495 and CSC 791. CSC 791 students must complete all the requirements for CSC 495 and in addition produce a term paper describing a research topic based on their project.
The research topic could instead be a substantial review of the literature on some specific aspect of NLP or be an original contribution. Please discuss with me any potential overlap between your paper and your other work, and report it in your term paper.
Upon completion of this course, students will be able to do the following.
The tentative schedule lists the main topics of this course.
I will assign +/- grades. There will be a fair amount of work, so please plan to spend about eight hours (plus time in class) each week.
Component | 791 campus | 791 EOL | 495 |
---|---|---|---|
Exams | 25 | 30 | 20 |
Programming | 60 | 60 | 60 |
Homework | 5 | 5 | 5 |
Class participation | 5 | 0 | 10 |
Message board participation | 5 | 5 | 5 |
Term paper for CSC 791 | 10 | 10 | NA |
Total | 110 | 110 | 100 |
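As a rough illustration of the arithmetic in the table, the following Python sketch converts each component's points into a percentage of the column total. It assumes the final grade is normalized by that total (110 for CSC 791, 100 for CSC 495), which is consistent with the term paper counting for roughly 9% of a CSC 791 grade. The names `WEIGHTS` and `component_shares` are made up for this example and are not part of any course tooling.

```python
# Points per component, copied from the table above (NA treated as 0).
WEIGHTS = {
    "791 campus": {
        "Exams": 25, "Programming": 60, "Homework": 5,
        "Class participation": 5, "Message board participation": 5,
        "Term paper": 10,
    },
    "791 EOL": {
        "Exams": 30, "Programming": 60, "Homework": 5,
        "Class participation": 0, "Message board participation": 5,
        "Term paper": 10,
    },
    "495": {
        "Exams": 20, "Programming": 60, "Homework": 5,
        "Class participation": 10, "Message board participation": 5,
        "Term paper": 0,
    },
}


def component_shares(edition):
    """Return each component's percentage of the edition's total points.

    Assumption: the final grade is normalized by the column total.
    """
    weights = WEIGHTS[edition]
    total = sum(weights.values())
    return {name: 100 * points / total for name, points in weights.items()}


if __name__ == "__main__":
    for edition in WEIGHTS:
        print(edition)
        for name, share in component_shares(edition).items():
            print(f"  {name}: {share:.1f}%")
```

Under this assumption, programming makes up a little under 55% of a CSC 791 grade and 60% of a CSC 495 grade.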
The following programming assignments jointly add up to the programming component of the course grade in the above table. The weights of the assignments are based on their expected complexity. I may change the weights as the semester progresses.
Assignment | Weight |
---|---|
TBD 1 | 15 |
TBD 2 | 15 |
Project report (R0) | 10 |
Project report (R1) | 10 |
Project report (R2) | 10 |
Project report (R3) and demo | 40 |
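The sketch below shows one way to read the assignment weights: as shares of the 60-point programming component. This scaling is an assumption made for illustration only. The syllabus does not spell out the formula, the weights may change during the semester, and the names `ASSIGNMENT_WEIGHTS` and `programming_points` are hypothetical.

```python
# Assignment weights copied from the table above; they sum to 100.
ASSIGNMENT_WEIGHTS = {
    "TBD 1": 15,
    "TBD 2": 15,
    "Project report (R0)": 10,
    "Project report (R1)": 10,
    "Project report (R2)": 10,
    "Project report (R3) and demo": 40,
}

# Points allotted to programming in the grading table.
PROGRAMMING_POINTS = 60


def programming_points(weights, component_points=PROGRAMMING_POINTS):
    """Scale assignment weights so they sum to the programming component.

    Assumption: each assignment contributes in proportion to its weight.
    """
    total_weight = sum(weights.values())
    return {
        name: component_points * weight / total_weight
        for name, weight in weights.items()
    }


if __name__ == "__main__":
    for name, points in programming_points(ASSIGNMENT_WEIGHTS).items():
        print(f"{name}: {points:.1f} of {PROGRAMMING_POINTS} points")
```

Read this way, the final project report and demo would account for 24 of the 60 programming points.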
CSC 791 students must submit a term paper worth approximately 9% of their total grade (10 of 110 points). A general-purpose rubric for term papers is here. However, if you base your term paper on the same topic as your semester project, you can submit it simply by extending your final project report by two pages. In general, the project-based option will be far less work for you.
The course is self-contained. The main informal prerequisite is maturity in thinking about subtle concepts, such as might be gained through experience with conceptual modeling in databases or software.
Prior encounters with AI (knowledge representation and machine learning) or data science will help but aren't necessary.
From long experience, I have discovered that the material in CSC 226 is essential for my courses. Here is a (partial) list of topics that will be assumed: elementary set theory, relations, partial orders, functions, concept of a theorem, propositional logic, and predicate logic.
I recommend you brush up on these topics if you aren't comfortable with them. These topics are covered in CSC 226: Applied Discrete Mathematics. You may review Chapters 1 to 6 from the following book, which is sometimes used as the CSC 226 textbook: