Course Syllabus
Course Description and Objectives
MAT388E is an undergraduate-level course that aims to provide an introduction to commonly used statistical methods for inference and prediction problems in data analysis. The course is designed such that:
- The methods covered will include supervised learning algorithms with a focus on regression and classification problems and unsupervised learning algorithms with a focus on clustering problems,
- Application of these methods to data analysis problems and their software implementation will be done via Python.
Course Type
This is an undergraduate-level elective course for Mathematical Engineering students.
Course Credits
3 local credits.
Course Prerequisites
Since the course also touches on the mathematical and statistical theory behind the methods and uses Python for implementation, this course requires the following background:
- Knowledge of linear algebra, probability, statistics, and optimization,
- Familiarity with Python’s NumPy, Pandas, Matplotlib, Seaborn, Statsmodels, and Scikit-Learn libraries,
- Familiarity with at least one computational document tool such as Jupyter Notebook, Google Colab, Visual Studio Code, or RStudio Quarto, and
- Familiarity with Git commands and the GitHub interface.
Class Schedule
CRN 10623:
Thursdays, 14:30-17:30, in OBL3 (Computer Lab).
Course Logistics
- All course-related announcements will be made through Ninova.
- Lecture materials (lecture slides, code scripts, assignments, etc.) will be uploaded to the course's GitHub organization.
- Students are also expected to bring their own portable computer to class.
Course Workload
2 quizzes, 1 midterm, and 1 group-based project presentation along with a written report (see details below).
Course Tentative Plan
We will closely follow the weekly schedule given below. However, weekly class schedules are subject to change depending on the progress we make as a class.
Week 1. Introduction to statistical learning. Supervised and unsupervised learning. Introduction to simple linear regression. Basic optimization concepts used in simple linear regression analysis. Model evaluation metrics for regression problems.
Week 2. Multiple linear regression. Basic optimization concepts used in multiple linear regression analysis.
Week 3. Polynomial regression. Bias-variance trade-off. Over-fitting and under-fitting.
Week 4. Feature selection approaches. Feature Engineering (scaling, encoding).
Week 5. Regularization methods for regression problems. Ridge and lasso regression.
Week 6. Cross-validation. Grid search and hyper-parameter tuning. Pipelines.
Week 7. Introduction to classification. Logistic regression. Gradient descent algorithm. Evaluation metrics for binary classification algorithms. Decision boundary concept.
Week 8. Multi-class classification. Evaluation metrics for multi-class classification algorithms.
Week 9. Naive Bayes. K-nearest neighbors. Remaining topics related to classification such as under-sampling, over-sampling.
Week 10. Decision trees.
Week 11. Bootstrapping, Bagging, Ensemble methods (Random forests and Boosting).
Week 12. Unsupervised learning. Principal component analysis.
Week 13. Clustering methods. K-means algorithm.
Week 14. Hierarchical clustering, DBSCAN.
Student Learning Outcomes
A student who completes this course successfully is expected, immediately following the course and/or a few months after it:
- To be fluent in the fundamental principles behind several statistical methods,
- To be able to apply statistical methods to real-life problems and data sets, and
- To be prepared for more advanced coursework or an industrial internship in machine learning and related fields.
Textbook
All lecture materials.
Recommended Primary Bibliography
Students are encouraged to consult the following sources on their own:
- Hastie, T., Tibshirani, R., and Friedman, J.H. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer. [Hard copy available at ITU Mustafa Inan Library with CALL #Q325.5 .H37 2009] [Available online at https://hastie.su.domains/ElemStatLearn/]
- James, G., Witten, D., Hastie, T., and Tibshirani, R. (2021). An Introduction to Statistical Learning: With Applications in R. New York: Springer. [Available online at https://www.statlearning.com/].
- Fan, J., Li, R., Zhang, C.H., and Zou, H. (2020). Statistical Foundations of Data Science. Chapman and Hall/CRC.
- Deisenroth, M.P., Faisal, A.A., and Ong, C. S. (2020). Mathematics for Machine Learning. Cambridge University Press. [Available online at https://mml-book.github.io/].
- VanderPlas, J. (2016). Python Data Science Handbook: Essential Tools for Working with Data. O’Reilly Media, Inc. [Available online at https://jakevdp.github.io/PythonDataScienceHandbook/].
- Müller, A.C., and Guido, S. (2016). Introduction to Machine Learning with Python: A Guide for Data Scientists. O’Reilly Media, Inc. [Available online at https://github.com/amueller/introduction_to_ml_with_python].
Supplementary Readings
- Murphy, K.P. (2022). Probabilistic Machine Learning: An Introduction. MIT Press. [Available online at https://probml.github.io/pml-book/book1.html].
- Bishop, C.M. (2006). Pattern Recognition and Machine Learning. New York: Springer. [Hard copy available at ITU Mechanical Eng. Library with CALL #Q327 .B52 2006]
Off-Campus Access to the ITU Library E-sources
Remote access to library e-sources is possible with a library account. Users without a library account should apply for library registration at Library register. After applying the web configuration given at Proxy once on your computer, you will be able to access ITU Library e-sources.
Selected Important Dates
For the official ITU 2023-2024 academic calendar, please visit:
Here are some selected important dates in the Fall 2023 semester:
October 2, 2023: First day of classes.
October 2-13, 2023: Add-drop week.
October 29, 2023: Republic Day of Turkey (Sunday).
January 1, 2024: New Year's Day (Monday).
January 5, 2024: Last day of classes.
January 8-21, 2024: Final exam week.
I also honor other national and religious holidays. Students who need flexibility for individual work that overlaps with these special days may inform me.
Course Policies
Please read the information below as a reference for how this class will be conducted.
Grading Policy
Assessment Method | Contribution to Final Grade |
---|---|
2 quizzes | 5% each |
1 midterm | 50% |
Data analysis project presentation | 20% |
Data analysis project report | 20% |
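As a rough illustration of how the weights in the table combine, the following is a minimal Python sketch. It assumes every component is scored on a 0-100 scale, which the syllabus does not state, and the names `WEIGHTS`, `final_score`, and `example_scores` are hypothetical. It does not describe any official grade calculation or letter-grade conversion.

```python
# Weights taken from the grading table above, expressed as fractions of the final grade.
WEIGHTS = {
    "quiz_1": 0.05,
    "quiz_2": 0.05,
    "midterm": 0.50,
    "project_presentation": 0.20,
    "project_report": 0.20,
}


def final_score(scores):
    """Return the weighted score, assuming each component is on a 0-100 scale."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example_scores = {
    "quiz_1": 80,
    "quiz_2": 60,
    "midterm": 70,
    "project_presentation": 90,
    "project_report": 85,
}

print(round(final_score(example_scores), 2))
```

The weights sum to 100%, so a student with the same score on every component would receive that score overall.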
Midterm date and coverage
The midterm date will be announced later. The midterm will cover all topics covered up to that week. The main aim of the midterm is to assess whether you are able to frame a data analysis problem, implement it, and report the results. The midterm will be a hands-on, open-book exam, so you must bring your own portable computer to the exam.
Data analysis project presentation (along with report submission) date and coverage
- The project presentation and report submission date is the final exam date that will be announced by ITU SIS later in December 2023.
- In the data analysis project, you are asked to develop a data analysis project from scratch.
- You need to find a data set and define a research problem around it.
- Then, you have to apply algorithms covered in the course, as well as ones not covered (e.g., kernel methods, network clustering, graph analytics, semi-supervised learning, Gaussian processes, reinforcement learning, and big data analytics platforms), to answer your research problem.
Final Exam Attendance Policy
There is no VF rule governing eligibility to take the final exam.
Make-Up Exam Policy
- Students who miss either the midterm or the data analysis project presentation due to a health problem can take a make-up midterm or presentation, as long as they have a valid medical report issued on the exam day.
- The medical report should be handed in immediately (within two days of its expiration).
- There will be NO make-up for missed quizzes.
Class Attendance Policy
Students must attend at least 70% of classes and are responsible for managing their own absences.
Participation Policy
Students are expected to ask and answer questions, participate in in-class activities, and show their interest and engagement in the class.
E-mail Policy
Please:
- Use a proper descriptive subject line (which may consist of the course number MAT388E followed by a short phrase summarizing the subject of your e-mail).
- Start off your e-mail with a proper greeting, introduce yourself (give your name), then state your problem as briefly as possible.
- Finally, use a proper closing and finish your e-mail with your first name.
Feel free to send me e-mails, but be sure that you give me enough time to get back to you.
Academic Honesty Policy
At every stage of academic life, every ITU student is responsible for obeying the academic honesty policy of ITU stated below:
https://odek.itu.edu.tr/en/code-of-honor/ethics-in-university-life.
Equity, Diversity, and Inclusion
In this class, I am committed to respecting cultural and individual differences and diversity, including, but not limited to, age, disability, ethnicity, gender, gender identity, language, national origin, race, religion, culture, and socioeconomic status, and I acknowledge the value of differences.
Students with Special Needs
I truly care that every student in my class feels equally involved. If you are a student with special needs, please let me know how we can adjust the course environment, materials, and assessment methods in accordance with your needs. Furthermore, you are also invited to contact the office of students with special needs at: