Welcome to CPS 4721 (Data Mining Principles)
Starting in Spring 2025, CPS 4721 (Data Mining Principles) will no longer be co-listed with CPS 5721 (Knowledge Discovery and Data Mining). From that point forward, CPS 4721 will have its own materials, assignments, and exams, separate from those of CPS 5721. CPS 4721 will focus on basic data mining, while CPS 5721 will focus on advanced data mining.
Students who take CPS 4721 in Spring 2025 will be eligible to take CPS 5721 in Spring 2026. However, students who took CPS 4721 in Spring 2024 cannot take CPS 5721 in Spring 2025, as the two courses were co-listed at that time and shared the same teaching content and exams.
Data science is becoming one of the most important areas in computer science, and data mining is at its core. In this course, you will learn about data warehousing, data mining concepts, supervised and unsupervised techniques, and automated analytics, and you will gain hands-on experience. The course emphasizes data analytics, development, and automation, which means a lot of programming.
If you don't like coding, please do NOT take this course.
CPS 4721 is a required course for the B.S. in Computer Science (Data Science Option).
Students who are interested in Data Science are encouraged to take CPS 4745/5745 (Data Visualization) in the fall semester.
Please click here to see all CS/IT programs at Kean University.
We will cover the following topics:
- SQL DML, aggregations, group by
- SQL DDL, upload CSV to MySQL
- Materialized view, data mart and data visualization
- Data mining concepts, case study
- Data warehousing, OLTP, OLAP
- Binning, similarity, dissimilarity
- Association, mining frequent pattern
- Supervised vs unsupervised methods
- Anomaly impact, outlier detection using Boxplot
- Clustering, K-means
- Statistical techniques, histogram, median, mean and standard deviation
- Data visualization concepts
- Data mining ethics, privacy, security, impact and trends
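As a small preview of two of the topics above (basic statistics and outlier detection using a boxplot), here is a minimal Python sketch. It is an illustration only, not course material: the `scores` list is made-up sample data, the helper name `boxplot_outliers` is hypothetical, and it assumes the common 1.5 x IQR fence rule that boxplots usually apply.

```python
import statistics


def boxplot_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k * IQR rule.

    Assumption: quartiles are computed with the "inclusive" method of
    statistics.quantiles; other tools may use slightly different rules.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                      # interquartile range (box height)
    lower = q1 - k * iqr               # lower whisker limit
    upper = q3 + k * iqr               # upper whisker limit
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


# Made-up sample data for illustration only.
scores = [52, 55, 57, 58, 60, 61, 63, 64, 66, 120]

summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "stdev": statistics.stdev(scores),  # sample standard deviation
}
lower, upper, outliers = boxplot_outliers(scores)

print("summary:", summary)
print("fences:", lower, upper)
print("outliers:", outliers)
```

In this sample, the single extreme value pulls the mean well above the median, which is one reason outliers are usually examined before further analysis.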
Prerequisite
The prerequisite for CPS 4721 is CPS 3740. If you have not completed the prerequisite, you should withdraw from the class. The projects require strong database and web database skills built in CPS 3740/5740.
Note: This course is available only in the spring semester.
Instructor: Dr. Ching-yu (Austin) Huang
Class information:
- All class materials (syllabus, slides, assignments, tools) are posted on the class Google Drive. You must use your Kean email account to access the folder.
- You can access the latest syllabus here.
- This is a hands-on course. Students are required to bring a personal laptop to every class.
- Instructional methods: lectures with slides, student presentations, class discussions and exercises, and project creation.
- Course grading: please see the course syllabus.
- In addition to the written exams, this course requires students to study and present a selected data mining research topic and to complete 3 large homework assignments on a Linux server.
- Grading policy: Total 1000 points.
A: >=940, A-: 939-890, B+: 889-840, B: 839-800, B-: 799-760, C+: 759-720, C: 719-680, D: 679-640, F: below 640
(C or better is needed for CS and IT majors.)
(B- or better is needed for the CIS Graduate program.)
CPS 4721 Course Description
This course covers the basic principles, methods, and applications of data mining. Students will learn how data mining techniques work, how they can be applied to real-world problems across different domains, and how they impact society.
CPS 4721 Student Learning Outcomes
Upon completion of this course, the student will be able to:
- Demonstrate an understanding of what data mining is
- Summarize the social impact of data mining
- Illustrate the differences between data mining and information access/retrieval
- Analyze data mining processes, concepts, techniques and methods
- Understand data mining applications
Books and resources
Requirements - Students will need the following to do the exercises and assignments:
- Students need to download and install Kean VPN.
- https://www.kean.edu/software (Kean authentication is required.)
- The VPN server name should be: sslvpn.kean.edu/student
- Students will use their Kean email account to log in to obi.kean.edu (a Linux server). Hostname: obi.kean.edu, port: 22, protocol: ssh
Windows: Use the "PuTTY" software to connect to the Linux server. You can download PuTTY from http://www.putty.org/
Mac: Use the "Terminal" application. Please refer to this tutorial.
- FileZilla Client to transfer files between the Linux server and your desktop/laptop. You can download FileZilla Client at https://filezilla-project.org/download.php?type=client
The hostname and port number are the same as above. The protocol should be "sftp" and the logon type should be "normal". You need to enter your user login and password.
- A good text editor: Sublime Text, Visual Studio Code, or another editor. You can download Sublime Text at https://www.sublimetext.com/3
- Create the public_html and CPS4721 folders under your home directory on obi.kean.edu, and set the proper permissions for the folders and files. Please refer to the procedures on this yoda web page.
- Practice these basic Unix commands: cd, ls -la, more, pwd, mkdir, rm, mv, cp, chmod, etc. Please refer to this Unix command document.
- Create and test your database account by following this procedure.
You should review basic Unix, SQL, and PHP/MySQL before the class starts. We will go through these topics quickly and then focus on data mining techniques. You can refresh your knowledge of Unix commands, SQL, and PHP/MySQL at the following links:
You can get help from the Samurai program for basic Web Database Programming. Samurai will host group review sessions for some topics related to the web and databases. You can see the Samurai schedule and VIRTUAL walk-in hours at Code Samurai Program.