Welcome to CPS 5721 (Knowledge Discovery and Data Mining)
Starting in Spring 2025, CPS 4721 (Data Mining Principles) will no longer be co-listed with CPS 5721 (Knowledge Discovery and Data Mining). From that point forward, CPS 4721 will have its own materials, assignments, and exams, separate from those of CPS 5721. CPS 4721 will focus on basic data mining, while CPS 5721 will focus on more advanced data mining.
Students who take CPS 4721 in Spring 2025 will be eligible to take CPS 5721 in Spring 2026. However, students who took CPS 4721 in Spring 2024 cannot take CPS 5721 in Spring 2025, as the two courses were co-listed at that time and shared the same teaching content and exams.
Data Science is becoming one of the most important areas in computer science, and data mining is at the core of this new era. In this course, you will learn about data warehousing, data mining concepts, supervised and unsupervised techniques, and automated analytics, and you will gain hands-on experience. This course emphasizes data analytics, development, and automation, which means a lot of programming.
If you don't like coding, please do NOT take this course.
Students are encouraged to take CPS 4745/5745 (Data Visualization) in the fall semester, if they are interested in Data Science.
Please click here to see all CS/IT programs at Kean University.
We will cover the following topics:
- SQL DML, aggregations, group by
- SQL DDL, upload CSV to MySQL
- Materialized view, data mart and data visualization
- Data mining concepts, case study
- Data warehousing, OLTP, OLAP
- Data processing: internet data, text and image
- Detect local minimum, local maximum, up and down trends
- Supervised vs unsupervised methods
- Binning
- Similarity, dissimilarity - distances, cosine, Manhattan (city block), Euclidean, supremum
- Text mining and text similarity
- Web crawling, search engine
- Association, mining frequent patterns
- Classification, impurity, gain, decision tree
- Image classification - TensorFlow
- Regression, SVM
- Anomaly impact, outlier detection using boxplot, regression, histogram, mean and standard deviation, Z-score
- Clustering, K-means, hierarchical, dendrogram, 4-connected
- Statistical techniques
- Data visualization concepts
- Data mining trends, impacts
- Automated analytics - statistical functions, correlation, p-value
- Python Pandas, matplotlib, geopandas
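As a small taste of the similarity and dissimilarity topic listed above, here is a minimal Python sketch of the four measures it names. The function names and the sample points are illustrative assumptions, not taken from the course materials, and the class exercises may define or implement these measures differently.

```python
import math

def manhattan(a, b):
    """Manhattan (city block) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """Euclidean distance: square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def supremum(a, b):
    """Supremum (Chebyshev) distance: largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical sample points, chosen only for illustration.
p, q = (1, 2), (4, 6)
print(manhattan(p, q))          # 7
print(euclidean(p, q))          # 5.0
print(supremum(p, q))           # 4
print(cosine_similarity(p, q))  # close to 1, since p and q point in similar directions
```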
Prerequisite
The prerequisite for CPS 5721 is CPS 3740 or CPS 5740. If you have not completed the prerequisite, you should withdraw from the class. The projects require the strong database and web database skills built in CPS 3740/5740.
Note: This course is available only in the spring semester.
Instructor: Dr. Ching-yu (Austin) Huang
Class information:
- All class materials (syllabus, slides, assignments, tools) are posted on the class Google Drive. You must use your Kean email account to access the folder.
- You can access the latest syllabus here.
- This is a hands-on course. Students are required to bring a personal laptop to every class.
- Instructional Methods: lecture with slides, student presentations, class discussions and exercises, and project creations.
- Course grading: please see the course syllabus.
- In addition to the written exams, this combined course requires students to study and present a selected data mining research topic and to complete 3 large homework assignments on a Linux server.
- Grading policy: Total 1000 points.
A: >=940, A-: 939-890, B+: 889-840, B: 839-800, B-: 799-760, C+: 759-720, C: 719-680, D: 679-640, F: below 640
(C or better is needed for CS and IT majors.)
(B- or better is needed for the CIS Graduate program.)
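For students who want to track their standing in a script, the scale above could be expressed as a simple lookup. This is only a sketch: the name `letter_grade` is hypothetical, and treating each range's lower bound as an inclusive cutoff (so that fractional totals fall into the range below the next cutoff) is an assumption. The syllabus remains the authority on grading.

```python
# Inclusive lower bound for each letter grade, out of 1000 total points.
GRADE_CUTOFFS = [
    (940, "A"),
    (890, "A-"),
    (840, "B+"),
    (800, "B"),
    (760, "B-"),
    (720, "C+"),
    (680, "C"),
    (640, "D"),
]

def letter_grade(points):
    """Return the letter grade for a point total, per the scale above."""
    for cutoff, letter in GRADE_CUTOFFS:
        if points >= cutoff:
            return letter
    return "F"  # anything below 640

print(letter_grade(905))  # A-
print(letter_grade(760))  # B-
```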
CPS 5721 Course Description
This course covers fundamentals of knowledge discovery and data mining concepts, techniques, algorithms, and languages; architectures, designs, and technology; and includes applications in business and the sciences.
CPS 5721 Student Learning Outcomes
Upon completion of this course, the student will be able to:
- Summarize knowledge discovery and data mining (KDDM).
- Summarize the historical perspective and the social impact of KDDM.
- Compare and summarize the differences between KDDM and information access/retrieval.
- Analyze KDDM processes, concepts, techniques, algorithms, and languages.
- Demonstrate KDDM applications in business and the sciences.
Books and resources
Requirements - Students will need the following to do exercises and assignments:
- Students will need an account on obi.kean.edu (a Linux server). The instructor will create the accounts so students can do exercises and assignments from the first day of class.
- Students will need the following tools to connect to a Linux server. hostname: obi.kean.edu, port: 22, protocol: ssh
Windows: Use the "PuTTY" software to connect to the Linux server. You can download PuTTY from http://www.putty.org/
Mac: Use the "Terminal" application. Please refer to this tutorial.
- FileZilla-Client to transfer files between the Linux server and your desktop/laptop. You can download FileZilla-Client at
https://filezilla-project.org/download.php?type=client
The hostname and port # are the same as above. The protocol should be "sftp". The logon type should be "normal". You need to enter your user login and password.
- A good text editor: Sublime Text, Visual Studio Code, or others. You can download Sublime Text at https://www.sublimetext.com/3
- Create public_html and CPS5721 folders under your home directory on obi.kean.edu, and set the proper permissions for the folders and files. Please refer to the procedures on this yoda web page.
- Practice these basic Unix commands: cd, ls -la, more, pwd, mkdir, rm, mv, cp, chmod, etc. Please refer to this Unix command document.
- Create and test your database account by following this procedure.
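The folder setup step above is normally done with mkdir and chmod on the server. As a rough Python equivalent, the sketch below creates the two folders and sets their permissions. The function name `create_course_folders` is hypothetical, and the 755 mode is an assumption, since the required permissions are given only in the linked procedure. Follow that procedure if it differs.

```python
from pathlib import Path

def create_course_folders(home, mode=0o755):
    """Create public_html and CPS5721 under `home` and set their permissions.

    The default mode (rwxr-xr-x) is an assumed value, not taken from the
    course procedure.
    """
    created = []
    for name in ("public_html", "CPS5721"):
        folder = Path(home) / name
        folder.mkdir(exist_ok=True)  # no error if the folder already exists
        folder.chmod(mode)           # explicit chmod, because mkdir is subject to the umask
        created.append(folder)
    return created
```

On the server, this could be called as `create_course_folders(Path.home())`, and the result checked with `ls -la` in your home directory.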
You should review basic Unix, SQL, and PHP/MySQL before the class starts. We will go through these topics quickly and then focus on data mining techniques. You can refresh your Unix commands, SQL, and PHP/MySQL at the following links:
You can get help with basic Web Database Programming from the Samurai program. Samurai will host group review sessions on some web and database topics. You can see the Samurai schedule and VIRTUAL walk-in hours at Code Samurai Program.