Course Title: Topics on Data Mining (数据挖掘专题)
Credits: 1
Hours per Week: 8
Offering School: School of Statistics and Mathematics
Prerequisites: Probability and Mathematical Statistics, Stochastic Processes, Linear Algebra, Python Programming
Intended Students: Undergraduates
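Course Description: The rapid development of computers and the Internet lets us obtain large amounts of information, such as text, audio, and images, in real time. In addition, large volumes of personal data, such as search logs, purchase records, and diagnosis histories, accumulate every day. Data at this scale is called big data, and there is a growing trend toward creating new value and business opportunities by extracting useful information from it. This process is usually called data mining, and machine learning algorithms are the core technology for extracting that information. This course gives a comprehensive and systematic introduction to the basic concepts, principles, and main methods of data mining, in particular performance measures, classification algorithms (decision trees, SVM, KNN), principal component analysis and clustering, linear regression, text mining (text preprocessing, text classification), and neural network algorithms.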
Textbook: Hang Li, Statistical Learning Methods (《统计学习方法》), Tsinghua University Press, 1st edition, March 2012
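Reference Textbook: Masashi Sugiyama (Japan), Introduction to Statistical Machine Learning, English edition (《统计机器学习导论(英文版)》), China Machine Press, December 2017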
Course Title: Topics on Data Mining
Credit: 1
Periods per week: 8
Department: School of Statistics and Mathematics
Prerequisites: Probability and Mathematical Statistics, Stochastic Processes, Linear Algebra, Python Programming
Students: Undergraduates
Contents: Recent developments in computers and the Internet allow us to immediately access a vast amount of information such as text, sound, images, and video. Furthermore, a wide range of personal data such as search logs, purchase records, and diagnosis histories is accumulated every day. Such a huge amount of data is called big data, and there is a growing tendency to create new value and business opportunities by extracting useful knowledge from data. This process is often called data mining, and machine learning is the key technology for extracting useful knowledge. This course provides an overview of the field of machine learning. It focuses on supervised learning methods, including k-nearest neighbors, decision trees, logistic regression, and support vector machines, and also covers clustering, text mining, and neural networks.
Course Book: Statistical Learning Methods, by Hang Li, Tsinghua University Press, first edition, March 2012
Reference Book: Masashi Sugiyama, Introduction to Statistical Machine Learning, Elsevier, 2015.