Topics in Data Mining

程宏、王艺红、张海彬、许志钦

Contents

  • 1 Study Guide
    • 1.1 Study Guide
    • 1.2 Syllabus
    • 1.3 Teaching Schedule
  • 2 Introduction to Machine Learning (Preview)
    • 2.1 Introduction to Machine Learning
    • 2.2 Machine Learning with Python
    • 2.3 Supervised vs. Unsupervised Learning
    • 2.4 Quiz 1--Introduction to Machine Learning
  • 3 Data Mining--Model Evaluation and Classification Algorithms
    • 3.1 Chapter Introduction
    • 3.2 Model Evaluation in Machine Learning
    • 3.3 Machine Learning Classification Algorithms
    • 3.4 Classification Algorithms--K-Nearest Neighbors
      • 3.4.1 Lab: KNN
    • 3.5 Classification Algorithms--Decision Tree Models and Learning
      • 3.5.1 Lab: Decision Trees
    • 3.6 Classification Algorithms--Logistic Regression
      • 3.6.1 Lab: Logistic Regression
    • 3.7 Classification Algorithms--Support Vector Machines
      • 3.7.1 Lab: SVM
    • 3.8 Quiz 2--Classification
  • 4 Data Mining--Text Mining
    • 4.1 Introduction to Text Mining
    • 4.2 Preprocessing
    • 4.3 Text Classification
    • 4.4 Text Mining in Practice
    • 4.5 Lecture Slides (PPT)
  • 5 Data Mining--Unsupervised Learning
    • 5.1 K-Means Clustering
    • 5.2 Clustering Case Studies
    • 5.3 Principal Component Analysis
    • 5.4 PCA Case Studies
  • 6 Data Mining--Neural Networks
    • 6.1 Basic Structure and Key Problems
      • 6.1.1 Lecture 1--In-Class Assignment
      • 6.1.2 Lecture 1--Homework
    • 6.2 Fourier Analysis I
      • 6.2.1 Lecture 2--In-Class Assignment
      • 6.2.2 Lecture 2--Homework
    • 6.3 Fourier Analysis II
      • 6.3.1 Lecture 3--In-Class Assignment
      • 6.3.2 Lecture 3--Homework
    • 6.4 Frequency Principle
Chapter Introduction

Model Evaluation and Classification Algorithms

In this module, you will learn about classification techniques. You will practice with different classification algorithms, such as KNN, Decision Trees, Logistic Regression, and SVM. You will also learn about the pros and cons of each method and the accuracy metrics used to evaluate classifiers.
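
The labs in this module all follow the same basic scikit-learn workflow. As a minimal sketch (not taken from the course labs; the iris dataset and K=5 are illustrative choices), here is that workflow with the KNN classifier:

```python
# Minimal scikit-learn classification workflow:
# load a dataset, split it, fit a classifier, and measure test accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

knn = KNeighborsClassifier(n_neighbors=5)  # K-nearest neighbors with K=5
knn.fit(X_train, y_train)                  # "training" here just stores the data
print("Test accuracy: %.2f" % knn.score(X_test, y_test))
```

Swapping `KNeighborsClassifier` for `DecisionTreeClassifier`, `LogisticRegression`, or `SVC` leaves the rest of the workflow unchanged, which is what makes comparing the four algorithms in the labs straightforward.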

KEY CONCEPTS

· To understand different classification methods.

· To apply classification algorithms on various datasets to solve real-world problems.

· To understand evaluation methods in classification.
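
On the evaluation side, a few metrics come up for every classifier in this module. A small hand-made sketch (the label vectors below are illustrative, not from any course dataset):

```python
# Common classification evaluation metrics on a toy set of
# true labels vs. predicted labels.
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

y_true = [0, 0, 1, 1, 1, 0, 1, 0]  # ground-truth labels
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]  # a classifier's predictions

print("Accuracy:", accuracy_score(y_true, y_pred))        # fraction correct
print("Confusion matrix:")
print(confusion_matrix(y_true, y_pred))                   # rows: true, cols: predicted
print(classification_report(y_true, y_pred))              # precision/recall/F1 per class
```

Accuracy alone can be misleading on imbalanced data, which is why the confusion matrix and per-class precision/recall are worth reading alongside it.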

A Tour of Machine Learning Classifiers Using scikit-learn

Chapter Outline

  • Choosing a classification algorithm

  • First steps with scikit-learn – training a perceptron

  • Modeling class probabilities via logistic regression

    • Logistic regression intuition and conditional probabilities

    • Learning the weights of the logistic cost function

    • Converting an Adaline implementation into an algorithm for logistic regression

    • Training a logistic regression model with scikit-learn

    • Tackling overfitting via regularization

  • Maximum margin classification with support vector machines

    • Maximum margin intuition

    • Dealing with a nonlinearly separable case using slack variables

    • Alternative implementations in scikit-learn

  • Solving nonlinear problems using a kernel SVM

    • Kernel methods for linearly inseparable data

    • Using the kernel trick to find separating hyperplanes in high-dimensional space

  • Decision tree learning

    • Maximizing information gain – getting the most bang for your buck

    • Building a decision tree

    • Combining multiple decision trees via random forests

  • K-nearest neighbors – a lazy learning algorithm
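
To make the kernel-trick item in the outline concrete: an RBF-kernel SVM can separate data that no linear boundary can. A minimal sketch on XOR-style data (the random data, `gamma=1.0`, and `C=10.0` are illustrative choices, not prescribed by the course):

```python
# An RBF-kernel SVM on XOR-labeled data, which is not linearly separable.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.randn(200, 2)                                    # 200 random 2-D points
y = np.logical_xor(X[:, 0] > 0, X[:, 1] > 0).astype(int) # XOR of quadrant signs

# The kernel trick implicitly maps X into a higher-dimensional space
# where a separating hyperplane exists, without computing that mapping.
svm = SVC(kernel="rbf", gamma=1.0, C=10.0)
svm.fit(X, y)
print("Training accuracy: %.2f" % svm.score(X, y))
```

A linear classifier (e.g. `SVC(kernel="linear")`) on the same data would do little better than chance, since no line in the original 2-D space separates the XOR classes.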

A note on using the code examples

The recommended way to interact with the code examples in this course is via Jupyter Notebook (the .ipynb files). Using Jupyter Notebook, you can execute the code step by step and keep all the resulting outputs (including plots and images) in one convenient document.

Setting up Jupyter Notebook is straightforward: if you are using the Anaconda Python distribution, all you need to do is execute the following command in your terminal:

conda install jupyter notebook

Then you can launch Jupyter Notebook by executing:

jupyter notebook

A window will open up in your browser, which you can then use to navigate to the target directory that contains the .ipynb file you wish to open.

You can also consider the IBM Developer Skills Network Labs:

https://labs.cognitiveclass.ai