参考资料

  • B站视频
  • 官方课本
  • katex 参考
  • 这样做笔记还是比较吃力,建议还是卖一本原版书对着看。
  • 英文四级刚过,水平有限,有错误欢迎指正😂

What is Machine Learning?

  • 从experience到expertise

  • expertise usually will be another program

  • nature

    • experiments 举了一个老鼠被毒死的例子,老鼠被毒过一次,就会知道某种看起来像食物的东西是不能吃的。老鼠学习的关键是记忆和如何将经验转化为一种普遍和推广的认识。
    • pigeon superstation experiments
      • B.F.Skinner 心理学家
      • 行为心理学认为所有动物都相当于一个黑箱,我们无法理解内部
      • 收集了很多鸽子放到一个巨大的屋子里,屋子的顶部有机械装置可以喷洒谷粒,and put some toys around the fool.
      • Skinner不断的调整间隔intervals to have the food rain, pigeon will peck the toy usually, for more times ,pigeon will find out that some toys are relation with the food rain.
  • silly generalization

  • what distinguishes between these two types of learning (two phenomena )?

    • Garcia 博士 1996 尝试理解第一个实验,做了相同的事,给了老鼠类似食物的毒药诱饵,但是不同的是,在老鼠尝试去触碰有毒的食物时,他将会响铃。

      讨论时有人说,食物并不是足以致死的食物,不然实验无法进行多轮

    • 老鼠是否会学会响铃和毒药事物的关系,大概说了一下巴甫洛夫的狗和这个实验的区别和联系。

    • 老鼠是否可以通过一次的experiment 得到 association between the ringing of a bell and the sickness five hours later.

  • 结果是老鼠最终失败了

  • prior knowledge

  • why not pay attention to every thing ?

    • 当老鼠在想毒药和食物的因果关系的时候,来自食物的信息是特殊的,如果接受了所有的信息,那么所有信息对老鼠来说都是特殊的。
  • prior knowledge is the key of what going to do in machine learning .

  • poor prior knowledge means more training data

  • a lots of prior knowledge,means I already knows how to do ,even don’t need collect data.

  • 机器学习介于上述两者之间,需要首先判断我们有多少先验知识,然后再收集数据进行训练。

What do we need Machine learning ?

  • 如果我们不使用machine learning的话自己编写一个程序,区分spam和not spam 是比较困难的

  • Tasks that require experience with amounts of date that are beyond capabilities

    需要处理超出能力范围的大量数据的经验的任务

    • 举了一个Google的例子,google 收集大数据来分析用户的在页面点击的数据

Many types of Machine learning

  • supervised VS unsupervised

  • supervised监督学习

    • 通过人工标记垃圾邮件和正常邮件来训练模型
  • unsupervised无监督学习

    • 没有需要人工标记的地方,需要通过整个模型来找到outliers(离群值),或者叫找到特征
    • scenarios 情节
  • reinforce learning 强化学习

    • 介于监督学习和无监督学习之间
  • Batch learning 批处理学习

  • online learning 在线学习,不断地观察周围进行学习

  • Cooperative -> indifferent -> adversarial teacher

  • 被动学习主动学习

  • relationship courses

    • AI programming
    • algorithms 涉及计算复杂度
    • linear algebra
    • statistics
    • combinatorics
    • optimization

Outline of the course

  • principles
    • supervised batch
  • Algorithmic paradigms

Papaya Tasting

  • 在一个贫瘠的小岛上有很多木瓜树,在岛上的人不得不学会如何判断树上的木瓜熟没熟
  • 挑选特征
  • color
  • softness hard or mushy
  • Papaya -> (soft,color)
  • training data 就是二维的所有的点
  • domain space 是(0,1)
  • label {Tasty,N}
  • output, $f:[0,1]^2 - > {T,N} $
  • assumption about data generation
    • the data is randomly generated
    • Realizability by rectangles 可实现的矩形

How do I evaluate the success of such a function

  • measure of success

Formal modal for learning

  • papayas

    • Domain set X,$ [0,1]^2$
    • Label set Y,$ (T,N)$
    • Training input ,set of already
    • sampleS=((x1,y1)(xm,ym))S= ((x_1,y_1)…(x_m,y_m)),tasted papaya
    • learners output,h:X->Y
    • get the prediction rule for tastiness
  • the quality of such h is determined with respect to some data generating distribution and labeling rule.

    这些h的质量是根据一些数据生成分布和标注规则来决定的。

LD,f(h)=D[dx:h(x)f(x)]L_{D,f}(h) = D[dx: h(x) \ne f(x)]

  • L : lost

  • D: distribution that generates papaya in the world.(分布)

  • F: function,the true function X->Y. (其中x表示一个点,(softness,color),a strong assumption)

  • this is kind of the probability of error (h失败的可能性)

  • the goal of learner is given s to come up with h of small loss.学习者的目标是给定s来想出h的小损失

  • Basic learner strategy

  • Empirical risk Minimization (ERM) ,经验风险最小化

    • 我们并不知道样本的分布,我们只知道这些sample,样品
  • define Ls(h)L_s(h),s 表示 sample,empirical loss of h,下面表示损失函数是如何定义的

    \begin{align} L_s(h) = \frac{|di: h(x_i)\ne y_i|}{|S|} \end{align}

  • A very simple rule for finding h with small ER

    \begin{align} h_S(x) \triangleq \begin{cases}y_i \quad if, x=x_i \quad for \quad some(x_i,y_i)\in S, \\N \quad otherwise\end{cases}\end{align}

  • We really to amend this empirical risk minimization technique.

    我们真的要修正这个经验风险最小化技术 (ERM)

  • overfitting 过拟合

How to overcome the overfitting?

  • to guard against overfitting we introduce some prior knowledge(inductive bias)

    为了防止过拟合,我们引入了一些先验知识(归纳偏差)

  • the exist a good prediction rule that is some axis some aligned rectangle.

    存在一个很好的预测规则,即某一轴某一对齐矩形。

  • Let H denote a fixed collection of potential (candidate) prediction rules.

    设H表示一个固定的潜在(候选)预测规则集合。

    H{f:f:xy}=XyH \subseteq \lbrace f :f:x \to y \rbrace = X^y

  • call it hypothesis class h

Revised learning rule

ERMhERM_h —— pick h $ h \in H $ that minimizes of Ls(h)L_s(h)

ERMH(s)Argmin{Ls(h),hH}ERM_H(s) \in Argmin \lbrace L_s(h),h \in H\rbrace

  • ERM表示一个算法,上面的意思是,我的算法将会选择一个最小的损失(a minimum loss)
  • 可以保证在众多的假设中,这是一个成功的学习规则(learning rule)

arg min 就是使后面这个式子达到最小值时的x,t的取值

arg的意思:aargument of a complex number 复数的辐角

来源:https://www.cnblogs.com/Ph-one/p/12811442.html

Theorem

  • Theorem : Let x be any set , y = {0,1}, and let H be a finite set of functions from X to Y.

    设x为任意集合,y = {0,1}, H为从x到y的有限函数集。

  • Assume : the training sample S is generated by some probability distribution d over x and labeled by some $ f \in H$

  • and the elements of S are picked i.i.d ( indentically independent distributed 独立分布相同,意思是每一个xi独立的取出,不考虑先前的取出, indentically所有的取出基于相同的分布)

  • Then $ERM_h $ is guaranteed to come up with an h that has small true loss , given sufficients large sample S. 在足够大样本S的情况下,$ERM_h $保证得到一个真实损失小的h。

Proof

  • coufusing samples are those on which ERMhERM_h may come up with a bad h.

  • Fix some success parameter ϵ>0\epsilon > 0

  • the set of congusing S’s is $ \lbrace S ; L_{D,f}(h_s) > \epsilon\rbrace$

  • We wish to upper bound the probability of fetting such a bad sample.

    我们希望把得到这样一个坏样品的可能性设个上限。

    Dm[{Sx:LD,f(hs)>ϵ}]D^m [ \lbrace S|_x : L_{D,f}(h_s)> \epsilon \rbrace]

    • D[A] 表示命中A的概率,上面的A为Bad samples