CS485/685 Foundations of Machine Learning
References
What is Machine Learning?
- From experience to expertise
- The expertise is usually another program
- Learning in nature
- Rat-poisoning experiment: a rat that has been poisoned once learns that a certain food-like thing must not be eaten. The key to the rat's learning is memory, and how it turns experience into a general, transferable conclusion.
- Pigeon superstition experiment
- B.F. Skinner, psychologist
- Behaviorist psychology treats every animal as a black box; we cannot understand what happens inside.
- He gathered many pigeons in a huge room whose ceiling had a mechanical device that could spray grain, and put some toys around the floor.
- Skinner kept adjusting the intervals at which the food rained down. The pigeons would often peck at the toys, and after enough repetitions a pigeon would conclude that certain toys were related to the food rain.
- Silly generalization
- What distinguishes these two types of learning (two phenomena)?
- Dr. Garcia (1996) tried to understand the first experiment. He did the same thing: he gave rats a poisoned, food-like bait, but with one difference: a bell rang whenever a rat tried to touch the poisoned food.
- In the discussion, someone pointed out that the food could not have been lethal; otherwise the experiment could not run for multiple rounds.
- Would the rat learn the association between the bell and the poisoned food? The lecture briefly discussed the differences and connections between this experiment and Pavlov's dog.
- Can the rat, from a single experiment, form an association between the ringing of a bell and sickness five hours later?
- The result: the rats ultimately failed to form this association.
- Prior knowledge
- Why not pay attention to everything?
- When the rat reasons about the causal relationship between the poison and the food, the information coming from the food is special; if the rat accepted all information, then every piece of information would be equally special to it.
- Prior knowledge is the key to what we are going to do in machine learning.
- Poor prior knowledge means more training data is needed.
- With a lot of prior knowledge, I already know what to do; I may not even need to collect data.
- Machine learning sits between these two extremes: first judge how much prior knowledge we have, then collect data and train.
Why do we need Machine Learning?
- Without machine learning, writing a program ourselves to distinguish spam from not-spam is quite difficult.
- Tasks that require experience with amounts of data that are beyond human capabilities
- Example: Google collects big data to analyze users' click behaviour on its pages
Many types of Machine learning
- Supervised vs. unsupervised
- Supervised learning
- Train a model on emails manually labeled as spam or not spam
- Unsupervised learning
- No manual labeling; the model must find outliers on its own, or discover features
- Scenarios
- Reinforcement learning
- Sits between supervised and unsupervised learning
- Batch learning
- Online learning: continuously observe the environment and learn
- Cooperative -> indifferent -> adversarial teacher
- Passive vs. active learning
- Related courses
- AI programming
- algorithms (involves computational complexity)
- linear algebra
- statistics
- combinatorics
- optimization
Outline of the course
- principles
- supervised batch
- Algorithmic paradigms
Papaya Tasting
- On a barren little island with many papaya trees, the islanders have to learn how to judge whether a papaya on a tree is ripe
- Choosing features
- color
- softness: hard or mushy
- Papaya -> (softness, color)
- The training data is a set of such two-dimensional points
- The domain space is $[0,1]^2$
- Label set: {Tasty, Not-tasty}
- Output: $f:[0,1]^2 \to \{T,N\}$
- assumption about data generation
- the data is randomly generated
- Realizability by rectangles: assume the true labeling rule is an axis-aligned rectangle
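The setup above can be sketched in code. This is a minimal illustration, not the course's code: the rectangle bounds and the uniform distribution are hypothetical placeholders I chose for concreteness.

```python
import random

# Hypothetical "true" labeling rule: a papaya is Tasty iff its
# (softness, color) point falls inside an axis-aligned rectangle.
# The bounds below are made-up placeholders for illustration.
TRUE_RECT = (0.3, 0.8, 0.4, 0.9)  # (soft_lo, soft_hi, color_lo, color_hi)

def f(x):
    """True labeling rule f: [0,1]^2 -> {'T', 'N'} (realizability assumption)."""
    softness, color = x
    s_lo, s_hi, c_lo, c_hi = TRUE_RECT
    return 'T' if s_lo <= softness <= s_hi and c_lo <= color <= c_hi else 'N'

def sample_papayas(m, seed=0):
    """Draw m i.i.d. points from the (here: uniform) distribution D over [0,1]^2,
    and label each one with the true rule f."""
    rng = random.Random(seed)
    xs = [(rng.random(), rng.random()) for _ in range(m)]
    return [(x, f(x)) for x in xs]  # training sample S of (point, label) pairs

S = sample_papayas(10)
```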
How do I evaluate the success of such a function?
- measure of success
Formal model for learning
- Papayas
- Domain set X: $[0,1]^2$
- Label set Y: $\{T, N\}$
- Training input: a sample S of already-tasted papayas
- Learner's output: $h: X \to Y$
- get the prediction rule for tastiness
- The quality of such an h is determined with respect to some data-generating distribution and labeling rule.
- L: loss
- D: the distribution that generates papayas in the world
- f: the true labeling function $X \to Y$ (where each x is a point (softness, color); assuming such an f exists is a strong assumption)
- The loss of h is, roughly, the probability that h errs (the probability that h fails)
- The goal of the learner is: given S, come up with an h of small loss.
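Combining D, f, and h, the "probability of error" mentioned above is usually written as the true loss (a standard formulation, matching the notation used later in the proof):

\begin{align} L_{D,f}(h) \triangleq \Pr_{x \sim D}\left[\, h(x) \ne f(x) \,\right] \end{align}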
- Basic learner strategy
- Empirical Risk Minimization (ERM)
- We do not know the distribution the samples come from; we only see the sample itself
- Define the empirical loss of h on a sample S as:
\begin{align} L_S(h) = \frac{|\lbrace i : h(x_i)\ne y_i \rbrace|}{|S|} \end{align}
- A very simple rule for finding an h with small empirical risk:
\begin{align} h_S(x) \triangleq \begin{cases} y_i & \text{if } x = x_i \text{ for some } (x_i,y_i)\in S, \\ N & \text{otherwise.} \end{cases} \end{align}
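A minimal sketch of this memorizing rule and the empirical loss defined above (the sample points and helper names here are my own):

```python
def empirical_loss(h, S):
    """L_S(h): the fraction of sample points that the rule h mislabels."""
    return sum(1 for x, y in S if h(x) != y) / len(S)

def make_memorizer(S):
    """The rule h_S: repeat the sample's label on points seen in S,
    and answer 'N' everywhere else."""
    memory = dict(S)  # map each seen point x_i to its label y_i
    return lambda x: memory.get(x, 'N')

# A tiny illustrative sample of ((softness, color), label) pairs.
S = [((0.5, 0.6), 'T'), ((0.1, 0.2), 'N'), ((0.7, 0.8), 'T')]
h_S = make_memorizer(S)
print(empirical_loss(h_S, S))  # 0.0
```

The memorizer achieves zero empirical loss by construction, yet it predicts 'N' on every unseen papaya, which is exactly the overfitting problem discussed next.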
- We really need to amend this empirical risk minimization (ERM) technique.
-
overfitting 过拟合
How do we overcome overfitting?
- To guard against overfitting, we introduce some prior knowledge (an inductive bias).
- There exists a good prediction rule that is some axis-aligned rectangle.
- Let H denote a fixed collection of potential (candidate) prediction rules.
- We call H the hypothesis class
Revised learning rule
- Pick an $h \in H$ that minimizes $L_S(h)$
- ERM denotes an algorithm; the rule above says: the algorithm will choose an h with minimum (empirical) loss
- With such a hypothesis class, it can be guaranteed that this is a successful learning rule
- $\arg\min$ denotes the value of the argument at which the expression that follows attains its minimum
- "arg" is short for "argument", as in the argument of a complex number
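Under the realizability-by-rectangles assumption, one concrete way to carry out ERM is to return the tightest axis-aligned rectangle enclosing the positively labeled points. This is a sketch under that assumption (function and variable names are my own), not necessarily the algorithm the course presents:

```python
def erm_rectangle(S):
    """ERM over axis-aligned rectangles, assuming realizability.

    Returns the tightest rectangle containing all positively labeled points.
    It has zero empirical loss: every 'T' point is inside it by construction,
    and every 'N' point lies outside the (larger) true rectangle, hence
    outside this one too.
    """
    positives = [x for x, y in S if y == 'T']
    if not positives:                     # no positives seen: predict 'N' everywhere
        return lambda x: 'N'
    s_lo = min(p[0] for p in positives)
    s_hi = max(p[0] for p in positives)
    c_lo = min(p[1] for p in positives)
    c_hi = max(p[1] for p in positives)
    return lambda x: 'T' if s_lo <= x[0] <= s_hi and c_lo <= x[1] <= c_hi else 'N'

S = [((0.4, 0.5), 'T'), ((0.6, 0.7), 'T'), ((0.1, 0.1), 'N'), ((0.9, 0.9), 'N')]
h = erm_rectangle(S)
print([h(x) for x, _ in S])  # ['T', 'T', 'N', 'N'] -- zero empirical loss
```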
Theorem
- Theorem: Let X be any set, let $Y = \{0,1\}$, and let H be a finite set of functions from X to Y.
- Assume: the training sample S is generated by some probability distribution D over X and labeled by some $f \in H$
- and the elements of S are picked i.i.d. (independent and identically distributed: each $x_i$ is drawn independently of the previous draws, and all draws come from the same distribution)
- Then $ERM_H$ is guaranteed to come up with an h that has small true loss, given a sufficiently large sample S.
Proof
- Confusing samples are those on which ERM may come up with a bad h.
- Fix some success parameter $\epsilon$
- The set of confusing S's is $\lbrace S : L_{D,f}(h_S) > \epsilon\rbrace$
- We wish to upper-bound the probability of getting such a bad sample.
- D[A] denotes the probability of hitting A; here A is the set of bad samples
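The notes stop here; the standard finite-class argument finishes with a union bound, which I sketch below (with $m = |S|$), since this is where the proof is headed:

\begin{align}
D^m\left[\lbrace S : L_{D,f}(h_S) > \epsilon \rbrace\right]
&\le \sum_{h \in H,\; L_{D,f}(h) > \epsilon} D^m\left[\lbrace S : L_S(h) = 0 \rbrace\right] \\
&\le |H| \,(1-\epsilon)^m \le |H| \, e^{-\epsilon m}
\end{align}

Each h with true loss greater than $\epsilon$ is consistent with a single i.i.d. example with probability at most $1-\epsilon$, hence with all m examples with probability at most $(1-\epsilon)^m$, and there are at most $|H|$ such hypotheses; so a sufficiently large m makes the probability of a confusing sample arbitrarily small.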