CS485/685 Foundations of Machine Learning
参考资料
What is Machine Learning?
- 
从experience到expertise 
- 
expertise usually will be another program 
- 
nature - experiments 举了一个老鼠被毒死的例子,老鼠被毒过一次,就会知道某种看起来像食物的东西是不能吃的。老鼠学习的关键是记忆和如何将经验转化为一种普遍和推广的认识。
- pigeon superstation experiments
- B.F.Skinner 心理学家
- 行为心理学认为所有动物都相当于一个黑箱,我们无法理解内部
- 收集了很多鸽子放到一个巨大的屋子里,屋子的顶部有机械装置可以喷洒谷粒,and put some toys around the fool.
- Skinner不断的调整间隔intervals to have the food rain, pigeon will peck the toy usually, for more times ,pigeon will find out that some toys are relation with the food rain.
 
 
- 
silly generalization 
- 
what distinguishes between these two types of learning (two phenomena )? - 
Garcia 博士 1996 尝试理解第一个实验,做了相同的事,给了老鼠类似食物的毒药诱饵,但是不同的是,在老鼠尝试去触碰有毒的食物时,他将会响铃。 讨论时有人说,食物并不是足以致死的食物,不然实验无法进行多轮 
- 
老鼠是否会学会响铃和毒药事物的关系,大概说了一下巴甫洛夫的狗和这个实验的区别和联系。 
- 
老鼠是否可以通过一次的experiment 得到 association between the ringing of a bell and the sickness five hours later. 
 
- 
- 
结果是老鼠最终失败了 
- 
prior knowledge 
- 
why not pay attention to every thing ? - 当老鼠在想毒药和食物的因果关系的时候,来自食物的信息是特殊的,如果接受了所有的信息,那么所有信息对老鼠来说都是特殊的。
 
- 
prior knowledge is the key of what going to do in machine learning . 
- 
poor prior knowledge means more training data 
- 
a lots of prior knowledge,means I already knows how to do ,even don’t need collect data. 
- 
机器学习介于上述两者之间,需要首先判断我们有多少先验知识,然后再收集数据进行训练。 
What do we need Machine learning ?
- 
如果我们不使用machine learning的话自己编写一个程序,区分spam和not spam 是比较困难的 
- 
Tasks that require experience with amounts of date that are beyond capabilities 需要处理超出能力范围的大量数据的经验的任务 - 举了一个Google的例子,google 收集大数据来分析用户的在页面点击的数据
 
Many types of Machine learning
- 
supervised VS unsupervised 
- 
supervised监督学习 - 通过人工标记垃圾邮件和正常邮件来训练模型
 
- 
unsupervised无监督学习 - 没有需要人工标记的地方,需要通过整个模型来找到outliers(离群值),或者叫找到特征
- scenarios 情节
 
- 
reinforce learning 强化学习 - 介于监督学习和无监督学习之间
 
- 
Batch learning 批处理学习 
- 
online learning 在线学习,不断地观察周围进行学习 
- 
Cooperative -> indifferent -> adversarial teacher 
- 
被动学习主动学习 
- 
relationship courses - AI programming
- algorithms 涉及计算复杂度
- linear algebra
- statistics
- combinatorics
- optimization
 
Outline of the course
- principles
- supervised batch
 
- Algorithmic paradigms
Papaya Tasting
- 在一个贫瘠的小岛上有很多木瓜树,在岛上的人不得不学会如何判断树上的木瓜熟没熟
- 挑选特征
- color
- softness hard or mushy
- Papaya -> (soft,color)
- training data 就是二维的所有的点
- domain space 是(0,1)
- label {Tasty,N}
- output, $f:[0,1]^2 - > {T,N} $
- assumption about data generation
- the data is randomly generated
- Realizability by rectangles 可实现的矩形
 
How do I evaluate the success of such a function
- measure of success
Formal modal for learning
- 
papayas - Domain set X,$ [0,1]^2$
- Label set Y,$ (T,N)$
- Training input ,set of already
- sample,,tasted papaya
- learners output,h:X->Y
- get the prediction rule for tastiness
 
- 
the quality of such h is determined with respect to some data generating distribution and labeling rule. 这些h的质量是根据一些数据生成分布和标注规则来决定的。 
- 
L : lost 
- 
D: distribution that generates papaya in the world.(分布) 
- 
F: function,the true function X->Y. (其中x表示一个点,(softness,color),a strong assumption) 
- 
this is kind of the probability of error (h失败的可能性) 
- 
the goal of learner is given s to come up with h of small loss.学习者的目标是给定s来想出h的小损失 
- 
Basic learner strategy 
- 
Empirical risk Minimization (ERM) ,经验风险最小化 - 我们并不知道样本的分布,我们只知道这些sample,样品
 
- 
define ,s 表示 sample,empirical loss of h,下面表示损失函数是如何定义的 \begin{align} L_s(h) = \frac{|di: h(x_i)\ne y_i|}{|S|} \end{align} 
- 
A very simple rule for finding h with small ER \begin{align} h_S(x) \triangleq \begin{cases}y_i \quad if, x=x_i \quad for \quad some(x_i,y_i)\in S, \\N \quad otherwise\end{cases}\end{align} 
- 
We really to amend this empirical risk minimization technique. 我们真的要修正这个经验风险最小化技术 (ERM) 
- 
overfitting 过拟合 
How to overcome the overfitting?
- 
to guard against overfitting we introduce some prior knowledge(inductive bias) 为了防止过拟合,我们引入了一些先验知识(归纳偏差) 
- 
the exist a good prediction rule that is some axis some aligned rectangle. 存在一个很好的预测规则,即某一轴某一对齐矩形。 
- 
Let H denote a fixed collection of potential (candidate) prediction rules. 设H表示一个固定的潜在(候选)预测规则集合。 
- 
call it hypothesis class h 
Revised learning rule
—— pick h $ h \in H $ that minimizes of
- ERM表示一个算法,上面的意思是,我的算法将会选择一个最小的损失(a minimum loss)
- 可以保证在众多的假设中,这是一个成功的学习规则(learning rule)
arg min 就是使后面这个式子达到最小值时的x,t的取值。
arg的意思:aargument of a complex number 复数的辐角
Theorem
- 
Theorem : Let x be any set , y = {0,1}, and let H be a finite set of functions from X to Y. 设x为任意集合,y = {0,1}, H为从x到y的有限函数集。 
- 
Assume : the training sample S is generated by some probability distribution d over x and labeled by some $ f \in H$ 
- 
and the elements of S are picked i.i.d ( indentically independent distributed 独立分布相同,意思是每一个xi独立的取出,不考虑先前的取出, indentically所有的取出基于相同的分布) 
- 
Then $ERM_h $ is guaranteed to come up with an h that has small true loss , given sufficients large sample S. 在足够大样本S的情况下,$ERM_h $保证得到一个真实损失小的h。 
Proof
- 
coufusing samples are those on which may come up with a bad h. 
- 
Fix some success parameter 
- 
the set of congusing S’s is $ \lbrace S ; L_{D,f}(h_s) > \epsilon\rbrace$ 
- 
We wish to upper bound the probability of fetting such a bad sample. 我们希望把得到这样一个坏样品的可能性设个上限。 - D[A] 表示命中A的概率,上面的A为Bad samples
 

