Learning to Answer Yes/No
Perceptron Hypothesis Set
Example
Approval Problem Revisited
A Simple Hypothesis Set
‘Perceptron’: for $\mathbf{x} = (x_1, x_2, \ldots, x_d)$, the ‘features of the customer’, compute a weighted ‘score’ and:
$$\text{approve credit if} \quad \sum_{i=1}^{d} w_i x_i > \text{threshold}$$
$$\text{deny credit if} \quad \sum_{i=1}^{d} w_i x_i < \text{threshold}$$
$y \in \{+1(\text{good}), -1(\text{bad})\}$ (0 ignored); the linear formulas $h \in \mathcal{H}$ are:
$$h(\mathbf{x}) = \text{sign}\left(\left(\sum_{i=1}^{d} w_i x_i\right) - \text{threshold}\right)$$
this is historically called the ‘perceptron’ hypothesis
$$h(\mathbf{x}) = \text{sign}\left(\left(\sum_{i=1}^{d} w_i x_i\right) - \text{threshold}\right) = \text{sign}\left(\left(\sum_{i=1}^{d} w_i x_i\right) + \underbrace{(-\text{threshold})}_{w_0} \cdot \underbrace{(+1)}_{x_0}\right) = \text{sign}\left(\sum_{i=0}^{d} w_i x_i\right) = \text{sign}(\mathbf{w}^T \mathbf{x})$$
Each ‘tall’ $\mathbf{w}$ represents a hypothesis $h$ and is multiplied with the ‘tall’ $\mathbf{x}$; we will use the tall versions to simplify notation.
Q: what does $h$ look like?
Perceptrons in $\mathbb{R}^2$
In $\mathbb{R}^2$, $h(\mathbf{x}) = \text{sign}(w_0 + w_1 x_1 + w_2 x_2)$: customers are points on the plane, and each hypothesis is a line, positive on one side and negative on the other, so perceptrons ⇔ linear (binary) classifiers.
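To make this concrete, here is a minimal sketch (the weights and points are made-up illustrations, not from the lecture) that evaluates $h(\mathbf{x}) = \text{sign}(\mathbf{w}^T \mathbf{x})$ on points in $\mathbb{R}^2$, using the $x_0 = +1$ trick so that $w_0$ plays the role of $-\text{threshold}$:

```python
import numpy as np

def perceptron_h(w, x):
    """Evaluate h(x) = sign(w^T x), prepending the constant coordinate x0 = +1."""
    x_tall = np.concatenate(([1.0], x))   # 'tall' x = (1, x1, ..., xd)
    return 1 if w @ x_tall > 0 else -1    # sign(.), mapping the ignored 0 case to -1

# hypothetical numbers: threshold = 2, so w0 = -2; w1 = w2 = 1
w = np.array([-2.0, 1.0, 1.0])
print(perceptron_h(w, np.array([3.0, 1.0])))   # +1: x1 + x2 > 2
print(perceptron_h(w, np.array([0.5, 0.5])))   # -1: x1 + x2 < 2
```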
Perceptron Learning Algorithm (PLA)
Select $g$ from $\mathcal{H}$
- want: $g \approx f$ (hard when $f$ is unknown)
- almost necessary: $g \approx f$ on $\mathcal{D}$, ideally $g(\mathbf{x}_n) = f(\mathbf{x}_n) = y_n$
- difficult: $\mathcal{H}$ is of infinite size
- idea: start from some $g_0$, and ‘correct’ its mistakes on $\mathcal{D}$
Perceptron Learning Algorithm
For $t = 0, 1, \ldots$
1. find a mistake of $\mathbf{w}_t$, called $(\mathbf{x}_{n(t)}, y_{n(t)})$:
$$\text{sign}(\mathbf{w}_t^T \mathbf{x}_{n(t)}) \neq y_{n(t)}$$
2. (try to) correct the mistake by
$$\mathbf{w}_{t+1} \leftarrow \mathbf{w}_t + y_{n(t)} \mathbf{x}_{n(t)}$$
…until no more mistakes; return the last $\mathbf{w}$ (called $\mathbf{w}_{\text{PLA}}$) as $g$
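A minimal NumPy sketch of this loop (my own illustration, not course code); it assumes `X` already carries the $x_0 = +1$ column and that $\mathcal{D}$ is linearly separable, otherwise the loop never halts:

```python
import numpy as np

def pla(X, y):
    """Perceptron Learning Algorithm.
    X: (N, d+1) array with x0 = +1 in the first column; y: (N,) array of +/-1.
    Returns the final weights w_PLA (assumes linearly separable data)."""
    w = np.zeros(X.shape[1])                # start from w0 = 0
    while True:
        mistakes = np.sign(X @ w) != y      # sign(w^T x_n) != y_n; sign(0) counts as a mistake
        if not mistakes.any():
            return w                        # no more mistakes: this is w_PLA
        n = np.argmax(mistakes)             # index of the first mistake
        w = w + y[n] * X[n]                 # correct it: w <- w + y_n * x_n
```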
My question: is the determinant like a vector?
Practical Implementation of PLA
Cyclic PLA: cycle through the examples $\mathbf{x}_1, \ldots, \mathbf{x}_N$ (naively in order, or in a precomputed random order), correct any mistake encountered, and halt after a full cycle with no mistakes.
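A sketch of this cyclic variant under the same assumptions as the PLA code above; `shuffle=True` gives the precomputed-random-cycle flavor:

```python
import numpy as np

def cyclic_pla(X, y, shuffle=False, seed=0):
    """Cyclic PLA: visit examples in a fixed cycle; halt after a clean full pass.
    With shuffle=True the cycle follows a precomputed random permutation."""
    order = np.arange(len(y))
    if shuffle:
        order = np.random.default_rng(seed).permutation(order)
    w = np.zeros(X.shape[1])
    clean_pass = False
    while not clean_pass:
        clean_pass = True
        for n in order:
            if np.sign(X[n] @ w) != y[n]:   # mistake on (x_n, y_n)
                w = w + y[n] * X[n]         # correct it
                clean_pass = False
    return w
```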
Some Remaining Issues of PLA
### Algorithmic: halt (no more mistakes)?
- naive cyclic: ??
- random cyclic: ??
- other variant: ??
### Learning: $g \approx f$?
- on $\mathcal{D}$: if PLA halts, yes (no mistakes)
- outside $\mathcal{D}$: ??
- if not halting: ??
Guarantee of PLA
Linear Separability
- if PLA halts (i.e., no more mistakes), then necessarily $\mathcal{D}$ allows some $\mathbf{w}$ to make no mistakes
- call such $\mathcal{D}$ linearly separable
PLA Fact: $\mathbf{w}_t$ Gets More Aligned with $\mathbf{w}_f$
$\mathbf{w}_f$ is perfect, hence every $\mathbf{x}_n$ is correctly away from the boundary:
$$y_{n(t)} \mathbf{w}_f^T \mathbf{x}_{n(t)} \geq \min_n y_n \mathbf{w}_f^T \mathbf{x}_n > 0$$
$\mathbf{w}_f^T \mathbf{w}_t$ increases when updating with any $(\mathbf{x}_{n(t)}, y_{n(t)})$:
$$\mathbf{w}_f^T \mathbf{w}_{t+1} = \mathbf{w}_f^T \left(\mathbf{w}_t + y_{n(t)} \mathbf{x}_{n(t)}\right) \geq \mathbf{w}_f^T \mathbf{w}_t + \min_n y_n \mathbf{w}_f^T \mathbf{x}_n > \mathbf{w}_f^T \mathbf{w}_t + 0$$
$\mathbf{w}_t$ appears more aligned with $\mathbf{w}_f$.
Q: what about the length of the vectors? The inner product can grow simply because $\|\mathbf{w}_t\|$ grows, so we must also bound the norm.
PLA Fact: $\mathbf{w}_t$ Does Not Grow Too Fast
$\mathbf{w}_t$ changes only when there is a mistake:
$$\text{sign}(\mathbf{w}_t^T \mathbf{x}_{n(t)}) \neq y_{n(t)} \iff y_{n(t)} \mathbf{w}_t^T \mathbf{x}_{n(t)} \leq 0$$
- a mistake ‘limits’ the growth of $\|\mathbf{w}_t\|^2$, even when updating with the ‘longest’ $\mathbf{x}_n$:
$$\|\mathbf{w}_{t+1}\|^2 = \|\mathbf{w}_t + y_{n(t)} \mathbf{x}_{n(t)}\|^2 = \|\mathbf{w}_t\|^2 + 2 y_{n(t)} \mathbf{w}_t^T \mathbf{x}_{n(t)} + \|y_{n(t)} \mathbf{x}_{n(t)}\|^2 \leq \|\mathbf{w}_t\|^2 + 0 + \|y_{n(t)} \mathbf{x}_{n(t)}\|^2 \leq \|\mathbf{w}_t\|^2 + \max_n \|y_n \mathbf{x}_n\|^2$$
Start from $\mathbf{w}_0 = 0$; after $T$ mistake corrections,
$$\frac{\mathbf{w}_f^T \mathbf{w}_T}{\|\mathbf{w}_f\| \, \|\mathbf{w}_T\|} \geq \sqrt{T} \cdot \text{constant}$$
(Novikoff theorem)
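Combining the two facts gives the halting bound; a short sketch of the standard argument, writing $\rho = \min_n y_n \frac{\mathbf{w}_f^T \mathbf{x}_n}{\|\mathbf{w}_f\|}$ and $R^2 = \max_n \|\mathbf{x}_n\|^2$:
$$\mathbf{w}_f^T \mathbf{w}_T \geq T \cdot \min_n y_n \mathbf{w}_f^T \mathbf{x}_n = T \rho \|\mathbf{w}_f\|, \qquad \|\mathbf{w}_T\|^2 \leq T \cdot R^2,$$
$$\frac{\mathbf{w}_f^T \mathbf{w}_T}{\|\mathbf{w}_f\| \, \|\mathbf{w}_T\|} \geq \frac{T \rho \|\mathbf{w}_f\|}{\|\mathbf{w}_f\| \cdot \sqrt{T} R} = \sqrt{T} \cdot \frac{\rho}{R}$$
Since the left-hand side is a cosine and hence at most 1, PLA halts within $T \leq R^2 / \rho^2$ corrections.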
ref: 李航《统计学习方法》(Hang Li, Statistical Learning Methods), p. 31, Novikoff theorem: let the training set $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$ be linearly separable, where $x_i \in \mathcal{X} = \mathbb{R}^n$, $y_i \in \mathcal{Y} = \{-1, +1\}$, $i = 1, 2, \ldots, N$. Then:
(1) there exists a hyperplane $\hat{w}_{\text{opt}} \cdot \hat{x} = w_{\text{opt}} \cdot x + b_{\text{opt}} = 0$ with $\|\hat{w}_{\text{opt}}\| = 1$ that separates the training set completely correctly, and there exists $\gamma > 0$ such that for all $i = 1, 2, \ldots, N$: $y_i(\hat{w}_{\text{opt}} \cdot \hat{x}_i) = y_i(w_{\text{opt}} \cdot x_i + b_{\text{opt}}) \geq \gamma$;
(2) let $R = \max_{1 \leq i \leq N} \|\hat{x}_i\|$; then the number of misclassifications $k$ that the perceptron algorithm makes on the training set satisfies $k \leq (R/\gamma)^2$.

Non-Separable Data
More about PLA
Cons:
- $\mathcal{D}$ may not be linearly separable, and even when it is, we are not fully sure how long halting takes

Learning with Noisy Data
Line with Noise Tolerance
Find the weights that make the fewest mistakes on $\mathcal{D}$:
$$\mathbf{w}_g \leftarrow \mathop{\text{argmin}}_{\mathbf{w}} \sum_{n=1}^{N} \left[\!\left[ y_n \neq \text{sign}(\mathbf{w}^T \mathbf{x}_n) \right]\!\right]$$
Unfortunately, this is an NP-hard problem.
Pocket Algorithm
Modification of PLA: keep the best weights found so far in a ‘pocket’; run for a fixed number of iterations and return the weights with the fewest mistakes on $\mathcal{D}$.
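A minimal pocket sketch (again my own illustration, same `X`/`y` conventions as the PLA code above): run PLA-style corrections for a fixed budget, but return the pocketed weights with the fewest training mistakes:

```python
import numpy as np

def pocket(X, y, max_updates=1000, seed=0):
    """Pocket algorithm: PLA updates, but remember the weights with the
    fewest training mistakes seen so far. Works on non-separable data."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    best_w, best_err = w.copy(), np.mean(np.sign(X @ w) != y)
    for _ in range(max_updates):
        mistakes = np.flatnonzero(np.sign(X @ w) != y)
        if len(mistakes) == 0:
            break                          # separable after all: perfect weights
        n = rng.choice(mistakes)           # a random mistaken example
        w = w + y[n] * X[n]                # PLA-style correction
        err = np.mean(np.sign(X @ w) != y)
        if err < best_err:                 # better than the pocketed weights?
            best_w, best_err = w.copy(), err
    return best_w
```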
Summary
Perceptron Hypothesis Set
hyperplanes / linear classifiers in $\mathbb{R}^d$
Perceptron Learning Algorithm (PLA)
correct mistakes and improve iteratively
Guarantee of PLA
no more mistakes eventually, if $\mathcal{D}$ is linearly separable
Non-Separable Data
hold the somewhat ‘best’ weights in a pocket