overcomplete Fourier representation (128 basis vectors). The plot shows the rank-order distribution of the coefficients of $s$ under a Gaussian prior (dashed) and a Laplacian prior (solid). Far more significantly positive coefficients are required under the Gaussian prior than under the Laplacian prior.

2 Learning

The learning objective is to adapt $A$ to maximize the probability of the data, which is computed by marginalizing over the internal states:

$$P(x \mid A) = \int ds \, P(s) \, P(x \mid A, s) \tag{3}$$

In general, this integral cannot be evaluated analytically but can be approximated with a Gaussian integral around $\hat{s}$, yielding

$$\log P(x \mid A) \approx \text{const.} + \log P(\hat{s}) - \frac{\lambda}{2} (x - A\hat{s})^2 - \frac{1}{2} \log \det H \tag{4}$$

where $H$ is the Hessian of the log posterior at $\hat{s}$, given by $\lambda A^T A - \nabla_s \nabla_s \log P(\hat{s})$. To avoid a singularity under the Laplacian prior, we use the approximation $(\log P(s_k))' \approx -\theta \tanh(\beta s_k)$, which gives the Hessian full rank and positive determinant. For large $\beta$ this approximates the true Laplacian prior. A learning rule can be obtained by differentiating $\log P(x \mid A)$ with respect to $A$.

In the following discussion, we will present the derivations of the three terms in (4) and simplifying assumptions that lead to the following simple form of the learning rule:

$$\Delta A \propto A A^T \nabla \log P(x \mid A) \approx -A (z \hat{s}^T + I) \tag{5}$$

where $z_k = \partial \log P(s_k) / \partial s_k$.
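The quantities in (4) and (5) can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation. The function names, the default values of `theta` and `beta`, and the use of plain gradient ascent to find the posterior mode are all assumptions made for the example; the text does not say how $\hat{s}$ is computed.

```python
import numpy as np


def log_prior(s, theta=1.0, beta=5.0):
    # Smoothed Laplacian log prior, up to a constant. Its derivative is
    # -theta * tanh(beta * s), the approximation used in the text.
    log_cosh = np.logaddexp(beta * s, -beta * s) - np.log(2.0)
    return -(theta / beta) * np.sum(log_cosh)


def prior_score(s, theta=1.0, beta=5.0):
    # z_k = d log P(s_k) / d s_k under the smoothed prior.
    return -theta * np.tanh(beta * s)


def log_posterior(s, x, A, lam, theta=1.0, beta=5.0):
    # log P(s) - (lam / 2) * ||x - A s||^2, with constants dropped.
    r = x - A @ s
    return log_prior(s, theta, beta) - 0.5 * lam * (r @ r)


def map_estimate(x, A, lam, theta=1.0, beta=5.0, steps=2000):
    # Assumption: plain gradient ascent on the log posterior.
    # The step size 1 / lip uses an upper bound on the curvature.
    lip = lam * np.linalg.norm(A, 2) ** 2 + theta * beta
    s = np.linalg.pinv(A) @ x  # minimum-norm starting point
    for _ in range(steps):
        grad = lam * A.T @ (x - A @ s) + prior_score(s, theta, beta)
        s = s + grad / lip
    return s


def approx_log_likelihood(x, A, s_hat, lam, theta=1.0, beta=5.0):
    # Equation (4) without the constant term.
    # H = lam * A^T A - (second derivative of the log prior at s_hat).
    curvature = theta * beta * (1.0 - np.tanh(beta * s_hat) ** 2)
    H = lam * A.T @ A + np.diag(curvature)
    _, logdet = np.linalg.slogdet(H)
    return log_posterior(s_hat, x, A, lam, theta, beta) - 0.5 * logdet


def basis_update(A, s_hat, theta=1.0, beta=5.0):
    # Direction of equation (5): -A (z s_hat^T + I).
    z = prior_score(s_hat, theta, beta)
    return -A @ (np.outer(z, s_hat) + np.eye(A.shape[1]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 8))  # 4 data dimensions, 8 basis vectors
    x = rng.standard_normal(4)
    s_hat = map_estimate(x, A, lam=100.0)
    print(approx_log_likelihood(x, A, s_hat, lam=100.0))
    print(basis_update(A, s_hat).shape)
```

In a learning loop, the basis would be nudged by a small multiple of `basis_update(A, s_hat)` for each data vector; the learning rate and any renormalization of the basis vectors are left open here because the text above does not specify them.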
A paper by famous neuroscientist Terry Sejnowski