`optimisers.Rd`

Functions to set up optimisers (which find parameters that maximise the joint density of a model) and change their tuning parameters, for use in `opt()`. For details of the algorithms and how to tune them, see the SciPy optimiser docs or the TensorFlow optimiser docs.

```r
nelder_mead()
powell()
cg()
bfgs()
newton_cg()
l_bfgs_b(maxcor = 10, maxls = 20)
tnc(max_cg_it = -1, stepmx = 0, rescale = -1)
cobyla(rhobeg = 1)
slsqp()
gradient_descent(learning_rate = 0.01)
adadelta(learning_rate = 0.001, rho = 1, epsilon = 1e-08)
adagrad(learning_rate = 0.8, initial_accumulator_value = 0.1)
adagrad_da(
  learning_rate = 0.8,
  global_step = 1L,
  initial_gradient_squared_accumulator_value = 0.1,
  l1_regularization_strength = 0,
  l2_regularization_strength = 0
)
momentum(learning_rate = 0.001, momentum = 0.9, use_nesterov = TRUE)
adam(learning_rate = 0.1, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-08)
ftrl(
  learning_rate = 1,
  learning_rate_power = -0.5,
  initial_accumulator_value = 0.1,
  l1_regularization_strength = 0,
  l2_regularization_strength = 0
)
proximal_gradient_descent(
  learning_rate = 0.01,
  l1_regularization_strength = 0,
  l2_regularization_strength = 0
)
proximal_adagrad(
  learning_rate = 1,
  initial_accumulator_value = 0.1,
  l1_regularization_strength = 0,
  l2_regularization_strength = 0
)
rms_prop(learning_rate = 0.1, decay = 0.9, momentum = 0, epsilon = 1e-10)
```
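The `learning_rate`, `momentum` and `use_nesterov` arguments of `momentum()` correspond to the classic momentum update. The sketch below is plain R, not greta's TensorFlow code: it minimises a function given its gradient (greta maximises the joint density, which is equivalent to minimising its negative). `momentum_descent()` and `n_steps` are hypothetical names, and the Nesterov variant shown is the common "look-ahead gradient" formulation, which TensorFlow may arrange differently internally.

```r
# Hypothetical helper (not part of greta): gradient descent with classic or
# Nesterov momentum, minimising a function whose gradient is `grad`.
momentum_descent <- function(grad, par, learning_rate = 0.001, momentum = 0.9,
                             use_nesterov = TRUE, n_steps = 1000) {
  velocity <- numeric(length(par))  # running, decayed sum of past steps
  for (i in seq_len(n_steps)) {
    # Nesterov momentum evaluates the gradient at the look-ahead point
    g <- if (use_nesterov) grad(par + momentum * velocity) else grad(par)
    velocity <- momentum * velocity - learning_rate * g
    par <- par + velocity
  }
  par
}

# usage: minimise sum((p - c(1, -2))^2), whose gradient is 2 * (p - c(1, -2))
momentum_descent(function(p) 2 * (p - c(1, -2)), par = c(0, 0),
                 learning_rate = 0.01)
```

Larger `momentum` values let the iterates carry speed through shallow regions, at the cost of more overshooting around the optimum.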

| Argument | Description |
|---|---|
| `maxcor` | maximum number of 'variable metric corrections' used to define the approximation to the Hessian matrix |
| `maxls` | maximum number of line search steps per iteration |
| `max_cg_it` | maximum number of Hessian-vector product evaluations per iteration |
| `stepmx` | maximum step for the line search |
| `rescale` | log10 scaling factor used to trigger rescaling of the objective |
| `rhobeg` | reasonable initial changes to the variables |
| `learning_rate` | the size of steps (in parameter space) towards the optimal value |
| `rho` | the decay rate |
| `epsilon` | a small constant used to condition gradient updates |
| `initial_accumulator_value` | initial value of the 'accumulator' used to tune the algorithm |
| `global_step` | the current training step number |
| `initial_gradient_squared_accumulator_value` | initial value of the accumulators used to tune the algorithm |
| `l1_regularization_strength` | L1 regularisation coefficient (must be 0 or greater) |
| `l2_regularization_strength` | L2 regularisation coefficient (must be 0 or greater) |
| `momentum` | the momentum of the algorithm |
| `use_nesterov` | whether to use Nesterov momentum |
| `beta1` | exponential decay rate for the 1st moment estimates |
| `beta2` | exponential decay rate for the 2nd moment estimates |
| `learning_rate_power` | power on the learning rate; must be 0 or less |
| `decay` | discounting factor for the gradient |
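The `adam()` arguments in the table (`learning_rate`, `beta1`, `beta2`, `epsilon`) are the terms of the standard Adam update. As an illustrative sketch in plain R, minimising a function given its gradient, the update can be written as follows. `adam_descent()` and `n_steps` are hypothetical names, and TensorFlow's implementation (which greta calls) may place `epsilon` slightly differently, so this is not greta's code.

```r
# Hypothetical helper (not part of greta): the standard Adam update rule,
# minimising a function whose gradient is `grad`.
adam_descent <- function(grad, par, learning_rate = 0.1, beta1 = 0.9,
                         beta2 = 0.999, epsilon = 1e-08, n_steps = 2000) {
  m <- numeric(length(par))  # 1st moment (mean) estimate of the gradient
  v <- numeric(length(par))  # 2nd moment (uncentred) estimate of the gradient
  for (t in seq_len(n_steps)) {
    g <- grad(par)
    m <- beta1 * m + (1 - beta1) * g    # decays at rate beta1
    v <- beta2 * v + (1 - beta2) * g^2  # decays at rate beta2
    # bias-correct the moment estimates, which are initialised at zero
    m_hat <- m / (1 - beta1^t)
    v_hat <- v / (1 - beta2^t)
    # epsilon keeps the step finite when v_hat is close to zero
    par <- par - learning_rate * m_hat / (sqrt(v_hat) + epsilon)
  }
  par
}

# usage: should approach the minimum of sum((p - c(3, -1))^2) at c(3, -1)
adam_descent(function(p) 2 * (p - c(3, -1)), par = c(0, 0))
```

Because each step is scaled by `sqrt(v_hat)`, `learning_rate` roughly bounds the per-parameter step size early on, which is why Adam tolerates a larger default learning rate than `gradient_descent()`.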

An `optimiser` object that can be passed to `opt()`.

The optimisers `powell()`, `cg()`, `newton_cg()`, `l_bfgs_b()`, `tnc()`, `cobyla()`, and `slsqp()` are deprecated. They will be removed in greta 0.4.0, since they will no longer be available in TensorFlow 2.0, on which that version of greta will depend.

The `cobyla()` optimiser does not provide information about the number of iterations or convergence, so these elements of the output are set to `NA`.

```r
# NOT RUN {
library(greta)

# use optimisation to find the mean and sd of some data
x <- rnorm(100, -2, 1.2)
mu <- variable()
sd <- variable(lower = 0)
distribution(x) <- normal(mu, sd)
m <- model(mu, sd)

# configure optimisers & parameters via the 'optimiser' argument to opt
opt_res <- opt(m, optimiser = bfgs())

# compare the results with the sample estimates; expect them to be close but
# not identical (for one thing, sd() uses an n - 1 denominator)
opt_res$par
c(mean(x), sd(x))
# }
```
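The greta example above needs TensorFlow. To sanity-check the same kind of estimates in plain R, one option is base R's `optim()`, which offers `"Nelder-Mead"`, `"BFGS"`, `"CG"` and `"L-BFGS-B"` among its methods. This is a sketch under stated assumptions, not greta's API: it fits the normal model by maximum likelihood, and optimising the sd on the log scale is a choice made here to keep it positive. greta's `opt()` works with the model's joint density, so small differences from these numbers are to be expected.

```r
# Cross-check in base R (not greta): maximum likelihood fit of a normal model
set.seed(1)
x <- rnorm(100, -2, 1.2)

# optim() minimises, so use the negative log-likelihood;
# par[2] is log(sd), an assumption made to keep sd positive
neg_log_lik <- function(par) {
  -sum(dnorm(x, mean = par[1], sd = exp(par[2]), log = TRUE))
}

fit <- optim(c(0, 0), neg_log_lik, method = "BFGS")
estimates <- c(mu = fit$par[1], sd = exp(fit$par[2]))

# analytic maximum likelihood estimates: the sd uses an n (not n - 1) denominator
mle <- c(mu = mean(x), sd = sqrt(mean((x - mean(x))^2)))
rbind(estimates, mle)
```

`fit$convergence` is 0 when `optim()` reports successful convergence, which is worth checking before trusting the estimates.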