optimisers.Rd
Functions to set up optimisers (which find parameters that maximise the joint density of a model) and change their tuning parameters, for use in opt(). For details of the algorithms and how to tune them, see the SciPy optimiser docs or the TensorFlow optimiser docs.
nelder_mead()

powell()

cg()

bfgs()

newton_cg()

l_bfgs_b(maxcor = 10, maxls = 20)

tnc(max_cg_it = -1, stepmx = 0, rescale = -1)

cobyla(rhobeg = 1)

slsqp()

gradient_descent(learning_rate = 0.01)

adadelta(learning_rate = 0.001, rho = 1, epsilon = 1e-08)

adagrad(learning_rate = 0.8, initial_accumulator_value = 0.1)

adagrad_da(
  learning_rate = 0.8,
  global_step = 1L,
  initial_gradient_squared_accumulator_value = 0.1,
  l1_regularization_strength = 0,
  l2_regularization_strength = 0
)

momentum(learning_rate = 0.001, momentum = 0.9, use_nesterov = TRUE)

adam(learning_rate = 0.1, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-08)

ftrl(
  learning_rate = 1,
  learning_rate_power = -0.5,
  initial_accumulator_value = 0.1,
  l1_regularization_strength = 0,
  l2_regularization_strength = 0
)

proximal_gradient_descent(
  learning_rate = 0.01,
  l1_regularization_strength = 0,
  l2_regularization_strength = 0
)

proximal_adagrad(
  learning_rate = 1,
  initial_accumulator_value = 0.1,
  l1_regularization_strength = 0,
  l2_regularization_strength = 0
)

rms_prop(learning_rate = 0.1, decay = 0.9, momentum = 0, epsilon = 1e-10)
| Argument | Description |
|---|---|
| maxcor | maximum number of 'variable metric corrections' used to define the approximation to the Hessian matrix |
| maxls | maximum number of line search steps per iteration |
| max_cg_it | maximum number of Hessian * vector evaluations per iteration |
| stepmx | maximum step for the line search |
| rescale | log10 scaling factor used to trigger rescaling of the objective |
| rhobeg | reasonable initial changes to the variables |
| learning_rate | the size of steps (in parameter space) towards the optimal value |
| rho | the decay rate |
| epsilon | a small constant used to condition gradient updates |
| initial_accumulator_value | initial value of the 'accumulator' used to tune the algorithm |
| global_step | the current training step number |
| initial_gradient_squared_accumulator_value | initial value of the accumulators used to tune the algorithm |
| l1_regularization_strength | L1 regularisation coefficient (must be 0 or greater) |
| l2_regularization_strength | L2 regularisation coefficient (must be 0 or greater) |
| momentum | the momentum of the algorithm |
| use_nesterov | whether to use Nesterov momentum |
| beta1 | exponential decay rate for the 1st moment estimates |
| beta2 | exponential decay rate for the 2nd moment estimates |
| learning_rate_power | power on the learning rate; must be 0 or less |
| decay | discounting factor for the gradient |
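As a sketch of how these tuning parameters are set, the snippet below builds a small illustrative model and passes an adam() optimiser with a smaller-than-default learning rate to opt(); the simulated data and the chosen learning rate are arbitrary and for illustration only.

library(greta)

# simulate data and model its mean, holding the standard deviation fixed
y <- rnorm(50, 3, 1)
mu <- variable()
distribution(y) <- normal(mu, 1)
m <- model(mu)

# take smaller steps than adam()'s default learning_rate of 0.1
res <- opt(m, optimiser = adam(learning_rate = 0.01))
res$par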
An optimiser object that can be passed to opt().
The optimisers powell(), cg(), newton_cg(), l_bfgs_b(), tnc(), cobyla(), and slsqp() are deprecated. They will be removed in greta 0.4.0, since they will no longer be available in TensorFlow 2.0, on which that version of greta will depend.
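If existing code uses one of these deprecated optimisers, a minimal migration sketch (assuming a greta model m, as in the example below) is simply to swap in an optimiser that is not deprecated:

# before (deprecated)
# opt_res <- opt(m, optimiser = powell())

# after: switch to an optimiser that will remain available
opt_res <- opt(m, optimiser = bfgs())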
The cobyla() optimiser does not provide information about the number of iterations or convergence, so these elements of the output are set to NA.
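For example, a sketch of inspecting those elements after fitting (the element names iterations and convergence are assumed here, and m is a greta model as in the example below):

cobyla_res <- opt(m, optimiser = cobyla())
cobyla_res$iterations   # NA: not reported by cobyla()
cobyla_res$convergence  # NA: not reported by cobyla()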
# NOT RUN {
# use optimisation to find the mean and sd of some data
x <- rnorm(100, -2, 1.2)
mu <- variable()
sd <- variable(lower = 0)
distribution(x) <- normal(mu, sd)
m <- model(mu, sd)

# configure optimisers & parameters via 'optimiser' argument to opt
opt_res <- opt(m, optimiser = bfgs())

# compare results with the analytic solution
opt_res$par
c(mean(x), sd(x))
# }