Functions to set up optimisers (which find parameters that maximise the joint density of a model) and change their tuning parameters, for use in opt(). For details of the algorithms and how to tune them, see the SciPy optimiser docs or the TensorFlow optimiser docs.

nelder_mead()

powell()

cg()

bfgs()

newton_cg()

l_bfgs_b(maxcor = 10, maxls = 20)

tnc(max_cg_it = -1, stepmx = 0, rescale = -1)

cobyla(rhobeg = 1)

slsqp()

gradient_descent(learning_rate = 0.01)

adadelta(learning_rate = 0.001, rho = 1, epsilon = 1e-08)

adagrad(learning_rate = 0.8, initial_accumulator_value = 0.1)

adagrad_da(learning_rate = 0.8, global_step = 1L,
initial_gradient_squared_accumulator_value = 0.1,
l1_regularization_strength = 0, l2_regularization_strength = 0)

momentum(learning_rate = 0.001, momentum = 0.9, use_nesterov = TRUE)

adam(learning_rate = 0.1, beta1 = 0.9, beta2 = 0.999,
epsilon = 1e-08)

ftrl(learning_rate = 1, learning_rate_power = -0.5,
initial_accumulator_value = 0.1, l1_regularization_strength = 0,
l2_regularization_strength = 0)

proximal_gradient_descent(learning_rate = 0.01,
l1_regularization_strength = 0, l2_regularization_strength = 0)

proximal_adagrad(learning_rate = 1, initial_accumulator_value = 0.1,
l1_regularization_strength = 0, l2_regularization_strength = 0)

rms_prop(learning_rate = 0.1, decay = 0.9, momentum = 0,
epsilon = 1e-10)
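Tuning parameters are fixed when the optimiser object is constructed; the configured object is then handed to opt(). A minimal sketch (assuming a greta model `m`, as built in the Examples section below):

```r
# configure adam with a smaller step size than the default,
# then pass the optimiser object to opt()
res <- opt(m, optimiser = adam(learning_rate = 0.01))
```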

## Arguments

| Argument | Description |
|---|---|
| `maxcor` | maximum number of 'variable metric corrections' used to define the approximation to the hessian matrix |
| `maxls` | maximum number of line search steps per iteration |
| `max_cg_it` | maximum number of hessian * vector evaluations per iteration |
| `stepmx` | maximum step for the line search |
| `rescale` | log10 scaling factor used to trigger rescaling of the objective |
| `rhobeg` | reasonable initial changes to the variables |
| `learning_rate` | the size of steps (in parameter space) towards the optimal value |
| `rho` | the decay rate |
| `epsilon` | a small constant used to condition gradient updates |
| `initial_accumulator_value` | initial value of the 'accumulator' used to tune the algorithm |
| `global_step` | the current training step number |
| `initial_gradient_squared_accumulator_value` | initial value of the accumulators used to tune the algorithm |
| `l1_regularization_strength` | L1 regularisation coefficient (must be 0 or greater) |
| `l2_regularization_strength` | L2 regularisation coefficient (must be 0 or greater) |
| `momentum` | the momentum of the algorithm |
| `use_nesterov` | whether to use Nesterov momentum |
| `beta1` | exponential decay rate for the 1st moment estimates |
| `beta2` | exponential decay rate for the 2nd moment estimates |
| `learning_rate_power` | power on the learning rate, must be 0 or less |
| `decay` | discounting factor for the gradient |

## Value

an optimiser object that can be passed to opt().

## Details

cobyla() does not provide information about the number of iterations or about convergence, so these elements of the output are set to NA.
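Code that inspects these elements should therefore guard against NA when cobyla() is used. A sketch, assuming a greta model `m` and assuming the output list exposes elements named `iterations` and `convergence` (names not confirmed by this page):

```r
res <- opt(m, optimiser = cobyla())
# cobyla() reports neither iteration counts nor convergence status
if (is.na(res$convergence)) {
  message("convergence information not available for this optimiser")
}
```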

## Examples

# use optimisation to find the mean and sd of some data
x <- rnorm(100, -2, 1.2)
mu <- variable()
sd <- variable(lower = 0)
distribution(x) <- normal(mu, sd)
m <- model(mu, sd)

# configure optimisers & parameters via 'optimiser' argument to opt
opt_res <- opt(m, optimiser = bfgs())

# compare results with the analytic solution
opt_res$par
c(mean(x), sd(x))
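One caveat when comparing against the analytic solution: sd(x) uses the unbiased (n - 1) denominator, while the maximum-likelihood estimate that opt() converges to divides by n, so the two sd values differ slightly for finite samples. The maximum-likelihood analogue can be computed directly:

```r
# maximum-likelihood sd divides by n rather than n - 1
n <- length(x)
c(mean(x), sqrt((n - 1) / n) * sd(x))
```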