optimisation methods: optimisers (greta)

Functions to set up optimisers (which find the parameter values that maximise the joint density of a model) and change their tuning parameters, for use in opt(). For details of the algorithms and how to tune them, see the SciPy optimiser docs (for nelder_mead() to slsqp()) or the TensorFlow optimiser docs (for gradient_descent() to rms_prop()).
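Maximising the joint density means maximising the log of likelihood times prior. As a sketch of that idea outside greta, the code below uses a toy model that is not part of the original page: data `y` with a unit-variance normal likelihood and a standard normal prior on the mean `mu`. Base R's `stats::optim()` maximises the log joint density, with `control = list(fnscale = -1)` turning optim's minimiser into a maximiser. This model has a closed-form mode, `sum(y) / (length(y) + 1)`, to check the numerical answer against.

```r
# Sketch (toy model, not greta): find the mode of the log joint density
# log p(y, mu) = sum(log N(y | mu, 1)) + log N(mu | 0, 1)
y <- c(1.2, 0.4, 2.1, 1.7, 0.9)

log_joint <- function(mu) {
  sum(dnorm(y, mean = mu, sd = 1, log = TRUE)) +  # likelihood
    dnorm(mu, mean = 0, sd = 1, log = TRUE)       # prior
}

# fnscale = -1 makes optim() maximise instead of minimise;
# BFGS is used because Nelder-Mead is unreliable in one dimension
fit <- optim(par = 0, fn = log_joint, method = "BFGS",
             control = list(fnscale = -1))

fit$par                    # numerical mode
sum(y) / (length(y) + 1)   # analytic mode, 1.05
```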

nelder_mead()

powell()

cg()

bfgs()

newton_cg()

l_bfgs_b(maxcor = 10, maxls = 20)

tnc(max_cg_it = -1, stepmx = 0, rescale = -1)

cobyla(rhobeg = 1)

slsqp()

gradient_descent(learning_rate = 0.01)

adadelta(learning_rate = 0.001, rho = 1, epsilon = 1e-08)

adagrad(learning_rate = 0.8, initial_accumulator_value = 0.1)

adagrad_da(
  learning_rate = 0.8,
  global_step = 1L,
  initial_gradient_squared_accumulator_value = 0.1,
  l1_regularization_strength = 0,
  l2_regularization_strength = 0
)

momentum(learning_rate = 0.001, momentum = 0.9, use_nesterov = TRUE)

adam(learning_rate = 0.1, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-08)

ftrl(
  learning_rate = 1,
  learning_rate_power = -0.5,
  initial_accumulator_value = 0.1,
  l1_regularization_strength = 0,
  l2_regularization_strength = 0
)

proximal_gradient_descent(
  learning_rate = 0.01,
  l1_regularization_strength = 0,
  l2_regularization_strength = 0
)

proximal_adagrad(
  learning_rate = 1,
  initial_accumulator_value = 0.1,
  l1_regularization_strength = 0,
  l2_regularization_strength = 0
)

rms_prop(learning_rate = 0.1, decay = 0.9, momentum = 0, epsilon = 1e-10)
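The optimisers from gradient_descent() onwards update the parameters iteratively using the gradient. As a hedged sketch, not greta's internal code, the hypothetical helpers `gradient_descent_fit()` and `momentum_fit()` below apply the plain gradient descent and momentum update rules to a toy quadratic with its minimum at `c(1, -2)`; greta minimises the negative log joint density, which plays the role of `f` here. The momentum rule assumes the TensorFlow 1.x form: an accumulator `a <- momentum * a + g` is scaled by `learning_rate`, and the Nesterov variant steps along `g + momentum * a` instead. Default argument values mirror the usage listing above.

```r
# Sketch (not greta's internals): plain gradient descent and momentum
# updates, minimising the toy function f(theta) = sum((theta - target)^2)
target <- c(1, -2)
grad_f <- function(theta) 2 * (theta - target)  # gradient of f

# hypothetical helper: theta <- theta - learning_rate * gradient
gradient_descent_fit <- function(theta, learning_rate = 0.01, n_steps = 1000) {
  for (i in seq_len(n_steps)) {
    theta <- theta - learning_rate * grad_f(theta)
  }
  theta
}

# hypothetical helper: momentum update in the TensorFlow 1.x style
momentum_fit <- function(theta, learning_rate = 0.001, momentum = 0.9,
                         use_nesterov = TRUE, n_steps = 1000) {
  accumulation <- rep(0, length(theta))
  for (i in seq_len(n_steps)) {
    g <- grad_f(theta)
    accumulation <- momentum * accumulation + g
    # Nesterov momentum adds a look-ahead term to the current gradient
    step <- if (use_nesterov) g + momentum * accumulation else accumulation
    theta <- theta - learning_rate * step
  }
  theta
}

gradient_descent_fit(c(0, 0))
momentum_fit(c(0, 0))
momentum_fit(c(0, 0), use_nesterov = FALSE)
```

With momentum, each step reuses a decaying sum of past gradients, so a small `learning_rate` (0.001 here) still moves about as far per step as plain gradient descent with a rate roughly `1 / (1 - momentum)` times larger.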

Arguments

maxcor	maximum number of 'variable metric corrections' used to define the limited-memory approximation to the Hessian matrix
maxls	maximum number of line search steps per iteration
max_cg_it	maximum number of Hessian-vector product evaluations per iteration
stepmx	maximum step for the line search
rescale	log10 scaling factor used to trigger rescaling of the objective
rhobeg	a reasonable initial change to the variables (the initial trust-region radius)
learning_rate	the size of steps (in parameter space) towards the optimal value
rho	the decay rate of the running averages of squared gradients and squared updates
epsilon	a small constant used to condition gradient updates
initial_accumulator_value	initial value of the 'accumulator' used to tune the algorithm
global_step	the current training step number
initial_gradient_squared_accumulator_value	initial value of the accumulators used to tune the algorithm
l1_regularization_strength	L1 regularisation coefficient (must be 0 or greater)
l2_regularization_strength	L2 regularisation coefficient (must be 0 or greater)
momentum	the momentum coefficient: the weight given to the accumulated previous updates when computing the next step
use_nesterov	whether to use Nesterov momentum
beta1	exponential decay rate for the 1st moment estimates
beta2	exponential decay rate for the 2nd moment estimates
learning_rate_power	power on the learning rate, must be 0 or less
decay	discounting factor for the running average of squared gradients
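To make `learning_rate`, `beta1`, `beta2` and `epsilon` concrete, here is a sketch of the textbook Adam update (Algorithm 1 of Kingma and Ba), applied to a toy quadratic with its minimum at `c(1, -2)`. TensorFlow's implementation differs slightly in where `epsilon` enters, and the function name `adam_fit()` is hypothetical. `beta1` and `beta2` are the decay rates of running averages of the gradient and of the squared gradient, and the bias corrections offset the fact that both averages start at zero.

```r
# Sketch of the textbook Adam update (not greta's internal code),
# minimising the toy function f(theta) = sum((theta - target)^2)
target <- c(1, -2)
grad_f <- function(theta) 2 * (theta - target)

adam_fit <- function(theta, learning_rate = 0.1, beta1 = 0.9,
                     beta2 = 0.999, epsilon = 1e-08, n_steps = 1000) {
  m <- rep(0, length(theta))  # running average of the gradient (1st moment)
  v <- rep(0, length(theta))  # running average of the squared gradient (2nd moment)
  for (t in seq_len(n_steps)) {
    g <- grad_f(theta)
    m <- beta1 * m + (1 - beta1) * g
    v <- beta2 * v + (1 - beta2) * g^2
    m_hat <- m / (1 - beta1^t)  # bias corrections: m and v start at zero
    v_hat <- v / (1 - beta2^t)
    # epsilon keeps the step finite when v_hat is close to zero
    theta <- theta - learning_rate * m_hat / (sqrt(v_hat) + epsilon)
  }
  theta
}

adam_fit(c(0, 0))  # should end close to target
```

Because each coordinate's step is divided by the square root of its own squared-gradient average, early steps have size close to `learning_rate` regardless of how steep the objective is.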

Value

an optimiser object that can be passed to opt().

Details

The optimisers powell(), cg(), newton_cg(), l_bfgs_b(), tnc(), cobyla(), and slsqp() are deprecated. They will be removed in greta 0.4.0, since they will no longer be available in TensorFlow 2.0, on which that version of greta will depend.

cobyla() does not provide information about the number of iterations or about convergence, so these elements of the output of opt() are set to NA.
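Code that inspects opt() results should therefore not assume these fields are numbers. A minimal sketch, assuming that the result is a list with `iterations` and `convergence` elements and that a convergence code of 0 indicates success (as in base R's optim()); the helper `describe_opt_result()` and the mock results are hypothetical:

```r
# Hypothetical helper: summarise an opt()-style result list without
# assuming that iterations and convergence are always reported
describe_opt_result <- function(res) {
  iterations <- if (is.na(res$iterations)) "unknown" else format(res$iterations)
  convergence <- if (is.na(res$convergence)) {
    "unknown (not reported by this optimiser)"
  } else if (res$convergence == 0) {
    "converged"
  } else {
    paste0("did not converge (code ", res$convergence, ")")
  }
  paste0("iterations: ", iterations, "; convergence: ", convergence)
}

# mock results: one with the NA fields described above, one with values
describe_opt_result(list(iterations = NA, convergence = NA))
describe_opt_result(list(iterations = 12L, convergence = 0L))
```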

Examples

# use optimisation to find the mean and sd of some data
library(greta)
set.seed(1)
x <- rnorm(100, -2, 1.2)
mu <- variable()
sigma <- variable(lower = 0)  # named sigma to avoid masking stats::sd()
distribution(x) <- normal(mu, sigma)
m <- model(mu, sigma)

# configure optimisers & parameters via 'optimiser' argument to opt
opt_res <- opt(m, optimiser = bfgs())

# compare results with the analytic estimates; they should be close
# (the maximum likelihood sd divides by n, whereas sd() divides by n - 1)
opt_res$par
c(mean(x), sqrt(mean((x - mean(x))^2)))
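For comparison outside greta (and without TensorFlow), the same fit can be sketched with base R's `optim()` and `method = "BFGS"`. The standard deviation is optimised on the log scale so that it stays positive, in the spirit of the `lower = 0` constraint; this reparameterisation is an assumption of the sketch, not a description of greta's internals. No prior or Jacobian term is included, so the result should match the closed-form maximum likelihood estimates.

```r
# Base R sketch of the example above: maximum likelihood for a normal
# mean and standard deviation, with sigma optimised on the log scale
set.seed(2019)
x <- rnorm(100, -2, 1.2)

log_lik <- function(par) {
  mu <- par[1]
  sigma <- exp(par[2])  # log scale keeps sigma positive
  sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}

# fnscale = -1 maximises; a tight reltol sharpens the final estimates
fit <- optim(par = c(0, 0), fn = log_lik, method = "BFGS",
             control = list(fnscale = -1, reltol = 1e-12))

estimates <- c(mu = fit$par[1], sigma = exp(fit$par[2]))
analytic <- c(mu = mean(x), sigma = sqrt(mean((x - mean(x))^2)))
estimates
analytic
```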