This page provides technical implementation details for potential contributors or the curious. You don’t need to read this to be able to use greta.
There are three layers to how greta defines a model: users manipulate greta arrays, these define nodes, and nodes then define Tensors.
greta arrays are the user-facing representation of the model. They extend R arrays and have the classes greta_array and array.
x <- ones(3, 3)
class(x)
[1] "greta_array" "array"
The main difference between greta arrays and R arrays is that a greta array has a node attribute: an R6 object inheriting from the R6 class 'node', as well as one of the three node types: 'data', 'operation' or 'variable'.
x_node <- attr(x, "node")
class(x_node)
[1] "data_node" "node" "R6"
There is a fourth node type: ‘distribution’, but these are never directly associated with greta arrays.
These R6 node objects are where the magic happens. When created, each node points to its ‘parent’ nodes - the nodes for the greta arrays that were used to create this one.
# data nodes have no parents
length(x_node$parents)
[1] 0
# operation nodes have one or more parents
z <- x * 3
z_node <- attr(z, "node")
length(z_node$parents)
[1] 2
Each node also has a list of its children, the nodes that have been created from this one.
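This parent/child bookkeeping can be sketched in plain R using environments, which (like R6 objects) have reference semantics, so a parent's child list can be updated from inside the constructor. All names here are hypothetical; this is an illustration of the pattern, not greta's actual node class.

```r
# minimal node constructor: records parents and registers itself
# as a child of each parent (possible because environments are
# modified by reference, not copied)
new_node <- function(type, parents = list()) {
  node <- new.env()
  node$type <- type
  node$parents <- parents
  node$children <- list()
  for (p in parents) {
    p$children <- c(p$children, node)
  }
  node
}

x <- new_node("data")                # a data node: no parents
z <- new_node("operation", list(x))  # an operation node created from x

length(x$parents)   # 0
length(z$parents)   # 1
length(x$children)  # 1: z registered itself when it was created
```

Because the objects are passed by reference, creating z is enough to update x's child list; no reassignment of x is needed.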
When model() is called, that inheritance information is used to construct the directed acyclic graph (DAG) that defines the model. The inheritance also preserves intermediate nodes, such as those created in multi-part operations but never assigned to greta arrays.
Nodes also have a value member, which is an array for data nodes or an 'unknowns' object for other node types. The unknowns class is a thin wrapper around arrays which prints question marks in place of unknown values. Generic functions for working on arrays (e.g. dim, length, print) use these node values to return something familiar to the user.
x_node$value()
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 1 1 1
[3,] 1 1 1
# R calls this a matrix because it's 2d, but it's an array
class(x_node$value())
[1] "matrix" "array"
z_node$value()
[,1] [,2] [,3]
[1,] ? ? ?
[2,] ? ? ?
[3,] ? ? ?
class(z_node$value())
[1] "unknowns" "matrix" "array"
In addition to remembering their shape and size and where they are in the DAG, each node has methods to define a corresponding TensorFlow Tensor object in a specified environment. That doesn't happen until the user runs model(), which creates a 'dag_class' object to store the relevant nodes, the environment for the tensors, and methods to talk to the TensorFlow graph.
The node tf() method takes the DAG as an argument, and defines a tensor representing itself in the TensorFlow environment, with a name determined by the dag object.
x_node$tf
function(dag) {
  tfe <- dag$tf_environment
  tf_name <- dag$tf_name(self)
  unbatched_name <- glue::glue("{tf_name}_unbatched")
  mode <- dag$how_to_define(self)

  # if we're in sampling mode, get the distribution constructor and sample
  if (mode == "sampling") {
    batched_tensor <- dag$draw_sample(self$distribution)
  }

  # if we're defining the forward mode graph, create either a constant or a
  # placeholder
  if (mode == "forward") {
    value <- self$value()
    ndim <- length(dim(value))
    shape <- to_shape(c(1, dim(value)))
    value <- add_first_dim(value)

    # under some circumstances we define data as constants, but normally as
    # placeholders
    using_constants <- !is.null(greta_stash$data_as_constants)

    if (using_constants) {
      unbatched_tensor <- tf$constant(
        value = value,
        dtype = tf_float(),
        shape = shape
      )
    } else {
      unbatched_tensor <- tf$compat$v1$placeholder(
        shape = shape,
        dtype = tf_float()
      )
      dag$set_tf_data_list(unbatched_name, value)
    }

    # expand up to batch size
    tiling <- c(tfe$batch_size, rep(1L, ndim))
    batched_tensor <- tf$tile(unbatched_tensor, tiling)

    # put unbatched tensor in environment so it can be set
    assign(unbatched_name, unbatched_tensor, envir = tfe)
  }

  assign(tf_name, batched_tensor, envir = tfe)
}
<environment: 0x7fd0316e1698>
Because R6 objects are pass-by-reference (rather than pass-by-value), the dag accumulates all of the defined tensors, rather than being re-written each time. Similarly, because nodes are R6 objects and know which nodes are their parents, they can make sure those parents are defined as tensors before they are. The define_tf() member function ensures this happens, enabling operation nodes to use their parents' tensors when defining their own.
x_node$define_tf
function(dag) {
  # if defined already, skip
  if (!self$defined(dag)) {

    # make sure parents are defined
    parents_defined <- vapply(self$list_parents(dag),
      function(x) x$defined(dag),
      FUN.VALUE = FALSE
    )

    if (any(!parents_defined)) {
      parents <- self$list_parents(dag)
      lapply(
        parents[which(!parents_defined)],
        function(x) x$define_tf(dag)
      )
    }

    # then define self
    self$tf(dag)
  }
}
<environment: 0x7fd0316e3660>
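The depth-first, parents-first pattern above amounts to a topological traversal of the DAG. It can be sketched with a toy "define" step that just records the order in which nodes are visited; the node constructor and all names here are hypothetical stand-ins for greta's classes.

```r
# toy nodes: a name plus a list of parent nodes
node <- function(name, parents = list()) {
  list(name = name, parents = parents)
}

x     <- node("x")
three <- node("3")
z     <- node("z", list(x, three))  # e.g. z <- x * 3

defined_order <- character(0)

# define parents recursively before defining self, skipping
# anything already defined (tracked in an environment)
define_all <- function(n, seen = new.env()) {
  if (is.null(seen[[n$name]])) {
    for (p in n$parents) define_all(p, seen)  # parents first
    defined_order <<- c(defined_order, n$name)
    seen[[n$name]] <- TRUE
  }
  invisible(NULL)
}

define_all(z)
defined_order  # "x" "3" "z": z is only defined after both of its parents
```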
Hamiltonian Monte Carlo (HMC) requires all of the parameters to be transformed to a continuous scale for sampling. Variable nodes are therefore converted to tensors by first defining a ‘free’ (unconstrained) version of themselves as a tensor, and then applying a transformation function to convert them back to the scale of interest.
a <- variable(lower = 0)
a_node <- attr(a, "node")
class(a_node)
[1] "variable_node" "node" "R6"
a_node$tf_from_free
function(x) {
  tf_bijector <- self$create_tf_bijector()
  tf_bijector$forward(x)
}
<environment: 0x7fd032a20b38>
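For a variable with a lower bound, the bijector idea can be sketched with a log/exp pair: the free state lives on the whole real line, and the forward transform maps it back onto the constrained scale. This is a minimal illustration of the concept, not greta's actual bijector code.

```r
# forward: free (unconstrained) value -> constrained value above `lower`
tf_from_free <- function(free, lower = 0) lower + exp(free)

# inverse: constrained value -> free value
free_from_tf <- function(value, lower = 0) log(value - lower)

free  <- -1.2                 # any real number is a valid free state
value <- tf_from_free(free)   # always greater than the lower bound
free_from_tf(value)           # recovers -1.2: the transforms are inverses
```

HMC then explores the unconstrained free state, and every proposed value maps back to a valid (here, positive) variable.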
Distribution nodes are node objects just like the others, but they are not directly associated with greta arrays. Instead, greta arrays may have a distribution node in their distribution slot to indicate that their values are assumed to follow that distribution. The distribution node will also be listed as a child node, and likewise the 'target node' will be listed as a child of the distribution. Distribution nodes also have child nodes (data, variables or operations) representing their parameters.
b <- normal(0, 1)
b_node <- attr(b, "node")
class(b_node)
[1] "variable_node" "node" "R6"
class(b_node$distribution)
[1] "normal_distribution" "distribution_node" "node"
[4] "R6"
# b is the target of its own distribution
class(b_node$distribution$target)
[1] "variable_node" "node" "R6"
When they define themselves as tensors, distribution nodes define the log density of their target node/tensor given the tensors representing their parameters.
b_node$distribution$tf
function(dag) {
  # assign the distribution object constructor function to the environment
  assign(dag$tf_name(self),
    self$tf_distrib,
    envir = dag$tf_environment
  )
}
<environment: 0x7fd0323e2ae8>
If the distribution was truncated, the log density is normalised using the cumulative distribution function.
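For example, a normal density truncated to an interval can be renormalised by subtracting the log probability mass inside the truncation bounds, computed from the CDF. This is a hand-written sketch of the idea using base R's dnorm/pnorm, not greta's TensorFlow code:

```r
# log density of a normal truncated to (lower, upper):
# unnormalised log density minus the log mass inside the bounds
truncated_normal_log_density <- function(x, mean, sd, lower, upper) {
  dnorm(x, mean, sd, log = TRUE) -
    log(pnorm(upper, mean, sd) - pnorm(lower, mean, sd))
}

# the renormalised log density is higher than the unnormalised one,
# because the truncated distribution has less mass to spread around
truncated_normal_log_density(0.5, 0, 1, 0, 2) > dnorm(0.5, log = TRUE)
```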
The log densities of these distributions are summed on the TensorFlow graph to create a Tensor for the joint log density of the model. TensorFlow's automatic differentiation is then used to define a Tensor for the gradient of the log density with respect to each parameter in the model.
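For a toy model with independent standard normal parameters, the joint log density and its gradient look like this. greta obtains the gradient from TensorFlow's automatic differentiation; here it is written out analytically for illustration:

```r
# joint log density: sum of the per-parameter log densities
joint_log_density <- function(params) {
  sum(dnorm(params, mean = 0, sd = 1, log = TRUE))
}

# for a standard normal, d/dx of the log density is -x,
# so the gradient vector is just the negated parameters
joint_gradient <- function(params) -params

params <- c(0.3, -1.1)
joint_log_density(params)
joint_gradient(params)  # -0.3  1.1
```

HMC uses exactly these two quantities: the joint log density to accept or reject proposals, and its gradient to simulate the Hamiltonian dynamics.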
The dag R6 object contained within the model then exposes methods to send parameters to the TensorFlow graph and return the joint density and gradient.
model <- model(b)
model$dag$send_parameters
function(parameters) {
  # reshape to a row vector if needed
  if (is.null(dim(parameters))) {
    parameters <- array(parameters, dim = c(1, length(parameters)))
  }

  # create a feed dict in the TF environment
  parameter_list <- list(free_state = parameters)

  # set the batch size to match parameters
  self$set_tf_data_list("batch_size", nrow(parameters))
  self$build_feed_dict(parameter_list)
}
<environment: 0x7fd03289b008>
model$dag$log_density
function() {
  res <- cleanly(self$tf_sess_run(joint_density_adj))
  if (inherits(res, "error")) {
    res <- NA
  }
  res
}
<environment: 0x7fd03289b008>
model$dag$gradients
NULL
These methods are used to check the validity of initial values. Sampling is done using samplers from TensorFlow Probability, which require a function mapping from the overall free state to the joint log density. That function is created with the generate_log_prob_function method:
model$dag$generate_log_prob_function
function(which = c(
           "adjusted",
           "unadjusted",
           "both"
         )) {
  which <- match.arg(which)

  function(free_state) {

    # temporarily define a new environment
    tfe_old <- self$tf_environment
    on.exit(self$tf_environment <- tfe_old)
    tfe <- self$tf_environment <- new.env()

    # copy the placeholders over here, so they aren't recreated
    data_names <- self$get_tf_names(types = "data")
    for (name in data_names) {
      tfe[[name]] <- tfe_old[[name]]
    }
    tfe$batch_size <- tfe_old$batch_size

    # put the free state in the environment, and build out the tf graph
    tfe$free_state <- free_state
    self$define_tf_body()

    # define the densities
    self$define_joint_density()

    objectives <- list(
      adjusted = tfe$joint_density_adj,
      unadjusted = tfe$joint_density
    )

    # return either of the densities, or a list of both
    result <- switch(which,
      adjusted = objectives$adjusted,
      unadjusted = objectives$unadjusted,
      both = objectives
    )

    result
  }
}
<environment: 0x7fd03289b008>
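The on.exit() trick in the method above, temporarily swapping in a fresh environment and restoring the old one when the function returns, can be sketched in isolation. The names here are hypothetical; self stands in for the dag object.

```r
# a stand-in for the dag: an environment holding a tf_environment
self <- new.env()
self$tf_environment <- new.env()
self$tf_environment$label <- "old"

with_fresh_environment <- function() {
  tfe_old <- self$tf_environment
  on.exit(self$tf_environment <- tfe_old)  # restore on exit, even on error
  self$tf_environment <- new.env()
  self$tf_environment$label <- "fresh"
  self$tf_environment$label
}

with_fresh_environment()   # "fresh" while the function body runs
self$tf_environment$label  # back to "old" afterwards
```

Because on.exit() also runs when the body throws an error, the dag's real tensor environment can never be left in the temporary state.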