This page provides technical implementation details for potential contributors or the curious. You don’t need to read this to be able to use greta.

There are three layers to how greta defines a model: users manipulate *greta arrays*, these define *nodes*, and nodes then define *Tensors*.

greta arrays are the user-facing representation of the model. greta arrays extend R arrays and have the classes `greta_array` and `array`.

```
x <- ones(3, 3)
class(x)
```

`[1] "greta_array" "array" `

The main difference between greta arrays and R arrays is that a greta array has a `node` attribute: an R6 object inheriting from the R6 class 'node', as well as from one of the three node types: 'data', 'operation' or 'variable'.

```
x_node <- attr(x, "node")
class(x_node)
```

`[1] "data_node" "node" "R6" `

There is a fourth node type: ‘distribution’, but these are never directly associated with greta arrays.

These R6 node objects are where the magic happens. When created, each node points to its ‘child’ nodes - the nodes for the greta arrays that were used to create this one.

```
# data nodes have no children
length(x_node$children)
```

`[1] 0`

```
# operation nodes have one or more children
z <- x * 3
z_node <- attr(z, "node")
length(z_node$children)
```

`[1] 2`

Each node also has a list of its parents, the nodes that have been created from this one.

When `model()` is called, that inheritance information is used to construct the directed acyclic graph (DAG) that defines the model. The inheritance also preserves intermediate nodes, such as those created in multi-part operations but never assigned to greta arrays.

Nodes also have a value member, which is an array for data nodes or an 'unknowns' object for other node types. The unknowns class is a thin wrapper around arrays which prints question marks in place of values. Generic functions for working on arrays (e.g. `dim`, `length`, `print`) use these node values to return something familiar to the user.

`x_node$value()`

```
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 1 1 1
[3,] 1 1 1
```

```
# R calls this a matrix because it's 2d, but it's an array
class(x_node$value())
```

`[1] "matrix"`

`z_node$value()`

```
[,1] [,2] [,3]
[1,] ? ? ?
[2,] ? ? ?
[3,] ? ? ?
```

`class(z_node$value())`

`[1] "unknowns" "matrix" `
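
The unknowns behaviour can be mimicked with a minimal plain-R sketch; `unknowns_sketch` below is a hypothetical stand-in for illustration, not greta's actual `unknowns` implementation:

```r
# hypothetical minimal stand-in for an 'unknowns'-style wrapper:
# an array whose elements print as question marks
unknowns_sketch <- function(dim) {
  x <- array(NA, dim = dim)
  class(x) <- c("unknowns_sketch", class(x))
  x
}

# print a "?" in place of each (unknown) element
print.unknowns_sketch <- function(x, ...) {
  x <- unclass(x)
  x[] <- "?"
  print(x, quote = FALSE)
  invisible(x)
}

unknowns_sketch(c(3, 3))
```

Because the wrapper is still an array underneath, generics like `dim` and `length` keep working on it unchanged.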

In addition to remembering their shape and size and where they are in the DAG, each node has methods to define a corresponding TensorFlow Tensor object in a specified environment. That doesn't happen until the user runs `model()`, which creates a 'dag_class' object to store the relevant nodes, the environment for the tensors, and methods to talk to the TensorFlow graph.

The node `tf()` method takes the dag as an argument, and defines a tensor representing itself in the TensorFlow environment, with a name determined by the dag object.

`x_node$tf`

```
function(dag) {
  tfe <- dag$tf_environment
  tf_name <- dag$tf_name(self)
  value <- self$value()
  shape <- to_shape(c(1, dim(value)))
  value <- add_first_dim(value)
  # under some circumstances we define data as constants, but normally as
  # placeholders
  using_constants <- !is.null(greta_stash$data_as_constants)
  if (using_constants) {
    tensor <- tf$constant(value = value,
                          dtype = tf_float(),
                          shape = shape)
  } else {
    tensor <- tf$placeholder(shape = shape,
                             dtype = tf_float())
    tfe$data_list[[tf_name]] <- value
  }
  # create placeholder
  assign(tf_name, tensor, envir = tfe)
}
<environment: 0x7f8a500c6238>
```

Because R6 objects are pass-by-reference (rather than pass-by-value), the dag accumulates all of the defined tensors, rather than being re-written each time. Similarly, because nodes are R6 objects and know which nodes are their children, they can make sure those children are defined as tensors before they are. The `define_tf()` member function ensures that happens, enabling operation nodes to use the tensors for their children when defining their own tensor.

`x_node$define_tf`

```
function(dag) {
  # if defined already, skip
  if (!self$defined(dag)) {
    # make sure children are defined
    children_defined <- vapply(self$children,
                               function(x) x$defined(dag),
                               FUN.VALUE = FALSE)
    if (any(!children_defined)) {
      lapply(self$children[which(!children_defined)],
             function(x) x$define_tf(dag))
    }
    # then define self
    self$tf(dag)
  }
}
<environment: 0x7f8a500c82e0>
```
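
The children-first ordering this enforces can be sketched with a plain-R recursion over a hypothetical list-based DAG (`define_all`, `a`, `b` and `op` are illustrative stand-ins, not greta objects, which are R6 and define tensors rather than accumulate names):

```r
# hypothetical plain-R sketch of children-first definition order
define_all <- function(node, defined = character()) {
  # define all children before defining this node
  for (child in node$children)
    defined <- define_all(child, defined)
  # skip nodes that have already been defined
  if (!node$name %in% defined)
    defined <- c(defined, node$name)
  defined
}

a  <- list(name = "a",  children = list())
b  <- list(name = "b",  children = list(a))
op <- list(name = "op", children = list(a, b))

define_all(op)  # children appear before their parents: "a" "b" "op"
```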

Hamiltonian Monte Carlo (HMC) requires all of the parameters to be transformed to a continuous scale for sampling. Variable nodes are therefore converted to tensors by first defining a ‘free’ (unconstrained) version of themselves as a tensor, and then applying a transformation function to convert them back to the scale of interest.

```
a <- variable(lower = 0)
a_node <- attr(a, "node")
class(a_node)
```

`[1] "variable_node" "node" "R6" `

`a_node$tf_from_free`

```
function(x, env) {
  upper <- self$upper
  lower <- self$lower
  if (self$constraint == "none") {
    y <- x
  } else if (self$constraint == "both") {
    y <- tf$nn$sigmoid(x) * fl(upper - lower) + fl(lower)
  } else if (self$constraint == "low") {
    y <- fl(upper) - tf$exp(x)
  } else if (self$constraint == "high") {
    y <- tf$exp(x) + fl(lower)
  } else {
    y <- x
  }
  y
}
<environment: 0x7f8a4bc7e148>
```
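
The same transformations can be written in plain R. This is a sketch for intuition only; `to_constrained` is a hypothetical helper that dispatches on which bounds are finite, mirroring the constraint cases in `tf_from_free` above:

```r
# hypothetical plain-R version of the free-to-constrained transforms
to_constrained <- function(x, lower = -Inf, upper = Inf) {
  if (is.finite(lower) && is.finite(upper)) {
    # 'both': squash to (lower, upper) via the sigmoid
    plogis(x) * (upper - lower) + lower
  } else if (is.finite(upper)) {
    # 'low': bounded above only
    upper - exp(x)
  } else if (is.finite(lower)) {
    # 'high': bounded below only
    exp(x) + lower
  } else {
    # 'none': unconstrained
    x
  }
}

to_constrained(0, lower = 0)             # exp(0) + 0 = 1
to_constrained(0, lower = 0, upper = 2)  # 0.5 * 2 + 0 = 1
```

Each branch maps the whole real line onto the constrained interval, which is what lets HMC sample on a continuous, unconstrained scale.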

Distribution nodes are node objects just like the others, but they are not *directly* associated with greta arrays. Instead, greta arrays may have a distribution node in their `distribution` slot to indicate that their values are assumed to follow that distribution. The distribution node will also be listed as a parent node, and likewise the 'target node' will be listed as a child of the distribution. Distribution nodes also have child nodes (data, variables or operations) representing their parameters.

```
b <- normal(0, 1)
b_node <- attr(b, "node")
class(b_node)
```

`[1] "variable_node" "node" "R6" `

`class(b_node$distribution)`

```
[1] "normal_distribution" "distribution_node" "node"
[4] "R6"
```

```
# b is the target of its own distribution
class(b_node$distribution$target)
```

`[1] "variable_node" "node" "R6" `

When they define themselves as tensors, distribution nodes define the log density of their target node/tensor given the tensors representing their parameters.

`b_node$distribution$tf`

```
function(dag) {
  # for distributions, tf assigns a *function* to execute the density
  density <- function(tf_target) {
    # fetch inputs
    tf_parameters <- self$tf_fetch_parameters(dag)
    # ensure x and the parameters are all expanded to have the n_chains
    # batch dimension, if any have it
    target_params <- match_batches(c(list(tf_target), tf_parameters))
    tf_target <- target_params[[1]]
    tf_parameters <- target_params[-1]
    # calculate log density
    ld <- self$tf_log_density_function(tf_target, tf_parameters, dag)
    # check for truncation
    if (!is.null(self$truncation))
      ld <- ld - self$tf_log_density_offset(tf_parameters)
    ld
  }
  # assign the function to the environment
  assign(dag$tf_name(self),
         density,
         envir = dag$tf_environment)
}
<environment: 0x7f8a4be1ef58>
```

If the distribution was truncated, the log density is normalised using the cumulative distribution function.

The log densities of these distributions are summed on the TensorFlow graph to create a Tensor for the joint log density of the model. TensorFlow's automatic differentiation capabilities are then used to define a Tensor for the gradient of the log density with respect to each parameter in the model.
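
For intuition, here is a plain-R analogue of that summation (not greta's TensorFlow code): for a model whose only variable has a standard normal distribution, the joint log density is the sum of element-wise log densities, and the gradient happens to have a simple closed form.

```r
# plain-R analogue of summing log densities over a free state vector
free_state <- c(0.5, -1.2)
joint_log_density <- sum(dnorm(free_state, mean = 0, sd = 1, log = TRUE))

# for a standard normal, the gradient of the log density is simply -x;
# greta gets gradients automatically from TensorFlow rather than analytically
gradient <- -free_state
```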

The `dag` R6 object contained within the model then exposes methods to send parameters to the TensorFlow graph and return the joint density and gradient.

```
model <- model(b)
model$dag$send_parameters
```

```
function(parameters) {
  # reshape to a row vector if needed
  if (is.null(dim(parameters))) {
    parameters <- array(parameters, dim = c(1, length(parameters)))
  }
  # create a feed dict in the TF environment
  parameter_list <- list(free_state = parameters)
  self$build_feed_dict(parameter_list)
}
<environment: 0x7f8a4d226848>
```

`model$dag$log_density`

```
function() {
  res <- cleanly(self$tf_sess_run(joint_density_adj))
  if (inherits(res, "error"))
    res <- NA
  res
}
<environment: 0x7f8a4d226848>
```

`model$dag$gradients`

`NULL`

These methods are used to check the validity of initial values. Sampling is now done using samplers from TensorFlow Probability, which require a function mapping from the overall free state to the joint log density. That function is created with the `generate_log_prob_function` method:

`model$dag$generate_log_prob_function`

```
function(which = c("adjusted",
                   "unadjusted",
                   "both")) {
  which <- match.arg(which)
  function(free_state) {
    # temporarily define a new environment
    tfe_old <- self$tf_environment
    on.exit(self$tf_environment <- tfe_old)
    tfe <- self$tf_environment <- new.env()
    # copy the placeholders over here, so they aren't recreated
    data_names <- self$get_tf_names(types = "data")
    for (name in data_names)
      tfe[[name]] <- tfe_old[[name]]
    # put the free state in the environment, and build out the tf graph
    tfe$free_state <- free_state
    self$define_tf_body()
    # define the densities
    self$define_joint_density()
    objectives <- list(adjusted = tfe$joint_density_adj,
                       unadjusted = tfe$joint_density)
    # return either of the densities, or a list of both
    result <- switch(which,
                     adjusted = objectives$adjusted,
                     unadjusted = objectives$unadjusted,
                     both = objectives)
    result
  }
}
<environment: 0x7f8a4d226848>
```
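
The key pattern here is a closure: a function that captures model state and maps a free state vector to a log density. That pattern can be mimicked in plain R; `make_log_prob` below is a hypothetical illustration, not greta's implementation:

```r
# hypothetical plain-R sketch of generating a log-probability closure
make_log_prob <- function(mean, sd) {
  # the returned function closes over the distribution's parameters,
  # much as greta's version closes over the dag's state
  function(free_state) {
    sum(dnorm(free_state, mean = mean, sd = sd, log = TRUE))
  }
}

log_prob <- make_log_prob(mean = 0, sd = 1)
log_prob(c(0.5, -1.2))
```

A sampler only needs this one function of the free state, which is exactly the interface TensorFlow Probability's MCMC kernels expect.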