Quick Tutorial
Construct a very simple graph
Just to get started, let's create a simple graph for the summation of two numbers.
using NaiveNASlib, Test # We'll make use of @test below to condense verbose output
NaiveNASlib uses a special immutable type of vertex to annotate inputs so that one can be certain that size mutations won't suddenly change the input shape of the model.
in1 = inputvertex("in1", 1);
in2 = inputvertex("in2", 1);
In this example we could have done without them since there are no parameters whose size can change, but let's show things as they are expected to be used.
Create a new vertex which computes the sum of in1 and in2:
add = "add" >> in1 + in2;
NaiveNASlib lets you do this using + which creates a new vertex that sums its inputs. Use >> to attach a name to the vertex when using infix operations. Naming vertices is completely optional, but it is quite helpful, as can be seen below.
CompGraph helps evaluate the whole graph as a function.
graph = CompGraph([in1, in2], add)
CompGraph with graphsummary:
┌────────────────┬───────────┬────────────────┬──────────────────┐
│ Graph Position │ Vertex Nr │ Input Vertices │ Op │
├────────────────┼───────────┼────────────────┼──────────────────┤
│ Input │ 1 │ │ │
│ Input │ 2 │ │ │
│ Output │ 3 │ 1,2 │ + (element wise) │
└────────────────┴───────────┴────────────────┴──────────────────┘
The show function for CompGraph just redirects to graphsummary until I can find a better way to print it.
Evaluate the function represented by graph by just calling it.
graph(2,3)
5
graph(100,200)
300
The vertices function returns the vertices in topological order.
@test vertices(graph) == [in1, in2, add]
@test name.(vertices(graph)) == ["in1", "in2", "add"]
Test Passed
The graphsummary function can be used to print a summary table:
graphsummary(graph, name, "Name of inputs" => v -> name.(inputs(v)))
┌────────────────┬───────────┬────────────────┬──────────────────┬──────┬────────────────┐
│ Graph Position │ Vertex Nr │ Input Vertices │ Op │ Name │ Name of inputs │
├────────────────┼───────────┼────────────────┼──────────────────┼──────┼────────────────┤
│ Input │ 1 │ │ │ in1 │ │
│ Input │ 2 │ │ │ in2 │ │
│ Output │ 3 │ 1,2 │ + (element wise) │ add │ in1,in2 │
└────────────────┴───────────┴────────────────┴──────────────────┴──────┴────────────────┘
CompGraphs can be indexed:
@test graph[begin] == graph[1] == in1
@test graph[end] == graph[3] == add
@test graph[begin:end] == vertices(graph)
Test Passed
This is a bit slow though, as it traverses the whole graph each time. If one needs to do this many times, it is better to call vertices once and then index into the result.
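As a minimal sketch of that pattern, reusing the graph from above:

vs = vertices(graph)   # traverse the graph once
@test vs[begin] == in1
@test vs[end] == add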
Modify a graph
Now let's look at how to make use of it to modify the structure of a neural network. Since batteries are excluded, let's first create a tiny neural network library and do the wiring so that NaiveNASlib can work with it.
module TinyNNlib
using NaiveNASlib
# A simple linear layer
mutable struct LinearLayer{T}
W::Matrix{T}
end
# Normally one uses something like randn here, but this makes the output
# in examples easier on the eyes
LinearLayer(nin, nout) = LinearLayer(ones(Int, nout, nin))
(l::LinearLayer)(x) = l.W * x
# NaiveNASlib needs to know what LinearLayer considers its output and input size
# In this case it is the number of rows and columns of the weight matrix
# Input size is always a vector since vertices might have multiple inputs
NaiveNASlib.nin(l::LinearLayer) = [size(l.W, 2)]
NaiveNASlib.nout(l::LinearLayer) = size(l.W, 1)
# We also need to tell NaiveNASlib how to change the size of LinearLayer
# The Δsize! function will receive indices to keep from existing weights
# as well as where to insert new rows/columns
function NaiveNASlib.Δsize!(l::LinearLayer, newins::AbstractVector, newouts::AbstractVector)
# newins is a vector of vectors as vertices may have more than one input
# LinearLayer has only one however
# The function NaiveNASlib.parselect can be used to interpret newins and newouts.
# We just need to tell it along which dimensions to apply them.
l.W = NaiveNASlib.parselect(l.W, 1=>newouts, 2=>only(newins))
end
# This makes LinearLayer print nicely in the graphsummary table
Base.show(io::IO, l::LinearLayer) = print(io, "LinearLayer(", only(nin(l)), " => ", nout(l), ')')
# Helper function which creates a LinearLayer wrapped in a vertex in a computation graph.
# This creates a Keras-like API
linearvertex(in, outsize) = absorbvertex(LinearLayer(nout(in), outsize), in)
export linearvertex, LinearLayer
end;
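To make the role of parselect more concrete, here is a small illustration of how it is expected to behave on a plain matrix, based on the description above (the convention that -1 marks positions for new, zero-filled entries is an assumption here):

W = [1 3 5;
     2 4 6]
# Keep both rows and insert a new row between them; keep only columns 1 and 3
Wnew = NaiveNASlib.parselect(W, 1 => [1, -1, 2], 2 => [1, 3])
# Expected result, with zeros inserted for the new row:
# [1 5;
#  0 0;
#  2 6]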
There are a handful of other functions one can implement to e.g. provide better defaults and offer other forms of convenience, but here we use the bare minimum to get ourselves started. Some examples of this are provided further down.
In practice one might not want to create a whole neural network library from scratch, but rather incorporate NaiveNASlib with an existing library in a glue package. It might appear that this will inevitably lead to type-piracy as neither the layer definitions nor the functions (e.g. nout, nin) would belong to the glue package. However, one typically wants to wrap the layers in some intermediate type anyway. For instance, NaiveNASflux needs to wrap the layers from Flux in a mutable container as the layers themselves are not mutable.
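As a hypothetical sketch of that pattern (module and type names below are made up for illustration, not part of any real glue package), a glue package might wrap the foreign, immutable layer type in its own mutable container and implement the NaiveNASlib interface for the wrapper:

module TinyGlueLib # hypothetical
using NaiveNASlib

struct ImmutableDense{T} # stand-in for an immutable layer from another library
    W::Matrix{T}
end
(d::ImmutableDense)(x) = d.W * x

mutable struct LayerWrapper{L} # mutable container owned by the glue package
    layer::L
end
(w::LayerWrapper)(x) = w.layer(x)

NaiveNASlib.nin(w::LayerWrapper{<:ImmutableDense}) = [size(w.layer.W, 2)]
NaiveNASlib.nout(w::LayerWrapper{<:ImmutableDense}) = size(w.layer.W, 1)
# Since the wrapped layer can't be mutated, replace it with a resized copy
function NaiveNASlib.Δsize!(w::LayerWrapper{<:ImmutableDense}, newins::AbstractVector, newouts::AbstractVector)
    w.layer = ImmutableDense(NaiveNASlib.parselect(w.layer.W, 1 => newouts, 2 => only(newins)))
end
end;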
Let's do a super simple example where we make use of the tiny neural network library to create a model and then modify it:
using .TinyNNlib
invertex = inputvertex("input", 3);
layer1 = linearvertex(invertex, 4);
layer2 = linearvertex(layer1, 5);
Vertices may be called to execute their computation alone. We generally outsource this work to CompGraph, but now we are trying to illustrate how things work.
batchsize = 2
batch = randn(nout(invertex), batchsize)
y1 = layer1(batch)
@test size(y1) == (nout(layer1), batchsize) == (4, 2)
y2 = layer2(y1)
@test size(y2) == (nout(layer2), batchsize) == (5, 2)
Test Passed
Let's change the output size of layer1. First check the input sizes so we have something to compare to.
@test [nout(layer1)] == nin(layer2) == [4]
@test Δnout!(layer1 => -2) # Returns true if successful
@test [nout(layer1)] == nin(layer2) == [2]
Test Passed
And now the weight matrices have changed!
The graph is still operational of course but the sizes of the activations have changed.
y1 = layer1(batch)
@test size(y1) == (nout(layer1), batchsize) == (2, 2)
y2 = layer2(y1)
@test size(y2) == (nout(layer2), batchsize) == (5, 2)
Test Passed
As can be seen above, the consequence of changing the output size of layer1 was that the input size of layer2 was also changed. This is of course required for the computation graph to not throw a dimension mismatch error when executed.
Apart from the graph being very simple, this mutation was trivial because the input and output dimensions of LinearLayer's parameters can change independently, as they are just the rows and columns of the weight matrix. In the lingo of NaiveNASlib, this is expressed by giving the layers the mutation size trait SizeAbsorb, meaning that a change in the number of input/output neurons does not propagate further in the graph.
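As a plain-matrix illustration of why such layers can absorb a size change (no NaiveNASlib involved; the matrices are made up for this example), dropping output neurons of one layer only removes rows of its weight matrix and the matching columns of the next layer's weight matrix, leaving everything further downstream untouched:

W1 = ones(4, 3)                 # first layer: 3 inputs, 4 outputs
W2 = ones(5, 4)                 # second layer: 4 inputs, 5 outputs
keep = [1, 2]                   # output neurons of the first layer to keep
W1k, W2k = W1[keep, :], W2[:, keep]
x = ones(3)
@test size(W2k * (W1k * x)) == (5,)   # the second layer's output size is unaffected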
A more elaborate example
While the previous example was simple enough to be done by hand, things can quickly get out of hand when using:
- Layers which require nin==nout, e.g. batch normalization and pooling.
- Element wise operations such as activation functions or just element wise arithmetic (e.g. + used in residual connections).
- Concatenation of activations.
Let's use a small but non-trivial model including all of the above. We begin by making a helper which creates a vertex that does elementwise scaling of its input:
scalarmult(v, s::Number) = invariantvertex(x -> x .* s, v);
When multiplying with a scalar, the output size is the same as the input size. This vertex type is said to be size invariant (for lack of a better word), hence the name invariantvertex.
Ok, let's create the model:
# First a few `LinearLayer`s
invertex = inputvertex("input", 6);
start = linearvertex(invertex, 6);
split = linearvertex(start, nout(invertex) ÷ 3);
# Concatenation means the output size is the sum of the input sizes
joined = conc(scalarmult(split,2), scalarmult(split,3), scalarmult(split,5), dims=1);
# Elementwise addition is of course also size invariant
out = start + joined;
# CompGraph to help us run the whole thing. Don't forget to use `graphsummary`
# to help understand the structure
graph = CompGraph(invertex, out)
graphsummary(graph, nin, nout)
┌────────────────┬───────────┬────────────────┬─────────────────────┬───────┬──────┐
│ Graph Position │ Vertex Nr │ Input Vertices │ Op │ Nin │ Nout │
├────────────────┼───────────┼────────────────┼─────────────────────┼───────┼──────┤
│ Input │ 1 │ │ │ │ 6 │
│ Hidden │ 2 │ 1 │ LinearLayer(6 => 6) │ 6 │ 6 │
│ Hidden │ 3 │ 2 │ LinearLayer(6 => 2) │ 6 │ 2 │
│ Hidden │ 4 │ 3 │ #3 │ 2 │ 2 │
│ Hidden │ 5 │ 3 │ #3 │ 2 │ 2 │
│ Hidden │ 6 │ 3 │ #3 │ 2 │ 2 │
│ Hidden │ 7 │ 4,5,6 │ cat(x..., dims=1) │ 2,2,2 │ 6 │
│ Output │ 8 │ 2,7 │ + (element wise) │ 6,6 │ 6 │
└────────────────┴───────────┴────────────────┴─────────────────────┴───────┴──────┘
As seen above, the anonymous function in scalarmult does not print nicely. Consider using a callable struct and implementing Base.show for it.
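For example, a minimal sketch of such a callable struct could look like this (ScalarMult and namedscalarmult are made-up names, and this variant is not used in the graph above):

struct ScalarMult{T<:Number}
    s::T
end
(m::ScalarMult)(x) = x .* m.s
# Prints as e.g. "x .* 2" in the graphsummary table
Base.show(io::IO, m::ScalarMult) = print(io, "x .* ", m.s)
# Variant of the scalarmult helper defined above
namedscalarmult(v, s::Number) = invariantvertex(ScalarMult(s), v);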
@test graph((ones(6))) == [78, 78, 114, 114, 186, 186]
Test Passed
Now we have a somewhat complex set of size relations at hand since the sizes are constrained so that
1. start and joined must have the same output size due to element wise addition.
2. joined will always have 3 times the output size of split since there are no size absorbing vertices between them.
Modifying this graph manually would of course be manageable (albeit a bit cumbersome) if we created the model by hand and knew it inside out. When things like the above emerge out of a neural architecture search they are less fun though, and this is where use of NaiveNASlib really pays off.
Ok, let's try to increase the size of the vertex out by 2. Before we do that, let's have a look at the sizes of the vertices in the graph to have something to compare to.
@test [nout(start)] == nin(split) == [3nout(split)] == [sum(nin(joined))] == [nout(out)] == [6]
@test [nout(start), nout(joined)] == nin(out) == [6, 6]
Test Passed
In many cases it is useful to hold on to the old graph before mutating:
parentgraph = deepcopy(graph);
It is not possible to change the size of out by exactly 2 due to constraints 1 and 2 above. By default, NaiveNASlib warns when this happens and then tries to make the closest possible change. If we don't want the warning, we can tell NaiveNASlib to relax and make the closest possible change right away:
@test Δnout!(out => relaxed(2))
@test [nout(start)] == nin(split) == [3nout(split)] == [sum(nin(joined))] == [nout(out)] == [9]
@test [nout(start), nout(joined)] == nin(out) == [9, 9]
Test Passed
As we can see above, the size change rippled through the graph due to the size relations described above. Pretty much every vertex was affected.
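One way to see the ripple effect is to compare the output size of every vertex with the summary table further up (the same sizes are verified again further down when comparing against a pruned copy):

@test nout.(vertices(graph)) == [6, 9, 3, 3, 3, 3, 9, 9]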
Let's evaluate the graph just to verify that we don't get a dimension mismatch error.
@test graph((ones(6))) == [78, 78, 0, 114, 114, 0, 186, 186, 0]
Test Passed
TinyNNlib uses parselect which defaults to inserting rows/columns of zeros when the size increases. This helps the graph maintain the same function after mutation. In this case we changed the size of the output layer so we don't get exactly the same function, but hopefully it is clear why e.g. a linear layer after out would have made it produce the same output.
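To spell out that last point, here is a plain-matrix sketch (no NaiveNASlib involved; the matrices are made up): padding a layer's output with zero rows and the following layer's weights with matching zero columns leaves the composed function unchanged:

Wa, Wb = ones(2, 3), ones(4, 2)
Wa_pad = vcat(Wa, zeros(1, 3))   # new output neuron with all-zero weights
Wb_pad = hcat(Wb, zeros(4, 1))   # matching all-zero column in the next layer
x = ones(3)
@test Wb_pad * (Wa_pad * x) == Wb * (Wa * x)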
The copy is still intact, of course.
@test parentgraph((ones(6))) == [78, 78, 114, 114, 186, 186]
Test Passed
While we still have the complex model in scope, let's show a few more ways to change the sizes. See the built-in documentation for more information.
It is possible to supply a utility function which assigns a utility to each neuron in a vertex. NaiveNASlib will prioritize keeping the neurons with higher utility.
Prefer high indices:
graphhigh = deepcopy(graph)
@test Δnout!(v -> 1:nout(v), graphhigh[end] => -3)
@test graphhigh((ones(6))) == [42, 0, 60, 0, 96, 0]
Test Passed
Prefer low indices:
graphlow = deepcopy(graph)
@test Δnout!(v -> nout(v):-1:1, graphlow[end] => -3)
@test graphlow((ones(6))) == [78, 78, 114, 114, 186, 186]
Test Passed
A simple approach when doing structured pruning is to prefer neurons with high weight magnitude. Here is how to set that as the default for LinearLayer (something one should probably implement in TinyNNlib instead).
using Statistics: mean
NaiveNASlib.defaultutility(l::LinearLayer) = mean(abs, l.W, dims=2)
graphhighmag = deepcopy(graph)
@test Δnout!(graphhighmag[end] => -3)
@test graphhighmag((ones(6))) == [78, 78, 114, 114, 186, 186]
Test Passed
In many NAS applications one wants to apply random mutations to the graph. When doing so, one might end up in situations like this:
badgraphdecinc = deepcopy(graph)
v1, v2 = badgraphdecinc[[3, end]] # Imagine selecting these at random
@test Δnout!(v1 => relaxed(-2))
@test Δnout!(v2 => 6)
# Now we first deleted a bunch of weights, then we added new ones :(
@test badgraphdecinc((ones(6))) == [42, 0, 0, 60, 0, 0, 96, 0, 0]
Test Passed
In such cases, it might be better to supply all wanted changes in one go and let NaiveNASlib try to come up with a decent compromise.
goodgraphdecinc = deepcopy(graph)
v1, v2 = goodgraphdecinc[[3, end]]
@test Δnout!(v1 => relaxed(-2), v2 => 3) # Mix relaxed and exact size changes freely
@test goodgraphdecinc((ones(6))) == [78, 0, 0, 78, 114, 6, 0, 108, 186, 6, 0, 180]
Test Passed
It is also possible to change sizes in the input direction, but since it requires specifying a size change for each input it is generally not recommended.
graphΔnin = deepcopy(graph)
v1, v2 = graphΔnin[end-1:end]
# Use missing to signal "don't care"
@test Δnin!(v1 => (3, relaxed(2), missing), v2 => relaxed((1,2)))
@test nin(v1) == [6, 6, 6] # Sizes are tied to nout of split so they all have to be equal
@test nin(v2) == [18, 18] # Sizes are tied due to elementwise addition
Test Passed
A common pruning strategy is to just remove the x% of parameters with the lowest utility. This can be done by not putting any size requirements and assigning negative utility.
graphprune40 = deepcopy(graph)
Δsize!(graphprune40) do v
utility = NaiveNASlib.defaultutility(v)
# We make some strong assumptions on the weight distribution here for brevity :)
return utility .- 0.4mean(utility)
end
@test nout.(vertices(graphprune40)) == [6, 6, 2, 2, 2, 2, 6, 6]
# Compare to original:
@test nout.(vertices(graph)) == [6, 9, 3, 3, 3, 3, 9, 9]
Test Passed
Closer look at how weights are modified
Here we take a closer look at how the weight matrices are changed. We use the following function to create LinearLayers with easy-to-distinguish weights. It returns both the vertex and the LinearLayer so we can easily look at the weights:
function vertexandlayer(in, outsize)
nparam = nout(in) * outsize
l = LinearLayer(collect(reshape(1:nparam, :, nout(in))))
return absorbvertex(l, in), l
end;
Make a simple model:
invertices = inputvertex.(["in1", "in2"], [3,4])
v1, l1 = vertexandlayer(invertices[1], 4)
v2, l2 = vertexandlayer(invertices[2], 3)
merged = conc(v1, v2, dims=1)
v3, l3 = vertexandlayer(invertices[1], nout(merged))
add = v3 + merged
v4, l4 = vertexandlayer(merged, 2)
CompGraph(invertices, v4)
CompGraph with graphsummary:
┌────────────────┬───────────┬────────────────┬─────────────────────┐
│ Graph Position │ Vertex Nr │ Input Vertices │ Op │
├────────────────┼───────────┼────────────────┼─────────────────────┤
│ Input │ 1 │ │ │
│ Input │ 3 │ │ │
│ Hidden │ 2 │ 1 │ LinearLayer(3 => 4) │
│ Hidden │ 4 │ 3 │ LinearLayer(4 => 3) │
│ Hidden │ 5 │ 2,4 │ cat(x..., dims=1) │
│ Output │ 6 │ 5 │ LinearLayer(7 => 2) │
└────────────────┴───────────┴────────────────┴─────────────────────┘
These weights might look a bit odd, but here we only care about making it easy to spot what has changed after the size change below.
@test l1.W ==
[ 1 5 9
2 6 10
3 7 11
4 8 12 ]
@test l2.W ==
[ 1 4 7 10
2 5 8 11
3 6 9 12 ]
@test l3.W ==
[ 1 8 15
2 9 16
3 10 17
4 11 18
5 12 19
6 13 20
7 14 21 ]
@test l4.W ==
[ 1 3 5 7 9 11 13 ;
2 4 6 8 10 12 14 ]
Test Passed
Now, let's decrease v2 by 1 and force merged to retain its size, which in turn forces v1 to grow by 1. Assign utility 10 to neurons 1 and 3 of v2 and 1 to all other neurons in the model.
@test Δnout!(v2 => -1, merged => 0) do v
v == v2 ? [10, 1, 10] : 1
end
Test Passed
v1 got a new row of parameters at the end:
@test l1.W ==
[ 1 5 9
2 6 10
3 7 11
4 8 12
0 0 0 ]
Test Passed
v2 chose to drop its middle row as it was the output neuron with lowest utility:
@test l2.W ==
[ 1 4 7 10
3 6 9 12 ]
Test Passed
v4 dropped its second to last column (which was aligned to the middle row of v2) and got new parameters in column 5 (which is aligned to the last row of v1):
@test l4.W ==
[ 1 3 5 7 0 9 13
2 4 6 8 0 10 14 ]
Test Passed
Oh, and since v4's input (merged) is tied to v3 through add, v3 had the equivalent change applied to its rows:
@test l3.W ==
[ 1 8 15
2 9 16
3 10 17
4 11 18
0 0 0
5 12 19
7 14 21 ]
Test Passed
Other modifications
Let's just do a few quick examples of the other types of modifications.
Add a vertex
Using insert!:
invertex = inputvertex("input", 3)
layer1 = linearvertex(invertex, 5)
graph = CompGraph(invertex, layer1)
# nvertices(g) is a shortcut for length(vertices(g))
@test nvertices(graph) == 2
@test graph(ones(3)) == [3,3,3,3,3]
# Insert a layer between invertex and layer1
@test insert!(invertex, vertex -> linearvertex(vertex, nout(vertex))) # True if success
@test nvertices(graph) == 3
@test graph(ones(3)) == [9, 9, 9, 9, 9]
Test Passed
Remove a vertex
Using remove!:
invertex = inputvertex("input", 3)
layer1 = linearvertex(invertex, 5)
layer2 = linearvertex(layer1, 4)
graph = CompGraph(invertex, layer2)
@test nvertices(graph) == 3
@test graph(ones(3)) == [15, 15, 15, 15]
# Remove layer1 and change nin of layer2 from 5 to 3
# Would perhaps have been better to increase nout of invertex, but it is immutable
@test remove!(layer1) # True if success
@test nvertices(graph) == 2
@test graph(ones(3)) == [3, 3, 3, 3]
Test Passed
Add an edge
Using create_edge!:
invertices = inputvertex.(["input1", "input2"], [3, 2])
layer1 = linearvertex(invertices[1], 4)
layer2 = linearvertex(invertices[2], 4)
add = layer1 + layer2
out = linearvertex(add, 5)
graph = CompGraph(invertices, out)
@test nin(add) == [4, 4]
# Two inputs to this graph!
@test graph(ones(3), ones(2)) == [20, 20, 20, 20, 20]
# This graph is not interesting enough for there to be a good showcase for adding a new edge.
# Let's create a new layer which has a different output size just to see how things change
# The only vertex which supports more than one input is add
layer3 = linearvertex(invertices[2], 6)
@test create_edge!(layer3, add) # True if success
# NaiveNASlib will try to increase the size in case of a mismatch by default
@test nin(add) == [6, 6, 6]
@test graph(ones(3), ones(2)) == [28, 28, 28, 28, 28]
Test Passed
Remove an edge
Using remove_edge!:
invertex = inputvertex("input", 4)
layer1 = linearvertex(invertex, 3)
layer2 = linearvertex(invertex, 5)
merged = conc(layer1, layer2, layer1, dims=1)
out = linearvertex(merged, 3)
graph = CompGraph(invertex, out)
@test nin(merged) == [3, 5, 3]
@test graph(ones(4)) == [44, 44, 44]
@test remove_edge!(layer1, merged) # True if success
@test nin(merged) == [5, 3]
@test graph(ones(4)) == [32, 32, 32]
Test Passed
This page was generated using Literate.jl.