# Search Spaces
A search space is a set of possible architectures from which the search policy may draw initial candidates or extend existing ones. Search spaces are constructed from simple components which can be combined in many ways, giving a great deal of flexibility.
## Simple Parameter Spaces
The most fundamental building block of any search space is the `ParSpace`:
```julia
ps1d = ParSpace([2,4,6,10])
```
Draw from the search space.
```julia
@test ps1d() == 6
@test ps1d() == 10
```
It is also possible to supply a random number generator:
```julia
@test ps1d(MersenneTwister(0)) == 4
```
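Since identically seeded generators produce identical draws, supplying the generator makes sampling reproducible. A small illustration (not part of the original example):

```julia
# Two fresh generators with the same seed yield the same draw.
@test ps1d(MersenneTwister(0)) == ps1d(MersenneTwister(0))
```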
`ParSpace`s can be of any dimension and type:
```julia
ps2d = ParSpace(["1","2","3"], ["4","5","6","7"])

@test typeof(ps1d) == ParSpace{1, Int}
@test typeof(ps2d) == ParSpace{2, String}

@test ps2d() == ("1", "4")
```
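To make the pattern concrete, here is a minimal sketch of how such a sampler could look in plain Julia. `ToyParSpace` is a made-up name for illustration and mimics the callable behaviour above; it is not the actual NaiveGAflux implementation:

```julia
using Random

# Toy N-dimensional parameter space: one vector of choices per dimension.
# Not the NaiveGAflux implementation, just the same callable pattern.
struct ToyParSpace{N,T}
    choices::NTuple{N,Vector{T}}
end
ToyParSpace(choices::Vector{T}...) where T = ToyParSpace(choices)

# Drawing picks one element per dimension; 1D spaces return a scalar.
(s::ToyParSpace{1})(rng=Random.default_rng()) = rand(rng, s.choices[1])
(s::ToyParSpace)(rng=Random.default_rng()) = map(c -> rand(rng, c), s.choices)

toy2d = ToyParSpace(["1","2","3"], ["4","5","6","7"])
toy2d(MersenneTwister(0))  # deterministic 2-tuple, e.g. ("1", "6")
```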
## Layer Search Spaces
A `LayerSpace` is a search space over configurations of a single Flux layer. It is worth noting that NaiveGAflux separates search spaces for creating new layers from modification of existing layers: a layer drawn from a `LayerSpace` can therefore be mutated into a new layer which is not part of the search space it was drawn from.

Let's look at how to create a search space for 2D convolutions:
```julia
cs = ConvSpace{2}(outsizes=4:32, activations=[relu, elu, selu], kernelsizes=3:9)

inputsize = 16
convlayer = cs(inputsize)

@test string(convlayer) == "Conv((8, 3), 16 => 22, relu, pad=(4, 3, 1, 1))"
```
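Every draw samples the hyperparameters anew, so calling the same space again typically yields a different layer. The value below is only an example of what a draw might look like:

```julia
# Another independent draw from the same space; kernel size, number of
# output channels and activation are all resampled.
convlayer2 = cs(inputsize)
# e.g. Conv((5, 7), 16 => 9, selu, pad=(2, 2, 3, 3))
```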
## Architecture Search Spaces
An architecture space is a search space over vertices in a computation graph, including how they are connected. Just as with layer search spaces, architecture search spaces are separate from modifications of graphs, meaning that an architecture drawn from a search space may be mutated into an architecture which is not part of the search space it was drawn from.

Let's look at an example of a search space of large image classifiers:
```julia
import Flux
import Flux: MaxPool, relu, elu
```
`VertexSpace` creates a `MutableVertex` of layers generated by the wrapped search space.
```julia
cs = VertexSpace(ConvSpace{2}(outsizes=8:256, activations=[identity, relu, elu], kernelsizes=3:5))
bs = VertexSpace(BatchNormSpace([identity, relu]))
```
Blocks of `Conv->BatchNorm` and `BatchNorm->Conv` respectively. We need to make sure there is always at least one `SizeAbsorb` layer so that `fork` and `res` below play nicely.
```julia
csbs = ArchSpaceChain(cs, bs)
bscs = ArchSpaceChain(bs, cs)
```
Randomly generates either `Conv`, `Conv->BatchNorm`, or `BatchNorm->Conv`:
```julia
cblock = ArchSpace(ParSpace1D(cs, csbs, bscs))
```
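Sub-spaces such as `cblock` can also be sampled on their own by feeding them an input vertex, the same way the complete space is sampled at the end of this example (`conv2dinputvertex` and `CompGraph` are used there too; the comment on the vertex count is only indicative):

```julia
iv = conv2dinputvertex("in", 3)
g = CompGraph(iv, cblock(iv))
# g contains the input vertex plus one or two layer vertices,
# depending on which of the three alternatives was drawn.
```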
Generates between 1 and 5 (independent) samples from `cblock` in a sequence:
```julia
rep = RepeatArchSpace(cblock, 1:5)
```
Generates between 2 and 4 parallel paths, joined by concatenation (inception-like blocks), each drawn from `rep`:
```julia
fork = ForkArchSpace(rep, 2:4)
```
Generates a residual connection around what is generated by `rep`:
```julia
res = ResidualArchSpace(rep)
```
... and a residual fork.
```julia
resfork = ResidualArchSpace(fork)
```
Pick one of the above randomly...
```julia
repforkres = ArchSpace(ParSpace1D(rep, fork, res, resfork))
```
...1 to 3 times.
```julia
blocks = RepeatArchSpace(repforkres, 1:3)
```
End each block with subsampling through max pooling.
```julia
ms = VertexSpace(PoolSpace{2}(windowsizes=2, strides=2, poolfuns=MaxPool))
reduction = ArchSpaceChain(blocks, ms)
```
And let's do 2 to 4 reductions.
```julia
featureextract = RepeatArchSpace(reduction, 2:4)
```
Adds 1 to 3 dense layers as outputs: 0 to 2 hidden dense layers plus the final output layer.
```julia
dense = VertexSpace(DenseSpace(16:512, [relu, selu]))
drep = RepeatArchSpace(dense, 0:2)
```
The last layer has a fixed output size (the number of labels).
```julia
dout = VertexSpace(Shielded(), DenseSpace(10, identity))
output = ArchSpaceChain(drep, dout)
```
Aaaand let's glue it together: feature extracting `Conv`/`BatchNorm` layers -> global pooling -> dense layers.
```julia
archspace = ArchSpaceChain(featureextract, GlobalPoolSpace(), output)
```
The input is a 3-channel image.
```julia
samplemodel(invertex=conv2dinputvertex("input", 3)) = CompGraph(invertex, archspace(invertex))
```
Sample one architecture from the search space.
```julia
graph1 = samplemodel()
@test nvertices(graph1) == 79
```
And one more...
```julia
graph2 = samplemodel()
@test nvertices(graph2) == 128
```
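The spread in size between the two samples above is typical for a space this permissive. Drawing a handful of models gives a feel for the distribution (a small illustration; the numbers depend entirely on the random state):

```julia
# Vertex counts of ten independently sampled architectures.
sizes = [nvertices(samplemodel()) for _ in 1:10]
extrema(sizes)  # e.g. (32, 141)
```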
*This page was generated using [Literate.jl](https://github.com/fredrikekre/Literate.jl).*