# Search Spaces
A search space is a set of possible architectures from which the search policy may draw initial candidates or extend existing ones. Search spaces are constructed from simple components which can be combined in many ways, giving a great deal of flexibility.
## Simple Parameter Spaces
The most fundamental building block of any search space is the `ParSpace`:
```julia
ps1d = ParSpace([2,4,6,10])
```
Draw from the search space.
```julia
@test ps1d() == 6
@test ps1d() == 10
```
It is also possible to supply a random number generator:
```julia
@test ps1d(MersenneTwister(0)) == 4
```
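Since identically seeded generators produce identical draws, supplying the generator makes sampling reproducible. A small illustration (not part of the original example):

```julia
# Two fresh generators with the same seed yield the same draw.
@test ps1d(MersenneTwister(0)) == ps1d(MersenneTwister(0))
```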
`ParSpace`s can be of any dimension and type:
```julia
ps2d = ParSpace(["1","2","3"], ["4","5","6","7"])

@test typeof(ps1d) == ParSpace{1, Int}
@test typeof(ps2d) == ParSpace{2, String}

@test ps2d() == ("1", "4")
```
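To make the pattern concrete, here is a minimal sketch of how such a sampler could look in plain Julia. `ToyParSpace` is a made-up name for illustration and mimics the callable behaviour above; it is not the actual NaiveGAflux implementation:

```julia
using Random

# Toy N-dimensional parameter space: one vector of choices per dimension.
# Not the NaiveGAflux implementation, just the same callable pattern.
struct ToyParSpace{N,T}
    choices::NTuple{N,Vector{T}}
end
ToyParSpace(choices::Vector{T}...) where T = ToyParSpace(choices)

# Drawing picks one element per dimension; 1D spaces return a scalar.
(s::ToyParSpace{1})(rng=Random.default_rng()) = rand(rng, s.choices[1])
(s::ToyParSpace)(rng=Random.default_rng()) = map(c -> rand(rng, c), s.choices)

toy2d = ToyParSpace(["1","2","3"], ["4","5","6","7"])
toy2d(MersenneTwister(0))  # deterministic 2-tuple, e.g. ("1", "6")
```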
## Layer Search Spaces
A `LayerSpace` is a search space over configurations of a single Flux layer. It is worth noting that NaiveGAflux separates search spaces for creating new layers from modification of existing layers: a layer drawn from a `LayerSpace` can therefore be mutated into a new layer which is not part of the search space it was drawn from.

Let's look at how to create a search space for 2D convolutions:
```julia
cs = ConvSpace{2}(outsizes=4:32, activations=[relu, elu, selu], kernelsizes=3:9)

inputsize = 16
convlayer = cs(inputsize)

@test string(convlayer) == "Conv((8, 3), 16 => 22, relu, pad=(4, 3, 1, 1))"
```
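Every draw samples the hyperparameters anew, so calling the same space again typically yields a different layer. The value below is only an example of what a draw might look like:

```julia
# Another independent draw from the same space; kernel size, number of
# output channels and activation are all resampled.
convlayer2 = cs(inputsize)
# e.g. Conv((5, 7), 16 => 9, selu, pad=(2, 2, 3, 3))
```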
## Architecture Search Spaces
An architecture space is a search space over vertices in a computation graph, including how they are connected. Just as with layer search spaces, architecture search spaces are separate from modifications of graphs, meaning that an architecture drawn from a search space may be mutated into an architecture which is not part of the search space it was drawn from.

Let's look at an example of a search space of large image classifiers:
```julia
import Flux
import Flux: MaxPool, relu, elu
```
`VertexSpace` creates a `MutableVertex` of layers generated by the wrapped search space.
```julia
cs = VertexSpace(ConvSpace{2}(outsizes=8:256, activations=[identity, relu, elu], kernelsizes=3:5))
bs = VertexSpace(BatchNormSpace([identity, relu]))
```
Blocks of `Conv->BatchNorm` and `BatchNorm->Conv` respectively. We need to make sure there is always at least one `SizeAbsorb` layer so that `fork` and `res` below play nicely.
```julia
csbs = ArchSpaceChain(cs, bs)
bscs = ArchSpaceChain(bs, cs)
```
Randomly generates either `Conv`, `Conv->BatchNorm`, or `BatchNorm->Conv`:
```julia
cblock = ArchSpace(ParSpace1D(cs, csbs, bscs))
```
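Sub-spaces such as `cblock` can also be sampled on their own by feeding them an input vertex, the same way the complete space is sampled at the end of this example (`conv2dinputvertex` and `CompGraph` are used there too; the comment on the vertex count is only indicative):

```julia
iv = conv2dinputvertex("in", 3)
g = CompGraph(iv, cblock(iv))
# g contains the input vertex plus one or two layer vertices,
# depending on which of the three alternatives was drawn.
```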
Generates between 1 and 5 (independent) samples from `cblock` in a sequence:
```julia
rep = RepeatArchSpace(cblock, 1:5)
```
Generates between 2 and 4 parallel paths, joined by concatenation (inception-like blocks), each drawn from `rep`:
```julia
fork = ForkArchSpace(rep, 2:4)
```
Generates a residual connection around what is generated by `rep`:
```julia
res = ResidualArchSpace(rep)
```
... and a residual fork.
```julia
resfork = ResidualArchSpace(fork)
```
Pick one of the above randomly...
```julia
repforkres = ArchSpace(ParSpace1D(rep, fork, res, resfork))
```
...1 to 3 times.
```julia
blocks = RepeatArchSpace(repforkres, 1:3)
```
End each block with subsampling through max pooling.
```julia
ms = VertexSpace(PoolSpace{2}(windowsizes=2, strides=2, poolfuns=MaxPool))
reduction = ArchSpaceChain(blocks, ms)
```
And let's do 2 to 4 reductions.
```julia
featureextract = RepeatArchSpace(reduction, 2:4)
```
Adds 1 to 3 dense layers as outputs: 0 to 2 hidden dense layers plus the final output layer.
```julia
dense = VertexSpace(DenseSpace(16:512, [relu, selu]))
drep = RepeatArchSpace(dense, 0:2)
```
The last layer has a fixed output size (the number of labels).
```julia
dout = VertexSpace(Shielded(), DenseSpace(10, identity))
output = ArchSpaceChain(drep, dout)
```
Aaaand let's glue it together: feature extracting `Conv`/`BatchNorm` layers -> global pooling -> dense layers.
```julia
archspace = ArchSpaceChain(featureextract, GlobalPoolSpace(), output)
```
The input is a 3-channel image.
```julia
samplemodel(invertex=conv2dinputvertex("input", 3)) = CompGraph(invertex, archspace(invertex))
```
Sample one architecture from the search space.
```julia
graph1 = samplemodel()
@test nvertices(graph1) == 79
```
And one more...
```julia
graph2 = samplemodel()
@test nvertices(graph2) == 128
```
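The spread in size between the two samples above is typical for a space this permissive. Drawing a handful of models gives a feel for the distribution (a small illustration; the numbers depend entirely on the random state):

```julia
# Vertex counts of ten independently sampled architectures.
sizes = [nvertices(samplemodel()) for _ in 1:10]
extrema(sizes)  # e.g. (32, 141)
```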
*This page was generated using [Literate.jl](https://github.com/fredrikekre/Literate.jl).*