geppy.core package¶
Submodules¶
geppy.core.entity module¶
The module entity
defines the core data structures used in GEP, including the gene, the chromosome, the
K-expression and the expression tree. We use primitives (functions and terminals) to build genes and then compose
a chromosome with one or multiple genes. Refer to the geppy.core.symbol
module for information on primitives.
A chromosome composed of only one gene is called a monogenic chromosome, while one composed of more than one gene is called a multigenic chromosome. A multigenic chromosome can be assigned a linking function that combines the results of its genes into a single result.
-
class
geppy.core.entity.
Chromosome
(gene_gen, n_genes, linker=None)[source]¶ Bases:
list
Individual representation in gene expression programming (GEP), which may contain one or more genes. Note that in a multigenic chromosome, all genes must share the same head length and the same primitive set. Each element of Chromosome is of type Gene.
-
__init__
(gene_gen, n_genes, linker=None)[source]¶ Initialize an individual in GEP.
Parameters:
- gene_gen – callable, a gene generator, i.e., gene_gen() should return a gene
- n_genes – number of genes
- linker – callable, a linking function
Note
If a linker is specified, then it must accept n_genes arguments, each produced by one gene. If linker is left at its default value None, then no linking is applied for a monogenic chromosome, while for a multigenic chromosome the tuple function is used, i.e., a tuple of the values of all genes is returned.
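The linking rule in the note above can be sketched in plain Python. This is only an illustration of the described behaviour, not geppy's implementation; the helper name combine_gene_outputs is hypothetical.

```python
import operator


def combine_gene_outputs(gene_outputs, linker=None):
    """Hypothetical helper mirroring the linking rule described in the note."""
    if linker is not None:
        # A user-supplied linker receives one argument per gene.
        return linker(*gene_outputs)
    if len(gene_outputs) == 1:
        # Monogenic chromosome: no linking is applied.
        return gene_outputs[0]
    # Multigenic chromosome without a linker: fall back to a tuple.
    return tuple(gene_outputs)


single = combine_gene_outputs([4.0])
unlinked = combine_gene_outputs([1.0, 2.0, 3.0])
summed = combine_gene_outputs([1.0, 2.0], linker=operator.add)
```

Note that a two-argument linker such as operator.add only fits a chromosome with exactly two genes, since the linker must accept n_genes arguments.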
-
classmethod
from_genes
(genes, linker=None)[source]¶ Create a chromosome instance by providing a list of genes directly. The number of genes is implicitly specified by len(genes).
Parameters:
- genes – iterable, a list of genes
- linker – callable, a linking function. Refer to Chromosome.__init__() for more details.
Returns: Chromosome, a chromosome
-
head_length
¶ Get the length of the head domain. All genes in a chromosome should have the same head length.
-
kexpressions
¶ Get a list of K-expressions for all the genes in this chromosome.
-
linker
¶ Get the linking function.
-
max_arity
¶ Get the max arity of the functions in the primitive set with which all genes are built. Note that all genes in a chromosome should share the same primitive set.
-
tail_length
¶ Get the length of the tail domain. All genes in a chromosome should have the same tail length.
-
-
class
geppy.core.entity.
ExpressionTree
(root)[source]¶ Bases:
object
Class representing an expression tree (ET) in GEP, which may be obtained by translating a K-expression, a gene, or a chromosome, i.e., genotype-phenotype mapping.
-
class
Node
(name)[source]¶ Bases:
object
Class representing a node in the expression tree. Each node has a variable number of children, depending on the arity of the primitive at this node.
-
children
¶ Get the children of this node.
-
name
¶ Get the name (label) of this node.
-
-
__init__
(root)[source]¶ Initialize a tree with the given root node.
Parameters: root – ExpressionTree.Node, the root node
-
classmethod
from_genotype
(genome)[source]¶ Create an expression tree by translating genome, which may be a K-expression, a gene, or a chromosome.
Parameters: genome – KExpression, Gene, or Chromosome, the genotype of an individual
Returns: ExpressionTree, an expression tree
-
root
¶ Get the root node of this expression tree.
-
class
geppy.core.entity.
Gene
(pset, head_length)[source]¶ Bases:
list
A single gene in GEP, which is a fixed-length linear sequence composed of functions and terminals.
-
__init__
(pset, head_length)[source]¶ Instantiate a gene.
Parameters:
- pset – a primitive set including functions and terminals for genome construction
- head_length – length of the head domain
Supposing the maximum arity of functions in pset is max_arity, the tail length is automatically determined to be tail_length = head_length * (max_arity - 1) + 1. The genome, i.e., the list of symbols in the instantiated gene, is formed randomly from pset.
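As a worked example of the tail-length rule above, the following sketch computes the domain lengths for a small gene. The function name tail_length_for and the chosen numbers are illustrative assumptions.

```python
def tail_length_for(head_length, max_arity):
    """Tail length from the rule tail_length = head_length * (max_arity - 1) + 1."""
    # A tree with head_length function nodes of arity max_arity has
    # head_length * max_arity + 1 nodes in total, so the tail must be able
    # to supply the remaining head_length * (max_arity - 1) + 1 terminals.
    return head_length * (max_arity - 1) + 1


head_length, max_arity = 7, 2
tail = tail_length_for(head_length, max_arity)
gene_length = head_length + tail
```

With a head of length 7 and binary functions only, the tail has length 8 and the whole gene has 15 symbols.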
-
__str__
()[source]¶ Return the expression as a human-readable string, which is also legal Python code that can be evaluated.
Returns: string form of the expression
-
classmethod
from_genome
(genome, head_length)[source]¶ Build a gene directly from the given genome.
Parameters:
- genome – iterable, a list of symbols representing functions and terminals
- head_length – length of the head domain
Returns: Gene, a gene
-
head
¶ Get the head domain of the gene.
-
head_length
¶ Get the length of the head domain.
-
kexpression
¶ Get the K-expression of type
KExpression
represented by this gene.
-
max_arity
¶ Get the max arity of the functions in the primitive set with which the gene is built.
-
tail
¶ Get the tail domain of the gene.
-
tail_length
¶ Get the length of the tail domain.
-
-
class
geppy.core.entity.
GeneDc
(pset, head_length, rnc_gen, rnc_array_length)[source]¶ Bases:
geppy.core.entity.Gene
Class representing a gene with an additional Dc domain to handle numerical constants in GEP.
The basic Gene has two domains, a head and a tail, while the GeneDc class introduces another domain called Dc after the tail. The length of the Dc domain is equal to the length of the tail domain. The Dc domain only stores indices into a separate array, rnc_array, which collects a group of candidate random numerical constants (RNCs). Thus, each GeneDc instance comes with its own rnc_array, which generally differs between instances.
In addition to the operators of the basic gene expression algorithm (mutation, inversion, transposition, and recombination), Dc-specific operators are provided in GEP-RNC to better evolve the Dc domain and to manipulate the attached set of constants rnc_array. Such operators are all suffixed with ‘_dc’ in the crossover and mutation modules, for example, the method invert_dc().
The rnc_array associated with each GeneDc instance can be provided explicitly when creating the instance; see from_genome(). More generally, a random number generator rnc_gen can be provided, with which a specified number of RNCs are generated when the gene instance is created.
A special terminal of type RNCTerminal is used internally to represent the RNCs.
Note
To create an effective GeneDc gene, the primitive set should contain at least one RNCTerminal terminal. See add_rnc_terminal() for details.
Refer to Chapter 5 of [FC2006] for more information about GEP-RNC.
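To make the role of the Dc domain concrete, here is a minimal sketch of how RNC placeholders might be resolved against rnc_array. It is not geppy code: the helper resolve_rncs is hypothetical, symbols are plain strings, and it assumes that the i-th placeholder, read left to right, consumes the i-th Dc entry.

```python
def resolve_rncs(symbols, dc, rnc_array, placeholder="?"):
    """Replace each RNC placeholder with the constant its Dc entry points to.

    Assumption: the i-th placeholder (left to right) uses dc[i] as an
    index into rnc_array.
    """
    resolved = []
    n_seen = 0  # number of placeholders consumed so far
    for symbol in symbols:
        if symbol == placeholder:
            resolved.append(rnc_array[dc[n_seen]])
            n_seen += 1
        else:
            resolved.append(symbol)
    return resolved


rnc_array = [0.5, 1.25, 3.0, 7.5]   # candidate constants attached to the gene
dc = [2, 0, 3, 1]                   # Dc domain: indices into rnc_array
symbols = ["add", "?", "mul", "x", "?"]
resolved = resolve_rncs(symbols, dc, rnc_array)
```

Here the first placeholder becomes rnc_array[2] and the second becomes rnc_array[0]; unused Dc entries are simply ignored.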
-
__init__
(pset, head_length, rnc_gen, rnc_array_length)[source]¶ Initialize a gene with a Dc domain.
Parameters:
- pset – a primitive set including functions and terminals for genome construction
- head_length – length of the head domain
- rnc_gen – callable, which should generate a random number when called as rnc_gen()
- rnc_array_length – int, number of random numerical constant candidates associated with this gene; 10 is usually enough
Supposing the maximum arity of functions in pset is max_arity, the tail length is automatically determined to be tail_length = head_length * (max_arity - 1) + 1. The genome, i.e., the list of symbols in the instantiated gene, is formed randomly from pset. The length of the Dc domain is equal to tail_length, i.e., the tail domain and the Dc domain share the same length.
-
__str__
()[source]¶ Return the expression as a human-readable string, which is also legal Python code that can be evaluated. Each special terminal representing an RNC is replaced by its true value retrieved from the array rnc_array.
Returns: string form of the expression
-
dc
¶ Get the Dc domain of this gene.
-
dc_length
¶ Get the length of the Dc domain, which is equal to
GeneDc.tail_length()
-
classmethod
from_genome
(genome, head_length, rnc_array)[source]¶ Build a gene directly from the given genome and the random numerical constant (RNC) array rnc_array.
Parameters:
- genome – iterable, a list of symbols representing functions and terminals (especially the special RNC terminal of type RNCTerminal). This genome should have three domains: head, tail and Dc. The Dc domain should be composed only of integers representing indices into the rnc_array.
- head_length – length of the head domain. The lengths of the tail and Dc domains follow from head_length: dc_length = tail_length = (len(genome) - head_length) / 2.
- rnc_array – the RNC array associated with the gene, which contains random constant candidates
Returns: GeneDc, a gene
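The length rule for from_genome() can be illustrated by splitting a flat genome into its three domains. This is a sketch under the stated rule only; split_gene_dc is a hypothetical helper and symbols are represented as plain strings and integers.

```python
def split_gene_dc(genome, head_length):
    """Split a flat genome into (head, tail, dc) using the documented rule."""
    remaining = len(genome) - head_length
    if remaining <= 0 or remaining % 2 != 0:
        raise ValueError("tail and Dc domains must have equal, positive length")
    tail_length = remaining // 2
    head = genome[:head_length]
    tail = genome[head_length:head_length + tail_length]
    dc = genome[head_length + tail_length:]
    return head, tail, dc


# Head of length 3 with binary functions gives a tail (and Dc) of length 4.
genome = ["add", "?", "mul", "x", "?", "x", "?", 2, 0, 3, 1]
head, tail, dc = split_gene_dc(genome, head_length=3)
```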
-
kexpression
¶ Get the K-expression of type KExpression represented by this gene. Each RNC terminal involved is replaced by a constant terminal whose value is retrieved from rnc_array according to the GEP-RNC algorithm.
-
max_arity
¶ Get the max arity of functions in the primitive set used to build this gene.
-
rnc_array
¶ Get the random numerical constant (RNC) array associated with this gene.
-
tail_length
¶ Get the tail domain length.
-
-
class
geppy.core.entity.
KExpression
(content)[source]¶ Bases:
list
Class representing the K-expression, or the open reading frame (ORF), of a gene in GEP. A K-expression is actually a linear form of an expression tree obtained by level-order traversal.
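Because a K-expression is the level-order form of an expression tree, the tree can be rebuilt by reading symbols in order and attaching children breadth-first. The sketch below shows this idea with plain strings and an arity table; the names kexpression_to_tree and to_string are illustrative and are not part of geppy.

```python
from collections import deque


def kexpression_to_tree(kexpr, arity):
    """Rebuild a nested (name, children) tree from a level-order K-expression.

    arity maps function names to their arity; any symbol missing from the
    mapping is treated as a terminal.
    """
    root = (kexpr[0], [])
    queue = deque([root])
    next_index = 1  # position of the next unread symbol
    while queue:
        name, children = queue.popleft()
        for _ in range(arity.get(name, 0)):
            child = (kexpr[next_index], [])
            next_index += 1
            children.append(child)
            queue.append(child)
    return root


def to_string(node):
    """Render a tree as a call expression such as 'add(a, b)'."""
    name, children = node
    if not children:
        return name
    return f"{name}({', '.join(to_string(c) for c in children)})"


arity = {"add": 2, "mul": 2}
tree = kexpression_to_tree(["add", "mul", "a", "b", "c"], arity)
expression = to_string(tree)
```

In this example the root add takes mul and a as children, and mul in turn takes b and c, so the K-expression add, mul, a, b, c corresponds to add(mul(b, c), a).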
geppy.core.symbol module¶
The symbol module defines the classes encapsulating the symbols in GEP. A gene in GEP is composed of multiple symbols, which include terminals and functions. Together, they are called primitives. This module also provides a PrimitiveSet class, which represents a primitive set containing Terminal and Function items, for management purposes.
The implementation of this module borrows heavily from the deap.gp module in DEAP, especially for primitive management and parsing during evaluation. However, unlike genetic programming (GP) in deap.gp, in GEP we usually only consider loosely typed evolution, i.e., the terminals and functions don’t require specific data types. As a result, the code can be greatly simplified.
Note
In GEP and GP terminology, there are two kinds of primitives: functions and terminals. In
deap.gp, a primitive actually refers to
a function. To avoid possible ambiguity, in geppy
they are explicitly named function and terminal
respectively. For instance, the PrimitiveSet.add_function()
method can be used to add a function primitive.
Reference:
[FC2006] Ferreira, Cândida. Gene expression programming: mathematical modeling by an artificial intelligence. Vol. 21. Springer, 2006.
-
class
geppy.core.symbol.
ConstantTerminal
(value)[source]¶ Bases:
geppy.core.symbol.Terminal
Class that represents a constant terminal, whose value never changes during the whole evolution. The value of a constant terminal can only be a literal, such as a number or a Boolean.
-
class
geppy.core.symbol.
EphemeralTerminal
(name, gen)[source]¶ Bases:
geppy.core.symbol.Terminal
Class that encapsulates an ephemeral numeric constant terminal in GEP, which can be used to build a normal constant terminal with a randomly produced value when needed using the update_value() method.
Just as the name implies, the value of an ephemeral terminal may be changed during evolution, either by mutation or by some local search heuristics, to optimize the numerical constants in a mathematical model.
Note
This special terminal, named ephemeral random constant, was originally introduced in genetic programming by Koza to handle numerical constants in evolutionary programming. Another way to evolve simple constants in GEP is to add an RNC domain to genes and use the GEP-RNC algorithm. See Chapter 5 of [FC2006].
See also
mutate_uniform_ephemeral() for ephemeral mutation, PrimitiveSet.add_ephemeral_terminal() to add an ephemeral terminal in evolution, and GeneDc for the GEP-RNC algorithm.
-
__init__
(name, gen)[source]¶ Initialize an ephemeral terminal with the given name and the random number generator gen. Its initial value is randomly generated by gen and can later be set explicitly with the value property. Alternatively, its value can be updated to a new value generated by gen with the update_value() method. Note that, just like a constant terminal, the value of an ephemeral terminal can only be a number or a bool.
Parameters:
- name – str, name of the ephemeral constant
- gen – callable, an ephemeral number generator, which should return a random value when called with no arguments, i.e., gen()
-
format
()[source]¶ Get a string representation of this terminal for subsequent evaluation, which is equivalent to repr(self.value).
-
generator
¶ Get the ephemeral random number generator.
-
update_value
()[source]¶ Update the value of this ephemeral constant in place; the new value is randomly produced by the random number generator available through the generator property.
-
value
¶ Get the value.
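The behaviour described for an ephemeral terminal (initial value drawn from gen, in-place refresh, explicit assignment, and repr-based formatting) can be mimicked with a small stand-in class. EphemeralSketch below is a hypothetical illustration and not the geppy class itself.

```python
import random


class EphemeralSketch:
    """Hypothetical stand-in that mimics the documented behaviour."""

    def __init__(self, name, gen):
        self.name = name
        self.generator = gen
        self.value = gen()  # the initial value is drawn from the generator

    def update_value(self):
        # Draw a fresh value in place from the same generator.
        self.value = self.generator()

    def format(self):
        # The string used during evaluation is simply repr(value).
        return repr(self.value)


rng = random.Random(0)
coef = EphemeralSketch("c", lambda: rng.uniform(-1.0, 1.0))
initial = coef.value
coef.update_value()   # replace the value with a new random draw
coef.value = 2.5      # or set it explicitly, e.g. after a local search
text = coef.format()
```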
-
-
class
geppy.core.symbol.
Function
(name, arity)[source]¶ Bases:
geppy.core.symbol.Primitive
Class that encapsulates a function in GEP. Note that this class only stores the function ID, i.e., the name attribute, instead of the callable function itself. The underlying callable is retrieved elsewhere when needed, for example, from PrimitiveSet. Thus, in the whole GEP program the name provided for each function must be unique.
-
__init__
(name, arity)[source]¶ Initialize a function.
Parameters:
- name – str, name of the function, which must be a valid non-keyword Python identifier
- arity – int, arity of the function
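The requirement that a name be a valid non-keyword Python identifier can be checked with the standard library. The helper below is a sketch of such a check; is_valid_primitive_name is a hypothetical name, and geppy's own validation may differ.

```python
import keyword


def is_valid_primitive_name(name):
    """Return True if name is a Python identifier that is not a keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


ok_plain = is_valid_primitive_name("add")
ok_digit_suffix = is_valid_primitive_name("max2")
bad_keyword = is_valid_primitive_name("lambda")
bad_leading_digit = is_valid_primitive_name("2x")
```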
-
format
(*args)[source]¶ Insert the arguments args into the function and get a Python expression, in string form, that calls the function with those arguments. The returned string can afterwards be evaluated using the builtin eval() function.
>>> f = Function('add', 2)
>>> f.format(2, 10)
'add(2, 10)'
Parameters: args – arguments, whose number should be equal to the arity of this function
Returns: str, a string form of the function call with arguments
-
-
class
geppy.core.symbol.
Primitive
(name, arity)[source]¶ Bases:
object
Class that encapsulates a primitive in GEP. A primitive may be a function or a terminal.
-
__init__
(name, arity)[source]¶ Initialize a primitive.
Parameters: - name – str, name of the primitive
- arity – int, arity of the primitive
-
arity
¶ Get the arity of this primitive. For a terminal, the arity is 0.
-
format
(*args)[source]¶ Get a string representation of the underlying concept with possible arguments. The derived classes should implement this method.
Parameters: args – variable number of arguments
Returns: str, a string
-
name
¶ Get the name of this primitive.
-
-
class
geppy.core.symbol.
PrimitiveSet
(name, input_names)[source]¶ Bases:
object
A class representing a primitive set, which contains the primitives (terminals and functions) that are used in GEP.
Note
Each function (Function) and symbol terminal (SymbolTerminal) must have a unique name. This is because internally their true value (or the callable for a function) will be stored in a dictionary globals inside PrimitiveSet and later be retrieved when compiling a genome. Consequently, the name must be a valid non-keyword Python identifier. To learn more about the underlying mechanism, refer to the Python builtin function compile().
Note
To use the GEP-RNC algorithm, i.e., to use GeneDc for numerical constant handling, please use the PrimitiveSet.add_rnc_terminal() method, which will add a special terminal of type RNCTerminal internally. Then, use a chromosome composed of GeneDc genes as the individual. Refer to Chapter 5 of [FC2006] for more details.
-
__init__
(name, input_names)[source]¶ Initialize a primitive set with the given name and the list of input names.
Parameters:
- name – name of the primitive set
- input_names – iterable, a list of names for the inputs in the GEP problem, for instance, ['x', 'y'] for two inputs. Internally, a symbol terminal is built for each input.
-
add_constant_terminal
(value)[source]¶ Add a terminal which is a constant.
There is no need to store this kind of terminal in globals for later evaluation, because the string representation obtained from format() can be used directly.
Parameters: value – value of the terminal. Only numeric and Boolean types are accepted.
-
add_ephemeral_terminal
(name, gen)[source]¶ Add an ephemeral constant of type EphemeralTerminal to the set. An ephemeral’s true value is generated by gen, which should be a zero-argument function that returns a random value, when the ephemeral is picked to build a gene. After that, the ephemeral’s value is fixed.
If you want to evolve its value actively, then the mutation operator mutate_uniform_ephemeral() should be used for ephemeral mutation. Of course, the value of an EphemeralTerminal instance can also be set directly in more advanced algorithms, e.g., via local search optimization.
Parameters:
- name – str, name of the ephemeral
- gen – callable, a random number generator which returns a random constant when called as gen()
Usually calling add_ephemeral_terminal() once is sufficient to generate the numerical coefficients in the model, because during evolution multiple independent copies of the ephemeral terminal will be constructed with different values. Nevertheless, if the numerical coefficients have various ranges, it may be more efficient to add multiple ephemeral terminals with corresponding generators.
Note
If the number of other kinds of terminals is very large, e.g., hundreds, then it is better to call this method multiple times, even with a single gen. The reason is that each terminal is chosen randomly and uniformly from the list of all terminals given by terminals(), so the probability of picking an ephemeral terminal becomes quite low if there are only a few ephemeral terminals but a large number of other terminals, which are typically input terminals.
Note
The special terminal named ephemeral random constant was originally introduced in genetic programming by Koza to handle numerical constants in evolutionary programming. Alternatively, to evolve the numerical constants in GEP, we can also use the GEP-RNC algorithm. See Chapter 5 of [FC2006].
See also
mutate_uniform_ephemeral() for ephemeral mutation, and GeneDc for the GEP-RNC algorithm.
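The note about uniform terminal selection can be quantified with a short calculation. The terminal counts below are made-up numbers for illustration, and ephemeral_pick_probability is a hypothetical helper.

```python
from fractions import Fraction


def ephemeral_pick_probability(n_ephemerals, n_other_terminals):
    """Chance that a uniformly chosen terminal is an ephemeral one."""
    return Fraction(n_ephemerals, n_ephemerals + n_other_terminals)


# One ephemeral terminal among 200 input terminals is rarely selected.
p_one = ephemeral_pick_probability(1, 200)
# Registering ten ephemeral terminals raises the chance roughly tenfold.
p_ten = ephemeral_pick_probability(10, 200)
```

With these counts the probability rises from 1/201 to 1/21.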
-
add_function
(func, arity, name=None)[source]¶ Add a function, which is internally encapsulated as a Function object.
Parameters:
- func – callable
- arity – number of arguments accepted by func
- name – name of func, default None. If left as None, the func.__name__ attribute is used instead.
-
add_rnc_terminal
(name='?')[source]¶ Add a special terminal representing a random numerical constant (RNC), as defined in the GEP-RNC algorithm. This terminal’s value is retrieved dynamically from an RNC array attached to a gene of type GeneDc according to the GEP-RNC algorithm. See also RNCTerminal and refer to Chapter 5 of [FC2006] about GEP-RNC.
Parameters: name – str, name of the terminal. For an RNC terminal, there is generally no need to specify a name, and the default value ‘?’ is recommended.
Usually it is sufficient to call this method once to add only one RNC terminal, since the values of the RNC terminals at different positions are different random numerical constants.
Note
If the number of other kinds of terminals is very large, e.g., hundreds, then it is better to call this method multiple times. Please check the documentation of add_ephemeral_terminal() for more details.
-
add_symbol_terminal
(name, value)[source]¶ Add a symbolic terminal, whose name points to the true value stored in globals.
For example, we may add a terminal for the constant pi with add_symbol_terminal('pi', 3.14).
Parameters:
- name – name of the terminal
- value – value of the terminal
-
ephemerals
¶ Get all the ephemeral terminals in this set.
-
functions
¶ Get all the functions.
-
globals
¶ Get a dictionary which can be used to set up the evaluation/execution environment. This dictionary can be fed into the builtin eval() and exec() functions for expression evaluation.
For example, if we call add_function(max, 2, 'max2'), then globals['max2'] corresponds to a Function object encapsulating the max() function.
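As a rough illustration of how such a name-to-object dictionary supports evaluation, the sketch below uses a plain dict in place of a primitive set's globals and evaluates an expression string with the builtin eval(). It assumes each name maps directly to a callable or a value; the names env and model are hypothetical.

```python
# Plain dictionary standing in for the evaluation environment
# (an assumption for illustration, not a real PrimitiveSet).
env = {"max2": max, "pi": 3.14}

# An expression string such as one produced by formatting a gene.
expression = "max2(x, pi)"

# Wrap the expression in a lambda over the input name and evaluate it
# with env as the global namespace.
model = eval(f"lambda x: {expression}", env)

large_input = model(5.0)
small_input = model(1.0)
```

The names max2 and pi are resolved from env when the resulting function is called, while x is supplied as the input argument.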
-
input_names
¶ Get a list of names for the input arguments.
-
max_arity
¶ Get the max arity of functions in this primitive set.
-
name
¶ Get the name of this primitive set.
-
terminals
¶ Get all the terminals.
-
-
class
geppy.core.symbol.
RNCTerminal
(name='?')[source]¶ Bases:
geppy.core.symbol.Terminal
A special terminal, which is just a placeholder representing a random numerical constant (RNC) in the GEP-RNC algorithm. This class is mainly used internally by the GeneDc class. The name of an RNCTerminal object is ‘?’ by default, and its value is retrieved dynamically according to the GEP-RNC algorithm. Refer to Chapter 5 of [FC2006] for more details.
In geppy and the GEP-RNC algorithm, the RNC terminal is just a placeholder and the default name ‘?’ is recommended.
-
class
geppy.core.symbol.
SymbolTerminal
(name)[source]¶ Bases:
geppy.core.symbol.Terminal
Class that represents a symbolic terminal. Only the name of a symbol terminal participates in the genome construction, while its value is retrieved from the PrimitiveSet only during evaluation. Therefore, the name of a symbol terminal must be a valid and unique Python identifier. The value of a symbol terminal can be any object that leads to a legal expression when evaluated.
-
class
geppy.core.symbol.
Terminal
(name, value)[source]¶ Bases:
geppy.core.symbol.Primitive
Class that encapsulates a terminal in GEP.
-
__init__
(name, value)[source]¶ Initialize a terminal.
Parameters: - name – str, name of the terminal
- value – value of the terminal
-
format
()[source]¶ Get a string representation of this terminal for subsequent evaluation, which may be a symbol or a numeric string.
Returns: str, a string representation
-
value
¶ Get the value of this terminal.
-