geppy.core package

Submodules

geppy.core.entity module

The module entity defines the core data structures used in GEP, including the gene, the chromosome, the K-expression and the expression tree. We use primitives (functions and terminals) to build genes and then compose a chromosome with one or multiple genes. Refer to the geppy.core.symbol module for information on primitives.

A chromosome composed of only one gene is called a monogenic chromosome, while one composed of more than one gene is called a multigenic chromosome. A multigenic chromosome can be assigned a linking function to combine the results from multiple genes into a single result.

class geppy.core.entity.Chromosome(gene_gen, n_genes, linker=None)[source]

Bases: list

Individual representation in gene expression programming (GEP), which may contain one or more genes. Note that in a multigenic chromosome, all genes must share the same head length and the same primitive set. Each element of Chromosome is of type Gene.

__init__(gene_gen, n_genes, linker=None)[source]

Initialize an individual in GEP.

Parameters:
  • gene_gen – callable, a gene generator, i.e., gene_gen() should return a gene
  • n_genes – number of genes
  • linker – callable, a linking function

Note

If a linker is specified, then it must accept n_genes arguments, each produced by one gene. If the linker parameter is the default value None, then for a monogenic chromosome, no linking is applied, while for a multigenic chromosome, the tuple function is used, i.e., a tuple of values of all genes is returned.
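
The linking rule in this note can be sketched in plain Python. This is only an illustration of the described behavior, not geppy's implementation; the helper name link_gene_values and the list of already-evaluated gene values are assumptions made for the example.

```python
import operator


def link_gene_values(values, linker=None):
    """Combine per-gene results as the note describes (illustrative only)."""
    if linker is not None:
        # An explicit linker receives one positional argument per gene.
        return linker(*values)
    if len(values) == 1:
        # Monogenic chromosome with the default linker: no linking is applied.
        return values[0]
    # Multigenic chromosome with the default linker: a tuple of all gene values.
    return tuple(values)


monogenic = link_gene_values([3.0])
default_multigenic = link_gene_values([3.0, 4.0])
summed = link_gene_values([3.0, 4.0], linker=operator.add)
print(monogenic, default_multigenic, summed)
```

With two genes, a two-argument callable such as operator.add satisfies the requirement that the linker accept n_genes arguments.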

__str__()[source]

Return the expressions as a human-readable string.

classmethod from_genes(genes, linker=None)[source]

Create a chromosome instance by providing a list of genes directly. The number of genes is implicitly specified by len(genes).

Parameters:
  • genes – iterable, a list of genes
  • linker – callable, a linking function. Refer to Chromosome.__init__() for more details.
Returns:

Chromosome, a chromosome

head_length

Get the length of the head domain. All genes in a chromosome should have the same head length.

kexpressions

Get a list of K-expressions for all the genes in this chromosome.

linker

Get the linking function.

max_arity

Get the max arity of the functions in the primitive set with which all genes are built. Note that all genes in a chromosome should share the same primitive set.

tail_length

Get the length of the tail domain. All genes in a chromosome should have the same tail length.

class geppy.core.entity.ExpressionTree(root)[source]

Bases: object

Class representing an expression tree (ET) in GEP, which may be obtained by translating a K-expression, a gene, or a chromosome, i.e., genotype-phenotype mapping.

class Node(name)[source]

Bases: object

Class representing a node in the expression tree. Each node has a variable number of children, depending on the arity of the primitive at this node.

__init__(name)[source]

Initialize a node with the given name (label).

children

Get the children of this node.

name

Get the name (label) of this node.

__init__(root)[source]

Initialize a tree with the given root node.

Parameters:root – ExpressionTree.Node, the root node
classmethod from_genotype(genome)[source]

Create an expression tree by translating genome, which may be a K-expression, a gene, or a chromosome.

Parameters:genome – KExpression, Gene, or Chromosome, the genotype of an individual
Returns:ExpressionTree, an expression tree
root

Get the root node of this expression tree.

class geppy.core.entity.Gene(pset, head_length)[source]

Bases: list

A single gene in GEP, which is a fixed-length linear sequence composed of functions and terminals.

__init__(pset, head_length)[source]

Instantiate a gene.

Parameters:
  • pset – a primitive set including functions and terminals for genome construction
  • head_length – length of the head domain

Supposing the maximum arity of functions in pset is max_arity, then the tail length is automatically determined to be tail_length = head_length * (max_arity - 1) + 1. The genome, i.e., list of symbols in the instantiated gene is formed randomly from pset.
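
The tail-length rule can be checked with a few lines of arithmetic. The reasoning is that a head filled entirely with functions of maximum arity n needs h * n child slots, of which h - 1 are taken by the non-root head symbols, leaving h * (n - 1) + 1 slots for terminals. The helper name tail_length below is chosen for this sketch only.

```python
def tail_length(head_length, max_arity):
    """Tail length guaranteeing that any head yields a complete expression."""
    return head_length * (max_arity - 1) + 1


for h, n in [(5, 2), (7, 2), (10, 3)]:
    t = tail_length(h, n)
    # A basic gene has a head and a tail, so its total length is h + t.
    print(f"head={h}, max_arity={n} -> tail={t}, gene length={h + t}")
```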

__str__()[source]

Return the expression as a human-readable string, which is also legal Python code that can be evaluated.

Returns:string form of the expression
classmethod from_genome(genome, head_length)[source]

Build a gene directly from the given genome.

Parameters:
  • genome – iterable, a list of symbols representing functions and terminals
  • head_length – length of the head domain
Returns:

Gene, a gene

head

Get the head domain of the gene.

head_length

Get the length of the head domain.

kexpression

Get the K-expression of type KExpression represented by this gene.

max_arity

Get the max arity of the functions in the primitive set with which the gene is built.

tail

Get the tail domain of the gene.

tail_length

Get the length of the tail domain.

class geppy.core.entity.GeneDc(pset, head_length, rnc_gen, rnc_array_length)[source]

Bases: geppy.core.entity.Gene

Class represents a gene with an additional Dc domain to handle numerical constants in GEP.

The basic Gene has two domains, a head and a tail, while this GeneDc class introduces another domain called Dc after the tail. The length of the Dc domain is equal to the length of the tail domain. The Dc domain only stores the indices of numbers present in a separate array rnc_array, which collects a group of candidate random numerical constants (RNCs). Thus, each GeneDc instance comes with its own rnc_array, which generally differs from instance to instance.

In addition to the operators of the basic gene expression algorithm (mutation, inversion, transposition, and recombination), Dc-specific operators in GEP-RNC are also provided to better evolve the Dc domain and to manipulate the attached set of constants rnc_array. Such operators are all suffixed with ‘_dc’ in the crossover and mutation modules, for example, invert_dc().

The rnc_array associated with each GeneDc instance can be provided explicitly when creating the instance; see from_genome(). More generally, a random number generator rnc_gen can be provided, with which a specified number of RNCs are generated when the gene is created.

A special terminal of type RNCTerminal is used internally to represent the RNCs.

Note

To create an effective GeneDc gene, the primitive set should contain at least one RNCTerminal terminal. See PrimitiveSet.add_rnc_terminal() for details.

Refer to Chapter 5 of [FC2006] for more knowledge about GEP-RNC.

__init__(pset, head_length, rnc_gen, rnc_array_length)[source]

Initialize a gene with a Dc domain.

Parameters:
  • head_length – length of the head domain
  • pset – a primitive set including functions and terminals for genome construction.
  • rnc_gen – callable, which should generate a random number when called by rnc_gen().
  • rnc_array_length – int, number of random numerical constant candidates associated with this gene, usually 10 is enough

Supposing the maximum arity of functions in pset is max_arity, then the tail length is automatically determined to be tail_length = head_length * (max_arity - 1) + 1. The genome, i.e., list of symbols in the instantiated gene is formed randomly from pset. The length of Dc domain is equal to tail_length, i.e., the tail domain and the Dc domain share the same length.

__str__()[source]

Return the expression as a human-readable string, which is also legal Python code that can be evaluated. The special terminals representing RNCs are replaced by their true values retrieved from the array rnc_array.

Returns:string form of the expression
dc

Get the Dc domain of this gene.

dc_length

Get the length of the Dc domain, which is equal to GeneDc.tail_length.

classmethod from_genome(genome, head_length, rnc_array)[source]

Build a gene directly from the given genome and the random numerical constant (RNC) array rnc_array.

Parameters:
  • genome – iterable, a list of symbols representing functions and terminals (including the special RNC terminal of type RNCTerminal). This genome should have three domains: head, tail and Dc. The Dc domain should be composed only of integers representing indices into the rnc_array.
  • head_length – length of the head domain. The length of the tail and Dc domain should follow the rule and can be determined from the head_length: dc_length = tail_length = (len(genome) - head_length) / 2.
  • rnc_array – the RNC array associated with the gene, which contains random constant candidates
Returns:

GeneDc, a gene
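
The length rule stated for head_length can be illustrated by splitting a flat genome into its three domains. This is a sketch under assumptions: plain strings and integers stand in for geppy's primitive objects, and the helper split_dc_genome is hypothetical rather than part of the library.

```python
def split_dc_genome(genome, head_length):
    """Split a flat genome into (head, tail, dc) using
    dc_length = tail_length = (len(genome) - head_length) / 2."""
    remaining = len(genome) - head_length
    if remaining <= 0 or remaining % 2 != 0:
        raise ValueError("tail and Dc must share the same positive length")
    tail_length = remaining // 2
    head = genome[:head_length]
    tail = genome[head_length:head_length + tail_length]
    dc = genome[head_length + tail_length:]
    return head, tail, dc


# head_length = 3 and max arity 2 give tail_length = 3 * (2 - 1) + 1 = 4,
# so the genome holds 3 + 4 + 4 = 11 elements.
genome = ['+', '*', '?', 'x', '?', '?', 'x', 2, 0, 1, 1]
head, tail, dc = split_dc_genome(genome, head_length=3)
print(head, tail, dc)
```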

kexpression

Get the K-expression of type KExpression represented by this gene. Each RNC terminal involved is replaced by a constant terminal whose value is retrieved from rnc_array according to the GEP-RNC algorithm.
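
A minimal sketch of this substitution follows. It assumes that the i-th RNC placeholder met in the K-expression takes the i-th element of the Dc domain as an index into rnc_array, which matches the usual description of GEP-RNC; geppy's actual implementation may differ in details. Strings stand in for primitives, and resolve_rncs is a hypothetical helper.

```python
def resolve_rncs(kexpr, dc, rnc_array, placeholder='?'):
    """Replace each placeholder by the constant selected through the Dc domain."""
    resolved = []
    n_seen = 0  # number of placeholders met so far
    for symbol in kexpr:
        if symbol == placeholder:
            # The Dc element is an index into the gene's own RNC array.
            resolved.append(rnc_array[dc[n_seen]])
            n_seen += 1
        else:
            resolved.append(symbol)
    return resolved


kexpr = ['add', 'mul', '?', 'x', '?']
dc = [2, 0, 1, 1]
rnc_array = [0.5, 1.25, 3.0]
resolved = resolve_rncs(kexpr, dc, rnc_array)
print(resolved)
```

Because only the indices live in the Dc domain, changing an entry of rnc_array changes the constant without touching the gene's symbols.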

max_arity

Get the max arity of functions in the primitive set used to build this gene.

rnc_array

Get the random numerical constant (RNC) array associated with this gene.

tail_length

Get the tail domain length.

class geppy.core.entity.KExpression(content)[source]

Bases: list

Class representing the K-expression, or the open reading frame (ORF), of a gene in GEP. A K-expression is actually a linear form of an expression tree obtained by level-order traversal.
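
To make the level-order relationship concrete, the sketch below rebuilds a tree from a linear sequence by filling children breadth-first. It is an illustration only: symbol names are plain strings, the arity table is supplied by hand, and build_tree and to_string are hypothetical helpers rather than geppy functions.

```python
from collections import deque


def build_tree(symbols, arity):
    """Rebuild a tree from a level-order sequence of symbol names."""
    stream = iter(symbols)
    root = {'name': next(stream), 'children': []}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        # Names missing from the arity table are treated as terminals.
        for _ in range(arity.get(node['name'], 0)):
            child = {'name': next(stream), 'children': []}
            node['children'].append(child)
            queue.append(child)
    return root


def to_string(node):
    """Render the tree as a nested call expression."""
    if not node['children']:
        return node['name']
    args = ', '.join(to_string(child) for child in node['children'])
    return f"{node['name']}({args})"


# A gene with head length 3 and tail length 4; only the first five symbols
# form the K-expression, and the trailing 'a', 'c' are never read.
gene = ['add', 'mul', 'c', 'a', 'b', 'a', 'c']
arity = {'add': 2, 'mul': 2}
text = to_string(build_tree(gene, arity))
print(text)
```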

__init__(content)[source]

Initialize a K-expression.

Parameters:content – iterable, each element is a primitive (function or terminal)
__str__()[source]

Get a string representation of this expression by joining the name of each primitive.

geppy.core.symbol module

The symbol module defines the classes encapsulating the symbols in GEP. A gene in GEP is composed of multiple symbols, which include terminals and functions. Together, they are called primitives. This module also provides a PrimitiveSet class, which represents a primitive set containing Terminal and Function items, for management purposes.

The implementation of this module borrows heavily from the deap.gp module in DEAP, especially for primitive management and parsing during evaluation. However, unlike genetic programming (GP) in deap.gp, in GEP we usually only consider loosely typed evolution, i.e., the terminals and functions don’t require specific data types. As a result, the code can be greatly simplified.

Note

In GEP and GP terminology, there are two kinds of primitives: functions and terminals. In deap.gp, a primitive actually refers to a function. To avoid possible ambiguity, in geppy they are explicitly named function and terminal respectively. For instance, the PrimitiveSet.add_function() method can be used to add a function primitive.

Reference:

[FC2006] Ferreira, Cândida. Gene expression programming: mathematical modeling by an artificial intelligence. Vol. 21. Springer, 2006.
class geppy.core.symbol.ConstantTerminal(value)[source]

Bases: geppy.core.symbol.Terminal

Class that represents a constant terminal, whose value never changes during the whole evolution. The value of a constant terminal can only be a literal value, such as a number or a Boolean.

__init__(value)[source]

Initialize a constant terminal with the given value.

Parameters:value – int, float, bool, value of the constant, can only be a number or a Boolean variable
class geppy.core.symbol.EphemeralTerminal(name, gen)[source]

Bases: geppy.core.symbol.Terminal

Class that encapsulates an ephemeral numeric constant terminal in GEP, which can be used to build a normal constant terminal with a randomly produced value when needed using the update_value() method.

Just as the name implies, the value of an ephemeral terminal may be changed during evolution, either by mutation or by some local search heuristics, to optimize the numerical constants in a mathematical model.

Note

This special terminal named ephemeral random constant was originally introduced in genetic programming by Koza to handle numerical constants in evolutionary programming. Another way to evolve simple constants in GEP is to add a RNC domain in genes and use the GEP-RNC algorithm. See Chapter 5 of [FC2006].

See also

mutate_uniform_ephemeral() for ephemeral mutation, PrimitiveSet.add_ephemeral_terminal() to add an ephemeral terminal for use in evolution, and GeneDc for the GEP-RNC algorithm.

__init__(name, gen)[source]

Initialize an ephemeral terminal with the given name and the random number generator gen. Its initial value is randomly generated by gen and can later be set explicitly with the value property. Alternatively, its value can be updated to a new value generated by gen with the update_value() method. Note that just like a constant terminal, the value of an ephemeral terminal can only be a number or a bool.

Parameters:
  • name – str, name of the ephemeral constant
  • gen – callable, an ephemeral number generator, which should return a random value when called with no arguments, i.e., gen().
format()[source]

Get a string representation of this terminal for subsequent evaluation, which is equivalent to repr(self.value).

generator

Get the ephemeral random number generator.

update_value()[source]

Update the value of this ephemeral constant in place. The new value is produced randomly by the random number generator available through the generator property.
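
The life cycle of an ephemeral value (drawn at creation, optionally set explicitly, and redrawn on request) can be sketched with a small stand-in class. EphemeralSketch is hypothetical and only mirrors the behavior described in this section; it is not geppy's EphemeralTerminal.

```python
import random


class EphemeralSketch:
    """Hypothetical stand-in for illustration; not geppy's EphemeralTerminal."""

    def __init__(self, name, gen):
        self.name = name
        self.generator = gen
        self.value = gen()  # the initial value is drawn from the generator

    def update_value(self):
        # Redraw the value in place using the stored generator.
        self.value = self.generator()

    def format(self):
        # String form used for later evaluation.
        return repr(self.value)


rng = random.Random(42)
coefficient = EphemeralSketch('c', lambda: rng.uniform(-1.0, 1.0))
initial = coefficient.value

coefficient.value = 0.25        # set explicitly, e.g. by a local search step
explicit_text = coefficient.format()

coefficient.update_value()      # replaced by a fresh draw from the generator
print(initial, explicit_text, coefficient.value)
```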

value

Get the value.

class geppy.core.symbol.Function(name, arity)[source]

Bases: geppy.core.symbol.Primitive

Class that encapsulates a function in GEP. Note that this class only stores the function ID, i.e., the name attribute, instead of the callable function itself. The underlying callable is retrieved elsewhere when needed, for example, from PrimitiveSet. Thus, throughout a GEP program the name provided for each function must be unique.

__init__(name, arity)[source]

Initialize a function.

Parameters:
  • name – str, name of the function, must be a valid non-keyword Python identifier
  • arity – int, arity of the function
format(*args)[source]

Insert the arguments args into the function and get, in string form, a Python expression that calls the function with those arguments. The returned string can afterwards be evaluated using the builtin eval() function.

>>> from geppy.core.symbol import Function
>>> f = Function('add', 2)
>>> f.format(2, 10)
'add(2, 10)'
Parameters:args – arguments, whose number should be equal to the arity of this function
Returns:str, a string form of function calling with arguments
class geppy.core.symbol.Primitive(name, arity)[source]

Bases: object

Class that encapsulates a primitive in GEP. A primitive may be a function or a terminal.

__init__(name, arity)[source]

Initialize a primitive.

Parameters:
  • name – str, name of the primitive
  • arity – int, arity of the primitive
__str__()[source]

Return str(self).

arity

Get the arity of this primitive. For a terminal, the arity is 0.

format(*args)[source]

Get a string representation of the underlying concept with possible arguments. The derived classes should implement this method.

Parameters:args – variable number of arguments
Returns:str, a string
name

Get the name of this primitive.

class geppy.core.symbol.PrimitiveSet(name, input_names)[source]

Bases: object

A class representing a primitive set, which contains the primitives (terminals and functions) that are used in GEP.

Note

Each function (Function) and each symbol terminal (SymbolTerminal) must have a unique name. This is because internally their true values (or the callable, for a function) are stored in a dictionary globals inside PrimitiveSet and retrieved later when compiling a genome. Consequently, each name must be a valid non-keyword Python identifier. To learn more about the underlying mechanism, refer to the Python builtin function compile().
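
The naming constraint can be checked with the standard library alone. The helper is_valid_primitive_name below is a hypothetical convenience for this sketch; whether geppy performs such a check itself is not stated here.

```python
import keyword


def is_valid_primitive_name(name):
    """True if name is a valid, non-keyword Python identifier."""
    return (
        isinstance(name, str)
        and name.isidentifier()
        and not keyword.iskeyword(name)
    )


for candidate in ['max2', 'x', 'lambda', '2x', 'my-func']:
    print(candidate, is_valid_primitive_name(candidate))
```

Names that fail this check could not appear as bare names in an expression string passed to eval() or compile().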

Note

To use the GEP-RNC algorithm, i.e., to use GeneDc for numerical constant handling, please use the PrimitiveSet.add_rnc_terminal() method, which will add a special terminal of type RNCTerminal internally. Then, use a chromosome composed of GeneDc genes as the individual. Refer to Chapter 5 of [FC2006] for more details.

__init__(name, input_names)[source]

Initialize a primitive set with the given name and the list of input names.

Parameters:
  • name – name of the primitive set
  • input_names – iterable, a list of names for the inputs in the GEP problem, for instance, ['x', 'y'] for two inputs. Internally, a symbol terminal is built for each input.
__str__()[source]

Get an overview of the functions and terminals in this primitive set.

add_constant_terminal(value)[source]

Add a terminal which is a constant.

There is no need to store such terminals in globals for later evaluation, because their string representation obtained from format() can be used directly.

Parameters:value – value of the terminal. Only numeric and Boolean types can be accepted.
add_ephemeral_terminal(name, gen)[source]

Add an ephemeral constant of type EphemeralTerminal to the set. When the ephemeral is picked to build a gene, its true value is generated by gen, which should be a zero-argument function that returns a random value. After that, the ephemeral’s value is fixed.

If you want to evolve its value actively, then the mutation operator mutate_uniform_ephemeral() should be used for ephemeral mutation. Of course, the value of an EphemeralTerminal instance can also be set directly in more advanced algorithms, e.g., via local search optimization.

Parameters:
  • name – str, name of the ephemeral
  • gen – callable, a random number generator which returns a random constant when called by gen()

Usually calling add_ephemeral_terminal() once is sufficient to generate the numerical coefficients in the model, because during evolution multiple independent copies of the ephemeral terminal will be constructed with different values. Nevertheless, if the numerical coefficients have various ranges, it may be more efficient to add multiple ephemeral terminals with corresponding generators.

Note

If the number of other kinds of terminals is very large, e.g., hundreds, then it is better to call this method multiple times, even with the same gen. The reason is that each terminal is chosen uniformly at random from the list of all terminals given by terminals, so the probability of picking an ephemeral terminal becomes quite low if there are only a few ephemeral terminals among a large number of other terminals, which are typically input terminals.
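
The effect described in this note is simple arithmetic under uniform selection: with k ephemeral terminals among n other terminals, each terminal draw yields an ephemeral with probability k / (n + k). The function name and the figure of 200 input terminals below are chosen for illustration only.

```python
def ephemeral_pick_probability(n_other_terminals, n_ephemerals):
    """Chance that a uniformly drawn terminal is an ephemeral one."""
    return n_ephemerals / (n_other_terminals + n_ephemerals)


for n_ephemerals in (1, 10, 50):
    p = ephemeral_pick_probability(200, n_ephemerals)
    print(f"{n_ephemerals:>2} ephemeral(s) among 200 inputs -> p = {p:.4f}")
```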

Note

The special terminal named ephemeral random constant was originally introduced in genetic programming by Koza to handle numerical constants in evolutionary programming. Alternatively, to evolve the numerical constants in GEP, we can also use the GEP-RNC algorithm. See Chapter 5 of [FC2006].

See also

mutate_uniform_ephemeral() for ephemeral mutation, and GeneDc for the GEP-RNC algorithm.

add_function(func, arity, name=None)[source]

Add a function, which is internally encapsulated as a Function object.

Parameters:
  • func – callable
  • arity – number of arguments accepted by func
  • name – name of func, default None. If left as None, the func.__name__ attribute is used instead.
add_rnc_terminal(name='?')[source]

Add a special terminal representing a random numerical constant (RNC), as defined in the GEP-RNC algorithm. This terminal’s value is retrieved dynamically from an RNC array attached to a gene of type GeneDc according to the GEP-RNC algorithm. See also RNCTerminal and refer to Chapter 5 of [FC2006] about GEP-RNC.

Parameters:name – str, name of the terminal. For a RNC terminal, generally there is no need to specify a name and the default value ‘?’ is recommended.

Usually it is sufficient to call this method once to add only one RNC terminal, since the values of the RNC terminals at different positions are different random numerical constants.

Note

If the number of other kinds of terminals is very large, e.g., hundreds, then it is better to call this method multiple times. Please check the documentation of add_ephemeral_terminal() for more details.

add_symbol_terminal(name, value)[source]

Add a symbolic terminal, whose name points to the true value stored in globals.

For example, we may add a terminal for the constant pi, with add_symbol_terminal('pi', 3.14).

Parameters:
  • name – name of the terminal
  • value – value of the terminal
ephemerals

Get all the ephemeral terminals in this set.

functions

Get all the functions.

globals

Get a dictionary which can be used to set up the evaluation/execution environment. This dictionary can be fed into the builtin eval() and exec() functions for expression evaluation.

For example, if we call add_function(max, 2, 'max2'), then globals['max2'] refers to the max() callable registered under the name 'max2'.
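
How such a dictionary supports evaluation can be sketched with the builtin eval() alone. The environment below is built by hand and is an assumption about the general shape of the mapping (names to callables or values); it is not produced by geppy, and wrapping the expression in a lambda over the input names is just one possible way to supply inputs.

```python
# Hypothetical environment: names map to the callables or values they stand for.
env = {
    'max2': max,                   # as if registered via add_function(max, 2, 'max2')
    'add': lambda a, b: a + b,     # a two-argument function
    'half': 0.5,                   # as if registered via add_symbol_terminal('half', 0.5)
}

expression = 'add(max2(x, 2), half)'

# Wrap the expression in a lambda over the input names, then evaluate it
# against a copy of the environment (eval() adds '__builtins__' to the dict).
func = eval('lambda x: ' + expression, dict(env))
result = func(5)
print(result)
```

Since names are resolved through the dictionary at call time, every function and symbol name appearing in the expression must be a key of that dictionary.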

input_names

Get a list of names for the input arguments.

max_arity

Get the max arity of functions in this primitive set.

name

Get the name of this primitive set.

terminals

Get all the terminals.

class geppy.core.symbol.RNCTerminal(name='?')[source]

Bases: geppy.core.symbol.Terminal

A special terminal, which is just a placeholder representing a random numerical constant (RNC) in the GEP-RNC algorithm. This class is mainly used internally by the GeneDc class. The name of a RNCTerminal object is ‘?’ by default, and its value is retrieved dynamically according to the GEP-RNC algorithm. Refer to Chapter 5 of [FC2006] for more details.

In geppy and the GEP-RNC algorithm, the RNC terminal is just a placeholder and the default name ‘?’ is recommended.

__init__(name='?')[source]

Initialize a RNC terminal.

Parameters:name – str, default ‘?’, name of the terminal
class geppy.core.symbol.SymbolTerminal(name)[source]

Bases: geppy.core.symbol.Terminal

Class that represents a symbolic terminal. Only the name of a symbol terminal participates in the genome construction, while its value is retrieved from the PrimitiveSet only during evaluation. Therefore, the name of a symbol terminal must be a valid and unique Python identifier. The value of a symbol terminal can be any object that leads to a legal expression when evaluated.

__init__(name)[source]

Initialize a symbol terminal with the symbol given by name.

Parameters:name – str, must be a valid non-keyword Python identifier
class geppy.core.symbol.Terminal(name, value)[source]

Bases: geppy.core.symbol.Primitive

Class that encapsulates a terminal in GEP.

__init__(name, value)[source]

Initialize a terminal.

Parameters:
  • name – str, name of the terminal
  • value – value of the terminal
__str__()[source]

Return str(self).

format()[source]

Get a string representation of this terminal for subsequent evaluation, which may be a symbol or a numeric string.

Returns:str, a string representation
value

Get the value of this terminal.

Module contents