
AbstractMulti Expression Programming (MEP) is a method for automatic generation of computer programs. Particularly, MEP can be used for generating mathematical expressions for data analysis (regression, classification and time series). MEP differentiates from other GP techniques by encoding multiple solutions in the same chromosome. MEP strengthsMEP has several strong advantages compared to other techniques:
Representation of solutionsMEP representation is not specific. Linear and treebased representations have been tested (see the list of papers). Here we exemplify with a linear representation.
ExampleConsider a representation where the numbers on the left positions stand for gene labels. Labels do not belong to the chromosome, as they are provided only for explanation purposes. For this example we use the set of functions: F = {+, *}, and the set of terminals: T = {a, b, c, d}. An example of chromosome using the sets F and T is given below:
The maximum number of symbols in MEP chromosome is given by the formula: Number_of_Symbols = (n + 1) * (Number_of_Genes  1) + 1, where n is the number of arguments of the function with the greatest number of arguments. The maximum number of effective symbols is achieved when each gene (excepting the first one) encodes a function symbol with the highest number of arguments. The minimum number of effective symbols is equal to the number of genes and it is achieved when all genes encode terminal symbols only. Decoding MEP chromosome and the fitness assignment processNow we are ready to describe how MEP individuals are translated into computer programs. This translation represents the phenotypic transcription of the MEP chromosomes. Phenotypic translation is obtained by parsing the chromosome topdown. A terminal symbol specifies a simple expression. A function symbol specifies a complex expression obtained by connecting the operands specified by the argument positions with the current function symbol. For instance, genes 1, 2, 4 and 5 in the previous example encode simple expressions formed by a single terminal symbol. These expressions are: E_{1} = a, E_{2} = b, E_{4} = c, E_{5} = d, Gene 3 indicates the operation + on the operands located at positions 1 and 2 of the chromosome. Therefore gene 3 encodes the expression: E_{3} = a + b. Gene 6 indicates the operation + on the operands located at positions 4 and 5. Therefore gene 6 encodes the expression: E_{6} = c + d. Gene 7 indicates the operation * on the operands located at position 3 and 5. Therefore gene 7 encodes the expression: E_{7} = (a + b) * d. E_{7} is the last expression encoded by in chromosome. There is neither practical nor theoretical evidence that one of these expressions is better than the others. This is why each MEP chromosome is allowed to encode a number of expressions equal to the chromosome length (number of genes). The chromosome described above encodes the following expressions: E_{1} = a, E_{2} = b, E_{3} = a + b, E_{4} = c, E_{5} = d, E_{6} = c + d, E_{7} = (a + b) * d. The value of these expressions may be computed by reading the chromosome top down. Partial results are computed by dynamic programming and are stored in a conventional manner. Due to its multi expression representation, each MEP chromosome may be viewed as a forest of trees rather than as a single tree, which is the case of Genetic Programming. As MEP chromosome encodes more than one problem solution, it is interesting to see how the fitness is assigned. The chromosome fitness is usually defined as the fitness of the best expression encoded by that chromosome. For instance, if we want to solve symbolic regression problems, the fitness of each subexpression E_{i} may be computed using the formula: fitness(E_{i}) = sum(obtained_{k,i}  target_{k,i}), k=1,... ,nwhere obtained_{k,i} is the result obtained by the expression E_{i} for the fitness case k and target_{k} is the targeted result for the fitness case k. In this case the fitness needs to be minimized. The fitness of an individual is set to be equal to the lowest fitness of the expressions encoded in the chromosome: fitness(C) = min fitness(E_{i}).When we have to deal with other problems, we compute the fitness of each subexpression encoded in the MEP chromosome. Thus, the fitness of the entire individual is supplied by the fitness of the best expression encoded in that chromosome. Why encoding multiple solutions within a chromosome?When you compute the value of an expression encoded as a GP tree you have the compute the value of all subtrees. This means that all GP subtrees can be viewed as a potential solution of the problem being solved. Most of the GP techniques considers only the one tree while ignoring all the other subtrees. However, the value/fitness for all subtrees is computed by GP. The biggest difference between MEP and other GP techniques is that MEP outputs the best subtree encoded in a chromosome. Note that the complexity (roughly speaking  the running time) is the same for MEP and other GP techniques encoding 1 solution/chromosome. The second reason for the this question is motivated by the No Free Lunch Theorems for Search. There is neither practical nor theoretical evidence that one of the solutions encoded in a chromosome is better than the others. More than that, Wolpert and McReady proved that we cannot use the search algorithm's behavior so far for a particular test function to predict its future behavior on that function. How to efficiently encode multiple solutions within a chromosome is sometimes difficult. There is no general prescription on how to encode multiple solutions in the same chromosome. In most cases this involves creativity and imagination. However, some general suggestions can be given. In order to obtain some benefits from encoding more than one solution in a chromosome we have to spend a similar effort (computer time, memory etc) as in the case of encoding a single solution in a chromosome. For instance, if we have 10 solutions encoded in chromosome and the time needed to extract and decode these solutions is 10 times larger than the time needed to extract and process one solution we got nothing. In this case we can not talk about an useful encoding. We have to be careful when we want to encode a multiple solutions in a variable length chromosome (for instance in a standard GP chromosome), because this kind of chromosome will tend to increase its size in order to encode more and more solutions. And this could lead to bloat. Usually encoding multiple solutions in a chromosome might require storing the partial results. Sometimes this can be achieved by using the Dynamic Programming technique. This kind of model is employed by the Multi Expression Programming. Genetic operatorsCrossoverOne cutting point or uniform crossover have been tested with similar results. The cut points are chosen between instructions not inside them. Onepoint recombination operator in MEP representation is similar to the corresponding binary representation operator. One crossover point is randomly chosen and the parent chromosomes exchange the sequences at the right side of the crossover point. MutationEach symbol (terminal, function of function pointer) in the chromosome may be the target of the mutation operator. Some symbols in the chromosome are changed by mutation. To preserve the consistency of the chromosome, its first gene must encode a terminal symbol. We may say that the crossover operator occurs between genes and the mutation operator occurs inside genes. If the current gene encodes a terminal symbol, it may be changed into another terminal symbol or into a function symbol. In the later case, the positions indicating the function arguments are randomly generated. If the current gene encodes a function, the gene may be mutated into a terminal symbol or into another function (function symbol and pointers towards ar guments). Main algorithmThe MEP algorithm starts by creating a random population of individuals. The following steps are repeated until a given number of generations is reached:
Read moreRead more about MEP and MEP for symbolic regression problems here. Read more about MEP for classification problems here. 