MEPX software

version 2021.12.8.0-beta

How to install Multi Expression Programming X

Windows 7, 8, 10 (64bit)

Just download the program from here:

mepx_win64.zip (2.37 MB)

Unzip the archive and run the mepx.exe application. There is no installer. Please remember where you saved it so that you can run it next time.

Apple macOS / OSX (64bit) (version 10.9 or newer)

Download the program from here:

mepx_macos.zip (4.37 MB)

It is a .zip archive. Double-click it in Finder. It should be decompressed in the same folder as the zip archive. Open it from there (right-click the icon and choose the Open command).

Ubuntu (64bit) (tested on Ubuntu 18)

Download the program from here:

mepx.deb (2.23 MB)

Install the program with Ubuntu Software Center.

If the icons are not shown on buttons, please open a Terminal and run the following command (this will display icons on buttons - which are disabled by default):

gsettings set org.gnome.settings-daemon.plugins.xsettings overrides "{'Gtk/ButtonImages': <1>, 'Gtk/MenuImages': <1>}"

Test data

Test projects (taken from PROBEN1 and other datasets) can be downloaded from: MEPX test projects on Github. Just download a .xml file and press the Load project button in MEPX to load it.

User manual

  1. Quick start
  2. Data
  3. Parameters
    1. Problem type
    2. Fitness function
    3. Other parameters related to data
    4. Mathematical operators
    5. Constants
    6. Algorithm
    7. Runs
  4. Results
    1. Error
    2. Model
    3. Source code
    4. Evolution
  5. The format of the project files
  6. Running from command line
  7. Integrating MEPX into other programs
  8. MEPX formulas as Excel functions
  9. Reporting problems

1. Quick start

  1. Select Data panel.
  2. Select Training data panel.
  3. Press Load training data button and choose a csv or txt file. Data must be separated by blank space, tab or ;.
  4. Select Parameters panel. Modify some parameters if needed. For instance, one could modify code length, number of subpopulations, the (sub)population size, number of generations etc. Also specify the problem type (regression or binary classification).
  5. Press Start button from the main toolbar.
  6. Read the results from Results panel.
  7. You can also save the entire project (data, parameters, results) by pressing the Save project button from the main toolbar.

2. Data

Data are loaded from csv or txt files. Data must be separated by blank space, tab or ;.

Last value (y column) on each line is the target (expected output). Test data can be without output (they may have one column less than training data).
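
As a rough sketch of the file layout described above, the following Python function splits one line of a data file into inputs and target. The function name parse_data_line is hypothetical and not part of MEPX. It assumes purely numeric data and treats a run of consecutive separators as a single one, which may differ from what MEPX does.

```python
import re


def parse_data_line(line):
    """Split a line on blank space, tab or ';' and return (inputs, target)."""
    fields = [f for f in re.split(r"[ \t;]+", line.strip()) if f]
    values = [float(f) for f in fields]
    # The last value on the line is the target (expected output).
    return values[:-1], values[-1]


inputs, target = parse_data_line("1.5;2\t3 4")
print(inputs, target)
```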

Currently the problems can have only 1 output (see below an exception for classification problems). Files containing multiple outputs must be split accordingly (for instance Building problem from PROBEN1 which has 3 outputs (energy, hot and cold water)).

For classification problems, the last column may contain only values 0 or 1 (for binary classification) or values 0, 1, ..., (num_classes - 1) for multi-class classification.

The output for classification problems can also be given in One-of-m format. For instance, if the problem has 5 classes, the output will have 5 values, one of them being set to 1 and all others being set to 0. This type of format is loaded from files with the dt extension.
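
To make the One-of-m format concrete, here is a minimal Python sketch that expands a class index into its One-of-m representation. The helper name to_one_of_m is hypothetical and is not part of MEPX.

```python
def to_one_of_m(label, num_classes):
    """Return num_classes values with a single 1 at position `label`."""
    if not 0 <= label < num_classes:
        raise ValueError("label must be in 0 .. num_classes - 1")
    return [1 if i == label else 0 for i in range(num_classes)]


# Class 2 of a problem with 5 classes.
print(to_one_of_m(2, 5))  # prints [0, 0, 1, 0, 0]
```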

Training data is compulsory. The others (validation and test) are optional.

You can also load alphanumeric values.

Alphanumeric values must be converted to numerical values before running the analysis. You have several specialised buttons for that:

  • To numeric - automatically converts alphanumeric values to integer values. The first distinct alphanumeric value is converted to 0, the second to 1, and so on.
  • Replace values - replaces some values (alphanumeric with numeric). Find and replace works with regular expressions too.
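
The To numeric conversion can be pictured with the following Python sketch. It assumes that codes are assigned in order of first appearance within a single column. The function name to_numeric is hypothetical, and the way MEPX handles several columns is not covered here.

```python
def to_numeric(values):
    """Map each distinct value to an integer, in order of first appearance."""
    codes = {}
    result = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes)  # next unused integer: 0, 1, 2, ...
        result.append(codes[value])
    return result


print(to_numeric(["red", "green", "red", "blue"]))  # prints [0, 1, 0, 2]
```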

The user can also scale numerical values to a given interval.
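
The manual does not state which scaling formula is used. The sketch below assumes a plain linear (min-max) scaling of one column to the interval [new_min, new_max]. The function name scale_to_interval and the treatment of constant columns are assumptions.

```python
def scale_to_interval(values, new_min, new_max):
    """Linearly map values so that min(values) -> new_min and max(values) -> new_max."""
    old_min, old_max = min(values), max(values)
    if old_max == old_min:
        # Constant column: no range to stretch, so map everything to new_min (assumption).
        return [float(new_min) for _ in values]
    factor = (new_max - new_min) / (old_max - old_min)
    return [new_min + (v - old_min) * factor for v in values]


print(scale_to_interval([0, 5, 10], 0, 1))
```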

A description of the problem can also be given and it will be saved as part of the project.

3. Parameters

3.1. Problem type

Can be:

  • Regression (sometimes called function fitting)
  • Binary classification (with 2 classes)
  • Multi-class classification (with 2 or more classes)

3.2. Error (Fitness) function

The algorithm will try to minimize the error.

Regression

For symbolic regression problems, the fitness is either Mean Absolute Error (sum of absolute errors divided by the number of examples) or Mean Squared Error (sum of squared errors divided by the number of examples).
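
The two error measures can be written in a few lines of Python. This is only an illustration of the definitions above, not code taken from MEPX. The function names are chosen for this example.

```python
def mean_absolute_error(targets, outputs):
    """Sum of absolute errors divided by the number of examples."""
    return sum(abs(t - o) for t, o in zip(targets, outputs)) / len(targets)


def mean_squared_error(targets, outputs):
    """Sum of squared errors divided by the number of examples."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)


targets = [1, 2, 3]
outputs = [1, 4, 0]
print(mean_absolute_error(targets, outputs))  # (0 + 2 + 3) / 3
print(mean_squared_error(targets, outputs))   # (0 + 4 + 9) / 3
```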

Classification

For classification problems the fitness is computed in multiple ways depending on the problem or strategy. However, what we report in the result tables is the percentage of incorrectly classified data (the number of incorrectly classified examples divided by the number of examples and multiplied by 100).

A classification problem with 2 classes can be solved by selecting either binary classification or multi-class classification.

Binary classification uses a threshold to distinguish between classes. Values less than or equal to the threshold are classified as belonging to class 0 and the others are classified as belonging to class 1.

In the case of binary classification, the threshold is computed automatically (because of that, binary classification can be slower sometimes).
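
The following Python sketch shows the thresholding rule and the reported error percentage. How MEPX actually finds the threshold is not described here. The brute-force search in best_threshold is only an assumption used for illustration, and all function names are hypothetical.

```python
def classify(values, threshold):
    """Values <= threshold belong to class 0, the others to class 1."""
    return [0 if v <= threshold else 1 for v in values]


def error_percent(predicted, targets):
    """Percentage of incorrectly classified examples."""
    wrong = sum(1 for p, t in zip(predicted, targets) if p != t)
    return 100.0 * wrong / len(targets)


def best_threshold(values, targets):
    """Try every distinct output value as a threshold and keep the best one (assumption)."""
    candidates = sorted(set(values))
    return min(candidates, key=lambda th: error_percent(classify(values, th), targets))


values = [0.1, 0.4, 0.6, 0.9]   # outputs of an expression, one per example
targets = [0, 0, 1, 1]
threshold = best_threshold(values, targets)
print(threshold, error_percent(classify(values, threshold), targets))
```

Scanning the candidates takes extra work for every evaluated expression, which is consistent with the remark that binary classification can be slower.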

For multi-class classification there are 3 strategies:

  • Winner takes all - fixed positions - the outputs are assigned to groups of genes, and the gene encoding the expression with the first maximal value provides the class for that data (see more details here: google groups post).
  • Winner takes all - fixed positions - smooth fitness
  • Winner takes all - best genes

3.3. Other parameters related to data

If Use validation set is checked then, at each generation, the best individual is run against the validation set. The best such individual (from those tested against the validation set) is the output of the program (and will be applied to the test data).

It is possible to run the optimization on a smaller subset of the training data. In that case, set Random subset size to a value smaller than the size of the training set. The subset is replaced by a new random one after the number of generations given by Num generations for which random subset is kept fixed.
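
The random subset mechanism could look like the Python sketch below. It assumes sampling without replacement and a fresh subset every generations_fixed generations. The function and parameter names are hypothetical and do not correspond to MEPX internals.

```python
import random


def training_subsets(num_rows, subset_size, num_generations, generations_fixed, seed=0):
    """Yield (generation, row indices used for training in that generation)."""
    if generations_fixed < 1:
        raise ValueError("generations_fixed must be at least 1")
    rng = random.Random(seed)
    subset = []
    for generation in range(num_generations):
        if generation % generations_fixed == 0:
            # Draw a new random subset of the training rows (without replacement).
            subset = sorted(rng.sample(range(num_rows), subset_size))
        yield generation, subset


for generation, rows in training_subsets(100, 5, 4, 2, seed=1):
    print(generation, rows)
```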

3.4. Functions (or mathematical operators)

Here is the list of operators that can appear in the result.

These are classic arithmetic operators +, -, *, ... nothing new here.

Do not confuse them with genetic operators used by the MEP algorithm.

Note that trigonometric operators work with radians.

3.5. Constants

In order to enable constants, one must define a probability greater than 0 for constants. You cannot edit that probability directly, but constants_probability + operators_probability + variables_probability = 1. So if you define the probabilities for operators and variables such that their sum is less than 1, you will get a value greater than 0 for constants.
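
The relation between the three probabilities can be checked with a small Python sketch. The function name is hypothetical. It only restates the formula constants_probability = 1 - (operators_probability + variables_probability).

```python
def constants_probability(operators_probability, variables_probability):
    """Return the probability left for constants, given the other two."""
    for p in (operators_probability, variables_probability):
        if not 0 <= p <= 1:
            raise ValueError("each probability must be between 0 and 1")
    total = operators_probability + variables_probability
    if total > 1:
        raise ValueError("operators + variables probability must not exceed 1")
    return 1 - total


# Operators 0.5 and variables 0.4 leave about 0.1 for constants.
print(constants_probability(0.5, 0.4))
# Operators 0.5 and variables 0.5 leave nothing: constants are disabled.
print(constants_probability(0.5, 0.5))
```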

Constants can be user defined or generated by the program (over a given interval). Generated constants can be kept fixed throughout the evolution or they can also evolve. Mutation of constants is done by adding a random value from the interval [-max_delta, +max_delta].
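
A minimal Python sketch of the constant mutation follows. It assumes that the random value is drawn uniformly from the interval, which the manual does not state. The function name mutate_constant is hypothetical.

```python
import random


def mutate_constant(value, max_delta, rng=random):
    """Add a random value from [-max_delta, +max_delta] to a constant."""
    return value + rng.uniform(-max_delta, max_delta)


rng = random.Random(42)  # fixed seed so the example is repeatable
print(mutate_constant(3.0, 0.5, rng))
```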

3.6. The algorithm

The data analysis cannot be done instantaneously due to the complexity of the problem. This is why a special algorithm inspired by biology is used. The algorithm works with multiple potential solutions that are modified (hopefully improved) over a number of generations.

MEPX uses a steady-state model with multiple subpopulations. Steady-state means that inside one subpopulation, the worst individuals are replaced with newer ones (if the newer ones are better).
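
The steady-state replacement rule can be illustrated with the simplified Python sketch below. Selection, crossover and mutation are left out, and individuals are represented only through an error function to be minimized. The names are hypothetical and the code is not taken from MEPX.

```python
def steady_state_insert(population, offspring, error):
    """Replace the worst individual with the offspring if the offspring is better.

    population: list of individuals (modified in place)
    error: function returning the error of an individual (lower is better)
    Returns True if a replacement took place.
    """
    worst_index = max(range(len(population)), key=lambda i: error(population[i]))
    if error(offspring) < error(population[worst_index]):
        population[worst_index] = offspring
        return True
    return False


# Here each individual is simply its own error value.
population = [3.0, 7.0, 5.0]
print(steady_state_insert(population, 4.0, error=lambda x: x), population)
print(steady_state_insert(population, 9.0, error=lambda x: x), population)
```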

The parameters for the algorithm are set in the Parameters panel.

The user may specify the number of subpopulations. Each subpopulation runs independently of the others and, after each generation, they exchange a few individuals.

Genetic operators (crossover and mutation) are classic ... nothing new here.

It is possible to specify how often the variables, operators and constants should appear in a chromosome. This is done probabilistically. If you want more operators to appear, please increase the operators probability. More operators mean more complex expressions.

Sum of functions (operators) probability, variables probability and constants probability must be 1. The constants' probability is computed automatically as 1 - the sum of the other 2 probabilities.

3.7. Runs

Random seed - It is also possible to specify the initial seed of the first run (consecutive runs will start from the previous seed + 1).

Num runs - Usually multiple runs must be performed for computing some statistics.

Num threads - Specify the number of processor cores used for running the algorithm. Each subpopulation is run on a thread. This can increase the speed of analysis significantly. If you have a quad-core processor with hyper-threading, you may set the number of threads to 8. For best results make sure that the number of subpopulations is a multiple of the number of threads.
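
The advice about multiples can be seen in the Python sketch below. It assumes that subpopulations are spread as evenly as possible over the threads. The actual scheduling inside MEPX is not described here, so treat this only as an illustration. The function name is hypothetical.

```python
def subpopulations_per_thread(num_subpopulations, num_threads):
    """Return how many subpopulations each thread would process per generation."""
    counts = [num_subpopulations // num_threads] * num_threads
    # Spread the remainder over the first threads.
    for i in range(num_subpopulations % num_threads):
        counts[i] += 1
    return counts


print(subpopulations_per_thread(8, 4))   # prints [2, 2, 2, 2]
print(subpopulations_per_thread(10, 4))  # prints [3, 3, 2, 2]
```

In the second case two threads have less work than the others and sit idle for part of each generation.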

4. Results

The following results are displayed:

4.1. Error

Error for the entire training, validation and test set. In the case of classification problems we display the percentage of incorrectly classified data.

4.2. Model

The value obtained for each data row in the training, validation and test sets (also called Model or Output).

4.3. Source code

C or Excel VBA source code of the best solution. This code can be simplified in order to show only the instructions that generate the output (remember that not all genes of a chromosome participate in the solution - these genes are called introns). Note that there is no simplification in the case of multi-class classification.

Excel VBA code can be used to create a custom formula in Excel.

One can analyze new data directly from Excel with the formula discovered by MEPX.

A movie with this feature is here on YouTube.

4.4. Evolution

Evolution of fitness (this can be different from the number of incorrectly classified data) for the best individual in the population and the population average.

5. The format of the project files

Projects can be saved to and loaded from .xml files, which is the default format of MEPX.

The xml file has a tree structure. The root node is named project.

Inside project we have other nodes like mepx_version, problem_description, algorithm.

Inside the algorithm node we have training, validation, test, parameters, operators, results, etc.

Main structure of the .xml files is the following:

<?xml version="1.0"?>
<project>
    <mepx_version> </mepx_version>
    <problem_description> </problem_description>
    <algorithm>
        <training> </training>
        <validation> </validation>
        <test> </test>
        <variables_utilization> </variables_utilization>
        <parameters> </parameters>
        <constants> </constants>
        <operators> </operators>
        <results> </results>
    </algorithm>
</project>
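
A project file with this structure can be inspected with Python's standard xml.etree.ElementTree module. The sketch below only lists node names. The content of each node is not described in this manual, so nothing is assumed about it. The function name and the shortened sample project are made up for this example.

```python
import xml.etree.ElementTree as ET


def list_project_sections(xml_text):
    """Return (nodes inside <project>, nodes inside <algorithm>)."""
    root = ET.fromstring(xml_text)
    if root.tag != "project":
        raise ValueError("not a MEPX project: the root node is not <project>")
    top_level = [child.tag for child in root]
    algorithm = root.find("algorithm")
    inside_algorithm = [child.tag for child in algorithm] if algorithm is not None else []
    return top_level, inside_algorithm


# Shortened sample following the structure shown above.
SAMPLE = """<project>
    <mepx_version> </mepx_version>
    <problem_description> </problem_description>
    <algorithm>
        <training> </training>
        <parameters> </parameters>
        <results> </results>
    </algorithm>
</project>"""

print(list_project_sections(SAMPLE))
```

For a real project saved on disk, ET.parse(file_name).getroot() can be used instead of ET.fromstring.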


6. Running from command line

MEPX can be run in dual mode:

  • If you run it with no parameters the program will create the standard user interface.
  • If you supply 2 or 3 parameters (see below), the program will create no interface; it will just read the data from a file and write the results to another file.

The command line has this form:

mepx.exe input.xml output.xml

where input.xml is a valid MEPX project and output.xml is the file in which the results are written.

If the files are not in the same directory, please make sure that you give the full path for them!

If you want to be able to stop the search before it ends naturally, start the program with a third parameter:

mepx.exe input.xml output.xml stop_file_name

Note: Last parameter is optional and is used only if you want to stop the search process earlier.
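
The command line above can also be assembled from a script. The Python sketch below uses the standard subprocess module. The helper names are hypothetical, and the file names are placeholders. Paths are converted to full paths, following the advice above. Nothing is assumed about the exit code of MEPX.

```python
import os
import subprocess


def build_mepx_command(executable, input_xml, output_xml, stop_file=None):
    """Return the argument list: executable input output [stop_file]."""
    command = [executable, os.path.abspath(input_xml), os.path.abspath(output_xml)]
    if stop_file is not None:
        # The optional third parameter: the search stops when this file appears.
        command.append(os.path.abspath(stop_file))
    return command


def run_mepx(executable, input_xml, output_xml, stop_file=None):
    """Run MEPX without user interface and wait until the process ends."""
    return subprocess.run(build_mepx_command(executable, input_xml, output_xml, stop_file))


print(build_mepx_command("mepx.exe", "input.xml", "output.xml", "stop.txt"))
```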

7. Embedding the MEPX in your application

If you want to embed the program in your Windows application you can create a new process using the CreateProcess function from Windows API.

When the process is over, you should read the results from the output.xml file. You can check whether the process is almost over by checking for the existence of output.xml. The existence of the file does not guarantee that the writing of data is over. You can check whether the file writing is over by looking for the </project> tag in the xml file. Or you can simply check whether the file can be renamed.

The search process can take a few milliseconds, minutes or hours, depending on the data to be analyzed. If you want to stop the search process before it naturally ends, you should create the file (stop_file_name) which you specified as the last parameter of the command line. The program will check for the existence of that file and will stop the search when that file is found. Make sure that you delete this file if you previously created it during another search!
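
The checks described in this section are not tied to a particular language. As an illustration, here is a Python sketch that waits for the closing </project> tag and handles the stop file. All function names are hypothetical, and reading the whole file on each poll is a simplification that may be too slow for very large projects.

```python
import os
import time


def output_is_complete(output_path):
    """Return True once the output file exists and contains the </project> tag."""
    try:
        with open(output_path, "r", encoding="utf-8", errors="replace") as handle:
            return "</project>" in handle.read()
    except OSError:
        # The file does not exist yet or cannot be opened at the moment.
        return False


def wait_for_output(output_path, poll_seconds=0.5, timeout_seconds=None):
    """Poll until the output is complete. Return False if the timeout expires first."""
    start = time.monotonic()
    while not output_is_complete(output_path):
        if timeout_seconds is not None and time.monotonic() - start > timeout_seconds:
            return False
        time.sleep(poll_seconds)
    return True


def request_stop(stop_file_name):
    """Create the stop file so that the running search ends early."""
    with open(stop_file_name, "w"):
        pass


def clear_stop_request(stop_file_name):
    """Delete a stop file left over from a previous search."""
    if os.path.exists(stop_file_name):
        os.remove(stop_file_name)
```

A typical sequence would be: call clear_stop_request, start the process, call request_stop if the user cancels, then call wait_for_output before reading the results.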

8. Use MEPX formulas as Excel functions

  1. First, run MEPX on some data to obtain the formula.
  2. Go to the Results tab and show source code in Excel VBA (Visual Basic for Applications).
  3. Start Excel and load some data. They must have the same number of columns as the data used by MEPX to discover the formula. The target column may or may not exist.
  4. Now we will create a custom Excel formula with the code generated by MEPX.
  5. Press Alt+F11 to open the Excel VBA editor for Windows. On Mac, you should press Fn+Alt+F11.
  6. Copy/Paste the formula generated by MEPX into the VBA editor.
  7. Save the Excel file as xlsm (Excel Macro-Enabled Workbook). This format is required because the file now contains VBA code in addition to data.
  8. Now we have a new Excel formula named mepx that we will use for our data. The formula accepts a Range of cells as the parameter (e.g. A1:D1). The formula is applied to 1 row at a time.
  9. Go back to the Excel data file (press Alt+F11 to switch from data to code and vice-versa).
  10. Use the formula to compute the output.

A movie with this feature is here on YouTube.

You need version 2021.10.29 or newer to export in VBA for Excel.

9. Reporting problems, bugs, comments

If you have problems with this program please save the project (by pressing the Save Project button from the main toolbar) and send it to mihai.oltean@gmail.com