OptML A modelling language for optimization problems |
Overview
OptML is a modelling language for expressing linear program (LP) and mixed-integer linear program (MILP) models. It stands for Optimization Modelling Language. Unlike some other modelling languages, OptML does not give instructions to, or control, a solver; it is purely for expressing the form of the model. OptML is the modelling language of the Minimax iPhone app.
OptML was designed to be extremely easy to express simple models, with a syntax very close to how one would naturally write an optimization model on paper. But it is also very flexible and able to concisely express large complicated models, allowing close interaction with external datasets. The syntax and interaction with datasets was designed for the constraints of mobile devices.
OptML has two distinct syntaxes: basic syntax and macro syntax. Basic syntax is how the model variables, constraints and objective function are expressed. Macro syntax offers powerful flexibility to generate text with loops, math expressions, and substitutions from datasets.
An OptML model is parsed in two distinct phases. First, the macro compiler passes through the model processing any macro syntax lexicographically (as it comes to it). The output of the macro compiler is simply text, and it should be basic syntax. If there is no macro syntax in the model, then the compiler phase has no effect on the model. The second phase is where the basic syntax model is interpreted, and internalised in the solver, ready to solve.
The sections that follow give a complete introduction to the syntax and use of OptML:
Basic syntax
The core of OptML is the basic syntax for expressing linear models. The basic syntax consists of an objective function and a number of bounds and constraints.
max 143x + 60 * y : 120 x + 210y <= 15000 : 110x + 30y <= 4000 : x + y <= 75
The model must start with the objective sense, max
or min
, indicating whether the problem is a maximization problem or a minimization problem. Following straight after must come a linear expression which is the objective function.
Then follows one or more constraints. There are several forms that a constraint may take, but every constraint begins with a colon ":
".
Unlike in some other modelling languages, it is not necessary to declare the variable names separately, these are automatically determined from the model.
Variable names may contain alphanumeric characters and the underscore character (0‑9
, a‑z
, A‑Z
, _
), and may not start with a digit (0-9
). Variable names are case insensitive, the first occurence of the variable is used in the results.
Regular constraints
Regular constraints consist of one or more linear terms on both side of an equals sign or inequality (=
, >=
, or <=
). A term consists of either a variable name, a constant, or a coefficient times a variable. Parentheses may be used as long as they don’t evaluate to non-linear terms (variables multiplying each other). When the model is parsed terms are grouped together. The following are all valid constraints:
: 10x - 5 * y <= + 2000 + 10y : 200 >= x : 5 x = 2y : 0.5 * (230*x + y + 10) - -5 x >= - x /2
Note that the multiplication sign “*
” is optional if it makes sense in regular math notation. Note also that whitespace is not significant, so the following is valid:
: 10x-5*y <= 2000 + 10y : x >= 20 : y <= -3 : x <= 100
Bounds
Bounds are a special case of regular constraints, with a single variable:
: x <= 100
Any constraint that can be simplified to this form still counts as a bound. For example the following constraint simplifies to the previous bound:
: 3 x - 10 <= (180 + 4 x) / 2
Multi-part constraints and bounds
For convenience, it is possible to specify multi-part constraints, with more than one inequality sign, e.g:
: 10 <= x + y <= 20
Multi-part constraints can have as many parts as desired, and these are translated to the equivalent two-part constraints with a single inequality. For example, the following multi-part constraint
: 12 <= x + 2y <= 3x - z <= 25
is translated by the parser to the following two-part constraints
: 12 <= x + 2y : x + 2y <= 3x - z : 3x - z <= 25
Variable types
A variable may optionally be defined with one or more types, identified by a keyword. Each keyword has multiple aliases that may be used interchangably (and they are case insensitive).
- Integer - the variable may only take integer (whole number) values, such as
-2
,0
,1
,5
, etc.
Keywords:INTEGER
orINT
. - Binary - the variable may only take the values
0
or1
.
Keywords:BINARY
orBIN
. - Free - the variable does not have a default upper limit or lower limit (allows negative values).
Keywords:FREE
orUNBOUNDED
. - Non-negative - the variable must be greater-than or equal-to zero (negative values are not allowed).
Keywords:NONNEGATIVE
orNONNEG
.
Variables are non-negative by default, unless specifically constrained to be free
.
The definition of any variables as integer or binary automatically makes the problem a Mixed Integer Linear Program (MILP), it is not necessary to specify the problem type.
The variables are given a type by specifying a type keyword in a constraint with no inequality:
: x integer : x FREE : y bin
List syntax
Bounds and variable type definitions can use a special feature called list syntax. This is a way of specifying the same bound or type for multiple variables at once to save space. List syntax can only be used on a straight list of variable names, not an expression that simplifies to a variable name.
When variables are separated by commas “,
”, the bound or type definition is applied to all the variables in the list, so
: x, y, z <= 10
is equivalent to
: x <= 10 : y <= 10 : z <= 10
In addition, each of the items between the commas may be a set of variables. Each set is defined using either a range or a wildcard.
Variable range list syntax
A variable range applies to variables that conceptually have subscripts, i.e., end in a number. Suppose we have five variables: x1
, x2
, x3
, x4
, and x5
. We can specify a subset of these by using the “~
” symbol, for example:
x1, x3~x5 INT
is equivalent to
x1, x3, x4, x5 INT
The prefix part of the variable name must be the same for both sides of the range, so “x2~y5
” is invalid. The exception to this is where we exclude the prefix for the second part of the range altogether. So the following range
x1~x100
may be equivalently expressed
x1~100
This is purely for convenience when specifying ranges with large variable names.
Variable wildcard list syntax
There are two types of variable wildcard that can be used in list syntax. Number wildcards (or “single-dollar” wildcards) have part of a variable name followed by a “$
” symbol, which substitutes for all variables that match where the “$
” symbol is a number.
Suppose we have the following variables defined elsewhere in our model:
var, var1, var3, var450, variable_Limit, x21
Then we can use “var$
” to substitute for
var1, var3, var450
Super wildcards (or “double-dollar” wildcards) are used to match any characters that come after, so we can use “var$$
” to substitute for
var, var1, var3, var450, variable_Limit
Note the difference that “var$$
” matches “var
”, but “var$
” does not. The $$
wildcard can match nothing, but the $
wildcard must match a number. A consequence of this is that “$$
” by itself will match all defined variables. So if we are trying to model a pure 0-1 integer program, with all variables binary, we simply need to define the following constraint:
: $$ binary
Using list syntax is completely optional; it is simply a convenience feature to save space and make models more concise. All constraints may be equivalently written out in full. List syntax may not be used in regular constraints that contain linear expressions rather than just variable names.
It is important to remember that, when using a range or wildcard variable definition, only the variables that are otherwise defined in a regular constraint or the objective function will be included. So if we only have variables x2
and x3
in our model and we have a bound with “x1~x10000
”, this only translates to actual bounds for x2
and x3
. This means that the range and wildcard abbreviations can be used freely without fear of unnecessarily defining new variables.
Comments
Comments are parts of the code that are ignored by the parser. There are two types of comments supported in OptML. The first type is a single-line comment, and is denoted by the hash symbol “#
”. Everything from the hash symbol to the end of that line is part of the comment:
# This is a comment max x + y : x + y <= 17 # This is another comment
The second type can be used to comment out part of a line, or multiple lines. It is denoted by a starting “/*
” and an ending “*/
”, and everything between is commented.
/* This is a multi-line comment that can be used to document the model or disable certain sections. The one below comments out var2. */ max var1 + /* var2 + */ var3
Macro syntax introduction
Macro syntax is a way of writing large basic syntax models using shorthand. It has constructs like loops and conditionals, and allows you to read in values from datasets (key-value lists). All macro syntax is processed and converted to equivalent basic syntax as a compiler step, and you can view this intermediate output in the parser view to make sure you have it correct.
Macro syntax consists of symbols and datasets and macro expressions. These are explained in detail below.
Symbols and datasets
Two of the key features of OptML macro scripting syntax are symbols and datasets.
Symbols are named parameters that are assigned a value, for example at the start of a model, and then later evaluated to substitute that value in the model. Symbols can either be defined within the body of the model itself, or in a dataset that is imported within the model.
A dataset is a set of key-value pairs that exists independently of a particular model. This allows multiple models to use the same parameters, and for data to be updated independently of the structure of an OptML model. Combined with macro scripting syntax, datasets allow quite large and complicated models to be expressed very concisely.
Symbols are registered in a symbol table, which can be thought of as a map that links a symbol name (case insensitive) to a value. If the same symbol is registered again, then the new value overwrites the previous value. Symbols are treated as text, even if they look like a number. Leading and trailing whitespace is stripped from symbol values, but they may contain whitespace. All internal whitespace is compressed to a single space. For example, two spaces is compressed to one space, and a new line followed by four spaces is also compressed to a single space within a symbol value.
Symbols introductory example
A symbol can be thought of as a variable where you can store some text (which could be a number), and refer to it later in the model. For example, consider the basic syntax model introduced above.
max 143x + 60 * y : 120 x + 210y <= 15000 : 110x + 30y <= 4000 : x + y <= 75
We might decide to keep the RHS values at the top of the model, so that they can be easily modified as parameters:
# Define our parameters [@ limit1=15000, limit2=4000] # The model max 143x + 60 * y : 120 x + 210y <= &limit1 : 110x + 30y <= &limit2 : x + y <= 75
You can see several features here. At the top our two parameters are defined. Don’t worry too much about the [@...]
characters, these are explained in more detail below. Where the RHS values were we now see the symbols being evaluated, with their name prefixed by &
. Remember that when this model is parsed by the macro compiler it turns into the same basic syntax as defined above.
Now we could run this model multiple times, simply changing the parameters at the top to get different results. It can act as a template. We could further separate the form of the model from the actual parameter values by putting our two symbols in a dataset. A dataset is a set of key-value pairs that define symbols and the values they should take. Say we have defined the following dataset, called “Parameters”:
Key | Value |
---|---|
limit1 | 15000 |
limit2 | 4000 |
We could then express our model as:
# Define our parameters [@/Parameters] # The model max 143x + 60 * y : 120 x + 210y <= &limit1 : 110x + 30y <= &limit2 : x + y <= 75
The [@/...]
syntax instructs the compiler to read the values of the symbols in the dataset, just as if they had been defined in the model itself. Now our model template can stay unchanged, and we could just modify the dataset.
Symbol scope
Symbols have a scope in which they are valid. Every macro expression (explained below) creates a new scope and a new symbol table, and symbols can either be defined in this local scope, or in global scope. Scopes can be nested, and when a symbol is evaluated, the local scope is checked first, and then the next level up, and so on until the global symbol table.
Symbol and dataset definition
A symbol definition expression is used to register symbols. A single symbol definition expression can be used to register multiple symbols, and the subexpressions are separated by commas. An example symbol definition expression is given below:
[@sym1=10, sym2=1.32, /dataset number 1, sym1=xyz]
A standard symbol definition expression is enclosed by square brackets and starts with an "@
" symbol. The above symbol definition expression defines two symbols explicitly (sym1
is redefined, so the first definition has no effect), and one dataset. A symbol is defined with an "=
" sign, with the symbol name on the left and the value on the right. Importing a dataset starts with a forward-slash "/
" sign followed by the dataset name. If any of the key-value pairs in the dataset had the same name as an existing symbol, its value would be replaced. The effect of importing a dataset is exactly the same as if all the key-value pairs had been listed in the symbol definition expression, in the order they appear in the dataset.
Symbol definition expressions can occur at any point in the model (with a few exceptions discussed later). Whenever a symbol with the same name (case insensitive) is reregistered, its value is replaced from that point on.
As discussed above in the section on symbol scope, symbol definition expressions of the form [@...]
define symbols in global scope. To define symbols in local scope, use [%...]
.
Symbol evaluation
A symbol evaluation expression is used to look up the value of symbols in the symbol table and replace that value in the model where the symbol evaluation expression was. They start with an "&
" character. A few example expressions are shown below:
&mysym &mysym.x &&x&y &&x.&y
One way to think of multiple &
characters is that all the single &
subexpressions are evaluated first, and then these feed into the &&
expressions, and so on. The exception is when a subexpression ends with a .
- this forces evaluation.
Redundant &
characters are ignored, so &&x
is equivalent to &x
. However, if you force evaluation with a .
then they can have different evaluations. So, &&x.
first evaluates x
, and then evaluates the value of x
as a symbol in its own right.
If you actually want a decimal point in the middle of the string you are trying to create, then you need to make sure that the symbol evaluation expression to the left of the point is fully realised, so the macro scripting code below would evaluate to 37.5
[@sym1=37,sym2=5] &sym1..&sym2
Macro expressions
Macro expressions define some logic that is executed by the macro compiler to generate text. Macro expressions start and end with curly braces: "{...}
". Each macro expression is evaluated and the result is substituted for the original expression, then passed again to the macro compiler to consider. So macro expressions can be nested, and the eventual output will be plain text corresponding to basic OptML syntax.
Macro conditional expressions
Macro conditional expressions are essentially if‑then‑else statements. They compare two expressions for equality, and based on the outcome one block of text or another is chosen to be passed on to the macro compiler. They take the form "{? A=B; C | D}
", which can be read "if A=B then C else D
". For example:
{? xxx = yyy; text-if-true | text-if-false }
The expression starts with "{?
", then has three parts: an equality expression then a semi-colon ";
", followed by a true clause, a vertical pipe "|
", then a false clause.
The equality expression is a text comparison, so numbers are evaluated as text. Either side may be the result of evaluating some other expression, or some symbol, or any combination.
Before the comparison, each expression has surrounding whitespace stripped, and each occurence of interior whitespace is compressed to a single space, including new lines.
Macro loop expressions
Macro loop expressions when evaluated, pass on a block of text multiple times, and at each iteration a local symbol is defined with a different value. They take the form "{% sym=A,B,...; XXX}
". For example:
{% i=76, 1~2, abc, 10~7 (-1); + x&i }
This would evaluated by the macro compiler to the following text, which would then be re-evaluated by the parser:
+ x76 + x1 + x2 + xabc + x10 + x9 + x8 + x7
The expression starts with "{%
", then defines a set of values for a local symbol to take, and the output is repeated once for each of the distinct values of the local symbol. The symbol name is followed by an equals "=
" sign, then one or more values for the symbol to take, separated by commas. Next comes a semi-colon ";
" then the expression to be repeatedly passed on as output.
The values that the local symbol should take are separated by commas and are either a text expression ("76
" and "abc
" in the example above), or a numeric range ("1~2
" and "10~77 (‑1)
" above). The numeric range represents a set of numbers which are then interpreted as text. A numeric range takes the form "A~B (C)
", where A
is the start of the range, B
is the end of the range, and C
is the amount to increment. An abbreviated form of the range excludes the increment, i.e. "A~B
", in which case the default increment is 1
.
Macro block expressions
A macro block expression does not perform any special logic, it exists simply to allow symbols to be defined with local scope. It takes the following form:
{; XXX}
Any symbols defined in the block "XXX
" have local scope and expire when the block is finished, or take their previous values if they were previously defined in an expression of higher scope.
Macro math expressions
Macro math expressions allow arbitrarily complicated expressions to be evaluated numerically. They take the form "{...}
", and the resulting output will be a number. Internally, the input can consist of other macro expressions, symbols, and any desired math expression.
The math expressions are evaluated using the DDMathParser library. This includes arithemtic, many mathematical functions, inequalities, and boolean operators. True sub-expressions evaluate to 1
and false to 0
. Check the DDMathParser documentation for a complete description of supported functions.
For example, the following macro math expression evaluates to 243.32898
.
{ 3 * 2 PI() + sin(1/2) + 15**2 - (5 > 2) }
It is important to emphasise that math expressions, like all macro expressions, are evaluated and result in text output. In this case a number which would represent a coefficient or RHS value in an optimization model. They are not a method of sneaking non-linearity into a model.