• Basic Concepts
• Converting between ASTs and Text Strings
• The String Formula Syntax and Differences with MathML
• Methods for working directly with libSBML's Abstract Syntax Trees
• Reading and Writing MathML from/to ASTs

This section describes libSBML's facilities for working with SBML representations of mathematical expressions. Unless otherwise noted, all classes are in the Java package org.sbml.libsbml.

Internally, libSBML uses Abstract Syntax Trees (ASTs) to provide a canonical, in-memory representation for all mathematical formulas regardless of their original format (i.e., C-like infix text strings or the XML-based MathML 2.0 format). LibSBML provides an extensive API for working with ASTs; it also provides facilities for translating between ASTs and mathematical formulas written in a text-string notation, as well as translating between ASTs and MathML.

Basic concepts

An AST node in libSBML is a recursive tree structure; each node has a type, a pointer to a value, and a list of child nodes. Each ASTNode may have zero, one, two, or more children depending on its type. There are node types to represent numbers (with subtypes to distinguish integer, real, and rational numbers), names (e.g., constants or variables), simple mathematical operators, logical or relational operators, and functions. The following diagram illustrates how the mathematical expression "1 + 2" is represented as an AST with one plus node having two integer child nodes for the numbers 1 and 2. The figure also shows the corresponding MathML representation:
Example AST representation of a mathematical expression.

Infix:
  1 + 2

AST:
  plus
    integer 1
    integer 2

MathML:
  <math xmlns="http://www.w3.org/1998/Math/MathML">
    <apply>
      <plus/>
      <cn type="integer"> 1 </cn>
      <cn type="integer"> 2 </cn>
    </apply>
  </math>
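The tree structure described above can be sketched in plain Java without libSBML. The class below is a hypothetical stand-in (MiniNode and all of its members are invented for illustration and are not part of the libSBML API); it only shows how a typed node with a value and a list of children can be rendered both as infix text and as MathML content markup.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an AST node; this is NOT libSBML's ASTNode class. */
class MiniNode {
    enum Type { INTEGER, PLUS }

    final Type type;
    final long value;                                  // meaningful only for INTEGER nodes
    final List<MiniNode> children = new ArrayList<>();

    private MiniNode(Type type, long value) {
        this.type = type;
        this.value = value;
    }

    static MiniNode integer(long v) {
        return new MiniNode(Type.INTEGER, v);
    }

    static MiniNode plus(MiniNode... operands) {
        MiniNode node = new MiniNode(Type.PLUS, 0);
        node.children.addAll(List.of(operands));
        return node;
    }

    /** Renders the tree as infix text. */
    String toInfix() {
        if (type == Type.INTEGER) return Long.toString(value);
        List<String> parts = new ArrayList<>();
        for (MiniNode child : children) parts.add(child.toInfix());
        return String.join(" + ", parts);
    }

    /** Renders the tree as MathML content markup (without the enclosing math element). */
    String toMathML() {
        if (type == Type.INTEGER) return "<cn type=\"integer\"> " + value + " </cn>";
        StringBuilder sb = new StringBuilder("<apply><plus/>");
        for (MiniNode child : children) sb.append(child.toMathML());
        return sb.append("</apply>").toString();
    }

    public static void main(String[] args) {
        MiniNode sum = plus(integer(1), integer(2));   // one plus node, two integer children
        System.out.println(sum.toInfix());
        System.out.println(sum.toMathML());
    }
}
```

Real applications would use ASTNode and the libsbml conversion functions instead; the sketch is only meant to make the node/children relationship concrete.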

The following are noteworthy about the AST representation in libSBML:

For many applications, the details of ASTs are irrelevant because the applications can use libSBML's text-string based translation functions such as libsbml.formulaToL3String(ASTNode) and libsbml.parseL3Formula(java.lang.String). If you find the AST representation of expressions too complex for your purposes, the string-based functions may be more suitable.

Converting between ASTs and text strings

SBML Levels 2 and 3 represent mathematical expressions using MathML 2.0 (more specifically, a subset of the content portion of MathML 2.0), but most applications using libSBML do not use MathML directly. Instead, applications generally interact with mathematics using either the API for Abstract Syntax Trees (described below), or libSBML's facilities for encoding and decoding mathematical formulas to/from text strings. The latter is simpler to use directly, so we describe it first.

The libSBML formula parser has been carefully engineered so that transformations from MathML to the libSBML infix text notation and back are possible with a minimum of disruption to the structure of the mathematical expression. The example below shows a simple program that, when run, takes a MathML string compiled into the program, converts it to an AST, converts that to an infix representation of the formula, compares it to the expected form of that formula, and finally translates that formula back to MathML and displays it. The output displayed on the terminal should have the same structure as the MathML it started with. The program is a simple example of using libSBML's basic MathML and AST reading and writing methods, and shows that libSBML preserves the ordering and structure of the mathematical expressions.

import org.sbml.libsbml.ASTNode;
import org.sbml.libsbml.libsbml;

public class example
{
  public static void main (String[] args)
  {        
      String expected = "1 + f(x)";
      String input_mathml = "<?xml version='1.0' encoding='UTF-8'?>" 
          + "<math xmlns='http://www.w3.org/1998/Math/MathML'>"
          + "  <apply> <plus/> <cn> 1 </cn>"
          + "                  <apply> <ci> f </ci> <ci> x </ci> </apply>"
          + "  </apply>"
          + "</math>";

      ASTNode ast_result   = libsbml.readMathMLFromString(input_mathml);
      String ast_as_string = libsbml.formulaToString(ast_result);

      if (ast_as_string.equals(expected))
      {
          System.out.println("Got expected result.");
      }
      else
      {
          System.out.println("Mismatch after readMathMLFromString().");
          System.exit(1);
      }

      // Parse the infix string back into an AST, then serialize it as MathML.
      ASTNode new_ast    = libsbml.parseFormula(ast_as_string);
      String new_string  = libsbml.writeMathMLToString(new_ast);

      System.out.println("Result of writing AST to string:");
      System.out.print(new_string);
      System.out.println();
  }

  static 
  {
    try 
    {
      System.loadLibrary("sbmlj");
    }
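    catch (UnsatisfiedLinkError | SecurityException e)
    {
      // A missing native library is reported as UnsatisfiedLinkError,
      // which is an Error and would not be caught by catch (Exception e).
      System.err.println("Could not load libSBML library: " + e.getMessage());
    }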
  }
}

The text-string form of mathematical formulas produced by libsbml.formulaToString(ASTNode) and libsbml.formulaToL3String(ASTNode), and read by libsbml.parseFormula(java.lang.String) and libsbml.parseL3Formula(java.lang.String), is a simple C-inspired infix notation. It is summarized in the next section. A formula in this text-string form can therefore be handed to a program that understands SBML mathematical expressions, or used as part of a translation system.

The text-string formula syntax, and differences with MathML

There are actually two text-based formula parsing/writing systems in libSBML: one that uses a more limited syntax and was originally designed for translation between SBML Level 1 (which used a text-string format for representing mathematics) and higher levels of SBML, and a more recent, more powerful version that offers features to support SBML Level 3. We describe both below, beginning with the simpler but more limited system.

Simpler scheme based on SBML Level 1's syntax

The simpler, more limited translation system is read by libsbml.parseFormula(java.lang.String) and written by libsbml.formulaToString(ASTNode). It uses an infix notation essentially derived from the syntax of the C programming language and was originally used in SBML Level 1. We summarize the syntax here, but for more complete details, readers should consult the documentation for libsbml.parseFormula(java.lang.String).

Formula strings in this infix notation may contain operators, function calls, symbols, and white space characters. The allowable white space characters are tab and space. The following are illustrative examples of formulas expressed in the syntax:

0.10 * k4^2
(vm * s1)/(km + s1)

The following table shows the precedence rules in this syntax. In the Class column, operand implies the construct is an operand, prefix implies the operation is applied to the following arguments, unary implies there is one argument, and binary implies there are two arguments. The values in the Precedence column show how the order of different types of operation are determined. For example, the expression a * b + c is evaluated as (a * b) + c because the * operator has higher precedence. The Associates column shows how the order of similar precedence operations is determined; for example, a - b + c is evaluated as (a - b) + c because the + and - operators are left-associative. The precedence and associativity rules are taken from the C programming language, except for the symbol ^, which is used in C for a different purpose. (Exponentiation can be invoked using either ^ or the function power.)

Token         Operation            Class    Precedence  Associates
name          symbol reference     operand  6           n/a
(expression)  expression grouping  operand  6           n/a
f(...)        function call        prefix   6           left
-             negation             unary    5           right
^             power                binary   4           left
*             multiplication       binary   3           left
/             division             binary   3           left
+             addition             binary   2           left
-             subtraction          binary   2           left
,             argument delimiter   binary   1           left
A table of the expression operators and their precedence in the text-string format for mathematical expressions used by SBML_parseFormula().
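To make the precedence and associativity columns concrete, here is a minimal sketch of a precedence-climbing parser that adds explicit parentheses to a formula according to the table above. It is an illustration only, not libSBML's parser: the class name L1PrecedenceSketch and its methods are hypothetical, function calls and the comma delimiter are left out, and the grouping it prints follows the table's numbers literally.

```java
import java.util.Map;

/** Illustrative only: parenthesizes an infix string using the precedence table above. */
class L1PrecedenceSketch {
    // Binary operators and precedences copied from the table; all are left-associative.
    private static final Map<Character, Integer> BINARY =
        Map.of('^', 4, '*', 3, '/', 3, '+', 2, '-', 2);
    private static final int NEGATION = 5;   // unary minus, right-associative

    private final String src;
    private int pos = 0;

    private L1PrecedenceSketch(String formula) {
        this.src = formula.replaceAll("[ \\t]", "");   // drop tabs and spaces
    }

    /** Returns the formula with every operation explicitly parenthesized. */
    static String group(String formula) {
        L1PrecedenceSketch p = new L1PrecedenceSketch(formula);
        String result = p.parse(1);
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at position " + p.pos);
        }
        return result;
    }

    // Precedence climbing: absorb binary operators whose precedence is at least minPrec.
    private String parse(int minPrec) {
        String left = operand();
        while (pos < src.length()) {
            char op = src.charAt(pos);
            Integer prec = BINARY.get(op);
            if (prec == null || prec < minPrec) break;
            pos++;
            String right = parse(prec + 1);   // prec + 1 makes the operator left-associative
            left = "(" + left + " " + op + " " + right + ")";
        }
        return left;
    }

    private String operand() {
        if (pos >= src.length()) throw new IllegalArgumentException("unexpected end of input");
        char c = src.charAt(pos);
        if (c == '-') {                       // negation outranks every binary operator in this table
            pos++;
            return "(-" + parse(NEGATION) + ")";
        }
        if (c == '(') {
            pos++;
            String inner = parse(1);
            if (pos >= src.length() || src.charAt(pos) != ')') {
                throw new IllegalArgumentException("missing ')'");
            }
            pos++;
            return inner;
        }
        int start = pos;                      // a name or a number
        while (pos < src.length() && (Character.isLetterOrDigit(src.charAt(pos))
                || src.charAt(pos) == '_' || src.charAt(pos) == '.')) {
            pos++;
        }
        if (start == pos) throw new IllegalArgumentException("unexpected '" + c + "'");
        return src.substring(start, pos);
    }

    public static void main(String[] args) {
        System.out.println(group("a * b + c"));   // * outranks +
        System.out.println(group("a - b + c"));   // equal precedence, grouped from the left
        System.out.println(group("-a ^ b"));      // negation (5) outranks power (4)
    }
}
```

Passing prec + 1 to the recursive call is what encodes left associativity; a right-associative binary operator would pass prec instead.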

A program parsing a formula in an SBML model should assume that names appearing in the formula are the identifiers of Species, Compartment, Parameter, FunctionDefinition, (in Level 2) Reaction, or (in Level 3) SpeciesReference objects defined in a model. When a function call is involved, the syntax consists of a function identifier, followed by optional white space, followed by an opening parenthesis, followed by a sequence of zero or more arguments separated by commas (with each comma optionally preceded and/or followed by zero or more white space characters), followed by a closing parenthesis. There is an almost one-to-one mapping between the list of predefined functions available, and those defined in MathML. All of the MathML functions are recognized; this set is larger than the functions defined in SBML Level 1. In the subset of functions that overlap between MathML and SBML Level 1, there exist a few differences. The following table summarizes the differences between the predefined functions in SBML Level 1 and the MathML equivalents in SBML Levels 2 and 3:

Text string formula functions  MathML equivalents in SBML Levels 2 and 3
acos                           arccos
asin                           arcsin
atan                           arctan
ceil                           ceiling
log                            ln
log10(x)                       log(x) or log(10, x)
pow(x, y)                      power(x, y)
sqr(x)                         power(x, 2)
sqrt(x)                        root(x) or root(2, x)
Table comparing the names of certain functions in the SBML text-string formula syntax and MathML. The left column shows the names of functions recognized by SBML_parseFormula(); the right column shows their equivalent function names in MathML 2.0, used in SBML Levels 2 and 3.
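The renaming rules in the table can be expressed as a small lookup. The sketch below is a hypothetical helper (FunctionNameSketch and toMathMLStyle are invented names, not libSBML functions) that rewrites a call written with the text-string names into the equivalent MathML-style name, using the explicit two-argument forms for log10 and sqrt.

```java
import java.util.Map;

/** Illustrative only: maps text-string function names to their MathML equivalents. */
class FunctionNameSketch {
    // Plain renames taken from the table above.
    private static final Map<String, String> RENAMES = Map.of(
        "acos", "arccos",
        "asin", "arcsin",
        "atan", "arctan",
        "ceil", "ceiling",
        "log",  "ln",
        "pow",  "power");

    /** Rewrites name(args...) using the MathML function names. */
    static String toMathMLStyle(String name, String... args) {
        switch (name) {
            // These three change the argument list as well as the name.
            case "log10": return "log(10, " + args[0] + ")";
            case "sqr":   return "power(" + args[0] + ", 2)";
            case "sqrt":  return "root(2, " + args[0] + ")";
            default:
                // Names not in the table are the same in both notations.
                return RENAMES.getOrDefault(name, name) + "(" + String.join(", ", args) + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(toMathMLStyle("log", "x"));
        System.out.println(toMathMLStyle("log10", "x"));
        System.out.println(toMathMLStyle("pow", "x", "y"));
    }
}
```

The log entry is the one most likely to surprise users, since the same spelling means the natural logarithm in this syntax and something else in MathML.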

Note that there are differences between the symbols used to represent the common mathematical functions and the corresponding MathML token names. This is a potential source of incompatibilities. Note in particular that in this text-string syntax, log(x) always represents the natural logarithm, whereas in MathML, the natural logarithm is <ln/>. Application writers are urged to be careful when translating between text forms and MathML forms, especially if they provide a direct text-string input facility to users of their software systems. The more advanced mathematical formula system, described below, offers the ability to control how log is interpreted as well as other parsing behaviors.

Advanced, SBML Level 3-oriented formula scheme

The text-string form of mathematical formulas read by the function libsbml.parseL3Formula(java.lang.String) and written by the function libsbml.formulaToL3String(ASTNode) uses an expanded version of the syntax read by libsbml.parseFormula(java.lang.String) and written by libsbml.formulaToString(ASTNode). The latter two libSBML functions were originally developed to support conversion between SBML Levels 1 and 2, and were focused on the syntax of mathematical formulas used in SBML Level 1. With time, and the use of MathML in SBML Levels 2 and 3, it became clear that supporting Level 2 and 3's expanded mathematical syntax would be useful for software developers. To maintain backwards compatibility for libSBML users, the original libsbml.parseFormula(java.lang.String) and libsbml.formulaToString(ASTNode) have been left untouched; the new functionality is instead provided in the form of libsbml.parseL3Formula(java.lang.String) and libsbml.formulaToL3String(ASTNode).

The following lists the main differences in the formula syntax supported by the "Level 3" or L3 versions of the formula parsers and formatters, compared to what is supported by the Level 1-oriented libsbml.parseFormula(java.lang.String) and libsbml.formulaToString(ASTNode):

These configuration settings cannot be changed directly using the basic parser and formatter functions, but can be changed on a per-call basis by using the alternative functions libsbml.parseL3FormulaWithSettings(String, L3ParserSettings) and libsbml.formulaToL3StringWithSettings(ASTNode, L3ParserSettings).

Neither SBML nor the MathML standard defines a "string-form" equivalent to MathML expressions. The approach taken by libSBML is to start with the formula syntax defined by SBML Level 1 (which in fact used a custom text-string representation of formulas, and not MathML), and expand it to include the functionality described above. This formula syntax is based mostly on C programming syntax, and may contain operators, function calls, symbols, and white space characters. The following table provides the precedence rules for the different entities that may appear in formula strings.

Token                 Operation                                     Class    Precedence  Associates
name                  symbol reference                              operand  8           n/a
(expression)          expression grouping                           operand  8           n/a
f(...)                function call                                 prefix   8           left
^                     power                                         binary   7           left
-, !                  negation and boolean 'not'                    unary    6           right
*, /, %               multiplication, division, and modulo          binary   5           left
+, -                  addition and subtraction                      binary   4           left
==, <, >, <=, >=, !=  boolean equality, inequality, and comparison  binary   3           left
&&, ||                boolean 'and' and 'or'                        binary   2           left
,                     argument delimiter                            binary   1           left
Expression operators and their precedence in the "Level 3" text-string format for mathematical expressions.

In the table above, operand implies the construct is an operand, prefix implies the operation is applied to the following arguments, unary implies there is one argument, and binary implies there are two arguments. The values in the Precedence column show how the order of different types of operation are determined. For example, the expression a + b * c is evaluated as a + (b * c) because the * operator has higher precedence. The Associates column shows how the order of similar precedence operations is determined; for example, a && b || c is evaluated as (a && b) || c because the && and || operators are left-associative and have the same precedence.
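Because && and || share one precedence level here, a mixed chain groups differently than it would in Java or C, where && binds tighter. The following sketch makes the difference visible; SamePrecedenceSketch and evalLeftToRight are hypothetical names, and the method simply applies the operators strictly from left to right as the table implies.

```java
/** Illustrative only: evaluates a chain of && and || with equal precedence, left to right. */
class SamePrecedenceSketch {
    static boolean evalLeftToRight(boolean[] operands, String[] ops) {
        if (operands.length != ops.length + 1) {
            throw new IllegalArgumentException("need exactly one more operand than operators");
        }
        boolean acc = operands[0];
        for (int i = 0; i < ops.length; i++) {
            boolean rhs = operands[i + 1];
            switch (ops[i]) {
                case "&&": acc = acc && rhs; break;
                case "||": acc = acc || rhs; break;
                default: throw new IllegalArgumentException("unsupported operator: " + ops[i]);
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        boolean a = true, b = false, c = false;
        // Table above: a || b && c groups as (a || b) && c.
        System.out.println(evalLeftToRight(new boolean[] {a, b, c}, new String[] {"||", "&&"}));
        // Java's own rules: a || (b && c).
        System.out.println(a || b && c);
    }
}
```

With a = true and b = c = false, the first line prints false and the second prints true, so formulas that mix the two operators are safest written with explicit parentheses.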

The function call syntax consists of a function name, followed by optional white space, followed by an opening parenthesis token, followed by a sequence of zero or more arguments separated by commas (with each comma optionally preceded and/or followed by zero or more white space characters), followed by a closing parenthesis token. The function name must be chosen from one of the pre-defined functions in SBML or a user-defined function in the model. The following table lists the names of certain common mathematical functions; this table corresponds to Table 6 in the SBML Level 1 Version 2 specification with additions based on the functions added in SBML Level 2 and Level 3:

Name Argument(s) Formula or meaning Argument Constraints Result constraints
abs x Absolute value of x.
acos, arccos x Arccosine of x in radians. –1.0 ≤ x ≤ 1.0 0 ≤ acos(x) ≤ π
acosh, arccosh x Hyperbolic arccosine of x in radians.
acot, arccot x Arccotangent of x in radians.
acoth, arccoth x Hyperbolic arccotangent of x in radians.
acsc, arccsc x Arccosecant of x in radians.
acsch, arccsch x Hyperbolic arccosecant of x in radians.
asec, arcsec x Arcsecant of x in radians.
asech, arcsech x Hyperbolic arcsecant of x in radians.
asin, arcsin x Arcsine of x in radians. –1.0 ≤ x ≤ 1.0 –π/2 ≤ asin(x) ≤ π/2
atan, arctan x Arctangent of x in radians. –π/2 < atan(x) < π/2
atanh, arctanh x Hyperbolic arctangent of x in radians.
ceil, ceiling x Smallest number not less than x whose value is an exact integer.
cos x Cosine of x.
cosh x Hyperbolic cosine of x.
cot x Cotangent of x.
coth x Hyperbolic cotangent of x.
csc x Cosecant of x.
csch x Hyperbolic cosecant of x.
delay x, y The value of x at y time units in the past.
factorial n The factorial of n. Factorials are defined by n! = n*(n–1)* ... * 1. n must be an integer.
exp x e^x, where e is the base of the natural logarithm.
floor x The largest number not greater than x whose value is an exact integer.
ln x Natural logarithm of x. x > 0
log x By default, the base 10 logarithm of x, but can be set to be the natural logarithm of x, or to be an illegal construct. x > 0
log x, y The base x logarithm of y. y > 0
log10 x Base 10 logarithm of x. x > 0
piecewise x1, y1, [x2, y2,] [...] [z] A piecewise function: if (y1), x1. Otherwise, if (y2), x2, etc. Otherwise, z. y1, y2, y3 [etc] must be boolean
pow, power x, y x^y.
root b, x The root base b of x.
sec x Secant of x.
sech x Hyperbolic secant of x.
sqr x x^2.
sqrt x √x. x > 0 sqrt(x) ≥ 0
sin x Sine of x.
sinh x Hyperbolic sine of x.
tan x Tangent of x. x ≠ n*π/2, for odd integer n
tanh x Hyperbolic tangent of x.
and x, y, z... Boolean and(x, y, z...): returns true if all of its arguments are true. Note that and is an n-ary function, taking 0 or more arguments, and that and() returns true. All arguments must be boolean
not x Boolean not(x) x must be boolean
or x, y, z... Boolean or(x, y, z...): returns true if at least one of its arguments is true. Note that or is an n-ary function, taking 0 or more arguments, and that or() returns false. All arguments must be boolean
xor x, y, z... Boolean xor(x, y, z...): returns true if an odd number of its arguments is true. Note that xor is an n-ary function, taking 0 or more arguments, and that xor() returns false. All arguments must be boolean
eq x, y, z... Boolean eq(x, y, z...): returns true if all arguments are equal. Note that eq is an n-ary function, but must take 2 or more arguments.
geq x, y, z... Boolean geq(x, y, z...): returns true if each argument is greater than or equal to the argument following it. Note that geq is an n-ary function, but must take 2 or more arguments.
gt x, y, z... Boolean gt(x, y, z...): returns true if each argument is greater than the argument following it. Note that gt is an n-ary function, but must take 2 or more arguments.
leq x, y, z... Boolean leq(x, y, z...): returns true if each argument is less than or equal to the argument following it. Note that leq is an n-ary function, but must take 2 or more arguments.
lt x, y, z... Boolean lt(x, y, z...): returns true if each argument is less than the argument following it. Note that lt is an n-ary function, but must take 2 or more arguments.
neq x, y Boolean x != y: returns true unless x and y are equal.
plus x, y, z... x + y + z + ...: The sum of the arguments of the function. Note that plus is an n-ary function taking 0 or more arguments, and that plus() returns 0.
times x, y, z... x * y * z * ...: The product of the arguments of the function. Note that times is an n-ary function taking 0 or more arguments, and that times() returns 1.
minus x, y x – y.
divide x, y x / y.
Mathematical functions defined in the "Level 3" text-string formula syntax.
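The function call syntax described before the table (a name, optional white space, an opening parenthesis, comma-separated arguments with optional white space around each comma, and a closing parenthesis) can be illustrated with a short splitter. This is a sketch under stated assumptions, not libSBML's parser: CallSyntaxSketch and split are hypothetical names, and the identifier pattern used for the function name is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: splits "name ( arg1 , arg2 )" into the name followed by its arguments. */
class CallSyntaxSketch {
    // name, optional tabs/spaces, '(', anything, ')'. The name pattern is an assumption.
    private static final Pattern CALL =
        Pattern.compile("([A-Za-z_][A-Za-z0-9_]*)[ \\t]*\\((.*)\\)");

    static List<String> split(String text) {
        Matcher m = CALL.matcher(text.trim());
        if (!m.matches()) throw new IllegalArgumentException("not a function call: " + text);
        List<String> parts = new ArrayList<>();
        parts.add(m.group(1));
        String args = m.group(2).trim();
        if (args.isEmpty()) return parts;              // zero arguments are allowed
        int depth = 0;
        int start = 0;
        for (int i = 0; i < args.length(); i++) {
            char c = args.charAt(i);
            if (c == '(') depth++;
            else if (c == ')') depth--;
            else if (c == ',' && depth == 0) {         // only top-level commas separate arguments
                parts.add(args.substring(start, i).trim());
                start = i + 1;
            }
        }
        parts.add(args.substring(start).trim());
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("piecewise (x1, gt(a, b) , z)"));
        System.out.println(split("plus()"));
    }
}
```

Tracking the parenthesis depth keeps the comma inside gt(a, b) from being mistaken for a separator of the outer call's arguments.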

Parsing of the various MathML functions and constants is case-insensitive by default: function names such as cos, Cos and COS are all parsed as the MathML cosine operator, <cos/>. However, when a Model object is used in conjunction with either libsbml.parseL3FormulaWithModel(String, Model) or libsbml.parseL3FormulaWithSettings(String, L3ParserSettings), any identifiers found in that model will be parsed in a case-sensitive way. For example, if a model contains a Species having the identifier Pi, the parser will parse "Pi" in the input as "<ci> Pi </ci>" but will continue to parse the symbols "pi" and "PI" as "<pi/>".
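The lookup order in the Pi example can be sketched as follows. The class and method names are hypothetical, the set of model identifiers is passed in as plain strings rather than taken from a Model object, and only three predefined symbols are included; the point is simply that an exact, case-sensitive match against the model wins over the case-insensitive match against predefined names.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Illustrative only: model identifiers are matched exactly, predefined symbols ignore case. */
class SymbolLookupSketch {
    // A few predefined symbols, keyed by their lower-case spelling.
    private static final Map<String, String> PREDEFINED = Map.of(
        "pi", "<pi/>",
        "true", "<true/>",
        "false", "<false/>");

    static String resolve(String token, Set<String> modelIds) {
        if (modelIds.contains(token)) {
            return "<ci> " + token + " </ci>";         // exact match against the model wins
        }
        String builtin = PREDEFINED.get(token.toLowerCase(Locale.ROOT));
        if (builtin != null) {
            return builtin;                            // case-insensitive predefined symbol
        }
        return "<ci> " + token + " </ci>";             // any other name is a plain identifier
    }

    public static void main(String[] args) {
        Set<String> ids = Set.of("Pi");                // a model with a Species named Pi
        System.out.println(resolve("Pi", ids));
        System.out.println(resolve("pi", ids));
        System.out.println(resolve("PI", ids));
    }
}
```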

As mentioned above, the manner in which the "L3" versions of the formula parser and formatter interpret the function "log" can be changed. To do so, callers should use the function libsbml.parseL3FormulaWithSettings(String, L3ParserSettings) and pass it an appropriate L3ParserSettings object. By default, unlike the SBML Level 1 parser implemented by libsbml.parseFormula(java.lang.String), the string "log" is interpreted as the base 10 logarithm, and not as the natural logarithm. However, you can change the interpretation to be base-10 log, natural log, or an error. Because the name "log" by itself is ambiguous, the error setting lets you require that formulas use log10 or ln instead, which are unambiguous. For details, please refer to the documentation for libsbml.parseL3FormulaWithSettings(String, L3ParserSettings).
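The three possible treatments of a one-argument "log" can be pictured as a simple switch. The sketch below does not use libSBML: the enum LogMode and the method interpretUnaryLog are hypothetical names chosen for illustration, and the real choice is made through an L3ParserSettings object as described above.

```java
/** Illustrative only: the three ways a one-argument log(x) may be treated. */
class LogSettingSketch {
    /** Hypothetical names; libSBML's own settings live in L3ParserSettings. */
    enum LogMode { BASE_10, NATURAL, ERROR }

    /** Returns an unambiguous spelling of log(arg), or rejects it in ERROR mode. */
    static String interpretUnaryLog(String arg, LogMode mode) {
        switch (mode) {
            case BASE_10:
                return "log10(" + arg + ")";
            case NATURAL:
                return "ln(" + arg + ")";
            default:
                throw new IllegalArgumentException(
                    "log with one argument is rejected; write log10 or ln instead");
        }
    }

    public static void main(String[] args) {
        System.out.println(interpretUnaryLog("x", LogMode.BASE_10));
        System.out.println(interpretUnaryLog("x", LogMode.NATURAL));
    }
}
```

The error mode is useful for applications that accept formulas typed by users, because it forces every logarithm to state its base.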

In addition, the following symbols will be translated to their MathML equivalents, if no symbol with the same SId identifier string exists in the Model object provided:

Name Meaning MathML
true The boolean value true <true/>
false The boolean value false <false/>
pi The mathematical constant pi <pi/>
avogadro The numerical value of Avogadro's constant, as defined in the SBML specification <csymbol encoding="text" definitionURL="http://www.sbml.org/sbml/symbols/avogadro"> avogadro </csymbol>
time Simulation time as defined in SBML <csymbol encoding="text" definitionURL="http://www.sbml.org/sbml/symbols/time"> time </csymbol>
inf or infinity The mathematical constant "infinity" <infinity/>
nan or notanumber The mathematical concept "not a number" <notanumber/>
Mathematical symbols defined in the "Level 3" text-string formula syntax.

Again, as mentioned above, whether the string "avogadro" is parsed as an AST node of type AST_NAME_AVOGADRO or AST_NAME is configurable; use the version of the parser function called libsbml.parseL3FormulaWithSettings(String, L3ParserSettings). This Avogadro-related functionality is provided because SBML Level 2 models may not use AST_NAME_AVOGADRO AST nodes.

Methods for working directly with libSBML's Abstract Syntax Trees

While it is convenient to read and write mathematical expressions in the form of text strings, advanced applications usually need more powerful ways of creating, traversing, and modifying mathematical formulas. For this reason, libSBML provides a rich API for interacting with ASTs directly. This section summarizes these facilities; for more information, readers should consult the documentation for the ASTNode class.

Every ASTNode in a libSBML abstract syntax tree has an associated type, which is a value taken from a set of constants having names beginning with AST_ and defined in org.sbml.libsbml.libsbmlConstants. The list of possible AST types in libSBML is quite long, because it covers all the mathematical functions that are permitted in SBML. The values are shown in the following table; their names hopefully evoke the construct that they represent:

AST_CONSTANT_E AST_FUNCTION_COT AST_LOGICAL_NOT
AST_CONSTANT_FALSE AST_FUNCTION_COTH AST_LOGICAL_OR
AST_CONSTANT_PI AST_FUNCTION_CSC AST_LOGICAL_XOR
AST_CONSTANT_TRUE AST_FUNCTION_CSCH AST_MINUS
AST_DIVIDE AST_FUNCTION_DELAY AST_NAME
AST_FUNCTION AST_FUNCTION_EXP AST_NAME_AVOGADRO (Level 3 only)
AST_FUNCTION_ABS AST_FUNCTION_FACTORIAL AST_NAME_TIME
AST_FUNCTION_ARCCOS AST_FUNCTION_FLOOR AST_PLUS
AST_FUNCTION_ARCCOSH AST_FUNCTION_LN AST_POWER
AST_FUNCTION_ARCCOT AST_FUNCTION_LOG AST_RATIONAL
AST_FUNCTION_ARCCOTH AST_FUNCTION_PIECEWISE AST_REAL
AST_FUNCTION_ARCCSC AST_FUNCTION_POWER AST_REAL_E
AST_FUNCTION_ARCCSCH AST_FUNCTION_ROOT AST_RELATIONAL_EQ
AST_FUNCTION_ARCSEC AST_FUNCTION_SEC AST_RELATIONAL_GEQ
AST_FUNCTION_ARCSECH AST_FUNCTION_SECH AST_RELATIONAL_GT
AST_FUNCTION_ARCSIN AST_FUNCTION_SIN AST_RELATIONAL_LEQ
AST_FUNCTION_ARCSINH AST_FUNCTION_SINH AST_RELATIONAL_LT
AST_FUNCTION_ARCTAN AST_FUNCTION_TAN AST_RELATIONAL_NEQ
AST_FUNCTION_ARCTANH AST_FUNCTION_TANH AST_TIMES
AST_FUNCTION_CEILING AST_INTEGER AST_UNKNOWN
AST_FUNCTION_COS AST_LAMBDA
AST_FUNCTION_COSH AST_LOGICAL_AND

The types have the following meanings:

There are a number of methods for interrogating the type of an ASTNode and for testing whether a node belongs to a general category of constructs. The methods defined by the ASTNode class are the following:

Programs manipulating AST node structures should check the type of a given node before calling methods that return a value from the node. The following methods are available for returning values from nodes:

Of course, all of this would be of little use if libSBML didn't also provide methods for setting the values of AST node objects! And it does. The methods are the following:

Finally, ASTNode also defines some miscellaneous methods for manipulating ASTs:

Reading and Writing MathML from/to ASTs

As mentioned above, applications often can avoid working with raw MathML by using either libSBML's text-string interface or the AST API. However, when needed, reading MathML content directly and creating ASTs, as well as the converse task of writing MathML, is easily done using two methods designed for this purpose:

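The example program given above demonstrates the use of these methods: libsbml.readMathMLFromString(String) reads MathML into an AST, and libsbml.writeMathMLToString(ASTNode) writes an AST back out as MathML.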