Table Column Expressions

This page highlights the most commonly used parts of the API and is not exhaustive. It covers column expressions and the expression-based table operations apply and group_by. For complete coverage, see the API Reference.

Expression Basics

Table columns are exposed as expression objects. You can use either attribute-style access (t.mz) or item access (t["mz"]).

import emzed

t = emzed.Table.create_table(
    ["name", "mz", "rt"],
    [str, float, float],
    rows=[["glucose", 181.071, 120.0], ["caffeine", 195.088, 180.0]],
)

# column expressions
expr1 = t.mz > 190.0
expr2 = t["rt"] < 200.0

subset = t.filter(expr1 & expr2)
print(subset)

Supported Operations

  • arithmetic: +, -, *, /
  • comparisons: ==, !=, <, <=, >, >=
  • boolean composition: &, | (use these, not and/or)
  • range/tolerance helpers: in_range(left, right), approx_equal(other, atol, rtol)
  • null predicates: is_none(), is_not_none()
  • string predicates: startswith(...), endswith(...), contains(...)
  • membership: is_in(iterable)
  • conditional/value helpers: then_else(then, else_), if_not_none_else(...)
  • numeric helpers: abs(), round(digits=0), floor()
  • column reductions: min(), max()

Examples:

expr = (t.mz.in_range(100.0, 300.0) & t.rt.is_not_none()) | t.name.contains("gluc")
subset = t.filter(expr)
t.add_column("flag", t.mz.is_none().then_else("missing", "ok"), str)

Additional examples for operations not shown elsewhere on this page:

# arithmetic + comparisons
arith = ((t.mz + 1.0) * 2.0 - 3.0) / 5.0
subset = t.filter((arith >= 50.0) & (arith <= 200.0) & (t.name != "water"))

# tolerance helper
subset = t.filter(t.mz.approx_equal(181.07, atol=0.02, rtol=0.0))

# string predicates
subset = t.filter(t.name.startswith("glu") | t.name.endswith("ose"))

# membership
subset = t.filter(t.name.is_in(["glucose", "caffeine"]))

# conditional/value helper
filled_rt = t.rt.if_not_none_else(-1.0)
t.add_column("rt_filled", filled_rt, float)

# numeric helpers
t.add_column("mz_abs", t.mz.abs(), float)
t.add_column("mz_round", t.mz.round(2), float)
t.add_column("mz_floor", t.mz.floor(), float)

# column reductions
t.add_column("mz_min", t.mz.min(), float)
t.add_column("mz_max", t.mz.max(), float)

Additional notes

Evaluation Priority And Parentheses

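Python's operator precedence is applied when the source is parsed, before any expression object is evaluated. In particular, & and | bind more tightly than comparisons, so an unparenthesized t.a > 3 | t.b < 10 is parsed as the chained comparison t.a > (3 | t.b) < 10 rather than as two comparisons joined by |.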
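The parse can be inspected without emzed. The following sketch uses only the standard library ast module to show how Python groups the unparenthesized and the parenthesized form; the names t.a and t.b are placeholders and are never evaluated.

```python
import ast

# Parse (do not evaluate) the unparenthesized form.
bad = ast.parse("t.a > 3 | t.b < 10", mode="eval").body

# Python sees ONE chained comparison with two operators: t.a > (...) < 10
is_chained = isinstance(bad, ast.Compare) and len(bad.ops) == 2

# The middle operand is the bitwise-or (3 | t.b), because | binds tighter than > and <.
middle = bad.comparators[0]
middle_is_bitor = isinstance(middle, ast.BinOp) and isinstance(middle.op, ast.BitOr)

# With parentheses, the top-level node is the | operation joining two comparisons.
good = ast.parse("(t.a > 3) | (t.b < 10)", mode="eval").body
good_is_or_of_comparisons = (
    isinstance(good, ast.BinOp)
    and isinstance(good.op, ast.BitOr)
    and isinstance(good.left, ast.Compare)
    and isinstance(good.right, ast.Compare)
)

print(is_chained, middle_is_bitor, good_is_or_of_comparisons)
```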

Always parenthesize boolean sub-expressions:

# recommended
expr = (t.a > 3) | (t.b < 10)
subset = t.filter(expr)
# avoid: without parentheses, | is applied before > and <
# t.filter(t.a > 3 | t.b < 10)

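Expression.__bool__ intentionally raises an error whenever an expression is used in a boolean context (for example with and, or, not, if, or the implicit and of a chained comparison). This catches missing parentheses and other precedence mistakes early.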
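The mechanism can be pictured with a toy class. This is a minimal sketch, not emzed's implementation: the class Expr, the helper _text, and the error message are all hypothetical, and only the idea of a raising __bool__ is taken from the text above.

```python
class Expr:
    """Toy stand-in for a column expression (hypothetical, not emzed's class)."""

    def __init__(self, text):
        self.text = text

    def __gt__(self, other):
        return Expr(f"({self.text} > {_text(other)})")

    def __lt__(self, other):
        return Expr(f"({self.text} < {_text(other)})")

    def __or__(self, other):
        return Expr(f"({self.text} | {_text(other)})")

    def __ror__(self, other):
        # Called for "3 | expr", where the left operand is a plain value.
        return Expr(f"({_text(other)} | {self.text})")

    def __bool__(self):
        raise TypeError("expression used in a boolean context; check parentheses")


def _text(value):
    return value.text if isinstance(value, Expr) else repr(value)


a, b = Expr("a"), Expr("b")

# Parenthesized: one combined expression is built, __bool__ is never called.
ok = (a > 3) | (b < 10)
print(ok.text)  # ((a > 3) | (b < 10))

# Unparenthesized: Python evaluates (a > (3 | b)) and ((3 | b) < 10).
# The implicit "and" calls bool() on the first comparison, which raises.
try:
    a > 3 | b < 10
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```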

Type Notes

For non-algebraic/object-like column types (for example PeakMap, nested Table, and many object payloads), arithmetic/comparison operators are not available as SQL expressions. Use apply(...) or explicit Python processing instead.

Apply function to column

apply(function, *args, ignore_nones=True, result_type=None)

Build a row-wise expression by applying a Python function.

Parameters:

  • function: Callable that receives args row-wise. Required.
  • args: Function inputs, typically column expressions or constants. Default: ().
  • ignore_nones: If True, rows with None in any input produce None without calling function. Default: True.
  • result_type: Deprecated compatibility parameter. It is currently ignored; set the target type when you add or replace the destination column. Default: None.

Returns:

Apply expression usable in column operations.

Applies a Python function row-wise to one or more input expressions and returns an expression that can be used in add_column/replace_column.

Example:

import emzed

t = emzed.Table.create_table(
    ["mz", "charge"],
    [float, int],
    rows=[[181.071, 1], [195.088, 1], [342.116, 2]],
)

def neutral_mass(mz, z):
    return mz * z

t.add_column("neutral_mass", t.apply(neutral_mass, t.mz, t.charge), float)
print(t)

Example output:

mz       charge  neutral_mass
float    int     float
-------  ------  ------------
181.071       1       181.071
195.088       1       195.088
342.116       2       684.232
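The ignore_nones behaviour described in the parameter list can be pictured in plain Python. This is a sketch of the documented semantics only, not of how emzed evaluates expressions: apply_rowwise is a hypothetical helper that works on equally long lists instead of table columns and does not handle constant arguments.

```python
def apply_rowwise(function, *columns, ignore_nones=True):
    """Plain-Python picture of a row-wise apply (hypothetical helper)."""
    results = []
    for row in zip(*columns):
        if ignore_nones and any(value is None for value in row):
            # A None input short-circuits: the function is not called.
            results.append(None)
        else:
            results.append(function(*row))
    return results


def neutral_mass(mz, z):
    return mz * z


def neutral_mass_or_zero(mz, z):
    # With ignore_nones=False the function must handle None itself.
    return 0.0 if mz is None or z is None else mz * z


mz = [181.071, None, 342.116]
charge = [1, 1, 2]

skipped = apply_rowwise(neutral_mass, mz, charge)
handled = apply_rowwise(neutral_mass_or_zero, mz, charge, ignore_nones=False)
print(skipped)
print(handled)
```

With the default ignore_nones=True the second row yields None and neutral_mass is never called for it; with ignore_nones=False the function receives the None and decides what to return.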

Grouped table operations

group_by(*columns, group_nones=False)

Create grouped expressions based on one or more columns.

Parameters:

  • columns: Grouping columns, for example t.a or t["b"]. Default: ().
  • group_nones: If False, rows with None in grouping columns are excluded from groups. Default: False.

Returns:

GroupBy helper (for example .min(...)).

Each aggregate is computed per group, and the result is written to every row of that group, so the table keeps its original number of rows.

Supported aggregates:

  • numeric aggregates: sum(expression), min(expression), max(expression), mean(expression), count(), std(expression), median(expression)
  • grouping helper: id()
  • boolean/null aggregates: all_false(...), any_false(...), all_true(...), any_true(...), all_none(...), any_none(...)
  • generic aggregate: aggregate(function, *args, ignore_nones=True)

Example:

import emzed

t = emzed.Table.create_table(
    ["group", "intensity"],
    [str, float],
    rows=[["A", 10.0], ["A", 12.5], ["B", 5.0], ["B", 7.0]],
)

t.add_column(
    "intensity_sum_per_group",
    t.group_by(t.group).sum(t.intensity),
    float,
)
t.add_column(
    "intensity_mean_per_group",
    t.group_by(t.group).mean(t.intensity),
    float,
)
print(t)

Example output:

group  intensity  intensity_sum_per_group  intensity_mean_per_group
str    float      float                    float
-----  ---------  -----------------------  ------------------------
A         10.000                   22.500                    11.250
A         12.500                   22.500                    11.250
B          5.000                   12.000                     6.000
B          7.000                   12.000                     6.000
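As the output shows, the aggregate is broadcast back to each row instead of collapsing the table to one row per group. A plain-Python sketch of that behaviour follows. The helper grouped_aggregate is hypothetical, and giving None to rows whose key is None when group_nones is False is an assumption: the page only states that such rows are excluded from groups.

```python
from collections import defaultdict
from statistics import mean


def grouped_aggregate(keys, values, reducer, group_nones=False):
    """Per-group reduction broadcast back to every row (hypothetical helper)."""
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        if key is None and not group_nones:
            continue  # row does not belong to any group
        groups[key].append(value)
    per_group = {key: reducer(members) for key, members in groups.items()}
    # ASSUMPTION: rows left out of every group receive None.
    return [per_group.get(key) for key in keys]


group = ["A", "A", "B", "B"]
intensity = [10.0, 12.5, 5.0, 7.0]

sums = grouped_aggregate(group, intensity, sum)
means = grouped_aggregate(group, intensity, mean)
print(sums)   # [22.5, 22.5, 12.0, 12.0]
print(means)  # [11.25, 11.25, 6.0, 6.0]
```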

Further examples:

# generate stable group id per group key
t.add_column("group_id", t.group_by(t.group).id(), int)

# boolean/null aggregates
t_bool = emzed.Table.create_table(
    ["group", "flag"],
    [str, bool],
    rows=[["A", True], ["A", False], ["B", None], ["B", False]],
)
t_bool.add_column(
    "all_true_per_group",
    t_bool.group_by(t_bool.group).all_true(t_bool.flag),
    bool,
)
t_bool.add_column(
    "any_true_per_group",
    t_bool.group_by(t_bool.group).any_true(t_bool.flag),
    bool,
)
t_bool.add_column(
    "all_none_per_group",
    t_bool.group_by(t_bool.group).all_none(t_bool.flag),
    bool,
)
t_bool.add_column(
    "any_none_per_group",
    t_bool.group_by(t_bool.group).any_none(t_bool.flag),
    bool,
)

# generic aggregate
t.add_column(
    "intensity_span_per_group",
    t.group_by(t.group).aggregate(
        lambda values: max(values) - min(values),
        t.intensity,
    ),
    float,
)
print(t)
print(t_bool)

Example output:

group  intensity  intensity_sum_per_group  intensity_mean_per_group  group_id  intensity_span_per_group
str    float      float                    float                     int       float
-----  ---------  -----------------------  ------------------------  --------  ------------------------
A         10.000                   22.500                    11.250         0                    2.500
A         12.500                   22.500                    11.250         0                    2.500
B          5.000                   12.000                     6.000         1                    2.000
B          7.000                   12.000                     6.000         1                    2.000

group  flag   all_true_per_group  any_true_per_group  all_none_per_group  any_none_per_group
str    bool   bool                bool                bool                bool
-----  -----  ------------------  ------------------  ------------------  ------------------
A      True   False               True                False               False
A      False  False               True                False               False
B      -      False               False               False               True
B      False  False               False               False               True
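The boolean and null columns of the second table can be reproduced with the following plain-Python predicates. They are a sketch chosen to match the output shown above, in which a None flag counts as neither true nor false; how emzed treats None in other situations is not covered here, and the function names simply mirror the aggregate names.

```python
def all_true(values):
    return all(v is True for v in values)


def any_true(values):
    return any(v is True for v in values)


def all_none(values):
    return all(v is None for v in values)


def any_none(values):
    return any(v is None for v in values)


# Flags per group, taken from the t_bool example.
groups = {"A": [True, False], "B": [None, False]}

summary = {
    name: (all_true(flags), any_true(flags), all_none(flags), any_none(flags))
    for name, flags in groups.items()
}
for name, row in summary.items():
    print(name, *row)
# A False True False False
# B False False False True
```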