Table Column Expressions¶
This page covers the most commonly used parts of the API and is not exhaustive.
It focuses on expression-based table operations (apply and group_by).
For complete coverage, see the API Reference.
Expression Basics¶
Table columns are exposed as expression objects. You can use either
attribute-style access (t.mz) or item access (t["mz"]).
import emzed
t = emzed.Table.create_table(
["name", "mz", "rt"],
[str, float, float],
rows=[["glucose", 181.071, 120.0], ["caffeine", 195.088, 180.0]],
)
# column expressions
expr1 = t.mz > 190.0
expr2 = t["rt"] < 200.0
subset = t.filter(expr1 & expr2)
print(subset)
Supported Operations¶
- arithmetic: +, -, *, /
- comparisons: ==, !=, <, <=, >, >=
- boolean composition: &, | (use these, not and/or)
- range/tolerance helpers: in_range(left, right), approx_equal(other, atol, rtol)
- null predicates: is_none(), is_not_none()
- string predicates: startswith(...), endswith(...), contains(...)
- membership: is_in(iterable)
- conditional/value helpers: then_else(then, else_), if_not_none_else(...)
- numeric helpers: abs(), round(digits=0), floor()
- column reductions: min(), max()
Examples:
expr = (t.mz.in_range(100.0, 300.0) & t.rt.is_not_none()) | t.name.contains("gluc")
subset = t.filter(expr)
Additional examples for operations not shown elsewhere on this page:
# arithmetic + comparisons
arith = ((t.mz + 1.0) * 2.0 - 3.0) / 5.0
subset = t.filter((arith >= 50.0) & (arith <= 200.0) & (t.name != "water"))
# tolerance helper
subset = t.filter(t.mz.approx_equal(181.07, atol=0.02, rtol=0.0))
# string predicates
subset = t.filter(t.name.startswith("glu") | t.name.endswith("ose"))
# membership
subset = t.filter(t.name.is_in(["glucose", "caffeine"]))
# conditional/value helper
filled_rt = t.rt.if_not_none_else(-1.0)
t.add_column("rt_filled", filled_rt, float)
# numeric helpers
t.add_column("mz_abs", t.mz.abs(), float)
t.add_column("mz_round", t.mz.round(2), float)
t.add_column("mz_floor", t.mz.floor(), float)
# column reductions
t.add_column("mz_min", t.mz.min(), float)
t.add_column("mz_max", t.mz.max(), float)
Additional notes¶
Evaluation Priority And Parentheses¶
Python applies its own operator precedence before any expression object is evaluated.
In particular, & and | bind more tightly than comparisons, so the unparenthesized
t.mz > 190.0 & t.rt < 200.0 is parsed as the chained comparison
t.mz > (190.0 & t.rt) < 200.0.
Always parenthesize boolean sub-expressions, for example
(t.mz > 190.0) & (t.rt < 200.0).
Expression.__bool__ intentionally raises an error when an expression is used in a
boolean context (as happens with and, or, not, and chained comparisons), which
helps catch precedence mistakes early.
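The precedence pitfall can be reproduced without emzed. The following is a minimal sketch: `Expr` is a hypothetical toy class written for this illustration, not emzed's actual expression type, and its error message is made up. It only shows how Python itself parses the two spellings.

```python
class Expr:
    """Toy stand-in for a column expression (hypothetical, not emzed's class)."""

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return self.text

    def _binary(self, op, other):
        # Record the operation as text so the parse order becomes visible.
        return Expr(f"({self!r} {op} {other!r})")

    def __gt__(self, other):
        return self._binary(">", other)

    def __lt__(self, other):
        return self._binary("<", other)

    def __and__(self, other):
        return self._binary("&", other)

    def __rand__(self, other):
        # Called for `190.0 & expr`, because float does not implement `&`.
        return Expr(f"({other!r} & {self!r})")

    def __bool__(self):
        # Assumed behavior modeled after the description above.
        raise TypeError("expression used in boolean context; add parentheses")


mz, rt = Expr("mz"), Expr("rt")

# Parenthesized: two comparisons combined with `&`.
ok = (mz > 190.0) & (rt < 200.0)
print(ok)

# Unparenthesized: Python reads `mz > (190.0 & rt) < 200.0`, a chained
# comparison, which calls bool() on the first comparison result.
try:
    bad = mz > 190.0 & rt < 200.0
except TypeError as exc:
    error_message = str(exc)
    print("rejected:", error_message)
```

The chained comparison is what triggers `__bool__`: Python evaluates `a > b < c` as `a > b and b < c`, and `and` needs a truth value for its left operand.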
Type Notes¶
For non-algebraic/object-like column types (for example PeakMap, nested
Table, and many object payloads), arithmetic/comparison operators are not
available as SQL expressions. Use apply(...) or explicit Python processing
instead.
Apply function to column¶
apply(function, *args, ignore_nones=True, result_type=None)¶
Build a row-wise expression by applying a Python function.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| function | | Callable that receives | required |
| args | | Function inputs, typically column expressions or constants. | () |
| ignore_nones | | If | True |
| result_type | | Deprecated compatibility parameter. It is currently ignored; set the target type when you add or replace the destination column. | None |
Returns:
| Type | Description |
|---|---|
| | |
Applies a Python function row-wise to one or more input expressions and returns
an expression that can be used in add_column/replace_column.
Example:
import emzed
t = emzed.Table.create_table(
["mz", "charge"],
[float, int],
rows=[[181.071, 1], [195.088, 1], [342.116, 2]],
)
def neutral_mass(mz, z):
return mz * z
t.add_column("neutral_mass", t.apply(neutral_mass, t.mz, t.charge), float)
print(t)
Example output:
mz charge neutral_mass
float int float
------- ------ ------------
181.071 1 181.071
195.088 1 195.088
342.116 2 684.232
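The row-wise evaluation can be modeled in plain Python. This is only a sketch: `apply_rowwise` is a hypothetical helper, not part of emzed, and it assumes that ignore_nones=True means a row with any None input yields None without calling the function. That assumption is not confirmed by this page.

```python
def apply_rowwise(function, *columns, ignore_nones=True):
    """Hypothetical pure-Python model of a row-wise apply (not emzed code)."""
    results = []
    for row in zip(*columns):
        if ignore_nones and any(value is None for value in row):
            # Assumed behavior: skip the call and propagate None.
            results.append(None)
        else:
            results.append(function(*row))
    return results


def neutral_mass(mz, z):
    return mz * z


mz_values = [181.071, 195.088, None]
charges = [1, 1, 2]

masses = apply_rowwise(neutral_mass, mz_values, charges)
print(masses)
```

Under this model, passing ignore_nones=False would call `neutral_mass(None, 2)` for the last row, so the function itself would have to handle missing values.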
Grouped table operations¶
group_by(*colums, group_nones=False)¶
Create grouped expressions based on one or more columns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| colums | | Grouping columns, for example | () |
| group_nones | | If | False |
Returns:
| Type | Description |
|---|---|
| | |
Creates grouped expressions for per-group aggregates.
Supported aggregates:
- numeric aggregates: sum(expression), min(expression), max(expression), mean(expression), count(), std(expression), median(expression)
- grouping helper: id()
- boolean/null aggregates: all_false(...), any_false(...), all_true(...), any_true(...), all_none(...), any_none(...)
- generic aggregate: aggregate(function, *args, ignore_nones=True)
Example:
import emzed
t = emzed.Table.create_table(
["group", "intensity"],
[str, float],
rows=[["A", 10.0], ["A", 12.5], ["B", 5.0], ["B", 7.0]],
)
t.add_column(
"intensity_sum_per_group",
t.group_by(t.group).sum(t.intensity),
float,
)
t.add_column(
"intensity_mean_per_group",
t.group_by(t.group).mean(t.intensity),
float,
)
print(t)
Example output:
group intensity intensity_sum_per_group intensity_mean_per_group
str float float float
----- --------- ----------------------- ------------------------
A 10.000 22.500 11.250
A 12.500 22.500 11.250
B 5.000 12.000 6.000
B 7.000 12.000 6.000
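As the output shows, each per-group aggregate is repeated on every row of its group. A minimal plain-Python sketch of that pattern follows; `broadcast_aggregate` is a hypothetical helper used only for illustration and says nothing about how emzed computes the result internally.

```python
from collections import defaultdict
from statistics import fmean


def broadcast_aggregate(keys, values, function):
    """Hypothetical helper: aggregate values per key, repeat the result per row."""
    grouped = defaultdict(list)
    for key, value in zip(keys, values):
        grouped[key].append(value)
    # One aggregate per group ...
    per_group = {key: function(items) for key, items in grouped.items()}
    # ... looked up again for every input row, preserving row order.
    return [per_group[key] for key in keys]


groups = ["A", "A", "B", "B"]
intensities = [10.0, 12.5, 5.0, 7.0]

sums = broadcast_aggregate(groups, intensities, sum)
means = broadcast_aggregate(groups, intensities, fmean)
print(sums)   # [22.5, 22.5, 12.0, 12.0]
print(means)  # [11.25, 11.25, 6.0, 6.0]
```

Passing any callable that reduces a list to a single value works the same way, which mirrors the shape of the generic aggregate(function, ...) entry in the list above.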
Further examples:
# generate stable group id per group key
t.add_column("group_id", t.group_by(t.group).id(), int)
# boolean/null aggregates
t_bool = emzed.Table.create_table(
["group", "flag"],
[str, bool],
rows=[["A", True], ["A", False], ["B", None], ["B", False]],
)
t_bool.add_column(
"all_true_per_group",
t_bool.group_by(t_bool.group).all_true(t_bool.flag),
bool,
)
t_bool.add_column(
"any_true_per_group",
t_bool.group_by(t_bool.group).any_true(t_bool.flag),
bool,
)
t_bool.add_column(
"all_none_per_group",
t_bool.group_by(t_bool.group).all_none(t_bool.flag),
bool,
)
t_bool.add_column(
"any_none_per_group",
t_bool.group_by(t_bool.group).any_none(t_bool.flag),
bool,
)
# generic aggregate
t.add_column(
"intensity_span_per_group",
t.group_by(t.group).aggregate(
lambda values: max(values) - min(values),
t.intensity,
),
float,
)
print(t)
print(t_bool)
Example output:
group intensity intensity_sum_per_group intensity_mean_per_group group_id intensity_span_per_group
str float float float int float
----- --------- ----------------------- ------------------------ -------- ------------------------
A 10.000 22.500 11.250 0 2.500
A 12.500 22.500 11.250 0 2.500
B 5.000 12.000 6.000 1 2.000
B 7.000 12.000 6.000 1 2.000
group flag all_true_per_group any_true_per_group all_none_per_group any_none_per_group
str bool bool bool bool bool
----- ----- ------------------ ------------------ ------------------ ------------------
A True False True False False
A False False True False False
B - False False False True
B False False False False True
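The boolean/null columns above can be reproduced in plain Python to make the handling of the missing flag in group B explicit. This sketch assumes that None counts as "not true" for all_true and any_true; that rule matches the output shown here but is an assumption about emzed's behavior in general. `summarize` is a hypothetical helper.

```python
from collections import defaultdict

rows = [("A", True), ("A", False), ("B", None), ("B", False)]

# Collect the flags of each group.
flags_by_group = defaultdict(list)
for group, flag in rows:
    flags_by_group[group].append(flag)


def summarize(flags):
    """Hypothetical per-group summary; None is treated as 'not true' (assumption)."""
    return {
        "all_true": all(flag is True for flag in flags),
        "any_true": any(flag is True for flag in flags),
        "all_none": all(flag is None for flag in flags),
        "any_none": any(flag is None for flag in flags),
    }


per_group = {group: summarize(flags) for group, flags in flags_by_group.items()}

# Repeat the group summary on every row, as in the table output.
per_row = [per_group[group] for group, _ in rows]
for (group, flag), summary in zip(rows, per_row):
    print(group, flag, summary)
```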