Tables

This page highlights the most commonly used parts of the API and is not exhaustive. It covers the Table container and the core value types (MzType, RtType) used in table columns. Table expression syntax is documented on the dedicated Table Column Expressions page. For complete coverage, see the API Reference.

Table

Primary tabular container used across emzed.

Table Model

Each table column has three parts:

  1. a name (col_names)
  2. a type (col_types)
  3. a display format (col_formats)

Supported column types include standard Python scalars (int, float, str, bool) and emzed domain types such as PeakMap or nested Table.

For arbitrary Python data (for example dict, tuple, list, or custom objects), use column type object. In object columns, all pickleable Python data types are supported.

col_formats controls how values are displayed (for example in print(table) or GUI inspectors). It does not change the stored raw values.

Typical format values:

  • format strings such as %.5f, %.2f, %d
  • callable formatters, e.g. RtType uses lambda v: f"{v/60:.2f} m"
  • None to hide a column in printed/GUI views while keeping the data

Formats can be changed later with set_col_format and changed back if needed.
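
The three kinds of format values can be mimicked in plain Python. The sketch below is not emzed's implementation: render_cell is a hypothetical helper that only illustrates how a format string, a callable, and None affect the displayed text while the stored value stays untouched.

```python
def render_cell(value, fmt):
    """Return the display text for value, or None if the column is hidden.

    Hypothetical helper for illustration, not part of emzed.
    """
    if fmt is None:      # None hides the column in printed/GUI views
        return None
    if callable(fmt):    # callable formatter, e.g. the RtType lambda
        return fmt(value)
    return fmt % value   # printf-style format string such as "%.2f"


mz = 181.0707
rt_seconds = 740.4

print(render_cell(mz, "%.2f"))                             # 181.07
print(render_cell(rt_seconds, lambda v: f"{v/60:.2f} m"))  # 12.34 m
print(render_cell(mz, None))                               # None (hidden)
print(mz)                                                  # raw value unchanged
```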

Usage pattern (class-level constructors vs instance methods):

import emzed

# class-level constructor (staticmethod)
peaks = emzed.Table.create_table(
    ["name", "sum_formula"],
    [str, str],
    rows=[["glucose", "C6H12O6"], ["caffeine", "C8H10N4O2"]],
)

# class-level method
loaded = emzed.Table.load("peaks.table")

# instance methods
filtered = peaks.filter(peaks.name.contains("gluc"))
peaks.save("peaks.table", overwrite=True)

In-memory vs on-disk

emzed tables can live in RAM or on disk, backed by a SQLite database file.

  • Table.load(path) reads the whole file into memory.
  • Table.open(path) returns a lightweight handle to the on-disk file; no data is loaded until accessed. This lets you work with tables larger than your available RAM, for example in workflows that process many samples.

Use t.is_in_memory() to check which mode a table is in, and t.close() to release the file handle when done.

Several combining and consolidating operations accept a path= argument to write their result directly to disk instead of returning an in-memory table, for example Table.stack_tables(..., path=...) or view.consolidate(path=...).

See the out-of-memory processing example for a runnable workflow.

Table Attributes And Column Access

  • meta_data: table metadata dictionary wrapper
  • col_names: tuple of column names
  • col_types: tuple of Python/emzed column types
  • col_formats: tuple of column format definitions
  • dynamic column access: t.col_name is equivalent to t["col_name"]; the bracket form is often simpler for programmatic access
  • t.mz and t["mz"] are equivalent column expression objects
  • t.mz[0] (or t["mz"][0]) returns the first value in that column

Example:

import emzed

t = emzed.Table.create_table(
    ["name", "mz"],
    [str, float],
    rows=[["glucose", 181.071]],
)
print(t.col_names)
print(t.mz[0], t["mz"][0])
col_name = "mz"
print(t[col_name][0])  # useful for programmatic access
t.set_col_format("mz", "%.3f")
print(t)
t.set_col_format("mz", None)  # hidden in print/gui views, data is still there

Example output:

('name', 'mz')
181.071 181.071
181.071
name     mz
str      float
-------  -----
glucose  181.071

Table Methods

Class level constructors

create_table(col_names, col_types, col_formats=None, rows=None, title=None, meta_data=None, path=None) staticmethod

creates a table.

Parameters:

Name Type Description Default
col_names

list or tuple of strings.

required
col_types

list of types.

required
col_formats

list of formats using format specifiers like "%.2f". If not specified, emzed tries to guess appropriate formats based on column type and column name.

None
rows

list of lists.

None
title

table title as string.

None
meta_data

dictionary to manage user defined meta data.

None
path

path for the db backend, default is None to use the in-memory db backend.

None

Returns:

Type Description

emzed.Table.

to_table(name, values, type_, format_=None, title=None, meta_data=None, path=None)

generates a one-column Table from an iterable, e.g. from a list.

Parameters:

Name Type Description Default
name

name of the column.

required
values

iterable with column values.

required
type_

supported column types are int, float, bool, MzType, RtType, str, PeakMap, Table, object. In case you want to use Python objects like lists or dicts, use column type 'object' instead.

required
format_

is a format string such as "%d". To suppress visibility set format_ = None.

None
title

Table title as string.

None
meta_data

Python dictionary to assign meta data to the table.

None
path

Path for the db backend, use None for an in memory db backend.

None

Returns:

Type Description

emzed.Table

load(path) classmethod

loads table from disk into memory.

Parameters:

Name Type Description Default
path

path to file.

required

Returns:

Type Description

emzed.Table.

open(path) classmethod

opens table on disk without loading data into memory.

Parameters:

Name Type Description Default
path

path to file.

required

Returns:

Type Description

emzed.Table.

Column Access, Mutation, And Layout

get_column(name)

returns column expression object for column name.

Parameters:

Name Type Description Default
name

existing column name or "_index".

You can use t[name] or t.name instead.

required

add_column(name, what, type_, format_=not_specified, insert_before=None, insert_after=None)

adds a new column with name in place.

Parameters:

Name Type Description Default
name

the name of the new column.

required
what

either a list with the same length as table or an expression.

required
type_

supported column types are int, float, bool, MzType, RtType, str, PeakMap, Table, object. In case you want to use Python objects like lists or dicts, use column type 'object' instead.

required
format_

is a format string such as "%d" or an executable string with Python code. To suppress visibility set format_ = None. The default not_specified is a sentinel meaning "argument not provided" and is different from None. In that case the method determines a default format for the column type.

not_specified
insert_before

to add column name at a defined position, specify the column it should be placed to the left of, either by the name of an existing column or by an integer index (negative values are allowed).

None
insert_after

to add column name at a defined position, one can specify its position right-wise to column insert_after.

None
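
The difference between not_specified and None is the usual sentinel-object pattern. The following is a minimal sketch of that pattern, not emzed's code: the class _NotSpecified, the function resolve_format and the DEFAULT_FORMATS mapping are hypothetical names, and the default formats listed are made up for illustration.

```python
class _NotSpecified:
    """Sentinel type meaning 'argument not provided' (distinct from None)."""

    def __repr__(self):
        return "not_specified"


not_specified = _NotSpecified()

# Hypothetical defaults for illustration only; emzed's real defaults may differ.
DEFAULT_FORMATS = {int: "%d", float: "%f", str: "%s"}


def resolve_format(type_, format_=not_specified):
    """Pick the format to store for a new column."""
    if format_ is not_specified:
        # caller did not pass format_ at all: fall back to a type default
        return DEFAULT_FORMATS.get(type_, "%s")
    # caller passed something explicitly, including None (= hide the column)
    return format_


print(resolve_format(int))            # %d
print(resolve_format(int, None))      # None
print(resolve_format(float, "%.2f"))  # %.2f
```

Using a dedicated sentinel keeps None available as a meaningful value ("hide this column") while still allowing the argument to be omitted.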

add_column_with_constant_value(name, value, type_, format_=not_specified, insert_before=None, insert_after=None)

adds column name with the constant value value.

Parameters:

Name Type Description Default
name

new column name.

required
value

any of accepted types int, float, bool, MzType, RtType, str, PeakMap, Table.

required
type_

target column type.

required
format_

target column format. The default not_specified is a sentinel (different from None) and means "use the default format for type_".

not_specified
insert_before

insertion position for the new column.

None
insert_after

insertion position for the new column.

None

replace_column(name, what, type_=None, format_=not_specified)

replaces content of existing column name in place.

Parameters:

Name Type Description Default
name

the name of the existing column.

required
what

you can use a list with the same length as table or an expression.

required
type_

supported column types are int, float, bool, MzType, RtType, str, PeakMap, Table, object. In case you want to use Python objects like lists or dicts, use column type 'object' instead.

None
format_

is a format string such as "%d" or an executable string with Python code. To suppress visibility set format_ = None. The default not_specified is a sentinel meaning "argument not provided" and is different from None. In that case the method determines a default format for the column type.

The column keeps its existing position in the table.

not_specified

add_or_replace_column(name, what, type_=None, format_=not_specified, insert_before=None, insert_after=None)

replaces the content of column name if it exists, else name is added (in place).

Parameters:

Name Type Description Default
name

column name to replace or create.

required
what

either a list with the same length as table or an expression.

required
type_

target column type. If None and the column exists, the current column type is reused.

None
format_

target column format. The default not_specified is a sentinel (different from None) and means "use the existing format or the default format for type_".

not_specified
insert_before

insertion position used when the column is created.

None
insert_after

insertion position used when the column is created.

None

add_or_replace_column_with_constant_value(name, what, type_=None, format_=not_specified, insert_before=None, insert_after=None)

replaces the content of column name with the constant value what if name exists, else name is added (in place).

Parameters:

Name Type Description Default
name

column name to replace or create.

required
what

scalar value assigned to all rows.

required
type_

target column type. If None and the column exists, the current column type is reused.

None
format_

target column format. The default not_specified is a sentinel (different from None) and means "use the existing format or the default format for type_".

not_specified
insert_before

insertion position used when the column is created.

None
insert_after

insertion position used when the column is created.

None

replace_column_with_constant_value(name, what, type_=None, format_=not_specified)

replaces the content of column name with the constant value what.

Parameters:

Name Type Description Default
name

existing column name.

required
what

any of accepted types int, float, bool, MzType, RtType, str, PeakMap, Table.

required
type_

target column type. If None, the current column type is used.

None
format_

target column format. The default not_specified is a sentinel (different from None) and means "use the existing format or the default format for type_".

not_specified

drop_columns(*col_names)

removes columns in place.

Parameters:

Name Type Description Default
col_names

column names, either exact names or names containing wildcards such as ? and *.

Example: Table t with column names id, mz, mzmin, mzmax, sample_1k1, sample_1m1, sample_1k2

t.drop_columns('mz*', 'sample_1?1')

results in t with columns id, sample_1k2

()
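
The wildcard example for drop_columns can be reproduced with Python's fnmatch module. This is a sketch under the assumption that the ? and * wildcards behave like shell-style patterns; remaining_columns is a hypothetical helper, not an emzed function.

```python
from fnmatch import fnmatchcase


def remaining_columns(col_names, *patterns):
    """Return the names that match none of the given wildcard patterns."""
    return [
        name
        for name in col_names
        if not any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


col_names = ["id", "mz", "mzmin", "mzmax", "sample_1k1", "sample_1m1", "sample_1k2"]

# 'mz*' matches mz, mzmin, mzmax; 'sample_1?1' matches sample_1k1, sample_1m1
print(remaining_columns(col_names, "mz*", "sample_1?1"))  # ['id', 'sample_1k2']
```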

extract_columns(*col_names)

returns new Table with selected columns col_names.

Parameters:

Name Type Description Default
col_names

list or tuple with selected, existing column names.

()

rename_columns(**from_to)

Rename columns from current names to new names.

Parameters:

Name Type Description Default
from_to

Keyword arguments mapping old column names to new column names, for example a="b".

Example: t.rename_columns(a="b") renames column "a" to "b".

{}

rename_postfixes(**from_to)

Rename postfixes in column names using keyword arguments.

Each key is the old postfix and each value is the new postfix.

Example: t.rename_postfixes(__0="_zero") changes columns like a__0 and b__0 to a_zero and b_zero.
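
The renaming rule can be sketched on a plain list of column names. The function rename_postfixes_sketch below is hypothetical and only mirrors the documented example. How emzed resolves a name that matches several postfixes is not covered here; the sketch simply applies the first match.

```python
def rename_postfixes_sketch(col_names, **from_to):
    """Replace a trailing old postfix with the new one in each column name."""
    renamed = []
    for name in col_names:
        for old, new in from_to.items():
            if name.endswith(old):
                # cut off the old postfix and append the new one
                name = name[: len(name) - len(old)] + new
                break
        renamed.append(name)
    return renamed


print(rename_postfixes_sketch(["a__0", "b__0", "id"], __0="_zero"))
# ['a_zero', 'b_zero', 'id']
```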

set_col_format(col_name, format_)

sets format of column col_name to format format_.

Parameters:

Name Type Description Default
col_name

column name.

required
format_

accepted column format (see add_column).

required

Returns:

Type Description

None.

Row Selection, Ordering, Grouping, And Collapse

filter(condition)

creates a new table containing the rows that fulfil the given condition. Usage is similar to pandas query.

Parameters:

Name Type Description Default
condition

expression like t.a < 0 or t.a <= t.b.

required

Returns:

Type Description

emzed.Table with filtered rows.

sort_by(*col_names, ascending=True)

sorts the table by the given column names in the given order.

Parameters:

Name Type Description Default
col_names

one or more column names as separate arguments.

()
ascending

either a bool or a list/tuple of bools with one entry per specified column name.

True

Returns:

Type Description

emzed.Table.
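
How a per-column ascending list interacts with several sort columns can be illustrated on plain rows. This is a sketch of the general multi-key sorting technique, not emzed's implementation; sort_rows and its arguments are hypothetical.

```python
def sort_rows(rows, col_names, keys, ascending=True):
    """Sort rows (list of lists) by several columns with per-column direction."""
    if isinstance(ascending, bool):
        ascending = [ascending] * len(keys)
    if len(ascending) != len(keys):
        raise ValueError("need one ascending flag per sort column")

    result = list(rows)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields the combined ordering.
    for key, asc in reversed(list(zip(keys, ascending))):
        idx = col_names.index(key)
        result.sort(key=lambda row: row[idx], reverse=not asc)
    return result


rows = [["b", 2.0], ["a", 1.0], ["a", 3.0]]
# name ascending, then mz descending within equal names
print(sort_rows(rows, ["name", "mz"], ["name", "mz"], ascending=[True, False]))
# [['a', 3.0], ['a', 1.0], ['b', 2.0]]
```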

split_by(*col_names, keep_view=False)

Split a table into subtables by unique values of selected columns.

Parameters:

Name Type Description Default
col_names

Column names defining split groups.

()
keep_view

If True, return views. If False, return consolidated standalone tables.

False

Returns:

Type Description

List of Table subtables, one per unique group.
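
The grouping behind split_by can be sketched with a dictionary keyed by the values of the selected columns. split_rows is a hypothetical helper working on plain rows. Returning groups in order of first appearance is an assumption of this sketch, and emzed may order the subtables differently.

```python
def split_rows(rows, col_names, *keys):
    """Group rows by the unique value combinations of the key columns."""
    idxs = [col_names.index(key) for key in keys]
    groups = {}
    for row in rows:
        group_key = tuple(row[i] for i in idxs)
        groups.setdefault(group_key, []).append(row)
    # dicts preserve insertion order, so groups come in first-appearance order
    return list(groups.values())


col_names = ["sample", "mz"]
rows = [["s1", 181.07], ["s2", 195.09], ["s1", 203.05]]

for group in split_rows(rows, col_names, "sample"):
    print(group)
```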

collapse(*col_names, new_col_name='collapsed', path=None)

Collapse rows into grouped nested-table rows.

Parameters:

Name Type Description Default
col_names

Column names defining grouping keys.

()
new_col_name

Name of the column that stores collapsed subtables.

'collapsed'
path

Optional target database path for the result.

None

Returns:

Type Description

Collapsed emzed.Table with one nested table per group.

Joining And Combining Tables

join(other, expression=None, *, path=None, overwrite=False)

Join two tables.

Parameters:

Name Type Description Default
other

Right-hand side table.

required
expression

Optional join condition expression. If None, a full row-wise cross product is returned.

None
path

Optional target database path for the result.

None
overwrite

Whether an existing path may be overwritten.

False

Returns:

Type Description

Joined emzed.Table.

left_join(other, expression=None, *, path=None, overwrite=False)

Left-join two tables while keeping all rows from the left table.

Parameters:

Name Type Description Default
other

Right-hand side table.

required
expression

Optional join condition expression. If None, a full row-wise cross product is used.

None
path

Optional target database path for the result.

None
overwrite

Whether an existing path may be overwritten.

False

Returns:

Type Description

Joined emzed.Table with all left-side rows preserved.
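
The difference between join and left_join can be sketched on plain row lists. The helpers join_rows and left_join_rows are hypothetical, the condition is an ordinary Python callable rather than an emzed expression, and padding unmatched rows with None is an assumption of this sketch.

```python
def join_rows(left, right, condition=None):
    """Inner join: keep each (l, r) pair that satisfies the condition.

    With condition=None every pair is kept, i.e. the full cross product.
    """
    return [
        l + r
        for l in left
        for r in right
        if condition is None or condition(l, r)
    ]


def left_join_rows(left, right, right_width, condition=None):
    """Left join: like join_rows, but left rows without a match are kept."""
    result = []
    for l in left:
        matches = [l + r for r in right if condition is None or condition(l, r)]
        # unmatched left rows are padded with None (assumed placeholder)
        result.extend(matches or [l + [None] * right_width])
    return result


left = [["glucose", 181.07], ["caffeine", 195.09]]
right = [[181.07, "s1"]]


def same_mz(l, r):
    return abs(l[1] - r[0]) < 0.01


print(join_rows(left, right, same_mz))
print(left_join_rows(left, right, 2, same_mz))
```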

fast_join(other, col_name, col_name_other=None, atol=0.0, rtol=0.0, extra_condition=None, *, path=None, overwrite=False)

joins (combines) two tables based on comparing approximate equality of two numerical columns.

Parameters:

Name Type Description Default
other

second table for join.

required
col_name

column name to consider.

required
col_name_other

column name of other to consider in case it is different to col_name.

None
atol

absolute tolerance for approximate equality check.

0.0
rtol

relative tolerance for approximate equality check.

0.0
extra_condition

optional additional join expression that must also match for a row pair to be included.

None

Returns:

Type Description

emzed.Table.

Performance: In case other is significantly larger than self, it is recommended to swap the tables.

The approximate equality check for two numbers a and b is:

 abs(a - b) <= atol + rtol * abs(a)

So if you only need a comparison based on absolute tolerance you can set rtol to 0.0, and if you only need a relative tolerance check you can set atol to 0.0.
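
The tolerance check can be tried out directly in Python. The function approx_equal below is a hypothetical helper that restates the formula above. It is not an emzed function, and the numbers are made-up m/z values.

```python
def approx_equal(a, b, atol=0.0, rtol=0.0):
    """Approximate equality as stated above: abs(a - b) <= atol + rtol * abs(a)."""
    return abs(a - b) <= atol + rtol * abs(a)


# absolute tolerance only: difference of about 0.0002 is within 0.001
print(approx_equal(181.0710, 181.0712, atol=0.001))  # True

# relative tolerance only: 5 ppm of 181.071 is about 0.0009
print(approx_equal(181.0710, 181.0712, rtol=5e-6))   # True

# difference of about 0.01 exceeds the absolute tolerance
print(approx_equal(181.071, 181.081, atol=0.001))    # False

# the relative term uses abs(a) only, so the check is not symmetric
print(approx_equal(100.0, 111.0, rtol=0.1))          # False
print(approx_equal(111.0, 100.0, rtol=0.1))          # True
```

Because the relative term depends on a alone, swapping the two values can change the outcome when rtol is non-zero.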

fast_left_join(other, col_name, col_name_other=None, atol=0.0, rtol=0.0, extra_condition=None, *, path=None, overwrite=False)

joins (combines) two tables based on comparing approximate equality of two numerical columns.

In contrast to fast_join, this method also includes non-matching rows from self.

Parameters:

Name Type Description Default
other

second table for join.

required
col_name

column name to consider.

required
col_name_other

column name of other to consider in case it is different to col_name.

None
atol

absolute tolerance for approximate equality check.

0.0
rtol

relative tolerance for approximate equality check.

0.0
extra_condition

optional additional join expression that must also match for a row pair to be included.

None

Returns:

Type Description

emzed.Table.

Performance: In case other is significantly larger than self, it is recommended to swap the tables.

The approximate equality check for two numbers a and b is:

 abs(a - b) <= atol + rtol * abs(a)

So if you only need a comparison based on absolute tolerance you can set rtol to 0.0, and if you only need a relative tolerance check you can set atol to 0.0.

stack_tables(tables, path=None, overwrite=False) staticmethod

builds a single Table from list or tuple of Tables.

Parameters:

Name Type Description Default
tables

list or tuple of Tables. All tables must have the same column names with the same types and formats.

required
path

If specified the result will be a Table with a db file backend, else the result will be managed in memory.

None
overwrite

Indicate if an already existing database file should be overwritten.

False

Returns:

Type Description

emzed.Table.

Storage, Export, And Summary

save(path, *, overwrite=False)

save table to a file.

Parameters:

Name Type Description Default
path

path describing target location.

required
overwrite

If set to True an existing file will be overwritten, else an exception will be thrown.

False

save_csv(path, delimiter=';', as_printed=False, dash_is_none=True, *, overwrite=False)

saves Table as csv in path.

Parameters:

Name Type Description Default
path

specifies path of the file. The path must end with .csv.

required
delimiter

Alias for sep. Default value is set to Excel dialect ';'.

';'
as_printed

If True, formatted values will be stored. Note that format settings can lead to information loss, e.g. if the column format is set to %.2f, only the first 2 decimal places will be saved.

False
dash_is_none

if True, missing values are written as - when as_printed is enabled.

True
overwrite

If set to True an existing file will be overwritten, else an exception will be thrown.

False
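
The effect of as_printed and dash_is_none can be illustrated with the standard csv module. This sketch does not call emzed: write_csv is a hypothetical helper that works on plain rows and printf-style formats. Writing an empty cell when dash_is_none is False is an assumption of the sketch, not documented behaviour.

```python
import csv
import io


def write_csv(col_names, rows, formats, as_printed=False, dash_is_none=True,
              delimiter=";"):
    """Return CSV text with raw values, or formatted values if as_printed."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(col_names)
    for row in rows:
        if as_printed:
            cells = []
            for value, fmt in zip(row, formats):
                if value is None:
                    # missing values become "-" (empty cell is an assumption)
                    cells.append("-" if dash_is_none else "")
                else:
                    cells.append(fmt % value)  # formatting may drop digits
            row = cells
        writer.writerow(row)
    return buffer.getvalue()


col_names = ["name", "mz"]
formats = ["%s", "%.2f"]
rows = [["glucose", 181.07112], ["unknown", None]]

print(write_csv(col_names, rows, formats))                   # raw values kept
print(write_csv(col_names, rows, formats, as_printed=True))  # mz cut to 181.07
```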

save_excel(path, *, overwrite=False)

saves Table as xls or xlsx in path.

Parameters:

Name Type Description Default
path

specifies path of the file. The path must end with .xls or .xlsx.

required

to_pandas()

converts the table to a pandas DataFrame object.

consolidate(path=None, *, overwrite=False)

consolidates if underlying database table is a view.

Parameters:

Name Type Description Default
path

If specified the result will be a Table with a db file backend, else the result will be managed in memory.

None
overwrite

Indicate if an already existing database file should be overwritten.

False

Returns:

Type Description

emzed.Table.

summary()

is_mutable()

returns boolean value to show whether the content of a Table is mutable.

MzType

Bases: float

Represents Mass-to-Charge ratio (m/z). Inherits from float and provides high-precision formatting (6 decimal places) in tables.

RtType

Bases: float

Represents Retention Time in seconds. Inherits from float and provides specialized formatting in tables (e.g., '12.34 m').