emzed.table

Table

col_formats property

Column formats.

Returns:

Type Description

tuple of format specifiers.

col_names property

Column names.

Returns:

Type Description

tuple of strings.

col_types property

Column types.

Returns:

Type Description

tuple of types.

rows property

Returns:

Type Description

All rows as list of tuples.

unique_id property

computes a unique identifier based on table content and meta data.

Returns:

Type Description

unique identifier as string.

add_column(name, what, type_, format_=not_specified, insert_before=None, insert_after=None)

adds a new column with name in place.

Parameters:

Name Type Description Default
name

the name of the new column.

required
what

either a list with the same length as table or an expression.

required
type_

supported column types are int, float, bool, MzType, RtType, str, PeakMap, Table, object. In case you want to use Python objects like lists or dicts, use column type object instead.

required
format_

a format string such as "%d" or an executable string with Python code. To suppress visibility set format_ = None. The default not_specified is a sentinel meaning "argument not provided" and is different from None. When format_ is left as not_specified, the method determines a default format for the column type.

not_specified
insert_before

to add column name at a defined position, specify insert_before as the name of an existing column or as an integer index (negative values are allowed). The new column is placed to the left of that column.

None
insert_after

to add column name at a defined position, specify insert_after. The new column is placed to the right of that column.

None
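The difference between not_specified and None can be illustrated with a minimal sketch of the sentinel pattern. This is not emzed's implementation: NOT_SPECIFIED, DEFAULT_FORMATS and resolve_format are hypothetical names, and the default formats are chosen only for illustration.

```python
# Hypothetical stand-in for emzed's not_specified sentinel.
NOT_SPECIFIED = object()

# Hypothetical per-type default formats, for illustration only.
DEFAULT_FORMATS = {int: "%d", float: "%f", str: "%s"}


def resolve_format(type_, format_=NOT_SPECIFIED):
    """Return the format to use for a column of the given type."""
    if format_ is NOT_SPECIFIED:
        # Argument was not provided: fall back to a default for the type.
        return DEFAULT_FORMATS.get(type_, "%s")
    # Explicit value, including None (which suppresses visibility).
    return format_


print(resolve_format(int))            # default looked up for int
print(resolve_format(int, None))      # None is passed through unchanged
print(resolve_format(float, "%.2f"))  # explicit format wins
```

Using a dedicated sentinel object keeps None available as a meaningful value ("hide this column") while still allowing the argument to be omitted.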

add_column_with_constant_value(name, value, type_, format_=not_specified, insert_before=None, insert_after=None)

adds column name with the same constant value value in every row.

Parameters:

Name Type Description Default
name

new column name.

required
value

any of accepted types int, float, bool, MzType, RtType, str, PeakMap, Table.

required
type_

target column type.

required
format_

target column format. The default not_specified is a sentinel (different from None) and means "use the default format for type_".

not_specified
insert_before

insertion position for the new column.

None
insert_after

insertion position for the new column.

None

add_enumeration(col_name='id', insert_before=None, insert_after=None, start_with=0)

adds an enumerated column to the table in place, by default as the first column.

Parameters:

Name Type Description Default
col_name

name of added column. Default name is id.

'id'
insert_before

to add column col_name at a defined position, specify the name of an existing column or an integer index (negative values are allowed). Setting insert_before and insert_after at the same time is not allowed.

None
insert_after

to add column col_name at a defined position, specify the name of an existing column or an integer index (negative values are allowed). Setting insert_before and insert_after at the same time is not allowed.

None
start_with

start value for creating the ids. default value is 0.

0
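How insert_before and insert_after might translate into a column position can be sketched in plain Python. The helper resolve_insert_index is hypothetical, and the sketch assumes that negative indices behave like Python list indices; emzed's actual handling may differ.

```python
def resolve_insert_index(col_names, insert_before=None, insert_after=None):
    """Return the list index at which a new column would be inserted."""
    if insert_before is not None and insert_after is not None:
        raise ValueError("insert_before and insert_after are mutually exclusive")
    if insert_before is None and insert_after is None:
        return 0  # add_enumeration places the new column first by default

    ref = insert_before if insert_before is not None else insert_after
    if isinstance(ref, str):
        pos = col_names.index(ref)  # raises ValueError for unknown names
    else:
        # Assumption: negative indices count from the end, as in Python lists.
        pos = range(len(col_names))[ref]
    return pos if insert_before is not None else pos + 1


cols = ["mz", "rt", "area"]
print(resolve_insert_index(cols))                      # 0
print(resolve_insert_index(cols, insert_before="rt"))  # 1
print(resolve_insert_index(cols, insert_after=-1))     # 3
```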

add_or_replace_column(name, what, type_=None, format_=not_specified, insert_before=None, insert_after=None)

replaces the content of column name if it exists, else name is added (in place).

Parameters:

Name Type Description Default
name

column name to replace or create.

required
what

either a list with the same length as table or an expression.

required
type_

target column type. If None and the column exists, the current column type is reused.

None
format_

target column format. The default not_specified is a sentinel (different from None) and means "use the existing format or the default format for type_".

not_specified
insert_before

insertion position used when the column is created.

None
insert_after

insertion position used when the column is created.

None

add_or_replace_column_with_constant_value(name, what, type_=None, format_=not_specified, insert_before=None, insert_after=None)

replaces the content of column name with a single constant value if name exists, else name is added (in place).

Parameters:

Name Type Description Default
name

column name to replace or create.

required
what

scalar value assigned to all rows.

required
type_

target column type. If None and the column exists, the current column type is reused.

None
format_

target column format. The default not_specified is a sentinel (different from None) and means "use the existing format or the default format for type_".

not_specified
insert_before

insertion position used when the column is created.

None
insert_after

insertion position used when the column is created.

None

add_row(row)

adds row.

Parameters:

Name Type Description Default
row

list or tuple of values. Its length must match the number of columns.

required

apply(function, *args, ignore_nones=True, result_type=None)

Build a row-wise expression by applying a Python function.

Parameters:

Name Type Description Default
function

Callable that receives args row-wise.

required
args

Function inputs, typically column expressions or constants.

()
ignore_nones

If True, rows with None in any input produce None without calling function.

True
result_type

Deprecated compatibility parameter. It is currently ignored; set the target type when you add/replace the destination column.

None

Returns:

Type Description

Apply expression usable in column operations.
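The effect of ignore_nones can be sketched with plain lists standing in for columns. The helper apply_rowwise is hypothetical and only mimics the documented None handling; it is not how emzed evaluates expressions, and it does not cover constant arguments.

```python
def apply_rowwise(function, *columns, ignore_nones=True):
    """Call function once per row, skipping rows that contain None."""
    results = []
    for values in zip(*columns):
        if ignore_nones and any(v is None for v in values):
            # The function is not called; the result for this row is None.
            results.append(None)
        else:
            results.append(function(*values))
    return results


mz = [100.0, None, 300.0]
charge = [1, 2, None]
result = apply_rowwise(lambda m, z: m * z, mz, charge)
print(result)  # [100.0, None, None]
```

With ignore_nones=False the function itself would have to deal with None inputs.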

collapse(*col_names, new_col_name='collapsed', path=None)

Collapse rows into grouped nested-table rows.

Parameters:

Name Type Description Default
col_names

Column names defining grouping keys.

()
new_col_name

Name of the column that stores collapsed subtables.

'collapsed'
path

Optional target database path for the result.

None

Returns:

Type Description

Collapsed emzed.Table with one nested table per group.
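The grouping idea behind collapse can be sketched with tuples and lists. The helper collapse_rows is hypothetical; it assumes that each result row consists of the grouping key followed by the rows of that group, which approximates but does not reproduce the nested Table objects emzed creates.

```python
def collapse_rows(col_names, rows, *group_cols):
    """Group rows by the given columns; one output row per group."""
    key_idx = [col_names.index(c) for c in group_cols]
    groups = {}
    for row in rows:
        key = tuple(row[i] for i in key_idx)
        groups.setdefault(key, []).append(row)
    # Each output row: grouping key values plus the list of member rows.
    return [key + (members,) for key, members in groups.items()]


col_names = ["sample", "mz"]
rows = [("s1", 100.0), ("s2", 200.0), ("s1", 150.0)]
collapsed = collapse_rows(col_names, rows, "sample")
for row in collapsed:
    print(row)
```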

consolidate(path=None, *, overwrite=False)

consolidates if underlying database table is a view.

Parameters:

Name Type Description Default
path

If specified the result will be a Table with a db file backend, else the result will be managed in memory.

None
overwrite

Indicate if an already existing database file should be overwritten.

False

Returns:

Type Description

emzed.Table.

create_table(col_names, col_types, col_formats=None, rows=None, title=None, meta_data=None, path=None) staticmethod

creates a table.

Parameters:

Name Type Description Default
col_names

list or tuple of strings.

required
col_types

list of types.

required
col_formats

list of formats using format specifiers like "%.2f". If not specified emzed tries to guess appropriate formats based on column type and column name.

None
rows

list of lists.

None
title

table title as string.

None
meta_data

dictionary to manage user defined meta data.

None
path

path for the db backend, default is None to use the in-memory db backend.

None

Returns:

Type Description

emzed.Table.

drop_columns(*col_names)

removes columns in place.

Parameters:

Name Type Description Default
col_names

column names, either exact names or names containing wildcards like ? and *.

Example: Table t with colnames id, mz, mzmin, mzmax, sample_1k1, sample_1m1, sample_1k2

t.drop_columns('mz*', 'sample_1?1')

results in t with columns id, sample_1k2

()
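The wildcard example above can be reproduced with the standard library, assuming the patterns follow shell-style matching as implemented by fnmatch. The helper remaining_columns is hypothetical and only computes which names would survive; it does not touch a Table.

```python
from fnmatch import fnmatchcase


def remaining_columns(col_names, *patterns):
    """Return the names that match none of the given wildcard patterns."""
    return [
        name
        for name in col_names
        if not any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


col_names = ["id", "mz", "mzmin", "mzmax", "sample_1k1", "sample_1m1", "sample_1k2"]
kept = remaining_columns(col_names, "mz*", "sample_1?1")
print(kept)  # ['id', 'sample_1k2']
```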

extend(other, path=None, overwrite=False)

appends the rows of another compatible table in place.

Parameters:

Name Type Description Default
other

table with the same columns, types, and formats.

required
path

unused legacy argument kept for API compatibility.

None
overwrite

unused legacy argument kept for API compatibility.

False

extract_columns(*col_names)

returns new Table with selected columns col_names.

Parameters:

Name Type Description Default
col_names

list or tuple with selected, existing column names.

()

fast_join(other, col_name, col_name_other=None, atol=0.0, rtol=0.0, extra_condition=None, *, path=None, overwrite=False)

joins (combines) two tables based on comparing approximate equality of two numerical columns.

Parameters:

Name Type Description Default
other

second table for join.

required
col_name

column name to consider.

required
col_name_other

column name of other to consider in case it is different to col_name.

None
atol

absolute tolerance for approximate equality check.

0.0
rtol

relative tolerance for approximate equality check.

0.0
extra_condition

optional additional join expression that must also match for a row pair to be included.

None

Returns:

Type Description

emzed.Table.

Performance: In case other is significantly larger than self, it is recommended to swap the tables.

The approximate equality check for two numbers a and b is:

 abs(a - b) <= atol + rtol * abs(a)

So if you only need a comparison based on absolute tolerance you can set rtol to 0.0, and if you only need a relative tolerance check you can set atol to 0.0.
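The tolerance formula can be tried out directly in plain Python. The function approx_equal and the quadratic naive_join below are illustrative only and say nothing about how fast_join is implemented; the sketch also assumes that a is the value taken from self.

```python
def approx_equal(a, b, atol=0.0, rtol=0.0):
    """Approximate equality as documented: |a - b| <= atol + rtol * |a|."""
    return abs(a - b) <= atol + rtol * abs(a)


def naive_join(left, right, atol=0.0, rtol=0.0):
    """Pair up all values from left and right that are approximately equal."""
    return [(a, b) for a in left for b in right if approx_equal(a, b, atol, rtol)]


# Absolute tolerance only (rtol stays 0.0).
print(approx_equal(100.0, 100.004, atol=0.005))  # True
# Relative tolerance only (atol stays 0.0): allowed deviation is 1e-5 * 100 = 0.001.
print(approx_equal(100.0, 100.004, rtol=1e-5))   # False

pairs = naive_join([100.0, 200.0], [100.004, 250.0, 199.999], atol=0.005)
print(pairs)  # [(100.0, 100.004), (200.0, 199.999)]
```

Note that the relative term scales with abs(a) only, so the check is not symmetric in a and b when rtol is non-zero.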

fast_left_join(other, col_name, col_name_other=None, atol=0.0, rtol=0.0, extra_condition=None, *, path=None, overwrite=False)

joins (combines) two tables based on comparing approximate equality of two numerical columns.

In contrast to fast_join this method will include also non-matching rows from self.

Parameters:

Name Type Description Default
other

second table for join.

required
col_name

column name to consider.

required
col_name_other

column name of other to consider in case it is different to col_name.

None
atol

absolute tolerance for approximate equality check.

0.0
rtol

relative tolerance for approximate equality check.

0.0
extra_condition

optional additional join expression that must also match for a row pair to be included.

None

Returns:

Type Description

emzed.Table.

Performance: In case other is significantly larger than self, it is recommended to swap the tables.

The approximate equality check for two numbers a and b is:

 abs(a - b) <= atol + rtol * abs(a)

So if you only need a comparison based on absolute tolerance you can set rtol to 0.0, and if you only need a relative tolerance check you can set atol to 0.0.
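The difference to an inner join can be sketched in plain Python. The helper naive_left_join is hypothetical; it assumes that rows of self without a match are kept and padded with None, as is usual for left joins, and it is not the algorithm emzed uses.

```python
def naive_left_join(left, right, atol=0.0, rtol=0.0):
    """Left join on approximate equality; unmatched left values get None."""
    result = []
    for a in left:
        matches = [b for b in right if abs(a - b) <= atol + rtol * abs(a)]
        if matches:
            result.extend((a, b) for b in matches)
        else:
            # No partner found: keep the left value, fill the right side.
            result.append((a, None))
    return result


joined = naive_left_join([100.0, 150.0], [100.004, 250.0], atol=0.005)
print(joined)  # [(100.0, 100.004), (150.0, None)]
```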

filter(condition)

creates a new table containing only the rows that fulfil the given condition. Usage is similar to pandas query.

Parameters:

Name Type Description Default
condition

expression like t.a < 0 or t.a <= t.b.

required

Returns:

Type Description

emzed.Table with filtered rows.

from_pandas(df, col_names=None, col_types=None, col_formats=None) staticmethod

converts pandas data frame into emzed Table.

Parameters:

Name Type Description Default
df

pandas data frame.

required
col_names

list of column names, can be used to override data frame column names.

None
col_types

list of column types, if not provided emzed determines types from column contents and names.

None
col_formats

list of column formats, if not provided emzed determines formats from column contents and names.

None

Returns:

Type Description

emzed.Table.

get_column(name)

returns column expression object for column name.

Parameters:

Name Type Description Default
name

existing column name or "_index".

You can use t[name] or t.name instead.

required

group_by(*colums, group_nones=False)

Create grouped expressions based on one or more columns.

Parameters:

Name Type Description Default
colums

Grouping columns, for example t.a or t["b"].

()
group_nones

If False, rows with None in grouping columns are excluded from groups.

False

Returns:

Type Description

GroupBy helper (for example .min(...)).

is_mutable()

returns boolean value to show whether the content of a Table is mutable.

join(other, expression=None, *, path=None, overwrite=False)

Join two tables.

Parameters:

Name Type Description Default
other

Right-hand side table.

required
expression

Optional join condition expression. If None, a full row-wise cross product is returned.

None
path

Optional target database path for the result.

None
overwrite

Whether an existing path may be overwritten.

False

Returns:

Type Description

Joined emzed.Table.
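What "full row-wise cross product" means when expression is None can be sketched with tuples. The helper join_rows is hypothetical and uses a Python callable as the condition, whereas emzed expects a column expression.

```python
from itertools import product


def join_rows(left_rows, right_rows, condition=None):
    """Combine rows of two tables; without a condition every pair is kept."""
    pairs = product(left_rows, right_rows)
    if condition is None:
        return [left + right for left, right in pairs]
    return [left + right for left, right in pairs if condition(left, right)]


left = [(1, "a"), (2, "b")]
right = [(1, 10.0), (3, 30.0)]

cross = join_rows(left, right)
print(len(cross))  # 4 rows: every left row paired with every right row

matched = join_rows(left, right, condition=lambda l, r: l[0] == r[0])
print(matched)  # [(1, 'a', 1, 10.0)]
```

The cross product grows with the product of both row counts, so a join condition is usually advisable for larger tables.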

left_join(other, expression=None, *, path=None, overwrite=False)

Left-join two tables while keeping all rows from the left table.

Parameters:

Name Type Description Default
other

Right-hand side table.

required
expression

Optional join condition expression. If None, a full row-wise cross product is used.

None
path

Optional target database path for the result.

None
overwrite

Whether an existing path may be overwritten.

False

Returns:

Type Description

Joined emzed.Table with all left-side rows preserved.

load(path) classmethod

loads table from disk into memory.

Parameters:

Name Type Description Default
path

path to file.

required

Returns:

Type Description

emzed.Table.

load_csv(path, col_names=None, col_types=None, col_formats=None, *, delimiter=';', dash_is_none=True) staticmethod

loads csv file.

Parameters:

Name Type Description Default
path

path to csv file.

required
col_names

list of column names, if not provided the first line of the csv file is used instead.

None
col_types

list of column types, if not provided emzed determines types from column contents and names.

None
col_formats

list of column formats, if not provided emzed determines formats from column contents and names.

None
delimiter

csv delimiter character.

';'
dash_is_none

cells containing '-' are interpreted as None (missing value). If '-' should instead be read as a string consisting of the single character "-", set this argument to False.

True

Returns:

Type Description

emzed.Table.
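The effect of dash_is_none can be sketched with the standard csv module. The helper read_cells is hypothetical and performs no type detection; it only shows how a "-" cell is treated under both settings.

```python
import csv
import io


def read_cells(text, delimiter=";", dash_is_none=True):
    """Parse csv text; optionally map cells that are exactly '-' to None."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)  # first line provides the column names
    rows = [
        tuple(None if (dash_is_none and cell == "-") else cell for cell in row)
        for row in reader
    ]
    return header, rows


text = "name;strand\nA;-\nB;+\n"
print(read_cells(text))                      # '-' becomes None
print(read_cells(text, dash_is_none=False))  # '-' stays a string
```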

load_excel(path, col_names=None, col_types=None, col_formats=None) staticmethod

loads excel file.

Parameters:

Name Type Description Default
path

path to file.

required
col_names

list of column names, if not provided first line of .xlsx or .xls file is used instead.

None
col_types

list of column types, if not provided emzed determines types from column contents and names.

None
col_formats

list of column formats, if not provided emzed determines formats from column contents and names.

None

Returns:

Type Description

emzed.Table.

open(path) classmethod

opens table on disk without loading data into memory.

Parameters:

Name Type Description Default
path

path to file.

required

Returns:

Type Description

emzed.Table.

print_(max_rows=30, max_col_width=None, stream=None)

print table.

Parameters:

Name Type Description Default
max_rows

Maximum number of rows to display. If the table is longer only head and tail of the table are shown. The missing part is denoted with "...".

30
max_col_width

If specified the width of columns can be restricted.

None
stream

file object to redirect printing, e.g. to a file.

None

rename_columns(**from_to)

Rename columns from current names to new names.

Parameters:

Name Type Description Default
from_to

Keyword arguments mapping old column names to new column names, for example a="b".

Example: t.rename_columns(a="b") renames column "a" to "b".

{}

rename_postfixes(**from_to)

Rename postfixes in column names using keyword arguments.

Each key is the old postfix and each value is the new postfix.

Example: t.rename_postfixes(__0="_zero") changes columns like a__0 and b__0 to a_zero and b_zero.
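The renaming rule from the example can be sketched on a plain list of column names. The function below is a hypothetical stand-in that works on names only and applies the first matching postfix per column; how emzed resolves overlapping postfixes is not covered here.

```python
def rename_postfixes(col_names, **from_to):
    """Replace matching name endings; keys are old, values are new postfixes."""
    renamed = []
    for name in col_names:
        for old, new in from_to.items():
            if name.endswith(old):
                # Cut off the old ending and append the new one.
                name = name[: len(name) - len(old)] + new
                break
        renamed.append(name)
    return renamed


new_names = rename_postfixes(["a__0", "b__0", "c"], __0="_zero")
print(new_names)  # ['a_zero', 'b_zero', 'c']
```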

replace_column(name, what, type_=None, format_=not_specified)

replaces content of existing column name in place.

Parameters:

Name Type Description Default
name

the name of the existing column.

required
what

you can use a list with the same length as table or an expression.

required
type_

supported column types are int, float, bool, MzType, RtType, str, PeakMap, Table, object. In case you want to use Python objects like lists or dicts, use column type object instead.

None
format_

a format string such as "%d" or an executable string with Python code. To suppress visibility set format_ = None. The default not_specified is a sentinel meaning "argument not provided" and is different from None. When format_ is left as not_specified, the method determines a default format for the column type.

The column keeps its existing position in the table.

not_specified

replace_column_with_constant_value(name, what, type_=None, format_=not_specified)

replaces the content of column name with the same constant value what in every row.

Parameters:

Name Type Description Default
name

existing column name.

required
what

any of accepted types int, float, bool, MzType, RtType, str, PeakMap, Table.

required
type_

target column type. If None, the current column type is used.

None
format_

target column format. The default not_specified is a sentinel (different from None) and means "use the existing format or the default format for type_".

not_specified

save(path, *, overwrite=False)

save table to a file.

Parameters:

Name Type Description Default
path

path describing target location.

required
overwrite

If set to True an existing file will be overwritten, else an exception will be thrown.

False

save_csv(path, delimiter=';', as_printed=False, dash_is_none=True, *, overwrite=False)

saves Table as csv in path.

Parameters:

Name Type Description Default
path

specifies path of the file. The path must end with .csv.

required
delimiter

Alias for sep. Default value is set to Excel dialect ';'.

';'
as_printed

If True, formatted values will be stored. Note that format settings can lead to information loss, e.g. if the column format is "%.2f" only the first two decimal places will be saved.

False
dash_is_none

if True, missing values are written as - when as_printed is enabled.

True
overwrite

If set to True an existing file will be overwritten, else an exception will be thrown.

False
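The information loss mentioned for as_printed follows directly from Python's printf-style formatting, as this short example with an arbitrary value shows. It does not call emzed.

```python
value = 123.456789
fmt = "%.2f"

printed = fmt % value      # what a formatted export would contain
restored = float(printed)  # what a later import would read back

print(printed)             # 123.46
print(restored == value)   # False: digits beyond two decimals are gone
```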

save_excel(path, *, overwrite=False)

saves Table as xls or xlsx in path.

Parameters:

Name Type Description Default
path

specifies path of the file. The path must end with .xls or .xlsx.

required

set_col_format(col_name, format_)

sets format of column col_name to format format_.

Parameters:

Name Type Description Default
col_name

column name.

required
format_

accepted column format (see add_column).

required

Returns:

Type Description

None.

set_col_type(col_name, type_)

sets type of column col_name to type type_.

Parameters:

Name Type Description Default
col_name

column name.

required
type_

accepted column type (see add_column).

required

Returns:

Type Description

None.

set_title(title)

sets the table title.

Parameters:

Name Type Description Default
title

title string stored with the table metadata.

required

sort_by(*col_names, ascending=True)

sort table by given column names in given order.

Parameters:

Name Type Description Default
col_names

one or more column names as separate arguments.

()
ascending

either bool or list/tuple of bools of same number as specified column names.

True

Returns:

Type Description

emzed.Table.
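Sorting by several columns with a separate direction per column can be sketched with Python's stable sort. The helper sort_rows is hypothetical, operates on tuples, and does not handle None values; it only illustrates how a list of bools for ascending lines up with the column names.

```python
def sort_rows(col_names, rows, by, ascending=True):
    """Sort rows by several columns, each with its own direction."""
    if isinstance(ascending, bool):
        ascending = [ascending] * len(by)
    if len(ascending) != len(by):
        raise ValueError("need one ascending flag per sort column")

    result = list(rows)
    # Stable sort: apply the least significant key first.
    for name, asc in reversed(list(zip(by, ascending))):
        idx = col_names.index(name)
        result.sort(key=lambda row: row[idx], reverse=not asc)
    return result


col_names = ["sample", "rt"]
rows = [("b", 2.0), ("a", 3.0), ("a", 1.0), ("b", 1.0)]
ordered = sort_rows(col_names, rows, by=["sample", "rt"], ascending=[True, False])
print(ordered)  # [('a', 3.0), ('a', 1.0), ('b', 2.0), ('b', 1.0)]
```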

split_by(*col_names, keep_view=False)

Split a table into subtables by unique values of selected columns.

Parameters:

Name Type Description Default
col_names

Column names defining split groups.

()
keep_view

If True, return views. If False, return consolidated standalone tables.

False

Returns:

Type Description

List of Table subtables, one per unique group.

split_by_iter(*col_names, keep_view=False)

Yield subtables by unique values of selected columns.

Parameters:

Name Type Description Default
col_names

Column names defining split groups.

()
keep_view

If True, yield views. If False, yield consolidated standalone tables.

False

Returns:

Type Description

Generator yielding Table subtables lazily.
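The lazy behaviour can be sketched with a plain Python generator over tuples. This stand-in function is hypothetical; it assumes groups are yielded in order of first appearance and ignores the keep_view distinction between views and consolidated tables.

```python
def split_by_iter(col_names, rows, *by):
    """Yield one list of rows per unique combination of values in by."""
    idx = [col_names.index(c) for c in by]

    def key_of(row):
        return tuple(row[i] for i in idx)

    # Collect unique keys in order of first appearance.
    keys = []
    for row in rows:
        if key_of(row) not in keys:
            keys.append(key_of(row))

    # Each group is only built when the consumer asks for it.
    for key in keys:
        yield [row for row in rows if key_of(row) == key]


col_names = ["sample", "mz"]
rows = [("s1", 100.0), ("s2", 200.0), ("s1", 150.0)]
groups = list(split_by_iter(col_names, rows, "sample"))
print(groups)
```

Wrapping the generator in list(...) gives the eager behaviour described for split_by.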

stack_tables(tables, path=None, overwrite=False) staticmethod

builds a single Table from list or tuple of Tables.

Parameters:

Name Type Description Default
tables

list or tuple of Tables. All tables must have the same column names with same types and formats.

required
path

If specified the result will be a Table with a db file backend, else the result will be managed in memory.

None
overwrite

Indicate if an already existing database file should be overwritten.

False

Returns:

Type Description

emzed.Table.

supported_postfixes(col_names)

returns common postfixes (endings) of column col_names.

Parameters:

Name Type Description Default
col_names

list or tuple of column names.

required

Returns:

Type Description

list of common postfixes.

Examples: Assuming a Table with columns ['rt', 'rtmin', 'rtmax', 'rt1', 'rtmin1'].

t.supported_postfixes(['rt'])

returns ['', 'min', 'max', '1', 'min1']

t.supported_postfixes(['rt', 'rtmin'])

returns ['', '1']

t.supported_postfixes(['rt', 'rtmax'])

returns ['']
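The three results above can be reproduced with a short plain-Python sketch. This version is hypothetical: it takes the table's column names as an explicit first argument and is only meant to make the rule explicit, namely that a postfix is supported if appending it to every given name yields an existing column.

```python
def supported_postfixes(all_columns, col_names):
    """Return postfixes p such that name + p exists for every given name."""
    first = col_names[0]
    # Candidate postfixes: endings of all columns that start with the first name.
    candidates = [c[len(first):] for c in all_columns if c.startswith(first)]
    available = set(all_columns)
    return [
        p for p in candidates
        if all(name + p in available for name in col_names)
    ]


columns = ["rt", "rtmin", "rtmax", "rt1", "rtmin1"]
print(supported_postfixes(columns, ["rt"]))           # ['', 'min', 'max', '1', 'min1']
print(supported_postfixes(columns, ["rt", "rtmin"]))  # ['', '1']
print(supported_postfixes(columns, ["rt", "rtmax"]))  # ['']
```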

to_pandas()

converts table to pandas DataFrame object.

to_table(name, values, type_, format_=None, title=None, meta_data=None, path=None)

generates a one-column Table from an iterable, e.g. from a list.

Parameters:

Name Type Description Default
name

name of the column.

required
values

iterable with column values.

required
type_

supported column types are int, float, bool, MzType, RtType, str, PeakMap, Table, object. In case you want to use Python objects like lists or dicts, use column type object instead.

required
format_

is a format string such as "%d". To suppress visibility set format_ = None.

None
title

Table title as string.

None
meta_data

Python dictionary to assign meta data to the table.

None
path

Path for the db backend, use None for an in memory db backend.

None

Returns:

Type Description

emzed.Table