emzed.table

Table

col_formats property

Column formats.

Returns:

Type Description

tuple of format specifiers.

col_names property

Column names.

Returns:

Type Description

tuple of strings.

col_types property

Column types.

Returns:

Type Description

tuple of types.

rows property

Returns:

Type Description

All rows as list of tuples.

unique_id property

computes unique identifier based on table content and meta data.

Returns:

Type Description

unique identifier as string.

add_column(name, what, type_, format_=not_specified, insert_before=None, insert_after=None)

adds a new column with name in place.

Parameters:

Name Type Description Default
name

the name of the new column.

required
what

either a list with the same length as table or an expression.

required
type_

supported column types are int, float, bool, MzType, RtType, str, PeakMap, Table, object. In case you want to use Python objects like lists or dicts, use column type 'object' instead.

required
format_

is a format string such as "%d" or an executable string with python code. To suppress visibility set format_ = None. By default (not_specified) the method tries to determine a default format for the type.

not_specified
insert_before

to add column name at a defined position, one can specify its position to the left of column insert_before via the name of an existing column, or an integer index (negative values allowed).

None
insert_after

to add column name at a defined position, one can specify its position to the right of column insert_after.

None

add_column_with_constant_value(name, value, type_, format_=not_specified, insert_before=None, insert_after=None)

adds column name with the same constant value value in every row.

Parameters:

Name Type Description Default
name

new column name.

required
value

any of accepted types int, float, bool, MzType, RtType, str, PeakMap, Table.

required
type_

target column type.

required
format_

target column format. By default the default format for type_ is used.

not_specified
insert_before

insertion position for the new column.

None
insert_after

insertion position for the new column.

None

add_enumeration(col_name='id', insert_before=None, insert_after=None, start_with=0)

adds enumerated column as first column to table in place.

Parameters:

Name Type Description Default
col_name

name of added column. Default name is id.

'id'
insert_before

to add column col_name at a defined position, one can specify its position via the name of an existing column, or an integer index (negative values allowed). Setting insert_before and insert_after at the same time is not allowed.

None
insert_after

to add column col_name at a defined position, one can specify its position via the name of an existing column, or an integer index (negative values allowed). Setting insert_before and insert_after at the same time is not allowed.

None
start_with

start value for creating the ids. default value is 0.

0

add_or_replace_column(name, what, type_=None, format_=not_specified, insert_before=None, insert_after=None)

replaces the content of column name if it exists, else name is added (in place).

Parameters:

Name Type Description Default
name

column name to replace or create.

required
what

either a list with the same length as table or an expression.

required
type_

target column type. If None and the column exists, the current column type is reused.

None
format_

target column format. By default the existing format or the default format for type_ is used.

not_specified
insert_before

insertion position used when the column is created.

None
insert_after

insertion position used when the column is created.

None

add_or_replace_column_with_constant_value(name, what, type_=None, format_=not_specified, insert_before=None, insert_after=None)

replaces the content of column name with the constant value what if name exists, else name is added (in place).

Parameters:

Name Type Description Default
name

column name to replace or create.

required
what

scalar value assigned to all rows.

required
type_

target column type. If None and the column exists, the current column type is reused.

None
format_

target column format. By default the existing format or the default format for type_ is used.

not_specified
insert_before

insertion position used when the column is created.

None
insert_after

insertion position used when the column is created.

None

add_row(row)

adds row.

Parameters:

Name Type Description Default
row

list or tuple of values. Length must match.

required

apply(function, *args, ignore_nones=True, result_type=None)

allows computing columns using a function with multiple arguments.

Parameters:

Name Type Description Default
function

any function accepting arguments *args. The return value can be used to compute another column.

required
args

function arguments. arguments can be column expressions like t['col_name'], or local or global variables accepted by the function.

()
ignore_nones

since None represents a missing value, apply will not call function in case one of the arguments is None and will instead consider None as result. in case the function is able to consider such missing values, one must set ignore_nones to False.

Example: the following code

.. code-block:: python

import emzed

def replace_none(v):
    # map the missing value None to -1, keep all other values
    return -1 if v is None else v

t = emzed.to_table("a", [1, None, 5], int)
t.add_column("b", t.apply(replace_none, t.a), int)
t.add_column("c", t.apply(replace_none, t.a, ignore_nones=False), int)
print(t)

prints

.. parsed-literal::

a    b    c
int  int  int
---  ---  ---
1    1    1
-    -   -1
5    5    5
True
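The effect of ignore_nones can be sketched in plain Python without emzed. The helper apply_column below is hypothetical and only mimics the described behaviour on a list of values; it is not the library's implementation.

```python
def apply_column(function, values, ignore_nones=True):
    """Call function per value; with ignore_nones, None is passed through."""
    result = []
    for v in values:
        if ignore_nones and v is None:
            # missing value: function is not called, result stays None
            result.append(None)
        else:
            result.append(function(v))
    return result


def replace_none(v):
    # a function that is able to handle missing values itself
    return -1 if v is None else v


a = [1, None, 5]
b = apply_column(replace_none, a)                      # None is kept
c = apply_column(replace_none, a, ignore_nones=False)  # None becomes -1
print(b)
print(c)
```

The two result lists correspond to columns b and c of the printed table above.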

collapse(*col_names, new_col_name='collapsed', path=None)

collapses a table by grouping according to columns col_names.

Parameters:

Name Type Description Default
col_names

column names with values defining collapsing groups.

()
new_col_name

column name of the new column holding the collapsed sub-tables.

'collapsed'
path

If specified the result will be a Table with a db file backend, else the result will be managed in memory.

None

Returns:

Type Description

emzed.Table

Example:

.. parsed-literal::

a   b   c
int int int
--- --- ---
  1   1   1
  1   1   2
  2   1   3
  2   2   4

print(t.collapse('a'))

results

.. parsed-literal::

a   collapsed
int emzed.Table
--- ---------------
1   <Table af3 ...>
2   <Table e9f ...>
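As a rough illustration of what collapsing means, the following plain-Python sketch groups the rows of the example table by column a. The function collapse_rows is a hypothetical stand-in working on tuples; the real method stores each group as an emzed.Table in the new column.

```python
def collapse_rows(rows, key_index):
    """Group rows by the value at key_index, keeping first-seen order."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key_index], []).append(row)
    # one output row per distinct key: (key, rows of the sub-table)
    return list(groups.items())


rows = [(1, 1, 1), (1, 1, 2), (2, 1, 3), (2, 2, 4)]
collapsed = collapse_rows(rows, key_index=0)
for key, sub_rows in collapsed:
    print(key, sub_rows)
```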

consolidate(path=None, *, overwrite=False)

consolidates if underlying database table is a view.

Parameters:

Name Type Description Default
path

If specified the result will be a Table with a db file backend, else the result will be managed in memory.

None
overwrite

Indicate if an already existing database file should be overwritten.

False

Returns:

Type Description

emzed.Table.

create_table(col_names, col_types, col_formats=None, rows=None, title=None, meta_data=None, path=None) staticmethod

creates a table.

Parameters:

Name Type Description Default
col_names

list or tuple of strings.

required
col_types

list of types.

required
col_formats

list of formats using format specifiers like "%.2f". If not specified, emzed tries to guess appropriate formats based on column type and column name.

None
rows

list of lists.

None
title

table title as string.

None
meta_data

dictionary to manage user defined meta data.

None
path

path for the db backend, default is None to use the in-memory db backend.

None

Returns:

Type Description

emzed.Table.

drop_columns(*col_names)

removes columns in place.

Parameters:

Name Type Description Default
col_names

column names. Either exact names or names containing wildcards like ? and *.

Example: Table t with column names id, mz, mzmin, mzmax, sample_1k1, sample_1m1, sample_1k2

t.drop_columns('mz*', 'sample_1?1')

results in t with columns id, sample_1k2

()
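The wildcard example can be reproduced with the standard library, assuming the patterns follow shell-style matching as implemented by fnmatch (an assumption; the source only states that ? and * are supported).

```python
from fnmatch import fnmatchcase

col_names = ["id", "mz", "mzmin", "mzmax", "sample_1k1", "sample_1m1", "sample_1k2"]
patterns = ["mz*", "sample_1?1"]

# keep every column that matches none of the patterns
remaining = [
    name
    for name in col_names
    if not any(fnmatchcase(name, pattern) for pattern in patterns)
]
print(remaining)
```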

extend(other, path=None, overwrite=False)

appends the rows of another compatible table in place.

Parameters:

Name Type Description Default
other

table with the same columns, types, and formats.

required
path

unused legacy argument kept for API compatibility.

None
overwrite

unused legacy argument kept for API compatibility.

False

extract_columns(*col_names)

returns new Table with selected columns col_names.

Parameters:

Name Type Description Default
col_names

list or tuple with selected, existing column names.

()

fast_join(other, col_name, col_name_other=None, atol=0.0, rtol=0.0, extra_condition=None, *, path=None, overwrite=False)

joins (combines) two tables based on comparing approximate equality of two numerical columns.

Parameters:

Name Type Description Default
other

second table for join.

required
col_name

column name to consider.

required
col_name_other

column name of other to consider in case it is different to col_name.

None
atol

absolute tolerance for approximate equality check.

0.0
rtol

relative tolerance for approximate equality check.

0.0
extra_condition

optional additional join expression that must also match for a row pair to be included.

None

Returns:

Type Description

emzed.Table.

Performance: In case other is significantly larger than self, it is recommended to swap the tables.

The approximate equality check for two numbers a and b is:

 abs(a - b) <= atol + rtol * abs(a)

So if you only need a comparison based on absolute tolerance you can set rtol to 0.0, and if you only need a relative tolerance check you can set atol to 0.0.
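The check above translates directly into a small Python function. The name approx_equal is chosen here for illustration only. Note that the formula as written is not symmetric in a and b, because the relative part scales with abs(a).

```python
def approx_equal(a, b, atol=0.0, rtol=0.0):
    """Approximate equality as stated above."""
    return abs(a - b) <= atol + rtol * abs(a)


# absolute tolerance only
close_abs = approx_equal(100.0, 100.004, atol=0.005)
far_abs = approx_equal(100.0, 100.01, atol=0.005)

# relative tolerance only: the bound depends on which value is a
a_small = approx_equal(100.0, 111.0, rtol=0.1)  # bound is about 10.0
a_large = approx_equal(111.0, 100.0, rtol=0.1)  # bound is about 11.1

print(close_abs, far_abs, a_small, a_large)
```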

fast_left_join(other, col_name, col_name_other=None, atol=0.0, rtol=0.0, extra_condition=None, *, path=None, overwrite=False)

joins (combines) two tables based on comparing approximate equality of two numerical columns.

In contrast to fast_join this method will include also non-matching rows from self.

Parameters:

Name Type Description Default
other

second table for join.

required
col_name

column name to consider.

required
col_name_other

column name of other to consider in case it is different to col_name.

None
atol

absolute tolerance for approximate equality check.

0.0
rtol

relative tolerance for approximate equality check.

0.0
extra_condition

optional additional join expression that must also match for a row pair to be included.

None

Returns:

Type Description

emzed.Table.

Performance: In case other is significantly larger than self, it is recommended to swap the tables.

The approximate equality check for two numbers a and b is:

 abs(a - b) <= atol + rtol * abs(a)

So if you only need a comparison based on absolute tolerance you can set rtol to 0.0, and if you only need a relative tolerance check you can set atol to 0.0.

filter(condition)

creates a new table containing only the rows that fulfil the given condition. Usage is similar to pandas query.

Parameters:

Name Type Description Default
condition

expression like t.a < 0 or t.a <= t.b.

required

Returns:

Type Description

emzed.Table with filtered rows.

from_pandas(df, col_names=None, col_types=None, col_formats=None) staticmethod

converts pandas data frame into emzed Table.

Parameters:

Name Type Description Default
df

pandas data frame.

required
col_names

list of column names, can be used to override data frame column names.

None
col_types

list of column types, if not provided emzed determines types from column contents and names.

None
col_formats

list of column formats, if not provided emzed determines formats from column contents and names.

None

Returns:

Type Description

emzed.Table.

get_column(name)

returns column expression object for column name.

Parameters:

Name Type Description Default
name

existing column name or "_index".

You can use t[name] instead.

required

group_by(*colums, group_nones=False)

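returns a GroupBy object in which the rows are grouped by the given columns.

Parameters:

Name Type Description Default
columns

table columns, e.g. t.a or t['b'].

()
group_nones

if False, rows where group columns are None are ignored and get None as result. If True, None is treated as a group of its own.

False

Returns:

Type Description

GroupBy object.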

Examples: For given Table t

.. parsed-literal::

a    b    c
int  int  int
---  ---  ---
0    1    2
1    -    1
2    -    0
2    2    3

t.add_column('ga', t.group_by(t.a).min(t.c), int)
t.add_column('gb1', t.group_by(t.b).min(t.c), int)
t.add_column('gb2', t.group_by(t.b, group_nones=True).min(t.c), int)

print(t)

prints

.. parsed-literal::

a    b    c    ga   gb1  gb2
int  int  int  int  int  int
---  ---  ---  ---  ---  ---
0    1    2    2    2    2
1    -    1    1    -    0
2    -    0    0    -    0
2    2    3    0    3    3
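The three result columns of the printed table can be reproduced with a plain-Python sketch of a group-wise minimum. The helper group_min is hypothetical and works on lists instead of column expressions. In this sketch the last column is obtained by grouping on b with group_nones=True, so that None forms a group of its own.

```python
def group_min(keys, values, group_nones=False):
    """Return, for every row, the minimum of values within its key group."""
    minima = {}
    for key, value in zip(keys, values):
        if key is None and not group_nones:
            continue  # rows with a missing key do not form a group
        minima[key] = value if key not in minima else min(minima[key], value)
    return [
        minima[key] if (key is not None or group_nones) else None
        for key in keys
    ]


a = [0, 1, 2, 2]
b = [1, None, None, 2]
c = [2, 1, 0, 3]

ga = group_min(a, c)
gb1 = group_min(b, c)
gb2 = group_min(b, c, group_nones=True)
print(ga, gb1, gb2)
```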

is_mutable()

returns boolean value to show whether the content of a Table is mutable.

join(other, expression=None, *, path=None, overwrite=False)

joins (combines) two tables.

Parameters:

Name Type Description Default
other

second table for join.

required
expression

If None this method returns a table with the row wise cross product of both tables. else this expression is used to filter rows from this cross product.

None

Returns:

Type Description

emzed.Table.

Example:

If you have two tables t1 and t2 as

.. parsed-literal::

id   mz
int  float
---  -----
  0  100.0
  1  200.0
  2  300.0

and

.. parsed-literal::

id   mz     rt
int  float  float
---  -----  -----
  0  100.0   10.0
  1  110.0   20.0
  2  200.0   30.0

Then the result of t1.join(t2, t1.mz.in_range(t2.mz - 20.0, t2.mz + 20.0)) is

.. parsed-literal::

id   mz     id__0  mz__0  rt__0
int  float  int    float  float
---  -----  -----  -----  -----
  0  100.0      0  100.0   10.0
  0  100.0      1  110.0   20.0
  1  200.0      2  200.0   30.0

If you do not provide an expression, this method returns the full cross product.
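The semantics described here (cross product, optionally filtered by an expression) can be sketched in plain Python. The functions join_rows and mz_in_range are illustrative stand-ins operating on tuples, and the range check is assumed to be inclusive at both ends.

```python
from itertools import product

t1 = [(0, 100.0), (1, 200.0), (2, 300.0)]                    # id, mz
t2 = [(0, 100.0, 10.0), (1, 110.0, 20.0), (2, 200.0, 30.0)]  # id, mz, rt


def join_rows(left, right, condition=None):
    """Cross product of both row lists, optionally filtered by condition."""
    return [
        l + r
        for l, r in product(left, right)
        if condition is None or condition(l, r)
    ]


def mz_in_range(l, r):
    # mirrors t1.mz.in_range(t2.mz - 20.0, t2.mz + 20.0)
    return r[1] - 20.0 <= l[1] <= r[1] + 20.0


joined = join_rows(t1, t2, mz_in_range)
cross = join_rows(t1, t2)
for row in joined:
    print(row)
```

Without a condition the sketch returns all 3 x 3 = 9 row combinations.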

left_join(other, expression=None, *, path=None, overwrite=False)

Combines two tables (also known as left outer join).

Parameters:

Name Type Description Default
other

Second table for join.

required
expression

If None this method returns a table with the row wise cross product of both tables. Else this expression is used to filter rows from this cross product, whereby all rows of the left table are kept.

None

Returns:

Type Description

emzed.Table.

Using the tables t1 and t2 from the join example, t1.left_join(t2, t1.mz.in_range(t2.mz - 20.0, t2.mz + 20.0)) results in:

.. parsed-literal::

id   mz     id__0  mz__0  rt__0
int  float  int    float  float
---  -----  -----  -----  -----
  0  100.0      0  100.0   10.0
  0  100.0      1  110.0   20.0
  1  200.0      2  200.0   30.0
  2  300.0      -      -      -
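A plain-Python sketch of the left join on the same example data shows where the row filled with missing values comes from: the row of t1 with id 2 and mz 300.0 has no partner in t2 and is kept with None entries. The helper left_join_rows is hypothetical and not emzed's implementation.

```python
t1 = [(0, 100.0), (1, 200.0), (2, 300.0)]                    # id, mz
t2 = [(0, 100.0, 10.0), (1, 110.0, 20.0), (2, 200.0, 30.0)]  # id, mz, rt


def left_join_rows(left, right, condition):
    """Keep every left row; pad with None if no right row matches."""
    width = len(right[0]) if right else 0
    result = []
    for l in left:
        matches = [l + r for r in right if condition(l, r)]
        result.extend(matches or [l + (None,) * width])
    return result


def mz_in_range(l, r):
    # inclusive range check, an assumption about in_range
    return r[1] - 20.0 <= l[1] <= r[1] + 20.0


left_joined = left_join_rows(t1, t2, mz_in_range)
for row in left_joined:
    print(row)
```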

load(path) classmethod

loads table from disk into memory.

Parameters:

Name Type Description Default
path

path to file.

required

Returns:

Type Description

emzed.Table.

load_csv(path, col_names=None, col_types=None, col_formats=None, *, delimiter=';', dash_is_none=True) staticmethod

loads csv file.

Parameters:

Name Type Description Default
path

path to csv file.

required
col_names

list of column names, if not provided first line of csv file is used instead.

None
col_types

list of column types, if not provided emzed determines types from column contents and names.

None
col_formats

list of column formats, if not provided emzed determines formats from column contents and names.

None
delimiter

csv delimiter character.

';'
dash_is_none

cells with '-' are interpreted as None (missing value). In case - should be handled as a string with the single character "-" one must set this argument to False.

True

Returns:

Type Description

emzed.Table.
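The effect of delimiter and dash_is_none can be illustrated with the standard csv module. The function read_cells is a hypothetical helper that only covers the parsing step; the type and format detection done by emzed is left out.

```python
import csv
import io


def read_cells(text, delimiter=";", dash_is_none=True):
    """Parse csv text into a header and rows, mapping '-' to None if wanted."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    rows = [
        [None if (dash_is_none and cell == "-") else cell for cell in row]
        for row in reader
    ]
    return header, rows


text = "id;name\n1;-\n2;abc\n"
header, rows_none = read_cells(text)
_, rows_dash = read_cells(text, dash_is_none=False)
print(header, rows_none, rows_dash)
```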

load_excel(path, col_names=None, col_types=None, col_formats=None) staticmethod

loads excel file.

Parameters:

Name Type Description Default
path

path to file.

required
col_names

list of column names, if not provided first line of .xlsx or .xls file is used instead.

None
col_types

list of column types, if not provided emzed determines types from column contents and names.

None
col_formats

list of column formats, if not provided emzed determines formats from column contents and names.

None

Returns:

Type Description

emzed.Table.

open(path) classmethod

opens table on disk without loading data into memory.

Parameters:

Name Type Description Default
path

path to file.

required

Returns:

Type Description

emzed.Table.

print_(max_rows=30, max_col_width=None, stream=None)

print table.

Parameters:

Name Type Description Default
max_rows

Maximum number of rows to display. If the table is longer only head and tail of the table are shown. The missing part is denoted with "...".

30
max_col_width

If specified the width of columns can be restricted.

None
stream

file object to redirect printing, e.g. to a file.

None

rename_columns(**from_to)

changes column names from current to new name using keyword arguments.

from_to: keyword arguments like a="b", see example below.

Example: t.rename_columns(a="b") renames column "a" to "b"

rename_postfixes(**from_to)

changes postfixes (endings) of column names from current to new postfix using keyword arguments.

Example:

.. code-block:: python

t = emzed.Table.create_table(
       ["a", "a__0", "a__1", "b__0", "b__1"],
       [int, int, int, int, int],
       rows=[[1, 2, 3, 4, 5]]
)
print(t)

t.rename_postfixes(__0="_zero")
print(t)

prints

.. parsed-literal::

   a    a__0  a__1  b__0  b__1
   int  int   int   int   int
   ---  ----  ----  ----  ----
     1     2     3     4     5

   a    a_zero  a__1  b_zero  b__1
   int  int     int   int     int
   ---  ------  ----  ------  ----
     1       2     3       4     5
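The renaming shown above amounts to replacing a matching ending of each column name. A minimal sketch on a list of names, with a hypothetical helper rename_postfixes_in_names that is not part of emzed:

```python
def rename_postfixes_in_names(col_names, **from_to):
    """Replace the ending of every name that ends with a given postfix."""
    renamed = []
    for name in col_names:
        for old, new in from_to.items():
            if old and name.endswith(old):
                name = name[: len(name) - len(old)] + new
                break  # apply at most one replacement per name
        renamed.append(name)
    return renamed


names = ["a", "a__0", "a__1", "b__0", "b__1"]
new_names = rename_postfixes_in_names(names, __0="_zero")
print(new_names)
```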

replace_column(name, what, type_=None, format_=not_specified)

replaces content of existing column name in place.

Parameters:

Name Type Description Default
name

the name of the existing column.

required
what

you can use a list with the same length as table or an expression.

required
type_

supported column types are int, float, bool, MzType, RtType, str, PeakMap, Table, object. In case you want to use Python objects like lists or dicts, use column type 'object' instead.

None
format_

is a format string such as "%d" or an executable string with python code. To suppress visibility set format_ = None. By default (not_specified) the method tries to determine a default format for the type.

The column keeps its existing position in the table.

not_specified

replace_column_with_constant_value(name, what, type_=None, format_=not_specified)

replaces the content of column name with the constant value what.

Parameters:

Name Type Description Default
name

existing column name.

required
what

any of accepted types int, float, bool, MzType, RtType, str, PeakMap, Table.

required
type_

target column type. If None, the current column type is used.

None
format_

target column format. By default the existing format or the default format for type_ is used.

not_specified

save(path, *, overwrite=False)

save table to a file.

Parameters:

Name Type Description Default
path

path describing target location.

required
overwrite

If set to True an existing file will be overwritten, else an exception will be thrown.

False

save_csv(path, delimiter=';', as_printed=False, dash_is_none=True, *, overwrite=False)

saves Table as csv in path.

Parameters:

Name Type Description Default
path

specifies path of the file. The path must end with .csv.

required
delimiter

Alias for sep. Default value is set to Excel dialect ';'.

';'
as_printed

If True, formatted values will be stored. Note that format settings can lead to information loss, e.g. if the column format is set to %.2f only the first 2 decimal places will be saved.

False
dash_is_none

if True, missing values are written as - when as_printed is enabled.

True
overwrite

If set to True an existing file will be overwritten, else an exception will be thrown.

False
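The information loss mentioned for as_printed follows from ordinary string formatting, as this short sketch with an arbitrary example value shows:

```python
value = 1234.56789

# what a column with format "%.2f" would display
printed = "%.2f" % value

# reading the printed text back does not recover the original value
restored = float(printed)

print(printed, restored, restored == value)
```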

save_excel(path, *, overwrite=False)

saves Table as xls or xlsx in path.

Parameters:

Name Type Description Default
path

specifies path of the file. The path must end with .xls or .xlsx.

required

set_col_format(col_name, format_)

sets format of column col_name to format format_.

Parameters:

Name Type Description Default
col_name

column name.

required
format_

accepted column format (see add_column).

required

Returns:

Type Description

None.

set_col_type(col_name, type_)

sets type of column col_name to type type_.

Parameters:

Name Type Description Default
col_name

column name.

required
type_

accepted column type (see add_column).

required

Returns:

Type Description

None.

set_title(title)

sets the table title.

Parameters:

Name Type Description Default
title

title string stored with the table metadata.

required

sort_by(*col_names, ascending=True)

sort table by given column names in given order.

Parameters:

Name Type Description Default
col_names

one or more column names as separate arguments.

()
ascending

either bool or list/tuple of bools of same number as specified column names.

True

Returns:

Type Description

emzed.Table.
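Sorting by several columns with a separate direction per column can be sketched with Python's stable sort. The helper sort_rows is hypothetical, works on tuples and uses column indices instead of names.

```python
def sort_rows(rows, key_indices, ascending=True):
    """Sort rows by several columns, each with its own direction."""
    if isinstance(ascending, bool):
        ascending = [ascending] * len(key_indices)
    if len(ascending) != len(key_indices):
        raise ValueError("need one ascending flag per column")
    result = list(rows)
    # stable sort: apply the least significant key first
    for index, asc in reversed(list(zip(key_indices, ascending))):
        result.sort(key=lambda row: row[index], reverse=not asc)
    return result


rows = [(1, 1, 1), (1, 1, 2), (2, 1, 3), (2, 2, 4)]
# first column ascending, third column descending
sorted_rows = sort_rows(rows, [0, 2], ascending=[True, False])
print(sorted_rows)
```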

split_by(*col_names, keep_view=False)

generates a list of subtables, whereby split columns col_names contain unique values.

Parameters:

Name Type Description Default
col_names

column names with values defining split groups.

()

Returns:

Type Description

a list of sub_tables

Example: If we have a table t as

.. parsed-literal::

a   b   c
int int int
--- --- ---
  1   1   1
  1   1   2
  2   1   3
  2   2   4

sub_tables = t.split_by("a", "b") results in 3 subtables

sub_tables[0]

.. parsed-literal::

a   b   c
int int int
--- --- ---
  1   1   1
  1   1   2

sub_tables[1]

.. parsed-literal::

a   b   c
int int int
--- --- ---
  2   1   3

and sub_tables[2]

.. parsed-literal::

a   b   c
int int int
--- --- ---
  2   2   4
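The grouping behind the example can be sketched in plain Python with a hypothetical helper split_rows that uses column indices. Splitting on both a and b yields the three sub-tables shown above, whereas splitting on a alone would yield only two.

```python
def split_rows(rows, *key_indices):
    """Split rows into groups with identical values in the key columns."""
    groups = {}
    for row in rows:
        key = tuple(row[i] for i in key_indices)
        groups.setdefault(key, []).append(row)
    return list(groups.values())


rows = [(1, 1, 1), (1, 1, 2), (2, 1, 3), (2, 2, 4)]
by_a_and_b = split_rows(rows, 0, 1)
by_a = split_rows(rows, 0)
print(len(by_a_and_b), len(by_a))
```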

split_by_iter(*col_names, keep_view=False)

builds a generator yielding subtables, whereby subtable split columns col_names contain unique values.

Parameters:

Name Type Description Default
col_names

column names with values defining split groups.

()

Returns:

Type Description

a generator object of subtables

Referring to the example table from split_by:

sub_tables=t.split_by_iter("a")

print(next(sub_tables))

results

.. parsed-literal::

a   b   c
int int int
--- --- ---
  1   1   1
  1   1   2

hence the first sub_table of t, corresponding to sub_tables[0] in split_by example. split_by_iter can be more memory efficient than split_by.

stack_tables(tables, path=None, overwrite=False) staticmethod

builds a single Table from list or tuple of Tables.

Parameters:

Name Type Description Default
tables

list or tuple of Tables. All tables must have the same column names with same types and formats.

required
path

If specified the result will be a Table with a db file backend, else the result will be managed in memory.

None
overwrite

Indicate if an already existing database file should be overwritten.

False

Returns:

Type Description

emzed.Table.

supported_postfixes(col_names)

returns common postfixes (endings) of column col_names.

Parameters:

Name Type Description Default
col_names

list or tuple of column names.

required

Returns:

Type Description

list of common postfixes.

Examples: Assuming a Table with columns ['rt', 'rtmin', 'rtmax', 'rt1', 'rtmin1'].

t.supported_postfixes(['rt'])

returns ['', 'min', 'max', '1', 'min1']

t.supported_postfixes(['rt', 'rtmin'])

returns ['', '1']

t.supported_postfixes(['rt', 'rtmax'])

returns ['']
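One reading that is consistent with all three examples: a postfix is supported if appending it to every given name yields an existing column. The sketch below implements this reading on a plain list of names. The function takes the table's column names as an extra argument, which is an assumption made for illustration.

```python
def supported_postfixes(all_names, col_names):
    """Postfixes p such that name + p is a column for every given name."""
    existing = set(all_names)
    first = col_names[0]
    # candidates are the endings of all columns starting with the first name
    candidates = [n[len(first):] for n in all_names if n.startswith(first)]
    return [
        p for p in candidates
        if all(name + p in existing for name in col_names)
    ]


columns = ["rt", "rtmin", "rtmax", "rt1", "rtmin1"]
only_rt = supported_postfixes(columns, ["rt"])
rt_and_min = supported_postfixes(columns, ["rt", "rtmin"])
rt_and_max = supported_postfixes(columns, ["rt", "rtmax"])
print(only_rt, rt_and_min, rt_and_max)
```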

to_pandas()

converts table to pandas DataFrame object

to_table(name, values, type_, format_=None, title=None, meta_data=None, path=None)

generates a one-column Table from an iterable, e.g. from a list.

Parameters:

Name Type Description Default
name

name of the column.

required
values

iterable with column values.

required
type_

supported column types are int, float, bool, MzType, RtType, str, PeakMap, Table, object. In case you want to use Python objects like lists or dicts, use column type 'object' instead.

required
format_

is a format string as "%d". To suppress visibility set format_ = None. By default (not_specified) the method tries to determine a default format for the type.

None
title

Table title as string.

None
meta_data

Python dictionary to assign meta data to the table.

None
path

Path for the db backend, use None for an in memory db backend.

None

Returns:

Type Description

emzed.Table