emzed package

Module contents

class emzed.MzType(x=0, /)[source]

Bases: float

class emzed.PeakMap(conn, access_name, meta_data, info)[source]

Bases: ImmutablePeakMap

add_spectrum(spectrum)[source]
classmethod from_(pm, *, target_db_file=None)[source]
merge(other)[source]
spectra_for_modification()[source]

context manager. Within this context one can change spectrum attributes such as rt, ms_level, peaks or polarity.

class emzed.RtType(x=0, /)[source]

Bases: float

class emzed.Spectrum(scan_number, rt, ms_level, polarity, precursors, peaks)[source]

Bases: object

unbind()[source]

detaches the spectrum from its peakmap.

class emzed.Table(model, meta_data=None, *, _freeze_unique_id=None)[source]

Bases: object

add_column(name, what, type_, format_=<class 'emzed.table.table.not_specified'>, insert_before=None, insert_after=None)[source]

adds a new column name in place.

Parameters:
  • name – the name of the new column.

  • what – either a list with the same length as the table or an expression.

  • type_ – supported column types are int, float, bool, MzType, RtType, str, PeakMap, Table, object. In case you want to use Python objects like lists or dicts, use column type ‘object’ instead.

  • format_ – a format string such as “%d” or an executable string with Python code. To suppress visibility set format_ = None. By default (not_specified) the method tries to determine a default format for the type.

  • insert_before – to add column name at a defined position, one can specify its position left-wise to column insert_before via the name of an existing column, or an integer index (negative values allowed).

  • insert_after – to add column name at a defined position, one can specify its position right-wise to column insert_after.

add_column_with_constant_value(name, value, type_, format_=<class 'emzed.table.table.not_specified'>, insert_before=None, insert_after=None)[source]

adds column name with the constant value value.

For method parameters see add_column(), with the exception of

Parameters:

value – any of the accepted types int, float, bool, MzType, RtType, str, PeakMap, Table.

add_enumeration(col_name='id', insert_before=None, insert_after=None, start_with=0)[source]

adds an enumerated column to the table in place. By default it becomes the first column.

Parameters:
  • col_name – name of added column. Default name is id.

  • insert_before – to add column col_name at a defined position, one can specify its position via the name of an existing column, or an integer index (negative values allowed). Setting insert_before and insert_after at the same time is not allowed.

  • insert_after – to add column col_name at a defined position, one can specify its position via the name of an existing column, or an integer index (negative values allowed). Setting insert_before and insert_after at the same time is not allowed.

  • start_with – start value for creating the ids. Default value is 0.

add_or_replace_column(name, what, type_=None, format_=<class 'emzed.table.table.not_specified'>, insert_before=None, insert_after=None)[source]

replaces the content of column name if it exists, else name is added (in place).

For parameters see replace_column().

add_or_replace_column_with_constant_value(name, what, type_=None, format_=<class 'emzed.table.table.not_specified'>, insert_before=None, insert_after=None)[source]

replaces the content of column name with a constant value if name exists, else name is added (in place).

For parameters see replace_column_with_constant_value().

add_row(row)[source]

adds row.

Parameters:

row – list or tuple of values. Its length must match the number of columns.

apply(function, *args, ignore_nones=True, result_type=None)[source]

allows computing columns using a function with multiple arguments.

Parameters:
  • function – any function accepting arguments *args. The return value can be used to compute another column.

  • args – function arguments. Arguments can be column expressions like t[‘col_name’], or local or global variables accepted by the function.

  • ignore_nones – since None represents a missing value, apply will not call function if one of the arguments is None and will instead use None as result. If the function is able to handle such missing values, set ignore_nones to False.

Example: the following code

import emzed

def replace_none(v):
    return -1 if v is None else v

t = emzed.to_table("a", [1, None, 5], int)
t.add_column("b", t.apply(replace_none, t.a), int)
t.add_column("c", t.apply(replace_none, t.a, ignore_nones=False), int)
print(t)

prints

a    b    c
int  int  int
---  ---  ---
  1    1    1
  -    -   -1
  5    5    5
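The skipping rule of ignore_nones can be modelled in plain Python without emzed. The helper apply_to_rows below is hypothetical and not part of emzed. It only accepts column values given as lists, whereas Table.apply also accepts other arguments such as local variables.

```python
from typing import Any, Callable, List, Optional


def apply_to_rows(function: Callable[..., Any], *columns: List[Any],
                  ignore_nones: bool = True) -> List[Optional[Any]]:
    """Hypothetical helper: call `function` row by row on the given column values."""
    results: List[Optional[Any]] = []
    for args in zip(*columns):
        if ignore_nones and any(arg is None for arg in args):
            # a missing argument gives a missing result; `function` is not called
            results.append(None)
        else:
            results.append(function(*args))
    return results


def replace_none(v):
    return -1 if v is None else v


b = apply_to_rows(replace_none, [1, None, 5])
c = apply_to_rows(replace_none, [1, None, 5], ignore_nones=False)
print(b)  # [1, None, 5]
print(c)  # [1, -1, 5]
```

The two result lists correspond to columns b and c of the example table above.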
close()[source]
property col_formats

Column formats.

Returns:

tuple of format specifiers.

property col_names

Column names.

Returns:

tuple of strings.

property col_types

Column types.

Returns:

tuple of types.

collapse(*col_names, new_col_name='collapsed', path=None)[source]

collapses a table by grouping according to the columns col_names.

Parameters:
  • col_names – column names whose values define the collapsing groups.

  • new_col_name – name of the new column resulting from the collapsing process.

  • path – If specified the result will be a Table with a db file backend, else the result will be managed in memory.

Returns:

emzed.Table

Example: for a table t

a   b   c
int int int
--- --- ---
  1   1   1
  1   1   2
  2   1   3
  2   2   4
>>> print(t.collapse('a'))

results

a   collapsed
int emzed.Table
--- ---------------
1   <Table af3 ...>
2   <Table e9f ...>
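Conceptually, collapse bundles all rows sharing the same values in col_names into one nested table. The following plain-Python sketch uses a hypothetical helper collapse_rows and represents each nested table simply as a list of row tuples. Whether emzed's nested tables also keep the grouping columns is not stated here; the sketch keeps the full rows.

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[Any, ...]


def collapse_rows(col_names: List[str], rows: List[Row], *group_cols: str) -> List[Row]:
    """Hypothetical helper: one result row per group, holding the group key
    values followed by the list of all original rows in that group."""
    idx = [col_names.index(name) for name in group_cols]
    groups: Dict[Row, List[Row]] = {}
    for row in rows:
        key = tuple(row[i] for i in idx)
        groups.setdefault(key, []).append(row)  # dicts keep insertion order
    return [key + (members,) for key, members in groups.items()]


col_names = ["a", "b", "c"]
rows = [(1, 1, 1), (1, 1, 2), (2, 1, 3), (2, 2, 4)]
collapsed = collapse_rows(col_names, rows, "a")
for a, members in collapsed:
    print(a, members)
# 1 [(1, 1, 1), (1, 1, 2)]
# 2 [(2, 1, 3), (2, 2, 4)]
```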
consolidate(path=None, *, overwrite=False)[source]

consolidates the table if the underlying database table is a view.

Parameters:
  • path – If specified the result will be a Table with a db file backend, else the result will be managed in memory.

  • overwrite – Indicate if an already existing database file should be overwritten.

Returns:

emzed.Table.

copy(path=None, *, overwrite=False)

consolidates the table if the underlying database table is a view.

Parameters:
  • path – If specified the result will be a Table with a db file backend, else the result will be managed in memory.

  • overwrite – Indicate if an already existing database file should be overwritten.

Returns:

emzed.Table.

static create_table(col_names, col_types, col_formats=None, rows=None, title=None, meta_data=None, path=None)[source]

creates a table.

Parameters:
  • col_names – list or tuple of strings.

  • col_types – list of types.

  • col_formats – list of formats using format specifiers like “%.2f”. If not specified, emzed tries to guess appropriate formats based on column type and column name.

  • rows – list of lists.

  • title – table title as string.

  • meta_data – dictionary to manage user defined meta data.

  • path – path for the db backend; the default None uses the in-memory db backend.

Returns:

emzed.Table.

drop_columns(*col_names)[source]

removes columns in place.

Parameters:

col_names – column names, either exact names or names containing wildcards such as ? and *.

Example: for a Table t with columns id, mz, mzmin, mzmax, sample_1k1, sample_1m1, sample_1k2

t.drop_columns('mz*', 'sample_1?1')

results in t with columns id, sample_1k2
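The wildcard selection in this example can be reproduced with Python's fnmatch module, assuming that emzed uses shell-style semantics (* matches any sequence, ? a single character). Case sensitivity and other details of emzed's matching are not stated above; remaining_columns is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from typing import List


def remaining_columns(col_names: List[str], *patterns: str) -> List[str]:
    """Hypothetical helper: column names that match none of the wildcard patterns."""
    return [name for name in col_names
            if not any(fnmatchcase(name, pattern) for pattern in patterns)]


cols = ["id", "mz", "mzmin", "mzmax", "sample_1k1", "sample_1m1", "sample_1k2"]
remaining = remaining_columns(cols, "mz*", "sample_1?1")
print(remaining)  # ['id', 'sample_1k2']
```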

extend(other, path=None, overwrite=False)[source]
extract_columns(*col_names, keep_view=False, path=None, overwrite=False)

returns new Table with selected columns col_names.

Parameters:
  • col_names – list or tuple with selected, existing column names.

  • keep_view – keep the view or consolidate the result (default: False)

  • path – path in case the view is consolidated, ‘None’ keeps the result in memory

  • overwrite – if path is not None, this flag specifies whether an existing file should be overwritten

fast_join(other, col_name, col_name_other=None, atol=0.0, rtol=0.0, extra_condition=None, *, path=None, overwrite=False)[source]

joins (combines) two tables based on comparing approximate equality of two numerical columns.

Parameters:
  • other – second table for join.

  • col_name – column name to consider.

  • col_name_other – column name of other to consider in case it is different to col_name.

  • atol – absolute tolerance for the approximate equality check.

  • rtol – relative tolerance for the approximate equality check.

Returns:

emzed.Table.

Performance: In case other is significantly larger than self, it is recommended to swap the tables.

The approximate equality check for two numbers a and b is:

abs(a - b) <= atol + rtol * abs(a)

So if you only need a comparison based on absolute tolerance you can set rtol to 0.0, and if you only need a relative tolerance check you can set atol to 0.0.
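The documented check can be written as a small predicate. The brute-force join below is a hypothetical reference that compares every pair of values; it only illustrates which pairs match and says nothing about how fast_join itself finds them. Note that the relative term scales with abs(a) only, so the check is not symmetric in a and b.

```python
from typing import List, Tuple


def approx_equal(a: float, b: float, atol: float = 0.0, rtol: float = 0.0) -> bool:
    # the documented check; the relative part scales with abs(a) only
    return abs(a - b) <= atol + rtol * abs(a)


def brute_force_approx_join(values: List[float], other_values: List[float],
                            atol: float = 0.0, rtol: float = 0.0) -> List[Tuple[float, float]]:
    """Hypothetical reference: compare every pair, keep the approximately equal ones."""
    return [(a, b) for a in values for b in other_values if approx_equal(a, b, atol, rtol)]


mz = [100.0, 200.0, 300.0]
mz_other = [100.002, 110.0, 200.01]
print(brute_force_approx_join(mz, mz_other, atol=0.005))  # [(100.0, 100.002)]
print(brute_force_approx_join(mz, mz_other, rtol=1e-4))   # [(100.0, 100.002), (200.0, 200.01)]
```

With atol=0.005 the pair (200.0, 200.01) is rejected, while rtol=1e-4 allows a deviation of 0.02 at 200.0 and therefore accepts it.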

fast_left_join(other, col_name, col_name_other=None, atol=0.0, rtol=0.0, extra_condition=None, *, path=None, overwrite=False)[source]

joins (combines) two tables based on comparing approximate equality of two numerical columns.

In contrast to fast_join(), this method also includes non-matching rows from self.

Parameters:
  • other – second table for join.

  • col_name – column name to consider.

  • col_name_other – column name of other to consider in case it is different to col_name.

  • atol – absolute tolerance for the approximate equality check.

  • rtol – relative tolerance for the approximate equality check.

Returns:

emzed.Table.

Performance: In case other is significantly larger than self, it is recommended to swap the tables.

The approximate equality check for two numbers a and b is:

abs(a - b) <= atol + rtol * abs(a)

So if you only need a comparison based on absolute tolerance you can set rtol to 0.0, and if you only need a relative tolerance check you can set atol to 0.0.

filter(condition, keep_view=False, path=None, overwrite=False)

creates a new table by filtering rows fulfilling the given condition. Usage is similar to pandas query.

Parameters:
  • condition – expression like t.a < 0 or t.a <= t.b.

  • keep_view – keep the view or consolidate the result (default: False)

  • path – path in case the view is consolidated, ‘None’ keeps the result in memory

  • overwrite – if path is not None, this flag specifies whether an existing file should be overwritten

Returns:

emzed.Table with filtered rows.

static from_pandas(df, col_names=None, col_types=None, col_formats=None)[source]

converts pandas data frame into emzed Table.

Parameters:
  • df – pandas data frame.

  • col_names – list of column names, can be used to override data frame column names.

  • col_types – list of column types, if not provided emzed determines types from column contents and names.

  • col_formats – list of column formats, if not provided emzed determines formats from column contents and names.

Returns:

emzed.Table.

get_column(name)[source]

returns column expression object for column name.

You can use t[name] instead.

get_title()[source]
group_by(*colums, group_nones=False)[source]

returns a GroupBy object in which rows are grouped by columns.

param columns:

table columns, e.g. t.a or t['b'].

param group_nones:

if False (default), rows where a group column is None are not grouped and get None as result; if True, such rows form a group of their own.

returns:

GroupBy object

Examples: For given Table t

a    b    c
int  int  int
---  ---  ---
  0    1    2
  1    -    1
  2    -    0
  2    2    3
>>> t.add_column('ga', t.group_by(t.a).min(t.c), int)
>>> t.add_column('gb1', t.group_by(t.b).min(t.c), int)
>>> t.add_column('gb2', t.group_by(t.b, group_nones=True).min(t.c), int)
>>> print(t)
a    b    c    ga   gb1  gb2
int  int  int  int  int  int
---  ---  ---  ---  ---  ---
  0    1    2    2    2    2
  1    -    1    1    -    0
  2    -    0    0    -    0
  2    2    3    0    3    3
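The three result columns can be recomputed in plain Python. The hypothetical helper group_min returns, for every row, the minimum of the values within that row's group. It assumes that group_nones=True treats None as an ordinary group key, which is what the gb2 column above shows.

```python
from typing import Any, Dict, Hashable, List, Optional


def group_min(keys: List[Optional[Hashable]], values: List[Optional[Any]],
              group_nones: bool = False) -> List[Optional[Any]]:
    """Hypothetical helper: per-row minimum of `values` within the row's group."""
    mins: Dict[Any, Any] = {}
    for key, value in zip(keys, values):
        if key is None and not group_nones:
            continue  # rows with a missing group key are not grouped
        if value is not None:
            mins[key] = value if key not in mins else min(mins[key], value)
    # rows whose group was skipped (or has no values) get None
    return [mins.get(key) for key in keys]


a = [0, 1, 2, 2]
b = [1, None, None, 2]
c = [2, 1, 0, 3]
ga = group_min(a, c)
gb1 = group_min(b, c)
gb2 = group_min(b, c, group_nones=True)
print(ga)   # [2, 1, 0, 0]
print(gb1)  # [2, None, None, 3]
print(gb2)  # [2, 0, 0, 3]
```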
is_in_memory()[source]
is_mutable()[source]

returns a boolean indicating whether the content of the Table is mutable.

is_open()[source]
join(other, expression=None, *, path=None, overwrite=False)[source]

joins (combines) two tables.

Parameters:
  • other – second table for join.

  • expression – If None, this method returns a table with the row-wise cross product of both tables. Else this expression is used to filter rows from this cross product.

Returns:

emzed.Table.

Example:

if you have two tables t1 and t2 as

id   mz
int  float
---  -----
  0  100.0
  1  200.0
  2  300.0

and

id   mz     rt
int  float  float
---  -----  -----
  0  100.0   10.0
  1  110.0   20.0
  2  200.0   30.0

Then the result of t1.join(t2, t1.mz.in_range(t2.mz - 20.0, t2.mz + 20.0)) is

id   mz     id__0  mz__0  rt__0
int  float  int    float  float
---  -----  -----  -----  -----
  0  100.0      0  100.0   10.0
  0  100.0      1  110.0   20.0
  1  200.0      2  200.0   30.0

If you do not provide an expression, this method returns the full cross product.

left_join(other, expression=None, *, path=None, overwrite=False)[source]

Combines two tables (also known as left outer join).

Parameters:
  • other – Second table for join.

  • expression – If None, this method returns a table with the row-wise cross product of both tables. Else this expression is used to filter rows from this cross product, whereby all rows of the left table are kept.

Returns:

emzed.Table.

Taking the example from join(), t1.left_join(t2, t1.mz.in_range(t2.mz - 20.0, t2.mz + 20.0)) results in:

id   mz     id__0  mz__0  rt__0
int  float  int    float  float
---  -----  -----  -----  -----
  0  100.0      0  100.0   10.0
  0  100.0      1  110.0   20.0
  1  200.0      2  200.0   30.0
  2  300.0      -      -      -
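The semantics of join() and left_join() can be sketched on plain tuples: join keeps the filtered cross product, and left_join additionally keeps every unmatched left row, padded with None. The helpers join_rows, left_join_rows and mz_within_20 are hypothetical. The sketch assumes that in_range includes both bounds (the example contains no boundary case) and does not model emzed's renaming of the right-hand columns to id__0, mz__0, rt__0.

```python
from typing import Any, Callable, List, Tuple

Row = Tuple[Any, ...]


def join_rows(left: List[Row], right: List[Row],
              condition: Callable[[Row, Row], bool]) -> List[Row]:
    # filtered cross product: every left row combined with every matching right row
    return [l + r for l in left for r in right if condition(l, r)]


def left_join_rows(left: List[Row], right: List[Row],
                   condition: Callable[[Row, Row], bool], right_width: int) -> List[Row]:
    result: List[Row] = []
    for l in left:
        matches = [l + r for r in right if condition(l, r)]
        # keep unmatched left rows, padding the right-hand columns with None
        result.extend(matches or [l + (None,) * right_width])
    return result


def mz_within_20(l: Row, r: Row) -> bool:
    # assumption: both bounds are inclusive
    return r[1] - 20.0 <= l[1] <= r[1] + 20.0


t1 = [(0, 100.0), (1, 200.0), (2, 300.0)]
t2 = [(0, 100.0, 10.0), (1, 110.0, 20.0), (2, 200.0, 30.0)]
joined = join_rows(t1, t2, mz_within_20)
left_joined = left_join_rows(t1, t2, mz_within_20, right_width=3)
for row in left_joined:
    print(row)
```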
classmethod load(path)[source]

loads table from disk into memory.

Parameters:

path – path to file.

Returns:

emzed.Table.

static load_csv(path, col_names=None, col_types=None, col_formats=None, *, delimiter=';', dash_is_none=True)[source]

loads csv file.

Parameters:
  • path – path to csv file.

  • col_names – list of column names; if not provided, the first line of the csv file is used instead.

  • col_types – list of column types; if not provided, emzed determines types from column contents and names.

  • col_formats – list of column formats; if not provided, emzed determines formats from column contents and names.

  • delimiter – csv delimiter character.

  • dash_is_none – cells containing ‘-’ are interpreted as None (missing value). If ‘-’ should be handled as a string consisting of the single character “-”, set this argument to False.

Returns:

emzed.Table.
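The effect of delimiter and dash_is_none can be illustrated with Python's csv module. The hypothetical helper read_csv_rows treats the first line as header and a cell containing exactly ‘-’ as None. It leaves all other values as strings, whereas load_csv additionally determines column types and formats.

```python
import csv
import io
from typing import List, Optional, Tuple


def read_csv_rows(text: str, delimiter: str = ";",
                  dash_is_none: bool = True) -> Tuple[List[str], List[List[Optional[str]]]]:
    """Hypothetical helper: first line is the header; with dash_is_none a
    cell containing exactly '-' becomes None."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    rows = [[None if dash_is_none and cell == "-" else cell for cell in row]
            for row in reader]
    return header, rows


text = "id;name\n1;-\n2;x\n"
print(read_csv_rows(text))                      # (['id', 'name'], [['1', None], ['2', 'x']])
print(read_csv_rows(text, dash_is_none=False))  # (['id', 'name'], [['1', '-'], ['2', 'x']])
```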

static load_excel(path, col_names=None, col_types=None, col_formats=None)[source]

loads excel file.

Parameters:
  • path – path to file.

  • col_names – list of column names, if not provided first line of .xlsx or .xls file is used instead.

  • col_types – list of column types, if not provided emzed determines types from column contents and names.

  • col_formats – list of column formats, if not provided emzed determines formats from column contents and names.

Returns:

emzed.Table.

property meta_data
classmethod open(path)[source]

opens table on disk without loading data into memory.

Parameters:

path – path to file.

Returns:

emzed.Table.

property path
print_(max_rows=30, max_col_width=None, stream=None)[source]

print table.

Parameters:
  • max_rows – Maximum number of rows to display. If the table is longer only head and tail of the table are shown. The missing part is denoted with “…”.

  • max_col_width – If specified the width of columns can be restricted.

  • stream – file object to redirect printing, e.g. to a file.

rename_columns(**from_to)[source]

changes column names from current to new name using keyword arguments.

param from_to:

keyword arguments like a="b", see example below.

Example: t.rename_columns(a="b") renames column "a" to "b"

rename_postfixes(**from_to)[source]

replaces column name postfixes (endings) using keyword arguments: each keyword names an existing postfix and its value the new postfix.

Example:

import emzed

t = emzed.Table.create_table(
       ["a", "a__0", "a__1", "b__0", "b__1"],
       [int, int, int, int, int],
       rows=[[1, 2, 3, 4, 5]]
)
print(t)

t.rename_postfixes(__0="_zero")
print(t)

prints

a    a__0  a__1  b__0  b__1
int  int   int   int   int
---  ----  ----  ----  ----
  1     2     3     4     5

a    a_zero  a__1  b_zero  b__1
int  int     int   int     int
---  ------  ----  ------  ----
  1       2     3       4     5
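The renaming rule of this example can be mimicked on a plain list of names. The hypothetical helper below replaces the first matching postfix of each name; it assumes that a postfix simply means a string ending, and does not reproduce any validation emzed may perform (for example on name clashes).

```python
from typing import List


def rename_postfixes(col_names: List[str], **from_to: str) -> List[str]:
    """Hypothetical helper: replace the first matching postfix of each name."""
    renamed = []
    for name in col_names:
        for old, new in from_to.items():
            if name.endswith(old):
                name = name[: -len(old)] + new
                break
        renamed.append(name)
    return renamed


new_names = rename_postfixes(["a", "a__0", "a__1", "b__0", "b__1"], __0="_zero")
print(new_names)  # ['a', 'a_zero', 'a__1', 'b_zero', 'b__1']
```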
replace_column(name, what, type_=None, format_=<class 'emzed.table.table.not_specified'>)[source]

replaces content of existing column name in place.

Parameters:
  • name – the name of the existing column.

  • what – you can use a list with the same length as the table or an expression.

  • type_ – supported column types are int, float, bool, MzType, RtType, str, PeakMap, Table, object. In case you want to use Python objects like lists or dicts, use column type ‘object’ instead.

  • format_ – a format string such as “%d” or an executable string with Python code. To suppress visibility set format_ = None. By default (not_specified) the method tries to determine a default format for the type.

  • insert_before – to add column name at a defined position, one can specify its position left-wise to column insert_before via the name of an existing column, or an integer index (negative values allowed).

  • insert_after – to add column name at a defined position, one can specify its position right-wise to column insert_after.

replace_column_with_constant_value(name, what, type_=None, format_=<class 'emzed.table.table.not_specified'>)[source]

replaces the content of column name with the constant value what.

For method parameters see replace_column(), with the exception of

Parameters:

what – any of accepted types int, float, bool, MzType, RtType, str, PeakMap, Table.

property rows
Returns:

All rows as list of tuples.

save(path, *, overwrite=False)[source]

save table to a file.

Parameters:
  • path – path describing target location.

  • overwrite – If set to True an existing file will be overwritten, else an exception will be thrown.

save_csv(path, delimiter=';', as_printed=False, dash_is_none=True, *, overwrite=False)[source]

saves Table as csv in path.

Parameters:
  • path – specifies path of the file. The path must end with .csv.

  • delimiter – Alias for sep. Default value is set to Excel dialect ‘;’.

  • as_printed – If True, formatted values will be stored. Note that format settings can lead to information loss, e.g. if the column format is set to “%.2f” only the first 2 decimal places will be saved.

  • overwrite – If set to True an existing file will be overwritten, else an exception will be thrown.

save_excel(path, *, overwrite=False)[source]

saves Table as xls or xlsx in path.

Parameters:

path – specifies path of the file. The path must end with .xls or .xlsx.

set_col_format(col_name, format_)[source]

sets format of column col_name to format format_.

Parameters:
  • col_name – column name.

  • format_ – accepted column format (see add_column()).

Returns:

None.

set_col_type(col_name, type_)[source]

sets type of column col_name to type type_.

Parameters:
  • col_name – column name.

  • type_ – accepted column type (see add_column()).

Returns:

None.

set_title(title)[source]
sort_by(*col_names, ascending=True, keep_view=False, path=None, overwrite=False)

sort table by given column names in given order.

Parameters:
  • col_names – one or more column names as separate arguments.

  • ascending – either a single bool or a list/tuple of bools with one entry per specified column name.

  • keep_view – keep the view or consolidate the result (default: False)

  • path – path in case the view is consolidated, ‘None’ keeps the result in memory

  • overwrite – if path is not None, this flag specifies whether an existing file should be overwritten

Returns:

emzed.Table.
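Sorting by several columns, each with its own direction, can be sketched in plain Python by exploiting the stability of list.sort: sort by the least significant column first and by the most significant column last. The helper sort_rows is hypothetical, does not handle None values, and is not emzed's implementation, which works on the database backend.

```python
from operator import itemgetter
from typing import Any, List, Sequence, Tuple, Union

Row = Tuple[Any, ...]


def sort_rows(rows: List[Row], col_names: List[str], by: Sequence[str],
              ascending: Union[bool, Sequence[bool]] = True) -> List[Row]:
    """Hypothetical helper: sort rows by several columns, each with its own direction."""
    if isinstance(ascending, bool):
        ascending = [ascending] * len(by)
    if len(ascending) != len(by):
        raise ValueError("need one ascending flag per sort column")
    result = list(rows)
    # least significant column first; because the sort is stable, ties in a
    # later (more significant) pass keep the order from the earlier passes
    for name, asc in reversed(list(zip(by, ascending))):
        result.sort(key=itemgetter(col_names.index(name)), reverse=not asc)
    return result


rows = [(1, "b"), (2, "a"), (1, "a"), (2, "b")]
ordered = sort_rows(rows, ["a", "b"], ["a", "b"], ascending=[True, False])
print(ordered)  # [(1, 'b'), (1, 'a'), (2, 'b'), (2, 'a')]
```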

split_by(*col_names, keep_view=False)[source]

generates a list of subtables, one for each unique combination of values in the split columns col_names.

Parameters:

col_names – column names with values defining split groups.

Returns:

a list of sub_tables

Example: If we have a table t as

a   b   c
int int int
--- --- ---
  1   1   1
  1   1   2
  2   1   3
  2   2   4

sub_tables = t.split_by("a", "b") results in 3 subtables

sub_tables[0]

a   b   c
int int int
--- --- ---
  1   1   1
  1   1   2

sub_tables[1]

a   b   c
int int int
--- --- ---
  2   1   3

and sub_tables[2]

a   b   c
int int int
--- --- ---
  2   2   4
split_by_iter(*col_names, keep_view=False)[source]

builds a generator yielding subtables, one for each unique combination of values in the split columns col_names.

Parameters:

col_names – column names with values defining split groups.

Returns:

a generator object of subtables

referring to the example table of split_by():

>>> sub_tables=t.split_by_iter("a")
>>> print(next(sub_tables))

results

a   b   c
int int int
--- --- ---
  1   1   1
  1   1   2

hence the first sub_table of t, corresponding to sub_tables[0] in the split_by() example. As subtables are generated one at a time, split_by_iter() can be more memory efficient than split_by().
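The splitting behaviour of split_by() and split_by_iter() can be modelled with a Python generator over plain row tuples. The hypothetical helper split_rows_iter sorts by the split key and then uses itertools.groupby, so the groups come out in key order; the order of subtables produced by emzed is not stated above. The sorted() call in this toy version still copies all rows, so it only illustrates the iteration protocol, not the memory behaviour.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Iterator, List, Tuple

Row = Tuple[Any, ...]


def split_rows_iter(col_names: List[str], rows: List[Row],
                    *split_cols: str) -> Iterator[List[Row]]:
    """Hypothetical helper: yield one list of rows per unique combination
    of values in `split_cols`."""
    key = itemgetter(*[col_names.index(name) for name in split_cols])
    # groupby only merges adjacent rows, so sort by the split key first;
    # sorted() is stable, so rows keep their original order within a group
    for _, group in groupby(sorted(rows, key=key), key=key):
        yield list(group)


rows = [(1, 1, 1), (1, 1, 2), (2, 1, 3), (2, 2, 4)]
sub_tables = split_rows_iter(["a", "b", "c"], rows, "a", "b")
print(next(sub_tables))  # [(1, 1, 1), (1, 1, 2)]
```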

static stack_tables(tables, path=None, overwrite=False)[source]

builds a single Table from list or tuple of Tables.

Parameters:
  • tables – list or tuple of Tables. All tables must have the same column names with the same types and formats.

  • path – If specified the result will be a Table with a db file backend, else the result will be managed in memory.

  • overwrite – Indicate if an already existing database file should be overwritten.

Returns:

emzed.Table.

summary()[source]
supported_postfixes(col_names)[source]

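returns the postfixes (endings) shared by the columns col_names, i.e. the postfixes p for which name + p is an existing column for every name in col_names.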

Parameters:

col_names – list or tuple of column names.

Returns:

list of common postfixes.

Examples: Assuming a Table with columns

['rt', 'rtmin', 'rtmax', 'rt1', 'rtmin1'].

>>> t.supported_postfixes(['rt'])

returns ['', 'min', 'max', '1', 'min1']

>>> t.supported_postfixes(['rt', 'rtmin'])

returns ['', '1']

>>> t.supported_postfixes(['rt', 'rtmax'])

returns ['']
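The rule behind these three results can be expressed in a few lines of plain Python: take the endings of all columns starting with the first name, then keep those that every other name also supports. The helper below is hypothetical and works on a list of column names instead of a Table.

```python
from typing import List


def supported_postfixes(all_col_names: List[str], col_names: List[str]) -> List[str]:
    """Hypothetical helper: postfixes p such that name + p is an existing
    column for every name in col_names."""
    existing = set(all_col_names)
    first = col_names[0]
    # candidates: endings of all columns that start with the first name
    candidates = [name[len(first):] for name in all_col_names if name.startswith(first)]
    return [p for p in candidates if all(name + p in existing for name in col_names)]


cols = ["rt", "rtmin", "rtmax", "rt1", "rtmin1"]
print(supported_postfixes(cols, ["rt"]))           # ['', 'min', 'max', '1', 'min1']
print(supported_postfixes(cols, ["rt", "rtmin"]))  # ['', '1']
print(supported_postfixes(cols, ["rt", "rtmax"]))  # ['']
```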

property title
to_pandas()[source]

converts table to pandas DataFrame object

property unique_id

computes unique identifier based on table content and meta data.

Returns:

unique identifier as string.

emzed.mf

alias of MolecularFormula

emzed.run_feature_finder_metabo_on_folder(in_folder, file_patterns=None, out_folder=None, ms_level=None, n_cores=1, verbose=False, run_feature_grouper=True, split_by_precursor_mz_tol=0.0, overwrite=False, **parameters)[source]

runs feature_finder_metabo on all files in the given folder matching file_patterns and saves the results in out_folder.

Parameters:
  • in_folder – input folder, must exist.

  • file_patterns – list of file patterns. If not specified, [“.mzML”, “.mzXML”] is used.

  • out_folder – output folder, not required to exist, will be created on demand. Default: out_folder = in_folder.

  • ms_level – optional ms level to be used for peak picking.

  • n_cores – run feature finding on n_cores in parallel.

  • verbose – set to True for verbose output.

  • parameters – check help(run_feature_finder_metabo) for details.

Returns:

None.

emzed.to_table(name, values, type_, format_=None, title=None, meta_data=None, path=None)[source]

generates a one-column Table from an iterable, e.g. from a list.

Parameters:
  • name – name of the column.

  • values – iterable with column values.

  • type_ – supported column types are int, float, bool, MzType, RtType, str, PeakMap, Table, object. In case you want to use Python objects like lists or dicts, use column type ‘object’ instead.

  • format_ – is a format string as “%d”. To suppress visibility set format_ = None. By default (not_specified) the method tries to determine a default format for the type.

  • title – Table title as string.

  • meta_data – Python dictionary to assign meta data to the table.

  • path – Path for the db backend, use None for an in memory db backend.

Returns:

emzed.Table