emzed package
Subpackages
- emzed.align package
- emzed.annotate package
- emzed.chemistry package
  - Submodules
    - emzed.chemistry.abundance module
    - emzed.chemistry.adducts module
    - emzed.chemistry.elements module
    - emzed.chemistry.fetch_elements_pubchem module
    - emzed.chemistry.fit_formula module
    - emzed.chemistry.formula_parser module
    - emzed.chemistry.isotope_generator module
    - emzed.chemistry.mass module
    - emzed.chemistry.molecular_formula module
    - emzed.chemistry.pubchem module
  - Module contents
- emzed.config package
- emzed.core package
- emzed.ms2 package
- emzed.ms_data package
  - Submodules
    - emzed.ms_data.peak_map module
      - BoundSpectrum
      - Chromatogram
      - ChromatogramType
      - DbBackedProperty
      - ImmutableMSChromatograms
      - ImmutablePeakMap
      - ImmutableSpectra
      - MSChromatogram
      - MutableSpectra
      - PeakMap
      - SpectraBase
      - Spectrum
      - chromatogram()
      - create_indices()
      - create_table()
      - db_backed_property
      - extract_spectrum()
      - insert_chromatogram()
      - insert_peaks()
      - insert_precursors()
      - profile()
      - representing_mz_peak()
      - to_openms_spectrum()
    - emzed.ms_data.peak_map_hasher module
  - Module contents
- emzed.optimized package
- emzed.peak_picking package
- emzed.pyopenms package
- emzed.quantification package
- emzed.r_connect package
- emzed.remote_package package
- emzed.table package
  - Submodules
    - emzed.table.add_column module
    - emzed.table.base_models module
    - emzed.table.col_types module
    - emzed.table.collapse module
    - emzed.table.delete_rows module
    - emzed.table.expressions module
    - emzed.table.extract_columns_model module
    - emzed.table.filter_model module
    - emzed.table.full_table_model module
    - emzed.table.group_by module
    - emzed.table.immutable_table_model module
    - emzed.table.join module
    - emzed.table.load_into_from module
    - emzed.table.pickle module
    - emzed.table.prepare_table_cell_content module
    - emzed.table.replace_column module
    - emzed.table.row_class module
    - emzed.table.select_model module
    - emzed.table.sort_model module
    - emzed.table.table module
    - emzed.table.table_migrations module
    - emzed.table.table_utils module
  - Module contents
- emzed.targeted package
- emzed.utils package
Submodules
Module contents
- class emzed.PeakMap(conn, access_name, meta_data, info)
Bases: ImmutablePeakMap
- class emzed.Table(model, meta_data=None, *, _freeze_unique_id=None)
Bases: object
- add_column(name, what, type_, format_=<class 'emzed.table.table.not_specified'>, insert_before=None, insert_after=None)
adds a new column called name in place.
- Parameters:
name – the name of the new column.
what – either a list with the same length as the table or an expression.
type_ – supported column types are int, float, bool, MzType, RtType, str, PeakMap, Table, object. To store Python objects like lists or dicts, use column type ‘object’ instead.
format_ – a format string such as “%d”, or an executable string with Python code. To suppress visibility set format_ = None. By default (not_specified) the method tries to determine a default format for the type.
insert_before – places the new column directly left of column insert_before, given as the name of an existing column or as an integer index (negative values allowed!).
insert_after – places the new column directly right of column insert_after.
- add_column_with_constant_value(name, value, type_, format_=<class 'emzed.table.table.not_specified'>, insert_before=None, insert_after=None)
adds column name with the same constant value in every row.
For the other parameters see add_column(); instead of what this method takes:
- Parameters:
value – any of the accepted types int, float, bool, MzType, RtType, str, PeakMap, Table.
- add_enumeration(col_name='id', insert_before=None, insert_after=None, start_with=0)
adds an enumerated column to the table in place; by default it becomes the first column.
- Parameters:
col_name – name of the added column. Default name is id.
insert_before – to add column col_name at a defined position, specify its position via the name of an existing column or an integer index (negative values allowed). Setting insert_before and insert_after at the same time is not allowed.
insert_after – to add column col_name at a defined position, specify its position via the name of an existing column or an integer index (negative values allowed). Setting insert_before and insert_after at the same time is not allowed.
start_with – start value for creating the ids. Default value is 0.
- add_or_replace_column(name, what, type_=None, format_=<class 'emzed.table.table.not_specified'>, insert_before=None, insert_after=None)
replaces the content of column name if it exists, else adds column name (in place).
For parameters see replace_column().
- add_or_replace_column_with_constant_value(name, what, type_=None, format_=<class 'emzed.table.table.not_specified'>, insert_before=None, insert_after=None)
replaces the content of column name with a constant value if name exists, else adds column name (in place).
For parameters see replace_column_with_constant_value().
- apply(function, *args, ignore_nones=True, result_type=None)
allows computing columns using a function with multiple arguments.
- Parameters:
function – any function accepting the arguments *args. The return value can be used to compute another column.
args – function arguments. Arguments can be column expressions like t['col_name'], or local or global variables accepted by the function.
ignore_nones – since None represents a missing value, apply does not call function if one of the arguments is None and uses None as result instead. If the function is able to handle such missing values, set ignore_nones to False.
Example: the following code
    import emzed

    def replace_none(v):
        # replace missing values by -1, keep all other values
        return -1 if v is None else v

    t = emzed.to_table("a", [1, None, 5], int)
    t.add_column("b", t.apply(replace_none, t.a), int)
    t.add_column("c", t.apply(replace_none, t.a, ignore_nones=False), int)
    print(t)
prints
    a    b    c
    int  int  int
    ---  ---  ---
    1    1    1
    -    -    -1
    5    5    5
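The ignore_nones rule can be mimicked in plain Python. The sketch below is not emzed's implementation: the helper apply_to_columns is hypothetical and works on plain lists instead of column expressions, but it yields the same b and c values as the example.

```python
from typing import Any, Callable, List, Optional


def apply_to_columns(function: Callable[..., Any], *columns: List[Any],
                     ignore_nones: bool = True) -> List[Optional[Any]]:
    """Hypothetical helper: call function row by row on equally long lists."""
    results: List[Optional[Any]] = []
    for args in zip(*columns):
        if ignore_nones and any(arg is None for arg in args):
            # a missing argument gives a missing result, function is not called
            results.append(None)
        else:
            results.append(function(*args))
    return results


def replace_none(v):
    # replace missing values by -1, keep all other values
    return -1 if v is None else v


a = [1, None, 5]
print(apply_to_columns(replace_none, a))                      # [1, None, 5]
print(apply_to_columns(replace_none, a, ignore_nones=False))  # [1, -1, 5]
```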
- property col_formats
Column formats.
- Returns:
tuple of format specifiers.
- property col_names
Column names.
- Returns:
tuple of strings.
- property col_types
Column types.
- Returns:
tuple of types.
- collapse(*col_names, new_col_name='collapsed', path=None)
collapses a table by grouping according to columns col_names.
- Parameters:
col_names – column names with values defining collapsing groups.
new_col_name – name of the new column resulting from the collapsing process.
path – If specified the result will be a Table with a db file backend, else the result will be managed in memory.
- Returns:
emzed.Table
Example: for a table t
    a    b    c
    int  int  int
    ---  ---  ---
    1    1    1
    1    1    2
    2    1    3
    2    2    4
the call
    >>> print(t.collapse('a'))
results in
    a    collapsed
    int  emzed.Table
    ---  ---------------
    1    <Table af3 ...>
    2    <Table e9f ...>
- consolidate(path=None, *, overwrite=False)
consolidates the table if the underlying database table is a view.
- Parameters:
path – If specified the result will be a Table with a db file backend, else the result will be managed in memory.
overwrite – Indicates if an already existing database file should be overwritten.
- Returns:
- copy(path=None, *, overwrite=False)
consolidates the table if the underlying database table is a view.
- Parameters:
path – If specified the result will be a Table with a db file backend, else the result will be managed in memory.
overwrite – Indicates if an already existing database file should be overwritten.
- Returns:
- static create_table(col_names, col_types, col_formats=None, rows=None, title=None, meta_data=None, path=None)
creates a table.
- Parameters:
col_names – list or tuple of strings.
col_types – list of types.
col_formats – list of formats using format specifiers like “%.2f”. If not specified, emzed tries to guess appropriate formats based on column type and column name.
rows – list of lists.
title – table title as string.
meta_data – dictionary to manage user defined meta data.
path – path for the db backend; the default None uses the in-memory db backend.
- Returns:
- drop_columns(*col_names)
removes columns in place.
- Parameters:
col_names – column names, either exact names or names containing the wild cards ? and *.
Example: for a Table t with columns id, mz, mzmin, mzmax, sample_1k1, sample_1m1, sample_1k2
    t.drop_columns('mz*', 'sample_1?1')
results in t with columns id, sample_1k2.
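Which names a pattern removes can be checked with Python's fnmatch module. This sketch assumes that emzed's ? and * behave like case-sensitive shell wild cards; fnmatch additionally understands [seq] ranges, which the text above does not mention. remaining_columns is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from typing import List


def remaining_columns(col_names: List[str], *patterns: str) -> List[str]:
    """Hypothetical helper: keep the names matched by none of the wild card patterns."""
    return [name for name in col_names
            if not any(fnmatchcase(name, pattern) for pattern in patterns)]


cols = ["id", "mz", "mzmin", "mzmax", "sample_1k1", "sample_1m1", "sample_1k2"]
print(remaining_columns(cols, "mz*", "sample_1?1"))  # ['id', 'sample_1k2']
```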
- extract_columns(*col_names, keep_view=False, path=None, overwrite=False)
returns a new Table with the selected columns col_names.
- Parameters:
col_names – list or tuple with selected, existing column names.
keep_view – keep view or consolidate result (default: False).
path – path in case the view is consolidated, None keeps the result in memory.
overwrite – if path is not None, this flag specifies if an existing file should be overwritten.
- fast_join(other, col_name, col_name_other=None, atol=0.0, rtol=0.0, extra_condition=None, *, path=None, overwrite=False)
joins (combines) two tables based on approximate equality of two numerical columns.
- Parameters:
other – second table for join.
col_name – column name to consider.
col_name_other – column name of other to consider in case it differs from col_name.
atol – absolute tolerance for the approximate equality check.
rtol – relative tolerance for the approximate equality check.
- Returns:
Performance: if other is significantly larger than self, it is recommended to swap the tables.
The approximate equality check for two numbers a and b is:
    abs(a - b) <= atol + rtol * abs(a)
So if you only need a check based on absolute tolerance, set rtol to 0.0; if you only need a relative tolerance check, set atol to 0.0.
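The following plain-Python sketch spells out the tolerance check and which pairs a join based on it matches. It assumes a is the value from self and b the value from other, which the text above does not state explicitly; the quadratic naive_approx_join is a hypothetical reference, not emzed's implementation. Because only abs(a) enters the relative term, the check is not symmetric when rtol is non-zero.

```python
from typing import List, Tuple


def approx_equal(a: float, b: float, atol: float = 0.0, rtol: float = 0.0) -> bool:
    # the check quoted above; the relative part scales with abs(a) only
    return abs(a - b) <= atol + rtol * abs(a)


def naive_approx_join(left: List[float], right: List[float],
                      atol: float = 0.0, rtol: float = 0.0) -> List[Tuple[float, float]]:
    """Quadratic reference: every (a, b) pair that passes the check."""
    return [(a, b) for a in left for b in right if approx_equal(a, b, atol, rtol)]


print(approx_equal(100.0, 100.004, atol=0.005))  # True
print(approx_equal(100.0, 100.004, rtol=1e-5))   # False: allowed deviation is 0.001
print(approx_equal(100.0, 111.0, rtol=0.1), approx_equal(111.0, 100.0, rtol=0.1))  # False True
print(naive_approx_join([100.0, 200.0], [100.003, 200.5, 300.0], atol=0.01))
```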
- fast_left_join(other, col_name, col_name_other=None, atol=0.0, rtol=0.0, extra_condition=None, *, path=None, overwrite=False)
joins (combines) two tables based on approximate equality of two numerical columns.
In contrast to fast_join() this method also includes the rows of self without a match.
- Parameters:
other – second table for join.
col_name – column name to consider.
col_name_other – column name of other to consider in case it differs from col_name.
atol – absolute tolerance for the approximate equality check.
rtol – relative tolerance for the approximate equality check.
- Returns:
Performance: if other is significantly larger than self, it is recommended to swap the tables.
The approximate equality check for two numbers a and b is:
    abs(a - b) <= atol + rtol * abs(a)
So if you only need a check based on absolute tolerance, set rtol to 0.0; if you only need a relative tolerance check, set atol to 0.0.
- filter(condition, keep_view=False, path=None, overwrite=False)
creates a new table with the rows fulfilling the given condition, similar to the pandas query method.
- Parameters:
condition – expression like t.a < 0 or t.a <= t.b.
keep_view – keep view or consolidate result (default: False).
path – path in case the view is consolidated, None keeps the result in memory.
overwrite – if path is not None, this flag specifies if an existing file should be overwritten.
- Returns:
emzed.Table with filtered rows.
- static from_pandas(df, col_names=None, col_types=None, col_formats=None)
converts a pandas data frame into an emzed Table.
- Parameters:
df – pandas data frame.
col_names – list of column names, can be used to override the data frame column names.
col_types – list of column types; if not provided, emzed determines types from column contents and names.
col_formats – list of column formats; if not provided, emzed determines formats from column contents and names.
- Returns:
- get_column(name)
returns the column expression object for column name. You can use t[name] instead.
- group_by(*colums, group_nones=False)
returns a GroupBy object in which the rows are grouped by the given columns.
- param columns:
table columns, i.e. t.a or t['b'].
- param group_nones:
if False (default), rows where a group column is None are not grouped and get None as result; with True, None is treated as a group value of its own.
- returns:
GroupBy object
Examples: For given Table t
    a    b    c
    int  int  int
    ---  ---  ---
    0    1    2
    1    -    1
    2    -    0
    2    2    3
the calls
    >>> t.add_column('ga', t.group_by(t.a).min(t.c), int)
    >>> t.add_column('gb1', t.group_by(t.b).min(t.c), int)
    >>> t.add_column('gb2', t.group_by(t.b, group_nones=True).min(t.c), int)
    >>> print(t)
result in
    a    b    c    ga   gb1  gb2
    int  int  int  int  int  int
    ---  ---  ---  ---  ---  ---
    0    1    2    2    2    2
    1    -    1    1    -    0
    2    -    0    0    -    0
    2    2    3    0    3    3
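To make the grouping semantics concrete, here is a plain-Python sketch that reproduces the ga and gb1 columns of the example with lists instead of column expressions, and shows how group_nones=True changes the result for the rows with a missing b. group_min is a hypothetical helper, and it assumes the values within a group contain no None.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List, Optional


def group_min(keys: List[Optional[Hashable]], values: List[Any],
              group_nones: bool = False) -> List[Optional[Any]]:
    """Hypothetical helper: per row, the minimum of values within the row's group."""
    groups: Dict[Optional[Hashable], List[Any]] = defaultdict(list)
    for key, value in zip(keys, values):
        if key is None and not group_nones:
            continue  # rows with a missing key are not grouped
        groups[key].append(value)
    minima = {key: min(vals) for key, vals in groups.items()}
    # rows whose key was skipped are not in minima and get None
    return [minima.get(key) for key in keys]


a = [0, 1, 2, 2]
b = [1, None, None, 2]
c = [2, 1, 0, 3]
print(group_min(a, c))                    # [2, 1, 0, 0]
print(group_min(b, c))                    # [2, None, None, 3]
print(group_min(b, c, group_nones=True))  # [2, 0, 0, 3]
```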
- join(other, expression=None, *, path=None, overwrite=False)
joins (combines) two tables.
- Parameters:
other – second table for join.
expression – If None, this method returns a table with the row-wise cross product of both tables, else this expression is used to filter rows from this cross product.
- Returns:
Example:
if you have two tables t1 and t2 as
    id   mz
    int  float
    ---  -----
    0    100.0
    1    200.0
    2    300.0
and
    id   mz     rt
    int  float  float
    ---  -----  -----
    0    100.0  10.0
    1    110.0  20.0
    2    200.0  30.0
then the result of
    t1.join(t2, t1.mz.in_range(t2.mz - 20.0, t2.mz + 20.0))
is
    id   mz     id__0  mz__0  rt__0
    int  float  int    float  float
    ---  -----  -----  -----  -----
    0    100.0  0      100.0  10.0
    0    100.0  1      110.0  20.0
    1    200.0  2      200.0  30.0
If you do not provide an expression, this method returns the full cross product.
- left_join(other, expression=None, *, path=None, overwrite=False)
Combines two tables (also known as left outer join).
- Parameters:
other – Second table for join.
expression – If None, this method returns a table with the row-wise cross product of both tables. Else this expression is used to filter rows from this cross product, whereby all rows of the left table are kept.
- Returns:
If we take the example from join(), then
    t1.left_join(t2, t1.mz.in_range(t2.mz - 20.0, t2.mz + 20.0))
results in
    id   mz     id__0  mz__0  rt__0
    int  float  int    float  float
    ---  -----  -----  -----  -----
    0    100.0  0      100.0  10.0
    0    100.0  1      110.0  20.0
    1    200.0  2      200.0  30.0
    2    300.0  -      -      -
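The difference between join() and left_join() can be sketched in plain Python with lists of dicts. This is a hypothetical illustration, not emzed's implementation: join_rows and mz_close are made-up names, the "__0" postfix is taken from the example output, and in_range is assumed to include its bounds.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def join_rows(left: List[Row], right: List[Row],
              condition: Callable[[Row, Row], bool],
              keep_unmatched_left: bool = False) -> List[Row]:
    """Hypothetical helper: filtered cross product of two lists of rows."""
    right_keys = list(right[0]) if right else []
    result: List[Row] = []
    for l_row in left:
        matched = False
        for r_row in right:
            if condition(l_row, r_row):
                matched = True
                # columns of the right table get the postfix "__0"
                result.append({**l_row, **{k + "__0": v for k, v in r_row.items()}})
        if keep_unmatched_left and not matched:
            # left join: keep the row and fill the right-hand columns with None
            result.append({**l_row, **{k + "__0": None for k in right_keys}})
    return result


t1 = [{"id": 0, "mz": 100.0}, {"id": 1, "mz": 200.0}, {"id": 2, "mz": 300.0}]
t2 = [{"id": 0, "mz": 100.0, "rt": 10.0},
      {"id": 1, "mz": 110.0, "rt": 20.0},
      {"id": 2, "mz": 200.0, "rt": 30.0}]


def mz_close(r1: Row, r2: Row) -> bool:
    # mirrors t1.mz.in_range(t2.mz - 20.0, t2.mz + 20.0), bounds assumed inclusive
    return r2["mz"] - 20.0 <= r1["mz"] <= r2["mz"] + 20.0


inner = join_rows(t1, t2, mz_close)
outer = join_rows(t1, t2, mz_close, keep_unmatched_left=True)
print(len(inner), len(outer))  # 3 4
```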
- classmethod load(path)
loads table from disk into memory.
- Parameters:
path – path to file.
- Returns:
- static load_csv(path, col_names=None, col_types=None, col_formats=None, *, delimiter=';', dash_is_none=True)
loads a csv file.
- Parameters:
path – path to csv file.
col_names – list of column names; if not provided, the first line of the csv file is used instead.
col_types – list of column types; if not provided, emzed determines types from column contents and names.
col_formats – list of column formats; if not provided, emzed determines formats from column contents and names.
delimiter – csv delimiter character.
dash_is_none – cells with ‘-’ are interpreted as None (missing value). If - should be handled as a string with the single character “-”, set this argument to False.
- Returns:
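The effect of delimiter and dash_is_none on the raw cells can be illustrated with Python's csv module. The sketch assumes that only whole cells consisting of a single "-" are mapped to None, and it leaves out the type detection emzed performs; read_csv_cells is a hypothetical name.

```python
import csv
import io
from typing import List, Optional, Tuple


def read_csv_cells(text: str, delimiter: str = ";",
                   dash_is_none: bool = True) -> Tuple[List[str], List[List[Optional[str]]]]:
    """Hypothetical helper: split csv text into header and cells, mapping '-' to None."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    col_names = next(reader)  # first line holds the column names
    rows: List[List[Optional[str]]] = []
    for row in reader:
        if dash_is_none:
            rows.append([None if cell == "-" else cell for cell in row])
        else:
            rows.append(list(row))
    return col_names, rows


text = "name;mz\nglucose;181.07\nunknown;-\n"
print(read_csv_cells(text))
print(read_csv_cells(text, dash_is_none=False))
```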
- static load_excel(path, col_names=None, col_types=None, col_formats=None)
loads an Excel file.
- Parameters:
path – path to file.
col_names – list of column names; if not provided, the first line of the .xlsx or .xls file is used instead.
col_types – list of column types; if not provided, emzed determines types from column contents and names.
col_formats – list of column formats; if not provided, emzed determines formats from column contents and names.
- Returns:
- property meta_data
- classmethod open(path)
opens a table on disk without loading the data into memory.
- Parameters:
path – path to file.
- Returns:
- property path
- print_(max_rows=30, max_col_width=None, stream=None)
prints the table.
- Parameters:
max_rows – Maximum number of rows to display. If the table is longer, only head and tail of the table are shown; the missing part is denoted with “…”.
max_col_width – If specified, the width of columns is restricted to this value.
stream – file object to redirect printing to, e.g. a file.
- rename_columns(**from_to)
changes column names from current to new names using keyword arguments.
- param from_to:
keyword arguments like a="b", see example below.
Example:
    t.rename_columns(a="b")
renames column "a" to "b".
- rename_postfixes(**from_to)
changes column name postfixes (endings) from current to new postfixes using keyword arguments.
Example:
    import emzed

    t = emzed.Table.create_table(
        ["a", "a__0", "a__1", "b__0", "b__1"],
        [int, int, int, int, int],
        rows=[[1, 2, 3, 4, 5]],
    )
    print(t)
    t.rename_postfixes(__0="_zero")
    print(t)
prints
    a    a__0  a__1  b__0  b__1
    int  int   int   int   int
    ---  ----  ----  ----  ----
    1    2     3     4     5

    a    a_zero  a__1  b_zero  b__1
    int  int     int   int     int
    ---  ------  ----  ------  ----
    1    2       3     4       5
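The renaming rule shown in the example can be sketched on a plain list of names. This stand-alone function is hypothetical, not the Table method, and it assumes that each column is renamed by at most one postfix and that only exact endings are replaced.

```python
from typing import List


def rename_postfixes(col_names: List[str], **from_to: str) -> List[str]:
    """Hypothetical helper: replace the ending of each matching column name."""
    renamed = []
    for name in col_names:
        for old, new in from_to.items():
            if old and name.endswith(old):
                name = name[: -len(old)] + new
                break  # apply at most one renaming per column
        renamed.append(name)
    return renamed


cols = ["a", "a__0", "a__1", "b__0", "b__1"]
print(rename_postfixes(cols, __0="_zero"))  # ['a', 'a_zero', 'a__1', 'b_zero', 'b__1']
```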
- replace_column(name, what, type_=None, format_=<class 'emzed.table.table.not_specified'>)
replaces the content of the existing column name in place.
- Parameters:
name – the name of the existing column.
what – a list with the same length as the table or an expression.
type_ – supported column types are int, float, bool, MzType, RtType, str, PeakMap, Table, object. To store Python objects like lists or dicts, use column type ‘object’ instead.
format_ – a format string such as “%d”, or an executable string with Python code. To suppress visibility set format_ = None. By default (not_specified) the method tries to determine a default format for the type.
insert_before – places column name directly left of column insert_before, given as the name of an existing column or as an integer index (negative values allowed!).
insert_after – places column name directly right of column insert_after.
- replace_column_with_constant_value(name, what, type_=None, format_=<class 'emzed.table.table.not_specified'>)
replaces the content of column name with the same constant value what in every row.
For method parameters see replace_column(), with the exception of:
- Parameters:
what – any of the accepted types int, float, bool, MzType, RtType, str, PeakMap, Table.
- property rows
- Returns:
All rows as list of tuples.
- save(path, *, overwrite=False)
saves the table to a file.
- Parameters:
path – path describing the target location.
overwrite – If set to True an existing file will be overwritten, else an exception will be raised.
- save_csv(path, delimiter=';', as_printed=False, dash_is_none=True, *, overwrite=False)
saves the Table as csv file in path.
- Parameters:
path – specifies the path of the file. The path must end with .csv.
delimiter – alias for sep. The default value ‘;’ follows the Excel dialect.
as_printed – If True, formatted values will be stored. Note that format settings can lead to information loss, e.g. if the column format is set to “%.2f” only the first 2 decimal places will be saved.
overwrite – If set to True an existing file will be overwritten, else an exception will be raised.
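The information loss mentioned for as_printed comes from the format string itself, as this stdlib-only snippet shows. It assumes that as_printed=True writes the values formatted with the column's %-style format; no emzed call is involved.

```python
values = [3.14159, 2.71828, 100.0]

# what a "%.2f" column format turns the values into when written as printed
printed = ["%.2f" % v for v in values]
# reading the printed strings back cannot recover the dropped decimals
restored = [float(s) for s in printed]

print(printed)   # ['3.14', '2.72', '100.00']
print(restored)  # [3.14, 2.72, 100.0]
```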
- save_excel(path, *, overwrite=False)
saves the Table as xls or xlsx file in path.
- Parameters:
path – specifies the path of the file. The path must end with .xls or .xlsx.
- set_col_format(col_name, format_)
sets the format of column col_name to format_.
- Parameters:
col_name – column name.
format_ – accepted column format (see add_column()).
- Returns:
None.
- set_col_type(col_name, type_)
sets the type of column col_name to type_.
- Parameters:
col_name – column name.
type_ – accepted column type (see add_column()).
- Returns:
None.
- sort_by(*col_names, ascending=True, keep_view=False, path=None, overwrite=False)
sorts the table by the given column names in the given order.
- Parameters:
col_names – one or more column names as separate arguments.
ascending – either a bool or a list/tuple of bools with one entry per specified column name.
keep_view – keep view or consolidate result (default: False).
path – path in case the view is consolidated, None keeps the result in memory.
overwrite – if path is not None, this flag specifies if an existing file should be overwritten.
- Returns:
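How one bool per column combines into a single ordering can be sketched with Python's stable sort. This works on plain tuples, ignores missing values (a None would make the comparison fail), and sort_rows is a hypothetical name; it only illustrates the documented ascending semantics.

```python
from typing import List, Sequence, Tuple, Union


def sort_rows(rows: Sequence[Tuple], keys: Sequence[int],
              ascending: Union[bool, Sequence[bool]] = True) -> List[Tuple]:
    """Hypothetical helper: sort tuples by several column indices."""
    if isinstance(ascending, bool):
        ascending = [ascending] * len(keys)
    if len(ascending) != len(keys):
        raise ValueError("need one ascending flag per sort column")
    result = list(rows)
    # Python's sort is stable, so sorting by the least significant key first
    # and by the most significant key last gives the combined order
    for key, asc in reversed(list(zip(keys, ascending))):
        result.sort(key=lambda row: row[key], reverse=not asc)
    return result


rows = [(1, 1, 1), (1, 1, 2), (2, 1, 3), (2, 2, 4)]
# columns a, b, c: a ascending, then c descending
print(sort_rows(rows, [0, 2], [True, False]))
```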
- split_by(*col_names, keep_view=False)
generates a list of subtables; within each subtable the split columns col_names contain unique values.
- Parameters:
col_names – column names with values defining split groups.
- Returns:
a list of sub_tables
Example: If we have a table t as
    a    b    c
    int  int  int
    ---  ---  ---
    1    1    1
    1    1    2
    2    1    3
    2    2    4
then
    sub_tables = t.split_by("a", "b")
results in 3 subtables, sub_tables[0]
    a    b    c
    int  int  int
    ---  ---  ---
    1    1    1
    1    1    2
sub_tables[1]
    a    b    c
    int  int  int
    ---  ---  ---
    2    1    3
and sub_tables[2]
    a    b    c
    int  int  int
    ---  ---  ---
    2    2    4
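With plain tuples, the grouping behind split_by() looks roughly like this sketch. split_rows is hypothetical, and the order of the resulting groups (order of first occurrence) is an assumption, since the text above does not specify it. Splitting the example rows by a alone yields two groups, splitting by a and b yields three.

```python
from typing import Dict, List, Sequence, Tuple


def split_rows(rows: Sequence[Tuple], key_indices: Sequence[int]) -> List[List[Tuple]]:
    """Hypothetical helper: group rows by the values in the given columns."""
    groups: Dict[Tuple, List[Tuple]] = {}
    for row in rows:
        key = tuple(row[i] for i in key_indices)
        groups.setdefault(key, []).append(row)
    # dicts keep insertion order, so groups appear in order of first occurrence
    return list(groups.values())


rows = [(1, 1, 1), (1, 1, 2), (2, 1, 3), (2, 2, 4)]
print(split_rows(rows, [0, 1]))  # three groups: (1, 1), (2, 1), (2, 2)
print(split_rows(rows, [0]))     # two groups: a == 1 and a == 2
```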
- split_by_iter(*col_names, keep_view=False)
builds a generator yielding subtables; within each subtable the split columns col_names contain unique values.
- Parameters:
col_names – column names with values defining split groups.
- Returns:
a generator object of subtables
Referring to the example table of split_by():
    >>> sub_tables = t.split_by_iter("a")
    >>> print(next(sub_tables))
results in
    a    b    c
    int  int  int
    ---  ---  ---
    1    1    1
    1    1    2
hence the first subtable of t, corresponding to sub_tables[0] in the split_by() example. split_by_iter() can be more memory efficient than split_by().
- static stack_tables(tables, path=None, overwrite=False)
builds a single Table from a list or tuple of Tables.
- Parameters:
tables – list or tuple of Tables. All tables must have the same column names with the same types and formats.
path – If specified the result will be a Table with a db file backend, else the result will be managed in memory.
overwrite – Indicates if an already existing database file should be overwritten.
- Returns:
- supported_postfixes(col_names)
returns the postfixes (endings) shared by the columns col_names, i.e. the postfixes for which every given column name followed by the postfix is an existing column.
- Parameters:
col_names – list or tuple of column names.
- Returns:
list of common postfixes.
- Examples: Assuming a Table with columns ['rt', 'rtmin', 'rtmax', 'rt1', 'rtmin1']:
    >>> t.supported_postfixes(['rt'])
returns ['', 'min', 'max', '1', 'min1']
    >>> t.supported_postfixes(['rt', 'rtmin'])
returns ['', '1']
    >>> t.supported_postfixes(['rt', 'rtmax'])
returns ['']
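The three results in the examples follow from a simple rule: a postfix p is supported if every given name followed by p is an existing column. This plain-Python sketch reproduces them; the function is hypothetical, takes the full column list explicitly, and returns postfixes in column order, and emzed's own implementation may differ.

```python
from typing import List, Sequence


def supported_postfixes(all_col_names: Sequence[str], col_names: Sequence[str]) -> List[str]:
    """Hypothetical helper: postfixes p such that name + p exists for every name."""
    existing = set(all_col_names)
    first = col_names[0]
    # candidate postfixes come from the columns starting with the first name
    candidates = [name[len(first):] for name in all_col_names if name.startswith(first)]
    return [p for p in candidates if all(c + p in existing for c in col_names)]


cols = ["rt", "rtmin", "rtmax", "rt1", "rtmin1"]
print(supported_postfixes(cols, ["rt"]))           # ['', 'min', 'max', '1', 'min1']
print(supported_postfixes(cols, ["rt", "rtmin"]))  # ['', '1']
print(supported_postfixes(cols, ["rt", "rtmax"]))  # ['']
```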
- property title
- property unique_id
computes a unique identifier based on table content and meta data.
- Returns:
unique identifier as string.
- emzed.mf
alias of MolecularFormula
- emzed.run_feature_finder_metabo_on_folder(in_folder, file_patterns=None, out_folder=None, ms_level=None, n_cores=1, verbose=False, run_feature_grouper=True, split_by_precursor_mz_tol=0.0, overwrite=False, **parameters)
runs feature_finder_metabo on all files in the given folder matching file_patterns and saves the resulting table in out_folder.
- Parameters:
in_folder – input folder, must exist.
file_patterns – list of file patterns. If not specified, [“.mzML”, “.mzXML”] is used.
out_folder – output folder, not required to exist, will be created on demand. Default: out_folder = in_folder.
ms_level – optional MS level to be used for peak picking.
n_cores – number of cores to run feature finding on in parallel.
verbose – set to True for verbose output.
parameters – check help(run_feature_finder_metabo) for details.
- Returns:
None.
- emzed.to_table(name, values, type_, format_=None, title=None, meta_data=None, path=None)
generates a one-column Table from an iterable, e.g. from a list.
- Parameters:
name – name of the column.
values – iterable with column values.
type_ – supported column types are int, float, bool, MzType, RtType, str, PeakMap, Table, object. To store Python objects like lists or dicts, use column type ‘object’ instead.
format_ – a format string such as “%d”. To suppress visibility set format_ = None. By default (not_specified) the method tries to determine a default format for the type.
title – Table title as string.
meta_data – Python dictionary to assign meta data to the table.
path – Path for the db backend, use None for an in-memory db backend.
- Returns:
emzed.Table