API Utility functions

emzed.utils.toTable(colName, iterable, format_='', type_=None, title='', meta=None)[source]

generates a one-column table from an iterable, e.g. from a list. colName is the name of the column.

  • if type_ is not given a common type for all values is determined,
  • if format_ is not given, a default format for type_ is used.

Further, one can provide a title and meta data.

emzed.utils.integrate(ftable, integratorid='std', msLevel=None, showProgress=True, n_cpus=-1, min_size_for_parallel_execution=500, post_fixes=None)[source]

integrates features in ftable and returns the processed table. ftable is not changed in place.

The peak integrator corresponding to integratorid is defined in algorithm_configs.py or local_configs.py.

n_cpus <= 0 has special meaning:

  • n_cpus = 0 means “use all cpu cores”,
  • n_cpus = -1 means “use all but one cpu cores”, etc.

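
The n_cpus convention above can be sketched in a few lines of plain Python. This is only an illustration: the helper name effective_cpus, its available argument and the clamping to at least one worker are assumptions, not part of emzed.

```python
import multiprocessing


def effective_cpus(n_cpus, available=None):
    """Translate an n_cpus setting into a worker count (hypothetical helper)."""
    if available is None:
        available = multiprocessing.cpu_count()
    if n_cpus <= 0:
        # 0 -> all cores, -1 -> all but one, -2 -> all but two, ...
        n_cpus = available + n_cpus
    # assumption: never use fewer than one or more than the available cores
    return max(1, min(n_cpus, available))
```

On a machine with eight cores, effective_cpus(-1, 8) gives 7 under these assumptions.
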
emzed.utils.attach_ms2_spectra(peak_table, peak_map, mode='union', mz_tolerance=0.0013, verbose=True)[source]

takes peak_table with columns “rtmin”, “rtmax”, “mzmin”, “mzmax” and “peakmap” and extracts the ms2 spectra for these peaks.

the peak_table is modified in place, an extra column “ms2_spectra” is added. the content of such a cell in the table is always a list of spectra. for modes “union” and “intersection” this list contains one single spectrum.

  • “all”: extracts a list of ms2 spectra per peak
  • “max_range”: extracts spec with widest m/z range
  • “max_energy”: extracts spec with maximal energy
  • “union”: merges all ms2 spectra from one peak to one spectrum containing all peaks
  • “intersection”: merges all ms2 spectra from one peak to one spectrum containing peaks which appear in all ms2 spectra.

mz_tolerance: only needed for modes “union” and “intersection”.

verbose: prints some diagnostic messages for checking whether the mz_tolerance parameter fits.
You should set this parameter to True if you are not sure whether mz_tolerance fits your machine's resolution.
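
To picture what “union” and “intersection” mean, here is a small stand-alone sketch. It is not emzed's implementation: spectra are represented as plain lists of (mz, intensity) tuples, and the grouping rule (neighbouring peaks closer than mz_tolerance are chained into one peak, reported with mean mz and maximal intensity) as well as the name merge_spectra are assumptions made for illustration.

```python
def merge_spectra(spectra, mode="union", mz_tolerance=0.0013):
    """Merge several peak lists into one (illustrative sketch only)."""
    # pool all peaks and remember which spectrum each one came from
    pooled = sorted(
        (mz, intensity, idx)
        for idx, spectrum in enumerate(spectra)
        for (mz, intensity) in spectrum
    )
    groups = []
    for peak in pooled:
        # chain a peak to the current group if it is close to the previous peak
        if groups and peak[0] - groups[-1][-1][0] <= mz_tolerance:
            groups[-1].append(peak)
        else:
            groups.append([peak])
    merged = []
    for group in groups:
        sources = set(idx for _, _, idx in group)
        # "intersection" keeps only peaks which appear in all spectra
        if mode == "intersection" and len(sources) < len(spectra):
            continue
        mean_mz = sum(mz for mz, _, _ in group) / len(group)
        max_intensity = max(intensity for _, intensity, _ in group)
        merged.append((mean_mz, max_intensity))
    return merged
```

With two spectra that share one peak near m/z 100, the union sketch returns every distinct peak once, while the intersection sketch returns only the shared one. If mz_tolerance is too small for the instrument, shared peaks fall into separate groups, which is the situation the verbose output is meant to reveal.
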

simple and fast version of Table.mergeTables for the case that all non-empty tables have common column names, types and formats.

emzed.utils.mergeTables(tables, reference_table=None, force_merge=False)

merges tables, e.g.:

>>> import emzed
>>> t1 = emzed.utils.toTable("a", [1], type_=int)
>>> t2 = t1.copy()
>>> t1.addColumn("b", 3, type_=int)
>>> t2.addColumn("c", 5, type_=int)
>>> print t1
a        b       
int      int     
------   ------  
1        3       

>>> print t2
a        c       
int      int     
------   ------  
1        5       

>>> t3 = emzed.utils.mergeTables([t1, t2])
>>> print t3
a        b        c       
int      int      int     
------   ------   ------  
1        3        -       
1        -        5       

In case of conflicting names, name orders, types or formats you can try force_merge=True or provide a reference table. This reference table only provides information about the wanted column names, types and formats, and is merged into the result only if it appears in tables.
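
The row-wise merge shown in the example can be mimicked without emzed. The following sketch is an assumption-laden illustration, not the library code: a table is a (column_names, rows) pair, missing cells are filled with None, and the function name merge_tables is hypothetical. Types, formats, force_merge and the reference table are ignored.

```python
def merge_tables(tables):
    """Stack tables row-wise over the union of their columns (sketch only)."""
    # collect column names in order of first appearance
    all_names = []
    for names, _ in tables:
        for name in names:
            if name not in all_names:
                all_names.append(name)
    merged_rows = []
    for names, rows in tables:
        for row in rows:
            lookup = dict(zip(names, row))
            # columns missing in this table become None
            merged_rows.append([lookup.get(name) for name in all_names])
    return all_names, merged_rows
```

Applied to the two one-row tables from the example, with columns (a, b) and (a, c), this yields the columns a, b, c and the rows [1, 3, None] and [1, None, 5].
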


Creates formula object which allows addition and subtraction:

>>> import emzed
>>> mf1 = emzed.utils.formula("H2O")
>>> mf2 = emzed.utils.formula("NaOH")
>>> mf3 = mf1 + mf2
>>> print str(mf3)
>>> print str(mf3 - mf1)

Mass calculation is supported too:

>>> print mf1.mass()

If you need some internal representation, you can get a dictionary. Keys are pairs of (symbol, massnumber), where massnumber = None refers to the lowest massnumber. Values of the dictionary are counts:

>>> print mf1.asDict()
OrderedDict([(('H', None), 2), (('O', None), 1)])
>>> mixed = emzed.utils.formula("[13]C2[14]C3")
>>> print mixed.asDict()
OrderedDict([(('C', 13), 2), (('C', 14), 3)])

emzed.utils.addmf(formula0, *formulas)[source]

Combines molecular formulas by addition and subtraction:

>>> import emzed
>>> print emzed.utils.addmf("H2O", "COOH")
>>> print emzed.utils.addmf("H2O", "COOH", "NaCl")

A leading minus sign subtracts the formula following this sign:

>>> print emzed.utils.addmf("H2O2", "-H2O")
>>> print emzed.utils.addmf("H2O", "COOH", "NaCl", "-H2O2")
>>> print emzed.utils.addmf("(CH2)7COOH", "-C7")
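
The arithmetic behind such calls can be sketched with element counters. This is an independent illustration and not emzed's parser: it handles only element symbols, counts and parentheses (no isotope prefixes such as [13]C), returns a plain dictionary instead of a formula string, and raises an error for negative counts. The names parse_formula and add_formulas are hypothetical.

```python
import re
from collections import Counter

# element with optional count | opening parenthesis | closing parenthesis with optional count
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)|(\()|\)(\d*)")


def parse_formula(formula):
    """Return a Counter mapping element symbol -> atom count."""
    stack = [Counter()]
    pos = 0
    while pos < len(formula):
        match = TOKEN.match(formula, pos)
        if match is None:
            raise ValueError("cannot parse %r at position %d" % (formula, pos))
        symbol, count, opening, group_count = match.groups()
        if symbol:
            stack[-1][symbol] += int(count or 1)
        elif opening:
            stack.append(Counter())
        else:
            if len(stack) < 2:
                raise ValueError("unbalanced parentheses in %r" % formula)
            inner = stack.pop()
            factor = int(group_count or 1)
            for sym, n in inner.items():
                stack[-1][sym] += n * factor
        pos = match.end()
    if len(stack) != 1:
        raise ValueError("unbalanced parentheses in %r" % formula)
    return stack[0]


def add_formulas(first, *others):
    """Add formulas; a leading minus sign subtracts the formula after it."""
    total = parse_formula(first)
    for formula in others:
        if formula.startswith("-"):
            total.subtract(parse_formula(formula[1:]))
        else:
            total.update(parse_formula(formula))
    if any(n < 0 for n in total.values()):
        raise ValueError("negative atom count in result")
    # drop elements which cancelled out completely
    return dict((sym, n) for sym, n in total.items() if n > 0)
```

For instance, add_formulas("(CH2)7COOH", "-C7") leaves one carbon, fifteen hydrogens and two oxygens in this sketch.
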

emzed.utils.formulaTable(min_mass, max_mass, C=(0, None), H=(0, None), N=(0, None), O=(0, None), P=(0, None), S=(0, None), prune=True)[source]

This is a reduced Python version of HR2 formula generator, see http://fiehnlab.ucdavis.edu/projects/Seven_Golden_Rules/Software/

This function generates a table containing molecular formulas consisting of elements C, H, N, O, P and S having a mass in range [min_mass, max_mass]. For each element one can provide either a fixed count or an inclusive range of atom counts to be considered in this process.

If prune is True, mass ratio rules (from “seven golden rules”) and valence bond checks are used to avoid unrealistic compounds in the table, else all formulas explaining the given mass range are generated.

Putting some restrictions on atom counts, e.g. C=(0, 100), can speed up the process tremendously.
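
Why restrictions help can be seen in a brute-force sketch: the number of candidates is the product of the sizes of all count ranges. The code below is not the HR2 algorithm and does no pruning. It only covers C, H and O, requires explicit upper bounds, and the name enumerate_formulas and the ranges dictionary are assumptions for illustration. The monoisotopic masses are standard values.

```python
from itertools import product

# monoisotopic masses in u
MONO_MASS = {"C": 12.0, "H": 1.00782503207, "O": 15.9949146196}


def enumerate_formulas(min_mass, max_mass, ranges):
    """Try every combination of atom counts in the inclusive ranges (sketch only)."""
    symbols = sorted(ranges)
    candidates = [range(ranges[s][0], ranges[s][1] + 1) for s in symbols]
    hits = []
    for counts in product(*candidates):
        mass = sum(n * MONO_MASS[s] for s, n in zip(symbols, counts))
        if min_mass <= mass <= max_mass:
            mf = "".join(
                s + (str(n) if n > 1 else "")
                for s, n in zip(symbols, counts)
                if n > 0
            )
            hits.append((mf, mass))
    return hits
```

Fixing C to 6 and bounding H and O, e.g. enumerate_formulas(132.0686, 132.0886, {"C": (6, 6), "H": (0, 30), "O": (0, 10)}), tests 1 * 31 * 11 combinations; widening the carbon range multiplies that number accordingly.
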


>>> import emzed
>>> m0 = emzed.mass.of("C6H12O3")
>>> mmin, mmax = m0 - 0.01, m0 + 0.01
>>> print mmin, mmax
132.068645383 132.088645383
>>> tab = emzed.utils.formulaTable(mmin, mmax)
>>> print tab
mf       m0       
str      float    
------   ------   
C2H8N6O  132.07596
C5H13N2P 132.08163
C5H12N2S 132.07212
C6H12O3  132.07865
C6H13OP  132.07040
C8H8N2   132.06875

>>> # reduce output by putting restrictions on atom counts:
>>> tab = emzed.utils.formulaTable(mmin, mmax, C=6, N=0, P=(0,3), S=0)
>>> print tab
mf       m0       
str      float    
------   ------   
C6H12O3  132.07865
C6H13OP  132.07040

>>> # generating all hydrocarbons with a neutral mass below 30:
>>> tab = emzed.utils.formulaTable(1, 30, C=(1, 100), H=(1,100), N=0, O=0, P=0, S=0, prune=False)
>>> print tab    
mf       m0      
str      float   
------   ------  
CH       13.00783
CH2      14.01565
C2H4     28.03130
C2H5     29.03913

emzed.utils.isotopeDistributionTable(formula, R=None, fullC13=False, minp=0.01, **kw)[source]

generates a table of the most common isotopes of the molecule with the given molecular formula.

If the resolution R is given, the measurement device is simulated, and overlapping peaks may merge.

fullC13=True assumes that only C13 carbon is present in formula.

Further you can give a threshold minp to consider only isotope peaks with an abundance above this value. The default is minp=0.01.

If you have special elementary isotope abundances which differ from the natural abundances, you can pass them as keyword arguments, e.g. emzed.utils.isotopeDistributionTable("S4C4", C=dict(C13=0.5, C12=0.5))
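
The effect of a custom carbon abundance can be estimated by hand with a binomial distribution. The sketch below considers carbon only and ignores all other elements, the resolution R and the minp threshold, so it is a rough illustration rather than what emzed computes. The function name carbon_isotopologues is hypothetical.

```python
from math import factorial


def carbon_isotopologues(n_carbon, p13):
    """Probability of k C13 atoms among n_carbon carbons, for k = 0..n_carbon."""
    dist = []
    for k in range(n_carbon + 1):
        # binomial coefficient "n_carbon choose k"
        n_comb = factorial(n_carbon) // (factorial(k) * factorial(n_carbon - k))
        dist.append((k, n_comb * p13 ** k * (1 - p13) ** (n_carbon - k)))
    return dist
```

For three carbons and C13=0.5 this gives 0.125, 0.375, 0.375 and 0.125 for zero to three C13 atoms.
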


>>> import emzed
>>> # natural abundances:
>>> tab = emzed.utils.isotopeDistributionTable("C3H7NO2")
>>> tab.abundance /= tab.abundance.sum()
>>> print tab
mf                                mass      abundance
str                               float     float    
------                            ------    ------   
[12]C3 [1]H7 [14]N1 [16]O2        89.047679 0.96     
[12]C2 [13]C1 [1]H7 [14]N1 [16]O2 90.051034 0.03     
-                                 -         0.01     

>>> # artificial abundances:
>>> tab = emzed.utils.isotopeDistributionTable("C3H7NO2", C=dict(C13=0.5, C12=0.5))
>>> tab.abundance /= tab.abundance.sum()
>>> print tab
mf                                mass      abundance
str                               float     float    
------                            ------    ------   
[12]C3 [1]H7 [14]N1 [16]O2        89.047679 0.12     
[12]C2 [13]C1 [1]H7 [14]N1 [16]O2 90.051034 0.37     
[12]C1 [13]C2 [1]H7 [14]N1 [16]O2 91.054389 0.37     
[13]C3 [1]H7 [14]N1 [16]O2        92.057744 0.12     
-                                 -         0.01     

emzed.utils.plotIsotopeDistribution(formula, R=None, fullC13=False, minp=0.01, plotGauss=None, **kw)[source]

plots the isotope distribution for the given molecular formula. For all parameters except plotGauss, see isotopeDistributionTable().

If R is provided, gaussian peaks are plotted, else centroids. This behavior can be overridden by setting plotGauss to True or False.

If plotGauss is True, bell shaped curves are plotted, else the centroids according to the used resolution are shown.

For low minp, the plot is drawn faster with plotGauss=False.
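
How a resolution value turns a centroid into a bell shaped curve can be sketched as follows. The sketch assumes the common definition R = m / FWHM, which may differ from the definition emzed uses internally, and the function name gaussian_peak is hypothetical.

```python
import math


def gaussian_peak(mz_grid, mz0, intensity, R):
    """Sample a gaussian peak centered at mz0 on the given m/z grid (sketch only)."""
    # assumption: R = m / FWHM
    fwhm = mz0 / float(R)
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return [intensity * math.exp(-0.5 * ((mz - mz0) / sigma) ** 2) for mz in mz_grid]
```

Evaluating every peak on a fine grid takes more work than drawing one centroid per peak, which fits the remark that plotGauss=False is faster when many low-abundance peaks are included.
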

>>> emzed.utils.plotIsotopeDistribution("C3H7NO2", C=dict(C13=0.5, C12=0.5), R=5000) 

opens urlPath in the browser, e.g.:

>>> emzed.utils.openInBrowser("http://emzed.biol.ethz.ch") 

Adds the mz value for peaks not detected with the centWave algorithm, based on the rt and mz window. Required columns are mzmin, mzmax, rtmin, rtmax and peakmap mz; postfixes are automatically taken into account.