[1]:
%%capture
# setup
import os

import emzed
import numpy as np
import matplotlib.pyplot as plt

# loading peakmap example
data_folder = os.path.join(os.getcwd(), "tutorial_data")
path = os.path.join(data_folder, "CoA_ester_ms_ms2.mzXML")
coa = emzed.io.load_peak_map(path)

8 PeakMap

Data structure

Whenever you open an LC-MS raw data file (‘.mzML’, ‘.mzXML’) using emzed.io.load_peak_map(path), it gets converted into a PeakMap object. Similar to Table, PeakMaps are SQLite databases allowing memory-efficient data processing. Simply phrased, a PeakMap is a collection of measured MS spectra providing a number of (mainly) data extraction methods. Each spectrum is represented by an emzed Spectra object. In principle, PeakMap, Spectra and associated methods allow handling all common LC-MS data acquisition approaches. To learn more about PeakMap, we will extract some information from the example PeakMap coa. It contains MS and MS-MS data of coenzyme A thioesters. To get an overview of the content we can use the PeakMap method summary:

[2]:
coa.summary()
[2]:
info ms_level value
str str str
rt range 1 5.1m .. 32.0m
rt range 2 6.2m .. 32.0m
mz range 1 700.017 .. 999.976
mz range 2 210.069 .. 1777.090
num_scans 1 1697
num_scans 2 361
polarities {'+'}
ms_chromatograms 0

Alternatively, we can directly access the different parameters one by one using the corresponding attributes:

[3]:
print(f"""
rt range:\t{coa.rt_range()}
mz range:\t{coa.mz_range()}
ms levels:\t{coa.ms_levels()}
spectra:\t{coa.spectra[:2]}
""")

rt range:       (306.065, 1920.36)
mz range:       (210.06932067871094, 1777.0897216796875)
ms levels:      {1, 2}
spectra:        [<BoundSpectrum n_peaks=3>, <BoundSpectrum n_peaks=6>]

We already used the method rt_range() in Chapter 2 to extract ion chromatograms of amino acids. It returns a tuple with min(RT) and max(RT). Analogously, mz_range and ms_levels return the corresponding values. Note, ms_levels returns a set(), since there is no MS level order in sequences of spectra. Next, we select an arbitrary spectrum:

[4]:
# we select a single spectrum out of the list of spectra
spectrum = coa.spectra[50]

print(f"""
scan number:\t{spectrum.scan_number}
retention time:\t{spectrum.rt}
polarity:\t{spectrum.polarity}
MS level:\t{spectrum.ms_level}
precursors:\t{spectrum.precursors}
peaks: {spectrum.peaks[:5]}""")

scan number:    51
retention time: 343.081
polarity:       +
MS level:       1
precursors:     []
peaks: [[ 761.7154541  2283.57983398]
 [ 782.6260376  2659.90942383]
 [ 784.76593018 2127.91162109]
 [ 798.34472656 2325.74389648]
 [ 831.12145996 2208.859375  ]]

A Spectrum provides retention time in seconds, MS level, polarity, precursor ions, and spectral peaks, which is a list of (mz, intensity) pairs. Since the 51st spectrum is of MS level 1, no precursor ions exist. Next, we will inspect a level 2 Spectrum:

[5]:
# we select a single spectrum out of the list of spectra, this time an MS2 spectrum
spectrum = coa.spectra[794]
print(f"""
scan number:\t{spectrum.scan_number}
retention time:\t{spectrum.rt}
polarity:\t{spectrum.polarity}
MS level:\t{spectrum.ms_level}
precursors:\t{spectrum.precursors}
peaks: {spectrum.peaks[-5:]}""")

scan number:    795
retention time: 909.846
polarity:       +
MS level:       2
precursors:     [(810.13323975, 1921060.0, 1)]
peaks: [[   792.12030029   2947.25415039]
 [   810.13122559  34063.26171875]
 [   811.1338501  236483.8125    ]
 [   811.46936035   1006.56011963]
 [   818.55566406    664.02734375]]

In case of a level 2 Spectrum, the precursor ion is given as a list containing a single (mz, intensity, MS level) tuple. Note, for spectra of MS level n, precursors contains precursor ions of level 1 up to level n. In principle, emzed3 can handle spectral data of any MS level. In practice, almost exclusively MS level 1 and level 2 data are acquired.

Methods

The method chromatogram allows ion chromatogram extraction.

chromatogram(mzmin=None, mzmax=None, rtmin=None, rtmax=None, ms_level=None, precursormzmin=None, precursormzmax=None)

with arguments

  • mzmin, mzmax: lower and upper mz limit

  • rtmin, rtmax: lower and upper rt limit (in seconds)

  • ms_level: the MS level

  • precursormzmin, precursormzmax: lower and upper precursor m/z limit. (only required if ms_level > 1)

returns a list consisting of retention time, intensity pairs, where intensity is the sum of all spectral peaks with mzmin \(\leq\) mz \(\leq\) mzmax.

As a simple example, we will extract the total ion current of the MS level 1 peakmap:

[6]:
def plot_chromatogram(rts, intensities):
    plt.plot(rts, intensities)
    plt.xlabel("time [s]")
    plt.ylabel("intensity")
    plt.show()


rts, intensities = coa.chromatogram(ms_level=1)
plot_chromatogram(rts, intensities)
../_images/emzed_tutorial_08_PeakMap_11_0.png

Next, we will extract the EIC of the [M+H] ion of acetyl-CoA and the corresponding TIC of the MS level 2:

[7]:
mz = emzed.mass.of("C23H38N7O17P3S") + emzed.mass.p
print(mz)
mzmin = mz - 0.003
mzmax = mz + 0.003
rts_2, intensities_2 = coa.chromatogram(
    ms_level=2, precursormzmin=mzmin, precursormzmax=mzmax
)
# we limit the plot time range for the MS level 1 EIC to the range of the fragment ion TIC
rtmin = min(rts_2)
rtmax = max(rts_2)
rts_1, intensities_1 = coa.chromatogram(
    mzmin=mzmin, mzmax=mzmax, rtmin=rtmin, rtmax=rtmax, ms_level=1
)

plt.plot(rts_1, intensities_1)
plt.plot(rts_2, intensities_2)
plt.xlabel("time [s]")
plt.ylabel("intensity")
plt.show()
810.133055879012
../_images/emzed_tutorial_08_PeakMap_13_1.png

The method extract allows creating a sub peakmap:

extract(mzmin=None, mzmax=None, rtmin=None, rtmax=None, imin=None, imax=None, mslevelmin=None, mslevelmax=None, precursormzmin=None, precursormzmax=None, *, target_db_file=None)

Arguments:

  • mzmin, mzmax: lower and upper mz limit

  • rtmin, rtmax: lower and upper rt limit

  • imin, imax: lower and upper intensity limit

  • precursormzmin, precursormzmax: lower and upper precursor mz limit. (only required if ms_level > 1)

  • target_db_file: allows defining a path where the resulting file will be stored as SQLite database to allow memory-efficient on-disk processing

  • *: additional Spectra arguments i.e. polarity, scan_number

The method returns a subset PeakMap fulfilling the defined conditions.

As an example, we will extract the major MS level 1 peaks:

[8]:
coa_major = coa.extract(imin=5e5, mslevelmax=1)
# to visualize the result we plot the correspinding TIC (we use a stem plot since only a few RTs are remaining after the extraciton)
plt.stem(*coa_major.chromatogram(), markerfmt=" ", basefmt="C0-")
plt.xlabel("time [s]")
plt.ylabel("intensity")
plt.show()
../_images/emzed_tutorial_08_PeakMap_15_0.png

The method filter is dedicated to spectral filtering:

filter(predicate, target_db_file=None)

Arguments:

  • predicate: a function defining filtering condition with Spectrum as unique required argument, returns True or False.

  • target_db_file: allows defining a path where the resulting file will be stored as SQLite database to allow memory-efficient on-disk processing.

The method returns a PeakMap containing all spectra fulfilling the predicate.

Since most routine cases are already covered by the method extract, filter will be only needed for very specific cases. In our next example, we will filter the PeakMap for MS level 2 sectra with a specific neutral loss (the m/z difference between precursor ion and fragment ion). First we define the predicate function:

[9]:
neutral = emzed.mass.of("C10H16N5O13P3")
mztol = 0.005


def neutral_loss(spectrum):
    # only spectra of ms level 2 are considered
    if spectrum.ms_level == 2:
        # we extract the mz value of the precursor ion:
        mzprec = spectrum.precursors[0][0]
        # to extract spectral mz values we take the 1st column of peaks array
        # if we add the neutral loss to the fragment ions resulting mz value
        # equals precursor ion mz with tolerance mztol
        delta = np.abs(spectrum.peaks[:, 0] + neutral - mzprec)
        candidates = delta[delta <= mztol]
        return len(candidates) > 0
    return False


sub = coa.filter(neutral_loss)
print(f"""number of spectra
before:\t{len(coa)}
after:\t{len(sub)}""")
number of spectra
before: 2058
after:  185

The method get_dominating_peak_map yields a PeakMap containing all spectra of the lowest MS level. In case of MS level n experiments, those spectra contain the source (precursor) ions of subsequent fragmentations and hence are dominating.

coa.get_dominating_peak_map()

With very few exceptions, the dominating peak map is always level 1 and the method is identical to

coa.extract(mslevelmax = 1)

The method representing_mz_peak returns the most intense spectrum peak in a user-defined rt and mz range.

representing_mz_peak(mzmin, mzmax, rtmin, rtmax, ms_level=1)

with arguments

  • mzmin, mzmax: lower and upper mz limit

  • rtmin, rtmax: lower and upper rt limit

  • ms_level: the MS level

[10]:
coa.representing_mz_peak(
    mzmin=750, mzmax=850, rtmin=6 * 60.0, rtmax=10 * 60.0, ms_level=1
)
[10]:
np.float64(796.3164026464547)

The method get_precursors_mzs returns a set of all precursor m/z values as tuple

[11]:
precursors = coa.get_precursors_mzs()
print("number of precursor ions:", len(precursors))
# we will show the first
precursors.pop()
number of precursor ions: 124
[11]:
(872.14941406,)

The method split_by_precursors creates a dictionary {(level n-1 precursor,), level n peakmap}

split_by_precursors(mz_tol = 0.0)

with argument mz_tol: maximal allowed m/z difference of precursor ions. Default: 0.0

When we apply the method with a m/z tolerance of 0.005 for the precursor ion we obtain

[12]:
pairs = coa.split_by_precursors(0.005)
len(pairs)
[12]:
33

A potential use case would be a targeted MS level 2 approach based on the peak area of the common CoA ester fragment ion at m/z 428.0367:

[13]:
precursors, pms = zip(*pairs.items())
mz_tol = 0.003
# we create an exctraction table
t = emzed.to_table(
    "precursors", precursors, object
)  # tuple is not allowed as type, therefore object is used
t.add_column_with_constant_value("mz", 428.0367, emzed.MzType)
t.add_column("mzmin", t.mz - mz_tol, emzed.MzType)
t.add_column("mzmax", t.mz + mz_tol, emzed.MzType)
t.add_column("peakmap", pms, emzed.PeakMap)
t.add_column(
    "rtmin",
    t.apply(lambda v: v.rt_range()[0], t.peakmap),
    emzed.RtType,
)
t.add_column(
    "rtmax",
    t.apply(lambda v: v.rt_range()[1], t.peakmap),
    emzed.RtType,
)
t.add_enumeration()
t = emzed.quantification.integrate(t, "linear", show_progress=False)
t = t.filter(t.area > 0)
t
[13]:
id precursors mz mzmin mzmax rtmin rtmax peak_shape_model area rmse valid_model
int object MzType MzType MzType RtType RtType str float float bool
0 (872.145,)  428.036700  428.033700  428.039700   7.25 m  31.95 m linear 6.81e+06 0.00e+00 True
1 (864.175,)  428.036700  428.033700  428.039700   6.20 m  17.86 m linear 2.76e+04 0.00e+00 True
2 (810.13,)  428.036700  428.033700  428.039700  14.97 m  17.92 m linear 2.03e+06 0.00e+00 True
3 (872.15,)  428.036700  428.033700  428.039700   7.25 m  31.95 m linear 6.81e+06 0.00e+00 True
4 (864.18,)  428.036700  428.033700  428.039700   6.20 m  17.86 m linear 2.76e+04 0.00e+00 True
9 (838.165,)  428.036700  428.033700  428.039700  19.57 m  19.60 m linear 9.24e+03 0.00e+00 True
11 (894.15,)  428.036700  428.033700  428.039700  14.18 m  14.68 m linear 3.71e+04 0.00e+00 True
13 (910.185,)  428.036700  428.033700  428.039700  14.53 m  16.33 m linear 4.31e+03 0.00e+00 True
14 (868.135,)  428.036700  428.033700  428.039700  12.56 m  16.12 m linear 1.10e+06 0.00e+00 True
15 (892.17,)  428.036700  428.033700  428.039700  18.92 m  27.77 m linear 9.98e+05 0.00e+00 True
16 (854.12,)  428.036700  428.033700  428.039700  11.63 m  11.84 m linear 2.14e+04 0.00e+00 True
17 (854.16,)  428.036700  428.033700  428.039700  15.60 m  15.81 m linear 8.56e+04 0.00e+00 True
19 (854.155,)  428.036700  428.033700  428.039700  15.60 m  15.81 m linear 8.56e+04 0.00e+00 True
21 (838.16,)  428.036700  428.033700  428.039700  18.29 m  23.78 m linear 6.94e+04 0.00e+00 True
23 (814.13,)  428.036700  428.033700  428.039700  15.08 m  15.13 m linear 1.42e+03 0.00e+00 True
26 (824.15,)  428.036700  428.033700  428.039700   6.90 m  28.79 m linear 2.07e+05 0.00e+00 True
27 (868.14,)  428.036700  428.033700  428.039700  12.56 m  16.12 m linear 1.10e+06 0.00e+00 True
30 (910.19,)  428.036700  428.033700  428.039700  14.53 m  16.33 m linear 4.31e+03 0.00e+00 True
31 (810.135,)  428.036700  428.033700  428.039700  15.00 m  16.57 m linear 2.02e+06 0.00e+00 True

For 29 out of the 33 precursor ions we obtained peak areas > 0 for the common CoA ester fragment ion.

The method merge allows merging of non-overlapping peakmaps.

merge(self, other)

An example: We can merge 2 peakmaps from the list of peakmaps pms we obtained when we applied the split_by_precursor method (see above).

[14]:
pm1, pm2 = pms[6:8]
pm1.merge(pm2)
plt.stem(
    *pm1.chromatogram(), linefmt="C0-", markerfmt=" ", basefmt="C0-"
)
plt.stem(
    *pm2.chromatogram(), linefmt="C1--", markerfmt=" ", basefmt="C0-"
)
plt.xlabel("time [s]")
plt.ylabel("counts")
plt.show()
../_images/emzed_tutorial_08_PeakMap_37_0.png

The peakmap pm1 (blue) includes now peakmap pm2 (orange).

Note, merging is in-place and spectral scan_numbers of the two peakmaps must not overlap! Once you have applied merge, you cannot execute merge a second time since the function does not replace existing spectra.

PeakMap supports adding spectra to an existing peakmap using the context manager spectra_for_modification. WARNING: The method can manipulate raw data and such manipulation should be indicated!

with peakmap.spectra_for_modifications() as sp:
    sp.add_spectra(spectrum)

Note, adding a spectrum to a PeakMap requires a with statement. It adds a Spectrum in place.

This is a good opportunity to learn how to create a new Spectrum from scratch:

Spectrum(scan_number, rt, ms_level, polarity, precursors, peaks)

We will add a new spectrum of MS level 1 and a single MS peak at the end of the peakmap coa:

[15]:
# get the current maximal spectrum scan_number
scan_no = coa.spectra[-1].scan_number
scan_no += 1
# maximum rt
rt = coa.rt_range()[1] + 5.0
# create an evil peak
peaks = np.array([[666.666, 1e8]])
spectrum = emzed.Spectrum(scan_no, rt, 1, "+", [], peaks)

with coa.spectra_for_modification() as sp:
    sp.add_spectrum(spectrum)
[16]:
rts, ints = coa.chromatogram()
print(f"""
retention time:\t{rts[-1]/60:.1f} min
intensity:\t{ints[-1]:.1e}""")

retention time: 32.1 min
intensity:      1.0e+08

We can simply check for any peakmap modification by checking the attribute unique_id. It returns a unique identifier.

For example:

[17]:
print("before:", coa.unique_id)
# we add a second spectrum
scan_no = coa.spectra[-1].scan_number
scan_no += 1
# maximum rt
rt = coa.rt_range()[1] + 5.0
# create an evil peak
peaks = np.array([[666.666, 1e8]])
spectrum = emzed.Spectrum(scan_no, rt, 1, "+", [], peaks)
with coa.spectra_for_modification() as sp:
    sp.add_spectrum(spectrum)
print("after:", coa.unique_id)
before: 397b6ea16eec043655a1084671007a36
after: b8dbe25e2e302f4aae477cbfdc3ab2d3

We can undo add_spectrum with the Spectrum method unbind. It detaches the spectrum from the peakmap:

[18]:
# remove the last added spectrum
spectrum = coa.spectra[-1]
spectrum.unbind()
coa.unique_id
[18]:
'b8dbe25e2e302f4aae477cbfdc3ab2d3'

The PeakMap method representing_mz_peak

representing_mz_peak(mzmin, mzmax, rtmin, rtmax, ms_level=1)

with arguments

  • mzmin, mzmax: lower and upper mz limit

  • rtmin, rtmax: lower and upper rt limit

  • ms_level: selected MS level. Default value is 1.

returns the log(intensity) weighted average mz value of a defined mz/RT window.

representing_mz_peak can be used to determine the “average”/representing mz value of a peak.

As an example we will determine the representing mz value of acetyl-CoA eluting at 15 min:

[19]:
mz = 810.1331  # mz of acetyl-CoA
mz_tol = 0.003
rtmin = 900  # in seconds
rtmax = 910
coa.representing_mz_peak(mz - mz_tol, mz + mz_tol, rtmin, rtmax)
[19]:
np.float64(810.1324484513556)

Finally, PeakMap provides a dictionary meta_data. We recommend to document all PeakMap manipulation in the meta_data.

[20]:
coa.meta_data.keys()
[20]:
['full_source', 'source']
[21]:
coa.meta_data["source"]
[21]:
'CoA_ester_ms_ms2.mzXML'

We can use meta_data to document any peak map manipulation, e.g.

[22]:
coa.meta_data["modified"] = True

Back to top

© Copyright 2012-2026 ETH Zurich
Last build 2026-03-31 13:33:53.513402.
Created using Sphinx 9.1.0.