[1]:
# setup
import os
import emzed
import numpy as np
import matplotlib.pyplot as plt
# loading peakmap example
data_folder = os.path.join(os.getcwd(), "tutorial_data")
path = os.path.join(data_folder, "CoA_ester_ms_ms2.mzXML")
path_srm = os.path.join(
data_folder, "Enolase_repeats_AQv1.4.2-20070918_en_10.mzML"
)
coa = emzed.io.load_peak_map(path)
start remote ip in /root/.emzed3/pyopenms_venv
pyopenms client started.
connected to pyopenms client.
Whenever you open an LC-MS raw data file (‘.mzML’, ‘.mzXML’) using emzed.io.load_peak_map(path)
, it gets converted into a PeakMap object. Similar to Table, PeakMaps are sqlite databases allowing memory efficient data processing. Simply spoken, a PeakMap is a collection of measured MS spectra providing a number of (mainly) data extraction methods. Each spectum is represented by an emzed Spectra
object. In principle, PeakMap
, Spectra
and associated methods allow handling all common
LC-MS data acquisition approaches. To learn more about PeakMap
we will extract some information from the example PeakMap coa
. It contains MS and MS-MS data of coenzyme A thioesters. To get an overview of the content we can use the PeakMap
method summary
:
[2]:
coa.summary()
[2]:
info | ms_level | value |
---|---|---|
str | str | str |
rt range | 1 | 5.1m .. 32.0m |
rt range | 2 | 6.2m .. 32.0m |
mz range | 1 | 700.017 .. 999.976 |
mz range | 2 | 210.069 .. 1777.090 |
num_scans | 1 | 1697 |
num_scans | 2 | 361 |
polarities | {'+'} | |
ms_chromatograms | 0 |
Alternatively, we can directly access the different parameters one by one using the corresponding attributes:
[3]:
print(
f"""
rt range:\t{coa.rt_range()}
mz range:\t{coa.mz_range()}
ms levels:\t{coa.ms_levels()}
spectra:\t{coa.spectra[:2]}
"""
)
rt range: (306.065, 1920.36)
mz range: (210.06932067871094, 1777.0897216796875)
ms levels: {1, 2}
spectra: [<BoundSpectrum n_peaks=3>, <BoundSpectrum n_peaks=6>]
We already used the method rt_range()
in chapter 2 to extract ion chromatograms of amino acids. It returns a tuple with min(RT) and max(RT). Analogous, mz_range
and ms_levels
return the corresponding values. Note, ms_levels
returns a set(), since there is no MS level order in sequences of spectra. Next, we select an arbitrary spectrum:
[4]:
# we select a single spectrum out of the list of spectra
spectrum = coa.spectra[50]
print(
f"""
scan number:\t{spectrum.scan_number}
retention time:\t{spectrum.rt}
polarity:\t{spectrum.polarity}
MS level:\t{spectrum.ms_level}
precursosr:\t{spectrum.precursors}
peaks: {spectrum.peaks[:5]}"""
)
scan number: 51
retention time: 343.081
polarity: +
MS level: 1
precursosr: []
peaks: [[ 761.7154541 2283.57983398]
[ 782.6260376 2659.90942383]
[ 784.76593018 2127.91162109]
[ 798.34472656 2325.74389648]
[ 831.12145996 2208.859375 ]]
A Spectrum provides retention time in seconds, MS level, polarity, precursor ions, and spectral peaks, which is a list of (mz, intensity) pairs. Since the 51st spectrum is of MS level 1, no precursor ions exist. Next, we will inspect a level 2 Spectrum:
[5]:
# we select a single spectrum out of the list of spectra, this time an MS2 spectrum
spectrum = coa.spectra[794]
print(
f"""
scan number:\t{spectrum.scan_number}
retention time:\t{spectrum.rt}
polarity:\t{spectrum.polarity}
MS level:\t{spectrum.ms_level}
precursors:\t{spectrum.precursors}
peaks: {spectrum.peaks[-5:]}"""
)
scan number: 795
retention time: 909.846
polarity: +
MS level: 2
precursors: [(810.13323975, 1921060.0, 1)]
peaks: [[ 792.12030029 2947.25415039]
[ 810.13122559 34063.26171875]
[ 811.1338501 236483.8125 ]
[ 811.46936035 1006.56011963]
[ 818.55566406 664.02734375]]
In case of a level 2 Spectrum, the precursor ion is given as a list containing a single (mz, intensity, MS level) tuple. Note, for spectra of ms level n, precursors
contains precursor ions of level 1 up to level n. In principle, emzed3 can handle spectral data of any MS level. In practice, almost exclusively MS level 1 and level 2 data are acquired.
The method ``chromatogram`` allows ion chromatogram extraction.
chromatogram(mzmin=None, mzmax=None, rtmin=None, rtmax=None, ms_level=None, precursormzmin=None, precursormzmax=None)
with arguments
mzmin
, mzmax
: lower and upper mz limit
rtmin
, rtmax
: lower and upper rt limit (in seconds)
ms_level
: the MS level
precursormzmin
, precursormzmax
: lower and upper precursor mz limit. (only required if ms_level > 1)
returns a list consisting of retention time, intensity pairs, where intensity is the sum of all spectral peaks with mzmin \(\leq\) mz \(\leq\) mzmax.
As a simple example, we will extract the total ion current of the MS level 1 peakmap:
[6]:
def plot_chromatogram(rts, intensities):
plt.plot(rts, intensities)
plt.xlabel("time [s]")
plt.ylabel("intensity")
plt.show()
rts, intensities = coa.chromatogram(ms_level=1)
plot_chromatogram(rts, intensities)
Next, we will extract the EIC of the [M+H] ion of acetyl-CoA and the corresponding TIC of the ms level 2:
[7]:
mz = emzed.mass.of("C23H38N7O17P3S") + emzed.mass.p
print(mz)
mzmin = mz - 0.003
mzmax = mz + 0.003
rts_2, intensities_2 = coa.chromatogram(
ms_level=2, precursormzmin=mzmin, precursormzmax=mzmax
)
# we limit the plot time range for the MS level 1 EIC to the range of the fragment ion TIC
rtmin = min(rts_2)
rtmax = max(rts_2)
rts_1, intensities_1 = coa.chromatogram(
mzmin=mzmin, mzmax=mzmax, rtmin=rtmin, rtmax=rtmax, ms_level=1
)
plt.plot(rts_1, intensities_1)
plt.plot(rts_2, intensities_2)
plt.xlabel("time [s]")
plt.ylabel("intensity")
plt.show()
810.1330558790121
The method ``extract`` allows creating a sub peakmap: ~~~ extract(mzmin=None, mzmax=None, rtmin=None, rtmax=None, imin=None, imax=None, mslevelmin=None, mslevelmax=None, precursormzmin=None, precursormzmax=None, *, target_db_file=None) ~~~
Arguments: - mzmin
, mzmax
: lower and upper mz limit - rtmin
, rtmax
: lower and upper rt limit - imin
, imax
: lower and uppper intensity limit - precursormzmin
, precursormzmax
: lower and upper precursor mz limit. (only required if ms_level > 1) - target_db_file
: allows defining a path where the resulting file will be stored as sqlite data base to allow memory efficient on disk processing - *
: additonal Spectra arguments i.e. polarity, scan_number
The method returns a subset peakmap fullfilling the defined conditions.
As an example, we will extract the major ms level 1 peaks:
[8]:
coa_major = coa.extract(imin=5e5, mslevelmax=1)
# to visualize the result we plot the correspinding TIC (we use a stem plot since only a few RTs are remaining after the extraciton)
plt.stem(*coa_major.chromatogram(), markerfmt=" ", basefmt="C0-")
plt.xlabel("time [s]")
plt.ylabel("intensity")
plt.show()
The method ``filter`` is dedicated to spectral filtering: ~~~ filter(predicate, target_db_file=None) ~~~
Arguments:
predicate
: a function defining filtering condition with Spectrum as unique required argument, returns True or False.
target_db_file
: allows defining a path where the resulting file will be stored as sqlite data base to allow memory efficient on disk processing.
The method returns a peakmap containing all spectra fullfilling the predicate.
Since most routine cases are already covered by the method extract
, filter
will be only needed for very specific cases. In our next example we will filter the peakmap for ms level 2 spectra with a specific neutral loss (the mz difference between precuror ion and fragment ion). First we define the predicate function:
[9]:
neutral = emzed.mass.of("C10H16N5O13P3")
mztol = 0.005
def neutral_loss(spectrum):
# only spectra of ms level 2 are considered
if spectrum.ms_level == 2:
# we extract the mz value of the precursor ion:
mzprec = spectrum.precursors[0][0]
# to extract spectral mz values we take the 1st column of peaks array
# if we add the neutral loss to the fragment ions resulting mz value
# equals precursor ion mz with tolerance mztol
delta = np.abs(spectrum.peaks[:, 0] + neutral - mzprec)
candidates = delta[delta <= mztol]
return len(candidates) > 0
return False
sub = coa.filter(neutral_loss)
print(
f"""number of spectra
before:\t{len(coa)}
after:\t{len(sub)}"""
)
number of spectra
before: 2058
after: 185
The method ``get_dominating_peak_map`` yields a PeakMap containing all spectra of lowest MS level. In case of MS level N experiments those spectra contain the source (precursor) ions of subsequent fragmentations and hence are dominating.€€ not really clear to me. @ Better? €€
coa.get_dominating_peak_map()With very few exceptions, the dominating peak map is always level 1 and the method is identical to
coa.extract(mslevelmax = 1)The method ``representing_mz_peak`` returns the most intense spectrum peak in a user defined rt and mz range.
representing_mz_peak(mzmin, mzmax, rtmin, rtmax, ms_level=1)
with arguments
mzmin, mzmax: lower and upper mz limit
rtmin, rtmax: lower and upper rt limit
ms_level: the MS level
[10]:
coa.representing_mz_peak(
mzmin=750, mzmax=850, rtmin=6 * 60.0, rtmax=10 * 60.0, ms_level=1
)
[10]:
796.3164026464547
The method ``get_precursors_mzs`` returns a set of all precursor mz values as tuple
[11]:
precursors = coa.get_precursors_mzs()
print("number of precursor ions:", len(precursors))
# we will show the first
precursors.pop()
number of precursor ions: 124
[11]:
(872.14941406,)
The method ``split_by_precursors`` creates a dictionary {(level n-1 precursor,), level n peakmap}
split_by_precursors(mz_tol = 0.0)
with argument - mz_tol: maximal allowed mz difference of precursor ions. Default: 0.0
When we apply the method with a m/z tolerance of 0.005 for the precursor ion we obtain
[12]:
pairs = coa.split_by_precursors(0.005)
len(pairs)
[12]:
33
A potential use case would be a targeted MS level 2 approach based on the peak area of the common CoA ester fragment ion at m/z 428.0367:
[13]:
precursors, pms = zip(*pairs.items())
mz_tol = 0.003
# we create an exctraction table
t = emzed.to_table(
"precursors", precursors, object
) # tuple is not allowed as type, therefore object is used
t.add_column_with_constant_value("mz", 428.0367, emzed.MzType)
t.add_column("mzmin", t.mz - mz_tol, emzed.MzType)
t.add_column("mzmax", t.mz + mz_tol, emzed.MzType)
t.add_column("peakmap", pms, emzed.PeakMap)
t.add_column(
"rtmin",
t.apply(lambda v: v.rt_range()[0], t.peakmap),
emzed.RtType,
)
t.add_column(
"rtmax",
t.apply(lambda v: v.rt_range()[1], t.peakmap),
emzed.RtType,
)
t.add_enumeration()
t = emzed.quantification.integrate(t, "linear", show_progress=False)
t = t.filter(t.area > 0)
t
[13]:
id | precursors | mz | mzmin | mzmax | rtmin | rtmax | peak_shape_model | area | rmse | valid_model |
---|---|---|---|---|---|---|---|---|---|---|
int | object | MzType | MzType | MzType | RtType | RtType | str | float | float | bool |
0 | (872.145,) | 428.036700 | 428.033700 | 428.039700 | 7.25 m | 31.95 m | linear | 6.81e+06 | 0.00e+00 | True |
1 | (864.175,) | 428.036700 | 428.033700 | 428.039700 | 6.20 m | 17.86 m | linear | 2.76e+04 | 0.00e+00 | True |
2 | (810.13,) | 428.036700 | 428.033700 | 428.039700 | 14.97 m | 17.92 m | linear | 2.03e+06 | 0.00e+00 | True |
3 | (872.15,) | 428.036700 | 428.033700 | 428.039700 | 7.25 m | 31.95 m | linear | 6.81e+06 | 0.00e+00 | True |
4 | (864.18,) | 428.036700 | 428.033700 | 428.039700 | 6.20 m | 17.86 m | linear | 2.76e+04 | 0.00e+00 | True |
9 | (838.165,) | 428.036700 | 428.033700 | 428.039700 | 19.57 m | 19.60 m | linear | 9.24e+03 | 0.00e+00 | True |
11 | (894.15,) | 428.036700 | 428.033700 | 428.039700 | 14.18 m | 14.68 m | linear | 3.71e+04 | 0.00e+00 | True |
13 | (910.185,) | 428.036700 | 428.033700 | 428.039700 | 14.53 m | 16.33 m | linear | 4.31e+03 | 0.00e+00 | True |
14 | (868.135,) | 428.036700 | 428.033700 | 428.039700 | 12.56 m | 16.12 m | linear | 1.10e+06 | 0.00e+00 | True |
15 | (892.17,) | 428.036700 | 428.033700 | 428.039700 | 18.92 m | 27.77 m | linear | 9.98e+05 | 0.00e+00 | True |
16 | (854.12,) | 428.036700 | 428.033700 | 428.039700 | 11.63 m | 11.84 m | linear | 2.14e+04 | 0.00e+00 | True |
17 | (854.16,) | 428.036700 | 428.033700 | 428.039700 | 15.60 m | 15.81 m | linear | 8.56e+04 | 0.00e+00 | True |
19 | (854.155,) | 428.036700 | 428.033700 | 428.039700 | 15.60 m | 15.81 m | linear | 8.56e+04 | 0.00e+00 | True |
21 | (838.16,) | 428.036700 | 428.033700 | 428.039700 | 18.29 m | 23.78 m | linear | 6.94e+04 | 0.00e+00 | True |
23 | (814.13,) | 428.036700 | 428.033700 | 428.039700 | 15.08 m | 15.13 m | linear | 1.42e+03 | 0.00e+00 | True |
26 | (824.15,) | 428.036700 | 428.033700 | 428.039700 | 6.90 m | 28.79 m | linear | 2.07e+05 | 0.00e+00 | True |
27 | (868.14,) | 428.036700 | 428.033700 | 428.039700 | 12.56 m | 16.12 m | linear | 1.10e+06 | 0.00e+00 | True |
30 | (910.19,) | 428.036700 | 428.033700 | 428.039700 | 14.53 m | 16.33 m | linear | 4.31e+03 | 0.00e+00 | True |
31 | (810.135,) | 428.036700 | 428.033700 | 428.039700 | 15.00 m | 16.57 m | linear | 2.02e+06 | 0.00e+00 | True |
For 29 out of the 33 precursor ions we obtained peak areas > 0 for the common CoA ester fragment ion.
The method ``merge`` allows merging of none overlapping peakmaps.
merge(self, other)
An example: We can merge 2 peakmaps form the list of peakmaps ``pms`` we obtained when we applied the split_by_precursor
method (see above).
[14]:
pm1, pm2 = pms[6:8]
pm1.merge(pm2)
plt.stem(
*pm1.chromatogram(), linefmt="C0-", markerfmt=" ", basefmt="C0-"
)
plt.stem(
*pm2.chromatogram(), linefmt="C1--", markerfmt=" ", basefmt="C0-"
)
plt.xlabel("time [s]")
plt.ylabel("counts")
plt.show()
The peakmap pm1
(blue) includes now peakmap pm2
(orange).
Note, merging is in place and spectral scan_numbers of the two peakmaps must not overlap! Once you applied merge, you cannot execute merge a second time since the function does not replace existing spectra.
PeakMap supports adding spectra to an existing peakmap using the context manager spectra_for_modification
. WARNING: The method can manipulate raw data and such manipulation should be indicated!
Note, adding a spectrum to a PeakMap requires a with
statement. It adds a Spectrum in place.
This is a good opportunity to learn how to create a new Spectrum from scatch:
Spectrum(scan_number, rt, ms_level, polarity, precursors, peaks)We will add a new spectrum of MS level 1 and a single MS peak at the end of the peakmap coa
:
[15]:
# get the current maximal spectrum scan_number
scan_no = coa.spectra[-1].scan_number
scan_no += 1
# maximum rt
rt = coa.rt_range()[1] + 5.0
# create an evil peak
peaks = np.array([[666.666, 1e8]])
spectrum = emzed.Spectrum(scan_no, rt, 1, "+", [], peaks)
with coa.spectra_for_modification() as sp:
sp.add_spectrum(spectrum)
[16]:
rts, ints = coa.chromatogram()
print(
f"""
retention time:\t{rts[-1]/60:.1f} min
intensity:\t{ints[-1]:.1e}"""
)
retention time: 32.1 min
intensity: 1.0e+08
We can simply check for any peakmap modification by checking the attribute ``unique_id``. It returns a unique identifier.
In example:
[17]:
print("before:", coa.unique_id)
# we add a second spectrum
scan_no = coa.spectra[-1].scan_number
scan_no += 1
# maximum rt
rt = coa.rt_range()[1] + 5.0
# create an evil peak
peaks = np.array([[666.666, 1e8]])
spectrum = emzed.Spectrum(scan_no, rt, 1, "+", [], peaks)
with coa.spectra_for_modification() as sp:
sp.add_spectrum(spectrum)
print("after:", coa.unique_id)
before: ebda6261ec5e80fc183affd36a09a124
after: 26cc1ac2819445fa24e9e2729034ab9f
We can undo add_spectrum with the ``Spectrum`` method ``unbind``. It detaches the spectrum from the peakmap:
[18]:
# remove the last added spectrum
spectrum = coa.spectra[-1]
spectrum.unbind()
coa.unique_id
[18]:
'26cc1ac2819445fa24e9e2729034ab9f'
The PeakMap
method ``representing_mz_peak`` ~~~ representing_mz_peak(mzmin, mzmax, rtmin, rtmax, ms_level=1) ~~~ with arguments
mzmin
, mzmax
: lower and upper mz limit
rtmin
, rtmax
: lower and upper rt limit
ms_level
: selected MS level. Default value is 1.
returns the log(intensity) weighted average mz value of a defined mz/RT window.
representing_mz_peak
can be used to determine the “average”/representing mz value of a peak.
As an example we will determine the representing mz value of acetyl-CoA eluting at 15 min:
[19]:
mz = 810.1331 # mz of acetyl-CoA
mz_tol = 0.003
rtmin = 900 # in seconds
rtmax = 910
coa.representing_mz_peak(mz - mz_tol, mz + mz_tol, rtmin, rtmax)
[19]:
810.1324484513556
Finall, PeakMap provides a dictionary ``meta_data``. We recommend to document all PeakMap manipulation in the meta_data.
[20]:
coa.meta_data.keys()
[20]:
['full_source', 'source']
[21]:
coa.meta_data["source"]
[21]:
'CoA_ester_ms_ms2.mzXML'
We can use meta_data
to document any peak map manipulation, e.g.
[22]:
coa.meta_data["modified"] = True
Emzed allows reading peakmaps from selected reaction monitoring (SRM) experiments. Those peakmaps contain only chromatograms and no spectra. In the following we will use an example file from ProteoWizard that you can download from here.
[23]:
srm_peakmap = emzed.io.load_peak_map(path_srm)
print(
"Number of chromatograms: %s" % len(srm_peakmap.ms_chromatograms)
)
print("Number of spectra: %s" % len(srm_peakmap.spectra))
Number of chromatograms: 8
Number of spectra: 0
Let’s have a closer look at the ms_chromatograms. First, we select a a single chromatogram:
[24]:
chromatogram = srm_peakmap.ms_chromatograms[-1]
A chromatogram has the attributes - available_types
: possible chromatogram types - intensities
: 1D array of intensity values - mz
: m/z value of the fragment ion - precursor_mz
: m/z value of the precursor ion - rt_range
: method returning a tuple with rt values at start and and of chromatogram - rts
: 1D array of retention time values - type
: type of current chromatogram
Let’s inspect some attributes of the selected chromatogram
:
[25]:
print(f"available types: {', '.join(chromatogram.available_types)}\n")
print(f"type:\t\t {chromatogram.type}")
print(f"m/z:\t\t {chromatogram.mz}")
print(f"rt range:\t{chromatogram.rt_range()}")
available types: MASS_CHROMATOGRAM, TOTAL_ION_CURRENT_CHROMATOGRAM, SELECTED_ION_CURRENT_CHROMATOGRAM, BASEPEAK_CHROMATOGRAM, SELECTED_ION_MONITORING_CHROMATOGRAM, SELECTED_REACTION_MONITORING_CHROMATOGRAM, ELECTROMAGNETIC_RADIATION_CHROMATOGRAM, ABSORPTION_CHROMATOGRAM, EMISSION_CHROMATOGRAM
type: SELECTED_REACTION_MONITORING_CHROMATOGRAM
m/z: 1172.71
rt range: (337.864, 3001.052)
Next we list the types of all chromatograms present in ms_chromatograms
:
[26]:
[chrom.type for chrom in srm_peakmap.ms_chromatograms]
[26]:
['TOTAL_ION_CURRENT_CHROMATOGRAM',
'BASEPEAK_CHROMATOGRAM',
'SELECTED_REACTION_MONITORING_CHROMATOGRAM',
'SELECTED_REACTION_MONITORING_CHROMATOGRAM',
'SELECTED_REACTION_MONITORING_CHROMATOGRAM',
'SELECTED_REACTION_MONITORING_CHROMATOGRAM',
'SELECTED_REACTION_MONITORING_CHROMATOGRAM',
'SELECTED_REACTION_MONITORING_CHROMATOGRAM']
Usually, the first two chromatograms are the TIC and the basepeak chromatogram (BPC). We can filter for all SRM chromatograms and plot them:
[27]:
import matplotlib.pyplot as plt
for chrom in srm_peakmap.ms_chromatograms:
if chrom.type == "SELECTED_REACTION_MONITORING_CHROMATOGRAM":
plt.plot(
chrom.rts / 60,
chrom.intensities,
label="Q1=%s, Q3=%s" % (chrom.precursor_mz, chrom.mz),
)
plt.legend()
plt.xlabel("Retention time (min)")
plt.ylabel("Intensity")
plt.show()
For further processing such as peak finding you might use the scipy.signal
module.
© Copyright 2012-2024 ETH Zurich
Last build 2024-03-25 10:41:42.995953.
Created using Sphinx 7.2.6.