[1]:

# Setup
import emzed
import os

# path to files
data_folder = os.path.join(os.getcwd(), "tutorial_data")
path_sample_caulobacter = os.path.join(
    data_folder, "AA_sample_caulobacter.mzML"
)
path_sample_arabidopsis = os.path.join(
    data_folder, "AA_sample_arabidopsis.mzML"
)
path_caulo_final = os.path.join(data_folder, "t_adap_final.table")
path_caulo_annotated = os.path.join(
    data_folder, "t_adap_annotated.table"
)
ref_path = os.path.join(data_folder, "mz_calibration_table.csv")
# peakmaps and tables
peakmap_arabidopsis = emzed.io.load_peak_map(path_sample_arabidopsis)
peakmap_caulobacter = emzed.io.load_peak_map(path_sample_caulobacter)
adap = emzed.io.load_table(path_caulo_final)
ta_caulo_annotated = emzed.io.load_table(path_caulo_annotated)
t_ref = emzed.io.load_csv(ref_path)

start remote ip in /root/.emzed3/pyopenms_venv

pyopenms client started.
connected to pyopenms client.

pyopenms: No source file annotated.

5 Untargeted sample analysis¶

General¶

emzed3 heavily features untargeted analysis. To do so emzed3 provides tools from OpenMS and MZmine2. Whereas open ms features are integrated in the emzed3 core mzmine2 is an extension that has to be installed first. To get familiar with those tools we will exercise the basic workflow depicted above which comprises the core peak processing steps detection, alignment, and grouping. Moreover, we will perform some basic sample comparison applying data base matching with emzed integrated pubchem library.

Installing the MZmine2 extension¶

As already mentioned above MzMine2 is an emzed extenion, and must be installed first. Open your IPython console and run the command:

emzed_install_ext('mzmine2')

After installing, close the current console and open a new one to update the installation.

Next, type the command:

emzed.ext.mzmine2.init()

Again, close the current console and start a new session. After opening the new console MZmine2 features will be available in the emzed name space

emzed.ext.mzmine2

mzmine2 = emzed.ext.mzmine2

A detailed user manual with all features can be find here. Note that currently, only feature detection related tools are implemented in emzed3.

Feature detection¶

Let’s start with a more general look at LC-MS data structure. LC-MS level 1 data acquisition can be regarded as a 2 dimensional separation procedure. In the first dimension, LC separates compounds by physico-chemical interactions over time (retention time). In the second dimension, compound ions are separated by their mass to charge ratios m/z whereby the continuous LC output is scanned with an MS instrument specific frequency. Hence, we can define LC-MS peaks as sequences of spectra with m/z specific intensity series of with shapes typical for applied LC separation. Accordingly, we can subdivide the task of peak detection algorithm into two principal steps:

EIC detection: Search the peak map for all series of consecutive m/z peaks with the same m/z values (EIC).
Peak detection: Find all chromatographic (LC) peaks within each EIC.

To find an EIC we have to define the the meaning of m/z peaks (signal), what means a series of consecutive m/z peaks, and what means m/z peaks with same value. To define a signal we have to distinguish between signal (S) and noise (N). Remember, it is commonn practice to the define presence of a signal by a \(S/N >=3\) and a quantifiable signal by \(S/N >=10\). The number of (consecutive) m/z peaks or the time range considered to define a m/z trace directly depends of the applied LC-method and on the scan speed of the instrument. Whereas HPLC columns produce peaks with peak widths in the range of 20-30s, UPLC peaks are up to 10x narrower. Independent on the applied column type, compound peaks of interest can strongly differ from the typical shape and hence the range must be chosen with care. Finally, the variation of measured m/z values depends on the mass resolution \(R = mz/\Delta mz\) (measured at full width half maximum, FWHM) and is not only instrument dependent but also depends on the m/z itself for most common MS instruments (TOF, Orbitrap). The width of a m/z peak can be easily seen when data were acquired in the profile mode. However, most algorithms work with centroided data, where each spectral peak is represented by its centroided value. In that case the R value of the peak or the spectrum is required. A more practical approach is to determine the observed m/z width (max(m/z) - min(m/z)) directly from the raw data: Figure 3: Measuring m/z width with PeakMap explorer using the plot of the summed spectra. For the given EIC, the m/z difference of the minimal and maximal measured m/z value is dmz = 0.000946.

Figure 3 shows a typical Orbitrap m/z trace. In this example the absolute trace width is around 1 mmU and the relative about 4 ppm (m/z = 238.154). If the algorithm requires relative values the acquisition m/z range is crucial to provide reasonable values (1 mmu at m/z 75.0 corresponds to ~13 ppm).
Once EICs are available, LC-peaks can be defined. Similar to mass trace detection, the values start, apex and end of LC peaks must be determined. Since those parameters are much more algorithm dependent, we will discuss peak detection in more details directly with provided peak detectors.

We will introduce untargeted feature detection using LC-MS data acquired with an Orbitrap MS instrument and hence, some aspects will be instrument specific.

openMS¶

emzed3 provides openMS run_feature_finder_metabo for feature detection. To explore the command type:

help(emzed.run_feature_finder_metabo)

run_feature_finder_metabo has a high number of function arguments and adaptation to the applied LC-MS method is not trivial. If we have a closer look at the dectection parameter names, we can assign those parameters to the different processing steps by their prefixes: - common: common parameters - mtd: mass trace detection - epdet: elution peak detection - ffm: feature finding metabo

The last process is only executed if run_feature_grouper = True and provides isotopologue peak grouping.

Note, it’s much more convenient and less error prone to use key word arguments `kwargs <https://realpython.com/python-kwargs-and-args/>`__ for parameter settings. For the given data set we provide a dictionary with optimized parameters:

[2]:

kwargs = dict(
    common_chrom_peak_snr=10.0,
    common_chrom_fwhm=3.0,
    mtd_noise_threshold_int=7000.0,
    mtd_mass_error_ppm=20.0,
    mtd_reestimate_mt_sd="true",
    mtd_trace_termination_criterion="outlier",
    mtd_trace_termination_outliers=4,
    mtd_min_sample_rate=0.5,
    mtd_min_trace_length=5.0,
    mtd_max_trace_length=-1.0,
    epdet_width_filtering="auto",
    epdet_masstrace_snr_filtering="false",
    ffm_local_rt_range=2.0,
    ffm_local_mz_range=5.0,
    ffm_charge_lower_bound=0,
    ffm_charge_upper_bound=3,
    ffm_report_summed_ints="false",
    ffm_isotope_filtering_model="none",
    ffm_use_smoothed_intensities="false",
)

We can now process both samples:

[3]:

t_ara = emzed.run_feature_finder_metabo(
    peakmap_arabidopsis, verbose=False, **kwargs
)
t_caulo = emzed.run_feature_finder_metabo(
    peakmap_caulobacter, verbose=False, **kwargs
)

Note, verbose == False omits data processing output. Let’s have a look at the Caulobacter resulting table:

[4]:

t_caulo.summary()

[4]:

id	name	type	format	nones	len	min	max	distinct values
int	str	str	str	int	int	float	float	int
0	id	int	%d	0	1042	0.000000	1041.000000	1042
1	feature_id	int	%d	0	1042	0.000000	778.000000	779
2	feature_size	int	%d	0	1042	1.000000	4.000000	4
3	mz	MzType	%11.6f	0	1042	85.075765	599.389233	1042
4	mzmin	MzType	%11.6f	0	1042	85.075607	599.388428	1037
5	mzmax	MzType	%11.6f	0	1042	85.075829	599.390381	1039
6	rt	RtType	rt_formatter	0	1042	28.472400	401.853800	265
7	rtmin	RtType	rt_formatter	0	1042	26.869800	397.279000	290
8	rtmax	RtType	rt_formatter	0	1042	30.920500	599.716100	315
9	intensity	float	%.2e	0	1042	38101.944970	1174356996.343237	1042
10	quality	float	%.2e	0	1042	0.000011	0.295786	779
11	fwhm	RtType	rt_formatter	0	1042	1.715400	6.514842	779
12	z	int	%d	0	1042	0.000000	3.000000	4
13	peakmap	PeakMap	None	0	1042	-	-	1
14	source	str	%s	0	1042	-	-	1

With given parameters we detected 1149 peaks represented by 1149 different id values. Those peaks correspond to 863 isotopologue features grouped by feature_id. Note, ``id`` and ``feature_id`` are sample specific! Let’s print the 10 most intense peaks of the feature table:

[5]:

t_caulo.sort_by("intensity", ascending=False)[:10]

[5]:

id	feature_id	feature_size	mz	mzmin	mzmax	rt	rtmin	rtmax	intensity	quality	fwhm	z	source
int	int	int	MzType	MzType	MzType	RtType	RtType	RtType	float	float	RtType	int	str
1	0	4	245.149248	245.148941	245.149796	1.53 m	1.41 m	2.15 m	1.17e+09	2.96e-01	0.09 m	1	AA_sample_caulobacter.mzML
4	1	3	238.154740	238.154495	238.155441	0.88 m	0.84 m	1.51 m	6.88e+08	1.74e-01	0.06 m	1	AA_sample_caulobacter.mzML
2	0	4	246.152416	246.152115	246.153122	1.53 m	1.42 m	2.16 m	1.43e+08	2.96e-01	0.09 m	1	AA_sample_caulobacter.mzML
7	2	3	153.101914	153.101608	153.102173	1.00 m	0.95 m	1.20 m	1.13e+08	2.78e-02	0.04 m	1	AA_sample_caulobacter.mzML
10	3	3	182.164898	182.164536	182.165176	2.36 m	2.26 m	4.17 m	1.08e+08	2.67e-02	0.10 m	1	AA_sample_caulobacter.mzML
13	4	3	171.112591	171.112335	171.112778	1.25 m	1.20 m	1.42 m	9.37e+07	2.30e-02	0.05 m	1	AA_sample_caulobacter.mzML
5	1	3	239.157875	239.157730	239.158966	0.88 m	0.84 m	1.50 m	9.03e+07	1.74e-01	0.06 m	1	AA_sample_caulobacter.mzML
16	5	2	160.180614	160.180328	160.180893	6.23 m	6.18 m	10.00 m	8.23e+07	2.01e-02	0.06 m	1	AA_sample_caulobacter.mzML
18	6	2	102.127362	102.127266	102.127548	0.93 m	0.90 m	1.39 m	6.95e+07	1.68e-02	0.04 m	1	AA_sample_caulobacter.mzML
20	7	2	139.122672	139.122513	139.122894	0.95 m	0.91 m	1.44 m	4.51e+07	1.10e-02	0.05 m	1	AA_sample_caulobacter.mzML

All columns required for peak visualization and integration are provided (mzmin, mzmax, rtmin, rtmax, peakmap). The charge state of a feature is shown in column z. In case z equals 0, isotopoloque grouping is missing and no charge state could be assigned. Moreover, column intensity does not refer to the individual peak intensity but is the same for all peaks of the same feature configured by the paramter ffm_report_sum_ints ('true': sum of all peak intensity, 'false': intensity of monoisotopopic peak).

Next, we compare the most intense feature of the Arabidopsis sample with the corresponding one in the Caulobacter sample:

[6]:

t_comp = t_ara.left_join(
    t_caulo,
    t_ara.mz.approx_equal(t_caulo.mz, 0.003, 0)
    & t_ara.rt.approx_equal(t_caulo.rt, 20.0, 0),
)

In this example we used the ``left_join`` Table method since all rows of the reference table (the left table) are kept in the result table and missing values in the right Table are set to None. We compare peaks m/z and RT values using the Column method approx_equal(other, atol, rtol)) with - other: column of other (right) Table - atol: absolute allowed tolerance - rtol: allowed tolerance relative to value in left Table

[7]:

t_comp.summary()

[7]:

id	name	type	format	nones	len	min	max	distinct values
int	str	str	str	int	int	float	float	int
0	id	int	%d	0	410	0.000000	407.000000	408
1	feature_id	int	%d	0	410	0.000000	318.000000	319
2	feature_size	int	%d	0	410	1.000000	4.000000	4
3	mz	MzType	%11.6f	0	410	102.054852	960.576274	408
4	mzmin	MzType	%11.6f	0	410	102.054733	960.573914	405
5	mzmax	MzType	%11.6f	0	410	102.054993	960.578186	405
6	rt	RtType	rt_formatter	0	410	26.818200	393.331500	170
7	rtmin	RtType	rt_formatter	0	410	25.168600	389.327900	192
8	rtmax	RtType	rt_formatter	0	410	31.236300	397.905100	217
9	intensity	float	%.2e	0	410	42038.464802	44851637.202265	408
10	quality	float	%.2e	0	410	0.000060	0.067584	319
11	fwhm	RtType	rt_formatter	0	410	1.482326	6.230299	319
12	z	int	%d	0	410	0.000000	1.000000	2
13	peakmap	PeakMap	None	0	410	-	-	1
14	source	str	%s	0	410	-	-	1
15	id__0	int	%d	384	410	65.000000	971.000000	24
16	feature_id__0	int	%d	384	410	26.000000	708.000000	24
17	feature_size__0	int	%d	384	410	1.000000	3.000000	3
18	mz__0	MzType	%11.6f	384	410	104.070310	406.243648	24
19	mzmin__0	MzType	%11.6f	384	410	104.070198	406.242706	24
20	mzmax__0	MzType	%11.6f	384	410	104.070457	406.244476	24
21	rt__0	RtType	rt_formatter	384	410	36.803200	315.126200	20
22	rtmin__0	RtType	rt_formatter	384	410	33.335500	313.410600	22
23	rtmax__0	RtType	rt_formatter	384	410	39.315900	342.005800	23
24	intensity__0	float	%.2e	384	410	170555.439210	14589258.617420	24
25	quality__0	float	%.2e	384	410	0.000040	0.003667	24
26	fwhm__0	RtType	rt_formatter	384	410	1.923615	6.021642	24
27	z__0	int	%d	384	410	0.000000	1.000000	2
28	peakmap__0	PeakMap	None	384	410	-	-	1
29	source__0	str	%s	384	410	-	-	1

The Table summary of columns id and id__0 shows that only 25 peaks of Caulobacter had a match with an Arabidopsis sample peak. Moreover, the distinct values of id is 451 whereas table length is 454 showing that 3 ambiguous matches occured. This is not very surprising since we set RT tolerance to 20 s. Finally we will check, how many of 10 most intend features have a match in Caulobacter sample:

[8]:

t_comp.sort_by("intensity")[:10]

[8]:

id	feature_id	feature_size	mz	mzmin	mzmax	rt	rtmin	rtmax	intensity	quality	fwhm	z	source	id__0	feature_id__0	feature_size__0	mz__0	mzmin__0	mzmax__0	rt__0	rtmin__0	rtmax__0	intensity__0	quality__0	fwhm__0	z__0	source__0
int	int	int	MzType	MzType	MzType	RtType	RtType	RtType	float	float	RtType	int	str	int	int	int	MzType	MzType	MzType	RtType	RtType	RtType	float	float	RtType	int	str
407	318	1	638.245705	638.245239	638.246704	0.75 m	0.73 m	0.77 m	4.20e+04	5.97e-05	0.03 m	0	AA_sample_arabidopsis.mzML	-	-	-	-	-	-	-	-	-	-	-	-	-	-
406	317	1	271.176873	271.176117	271.177032	0.82 m	0.80 m	0.83 m	4.30e+04	6.11e-05	0.04 m	0	AA_sample_arabidopsis.mzML	-	-	-	-	-	-	-	-	-	-	-	-	-	-
405	316	1	387.126863	387.126526	387.127106	1.20 m	1.18 m	1.23 m	4.63e+04	6.57e-05	0.04 m	0	AA_sample_arabidopsis.mzML	-	-	-	-	-	-	-	-	-	-	-	-	-	-
149	87	2	148.079803	148.079605	148.079926	5.43 m	5.38 m	5.49 m	5.48e+04	1.71e-03	0.06 m	1	AA_sample_arabidopsis.mzML	-	-	-	-	-	-	-	-	-	-	-	-	-	-
404	315	1	779.201745	779.201355	779.202515	4.04 m	3.98 m	4.07 m	5.68e+04	8.06e-05	0.05 m	0	AA_sample_arabidopsis.mzML	-	-	-	-	-	-	-	-	-	-	-	-	-	-
403	314	1	757.558652	757.556519	757.560486	0.80 m	0.76 m	1.09 m	5.95e+04	8.44e-05	0.04 m	0	AA_sample_arabidopsis.mzML	-	-	-	-	-	-	-	-	-	-	-	-	-	-
402	313	1	750.508721	750.507080	750.510620	0.61 m	0.60 m	0.64 m	6.23e+04	8.84e-05	0.04 m	0	AA_sample_arabidopsis.mzML	-	-	-	-	-	-	-	-	-	-	-	-	-	-
401	312	1	522.203429	522.203064	522.203857	5.95 m	5.93 m	5.98 m	6.28e+04	8.91e-05	0.03 m	0	AA_sample_arabidopsis.mzML	-	-	-	-	-	-	-	-	-	-	-	-	-	-
120	68	3	597.172992	597.171814	597.178040	1.66 m	1.62 m	1.69 m	7.23e+04	2.11e-03	0.06 m	1	AA_sample_arabidopsis.mzML	-	-	-	-	-	-	-	-	-	-	-	-	-	-
400	311	1	658.408975	658.407776	658.410034	0.71 m	0.69 m	0.76 m	7.62e+04	1.08e-04	0.03 m	0	AA_sample_arabidopsis.mzML	-	-	-	-	-	-	-	-	-	-	-	-	-	-

The 10 most intense peaks of the Arabidopsis sample seem to be sample specific since there is not a single match with the Caulobacter sample.

MZmine2¶

Similar to run_feature_finder_metabo, MzMine2 pick_peaks features peak detection in two main processing steps ion chromatogram extraction and peak detection where different peak detection algorithms can be applied:

# mzmine2 is an extension we find in emzed name space under ext. # We create a mopre dicrect access mzmine = emzed.ext.mzmine2 mzmine.pick_peaks(peakmap, adap_chromatogram_builder, peak_resolver, rsp_parameters=None, verbose=False)

The peak_picker function arguments are

peakmap: PeakMap object
adap_chromatogram_builder: ADAPChromatogramBuilder object
peak_resolver: PeakDetectionBuilder object
rsp_parameters: RemoveShoulderPeaksParameters object, optional. If provided also shoulder peaks will be removed.
verbose: print more output from mzmine when verbose is True .

In contrast to openMS run_feature_finder_metabo , pick_peaks does not hand over directly feature detection parameters as function arguments, but objects defining processing step and each processing step object is configured separately. The optional rsp_parameters allows removing shoulder (satellite) peaks. Those are artificial peaks which can by generated by Orbitrap Instruments.

MzMine2 provides ADAP chromatogram extraction and peak detection algorithms [Meyers et al.] (https://pubs.acs.org/doi/abs/10.1021/acs.analchem.7b00947) which we will discuss further. Let us start with ADAPChromatogramBuilder description by Meyers et al.:

Take all the data points in a datafile, sort them by their intensities, and remove those points (mostly noise) below a certain intensity threshold.
Starting with the most intense data point, the first EIC is created.
For this EIC, establish an immutable m/z range that is the data point’s where is specified by the user.
The next data point, which will be the next most intense, is added to an existing EIC if its m/z value falls within its m/z range.
If the next data point does not fall within an EICs m/z range, a new EIC is created. New EICs are only created if the point meets the minimum start intensity requirement set by the user.
An m/z range for a new EIC is created the same way as in step 3 except the boundaries will be adjusted to avoid overlapping with pre-existing EICs. As an example consider an existing EIC with m/z range (100.000,100.020) for . If the new EIC is initialized with a data point having an m/z value of 100.025, then this new EIC will have am/z range set to (100.020, 100.035) rather than (100.015, 100.035).
Repeat steps 4−6 until all the data has been processed.
Finally, a post processing step is implemented. Only EICs with a user defined number of continuous points above a user defined intensity threshold are kept.

``ADAPChromatogramBuilder`` has 4 tuning parameters:

minimum_scan_span: Minimum number of scans over which some peak in the chromatogram must have continuous points above the noise level to be recognized as a chromatogram. The optimal value depends on the chromatography system setup. The best way to set this parameter is by studying the raw data and determining what is the typical time span of chromatographic peaks.
mz_tolerance: Maximum allowed difference between two m/z values to be considered same. The value is specified both as absolute tolerance (in m/z) and relative tolerance (in ppm). The tolerance range is calculated using maximum of the absolute and relative tolerances.
start_intensity: Points below this intensity will not be considered in starting a new chromatogram.
intensity_thresh2: This parameter is the intensity value for which intensities greater than this value can contribute to the minimum_scan_span count

Note, with start_intensity and intensity_thresh2 2 different threshold parameters are provided. It is useful to choose a higher start_intensity than intensity_thresh2 values since it allows avoiding chromatogram splits along the baseline due to noise. The parameter depends a lot on signal quality i.e. electro spray stability. openMS ff_metabo handles the same issue by allowing a user defined number of outliers (values below the intensity threshold) or alternatively, by defining a signal frequency (not shown). All those values are instrument and sample dependent and require specific tuning. The art is to find the right balance between peak number and quality and requires some time investment. Good practice is to check for the detection of compounds known to be present in the samples at different abundances (i.e. due targeted analysis or spiked compounds).

Also note, the parameter mz_tolerance is defined by a tuple (absolute_tolerance, relative_tolerance in ppm) and the applied tolerance is defined as \(max(atol, rtol)\). Some emzed expressions have a very similar syntax i.e. column.approx_equal(other_column, atol, rtol) In contrast, all core emzed expressions use the common additive tolerance definition :math:`tol = atol + rtol *value`. To avoid unexpected results, we recommend to set one of the two tuple values to 0.

When comparing openMS run_feature_finder_metabo and MZmine2 peak_picker performance we should configure the same or similar parameters with similar values. Let’s configure the ADAPChromatogramBuilder:

[9]:

mzmine = emzed.ext.mzmine2
chrom = mzmine.ADAPChromatogramBuilder()
chrom.intensity_thresh2 = 7e3
chrom.minimum_scan_span = 5
chrom.mz_tolerance = (0.008, 0)
chrom.start_intensity = 1e4

Next, we will define peak detection process. MzmMine 2 provides 5 different peak detectors, each with different strengths and weakness.

ADAPDetector
BaselinePeakDetector
MinimumSearchPeakDetector
NoiseAmplitudePeakDetector
SavitzkyGolayPeakDetector

In the following we will focus on the ``ADAPDectector`` since it performs best in terms of data quality.

ADAP detects peaks using continuous wavelet transformation (CWT). Such transformation simplifies peak recognition since peaks can be detected by varying wavelet scale (the principle is nicely demonstrated here. In principle, LC-MS peaks can now be detected by following along their ridges or ridgeline (max values) as function of the scaling parameter see also Meyers et al.. Configuration requires 6 parameters:

peak_duration: Range of acceptable peak lengths. Tuple (min, max) in seconds.
rt_for_cwt_scales_duration: Upper and lower bounds of retention times to be used for setting the wavelet scales. Choose a range that is similar to the range of peak widths (FWHM) in seconds expected to be found in the data.
sn_estimators: User can choose between two signal to noise estimator objects:
1. IntensityWindowsSNParameters was tested on LC-MS datasets and uses the peak height as the signal level and the standard deviation of intensities around the peak as the noise level
2. WaveletCoefficientsSNParameters was tested on GC-MS datasets and uses the continuous wavelet transform coefficients to estimate the signal and noise levels.
sn_threshold: Signal to noise ratio threshold. The minimum signal to noise ratio a peak must have to be considered a real feature. Values greater than or equal to 7 will work well and will only detect a very small number of false positive peaks.
coef_area_threshold: This is the best coefficient found by taking the inner product of the wavelet at the best scale and the peak, and then dividing by the area under the peak. Values around 100 work well for most data. Filters out bad peaks.
min_feat_height: Minimum height of a feature. The smallest intensity a peak can have and be considered a real feature. Should be the same, or similar to start_intensity value of ADAPChromatogramBuilder.

Additional Notes on ``rt_for_cwt_scales_duration`` parameter. The parameter can be interpreted as the width range of the transformed peak. The applied transformation function (mother wavelet) is the so called mexican hat, corresponding to the second derivative of the Gaussian distribution or bell curve. Most importantly, the width of the center peak corresponds to the standard deviation \(\sigma\) and is a good estimator of the chromatogram peak width FWHM range. You might also check the original configuration parameter description in the adap user manual.

Again we configure the parameters in a way to obtain results comparable with the settings we used above for run_feature_finder_metabo.

[10]:

pd = mzmine.ADAPDetector()
pd.peak_duration = (5.0, 60.0)  # in seconds
pd.rt_for_cwt_scales_duration = (0.04, 3.0)
pd.sn_threshold = 10.0
pd.sn_estimators = (
    mzmine.IntensityWindowsSNParameters()
)  # since we have LC-MS data
pd.coef_area_threshold = 50
pd.min_feat_height = 5e4

Last not least, optional rsp_parameters allows removing shoulder (satellite) peaks. The corresponding RemoveShoulderPeaksParameters object has 2 configuration parameters:

resolution: Mass resolution is the dimensionless ratio of the mass of the peak divided by its width. Peak width is taken as the full width at half maximum intensity (FWHM). default = 100’000
peak_model: Peaks under the curve of this peak model will be removed. Allowed values: 'GAUSS', 'LORENTZ', 'LORENTZEXTENDED'; default = 'GAUSS'

Samples were acquired with a mass resolution of 30’000 at m/z 400. In case of Orbitrap instruments the resolution is \(R = R_{ref} \sqrt(\frac{{mz}_{ref}}{mz})\) and hence, the m/z acquisition range is important. Also keep in mind, the lower the resolution the broader the correction window width. We set it up as follows:

[11]:

rsp = mzmine.RemoveShoulderPeaksParameters(30000, "LORENTZ")

Finally, we can run mzmine feature detection:

[12]:

%%capture
ta_caulo = mzmine.pick_peaks(
    peakmap_caulobacter, chrom, pd, rsp_parameters=rsp
)

[13]:

ta_caulo.sort_by("parent_id")[:10]

[13]:

id	parent_id	mz	rt	mzmin	mzmax	rtmin	rtmax	area	height	width
int	int	MzType	RtType	MzType	MzType	RtType	RtType	float	float	RtType
1	-	85.075737	1.31 m	85.068855	85.083290	0.30 m	9.83 m	1.28e+07	2.01e+06	9.54 m
2	-	86.059761	1.25 m	86.059601	86.067551	0.59 m	6.18 m	1.37e+07	2.24e+06	5.59 m
3	-	86.079033	1.31 m	86.072250	86.082825	1.28 m	8.44 m	1.09e+06	8.67e+04	7.16 m
4	-	86.096146	1.13 m	86.096039	86.096184	1.10 m	2.54 m	9.79e+05	1.55e+05	1.44 m
5	-	87.063080	1.25 m	87.062820	87.070770	0.09 m	8.53 m	1.40e+06	6.27e+04	8.44 m
6	-	88.050133	1.36 m	88.042343	88.057983	1.07 m	9.67 m	1.03e+06	3.82e+04	8.60 m
7	-	88.075447	1.45 m	88.075363	88.075470	1.44 m	1.49 m	8.37e+05	4.78e+05	0.05 m
8	-	88.111633	0.97 m	88.103668	88.111786	0.93 m	8.50 m	2.16e+06	4.51e+04	7.57 m
9	-	89.059425	0.74 m	89.052620	89.060440	0.05 m	9.98 m	1.43e+06	1.65e+04	9.93 m
10	-	90.054634	4.00 m	90.047531	90.059875	0.58 m	10.00 m	1.70e+06	5.07e+04	9.42 m

Most columns of pick_peaks and run_feature_finder_metabo output Table are the same. pick_peaks provides an additional column parent_id referring to the id of the corresponding EIC. Hence, peaks originating from the same EIC have the same parent_id value and we can sort the table by parent_id to evaluate the peak detection process: chromatogram with peaks

Evaluating the different chromatograms simplifies verifying peak extraction and optimization of peak detection parameters. Since we will no longer need the unsplitteded EICs for further processing, we can remove them via a simple filter command using the fact that they have an empty parent_id:

[14]:

ta_caulo = ta_caulo.filter(ta_caulo.parent_id.is_not_none())

At this state, isotopologue grouping is still missing. We can accomplish the grouping using ``mzmine.isotope_grouper()``. To this end we have to configure the ``IsotopeGrouperParameters`` object with parameters: - ``mz_tolerance``: Maximum allowed difference between two m/z values to be considered same. The value is specified both as absolute tolerance (in m/z) and relative tolerance (in ppm). The tolerance range is calculated using the maximum of the absolute and relative tolerances. - ``rt_tolerance``: Maximum allowed difference between two retention time values in seconds. Defined as a tuple: (is_abs_value, value) with is_abs_value being either True or False to define if the provided value is absolute or relative. As an example we assume an RT = 100s and rtol = (True, 0.5) rt tolerance equals 0.5s and (False, 0.5) equals 50s. - ``monotonic_shape``: If true, then monotonically decreasing height of isotope pattern is required. - ``maximum_charge``: Maximum charge to consider for detecting the isotope patterns. - ``representative_isotope``: peak, which should represent the whole isotope pattern. For low molecular weight compounds with monotonically decreasing isotope pattern, the most intense isotope should be representative. For high molecular weight molecules, the lowest m/z isotope may be the representative. Allowed values: 'Most intense', 'Lowest m/z'

Note, parameter mz_tolerance refers not to the mz values of isotopologues directly, but refers to the nominal isotopologue mass shift of neighboring isotopologues of an isotopologue pattern. The pattern results from the natural isotope distributions of compound elements and can be calculated (see also link). Hence, mz_tolerance should be selected in a way that it fullfills following condition for all compounds of interest:

\(\lvert (mz_{n+1, measured} -mz_{n+1, calculated}-mz_{n, measured} + mz_{n, calculated})*z\rvert\le mz_{tolerance}\)

with n corresponding to the nominal isotopologue number. Since most biomolecules are composed out of the elements C, H, N, O, P, S isotopologue shifts are mainly driven by \(^{13}C\), \(^{18}O\), \(^{34}S\) and to a certain extend by \(^{15}N\), spanning a mass range of (0.995 and 1.005), which corresponds to a mz_tolerance of about 0.005 for single charged ions. Since \(^{13}C\) is mainly repsonsible for metabolites M1 isotopologue, isotopologue grouping algorithm assums a default isotope mass shift of 1.0033 Da corresponding to the \(^{12}C\) - \(^{13}C\) mass difference. However, high mass resolution instruments are capable to separate \(^{13}C\) and \(^{15}N\) isotopologues and if you want to include the \(^{15}N\) isotopologue peak you should increase the mass tolerance to \(\pm\) 0.008.

For most metabolites with typical elemental composition, a monotonic shape can be assumed and hence the parameter monotonic_shape should be set to True. However, keep in mind that compounds rich in S atoms or some adducts can ommit isotopologue grouping when monotopic shape is assumed, i.e. grouping of \(M_{1}\) and \(M_{2}\) of ions with a Cl adduct will fail due to the high abundance of \(^{35}Cl\).

We configure the IsotopeGrouperParameters as follows:

[15]:

ig = mzmine.IsotopeGrouperParameters()
ig.mz_tolerance = (0.005, 0)
ig.rt_tolerance = (True, 2.0)  # (is_abs_value, value)
ig.monotonic_shape = True
ig.maximum_charge = 3
ig.representative_isotope = "Lowest m/z"

And we get our feature table by:

[16]:

ta_caulo_final = mzmine.isotope_grouper(ta_caulo, ig)


shutdown send alive token

got first alive token1711363218.9265552
wait for first alive token
json file
/tmp/tmpgddzwb8p.json
Mar 25, 2024 10:40:19 AM net.sf.mzmine.modules.rawdatamethods.rawdataimport.fileformats.MzMLReadTask run
INFO: Started parsing file /tmp/tmp7agqe5yd.mzML
Mar 25, 2024 10:40:19 AM uk.ac.ebi.jmzml.MzMLElement loadProperties
WARNING: MzIdentML Configuration file: jar:file:/root/.emzed3/emzed.ext.mzmine2/mzmine2/MZmine-2.41.2/lib/jmzml-1.7.11.jar!/defaultMzMLElement.cfg.xml

got 1711363219.4278626

Mar 25, 2024 10:40:20 AM net.sf.mzmine.modules.rawdatamethods.rawdataimport.fileformats.MzMLReadTask run
INFO: Finished parsing /tmp/tmp7agqe5yd.mzML, parsed 1066 scans
/tmp/tmp3e11ksan.csv

Mar 25, 2024 10:40:21 AM net.sf.mzmine.modules.peaklistmethods.isotopes.deisotoper.IsotopeGrouperTask run
INFO: Running isotopic peak grouper on

Mar 25, 2024 10:40:21 AM net.sf.mzmine.modules.peaklistmethods.isotopes.deisotoper.IsotopeGrouperTask run
INFO: Finished isotopic peak grouper on
write output file to /tmp/tmpvegf08d6.txt

extracted 1115 deconvolved peaks
extracted 792 deconvolved peaks

!!!DONE

[17]:

ta_caulo_final.sort_by("mz", ascending=False)[:10]

[17]:

id	parent_id	mz	rt	mzmin	mzmax	rtmin	rtmax	area	height	width	isotope_group_id	isotope_base_peak	isotope_group_size	isotope_charge
int	int	MzType	RtType	MzType	MzType	RtType	RtType	float	float	RtType	int	int	int	int
3495	2379	599.389374	1.01 m	599.388489	599.390259	0.97 m	1.08 m	3.64e+06	1.41e+06	0.11 m	-	-	-	-
3494	2371	597.875854	0.78 m	597.875366	597.877563	0.74 m	0.84 m	3.32e+05	1.28e+05	0.09 m	414	3492	2	2
3492	2370	597.374146	0.78 m	597.373413	597.375305	0.73 m	0.83 m	5.35e+05	1.74e+05	0.09 m	414	3492	2	2
3493	2370	597.374146	0.54 m	597.373596	597.376770	0.53 m	0.61 m	2.21e+05	7.46e+04	0.07 m	-	-	-	-
3491	2368	596.317200	0.50 m	596.312622	596.318970	0.47 m	0.54 m	4.72e+05	2.11e+05	0.07 m	40	3489	3	1
3490	2367	595.315369	0.48 m	595.314880	595.316895	0.47 m	0.56 m	2.46e+06	1.19e+06	0.10 m	40	3489	3	1
3489	2365	594.312195	0.48 m	594.310303	594.312561	0.47 m	0.56 m	8.60e+06	4.07e+06	0.10 m	40	3489	3	1
3486	2351	591.876099	0.72 m	591.875061	591.877075	0.67 m	0.78 m	3.99e+05	1.83e+05	0.10 m	331	3484	4	2
3487	2351	591.875244	1.15 m	591.873657	591.876282	1.12 m	1.24 m	2.62e+05	6.80e+04	0.12 m	331	3484	4	2
3488	2351	591.875092	0.96 m	591.873291	591.877380	0.91 m	0.98 m	1.56e+05	6.00e+04	0.07 m	331	3484	4	2

Note, only grouped peaks obtain additional identifiers, hence columns isotope_group_id and isotope_base_peak contain None values if peaks were not grouped.

Comparing feature_finder_metabo and ADAP¶

We can use the emzed Table method join to compare the feature detection results. Ideally, both methods will result in the same mz and rt values. We can test this using Table.join allowing only small mz and rt tolerances:

[18]:

adap = ta_caulo_final
ff = t_caulo
mztol = (0.001, 0.0)
rttol = (2.0, 0.0)  # in the range of about one spectrum distance
comp_adap_ff = adap.join(
    ff,
    adap.mz.approx_equal(ff.mz, *mztol)
    & adap.rt.approx_equal(ff.rt, *rttol),
)
print("number of common peaks:", len(comp_adap_ff))


shutdown send alive token

number of common peaks: 736

Note, the length of table comp_adap_ff does not necessarily correspond to the number of common peaks since in principle, a peak in one table can match several peaks in the other table and vice versa, leading to multiple entries for each peak. Here, a more general way to determine the number of common peaks:

[19]:

ids = comp_adap_ff.id.to_list()
no_peaks = len(set(ids))
print("number of common peaks:", no_peaks)

number of common peaks: 736

Ideally, both feature detection approaches result identical m/z and rt values for the same peaks. We can add a column to show the mz differences:

[20]:

t = comp_adap_ff
t.add_or_replace_column(
    "mz_delta", t.mz - t.mz__0, emzed.MzType, format_="%.1e"
)
t.add_or_replace_column(
    "mz_delta",
    t.apply(abs, t.mz_delta),
    emzed.MzType,
    format_="%.1e",
)
t = t.sort_by("mz_delta", ascending=False)
t[:10]

[20]:

id	parent_id	mz	rt	mzmin	mzmax	rtmin	rtmax	area	height	width	isotope_group_id	isotope_base_peak	isotope_group_size	isotope_charge	id__0	feature_id__0	feature_size__0	mz__0	mzmin__0	mzmax__0	rt__0	rtmin__0	rtmax__0	intensity__0	quality__0	fwhm__0	z__0	source__0	mz_delta
int	int	MzType	RtType	MzType	MzType	RtType	RtType	float	float	RtType	int	int	int	int	int	int	int	MzType	MzType	MzType	RtType	RtType	RtType	float	float	RtType	int	str	MzType
2718	618	245.605164	1.51 m	245.603516	245.610641	1.49 m	1.56 m	1.67e+05	6.15e+04	0.07 m	-	-	-	-	575	341	2	245.606116	245.602371	245.610641	1.53 m	1.49 m	1.58 m	2.02e+05	1.67e-04	0.06 m	2	AA_sample_caulobacter.mzML	9.5e-04
3187	1661	447.270050	0.53 m	447.269714	447.270508	0.52 m	0.59 m	9.05e+05	3.96e+05	0.07 m	-	-	-	-	581	346	1	447.269209	447.259308	447.272034	0.53 m	0.47 m	0.66 m	7.04e+05	1.64e-04	0.03 m	0	AA_sample_caulobacter.mzML	8.4e-04
3338	2033	520.332031	0.74 m	520.330017	520.333252	0.70 m	0.79 m	6.78e+05	2.83e+05	0.09 m	-	-	-	-	660	412	1	520.332712	520.330017	520.344360	0.74 m	0.71 m	0.82 m	5.35e+05	1.25e-04	0.03 m	0	AA_sample_caulobacter.mzML	6.8e-04
3422	2236	564.327545	0.52 m	564.321594	564.328308	0.50 m	0.57 m	1.33e+06	5.86e+05	0.07 m	-	-	-	-	390	195	2	564.328171	564.321594	564.341370	0.52 m	0.50 m	0.64 m	1.17e+06	3.53e-04	0.05 m	1	AA_sample_caulobacter.mzML	6.3e-04
2675	534	230.838531	1.53 m	230.835220	230.840469	1.46 m	1.60 m	3.98e+06	9.45e+05	0.14 m	132	2675	2	3	244	110	1	230.837950	230.835220	230.840469	1.53 m	1.45 m	1.59 m	3.26e+06	7.61e-04	0.08 m	0	AA_sample_caulobacter.mzML	5.8e-04
3491	2368	596.317200	0.50 m	596.312622	596.318970	0.47 m	0.54 m	4.72e+05	2.11e+05	0.07 m	40	3489	3	1	113	45	3	596.317778	596.312622	596.320618	0.49 m	0.47 m	0.56 m	4.03e+05	2.15e-03	0.05 m	1	AA_sample_caulobacter.mzML	5.8e-04
3262	1853	484.314926	1.57 m	484.312286	484.315338	1.54 m	1.64 m	4.49e+05	1.68e+05	0.10 m	123	3257	6	3	332	155	3	484.314352	484.312286	484.315338	1.58 m	1.50 m	1.63 m	4.71e+05	5.00e-04	0.06 m	3	AA_sample_caulobacter.mzML	5.7e-04
3375	2097	533.306763	0.52 m	533.304932	533.307312	0.50 m	0.59 m	1.41e+06	6.08e+05	0.09 m	181	3375	2	1	395	199	2	533.306200	533.296936	533.307312	0.52 m	0.46 m	0.61 m	1.19e+06	3.38e-04	0.04 m	1	AA_sample_caulobacter.mzML	5.6e-04
3293	1933	499.283325	1.55 m	499.281006	499.284668	1.48 m	1.58 m	6.57e+05	1.74e+05	0.10 m	23	3286	3	1	41	15	3	499.282783	499.279083	499.284668	1.54 m	1.47 m	1.58 m	6.34e+05	5.55e-03	0.06 m	1	AA_sample_caulobacter.mzML	5.4e-04
3434	2261	570.339050	0.80 m	570.331909	570.341797	0.75 m	0.84 m	3.59e+05	1.34e+05	0.09 m	291	3431	3	3	535	308	2	570.338545	570.331909	570.341797	0.80 m	0.75 m	0.86 m	2.92e+05	1.96e-04	0.05 m	3	AA_sample_caulobacter.mzML	5.1e-04

We observe rather small m/z variances at the 4th digit which are below mass accuracy of the instrument. Exceptions seem to be due to bad quality i.e. peak id==2722 turned out to be a satellite peak which has not been removed. Summarized, peaks detected with both methods give the same results for common detected peaks.

Next we evaluate peaks exclusively extracted with only one out of the two detectors: the joined table provides peaks ids of those peaks detected with both tools. With the method left_join we can also find peaks detected with only one out of the two tools, since rows of the joined table contain only None values.

[21]:

# mzmine2
comp_adap_ff = adap.left_join(
    ff,
    adap.mz.approx_equal(ff.mz, *mztol)
    & adap.rt.approx_equal(ff.rt, *rttol),
)
adap_only = comp_adap_ff.filter(comp_adap_ff.id__0.is_none())
# open ms
comp_ff_adap = ff.left_join(
    adap,
    ff.mz.approx_equal(adap.mz, *mztol)
    & ff.rt.approx_equal(adap.rt, *rttol),
)
ff_only = comp_ff_adap.filter(comp_ff_adap.id__0.is_none())
print(f"only by ADAP: {len(adap_only)}")
print(f"only by run_feature_finder_metabo: {len(ff_only)}")

only by ADAP: 379
only by run_feature_finder_metabo: 308

Note, here we do not build a set from id values to count the number of peaks, since by definition there are exists no similar peak in the joint table.

Figure 5: Top 10 peaks exclusively detected with ff_metabo (A) and ADAP (B).

When evaluating exclusive peaks, both approaches miss significant peaks (Figure 5). For given parameters, ff_metabo detects about 4 % more peaks than ADAP but it ff_metabo misses more high quality peaks.

SUMMARY: Both approaches give similar results with a slightly better performance of the ADAP. However, some further parameter optimization might close the gap between the two methods. ADAP has a clear advantage in case of Orbitrap instruments data since it can directly remove shoulder peaks. On the other hand ff_metabo is about 100x faster than ADAP. This is a clear advantage when analyzing huge data sets.

Peak deconvolution (adduct grouping)¶

emzed¶

``emzed.adducts`` module largely enhances adduct annotation. Note, that here we do not use the term adduct properly, since we also consider protonation and deprotonation “adducts”. First, it provides detailed information about all common adducts. In total 54 adducts in positive and negative ionization mode are provided. To get detailed information about all available adducts in both modes use the command

[22]:

emzed.adducts.all[:10]

[22]:

id	adduct_name	m_multiplier	adduct_add	adduct_sub	z	sign_z
int	str	int	str	str	int	int
0	M-3H	1		H3	3	-1
1	M-2H	1		H2	2	-1
2	M-	1			1	-1
3	M-H	1		H	1	-1
4	M-H2O-H	1		H2OH	1	-1
5	M+Na-2H	1	Na	H2	1	-1
6	M+Cl	1	Cl		1	-1
7	M+K-2H	1	K	H2	1	-1
8	M+KCl-H	1	KCl	H	1	-1
9	M+FA-H	1	H2CO2	H	1	-1

[23]:

emzed.adducts.all[-10:]

[23]:

id	adduct_name	m_multiplier	adduct_add	adduct_sub	z	sign_z
int	str	int	str	str	int	int
44	M+3ACN+2H	1	(C2H3N)3H2		2	1
45	M+ACN+2H	1	(C2H3N)1H2		2	1
46	M+2H	1	H2		2	1
47	M+H+Na	1	HNa		2	1
48	M+H+K	1	HK		2	1
49	M+2Na	1	Na2		2	1
50	M+3H	1	H3		3	1
51	M+2H+Na	1	(H2)1Na		3	1
52	M+3Na	1	Na3		3	1
53	M+2Na+H	1	(Na2)1H		3	1

For space reasons we only list the first 10 and the last 10 adducts for each mode. Multitude of adducts subsets are also directly available i.e.

[24]:

adds = emzed.adducts.positive_single_charged
adds

[24]:

id	adduct_name	m_multiplier	adduct_add	adduct_sub	z	sign_z
int	str	int	str	str	int	int
19	M+	1			1	1
20	M+H	1	H		1	1
21	M+NH4	1	NH4		1	1
22	M+Na	1	Na		1	1
23	M+H-2H2O	1	H	(H2O)2	1	1
24	M+H-H2O	1	H	H2O	1	1
25	M+K	1	K		1	1
26	M+ACN+H	1	C2H3NH		1	1
27	M+2ACN+H	1	(C2H3N)2H		1	1
28	M+ACN+Na	1	(C2H3N)1Na		1	1
29	M+2Na-H	1	Na2	H	1	1
30	M+Li	1	Li		1	1
31	M+CH3OH+H	1	CH3OHH		1	1
32	M+2K-H	1	K2	H	1	1
33	M+IsoProp+H	1	(C3H8O)1H		1	1
34	M+IsoProp+Na+H	1	(C3H8O)1NaH		1	1
35	M+DMSO+H	1	(C2H6OS)1H		1	1
36	2M+H	2	H		1	1
37	2M+NH4	2	NH4		1	1
38	2M+Na	2	Na		1	1
39	2M+K	2	K		1	1
40	2M+ACN+H	2	(C2H3N)1H		1	1
41	2M+ACN+Na	2	(C2H3N)1Na		1	1

Columns ``m_multiplier``, ``adduct_add``, ``adduct_sub`` and ``z`` are required to calculate the correct neutral mass. Provided adduct tables can be directly used for adduct asignment with method:

[25]:

annotate = emzed.annotate.annotate_adducts
help(annotate)  # more explanat

Help on function annotate_adducts in module emzed.annotate.annotate_adducts:

annotate_adducts(peaks, adducts, mz_tol, rt_tol, explained_abundance=0.2)
    attempts to group peaks as adducts.

    required column names are `mz` and `rt` only. all other columns will be ignored.

Note that you should select only those adducts for assignment which are likely to be present. Some adducts i.e. acetate are more difficult to interprete since the mass shift can also origin from an in source fragmentation of i.e. a sugar moiety. We will select most common adducts from our adducts table by columnn id:

[26]:

%%capture
ids = [20, 21, 22, 24, 25, 32, 36, 38, 39]
adducts = adds.filter(adds.id.is_in(ids))
t_annotate = annotate(adap, adducts, 0.002, 1.5, 0.3)

[27]:

t_annotate.sort_by("adduct_cluster_id", ascending=False)[:10]

[27]:

id	parent_id	mz	rt	mzmin	mzmax	rtmin	rtmax	area	height	width	isotope_group_id	isotope_base_peak	isotope_group_size	isotope_charge	adduct_name	adduct_isotopes	adduct_isotopes_abundance	adduct_m0	adduct_cluster_id
int	int	MzType	RtType	MzType	MzType	RtType	RtType	float	float	RtType	int	int	int	int	str	str	float	MzType	int
2460	123	143.154099	6.24 m	143.154053	143.154343	6.20 m	6.29 m	2.51e+05	6.59e+04	0.09 m	-	-	-	-	M+H	+[1]H	0.999885	142.146822	148
2506	197	160.180588	6.22 m	160.180466	160.180832	6.16 m	6.40 m	1.26e+08	2.56e+07	0.24 m	5	2506	3	1	M+NH4	+[1]H4 [14]N	0.995862	142.146762	148
2475	141	151.035149	6.10 m	151.034973	151.035324	6.08 m	6.19 m	9.79e+06	2.55e+06	0.10 m	67	2475	2	1	M+NH4	+[1]H4 [14]N	0.995862	133.001323	147
2771	718	267.013214	6.10 m	267.012726	267.013519	6.08 m	6.19 m	6.60e+05	1.85e+05	0.10 m	-	-	-	-	2M+H	+[1]H	0.999885	133.002969	147
2391	15	94.044815	6.11 m	94.044685	94.044945	6.09 m	6.18 m	3.34e+05	1.16e+05	0.09 m	-	-	-	-	M+NH4	+[1]H4 [14]N	0.995862	76.010990	146
2481	150	153.032631	6.10 m	153.032425	153.032776	6.08 m	6.19 m	1.40e+06	3.69e+05	0.10 m	-	-	-	-	2M+H	+[1]H	0.999885	76.012677	146
2683	555	237.090317	6.01 m	237.090012	237.091080	5.98 m	6.09 m	1.23e+06	5.54e+05	0.11 m	-	-	-	-	2M+H	+[1]H	0.999885	118.041520	145
2751	681	259.072266	6.01 m	259.071991	259.072571	6.00 m	6.07 m	1.47e+05	7.63e+04	0.08 m	-	-	-	-	2M+Na	+[23]Na	1.000000	118.041522	145
2683	555	237.090317	6.01 m	237.090012	237.091080	5.98 m	6.09 m	1.23e+06	5.54e+05	0.11 m	-	-	-	-	M+H	+[1]H	0.999885	236.083040	144
2751	681	259.072266	6.01 m	259.071991	259.072571	6.00 m	6.07 m	1.47e+05	7.63e+04	0.08 m	-	-	-	-	M+Na	+[23]Na	1.000000	236.083045	144

``annotate_adducts`` adds colums adduct_name, adduct_isotopes, adduct_abundance, adduct_m0 and adduct_cluster_id. It provides detailed information about the grouping process and manual evaluation is straightforward. The more adducts you add the more groupings are possible and adducts might be present in more than 1 adduct group, which can be deduced from table length:

[28]:

print("before:", len(adap), "\nafter :", len(t_annotate))

before: 1115
after : 311

With a few lines of code we can extract those cases:

[29]:

from collections import Counter

id2counts = Counter(t_annotate.id)
# peaks with ids present more than ones show ambiguous grouping
ids = [key for key, counts in id2counts.items() if counts > 1]
# next we can filter our table for selected peaks
t = t_annotate.filter(t_annotate.id.is_in(ids))
# we now obtain the associated adduct_cluster_ids
ac_is = t.adduct_cluster_id.to_list()
# now we can access the subtable of interest:
t_cases = t_annotate.filter(t_annotate.adduct_cluster_id.is_in(ac_is))
t_cases[:10]

[29]:

id	parent_id	mz	rt	mzmin	mzmax	rtmin	rtmax	area	height	width	isotope_group_id	isotope_base_peak	isotope_group_size	isotope_charge	adduct_name	adduct_isotopes	adduct_isotopes_abundance	adduct_m0	adduct_cluster_id
int	int	MzType	RtType	MzType	MzType	RtType	RtType	float	float	RtType	int	int	int	int	str	str	float	MzType	int
2382	2	86.059761	1.25 m	86.059662	86.059853	1.22 m	1.35 m	6.63e+06	2.24e+06	0.13 m	73	2382	3	1	M+H	+[1]H	0.999885	85.052485	0
2407	39	112.050262	1.63 m	112.050102	112.050461	1.59 m	1.72 m	8.53e+05	2.85e+05	0.13 m	-	-	-	-	M+H	+[1]H	0.999885	111.042986	2
2407	39	112.050262	1.63 m	112.050102	112.050461	1.59 m	1.72 m	8.53e+05	2.85e+05	0.13 m	-	-	-	-	2M+H	+[1]H	0.999885	55.521493	3
2418	57	118.085960	1.65 m	118.085831	118.086128	1.62 m	1.72 m	7.23e+05	2.20e+05	0.10 m	-	-	-	-	M+NH4	+[1]H4 [14]N	0.995862	100.052135	4
2424	67	122.924286	2.16 m	122.924202	122.924393	2.10 m	2.23 m	3.22e+06	9.15e+05	0.12 m	-	-	-	-	M+K	+[39]K	0.932581	83.961128	99
2444	97	134.032135	1.63 m	134.031860	134.032349	1.60 m	1.68 m	2.37e+05	1.08e+05	0.08 m	-	-	-	-	M+Na	+[23]Na	1.000000	111.042914	2
2444	97	134.032135	1.63 m	134.031860	134.032349	1.60 m	1.68 m	2.37e+05	1.08e+05	0.08 m	-	-	-	-	2M+Na	+[23]Na	1.000000	55.521457	3
2479	145	152.056465	1.92 m	152.056320	152.056839	1.88 m	2.05 m	7.97e+05	2.44e+05	0.17 m	346	2479	2	1	M+H	+[1]H	0.999885	151.049189	90
2479	145	152.056465	1.92 m	152.056320	152.056839	1.88 m	2.05 m	7.97e+05	2.44e+05	0.17 m	346	2479	2	1	2M+H	+[1]H	0.999885	75.524594	91
2484	153	153.102066	1.25 m	153.101929	153.102234	1.19 m	1.42 m	2.00e+07	5.76e+06	0.23 m	-	-	-	-	M+H-H2O	+[1]H -[1]H2 [16]O	0.997226	170.105355	6

When scrolling to the right end of the resulting table we observe that multiple grouping was due to selecting the adducts M+X and 2M+X. If both adducts result identical mass shifts but different monoisotopic masses m0, the same peaks are grouped twice but with different adduct_custer_id. Hence, selecting M+X and 2M+X at the same time must result in multiple grouping. Choose possible adducts with care!

MzMine2¶

MzMine2 supports adduct annotation, thereby correctly using the term adduct, meaning protonated and deprotonated ions are not incuded. The adduct_search algorithm only annotates potential adducts. However, since a grouping id is missing it is hardly possible to evaluate the quality of the grouping process. MzMine2 ``search_adducts`` is only suited for adduct peak removing. Although grouping of molecule ions of different charge state is possible with Mzmine2, the required method Peak complex search is currently not supported by emzed and therefore, we recommend using the emzed.annotate_adducts() method described above.

How to group adducts with the MzMine2 extension? The function adduct_search(peaks, parameters) with attribute peaks being a peaks table with required columns id, mz, rt, height and wih attribute parameters being an mzmine AdductSearchParameters(rt_tolerance, adducts, mz_tolerance, max_adduct_height) object. The default parameters are configured as follows:

[30]:

rt_tolerance = (False, 1.0)
adducts = (
    ("[M+Na-H]", 21.9825),
    ("[M+K-H]", 37.9559),
    ("[M+Mg-2H]", 21.9694),
    ("[M+NH3]", 17.0265),
    ("[M+H3PO4]", 97.9769),
    ("[M+H2SO4]", 97.9674),
    ("[M+H2CO3]", 62.0004),
    ("[(Deuterium)]glycerol", 5.0),
)
mz_tolerance = (0.001, 0)
max_adduct_height = 1.0

rt_tolerance: a tuple (is_abolute_value, \(\Delta RT\)) that allows defining absolute difference by is_abolute_value (True, False). By default relative rt value differences are used. However, since we group coeluting peak the allowed tolerance is rather depending on the peak width (shape) than on the RT itself and absolute values should be used.
adducts: are tuples (adduct name, mass difference to molecular ion) and any single charged adduct can be defined. Note: Default values contain not the adduct ion itself but the mass shift relative to the main ion (normally M+H, M-H). Morevoer mass shifts of both ionization modes are listed and hence the list must be adapted to obtain meainingful results.
mz_tolerance: a tuple (absolute, relative) with absolute and relative allowed maximal adduct mass difference. Remember, if you provide both, a relative and an absolute difference, the higher of both values is applied.
max_adduct_height: Maximum height of the recognized adduct peak, relative to the main peak. By default peaks of the same height are allowed.

Again, we can use ``emzed.adducts`` module to build our adducts search list

[31]:

adds = emzed.adducts.positive_single_charged
ids = [21, 22, 24, 25, 32]
adducts = adds.filter(adds.id.is_in(ids))
a = adducts

# we calculate the mass shift relative to the M+H ion
expr = (
    a.apply(emzed.mass.of, a.adduct_add)
    - a.apply(emzed.mass.of, a.adduct_sub)
    - emzed.mass.H1
)
a.add_column("mass_shift", expr, float)
# We build the (name, mass_shift) tuple with mass shift relative to M+H
adducts = list(zip(a.adduct_name, a.mass_shift))
adducts

[31]:

[('M+NH4', 17.0265490957),
 ('M+Na', 21.981944248999998),
 ('M+H-H2O', -18.0105650638),
 ('M+K', 37.955881648100004),
 ('M+2K-H', 75.91176329620001)]

We can now setup the AdductSearchParameter object: ~~~ params = mzmine.AdductSearchParameters() params.adducts = adducts params.mz_tolerance=(0.001, 0.0) params.rt_tolerance=(True, 2.0) params_max_adduct_height=1.0 ~~~

and we can run search_adduct ~~~ ta_caulo_annotated = mzmine.adduct_search(t_caulo, params) ~~~

[32]:

annotated = ta_caulo_annotated.filter(
    ta_caulo_annotated.adduct_annotation.is_not_none()
)
annotated[:10]

[32]:

id	parent_id	mz	rt	mzmin	mzmax	rtmin	rtmax	area	height	width	isotope_group_id	isotope_base_peak	isotope_group_size	isotope_charge	adduct_annotation
int	int	MzType	RtType	MzType	MzType	RtType	RtType	float	float	RtType	int	int	int	int	str
2444	97	134.032135	1.63 m	134.031860	134.032349	1.60 m	1.68 m	2.37e+05	1.08e+05	0.08 m	-	-	-	-	M+Na 21.9819 m/z adduct of 112.0503 m/z
2484	153	153.102066	1.25 m	153.101929	153.102234	1.19 m	1.42 m	2.00e+07	5.76e+06	0.23 m	-	-	-	-	M+H-H2O -18.0106 m/z adduct of 171.1126 m/z
2488	159	154.086075	0.55 m	154.085953	154.086288	0.52 m	0.61 m	3.39e+05	1.01e+05	0.10 m	-	-	-	-	M+H-H2O -18.0106 m/z adduct of 172.0966 m/z
2490	160	154.105362	1.25 m	154.105255	154.105515	1.18 m	1.33 m	1.38e+06	4.84e+05	0.15 m	2	2483	4	1	M+H-H2O -18.0106 m/z adduct of 172.1159 m/z
2534	253	174.038376	1.92 m	174.038132	174.038559	1.89 m	2.03 m	8.39e+05	2.43e+05	0.14 m	-	-	-	-	M+Na 21.9819 m/z adduct of 152.0565 m/z
2562	305	186.038345	1.32 m	186.038132	186.038528	1.30 m	1.38 m	1.38e+05	5.29e+04	0.08 m	-	-	-	-	M+Na 21.9819 m/z adduct of 164.0565 m/z
2745	681	259.072266	6.01 m	259.071991	259.072571	6.00 m	6.07 m	1.47e+05	7.63e+04	0.08 m	-	-	-	-	M+Na 21.9819 m/z adduct of 237.0903 m/z
2751	691	261.160095	0.52 m	261.156097	261.160767	0.51 m	0.59 m	2.15e+05	7.31e+04	0.08 m	-	-	-	-	M+H-H2O -18.0106 m/z adduct of 279.1706 m/z
2766	720	267.130951	1.53 m	267.130585	267.131561	1.45 m	1.56 m	1.05e+06	3.37e+05	0.11 m	-	-	-	-	M+Na 21.9819 m/z adduct of 245.1494 m/z
2787	778	276.070312	3.98 m	276.069855	276.070496	3.95 m	4.04 m	1.80e+05	6.68e+04	0.10 m	-	-	-	-	M+Na 21.9819 m/z adduct of 254.0883 m/z

Pubchem database matching¶

The ``emzed.db`` module supports database matching using a local subset of the pubchem data base, which comprises all compounds annotated in KEGG, BIOCYC, and HMDB. When using the local database for the first time, you have to run the command:

[33]:

emzed.db.update_pubchem()

update pubchem: 00:00:00.0 [ ]: </pre>
update pubchem: 00:00:00.0 [ ]: end{sphinxVerbatim}

update pubchem: 00:00:00.0 [ ]

update pubchem: 00:00:00.1 [| ]: </pre>
update pubchem: 00:00:00.1 [| ]: end{sphinxVerbatim}

update pubchem: 00:00:00.1 [| ]

update pubchem: 00:00:00.1 [|| ]: </pre>
update pubchem: 00:00:00.1 [|| ]: end{sphinxVerbatim}

update pubchem: 00:00:00.1 [|| ]

update pubchem: 00:00:00.2 [||| ]: </pre>
update pubchem: 00:00:00.2 [||| ]: end{sphinxVerbatim}

update pubchem: 00:00:00.2 [||| ]

update pubchem: 00:00:00.3 [|||| ]: </pre>
update pubchem: 00:00:00.3 [|||| ]: end{sphinxVerbatim}

update pubchem: 00:00:00.3 [|||| ]

update pubchem: 00:00:00.4 [||||| ]: </pre>
update pubchem: 00:00:00.4 [||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:00.4 [||||| ]

update pubchem: 00:00:00.4 [|||||| ]: </pre>
update pubchem: 00:00:00.4 [|||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:00.4 [|||||| ]

update pubchem: 00:00:00.5 [||||||| ]: </pre>
update pubchem: 00:00:00.5 [||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:00.5 [||||||| ]

update pubchem: 00:00:00.6 [|||||||| ]: </pre>
update pubchem: 00:00:00.6 [|||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:00.6 [|||||||| ]

update pubchem: 00:00:00.6 [||||||||| ]: </pre>
update pubchem: 00:00:00.6 [||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:00.6 [||||||||| ]

update pubchem: 00:00:00.7 [|||||||||| ]: </pre>
update pubchem: 00:00:00.7 [|||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:00.7 [|||||||||| ]

update pubchem: 00:00:00.8 [||||||||||| ]: </pre>
update pubchem: 00:00:00.8 [||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:00.8 [||||||||||| ]

update pubchem: 00:00:00.9 [|||||||||||| ]: </pre>
update pubchem: 00:00:00.9 [|||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:00.9 [|||||||||||| ]

update pubchem: 00:00:00.9 [||||||||||||| ]: </pre>
update pubchem: 00:00:00.9 [||||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:00.9 [||||||||||||| ]

update pubchem: 00:00:01.0 [|||||||||||||| ]: </pre>
update pubchem: 00:00:01.0 [|||||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:01.0 [|||||||||||||| ]

update pubchem: 00:00:01.1 [|||||||||||||||]: </pre>
update pubchem: 00:00:01.1 [|||||||||||||||]: end{sphinxVerbatim}

update pubchem: 00:00:01.1 [|||||||||||||||]

update pubchem: 00:00:01.2 [|||||||||||||| ]: </pre>
update pubchem: 00:00:01.2 [|||||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:01.2 [|||||||||||||| ]

update pubchem: 00:00:01.2 [||||||||||||| ]: </pre>
update pubchem: 00:00:01.2 [||||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:01.2 [||||||||||||| ]

update pubchem: 00:00:01.5 [|||||||||||| ]: </pre>
update pubchem: 00:00:01.5 [|||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:01.5 [|||||||||||| ]

update pubchem: 00:00:01.6 [||||||||||| ]: </pre>
update pubchem: 00:00:01.6 [||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:01.6 [||||||||||| ]

update pubchem: 00:00:01.8 [|||||||||| ]: </pre>
update pubchem: 00:00:01.8 [|||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:01.8 [|||||||||| ]

update pubchem: 00:00:02.0 [||||||||| ]: </pre>
update pubchem: 00:00:02.0 [||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:02.0 [||||||||| ]

update pubchem: 00:00:02.1 [|||||||| ]: </pre>
update pubchem: 00:00:02.1 [|||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:02.1 [|||||||| ]

update pubchem: 00:00:02.4 [||||||| ]: </pre>
update pubchem: 00:00:02.4 [||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:02.4 [||||||| ]

update pubchem: 00:00:02.7 [|||||| ]: </pre>
update pubchem: 00:00:02.7 [|||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:02.7 [|||||| ]

update pubchem: 00:00:02.8 [||||| ]: </pre>
update pubchem: 00:00:02.8 [||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:02.8 [||||| ]

update pubchem: 00:00:02.9 [|||| ]: </pre>
update pubchem: 00:00:02.9 [|||| ]: end{sphinxVerbatim}

update pubchem: 00:00:02.9 [|||| ]

update pubchem: 00:00:03.1 [||| ]: </pre>
update pubchem: 00:00:03.1 [||| ]: end{sphinxVerbatim}

update pubchem: 00:00:03.1 [||| ]

update pubchem: 00:00:03.5 [|| ]: </pre>
update pubchem: 00:00:03.5 [|| ]: end{sphinxVerbatim}

update pubchem: 00:00:03.5 [|| ]

update pubchem: 00:00:03.6 [| ]: </pre>
update pubchem: 00:00:03.6 [| ]: end{sphinxVerbatim}

update pubchem: 00:00:03.6 [| ]

update pubchem: 00:00:03.7 [ ]: </pre>
update pubchem: 00:00:03.7 [ ]: end{sphinxVerbatim}

update pubchem: 00:00:03.7 [ ]

update pubchem: 00:00:03.8 [| ]: </pre>
update pubchem: 00:00:03.8 [| ]: end{sphinxVerbatim}

update pubchem: 00:00:03.8 [| ]

update pubchem: 00:00:03.9 [|| ]: </pre>
update pubchem: 00:00:03.9 [|| ]: end{sphinxVerbatim}

update pubchem: 00:00:03.9 [|| ]

update pubchem: 00:00:04.2 [||| ]: </pre>
update pubchem: 00:00:04.2 [||| ]: end{sphinxVerbatim}

update pubchem: 00:00:04.2 [||| ]

update pubchem: 00:00:04.4 [|||| ]: </pre>
update pubchem: 00:00:04.4 [|||| ]: end{sphinxVerbatim}

update pubchem: 00:00:04.4 [|||| ]

update pubchem: 00:00:04.5 [||||| ]: </pre>
update pubchem: 00:00:04.5 [||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:04.5 [||||| ]

update pubchem: 00:00:05.0 [|||||| ]: </pre>
update pubchem: 00:00:05.0 [|||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:05.0 [|||||| ]

update pubchem: 00:00:05.2 [||||||| ]: </pre>
update pubchem: 00:00:05.2 [||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:05.2 [||||||| ]

update pubchem: 00:00:05.2 [|||||||| ]: </pre>
update pubchem: 00:00:05.2 [|||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:05.2 [|||||||| ]

update pubchem: 00:00:05.4 [||||||||| ]: </pre>
update pubchem: 00:00:05.4 [||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:05.4 [||||||||| ]

update pubchem: 00:00:05.8 [|||||||||| ]: </pre>
update pubchem: 00:00:05.8 [|||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:05.8 [|||||||||| ]

update pubchem: 00:00:05.9 [||||||||||| ]: </pre>
update pubchem: 00:00:05.9 [||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:05.9 [||||||||||| ]

update pubchem: 00:00:06.0 [|||||||||||| ]: </pre>
update pubchem: 00:00:06.0 [|||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:06.0 [|||||||||||| ]

update pubchem: 00:00:06.1 [||||||||||||| ]: </pre>
update pubchem: 00:00:06.1 [||||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:06.1 [||||||||||||| ]

update pubchem: 00:00:06.1 [|||||||||||||| ]: </pre>
update pubchem: 00:00:06.1 [|||||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:06.1 [|||||||||||||| ]

update pubchem: 00:00:06.2 [|||||||||||||||]: </pre>
update pubchem: 00:00:06.2 [|||||||||||||||]: end{sphinxVerbatim}

update pubchem: 00:00:06.2 [|||||||||||||||]

update pubchem: 00:00:06.3 [|||||||||||||| ]: </pre>
update pubchem: 00:00:06.3 [|||||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:06.3 [|||||||||||||| ]

update pubchem: 00:00:06.3 [||||||||||||| ]: </pre>
update pubchem: 00:00:06.3 [||||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:06.3 [||||||||||||| ]

update pubchem: 00:00:06.4 [|||||||||||| ]: </pre>
update pubchem: 00:00:06.4 [|||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:06.4 [|||||||||||| ]

update pubchem: 00:00:06.5 [||||||||||| ]: </pre>
update pubchem: 00:00:06.5 [||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:06.5 [||||||||||| ]

update pubchem: 00:00:06.6 [|||||||||| ]: </pre>
update pubchem: 00:00:06.6 [|||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:06.6 [|||||||||| ]

update pubchem: 00:00:06.6 [||||||||| ]: </pre>
update pubchem: 00:00:06.6 [||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:06.6 [||||||||| ]

update pubchem: 00:00:06.7 [|||||||| ]: </pre>
update pubchem: 00:00:06.7 [|||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:06.7 [|||||||| ]

update pubchem: 00:00:06.8 [||||||| ]: </pre>
update pubchem: 00:00:06.8 [||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:06.8 [||||||| ]

update pubchem: 00:00:06.8 [|||||| ]: </pre>
update pubchem: 00:00:06.8 [|||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:06.8 [|||||| ]

update pubchem: 00:00:06.9 [||||| ]: </pre>
update pubchem: 00:00:06.9 [||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:06.9 [||||| ]

update pubchem: 00:00:07.0 [|||| ]: </pre>
update pubchem: 00:00:07.0 [|||| ]: end{sphinxVerbatim}

update pubchem: 00:00:07.0 [|||| ]

update pubchem: 00:00:07.1 [||| ]: </pre>
update pubchem: 00:00:07.1 [||| ]: end{sphinxVerbatim}

update pubchem: 00:00:07.1 [||| ]

update pubchem: 00:00:07.1 [|| ]: </pre>
update pubchem: 00:00:07.1 [|| ]: end{sphinxVerbatim}

update pubchem: 00:00:07.1 [|| ]

update pubchem: 00:00:07.2 [| ]: </pre>
update pubchem: 00:00:07.2 [| ]: end{sphinxVerbatim}

update pubchem: 00:00:07.2 [| ]

update pubchem: 00:00:07.3 [ ]: </pre>
update pubchem: 00:00:07.3 [ ]: end{sphinxVerbatim}

update pubchem: 00:00:07.3 [ ]

update pubchem: 00:00:07.4 [| ]: </pre>
update pubchem: 00:00:07.4 [| ]: end{sphinxVerbatim}

update pubchem: 00:00:07.4 [| ]

update pubchem: 00:00:07.4 [|| ]: </pre>
update pubchem: 00:00:07.4 [|| ]: end{sphinxVerbatim}

update pubchem: 00:00:07.4 [|| ]

update pubchem: 00:00:07.5 [||| ]: </pre>
update pubchem: 00:00:07.5 [||| ]: end{sphinxVerbatim}

update pubchem: 00:00:07.5 [||| ]

update pubchem: 00:00:07.6 [|||| ]: </pre>
update pubchem: 00:00:07.6 [|||| ]: end{sphinxVerbatim}

update pubchem: 00:00:07.6 [|||| ]

update pubchem: 00:00:07.7 [||||| ]: </pre>
update pubchem: 00:00:07.7 [||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:07.7 [||||| ]

update pubchem: 00:00:07.7 [|||||| ]: </pre>
update pubchem: 00:00:07.7 [|||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:07.7 [|||||| ]

update pubchem: 00:00:07.8 [||||||| ]: </pre>
update pubchem: 00:00:07.8 [||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:07.8 [||||||| ]

update pubchem: 00:00:07.9 [|||||||| ]: </pre>
update pubchem: 00:00:07.9 [|||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:07.9 [|||||||| ]

update pubchem: 00:00:07.9 [||||||||| ]: </pre>
update pubchem: 00:00:07.9 [||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:07.9 [||||||||| ]

update pubchem: 00:00:08.0 [|||||||||| ]: </pre>
update pubchem: 00:00:08.0 [|||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:08.0 [|||||||||| ]

update pubchem: 00:00:08.1 [||||||||||| ]: </pre>
update pubchem: 00:00:08.1 [||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:08.1 [||||||||||| ]

update pubchem: 00:00:08.2 [|||||||||||| ]: </pre>
update pubchem: 00:00:08.2 [|||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:08.2 [|||||||||||| ]

update pubchem: 00:00:08.2 [||||||||||||| ]: </pre>
update pubchem: 00:00:08.2 [||||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:08.2 [||||||||||||| ]

update pubchem: 00:00:08.3 [|||||||||||||| ]: </pre>
update pubchem: 00:00:08.3 [|||||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:08.3 [|||||||||||||| ]

update pubchem: 00:00:08.4 [|||||||||||||||]: </pre>
update pubchem: 00:00:08.4 [|||||||||||||||]: end{sphinxVerbatim}

update pubchem: 00:00:08.4 [|||||||||||||||]

update pubchem: 00:00:08.5 [|||||||||||||| ]: </pre>
update pubchem: 00:00:08.5 [|||||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:08.5 [|||||||||||||| ]

update pubchem: 00:00:08.5 [||||||||||||| ]: </pre>
update pubchem: 00:00:08.5 [||||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:08.5 [||||||||||||| ]

update pubchem: 00:00:08.6 [|||||||||||| ]: </pre>
update pubchem: 00:00:08.6 [|||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:08.6 [|||||||||||| ]

update pubchem: 00:00:08.7 [||||||||||| ]: </pre>
update pubchem: 00:00:08.7 [||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:08.7 [||||||||||| ]

update pubchem: 00:00:08.7 [|||||||||| ]: </pre>
update pubchem: 00:00:08.7 [|||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:08.7 [|||||||||| ]

update pubchem: 00:00:08.8 [||||||||| ]: </pre>
update pubchem: 00:00:08.8 [||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:08.8 [||||||||| ]

update pubchem: 00:00:08.9 [|||||||| ]: </pre>
update pubchem: 00:00:08.9 [|||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:08.9 [|||||||| ]

update pubchem: 00:00:09.0 [||||||| ]: </pre>
update pubchem: 00:00:09.0 [||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:09.0 [||||||| ]

update pubchem: 00:00:09.0 [|||||| ]: </pre>
update pubchem: 00:00:09.0 [|||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:09.0 [|||||| ]

update pubchem: 00:00:09.1 [||||| ]: </pre>
update pubchem: 00:00:09.1 [||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:09.1 [||||| ]

update pubchem: 00:00:09.2 [|||| ]: </pre>
update pubchem: 00:00:09.2 [|||| ]: end{sphinxVerbatim}

update pubchem: 00:00:09.2 [|||| ]

update pubchem: 00:00:09.2 [||| ]: </pre>
update pubchem: 00:00:09.2 [||| ]: end{sphinxVerbatim}

update pubchem: 00:00:09.2 [||| ]

update pubchem: 00:00:09.3 [|| ]: </pre>
update pubchem: 00:00:09.3 [|| ]: end{sphinxVerbatim}

update pubchem: 00:00:09.3 [|| ]

update pubchem: 00:00:09.4 [| ]: </pre>
update pubchem: 00:00:09.4 [| ]: end{sphinxVerbatim}

update pubchem: 00:00:09.4 [| ]

update pubchem: 00:00:09.5 [ ]: </pre>
update pubchem: 00:00:09.5 [ ]: end{sphinxVerbatim}

update pubchem: 00:00:09.5 [ ]

update pubchem: 00:00:09.5 [| ]: </pre>
update pubchem: 00:00:09.5 [| ]: end{sphinxVerbatim}

update pubchem: 00:00:09.5 [| ]

update pubchem: 00:00:09.6 [|| ]: </pre>
update pubchem: 00:00:09.6 [|| ]: end{sphinxVerbatim}

update pubchem: 00:00:09.6 [|| ]

update pubchem: 00:00:09.7 [||| ]: </pre>
update pubchem: 00:00:09.7 [||| ]: end{sphinxVerbatim}

update pubchem: 00:00:09.7 [||| ]

update pubchem: 00:00:09.7 [|||| ]: </pre>
update pubchem: 00:00:09.7 [|||| ]: end{sphinxVerbatim}

update pubchem: 00:00:09.7 [|||| ]

update pubchem: 00:00:09.8 [||||| ]: </pre>
update pubchem: 00:00:09.8 [||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:09.8 [||||| ]

update pubchem: 00:00:09.9 [|||||| ]: </pre>
update pubchem: 00:00:09.9 [|||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:09.9 [|||||| ]

update pubchem: 00:00:10.0 [||||||| ]: </pre>
update pubchem: 00:00:10.0 [||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:10.0 [||||||| ]

update pubchem: 00:00:10.0 [|||||||| ]: </pre>
update pubchem: 00:00:10.0 [|||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:10.0 [|||||||| ]

update pubchem: 00:00:10.1 [||||||||| ]: </pre>
update pubchem: 00:00:10.1 [||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:10.1 [||||||||| ]

update pubchem: 00:00:10.2 [|||||||||| ]: </pre>
update pubchem: 00:00:10.2 [|||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:10.2 [|||||||||| ]

update pubchem: 00:00:10.2 [||||||||||| ]: </pre>
update pubchem: 00:00:10.2 [||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:10.2 [||||||||||| ]

update pubchem: 00:00:10.3 [|||||||||||| ]: </pre>
update pubchem: 00:00:10.3 [|||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:10.3 [|||||||||||| ]

update pubchem: 00:00:10.4 [||||||||||||| ]: </pre>
update pubchem: 00:00:10.4 [||||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:10.4 [||||||||||||| ]

update pubchem: 00:00:10.5 [|||||||||||||| ]: </pre>
update pubchem: 00:00:10.5 [|||||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:10.5 [|||||||||||||| ]

update pubchem: 00:00:10.5 [|||||||||||||||]: </pre>
update pubchem: 00:00:10.5 [|||||||||||||||]: end{sphinxVerbatim}

update pubchem: 00:00:10.5 [|||||||||||||||]

update pubchem: 00:00:10.6 [|||||||||||||| ]: </pre>
update pubchem: 00:00:10.6 [|||||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:10.6 [|||||||||||||| ]

update pubchem: 00:00:10.7 [||||||||||||| ]: </pre>
update pubchem: 00:00:10.7 [||||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:10.7 [||||||||||||| ]

update pubchem: 00:00:10.8 [|||||||||||| ]: </pre>
update pubchem: 00:00:10.8 [|||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:10.8 [|||||||||||| ]

update pubchem: 00:00:10.8 [||||||||||| ]: </pre>
update pubchem: 00:00:10.8 [||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:10.8 [||||||||||| ]

update pubchem: 00:00:10.9 [|||||||||| ]: </pre>
update pubchem: 00:00:10.9 [|||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:10.9 [|||||||||| ]

update pubchem: 00:00:11.0 [||||||||| ]: </pre>
update pubchem: 00:00:11.0 [||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:11.0 [||||||||| ]

update pubchem: 00:00:11.0 [|||||||| ]: </pre>
update pubchem: 00:00:11.0 [|||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:11.0 [|||||||| ]

update pubchem: 00:00:11.1 [||||||| ]: </pre>
update pubchem: 00:00:11.1 [||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:11.1 [||||||| ]

update pubchem: 00:00:11.2 [|||||| ]: </pre>
update pubchem: 00:00:11.2 [|||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:11.2 [|||||| ]

update pubchem: 00:00:11.3 [||||| ]: </pre>
update pubchem: 00:00:11.3 [||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:11.3 [||||| ]

update pubchem: 00:00:11.3 [|||| ]: </pre>
update pubchem: 00:00:11.3 [|||| ]: end{sphinxVerbatim}

update pubchem: 00:00:11.3 [|||| ]

update pubchem: 00:00:11.4 [||| ]: </pre>
update pubchem: 00:00:11.4 [||| ]: end{sphinxVerbatim}

update pubchem: 00:00:11.4 [||| ]

update pubchem: 00:00:11.5 [|| ]: </pre>
update pubchem: 00:00:11.5 [|| ]: end{sphinxVerbatim}

update pubchem: 00:00:11.5 [|| ]

update pubchem: 00:00:11.5 [| ]: </pre>
update pubchem: 00:00:11.5 [| ]: end{sphinxVerbatim}

update pubchem: 00:00:11.5 [| ]

update pubchem: 00:00:11.6 [ ]: </pre>
update pubchem: 00:00:11.6 [ ]: end{sphinxVerbatim}

update pubchem: 00:00:11.6 [ ]

update pubchem: 00:00:11.7 [| ]: </pre>
update pubchem: 00:00:11.7 [| ]: end{sphinxVerbatim}

update pubchem: 00:00:11.7 [| ]

update pubchem: 00:00:11.8 [|| ]: </pre>
update pubchem: 00:00:11.8 [|| ]: end{sphinxVerbatim}

update pubchem: 00:00:11.8 [|| ]

update pubchem: 00:00:11.8 [||| ]: </pre>
update pubchem: 00:00:11.8 [||| ]: end{sphinxVerbatim}

update pubchem: 00:00:11.8 [||| ]

update pubchem: 00:00:11.9 [|||| ]: </pre>
update pubchem: 00:00:11.9 [|||| ]: end{sphinxVerbatim}

update pubchem: 00:00:11.9 [|||| ]

update pubchem: 00:00:12.0 [||||| ]: </pre>
update pubchem: 00:00:12.0 [||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:12.0 [||||| ]

update pubchem: 00:00:12.0 [|||||| ]: </pre>
update pubchem: 00:00:12.0 [|||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:12.0 [|||||| ]

update pubchem: 00:00:12.1 [||||||| ]: </pre>
update pubchem: 00:00:12.1 [||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:12.1 [||||||| ]

update pubchem: 00:00:12.2 [|||||||| ]: </pre>
update pubchem: 00:00:12.2 [|||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:12.2 [|||||||| ]

update pubchem: 00:00:12.3 [||||||||| ]: </pre>
update pubchem: 00:00:12.3 [||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:12.3 [||||||||| ]

update pubchem: 00:00:12.3 [|||||||||| ]: </pre>
update pubchem: 00:00:12.3 [|||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:12.3 [|||||||||| ]

update pubchem: 00:00:12.4 [||||||||||| ]: </pre>
update pubchem: 00:00:12.4 [||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:12.4 [||||||||||| ]

update pubchem: 00:00:12.5 [|||||||||||| ]: </pre>
update pubchem: 00:00:12.5 [|||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:12.5 [|||||||||||| ]

update pubchem: 00:00:12.5 [||||||||||||| ]: </pre>
update pubchem: 00:00:12.5 [||||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:12.5 [||||||||||||| ]

update pubchem: 00:00:12.6 [|||||||||||||| ]: </pre>
update pubchem: 00:00:12.6 [|||||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:12.6 [|||||||||||||| ]

update pubchem: 00:00:12.7 [|||||||||||||||]: </pre>
update pubchem: 00:00:12.7 [|||||||||||||||]: end{sphinxVerbatim}

update pubchem: 00:00:12.7 [|||||||||||||||]

update pubchem: 00:00:12.8 [|||||||||||||| ]: </pre>
update pubchem: 00:00:12.8 [|||||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:12.8 [|||||||||||||| ]

update pubchem: 00:00:12.8 [||||||||||||| ]: </pre>
update pubchem: 00:00:12.8 [||||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:12.8 [||||||||||||| ]

update pubchem: 00:00:12.9 [|||||||||||| ]: </pre>
update pubchem: 00:00:12.9 [|||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:12.9 [|||||||||||| ]

update pubchem: 00:00:13.0 [||||||||||| ]: </pre>
update pubchem: 00:00:13.0 [||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:13.0 [||||||||||| ]

update pubchem: 00:00:13.0 [|||||||||| ]: </pre>
update pubchem: 00:00:13.0 [|||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:13.0 [|||||||||| ]

update pubchem: 00:00:13.1 [||||||||| ]: </pre>
update pubchem: 00:00:13.1 [||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:13.1 [||||||||| ]

update pubchem: 00:00:13.2 [|||||||| ]: </pre>
update pubchem: 00:00:13.2 [|||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:13.2 [|||||||| ]

update pubchem: 00:00:13.3 [||||||| ]: </pre>
update pubchem: 00:00:13.3 [||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:13.3 [||||||| ]

update pubchem: 00:00:13.3 [|||||| ]: </pre>
update pubchem: 00:00:13.3 [|||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:13.3 [|||||| ]

update pubchem: 00:00:13.4 [||||| ]: </pre>
update pubchem: 00:00:13.4 [||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:13.4 [||||| ]

update pubchem: 00:00:13.5 [|||| ]: </pre>
update pubchem: 00:00:13.5 [|||| ]: end{sphinxVerbatim}

update pubchem: 00:00:13.5 [|||| ]

update pubchem: 00:00:13.6 [||| ]: </pre>
update pubchem: 00:00:13.6 [||| ]: end{sphinxVerbatim}

update pubchem: 00:00:13.6 [||| ]

update pubchem: 00:00:13.6 [|| ]: </pre>
update pubchem: 00:00:13.6 [|| ]: end{sphinxVerbatim}

update pubchem: 00:00:13.6 [|| ]

update pubchem: 00:00:13.7 [| ]: </pre>
update pubchem: 00:00:13.7 [| ]: end{sphinxVerbatim}

update pubchem: 00:00:13.7 [| ]

update pubchem: 00:00:13.8 [ ]: </pre>
update pubchem: 00:00:13.8 [ ]: end{sphinxVerbatim}

update pubchem: 00:00:13.8 [ ]

update pubchem: 00:00:13.8 [| ]: </pre>
update pubchem: 00:00:13.8 [| ]: end{sphinxVerbatim}

update pubchem: 00:00:13.8 [| ]

update pubchem: 00:00:13.9 [|| ]: </pre>
update pubchem: 00:00:13.9 [|| ]: end{sphinxVerbatim}

update pubchem: 00:00:13.9 [|| ]

update pubchem: 00:00:14.0 [||| ]: </pre>
update pubchem: 00:00:14.0 [||| ]: end{sphinxVerbatim}

update pubchem: 00:00:14.0 [||| ]

update pubchem: 00:00:14.1 [|||| ]: </pre>
update pubchem: 00:00:14.1 [|||| ]: end{sphinxVerbatim}

update pubchem: 00:00:14.1 [|||| ]

update pubchem: 00:00:14.1 [||||| ]: </pre>
update pubchem: 00:00:14.1 [||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:14.1 [||||| ]

update pubchem: 00:00:14.2 [|||||| ]: </pre>
update pubchem: 00:00:14.2 [|||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:14.2 [|||||| ]

update pubchem: 00:00:14.3 [||||||| ]: </pre>
update pubchem: 00:00:14.3 [||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:14.3 [||||||| ]

update pubchem: 00:00:14.3 [|||||||| ]: </pre>
update pubchem: 00:00:14.3 [|||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:14.3 [|||||||| ]

update pubchem: 00:00:14.4 [||||||||| ]: </pre>
update pubchem: 00:00:14.4 [||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:14.4 [||||||||| ]

update pubchem: 00:00:14.5 [|||||||||| ]: </pre>
update pubchem: 00:00:14.5 [|||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:14.5 [|||||||||| ]

update pubchem: 00:00:14.6 [||||||||||| ]: </pre>
update pubchem: 00:00:14.6 [||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:14.6 [||||||||||| ]

update pubchem: 00:00:14.6 [|||||||||||| ]: </pre>
update pubchem: 00:00:14.6 [|||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:14.6 [|||||||||||| ]

update pubchem: 00:00:14.7 [||||||||||||| ]: </pre>
update pubchem: 00:00:14.7 [||||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:14.7 [||||||||||||| ]

update pubchem: 00:00:14.8 [|||||||||||||| ]: </pre>
update pubchem: 00:00:14.8 [|||||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:14.8 [|||||||||||||| ]

update pubchem: 00:00:14.8 [|||||||||||||||]: </pre>
update pubchem: 00:00:14.8 [|||||||||||||||]: end{sphinxVerbatim}

update pubchem: 00:00:14.8 [|||||||||||||||]

update pubchem: 00:00:14.9 [|||||||||||||| ]: </pre>
update pubchem: 00:00:14.9 [|||||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:14.9 [|||||||||||||| ]

update pubchem: 00:00:15.0 [||||||||||||| ]: </pre>
update pubchem: 00:00:15.0 [||||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:15.0 [||||||||||||| ]

update pubchem: 00:00:15.1 [|||||||||||| ]: </pre>
update pubchem: 00:00:15.1 [|||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:15.1 [|||||||||||| ]

update pubchem: 00:00:15.1 [||||||||||| ]: </pre>
update pubchem: 00:00:15.1 [||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:15.1 [||||||||||| ]

update pubchem: 00:00:15.2 [|||||||||| ]: </pre>
update pubchem: 00:00:15.2 [|||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:15.2 [|||||||||| ]

update pubchem: 00:00:15.3 [||||||||| ]: </pre>
update pubchem: 00:00:15.3 [||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:15.3 [||||||||| ]

update pubchem: 00:00:15.3 [|||||||| ]: </pre>
update pubchem: 00:00:15.3 [|||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:15.3 [|||||||| ]

update pubchem: 00:00:15.4 [||||||| ]: </pre>
update pubchem: 00:00:15.4 [||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:15.4 [||||||| ]

update pubchem: 00:00:15.5 [|||||| ]: </pre>
update pubchem: 00:00:15.5 [|||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:15.5 [|||||| ]

update pubchem: 00:00:15.6 [||||| ]: </pre>
update pubchem: 00:00:15.6 [||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:15.6 [||||| ]

update pubchem: 00:00:15.6 [|||| ]: </pre>
update pubchem: 00:00:15.6 [|||| ]: end{sphinxVerbatim}

update pubchem: 00:00:15.6 [|||| ]

update pubchem: 00:00:15.7 [||| ]: </pre>
update pubchem: 00:00:15.7 [||| ]: end{sphinxVerbatim}

update pubchem: 00:00:15.7 [||| ]

update pubchem: 00:00:15.8 [|| ]: </pre>
update pubchem: 00:00:15.8 [|| ]: end{sphinxVerbatim}

update pubchem: 00:00:15.8 [|| ]

update pubchem: 00:00:15.9 [| ]: </pre>
update pubchem: 00:00:15.9 [| ]: end{sphinxVerbatim}

update pubchem: 00:00:15.9 [| ]

update pubchem: 00:00:15.9 [ ]: </pre>
update pubchem: 00:00:15.9 [ ]: end{sphinxVerbatim}

update pubchem: 00:00:15.9 [ ]

update pubchem: 00:00:16.0 [| ]: </pre>
update pubchem: 00:00:16.0 [| ]: end{sphinxVerbatim}

update pubchem: 00:00:16.0 [| ]

update pubchem: 00:00:16.1 [|| ]: </pre>
update pubchem: 00:00:16.1 [|| ]: end{sphinxVerbatim}

update pubchem: 00:00:16.1 [|| ]

update pubchem: 00:00:16.1 [||| ]: </pre>
update pubchem: 00:00:16.1 [||| ]: end{sphinxVerbatim}

update pubchem: 00:00:16.1 [||| ]

update pubchem: 00:00:16.2 [|||| ]: </pre>
update pubchem: 00:00:16.2 [|||| ]: end{sphinxVerbatim}

update pubchem: 00:00:16.2 [|||| ]

update pubchem: 00:00:16.3 [||||| ]: </pre>
update pubchem: 00:00:16.3 [||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:16.3 [||||| ]

update pubchem: 00:00:16.4 [|||||| ]: </pre>
update pubchem: 00:00:16.4 [|||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:16.4 [|||||| ]

update pubchem: 00:00:16.4 [||||||| ]: </pre>
update pubchem: 00:00:16.4 [||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:16.4 [||||||| ]

update pubchem: 00:00:16.5 [|||||||| ]: </pre>
update pubchem: 00:00:16.5 [|||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:16.5 [|||||||| ]

update pubchem: 00:00:16.6 [||||||||| ]: </pre>
update pubchem: 00:00:16.6 [||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:16.6 [||||||||| ]

update pubchem: 00:00:16.6 [|||||||||| ]: </pre>
update pubchem: 00:00:16.6 [|||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:16.6 [|||||||||| ]

update pubchem: 00:00:16.7 [||||||||||| ]: </pre>
update pubchem: 00:00:16.7 [||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:16.7 [||||||||||| ]

update pubchem: 00:00:16.8 [|||||||||||| ]: </pre>
update pubchem: 00:00:16.8 [|||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:16.8 [|||||||||||| ]

update pubchem: 00:00:16.9 [||||||||||||| ]: </pre>
update pubchem: 00:00:16.9 [||||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:16.9 [||||||||||||| ]

update pubchem: 00:00:16.9 [|||||||||||||| ]: </pre>
update pubchem: 00:00:16.9 [|||||||||||||| ]: end{sphinxVerbatim}

update pubchem: 00:00:16.9 [|||||||||||||| ]

update pubchem: 00:00:17.1 [|||||||||||||||]: </pre>
update pubchem: 00:00:17.1 [|||||||||||||||]: end{sphinxVerbatim}

update pubchem: 00:00:17.1 [|||||||||||||||]

update pubchem: 00:00:17.2 total runtime: </pre>
update pubchem: 00:00:17.2 total runtime: end{sphinxVerbatim}

update pubchem: 00:00:17.2 total runtime

got 242017 entries from pubchem_2024-03-02_12-03.gz
you can access emzed.db.pubchem now

Downloading the database might take several minutes. Once the download is finished you can access the pubchem database as Table with the command:

[34]:

pc = emzed.db.pubchem
pc[:5]

[34]:

cid	mw	m0	mf	iupac	synonyms	inchi	inchikey	smiles	is_in_kegg	is_in_hmdb	is_in_biocyc	url
str	float	float	str	str	str	str	str	str	bool	bool	bool	str
157010395	227.2400	227.025230	C9H9NO4S	(6-methyl-1H-indol-3-yl) hydrogen sulfate		InChI=1S/C9H9NO4S/c1-6-2-3-7-8(4-6)10-5-9(7)14-15(11,12)13/h2-5,10H,1H3,(H,11,12,13)	LTCLZQOHPYRNCF-UHFFFAOYSA-N	CC1=CC2=C(C=C1)C(=CN2)OS(=O)(=O)O	False	True	False	http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=157010395
157010394	211.2400	211.030315	C9H9NO3S	5-methyl-1H-indole-3-sulfonic acid		InChI=1S/C9H9NO3S/c1-6-2-3-8-7(4-6)9(5-10-8)14(11,12)13/h2-5,10H,1H3,(H,11,12,13)	YCHNMDFAZIEGTD-UHFFFAOYSA-N	CC1=CC2=C(C=C1)NC=C2S(=O)(=O)O	False	True	False	http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=157010394
157010393	245.2500	245.035795	C9H11NO5S	(5-methoxy-2,3-dihydro-1H-indol-3-yl) hydrogen sulfate		InChI=1S/C9H11NO5S/c1-14-6-2-3-8-7(4-6)9(5-10-8)15-16(11,12)13/h2-4,9-10H,5H2,1H3,(H,11,12,13)	IBLCRTNVHXHBPT-UHFFFAOYSA-N	COC1=CC2=C(C=C1)NCC2OS(=O)(=O)O	False	True	False	http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=157010393
157010392	247.2300	247.015060	C8H9NO6S	(3-acetamido-2-hydroxyphenyl) hydrogen sulfate		InChI=1S/C8H9NO6S/c1-5(10)9-6-3-2-4-7(8(6)11)15-16(12,13)14/h2-4,11H,1H3,(H,9,10)(H,12,13,14)	RDMTZUKFIWMIKV-UHFFFAOYSA-N	CC(=O)NC1=C(C(=CC=C1)OS(=O)(=O)O)O	False	True	False	http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=157010392
157010391	231.2300	231.020145	C8H9NO5S	(3-acetyl-2-aminophenyl) hydrogen sulfate		InChI=1S/C8H9NO5S/c1-5(10)6-3-2-4-7(8(6)9)14-15(11,12)13/h2-4H,9H2,1H3,(H,11,12,13)	VBVLVDWTWCPJFQ-UHFFFAOYSA-N	CC(=O)C1=C(C(=CC=C1)OS(=O)(=O)O)N	False	True	False	http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=157010391

Colummns is_kegg, is_hmdb, and is_biocyc allows further constraining the database to a single source. To obtain the KEGG database one can apply the Table filter method:

[35]:

kegg = pc.filter(pc.is_in_kegg)
kegg[:5]

[35]:

cid	mw	m0	mf	iupac	synonyms	inchi	inchikey	smiles	is_in_kegg	is_in_hmdb	is_in_biocyc	url
str	float	float	str	str	str	str	str	str	bool	bool	bool	str
154573045	183.1400	183.066046	C5H14NO4P	hydroxy-[(1S)-1-hydroxy-2-(trimethylazaniumyl)ethyl]phosphinate	[(1R)-1-hydroxy-2-(trimethylamino)ethyl]phosphonate, C22271	InChI=1S/C5H14NO4P/c1-6(2,3)4-5(7)11(8,9)10/h5,7H,4H2,1-3H3,(H-,8,9,10)/t5-/m0/s1	YJIOAKRBBHTUPD-YFKPBYRVSA-N	C[N+](C)(C)CC(O)P(=O)(O)[O-]	True	False	False	http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=154573045
154573044	639.5000	639.178908	C22H34N5O15P	(2R,4S,5R,6S)-5-acetamido-6-[(1R,2S)-1-acetamido-2-hydroxypropyl]-2-[[(2R,3S,4R,5R)-5-(4-amino-2-oxopyrimidin-1-yl)-3,4-dihydroxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-4-hydroxyoxane-2-carboxylic acid	CMP-8eLeg5Ac7Ac, CMP-N,N'-diacetyl-8-epilegionaminate, CMP-di-N-acetyl-8-epilegionaminic acid, CMP-5,7-diacetamido-3,5,7,9-tetradeoxy-L-glycero-D-galacto-non-2-ulosonic acid, C22256	InChI=1S/C22H34N5O15P/c1-8(28)14(24-9(2)29)18-15(25-10(3)30)11(31)6-22(41-18,20(34)35)42-43(37,38)39-7-12-16(32)17(33)19(40-12)27-5-4-13(23)26-21(27)36/h4-5,8,11-12,14-19,28,31-33H,6-7H2,1-3H3,(H,24,29)(H,25,30)(H,34,35)(H,37,38)(H2,23,26,36)/t8-,11-,12+,14+,15+,16+,17+,18-,19+,22+/m0/s1	XTZJKGIMUFZFBV-RGZSOLJUSA-N	CC(C(C1C(C(CC(O1)(C(=O)O)OP(=O)(O)OCC2C(C(C(O2)N3C=CC(=NC3=O)N)O)O)O)NC(=O)C)NC(=O)C)O	True	False	False	http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=154573044
154573043	334.3200	334.137619	C13H22N2O8	(2S,4S,5R,6S)-5-acetamido-6-[(1R,2S)-1-acetamido-2-hydroxypropyl]-2,4-dihydroxyoxane-2-carboxylic acid	5,7-Di-N-acetyl-8-epi-legionaminic acid, 8eLeg5,7Ac2, 8eLeg5Ac7Ac, di-N-acetyl-8-epilegionaminic acid, N,N'-Diacetyl-8-epilegionaminate, C22255	InChI=1S/C13H22N2O8/c1-5(16)9(14-6(2)17)11-10(15-7(3)18)8(19)4-13(22,23-11)12(20)21/h5,8-11,16,19,22H,4H2,1-3H3,(H,14,17)(H,15,18)(H,20,21)/t5-,8-,9+,10+,11-,13-/m0/s1	ZJOSXOOPEBJBMC-HFPOIACQSA-N	CC(C(C1C(C(CC(O1)(C(=O)O)O)O)NC(=O)C)NC(=O)C)O	True	False	False	http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=154573043
154573042	363.3600	363.164168	C14H25N3O8	(2S,4S,5R,6S)-5-acetamido-6-[(1R,2R)-1-[[(2R)-2-aminopropanoyl]amino]-2-hydroxypropyl]-2,4-dihydroxyoxane-2-carboxylic acid	5-N-Acetyl-7-N-(D-alanyl)-legionaminic acid, Leg5Ac7Ala, C22250	InChI=1S/C14H25N3O8/c1-5(15)12(21)17-9(6(2)18)11-10(16-7(3)19)8(20)4-14(24,25-11)13(22)23/h5-6,8-11,18,20,24H,4,15H2,1-3H3,(H,16,19)(H,17,21)(H,22,23)/t5-,6-,8+,9-,10-,11+,14+/m1/s1	YOJVIVTWNYBJKZ-YAAZEVEFSA-N	CC(C(C1C(C(CC(O1)(C(=O)O)O)O)NC(=O)C)NC(=O)C(C)N)O	True	False	False	http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=154573042
154573041	275.3000	275.148123	C11H21N3O5	(2R)-N-[(2R,3S,4S,5S)-5-acetamido-4,6-dihydroxy-2-methyloxan-3-yl]-2-aminopropanamide	C22249, 2-Acetamido-4-(D-alanylamino)-2,4,6-trideoxy-D-mannopyranose	InChI=1S/C11H21N3O5/c1-4(12)10(17)14-7-5(2)19-11(18)8(9(7)16)13-6(3)15/h4-5,7-9,11,16,18H,12H2,1-3H3,(H,13,15)(H,14,17)/t4-,5-,7-,8+,9+,11?/m1/s1	AEVALLGLWYRNFL-PRKKQAIISA-N	CC1C(C(C(C(O1)O)NC(=O)C)O)NC(=O)C(C)N	True	False	False	http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=154573041

Database matching after adduct annotation with emzed annotate_adducts is straighforward since it is sufficient to match the database column ``m0`` with the peak table column ``adduct_m0``.

In case the peaks table was built with mzmine we can use the column ``isotope_base_peak_id`` to keep only the monoisotopic features (and the features without isotopologues):

[36]:

t = t_annotate.filter(
    t_annotate.isotope_base_peak.is_none()
    | (t_annotate.isotope_base_peak == t_annotate.id)
)
print(f"before: {len(t_annotate)}\tafter: {len(t)} ")
t.col_names

before: 311     after: 259

[36]:

('id',
 'parent_id',
 'mz',
 'rt',
 'mzmin',
 'mzmax',
 'rtmin',
 'rtmax',
 'area',
 'height',
 'width',
 'peakmap',
 'isotope_group_id',
 'isotope_base_peak',
 'isotope_group_size',
 'isotope_charge',
 'adduct_name',
 'adduct_isotopes',
 'adduct_isotopes_abundance',
 'adduct_m0',
 'adduct_cluster_id')

[37]:

# keep only features with KEGG match
t_match = t.join(kegg, t.adduct_m0.approx_equal(kegg.m0, 0.003, 0.0))
print(f"number of matches: {len(t_match)}")
t_match[:10]

number of matches: 524

[37]:

id	parent_id	mz	rt	mzmin	mzmax	rtmin	rtmax	area	height	width	isotope_group_id	isotope_base_peak	isotope_group_size	isotope_charge	adduct_name	adduct_isotopes	adduct_isotopes_abundance	adduct_m0	adduct_cluster_id	cid__0	mw__0	m0__0	mf__0	iupac__0	synonyms__0	inchi__0	inchikey__0	smiles__0	is_in_kegg__0	is_in_hmdb__0	is_in_biocyc__0	url__0
int	int	MzType	RtType	MzType	MzType	RtType	RtType	float	float	RtType	int	int	int	int	str	str	float	MzType	int	str	float	float	str	str	str	str	str	str	bool	bool	bool	str
2382	2	86.059761	1.25 m	86.059662	86.059853	1.22 m	1.35 m	6.63e+06	2.24e+06	0.13 m	73	2382	3	1	M+H	+[1]H	0.999885	85.052485	0	6406	85.1000	85.052764	C4H7NO	2-hydroxy-2-methylpropanenitrile	ACETONE CYANOHYDRIN, 2-Hydroxy-2-methylpropanenitrile, 75-86-5, Acetone cyanhydrin, 2-Hydroxyisobutyronitrile, 2-Methyllactonitrile, 2-Cyano-2-propanol, alpha-Hydroxyisobutyronitrile, Acetoncyanhydrin, 2-Cyano-2-hydroxypropane, Propanenitrile, 2-hydroxy-2-methyl-, 2-Propanone, cyanohydrin, Acetoncianidrina, Acetonkyanhydrin, Lactonitrile, 2-methyl-, 2-Hydroxy-2-methylpropionitrile, Acetoncianhidrinei, Acetoncyaanhydrine, Acetonecyanhydrine, Cyanhydrine d'acetone, USAF RH-8, RCRA waste number P069, .alpha.-Hydroxyisobutyronitrile, NSC 131093, 2-hydroxy-2-methyl-propionitrile, CO1YOV1KFI, DTXSID7025427, CHEBI:15348, NSC-131093, WLN: QX1&1&CN, DTXCID705427, Acetonkyanhydrin [Czech], Acetoncyanhydrin [German], Acetonecyanohydrin, Acetoncianidrina [Italian], Acetoncyaanhydrine [Dutch], Acetonecyanhydrine [French], CAS-75-86-5, Acetoncianhidrinei [Romanian], Acetone cyanohydrin, stabilized, CCRIS 4657, Cyanhydrine d'acetone [French], HSDB 971, a-hydroxyisobutyronitrile, UNII-CO1YOV1KFI, EINECS 200-909-4, UN1541, RCRA waste no. P069, BRN 0605391, AI3-04257, aceton cyanohydrin, 2-hydroxy-2-methyl-propanenitrile, acetone cyanhydrine, acetone-cyanohydrin, acetone cyanohydrine, acetone-cyanohydrine, acetone cyano-hydrin, 2-Cyanopropan-2-ol, 2-hydroxy-2-cyanopropane, Acetone cyanohydrin, 99%, EC 200-909-4, 4-03-00-00785 (Beilstein Handbook Reference), NSC977, 2-methyl-2-hydroxypropionitrile, ACETONE CYANOHYDRIN [MI], CHEMBL1231861, DIMETHYLKETONE CYANOHYDRIN, 2-Methyl-2-hydroxypropanenitrile, NSC-977, NSC7080, 2-hydroxy-2-methylpropane nitrile, 2-methyl-2-oxidanyl-propanenitrile, 2-Hydroxy-2-methylpropanenitrile #, NSC-7080, Tox21_201490, Tox21_303256, MFCD00004455, NSC131093, AKOS000118890, DB02203, UN 1541, NCGC00249054-01, NCGC00257016-01, NCGC00259041-01, FT-0665317, M0361, NS00001430, EN300-19189, C02659, A838531, Q222936, W-109043, Acetone cyanohydrin, stabilized [UN1541] [Poison], InChI=1/C4H7NO/c1-4(2,6)3-5/h6H,1-2H, F1908-0094	InChI=1S/C4H7NO/c1-4(2,6)3-5/h6H,1-2H3	MWFMGBPGAXYFAR-UHFFFAOYSA-N	CC(C)(C#N)O	True	True	True	http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=6406
2391	15	94.044815	6.11 m	94.044685	94.044945	6.09 m	6.18 m	3.34e+05	1.16e+05	0.09 m	-	-	-	-	M+NH4	+[1]H4 [14]N	0.995862	76.010990	146	2723790	76.1200	76.009519	CH4N2S	thiourea	THIOUREA, Thiocarbamide, 62-56-6, 2-Thiourea, Isothiourea, Pseudothiourea, Sulfourea, Thiuronium, Sulourea, 2-Thiopseudourea, Thiocarbonic acid diamide, Urea, thio-, Carbamimidothioic acid, beta-Thiopseudourea, Thiomocovina, Urea, 2-thio-, Tsizp 34, Pseudourea, 2-thio-, Thioharnstoff, Thiokarbamid, USAF EK-497, carbonothioic diamide, Thiocarbamid, RCRA waste number U219, Sulfouren, Caswell No. 855, NSC 5033, CCRIS 588, aminothioamide, GYV9AM2QAG, thio-urea, UNII-GYV9AM2QAG, HSDB 1401, 17356-08-0, aminothiocarboxamide, EINECS 200-543-5, H2NC(S)NH2, EPA Pesticide Chemical Code 080201, .beta.-Thiopseudourea, DTXSID9021348, CHEBI:36946, AI3-03582, NSC-5033, MFCD00008067, (NH2)2CS, CHEMBL260876, DTXCID101348, NSC5033, EC 200-543-5, THIOUREA (IARC), THIOUREA [IARC], TOU, Thiomocovina [Czech], sulfocarbamide, RCRA waste no. U219, CAS-62-56-6, S C (N H2)2, PROPYLTHIOURACIL IMPURITY A (EP IMPURITY), PROPYLTHIOURACIL IMPURITY A [EP IMPURITY], THIOUREA, ACS, thiopseudourea, 2-Thio-Pseudourea, Thiocarbonic diamide, 2-Thio-Urea, beta -thiopseudourea, Urea, 2-thio, Caswell no 855, THIOCARBMATE, Thiourea, 99%, thiourea; thiocarbamide, THIOUREA [HSDB], THIOUREA [INCI], WLN: ZYZUS, THIOUREA [MI], THIOUREA [VANDF], Urea, thio- (8CI), THIOUREA [WHO-DD], Thiourea ACS Reagent Grade, Thiourea, LR, >=98%, MLS002454451, BIDD:ER0582, HMS2234E12, HMS3369M21, AMY40190, BCP27948, STR00054, Tox21_201873, Tox21_302767, BDBM50229993, Thiourea, ACS reagent, >=99.0%, AKOS000269032, AKOS028109302, CCG-207963, UN 2877, Thiourea, ReagentPlus(R), >=99.0%, NCGC00091199-01, NCGC00091199-02, NCGC00091199-03, NCGC00256530-01, NCGC00259422-01, Thiourea, >=99.999% (metals basis), BP-31025, SMR000857187, Thiourea, JIS special grade, >=98.0%, Thiourea, p.a., ACS reagent, 99.0%, FT-0675198, NS00002781, T0445, T2475, T2835, EN300-19634, T-3650, 10.14272/UMGDCJDMYOKAJW-UHFFFAOYSA-N.1, A833853, Q528995, Thiourea, puriss. p.a., ACS reagent, >=99.0%, doi:10.14272/UMGDCJDMYOKAJW-UHFFFAOYSA-N.1, J-524966, F0001-1650, Thiourea, Pharmaceutical Secondary Standard; Certified Reference Material	InChI=1S/CH4N2S/c2-1(3)4/h(H4,2,3,4)	UMGDCJDMYOKAJW-UHFFFAOYSA-N	C(=S)(N)N	True	True	True	http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=2723790
2407	39	112.050262	1.63 m	112.050102	112.050461	1.59 m	1.72 m	8.53e+05	2.85e+05	0.13 m	-	-	-	-	M+H	+[1]H	0.999885	111.042986	2	68791	111.1000	111.043262	C4H5N3O	4-amino-1,3-diazabicyclo[3.1.0]hex-3-en-2-one	BM-06002, 4-Amino-1,3-diazabicyclo[3.1.0]hex-3-en-2-one, MLS002702958, 4-Imino-1,3-diazabicyclo-[3.1.0]hexan-2-one, NSC313425, NSC-313425, NCGC00181304-01, Imexon (USAN/INN), SCHEMBL154584, CHEMBL146428, GTPL8273, DTXCID9026895, Amplimexon (proposed trade name), HMS3264B05, Tox21_112779, NSC714597, AKOS006273819, AOP-990001, CCG-213631, DB05003, NSC-714597, NCGC00389454-01, NCI60_002705, SMR001566772, BM 06 002, CAS-59643-91-3, 4-CHLORO-(ALPHA-PHENYL)-CINNAMICACID, NS00034209, D08932, AB01013867_03, 4-Imino-1,3-diazabicyclo-[3.1.0]-hexan-2-one	InChI=1S/C4H5N3O/c5-3-2-1-7(2)4(8)6-3/h2H,1H2,(H2,5,6,8)	BIXBBIPTYBJTRY-UHFFFAOYSA-N	C1C2N1C(=O)N=C2N	True	True	False	http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=68791
2407	39	112.050262	1.63 m	112.050102	112.050461	1.59 m	1.72 m	8.53e+05	2.85e+05	0.13 m	-	-	-	-	M+H	+[1]H	0.999885	111.042986	2	597	111.1000	111.043262	C4H5N3O	6-amino-1H-pyrimidin-2-one	cytosine, 71-30-7, 4-Amino-2-hydroxypyrimidine, Cytosinimine, 4-aminopyrimidin-2(1H)-one, 2(1H)-Pyrimidinone, 4-amino-, 4-Amino-2(1H)-pyrimidinone, 6-Aminopyrimidin-2(1h)-One, Cytosin, 4-aminopyrimidin-2-ol, 6-amino-1H-pyrimidin-2-one, Zytosin, 4-amino-2-oxo-1,2-dihydropyrimidine, Cyt, AI3-52281, MFCD00006034, CHEBI:16040, 2(1H)-pyrimidinone, 6-amino-, EINECS 200-749-5, UNII-8J337D1HZY, NSC 27787, NSC-27787, 8J337D1HZY, 107646-83-3, DTXSID4044456, Cytosine-5,6-d2, 134434-40-5, 107646-84-4, 134434-39-2, 2(1H)-Pyrimidinone, 3,4-dihydro-4-imino-, (E)- (9CI), 4-Aminopyrimidin-2-(1H)-one, DTXCID2024456, EC 200-749-5, NSC27787, 2(1H)-Pyrimidinone, 3,4-dihydro-4-imino-, (Z)- (9CI), 4-Aminopyrimidin-2(1H)-one (Cytosine), CYTOSINE (USP-RS), CYTOSINE [USP-RS], SMR000857094, 4-Amino-1H-pyrimidin-2-one, LAMIVUDINE IMPURITY E (EP IMPURITY), LAMIVUDINE IMPURITY E [EP IMPURITY], LAMIVUDINE IMPURITY C (USP IMPURITY), LAMIVUDINE IMPURITY C [USP IMPURITY], 106391-24-6, 287484-45-1, 4-amino-2-pyrimidinol, aminopyrimidone, iminopyrimidinone, 3h-cytosine, Cytosine (8CI), 4-amino-1,2-dihydropyrimidin-2-one, Lamivudine impurity c, 2-Pyrimidinol, 1,4-dihydro-4-imino-, (Z)- (9CI), Gemcitabine impurity A, Cytosine, >=99%, 2(1H)-Pyrimidinone, 4-amino- (9CI), CYTOSINE [INCI], Lamivudine impurity c rs, CYTOSINE [MI], 4-amino-pyrimidin-2-ol, 4-Amino-2-oxypyrimidine, bmse000180, CYTOSINE [WHO-DD], Epitope ID:167475, 4-Amino-2(1H)pyrimidone, SCHEMBL4059, 4-Amino-2(1H)-pyrimidone, 2-Hydroxy-6-amino-pyrimidin, 4-amino-3h-pyrimidin-2-one, MLS001332635, MLS001332636, CHEMBL15913, 2-Pyrimidinol, 1,6-dihydro-6-imino-, (E)- (9CI), GTPL8490, HMS2233N21, HMS3369N05, Cytosine, >=99.0% (HPLC), BCP22793, HY-I0626, STR01426, Tox21_302139, s4893, 6-amino-1,2-dihydropyrimidin-2-one, AKOS000120336, AKOS005443393, AKOS015896942, AC-2489, AM83918, BCP9000005, CCG-266052, CS-W020703, 6-amino-1H-pyrimidin-2-one;CYTOSINE, CAS-71-30-7, CID 5274263, SRI-2354-05, NCGC00247019-01, NCGC00255926-01, BP-20183, Cytosine, Vetec(TM) reagent grade, 99%, NCI60_012445, SY001643, 4-imino-3,4-dihydropyrimidin-2(1H)-one, FT-0617471, NS00006844, EN300-21504, C00380, 2-Pyrimidinol,1,6-dihydro-6-imino-,(E)-(9ci), A837149, Q178425, 2(1H)-Pyrimidinone,3,4-dihydro-4-imino-,(E)-(9ci), CBA1D098-C5AB-46CE-AAC6-754572886EB2, Z203045338, Cytosine, United States Pharmacopeia (USP) Reference Standard, Cytosine, Pharmaceutical Secondary Standard; Certified Reference Material, Gemcitabine impurity A, European Pharmacopoeia (EP) Reference Standard	InChI=1S/C4H5N3O/c5-3-1-2-6-4(8)7-3/h1-2H,(H3,5,6,7,8)	OPTASPLRGRRNAP-UHFFFAOYSA-N	C1=C(NC(=O)N=C1)N	True	True	True	http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=597
2418	57	118.085960	1.65 m	118.085831	118.086128	1.62 m	1.72 m	7.23e+05	2.20e+05	0.10 m	-	-	-	-	M+NH4	+[1]H4 [14]N	0.995862	100.052135	4	125468	100.1200	100.052430	C5H8O2	(E)-2-methylbut-2-enoic acid	TIGLIC ACID, 80-59-1, Tiglinic acid, Cevadic acid, 2-methylbut-2-enoic acid, trans-2,3-Dimethylacrylic acid, (E)-2-Methylbut-2-enoic acid, trans-2-Methyl-2-butenoic acid, (E)-2-Methyl-2-butenoic acid, 2-Methyl-2-butenoic acid, trans-2-Methylcrotonic acid, (E)-2,3-Dimethylacrylic acid, (E)-2-Methylcrotonic acid, 2-Butenoic acid, 2-methyl-, (2E)-, Crotonic acid, 2-methyl-, (E)-, tiglate, (2E)-2-methylbut-2-enoic acid, 2,3-Dimethylacrylic acid, (E)-, alpha-Methylcrotonic acid, FEMA No. 3599, trans-alpha,beta-Dimethylacrylic acid, 2-Butenoic acid, 2-methyl-, (E)-, 13201-46-2, 2-Methylcrotonic acid, NSC 44235, Tiglinsaeure, alpha-Methylcrotonic acid, (E)-, 2-Methyl-2-butenoic acid, (E)-, EINECS 201-295-0, (E)-2-methyl-Crotonic acid, UNII-I5792N03HC, BRN 1236500, CHEBI:9592, AI3-36118, HSDB 7614, methyl methacrylic acid, I5792N03HC, NSC-8999, 2,3-Dimethylacrylic acid, MFCD00066864, NSC-44235, 2-methyl-2E-butenoic acid, trans-.alpha.,.beta.-Dimethylacrylic acid, 2-methyl-(E)-2-butenoic acid, alpha,beta-dimethyl acrylic acid, E)-2-METHYLCROTONIC ACID, NSC8999, 4-02-00-01552 (Beilstein Handbook Reference), NSC44235, 2-METHYLBUT-2-ENOIC ACID, (E)-, METHYL-2-BUTENOIC ACID, TRANS-2-, sabadillic acid, 2-Butenoic acid,2-methyl-, Cevadate, Tiglinate, epsilon-Tiglate, E-Tiglate, Methylbutenoicacid, E-Tiglic acid, (2E)-2-Methyl-2-butenoic acid, 2-methyl-Crotonate, epsilon-Tiglic acid, EINECS 236-167-3, methyl crotonic acid, methylmethacrylic acid, 2,3-Dimethylacrylate, 2-methyl-2-butenoate, Tiglic acid, (E)-, 2-Methylbut-2-enoate, 2-methyl-Crotonic acid, trans-2-Methylcrotonate, (E)-2-Methylcrotonate, 2-methylbut-2-enoicacid, (E)-2-methyl-Crotonate, alpha-methyl-crotonic acid, TIGLIC ACID [MI], trans-2,3-Dimethylacrylic acid (Tiglic acid), bmse000727, TIGLIC ACID [HSDB], trans-2,3-Dimethylacrylate, (E)-2,3-Dimethylacrylate, 2-methyl-(E)-2-butenoate, trans-2-Methyl-2-butenoate, (E)-2-methyl-2-Butenoate, SCHEMBL15042, CHEMBL52416, (2E)-2-Methyl-2-butenoate, GTPL6499, trans-Crotonic acid, 2-methyl-, CHEBI:36432, trans-alpha,beta-Dimethylacrylate, DTXSID80883257, trans-2-methyl-but-2-enoic acid, BCP18945, (2E)-2-Methyl-2-butenoic acid #, LMFA01020030, s3789, AKOS003375681, CCG-266028, CS-W013715, HY-W012999, trans-2,3-Dimethylacrylic acid, 98%, (E)-.ALPHA.-METHYLCROTONIC ACID, LS-13047, CS-0356606, NS00009823, T0246, EN300-83150, C08279, D78012, EN300-370249, TRANS-2-METHYL-2-BUTENOIC ACID [FHFI], trans-2-Methyl-2-butenoic acid, >=99%, FG, A839954, Q425475, Q27116830, F0001-2086, Z1205493556, trans-2-Methylcrotonic acid = trans-2-Methyl-2-butenoate, trans-2-Methylcrotonic acid = trans-2-Methyl-2-butenoic acid, trans-2-Methylcrotonic acid = trans-2-Methyl-2-butenoic acid = Tiglinate, alpha,beta-dimethyl acrylic acid; 2-Methyl-2-butenoic acid; (E)-2-methyl-Crotonic acid, trans-2-Methylcrotonic acid = trans-2-Methyl-2-butenoic acid = Tiglinic acid	InChI=1S/C5H8O2/c1-3-4(2)5(6)7/h3H,1-2H3,(H,6,7)/b4-3+	UIERETOOQGIECD-ONEGZZNKSA-N	CC=C(C)C(=O)O	True	True	True	http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=125468
2418	57	118.085960	1.65 m	118.085831	118.086128	1.62 m	1.72 m	7.23e+05	2.20e+05	0.10 m	-	-	-	-	M+NH4	+[1]H4 [14]N	0.995862	100.052135	4	31261	100.1200	100.052430	C5H8O2	pentane-2,4-dione	Acetylacetone, 2,4-Pentanedione, Pentane-2,4-dione, 123-54-6, Acetoacetone, ACAC, 2,4-Dioxopentane, Diacetylmethane, 2,4-Pentadione, ACETYL ACETONE, Pentanedione, Pentan-2,4-dione, Pentanedione-2,4, Acetyl 2-propanone, Acetone, acetyl-, Hacac, 2-Propanone, acetyl-, 2,4-Pentandione, NSC 5575, acetylaceton, CCRIS 3466, acetyl-acetone, HSDB 2064, EINECS 204-634-0, 4-Hydroxy-3-penten-2-one, UNII-46R950BP4J, BRN 0741937, CH3-CO-CH2-CO-CH3, DTXSID4021979, CHEBI:14750, AI3-02266, 46R950BP4J, ACETYLACETONE ENOL, CH3COCH2COCH3, NSC-5575, DTXCID601979, EC 204-634-0, 4-01-00-03662 (Beilstein Handbook Reference), MFCD00008787, UN2310, Acetylaceetone, pentane-2, pentan-2, acetylacetone (2,4-pentanedione), 2,4 pentanedione, 2.4-pentanedione, pentane2,4-dione, Acetyl-2-Propanone, Acetyl-2-propaneone, 2,4-pentane-dione, ACETYLACETONE [MI], 1-methylbutane-1,3-dione, SCHEMBL1608, NCIOpen2_000702, Pentane-2,4-dione [UN2310] [Flammable liquid], ACETYL ACETONE [HSDB], CHEMBL191625, WLN: 1V1V1, Acetylacetone;Pentane-2,4-dione, BDBM22766, NSC5575, Acetylacetone, analytical standard, BCP31333, STR00020, Tox21_200414, LMFA12000075, AKOS000118994, UN 2310, Acetylacetone, ReagentPlus(R), >=99%, NCGC00248599-01, NCGC00257968-01, BP-30252, CAS-123-54-6, PD193123, Acetylacetone, JIS special grade, >=99%, FT-0610237, FT-0622988, NS00007112, P0052, EN300-19143, Q413447, J-507260, Pentane-2,4-dione [UN2310] [Flammable liquid], Ultra pure, inverted exclamation markY99.5% (GC), F1908-0168, InChI=1/C5H8O2/c1-4(6)3-5(2)7/h3H2,1-2H, Acetylacetone, produced by Wacker Chemie AG, Burghausen, Germany, >=99.5% (GC)	InChI=1S/C5H8O2/c1-4(6)3-5(2)7/h3H2,1-2H3	YRKCREAYFQTBPV-UHFFFAOYSA-N	CC(=O)CC(=O)C	True	True	True	http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=31261
2418	57	118.085960	1.65 m	118.085831	118.086128	1.62 m	1.72 m	7.23e+05	2.20e+05	0.10 m	-	-	-	-	M+NH4	+[1]H4 [14]N	0.995862	100.052135	4	10953	100.1200	100.052430	C5H8O2	oxan-2-one	delta-Valerolactone, 542-28-9, TETRAHYDRO-2H-PYRAN-2-ONE, 5-Valerolactone, oxan-2-one, 2H-Pyran-2-one, tetrahydro-, tetrahydropyran-2-one, Valerolactone, Tetrahydro-2-pyranone, .delta.-Valerolactone, 26354-94-9, .delta.-Valeryllactone, tetrahydro-pyran-2-one, 1-Oxacyclohexan-2-one, NSC 6247, CHEMBL452383, DTXSID6044438, CHEBI:16545, NSC-6247, MFCD00006645, 14V1X9149L, DELTA-VALEROLACTONE-3,3,4,4-D4, Penta-1,5-lactone, delta-Valeryllactone, d-valerolactone, Cyclopentanolide, o-Valerolactone, o-Valeryllactone, 5-pentanolide, UNII-14V1X9149L, Pentan-5-olide, Delta valerolactone, 2-oxotetrahydropyran, Delta-Valerolactotie, EINECS 208-807-1, delta -Valerolactone, .delta.-Pentalactone, tetrahydro-4H-pyranone, AI3-25024, Pentanoic acid, 5-hydroxy-, delta-lactone, Valeric acid, delta-hydroxy-, delta-lactone, WLN: T6OVTJ, Tetrahydro-2H-2-pyranone, EC 208-807-1, SCHEMBL37722, Valeric acid, .delta.-lactone, DTXCID4024438, Pentanoic acid, .delta.-lactone, NSC6247, DELTA-VALEROLACTONE [INCI], NSC65442, STR08736, delta-Valerolactone, technical grade, Tox21_302166, BDBM50360797, NSC 65442, NSC-65442, s3099, AKOS009158729, CS-W013713, HY-W012997, 5-Hydroxypentanoic acid .delta.-lactone, NCGC00255756-01, CAS-542-28-9, Delta-valerolactone (may contain polymer), SY018264, FT-0624505, NS00005148, V0039, EN300-43042, Pentanoic acid, 5-hydroxy-, .delta.-lactone, C02240, (Difluoro-trimethylsilanyl-methyl)-phosphonicacid, Q903610, Valeric acid, .delta.-hydroxy-, .delta.-lactone, W-105654, InChI=1/C5H8O2/c6-5-3-1-2-4-7-5/h1-4H, Z432085120	InChI=1S/C5H8O2/c6-5-3-1-2-4-7-5/h1-4H2	OZJPLYNZGCXSJM-UHFFFAOYSA-N	C1CCOC(=O)C1	True	True	True	http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=10953
2418	57	118.085960	1.65 m	118.085831	118.086128	1.62 m	1.72 m	7.23e+05	2.20e+05	0.10 m	-	-	-	-	M+NH4	+[1]H4 [14]N	0.995862	100.052135	4	6658	100.1200	100.052430	C5H8O2	methyl 2-methylprop-2-enoate	METHYL METHACRYLATE, 80-62-6, methyl 2-methylprop-2-enoate, Methylmethacrylate, Methyl methylacrylate, Methyl 2-methylpropenoate, Methacrylic acid methyl ester, Pegalan, Methyl-methacrylat, Methyl 2-methyl-2-propenoate, Diakon, 9011-14-7, Acryester M, 2-Propenoic acid, 2-methyl-, methyl ester, Methacrylate de methyle, Methyl 2-methylacrylate, 2-Methyl-2-propenoic acid methyl ester, Methacrylsaeuremethyl ester, 2-(Methoxycarbonyl)-1-propene, Metakrylan metylu, Methylmethacrylaat, Metil metacrilato, Methyl meth-d3-acrylate, Rcra waste number U162, Methyl alpha-methylacrylate, Methyl methacrylate monomer, TEB 3K, NCI-C50680, 2-Methylacrylic acid, methyl ester, Methacrylic acid, methyl ester, Acrylic acid, 2-methyl-, methyl ester, 2-Methyl-acrylic acid methyl ester, NSC 4769, Monocite methacrylate monomer, Methylester kyseliny methakrylove, CHEBI:34840, 2-methylacrylic acid methyl ester, Methyl .alpha.-methylacrylate, 51391-19-6, Cranioplast, DTXSID2020844, Metaplex, Kallocryl A, NSC-4769, Simplex P, Methyl methacrylate monomer, inhibited, 143476-91-9, 55063-97-3, Methyl ester of 2-methyl-2-propenoic acid, 196OC77688, Methacrylic acid-methyl ester, 114512-63-9, 9065-11-6, Plexiglass, Methylmethacrylaat [Dutch], Metakrylan metylu [Polish], Methyl-methacrylat [German], Metil metacrilato [Italian], CCRIS 1364, HSDB 195, Methacrylate de methyle [French], Methacrylsaeuremethyl ester [German], EINECS 201-297-1, UN1247, RCRA waste no. U162, BRN 0605459, Eudragit, Methylester kyseliny methakrylove [Czech], AI3-24946, methoxymethacrolein, UNII-196OC77688, MMA (stabilized), J69, Acrylic resins (PMMA), High flow injection grade, METHYL METHACTRYLATE, Epitope ID:131321, Methyl 2-methylacrylate #, Methyl methacrylate (MMA), EC 201-297-1, Methyl-.alpha.-methacrylate, SCHEMBL1849, CH2=C(CH3)COOCH3, 4-02-00-01519 (Beilstein Handbook Reference), NA 1247 (Salt/Mix), UN 1247 (Salt/Mix), BIDD:ER0634, CHEMBL49996, DTXCID80844, Methyl methacrylate, 99.5%, WLN: 1UY1&VO1, Methyl methacrylate, stabilized, 'monocite' Methacrylate monomer, Methyl methacrylate, CP,98.0%, NSC4769, METHYL METHACRYLATE [HSDB], METHYL METHACRYLATE [IARC], METHYL METHACRYLATE [INCI], METHYLMETHACRYLATE [MART.], METHYL METHACRYLATE [VANDF], METHYLMETHACRYLATE [WHO-DD], Tox21_200367, MFCD00008587, AKOS000120216, Methyl methacrylate, 99%, stabilized, CAS-80-62-6, NCGC00091089-01, NCGC00091089-02, NCGC00257921-01, Methacrylic Acid Methyl Ester (stabilized, METHACRYLIC ACID METHYL ESTER [MI], M0087, METHYL 2-METHYL-2-PROPENOATE [FHFI], NS00009302, EN300-19210, C19504, Methyl methacrylate 1000 microg/mL in Methanol, Methyl methacrylate, SAJ first grade, >=99.0%, A839957, Q382897, J-522614, F0001-2087, InChI=1/C5H8O2/c1-4(2)5(6)7-3/h1H2,2-3H, Methacrylic acid-methyl ester 100 microg/mL in Cyclohexane, Methyl Methacrylate (stabilized with 6-tert-Butyl-2,4-xylenol), Methyl Methacrylate, (stabilized with 6-tert-Butyl-2,4-xylenol), Methyl methacrylate, contains <=30 ppm MEHQ as inhibitor, 99%, Methyl methacrylate, European Pharmacopoeia (EP) Reference Standard, Methyl methacrylate (MMA), 99.5%(GC), contains 30ppm MEHQ as stabilizer, Methyl methacrylate (MMA), AR, 99.0%, contains 30ppm MEHQ as stabilizer, Methyl methacrylate monomer, inhibited [UN1247] [Flammable liquid], PROPENOIC ACID,2-METHYL,METHYLESTER (METHACRYLATE METHYLESTER), 97555-82-3	InChI=1S/C5H8O2/c1-4(2)5(6)7-3/h1H2,2-3H3	VVQNEPGJFQJSBK-UHFFFAOYSA-N	CC(=C)C(=O)OC	True	True	False	http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=6658
2418	57	118.085960	1.65 m	118.085831	118.086128	1.62 m	1.72 m	7.23e+05	2.20e+05	0.10 m	-	-	-	-	M+NH4	+[1]H4 [14]N	0.995862	100.052135	4	8821	100.1200	100.052430	C5H8O2	ethyl prop-2-enoate	ETHYL ACRYLATE, 140-88-5, ethyl prop-2-enoate, Acrylic acid ethyl ester, Ethyl propenoate, 2-Propenoic acid, ethyl ester, Ethyl 2-propenoate, Ethylacrylaat, Ethylakrylat, Etil acrilato, Ethyl acrylate, inhibited, Aethylacrylat, Etilacrilatului, Ethoxycarbonylethylene, Acrylic acid, ethyl ester, Acrylate d'ethyle, Carboset 511, Akrylanem etylu, Acrylsaeureaethylester, Ethyl acrylate (inhibited), Ethyl propenoate, inhibited, NCI-C50384, FEMA No. 2418, FEMA Number 2418, Ethyl acrylate (natural), Acrylic acid, ethyl ester (inhibited), Etilacrilatului [Romanian], RCRA waste number U113, Ethylester kyseliny akrylove, NSC 8263, 9003-32-1, CCRIS 248, 2-Propenoic Acid Ethyl Ester, HSDB 193, EINECS 205-438-8, ethyl 2E-propenoate, BRN 0773866, DTXSID4020583, UNII-71E6178C9T, AI3-15734, NSC-8263, Etilacrilatului(roumanian), CH2=CHCOOC2H5, 71E6178C9T, ETHYL-D5 ACRYLATE, DTXCID00583, CHEBI:82327, Ethyl ester of 2-propenoic acid, EC 205-438-8, 4-02-00-01460 (Beilstein Handbook Reference), 35717-06-7, WE(2:0/3:1(2E)), Ethyl Acrylate (Stabilized with 4-methoxyphenol), ETHYL ACRYLATE (IARC), ETHYL ACRYLATE [IARC], Ethylakrylat [Czech], Ethylacrylaat [Dutch], Aethylacrylat [German], Etil acrilato [Italian], Akrylanem etylu [Polish], Acrylate d'ethyle [French], Acrylsaeureaethylester [German], Ethylester kyseliny akrylove [Czech], UN1917, RCRA waste no. U113, Acrylic acid-ethyl ester, SCHEMBL3180, ETHYL ACRYLATE [MI], ETHYL ACRYLATE [FCC], CHEMBL52084, ETHYL ACRYLATE [FHFI], ETHYL ACRYLATE [HSDB], ETHYL ACRYLATE [INCI], WLN: 2OV1U1, Ethyl acrylate, inhibited [UN1917] [Flammable liquid], FEMA 2418, FEMA-2418, NSC8263, Ethyl acrylate, analytical standard, AMY40211, BCP06341, Tox21_202513, LMFA07010505, MFCD00009188, Ethyl Acrylate, stabilized with MEHQ, AKOS005721113, Ethyl Acrylate (stabilized with MEHQ), NCGC00091041-01, NCGC00260062-01, CAS-140-88-5, VS-02962, Ethyl acrylate, purum, >=99.0% (GC), Propenoic acid,ethyl ester (ethylacrylate), Ethyl Acrylate 1000 microg/mL in Methanol, A0143, Ethyl acrylate, >=99.5%, stabilized, FG, FT-0621878, FT-0625767, NS00006376, EN300-19964, Ethyl acrylate, SAJ first grade, >=99.0%, C19238, Ethyl Acrylate stabilized with 10 - 20 ppm MEHQ, Q343014, J-007427, PROPENOIC ACID,ETHYL ESTER (ETHYLACRYLATE), Ethyl acrylate, inhibited [UN1917] [Flammable liquid], F1908-0175, Ethyl acrylate, contains 10-20 ppm MEHQ as inhibitor, 99%, InChI=1/C5H8O2/c1-3-5(6)7-4-2/h3H,1,4H2,2H, 2-Propenoic acid, 1,1'-((dihydro-5-(2-hydroxyethyl)-2,4,6-trioxo-1,3,5-triazine-1,3(2H,4H)-diyl)di-2,1-ethanediyl) ester	InChI=1S/C5H8O2/c1-3-5(6)7-4-2/h3H,1,4H2,2H3	JIGUQPWFLRLWPJ-UHFFFAOYSA-N	CCOC(=O)C=C	True	True	False	http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=8821
2418	57	118.085960	1.65 m	118.085831	118.086128	1.62 m	1.72 m	7.23e+05	2.20e+05	0.10 m	-	-	-	-	M+NH4	+[1]H4 [14]N	0.995862	100.052135	4	3485	100.1200	100.052430	C5H8O2	pentanedial	glutaraldehyde, Pentanedial, Glutaral, 111-30-8, Glutaric dialdehyde, 1,5-Pentanedial, Cidex, Glutardialdehyde, Sonacide, Glutaric acid dialdehyde, Glutaric aldehyde, Glutaraldehyd, Glutaralum, Aldesan, Alhydex, Glutarol, Hospex, Pentane-1,5-dial, Ucarcide, 1,3-Diformylpropane, 1,5-Pentanedione, Gluteraldehyde, Aldesen, Sterihyde L, Aldehyd glutarowy, Novaruca, Sporicidin, Caswell No. 468, NCI-C55425, NSC 13392, pentandial, Dioxopentane, Glutaclean, CCRIS 3800, HSDB 949, Sterihyde, Aqucar, Glutaralum [INN-Latin], Veruca-sep, Coldcide-25 microbiocide, Relugan GT, Relugan GTW, component of Cidex, EINECS 203-856-5, Glutarex 28, NSC-13392, Sonacide (TN), Cidex 7, EPA Pesticide Chemical Code 043901, Potentiated acid glutaraldehyde, Ucarcide 250, UNII-T3C89M417N, BRN 0605390, Relugan GT 50, Sterihyde L (TN), DTXSID6025355, CHEBI:64276, T3C89M417N, Glutaral (JAN/USP/INN), DTXCID605355, Glutaral [USAN:USP:INN:JAN], EC 203-856-5, 4-01-00-03659 (Beilstein Handbook Reference), MFCD00007025, NCGC00091110-01, Glutaralum (INN-Latin), GLUTARAL (MART.), GLUTARAL [MART.], GLUTARAL (USP IMPURITY), GLUTARAL [USP IMPURITY], Glutaraldehyd [Czech], Glutaral (USAN:USP:INN:JAN), 1,3-Diformyl propane, Diswart, Gludesin, Glutarol-1,5-pentanedial, Aldehyd glutarowy [Polish], Glutaral [USAN:INN:JAN], CAS-111-30-8, Glutural, Ucarset, Verucasep, Virsal, Cudex, Glutaral(usan), glutaric dihydride, GLUTARALDEHYDE, 25% SOLN, Glutamic dialdehyde, Pond Health Guard, Bactron K31, Ucarcide 225, GLUTARAL [HSDB], GLUTARAL [INCI], GLUTARAL [USAN], pentane-1,5-dialdehyde, GLUTARAL [INN], GLUTARAL [JAN], Glutaral, INN, USAN, Protectol GDA, GT 50, SCHEMBL836, WLN: VH3VH, GLUTARAL [WHO-DD], GLUTARALDEHYDE [MI], Pentane-1,5-dial solution, GLUTARALDEHYDE [FCC], Pesticide Code: 043901, BIDD:ER0299, GLUTARALDEHYDE [VANDF], CHEMBL1235482, Glutaraldehyde (50% in water), AMY3308, Bio1_000462, Bio1_000951, Bio1_001440, Glutaraldehyde solution, 25% w/w, Glutaraldehyde solution, 50% w/w, Glutaraldehyde solution, 70% w/w, NSC13392, STR01121, Tox21_111083, Tox21_201742, Tox21_303295, AKOS008967285, Glutaraldehyde (50per cent in water), DB03266, Glutaric dialdehyde, 25%sol. In water, Glutaric dialdehyde, 25% sol. in water, NCGC00091110-02, NCGC00091110-03, NCGC00257231-01, NCGC00259291-01, FT-0626730, G0067, G0068, NS00004136, EN300-18037, D01120, A802339, Q416475, Q-201162, Z57127529, F2191-0161	InChI=1S/C5H8O2/c6-4-2-1-3-5-7/h4-5H,1-3H2	SXRSQZLOMIGNAQ-UHFFFAOYSA-N	C(CC=O)CC=O	True	True	True	http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=3485

Note the number of matches does not correspond to the number of identified peaks:

[38]:

identified = len(set(t_match.id))
total = len(t)
print(f"identified: {identified/total*100:.1f}%")

identified: 38.6%