Targeted Analysis

This page highlights the most frequently used parts of the API and is not exhaustive. It covers candidate generation with solution_space and adduct annotation with annotate_adducts. For complete coverage, see the API Reference.

solution_space(targets, adducts, explained_abundance=0.999, resolution=None)

Expand target formulas into adduct and isotope hypotheses.

Parameters:

Name                 Type  Description                                                       Default
-------------------  ----  ----------------------------------------------------------------  --------
targets                    table containing at least id and mf columns.                      required
adducts                    adduct-definition table, typically from emzed.chemistry.adducts.  required
explained_abundance        cumulative isotope abundance to explain.                          0.999
resolution                 optional resolving power used for measured centroids.             None

Returns:

Type  Description
----  ---------------------------------------------------------------------
      table describing target/adduct/isotope combinations and expected m/z.

Example:

import emzed

targets = emzed.Table.create_table(
    ["id", "mf", "rt"],
    [int, str, emzed.RtType],
    rows=[[0, "C6H12O6", 20.0]],
)

adducts = emzed.Table.stack_tables(
    [
        emzed.adducts.M_plus_Br,
        emzed.adducts.Two_M_plus_H,
        emzed.adducts.M_plus_ACN_plus_H,
    ]
)

candidates = emzed.targeted.solution_space(targets, adducts, 0.99)
print(candidates.sort_by("abundance", ascending=False)[:5])

Output:

id   target_id  mf       rt        adduct_id  adduct_name  m_multiplier  adduct_add  adduct_sub  z    sign_z  full_mf    isotope_id  isotope_decomposition         m0           abundance  mz
int  int        str      RtType    int        str          int           str         str         int  int     str        int         str                           MzType       float      MzType
---  ---------  -------  --------  ---------  -----------  ------------  ----------  ----------  ---  ------  ---------  ----------  ----------------------------  -----------  ---------  -----------
  6          0  C6H12O6    0.33 m         26  M+ACN+H                 1  C2H3NH                    1       1  C8H16NO6            0  [12]C8 [1]H16 [14]N [16]O6     222.097765      0.899   222.097216
 11          0  C6H12O6    0.33 m         36  2M+H                    2  H                         1       1  C12H25O12           0  [12]C12 [1]H25 [16]O12         361.134606      0.851   361.134057
  0          0  C6H12O6    0.33 m         12  M+Br                    1  Br                        1      -1  C6H12O6Br           0  [12]C6 [1]H12 [16]O6 [79]Br    258.981727      0.468   258.982276
  1          0  C6H12O6    0.33 m         12  M+Br                    1  Br                        1      -1  C6H12O6Br           1  [12]C6 [1]H12 [16]O6 [81]Br    260.979681      0.455   260.980230
 12          0  C6H12O6    0.33 m         36  2M+H                    2  H                         1       1  C12H25O12           1  [12]C11 [13]C [1]H25 [16]O12   362.137961      0.110   362.137412
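
In the output above, m0 and mz differ by roughly one electron mass per charge, and the direction of the shift follows sign_z. The sketch below reproduces the printed mz values from m0, z and sign_z. It is inferred from the numbers shown, not taken from emzed's implementation; ion_mz and ELECTRON_MASS are hypothetical names introduced only for this illustration.

```python
# Assumption: the electron-mass correction is inferred from the printed table,
# it is not copied from emzed's source code.
ELECTRON_MASS = 0.00054858  # unified atomic mass units


def ion_mz(m0, z, sign_z):
    """m/z of an ion whose neutral formula mass is m0.

    A cation (sign_z = +1) has lost z electrons,
    an anion (sign_z = -1) has gained z electrons.
    """
    return (m0 - sign_z * z * ELECTRON_MASS) / z


# (adduct_name, m0, z, sign_z, mz as printed in the table above)
rows = [
    ("M+ACN+H", 222.097765, 1, 1, 222.097216),
    ("2M+H", 361.134606, 1, 1, 361.134057),
    ("M+Br", 258.981727, 1, -1, 258.982276),
    ("M+Br", 260.979681, 1, -1, 260.980230),
    ("2M+H", 362.137961, 1, 1, 362.137412),
]

for name, m0, z, sign_z, printed_mz in rows:
    print(f"{name:8s} computed={ion_mz(m0, z, sign_z):.6f} table={printed_mz:.6f}")
```

Because the printed m0 values are rounded to six decimals, the computed and printed mz values can differ in the last digit.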

annotate_adducts(peaks, adducts, mz_tol, rt_tol, explained_abundance=0.2)

Annotate peaks with adduct hypotheses that are mutually consistent.

The algorithm generates adduct and adduct-isotope hypotheses for each input peak, converts each hypothesis into an inferred neutral mass, and then links hypotheses that agree in both retention time and inferred neutral mass. Connected components of that hypothesis graph are reported as adduct clusters.

Input rows must provide mz and rt. Other columns are preserved.
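
The hypothesis graph described above can be sketched in plain Python. This is a minimal illustration, not emzed's implementation: the adduct list, its mass shifts, the absolute tolerance in mass units, and all function names are assumptions made for the example. It also drops components supported by a single peak, which simplifies the final filtering step.

```python
from itertools import combinations

# Hypothetical adduct definitions: (name, m_multiplier, z, mass shift in u).
# The shifts already include the electron-mass correction.
ADDUCTS = [
    ("M+H", 1, 1, 1.007276),
    ("M+Na", 1, 1, 22.989221),
    ("2M+H", 2, 1, 1.007276),
]

# Made-up peaks as (mz, rt in seconds): three glucose ions and one unrelated peak.
PEAKS = [(181.070664, 20.0), (203.052609, 20.5), (361.134052, 19.8), (250.0, 60.0)]


def neutral_mass(mz, m_multiplier, z, shift):
    """Invert mz = (m_multiplier * M + shift) / z for the neutral mass M."""
    return (mz * z - shift) / m_multiplier


def cluster_hypotheses(peaks, adducts, mz_tol, rt_tol):
    # One hypothesis per (peak, adduct): (peak index, adduct name, neutral mass, rt).
    hyps = [
        (i, name, neutral_mass(mz, mult, z, shift), rt)
        for i, (mz, rt) in enumerate(peaks)
        for name, mult, z, shift in adducts
    ]

    # Union-find over hypothesis indices to collect connected components.
    parent = list(range(len(hyps)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # Link hypotheses that agree in neutral mass and retention time.
    for a, b in combinations(range(len(hyps)), 2):
        same_mass = abs(hyps[a][2] - hyps[b][2]) <= mz_tol
        same_rt = abs(hyps[a][3] - hyps[b][3]) <= rt_tol
        if same_mass and same_rt:
            parent[find(a)] = find(b)

    components = {}
    for idx, hyp in enumerate(hyps):
        components.setdefault(find(idx), []).append(hyp)

    # Keep only components supported by at least two distinct peaks.
    return [c for c in components.values() if len({h[0] for h in c}) > 1]


clusters = cluster_hypotheses(PEAKS, ADDUCTS, mz_tol=0.001, rt_tol=5.0)
for cluster_id, members in enumerate(clusters):
    for peak_index, name, mass, rt in members:
        print(cluster_id, peak_index, name, round(mass, 6), rt)
```

In this toy input the three glucose ions end up in one cluster because each has exactly one hypothesis that maps to the same neutral mass, while the unrelated peak produces no cluster.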

Parameters:

Name                 Type  Description                                                       Default
-------------------  ----  ----------------------------------------------------------------  --------
peaks                      input peak table containing at least mz and rt.                   required
adducts                    adduct-definition table, typically from emzed.adducts.            required
mz_tol                     tolerance used when comparing inferred neutral masses             required
                           (adduct_m0) between hypotheses.
rt_tol                     tolerance used when comparing retention times and for             required
                           partitioning the peak table into RT windows.
explained_abundance        cumulative theoretical isotope abundance to include when          0.2
                           generating centroids for the adduct addition/subtraction
                           formulas. For example, 0.99 includes isotope centroids until
                           99% of the theoretical abundance is covered.
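
One plausible reading of the explained_abundance cutoff is sketched below: isotope centroids are taken in order of decreasing abundance until their summed abundance reaches the threshold. The exact stopping rule used by emzed is not stated on this page, so select_centroids is a hypothetical helper and the rule it implements is an assumption. The bromine abundances are the usual natural values, rounded.

```python
def select_centroids(isotopologues, explained_abundance):
    """Keep the most abundant entries until their sum reaches the threshold.

    isotopologues: list of (label, relative abundance) pairs.
    Returns the selected pairs and the abundance they cover.
    """
    selected, covered = [], 0.0
    for label, abundance in sorted(isotopologues, key=lambda p: p[1], reverse=True):
        if covered >= explained_abundance:
            break  # threshold already reached, ignore the rest
        selected.append((label, abundance))
        covered += abundance
    return selected, covered


# Isotopes of the "Br" addition formula of the M+Br adduct.
BR_ISOTOPES = [("[81]Br", 0.4931), ("[79]Br", 0.5069)]

low, low_covered = select_centroids(BR_ISOTOPES, 0.2)
high, high_covered = select_centroids(BR_ISOTOPES, 0.99)

print([label for label, _ in low])
print([label for label, _ in high])
```

Under this reading, the default of 0.2 keeps only the most abundant centroid of an addition formula, while a value such as 0.99 also keeps the second bromine isotope.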

Returns:

Type  Description
----  ----------------------------------------------------------------
      table with the original peak rows plus annotation columns:
      adduct_name, adduct_isotopes, adduct_isotopes_abundance,
      adduct_m0, and adduct_cluster_id.

Notes:

- Peaks are compared in inferred neutral-mass space, not by direct m/z matching alone.
- adduct_cluster_id identifies a connected cluster of compatible hypotheses, not a unique best assignment.
- A single input peak can appear in multiple output rows if several adduct hypotheses remain compatible.
- Clusters supported only by multiple hypotheses of the same original peak are discarded.
- Peaks are pre-partitioned into RT windows separated by gaps larger than rt_tol; only peaks within the same window are compared.
- explained_abundance is not derived from observed peak intensities; it only controls hypothesis generation from theoretical isotope patterns.
- Rows with missing mz or rt are preserved, but their annotation columns are set to None.
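
The RT pre-partitioning mentioned in the notes can be illustrated with a short sketch. It is an assumption about how such a split could work, not emzed's code, and partition_by_rt_gaps is a hypothetical name: retention times are sorted and a new window starts whenever the gap to the previous value exceeds rt_tol.

```python
def partition_by_rt_gaps(rts, rt_tol):
    """Split retention times into windows separated by gaps larger than rt_tol.

    Missing values (None) are skipped here; they cannot be placed in a window.
    """
    windows = []
    for rt in sorted(rt for rt in rts if rt is not None):
        if windows and rt - windows[-1][-1] <= rt_tol:
            windows[-1].append(rt)  # close enough to the previous peak
        else:
            windows.append([rt])  # gap > rt_tol, start a new window
    return windows


windows = partition_by_rt_gaps([20.0, 42.0, None, 21.5, 24.0, 40.0], rt_tol=5.0)
print(windows)
print("gaps larger than rt_tol:", len(windows) - 1)

# Windows are defined by gaps, not by total width: a chain of closely
# spaced peaks stays in one window even if it spans more than rt_tol.
chained = partition_by_rt_gaps([0.0, 4.0, 8.0], rt_tol=5.0)
print(chained)
```

Splitting on gaps means two peaks in the same window are not necessarily within rt_tol of each other, so the pairwise RT comparison is still needed inside each window.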

Example:

import emzed

targets = emzed.Table.create_table(
    ["id", "mf", "rt"],
    [int, str, emzed.RtType],
    rows=[[0, "C6H12O6", 20.0]],
)

adducts = emzed.Table.stack_tables(
    [
        emzed.adducts.M_plus_Br,
        emzed.adducts.Two_M_plus_H,
        emzed.adducts.M_plus_ACN_plus_H,
    ]
)

candidates = emzed.targeted.solution_space(targets, adducts, 0.99)
candidates.rename_columns(adduct_name="original_adduct_name")

annotated = emzed.annotate.annotate_adducts(
    candidates,
    adducts,
    mz_tol=2e-5,
    rt_tol=5.0,
    explained_abundance=0.95,
).sort_by("adduct_name")

print(
    annotated.extract_columns(
        "mf",
        "rt",
        "adduct_name",
        "isotope_decomposition",
        "adduct_isotopes",
        "m0",
        "abundance",
        "adduct_cluster_id",
    )[:5]
)

Output:

found 0 gaps > rt_tol in rt values

process 1 out of 1
    process 16 peaks in rt range 0.0..21.0
    build up lookup table
    look for matches
    found matches

mf       rt        adduct_name  isotope_decomposition             adduct_isotopes      m0           abundance  adduct_cluster_id
str      RtType    str          str                               str                  MzType       float      int
-------  --------  -----------  --------------------------------  -------------------  -----------  ---------  -----------------
C6H12O6    0.33 m  2M+H         [12]C12 [1]H25 [16]O12            +[1]H                 361.134606      0.851                  0
C6H12O6    0.33 m  2M+H         [12]C10 [13]C2 [1]H25 [16]O12     +[1]H                 363.141316      0.007                  1
C6H12O6    0.33 m  M+ACN+H      [12]C8 [1]H16 [14]N [16]O6        +[12]C2 [1]H4 [14]N   222.097765      0.899                  0
C6H12O6    0.33 m  M+ACN+H      [12]C7 [13]C [1]H16 [14]N [16]O6  +[12]C2 [1]H4 [14]N   223.101120      0.078                  1
C6H12O6    0.33 m  M+ACN+H      [12]C8 [1]H16 [14]N [16]O5 [18]O  +[12]C2 [1]H4 [14]N   224.102019      0.011                  2

Candidate Chromatogram Extraction

Use the most abundant solution_space candidates to build m/z and RT extraction windows, then extract chromatograms from a peak map.

from pathlib import Path

import emzed

targets = emzed.Table.create_table(
    ["id", "mf", "rt"],
    [int, str, emzed.RtType],
    rows=[[0, "C6H12O6", 20.0]],
)

adducts = emzed.Table.stack_tables(
    [
        emzed.adducts.M_plus_Br,
        emzed.adducts.Two_M_plus_H,
        emzed.adducts.M_plus_ACN_plus_H,
    ]
)

candidates = emzed.targeted.solution_space(targets, adducts, 0.99)
pm = emzed.io.load_peak_map(Path("tests/data/test_smaller.mzXML"))

# Keep top hypotheses and create the columns required by extract_chromatograms:
# mzmin, mzmax, rtmin, rtmax, peakmap
peaks = candidates.sort_by("abundance", ascending=False)[:5].consolidate()
peaks.add_column("mzmin", peaks.mz - 0.01, emzed.MzType)
peaks.add_column("mzmax", peaks.mz + 0.01, emzed.MzType)
peaks.add_column("rtmin", peaks.rt - 5.0, emzed.RtType)
peaks.add_column("rtmax", peaks.rt + 5.0, emzed.RtType)
peaks.add_column_with_constant_value("peakmap", pm, emzed.PeakMap)

chrom = emzed.extract_chromatograms(peaks, ms_level=1)
chrom.set_col_format("peakmap", None)
chrom.set_col_format("chromatogram", None)

print(
    chrom.extract_columns(
        "adduct_name",
        "mz",
        "rt",
        "rtmin_chromatogram",
        "rtmax_chromatogram",
    )[:5]
)

Output:

needed 0.0 seconds
adduct_name  mz           rt        rtmin_chromatogram  rtmax_chromatogram
str          MzType       RtType    RtType              RtType
-----------  -----------  --------  ------------------  ------------------
M+ACN+H       222.097216    0.33 m              0.25 m              0.42 m
2M+H          361.134057    0.33 m              0.25 m              0.42 m
M+Br          258.982276    0.33 m              0.25 m              0.42 m
M+Br          260.980230    0.33 m              0.25 m              0.42 m
2M+H          362.137412    0.33 m              0.25 m              0.42 m
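
The window arithmetic in this example can be checked without emzed. The sketch below assumes that retention times are stored in seconds and displayed in minutes, which is inferred from rt=20.0 being printed as 0.33 m; extraction_window and as_minutes are hypothetical helpers, not emzed functions.

```python
def extraction_window(mz, rt, mz_half_width=0.01, rt_half_width=5.0):
    """Symmetric m/z and RT window around a candidate (rt in seconds)."""
    return {
        "mzmin": mz - mz_half_width,
        "mzmax": mz + mz_half_width,
        "rtmin": rt - rt_half_width,
        "rtmax": rt + rt_half_width,
    }


def as_minutes(seconds):
    """Format seconds the way the tables above display RT values."""
    return f"{seconds / 60.0:.2f} m"


# Top candidate from the table above: M+ACN+H at rt = 20 s.
window = extraction_window(mz=222.097216, rt=20.0)
print(as_minutes(20.0), as_minutes(window["rtmin"]), as_minutes(window["rtmax"]))

# The fixed +/- 0.01 half width expressed relative to this m/z, in ppm.
half_width_ppm = 0.01 / 222.097216 * 1e6
print(f"{half_width_ppm:.1f} ppm")
```

A fixed absolute half width corresponds to a different relative tolerance for each candidate: about 45 ppm at m/z 222 and less at higher m/z.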