emzed.chemistry
formula_table(min_mass, max_mass, *, mass_c=None, mass_h=None, mass_n=None, mass_o=None, mass_p=None, mass_s=None, c_range=None, h_range=None, n_range=None, o_range=None, p_range=None, s_range=None, apply_rules=True, apply_rule_1=True, apply_rule_2=True, apply_rule_4=True, apply_rule_5=True, apply_rule_6=True, rule_45_range='extended')
This is a Python version of the HR2 formula generator for CHNOPS; see https://fiehnlab.ucdavis.edu/projects/seven-golden-rules
This function generates a table of molecular formulas built from the elements C, H, N, O, P and S whose mass lies in the range [min_mass, max_mass]. For each element, one can provide a given count or an inclusive range of atom counts to consider.
Restricting atom counts, e.g. c_range=(0, 100), can speed up the process considerably.
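The following minimal brute-force sketch shows why narrowing the ranges helps. It is not emzed's implementation. It applies none of the seven golden rules and uses only monoisotopic masses. The helper name enumerate_formulas and its ranges dict are hypothetical.

```python
# Monoisotopic masses (u) of the most abundant isotope of each CHNOPS element
MONO = {
    "C": 12.0,
    "H": 1.00782503223,
    "N": 14.00307400443,
    "O": 15.99491461957,
    "P": 30.97376199842,
    "S": 31.9720711744,
}


def enumerate_formulas(min_mass, max_mass, ranges):
    """Return (formula, mass) pairs with min_mass <= mass <= max_mass.

    ranges maps each element symbol to an inclusive (low, high) atom-count range.
    No chemical plausibility rules are applied here.
    """
    elements = list(ranges)
    results = []
    counts = {}

    def recurse(i, mass):
        if i == len(elements):
            if mass >= min_mass:
                formula = "".join(
                    el + (str(n) if n > 1 else "") for el, n in counts.items() if n > 0
                )
                results.append((formula, mass))
            return
        element = elements[i]
        low, high = ranges[element]
        for n in range(low, high + 1):
            new_mass = mass + n * MONO[element]
            if new_mass > max_mass:
                break  # more atoms only add mass, so stop this branch early
            counts[element] = n
            recurse(i + 1, new_mass)
        counts.pop(element, None)

    recurse(0, 0.0)
    return results
```

Every atom adds mass, so a branch is abandoned as soon as its partial mass exceeds max_mass. Tighter ranges shrink the search tree further.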
MolecularFormula
Represent a molecular formula as both string and element-count mapping.
as_dict()
Return the molecular formula as a plain dict mapping atoms to counts.
as_string()
Return the normalized molecular-formula string or None if invalid.
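The two accessors can be pictured with the following sketch. It parses a formula string into element counts and writes it back as a string. It assumes Hill order (C, then H, then the remaining elements alphabetically) as the normalization, and emzed's actual rules may differ. parse_formula and normalized_string are hypothetical names.

```python
import re
from collections import Counter

# An element symbol (capital letter plus optional lowercase letter) and an optional count
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(mf):
    """Return element counts as a plain dict, or None if mf is not a valid formula."""
    pos, counts = 0, Counter()
    for match in TOKEN.finditer(mf):
        if match.start() != pos:
            return None  # unparsable characters between tokens
        symbol, digits = match.groups()
        counts[symbol] += int(digits) if digits else 1
        pos = match.end()
    if pos != len(mf) or not counts:
        return None  # trailing garbage or empty input
    return dict(counts)


def normalized_string(mf):
    """Return the formula in Hill order, or None if invalid."""
    counts = parse_formula(mf)
    if counts is None:
        return None
    if "C" in counts:
        rest = sorted(s for s in counts if s not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)  # without carbon, everything is alphabetical
    return "".join(s + (str(counts[s]) if counts[s] != 1 else "") for s in order)
```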
mass(**specialisations)
Calculate the exact mass of the formula.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| specialisations | | optional isotope overrides such as | {} |

Returns:

| Type | Description |
|---|---|
| | exact mass as |

compute_centroids(mf, explained_abundance, *, abundances=None)
Compute a table with the theoretical MS peaks of a molecular formula.
Usage examples:
compute_centroids("C6S2", 0.995)
compute_centroids("C6S2", 0.995, abundances=dict(C={12: 0.5, 13: 0.5}))
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| mf | | molecular sum formula. | required |
| explained_abundance | | stopping criterion, a value between 0 and 1. | required |
| abundances | | override natural abundances. | None |

Returns:

| Type | Description |
|---|---|
| | table with columns |

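The following rough sketch shows one way to compute such a peak table. It convolves the isotope distribution of each atom and merges peaks of equal mass. It then keeps the most abundant peaks until their summed abundance reaches explained_abundance. Reading explained_abundance as this cumulative fraction is an assumption based on the parameter name. The isotope data cover only C and S, and centroids is a hypothetical helper that takes element counts rather than a formula string.

```python
# (isotope mass in u, natural abundance) pairs; only C and S are included in this sketch
ISOTOPES = {
    "C": [(12.0, 0.9893), (13.00335483507, 0.0107)],
    "S": [
        (31.9720711744, 0.9499),
        (32.9714589098, 0.0075),
        (33.967867004, 0.0425),
        (35.96708071, 0.0001),
    ],
}


def convolve(peaks, isotopes, min_abundance=1e-12):
    """Combine two peak lists, merging peaks whose masses agree to 1e-6 u."""
    merged = {}
    for m1, p1 in peaks:
        for m2, p2 in isotopes:
            p = p1 * p2
            if p < min_abundance:
                continue  # prune negligible combinations
            key = round(m1 + m2, 6)
            weighted_mass, abundance = merged.get(key, (0.0, 0.0))
            merged[key] = (weighted_mass + (m1 + m2) * p, abundance + p)
    # abundance-weighted mean mass of each merged group
    return [(wm / ab, ab) for wm, ab in merged.values()]


def centroids(counts, explained_abundance, isotopes=ISOTOPES):
    peaks = [(0.0, 1.0)]
    for element, n in counts.items():
        for _ in range(n):
            peaks = convolve(peaks, isotopes[element])
    # take the most abundant peaks until they explain the requested fraction
    peaks.sort(key=lambda peak: peak[1], reverse=True)
    selected, explained = [], 0.0
    for mass, abundance in peaks:
        selected.append((mass, abundance))
        explained += abundance
        if explained >= explained_abundance:
            break
    return sorted(selected)  # ordered by mass
```

Passing a modified table, e.g. `centroids({"C": 6, "S": 2}, 0.995, isotopes={**ISOTOPES, "C": [(12.0, 0.5), (13.00335483507, 0.5)]})`, plays the role of the abundances override in the usage examples above.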
measured_centroids(mf, R, explained_abundance, *, abundances=None)
Compute a table with the MS peaks of a molecular formula as they would theoretically be measured at resolution R.
Usage examples:
measured_centroids("C6S2", 200_000, 0.995)
measured_centroids("C6S2", 200_000, 0.995, abundances=dict(C={12: 0.5, 13: 0.5}))
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| mf | | molecular sum formula. | required |
| R | | resolution, defined as m/z / FWHM. | required |
| explained_abundance | | stopping criterion, a value between 0 and 1. | required |
| abundances | | override natural abundances. | None |

Returns:

| Type | Description |
|---|---|
| | table with columns |

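The effect of R can be sketched by summing one Gaussian per theoretical centroid and reading off the local maxima of the resulting profile. This assumes Gaussian peaks whose width follows R = m/z / FWHM. The peak model, grid step and margin are assumptions, not emzed's implementation, and profile and measured_peaks are hypothetical names.

```python
import math


def fwhm(mz, resolution):
    # R = m/z / FWHM, so the peak width grows with m/z
    return mz / resolution


def profile(centroids, resolution, grid):
    """Sum of Gaussian peaks, one per (mz, abundance) centroid."""
    values = []
    for x in grid:
        y = 0.0
        for mz, abundance in centroids:
            sigma = fwhm(mz, resolution) / (2.0 * math.sqrt(2.0 * math.log(2.0)))
            y += abundance * math.exp(-0.5 * ((x - mz) / sigma) ** 2)
        values.append(y)
    return values


def measured_peaks(centroids, resolution, step=1e-6, margin=0.002):
    """Local maxima of the simulated profile; closely spaced peaks merge into one."""
    low = min(mz for mz, _ in centroids) - margin
    high = max(mz for mz, _ in centroids) + margin
    grid = [low + i * step for i in range(int((high - low) / step) + 1)]
    y = profile(centroids, resolution, grid)
    return [
        (grid[i], y[i])
        for i in range(1, len(grid) - 1)
        if y[i - 1] < y[i] >= y[i + 1]
    ]
```

Take two equal peaks 0.0005 apart at m/z 200. They merge into one maximum at R = 200,000 (FWHM = 0.001) but are resolved at R = 2,000,000.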
plot_profile(mf, R, explained_abundance, *, path=None, abundances=None)
Plot the theoretical MS peaks of a molecular formula.
Usage examples:
plot_profile("C6S2", 200_000, 0.995)
plot_profile("C6S2", 200_000, 0.995, path="profile.png")
plot_profile("C6S2", 200_000, 0.995, abundances=dict(C={12: 0.5, 13: 0.5}))
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| mf | | molecular sum formula. | required |
| R | | resolution, defined as m/z / FWHM. | required |
| explained_abundance | | stopping criterion, a value between 0 and 1. | required |
| path | | path to save the plot to. If not provided, a plot window will pop up. | None |
| abundances | | override natural abundances. | None |

Lazy access to natural isotope abundances.
The module forwards isotope abundances such as C12 and element abundance maps such
as C via __getattr__ without loading the full element table up front.
__dir__()
Forward attributes for autocompletion.
Predefined adduct tables and convenience subsets for targeted annotation.
The module exposes:
- all: every predefined adduct
- charge-based subsets such as positive and negative
- single-adduct tables addressable as Python identifiers such as M_plus_H
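The sketch below shows how an adduct entry is typically turned into an m/z value with m/z = (n · M + Δm) / |z|. It also shows how charge-based subsets can be derived. The tuple layout of ADDUCTS is hypothetical and is not the column layout of emzed's adduct tables. Only the naming style M_plus_H is borrowed from the text above.

```python
PROTON = 1.007276466621  # proton mass in u (CODATA 2018)

# Hypothetical layout: name -> (molecules per ion, mass delta in u, charge)
ADDUCTS = {
    "M_plus_H": (1, PROTON, 1),
    "M_plus_2H": (1, 2 * PROTON, 2),
    "M_minus_H": (1, -PROTON, -1),
}

# charge-based subsets, analogous to the positive and negative tables
positive = {name: a for name, a in ADDUCTS.items() if a[2] > 0}
negative = {name: a for name, a in ADDUCTS.items() if a[2] < 0}


def adduct_mz(m0, name):
    """m/z of an adduct ion: (n * M + delta) / |z|."""
    n_molecules, delta, charge = ADDUCTS[name]
    return (n_molecules * m0 + delta) / abs(charge)
```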
Convenience access to exact masses and common particle masses.
The module exposes a small set of particle masses directly (e, p, n) and lazily
forwards element and isotope masses via __getattr__ so that expressions such as
emzed.mass.C12 work without preloading the full elements table.
__dir__()
Forward attributes for autocompletion.
of(mf, **specialisation)
Calculate the exact mass for a molecular formula.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| mf | | molecular formula string. | required |
| specialisation | | optional isotope specialisations forwarded to | {} |

Returns:

| Type | Description |
|---|---|
| | exact mass as |

DelayedElementsTable
Lazy proxy for the elements table to avoid loading it at import time.
create_abundance_dict(symbols, abundances)
Create the lazy-access abundance dictionary exported by abundance.
create_elements_table(symbols, atomic_numbers, abundances, atomic_masses, average_masses)
Create the tabular element/isotope representation used by elements.
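This sketch gives background for the atomic_masses and average_masses arguments. An element's average mass is the abundance-weighted mean of its isotope masses. Its monoisotopic mass is usually taken from the most abundant isotope. The carbon and sulfur data are hard-coded, and the helper names are hypothetical.

```python
# (isotope mass in u, natural abundance) pairs
CARBON = [(12.0, 0.9893), (13.00335483507, 0.0107)]
SULFUR = [
    (31.9720711744, 0.9499),
    (32.9714589098, 0.0075),
    (33.967867004, 0.0425),
    (35.96708071, 0.0001),
]


def average_mass(isotopes):
    """Abundance-weighted mean of the isotope masses."""
    total = sum(abundance for _, abundance in isotopes)
    return sum(mass * abundance for mass, abundance in isotopes) / total


def monoisotopic_mass(isotopes):
    """Mass of the most abundant isotope."""
    return max(isotopes, key=lambda isotope: isotope[1])[0]
```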
create_mass_dict(symbols, atomic_masses, average_masses)
Create the lazy-access mass dictionary exported by emzed.chemistry.mass.
fix_pubchem_entries(name, isotopes, masses, abundances)
Normalize a few known inconsistencies in the bundled PubChem export.
load_elements()
Load element masses and abundances from bundled OpenMS and PubChem data.
Returns:

| Type | Description |
|---|---|
| | tuple |

load_elements_pubchem()
Load bundled element/isotope data derived from PubChem JSON resources.
parse_float(txt)
Parse a float from PubChem text, which can look like "1.23 45 (23)" or '[1.2211, 1.2212]'.
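One plausible way to handle both formats is sketched below. The first example suggests joining the digit groups and dropping the parenthesized uncertainty. Returning the midpoint of an interval is purely an assumption, because the text does not say how emzed resolves it. parse_pubchem_float is a hypothetical name.

```python
import re


def parse_pubchem_float(txt):
    """Parse PubChem-style numeric text (hypothetical re-implementation).

    "1.23 45 (23)"     -> digit groups joined, uncertainty in parentheses dropped
    "[1.2211, 1.2212]" -> an interval; this sketch returns its midpoint (assumption)
    """
    txt = txt.strip()
    if txt.startswith("[") and txt.endswith("]"):
        low, high = (parse_pubchem_float(part) for part in txt[1:-1].split(","))
        return (low + high) / 2
    txt = re.sub(r"\(\d+\)", "", txt)  # drop the uncertainty, e.g. "(23)"
    return float(txt.replace(" ", ""))
```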
Helpers for downloading and assembling PubChem-derived compound tables.
Downloader
Parallel downloader for PubChem summary data filtered to emzed's sources.
download(limit=None, result_path=None)
Download PubChem summary JSON data and compress it into one archive.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| limit | | optional maximum number of compound identifiers to download. | None |
| result_path | | optional output path for the compressed JSON archive. | None |

Returns:

| Type | Description |
|---|---|
| | path to the generated |

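The download-then-compress flow can be sketched with a thread pool and gzip. fetch_summary is a hypothetical stub standing in for the real PubChem request. download_sketch only mirrors the limit and result_path parameters and is not emzed's Downloader.

```python
import gzip
import json
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def fetch_summary(cid):
    """Hypothetical stub; a real version would query PubChem for this id."""
    return {"cid": cid}


def download_sketch(cids, limit=None, result_path=None, max_workers=4):
    if limit is not None:
        cids = cids[:limit]
    if result_path is None:
        result_path = os.path.join(tempfile.mkdtemp(), "pubchem_summaries.json.gz")
    # fetch concurrently; map() returns results in input order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        records = list(pool.map(fetch_summary, cids))
    # store all records as a single gzip-compressed JSON document
    with gzip.open(result_path, "wt", encoding="utf-8") as fh:
        json.dump(records, fh)
    return result_path
```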
PubChemAccessor
Thin wrapper around the PubChem E-Utilities search and summary endpoints.
get_count(search_term=None)
Count the number of compounds for the given search term.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| search_term | | If no search term is provided, the search covers uncharged compounds from KEGG, HMDB or BioCyc. For more complicated searches, such as restricting the search term to certain fields, the term can be constructed manually using the search form at https://www.ncbi.nlm.nih.gov/pccompound/advanced | None |

Returns:

| Type | Description |
|---|---|
| | integer |

get_identifiers(start=0, end=1000000, search_term=None, source=None)
Get compound identifiers for the given search term. These can be used later to retrieve compound details.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| start | | fetch results starting at the given index | 0 |
| end | | fetch results up to the given index | 1000000 |
| search_term | | If no search term is provided, the search covers uncharged compounds from KEGG, HMDB or BioCyc. For more complicated searches, such as restricting the search term to certain fields, the term can be constructed manually using the search form at https://www.ncbi.nlm.nih.gov/pccompound/advanced | None |
| source | | if no search term is provided, fetched identifiers can be restricted to the specified source, such as 'HMDB' only. | None |

Returns:

| Type | Description |
|---|---|
| | list of strings |

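The search likely maps onto the E-Utilities esearch endpoint of the pccompound database. The sketch below only builds the request URL and sends nothing. Mapping start and end onto the retstart and retmax parameters is an assumption. E-Utilities also caps how many identifiers a single request returns, so a real client would need to page through large ranges. esearch_url is a hypothetical helper.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, start=0, end=1_000_000, api_key=None):
    """Build (but do not send) an esearch request against pccompound."""
    params = {
        "db": "pccompound",
        "term": term,
        "retstart": start,  # index of the first identifier to return
        "retmax": end - start,  # number of identifiers to return
        "retmode": "json",
    }
    if api_key is not None:
        params["api_key"] = api_key
    return f"{ESEARCH}?{urlencode(params)}"
```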
get_summary_data(ids)
Fetch data for the given compound ids.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| ids | | list of compound ids. You can use get_identifiers to determine identifiers by searching for terms and metadata first. | required |

Returns:

| Type | Description |
|---|---|
| | dictionary mapping each id to a dictionary with keys 'cid', 'molecularweight', 'molecularformula', 'iupacname', 'inchi', 'inchikey', 'canonicalsmiles' and 'synonymlist'. |

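Summaries for many ids are usually fetched in batches. The sketch builds one esummary URL per batch without sending anything. The batch size of 200 and the helper names are hypothetical choices, not values taken from emzed.

```python
from urllib.parse import urlencode

ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def batches(ids, size):
    """Split ids into consecutive chunks of at most `size` items."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def esummary_urls(ids, batch_size=200):
    """Build one esummary request URL per batch of compound ids (not sent)."""
    urls = []
    for batch in batches(list(ids), batch_size):
        query = urlencode(
            {"db": "pccompound", "id": ",".join(map(str, batch)), "retmode": "json"}
        )
        urls.append(f"{ESUMMARY}?{query}")
    return urls
```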
register_pubchem_api_key(email_address, api_key, *, overwrite=False)
The API key is required if you want to download larger amounts of data or make more than 3 requests per second. To obtain a key:
- Create a user account at https://www.ncbi.nlm.nih.gov
- To create the key, go to the “Settings” page of your NCBI account. (Hint: after signing in, simply click on your NCBI username in the upper right corner of any NCBI page.)
- You’ll see a new “API Key Management” area. Click the “Create an API Key” button, and copy the resulting key.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| email_address | | your valid email address. | required |
| api_key | | valid API key. | required |
| overwrite | | overwrite existing data. | False |

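NCBI allows 3 requests per second without an API key and 10 with one. A client can respect such a limit by spacing out its requests. Throttle below is a hypothetical helper with injectable clock and sleep functions, so it can be exercised without real waiting.

```python
import time


class Throttle:
    """Space out calls so that at most `rate` requests start per second."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = 0.0

    def wait(self):
        """Block until the next request may start, then reserve the following slot."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

A caller would create `Throttle(10)` when an API key is registered and `Throttle(3)` otherwise, and call `wait()` before each request.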
assemble_table(gz_file, path=None)
Convert a downloaded PubChem JSON archive into an emzed.Table.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| gz_file | | path to the compressed JSON archive produced by | required |
| path | | optional output path for the resulting table database. | None |

Returns:

| Type | Description |
|---|---|
| | |

fast_m0(mf)
Estimate the monoisotopic mass of a formula string using bundled element data.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| mf | | molecular formula string. | required |

Returns:

| Type | Description |
|---|---|
| | monoisotopic mass as |

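Here is a minimal sketch of the idea. It assumes the estimate is the sum of the monoisotopic masses of the constituent atoms. The hard-coded table covers only CHNOPS, and fast_m0_sketch is a hypothetical name.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope; extend for other elements
M0 = {
    "C": 12.0,
    "H": 1.00782503223,
    "N": 14.00307400443,
    "O": 15.99491461957,
    "P": 30.97376199842,
    "S": 31.9720711744,
}


def fast_m0_sketch(mf):
    """Sum monoisotopic element masses for a formula such as 'C6H12O6'."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", mf):
        total += M0[symbol] * (int(count) if count else 1)
    return total
```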
retry(n)
Retry a function up to n times with a short delay between attempts.
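A decorator with the described behavior might look like this. The delay parameter and its default are assumptions. The last exception is re-raised once all n attempts fail, so n should be at least 1.

```python
import functools
import time


def retry(n, delay=0.1):
    """Call the wrapped function up to n times, re-raising the last error."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(n):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == n - 1:
                        raise  # out of attempts: propagate the last error
                    time.sleep(delay)  # short pause before the next attempt

        return wrapper

    return decorator
```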
Utility script for regenerating the bundled PubChem element isotope data.
fetch(number)
Extract isotope masses and abundances for one element from PubChem.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| number | | atomic number to request from PubChem. | required |

Returns:

| Type | Description |
|---|---|
| | tuple |

get(number)
Fetch the raw PubChem element record for one atomic number.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| number | | atomic number to request from PubChem. | required |

Returns:

| Type | Description |
|---|---|
| | parsed JSON data or |

main()
Download bundled element isotope data for the supported atomic-number range.