PyOpenMS integration
emzed depends heavily on pyopenms for
reading mass-spectrometry data and running algorithms such as the feature
finder. PyOpenMS bundles its own Qt DLLs, which conflict with the Qt libraries used by
emzed_gui and other Qt-based software when both are loaded in the same
process. To work around this, emzed does not list pyopenms as a normal
dependency.
Instead, it installs and runs pyopenms in a dedicated subprocess with its own
Python environment and exposes it through a transparent proxy:
- On the first import emzed, a background Python process is started inside the isolated venv. That process imports pyopenms and opens a multiprocessing.connection listener on localhost.
- The main process connects to the listener and wraps the connection in a RemoteModule proxy object.
- The proxy is inserted into sys.modules["pyopenms"], so every subsequent import pyopenms anywhere in the codebase receives the proxy instead of a real module.
- Attribute access and calls on the proxy are serialised with pickle, sent over the TCP connection, executed in the subprocess, and the result is sent back the same way.
- NumPy arrays are transferred as raw bytes rather than pickled, which avoids double-serialisation overhead.
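The steps above can be illustrated with a minimal, self-contained sketch. It is not emzed's implementation: DemoProxy, serve, AUTHKEY and the module name demo_remote_math are hypothetical, the standard-library math module stands in for pyopenms, and a background thread stands in for the subprocess so that the example runs in a single file. Only the general pattern is shown: a multiprocessing.connection listener on localhost, a proxy that forwards calls over the connection, and registration of that proxy in sys.modules.

```python
import math
import sys
import threading
import types
from multiprocessing.connection import Client, Listener

AUTHKEY = b"demo-key"  # hypothetical shared secret for this sketch


def serve(listener, target):
    """Answer (name, args) requests by calling the named function on target."""
    with listener.accept() as conn:
        while True:
            try:
                name, args = conn.recv()  # request arrives pickled
            except (EOFError, OSError):
                break  # client closed the connection
            conn.send(getattr(target, name)(*args))  # result is pickled


class DemoProxy(types.ModuleType):
    """Hypothetical stand-in for a RemoteModule-like proxy."""

    def __init__(self, name, conn):
        super().__init__(name)
        self._conn = conn

    def __getattr__(self, attr):
        # Only reached for names that normal attribute lookup did not find.
        if attr.startswith("_"):
            raise AttributeError(attr)

        def remote_call(*args):
            self._conn.send((attr, args))
            return self._conn.recv()

        return remote_call


# Port 0 asks the operating system for any free port on localhost.
listener = Listener(("localhost", 0), authkey=AUTHKEY)
server = threading.Thread(target=serve, args=(listener, math), daemon=True)
server.start()

conn = Client(listener.address, authkey=AUTHKEY)
sys.modules["demo_remote_math"] = DemoProxy("demo_remote_math", conn)

import demo_remote_math  # resolved from sys.modules, so this is the proxy

result = demo_remote_math.sqrt(16.0)  # executed on the serving side
print(result)

conn.close()
server.join(timeout=5)
listener.close()
```

Because the import statement consults sys.modules first, code that imports the module by name needs no changes to end up talking to the proxy.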
Isolated venv
On the first import of emzed, the function setup_remote_venv() (in
src/emzed/remote_package/remote_module_wrapper.py) creates a dedicated
virtual environment at:
sys.prefix is the root of the active Python environment — the virtualenv
directory when running inside one, or the system Python root otherwise.
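To check what sys.prefix resolves to in a given environment, a short snippet such as the following can help. It uses only the standard library; the variable name in_virtualenv is chosen for illustration.

```python
import sys

# sys.prefix points at the active environment. Inside a virtual environment
# it differs from sys.base_prefix, which points at the base installation.
in_virtualenv = sys.prefix != sys.base_prefix

print("sys.prefix:     ", sys.prefix)
print("sys.base_prefix:", sys.base_prefix)
print("inside a venv:  ", in_virtualenv)
```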
The venv is managed by uv and is pinned to
a fixed Python and pyopenms version defined in
src/emzed/pyopenms/__init__.py:
This means pyopenms always runs under Python 3.12, even when the host environment uses Python 3.11, 3.13, or 3.14. The venv's numpy version is kept in sync with the host environment to avoid serialisation ABI mismatches.
If the installed versions no longer match the pinned values (e.g. after an
emzed upgrade), setup_remote_venv() detects the mismatch and rebuilds the
venv automatically.
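How setup_remote_venv() performs this check is not described here. As a rough sketch of one possible approach, the following asks an interpreter for its Python version and the installed versions of selected packages, then compares the answer with a pinned specification. The names PROBE, installed_versions and needs_rebuild are hypothetical and do not come from emzed.

```python
import json
import subprocess
import sys

# Small script executed by the interpreter under inspection.
PROBE = """
import json, sys
from importlib import metadata

def version(name):
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None

print(json.dumps({
    "python": "%d.%d" % sys.version_info[:2],
    "packages": {name: version(name) for name in sys.argv[1:]},
}))
"""


def installed_versions(python_exe, packages):
    """Return the Python and package versions seen by python_exe."""
    completed = subprocess.run(
        [python_exe, "-c", PROBE, *packages],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(completed.stdout)


def needs_rebuild(installed, pinned):
    """A rebuild is needed as soon as any pinned value differs."""
    return installed != pinned


# Usage: the current interpreter stands in for the venv's interpreter.
current = installed_versions(sys.executable, ["pip"])
print(current)
print(needs_rebuild(current, current))  # identical specs: no rebuild
```

In a real setup the first argument would be the path to the interpreter inside the isolated venv, and the pinned dictionary would be built from the values defined alongside the package.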
Performance optimisations
A few pyopenms operations are called very frequently (spectrum extraction,
experiment loading, feature finding). Routing every call through the IPC layer
would be too slow. src/emzed/pyopenms/optimizations.py provides pure-Python
reimplementations of these hot paths that bypass serialisation entirely.
RemoteModule.load_optimizations() registers them as overrides on the proxy
object after the connection is established.
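The override idea can be sketched as follows. This is an illustration under assumptions, not the code behind RemoteModule.load_optimizations(): the class OverridableProxy, its register_override method and the fake_remote_call function are hypothetical. The proxy consults a table of local overrides before it falls back to the (here simulated) remote call.

```python
class OverridableProxy:
    """Hypothetical proxy that prefers local overrides over remote calls."""

    def __init__(self, remote_call):
        self._remote_call = remote_call  # callable taking (name, args)
        self._overrides = {}

    def register_override(self, name, func):
        """Serve attribute `name` locally with func from now on."""
        self._overrides[name] = func

    def __getattr__(self, name):
        # Only reached for names that normal attribute lookup did not find.
        if name.startswith("_"):
            raise AttributeError(name)
        if name in self._overrides:
            return self._overrides[name]  # local fast path, no serialisation
        return lambda *args: self._remote_call(name, args)


calls = []  # records which names went through the simulated IPC layer


def fake_remote_call(name, args):
    calls.append(name)
    return f"remote:{name}"


proxy = OverridableProxy(fake_remote_call)
proxy.register_override("fast_sum", lambda values: sum(values))

local_result = proxy.fast_sum([1, 2, 3])  # handled locally
remote_result = proxy.slow_op()           # forwarded to the remote side
print(local_result, remote_result, calls)
```

Registering overrides after construction keeps the proxy generic: anything without an override still takes the remote route.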