
PyOpenMS integration

emzed depends heavily on pyopenms for reading mass-spectrometry data and running algorithms such as the feature finder. PyOpenMS bundles its own Qt DLLs, which conflict with the Qt libraries used by emzed_gui and other Qt-based software when both are loaded in the same process. To work around this, emzed does not list pyopenms as a normal dependency. Instead, it installs pyopenms into a dedicated Python environment, runs it in a subprocess, and exposes it through a transparent proxy:

  1. On the first import of emzed, a background Python process is started inside the isolated venv. That process imports pyopenms and opens a multiprocessing.connection listener on localhost.
  2. The main process connects to the listener and wraps the connection in a RemoteModule proxy object.
  3. The proxy is inserted into sys.modules["pyopenms"], so every subsequent import of pyopenms anywhere in the codebase receives the proxy instead of the real module.
  4. Attribute access and calls on the proxy are serialised with pickle, sent over the TCP connection, executed in the subprocess, and the result is sent back the same way.
  5. NumPy arrays are transferred as raw bytes rather than being pickled, to avoid double-serialisation overhead.
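
The steps above can be illustrated with a small, self-contained sketch. It is not emzed's implementation: a daemon thread stands in for the subprocess, the standard-library math module stands in for pyopenms, and the names serve, RemoteModuleSketch and remote_math_demo are made up for this example. It only forwards plain function calls and uses no authentication key.

```python
import math
import sys
import threading
import types
from multiprocessing.connection import Client, Listener


def serve(listener, module):
    """Answer (name, args, kwargs) requests by calling into `module`."""
    with listener.accept() as conn:
        while True:
            try:
                name, args, kwargs = conn.recv()  # request arrives pickled
            except EOFError:  # the client closed the connection
                break
            try:
                conn.send(("ok", getattr(module, name)(*args, **kwargs)))
            except Exception as exc:
                conn.send(("error", repr(exc)))


class RemoteModuleSketch(types.ModuleType):
    """Hypothetical proxy: forwards function calls over a connection."""

    def __init__(self, name, conn):
        super().__init__(name)
        self._conn = conn

    def __getattr__(self, attr):
        # Only reached for names that normal lookup did not find.
        if attr.startswith("_"):
            raise AttributeError(attr)

        def call(*args, **kwargs):
            self._conn.send((attr, args, kwargs))
            status, value = self._conn.recv()
            if status == "error":
                raise RuntimeError(value)
            return value

        return call


# Port 0 lets the operating system pick a free port on localhost.
listener = Listener(("localhost", 0))
threading.Thread(target=serve, args=(listener, math), daemon=True).start()

# Connect and register the proxy under a made-up module name.
conn = Client(listener.address)
sys.modules["remote_math_demo"] = RemoteModuleSketch("remote_math_demo", conn)

import remote_math_demo  # resolved from sys.modules, no file lookup

print(remote_math_demo.sqrt(16.0))  # 4.0
```

Because the import statement consults sys.modules first, code that imports the registered name needs no changes to use the proxy. Every call costs a pickle round trip over the socket.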

Isolated venv

On the first import of emzed, the function setup_remote_venv() (in src/emzed/remote_package/remote_module_wrapper.py) creates a dedicated virtual environment at:

<sys.prefix>/share/pyopenms_venv

sys.prefix is the root of the active Python environment — the virtualenv directory when running inside one, or the system Python root otherwise.
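
As a quick check, the location can be computed from a running interpreter. This sketch only derives the path described above. The variable names are illustrative, and comparing sys.prefix with sys.base_prefix is the standard way to tell whether a virtual environment is active.

```python
import sys
from pathlib import Path

# Path of the dedicated venv, following the layout described above.
venv_dir = Path(sys.prefix) / "share" / "pyopenms_venv"

# Inside a virtualenv, sys.prefix differs from sys.base_prefix.
in_virtualenv = sys.prefix != sys.base_prefix

print(venv_dir)
print("inside a virtualenv" if in_virtualenv else "base interpreter")
```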

The venv is managed by uv and is pinned to a fixed Python and pyopenms version defined in src/emzed/pyopenms/__init__.py:

PYOPENMS_VERSION = "3.3.0"
PYTHON_VERSION   = "3.12"

This means pyopenms always runs under Python 3.12, even when the host environment uses Python 3.11, 3.13, or 3.14. The venv's numpy version is kept in sync with the host environment to avoid serialisation ABI mismatches.

If the installed versions no longer match the pinned values (e.g. after an emzed upgrade), setup_remote_venv() detects the mismatch and rebuilds the venv automatically.
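
The mismatch check could look roughly like the sketch below. The function find_mismatches and the shape of its arguments are assumptions made for illustration, not the logic of setup_remote_venv(). It treats a pin as satisfied when the installed version equals it or extends it by a further component, so the pin "3.12" accepts "3.12.4".

```python
PYOPENMS_VERSION = "3.3.0"
PYTHON_VERSION = "3.12"


def find_mismatches(installed, host_numpy_version):
    """Return the names whose installed version differs from the expected one.

    `installed` maps "python", "pyopenms" and "numpy" to the version strings
    found in the venv (hypothetical input format). An empty result means the
    venv can be reused; anything else would trigger a rebuild.
    """
    expected = {
        "python": PYTHON_VERSION,
        "pyopenms": PYOPENMS_VERSION,
        "numpy": host_numpy_version,  # kept in sync with the host
    }
    mismatches = []
    for name, want in expected.items():
        have = installed.get(name)
        matches = have is not None and (
            have == want or have.startswith(want + ".")
        )
        if not matches:
            mismatches.append(name)
    return mismatches


current = {"python": "3.12.4", "pyopenms": "3.3.0", "numpy": "2.1.0"}
print(find_mismatches(current, "2.1.0"))  # []

outdated = {"python": "3.12.4", "pyopenms": "3.2.0", "numpy": "2.1.0"}
print(find_mismatches(outdated, "2.1.0"))  # ['pyopenms']
```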

Performance optimisations

A few pyopenms operations are called very frequently (spectrum extraction, experiment loading, feature finding). Routing every call through the IPC layer would be too slow. src/emzed/pyopenms/optimizations.py provides pure-Python reimplementations of these hot paths that bypass serialisation entirely. RemoteModule.load_optimizations() registers them as overrides on the proxy object after the connection is established.
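
One way to picture override registration is a proxy that consults a table of local functions before falling back to the remote call. The sketch below assumes that design. ProxyWithOverrides, register_override and the forward callback are hypothetical names, and the math module again stands in for the remote side.

```python
import math


class ProxyWithOverrides:
    """Hypothetical proxy that prefers local overrides over remote calls."""

    def __init__(self, forward):
        # `forward(name, args, kwargs)` stands in for the IPC round trip.
        self._forward = forward
        self._overrides = {}

    def register_override(self, name, func):
        self._overrides[name] = func

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        override = self._overrides.get(name)
        if override is not None:
            return override  # runs locally, nothing is serialised
        return lambda *args, **kwargs: self._forward(name, args, kwargs)


remote_calls = []  # records which names took the slow path


def forward(name, args, kwargs):
    remote_calls.append(name)
    return getattr(math, name)(*args, **kwargs)


proxy = ProxyWithOverrides(forward)
proxy.register_override("hypot", lambda x, y: (x * x + y * y) ** 0.5)

print(proxy.hypot(3.0, 4.0))  # 5.0, served by the local override
print(proxy.sqrt(9.0))        # 3.0, forwarded
print(remote_calls)           # ['sqrt']
```

Callers use the same attribute names either way, so an override can be added or removed without touching code that uses the proxy.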