pynenc.client_data_store.base_client_data_store

Base class and interface for the ClientDataStore system.

Manages serialization and external storage of client-provided data: task arguments, results, and exceptions. Small values pass through as inline serialized strings. Large values are stored externally with content-hash keys for automatic deduplication.

Key components:

  • BaseClientDataStore: Abstract base with serialize/deserialize and size-based routing

Module Contents

Classes

BaseClientDataStore

Manages serialization and storage of client-provided data.

Functions

_generate_key

Generate a content-hash reference key from serialized data.

API

class pynenc.client_data_store.base_client_data_store.BaseClientDataStore(app: pynenc.app.Pynenc)

Bases: abc.ABC

Manages serialization and storage of client-provided data.

Handles task arguments, results, and exceptions. Small values are returned inline as serialized strings. Large values are stored externally and referenced by content-hash keys for automatic deduplication.

Deduplication is achieved through deterministic content-hashing: the same serialized value always produces the same SHA-256 key, so backends naturally deduplicate via INSERT OR REPLACE / upsert semantics.
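A minimal sketch of content-hash deduplication. It assumes the key is the hex SHA-256 digest of the serialized string with a hypothetical `ref:` prefix (the actual key format is not documented here), and a plain dict stands in for a backend with upsert semantics.

```python
import hashlib

def content_key(serialized: str) -> str:
    # Hypothetical key format: "ref:" + hex SHA-256 of the UTF-8 bytes
    digest = hashlib.sha256(serialized.encode("utf-8")).hexdigest()
    return f"ref:{digest}"

backend: dict[str, str] = {}

def store(serialized: str) -> str:
    key = content_key(serialized)
    # Dict assignment behaves like INSERT OR REPLACE: storing the same
    # key again overwrites the entry with identical content
    backend[key] = serialized
    return key

payload = "x" * 10_000
k1 = store(payload)
k2 = store(payload)  # same content -> same key -> no new entry
print(len(backend))  # 1
```

Because the key depends only on the content, two tasks passing the same large argument share a single stored copy.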

A small process-local LRU cache avoids repeated backend reads for recently deserialized objects.

Subclasses implement three abstract methods for backend storage: _store, _retrieve, and _purge.
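The backend contract can be illustrated without importing pynenc. `MiniClientDataStore` below is a hypothetical, simplified stand-in for the real base class: it keeps only the three abstract hooks plus thin public wrappers, so that a dict-backed backend can be shown in isolation.

```python
from abc import ABC, abstractmethod

class MiniClientDataStore(ABC):
    """Hypothetical stand-in exposing only the three backend hooks."""

    def store(self, key: str, value: str) -> None:
        self._store(key, value)

    def load(self, key: str) -> str:
        return self._retrieve(key)

    def purge(self) -> None:
        self._purge()

    @abstractmethod
    def _store(self, key: str, value: str) -> None: ...

    @abstractmethod
    def _retrieve(self, key: str) -> str: ...

    @abstractmethod
    def _purge(self) -> None: ...

class InMemoryDataStore(MiniClientDataStore):
    """Dict-backed backend, e.g. for tests or single-process runs."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def _store(self, key: str, value: str) -> None:
        # Upsert: storing the same key again overwrites identical content
        self._data[key] = value

    def _retrieve(self, key: str) -> str:
        # A dict lookup raises KeyError for a missing key, matching the
        # documented _retrieve contract
        return self._data[key]

    def _purge(self) -> None:
        self._data.clear()
```

A real subclass would inherit BaseClientDataStore instead and receive the Pynenc app in its constructor.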

Initialization

Initialize with app reference.

Parameters:

app (Pynenc) – The Pynenc application instance

conf() → pynenc.conf.config_client_data_store.ConfigClientDataStore

Get the client data store configuration.

serialize_arguments(kwargs: dict[str, Any], disable_cache_args: tuple[str, ...]) → dict[str, str]

Serialize task arguments, externalizing large values.

disable_cache_args names the arguments that always stay inline and are never externalized. Pass ("*",) to disable external storage for all arguments.

Deduplication of identical values is handled by content-hash keys: the same serialized content always maps to the same storage key.

Parameters:
  • kwargs (dict[str, Any]) – The dictionary with raw Python values.

  • disable_cache_args (tuple[str, ...]) – Argument names to skip external storage.

Returns:

Dict of argument name → serialized value (inline or reference key).
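A simplified sketch of the per-argument routing. It assumes JSON as the serializer, a hypothetical `MIN_SIZE` threshold in characters, a hypothetical `ref:` key prefix, and a dict as external storage; `maybe_store` is a stand-in for the internal routing, not pynenc's API.

```python
import hashlib
import json
from typing import Any

MIN_SIZE = 64  # hypothetical stand-in for min_size_to_cache
external: dict[str, str] = {}  # stand-in for backend storage

def maybe_store(serialized: str) -> str:
    if len(serialized) < MIN_SIZE:
        return serialized  # small value: keep inline
    key = "ref:" + hashlib.sha256(serialized.encode("utf-8")).hexdigest()
    external[key] = serialized  # large value: externalize by content hash
    return key

def serialize_arguments(
    kwargs: dict[str, Any], disable_cache_args: tuple[str, ...]
) -> dict[str, str]:
    inline_all = "*" in disable_cache_args
    result: dict[str, str] = {}
    for name, value in kwargs.items():
        serialized = json.dumps(value)
        if inline_all or name in disable_cache_args:
            result[name] = serialized  # never externalized
        else:
            result[name] = maybe_store(serialized)
    return result
```

For example, `serialize_arguments({"small": 1, "big": list(range(100))}, ())` keeps `"small"` inline as `"1"` and replaces `"big"` with a `ref:` key, while passing `("*",)` keeps both inline.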

deserialize_arguments(serialized_args: dict[str, str]) → dict[str, Any]

Deserialize argument values, resolving any external references.

Each value is checked: if it is a ClientDataStore reference key, the value is loaded from external storage first, then deserialized. Inline values are deserialized directly.

Parameters:

serialized_args (dict[str, str]) – Argument name → serialized value.

Returns:

Dict of argument name → deserialized Python object.
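A sketch of the read path under the same hypothetical assumptions (JSON serializer, `ref:` prefix, a dict as external storage): each value is checked for the reference prefix, loaded from storage if it is a reference, and then deserialized.

```python
import json
from typing import Any

REF_PREFIX = "ref:"  # hypothetical reference-key prefix

def deserialize_arguments(
    serialized_args: dict[str, str], external: dict[str, str]
) -> dict[str, Any]:
    result: dict[str, Any] = {}
    for name, data in serialized_args.items():
        if data.startswith(REF_PREFIX):
            # Reference key: load the stored payload first (KeyError if missing)
            data = external[data]
        result[name] = json.loads(data)
    return result
```

In this sketch a JSON-encoded string always begins with a quote, so an inline value such as `'"ref:abc"'` can never be mistaken for a reference key; any serializer used this way needs a similar guarantee.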

serialize(obj: Any, disable_cache: bool = False) → str

Serialize an object, storing externally if it meets size thresholds.

Returns either an inline serialized string (small values) or a reference key pointing to externally stored data (large values).

Parameters:
  • obj (Any) – Object to serialize

  • disable_cache (bool) – If True, always return inline serialized string

Returns:

Serialized string or reference key

resolve(data: str) → Any

Resolve a serialized value to a Python object.

If the value is a reference key, loads the data from external storage first. Otherwise deserializes directly.

Parameters:

data (str) – Serialized string or reference key

Returns:

The deserialized Python object

deserialize(data: str) → Any

Alias for resolve(): resolves a serialized value to a Python object.

Deprecated: use resolve() instead for clarity.

Parameters:

data (str) – Serialized string or reference key

Returns:

The deserialized Python object

is_reference(value: str) → bool

Check if a string is a reference key to externally stored data.

Parameters:

value (str) – String to check

Returns:

True if this is a reference key

purge() → None

Clear local cache and backend storage.

_maybe_store(serialized: str) → str

Route serialized data to external storage or return inline.

Routing by serialized size:

  • Below min_size_to_cache: return the serialized string inline.

  • Above max_size_to_cache (if set): return inline and log a size warning.

  • Otherwise: store externally and return the reference key.

_resolve_reference(ref_key: str) → Any

Resolve a reference key to the deserialized object.

Uses a small process-local LRU cache to avoid repeated backend reads for the same key within a single process.

_cache_deserialized(key: str, obj: Any) → None

Add to the LRU cache, evicting the least recently used entry when at capacity.
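A minimal sketch of a process-local LRU cache of the kind described for _resolve_reference and _cache_deserialized, built on `collections.OrderedDict`. The class name, the default capacity, and the `loader` callback (standing in for "backend read + deserialize") are hypothetical.

```python
from collections import OrderedDict
from typing import Any, Callable

class DeserializedCache:
    def __init__(self, capacity: int = 3) -> None:
        self.capacity = capacity
        self._items: OrderedDict[str, Any] = OrderedDict()

    def get_or_load(self, key: str, loader: Callable[[str], Any]) -> Any:
        if key in self._items:
            self._items.move_to_end(key)  # hit: mark as most recently used
            return self._items[key]
        obj = loader(key)  # miss: e.g. backend read + deserialize
        self.put(key, obj)
        return obj

    def put(self, key: str, obj: Any) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = obj
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used
```

Because the cache is per process, it only saves repeated reads within one worker; it does not need invalidation, since a content-hash key always maps to the same value.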

_log_size_warning(size: int, reason: str) → None

Log a warning about value size.

abstractmethod _store(key: str, value: str) → None

Store a serialized value by its content-hash key.

Backends should use upsert/INSERT OR REPLACE semantics so that storing the same key twice is effectively a no-op: because the key is derived from the content, the second write stores identical data (content-hash deduplication).

Parameters:
  • key (str) – Content-hash reference key

  • value (str) – Serialized string to store

abstractmethod _retrieve(key: str) → str

Retrieve a serialized value by its reference key.

Parameters:

key (str) – Content-hash reference key

Returns:

The stored serialized string

Raises:

KeyError – If key not found

abstractmethod _purge() → None

Remove all stored data from the backend.

pynenc.client_data_store.base_client_data_store._generate_key(value: str) → str

Generate a content-hash reference key from serialized data.