Argument Caching

Overview

This guide demonstrates Pynenc’s argument caching system, which optimizes task execution by caching large serialized arguments. This feature is particularly useful when working with large objects frequently passed between tasks.

Prerequisites

For Redis-based caching in distributed environments, install the Redis plugin:

pip install pynenc-redis

For MongoDB-based caching, install the MongoDB plugin:

pip install pynenc-mongodb

The memory-based cache is included with the core Pynenc package.

Scenario

The argument caching scenario illustrates how to:

Configure argument caching thresholds
Control caching behavior per task
Use different cache backends (Redis, MongoDB, or memory)

Implementation

Basic Usage

from pynenc import Pynenc
import numpy as np

app = Pynenc()

@app.task
def process_array(data: np.ndarray) -> float:
    """Process a large numpy array."""
    return float(data.mean())

# Large arrays will be automatically cached
large_array = np.random.rand(1_000_000)
result = process_array(large_array)

Configuration

Configure caching behavior using either pyproject.toml or PynencBuilder:

Using pyproject.toml

Set up caching in your pyproject.toml file:

[tool.pynenc]
arg_client_data_store_cls = "SQLiteClientDataStore"  # or "MemClientDataStore"

[tool.pynenc.client_data_store]
min_size_to_cache = 1024  # Cache arguments larger than 1KB
local_cache_size = 1000   # Keep 1000 most recent entries in local cache
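
To get a feel for which arguments would cross a threshold of 1024, the sketch below measures the serialized size of two values. It is an illustration only: it assumes JSON serialization and assumes the threshold is compared against the length of the serialized string. The helper `would_cache` is hypothetical and not part of Pynenc.

```python
import json

MIN_SIZE_TO_CACHE = 1024  # mirrors min_size_to_cache in the configuration above


def would_cache(value, min_size: int = MIN_SIZE_TO_CACHE) -> bool:
    """Return True if the JSON-serialized value is larger than the threshold.

    Hypothetical helper: the real decision is made inside Pynenc and may
    use a different serializer or size measure.
    """
    serialized = json.dumps(value)
    return len(serialized) > min_size


small = {"id": 1, "name": "task"}  # a few dozen characters when serialized
large = list(range(1000))          # several thousand characters when serialized

print(would_cache(small))
print(would_cache(large))
```

Under these assumptions, only the second value would be stored in the cache; the first would travel with the task call as usual.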

Using PynencBuilder

Alternatively, configure argument caching programmatically using PynencBuilder in your application code (e.g., in tasks.py):

from pynenc import Pynenc
from pynenc.builder import PynencBuilder
import numpy as np

app = (
    PynencBuilder()
    .redis(url="redis://localhost:6379")   # Requires pynenc-redis plugin
    .client_data_store(
        mode="redis",                      # Sets RedisClientDataStore; options: "redis", "mongodb", "memory", "disabled"
        min_size_to_cache=1024,            # Cache arguments larger than 1KB
        local_cache_size=1000              # Keep 1000 most recent entries in local cache
    )
    .build()
)

@app.task
def process_array(data: np.ndarray) -> float:
    """Process a large numpy array."""
    return float(data.mean())

# Large arrays will be automatically cached
large_array = np.random.rand(1_000_000)
result = process_array(large_array)

The .client_data_store() method allows selecting one of the following modes:

"redis": Default; requires .redis() configuration and the pynenc-redis plugin
"mongodb": Requires .mongodb() configuration and the pynenc-mongodb plugin
"memory": For local testing or development purposes
"disabled": Completely disables argument caching

Additionally, .client_data_store() directly accepts parameters like min_size_to_cache and local_cache_size, simplifying configuration compared to using .custom_config().

Controlling Cache Behavior

Disable caching for specific arguments:

@app.task
def process_data(data: bytes, *, disable_cache_args: list[str] | None = None) -> int:
    """Process data with optional cache control."""
    return len(data)

# Example payload, large enough to exceed the caching threshold
large_bytes = b"x" * 1_000_000

# Prevent caching of the 'data' argument
result = process_data(large_bytes, disable_cache_args=["data"])

Features

Automatic Caching: Large arguments are cached automatically once they exceed the configured size threshold.
Multiple Backends: Supports Redis (distributed), MongoDB (distributed), and memory-based (local) caching.
Cache Sharing: Shared cache across processes using runner-level storage.
Smart Detection: Avoids redundant serialization of identical objects.
LRU Cache: Maintains recently used items within configured limits.
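
The interaction between the size threshold and the LRU limit can be sketched with a small standalone class. This is a toy model, not Pynenc's implementation: the class name `LocalArgCache`, the `cache:` key prefix, and the use of a SHA-256 content hash are all assumptions made for illustration. Only the parameter names `min_size_to_cache` and `local_cache_size` come from the configuration shown earlier.

```python
import hashlib
from collections import OrderedDict


class LocalArgCache:
    """Toy LRU cache for serialized arguments (illustrative, not Pynenc code)."""

    def __init__(self, min_size_to_cache: int = 1024, local_cache_size: int = 1000) -> None:
        self.min_size_to_cache = min_size_to_cache
        self.local_cache_size = local_cache_size
        self._entries = OrderedDict()  # key -> serialized value, oldest first

    def store(self, serialized: str) -> str:
        """Return a cache key for large values, or the value itself if small."""
        if len(serialized) <= self.min_size_to_cache:
            return serialized  # small arguments pass through unchanged
        # Identical content always maps to the same key (assumed scheme)
        key = "cache:" + hashlib.sha256(serialized.encode()).hexdigest()
        self._entries[key] = serialized
        self._entries.move_to_end(key)  # mark as most recently used
        while len(self._entries) > self.local_cache_size:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return key

    def load(self, key_or_value: str) -> str:
        """Resolve a cache key back to its value; plain values are returned as-is."""
        if not key_or_value.startswith("cache:"):
            return key_or_value
        # Raises KeyError if evicted; a real backend would fetch from shared storage
        value = self._entries[key_or_value]
        self._entries.move_to_end(key_or_value)
        return value


cache = LocalArgCache(min_size_to_cache=10, local_cache_size=2)
key_a = cache.store("a" * 100)
key_a_again = cache.store("a" * 100)  # same content, same key, one entry
small = cache.store("tiny")           # below the threshold, returned unchanged
```

In this model the local cache only bounds memory use in one process. A distributed backend such as Redis or MongoDB would hold the authoritative copy, so an evicted entry could be fetched again rather than lost.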

For more details on configuration options, refer to the Configuration documentation.