# Optimization

**Experimental Module**

This module is experimental and may change or be removed in future versions. Some functionality is not well tested and may not be entirely reliable.
## Losses

### stochastix.utils.optimization.reinforce_loss

```python
reinforce_loss(stop_returns_grad: bool = True)
```

Create a loss function for the REINFORCE algorithm.

Parameters:

- `stop_returns_grad` (`bool`, default: `True`) – If `True`, gradients will not be computed for the returns. This is the standard formulation of REINFORCE.

Returns:

- A loss function that computes the REINFORCE loss.

The returned loss function has the signature:

```python
loss(model, ssa_results, returns, baseline=0.0)
```

Parameters:

- `model` – The model being trained. It must have a `log_prob` method (e.g., `ReactionNetwork`) or a `.network` attribute with a `log_prob` method (e.g., `StochasticModel`).
- `ssa_results` – The output from a stochastic simulation.
- `returns` – The returns for each step of the trajectory, typically discounted cumulative rewards.
- `baseline` – A baseline to subtract from the returns to reduce variance. It should not have gradients with respect to the policy parameters.

Returns:

- The computed REINFORCE loss as a scalar.
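The surrogate objective this loss implements can be sketched in plain Python. The function below illustrates the standard REINFORCE formulation and is not the library code: it takes per-step log-probabilities directly instead of a model and `ssa_results`, and treats the returns as constants, matching the `stop_returns_grad=True` behavior.

```python
def reinforce_loss_sketch(log_probs, returns, baseline=0.0):
    """Illustrative REINFORCE surrogate loss (not the library implementation).

    log_probs: per-step log-probabilities of the sampled trajectory.
    returns:   per-step returns, treated as constants so no gradient
               flows through them (stop_returns_grad=True).
    """
    advantages = [g - baseline for g in returns]
    # Minimizing this surrogate increases the log-probability of steps
    # whose advantage (return minus baseline) is positive.
    return -sum(a * lp for a, lp in zip(advantages, log_probs))
```

Subtracting a well-chosen baseline leaves the gradient unbiased while reducing its variance, which is why the docstring requires the baseline itself to carry no policy gradients.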
## Rewards

### stochastix.utils.optimization.neg_final_state_distance

```python
neg_final_state_distance(species=0, distance='L1')
```

Compute the negative distance from a target final state.

This is useful for creating a reward function where the goal is to minimize the distance to a target state at the end of the simulation.

Parameters:

- `species` – The index or name of the species to track.
- `distance` – The distance metric, either `'L1'` or `'L2'`.

Returns:

- A function that takes `SimulationResults` and a target state value and returns a reward vector. The reward is non-zero only at the final step of the simulation.
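To make the shape of the returned reward concrete, here is a minimal sketch in which a plain list of per-step species values stands in for `SimulationResults` (an assumption; the real object carries more structure):

```python
def neg_final_state_distance_sketch(trajectory, target, distance="L1"):
    """Illustrative reward: negative distance to `target` at the final step.

    trajectory: per-step values of the tracked species (a stand-in for the
    species column of a SimulationResults object).
    """
    d = abs(trajectory[-1] - target)
    if distance == "L2":
        d = d ** 2
    # Reward vector: zero at every step except the last.
    return [0.0] * (len(trajectory) - 1) + [-d]
```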
### stochastix.utils.optimization.steady_state_distance

```python
steady_state_distance(from_t=0.0, species=0)
```

Compute the distance from a target steady state.

Parameters:

- `from_t` – The time from which to start computing the distance.
- `species` – The index or name of the species to track.

Returns:

- A function that takes `SimulationResults` and a target steady-state value and returns the absolute distance from the target.
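As a rough sketch of the semantics, assuming the distance is averaged over all time points at or after `from_t` (the exact aggregation used by the library is an assumption, not confirmed by the docstring), with plain lists standing in for `SimulationResults`:

```python
def steady_state_distance_sketch(times, values, target, from_t=0.0):
    """Illustrative steady-state distance.

    Averages the absolute deviation from `target` over all time points
    with t >= from_t. The library may aggregate differently.
    """
    selected = [v for t, v in zip(times, values) if t >= from_t]
    return sum(abs(v - target) for v in selected) / len(selected)
```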
### stochastix.utils.optimization.rewards_from_state_metric

```python
rewards_from_state_metric(metric_fn, metric_type='cost')
```

Generate a reward function from the differences of a state metric.

This is a general-purpose utility for creating reward functions. It takes a metric function that computes a value at each time step and returns the difference between consecutive values as the reward.

Parameters:

- `metric_fn` – A function that takes `SimulationResults` and returns a vector of metric values, one for each time step.
- `metric_type` – Either `'cost'` or `'reward'`. If `'cost'`, the reward is the negative difference of the metric; if `'reward'`, it is the positive difference.

Returns:

- A function that takes `SimulationResults` and computes a reward vector.
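The difference construction can be sketched as follows. Whether the library pads the reward vector back to the full trajectory length is an assumption left open here; the sketch yields one reward per consecutive pair of steps.

```python
def rewards_from_state_metric_sketch(metric_fn, metric_type="cost"):
    """Illustrative version: reward = signed consecutive difference of a metric."""
    sign = -1.0 if metric_type == "cost" else 1.0

    def reward_fn(results):
        m = metric_fn(results)  # one metric value per time step
        # A decreasing cost (or increasing reward metric) yields positive rewards.
        return [sign * (m[t + 1] - m[t]) for t in range(len(m) - 1)]

    return reward_fn
```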
## Gradients

### stochastix.utils.optimization.grad.gradfd

```python
gradfd(
    func: Callable[..., ndarray], epsilon: float = 1e-05
) -> Callable[..., Any]
```

Compute the gradient of a function using central first-order finite differences.

NOTE: only supports differentiation with respect to the first argument of `func`.

Parameters:

- `func` (`Callable[..., ndarray]`) – The function to differentiate. Should take a PyTree of arrays as input and return a scalar.
- `epsilon` (`float`, default: `1e-05`) – The step size to use for finite differences.

Returns:

- `Callable[..., Any]` – A function that takes the same arguments as `func` and returns the gradient of `func` at that point as a PyTree with the same structure as the input.
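The central-difference scheme can be illustrated on a flat list of floats; the library version operates on arbitrary PyTrees, but the mechanics are the same:

```python
def gradfd_sketch(func, epsilon=1e-5):
    """Central finite differences over a flat list of floats (illustrative)."""
    def grad_fn(x, *args):
        grads = []
        for i in range(len(x)):
            xp, xm = list(x), list(x)
            xp[i] += epsilon
            xm[i] -= epsilon
            # df/dx_i ~ (f(x + eps*e_i) - f(x - eps*e_i)) / (2*eps)
            grads.append((func(xp, *args) - func(xm, *args)) / (2 * epsilon))
        return grads
    return grad_fn
```

Note that each gradient evaluation costs two function calls per coordinate, which is why SPSA (below) is attractive for high-dimensional parameters.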
### stochastix.utils.optimization.grad.gradspsa

```python
gradspsa(
    func: Callable[..., ndarray],
    epsilon: float = 0.001,
    num_samples: int = 20,
    *,
    split_first_arg_key: bool = True,
) -> Callable[..., Any]
```

Compute the gradient of a function using SPSA (Simultaneous Perturbation Stochastic Approximation).

NOTE: only supports differentiation with respect to the first argument of `func`.

Parameters:

- `func` (`Callable[..., ndarray]`) – The function to differentiate. Should take a PyTree of arrays as input and return a scalar.
- `epsilon` (`float`, default: `0.001`) – The perturbation size to use for SPSA.
- `num_samples` (`int`, default: `20`) – Number of SPSA samples to average over.
- `split_first_arg_key` (`bool`, default: `True`) – If `True`, assumes the first positional argument passed to the returned gradient function is a PRNGKey and splits it into `num_samples` independent keys, using one per SPSA sample. The same per-sample key is used for both + and - perturbations (common random numbers) to reduce variance. If `False`, the exact same arguments are used for all SPSA samples.

Returns:

- `Callable[..., Any]` – A function that takes the same arguments as `func` (plus a PRNGKey) and returns the gradient of `func` at that point as a PyTree with the same structure as the input.
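A pure-Python sketch of the SPSA estimator, using the standard-library `random` module in place of a JAX PRNGKey and flat lists in place of PyTrees. The key idea is that one pair of function evaluations estimates all partial derivatives at once:

```python
import random

def gradspsa_sketch(func, epsilon=1e-3, num_samples=20, rng=None):
    """Illustrative SPSA gradient estimate over a flat list of floats."""
    rng = rng or random.Random(0)

    def grad_fn(x, *args):
        n = len(x)
        acc = [0.0] * n
        for _ in range(num_samples):
            # Rademacher perturbation: each coordinate is +1 or -1.
            delta = [rng.choice((-1.0, 1.0)) for _ in range(n)]
            xp = [xi + epsilon * d for xi, d in zip(x, delta)]
            xm = [xi - epsilon * d for xi, d in zip(x, delta)]
            # A single function-value difference along the random direction.
            df = (func(xp, *args) - func(xm, *args)) / (2 * epsilon)
            for i in range(n):
                # Division by delta_i (= multiplication, since delta_i is +-1)
                # recovers an unbiased-in-expectation estimate of df/dx_i.
                acc[i] += df / delta[i]
        return [a / num_samples for a in acc]

    return grad_fn
```

Each sample costs two evaluations of `func` regardless of the dimension of `x`, at the price of extra estimator noise that averaging over `num_samples` reduces.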
## Utils

### stochastix.utils.optimization.dataloader

```python
dataloader(
    key: ndarray, arrays: list[ndarray], batch_size: int
)
```

Create a generator that yields batches of data.

This function creates an infinite generator that repeatedly yields batches of data from the provided arrays.

Parameters:

- `key` (`ndarray`) – A JAX random key for shuffling the data.
- `arrays` (`list[ndarray]`) – A list of arrays, each with the same size in the first dimension.
- `batch_size` (`int`) – The size of each batch.

Yields:

- A tuple of batched arrays.

Raises:

- `ValueError` – If the arrays do not all have the same size in the first dimension.
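The reshuffle-and-slice pattern behind such a loader can be sketched with plain lists; here a `random.Random` instance stands in for the JAX key, and the exact batching details (e.g., how a trailing partial batch is handled) are assumptions, not the library's documented behavior:

```python
import random

def dataloader_sketch(rng, arrays, batch_size):
    """Illustrative infinite batch generator over same-length sequences."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("All arrays must have the same first-dimension size.")
    while True:
        # Reshuffle once per pass over the data.
        perm = list(range(n))
        rng.shuffle(perm)
        for start in range(0, n - batch_size + 1, batch_size):
            idx = perm[start:start + batch_size]
            # Yield one tuple with the corresponding rows of every array.
            yield tuple([a[i] for i in idx] for a in arrays)
```

Because the generator is infinite, a training loop typically calls `next()` on it once per optimization step rather than iterating to exhaustion.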
### stochastix.utils.optimization.discounted_returns

```python
discounted_returns(
    rewards: ndarray, GAMMA: float = 0.9
) -> jnp.ndarray
```

Calculate the discounted returns for a sequence of rewards.

Parameters:

- `rewards` (`ndarray`) – A 1D array of rewards.
- `GAMMA` (`float`, default: `0.9`) – The discount factor.

Returns:

- `ndarray` – A 1D array of discounted returns, where each element is the cumulative discounted return from that point forward.
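The returns satisfy the backward recursion G_t = r_t + GAMMA * G_{t+1}, which a short sketch makes explicit (the library version operates on JAX arrays; this uses plain lists for clarity):

```python
def discounted_returns_sketch(rewards, gamma=0.9):
    """Illustrative backward recursion: G_t = r_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    g = 0.0
    # Walk the trajectory backwards, accumulating the discounted tail.
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns
```

These per-step returns are exactly the values a REINFORCE-style loss (see `reinforce_loss` above) expects as its `returns` argument.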