gasfir.fitting and gasfir.retrieval

Note

These modules require the retrieval extras:

pip install gasfir[retrieval]

gasfir.retrieval — automated pipeline

gasfir.retrieval.retrieve(data_NA, config, uncertainty_NA=None, data_QS=None, uncertainty_QS=None, data_validation=None, run_phases=(True, True, True))[source]

Run the full DE/CMA-ES → least_squares → emcee retrieval pipeline.

Parameters:
  • data_NA (DataFrame) – Training DataFrame with "pulses" and "Y" columns.

  • config (RetrievalConfig) – RetrievalConfig controlling all pipeline settings.

  • uncertainty_NA (Optional[ndarray[tuple[Any, ...], dtype[float64]]]) – Per-point uncertainty array. If None, defaults to 0.05 × Y (5% relative uncertainty).

  • data_QS (Optional[DataFrame]) – Optional quasi-static data with "field" and "Y" columns for simultaneous fitting.

  • uncertainty_QS (Union[ndarray[tuple[Any, ...], dtype[float64]], float, None]) – Uncertainty for QS data. Defaults to 5%.

  • data_validation (Optional[DataFrame]) – Optional out-of-sample DataFrame for validation metrics (same columns as data_NA).

  • run_phases (Tuple[bool, bool, bool]) – Three-tuple (global, polish, mcmc) controlling which pipeline phases to execute. Default (True, True, True) runs all three.

Return type:

RetrievalResult

Returns:

A RetrievalResult with the best-fit parameters, posterior statistics, and optional validation metrics.

class gasfir.retrieval.RetrievalConfig(medium_name='unknown', output_dir=None, initial_params=None, fixed_params=None, relative_bound=5.0, is_crystal=False, simultaneous=False, dt=4.0, dT=0.25, ret_electron_density=False, global_method='cma', de_workers=1, de_maxiter=1000, de_tol=1e-07, de_popsize=15, cma_sigma0=0.3, cma_maxiter=10000, ls_method='least_squares', emcee_nwalkers=None, emcee_nsteps=3000, emcee_burn_fraction=0.3, emcee_thin=10, emcee_max_autocorr_loops=5, emcee_pool=None, trial_run=False, latex_mapping=None)[source]

All settings for the three-phase retrieval pipeline.

Parameters:
  • medium_name (str) – Name used to look up initial parameters via get_parameters() (if initial_params is None) and as the stem for output file names.

  • output_dir (Union[str, Path, None]) – Directory where JSON results, HDF5 chains, and LaTeX/corner outputs are written.

  • initial_params (Optional[Dict[str, float]]) – Explicit initial parameter dict. If None, get_parameters(medium_name)() is used.

  • fixed_params (Optional[List[str]]) – Names of parameters to hold fixed (not varied). E_g and m_eff are fixed by default for non-crystal media.

  • relative_bound (float) – Half-width of parameter bounds as a multiple of the absolute initial value. Default 5.0 means the search range is [val - 5*abs(val), val + 5*abs(val)].

  • is_crystal (bool) – If True, m_eff is also varied and a0_per_au3 is converted to a0 using the lattice constant.

  • simultaneous (bool) – If True, load and concatenate quasi-static residuals.

  • dt (float) – Coarse time step for pulse pre-computation (a.u.).

  • dT (float) – Fine time step for pulse pre-computation (a.u.).

  • ret_electron_density (bool) – Pass True to fit electron density instead of ionization probability.

  • global_method (str) – "differential_evolution" or "cma".

  • de_workers (int) – Number of workers for differential evolution. -1 uses all cores. 1 (default) lets Numba handle intra-call parallelism.

  • de_maxiter (int) – Maximum iterations for differential evolution.

  • de_tol (float) – Convergence tolerance for differential evolution.

  • de_popsize (int) – Population size multiplier for differential evolution.

  • cma_sigma0 (float) – Initial step size for CMA-ES.

  • cma_maxiter (int) – Maximum function evaluations for CMA-ES.

  • ls_method (str) – lmfit method for local polish (e.g. "least_squares", "leastsq").

  • emcee_nwalkers (Optional[int]) – Number of MCMC walkers (must be even and ≥ 2×ndim). Nonemax(32, 4×ndim).

  • emcee_nsteps (int) – Total MCMC steps per walker.

  • emcee_burn_fraction (float) – Fraction of chain discarded as burn-in if autocorrelation time estimation fails.

  • emcee_thin (int) – Thinning factor applied when drawing flat samples.

  • emcee_max_autocorr_loops (int) – Maximum auto-correction loops (re-centre bounds and restart MCMC if parameters drift).

  • emcee_pool (Optional[object]) – Optional multiprocessing.Pool-compatible object passed to emcee.EnsembleSampler. None (default) uses Numba thread-level parallelism instead.

  • trial_run (bool) – If True, run DE and MCMC with minimal iterations for a quick sanity check.

class gasfir.retrieval.RetrievalResult(params, var_names, chisqr=0.0, redchi=0.0, ndata=0, nvarys=0, covar=None, flat_samples=None, lnprob=None, val_stats=<factory>, de_result=None, ls_result=None, method='emcee', chain_path=None)[source]

Full results from a retrieval run.

params

Final lmfit Parameters with best-fit values and (after MCMC) stderr attributes set to posterior standard deviations.

var_names

Names of the varied parameters.

chisqr

Sum of squared residuals at the best-fit point.

redchi

Reduced chi-squared.

ndata

Number of data points used in the fit.

nvarys

Number of varied parameters.

covar

Covariance matrix from the MCMC posterior.

flat_samples

Flattened MCMC chain (shape: (n_samples, ndim)).

lnprob

Log-probability for each flat sample.

val_stats

Out-of-sample validation statistics keyed by subset name.

de_result

Raw scipy DE result (or None).

ls_result

Raw lmfit least-squares result (or None).

method

Comma-separated list of phases that were run.

Individual phases

Global parameter search using Differential Evolution or CMA-ES.

The result is used as the starting point for run_local_polish().

Parameters:
  • residual_fn (Callable) – Callable f(params) -> residual_array.

  • params (Parameters) – Starting lmfit.Parameters with bounds set.

  • method (str) – "differential_evolution" or "cma".

  • de_workers (int) – Workers for DE. 1 → Numba parallelism handles the inner loop. -1 → all cores for process-level DE.

  • de_maxiter (int) – Maximum DE iterations.

  • de_tol (float) – DE convergence tolerance.

  • de_popsize (int) – DE population multiplier.

  • cma_sigma0 (float) – Initial step size for CMA-ES (relative to bounds width).

  • cma_maxiter (int) – Maximum CMA-ES function evaluations.

  • trial_run (bool) – Use a very small budget for quick testing.

Return type:

Parameters

Returns:

Updated lmfit.Parameters with values set to the global search optimum.

Raises:
  • ImportError – If method="cma" but the cma package is not installed.

  • ValueError – If method is not recognised.

gasfir.retrieval.run_local_polish(residual_fn, params, method='least_squares')[source]

Locally polish the global-search result with Levenberg–Marquardt.

Parameters:
  • residual_fn (Callable) – Callable f(params) -> residual_array.

  • params (Parameters) – lmfit.Parameters at the global search optimum.

  • method (str) – lmfit minimisation method. "least_squares" is recommended as it handles bounds natively.

Return type:

Tuple[Parameters, MinimizerResult]

Returns:

Tuple of (updated params, lmfit.MinimizerResult).

gasfir.retrieval.run_mcmc(residual_fn, params, n_steps=3000, n_walkers=None, burn_fraction=0.3, thin=10, backend_path=None, pool=None, max_autocorr_loops=5, initial_stored_params=None, trial_run=False)[source]

Run emcee MCMC and return posterior summary.

Wraps the residual function in a log-probability callable, runs emcee with optional HDF5 chain persistence and auto-correction for parameter drift.

Parameters:
  • residual_fn (Callable) – Safe residual callable f(params) -> array.

  • params (Parameters) – lmfit.Parameters at the local-polish optimum.

  • n_steps (int) – Total MCMC steps per walker.

  • n_walkers (Optional[int]) – Number of walkers. Nonemax(32, 4 × ndim).

  • burn_fraction (float) – Fraction of chain used as burn-in when autocorrelation estimation fails.

  • thin (int) – Thinning factor for flat samples.

  • backend_path (Union[str, Path, None]) – HDF5 file for chain persistence. Pass None to skip persistence.

  • pool (Optional[object]) – Optional multiprocessing.Pool for walker-level parallelism. Pass None (default) to rely on Numba intra-call parallelism.

  • max_autocorr_loops (int) – Number of auto-correction cycles (re-centre bounds around drifted parameters and restart).

  • initial_stored_params (Optional[Dict[str, float]]) – Original parameter dict used as the reference for drift detection (passed to get_cleanest_value()).

  • trial_run (bool) – Use a very small step count for quick testing.

Return type:

Tuple[Parameters, ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]], ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]], object]

Returns:

(final_params, flat_samples, lnprob, sampler) where final_params has stderr set to posterior standard deviations and values snapped to clean estimates via get_cleanest_value().

Output & reporting

gasfir.retrieval.generate_publication_corner(flat_samples, var_names, output_dir, medium_name, truths=None, latex_mapping=None, constants=None)[source]

Save a publication-quality corner plot as a PDF.

The corner plot shows 1-D marginal distributions on the diagonal and 2-D contour projections off-diagonal. When truths are provided, two inset tables are added: one with adopted parameter values and one with the correlation matrix.

Parameters:
  • flat_samples (ndarray[tuple[Any, ...], dtype[float64]]) – Flattened MCMC chain, shape (n_samples, ndim).

  • var_names (List[str]) – Ordered parameter names matching the chain columns.

  • output_dir (Union[str, Path]) – Directory where the PDF is saved.

  • medium_name (str) – Used as the file-name stem.

  • truths (Optional[List[float]]) – Best-fit / snapped values to mark on the plot (one per param).

  • latex_mapping (Optional[Dict[str, str]]) – {name: latex_string} overrides for axis labels.

  • constants (Optional[Dict[str, float]]) – Fixed parameter values to list in the adopted-values table.

Return type:

Optional[Path]

Returns:

Path to the saved PDF, or None if matplotlib / corner are absent.

gasfir.retrieval.generate_trace_plot(chain_path, var_names, output_dir, medium_name, latex_mapping=None, discard=0)[source]

Save a trace plot (walker trajectories) as a PDF.

Loads the chain from an HDF5 backend file so no live sampler object is required — the plot can be regenerated from persisted runs.

Parameters:
  • chain_path (Union[str, Path]) – Path to the *.h5 emcee HDF5 backend file.

  • var_names (List[str]) – Parameter names in chain column order.

  • output_dir (Union[str, Path]) – Directory where the PDF is saved.

  • medium_name (str) – Used as the file-name stem.

  • latex_mapping (Optional[Dict[str, str]]) – {name: latex_string} label overrides.

  • discard (int) – Number of initial steps to discard from the trace.

Return type:

Optional[Path]

Returns:

Path to the saved PDF, or None if matplotlib or emcee absent.

gasfir.retrieval.generate_latex_summary(result, output_dir, latex_mapping=None)[source]

Write a self-contained LaTeX section summarising the fit.

Produces a .tex file with:

  • Parameter table (initial value, best fit, ±stderr).

  • Out-of-sample validation table (if result.val_stats is populated).

  • Correlation matrix (if result.covar is available).

  • \includegraphics link to the corner plot.

Parameters:
Return type:

Path

Returns:

Path to the written .tex file.

gasfir.retrieval.compare_to_stored(ls_result, stored_params, sigma=2.0)[source]

Print a table comparing LS-fitted values to stored reference parameters.

For each varied parameter, reports the stored value, fitted value, standard error, number of standard deviations separating them, and whether the stored value falls within sigma-σ of the fitted value.

Parameters:
  • ls_result (MinimizerResult) – lmfit.minimizer.MinimizerResult from the LS phase.

  • stored_params (Dict[str, float]) – Reference parameter dict (e.g. from get_parameters()). Only keys that appear in ls_result.params and were varied are compared.

  • sigma (float) – Confidence threshold in units of stderr (default 2 → 95 % CI).

Return type:

None

Post-processing

gasfir.retrieval.post_process_mcmc(medium_name, output_dir, original_stored_params, drop_vars=None, derived_exprs=None, latex_mapping=None, is_crystal=False)[source]

Re-analyse an existing MCMC chain: drop/derive variables, snap, and re-plot.

Loads the HDF5 chain written by a previous retrieve() call, applies transformations (drop parameters, add derived quantities via pandas eval), snaps each parameter to its cleanest 1-σ value, and writes updated corner plot, trace plot, and LaTeX summary.

This mirrors the logic in uncertainty_emcee_post.py but as a proper library function.

Parameters:
  • medium_name (str) – Stem used for file names (matches the original run).

  • output_dir (Union[str, Path]) – Directory containing the .h5 chain and .json result.

  • original_stored_params (Dict[str, float]) – The reference parameter dict (e.g. from get_parameters()) used for drift detection.

  • drop_vars (Optional[List[str]]) – Parameter names to remove from the chain before re-analysis.

  • derived_exprs (Optional[Dict[str, str]]) – {new_name: pandas_eval_expression} pairs, e.g. {"a0_per_au3": "a0 / 6.74**3"}.

  • latex_mapping (Optional[Dict[str, str]]) – {name: latex_string} label overrides.

  • is_crystal (bool) – If True, lattice_constant_au is kept as a fixed constant in the reported tables.

Return type:

RetrievalResult

Returns:

A new RetrievalResult with post-processed parameters.

Raises:
gasfir.retrieval.save_result(result, path)[source]

Serialise a RetrievalResult to a JSON file.

Parameters:
  • result (RetrievalResult) – The retrieval result to save.

  • path (Union[str, Path]) – Output file path (will be created/overwritten).

Return type:

None

gasfir.retrieval.load_result(path)[source]

Load a RetrievalResult from a JSON file written by save_result().

Parameters:

path (Union[str, Path]) – Path to the JSON file.

Return type:

RetrievalResult

Returns:

A RetrievalResult with flat_samples and lnprob set to None (they are not persisted in JSON; reload the HDF5 chain for those).

gasfir.fitting — low-level residual functions

gasfir.fitting.residual_P(params, data_Nadiabatic, uncertainty_Nadiabatic=0.05, dt=2, dT=0.5)[source]

Compute normalized residuals for non-adiabatic ionization probabilities.

Parameters:
  • params (Parameters) – lmfit parameter set passed to the kernel.

  • data_Nadiabatic (DataFrame) – DataFrame with "pulses" and "Y" columns.

  • uncertainty_Nadiabatic (Union[ndarray[tuple[Any, ...], dtype[float64]], float]) – Per-point weight or scalar uncertainty. Default is 5%.

  • dt (float) – Coarse time step in atomic units. Default is 2.

  • dT (float) – Fine time step in atomic units. Default is 0.5.

Return type:

ndarray[tuple[Any, ...], dtype[float64]]

Returns:

Array of normalized residuals (model - data) / uncertainty.

gasfir.fitting.residual_QS(params, data_QS, uncertainty_QS=0.05)[source]

Compute normalized residuals for quasi-static ionization rates.

Parameters:
  • params (Parameters) – lmfit parameter set passed to the kernel.

  • data_QS (DataFrame) – DataFrame with "field" and "Y" columns; "Y" stores the log of measured ionization rates.

  • uncertainty_QS (Union[ndarray[tuple[Any, ...], dtype[float64]], float]) – Per-point weight or scalar uncertainty. Default is 5%.

Return type:

ndarray[tuple[Any, ...], dtype[float64]]

Returns:

Array of normalized residuals in log-rate space.

gasfir.fitting.get_residuals_concat_QS_NA(params, data_Nadiabatic, uncertainty_Nadiabatic, data_QS, uncertainty_QS=0.05, dt=2, dT=0.5)[source]

Concatenate non-adiabatic and quasi-static residuals into a single array.

Convenience wrapper around residual_P() and residual_QS() for simultaneous fitting to both datasets.

Parameters:
  • params (Parameters) – lmfit parameter set.

  • data_Nadiabatic (DataFrame) – DataFrame with "pulses" and "Y" columns.

  • uncertainty_Nadiabatic (float) – Uncertainty for the non-adiabatic data.

  • data_QS (DataFrame) – DataFrame with "field" and "Y" columns.

  • uncertainty_QS (Union[ndarray[tuple[Any, ...], dtype[float64]], float]) – Uncertainty for the quasi-static data. Default is 5%.

  • dt (float) – Coarse time step in atomic units. Default is 2.

  • dT (float) – Fine time step in atomic units. Default is 0.5.

Return type:

ndarray[tuple[Any, ...], dtype[float64]]

Returns:

Concatenated array of residuals from both datasets.