gasfir.fitting and gasfir.retrieval
Note
These modules require the retrieval extras:
pip install gasfir[retrieval]
gasfir.retrieval — automated pipeline
- gasfir.retrieval.retrieve(data_NA, config, uncertainty_NA=None, data_QS=None, uncertainty_QS=None, data_validation=None, run_phases=(True, True, True))[source]
Run the full DE/CMA-ES → least_squares → emcee retrieval pipeline.
- Parameters:
data_NA (
DataFrame) – Training DataFrame with"pulses"and"Y"columns.config (
RetrievalConfig) –RetrievalConfigcontrolling all pipeline settings.uncertainty_NA (
Optional[ndarray[tuple[Any,...],dtype[float64]]]) – Per-point uncertainty array. IfNone, defaults to0.05 × Y(5% relative uncertainty).data_QS (
Optional[DataFrame]) – Optional quasi-static data with"field"and"Y"columns for simultaneous fitting.uncertainty_QS (
Union[ndarray[tuple[Any,...],dtype[float64]],float,None]) – Uncertainty for QS data. Defaults to 5%.data_validation (
Optional[DataFrame]) – Optional out-of-sample DataFrame for validation metrics (same columns asdata_NA).run_phases (
Tuple[bool,bool,bool]) – Three-tuple(global, polish, mcmc)controlling which pipeline phases to execute. Default(True, True, True)runs all three.
- Return type:
- Returns:
A
RetrievalResultwith the best-fit parameters, posterior statistics, and optional validation metrics.
- class gasfir.retrieval.RetrievalConfig(medium_name='unknown', output_dir=None, initial_params=None, fixed_params=None, relative_bound=5.0, is_crystal=False, simultaneous=False, dt=4.0, dT=0.25, ret_electron_density=False, global_method='cma', de_workers=1, de_maxiter=1000, de_tol=1e-07, de_popsize=15, cma_sigma0=0.3, cma_maxiter=10000, ls_method='least_squares', emcee_nwalkers=None, emcee_nsteps=3000, emcee_burn_fraction=0.3, emcee_thin=10, emcee_max_autocorr_loops=5, emcee_pool=None, trial_run=False, latex_mapping=None)[source]
All settings for the three-phase retrieval pipeline.
- Parameters:
medium_name (
str) – Name used to look up initial parameters viaget_parameters()(ifinitial_paramsisNone) and as the stem for output file names.output_dir (
Union[str,Path,None]) – Directory where JSON results, HDF5 chains, and LaTeX/corner outputs are written.initial_params (
Optional[Dict[str,float]]) – Explicit initial parameter dict. IfNone,get_parameters(medium_name)()is used.fixed_params (
Optional[List[str]]) – Names of parameters to hold fixed (not varied).E_gandm_effare fixed by default for non-crystal media.relative_bound (
float) – Half-width of parameter bounds as a multiple of the absolute initial value. Default 5.0 means the search range is[val - 5*abs(val), val + 5*abs(val)].is_crystal (
bool) – IfTrue,m_effis also varied anda0_per_au3is converted toa0using the lattice constant.simultaneous (
bool) – IfTrue, load and concatenate quasi-static residuals.dt (
float) – Coarse time step for pulse pre-computation (a.u.).dT (
float) – Fine time step for pulse pre-computation (a.u.).ret_electron_density (
bool) – PassTrueto fit electron density instead of ionization probability.global_method (
str) –"differential_evolution"or"cma".de_workers (
int) – Number of workers for differential evolution.-1uses all cores.1(default) lets Numba handle intra-call parallelism.de_maxiter (
int) – Maximum iterations for differential evolution.de_tol (
float) – Convergence tolerance for differential evolution.de_popsize (
int) – Population size multiplier for differential evolution.cma_sigma0 (
float) – Initial step size for CMA-ES.cma_maxiter (
int) – Maximum function evaluations for CMA-ES.ls_method (
str) – lmfit method for local polish (e.g."least_squares","leastsq").emcee_nwalkers (
Optional[int]) – Number of MCMC walkers (must be even and ≥ 2×ndim).None→max(32, 4×ndim).emcee_nsteps (
int) – Total MCMC steps per walker.emcee_burn_fraction (
float) – Fraction of chain discarded as burn-in if autocorrelation time estimation fails.emcee_thin (
int) – Thinning factor applied when drawing flat samples.emcee_max_autocorr_loops (
int) – Maximum auto-correction loops (re-centre bounds and restart MCMC if parameters drift).emcee_pool (
Optional[object]) – Optionalmultiprocessing.Pool-compatible object passed toemcee.EnsembleSampler.None(default) uses Numba thread-level parallelism instead.trial_run (
bool) – IfTrue, run DE and MCMC with minimal iterations for a quick sanity check.
- class gasfir.retrieval.RetrievalResult(params, var_names, chisqr=0.0, redchi=0.0, ndata=0, nvarys=0, covar=None, flat_samples=None, lnprob=None, val_stats=<factory>, de_result=None, ls_result=None, method='emcee', chain_path=None)[source]
Full results from a retrieval run.
- params
Final lmfit Parameters with best-fit values and (after MCMC)
stderrattributes set to posterior standard deviations.
- var_names
Names of the varied parameters.
- chisqr
Sum of squared residuals at the best-fit point.
- redchi
Reduced chi-squared.
- ndata
Number of data points used in the fit.
- nvarys
Number of varied parameters.
- covar
Covariance matrix from the MCMC posterior.
- flat_samples
Flattened MCMC chain (shape:
(n_samples, ndim)).
- lnprob
Log-probability for each flat sample.
- val_stats
Out-of-sample validation statistics keyed by subset name.
- de_result
Raw scipy DE result (or
None).
- ls_result
Raw lmfit least-squares result (or
None).
- method
Comma-separated list of phases that were run.
Individual phases
- gasfir.retrieval.run_global_search(residual_fn, params, method='differential_evolution', de_workers=1, de_maxiter=1000, de_tol=1e-07, de_popsize=15, cma_sigma0=0.3, cma_maxiter=10000, trial_run=False)[source]
Global parameter search using Differential Evolution or CMA-ES.
The result is used as the starting point for
run_local_polish().- Parameters:
residual_fn (
Callable) – Callablef(params) -> residual_array.params (
Parameters) – Startinglmfit.Parameterswith bounds set.method (
str) –"differential_evolution"or"cma".de_workers (
int) – Workers for DE.1→ Numba parallelism handles the inner loop.-1→ all cores for process-level DE.de_maxiter (
int) – Maximum DE iterations.de_tol (
float) – DE convergence tolerance.de_popsize (
int) – DE population multiplier.cma_sigma0 (
float) – Initial step size for CMA-ES (relative to bounds width).cma_maxiter (
int) – Maximum CMA-ES function evaluations.trial_run (
bool) – Use a very small budget for quick testing.
- Return type:
Parameters- Returns:
Updated
lmfit.Parameterswith values set to the global search optimum.- Raises:
ImportError – If
method="cma"but thecmapackage is not installed.ValueError – If
methodis not recognised.
- gasfir.retrieval.run_local_polish(residual_fn, params, method='least_squares')[source]
Locally polish the global-search result with Levenberg–Marquardt.
- Parameters:
- Return type:
Tuple[Parameters,MinimizerResult]- Returns:
Tuple of (updated params,
lmfit.MinimizerResult).
- gasfir.retrieval.run_mcmc(residual_fn, params, n_steps=3000, n_walkers=None, burn_fraction=0.3, thin=10, backend_path=None, pool=None, max_autocorr_loops=5, initial_stored_params=None, trial_run=False)[source]
Run emcee MCMC and return posterior summary.
Wraps the residual function in a log-probability callable, runs emcee with optional HDF5 chain persistence and auto-correction for parameter drift.
- Parameters:
residual_fn (
Callable) – Safe residual callablef(params) -> array.params (
Parameters) –lmfit.Parametersat the local-polish optimum.n_steps (
int) – Total MCMC steps per walker.n_walkers (
Optional[int]) – Number of walkers.None→max(32, 4 × ndim).burn_fraction (
float) – Fraction of chain used as burn-in when autocorrelation estimation fails.thin (
int) – Thinning factor for flat samples.backend_path (
Union[str,Path,None]) – HDF5 file for chain persistence. PassNoneto skip persistence.pool (
Optional[object]) – Optionalmultiprocessing.Poolfor walker-level parallelism. PassNone(default) to rely on Numba intra-call parallelism.max_autocorr_loops (
int) – Number of auto-correction cycles (re-centre bounds around drifted parameters and restart).initial_stored_params (
Optional[Dict[str,float]]) – Original parameter dict used as the reference for drift detection (passed toget_cleanest_value()).trial_run (
bool) – Use a very small step count for quick testing.
- Return type:
Tuple[Parameters,ndarray[tuple[Any,...],dtype[TypeVar(_ScalarT, bound=generic)]],ndarray[tuple[Any,...],dtype[TypeVar(_ScalarT, bound=generic)]],object]- Returns:
(final_params, flat_samples, lnprob, sampler)where final_params hasstderrset to posterior standard deviations and values snapped to clean estimates viaget_cleanest_value().
Output & reporting
- gasfir.retrieval.generate_publication_corner(flat_samples, var_names, output_dir, medium_name, truths=None, latex_mapping=None, constants=None)[source]
Save a publication-quality corner plot as a PDF.
The corner plot shows 1-D marginal distributions on the diagonal and 2-D contour projections off-diagonal. When truths are provided, two inset tables are added: one with adopted parameter values and one with the correlation matrix.
- Parameters:
flat_samples (
ndarray[tuple[Any,...],dtype[float64]]) – Flattened MCMC chain, shape(n_samples, ndim).var_names (
List[str]) – Ordered parameter names matching the chain columns.output_dir (
Union[str,Path]) – Directory where the PDF is saved.medium_name (
str) – Used as the file-name stem.truths (
Optional[List[float]]) – Best-fit / snapped values to mark on the plot (one per param).latex_mapping (
Optional[Dict[str,str]]) –{name: latex_string}overrides for axis labels.constants (
Optional[Dict[str,float]]) – Fixed parameter values to list in the adopted-values table.
- Return type:
- Returns:
Path to the saved PDF, or
Noneif matplotlib / corner are absent.
- gasfir.retrieval.generate_trace_plot(chain_path, var_names, output_dir, medium_name, latex_mapping=None, discard=0)[source]
Save a trace plot (walker trajectories) as a PDF.
Loads the chain from an HDF5 backend file so no live sampler object is required — the plot can be regenerated from persisted runs.
- Parameters:
chain_path (
Union[str,Path]) – Path to the*.h5emcee HDF5 backend file.var_names (
List[str]) – Parameter names in chain column order.output_dir (
Union[str,Path]) – Directory where the PDF is saved.medium_name (
str) – Used as the file-name stem.latex_mapping (
Optional[Dict[str,str]]) –{name: latex_string}label overrides.discard (
int) – Number of initial steps to discard from the trace.
- Return type:
- Returns:
Path to the saved PDF, or
Noneif matplotlib or emcee absent.
- gasfir.retrieval.generate_latex_summary(result, output_dir, latex_mapping=None)[source]
Write a self-contained LaTeX section summarising the fit.
Produces a
.texfile with:Parameter table (initial value, best fit, ±stderr).
Out-of-sample validation table (if
result.val_statsis populated).Correlation matrix (if
result.covaris available).\includegraphicslink to the corner plot.
- Parameters:
result (
RetrievalResult) –RetrievalResultfromretrieve().output_dir (
Union[str,Path]) – Directory where the.texfile is saved.latex_mapping (
Optional[Dict[str,str]]) –{name: latex_string}label overrides.
- Return type:
- Returns:
Path to the written
.texfile.
- gasfir.retrieval.compare_to_stored(ls_result, stored_params, sigma=2.0)[source]
Print a table comparing LS-fitted values to stored reference parameters.
For each varied parameter, reports the stored value, fitted value, standard error, number of standard deviations separating them, and whether the stored value falls within sigma-σ of the fitted value.
- Parameters:
ls_result (
MinimizerResult) –lmfit.minimizer.MinimizerResultfrom the LS phase.stored_params (
Dict[str,float]) – Reference parameter dict (e.g. fromget_parameters()). Only keys that appear inls_result.paramsand were varied are compared.sigma (
float) – Confidence threshold in units of stderr (default 2 → 95 % CI).
- Return type:
Post-processing
- gasfir.retrieval.post_process_mcmc(medium_name, output_dir, original_stored_params, drop_vars=None, derived_exprs=None, latex_mapping=None, is_crystal=False)[source]
Re-analyse an existing MCMC chain: drop/derive variables, snap, and re-plot.
Loads the HDF5 chain written by a previous
retrieve()call, applies transformations (drop parameters, add derived quantities via pandaseval), snaps each parameter to its cleanest 1-σ value, and writes updated corner plot, trace plot, and LaTeX summary.This mirrors the logic in
uncertainty_emcee_post.pybut as a proper library function.- Parameters:
medium_name (
str) – Stem used for file names (matches the original run).output_dir (
Union[str,Path]) – Directory containing the.h5chain and.jsonresult.original_stored_params (
Dict[str,float]) – The reference parameter dict (e.g. fromget_parameters()) used for drift detection.drop_vars (
Optional[List[str]]) – Parameter names to remove from the chain before re-analysis.derived_exprs (
Optional[Dict[str,str]]) –{new_name: pandas_eval_expression}pairs, e.g.{"a0_per_au3": "a0 / 6.74**3"}.latex_mapping (
Optional[Dict[str,str]]) –{name: latex_string}label overrides.is_crystal (
bool) – IfTrue,lattice_constant_auis kept as a fixed constant in the reported tables.
- Return type:
- Returns:
A new
RetrievalResultwith post-processed parameters.- Raises:
FileNotFoundError – If the HDF5 chain or JSON result is missing.
ImportError – If emcee or h5py are not installed.
- gasfir.retrieval.save_result(result, path)[source]
Serialise a
RetrievalResultto a JSON file.- Parameters:
result (
RetrievalResult) – The retrieval result to save.path (
Union[str,Path]) – Output file path (will be created/overwritten).
- Return type:
- gasfir.retrieval.load_result(path)[source]
Load a
RetrievalResultfrom a JSON file written bysave_result().- Parameters:
- Return type:
- Returns:
A
RetrievalResultwithflat_samplesandlnprobset toNone(they are not persisted in JSON; reload the HDF5 chain for those).
gasfir.fitting — low-level residual functions
- gasfir.fitting.residual_P(params, data_Nadiabatic, uncertainty_Nadiabatic=0.05, dt=2, dT=0.5)[source]
Compute normalized residuals for non-adiabatic ionization probabilities.
- Parameters:
params (
Parameters) – lmfit parameter set passed to the kernel.data_Nadiabatic (
DataFrame) – DataFrame with"pulses"and"Y"columns.uncertainty_Nadiabatic (
Union[ndarray[tuple[Any,...],dtype[float64]],float]) – Per-point weight or scalar uncertainty. Default is 5%.dt (
float) – Coarse time step in atomic units. Default is 2.dT (
float) – Fine time step in atomic units. Default is 0.5.
- Return type:
- Returns:
Array of normalized residuals
(model - data) / uncertainty.
- gasfir.fitting.residual_QS(params, data_QS, uncertainty_QS=0.05)[source]
Compute normalized residuals for quasi-static ionization rates.
- Parameters:
params (
Parameters) – lmfit parameter set passed to the kernel.data_QS (
DataFrame) – DataFrame with"field"and"Y"columns;"Y"stores the log of measured ionization rates.uncertainty_QS (
Union[ndarray[tuple[Any,...],dtype[float64]],float]) – Per-point weight or scalar uncertainty. Default is 5%.
- Return type:
- Returns:
Array of normalized residuals in log-rate space.
- gasfir.fitting.get_residuals_concat_QS_NA(params, data_Nadiabatic, uncertainty_Nadiabatic, data_QS, uncertainty_QS=0.05, dt=2, dT=0.5)[source]
Concatenate non-adiabatic and quasi-static residuals into a single array.
Convenience wrapper around
residual_P()andresidual_QS()for simultaneous fitting to both datasets.- Parameters:
params (
Parameters) – lmfit parameter set.data_Nadiabatic (
DataFrame) – DataFrame with"pulses"and"Y"columns.uncertainty_Nadiabatic (
float) – Uncertainty for the non-adiabatic data.data_QS (
DataFrame) – DataFrame with"field"and"Y"columns.uncertainty_QS (
Union[ndarray[tuple[Any,...],dtype[float64]],float]) – Uncertainty for the quasi-static data. Default is 5%.dt (
float) – Coarse time step in atomic units. Default is 2.dT (
float) – Fine time step in atomic units. Default is 0.5.
- Return type:
- Returns:
Concatenated array of residuals from both datasets.