Parameter Fitting
=================

.. note::

   Parameter fitting requires the retrieval extras::

      pip install gasfir[retrieval]

   All fitting-related imports come from :mod:`gasfir.fitting` or
   :mod:`gasfir.retrieval`, **not** from the top-level ``gasfir`` namespace.

   Results are saved to ``<medium_name>/`` by default (JSON, HDF5 chain,
   corner plot, trace plot, LaTeX summary).

Automated Pipeline (recommended)
---------------------------------

:func:`~gasfir.retrieval.retrieve` runs the full
CMA-ES → least-squares → emcee pipeline.  After each phase it prints
progress and a fit report using :func:`lmfit.fit_report`.

**Known medium** — initial guess and bounds from the stored reference:

.. code-block:: python

    import pandas as pd
    from gasfir import create_pulse
    from gasfir.retrieval import RetrievalConfig, retrieve

    data = pd.DataFrame({
        "pulses": [create_pulse(800, I, 0, 6) for I in intensities],
        "Y":      measured_probabilities,
    })

    cfg = RetrievalConfig(
        medium_name="Hydrogen_SFA",   # output saved to ./Hydrogen_SFA/
        emcee_nsteps=3000,
        latex_mapping={"E_g": r"$E_g$ [a.u.]", "a1": r"$a_1$"},
    )
    result = retrieve(data_NA=data, config=cfg)

After the LS phase the pipeline automatically prints a Nσ comparison of
fitted values against stored reference parameters.  After emcee it saves:

* ``<name>_publication_corner.pdf``
* ``<name>_trace.pdf``
* ``<name>_fit_summary.tex``
* ``<name>_retrieval_result.json``

**Atom cold start** — E_g looked up from the NIST atomic database:

.. code-block:: python

    # No initial_params needed; E_g resolved automatically for known elements
    cfg = RetrievalConfig(medium_name="Kr", cma_maxiter=5000)
    result = retrieve(data_NA=data, config=cfg)

**Crystal cold start** — E_g must always be supplied for crystals:

.. code-block:: python

    cfg = RetrievalConfig(
        medium_name="MyNewCrystal",
        initial_params={"E_g": 0.30},   # a.u. — only required field
        is_crystal=True,
        ret_electron_density=True,       # Y is electron density, not P
        cma_maxiter=5000,
    )
    result = retrieve(data_NA=data, config=cfg)

RetrievalConfig key options
-----------------------------

.. list-table::
   :header-rows: 1
   :widths: 25 10 65

   * - Option
     - Default
     - Description
   * - ``medium_name``
     - ``"unknown"``
     - Used for output file names and stored-parameter lookup
   * - ``output_dir``
     - ``None``
     - Defaults to ``./medium_name/``
   * - ``initial_params``
     - ``None``
     - Dict of starting values; overrides stored params; only ``E_g``
       is required for cold start
   * - ``fixed_params``
     - ``None``
     - Names to hold fixed (default: ``["E_g"]`` for atoms,
       ``["a4"]`` for crystals)
   * - ``is_crystal``
     - ``False``
     - Enables ``m_eff`` as free parameter and crystal cold-start bounds
   * - ``ret_electron_density``
     - ``False``
     - Fit electron density instead of ionisation probability
   * - ``relative_bound``
     - ``5.0``
     - Search range = ``[val − 5|val|, val + 5|val|]``
   * - ``global_method``
     - ``"cma"``
     - ``"cma"`` or ``"differential_evolution"``
   * - ``cma_maxiter``
     - ``10000``
     - CMA-ES evaluation budget (plateau-stopped early when flat)
   * - ``emcee_nsteps``
     - ``3000``
     - MCMC steps per walker
   * - ``latex_mapping``
     - ``None``
     - ``{name: latex_string}`` for axis labels in plots and .tex output

Post-processing an existing chain
----------------------------------

Reload a chain and apply transformations (drop/derive variables, re-snap):

.. code-block:: python

    from gasfir.retrieval import post_process_mcmc

    post_process_mcmc(
        medium_name="Diamond_TBmBJ",
        output_dir="Diamond_TBmBJ",
        original_stored_params={"E_g": 0.235, "a1": 14, "a2": 1.7,
                                 "a3": -3.0, "a0_per_au3": 0.35,
                                 "m_eff": 0.115, "lattice_constant_au": 6.74},
        drop_vars=["a0"],
        derived_exprs={"a0_per_au3": "a0 / 6.74**3"},
        latex_mapping={"E_g": r"$E_g$ [a.u.]", "m_eff": r"$m_\mathrm{eff}$"},
        is_crystal=True,
    )

Simultaneous Fit to QS and Non-Adiabatic Data
----------------------------------------------

Pass ``data_QS`` to concatenate quasi-static residuals:

.. code-block:: python

    from gasfir.retrieval import RetrievalConfig, retrieve
    import numpy as np

    qs_data = pd.DataFrame({
        "field": field_amplitudes,
        "Y":     np.log(qs_rates),
    })

    cfg = RetrievalConfig(medium_name="Hydrogen_SFA", simultaneous=True)
    result = retrieve(data_NA=data, data_QS=qs_data, config=cfg)

Low-level: residual function + manual lmfit
-------------------------------------------

For custom optimisers or more control:

.. code-block:: python

    import lmfit
    from gasfir.fitting import ret_residual_function

    residual = ret_residual_function(data, uncertainty_Nadiabatic=0.05)

    p = lmfit.Parameters()
    p.add("E_g", value=0.5, vary=False)
    p.add("a0",  value=5.0, min=0.01)
    p.add("a1",  value=3.5, min=0.1)
    p.add("a2",  value=2.0, min=0.0)
    p.add("a3",  value=1.0)
    p.add("a4",  value=0.0, vary=False)

    result = lmfit.minimize(residual, p, method="least_squares")
    lmfit.report_fit(result)

Uncertainty Quantification with emcee
--------------------------------------

Use the automated pipeline — emcee runs automatically after LS.  For
process-level walker parallelism:

.. code-block:: python

    from gasfir.retrieval import RetrievalConfig, retrieve, make_emcee_pool

    cfg = RetrievalConfig(medium_name="Hydrogen_SFA", emcee_nsteps=3000)

    pool = make_emcee_pool()          # each worker sets NUMBA_NUM_THREADS=1
    cfg.emcee_pool = pool
    result = retrieve(data_NA=data, config=cfg)
    pool.close(); pool.join()

    for name in result.var_names:
        v = result.params[name]
        print(f"  {name} = {v.value:.4g} ± {v.stderr:.2g}")