Performance Optimization
========================

Numba JIT Compilation
---------------------

GASFIR's inner loops are JIT-compiled with Numba. The first call to any kernel
function triggers compilation, which may take several seconds; subsequent calls
reuse the compiled code and run at near-native speed. To pay the compilation
cost up front, run a short warm-up call before a long computation::

    from gasfir import create_pulse, get_parameters, get_diabatic_ionization_probability

    laser = create_pulse(800, 1e14, 0, 5)   # short pulse just for warm-up
    params = get_parameters("Hydrogen_SFA")
    _ = get_diabatic_ionization_probability(pulse=laser, param_dict=params)
    # From here on, compiled kernels are cached in memory.
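
To see whether the warm-up is worthwhile on a given machine, it can help to
time the first and second call of the same function. The helper below is a
minimal, standard-library sketch; ``time_first_and_second_call`` and
``toy_kernel`` are hypothetical names and not part of GASFIR.

```python
import time
from typing import Any, Callable, Tuple


def time_first_and_second_call(
    fn: Callable[..., Any], *args: Any, **kwargs: Any
) -> Tuple[float, float]:
    """Call ``fn`` twice with the same arguments; return both durations in seconds."""
    start = time.perf_counter()
    fn(*args, **kwargs)
    first = time.perf_counter() - start

    start = time.perf_counter()
    fn(*args, **kwargs)
    second = time.perf_counter() - start
    return first, second


def toy_kernel(n: int) -> int:
    """Stand-in workload (hypothetical); replace with a real GASFIR call."""
    return sum(i * i for i in range(n))


first, second = time_first_and_second_call(toy_kernel, 10_000)
print(f"first call: {first:.6f} s, second call: {second:.6f} s")
```

With GASFIR installed, one would pass
``get_diabatic_ionization_probability`` with ``pulse=laser`` and
``param_dict=params`` in place of ``toy_kernel``. For a JIT-compiled function
the first duration should be noticeably larger than the second.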

Pulse Result Caching
--------------------

Each :class:`~gasfir.Pulse` instance caches field/potential evaluations keyed by
the grid hash. Reusing the same time grid avoids redundant computation::

    t = laser.get_tgrid(dt=0.25)
    E1 = laser.get_electric_field(t)   # computed
    E2 = laser.get_electric_field(t)   # returned from cache
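
The idea of keying a cache by a grid hash can be illustrated with a small toy
class. This is only a sketch of the general technique, not GASFIR's actual
implementation; ``CachedField``, its ``omega`` argument and the cosine field
are assumptions made for the example.

```python
import hashlib
import math
from array import array
from typing import Dict, List, Sequence


class CachedField:
    """Toy stand-in for a pulse that memoises field evaluations per time grid."""

    def __init__(self, omega: float) -> None:
        self.omega = omega
        self.evaluations = 0  # counts real (non-cached) computations
        self._cache: Dict[str, List[float]] = {}

    @staticmethod
    def _grid_key(t: Sequence[float]) -> str:
        # Hash the raw bytes of the grid so that equal grids share one key.
        return hashlib.sha1(array("d", t).tobytes()).hexdigest()

    def get_electric_field(self, t: Sequence[float]) -> List[float]:
        key = self._grid_key(t)
        if key not in self._cache:
            self.evaluations += 1
            self._cache[key] = [math.cos(self.omega * ti) for ti in t]
        return self._cache[key]


field = CachedField(omega=0.057)
t = [0.25 * i for i in range(1000)]
E1 = field.get_electric_field(t)  # computed
E2 = field.get_electric_field(t)  # served from the cache
```

Because the key is derived from the grid contents, a grid with a different
step or length misses the cache and triggers a fresh evaluation.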

Batch Pre-Computation
---------------------

When computing probabilities for many pulses, use
:func:`~gasfir.get_diabatic_ionization_probability_vec`. It calls
:func:`~gasfir.precompute_pulse_batch` once to extract the NumPy arrays from
the pulse objects, then evaluates the kernel for every pulse in parallel across
all CPU cores::

    from gasfir import get_diabatic_ionization_probability_vec, get_parameters

    params = get_parameters("Hydrogen_SFA")
    # `pulses` is a sequence of Pulse objects created beforehand.
    probs  = get_diabatic_ionization_probability_vec(pulses, params)

During *fitting*, the grid pre-computation should happen once and be reused
across every parameter update.  :func:`~gasfir.ret_gasfir_P_for_dataFrame`
(and :func:`gasfir.fitting.ret_residual_function`) do exactly this: they run
:func:`~gasfir.precompute_pulse_batch` at construction time and return a fast
closure that only re-evaluates the kernel::

    from gasfir import ret_gasfir_P_for_dataFrame

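    # `df` is the DataFrame to fit against; `params` is a parameter dict
    # such as the one returned by get_parameters().
    P_func = ret_gasfir_P_for_dataFrame(df)   # pre-computes once
    probs  = P_func(params)                   # cheap to call per parameter set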
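
The "pre-compute once, return a closure" pattern described above can be
sketched in plain Python. The function name ``make_probability_function``, the
``"rate"`` parameter and the toy formula are hypothetical and chosen only to
show the structure; they do not reflect GASFIR's kernel.

```python
import math
from typing import Callable, Dict, List, Sequence


def make_probability_function(
    grids: Sequence[Sequence[float]],
) -> Callable[[Dict[str, float]], List[float]]:
    """Pre-compute parameter-independent data once; return a cheap closure."""
    # Parameter-independent step: runs exactly once, at construction time.
    precomputed = [[math.cos(0.057 * t) ** 2 for t in grid] for grid in grids]

    def probability(params: Dict[str, float]) -> List[float]:
        # Only the parameter-dependent part runs on every call.
        rate = params["rate"]
        return [1.0 - math.exp(-rate * sum(weights)) for weights in precomputed]

    return probability


grids = [[0.5 * i for i in range(100)], [0.5 * i for i in range(200)]]
P = make_probability_function(grids)  # pre-computes once
low = P({"rate": 1e-3})               # cheap per parameter set
high = P({"rate": 1e-2})
```

An optimizer can then call ``P`` thousands of times without repeating the
grid-dependent work, which is the property that matters during fitting.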

Choosing ``dt`` and ``dT``
--------------------------

* ``dT`` (fine grid) controls the accuracy of the phase integral. The default
  of 0.25 a.u. is safe for most conditions; increasing it to 0.5 a.u. gives a
  roughly 2× speedup at the cost of a mild loss of accuracy.
* ``dt`` (coarse grid) controls the sampling of the pulse envelope. The default
  of 2 a.u. is sufficient for pulses longer than ~5 optical cycles.
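
To put these step sizes in perspective, the sketch below converts a wavelength
into an optical period in atomic units of time and counts the grid points per
cycle. It assumes an 800 nm pulse and that the cost scales roughly linearly
with the number of fine-grid points; the function names are hypothetical and
not part of GASFIR.

```python
# Physical constants (CODATA values).
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0
ATOMIC_TIME_UNIT_S = 2.4188843265857e-17  # 1 a.u. of time in seconds


def optical_cycle_au(wavelength_nm: float) -> float:
    """Return the optical period in atomic units of time."""
    period_s = wavelength_nm * 1e-9 / SPEED_OF_LIGHT_M_PER_S
    return period_s / ATOMIC_TIME_UNIT_S


def samples_per_cycle(wavelength_nm: float, step_au: float) -> float:
    """Return the number of grid points per optical cycle for a given step."""
    return optical_cycle_au(wavelength_nm) / step_au


for name, step in [("dT", 0.25), ("dT", 0.5), ("dt", 2.0)]:
    n = samples_per_cycle(800.0, step)
    print(f"{name} = {step} a.u.: about {n:.0f} points per cycle at 800 nm")
```

At 800 nm one optical cycle lasts about 110 a.u., so doubling ``dT`` from 0.25
to 0.5 halves the number of fine-grid points per cycle, which is consistent
with the ~2× speedup quoted above under the linear-cost assumption.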