<aside> 🎯

Master AI & ML with Educatum, Your AI University

Curated resources from leading universities and industry experts to help you master artificial intelligence.

Build your knowledge base and prepare for interviews.

Join a study group and learn together.

Discover top AI tools and companies.

Connect with like-minded professionals.

No ads, no noise.

</aside>

[Daily AI Interview Questions] 7. Why has ReAct become a foundational prompting paradigm for LLM Agents?

The ReAct (Reasoning and Acting) paradigm, introduced in ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2023), shifted how Large Language Models can be orchestrated to behave as goal-directed agents.

Importantly, ReAct does not introduce a new neural architecture. Instead, it defines a structured prompting and control pattern that interleaves internal reasoning traces with external tool execution within a unified autoregressive context.

Prior to ReAct, two dominant prompting strategies were common:

- **Reasoning-only** (e.g., Chain-of-Thought): the model produces intermediate reasoning steps but never interacts with the world, so errors in its parametric knowledge propagate unchecked.
- **Acting-only**: the model issues tool or API commands directly without articulating a plan, which hurts performance on tasks requiring multi-step composition.

ReAct unifies these by interleaving reasoning tokens (“Thought”) with executable commands (“Action”), followed by environment feedback (“Observation”), forming an iterative loop within a single prompt structure.
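The interleaved context described above, a task followed by (Thought, Action, Observation) triples, can be sketched as plain string assembly. The field names and layout below are illustrative assumptions, not any specific framework's API:

```python
# Assemble the ReAct-style interleaved context: the task x followed by
# (Thought z_i, Action a_i, Observation o_i) triples in a single prompt.
def build_context(task: str, steps: list[tuple[str, str, str]]) -> str:
    lines = [f"Question: {task}"]
    for thought, action, observation in steps:
        lines.append(f"Thought: {thought}")
        lines.append(f"Action: {action}")
        lines.append(f"Observation: {observation}")
    return "\n".join(lines)

ctx = build_context(
    "Who wrote the ReAct paper?",
    [("I should search for the paper.", "Search[ReAct paper authors]",
      "ReAct: Synergizing Reasoning and Acting in Language Models, Yao et al.")],
)
```

Each completed triple is appended back into the same autoregressive context, so the model's next "Thought" conditions on everything before it.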

This prompting paradigm influenced many modern agentic frameworks (e.g., LangChain-style tool loops and AutoGPT-style planners) because of three practical advantages:

- **Interpretability:** explicit "Thought" traces expose the model's intermediate plan, making failures easier to audit and debug.
- **Grounding:** "Observation" feedback injects external evidence into the context, reducing reliance on parametric memory alone.
- **Flexibility:** the pattern is model- and tool-agnostic, requiring only prompting rather than architectural changes or fine-tuning.

đź§Ş Core Insights & Mathematical Foundations

$$
\begin{aligned}
& \text{[Standard Autoregressive Policy]: } \pi_\theta(y_t \mid c_t) \\
& \text{[ReAct Context]: } c_t = (x, z_1, a_1, o_1, \dots, z_{t-1}, a_{t-1}, o_{t-1}) \\
& \text{[Joint Token/Action Policy]: } P_\theta(z_t, a_t \mid c_t) \\
& \text{[Environment Transition]: } o_t = \mathcal{E}(a_t) \\
& \text{[Augmented Action Space (Conceptual)]: } \mathcal{A}_{ReAct} = \mathcal{A} \cup \mathcal{Z}
\end{aligned}
$$

Clarification:

Although this can be interpreted under a policy-based or reinforcement learning lens, most ReAct systems are deployed via prompting on pretrained autoregressive language models. They do not explicitly optimize a reward objective such as

$$ J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta} \left[ \sum_{t=1}^{T} \gamma^{t} r(s_t, a_t) \right] $$

The RL framing is therefore a theoretical abstraction rather than a description of standard training practice.

Follow-up 1: Explain the execution loop and how the Observation phase mitigates hallucination.

The ReAct mechanism operates on an autoregressive loop:

Thought → Action → Observation.

The “Thought” is a generated text span where the model plans its next step (e.g., identifying required information before performing a calculation). The “Action” is a structured command formatted according to predefined tool schemas (e.g., Search[...]).

In most implementations, generation is programmatically halted when an action token is emitted. The external environment executes the command and returns an “Observation,” which is appended to the model’s context window.

This halting-and-appending process constrains hallucination by injecting non-parametric evidence into the context. Instead of relying solely on internal parametric memory, the model conditions its subsequent reasoning on retrieved data.

However, hallucination is reduced, not eliminated. The model may still:

- misinterpret a correct observation while reasoning over it,
- over-trust a retrieved result that is itself wrong or irrelevant, or
- fabricate details when an action returns no useful information.

Thus, the Observation phase improves epistemic grounding but does not guarantee factual correctness.

Follow-up 2: How do frameworks like Toolformer and DSPy optimize or replace the ReAct paradigm? (Optional)

While ReAct is interpretable and flexible, it can be token-heavy and sensitive to prompt design; reliance on in-context learning for structured tool calls may introduce formatting instability over long contexts. Toolformer addresses this by fine-tuning the model itself to emit API calls, learned through a self-supervised objective, so tool use no longer depends on fragile few-shot prompts. DSPy instead treats the prompt as a compiled artifact: the reasoning-and-acting pipeline is declared as modules, and an optimizer searches over instructions and demonstrations, replacing hand-tuned ReAct prompts with programmatically optimized ones.

[Daily AI Interview Questions] 6. Why has LoRA become the dominant choice for Parameter-Efficient Fine-Tuning?

LoRA (Low-Rank Adaptation) has revolutionized the adaptation of Large Language Models (LLMs) by addressing the "curse of dimensionality." It operates on the hypothesis that the change in weights during task-specific fine-tuning (ΔW) resides in a manifold of much lower "intrinsic dimensionality" than the original weight space. Instead of updating the massive pre-trained weight matrix W₀, LoRA freezes it and learns two smaller, low-rank matrices, A and B, that represent the update.

đź§Ş Core Insights & Mathematical Foundations

$$
\begin{aligned}
& \text{[LoRA Forward Pass]: } h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} (BAx) \\
& \text{[Low-Rank Factorization]: } \Delta W = B \cdot A, \quad A \in \mathbb{R}^{r \times k}, \; B \in \mathbb{R}^{d \times r}, \quad r \ll \min(d, k) \\
& \text{[Initial State]: } A \sim \mathcal{N}(0, \sigma^2), \quad B = 0 \implies \Delta W = 0 \text{ at } t=0 \\
& \text{[Weight Merging]: } W_{merged} = W_0 + \frac{\alpha}{r}(BA) \\
& \text{[DoRA Decomposition]: } W = m \frac{V + \Delta V}{\|V + \Delta V\|_c}, \quad \Delta V = BA \\
& \text{[Parameter Ratio]: } \Phi \approx \frac{r(d+k)}{d \cdot k}
\end{aligned}
$$
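The forward pass, zero-initialization, and merging identities above can be checked numerically. This is a toy numpy sketch with made-up dimensions, not a training-ready implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r, alpha = 64, 64, 8, 16           # output dim, input dim, rank, scaling
W0 = rng.normal(size=(d, k))             # frozen pretrained weight
A = rng.normal(scale=0.02, size=(r, k))  # A ~ N(0, sigma^2)
B = np.zeros((d, r))                     # B = 0  =>  Delta W = 0 at t = 0

def lora_forward(x):
    # h = W0 x + (alpha / r) * B A x  -- the adapter is two thin matmuls,
    # never materializing the full d x k update during training.
    return W0 @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=k)
assert np.allclose(lora_forward(x), W0 @ x)  # identical to base model at init

# After (simulated) training, merge the adapter for zero-overhead inference:
B = rng.normal(size=(d, r))                  # pretend B was updated
W_merged = W0 + (alpha / r) * (B @ A)
```

The final assertion in the test confirms that merging is exact: the adapted model and the merged single-matrix model compute the same function.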

Follow-up 1: Explain the initialization strategy and the role of the α/r scaling factor.

LoRA employs a specific asymmetrical initialization to maintain training stability. Matrix A is initialized with random Gaussian noise, while Matrix B is initialized to zero. This ensures that the product ΔW = BA is exactly zero at the start of training, meaning the model begins its fine-tuning process with the identical output as the original pre-trained model.

The scaling factor α/r acts as a normalization hyperparameter that decouples the rank (r) from the learning rate (η). When experimenting with different ranks, the magnitude of the adapter's update would naturally shift; however, by scaling the update by α/r, the user can keep the learning rate constant. This significantly reduces the hyperparameter search space when scaling from a small rank (e.g., r=8) to a larger one (e.g., r=64).
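The parameter ratio Φ from the math block above can be made concrete. The 4096×4096 projection size below is an assumption chosen to resemble an attention matrix in a ~7B-parameter model, purely for illustration:

```python
# Trainable-parameter fraction for one adapted matrix:
# Phi = r * (d + k) / (d * k)
def lora_param_ratio(d: int, k: int, r: int) -> float:
    return r * (d + k) / (d * k)

for r in (8, 64):
    phi = lora_param_ratio(4096, 4096, r)
    print(f"r={r:>2}: {phi:.4%} of the matrix's parameters are trainable")
# r= 8: 0.3906% ...
# r=64: 3.1250% ...
```

Even at the larger rank, the adapter trains roughly 3% of the weights of a single matrix, which is why sweeping r from 8 to 64 stays cheap.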

Follow-up 2: How do QLoRA and DoRA address the representational limitations of standard LoRA? (Optional)

While LoRA is efficient, it often exhibits a performance gap compared to Full Fine-Tuning (FFT) on complex reasoning tasks because it restricts updates to a fixed low-rank subspace. SOTA variants solve this by optimizing different parts of the weight update:

- **QLoRA** leaves the low-rank math unchanged but quantizes the frozen base weights to 4-bit (NF4), adding double quantization and paged optimizers, cutting memory enough to fine-tune very large models on a single GPU.
- **DoRA** decomposes the pretrained weight into a magnitude vector m and a direction V (the decomposition in the math block above), applying the low-rank update only to the direction; learning magnitude and direction separately narrows the gap to full fine-tuning.
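The DoRA decomposition listed in the math block above can also be verified numerically. This numpy sketch uses toy dimensions and treats ‖·‖_c as the column-wise norm, consistent with the formula:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, r = 32, 32, 4

V = rng.normal(size=(d, k))                   # frozen directional component
m = np.linalg.norm(V, axis=0, keepdims=True)  # column-wise magnitude ||V||_c
A = rng.normal(scale=0.02, size=(r, k))
B = np.zeros((d, r))                          # Delta V = BA = 0 at init

def dora_weight():
    # W = m * (V + Delta V) / ||V + Delta V||_c  (column-wise normalization)
    Vp = V + B @ A
    return m * Vp / np.linalg.norm(Vp, axis=0, keepdims=True)

assert np.allclose(dora_weight(), V)  # reduces to the frozen weight at init

B = rng.normal(size=(d, r))           # pretend training updated B
W = dora_weight()
# The low-rank update only rotates columns; their norms stay pinned to m,
# which is what separates directional from magnitude learning.
assert np.allclose(np.linalg.norm(W, axis=0, keepdims=True), m)
```

In the full method m is itself a trainable vector; here it is held at its initial value ‖V‖_c to keep the sketch minimal.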


<aside> đź’ˇ

Offerings

</aside>

<aside>

<aside> <img src="/icons/gem_blue.svg" alt="/icons/gem_blue.svg" width="40px" /> AI ML Interview Questions

</aside>

<aside> <img src="/icons/gem_blue.svg" alt="/icons/gem_blue.svg" width="40px" /> AI ML Curated Courses

</aside>

<aside> <img src="/icons/gem_blue.svg" alt="/icons/gem_blue.svg" width="40px" /> AI Study Groups

</aside>

</aside>

<aside> <img src="/icons/reorder_gray.svg" alt="/icons/reorder_gray.svg" width="40px" />

Interview Prep

Learn AI ML

Contacts & Community

</aside>

Coming Soon