Omnibus
This space is for those who, like me, are deeply curious about the future.
Semantic PRMs
December 2024 · 5 min read
Most AI models are judged solely on their final answers, especially in natural language tasks. Process Reward Models (PRMs), which score intermediate steps rather than only the final output, have proven effective in formal mathematical domains where steps can be verified against axiomatic truths, but their application to tasks without formal verification remains largely unexplored. Using Llama 3.1 (8B parameters), I developed an approach that extends PRM principles from mathematical proof verification to semantic reasoning tasks, specifically The New York Times' Connections game.
The New York Times' Connections game presents players with a 4x4 grid of 16 words that must be categorized into four distinct groups of four words each. Each group shares a common theme or relationship (e.g., "Types of Birds", "Words Ending in 'AT'", "Chess Terms"). The challenge lies in the semantic ambiguity—words can often appear to belong to multiple categories, requiring iterative hypothesis testing and backtracking. This makes it an ideal testbed for studying non-deterministic reasoning processes, as it combines elements of both logical deduction and semantic understanding.
The experiment demonstrates how reward shaping can incentivize step-by-step reasoning in domains where "correctness" isn't binary. Unlike traditional approaches that optimize for end-state accuracy through metrics like F1-score or accuracy, this system implements a novel reward function R(s,a) that decomposes the solving process into verifiable sub-steps, where s represents the current state and a represents the action taken.
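As a rough illustration of what a step-level reward R(s,a) might look like for Connections, here is a minimal sketch. It is not the reward function used in the experiment: the `State` and `step_reward` names, the specific reward values, and the example words are all assumptions made for illustration.

```python
from dataclasses import dataclass

Group = frozenset[str]


@dataclass(frozen=True)
class State:
    remaining: frozenset[str]       # words not yet placed in a solved group
    solved: tuple[Group, ...] = ()  # groups already confirmed


def step_reward(state: State, action: Group, solution: list[Group]) -> float:
    """Hypothetical R(s, a): partial credit for a proposed group of four."""
    # Illegal actions: wrong size, or words that are no longer on the board.
    if len(action) != 4 or not action <= state.remaining:
        return -1.0
    # Best overlap between the guess and any still-unsolved true group.
    unsolved = [g for g in solution if g not in state.solved]
    best = max((len(action & g) for g in unsolved), default=0)
    # 4/4 is a solved group, 3/4 is "one away", anything else is penalized.
    # These magnitudes are arbitrary placeholders.
    return {4: 1.0, 3: 0.25}.get(best, -0.25)


# Made-up puzzle used only to exercise the function.
solution = [
    frozenset({"robin", "finch", "crane", "swift"}),
    frozenset({"rook", "bishop", "knight", "pawn"}),
    frozenset({"bat", "cat", "hat", "mat"}),
    frozenset({"fork", "pin", "skewer", "check"}),
]
state = State(remaining=frozenset().union(*solution))

print(step_reward(state, frozenset({"robin", "finch", "crane", "swift"}), solution))
print(step_reward(state, frozenset({"robin", "finch", "crane", "rook"}), solution))
```

The point of the sketch is that a guess which is "one away" earns a different signal from a guess that is far off, so the model gets feedback on each step rather than only on the finished grid.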
Why This Is Unique:
Traditional PRM applications excel in domains with formal verification methods—theorem proving, symbolic mathematics, and logical inference chains where each step can be validated against established axioms. The innovation here lies in extending PRM methodology to semantic spaces where intermediate states can't be validated through formal logic alone.
Consider the contrast:
Mathematical PRM: ∀x∈ℝ, if f(x) = x² + 2x + 1, then f'(x) = 2x + 2
- Each step is formally verifiable through calculus rules
- Clear success metrics for intermediate states
- Deterministic validation
Semantic PRM (proposed approach):
- Probabilistic state evaluation: P(category|words) ∈ [0,1]
- Fuzzy set membership for word groupings
- Bayesian updating of confidence scores as context evolves
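To make the Bayesian updating idea concrete, here is a small sketch of how a confidence score P(category|words) could be revised as new evidence arrives. The function name `bayes_update`, the category labels, and all of the probabilities are hypothetical; the post does not specify how the actual system computes or updates these scores.

```python
def bayes_update(prior: dict[str, float], likelihood: dict[str, float]) -> dict[str, float]:
    """Posterior P(category | evidence), proportional to likelihood * prior."""
    unnormalized = {c: prior[c] * likelihood.get(c, 0.0) for c in prior}
    total = sum(unnormalized.values())
    if total == 0.0:
        # The evidence rules out every category; fall back to the prior.
        return dict(prior)
    return {c: v / total for c, v in unnormalized.items()}


# Made-up example: the word "swift" could be a bird or a speed-related word.
prior = {"birds": 0.5, "speed": 0.5}

# Assumed evidence: three other bird words were just grouped without "swift",
# which makes the "birds" reading less likely under this toy likelihood.
likelihood = {"birds": 0.2, "speed": 0.8}

posterior = bayes_update(prior, likelihood)
print(posterior)
```

Each new observation (a rejected guess, a solved group) can be fed in as another likelihood, with the previous posterior serving as the next prior.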
Generalizing Beyond Math:
So why go through all this trouble of mapping the puzzle-solving process step by step? Because it generalizes. We already know stepwise explainability is critical in math and logic-based domains. However, if you can train an AI to reason about something as open-ended as grouping words by theme—where creativity and interpretation are crucial—then you're essentially showing it how to handle reasoning in a broader context.
- Creative Brainstorming: In writing assistance or idea generation, an AI that explains how it arrived at each suggestion could keep track of thought tangents, preventing rehashed ideas and encouraging true novelty.
- Complex Decision-Making: In areas like law or finance, it's not just about the final verdict or recommendation but also why you arrived there. If an AI can systematically walk through evidence, precedents, and logical inferences, it makes the output more trustworthy.
- Medical Diagnosis: Similar logic applies when a doctor needs to understand each piece of the puzzle, from symptom analysis to recommended treatments. A stepwise approach allows for better auditing and ensures the AI isn't making dangerous leaps.
The Motivation Behind This Experiment:
The core challenge was adapting PRM frameworks—typically grounded in formal logic—to handle semantic uncertainty. While mathematical PRMs can rely on proof assistants like Coq or Lean for verification, semantic reasoning requires a more nuanced approach to intermediate state validation. The implications of extending PRMs beyond mathematical domains suggest a pathway toward more robust reasoning capabilities in language models. Key areas of potential impact include:
- Enhanced Chain-of-Thought Generation: By decomposing complex reasoning tasks into verifiable sub-steps, models can develop more structured and reliable thought processes, potentially reducing hallucinations and improving output consistency.
- Cross-Domain Reasoning Transfer: The ability to apply mathematical PRM principles to semantic tasks suggests potential for transfer learning between formal and informal reasoning domains. This could lead to more sophisticated hybrid reasoning systems.
- Scalable Architecture Improvements: The success of PRMs in semantic spaces opens possibilities for architectural modifications in future language models, particularly in attention mechanisms and state representation layers that handle intermediate reasoning steps.
This experiment demonstrates how mathematical rigor typically associated with formal proof systems can be adapted for natural language tasks. By implementing PRM principles in the Connections puzzle domain, we've shown a potential pathway for improving model reasoning capabilities across both deterministic and probabilistic problem spaces.
The key insight isn't just about solving puzzles; it's about developing more sophisticated reasoning architectures that can handle both formal logic and semantic uncertainty with equal rigor. As language models continue to evolve, this fusion of mathematical precision with semantic flexibility could become a crucial component in advancing their cognitive capabilities.
A Hitchhiker's Guide to X
October 2024 · 1 min read
xAI Hackathon winner. Using Grok's embedding model, it transforms Twitter/X posts into a personalized, ever-expanding knowledge graph that lets users explore ideas and insights in depth, sparking new inspirations.
PlanForm
June 2024 - September 2024 · 1 min read
Empowering educators with AI-driven, one-click personalization, PlanForm transforms students' learning experiences by dynamically tailoring activities using fine-tuned LLMs.
Rabbit-Hole
September 2024 · 1 min read
Rabbit-Hole is an AI-driven learning tool that simulates the immersive experience of "going down a rabbit hole." It guides users through layered topics and questions, adapting to their curiosity by continuously offering deeper insights and related content. The tool integrates with a text-to-Manim model, enabling visual learning through educational animations.
Hephaestus Robotics
December 2020 - June 2023 · 1 min read
Led a team of 12 to design and build an underwater ROV (Remotely Operated Vehicle) for marine research and conservation. The ROV features advanced capabilities including 3D scanning of coral reefs, laser-based distance measurement, and precision manipulation tools.