
Building a Voice-Activated AI Assistant for My Portfolio

October 24, 2025 • 15 min read

AI • Web Development • Voice Recognition • Next.js

Building a Voice-Activated, Page-Aware AI Assistant

The Problem

Portfolio sites are static. Visitors scan content, leave, and rarely engage deeply. I wanted something different: an interface that responds to context, understands intent, and works as naturally on mobile as it does on desktop.

The Approach

The system combines three core technologies: the Web Speech API for voice recognition, retrieval-augmented generation (RAG) for answers grounded in portfolio content, and context injection to tell the model which page the user is viewing.

Page awareness means the assistant knows where you are. Ask "What's this about?" on the projects page and then on the Now page, and you get different answers, each relevant to the page you're on. Before generating a response, the system passes the current page's metadata, its content structure, and the visitor's location on the site to the AI model.
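One way to picture the injection step: the visitor's question stays the same, and only the system prompt changes with the page. This is a minimal sketch, assuming the page has already been reduced to a small context object; `PageContext`, `buildSystemPrompt`, the prompt wording, and the sample page data are hypothetical, not the site's actual code.

```typescript
// Hypothetical shape of the page context passed to the model.
interface PageContext {
  pageType: string;          // e.g. "projects" or "now"
  path: string;              // current route
  title: string;             // page title
  primaryHeadings: string[]; // main headings on the page
  summary: string;           // short summary of the page content
}

// Puts the current page into the system prompt so that an ambiguous question
// such as "What's this about?" is resolved against the page being viewed.
function buildSystemPrompt(ctx: PageContext): string {
  const headings = ctx.primaryHeadings.length > 0 ? ctx.primaryHeadings.join("; ") : "none";
  return [
    "You are the assistant on a personal portfolio site.",
    `The visitor is viewing the ${ctx.pageType} page: "${ctx.title}" (${ctx.path}).`,
    `Main headings: ${headings}.`,
    `Page summary: ${ctx.summary}`,
    'If a question says "this" without naming a subject, assume it refers to the current page.',
  ].join("\n");
}

// Illustrative data: the user message is identical on both pages; only the system prompt differs.
const onProjects = buildSystemPrompt({
  pageType: "projects", path: "/projects", title: "Projects",
  primaryHeadings: ["Lorenz visualization"], summary: "Selected projects.",
});
const onNow = buildSystemPrompt({
  pageType: "now", path: "/now", title: "Now",
  primaryHeadings: [], summary: "Current work and interests.",
});
```

Keeping the page in the system prompt rather than rewriting the user's question leaves the visitor's wording untouched, which also keeps the visible conversation history honest.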

Voice-first design solves mobile UX challenges. Typing on mobile is slow and error-prone; speaking is usually faster and takes less effort. The interface degrades gracefully: if voice recognition fails or isn't supported, users can type instead.

Technical Architecture

Voice Input Layer

• Web Speech API handles voice-to-text conversion
• Real-time transcription displays as users speak
• Automatic stop after 3 seconds of silence
• Manual stop via double-click interaction
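The voice input behaviour in this list can be sketched as follows. It is a minimal sketch, not the site's code: it declares its own small `Recognizer` interface instead of relying on a particular version of the DOM typings, and `buildTranscript` and `startListening` are hypothetical names. A real recognizer would be created from the browser's `SpeechRecognition` (or `webkitSpeechRecognition`) constructor; the 3-second silence window is taken from the post.

```typescript
// Minimal local types for the parts of the Web Speech API used here.
interface RecognitionAlternative {
  transcript: string;
}
interface RecognitionResult {
  readonly isFinal: boolean;
  readonly [index: number]: RecognitionAlternative;
}
interface RecognitionResultEvent {
  readonly results: ArrayLike<RecognitionResult>;
}
interface Recognizer {
  interimResults: boolean;
  onresult: ((event: RecognitionResultEvent) => void) | null;
  onend: (() => void) | null;
  start(): void;
  stop(): void;
}

interface Transcript {
  finalText: string;   // text the engine has committed to
  interimText: string; // provisional text, shown live while the user speaks
}

const SILENCE_MS = 3000; // auto-stop after 3 seconds without new results

// Splits the accumulated results into committed and provisional text.
function buildTranscript(results: ArrayLike<RecognitionResult>): Transcript {
  const finalParts: string[] = [];
  const interimParts: string[] = [];
  for (let i = 0; i < results.length; i++) {
    const text = results[i][0].transcript.trim();
    if (text) (results[i].isFinal ? finalParts : interimParts).push(text);
  }
  return { finalText: finalParts.join(" "), interimText: interimParts.join(" ") };
}

// Starts listening, streams transcripts to the UI, and stops after SILENCE_MS of silence.
// The returned function stops recognition manually (e.g. from a double-click handler).
function startListening(recognizer: Recognizer, onTranscript: (t: Transcript) => void): () => void {
  let silenceTimer: ReturnType<typeof setTimeout> | undefined;
  const clearSilenceTimer = () => {
    if (silenceTimer !== undefined) clearTimeout(silenceTimer);
    silenceTimer = undefined;
  };
  const resetSilenceTimer = () => {
    clearSilenceTimer();
    silenceTimer = setTimeout(() => recognizer.stop(), SILENCE_MS);
  };

  recognizer.interimResults = true; // deliver partial results for live transcription
  recognizer.onresult = (event) => {
    onTranscript(buildTranscript(event.results));
    resetSilenceTimer(); // new speech pushes the deadline back
  };
  recognizer.onend = clearSilenceTimer;
  recognizer.start();
  resetSilenceTimer(); // also covers the case where the user never speaks
  return () => recognizer.stop();
}
```

Resetting the timer on every `result` event is what turns "stop after 3 seconds of silence" into code: the deadline only expires once the engine stops producing results.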

Context Layer

• Extracts current page path, title, and main content
• Builds structured context object with page metadata
• Injects relevant information into prompt before API call
• Re-extracts page context for every query, keeping the extraction lightweight
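A minimal sketch of the context layer, assuming the raw values have already been read from the page (for example from `location.pathname`, `document.title`, and the page's `h1`/`h2` and paragraph elements). The field names, the limits, and the plain truncation used as a stand-in for a real summary are all assumptions; the choice of fields follows the page type, primary headings, and content summary described under Implementation Challenges.

```typescript
// Raw values read from the current page (hypothetical shape).
interface RawPage {
  path: string;
  title: string;
  headings: { level: number; text: string }[];
  paragraphs: string[];
}

// Structured context object handed to the prompt builder (hypothetical shape).
interface PageContext {
  pageType: string;          // derived from the URL, e.g. "projects"
  path: string;
  title: string;
  primaryHeadings: string[]; // h1/h2 only
  summary: string;           // truncated main text, a crude stand-in for a real summary
}

const MAX_HEADINGS = 5;        // assumption
const MAX_SUMMARY_CHARS = 400; // assumption: keeps prompts short

// "/" -> "home", "/projects/lorenz" -> "projects"
function pageTypeFromPath(path: string): string {
  const segments = path.split("/").filter((s) => s.length > 0);
  return segments.length > 0 ? segments[0] : "home";
}

function buildPageContext(page: RawPage): PageContext {
  const primaryHeadings = page.headings
    .filter((h) => h.level <= 2)
    .map((h) => h.text.trim())
    .filter((text) => text.length > 0)
    .slice(0, MAX_HEADINGS);
  // Collapse whitespace, then cap the length so the prompt stays focused.
  const text = page.paragraphs.join(" ").replace(/\s+/g, " ").trim();
  const summary =
    text.length > MAX_SUMMARY_CHARS ? text.slice(0, MAX_SUMMARY_CHARS).trimEnd() + "..." : text;
  return {
    pageType: pageTypeFromPath(page.path),
    path: page.path,
    title: page.title.trim(),
    primaryHeadings,
    summary,
  };
}
```

Capping headings and text at fixed sizes is the simplest way to enforce "less, more relevant information"; a smarter version could rank sections instead of truncating.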

RAG System

• Static knowledge base with structured portfolio content
• Keyword-based text search retrieves relevant entries
• Claude API generates responses using retrieved information
• Responses limited to verified, context-specific information
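The retrieval and generation steps in this list can be sketched as follows, assuming a small in-memory knowledge base. The `KnowledgeEntry` shape, the stop-word list, the scoring weights (title and tag matches count double), and the 500-token cap are illustrative choices, not the site's actual code. The request follows the public Anthropic Messages API; the model id is a parameter because the post doesn't name one, and `askClaude` is meant to run server-side (for example in a Next.js route handler) so the API key never reaches the browser.

```typescript
// Hypothetical shape for one entry in the static knowledge base.
interface KnowledgeEntry {
  id: string;
  title: string;
  tags: string[];
  content: string;
}

// Common words that carry no search signal (illustrative list).
const STOP_WORDS = new Set([
  "a", "an", "the", "and", "or", "of", "to", "in", "on", "is", "are",
  "what", "do", "you", "me", "about", "tell", "with", "this", "your",
]);

function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 1 && !STOP_WORDS.has(t));
}

// Keyword retrieval: +2 per query term in title/tags, +1 per query term in content.
function retrieve(query: string, kb: KnowledgeEntry[], k = 3): KnowledgeEntry[] {
  const terms = new Set(tokenize(query));
  return kb
    .map((entry) => {
      const meta = new Set(tokenize(`${entry.title} ${entry.tags.join(" ")}`));
      const body = new Set(tokenize(entry.content));
      let score = 0;
      terms.forEach((term) => {
        if (meta.has(term)) score += 2;
        if (body.has(term)) score += 1;
      });
      return { entry, score };
    })
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((r) => r.entry);
}

// Request body for Anthropic's Messages API (POST https://api.anthropic.com/v1/messages).
interface ClaudeRequest {
  model: string;
  max_tokens: number;
  system: string;
  messages: { role: "user" | "assistant"; content: string }[];
}

// Restricts the model to the page context and retrieved entries.
function buildClaudeRequest(question: string, pageContext: string, entries: KnowledgeEntry[], model: string): ClaudeRequest {
  const sources = entries.map((e) => `## ${e.title}\n${e.content}`).join("\n\n");
  return {
    model,
    max_tokens: 500,
    system: [
      "You answer questions about this portfolio.",
      "Use only the page context and sources below. If they do not contain the answer, say you don't know.",
      `Page context:\n${pageContext}`,
      `Sources:\n${sources || "(none found)"}`,
    ].join("\n\n"),
    messages: [{ role: "user", content: question }],
  };
}

// Server-side call; returns the concatenated text blocks of the reply.
async function askClaude(body: ClaudeRequest, apiKey: string): Promise<string> {
  const res = await fetch("https://api.anthropic.com/v1/messages", {
    method: "POST",
    headers: { "x-api-key": apiKey, "anthropic-version": "2023-06-01", "content-type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Claude API error: ${res.status}`);
  const data = (await res.json()) as { content: { type: string; text?: string }[] };
  return data.content.filter((b) => b.type === "text").map((b) => b.text ?? "").join("");
}
```

Keyword overlap is cheap and predictable, which suits a small portfolio; its weakness is that it misses synonyms.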

Interface Layer

• Facebook Messenger-style chat widget
• Persistent conversation history within session
• Mobile-optimised touch interactions
• Graceful fallback to text input when voice fails
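Keeping the conversation for the length of a session maps naturally onto `sessionStorage`, which survives reloads and navigation within a tab and is cleared when the tab closes. The post doesn't say how history is stored, so this is an assumption, as are the storage key and the 50-message cap. The functions accept any object with `getItem`/`setItem`, so `window.sessionStorage` can be passed in directly.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  text: string;
}

// The subset of the Web Storage API used here; window.sessionStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const HISTORY_KEY = "assistant-history"; // hypothetical storage key
const MAX_MESSAGES = 50;                 // assumption: cap stored history

function loadHistory(store: KeyValueStore): ChatMessage[] {
  const raw = store.getItem(HISTORY_KEY);
  if (raw === null) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    // Trusts the stored shape; a stricter version would validate each item.
    return Array.isArray(parsed) ? (parsed as ChatMessage[]) : [];
  } catch {
    return []; // corrupted entry: start a fresh conversation
  }
}

// Appends a message, keeps only the most recent MAX_MESSAGES, and persists the result.
function appendMessage(store: KeyValueStore, message: ChatMessage): ChatMessage[] {
  const history = [...loadHistory(store), message].slice(-MAX_MESSAGES);
  store.setItem(HISTORY_KEY, JSON.stringify(history));
  return history;
}
```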

Implementation Challenges

Cross-browser voice recognition was the first hurdle. iOS Safari and Chrome handle the Web Speech API differently. Safari requires explicit user gestures and doesn't support continuous listening. The solution: detect browser capabilities and adjust behaviour accordingly.
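A minimal sketch of the capability check, under stated assumptions: the constructor lookup covers the standard and `webkit`-prefixed names, the iOS test is a crude user-agent heuristic, and the list of error codes that trigger the text fallback is a judgement call. All function names are hypothetical.

```typescript
type InputMode = "voice" | "text";

interface VoiceConfig {
  mode: InputMode;
  continuous: boolean; // keep listening across pauses
}

// Speech recognition error codes after which the widget switches to the text box.
// "no-speech" and "aborted" are left out because simply retrying is reasonable.
const FALLBACK_ERRORS = new Set([
  "not-allowed", "service-not-allowed", "audio-capture", "network", "language-not-supported",
]);

// Returns the standard or webkit-prefixed SpeechRecognition constructor, if any.
function getRecognitionConstructor(): unknown {
  const g = globalThis as unknown as { SpeechRecognition?: unknown; webkitSpeechRecognition?: unknown };
  return g.SpeechRecognition ?? g.webkitSpeechRecognition;
}

// Crude heuristic: misses iPads that report a desktop (Mac) user agent.
function isIOS(userAgent: string): boolean {
  return /iPad|iPhone|iPod/.test(userAgent);
}

function chooseVoiceConfig(hasRecognition: boolean, userAgent: string): VoiceConfig {
  if (!hasRecognition) return { mode: "text", continuous: false }; // e.g. Firefox
  // The post notes that iOS Safari does not support continuous listening.
  return { mode: "voice", continuous: !isIOS(userAgent) };
}

function shouldFallBackToText(errorCode: string): boolean {
  return FALLBACK_ERRORS.has(errorCode);
}
```

In the browser, `chooseVoiceConfig(getRecognitionConstructor() !== undefined, navigator.userAgent)` picks the mode once on load. Because Safari requires an explicit user gesture, `start()` should be called directly inside the tap or click handler rather than after an `await`.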

Mobile performance required careful optimisation. Voice processing, context extraction, and the API round trip all have to finish quickly for the assistant to feel responsive. The system keeps context extraction to the essentials and keeps requests small, so it stays fast across devices.

Context relevance demanded experimentation. Early versions passed too much page data, creating noisy prompts. The system now extracts only essential information: page type, primary headings, and key content summaries. This keeps prompts focused and responses accurate.

Current State

The assistant handles conversational queries about my work, projects, and background. Responses are generally quick across devices. Voice recognition works well in typical environments, though accuracy varies with ambient noise and microphone quality.

It works across iOS Safari, Chrome (desktop and Android), and Edge. Firefox has limited Web Speech API support, so the system defaults to text input.

Users can ask questions like "What technologies do you work with?" or "Tell me about the Lorenz visualization" and receive contextually appropriate responses based on their current page.

Example: On the projects page, asking "What's this project about?" returns details about the specific project being viewed. On the Now page, the same question provides context about current work and interests.

Trade-offs

Voice interfaces aren't universally better. They require microphone permissions, work poorly in noisy environments, and some users simply prefer typing. The system accommodates both interaction modes rather than forcing voice-first.

Page awareness adds complexity. Every query requires context extraction and processing. For static content, this overhead might not justify the benefit. For a portfolio showcasing AI capabilities, it demonstrates technical depth.

Lessons Learned

Start with text, add voice second. Building the chat interface first made voice integration cleaner. Voice became an input method rather than the entire feature.

Context quality matters more than quantity. Sending less, more relevant information produces better responses than dumping entire page content into prompts.

Mobile-first isn't optional. A significant portion of portfolio traffic comes from mobile devices. A feature that works poorly on mobile effectively doesn't work.

Graceful degradation is critical. Voice recognition fails. APIs time out. Browsers lack support. The system needs fallback paths at every layer.
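For the API layer, a fallback path can be as simple as a timeout plus a canned reply. This is a minimal sketch, assuming the browser calls a hypothetical `/api/ask` route that returns `{ answer: string }`; the route, the response shape, the 10-second timeout, and the fallback wording are all assumptions. The fetch function is injected so the fallback logic can be exercised without a network; in the browser, pass the global `fetch`.

```typescript
// The subset of fetch used here; the global fetch satisfies this type.
type Fetcher = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string; signal: AbortSignal },
) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

const FALLBACK_REPLY = "Sorry, I couldn't reach the assistant just now. Please try again in a moment.";

// Asks the (hypothetical) /api/ask route; any failure or timeout yields FALLBACK_REPLY.
async function askWithFallback(question: string, fetcher: Fetcher, timeoutMs = 10000): Promise<string> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetcher("/api/ask", {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ question }),
      signal: controller.signal,
    });
    if (!res.ok) return FALLBACK_REPLY; // server error: degrade instead of throwing
    const data = (await res.json()) as { answer?: unknown };
    return typeof data.answer === "string" ? data.answer : FALLBACK_REPLY;
  } catch {
    return FALLBACK_REPLY; // network error or timeout (the abort rejects the request)
  } finally {
    clearTimeout(timer);
  }
}
```

Returning a reply instead of throwing keeps the chat widget in a usable state: the visitor sees a message and can retry or keep typing.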

Try It

Visit any page and click the "Ask Miles" button. Ask about projects, technologies, or current work. The assistant understands context and provides relevant responses.

Or just type if you prefer. Both work.

What's Next?

The current implementation uses a static knowledge base. Queries match against pre-defined content structures. This works for straightforward questions about specific projects or skills.

Vector embeddings would enable semantic search across all content. Instead of keyword matching, the system could understand intent. A query like "show me projects involving real-time data" would surface relevant work even if those exact words don't appear in project descriptions.
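The ranking step of semantic search is small once embeddings exist. This is a minimal sketch of a possible future design, not something the site does today: it assumes each project description has already been turned into an embedding vector by some embedding model (the post doesn't choose one). The document ids and three-dimensional vectors are toy values; real embeddings have hundreds or thousands of dimensions.

```typescript
interface EmbeddedDoc {
  id: string;       // hypothetical document id
  vector: number[]; // precomputed embedding of the document text
}

// Cosine similarity: 1 for vectors pointing the same way, 0 for orthogonal vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same dimension");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Ranks documents by similarity to the query embedding and keeps the top k.
function semanticSearch(queryVector: number[], docs: EmbeddedDoc[], k = 3): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(queryVector, d.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy example: the query vector points mostly along the first axis.
const docs: EmbeddedDoc[] = [
  { id: "realtime-dashboard", vector: [0.9, 0.1, 0] },
  { id: "static-site", vector: [0.1, 0.9, 0] },
];
const ranked = semanticSearch([1, 0, 0], docs);
```

Because matching happens on vectors rather than words, a query and a description can score highly without sharing any terms, which is exactly what keyword matching cannot do.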

The trade-off is complexity. Vector databases add infrastructure overhead, embedding generation adds latency, and semantic search introduces uncertainty in result relevance. For a portfolio with limited content, static context may be sufficient.

I'm currently evaluating whether the improved response quality justifies the added complexity.

Technical Details: Next.js 15, TypeScript, Web Speech API, Anthropic Claude API, static knowledge base (investigating vector embeddings for future enhancement)

Code: Link to relevant sections on GitHub
