logo

Project Overview

Benedict Chat is an AI-powered conversational assistant built for the TTD Tech Radar platform that helps organizations navigate technology decisions through intelligent, context-aware conversations. Leveraging cutting-edge RAG (Retrieval-Augmented Generation) architecture, Benedict provides personalized technology recommendations by combining Azure AI Search with Azure Foundry models.

The system intelligently retrieves relevant information from TTD's Tech Radar database (150+ technologies) and website content, delivering accurate, source-cited responses while maintaining cost efficiency through sophisticated token management and rate limiting strategies. The solution achieved a remarkable 60-70% token reduction, significantly reducing operational costs while delivering sub-5 second response times.

Key Achievements

60-70% Token Reduction
Intelligent RAG Architecture
Highly Cost-Efficient
Reduced Operational Costs
Sub-5 Second Responses
Optimized Performance
Azure Foundry + AI Search
Hybrid Search with RAG

The Challenge

TTD Consulting needed an intelligent way to help clients and prospects explore the comprehensive Tech Radar database and understand which technologies best fit their specific needs. The challenges included:

  • Information Overload: 150+ technologies across 4 quadrants (Platforms, Tools, Languages & Frameworks, Techniques) and 4 rings (ADOPT, TRIAL, ASSESS, HOLD) created decision paralysis for users.
  • Context Complexity: Combining Tech Radar data with company services, case studies, and methodologies required understanding complex relationships and delivering relevant guidance.
  • Cost Management: Traditional approaches of sending all data to AI models resulted in 5,000-8,000 tokens per request, making the solution expensive at scale.
  • Response Quality: Generic responses without source attribution lacked credibility and actionable guidance for technology decision-making.
  • User Experience: Users needed conversational, intelligent guidance rather than manual database searching through technical documentation.
  • Scalability: Solution needed to handle multiple concurrent users with session isolation, rate limiting, and cost controls.
  • Real-Time Updates: Technology recommendations change frequently, requiring dynamic content retrieval without redeploying the AI model.

Our Solution

We designed and implemented a sophisticated RAG (Retrieval-Augmented Generation) architecture that combines Azure AI Search with Azure OpenAI's GPT-4o model to deliver intelligent, cost-effective technology recommendations.

RAG Architecture Components

  • Azure AI Search Service: Hybrid search strategy combining traditional keyword search with semantic vector search (1,536-dimensional embeddings) across dual indexes for Tech Radar (150+ technologies) and website content.
  • Azure Foundry: Advanced language model with Assistants API for stateful conversation threads, delivering intelligent responses with structured JSON formatting and source citations.
  • Embedding Service: text-embedding-ada-002 model with 24-hour caching using SHA256 hash keys and batch processing (up to 16 texts per API call) for cost optimization.
  • Smart Context Building: Intelligently retrieves top 8 relevant technologies and 2 most relevant web pages, building focused context within 4,000 token budget for optimal cost/quality balance.
  • Session Management: Independent conversation threads per user with 60-minute timeout, in-memory caching, and full message history preservation for context-aware responses.
  • Rate Limiting: Configurable per-session limits (default: 10 messages/minute) with session ID + IP address composite keys for fair use enforcement.

Performance Optimization

  • Token Reduction: RAG architecture reduced tokens from 5,000-8,000 to 1,500-3,000 per request (60-70% reduction) by retrieving only relevant context instead of full database.
  • Cost Efficiency: Significant reduction in operational costs through intelligent token management and context optimization, making the solution highly scalable.
  • Response Speed: Sub-5 second average response time for complex queries through parallel search execution and efficient context retrieval.
  • Relevance Filtering: Minimum 0.02 relevance score threshold ensures only high-quality, relevant results are included in AI context.
  • Caching Strategy: Embedding caching with 24-hour TTL and assistant ID caching dramatically reduces API calls and improves performance.

Key Features

  • Source Attribution: Every response includes cited sources with relevance scores, source types (TechRadar/Website), Tech Radar metadata (Ring, Quadrant), and direct links.
  • Conversation Intelligence: Stateful conversations with full message history, context-aware responses, and natural dialogue flow through GPT-4o Assistants API.
  • Real-Time Search: Dynamic content retrieval ensures recommendations always reflect latest Tech Radar updates without model retraining.
  • Security & Validation: XSS prevention, 2,000 character message limits, CORS configuration, function-level authorization, and comprehensive input sanitization.
  • Monitoring: Token usage tracking, cost estimation per interaction, response time metrics, source citation counts, and Application Insights integration.
  • Resilience: Polly-based retry policies, circuit breaker patterns, graceful degradation, and detailed error logging with correlation IDs.

Technical Excellence

Benedict Chat implementation followed modern software engineering practices and AI best practices:

  • Clean Architecture: Service-based abstractions with dependency injection, SOLID principles, and interface-driven design for testability and maintainability.
  • Async/Await Pattern: Fully asynchronous implementation for maximum scalability, enabling efficient handling of multiple concurrent chat sessions.
  • RAG Optimization: Intelligent token budgeting with 4,000 token context limit, relevance-based filtering, and smart result limiting for cost/quality balance.
  • Caching Strategy: Multi-layer caching (embeddings with 24-hour TTL, assistant IDs, search results) reduces API calls and improves performance.
  • Error Handling: Comprehensive try-catch blocks, structured logging with correlation IDs, and graceful degradation with user-friendly error messages.
  • Monitoring & Analytics: Complete Application Insights integration with custom metrics for token usage, costs, response times, and source citation tracking.
  • Security Best Practices: Input validation, XSS prevention, rate limiting, CORS configuration, and secure secret management through Azure Key Vault.

Technology Stack

  • Backend: .NET with Azure Functions (Isolated Worker Model), Azure OpenAI Service, Azure AI Search, Polly for resilience
  • AI Services: Azure Foundry (Assistants API), text-embedding-ada-002 (1,536 dimensions), Azure AI Search hybrid search with semantic ranking
  • Frontend: Angular with TypeScript, Angular Material Design, RxJS reactive services, Azure Static Web Apps hosting
  • Storage: Azure Blob Storage for data persistence, Azure Table Storage for analytics logs, in-memory caching (IMemoryCache)
  • DevOps: GitHub Actions for CI/CD, Application Insights for monitoring, environment-based configuration management

Results & Impact

Benedict Chat delivered exceptional business value and technical innovation for TTD Consulting:

60-70% Cost Reduction

RAG architecture reduced tokens from 5,000-8,000 to 1,500-3,000 per request, significantly lowering operational costs while maintaining high-quality responses.

Lightning-Fast Responses

Sub-5 second average response time for complex technology queries through optimized RAG context retrieval and parallel search execution.

Source-Backed Credibility

Every recommendation includes cited sources with relevance scores, building trust and providing actionable guidance for technology decisions.

Enhanced User Engagement

Conversational interface makes Tech Radar exploration intuitive and engaging, capturing user interests and technology trends for business insights.

Production Stability

Session isolation, rate limiting, resilient design with Polly retry policies, and comprehensive monitoring ensure reliable 24/7 operation.

AI Innovation Leadership

Demonstrates TTD's expertise in cutting-edge AI/RAG architecture, positioning the company as a leader in AI-powered advisory solutions.

Ready to Build Your AI Solution?

Let's discuss how we can help you leverage AI and RAG architecture to create intelligent, cost-effective solutions for your business.

Get In Touch Try Benedict Chat