Connecting hearts, healing with technology—powered by Azure and ALIANDO
Executive Summary
A leading communications platform serving the healthcare sector set out to revolutionise how patients, clinicians, and families connect through voice. To scale securely and introduce multilingual, AI-driven capabilities, the organisation partnered with ALIANDO to design and deliver a modern Azure architecture that brings together real-time transcription, sentiment and tone analysis, translation, text-to-speech, and contextual search in a unified solution. The result is an extensible, cloud-native platform that supports context-aware healthcare communication and continuous innovation.
Client
A leading healthcare communications provider—focused on humane, multilingual voice experiences for patients and care teams.
Challenge
The platform’s vision required five advanced capabilities, all working together at scale:
- Transcription with automatic language detection and healthcare context.
- Sentiment & tone analysis across languages.
- Translation that preserves clinical meaning and intent in real time.
- Text-to-speech with natural voices across locales.
- Contextual search over growing message stores—without reprocessing content.
Distinct constraints compounded the challenge:
- Real-time speech transcription is approximately seven times costlier than batch processing, and faster transcription options were still in preview.
- Custom healthcare speech models demand large training datasets; without them, standard models deliver comparable outcomes.
- Tone replication at production quality requires dedicated ML R&D and was out of scope for the initial release.
Why ALIANDO
ALIANDO combined cloud architecture, LLMOps, and healthcare domain mapping to deliver a production-ready path—balancing cost, accuracy, and time-to-value. The approach included:
- Leveraging proven Azure Cognitive Services for speech, layered with LLM-based analysis and translation for healthcare context.
- Implementing embedding-driven search to avoid repeated analysis and enable rich, patient-specific context queries.
- Building API-first services with CI/CD and Terraform for repeatable deployment across environments.
Solution Overview
Architecture at a Glance
- Identity & security: Azure AD integration, Key Vault; segmented storage and patient-level security across embeddings and analysis artefacts.
- Networking & ops: Jumpbox, VPN (P2S with certificates), Terraform IaC, Azure DevOps/GitHub Actions for CI/CD across development and production environments.
- API layer: Azure API Management (APIM) with four endpoints: Batch processing, Real-time processing, Real-time with TTS, and a RAG Agent for retrieval-augmented generation.
- Data & intelligence:
  - Speech: Azure Speech (Batch & Real-time) for transcription and language identification.
  - LLM: GPT-4o mini for classification, sentiment, and translation.
  - Embeddings: text-embedding-3-small to power contextual search.
  - Storage: analysis outputs and embeddings persisted (e.g., Azure Cosmos DB for MongoDB) for fast contextual queries without reprocessing.
High-level flow: a voice message arrives at APIM and is routed to Batch or Real-time Speech for transcription; the LLM handles translation, classification, and sentiment; embeddings are generated and stored; and the API returns the detected language, text, sentiment, and optional neural TTS output, as sketched below.
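For illustration, a minimal Python sketch of that flow, assuming the Azure Speech SDK, the Azure OpenAI SDK, and the Cosmos DB MongoDB API. The deployment names, language list, and stored schema are assumptions for the sketch, not the delivered C#/JS services:

```python
# Illustrative end-to-end sketch: transcribe a voice message, analyse it with the LLM,
# embed it, and persist the result for contextual search. Not the production code.
import json
import azure.cognitiveservices.speech as speechsdk
from openai import AzureOpenAI
from pymongo import MongoClient

# Assumed configuration; in the platform these values come from Key Vault.
SPEECH_KEY, SPEECH_REGION = "<speech-key>", "<region>"
aoai = AzureOpenAI(azure_endpoint="https://<aoai>.openai.azure.com",
                   api_key="<aoai-key>", api_version="2024-02-01")
messages = MongoClient("<cosmos-mongo-connection-string>")["voice"]["messages"]

def process_voice_message(wav_path: str, patient_id: str) -> dict:
    # 1) Transcription with automatic language detection (single utterance for brevity;
    #    the Batch endpoint would use the batch transcription service instead).
    speech_config = speechsdk.SpeechConfig(subscription=SPEECH_KEY, region=SPEECH_REGION)
    auto_detect = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
        languages=["en-US", "es-ES", "fr-FR", "de-DE"])
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config,
        auto_detect_source_language_config=auto_detect,
        audio_config=speechsdk.audio.AudioConfig(filename=wav_path))
    result = recognizer.recognize_once()
    language = speechsdk.AutoDetectSourceLanguageResult(result).language

    # 2) LLM analysis: translation, classification, and sentiment in one JSON response.
    analysis = json.loads(aoai.chat.completions.create(
        model="gpt-4o-mini",  # Azure OpenAI deployment name (assumed)
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content":
             "You analyse healthcare voice messages. Return JSON with keys "
             "'translation_en', 'category', and 'sentiment'."},
            {"role": "user", "content": result.text},
        ]).choices[0].message.content)

    # 3) Embedding for contextual search, then persistence so nothing is reprocessed later.
    embedding = aoai.embeddings.create(
        model="text-embedding-3-small", input=result.text).data[0].embedding
    record = {"patient_id": patient_id, "language": language,
              "text": result.text, "embedding": embedding, **analysis}
    messages.insert_one(record)
    return record
```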
Key Capabilities Delivered
1) Transcription (Batch & Real-time)
- Automatic language detection and healthcare-aware text output.
- Batch for cost-efficiency; Real-time for interactive clinician-patient scenarios (see the real-time sketch below).
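For the interactive path, a minimal sketch of continuous real-time recognition, again assuming the Python Speech SDK; the microphone source and language list are illustrative:

```python
# Illustrative real-time transcription: stream from the default microphone and surface
# each recognised utterance as it completes, ready for downstream LLM analysis.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<speech-key>", region="<region>")
auto_detect = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
    languages=["en-US", "es-ES"])
recognizer = speechsdk.SpeechRecognizer(
    speech_config=speech_config,
    auto_detect_source_language_config=auto_detect,
    audio_config=speechsdk.audio.AudioConfig(use_default_microphone=True))

# Fires once per finalised utterance; translation and sentiment would be triggered here.
recognizer.recognized.connect(lambda evt: print(evt.result.text))

recognizer.start_continuous_recognition()
input("Press Enter to stop recognition")
recognizer.stop_continuous_recognition()
```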
2) Sentiment & Classification
- LLM-based multi-level sentiment and category tagging across languages (sketched below).
- Tone analysis acknowledged but deferred to a future ML track.
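A minimal sketch of the sentiment and classification call, with an assumed prompt and label set; the production prompt and categories may differ:

```python
# Illustrative classification: GPT-4o mini labels a transcript in any supported language
# and flags inappropriate or distressing tone for clinical review.
import json
from openai import AzureOpenAI

aoai = AzureOpenAI(azure_endpoint="https://<aoai>.openai.azure.com",
                   api_key="<aoai-key>", api_version="2024-02-01")

def classify(transcript: str) -> dict:
    response = aoai.chat.completions.create(
        model="gpt-4o-mini",  # deployment name (assumed)
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content":
             "Classify a healthcare voice-message transcript. Return JSON with "
             "'sentiment' (very_negative|negative|neutral|positive|very_positive), "
             "'category' (appointment, medication, symptom, or other) and "
             "'needs_review' (true if the tone is inappropriate or distressing)."},
            {"role": "user", "content": transcript},
        ])
    return json.loads(response.choices[0].message.content)

print(classify("I have felt much worse since starting the new medication."))
```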
3) Translation
- Bidirectional translation between English and the supported languages, preserving clinical context and intent.
- Real-time translation pathway for live conversation use cases (streaming sketch below).
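A minimal sketch of the real-time translation pathway, streaming tokens to keep latency low in live conversations; the system prompt is an assumption illustrating how clinical meaning is preserved:

```python
# Illustrative streaming translation with clinical-context preservation.
from openai import AzureOpenAI

aoai = AzureOpenAI(azure_endpoint="https://<aoai>.openai.azure.com",
                   api_key="<aoai-key>", api_version="2024-02-01")

def translate_stream(text: str, target_language: str = "English"):
    stream = aoai.chat.completions.create(
        model="gpt-4o-mini",  # deployment name (assumed)
        stream=True,
        messages=[
            {"role": "system", "content":
             f"Translate the patient message into {target_language}. Preserve clinical "
             "terminology, dosages, and intent exactly; do not summarise."},
            {"role": "user", "content": text},
        ])
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content

# Spanish input: "The pain in my left knee got worse after physiotherapy."
for piece in translate_stream("El dolor en mi rodilla izquierda empeoró tras la fisioterapia."):
    print(piece, end="", flush=True)
```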
4) Text-to-Speech (Neural)
- Natural voices with accents and dialects, plus a future-proof path for custom voices (sketch below).
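A minimal neural text-to-speech sketch; the voice name is an assumption and could later be replaced by a custom voice:

```python
# Illustrative neural TTS: synthesise a reply with a locale-specific neural voice.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<speech-key>", region="<region>")
speech_config.speech_synthesis_voice_name = "en-GB-SoniaNeural"  # assumed voice

synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)  # default speaker output
result = synthesizer.speak_text_async("Your appointment has been confirmed for Tuesday.").get()
print(result.reason)  # expect ResultReason.SynthesizingAudioCompleted
```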
5) Contextual Search (RAG)
- Embedding-based retrieval over patient message histories, supporting natural-language text and voice queries.
- Eliminates repeated analysis and returns contextually ranked matches (retrieval sketch below).
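A minimal retrieval sketch, reusing the stored schema assumed earlier: the query is embedded once, the patient's persisted embeddings are ranked by cosine similarity, and nothing is re-transcribed or re-analysed:

```python
# Illustrative contextual search over previously stored message embeddings.
import numpy as np
from openai import AzureOpenAI
from pymongo import MongoClient

aoai = AzureOpenAI(azure_endpoint="https://<aoai>.openai.azure.com",
                   api_key="<aoai-key>", api_version="2024-02-01")
messages = MongoClient("<cosmos-mongo-connection-string>")["voice"]["messages"]

def contextual_search(patient_id: str, query: str, top_k: int = 5) -> list[dict]:
    query_vec = np.array(aoai.embeddings.create(
        model="text-embedding-3-small", input=query).data[0].embedding)
    ranked = []
    for doc in messages.find({"patient_id": patient_id}):  # patient-level segmentation
        vec = np.array(doc["embedding"])
        score = float(np.dot(query_vec, vec) /
                      (np.linalg.norm(query_vec) * np.linalg.norm(vec)))
        ranked.append({"text": doc["text"], "score": score})
    return sorted(ranked, key=lambda r: r["score"], reverse=True)[:top_k]

print(contextual_search("patient-123", "messages about medication side effects"))
```

In production, a server-side vector index on the message store can replace the in-process ranking shown here; the retrieved passages then ground the RAG Agent's responses.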
Implementation Approach
Secure Landing Zone & Environments
- Development and production environments with IaC, CI/CD pipelines, observability, and LLMOps integration.
API Contracts & Developer Velocity
- Swagger (OpenAPI) definitions, C# or JavaScript implementations, integrated telemetry, and API authentication/authorisation (example client call below).
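An illustrative client call against the APIM front door; the route, payload, and header values are assumptions for the example, with the real contracts defined in the Swagger specifications:

```python
# Illustrative call to a (hypothetical) Batch processing endpoint behind APIM.
import requests

APIM_BASE = "https://<apim-instance>.azure-api.net"

response = requests.post(
    f"{APIM_BASE}/voice/batch",  # hypothetical route; actual paths live in the Swagger specs
    headers={
        "Ocp-Apim-Subscription-Key": "<subscription-key>",  # standard APIM subscription header
        "Authorization": "Bearer <azure-ad-access-token>",  # Azure AD authorisation
    },
    json={"patientId": "patient-123", "audioUrl": "https://<storage>/message.wav"},
    timeout=30,
)
response.raise_for_status()
print(response.json())  # e.g. detected language, text, sentiment, category
```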
Deliverables
- Low-level designs, Terraform & pipeline setup docs, API definition & code, test results, and LLMOps setup documentation—with structured acceptance checkpoints per phase.
Phased Plan
- Prototyping → Beta Testing → Production, with a two-month technical support window post-go-live.
Governance, Security & Compliance
- Centralised secrets in Key Vault; role-based access via Azure AD.
- Patient-level segmentation of embeddings and analysis data to enforce least-privilege, contextual retrieval (see the sketch after this list).
- LLMOps practices aligned to CI/CD, telemetry, and safe prompt/data handling.
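A minimal sketch of the governance pattern, with assumed secret names: credentials are resolved from Key Vault through Azure AD, and every data query is scoped to a single patient:

```python
# Illustrative secrets handling and patient-scoped retrieval (secret names assumed).
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient
from pymongo import MongoClient

credential = DefaultAzureCredential()  # managed identity or Azure AD sign-in
vault = SecretClient(vault_url="https://<vault-name>.vault.azure.net", credential=credential)

cosmos_conn = vault.get_secret("cosmos-connection-string").value
speech_key = vault.get_secret("speech-api-key").value  # would feed the Speech SDK calls

messages = MongoClient(cosmos_conn)["voice"]["messages"]

def patient_scoped_history(patient_id: str) -> list[dict]:
    # Least-privilege retrieval: results are always filtered to one patient's records,
    # and embeddings are excluded unless the caller needs them.
    return list(messages.find({"patient_id": patient_id}, {"_id": 0, "embedding": 0}))
```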
Outcomes & Impact
Based on the delivered scope and architecture:
- Faster, multilingual care communication through real-time transcription, translation, and natural TTS—reducing language barriers at the point of need.
- Safer, more empathetic messaging with sentiment classification to flag inappropriate or negative tone for clinical review.
- Efficient knowledge retrieval via embedding-powered contextual search, minimising repeat compute and speeding clinical workflows.
- Operational resilience & repeatability with Terraform, CI/CD, and environment parity for continuous releases.