Smart Sensing for Humans (SmaSH) Lab - Carnegie Mellon University
Sept 2024 - Present
Objective: To develop AI agents that respond more intelligently by leveraging semantic filtering, natural language processing, and voice activity detection (VAD) to filter out irrelevant or ambient speech. This minimizes false activations and ensures engagement that feels natural and non-intrusive.
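As a rough illustration of the VAD step, an energy threshold can screen out ambient noise before any transcription runs. This is a minimal sketch with an illustrative threshold and frame format, not the trained VAD model a deployed system would use:

```python
import math

def rms(frame):
    """Root-mean-square energy of one audio frame (samples in [-1.0, 1.0])."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def is_speech(frame, threshold=0.02):
    """Crude VAD gate: pass a frame downstream only if its energy clears
    the threshold; anything quieter is treated as ambient noise."""
    return rms(frame) >= threshold
```

A fixed energy threshold confuses loud non-speech with speech, so real deployments typically use a model-based detector (e.g., WebRTC VAD or Silero VAD) instead.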
TOOLS
OpenAI API
Whisper
Figma
SHI Bot: Adaptive Decision-Making for Multimodal Voice Interfaces is designed to support natural, confident interaction, particularly for users unfamiliar with or hesitant toward digital tools. By integrating NLP and semantic analysis, a classification engine evaluates the contextual relevance of input, triggering responses only when appropriate. This reduces false activations and builds user trust. Designed for hands-free, context-aware use, the interface is intuitive, non-intrusive, and empowers users to feel in control, not overwhelmed.
Overview:
Speech-to-text → NLP + Semantic Analysis
Continuously assesses real-time context and user needs to determine when to assist and when to stay passive
Helpful when you need it, invisible when you don’t
Optimized for situational awareness, not constant interaction
► Smart companion, not a source of distraction
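The overview above (speech-to-text feeding NLP + semantic analysis, which decides when to assist and when to stay passive) can be sketched end to end. Everything here is illustrative: `transcribe` stands in for Whisper, and `is_relevant` replaces the LLM-based semantic classifier with a keyword heuristic so the control flow stays visible:

```python
from dataclasses import dataclass

@dataclass
class Decision:
    transcript: str
    action: str  # "assist" or "stay_passive"

def transcribe(audio: str) -> str:
    """Placeholder for the Whisper speech-to-text step; for illustration
    it simply passes the text through."""
    return audio

def is_relevant(transcript: str) -> bool:
    """Stand-in for the NLP + semantic-analysis step. The real system uses
    an LLM to judge contextual relevance; a prefix check illustrates the gate."""
    return transcript.lower().startswith(("hey bot", "assistant"))

def run_pipeline(audio: str) -> Decision:
    """Assist only when the utterance is judged relevant; otherwise stay passive."""
    text = transcribe(audio)
    action = "assist" if is_relevant(text) else "stay_passive"
    return Decision(text, action)
```

Because the relevance gate sits between transcription and response, ambient chatter is transcribed but never acted on.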
Dataset
System Design
Use Cases
Accidental Command Trigger in Passive Conversation
A nearby user casually says “go home” during an unrelated conversation.
▶︎ The voice interface incorrectly interprets the phrase as a navigation command and initiates route guidance.
False Activation from Distant, Irrelevant Speech
A smart speaker overhears "no!" shouted from another room.
▶︎ The smart speaker misinterprets the emotional outburst as a cancellation command and prematurely stops an active task (e.g., timer or music).
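Both failure modes above come from acting on a command phrase alone. A hedged sketch of the fix: require the utterance to be near-field (not overheard from another room) and addressed to the system before any command executes. The `near_field` and `addressed` flags are hypothetical inputs; the real system would derive them from audio features and conversational context:

```python
def gate(transcript: str, near_field: bool, addressed: bool) -> str:
    """Act only when a command is close-range AND directed at the system;
    otherwise stay passive, even if a trigger phrase is present."""
    commands = ("go home", "stop", "cancel", "no")
    is_command = transcript.lower().strip().startswith(commands)
    if not near_field:   # e.g., "no!" shouted from another room
        return "ignore"
    if not addressed:    # e.g., "go home" inside an unrelated conversation
        return "ignore"
    return "act" if is_command else "ignore"
```

Under this gate, both scenarios above resolve to "ignore" while a directed, close-range "go home" still triggers navigation.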
Responds only to support the driver or the driving task.
Retrieves directions or alternate routes so the driver doesn’t have to pause and search
Provides a brief definition or fact if a passenger asks about something they see outside
Reminds the driver when fuel is low
Alerts the driver if traffic conditions ahead have changed
Helpful assistant when you need it, invisible when you don’t
▶︎ Retrieves a specific document so the speaker doesn’t have to pause
▶︎ Records the meeting and documents assignments for next steps
▶︎ Provides a brief definition of a term if asked
▶︎ Reminds participants if the meeting is running long
▶︎ Agent summarizes the meeting at the end
Next Steps
❏ Adaptive UI based on interaction patterns
❏ Real-time prompt editing pipeline
❏ Voice-based prompt revision
❏ Incorporating emotional and tonal cues
❏ Personalized interaction models
Adapt system behavior based on individual user preferences, speech patterns, and interaction history.