FYI: ABC-TV's 'Prime Time' to Showcase Role of Sensory's Fluent Animated Speech


From: Ginette Perkins (ginettep@seals.org)
Date: Fri Mar 09 2001 - 10:08:56 PST


SANTA CLARA, Calif.--(BUSINESS WIRE)--March 5, 2001--At the Tucker-Maxon
Oral School in Portland, Ore., deaf children ages 6 through 12 are
improving their listening and speech-production skills with the help of
Baldi, a dome-headed, talking, computer-generated face.

Tucker-Maxon's unusual talking tutor -- along with the powerful software
technology that combines 3D animation with speech recognition and
audio-visual generation of speech -- will be showcased on the ABC-TV news
program Prime Time on Thursday, March 8 (10 p.m. Eastern). To demonstrate
the power and accuracy of the software as a teaching tool for the
profoundly deaf, the voice and face of Prime Time's co-anchor Diane Sawyer
will be converted to a so-called conversational agent.

The software that allows the animated face of Sawyer -- and Baldi -- to
talk and be understood by Tucker-Maxon students is Sensory Inc.'s Fluent
Animated Speech(TM) technology. Sensory, based in Santa Clara, is a leading
provider of embedded speech technology.

Origins of the Software

Sensory's Fluent Animated Speech software grew out of research and
development carried out primarily at the Oregon Graduate Institute Center
for Spoken Language Understanding and at the Perceptual Science Laboratory
at the University of California, Santa Cruz. In 1997, researchers from these
two institutions left their academic posts to found Beaverton, Ore.-based
Fluent Speech Technologies, which Sensory acquired in October 2000.

Technology Behind the Tucker-Maxon Story

With Sensory's Fluent Animated Speech technology, programmers and
non-programmers alike can control the facial expressions, emotional
expressions and lip synchronization of an animated 3D agent or avatar. At
Tucker-Maxon, for example, educators with minimal computer skills easily
design programs that both speak and listen. The software incorporates the
animated face, Baldi, whose articulators are aligned with the utterances
produced in either synthesized or natural speech. The motion of Baldi's
lips, eyes and facial expressions adds meaning to the words "spoken" by the
computer. On a topic chosen by the teacher, Baldi can ask a question and
prompt the student to respond; the student's response determines the next
turn of the dialogue.
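The question-and-response loop described above can be pictured as a small dialogue graph in which each recognized answer selects the next turn. This is a hypothetical sketch, not Sensory's authoring format: the `Turn` class, the `next_turn` function and the sample lesson are invented for illustration, and the speech recognizer is assumed to have already turned the student's reply into text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Turn:
    """One node of a hypothetical lesson: what Baldi says and where each answer leads."""
    prompt: str
    # Maps a recognized (lower-cased) student answer to the id of the next turn.
    next_by_answer: Dict[str, str] = field(default_factory=dict)
    # Turn to use when the answer is not an expected one; None ends the lesson.
    fallback: Optional[str] = None


def next_turn(lesson: Dict[str, Turn], current: str, recognized: str) -> Optional[str]:
    """Pick the next turn from the recognizer's text for the student's reply."""
    turn = lesson[current]
    answer = recognized.strip().lower()
    return turn.next_by_answer.get(answer, turn.fallback)


# A tiny lesson on a teacher-chosen topic; prompts and answers are invented.
lesson = {
    "ask": Turn("What color is the frog?", {"green": "praise"}, fallback="hint"),
    "hint": Turn("Look again. Is the frog green or red?", {"green": "praise"}, fallback="hint"),
    "praise": Turn("Yes, the frog is green!"),
}

print(next_turn(lesson, "ask", " Green "))  # praise
print(next_turn(lesson, "ask", "blue"))     # hint
print(next_turn(lesson, "praise", "ok"))    # None
```

Keeping a fallback turn per node means an unexpected or misrecognized answer leads to a hint rather than a dead end.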

"The ability to create realistic, talking characters is no longer of
interest solely to professional animators or producers of motion pictures,"
said Todd Mozer, president and chief executive officer of Sensory. "Our
Fluent Animated Speech technology will bring such capabilities within the
reach of nearly everyone."

Applications in Education and Beyond

With its unprecedented accuracy in speech and facial animation,
Sensory's Fluent Animated Speech technology will enable animated characters
to play roles in Internet-based commerce, entertainment and customer
support as well as education. Possible applications include adding an
animated agent to a text or voice message; automating an interactive web
host or agent; adding personality and emotional expressions to a web
character or message; and creating online games in which the players
control the speech of the characters.

New Animation Technology Represents a Breakthrough

The Fluent Animated Speech technology employs a non-linear morphing
technique that enables Sensory to take a few dozen static pictures and
blend them to create a virtually unlimited assortment of expressions and
articulations. The technology provides memorable, highly accurate real-time
lip-synching, as well as the delivery of emotional content by a 3D animated
agent, synchronized to a variety of speech and text sources. The 3D models
can be created using off-the-shelf 3D graphics tools.
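The general idea of blending a small set of stored shapes into many expressions can be illustrated with morph targets (also called blend shapes). The release does not describe Sensory's non-linear technique, so this is only a stand-in under stated assumptions: each static "picture" is treated as a target mesh with the same vertices as a neutral face, each weight is eased with a smoothstep curve (an assumed non-linearity), and the eased, weighted offsets from the neutral face are added together. The mesh layout, the target names and the easing curve are all hypothetical.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Mesh = List[Vec3]


def smoothstep(w: float) -> float:
    """Ease a 0..1 weight non-linearly (slow start, slow finish); an assumed curve."""
    w = min(max(w, 0.0), 1.0)
    return w * w * (3.0 - 2.0 * w)


def blend(base: Mesh, targets: Dict[str, Mesh], weights: Dict[str, float]) -> Mesh:
    """Add each target's offset from the neutral face, scaled by its eased weight."""
    out = [list(v) for v in base]
    for name, w in weights.items():
        k = smoothstep(w)
        for i, (tv, bv) in enumerate(zip(targets[name], base)):
            for axis in range(3):
                out[i][axis] += k * (tv[axis] - bv[axis])
    return [tuple(v) for v in out]


# Two-vertex toy "face": vertex 0 is the lower lip, vertex 1 a mouth corner.
base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
targets = {
    "jaw_open": [(0.0, -1.0, 0.0), (1.0, 0.0, 0.0)],
    "smile": [(0.2, 0.0, 0.0), (1.0, 0.5, 0.0)],
}
print(blend(base, targets, {"jaw_open": 0.5}))
print(blend(base, targets, {"jaw_open": 1.0, "smile": 1.0}))
```

Because every expression is a weighted combination, a few dozen targets can produce a very large range of in-between faces without storing each one.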

The speech output comes from Sensory's Fluent Speech(TM) Text-to-Speech
engine, which can reside in either a client or server environment. The
Fluent Speech Text-to-Speech engine is an LPC (linear predictive coding),
diphone-based speech synthesizer capable of expanding or contracting pitch
periods and changing speech rates to produce a variety of sounds. The LPC
approach makes it possible for the Fluent Animated Speech technology to
synthesize high-quality speech using very little computer memory.
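The core of any LPC synthesizer is an all-pole filter driven by an excitation signal; for voiced sounds that excitation is a pulse train whose spacing is the pitch period. The sketch below shows only that filter step under stated assumptions: the 8 kHz sample rate and the one-coefficient filter are illustrative, and the diphone selection and concatenation that Sensory's engine performs are not shown. It also hints at why memory use is small: each stored frame needs only a short list of filter coefficients plus gain and pitch, rather than every waveform sample.

```python
from typing import List


def impulse_train(pitch_period: int, n_samples: int) -> List[float]:
    """Voiced excitation: a unit pulse every `pitch_period` samples."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]


def lpc_synthesize(a: List[float], excitation: List[float], gain: float = 1.0) -> List[float]:
    """All-pole LPC synthesis filter 1 / (1 - sum_k a_k z^-k).

    Computes y[n] = gain * e[n] + sum_{k=1..p} a[k-1] * y[n-k].
    """
    y: List[float] = []
    for n, e in enumerate(excitation):
        acc = gain * e
        for k, ak in enumerate(a, start=1):
            if n >= k:
                acc += ak * y[n - k]
        y.append(acc)
    return y


# Assumed 8 kHz sample rate. Contracting the pitch period raises the pitch.
FS = 8000
for period in (80, 40):
    print(f"period {period} samples -> {FS / period:.0f} Hz")
# period 80 samples -> 100 Hz
# period 40 samples -> 200 Hz

# A one-coefficient "vocal tract" just to show the filter's decaying response.
print(lpc_synthesize([0.5], impulse_train(100, 4)))  # [1.0, 0.5, 0.25, 0.125]
```

Changing `pitch_period` while keeping the same coefficients changes the voice's pitch without changing the vowel, which is the kind of expansion and contraction of pitch periods the paragraph above refers to.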

The Fluent Animated Speech 3D animation comprises a general-purpose OpenGL-
or Direct3D-based real-time 3D rendering engine and a viseme generation
engine. (A viseme is the visual counterpart of a phoneme, the smallest unit
of sound that distinguishes one word from another.) The viseme generation engine is a
coarticulation package that generates weighted morphing data (in the form
of visemes) that drives the animated speech from either synthetic or
natural speech.
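A first step in producing viseme data is mapping timed phonemes onto visemes, which is many-to-one because some sounds look alike on the lips (for example /p/, /b/ and /m/ all close the lips). The sketch below is a hypothetical illustration: the ARPAbet-style phoneme symbols, the small `PHONEME_TO_VISEME` table and the `to_viseme_keys` helper are assumptions, not Sensory's tables, and the weighting that a coarticulation package would add is not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical many-to-one table; a real one would cover the full phoneme set.
PHONEME_TO_VISEME: Dict[str, str] = {
    "p": "lips_closed", "b": "lips_closed", "m": "lips_closed",
    "f": "lip_teeth", "v": "lip_teeth",
    "aa": "jaw_open", "ae": "jaw_open",
    "uw": "lips_rounded", "w": "lips_rounded",
    "sil": "rest",
}


def to_viseme_keys(phones: List[Tuple[str, float, float]]) -> List[Tuple[float, str]]:
    """Turn (phoneme, start_s, end_s) segments into (time_s, viseme) keyframes.

    Each keyframe sits at the segment's midpoint; unknown phonemes fall back to
    "rest", and a viseme that repeats is kept only once.
    """
    keys: List[Tuple[float, str]] = []
    for phone, start, end in phones:
        viseme = PHONEME_TO_VISEME.get(phone, "rest")
        if keys and keys[-1][1] == viseme:
            continue  # e.g. "m" then "b": the lips simply stay closed
        keys.append(((start + end) / 2.0, viseme))
    return keys


print(to_viseme_keys([("sil", 0.0, 0.1), ("m", 0.1, 0.2), ("b", 0.2, 0.3), ("aa", 0.3, 0.5)]))
```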

The coarticulation package is an important part of Sensory's speech
software, enabling animated characters to speak with realistic facial and
mouth movements. In humans, coarticulation is the brain's coordination of
the lips, tongue and jaw so that the movements for adjacent vowels and
consonants overlap during normal speech. Coarticulation ensures that speech
is produced smoothly, and it spreads acoustic information about a vowel or
consonant across its neighbors, which helps a listener understand what is
being said. With Sensory's coarticulation package, animated characters can
speak at five syllables per second, the same rate at which humans typically
produce speech.
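One published way to model coarticulation for a talking face is the dominance-function approach described by Cohen and Massaro: each speech segment pulls a facial parameter toward its own target with a strength that decays with distance in time, and the parameter at any moment is the dominance-weighted average of all targets. The release does not say which model Sensory's package uses, so treat this as an illustration only. It is simplified to one symmetric decay rate, and the parameter values, the 0.0/1.0 lip targets and the 30 frames-per-second rate are assumptions. At five syllables per second each syllable lasts 1/5 s = 200 ms, which at 30 frames per second is 6 frames.

```python
import math
from typing import List, Tuple


def dominance(t: float, center: float, alpha: float = 1.0,
              theta: float = 12.0, c: float = 1.0) -> float:
    """How strongly one segment controls the lips at time t (seconds).

    Exponential decay with distance from the segment's center:
    alpha * exp(-theta * |t - center| ** c). Parameter values are illustrative.
    """
    return alpha * math.exp(-theta * abs(t - center) ** c)


def lip_opening(t: float, segments: List[Tuple[float, float]]) -> float:
    """Dominance-weighted average of segment targets; segments are (center_s, target)."""
    weights = [dominance(t, center) for center, _ in segments]
    total = sum(w * target for w, (_, target) in zip(weights, segments))
    return total / sum(weights)


# "ma ma" at five syllables per second: each syllable lasts 0.2 s, so the
# /m/ (lips closed, 0.0) and /a/ (jaw open, 1.0) centers are 0.1 s apart.
segments = [(0.0, 0.0), (0.1, 1.0), (0.2, 0.0), (0.3, 1.0)]

# Sample at an assumed 30 frames per second; frames 0..9 span 0.0 to 0.3 s.
FPS = 30
frames = [round(lip_opening(i / FPS, segments), 2) for i in range(10)]
print(frames)
```

Because neighboring segments always contribute some weight, the lips never quite reach a fully closed or fully open target at speaking rate, which is the smooth, overlapping motion that coarticulation describes.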

The Fluent Animated Speech technology's 3D rendering engine allows the
rendering of arbitrary 3D models and uses a morphing-based approach to
animation. Exporters for 3D authoring tools enable 3D models to be saved in
a compatible format. Additionally, the Fluent Animated Speech technology
can take advantage of other vendors' existing tools for the scripting of
speech and facial content and the automatic generation of expressions and
facial gestures.

Users can control lighting and background images as well as the characters
being animated, and output can be saved as AVI video. The Sensory technology
comes with a selection of human and animal 3D models that include the mouth
and facial targets required for animation; not every feature of a face or
mouth needs to be animated, and thus modeled, to create realistic speech.
As a result, users can quickly create realistic animated characters, along
with background environments.

Price, Availability and System Requirements

Sensory's Fluent Animated Speech technology is available now. For networked
applications, typical pricing is based on an Application Service Provider
(ASP) model with an annual per-port fee. For embedded applications, pricing
is under $2 per unit in volume. The technology currently runs under Windows
95/98/2000/ME on a minimum 266 MHz Pentium II processor with at least 64 MB
of RAM.

About Sensory, Inc.

Founded in 1994, Sensory, Inc., is the leading provider of high-quality,
low-cost speech recognition and speech synthesis technology. Sensory's
speech technology is embedded in consumer products such as personal
electronics, Internet appliances, interactive toys, and high-end telephone
and automotive applications. Sensory offers a complete line of integrated
circuit (IC) and embedded software solutions, including the Interactive
Speech(TM) line of low-cost ICs and the Fluent Speech(TM) large-vocabulary
software engine. Sensory's customers include leading companies in the
consumer electronics and embedded product markets, such as JVC, Hasbro,
Mitsubishi, Mattel, Sega, Sharper Image, Fisher-Price, Sony, Tektronix,
Toshiba, Uniden, VOS and Westclox. More information is available from
Sensory's web site at www.sensoryinc.com.

CONTACT: Sensory, Inc.
Erik Soule
408/240-1575
marcom@sensoryinc.com

Ginette Perkins
Assistive Technology Information & Referral Specialist
Washington Assistive Technology Alliance
1-800-214-8731 (Toll Free)
(509) 328-9350 (V, TTY)
(509) 326-2261 (Fax)
ginettep@seals.org


