University of East Anglia – Avatars for Visual Communication
Communication isn’t just about words.
After all, “Good job” can be transformed from a compliment to an insult with a twist of the voice. Every day, we use our tone, our hands, our faces and our body language to add to what we say, and even to alter its meaning.
It’s the same for sign language. So how do you teach a computer to read all that, and pass it on effectively?
“People imagine that sign language is all about the hands, but quite a lot comes from the face”, says Professor John Glauert. “Your facial expression changes what you’re talking about, and whether you’re happy about something or not.
“Also, when people perform signs they make mouth movements that go with the words. Sometimes you change the meaning of a sign with a facial expression. To change the type of fish, you might sign fish, but mouth ‘salmon’. The only difference between the sign variations is on the face, not the hands.”
Since 1999, the Virtual Humans Group has been looking at ways to translate everyday communication into recognisable and nuanced sign language. They develop systems that can interpret speech and language, and animate a 3D character to make recognisable signs and gestures.
This requires a diverse mix of skills. Based at the University of East Anglia, the team has expertise in speech and language recognition, 3D character animation, AI and computational linguistics.
Professor Glauert says: “Around 50,000 people in Britain use British Sign Language as their first language. With a number like that, some might not be particularly motivated to produce tailored services for signing deaf people. But part of our work is to make it more cost-effective, so that people don’t have the excuse not to do it.
“One of the things that’s struck me during the time I’ve been doing this is the gap in people’s understanding of the hearing-impaired community. There’s a lot of misunderstanding, which leads to people not providing them with what they need.”
It all started back in 1999, with an avatar called Simon the Signer. Simon translated text subtitles into animated sign language. It won two Royal Television Society Awards, but it was only a rough solution.
“Simon the Signer simply spotted important words and turned them into signs. So you’re basically putting stuff out in the order it would be in English. However, sign language doesn’t use the same order as the English language.
“If you do it like that, you can certainly turn it into something that most signers can understand, but it’s like turning a German or Spanish sentence into English without changing the word order. It doesn’t look right.”
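The word-spotting approach Professor Glauert describes can be illustrated with a minimal sketch. The lexicon and the function name below are hypothetical, invented purely for illustration, and are not taken from Simon the Signer itself. The point is only that the signs come out in the same order as the English words.

```python
# Hypothetical mini-lexicon mapping English words to sign glosses.
# These entries are illustrative assumptions, not the project's data.
SIGN_LEXICON = {
    "train": "TRAIN",
    "late": "LATE",
    "today": "TODAY",
    "london": "LONDON",
}


def spot_keywords(subtitle: str) -> list[str]:
    """Return glosses for known words, in the order they appear in the English text."""
    glosses = []
    for raw in subtitle.lower().split():
        word = raw.strip(".,!?;:\"'")  # drop surrounding punctuation
        if word in SIGN_LEXICON:
            glosses.append(SIGN_LEXICON[word])
    return glosses


print(spot_keywords("The train to London is late today."))
# ['TRAIN', 'LONDON', 'LATE', 'TODAY']
```

Nothing here reorders the glosses, which is exactly the limitation described above: the result follows English word order rather than the grammar of the sign language.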
The group originally started by looking solely at the linguistics side of the problem, but later created their own platform to animate the 3D character as well. The challenge was to balance two occasionally competing priorities: to make the signing movements as quick and fluid as possible, and to pass on the full meaning and nuance of the speech, often without any other visual means of communication.
By picking out words in order, Simon the Signer could translate quickly enough for the animation to look fluid and natural. However, for the signing sequences to have the right structure, the system needed to know more about the sentence before translating.
Enter the TESSA project. TESSA was a speech recognition system designed to translate sentences and phrases into true British Sign Language by identifying a phrase as it was being said, and producing the corresponding signs with almost no delay.
Of course, it’s a huge challenge to develop a system that can predict any sentence. There are so many variables. So TESSA was developed primarily for use in customer service situations, translating phrases spoken to customers by counter staff. In these situations, only a limited number of essential phrases are likely to be said, so the system could spot them much more quickly. In 2000, the technology was trialled by the Post Office in the UK.
Professor Glauert says: “The system has to recognise the whole sentence, but it can start making a pretty good guess midway through, and can come out almost straight away with an answer. It’s not looking for every phrase. It starts, gets better information, and corrects itself.”
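The idea of guessing early and correcting as more of the sentence arrives can be sketched roughly as follows. This is a toy illustration under strong assumptions: the phrase list is invented, the input is already a sequence of recognised words rather than audio, and the scoring rule (position-by-position word matches) is a stand-in, not TESSA's actual method.

```python
# Hypothetical counter-service phrases; the real TESSA phrase set is not given here.
PHRASES = [
    "how many stamps would you like",
    "how would you like to pay",
    "please sign here",
    "that will be two pounds",
]


def score(heard: list[str], phrase: str) -> float:
    """Fraction of the words heard so far that match the phrase, position by position."""
    if not heard:
        return 0.0
    hits = sum(1 for h, w in zip(heard, phrase.split()) if h == w)
    return hits / len(heard)


def best_guess(heard: list[str]) -> str:
    """Return the highest-scoring phrase (the first one listed wins a tie)."""
    return max(PHRASES, key=lambda p: score(heard, p))


def running_guesses(utterance: str) -> list[str]:
    """Record the current best guess after each new word arrives."""
    heard: list[str] = []
    guesses = []
    for word in utterance.lower().split():
        heard.append(word)
        guesses.append(best_guess(heard))
    return guesses


for step, guess in enumerate(running_guesses("how would you like to pay"), start=1):
    print(step, guess)
```

After the first word the sketch settles on "how many stamps would you like", then switches to "how would you like to pay" once the second word rules the first guess out. A small, closed phrase set is what makes this kind of early commitment practical.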
The team also approached the issue of how to translate those meanings that weren’t spoken. First arriving in 2000, the ViSiCAST system took a number of features of communication into account, such as eyebrow position, plural verbs, and the size and placement of gestures. In two more EU-funded projects called eSIGN and Dicta-Sign, the group has since fine-tuned algorithms that can deliver gestures that differ subtly in hand shape and location.
To achieve this, they used an established transcription system called the Hamburg Notation System, which transcribes signing phonetically. The transcription is converted into computer-readable XML, and then processed by a module that uses this information to manipulate the skeleton of an avatar character.
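The pipeline from notation to XML to skeleton can be pictured with a short sketch. Everything specific in it is an assumption made for illustration: the element and attribute names, the symbol names and the joint angles are hypothetical, and do not reproduce the group's actual XML format or animation module.

```python
import xml.etree.ElementTree as ET

# Hypothetical symbol names and joint angles (degrees), for illustration only.
POSE_TABLE = {
    "flat_hand": {"index_curl": 0.0, "thumb_curl": 0.0},
    "fist": {"index_curl": 90.0, "thumb_curl": 60.0},
    "palm_down": {"wrist_roll": 90.0},
}


def to_xml(gloss: str, symbols: list[str]) -> str:
    """Serialise a sequence of notation symbols as a small XML document."""
    sign = ET.Element("sign", {"gloss": gloss})
    for name in symbols:
        ET.SubElement(sign, "symbol", {"name": name})
    return ET.tostring(sign, encoding="unicode")


def to_joint_targets(xml_text: str) -> dict[str, float]:
    """Read the XML back and merge each symbol's joint angles into one pose."""
    targets: dict[str, float] = {}
    for symbol in ET.fromstring(xml_text).iter("symbol"):
        targets.update(POSE_TABLE.get(symbol.get("name"), {}))
    return targets


xml_text = to_xml("EXAMPLE", ["fist", "palm_down"])
print(xml_text)
print(to_joint_targets(xml_text))
```

The useful property is the separation of stages: the XML says what the sign is, and only the final step decides how a particular skeleton should move to produce it.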
“The actual speech recognition element of our work is state-of-the-art, but not groundbreaking. The more challenging part is what we do with the animation. It’s telling the system exactly how to move the bones of the fingers and arms to pass on a meaning.
“It’s working on two parallel tracks: one communicates what the hand is doing and what the body is doing. The other tracks the face, eyebrows and eyes. For the mouth, we use an animation technique called morphing, which carries a description of the mouth shape.
“You add a mesh with a skin and clothes over the top. So one of the great things about our system is that we can play the information back with different characters, from humans to robots, aliens and monkeys.”
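Morphing, as generally understood in character animation, blends a neutral mesh towards a target shape by a weight between 0 and 1. The sketch below shows that standard linear blend on a toy mesh. The vertex data and the function name are hypothetical, and whether the group's own implementation works exactly this way is an assumption.

```python
Vertex = tuple[float, float, float]


def blend(neutral: list[Vertex], target: list[Vertex], weight: float) -> list[Vertex]:
    """Linear morph: neutral + weight * (target - neutral), vertex by vertex."""
    if len(neutral) != len(target):
        raise ValueError("neutral and target meshes must have the same vertex count")
    return [
        tuple(n + weight * (t - n) for n, t in zip(nv, tv))
        for nv, tv in zip(neutral, target)
    ]


# Toy two-vertex "mouth": the two vertices move apart vertically in the target shape.
neutral_mouth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
open_mouth = [(0.0, 1.0, 0.0), (1.0, -1.0, 0.0)]

print(blend(neutral_mouth, open_mouth, 0.5))
# [(0.0, 0.5, 0.0), (1.0, -0.5, 0.0)]
```

A weight of 0 leaves the neutral shape unchanged and a weight of 1 reaches the target, so a mouth shape can be described by a few weights rather than by every vertex position.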
Their research has been applied in a number of different ways. IBM called their virtual signing technology “the most advanced and flexible system available”, and integrated it into a real-time system called Say It Sign It. Thanks to a collaboration with Orange, that system was modified to work on mobile devices.
It has helped to translate pre-defined information into sign language, from train announcements to weather forecasts and warnings about avalanches. In conjunction with Action on Hearing Loss, the Virtual Humans Group has built resources that have allowed others to create sign language content for websites, including Germany’s Federal Ministry of Labour and Social Affairs and employment sites in the Netherlands.
There is also a valuable application in learning. Many children pick up words much more easily when there is a gesture associated with them, a process known as kinesthetic learning. With this in mind, a series of animated story DVDs has been released under the LinguaSign brand. They have been produced in English, Dutch, French and Portuguese. Following a 2013 trial with Key Stage 2 students at more than 50 UK primary schools, 62% of respondents confirmed that it improved a child’s speaking and listening skills in a new language.
The avatar technology has already begun to supplement the interpreters we’re used to seeing on TV. It has been showcased on the Dutch programme Het Zandkasteel, and on online shows such as Wicked Kids. It has also been used by cultural heritage sites to help pass on stories from history using sign language.
Having almost mastered the hands, the team hopes to take its work further by improving its grasp of the expressive human face.
“It’s not about dragging deaf people into the hearing world, but providing them with the sort of services we take for granted on their own terms”, says Professor Glauert. “It’s what they want, rather than what we think they should get.”