The chart below gives an overview of our field of
research.

Human-Machine
Communication
Modern communication and information processing
systems enable us to interact with all kinds of computers and computer
controlled machines, e.g. to make a phone call, to access the
internet, to operate entertainment electronics, to use information
services, to operate household appliances, or even to navigate cars.
These systems have already become an inherent part of our everyday
environment (buzzword "pervasive computing").
With ongoing technological progress, these systems become not only
more capable and efficient, but also more complex to operate.
For this reason, an adequate user interface is a major goal of
research and development, so that everyone can participate
effortlessly in a modern computing infrastructure.
Research at the Institute for Human-Machine
Communication focuses on the fundamentals of a widely intuitive,
natural, and therefore multimodal interaction between humans and
information processing systems. All forms of interaction, i.e.
modalities, that are available to humans are to be investigated for
this purpose. Both the machine's representation of information and the
interaction technique are to be considered in this context, covering
text and speech, sound and music, haptics, graphics and vision,
gesture and facial expression, and emotions.
Media
Communications
In the area of media communications, research at the
Institute for Human-Machine Communication focuses on human interaction
with digital media technologies. We therefore investigate both the
semantic analysis of multimedia data (text, documents, handwriting,
audio, graphics, video) and techniques for information indexing and
database retrieval. For this complex mixture of data and content,
intelligent pattern processing and recognition methods are explored,
and new interaction concepts are developed.
Pattern
Recognition
Pattern recognition is the research area that studies
the design and operation of systems that recognize patterns in data.
There are many different kinds of patterns, e.g. visual patterns,
temporal patterns, logical patterns, spectral patterns, etc. Pattern
recognition is an inherent part of every intelligent activity or
system. There are different approaches to pattern recognition,
including:
- On-line Boosting
- Conditional Random Fields
- Statistical or fuzzy pattern recognition
- Syntactic or structural pattern recognition
- Knowledge-based pattern recognition
The statistical approach views pattern recognition as a
classification task, i.e. assigning an input to a category based on
statistical criteria. It encompasses subdisciplines like discriminant
analysis, feature extraction, error estimation, cluster analysis,
grammatical inference and parsing. Important application areas are
speech and image analysis, character recognition, human and machine
diagnostics, person identification, industrial inspection, and of
course, human-machine interaction. Consequently, statistical pattern
recognition is a fundamental scientific discipline
and area of research at the Institute for Human-Machine Communication.
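To make "assigning an input to a category based on statistical criteria" concrete, here is a minimal sketch of a one-dimensional Gaussian classifier. It is an illustration only, not the institute's method: the function names, the toy training data, and the choice of a single Gaussian per class are all assumptions.

```python
import math
from collections import defaultdict


def fit_gaussian_classes(samples):
    """Estimate prior, mean and variance per class from (feature, label) pairs."""
    by_label = defaultdict(list)
    for x, label in samples:
        by_label[label].append(x)
    total = len(samples)
    model = {}
    for label, xs in by_label.items():
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        # A small variance floor avoids division by zero for degenerate classes.
        model[label] = (len(xs) / total, mean, max(var, 1e-6))
    return model


def log_score(x, prior, mean, var):
    """Log prior plus Gaussian log-likelihood (an unnormalised log posterior)."""
    return (math.log(prior)
            - 0.5 * math.log(2 * math.pi * var)
            - (x - mean) ** 2 / (2 * var))


def classify(x, model):
    """Pick the class with the highest posterior score (Bayes decision rule)."""
    return max(model, key=lambda label: log_score(x, *model[label]))


# Hypothetical toy data: one scalar feature per sample.
training = [(1.0, "low"), (1.2, "low"), (0.8, "low"),
            (3.0, "high"), (3.3, "high"), (2.7, "high")]
model = fit_gaussian_classes(training)
print(classify(1.1, model), classify(2.9, model))
```

Real systems work with feature vectors rather than a single number, but the decision rule keeps the same shape: estimate class statistics from labelled data, then choose the most probable class.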
Signal
Processing
Signal Processing means the theory and application of
filtering, coding, transmitting, estimating, detecting, analyzing,
recognizing, synthesizing, recording, and reproducing signals by
digital or analog devices or techniques. The term signal includes
audio, video, speech, image, communication, medical, musical, and
other signals in continuous or discrete (i.e. sampled) form. Competence
in Signal Processing is vital for the development of new techniques in
Human-Machine Communication.
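As a small illustration of filtering a discrete (sampled) signal, the sketch below implements a causal moving-average filter, one of the simplest FIR low-pass filters. The function name and the handling of the first few samples (averaging over however many samples are available) are assumptions made for this example.

```python
def moving_average(signal, width):
    """Causal FIR low-pass filter: mean of the most recent `width` samples.

    During start-up, fewer than `width` samples exist, so the mean is taken
    over the samples seen so far.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    out = []
    running = 0.0
    for n, sample in enumerate(signal):
        running += sample
        if n >= width:
            # Drop the sample that has just left the window.
            running -= signal[n - width]
        out.append(running / min(n + 1, width))
    return out


print(moving_average([0, 2, 4, 6], 2))
```

A wider window smooths more strongly but also reacts more slowly to genuine changes in the signal.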
Statistical
Classifiers
Statistical Classifiers like Hidden Markov Models
(HMMs) have emerged during the last 5 years as probably the most
powerful paradigm for processing dynamic patterns, such as time
series, speech signals, and other pattern sequences. In speech
recognition especially, HMMs have become the dominant technology. In
multimedia signal processing applications, however, which mostly
involve image processing and computer vision problems with dynamic and
static patterns, HMMs are still used far less often. Yet this area has
become increasingly important in recent years, especially in
Human-Machine Communication. We therefore investigate the suitability
of HMMs for various pattern recognition tasks in multimedia
information processing, such as:
- HMMs in speech recognition
- HMMs for character, handwriting and formula recognition
- Image sequence processing with HMMs
- HMMs for gesture recognition
- Video-indexing with HMMs and stochastic video models
- HMM-based audio-visual topic recognition
- Circular 1D- and 2D-HMMs for rotation-invariant recognition of
symbols
- Recognition of deformed and occluded objects
- HMMs in image databases and image retrieval
- Pseudo-2D-HMMs for face recognition
- Pseudo-2D-HMMs for pictogram recognition and spotting
- HMM-applications for person detection and object tracking
- Gesture and facial expression recognition with 1D- and
Pseudo-3D-HMMs
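For readers unfamiliar with how an HMM scores a pattern sequence, the following sketch computes the probability of an observation sequence under a discrete-output HMM with the forward algorithm. The two-state left-to-right model and its numbers are invented for illustration; real recognizers typically use continuous features and work in the log domain.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) for a discrete-output HMM via the forward algorithm."""
    if not obs:
        return 1.0
    states = range(len(start))
    # Initialisation: start in state s and emit the first symbol.
    alpha = [start[s] * emit[s][obs[0]] for s in states]
    # Induction: sum over all predecessor states, then emit the next symbol.
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in states) * emit[s][symbol]
            for s in states
        ]
    return sum(alpha)


# Hypothetical left-to-right model with two states and two output symbols.
START = [1.0, 0.0]
TRANS = [[0.5, 0.5],
         [0.0, 1.0]]
EMIT = [[0.9, 0.1],
        [0.2, 0.8]]

print(forward_likelihood([0, 1], START, TRANS, EMIT))
```

Classification then amounts to training one such model per class and choosing the model that assigns the highest likelihood to the observed sequence.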
Neural
Networks
A Neural Network (NN) is an information-processing
structure inspired by the interconnected, parallel topology of the
mammalian brain. NNs use a collection of mathematical models to
emulate some of the observed properties of biological nervous systems
and draw on the analogies of adaptive biological learning. The key
element of the NN paradigm is its structure composed of a large number
of interconnected processing elements that are analogous to neurons
and that are tied together with weighted connections that are
analogous to synapses.
Learning in NNs involves adjustments to the
connections that exist between the neurons. Learning typically occurs
by example through training, or exposure to a set of verified
input/output data where the training algorithm iteratively adjusts the
connection weights (synapses). These connection weights store the
knowledge necessary to solve specific problems.
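The idea of learning by example can be sketched with a single artificial neuron trained by the classic perceptron rule. This is a deliberately tiny illustration, assuming a threshold activation and a toy data set (the logical AND function); it is not meant to represent the networks used at the institute.

```python
def predict(weights, bias, inputs):
    """Threshold unit: fire (1) if the weighted input sum exceeds zero."""
    activation = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation > 0 else 0


def train_perceptron(examples, epochs=20, rate=1.0):
    """Iteratively adjust connection weights from verified input/output pairs."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - predict(weights, bias, inputs)
            # Move each weight in the direction that reduces the error.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias


# Hypothetical training set: the logical AND of two binary inputs.
AND_EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_EXAMPLES)
print([predict(weights, bias, x) for x, _ in AND_EXAMPLES])
```

After training, the learned weights and bias are the only place where the "knowledge" about the AND function is stored.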
NNs are used for pattern recognition and
classification tasks, with the ability to robustly classify imprecise
input data, such as in character, speech and image recognition. The
advantage of NNs lies in their resilience against distortions in the
input data and their capability of learning. NNs can be implemented in
software or in specialized hardware.
Hybrid
Systems
Hybrid Systems used for pattern recognition are an effective
combination of neural networks and statistical classifiers, in
particular Hidden Markov Models (HMMs).
Special training procedures are required for Neural Networks in order
to combine them efficiently with HMMs. In many cases, the structure of
the underlying HMMs has to be modified for the combination with Neural
Networks. We designed a wide variety of combination schemes,
including Maximum Mutual Information Neural Networks,
Discriminant Feature Transformation Hybrids and Tied-Posterior-HMMs.
Speech
Processing
Our research in speech processing aims to develop
algorithms and systems that can automatically recognize
continuous speech under real-world conditions. For that purpose,
statistical classifiers as well as hybrid systems are being
investigated. Most methods are based on stochastic Hidden Markov
Models (HMMs), which are utilized as reference models for speech
sounds (phonemes). Words and complete sentences can be built up from
the phoneme models. The sentences are analysed by a speech
understanding module, which gives an interpretation of the meaning.
Special problems arise from the great variability in pronunciation and
from the strong dependence on the speaker. Here, we apply
pronunciation variants and adaptive classifiers to good effect.
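How words and sentences can be assembled from phoneme models, including pronunciation variants, might be sketched as follows. Everything here is hypothetical: the lexicon entries, the phoneme symbols, and the assumption of three states per phoneme are chosen only to show the concatenation principle.

```python
# Hypothetical pronunciation lexicon; a word may have several variants.
LEXICON = {
    "hello": [["h", "e", "l", "o"], ["h", "a", "l", "o"]],
    "world": [["w", "er", "l", "d"]],
}
STATES_PER_PHONEME = 3  # assumption: a simple left-to-right phoneme model


def phoneme_states(phoneme):
    """Names of the HMM states that make up one phoneme model."""
    return [f"{phoneme}_{i}" for i in range(STATES_PER_PHONEME)]


def word_models(word):
    """One concatenated state sequence per pronunciation variant."""
    return [
        [state for ph in variant for state in phoneme_states(ph)]
        for variant in LEXICON[word]
    ]


def sentence_models(words):
    """All state sequences for a sentence, one per combination of variants."""
    sequences = [[]]
    for word in words:
        sequences = [prefix + model
                     for prefix in sequences
                     for model in word_models(word)]
    return sequences


for sequence in sentence_models(["hello", "world"]):
    print(" ".join(sequence))
```

In a real recognizer the variants would be merged into a shared network rather than enumerated, but the building blocks are the same phoneme models.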
Building on the recognition capabilities of the speech
processing algorithms, statistical methods are used to perform natural
speech interpretation by means of stochastic grammars. This ranges
from semantic decoding to automatic translation based on it.
In particular, expectation-based approaches are examined that exploit
all participating knowledge bases in parallel, from the acoustic to
the semantic level.
Gestures,
Action and Emotion
Multimodal
Fusion
The combination of several different modalities for
input and output, such as haptics, speech, and gesture, provides for
efficient, intuitive and error-robust human-machine communication.
Merging different modalities to obtain multimodal information exchange
is one of the most important topics of contemporary research on
human-machine communication. It directly extends the idea of
an enhanced human-machine dialog by introducing natural input
and output channels. Some of the most interesting problems are:
- How can information transfer be distributed appropriately over two
or more available modalities with respect to temporal order and
content?
- To what extent does the concurrent and semantically coupled use of
several modalities improve, for a given application, robustness on
the one hand and efficiency as well as acceptance of an
information processing system on the other?
- Can certain approaches or formalisms be transferred from one
modality to another, and which statistical and possibly rule-based
methods can be applied to perform successive data fusion on the
different abstraction levels?
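One simple statistical option for the data fusion question is late fusion: each modality is recognized separately and the resulting class posteriors are combined. The sketch below uses a weighted log-linear combination. It is only one of many possible schemes and is not taken from the source; the modality names, command names, probabilities, and weights are all invented.

```python
import math


def fuse_late(modality_posteriors, weights):
    """Weighted log-linear combination of per-modality class posteriors."""
    classes = set.intersection(*(set(p) for p in modality_posteriors.values()))
    scores = {
        c: sum(weights[m] * math.log(max(p[c], 1e-12))
               for m, p in modality_posteriors.items())
        for c in classes
    }
    # Normalise the combined scores back to a probability distribution.
    peak = max(scores.values())
    exp_scores = {c: math.exp(s - peak) for c, s in scores.items()}
    total = sum(exp_scores.values())
    return {c: v / total for c, v in exp_scores.items()}


# Hypothetical recognizer outputs for one user command.
posteriors = {
    "speech": {"zoom_in": 0.55, "zoom_out": 0.45},
    "gesture": {"zoom_in": 0.20, "zoom_out": 0.80},
}
fused = fuse_late(posteriors, {"speech": 1.0, "gesture": 1.0})
print(max(fused, key=fused.get))
```

Here the uncertain speech result is overruled by the more confident gesture result, which is the kind of robustness gain multimodal fusion aims for. The weights let an application trust one modality more than another.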
Interactive
Graphics
Techniques based on image processing render new ways
of natural human-machine interaction possible. These include gesture
recognition for visual command input, object tracking for locating
people and identifying their actions, and face recognition to
personalize interactive environments. New dimensions for interaction
open up by combining these methods with immersive technologies like
Augmented or Virtual Reality.
Face
Recognition
Human beings are extremely good at recognizing faces, even under
adverse conditions such as partial occlusion, rotation or visual
distortion. Most people can thus easily spot known individuals in
larger groups, even under unfavourable conditions.
Today, all known technical systems fall far short of these
remarkable, evolutionarily developed recognition capabilities. However,
despite the resulting difficulty of measuring a technical system
against human performance, automated face recognition is still an
active field of research. In addition, finding faces in arbitrary
images as well as recognizing facial expressions is the
focus of several activities within our institute. Modeling and
classification are done using a wide range of the signal processing
methods mentioned above.
Automated face recognition systems enable a wide
spectrum of technical applications. For example, automated entrance
systems for companies are nowadays nearly mature enough for series
production.
Real-Time Detection, Tracking and
Recognition
The key idea of our approach is to formulate detection, recognition
and tracking as classification problems. By doing so, we can
apply the same techniques to all three tasks. The major advantage is
that low-level computations can be shared and have to be done only once.

For each frame, the integral representation needs to be computed only
once and is then used by all three modules for feature computation.
Note that each unit selects features appropriate to its specific task;
the computation time of the features, however, is negligible.
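The following sketch assumes that the "integral representation" refers to an integral image (summed-area table); the source does not say so explicitly. It shows why feature computation becomes cheap once the table exists: the sum over any rectangle needs only four lookups, regardless of the rectangle's size. Function names and the example image are invented.

```python
def integral_image(image):
    """Summed-area table with an extra zero row and column for easy lookups."""
    rows, cols = len(image), len(image[0])
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += image[r][c]
            table[r + 1][c + 1] = table[r][c + 1] + row_sum
    return table


def box_sum(table, top, left, bottom, right):
    """Sum of pixels in rows top..bottom-1 and columns left..right-1."""
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


# Hypothetical 3x3 grey-value frame.
frame = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
table = integral_image(frame)  # computed once per frame
print(box_sum(table, 1, 1, 3, 3))
```

Under this assumption, a detection, a tracking, and a recognition module could each evaluate their own rectangle features on the same table without touching the raw pixels again.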

Person
and Object Tracking
Information
Indexing and Retrieval
With human-machine interaction in mind, the
basic goal of research in this field is to make queries in
multimedia databases as intuitive and efficient as possible. The
motivation for this discipline is the increasing size of
such digital databases due to optimized compression algorithms,
increases in storage space and a growing internet community. While the
roots lie in the textual interpretation of documents, a
number of diverse applications and further forms of digital data, such
as images, video, audio, and video games, can be observed nowadays.
Interpreting these calls for advanced methods of pattern recognition
and artificial intelligence.
In the field of Information Retrieval (IR) the main
focus lies on enabling intuitive access to such data for the user.
Information Indexing (II), on the other hand, is concerned with
processing the data streams for efficient later queries. To this end,
data are categorized, subdivided, or even sorted and summarized.
At our institute, an IR demonstrator recognizes tools drawn with the
mouse as queries to a tool database. Furthermore, we investigate
hummed or sung queries for easy access to large music archives.
Previously unknown polyphonic audio tracks are preprocessed to enable
dynamic matching against monophonic samples. Finally, in the field of
Information Indexing we strive
for the multimodal recognition of action units in recorded meetings.
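Matching a hummed query against a stored melody has to tolerate differences in tempo. One standard technique for this is dynamic time warping (DTW), sketched below on pitch sequences. Whether the "dynamic matching" mentioned above is DTW is an assumption, and the note values (MIDI-style semitone numbers) are invented.

```python
def dtw_distance(query, reference):
    """Dynamic time warping cost between two pitch sequences."""
    inf = float("inf")
    n, m = len(query), len(reference)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(query[i - 1] - reference[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = step + min(cost[i - 1][j],
                                    cost[i][j - 1],
                                    cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical melodies as semitone numbers; the query holds notes longer.
melody = [60, 62, 64]
hummed = [60, 60, 62, 64, 64]
print(dtw_distance(hummed, melody))
```

A retrieval system could compute this cost against every indexed melody and return the entries with the lowest values.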
User
Interfaces and Modeling
Usability
Engineering
Thorough design of novel dialogue concepts, validated through
usability and acceptance tests, yields human-machine
interfaces that users really enjoy.
Handwriting
Recognition
The goal of automatic handwriting recognition is to
enhance user-friendliness through pen-based input devices and to
increase automation for fast and efficient processing of large amounts
of documents. Automatic handwriting recognition can be done either
"on-line", at the time of input, or "off-line", when
processing documents. On-line means in this context that
time information, i.e. the trajectory of the strokes, is processed as
well. In contrast, off-line recognition uses only an image of the writing.
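The difference between on-line and off-line input can be illustrated with a toy stroke. In the sketch below, the on-line view derives writing directions from the ordered pen samples, while the off-line view rasterizes the same samples into a bitmap and thereby loses their order. The stroke coordinates, the grid size, and both function names are made up for this example.

```python
import math


def writing_directions(stroke):
    """On-line feature: direction angle in degrees between consecutive samples."""
    return [
        math.degrees(math.atan2(y1 - y0, x1 - x0))
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:])
    ]


def rasterize(stroke, width, height):
    """Off-line view: mark visited pixels, discarding the temporal order."""
    image = [[0] * width for _ in range(height)]
    for x, y in stroke:
        image[y][x] = 1
    return image


# Hypothetical pen samples: right along the top edge, then down the right edge.
stroke = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
backwards = list(reversed(stroke))

print(writing_directions(stroke))
print(writing_directions(backwards))
print(rasterize(stroke, 3, 3) == rasterize(backwards, 3, 3))
```

Writing the same shape in the opposite direction changes the on-line features but leaves the bitmap identical, which is exactly the information an off-line recognizer cannot use.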
Besides the well-known OCR (optical character
recognition) of machine-printed and digitized characters and the
recognition of single handwritten characters, the recognition of
cursive longhand plays a growing role for the input of text in mobile
devices.
Example applications are:
- On-line handwriting recognition: Personal Digital Assistant (PDA),
Pocket PC, digitizer tablet, Notebook, Webpad, Tablet PC
- Off-line handwriting recognition: handwritten notes, address
recognition (mail), form processing
- Document recognition (OCR): archiving (newspapers, bills), indexing
and retrieval in databases, form processing, address recognition
Depending on the application, different questions
prevail:
- localization, preprocessing and feature extraction of the script
- recognition of single characters, words or sentences
- segmentation properties (block letters, longhand, connected or
divided characters because of low quality or resolution)
- number of different fonts or writers (writer independent or not,
adaptation)
- choice of a codebook (size) or language model, grammar
Recognizing continuous cursive longhand, which cannot
be easily segmented into single characters, is quite similar to a speech
recognition task. For both tasks, statistical methods for pattern
recognition (e.g. Hidden Markov
Models) are the most common technique for modeling and recognition.
Acoustics
Technical Acoustics and Noise Abatement
Physical and hearing-related methods for the
evaluation of noise are developed and implemented in measuring
systems. Sound Quality Design refers to creating the desired sound
characteristics of industrial products using psychophysical methods.
Psychoacoustics
The properties of the human auditory system are being
investigated and considered in practical applications, e.g. in the
context of source coding of audio signals, audiology, audio
engineering technology or room acoustics.