4 Open Issues, Future Research Directions, and Conclusions



Chapter 7
On Independent Component Analysis for Multimedia Signals

Lars Kai Hansen, Jan Larsen, and Thomas Kolenda

7.1 Background

Blind reconstruction of statistically independent source signals from linear mixtures is relevant to many signal processing contexts [1, 6, 8, 9, 22, 24, 36]. With reference to principal component analysis (PCA), the problem is often referred to as independent component analysis (ICA).¹
The source separation problem can be given a likelihood formulation (see, e.g., [7, 32, 35, 37]). The likelihood formulation is attractive for several reasons. First, it allows a principled discussion of the inevitable priors implicit in any separation scheme. The prior distribution of the source signals can take many forms and factorizes over the source index, expressing the fact that we look for independent sources. Second, the likelihood approach allows for direct adaptation of the plethora of powerful schemes for parameter optimization, regularization, and evaluation of supervised learning algorithms. Finally, for the case of linear mixtures without noise, the likelihood approach is equivalent to another popular approach based on information maximization [1, 6, 27].
The source separation problem can be analyzed under the assumption that the sources either
are time independent or possess a more general time-dependence structure. The separation
problem for autocorrelated sequences was studied by Molgedey and Schuster [33]. They
proposed a source separation scheme based on assumed nonvanishing temporal autocorrelation
functions of the independent source sequences evaluated at a specific time lag. Their analysis
was developed for sources mixed by square, nonsingular matrices. Attias and Schreiner derived
a likelihood-based algorithm for separation of correlated sequences with a frequency domain
implementation [2]–[4]. The approach of Molgedey and Schuster is particularly interesting as
regards computational complexity because it forms a noniterative, constructive solution.
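The constructive character of the scheme is easy to make concrete. The following is a minimal NumPy sketch, not the authors' implementation; the function name, the default lag, and the symmetrization step are our own choices. It obtains the mixing matrix as the eigenvectors of the quotient of the lag-τ and zero-lag correlation matrices, with no iterative optimization:

```python
import numpy as np

def molgedey_schuster(X, tau=1):
    """Illustrative sketch: separate a square, noise-free mixture X (M x N)
    via the quotient of the lag-tau and zero-lag correlation matrices.
    Assumes sources with distinct autocorrelations at lag tau."""
    X = X - X.mean(axis=1, keepdims=True)
    M, N = X.shape
    C0 = X @ X.T / N                                # zero-lag correlation
    Ct = X[:, :N - tau] @ X[:, tau:].T / (N - tau)  # lag-tau correlation
    Ct = 0.5 * (Ct + Ct.T)  # symmetrizing curbs spurious complex eigenvalues
    # For independent sources, C_x(tau) C_x(0)^{-1} = A diag(.) A^{-1},
    # so the eigenvectors recover the mixing matrix A
    # up to scale and permutation.
    _, A = np.linalg.eig(Ct @ np.linalg.inv(C0))
    A = np.real(A)
    S = np.linalg.solve(A, X)  # unmixed source estimates
    return A, S
```

For instance, two mixed sinusoids of different frequencies are separated in one shot by a single eigendecomposition, which is what makes the scheme attractive from a computational-complexity standpoint.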
Belouchrani and Cardoso presented a general likelihood approach allowing for additive
noise and nonsquare mixing matrices. They applied the method to separation of sources taking
discrete values [7], estimating the mixing matrix using an estimate–maximize (EM) approach
with both a deterministic and a stochastic formulation. Moulines et al. generalized the EM
approach to separation of autocorrelated sequences in the presence of noise, and they explored
a family of flexible source priors based on Gaussian mixtures [34]. The difficult problem
¹ There are a number of very useful ICA Web pages providing links to theoretical analysis, implementations, and applications. Follow links from the page http://eivind.imm.dtu.dk/staff/lkhansen/ica.html.

© 2001 CRC Press LLC

of noisy, overcomplete source models (i.e., more sources than acquired mixture signals) was
recently analyzed by Lewicki and Sejnowski within the likelihood framework [28, 31].
In this chapter we study the likelihood approach and entertain two different approaches
to the problem: a modified version of the Molgedey–Schuster scheme [15], based on time
correlations, and a novel iterative scheme generalizing the mixing problem to separation of
noisy mixtures of time-independent white sources [16]. The Molgedey–Schuster scheme is extended to the undercomplete case (i.e., more acquired mixture signals than sources), and its inherent problem of erroneous complex-valued results is alleviated. In the noisy mixture problem we find a maximum a posteriori estimate for the sources that, interestingly, turns out to be nonlinear in the observed signal. The specific model investigated here is a special case of the general framework proposed by Belouchrani and Cardoso [7]; however, we formulate the parameter estimation problem in terms of the Boltzmann learning rule, which allows for a particularly transparent derivation of the mixing matrix estimate.
The methods are applied within several multimedia applications: separation of sound, image
sequences, and text.

7.2 Principal and Independent Component Analysis

PCA is a very popular tool for the analysis of correlated data, such as temporally correlated image databases. With PCA the image database is decomposed in terms of “eigenimages” that often
lend themselves to direct interpretation. A most striking example is face recognition, where
so-called eigenfaces are used as orthogonal preprocessing projection directions for pattern
recognition. The principal components (the sequence of projections of the image data onto the
eigenimages) are also uncorrelated and, hence, perhaps the simplest example of independent
components [9]. The basic tool for PCA is singular value decomposition (SVD).
Define the observed $M \times N$ signal matrix, representing a multichannel signal, by
$$
X = \{X_{m,n}\} = \{x_m(n)\} = [x(1), x(2), \ldots, x(N)]
\tag{7.1}
$$
where $M$ is the number of measurements and $N$ is the number of samples. $x_m(n)$, $n = 1, 2, \ldots, N$, is the $m$th signal and $x(n) = [x_1(n), x_2(n), \cdots, x_M(n)]^\top$. In the case of image sequences, $M$ is the number of pixels.
For a fixed choice of $P \le M$, the SVD of $X$ reads²
$$
X = U D V^\top = \sum_{i=1}^{P} u_i D_{i,i} v_i^\top ,
\qquad
X_{m,n} = \sum_{i=1}^{P} U_{m,i} D_{i,i} V_{n,i}
\tag{7.2}
$$

where the $M \times P$ matrix $U = \{U_{m,i}\} = [u_1, u_2, \ldots, u_P]$ and the $N \times P$ matrix $V = \{V_{n,i}\} = [v_1, v_2, \ldots, v_P]$ represent the orthonormal basis vectors (i.e., eigenvectors of the symmetric matrices $XX^\top$ and $X^\top X$, respectively). $D = \{D_{i,i}\}$ is a $P \times P$ diagonal matrix of singular values. In terms of independent sources, SVD can identify a set of uncorrelated time sequences, the principal components $D_{i,i} v_i$, enumerated by the source index $i = 1, 2, \ldots, P$. That is, we can write the observed signal as a weighted sum of fixed eigenvectors (eigenimages) $u_i$.
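As a small numerical check (a toy NumPy sketch; the dimensions and the rank-3 test matrix are invented for illustration), the truncated SVD of (7.2) can be computed directly and the mutual uncorrelatedness (here, orthogonality) of the principal components $D_{i,i} v_i$ verified:

```python
import numpy as np

# Toy multichannel signal: M = 6 "pixels", N = 500 samples,
# generated from 3 latent components so that rank(X) = 3.
rng = np.random.default_rng(1)
X = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 500))

P = 3
U, d, Vt = np.linalg.svd(X, full_matrices=False)
U, d, Vt = U[:, :P], d[:P], Vt[:P, :]  # truncated SVD as in (7.2)

# The principal components D_ii v_i are mutually uncorrelated:
PCs = d[:, None] * Vt        # P x N matrix of component sequences
cross = PCs @ PCs.T          # = D V^T V D = D^2, i.e., diagonal

# X is reconstructed exactly, since it has rank P:
X_hat = U @ np.diag(d) @ Vt
```

The off-diagonal entries of `cross` vanish because the rows of $V^\top$ are orthonormal, which is precisely the uncorrelatedness of the principal components noted above.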
However, considering the likelihood for the time-correlated source density, we are often
interested in a slightly more general separation of image sources that are independent in time
² Usually, SVD expresses $X = \tilde{U} \tilde{D} \tilde{V}^\top$, where $\tilde{U}$ is $M \times M$, $\tilde{D}$ is $M \times N$, and $\tilde{V}$ is $N \times N$. $U$ is the first $P$ columns of $\tilde{U}$, $D$ is the $P \times P$ upper-left submatrix of $\tilde{D}$, and $V$ is the first $P$ columns of $\tilde{V}$.
