
4 Open Issues, Future Research Directions, and Conclusions


Chapter 7

On Independent Component Analysis for Multimedia Signals

Lars Kai Hansen, Jan Larsen, and Thomas Kolenda

7.1 Background

Blind reconstruction of statistically independent source signals from linear mixtures is relevant to many signal processing contexts [1, 6, 8, 9, 22, 24, 36]. With reference to principal component analysis (PCA), the problem is often referred to as independent component analysis (ICA).¹

The source separation problem can be given a likelihood formulation (see, e.g., [7, 32, 35, 37]). The likelihood formulation is attractive for several reasons. First, it allows a principled discussion of the inevitable priors implicit in any separation scheme. The prior distribution of the source signals can take many forms and factorizes over the source index, expressing the fact that we look for independent sources. Second, the likelihood approach allows for direct adaptation of the plethora of powerful schemes for parameter optimization, regularization, and evaluation developed for supervised learning algorithms. Finally, for the case of linear mixtures without noise, the likelihood approach is equivalent to another popular approach based on information maximization [1, 6, 27].
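For the noiseless square case this equivalence can be made explicit. As a sketch in generic notation (not the chapter's own later noisy formulation): writing x(n) = A s(n) with unmixing matrix W = A⁻¹ and a source prior that factorizes over the source index, the log-likelihood of W takes the standard form

```latex
% Log-likelihood of the unmixing matrix W for a noiseless, square mixture
% x(n) = A s(n), under the factorized source prior p(s) = \prod_i p_i(s_i):
\log L(W) = \sum_{n=1}^{N} \left[ \log \left| \det W \right|
          + \sum_{i=1}^{M} \log p_i\!\left( w_i^\top x(n) \right) \right]
```

where w_i^⊤ denotes the ith row of W. The Jacobian term log|det W| is exactly what the information maximization network optimizes implicitly, which is the source of the equivalence noted above.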

The source separation problem can be analyzed under the assumption that the sources are either time independent or possess a more general time-dependence structure. The separation problem for autocorrelated sequences was studied by Molgedey and Schuster [33], who proposed a source separation scheme based on the assumed nonvanishing temporal autocorrelation functions of the independent source sequences, evaluated at a specific time lag. Their analysis was developed for sources mixed by square, nonsingular matrices. Attias and Schreiner derived a likelihood-based algorithm for separation of correlated sequences with a frequency domain implementation [2]–[4]. The approach of Molgedey and Schuster is particularly interesting as regards computational complexity because it forms a noniterative, constructive solution.
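The constructive nature of such lag-based separation can be illustrated in a few lines: for a square, noiseless mixture, whitening with the zero-lag covariance and then diagonalizing the time-lagged covariance recovers the unmixing directions without any iterative optimization. The following NumPy sketch works under these assumptions; the two synthetic sources, the lag τ = 10, and the mixing matrix are illustrative choices, not taken from the chapter.

```python
import numpy as np

# Two zero-mean sources with clearly different autocorrelation at lag tau:
# a slow sinusoid and a faster sawtooth (illustrative choices).
N = 2000
t = np.arange(N)
S = np.vstack([
    np.sin(2 * np.pi * t / 200),
    2.0 * ((t / 37.0) % 1.0) - 1.0,
])
S -= S.mean(axis=1, keepdims=True)

A = np.array([[1.0, 0.6],
              [0.4, 1.0]])           # square, nonsingular mixing matrix
X = A @ S                            # observed mixtures, one row per channel

tau = 10
C0 = X @ X.T / N                     # zero-lag covariance
Ct = X[:, :-tau] @ X[:, tau:].T / (N - tau)
Ct = (Ct + Ct.T) / 2                 # symmetrize: avoids complex-valued results

# Whiten with C0, then diagonalize the lagged covariance in the whitened
# space; this is the constructive (noniterative) core of the scheme.
e, V = np.linalg.eigh(C0)
Q = (V / np.sqrt(e)).T               # whitening matrix: Q C0 Q^T = I
Mtau = Q @ Ct @ Q.T
_, R = np.linalg.eigh((Mtau + Mtau.T) / 2)
W = R.T @ Q                          # unmixing matrix (up to permutation/sign/scale)
S_hat = W @ X                        # recovered sources
```

Note that the estimate is recovered only up to permutation, sign, and scale of the sources, which is the usual inherent ambiguity of blind separation.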

Belouchrani and Cardoso presented a general likelihood approach allowing for additive noise and nonsquare mixing matrices. They applied the method to separation of sources taking discrete values [7], estimating the mixing matrix using an estimate–maximize (EM) approach with both a deterministic and a stochastic formulation. Moulines et al. generalized the EM approach to separation of autocorrelated sequences in the presence of noise, and they explored a family of flexible source priors based on Gaussian mixtures [34]. The difficult problem of noisy, overcomplete source models (i.e., more sources than acquired mixture signals) was recently analyzed by Lewicki and Sejnowski within the likelihood framework [28, 31].

¹ There are a number of very useful ICA Web pages providing links to theoretical analysis, implementations, and applications. Follow links from the page http://eivind.imm.dtu.dk/staff/lkhansen/ica.html.

© 2001 CRC Press LLC

In this chapter we study the likelihood approach and entertain two different strategies: a modified version of the Molgedey–Schuster scheme [15], based on time correlations, and a novel iterative scheme generalizing the mixing problem to separation of noisy mixtures of time-independent white sources [16]. The Molgedey–Schuster scheme is extended to the undercomplete case (i.e., more acquired mixture signals than sources), and its inherent erroneous complex-valued results are alleviated. In the noisy mixture problem we find a maximum a posteriori estimate for the sources that, interestingly, turns out to be nonlinear in the observed signal. The specific model investigated here is a special case of the general framework proposed by Belouchrani and Cardoso [7]; however, we formulate the parameter estimation problem in terms of the Boltzmann learning rule, which allows for a particularly transparent derivation of the mixing matrix estimate.

The methods are applied within several multimedia applications: separation of sound, image sequences, and text.

7.2 Principal and Independent Component Analysis

PCA is a very popular tool for the analysis of correlated data, such as temporally correlated image databases. With PCA the image database is decomposed in terms of "eigenimages" that often lend themselves to direct interpretation. A most striking example is face recognition, where so-called eigenfaces are used as orthogonal preprocessing projection directions for pattern recognition. The principal components (the sequence of projections of the image data onto the eigenimages) are also uncorrelated and are, hence, perhaps the simplest example of independent components [9]. The basic tool for PCA is singular value decomposition (SVD).

Define the observed M × N signal matrix, representing a multichannel signal, by

$$ X = \{X_{m,n}\} = \{x_m(n)\} = [x(1), x(2), \ldots, x(N)] \tag{7.1} $$

where M is the number of measurements and N is the number of samples; x_m(n), n = 1, 2, …, N, is the mth signal and x(n) = [x_1(n), x_2(n), …, x_M(n)]^⊤. In the case of image sequences, M is the number of pixels.

For the fixed choice of P ≤ M, the SVD of X reads²

$$ X = U D V^\top = \sum_{i=1}^{P} u_i D_{i,i} v_i^\top , \qquad X_{m,n} = \sum_{i=1}^{P} U_{m,i} D_{i,i} V_{n,i} \tag{7.2} $$

where the M × P matrix U = {U_{m,i}} = [u_1, u_2, …, u_P] and the N × P matrix V = {V_{n,i}} = [v_1, v_2, …, v_P] represent the orthonormal basis vectors (i.e., eigenvectors of the symmetric matrices XX^⊤ and X^⊤X, respectively). D = {D_{i,i}} is a P × P diagonal matrix of singular values. In terms of independent sources, SVD can identify a set of uncorrelated time sequences, the principal components D_{i,i} v_i^⊤, enumerated by the source index i = 1, 2, …, P. That is, we can write the observed signal as a weighted sum of fixed eigenvectors (eigenimages) u_i.
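As a concrete illustration of equation (7.2), the following NumPy sketch decomposes a synthetic multichannel signal and checks the two properties used above: the rank-P reconstruction from the truncated factors, and the mutual uncorrelatedness of the principal components D_{i,i} v_i^⊤. The dimensions and the synthetic data model are made-up illustrative choices.

```python
import numpy as np

# Synthetic "image sequence": M pixels observed over N frames, driven by
# P underlying temporal patterns plus a little noise (illustrative sizes).
rng = np.random.default_rng(1)
M, N, P = 64, 500, 3
eigenimages = rng.standard_normal((M, P))
patterns = rng.standard_normal((P, N))
X = eigenimages @ patterns + 0.01 * rng.standard_normal((M, N))

# Truncated SVD as in equation (7.2): X ~ U D V^T with P components.
U, d, Vt = np.linalg.svd(X, full_matrices=False)
U, d, Vt = U[:, :P], d[:P], Vt[:P, :]

X_hat = U @ np.diag(d) @ Vt          # rank-P approximation of the data
pcs = np.diag(d) @ Vt                # principal components D_{i,i} v_i^T

# The principal components are mutually uncorrelated time sequences:
# because the rows of Vt are orthonormal, this matrix is diagonal.
C = pcs @ pcs.T / N
```

Note the correspondence to footnote 2: `full_matrices=False` requests the "thin" SVD directly, and the slicing then keeps only the first P columns of U and V and the P × P upper-left block of D.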

However, considering the likelihood for the time-correlated source density, we are often interested in a slightly more general separation of image sources that are independent in time

² Usually, SVD expresses $X = \bar{U} \bar{D} \bar{V}^\top$, where $\bar{U}$ is M × M, $\bar{D}$ is M × N, and $\bar{V}$ is N × N. U is the first P columns of $\bar{U}$, D is the P × P upper-left submatrix of $\bar{D}$, and V is the first P columns of $\bar{V}$.
