Abstract
A widely acknowledged drawback of many statistical modelling techniques, commonly used in machine learning, is that the resulting model is extremely difficult to interpret. A number of new concepts and algorithms have been introduced by researchers to address this problem. They focus primarily on determining which inputs are relevant in predicting the output. This work describes a transparent, advanced non-linear modelling approach that enables the constructed predictive models to be visualised, allowing model validation and assisting in interpretation. The technique combines the representational advantage of a sparse ANOVA decomposition with the good generalisation ability of a kernel machine. It achieves this by employing two forms of regularisation: a 1-norm based structural regulariser to enforce transparency, and a 2-norm based regulariser to control smoothness. The resulting model structure can be visualised, showing the overall effects of different inputs, their interactions, and the strength of the interactions. The robustness of the technique is illustrated using a range of both artificial and “real world” datasets. The performance is compared to other modelling techniques, and it is shown to exhibit competitive generalisation performance together with improved interpretability.
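The two-regulariser idea described above can be sketched in code. The following is a minimal NumPy illustration, not the authors' algorithm: it builds a small dictionary of ANOVA-component Gram matrices (univariate kernels plus pairwise tensor-product kernels), then alternates between a kernel ridge step (the 2-norm smoothness regulariser) and a nonnegative lasso step over component weights (the 1-norm structural regulariser). The toy data, kernel widths, regularisation weights, and the alternating scheme itself are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss_kernel(a, b, width=1.0):
    # Univariate Gaussian (RBF) Gram matrix between sample vectors a and b.
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2.0 * width ** 2))

# Toy data (illustrative): y depends on x1 and x2 only; x3 is irrelevant.
n = 120
X = rng.uniform(-2, 2, size=(n, 3))
y = np.sin(2 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.standard_normal(n)

# Dictionary of ANOVA-component Gram matrices: univariate terms k_j(x_j, x'_j)
# plus pairwise interactions built as tensor (elementwise) products.
uni = [gauss_kernel(X[:, j], X[:, j]) for j in range(3)]
pairs = [(0, 1), (0, 2), (1, 2)]
Ks = uni + [uni[i] * uni[j] for i, j in pairs]
m = len(Ks)

lam1, lam2 = 1.0, 1e-2   # 1-norm (structure) and 2-norm (smoothness) weights
d = np.ones(m)           # nonnegative weight per ANOVA component

for it in range(20):
    # Step A (2-norm): kernel ridge with the current weighted kernel sum.
    K = sum(dj * Kj for dj, Kj in zip(d, Ks))
    alpha = np.linalg.solve(K + lam2 * np.eye(n), y)
    # Step B (1-norm): nonnegative lasso over component responses z_j = K_j alpha,
    # solved by cyclic coordinate descent with soft-thresholding clipped at zero.
    Z = np.stack([Kj @ alpha for Kj in Ks], axis=1)
    for _ in range(50):
        for j in range(m):
            r = y - Z @ d + Z[:, j] * d[j]
            d[j] = max(0.0, (Z[:, j] @ r - lam1 / 2.0) / (Z[:, j] @ Z[:, j]))

f_hat = Z @ d
print("component weights:", np.round(d, 3))
print("train MSE:", np.mean((y - f_hat) ** 2))
```

Components whose weight `d[j]` is driven to zero drop out of the model, which is what makes the fitted structure visualisable: the surviving univariate and pairwise terms can each be plotted on their own.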
Cite this article
Gunn, S., Kandola, J. Structural Modelling with Sparse Kernels. Machine Learning 48, 137–163 (2002). https://doi.org/10.1023/A:1013903804720