Skip to main content
Log in

Autonomous graph mining algorithm search with best performance trade-off

  • Regular paper
  • Published:
Knowledge and Information Systems Aims and scope Submit manuscript

Abstract

The pervasiveness of graphs today has raised the demand for algorithms to answer various questions: Which products would a user like to purchase given her order list? Which users are buying fake followers? Myriads of new graph algorithms are proposed every year to answer such questions—each with a distinct problem formulation, computational time, and memory footprint. This lack of unity makes it difficult for practitioners to compare different algorithms and pick the most suitable one for their application. These challenges create a gap in which state-of-the-art techniques developed in academia fail to be optimally deployed in real-world applications. To bridge this gap, we propose AutoGM, an automated system for graph mining algorithm development. We first define a unified framework UnifiedGM for message-passing-based graph algorithms. UnifiedGM defines a search space in which five parameters are required to determine a graph algorithm. Under this search space, AutoGM explicitly optimizes for the optimal parameter set of UnifiedGM using Bayesian Optimization. AutoGM defines a novel budget-aware objective function for the optimization to find the best speed-accuracy trade-off in algorithms under a computation budget. On various real-world datasets, AutoGM generates novel graph algorithms with the best speed/accuracy trade-off compared to existing models with heuristic parameters.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Similar content being viewed by others

Notes

  1. This paper is an extended version of [46].

  2. https://github.com/minjiyoon/ICDM20-AutoGM.

  3. https://github.com/fmfn/BayesianOptimization.

References

  1. Bahmani B, Chowdhury A, Goel A (2010) Fast incremental and personalized pagerank. Proc VLDB Endow 4(3):173–184

    Article  Google Scholar 

  2. Bennett J, Lanning S (2007) August. The netflix prize. In Proceedings of KDD cup and workshop (Vol. 2007, p. 35)

  3. Brochu E, Cora VM, De Freitas N (2010) A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599

  4. Brohee S, Van Helden J (2006) Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinf 7(1):1–19

    Article  Google Scholar 

  5. Dreżewski R, Sepielak J, Filipkowski W (2015) The application of social network analysis algorithms in a system supporting money laundering detection. Inf Sci 295:18–32

    Article  MathSciNet  Google Scholar 

  6. Eksombatchai C, Jindal P, Liu JZ, Liu Y, Sharma R, Sugnet C, Ulrich M, Leskovec J (2018) April. Pixie: a system for recommending 3+ billion items to 200+ million users in real-time. In: Proceedings of the 2018 world wide web conference. pp. 1775–1784

  7. Floreano D, Dürr P, Mattiussi C (2008) Neuroevolution: from architectures to learning. Evol Intel 1(1):47–62

    Article  Google Scholar 

  8. Gao Y, Yang H, Zhang P, Zhou C, Hu Y (2020) July. Graph neural architecture search. IJCAI 20:1403–1409

    Google Scholar 

  9. Guo M, Yi T, Zhu Y, Bao Y (2021) Jitune: Just-in-time hyperparameter tuning for network embedding algorithms. arXiv preprint arXiv:2101.06427

  10. Hamilton W, Ying Z, Leskovec J (2017) Inductive representation learning on large graphs. Advances in neural information processing systems, 30

  11. Hooi B, Song HA, Beutel A, Shah N, Shin K, Faloutsos C (2016) August. Fraudar: Bounding graph fraud in the face of camouflage. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. pp. 895–904

  12. Hu Z, Dong Y, Wang K, Sun Y (2020) April. Heterogeneous graph transformer. In: Proceedings of The Web Conference 2020. pp. 2704–2710

  13. Kandasamy K, Dasarathy G, Oliva JB, Schneider J, Póczos B (2016) Gaussian process bandit optimisation with multi-fidelity evaluations. Advances in neural information processing systems, 29

  14. Kandasamy K, Dasarathy G, Schneider J, Póczos B (2017) July. Multi-fidelity bayesian optimisation with continuous approximations. In: International Conference on Machine Learning. PMLR. pp. 1799–1808

  15. Kandasamy K, Neiswanger W, Schneider J, Poczos B, Xing EP (2018) Neural architecture search with bayesian optimisation and optimal transport. Advances in neural information processing systems, 31

  16. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. Proceedings of the 3rd International Conference on Learning Representations 2014

  17. Kipf TN, Welling M (2017) Semi-supervised classification with graph convolutional networks. In: 5th International Conference on Learning Representations 2017

  18. Kitano H (1990) Designing neural networks using genetic algorithms with graph generation system. Complex Syst 4:461–476

    MATH  Google Scholar 

  19. Kleinberg JM (1999) Authoritative sources in a hyperlinked environment. J ACM (JACM) 46(5):604–632

    Article  MathSciNet  Google Scholar 

  20. Koutra D, Ke TY, Kang U, Chau DHP, Pao HKK, Faloutsos C (2011) September. Unifying guilt-by-association approaches: theorems and fast algorithms. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 245–260). Springer, Berlin, Heidelberg

  21. Lemaire C, Achkar A, Jodoin PM (2019) Structured pruning of neural networks with budget-aware regularization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 9108–9116

  22. Li X, Zhou Y, Pan Z, Feng J (2019) Partial order pruning: for best speed/accuracy trade-off in neural architecture search. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 9145–9153

  23. Liu H, Simonyan K, Yang Y (2019) Darts: differentiable architecture search. International Conference on Learning Representations, ICLR 2019

  24. Liu H, Simonyan K, Vinyals O, Fernando C, Kavukcuoglu K (2017) Hierarchical representations for efficient architecture search. International Conference on Learning Representations, ICLR 2018

  25. Metropolis N, Ulam S (1949) The Monte Carlo method. J Am Stat Assoc 44(247):335–341

    Article  Google Scholar 

  26. Michalak K, Korczak J (2011) September. Graph mining approach to suspicious transaction detection. In 2011 Federated conference on computer science and information systems (FedCSIS) (pp. 69–75). IEEE

  27. Page L, Brin S, Motwani R, Winograd T (1999) The PageRank citation ranking: bringing order to the web. Stanford InfoLab

  28. Pearl J (2022) Reverend Bayes on inference engines: a distributed hierarchical approach. In Probabilistic and Causal Inference: The Works of Judea Pearl. pp. 129–138

  29. Robert JV (2021) Linear programming: foundations and extensions. Springer, Berlin

    Google Scholar 

  30. Sen P, Namata G, Bilgic M, Getoor L, Galligher B, Eliassi-Rad T (2008) Collective classification in network data. AI Mag 29(3):93–93

    Google Scholar 

  31. Shchur O, Mumme M, Bojchevski A, Günnemann S (2018) 2018. Pitfalls of graph neural network evaluation, Relational Representation Learning Workshop, NeurIPS

  32. Shin K, Eliassi-Rad T, Faloutsos C (2016) December. Corescope: Graph mining using k-core analysis-patterns, anomalies and algorithms. In: 2016 IEEE 16th international conference on data mining (ICDM) (pp. 469–478). IEEE

  33. Strehl A, Ghosh J (2000) December. A scalable approach to balanced, high-dimensional clustering of market-baskets. In International Conference on High-Performance Computing (pp. 525–536). Springer, Berlin, Heidelberg

  34. Szekeres G, Wilf HS (1968) An inequality for the chromatic number of a graph. J Comb Theory 4(1):1–3

    Article  MathSciNet  Google Scholar 

  35. Tu K, Ma J, Cui P, Pei J, Zhu W (2019) July. Autone: Hyperparameter optimization for massive network embedding. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 216–225

  36. Vazquez A, Flammini A, Maritan A, Vespignani A (2003) Global protein function prediction from protein-protein interaction networks. Nat Biotechnol 21(6):697–700

    Article  Google Scholar 

  37. Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y (2018) 6th International Conference on Learning Representations , ICLR 2018

  38. Wang Y, Liu S, Yoon M, Lamba H, Wang W, Faloutsos C, Hooi B (2020) November. Provably robust node classification via low-pass message passing. In: 2020 IEEE International Conference on Data Mining (ICDM) (pp. 621–630). IEEE

  39. Wright S, Nocedal J (1999) Numerical optimization. Springer Sci 35(67–68):7

    MATH  Google Scholar 

  40. Wu F, Souza A, Zhang T, Fifty C, Yu T, Weinberger K (2019) May. Simplifying graph convolutional networks. In International conference on machine learning (pp. 6861–6871). PMLR

  41. Wu Z, Pan S, Chen F, Long G, Zhang C, Philip SY (2020) A comprehensive survey on graph neural networks. IEEE Trans ctions Neural Networks Learn Systems 32(1):4–24

    Article  MathSciNet  Google Scholar 

  42. Yao L, Mao C, Luo Y (2019) July. Graph convolutional networks for text classification. In Proceedings of the AAAI conference on artificial intelligence (Vol. 33, No. 01, pp. 7370-7377)

  43. Yoon M, Jung J, Kang U (2018) April. Tpa: fast, scalable, and accurate method for approximate random walk with restart on billion scale graphs. In: 2018 IEEE 34th International Conference on Data Engineering (ICDE) (pp. 1132–1143). IEEE

  44. Yoon M, Jin W, Kang U (2018) April. Fast and accurate random walk with restart on dynamic graphs with guarantees. In: Proceedings of the 2018 World Wide Web Conference (pp. 409–418)

  45. Yoon M, Hooi B, Shin K, Faloutsos C (2019) July. Fast and accurate anomaly detection in dynamic graphs with a two-pronged approach. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 647–657)

  46. Yoon M, Gervet T, Hooi B, Faloutsos C (2020) November. Autonomous graph mining algorithm search with best speed/accuracy trade-off. In: 2020 IEEE International Conference on Data Mining (ICDM) (pp. 751–760). IEEE

  47. You J, Ying Z, Leskovec J (2020) Design space for graph neural networks. Adv Neural Inf Process Syst 33:17009–17021

    Google Scholar 

  48. Yuan Y, Wang W, Coghill GM, Pang W (2021) A novel genetic algorithm with hierarchical evaluation strategy for hyperparameter optimisation of graph neural networks. arXiv preprint arXiv:2101.09300

  49. Zamir O, Etzioni O (1998) August. Web document clustering: a feasibility demonstration. In Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval (pp. 46–54)

  50. Zhang Z, Wang X, Zhu W (2021) Automated machine learning on graphs: a survey. In: Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21), Survey Track

  51. Zhong Z, Yan J, Liu CL (2017) Practical network blocks design with q-learning. arXiv preprint arXiv:1708.05552, 6

  52. Zoph B, Le QV (2017) Neural architecture search with reinforcement learning. International Conference on Learning Representations, ICLR 2017

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Minji Yoon.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Yoon, M., Gervet, T., Hooi, B. et al. Autonomous graph mining algorithm search with best performance trade-off. Knowl Inf Syst 64, 1571–1602 (2022). https://doi.org/10.1007/s10115-022-01683-8

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10115-022-01683-8

Keywords

Navigation