Abstract
The pervasiveness of graphs today has raised the demand for algorithms to answer various questions: Which products would a user like to purchase given her order list? Which users are buying fake followers? Myriads of new graph algorithms are proposed every year to answer such questions—each with a distinct problem formulation, computational time, and memory footprint. This lack of unity makes it difficult for practitioners to compare different algorithms and pick the most suitable one for their application. These challenges create a gap in which state-of-the-art techniques developed in academia fail to be optimally deployed in real-world applications. To bridge this gap, we propose AutoGM, an automated system for graph mining algorithm development. We first define a unified framework UnifiedGM for message-passing-based graph algorithms. UnifiedGM defines a search space in which five parameters are required to determine a graph algorithm. Under this search space, AutoGM explicitly optimizes for the optimal parameter set of UnifiedGM using Bayesian Optimization. AutoGM defines a novel budget-aware objective function for the optimization to find the best speed-accuracy trade-off in algorithms under a computation budget. On various real-world datasets, AutoGM generates novel graph algorithms with the best speed/accuracy trade-off compared to existing models with heuristic parameters.
Similar content being viewed by others
Notes
This paper is an extended version of [46].
References
Bahmani B, Chowdhury A, Goel A (2010) Fast incremental and personalized pagerank. Proc VLDB Endow 4(3):173–184
Bennett J, Lanning S (2007) August. The netflix prize. In Proceedings of KDD cup and workshop (Vol. 2007, p. 35)
Brochu E, Cora VM, De Freitas N (2010) A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599
Brohee S, Van Helden J (2006) Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinf 7(1):1–19
Dreżewski R, Sepielak J, Filipkowski W (2015) The application of social network analysis algorithms in a system supporting money laundering detection. Inf Sci 295:18–32
Eksombatchai C, Jindal P, Liu JZ, Liu Y, Sharma R, Sugnet C, Ulrich M, Leskovec J (2018) April. Pixie: a system for recommending 3+ billion items to 200+ million users in real-time. In: Proceedings of the 2018 world wide web conference. pp. 1775–1784
Floreano D, Dürr P, Mattiussi C (2008) Neuroevolution: from architectures to learning. Evol Intel 1(1):47–62
Gao Y, Yang H, Zhang P, Zhou C, Hu Y (2020) July. Graph neural architecture search. IJCAI 20:1403–1409
Guo M, Yi T, Zhu Y, Bao Y (2021) Jitune: Just-in-time hyperparameter tuning for network embedding algorithms. arXiv preprint arXiv:2101.06427
Hamilton W, Ying Z, Leskovec J (2017) Inductive representation learning on large graphs. Advances in neural information processing systems, 30
Hooi B, Song HA, Beutel A, Shah N, Shin K, Faloutsos C (2016) August. Fraudar: Bounding graph fraud in the face of camouflage. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. pp. 895–904
Hu Z, Dong Y, Wang K, Sun Y (2020) April. Heterogeneous graph transformer. In: Proceedings of The Web Conference 2020. pp. 2704–2710
Kandasamy K, Dasarathy G, Oliva JB, Schneider J, Póczos B (2016) Gaussian process bandit optimisation with multi-fidelity evaluations. Advances in neural information processing systems, 29
Kandasamy K, Dasarathy G, Schneider J, Póczos B (2017) July. Multi-fidelity bayesian optimisation with continuous approximations. In: International Conference on Machine Learning. PMLR. pp. 1799–1808
Kandasamy K, Neiswanger W, Schneider J, Poczos B, Xing EP (2018) Neural architecture search with bayesian optimisation and optimal transport. Advances in neural information processing systems, 31
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. Proceedings of the 3rd International Conference on Learning Representations 2014
Kipf TN, Welling M (2017) Semi-supervised classification with graph convolutional networks. In: 5th International Conference on Learning Representations 2017
Kitano H (1990) Designing neural networks using genetic algorithms with graph generation system. Complex Syst 4:461–476
Kleinberg JM (1999) Authoritative sources in a hyperlinked environment. J ACM (JACM) 46(5):604–632
Koutra D, Ke TY, Kang U, Chau DHP, Pao HKK, Faloutsos C (2011) September. Unifying guilt-by-association approaches: theorems and fast algorithms. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 245–260). Springer, Berlin, Heidelberg
Lemaire C, Achkar A, Jodoin PM (2019) Structured pruning of neural networks with budget-aware regularization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 9108–9116
Li X, Zhou Y, Pan Z, Feng J (2019) Partial order pruning: for best speed/accuracy trade-off in neural architecture search. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 9145–9153
Liu H, Simonyan K, Yang Y (2019) Darts: differentiable architecture search. International Conference on Learning Representations, ICLR 2019
Liu H, Simonyan K, Vinyals O, Fernando C, Kavukcuoglu K (2017) Hierarchical representations for efficient architecture search. International Conference on Learning Representations, ICLR 2018
Metropolis N, Ulam S (1949) The Monte Carlo method. J Am Stat Assoc 44(247):335–341
Michalak K, Korczak J (2011) September. Graph mining approach to suspicious transaction detection. In 2011 Federated conference on computer science and information systems (FedCSIS) (pp. 69–75). IEEE
Page L, Brin S, Motwani R, Winograd T (1999) The PageRank citation ranking: bringing order to the web. Stanford InfoLab
Pearl J (2022) Reverend Bayes on inference engines: a distributed hierarchical approach. In Probabilistic and Causal Inference: The Works of Judea Pearl. pp. 129–138
Robert JV (2021) Linear programming: foundations and extensions. Springer, Berlin
Sen P, Namata G, Bilgic M, Getoor L, Galligher B, Eliassi-Rad T (2008) Collective classification in network data. AI Mag 29(3):93–93
Shchur O, Mumme M, Bojchevski A, Günnemann S (2018) 2018. Pitfalls of graph neural network evaluation, Relational Representation Learning Workshop, NeurIPS
Shin K, Eliassi-Rad T, Faloutsos C (2016) December. Corescope: Graph mining using k-core analysis-patterns, anomalies and algorithms. In: 2016 IEEE 16th international conference on data mining (ICDM) (pp. 469–478). IEEE
Strehl A, Ghosh J (2000) December. A scalable approach to balanced, high-dimensional clustering of market-baskets. In International Conference on High-Performance Computing (pp. 525–536). Springer, Berlin, Heidelberg
Szekeres G, Wilf HS (1968) An inequality for the chromatic number of a graph. J Comb Theory 4(1):1–3
Tu K, Ma J, Cui P, Pei J, Zhu W (2019) July. Autone: Hyperparameter optimization for massive network embedding. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 216–225
Vazquez A, Flammini A, Maritan A, Vespignani A (2003) Global protein function prediction from protein-protein interaction networks. Nat Biotechnol 21(6):697–700
Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y (2018) 6th International Conference on Learning Representations , ICLR 2018
Wang Y, Liu S, Yoon M, Lamba H, Wang W, Faloutsos C, Hooi B (2020) November. Provably robust node classification via low-pass message passing. In: 2020 IEEE International Conference on Data Mining (ICDM) (pp. 621–630). IEEE
Wright S, Nocedal J (1999) Numerical optimization. Springer Sci 35(67–68):7
Wu F, Souza A, Zhang T, Fifty C, Yu T, Weinberger K (2019) May. Simplifying graph convolutional networks. In International conference on machine learning (pp. 6861–6871). PMLR
Wu Z, Pan S, Chen F, Long G, Zhang C, Philip SY (2020) A comprehensive survey on graph neural networks. IEEE Trans ctions Neural Networks Learn Systems 32(1):4–24
Yao L, Mao C, Luo Y (2019) July. Graph convolutional networks for text classification. In Proceedings of the AAAI conference on artificial intelligence (Vol. 33, No. 01, pp. 7370-7377)
Yoon M, Jung J, Kang U (2018) April. Tpa: fast, scalable, and accurate method for approximate random walk with restart on billion scale graphs. In: 2018 IEEE 34th International Conference on Data Engineering (ICDE) (pp. 1132–1143). IEEE
Yoon M, Jin W, Kang U (2018) April. Fast and accurate random walk with restart on dynamic graphs with guarantees. In: Proceedings of the 2018 World Wide Web Conference (pp. 409–418)
Yoon M, Hooi B, Shin K, Faloutsos C (2019) July. Fast and accurate anomaly detection in dynamic graphs with a two-pronged approach. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 647–657)
Yoon M, Gervet T, Hooi B, Faloutsos C (2020) November. Autonomous graph mining algorithm search with best speed/accuracy trade-off. In: 2020 IEEE International Conference on Data Mining (ICDM) (pp. 751–760). IEEE
You J, Ying Z, Leskovec J (2020) Design space for graph neural networks. Adv Neural Inf Process Syst 33:17009–17021
Yuan Y, Wang W, Coghill GM, Pang W (2021) A novel genetic algorithm with hierarchical evaluation strategy for hyperparameter optimisation of graph neural networks. arXiv preprint arXiv:2101.09300
Zamir O, Etzioni O (1998) August. Web document clustering: a feasibility demonstration. In Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval (pp. 46–54)
Zhang Z, Wang X, Zhu W (2021) Automated machine learning on graphs: a survey. In: Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21), Survey Track
Zhong Z, Yan J, Liu CL (2017) Practical network blocks design with q-learning. arXiv preprint arXiv:1708.05552, 6
Zoph B, Le QV (2017) Neural architecture search with reinforcement learning. International Conference on Learning Representations, ICLR 2017
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Yoon, M., Gervet, T., Hooi, B. et al. Autonomous graph mining algorithm search with best performance trade-off. Knowl Inf Syst 64, 1571–1602 (2022). https://doi.org/10.1007/s10115-022-01683-8
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-022-01683-8