GraphMat: Bridging the Productivity-Performance Gap in Graph Analytics: With increasing interest in large-scale distributed graph analytics for machine learning and data mining, more data scientists and developers are struggling to achieve high performance without sacrificing productivity on large graph problems. In this talk, I will discuss our solution to this problem: GraphMat. Using generalized sparse matrix-based primitives, we are able to achieve performance that is very close to hand-optimized native code, while allowing users to write programs using the familiar vertex-centric programming paradigm. I will show how we optimized GraphMat to achieve this performance on distributed platforms and provide programming examples. We have integrated GraphMat with Apache Spark in a manner that allows the combination to outperform all other distributed graph frameworks. I will explain the reasons for this performance and show that our approach achieves very high hardware efficiency in both single-node and distributed environments using primitives that are applicable to many machine learning and HPC problems. GraphMat is open source software and available for download.

Published on: **Mar 3, 2016**

Published in:
Technology

- 1. GraphMat: Bridging the Productivity- Performance Gap in Graph Analytics Narayanan Sundaram Parallel Computing Lab, Intel Labs
- 2. © 2015 Intel Corporation A cybersecurity application • Intel Security • Loopy belief propagation for reputation management • ~2B vertices, ~6 Billion edges • Needed to run daily • Took almost a day with Giraph on 16 machines How can we handle Internet-of-Things reputation management without increased performance? Port scanning DDoS Normal Traffic 2
- 3. © 2015 Intel Corporation A social problem • Pagerank • ~1 trillion edges in graph • Takes 3 minutes/iteration on 200 machines on Giraph How can we handle personalized pagerank for even top 1% users without increased performance? Ching, Avery, Sergey Edunov, Maja Kabiljo, Dionysios Logothetis, and Sambavi Muthukrishnan. "One trillion edges: graph processing at Facebook-scale." Proceedings of the VLDB Endowment 8, no. 12 (2015): 1804-1815. 3
- 4. © 2015 Intel Corporation Problem scale Social network ~1 billion vertices ~100 billion connections Web graph ~50 billion pages ~1 trillion hyperlinks Brain network ~100 billion neurons ~100 trillion connections Marc Smith: NodeXL Twitter Network Graphs: CHI2010: https://www.flickr.com/photos/marc_smith/4511844243 (License: CC BY 2.0 http://creativecommons.org/licenses/by/2.5 ) Larry & Teddy Page: Blog webgraph: https://www.flickr.com/photos/igboo/1814232325 (License: CC BY 2.0 http://creativecommons.org/licenses/by/2.5 ) Xavier Gigandet et. al. - Gigandet X, Hagmann P, Kurant M, Cammoun L, Meuli R, et al. (2008) Estimating the Confidence Level of White Matter Connections Obtained with MRI Tractography. PLoS ONE 3(12): e4006. doi:10.1371/journal.pone.0004006 (License: CC BY 2.0 http://creativecommons.org/licenses/by/2.5 ) 4
- 5. © 2015 Intel Corporation GraphMat • What is GraphMat? • GraphMat is a graph programming framework with vertex programming as front-end and sparse matrix operations as back-end • “Matrix level performance with vertex program productivity” • How can it help you? • “I know vertex programming and I like it, but Giraph/GraphX/Pregel/GraphLab… is too slow” • “I heard that graph programs can be written as matrix operations (and matrices are fast), but I do not want to recode my graph algorithms as matrix algorithms” Narayanan Sundaram, Nadathur Satish, Md Mostofa Ali Patwary, Subramanya R Dulloor, Michael Anderson, Satya Gautam Vadlamudi, Dipankar Das, Pradeep Dubey “GraphMat: High performance graph analytics made productive”, PVLDB, Vol 8 No 11, 2015. 5
- 6. © 2015 Intel Corporation Why? • Why GraphMat? • We want to enable super-fast distributed graph processing on X86 servers • Why open-source? • We want to enable super-fast distributed graph processing on X86 servers for everyone - C++/MPI - BSD license • Integrate it with your data processing/ML frameworks • We can help 6
- 7. © 2015 Intel Corporation Current state-of-the-art • GraphMat is faster than other distributed graph frameworks • Faster than GraphLab, CombBLAS, GraphX, Giraph… • Optimized for multi-node and multi-core • Uses vertex programming • Bringing sparse matrix optimizations from High Performance Computing to Big Graph processing 7
- 8. © 2015 Intel Corporation • Vertex programming “think like a vertex” – GraphLab, Giraph, MapGraph, Pregel, GraphX • Matrix based “graphs are sparse matrices” – CombBLAS, PEGASUS • Task models – Galois • Declarative programming – SociaLite (datalog-like) • Domain-specific languages – GreenMarl Diversity in current graph frameworks 0 20 40 60 80 100 120 140 160 PageRank (8 million vertices, 128 million edges) Speedupw.r.t.Giraph Giraph GraphLab CombBLAS Galois Native 8
- 9. © 2015 Intel Corporation Diversity in current graph frameworks (contd.) Framework Productivity Performance GraphLab Giraph CombBLAS Galois GraphX GraphMat Combine high productivity with great performance Green = good, orange = ok, red = bad. Nadathur Satish, Narayanan Sundaram, Mostofa Patwary, Jiwon Seo, Jongsoo Park, Muhammad Hassaan, Shubho Sengupta, Zhaoming Yin, Pradeep Dubey, “Navigating the Maze of Graph Analytics Frameworks using Massive Graph Datasets”, SIGMOD 2014 9
- 10. © 2015 Intel Corporation Assumptions • Vertex programming is productive • Fewer building blocks are better • Sparse matrix operations are scalable • Very few people have the ability and interest to optimize “to the metal” • Can use MPI in distributed setting (even on cloud) • This assumption may be relaxed in the future • Graph data fits in memory 10
- 11. © 2015 Intel Corporation GraphMat: High level operation Benefits High productivity (vertex programming for users) High performance (optimized sparse matrix backend) Vertex program: • Send message to all edges • Process incoming message • Reduce • Operate on vertex Our transformation: Send message Create (sparse) vector Process message SPMV multiply Reduce SPMV Add Apply Data parallel operator Scatter Gather Apply Generalized SpMV / SpGEMM 11
- 12. © 2015 Intel Corporation Example (Vertex Degree) Can process in- edges, out-edges or all edges. C++ templates for handling arbitrary types User-defined functions to specify a particular algorithm 12
- 13. © 2015 Intel Corporation What is new? • Graph algorithms as linear algebra are well-known • Unifying vertex programming with linear algebra is new 13
- 14. © 2015 Intel Corporation 𝐴 𝐵 𝐶 𝐷 𝐸 𝐺 𝑇 = 𝐴 𝐵 𝐶 𝐷 𝐸 − − − − 4 1 − − − − 3 1 − − − 2 − 2 − − − − − 2 − B A C D E1 2 1 3 4 2 2 Single Source Shortest Path SEND_MESSAGE : message ≔ vertex_distance PROCESS_MESSAGE : result ≔ message + edge_value REDUCE : result ≔ min(result, operand) APPLY : vertex_distance = min(result, vertex_distance) Example 14
- 15. © 2015 Intel Corporation ∞ ∞ ∞ ∞ ∞ 𝐼𝑛𝑖𝑡 0 ∞ ∞ ∞ ∞ − − − − 4 1 − − − − 3 1 − − − 2 − 2 − − − − − 2 − , 0 − − − − 𝑃𝑟𝑜𝑐𝑒𝑠𝑠 𝑚𝑒𝑠𝑠𝑎𝑔𝑒 + 𝑅𝑒𝑑𝑢𝑐𝑒 − 1 3 2 − − 1 3 2 − , 0 ∞ ∞ ∞ ∞ 𝐴𝑝𝑝𝑙𝑦 0 1 3 2 ∞ Iteration 0 − − − − 4 1 − − − − 3 1 − − − 2 − 2 − − − − − 2 − , − 1 3 2 − 𝑃𝑟𝑜𝑐𝑒𝑠𝑠 𝑚𝑒𝑠𝑠𝑎𝑔𝑒 + 𝑅𝑒𝑑𝑢𝑐𝑒 − − 2 5 4 − − 2 5 4 , 0 1 3 2 ∞ 𝐴𝑝𝑝𝑙𝑦 0 1 2 2 4 Iteration 1 𝑆𝑒𝑛𝑑 𝑚𝑒𝑠𝑠𝑎𝑔𝑒 𝑆𝑒𝑛𝑑 𝑚𝑒𝑠𝑠𝑎𝑔𝑒 B A C D E1 2 1 3 4 2 2 B A C D E1 2 1 3 4 2 2 B A C D E1 2 1 3 4 2 2 0 ∞∞ ∞∞ 0 ∞1 23 0 41 22 reduced values previous distances updated distances Single Source Shortest Path SEND_MESSAGE : message ≔ vertex_distance PROCESS_MESSAGE : result ≔ message + edge_value REDUCE : result ≔ min(result, operand) APPLY : vertex_distance = min(result, vertex_distance) 15
- 16. © 2015 Intel Corporation Optimizations • Flexible graph partitioning • 1-D, 2-D, Block cyclic • Flexible data structures • Compressed Sparse Row (CSR) • Doubly compressed sparse column (DCSC) • Dense with bitvectors • Low-level • Compiler optimizations • Vectorization 16
- 17. © 2015 Intel Corporation GraphMat vs others 0 1 2 3 4 5 6 7 8 MapGraph Galois CombBLAS GraphLab Slowdown vs GraphMat (>1 imples GraphMat is faster) 17
- 18. © 2015 Intel Corporation Is GraphMat good enough? * Native code performs direction optimized sweeps for BFS, GraphMat only forward 18 0 2 4 6 8 Pagerank Breadth First Search Triangle counting Shortest path Time in seconds Native runtime vs GraphMat GraphMat Native optimized code
- 19. © 2015 Intel Corporation Scalability (Preliminary results) Weak scaling, RMAT 128 M edges/node 19 0.1 1 10 100 1000 1 2 4 Timeperiteration(insec) #Nodes Pagerank GraphMat GraphX 0.1 1 10 100 1000 1 2 4 Timeinseconds #Nodes Shortest path GraphMat GraphX
- 20. © 2015 Intel Corporation Availability • Open source under BSD license • https://github.com/narayanan2004/GraphM at • (Single-node code only at the moment) • Plan to integrate with 3rd party data processing frameworks • JNI wrappers to call with Spark as a first step 20
- 21. © 2015 Intel Corporation Summary • GraphMat bridges the productivity-performance gap for graph analytics • Within 20% of native code performance • Faster than GraphLab, CombBLAS, Galois, and GraphX • As easy as vertex programming • Integration with other frameworks on the way • Code available under BSD at https://github.com/narayanan2004/GraphMat 21
- 22. © 2015 Intel Corporation Acknowledgements (Parallel Computing Lab, Intel Labs) Michael J. Anderson Nadathur Rajagopalan Satish Md Mostofa Ali Patwary Subramanya Dulloor Satya Gautam Vadlamudi Nesreen Ahmed Dipankar Das Ted Willke Pradeep Dubey 22
- 23. Questions? https://github.com/narayanan2004/GraphMat narayanan.sundaram@intel.com