# GLEE: Geometric Laplacian Eigenmap Embedding :author: { :name: Leo Torres :affiliation: Network Science Institute, Northeastern University, Boston, MA 02115, USA :email: [email protected] } :: :author: { :name: Kevin S. Chan :affiliation: U.S. CCDC Army Research Laboratory, Adelphi, MD 20783, USA } :: :author: { :name: Tina Eliassi-Rad :affiliation: Network Science Institute and Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA } :: :paragraph: {:class: {press-note, small}} Originally published in *Journal of Complex Networks* 8(2), 2020 (doi:10.1093/comnet/cnaa007). This is an interactive edition with explorable figures and does not contain new scientific content. :abstract: { :keywords: {graph embedding, graph Laplacian, simplex geometry} } Graph embedding seeks to build a low-dimensional representation of a graph $G$. This low-dimensional representation is then used for various downstream tasks. One popular approach is Laplacian Eigenmaps (LE), which constructs a graph embedding based on the spectral properties of the Laplacian matrix of $G$. The intuition behind it, and many other embedding techniques, is that the embedding of a graph must respect node similarity: similar nodes must have embeddings that are close to one another. Here, we dispose of this distance-minimization assumption. Instead, we use the Laplacian matrix to find an embedding with geometric properties instead of spectral ones, by leveraging the so-called simplex geometry of $G$. We introduce a new approach, *Geometric Laplacian Eigenmap Embedding*, and demonstrate that it outperforms various other techniques (including LE) in the tasks of graph reconstruction and link prediction. :: :paragraph: {:class: {press-cta, small}} This paper contains 8 interactive figures. Scroll to explore. ## Introduction {:label: sec-introduction} :html: { :path: morph.html :label: fig-morph } :caption: **The same graph, three ways of seeing it.** Zachary's Karate Club smoothly morphs between its network layout (force-directed), its full 33-dimensional simplex (PCA-projected), and its 3D GLEE embedding (sphere-normalized). Drag the slider or let it loop. :: Graphs are ubiquitous in real-world systems from the internet to the world wide web to social media to the human brain. The application of machine learning to graphs is a popular and active research area. One way to apply known machine learning methods to graphs is by transforming the graph into a representation that can be directly fed to a general machine learning pipeline. For this purpose, the task of graph representation learning, or graph embedding, seeks to build a vector representation of a graph by assigning to each node a feature vector that can then be fed into any machine learning algorithm. Popular graph embedding techniques seek an embedding where the distance between the latent representations of two nodes represents their similarity. For example, :cite:chen2018:: calls this the 'community aware' property (nodes in a community are considered similar, and thus their representations must be close to one another), while :cite:chen2017:: calls it a 'symmetry' between the node domain and the embedding domain. Others call methods based on this property with various names such as 'positional' :cite:srinivasan2020:: or 'proximity-based' :cite:jin2019:: embeddings. Consequently, many of these approaches are formulated in such a way that the distance (in the embedding space) between nodes that are similar (in the original data domain) is small. Here, we present a different approach. Instead of focusing on minimizing the distance between similar nodes, we seek an embedding that preserves the most basic structural property of the graph, namely adjacency; the works :cite:srinivasan2020,jin2019:: call this approach 'structural' node embeddings. Concretely, if the nodes $i$ and $j$ are neighbours in the graph $G$ with $n$ nodes, we seek $d$-dimensional vectors $s_i$ and $s_j$ such that the adjacency between $i$ and $j$ is encoded in the geometric properties of $s_i$ and $s_j$, for some $d \ll n$. Examples of geometric properties are the dot product of two vectors (which is a measure of the angle between them), the length (or area or volume) of a line segment (or polygon or polyhedron), the centre of mass or the convex hull of a set of vectors, among others. In :ref:sec-approach::, we propose one such geometric embedding technique, called Geometric Laplacian Eigenmap Embedding (GLEE), that is based on the properties of the Laplacian matrix of $G$, and we then proceed to compare it to the original formulation of Laplacian Eigenmaps (LE) as well as other popular embedding techniques. GLEE has deep connections with the so-called simplex geometry of the Laplacian :cite:devriendt2019,fiedler2011::. Fiedler :cite:fiedler2011:: first made this observation, which highlights the bijective correspondence between the Laplacian matrix of an undirected, weighted graph and a geometric object known as a *simplex*. :ref:fig-morph:: shows the Karate Club graph morphing from its network layout into its simplex. Using this relationship, we find a graph embedding such that the representations $s_i, s_j$ of two non-adjacent nodes $i$ and $j$ are always orthogonal, $s_i \cdot s_j = 0$, thus achieving a geometric encoding of adjacency. Note that this does not satisfy the 'community-aware' property of :cite:chen2018::. For example, the geometric embedding $s_i$ of node $i$ will be orthogonal to each non-neighbouring node, including those in its community. Thus, $s_i$ is not close to other nodes in its community, whether we define *closeness* in terms of Euclidean distance or cosine similarity. However, we show that this embedding---based on the simplex geometry---contains desirable information, and that it outperforms the original, distance-minimizing, formulation of LE on the tasks of graph reconstruction and link prediction in certain cases. The contributions of this work are as follows: :enumerate: :-: We present a geometric framework for graph embedding that departs from the tradition of looking for representations that minimize the distance between similar nodes by highlighting the intrinsic geometric properties of the Laplacian matrix. :-: The proposed method, GLEE, while closely related to the LE method, outperforms LE in the tasks of link prediction and graph reconstruction. Moreover, a common critique of LE is that it only considers first-order adjacency in the graph. We show that GLEE takes into account higher-order connections (see :ref:sec-link-prediction::). :-: The performance of existing graph embedding methods (which minimize distance between similar nodes) suffers when the graph's average clustering coefficient is low. This is not the case for GLEE. :: In :ref:sec-background::, we recall the original formulation of LE, in order to define the GLEE in :ref:sec-approach:: and discuss its geometric properties. We mention related work in :ref:sec-related-work:: and present experimental studies of GLEE in :ref:sec-experiments::. We finish with concluding remarks in :ref:sec-conclusions::. ## Background on LE {:label: sec-background} Belkin and Niyogi :cite:belkin2002,belkin2003:: introduced LE as a general-purpose method for embedding and clustering an arbitrary dataset. Given a dataset $\{x_i\}_{i=1}^n$, a proximity graph $G = (V, \mathbf{A})$ is constructed with node set $V = \{x_i\}$ and edge weights $\mathbf{A} = (a_{ij})$. The edge weights are built using one of many heuristics that determine which nodes are close to each other and can be binary or real valued. Some examples are $k$ nearest neighbours, $\varepsilon$-neighbourhoods, heat kernels, etc. To perform the embedding, one considers the Laplacian matrix of $G$, defined as $\mathbf{L} = \mathbf{D} - \mathbf{A}$, where $\mathbf{D}$ is the diagonal matrix whose entries are the degrees of each node. One of the defining properties of $\mathbf{L}$ is the value of the quadratic form: $${:label: quad-form} y^\top \mathbf{L} y = \frac{1}{2}\sum_{ij} a_{ij}(y_i - y_j)^2. $$ The vector $y^*$ that minimizes the value of :ref:quad-form:: will be such that the total weighted distance between all pairs of nodes is minimized. Here, $y_i$ can be thought of as the one-dimensional embedding of node $i$. One can then extend this procedure to arbitrary $d$-dimensional node embeddings by noting that $tr(\mathbf{Y}^\top \mathbf{L}\mathbf{Y}) = \sum_{ij} a_{ij} \|y_i - y_j\|^2$, where $\mathbf{Y} \in \mathbb{R}^{n \times d}$ and $y_i$ is the $i$th row of $\mathbf{Y}$. The objective function in this case is $${:label: le-objective} \begin{align} \mathbf{Y}^* = \arg\min_{\mathbf{Y} \in \mathbb{R}^{n \times d}} tr(\mathbf{Y}^\top \mathbf{L}\mathbf{Y}) \\ \text{s.t. } \mathbf{Y}^\top \mathbf{D}\mathbf{Y} = \mathbf{I}. \end{align} $$ Importantly, the quantity $tr(\mathbf{Y}^\top \mathbf{L}\mathbf{Y})$ has a global minimum at $\mathbf{Y} = 0$. Therefore, a restriction is necessary to guarantee a non-trivial solution. Belkin and Niyogi :cite:belkin2002,belkin2003:: choose $\mathbf{Y}^\top \mathbf{D}\mathbf{Y} = \mathbf{I}$, though others are possible. Applying the method of Lagrange multipliers, one can see that the solution of :ref:le-objective:: is achieved at the matrix $\mathbf{Y}^*$ whose rows $y_i^*$ are the solutions to the eigenvalue problem $${:label: le-eigenvalue} \mathbf{L}y_i^* = \lambda_i \mathbf{D} y_i^*. $$ When the graph contains no isolated nodes, $y_i^*$ is then an eigenvector of the matrix $\mathbf{D}^{-1}\mathbf{L}$, also known as the normalized Laplacian matrix. The embedding of a node $j$ is then the vector whose entries are the $j$th elements of the eigenvectors $y_1^*, y_2^*, \ldots, y_d^*$. ## Proposed approach: Geometric LE {:label: sec-approach} We first give our definition and then proceed to discuss both the algebraic and geometric motivations behind it. :definition: { :label: def-glee :title: GLEE } Given a graph $G$, consider its Laplacian matrix $\mathbf{L}$. Using singular value decomposition, we may write $\mathbf{L} = \mathbf{S}\mathbf{S}^\top$ for a unique matrix $\mathbf{S}$. Define $\mathbf{S}^d$ as the matrix of the first $d$ columns of $\mathbf{S}$. If $i$ is a node of $G$, define its $d$-dimensional *Geometric Laplacian Eigenmap Embedding* (GLEE) as the $i$th row of $\mathbf{S}^d$, denoted by $s_i^d$. If the dimension $d$ is unambiguous, we will just write $s_i$. :: **Algebraic motivation.** In the case of positive semidefinite matrices, such as the Laplacian, the singular values coincide with the eigenvalues. Moreover, it is well known that $\mathbf{S}^d$ is the matrix of rank $d$ that is closest to $\mathbf{L}$ in Frobenius norm, i.e., $\|\mathbf{L} - \mathbf{S}^d(\mathbf{S}^d)^\top\|_F \leq \|\mathbf{L} - \mathbf{M}\|_F$ for all matrices $\mathbf{M}$ of rank $d$. Because of this, we expect $\mathbf{S}^d$ to achieve better performance in the graph reconstruction task than any other $d$-dimensional embedding (see :ref:sec-exp-reconstruction::). As can be seen from :ref:quad-form::, the original formulation of LE is due to the fact that the distance between the embeddings of neighbouring nodes is minimized, under the restriction $Y^\top \mathbf{D} Y = I$. We can also formulate GLEE in terms of the distance between neighbouring nodes. Perhaps counterintuitively, GLEE solves a distance *maximization* problem, as follows. The proof follows from a routine application of Lagrange multipliers and is omitted. :theorem: {:label: thm-max} Let $\Lambda$ be the diagonal matrix whose entries are the eigenvalues of $\mathbf{L}$. Consider the optimization problem $${:label: glee-objective} \begin{align} \arg\max_{\mathbf{Y} \in \mathbb{R}^{n \times d}} tr(\mathbf{Y}^\top \mathbf{L}\mathbf{Y}) \\ \text{s.t. } \mathbf{Y}^\top \mathbf{Y} = \Lambda. \end{align} $$ Its solution is the matrix $\mathbf{S}^d$ whose columns are the eigenvectors corresponding to the largest eigenvalues of $\mathbf{L}$. If $d = n$, then $\mathbf{L} = \mathbf{S}^d (\mathbf{S}^d)^\top$. :: The importance of :ref:thm-max:: is to highlight the fact that distance-minimization may be misleading when it comes to exploiting the properties of the embedding space. Indeed, the original formulation of LE, while well established in :ref:le-objective::, yields as result the eigenvectors corresponding to the *lowest* eigenvalues of $\mathbf{L}$. However, standard results in linear algebra tell us that the best low rank approximation of $\mathbf{L}$ is given by the eigenvectors corresponding to the *largest* eigenvalues. Therefore, these are the ones used in the definition of GLEE. **Geometric motivation.** The geometric reasons underlying :ref:def-glee:: are perhaps more interesting than the algebraic ones. A recent review paper :cite:devriendt2019:: highlights the work of Fiedler :cite:fiedler2011::, who discovered a bijective correspondence between the Laplacian matrix of a graph and a higher-dimensional geometric object called a *simplex*. :definition: { :label: def-simplex :title: Simplex } Given a set of $k + 1$ $k$-dimensional points $\{p_i\}_{i=0}^k$, if they are affinely independent (i.e. if the set of $k$ points $\{p_0 - p_i\}_{i=1}^k$ is linearly independent) then their convex hull is called a *simplex*. :: A simplex is a high-dimensional polyhedron that is the generalization of a two-dimensional triangle or a three-dimensional tetrahedron. To see the connection between the Laplacian matrix of a graph and simplex geometry, we invoke the following result. The interested reader will find the proof in :cite:devriendt2019,fiedler2011::. :theorem: {:label: thm-simplex} Let $\mathbf{Q}$ be a positive semidefinite $k \times k$ matrix. There exists a $k \times k$ matrix $\mathbf{S}$ such that $\mathbf{Q} = \mathbf{S}\mathbf{S}^\top$. The rows of $\mathbf{S}$ lie at the vertices of a simplex if and only if the rank of $\mathbf{Q}$ is $k - 1$. :: :corollary: {:label: cor-simplex} Let $G$ be a connected graph with $n$ nodes. Its Laplacian matrix $\mathbf{L}$ is positive semidefinite, has rank $n - 1$, and has eigendecomposition $\mathbf{L} = \mathbf{P}\Lambda\mathbf{P}^\top$. Write $\mathbf{S} = \mathbf{P}\sqrt{\Lambda}$. Then, $\mathbf{L} = \mathbf{S}\mathbf{S}^\top$ and the rows of $\mathbf{S}$ are the vertices of a $(n-1)$-dimensional simplex called the *simplex of $G$*. :: :ref:cor-simplex:: is central to the approach in :cite:devriendt2019::, providing a correspondence between graphs and simplices. :ref:cor-simplex:: also shines new light on GLEE: *the matrix $\mathbf{S}^d$ from :ref:def-glee:: corresponds to the first $d$ dimensions of the simplex of $G$.* In other words, computing the GLEE embeddings of a graph $G$ is equivalent to computing the simplex of $G$ and projecting it down to $d$ dimensions. We proceed to explore the geometric properties of this simplex that can aid in the interpretation of GLEE embeddings. We can find in :cite:devriendt2019:: the following result. :corollary: {:label: cor-properties} Let $s_i$ be the $i$th row of $\mathbf{S}$ in :ref:cor-simplex::. $s_i$ is the simplex vertex corresponding to node $i$, and satisfies $\|s_i\|^2 = \deg(i)$, and $s_i \cdot s_j^\top = -a_{ij}$, where $\deg(i)$ is the degree of $i$. In particular, $s_i$ is orthogonal to the embedding of any non-neighbouring node $j$. :: :ref:cor-properties:: highlights some of the basic geometric properties of the simplex (such as lengths and dot products) that can be interpreted in graph theoretical terms (respectively degrees and adjacency). In :ref:fig-simplex::, we show examples of these properties. It is worth noting that other common matrix representations of graphs do not present a spectral decomposition that yields a simplex. For example, the adjacency matrix $\mathbf{A}$ is not in general positive semidefinite, and the normalized Laplacian $\mathbf{D}^{-1}\mathbf{L}$ (used by LE) is not symmetric. Therefore, :ref:thm-simplex:: does not apply to them. We now proceed to show how to take advantage of the geometry of GLEE embeddings, which can all be thought of as coming from the simplex, in order to perform common graph mining tasks. In the following we focus on unweighted, undirected graphs. :html: { :path: fig1_paper.html :label: fig-simplex } :caption: Simplex geometry and GLEE. Given a graph $G$ with $n$ nodes (top row), there is a $(n-1)$-dimensional simplex that perfectly encodes the structure of $G$, given by the rows of the matrix $\mathbf{S}$ from :ref:cor-simplex:: (middle row). The first $d$ columns of $\mathbf{S}$ yield the GLEE of $G$ (bottom row). In each example, embeddings are colour-coded according to the node they represent. For $n = 3$, all nodes in the triangle graph are interchangeable. Accordingly, their embeddings all have the same length and subtend equal angles with each other. For $n = 4$, the green and purple nodes are interchangeable, and thus their embeddings are symmetric. Note that the length of each embedding corresponds to the degree of the corresponding node. For $n = 34$ we show the Karate Club network :cite:zachary1977::, in which we highlight one node in green and all of its neighbours in purple. In the bottom right panel, the dotted line is orthogonal to the green node's embedding. Note that most of the non-neighbours' embeddings (in gray) are close to orthogonal to the green node's embedding, while all neighbours (in purple) are not. :: ### Graph reconstruction {:label: sec-reconstruction} For a graph $G$ with $n$ nodes, consider its $d$-dimensional GLEE embedding $\mathbf{S}^d$. When $d = n$, in light of :ref:cor-properties::, the dot product between any two embeddings $s_i, s_j$ can only take the values $-1$ or $0$ and one can reconstruct the graph perfectly from its simplex. However, if $d < n$, the distribution of dot products will take on real values around $-1$ and $0$ with varying amounts of noise; the larger the dimension $d$, the less noise we find around the two modes. It is important to distinguish which nodes $i, j$ have embeddings $s_i, s_j$ whose dot product belongs to the mode at $0$ or to the mode at $-1$, for this determines whether or not the nodes are neighbours in the graph. One possibility is to simply 'split the difference' and consider $i$ and $j$ as neighbours whenever $s_i \cdot s_j < -0.5$. More generally, given a graph $G$ and its embedding $\mathbf{S}^d$, define $\hat{\mathbf{L}}(\theta)$ to be the estimated Laplacian matrix using the above heuristic with threshold $\theta$, that is $${:label: threshold} \hat{\mathbf{L}}_{ij}(\theta) = \begin{cases} -1 & s_i \cdot s_j^\top < \theta \\ 0 & \text{otherwise.} \end{cases} $$ Then, we seek the value of $\theta$, call it $\theta_{\text{opt}}$, that minimizes the loss $${:label: theta-opt} \theta_{\text{opt}} = \arg\min_{\theta \in [-1,0]} \|\mathbf{L} - \hat{\mathbf{L}}(\theta)\|_F^2. $$ If all we have access to is the embedding, but not the original graph, we cannot optimize :ref:theta-opt:: directly. Thus, we have to estimate $\theta_{\text{opt}}$ heuristically. As explained above, one simple estimator is the constant $\hat\theta_c = -0.5$. We develop two other estimators: $\hat\theta_k, \hat\theta_g$, obtained by applying Kernel Density Estimation and Gaussian Mixture Models (GMMs), respectively. We do so in :ref:sec-threshold-estimators:: as their development has little to do with the geometry of GLEE embeddings. Our experiments show that different thresholds $\theta_c, \theta_k$, and $\theta_g$ produce excellent results on different datasets; see :ref:sec-threshold-estimators:: for discussion. ### Link prediction {:label: sec-link-prediction} Since the objective of GLEE is to directly encode graph structure in a geometric way, rather than solve any one particular task, we are able to use it in two different ways to perform link prediction. These are useful in different kinds of networks. #### Number of common neighbours {:label: sec-common-neighbours} It is well known that heuristics such as number of CN or Jacard similarity (JS) between neighbourhoods are highly effective for the task of link prediction in networks with a strong tendency for triadic closure :cite:sarkar2011::. Here, we show that we can use the geometric properties of GLEE in order to approximately compute CN. For the purpose of exposition, we assume $d = n$ unless stated otherwise in this section. Given an arbitrary subset of nodes $V$ in the graph $G$, we denote by $|V|$ its number of elements. We further define *the centroid of $V$*, denoted by $C_V$, as the centroid of the simplex vertices that correspond to its nodes, i.e., $C_V = \frac{1}{|V|} \sum_{i \in V} s_i$. The following lemma, which can be found in :cite:devriendt2019::, highlights the graph-theoretical interpretation of the geometric object $C_V$. :lemma: {:label: lem-centroid} **(From :cite:devriendt2019::)** Given a graph $G$ and its GLEE embedding $S$, consider two disjoint node sets $V_1$ and $V_2$. Then, the number of edges with one endpoint in $V_1$ and one endpoint in $V_2$, is given by $${:label: centroid-edges} -|V_1||V_2| \; C_{V_1}^\top \cdot C_{V_2}. $$ :: :proof: By linearity of the dot product, we have $${:label: centroid-proof} |V_1||V_2| C_{V_1}^\top \cdot C_{V_2} = \sum_{i \in V_1} \sum_{j \in V_2} s_i \cdot s_j^\top = -\sum_{i \in V_1} \sum_{j \in V_2} a_{ij}. $$ The expression on the right is precisely the required quantity. :: :ref:lem-centroid:: says that we can use the dot product between the centroids of two node sets to count the number of edges that are shared by them. Thus, we now reformulate the problem of finding the number of CN between two nodes in terms of centroids of node sets. In the following, we use $N(i)$ to denote the neighbourhood of node $i$, that is, the set of nodes connected to it. :lemma: {:label: lem-cn} Let $i, j \in V$ be non-neighbours. Then, the number of CN of $i$ and $j$, denoted by $CN(i, j)$, is given by $${:label: cn-formula} CN(i,j) = -\deg(i) \; C_{N(i)} \cdot s_j^\top = -\deg(j) \; C_{N(j)} \cdot s_i^\top. $$ :: :proof: Apply :ref:lem-centroid:: to the node sets $V_1 = N(i)$ and $V_2 = \{j\}$, or, equivalently, to $V_1 = N(j)$ and $V_2 = \{i\}$. :: Now assume we have the $d$-dimensional GLEE of $G$. We approximate $CN(i,j)$ by estimating both $\deg(i)$ and $C_{N(j)}$. First, we know from :ref:cor-properties:: that $\deg(i) \approx \|s_i^d\|^2$. Second, we define the approximate neighbour set of $i$ as $\hat{N}(i) = \{k : s_k^d \cdot (s_i^d)^\top < \hat\theta \}$, where $\hat\theta$ is any of the estimators from :ref:sec-reconstruction::. We can now write $${:label: cn-approx} CN(i,j) \approx -\|s_i^d\|^2 C_{\hat{N}(i)} \cdot (s_j^d)^\top. $$ The higher the value of this expression, the more confident is our prediction that the link $(i,j)$ exists. #### Number of paths of length 3 {:label: sec-paths-length-3} A common critique of the original LE algorithm is that it only takes into account first-order connections, which were considered in :ref:sec-common-neighbours::. Furthermore, authors of :cite:kovacs2019:: point out that the application of link prediction heuristics CN and JS does not have a solid theoretical grounding for certain types of biological networks such as protein--protein interaction networks. They further propose to use the (normalized) number of paths of length three (L3) between two nodes to perform link prediction. We next present a way to approximate L3 using GLEE. This achieves good performance in those networks where CN and JS are invalid and show that GLEE can take into account higher-order connectivity of the graph. :lemma: {:label: lem-l3} Assume $S$ is the GLEE of a graph $G$ of dimension $d = n$. Then, the number of paths of length three between two distinct nodes $i$ and $j$ is $${:label: l3-formula} L3(i,j) = -\deg(i)\deg(j) \; C_{N(i)}^\top \cdot C_{N(j)} + \sum_{k \in N(i) \cap N(j)} \|s_k\|^2. $$ :: :proof: The number of paths of length three between $i$ and $j$ is $(\mathbf{A}^3)_{ij}$, where $\mathbf{A}$ is the adjacency matrix of $G$. We have $${:label: l3-proof} \begin{align} (\mathbf{A}^3)_{ij} = \sum_{k \in N(i)} \sum_{l \in N(j)} a_{kl} = - \sum_{k \in N(i)} \sum_{\substack{l \in N(j) \\ l \neq k}} s_k \cdot s_l^\top \\ = -\sum_{k \in N(i)} \sum_{l \in N(j)} s_k \cdot s_l^\top + \sum_{k \in N(i) \cap N(j)} s_k \cdot s_k^\top \\ = -|N(i)||N(j)| C_{N(i)}^\top \cdot C_{N(j)} + \sum_{k \in N(i) \cap N(j)} \|s_k\|^2, \end{align} $$ where the last expression follows by the linearity of the dot product, and is equivalent to :ref:l3-formula::. :: When $d < n$, we can estimate $\deg(i)$ by $\|s_i^d\|^2$ and $N(i)$ by $\hat{N}(i)$ as before, with the help of an estimator $\hat\theta$ from :ref:sec-reconstruction::. ### Runtime analysis {:label: sec-runtime} On a graph $G$ with $n$ nodes, finding the $k$ largest eigenvalues and eigenvectors of the Laplacian takes $O(kn^2)$ time, if one uses algorithms for fast approximate singular value decomposition :cite:halko2011,trefethen1997::. Given a $k$-dimensional embedding matrix $S$, reconstructing the graph is as fast as computing the product $S \cdot S^\top$ and applying the threshold $\theta$ to each entry, thus it takes $O(n^\omega + n^2)$, where $\omega$ is the exponent of matrix multiplication. Approximating the number of CN between two nodes depends only on the dot products between embeddings corresponding to their neighbours, thus it takes $O(k \min(\deg(i), \deg(j)))$, while approximating the number of paths of length 3 takes $O(k \deg(i) \deg(j))$. ## Related work {:label: sec-related-work} Spectral analyses of the Laplacian matrix have multiple applications in graph theory, network science and graph mining :cite:newman2018,spielman2017,vanmieghem2010,spielman2011::. Indeed, the eigendecomposition of the Laplacian has been used for sparsification :cite:spielman2011::, clustering :cite:vonluxburg2007::, dynamics :cite:prakash2014,vanmieghem2014::, robustness :cite:jamakovic2008,shahrivar2015::, etc. We here discuss those applications that are related to the general topic of this work, namely, dimensionality reduction of graphs. One popular application is the use of Laplacian eigenvectors for graph drawing :cite:koren2005,pisanski2000::, which can be thought of as graph embedding for the specific objective of visualization. In :cite:pisanski2000::, one such method is outlined, which, similarly to GLEE, assigns a vector, or higher-dimensional position, to each node in a graph using the eigenvectors of its Laplacian matrix, in such a way that the resulting vectors have certain desirable geometric properties. However, in the case of :cite:pisanski2000::, those geometric properties are externally enforced as constraints in an optimization problem, whereas GLEE uses the intrinsic geometry already present in a particular decomposition of the Laplacian. Furthermore, their method focuses on the eigenvectors corresponding to the smallest eigenvalues of the Laplacian, while GLEE uses those corresponding to the largest eigenvalues, that is, to the best approximation to the Laplacian through singular value decomposition. On another front, many graph embedding algorithms have been proposed, see for example :cite:goyal2018,hamilton2017:: for extensive reviews. Most of these methods fall in one of the following categories: matrix factorization, random walks or deep architectures. Of special importance to us are methods that rely on matrix factorization. Among many advantages, we have at our disposal the full toolbox of spectral linear algebra to study them :cite:charisopoulos2019,chen2015,chen2017b,levin2018::. Examples in this category are the aforementioned Laplacian Eigenmaps (LE) :cite:belkin2002,belkin2003:: and graph factorization (GF) :cite:ahmed2013::. One important difference between GLEE and LE is that LE uses the small eigenvalues of the normalized Laplacian $\mathbf{D}^{-1}\mathbf{L}$, while GLEE uses the large eigenvalues of $\mathbf{L}$. Furthermore, LE does not present the rich geometry of the simplex. GF finds a decomposition of the weighted adjacency matrix $\mathbf{W}$ with a regularization term. Their objective is to find embeddings $\{s_i\}$ such that $s_i \cdot s_j = a_{ij}$, whereas in our case we try to reconstruct $s_i \cdot s_j = \mathbf{L}_{ij}$. This means that the embeddings found by GF will present different geometric properties. There are many other methods of dimensionality reduction on graphs that depend on matrix factorization :cite:cai2011,kuang2012,wang2017::. However, even if some parameterization, or special case, of any of these methods results in a method resembling the singular value decomposition of the Laplacian (thus imitating GLEE), to the authors' knowledge none of these methods make direct use of its intrinsic geometry. Among the methods based on random walks we find DeepWalk :cite:perozzi2014:: and node2vec :cite:grover2016::, both of which adapt the framework of word embeddings :cite:mikolov2013:: to graphs by using random walks and optimize a shallow architecture. It is also worth mentioning NetMF :cite:qiu2018:: which unifies several methods in a single algorithm that depends on matrix factorization and thus unifies the two previous categories. Among the methods using deep architectures, we have the deep autoencoder Structural Deep Network Embedding (SDNE) :cite:wang2016::. It penalizes representations of similar nodes that are far from each other using the same objective as LE. Thus, SDNE is also based on the distance-minimization approach. There is also :cite:cao2016:: which obtains a non-linear mapping between the probabilistic mutual information (PMI) matrix of a sampled network and the embedding space. This is akin to applying the distance-minimization assumption not to the graph directly but to the PMI matrix. Others have used geometric approaches to embedding. For example, :cite:estrada2014:: and :cite:pereda2019:: find embeddings on the surface of a sphere, while :cite:papadopoulos2012:: and :cite:nickel2018:: use the hyperbolic plane. These methods are generally developed under the assumption that the embedding space is used to generate the network itself. They are therefore aimed at recovering the generating coordinates, and not, as in GLEE's case, at finding a general representation suitable for downstream tasks. ## Experiments {:label: sec-experiments} We put into practice the procedures detailed in :ref:sec-reconstruction:: and :ref:sec-link-prediction:: to showcase GLEE's performance in the tasks of link prediction and graph reconstruction. Code to compute the GLEE embeddings of networks and related computations is publicly available at :cite:torres2019::. For our experiments, we use the following baselines: GF because it is a direct factorization of the adjacency matrix, node2vec because it is regarded as a reference point among those methods based on random walks, SDNE because it aims to recover the adjacency matrix of a graph (a task GLEE excels at), NetMF because it generalizes several other well-known techniques and LE because it is the method that most directly resembles our own. In this way, we cover all of the categories explained in :ref:sec-related-work:: and use either methods that resemble GLEE closely or methods that have been found to generalize other techniques. For node2vec and SDNE, we use default parameters. For NetMF, we use the spectral approximation with rank 256. The datasets we use are outlined in :ref:tbl-datasets::. Beside comparing GLEE to the other algorithms, we are interested in how the graph's structure affects performance of each method. This is why we have chosen datasets have similar number of nodes and edges, but different values of average clustering coefficient. Accordingly, we report our results with respect to the average clustering coefficient of each dataset and the number of dimensions of the embedding (the only parameter of GLEE). In :ref:sec-estimator-comparison::, we compare the performance of each estimator explained in :ref:sec-reconstruction::. In the following experiments, we use $\hat\theta_k$ as our estimator for $\theta_{\text{opt}}$. :table: {:label: tbl-datasets} :thead: :tr: :td: Name :: :td: $n$ :: :td: $m$ :: :td: $\bar{c}$ :: :td: Type :: :: :: :tbody: :tr: :td: `PPI` :cite:rolland2014:: :: :td: 4,182 :: :td: 13,343 :: :td: 0.04 :: :td: Protein interaction :: :: :tr: :td: `wiki-vote` :cite:leskovec2010:: :: :td: 7,066 :: :td: 100,736 :: :td: 0.14 :: :td: Endorsement :: :: :tr: :td: `caida` :cite:leskovec2007:: :: :td: 26,475 :: :td: 53,381 :: :td: 0.21 :: :td: AS Internet :: :: :tr: :td: `CA-HepTh` :cite:leskovec2007:: :: :td: 8,638 :: :td: 24,806 :: :td: 0.48 :: :td: Co-authorship :: :: :tr: :td: `CA-GrQc` :cite:leskovec2007:: :: :td: 4,158 :: :td: 13,422 :: :td: 0.56 :: :td: Co-authorship :: :: :: :caption: Datasets used in this work (all undirected, unweighted): number of nodes $n$, number of edges $m$ and average clustering coefficient $\bar{c}$ of the largest connected component of each network. AS, autonomous systems of the Internet. :: ### Graph reconstruction {:label: sec-exp-reconstruction} *Given a GLEE matrix $S^d$, how well can we reconstruct the original graph?* This is the task of graph reconstruction. We use as performance metric the *precision at $k$* measure, defined as the precision of the first $k$ reconstructed edges. Note that *precision at $k$* must always decrease when $k$ grows large, as there will be few correct edges left to reconstruct. :html: { :path: reconstruction_machine.html } :caption: **The Reconstruction Machine.** Drag the threshold slider to reconstruct the Karate Club graph from its GLEE embedding. Green edges are true positives, red are false positives, and gray are missed edges. Change the embedding dimension to see how higher dimensions yield cleaner separation of the dot-product modes, making reconstruction easier. :: Following :ref:sec-reconstruction::, we reconstruct the edge $(i,j)$ if $s_i^d \cdot s_j^d < \hat\theta$. The further the dot product is from 0 (the ideal value for non-edges), the more confident we are in the existence of this edge. For LE, GF, node2vec and NetMF, we reconstruct the edge $(i,j)$ according to how small the distance between their embeddings is. For both GF, node2vec and NetMF, we reconstruct edges based on how high their dot product is. SDNE is a deep autoencoder and thus its very architecture involves a mechanism to reconstruct the adjacency matrix of the input graph. We show results in :ref:fig-reconstruction::, where we have ordered datasets from left to right in ascending order of clustering coefficient, and from bottom up in ascending order of embedding dimension $d$. (GF results omitted from :ref:fig-reconstruction:: as it scored close to 0 for all values of $k$ and $d$.) On `CA-GrQc`, for low embedding dimension $d = 32$, SDNE performs best among all methods, followed by node2vec and LE. However, as $d$ increases, GLEE substantially outperforms all others, reaching an almost perfect precision score at the first 10,000 reconstructed edges. This analysis is also valid for `CA-HepTh`, another dataset with high clustering coefficient. However, on `PPI`, our dataset with lowest clustering coefficient, GLEE drastically outperforms all other methods for all values of $d$. Interestingly, LE and node2vec perform well compared to other methods in datasets with high clustering, but their performance drops to near zero on `PPI`. We hypothesize that this is due to the fact that LE and node2vec depend on the 'community-aware' assumption, thereby assuming that two proteins in the same cluster would interact with each other. This is the exact point that :cite:kovacs2019:: refutes. On the other hand, GLEE directly encodes graph structure, making no assumptions about the original graph, and its performance depends more directly on the embedding dimension than on the clustering coefficient, or on any other assumption about graph structure. GLEE's performance on datasets `PPI`, `Wiki-Vote` and `caida` point to the excellent potential of our method in the case of low clustering coefficient. :html: { :path: fig2_reconstruction.html :label: fig-reconstruction } :caption: Graph reconstruction. GLEE performs best in networks with low clustering coefficient, presumably because it depends on the large eigenvalues of the Laplacian, which encode micro-level graph structure. :: ### Link prediction {:label: sec-exp-link-prediction} :html: { :path: link_prediction_machine.html } :caption: **The Link Prediction Machine.** Given the full GLEE embedding of the Karate Club, all node pairs are scored by their dot product: more negative means more likely to be an edge. Drag the threshold to classify pairs as edges or non-edges. The ROC curve shows the trade-off, and the red dot marks your operating point. Increase the embedding dimension and watch the AUC climb toward 1. :: *Given the embedding of a large subgraph of some graph $G$, can we identify which edges are missing?* The experimental setup is as follows. Given a graph $G$ with $n$ nodes, node set $V$ and edge set $E_{\text{obs}}$, we randomly split its edges into train and test sets $E_{\text{train}}$ and $E_{\text{test}}$. We use $|E_{\text{train}}| = 0.75n$, and we make sure that the subgraph induced by $E_{\text{train}}$, denoted by $G_{\text{train}}$, is connected and contains every node of $V$. We then proceed to compute the GLEE of $G_{\text{train}}$ and test on $E_{\text{test}}$. We report area under receiver operator curve (AUC) metric for this task. We use both techniques described in :ref:sec-common-neighbours:: and :ref:sec-paths-length-3::, which we label GLEE and GLEE-L3, respectively. :ref:fig-linkpred:: shows that node2vec repeats the behaviour seen in graph reconstruction of increased performance as clustering coefficient increases, though again it is fairly constant with respect to embedding dimension. This observation is also true for NetMF. On the high clustering datasets, LE and GLEE have comparable performance to each other. However, either GLEE or GLEE-L3 perform better than all others on the low clustering datasets `PPI`, `Wiki-Vote`, as expected. Also as expected, the performance of GLEE-L3 decreases as average clustering increases. Note that GLEE and LE generally improve performance when $d$ increases, whereas node2vec and SDNE do not improve. (GF and SDNE not shown in :ref:fig-linkpred:: for clarity. They scored close to 0.5 and 0.6 in all datasets independently of $d$.) The reason why none of the methods studied here perform better than 0.6 AUC in the `caida` dataset is an open question left for future research. We conclude that the hybrid approach of NetMF is ideal for high clustering coefficient, whereas GLEE is a viable option in the case of low clustering coefficient as evidenced by the results on `PPI`, `Wiki-Vote` and `caida`. :html: { :path: fig3_linkpred.html :label: fig-linkpred } :caption: Link prediction results. We approximate the number of CN (GLEE), or the number of paths of length 3 (GLEE-L3). Each circle is the average of 10 realizations; error bars too small to show at this scale. GF and SDNE perform close to 0.5 and 0.6 independently of $d$ or dataset (not shown). Datasets ordered from left to right in increasing order of clustering. :: ## Conclusions {:label: sec-conclusions} In this work, we have presented the GLEE, a geometric approach to graph embedding that exploits the intrinsic geometry of the Laplacian. When compared to other methods, we find that GLEE performs the best when the underlying graph has low clustering coefficient, while still performing comparably to other state-of-the-art methods when the clustering coefficient is high. We hypothesize that this is due to the fact that the large eigenvalues of the Laplacian correspond to the small eigenvalues of the adjacency matrix and thus represent the structure of the graph at a micro-level. Furthermore, we find that GLEE's performance increases as the embedding dimension increases, something we do not see in other methods. In contrast to techniques based on neural networks, which have many hyperparameters and costly training phases, GLEE has only one parameter other than the embedding dimension, the threshold $\theta$, and we have provided three different ways of optimizing for it. Indeed, GLEE only depends on the singular value decomposition of the Laplacian matrix. We attribute these desirable properties of GLEE to the fact that it departs from the traditional literature of graph embedding by replacing the 'community aware' notion (similar nodes' embeddings must be similar) with the notion of directly encoding graph structure using the geometry of the embedding space. In all, we find that GLEE is a promising alternative for graph embedding due to its simplicity in both theoretical background and computational implementation, especially in the case of low clustering coefficient. By taking a direct geometric encoding of graph structure using the simplex geometry, GLEE covers the gap left open by the 'community-aware' assumption of other embedding techniques, which requires high clustering. Future lines of work will explore what other geometric properties of the embedding space can yield interesting insight, as well as what are the important structural properties of graphs, such as clustering coefficient, that affect the performance of these methods. :paragraph: {:class: small} **Funding.** National Science Foundation (IIS-1741197); and by the Combat Capabilities Development Command Army Research Laboratory and was accomplished (Cooperative Agreement Number W911NF-13-2-0045; U.S. Army Research Lab Cyber Security CRA). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Combat Capabilities Development Command Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes not withstanding any copyright notation here on. :appendix: ## Threshold estimators {:label: sec-threshold-estimators} We present two other estimators of $\theta_{\text{opt}}$ to accompany the heuristic $\hat\theta_c = -0.5$ mentioned in :ref:sec-reconstruction::. :html: { :path: figA1_dotproducts.html :label: fig-dotproducts } :caption: Distribution of dot products $\{s_i \cdot s_j^\top\}$ in the Karate Club graph. Columns show different dimension $d$. Dot products of existing edges are in blue. Non-edges are in orange. Each row shows a different estimator of $\theta_{\text{opt}}$. Top row shows the constant value $\hat\theta_c = -0.5$. Middle row shows our GMM estimator $\hat\theta_g$ (:ref:alg-gmm::). Bottom row shows our Kernel Density Estimator $\hat\theta_k$ (:ref:sec-kde::). :: ### Kernel density estimation {:label: sec-kde} As can be seen in :ref:fig-dotproducts::, the problem of finding a value of $\theta$ that sufficiently separates the peaks corresponding to edges (around the peak centred at $-1$) and non-edges (around the peak centred at $0$) can be stated in terms of density estimation. That is, given the histogram of values of $s_i \cdot s_j^\top$ for all $i,j$, we can approximate the density of this empirical distribution by some density function $f_k$. A good heuristic estimator of $\theta_{\text{opt}}$ is the value that minimizes $f_k$ between the peaks near $-1$ and $0$. For this purpose, we use Kernel Density Estimation over the distribution of $s_i \cdot s_j^\top$ and a box kernel (a.k.a. 'top hat') kernel function to define $${:label: kde-estimator} f_k(x) \propto \sum_{i < j} \mathbf{1}\{x - s_i \cdot s_j^\top < h\}. $$ We then use gradient descent to find the minimal value of $f_k$ between the values of $-1$ and $0$. We call this value $\hat\theta_k$. We have found experimentally that a value of $h = 0.3$ gives excellent results, achieving near zero error in the reconstruction task (:ref:fig-dotproducts::, middle row). ### Gaussian mixture models {:label: sec-gmm} Here, we use a GMM over the distribution of $s_i \cdot s_j^\top$. The model will find the two peaks near $-1$ and $0$ and fit each to a Gaussian distribution. Once the densities of said Gaussians have been found, say $f_1$ and $f_2$, we define the estimator $\hat\theta_g$ as that point at which the densities are equal (see :ref:fig-dotproducts::, bottom row). However, we found that a direct application of this method yields poor results due to the sparsity of network datasets. High sparsity implies that the peak at $0$ is orders of magnitude higher than the one at $-1$. Thus, the left peak will usually be hidden by the tail of the right one so that the GMM cannot detect it. To solve this issue we take two steps. First, we use a Bayesian version of GMM that accepts priors for the Gaussian means and other parameters. This guides the GMM optimization algorithm to find the peaks at the right places. Second, we sub-sample the distribution of dot products in order to minimize the difference between the peaks, and then to fix it back after the fit. Concretely, put $r = \sum_{i < j} \mathbf{1}\{s_i \cdot s_j^\top < \hat\theta_c\}$. That is, $r$ is the number of dot products less than the constant $\hat\theta_c = -0.5$. Instead of fitting the GMM to all the observed dot products, we fit it to the set of all $r$ dot products less than $\hat\theta_c$ plus a random sample of $r$ dot products greater than $\hat\theta_c$. This temporarily fixes the class imbalance, which we recover after the model has been fit as follows. The GMM fit will yield a density for the sub-sample as $f_g = w_1 f_1 + w_2 f_2$, where $f_i$ is the density of the $i$th Gaussian, and $w_i$ are the mixture weights, for $i = 1, 2$. Since we sub-sampled the distribution, we will get $w_1 \approx w_2 \approx 0.5$, but we need the weights to reflect the original class imbalance. For this purpose, we define $\hat{w}_1 = \hat{m}/\binom{n}{2}$ and $\hat{w}_2 = 1 - \hat{w}_1$, where $\hat{m}$ is an estimate of the number of edges in the graph. (This can be estimated in a number of ways, for example one may put $\hat{m} = r$, or $\hat{m} = n\log(n)$.) Finally, we define the estimator as the value that satisfies $${:label: gmm-estimator} \hat{w}_1 f_1(\hat\theta_g) = \hat{w}_2 f_2(\hat\theta_g), $$ under the constraint $-1 < \hat\theta_g < 0$. Since $f_1$ and $f_2$ are known Gaussian densities, :ref:gmm-estimator:: can be solved analytically. In this case, due to sparsity, the problem of optimizing the GMM is one of non-parametric density estimation with extreme class imbalance. We solve it by utilizing priors for the optimization algorithm, as well as sub-sampling the distribution of dot products, according to some of its known features (i.e. the fact that the peaks will be found near $-1$ and $0$), and we account for the class imbalance by estimating graph sparsity separately. Finally, we define the estimator $\hat\theta_g$ according to :ref:gmm-estimator::. :ref:alg-gmm:: gives an overview of this procedure. For a comparison between the effectiveness of the three different estimators $\hat\theta_c, \hat\theta_k, \hat\theta_g$, see :ref:sec-estimator-comparison::. :algorithm: {:label: alg-gmm} \begin{algorithm} \caption{Estimating $\theta_{\text{opt}}$ with a Gaussian Mixture Model} \begin{algorithmic} \PROCEDURE{GMM}{$\{s_i\}_{i=1}^n, \hat\theta_c, \hat{m}$} \STATE $L \gets \{s_i \cdot s_j^\top : s_i \cdot s_j^\top < \hat\theta_c\}$ \STATE $R \gets$ random sample of size $|L|$ of $\{s_i \cdot s_j^\top : s_i \cdot s_j^\top \geq \hat\theta_c\}$ \STATE $w_1, w_2, f_1, f_2 \gets$ fit a Bayesian GMM to $L \cup R$ \STATE $\hat{w}_1 \gets \hat{m} / \binom{n}{2}$ \STATE $\hat{w}_2 \gets 1 - \hat{w}_1$ \STATE $\hat\theta_g \gets$ solution of $\hat{w}_1 f_1(\theta) = \hat{w}_2 f_2(\theta)$ \RETURN $\hat\theta_g$ \ENDPROCEDURE \end{algorithmic} \end{algorithm} :: ## Estimator comparison {:label: sec-estimator-comparison} In :ref:sec-reconstruction:: and :ref:sec-threshold-estimators::, we outlined three different schemes to estimate $\theta_{\text{opt}}$ which resulted in $\hat\theta_c, \hat\theta_k, \hat\theta_g$. *Which one is the best?* We test each of these estimators on three random graph models: Erdős–Rényi (ER) :cite:erdos1960::, Barabási–Albert (BA) :cite:barabasi1999:: and Hyperbolic Graphs (HG) :cite:krioukov2010::. For each random graph with adjacency matrix $\mathbf{A}$, we compute the Frobenius norm of the difference between the reconstructed adjacency matrix $\hat{\mathbf{A}}$ using each of the three estimators. In :ref:fig-estimators::, we show our results. :html: { :path: figB1_estimators.html :label: fig-estimators } :caption: Estimator comparison. We compute the three different estimators on three different random graph models: ER, BA and HG. All graphs have $n = 10^3$ nodes, and average degree $\langle k \rangle \approx 8$. Hyperbolic graphs generated with degree distribution exponent $\gamma = 2.3$. We show the average of 20 experiments; error bars mark two standard deviations. Values normalized in the range $[0, 1]$. :: We see that $\hat\theta_c$ and $\hat\theta_k$ achieve similar performance across datasets, while $\hat\theta_g$ outperforms the other two for ER at $d = 512$, though it has high variability in the other models. From these results, we conclude that at low dimensions $d = 32$, too much information has been lost and thus there is no hope of learning a value of $\hat\theta$ that outperforms the heuristic $\theta_c = -0.5$. However, at larger dimensions, the estimators $\hat\theta_g$ and $\hat\theta_k$ perform better, with different degrees of variability. We conclude also that no single heuristic for $\hat\theta$ is best for all types of graphs. In the rest of our experiments, we use $\hat\theta_k$ as our estimator for $\theta_{\text{opt}}$. We highlight that even though $\theta_k$ is better than $\theta_c$ in some datasets, it might be costly to compute, while $\theta_c$ incurs no additional costs. :references: @article{chen2018, title={A tutorial on network embeddings}, author={Chen, H. and Perozzi, B. and Al-Rfou, R. and Skiena, S.}, year={2018}, journal={Preprint}, doi={10.48550/arXiv.1808.02590} } @article{chen2017, title={Fast warped graph embedding: unifying framework and one-click algorithm}, author={Chen, S. and Niu, S. and Akoglu, L. and Kovacevic, J. and Faloutsos, C.}, year={2017}, journal={Preprint}, doi={10.48550/arXiv.1702.05764} } @article{srinivasan2020, title={On the equivalence between positional node embeddings and structural graph representations}, author={Srinivasan, B. and Ribeiro, B.}, year={2020}, journal={Proceedings of ICLR}, doi={10.48550/arXiv.1910.00452} } @article{jin2019, title={Latent network summarization: bridging network embedding and summarization}, author={Jin, D. and Rossi, R. A. and Koh, E. and Kim, S. and Rao, A. and Koutra, D.}, year={2019}, journal={Proceedings of KDD}, publisher={ACM}, doi={10.1145/3292500.3330992} } @article{devriendt2019, title={The simplex geometry of graphs}, author={Devriendt, K. and Van Mieghem, P.}, year={2019}, journal={Journal of Complex Networks}, doi={10.1093/comnet/cny036} } @book{fiedler2011, title={Matrices and Graphs in Geometry}, author={Fiedler, M.}, year={2011}, publisher={Cambridge University Press}, doi={10.1017/CBO9780511973611} } @article{belkin2002, title={Laplacian eigenmaps and spectral techniques for embedding and clustering}, author={Belkin, M. and Niyogi, P.}, year={2002}, journal={Advances in Neural Information Processing Systems 14}, publisher={MIT Press}, doi={10.7551/mitpress/1120.003.0080} } @article{belkin2003, title={Laplacian eigenmaps for dimensionality reduction and data representation}, author={Belkin, M. and Niyogi, P.}, year={2003}, journal={Neural Computation}, volume={15}, doi={10.1162/089976603321780317} } @article{zachary1977, title={An information flow model for conflict and fission in small groups}, author={Zachary, W. W.}, year={1977}, journal={Journal of Anthropological Research}, volume={33}, doi={10.1086/jar.33.4.3629752} } @article{sarkar2011, title={Theoretical justification of popular link prediction heuristics}, author={Sarkar, P. and Chakrabarti, D. and Moore, A. W.}, year={2011}, journal={Proceedings of IJCAI}, doi={10.5591/978-1-57735-516-8/IJCAI11-453} } @article{kovacs2019, title={Network-based prediction of protein interactions}, author={Kovács, I. A. and Luck, K. and Spirohn, K. and Wang, Y. and Pollis, C. and Schlabach, S. and Bian, W. and Kim, D.-K. and Kishore, N. and Hao, T. and Calderwood, M. A. and Vidal, M. and Barabási, A.-L.}, year={2019}, journal={Nature Communications}, volume={10}, doi={10.1038/s41467-019-09177-y} } @article{halko2011, title={Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions}, author={Halko, N. and Martinsson, P.-G. and Tropp, J. A.}, year={2011}, journal={SIAM Review}, volume={53}, doi={10.1137/090771806} } @book{trefethen1997, title={Numerical Linear Algebra}, author={Trefethen, L. N. and Bau III, D.}, year={1997}, publisher={SIAM}, doi={10.1137/1.9780898719574} } @book{newman2018, title={Networks}, author={Newman, M. E. J.}, year={2018}, publisher={Oxford University Press}, doi={10.1093/oso/9780198805090.001.0001} } @article{spielman2017, title={Graphs vectors and matrices}, author={Spielman, D.}, year={2017}, journal={Bulletin of the American Mathematical Society}, volume={54}, doi={10.1090/bull/1557} } @book{vanmieghem2010, title={Graph Spectra for Complex Networks}, author={Van Mieghem, P.}, year={2010}, publisher={Cambridge University Press}, doi={10.1017/CBO9780511921681} } @article{spielman2011, title={Graph sparsification by effective resistances}, author={Spielman, D. and Srivastava, N.}, year={2011}, journal={SIAM Journal on Computing}, volume={40}, doi={10.1137/080734029} } @article{vonluxburg2007, title={A tutorial on spectral clustering}, author={Von Luxburg, U.}, year={2007}, journal={Statistics and Computing}, volume={17}, doi={10.1007/s11222-007-9033-z} } @article{prakash2014, title={Efficiently spotting the starting points of an epidemic in a large graph}, author={Prakash, B. and Vreeken, J. and Faloutsos, C.}, year={2014}, journal={Knowledge and Information Systems}, volume={38}, doi={10.1007/s10115-013-0671-5} } @article{vanmieghem2014, title={An upper bound for the epidemic threshold in exact Markovian SIR and SIS epidemics on networks}, author={Van Mieghem, P. and Sahneh, F. D. and Scoglio, C.}, year={2014}, journal={53rd IEEE Conference on Decision and Control}, publisher={IEEE}, doi={10.1109/cdc.2014.7040365} } @article{jamakovic2008, title={On the robustness of complex networks by using the algebraic connectivity}, author={Jamakovic, A. and Van Mieghem, P.}, year={2008}, journal={NETWORKING 2008}, publisher={Springer}, doi={10.1007/978-3-540-79549-0_16} } @article{shahrivar2015, title={Robustness and algebraic connectivity of random interdependent networks}, author={Shahrivar, E. M. and Pirani, M. and Sundaram, S.}, year={2015}, journal={IFAC-PapersOnLine}, volume={48}, doi={10.1016/j.ifacol.2015.10.339} } @article{koren2005, title={Drawing graphs by eigenvectors: theory and practice}, author={Koren, Y.}, year={2005}, journal={Computers and Mathematics with Applications}, volume={49}, doi={10.1016/j.camwa.2004.08.015} } @article{pisanski2000, title={Characterizing graph drawing with eigenvectors}, author={Pisanski, T. and Shawe-Taylor, J.}, year={2000}, journal={Journal of Chemical Information and Computer Sciences}, volume={40}, doi={10.1021/ci9900938} } @article{goyal2018, title={Graph embedding techniques applications and performance: a survey}, author={Goyal, P. and Ferrara, E.}, year={2018}, journal={Knowledge-Based Systems}, volume={151}, doi={10.1016/j.knosys.2018.03.022} } @article{hamilton2017, title={Representation learning on graphs: methods and applications}, author={Hamilton, W. L. and Ying, R. and Leskovec, J.}, year={2017}, journal={IEEE Data Engineering Bulletin}, volume={40}, url={http://sites.computer.org/debull/A17sept/p52.pdf} } @article{charisopoulos2019, title={Incrementally updated spectral embeddings}, author={Charisopoulos, V. and Benson, A. R. and Damle, A.}, year={2019}, journal={Preprint}, doi={10.48550/arXiv.1909.01188} } @article{chen2015, title={Fast eigen-functions tracking on dynamic graphs}, author={Chen, C. and Tong, H.}, year={2015}, journal={Proceedings of SDM}, publisher={SIAM}, doi={10.1137/1.9781611974010.63} } @article{chen2017b, title={On the eigen-functions of dynamic graphs: fast tracking and attribution algorithms}, author={Chen, C. and Tong, H.}, year={2017}, journal={Statistical Analysis and Data Mining}, volume={10}, doi={10.1002/sam.11310} } @article{levin2018, title={Out-of-sample extension of graph adjacency spectral embedding}, author={Levin, K. and Roosta-Khorasani, F. and Mahoney, M. W. and Priebe, C. E.}, year={2018}, journal={Proceedings of ICML}, publisher={PMLR}, url={https://proceedings.mlr.press/v80/levin18a.html} } @article{ahmed2013, title={Distributed large-scale natural graph factorization}, author={Ahmed, A. and Shervashidze, N. and Narayanamurthy, S. M. and Josifovski, V. and Smola, A. J.}, year={2013}, journal={Proceedings of WWW}, publisher={ACM}, doi={10.1145/2488388.2488393} } @article{cai2011, title={Graph regularized nonnegative matrix factorization for data representation}, author={Cai, D. and He, X. and Han, J. and Huang, T. S.}, year={2011}, journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, volume={33}, doi={10.1109/TPAMI.2010.231} } @article{kuang2012, title={Symmetric nonnegative matrix factorization for graph clustering}, author={Kuang, D. and Ding, C. H. Q. and Park, H.}, year={2012}, journal={Proceedings of SDM}, publisher={SIAM}, doi={10.1137/1.9781611972825.10} } @article{wang2017, title={Community preserving network embedding}, author={Wang, X. and Cui, P. and Wang, J. and Pei, J. and Zhu, W. and Yang, S.}, year={2017}, journal={Proceedings of AAAI}, publisher={AAAI Press}, doi={10.1609/aaai.v31i1.10488} } @article{perozzi2014, title={DeepWalk: online learning of social representations}, author={Perozzi, B. and Al-Rfou, R. and Skiena, S.}, year={2014}, journal={Proceedings of KDD}, publisher={ACM}, doi={10.1145/2623330.2623732} } @article{grover2016, title={node2vec: scalable feature learning for networks}, author={Grover, A. and Leskovec, J.}, year={2016}, journal={Proceedings of KDD}, publisher={ACM}, doi={10.1145/2939672.2939754} } @article{mikolov2013, title={Distributed representations of words and phrases and their compositionality}, author={Mikolov, T. and Sutskever, I. and Chen, K. and Corrado, G. S. and Dean, J.}, year={2013}, journal={Advances in Neural Information Processing Systems}, publisher={Curran Associates}, url={https://proceedings.neurips.cc/paper/2013/hash/9aa42b31882ec039965f3c4923ce901b-Abstract.html} } @article{qiu2018, title={Network embedding as matrix factorization: unifying DeepWalk LINE PTE and node2vec}, author={Qiu, J. and Dong, Y. and Ma, H. and Li, J. and Wang, K. and Tang, J.}, year={2018}, journal={WSDM}, publisher={ACM}, doi={10.1145/3159652.3159706} } @article{wang2016, title={Structural deep network embedding}, author={Wang, D. and Cui, P. and Zhu, W.}, year={2016}, journal={Proceedings of KDD}, publisher={ACM}, doi={10.1145/2939672.2939753} } @article{cao2016, title={Deep neural networks for learning graph representations}, author={Cao, S. and Lu, W. and Xu, Q.}, year={2016}, journal={Proceedings of AAAI}, publisher={AAAI Press}, doi={10.1609/aaai.v30i1.10179} } @article{estrada2014, title={Hyperspherical embedding of graphs and networks in communicability spaces}, author={Estrada, E. and Sánchez-Lirola, M. G. and de la Peña, J. A.}, year={2014}, journal={Discrete Applied Mathematics}, volume={176}, doi={10.1016/j.dam.2013.05.032} } @article{pereda2019, title={Visualization and machine learning analysis of complex networks in hyperspherical space}, author={Pereda, M. and Estrada, E.}, year={2019}, journal={Pattern Recognition}, volume={86}, doi={10.1016/j.patcog.2018.09.018} } @article{papadopoulos2012, title={Popularity versus similarity in growing networks}, author={Papadopoulos, F. and Kitsak, M. and Serrano, M. Á. and Boguñá, M. and Krioukov, D.}, year={2012}, journal={Nature}, volume={489}, doi={10.1038/nature11459} } @article{nickel2018, title={Learning continuous hierarchies in the Lorentz model of hyperbolic geometry}, author={Nickel, M. and Kiela, D.}, year={2018}, journal={Proceedings of ICML}, publisher={PMLR}, url={https://proceedings.mlr.press/v80/nickel18a.html} } @software{torres2019, title={GLEE: Geometric Laplacian Eigenmap Embedding}, author={Torres, L.}, year={2019}, url={https://github.com/leotrs/glee} } @article{rolland2014, title={A proteome-scale map of the human interactome network}, author={Rolland, T. and Tasan, M. and Charloteaux, B. and Pevzner, S. J. and Zhong, Q. and Sahni, N. and Yi, S. and Lemmens, I. and Fontanillo, C. and Mosca, R. and others}, year={2014}, journal={Cell}, volume={159}, doi={10.1016/j.cell.2014.10.050} } @article{leskovec2010, title={Predicting positive and negative links in online social networks}, author={Leskovec, J. and Huttenlocher, D. P. and Kleinberg, J. M.}, year={2010}, journal={Proceedings of WWW}, publisher={ACM}, doi={10.1145/1772690.1772756} } @article{leskovec2007, title={Graph evolution: densification and shrinking diameters}, author={Leskovec, J. and Kleinberg, J. M. and Faloutsos, C.}, year={2007}, journal={ACM Transactions on Knowledge Discovery from Data}, volume={1}, doi={10.1145/1217299.1217301} } @article{erdos1960, title={On the evolution of random graphs}, author={Erdős, P. and Rényi, A.}, year={1960}, journal={Publications of the Mathematical Institute of the Hungarian Academy of Sciences}, volume={5} } @article{barabasi1999, title={Emergence of scaling in random networks}, author={Barabási, A.-L. and Albert, R.}, year={1999}, journal={Science}, volume={286}, doi={10.1126/science.286.5439.509} } @article{krioukov2010, title={Hyperbolic geometry of complex networks}, author={Krioukov, D. and Papadopoulos, F. and Kitsak, M. and Vahdat, A. and Boguñá, M.}, year={2010}, journal={Physical Review E}, volume={82}, doi={10.1103/PhysRevE.82.036106} } ::

GLEE: Geometric Laplacian Eigenmap Embedding

Leo Torres¹, Kevin S. Chan², Tina Eliassi-Rad³

Affiliations

¹Network Science Institute, Northeastern University, Boston, MA 02115, USA
²U.S. CCDC Army Research Laboratory, Adelphi, MD 20783, USA
³Network Science Institute and Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA

Correspondence: [email protected]

Originally published in Journal of Complex Networks 8(2), 2020 (doi:10.1093/comnet/cnaa007). This is an interactive edition with explorable figures and does not contain new scientific content.

Abstract

Graph embedding seeks to build a low-dimensional representation of a graph $G$. This low-dimensional representation is then used for various downstream tasks. One popular approach is Laplacian Eigenmaps (LE), which constructs a graph embedding based on the spectral properties of the Laplacian matrix of $G$. The intuition behind it, and many other embedding techniques, is that the embedding of a graph must respect node similarity: similar nodes must have embeddings that are close to one another. Here, we dispose of this distance-minimization assumption. Instead, we use the Laplacian matrix to find an embedding with geometric properties instead of spectral ones, by leveraging the so-called simplex geometry of $G$. We introduce a new approach, Geometric Laplacian Eigenmap Embedding, and demonstrate that it outperforms various other techniques (including LE) in the tasks of graph reconstruction and link prediction.

Keywords: graph embedding, graph Laplacian, simplex geometry

This paper contains 8 interactive figures. Scroll to explore.

1. Introduction

Widget 1.1. The same graph, three ways of seeing it. Zachary's Karate Club smoothly morphs between its network layout (force-directed), its full 33-dimensional simplex (PCA-projected), and its 3D GLEE embedding (sphere-normalized). Drag the slider or let it loop.

Graphs are ubiquitous in real-world systems from the internet to the world wide web to social media to the human brain. The application of machine learning to graphs is a popular and active research area. One way to apply known machine learning methods to graphs is by transforming the graph into a representation that can be directly fed to a general machine learning pipeline. For this purpose, the task of graph representation learning, or graph embedding, seeks to build a vector representation of a graph by assigning to each node a feature vector that can then be fed into any machine learning algorithm.

Popular graph embedding techniques seek an embedding where the distance between the latent representations of two nodes represents their similarity. For example, [1] calls this the 'community aware' property (nodes in a community are considered similar, and thus their representations must be close to one another), while [2] calls it a 'symmetry' between the node domain and the embedding domain. Others call methods based on this property with various names such as 'positional' [3] or 'proximity-based' [4] embeddings. Consequently, many of these approaches are formulated in such a way that the distance (in the embedding space) between nodes that are similar (in the original data domain) is small. Here, we present a different approach. Instead of focusing on minimizing the distance between similar nodes, we seek an embedding that preserves the most basic structural property of the graph, namely adjacency; the works [3, 4] call this approach 'structural' node embeddings. Concretely, if the nodes $i$ and $j$ are neighbours in the graph $G$ with $n$ nodes, we seek $d$-dimensional vectors $s_i$ and $s_j$ such that the adjacency between $i$ and $j$ is encoded in the geometric properties of $s_i$ and $s_j$, for some $d \ll n$. Examples of geometric properties are the dot product of two vectors (which is a measure of the angle between them), the length (or area or volume) of a line segment (or polygon or polyhedron), the centre of mass or the convex hull of a set of vectors, among others. In Section 3, we propose one such geometric embedding technique, called Geometric Laplacian Eigenmap Embedding (GLEE), that is based on the properties of the Laplacian matrix of $G$, and we then proceed to compare it to the original formulation of Laplacian Eigenmaps (LE) as well as other popular embedding techniques.

GLEE has deep connections with the so-called simplex geometry of the Laplacian [5, 6]. Fiedler [6] first made this observation, which highlights the bijective correspondence between the Laplacian matrix of an undirected, weighted graph and a geometric object known as a simplex.

Widget 1.1 shows the Karate Club graph morphing from its network layout into its simplex. Using this relationship, we find a graph embedding such that the representations $s_i, s_j$ of two non-adjacent nodes $i$ and $j$ are always orthogonal, $s_i \cdot s_j = 0$, thus achieving a geometric encoding of adjacency. Note that this does not satisfy the 'community-aware' property of [1]. For example, the geometric embedding $s_i$ of node $i$ will be orthogonal to each non-neighbouring node, including those in its community. Thus, $s_i$ is not close to other nodes in its community, whether we define closeness in terms of Euclidean distance or cosine similarity. However, we show that this embedding---based on the simplex geometry---contains desirable information, and that it outperforms the original, distance-minimizing, formulation of LE on the tasks of graph reconstruction and link prediction in certain cases.

The contributions of this work are as follows:

We present a geometric framework for graph embedding that departs from the tradition of looking for representations that minimize the distance between similar nodes by highlighting the intrinsic geometric properties of the Laplacian matrix.
The proposed method, GLEE, while closely related to the LE method, outperforms LE in the tasks of link prediction and graph reconstruction. Moreover, a common critique of LE is that it only considers first-order adjacency in the graph. We show that GLEE takes into account higher-order connections (see Section 3.2).
The performance of existing graph embedding methods (which minimize distance between similar nodes) suffers when the graph's average clustering coefficient is low. This is not the case for GLEE.

In Section 2, we recall the original formulation of LE, in order to define the GLEE in Section 3 and discuss its geometric properties. We mention related work in Section 4 and present experimental studies of GLEE in Section 5. We finish with concluding remarks in Section 6.

2. Background on LE

Belkin and Niyogi [7, 8] introduced LE as a general-purpose method for embedding and clustering an arbitrary dataset. Given a dataset $\{x_i\}_{i=1}^n$, a proximity graph $G = (V, \mathbf{A})$ is constructed with node set $V = \{x_i\}$ and edge weights $\mathbf{A} = (a_{ij})$. The edge weights are built using one of many heuristics that determine which nodes are close to each other and can be binary or real valued. Some examples are $k$ nearest neighbours, $\varepsilon$-neighbourhoods, heat kernels, etc. To perform the embedding, one considers the Laplacian matrix of $G$, defined as $\mathbf{L} = \mathbf{D} - \mathbf{A}$, where $\mathbf{D}$ is the diagonal matrix whose entries are the degrees of each node. One of the defining properties of $\mathbf{L}$ is the value of the quadratic form:

$$ y^\top \mathbf{L} y = \frac{1}{2}\sum_{ij} a_{ij}(y_i - y_j)^2. $$

(2.1)

The vector $y^*$ that minimizes the value of (2.1) will be such that the total weighted distance between all pairs of nodes is minimized. Here, $y_i$ can be thought of as the one-dimensional embedding of node $i$. One can then extend this procedure to arbitrary $d$-dimensional node embeddings by noting that $tr(\mathbf{Y}^\top \mathbf{L}\mathbf{Y}) = \sum_{ij} a_{ij} \|y_i - y_j\|^2$, where $\mathbf{Y} \in \mathbb{R}^{n \times d}$ and $y_i$ is the $i$th row of $\mathbf{Y}$. The objective function in this case is

$$ \begin{align} \mathbf{Y}^* = \arg\min_{\mathbf{Y} \in \mathbb{R}^{n \times d}} tr(\mathbf{Y}^\top \mathbf{L}\mathbf{Y}) \\ \text{s.t. } \mathbf{Y}^\top \mathbf{D}\mathbf{Y} = \mathbf{I}. \end{align} $$

(2.2)

Importantly, the quantity $tr(\mathbf{Y}^\top \mathbf{L}\mathbf{Y})$ has a global minimum at $\mathbf{Y} = 0$. Therefore, a restriction is necessary to guarantee a non-trivial solution. Belkin and Niyogi [7, 8] choose $\mathbf{Y}^\top \mathbf{D}\mathbf{Y} = \mathbf{I}$, though others are possible. Applying the method of Lagrange multipliers, one can see that the solution of (2.2) is achieved at the matrix $\mathbf{Y}^*$ whose rows $y_i^*$ are the solutions to the eigenvalue problem

$$ \mathbf{L}y_i^* = \lambda_i \mathbf{D} y_i^*. $$

(2.3)

When the graph contains no isolated nodes, $y_i^*$ is then an eigenvector of the matrix $\mathbf{D}^{-1}\mathbf{L}$, also known as the normalized Laplacian matrix. The embedding of a node $j$ is then the vector whose entries are the $j$th elements of the eigenvectors $y_1^*, y_2^*, \ldots, y_d^*$.

3. Proposed approach: Geometric LE

We first give our definition and then proceed to discuss both the algebraic and geometric motivations behind it.

Definition 3.1: GLEE

Given a graph $G$, consider its Laplacian matrix $\mathbf{L}$. Using singular value decomposition, we may write $\mathbf{L} = \mathbf{S}\mathbf{S}^\top$ for a unique matrix $\mathbf{S}$. Define $\mathbf{S}^d$ as the matrix of the first $d$ columns of $\mathbf{S}$. If $i$ is a node of $G$, define its $d$-dimensional Geometric Laplacian Eigenmap Embedding (GLEE) as the $i$th row of $\mathbf{S}^d$, denoted by $s_i^d$. If the dimension $d$ is unambiguous, we will just write $s_i$.

Algebraic motivation. In the case of positive semidefinite matrices, such as the Laplacian, the singular values coincide with the eigenvalues. Moreover, it is well known that $\mathbf{S}^d$ is the matrix of rank $d$ that is closest to $\mathbf{L}$ in Frobenius norm, i.e., $\|\mathbf{L} - \mathbf{S}^d(\mathbf{S}^d)^\top\|_F \leq \|\mathbf{L} - \mathbf{M}\|_F$ for all matrices $\mathbf{M}$ of rank $d$. Because of this, we expect $\mathbf{S}^d$ to achieve better performance in the graph reconstruction task than any other $d$-dimensional embedding (see Section 5.1).

As can be seen from (2.1), the original formulation of LE is due to the fact that the distance between the embeddings of neighbouring nodes is minimized, under the restriction $Y^\top \mathbf{D} Y = I$. We can also formulate GLEE in terms of the distance between neighbouring nodes. Perhaps counterintuitively, GLEE solves a distance maximization problem, as follows. The proof follows from a routine application of Lagrange multipliers and is omitted.

Theorem 3.2.

Let $\Lambda$ be the diagonal matrix whose entries are the eigenvalues of $\mathbf{L}$. Consider the optimization problem

$$ \begin{align} \arg\max_{\mathbf{Y} \in \mathbb{R}^{n \times d}} tr(\mathbf{Y}^\top \mathbf{L}\mathbf{Y}) \\ \text{s.t. } \mathbf{Y}^\top \mathbf{Y} = \Lambda. \end{align} $$

(3.1)

Its solution is the matrix $\mathbf{S}^d$ whose columns are the eigenvectors corresponding to the largest eigenvalues of $\mathbf{L}$. If $d = n$, then $\mathbf{L} = \mathbf{S}^d (\mathbf{S}^d)^\top$.

The importance of Theorem 3.2 is to highlight the fact that distance-minimization may be misleading when it comes to exploiting the properties of the embedding space. Indeed, the original formulation of LE, while well established in (2.2), yields as result the eigenvectors corresponding to the lowest eigenvalues of $\mathbf{L}$. However, standard results in linear algebra tell us that the best low rank approximation of $\mathbf{L}$ is given by the eigenvectors corresponding to the largest eigenvalues. Therefore, these are the ones used in the definition of GLEE.

Geometric motivation. The geometric reasons underlying Definition 3.1 are perhaps more interesting than the algebraic ones. A recent review paper [5] highlights the work of Fiedler [6], who discovered a bijective correspondence between the Laplacian matrix of a graph and a higher-dimensional geometric object called a simplex.

Definition 3.3: Simplex

Given a set of $k + 1$ $k$-dimensional points $\{p_i\}_{i=0}^k$, if they are affinely independent (i.e. if the set of $k$ points $\{p_0 - p_i\}_{i=1}^k$ is linearly independent) then their convex hull is called a simplex.

A simplex is a high-dimensional polyhedron that is the generalization of a two-dimensional triangle or a three-dimensional tetrahedron. To see the connection between the Laplacian matrix of a graph and simplex geometry, we invoke the following result. The interested reader will find the proof in [5, 6].

Theorem 3.4.

Let $\mathbf{Q}$ be a positive semidefinite $k \times k$ matrix. There exists a $k \times k$ matrix $\mathbf{S}$ such that $\mathbf{Q} = \mathbf{S}\mathbf{S}^\top$. The rows of $\mathbf{S}$ lie at the vertices of a simplex if and only if the rank of $\mathbf{Q}$ is $k - 1$.

Corollary 3.5.

Let $G$ be a connected graph with $n$ nodes. Its Laplacian matrix $\mathbf{L}$ is positive semidefinite, has rank $n - 1$, and has eigendecomposition $\mathbf{L} = \mathbf{P}\Lambda\mathbf{P}^\top$. Write $\mathbf{S} = \mathbf{P}\sqrt{\Lambda}$. Then, $\mathbf{L} = \mathbf{S}\mathbf{S}^\top$ and the rows of $\mathbf{S}$ are the vertices of a $(n-1)$-dimensional simplex called the simplex of $G$.

Corollary 3.5 is central to the approach in [5], providing a correspondence between graphs and simplices. Corollary 3.5 also shines new light on GLEE: the matrix $\mathbf{S}^d$ from Definition 3.1 corresponds to the first $d$ dimensions of the simplex of $G$. In other words, computing the GLEE embeddings of a graph $G$ is equivalent to computing the simplex of $G$ and projecting it down to $d$ dimensions. We proceed to explore the geometric properties of this simplex that can aid in the interpretation of GLEE embeddings. We can find in [5] the following result.

Corollary 3.6.

Let $s_i$ be the $i$th row of $\mathbf{S}$ in Corollary 3.5. $s_i$ is the simplex vertex corresponding to node $i$, and satisfies $\|s_i\|^2 = \deg(i)$, and $s_i \cdot s_j^\top = -a_{ij}$, where $\deg(i)$ is the degree of $i$. In particular, $s_i$ is orthogonal to the embedding of any non-neighbouring node $j$.

Corollary 3.6 highlights some of the basic geometric properties of the simplex (such as lengths and dot products) that can be interpreted in graph theoretical terms (respectively degrees and adjacency). In Widget 3.1, we show examples of these properties. It is worth noting that other common matrix representations of graphs do not present a spectral decomposition that yields a simplex. For example, the adjacency matrix $\mathbf{A}$ is not in general positive semidefinite, and the normalized Laplacian $\mathbf{D}^{-1}\mathbf{L}$ (used by LE) is not symmetric. Therefore, Theorem 3.4 does not apply to them. We now proceed to show how to take advantage of the geometry of GLEE embeddings, which can all be thought of as coming from the simplex, in order to perform common graph mining tasks. In the following we focus on unweighted, undirected graphs.

Example
graph G

Simplex
of G

2-dimensional
GLEE of G

Widget 3.1. Simplex geometry and GLEE. Given a graph $G$ with $n$ nodes (top row), there is a $(n-1)$-dimensional simplex that perfectly encodes the structure of $G$, given by the rows of the matrix $\mathbf{S}$ from Corollary 3.5 (middle row). The first $d$ columns of $\mathbf{S}$ yield the GLEE of $G$ (bottom row). In each example, embeddings are colour-coded according to the node they represent. For $n = 3$, all nodes in the triangle graph are interchangeable. Accordingly, their embeddings all have the same length and subtend equal angles with each other. For $n = 4$, the green and purple nodes are interchangeable, and thus their embeddings are symmetric. Note that the length of each embedding corresponds to the degree of the corresponding node. For $n = 34$ we show the Karate Club network [9], in which we highlight one node in green and all of its neighbours in purple. In the bottom right panel, the dotted line is orthogonal to the green node's embedding. Note that most of the non-neighbours' embeddings (in gray) are close to orthogonal to the green node's embedding, while all neighbours (in purple) are not.

3.1. Graph reconstruction

For a graph $G$ with $n$ nodes, consider its $d$-dimensional GLEE embedding $\mathbf{S}^d$. When $d = n$, in light of Corollary 3.6, the dot product between any two embeddings $s_i, s_j$ can only take the values $-1$ or $0$ and one can reconstruct the graph perfectly from its simplex. However, if $d < n$, the distribution of dot products will take on real values around $-1$ and $0$ with varying amounts of noise; the larger the dimension $d$, the less noise we find around the two modes. It is important to distinguish which nodes $i, j$ have embeddings $s_i, s_j$ whose dot product belongs to the mode at $0$ or to the mode at $-1$, for this determines whether or not the nodes are neighbours in the graph. One possibility is to simply 'split the difference' and consider $i$ and $j$ as neighbours whenever $s_i \cdot s_j < -0.5$. More generally, given a graph $G$ and its embedding $\mathbf{S}^d$, define $\hat{\mathbf{L}}(\theta)$ to be the estimated Laplacian matrix using the above heuristic with threshold $\theta$, that is

$$ \hat{\mathbf{L}}_{ij}(\theta) = \begin{cases} -1 & s_i \cdot s_j^\top < \theta \\ 0 & \text{otherwise.} \end{cases} $$

(3.2)

Then, we seek the value of $\theta$, call it $\theta_{\text{opt}}$, that minimizes the loss

$$ \theta_{\text{opt}} = \arg\min_{\theta \in [-1,0]} \|\mathbf{L} - \hat{\mathbf{L}}(\theta)\|_F^2. $$

(3.3)

If all we have access to is the embedding, but not the original graph, we cannot optimize (3.3) directly. Thus, we have to estimate $\theta_{\text{opt}}$ heuristically. As explained above, one simple estimator is the constant $\hat\theta_c = -0.5$. We develop two other estimators: $\hat\theta_k, \hat\theta_g$, obtained by applying Kernel Density Estimation and Gaussian Mixture Models (GMMs), respectively. We do so in Appendix A as their development has little to do with the geometry of GLEE embeddings. Our experiments show that different thresholds $\theta_c, \theta_k$, and $\theta_g$ produce excellent results on different datasets; see Appendix A for discussion.

3.2. Link prediction

Since the objective of GLEE is to directly encode graph structure in a geometric way, rather than solve any one particular task, we are able to use it in two different ways to perform link prediction. These are useful in different kinds of networks.

3.2.1. Number of common neighbours

It is well known that heuristics such as number of CN or Jacard similarity (JS) between neighbourhoods are highly effective for the task of link prediction in networks with a strong tendency for triadic closure [10]. Here, we show that we can use the geometric properties of GLEE in order to approximately compute CN. For the purpose of exposition, we assume $d = n$ unless stated otherwise in this section.

Given an arbitrary subset of nodes $V$ in the graph $G$, we denote by $|V|$ its number of elements. We further define the centroid of $V$, denoted by $C_V$, as the centroid of the simplex vertices that correspond to its nodes, i.e., $C_V = \frac{1}{|V|} \sum_{i \in V} s_i$. The following lemma, which can be found in [5], highlights the graph-theoretical interpretation of the geometric object $C_V$.

Lemma 3.7.

(From [5]) Given a graph $G$ and its GLEE embedding $S$, consider two disjoint node sets $V_1$ and $V_2$. Then, the number of edges with one endpoint in $V_1$ and one endpoint in $V_2$, is given by

$$ -|V_1||V_2| \; C_{V_1}^\top \cdot C_{V_2}. $$

(3.4)

Proof.

By linearity of the dot product, we have

$$ |V_1||V_2| C_{V_1}^\top \cdot C_{V_2} = \sum_{i \in V_1} \sum_{j \in V_2} s_i \cdot s_j^\top = -\sum_{i \in V_1} \sum_{j \in V_2} a_{ij}. $$

(3.5)

The expression on the right is precisely the required quantity.

Lemma 3.7 says that we can use the dot product between the centroids of two node sets to count the number of edges that are shared by them. Thus, we now reformulate the problem of finding the number of CN between two nodes in terms of centroids of node sets. In the following, we use $N(i)$ to denote the neighbourhood of node $i$, that is, the set of nodes connected to it.

Lemma 3.8.

Let $i, j \in V$ be non-neighbours. Then, the number of CN of $i$ and $j$, denoted by $CN(i, j)$, is given by

$$ CN(i,j) = -\deg(i) \; C_{N(i)} \cdot s_j^\top = -\deg(j) \; C_{N(j)} \cdot s_i^\top. $$

(3.6)

Proof.

Apply Lemma 3.7 to the node sets $V_1 = N(i)$ and $V_2 = \{j\}$, or, equivalently, to $V_1 = N(j)$ and $V_2 = \{i\}$.

Now assume we have the $d$-dimensional GLEE of $G$. We approximate $CN(i,j)$ by estimating both $\deg(i)$ and $C_{N(j)}$. First, we know from Corollary 3.6 that $\deg(i) \approx \|s_i^d\|^2$. Second, we define the approximate neighbour set of $i$ as $\hat{N}(i) = \{k : s_k^d \cdot (s_i^d)^\top < \hat\theta \}$, where $\hat\theta$ is any of the estimators from Section 3.1. We can now write

$$ CN(i,j) \approx -\|s_i^d\|^2 C_{\hat{N}(i)} \cdot (s_j^d)^\top. $$

(3.7)

The higher the value of this expression, the more confident is our prediction that the link $(i,j)$ exists.

3.2.2. Number of paths of length 3

A common critique of the original LE algorithm is that it only takes into account first-order connections, which were considered in Section 3.2.1. Furthermore, authors of [11] point out that the application of link prediction heuristics CN and JS does not have a solid theoretical grounding for certain types of biological networks such as protein--protein interaction networks. They further propose to use the (normalized) number of paths of length three (L3) between two nodes to perform link prediction. We next present a way to approximate L3 using GLEE. This achieves good performance in those networks where CN and JS are invalid and show that GLEE can take into account higher-order connectivity of the graph.

Lemma 3.9.

Assume $S$ is the GLEE of a graph $G$ of dimension $d = n$. Then, the number of paths of length three between two distinct nodes $i$ and $j$ is

$$ L3(i,j) = -\deg(i)\deg(j) \; C_{N(i)}^\top \cdot C_{N(j)} + \sum_{k \in N(i) \cap N(j)} \|s_k\|^2. $$

(3.8)

Proof.

The number of paths of length three between $i$ and $j$ is $(\mathbf{A}^3)_{ij}$, where $\mathbf{A}$ is the adjacency matrix of $G$. We have

$$ \begin{align} (\mathbf{A}^3)_{ij} = \sum_{k \in N(i)} \sum_{l \in N(j)} a_{kl} = - \sum_{k \in N(i)} \sum_{\substack{l \in N(j) \\ l \neq k}} s_k \cdot s_l^\top \\ = -\sum_{k \in N(i)} \sum_{l \in N(j)} s_k \cdot s_l^\top + \sum_{k \in N(i) \cap N(j)} s_k \cdot s_k^\top \\ = -|N(i)||N(j)| C_{N(i)}^\top \cdot C_{N(j)} + \sum_{k \in N(i) \cap N(j)} \|s_k\|^2, \end{align} $$

(3.9)

where the last expression follows by the linearity of the dot product, and is equivalent to (3.8).

When $d < n$, we can estimate $\deg(i)$ by $\|s_i^d\|^2$ and $N(i)$ by $\hat{N}(i)$ as before, with the help of an estimator $\hat\theta$ from Section 3.1.

3.3. Runtime analysis

On a graph $G$ with $n$ nodes, finding the $k$ largest eigenvalues and eigenvectors of the Laplacian takes $O(kn^2)$ time, if one uses algorithms for fast approximate singular value decomposition [12, 13]. Given a $k$-dimensional embedding matrix $S$, reconstructing the graph is as fast as computing the product $S \cdot S^\top$ and applying the threshold $\theta$ to each entry, thus it takes $O(n^\omega + n^2)$, where $\omega$ is the exponent of matrix multiplication. Approximating the number of CN between two nodes depends only on the dot products between embeddings corresponding to their neighbours, thus it takes $O(k \min(\deg(i), \deg(j)))$, while approximating the number of paths of length 3 takes $O(k \deg(i) \deg(j))$.

5. Experiments

We put into practice the procedures detailed in Section 3.1 and Section 3.2 to showcase GLEE's performance in the tasks of link prediction and graph reconstruction. Code to compute the GLEE embeddings of networks and related computations is publicly available at [45]. For our experiments, we use the following baselines: GF because it is a direct factorization of the adjacency matrix, node2vec because it is regarded as a reference point among those methods based on random walks, SDNE because it aims to recover the adjacency matrix of a graph (a task GLEE excels at), NetMF because it generalizes several other well-known techniques and LE because it is the method that most directly resembles our own. In this way, we cover all of the categories explained in Section 4 and use either methods that resemble GLEE closely or methods that have been found to generalize other techniques. For node2vec and SDNE, we use default parameters. For NetMF, we use the spectral approximation with rank 256. The datasets we use are outlined in Table 1. Beside comparing GLEE to the other algorithms, we are interested in how the graph's structure affects performance of each method. This is why we have chosen datasets have similar number of nodes and edges, but different values of average clustering coefficient. Accordingly, we report our results with respect to the average clustering coefficient of each dataset and the number of dimensions of the embedding (the only parameter of GLEE). In Appendix B, we compare the performance of each estimator explained in Section 3.1. In the following experiments, we use $\hat\theta_k$ as our estimator for $\theta_{\text{opt}}$.

Name	$n$	$m$	$\bar{c}$	Type
`PPI`[46]	4,182	13,343	0.04	Protein interaction
`wiki-vote`[47]	7,066	100,736	0.14	Endorsement
`caida`[48]	26,475	53,381	0.21	AS Internet
`CA-HepTh`[48]	8,638	24,806	0.48	Co-authorship
`CA-GrQc`[48]	4,158	13,422	0.56	Co-authorship

Table 1. Datasets used in this work (all undirected, unweighted): number of nodes $n$, number of edges $m$ and average clustering coefficient $\bar{c}$ of the largest connected component of each network. AS, autonomous systems of the Internet.

5.1. Graph reconstruction

Given a GLEE matrix $S^d$, how well can we reconstruct the original graph? This is the task of graph reconstruction. We use as performance metric the precision at $k$ measure, defined as the precision of the first $k$ reconstructed edges. Note that precision at $k$ must always decrease when $k$ grows large, as there will be few correct edges left to reconstruct.

Widget 5.1. The Reconstruction Machine. Drag the threshold slider to reconstruct the Karate Club graph from its GLEE embedding. Green edges are true positives, red are false positives, and gray are missed edges. Change the embedding dimension to see how higher dimensions yield cleaner separation of the dot-product modes, making reconstruction easier.

Following Section 3.1, we reconstruct the edge $(i,j)$ if $s_i^d \cdot s_j^d < \hat\theta$. The further the dot product is from 0 (the ideal value for non-edges), the more confident we are in the existence of this edge. For LE, GF, node2vec and NetMF, we reconstruct the edge $(i,j)$ according to how small the distance between their embeddings is. For both GF, node2vec and NetMF, we reconstruct edges based on how high their dot product is. SDNE is a deep autoencoder and thus its very architecture involves a mechanism to reconstruct the adjacency matrix of the input graph.

We show results in Widget 5.2, where we have ordered datasets from left to right in ascending order of clustering coefficient, and from bottom up in ascending order of embedding dimension $d$. (GF results omitted from Widget 5.2 as it scored close to 0 for all values of $k$ and $d$.) On CA-GrQc, for low embedding dimension $d = 32$, SDNE performs best among all methods, followed by node2vec and LE. However, as $d$ increases, GLEE substantially outperforms all others, reaching an almost perfect precision score at the first 10,000 reconstructed edges. This analysis is also valid for CA-HepTh, another dataset with high clustering coefficient. However, on PPI, our dataset with lowest clustering coefficient, GLEE drastically outperforms all other methods for all values of $d$. Interestingly, LE and node2vec perform well compared to other methods in datasets with high clustering, but their performance drops to near zero on PPI. We hypothesize that this is due to the fact that LE and node2vec depend on the 'community-aware' assumption, thereby assuming that two proteins in the same cluster would interact with each other. This is the exact point that [11] refutes. On the other hand, GLEE directly encodes graph structure, making no assumptions about the original graph, and its performance depends more directly on the embedding dimension than on the clustering coefficient, or on any other assumption about graph structure. GLEE's performance on datasets PPI, Wiki-Vote and caida point to the excellent potential of our method in the case of low clustering coefficient.

Widget 5.2. Graph reconstruction. GLEE performs best in networks with low clustering coefficient, presumably because it depends on the large eigenvalues of the Laplacian, which encode micro-level graph structure.

5.2. Link prediction

Widget 5.3. The Link Prediction Machine. Given the full GLEE embedding of the Karate Club, all node pairs are scored by their dot product: more negative means more likely to be an edge. Drag the threshold to classify pairs as edges or non-edges. The ROC curve shows the trade-off, and the red dot marks your operating point. Increase the embedding dimension and watch the AUC climb toward 1.

Given the embedding of a large subgraph of some graph $G$, can we identify which edges are missing? The experimental setup is as follows. Given a graph $G$ with $n$ nodes, node set $V$ and edge set $E_{\text{obs}}$, we randomly split its edges into train and test sets $E_{\text{train}}$ and $E_{\text{test}}$. We use $|E_{\text{train}}| = 0.75n$, and we make sure that the subgraph induced by $E_{\text{train}}$, denoted by $G_{\text{train}}$, is connected and contains every node of $V$. We then proceed to compute the GLEE of $G_{\text{train}}$ and test on $E_{\text{test}}$. We report area under receiver operator curve (AUC) metric for this task. We use both techniques described in Section 3.2.1 and Section 3.2.2, which we label GLEE and GLEE-L3, respectively.

Widget 5.4 shows that node2vec repeats the behaviour seen in graph reconstruction of increased performance as clustering coefficient increases, though again it is fairly constant with respect to embedding dimension. This observation is also true for NetMF. On the high clustering datasets, LE and GLEE have comparable performance to each other. However, either GLEE or GLEE-L3 perform better than all others on the low clustering datasets PPI, Wiki-Vote, as expected. Also as expected, the performance of GLEE-L3 decreases as average clustering increases. Note that GLEE and LE generally improve performance when $d$ increases, whereas node2vec and SDNE do not improve. (GF and SDNE not shown in Widget 5.4 for clarity. They scored close to 0.5 and 0.6 in all datasets independently of $d$.) The reason why none of the methods studied here perform better than 0.6 AUC in the caida dataset is an open question left for future research. We conclude that the hybrid approach of NetMF is ideal for high clustering coefficient, whereas GLEE is a viable option in the case of low clustering coefficient as evidenced by the results on PPI, Wiki-Vote and caida.

Widget 5.4. Link prediction results. We approximate the number of CN (GLEE), or the number of paths of length 3 (GLEE-L3). Each circle is the average of 10 realizations; error bars too small to show at this scale. GF and SDNE perform close to 0.5 and 0.6 independently of $d$ or dataset (not shown). Datasets ordered from left to right in increasing order of clustering.

6. Conclusions

In this work, we have presented the GLEE, a geometric approach to graph embedding that exploits the intrinsic geometry of the Laplacian. When compared to other methods, we find that GLEE performs the best when the underlying graph has low clustering coefficient, while still performing comparably to other state-of-the-art methods when the clustering coefficient is high. We hypothesize that this is due to the fact that the large eigenvalues of the Laplacian correspond to the small eigenvalues of the adjacency matrix and thus represent the structure of the graph at a micro-level. Furthermore, we find that GLEE's performance increases as the embedding dimension increases, something we do not see in other methods. In contrast to techniques based on neural networks, which have many hyperparameters and costly training phases, GLEE has only one parameter other than the embedding dimension, the threshold $\theta$, and we have provided three different ways of optimizing for it. Indeed, GLEE only depends on the singular value decomposition of the Laplacian matrix.

We attribute these desirable properties of GLEE to the fact that it departs from the traditional literature of graph embedding by replacing the 'community aware' notion (similar nodes' embeddings must be similar) with the notion of directly encoding graph structure using the geometry of the embedding space. In all, we find that GLEE is a promising alternative for graph embedding due to its simplicity in both theoretical background and computational implementation, especially in the case of low clustering coefficient. By taking a direct geometric encoding of graph structure using the simplex geometry, GLEE covers the gap left open by the 'community-aware' assumption of other embedding techniques, which requires high clustering. Future lines of work will explore what other geometric properties of the embedding space can yield interesting insight, as well as what are the important structural properties of graphs, such as clustering coefficient, that affect the performance of these methods.

Funding. National Science Foundation (IIS-1741197); and by the Combat Capabilities Development Command Army Research Laboratory and was accomplished (Cooperative Agreement Number W911NF-13-2-0045; U.S. Army Research Lab Cyber Security CRA). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Combat Capabilities Development Command Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes not withstanding any copyright notation here on.

A. Threshold estimators

We present two other estimators of $\theta_{\text{opt}}$ to accompany the heuristic $\hat\theta_c = -0.5$ mentioned in Section 3.1.

Widget A.1. Distribution of dot products $\{s_i \cdot s_j^\top\}$ in the Karate Club graph. Columns show different dimension $d$. Dot products of existing edges are in blue. Non-edges are in orange. Each row shows a different estimator of $\theta_{\text{opt}}$. Top row shows the constant value $\hat\theta_c = -0.5$. Middle row shows our GMM estimator $\hat\theta_g$ (Algorithm 1). Bottom row shows our Kernel Density Estimator $\hat\theta_k$ (Section A.1).

A.1. Kernel density estimation

As can be seen in Widget A.1, the problem of finding a value of $\theta$ that sufficiently separates the peaks corresponding to edges (around the peak centred at $-1$) and non-edges (around the peak centred at $0$) can be stated in terms of density estimation. That is, given the histogram of values of $s_i \cdot s_j^\top$ for all $i,j$, we can approximate the density of this empirical distribution by some density function $f_k$. A good heuristic estimator of $\theta_{\text{opt}}$ is the value that minimizes $f_k$ between the peaks near $-1$ and $0$. For this purpose, we use Kernel Density Estimation over the distribution of $s_i \cdot s_j^\top$ and a box kernel (a.k.a. 'top hat') kernel function to define

$$ f_k(x) \propto \sum_{i < j} \mathbf{1}\{x - s_i \cdot s_j^\top < h\}. $$

(A.1)

We then use gradient descent to find the minimal value of $f_k$ between the values of $-1$ and $0$. We call this value $\hat\theta_k$. We have found experimentally that a value of $h = 0.3$ gives excellent results, achieving near zero error in the reconstruction task (Widget A.1, middle row).

A.2. Gaussian mixture models

Here, we use a GMM over the distribution of $s_i \cdot s_j^\top$. The model will find the two peaks near $-1$ and $0$ and fit each to a Gaussian distribution. Once the densities of said Gaussians have been found, say $f_1$ and $f_2$, we define the estimator $\hat\theta_g$ as that point at which the densities are equal (see Widget A.1, bottom row).

However, we found that a direct application of this method yields poor results due to the sparsity of network datasets. High sparsity implies that the peak at $0$ is orders of magnitude higher than the one at $-1$. Thus, the left peak will usually be hidden by the tail of the right one so that the GMM cannot detect it. To solve this issue we take two steps. First, we use a Bayesian version of GMM that accepts priors for the Gaussian means and other parameters. This guides the GMM optimization algorithm to find the peaks at the right places. Second, we sub-sample the distribution of dot products in order to minimize the difference between the peaks, and then to fix it back after the fit. Concretely, put $r = \sum_{i < j} \mathbf{1}\{s_i \cdot s_j^\top < \hat\theta_c\}$. That is, $r$ is the number of dot products less than the constant $\hat\theta_c = -0.5$. Instead of fitting the GMM to all the observed dot products, we fit it to the set of all $r$ dot products less than $\hat\theta_c$ plus a random sample of $r$ dot products greater than $\hat\theta_c$. This temporarily fixes the class imbalance, which we recover after the model has been fit as follows. The GMM fit will yield a density for the sub-sample as $f_g = w_1 f_1 + w_2 f_2$, where $f_i$ is the density of the $i$th Gaussian, and $w_i$ are the mixture weights, for $i = 1, 2$. Since we sub-sampled the distribution, we will get $w_1 \approx w_2 \approx 0.5$, but we need the weights to reflect the original class imbalance. For this purpose, we define $\hat{w}_1 = \hat{m}/\binom{n}{2}$ and $\hat{w}_2 = 1 - \hat{w}_1$, where $\hat{m}$ is an estimate of the number of edges in the graph. (This can be estimated in a number of ways, for example one may put $\hat{m} = r$, or $\hat{m} = n\log(n)$.) Finally, we define the estimator as the value that satisfies

$$ \hat{w}_1 f_1(\hat\theta_g) = \hat{w}_2 f_2(\hat\theta_g), $$

(A.2)

under the constraint $-1 < \hat\theta_g < 0$. Since $f_1$ and $f_2$ are known Gaussian densities, (A.2) can be solved analytically.

In this case, due to sparsity, the problem of optimizing the GMM is one of non-parametric density estimation with extreme class imbalance. We solve it by utilizing priors for the optimization algorithm, as well as sub-sampling the distribution of dot products, according to some of its known features (i.e. the fact that the peaks will be found near $-1$ and $0$), and we account for the class imbalance by estimating graph sparsity separately. Finally, we define the estimator $\hat\theta_g$ according to (A.2). Algorithm 1 gives an overview of this procedure. For a comparison between the effectiveness of the three different estimators $\hat\theta_c, \hat\theta_k, \hat\theta_g$, see Appendix B.

\begin{algorithm}
  \caption{Estimating $\theta_{\text{opt}}$ with a Gaussian Mixture Model}
  \begin{algorithmic}
  \PROCEDURE{GMM}{$\{s_i\}_{i=1}^n, \hat\theta_c, \hat{m}$}
      \STATE $L \gets \{s_i \cdot s_j^\top : s_i \cdot s_j^\top < \hat\theta_c\}$
      \STATE $R \gets$ random sample of size $|L|$ of $\{s_i \cdot s_j^\top : s_i \cdot s_j^\top \geq \hat\theta_c\}$
      \STATE $w_1, w_2, f_1, f_2 \gets$ fit a Bayesian GMM to $L \cup R$
      \STATE $\hat{w}_1 \gets \hat{m} / \binom{n}{2}$
      \STATE $\hat{w}_2 \gets 1 - \hat{w}_1$
      \STATE $\hat\theta_g \gets$ solution of $\hat{w}_1 f_1(\theta) = \hat{w}_2 f_2(\theta)$
      \RETURN $\hat\theta_g$
  \ENDPROCEDURE
  \end{algorithmic}
  \end{algorithm}

B. Estimator comparison

In Section 3.1 and Appendix A, we outlined three different schemes to estimate $\theta_{\text{opt}}$ which resulted in $\hat\theta_c, \hat\theta_k, \hat\theta_g$. Which one is the best? We test each of these estimators on three random graph models: Erdős–Rényi (ER) [49], Barabási–Albert (BA) [50] and Hyperbolic Graphs (HG) [51]. For each random graph with adjacency matrix $\mathbf{A}$, we compute the Frobenius norm of the difference between the reconstructed adjacency matrix $\hat{\mathbf{A}}$ using each of the three estimators. In Widget B.1, we show our results.

Widget B.1. Estimator comparison. We compute the three different estimators on three different random graph models: ER, BA and HG. All graphs have $n = 10^3$ nodes, and average degree $\langle k \rangle \approx 8$. Hyperbolic graphs generated with degree distribution exponent $\gamma = 2.3$. We show the average of 20 experiments; error bars mark two standard deviations. Values normalized in the range $[0, 1]$.

We see that $\hat\theta_c$ and $\hat\theta_k$ achieve similar performance across datasets, while $\hat\theta_g$ outperforms the other two for ER at $d = 512$, though it has high variability in the other models. From these results, we conclude that at low dimensions $d = 32$, too much information has been lost and thus there is no hope of learning a value of $\hat\theta$ that outperforms the heuristic $\theta_c = -0.5$. However, at larger dimensions, the estimators $\hat\theta_g$ and $\hat\theta_k$ perform better, with different degrees of variability. We conclude also that no single heuristic for $\hat\theta$ is best for all types of graphs. In the rest of our experiments, we use $\hat\theta_k$ as our estimator for $\theta_{\text{opt}}$. We highlight that even though $\theta_k$ is better than $\theta_c$ in some datasets, it might be costly to compute, while $\theta_c$ incurs no additional costs.

References

1. Chen, H. and Perozzi, B. and Al-Rfou, R. and Skiena, S. "A tutorial on network embeddings". Preprint. 2018.
[↖1,↖2]

2. Chen, S. and Niu, S. and Akoglu, L. and Kovacevic, J. and Faloutsos, C. "Fast warped graph embedding: unifying framework and one-click algorithm". Preprint. 2017.
[↖1]

3. Srinivasan, B. and Ribeiro, B. "On the equivalence between positional node embeddings and structural graph representations". Proceedings of ICLR. 2020.
[↖1,↖2]

4. Jin, D. and Rossi, R. A. and Koh, E. and Kim, S. and Rao, A. and Koutra, D. "Latent network summarization: bridging network embedding and summarization". Proceedings of KDD. 2019.
[↖1,↖2]

5. Devriendt, K. and Van Mieghem, P. "The simplex geometry of graphs". Journal of Complex Networks. 2019.
[↖1,↖2,↖3,↖4,↖5,↖6,↖7]

6. Fiedler, M. "Matrices and Graphs in Geometry". Cambridge University Press. 2011.
[↖1,↖2,↖3,↖4]

7. Belkin, M. and Niyogi, P. "Laplacian eigenmaps and spectral techniques for embedding and clustering". Advances in Neural Information Processing Systems 14. 2002.
[↖1,↖2,↖3]

8. Belkin, M. and Niyogi, P. "Laplacian eigenmaps for dimensionality reduction and data representation". Neural Computation. 2003.
[↖1,↖2,↖3]

9. Zachary, W. W. "An information flow model for conflict and fission in small groups". Journal of Anthropological Research. 1977.
[↖1]

10. Sarkar, P. and Chakrabarti, D. and Moore, A. W. "Theoretical justification of popular link prediction heuristics". Proceedings of IJCAI. 2011.
[↖1]

11. Kovács, I. A. and Luck, K. and Spirohn, K. and Wang, Y. and Pollis, C. and Schlabach, S. and Bian, W. and Kim, D.-K. and Kishore, N. and Hao, T. and Calderwood, M. A. and Vidal, M. and Barabási, A.-L. "Network-based prediction of protein interactions". Nature Communications. 2019.
[↖1,↖2]

12. Halko, N. and Martinsson, P.-G. and Tropp, J. A. "Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions". SIAM Review. 2011.
[↖1]

13. Trefethen, L. N. and Bau III, D. "Numerical Linear Algebra". SIAM. 1997.
[↖1]

14. Newman, M. E. J. "Networks". Oxford University Press. 2018.
[↖1]

15. Spielman, D. "Graphs vectors and matrices". Bulletin of the American Mathematical Society. 2017.
[↖1]

16. Van Mieghem, P. "Graph Spectra for Complex Networks". Cambridge University Press. 2010.
[↖1]

17. Spielman, D. and Srivastava, N. "Graph sparsification by effective resistances". SIAM Journal on Computing. 2011.
[↖1,↖2]

18. Von Luxburg, U. "A tutorial on spectral clustering". Statistics and Computing. 2007.
[↖1]

19. Prakash, B. and Vreeken, J. and Faloutsos, C. "Efficiently spotting the starting points of an epidemic in a large graph". Knowledge and Information Systems. 2014.
[↖1]

20. Van Mieghem, P. and Sahneh, F. D. and Scoglio, C. "An upper bound for the epidemic threshold in exact Markovian SIR and SIS epidemics on networks". 53rd IEEE Conference on Decision and Control. 2014.
[↖1]

21. Jamakovic, A. and Van Mieghem, P. "On the robustness of complex networks by using the algebraic connectivity". NETWORKING 2008. 2008.
[↖1]

22. Shahrivar, E. M. and Pirani, M. and Sundaram, S. "Robustness and algebraic connectivity of random interdependent networks". IFAC-PapersOnLine. 2015.
[↖1]

23. Koren, Y. "Drawing graphs by eigenvectors: theory and practice". Computers and Mathematics with Applications. 2005.
[↖1]

24. Pisanski, T. and Shawe-Taylor, J. "Characterizing graph drawing with eigenvectors". Journal of Chemical Information and Computer Sciences. 2000.
[↖1,↖2,↖3]

25. Goyal, P. and Ferrara, E. "Graph embedding techniques applications and performance: a survey". Knowledge-Based Systems. 2018.
[↖1]

26. Hamilton, W. L. and Ying, R. and Leskovec, J. "Representation learning on graphs: methods and applications". IEEE Data Engineering Bulletin. 2017.
[↖1]

27. Charisopoulos, V. and Benson, A. R. and Damle, A. "Incrementally updated spectral embeddings". Preprint. 2019.
[↖1]

28. Chen, C. and Tong, H. "Fast eigen-functions tracking on dynamic graphs". Proceedings of SDM. 2015.
[↖1]

29. Chen, C. and Tong, H. "On the eigen-functions of dynamic graphs: fast tracking and attribution algorithms". Statistical Analysis and Data Mining. 2017.
[↖1]

30. Levin, K. and Roosta-Khorasani, F. and Mahoney, M. W. and Priebe, C. E. "Out-of-sample extension of graph adjacency spectral embedding". Proceedings of ICML. 2018.
[↖1]

31. Ahmed, A. and Shervashidze, N. and Narayanamurthy, S. M. and Josifovski, V. and Smola, A. J. "Distributed large-scale natural graph factorization". Proceedings of WWW. 2013.
[↖1]

32. Cai, D. and He, X. and Han, J. and Huang, T. S. "Graph regularized nonnegative matrix factorization for data representation". IEEE Transactions on Pattern Analysis and Machine Intelligence. 2011.
[↖1]

33. Kuang, D. and Ding, C. H. Q. and Park, H. "Symmetric nonnegative matrix factorization for graph clustering". Proceedings of SDM. 2012.
[↖1]

34. Wang, X. and Cui, P. and Wang, J. and Pei, J. and Zhu, W. and Yang, S. "Community preserving network embedding". Proceedings of AAAI. 2017.
[↖1]

35. Perozzi, B. and Al-Rfou, R. and Skiena, S. "DeepWalk: online learning of social representations". Proceedings of KDD. 2014.
[↖1]

36. Grover, A. and Leskovec, J. "node2vec: scalable feature learning for networks". Proceedings of KDD. 2016.
[↖1]

37. Mikolov, T. and Sutskever, I. and Chen, K. and Corrado, G. S. and Dean, J. "Distributed representations of words and phrases and their compositionality". Advances in Neural Information Processing Systems. 2013.
[↖1]

38. Qiu, J. and Dong, Y. and Ma, H. and Li, J. and Wang, K. and Tang, J. "Network embedding as matrix factorization: unifying DeepWalk LINE PTE and node2vec". WSDM. 2018.
[↖1]

39. Wang, D. and Cui, P. and Zhu, W. "Structural deep network embedding". Proceedings of KDD. 2016.
[↖1]

40. Cao, S. and Lu, W. and Xu, Q. "Deep neural networks for learning graph representations". Proceedings of AAAI. 2016.
[↖1]

41. Estrada, E. and Sánchez-Lirola, M. G. and de la Peña, J. A. "Hyperspherical embedding of graphs and networks in communicability spaces". Discrete Applied Mathematics. 2014.
[↖1]

42. Pereda, M. and Estrada, E. "Visualization and machine learning analysis of complex networks in hyperspherical space". Pattern Recognition. 2019.
[↖1]

43. Papadopoulos, F. and Kitsak, M. and Serrano, M. Á. and Boguñá, M. and Krioukov, D. "Popularity versus similarity in growing networks". Nature. 2012.
[↖1]

44. Nickel, M. and Kiela, D. "Learning continuous hierarchies in the Lorentz model of hyperbolic geometry". Proceedings of ICML. 2018.
[↖1]

45. Torres, L. "GLEE: Geometric Laplacian Eigenmap Embedding". 2019.
[↖1]

46. Rolland, T. and Tasan, M. and Charloteaux, B. and Pevzner, S. J. and Zhong, Q. and Sahni, N. and Yi, S. and Lemmens, I. and Fontanillo, C. and Mosca, R. and others. "A proteome-scale map of the human interactome network". Cell. 2014.
[↖1]

47. Leskovec, J. and Huttenlocher, D. P. and Kleinberg, J. M. "Predicting positive and negative links in online social networks". Proceedings of WWW. 2010.
[↖1]

48. Leskovec, J. and Kleinberg, J. M. and Faloutsos, C. "Graph evolution: densification and shrinking diameters". ACM Transactions on Knowledge Discovery from Data. 2007.
[↖1,↖2,↖3]

49. Erdős, P. and Rényi, A. "On the evolution of random graphs". Publications of the Mathematical Institute of the Hungarian Academy of Sciences. 1960.
[↖1]

50. Barabási, A.-L. and Albert, R. "Emergence of scaling in random networks". Science. 1999.
[↖1]

51. Krioukov, D. and Papadopoulos, F. and Kitsak, M. and Vahdat, A. and Boguñá, M. "Hyperbolic geometry of complex networks". Physical Review E. 2010.
[↖1]

Name	\(n\)	\(m\)	\(\bar{c}\)	Type
`PPI`[46]	4,182	13,343	0.04	Protein interaction
`wiki-vote`[47]	7,066	100,736	0.14	Endorsement
`caida`[48]	26,475	53,381	0.21	AS Internet
`CA-HepTh`[48]	8,638	24,806	0.48	Co-authorship
`CA-GrQc`[48]	4,158	13,422	0.56	Co-authorship

GLEE: Geometric Laplacian Eigenmap Embedding

Abstract

1. Introduction

2. Background on LE

3. Proposed approach: Geometric LE

3.1. Graph reconstruction

3.2. Link prediction

3.2.1. Number of common neighbours

3.2.2. Number of paths of length 3

3.3. Runtime analysis

4. Related work

5. Experiments

5.1. Graph reconstruction

5.2. Link prediction

6. Conclusions

A. Threshold estimators

A.1. Kernel density estimation

A.2. Gaussian mixture models

B. Estimator comparison

References