Heterogeneous Graph Neural Network

1. Introduction

This paper addresses the representation learning challenges of content-associated heterogeneous graphs (HetG). Traditional works primarily focused on homogeneous structural information and ignored the rich, diverse content across different node types, such as textual, attribute, and image data. We introduce HetGNN, a model designed to encapsulate both the structural and content heterogeneity within graphs.

2. Methodology

2.1 Challenges Addressed

HetGNN tackles several key challenges in the realm of heterogeneous graphs:

C1: Sampling relevant heterogeneous neighbors: Unlike most GNNs, which only aggregate first-degree neighbors, HetGNN utilizes a random walk with restart strategy to sample a diverse set of strongly associated heterogeneous neighbors.
C2: Encoding diverse content information: Nodes in HetG have varied content types. HetGNN employs a content encoder using RNNs to model the intricate interactions of these content types deeply.
C3: Aggregating heterogeneous neighbor features: To account for the varying influences of different neighbor types, HetGNN integrates a type-based attention mechanism, ensuring that each node type’s contribution is weighted appropriately.

2.2 HetGNN Architecture

HetGNN’s architecture comprises two main modules:

Content Encoding Module: Uses a Bi-LSTM to capture deep feature interactions of node content, resulting in robust content embeddings.
Neighbor Aggregation Module: Aggregates embeddings of different neighbor types using another Bi-LSTM, with attention mechanisms to adjust the influence of each type on the final node embedding.

3. Experiments and Results

HetGNN was evaluated against state-of-the-art models across multiple datasets and graph mining tasks, including link prediction, recommendation, node classification, and clustering, both in transductive and inductive settings. The results demonstrate that HetGNN consistently outperforms existing methods, particularly in environments rich in node content information.

4. Contributions and Model Advantages

HetGNN significantly advances the field by:

Defining the heterogeneous graph representation learning problem that involves both structural and content heterogeneity.
Developing a robust model that effectively captures both elements, applicable to both transductive and inductive tasks.
Achieving state-of-the-art performance on multiple graph mining tasks, demonstrating the effectiveness of our dual-module architecture and the importance of considering both node and edge types in heterogeneous graphs.

5. Conclusion

HetGNN represents a comprehensive approach to the challenges of heterogeneous graph analysis. The model’s ability to integrate and learn from both the structural connections and the rich content of nodes leads to superior performance and broad applicability. This work not only sets a new benchmark for heterogeneous graph neural networks but also opens new avenues for future research in this area.