1. Introduction
In recent years, the significance of network analysis and graph structures has grown substantially in both academia and industry. This is because they offer the capability to represent and analyze massive amounts of interconnected data. In this way, it is possible to gain valuable insights across a broad spectrum in the area of application domains, including but not limited to planning, vulnerability analysis, social networks, modeling, and also gene expression, protein interactions, Web graph structure, Internet traffic analysis, and disease spread through contact networks [
1,
2,
3,
4,
5,
6,
7]. For example, Jo et al. [
8] used graph representations for the visualization of online bibliographic datasets, together with an interactive interface for performing visual exploration of the information network, to get an in-depth analysis of, for example, understanding the history and flow of research, its current status, and ongoing trends. Ureña et al. [
9] proposed a completely different use-case for utilizing graphs allowing to model the relationships and the history of reputations such as likes of social media users as a source of trustworthiness. The growth of network theory is driven by its cross-disciplinary significance and it serves as a crucial tool in a systematic method of understanding complex systems. To meet the growing demand for efficient graph analysis tools, several libraries have been proposed that offer a comprehensive and user-friendly approach for defining, representing, and analyzing graph structures [
10,
11,
12,
13].
Creating a strong, efficient, and versatile graph library is a challenging process with numerous design decisions and performance compromises. While existing libraries primarily focus on the representation and analysis of graph structures, we introduce the Graph Transformation Framework (GTF), an open-source Java library that offers the means to separate graph representation and analysis. This article outlines the design of GTF, discussing key considerations and showcasing its key features and supported algorithms. GTF provides a simple Java-based API to define and represent a graph, and also includes a mechanism to transform the graph structure to an arbitrary output format, allowing for subsequent analysis and verifying its consistency. This makes GTF suitable for scenarios where an input network structure needs to be transformed into a desired target structure.
GTF does not replace existing contributions for graph and network processing in Java, but rather complements them [
10,
11,
12,
13]. It can be seamlessly integrated into existing workflows for graph and network analysis, providing users with an additional tool for transforming and verifying graph structures. In summary, this paper makes the following contributions:
An open source framework for the definition, transformation and verification of graph structures. The framework is hosted on GitHub, including a wide variety of examples and possible application domains (
https://github.com/FHOOEAIST/GTF) (accessed on 3 April 2023).
An instantiation of the framework to show how it can be applied in a study of the library dependencies in Android open source applications, for which we provide a replication package available under (
https://zenodo.org/record/6077478) (accessed on 3 April 2023).
2. Related Work
Representing and analyzing network structures has seen an upsurge in recent years. As a result, several software libraries have been proposed to
define,
analyze, and
visualize graph structures. Such libraries are deployed among a wide range of different application and research domains, e.g., malware detection [
2], software performance analysis [
3], and social networking and navigation [
9]. With GTF we contributed a software package in Java to aid developers in academia and industry alike to define, represent, and transform typed graph structures. Our approach shares some design considerations with JGraph-T [
10]. A library offers efficient and generic graph structures, together with a collection of graph-based algorithms. The library is implemented in Java, which, due to its object-oriented nature, allows JGraph-T to model edges and vertices as arbitrary objects. It was first introduced in 2003 and has since been updated and extended regularly. However, GTF has a different focus, as JGraph-T offers a wide range of algorithms towards analyzing a network structure. In contrast to that, in our library GTF, the primary goal lies in graph definition and its subsequent transformation to a desired output format. Besides JGraph-T, the Java ecosystem is very limited when it comes to graph libraries and software packages. Since Michail et al. [
10] first stated these circumstances, not much has changed. The list of graph packages for the Java platform remains limited, with the exception of JGraph-T [
10], Google Guava [
11], JUNG [
12], and JgraphX [
13].
While Google Guava [
11] is mainly a set of core libraries for Java with a focus on additional collection types and utility functionality for concurreny, I/O hashing, and more, it also contains a basic module for graph-structured data. It supports three structural types that are not compatible with each other: simple
graphs with arbitrary values in the nodes,
value-graphs with values in nodes and edges, and
networks picking up the features of a value-graph which are also allowed to have parallel edges. The graph module does not provide any functionality for I/O or visualization.
The Java Universal Network/Graph Framework (JUNG) [
12] offers a unified and customizable framework for modeling, analyzing, and visualizing data that can be depicted as a graph or network. Its architecture allows us to support a high variety for graph-structured data, and also supports multi-modal graphs, multi-edge graphs, and hypergraphs. In addition to structural variety, it also supports many algorithms from the field of graph theory, data mining, and social network analysis, including, for example, processes for clustering, decomposition, and optimization. The most recent version of the JUNG framework 2.1.1 was published in September 2016. Even if there is no official announcement, this indicates that the framework has reached the end of its life.
JgraphX [
13] is a graph library that focuses on the visualization of graphs. It is mainly used for visualization and interaction with node-edge graphs. It provides a wide range of features for visualizing and editing graphs, including layout algorithms, drag-and-drop support, and customizable styles. The framework has been developed for multiple years, but reached its end of life at the end of 2020, and thus active development and maintenance was stopped.
The NetworkX [
14] python package offers capabilities to explore and analyze network structures. Next to the basic data structures for representing networks, NetworkX offers a wide range of graph algorithms. For ease of exchange, NetworkX supports reading and writing various graph formats with existing data.
NetworKit [
15] is a software package written in C++ with python bindings for the comprehensive structural analysis of massive complex networks. NetworKit shares a modular software architecture to aid code reuse and extensibility, and therefore allows to efficiently contribute new functionality to the library. NetworKit is constantly extended with new functionality and algorithms which allow for exploring and characterizing large network data sets [
15].
Two more software packages, implemented in C++ and well-known for their efficiency, are LEDA [
16] and the Boost Graph Library [
17]. Both libraries provide implementations of a generic graph data structure together with a set of basic graph algorithms.
The Stanford Network Analysis Platform (SNAP) [
18] is described as a general purpose, high performance system for analysis and manipulation of large networks. The authors further describe SNAP as capable of handling hundreds of millions of vertices and edges. Furthermore, SNAP provides over 140 graph algorithms for network analysis.
The igraph [
19] software package offers a wide range of tools for network science. The package is characterized by its capabilities to handle large graphs with millions of vertices efficiently. Furthermore, it provides functionality to create, manipulate, and visualize networks.
A lightweight graph processing framework specific for multicore architectures, LIGRA, was presented by Shun and Blelloch [
20]. LIGRA defines a set of operations that efficiently create graph traversal algorithms, which makes it particularly useful for algorithms that operate on sub-graphs.
Gunrock [
21] is a framework for graph-processing which is specifically designed to run on the GPU. The framework offers a high-level programming model to reduce GPU programming knowledge when used.
Visualizing complex networks and graph structures is often a crucial step to gain insights on the distribution of nodes and edges of a graph structure. Throughout the years with GraphViz [
22], Gephi [
23], Cytoscape [
24], and the Graph Visualisation Toolkit [
25], several approaches have been described that allow for visual inspection of complex networks, even for cases where the network is considered to be large (e.g., >20,000 nodes [
23]). Features of these available approaches cover visualization, filtering, manipulating or layouting networks of different structures, size, and complexity.
GraphViz [
22] may be one of the most well-known open source software suites when it comes to visualizing networks or hierarchical data. As described by Ellson et al. [
22], it is often the weapon of choice when preparing graph visualizations that are included in papers. GraphViz offers a wide variety of algorithms and tools to visualize graphs.
Gephi [
23] is an open source software for exploring and manipulating networks and is specialized at visualizing and manipulating large networks. Using a 3D-engine, Gephi allows for rendering large networks in real time. With a wide range of layout algorithms, it allows for exploring dynamic networks.
Graphia [
26] is an open source visual analytics applications written in C++ with a focus on large, connected datasets. Comparable to other solutions, such as Gephi, it is intended for exploring and filtering graphs using an out-of-the-box application.
Finally, with Cytoscape [
24] and the Graph visualization toolkit [
25], there are software suites that allow for visualizing complex network structures independent of the type of data or domain.
In contrast to the existing work, our GTF approach is primarily designed to offer an unobtrusive and intuitive way in network structure representation. Once a particular network structure is defined, GTF offers an easy and extensible API to further process the network to a desired output representation. At the time of writing, GTF itself does not support any form of visualization of network structures. This is due to the fact that we believe that there is already very good software and tools available for this purpose, with the related work in this paper outlining a variety of visualization approaches. Furthermore, we believe that the GTF could be used in combination with the approaches listed above, where a defined graph structure is to be preprocessed and transformed to a desired output format which is supported by either one of the described tools and approaches.
5. Discussion
Within the example scenario, we have shown one sample use case for the proposed Graph Transformation Framework. This example is based on an analysis of around 15,000 nodes. Compared to other examples, such as the one presented by Cauteruccio et al. [
1] with 272,062 nodes or our previous publication on verifications using graph databases [
30] with 17,000,000 nodes, this is a relatively small number. While the framework is based on loose interfaces that allow us to use, for example, any persistence layer, the available implementations currently only support in-memory persistence. This leads to limitations regarding the size of the analyzed graph. However, due to the presented architecture, it is possible to extend the framework by additional implementations allowing alternative persistence strategies such as file or database persistence, which in turn allow for overcoming this limitation.
Additionally, the framework is based on the graph definition , with the edge definition as functions . Sticking to this definition, our framework is only intended for regular graphs, where edges connect two vertices. Similarly, the definition of hypergraphs with edges that can connect more than two vertices or multi-edge graphs with an arbitrary number of edges between the same two nodes are not possible.
One of the key advantages of our framework is its implementation of all graph concepts through strictly defined interfaces, combined with loose composition between individual components. This design enables the easy extension of the framework with new graph concepts and algorithms without the need to modify the original code elements.
This design approach also promotes modularity and flexibility in the framework, as it enables users to easily swap out individual components without disrupting the entire system. Furthermore, the use of well-defined interfaces enhances the clarity and readability of the codebase, making it easier for developers to understand and work with the framework and to interweave it with other frameworks as shown on the example of GraphViz.
By leveraging this design approach, our framework provides a powerful tool for graph analysis and algorithm development that can adapt to evolving needs and requirements. Users can easily incorporate new graph concepts and algorithms as they become available, allowing them to stay at the forefront of research and innovation in the field.