# Optimization-Based Construction of Quadrilateral Table Cartograms

^{*}

Previous Article in Journal

Department of Human-Social Information Sciences, Graduate School of Information Sciences, Tohoku University, Miyagi 980-8577, Japan

Author to whom correspondence should be addressed.

Received: 26 November 2019 / Revised: 20 December 2019 / Accepted: 13 January 2020 / Published: 14 January 2020

A quadrilateral table cartogram is a rectangle-shaped figure that visualizes table-form data; quadrilateral cells in a table cartogram are transformed to express the magnitude of positive weights by their areas, while maintaining the adjacency of cells in the original table. However, the previous construction method is difficult to implement because it consists of multiple operations that do not have a unique solution and require complex settings to obtain the desired outputs. In this article, we propose a new construction for quadrilateral table cartograms by recasting the construction as an optimization problem. The proposed method is formulated as a simple minimization problem to achieve mathematical clarity. It can generate quadrilateral table cartograms with smaller deformation of rows and columns, thereby aiding readers to recognize the correspondence between table cartograms and original tables. In addition, we also propose a means of sorting rows and/or columns prior to the construction of table cartograms to reduce excess shape deformation. Applications of the proposed method confirm its capability to output table cartograms that clearly visualize the characteristics of datasets.

A contiguous (or continuous) area cartogram is a visualization method for geospatial data that records the attributes of positive values of regions, such as population and Gross Domestic Product (GDP). It is a deformed map in which the shape of each region is transformed to fit its area on the cartogram to its size of positive attribute value, while maintaining the adjacency of regions (see [1,2]). The deformed shapes of the regions on cartograms give readers an intuitive impression on the characteristics of datasets.

The special feature of visualization by contiguous area cartograms is the data representation through the deformation of figures. This feature is not only effective for geospatial datasets but also for table-form datasets with positive values. The visualization of a table-form dataset by a contiguous area cartogram-like representation is called a table cartogram. The most common examples of table cartograms are the visualization of characteristics of elements on the periodic chemical table, the table cartograms that exhibit several features of elements created [3,4,5], and a table cartogram published by the European Chemical Society published in 2019, International Year of the Periodic Table of Chemical Elements, to exhibit the quantity of chemical elements on Earth [6].

Table cartograms are categorized into two types. One type is a ‘freely-deformed’ table cartogram whose outline border and edges of cells are deformed into curves [3,4,6]. A method of construction of cartograms of this type is the diffusion-based method [7]—one of several contiguous area cartogram construction methods—which imitates the diffusion process of fluids in physics. It is mathematically clear and easy to operate; thus, it has attracted attention in constructing contiguous area cartograms. Winter [4], for instance, utilized the diffusion-based method. However, when the magnification ratios of adjacent cells—i.e., a cell’s attribute value over its area in the original table—are quite different from each other, then the edge between them is deformed into a curved line. The shapes of cells with large values therefore become almost like circles, and the structure of a table in which each cell has quadrilateral shape is not preserved on table cartograms. Consequently, not all rows and columns can be clearly recognized in ‘freely-deformed’ table cartograms.

The other type is a quadrilateral table cartogram; here, the rectangular outline border is fixed and all cell edges are maintained as straight lines to preserve the quadrilateral shape of the cells. It is an attractive representation of table-form data, being more table-like with each row and column clearly distinguishable than on ‘freely-deformed’ table cartograms; this allows for a simpler interpretation of the characteristics of the represented data on the cartogram. Its construction method was first proposed in [5]. However, it is difficult to apply to new datasets because it consists of multiple operations, some of which do not have unique solutions and require complex settings.

In this article, we develop a new construction method of quadrilateral table cartograms. To create a construction method with high operability, the quadrilateral table cartogram construction is formulated as an optimization problem, based on the idea of [8], which proposed a contiguous area cartogram construction method. Because the large deformation of rows and columns in the resultant cartogram interrupts figure readers from recognizing the correspondence between the table cartogram and the original table, we attempt to maintain vertical and horizontal edges as much as possible. A brief explanation for the proposal has been made in [9], and this paper gives detailed explanation for the correspondence of the proposal to the idea of [8] and the formulation.

In addition, we propose a means of sorting rows and/or columns of original tables to reduce the shape deformation of table cartograms. The shape deformation of table cartograms becomes prominent if the difference in attribute values of adjacent cells are larger, at which point sorting rows and/or columns based on their similarity is expected to mitigate the shape deformation. Some table-form data have meanings associated to the order of rows and columns; the periodic chemical table and time-series data tables are typical examples. However, not all orders of rows and columns in table-form datasets have meanings. By sorting the rows and/or columns based on their similarity of data distribution, it can reduce the shape deformation and make the table cartogram more interpretable. Hence, we formulate the problem of reordering rows and/or columns based on their similarity as a combinatorial optimization problem and solve it using the simulated annealing algorithm.

The proposed table cartogram construction method involves two types of optimization problems and is evaluated through by applying it to actual dataset.

Area cartograms are distorted maps on which the attribute value of each region is represented by its size. They can be classified by two classificatory criteria: the representation of adjacency of regions, which can be for contiguous or non-contiguous area cartograms; and the representation of shapes of regions, which can be for ‘simple-’ or ‘complex-’ shaped area cartograms.

Non-contiguous area cartograms are relatively easy to construct. Circle cartograms [10,11] and rectangular cartograms [12,13,14,15] are examples of simple-shaped non-contiguous area cartograms; they represent each region by a simple shape, a circle or a rectangle, whose area is fit to its attribute value, and place them according to the geographic configuration of regions. Non-contiguous complex-shaped cartograms were proposed in [16]; for these, the regions whose sizes are modified to fit to the attribute values are placed on a plane considering the geographic configuration of regions without overlaps.

Complex-shaped contiguous area cartograms, on which each region is of a similar shape to one on the geographic map, are rather difficult to construct. Up to now, there have been several studies proposing their construction methods [1,2,7,8,17,18,19,20,21,22,23,24,25]. Among these, the most popular construction method is the diffusion-based method [7], whose mathematical guarantees its operability. Compared to the complex-shaped contiguous area cartograms, studies on simple-shaped contiguous area cartograms are few. Methods of representing regions by rectangles were discussed in [26,27,28], and a method of representing regions by rectilinear polygons was discussed in [29]. Because the degree of freedom is smaller in the construction of a simple-shaped contiguous area cartogram than in a complex-shaped one, the construction is comparably more challenging.

A table cartogram, a visualization for table-form data, is a special contiguous area cartogram. It is a deformed table on which the sizes of cells are fit to the given positive values, with its cells having the same adjacency relationship to their corresponding cells in the original table. Table cartograms can be classified into two types based on the shapes of cells on them. One type, a free-shaped table cartogram, describes cells with curved edges, and is considered as a complex-shaped contiguous area cartogram. The other type, a quadrilateral table cartogram, describes cells with straight edges, and is categorized as a simple-shaped contiguous area cartogram.

Because a free-shaped table cartogram can be considered as a complex-shaped contiguous area cartogram, the diffusion-based method [7] is used for its construction [4]. The diffusion-based method imitates the diffusion process of fluids with different initial ‘densities’, i.e., the ratio of attribute values over areas of regions. According to the diffusion principle, high-density fluids diffuse toward locations where low-density fluids exist. The diffusion process stops when the densities are equalized, and the areas where the fluids are diffused indicate the transformed regions on the contiguous area cartogram. When densities differ greatly by regions, the high-density regions are transformed balloon-like shapes [23], and the low-density regions are reduced to long and narrow streamline shapes. When it is applied to the construction of table cartograms, the shape of an entire table is also deformed. If the data values in the table exhibit differences large enough deform the table significantly, it might be difficult to see the correspondence of rows, columns, and cells on the table cartogram to those of the original table.

It is worth to note the difference between data visualization by contiguous area cartograms and data visualization by table cartograms. When visualizing table-form data by table cartograms, it is important to preserve the table-form structure, in which horizontal and vertical edges form cells. When people look at contiguous area cartograms, the shapes of regions—although deformed—help them to recognize the correspondence between regions on cartograms and on geographical maps. However, the shapes of cells on table cartograms are not as useful because all cells have the same shapes. Hence, it is important to preserve the rows and columns visible on table cartograms.

A quadrilateral table cartogram, whose cells are represented by quadrilaterals, is one answer to improving the recognizability of rows and columns on table cartograms. A quadrilateral table cartogram and its construction were first proposed in [5]. The construction method consists of multiple segmentation and transformation operations. Because information on the sizes of quadrilaterals is not sufficient to determine their shapes, the construction of table cartograms, or any area cartograms, is inherently an ill-posed problem with no unique solutions. The method requires many parameters to be adjusted and is thus difficult to implement when solving this ill-posed construction problem.

As has been described in the previous section, the construction of an area cartogram is an ill-posed problem with no unique answers. The importance of preserving the visibility of rows and columns to create easy-to-comprehend table cartograms has also been well noted.

The technique of controlling the shape deformation for the regularization of an ill-posed problem was utilized in [8], as well as the formulation of the construction of contiguous area cartograms as an optimization problem. This basic idea has been employed in the constructions of circle cartograms [11], rectangular cartograms [14,15], rectilinear cartograms [29], and distance cartograms [30] hitherto. This approach is also considered useful for the construction of quadrilateral table cartograms, and we introduce the idea and formulation of contiguous area cartogram construction in this section.

The study introduces the regularization term that penalizes the change in bearing angles of edges to solve an ill-posed problem of contiguous area cartogram construction. The regularization term attempts to preserve the similarity of polygonal shapes as much as possible to enhance the readability of cartograms.

To formulate the problem as a simple equation, it is presumed that each polygon is divided into triangles. Let t_{ijk} denote a triangle consisting of three vertices i, j, and k, e_{mn} denote an edge that connects vertices m and n, T denote the set of triangles, E denote the set of edges, D_{ijk} denote an attribute value given to the triangle t_{ijk}, A_{ijk} denote the area of t_{ijk} with sign, and θ_{mn} and θ^{G}_{mn} denote the bearing angles of edge e_{mn} on the cartogram and on the geographical configuration, respectively. The formulation consists of an objective function that minimizes the differences between squares of areas and squares of data values of triangles and a regularization term that minimizes the changes in bearing angles of edges,
where μ is a weight on the regularization term. The objective function is intended to avoid calculating absolute values when computing the areas of triangles using their vertex coordinates. Equation (1) is obtained using the point coordinates,
where x^{G}_{m} and y^{G}_{m} denote the coordinates of vertex m on the geographical configuration, d_{mn} and d^{G}_{mn} denote the length of the edge e_{mn} on the cartogram and on the geographical configuration, respectively. This study solves the nonlinear minimization of Equation (2) by linearization. Equation (2) is linearized around the approximate position of vertex n, x’_{n} and y’_{n}, respectively, as
where A’_{ijk} denotes the size of triangle by approximate position of vertices i, j, and k, d’_{mn} denotes a length of edges by approximate position of vertices m and n, and x’_{ij} = x’_{i} − x’_{j}. The solution of Equation (2) is obtained by an iterative process that solves Equation (3) and updates the solutions. The weight μ should be adjusted to balance between the control of shape deformation and the convergence speed of the iterative processes. The study proposes an algorithm to adjust the weight μ automatically.

$$\mathrm{min}\left[{\displaystyle \sum _{{t}_{ijk}\in T}{\left(1-{A}_{ijk}^{2}/{D}_{ijk}^{2}\right)}^{2}}+\mu {\displaystyle \sum _{{e}_{mn}\in E}{\left({\theta}_{mn}-{\theta}_{mn}^{G}\right)}^{2}}\right]$$

$$\begin{array}{r}\mathrm{min}[{{\displaystyle \sum _{{t}_{ijk}\in T}\left\{1-\frac{1}{4{D}_{ijk}^{2}}{\left\{\left({x}_{j}-{x}_{i}\right)\left({y}_{k}-{y}_{i}\right)-\left({x}_{k}-{x}_{i}\right)\left({y}_{j}-{y}_{i}\right)\right\}}^{2}\right\}}}^{2}\\ +\mu {\displaystyle \sum _{{e}_{mn}\in E}{\left\{\frac{\left({x}_{n}-{x}_{m}\right)\left({y}_{n}^{G}-{y}_{m}^{G}\right)-\left({y}_{n}-{y}_{m}\right)\left({x}_{n}^{G}-{x}_{m}^{G}\right)}{{d}_{mn}{d}_{mn}^{G}}\right\}}^{2}}]\end{array}$$

$$\begin{array}{r}\mathrm{min}[{{\displaystyle \sum _{{t}_{ijk}\in T}\left\{1+\frac{3{{A}^{\prime}}_{ijk}^{2}}{{D}_{ijk}^{2}}+\frac{{{A}^{\prime}}_{ijk}^{2}}{{D}_{ijk}^{2}}\left({{y}^{\prime}}_{jk}{x}_{i}-{{y}^{\prime}}_{ik}{x}_{j}+{{y}^{\prime}}_{ij}{x}_{k}-{{x}^{\prime}}_{jk}{y}_{i}+{{x}^{\prime}}_{ik}{y}_{j}-{{x}^{\prime}}_{ij}{y}_{k}\right)\right\}}}^{2}\\ +\mu {\displaystyle \sum _{{e}_{mn}\in E}{\left\{\frac{\left({x}_{n}-{x}_{m}\right){{y}^{\prime}}_{mn}-{{x}^{\prime}}_{mn}\left({y}_{n}-{y}_{m}\right)}{{{d}^{\prime}}_{mn}^{2}}\right\}}^{2}}]\end{array}$$

Because quadrilateral table cartograms can be regarded as special contiguous area cartograms, the basic requirements for the construction of contiguous area cartograms are also applicable to them. In this article, we utilize the basic idea of contiguous area cartogram construction introduced in [8].

It must, however, be emphasized that there are differences between the construction of contiguous area cartograms and quadrilateral table cartograms. On contiguous area cartograms, the shape of each polygon is useful in identifying the corresponding polygon on the original geographic maps; however, on quadrilateral table cartograms, the shape of each cell is not useful in identifying the corresponding cell on the original tables because every cell in the original table has the same shape. Consequently, the prevention of deformation of each cell during the quadrilateral table cartogram construction is of less importance.

To enhance the readability of quadrilateral table cartograms, it is important that the rows and columns are identifiable instead. Based on the basic idea in [8], which introduces the regularization term that controls the bearings of edges, this study introduces more strict control on the change in directions of horizontal and vertical edges of table cartograms.

In light of the foregoing discussion, this study sets the following three requirements for the construction of quadrilateral table cartograms.

- To represent data precisely: the size of each cell should be close to its attribute value as much as possible.
- To make each row and column recognizable on quadrilateral table cartograms: the direction of horizontal and vertical edges should be kept as much as possible.
- To create a table-like figures: the outline border should be fixed.

The following section consists of two parts. Section 3.1 introduces a proposal for constructing quadrilateral table cartograms that satisfy the above requirements from a given table-form dataset. Additionally, Section 3.2 introduces a method of sorting rows and/or columns of original tables to construct less deformed table cartograms with high readability.

This section explains a two-step approach of constructing quadrilateral table cartograms. The requirement to preserve the direction of horizontal and vertical edges is emphasized in their construction. The first step creates figures that perfectly satisfy our requirements; it only adjusts heights of rows and widths of columns to fit each cell’s size to its given values. The second step weakens the constraints on the directions of edges to increase the data representation precision; a similar formulation to that in [8] is set, although the regularization term that penalizes the change in bearings of edges is strengthened.

The first step only adjusts the heights of rows and widths of columns to ensure that no changes are made in bearing angles of edges. Let N_{r} and N_{c} denote the number of rows and columns, respectively, and D_{ij} denote the attribute value given to the cell of row i and column j, enclosed by horizontal lines i − 1 and i and vertical lines j − 1 and j. The initial cells on the original table are set to squares of edge length l, and the table is a rectangular with the height of N_{r} × l and the width of N_{c} × l (Figure 1). Because the total size of a table is given by the total sum of attribute values, l is fixed to

$$l=\sqrt{{\displaystyle {\sum}_{i=1}^{{N}_{r}}{\displaystyle {\sum}_{j=1}^{{N}_{c}}{D}_{ij}}}/\left({N}_{r}\times {N}_{c}\right)}.$$

Let h_{i} and w_{j} denote the height of row i and the width of column j, respectively. The formulation that fits the cell size to their attributes can be written as

$$\underset{\begin{array}{l}{h}_{1},\dots ,{h}_{{N}_{r}}\\ {w}_{1},\dots ,{w}_{{N}_{c}}\end{array}}{\mathrm{min}}{\displaystyle \sum _{i=1}^{{N}_{r}}{\displaystyle \sum _{j=1}^{{N}_{c}}{\left({D}_{ij}-{h}_{i}{w}_{j}\right)}^{2}}}\text{}\mathrm{subject}\text{}\mathrm{to}\text{}{h}_{i}0\hspace{0.17em}\forall i,\hspace{0.17em}{w}_{j}0\hspace{0.17em}\forall j,\hspace{0.17em}{\displaystyle \sum _{i=1}^{{N}_{r}}{h}_{i}}={N}_{r}l,\hspace{0.17em}{\displaystyle \sum _{j=1}^{{N}_{c}}{w}_{j}=}{N}_{c}l$$

Now we rewrite Equation (4) using the x-coordinates of the vertical line j, x_{j}, and the y-coordinates of horizontal line i, y_{i}. Set the lower-left corner of table (x_{0}, y_{0}) at the origin (0, 0), then Equation (4) can be rewritten as

$$\begin{array}{l}\underset{\begin{array}{l}{y}_{1},\dots ,{y}_{{N}_{r}-1}\\ {x}_{1},\dots ,{x}_{{N}_{c}-1}\end{array}}{\mathrm{min}}{\displaystyle \sum _{i=1}^{{N}_{r}}{\displaystyle \sum _{j=1}^{{N}_{c}}{\left[{D}_{ij}-\left({y}_{i}-{y}_{i-1}\right)\left({x}_{j}-{x}_{j-1}\right)\right]}^{2}}}\\ \mathrm{subject}\text{}\mathrm{to}\text{}{y}_{i}-{y}_{i-1}0\hspace{0.17em}\forall i,\text{}{x}_{j}-{x}_{j-1}0\hspace{0.17em}\forall j,\text{}{x}_{0}={y}_{0}=0,\text{}{y}_{{N}_{r}}={N}_{r}l,{x}_{{N}_{c}}={N}_{c}l.\end{array}$$

The optimization problem of Equation (5) is solved by an iterative process that solves the linearized problem around the approximate solution of coordinates of vertices and updates the solution. The initial coordinate values are set by allocating the rows’ heights based on the proportion of their total attribute values and the widths of columns based on the proportion of their total attribute values.

The vertical and horizontal lines are kept as straight lines and the shapes of all cells are rectangles on the resultant figure. It is easy to read the correspondence between the table cartogram and the original table; however, because the degrees of freedom are low, the data representation precision cannot be high.

The second step weakens the constraints on edge directions to achieve higher data representation precision.

A new quadrilateral table cartogram construction method is developed based on Equation (3), and two modifications are made to fit the method to the quadrilateral table cartogram construction. Firstly, to fix the outer border of table cartograms, the x-coordinates of vertices on the horizontal outer edges and y-coordinates of vertices on the vertical outer edges are not set as parameters. Secondly, to preserve the directions of cell edges in quadrilateral table cartograms to their original horizontal or vertical directions, this study does not update the bearing angles in the regularization term from the previous solutions during the iteration processes of the minimization of linearized problems, although the bearing angles are updated in [8]. The minimization of linearized problems is written as
where x* and y* denote the x- and y-coordinates of vertices of the output of first step, respectively. The iteration is repeated until the preset data representation precision is satisfied.

$$\begin{array}{r}\underset{x,y}{\mathrm{min}}f=\underset{x,y}{\mathrm{min}}[{{\displaystyle \sum _{{t}_{ijk}\in T}\left\{1+\frac{3{{A}^{\prime}}_{ijk}^{2}}{{D}_{ijk}^{2}}+\frac{{{A}^{\prime}}_{ijk}^{2}}{{D}_{ijk}^{2}}\left({{y}^{\prime}}_{jk}{x}_{i}-{{y}^{\prime}}_{ik}{x}_{j}+{{y}^{\prime}}_{ij}{x}_{k}-{{x}^{\prime}}_{jk}{y}_{i}+{{x}^{\prime}}_{ik}{y}_{j}-{{x}^{\prime}}_{ij}{y}_{k}\right)\right\}}}^{2}\\ +\mu {\displaystyle \sum _{{e}_{mn}\in E}{\left\{\frac{\left({x}_{n}-{x}_{m}\right){y}_{mn}^{*}-{x}_{mn}^{*}\left({y}_{n}-{y}_{m}\right)}{{d}_{mn}^{*2}}\right\}}^{2}}]\end{array}$$

Here we discuss the computational complexity of the solution by Equation (6). An N_{r} by N_{c} table has (N_{r} + 1) × (N_{c} + 1) vertices, and the number of parameters is 2 × (N_{r} N_{c} − 1) because the four corners and the outer border lines are fixed. We solve Equation (6) by formulating the linear equations in which the partial derivatives of objective function f with respect to the all parameters are zero. The solution is represented by the square matrix of 2 × (N_{r} N_{c} − 1) rows and columns.

Focusing on vertex F, the partial derivatives of f with respect to x_{F} and y_{F} are written as

$$\begin{array}{r}\frac{\partial f}{\partial {x}_{F}}=2[{\displaystyle \sum _{{t}_{Fjk}\in T}\frac{{{A}^{\prime}}_{Fjk}^{2}{{y}^{\prime}}_{jk}}{{D}_{Fjk}^{2}}\left\{1+\frac{3{{A}^{\prime}}_{Fjk}^{2}}{{D}_{Fjk}^{2}}+\frac{{{A}^{\prime}}_{Fjk}^{2}}{{D}_{Fjk}^{2}}\left({{y}^{\prime}}_{jk}{x}_{F}-{{y}^{\prime}}_{Fk}{x}_{j}+{{y}^{\prime}}_{Fj}{x}_{k}-{{x}^{\prime}}_{jk}{y}_{F}+{{x}^{\prime}}_{Fk}{y}_{j}-{{x}^{\prime}}_{Fj}{y}_{k}\right)\right\}}\\ -\mu {\displaystyle \sum _{{e}_{Fn}\in E}\frac{{y}_{Fn}^{*}}{{d}_{Fn}^{*2}}\left\{\frac{\left({x}_{n}-{x}_{F}\right){y}_{Fn}^{*}-{x}_{Fn}^{*}\left({y}_{n}-{y}_{F}\right)}{{d}_{Fn}^{*2}}\right\}}]\end{array}$$

$$\begin{array}{r}\frac{\partial f}{\partial {y}_{F}}=2[{\displaystyle \sum _{{t}_{Fjk}\in T}\frac{-{{A}^{\prime}}_{Fjk}^{2}{{x}^{\prime}}_{jk}}{{D}_{Fjk}^{2}}\left\{1+\frac{3{{A}^{\prime}}_{Fjk}^{2}}{{D}_{Fjk}^{2}}+\frac{{{A}^{\prime}}_{Fjk}^{2}}{{D}_{Fjk}^{2}}\left({{y}^{\prime}}_{jk}{x}_{F}-{{y}^{\prime}}_{Fk}{x}_{j}+{{y}^{\prime}}_{Fj}{x}_{k}-{{x}^{\prime}}_{jk}{y}_{F}+{{x}^{\prime}}_{Fk}{y}_{j}-{{x}^{\prime}}_{Fj}{y}_{k}\right)\right\}}\\ +\mu {\displaystyle \sum _{{e}_{Fn}\in E}\frac{{x}_{Fn}^{*}}{{d}_{Fn}^{*2}}\left\{\frac{\left({x}_{n}-{x}_{F}\right){y}_{Fn}^{*}-{x}_{Fn}^{*}\left({y}_{n}-{y}_{F}\right)}{{d}_{Fn}^{*2}}\right\}}]\end{array}$$

The parameters that appear in Equations (7) and (8) are limited to the x- and y-coordinates of vertices that form the same triangles or edges with vertex F. Figure 2a shows the case when the number of parameters is a maximum of 18, and Figure 2b shows the case when the number of parameters is a minimum of 10 if surrounding vertices are not on the outer border of table cartograms. The number of nonzero coefficients is small in the linear system that is composed by Equations (7) and (8), then the system is represented by a sparse matrix.

It is difficult to estimate the computational complexity of the solution as the structure of matrix is not ideal such as symmetric matrices. However, it is clear that the complexity of solution is much smaller than that of ‘ordinary’ linear equations, whose complexity is O(n^{3}) where n is the number of parameters. The usage of sparse solver enables to apply the proposed method to the large table-form data.

If data records with high similarity are placed near, the characteristics of data distribution become clearly visible on the table cartogram with small deformation. If rows and columns are randomly ordered from the viewpoint of similarity of data distribution, such as tables sorted in alphabetical order of names and ascending order of specific rows or columns, the table cartogram would be largely deformed and difficult to be interpreted.

Some tables, such as the periodic table of elements and the time-series data tables, have meanings associated with the order of rows and/or columns. The meanings of order indicate higher similarity of neighborhood cells and contribute to output table cartograms with smaller deformation. Previous studies only discussed constructing table cartograms of given tables without sorting rows and columns.

However, there are many tables that can be sorted to enhance the visibility of characteristics of data. Hence, in this study, we order the rows and/or columns of data based on the similarity of data distribution of cells, following the Kullback and Leibler (K–L) divergence [31].

The K–L divergence ${D}_{KL}\left(P\left|\right|Q\right)={\displaystyle {\sum}_{i}P\left(i\right)\mathrm{ln}\left(P\left(i\right)/Q\left(i\right)\right)}$ is a measure of the similarity between two probability distributions. This study proposes to search the order of rows and/or columns that minimizes the sum of K–L divergence between adjacent rows and/or columns in a table. Because ${D}_{KL}\left(P\left|\right|Q\right)\ne {D}_{KL}\left(Q\left|\right|P\right)$, we take an average of both to set the same evaluation measures for the adverse order.

Let P(i, j) denote the proportion of data value of cell ij to the total data value of N_{r} by N_{c} table. The similarity indices of the given row order (S_{r}) and column order (S_{c}) are as follows.

$${S}_{r}={\displaystyle \sum _{i=1}^{{N}_{r}-1}{\displaystyle \sum _{j=1}^{{N}_{c}}\left(P\left(i,j\right)\mathrm{ln}\left(\frac{P\left(i,j\right)}{P\left(i+1,j\right)}\right)+P\left(i+1,j\right)\mathrm{ln}\left(\frac{P\left(i,j\right)}{P\left(i,j+1\right)}\right)\right)}}$$

$${S}_{c}={\displaystyle \sum _{i=1}^{{N}_{r}}{\displaystyle \sum _{j=1}^{{N}_{c}-1}\left(P\left(i,j\right)\mathrm{ln}\left(\frac{P\left(i,j\right)}{P\left(i,j+1\right)}\right)+P\left(i,j+1\right)\mathrm{ln}\left(\frac{P\left(i,j+1\right)}{P\left(i,j\right)}\right)\right)}}$$

The search of orders of rows or columns that minimize S_{r} or S_{c} is a combinatorial optimization problem. The simulated annealing (SA) algorithm, a probabilistic method to solve combinatorial optimization problems, is used to solve the problem.

This section evaluates the proposed methods—the construction method of quadrilateral table cartogram from the given tables and the row (or column) sorting method—by applying them to the visualization of table-form data. The proposed methods were implemented in MATLAB 2019 by the MathWorks, Inc., MA, USA and tested on a computer with 2.93-GHz CPU.

The effect of initial values of weight μ was tested by a simple $3\times 3$ table. Figure 3a,b show, respectively, the original table for input and the table cartogram output by Step 1 with vertical and horizontal lines. Figure 3c–f show the resultant quadrilateral table cartograms by different initial μ values.

Table 1 indicates the root mean square error (RMSE) between values and sizes of cells as the indices of data representation precision, the maximal and minimal inner angles, the RMSE of bearing angles of edges between table cartograms and original tables, and the RMSE between right angles and inner angles as indices of deformations. When the deformation is small, inner angles approach right angles, and the bearings angles of edges approach horizontal or vertical directions.

It is confirmed that all the resultant figures represent the data at high precision; the RMSEs are smaller than 0.01% of the smallest data values. The shapes of resultant figures are different, but they preserve the table-like shapes. The larger initial weight μ outputs the figure with smallest deformation according to all the deformation indices in Table 1.

The proposed method was modified to reduce the bearing changes of horizontal and vertical edges, compared to the method in [8]. To check the validity of the modification, we compare the output of proposed method with that of the previous method that is modified to be able to fix the outline border.

This evaluation utilizes a $7\times 7$ table that records the GDP of the main affected countries and regions by the Asian financial crisis in 1997. The data was obtained from a publication by the World Bank. Figure 4 indicates the outputs of two cartogram construction methods, and Table 2 indicates the indices of data representation precision and shape deformation.

Although Table 2 indicates that the both figures represent values by areas of cells at high precision, it is apparent from Figure 4 that the proposed method output a lesser deformed table than that obtained using the Inoue-Shimizu method. The output obtained by the proposed method almost preserves the horizontal and vertical lines, and it is easy to recognize the characteristics of data distribution; the ‘1998’ column shrinks drastically and the degree of decline is larger for Indonesia than for other countries and regions. It is difficult to read the characteristics from Figure 4b owing to the unnecessary deformation of vertical and horizontal lines, which hinders the understandings of figure readers.

Through the comparison, it is confirmed that the two-step procedure and the enforcement of controls on the bearing angle changes in edges by the proposed method is effective in enhancing the readability of quadrilateral table cartograms.

The proposed method was applied to the visualization of properties of chemical elements, which is the most common applications of table cartograms. The quadrilateral table cartograms of properties of elements are presented in [5]. Figure 5 indicates the resultant table cartograms that visualize the relative thermal conductivity and the relative boiling points, and Table 3 shows the indices of the precision, deformation, and calculation times.

The resultant figures are alike those obtained in [5], although the data representation precision is not discussed in that study. The RMSEs of Figure 5a,b is 0.031% and 0.017% of the minimum data values, respectively; it is confirmed that the proposed method can output table cartograms that represent data at high precision. The feasibility of proposed method is higher than that of the previous method as initial weight μ is the only parameter to be decided when constructing table cartograms. The different initial values of μ output the different shapes of table cartograms; however, because of the fast calculation of the proposed method, users can try several initial values of μ and select a figure based on the data representation precision, the indices of deformation, and the shape.

We evaluate the effect of sorting of rows and/or columns using the data of crude death rate per thousand people published by the World Bank. Figure 6 shows the resultant table cartograms. Figure 6a is constructed from the table whose rows are sorted by the proposed sorting method, and Figure 6b is constructed from the table whose rows are sorted by the alphabetical order of country names. Table 4 indicates the indices of data representation precision and shape deformation.

Figure 6 clearly indicates that the table deformation becomes smaller when the table is sorted by the similarities of data distribution. Because countries with similar socio-economic condition are placed near, the sorting of rows (or columns) cannot only improve the appearance of table cartograms but also make the output more interpretable. The data representation precision and shape deformation in Table 4 show that the table cartogram from the sorted dataset and that from the original table have the same level of precision, while the shape deformation of the sorted table cartogram is smaller than the table cartogram with the original order of rows. We therefore confirm that the sorting of rows and/or columns reduces deformation, thereby enabling figure readers to focus on the change in shapes in table cartograms.

A table cartogram is one of visualization options, such as coloring of cells and drawing of bar charts in cells. An advantage of the visualization by table cartograms is that they can indicate the differences of data sizes. Another advantage is that they can visualize two types of information simultaneously by classifying cells of table cartograms by colors. When visualizing time-series data, it would be effective to show the values by cell sizes and their rates of change by colors.

A quadrilateral table cartogram is useful to visualize table-form data that have clear characteristics of rows/columns. Figure 5a is the example; the data distribution patterns of columns, rows, and cells are clearly visualized by the linear edges. The cell sizes fit the data almost exactly, even though the large difference exists in data size such that the thermal conductivity of silver (Ag) is 55 times as large as that of manganese (Mn). Even if the original table-form data do not indicate clear data distribution patterns, the quadrilateral table cartogram construction with sorting of rows/columns might be able to clarify and visualize the patterns, as is shown in Figure 6a.

However, if there is no clear pattern in the data distribution of the sorted table, a table cartogram is not an option for the visualization. When the distribution of data varies in rows or columns, it is expected to be difficult to fit cell sizes to their given data values at high precision. For example, an origin–destination matrix usually has large values in diagonal cells, and the similarity of data distribution between rows or columns is rather small. Consequently, it would be difficult to visualize the table by a quadrilateral table cartogram with fixed borders.

As we discussed in Section 3.1.2, the proposed construction method is applicable to the large table-form data, although the sorting would take time to get the order with smaller similarity indices. However, the limitation of applying table cartograms to large table-form data lies in the effectiveness as a visualization tool. When the number of rows and columns are large, it would be difficult to find the data of interest.

The above discussion reveals the limitations in the usages of quadrilateral table cartograms.

In this article, we proposed a method to construct quadrilateral table cartograms from the given table-form data, along with a method to sort rows and columns of tables, to output table cartograms with small deformations.

The proposed method for the construction consists of two steps: the first step adjusts the heights of rows and widths of columns of tables by preserving the direction of horizontal and vertical edges; the second step transforms the shapes of cells to fit their sizes to their given data values, while controlling the degrees of deformation. Both steps were formulated as a non-linear minimization problem, and they are solved by iterative processes of linearization around the approximate solutions.

This study also proposed a method to sort rows and columns of tables according to the similarity of data distribution. Utilizing the K–L divergence as an index of similarity of rows and columns, the sorting problem was formulated as a combinatorial optimization problem and was solved using the simulated annealing algorithm.

Applying this method to the data confirmed that the proposed construction method can generate highly accurate quadrilateral table cartograms with smaller deformations of shape compared to previous construction methods, and the sorting of rows and/or columns enables the output of table cartograms with smaller deformations while improving the visibility of the characteristics of data distribution.

The proposed methods are applicable for the visualization of tables with positive values. However, as we discussed in Section 4.3, there are limitations where the visualization by quadrilateral table cartograms are not applicable. We evaluated the proposed method based on data representation precision and shape deformation; however, we did not analyze the interpretability of data. It is necessary to check the usefulness of visualization by quadrilateral table cartograms through the comparison with other data visualization methods by user studies. These discussions are left for future studies.

Conceptualization, R.I.; Methodology, R.I.; Software, M.L.; Validation, M.L. and R.I.; Formal Analysis, M.L. and R.I.; Investigation, M.L.; Resources, R.I.; Data Curation, M.L.; Writing—Original Draft Preparation, M.L.; Writing—Review and Editing, R.I.; Visualization, M.L.; Supervision, R.I.; Project Administration, R.I.; Funding Acquisition, R.I. All authors have read and agreed to the published version of the manuscript.

This study was funded by Japan Society for the Promotion of Science KAKENHI, grant number JP18H01552.

The authors declare no conflict of interest.

- Tobler, W.R. Thirty-five years of computer cartograms. Ann. Am. Assoc. Geogr.
**2004**, 94, 58–73. [Google Scholar] [CrossRef] - Nusrat, S.; Kobourov, S. The state of the art in cartograms. Comput. Graph. Forum
**2016**, 35, 619–642. [Google Scholar] [CrossRef] - Sheehan, W.F. Periodic table of elements with emphasis. Chemistry
**1976**, 49, 17–18. [Google Scholar] - Winter, M.J. Diffusion cartogram for the display of periodic table data. J. Chem. Educ.
**2011**, 88, 1507–1510. [Google Scholar] [CrossRef] - Evans, W.; Felsner, S.; Kaufmann, M.; Kobourov, S.G.; Mondal, D.; Nishat, R.I.; Verbeek, K. Table cartogram. Comput. Geom.
**2018**, 68, 174–185. [Google Scholar] [CrossRef] - Element Scarcity—EuChemS Periodic Table. Available online: https://www.euchems.eu/euchems-periodic-table/ (accessed on 30 October 2019).
- Gastner, M.T.; Newman, M.E.J. Diffusion-based method for producing density-equalizing maps. Proc. Natl. Acad. Sci. USA
**2004**, 101, 7499–7504. [Google Scholar] [CrossRef] - Inoue, R.; Shimizu, E. A new algorithm for continuous area cartogram construction with triangulation of regions and restriction on bearing changes of edges. Cartogr. Geogr. Inf. Sci.
**2006**, 33, 115–125. [Google Scholar] [CrossRef] - Li, M.; Inoue, R. Table cartogram generation as an optimization problem. Abstr. Int. Cartogr. Assoc.
**2019**, 1, 1–2. [Google Scholar] [CrossRef] - Dorling, D. Area Cartograms: Their Use and Creation, 1st ed.; Department of Geography, University of Bristol: Bristol, UK, 1996. [Google Scholar]
- Inoue, R. A new construction method for circle cartograms. Cartogr. Geogr. Inf. Sci.
**2011**, 38, 146–152. [Google Scholar] [CrossRef] - Rasiz, E. The rectangular statistical cartogram. Geogr. Rev.
**1934**, 24, 292–296. [Google Scholar] [CrossRef] - Upton, G.J.G. Rectangular cartograms, spatial autocorrelation, and interpolation. J. Reg. Sci. Assoc. Int.
**1991**, 70, 287–302. [Google Scholar] [CrossRef] - Inoue, R. Generalized approach to construction of simple-shaped non-contiguous area cartograms. In Proceedings of the 25th International Cartographic Conference, Paris, France, 3–8 July 2011; p. CO-272. [Google Scholar]
- Inoue, R. An approach to the construction of simple-shaped non-contiguous area cartograms. Theory Appl. GIS
**2012**, 20, 11–22. (In Japanese) [Google Scholar] [CrossRef] - Olson, J.M. Noncontiguous area cartogram. Prof. Geogr.
**1976**, 28, 371–380. [Google Scholar] [CrossRef] - Tobler, W.R. A continuous transformation useful for districting. Ann. N. Y. Acad. Sci.
**1973**, 219, 215–220. [Google Scholar] [CrossRef] [PubMed] - Tobler, W.R. Pseudo-cartograms. Am. Cartogr.
**1986**, 13, 43–50. [Google Scholar] [CrossRef] - Dougenik, J.A.; Chrisman, N.R.; Niemeyer, D.R. An algorithm to construct continuous area cartograms. Prof. Geogr.
**1985**, 37, 75–81. [Google Scholar] [CrossRef] - Gusein-Zade, S.M.; Tikunov, V.S. A new technique for constructing continuous cartograms. Cartogr. Geogra. Inf. Syst.
**1993**, 20, 167–173. [Google Scholar] [CrossRef] - House, D.H.; Kocmoud, C.J. Continuous cartogram construction. Proc. IEEE Symp. Inf. Vis.
**1998**, 197–204. [Google Scholar] [CrossRef] - Keim, D.A.; North, S.C.; Panse, C. CartoDraw: A fast algorithm for generating contiguous cartograms. IEEE Trans. Vis. Comput. Graph.
**2004**, 10, 95–110. [Google Scholar] [CrossRef] - Sun, S. A fast, free-form rubber-sheet algorithm for contiguous area cartograms. Int. J. Geogr. Inf. Sci.
**2013**, 27, 567–593. [Google Scholar] [CrossRef] - Sun, S. An optimized rubber-sheet algorithm for continuous area cartograms. Prof. Geogr
**2013**, 65, 16–30. [Google Scholar] [CrossRef] - Gastner, M.T.; Seguy, V.; More, P. Fast flow-based algorithm for creating density-equalizing map projections. Proc. Natl. Acad. Sci. USA
**2018**, 115, E2156–E2164. [Google Scholar] [CrossRef] [PubMed] - Heilmann, R.; Keim, D.A.; Panse, C.; Sips, M. RecMap: Rectangular map approximations. Proc. IEEE Inf. Vis.
**2004**, 33–40. [Google Scholar] [CrossRef] - Speckmann, B.; van Kreveld, M.; Florisson, S. A linear programming approach to rectangular cartograms. In Progress in Spatial Data Handling; Riedl, A., Kainz, W., Elmes, G., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 529–546. [Google Scholar]
- van Kreveld, M.; Speckmann, B. On rectangular cartograms. Comput. Geom.
**2007**, 37, 175–187. [Google Scholar] [CrossRef] - Inoue, R.; Kitaura, K.; Shimizu, E. New solution for construction of rectilinear area cartogram. In Proceedings of the 24th International Cartographic Conference, Santiago, Chile, 15–21 November 2009; p. 20_3:1-10. [Google Scholar]
- Shimizu, E.; Inoue, R. A new algorithm for distance cartogram construction. Int. J. Geogr. Inf. Sci.
**2009**, 23, 1453–1470. [Google Scholar] [CrossRef] - Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat.
**1951**, 22, 79–86. [Google Scholar] [CrossRef]

μ | RMSE between Values and Sizes of Cells | Maximal Inner Angle (Degree) | Minimal Inner Angle (Degree) | RMSE of Bearing Angles of Edges between on Table Cartogram and original Table (Degree) | RMSE between Inner Angles and Right Angles (Degree) |
---|---|---|---|---|---|

0.005 | 5.05e-04 | 147.45 | 40.92 | 12.69 | 22.74 |

0.01 | 4.02e-04 | 141.58 | 41.74 | 11.71 | 21.44 |

0.05 | 3.32e-04 | 141.99 | 51.95 | 11.01 | 20.78 |

0.1 | 3.78e-04 | 133.68 | 50.54 | 10.81 | 20.29 |

Methods | RMSE between GDP and Sizes of Cells (10 billion Current USD) | Maximal Inner Angle (Degree) | Minimal Inner Angle (Degree) | RMSE of Bearing Angles of Edges between on Table Cartogram and Original Table (Degree) | RMSE between Inner Angles and Right Angles (Degree) |
---|---|---|---|---|---|

Proposed method | 2.93e-05 | 113.10 | 68.65 | 3.35 | 5.74 |

Inoue and Shimizu (2006) | 3.72e-05 | 130.07 | 50.90 | 10.31 | 15.15 |

Data | RMSE between Values and Sizes of Cells | Maximal Inner Angle (Degree) | Minimal Inner Angle (Degree) | RMSE of Bearing Angles of Edges between on Table Cartogram and Original Table (Degree) | RMSE between Inner Angles and Right Angles (Degree) | Calculation Time (Sec) |
---|---|---|---|---|---|---|

Thermal conductivity | 2.43e-05 (W·cm ^{−1}·K^{−1}) | 175.49 | 11.04 | 15.92 | 25.51 | 1.84 |

Boiling point | 0.0618 (°C) | 117.51 | 66.40 | 5.32 | 9.63 | 3.79 |

Dataset | RMSE between Death Rates and Sizes of Cells (%) | Maximal Inner Angle (Degree) | Minimal Inner Angle (Degree) | RMSE of Bearing Angles of Edges between on Table Cartogram and Original Table (Degree) | RMSE between Inner Angles and Right Angles (Degree) |
---|---|---|---|---|---|

Sorted by proposed method | 1.44e-04 | 129.05 | 57.15 | 6.12 | 11.32 |

Alphabetical order of country names | 1.35e-04 | 140.45 | 50.88 | 5.37 | 8.49 |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).