Article

Fast Semi-Supervised t-SNE for Transfer Function Enhancement in Direct Volume Rendering-Based Medical Image Visualization

by Walter Serna-Serna 1,*, Andrés Marino Álvarez-Meza 2 and Álvaro Orozco-Gutiérrez 1

1 Automatics Research Group, Universidad Tecnológica de Pereira, Pereira 660003, Colombia
2 Signal Processing and Recognition Group, Universidad Nacional de Colombia, Manizales 170003, Colombia
* Author to whom correspondence should be addressed.
Submission received: 16 May 2024 / Revised: 14 June 2024 / Accepted: 14 June 2024 / Published: 17 June 2024
(This article belongs to the Special Issue Deep Learning Methods for Biomedical and Medical Images)

Abstract: Magnetic resonance imaging and computed tomography produce three-dimensional volumetric medical images. While a scalar value represents each individual volume element, or voxel, volumetric data are characterized by features derived from groups of neighboring voxels and their inherent relationships, which may vary depending on the specific clinical application. Labeled samples are also required in most applications, which can be problematic for large datasets such as medical images. We propose a direct volume rendering (DVR) framework based on multi-scale dimensionality reduction neighbor embedding that generates two-dimensional transfer function (TF) domains. In this way, we present FSS.t-SNE, a fast semi-supervised version of the t-distributed stochastic neighbor embedding (t-SNE) method that works over hundreds of thousands of voxels without the problem of crowding and with better separation in a 2D histogram compared to traditional TF domains. Our FSS.t-SNE scatters voxels of the same sub-volume in a wider region through multi-scale neighbor embedding, better preserving both local and global data structures and allowing for its internal exploration based on the original features of the multi-dimensional space, taking advantage of the partially provided labels. Furthermore, FSS.t-SNE untangles sample paths among sub-volumes, allowing us to explore edges and transitions. In addition, our approach employs a Barnes–Hut approximation to reduce computational complexity from $O(N^2)$ (t-SNE) to $O(N \log N)$. Although we require the additional step of generating the 2D TF domain from multiple features, our experiments show promising performance in volume segmentation and visual inspection.

1. Introduction

Medical images such as magnetic resonance imaging (MRI) and computed tomography (CT) generate volumetric datasets composed of three-dimensional samples. Even though each volume element, or voxel, is represented by a scalar value, volumetric data are described by features computed from neighborhoods of voxels, which may be specific to each kind of clinical application [1]. For example, some kinds of tumors are identified in a medical image as regions of particular intensities, sizes, and shapes that must be recognized across several slices [2]. Likewise, other pathologies, such as cortical dysplasias, are diagnosed from the thickness of the cortex, gray-white matter blurring, the depth and curvature of the cerebral sulci, and other abnormalities that the human eye cannot detect directly on the medical image [3].
The first alternative for visualizing 3D structures, such as tissues and organs, in volumetric data is based on isosurface extraction using segmentation methods, known today as indirect volume rendering (iDVR). The boundaries among segmented regions in the volume are represented by geometric primitives (such as vertices and edges that form triangles) that compose the surface [4]. In many applications, however, complex structures and their surroundings cannot be distinguished by a single isovalue. In these cases, we need to visualize a complete sub-volume, including the neighboring anatomical structures, instead of a single surface [5,6]. This second alternative is known as direct volume rendering (DVR), which replaces the geometric primitives of iDVR with the transfer function (TF) concept. A TF in DVR aims to find a visual solution through three stages: (i) multi-dimensional feature extraction that produces the TF domain; (ii) 2D feature space exploration to discover patterns that cluster voxels of a sub-volume of interest and its surroundings; and (iii) the application of an optical model that assigns optical properties to each voxel to be displayed on a screen [6,7]. Still, making sense of 3D structures in these feature spaces requires transforming the data, selecting regions of interest (ROIs), and adjusting the visualization widgets to better understand the data's structure or some of its internal properties [8]. Furthermore, labels for all samples, e.g., voxels, are needed for better visualization. Regrettably, obtaining labeled data for real-world medical imaging applications is challenging, and the available quantity is limited.
Recently, deep learning approaches have been proposed to automate feature extraction and pattern recognition, outperforming traditional alternatives [9]. Yet, current solutions require label information stored in massive databases, which is not always possible in some clinical applications. Furthermore, evaluating the TF at each volume element can cause visual artifacts and noise in the final rendering [10]. For this reason, the most commonly implemented way to assign optical properties is by hand. Additionally, while modern rendering techniques can render volumetric data interactively, we still need a suitable feature space that facilitates the natural differentiation of volumetric structures and an intuitive and interactive way of designing visualizations, especially for those cases of unknown diagnosis where medical images are explored for the first time [5]. Hence, researchers have explored dimensionality reduction (DR) methods to preserve data structures in two-dimensional feature spaces. A low-dimensional (LD) space not only allows for a manual analysis to generate the TF-based volume rendering, but it also avoids the technical issues of high-dimensional (HD) spaces, such as the curse of dimensionality [11]. Classical DR methods such as principal component analysis (PCA), locally linear embedding (LLE), and self-organizing maps (SOMs) have been used in DVR [12,13], although similarity preservation approaches such as t-distributed stochastic neighbor embedding (t-SNE) are well known to perform better in DR and data visualization. Regarding medical images, a DR method must be able to deal with millions of samples, which is actually a drawback of SNE-based algorithms [14].
Although t-SNE excels at data structure preservation, it cannot process more than a few tens of thousands of samples because it requires pairwise similarities. Moreover, conventional t-SNE is not suited to incorporating labeled data, which can benefit both local and global structure preservation [15]. Some DR methods also use the samples' class membership, given as labels for each sample, when embedding them in an SNE-based framework [16]. For instance, cat-SNE uses class labels to set the widths of the Gaussian neighborhoods around each datum instead of deriving those widths from the user's perplexity setting [17]. Still, labels for all data samples are required. Unfortunately, in real-world applications, e.g., DVR, labeled data are scarce and hard to obtain, motivating semi-supervised DR [18,19]. This field aims to perform DR on partially labeled datasets. Yet, most semi-supervised DR methods do not perform neighborhood preservation assessments to verify their capability in data visualization, because classification tasks have been prioritized [20,21].
Regarding DVR, optical models calculate the intensity of a ray incident on the camera sensor after it travels from the source. This calculation assumes the ray follows a straight path between the source and the sensor. The level of detail of the optical model depends on the volume element's properties, including how it absorbs and emits light, as well as its scattering, shadowing, phase function, and index of refraction. The absorption-plus-emission approach is the most common, in which each volume element acts as both a light source and an occluder of light from other sources [22]. More advanced models incorporate scattering and shadowing effects, but this increases computational complexity and processing time.
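For illustration, a minimal Python sketch of the absorption-plus-emission model as front-to-back compositing along one viewing ray is shown below; the per-ray array of (r, g, b, α) samples and the early-termination threshold are assumptions of the example, not part of any specific renderer.

```python
import numpy as np

def composite_ray(samples_rgba):
    """Front-to-back compositing under the absorption-plus-emission model.
    `samples_rgba` is a (K, 4) array of (r, g, b, alpha) values sampled
    along one viewing ray, ordered from the camera into the volume."""
    color = np.zeros(3)  # accumulated emitted color
    alpha = 0.0          # accumulated opacity (occlusion)
    for r, g, b, a in samples_rgba:
        # Each sample emits light attenuated by the opacity accumulated
        # in front of it (the absorption of the samples already passed).
        color += (1.0 - alpha) * a * np.array([r, g, b])
        alpha += (1.0 - alpha) * a
        if alpha >= 0.999:  # early ray termination: the ray is fully occluded
            break
    return color, alpha
```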
In this paper, we introduce an accelerated and semi-supervised version of the t-SNE algorithm, which allows us to compute the embedding space for hundreds of thousands of voxels given semi-supervised data devoted to DVR from medical images. Our fast semi-supervised t-SNE, termed FSS.t-SNE, handles both the available class labels and the input feature similarities, properly fixing the widths of the Gaussian neighborhoods to reveal the salient local and global data structures in a 2D space [23]. Moreover, FSS.t-SNE uses a Barnes–Hut approach to accelerate multi-scale neighbor embedding schemes, as proposed in [14]. The heart of the method is the construction of a sparse version of the HD similarity matrix and the use of quadtrees (for a two-dimensional embedding) to compute the LD similarities. Thus, our approach enhances t-SNE by reducing the computational complexity to $O(N \log N)$. Also, FSS.t-SNE uses multi-scale neighbor embedding, better preserving both local and global data structures. We tested our approach to produce 2D TF domains from multiple features, and we evaluated the impact of this domain for volumetric data exploration based on volume rendering. FSS.t-SNE disentangles the trajectories of samples within sub-volumes, enabling us to investigate the boundaries and the shifts between them. Despite the need to generate the 2D TF domain from several features, all of our studies consistently demonstrated improved performance in medical image volume segmentation and comparative visual examination. We conducted the studies without any pre- or post-processing methods to specifically evaluate the influence of the TF domains.
This paper is structured as follows: Section 2 summarizes the related work. Section 3 describes the methods. Section 4 and Section 5 present the experiments and discuss the results. Finally, Section 6 outlines the conclusions and future work.

2. Related Work

The TF-based direct volume rendering method consists of three main steps: building the feature space, estimating the transfer function, and mapping the voxels to their optical properties. The first stage involves creating the feature space using the enhanced information of interest. For high-contrast volumes, the original voxel intensities are enough for 1D TFs [24]. Extracting multiple features from volume data enhances the separability of particular structures; the user sketches the TFs on the feature space, i.e., modifies the optical properties of the volume elements, while a three-dimensional (3D) reconstruction is rendered online. Nonetheless, as DVR is not a completely automated methodology, TF building is not feasible in HD domains. It is also possible to find the edges of elements in a two-dimensional (gradient magnitude, intensity) or three-dimensional (gradient magnitude, intensity, and second derivative) feature space from the original volume [25]. These 2D TFs allow us to explore the transition between two contiguous objects. Alternatively, we can employ the low–high (LH) histogram for boundary segmentation by characterizing each voxel tagged as a boundary with the low and high intensities of the materials that produce those edges [26]. Yet, the above techniques are not effective for noisy data.
In this case, statistical approaches present better outcomes. The seminal work in [27] encodes the size of each voxel neighborhood by fitting a Gaussian distribution; then, the mean and the standard deviation of the voxel neighbors are computed. However, no single feature space works for all kinds of applications. Accordingly, many features have been suggested to improve data segmentation [7]. These include size-based features [28], curvature-based features [29], frequency-based features [30], spatial representation-based TFs [31], and texture-based features [32]. Of note, the combination of these representations establishes the basis for multidimensional TF domains. Once the feature space is constructed, a lookup table can be determined to take advantage of the new sample distribution for clustering the voxels and easing the segmentation of objects in the new space. The function that assigns the optical properties is a curve for the 1D domain and a region of interest (ROI) for the 2D and multidimensional domains. Voxels within the ROI form the partial volume for visual inspection based on the assigned optical properties.
In recent years, researchers have developed both manual and automatic techniques for assigning optical properties. Manual methods introduce various widgets and tools for investigating the data distribution on 1D and 2D histograms [5]. The user configures the parameters and limits of the ROI through visual inspection of the rendered volume. The inconvenience is the non-intuitive relation between the volumetric structures and the data mapped in the TF domain, producing trial-and-error tuning. Regarding automatic methods, both supervised and unsupervised techniques have been applied to the feature spaces. In [33], the authors used isodata clustering in a representation space composed of intensity, gradient magnitude, and second directional derivative. A two-to-three-step clustering method is described in [34]: first, mean shift clustering is used on the LH-histogram; next, a mean shift in the spatial domain can optionally be applied to the elements of each group identified in the previous step; afterward, hierarchical clustering yields the final set of voxels. In [35], a Gaussian mixture model was used over the 2D domain characterized by intensity and gradient magnitude. In a different study, the performance of five machine learning techniques was tested in high-dimensional feature spaces [36]: Gaussian Naive Bayes, k-nearest neighbors, support vector machines, artificial neural networks, and random forests. Still, a tuning step is required for the TF parameters to improve the details in the rendering.
Whether the exploration is manual or automatic, each approach operates within a specific feature space. As discussed above, HD feature spaces require DR methods to improve the performance of automatic techniques and to ease visual inspection for the user. The authors in [12] use self-organizing maps (SOMs) to build a 2D embedding space based on the data structure; however, SOMs suffer from the crowding problem. In [37], the authors interpret each feature as an image channel and use scaling by majorizing a complicated function (SMACOF) to generate a scalar image to be rendered directly. In these approaches, the user does not explore the feature space directly because the LD features are displayed as pixel intensities. Conversely, the work in [38] develops an automatic TF design over the intensity plus gradient magnitude domain as an initial exploration to quickly and hierarchically highlight the internal volume structures using a graph-based procedure.
More recently, automatic approaches based on convolutional neural networks (CNNs) and other deep learning frameworks have been proposed. In [5], the authors present a deep-learning-assisted approach that automatically derives high-level features based on large data context features extracted with a CNN, followed by a DR stage based on reordering and hierarchical exploration. Furthermore, they highlight important ideas for automatic volume rendering. First, as the complexity of the user's desired criteria grows, finding features that precisely describe the characteristics of the target structures becomes increasingly challenging. Second, characterizing complex structures requires local and global data contexts; however, high structural complexity significantly hinders the manual search for suitable feature spaces, highlighting that users can still provide valuable domain knowledge in a different way. Further, the work in [39] used generative adversarial networks (GANs) as a generative model to synthesize renderings of a given volume; the input data are a collection of rendered images from a single volume linked to the respective viewpoints and TFs. The authors of [40] added global information to a 3D CNN to solve the supervised learning problem of volumetric ambient occlusion (color mapping and opacity mapping) using randomly generated 1D TFs and a ground truth. A semi-supervised learning method is suggested in [41] to add probabilistic segmentation to the cryo-electron tomography volume visualization pipeline; this technique relies on user-provided label information to train the deep learning algorithm, and the user is then presented with a TF domain for final image rendering adjustments.
In a nutshell, current approaches assume that an appropriate feature space is predefined rather than automatically learned. This underscores the need for exploring new paradigms that develop TF-based feature spaces, which preserve both local and global data structures and offer flexibility in semi-supervised contexts. This flexibility is crucial, given the high costs associated with manually segmenting images for traditional supervised tasks.

3. Methods

Let $V = \{v_i \in \mathbb{R}\}_{i=1}^{N}$ be a vectorized set of voxels of a three-dimensional volume and $C = \{c_i \in \{-1, 1, 2, \ldots, C\}\}_{i=1}^{N}$ be the corresponding set of labels for the sub-volume to which a given voxel $v_i$ belongs. The label $c_i = -1$ means that the voxel is unlabeled or does not belong to any sub-volume of interest. A DVR-based TF's domain is then built from a feature estimation stage, which can be expressed as a function $f: \mathbb{R} \to \mathbb{R}^M$. Here, $M$ is the number of features linked to volumetric properties and $f(v_i \,|\, V, C, \phi) = \xi_i \in \mathbb{R}^M$, where $\phi$ is the set of parameters of each feature estimation function.
Here, we propose a fast DR method to obtain a 2D representation $x_i \in \mathbb{R}^2$ for each voxel $v_i$, allowing the user to explore the entire volume and its $M$ features by hand. Further, optical properties are applied to the voxels inside an ROI $R$ selected by hand exploration, such that

$$(r_i, g_i, b_i, \alpha_i) = \begin{cases} G(x_i), & x_i \in R \\ \mathbf{0}, & x_i \notin R, \end{cases} \qquad (1)$$

where $(r_i, g_i, b_i, \alpha_i)$ are the color and opacity channels of $v_i$. Following the feature estimation stage, our fast DR approach and the rendering strategy are explained in detail below.
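A minimal sketch of Equation (1) is given below; the elliptical ROI parameterization (`center`, `radii`) and the `lookup` color map are assumptions of the example that stand in for the ROI widgets described later.

```python
import numpy as np

def assign_optical_properties(X2d, center, radii, lookup):
    """Equation (1) sketch: `X2d` is the (N, 2) embedding, the ROI R is a
    hypothetical axis-aligned ellipse with `center` and `radii`, and
    `lookup` maps 2D coordinates to (r, g, b, alpha) rows. Voxels whose
    embedding falls outside R stay fully transparent (all zeros)."""
    rgba = np.zeros((X2d.shape[0], 4))
    # Elliptical ROI membership: ((x-cx)/rx)^2 + ((y-cy)/ry)^2 <= 1.
    inside = (((X2d - center) / radii) ** 2).sum(axis=1) <= 1.0
    rgba[inside] = lookup(X2d[inside])  # G(x_i) for x_i in R
    return rgba
```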

3.1. Transfer Function-Based Multi-Dimensional Feature Space

In this work, from a given image, we compute a multi-dimensional feature space drawn from traditional TF techniques for volumetric rendering, for a total of 66 features per voxel ($M = 66$). The first three are the intensity, the gradient, and the Laplacian, as derivative-based features [25]. The next two are the mean and standard deviation of spherical regions that fit a Gaussian distribution centered at each voxel, as proposed by Haidacher et al. [42], for a maximum radius of six samples. The remaining features are extracted using the interactive learning and segmentation toolkit (ILASTIK) software, version 1.4.0, which allows us to compute the Gaussian smoothing, the Laplacian of Gaussian, the Gaussian gradient magnitude, the difference of Gaussians, and texture-based features such as the structure tensor and the Hessian of Gaussian eigenvalues.
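To make this stage concrete, the sketch below assembles a subset of these per-voxel features with scipy.ndimage, which here stands in for the actual ILASTIK computations; the function name and the subset of features included are assumptions of the example (the statistical and texture features are omitted for brevity).

```python
import numpy as np
from scipy import ndimage

def tf_features(volume, sigmas=(0.3, 0.7, 1.0, 1.6, 3.5, 5.0, 10.0)):
    """Assemble derivative- and Gaussian-based per-voxel TF features.
    `volume` is a 3D array; `sigmas` are the ILASTIK default scales."""
    volume = np.asarray(volume, dtype=float)
    feats = [volume]                                   # raw intensity
    grads = np.gradient(volume)
    feats.append(np.sqrt(sum(g ** 2 for g in grads)))  # gradient magnitude
    feats.append(ndimage.laplace(volume))              # Laplacian
    for s in sigmas:
        feats.append(ndimage.gaussian_filter(volume, s))              # Gaussian smoothing
        feats.append(ndimage.gaussian_laplace(volume, s))             # Laplacian of Gaussian
        feats.append(ndimage.gaussian_gradient_magnitude(volume, s))  # Gaussian gradient magnitude
    # Stack to an (N, M) matrix: one row of M features per voxel.
    return np.stack([f.ravel() for f in feats], axis=1)
```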

3.2. Fast Semi-Supervised t-SNE (FSS.t-SNE)

Let $\Xi = \{\xi_i \in \mathbb{R}^M\}_{i=1}^{N}$ be an HD set of $N$ points and $M$ variables, i.e., those estimated from TF-based features for DVR, and let $X = \{x_i \in \mathbb{R}^P\}_{i=1}^{N}$ be its LD representation in $P$ variables ($P < M$). When semi-supervised information is available, the labeled samples make up the set $L = \{\xi_i \,|\, c_i \in \{1, 2, \ldots, C\}\}$, while the unlabeled samples gather in $U = \{\xi_i \,|\, c_i = -1\}$, where $C$ is the number of classes and $|L| + |U| = N$. The notation $|\cdot|$ stands for set cardinality.
The semi-supervised t-SNE (SS.t-SNE) solves the crowding problem by better preserving both local and global data structures. It relies on the HD and LD distances between the $i$-th and $j$-th samples, $\delta_{ij} = \|\xi_i - \xi_j\|_2$ and $d_{ij} = \|x_i - x_j\|_2$, where $\|\cdot\|_2$ is the L2 norm and $i, j \in \{1, 2, \ldots, N\}$ [23]. Likewise, the HD and LD pairwise similarities $\sigma_{ij}$ and $s_{ij}$ are computed as

$$\sigma_{ij} = \begin{cases} \sigma_{ij}^{(L)}, & \xi_i \in L \\ \sigma_{ij}^{(U)}, & \xi_i \in U, \end{cases} \qquad (2)$$

$$s_{ij} = \frac{(1 + d_{ij}^2)^{-1}}{\sum_{k \neq l} (1 + d_{kl}^2)^{-1}}; \qquad (3)$$
where:
$$\sigma_{ij}^{(\mathcal{V})} = \frac{\exp\left(-\pi_i^{(\mathcal{V})} \delta_{ij}^2 / 2\right)}{\sum_{k \neq i,\ \xi_k \in L \cup U} \exp\left(-\pi_i^{(\mathcal{V})} \delta_{ik}^2 / 2\right)}, \qquad (4)$$

with $\mathcal{V} \in \{(L), (U)_h\}$, $\sigma_{ii} = s_{ii} = 0$, and $\sum_j \sigma_{ij} = \sum_j s_{ij} = 1$. The unlabeled point's similarity, $\sigma_{ij}^{(U)}$, is computed as follows:
$$\sigma_{ij}^{(U)} = \frac{1}{H} \sum_{h=1}^{H} \sigma_{ij}^{(U)_h}, \qquad (5)$$

where $\sigma_{ij}^{(U)_h}$ is calculated as in Equation (4) using a precision parameter $\pi_i^{(U)_h}$ that is fixed by a binary search within an exponentially growing perplexity framework $K_h = 2^h$, such that $\log(K_h) = -\sum_{j=1}^{N} \sigma_{ij}^{h} \log(\sigma_{ij}^{h})$, with $h \in \{1, 2, \ldots, H\}$ and $H = \lceil \log_2(N/2) \rceil$ ($\lceil \cdot \rceil$ stands for the rounding operation).
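For illustration, a minimal sketch of this binary search for one point is given below; the variable names (`sq_dists` for the squared HD distances $\delta_{ij}^2$ to the candidate neighbors of point $i$) and the tolerance settings are assumptions of the example, not the authors' implementation.

```python
import numpy as np

def precision_for_perplexity(sq_dists, K_h, tol=1e-5, max_iter=50):
    """Binary search for one point's precision pi so that the entropy of
    its Gaussian neighborhood matches log(K_h)."""
    sq_dists = sq_dists - sq_dists.min()  # shift for numerical stability
    target = np.log(K_h)
    lo, hi = 0.0, np.inf
    pi = 1.0
    for _ in range(max_iter):
        p = np.exp(-pi * sq_dists / 2.0)
        p /= p.sum()
        entropy = -np.sum(p * np.log(p + 1e-12))  # Shannon entropy of sigma_i
        if abs(entropy - target) < tol:
            break
        if entropy > target:  # neighborhood too wide -> raise the precision
            lo = pi
            pi = pi * 2.0 if np.isinf(hi) else (pi + hi) / 2.0
        else:                 # neighborhood too narrow -> lower the precision
            hi = pi
            pi = (lo + pi) / 2.0
    return pi
```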
The labeled point's similarity $\sigma_{ij}^{(L)}$ is found using the precision parameter $\pi_i^{(L)}$, which is the lowest value from the set of $\pi_i^{(L)_h}$ satisfying

$$\sum_{j \neq i,\ \xi_j \in \{L \,|\, c_j = c_i\}} \sigma_{ij}^{(L)} > \tilde{\theta} \left( 1 - \sum_{j \neq i,\ \xi_j \in U} \sigma_{ij}^{(L)} \right). \qquad (6)$$

We focus on the largest area around $\xi_i$ where class $c_i$ stays dominant, and $\tilde{\theta} \in [0.5, 1)$ varies the range of labeled samples used to calculate the HD precisions for points in $L$. When no $\pi_i^{(L)_h}$ satisfies the condition in Equation (6), the precision that maximizes $\sum_{j \neq i,\ \xi_j \in \{L \,|\, c_j = c_i\}} \sigma_{ij}^{(L)}$ is taken as $\pi_i^{(L)}$.
The precision $\pi_i^{(L)_h}$ is computed in the same way as $\pi_i^{(U)_h}$. Then, symmetric HD similarities are calculated using $\sigma_{ij} = (\sigma_{ji} + \sigma_{ij}) / 2N$. Further, a Kullback–Leibler divergence quantifies the difference between the HD and LD similarity distributions to determine if $X$ is a suitable representation of $\Xi$:

$$E(\Xi, X \,|\, \{\pi_i\}) = \sum_{i=1}^{N} \sum_{j=1}^{N} \sigma_{ij} \log(\sigma_{ij} / s_{ij}). \qquad (7)$$
Of note, the optimization of the loss in Equation (7) can be achieved using a gradient-descent-based approach as follows:

$$\frac{\partial E}{\partial x_i} = 4 \left( \sum_{j \neq i} \sigma_{ij} s_{ij} Z (x_i - x_j) - \sum_{j \neq i} s_{ij}^2 Z (x_i - x_j) \right) = 4 \left( F_i^{(attr)} - F_i^{(rep)} \right), \qquad (8)$$

where $Z$ represents the denominator of $s_{ij}$ in Equation (3). The first summation term corresponds to the attractive forces between points, whereas the second term corresponds to the repulsive forces.
In order to address the computational expense of SS.t-SNE and to prioritize the use of multi-dimensional TF-based features, we propose an enhanced version called fast SS.t-SNE (FSS.t-SNE). Our algorithm utilizes a Barnes–Hut approach to accelerate the embedding of neighbors at multiple scales [14].
The heart of the method is in the construction of a sparse version $\tilde{\sigma}_i$ of the HD similarity vector $\sigma_i$ and the use of quadtrees (for $P = 2$) to compute $s_{ij}$. As multi-scale methods combine single-scale similarities tuned at several perplexities, FSS.t-SNE gathers samples that reasonably model the local and global data structure of each $\xi_i$. Afterward, a subset $\Xi_h$ of random samples in $\Xi$ is built from $\lceil 2^{1-h} N \rceil$ elements drawn without replacement. For each $h$, a vantage-point tree can be efficiently created on $\Xi_h$ to find the $3 K_h$ nearest neighbors of each datum. In this manner, we obtain all the nearest neighbors at the smallest scale, to preserve the fine levels of detail, and a scattered set of points as the scale index increases, for a higher-resolution view. Then, $\tilde{\sigma}_i$ is used to compute the precision parameters $\pi_i^{(\mathcal{V})}$.
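As an illustration of this multi-scale candidate selection, the sketch below subsamples $\Xi$ and retrieves $3K_h$ neighbors per scale; a generic scikit-learn k-NN index stands in for the vantage-point trees, and the function name is an assumption of the example.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def multiscale_neighbor_sets(Xi, H, seed=0):
    """For each scale h, index a random subset of about 2^(1-h) * N points
    and retrieve the 3*K_h nearest neighbors of every point, K_h = 2^h."""
    N = Xi.shape[0]
    rng = np.random.default_rng(seed)
    neighbor_sets = []
    for h in range(1, H + 1):
        K_h = 2 ** h                                      # perplexity at scale h
        n_sub = min(N, int(np.ceil(2.0 ** (1 - h) * N)))  # subsample size
        subset = rng.choice(N, size=n_sub, replace=False)
        k = min(3 * K_h, n_sub)                           # clip to the subset size
        knn = NearestNeighbors(n_neighbors=k).fit(Xi[subset])
        _, idx = knn.kneighbors(Xi)                       # neighbors of every point
        neighbor_sets.append(subset[idx])                 # map back to global indices
    return neighbor_sets
```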
Although intuition suggests that the same nearest neighbors should be used in the LD space, experiments have shown an overlap of points in the embedding [14,43]. Hence, as an alternative to using all the elements of $X$, a Barnes–Hut algorithm is adapted to construct a quadtree for 2D spaces. The principal idea is that points far enough from $x_v$, such that $d_{ij} \ll d_{vj} \approx d_{vi}$, can be treated as equal for the sake of simplification. For this, the rectangular LD space is divided into four cells, and each cell is divided into another four cells until the following condition is satisfied:

$$r_{\tilde{c}} / d_{v, m_{\tilde{c}}} < \theta, \qquad (9)$$

where $r_{\tilde{c}}$ is the diagonal length of the $\tilde{c}$-th cell, $m_{\tilde{c}}$ is the center of mass of all the points inside the $\tilde{c}$-th cell, $d_{v, m_{\tilde{c}}}$ is the LD distance between $x_v$ and $m_{\tilde{c}}$, and $\theta \in [0, 1]$ is a user-defined threshold hyper-parameter trading off accuracy and speed (see Figure 1).
It is worth noting that each cell is associated with a tree node, the root of which contains the entire LD embedding. Each nonleaf node is partitioned into four children, and a cell holding the condition in Equation (9), or containing at most one LD sample, is a leaf. In the node related to $\tilde{c}$, we store the number of LD points inside, $N_{\tilde{c}}$, and the center of mass $m_{\tilde{c}}$. Finally, the repulsive force $F_v^{(rep)}$ in Equation (8) is estimated as

$$F_v^{(rep)} = \sum_{\tilde{c}} \frac{N_{\tilde{c}} \left( x_v - m_{\tilde{c}} \right)}{\left( 1 + d_{v, m_{\tilde{c}}}^2 \right)^2}. \qquad (10)$$
For the sake of clarity, the computational complexities of the main t-SNE variants are presented and contrasted against the FSS.t-SNE method in Table 1. As seen, our approach improves the computational complexity from $O(N^2)$ to $O(N \log N)$.
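To make the tree construction and traversal concrete, the following didactic Python sketch implements the quadtree and the estimate in Equation (10); the paper's actual implementation is in C/Cython, so the class and function names here are illustrative only.

```python
import numpy as np

class QuadNode:
    """Quadtree cell for the Barnes-Hut estimate of Equation (10)."""
    def __init__(self, points):
        self.n = len(points)                        # N_c: points in the cell
        self.com = points.mean(axis=0)              # m_c: center of mass
        lo, hi = points.min(axis=0), points.max(axis=0)
        self.diag = float(np.linalg.norm(hi - lo))  # r_c: cell diagonal
        self.children = []
        if self.n > 1 and self.diag > 0:            # leaves hold <= 1 point
            mid = (lo + hi) / 2.0
            for mx in (points[:, 0] <= mid[0], points[:, 0] > mid[0]):
                for my in (points[:, 1] <= mid[1], points[:, 1] > mid[1]):
                    sub = points[mx & my]
                    if len(sub) > 0:
                        self.children.append(QuadNode(sub))

def repulsive_force(x_v, node, theta=0.75):
    """Estimate F_v^(rep) by traversing the tree and summarizing every cell
    that satisfies the condition of Equation (9)."""
    d2 = float(np.sum((x_v - node.com) ** 2))
    if node.n == 1 or not node.children or node.diag < theta * np.sqrt(d2):
        # Treat the whole cell as N_c points located at its center of mass.
        return node.n * (x_v - node.com) / (1.0 + d2) ** 2
    return sum(repulsive_force(x_v, c, theta) for c in node.children)
```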

3.3. FSS.t-SNE-Based Volumetric Rendering

We apply our FSS.t-SNE to the multidimensional TF-based space described in Section 3.1 to create a relevant 2D representation that encodes important local and global data structures, making use of the semi-supervised voxel labels. Then, an absorption-plus-emission method is used [7,44] to bring out important optical properties of the input image. Our objective is to simulate the intensity of a ray of light on the camera sensor after it has traveled from the source. The path of the ray is presumed to be a straight line connecting the source and the sensor, which simplifies DVR for visualization.
In particular, we can emulate the optical model and generate an image projection of any volume on a screen. So, we build the lookup table based on one or multiple features of interest, grounded in our FSS.t-SNE-based embedding to highlight salient image patterns in a three-dimensional visualization. Of note, the fixed DVR approach allows for defining a common and well-known rendering strategy (holding default parameters) for further TF and DR comparison.
Figure 2 summarizes our DVR approach based on multidimensional TF features and their 2D FSS.t-SNE enhancement. First, a multi-domain TF-based feature space is computed from derivative-, statistical-, structure tensor-, Gaussian-, and Laplacian-based measures of the input medical images. Then, FSS.t-SNE is carried out to find a 2D space that properly preserves the local and global data structure from the extracted TF-based features and the available semi-supervised voxel labels. Finally, a straightforward ray casting method is computed for DVR.

4. Experimental Set-Up

The following section comprehensively describes the tested datasets and the key experimental conditions utilized to validate our FSS.t-SNE as a valuable tool for TF-based enhancement in DVR for medical image visualization.

4.1. Tested Datasets

Three real-world medical image datasets were studied:
Human Tooth Computed Tomography (CT-Tooth) comprises 256 × 256 × 161 voxels, revealing the dentine, enamel, and pulp [25]. The volume is cropped from rows 61 to 106 and columns 58 to 148 to obtain a subvolume of 46 × 91 × 161 = 673,946 voxels. The entire volume is unlabeled (unsupervised dataset).
A T1-weighted magnetic resonance scan of a head with the skull partially removed to reveal the brain (https://graphics.stanford.edu/data/voldata/ (accessed on 1 February 2024)) (MR-Brain), of size 256 × 256 × 99. The volume is cropped from rows 50 to 176, columns 75 to 173, and slices 54 to 90 to obtain a subvolume of 127 × 99 × 45 = 565,785 voxels. On some slices, tissues of interest were hand-brushed by an amateur user. Labels are associated with the white matter, the gray matter, the skull, the ventricles, the cerebellum, the basal ganglia, and the background. The labeled voxels correspond to 5% of the cropped volume (semi-supervised dataset).
Abdominal Computed Tomography (CT-Abdomen) with manual annotations of the lung, bones, liver, kidneys, and bladder, labeled by an expert (https://www.cancerimagingarchive.net/collections/ (accessed on 1 February 2024)) [45]. The volume size is 256 × 256 × 75, and it is cropped from rows 14 to 150 and columns 30 to 185, obtaining a subvolume of 135 × 135 × 75 = 751,080 voxels. The original manual annotations by clinicians label 33% of the voxels in the subvolume (semi-supervised dataset).
Figure 3 presents the studied medical images. To test the TFs and FSS.t-SNE capabilities for DVR, we did not apply any additional pre- or post-processing techniques, such as filtering or background removal.

4.2. Training Details and Method Comparison

The specific TF hyper-parameter values include the maximum radius of the statistical properties and the Gaussian kernels’ variances (see Section 3.1). For concrete testing, we use the ILASTIK default values of 0.30 , 0.70 , 1.00 , 1.60 , 3.50 , 5.00 , and 10.00 [46]. In turn, regarding our semi-supervised DR approach for TF enhancement, it is worth mentioning that t-SNE-based methods require initial values for the LD coordinates. Typically, regardless of the DR task, the points are initialized with a random assignment, and then the method optimizes the coordinates, as explained in Section 3.2.
In our application, we take advantage of the original spatial distribution of voxels for each medical image volume. For the LD point representation, we can use two of these three spatial coordinates as a starting point to support FSS.t-SNE for capturing the similarity between the medical image sub-volumes. Further, we can cluster voxels from different initial regions according to the selected features of interest and the provided semi-supervised label information, if available.
Furthermore, the authors in [23] examined the basic SS.t-SNE DR performance and how the embedding changed with the hyper-parameter $\tilde{\theta}$ and the amount of labeled data, showing how to preserve both intra-class and inter-class neighborhoods for semi-supervised data. Based on this, we use $\tilde{\theta} = 0.9$, and the percentage of labeled data depends on the availability of information for each image dataset. Also, the value of the $\theta$ hyper-parameter was tested in the simple fast unsupervised version of t-SNE in [14], where $\theta = 0.75$ was found to be a good compromise between computation time and embedding quality for both the single-scale and multi-scale versions, so we use that value for our FSS.t-SNE.
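As a quick arithmetic check of the multi-scale schedule that these settings imply (a sketch; only $H = \lceil \log_2(N/2) \rceil$ and $K_h = 2^h$ come from Section 3.2):

```python
import numpy as np

# Number of perplexity scales for the MR-Brain subvolume (N = 565,785).
N = 565_785
H = int(np.round(np.log2(N / 2)))        # -> 18 scales
K = [2 ** h for h in range(1, H + 1)]    # K_h grows from 2 up to 262,144
print(H, K[:4], K[-1])                   # 18 [2, 4, 8, 16] 262144
```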
Our DVR experiments include the following scenarios. First, we carried out experiments for visual inspection of our FSS.t-SNE 2D embeddings and some relevant DVRs. Second, we made a visual comparison among the TF domains of three traditional techniques, Intensity vs. Gradient Magnitude [25], Intensity vs. Laplacian [7], and Mean vs. Standard Deviation [27], and our FSS.t-SNE-based enhancement for volumetric structure identification.
Finally, a Dice coefficient was computed for a quantitative assessment as a segmentation task:

$$Dice(C, \hat{C}) = \frac{2 \, |C \cap \hat{C}|}{|C| + |\hat{C}|}, \qquad (11)$$

where $Dice(C, \hat{C}) \in [0, 1]$, and $C$ and $\hat{C}$ hold the target and predicted segmentation masks.
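A minimal sketch of Equation (11) for boolean voxel masks (the function and argument names are assumptions of the example):

```python
import numpy as np

def dice(mask_true, mask_pred):
    """Dice coefficient of Equation (11) for boolean segmentation masks."""
    intersection = np.logical_and(mask_true, mask_pred).sum()
    return 2.0 * intersection / (mask_true.sum() + mask_pred.sum())
```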
All experiments were carried out on a DELL (Texas, US) desktop computer with an Intel Core i7-6700K at 4.2 GHz, 16 GB of RAM, and an NVIDIA GeForce GTX 1080 GPU. The FSS.t-SNE algorithm relies on the Python (3.10) programming language and involves some C and Cython developments for performance purposes. The code is publicly available online at https://github.com/wserna (accessed on 1 February 2024).

5. Results and Discussion

5.1. FSS.t-SNE Embedding Results

First, we registered the evolution of the FSS.t-SNE LD space at different training iterations for the sub-volumes of the CT-Tooth and MR-Brain datasets. To create the TF domain, the training process was unsupervised. However, ILASTIK's segmentation tool [46] was used to automatically label voxels in some areas of interest so that the samples could be visually tracked throughout the training process.
As seen in Figure 4 and Figure 5, similar behavior was achieved for the CT-Tooth and the MR-Brain. The global structure of the data is encoded in the first 300 iterations, with only small changes in later iterations. We can see partially overlapping clusters of voxels from the same tissue, as well as the boundaries between sub-volumes with different intensities. Moreover, some groups become significantly isolated as the number of iterations increases; however, the voxels belonging to the boundary of a sub-volume leave a trace toward the neighboring sub-volumes. These transitions between sub-volumes are desirable because they enable the boundary to be rendered independently. The embedding changes between 900 and 1000 iterations are imperceptible.
The division of some small groups, identified in the first 300 iterations, out of larger clusters introduces the concept of exploring the LD space at different training moments. It is also important to note that the ILASTIK segmentation algorithm colored points in the embedding spaces based on their intensities, while the HD feature space emphasizes pattern-based textures, which lets clusters represent sub-volumes with different traits. On the other hand, as we hypothesized, initializing the LD space in our FSS.t-SNE with the voxels' spatial coordinates allows for the preservation of large neighborhoods, as the sub-volumes searched in DVR always involve thousands of nearby samples.
Afterward, we tested the Intensity vs. Gradient Magnitude [25], Intensity vs. Laplacian [7], and Mean vs. Standard Deviation [27] TF domains against our FSS.t-SNE-based enhancement. For concrete testing, the resulting histogram was divided into 200 × 200 bins, and a heat color map was used to represent high-density spots. Next, we manually searched for structures of varying complexities in the 2D histograms as a natural step in the DVR process. The user moves elliptical ROIs over the histogram, rendering the volume online in the spatial coordinates to provide feedback on the search. As shown in Figure 6 and Figure 7, statistical or derivative properties group together voxels of the same sub-volume, or with similar intensities, in the feature space. This is advantageous for segmenting large sub-volumes when there are few of them with differentiable intensities, as in computed tomography of simple volumes, e.g., CT-Tooth. However, most of the time, noise in volumetric data mixes clusters of different sub-volumes, as seen in Figure 7 for the MR-Brain dataset.
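For reference, the 2D histogram display used throughout this section can be reproduced as in the following sketch, where synthetic points stand in for the FSS.t-SNE embedding:

```python
import numpy as np
import matplotlib.pyplot as plt

# Tabulate a 2D embedding into a 200 x 200 histogram and show log counts
# with a heat color map, as done for the TF-domain density plots.
embedding = np.random.default_rng(0).normal(size=(100_000, 2))
hist, xedges, yedges = np.histogram2d(embedding[:, 0], embedding[:, 1], bins=200)
plt.imshow(np.log1p(hist).T, origin="lower", cmap="hot")
plt.axis("off")
plt.show()
```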
Our FSS.t-SNE offers an alternative TF domain with improved cluster definition and separability. It also bursts the crowded groups, spreading them over a larger area of the LD space while preserving neighborhoods of the same type. Here, we take advantage of the basic principle of t-SNE to tackle the crowding problem in DR. In this way, we can segment a sub-volume and explore the interior of larger ROIs in the TF domain. Moreover, in 2D embeddings based on derivatives, the transitions between sub-volumes appear as arcs that often overlap with each other. In some cases, the enhanced TF domain of FSS.t-SNE unwraps the arc intersections, tracing new routes that maintain the original transitions or cropping one of the arcs in half.

5.2. DVR Visual Inspection Results

Next, we used the 2D histograms from the regular TF domains and the new FSS.t-SNE enhancement of the MR-Brain to explore by hand with elliptical ROIs, making sure we found the same part of the brain in all of the embedding spaces. We only utilized the known label information for manual exploration, concealing the labels generated automatically by ILASTIK. Here, we wanted to distinguish some relevant structures from semi-supervised inputs as a collection of smaller sub-volumes in the middle of the brain that have a low-contrast boundary with other tissues. Figure 8 depicts how the FSS.t-SNE TF domain allows these structures to be segmented with superior performance compared to traditional TFs. We obtained the same results for the ventricles (see Figure 9) and the hard body (see Figure 10), although the 2D histogram showed the boundary and interior of the sub-volumes in different places in each case. In addition, we segmented a large anatomical structure, the white matter in Figure 11, again reaching better isolation from the facial muscles and the skull.
Figure 12 and Figure 13 present some realistic renders from multiple ROIs generated on the 2D FSS.t-SNE embedding for the MR-Brain dataset. Including color and opacity allows us to study the spatial relations of sub-volumes before applying other segmentation algorithms. Additionally, it allows us to represent multiple variables using optical properties, something that is not possible with isosurfaces. Figure 12 shows how we can obtain a render that reveals the white and gray matter (red and blue), the core of the brain (orange), and the face (yellow) using three ROIs; each one of these tissues may be completely removed if required. We applied the same look-up table to the voxels inside each ROI. Figure 13 shows another DVR visualization for the same TF domain. Our approach allows for rendering the gray matter and the ventricles in low-opacity blue; this ROI also includes the face and eye anatomy, and the basal ganglia in high-opacity orange.

5.3. Segmentation Results

For the last experiment, we used the CT-Abdomen dataset and its verified label information for five organs. The goal was to obtain a quantitative measure of organ isolation based solely on the proposed FSS.t-SNE-based TF domain. As a quantitative assessment of segmentation performance, we used the official labeled CT-Abdomen data, which contain labels for the liver, the lungs, the kidneys, and the bone. To compute the Dice coefficient, we manually identified the organs with elliptical ROIs over the 2D histogram and compared the segmented volume with the original label information. We did not implement any post-processing, automatic segmentation, or classification algorithms, in order to measure only the performance of the LD feature spaces (i.e., the TF domains). Table 2 shows the outcomes for the segmented organs. As seen, our approach attained the best segmentation performance by properly coupling semi-supervised patterns from multiple TF features within a nonlinear DR approach that preserves both the local and global data structures.
It is worth noting that our FSS.t-SNE-based DVR has significant practical implications for clinical applications in medical imaging [47]. By incorporating semi-supervised learning and reducing computational complexity, FSS.t-SNE can enhance diagnostic accuracy by providing clearer and more detailed visualizations of complex medical images, making it easier to identify abnormalities. Indeed, partially annotated images could boost specialist analysis. This improved clarity can aid in more precise treatment planning, allowing for better-targeted interventions and monitoring. Additionally, the method’s ability to handle large datasets efficiently can facilitate real-time analysis and decision-making, potentially improving patient outcomes by enabling faster and more accurate diagnoses and treatments. Indeed, integrating FSS.t-SNE into clinical workflows could lead to more effective and personalized medical care, ultimately benefiting patient health and treatment success [48].

5.4. Limitations

The benefits of FSS.t-SNE for DVR-based medical image visualization depend on the input data resolution. Higher-resolution images have a greater density of voxels, increasing the overall volume of data that requires processing, which affects both computational load and memory usage. To handle the increased data volume efficiently, the algorithm may need further optimization or more powerful hardware. In contrast, fewer voxels (lower resolution) mean less detailed images, potentially missing finer structures or boundaries within the medical images [49]. Resolution also influences the extraction of features, such as the TF-based domains, from the images and their 2D dimensionality reduction representation. At high resolution, we can extract more detailed features, which provide richer information for the TF domain and potentially lead to better visualization; however, the extraction process itself becomes more complex and time-consuming. A low resolution yields less detailed TFs, losing information that is critical for accurate DVR. Finally, resolution has a significant impact on the final rendered output: a higher voxel count allows for smoother and more detailed renderings, whereas a low resolution may produce visualizations that are less detailed and miss small but clinically significant features, affecting the diagnostic utility of the rendered images [50].
In turn, our FSS.t-SNE model appears effective at handling noise and artifacts because it uses semi-supervised data to extract TF-based features and reduce the number of dimensions, which makes the DR and DVR robust. However, more experiments are needed to determine how pre- and post-processing steps can affect our framework [51].

6. Conclusions

We propose a new framework for direct volume rendering based on multi-scale dimensionality reduction neighbor embedding that generates two-dimensional transfer function domains to explore volumetric data from multiple features. Compared to traditional transfer function-based domains, our proposal, fast semi-supervised t-SNE (FSS.t-SNE), allows for a higher separability of voxels by coding semi-supervised inputs when tabulated in a 2D histogram. Moreover, it incorporates both class labels and input feature similarities, i.e., semi-supervised data, to accurately determine the widths of the Gaussian neighborhoods. This allows for the identification of important local and global data structures in the 2D space. In addition, FSS.t-SNE employs a Barnes–Hut method to expedite the multi-scale neighbor embedding. The core of the approach involves creating a sparse form of the HD similarity matrix and subsequently using quadtrees (for a two-dimensional representation) to calculate the LD similarities. Hence, our approach outperforms t-SNE by reducing the computational complexity to $O(N \log N)$.
By leveraging the multi-scale neighbor embedding approach, FSS.t-SNE scatters voxels of the same sub-volume over a larger region. This reduces the crowding problem inherent in traditional t-SNE and helps to better distribute data points in the low-dimensional space. By doing so, FSS.t-SNE allows for more distinct clustering of similar voxels while maintaining their relationships based on the original high-dimensional features. Furthermore, it partially untangles the samples' paths among sub-volumes, allowing us to explore the edges and transitions among them. Although we require the additional step of generating the 2D TF domain from multiple features, all our experiments show better performance in volume segmentation and in a comparative visual inspection. The experiments did not involve any pre- or post-processing algorithms in order to assess only the impact of the TF domains. Note that this is not an algorithm for automatic segmentation but rather a tool to analyze and visualize unexplored volumetric data or to generate special visual effects in 3D rendering based on multiple features of interest. Also, our framework allows us to collect label information from human specialists to feed other machine learning methods that require supervision.
As future work, we will update the framework to implement bricking and octree-based empty-space skipping to process volumetric datasets with millions of voxels. Additionally, we must conduct further experiments to determine the potential impact of pre- and post-processing phases, e.g., denoising filters [51], on our framework. Lastly, we aim to couple deep learning frameworks with our DR approach for DVR [52].

Author Contributions

Conceptualization, W.S.-S., A.M.Á.-M., and Á.O.-G.; data curation, W.S.-S. and A.M.Á.-M.; methodology, W.S.-S. and A.M.Á.-M.; project administration, Á.O.-G.; supervision, Á.O.-G. and A.M.Á.-M.; resources, W.S.-S. and A.M.Á.-M. All authors have read and agreed to the published version of the manuscript.

Funding

Under grants provided by the program “Alianza científica con enfoque comunitario para mitigar brechas de atención y manejo de trastornos mentales relacionados con impulsividad en Colombia (ACEMATE)-91908”—Project “Sistema multimodal apoyado en juegos serios orientado a la evaluación e intervención neurocognitiva personalizada en trastornos de impulsividad asociados a TDAH como soporte a la intervención presencial y remota en entornos clínicos, educativos y comunitarios-790-2023”, funded by Minciencias. A. Alvarez acknowledges the project “Sistema de visión artificial para el monitoreo y seguimiento de efectos analgésicos y anestésicos administrados vía neuroaxial epidural en población obstétrica durante labores de parto para el fortalecimiento de servicios de salud materna del Hospital Universitario de Caldas—SES HUC-(HERMES-57661)”, funded by Universidad Nacional de Colombia.

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found at [25,45] and https://graphics.stanford.edu/data/voldata/ (accessed on 1 February 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Mijwil, M.M.; Al-Mistarehi, A.H.; Abotaleb, M.; El-kenawy, E.S.M.; Ibrahim, A.; Abdelhamid, A.A.; Eid, M.M. From Pixels to Diagnoses: Deep Learning’s Impact on Medical Image Processing-A Survey. Wasit J. Comput. Math. Sci. 2023, 2, 9–15. [Google Scholar] [CrossRef]
  2. Magadza, T.; Viriri, S. Deep learning for brain tumor segmentation: A survey of state-of-the-art. J. Imaging 2021, 7, 19. [Google Scholar] [CrossRef] [PubMed]
  3. Zhang, S.; Zhuang, Y.; Luo, Y.; Zhu, F.; Zhao, W.; Zeng, H. Deep learning-based automated lesion segmentation on pediatric focal cortical dysplasia II preoperative MRI: A reliable approach. Insights Imaging 2024, 15, 71. [Google Scholar] [CrossRef] [PubMed]
  4. Liu, X.; Song, L.; Liu, S.; Zhang, Y. A review of deep-learning-based medical image segmentation methods. Sustainability 2021, 13, 1224. [Google Scholar] [CrossRef]
  5. Cheng, H.C.; Cardone, A.; Jain, S.; Krokos, E.; Narayan, K.; Subramaniam, S.; Varshney, A. Deep-learning-assisted volume visualization. IEEE Trans. Vis. Comput. Graph. 2018, 25, 1378–1391. [Google Scholar] [CrossRef] [PubMed]
  6. Wang, C.; Han, J. Dl4scivis: A state-of-the-art survey on deep learning for scientific visualization. IEEE Trans. Vis. Comput. Graph. 2022, 29, 3714–3733. [Google Scholar] [CrossRef] [PubMed]
  7. Ljung, P.; Krüger, J.; Groller, E.; Hadwiger, M.; Hansen, C.D.; Ynnerman, A. State of the art in transfer functions for direct volume rendering. Comput. Graph. Forum 2016, 35, 669–691. [Google Scholar] [CrossRef]
  8. Besançon, L.; Ynnerman, A.; Keefe, D.F.; Yu, L.; Isenberg, T. The state of the art of spatial interfaces for 3d visualization. Comput. Graph. Forum 2021, 40, 293–326. [Google Scholar] [CrossRef]
  9. Suganyadevi, S.; Seethalakshmi, V.; Balasamy, K. A review on deep learning in medical image analysis. Int. J. Multimed. Inf. Retr. 2022, 11, 19–38. [Google Scholar] [CrossRef]
  10. Ruijters, D. Common Artifacts in Volume Rendering. arXiv 2021, arXiv:2109.13704. [Google Scholar]
  11. Berisha, V.; Krantsevich, C.; Hahn, P.R.; Hahn, S.; Dasarathy, G.; Turaga, P.; Liss, J. Digital medicine and the curse of dimensionality. NPJ Digit. Med. 2021, 4, 153. [Google Scholar] [CrossRef]
  12. de Moura Pinto, F.; Freitas, C.M. Design of multi-dimensional transfer functions using dimensional reduction. In Proceedings of the 9th Joint Eurographics/IEEE VGTC Conference on Visualization, Norrköping, Sweden, 23–25 May 2007; Eurographics Association: Goslar, Germany, 2007; pp. 131–138. [Google Scholar]
  13. Kim, H.S.; Schulze, J.P.; Cone, A.C.; Sosinsky, G.E.; Martone, M.E. Dimensionality reduction on multi-dimensional transfer functions for multi-channel volume data sets. Inf. Vis. 2010, 9, 167–180. [Google Scholar] [CrossRef]
  14. de Bodt, C.; Mulders, D.; Verleysen, M.; Lee, J.A. Fast Multiscale Neighbor Embedding. IEEE Trans. Neural Netw. Learn. Syst. 2020, 34, 1546–1560. [Google Scholar] [CrossRef]
  15. Zhu, R.; Dornaika, F.; Ruichek, Y. Semi-supervised elastic manifold embedding with deep learning architecture. Pattern Recognit. 2020, 107, 107425. [Google Scholar] [CrossRef]
  16. Zheng, J.; Qiu, H.; Xu, X.; Wang, W.; Huang, Q. Fast Discriminative Stochastic Neighbor Embedding Analysis. Comput. Math. Methods Med. 2013, 2013, 106867. [Google Scholar] [CrossRef]
  17. de Bodt, C.; Mulders, D.; López-Sánchez, D.; Verleysen, M.; Lee, J.A. Class-aware t-SNE: cat-SNE; ESANN: Bruges, Belgium, 2019; pp. 409–414. [Google Scholar]
  18. Sheikhpour, R.; Sarram, M.A.; Gharaghani, S.; Chahooki, M.A.Z. A survey on semi-supervised feature selection methods. Pattern Recognit. 2017, 64, 141–158. [Google Scholar] [CrossRef]
  19. Zhu, T.; Pimentel, M.A.; Clifford, G.D.; Clifton, D.A. Unsupervised bayesian inference to fuse biosignal sensory estimates for personalizing care. IEEE J. Biomed. Health Inform. 2018, 23, 47–58. [Google Scholar] [CrossRef]
  20. Huang, S.; Elgammal, A.; Huangfu, L.; Yang, D.; Zhang, X. Globality-locality preserving projections for biometric data dimensionality reduction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Columbus, OH, USA, 23–28 June 2014; pp. 15–20. [Google Scholar]
  21. Huang, R.; Zhang, G.; Chen, J. Semi-supervised discriminant Isomap with application to visualization, image retrieval and classification. Int. J. Mach. Learn. Cybern. 2019, 10, 1269–1278. [Google Scholar] [CrossRef]
  22. Engel, K.; Hadwiger, M.; Kniss, J.M.; Lefohn, A.E.; Salama, C.R.; Weiskopf, D. Real-Time Volume Graphics. In Proceedings of the SIGGRAPH ’04: ACM SIGGRAPH 2004 Course Notes, Los Angeles, CA, USA, 8–12 August 2004; Association for Computing Machinery: New York, NY, USA, 2004; p. 29-es. [Google Scholar] [CrossRef]
  23. Serna-Serna, W.; de Bodt, C.; Alvarez-Meza, A.M.; Lee, J.A.; Verleysen, M.; Orozco-Gutierrez, A.A. Semi-supervised t-SNE with multi-scale neighborhood preservation. Neurocomputing 2023, 550, 126496. [Google Scholar] [CrossRef]
  24. Levoy, M. Display of surfaces from volume data. IEEE Comput. Graph. Appl. 1988, 8, 29–37. [Google Scholar] [CrossRef]
  25. Kniss, J.; Kindlmann, G.; Hansen, C. Multidimensional transfer functions for interactive volume rendering. IEEE Trans. Vis. Comput. Graph. 2002, 8, 270–285. [Google Scholar] [CrossRef]
  26. Sereda, P.; Bartroli, A.V.; Serlie, I.W.; Gerritsen, F.A. Visualization of boundaries in volumetric data sets using LH histograms. IEEE Trans. Vis. Comput. Graph. 2006, 12, 208–218. [Google Scholar] [CrossRef] [PubMed]
  27. Haidacher, M.; Patel, D.; Bruckner, S.; Kanitsar, A.; Gröller, M.E. Volume visualization based on statistical transfer-function spaces. In Proceedings of the IEEE PacificVis, Taipei, Taiwan, 2–5 March 2010. [Google Scholar]
  28. Correa, C.; Ma, K.L. Size-based transfer functions: A new volume exploration technique. IEEE Trans. Vis. Comput. Graph. 2008, 14, 1380–1387. [Google Scholar] [CrossRef] [PubMed]
  29. Kindlmann, G.; Whitaker, R.; Tasdizen, T.; Moller, T. Curvature-based transfer functions for direct volume rendering: Methods and applications. In Proceedings of the IEEE Visualization (VIS 2003), Seattle, WA, USA, 19–24 October 2003; IEEE: Piscataway, NJ, USA, 2003; pp. 513–520. [Google Scholar]
  30. Totsuka, T.; Levoy, M. Frequency domain volume rendering. In Proceedings of the 20th Annual Conference on Computer Graphics and Interactive Techniques, Anaheim, CA, USA, 2–6 August 1993; pp. 271–278. [Google Scholar]
  31. Roettger, S.; Bauer, M.; Stamminger, M. Spatialized transfer functions. In Proceedings of the EuroVis, Leeds, UK, 1–3 June 2005; pp. 271–278. [Google Scholar]
  32. Caban, J.J.; Rheingans, P. Texture-based transfer functions for direct volume rendering. IEEE Trans. Vis. Comput. Graph. 2008, 14, 1364–1371. [Google Scholar] [CrossRef] [PubMed]
  33. Tzeng, F.Y.; Ma, K.L. A cluster-space visual interface for arbitrary dimensional classification of volume data. In Proceedings of the IEEE TCVG Conference on Visualization, Konstanz, Germany, 19–21 May 2004. [Google Scholar]
  34. Nguyen, B.P.; Tay, W.L.; Chui, C.K.; Ong, S.H. A clustering-based system to automate transfer function design for medical image visualization. Vis. Comput. 2012, 28, 181–191. [Google Scholar] [CrossRef]
  35. Wang, Y.; Chen, W.; Zhang, J.; Dong, T.; Shan, G.; Chi, X. Efficient volume exploration using the gaussian mixture model. IEEE Trans. Vis. Comput. Graph. 2011, 17, 1560–1573. [Google Scholar] [CrossRef] [PubMed]
  36. Soundararajan, K.P.; Schultz, T. Learning probabilistic transfer functions: A comparative study of classifiers. Comput. Graph. Forum 2015, 34, 111–120. [Google Scholar] [CrossRef]
  37. Matrakas, M.D.; Scheer, S. Three-Dimensional Representation of a Multidimensional Data Set. Appl. Math. Sci. 2016, 10, 959–971. [Google Scholar] [CrossRef]
  38. Ponciano, D.; Seefelder, M.; Marroquim, R. Graph-based interactive volume exploration. Comput. Graph. 2016, 60, 55–65. [Google Scholar] [CrossRef]
  39. Berger, M.; Li, J.; Levine, J.A. A generative model for volume rendering. IEEE Trans. Vis. Comput. Graph. 2018, 25, 1636–1650. [Google Scholar] [CrossRef]
  40. Engel, D.; Ropinski, T. Deep volumetric ambient occlusion. IEEE Trans. Vis. Comput. Graph. 2020, 27, 1268–1278. [Google Scholar] [CrossRef] [PubMed]
  41. Nguyen, N.; Bohak, C.; Engel, D.; Mindek, P.; Strnad, O.; Wonka, P.; Li, S.; Ropinski, T.; Viola, I. Finding Nano-Ötzi: Cryo-Electron Tomography Visualization Guided by Learned Segmentation. IEEE Trans. Vis. Comput. Graph. 2022, 29, 4198–4214. [Google Scholar] [CrossRef] [PubMed]
  42. Haidacher, M.; Patel, D.; Bruckner, S.; Kanitsar, A.; Gröller, M.E. Volume visualization based on statistical transfer-function spaces. In Proceedings of the 2010 IEEE Pacific Visualization Symposium (PacificVis), Taipei, Taiwan, 2–5 March 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 17–24. [Google Scholar]
  43. Van Der Maaten, L. Barnes-Hut-SNE. arXiv 2013, arXiv:1301.3342. [Google Scholar]
  44. Max, N. Optical models for direct volume rendering. IEEE Trans. Vis. Comput. Graph. 1995, 1, 99–108. [Google Scholar] [CrossRef]
  45. Rister, B.; Yi, D.; Shivakumar, K.; Nobashi, T.; Rubin, D.L. CT-ORG, a new dataset for multiple organ segmentation in computed tomography. Scientific Data 2020, 7, 381. [Google Scholar] [CrossRef] [PubMed]
  46. Berg, S.; Kutra, D.; Kroeger, T.; Straehle, C.N.; Kausler, B.X.; Haubold, C.; Schiegg, M.; Ales, J.; Beier, T.; Rudy, M.; et al. Ilastik: Interactive machine learning for (bio) image analysis. Nat. Methods 2019, 16, 1226–1232. [Google Scholar] [CrossRef]
  47. Duran, A.H.; Duran, M.N.; Masood, I.; Maciolek, L.M.; Hussain, H. The additional diagnostic value of the three-dimensional volume rendering imaging in routine radiology practice. Cureus 2019, 11, e5579. [Google Scholar] [CrossRef]
  48. Bai, S.; Ma, C.; Wang, X.; Zhou, S.; Jiang, H.; Ma, L.; Jiang, H. Application of Medical Image 3D Visualization Web Platform in Auxiliary Diagnosis and Preoperative Planning. J. Image Graph. 2023, 11, 32–39. [Google Scholar] [CrossRef]
  49. Chen, R.; Ran, Y.; Wu, Y.; Xu, H.; Niu, J.; Zhang, Y.; Cheng, J. The value of the cinematic volume rendering technique: Magnetic resonance imaging in diagnosing tumors associated with the brachial plexus. Eur. J. Med. Res. 2023, 28, 569. [Google Scholar] [CrossRef]
  50. Chen, R.; Ran, Y.; Xu, H.; Niu, J.; Wang, M.; Wu, Y.; Zhang, Y.; Cheng, J. The guiding value of the cinematic volume rendering technique in the preoperative diagnosis of brachial plexus schwannoma. Front. Oncol. 2023, 13, 1278386. [Google Scholar] [CrossRef]
  51. Ardakani, A.A.; Mohammadi, A.; Faeghi, F.; Acharya, U.R. Performance evaluation of 67 denoising filters in ultrasound images: A systematic comparison analysis. Int. J. Imaging Syst. Technol. 2023, 33, 445–464. [Google Scholar] [CrossRef]
  52. Bhalodia, R.; Elhabian, S.; Adams, J.; Tao, W.; Kavan, L.; Whitaker, R. DeepSSM: A blueprint for image-to-shape deep learning models. Med. Image Anal. 2024, 91, 103034. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Illustration of the Barnes–Hut algorithm for LD similarity computation in FSS.t-SNE. Black dots denote LD samples.
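The Barnes–Hut scheme summarized in Figure 1 replaces exact pairwise repulsion with a quadtree traversal: any cell that is small relative to its distance from the query sample is treated as a single point located at its center of mass, weighted by the number of samples it contains. The following minimal Python sketch illustrates that idea; it is our own illustration under stated assumptions (names such as QuadNode, repulsion, and theta are ours), not the paper's implementation.

```python
import numpy as np

class QuadNode:
    """Quadtree cell over the 2D embedding; tracks point count and centre of mass."""
    def __init__(self, center, half):
        self.center = np.asarray(center, dtype=float)
        self.half = float(half)            # half-width of the square cell
        self.n = 0                         # number of samples inside the cell
        self.com = np.zeros(2)             # centre of mass of those samples
        self.children = None

    def insert(self, p):
        self.com = (self.com * self.n + p) / (self.n + 1)  # running centre of mass
        self.n += 1
        if self.n == 1:                    # empty leaf: keep the sample
            self.point = p
            return
        if self.children is None:          # occupied leaf: split into four quadrants
            h = self.half / 2.0
            self.children = [QuadNode(self.center + h * np.array(q), h)
                             for q in ((-1, -1), (-1, 1), (1, -1), (1, 1))]
            self._quadrant(self.point).insert(self.point)
        self._quadrant(p).insert(p)

    def _quadrant(self, p):
        return self.children[2 * int(p[0] >= self.center[0]) + int(p[1] >= self.center[1])]

def repulsion(node, y_i, theta=0.5):
    """Barnes-Hut estimate of (sum_j n_j q_ij^2 (y_i - y_j), sum_j n_j q_ij)."""
    if node is None or node.n == 0:
        return np.zeros(2), 0.0
    diff = y_i - node.com
    d2 = float(diff @ diff)
    # A cell that is small relative to its distance acts as one point of mass node.n.
    if node.children is None or (2.0 * node.half) ** 2 < theta ** 2 * d2:
        if d2 == 0.0:                      # the query sample itself: no self-repulsion
            return np.zeros(2), 0.0
        q = 1.0 / (1.0 + d2)               # Student-t kernel used by t-SNE
        return node.n * q * q * diff, node.n * q
    force, z = np.zeros(2), 0.0
    for child in node.children:
        f, zc = repulsion(child, y_i, theta)
        force, z = force + f, z + zc
    return force, z

# Toy usage on a random 2D embedding (assumes distinct points).
rng = np.random.default_rng(0)
Y = rng.normal(size=(2000, 2))
root = QuadNode(Y.mean(axis=0), np.abs(Y - Y.mean(axis=0)).max() + 1e-6)
for p in Y:
    root.insert(p)
f0, z0 = repulsion(root, Y[0])             # roughly O(log N) per sample instead of O(N)
# A full implementation accumulates z over all samples before normalizing the gradient.
```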
Figure 2. Fast SS.t-SNE for image-based Direct Volume Rendering from Transfer Function features. Red and blue denote semi-supervised HD samples; red and green in the LD feature space, as well as in the rendered volume, mark the visualized regions.
Figure 3. Tested medical images for DVR: CT-Tooth (supervised), MR-Brain (semi-supervised), and CT-Abdomen (semi-supervised). Structure labels (where available) are denoted by color.
Figure 4. CT-Tooth 2D embedding at different FSS.t-SNE training iterations. Color denotes the region label.
Figure 5. MR-Brain 2D embedding at different FSS.t-SNE training iterations. Color denotes the region label.
Figure 6. CT-Tooth 2D embedding—method comparison. First row: embeddings. Second row: 2D histograms—Intensity vs. Gradient Magnitude, Intensity vs. Laplacian, Mean vs. Standard Deviation, and FSS.t-SNE-based TF enhancement (ours). Color denotes the region label.
Figure 7. MR-Brain 2D embedding—method comparison. First row: embeddings. Second row: 2D histograms—Intensity vs. Gradient Magnitude, Intensity vs. Laplacian, Mean vs. Standard Deviation, and FSS.t-SNE-based TF enhancement (ours). Color denotes the region label.
Figure 8. MR-Brain basal ganglia DVR results. First row: 3D renders. Second row: 2D histograms—Intensity vs. Gradient Magnitude, Intensity vs. Laplacian, Mean vs. Standard Deviation, and FSS.t-SNE (ours). Arrows indicate the region studied via DVR.
Figure 9. MR-Brain ventricles DVR results. First row: 3D renders. Second row: 2D histograms—Intensity vs. Gradient Magnitude, Intensity vs. Laplacian, Mean vs. Standard Deviation, and FSS.t-SNE (ours). Arrows indicate the region studied via DVR.
Figure 10. MR-Brain hard body DVR results. First row: 3D renders. Second row: 2D histograms—Intensity vs. Gradient Magnitude, Intensity vs. Laplacian, Mean vs. Standard Deviation, and FSS.t-SNE (ours). Arrows indicate the region studied via DVR.
Figure 11. MR-Brain white matter DVR results. First row: 3D renders. Second row: 2D histograms—Intensity vs. Gradient Magnitude, Intensity vs. Laplacian, Mean vs. Standard Deviation, and FSS.t-SNE (ours). Arrows indicate the region studied via DVR.
Figure 12. MR-Brain dataset DVR results: FSS.t-SNE-based multiple-ROI rendering of the white matter, cortex, ventricles, and face from different points of view. The bottom-left box shows the lookup table of optical properties; colored points mark the 2D TF bounds, and shaded colored regions mark the structures studied via DVR.
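As Figure 12 suggests, once the voxels are embedded, rendering reduces to indexing a 2D lookup table of optical properties with each voxel's embedding coordinates. The sketch below is an illustrative mapping only, not the paper's implementation; apply_2d_tf, lut, and bounds are assumed names.

```python
import numpy as np

def apply_2d_tf(embedding, lut, bounds):
    """Map N x 2 embedding coordinates to RGBA via an H x W x 4 lookup table."""
    (xmin, ymin), (xmax, ymax) = bounds
    u = ((embedding[:, 0] - xmin) / (xmax - xmin) * (lut.shape[1] - 1)).astype(int)
    v = ((embedding[:, 1] - ymin) / (ymax - ymin) * (lut.shape[0] - 1)).astype(int)
    u = np.clip(u, 0, lut.shape[1] - 1)
    v = np.clip(v, 0, lut.shape[0] - 1)
    return lut[v, u]                       # per-voxel RGBA optical properties

# A single semi-transparent red ROI painted into the lookup table.
lut = np.zeros((256, 256, 4))
lut[100:150, 50:120] = [1.0, 0.0, 0.0, 0.3]
emb = np.random.uniform(-10, 10, size=(1000, 2))
rgba = apply_2d_tf(emb, lut, ((-10.0, -10.0), (10.0, 10.0)))
```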
Figure 13. MR-Brain dataset DVR results: FSS.t-SNE-based multiple-ROI rendering of the cortex and basal ganglia from different points of view. The bottom-left box shows the lookup table of optical properties; colored points mark the 2D TF bounds, and shaded colored regions and arrows mark the structures studied via DVR.
Table 1. Algorithm complexity: FSS.t-SNE vs. relevant t-SNE-based variants.

Method | HD Similarity Computation | Iteration in Gradient-Based Optimization
SNE, t-SNE | O(N²) | O(N²)
BH t-SNE | O(N log N) | O(N log N)
Multi-scale t-SNE | O(N² log N) | O(N²)
Fast multi-scale t-SNE, FSS.t-SNE (ours) | O(N log² N) | O(N log N)
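For context on the O(N log N) rows of Table 1, scikit-learn ships the unsupervised Barnes–Hut t-SNE of [43], whose angle parameter is the Barnes–Hut theta threshold trading accuracy for speed. This is a comparison point only, not the semi-supervised FSS.t-SNE proposed here; the feature matrix below is a synthetic stand-in.

```python
import numpy as np
from sklearn.manifold import TSNE

X = np.random.rand(5000, 10)        # stand-in for per-voxel HD feature vectors
Y = TSNE(n_components=2, perplexity=30, method='barnes_hut', angle=0.5,
         init='pca', random_state=0).fit_transform(X)
print(Y.shape)                      # (5000, 2): coordinates for a 2D TF domain
```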
Table 2. CT-Abdomen segmentation results. The Dice coefficient for four segmented organs is computed from DVR outcomes.

TF Approach/Structure | Liver | Lungs | Kidneys | Bone
FSS.t-SNE-based TF (ours) | 0.8053 | 0.8899 | 0.6487 | 0.6692
Intensity vs. Gradient Magnitude [25] | 0.6010 | 0.6075 | 0.2575 | 0.3497
Intensity vs. Laplacian [7] | 0.4244 | 0.6671 | 0.5805 | 0.4055
Statistical Properties [27] | 0.2932 | 0.8197 | 0.5307 | 0.6745
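The scores in Table 2 compare binarized DVR-derived segmentations against ground-truth organ masks. A minimal sketch of the metric follows, assuming boolean voxel masks; the function name dice is ours, not from the paper.

```python
import numpy as np

def dice(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) over boolean voxel masks."""
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    denom = pred.sum() + truth.sum()
    return 2.0 * np.logical_and(pred, truth).sum() / denom if denom else 1.0

# Toy check: two half-overlapping 4x4x4 masks give Dice = 0.5.
a = np.zeros((4, 4, 4), bool); a[:2] = True
b = np.zeros((4, 4, 4), bool); b[1:3] = True
print(dice(a, b))  # 0.5
```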