Article

Deep Learning Using Isotroping, Laplacing, Eigenvalues Interpolative Binding, and Convolved Determinants with Normed Mapping for Large-Scale Image Retrieval

1 School of Computer Science and Technology, University of Science and Technology of China, Hefei 230009, China
2 Department of Computer Science, Bahauddin Zakariya University, Multan 60800, Pakistan
3 Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230009, China
* Author to whom correspondence should be addressed.
Submission received: 11 January 2021 / Revised: 29 January 2021 / Accepted: 1 February 2021 / Published: 6 February 2021
(This article belongs to the Section Intelligent Sensors)

Abstract:
Convolutional neural networks (CNNs) operate on grid-structured data and exploit the spatial dependencies of two-dimensional images, including location adjacencies, color values, and hidden patterns. They rely on sparse, layered connections whose local spatial mapping footprints depend on the architecture, the input, the number and types of layers, and how the learned features are fused with derived signatures. This research addresses that gap by combining the GoogLeNet, VGG-19, and ResNet-50 architectures with maximum-response eigenvalue texture features and convolutional Laplacian-scaled object features, mapped onto color channels, to obtain high image retrieval rates over millions of images drawn from versatile semantic groups and benchmarks. The time- and computation-efficient formulation of the presented model is a step forward in deep learning fusion and compact signature encapsulation for innovative descriptor creation. Remarkable results on challenging benchmarks are presented, with a thorough contextualization that provides insight into CNN effects with anchor bindings. The presented method is tested on well-known datasets including ALOT (250), Corel-1000, Cifar-10, Corel-10000, Cifar-100, Oxford Buildings, FTVL Tropical Fruits, 17-Flowers, Fashion (15), and Caltech-256, and reports outstanding performance. The presented work is compared with state-of-the-art methods and evaluated on tiny, large, complex, overlay, texture, color, object, shape, mimicked, plain and occupied background, and multiple-object foreground images, achieving significant accuracies.

1. Introduction

Nowadays, artificial intelligence (AI) and machine learning (ML) based convolutional neural network (CNN) research has attracted significant attention. Due to its exceptional performance, the CNN is applied in various fields, including image detection, image retrieval, and image classification [1]. AlexNet [2] was a major breakthrough, and VGGNet [3] was subsequently developed for large-scale classification. GoogLeNet [4] combines inception modules that include concatenation operations, convolutions at various scales, and pooling. More recently, ResNet [5] was introduced with more than 100 convolution layers. The number of trained weights increases with the number of layers, which requires substantial computation and significant time during training and classification. Running these networks in real-time applications is also challenging. To overcome this problem, graphics processing units (GPUs) are normally applied to speed up classification and training [6]; a Tensor Processing Unit (TPU) is an equally good choice in a non-parallel computing environment [7]. GPUs use thousands of arithmetic and logic units (ALUs) to handle the heavy parallel matrix processing of neural networks. TPUs do not contain general-purpose hardware because of their domain-specific architecture and are normally adopted for massive multiplications and additions.
In practical applications, classification acceleration is a main factor and is obtained by utilizing power and storage resources [8]. CNN architectures are employed for the highest level of abstraction using deep learning architectures that combine many non-linear transformations [9]. State-of-the-art performance has been achieved using CNNs for various applications, including object recognition [10] and speech recognition [11]. CNN architectures have also been presented to improve the workflow of image retrieval [12,13]. Furthermore, feature extraction using deep learning methods has advanced the processing of multimedia content, and CNN features [14] have attained high effectiveness in different retrieval and computer vision tasks. Local features such as color, shape, and texture are combined with CNN features by incorporating the scale-invariant feature transform (SIFT) to detect potential image contents. This is an interesting phenomenon that motivates the combination of deep learning features with local features to find background and foreground objects.
Image retrieval is the process of extracting images based on salient image features, including shapes, texture, colors, and objects. An image retrieval system focuses on foreground and background image features together with their semantic interpretation and spatial aspects. Content-based image retrieval is therefore an intelligent image extraction approach that retrieves results based on the actual image contents, in contrast to text-based retrieval [15]. Versatile local features have been proposed over the last decades to support object recognition tasks and content-based image retrieval (CBIR). These local features include SIFT [16] and SURF [17], which efficiently match local structures while thousands of deep features are extracted. Six feature extraction techniques are extensively applied: image segmentation, color features, shape features, texture features, interest point detectors, and combinations of visual features. Image segmentation is applied to extract similar regions; common techniques include contour-based [18], thresholding [19], region growing [20], grid-based [21], k-means clustering [22], watershed [23], texture-based [24], statistical model [25], and normalized cut [26] segmentation. Color representation is better achieved by applying different color models and spatial aspects. Color moments are also applied for color similarity matching and indexing, and the color space technique converts colors into bins along with their representational frequency. Color histograms of multiple objects with similar colors are also considered for large image databases, although a color histogram ignores important information such as shape and texture. For image analysis, image quantization is performed in RGB space with pixels classified as interior or border pixels based on their four neighbors. In this regard, analyzing texture features with their neighbors is not an easy task, since textures can be rough, irregular, smooth, or random. For this, corner, edge, and similarity techniques are applied to model texture features.
In [27], texture features are extracted by incorporating different techniques which are invariant to shape [28] and rotation [29]. Spatial texture features face a complex indexing and searching process. Spectral texture methods instead transform the image into the frequency domain using spatial filter banks, and feature extraction is then performed on the transformed domain using statistical tools. Consequently, spatial filter bank techniques are robust to noise and more prominent than spatial texture features. The fast Fourier transform (FFT) is also applied for spectral analysis to obtain spatial results [30].
The novelty of the presented method is the application of local features in combination with the strength of the GoogLeNet, ResNet-50, and VGG-19 architectures for effective feature extraction. The proposed method introduces a novel technique using Gaussian filters, isotropic filtering, multi-scale filtering, the Laplacian of Gaussian (LoG), convolutions, derivatives, scale spacing, eigenvalue computation, and feature reduction. The presented approach first applies Gaussian filters, whose results are refined with the Laplacian of Gaussian. Isotropic filtering is then applied to these results, which builds the foundation for multi-scale filtering; multi-scale filtering yields deep image filtering at various abstraction levels. These features are aggregated with gray-level convolution features, which are concatenated with first-order partial derivatives. To obtain the best texture values, eigen coefficients are computed for these signatures. At an elementary level, a BoW kernel function is applied, on which non-maximum suppression (NMS) is performed to create the basis for interpolation of corner responses. Determinants are then computed from the data points produced by the interpolated corner responses. After the determinants, convolution and derivatives are applied again, this time using second-order rather than first-order derivatives to obtain better differences. A Laplacian of Gaussian (LoG) approximation is applied for scale spacing. These features represent the image objects. In parallel, the RGB channels are processed by selecting color coefficients and mapping normalized values to obtain the color features. At this step, texture, object, and color features have been extracted, and they are concatenated with the CNN-based trained feature vectors. The final assembly of feature vectors represents the texture, object, color, and CNN-based image features of all strong image content candidates. Finally, the BoW architecture is applied to retrieve relevant images. The image is converted from color to gray levels 0 to 255, a second level of normalization is applied to all channels, and the result is integrated with the produced signatures to present the massive feature vectors compactly. Principal component analysis (PCA) is used to remove redundant feature vectors. The GoogLeNet, ResNet-50, and VGG-19 architectures return a feature vector for every image for which deep features have already been fetched, and these vectors are fused with the deep features extracted by the presented technique. These powerful features are combined to find tiny, large, complex, overlay, texture, color, object, shape, mimicked, plain-background, and multiple-object images. The resulting image feature vectors are used as input to the BoW for efficient image indexing and retrieval. To test the competitiveness of the presented method, it is evaluated on ten standard image benchmarks, namely Cifar-100, ALOT (250), Oxford Buildings, Cifar-10, FTVL Tropical Fruits, Corel-1000, Fashion (15), Caltech-256, Corel-10000, and 17-Flowers, which together contain millions of images from versatile semantic groups. The presented work has the following novelties and contributions:
  • It comprehensively analyses and collects image contents such as color, object, texture, shape, and spatial information, which produces significant recall and precision rates.
  • To attain better results, a method is introduced that improves the capabilities of the ResNet-50, VGG-19, and GoogLeNet architectures through internal coupling.
  • A lightweight feature detection and description model is presented that efficiently and effectively retrieves related images from cluttered and complex databases.
  • For the first time, a technique is proposed that performs multilevel scaling, suppression, interpolation, determinants, and scale spacing collectively to capture fine image detail.
  • A color and gray-level feature-based retrieval system is presented that combines edge and corner detection capabilities with color carrier candidate features.
  • The presented technique returns significant performance on similar textures, overlay ambiguous objects, tiny objects, resized images, mimicked images, color-dominant arrangements, cropped objects, and cluttered patterns.
  • A storage-, computation-, and time-efficient image retrieval system is presented that retrieves images in a fraction of the time.
  • A new idea is presented that strengthens the normalized scaled features with the BoW architecture for quick classification and indexing.
The remainder of this article is organized as follows. Related work on CNNs and deep learning is presented in Section 2. The presented methodology is explained in Section 3. Section 4 presents and discusses the experimental results with graphs and tables. Section 5 concludes the presented method.

2. Related Work

Convolutional neural networks (CNNs) have been investigated by many existing methods to achieve the highest performance for different applications, and recent trends focus on understanding the performance of deep neural networks (DNNs) at different complexity levels. In [31], a novel technique is presented to show how minor changes affect image classification results. Moreover, networks can be improved by adding new layers and by contributing new computer vision algorithms [32]. The researchers of [33] introduced a feature extraction technique for image retrieval and image representation using CNNs; the experimentation is performed on the Cifar-100 and Cifar-10 benchmarks. For image classification using deep networks, a PCA approach is proposed in [34]. In [35], pairwise semantic similarity matrices are factorized into approximate hash codes for the training images. Moreover, for efficient and effective image retrieval, a CNN is used to generate binary hash codes in [36]. The Karhunen–Loeve transform (KLT) is used in the discrete case to represent stochastic processes under proper conditions [37]. An effective algorithm for fast calculation of the KLT operator is presented in [36]; this fast KLT calculates eigenvectors efficiently when applied to small samples. In [38], deep learned features are computed by finding symmetry in FAST scores with neighborhood, smoothing, and standard deviation; feature scaling, reduction, and filtering are applied to resize the features for a variety of datasets. This approach indicates potential research outcomes when compared with the existing ResNet architecture.
A fusion technique combines the ResNet, VGG, and GoogLeNet models [39] through an interconnection, and SVM and random forest techniques are applied for image classification; the experimentation achieves better performance on Stanford 40 Actions. In [40], a deep learning approach with five stages, including image pre-processing, a pre-trained CNN, semantic segmentation, query analysis, and image retrieval, is proposed for the NTCIR-13 Lifelog-2 dataset. For stemming image labels from deep neural networks (DNNs) in [41], the researchers applied AlexNet [2] and GoogLeNet for object recognition on the ImageNet dataset. Moreover, the AlexNet, ResNet [5], GoogLeNet, and VGG [3] architectures are used for scene recognition on Places365, and the histogram of oriented gradients (HOG) method is employed to count the number of persons in each image. In [42], a new technique compares the latest CNN algorithms, including AlexNet, VGGNet, GoogLeNet, and ResNet, on the BelgiumTS benchmark. In addition, an object detection technique is presented to find better processing speeds and object recognition ratios among CNNs. In [43], a symmetric solution uses SVD, PCA, VGG, and a Gaussian mixture model (GMM) for high-dimensional feature extraction, feature selection, region segmentation, and softmax classification, respectively. Furthermore, feature selection with SVD and PCA on the FC layers validated the image classification accuracy.
Various algorithms and methods have been proposed for feature extraction, description and detection, and object detection and classification. The research literature highlights the similarity among object detection, filter banks, and texture filters. Implementing semantic concepts with these techniques is relatively hard, and understanding overlapped and cluttered objects is difficult. Therefore, previous approaches focused on image classification and single-object detection for the feature set [44]. The researchers in [45] focused on lower-level feature extraction and object recognition, including filter banks, HOG, GIST, and the bag of features (BoF) applied using a word vocabulary.
Content-based image retrieval (CBIR) uses visual features of the input images, including edges, name suitability, color, and texture [46]. A CNN is applied for image classification to retrieve images using cosine similarity, and the CNN approach has demonstrated successful image retrieval based on image classification. Many researchers have addressed the semantic gap between low-level image features and high-level concepts. In [47], a robust technique combining a CNN and sparse representation was proposed: a novel technique with in-depth CNN feature extraction that increases image retrieval accuracy and speed using sparse representation. This method is tested for image retrieval on the MPEG-7, ALOI, and Corel databases. Recent research has deployed CNNs for various types of object-based image classification; however, how to efficiently exploit the deep CNN features of a trained network to improve object-based image retrieval still requires more research.
The presented method is formulated as a best-suited approach for efficient image retrieval. Gaussian filters are applied and their results are refined with the LoG; multi-scale filtering is used after Gaussian filtering, and derivatives and eigenvalues are also used in the presented method. NMS is performed to create the basis for the interpolation of corner responses, and a LoG approximation is applied for scale spacing. A CNN is applied to reduce features, and image classification is performed by ResNet-50, VGG-19, and GoogLeNet. Spatial mapping and L2 normalization are applied to normalize images for searching and indexing purposes. The GoogLeNet, ResNet-50, and VGG-19 based feature vectors are fused with the deep features extracted from the proposed method. These powerful features are combined to find overlay, texture, color, object, shape, mimicked, plain, complex, and occupied background images as well as multiple-object foreground images from the challenging image benchmarks. BoW is applied for efficient image retrieval and indexing. The presented method shows outstanding performance for all challenging benchmarks, including Cifar-10, Caltech-256, Corel-1000, Oxford Buildings, Cifar-100, FTVL Tropical Fruits, Corel-10000, 17-Flowers, ALOT (250), and Fashion (15).

3. Methodology

In Figure 1, the presented architecture takes an image as input and converts it into gray levels to extract Gaussian, Laplacian, isotropic, and scale-based feature mappings. Moreover, the convolutions and derivatives are encapsulated with eigenvalues and fed into the bag-of-words architecture to show the separation effects with interpretation. In parallel, data blobs with determinants are computed at second order with LoG and convolved values. These feature vectors are fused and assembled with RGB maps, colors, and normed features, and concatenated with ResNet-50, VGG-19, and GoogLeNet features separately for quick indexing and retrieval.

3.1. Image Channeling

The first and foremost step is to find the key points in the image. For this, the image is converted from color to gray levels 0 to 255, and this is the input for the first level of processing. Noise is removed from the gray-scale image by applying primitive techniques, so that the gray-scale image represents high and low intensities by white and black levels.
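A minimal sketch of this image channeling step is given below, assuming OpenCV is available; the file name "query.jpg" and the 3 × 3 median filter are illustrative choices, not the paper's exact pre-processing.

```python
# Sketch of Section 3.1: color-to-gray conversion followed by primitive noise removal.
import cv2

def to_gray_channel(path="query.jpg"):
    bgr = cv2.imread(path)                        # color input image
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)  # gray levels 0..255
    gray = cv2.medianBlur(gray, 3)                # simple noise removal
    return gray
```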

3.2. Isotroping and LOG Filtering with Derivations

In this step, the MR8 filter bank [48] is applied: 38 filters are reduced to eight filter responses to capture texture orientation, neighborhoods, and patterns. The bank consists of two isotropic filters, a Gaussian and a Laplacian of Gaussian (LoG), plus edge (first derivative) and bar (second derivative) filters, each applied at six orientations and three scales, which gives 36 oriented filters and 38 filters in total. To achieve rotational invariance, the filters are applied at different orientations and scales, and at each scale the maximum response over the orientations is preserved. The final response at each position is therefore an eight-dimensional feature vector: three scales for the bar filter, three for the edge filter, plus two for the isotropic filters [49]. The Gaussian and the LoG filter are used at scale φ = 10, while the bar (second derivative) and edge (first derivative) filters are applied at six orientations and the three scales (φh, φp) = {i, j, k}, where i = (1,3), j = (2,6), and k = (4,12). To obtain an equally insensitive and mono-directional context, isotroping is applied to the determined filter responses: the responses are collapsed at every scale by keeping only the maximum over orientations, which ensures the rotational invariance of the filter bank with eight filter responses. Moreover, the MR4 filter bank [50] is employed at the (φh, φp) = k scale [51]; MR4 contains 14 filters, of which four responses are kept to obtain neighborhoods, patterns, and texture orientation. This classifier uses the local neighborhood values of the pixel filter responses. Small neighborhoods from 3 × 3 up to 7 × 7 attain higher categorization performance for multiple-scale filter banks. It is important to distinguish between texture classes, referred to as a joint classifier. The texture pattern analysis is a potential carrier of background and neighborhood similarities and is helpful for semantic interpretation based on the texture distribution. Equation (1) defines the Markov random field (MRF) [51]:
$$ \gamma\bigl(W(h_e) \mid W(h),\ h \neq h_e\bigr) \;=\; \gamma\bigl(W(h_e) \mid W(h),\ h \in Ŧ(h_e)\bigr) \tag{1} $$
In Equation (1), he is a site in the two-dimensional integer lattice, W represents the image, and Ŧ(he) is the neighborhood of the site, defined as a Ŧ × Ŧ square neighborhood. The value of the central pixel is significant, but its distribution must be coordinated with its neighbors. To test this conditional distribution, classifiers are retrained on feature vectors taken from the set of Ŧ × Ŧ neighborhoods with the central pixel left out. Classification ratios for Ŧ = 5 improve when the center pixel is left out and are slightly worse for Ŧ = 3 and Ŧ = 7. The joint distribution is sufficient to validate an MRF model of the textures in the dataset; it is an explicit model containing γ(W(he)|W(h), h ∈ Ŧ(he)). At this point, textons are used to define the joint probability density function (PDF) of the central pixel conditioned on its neighbors. The central pixels are represented by feature vectors in a Ŧ² − 1 dimensional space using the same dictionary of q textons, where each of the m textons is a putative elementary unit of texture and q = 610; the one-dimensional distribution of central pixels is learned by a t-bin histogram. The joint PDF is represented by an m × t matrix. Furthermore, a property of the Gaussian filter is that it is an operator satisfying the uncertainty relation in Equation (2) [52]:
$$ z\, v \;\geq\; \tfrac{1}{2} \tag{2} $$
In Equation (2), v and z are the variances in the frequency and spatial domains, respectively. This property allows the Gaussian operator to provide the best tradeoff between the conflicting localization requirements in the frequency and spatial domains. The two-dimensional Gaussian filter is a rotationally symmetric filter that is separable by coordinate; separability is significant for efficient computation when smoothing is applied through convolutions in the spatial domain. Moreover, the optimal smoothing filter is localized in both the frequency and spatial domains of the image, thus satisfying the uncertainty relation given in Equation (2). Consider the two-dimensional Gaussian operator defined in Equation (3) [52]:
$$ s(z, r) \;=\; \frac{1}{2\pi\varphi^2}\, e^{-\left(z^2 + r^2\right)/2\varphi^2} \tag{3} $$
In Equation (3), φ is the standard deviation and (z, r) are the Cartesian coordinates of the image. When Gaussian filters at various scales are applied to an image, as in Marr and Hildreth, a set of images with various smoothness levels is obtained. For image edge detection, it is also essential to find the zero-crossings of the second derivatives. The Laplacian of Gaussian (LoG) function is used as the filter in Equation (4) [52]:
$$ \nabla^2 s(z, r) \;=\; \frac{\partial^2}{\partial z^2}\, s(z, r) + \frac{\partial^2}{\partial r^2}\, s(z, r) \;=\; \frac{z^2 + r^2 - 2\varphi^2}{2\pi\varphi^6}\, e^{-\left(z^2 + r^2\right)/2\varphi^2} \tag{4} $$
In Equation (4), the operator is orientation independent and the scale is set by φ. It breaks down at locations such as curves and corners, where the image intensity function changes in a non-linear manner along an edge. Algorithm 1 computes the square neighborhoods for the intensity values, which are joined in steps 1 and 2. It then computes the central pixel with its Ŋ × Ŋ neighborhood, which is classified for odd sizes, in steps 3 and 4. The feature vectors of dimension Ŋ² − 1 are formed and represented by ң values in ω bins in steps 6 and 7. Finally, the co-occurrence representation of the filter banks is computed, and the probabilities are computed in steps 8 to 10.
Algorithm 1: Neighborhood computations algorithm.
   Step-1: Square neighborhood preferences calculation
   Step-2: Jointly introduce the intensity values
   Step-3: Compute Ŋ × Ŋ neighborhoods with excluding central pixel
   Step-4: Test classification for Ŋ = 3,5,7
   Step-5: Include central pixel and retest step 2–5
   Step-6: Form the Ŋ² − 1 feature vector
   Step-7: Show in ң values ω bin representation
   Step-8: Compute co-occurrence representation of filter banks
   Step-9: Compute joint probability P (İ)
   Step-10: Compute conditional probability
   P(Ϛ(ŷ) | Ϛ(Ή(ŷ)))
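As an illustration of the maximum-response filtering described in this section, the sketch below builds a small MR8-style bank with SciPy and keeps, at each scale, only the maximum edge and bar response over the orientations plus the two isotropic responses. The scales and six orientations follow typical MR8 settings and are assumptions, not the paper's exact filter bank.

```python
# Sketch of MR8-style maximum-response filtering (Section 3.2).
import numpy as np
from scipy import ndimage

def mr8_like_responses(gray, scales=((1, 3), (2, 6), (4, 12)), n_orient=6):
    gray = gray.astype(np.float64)
    responses = []
    for sy, sx in scales:
        edge, bar = [], []
        for k in range(n_orient):
            angle = 180.0 * k / n_orient
            rot = ndimage.rotate(gray, angle, reshape=False, mode="nearest")
            e = ndimage.gaussian_filter(rot, sigma=(sy, sx), order=(1, 0))  # edge (first derivative)
            b = ndimage.gaussian_filter(rot, sigma=(sy, sx), order=(2, 0))  # bar (second derivative)
            edge.append(ndimage.rotate(e, -angle, reshape=False, mode="nearest"))
            bar.append(ndimage.rotate(b, -angle, reshape=False, mode="nearest"))
        responses.append(np.max(np.abs(edge), axis=0))  # max over orientations -> rotation invariance
        responses.append(np.max(np.abs(bar), axis=0))
    # two isotropic responses: Gaussian and Laplacian of Gaussian at sigma = 10
    responses.append(ndimage.gaussian_filter(gray, sigma=10))
    responses.append(ndimage.gaussian_laplace(gray, sigma=10))
    return np.stack(responses, axis=-1)  # eight-dimensional response per pixel
```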

3.3. Eigen, Harris, Gaussian Coefficients Composition

At this step, the KLT [37] and Harris [53] algorithms are applied to identify the corner pixels of an image. The first-order Gaussian derivative kernel is approximated by a box kernel to speed up the algorithm, so that the convolution is computed with fewer resources on integral images. The integral images of u²h, u²p, and uhup are used to accelerate the computation of the cornerness response, since the cornerness step concentrates most of the computation and time consumption of the Harris algorithm. Furthermore, the cornerness D(γ) of a pixel is estimated by summing the squares and products of the gradients over an integration window N, as shown in Equation (5). Both algorithms measure the change in intensity caused by shifting N in each direction around a given point; the cornerness response of corner pixels is defined in Equation (5) [53]:
$$ D(\gamma) \;=\; \sum_{H \in N} \left\{ \begin{bmatrix} u_h^2(H) & u_h(H)\,u_p(H) \\ u_h(H)\,u_p(H) & u_p^2(H) \end{bmatrix} \times £(H) \right\} \;=\; \begin{bmatrix} U_{hh} & U_{hp} \\ U_{hp} & U_{pp} \end{bmatrix} \tag{5} $$
with,
$$ u_i = \partial_i\,(u * W) = (\partial_i u) * W, \qquad i \in \{h, p\}, \qquad u = u(h, p; \varphi) \tag{6} $$
Here, £(H) is a weighting function, γ is the central pixel, N is the integration window centered at γ, W is the image, u is the two-dimensional Gaussian function, and uh and up are the image gradients obtained by convolution with the Gaussian first-order partial derivatives in the h and p directions in Equation (5). The Harris corner detector evaluates the cornerness of each pixel without an explicit eigenvalue decomposition, as described in Equation (7) [53]:
$$ A \;=\; |D| \;-\; \kappa \times \bigl(\operatorname{Trace}(D)\bigr)^2 \tag{7} $$
with,
$$ |D| = \lambda_1 \times \lambda_2 \qquad \text{and} \qquad \operatorname{Trace}(D) = \lambda_1 + \lambda_2 \tag{8} $$
where λ1 and λ2 are the eigenvalues of D. The value of κ is taken between 0.04 and 0.06. An image pixel resembles a corner if both eigenvalues are large, producing a peak in the response A.
The eigenvalues of D(γ) are calculated by the KLT, which selects points that maximize the minimum eigenvalue, as defined in Equations (9) and (10) [53]:
$$ A = \lambda_{\min} = \min(\lambda_1, \lambda_2) \tag{9} $$
$$ \lambda_{\min} = \tfrac{1}{2}\left( U_{hh} + U_{pp} - \sqrt{\left(U_{hh} - U_{pp}\right)^2 + 4\,U_{hp}^2} \right) \tag{10} $$
The KLT and Harris algorithms use a similar method to detect the corner points; the only difference is the cornerness function, estimated in Equations (7)–(10), respectively. Both algorithms are divided into three steps:
(a) The Gaussian derivatives uh and up of the image W are calculated by convolution with the Gaussian derivative kernel;
(b) The D(γ) matrix and the cornerness measure A are evaluated individually for every pixel; and
(c) NMS and quick-sort are applied to suppress locally weak points.
The complexity is reduced in two respects. First, the integral image reduces the complexity of both the convolution and the evaluation of the cornerness response. Second, adopting the efficient NMS avoids highly complex sorting.
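The sketch below illustrates the Harris and KLT (Shi–Tomasi) cornerness measures of Equations (5)–(10), assuming SciPy; the values kappa = 0.05 and the derivative/integration sigmas are illustrative, and Gaussian weighting is used for the integration window rather than the paper's integral-image version.

```python
# Sketch of the Harris and KLT cornerness responses (Section 3.3).
import numpy as np
from scipy import ndimage

def cornerness(gray, sigma_d=1.0, sigma_i=2.0, kappa=0.05):
    W = gray.astype(np.float64)
    # gradients via Gaussian first-order partial derivatives (Equation (6))
    u_h = ndimage.gaussian_filter(W, sigma_d, order=(0, 1))
    u_p = ndimage.gaussian_filter(W, sigma_d, order=(1, 0))
    # structure-tensor entries accumulated over the integration window N (Equation (5))
    U_hh = ndimage.gaussian_filter(u_h * u_h, sigma_i)
    U_pp = ndimage.gaussian_filter(u_p * u_p, sigma_i)
    U_hp = ndimage.gaussian_filter(u_h * u_p, sigma_i)
    det = U_hh * U_pp - U_hp ** 2
    trace = U_hh + U_pp
    harris = det - kappa * trace ** 2                                  # Equation (7)
    klt = 0.5 * (trace - np.sqrt((U_hh - U_pp) ** 2 + 4 * U_hp ** 2))  # Equation (10)
    return harris, klt
```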

3.4. First Ordering, Kernel Approximation, and Recursive Filtering

At this stage, detecting corner feature points requires the computation of the Gaussian first-order partial derivatives, uh and up, of the image W in the h and p directions, respectively. The Gaussian kernels must be discretized and approximated with a finite impulse response filter of at least a minimum length. Using the Gaussian derivative kernel improves the repeatability of the Harris corner detector. The Gaussian derivative kernel is applied recursively for fast computation of the convolutions: the Gaussian derivative filter is estimated with an infinite impulse response filter [54] using a recursive filter approach, which allows the filter length to be set. With this method, Gaussian derivative kernels of various scales are computed in constant time. In SURF [55], the first-order Gaussian partial derivative kernel is approximated with a box kernel, in which the gray areas are set to 0 and the white and black areas are weighted +1 and −1, respectively. The gradients are computed at low computational cost and in constant time with the integral image; only seven operations and eight memory accesses are needed to calculate a gradient. Furthermore, the filter response is normalized by the filter size. The convolution with the Gaussian derivative kernel integrates two steps, low-pass filtering and differentiation, and the Gaussian function is approximated by a triangle function and subsequently integrated with the integral image method. With the box kernel, the number of operations is independent of the kernel size instead of increasing linearly with it: the multiplications reduce to one, and the other multiplications are replaced by subtraction and addition operations.
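The following sketch shows a box-kernel gradient evaluated on an integral image, assuming NumPy; the half-width r and the +1/−1 half-box weighting are illustrative of the SURF-style approximation described above, not the exact kernel used in the paper.

```python
# Sketch of Section 3.4: summed-area table and a box-kernel derivative in the h direction.
import numpy as np

def integral_image(img):
    # summed-area table padded with a leading row/column of zeros
    ii = np.cumsum(np.cumsum(img.astype(np.float64), axis=0), axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))

def box_sum(ii, y0, x0, y1, x1):
    # sum of img[y0:y1, x0:x1] from four memory accesses and three operations
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

def box_gradient_h(ii, y, x, r=3):
    # +1 weight on the right half-box, -1 on the left half-box
    right = box_sum(ii, y - r, x + 1, y + r + 1, x + r + 1)
    left = box_sum(ii, y - r, x - r, y + r + 1, x)
    return (right - left) / float((2 * r + 1) * r)  # normalize by the filter area
```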

3.5. Multiple Gradients Responses

The cornerness step is the most computation- and time-intensive part of the KLT [37] and Harris [53] algorithms. The cornerness D(γ) of a pixel is evaluated by summing the squares and products of the gradients over N, as in Equation (5), and this is done at every pixel; therefore, the computation overlaps among pixels lying in N. Creating integral images for the gradient terms of Equation (5) accelerates the computation of the corner response. Equations (11)–(13) are defined in [53]:
$$ jj_{hh}(h, p) \;=\; \sum_{h' \leq h,\; p' \leq p} u_h^2(h', p') \tag{11} $$
$$ jj_{pp}(h, p) \;=\; \sum_{h' \leq h,\; p' \leq p} u_p^2(h', p') \tag{12} $$
$$ jj_{hp}(h, p) \;=\; \sum_{h' \leq h,\; p' \leq p} u_h(h', p')\, u_p(h', p') \tag{13} $$
From Equations (11)–(13), integral images are created for the summations Uhh, Upp, and Uhp in Equation (5), which are then evaluated at low computational cost with four memory accesses and three operations. The repeated summation and multiplication operations over N for every pixel are exchanged for a one-time creation of the integral image, followed by simple subtraction and addition operations. Furthermore, no loss of detector efficiency is observed with this modification, while a large speedup is obtained. The speedup is achieved despite an additional memory access for jjhp, which does not exist in the original algorithm. The product uhup can be computed at the same time from the current CPU internal registers while the gradients are read from memory, which is not the case for W, as it is located and pre-calculated in memory.
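A short sketch of Equations (11)–(13) follows, assuming NumPy and the gradients u_h, u_p from the previous step; the window radius r is illustrative. Each window sum Uhh, Upp, Uhp then costs four reads and three operations.

```python
# Sketch of Section 3.5: summed-area tables of gradient products and O(1) window sums.
import numpy as np

def gradient_integrals(u_h, u_p):
    def sat(a):
        return np.pad(np.cumsum(np.cumsum(a, axis=0), axis=1), ((1, 0), (1, 0)))
    return sat(u_h * u_h), sat(u_p * u_p), sat(u_h * u_p)  # jj_hh, jj_pp, jj_hp

def window_sum(sat, y, x, r):
    # sum over the (2r+1) x (2r+1) window N centered at (y, x)
    return (sat[y + r + 1, x + r + 1] - sat[y - r, x + r + 1]
            - sat[y + r + 1, x - r] + sat[y - r, x - r])

# Example usage (hypothetical interior point y, x and radius r):
# jj_hh, jj_pp, jj_hp = gradient_integrals(u_h, u_p)
# U_hh = window_sum(jj_hh, y, x, r); U_pp = window_sum(jj_pp, y, x, r); U_hp = window_sum(jj_hp, y, x, r)
```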

3.6. Perform Non Maximal Suppressions with Corner Responses

In this step, NMS is performed over the cornerness response image so that a single location is preserved for each corner feature point. For the KLT detector, NMS is applied in two steps [56]. First, quick-sort arranges the cornerness responses A over the image in descending order [56,57,58], which is computationally expensive because the list of response points grows with the image size. The non-maximal points are then blocked by selecting robust response points from the sorted list and eliminating consecutive weaker response points within the distance ƈ; finally, a minimum distance between feature points is enforced. A naive NMS implementation over a local neighborhood of (2ƈ + 1) × (2ƈ + 1) for each cornerness response leads to the highest Harris complexity [59]: a comparison is performed with all pixels lying in the (2ƈ + 1) × (2ƈ + 1) window to find the maximum point, which is chosen if it is above the threshold and greater than all other pixels, and this procedure is repeated for each pixel of the cornerness response image. In [60], efficient non-maximum suppression (E-NMS) is introduced to extract a single feature location per corner region: E-NMS performs NMS on image blocks rather than pixel by pixel, which reduces the computational complexity. Moreover, the minimum-distance enforcement and quick-sort steps are mostly intended for tracking [56] a unique location for each corner region. The order is therefore switched by applying the effective NMS first, which is computationally less complex, and then sorting the feature points by cornerness response; since the sorting is performed on a small number of points, the complexity is reduced greatly. The E-NMS algorithm works as follows. First, it partitions the image into blocks of size (ƈ + 1) × (ƈ + 1). Second, the maximum element is found within each individual block; if it passes the local-maximum test and the threshold, the feature point location is retained. For the KLT algorithm, the feature points are then sorted by their associated cornerness response. In Algorithm 2, interest points are detected by applying non-maximal suppression on neighborhoods with their variants and determinants, as shown in steps 1 to 3. After this, interpolation is computed at different scale factors, for which neighborhood extraction is performed based on content description in steps 4 to 6. Similarly, variant extraction is computed based on gradients in step 7, resulting in the Haar wavelet response in step 8. Algorithm 3 computes the Haar wavelet response by forming a square region with its orientation and splitting it up in steps 1 to 3. Spatial information is preserved with 5 × 5 boxes, for which the horizontal and vertical directions are computed and the interval is increased in steps 4 to 7. Geometric aspects are computed with deformation in step 8 and localization in step 9. Finally, the result is weighted, centered interest points that represent the feature vectors in steps 10 and 11.
Algorithm 2: Find interest points algorithm.
   Step 1: Apply cube 3 NMS neighboring
   Step 2: Compute variants
   Step 3: Separate the determinants
   Step 4: Apply interpolation
   Step 5: Apply scaling and image spacing
   Step 6: Distribution content description neighboring extraction
   Step 7: Gradient variant extraction
   Step 8: Compute first order Haar wavelet response
Algorithm 3: Haar wavelet response effects algorithm.
   Step 1: Square region formation
   Step 2: Orientation selection
   Step 3: Region split up
   Step 4: Spatial information preservation
   Step 5: 5 × 5 regularly spaced samples
   Step 6: Compute Kx for the horizontal direction
   Step 7: Compute Ky for the vertical direction
   Step 8: Increase interest point orientation
   Step 9: Deformation geometric aspects
   Step 10: Localization geometric aspects
   Step 11: Result weighted centered interest points
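The block-wise E-NMS described in this section can be sketched as follows, assuming NumPy; the block size ƈ + 1 and the threshold value are illustrative, and the final sort by cornerness corresponds to the KLT variant.

```python
# Sketch of efficient non-maximum suppression (E-NMS, Section 3.6).
import numpy as np

def e_nms(response, c=5, threshold=1e-3):
    H, W = response.shape
    points = []
    step = c + 1
    for y0 in range(0, H, step):
        for x0 in range(0, W, step):
            block = response[y0:y0 + step, x0:x0 + step]
            dy, dx = np.unravel_index(np.argmax(block), block.shape)
            y, x = y0 + dy, x0 + dx
            v = response[y, x]
            if v <= threshold:
                continue
            # local-maximum test against the full (2c+1) x (2c+1) neighborhood
            nb = response[max(0, y - c):y + c + 1, max(0, x - c):x + c + 1]
            if v >= nb.max():
                points.append((y, x, v))
    points.sort(key=lambda t: -t[2])  # KLT: sort the retained points by cornerness
    return points
```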

3.7. Hessian Blobs Detection

In this step, blob structures are detected at locations where the determinant is maximal. Consider a point Z = (h, p) in an image W; the Hessian matrix U(h, φ) of h at scale φ is represented as in Equation (14) [55]:
$$ U(h, \varphi) \;=\; \begin{bmatrix} l_{hh}(h, \varphi) & l_{hp}(h, \varphi) \\ l_{hp}(h, \varphi) & l_{pp}(h, \varphi) \end{bmatrix} \tag{14} $$
In Equation (14), lhh(h, φ) is the convolution of the Gaussian second-order derivative ∂²/∂h² of the Gaussian at scale φ with the image at point h, and similarly for lhp(h, φ) and lpp(h, φ). Gaussians are optimal for scale-space analysis [61], but in practice they must be discretized and cropped. As a result, repeatability drops for image rotations around odd multiples of π/4, which is a weakness of Hessian-based detectors, while repeatability is maintained around multiples of π/2 due to the square shape of the filter. The LoG approximations are second-order derivatives and can be evaluated at low computational cost using integral images. The 9 × 9 box filters approximate the Laplacian of Gaussian and represent the lowest scale for computing the blob response maps; they are denoted by lhh, lpp, and lhp. Weights are applied to the rectangular regions for simple and efficient computation, as in Equation (15) [55]:
$$ \operatorname{Det}(L_{\mathrm{approx}}) \;=\; l_{hh}\, l_{pp} - (w\, l_{hp})^2 \tag{15} $$
In Equation (15), the weight w applied to the filter responses balances the expression for the Hessian determinant, and the filter responses are normalized with respect to their size. This guarantees a constant Frobenius norm for any filter size, which is also used in the scale-space analysis. The approximation of the Hessian determinant represents the blob response of the image at location h. The responses are kept in the form of blob response maps over various scale levels, and the local maxima are detected. Such scale spaces are usually implemented as an image pyramid: the images are repeatedly smoothed with a Gaussian and then sub-sampled to reach the higher pyramid levels. In [62], Lowe subtracts pyramid layers to obtain the difference of Gaussians of the images in order to find blobs and edges. Box filters of any size can be applied at exactly the same speed directly on the original image, so the scale space is instead analyzed by increasing the filter size; the following layers are obtained by filtering the image with gradually larger masks. The resulting scale space is represented as a series of filter response maps obtained by convolving the same input image with filters of increasing size, and each octave is sub-divided into a constant number of scale levels. Because integral images are discrete, the minimal scale difference between two consecutive scales depends on the length l1 of the positive and negative lobes of the second-order partial derivative in the derivation direction (h or p), which is one third of the filter size; for the 9 × 9 filter, l1 is 3. It is necessary to increase the size by a minimum of two pixels to keep the size odd and to ensure the presence of a central pixel in two successive levels. Re-scaling the mask introduces round-off errors; since these errors are smaller than l1, this approximation is acceptable. The construction of the scale space starts with the 9 × 9 filter, which computes the blob response of the image at the smallest scale. Algorithm 4 computes the sum of absolute values in step 1 and checks the polarity and intensities in step 2; |Vx| and |Vy| are computed in step 3, and the sub-region differentials are formed, applied, summed, and scale-factored into the descriptor in steps 4 to 7.
Algorithm 4: Intermediate response and summed operation algorithm.
   Step-1: Compute sum of absolute values
   Step-2: Check polarity intensities
   Step-3: Compute |Vx| and |Vy|
      Where V contains differentials
   Step-4: Form s from Σ Vx and Σ Vy
   Step-5: Apply s to all sub-regions
   Step-6: Sum Ǵ = s1 + s2 + s3 + … + sn
   Step-7: Scale-factor the descriptor to a unit vector
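A minimal sketch of the determinant-of-Hessian blob response over several scales is given below, assuming SciPy; the Gaussian second derivatives stand in for the paper's box-filter approximation, and the scale list and weight w = 0.9 follow common SURF practice as an assumption.

```python
# Sketch of determinant-of-Hessian blob detection (Section 3.7, Equation (15)).
import numpy as np
from scipy import ndimage

def hessian_blob_response(gray, scales=(1.2, 2.0, 3.2, 5.0), w=0.9):
    W = gray.astype(np.float64)
    maps = []
    for s in scales:
        l_hh = s**2 * ndimage.gaussian_filter(W, s, order=(0, 2))  # second derivative in h
        l_pp = s**2 * ndimage.gaussian_filter(W, s, order=(2, 0))  # second derivative in p
        l_hp = s**2 * ndimage.gaussian_filter(W, s, order=(1, 1))  # mixed derivative
        maps.append(l_hh * l_pp - (w * l_hp) ** 2)                 # Equation (15)
    resp = np.stack(maps, axis=0)
    # blobs: local maxima of the response across position and scale
    local_max = (resp == ndimage.maximum_filter(resp, size=(3, 3, 3)))
    return resp, local_max
```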

3.8. Spatial RGB Mapping

The presented approach introduces a new way of representing color channels by their coefficients, presenting the contents of a color image in an efficient and compact way. For this, a color histogram is captured to show the color distribution, and the spatial correlation of color changes is expressed through distance. Let £ be an image whose i colors are quantized as x1, …, xi. A pixel Q = (a, k) is described in Equation (16) [63]:
$$ Q = (a, k) \in £ \tag{16} $$
The distance between pixels Q1 = (a1, k1) and Q2 = (a2, k2) is defined in Equation (17) [63]:
$$ |Q_1 - Q_2| \;\triangleq\; \max\bigl\{\, |a_1 - a_2|,\ |k_1 - k_2| \,\bigr\} \tag{17} $$
In Equation (18) [63], the color histogram Y of an image £ for a color xn is defined as
$$ Y_{x_n}(£) \;\triangleq\; i^2 \cdot \Pr_{Q \in £}\bigl[\, Q \in £_{x_n} \,\bigr] \tag{18} $$
In Equation (18), Yxn(£)/i² gives the probability that a pixel of £ has color xn, where £ is the image and xn is a pixel color. The histogram Y is linear in the image size and can be calculated in O(i²) time. Let a distance P be fixed a priori. The correlogram of the image £ is then defined for color pairs xa, xk and distance P in Equation (19) [63]:
$$ Ή^{(P)}_{x_a, x_k}(£) \;\triangleq\; \Pr_{Q_1 \in £_{x_k},\; Q_2 \in £}\bigl[\, Q_2 \in £_{x_a} \;\big|\; |Q_1 - Q_2| = P \,\bigr] \tag{19} $$
Equation (19) defines the spatial arrangement of color pixels in the image: Ή gives the probability of finding a pixel of color xa at distance P away from a given pixel of color xk, with the distance as in Equation (17). The spatial relationship among similar color values is then given in Equation (20) [63]:
$$ ȵ^{(P)}_{x}(£) \;\triangleq\; Ή^{(P)}_{x, x}(£) \tag{20} $$
Equation (20) follows from Equation (19), where ȵ denotes the probability for a pixel of color x at distance P.
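The sketch below illustrates a quantized color autocorrelogram in the spirit of Equations (18)–(20), assuming NumPy; the 64-bin quantization and the distance set are illustrative, and only horizontal and vertical offsets are compared, which is a simplification of the max-norm distance in Equation (17).

```python
# Sketch of a quantized color autocorrelogram feature (Section 3.8).
import numpy as np

def autocorrelogram(rgb, n_bins=64, distances=(1, 3, 5, 7)):
    # quantize RGB to n_bins colors (4 levels per channel -> 64 bins)
    q = (rgb // 64).astype(np.int32)
    labels = q[..., 0] * 16 + q[..., 1] * 4 + q[..., 2]
    H, W = labels.shape
    feat = np.zeros((len(distances), n_bins))
    for d_idx, P in enumerate(distances):
        same = np.zeros(n_bins)
        total = np.zeros(n_bins)
        # compare each pixel with its horizontally and vertically P-shifted neighbors
        for dy, dx in ((0, P), (P, 0)):
            a = labels[:H - dy, :W - dx]
            b = labels[dy:, dx:]
            np.add.at(total, a.ravel(), 1)
            np.add.at(same, a[a == b].ravel(), 1)
        feat[d_idx] = same / np.maximum(total, 1)  # Pr[neighbor at distance P has the same color]
    return feat.ravel()
```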

3.9. Covariant Selection for Detection

The presented approach produces compact image features; however, to reduce image retrieval time further, the features are represented by their coefficients using PCA. After the different scaling steps, the proposed approach applies an algorithm that reduces the features by means of various mathematical designs, discarding irrelevant components and compressing the representation [48]. Principal component analysis is used to reduce the number of variables, and the resulting coefficients measure a number of uncorrelated factors [64]. PCA is applied when most of the variables measure the same construct [64] and extracts a few synthetic variables, named principal components, which are a sequence of data projections. PCA is employed to compress the data and reduce its dimension by finding the coefficients with the highest variance [64]. Let a vector v of q1 random variables have dimension m, and let the dimension be reduced from q1 to s [65]. Principal component analysis finds the linear combinations a1'v, a2'v, …, ar'v that have maximal variance and are uncorrelated with the preceding ones. To solve this maximization problem, the eigenvectors a1, a2, …, ar of the covariance matrix e corresponding to the s largest eigenvalues are taken. Furthermore, the eigenvalues give the variances of the respective principal components, and the sum of the first r eigenvalues divided by the sum of the variances of all q1 original variables gives the proportion of the variance in the original database explained by the s principal components [65].
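The PCA reduction step can be sketched as follows, assuming scikit-learn; the number of components (256) and the stacked feature matrix are illustrative assumptions rather than the paper's exact configuration.

```python
# Sketch of PCA-based feature reduction (Section 3.9).
from sklearn.decomposition import PCA

def reduce_features(feature_matrix, n_components=256):
    # feature_matrix: one row per image, columns are the concatenated signatures
    pca = PCA(n_components=n_components)
    reduced = pca.fit_transform(feature_matrix)
    explained = pca.explained_variance_ratio_.sum()  # proportion of variance retained
    return reduced, pca, explained
```

The fitted PCA object is kept so that query-image signatures can be projected with the same components at retrieval time.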

3.10. Image Retrieval and Indexing Using BoW

In this step, the bag-of-words (BoW) architecture is used for quick image retrieval and indexing. In the BoW architecture, each image is denoted by a single linear vector. First, the BoW model builds on a local feature descriptor such as the Scale Invariant Feature Transform (SIFT) [16]. Second, the comparison of the single BoW vectors is performed with a dissimilarity score that is simple to compute. The SIFT descriptor represents patches as numerical vectors: SIFT produces equal-sized 128-dimensional vectors, each element representable by one byte, for a compact and effective image representation. The occurrence count of each visual word is represented in the form of a histogram for each image, and an inverted index over these histograms enables efficient image retrieval. Each index entry corresponds to one visual word, where small parts of an object carry versatile information, including shape, color, and texture; visual words capture pixel changes with respect to low-level features, descriptors, and filters. An image identity list is also created to map terms to images. Finally, image ranking counts the number of visual words shared between the indexed images and the query image; the image with the highest number of shared words is ranked at the top. The BoW model does not capture the location, spatial information, or co-occurrence of visual words; in the presented approach, the spatial color extraction technique embeds spatial information into the feature vectors (FV) at feature extraction time, which results in more relevant retrieved images.
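The following sketch shows the BoW vocabulary, histogram, inverted index, and shared-word ranking described above, assuming scikit-learn; the vocabulary size of 500 visual words is an illustrative assumption.

```python
# Sketch of BoW indexing and ranking (Section 3.10).
import numpy as np
from collections import defaultdict
from sklearn.cluster import KMeans

def build_vocabulary(all_descriptors, n_words=500):
    # all_descriptors: stacked local descriptors from the training images
    return KMeans(n_clusters=n_words, n_init=10, random_state=0).fit(all_descriptors)

def bow_histogram(descriptors, vocab):
    words = vocab.predict(descriptors)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(np.float64)
    return hist / max(hist.sum(), 1.0)

def build_inverted_index(histograms):
    index = defaultdict(list)              # visual word -> list of image ids
    for img_id, h in enumerate(histograms):
        for w in np.nonzero(h)[0]:
            index[w].append(img_id)
    return index

def rank_by_shared_words(query_hist, index, n_images):
    votes = np.zeros(n_images)
    for w in np.nonzero(query_hist)[0]:
        for img_id in index.get(w, []):
            votes[img_id] += 1             # count shared visual words
    return np.argsort(-votes)              # highest number of shared words first
```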

3.11. Deep Image Networks

The presented method employs deep convolutional neural network architectures, namely ResNet-50, GoogLeNet, and VGG-19, to check the effectiveness and accuracy of the presented approach. The presented method uses Inception-v1 [4], also called GoogLeNet, a deep lightweight network whose basic concepts are improved performance and efficient computation. The relatively low computational cost of GoogLeNet is a product of two ideas: (1) an optimal CNN with sparsity, as introduced in [66]; and (2) dimension reduction with 1 × 1 convolutional layers, as presented in [67]. The Inception-v1 modules in the GoogLeNet architecture use three filter sizes, 5 × 5, 3 × 3, and 1 × 1, together with a max-pooling layer. For dimension reduction, the 5 × 5 and 3 × 3 filters are preceded by a 1 × 1 convolutional layer, while the max-pooling layer is followed by one. The GoogLeNet architecture is designed for efficient computation with a reduced number of parameters: it has 22 layers when counting layers with parameters and 27 layers when counting pooling layers, and about 100 layers when counting the independent building blocks. Alternatively, the VGG-19 architecture is employed to test its strength, which primarily depends on the CNN model. VGG uses 16 or 19 layers and owes its simplicity to fixed 3 × 3 convolutional layers stacked with increasing depth. In VGG-19, max-pooling layers are applied to reduce the volume size, and two FC layers with 4096 neurons each are applied. In the training phase, the convolutional layers are employed for feature extraction, and max-pooling layers connected to some convolutional layers reduce the feature dimensionality. In the first convolutional layer, 64 kernels are used for feature extraction from the input images, and the fully connected layers are applied to prepare the feature vector (FV). PCA is used for dimensionality reduction and feature selection to obtain better classification results; reducing highly dimensional data with PCA is a significant task. In the testing phase, ten-fold cross-validation is used to categorize the DR images based on the softmax method [43]. The performance of the presented technique using VGG-19 is compared with the other feature extraction models, GoogLeNet and ResNet-50. Moreover, the presented method uses the ResNet-50 architecture, fused with the proposed feature extraction and detection, to achieve maximally accurate results. The presented approach adopts second-level non-linearity. The dimensions of i and Ŗ must be equal in Y = Ŗ(i, {Wj}) + i; when the input and output channels change, a linear projection Wt over the shortcut connection is performed to match the dimensions, as in Equation (21) [5]:
$$ Y \;=\; Ŗ(i, \{W_j\}) \;+\; W_t\, i \tag{21} $$
In Equation (21), the identity mapping addresses the degradation problem and Wt is employed to match the dimensions; Ŗ(i, {Wj}) represents multiple convolutional layers.
Using ResNet-50, GoogLeNet, and VGG-19, the performance of many computer vision applications has been increased, including object detection and recognition. The handcrafted feature vectors are fused with the GoogLeNet, ResNet-50, and VGG-19 generated feature vectors to create a powerful image signature that deeply represents the object and shape features.
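A sketch of the deep-feature extraction and fusion step is shown below, assuming torch/torchvision; the ResNet-50 weights tag, the 224 × 224 ImageNet preprocessing, and the use of the pooled 2048-dimensional output are standard defaults taken as assumptions, not the paper's exact training setup.

```python
# Sketch of Section 3.11: pretrained backbone features fused with the handcrafted signature.
import numpy as np
import torch
from torchvision import models, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

backbone = models.resnet50(weights="IMAGENET1K_V1")
backbone.fc = torch.nn.Identity()   # keep the pooled 2048-d feature vector
backbone.eval()

@torch.no_grad()
def deep_features(pil_image):
    x = preprocess(pil_image).unsqueeze(0)
    return backbone(x).squeeze(0).numpy()

def fused_signature(pil_image, handcrafted_vector):
    # concatenate CNN features with the texture/object/color signature
    return np.concatenate([deep_features(pil_image), handcrafted_vector])
```

The same wrapper can be pointed at VGG-19 or GoogLeNet backbones to compare the three fused signatures.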

4. Experimentation

4.1. Databases

An effective and accurate image retrieval (IR) system must be evaluated on a variety of suitable image databases, since many contributions are essentially domain oriented. Various image datasets are applied according to their object information, complexity, spatial color, generic CBIR usage, versatility, and object occlusion. Experimentation is performed on standardized benchmarks including Cifar-100 [68], Fashion (15) [69], Cifar-10 [68], ALOT (250) [63], Corel-1000 [63], FTVL Tropical Fruits [63], Corel-10000 [63], Oxford Buildings [70], Caltech-256 [71], and 17-Flowers [63]. These challenging databases span a wide range of semantic image groups. The effectiveness and accuracy of the results are affected by image characteristics including occlusion, cluttering, size, quality, object location, color, and overlapping. The selected datasets differ in the number of images and image classes, and the classes contain various object types located in the foreground and background [72].

4.1.1. Input Process

The system takes a query image, which is generally a color image. The color image is converted into 0 to 255 gray levels for the presented algorithm, and this converted image is used as the input to the CNN. The input image is selected from the image datasets: Cifar-100, Corel-1000, Cifar-10, ALOT (250), Corel-10000, Oxford Buildings, Fashion (15), FTVL Tropical Fruits, 17-Flowers, and Caltech-256. For the query image, features are extracted, and the bag-of-words architecture is employed to find and index the k nearest images. The robustness and superiority of the presented approach lie in its ability to classify shapes, colors, textures, and objects. The images are split into testing and training sets with proportions of 30% and 70%, respectively, and random images are selected from each image category using permutation.

4.1.2. Recall and Precision Evaluation

Two metrics, recall and precision, are applied to evaluate the performance accuracy. The true positive ratio is used as recall, and the predicted positive values are employed for precision. Precision is calculated for each image category using Equation (22) [72], and recall is calculated for each image category using Equation (23) [72]:
$$ \text{Precision} \;=\; \frac{E_w(n)}{E_u(m)} \tag{22} $$
$$ \text{Recall} \;=\; \frac{E_w(m)}{E_o} \tag{23} $$
where Ew(m) is the number of related images retrieved for the query image, Eu(m) is the total number of images retrieved for the query, and Eo is the total number of available relevant images.

4.1.3. Mean Average Precision Evaluation

The mean average precision (mAP) is the mean of the average precision computed over all queries. The mAP is described in Equation (24) [73]:
$$ \text{mAP} \;=\; \frac{\sum_{q=1}^{l} E(q) \times rel(q)}{r} \tag{24} $$
In Equation (24), E(q) denotes the average precision of the top q retrieved images; the binary indicator function rel(q) equals one if the q-th retrieved image is related to the current query and zero otherwise; and r and l represent the number of related results for the current query and the total number of retrieved results, respectively.

4.1.4. Average Retrieval Precision Evaluation

The average retrieval precision (ARP) graphs indicate the ARP of the presented approach on the different databases. The ARP is calculated over all categories by applying Equation (25) [63]:
$$ \text{ARP} \;=\; \frac{\sum_{b=1}^{i} AQ_b}{i} \tag{25} $$
Here, AQ is the average precision and b runs over all categories, so the ARP is obtained by averaging AQ over all classes of each dataset. The ARP graphs show the data in order, where each data bar denotes the number of correctly retrieved images regardless of the class, and the x-axis represents the number of categories plotted against AQ. The AQ decreases slowly as the number of classes increases, because a large number of classes yields a large denominator. The ARP is calculated for the Cifar-100, Cifar-10, Corel-1000, Oxford Buildings, ALOT (250), Fashion (15), Tropical Fruits, Corel-10000, Caltech-256, and 17-Flowers databases.

4.1.5. f-Measure Evaluation

The f-measure is calculated as the harmonic mean (HM) of recall (t) and average precision (s), as described in Equation (26) [74]:
$$ f \;=\; \frac{2 \times s \times t}{s + t} \tag{26} $$
Here, f denotes the f-measure in Equation (26), t is the recall, and s is the precision.
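The evaluation metrics of Equations (22)–(26) can be sketched as follows, assuming NumPy; the variable names follow the standard precision/recall definitions rather than the paper's exact symbols.

```python
# Sketch of the retrieval metrics (Sections 4.1.2-4.1.5).
import numpy as np

def precision_recall_f(relevant_retrieved, retrieved, relevant_total):
    precision = relevant_retrieved / max(retrieved, 1)            # Equation (22)
    recall = relevant_retrieved / max(relevant_total, 1)          # Equation (23)
    f = 2 * precision * recall / max(precision + recall, 1e-12)   # Equation (26)
    return precision, recall, f

def average_precision(rel_flags):
    # rel_flags: binary relevance of the ranked retrieved list for one query
    rel_flags = np.asarray(rel_flags, dtype=np.float64)
    hits = np.cumsum(rel_flags)
    precisions = hits / (np.arange(len(rel_flags)) + 1)
    return (precisions * rel_flags).sum() / max(rel_flags.sum(), 1)

def mean_average_precision(per_query_rel_flags):
    # mean over all queries, in the spirit of Equation (24)
    return float(np.mean([average_precision(r) for r in per_query_rel_flags]))
```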

4.2. Discussion and Results

The experimentation is performed on a Core i7 machine with a GPU and 8 GB of RAM. MATLAB R2019a is used for the experiments with the CNN and image processing toolboxes. Strong and versatile deep image and layer functions are integrated in the code base to process the images. Moreover, the computational efficiency and execution time of image retrieval are computed against inverted file indexing and non-hierarchical searching.
To test the efficiency of the presented technique, the experiments are performed on the Cifar-100, Corel-1000, Cifar-10, 17-Flowers, ALOT (250), Corel-10000, Oxford Buildings, Fashion (15), Caltech-256, and Tropical Fruits datasets. The versatility and superiority of the presented approach are tested on three different convolutional neural networks: GoogLeNet, VGG-19, and ResNet-50.

4.2.1. Performance on the Cifar-10 Dataset

This dataset [68] comprises different semantic groups including ships, cats, cars, airplanes, dogs, trucks, horses, birds, deer, and frogs, with 6000 images in every category. The presented technique provides high AP ratios for most of the Cifar-10 classes. Sample images of various categories of the Cifar-10 database are shown in Figure 2.
The images are classified accurately due to the CNN features applied in the presented approach. The Gaussian filters and non-maximal suppression (NMS) combined with deep learning features make it possible to efficiently classify images from a huge range of semantic groups, such as ships, trucks, cars, dogs, horses, frogs, deer, airplanes, and birds, and the presented approach delivers the expected image retrieval results with maximum throughput. At this stage, the computational load is a very important issue; the presented technique manages it using proper image channeling, isotroping, LoG filtering with derivations, NMS, multilevel filtering, scaling, and feature reduction at different stages: first, image channeling in Section 3.1 converts the color image into gray levels; second, isotroping and LoG filtering with derivations are applied as in Section 3.2; third, first ordering, kernel approximation, and recursive filtering are applied as in Section 3.4; and finally, PCA is applied in the covariant selection for detection of Section 3.9.
In Figure 3a, the proposed method shows the AP results for the Cifar-10 database using ResNet-50, VGG-19, and GoogLeNet. Figure 3a reports the highest AP ratios using ResNet-50 for most of the Cifar-10 categories. The presented technique shows above 85% AP rates using features extracted from ResNet-50, above 80% AP ratios using VGG-19, and not less than 70% AP rates using GoogLeNet for most of the semantic groups of the Cifar-10 dataset. The presented approach outperforms on tiny, mimicked, occupied-background, and multiple-object foreground images due to its object recognition capability, and achieves remarkable AP ratios for cluttered, complex, and overlapping objects. The proposed approach shows significant AR results for Cifar-10 in Figure 3b.
In Figure 4a, the presented approach reports outstanding f-measure ratios for large, mimicked, and complex foreground and background images, and provides significant f-measure ratios using ResNet-50, VGG-19, and GoogLeNet. The proposed approach provides 90% mean average precision using ResNet-50, 85% using VGG-19, and 81% using GoogLeNet for Cifar-10. The presented approach reports outstanding ARP ratios using ResNet-50 for the frog, dog, ship, airplane, and bird categories, as shown in Figure 4b, and above 85% average retrieval precision ratios for the other categories, which indicates the significant performance of the proposed technique on the Cifar-10 database. The presented approach also shows better ARP rates using features extracted from VGG-19 and GoogLeNet.

4.2.2. Performance on the Cifar-100 Dataset

This database is similar to the Cifar-10 database in that it uses 32 × 32 color images, but it consists of 100 different categories. The set from [68] comprises different semantic groups including road, bowls, butterfly, rabbit, mountain, lamp, forest, bus, house, elephant, tractor, tiger, willow, clock, motorcycle, person, palm, rocket, etc., and contains 600 images per category. Figure 5 shows various sample images of Cifar-100.
Table 1 shows significant AP and f-measure ratios for the Cifar-100 dataset. The presented approach performs best using ResNet-50, with more than 90% AP for many categories. It is noticed that the proposed approach using VGG-19 shows more than 85% AP in categories including forest, mountain, bus, rocket, butterfly, willow, tiger, person, and elephant. It is also observed that the presented approach using GoogLeNet provides 90% AP in the motorcycle and palm categories, 95% in the truck category, and 92% in the road category. These strengths of the presented approach yield outstanding AP results on Cifar-100.
In Figure 6a, the presented approach reports 98% mean average precision using ResNet-50, 90% using VGG-19, and 81% using GoogLeNet for the Cifar-100 dataset. The presented technique provides the highest average retrieval precision ratios using features extracted from ResNet-50 for most image categories, which indicates its significant performance on the Cifar-100 database, as reported in Figure 6b. The presented method shows above-80% ARP rates using VGG-19 and competitive ARP rates using GoogLeNet.

4.2.3. Performance on the Oxford Building Dataset

This dataset [70] contains 5062 images collected from Flickr by searching for specific Oxford landmarks. The collection is manually classified into 11 different landmarks, and the query set comprises 55 images. The Oxford Buildings dataset is challenging because of cluttered and occluded background objects. Sample images of the Oxford Buildings dataset are shown in Figure 7.
As shown in Table 2, the presented approach provides significant performance using ResNet-50, with more than 85% AP for most of the categories. It is noticed that the presented approach using VGG-19 shows more than 80% AP in categories including Ashmolean, Balliol, Christ Church, Hertford, Jesus, Magdalen, Oriel, Oxford, Pitt Rivers, Radcliffe, and Trinity. It is also observed that the presented technique using GoogLeNet provides remarkable AP rates for most image categories; overall, the presented approach produces outstanding AP results for the Oxford Buildings dataset. The presented approach extracts features based on shape and color, which supports the accuracy of these results, and it reports outstanding AR rates for most Oxford Buildings categories. Table 2 also lists the f-measure results for the Oxford Buildings database, where the presented approach shows f-measures between 19% and 30% for all categories using ResNet-50, VGG-19, and GoogLeNet. In Table 2, the presented approach reports the highest ARP rates using ResNet-50 for most of the categories, which indicates its significant performance on the Oxford Buildings dataset: above 80% ARP using ResNet-50, 75% ARP using features extracted from VGG-19, and 70% ARP using GoogLeNet. The presented approach reports 82% mAP using ResNet-50, 90% using VGG-19, and 78% using GoogLeNet for the Oxford Buildings dataset.

4.2.4. Performance on the ALOT (250) Dataset

The ALOT (250) [63] dataset is a challenging benchmark for image classification and categorization that is especially used for texture image classification, and all of its classes are relevant for content-based image retrieval. To test the versatility and efficiency of the presented technique, it is therefore experimented on this large database of 250 classes. The ALOT dataset [63] comprises 250 classes with 100 samples each, and the images have a resolution of 384 × 235 pixels [63]. Semantically different groups of ALOT include clothes, spices, leaves, cigarettes, sands, vegetables, fruits, stones, sea shells, fabrics, coins and seeds, embossed fabrics, horizontal and vertical lines, bubbles, small repeated patterns, etc. These classes contribute diverse texture information, object shapes, and spatial information for effective image classification. The proposed technique efficiently classifies texture images from the same semantic groups that share large, complex, overlaid textures and similar background and foreground objects. Gaussian filters, multi-scale filters, color coefficients, and L2 normalization are used by the presented model to achieve significant results for images with various textures (a simplified sketch of such a multi-scale texture descriptor follows). The images are efficiently classified by CNN features with partial derivatives, Eigenvalues, and LoG in the presented approach, and multi-scale filtering at various levels and scale spacings is applied to achieve outstanding average precision ratios for different texture images. Most categories of the ALOT dataset contain texture images with the same colors and patterns, whereas other categories consist of various object patterns. Figure 8 shows various sample images of the ALOT (250) database.
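The sketch below shows one way such a multi-scale, L2-normalized texture descriptor could be assembled from per-channel Gaussian and LoG responses; the scales, the pooled statistics, and the descriptor length are assumptions of this illustration rather than the paper's exact filter bank.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_laplace

def multiscale_texture_descriptor(rgb: np.ndarray, sigmas=(1.0, 2.0, 4.0)) -> np.ndarray:
    """Multi-scale Gaussian/LoG responses per color channel, pooled and L2-normalized."""
    features = []
    for channel in range(3):                       # RGB color coefficients, channel by channel
        band = rgb[..., channel].astype(np.float64)
        for sigma in sigmas:                       # scale spacing across the filter bank
            smooth = gaussian_filter(band, sigma=sigma)
            log = (sigma ** 2) * gaussian_laplace(band, sigma=sigma)
            # Pool each response map to its mean and standard deviation (simple texture statistics).
            features += [smooth.mean(), smooth.std(), np.abs(log).mean(), log.std()]
    vec = np.asarray(features)
    return vec / (np.linalg.norm(vec) + 1e-12)     # L2 normalization of the fused descriptor

rng = np.random.default_rng(0)
descriptor = multiscale_texture_descriptor(rng.random((384, 235, 3)))
print(descriptor.shape)   # 3 channels x 3 scales x 4 statistics = (36,)
```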
Table 3 shows remarkable AP and f-measure rates for the ALOT (250) database. The proposed approach shows significant AP results of up to 80% using ResNet-50 for most of the ALOT categories and performs well for many image categories with various shapes and colors. Multiscale filtering, spatial mapping, scale spacing, and RGB coefficients combined with CNN features make it possible to classify images efficiently and effectively. For image classes with varied colors, such as fruits, spices, vegetables, and seeds, the presented technique shows more than 90% AP using ResNet-50, above 85% using VGG-19, and above 80% using GoogLeNet. Image semantic groups for bubble textures, horizontal and vertical lines, small repeated patterns, embossed fabrics, and others are also classified accurately. It is further observed in Table 3 that the presented technique shows f-measure ratios between 18% and 27% for all categories. Moreover, the presented approach reports the highest mean average precision using ResNet-50 for the ALOT (250) dataset: overall, mAP is 89% using ResNet-50, 84% using features extracted from VGG-19, and 78% using GoogLeNet across all categories. The presented method also provides above 85% ARP using ResNet-50, above 80% using VGG-19, and above 70% ARP using features extracted from GoogLeNet for most image categories of the ALOT (250) database.

4.2.5. Performance on the Fashion (15) Dataset

The robustness and versatility of the presented method are also evaluated on the Fashion (15) database. The Fashion (15) database is appropriate for analyzing texture and contains images with different colors, textures, sizes, and shapes. The object classes comprise various fabric kinds, including jersey t-shirt, undergarments, shirt, long dress, robe, blouses, coat jacket, sweater, uniform, cloak, suit, polo-sport shirt, and vest-waistcoat [69]. The fashion database contains more than 260 thousand images with complex, cluttered background and foreground textures. Figure 9 shows sample images of the fashion dataset.
The presented approach performs well for large, cluttered, mimicked, and complex occupied objects due to its object recognition capability, and it provides improved average precision for complex, cluttered, and overlapping objects. The presented technique reports above 85% AP using ResNet-50, above 75% using VGG-19, and above 70% using GoogLeNet in most categories, as shown in Figure 10a, with the outstanding AP results obtained using ResNet-50. The average recall ratios for the Fashion (15) dataset are shown in Figure 10b.
Figure 11a shows the mAP of the presented method for the fashion dataset: 86% using ResNet-50, 81% using VGG-19, and 76% using GoogLeNet across all categories of the Fashion (15) dataset. Figure 11b shows above 85% ARP using features extracted from ResNet-50, above 75% ARP using VGG-19, and above 70% ARP using GoogLeNet for many categories. The presented technique achieves remarkable ARP ratios for the Fashion (15) database by using convolutional Laplacian-scaled object features with mapped color channels to classify and index images efficiently. Many categories of the fashion dataset, such as jersey t-shirt, coat jacket, blouses, robe, uniform, and long dress, show the encouraging performance of the presented technique. Moreover, the presented method reports f-measure ratios between 18% and 29% for all categories using ResNet-50, VGG-19, and GoogLeNet.

4.2.6. Performance on the Corel-1000 Dataset

This database is commonly used for image retrieval and classification [75,76,77]. The Corel-1000 dataset consists of several image classes comprising plain foreground and background images as well as complex and cluttered objects. It covers different semantic groups including buildings, natural scenes, mountains, people, food, buses, animals, and flowers. Figure 12 shows various sample images of Corel-1000.
The AP ratios for the Corel-1000 database are shown in Figure 13a. The presented approach efficiently and effectively classifies images from various groups comprising different occupied backgrounds, blobs, and complex foreground objects. Spatial mappings with CNN features, image scaling, integration, and multilevel filtering make proficient image classification possible. The AP rates on the Corel-1000 database show the dominant performance of the presented approach for most categories, owing to its L2 normalization, scale spacing, RGB coefficients, and spatial mappings (a simplified sketch of this spatial color mapping follows after this paragraph). The presented approach performs best in classes such as horses, flowers, buses, dinosaurs, buildings, food, and mountains. For complex foreground and background categories, including mountains, horses, flowers, and buildings, the presented approach reports above 90% AP using ResNet-50; the buildings and mountains categories show 100% and 99% AP using ResNet-50, respectively. Moreover, all other categories also report more than 77% AP. The AR rates for all categories of the Corel-1000 dataset are shown in Figure 13b.
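As a rough illustration of the spatial mapping and RGB-coefficient ideas credited above, the sketch below builds a grid-based color descriptor, L2-normalizes it, and fuses it with a CNN feature by simple concatenation; the grid size, the mean-pooling choice, and the fusion strategy are assumptions of this example, not the paper's exact formulation.

```python
import numpy as np

def spatial_color_mapping(rgb: np.ndarray, grid=(4, 4)) -> np.ndarray:
    """Grid-based spatial mapping: mean RGB coefficients per cell, L2-normalized."""
    h, w, _ = rgb.shape
    gy, gx = grid
    cells = []
    for i in range(gy):
        for j in range(gx):
            cell = rgb[i * h // gy:(i + 1) * h // gy, j * w // gx:(j + 1) * w // gx]
            cells.append(cell.reshape(-1, 3).mean(axis=0))   # mean R, G, B coefficients of the cell
    vec = np.concatenate(cells)
    return vec / (np.linalg.norm(vec) + 1e-12)

def fuse_descriptors(deep_feature: np.ndarray, spatial_feature: np.ndarray) -> np.ndarray:
    """Concatenate a CNN feature with the spatial color map and renormalize the fused vector."""
    fused = np.concatenate([deep_feature, spatial_feature])
    return fused / (np.linalg.norm(fused) + 1e-12)

rng = np.random.default_rng(0)
spatial = spatial_color_mapping(rng.random((256, 384, 3)))       # 4 x 4 cells x 3 channels = 48 values
fused = fuse_descriptors(rng.random(2048), spatial)              # e.g., paired with a ResNet-50 feature
print(spatial.shape, fused.shape)
```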
Figure 14a shows the mAP of the presented approach for the Corel-1000 dataset: 91% using ResNet-50, 87% using VGG-19, and 82% using GoogLeNet. In Figure 14b, the presented method shows above 87% ARP using ResNet-50, above 82% ARP using VGG-19, and above 75% ARP using GoogLeNet for many categories. The presented approach achieves outstanding ARP ratios for the Corel-1000 dataset by using L2 normalization, color coefficients, and filtering for effective and efficient image classification and indexing. Moreover, the presented approach shows outstanding f-measure ratios using ResNet-50, GoogLeNet, and VGG-19 for the Corel-1000 database.

4.2.7. Performance on the Corel-10000 Dataset

The Corel-10000 database [38] comprises 100 classes, each consisting of 100 images. The dataset covers different semantic groups, including cars, flowers, textures, shining stars, butterflies, humans, flags, trees, planets, ketches, hospitals, text, sunsets, food, animals, etc. Sample images from this dataset are shown in Figure 15.
The presented approach provides significant AP ratios for the Corel-10000 dataset in Figure 16, with AP ranging between 60% and 100%. The presented method reports the highest AP results using ResNet-50, with more than 85% AP for most classes. It is noted that the proposed method using VGG-19 shows above 80% AP in many categories, and the presented technique using GoogLeNet provides above 75% AP in most categories. The deep learning features of the presented technique classify the images of this dataset effectively, and the spatial mapping and multi-scale filtering combined with CNN features make effective and proficient image classification possible. These strengths yield outstanding AP rates on Corel-10000, and the presented method also provides good recall rates for all categories of Corel-10000.
The presented approach reports mAP results in Figure 17a: 88% using ResNet-50, 84% using VGG-19, and 79% using GoogLeNet for all categories of the Corel-10000 dataset. Moreover, in Figure 17b, the presented method shows more than 85% ARP using ResNet-50, above 80% ARP using VGG-19, and above 75% ARP using GoogLeNet for most categories. The presented approach achieves significant ARP ratios for the Corel-10000 dataset by using L2 normalization and RGB color coefficients for efficient and effective image classification and indexing.

4.2.8. Performance on the 17-Flowers Dataset

This dataset [63] contains 80 images per class. The flower images are selected from very common flowers found in the UK and exhibit attributes such as light and pose variations. Some categories of flowers differ in shape but share the same color. Sample images of the 17-Flowers dataset are shown in Figure 18. The 17-Flowers dataset consists of various types of flowers: Lily Valley, Daffodils, Crocus, Snowdrop, Iris, Dandelion, Tulip, Bluebell, Pansy, Buttercup, Sunflower, Tigerlily, Windflower, Daisy, Colts' Foot, Fritillary, and Cowslip.
The presented approach shows remarkable AP rates for the 17-Flowers dataset, as shown in Figure 19a, and performs well for most images of the 17-Flowers classes with various colors, shapes, and textures. Scale spacing, spatial mapping, L2 normalization, and RGB coefficients combined with CNN features make it possible to effectively classify images of flowers. The presented technique reports significant average precision using ResNet-50 in flower categories including Bluebell, Cowslip, Colts' Foot, Daisy, Crocus, Dandelion, Tigerlily, Tulip, Fritillary, Sunflower, Lily Valley, and Windflower. The versatility and superiority of the presented approach lie in differentiating objects depending on their color and texture, which plays the main role in flower classification. Furthermore, the presented approach achieves the highest rates using ResNet-50 in 12 of the 17 flower classes, reports high AP using VGG-19 in the Buttercup, Iris, and Snowdrop categories, and shows high AP using GoogLeNet in the Daffodils and Pansy categories. The Sunflower and Fritillary categories are similar in size and shape but differ in color. In Figure 19b, the presented approach shows f-measure results for 17-Flowers, with f-measures between 18% and 30% for all image categories of the dataset.
The presented approach shows 84% mAP using ResNet-50, 81% using VGG-19, and 77% using GoogLeNet for the 17-Flowers dataset in Figure 20a. In Figure 20b, the presented approach reports the highest ARP rates using ResNet-50 for most of the categories, which indicates its significant performance on the 17-Flowers dataset: above 80% ARP using features extracted from ResNet-50, above 75% ARP using VGG-19, and 70% ARP using GoogLeNet.

4.2.9. Performance on the FTVL Tropical Fruits Dataset

The FTVL tropical fruits database [63] consists of 2612 images of 15 different types of fruits, including Tahiti Lime, Fuji Apple, Cashew, Diamond Peach, Granny Smith Apple, Asterix Potato, Nectarine, Watermelon, Honeydew Melon, Agata Potato, Spanish Pear, Plum, Kiwi, Onion, and Orange. Sample images from the FTVL Tropical Fruits dataset are shown in Figure 21.
In Table 4, the presented approach shows remarkable AP, f-measure, and ARP rates for the FTVL tropical fruits database, and it performs best for most images of the FTVL tropical fruits categories with similar shapes and colors. Spatial mapping, L2 normalization, color coefficients, and multilevel scaling combined with CNN features make it possible to effectively classify images of tropical fruits. The presented technique reports high precision using ResNet-50 in tropical fruit categories including Tahiti Lime, Cashew, Agata Potato, Diamond Peach, Asterix Potato, Nectarine, Fuji Apple, Watermelon, Plum, Onion, and Orange. The superiority of the presented technique lies in differentiating objects based on color, shape, and texture, which plays an important role in the classification of tropical fruits. The presented method shows remarkably high rates using ResNet-50 in 12 of the 15 tropical fruit classes, reports the highest AP using VGG-19 in the Spanish Pear and Granny Smith Apple categories, and shows significant AP using GoogLeNet in the Honeydew Melon category. The presented approach also shows outstanding f-measure results in Table 4 for the FTVL Tropical Fruits database using ResNet-50, VGG-19, and GoogLeNet. Moreover, the presented approach reports the highest ARP rates using ResNet-50 for most of the categories, which indicates its significant performance on the tropical fruits dataset: above 90% ARP using features extracted from ResNet-50, above 85% ARP using VGG-19, and 80% ARP using GoogLeNet. The proposed approach shows 92% mAP using ResNet-50, 88% using VGG-19, and 82% using GoogLeNet for the tropical fruits dataset.

4.2.10. Performance on the Caltech-256 Dataset

The Caltech-256 database [71] comprises more than thirty thousand images allocated to 257 varied categories. Fifteen image categories are used in the experiments: bonsai, wrist watch, back-pack, teddy-bear, cactus, airplane, boxing gloves, teapot, spider, billiards, swan, tomato, bulldozer, tree, and butterfly. All image categories in the database are challenging because of their texture patterns and their background and foreground objects. Sample images of the Caltech-256 dataset are shown in Figure 22.
The AP rates for the Caltech-256 database are shown in Figure 23a. The presented approach provides the highest results for most images of the Caltech-256 categories with similar shapes and colors. Spatial mapping, L2 normalization, RGB coefficients, and scale spacing combined with CNN features make it possible to effectively classify images of Caltech-256, and the presented method reports high precision using ResNet-50 for most of the Caltech-256 categories. The superiority of the presented technique lies in differentiating objects based on color, shape, and texture, which plays an important role in the classification of Caltech-256. The presented approach shows remarkably high rates using ResNet-50 in 13 of the 15 Caltech-256 categories, reports the highest AP using VGG-19 in the wrist watch and airplane categories, and shows significant AP using GoogLeNet in the teddy-bear category. The presented approach provides outstanding recall rates for all categories of Caltech-256 and reports 90% mAP using ResNet-50, 86% using VGG-19, and 80% using GoogLeNet. In Figure 23b, the presented method shows above 89% ARP using ResNet-50, above 82% ARP using VGG-19, and above 75% ARP using GoogLeNet for many categories. The presented approach achieves outstanding ARP ratios for the Caltech-256 dataset by using L2 normalization, color coefficients, and multi-scale filtering for effective and efficient image classification and indexing.

4.2.11. Results of the FTVL Tropical Fruits, Cifar-10, Corel-1000, and 17-Flowers Datasets with State-of-the-Art Methods

To check the accuracy and efficiency of the presented method, it is compared with state-of-the-art methods that have reported outstanding performance.
The FTVL fruit dataset [63] is used for these experiments because of its cropped objects, illumination differences, pose variations, and partial occlusions. Figure 24 depicts the AP rates for the FTVL tropical fruits dataset, where the proposed method is compared with existing methods including CBRFF [63] and the techniques in [78]. The presented method shows significant average precision in most image categories of the FTVL tropical fruits dataset.
The AP ratios of the proposed approach in comparison with state-of-the-art methods are presented in Table 5. Some techniques, such as CDH + SEH, show a low accuracy rate because they fail to handle cropped objects. Other methods incorporate textural attributes and provide reasonable AP, but only on mixed, ambiguous images; objects with the same color and shape remain hard for them to recognize. The presented method takes color coordinates into consideration together with shape and textural properties and reports 0.96 mAP.
The mean average precision of the presented approach for the Cifar-10 dataset is compared graphically with existing research methods in Figure 25. The proposed approach shows remarkable mAP results over state-of-the-art methods for Cifar-10, including ETRCI [76], IRSCTS [79], FFCDIR [80], MSCBIR [81], CBIRCT [82], SPRCNN [83], and PBOMLA [84].
The 17-Flowers database is used for experimentation on shape, texture, and color features. The comparative mean average precision results of the proposed approach and existing research methods are shown in Table 6. The fine-grained technique [85] provides improved ratios by extracting deeper shape and color information. The research method in [86] performs spatial matching and calculates differences based on its algorithmic criteria, which results in low precision for varying colors and shapes. The other existing research methods [86,87,88] rely on linear coding to obtain their mean average precision results and are not capable of deep analysis of shape and texture features. The proposed approach returns outstanding AP in most flower categories by using spatial texture and color patterns along with shape details.
To check the accuracy and effectiveness of the presented approach, its average precision results for the Corel-1000 dataset are compared with state-of-the-art research approaches. The challenging research methods used for comparison are ETRCI [76], DISR [73], CBIF [63], FFECIR [91], MDLBP [75], MNSIR [81], ENNSR [92], CRHOG [93], MCFGM [94], and EIRCTS [79]. Figure 26 gives a graphical illustration of the AP rates of the presented approach compared with these methods. The presented technique shows the highest AP performance in most image categories of the Corel-1000 dataset and reports particularly good AP rates in categories such as buildings, flowers, mountains, and food.
Figure 27 shows the mAP results of the presented approach in comparison with state-of-the-art research methods. The proposed approach reports the highest mAP of 0.91, and CBIF [63] shows the second highest at 0.84. MNSIR [81] and ENNSR [92] report 0.76 mAP, while ETRCI [76], DISR [73], FFECIR [91], MDLBP [75], CRHOG [93], and MCFGM [94] show mean average precision between 0.66 and 0.80. EIRCTS [79] provides the lowest mAP of 0.59; since EIRCTS [79] covers only the textural aspects, its mAP is comparatively lower than that of the other approaches.

5. Conclusions

This research presents a novel interactional fusion of GoogLeNet, VGG-19, and ResNet-50 with an innovative salient anchor collection and detection framework to enhance image retrieval accuracy over large datasets. This deep learning solution advances framework capsulation, feature binding, fusion of primitive features with layered technology, deep feature orientation, and the study of CNN-type effects on the revealed local and global signatures. The remarkable outcomes of extensive experimentation with benchmark architectures endorse the superiority of the presented approach on the highly recognizable ALOT (250), Corel-10000, Cifar-10, Oxford Buildings, FTVL Tropical Fruits, 17-Flowers, Cifar-100, Fashion (15), Corel-1000, and Caltech-256 datasets. A possible extension of this research is the fusion of several of these formulations.

Author Contributions

Conceptualization, K.K. and K.T.A.; methodology, K.K.; writing-original draft preparation, K.K.; investigation, K.K.; data curation, K.K., R.K. and N.A.; formal analysis, K.K.; validation, K.K., L.J. and K.T.A.; resources, L.J.; visualization, K.K., K.T.A. and L.J.; software, K.K.; supervision, L.J.; experimentation, K.K.; implementation, K.K. and K.T.A.; writing-review and editing, L.J. and K.T.A.; project administration, L.J.; funding acquisition, L.J. All authors have read and agreed to the published version of the manuscript.

Funding

This paper is supported by the Chinese Academy of Sciences through the Strategic Priority Research Program (A Class) under Grant XDA19020102.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zhu, X.; Bain, M. B-CNN: Branch convolutional neural network for hierarchical classification. arXiv 2017, arXiv:1709.09890. [Google Scholar]
  2. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–8 December 2012. [Google Scholar]
  3. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  4. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015. [Google Scholar]
  5. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  6. Chetlur, S.; Wool, C.; Vandermersch, P.; Cohen, J.; Tran, J.; Catanzaro, B.; Shelhamer, E. cuDNN: Efficient primitives for deep learning. arXiv 2014, arXiv:1410.0759. [Google Scholar]
  7. Wang, Y.; Wei, G.-Y.; Brooks, D. A systematic methodology for analysis of deep learning hardware and software platforms. In Proceedings of the Machine Learning and Systems, Austin, TX, USA, 2–4 March 2020; pp. 30–43. [Google Scholar]
  8. Canziani, A.; Paszke, A.; Culurciello, E. An analysis of deep neural network models for practical applications. arXiv 2016, arXiv:1605.07678. [Google Scholar]
  9. Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117. [Google Scholar] [CrossRef] [Green Version]
  10. Borji, A.; Cheng, M.; Jiang, H.; Li, J. Salient object detection: A benchmark. IEEE Trans. Image Process. 2015, 24, 5706–5722. [Google Scholar] [CrossRef] [Green Version]
  11. Abdel-Hamid, O.; Mohamed, A.; Jiang, H.; Deng, L.; Penn, G.; Yu, D. Convolutional neural networks for speech recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 2014, 22, 1533–1545. [Google Scholar] [CrossRef] [Green Version]
  12. Johnson, R.; Zhang, T. Semi-supervised convolutional neural networks for text categorization via region embedding. Adv. Neural Inf. Process. Syst. 2015, 28, 919–927. [Google Scholar]
  13. Zhou, W.; Li, H.; Tian, Q. Recent advance in content-based image retrieval: A literature survey. arXiv 2017, arXiv:1706.06064. [Google Scholar]
  14. Kim, Y. Convolutional neural networks for sentence classification. arXiv 2014, arXiv:1408.5882. [Google Scholar]
  15. Gudivada, V.N.; Raghavan, V.V. Content based image retrieval systems. Computer 1995, 28, 18–22. [Google Scholar] [CrossRef] [Green Version]
  16. Lowe, D.G. Object recognition from local scale-invariant features. In Proceedings of the 7th IEEE International Conference on Computer Vision, Corfu, Greece, 25 September 1999. [Google Scholar]
  17. Massaoudi, M.; Bahroun, S.; Zagrouba, E. Video summarization based on local features. arXiv 2017, arXiv:8086943461. [Google Scholar]
  18. Abubakar, F.M. A study of region-based and contour-based image segmentation. Signal Image Process. 2012, 3, 15. [Google Scholar]
  19. Friedman, N.; Russell, S. Image segmentation in video sequences: A probabilistic approach. arXiv 2013, arXiv:1302.1539. [Google Scholar]
  20. Kamdi, S.; Krishna, R. Image segmentation and region growing algorithm. Int. J. Comput. Technol. Electron. Eng. (IJCTEE) 2012, 2, 103–107. [Google Scholar]
  21. Liu, J.; Leung, S. A splitting algorithm for image segmentation on manifolds represented by the grid based particle method. J. Sci. Comput. 2013, 56, 243–266. [Google Scholar] [CrossRef]
  22. Dhanachandra, N.; Manglem, K.; Chanu, Y.J. Image segmentation using K-means clustering algorithm and subtractive clustering algorithm. Procedia Comput. Sci. 2015, 54, 764–771. [Google Scholar] [CrossRef] [Green Version]
  23. Amoda, N.; Kulkarni, R.K. Image segmentation and detection using watershed transform and region based image retrieval. Int. J. Emerg. Trends Technol. Comput. Sci. 2013, 2, 89–94. [Google Scholar]
  24. Szczypiński, P.; Klepaczko, A.; Pazurek, M.; Daniel, P. Texture and color based image segmentation and pathology detection in capsule endoscopy videos. Comput. Methods Programs Biomed. 2014, 113, 396–411. [Google Scholar] [CrossRef]
  25. Rasoulian, A.; Rohling, R.; Abolmaesumi, P. Lumbar spine segmentation using a statistical multi-vertebrae anatomical shape+pose model. IEEE Trans. Med. Imaging 2013, 32, 1890–1900. [Google Scholar] [CrossRef]
  26. Tang, M.; Djelouah, A.; Perazzi, F.; Boykov, Y.; Schroers, C. Normalized cut loss for weakly-supervised cnn segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar]
  27. Kadir, A.; Nugroho, L.; Susanto, A.; Santosa, P. Leaf classification using shape, color, and texture features. arXiv 2013, arXiv:1401.4447. [Google Scholar]
  28. Lin, S.; Crotty, K.M.; Vazquez, N. Shape Feature Extraction and Classification. U.S. Patent 7,668,376 B2, 23 February 2010. [Google Scholar]
  29. Riaz, F.; Hassan, A.; Rehman, S.; Qamar, U. Texture classification using rotation-and scale-invariant gabor texture features. IEEE Signal Process. Lett. 2013, 20, 607–610. [Google Scholar] [CrossRef]
  30. Rippel, O.; Snoek, J.; Adams, R.P. Spectral representations for convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 11 June 2015. [Google Scholar]
  31. Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; Fergus, R. Intriguing properties of neural networks. arXiv 2013, arXiv:1312.6199. [Google Scholar]
  32. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
  33. Kuo, C.-H.; Chou, Y.-H.; Chang, P.-C. Using deep convolutional neural networks for image retrieval. Electron. Imaging 2016, 2016, 1–6. [Google Scholar] [CrossRef] [Green Version]
  34. Chan, T.-H.; Jia, K.; Gao, S.; Lu, J.; Zeng, Z.; Ma, Y. PCANet: A simple deep learning baseline for image classification? IEEE Trans. Image Process. 2015, 24, 5017–5032. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  35. Xia, R.; Pan, Y.; Lai, H.; Liu, C.; Yan, S. Supervised hashing for image retrieval via image representation learning. In Proceedings of the Twenty-eighth AAAI Conference on Artificial Intelligence, Quebec City, QC, Canada, 27–31 July 2014. [Google Scholar]
  36. Lin, K.; Yang, H.; Hsiao, J.; Chen, C. Deep learning of binary hash codes for fast image retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Boston, MA, USA, 7–12 June 2015. [Google Scholar]
  37. Gianfelici, F.; Biagetti, G.; Crippa, P.; Turchetti, C. Novel KLT algorithm optimized for small signal sets [speech processing applications]. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’05), Philadelphia, PA, USA, 18–23 March 2005. [Google Scholar]
  38. Kanwal, K.; Ahmad, K.; Khan, R.; Abbasi, A. Deep Learning Using Symmetry, FAST Scores, Shape-Based Filtering and Spatial Mapping Integrated with CNN for Large Scale Image Retrieval. Symmetry 2020, 12, 612. [Google Scholar] [CrossRef]
  39. Lavinia, Y.; Vo, H.H.; Verma, A. Fusion based deep CNN for improved large-scale image action recognition. In Proceedings of the 2016 IEEE International Symposium on Multimedia (ISM), San Jose, CA, USA, 11–13 December 2016. [Google Scholar]
  40. Abdallah, F.B.; Feki, G.; Ammar, A.; Amar, C. Multilevel deep learning-based processing for lifelog image retrieval enhancement. In Proceedings of the 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Miyazaki, Japan, 7–10 October 2018. [Google Scholar]
  41. Yamamoto, S.; Nishimura, T.; Akagi, Y.; Takimoto, Y.; Inoue, T.; Toda, H. Pbg at the ntcir-13 lifelog-2 lat, lsat, and lest tasks. In Proceedings of the NTCIR-13, Tokyo, Japan, 5–8 December 2017. [Google Scholar]
  42. Sapijaszko, G.; Mikhael, W.B. An Overview of Recent Convolutional Neural Network Algorithms for Image Recognition. In Proceedings of the 2018 IEEE 61st International Midwest Symposium on Circuits and Systems (MWSCAS), Windsor, ON, Canada, 5–8 August 2018. [Google Scholar]
  43. Mateen, M.; Wen, J.; Song, S.; Huang, Z. Fundus image classification using VGG-19 architecture with PCA and SVD. Symmetry 2019, 11, 1. [Google Scholar] [CrossRef] [Green Version]
  44. Liu, L.; Chen, J.; Fieguth, P.; Zhao, G.; Chellappa, R.; Pietikäinen, M. From BoW to CNN: Two decades of texture representation for texture classification. Int. J. Comput. Vis. 2019, 127, 74–109. [Google Scholar] [CrossRef] [Green Version]
  45. Srinivas, S.; Sarvadevabhatla, R.; Mopuri, K.; Prabhu, N.; Kruthiventi, S.; Babu, R. A taxonomy of deep convolutional neural nets for computer vision. Front. Robot. AI 2016, 2, 36. [Google Scholar] [CrossRef]
  46. Varga, D.; Szirányi, T. Fast content-based image retrieval using convolutional neural network and hash function. In Proceedings of the 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Budapest, Hungary, 9–12 October 2016. [Google Scholar]
  47. Sezavar, A.; Farsi, H.; Mohamadzadeh, S. Content-based image retrieval by combining convolutional neural networks and sparse representation. Multimed. Tools Appl. 2019, 78, 20895–20912. [Google Scholar] [CrossRef]
  48. Burghouts, G.J.; Geusebroek, J.-M. Material-specific adaptation of color invariant features. Pattern Recognit. Lett. 2009, 30, 306–313. [Google Scholar] [CrossRef]
  49. Guo, Z.; Zhang, L.; Zhang, D. Rotation invariant texture classification using binary filter response pattern (BFRP). In Proceedings of the International Conference on Computer Analysis of Images and Patterns, Munster, Germany, 2–4 September 2009. [Google Scholar]
  50. Varma, M.; Zisserman, A. Classifying images of materials: Achieving viewpoint and illumination independence. In Proceedings of the European Conference on Computer Vision, Copenhagen, Denmark, 28–31 May 2002. [Google Scholar]
  51. Varma, M.; Zisserman, A. Texture classification: Are filter banks necessary? In Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, 18–20 June 2003. [Google Scholar]
  52. Basu, M. Gaussian-based edge-detection methods-a survey. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 2002, 32, 252–260. [Google Scholar] [CrossRef] [Green Version]
  53. Mainali, P.; Yang, Q.; Lafruit, G.; Van Gool, L.; Lauwereins, R. Robust low complexity corner detector. IEEE Trans.Circuits Syst. Video Technol. 2011, 21, 435–445. [Google Scholar] [CrossRef]
  54. Deriche, R. Recursively Implementing the Gaussian and Its Derivatives. Ph.D. Thesis, Institut National de Recherche en Informatique et en Automatique (INRIA), Le Chesnay-Rocquencourt, France, 1993. [Google Scholar]
  55. Bay, H.; Ess, A.; Tuytelaars, T.; Van Gool, L. Speeded-up robust features (SURF). Comput. Vision Image Underst. 2008, 110, 346–359. [Google Scholar] [CrossRef]
  56. Tomasi, C.; Kanade, T. Detection and Tracking of Point Features; CMU-CS-91-132; School of Computer Science—Carnegie Mellon University: Pittsburgh, PA, USA, 1991. [Google Scholar]
  57. Birchfield, S. KLT: An Implementation of the Kanade-Lucas-Tomasi Feature Tracker. 2007. Available online: http://www.ces.clemson.edu/~stb/klt/ (accessed on 2 December 2020).
  58. Sinha, S.N.; Frahm, J.; Pollefeys, M.; Genc, Y. GPU-based video feature tracking and matching. In Proceedings of the EDGE, Workshop on Edge Computing Using New Commodity Architectures, Chapel Hill, NC, USA, 23–26 May 2006. [Google Scholar]
  59. Harris, C.; Stephens, M. A combined corner and edge detection. In Proceedings of the Fourth Alvey Vision Conference, Manchester, UK, 31 August–2 September 1988. [Google Scholar]
  60. Mainali, P.; Yang, Q.; Lafruit, G.; Lauwereins, R.; Van Gool, L. Lococo: Low complexity corner detector. In Proceedings of the 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, TX, USA, 14–19 March 2010. [Google Scholar]
  61. Koenderink, J.J. The structure of images. Biol. Cybern. 1984, 50, 363–370. [Google Scholar] [CrossRef]
  62. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
  63. Ahmed, K.T.; Ummesafi, S.; Iqbal, A. Content based image retrieval using image features information fusion. Inf. Fusion 2019, 51, 76–99. [Google Scholar] [CrossRef]
  64. Biagetti, G.; Crippa, P.; Falaschetti, L.; Orcioni, S.; Turchetti, C. Multivariate direction scoring for dimensionality reduction in classification problems. In Proceedings of the International Conference on Intelligent Decision Technologies, Puerto de la Cruz, Spain, 15–17 June 2016. [Google Scholar]
  65. Calonder, M.; Lepetit, V.; Strecha, C.; Fua, P. Brief: Binary robust independent elementary features. In Proceedings of the European Conference on Computer Vision, Crete, Greece, 5–11 September 2010. [Google Scholar]
  66. Arora, S.; Bhaskara, A.; Ge, R.; Ma, T. Provable bounds for learning some deep representations. In Proceedings of the International Conference on Machine Learning, Beijing, China, 21–26 June 2014. [Google Scholar]
  67. Zhang, J.; Yu, J.; Tao, D. Local deep-feature alignment for unsupervised dimension reduction. IEEE Trans. Image Process. 2018, 27, 2420–2432. [Google Scholar] [CrossRef] [Green Version]
  68. Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images. Comput. Sci. Available online: https://www.semanticscholar.org/paper/Learning-Multiple-Layers-of-Features-from-Tiny-Krizhevsky/5d90f06bb70a0a3dced62413346235c02b1aa086 (accessed on 2 December 2020).
  69. Rostamzadeh, N.; Hosseini, S.; Boquet, T.; Stokowiec, W.; Zhang, Y.; Jauvin, C.; Pal, C. Fashion-gen: The generative fashion dataset and challenge. arXiv 2018, arXiv:1806.08317. [Google Scholar]
  70. Philbin, J.; Sivic, J.; Zisserman, A. Geometric latent dirichlet allocation on a matching graph for large-scale image datasets. Int. J. Comput. Vis. 2011, 95, 138–153. [Google Scholar] [CrossRef] [Green Version]
  71. Griffin, G.; Holub, A.; Perona, P. Caltech-256 Object Category Dataset. 7694. 19 April 2007. Available online: https://authors.library.caltech.edu/7694/ (accessed on 11 January 2021).
  72. Ahmed, K.T.; Irtaza, A.; Iqbal, M.A. Fusion of local and global features for effective image extraction. Appl. Intell. 2017, 47, 526–543. [Google Scholar] [CrossRef]
  73. Ahmed, K.T.; Ummesafi, S.; Iqbal, A. Deep Image Sensing and Retrieval Using Suppression, Scale Spacing and Division, Interpolation and Spatial Color Coordinates With Bag of Words for Large and Complex Datasets. IEEE Access. 2020, 8, 90351–90379. [Google Scholar] [CrossRef]
  74. Kandefer, M.; Shapiro, S. An F-measure for context-based information retrieval. Commonsense 2009, 79–84. [Google Scholar]
  75. Dubey, S.R.; Singh, S.K.; Singh, R.K. Multichannel decoded local binary patterns for content-based image retrieval. IEEE Trans. Image Process. 2016, 25, 4018–4032. [Google Scholar] [CrossRef] [PubMed]
  76. Shrivastava, N.; Tyagi, V. An efficient technique for retrieval of color images in large databases. Comput. Electr. Eng. 2015, 46, 314–327. [Google Scholar] [CrossRef]
  77. Zhou, Y.; Zeng, F.Z.; Zhao, H.M.; Murray, P.; Ren, J. Hierarchical visual perception and two-dimensional compressive sensing for effective content-based color image retrieval. Cogn. Comput. 2016, 8, 877–889. [Google Scholar] [CrossRef] [Green Version]
  78. Dubey, S.R.; Jalal, A.S. Fruit and vegetable recognition by fusing colour and texture features of the image using machine learning. Int. J. Appl. Pattern Recogn. 2015, 2, 160–181. [Google Scholar] [CrossRef]
  79. Wang, X.-Y.; Yu, Y.-J.; Yang, H.-Y. An effective image retrieval scheme using color, texture and shape features. Comput. Stand. Interfaces 2011, 33, 59–68. [Google Scholar] [CrossRef]
  80. Raghuwanshi, G.; Tyagi, V. Feed-forward content based image retrieval using adaptive tetrolet transforms. Multimed. Tools Appl. 2018, 77, 23389–23410. [Google Scholar] [CrossRef]
  81. ElAlami, M.E. A new matching strategy for content based image retrieval system. Appl. Soft Comput. 2014, 14, 407–418. [Google Scholar] [CrossRef]
  82. Lin, C.-H.; Chen, R.-T.; Chan, Y.-K. A smart content-based image retrieval system based on color and texture feature. Image Vis. Comput. 2009, 27, 658–665. [Google Scholar] [CrossRef]
  83. Zeiler, M.D.; Fergus, R. Stochastic pooling for regularization of deep convolutional neural networks. arXiv 2013, arXiv:1301.3557. [Google Scholar]
  84. Snoek, J.; Larochelle, H.; Adams, R.P. Practical bayesian optimization of machine learning algorithms. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012. [Google Scholar]
  85. Gao, S.; Tsang, I.W.-H.; Ma, Y. Learning category-specific dictionary and shared dictionary for fine-grained image categorization. IEEE Trans. Image Process. 2013, 23, 623–634. [Google Scholar] [PubMed]
  86. Yang, J.; Yu, K.; Gong, Y.; Huang, T. Linear spatial pyramid matching using sparse coding for image classification. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, USA, 20–25 June 2009. [Google Scholar]
  87. Wang, J.; Yang, J.; Yu, K.; Lv, F.; Huang, T.; Gong, Y. Locality-constrained linear coding for image classification. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010. [Google Scholar]
  88. Zhou, N.; Fan, J. Jointly learning visually correlated dictionaries for large-scale visual recognition applications. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 36, 715–730. [Google Scholar] [CrossRef]
  89. Gehler, P.; Nowozin, S. On feature combination for multiclass object classification. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 27 September–4 October 2009. [Google Scholar]
  90. Kong, S.; Wang, D. A dictionary learning approach for classification: Separating the particularity and the commonality. In Proceedings of the European Conference on Computer Vision, Florence, Italy, 7–13 October 2012. [Google Scholar]
  91. Walia, E.; Pal, A. Fusion framework for effective color image retrieval. J. Vis. Commun. Image Represent. 2014, 25, 1335–1348. [Google Scholar] [CrossRef]
  92. Irtaza, A.; Jaffar, M.; Aleisa, E.; Choi, T. Embedding neural networks for semantic association in content based image retrieval. Multimed. Tools Appl. 2014, 72, 1911–1931. [Google Scholar] [CrossRef]
  93. Pan, S.; Sun, S.; Yang, L.; Duan, F.; Guan, A. Content retrieval algorithm based on improved HOG. In Proceedings of the 2015 3rd International Conference on Applied Computing and Information Technology/2nd International Conference on Computational Science and Intelligence, Okayama, Japan, 12–16 July 2015. [Google Scholar]
  94. Xiao, Y.; Wu, J.; Yuan, J. mCENTRIST: A multi-channel feature generation mechanism for scene categorization. IEEE Trans. Image Process. 2013, 23, 823–836. [Google Scholar] [CrossRef] [Green Version]
Figure 1. Step-by-step process of object detection.
Figure 2. Various sample images for cifar-10 dataset [68].
Figure 3. (a) Average precision for cifar-10 dataset; (b) recall for cifar-10 dataset.
Figure 4. (a) f-measure results for cifar-10 dataset; (b) Average retrieval precision for cifar-10 dataset.
Figure 5. Various sample images for cifar-100 dataset [68].
Figure 6. (a) Mean average precision for cifar-100 database; (b) Average retrieval precision for cifar-100 dataset.
Figure 7. Various sample images of oxford buildings dataset [70].
Figure 8. Various sample images for ALOT (250) [63].
Figure 9. Various sample images for fashion (15) database [69].
Figure 10. (a) Average precision for fashion (15) database; (b) Recall for fashion (15) database.
Figure 11. (a) Mean average precision of fashion (15) database; (b) Average retrieval precision of fashion (15) database.
Figure 12. Various sample images for Corel-1000 database [38].
Figure 13. (a) Average precision of Corel-1000 database; (b) Recall of Corel-1000 database.
Figure 14. (a) Mean average precision of Corel-1000 database; (b) Average retrieval precision for Corel-1000 database.
Figure 15. Various sample images for corel-10000 dataset [38].
Figure 16. Average precision of corel-10000 database.
Figure 17. (a) Mean average precision for the corel-10000 database; (b) Average retrieval precision for the corel-10000 database.
Figure 18. Various sample images for 17-flowers dataset [63].
Figure 19. (a) Average precision for 17-flowers dataset; (b) F-measure results for 17-flowers dataset.
Figure 20. (a) Mean average precision for 17-flowers dataset; (b) Average retrieval precision for 17-flowers dataset.
Figure 21. Various sample images from FTVL Tropical Fruits dataset [63].
Figure 22. Various sample images of Caltech-256 dataset [71].
Figure 23. (a) Average precision for Caltech-256 dataset; (b) Average retrieval precision of Caltech-256 database.
Figure 24. AP results of presented approach versus existing research methods for FTVL tropical fruits dataset.
Figure 25. AP results of presented approach versus existing research methods for Cifar-10 dataset.
Figure 26. AP results of presented approach versus existing research methods for Corel-1000 dataset.
Figure 27. mAP results of presented approach versus existing research methods for Corel-1000 dataset.
Table 1. Average precision and f-measure ratios of Cifar-100 dataset.
Cifar-100 Dataset (Average Precision and F-Measure)
Category | ResNet-50 (AP, F) | VGG-19 (AP, F) | GoogLeNet (AP, F) | Category | ResNet-50 (AP, F) | VGG-19 (AP, F) | GoogLeNet (AP, F)
10.980.180.900.200.850.21510.850.210.800.220.750.23
20.750.230.720.230.600.26520.960.190.920.190.850.21
30.920.190.960.200.800.22530.800.220.850.230.700.24
40.860.200.830.210.800.22540.940.190.900.200.840.21
50.720.230.750.230.600.26550.850.210.830.210.750.23
60.980.180.960.190.900.20560.980.180.950.190.900.20
70.970.190.900.200.880.20570.970.190.940.190.890.20
80.920.190.880.200.850.21580.900.200.870.200.850.21
90.880.200.800.220.900.22590.880.200.850.210.900.20
100.800.220.750.230.700.24600.950.190.900.200.880.20
110.860.200.880.200.800.22610.820.210.880.200.780.22
120.950.190.860.200.800.22620.860.200.800.220.820.21
130.900.200.840.210.780.22630.890.200.900.200.800.22
140.700.240.630.250.610.26640.900.200.870.200.810.21
150.830.210.780.220.700.24650.930.190.900.200.860.20
160.650.250.700.240.600.26660.920.190.880.200.850.21
170.700.240.750.230.700.24670.950.190.900.200.880.20
180.880.200.800.220.800.22680.880.200.860.200.900.21
190.900.200.930.210.700.24690.880.200.840.210.800.22
200.960.190.950.190.890.20700.830.210.800.220.700.24
210.830.210.780.220.700.24711.000.180.950.190.920.19
220.980.180.960.190.900.20720.830.210.810.210.800.22
230.830.210.810.210.850.21730.920.190.900.200.800.22
240.980.180.950.190.900.20740.980.180.900.200.800.22
250.920.190.940.190.830.21750.900.200.870.200.800.22
260.850.210.800.220.800.22760.950.190.970.200.800.22
270.800.220.780.220.700.24770.980.180.950.190.900.20
280.750.230.700.240.600.26780.800.220.700.240.650.25
291.000.180.950.190.900.20790.930.190.970.190.900.20
300.990.180.920.190.880.20800.980.180.940.190.890.20
310.760.220.800.220.720.23810.750.230.700.240.600.26
320.960.190.880.200.920.19821.000.180.980.180.930.19
330.940.190.900.200.950.19830.920.190.900.200.880.20
340.850.210.800.220.750.23840.820.210.750.230.650.25
350.970.190.900.200.860.20850.960.190.880.200.820.21
361.000.180.940.190.880.20860.920.190.800.220.880.20
370.800.220.820.210.750.23870.980.180.950.190.900.20
380.870.200.850.210.800.22880.940.190.900.200.880.20
390.880.200.800.220.830.21890.930.190.960.190.900.20
400.980.180.900.200.890.20900.920.190.850.210.800.22
410.850.210.880.200.840.21910.850.210.820.210.820.21
420.920.190.850.210.820.21920.800.220.700.240.660.25
430.970.190.950.190.880.20931.000.180.960.190.900.20
440.960.190.920.190.900.20940.800.220.750.230.750.23
450.920.190.960.190.890.20950.960.190.800.220.900.20
461.000.180.980.180.920.19960.900.200.820.210.920.20
470.980.180.900.200.850.21970.950.190.900.200.880.20
480.940.190.900.200.880.20981.000.180.980.180.900.20
490.890.200.870.200.850.21990.900.200.920.210.800.22
500.750.230.720.230.700.241000.980.180.900.200.860.20
Table 2. Average precision, F-measure, and ARP ratios for Oxford buildings dataset.
Oxford Buildings Dataset (Average Precision, F-Measure, and ARP)
Category | ResNet-50 (AP, F, ARP) | VGG-19 (AP, F, ARP) | GoogLeNet (AP, F, ARP)
All soul | 0.65, 0.20, 0.65 | 0.60, 0.26, 0.60 | 0.58, 0.27, 0.58
Ashmolean | 0.88, 0.21, 0.77 | 0.85, 0.21, 0.73 | 0.80, 0.22, 0.69
Balliol | 0.82, 0.25, 0.78 | 0.75, 0.23, 0.73 | 0.70, 0.24, 0.69
Bodleian | 0.64, 0.19, 0.75 | 0.60, 0.26, 0.70 | 0.50, 0.29, 0.65
Christ Church | 0.95, 0.23, 0.79 | 0.90, 0.20, 0.74 | 0.87, 0.20, 0.69
Corner market | 0.75, 0.20, 0.78 | 0.72, 0.23, 0.74 | 0.70, 0.24, 0.69
Hertford | 0.88, 0.19, 0.80 | 0.82, 0.21, 0.75 | 0.90, 0.20, 0.72
Jesus | 0.96, 0.22, 0.82 | 0.90, 0.20, 0.77 | 0.88, 0.20, 0.74
Keble | 0.76, 0.18, 0.81 | 0.75, 0.23, 0.77 | 0.70, 0.24, 0.74
Magdalen | 1.00, 0.22, 0.83 | 0.97, 0.19, 0.79 | 0.90, 0.20, 0.75
New | 0.80, 0.20, 0.83 | 0.75, 0.23, 0.78 | 0.65, 0.25, 0.74
Oriel | 0.86, 0.21, 0.83 | 0.90, 0.20, 0.79 | 0.83, 0.21, 0.75
Oxford | 0.85, 0.19, 0.83 | 0.88, 0.20, 0.80 | 0.86, 0.20, 0.76
Pitt rivers | 0.94, 0.20, 0.84 | 0.94, 0.19, 0.91 | 0.90, 0.20, 0.77
Radcliffe | 0.90, 0.18, 0.84 | 0.95, 0.19, 0.82 | 0.88, 0.20, 0.78
Trinity | 0.98, 0.22, 0.85 | 0.92, 0.19, 0.83 | 0.84, 0.21, 0.78
Worcester | 0.80, 0.20, 0.85 | 0.78, 0.22, 0.82 | 0.70, 0.24, 0.78
Table 3. Average precision and F-measure ratios for ALOT (250) dataset.
ALOT (250) Dataset (Average Precision and F-Measure)
Bubble Textures
Category | ResNet-50 (AP, F) | VGG-19 (AP, F) | GoogLeNet (AP, F) | Category | ResNet-50 (AP, F) | VGG-19 (AP, F) | GoogLeNet (AP, F)
10.800.190.900.200.850.21120.920.180.900.190.800.20
20.940.220.750.230.720.23130.980.200.960.200.900.21
30.850.180.900.200.820.21140.910.220.880.240.830.25
40.980.190.950.190.900.20150.800.200.700.190.650.20
50.970.190.880.200.800.22160.900.220.920.240.900.26
60.900.210.860.200.840.21170.760.200.700.200.600.22
70.880.200.820.210.700.24180.880.220.860.220.800.24
80.950.190.900.200.800.22190.800.210.760.220.700.22
90.820.190.890.200.800.22200.850.200.800.210.780.22
100.860.190.900.200.780.22210.890.180.820.190.800.20
110.890.240.600.260.550.27
Stone Textures
Category | ResNet-50 (AP, F) | VGG-19 (AP, F) | GoogLeNet (AP, F) | Category | ResNet-50 (AP, F) | VGG-19 (AP, F) | GoogLeNet (AP, F)
10.700.240.650.250.600.26140.920.190.880.200.800.22
20.850.210.800.220.860.20150.980.180.930.190.900.20
30.650.250.600.260.500.29160.830.210.800.220.850.21
40.800.220.760.220.700.24170.960.190.920.190.900.20
50.900.200.860.200.800.22180.880.200.860.200.820.21
60.900.200.820.210.750.23190.920.190.870.200.800.22
70.600.260.550.270.500.29200.830.210.750.230.700.24
80.880.200.800.220.740.23210.850.210.800.220.780.22
90.870.200.800.220.730.23220.860.200.830.210.800.22
100.980.180.890.200.880.20230.900.200.920.190.800.22
110.860.200.820.210.700.24240.860.200.800.220.770.22
120.860.200.800.220.750.23250.800.220.760.220.720.23
130.940.190.900.200.870.20260.880.200.830.210.800.22
Leaf Textures
Category | ResNet-50 (AP, F) | VGG-19 (AP, F) | GoogLeNet (AP, F) | Category | ResNet-50 (AP, F) | VGG-19 (AP, F) | GoogLeNet (AP, F)
10.760.220.720.230.680.24150.760.220.720.230.700.24
20.930.190.880.200.850.21160.800.220.780.220.700.24
30.960.190.910.200.830.21170.910.200.880.200.800.22
40.800.220.740.230.700.24180.880.200.800.220.760.22
50.950.190.900.200.800.22190.900.200.870.200.840.21
60.860.200.810.210.760.22200.870.200.820.210.800.22
70.900.200.880.200.830.21210.860.200.800.220.790.22
80.800.220.750.230.700.24220.800.220.700.240.630.25
90.960.190.900.200.800.22230.900.200.880.200.860.20
100.980.180.920.190.880.20240.910.200.800.220.780.22
110.870.200.810.210.780.22250.960.190.900.200.800.22
120.920.190.900.200.830.21260.980.180.880.200.830.21
130.970.190.920.190.880.20270.890.200.800.220.760.22
140.960.190.940.190.920.19280.800.220.780.220.750.23
Fabric Textures
Category | ResNet-50 (P/F) | VGG-19 (P/F) | GoogLeNet (P/F) || Category | ResNet-50 (P/F) | VGG-19 (P/F) | GoogLeNet (P/F)
1 | 0.90/0.22 | 0.83/0.21 | 0.80/0.22 || 19 | 0.95/0.19 | 0.92/0.19 | 0.82/0.21
2 | 0.96/0.21 | 0.90/0.20 | 0.82/0.21 || 20 | 0.98/0.18 | 0.95/0.19 | 0.90/0.20
3 | 0.98/0.20 | 0.90/0.20 | 0.88/0.20 || 21 | 0.80/0.22 | 0.75/0.23 | 0.70/0.24
4 | 0.94/0.21 | 0.88/0.20 | 0.84/0.21 || 22 | 0.75/0.23 | 0.70/0.24 | 0.60/0.26
5 | 0.80/0.24 | 0.75/0.23 | 0.67/0.24 || 23 | 0.90/0.20 | 0.88/0.20 | 0.88/0.20
6 | 0.89/0.22 | 0.84/0.21 | 0.80/0.22 || 24 | 0.98/0.18 | 0.90/0.20 | 0.82/0.21
7 | 0.86/0.24 | 0.80/0.22 | 0.70/0.24 || 25 | 0.95/0.19 | 0.90/0.20 | 0.88/0.20
8 | 0.92/0.22 | 0.86/0.20 | 0.80/0.22 || 26 | 0.90/0.20 | 0.88/0.20 | 0.81/0.21
9 | 0.88/0.24 | 0.82/0.21 | 0.70/0.24 || 27 | 0.98/0.18 | 0.95/0.19 | 0.90/0.20
10 | 0.98/0.22 | 0.89/0.20 | 0.80/0.22 || 28 | 0.94/0.19 | 0.90/0.20 | 0.80/0.22
11 | 0.80/0.22 | 0.82/0.21 | 0.80/0.22 || 29 | 0.96/0.19 | 0.92/0.19 | 0.88/0.20
12 | 0.92/0.21 | 0.90/0.20 | 0.83/0.21 || 30 | 0.86/0.20 | 0.80/0.22 | 0.74/0.23
13 | 0.98/0.22 | 0.90/0.20 | 0.80/0.22 || 31 | 0.88/0.20 | 0.85/0.21 | 0.80/0.22
14 | 0.96/0.21 | 0.92/0.19 | 0.82/0.21 || 32 | 0.92/0.19 | 0.90/0.20 | 0.80/0.22
15 | 0.94/0.20 | 0.91/0.20 | 0.90/0.20 || 33 | 0.98/0.18 | 0.92/0.19 | 0.82/0.21
16 | 0.90/0.22 | 0.86/0.20 | 0.80/0.22 || 34 | 0.97/0.19 | 0.92/0.19 | 0.85/0.21
17 | 0.78/0.26 | 0.70/0.24 | 0.60/0.26 || 35 | 0.94/0.19 | 0.90/0.20 | 0.80/0.22
18 | 0.85/0.24 | 0.80/0.22 | 0.70/0.24 || 36 | 0.89/0.20 | 0.83/0.21 | 0.76/0.22
Vegetable Textures
Category | ResNet-50 (P/F) | VGG-19 (P/F) | GoogLeNet (P/F) || Category | ResNet-50 (P/F) | VGG-19 (P/F) | GoogLeNet (P/F)
1 | 0.98/0.18 | 0.90/0.20 | 0.87/0.20 || 7 | 0.80/0.22 | 0.70/0.24 | 0.66/0.25
2 | 0.88/0.20 | 0.80/0.22 | 0.75/0.23 || 8 | 0.90/0.20 | 0.80/0.22 | 0.76/0.22
3 | 0.90/0.20 | 0.87/0.20 | 0.80/0.22 || 9 | 0.65/0.25 | 0.60/0.26 | 0.50/0.29
4 | 0.92/0.19 | 0.88/0.20 | 0.82/0.21 || 10 | 0.98/0.18 | 0.94/0.19 | 0.90/0.20
5 | 0.98/0.18 | 0.93/0.19 | 0.90/0.20 || 11 | 0.70/0.24 | 0.66/0.25 | 0.60/0.26
6 | 0.90/0.20 | 0.82/0.21 | 0.74/0.23 || 12 | 0.96/0.19 | 0.90/0.20 | 0.80/0.22
Fruit Textures
Category | ResNet-50 (P/F) | VGG-19 (P/F) | GoogLeNet (P/F) || Category | ResNet-50 (P/F) | VGG-19 (P/F) | GoogLeNet (P/F)
1 | 0.92/0.19 | 0.88/0.20 | 0.80/0.22 || 11 | 0.89/0.18 | 0.80/0.22 | 0.70/0.24
2 | 0.90/0.20 | 0.87/0.20 | 0.83/0.21 || 12 | 0.93/0.20 | 0.88/0.20 | 0.83/0.21
3 | 0.88/0.20 | 0.82/0.21 | 0.73/0.23 || 13 | 0.96/0.19 | 0.90/0.20 | 0.80/0.22
4 | 0.86/0.20 | 0.80/0.22 | 0.78/0.22 || 14 | 0.95/0.19 | 0.93/0.19 | 0.85/0.21
5 | 0.98/0.18 | 0.90/0.20 | 0.81/0.21 || 15 | 0.97/0.19 | 0.95/0.19 | 0.90/0.20
6 | 0.97/0.19 | 0.95/0.19 | 0.82/0.21 || 16 | 0.89/0.19 | 0.85/0.21 | 0.80/0.22
7 | 0.94/0.19 | 0.90/0.20 | 0.88/0.20 || 17 | 0.98/0.20 | 0.92/0.19 | 0.88/0.20
8 | 0.92/0.19 | 0.91/0.20 | 0.82/0.21 || 18 | 0.95/0.18 | 0.90/0.20 | 0.82/0.21
9 | 0.80/0.22 | 0.78/0.22 | 0.70/0.24 || 19 | 0.75/0.19 | 0.73/0.23 | 0.70/0.24
10 | 0.99/0.18 | 0.95/0.19 | 0.90/0.20 || 20 | 0.89/0.23 | 0.80/0.22 | 0.70/0.24
Seed Textures
Category | ResNet-50 (P/F) | VGG-19 (P/F) | GoogLeNet (P/F) || Category | ResNet-50 (P/F) | VGG-19 (P/F) | GoogLeNet (P/F)
1 | 0.78/0.22 | 0.75/0.23 | 0.70/0.24 || 8 | 0.90/0.20 | 0.83/0.21 | 0.80/0.22
2 | 0.99/0.18 | 0.94/0.19 | 0.90/0.20 || 9 | 0.89/0.20 | 0.84/0.21 | 0.81/0.21
3 | 0.96/0.19 | 0.92/0.19 | 0.88/0.20 || 10 | 0.92/0.19 | 0.86/0.20 | 0.82/0.21
4 | 0.92/0.19 | 0.88/0.20 | 0.82/0.21 || 11 | 0.95/0.19 | 0.92/0.19 | 0.83/0.21
5 | 0.88/0.20 | 0.84/0.21 | 0.80/0.22 || 12 | 0.98/0.18 | 0.94/0.19 | 0.86/0.20
6 | 0.95/0.19 | 0.90/0.20 | 0.83/0.21 || 13 | 0.95/0.19 | 0.92/0.19 | 0.88/0.20
7 | 0.80/0.22 | 0.76/0.22 | 0.70/0.24 || 14 | 0.70/0.24 | 0.66/0.25 | 0.60/0.26
Bean Textures
Category | ResNet-50 (P/F) | VGG-19 (P/F) | GoogLeNet (P/F) || Category | ResNet-50 (P/F) | VGG-19 (P/F) | GoogLeNet (P/F)
1 | 0.89/0.20 | 0.85/0.21 | 0.83/0.21 || 9 | 0.95/0.19 | 0.94/0.19 | 0.90/0.20
2 | 0.87/0.20 | 0.82/0.21 | 0.70/0.24 || 10 | 0.98/0.18 | 0.92/0.19 | 0.87/0.20
3 | 0.90/0.20 | 0.86/0.20 | 0.83/0.21 || 11 | 0.60/0.26 | 0.55/0.27 | 0.50/0.29
4 | 0.90/0.20 | 0.82/0.21 | 0.80/0.22 || 12 | 0.80/0.22 | 0.82/0.21 | 0.84/0.21
5 | 0.75/0.23 | 0.70/0.24 | 0.60/0.26 || 13 | 0.85/0.21 | 0.81/0.21 | 0.80/0.22
6 | 0.98/0.18 | 0.90/0.20 | 0.87/0.20 || 14 | 0.76/0.22 | 0.80/0.22 | 0.70/0.24
7 | 0.60/0.26 | 0.55/0.27 | 0.52/0.28 || 15 | 0.90/0.20 | 0.85/0.21 | 0.76/0.22
8 | 0.90/0.20 | 0.82/0.21 | 0.76/0.22
Coin Textures
Category | ResNet-50 (P/F) | VGG-19 (P/F) | GoogLeNet (P/F) || Category | ResNet-50 (P/F) | VGG-19 (P/F) | GoogLeNet (P/F)
1 | 0.87/0.20 | 0.80/0.22 | 0.74/0.23 || 12 | 0.80/0.22 | 0.75/0.23 | 0.72/0.23
2 | 0.90/0.20 | 0.86/0.20 | 0.82/0.21 || 13 | 0.90/0.20 | 0.86/0.20 | 0.81/0.21
3 | 0.94/0.19 | 0.82/0.21 | 0.78/0.22 || 14 | 0.90/0.20 | 0.88/0.20 | 0.80/0.22
4 | 0.98/0.18 | 0.86/0.20 | 0.82/0.21 || 15 | 0.80/0.22 | 0.75/0.23 | 0.70/0.24
5 | 0.95/0.19 | 0.90/0.20 | 0.80/0.22 || 16 | 0.98/0.18 | 0.95/0.19 | 0.90/0.20
6 | 0.94/0.19 | 0.88/0.20 | 0.84/0.21 || 17 | 0.96/0.19 | 0.90/0.20 | 0.82/0.21
7 | 0.98/0.18 | 0.90/0.20 | 0.80/0.22 || 18 | 0.86/0.20 | 0.84/0.21 | 0.80/0.22
8 | 0.89/0.20 | 0.82/0.21 | 0.75/0.23 || 19 | 0.89/0.20 | 0.83/0.21 | 0.79/0.22
9 | 0.87/0.20 | 0.81/0.21 | 0.76/0.22 || 20 | 0.98/0.18 | 0.95/0.19 | 0.88/0.20
10 | 0.95/0.19 | 0.92/0.19 | 0.84/0.21 || 21 | 0.89/0.20 | 0.85/0.21 | 0.80/0.22
11 | 0.87/0.20 | 0.82/0.21 | 0.70/0.24 || 22 | 0.90/0.20 | 0.84/0.21 | 0.78/0.22
Sea Shell Textures
Category | ResNet-50 (P/F) | VGG-19 (P/F) | GoogLeNet (P/F) || Category | ResNet-50 (P/F) | VGG-19 (P/F) | GoogLeNet (P/F)
1 | 0.98/0.18 | 0.95/0.19 | 0.85/0.21 || 11 | 0.89/0.20 | 0.83/0.21 | 0.76/0.22
2 | 0.98/0.19 | 0.93/0.19 | 0.82/0.20 || 12 | 0.91/0.20 | 0.88/0.20 | 0.82/0.21
3 | 0.96/0.19 | 0.92/0.20 | 0.88/0.20 || 13 | 0.95/0.19 | 0.90/0.20 | 0.84/0.21
4 | 0.92/0.20 | 0.88/0.21 | 0.86/0.22 || 14 | 0.96/0.19 | 0.92/0.19 | 0.80/0.22
5 | 0.90/0.22 | 0.83/0.22 | 0.80/0.24 || 15 | 0.98/0.18 | 0.92/0.19 | 0.90/0.20
6 | 0.80/0.18 | 0.78/0.19 | 0.70/0.21 || 16 | 0.80/0.22 | 0.79/0.22 | 0.73/0.23
7 | 0.98/0.18 | 0.94/0.19 | 0.83/0.20 || 17 | 0.55/0.27 | 0.53/0.28 | 0.50/0.29
8 | 0.99/0.18 | 0.93/0.20 | 0.88/0.21 || 18 | 0.75/0.23 | 0.72/0.23 | 0.70/0.24
9 | 0.98/0.19 | 0.90/0.19 | 0.83/0.22 || 19 | 0.86/0.20 | 0.80/0.22 | 0.72/0.23
10 | 0.96/0.18 | 0.92/0.19 | 0.80/0.21 || 20 | 0.98/0.18 | 0.88/0.20 | 0.83/0.21
Spice Textures
Category | ResNet-50 (P/F) | VGG-19 (P/F) | GoogLeNet (P/F) || Category | ResNet-50 (P/F) | VGG-19 (P/F) | GoogLeNet (P/F)
1 | 0.97/0.19 | 0.84/0.21 | 0.80/0.22 || 19 | 0.95/0.19 | 0.91/0.20 | 0.86/0.20
2 | 0.92/0.19 | 0.90/0.20 | 0.90/0.20 || 20 | 0.98/0.18 | 0.94/0.19 | 0.88/0.20
3 | 0.90/0.20 | 0.82/0.21 | 0.74/0.23 || 21 | 0.89/0.20 | 0.85/0.21 | 0.80/0.22
4 | 0.75/0.23 | 0.70/0.24 | 0.65/0.25 || 22 | 0.85/0.21 | 0.80/0.22 | 0.70/0.24
5 | 0.75/0.23 | 0.73/0.23 | 0.68/0.24 || 23 | 0.98/0.18 | 0.90/0.20 | 0.81/0.21
6 | 0.99/0.18 | 0.90/0.20 | 0.87/0.20 || 24 | 0.94/0.19 | 0.92/0.19 | 0.82/0.21
7 | 0.60/0.26 | 0.58/0.27 | 0.50/0.29 || 25 | 0.98/0.18 | 0.95/0.19 | 0.90/0.20
8 | 0.88/0.20 | 0.80/0.22 | 0.70/0.24 || 26 | 0.88/0.20 | 0.80/0.22 | 0.76/0.22
9 | 0.95/0.19 | 0.90/0.20 | 0.84/0.21 || 27 | 0.86/0.20 | 0.82/0.21 | 0.70/0.24
10 | 0.80/0.22 | 0.77/0.22 | 0.72/0.23 || 28 | 0.98/0.18 | 0.90/0.20 | 0.83/0.21
11 | 0.94/0.19 | 0.90/0.20 | 0.80/0.22 || 29 | 0.94/0.19 | 0.88/0.20 | 0.80/0.22
12 | 0.89/0.20 | 0.84/0.21 | 0.80/0.22 || 30 | 0.92/0.19 | 0.84/0.21 | 0.80/0.22
13 | 0.86/0.20 | 0.80/0.22 | 0.76/0.22 || 31 | 0.97/0.19 | 0.94/0.19 | 0.82/0.21
14 | 0.85/0.21 | 0.80/0.22 | 0.74/0.23 || 32 | 0.88/0.20 | 0.80/0.22 | 0.70/0.24
15 | 0.97/0.19 | 0.84/0.21 | 0.80/0.22 || 33 | 0.89/0.20 | 0.84/0.21 | 0.78/0.22
16 | 0.92/0.19 | 0.90/0.20 | 0.90/0.20 || 34 | 0.96/0.19 | 0.90/0.20 | 0.76/0.22
17 | 0.90/0.20 | 0.82/0.21 | 0.74/0.23 || 35 | 0.99/0.18 | 0.96/0.19 | 0.90/0.20
18 | 0.75/0.23 | 0.70/0.24 | 0.65/0.25 || 36 | 0.90/0.20 | 0.88/0.20 | 0.85/0.21
Table 4. Average precision, F-measure, and ARP ratios for the FTVL Tropical Fruits dataset.
FTVL Tropical Fruits Dataset (Average Precision, F-Measure, and ARP)
Category | ResNet-50 (P/F/ARP) | VGG-19 (P/F/ARP) | GoogLeNet (P/F/ARP)
Tahiti Lime | 1.00/0.18/0.90 | 0.86/0.20/0.86 | 0.80/0.22/0.80
Cashew | 0.95/0.19/0.98 | 0.90/0.20/0.88 | 0.87/0.20/0.84
Agata Potato | 1.00/0.18/0.98 | 0.92/0.19/0.89 | 0.90/0.20/0.86
Diamond Peach | 0.95/0.19/0.98 | 0.89/0.20/0.89 | 0.82/0.21/0.85
Granny Smith Apple | 0.95/0.19/0.97 | 0.90/0.20/0.89 | 0.75/0.23/0.83
Asterix Potato | 0.90/0.20/0.96 | 0.88/0.20/0.89 | 0.70/0.24/0.81
Nectarine | 1.00/0.18/0.96 | 0.90/0.20/0.89 | 0.86/0.20/0.81
Fuji Apple | 0.85/0.21/0.95 | 0.78/0.22/0.88 | 0.70/0.24/0.80
Watermelon | 1.00/0.18/0.96 | 0.98/0.18/0.89 | 0.90/0.20/0.81
Honeydew Melon | 0.90/0.20/0.95 | 0.77/0.22/0.88 | 0.82/0.21/0.81
Spanish Pear | 1.00/0.18/0.95 | 0.98/0.18/0.89 | 0.88/0.20/0.82
Plum | 0.90/0.20/0.95 | 0.88/0.20/0.89 | 0.80/0.22/0.82
Kiwi | 0.98/0.20/0.95 | 0.78/0.22/0.88 | 0.76/0.22/0.81
Onion | 1.00/0.18/0.95 | 0.90/0.20/0.88 | 0.88/0.20/0.82
Orange | 0.95/0.19/0.95 | 0.88/0.20/0.88 | 0.80/0.22/0.82
Table 5. Comparison of the presented method's mAP results with CBRFF [63] and other state-of-the-art methods [78] on the FTVL Tropical Fruits dataset.
FTVL Tropical Fruits Dataset vs. Existing Methods: Mean Average Precision (mAP)
Method | mAP
Proposed Method | 0.96
CDH + SEH | 0.90
CCV + CLBP | 0.93
CCV + LTP | 0.93
GCH + LBP | 0.94
CDH + CLBP | 0.91
CDH + SHE + CLBP | 0.94
CBRFF [63] | 0.95
Table 6. Comparison of the presented method's mAP results with state-of-the-art methods on the 17-Flowers dataset.
17-Flowers Dataset vs. Existing Methods: Mean Average Precision (mAP)
Method | mAP
Proposed Method | 0.84
LCDS [85] | 0.72
SCSPM [86] | 0.52
SCS W/B [86] | 0.62
LCLCIC [87] | 0.65
JLVCD [88] | 0.69
JLVCD W/B [88] | 0.71
FCMOC [89] | 0.67
DLCSPC [90] | 0.59
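Tables 5 and 6 summarize performance as mean average precision (mAP). For reference, the sketch below gives the conventional mAP computation (the mean, over queries, of the average precision along each ranked result list). It is a hedged illustration with hypothetical query data, not the evaluation code used to produce these tables.

```python
# Minimal mAP sketch (illustrative only; query IDs and rankings are hypothetical).

def average_precision(ranked_ids, relevant_ids):
    """Average of the precision values at each rank where a relevant image appears."""
    hits, precisions = 0, []
    for rank, image_id in enumerate(ranked_ids, start=1):
        if image_id in relevant_ids:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(rankings, ground_truth):
    """Mean of per-query average precision over all queries."""
    aps = [average_precision(rankings[q], ground_truth[q]) for q in rankings]
    return sum(aps) / len(aps) if aps else 0.0

# Toy example with two queries:
rankings = {"q1": [1, 5, 2, 9], "q2": [7, 3, 8, 4]}
ground_truth = {"q1": {1, 2}, "q2": {3, 4}}
print(mean_average_precision(rankings, ground_truth))   # ~0.67 for this toy case
```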