1. Introduction
The application of science and engineering to the analysis of artifacts and artworks such as paintings, mosaics and statues dates back several centuries [
1,
2,
3]. However, only over the past few decades have the analytical methods developed in the mathematical, IT and physical sciences been able to gather information from the past and contribute to the analysis, interpretation and dissemination of the fine arts. Because of the historical division between science and the humanities, the interaction between these two fields has never been natural. For example, the application of signal and image processing techniques to the analysis and restoration of artworks was long a very uncommon practice. Lately, there has been growing interest in processing image data of artworks for storage, transmission, representation and analysis, and an increasing number of scientists with a background in analytical and mathematical techniques have approached this field in an interdisciplinary way. There are several ways in which image processing can find significant applications in the fields of fine arts and cultural heritage. Among them, three main areas of application can be identified: obtaining a digital version of traditional photographic reproductions, pursuing imaging diagnostics and implementing virtual restoration [
1,
2,
4]. Obtaining the exact reproduction and explanation of an artwork was one of the first developments in the first area, which includes the process of archiving, retrieving and disseminating data and derives all the benefits from the digital format [
1,
2,
3,
4,
5,
6]. In the second area of imaging diagnostics, digital images are used to detect and document the state of preservation of artifacts [
7], as in the case of the noninvasive techniques based on imaging in different spectral regions used for the investigation of paintings [
8]. In the third area, the image processing techniques can be used as a guide to the actual restoration of fine arts (computer-guided restoration), or they can produce a digitally restored version of the artwork. In some activities, the computer is more suitable than traditional artistic tools. Examples of such activities are filtering, geometric transformation of an image, segmentation and pattern recognition. Using digital technologies, every change to the image can be seen on the screen almost in real time. Moreover, images and data can be edited, filtered and processed with minimal material costs even when complicated operations are performed, e.g., changes in colors, brightness or contrast [
5,
9,
10,
11,
12]. A further development consists of applying computer vision, an area of artificial intelligence, to recognize patterns of the historical art heritage [
6,
13].
In this context, this paper presents a method for recognizing geometrical patterns in fine arts by means of image processing techniques. In particular, we developed and tested a deep learning-based framework to classify the geometric forms and patterns of floor mosaics, which consist of an arrangement of tiles usually characterized by jagged and undefined boundaries or surface irregularities. The workflow of the proposed method is shown in
Figure 1.
The paper is organized as follows: In
Section 2, we introduce methods of image processing applied to fine arts, involving machine learning and deep learning-based techniques.
Section 3 describes the proposed method based on deep neural networks.
Section 4 introduces the case study.
Section 5 presents the experiments resulting from the application of the deep neural network framework to the dataset and the results achieved. In
Section 6, some final remarks and open questions close the paper.
2. Related Work
This section surveys the literature on image processing methods applied to fine arts, including machine learning and deep learning-based techniques. In [
14,
15,
16,
17,
18], image processing techniques for art investigation are applied to the detection of defects and cracks, as well as to the removal of defects and canvas from high-resolution acquisition of paintings. Examples of these kinds of methods include the use of sparse representations and the removal of cradling artifacts in X-ray images of panel paintings [
15] and automated crack detection on the Ghent Altarpiece [
16], employed as guidance during its ongoing restoration.
Various methods of automatic image segmentation are used in the literature aiming at identifying regions in an image and labeling them as different classes. The main applications are pattern recognition for classifying paintings [
19,
20,
21,
22,
23] or the authentication of fine arts (e.g., of paintings) [
24]. These image segmentation methods include the following: The thresholding methods transform a grey-scale image into a binary image, where the algorithm evaluates the differences among neighboring pixels to find object boundaries [
25,
26,
27]. The region growing methods expand a region outward from a starting point inside the object [
28,
29] by selecting object seed pixels (inside the area to be detected) and then searching for neighboring pixels with intensities similar to those of the seed pixels. In level-set methods, the evolving contour converges at the boundary of the object, where the differences are highest. In the graph-cut method [
30,
31,
32], first proposed by Wu and Leahy [
30], each image is represented as a graph of nodes: each node corresponds to an image pixel, and links connecting the nodes are called edges; a pathway is constructed connecting all the edges to travel across the graph.
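The region growing idea described above can be illustrated with a minimal sketch. It is not taken from the cited works: the function name `region_grow`, the 4-connected neighbourhood, the fixed intensity tolerance and the toy image are all assumptions chosen for illustration.

```python
from collections import deque


def region_grow(image, seed, tolerance):
    """Return the set of (row, col) pixels connected to `seed` whose
    intensity differs from the seed intensity by at most `tolerance`."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        # Visit the 4-connected neighbours of the current pixel.
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                if abs(image[nr][nc] - seed_value) <= tolerance:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region


# Toy 4x4 grey-scale image: a bright 2x2 object on a dark background.
toy_image = [
    [10, 12, 11, 10],
    [11, 200, 205, 12],
    [10, 198, 202, 11],
    [12, 10, 11, 10],
]
object_pixels = region_grow(toy_image, seed=(1, 1), tolerance=10)
```

Comparing each candidate with the seed intensity (rather than with its immediate neighbour) keeps the region from drifting across smooth gradients; practical implementations often use more elaborate similarity criteria.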
Aggregation methods are important as well for image resampling [
33] or denoising [
34]: When an appropriate scale or resolution is determined, the next step is to obtain the corresponding images. In the case of low scale or resolution, resampling techniques are often used to interpolate an image into a desired resolution, and aggregation is a particular resampling technique widely practiced for “up-scaling” image data from high resolution to low resolution [
33].
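As a rough illustration of aggregation as a resampling technique, the sketch below averages non-overlapping blocks of pixels to go from a high-resolution grid to a lower-resolution one. The function name `aggregate`, the block-mean rule and the toy data are assumptions made for this example; the cited works may use other aggregation rules.

```python
def aggregate(image, factor):
    """Reduce resolution by averaging non-overlapping factor x factor blocks."""
    rows, cols = len(image), len(image[0])
    if rows % factor or cols % factor:
        raise ValueError("image size must be a multiple of factor")
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            # Collect the pixels of one block and replace them by their mean.
            block = [image[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


fine = [
    [1, 3, 5, 7],
    [1, 3, 5, 7],
    [2, 2, 8, 8],
    [2, 2, 8, 8],
]
coarse = aggregate(fine, 2)  # 4x4 grid aggregated to 2x2
```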
This paper particularly focuses on deep learning [
35,
36], which is a kind of machine learning that uses multiple layers of neurons, with complex architectures and nonlinear transformations, to learn higher-level representations of data. With the growing volume of data and computing power, neural networks with increasingly sophisticated architectures have attracted great interest and are used in a variety of disciplines. Some examples of applications in image processing and in fine arts are as follows: Image segmentation using neural networks has recently become a very powerful tool for image processing [
22,
37]; recently, even convolutional neural networks have been applied to paintings [
38]. In [
39], a novel deep learning framework is developed to retrieve similar architectural floor plan layouts from a repository, analyzing the effect of individual deep convolutional neural network layers for the floor plan retrieval task. In [
40], the results of a novel method for building structure extraction in urbanized aerial images are presented. Most of the methods are based on CNNs. Similarly, in [
41], the use of deep neural networks for object detection in floor plan images is investigated, evaluating the use of object detection architectures to recognize furniture objects, doors and windows in floor plans.
Gomez-Rios et al. [
42] classified the textures of underwater coral patterns using a CNN-based transfer learning approach. To work on diverse data and evaluate the performance of the proposed approach, they used data augmentation. The adoption of a deep neural network can significantly improve the efficiency of phase demodulation from a single fringe pattern [
43]. The network in that work was trained to predict several intermediate results that can be used to calculate the phase of an input fringe pattern. Fringe images of diverse scenes were collected to produce the training data. Once trained, the neural network took only one input fringe pattern and produced the corresponding estimates of these intermediate results with high accuracy. Sandelin [
44] proposed a Mask R-CNN-based technique for floor plan images and segmented the walls, windows, rooms and doors. This method showed good performance even on noisy images. Vilnrotter et al. [
45] proposed a technique to generate suitable descriptions of natural textures. A basic method that uses edge features to obtain an initial, partial description of the texture elements was discussed. The elements were then extracted using this description and grouped into types, together with the spatial relationships among them. The resulting descriptions were shown to be useful for texture recognition and for the reconstruction of periodic patterns.
With a particular focus on mosaics, most of the related computer applications deal with their digital reconstruction using image-based techniques (i.e., photogrammetry) for documentation and analysis [
46,
47,
48,
49]. In addition, the literature presents a few examples of image processing applications: In [
50], a registration method in the framework of a restoration process of a medieval mosaic to compare a historical black and white photograph with a current digital one is presented. In [
51], an algorithm that exploits deep learning and image segmentation techniques is presented to obtain a digital (vector) representation of a mosaic. In [
52], historical photographs of an ancient mosaic are first restored (by removing noise, deblurring the image and increasing the contrast), and the geometrical differences between images are then removed by means of multimodal registration using mutual information; the final identification of differences between the photos indicates the changes in the mosaic over the centuries. In [
53], Falomir et al. presented a mathematical method for calculating a similarity score among qualitative descriptions of item structure, color and dimension in digitized images. The similarity scores are calculated from compositional cluster maps or intermediate distances, depending on the qualitative characteristic considered. The results of prior techniques were improved by using an approximate matching process among item characteristics of a tile mosaic assembly.
3. Proposed Method
In this paper, we propose a deep learning-based framework to classify the forms of fine arts, such as paintings and mosaics. The algorithm is able to classify the geometrical forms constituting the patterns, even if they are partially deformed. Deep learning [
54] is a type of machine learning that eliminates the need for manual feature engineering. Images are fed directly into the network, and the final classification is returned. Owing to its high capacity to cope with spatially distributed input, the convolutional neural network (CNN) [
55] is the most efficient and frequently used architecture.
In this study, we used a CNN-based framework that automatically computes the feature maps and classifies them. To the best of our knowledge, there is no literature to date on the use of CNNs for the identification of floor mosaic patterns. Convolution, pooling and dense layers are the three distinct categories of layers found in a CNN. The convolution layers extract features from the input images by applying a set of filters. The generated feature map is passed through a pooling layer to reduce its spatial size. As a result, the network parameter count and computational cost are reduced. Each neuron in a dense layer receives all the outputs from the preceding layer and delivers one output to the following layer. The proposed CNN framework can be described as a CPCCCPDD architecture, where C, P and D represent convolution, pooling and dense layers, respectively. The input image is fed to the first convolutional layer, which consists of 32 filters of size 5 × 5. This convolutional layer is followed by a max-pool layer with filter size 3 × 3. Then, three convolutional layers with 16 filters of size 3 × 3 each follow in series. These are followed by another max-pool layer with filter size 2 × 2. Two dense layers are used in the proposed CNN framework: the first is 45-dimensional, and the second is 5-dimensional (output layer). The proposed CNN framework is depicted in
Figure 2.
The number of pixels by which a filter is shifted across the input tensor is referred to as the stride. If the stride is set to 1, the filters/masks are moved one element at a time. If it is set to 2, the mask is shifted by two elements, and so on. Here, a stride of 1 is used for both the convolution and pooling layers throughout the experiment. A dropout rate of 0.5 is used, which helps to reduce overfitting in the network. Before the dense layer, batch normalization is applied to speed up the training process. The learning rate is set to 0.001. The Adam optimizer and the cross-entropy loss function are used in the proposed framework. In the convolutional layers and the first dense layer, the rectified linear unit (ReLU) activation function is used, which can be expressed as:
$$f(n) = \max(0, n) \qquad (1)$$
where n is the input to a neuron.
In the output layer, the softmax activation function is used, which is given in Equation (2):
$$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \quad i = 1, \dots, K \qquad (2)$$
where z_i is the i-th input to the output layer and K is the number of classes (here, K = 5).
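For readers who prefer code to formulas, the two activation functions named above can be sketched in plain Python. This is only an illustrative sketch using their standard textbook definitions: the function names, the max-subtraction used for numerical stability and the example scores are assumptions, not details taken from the paper.

```python
import math


def relu(n):
    """Rectified linear unit: passes positive inputs, clamps the rest to 0."""
    return max(0.0, n)


def softmax(scores):
    """Map a list of raw scores to probabilities that sum to 1."""
    # Subtracting the maximum does not change the result but avoids overflow.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw outputs of a 5-unit output layer (one unit per class).
probs = softmax([2.0, 1.0, 0.1, -1.0, 0.5])
predicted_class = probs.index(max(probs))
```

The predicted class is simply the index of the largest probability, which is how a 5-dimensional softmax output is usually turned into a label.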
The number of parameters used in the CNN architecture is presented in
Table 1. The total number of trainable parameters used is 617,491.
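The parameter total can be cross-checked with a short calculation. The sketch below walks through the CPCCCPDD stack and counts weights and biases. Several points are assumptions, because this section does not state them: a 3-channel input, unpadded ("valid") convolutions and pooling, and no batch-normalization parameters in the count. The two input sizes are hypothetical; they are simply values that lead to a 29 × 29 × 16 feature map before the first dense layer, which reproduces the reported total.

```python
def conv_out(size, kernel, stride=1):
    """Output side length of an unpadded convolution or pooling window."""
    return (size - kernel) // stride + 1


def conv_params(kernel, in_ch, out_ch):
    """Weights plus one bias per filter."""
    return (kernel * kernel * in_ch + 1) * out_ch


def dense_params(n_in, n_out):
    """Weights plus one bias per neuron."""
    return (n_in + 1) * n_out


def count_parameters(input_size, channels, pool_strides):
    """Return (feature-map side before flattening, trainable parameters)."""
    size, params = input_size, 0
    # C: 32 filters of size 5 x 5
    params += conv_params(5, channels, 32)
    size = conv_out(size, 5)
    # P: 3 x 3 max-pool (no parameters)
    size = conv_out(size, 3, pool_strides[0])
    # CCC: three layers with 16 filters of size 3 x 3
    in_ch = 32
    for _ in range(3):
        params += conv_params(3, in_ch, 16)
        size = conv_out(size, 3)
        in_ch = 16
    # P: 2 x 2 max-pool (no parameters)
    size = conv_out(size, 2, pool_strides[1])
    # DD: 45-unit dense layer, then 5-unit output layer
    params += dense_params(size * size * 16, 45)
    params += dense_params(45, 5)
    return size, params


# Assumption A: stride 1 in every layer, hypothetical 42 x 42 RGB input.
side_a, total_a = count_parameters(42, 3, pool_strides=(1, 1))
# Assumption B: pooling stride equal to the pool size, hypothetical 200 x 200 RGB input.
side_b, total_b = count_parameters(200, 3, pool_strides=(3, 2))
```

Under either assumption, almost all of the parameters sit in the first dense layer, so the total is dominated by the size of the flattened feature map rather than by the convolutional filters.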
4. Case Study
The deep learning (CNN) framework was applied and tested on a Roman mosaic discovered in Savignano sul Panaro, near the city of Modena (Italy), in 1897 during an archaeological excavation. This floor mosaic belongs to the ruins of a large late Roman building dated to the 5th century A.D. [
56]. It originally measured about 6.90 m × 4.50 m, but less than half of its original surface is preserved. The Roman mosaic was removed for restoration and is now conserved in the birthplace of the painter Giuseppe Graziosi (Savignano sul Panaro), who first documented its existence in 1897 (
Figure 3, left).
The mosaic pattern is described in [
57]. Its decorations present polychrome stone and terracotta tiles combined with emerald green and ruby red glass tiles. The mosaic shows a geometrical pattern of (originally) eight octagonal elements arranged around a larger central one, which consists of an eight-pointed star, formed by two superimposed squares to form a central octagon with irregular sides (in purple, in
Figure 3, right). The central octagon has a circular motif with a white background containing a laurel wreath and, presumably, a figured center. Eight smaller octagons originate from the vertices of the star, arranged in pairs on each side (in red, blue and yellow, in
Figure 3, right), containing geometric and stylized plants that alternate with Solomon’s knots. The external octagons are only partially preserved, but all of them have internal circular motifs, with a border of pointed triangles in black on white. The space between the octagons and the side walls is filled with different polygonal and triangular forms. At the top, six circles (five full circles and one half-circle) alternate intertwined motifs with a red and black background, surrounding a central square.
A close-range photogrammetric model of the Roman mosaic was built from 115 photos (standard compact camera Nikon P310 (Nikon, Tokyo, Japan), 16.1 MP CMOS sensor, sensor size: 1/2.3” (~6.16 mm × 4.62 mm), max. image resolution 4608 × 3456) using Agisoft Metashape Professional (Version 1.6.3). In this software, the 3D model was also scaled to its natural size using as references the sides of the inclined support of the mosaic (see
Figure 3, left), whose dimensions are known. The final model consists of a detailed textured 3D model of the mosaic, which shows the arrangements of the tiles, their edges and some planar issues due to its state of conservation, as well as the geometric forms and their arrangements.
The 3D model supported the generation of images showing the mosaic geometric forms in two ways: Firstly, from the 3D model, the Agisoft Metashape Pro software generated an orthophoto, which is a computer-generated image of the whole artifact that has been corrected for any geometric distortions. In particular, it is obtained as a parallel projection of the view of a photogrammetric textured model taken along a predetermined plane [
58]. During the transformation from a 2D perspective view into an orthophoto, each photo is rectified (i.e., it is an orthogonal projection of the real photo on the mosaic plane); therefore, it is no longer deformed by perspective. Conversely, the “real” photo is influenced by perspective, as seen by the human eye. Therefore, we obtained a set of 115 photographic images corrected and rectified, from which we could extract the images of geometrical forms to be classified by the deep learning algorithm.
Secondly, from the 3D model, we extracted and isolated additional image samples depicting each of the geometric forms to be analyzed. By simply rotating, translating and zooming the 3D model, we obtained images of the same geometric form with multiple spatial orientations and, therefore, with multiple distortions. Some of these images are shown in
Figure 4.
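To give an intuition of why changing the viewpoint yields differently deformed versions of the same form, the following sketch rotates the vertices of a flat regular octagon and projects them with a simple pinhole camera. It is not the procedure used in the photogrammetric software: the function names, the camera model and all numeric values are assumptions made for illustration.

```python
import math


def regular_polygon(n_sides, radius=1.0):
    """Vertices of a flat regular polygon lying in the z = 0 plane."""
    return [(radius * math.cos(2 * math.pi * k / n_sides),
             radius * math.sin(2 * math.pi * k / n_sides),
             0.0)
            for k in range(n_sides)]


def view(points, tilt_deg, spin_deg, distance, focal=1.0):
    """Rotate the points, then project them with a pinhole camera that looks
    along the z-axis from `distance` away (a smaller distance acts as a zoom)."""
    tilt, spin = math.radians(tilt_deg), math.radians(spin_deg)
    projected = []
    for x, y, z in points:
        # In-plane rotation about the z-axis.
        xs = x * math.cos(spin) - y * math.sin(spin)
        ys = x * math.sin(spin) + y * math.cos(spin)
        # Tilt about the x-axis, which moves some vertices away from the camera.
        yt = ys * math.cos(tilt) - z * math.sin(tilt)
        zt = ys * math.sin(tilt) + z * math.cos(tilt)
        depth = distance + zt
        # Perspective division: farther points appear closer to the centre.
        projected.append((focal * xs / depth, focal * yt / depth))
    return projected


octagon = regular_polygon(8)
frontal = view(octagon, tilt_deg=0, spin_deg=0, distance=5.0)
oblique = view(octagon, tilt_deg=50, spin_deg=20, distance=5.0)
```

In the frontal view the octagon stays regular (only scaled), whereas in the oblique view its sides and angles are no longer equal, which is the kind of deformation the classifier has to tolerate.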
6. Discussion and Conclusions
This paper presents a framework for geometric form analysis based on images extracted from a close-range photogrammetric model of an artifact (floor mosaic) and deep learning (CNN) algorithm. From the digital model of the mosaic, an orthophoto was obtained, which the photogrammetric software generated by rectifying the photos used in photogrammetry. Therefore, two sets of photos were collected in a dataset: the original photos, affected by perspective, useful for obtaining images of the deformed geometric forms of the mosaic and, on the other hand, the rectified version of the same photos with the geometric forms projected on the floor plane and so not deformed. Moreover, additional images can be obtained by simply rotating, translating and zooming the 3D model of the mosaic, generating other images with geometric forms differently deformed.
The deep learning algorithm analyzed the entire dataset consisting of 407 (normalized) images, in particular, 103 images of circles, 79 images of octagons, 71 images of squares, 137 images of triangles and 17 images of leaves. The geometric forms in the mosaic are made by arrangements of tiles, which caused jagged contours and irregularities in the geometric forms to be analyzed; moreover, there were cracks and improper/incomplete geometry of the mosaic elements, which were sometimes due to unevenness in the ground or the elements having been destroyed in the past. Moreover, some of the photos showing the mosaic forms present noise and blurs, sometimes due to poor illumination.
Despite all these defects, the algorithm is able to identify and classify more than 94% of the forms in each category, and the method has proved to be robust enough to analyze the mosaic geometric forms chosen as a case study. Furthermore, the performance of the proposed method was compared with standard deep architectures that employ a larger number of convolution and pooling layers. Even so, the proposed lightweight architecture achieved good accuracy.
Concerning the selected case study, the proposed method has proved to be capable of extracting and classifying data from this kind of artwork. The dataset consists of various images related to five geometric forms that are repeated in the mosaic using different arrangements of tiles, colors and orientation, usually incomplete or separated by diameters, diagonals or simply by including smaller geometric forms in larger ones. Despite all these differences among the same kinds of geometric forms, the CNN architecture has proven to be capable of classifying the five geometric forms with high accuracy; therefore, we confidently believe that it can be easily generalized to other mosaics with similar forms and patterns. As it was not possible to test it as part of this research activity, testing the CNN algorithm with other mosaics is planned as future work.
Additional future work will consist of the analysis of mosaics and other artworks that are not flat but 3D-shaped in space, such as curved walls, domes and vaults. In addition, the method could form the basis of a software tool for processing and analyzing fine arts data in a more automated way.