Article

Image Hashtag Recommendations Using a Voting Deep Neural Network and Associative Rules Mining Approach

Institute of Computer Science, Pedagogical University of Krakow, 2 Podchorazych Ave, 30-084 Krakow, Poland
* Author to whom correspondence should be addressed.
Submission received: 26 October 2020 / Revised: 25 November 2020 / Accepted: 27 November 2020 / Published: 30 November 2020
(This article belongs to the Section Signal and Data Analysis)

Abstract

Hashtag-based image descriptions are a popular approach for labeling images on social media platforms. In practice, images are often described by more than one hashtag. Thanks to the rapid development of deep neural networks specialized in image embedding and classification, it is now possible to generate such descriptions automatically. In this paper we propose a novel Voting Deep Neural Network with Associative Rules Mining (VDNN-ARM) algorithm for solving the multi-label hashtag recommendation problem. VDNN-ARM is a machine learning approach that uses an ensemble of deep neural networks to generate image features, which are then classified against a set of potential hashtags. The proposed hashtags are filtered by a voting schema. The remaining hashtags may be included in the final set of recommended hashtags through associative rules mining, which exploits dependencies within certain groups of hashtags. We evaluate our approach on the HARRISON benchmark dataset as a multi-label classification problem. The highest values of our evaluation metrics, namely precision, recall, and accuracy, were obtained for VDNN-ARM with a confidence threshold of 0.95. VDNN-ARM outperforms state-of-the-art algorithms: it improves on the precision of VGG-Object + VGG-Scene by 17.91%, and on the recall and accuracy of Ensemble-FFNN (intersection) by 32.33% and 27.00%, respectively. Both the dataset and all source codes we implemented for this research are available for download, so our results can be reproduced.

1. Introduction

The number of social media users is continuously increasing. Platforms like Facebook, Instagram, Twitter, or Flickr are very popular tools for sharing news, keeping in touch with friends, and promoting businesses. With the aid of Natural Language Processing (NLP), researchers develop methods that teach Artificial Intelligence (AI) systems to understand the meaning of messages published on these networks. This is still a very challenging task, and algorithms remain imperfect at capturing the flexibility of language, such as the sentiment or context of a sentence. Many users include additional information in their posts that classifies the context of the message using hashtags. Hashtags are words preceded by the ‘#’ symbol and are used to label not only text data but also images, which is crucial in image-oriented social networks [1]. Hashtags might describe the content of a picture (for example “cat”, “mum”), its location (“downtown”, “beach”), a mood (for example “sad”, “happy”), or other, even abstract, topics (for example “weather”, “future”, etc.). Users also employ different forms of words (“day”, “days”), upper- and lowercase letters, slang-inspired words such as “luvu” (meaning “love you”), or marketing slogans. The proper choice of hashtags is crucial for correctly categorizing image content and makes an image easier for viewers to find. In this research we focus on automatic hashtag generation based solely on image data, formulated as a multi-label classification problem.

1.1. State-of-the-Art Works

Hashtags are very popular on social media platforms and impact users’ engagement. As a result, papers evaluating the relationships between message content and hashtags are constantly being published. Authors have investigated relationships between hashtags and posts (text posted by the user) [2,3,4], sentiment [5], topic [6], content similarity [7], and so forth.
On many social media platforms the content of a microblogging post is image-oriented. Valuable information can be collected solely from the image, and that information can be used to recognize the content of the photo and prepare a set of hashtags that describes it. As we already mentioned, each image can be described by multiple hashtags (labels). A hashtag should explain and summarize the content and/or meaning of the image, and the content is not necessarily identical to the meaning; for example, a picture of a sunset might carry hashtags such as “friendship”, “love”, or “holidays”. As can be seen, categorizing pictures based only on visual data is a very difficult task, even for a human observer. What is more, this classification problem is multi-label, which means one picture can be assigned to one or more classes.
The common approach to multi-label image classification is deep neural network (DNN)-based hashtag recommendation. Research on learning and describing the content of images began several years ago [8]; however, hashtag recommendation defined as a multi-label problem seems to be a new and not yet fully explored subject. In a previous paper [1], researchers proposed an open dataset called HARRISON on which the efficiency of a proposed method can be evaluated. The authors also proposed a “baseline” algorithm that utilized a transfer learning approach. An ensemble of deep convolutional neural networks was also used in Reference [9] for a similar task; however, it was formulated as a single-label classification problem. A single-label approach was also presented in Reference [10], where the authors explored the possibility of generating hashtags for an input image and leveraged them to generate meaningful anecdotes connected to the essence of the image by applying a character-level Recurrent Neural Network (RNN) [11,12].
Many up-to-date methods utilize additional information besides image data to improve the performance of hashtag recommendation algorithms. The authors of Reference [13] applied transfer learning to train a network to extract image embeddings and also used historical tagging information to generate personalized tag recommendations. In Reference [14], overall hashtag recommendations are generated on the basis of both the post’s features from a content modeling module (long short-term memory for text and a CNN for images) and the habit influences from a habit modeling module. Another paper [15] proposed methods for calculating the recommendation score of each hashtag based on generated topical representations of multiple features: Distributed Vector Representation of Words, a User-Hashtag Matrix, and a User-Hashtag Topic Model Based on Short Text Expansion. The authors of Reference [16] proposed a hashtag recommendation algorithm that, besides CNN-based image embedding, utilized user metadata such as age, gender, home city, and country. The authors of Reference [17] applied three different pre-trained CNN models to increase differentiation between hashtags; an SVM model was then trained on the extracted features, and semantic embedding modeling for vocabulary (hashtag) expansion was performed with the Word2Vec Skip-gram model. The attention-based neural image hashtagging (A-NIH) model introduced in Reference [18] consists of two parts: a CNN-based encoder with sequential attention and a gated recurrent unit-based decoder for hashtag recommendation. The PhD thesis of Nguyen [19] is a valuable source of up-to-date information in the field of deep learning approaches for recommending image tags.

1.2. Motivation of This Paper

As can be seen in the previous section, multi-label hashtag recommendation for image data is challenging to model but is a very promising area of research with important industrial applications, especially on social media platforms. In this research we propose and evaluate a novel machine learning approach that utilizes an ensemble of deep neural networks to generate image features, which are then classified against a given set of potential hashtags. The proposed hashtags are then filtered by a voting schema. The remaining hashtags might be included in the final set of recommended hashtags by applying associative rules mining, which explores dependencies in certain hashtag groups. We call this method the Voting Deep Neural Network with Associative Rules Mining (VDNN-ARM). We evaluated VDNN-ARM on the HARRISON dataset, which contains 57,383 images in 997 classes (one image can be assigned to more than one class). We also implemented and trained other state-of-the-art approaches, namely References [1,9], and our method outperformed those algorithms. Both the dataset and all source codes we implemented for this research are available for download, so our results can be reproduced.
The most important contribution of this paper is the proposal and evaluation of a novel computational method that recommends hashtags from image data. The main novelty is the use of an ensemble of deep neural networks whose classification is enhanced by additional information about dependencies between certain hashtag groups, discovered by associative rules mining. To the best of our knowledge, deep learning and rule discovery have not previously been combined into a single voting-and-recommendation schema for the task of hashtag recommendation. The learning step of our proposed algorithm requires only a sufficiently large training dataset of images and information about the hashtags associated with them.

2. Materials and Methods

In this section we describe the dataset on which we evaluated our method and the schema of our multi-label classifier.

2.1. Dataset

In this research we utilized a real-world photo dataset, HARRISON [1], composed of 57,383 photos from Instagram. The authors of the dataset processed it by filtering out less frequent hashtags. As a result, each image in the dataset is described by one to ten hashtags; the total number of hashtags is 997, with an average of 4.5 hashtags per photo. The task of assigning hashtags to a photo is defined here as a multi-label problem because each image might have one or more classes (hashtags) assigned to it. The performance of the evaluated method is measured using precision calculated on the first suggested hashtag (precision (1)), recall of the first five suggested hashtags (recall (5)), and accuracy of the first five suggested hashtags (accuracy (5)), as follows:
$$\mathrm{precision}(k) = \frac{|\mathrm{results}(k) \cap GT|}{|\mathrm{results}(k)|},$$

$$\mathrm{recall}(k) = \frac{|\mathrm{results}(k) \cap GT|}{|GT|},$$

$$\mathrm{accuracy}(k) = \begin{cases} 1 & \text{if } \mathrm{results}(k) \cap GT \neq \emptyset \\ 0 & \text{if } \mathrm{results}(k) \cap GT = \emptyset, \end{cases}$$
where $k$ is the number of top (“best”) hashtags we want to consider, $\mathrm{results}(k)$ is the set of top $k$ hashtags predicted by the algorithm, and $GT$ is the set of ground truth hashtags. As can be seen from this evaluation setup, obtaining 100% precision and recall simultaneously is virtually impossible; for example, when an image has fewer than $k$ ground truth hashtags, $\mathrm{precision}(k)$ cannot reach 100%.
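To make the evaluation protocol concrete, a minimal Python sketch of these three metrics is given below; the function names and the toy example are ours and do not come from the paper's published source code.

```python
# Minimal sketch of precision(k), recall(k), and accuracy(k) as defined above.
def precision_at_k(predicted, ground_truth, k):
    results = predicted[:k]                      # top-k recommended hashtags
    hits = set(results) & set(ground_truth)      # results(k) ∩ GT
    return len(hits) / len(results)

def recall_at_k(predicted, ground_truth, k):
    hits = set(predicted[:k]) & set(ground_truth)
    return len(hits) / len(ground_truth)

def accuracy_at_k(predicted, ground_truth, k):
    # 1 if at least one of the top-k hashtags is in the ground truth, else 0.
    return int(bool(set(predicted[:k]) & set(ground_truth)))

# Toy example: an image with three ground truth hashtags.
pred = ["sunset", "sea", "sky", "love", "travel"]   # ranked predictions
gt = ["sunset", "beach", "love"]
print(precision_at_k(pred, gt, 1))  # 1.0 (top-1 hashtag is in GT)
print(recall_at_k(pred, gt, 5))     # 2/3 ≈ 0.67 ("sunset" and "love" found)
print(accuracy_at_k(pred, gt, 5))   # 1 (at least one hit among top 5)
```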

2.2. Classifier Architecture

The architecture of our solution was inspired by previous research in this field. Researchers quickly discovered that a single CNN might not be enough to extract all valuable features from an image. In Reference [1], the authors used two deep feature extractors, both with the VGG16 architecture [20]; however, the first was trained on the ImageNet dataset [21] and the second on the Places dataset [22,23]. Researchers have widely used the transfer learning approach, in which network weights are imported from pre-trained models to extract deep features, and the final classification layers are re-trained on the actual classes present in the dataset [24,25]. In Reference [1], transfer learning is conducted on the HARRISON dataset. The classification network is composed of two fully connected (dense) layers with ReLU activation and an output layer with a sigmoid activation function. This is a typical network setup for multi-label classification problems.
The solution proposed in Reference [9] also uses an ensemble of DNNs pretrained on ImageNet, namely VGG16, InceptionV3 [26], and ResNet [27]. Contrary to Reference [1], transfer learning for each DNN is performed separately on the HARRISON dataset. The solution of Reference [9] is, however, simpler than that of Reference [1] because the authors treated it as a single-label problem: the output layer of the DNN classification part used softmax. The authors experimented with various ensemble schemas such as voting, union, intersection, and so forth.
In this paper we propose an approach that incorporates ideas from the above papers, called VDNN-ARM (Voting-based Deep Neural Network architecture with Associative Rules Mining). Figure 1 presents an overview of the method. It consists of several CNN-based feature extractors, namely Xception [28], DenseNet201 [29], InceptionResNetV2 [27], VGG16, NASNetLarge [30], InceptionV3, and MobileNetV2 [31]. Each of these seven networks is pretrained on ImageNet; an additional, eighth VGG16 instance is pretrained on the Places dataset (the same setup as in Reference [1]). Each of these networks accepts input images with dimensions of 224 × 224 in RGB color space. The output of each CNN is processed by a Global Average Pooling 2D layer and then propagated to classification layers. Each of the eight networks has the same classification architecture, consisting of two dense layers of 2048 neurons with ReLU activation functions and an output layer with sigmoid activation. The role of the last layer is to perform multi-label classification. Similarly to Reference [9], we performed separate transfer learning for each of the eight networks on the HARRISON dataset. For a given input image, each of the eight DNNs generates a sigmoidal output. In Reference [1], the authors classified input images into x class labels (they assigned x hashtags to an image), which corresponds to the x top values generated by the output sigmoid layer. The number of classes/recommendations x is arbitrarily decided by the user of the algorithm.
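As an illustration, the following is a minimal Keras sketch of a single ensemble member under our reading of this description; Xception stands in for any of the eight backbones, and freezing the backbone weights during transfer learning is our assumption, as the text does not state which layers were re-trained.

```python
# Sketch of one VDNN-ARM ensemble member: pretrained CNN backbone,
# Global Average Pooling, two 2048-neuron ReLU layers, sigmoid output.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_HASHTAGS = 997  # number of hashtag classes in HARRISON

backbone = tf.keras.applications.Xception(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
backbone.trainable = False  # assumed: reuse frozen pretrained features

model = models.Sequential([
    backbone,
    layers.GlobalAveragePooling2D(),
    layers.Dense(2048, activation="relu"),
    layers.Dense(2048, activation="relu"),
    # Sigmoid (not softmax) so several hashtags can fire independently.
    layers.Dense(NUM_HASHTAGS, activation="sigmoid"),
])
```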
VDNN-ARM takes a different approach to recommending hashtags. Because each DNN generates separate recommendations, we can apply various ensemble techniques to them, similarly to Reference [9]. However, besides using image data during training, we can also utilize information about dependencies between hashtags that is available in the dataset. We do this by applying an associative rules mining framework.
Let T be the set of all transactions in the given dataset, let A and B be itemsets, and let $A \Rightarrow B$ be an association rule [32]. We define the support of an itemset A as the count of A among all transactions T, in other words the frequency of A in the dataset:
$$\mathrm{support}(A) = \frac{\#A}{\#T}.$$
The confidence of the association rule $A \Rightarrow B$ is the conditional probability of B given A:
$$\mathrm{confidence}(A \Rightarrow B) = \frac{\mathrm{support}(A \cup B)}{\mathrm{support}(A)}.$$
In our case, a transaction is the set of hashtags that describes a given image in our dataset. As described in Section 2.1, each transaction in our dataset therefore contains 1 to 10 items. We want to investigate potential associations between hashtags with reasonable support and confidence. In order to extract frequent itemsets, we used the Apriori algorithm [33].
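As a sketch, frequent itemsets and rules over hashtag transactions can be mined with the mlxtend package that we used in our implementation (see Section 3); the toy transactions below are invented for demonstration, while the two thresholds match the values reported in Section 3.

```python
# Mining hashtag association rules with the Apriori algorithm (mlxtend).
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Each transaction is the hashtag set describing one training image.
transactions = [
    ["sunset", "beach", "love"],
    ["sunset", "beach", "holidays"],
    ["cat", "cute", "love"],
]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

# Thresholds as in Section 3: min support 0.0001, min confidence 0.001.
frequent = apriori(onehot, min_support=0.0001, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.001)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```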
After training the DNNs and mining rules from the training dataset, the VDNN-ARM algorithm is applied; it is described in the following section.

VDNN-ARM Algorithm

Let us assume there are l DNN classifiers. For an input image I, each CNN $f_j$ generates a feature vector that is used as input to a dense (fully connected) NN $p_j$ with a sigmoid output layer. For the prediction $P_{j,[1 \dots k]}(I)$ we take the k classes that correspond to the top values of the NN output layer:
$$P_{j,[1 \dots k]}(I) = p_j(f_j(I))_{[1 \dots k]}.$$
In the next step we compose a single vector P, which contains predictions from all l DNN classifiers.
$$P = [P_{1,[1 \dots k]}, \dots, P_{l,[1 \dots k]}].$$
Using P we generate two vectors: C, which contains the unique elements of P, and $C_{fr}$, which contains the counts in P of the hashtag class labels from C:
$$C = [h_1, h_2, \dots, h_n], \quad C_{fr} = [\#h_1, \#h_2, \dots, \#h_n],$$
where $\#h_1 \geq \#h_2 \geq \dots \geq \#h_n$, $h_i$ is a hashtag class label, and $\#h_i$ is the count of the hashtag class label $h_i$ in P.
Then, we threshold C and create two vectors: $C_1$, which contains the classes that appeared in more than one classifier output, and $C_2$, which contains those that appeared in only one output. $C_1$ is ordered by descending hashtag count:
$$C_1 = [h_1, h_2, \dots, h_m], \quad C_2 = [h_{m+1}, \dots, h_n],$$
where $\#h_m \geq 2$.
$C_1$ contains the classes for which at least two classifiers have voted. We then apply associative rules mining to $C_1$ and generate $ARM(C_1)$, the set of all conclusions supported by the mined rules, and take the intersection of $ARM(C_1)$ with $C_2$. The resulting hashtag set $C_3$ contains only the classes that appeared as conclusions of rules supported by $C_1$ and that are also present in $C_2$:
$$C_3 = ARM(C_1) \cap C_2,$$
where $ARM$ denotes the reasoning applied by associative rules mining (ARM) using the previously discovered rules.
The classes present in $C_3$ are appended to the end of vector $C_1$. S is a vector containing the suggested hashtags for image I, ordered from those most frequently proposed by the classifiers to those that appeared only once but were supported by ARM:
$$S = [C_1, C_3].$$
We can now take the first x elements of S to generate the x top hashtag suggestions for a given image.
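The whole procedure can be condensed into a short Python sketch, shown below; the input data structures (per-classifier top-k label lists and a dictionary of mined rules) are illustrative simplifications, and the ordering inside $C_3$ is our assumption, since the paper only orders $C_1$.

```python
# Condensed sketch of the VDNN-ARM voting and recommendation step.
from collections import Counter

def vdnn_arm(predictions, rules):
    # P: top-k predictions of all l classifiers pooled together.
    votes = Counter(h for top_k in predictions for h in top_k)
    # C1: hashtags voted for by at least two classifiers, most votes first.
    c1 = [h for h, n in votes.most_common() if n >= 2]
    # C2: hashtags proposed by exactly one classifier.
    c2 = {h for h, n in votes.items() if n == 1}
    # ARM(C1): consequents of every rule whose antecedent lies within C1.
    conclusions = set()
    for antecedent, consequents in rules.items():
        if antecedent <= set(c1):
            conclusions |= consequents
    # C3: ARM conclusions that also received a single vote; append to C1.
    c3 = conclusions & c2
    return c1 + sorted(c3)  # S = [C1, C3]; order inside C3 is our choice

# Example with l = 3 classifiers (k = 2) and one mined rule.
preds = [["sunset", "beach"], ["sunset", "sea"], ["sunset", "love"]]
rules = {frozenset({"sunset"}): {"love", "holidays"}}
print(vdnn_arm(preds, rules))  # ['sunset', 'love']
```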

3. Results

We implemented our approach in Python 3.6. The most important packages we used were Keras 2.4.3 and TensorFlow 2.3.1 for DNN implementation and GPU-accelerated tensor calculations. We used pre-trained CNN network weights from Keras-Applications 1.0.8 trained on the ImageNet dataset [21], as well as VGG16 network weights [23] trained on the Places dataset [22]. For associative rules mining we utilized the mlxtend 0.17.3 package [34]. To evaluate the proposed method we used the HARRISON dataset [1] described in Section 2.1, with 52,383 randomly chosen images in the training set and 5000 in the validation set.
All classifiers were trained using the first-order gradient-based Adam optimizer [35] with a binary cross-entropy loss function.
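Continuing the Keras sketch from Section 2.2, this training configuration could be expressed as follows; the data variables, batch size, and epoch count are placeholders rather than values reported in this paper.

```python
# Compile and train one ensemble member; labels are multi-hot vectors
# of length 997 (one entry per hashtag class). Data/settings are placeholders.
model.compile(optimizer="adam",            # Adam optimizer [35]
              loss="binary_crossentropy",  # standard multi-label loss
              metrics=["binary_accuracy"])
model.fit(train_x, train_y,
          validation_data=(val_x, val_y),
          batch_size=32, epochs=10)
```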
In order to generate associative rules, we set the minimal support threshold in the Apriori algorithm to 0.0001 and filtered out all rules with confidence below 0.001.
We also implemented and trained the algorithms proposed in References [1,9]. In the case of Reference [9], we replaced the softmax layer with a dense layer with a sigmoid activation function to make the classifier applicable to multi-label problems. All source codes we implemented in our research can be downloaded from GitHub (https://github.com/browarsoftware/VDNN-ARM). Calculations were performed on a PC with an Intel i7-9700 3.00 GHz CPU and 64 GB RAM, running the Windows 10 OS, with an NVIDIA GeForce RTX 2060 GPU.
In Figure 2 we present the accuracy (5) tests of each DNN in the form of a graph, generated with Gephi 0.9.2 [36]. The graph layout was produced by the ForceAtlas2 algorithm [37].
Each node (vertex) represents an image from the validation dataset. If an image is connected by a colored edge to a node representing a particular DNN, then at least one of the top five hashtags generated by that DNN is among the hashtags describing this image. An image might have connections to several different vertices if, and only if, it passed the accuracy (5) test in more than one network. If a node is isolated, the image has not been correctly classified by any network. As can be seen, there is a group of images that are not correctly classified by any network; they are visible in the top part of the graph as isolated grey points. This clearly shows that DNN-based hashtag discovery algorithms have limitations that cannot be overcome. In addition, the sets of images covered by the various DNNs are not identical, which means that applying an ensemble of several DNNs of various types might give better results than using a single DNN.
In Table 1 and Figure 3 we present detailed results of the proposed algorithm for various confidence thresholds of $ARM$. It is also possible that our proposed method generates no hashtag recommendations; this happens when $C_1 = \emptyset$ (see Equation (9)). The column “#Recommended hashtags” means that we evaluate precision, recall, and accuracy for no more than the x top generated hashtags, where x is at most the length of vector S (see Equation (11)). “No restrictions” means that we calculate all evaluation parameters for the whole vector S. Bold font indicates the highest values in the table. The best results for a limited number of hashtags were obtained by VDNN-ARM with threshold = 0.95. When the number of hashtags is not restricted, the highest recall and accuracy were obtained by VDNN-ARM with threshold = 0.2. In Equation (7) we use the parameter k = 5, the same as in References [1,9].
In all cases, as the number of proposed hashtags increases, precision decreases while recall and accuracy increase. This is expected behavior. The most voted (most probable) hashtags sit at the beginning of vector S; as the number of considered hashtags grows, the denominator of the precision equation grows, and increasingly less probable hashtags are included in the evaluation. In the case of recall and accuracy, a higher number of hashtags increases the numerator, which increases both measures. When we do not limit the number of hashtags to 5 (“No restrictions”), recall and accuracy achieve their highest values.
Table 2 presents a comparison of the proposed method to state-of-the-art approaches. The highest values of all coefficients were obtained by VDNN-ARM with a confidence threshold of 0.95. VDNN-ARM outperforms the precision (1) of VGG-Object + VGG-Scene [1] by 17.91%; compared with Ensemble-FFNN (intersection) [9], recall (5) increased by 32.33% and accuracy (5) by 27.00%.

4. Discussion

As can be seen in Section 3, the proposed method outperformed state-of-the-art approaches. Due to its voting schema, the method incorporates the benefits of both the union and intersection schemas: the intersection schema is responsible for aggregating and counting the recommended hashtag labels of each sub-DNN, while the union schema does not exclude less frequent hashtags from the final recommendation. The application of associative rules mining utilizes additional knowledge about conditional dependencies between hashtags. As can be seen in Table 2, in the case of algorithm [9], the image content data generated solely by the DNN is not enough to overcome the baseline results [1], whose transfer learning approach utilized CNN features for successful classification of image content.
Typically, in the up-to-date literature, authors use methods that suggest the x most probable hashtags from the output sigmoid layer, where x is arbitrarily chosen by the user. VDNN-ARM also allows one to manually choose the number of hashtag recommendations; however, by applying $ARM$ with a confidence thresholding schema, it can also include less frequent hashtags recommended by ARM. The highest values of the evaluation parameters, including precision (1), recall (5), and accuracy (5), were obtained for VDNN-ARM with a confidence threshold of 0.95, and all parameters decreased as the confidence threshold decreased (see Table 1). This result suggests that increasing the confidence of the $ARM$ rules increases classifier performance: applying a higher number of confident rules leads to a higher number of “matching” hashtags generated by the approach.
Our results suggest the proposed algorithm is a promising approach that can be successfully applied in practice. Another important finding concerns the limitations of DNN-based hashtag discovery algorithms, which we discussed in Section 3 and visualized in Figure 2: in order to improve the evaluation results, we need to improve parts of the algorithm other than the CNN-based image feature extractors. According to Reference [38], there is a certain category of image hashtags that its authors named “stophashtags”. The name is inspired by the term “stopwords”, used in computational linguistics to refer to common, non-descriptive words found in almost every text document. The authors of that research have shown that, contrary to descriptive hashtags (hashtags relevant to the subject of an image), “stophashtags” are characterized by a high normalized subject (hashtag) frequency on irrelevant subject categories. Because we used a third-party benchmark dataset that had already been preprocessed by its creators and used in other research, we did not filter out potential “stophashtags”. It is possible that filtering “stophashtags” might improve the results of our method; however, the algorithm described in Reference [38] should operate on all acquired hashtags, not the subset present in the HARRISON dataset. In our future research in the field of hashtag recommendation, we plan to acquire an even larger dataset than HARRISON and apply “stophashtag” filtering to it. We believe this operation might lead to even more interesting and valuable results.

5. Conclusions

The proposed VDNN-ARM hashtag recommendation algorithm is an efficient approach that can be applied to any type of social media image data. As can be seen, the precision (1) coefficient is still relatively low: the first top hashtag is correct for only about one-fifth of the validation data. In the case of accuracy (5), over 55% of the validation data has at least one correctly assigned hashtag. This is because the multi-label classification problem is difficult: not only are there 997 classes, but the ground truth class labels might not match the objects, scenes, and places actually present in the images. Real-life hashtag descriptions often reflect a state of mind, a sentiment, or some abstract context that the person had at the moment of taking or publishing a photo. Each photo might have between 1 and 10 different hashtags, and that number varies between images. Therefore, when we do not have additional knowledge about the context of a picture (the so-called “story behind the photo”), we might not be able to mine/learn the rules that govern certain phenomena. Knowledge of those rules is also not fully available to a person who might try to assign hashtags manually. Because of this, it is very improbable that any algorithm based on HARRISON data will obtain perfect or even nearly perfect accuracy. Contrary to already published methods, our algorithm is capable of adjusting the number of proposed hashtags by applying the ARM approach and its thresholding schema. Thanks to this, VDNN-ARM in no-restrictions mode can easily trade off between precision and accuracy/recall.
We believe our algorithm is not limited to hashtag recommendation; it can be applied to any type of multi-label image classification data. In our opinion, the next step in this research should be developing methods that utilize additional information, such as the context of the photo, which can be extracted from discussions about the photo on social media, geopositioning information, and so forth. These additional data, beyond image data and baseline hashtag information, seem crucial for increasing the efficiency of multi-label hashtag recommendation above the level attainable by image-oriented DNNs, whose limitations are clearly visible in Figure 2.

Author Contributions

T.H. was responsible for conceptualization, proposed methodology, software implementation, and co-writing of original draft; J.M. was responsible for data curation and co-writing of original draft. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Pedagogical University of Krakow.

Conflicts of Interest

The authors declare no conflict of interest. The funder had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Park, M.; Li, H.; Kim, J. HARRISON: A Benchmark on HAshtag Recommendation for Real-world Images in Social Networks. arXiv 2016, arXiv:1605.05054. [Google Scholar]
  2. Zangerle, E.; Gassler, W.; Specht, G. Recommending #-tags in Twitter. In Proceedings of the Workshop on Semantic Adaptive Social Web (SASWeb 2011); CEUR Workshop Proceedings; RWTH: Aachen, Germany, 2011. [Google Scholar]
  3. Gong, Y.; Zhang, Q.; Huang, X. Hashtag recommendation for multimodal microblog posts. Neurocomputing 2018, 272, 170–177. [Google Scholar] [CrossRef]
  4. Belhadi, A.; Djenouri, Y.; Lin, J.C.; Zhang, C.; Cano, A. Exploring Pattern Mining Algorithms for Hashtag Retrieval Problem. IEEE Access 2020, 8, 10569–10583. [Google Scholar] [CrossRef]
  5. Miazga, J.; Hachaj, T. Evaluation of Most Popular Sentiment Lexicons Coverage on Various Datasets. In Proceedings of the 2019 2nd International Conference on Sensors, Signal and Image Processing, Prague, Czech Republic, 8 October 2019; pp. 86–90. [Google Scholar] [CrossRef]
  6. Li, Q.; Shah, S.; Nourbakhsh, A.; Liu, X.; Fang, R. Hashtag Recommendation Based on Topic Enhanced Embedding, Tweet Entity Data and Learning to Rank. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management (CIKM 2016), Indianapolis, IN, USA, 24 October 2016; pp. 2085–2088. [Google Scholar] [CrossRef]
  7. Liu, Z.; Chen, X.; Sun, M. A Simple Word Trigger Method for Social Tag Suggestion. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Edinburgh, UK, 27–31 July 2011; Association for Computational Linguistics: Edinburgh, UK, 2011; pp. 1577–1588. [Google Scholar]
  8. Xu, K.; Ba, J.; Kiros, R.; Cho, K.; Courville, A.; Salakhutdinov, R.; Zemel, R.; Bengio, Y. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 2048–2057. [Google Scholar]
  9. Jocić, M.; Đorđe, O.; Malbasa, V.; Konjovic, Z. Image tagging with an ensemble of deep convolutional neural networks. In Proceedings of the 2017 International Conference on Information Society and Technology, Budapest, Hungary, 12–15 March 2017; pp. 13–17. [Google Scholar]
  10. Gaur, S. Generation of a Short Narrative Caption for an Image Using the Suggested Hashtag. In Proceedings of the 2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW), Macao, China, 8–12 April 2019; pp. 331–337. [Google Scholar]
  11. De Boom, C.; Dhoedt, B.; Demeester, T. Character-level Recurrent Neural Networks in Practice: Comparing Training and Sampling Schemes. Neural Comput. Appl. 2019, 31, 4001–4017. [Google Scholar] [CrossRef] [Green Version]
  12. Hwang, K.; Sung, W. Character-level language modeling with hierarchical recurrent neural networks. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017; pp. 5720–5724. [Google Scholar] [CrossRef] [Green Version]
  13. Nguyen, H.; Wistuba, M.; Schmidt-Thieme, L. Personalized Tag Recommendation for Images Using Deep Transfer Learning. In ECML PKDD 2017: Machine Learning and Knowledge Discovery in Databases; Springer: Cham, Switzerland, 2017; pp. 705–720. [Google Scholar] [CrossRef]
  14. Zhang, S.; Yao, Y.; Xu, F.; Tong, H.; Yan, X.; Lu, J. Hashtag Recommendation for Photo Sharing Services. Proc. AAAI Conf. Artif. Intell. 2019, 33, 5805–5812. [Google Scholar] [CrossRef]
  15. Kou, F.; Du, J.P.; Yang, C.X.; Shi, Y.S.; Cui, W.Q.; Liang, M.Y.; Geng, Y. Hashtag Recommendation Based on Multi-Features of Microblogs. J. Comput. Sci. Technol. 2018, 33, 711–726. [Google Scholar] [CrossRef]
  16. Denton, E.; Weston, J.; Paluri, M.; Bourdev, L.; Fergus, R. User Conditional Hashtag Prediction for Images. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, 10 August 2015; pp. 1731–1740. [Google Scholar] [CrossRef]
  17. Kao, D.; Lai, K.T.; Chen, M.S. An Efficient and Resource-Aware Hashtag Recommendation Using Deep Neural Networks. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2019: Advances in Knowledge Discovery and Data Mining; Springer: Cham, Switzerland, 2019; pp. 150–162. [Google Scholar] [CrossRef]
  18. Gaosheng, W.; Li, Y.; Yan, W.; Li, R.; Gu, X.; Yang, Q. Hashtag Recommendation with Attention-Based Neural Image Hashtagging Network. In Proceedings of the 25th International Conference, ICONIP 2018, Siem Reap, Cambodia, 13–16 December 2018; pp. 52–63. [Google Scholar] [CrossRef]
  19. Nguyen, T. Recommending Tags for Images: Deep Learning Approaches for Personalized Tag Recommendation. Ph.D. Thesis, University of Hildesheim, Hildesheim, Germany, 2018. [Google Scholar] [CrossRef]
  20. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2015, arXiv:1409.1556. [Google Scholar]
  21. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [Google Scholar] [CrossRef] [Green Version]
  22. Zhou, B.; Lapedriza, A.; Khosla, A.; Oliva, A.; Torralba, A. Places: A 10 Million Image Database for Scene Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2017. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  23. Kalliatakis, G. Keras-VGG16-Places365. 2017. Available online: https://github.com/GKalliatakis/Keras-VGG16-places365 (accessed on 29 November 2020).
  24. Panboonyuen, T.; Jitkajornwanich, K.; Lawawirojwong, S.; Srestasathiern, P.; Vateekul, P. Semantic Segmentation on Remotely Sensed Images Using an Enhanced Global Convolutional Network with Channel Attention and Domain Specific Transfer Learning. Remote Sens. 2019, 11, 83. [Google Scholar] [CrossRef] [Green Version]
  25. Lin, D.; Wang, Y.; Xu, G.; Li, J.; Fu, K. Transform a Simple Sketch to a Chinese Painting by a Multiscale Deep Neural Network. Algorithms 2018, 11, 4. [Google Scholar] [CrossRef] [Green Version]
  26. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
  27. Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A.A. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. arXiv 2016, arXiv:1602.07261. [Google Scholar]
  28. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807. [Google Scholar]
  29. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269. [Google Scholar]
  30. Zoph, B.; Vasudevan, V.; Shlens, J.; Le, Q.V. Learning Transferable Architectures for Scalable Image Recognition. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8697–8710. [Google Scholar]
  31. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
  32. Agrawal, R.; Imielinski, T.; Swami, A. Mining Associations Between Sets of Items in Massive Databases. In Proceedings of the ACM-SIGMOD 1993 International Conference on Management of Data, Washington, DC, USA, 26–28 May 1993. [Google Scholar]
  33. Agrawal, R.; Srikant, R. Fast Algorithms for Mining Association Rules in Large Databases. In Proceedings of the 20th International Conference on Very Large Data Bases; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1994; pp. 487–499. [Google Scholar]
  34. Raschka, S. MLxtend: Providing machine learning and data science utilities and extensions to Python’s scientific computing stack. J. Open Source Softw. 2018, 3, 638. [Google Scholar] [CrossRef]
  35. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
  36. Bastian, M.; Heymann, S.; Jacomy, M. Gephi: An Open Source Software for Exploring and Manipulating Networks. Int. AAAI Conf. Weblogs Soc. Media 2009, 8, 361–362. [Google Scholar]
  37. Jacomy, M.; Venturini, T.; Heymann, S.; Bastian, M. ForceAtlas2, a Continuous Graph Layout Algorithm for Handy Network Visualization Designed for the Gephi Software. PLoS ONE 2014, 9, e98679. [Google Scholar] [CrossRef] [PubMed]
  38. Giannoulakis, S.; Tsapatsoulis, N. Defining and Identifying Stophashtags in Instagram. In INNS 2016: Advances in Big Data; Springer: Cham, Switzerland, 2016; pp. 304–313. [Google Scholar] [CrossRef]
Figure 1. An overview of the Voting Deep Neural Network with Associative Rules Mining (VDNN-ARM) method.
Figure 2. Graphic representation of accuracy (5) tests on each DNN network.
Figure 3. Graphical results of Table 1. Plot (a) shows precision obtained for various numbers of hashtags and confidence values of ARM. Plot (b) visualizes recall and (c) accuracy.
Table 1. Results of precision, recall, and accuracy obtained by VDNN-ARM depending on the confidence value.

| ARM Confidence | #Recommended Hashtags | Precision (%) | Recall (%) | Accuracy (%) |
|---|---|---|---|---|
| 0.2 | No more than 1 | 12.34 | 3.30 | 12.34 |
| 0.2 | No more than 2 | 13.13 | 7.32 | 23.82 |
| 0.2 | No more than 3 | 12.72 | 10.18 | 31.90 |
| 0.2 | No more than 4 | 12.39 | 13.12 | 38.58 |
| 0.2 | No more than 5 | 12.09 | 15.58 | 44.00 |
| 0.2 | No restrictions | 11.59 | **33.67** | **67.16** |
| 0.3 | No more than 1 | 15.52 | 4.18 | 15.52 |
| 0.3 | No more than 2 | 16.14 | 8.93 | 28.90 |
| 0.3 | No more than 3 | 15.41 | 12.53 | 37.22 |
| 0.3 | No more than 4 | 15.15 | 15.85 | 44.02 |
| 0.3 | No more than 5 | 14.83 | 18.48 | 48.72 |
| 0.3 | No restrictions | 14.49 | 30.12 | 63.24 |
| 0.5 | No more than 1 | 19.12 | 5.29 | 19.12 |
| 0.5 | No more than 2 | 18.82 | 10.59 | 32.86 |
| 0.5 | No more than 3 | 18.21 | 14.94 | 42.28 |
| 0.5 | No more than 4 | 18.01 | 18.59 | 49.12 |
| 0.5 | No more than 5 | 17.72 | 21.21 | 53.10 |
| 0.5 | No restrictions | 17.43 | 26.77 | 59.44 |
| 0.75 | No more than 1 | 19.90 | 5.52 | 19.90 |
| 0.75 | No more than 2 | 19.99 | 11.34 | 34.70 |
| 0.75 | No more than 3 | 19.40 | 16.02 | 44.64 |
| 0.75 | No more than 4 | 19.00 | 19.77 | 51.32 |
| 0.75 | No more than 5 | 18.76 | 22.29 | 54.92 |
| 0.75 | No restrictions | 18.48 | 25.73 | 58.72 |
| 0.95 | No more than 1 | **20.44** | 5.76 | 20.44 |
| 0.95 | No more than 2 | 20.25 | 11.47 | 35.04 |
| 0.95 | No more than 3 | 19.61 | 16.26 | 44.86 |
| 0.95 | No more than 4 | 19.23 | 20.03 | 51.56 |
| 0.95 | No more than 5 | 19.00 | 22.52 | 55.18 |
| 0.95 | No restrictions | 18.77 | 25.45 | 58.46 |
Table 2. Comparison of precision (1), recall (5), and accuracy (5) of state-of-the-art algorithms and VDNN-ARM.

| Method | Precision (1) (%) | Recall (5) (%) | Accuracy (5) (%) |
|---|---|---|---|
| VGG-Object + VGG-Scene (Baseline) [1] | 16.78 | 13.33 | 35.80 |
| Ensemble-FFNN (union) [9] | 11.36 | 13.48 | 37.46 |
| Ensemble-FFNN (intersection) [9] | 14.12 | 15.24 | 40.28 |
| VDNN-ARM confidence = 0.2 | 12.34 | 15.58 | 44.00 |
| VDNN-ARM confidence = 0.3 | 15.52 | 18.48 | 48.72 |
| VDNN-ARM confidence = 0.5 | 19.12 | 21.21 | 53.10 |
| VDNN-ARM confidence = 0.75 | 19.90 | 22.29 | 54.92 |
| VDNN-ARM confidence = 0.95 | **20.44** | **22.52** | **55.18** |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
