- Analyzing the theoretical basis in the field of NDI research, defining the category of NDI and sources of its formation.
- Describing the key qualities of NDI, determining its advantages in comparison to other types of market information.
- Forming a basic methodology for conducting typical NDI-based market research.
- Conducting applied research in accordance with the devised methodology.
- Processing the research results, describing the unique methodological qualities of market research based on natural digital information processing.
2. Literature Review
- Universally accessible NDI, which include:
- Information specified by users in their profiles (accounts) on various social media: gender, age, place of residence, interests, contacts, hobbies, etc.
- Messages written by the user in open forums, in open groups on social media as comments, answers to questions, reviews, posts in blogs, etc.
- Number of views, likes, reposts, and comments on various messages (posts, notes, etc.) in the digital space.
- Private NDI, which include:
- Personal correspondence in messengers and private messages on social media.
- The so-called “passive digital footprint”: history of website visits, geolocation, data on credit card purchases, etc. In fact, the passive digital footprint, as a type of private natural digital information, is the closest to web analytics data, and the line between the two is not always easy to draw.
- NDI is more reliable than data obtained through “traditional” market research instruments such as focus groups, in-depth interviews, and surveys. This advantage stems from the independence of how NDI is obtained: in many cases, users do not suspect that the digital information they leave behind is collected and analyzed, so it is free of the “social desirability” effect mentioned above. This advantage maximizes such properties of information as “uncertainty reduction” or “entropy production constraint”. As a result, natural digital information occupies a well-defined position in information hierarchy theory, according to which the user's purpose in using any information can be characterized as movement along the route: thermodynamic search → uncertainty reduction → appearance of meaning.
- The cost of obtaining such information is minimal. In fact, most NDI is openly accessible (on websites, forums, social media, etc.).
- The time needed to collect NDI is also minimal and, most importantly, collection can be automated by creating a single updatable data frame that gathers a predefined pool of NDI at a frequency set by the researcher.
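The automated collection described above can be sketched as follows. This is a minimal sketch under stated assumptions: `fake_fetch` is a hypothetical stand-in for whatever scraper or platform API the researcher uses, every record is assumed to carry a `post_id`, and the "single updatable data frame" is modeled as an in-memory table keyed by post ID rather than any specific library's DataFrame.

```python
import time
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, List


class NDIFrame:
    """Single updatable table of NDI records keyed by post ID (hypothetical structure)."""

    def __init__(self) -> None:
        self.rows: Dict[str, dict] = {}

    def update(self, records: Iterable[dict]) -> int:
        """Insert new posts or refresh known ones; return how many posts were new."""
        stamp = datetime.now(timezone.utc).isoformat()
        added = 0
        for rec in records:
            key = rec["post_id"]
            if key not in self.rows:
                added += 1
            # A newer snapshot overwrites the older one, so counters such as likes stay current.
            self.rows[key] = {**rec, "collected_at": stamp}
        return added


def run_collection(frame: NDIFrame, fetch: Callable[[], List[dict]],
                   interval_s: float, cycles: int) -> None:
    """Poll `fetch` every `interval_s` seconds for `cycles` rounds (researcher-set frequency)."""
    for i in range(cycles):
        frame.update(fetch())
        if i < cycles - 1:
            time.sleep(interval_s)


# Usage with a fake fetcher standing in for a real scraper (hypothetical data).
_calls = {"n": 0}


def fake_fetch() -> List[dict]:
    _calls["n"] += 1
    n = _calls["n"]
    return [{"post_id": "p1", "likes": 10 * n}, {"post_id": f"p{n + 1}", "likes": 1}]


frame = NDIFrame()
run_collection(frame, fake_fetch, interval_s=0, cycles=2)
print(len(frame.rows), frame.rows["p1"]["likes"])  # 3 20
```

Keying rows by post ID means that repeated collection rounds refresh engagement counters instead of duplicating posts; in practice, the interval would be minutes or hours rather than zero.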
3. Materials and Methods
- The visual (graphic) part of the advertisement: What color schemes should be chosen to attract the user's maximum attention? What fonts should be used? How bright or neutral should the advertising post look?
- The text (content) part of the advertisement: What keywords will prompt the consumer to take the target actions? What amount of text is best for perception? What tonality should the advertisement have?
- The targeting part of the advertisement: What target audience settings should be chosen when the advertisement is created? Who will it be shown to, on which platforms, and under what conditions?
- Retrieving the posts (on social media, forums, blogs, etc.) that drew the strongest consumer response (likes, reposts, comments, views) and analyzing their visual elements and content. Not only text but also photo, audio, and video messages can be analyzed.
- Analyzing the tonality of text information, i.e., automated identification of emotionally colored vocabulary and of the authors' emotional assessments (opinions) of the objects discussed in the text. This analysis makes it possible to define the emotional qualities of the content and, consequently, to shape the most effective tone of the marketing appeal to consumers, as well as to determine the emotional color of the information background, which, in turn, can influence the behavior of various players.
- Analyzing the semantic content, aimed at identifying lexical categories, or “tokens”, i.e., meaningful elements (words, phrases, symbols) of the text, and analyzing them quantitatively by calculating the frequency of occurrence of the most common words and creating an importance matrix of tokens. Importance analysis can identify the substantial features of the presented templates and ignore any insignificant, and correspondingly trivial, inaccuracies they contain.
- Analyzing the social connections of the target audience using social graphs. This makes it possible to create substantiated avatars of the company's target audience (an avatar, or portrait of an ideal client, is a detailed, step-by-step description of an individual representative of the target audience) and to search for look-alike audiences. A look-alike search analyzes an initial database (for instance, the company's real consumers or the actual subscribers of its community) for shared characteristics (interests, behavior, and other factors) and then finds the most similar users among all registered accounts (as a rule, using neural networks).
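The token frequency and tonality analyses listed above can be illustrated with a deliberately simple sketch. The stop list and the positive/negative lexicons below are hypothetical toy assumptions; real tonality analysis would rely on a validated lexicon or a trained model, and the importance matrix of tokens is not reproduced here.

```python
import re
from collections import Counter
from typing import List, Tuple

# Toy lexicons and stop list: hypothetical, for illustration only.
POSITIVE = {"great", "love", "useful", "free"}
NEGATIVE = {"boring", "expensive", "bad"}
STOP_WORDS = {"the", "a", "and", "is", "are", "to", "of", "in", "this", "for", "i"}


def tokenize(text: str) -> List[str]:
    """Lowercase, split into alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def token_frequencies(posts: List[str], top: int = 5) -> List[Tuple[str, int]]:
    """Frequency of occurrence of the most common tokens across all posts."""
    counts: Counter = Counter()
    for post in posts:
        counts.update(tokenize(post))
    return counts.most_common(top)


def tonality(text: str) -> float:
    """Score in [-1, 1]: (pos - neg) / (pos + neg); 0.0 when no lexicon word occurs."""
    tokens = tokenize(text)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


posts = [
    "Free animation lessons for the new week",
    "This animation school is great, I love the lessons",
    "The course is expensive and boring",
]
print(token_frequencies(posts, top=2))  # [('animation', 2), ('lessons', 2)]
print([tonality(p) for p in posts])     # [1.0, 1.0, -1.0]
```

The token list from the most responded-to posts is what would feed keyword selection; the tonality score indicates which emotional register those posts share.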
- Data search. The source HTML code, for example, of a web page, is loaded into a parser. A script that breaks the text into lexemes then processes the code and highlights the necessary information.
- Information retrieval. Data are located using a set of characters that describes the search target; such patterns are called regular expressions. They make it possible to extract only the fragments of interest from the entire array.
- Data saving. Once obtained, the information is saved as tables or loaded into a database.
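The three parsing steps above can be sketched with the Python standard library alone: `html.parser` plays the role of the parser, a regular expression describes the search target, and an in-memory SQLite database stores the result. The sample markup and the "number followed by likes" pattern are hypothetical assumptions; real pages need patterns tailored to their structure.

```python
import re
import sqlite3
from html.parser import HTMLParser
from typing import List


class TextCollector(HTMLParser):
    """Step 1 (data search): walk the HTML and collect non-empty text fragments."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        if data.strip():
            self.chunks.append(data.strip())


# Step 2 (information retrieval): a regular expression describing the target,
# here a hypothetical "<number> likes" counter.
LIKES_RE = re.compile(r"(\d+)\s+likes?\b", re.IGNORECASE)


def extract_likes(html: str) -> List[int]:
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return [int(n) for n in LIKES_RE.findall(" ".join(parser.chunks))]


def save_likes(values: List[int], conn: sqlite3.Connection) -> None:
    """Step 3 (data saving): store the extracted values in a database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS likes (value INTEGER)")
    conn.executemany("INSERT INTO likes (value) VALUES (?)", [(v,) for v in values])
    conn.commit()


SAMPLE_HTML = (
    '<div class="post"><p>Animation lesson</p><span>120 likes</span></div>'
    '<div class="post"><p>Motion design tips</p><span>45 likes</span></div>'
)
values = extract_likes(SAMPLE_HTML)
conn = sqlite3.connect(":memory:")
save_likes(values, conn)
print(values, conn.execute("SELECT SUM(value) FROM likes").fetchone()[0])  # [120, 45] 165
```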
- Males are more active in the analyzed communities than females (males: 3344 people; females: 2656 people).
- Users rarely indicate their marital status, so this criterion should not be used when configuring the advertisement settings.
- The target audience consists mainly of people living in metropolises (Saint Petersburg: 2452 people; Moscow: 1395 people).
- The average age of an active subscriber in the analyzed communities is 33.5 years.
- To advertise the online course, a picture has to be selected that meets the identified criteria: it should use vector graphics, show the animation process, and be executed in a style popular in vector graphics.
- The target audience should be the active subscribers of the analyzed communities who are interested in motion design and want to develop in this field. They most often like and repost recorded video lessons and various tutorials, so the advertisement should emphasize the availability of free video lessons.
- The advertisement should be shown from 9 a.m. to 2 p.m., since this is when subscribers most frequently repost the communities' posts.
- The projected reach of the advertisement is 6300 people, approximately 30% of the target audience, with a total budget of around RUB 700 (according to the statistics of the Vkontakte advertising account).
- The advertisement display time was set according to the previously obtained data: the target audience is most active from 9 a.m. to 2 p.m. and from 4 p.m. to 7 p.m.
- Since the data showed that most of the target audience is male, bid adjustments were made by gender and age: the rate was raised by 100% for males aged 25–34 and by 50% for males aged 35–44. In addition, display priority was increased for residents of Moscow and Saint Petersburg, since they had previously been established as a major part of the target audience.
- The key phrases selected for the advertisement were “animation lessons”, “animation school”, “creating animation”, and “motion learning”. They were derived from the tokens extracted from the most reposted posts of the communities.
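The audience findings and advertisement settings above rest on simple aggregations of profile and activity data. The sketch below shows one way to compute them, assuming hypothetical subscriber records (with `None` for fields a user left empty) and hypothetical repost hours; only the +100% and +50% adjustments and their age bands come from the text, while the base bid value is an illustrative assumption and the Vkontakte advertising interface itself is not modeled.

```python
from collections import Counter
from statistics import mean
from typing import List, Tuple

# Hypothetical subscriber records; None marks a field the user left empty.
subscribers = [
    {"gender": "M", "city": "Saint Petersburg", "age": 31, "marital": None},
    {"gender": "M", "city": "Moscow", "age": 36, "marital": None},
    {"gender": "F", "city": "Saint Petersburg", "age": 28, "marital": "married"},
    {"gender": "M", "city": "Kazan", "age": None, "marital": None},
]
# Hypothetical hours (0-23) at which community posts were reposted.
repost_hours = [9, 10, 10, 13, 17, 22, 10, 11]


def fill_rate(records: List[dict], field: str) -> float:
    """Share of profiles with `field` filled in; a low share argues against targeting on it."""
    return sum(r[field] is not None for r in records) / len(records)


def summarize(records: List[dict]) -> Tuple[Counter, List[Tuple[str, int]], float]:
    """Gender counts, two most common cities, and mean age over profiles that state it."""
    genders = Counter(r["gender"] for r in records)
    top_cities = Counter(r["city"] for r in records).most_common(2)
    avg_age = mean(r["age"] for r in records if r["age"] is not None)
    return genders, top_cities, round(avg_age, 1)


def peak_hours(hours: List[int], k: int = 3) -> List[int]:
    """The k hours with the most reposts."""
    return [h for h, _ in Counter(hours).most_common(k)]


def adjusted_bid(base: float, gender: str, age: int) -> float:
    """Adjustments from the text: +100% for males 25-34, +50% for males 35-44."""
    if gender == "M" and 25 <= age <= 34:
        return base * 2.0
    if gender == "M" and 35 <= age <= 44:
        return base * 1.5
    return base


print(summarize(subscribers))
print(fill_rate(subscribers, "marital"), peak_hours(repost_hours, k=1))  # 0.25 [10]
print(adjusted_bid(10.0, "M", 30), adjusted_bid(10.0, "M", 40))        # 20.0 15.0
```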
- Social communication portals. These portals enable free communication among members of society outside any commercial goal setting. They are based primarily on social media and thematic forums. These portals can be differentiated by thematic, geographic, demographic, and other principles. Information formed through the interaction of the participants of these portals is the most significant for market research, since it is formed outside any goal setting, which could otherwise reduce the representativeness of research results.
- News portals. These portals produce exclusively targeted news content. This content is also natural digital information; however, its use in market research involves a specific set of goals and objectives. This content exerts a one-way mass impact on members of society, thus forming a single information environment. Quantifying this content makes it possible to analyze the state of the information environment and characterize its influence on society.
- Commercial portals. Information on these portals is driven exclusively by commercial goal setting. First and foremost, these are online stores and corporate information websites. Analyzing the information on these portals makes it possible to characterize the subject's targeted impact on its consumers and, in turn, to determine the efficiency of this impact by analyzing business performance.
- Ideological portals. Information on these portals is driven by goal setting other than commercial. However, the structure of its impact on consumers is identical to that of commercial portals. These include information portals, thematic websites, and websites of religious and political organizations. Thus, the analysis of this information is virtually identical to the analysis of information on commercial portals, except that the effectiveness of these portals is not measured by commercial metrics.
- Content properties. These properties are determined by semantic and lexical content. Of value to the researcher are the range of vocabulary that characterizes how information is presented and the themes that define the scope of interests of the information carrier. Structuring and analyzing the content properties of natural digital information helps the market researcher answer the questions of “what” occupies the mind of the user, competitor, or any other object of research and “what vocabulary” the object of research uses in narration, and, therefore, what is the most effective way to interact with them.
- Tonality properties. These properties define the emotional tone of the content. Existing tonality analysis tools can describe such characteristics of natural digital information as the level of positivity, negativity, neutrality, and many others. Assessing and analyzing the tonality properties of NDI helps, first, to determine the attitude of the research object toward the content and, second, to draw conclusions about the tonality of a potentially effective marketing message. Tonality properties are secondary to content properties, but they determine the emotional vector of the marketing message in relation to the content.
- Metaproperties. These properties define the time and place at which natural digital information was generated, as well as the demographic, geographic, psychological, psychographic, and other properties of its source. Describing and analyzing these properties makes it possible, first, to categorize the objects of market research and to identify dependences between individual properties of objects and the tonality and/or content properties of the generated information. These properties act as determinants of the content properties and can form a binary array of primary information, whose statistical analysis provides market researchers with a set of comparative assessments from which meaningful conclusions can be drawn.
- Context properties. These properties help to describe the context within which the content is formed. Natural digital information can be contextually differentiated in accordance with the following attributes:
- Formation level. The formation level characterizes the cause-and-effect chain that produced the content. By this attribute, one can distinguish primary content (natural digital information formed for reasons external to it), secondary content (natural digital information formed as a reaction to primary content: engaging with the primary content is the cause, and the secondary content is the effect), and high-level content (natural digital information formed as a reaction to secondary or other high-level content).
- Content of the single information environment. Any natural information is formed within the single information environment. Distinguishing particular elements of its content (political instability, wartime, periods of mass unrest, nations' sporting achievements, etc.) helps to correct the obtained analytical results and determine their relevance for other periods.
- Tonality of the single information environment. As noted above, natural digital information is formed under the influence of the single information context, so the tonality of that context affects the properties of natural digital information. Taking the tonality of the single information environment into account makes it possible to determine how universal the obtained analytical results are.
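One way to operationalize the formation level is to use the reply chain within a discussion thread as a proxy for the cause-and-effect chain. This is an assumption of the sketch, not a procedure stated by the authors: a root post is treated as primary content, a direct reply as secondary content, and anything deeper as high-level content. The thread below is hypothetical.

```python
from typing import Dict, Optional

# Hypothetical thread: each item maps to the item it replies to (None = not a reply).
PARENT_OF: Dict[str, Optional[str]] = {
    "post": None,  # original post
    "c1": "post",  # comment on the post
    "r1": "c1",    # reply to the comment
    "r2": "r1",    # reply to the reply
}


def reply_depth(item: str, parent_of: Dict[str, Optional[str]]) -> int:
    """Number of reply hops from `item` back to the root of its thread."""
    depth, seen = 0, {item}
    parent = parent_of.get(item)
    while parent is not None:
        if parent in seen:
            raise ValueError(f"reply cycle at {parent!r}")
        seen.add(parent)
        depth += 1
        parent = parent_of.get(parent)
    return depth


def formation_level(item: str, parent_of: Dict[str, Optional[str]]) -> str:
    """Map reply depth to the formation levels: 0 primary, 1 secondary, 2+ high-level."""
    depth = reply_depth(item, parent_of)
    if depth == 0:
        return "primary"
    return "secondary" if depth == 1 else "high-level"


print({k: formation_level(k, PARENT_OF) for k in PARENT_OF})
# {'post': 'primary', 'c1': 'secondary', 'r1': 'high-level', 'r2': 'high-level'}
```

Note that a root post is not necessarily formed "for reasons external to it" in the sense of the text; a reply chain only captures explicit reactions, so the proxy undercounts secondary content that reacts to posts without replying to them.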
- Connection metrics. This class includes tools reflecting the strength and nature of connection between the quantitative expression of properties of various totalities of natural digital information, on the one hand, and the properties of the analyzed array of natural digital information and the properties of the single information background, on the other hand. The tool basis for developing such metrics includes various types of correlation coefficients and regression coefficients.
- Comparability metrics. This class includes tools reflecting the relation between quantitative characteristics of the properties of various totalities of natural digital information, as well as between the properties of the analyzed array of natural digital information and the properties of the single information background. The tool basis for developing such metrics includes comparison measures, dispersion indicators, and procedures for comparing dynamic and spatial coefficients.
- Dominance metrics. This class includes tools to determine the most significant quantitative characteristics of natural digital information as well as the most significant representatives of an array of natural digital information. All tools in this class are based on distinguishing objects with maximal or minimal quantitative values of their characteristics.
- Exceptionality metrics. This class includes tools allowing us to identify statistical outliers and unreliable values both among quantitative characteristics of natural digital information and in a totality of consistent objects of an array of natural digital information. Such metrics are applied to increase the quality of the analyzed array, to improve the reliability of the research results, and to substantiate the significance of the analytical results obtained.
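The dominance, comparability, and exceptionality metrics described above can be illustrated on a single characteristic such as likes per post. In this sketch, which is an illustration under assumptions rather than the authors' toolkit, dominance is the maximum and minimum, the comparability (dispersion) indicator is the coefficient of variation, and outliers are flagged with Tukey's interquartile-range fences, a common choice substituted here for illustration. The like counts are hypothetical.

```python
from statistics import mean, pstdev, quantiles
from typing import Dict, List, Tuple

# Hypothetical like counts per post.
likes: Dict[str, int] = {"p1": 120, "p2": 95, "p3": 110, "p4": 900, "p5": 101, "p6": 87}


def dominance(values: Dict[str, int]) -> Tuple[str, str]:
    """Dominance: the objects with the maximal and minimal values."""
    return max(values, key=values.get), min(values, key=values.get)


def coefficient_of_variation(xs: List[float]) -> float:
    """Comparability (dispersion): population standard deviation relative to the mean."""
    return pstdev(xs) / mean(xs)


def iqr_outliers(values: Dict[str, int], k: float = 1.5) -> List[str]:
    """Exceptionality: objects outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(list(values.values()), n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [key for key, v in values.items() if v < low or v > high]


top, bottom = dominance(likes)
outliers = iqr_outliers(likes)
cleaned = [v for key, v in likes.items() if key not in outliers]
print(top, bottom, outliers)  # p4 p6 ['p4']
print(round(coefficient_of_variation(list(likes.values())), 2),
      round(coefficient_of_variation(cleaned), 2))
```

Removing the flagged outlier sharply lowers the dispersion indicator, which is the sense in which exceptionality metrics improve the reliability of the analyzed array.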
- —the level of positivity of post i.
- —the average level of positivity of all posts in sample n.
- —number of likes for post i.
- —the average number of likes for all posts in sample n.
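The variables listed above match a correlation-type connection metric between the positivity of posts and the number of likes they receive. Assuming the intended metric is Pearson's correlation coefficient, r = Σ(x_i − x̄)(y_i − ȳ) / √(Σ(x_i − x̄)² · Σ(y_i − ȳ)²), where x_i is the positivity of post i, y_i its number of likes, and the sums run over the n posts of the sample (notation introduced here for illustration), it can be computed as follows. The sample values are hypothetical; on Python 3.10 and later, `statistics.correlation` returns the same coefficient.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between post positivity (x) and likes (y)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(x) / len(x), sum(y) / len(y)  # sample means over the n posts
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


positivity = [0.2, 0.5, 0.9, 0.4]  # hypothetical tonality scores of 4 posts
post_likes = [40, 70, 130, 60]     # hypothetical like counts of the same posts
print(round(pearson(positivity, post_likes), 3))
```

A value close to +1 would indicate that more positive posts tend to collect more likes in the sample; correlation alone does not establish that positivity causes the likes.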
- The toolkit developed by the authors (a classification of NDI sources, a description of the key properties of NDI, metrics to quantify and analyze these properties and, as a result, an algorithmic model for analyzing natural digital information in market research) allows market research to be conducted without directly involving research subjects, which reduces costs and eliminates the social desirability effect.
- The toolkit developed by the authors makes it possible to create so-called reasoned advertising messages that meet the needs of the target audience, as supported by the big data underlying the presented methodology.
- The toolkit developed by the authors is universal for analyzing natural digital information: with minor adaptations, it can be used by any subject conducting market research.
Conflicts of Interest
- Digital 2020 Global Digital Overview: Essential Insights into How People around the World Use the Internet, Mobile Devices, Social Media, and Ecommerce. Available online: https://wearesocial-net.s3-eu-west-1.amazonaws.com/wp-content/uploads/common/reports/digital-2020/digital-2020-global.pdf (accessed on 19 September 2021).
- AppsFlyer. Available online: https://www.campaignlive.co.uk/article/surge-mobile-apps-spend-bucks-pandemic-trend/1679937 (accessed on 19 September 2021).
- Aral, S. Commentary—Identifying Social Influence: A Comment on Opinion Leadership and Social Contagion in New Product Diffusion. Mark. Sci. 2011, 30, 217–223.
- Bollen, J.; Mao, H.; Zeng, X.-J. Twitter mood predicts the stock market. J. Comput. Sci. 2011, 2, 1–8.
- Candogan, U.O.; Bimpikis, K.; Ozdaglar, A.E. Optimal Pricing in Networks with Externalities. Oper. Res. 2012, 60, 883–905.
- Crapis, D.; Ifrach, B.; Maglaras, C.; Scarsini, M. Monopoly Pricing in the Presence of Social Learning. Manag. Sci. 2017, 63, 3586–3608.
- Jing, B. Social Learning and Dynamic Pricing of Durable Goods. Mark. Sci. 2011, 30, 851–865.
- Ye, S.; Aydin, G.; Hu, S. Sponsored Search Marketing: Dynamic Pricing and Advertising for an Online Retailer. Manag. Sci. 2015, 61, 1255–1274.
- Cui, R.; Gallino, S.; Moreno, A.; Zhang, D.J. The Operational Value of Social Media Information. Prod. Oper. Manag. 2018, 27, 1749–1769.
- Chen, Y.; Wang, Q.; Xie, J. Online Social Interactions: A Natural Experiment on Word of Mouth versus Observational Learning. J. Mark. Res. 2011, 48, 238–254.
- Luca, M. Reviews, Reputation, and Revenue: The Case of Yelp.com; Technical Report; Harvard Business School: Boston, MA, USA, 2011.
- Managing Volumes of Data. Gartner Special Report (27 June 2011). Available online: https://www.businesswire.com/news/home/20110627005655/en/Gartner-Solving-Big-Data-Challenge-Involves-Managing (accessed on 1 October 2020).
- Palme, J. Developments in Using Natural Language. Computer Weekly. 1972.
- Partridge, D.; James, E. Natural information processing. Int. J. Man-Mach. Stud. 1974, 6, 205–235.
- Bornstein, M.H.; Gibson, J.J. The Ecological Approach to Visual Perception. J. Aesthet. Art Crit. 1980, 39, 203.
- Dretske, F. Knowledge and the Flow of Information; MIT Press: Cambridge, MA, USA, 1981.
- Dretske, F. Misrepresentation. In Belief. Form, Content, and Function; Bogdan, R.J., Ed.; Clarendon Press: Oxford, UK, 1986; pp. 17–36.
- Gibson, E.J.; Walk, R.D. The “Visual Cliff”. Sci. Am. 1960, 202, 64–71.
- Gibson, J.J. On the Concept of ‘Formless Invariants’ in Visual Perception. Leonardo 1973, 6, 43.
- Gibson, J.J. The Information Available in Pictures. Leonardo 1971, 4, 27.
- Gibson, J.J. The information contained in light. Acta Psychol. 1960, 17, 23–30.
- Gibson, J.J. The Senses Considered as Perceptual Systems; Houghton Mifflin: Boston, MA, USA, 1966.
- Greif, H. Affording Illusions? Natural Information and the Problem of Misperception. AVANT J. Philos. Vanguard 2019, 10, 2082–6710.
- Millikan, R.G. What has Natural Information to do with Intentional Representation? R. Inst. Philos. Suppl. 2001, 49, 105–125.
- Withagen, R.; Chemero, A. Naturalizing Perception. Theory Psychol. 2009, 19, 363–389.
- Girardin, F.; Calabrese, F.; Fiore, F.D.; Ratti, C.; Blat, J. Digital Footprinting: Uncovering Tourists with User-Generated Content. IEEE Pervasive Comput. 2008, 7, 36–43.
- Goh, K.-Y.; Heng, C.-S.; Lin, Z. Social Media Brand Community and Consumer Behavior: Quantifying the Relative Impact of User- and Marketer-Generated Content. Inf. Syst. Res. 2013, 24, 88–107.
- Kaplan, A.M.; Haenlein, M. Users of the world, unite! The challenges and opportunities of Social Media. Bus. Horiz. 2010, 53, 59–68.
- Konnikova, O.; Yuldasheva, O. The phenomenon of natural digital information and its role in the process of conducting marketing research. Mark. I Mark. Issled. (Mark. Mark. Res.) 2021, 1, 4–16. (In Russian)
- Shmelev, A.G. Psychodiagnostics of Personality Traits; Rech’ Publishing House: Saint-Petersburg, Russia, 2002. (In Russian)
- Bahia, T.K.; Simintiras, A.C. The Value Creation of Social Media Information. In Proceedings of the 15th Conference on e-Business, e-Services and e-Society (I3E), Swansea, UK, 13–15 September 2016; pp. 325–331.
- Del Fresno García, M.; Daly, A.J.; Sánchez-Cabezudo, S.S. Identifying the new influences in the internet era: Social media and social network analysis. Rev. Esp. Investig. Sociol. 2016, 153, 23–40.
- Pan, B.; Crotts, J. Theoretical models of social media, marketing implications, and future research directions. In Social Media in Travel, Tourism and Hospitality: Theory, Practice and Cases; Sigala, M., Christou, E., Gretzel, U., Eds.; Ashgate: Surrey, UK, 2012; pp. 73–86.
- Ngai, E.W.; Tao, S.S.; Moon, K.K. Social media research: Theories, constructs, and conceptual frameworks. Int. J. Inf. Manag. 2015, 35, 33–44.
- Buckner, H.T. A Theory of Rumor Transmission. Public Opin. Q. 1965, 29, 54–70.
- Mangold, W.G.; Faulds, D.J. Social media: The new hybrid element of the promotion mix. Bus. Horiz. 2009, 52, 357–365.
- Cai, H.; Chen, Y.; Fang, H. Observational Learning: Evidence from a Randomized Natural Field Experiment. Am. Econ. Rev. 2009, 99, 864–882.
- Chen, Y.; Xie, J. Online Consumer Review: Word-of-Mouth as a New Element of Marketing Communication Mix. Manag. Sci. 2008, 54, 477–491.
- Resnik, P. Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language. J. Artif. Intell. Res. 1999, 11, 95–130.
- Shannon, C.E.; Weaver, W. The Mathematical Theory of Communication; University of Illinois Press: Urbana, IL, USA, 1949.
- Pattee, H.H. Dynamic and Linguistic Modes of Complex Systems. Int. J. Gen. Syst. 1977, 3, 259–266.
- Salthe, S.N. Naturalizing Information. Information 2011, 2, 417–425.
- Foxman, D.; Bateson, G. Steps to an Ecology of Mind. West. Polit. Q. 1973, 26, 345.
- Szanser, A.J.M. Automatic error correction in natural. Natl. Phys. Lab. Comput. Sci. 1971, 46, 38.
| Token | Frequency of Occurrence |
|---|---|
| School, program, animat, week, matter, lesson, privat, work, new, wish, project, topic | 3 |
| Creat, begin, program, animat, lesson, privat, work, author, result | 3 |
| Type of Metrics | Aim of Using | Indicators | Example |
|---|---|---|---|
| Connection metrics | Reflect the strength and nature of connection between: | correlation coefficients | |
| Comparability metrics | Reflect the relation between: | comparison measures; comparing dynamic and spatial coefficients | |
| Dominance metrics | Determine: | maximal or minimal quantitative values | |
| Exceptionality metrics | Identify statistical outliers and unreliable values: | identifiers of statistical outliers | |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).