5.1. Substantial Findings and Implications
The present systematic review provides important new insights into the conceptual and methodological aspects of studies in which survey scales for assessing individuals’ self-reported IPCs in online contexts were developed. In the following, we discuss the findings and their implications and give recommendations for future research.
The results related to RQ1 indicate that the IPC constructs are defined in different ways. Li [11] suggested that such a range of conceptual definitions stems from researchers’ attempts to capture the most important aspects of IPCs in the changing technological and sociocultural environment. However, our results indicate that different conceptualizations were proposed in the same technological period and in relation to the same context (see Figure 2). As the genealogy of the IPC scales shows, this might be because the investigators often developed new survey scales rather than incrementally improving existing ones [17]. For example, Koohang [10] adapted the IPC-2 scale for use in the SNS context, although three scales had been developed for SNSs beforehand (UPCSNS, SMIPC, and SNS-IPC). Moreover, an evident lack of theory utilization in scale development was observed. Only a handful of studies based the definition of the IPC constructs on a theory, and even those that did often used contrasting theories. Various conceptualizations seem to have arisen from attempts to provide new survey scales for specific online contexts (e.g., SNSs) or to capture only a subset of IPCs [11]. Whereas the multitude of approaches might deepen the understanding of IPCs in each individual context, this plurality endangers the consolidation of measurement approaches as investigators engage in the development of similar, albeit conceptually different, IPC scales. This disparity results in potential confusion rather than a richness of insights, since the same term is used to refer to different concepts [49]. Therefore, we propose that researchers developing IPC scales draw upon an appropriate theory and clearly state their conceptualization of the construct. Further, researchers utilizing existing IPC scales should adopt the same conceptualization as used in the original scale development study.
In contrast to the vast heterogeneity of conceptualizations, the reviewed IPC scales cover four well-distinguished online contexts (RQ2): general Internet use, ecommerce, SNSs, and mobile Internet. Given the fast pace of Internet service innovation, this finding raises at least two issues for discussion. First, the general definitions of the identified online contexts in the survey scales (except for ECWPC) might hinder the performance of IPC scales because they are less sensitive to the specifics of the context [27] (pp. 73–75). This is especially important because IPCs are highly context-dependent [25]. In fact, differences in an individual’s level of IPCs in various online contexts have also been empirically demonstrated, indicating that an individual’s privacy expectations depend not only on the type of information submitted but also, for example, on his or her perceived anonymity [24]. Further, relying on a general definition of IPCs might limit their explanatory potential when predicting privacy-related attitudes and human behavior in specific online contexts [17]. Li [11] hypothesized that when conceptualized and measured broadly, IPCs are appropriate as a measure of the psychological state, while narrower conceptualizations are required to predict behavior and trust. In this respect, we advise researchers to account for the specificity issue and adopt a measure of IPCs that corresponds to the level at which the dependent variable is measured. Second, the reviewed IPC scales were developed only for online contexts that pertain to the second (introduction) and third (awareness) stages in Yun et al.’s [12] typology of the periods of IPC research. Thus, IPC scales that would extend the scope of the online contexts of IPCs to emerging domains of the Internet in everyday life, such as IoT, cloud computing, or autonomous vehicles, are warranted.
Regarding RQ3, considerable diversity in the number of dimensions included in IPC scales can be observed. A limited number of scales draw on a particular theory in defining dimensions (IUIPC, MUCIP, IPC-2, SMIPC, and AIPC), while the others are based on a review of existing dimensions of IPCs (IPC-1, CPCI, ECWPC, and UPCSNS), adopt the dimensionalities of previous scales (SNS-IPC, Amharic SNS-IPC, and SMSPC), or derive the dimensions through exploratory or confirmatory factor analysis (OPC, APCP, and Turkish OPCS). As a result, the contents of the included dimensions often overlap, although their names and number might differ significantly. Nevertheless, through the process of content matching, we were able to distill six key dimensions: (ab)use, …, and errors (the dimension privacy concerns was excluded; see Section 4.3.3). Whereas all six dimensions were empirically validated, there are some issues that we would like to underscore with reference to the last three dimensions. For example, Laufer and Wolfe [46] noted that control is not a prerequisite of privacy and that a situation can be perceived as private although the individual lacks control over it. Tellingly, it has been demonstrated that privacy control is related to but, in essence, different from IPCs [42]. Likewise, awareness can be conceptualized as a distinct construct and as an antecedent of IPCs [42]. Finally, errors might be a possible dimension, but only for specific online contexts (e.g., medical, banking), as studies have reported that users in fact provide false information in ecommerce and SNS scenarios to protect their privacy [52]. In this sense, the inclusion of the errors dimension in IPC scales developed for SNS contexts is confusing. Looking back at the genealogy of the IPC scales (Figure 2), one might conclude that researchers adopted previously validated dimensions without considering the context-specific elements of the environment for which the new scale was developed. Coupled with scarce theoretical justifications and a lack of content validity, this introduces a considerable level of doubt regarding the appropriateness of the included dimensions in some IPC scales. We therefore propose that more effort be given to identifying the relevant dimensions of IPCs for each specific context and to incorporating only these in the corresponding IPC scale.
RQ4 addressed the quality of procedures used in IPC scale development. Whereas the procedures used for assessing internal validity were most often of adequate or higher quality, our analysis recognized a high risk of bias in ensuring content validity. Assuming that a clear description of the measured construct is an essential step in assessing content validity, because only by specifying the construct clearly can the scale’s content be judged as appropriate or otherwise [27], our review suggests that some studies have already failed in this very first step of scale development. Notably, four scales did not rely on a clear account of the IPC concept, and the delineation of the construct origin was absent in six scales. Content validation was further restrained by the lack of input from the target population during the item generation process and by scarce testing of comprehensibility and comprehensiveness. This might lead to “conceptualizations that are faulty and items that do not address important facets of the construct” [56] (p. 233). We found that several investigators attempted to overcome the problems of content validation with a stronger reliance on existing questionnaires. Nevertheless, due to the cultural and contextual specificities of privacy [18], such a strategy needs to be applied with care. Thus, developing a new IPC scale or adapting an existing one from one Internet context or culture to another should always be accompanied by content validation, for example, using expert reviews, cognitive interviews, or behavioral coding. Finally, criterion validity poses a problem. According to the COSMIN methodology [28], criterion validity is assessed by comparing the newly developed measure to a “gold standard” (i.e., an existing and rigorously validated measure of the same concept), and a high correlation between them indicates that both measure the same concept. Whereas the methods used for assessing criterion validity were appropriate, criterion validity was tested only with respect to existing IPC scales of uncertain methodological quality, making the validity of such comparisons questionable. Moreover, IPC scales often differ in the conceptualization of the measured construct (see Table 2), which calls the conceptual similarity of the compared scales into question. Therefore, we suggest that future research aimed at testing the criterion validity of IPC scales should use only existing scales with confirmed validity and consider the conceptual underpinnings of the scales in comparison.
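As a minimal illustration of the criterion validity comparison described above, the following Python sketch correlates respondents' total scores on a newly developed scale with their scores on a reference scale. It is only a sketch under stated assumptions: the function name `pearson_r`, the score lists, and the 0.70 cut-off are hypothetical choices made for illustration and are not taken from the reviewed studies. As argued above, a high correlation is informative only if the reference scale is itself well validated and conceptually equivalent.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with >= 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations around the means
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical total scores of eight respondents (illustrative data only)
new_scale = [12, 18, 25, 9, 30, 21, 15, 27]
reference_scale = [14, 17, 27, 10, 29, 20, 16, 25]

ASSUMED_CUTOFF = 0.70  # illustrative threshold, an assumption of this sketch

r = pearson_r(new_scale, reference_scale)
print(f"r = {r:.2f}; meets assumed cut-off: {r >= ASSUMED_CUTOFF}")
```

The coefficient alone says nothing about whether the two scales share a conceptualization, so the conceptual comparison still has to be made separately.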
When we integrate the findings across the research questions, they suggest that future endeavors in IPC scale development should first focus on the conceptualization of the construct and content validity of newly developed scales. Of course, many difficulties regarding the conceptualization of IPCs (e.g., nonuniformity, nonspecificity, lack of contextuality) can stem from the challenges in defining information privacy and privacy in general [2]. Nonetheless, developing a robust, and ideally unified, conceptual framework that would allow a theoretically informed definition of the IPC construct should become the goal of future research. This would not only allow scholars to advance and compare measurement models of IPCs in different online contexts and cultural settings but also give practitioners a theoretically and methodologically informed basis for empirically evaluating the applied implications of IPCs for human behavior.
5.2. Limitations and Future Research
As with any systematic review, the findings of this study should be interpreted with the limitations of both the research literature and our methods in mind. With reference to the scope of the systematic search, the exclusive focus on articles developing and testing IPC scales can be regarded as the first limitation of our work. If our selection criteria had been more inclusive, we might have identified studies that tested specific measurement properties of IPC scales. For instance, none of the selected scales were tested for cross-cultural validity. However, it is likely that such evaluation was conducted in other empirical studies on IPCs in online contexts. Thus, overall, our results might underestimate certain aspects of IPC scales. Further, as our research is limited to online contexts, the findings may not be generalizable to other domains for which the IPC scales were developed.
Second, although the literature search was comprehensive, publication bias cannot be completely discounted. Whereas we performed manual searches of relevant journals and reviewed the reference lists of the screened articles, limited resources did not allow us to include unpublished studies, works in the gray literature, or studies not published in English. However, it is unlikely that relevant studies would be found in these sources [28].
Third, the studies in this review were evaluated only for the methodological quality of the scale development procedures. We did not assess and compare the psychometric characteristics of the IPC scales. Such an overview and evaluation would be beneficial to the research community as it could be used to make an informed choice of the IPC scales for use in future empirical studies. However, the heterogeneity of the identified scales (in terms of contexts, dimensions, definitions, etc.) and the questionable quality, identified in this review, of the methodological procedures for ensuring content validity make the usefulness of such a comparison disputable. In fact, Mokkink et al. [28] suggested that evaluation of the interpretability and feasibility of survey scales based on their psychometric characteristics is worthwhile only when high-quality evidence of content validity is present.
The last potential limitation pertains to the selection of the evaluation method. The COSMIN methodology is a validated framework for systematic reviews of studies on measurement properties. Nevertheless, it has been utilized mostly in the medical and health sciences for selecting the most suitable PROMs. Whereas its checklists can potentially be used in other research fields and disciplines, certain measurement properties will nonetheless need appropriate adjustments for well-founded applications to the social sciences. For instance, using the test–retest method to ascertain the reliability of a survey scale is highly uncommon for information and Internet research in general. Therefore, the identified absence of reliability studies for IPC scales might be more a consequence of the general practices in the field than a distinctive scale development characteristic of IPCs.
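For readers unfamiliar with the test–retest method mentioned above, the following Python sketch shows one common way of quantifying it: an intraclass correlation coefficient for absolute agreement, ICC(2,1) in the Shrout and Fleiss notation, computed from the same respondents' total scores at two administrations. This is a sketch under stated assumptions rather than a procedure used in any reviewed study: the function name `icc_2_1` and the scores are hypothetical, and other reliability coefficients could equally be chosen.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `scores` holds one row per respondent; each row contains that
    respondent's total score at every administration (k >= 2 columns).
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 respondents and >= 2 scores per row")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares: respondents, occasions, residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denom == 0:
        raise ValueError("scores show no variance")
    return (ms_rows - ms_error) / denom


# Hypothetical (first, second) administration scores of six respondents
retest_scores = [(20, 21), (15, 14), (28, 27), (10, 12), (24, 24), (18, 17)]

icc = icc_2_1(retest_scores)
print(f"ICC(2,1) = {icc:.2f}")
```

Values close to 1 indicate that respondents' scores are stable across administrations. Choosing an appropriate retest interval is a design decision that this sketch does not address.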