In many disciplines, technological innovations have led to the collection of ever more digital data over the last 20 years. This is particularly true of geography and the geosciences. Devices that transmit or display exact geo-positions, such as smartphones and global positioning systems, make data collection easier and thereby increase the amount of data being collected and analyzed. As Miller and Han state, “Similar to many research and application fields, geography has moved from a data-poor and computation-poor to a data-rich and computation-rich environment. The scope, coverage and volume of digital geographic datasets are growing rapidly.” [1] (p. 2). In addition to the growing amount of data, secondary data for reuse has also become more accessible. Research data repositories for geographic data such as PANGAEA® Data Publisher for Earth & Environmental Science (Bremen, Germany, [2]), Crystallography Open Database (COD, [3]), World Data Centre for Geomagnetism (WDC, [4]) and many others make secondary research data analysis possible and offer partially open access to research data.
Funding agencies, governments and publishers have also become aware of the potential that increasing data availability holds for economic, technological and academic progress. Advantages such as a more transparent research process, easier re-analysis, reproducibility of research findings, verification and validation cannot be ignored [5]. Mandatory research data management plans, data policies and strategic cooperation between journals and research data repositories (e.g., Elsevier and PANGAEA®) are the outcome.
The benefits of research data management and data sharing are less easily conveyed to researchers. The creation, preparation, handling and analysis of research data can be time-consuming. It is necessary to start data documentation very early in the research process and to curate the data continually throughout this process [6]. Moreover, fear of mistakes, exploitation, or legal uncertainty hampers data management and sharing. However, these disadvantages of data documentation and management need to be weighed against the advantages of receiving formal recognition and building reputation [7].
Research data management has consequently become an important topic for geographers, even though quite a few are still unaware of it. It is not only researchers who need to cope with the new requirements of research funders like the European Commission, the German Research Foundation or the National Science Foundation; institutions like universities must also create new information services for their researchers and students. Training and guidance as well as support are fundamental for efficient research data management [8]. Therefore, Humboldt-Universität zu Berlin started its research data management initiative in 2012, a joint venture between the Computer and Media Service, Research Service Centre, University Library, and Vice President for Research. In 2015, the initiative began to offer training and support for research data management and launched workshops. The workshops are part of both the research data management plan [9] and the specific information literacy teaching plan of Humboldt-Universität zu Berlin [10]. Initial efforts included the adoption of a research data management policy [11] as well as a survey and face-to-face semi-structured interviews about research data management among the university’s researchers [12]. These efforts were the basis for the subsequent development of research data management training. The aim is to raise the profile of research data management at the university and to create information multipliers among students, researchers and university personnel (e.g., librarians).
In contrast to the workshops and webinars offered by other institutions at the time, we decided against a general information workshop about research data and its management. Although general workshops on research data management are more scalable than discipline-specific workshops, the advantages of a tailored approach outweighed this concern. Through targeted guidance, participants receive more practically useful information for their individual data management. This corresponds to user needs [8] and also facilitates community building for research data management. The target audience of researchers and PhD students can be reached more easily. The instructors can learn about discipline-specific needs and problems. The insider knowledge thus gained can later be used in consultations and for guidance material. We therefore deemed it more important and fruitful to include discipline-specific knowledge to call attention to this topic. Our first target audience was researchers and students of geography. That decision was based on prior knowledge about departmental needs from the previously mentioned survey and interviews, as well as our subject specialists’ expertise on existing research data within the department. The geography department already had experience with data reuse, e.g., through satellite images [13] (p. 26). Moreover, the researchers had digital research data of their own, which they wanted to share through the open access model, in accordance with national and international standards.
2. International Review
Universities and research institutions in Australia, the United Kingdom, the United States of America, and Switzerland have a head start in research data management services and support compared to those in other countries [14]. For example, in 2011, Kennan and Markauskaite [15] conducted a survey about research data management practices in New South Wales, Australia. The results show that data sharing as well as research data management planning had already been introduced at that time. Since then, the Australian National Data Service (ANDS) in cooperation with Griffith University, Brisbane, has provided support to researchers through workshops, webinars and information via guides and videos [16].
The British Digital Curation Centre (DCC), the non-profit organization Jisc, and the UK Data Archive are in the vanguard of research data management training. Beginning in 2011, training resources, guides, workshops and other material were developed to foster research data management among social science researchers [17]. Various research data management training activities make reference to them.
By way of example for the United States, Piorun et al.
] reported on research data management teaching activities at higher education institutions in Massachusetts in 2012. The initiative involved conducting interviews in preparation for the establishment of the curriculum framework. Piorun et al.
] defined their target group as undergraduate and graduate students of science, health science and engineering. Their approach is modular and designed for a variety of delivery methods.
In Europe, some research groups at the Swiss Federal Institute of Technology in Zurich (ETHZ) have lengthy experience with research data management. For example, in one project, digital research data dating back to 1988 has been managed, archived and curated [19]. The Digital Curation Office offers advice on research data management and the preparation of data for long-term archiving. A 2011 survey prompted the installation of extended research data management services and the implementation of Digital Object Identifiers (DOI) for research data [20].
3. Teaching Research Data Management
Humboldt-Universität zu Berlin’s research data management initiative organized a pilot workshop on the topic “Research Data Know-How for Geographers” in the summer of 2015. The workshop was advertised via the initiative’s research data management website, other main university homepages and mailing lists, social media, and bulletin boards at the geography department as well as the University Library. The workshop was designed to last 90 minutes. The four speakers were two subject librarians (information literacy; geography) of the University Library, a consultant for technical questions from the Computer and Media Service, and the research data management coordinator of the university. The target audience included researchers and students of geography. The six workshop participants consisted of PhD and graduate students as well as library personnel. Previous knowledge was not required.
Although many articles, reports or handbooks about research data management have been published in the last few years, the information on data management in geography is still limited. In preparation for our workshop, a guide and material from three past workshops were especially useful. The guide used as reference is Bertelmann et al.
]. Written and edited by leading experts and members of major German research institutions, it is a hands-on research data management guide specifically tailored to the field of geoscience. The guide covers most of the aspects mentioned by other research data management guides like Corti et al.
], but also includes discipline-specific information regarding research data management and data sharing. Corti et al.
] begin with the importance of research data management and continue with a comprehensive display of research data management topics, including the data lifecycle [6
], planning, documentation, legal issues in the United Kingdom as well as publishing and citing of research data, though from the perspective of social sciences. Bertelmann et al.
] feature a discipline-specific point of view and add e.g., a list of data portals for data publication and acquisition or the list of metadata schemata relevant for geoscience.
The content of our audiovisual presentation was based on past research data management workshops at University of Bath [22], Leibniz University Hanover [23] and Helmholtz Centre Potsdam - GFZ German Research Centre for Geosciences [24]. These presentations all address data management plans as well as varying aspects of research data management covered by the previously mentioned guides. Pink and Cope [22] as well as Neumann and Ziedorn [23] begin with a brainstorming session about research data management, an item which we also prioritized. Bertelmann [24] specifically targets an audience of PhD students at GFZ. This presentation is particularly informative regarding metadata and methods of data publication for geoscientists.
The participants were invited to inform us in advance about the topics they were particularly interested in. However, this option was scarcely used. Therefore, we presented the following topics originating from the aforementioned sources.
3.1.1. What is Research Data?
At the beginning of the workshop, participants brainstormed about their own research data and about the benefits of research data management as well as the problems they see and the reservations they have. We posed the following questions:
What is research data in my discipline?
How do I benefit from research data management?
Where are the problems? Where do I have reservations?
In groups of two, the participants wrote down their opinions. Afterwards, their answers were collected and presented (see Table 1).
The answers show that geography as a discipline has a broad range of research data types, from surveys, time courses and census data to models and plans. Although all participants agreed that research data management is worthwhile, some failed to write down specific benefits within the given time (approx. ten minutes). This is understandable, as the benefits of research data management are at present mainly of an abstract nature and not easily measured. An exception was presented by Piwowar and Vision [25], who showed that making data available can lead to higher citation rates. All of our workshop participants agreed on difficulties with copyright and the current legal situation in Germany. Group 1 also noted problems with insufficient documentation of research data (i.e., information about research design, methodology, data collection, provenance, preparation, alteration, etc.) that prevent easy re-use. Some participants were also skeptical about the investment of time and money.
3.1.2. Why Research Data Management?
We continued with a short global history of research data management and two definitions. Research data management was defined as “all activities that are associated with the processing, storage, archiving and publication of research data” [12] (p. 6). A general definition by Kindling and Schirmbacher described research data as “all digital data that arise during the research process or are its result” [26] (p. 130). We explained the requirements of funders and publishers and the advantages of data management. These included traceability, reproducibility, validation, scientific recognition as well as legal and ethical requirements. Furthermore, we referred to the research data management policy of Humboldt-Universität zu Berlin and the accompanying guidelines [11]. Most participants did not know about these official recommendations. Therefore, we explained the content in more detail than previously planned.
3.1.3. Successful Research Data Management
Beginning with the research data lifecycle [6] and using a generic Horizon 2020 data management plan [28], we tried to highlight all relevant aspects of research data management: backup and securing of (sensitive) research data, clear structuring of files and version control, licensing, data archiving and suitable file formats (e.g., [17]). Recommended file formats included GeoTIFF for digital raster graphics and Shapefile for geographic information system (GIS) software, as both are machine-independent, non-proprietary and de facto standards. We also informed the participants about services and advice on research data management (e.g., support with backup or archiving) that are offered by Humboldt-Universität zu Berlin. We introduced and extensively discussed discipline-specific metadata schemata with the participants. Examples included Darwin Core (DwC, [29]) and ISO 19115 [30]. In addition, we referred to the overview of disciplinary metadata provided by the British Digital Curation Centre (DCC) [32]. For practical support of research data management, we introduced the DMPonline tool of DCC [33] and provided an example plan as well as German-language instructions for easier application.
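The points on backup, clear file structure and version control can be supported by a checksum manifest that records the state of a data directory. The following is a minimal sketch and was not part of the workshop material; the function names (`sha256_of`, `build_manifest`, `changed_files`) and the directory layout in the usage note are illustrative assumptions. It uses only the Python standard library.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map every file below data_dir (relative POSIX path) to its checksum."""
    return {
        path.relative_to(data_dir).as_posix(): sha256_of(path)
        for path in sorted(data_dir.rglob("*"))
        if path.is_file()
    }


def changed_files(data_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List files named in the manifest that are now missing or altered.

    Files added after the manifest was written are not reported.
    """
    current = build_manifest(data_dir)
    return sorted(
        name for name, checksum in manifest.items()
        if current.get(name) != checksum
    )
```

A manifest built before a backup or archival step (for example with `build_manifest(Path("project/raw"))`, a hypothetical path) can be stored as a text file next to the data and compared later with `changed_files` to detect silent corruption or undocumented edits.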
3.2. Practical Work with Repositories
As an introduction to research data sharing and re-use, we presented re3data.org, a central registry for research data repositories and a point of access for finding research data for secondary analysis. In contrast to other research data management workshops, we included a session about the discipline-specific research data repository PANGAEA®, a data publisher for earth and environmental science that specializes in geo-referenced research data. In cooperation with PANGAEA®, we were able to demonstrate the registration, login and research data submission process of the system. Afterwards, we also mentioned other discipline-specific repositories like COD and WDC as well as ZENODO [34] as a multi-disciplinary repository. We hoped especially to encourage graduate students to use these repositories to share their data with fellow researchers. In our experience, young researchers are more open to research data management and can be more easily convinced of its value. Not only can they reap the benefits of data management for the longest time (being at the beginning of their research careers), but they can also act as information multipliers for their colleagues.
Because of its increasing relevance for data sharing, we also elaborated on proper research data citation. We presented the widely recognized FORCE11 data citation standard [35], but also emphasized the varying citation practices of different disciplines and publishers.
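To make the discussion of data citation concrete, the sketch below assembles a citation string from the elements that data citation guidance commonly asks for: creators, year, title, version, repository and a persistent identifier. The class name `DatasetCitation`, the element order, the punctuation and all example values (including the placeholder DOI) are assumptions made for illustration. They are not prescribed by FORCE11 or by any particular publisher, whose own style rules take precedence.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetCitation:
    """Citation elements for a published dataset (illustrative, not a standard)."""

    creators: tuple[str, ...]
    year: int
    title: str
    repository: str
    identifier: str  # persistent identifier, e.g. a DOI expressed as a URL
    version: Optional[str] = None

    def as_text(self) -> str:
        """Render the elements in one assumed order and punctuation style."""
        authors = "; ".join(self.creators)
        version = f" Version {self.version}." if self.version else ""
        return (
            f"{authors} ({self.year}): {self.title}.{version} "
            f"{self.repository}. {self.identifier}"
        )


# Hypothetical dataset; every value here is a placeholder.
example = DatasetCitation(
    creators=("Doe, J.", "Roe, R."),
    year=2015,
    title="Hypothetical soil moisture measurements",
    repository="Example Repository",
    identifier="https://doi.org/10.5072/example",
    version="2",
)
print(example.as_text())
```

Keeping the elements in a structured record rather than a fixed string makes it easier to re-render the same citation in the style a given journal or discipline expects.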
The workshop concluded with a lively discussion of various topics. One question concerned the differences between public data providers and research data repositories or data archives. Another participant wanted to know more about the maximum file sizes accepted by research data repositories, as his data files are quite large (more than 10 terabytes). We also solicited feedback on our workshop, which was very positive overall. The participants were grateful and expressed a desire for more in-depth workshops on topics like research data management planning and legal aspects of research data management.
4. Lessons Learned
We have drawn important conclusions from our pilot workshop. The timing of the workshops is essential. Higher education institutions like Humboldt-Universität zu Berlin operate in semesters and lecture periods. To achieve maximum participation of both researchers and students, it is important to choose a day of the week that is not too lecture-intensive. Moreover, a welcoming atmosphere can support motivation. We recommend the use of a smaller conference room instead of a computer room or a lecture hall. This can promote a lively discussion and makes participants more willing to ask individual questions. Furthermore, in agreement with the findings of Carlson and Johnston [37], we highly recommend cooperating with discipline-specific research data repositories. If at all possible, the discussion of appropriate (discipline-specific) repositories should be integrated into the workshop. This could also improve the inside knowledge of research data management trainers, because they become familiar with specific processes and obstacles.
Our workshop showed us that more information about the available on-site support and the existing data policy [11] is needed. Most of the participants did not know about the policy or the existing webpage for research data management at Humboldt-Universität zu Berlin. We will consequently increase our efforts to inform and reach out to researchers and faculty. To achieve a higher participation rate of researchers and professors, more pressure from university leadership as well as research funders is needed. Through clearly formulated requirements, consistent research data management can be enforced. Furthermore, its benefits can support acceptance and adoption. Fecher et al. conclude that “academia is a reputation economy, an exchange system that is driven by individual reputation beyond money and status. In this regard, data sharing will only see widespread adoption among research professionals if it pays in the form of reputation” [7] (p. 3). Incentive structures need to be created to bolster high quality research data management and support data sharing.
Specifics of Geographic Research Data Management
The preparation of the workshop curriculum and the insights gained from our participants can be summarized in the following statements and recommendations about geographic research data management. Geography is separated into two main branches: physical geography and human geography. At universities in Germany, Austria and Switzerland, human geography research is frequently affiliated with humanities departments, separate from physical geography research, which is conducted under the purview of the science faculty. This leads to multiple challenges one needs to be aware of when dealing with geographic research data. Firstly, geographers use a variety of techniques to collect research data. They work with diverse data types, so the search for the right research data repository can become confusing [1]. This so-called “data curation continuum” needs to be addressed [38]. A clear structure for different data types, their requirements and discipline foci (e.g., environments or measurement types) is therefore the key to successful research data management.
Secondly, and concurring with the aforementioned point, the amount and complexity of the data vary. Some researchers work with data that comprises files of kilobyte size (e.g., in cultural and social geography) whereas files produced by others are of terabyte size; such data sets are thus often kept and analyzed on external hard drives (e.g., datasets on geomatics and geomorphology). This problem needs to be kept in mind and applies especially to geography.
Thirdly, geographers work with visualization tools and applications. This poses a particular challenge for preservation. The data itself and the code can be archived, but web interfaces, platforms and applications developed with project funding often need virtual machines for secondary use after project conclusion. The preservation of functionality as a research output is not yet fully addressed. The archiving of webpages through services like WebCite is only a first step in this direction (for web-archiving see e.g., [39]).
Fourthly, metadata specifics have to be considered. One example is data quality flag values, which describe the quality of the measured research data. These codes are discipline-specific and often not standardized. For best consistency, it is important to use standards where possible, to document the definitions, and to offer them alongside the corresponding dataset [41]. This also applies to the units used for reporting measured parameters.
Fifthly, the persistence of identifiers for geoscientific data is especially problematic [42]. Geoscientific data can be highly dynamic, and corrections or reprocessing are not unusual. The availability and preservation of geoscientific data in all its versions, from raw data to processed, appended or amended datasets, is one of the biggest challenges.
Finally, geographic research data has a high potential for re-use—not only for geoscientists, but also for interdisciplinary uses. Geographic data collection can be very costly or unique. Good research data management in addition to data sharing is thus essential for scientific progress.
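The fourth point, documenting quality flag definitions and offering them alongside the dataset, can be illustrated with a short sketch. The flag codes and meanings in `FLAG_DEFINITIONS`, the column names and the function names are hypothetical and chosen only for illustration; where a community standard for flags exists, it should be used instead. The unit of the measured parameter is carried in the column name (`temperature_degC`), which is one simple convention among several.

```python
import csv
import io

# Hypothetical flag codes; not taken from any standard or from the workshop.
FLAG_DEFINITIONS = {
    0: "not checked",
    1: "good",
    2: "suspect",
    3: "bad",
}


def undefined_flags(rows, definitions, column="quality_flag"):
    """Return flag values used in the data but missing from the codebook."""
    used = {int(row[column]) for row in rows}
    return sorted(used - set(definitions))


def codebook_csv(definitions):
    """Serialize the flag definitions as CSV text to ship with the dataset."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["flag", "meaning"])
    for flag, meaning in sorted(definitions.items()):
        writer.writerow([flag, meaning])
    return buffer.getvalue()


# Hypothetical measurements; the second row uses a flag nobody documented.
measurements = [
    {"temperature_degC": "4.2", "quality_flag": "1"},
    {"temperature_degC": "99.9", "quality_flag": "7"},
]
print(undefined_flags(measurements, FLAG_DEFINITIONS))
print(codebook_csv(FLAG_DEFINITIONS))
```

Running such a check before submission to a repository catches undocumented codes while the person who assigned them can still explain what they mean.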
Several conclusions can be drawn from this first workshop among geographers. The workshop concept [10] has proven to be a good fit. A group of six to eight participants seems ideal for teaching research data management. The workshop format made it easy to answer individual queries. The level of attention was quite high, as the participants were motivated to learn about the topic. The discipline-specific approach contributed significantly to both the workshop participation rate and the satisfaction of user needs.
The combination of library personnel and (graduate) researchers is appropriate. Librarians had the opportunity to learn about the researchers’ data and their challenges regarding research data management. The geographers had the possibility to learn about the library services and the librarians’ knowledgebase. Such an exchange of information is unlikely to have occurred outside the workshop.
In the final discussion with the participants, further topics for future workshops were noted. These partly reflect the answers in Table 1. Legal aspects of research data management (in cooperation with an expert) were named, as were separate, more in-depth workshops on how to compose a research data management plan or find a repository. However, such in-depth workshops only seem worthwhile after a general understanding of research data management has been achieved.
Immediate outcomes of the workshop include a better understanding of the concerns and demands of geographers regarding research data management and sharing (see Section 4). Contact persons were identified who predominantly work with research data and desire more support. Moreover, through advertising for the workshop via newsletters and notices, we raised awareness of research data management in the department. This resulted in an invitation to teach, i.e., a presentation of research data management in a seminar for Master’s students of geomatics.
Our future plans include the expansion and follow-up of discipline-specific workshops. We intend to publish German-language online tutorials and information material as open educational resources on our website. This will increase the dissemination of our newly acquired discipline-specific knowledge. Simultaneously, the tutorials enable modular, independent learning for students and researchers. Further specification in research data management topics and in-depth workshops are not planned at this time. This might change with more severe funding restrictions emerging in Germany or with basic research data management knowledge spreading among researchers at Humboldt-Universität zu Berlin.