GLOBALISE Ethics Policy
Date: February 19, 2025
Version: 2.0
Authors: Mrinalini Luthra and Amber Zijlma
I. Introduction
This document governs and guides GLOBALISE’s work and ethos. We created it because we believe that articulating our values and obligations to one another reinforces respect and care within the team and in our work. Having a policy also gives us clear avenues to correct our culture should it ever stray off course.
This policy should be read in conjunction with the ethical commitments of the Combatting Bias project. While this policy focuses on digital infrastructure development, those commitments specifically address the institutional and systemic injustices encountered in our work.
We share this policy document to contribute to the ongoing conversation about ethical oversight in academia. We encourage others to adapt and utilise it.
This is a living document which we will revisit and revise. We encourage anyone with inquiries or a desire to discuss it to reach out to us. Your input and engagement are welcomed and valued.
II. Ethics Guidelines
Ethical adherence is a cornerstone of GLOBALISE throughout the project lifecycle, encompassing work packages, future plans, and governance. Reflexivity is a core principle of the project: should any aspect fall short of these ethical standards during periodic evaluations, it must be restructured to conform to the core principles below.
1. Diversity, Equity, and Inclusion (DEI)
GLOBALISE is dedicated to advancing DEI in every part of the project: people, governance, perspectives, datasets, algorithms, and interfaces.
Diversity encompasses a wide range of differences and variations within any given environment or system. It includes not only variation in individual characteristics such as ethnicity, age, gender identity, religion, physical abilities and disabilities, cultural background, and education, but also differences in ideas, perspectives, datasets, algorithms, infrastructural elements, and any other factors that contribute to the overall complexity and richness of the system in question. Embracing diversity means recognising, appreciating, and leveraging the breadth and depth of these differences.
Equity strives to rectify disparities and create a level playing field for all elements within a system.
Inclusion refers to the behaviours, attitudes, and social norms within our project that ensure there is space for multiple identities, groups, and expressions.
We advance DEI in the following ways:
Countering Bias: We pay attention to situations involving vulnerable groups, groups that have been historically disadvantaged or are at risk of exclusion, and situations characterised by asymmetries of power. Through our work, we advocate for including bias analysis in digital humanities projects. To identify, analyse, and articulate bias, we work closely with the Combatting Bias project.
Education and Training: We provide regular training and workshops to raise awareness of, and sensitivity to, diversity, equity, and inclusion, both within the project and in our work.
Accessibility and Design: Our approach to infrastructure design acknowledges and addresses the advantages and challenges that different social groups face.
Stakeholder Participation: GLOBALISE considers it critical to work with stakeholders who may be directly or indirectly affected by the infrastructure. Our approach is built on three pillars: a diverse steering team supported by external advisors, regular interactive workshops for knowledge sharing and feedback, and dedicated community engagement. Through our outreach manager, we continuously establish new partnerships and maintain open channels of communication with GLOBALISE users, ensuring that their needs, critiques, and insights shape our development.
Safe Digital Space: Through our work, we strive to create a digital infrastructure where everyone can feel safe doing research. We acknowledge that the sources GLOBALISE focuses on include offensive language, as well as harmful metadata descriptions and opaque indexing practices that discriminate against, erase, and belittle marginalised peoples. Through careful, expert-led historical contextualisation, documentation, and interface design, we aim to address these safety concerns by providing clear warnings and improved metadata.
Respecting Ownership: Community experts hold knowledge that is invaluable to our project. To recognise and respect where this ownership lies, we aim to compensate community experts fairly for their insights into our work.
Documentation: We record all our interventions and strategies to promote DEI in extensive documentation and reports.
2. Transparency
Transparency means open disclosure of our project’s data sources, algorithms, decisions, and governance structures.
This entails:
Documentation of the different parts of the GLOBALISE infrastructure to aid transparency and explainability. This includes data-envelopes for datasets, model cards for NLP models, a thesaurus for terminology, and reports on stakeholder participation, among others.
Communication of the characteristics, limitations, and potential shortcomings of the system to users and stakeholders, through interface design and user guides.
FAIR Research: As a publicly funded project (NWO), GLOBALISE makes all of its publications open access, in accordance with the Open Access Policy. Our datasets follow the FAIR principles: they are Findable, Accessible, Interoperable, and Reusable. In practice, this means we publish open-access, downloadable data. The code for our models is also open source and published on GitHub.
3. Accountability
Accountability refers to the project’s ownership of its decisions and outcomes, its adherence to laws and policies, and its obligation to address consequences.
This includes:
Auditability: It is important to establish mechanisms that facilitate the infrastructure’s auditability. These will include providing extensive provenance for the data produced and provided by GLOBALISE and for any other authoritative layers we add, creating meticulous data-envelopes for every dataset produced, and publishing datasets and research in peer-reviewed journals.
Training and Education: Providing training and education to help develop accountability practices.
Redress Mechanisms: Establishing systems to inform users and third parties and to provide them with recourse.
Reflexivity: Our position as researchers in the Global North carries inherent privileges and responsibilities. We acknowledge both the material advantage of our well-funded research position and the conceptual challenges of working with the VOC archives, a collection that embodies Eurocentric perspectives and risks perpetuating the erasure of Indigenous peoples and knowledge systems. Our research actively works to shift this balance, decentring dominant historical narratives while amplifying counter-narratives that foreground historically disempowered communities and their experiences.
Ethical Reviews: GLOBALISE is committed to having its research reviewed for safety by an external ethics committee whenever it involves human participants.
4. Societal and Environmental Wellbeing
GLOBALISE should benefit society while remaining sustainable and minimising its environmental impact.
Societal Wellbeing involves recognising how the project can affect various communities.
This entails understanding the social and cultural dimensions of the content within the archive. Archives contain materials that use language or express views now considered offensive or inappropriate, and it is important to address this sensitively. GLOBALISE will warn users of such content in the VOC archives and will contextualise records of contentious or violent past events with sensitivity towards affected groups.
Ecological Impact involves considering the project’s environmental footprint.
GLOBALISE will consider the environmental footprint of the project throughout its entire lifecycle, from development to implementation.
Data and Infrastructure Choices: When building and maintaining our infrastructure, we opt for energy-efficient solutions without compromising on performance or reliability.
Resource-Efficient Alternatives: Acknowledging the diverse computing resources available to our users, we will provide alternative, lighter versions of our models and datasets. This approach makes our resources accessible to a broader audience, including those without access to high-end computing facilities.
Publishing for Accessibility: In publishing our data and tooling, we ensure that our deliverables are optimised for varied computational environments. We provide clear documentation and support for both our full-scale and minimal models, so that users can choose the most suitable option for their specific context.
The last two principles apply specifically to AI systems:
5. Robustness
GLOBALISE prioritises social robustness of the models it employs.
Robustness is the ability of a model to perform well on new and unseen data, not just on the data it was trained on. In particular, we will check our models to ensure that they do not produce discriminatory or biased results.
Curation of Training and Testing Data: Our primary strategy for ensuring social robustness lies in the careful curation and selection of training and testing data. We will invest additional time and resources in evaluating how our models perform across various data categories. This selection process, conducted by domain experts with relevant historical expertise, aims to minimise biases and ensure that our models are as inclusive and representative as possible.
Qualitative and Quantitative Analysis: We will conduct both qualitative and quantitative analyses of our models’ outputs. This dual approach allows us to rigorously evaluate the models for potential biases, ensuring that they do not unfairly discriminate against any category of data.
Contextual Application and Validation: We are committed to applying our models in contexts similar to their training data to ensure reliability. Additionally, we will explore supplementary information that can be used alongside our models for more insightful validation. This might include additional data layers obtained during the entity linking phase or other relevant attributes.
6. Privacy and Data Governance
Privacy is the right of individuals to control or influence what information related to them may be collected and stored, and to regulate who may have access to that information.
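One way to make the quantitative part of this evaluation concrete is to score a model separately on each data category and compare the results. The Python sketch below is a hypothetical illustration, not GLOBALISE’s actual evaluation code: the `Example` record, the category names, the toy labels, and the use of plain accuracy as the metric are all assumptions.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Example(NamedTuple):
    """One evaluated item (hypothetical schema): data category, gold label, model prediction."""
    category: str
    gold: str
    predicted: str


def accuracy_by_category(examples: Iterable[Example]) -> dict[str, float]:
    """Return the share of correct predictions within each data category."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for ex in examples:
        total[ex.category] += 1
        if ex.predicted == ex.gold:
            correct[ex.category] += 1
    return {cat: correct[cat] / total[cat] for cat in total}


def largest_gap(scores: dict[str, float]) -> float:
    """Difference between the best- and worst-served categories (raises on empty input)."""
    return max(scores.values()) - min(scores.values())


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    examples = [
        Example("source_group_a", "PLACE", "PLACE"),
        Example("source_group_a", "PERSON", "PERSON"),
        Example("source_group_b", "PLACE", "PERSON"),
        Example("source_group_b", "PLACE", "PLACE"),
    ]
    scores = accuracy_by_category(examples)
    print(scores)               # {'source_group_a': 1.0, 'source_group_b': 0.5}
    print(largest_gap(scores))  # 0.5
```

A large gap between the best- and worst-served categories is a prompt for the qualitative review described above rather than a verdict in itself; the metric and the definition of a category would need to come from the domain experts who curate the data.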
Data Protection includes measures and practices implemented to ensure that personal data is handled in compliance with data protection laws and regulations, safeguarding the rights and freedoms of individuals.
Data Governance refers to the overall management of the availability, usability, integrity, and security of the data employed in our project, including the policies, processes, standards, and metrics that ensure the effective and efficient use of the data.
We prioritise data protection and commit to not collecting or storing personal user information. For research data, our governance framework includes comprehensive oversight mechanisms for collection, storage, processing, and access control. All research data is stored in certified trusted repositories, ensuring long-term sustainability and security.
III. Ethics Report
In this section, we provide a template for implementing and applying the ethics guidelines to the GLOBALISE project. Rather than presenting a complete report for GLOBALISE, we offer the Ethics Report template and two detailed workflow examples that demonstrate how to evaluate project components against our ethical principles.
This template serves multiple purposes:
- Establishes a systematic approach to documenting ethical considerations
- Tests the feasibility of proposed ethics guidelines in real project scenarios
- Creates a blueprint that other digital humanities projects can reference
Each project component should be evaluated against the ethical dimensions outlined in Table 1.
| Ethical dimension | Assessment |
|---|---|
| DEI | |
| Transparency | |
| Accountability | |
| Well-being | |
| Robustness | |
| Privacy + Data | |
Table 1: Ethics Report Template
Below we provide ethics reports of two workflows:
Primary Sources
Author: Lodewijk Petram
Last updated: 14 February 2025
| Ethical dimension | Assessment |
|---|---|
| DEI | The source corpus of GLOBALISE, which consists primarily of a large part of the Overgekomen Brieven en Papieren (OBP) of the Dutch East India Company (VOC), offers a unique window into early modern societies in Asia, Africa, and Australia. This corpus allows the creation of new global histories, including for regions where archival sources are scarce. However, the corpus predominantly reflects the VOC's perspective, a limitation that GLOBALISE recognises and actively strives to address. In pursuit of a more balanced historical narrative, GLOBALISE is committed to developing its infrastructure so that it allows for the inclusion of additional perspectives. Stakeholder participation is crucial to this expansion. GLOBALISE regularly consults a user panel and an advisory board comprising experts and representatives from diverse backgrounds. This ongoing dialogue not only helps refine access to the OBP but also plays an important role in identifying additional sources and perspectives that would be valuable additions to the infrastructure. This approach illustrates GLOBALISE's commitment to the principles of DEI by ensuring that the resources offered are not only expansive but also representative of the multifaceted histories of the regions and peoples affected by the VOC's activities. |
| Transparency | GLOBALISE openly acknowledges the limitations of its source corpus, which stem mainly from its VOC-centric perspective. This transparency is crucial in guiding researchers to understand the context and potential biases of the information presented. The project's commitment to facilitating the incorporation of additional historical perspectives exemplifies its dedication to transparency, ensuring that users are aware of both the strengths and the limitations of the current archival content. |
| Accountability | In its handling of the VOC archives, GLOBALISE shows accountability by openly acknowledging the inherent biases in its source materials. The project actively seeks to address these biases by including additional perspectives: unsilencing the voices in the source corpus that do not speak from the VOC perspective and striving to facilitate the inclusion of other sources in the infrastructure. This approach reflects a responsible and ethical engagement with historical resources and a commitment to rectifying historical imbalances in narrative and representation. |
| Well-being | Offering online access to the GLOBALISE source corpus contributes to societal well-being because it allows researchers and anyone else interested to illuminate aspects of early modern global history, particularly of regions underrepresented in historical narratives. The project thus aids the creation of more nuanced understandings of historical events and societal structures. The digital nature of the project is likely to reduce the environmental footprint associated with physical archival research. Moreover, generating transcriptions centrally and making them available to all is an efficient use of energy. |
| Robustness | Not applicable |
| Privacy + Data | The original source corpus of the OBP is kept by the National Archives. GLOBALISE performs checks to ensure that the identified parts of this source corpus, as offered by the National Archives, are made available and accessible to all in its infrastructure. A copy of the transcriptions is freely available for download in the GLOBALISE Dataverse: https://hdl.handle.net/10622/LVXSBW. This approach ensures the integrity and reliability of the information provided to researchers and the public. |
Table 2: Ethics Report - Primary Sources
Places Dataset
Authors: Ruben Land and Manjusha Kurrupath
Last updated: 14 February 2025
| Ethical dimension | Assessment |
|---|---|
| DEI | |
| Transparency | |
| Accountability | |
| Well-being | We strive to consider users' well-being by creating a diverse dataset and by offering channels through which users can tell us when they consider information to be incomplete, biased, or discriminatory. |
| Robustness | Not applicable |
| Privacy + Data | |
Table 3: Ethics Report - Places Dataset
These ethics reports serve as monitoring tools rather than as static documents. We actively use them to track our progress, identify areas for improvement, and ensure our work aligns with our core values and policies. Through regular review and updates of these assessments, we maintain accountability and adapt our practices as needed. Looking ahead, we are committed to expanding our transparency and accountability by publishing comprehensive ethics reports for each component of our work.
IV. Licence
This document has been made available under a CC BY-SA 4.0 licence. We invite you to customise and use this material for your own project. We recommend modifying it to align with your project’s mission and identity, and we appreciate credit for building on our work.
V. References
- ‘CARE Principles for Indigenous Data Governance’. https://www.gida-global.org/care.
- Carroll, Stephanie Russo, Ibrahim Garba, Oscar L. Figueroa-Rodríguez, Jarita Holbrook, Raymond Lovett, Simeon Materechera, Mark Parsons, et al. ‘The CARE Principles for Indigenous Data Governance’. Data Science Journal 19 (4 November 2020): 43. https://doi.org/10.5334/dsj-2020-043.
- Chilcott, Alicia. ‘Towards Protocols for Describing Racially Offensive Language in UK Public Archives’. In Archives in a Changing Climate - Part I & Part II, edited by Viviane Frings-Hessami and Fiorella Foscarini, 151–68. Cham: Springer Nature Switzerland, 2022. https://doi.org/10.1007/978-3-031-19289-0_10.
- ‘Colored Conventions Project’. https://coloredconventions.org.
- D’Ignazio, Catherine, and Lauren F. Klein. Data Feminism. MIT Press, 2020.
- ‘Ethics Guidelines for Trustworthy AI’. Shaping Europe’s Digital Future, 8 April 2019.
- Indigenous Archives Collective. ‘Indigenous Archives Collective Position Statement on the Right of Reply to Indigenous Knowledges and Information Held in Archives’, 9 August 2021. https://indigenousarchives.net/indigenous-archives-collective-position-statement-on-the-right-of-reply-to-indigenous-knowledges-and-information-held-in-archives/.
- ‘Linked Infrastructure for Networked Cultural Scholarship’. Accessed 19 February 2024. https://lincsproject.ca/.
- Luthra, Mrinalini, and Maria Eskevich. ‘Data-Envelopes for Cultural Heritage: Going beyond Datasheets’. In Proceedings of the Workshop on Legal and Ethical Issues in Human Language Technologies @ LREC-COLING 2024, edited by Ingo Siegert and Khalid Choukri, 52–65. Torino, Italia: ELRA and ICCL, 2024. https://aclanthology.org/2024.legal-1.9.
- Wilkinson, Mark D., Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. ‘The FAIR Guiding Principles for Scientific Data Management and Stewardship’. Scientific Data 3, no. 1 (15 March 2016): 160018. https://doi.org/10.1038/sdata.2016.18.