Data Collection

In conflict and disaster zones for AI

Jan 14, 2025

Conflict and disasters often destroy infrastructure, including roads, bridges, and communication networks, which complicates access to conflict zones. These disruptions make it challenging to reach affected populations and can hinder the transportation of data collection tools and personnel. Additionally, the lack of electricity, internet access, and other basic services can impede the use of digital data collection methods. In remote areas of the Democratic Republic of Congo (DRC), poor infrastructure has made it difficult for data collectors to reach communities affected by conflict, limiting the scope of data collected.

The chaotic nature of conflict zones can result in unreliable data. Respondents may be unwilling or unable to provide accurate information due to fear, trauma, or mistrust of outsiders. Moreover, using biased sampling methods, such as only surveying individuals in easily accessible areas, can lead to skewed data that does not accurately represent the broader population. Interviewer bias, where the data collector’s presence or questioning style influences responses, can also affect data quality. In Afghanistan, for example, cultural barriers and mistrust of foreign data collectors have sometimes led to incomplete or biased data on civilian casualties and humanitarian needs.
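One common way to partly correct for the accessibility bias described above is post-stratification weighting: respondents from over-sampled groups get less weight, and those from under-sampled groups get more. The sketch below is illustrative only. The function name, the two strata ("accessible" and "remote"), and the 50/50 population shares are assumptions made for the example, not figures from any real survey.

```python
from collections import Counter


def poststratification_weights(sample_strata, population_shares):
    """Return one weight per respondent so that the weighted share of
    each stratum in the sample matches its assumed population share."""
    counts = Counter(sample_strata)
    n = len(sample_strata)
    weights = []
    for stratum in sample_strata:
        sample_share = counts[stratum] / n
        # Over-represented strata get weights below 1, under-represented above 1.
        weights.append(population_shares[stratum] / sample_share)
    return weights


# Hypothetical example: 8 of 10 respondents were reached in accessible areas,
# while half of the population is assumed to live in remote areas.
sample = ["accessible"] * 8 + ["remote"] * 2
shares = {"accessible": 0.5, "remote": 0.5}
weights = poststratification_weights(sample, shares)
```

Weighting of this kind only helps when every group has at least some respondents and when the population shares themselves are trustworthy, which is often not the case in a conflict zone. It also does nothing about respondents who answer inaccurately out of fear or mistrust.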

Ethical considerations are paramount when collecting data in conflict zones. The principle of "do no harm" must guide all data collection activities, ensuring that the process does not put respondents at risk or exacerbate their vulnerabilities. We must carefully manage issues of consent, confidentiality, and the potential misuse of data. For instance, collecting personal data without proper safeguards can expose individuals to reprisals from armed groups or government forces. In Syria, concerns about data misuse have led to heightened scrutiny of data collection efforts, with some organizations adopting more stringent ethical protocols to protect respondents.
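One possible safeguard for confidentiality is to replace direct identifiers with a keyed hash before the data leaves the collection device. The sketch below uses Python's standard `hmac` and `hashlib` modules. The field names (`name`, `phone`, `district`), the sample record, and the key are hypothetical and chosen only for illustration. A real deployment would need proper key storage and a review of which fields count as identifying.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256)."""
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


def strip_identifiers(record, secret_key, id_field="phone", drop_fields=("name",)):
    """Drop direct identifiers and add a pseudonymous respondent ID."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in drop_fields and key != id_field
    }
    cleaned["respondent_id"] = pseudonymize(record[id_field], secret_key)
    return cleaned


# Hypothetical record and key, for illustration only.
record = {
    "name": "Example Person",
    "phone": "+000 000 0000",
    "district": "A",
    "needs_food_aid": True,
}
key = b"replace-with-a-securely-stored-key"
safe = strip_identifiers(record, key)
```

The same phone number always maps to the same `respondent_id`, so repeat surveys can still be linked without storing the number itself. This is pseudonymization, not anonymization: anyone holding the key can recompute the mapping, and remaining fields such as the district may still identify a person in a small community.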

Remote data collection methods, such as satellite imagery, drones, and remote sensing, offer valuable alternatives when physical access to conflict zones is not feasible. These technologies can provide high-resolution images of conflict-affected areas, allowing for the monitoring of population movements, infrastructure damage, and environmental impacts without endangering data collectors. For instance, during the conflict in Ukraine, satellite imagery was used to monitor the destruction of civilian infrastructure and assess the displacement of populations in real time.

Mobile phones and digital platforms have enhanced data collection in conflict and disaster zones. Tools such as SMS surveys, mobile apps, and social media monitoring enable data collectors to gather information from respondents while minimizing direct contact. Mobile-based data collection methods are particularly useful in environments where traditional face-to-face interviews are not possible due to security concerns. For example, in Somalia, mobile surveys have been used to collect data on food security and displacement, providing timely information to humanitarian agencies despite ongoing conflict.

Community-based data collection involves engaging local communities in the data collection process. This approach leverages local knowledge and networks, allowing for more contextually appropriate and culturally sensitive data collection. Community-based methods can include participatory mapping, focus group discussions, and key informant interviews with local leaders and community members. In conflict-affected regions of Nigeria, community-based approaches have been used to gather data on the impact of armed group activities on local populations, helping to inform community-driven responses to the conflict.

Conducting thorough risk assessments is critical in preparing for data collection in conflict zones. Risk assessments should identify potential threats to data collectors and respondents, including physical security risks, health hazards, and legal or political risks. Developing comprehensive risk management plans that include contingency planning, security protocols, and regular monitoring of the security situation can help mitigate these risks. Organizations should also have clear evacuation and communication plans in place for data collectors operating in high-risk areas.

Data gathered under these conditions must be reviewed thoroughly, in light of the circumstances in which it was collected, before it can serve as a useful and representative raw data set.

Data preprocessing is a pivotal step: it either improves data quality or discards specific datasets, so that the analysis starts from a balanced base with known quantitative properties. It is the foundation for extracting meaningful and reproducible knowledge from raw data.

Data mining systems use a diverse range of algorithms that can generate millions of patterns from input data. However, not all patterns are equally valuable or reliable. The objective is to identify reproducible patterns that can be regarded as knowledge. To accomplish this, the data fed into these algorithms must be of high quality: consistent and free of noise.
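As a rough illustration of what such a preprocessing pass might look like, the sketch below removes exact duplicates, rejects records with missing required fields, and rejects implausible values, while keeping a log of what was rejected and why. The field names (`district`, `household_size`), the plausibility range of 1 to 30, and the sample records are all assumptions made for the example. It also assumes that `household_size` is already numeric.

```python
def preprocess(records, required=("district", "household_size"), size_range=(1, 30)):
    """Split records into clean ones and rejected ones with a reason."""
    low, high = size_range
    seen = set()
    clean, rejected = [], []
    for rec in records:
        # Exact duplicates, e.g. the same form submitted twice.
        fingerprint = tuple(sorted(rec.items()))
        if fingerprint in seen:
            rejected.append((rec, "duplicate"))
            continue
        seen.add(fingerprint)
        # Records with an empty or absent required field.
        if any(rec.get(field) in (None, "") for field in required):
            rejected.append((rec, "missing field"))
            continue
        # Values outside the assumed plausible range.
        if not low <= rec["household_size"] <= high:
            rejected.append((rec, "implausible value"))
            continue
        clean.append(rec)
    return clean, rejected


# Hypothetical survey records, for illustration only.
records = [
    {"district": "A", "household_size": 5},
    {"district": "A", "household_size": 5},
    {"district": "", "household_size": 4},
    {"district": "B", "household_size": 400},
    {"district": "B", "household_size": 3},
]
clean, rejected = preprocess(records)
reasons = [reason for _, reason in rejected]
```

Keeping the rejected records together with a reason, rather than silently dropping them, makes it possible to report how much of the raw data was discarded and on what grounds.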

Horen’s Substack
