An R package for downloading and working with cleaned Vietnam War datasets declassified by the National Archives and Records Administration (NARA). The datasets and cleaning procedures are documented in:
Smith, Gregory L. “Beyond body counts: Vietnam war ground combat data.” Conflict Management and Peace Science (2025). DOI: 10.1177/07388942251391081
Overview
During the Vietnam War, the Department of Defense implemented an unprecedented effort to collect data and quantify nearly every aspect of the conflict. Led by Secretary of Defense Robert McNamara, the US military devised metrics intended to determine whether they were winning or losing and to optimize their efforts on the battlefield. The resulting files document incident-level information on North Vietnamese Army (NVA) and Viet Cong (VC) attacks across Southeast Asia, Allied ground combat operations, naval gunfire support, bombing sorties, estimates of territorial control, and summaries of psychological operations, among many other topics.
Despite the impressive scale and scope of these collection efforts, most of the electronic records have been inaccessible to researchers because of antiquated file formats, indecipherable alphanumeric coding systems, missing delimiters, zoned-decimal storage, and incomplete geographic coordinates. This package provides cleaned, analysis-ready versions of 12 of these source files, along with a combined incident-level dataset that joins the four major ground combat sources into a single standardized table.
Available Datasets
The package provides access to 12 cleaned data files. A summary is provided below; full details are available in Smith (2025) and the accompanying Online Appendix.
| Function | Abbreviated Name | Full Name | Period | Description |
|---|---|---|---|---|
get_comb_inc_dta() |
— | Combined Incident Database | 1963–1973 | Master incident-level dataset joining VNDBA, SITRA, TIRSA, and VCIIA into a single standardized table (638,760 rows, 43 variables) |
get_vndba() |
VNDBA | Vietnam Database | 1963–1969 | Ground combat operations in South Vietnam, part of the OPSANAL system (365,015 rows, 108 variables) |
get_sitra() |
SITRA | Situation Report Army | 1966–1973 | Records of ground combat operations in Southeast Asia detailing military operations, casualties, and logistics (77 variables) |
get_tirsa() |
TIRSA | Terrorist Incident Reporting System | 1967–1973 | Records of Viet Cong and North Vietnamese incidents against South Vietnamese civilians, part of the OPSANAL system (37 variables) |
get_vciia() |
VCIIA | Viet Cong Initiated Incidents Files | 1965–1968 | Data on enemy-initiated incidents during the Vietnam War focused on Viet Cong activities (29 variables) |
get_conga() |
CONGA | Combat Naval Gunfire Support File | 1966–1973 | Records of naval gunfire support missions in Southeast Asia (161,283 rows, 46 variables) |
get_nasva() |
NASVA | Naval Surveillance Activities File | 1966–1972 | Information on Operations Market Time and Game Warden in South Vietnam (670,242 rows, 69 variables) |
get_dcas() |
DCAS | Vietnam Conflict Extract Data File | 1956–2006 | Records of 58,220 US military fatal casualties of the Vietnam War (55 variables) |
get_seafa() |
SEAFA | Southeast Asia Friendly Forces File | 1966–1972 | Monthly locations of American, South Vietnamese, and Allied maneuver battalions in Southeast Asia (15 variables) |
get_basfa() |
BASFA | Enemy Base Area File | 1967–1971 | Geographic boundaries and status of 139 North Vietnamese base areas in South Vietnam, North Vietnam, and Cambodia (36 variables) |
get_hes_hamla() |
HES | Hamlet Evaluation System (HAMLA) | 1967–1969 | Monthly hamlet-level security, political, and socioeconomic assessments under the HAMLA instrument (389,505 rows, 76 variables) |
get_hes70() |
HES | Hamlet Evaluation System (HES-70/71) | 1969–1974 | Monthly hamlet-level security and development assessments under the HES-70 and HES-71 instruments (693,862 rows, 160 variables) |
get_psyops() |
PSYOPS-A | Psychological Operations: Surface | 1970–1973 | District-level records of surface psychological operations (6,957 rows, 45 variables) |
get_sortie() |
SORTIE-A | Psychological Operations: Aerial | 1970–1973 | Records of individual aerial psychological operations sorties (41,199 rows, 18 variables) |
Usage
All download functions share the same interface. Files are cached locally after the first download so that subsequent calls load from disk without re-downloading.
library(VietnamWarData)
# Combined incident-level ground combat database
comb_inc <- get_comb_inc_dta()
# Individual source files
vndba <- get_vndba()
sitra <- get_sitra()
tirsa <- get_tirsa()
vciia <- get_vciia()
# Naval operations
conga <- get_conga()
nasva <- get_nasva()
# Casualties
dcas <- get_dcas()
# Unit locations and base areas
seafa <- get_seafa()
basfa <- get_basfa()
# Hamlet evaluation
hamla <- get_hes_hamla()
hes70 <- get_hes70()
# Psychological operations
psyops <- get_psyops()
sortie <- get_sortie()Forcing a re-download
comb_inc <- get_comb_inc_dta(force = TRUE)Cache management
# Show the local cache directory
vw_cache_path()
# Delete all cached files (data will be re-downloaded on next call)
vw_clear_cache()The Combined Incident Dataset
The get_comb_inc_dta() function returns the master incident-level ground combat database, which joins the four incident-level source files — VNDBA, SITRA, TIRSA, and VCIIA — into a single standardized table covering ground combat events in South Vietnam from 1963 to 1973. The combined dataset was constructed through six steps: (1) column names were standardized across source files, (2) actor labels were standardized, (3) a column identifying the source file was added, (4) incidents were merged, (5) duplicate entries appearing in multiple source files were removed, and (6) a unique incident ID was created for each remaining observation.
Key variables in the combined dataset include:
| Variable | Description |
|---|---|
master_event_id |
Unique incident identifier |
data_file_origin |
Source file (VNDBA, SITRA, TIRSA, or VCIIA) |
aggressor_side |
Initiating force (Allied or enemy) |
aggressor_sub_group |
Specific unit or group |
general_action_category |
Broad action type |
enemy_action_category |
Enemy action type (e.g., direct attack, harassment, terrorism, mining) |
initiation_date |
Date of incident |
initiation_time |
Time of incident |
n_friendly_kia |
Number of friendly killed in action |
n_friendly_wia |
Number of friendly wounded in action |
n_enemy_kia |
Number of enemy killed in action |
n_enemy_wia |
Number of enemy wounded in action |
prov_name |
Province name |
corp |
Corps tactical zone |
lat / lng
|
Geographic coordinates |
Note that the combined dataset does not replace the individual source files, which contain additional variables not included in the master table.
Data Processing
The original NARA files presented significant technical challenges before they could be used for research. These included antiquated binary EBCDIC file formats incompatible with modern software, missing delimiters between data fields, alphanumeric coding systems requiring hundreds of pages of documentation, zoned-decimal numeric storage, incomplete military grid reference system (MGRS) coordinates lacking the zone identifiers needed to convert to latitude and longitude, and fixed and period set file structures requiring complex merging procedures.
The cleaning pipeline for each file involved: (1) converting binary files to ASCII, (2) adding delimiters, (3) converting zoned decimals to integers, (4) merging control, fixed, and period set files, (5) creating informative column names, (6) reformatting dates and times, (7) replacing alphanumeric codes with intelligible labels, (8) converting MGRS coordinates to latitude and longitude, and (9) removing duplicate entries. Full documentation of each step is provided in Smith (2025) and the Online Appendix.
Note on Spatial Data
The get_basfa() function returns a dataset with spatial geometry columns (sfc_POLYGON) representing the boundaries of North Vietnamese base areas. The sf package is required to work with this data and is installed automatically as a dependency of VietnamWarData.
