Analysis of White House Visitor Logs
Technical details of working with White House visitor data
Following President Obama’s example from 2009, President Biden announced a visitor log policy in May 2021 with planned monthly data releases. This new policy applied to records created for appointments after noon on Jan. 20, 2021, the time of Biden’s inauguration.
New data are released each month with a three-month delay. For example, the White House press release of Jan. 30, 2023, announced the release of data from Oct. 2022.
President Trump announced in April 2017 he would keep the list of White House visitors secret.
Getting started
Go to WhiteHouse.gov and find the “Disclosures” link at the lower left of the page. Look for “Visitor Logs” on the Disclosure page.
The Visitor Logs page is interactive: the data table can be sorted or searched directly on the page. Several fields stay hidden until you select the “+” link to the left of a name.
Downloading files
Several other fields are available only by downloading the monthly .csv files linked below the table.
The Oct. 2022 data is in file 2022.10_WAVES-ACCESS-RECORDS.csv. “WAVES” in these file names means “Worker and Visitor Entry System.”
GitHub repository
My GitHub repository contains R scripts, in folders 2021 and 2022, that download all data files programmatically. The 2022 script can be modified to download future files, too.
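The R scripts are not reproduced here. Below is a rough Python sketch of one piece of that idea: it generates the expected file names for a range of months. It assumes that every monthly file follows the naming pattern of the Oct. 2022 file (YYYY.MM_WAVES-ACCESS-RECORDS.csv), which has only been shown for that one file. The download URL prefix is not given here, so the sketch stops at file names. The actual links should be taken from the Visitor Logs page.

```python
from datetime import date


def monthly_file_names(start: date, end: date) -> list[str]:
    """Return expected WAVES file names for each month from start to end, inclusive.

    ASSUMPTION: every monthly release uses the same pattern as the Oct. 2022
    file, 2022.10_WAVES-ACCESS-RECORDS.csv. Only year and month are used.
    """
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{year}.{month:02d}_WAVES-ACCESS-RECORDS.csv")
        # Step forward one month, rolling December over to January.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return names


for name in monthly_file_names(date(2022, 10, 1), date(2023, 1, 1)):
    print(name)
```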
Initial Processing
A script, 1-WhiteHouse-Visitors-Processing, in folder Analysis/2023-01-30 combines all files into a single in-memory data frame for processing. The script documents every change to the data to ensure reproducible research results.
Inconsistent file headers. The script shows how to handle file headers that differed between the first six monthly files and the last five. Arbitrary changes like this complicate processing.
The file headers are written to file WAVES-ACCESS-RECORDS-Headers.csv for manual inspection. At this time, old and new field names are equivalent, but “newer” names are in mixed case with abbreviations expanded. [See Section 2.3 in script.]
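Here is a minimal Python sketch of that inspection step (not the author's R code). It groups files by identical header rows and writes the field names side by side, one row per file. The header values in the example are illustrative only and are not the actual WAVES column names. Reading with "utf-8-sig" is an assumption: it tolerates a byte-order mark if a file has one.

```python
import csv
import io


def read_header(path: str) -> list[str]:
    """Return the first row (the header) of a CSV file."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        return next(csv.reader(f))


def header_variants(headers_by_file: dict[str, list[str]]) -> dict[tuple[str, ...], list[str]]:
    """Group file names by identical header rows, so each distinct header appears once."""
    variants: dict[tuple[str, ...], list[str]] = {}
    for name, header in headers_by_file.items():
        variants.setdefault(tuple(header), []).append(name)
    return variants


def write_header_table(headers_by_file: dict[str, list[str]], out) -> None:
    """Write one row per file, field names side by side, for manual inspection."""
    width = max(len(h) for h in headers_by_file.values())
    writer = csv.writer(out)
    writer.writerow(["file"] + [f"field{i + 1}" for i in range(width)])
    for name, header in headers_by_file.items():
        writer.writerow([name] + header + [""] * (width - len(header)))


# Illustrative headers only: these are NOT the actual WAVES column names.
headers = {
    "old.csv": ["NAMELAST", "MEETING_LOC"],
    "new.csv": ["Last Name", "Meeting Location"],
}
buffer = io.StringIO()
write_header_table(headers, buffer)
print(buffer.getvalue())
print(len(header_variants(headers)), "distinct headers")
```

With real files, `headers` would be built as `{path: read_header(path) for path in paths}`.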
Duplicates removed. The 11 files were combined [Section 2.4] into a list of strings, retaining only the “newest” header record, and 96 duplicate strings were removed [Section 2.5].
Field delimiters were inconsistent in over 6,000 records [Section 2.8]. Two extra fields, “Code” and “Email,” were added to the header to accommodate these records until the inconsistency could be explored.
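The combine-and-deduplicate step might look like the following Python sketch. It is a sketch, not the author's R implementation. It assumes that each CSV record occupies exactly one text line (a quoted field with an embedded newline would break this) and that "duplicate" means an exact string match. The sample lines are hypothetical.

```python
def combine_lines(files: list[list[str]]) -> tuple[str, list[str], int]:
    """Combine the raw text lines of several CSV files.

    `files` holds one list of lines per file, oldest file first, each starting
    with that file's header. Returns the header of the newest (last) file, the
    data lines with exact duplicates removed (first occurrence kept, order
    preserved), and how many duplicate lines were dropped.
    """
    newest_header = files[-1][0]
    seen: set[str] = set()
    data: list[str] = []
    removed = 0
    for lines in files:
        for line in lines[1:]:  # skip this file's own header
            if line in seen:
                removed += 1
            else:
                seen.add(line)
                data.append(line)
    return newest_header, data, removed


# Hypothetical lines from an older and a newer file.
old = ["NAMELAST,NAMEFIRST", "Birch,William", "Doe,Jane"]
new = ["Last Name,First Name", "Doe,Jane", "Roe,Richard"]
header, data, removed = combine_lines([old, new])
print(header, data, removed)
```

Keeping only the newest header works here because, as noted above, the old and new field names are currently equivalent.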
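One way to detect such records and widen the header is sketched below in Python. It profiles how many fields each record has and then appends placeholder names. Using “Code” and “Email” as the placeholders mirrors the text, but the sample records are hypothetical. The sketch also assumes that the extra fields always come at the end of a record.

```python
import csv
from collections import Counter


def field_count_profile(lines: list[str]) -> Counter:
    """Count how many records have each number of fields."""
    return Counter(len(row) for row in csv.reader(lines))


def widen_header(header: list[str], max_fields: int,
                 extra_names: tuple[str, ...] = ("Code", "Email")) -> list[str]:
    """Append placeholder names so the header is as wide as the widest record."""
    missing = max_fields - len(header)
    if missing > len(extra_names):
        raise ValueError(f"{missing} extra fields but only {len(extra_names)} names supplied")
    return header + list(extra_names[:max(missing, 0)])


header = ["Last Name", "First Name", "Total People"]
records = ["Birch,William,1", "Doe,Jane,2,FPG,user@example.com"]
profile = field_count_profile(records)
print(profile)
print(widen_header(header, max(profile)))
```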
Read file as fields. Because past files had numerous problems in several fields, the file was read with all fields as character strings, except that RELEASEDATE was converted to a date and “Total People” to a number [Section 3]. Two records were missing a RELEASEDATE value [Section 3.1].
Added fields removed. The added Code and Email fields were scrutinized in Section 3.1.1. The Code values were “FPG,” “OHS,” and “RL.” The Email field contained six .mil and .gov email addresses. Because these unexpected fields had no known value, they were dropped from further processing [Section 3.1].
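A Python sketch of the "everything as text, except two conversions" approach is shown below. The RELEASEDATE formats listed in `DATE_FORMATS` are assumptions, because the actual format is not stated here. The sketch also assumes that “Total People” holds whole numbers. Missing or unparseable values become None, so records like the two without a RELEASEDATE can be counted afterwards.

```python
import csv
from datetime import date, datetime
from typing import Optional

DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d")  # ASSUMPTION: possible RELEASEDATE formats


def to_date(text: Optional[str]) -> Optional[date]:
    """Parse a date using the first matching format; None if missing or unparseable."""
    text = (text or "").strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def to_number(text: Optional[str]) -> Optional[int]:
    """Convert a whole-number string; None if missing or not an integer."""
    try:
        return int(text)
    except (TypeError, ValueError):
        return None


def read_records(lines):
    """Read CSV lines as dicts of strings, then convert the two typed fields."""
    records = []
    for row in csv.DictReader(lines):  # every field arrives as a str
        row["RELEASEDATE"] = to_date(row.get("RELEASEDATE"))
        row["Total People"] = to_number(row.get("Total People"))
        records.append(row)
    return records


sample = ["RELEASEDATE,Total People,Last Name", "01/30/2023,2,Birch", ",1,Doe"]
records = read_records(sample)
print(sum(r["RELEASEDATE"] is None for r in records), "record(s) missing RELEASEDATE")
```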
Remove constant fields. “Access Type” was always “VA” (Visitor Access), “Post” was always “WIN”, and “CALLER_ROOM” was always missing. These fields were removed [Section 3.2].
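Constant fields like these can be found mechanically. Here is a small Python sketch, assuming records are dicts of strings. It treats a blank or None value as "", so a field that is always missing, such as CALLER_ROOM, is reported as constant with an empty value. The sample records are hypothetical.

```python
def constant_fields(records: list[dict]) -> dict:
    """Return {field: value} for every field holding one value in all records.

    Missing values (None or blank) are treated as "", so a field that is
    always empty is reported as constant with value "".
    """
    if not records:
        return {}
    result = {}
    for field in records[0]:
        values = {(record.get(field) or "").strip() for record in records}
        if len(values) == 1:
            result[field] = values.pop()
    return result


def drop_fields(records: list[dict], fields) -> list[dict]:
    """Return copies of the records without the given fields."""
    return [{k: v for k, v in r.items() if k not in fields} for r in records]


sample = [
    {"Access Type": "VA", "Post": "WIN", "CALLER_ROOM": "", "Last Name": "Birch"},
    {"Access Type": "VA", "Post": "WIN", "CALLER_ROOM": "", "Last Name": "Doe"},
]
constants = constant_fields(sample)
print(constants)
print(drop_fields(sample, constants))
```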
The remainder of the script scrutinized all fields in the file for data quality and consistency.
Data fields
Example data fields
Meaning of fields
A White House page from Feb. 8, 2012, which is viewable via the Wayback Machine, gives some information about the data fields. Additional information gleaned from other sources in 2012 is here.
Some records from President George Bush’s administration were released as part of a court case (see Set 1 and Set 2), and they provide some information about these fields. Other documents indicate that some fields (e.g., SSN and date of birth) have been redacted; those fields are not listed below.
Cleanup and Standardization
Many of the problems encountered with the Obama visitor records remain in the Biden visitor data.
Standardize Names [visitor, visitee, caller]. Names are entered inconsistently, e.g., initials with or without periods and a variable number of spaces. The standardizeName function standardizes the names [Sections 3.5 and 1.4]. Even after standardization, the same person can appear under different name variations.
Combined name fields. A single combined field is introduced for each of visitor, visitee and caller [Section 3.5], joining the standardized name parts with a “pipe” (“|”). For example, visitor “William J Birch,” stored in three name fields, is reduced to the single field “birch|william|j”.
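The two name steps can be sketched in Python as follows. `standardize_name` is a hypothetical stand-in for the R standardizeName function, whose exact rules are not listed here. It only lowercases, turns periods into spaces, and collapses whitespace. The last|first|middle order follows the “birch|william|j” example. How the real script handles an empty middle name is not stated. This sketch simply leaves a trailing pipe.

```python
import re
from typing import Optional


def standardize_name(part: Optional[str]) -> str:
    """HYPOTHETICAL stand-in for the R standardizeName function.

    Lowercases, turns periods into spaces, collapses runs of whitespace,
    and trims the ends. The real function may apply other rules.
    """
    if not part:
        return ""
    text = part.lower().replace(".", " ")
    return re.sub(r"\s+", " ", text).strip()


def combined_name(last: Optional[str], first: Optional[str], middle: Optional[str]) -> str:
    """Join standardized last, first and middle names with a pipe."""
    return "|".join(standardize_name(p) for p in (last, first, middle))


print(combined_name("Birch", "William", "J"))
print(combined_name("  Birch ", "William", "J."))
```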
Introduce Groups. Assign all visit records a group based on the visitee [Section 3.6].
Use groups POTUS, VPOTUS, FLOTUS and SGOTUS for visits to the President, Vice-President, First Lady of the US or Second Gentleman of the US. Because counts for groups other than POTUS are usually small, all four are combined as “POTUS” in some summaries.
Assign the “Tourist” group to any visit whose visitee name contains “visitor”. “Tourists” are visiting no known person at the White House.
Otherwise, assign the visit to the White House Staff group. These visitees are often known White House staff members.
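The grouping rules above can be sketched in Python as a single function. The sketch assumes that visits to the four principals carry the title (e.g., “potus”) as one pipe-separated part of the combined visitee name. That is an assumption about the data, not something confirmed here. `summary_group` performs the fold into “POTUS” used in some summaries.

```python
LEADER_GROUPS = ("POTUS", "VPOTUS", "FLOTUS", "SGOTUS")


def assign_group(visitee: str) -> str:
    """Assign a visit group from a combined, standardized visitee name.

    ASSUMPTION: visits to the President, Vice-President, First Lady or Second
    Gentleman carry that title (e.g. "potus") as one pipe-separated part of
    the visitee name.
    """
    name = visitee.lower()
    parts = set(name.split("|"))
    for group in LEADER_GROUPS:
        if group.lower() in parts:
            return group
    if "visitor" in name:  # "visitor" anywhere in the name marks a tourist visit
        return "Tourist"
    return "White House Staff"


def summary_group(group: str) -> str:
    """Fold the small VPOTUS, FLOTUS and SGOTUS groups into POTUS for summaries."""
    return "POTUS" if group in LEADER_GROUPS else group


for v in ("potus|", "vpotus|", "office|visitors", "smith|jane|"):
    print(v, "->", assign_group(v), "/", summary_group(assign_group(v)))
```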
Standardize spelling of locations [Section 3.7]. A new LOCATION field combines MEETING_LOC and MEETING_ROOM. Additional standardization is needed.
For example, the LOCATION of the bowling alley in the Old Executive Office Building can be represented in four different ways:
oeob|bowling alley
oeob|bowling alley - 037
oeob|eeob bowling alley -
oeob|truman bowling alley
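One way to collapse such variants is a cleanup pass followed by an alias lookup, sketched here in Python. The alias table is hypothetical and was built by hand from the four variants above. Choosing “oeob|bowling alley” as the canonical form is an assumption. The sketch also assumes that MEETING_LOC holds the building (e.g., “OEOB”) and MEETING_ROOM holds the room text.

```python
import re
from typing import Optional

# HYPOTHETICAL alias table: maps cleaned variants to one canonical LOCATION.
LOCATION_ALIASES = {
    "oeob|bowling alley - 037": "oeob|bowling alley",
    "oeob|eeob bowling alley": "oeob|bowling alley",
    "oeob|truman bowling alley": "oeob|bowling alley",
}


def make_location(meeting_loc: Optional[str], meeting_room: Optional[str]) -> str:
    """Combine MEETING_LOC and MEETING_ROOM into one lowercase LOCATION value."""
    parts = [re.sub(r"\s+", " ", (p or "").lower()).strip() for p in (meeting_loc, meeting_room)]
    location = "|".join(parts)
    # Drop a dangling trailing hyphen such as "eeob bowling alley -".
    location = re.sub(r"\s*-\s*$", "", location)
    return LOCATION_ALIASES.get(location, location)


for room in ("Bowling Alley", "Bowling Alley - 037", "EEOB Bowling Alley -", "Truman Bowling Alley"):
    print(make_location("OEOB", room))
```

A lookup table is blunt, but it keeps every merge decision explicit and reviewable. This matters when variants such as a room number (“- 037”) might or might not denote the same place.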
Fix Dates. Date fields are a mess of diverse formats [Section 3.8], and almost every date field has problems. In many cases, a date-and-time field holds a date without a time. In other cases, times are represented in various ways: with or without seconds, and with or without a meridiem (AM or PM).
Who would know that “44265.42431” is really the date and time “2021-03-10 05:11”? Apparently someone converted dates and times to numeric fields in Excel.
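A Python sketch of a tolerant date-time parser follows. It first recognizes Excel serial numbers, which in Excel's 1900 date system count days from 1899-12-30 (for dates after February 1900), with the fraction giving the time of day. It then tries a short list of text formats. The format list is an assumption and covers only a sample of what the fields contain. A date with no time is parsed as midnight, which cannot be distinguished from a genuine midnight time.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# ASSUMPTION: a sample of formats; the real fields contain others.
FORMATS = (
    "%m/%d/%Y %I:%M:%S %p",
    "%m/%d/%Y %I:%M %p",
    "%m/%d/%Y %H:%M:%S",
    "%m/%d/%Y %H:%M",
    "%m/%d/%Y",
)
EXCEL_EPOCH = datetime(1899, 12, 30)  # Excel 1900 date system, valid for serials >= 61


def parse_visit_time(text: Optional[str]) -> Optional[datetime]:
    """Parse an Excel serial number or one of FORMATS; None if nothing matches."""
    text = (text or "").strip()
    if not text:
        return None
    if re.fullmatch(r"\d+(\.\d+)?", text):
        # Whole part = days since the epoch, fraction = time of day.
        return EXCEL_EPOCH + timedelta(days=float(text))
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


print(parse_visit_time("44265.42431").strftime("%Y-%m-%d %H:%M"))
print(parse_visit_time("3/10/2021 5:11 PM"))
```

A naive conversion of 44265.42431 gives 2021-03-10 10:11. The 05:11 quoted above is five hours earlier, which is consistent with treating the serial value as UTC and converting to Eastern Standard Time. The time zone assumption should be checked against the original script before relying on either value.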
Summaries
Overall. See file White-House-Visitor-Log-Stats-Overall-2023-01-30.xlsx. Frequency counts of unique values are reported [Section 5.1].
By Release. The same statistics computed for each release are useful as consistency checks [Section 5.2].
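As a rough Python illustration, the sketch below computes the number of distinct values per field, overall and for each release, along with a record count per release. Reading “frequency counts of unique values” this way is one interpretation. The spreadsheet's actual layout may differ. Missing values count as one distinct value, and the sample records are hypothetical.

```python
from collections import defaultdict


def distinct_value_counts(records, fields):
    """Number of distinct values in each field (a missing value counts as one value)."""
    return {f: len({r.get(f) for r in records}) for f in fields}


def stats_by_release(records, fields, release_field="RELEASEDATE"):
    """Record count and distinct-value counts for each release."""
    groups = defaultdict(list)
    for r in records:
        groups[r.get(release_field)].append(r)
    return {
        release: {"records": len(rows), **distinct_value_counts(rows, fields)}
        for release, rows in groups.items()
    }


recs = [
    {"RELEASEDATE": "2022-12", "Post": "WIN", "Last Name": "Birch"},
    {"RELEASEDATE": "2022-12", "Post": "WIN", "Last Name": "Doe"},
    {"RELEASEDATE": "2023-01", "Post": "WIN", "Last Name": "Doe"},
]
print(distinct_value_counts(recs, ["Post", "Last Name"]))
print(stats_by_release(recs, ["Last Name"]))
```

A release whose counts differ sharply from its neighbors is a good place to look for header or format changes.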
Visitor summary counts by month [Section 6.1].
File: White-House-Visitor-Counts-by-Group-by-Month-2023-01-30.csv
[One summary record for each of 194,209 visitors.]
In Excel, this file can be filtered to show the top POTUS visitors by month.
Visitee summary counts by month for the 3,445 people visited at the White House [Section 6.2].
File: White-House-Staff-Visitee-Counts-by-Month-2023-01-30.csv
[One summary record for each staff visitee.]
Frequency counts for each field [See Counts folder in GitHub.]
Useful for additional standardization and cleanup.
The middle initial “N” seems to be overrepresented and might mean “none” or “missing”.
One source suggests “N” is not a common middle initial.
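The summaries above can be approximated in Python. The sketch below counts visits per (visitor, group, month) and lists the top visitors for one group and month. It also builds a frequency table with shares, which is one way to check whether a middle initial such as “N” is overrepresented. The output here uses a long format, one count per (visitor, group, month). The actual CSV files may instead have one row per visitor with a column per month. All sample data are hypothetical.

```python
from collections import Counter


def visitor_month_counts(visits):
    """Count visits per (visitor, group, month).

    `visits` is an iterable of (combined_name, group, "YYYY-MM") tuples.
    """
    return Counter(visits)


def top_visitors(counts, group, month, n=10):
    """Largest visit counts for one group and month, ties broken by name."""
    rows = [(name, c) for (name, g, m), c in counts.items() if g == group and m == month]
    return sorted(rows, key=lambda row: (-row[1], row[0]))[:n]


def frequency_table(values):
    """(value, count, share) rows, most frequent first; blanks are skipped."""
    counts = Counter(v.strip().upper() for v in values if v and v.strip())
    total = sum(counts.values())
    return [(value, n, n / total) for value, n in counts.most_common()]


visits = [
    ("birch|william|j", "POTUS", "2022-10"),
    ("birch|william|j", "POTUS", "2022-10"),
    ("doe|jane|", "POTUS", "2022-10"),
    ("roe|richard|", "Tourist", "2022-10"),
]
counts = visitor_month_counts(visits)
print(top_visitors(counts, "POTUS", "2022-10"))
print(frequency_table(["N", "n", "J", "A", "N", ""])[0])
```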
Related technical articles from 2012
Much of the information in these articles is still relevant to analyzing new data releases, but many of their links are stale. I wrote these articles long ago, and they can now be accessed only via the Wayback Machine.
Background information on White House visitor data, 2012-12-10.
Legal battles over White House visitor records
Exceptions to releasing White House visitor data
Additional WAVES/ACR information from legal documents
Role of Secret Service
New WAVES tag for sensitive records about national security concerns
White House Worker and Visitor Entry System (WAVES) Data Fields, 2012-12-12
Cleaning and standardizing White House visitor data, 2012-12-14.
Duplicate records
Standardizing all names
Combined name fields
Metadata
Two-thirds of White House visitor records about tourists, 2013-03-06.