There are currently two places to get materials from the Weisberg collection. The main source is naturally the website that Hood College set up for the collection. The second source is Archive.org, which now has a “copy” of the Hood collection. When searching for Weisberg materials online, both often turn up in the results.
Curious about this, I finally got around to downloading the Weisberg materials on Archive.org last week and have now had a chance to take a look at them. They turn out to be significantly different from the materials on the Hood College website, so I’m posting a note on some of these differences.
Archive.org is a gigantic filing cabinet, and it can sometimes be quite difficult to track down the sources of the materials that are put up there. In this case, it seems the Weisberg materials there were posted mostly by one Mike Best, archivist for the National Security Internet Archive. I haven’t quite figured out who or what NSIA is, except that it is not related to the National Security Archive at George Washington University. The NSIA was registered at Archive.org in March 2015, and it has since posted a huge amount of material. NSIA began posting Weisberg materials in August 2015, and apparently finished putting up what they had by the end of September.
The description of the Archive.org version is at “Complete Weisberg Archive on the JFK Assassination”, which says: “Harold Weisberg donated the world’s largest accessible private collection of government documents and public records relating to the assassination of President John F. Kennedy to Hood College and the Beneficial-Hodson Library at Hood College, which donated a copy to the National Security Internet Archive.”
So this is not just someone scraping the Hood collection, but a copy provided by Hood to NSIA. If you really want the whole thing, there it is: 29 compressed files, over 100 gigabytes even in the ultra-compressed 7z format. It was quite a job getting all this stuff directly from Archive.org. There is a torrent file that might be faster, but the word is that our school throttles torrents, so I spent several days downloading through Archive.org instead.
Having gotten the whole thing, I’ve had a chance to compare parts of it with the Hood College version, and they are indeed different. The most important difference is that the Hood pdf files were run through OCR software (apparently mostly Omnipage 18) to convert them into searchable files. There is a search interface for the OCR versions available at the Hood website, and this is by far the most convenient, effective way of accessing the Weisberg collection. The Archive.org files have not been OCRed; they are simply images.
This is not the end of the story though. After some rather hard poking through the NSIA materials, it seems very likely that this is a working copy of the Hood materials. Its most useful feature is that it includes excel files for the pdfs in each directory. These excel files have all kinds of important information, such as dates, to-from fields for letters, etc., as well as comments and cross-references to related documents. These excel files are mostly not available from the Hood website.
Unfortunately, the fact that these are “working” files also has another meaning. The whole thing seems to have been simply yanked off a hard disk at some point. The most recent files in the NSIA materials are dated 2015-07-12, and there are a number of temporary excel files included in the archive which also have this date. So the backup was done without even closing the excel files that were being edited. A number of these were clearly not yet done, with numerous inconsistencies between the files listed in the excel sheets and the files actually present in the directory. Some of the excel sheets are even in the wrong directories, and whole directories are sometimes misplaced inside other directories as well.
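For anyone who wants to check their own download for these problems, here is a minimal Python sketch of the two checks described above. It rests on some assumptions of mine: that the file names listed in an index sheet have already been pulled out into a plain list by some other means, and that the temporary excel files are the usual owner files whose names start with "~$". The function names compare_listing and find_excel_lock_files are hypothetical, not part of anything in the archive.

```python
from pathlib import Path


def compare_listing(listed_names, directory):
    """Compare names from an index sheet with the pdfs actually in a directory.

    listed_names is assumed to be an iterable of file names already
    extracted from the excel sheet; reading the sheet itself is out of scope.
    """
    listed = {name.strip().lower() for name in listed_names if name.strip()}
    present = {
        p.name.lower()
        for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    }
    return {
        "listed_but_missing": sorted(listed - present),
        "present_but_unlisted": sorted(present - listed),
    }


def find_excel_lock_files(root):
    """Find leftover owner files (assumed to be named '~$...') under root."""
    return sorted(p for p in Path(root).rglob("~$*") if p.is_file())
```

Names are compared case-insensitively here, which seemed the safer choice for files that have been moved between systems; a stricter comparison would flag more mismatches.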
This is not to dismiss the amazing amount of work done on the collection. The large majority of the files are listed, and the large majority of the information listed is accurate, but the Weisberg collection is so huge that “large majority” means there are still thousands of places where there are problems. It is not a trivial task to fix these problems.
It is also worth noting that there are tens of thousands of duplicate pdfs throughout the collection. These are not just multiple copies of the same document within the Weisberg collection; there are places where the exact same pdf file is present in multiple locations. Some of this is probably some sort of cross-referencing system. For example, in the giant C zip file, there are dozens of directories of the form CIA [someone’s name]. Most of these appear in other places, with the directory name in the form [someone’s name] CIA. In the second form, however, the pdf files in these directories are sometimes still named CIA [someone’s name]. Where they have been renamed, they are almost always still the same pdfs, just with the names changed. Some of these duplicate directories also do not appear in the materials on the Hood website, and it seems that the NSIA copy may represent the Hood archivists’ current efforts in this area.
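Duplicates of this kind, where the content is identical but the name or location differs, can be found by comparing file contents rather than file names. The following is a rough Python sketch of one way to do that, using SHA-256 hashes; it is not necessarily how the duplicates above were counted, and the function names are my own.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_pdfs(root):
    """Group pdfs under root by content; return groups of two or more."""
    # First pass: group by size, since files of different sizes cannot match.
    by_size = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() == ".pdf":
            by_size[path.stat().st_size].append(path)

    # Second pass: hash only the files that share a size with another file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[file_digest(path)].append(path)

    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Grouping by size first avoids hashing most of a 100-gigabyte collection, since only files that share a size with some other file need to be read in full.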
Despite these problems, the NSIA copy is a useful ancillary for anyone who wants to work with the collection as a whole. As an example, the excel indices include a “date” field for many of the collection’s files. According to this, the earliest fully dated document in the collection is Weisberg’s birth certificate: April 8, 1913. There are also a few documents from after Weisberg’s death in 2002, including the obituary of Weisberg’s wife Lillian, who died March 20, 2003. The most recent document is a powerpoint file for a 2011 conference presentation by Clayton Ogilivie, the primary archivist for the Hood collection (Presentation-Canterbury 01.pptx, located in the P zip file, apparently not otherwise available either at Archive.org or Hood). This gives a very useful overview of the collection and its history. Everyone interested in Weisberg and his materials owes a huge thanks to Mr. Ogilivie and the others who have put so much time into this project.