A recent update to a publicly downloadable database maintained by the National Institute of Standards and Technology (NIST) will make it easier to sift through computers, cell phones and other electronic equipment seized in police raids, potentially helping law enforcement catch sexual predators and other criminals.
The database, called the National Software Reference Library (NSRL), plays a frequent role in criminal investigations involving electronic files, which can be evidence of wrongdoing. In the NSRL’s first major update in two decades, NIST increased the number and type of records in the database to reflect the growing variety of software files law enforcement may encounter on a device. The agency also changed the format of the records to make the NSRL more searchable.
“There are hardly any major crimes that have no connection to digital technology because criminals use cell phones,” said Doug White, a NIST computer scientist who helps maintain the NSRL. “But maybe only some of the data on a phone or other device is relevant to an investigation. The update should make it easier for the police to separate the wheat from the chaff.”
Both criminal and civil investigations often involve digital evidence in the form of software and files from seized computers or mobile phones. Investigators need a way to filter out the large amounts of data that are not relevant to the investigation so that they can focus on finding relevant evidence.
“Suppose you have a computer that might hold incriminating photos or financial data, but also has a few video games,” White said. “Games often come with a lot of graphics files. You want to do your investigation as quickly and efficiently as possible, so you need a way to get rid of all the video game images. Then you can perform your more mathematical analysis on the files that remain.”
The update comes at a time when investigators are dealing with a rapidly expanding universe of software, most of which comprises numerous files that are stored in memory. Each of these files can be identified by a type of electronic fingerprint called a hash, which is the key to the screening process. How sophisticated the sieving process is can vary depending on the type of investigation being performed. The NSRL’s reference data set doubled in size from half a billion hash records in August 2019 to more than a billion in March 2022, and White says he expects the rapid growth to continue.
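The hash-based screening described above can be illustrated with a minimal Python sketch. It is not NIST's tooling: the file names, the file contents, the helper functions and the choice of SHA-1 are all assumptions made for illustration. The idea is simply that any file whose hash appears in a reference set of known software can be set aside, leaving the rest for closer review.

```python
import hashlib


def sha1_of(data):
    """Return the hex SHA-1 digest of a file's contents (hash choice is an assumption)."""
    return hashlib.sha1(data).hexdigest()


def filter_unknown(files, known_hashes):
    """Return the sorted names of files whose hash is NOT in the known set."""
    return sorted(
        name for name, data in files.items()
        if sha1_of(data) not in known_hashes
    )


# Hypothetical stand-ins for the contents of a seized device.
seized = {
    "game_texture.png": b"stock game artwork",
    "notes.txt": b"user-created content",
}

# Hypothetical reference set: hashes of files known to ship with software.
known = {sha1_of(b"stock game artwork")}

remaining = filter_unknown(seized, known)
print(remaining)  # ['notes.txt']
```

Only the file that does not match a known hash remains, which mirrors the "wheat from the chaff" step: the video game image is filtered out and the user-created file is left for analysis.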
This growth makes the NSRL an extremely important tool for digital forensics labs, which specialize in this type of file review. Such work has become a critical part of investigations: There are about 11,000 digital forensics labs in the United States (compared to about 400 crime labs). While digital evidence plays a role in many types of crime, it is especially useful for tracking down child predators, who often have images of sexual abuse stored in the memory of a phone or computer.
While the NSRL is growing in both the number of entries and the variety of file types (White expects to add entries for Internet of Things (IoT) devices such as smart speakers in the near future), the recent update to the database should help investigators handle the burden. The previous 2.0 version, which is 20 years old, offered its hashes as basic text files that could be imported into a spreadsheet. Searching in the list was possible but cumbersome compared to modern search engine functions. The update, which is NSRL version 3.0, uses the SQLite format, making it easier for users to create custom filters to sort files and find what they need for a particular investigation.
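To give a rough sense of what a custom filter over a SQLite database can look like, here is a small Python sketch using the standard `sqlite3` module. The table `known_files`, its columns, the sample rows and the helper `hashes_for_product` are hypothetical and are not taken from the actual NSRL schema, which should be checked against NIST's documentation.

```python
import sqlite3

# In-memory database with a HYPOTHETICAL schema (not the real NSRL layout).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE known_files (sha1 TEXT, file_name TEXT, product TEXT)"
)
conn.executemany(
    "INSERT INTO known_files VALUES (?, ?, ?)",
    [
        ("AAA1", "texture01.png", "Example Game"),
        ("BBB2", "engine.dll", "Example Game"),
        ("CCC3", "editor.exe", "Example Office Suite"),
    ],
)


def hashes_for_product(conn, product):
    """Return the hashes recorded for one product, sorted for stable output."""
    rows = conn.execute(
        "SELECT sha1 FROM known_files WHERE product = ? ORDER BY sha1",
        (product,),
    )
    return [row[0] for row in rows]


game_hashes = hashes_for_product(conn, "Example Game")
print(game_hashes)  # ['AAA1', 'BBB2']
```

A query like this replaces manual searching through a text file or spreadsheet: the filter condition is expressed once and the database engine does the lookup.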
Another advantage is that the NSRL managers can distribute future changes to the dataset as relatively minor updates instead of resending the entire dataset, saving time and effort for users. White also said the NSRL would remain available in its old format for the benefit of users who may need time to adapt to the changes.
“We will continue to publish the dataset in both 2.0 and 3.0 formats until December 2022,” White said. “After that, there’s a relatively simple query that users can run to generate the 2.0 dataset if they need to.”
The dataset and more information about the update are available from the NIST website.