SAMPLE DATA


Purpose

The purpose of this page is to provide one-stop shopping for Maresware data files that might be useful in your operations. Because of their size and/or content, some of the files are 7-zip archives that have been named with a .zip extension so that the browser will download them.

Once downloaded, unzip using the appropriate software. Some files were created with 7-zip because other zipping software failed to perform properly, and were then renamed with a .zip extension so that browsers would download them. In these cases, just rename the file to .7z and use 7-zip to unzip it. If a file is a 7-zip archive and you use other unzipping software, you may lose some of the data (forensically speaking).
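If you are not sure whether a downloaded .zip is really a 7-zip archive, the first few bytes of the file will tell you. The sketch below is only an illustration and not part of Maresware: the function names are made up for this example, and it relies on the standard 7z signature (37 7A BC AF 27 1C) and the usual ZIP local-header signature (PK 03 04).

```python
from pathlib import Path

# Well-known file signatures (magic numbers).
SEVEN_ZIP_MAGIC = bytes.fromhex("377abcaf271c")  # '7z' BC AF 27 1C
ZIP_MAGIC = b"PK\x03\x04"                        # ordinary (non-empty) ZIP archive


def detect_archive_type(path):
    """Return '7z', 'zip', or 'unknown' based on the first bytes of the file."""
    with open(path, "rb") as f:
        head = f.read(6)
    if head.startswith(SEVEN_ZIP_MAGIC):
        return "7z"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def rename_if_7z(path):
    """Rename a mislabeled .zip to .7z; return the (possibly new) path.

    Hypothetical helper for illustration only. Files that are real ZIPs,
    or that do not end in .zip, are left untouched.
    """
    p = Path(path)
    if p.suffix.lower() == ".zip" and detect_archive_type(p) == "7z":
        target = p.with_suffix(".7z")
        p.rename(target)
        return target
    return p
```

Calling rename_if_7z on a download leaves a genuine ZIP alone and renames only files whose contents are actually 7z.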

In some of the sections below, you will see links to TEST_SUITE, which is a self-extracting WinRAR executable. It contains a number of directories holding data, plus batch files that demonstrate the operation of the program being discussed. To extract the files, set up an empty directory (e.g., X:\TEST_STUFF), place the executable in that directory, and then run TEST_SUITE.exe from the command line. It will extract a number of folders, most of whose names are self-explanatory. The folders D1 and D2 contain the test data files, while the other folders contain the executables, the batch files that run them, an output file directory, and a directory of supporting files for the executables. Once extracted, the _READ_ME.TXT file explains what is contained in the extracted folders and how to run the batch files.


Long Filename Files

The file TEST_SUITE contains sample files and a batch file (script) that will allow you to create folders with files and directories whose paths/filenames are longer than 255 characters. Once extracted, the D1 folder holds the long filename files. See if your software can find ALL the files (including alternate data streams) located in this tree.

Many stand-alone programs which perform file system recursion have a very basic flaw: they cannot process files whose paths/filenames are longer than 255 characters. With current operating systems, and persons creating filenames that read like "War and Peace", it is not uncommon for files to be located in paths which are longer than 255 characters.

I have tested a number of these "forensic" programs to see if they can find and process files which fit this description. Many of the stand-alone programs which are supposed to recurse/traverse a directory fail at the 255 character limit. The Maresware programs have been specifically coded to find these files.

Download this executable and run it from an empty directory. The D1 folder is the one that contains paths greater than 255 characters. Then see if your recursion programs can find and process these files.
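As a rough way to see what a recursion tool should be reporting, the following sketch walks a tree and lists every file whose full path exceeds the limit. It is an illustration only, not how any Maresware program works; the function name is invented for this example. On Windows it adds the \\?\ prefix so the walk is not stopped by the legacy path limit (this simple version does not handle UNC network paths, and it does not look at alternate data streams).

```python
import os


def find_long_paths(root, limit=255):
    """Yield (length, path) for each file under root whose full path exceeds limit."""
    root = os.path.abspath(root)
    prefix = ""
    # On Windows, the \\?\ prefix tells the Win32 API to accept extended-length paths.
    if os.name == "nt" and not root.startswith("\\\\?\\"):
        prefix = "\\\\?\\"
    for dirpath, _dirnames, filenames in os.walk(prefix + root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            visible = full[len(prefix):]  # measure the path without the added prefix
            if len(visible) > limit:
                yield len(visible), visible
```

Running list(find_long_paths(r"X:\TEST_STUFF\D1")) after extraction gives a count to compare against whatever your own tools report. Note that os.walk silently skips directories it cannot read unless an onerror handler is supplied, which is the same kind of quiet failure this test data is meant to expose.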


NSRL GENERAL

The main NIST NSRL page (NSRL-NIST overview) is a very important read, and it is where you can download the RDS hash sets. The current 2024_03 release is actually made up of 4 sets of data. When the MD5s from the 4 sets are combined and uniqued, they total just over 180 million unique MD5 values.

You can also check this page on my website for another overview of the NSRL data file processing. On this page you will also find how to request the 180+ million unique MD5 records which I have processed and made available for download.

As of May 2024, I have re-processed the giganto data base of over 1.3 billion records (1,355,774,653 by my count) which are in the current data sets.

I took the current 1.3 billion record data set, extracted the MD5 values, then combined and uniqued them down to 182,078,612 unique MD5 records.

The current format is merely the MD5 value followed by a carriage return/line feed, so each record is 34 characters (32 for the MD5 plus CR/LF).
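Because every record is exactly 34 bytes, a downloaded copy is easy to sanity check. The sketch below is an illustration, not a Maresware tool, and the function name is made up. It reads the file 34 bytes at a time and confirms that each record is 32 hex characters followed by CR/LF. If the file holds exactly the 182,078,612 records mentioned above, its size should be 182,078,612 x 34 = 6,190,672,808 bytes.

```python
import string

RECORD_LEN = 34  # 32 hex characters + CR + LF, as described above
HEX_BYTES = set(string.hexdigits.encode("ascii"))  # byte values of 0-9, a-f, A-F


def check_md5_file(path):
    """Return (good_record_count, index_of_first_bad_record or None)."""
    count = 0
    with open(path, "rb") as f:
        while True:
            rec = f.read(RECORD_LEN)
            if not rec:
                return count, None  # clean end of file
            ok = (
                len(rec) == RECORD_LEN
                and rec.endswith(b"\r\n")
                and all(b in HEX_BYTES for b in rec[:32])
            )
            if not ok:
                return count, count  # records are numbered from 0
            count += 1
```

A result of (N, None) means all N records fit the format. Anything else gives the position of the first record worth a look, for example a file that lost its carriage returns in transit.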

Also, be reminded that the files contained in these lists are not guaranteed to be GOOD files. They are just referenced as KNOWN files, which means that some of them may be bad files, viruses, etc. So read the NIST documentation to become familiar with the actual files included in the data bases.


My analysis and formatting of the lists.

Previously I combined the MD5|SHA pair. But when examining the last set I found no collisions, so for the current 2024 set I have not included the SHA value, which would more than double the file size. If you want the SHA value, get it from the original NIST data sets.

My current counts for the data sets are below. (If you come up with different values, let me know.)

Current March 2024_03 Hash Counts (before combining and uniquing)
               Total              Unique
Modern:      879,510,365        69,437,521
Legacy:      289,938,900        61,814,050
Android:      97,148,886        29,406,405
IOS:          89,176,502        26,148,185
            ============       ===========
Total:     1,355,774,653       186,806,161
Combined Unique                182,078,612
Smart people, figure out where the 4 million went.
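One plausible reading of that gap, offered here as an assumption rather than a confirmed answer, is that some MD5 values appear in more than one of the four sets: each set counts such a value once in its own unique total, but the combined list keeps it only once. The toy example below uses made-up hash labels to show the effect, then does the same subtraction with the published counts.

```python
def combine_unique(sets_by_name):
    """Return (sum of per-set unique counts, combined unique count, difference)."""
    summed = sum(len(s) for s in sets_by_name.values())
    combined = len(set().union(*sets_by_name.values()))
    return summed, combined, summed - combined


# Toy data: the letters stand in for MD5 values (illustrative only).
toy = {
    "Modern": {"a", "b", "c"},
    "Legacy": {"b", "c", "d"},
    "Android": {"c", "e"},
    "IOS": {"f"},
}
print(combine_unique(toy))  # (9, 6, 3): 'b' is listed twice and 'c' three times

# The same arithmetic with the per-set unique counts from the table above.
per_set_unique = {
    "Modern": 69_437_521,
    "Legacy": 61_814_050,
    "Android": 29_406_405,
    "IOS": 26_148_185,
}
combined_unique = 182_078_612
summed_unique = sum(per_set_unique.values())   # 186,806,161
print(summed_unique - combined_unique)          # 4,727,549 repeat listings
```

Note that the difference counts repeat listings, not distinct hashes: a value present in three sets contributes two to the gap.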

Below are my total counts for the various older version items prior to 2022 which you may have in your library. I didn't ask, but I do hope NIST included all the older version data into any current work they perform. 😁

Version|  IOS      |  IOS_UNQ  |  ANDROID  |  ANDROID_UNQ | RDS/Modern | MODERN_UNQ |  LEGACY    |  LEGACY_UNQ
       |           |           |           |              |            |            |            |  
ver 231|           |           |           |              |            |  19,222,354|            |  
ver 258|           |           |           |              |            |  38,316,594|            |  
ver 260|           |           |           |              |            |  41,980,129|            |  
ver 262|           |           |           |              |            |   7,713,758|            |  
ver 265|           |           | 15,728,036|     5,177,636|            |            |            |  
ver 267| 14,390,472|  7,713,758|  8,396,701|     4,164,911|            |            |            |  
ver 270|  9,037,374|  5,334,511| 16,245,715|     7,043,597| 124,858,861|  31,638,076|            |  
ver 271| 46,447,082| 25,883,151| 18,890,716|     7,845,648| 130,274,166|            |            |  
ver 273| 46,447,082| 25,883,151|           |              |            |            |            |  
ver 274|    931,242|    568,223| 41,589,780|    14,861,596| 192,677,749|  38,320,334| 113,737,918| 46,111,042
ver 275| 13,124,271|  7,115,566| 50,308,347|    17,799,609| 202,302,512|  41,850,361| 134,570,414| 54,424,559       
        ---------------------------------------------------------------------------------------------------


Here is the breakdown of the 2024_03 data set by the first character (0-F) of the MD5. Consistency and even distribution is the name of the game.
 
0     +11,379,321 
1     +11,387,167 
2     +11,382,240 
3     +11,376,850 
4     +11,380,975 
5     +11,379,100 
6     +11,381,382 
7     +11,380,499 
8     +11,375,000 
9     +11,381,355 
A     +11,380,539 
B     +11,381,181 
C     +11,372,183 
D     +11,378,769 
E     +11,378,872 
F     +11,383,179 
      ===========    
Total 182,078,612
  
Nice, when a plan (or the totals) comes together :-)
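A breakdown like the one above can be reproduced with a few lines of code. This is a sketch only (the function names are invented here, and it is not the tool used to produce the table): it tallies hashes by their first hex character, then measures how far the published buckets stray from a perfectly even one-sixteenth share.

```python
from collections import Counter

HEX_DIGITS = "0123456789ABCDEF"


def first_char_counts(hashes):
    """Tally hash strings by their first character (case-insensitive)."""
    counts = Counter()
    for h in hashes:
        h = h.strip()
        if h:
            counts[h[0].upper()] += 1
    return counts


def max_deviation_pct(counts):
    """Largest percentage gap between any of the 16 buckets and an even share."""
    expected = sum(counts.values()) / 16
    worst = max(abs(counts.get(c, 0) - expected) for c in HEX_DIGITS)
    return worst / expected * 100


# Bucket counts copied from the table above.
published = dict(zip(HEX_DIGITS, [
    11_379_321, 11_387_167, 11_382_240, 11_376_850,
    11_380_975, 11_379_100, 11_381_382, 11_380_499,
    11_375_000, 11_381_355, 11_380_539, 11_381_181,
    11_372_183, 11_378_769, 11_378_872, 11_383_179,
]))
print(sum(published.values()))                # 182078612, matching the total
print(round(max_deviation_pct(published), 3)) # under 0.1 percent
```

For a file in the 34-byte record format, passing the open text file straight to first_char_counts works, since iterating a file yields one line (one hash) at a time.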

The entire NSRL SQL data base files can be downloaded from the NIST site mentioned above. As mentioned before, I extracted only the hash values (MD5, and previously SHA) for availability on this website. If you really want the entire data set, download the data base, and have fun.


DISKCAT DEMO BATCH

The file TEST_SUITE contains a batch file with sample command lines to run diskcat.

Run the executable from a top level (empty) directory to extract all the files and programs. Then, from the command line, run the diskcat_demo.bat file located in the run_exes directory. The batch file should have all the correct paths laid out to provide adequate results. Compare the output of this diskcat (disk cataloging) program with your own programs used to provide listings of files located in specified evidentiary paths. See which provides more evidentiary information.


HASH and SEARCH BATCH

The file TEST_SUITE contains a batch file which will create a data set of SHA256 values, then use the Maresware search program to search the hash data set for a specified number of SHA256 values.

Run the executable from a top level (empty) directory to extract all the files and programs. Then, from the command line, run the hash_demo.bat and search_demo.bat files located in the run_exes directory. The batch files should have all the correct paths laid out to provide adequate results. Compare the outputs of the hash program with your own programs used to provide hash listings of files located in specified evidentiary paths. See which provides more evidentiary information.

The search_demo.bat file is included for you to see how fast the search program can find records contained in appropriately formatted outputs. Very useful when searching millions of data records for selected keys. (If you don't know what a key to search for means, don't even bother running the search batch.)


MD5 BATCH

The file TEST_SUITE contains a batch file and sample HASH and SHA data files which can be used to test and see the action of the MD5 program. It creates some sample MD5 output, and also compares some sample data files with preset MD5 and SHA values to display the action of the MD5 program.

The search command line shows how efficient the search program can be in finding specific HASH values in a data set containing HASH/SHA values. BUT the search program can only work on fixed length records.
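To see why fixed length records matter for this kind of lookup, consider the sketch below. It is NOT the Maresware search program and makes no claim about how that program works internally. It is a plain binary search, with an invented function name, that assumes the hash file is sorted and uses one consistent letter case. With fixed length records, record number N always starts at byte N x record length, so the code can jump straight to any record without reading the ones before it.

```python
import os

RECORD_LEN = 34  # 32 hex characters + CR/LF; use 66 for SHA256 records


def find_record(path, hash_hex, record_len=RECORD_LEN):
    """Binary-search a sorted, fixed-length hash file.

    Returns the zero-based record index, or -1 if the hash is not present.
    Assumes the file is sorted and uses a single letter case throughout.
    """
    key = hash_hex.strip().upper().encode("ascii")
    with open(path, "rb") as f:
        lo, hi = 0, os.path.getsize(path) // record_len
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(mid * record_len)          # jump directly to record 'mid'
            rec = f.read(len(key)).upper()
            if rec == key:
                return mid
            if rec < key:
                lo = mid + 1
            else:
                hi = mid
    return -1
```

Against a sorted file of 182,078,612 records, a search like this needs at most 28 reads per key, because 2 to the 28th power is larger than the record count. With variable length records, there would be no way to compute where record N begins.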

Run the executable from a top level (empty) directory to extract all the files and programs. Then, from the command line, run the md5_demo.bat file located in the run_exes directory. The batch file should have all the correct paths laid out to provide adequate results. Compare the output of this md5 program with your own programs used to provide hash (md5) listings of files located in specified evidentiary paths. See which provides more evidentiary information. Also, examine how the format of this output differs from that of the hash program.


GENERIC DATA PROCESS with a hint of HASH

The file TEST_SUITE contains a generic batch file (run_test.bat) which demonstrates the speed and efficiency of some of the other data file processing programs, such as search, bsearch, compare, verticle. Don't confuse the Maresware "search" program with a typical string search program. The other programs (including the forensic "upcopy" program) and help files can be downloaded from the website. Make sure they are in the path before running the batch.

Anyone who processes large amounts of RAW data, whether created through a forensic process or obtained from a source, should take a look. The data processing programs are fast, efficient, and programmable. When I had a real job, I used them to process 100 million mainframe records. But don't let that deter you from taking a look-see at the possibilities in your forensic work.


The Maresware help files may also contain additional zip and batch files which demonstrate the operation of the software.

If you find errors in the file links or process, please let me know.
Remember, this software doesn't contain bugs. It's just operationally challenged.

 
