PURPOSE
OPERATION
COMMAND LINES
OPTIONS
RELATED PROGRAMS
BUGS: Operational Challenge
One liner: Takes the X-Ways output of the metadata field and makes it easy to read, by placing each selected item in its own separate column.
This program currently does NOT support Unicode input files.
This is a command line program.
MUST be run within a command window as administrator.
Both versions operate similarly and have similar command lines, but they work on different input files. So read this help file with your input file in mind, and adjust the wording accordingly.
What's the best thing about using these programs? It's that they are stand alone programs, as opposed to X-Ways extensions. Instead of having to run an extension within the X-Ways program, which necessitates the use of the license dongle, you can run these programs on the output files created by X-Ways. You, the forensicator, can create output data files and send them to the reviewers, who can then further massage the output themselves to suit whatever format or reporting requirements they have, without depending on the license dongle. That all presupposes that you provide them with output data that is inclusive enough for them to use. Can you do that?
Here is a small amount of sample data and a batch file, in a zip file which can be downloaded.
Included are also a number of sample keyword files like: metadata_docs_keywords
There are sample TAB delimited X-Ways metadata files,
and a sample of commands in the batch file which shows how to process the sample data.
Unzip the files, then run command.bat.
The output files have a _tmp added to the filename: such as:
DOCS_META_tmp.tab
Load (import) this xxx_tmp.tab file into a spreadsheet and take a
look at the added columns. (Make sure your import setting is for a tab delimited file.)
To find (and isolate) semi-colon (;) delimited fields within the X-Ways metadata field that
is exported during the X-Ways "Export List" operation, or HTML report generation of X-Ways.
The default X-Ways export list is tab delimited, and this program ONLY works on those TAB
delimited files.
The X-ways html report that is generated is a typical HTML report.
The report contains all the selected x-ways columns, plus a Metadata: field.
In the html report file, the individual Metadata items are usually
separated with a carriage return or the HTML line break tag,
similar to the sample shown below. The metadata column/field may contain a lot of
information that is irrelevant to you for each file being processed.
Important note: See below section labelled HTML_REPORT.
Height: 3000
Orientation: 1
Equipment make: SOME CAMERA MAKE
Model: KODAK
Keywords: help me
Date Original: 2009:08:20 10:27:26
Date digitized: 2009:08:20 10:27:26
Thumbnail: true
Focal length: 4.0
F number: 3.5
The "x-ways_meta" version deals with the X-Ways "Export List" option. It takes the sub-field(s) identified by the user within the larger metadata field of an X-Ways "Export List" file, isolates and extracts each one, and sets each up as its own tab delimited field within the output record which was generated by X-Ways. These newly inserted tab delimited fields can now be easily imported to a spreadsheet and manipulated by the user.
Here are two sample records from an X-Ways export list operation. These two records contain different metadata information because one of the files is a jpg and the other a doc file. Notice the volume and variety of items within the metadata field. Different items may be of interest for different needs relating to a document or a jpg picture. Notice the "field" names are identified with a colon, and the data part of the item itself ends with a semi-colon. This is how X-Ways identifies each separate item in the metadata field. The records have been shown with carriage returns added for legibility, but each record is only a single record. The actual metadata segment starts on the 2nd line of each record.
Realize that if you import these records to a spreadsheet, the entire metadata field would end up in a single column within the spreadsheet. Identifying and isolating the important information within the spreadsheet would be difficult. So the user is allowed to tell the program to select out specific fields, like Last Saved:, Date Original:, Last Saved By:, Creation:, and any other specific items the user chooses. The program will then create a separate new column for each item. That way, analysis within the spreadsheet becomes easier.
Name Ext. Path Size Created Modified Accessed Metadata

03.jpg jpg \comps 2684541 2009/02/02 17:26:44 2009/02/02 16:24:10 2011/10/23 15:23:32
Width: 2592;Height: 1944;Orientation: 1;Software: Digital Camera FinePix S5200 Ver1.00;
Equipment Make: FUJIFILM;Model: FinePix S5200 ;Maker Note: (298 bytes);Copyright: ;
Date Original: 2009:01:03 09:37:43;Date Digitized: 2009:01:03 09:37:43;
Date Taken: 2009:01:03 09:37:43;Thumbnail: true;Focal Length: 13.30;F Number: 3.30;signature: E7ADE202

advance.doc doc \training 23040 2009/05/06 14:17:31 2012/01/28 17:51:22 2012/01/28 05:00:00
Locale identifier: 0x409 English (United States);
Title: Mares and Company Advanced (SCERS) Training Registration;
Author: Dan Mares;Last Saved By: Administrator;Version: 2;
Application: Microsoft Word 9.0;Creation: 2004/06/28 12:26:00;
Last Saved: 2004/06/28 12:26:00;Page count: 1;Char Count: 2189;
Code page: 1252 ANSI - Latin I;Company: Mares and Company, LLC;AppVersion: 9.3821

The user decides which items within the metadata to extract and place in their own columns for inclusion in the spreadsheet. For instance, suppose we only wanted to expand the Last Saved By and Creation items to separate columns. The associated field file that you, the user, provide on the command line would have two items in it and look like:
Last Saved By:
Creation:
The program would then process each record, leaving all the major columns intact, adding two new unique columns as requested, and then completing the operation by adding the entire metadata field. So the resulting xxx_tmp.tab file would now have all the original data fields, PLUS two additional columns. When imported into the spreadsheet, these two new columns can be easily manipulated and studied.
The resulting output record now has additional tab delimited field(s), each one holding a semi-colon (;) delimited sub-field taken from the X-Ways metadata field. The original metadata field is not modified in any way, and is always maintained in the new output record.
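To make the layout of the metadata field concrete, here is a minimal Python sketch (not the program's own code) that splits a metadata field into name and value pairs. It assumes items are separated by semi-colons and that each field name ends at the first colon, as in the sample records above. The function name split_metadata is made up for this illustration.

```python
def split_metadata(metadata):
    """Split an X-Ways style metadata field into (name, value) pairs.

    Assumption: items are separated by ';' and the field name ends at the
    first ':' of each item. This is an illustration, not the program's code.
    """
    pairs = []
    for item in metadata.split(";"):
        item = item.strip()
        if not item:
            continue                      # skip empty pieces between ';;'
        name, colon, value = item.partition(":")
        if colon:
            pairs.append((name.strip() + ":", value.strip()))
        else:
            pairs.append(("", item))      # no colon: keep the text unnamed
    return pairs


sample = ("Author: Dan Mares;Last Saved By: Administrator;Version: 2; "
          "Creation: 2004/06/28 12:26:00;Page count: 1")
for name, value in split_metadata(sample):
    print(name, value)
```

Because the split is done at the first colon only, date and time values that contain colons of their own stay intact.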
The X-Ways html report may often contain the full metadata information relating to a file. Depending on the type of file being processed, this metadata field may contain much more information than you wish to display in the html report, especially when the report covers various file types (ie: docs, xls, pdf, jpg etc.). All these file types provide different metadata content, of which you may only wish to display specific items. You use this program to select ONLY those metadata items to be shown in the report. ALL items normally shown in the report outside and above the metadata information will always be shown.
Again, choosing the sections in the metadata field that you wish to display will eliminate a significant amount of clutter in the report.
The HTML report processing version takes the selected or requested metadata fields, and these are the only metadata components included in the resulting html report. Any additional unused/unnecessary Metadata items are removed from the output html file. This reduces the extraneous html data which the user feels may be unnecessary in the final HTML report. Each (new) line in the resulting html output file is now also labelled Metadata:
The resulting metadata fields displayed in the report are each on a line by themselves. So
viewing is made easier.
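A rough Python sketch of this kind of filtering is shown here. It is only an illustration under stated assumptions: the Metadata: label starts the line, the items are separated by <BR> tags, and matching is case sensitive. The function name filter_metadata_line is hypothetical and is not part of x-ways_report.

```python
import re


def filter_metadata_line(line, wanted):
    """Keep only the requested items from one 'Metadata:' report line.

    Assumptions: the line starts with 'Metadata:' at the left margin and
    the items are separated by <BR> tags. Each kept item is returned as
    its own line, labelled 'Metadata:'.
    """
    prefix = "Metadata:"
    if not line.startswith(prefix):
        return [line]                     # not a metadata line: pass through
    items = re.split(r"<BR>", line[len(prefix):], flags=re.IGNORECASE)
    kept = []
    for item in items:
        item = item.strip()
        if any(item.startswith(name) for name in wanted):
            kept.append(prefix + " " + item + " <BR>")
    return kept


report_line = ("Metadata: Width: 3296 <BR> Height: 2472 <BR> "
               "Keywords: any words in the metadata <BR> Focal length: 4.0 <BR>")
for out in filter_metadata_line(report_line, ["Keywords:", "Height:"]):
    print(out)
```

The kept items come out in the order they appear in the report, not in the order of the request list.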
A sample list of html identifiers/fields might be:
keywords:
description:
Title:
These three are only a small subset of the items contained in a much larger metadata
report field. Research appropriate items to add to the list.
The appropriate command line is:
x-ways_report x-ways_report.html meta_data_web_report_keywords_as_seen_above
The "Metadata:" line in the report MUST be the first item on the line in the html report and be left justified with no spaces. If you wish to change the Metadata: field to be BOLDED so it will stand out, you may do this. But you can only use the < B > and < / B > html tags. Those are the only html tags that the program understands when looking for the Metadata field name. If you use the < STRONG > tag it will not work. If the program does not respond properly, (meaning it does not properly find and parse the Metadata field) please open the html report with a text editor and do a search and replace. Remove the bolding or any other formatting around the word Metadata, and make it left justified on the line. In other words: replace the string < B >Metadata: < / b >, with just Metadata: Open and examine the X-Ways generated html report, and you will understand this above restriction.
WHY THIS PROGRAM WAS WRITTEN
The reason this program was created is that when X-Ways finds and extracts metadata from within a file, it (X-Ways) extracts many different metadata items. The metadata extracted differs depending on the type and content of the file being processed, so a large number of variable items are placed within the metadata field. (See below the list of metadata fields I have found.) Some of these (when available) may be the Last Printed date, Author, various Exif data, date(s), camera type, and other useful metadata information. However, since the metadata field as extracted is basically a field of free form, semi-colon (;) delimited sub-fields, it is not easy to identify either the targeted item (ie: Last Printed date) or its location within the metadata field. If you have ever imported the export list to a spreadsheet you have experienced this problem.
This unknown location within the metadata field, and the variability of the metadata information, make it almost impossible to isolate and reparse the targeted item (ie: Last Printed:). This program identifies the targeted (user identified) sub-field, extracts and isolates its data, and makes a separate delimited item/column of the data. Then, when the resulting data file is imported to a spreadsheet such as Excel, that user identified sub-field is its own unique column within Excel, and can be processed like the other columns. If you have ever tried to parse the metadata field you know what we are talking about.
Said another way: this program takes the metadata field and, based on the user's input (hopefully properly and correctly researched information), parses it to locate the semi-colon delimited field(s) needed. It then separates out each selected field into its OWN tab delimited column, which will import to the spreadsheet very nicely. The original data record is not changed, except that it now has added tab delimited items.
The original field which this program was written to parse was the "Last Printed:" date item within the doc and spreadsheet generated metadata. It has also been tested on email, Exif, and link file metadata, and seems to work with all of these metadata fields. Any feedback on its operation is appreciated.
CAUTION:
There is one caveat:
X-WAYS ALLOWS carriage returns to be embedded within the metadata field of the
"Export List" process. These embedded carriage returns usually are a result of parsing
email items but, regardless of the source file, will cause a spreadsheet (Excel) to have
major problems. This program finds those embedded carriage returns and converts them to blanks.
If you only want to convert the carriage returns embedded within the metadata field, simply select one metadata field (any field will do) and run the program. Your output contains the added field, and the metadata field itself now has the carriage returns converted.
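One way such a repair could be done is sketched below in Python. This is an assumption about the approach, not a description of how the program does it: since the metadata field is last, a physical line with fewer tabs than a complete record is treated as a continuation of the previous record. The function name join_broken_records and the expected_tabs parameter are invented for this example.

```python
def join_broken_records(lines, expected_tabs):
    """Merge physical lines that belong to one tab delimited record.

    Assumption: a complete record contains `expected_tabs` tab characters
    and the metadata field is last, so a line with fewer tabs is taken to
    be the tail of a metadata field split by an embedded carriage return.
    The carriage return is replaced with a single blank.
    """
    records = []
    for line in lines:
        line = line.rstrip("\r\n")
        if records and line.count("\t") < expected_tabs:
            records[-1] = records[-1] + " " + line   # continuation line
        else:
            records.append(line)                     # start of a new record
    return records


raw = [
    "a.eml\t10\tSubject: hi;Body: line one\r\n",
    "line two\r\n",
    "b.doc\t20\tAuthor: x\r\n",
]
for record in join_broken_records(raw, expected_tabs=2):
    print(record)
```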
04-04-2012 NOTE: Thanks to Jimmy Weg, I have made a second program called x-ways_report, which is included in the sample_data file mentioned above. It is designed to work on the metadata field that is included in the X-Ways HTML report files. It will search the Metadata: line and select out only those segments which the user requests on the command line. The user can input up to 10 items to select on the command line. OR (4/9/2012), if the 3rd item on the command line (which would normally be the metadata item searched for) is replaced by a filename containing the items, then these items are searched for. The metadata items must be one per line in a text file. See the command line below.
This program takes a single filename as its input, and up to ten other items on the command line. Be careful, this IS a COMMAND LINE program.
The program takes the input filename, parses it, and adds _tmp to the name to generate a new filename, which it uses for the output. So an initial input file named xways_export.txt will generate an output file named xways_export_tmp.txt. Look for a new output filename similar to the input, with the added _tmp.
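The naming rule can be sketched in a couple of lines of Python. This is only an illustration of the rule as described, assuming the _tmp goes just before the file extension. The function name tmp_output_name is made up here.

```python
import os


def tmp_output_name(input_name):
    """Insert '_tmp' before the extension of the input filename."""
    root, ext = os.path.splitext(input_name)
    return root + "_tmp" + ext


print(tmp_output_name("xways_export.txt"))   # xways_export_tmp.txt
```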
The input file should be the usual tab delimited file which is exported through the X-Ways "Export List" option. THE INPUT MUST BE A TAB DELIMITED FILE, and you must advise the spreadsheet program of this when importing the data. This is the default export format of the X-Ways "Export List" operation. The user may include in the output record any other fields they would usually include. HOWEVER: the last field in the exported record MUST be the metadata field. This last field is the ONLY one which is searched or processed for the item(s) provided by the user on the command line. If the metadata field is NOT the last field, the output file will not have the expected content.
The traditional format of the X-Ways metadata field from the "Export List" process is a single field within the tab delimited record. Below is a sample of three fields (path, hash, metadata). I split the metadata to two lines for easy reading. Notice in the metadata there are (colon delimited) sub-fields of: File name:, Sequence:, Version:, Length:, Cluster:, Modification:
\WINDOWS\system32\config\Newsid Backup F67ACE253768387C57471BE55F051ABC
Metadata: File name: temRoot\System32\Config\DEFAULT;Sequence: 831;Version: 1.5;Length: 499712;
Cluster: 1;Modification: 12/20/2011 04:50:23;Last Printed: 12/25/2011 06:50:23
In the html report, the metadata is displayed in the browser one item per line, as shown below. However, within the report file itself the metadata field is one continuous line, with the html <BR> code embedded to indicate each line break. (Carriage returns are inserted here for clarity on the screen.)
Metadata: Width: 3296 <BR>
Height: 2472 <BR>
Orientation: 1 <BR>
Software: OLYMUS CAMERA MODEL <BR>
Equipment make: YOUR CAMERA COMPANY <BR>
Model: THE MODEL <BR>
Maker note: (12728 bytes)<BR>
Keywords: any words in the metadata <BR>
Date Original: 2011:03:20 13:27:26 <BR>
Date digitized: 2011:03:20 13:27:26 <BR>
Thumbnail: true <BR>
Focal length: 4.0 <BR>
F number: 4.70 <BR>
Within this "Export List" metadata field, X-Ways usually separates the metadata items with semi-colons (;). So within the metadata you have multiple items which X-Ways has parsed into semi-colon delimited items. One of these items is what the user will probably be looking for. One usual item is the "Last Printed:" date of Office documents. If available, this "Last Printed:" date will be one of the semi-colon delimited items within the metadata field. In other instances, there may not be any metadata at all, or the item being looked for is not part of the metadata extracted. These are the three possibilities. If you don't know what this is referring to, don't bother to read on.
On the command line, after the user provides the input filename, you are required to input a search string (or a text file, one item per line of the fields to search for). This search string is the name of the semi-colon delimited field within the metadata field which is the item to look for. For the purposes of further discussion we will use the "Last Printed:" field name which is sometimes part of the metadata of Office documents. Notice that the actual name of the field usually ends with a colon (:). This is how X-Ways seems to identify the item name.
SPECIAL CASE for not adding field title in output record
The Last Printed: field is displayed in the output record as:

The program will read each record within the input file. It then finds the last tab delimited field (which MUST be the metadata field). Within the metadata field, it then looks for the string(s) which the user has input, in this case "Last Printed:". The string searched for is case sensitive, so be aware of any anomalies that might exist in the X-Ways data record, especially the CaSe of the item being sought.
Once the string is located, the program assumes it is the semi-colon delimited field to extract. It then outputs the first part of the record, up to this metadata field, it then outputs this subset of the metadata field, which is what the user asked for, and finally it outputs the complete metadata field as it was originally in the X-Ways output record.
What we end up with is the searched for field, tab delimited, inserted just BEFORE the original metadata field.
This stand alone tabbed field is now properly formatted so that when the user imports the resulting output file into a spreadsheet, that field is easily identified and processed.
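The steps just described can be sketched in Python as follows. This is a simplified illustration, not the program's source. It assumes the new column holds the field name together with its data, and that a requested item which is missing produces an empty column. The function names extract_item and add_columns are hypothetical.

```python
def extract_item(metadata, field_name):
    """Return 'field_name value' up to the next ';', or '' if not found.

    The search is a plain case sensitive substring search, so a short
    name such as 'Version:' would also match inside 'OS Version:'.
    """
    start = metadata.find(field_name)
    if start < 0:
        return ""
    end = metadata.find(";", start)
    if end < 0:
        end = len(metadata)               # item runs to the end of the field
    return metadata[start:end].strip()


def add_columns(record, field_names):
    """Insert one new tab delimited column per requested field, placed
    just before the metadata field (assumed to be the LAST field)."""
    head, tab, metadata = record.rpartition("\t")
    if not tab:
        return record                     # no tabs: nothing to process
    new_columns = [extract_item(metadata, name) for name in field_names]
    return "\t".join([head] + new_columns + [metadata])


record = ("advance.doc\tdoc\tAuthor: Dan Mares;Version: 2;"
          "Last Printed: 12/25/2011 06:50:23")
print(add_columns(record, ["Last Printed:", "Keywords:"]))
```

The original metadata field is written back untouched as the last field, so nothing from the X-Ways record is lost.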
C:> x-ways_report.exe report_input.html file_containing_metadata_fields_to_look_for (preferred version)
C:> x_ways_meta inputfilename.txt "String_to_search_for:" "Another_string_max_of_10:"
C:> x_ways_meta inputfilename.txt "Last Printed:"
C:> x_ways_meta inputfilename.txt "Last Printed:" 2> CR_error_filename
C:> x-ways_report.exe report_input.html "Last Printed:" "Keywords:"
Notice that all the search strings in the command lines above terminate in a colon (:). This is because in my research, most if not all of the metadata field names within the X-Ways metadata column are identified by a colon terminator. It is not required, but seems to be the standard.
Will attempt to locate the "String(s)_to_search_for" field within the metadata field, and extract it to another tab delimited field within a new output file.
Please note: when using the command line to identify the metadata field(s) you wish to locate, x-ways_meta.exe can search for a maximum of ten metadata strings per run.
For this reason, it is preferred that you use the text file which contains your strings. This makes the list easily modified and reusable.
A sample text file might contain
Sample string(s) file
Company:
Creation:
Modification:
Last Accessed:X
Last Saved:
Last Printed:
Author:
Subject:
Last Saved By:
Version:
Creation Time:X
See the full list of items I have found.
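As an illustration of the two ways of supplying the search strings, here is a short Python sketch. It assumes that a single item naming an existing file is read as the list, one item per line, and that anything else is treated as the strings themselves, capped at ten. The function name load_field_names is made up and this is not the program's own logic.

```python
import os

MAX_COMMAND_LINE_ITEMS = 10   # limit stated for strings given on the command line


def load_field_names(args):
    """args: the command line items that follow the input filename.

    If the only item names an existing file, read one field name per line
    from it (blank lines skipped). Otherwise use the items themselves.
    """
    if len(args) == 1 and os.path.isfile(args[0]):
        with open(args[0], "r", encoding="ascii", errors="replace") as handle:
            return [line.strip() for line in handle if line.strip()]
    return list(args[:MAX_COMMAND_LINE_ITEMS])


print(load_field_names(["Last Printed:", "Keywords:"]))
```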
The redirection 2> to the CR_error_filename, (only used in the "Export List" processing) finds and lists those records in the input file which contain embedded carriage returns in the metadata field, and changes the embedded carriage return to blanks. The result is that the data file can easily and cleanly be imported to the spreadsheet.
There is a way to get the "report html process" version to create separate tagged Metadata: lines for each and every metadata item. It makes the reading of the report a lot cleaner and easier. If you wish to learn how to do this, give me a call and leave a message: 678-427-3275.
None, but there is an unusual way to search for items case insensitively.
The default is to search for the strings as case sensitive,
so you had better get the case correct.
However, if you call the program with ALL UPPERCASE characters
(X-WAYS_META), then the search is done case InsEnsITive.
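The idea of switching behavior on how the program name was typed can be sketched in Python like this. It is a guess at the general technique only, and both function names are hypothetical. ntpath is used so that a Windows style path is handled the same way on any system.

```python
import ntpath


def search_is_case_insensitive(invoked_as):
    """True when the program name was typed in ALL UPPERCASE."""
    name = ntpath.splitext(ntpath.basename(invoked_as))[0]
    return name.isupper()     # ignores '-', '_' and digits


def find_field(metadata, field_name, ignore_case):
    """Return the position of field_name in metadata, or -1 if absent."""
    if ignore_case:
        return metadata.lower().find(field_name.lower())
    return metadata.find(field_name)


ignore = search_is_case_insensitive("X-WAYS_META")
print(find_field("Author: x;last printed: 1", "Last Printed:", ignore))
```

In a real script the value passed in would be the name the program was started with, for example sys.argv[0].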
Below are fields I've found in the metadata column of X-Ways. I have yet to add any email eml fields. The list is long. When using these in the program, if your research confirms what we have here, be sure to include the colon as part of the field name. That is usually the field delimiter. Also, do proper research to determine the case of the field you are searching for. Many programs arbitrarily alter the case. Notice some items below (see Content-type) have two or more versions.
_EmailSubject:
_NewReviewCycle:
action-uri=http:
application-name:
Application:
AppVersion:
assetid:
Attach:
attached to a shape:
Attributes:
Attributes:
Author:
author:
Bit count:
Build identifier:
Build Number:
Build year:
CACHE-CONTROL:
Cache-control:
cache-control:
Canon:
Caption:
Category:
Category:
Channels:
Char Count:
Characters:
CharactersWithSpaces:
CLASSIFICATION:
Cluster:
Code page:
Comment:
Comments:
Company:
Compression:
Consistent:
Contact:
Content-Language:
Content-type:
content-type:
Content-Type:
Copy ID:
Copyright:
Copyrighted:
Copyrighted:
Created-with:
Creation Time:
Creation:
Creator Application:
Creator Host OS:
Creator Version:
CREATOR:
Creator:
CreatorTool
Date digitized:
Date Original:
Date taken:
Date:
Description rdf:
Description:
description:
Detach:
DocSecurity:
DocumentID>adobe:
DocumentID>uuid:
DriveLetter:
Duration:
EmbeddedFile:
End time:
Equipment make:
Expires:
expires:
F number:
falseCreationD:
File format revision:
File history flags:
File name:
File size:
Files:
Finish:
Firmware:
Flags:
Focal length:
Format Tag:
GENERATOR:
Generator:
Height:
Hidden count:
Host Name:
http:
http:
https:
ID List:
Image Number:
IMG:
INAM:
Interpretation:
ISRC:
Item:
Keywords:
Last Accessed:
Last Opened By:
Last Printed:
Last Saved By:
Last Saved:
Last Written:
Last-Modified:
Latitude:
Length:
Linearized:
Lines:
LinksUpToDate:
Local Path:
Locale identifier:
Logger name:
Longitude:
Lowest version:
MAC Address:
Machine:
mailto:
Maker note:
Manager:
Manufacturer:
mode :
Model:
Modification:
Moved to recycle bin:
Network share name:
noquick:
Note count:
Note:
Object ID:
Orientation:
(Original Filename: *** see below)
Originator:
OS Version:
OS:
Owner:
Page count:
Pages:
Paragraphs:
PASSWORD:
pics-label:
Play Duration:
Pragma:
pragma:
Presentation Target:
Producer:
ProgId:
progid:
propID:
PROTECT:
RATING:
REFRESH:
Refresh:
refresh:
Relative Path:
Repair count:
ROBOTS:
Robots:
robots:
Root cell:
Saved State:
ScaleCrop:
searchid:
Security Level:
Sequence:
Serial number:
Service Pack:
Set ID:
SharedDoc:
signed:
Signing date:
Size:
Software:
Source Computer:
SourceModified:
Start time:
Start:
State:
Stream Type:
Subject:
subject:
Target Attributes:
Target Created:
Target File Size:
Target Path:
Template:
theme:
Thumbnail:
Timestamp:
Title:
TotalTime:
Type:
Unique ID:
URL=http:
url=http:
URL=https:
User Comment:
Version:
viewport:
Volume ID:
Volume Name:
(Volume Serial: *** see below)
Volume Type:
Volume:
Wantlive:
Width:
Words:
Work:
Original Filename: use with metadata of $R... files
SPECIAL INSTRUCTIONS: READ CAREFULLY
The "Volume Serial:" number is the serial number given to the disk by Microsoft at
the time of formatting. It is most easily seen when doing a "dir" of the drive. The
response shows up as " Volume Serial Number is 1442-13FE". However Microsoft stores the
volume serial number in the boot record in little-endian fashion at displacement 72 (from
0). So if you are trying to confirm/find the serial number 1442-13FE at displacement 72,
you would actually need to look for: FE134214 (without the dash). The link file internal
record of the serial number is displayed as it is in the DIR command, so when looking
at the raw (boot record) data, you need to convert to little-endian.
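The byte reversal can be checked with a few lines of Python. This is just a small helper sketch for the conversion described above, and the function names are made up for the example.

```python
def serial_to_boot_record_hex(serial):
    """Convert a serial as shown by DIR (e.g. '1442-13FE') to the byte
    order stored in the boot record (little-endian)."""
    raw = bytes.fromhex(serial.replace("-", ""))
    return raw[::-1].hex().upper()


def boot_record_hex_to_serial(stored_hex):
    """Convert the 4 stored bytes (e.g. 'FE134214') back to DIR style."""
    shown = bytes.fromhex(stored_hex)[::-1].hex().upper()
    return shown[:4] + "-" + shown[4:]


print(serial_to_boot_record_hex("1442-13FE"))   # FE134214
print(boot_record_hex_to_serial("FE134214"))    # 1442-13FE
```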
X-WAYS and $R.... MetaData for Original Filename:
X-Ways $R... (recycled files) and obtaining the Original Filename
When X-Ways exports the metadata of the $R files, it produces a "Moved to recycle bin" field like:
Moved to recycle bin: 2015/03/24 22:46:58.0 +0;C:\Users\DAN\Documents\Admin\Filename_whatever.pdf
Notice the actual original filename doesn't have the traditional (colon :) field delimiter or a unique field name before it.
It is merged with the Moved to... item as a single field. The default operation of this program will not be able to parse the
original filename because it is combined with the MOVED date.
In order to allow for correct parsing of the original
filename into a field, we must do the following.
Look at the time offset which was used. In this case it is +0 followed
by a semicolon delimiter. Assuming the entire file has the same +0; offset, we can change the +0; to reflect a correct field
delimiter.
Do the following,
Perform a search and replace with the following parameter (using the offset as the key).
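Find: +0;
Replace with: +0;Original Filename:
This will fix the fields so that now we have:
Moved to recycle bin: 2015/03/24 22:46:58.0 +0;Original Filename:C:\Users\DAN\Documents\Admin\Filename_whatever.pdf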
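If you would rather script the search and replace than do it in a text editor, here is a minimal Python sketch. It assumes every record uses the same offset text, as described above. The function name label_original_filename is invented for this example. Note that it replaces every occurrence of the offset text in the record.

```python
def label_original_filename(record, offset="+0;"):
    """Insert an 'Original Filename:' field name right after the offset."""
    return record.replace(offset, offset + "Original Filename:")


record = (r"Moved to recycle bin: 2015/03/24 22:46:58.0 +0;"
          r"C:\Users\DAN\Documents\Admin\Filename_whatever.pdf")
print(label_original_filename(record))
```

After this change, "Original Filename:" can be given to the program as a search string like any other field name.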
X-WAYS_ID_rename A sister program to take the X-Ways export list data and rename the exported files.
EML_PROCESS A sister program which can easily separate the header fields within eml files.
CSV2PIPE Is capable of removing embedded carriage returns from csv files.