II. Project Nomads NPK
Welcome to the second episode of "Figuring out File Formats". In this lesson we will take a look at another archive format, in this case Radon Labs' NPK format as used in Project Nomads.
Again the main part of our work is looking at the file in a hex editor:
The first thing we can see is that the file seems to use 4-byte ASCII codes as some sort of identifier. The first one is 0KPN, which is surely supposed to be read backwards as NPK0, so it matches the file extension. This tells us that very likely all of these identifiers are supposed to be read backwards.
So let's see which we can find:
- "0KPN" -> "NPK0"
- "_RID" -> "DIR_"
- "ELIF" -> "FILE"
- "DNED" -> "DEND"
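The reversal above is simple enough to sketch in a few lines of Python. The helper name `read_tag` is made up for this example and is not part of any NPK tooling:

```python
def read_tag(raw: bytes) -> str:
    """Return a 4-byte identifier in readable order, e.g. b"0KPN" -> "NPK0"."""
    if len(raw) != 4:
        raise ValueError("an identifier is exactly 4 bytes long")
    return raw[::-1].decode("ascii")


# The four identifiers found in the hex dump
for raw in (b"0KPN", b"_RID", b"ELIF", b"DNED"):
    print(raw.decode("ascii"), "->", read_tag(raw))
```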
The number of bytes between the identifiers seems to vary, so we'll try to find out whether it is stored somewhere in each of the blocks started by an identifier. A common practice in file formats is to store the length of a block directly behind its beginning. This value sometimes includes the header size, sometimes it doesn't.
So now we will check if the number directly behind an ELIF or _RID is either the full length of the block, the length of the block minus 4 (if the length includes itself, but not the ELIF/_RID) or the length minus 8 (if it excludes ELIF/_RID and the length itself).
In this block, we see the value is 12 00 00 00. The bytes are stored lowest first, so this means 00000012, or simply 12 in hex. Now let's check the length of the entire block:
It turns out to be 1A, which is exactly 12 + 8 in hex. Now we'll try the same with other blocks and find that it always works, even for DNED with length 0 and 0KPN with length 4.
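As a quick check of that arithmetic, here is a minimal Python sketch that reads the length field as a little-endian 32-bit number and adds the 8 header bytes. `block_size` and `HEADER_SIZE` are names invented for this example, and the identifier in the sample bytes is only a placeholder:

```python
import struct

HEADER_SIZE = 8  # 4-byte identifier + 4-byte length field


def block_size(block_start: bytes) -> int:
    """Total size of a block, given at least its first 8 bytes."""
    # "<I" = little-endian unsigned 32-bit integer, read at byte offset 4
    (payload_len,) = struct.unpack_from("<I", block_start, 4)
    return HEADER_SIZE + payload_len


# Placeholder identifier followed by the length bytes 12 00 00 00 from the text
header = b"ELIF" + bytes.fromhex("12000000")
print(hex(block_size(header)))  # 0x1a
```

The same function gives 8 for a DNED block, because its stored length is 0.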
Now we know how to separate the blocks from each other, but we still don't know how many there are, so let's check what our first block 0KPN holds within its 4-byte body:
We find the number 09 FC 00 00 or FC09, which is 64521 in base 10. This looks a little bit large for a number of files or anything similar, so we'll assume it's something like the length of our block list. To check this, we jump to offset FC09 in our hex editor:
We can see that at exactly offset FC09 a new type of block starts that identifies as "ATAD", or "DATA" backwards. That descriptive name and the fact that it is followed by more raw-looking data tell us that this is where the actual file data is stored. We now have to split it up using information that we can gather from our long list of blocks above. This means that we have to look deeper into the details of the "DIR_", "FILE" and "DEND" blocks.
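The same check can be done in code. The following Python sketch reads the first block and confirms that an ATAD identifier sits at the stored offset. It assumes, as in the walkthrough, that the offset counts from the start of the file; `find_data_block` is a name made up for this example:

```python
import struct


def find_data_block(npk: bytes) -> int:
    """Return the offset of the ATAD block announced by the first block."""
    # identifier (4 bytes), length (4 bytes), payload = DATA offset (4 bytes)
    tag, length, data_offset = struct.unpack_from("<4sII", npk, 0)
    if tag != b"0KPN" or length != 4:
        raise ValueError("file does not start with an NPK0 block of length 4")
    # Assumption taken from the text: the offset is relative to the file start
    if npk[data_offset:data_offset + 4] != b"ATAD":
        raise ValueError("no DATA block at the stored offset")
    return data_offset


# The bytes 09 FC 00 00 from the text, read lowest byte first
print(int.from_bytes(bytes.fromhex("09FC0000"), "little"))  # 64521
```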
The names already imply certain meanings: DIR_ might be directory and FILE carries information about files. DEND isn't too obvious in the beginning and we'll try to figure it out later.
We already know that a block consists of an identifier and a length, so let's see what else can be found in a DIR_ block:
The payload of this block seems to have two parts: first 2 bytes that look like a regular number, then a string that looks like a file name. Since we are pretty sure it has something to do with a directory, we can conclude that this string is a directory name. We'll have to accept that these developers decided to have folders with file extensions (".n" in this case). By simply checking the length of the string and comparing it to the two bytes before it, we find that these two bytes represent the length of the string. By now, matching lengths to their representation in raw hex should slowly become routine.
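A DIR_ payload can be decoded with a short Python sketch like the one below. It assumes the names are plain ASCII, which the screenshots suggest but do not prove. The function name and the sample directory name "example.n" are invented for illustration:

```python
import struct


def parse_dir_payload(payload: bytes) -> str:
    """Decode a DIR_ payload: 2-byte name length, then the name itself."""
    (name_len,) = struct.unpack_from("<H", payload, 0)  # little-endian 16-bit
    name = payload[2:2 + name_len]
    if len(name) != name_len:
        raise ValueError("payload is shorter than the stored name length")
    return name.decode("ascii")  # assumption: names are ASCII


# Made-up payload: length 9, then the 9 characters of the name
sample = struct.pack("<H", 9) + b"example.n"
print(parse_dir_payload(sample))  # example.n
```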
Next we dive into the FILE block:
This block again contains a file name at the end that has its length as a 2-byte word before it.
The remaining 8 bytes for this particular block are "F4 16 00 00 C4 35 02 00".
We can see that these seem to represent the two 4-byte words "F4 16 00 00" and "C4 35 02 00" (or 16F4 and 235C4 as numbers).
As I already mentioned in the first episode, there are typically two pieces of information (besides the name) that you need to refer to a file: its size and its location inside the archive.
To find out which is which, we'll use a little trick: The FILE entry that we are currently looking at is the second in the file, so we'll compare it to the first one and see what values it has:
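These values are 0 and again 16F4. A size of 0 would be weird for a file, so we assume the first word is an offset inside the DATA section and the second word is the file's length. This fits the second entry as well: the first file starts at 0 and is 16F4 bytes long, so the second file begins directly behind it at 16F4.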
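The reasoning about the two words can be replayed with a small Python sketch. The offsets and sizes are the ones quoted in the text. The file names "a.bin" and "b.bin" and the function name are made up, since the real names only appear in the screenshots:

```python
import struct


def parse_file_payload(payload: bytes):
    """Decode a FILE payload into (offset, size, name)."""
    # two little-endian 32-bit words, then a 16-bit name length (10 bytes total)
    offset, size, name_len = struct.unpack_from("<IIH", payload, 0)
    name = payload[10:10 + name_len].decode("ascii")  # assumption: ASCII names
    return offset, size, name


# First and second FILE entries with the values from the text
first = struct.pack("<IIH", 0, 0x16F4, 5) + b"a.bin"
second = bytes.fromhex("F4160000C4350200") + struct.pack("<H", 5) + b"b.bin"

off1, size1, _ = parse_file_payload(first)
off2, size2, _ = parse_file_payload(second)
print(hex(off2), hex(size2))   # 0x16f4 0x235c4
print(off1 + size1 == off2)    # True: the second file follows the first
```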
Finally we have to understand how all these blocks work together. First we have DIR_ entries. It would make sense if each of these enters a directory. FILE entries seem to indicate that a file has to be put inside the current directory. Following this logic, DEND must be Directory End, so it tells us to leave the current directory and go one step back up in our folder hierarchy.
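This enter/leave logic maps naturally onto a stack. Below is a minimal Python sketch that turns an already decoded list of blocks into full paths. The `(kind, name)` tuples, the function name and the sample names are assumptions made for this example:

```python
def build_paths(entries):
    """Turn a flat list of (kind, name) blocks into full file paths."""
    stack = []   # directories we are currently inside, outermost first
    paths = []
    for kind, name in entries:
        if kind == "DIR_":
            stack.append(name)      # enter a directory
        elif kind == "DEND":
            stack.pop()             # leave the current directory
        elif kind == "FILE":
            paths.append("/".join(stack + [name]))
        else:
            raise ValueError(f"unexpected block type: {kind}")
    return paths


# Made-up block list with one nested directory
entries = [
    ("DIR_", "root.n"),
    ("FILE", "a.txt"),
    ("DIR_", "sub"),
    ("FILE", "b.txt"),
    ("DEND", None),
    ("FILE", "c.txt"),
    ("DEND", None),
]
print(build_paths(entries))
# ['root.n/a.txt', 'root.n/sub/b.txt', 'root.n/c.txt']
```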
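This leaves us with the following structure for our NPK file (every stored block length counts only the bytes that follow the 8-byte identifier-and-length header):
- 0KPN [4byte length of block (=4)] [4byte DATA offset]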
- [List of blocks of the following types that ends at DATA offset:
- _RID [4byte length of block] [2byte length of name] [directory name]
- ELIF [4byte length of block] [4byte offset inside DATA] [4byte length of file] [2byte length of name] [file name]
- DNED [4byte length of block (=0)]]
- ATAD [4byte length of block] [raw data]
Coding all this in a program and extracting the files results in a nice folder structure filled with all sorts of files.
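Such a program could look roughly like the Python sketch below. It is not the original extractor, only one possible reading of the structure above. In particular it assumes that the DATA offset counts from the start of the file, that file offsets count from the first byte after the ATAD header, and that names are ASCII. `extract_npk` is a made-up name:

```python
import os
import struct


def extract_npk(npk: bytes, out_dir: str) -> list:
    """Write every file of an NPK archive below out_dir, return the paths."""
    tag, length, data_offset = struct.unpack_from("<4sII", npk, 0)
    if tag != b"0KPN":
        raise ValueError("missing NPK0 header")
    # Assumption: file offsets are relative to the end of the ATAD header
    data_start = data_offset + 8
    pos = 8 + length              # first block after the NPK0 block
    stack, written = [], []
    while pos < data_offset:      # the block list ends where DATA begins
        tag, length = struct.unpack_from("<4sI", npk, pos)
        payload = npk[pos + 8:pos + 8 + length]
        if tag == b"_RID":        # enter a directory
            (n,) = struct.unpack_from("<H", payload, 0)
            stack.append(payload[2:2 + n].decode("ascii"))
        elif tag == b"DNED":      # leave the current directory
            stack.pop()
        elif tag == b"ELIF":      # a file inside the current directory
            offset, size, n = struct.unpack_from("<IIH", payload, 0)
            name = payload[10:10 + n].decode("ascii")
            folder = os.path.join(out_dir, *stack)
            os.makedirs(folder, exist_ok=True)
            path = os.path.join(folder, name)
            start = data_start + offset
            with open(path, "wb") as f:
                f.write(npk[start:start + size])
            written.append(path)
        else:
            raise ValueError(f"unknown block {tag!r} at offset {pos:#x}")
        pos += 8 + length
    return written
```

Calling it with the bytes of an archive and a target folder creates the directories on the way. The sketch trusts the names stored in the archive, so it should only be pointed at files you trust.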