One of the most awesome capabilities of Zeek, apart from its scriptable nature, is network file extraction, a.k.a. file carving. It lets you extract the files that travel over the network. Network protocols like HTTP, SMB, FTP, and SMTP can transfer files, so with Zeek you can extract these files and save them to your storage device.

NOTE: The network traffic MUST NOT be encrypted for Zeek to extract the files over the network.

So what can you achieve with that, you may ask? One simple example is to save the files so that, when a malicious file hash is detected, you already have the malicious file for further investigation without having to go to the specific endpoint to retrieve it. You can also go one step further and automate the procedure by sending the file to a sandbox (public or private) for analysis.

But first things first. How can you do file extraction with Zeek? The simplest solution, as the Zeek documentation mentions ( https://docs.zeek.org/en/master/frameworks/file-analysis.html#file-analysis-framework ), is to add the following line to your local configuration file:

@load policy/frameworks/files/extract-all-files.zeek

With this line, every file that travels over the network and is seen by the Zeek sensor will be extracted to your local drive. By local drive I mean the extract_files folder, which is a subfolder of $ZEEKPATH/spool/ on each Zeek WORKER. So in a multi-node setup, the file is saved on the local drive of the specific node that saw it.
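If the default folder does not suit your setup, the destination can be changed. The lines below are a minimal sketch for local.zeek, assuming that the FileExtract::prefix setting of the extraction analyzer (default ./extract_files/, relative to Zeek's working directory) is what you want to override; the target path is only a hypothetical example.

```zeek
@load base/files/extract

# Hypothetical target directory: pick one that exists on every worker
# and has enough free space.
redef FileExtract::prefix = "/data/zeek/extract_files/";
```

Keep the trailing slash, since scripts that build paths by hand often just append the filename to the prefix.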

But loading extract-all-files doesn’t give you the flexibility to tailor the settings. For example, if you want to extract only Windows executable files, you cannot do that.

To extend your file carving capabilities, you have to write your own module. One simple example is:

@load base/frameworks/files

module CustomFileExtraction;

export {

    option mime_type_analysis : set[string] = {

                "application/x-dosexec",
                "application/pdf",
                "application/msword",
                "application/x-rar",
                "application/x-gzip",
                "application/vnd.openxmlformats-officedocument",
                "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
                "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
                "application/vnd.openxmlformats-officedocument.presentationml.presentation"

                };

} # end of export




event file_sniff(f: fa_file, meta:fa_metadata){

       if ( meta?$mime_type &&  meta$mime_type in mime_type_analysis ){
            Files::add_analyzer(f,Files::ANALYZER_SHA256);
            Files::add_analyzer(f,Files::ANALYZER_EXTRACT);

       }
}

Let’s go through the code above piece by piece and try to understand it.

@load base/frameworks/files

module CustomFileExtraction;

First, we load the File Analysis Framework to give Zeek its file handling capabilities. Strictly speaking we don’t have to, because the framework is already loaded when we start Zeek with zeekctl. Loading it explicitly is considered best practice, though: the script then runs without problems even when it is loaded on its own. Then we set the name of the module, which creates a separate namespace so that we don’t clash with the other Zeek modules.

export {

    option mime_type_analysis : set[string] = {

                "application/x-dosexec",
                "application/pdf",
                "application/msword",
                "application/x-rar",
                "application/x-gzip",
                "application/vnd.openxmlformats-officedocument",
                "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
                "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
                "application/vnd.openxmlformats-officedocument.presentationml.presentation"

                };

} # end of export

With the export keyword, we declare that the variables defined inside the block are accessible from every Zeek module; in other words, they are public. Inside it we use the option keyword to configure the MIME types that we want to extract. Because the option is public, we can change it from our local.zeek configuration file without editing the module file.
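As an illustration, here is a sketch of what that could look like in local.zeek. The load path is hypothetical and depends on where you saved the module, and "application/zip" is just an example of an extra MIME type.

```zeek
# Hypothetical location of the module file shown above.
@load ./custom-file-extraction

# Add one more MIME type to the set without touching the module file.
redef CustomFileExtraction::mime_type_analysis += { "application/zip" };
```

Note that the option has to be referenced with its module prefix, because local.zeek lives outside the CustomFileExtraction namespace.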

event file_sniff(f: fa_file, meta:fa_metadata){

       if ( meta?$mime_type &&  meta$mime_type in mime_type_analysis ){
            Files::add_analyzer(f,Files::ANALYZER_SHA256);
            Files::add_analyzer(f,Files::ANALYZER_EXTRACT);

       }
}

In the last lines, we write a handler for the file_sniff event, which Zeek raises once it has inspected the beginning of a file seen on the network and inferred its metadata (the full file lifecycle events can be found here: https://docs.zeek.org/en/master/frameworks/file-analysis.html#file-lifecycle-events ). In the if statement, we first check whether Zeek has detected the MIME type of the file, and if so, whether that MIME type is in the list of MIME types we want to extract. If both are true, we add an analyzer that calculates the SHA256 hash of the file (this is not mandatory for the extraction) and then a second analyzer that tells Zeek to extract the file to the local disk.
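The results of both analyzers only become available later in the file's lifecycle. A minimal sketch, assuming you simply want to print them: the file_state_remove event fires when Zeek is done with a file, and by then the extracted filename and the hash are normally present in f$info.

```zeek
@load base/frameworks/files
@load base/files/hash
@load base/files/extract

event file_state_remove(f: fa_file)
    {
    # Both fields are optional, so check that they are set before reading them.
    if ( f$info?$extracted && f$info?$sha256 )
        print fmt("%s extracted to %s, sha256=%s", f$id, f$info$extracted, f$info$sha256);
    }
```

The same values also end up in files.log, so the print is only there to show at which point in a script they can be used.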

By default, Zeek extracts files to the extract_files folder, as mentioned above, with the default filename extract-$timestamp-$protocol-$fuid. This differs from how Suricata extracts files. In Suricata’s case, a folder is created with the first 2 characters of the SHA256 hash, and the file is saved inside this folder with the SHA256 hash value as the filename. Hence if two files have the same hash, the second one overwrites the first one, so you don’t end up with duplicate files. Zeek also gives you the ability to do something similar.


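event file_sniff(f: fa_file, meta:fa_metadata){
       if ( meta?$mime_type &&  meta$mime_type in mime_type_analysis ){
            Files::add_analyzer(f,Files::ANALYZER_SHA256);
            Files::add_analyzer(f,Files::ANALYZER_EXTRACT);
       }
}

# f$info$sha256 is not set yet inside file_sniff: the hash is only known once the whole
# file has been seen. So we extract under the default name and rename the file when Zeek
# is done with it. Note that files.log still records the original filename.
event file_state_remove(f: fa_file){
       if ( f$info?$extracted && f$info?$sha256 ){
            rename(build_path_compressed(FileExtract::prefix, f$info$extracted),
                   build_path_compressed(FileExtract::prefix, f$info$sha256));
       }
}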

Finally, I want to mention that Zeek will not delete the extracted files by itself; you have to add the deletion to your own script. If you don’t, the free space on your local drive will eventually run out. The only time that Zeek deletes extracted files is when you stop/start your sensor. You can also get some ideas on how to use file extraction from one of my GitHub projects, which you can find at https://github.com/chrisanag1985/zeek-sandbox.
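One possible way to handle the cleanup inside Zeek is sketched below. It is an illustration, not the approach used in the project above: the module name ExtractCleanup, the retention option, and the delete_extracted event are all hypothetical names introduced for this example. It also assumes the file still has the name stored in f$info$extracted, so it does not fit a script that renames the files afterwards.

```zeek
@load base/frameworks/files
@load base/files/extract

module ExtractCleanup;

export {
    ## Hypothetical option: how long an extracted file stays on disk.
    option retention: interval = 1 hr;

    ## Hypothetical helper event that removes one extracted file.
    global delete_extracted: event(path: string);
}

event ExtractCleanup::delete_extracted(path: string)
    {
    # unlink() returns F when the file could not be removed.
    if ( ! unlink(path) )
        Reporter::warning(fmt("could not delete extracted file %s", path));
    }

event file_state_remove(f: fa_file)
    {
    if ( ! f$info?$extracted )
        return;

    # Build the on-disk path the same way the extraction analyzer does.
    local path = build_path_compressed(FileExtract::prefix, f$info$extracted);

    # Delete the file once the retention period has passed.
    schedule retention { ExtractCleanup::delete_extracted(path) };
    }
```

If another tool (a sandbox uploader, for example) needs the files, make sure the retention period is long enough for it to pick them up, or let that tool do the deletion instead.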

TIP: You can add the line below to your local.zeek:

redef FileExtract::default_limit = 10485760;

This tells Zeek to stop extracting a file once it reaches 10 MB, so larger files are cut off at that size instead of being saved in full. Research by Palo Alto Networks found that most malicious files don’t exceed 10 MB. With this limit you save a lot of space on your local drives and consume fewer resources on your Zeek sensor. Of course, this is something you have to decide based on your network setup and your company’s policies.
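To see which files were affected by the limit, you can handle the file_extraction_limit event, which Zeek raises when an extraction reaches its limit. The handler below is a small sketch that only prints a message; what you do with the information is up to you.

```zeek
@load base/frameworks/files
@load base/files/extract

event file_extraction_limit(f: fa_file, args: Files::AnalyzerArgs, limit: count, len: count)
    {
    # The copy on disk stops at the limit, so it is not the complete file.
    print fmt("file %s reached the extraction limit of %d bytes", f$id, limit);
    }
```

The same information is available in files.log through the extracted_cutoff field.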

You can find more information about this at: https://docs.paloaltonetworks.com/advanced-wildfire/administration/advanced-wildfire-deployment-best-practices/advanced-wildfire-best-practices