I like YARA. Every time I hear its name spoken aloud, it makes me chuckle and think I should start gabbing in German, even though its origins are somewhat farther south and on a different continent: South America, for the curious. It never ceases to amaze me how many sharp people in our industry have not used it or, in some cases, have not even heard of it. YARA is a tool aimed at (but not limited to) helping malware researchers identify and classify malware samples. It has been around for a while and has an active, growing community that supports it. As an open source project written in raw C and provided freely via GitHub, it's tough to beat its price.
Well, that's easy to describe. YARA contains a smorgasbord of pattern-matching capabilities. It can be a sniper, zeroing in on a single target, or a legion of soldiers linking shields and moving across a battlefield. Both are accurate depictions of its ability to detect, either with extreme accuracy or in broad strokes. We used to joke that YARA ate artillery shells and drank napalm, a testament to how powerful it is when it comes to finding things. It's also only as smart as you make it, since the logic comes from the user.
YARA is not just for binaries.
You might still be wondering what it is. On one hand, YARA is a lightweight, flexible tool, usable across just about any operating system. With its source code available, it's easy to tailor or extend to fit a specific use case. YARA is an easy one to fit into a trusted toolset for digital forensics, incident response, or reverse engineering. On the other hand, YARA is your bloodhound. It lives to find, to detect, and to puzzle out twists and turns of logic. Its targets are files, the ones you commonly think of - binaries, documents, drivers, and so on. It also scans the ones you might not think of, like network traffic and data stores. It has been quietly woven into the fabric of a lot of tools, and you might be surprised to learn that your SIEM, triage tool, phishing pipeline, sandbox, or IDS can employ it. It's usually something you find out after the fact, once you learn of YARA's existence.
YARA runs from a command line on both Linux and Windows, which is handy when you are working locally for reverse engineering or incident response. You can bring it online fast by opening it in a terminal and just as easily put it to work by handing it logic and a target. Graphically, it wins no awards and frankly makes no attempt to change that. It's better served by leveraging the numerous Python, Ruby, Go, and other bindings that plug it into something graphical or wrap it in an API.
The logic that forms YARA's brain is just as streamlined and simple. YARA takes input at the terminal, or you can provide it a simple text file of logic. It thinks in patterns that you fashion from rules, and its yin and yang is pure true or false. The rules are sleek. You provide the name, the elements to match, and the pattern to match on. You can create the rule from a target, by sleuthing its insides and building matches, or do the opposite: derive a pattern and find targets that correspond to the logic.
There was a related blog on YARA support in OTX last week.
At its simplest, the elements to match can be something readable in ASCII, Unicode, or hex. Declarative assignment is easy - an element is either there or it's not - and the presence or absence of the element in a target takes on meaning within the logical pattern. YARA also speaks regex, and very intricate patterns can be built as elements to incorporate into the logic. This level of declarative discovery may be all you need, whether it's to craft simple ASCII text, interesting hex strings, or intricate regex. I've pulled out a nice sample to show what it looks like in a rule using some of these elements. In this case, the rule is aimed at any kind of file - it frankly doesn't care what the target is, be it binary, HTML, image, or other formats. The logic in it looks at a couple of simple shellcode possibilities and would be used in a chain with other rules in a rule set.
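A minimal sketch of such a rule might look like the following; the byte pattern and the tag string are illustrative stand-ins, not signatures pulled from a real sample:

rule possible_shellcode_sketch
{
    strings:
        // illustrative: a classic GetPC stub (call $+5 followed by pop eax)
        $getpc = { E8 00 00 00 00 58 }
        // illustrative: an egghunter-style tag that sometimes precedes a payload
        $egg = "w00tw00t" ascii
    condition:
        any of them
}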
As you might infer from the text above, the structure of the rule is straightforward. Don’t let that simplicity fool you. While we showed an example of declarative matching, i.e., it’s present or not, YARA is by no means locked only into that model.
Two other useful techniques are detection by proximity or by container. Proximity is exactly what it sounds like: the logic revolves around defining an element and then interrogating the target to find out whether matching elements exist around it. An example would be defining a hex string of $hex = { 25 45 66 3F 2E } and then looking for other elements around it at offsets of 5, 10, or 15 bytes. For example:
for any i in (1..#hex) : ( $cyber at @hex[i] + 5 or $defenses at @hex[i] + 10 or $alienvault at @hex[i] + 15 )
The logic above, like the previous rule, couldn't care less about its target - it can be anything. Here, $cyber, $defenses, and $alienvault would be defined in the rule's strings section as "cyber", "defenses", and "Alienvault". The rule only cares about finding matches to the logic expressed, in this case in an iterative fashion, starting with the first match to the hex string and ending with the last.
Containers are exactly what they sound like: an element is contained within a bounding box you describe. Here, we might look for our previously defined $hex string, but only within a custom-defined location, such as between two other elements, say $string_start and $string_finish. The logic would be $hex in (@string_start..@string_finish). Or not in that range, such as $hex not in (@string_start..@string_finish).
Figure 1 YARA Bounding Box example
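A complete containment rule built from those pieces could look roughly like this; the $string_start and $string_finish markers are hypothetical placeholders for whatever boundaries make sense in your target:

rule bounded_hex_sketch
{
    strings:
        $hex = { 25 45 66 3F 2E }
        // hypothetical boundary markers defining the container
        $string_start = "BEGIN_BLOCK"
        $string_finish = "END_BLOCK"
    condition:
        $string_start and $string_finish and
        $hex in (@string_start..@string_finish)
}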
Some other useful techniques are counting, location, and procession, or the order in which elements appear. Counting is what it sounds like: it leverages the count of some element as part of the logic, and that count can be equal, not equal, greater than, less than, and so on. Location uses where the element appears in the file as a means of detection. It's like the previously mentioned proximity and containment techniques, except it aligns to the file itself instead of a custom container. Procession is the order of the elements, and the elements searched might be text, such as:
$a = "cyberdefenses"
$b = "SIEM"
$c = "Alienvault"
The ordering is expressed with offsets, as in asking for a pattern where @a < @b and @b < @c, or any other combination, such as @b < @a and @a < @c, and so forth.
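Pulling counting, location, and procession together into one sketch, using the strings above (the count threshold and the 1 KB location window are arbitrary choices for illustration):

rule count_location_order_sketch
{
    strings:
        $a = "cyberdefenses"
        $b = "SIEM"
        $c = "Alienvault"
    condition:
        #a >= 2 and                          // counting: at least two occurrences of $a
        $b in (0..1024) and                  // location: $b appears within the first 1 KB of the file
        @a[1] < @b[1] and @b[1] < @c[1]      // procession: first $a before first $b before first $c
}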
There are plenty of other techniques to discuss, but I hope these give you some insight into how applicable its logic can be against more complex puzzles. YARA is very extensible as well, and its supportive community has expanded its capability with modules. One very commonly used module is the portable executable (PE) module. It eases logic by providing predefined elements for PE files and simplifies logic calls by handling some parsing automatically. The Math module is another, and it opens up a ton of handy functions. Plenty more exist, and you can find them in the YARA documentation. They tend to provide new functions that can be leveraged, or predefined attributes that ease the burden of detection.
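As a small illustration of what modules buy you, here is a sketch that leans on the PE and Math modules; the section count and entropy threshold are arbitrary values chosen only to show the predefined attributes and functions:

import "pe"
import "math"

rule packed_pe_sketch
{
    condition:
        // PE module: structural attributes come predefined, no manual parsing
        pe.number_of_sections < 3 and
        // Math module: high overall entropy often hints at packing or encryption
        math.entropy(0, filesize) > 7.0
}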
While we tend to focus on rules individually, they are meant to be used in sets, and a rule set might contain 1, 10, 1,000, or even more rules strung along in a sequence. It's when you understand the concept of leveraging sets that you truly start harnessing YARA's power. Rules are read from top to bottom in a rule set. Each resolves completely before the next is evaluated, so you can incorporate the results of an earlier rule into one that follows - and not just one: any number of resolved rules can be repurposed into the conditional logic of a rule that follows. Below is an example where a rule was written to discover portable executable (PE) files with a specific import hash value, a specific section containing the entry point, and then specific strings. Note the use of import "pe", which tells YARA that we are using the PE module, and how the later rules drop the strings section and use only a condition.
import "pe"

rule interesting_strings_1
{
    strings:
        $ = { 98 05 00 00 06 }
        $ = "360saf" nocase
        $ = "linkbl.gif"
        $ = "mailadword"
    condition:
        any of them
}

rule par_import
{
    condition:
        interesting_strings_1 and pe.imphash() == "87421be9519ab6eb9bdd8d2f318ff35f"
}

rule Poss_polymorphic_malware
{
    condition:
        par_import and
        (pe.sections[pe.section_index(pe.entry_point)].name contains "" or
         pe.sections[pe.section_index(pe.entry_point)].name contains "p")
}
As I'm hinting, rule sets mean you can reuse logic and follow the principle of "write once, use often." They also mean you can form a chain of inheritance, where rules inherit the results of another and apply them in their own logic. It also means modular construction: since YARA supports including other rule files, you can abstract your logic into multiple rule sets and pull them in on an as-needed basis. When you're juggling large volumes of rules in rule sets, that becomes an invaluable management and quality assurance tool.
YARA seems simple, and it is, but it is very versatile in application. I could expound all day on its capability, but it seems unfair to do so without touching on how it's employed.
Perhaps the simplest use case to describe is its role in the reverse engineering world. If you reverse engineer malware and don't leverage it, you are missing out on a fast win to speed up your process. Matching a file by its attributes, classifying groups of files into families, identifying algorithms, finding code caves, spotting code stomping, and more are all easy applications.
Incident response? No problem. At some point, you start parsing files to understand how they align to the event that spawned the response. That's when YARA comes into play, either to play the same role it would with any malware that might be present or to quickly search for and find elements of interest.
If you gather file intelligence of any kind or maintain a lab that interrogates files of interest, then YARA can be a chief workhorse in the process. It can detect and identify by any attribute of a file, including those left by the compiler, the composer, or the cracker. With the right logic, like we previously discussed, the structure, as well as the containment and order of elements in a file, becomes valid intelligence to be harvested.
The previous examples are fairly standalone, but YARA also shines as a support and follow-on tool. Do you send files to a sandbox? If so, it can enrich the outcome and the understanding gained from detonating the file there. The same applies if you use it in your email filter to triage phishing, or in your SIEM, which, speaking of, AlienVault supports.
In short, YARA is versatile, powerful, and available. Its learning curve is gentle and its application is broad. In a world where your foe hides in plain sight and around the corner, it has insane detection capability to cast a light on the suspicious, the malicious, or the just plain interesting. If it hasn't found a home in your toolkit, it's time to step up and make it happen. If you need a hand exploring its capability, we can show you how. Lastly, you should always demand the best.
Tcpdump is a command line utility that allows you to capture and analyze network traffic going through your system. It is often used to help troubleshoot network issues, as well as a security tool.
A powerful and versatile tool that includes many options and filters, tcpdump can be used in a variety of cases. Since it's a command-line tool, it is ideal to run on remote servers or devices for which a GUI is not available, to collect data that can be analyzed later. It can also be launched in the background or as a scheduled job using tools like cron.
In this article, we'll look at some of tcpdump's most common features.
Tcpdump is included with several Linux distributions, so chances are, you already have it installed. Check whether tcpdump is installed on your system with the following command:
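One simple way to check (assuming the which utility is available on your system) is:

$ which tcpdump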
If tcpdump is not installed, you can install it by using your distribution's package manager. For example, on CentOS or Red Hat Enterprise Linux, like this:
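A typical installation command on those systems (the package manager may be yum on older releases) is:

$ sudo dnf install -y tcpdump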
Tcpdump requires libpcap, which is a library for network packet capture. If it's not installed, it will be automatically added as a dependency.
You're ready to start capturing some packets.
To capture packets for troubleshooting or analysis, tcpdump requires elevated permissions, so in the following examples most commands are prefixed with sudo.
To begin, use the command tcpdump --list-interfaces (or -D for short) to see which interfaces are available for capture:
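In other words (the list of interfaces will vary from machine to machine):

$ sudo tcpdump --list-interfaces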
In the example above, you can see all the interfaces available on my machine. The special interface any allows capturing on any active interface.
Let's use it to start capturing some packets. Capture all packets on any interface by running this command:
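A plausible invocation, using the any pseudo-interface mentioned above:

$ sudo tcpdump -i any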
Tcpdump continues to capture packets until it receives an interrupt signal. You can interrupt capturing by pressing Ctrl+C. As you can see in this example, tcpdump captured more than 9,000 packets. In this case, since I am connected to this server using ssh, tcpdump captured all these packets. To limit the number of packets captured and stop tcpdump, use the -c (for count) option:
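A command along these lines stops after five packets:

$ sudo tcpdump -i any -c5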
In this case, tcpdump stopped capturing automatically after capturing five packets. This is useful in different scenarios - for instance, if you're troubleshooting connectivity and capturing a few initial packets is enough. This is even more useful when we apply filters to capture specific packets (shown below).
By default, tcpdump resolves IP addresses and ports into names, as shown in the previous example. When troubleshooting network issues, it is often easier to use the IP addresses and port numbers; disable name resolution by using the option -n and port resolution with -nn:
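Adding the flag to the previous capture would look something like:

$ sudo tcpdump -i any -c5 -nn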
As shown above, the capture output now displays the IP addresses and port numbers. This also prevents tcpdump from issuing DNS lookups, which helps to lower network traffic while troubleshooting network issues.
Now that you're able to capture network packets, let's explore what this output means.
Tcpdump is capable of capturing and decoding many different protocols, such as TCP, UDP, ICMP, and many more. While we can't cover all of them here, to help you get started, let's explore the TCP packet. You can find more details about the different protocol formats in tcpdump's manual pages. A typical TCP packet captured by tcpdump looks like this:
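Here is a reconstructed example using the field values discussed below (the TCP options field is abbreviated):

08:41:13.729687 IP 192.168.64.28.22 > 192.168.64.1.41916: Flags [P.], seq 196:568, ack 1, win 309, options [...], length 372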
The fields may vary depending on the type of packet being sent, but this is the general format.
The first field, 08:41:13.729687, represents the timestamp of the received packet as per the local clock.
Next, IP represents the network layer protocol, in this case IPv4. For IPv6 packets, the value is IP6.
The next field, 192.168.64.28.22, is the source IP address and port. This is followed by the destination IP address and port, represented by 192.168.64.1.41916.
After the source and destination, you can find the TCP flags, Flags [P.]. Typical values for this field include:

Value   Flag    Description
S       SYN     Connection start
F       FIN     Connection finish
P       PUSH    Data push
R       RST     Connection reset
.       ACK     Acknowledgment

This field can also be a combination of these values, such as [S.] for a SYN-ACK packet.
Next is the sequence number of the data contained in the packet. For the first packet captured, this is an absolute number. Subsequent packets use a relative number to make it easier to follow. In this example, the sequence is seq 196:568, which means this packet contains bytes 196 to 568 of this flow.
This is followed by the ack number: ack 1. In this case, it is 1 since this is the side sending data. For the side receiving data, this field represents the next expected byte (data) on this flow. For example, the ack number for the next packet in this flow would be 568.
The next field is the window size, win 309, which represents the number of bytes available in the receiving buffer, followed by TCP options such as the MSS (maximum segment size) or window scale. For details about TCP protocol options, consult Transmission Control Protocol (TCP) Parameters.
Finally, we have the packet length, length 372, which represents the length, in bytes, of the payload data. The length is the difference between the last and first bytes in the sequence number.
Now let's learn how to filter packets to narrow down results and make it easier to troubleshoot specific issues.
As mentioned above, tcpdump can capture too many packets, some of which are not even related to the issue you're troubleshooting. For example, if you're troubleshooting a connectivity issue with a web server, you're not interested in the SSH traffic, so removing the SSH packets from the output makes it easier to work on the real issue.
One of tcpdump's most powerful features is its ability to filter the captured packets using a variety of parameters, such as source and destination IP addresses, ports, protocols, etc. Let's look at some of the most common ones.
To filter packets based on protocol, specify the protocol on the command line. For example, capture ICMP packets only by using this command:
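Something along these lines would do it:

$ sudo tcpdump -i any -c5 icmp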
In a different terminal, try to ping another machine:
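For instance (any reachable host will do; opensource.com matches the name resolved below):

$ ping opensource.com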
Back in the tcpdump capture, notice that tcpdump captures and displays only the ICMP-related packets. In this case, tcpdump is not displaying the name resolution packets that were generated when resolving the name opensource.com.
Limit capture to only packets related to a specific host by using the host filter:
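For example, using the address shown below:

$ sudo tcpdump -i any -c5 -nn host 54.204.39.132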
In this example, tcpdump captures and displays only packets to and from host 54.204.39.132.
To filter packets based on the desired service or port, use the port filter. For example, capture packets related to a web (HTTP) service by using this command:
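A plausible form, reusing the options from the earlier examples:

$ sudo tcpdump -i any -c5 -nn port 80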
You can also filter packets based on the source or destination IP address or hostname. For example, to capture packets from host 192.168.122.98:
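One plausible invocation uses the src filter:

$ sudo tcpdump -i any -c5 -nn src 192.168.122.98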
Notice that tcpdump captured packets with source IP address 192.168.122.98 for multiple services such as name resolution (port 53) and HTTP (port 80). The response packets are not displayed since their source IP is different.
Conversely, you can use the dst filter to filter by destination IP/hostname:
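A sketch along the same lines:

$ sudo tcpdump -i any -c5 -nn dst 192.168.122.98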
You can also combine filters by using the logical operators "and" and "or" to create more complex expressions. For example, to filter packets from source IP address 192.168.122.98 and service HTTP only, use this command:
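Something like the following would work:

$ sudo tcpdump -i any -c5 -nn src 192.168.122.98 and port 80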
You can create more complex expressions by grouping filters with parentheses. In this case, enclose the entire filter expression in quotation marks to prevent the shell from confusing them with shell expressions:
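A plausible grouped expression, using the addresses discussed below:

$ sudo tcpdump -i any -c5 -nn "port 80 and (src 192.168.122.98 or src 54.204.39.132)"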
In this example, we're filtering packets for the HTTP service only (port 80) and source IP addresses 192.168.122.98 or 54.204.39.132. This is a quick way of examining both sides of the same flow.
In the previous examples, we're checking only the packets' headers for information such as source, destination, ports, etc. Sometimes this is all we need to troubleshoot network connectivity issues. Sometimes, however, we need to inspect the content of the packet to ensure that the message we're sending contains what we need or that we received the expected response. To see the packet content, tcpdump provides two additional flags: -X to print the content in hex and ASCII, or -A to print the content in ASCII.
For example, inspect the HTTP content of a web request like this:
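For instance, something along these lines (here using -A for ASCII output on the HTTP port):

$ sudo tcpdump -i any -c10 -nn -A port 80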
This is helpful for troubleshooting issues with API calls, assuming the calls are using plain HTTP. For encrypted connections, this output is less useful.
Another useful feature provided by tcpdump is the ability to save the capture to a file so you can analyze the results later. This allows you to capture packets in batch mode overnight, for example, and verify the results in the morning. It also helps when there are too many packets to analyze since real-time capture can occur too fast.
To save packets to a file instead of displaying them on screen, use the option -w (for write):
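A plausible invocation, writing ten HTTP packets to the file discussed below:

$ sudo tcpdump -i any -c10 -nn -w webserver.pcap port 80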
This command saves the output in a file named webserver.pcap. The .pcap extension stands for "packet capture" and is the convention for this file format.
As shown in this example, nothing gets displayed on screen, and the capture finishes after capturing 10 packets, as per the option -c10. If you want some feedback to ensure packets are being captured, use the option -v.
Tcpdump creates a file in binary format, so you cannot simply open it with a text editor. To read the contents of the file, execute tcpdump with the -r (for read) option:
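For example, reading back the file created above:

$ tcpdump -nn -r webserver.pcap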
Since you're no longer capturing the packets directly from the network interface, sudo is not required to read the file.
You can also use any of the filters we've discussed to filter the content from the file, just as you would with real-time data. For example, inspect the packets in the capture file from source IP address 54.204.39.132 by executing this command:
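Something along these lines:

$ tcpdump -nn -r webserver.pcap src 54.204.39.132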
These basic features of tcpdump will help you get started with this powerful and versatile tool. To learn more, consult the tcpdump website and man pages.
The tcpdump command line interface provides great flexibility for capturing and analyzing network traffic. If you need a graphical tool to understand more complex flows, look at Wireshark.
One benefit of Wireshark is that it can read .pcap files captured by tcpdump. You can use tcpdump to capture packets on a remote machine that does not have a GUI and analyze the resulting file with Wireshark, but that is a topic for another day.
This article was originally published in October 2018 and has been updated by Seth Kenlon.