Cyber threat hunting is a proactive approach to finding malicious and suspicious activity in networks and on endpoint devices. It includes detecting and classifying malware on target systems. Malware detection techniques can be classified into the following categories:
- Signature-based detection
- Behavior-based detection
- Anomaly-based detection
- Statistical-based detection
Among these categories, the signature-based approach is the most common and a relatively fast detection technique. Signature-based malware detection requires some prior information about the malicious code to be detected. This information serves as reference data that is compared with the target data, and a successful match means the code belongs to a malware family previously identified by security experts. Yara rules apply this signature-based approach to identifying and classifying malware. The available information about known malware families is expressed in Yara rules as textual or binary patterns, and the rules are then passed to the Yara tool to find the target malware. In this article, we will discuss Yara rules by explaining their basic structure and the syntax for writing them.
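To make the idea of signature matching concrete, the following minimal Python sketch checks whether known byte patterns occur in some target data. It only illustrates the general approach and is not how Yara is implemented; the signature table, the family names DummyStealer and DummyDropper, and the classify function are all hypothetical.

```python
from __future__ import annotations

# Hypothetical signature database: malware family -> byte patterns that
# analysts extracted from known samples (values made up for illustration).
SIGNATURES: dict[str, list[bytes]] = {
    "DummyStealer": [b"server.login(email, password)"],
    "DummyDropper": [bytes.fromhex("AA 1F E8 AB FB E2 CD F8 AC FF")],
}


def classify(data: bytes) -> list[str]:
    """Return every family whose signature patterns all occur in data."""
    return [
        family
        for family, patterns in SIGNATURES.items()
        if all(pattern in data for pattern in patterns)
    ]


sample = b"import smtplib\nserver.login(email, password)\n"
print(classify(sample))          # ['DummyStealer']
print(classify(b"hello world"))  # []
```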
Yara Rules Structure
The syntax of Yara rules resembles that of the C programming language. The following screenshot shows the basic structure of a Yara rule.

Each Yara rule starts with the keyword rule followed by a rule name (identifier), learn_yara_with_hackingloops in this case, and a pair of curly brackets. The body inside the curly brackets contains the following three sections.
- Meta (optional)
- Strings (optional, but present in most rules)
- Condition (mandatory)
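To show the overall layout in text form (the rule keyword, the name, and the meta, strings, and condition sections inside curly brackets), the following Python sketch assembles a rule from its sections. The build_rule helper is hypothetical: it only insists on a non-empty condition and does not check Yara syntax, which only the Yara tool itself can do.

```python
from __future__ import annotations


def build_rule(name: str, condition: str,
               strings: list[str] | None = None,
               meta: list[str] | None = None) -> str:
    """Assemble Yara rule text from its sections (hypothetical helper).

    meta and strings are optional; each list entry is one line that is
    already written in Yara syntax. The condition is always required.
    """
    if not condition.strip():
        raise ValueError("a Yara rule needs a condition")
    lines = [f"rule {name}", "{"]
    for header, entries in (("meta", meta), ("strings", strings)):
        if entries:  # skip sections that were not supplied
            lines.append(f"    {header}:")
            lines.extend(f"        {entry}" for entry in entries)
    lines += ["    condition:", f"        {condition}", "}"]
    return "\n".join(lines)


print(build_rule(
    "learn_yara_with_hackingloops",
    condition="$a",
    strings=['$a = "server.login(email, password)"'],
    meta=['author = "Hackingloops"'],
))
```

A rule whose condition does not reference any string, such as filesize < 100, can be built here without a strings section.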
1) Meta Information
Meta is an optional section of a Yara rule. It holds descriptive information about the rule; these values are informational and do not affect matching. Typical meta parameters are a description, the author, the date, and a threat level. The following example demonstrates the format for adding these parameters to the meta section.
Meta Example:
meta:
    description = "dummy malware classification by hackingloops.com"
    author = "Hackingloops"
    date = "May 01, 2022"
    threat_level = 7
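Yara meta values can be text strings, integers, or the Boolean values true and false. As a hedged illustration, the following Python sketch renders a dictionary into meta lines using those three types; the meta_lines function is hypothetical and raises TypeError for any other Python type.

```python
from __future__ import annotations


def meta_lines(meta: dict[str, str | int | bool]) -> list[str]:
    """Render a dict as Yara meta lines (hypothetical helper)."""
    lines = []
    for key, value in meta.items():
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            text = "true" if value else "false"
        elif isinstance(value, int):
            text = str(value)
        elif isinstance(value, str):
            # escape backslashes and double quotes inside the quoted text
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            text = f'"{escaped}"'
        else:
            raise TypeError(f"{key}: only str, int and bool are handled here")
        lines.append(f"{key} = {text}")
    return lines


for line in meta_lines({
    "description": "dummy malware classification by hackingloops.com",
    "author": "Hackingloops",
    "date": "May 01, 2022",
    "threat_level": 7,
}):
    print(line)
```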

2) Yara Strings
A malware sample may contain text, hexadecimal byte sequences, and other distinctive values as part of its malicious code. This information (collectively known as strings) is given to a Yara rule in the strings section, in the form of text strings, hexadecimal strings, or regular expressions. Each string is assigned an identifier that starts with the $ sign. You can define as many strings as needed to improve the identification and classification capabilities of a Yara rule.
Strings Example:
strings:
    $a = "server.login(email, password)"
    $b = { AA 1F E8 AB FB E2 CD F8 AC FF }
    $c = /f = open\('c:\\output\.txt', 'w'\)/

The above example includes three strings: a text string, a hexadecimal string, and a regular expression. Text strings are enclosed in double quotes (" "), hexadecimal strings in curly brackets ({ }), and regular expressions between forward slashes (/ /). The strings section may contain one or more strings of each type.
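The following Python sketch shows what each kind of string means in terms of raw bytes, using Python's bytes search and re module as stand-ins for Yara's matching engine. The target data is made up, and Python's regular-expression dialect is not identical to Yara's, although this particular pattern is written the same way in both.

```python
import re

# Hypothetical target data that contains all three strings from the example.
data = (
    b"...server.login(email, password)..."
    + bytes.fromhex("AA 1F E8 AB FB E2 CD F8 AC FF")
    + b"...f = open('c:\\output.txt', 'w')..."
)

# $a: a text string is an exact byte sequence.
a_found = b"server.login(email, password)" in data
# $b: a hex string is also an exact byte sequence, written in hexadecimal.
b_found = bytes.fromhex("AA 1F E8 AB FB E2 CD F8 AC FF") in data
# $c: a regular expression; \( \) \. and \\ match literal characters.
c_found = re.search(rb"f = open\('c:\\output\.txt', 'w'\)", data) is not None

print(a_found, b_found, c_found)  # True True True
```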
Flexible Hexadecimal Strings: The above example is useful when we know the complete hexadecimal string. In some circumstances, however, we do not know the exact hexadecimal string of a malware sample. For such situations, Yara rules offer three flexible hexadecimal features: wildcards, jumps, and alternatives. Suppose a malware byte sequence contains some unknown values. We can still describe it in a Yara rule using wildcards, as shown in the following example.
$a = { AA 1F E8 ?? F8 A? FF }
In this example, ?? matches any byte, and A? matches any byte whose first hexadecimal digit is A (0xA0 to 0xAF); the question mark is a placeholder for an unknown hexadecimal digit. Similarly, we can use a jump in a hexadecimal string, as depicted in the following example.
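One way to understand wildcards is to translate a hex pattern into an equivalent regular expression over bytes. The following Python sketch does this for fixed bytes, ?? and the A? form used above; the token_to_regex and compile_hex names are hypothetical, and the sketch raises ValueError on any other token.

```python
from __future__ import annotations

import re


def token_to_regex(token: str) -> bytes:
    """Translate one hex token (AA, ?? or A?) into a bytes-regex fragment."""
    if token == "??":
        return b"."  # any byte (re.DOTALL below lets '.' match 0x0A too)
    if token[1] == "?":
        base = int(token[0], 16) * 16
        values = range(base, base + 16)  # e.g. A? -> 0xA0..0xAF
    else:
        values = [int(token, 16)]        # one fixed byte
    return b"[" + b"".join(b"\\x%02x" % v for v in values) + b"]"


def compile_hex(pattern: str) -> re.Pattern[bytes]:
    """Compile the tokens written between a hex string's curly brackets."""
    return re.compile(b"".join(token_to_regex(t) for t in pattern.split()), re.DOTALL)


rx = compile_hex("AA 1F E8 ?? F8 A? FF")
print(bool(rx.search(bytes.fromhex("AA 1F E8 00 F8 A7 FF"))))  # True
print(bool(rx.search(bytes.fromhex("AA 1F E8 00 F8 B7 FF"))))  # False: B7 is not A?
```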
$b = { AA 1F E8 [2-4] F8 AB FF }
The value in square brackets, [2-4], is a jump: it stands for any sequence of 2 to 4 bytes, whatever their values. For example, the pattern above matches all of the following byte sequences.
AA 1F E8 AE AF F8 AB FF
OR
AA 1F E8 01 A1 BF F8 AB FF
OR
AA 1F E8 AA AB FF A8 F8 AB FF
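The same regular-expression view explains jumps: [n-m] corresponds to "between n and m bytes of anything". The following Python sketch (hypothetical helper compile_hex_with_jumps) handles only literal bytes and [n-m] jumps. It checks the pattern AA 1F E8 [2-4] F8 AB FF against the three sequences listed above, plus one sequence with no bytes between E8 and F8.

```python
import re


def compile_hex_with_jumps(pattern: str) -> re.Pattern:
    """Translate literal hex bytes and [n-m] jumps into a bytes regex."""
    parts = []
    for token in pattern.split():
        jump = re.fullmatch(r"\[(\d+)-(\d+)\]", token)
        if jump:
            low, high = (int(g) for g in jump.groups())
            parts.append(b".{%d,%d}" % (low, high))  # any low..high bytes
        else:
            parts.append(b"\\x%02x" % int(token, 16))  # one literal byte
    return re.compile(b"".join(parts), re.DOTALL)


rx = compile_hex_with_jumps("AA 1F E8 [2-4] F8 AB FF")
for sample in ("AA 1F E8 AE AF F8 AB FF",        # 2 bytes in the jump
               "AA 1F E8 01 A1 BF F8 AB FF",     # 3 bytes in the jump
               "AA 1F E8 AA AB FF A8 F8 AB FF",  # 4 bytes in the jump
               "AA 1F E8 F8 AB FF"):             # 0 bytes: too few, no match
    print(sample, bool(rx.search(bytes.fromhex(sample))))
```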
The bytes covered by the jump (AE AF, 01 A1 BF, and AA AB FF A8 in the example sequences) can have any values, while the rest of the sequence must match the pattern exactly. Alternatives are another feature, useful when a position in a hexadecimal string can hold one of several known values. To understand this concept, refer to the following example.
$b = { AA 1F E8 ( FF | A8 ) F8 AB FF }
The above example has two values (FF and A8) enclosed in parentheses and separated by the pipe symbol (|). This means the pattern matches either of the following byte sequences.
AA 1F E8 FF F8 AB FF
OR
AA 1F E8 A8 F8 AB FF
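Alternatives map naturally onto regular-expression alternation. The following Python sketch (hypothetical helper compile_hex_with_alternatives) turns ( FF | A8 ) into the non-capturing group (?:\xff|\xa8). It assumes every alternative is a run of literal hex bytes and silently skips any other characters, so it is far less general than Yara's own parser.

```python
import re


def compile_hex_with_alternatives(pattern: str) -> re.Pattern:
    """Translate literal hex bytes and ( XX | YY ) groups into a bytes regex."""
    out = []
    for token in re.findall(r"\(|\)|\||[0-9A-Fa-f]{2}", pattern):
        if token == "(":
            out.append(b"(?:")           # open a non-capturing group
        elif token in (")", "|"):
            out.append(token.encode())   # close group / separate choices
        else:
            out.append(b"\\x%02x" % int(token, 16))  # one literal byte
    return re.compile(b"".join(out), re.DOTALL)


rx = compile_hex_with_alternatives("AA 1F E8 ( FF | A8 ) F8 AB FF")
for sample in ("AA 1F E8 FF F8 AB FF",   # first alternative: match
               "AA 1F E8 A8 F8 AB FF",   # second alternative: match
               "AA 1F E8 00 F8 AB FF"):  # neither alternative: no match
    print(sample, bool(rx.search(bytes.fromhex(sample))))
```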
3) Yara Condition
The condition is the mandatory part of a Yara rule and consists of Boolean expressions built with operators such as and, or, not, ==, !=, <, >, <=, and >=. String identifiers act as Boolean variables in these expressions: $a is true if string $a is found in the target. The Yara tool scans the target files and processes and reports a match when the condition evaluates to true.
Condition Examples:
condition:
    ($a or $b) and ($c)
The above condition states that the target file or process must contain either string $a or string $b, and must also contain $c. We can combine expressions in different ways to extend the scope of Yara rules. For example, we can tell the Yara utility to require that string $a occurs exactly 10 times and $b occurs more than 20 times. This condition can be implemented using the following expression, where #a is the number of occurrences of $a.
#a == 10 and #b > 20
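As a rough model of how such conditions are evaluated, the following Python sketch records every offset at which each string occurs, then evaluates ($a or $b) and ($c) and #a == 10 and #b > 20 from those results. The strings and data are hypothetical, and counting overlapping occurrences is a choice made in this sketch, not a statement about Yara's internals.

```python
def offsets(needle: bytes, data: bytes) -> list:
    """Every offset at which needle starts in data (overlaps included)."""
    found, start = [], data.find(needle)
    while start != -1:
        found.append(start)
        start = data.find(needle, start + 1)
    return found


# Hypothetical strings and target data, made up for the example.
strings = {"a": b"login(", "b": b"\xaa\x1f\xe8", "c": b"output.txt"}
data = strings["a"] * 10 + strings["b"] * 21 + strings["c"]

count = {name: len(offsets(s, data)) for name, s in strings.items()}  # #a, #b, #c
found = {name: n > 0 for name, n in count.items()}                    # $a, $b, $c

print((found["a"] or found["b"]) and found["c"])  # ($a or $b) and ($c) -> True
print(count["a"] == 10 and count["b"] > 20)       # #a == 10 and #b > 20 -> True
```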
Similarly, we can make a Yara rule require $b and $c at offset 50 of a target file, or at virtual address 50 in the case of a running process, using the following condition.
$b at 50 and $c at 50
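The at operator can be modeled the same way: $b at 50 is true when an occurrence of $b begins exactly at offset 50. The following Python sketch (hypothetical starts_at helper) checks this for the $b hex string in a made-up file buffer; it covers only file offsets, not the virtual addresses used when scanning a running process.

```python
def starts_at(needle: bytes, data: bytes, offset: int) -> bool:
    """Model of `$x at N` for files: does needle begin exactly at offset N?"""
    return data.startswith(needle, offset)


hex_b = bytes.fromhex("AA 1F E8 AB FB E2 CD F8 AC FF")  # the $b hex string
data = bytes(50) + hex_b + bytes(10)                    # 50 zero bytes, then $b

print(starts_at(hex_b, data, 50))  # True
print(starts_at(hex_b, data, 49))  # False
```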
The above discussion covers the basic structure and principles of writing Yara rules. In the next article, we will demonstrate Yara rules through practical examples using the Yara tool.
Conclusion
Yara rules are useful for identifying and classifying malware samples. One can write rules for different malware families and pass them to the Yara utility to automatically detect those families in files and running processes. Although Yara rules are effective for malware detection, researchers must have prior knowledge of the malware code in order to express it through strings and conditions in the rules.