Encryption is everywhere in our lives. You might not notice it, but you use it every single day. It is baked into even the most basic processes of our digital world. Every time you open a website, send a message, unlock your phone, or pay for your morning latte, you are using encryption as part of that process. Encryption has evolved over centuries to become the cornerstone of modern data security.

However, encryption can have a dark side. Threat actors can also leverage the power of encryption as part of their malicious operations. Encryption is commonplace in malware for many reasons, such as obfuscating configurations, hiding stolen data, scrambling communications, and holding users’ files for ransom. This blog will delve into the world of encryption in malware, how to detect it, and how to protect yourself and your organization.

## Fundamentals

So, what is encryption? Encryption is the process of modifying data to conceal its true meaning from any unauthorized entity.

This involves converting the original data, commonly known as *plaintext*, into unreadable data, commonly known as *ciphertext*. The conversion is performed by an algorithm that uses a key to scramble the original data so thoroughly that it is incredibly difficult, or in some cases nearly impossible, to decipher without the key. In modern encryption, the algorithms are widely known, and only the key is kept secret between the authorized parties. This contrasts with many historical ciphers, which depended on the algorithm itself not being known to unauthorized parties. The most important encryption components are:

- **Plaintext**: The original, unencrypted data. Often called the *message*.
- **Ciphertext**: The scrambled, encrypted data. Produced using the encryption algorithm and key on the plaintext. The ciphertext is converted back into plaintext through the process of decryption.
- **Encryption Algorithm**: The algorithm is the process or steps that are taken to scramble up the plaintext. There is a wide range of algorithms that can be used, with varying degrees of strength and complexity. The algorithm can be something as simple as a substitution cipher, up to complex mathematical algorithms like Advanced Encryption Standard (AES).
- **Key**: A piece of information that is used with the algorithm to encrypt and decrypt data. A key can take many forms, such as a password, a series of numbers, or a random string of bits.

Encryption, and cryptography in general, has been around for a long time. Historical figures such as Julius Caesar and Mary, Queen of Scots used it to protect their sensitive messages. Mary relied on an elaborate substitution cipher which, although intricate, was cracked with relative ease through frequency analysis. The decrypted messages uncovered a plot to assassinate Queen Elizabeth I, and Mary, along with her co-conspirators, was executed! Think of the consequences the next time you decide to use a weak cipher for your secret messages!

Fast forwarding to the modern era, with the development of technology, the complexity and speed of encryption have exploded. The information age has allowed encryption to be handled at the speed of computers. In the following sections, we will introduce you to different types of encryption and how they fit into the world of malware. First, we will introduce a main building block of modern cryptography: the XOR (exclusive OR) logical operation. XOR appears in many of the encryption schemes used in information security today, both symmetric and asymmetric.

## XOR and Rolling XOR Encryption

XOR (exclusive OR) is a logical operation that takes two binary inputs and produces an output based on the following rules:

- If both inputs are the same (either both 0 or both 1), the output is 0.
- If the inputs are different (one is 0, and the other is 1), the output is 1.
- If A ^ B = C, then A ^ C = B and B ^ C = A.
- This means that XORing the output with one of the inputs returns the other input, making it a two-way function.

In XOR encryption, each plaintext byte is combined with a corresponding byte or character from a secret key using the XOR operation. The resulting ciphertext is the encrypted form of the plaintext. Since XOR is a two-way function, decrypting the ciphertext only requires XORing it with the same secret key.

A rolling XOR (rotating XOR) is a cryptographic technique that involves performing the XOR operation on a stream of data in a rolling or rotating fashion. A repeating key with a fixed length is used to perform the XOR operation on the data. The key “rolls” or “rotates” through the data stream, applying the XOR operation to each byte of the data with the corresponding byte from the repeating key. Once the last byte of the key has been used, the key wraps around, and the process continues with the first key byte on the next byte of data.

Rolling XOR can provide a slight improvement in security compared to regular XOR encryption with a fixed key because it introduces a level of variation and complexity. However, similar to basic XOR encryption, rolling XOR is considered relatively weak and can be broken relatively easily. Yet, due to its simplicity, it is a very fast encryption routine.
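To make the rolling XOR described above concrete, here is a minimal Python sketch. The function name `rolling_xor` and the 3-byte key are hypothetical and chosen only for illustration; they are not taken from any specific malware family.

```python
from itertools import cycle


def rolling_xor(data: bytes, key: bytes) -> bytes:
    """XOR every data byte with the key byte at the same position, repeating the key."""
    if not key:
        raise ValueError("key must not be empty")
    # cycle() restarts the key from its first byte once the last byte is used.
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


plaintext = b"attack at dawn"
key = b"\x13\x37\x42"  # hypothetical 3-byte key
ciphertext = rolling_xor(plaintext, key)
recovered = rolling_xor(ciphertext, key)  # the same call decrypts
```

The sketch also hints at why the scheme is weak: XORing a run of zero bytes with the key simply reproduces the key, so padding or empty fields in the plaintext can leak it directly.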

### XOR Encryption Uses in Malware

RedXor is a Linux backdoor operated by a Chinese Nation-State Actor. The backdoor masquerades itself as a polkit daemon and encodes its network data with a scheme based on XOR. Encoding network data with XOR has been used in previous Winnti malware, including PWNLNX.

The decryption logic is a simple XOR against a byte key. The byte key is incremented by a constant for each item in the buffer. The only configuration value that is not encrypted is the server port. The port value is used to derive the key and the address. The key is derived from bit shifting the port value eight steps to the right.

The decoding function accepts four arguments: the XOR key, the constant added to the key byte every round, the encoded buffer, and its length.

The screenshot below presents the implementation of the XOR encryption. The highlighted XOR operation between the buffer and the key is followed by adding the constant to the key.

The pseudocode of the decryption would look like this:

```
doXor(keyChar, adder, buf, buf_len) {
    key = keyChar;
    for (i = 0; i < buf_len; i++) {
        buf[i] = key ^ buf[i];  // XOR the current byte with the current key
        key = key + adder;      // advance the key by the constant
    }
    return 0;
}
```
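The pseudocode above translates almost directly to Python. The sketch below is an illustration under stated assumptions, not RedXor's actual code: the names `do_xor` and `key_from_port`, the masking of the key to a single byte, and the example port and `adder` values are all hypothetical. Only the overall logic (XOR with a key that grows by a constant, key taken from the port shifted right by eight bits) comes from the description above.

```python
def key_from_port(port: int) -> int:
    """Derive the starting key byte by shifting the port eight bits to the right."""
    return (port >> 8) & 0xFF  # assumption: only the low byte of the result is used


def do_xor(key_char: int, adder: int, buf: bytes) -> bytes:
    """XOR each byte with a key that is incremented by `adder` after every byte."""
    key = key_char & 0xFF
    out = bytearray(len(buf))
    for i, value in enumerate(buf):
        out[i] = key ^ value
        key = (key + adder) & 0xFF  # assumption: the key wraps around like a C char
    return bytes(out)


port = 8080   # hypothetical port value (0x1F90)
adder = 3     # hypothetical constant
key = key_from_port(port)
encoded = do_xor(key, adder, b"example.invalid")
decoded = do_xor(key, adder, encoded)  # the same routine encodes and decodes
```

Because the key sequence depends only on the starting byte and the constant, running the routine twice with the same arguments restores the original buffer.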

## Symmetric Cryptography

Symmetric cryptography is when the same key is used to encrypt plaintext and decrypt ciphertext. Generally, there are two types of ciphers used in symmetric encryption: *stream ciphers* and *block ciphers*.

### Stream Ciphers

A stream cipher is a type of cipher that encrypts data by combining the plaintext with a keystream. A keystream is a pseudorandom sequence of bits that is generated using a seed. This seed serves as the key. Stream ciphers typically encrypt a message one bit or byte at a time by XORing it with the keystream. Quite often, the key for a stream cipher does not need to be of a fixed length.

A popular example of a stream cipher is RC4. RC4 is a stream cipher that was created by Ron Rivest, one of the creators of the RSA asymmetric cryptosystem. A number of vulnerabilities have been identified in RC4, which has led to it being phased out of mainstream encryption use, but due to its simple implementation, it’s still used in malware.

### Stream Cipher Use in Malware

As we mentioned, RC4 is commonly used in malware, typically to encrypt communications between the malware and its command and control (C2) server or to encrypt its configuration. Although RC4 is not the strongest cipher, it scrambles data well enough, and produces enough entropy, that defenses cannot rely on recognizable patterns for detection, as they can in the case of XOR encryption in Cobalt Strike. RC4 is also trivial to implement without the need to import libraries, and it is easy to generate or store a key inside the malware. The malware Symbiote uses RC4 heavily to encrypt communications and gather information:

Symbiote Deep-Dive: Analysis of a New, Nearly-Impossible-to-Detect Linux Threat

Below is a screenshot of the Symbiote usage of RC4. The RC4 algorithm starts by creating the identity permutation, which is a block of bytes ranging from 0x00 to 0xFF. This initialization is a very distinctive way to identify RC4 usage. After the identity permutation is created, the key bytes are mixed in to produce the state from which the pseudorandom keystream is generated.

Another use of a stream cipher in malware is the use of *ChaCha20* in the APT malware StageClient. ChaCha20 is used to encrypt the configuration, protecting it from static analysis.

For a visualization of the ChaCha20 cipher, we recommend watching this Computerphile video:

### Block Ciphers

Block ciphers are another way of implementing symmetric encryption. Block ciphers encrypt fixed-size blocks of data rather than one bit or byte at a time, as a stream cipher does. Many block cipher algorithms are very strong, and therefore they are heavily used in everyday encryption. Because of this strength, they are also favored as one of the main components in ransomware, often used in conjunction with asymmetric cryptography to form a hybrid cryptosystem, which we will discuss more in a future blog.

Block ciphers can have different modes of operation, which are ways of applying the block cipher that increase the security of the encryption beyond basic blocks. Electronic Codebook (ECB) is the most basic mode of operation and simply encrypts each block independently. This is considered weak because data with repeating blocks produces repeating ciphertext, so patterns can still be discerned. Another mode, Cipher Block Chaining (CBC), XORs each plaintext block with the previous ciphertext block before encrypting it, which creates more randomness and disrupts patterns. The image below shows a bitmap of the Intezer logo being AES encrypted, with ECB in the middle and CBC on the right. The logo pattern can still be discerned in ECB mode, whereas in CBC the output looks completely random.

Many modes of operation also require what is called an initialization vector (IV). An IV is a small piece of data, usually a random number, that is used to ensure that two identical plaintext blocks encrypt to different ciphertext blocks, providing additional security and removing patterns.
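The difference between ECB and CBC can be reproduced in a few lines of Python. To keep the sketch self-contained it does not use AES. Instead, it assumes a toy four-round Feistel cipher built on SHA-256, which is only a stand-in for a real block cipher and must not be used to protect data. All function names, the key, and the fixed IV are hypothetical.

```python
import hashlib

BLOCK = 16          # block size in bytes
HALF = BLOCK // 2


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round_fn(half: bytes, key: bytes, rnd: int) -> bytes:
    # Toy round function: a keyed hash truncated to half a block.
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:HALF]


def encrypt_block(block: bytes, key: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for rnd in range(4):
        left, right = right, _xor(left, _round_fn(right, key, rnd))
    return left + right


def decrypt_block(block: bytes, key: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for rnd in reversed(range(4)):
        left, right = _xor(right, _round_fn(left, key, rnd)), left
    return left + right


def _blocks(data: bytes) -> list:
    if len(data) % BLOCK:
        raise ValueError("data length must be a multiple of the block size")
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_encrypt(data: bytes, key: bytes) -> bytes:
    # ECB: every block is encrypted independently.
    return b"".join(encrypt_block(b, key) for b in _blocks(data))


def cbc_encrypt(data: bytes, key: bytes, iv: bytes) -> bytes:
    # CBC: each plaintext block is XORed with the previous ciphertext block first.
    out, prev = [], iv
    for block in _blocks(data):
        prev = encrypt_block(_xor(block, prev), key)
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(data: bytes, key: bytes, iv: bytes) -> bytes:
    out, prev = [], iv
    for block in _blocks(data):
        out.append(_xor(decrypt_block(block, key), prev))
        prev = block
    return b"".join(out)


key = b"hypothetical key"
iv = bytes(range(BLOCK))       # fixed IV for a reproducible demo; real use needs a random IV
plaintext = b"A" * BLOCK * 2   # two identical plaintext blocks

ecb = ecb_encrypt(plaintext, key)
cbc = cbc_encrypt(plaintext, key, iv)
```

With two identical plaintext blocks, the two ECB ciphertext blocks are identical, while the CBC blocks differ because the second block is mixed with the first ciphertext block before encryption.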

For AES, there is also an instruction set that is integrated into many processors. This greatly speeds up the use of AES on systems with compatible processors. With hardware acceleration, AES can be quicker than many stream ciphers, which are otherwise typically faster than block ciphers. Most Intel chips support an AES instruction set, whereas many ARM chips for mobile phones do not. For this reason, Google Chrome developers made the cipher suite for TLS on Android devices use the stream cipher ChaCha20, increasing speed and saving battery.

### Block Cipher Use in Malware

AES is one of the most popular symmetric block encryption algorithms. Many ransomware families use it to quickly and securely encrypt files to hold for ransom. It is also often used to encrypt communications between malware and its C2 server. One such example is the Elephant Framework, used to target organizations in Ukraine:

Elephant Framework Delivered in Phishing Attacks Against Ukrainian Organizations

AES block encryption is used in multiple places in the malware. One part is the GraphSteel client component, where AES is used to encrypt WebSockets messages with the C2. Another area is in the GrimPlant component configuration. The C2 address is passed to the implant through a command line flag “-addr”. The passed argument is base64 decoded and then AES decrypted in CBC mode with a hardcoded embedded key. In the screenshot below, the constant S-box for AES in Golang is shown in the GrimPlant component. Golang statically compiles libraries into the built binary, so it is easy to detect when AES is being used and compare it with the source code.

## Asymmetric Encryption

Asymmetric encryption, also known as public-key encryption, provides a secure way to exchange encrypted data between two parties without needing them to share a common secret key. In asymmetric encryption, each party possesses a key pair consisting of a public key and a private key. The public key is freely shared and used for encryption, while the private key is kept secret and used for decryption. The keys are mathematically related so that data encrypted with one key can only be decrypted with the corresponding key from the pair.

Asymmetric encryption involves key generation, encryption of the message using the recipient’s public key, transmission of the encrypted message, and decryption by the recipient using their private key. This process ensures secure communication without the need for a shared secret key.

Asymmetric encryption reduces the need for a secure channel to exchange a secret key. The public keys can be freely shared without compromising the security of the encryption. It ensures that only the intended recipient with the corresponding private key can decrypt and read the encrypted data.

However, asymmetric encryption is generally slower and computationally more intensive than symmetric encryption. Therefore, it is often used for key exchange, digital signatures, and secure communication of smaller amounts of data.
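The key generation, encryption, and decryption steps described above can be illustrated with "textbook" RSA on deliberately tiny numbers. This is a teaching sketch only: the primes are far too small to be secure, no padding is applied, and the helper names are hypothetical. It assumes Python 3.8 or later for the modular inverse via `pow`.

```python
# Key generation with toy primes (real keys use primes of 1024 bits or more).
p, q = 61, 53
n = p * q                      # public modulus: 3233
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent, coprime with phi
d = pow(e, -1, phi)            # private exponent: modular inverse of e

public_key = (e, n)
private_key = (d, n)


def rsa_encrypt(message: int, key: tuple) -> int:
    """Encrypt an integer smaller than n with the recipient's public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def rsa_decrypt(ciphertext: int, key: tuple) -> int:
    """Decrypt with the matching private key."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


message = 65
ciphertext = rsa_encrypt(message, public_key)
recovered = rsa_decrypt(ciphertext, private_key)
```

Anyone holding `public_key` can produce `ciphertext`, but recovering `message` requires `d`, which in turn requires knowing the factors of `n`. With real key sizes, that factoring step is what makes the scheme hard to break.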

There are several well-known asymmetric encryption algorithms:

- RSA (Rivest-Shamir-Adleman): RSA is based on the mathematical properties of large prime numbers and modular arithmetic.

- Elliptic Curve Cryptography (ECC): ECC leverages the mathematics of elliptic curves over finite fields. It provides strong security with relatively smaller key sizes compared to other asymmetric algorithms.

- ElGamal: ElGamal is an asymmetric encryption algorithm based on the Diffie-Hellman key exchange. It provides both encryption and digital signature functionalities.

- DSA (Digital Signature Algorithm): DSA is a widely used algorithm for digital signatures. It ensures data integrity, authentication, and non-repudiation. DSA is based on mathematical concepts from modular arithmetic and prime numbers.

- ECDSA (Elliptic Curve Digital Signature Algorithm): ECDSA is an elliptic curve-based digital signature algorithm. It offers the same security guarantees as DSA but with shorter key lengths.

Each cryptographic system has its specific algorithm for generating a public key, leading to the use of different sets of elements in each case. To ensure standardized representation, structured formats have been established. The two most prevalent definitions for representing public keys are X.509 and PEM (Privacy Enhanced Mail).

PEM uses base64 encoding to represent binary data, and it is typically used with different types of cryptographic data, such as public keys, private keys, and certificates.

SubjectPublicKeyInfo (SPKI) is a specific format defined in the X.509 specification to represent a public key and its associated algorithm (such as RSA, DSA, or ECDSA) and optional parameters.

PEM is often used as a container format for storing public keys in the *SubjectPublicKeyInfo* structure. When a public key is represented in the *SubjectPublicKeyInfo* format, it can be encoded and stored in a PEM file by adding header and footer lines (often “BEGIN PUBLIC KEY” and “END PUBLIC KEY”) around the base64-encoded *SubjectPublicKeyInfo* data.

**PEM is a general encoding format, and when it comes to representing public keys, *SubjectPublicKeyInfo* is a specific structure that can be PEM-encoded for practical use in various cryptographic applications.**
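As a rough illustration of the PEM wrapping described above, the following Python sketch adds and removes the header and footer lines around base64 data. The helper names are hypothetical, and the input bytes are a placeholder rather than a real DER-encoded *SubjectPublicKeyInfo* structure. The 64-character line width follows the common PEM convention.

```python
import base64


def pem_encode(der: bytes, label: str = "PUBLIC KEY") -> str:
    """Wrap binary data in PEM armor: header, base64 body in 64-char lines, footer."""
    b64 = base64.b64encode(der).decode("ascii")
    lines = [b64[i:i + 64] for i in range(0, len(b64), 64)]
    return "\n".join([f"-----BEGIN {label}-----", *lines, f"-----END {label}-----"]) + "\n"


def pem_decode(pem: str, label: str = "PUBLIC KEY") -> bytes:
    """Extract and base64-decode the body between the BEGIN and END lines."""
    begin = f"-----BEGIN {label}-----"
    end = f"-----END {label}-----"
    start = pem.index(begin) + len(begin)
    stop = pem.index(end, start)
    # Remove line breaks and other whitespace before decoding.
    return base64.b64decode("".join(pem[start:stop].split()))


placeholder_der = bytes(range(100))   # placeholder bytes, not a real key
pem_text = pem_encode(placeholder_der)
round_trip = pem_decode(pem_text)
```

For an analyst, the practical takeaway is that the `-----BEGIN PUBLIC KEY-----` marker, or a long base64 string starting with `MI`, embedded in a sample is often a sign of an embedded public key.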

### RSA (Asymmetric Encryption) Uses in Malware

In August 2021, our research team discovered a highly sophisticated and fully undetected malware named Vermilion Strike. The malware is a full implementation of Cobalt Strike’s beacon targeting Linux systems and was later also detected in Windows samples, indicating it is a cross-platform threat. Vermilion Strike utilizes Cobalt Strike’s Command and Control (C2) protocol for communication with the C2 server and possesses remote access capabilities like file uploading, running shell commands, and file writing.

Like the standard Cobalt Strike implementation, Vermilion Strike employs XOR key encryption to decrypt the beacon’s configuration. However, it also utilizes RSA encryption for encrypting information collected during the fingerprinting process of the infected endpoint.

First, the threat imports a public RSA key using a call to *d2i_RSA_PUBKEY*, which decodes an RSA public key stored in the *SubjectPublicKeyInfo* format.

Vermilion Strike then gathers specific system information from the compromised endpoint. The collected data includes details like the kernel version, network information, current effective user ID of the process, and hostname. Once this information is assembled, it is formatted into a string.

Next, the string is encrypted using the public RSA key. RSA encryption ensures that only the corresponding private key, typically under the control of the threat actor or C2 server, can decrypt and access the collected data. To facilitate communication with the Cobalt Strike server, the encrypted data is then base64 encoded, a common standard for transmitting binary data as text. The encrypted data is sent to the C2 server in a similar way that the metadata is sent from a Cobalt Strike beacon to the C2 server.

## Spotting Encryption in Malware

There are several techniques analysts can use to detect the usage of encryption in malware samples.

### Imports

Usually, one of the first things that would be checked when analyzing a sample is the list of imported functions, as knowing which functions are being utilized by the malware can provide valuable insights into the threat’s capabilities. To identify the usage of encryption in malware, we need to look for cryptographic libraries.

For malware that targets Windows hosts, we would look for the *wincrypt* API, the Mbed TLS library, or the .NET namespace *System.Security.Cryptography*.

For threats that target Linux hosts, we would look for one (or more) of the following libraries: OpenSSL, GnuTLS, Mbed TLS, libgcrypt, and Crypto++.

Since imports can significantly aid researchers (and security tools) in identifying the usage of specific methods, such as encryption, and facilitate faster and relatively easier analysis, many malware authors obfuscate fundamental function imports. Typically, these imports would be dynamically resolved and loaded by the malware within the relevant function. Consequently, it becomes more challenging to pinpoint the utilization of imported library functions.
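A first-pass check for the libraries listed above can be as simple as searching a sample's bytes for their names. The Python sketch below is a naive string scan, not a real import-table parser, and it will miss anything that is dynamically resolved or obfuscated, as noted above. The marker list is an assumption based on common shared-object and API names and is not exhaustive.

```python
# Hypothetical marker list: common file names and API names for each library.
CRYPTO_MARKERS = {
    "OpenSSL": ["libcrypto.so", "libssl.so"],
    "GnuTLS": ["libgnutls"],
    "Mbed TLS": ["libmbedtls", "libmbedcrypto", "mbedtls_"],
    "libgcrypt": ["libgcrypt"],
    "Crypto++": ["libcryptopp", "libcrypto++"],
    "wincrypt": ["CryptAcquireContext", "CryptEncrypt", "CryptDecrypt"],
    ".NET": ["System.Security.Cryptography"],
}


def find_crypto_markers(blob: bytes) -> dict:
    """Return {library: [markers found]} for markers present as ASCII or UTF-16LE."""
    hits = {}
    for library, markers in CRYPTO_MARKERS.items():
        found = [
            m for m in markers
            if m.encode("ascii") in blob or m.encode("utf-16-le") in blob
        ]
        if found:
            hits[library] = found
    return hits


# Stand-in for file contents; in practice this would be open(path, "rb").read().
sample = b"\x7fELF\x00libcrypto.so.1.1\x00libgcrypt.so.20\x00"
hits = find_crypto_markers(sample)
```

A hit is only a hint that warrants a closer look at the import table with a proper ELF or PE parser, and an empty result does not rule out encryption.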

### CAPA

CAPA is a Python-based open-source tool developed by FireEye. CAPA stands for “Common Analysis Platform for Artifacts.” It is designed to assist in analyzing malware samples by identifying common patterns and behaviors exhibited by these samples. For instance: usage of encryption, process creation, communication with C2 servers, retrieving information from the host, etc. This tool aids in understanding the potential impact and behavior of malware, enabling faster and more effective incident response and threat analysis.

CAPA can be integrated into tools such as IDA, Ghidra, and Radare2 and help researchers quickly identify known malicious behaviors in a sample. As seen in the screenshot below, when running CAPA for the Vermilion Strike sample, it identifies the implementation of RSA and XOR encryption. It provides the addresses of the relevant functions.

Having said that, CAPA is not a magic bullet, and in some cases, it might fail to detect certain behaviors due to unorthodox techniques implemented by threat actors or novel and non-conventional implementations of certain methods. In these cases, we would have to rely on other techniques to identify encryption implementation in malware samples.

### Identify key parts of the encryption algorithm

Identifying key parts of an encryption algorithm in malware requires conducting a detailed code analysis. Pay close attention to data manipulation involving bitwise operations such as AND, OR, XOR, and shifts, and to the operations that load values from the stack (as can be seen in the screenshot below). These operations often indicate encryption-related transformations and can provide crucial insights into how the malware encrypts and manipulates data.

#### RC4

In addition, having knowledge of the general algorithms of commonly used encryption schemes can significantly aid in quickly identifying the usage of specific encryption methods. For example, RC4 implements a Key-scheduling algorithm (KSA) that consists of two for-loops ranging from 0 to 255, as shown in the following code snippet. Recognizing these distinctive loops can indicate that the relevant function implements RC4.

```
for i from 0 to 255
    S[i] := i
endfor
j := 0
for i from 0 to 255
    j := (j + S[i] + key[i mod keylength]) mod 256
    swap values of S[i] and S[j]
endfor
```

Pseudo code for RC4 Key-scheduling algorithm (source)
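For reference when comparing against decompiled code, here is a compact Python sketch of RC4 that follows the KSA pseudocode above and adds the keystream generation step (PRGA) that normally follows it. The function names are our own, and the keys in the usage lines are the well-known public test inputs, not values from any malware sample.

```python
def rc4_ksa(key: bytes) -> list:
    """Key-scheduling algorithm: the two 0..255 loops described above."""
    if not key:
        raise ValueError("key must not be empty")
    s = list(range(256))               # first loop: identity permutation
    j = 0
    for i in range(256):               # second loop: mix in the key bytes
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    return s


def rc4(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt data by XORing it with the RC4 keystream."""
    s = rc4_ksa(key)
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


ciphertext = rc4(b"Key", b"Plaintext")
recovered = rc4(b"Key", ciphertext)    # the same function decrypts
```

In disassembly, the first loop usually appears as a counter written into a 256-byte buffer, and the second as a loop containing a modulo by the key length followed by a swap.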

Below is the implementation of the code above in the CryptoClippy malware. We can identify the two loops that are part of the KSA, as described above.

#### ChaCha20

In the key setup stage of the ChaCha20 encryption algorithm, a fixed constant string, “expand 32-byte k,” as defined in the ChaCha20 specification, is utilized. Together with the 256-bit key, a counter, and a nonce, this constant fills the 512-bit initial state from which each 512-bit block of keystream is generated. This setup is instrumental in generating a robust and secure pseudo-random keystream, which is crucial for ensuring the confidentiality of encrypted data.
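The role of the constant is easier to see when the initial state is built explicitly. The Python sketch below assumes the RFC 8439 layout (a 32-bit block counter and a 96-bit nonce); the original ChaCha20 design splits the last four words differently. The function name and the all-zero key and nonce are placeholders.

```python
import struct

SIGMA = b"expand 32-byte k"   # the 16-byte constant from the specification


def chacha20_initial_state(key: bytes, counter: int, nonce: bytes) -> list:
    """Build the sixteen 32-bit little-endian words of the initial state."""
    if len(key) != 32:
        raise ValueError("key must be 32 bytes")
    if len(nonce) != 12:
        raise ValueError("nonce must be 12 bytes (RFC 8439 layout)")
    constants = struct.unpack("<4I", SIGMA)     # words 0-3
    key_words = struct.unpack("<8I", key)       # words 4-11
    nonce_words = struct.unpack("<3I", nonce)   # words 13-15
    return [*constants, *key_words, counter & 0xFFFFFFFF, *nonce_words]


state = chacha20_initial_state(bytes(32), 1, bytes(12))
constant_words = [hex(word) for word in state[:4]]
```

The four constant words are the values an analyst typically sees as immediates in disassembly: `0x61707865`, `0x3320646e`, `0x79622d32`, and `0x6b206574`.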

Consequently, the presence of this constant string is a distinctive artifact that often appears in malware samples, hinting at the potential usage of ChaCha20 encryption. Below is the constant string in StageClient, along with the beginning of the initialization state, with the source code for comparison.

```
rule likely_use_of_chacha20
{
    meta:
        author = "Intezer"
        description = "Likely use of ChaCha20 Cipher"
        reference = "https://intezer.com/blog/research/unraveling-malware-encryption-secrets/"
        hash = "9b48822bd6065a2ad2c6972003920f713fe2cb750ec13a886efee7b570c111a5"
    strings:
        $mov_bytes = { ?? ?? ?? ?? 65 78 70 61 ?? ?? ?? ?? 6E 64 20 33 ?? ?? ?? ?? 32 2D 62 79 ?? ?? ?? ?? 74 65 20 6B }
        $string_literal = "expand 32-byte k"
    condition:
        any of them
}
```
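When YARA is not at hand, the same two patterns can be approximated with Python's `re` module. This is a rough equivalent for quick triage, not a replacement for the rule above. The function name is hypothetical, and each `.{4}` stands in for the four wildcard bytes (`?? ?? ?? ??`) that precede every chunk of the constant.

```python
import re

# Contiguous form of the constant, as stored in a data section.
STRING_LITERAL = b"expand 32-byte k"

# Split form: four arbitrary bytes before each 4-byte chunk of the constant,
# as happens when the constant is written with consecutive mov instructions.
MOV_BYTES = re.compile(rb".{4}expa.{4}nd 3.{4}2-by.{4}te k", re.DOTALL)


def likely_uses_chacha20(blob: bytes) -> bool:
    """Return True if either form of the ChaCha20 constant appears in the data."""
    return STRING_LITERAL in blob or MOV_BYTES.search(blob) is not None


filler = b"\xc7\x45\xd0\x00"   # arbitrary stand-in for instruction bytes
split_sample = filler + b"expa" + filler + b"nd 3" + filler + b"2-by" + filler + b"te k"
found_split = likely_uses_chacha20(split_sample)
found_literal = likely_uses_chacha20(b"\x00\x01expand 32-byte k\x02")
```

As with the YARA rule, a match only suggests ChaCha20 (or its relative Salsa20, which uses the same constant); confirming it still requires looking at the surrounding code.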

#### AES

An S-box, short for Substitution box, is a fundamental component in many symmetric encryption algorithms and block ciphers. It is used to perform non-linear substitution of bits or bytes, introducing confusion in the data and enhancing the cryptographic strength of the algorithm. Confusion ensures that the relationship between the plaintext and ciphertext is complex and obscure, making it resistant to various cryptographic attacks, such as differential and linear cryptanalysis.

In an S-box, each input value (a bit or a byte) is substituted with a corresponding output value based on a predefined substitution table or mathematical function. This substitution table ensures that the relationship between the input and output values is highly non-linear, making it challenging for attackers to deduce the original data or the key used in the encryption.

S-boxes are widely used in popular encryption algorithms like AES (Advanced Encryption Standard), where they are applied during the substitution layer of the encryption process. In AES, the S-box substitutes each byte of the input state with a corresponding byte from the S-box, which is fixed and defined by the AES standard.

AES has known S-boxes that anyone, including malware developers, can use. For example, the implementation of AES in Golang defines the following S-box:

```
var sbox0 = [256]byte{
	0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5, 0x30, 0x01, 0x67, 0x2b, 0xfe, 0xd7, 0xab, 0x76,
	0xca, 0x82, 0xc9, 0x7d, 0xfa, 0x59, 0x47, 0xf0, 0xad, 0xd4, 0xa2, 0xaf, 0x9c, 0xa4, 0x72, 0xc0,
	0xb7, 0xfd, 0x93, 0x26, 0x36, 0x3f, 0xf7, 0xcc, 0x34, 0xa5, 0xe5, 0xf1, 0x71, 0xd8, 0x31, 0x15,
	0x04, 0xc7, 0x23, 0xc3, 0x18, 0x96, 0x05, 0x9a, 0x07, 0x12, 0x80, 0xe2, 0xeb, 0x27, 0xb2, 0x75,
	0x09, 0x83, 0x2c, 0x1a, 0x1b, 0x6e, 0x5a, 0xa0, 0x52, 0x3b, 0xd6, 0xb3, 0x29, 0xe3, 0x2f, 0x84,
	0x53, 0xd1, 0x00, 0xed, 0x20, 0xfc, 0xb1, 0x5b, 0x6a, 0xcb, 0xbe, 0x39, 0x4a, 0x4c, 0x58, 0xcf,
	0xd0, 0xef, 0xaa, 0xfb, 0x43, 0x4d, 0x33, 0x85, 0x45, 0xf9, 0x02, 0x7f, 0x50, 0x3c, 0x9f, 0xa8,
	0x51, 0xa3, 0x40, 0x8f, 0x92, 0x9d, 0x38, 0xf5, 0xbc, 0xb6, 0xda, 0x21, 0x10, 0xff, 0xf3, 0xd2,
	0xcd, 0x0c, 0x13, 0xec, 0x5f, 0x97, 0x44, 0x17, 0xc4, 0xa7, 0x7e, 0x3d, 0x64, 0x5d, 0x19, 0x73,
	0x60, 0x81, 0x4f, 0xdc, 0x22, 0x2a, 0x90, 0x88, 0x46, 0xee, 0xb8, 0x14, 0xde, 0x5e, 0x0b, 0xdb,
	0xe0, 0x32, 0x3a, 0x0a, 0x49, 0x06, 0x24, 0x5c, 0xc2, 0xd3, 0xac, 0x62, 0x91, 0x95, 0xe4, 0x79,
	0xe7, 0xc8, 0x37, 0x6d, 0x8d, 0xd5, 0x4e, 0xa9, 0x6c, 0x56, 0xf4, 0xea, 0x65, 0x7a, 0xae, 0x08,
	0xba, 0x78, 0x25, 0x2e, 0x1c, 0xa6, 0xb4, 0xc6, 0xe8, 0xdd, 0x74, 0x1f, 0x4b, 0xbd, 0x8b, 0x8a,
	0x70, 0x3e, 0xb5, 0x66, 0x48, 0x03, 0xf6, 0x0e, 0x61, 0x35, 0x57, 0xb9, 0x86, 0xc1, 0x1d, 0x9e,
	0xe1, 0xf8, 0x98, 0x11, 0x69, 0xd9, 0x8e, 0x94, 0x9b, 0x1e, 0x87, 0xe9, 0xce, 0x55, 0x28, 0xdf,
	0x8c, 0xa1, 0x89, 0x0d, 0xbf, 0xe6, 0x42, 0x68, 0x41, 0x99, 0x2d, 0x0f, 0xb0, 0x54, 0xbb, 0x16,
}
```

Researchers can create YARA rules to identify the use of AES by detecting the known S-boxes used in the algorithm. By analyzing the malware binary or examining memory dumps, researchers can search for byte arrays that match the predefined S-box patterns. This can be a valuable technique for identifying the presence of AES encryption in malware or other cryptographic applications.
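The same idea can be sketched in Python for quick triage. The example below searches for the first 16 bytes of the AES S-box. Using only this prefix as a signature is our simplifying assumption; a fuller rule would match the whole 256-byte table. The function name is hypothetical.

```python
# First row of the AES S-box, used here as a short signature.
AES_SBOX_PREFIX = bytes([
    0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5,
    0x30, 0x01, 0x67, 0x2B, 0xFE, 0xD7, 0xAB, 0x76,
])


def find_aes_sbox(blob: bytes) -> list:
    """Return every offset at which the S-box prefix occurs in the data."""
    offsets = []
    start = blob.find(AES_SBOX_PREFIX)
    while start != -1:
        offsets.append(start)
        start = blob.find(AES_SBOX_PREFIX, start + 1)
    return offsets


# Stand-in for a file or memory dump; in practice read the bytes from disk.
sample = b"\x00" * 10 + AES_SBOX_PREFIX + b"\xff" * 5
offsets = find_aes_sbox(sample)
```

Keep in mind that implementations relying on the processor's AES instructions, or on precomputed combined lookup tables, may not contain this S-box at all, so a missing match does not prove that AES is absent.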

### Debugging

Encryption and decryption routines in malware can be highly intricate, and some may even have a proprietary implementation, adding to the complexity. As a result, analyzing these routines can be a tedious and time-consuming process. In such cases, it is preferable to debug the malware sample within an isolated environment. Doing so can provide valuable insights into the data manipulation that occurs during encryption and decryption. By debugging the encryption/decryption routines, security analysts can uncover the interesting payload more efficiently, which is particularly crucial during time-sensitive situations like incident response.

## Conclusions

Encryption plays a significant role in both our daily lives and the tactics of malware developers. As a result, understanding the fundamental concepts of encryption, distinguishing between symmetric and asymmetric encryption, and being acquainted with common encryption algorithms are essential skills. Identifying crucial elements of these algorithms in malware samples can greatly aid researchers in recognizing the use of encryption in threats. This insight can lead to a deeper understanding of other capabilities and components of the threat, facilitating more effective analysis and response strategies.