Most people who work in the anti-malware industry are familiar with signature-based detection: once a file is determined to be malicious, a signature is written so that anti-malware programs can detect that file or component in the future. Today's threat landscape is challenging for signature-based detection, because the number of threats keeps growing while each signature variation stays effective for a shorter time.
Because of these difficulties, complements to signature-based detection, such as heuristic-based scanning, sandboxing and/or multi-scanning (scanning for threats with multiple anti-malware engines), are needed to address modern risks more effectively. Here, we look at the pros and cons of two of these approaches: heuristic-based scanning, which multi-scanning solutions use alongside signature-based detection to increase detection rates, and sandboxing.
Introduction to Heuristic-Based Scanning
Whereas signature-based scanning matches signatures found in files against a database of known malware, heuristic scanning uses rules and/or algorithms to look for commands that may indicate malicious intent. This allows some heuristic methods to detect malware for which no signature exists yet. That is why most antivirus programs combine signature-based and heuristic-based methods to catch malware that would otherwise evade detection.
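To make the idea of rule-based heuristics concrete, here is a minimal sketch that statically scores a file's bytes against a few weighted rules and flags it once the total crosses a threshold. The rules, weights, and the `FLAG_THRESHOLD` value are hypothetical and chosen only for illustration; real engines use far richer rules, and some use emulation rather than plain pattern matching.

```python
import re

# Hypothetical rules: (description, byte pattern, weight). Illustrative only;
# real heuristic engines use much richer and more numerous rules.
RULES = [
    ("Downloads a remote payload", re.compile(rb"URLDownloadToFile", re.IGNORECASE), 40),
    ("Tampers with Windows Defender", re.compile(rb"DisableAntiSpyware", re.IGNORECASE), 30),
    ("Persists via a Run registry key", re.compile(rb"CurrentVersion\\Run", re.IGNORECASE), 20),
    ("Deletes shadow copies",
     re.compile(rb"vssadmin(?:\.exe)?\s+delete\s+shadows", re.IGNORECASE), 50),
]

FLAG_THRESHOLD = 50  # hypothetical score at which a file is reported as suspicious


def heuristic_scan(data: bytes) -> tuple[bool, int, list[str]]:
    """Statically score a file's bytes against the rules; the file is never executed."""
    score = 0
    matched = []
    for description, pattern, weight in RULES:
        if pattern.search(data):
            score += weight
            matched.append(description)
    return score >= FLAG_THRESHOLD, score, matched


print(heuristic_scan(b"cmd /c vssadmin delete shadows /all /quiet"))
# (True, 50, ['Deletes shadow copies'])
```

Note that no signature for the exact file is needed: any file containing enough suspicious indicators is flagged, which is also why overly broad rules can produce false positives.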
Benefits of Heuristic Scanning
- Heuristic scanning is usually much faster than sandboxing because, apart from some emulation-based techniques, it does not execute the file and wait to record its behavior.
- Vendors can update the rules in their heuristic engines through their daily update packages to address new threat vectors, without revealing the details to malicious actors.
- Unlike sandboxing, it does not reveal how malware is flagged, so malware authors do not know what they need to change in order to evade detection.
- Heuristic scanning can detect malware that evades sandboxes by exploiting blind spots that malware authors deliberately target.
Limitations of Heuristic Scanning
- When scanning a sample, the information found is generally limited to the threat name.
- Because the engines look for specific pieces of code that indicate a malicious action, two potential limitations emerge:
  - If the vendor has not built detection for a particular action, the malware will evade detection.
  - If the malicious action is successfully obfuscated (e.g., within an encrypted file), it will evade detection.
- Some older heuristic-based scanning methods are more prone to false positives because they look for a wide range of actions that could indicate a potentially malicious file. Newer methods such as generic detection, which looks for features or behaviors commonly seen in known threats, produce false positives less frequently.
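The obfuscation limitation described above can be illustrated with a minimal sketch. It assumes a single hypothetical static rule that looks for a command string verbatim, and a simple one-byte XOR encoding as the obfuscation; real malware typically uses stronger packing or encryption, but the effect on a naive rule is the same.

```python
# A single hypothetical static rule: flag any file containing this command verbatim.
SUSPICIOUS = b"vssadmin delete shadows"


def static_rule_matches(data: bytes) -> bool:
    """Naive heuristic check: does the suspicious byte string appear as-is?"""
    return SUSPICIOUS in data


def xor_encode(data: bytes, key: int) -> bytes:
    """Obfuscate bytes with a one-byte XOR key (applying it again decodes them)."""
    return bytes(b ^ key for b in data)


plain = b"cmd /c vssadmin delete shadows /all /quiet"
hidden = xor_encode(plain, 0x5A)  # what the file would carry on disk

print(static_rule_matches(plain))                     # True
print(static_rule_matches(hidden))                    # False: the encoded command slips past
print(static_rule_matches(xor_encode(hidden, 0x5A)))  # True once decoded at run time
```

The command only becomes visible after it is decoded, which happens when the file runs. This is one reason some engines add emulation, and why sandboxing, which observes execution directly, complements static heuristics.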
Introduction to Sandboxing
A sandbox is a purpose-built environment, usually virtualized but in some cases physical, in which potentially malicious files are executed and their behavior is recorded. The recorded behavior is then analyzed automatically by the sandbox through a weighting system and/or manually by a malware analyst. The goal of this analysis is to determine whether the file is malicious and, if it is, what exactly the file does.
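As a rough illustration of automated weighting, the sketch below sums the weights of behaviors recorded during execution and compares the total with a threshold. The behavior names, weights, and `MALICIOUS_THRESHOLD` are hypothetical; every sandbox product defines its own behaviors and scoring.

```python
from dataclasses import dataclass

# Hypothetical behavior weights; each sandbox product defines its own.
BEHAVIOR_WEIGHTS = {
    "creates_autorun_registry_key": 3,
    "injects_into_other_process": 5,
    "deletes_volume_shadow_copies": 8,
    "contacts_unknown_domain": 2,
    "reads_user_documents": 1,
}

MALICIOUS_THRESHOLD = 8  # hypothetical cut-off


@dataclass
class Verdict:
    score: int
    malicious: bool
    evidence: list[str]  # the weighted behaviors that contributed to the score


def score_behaviors(observed: list[str]) -> Verdict:
    """Sum weights of distinct recorded behaviors; unknown behaviors are ignored."""
    # dict.fromkeys removes repeated events while keeping their first-seen order.
    evidence = [b for b in dict.fromkeys(observed) if b in BEHAVIOR_WEIGHTS]
    score = sum(BEHAVIOR_WEIGHTS[b] for b in evidence)
    return Verdict(score, score >= MALICIOUS_THRESHOLD, evidence)


report = ["reads_user_documents", "contacts_unknown_domain", "injects_into_other_process"]
verdict = score_behaviors(report)
print(verdict.score, verdict.malicious)  # 8 True
```

Keeping the `evidence` list alongside the score mirrors how sandbox reports explain *why* a file was judged malicious, rather than returning only a yes/no verdict.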
Benefits of Sandboxing
- Because sandboxing actually executes the file being analyzed, it can see in detail exactly what that file does in that particular environment.
- Instead of a binary yes/no verdict and a threat name, most sandboxes offer reports detailing the recorded behavior. Besides providing more information for classifying the file, this detail is particularly useful in incident response, where it helps identify exactly what the file was intended to do and what effects it had.
- Though it varies by product, many sandboxes can be configured into a highly customized environment. For example, malware designed to fully execute only on a particular user's machine can be analyzed in an environment that replicates that machine.
Limitations of Sandboxing
- Because commercial sandboxes expose their methodology and customization options, malware creators can build specific behaviors to get around detection. These fall into two main categories:
  - "Sandbox-aware" malware can tell that it is being executed in a sandbox and behaves differently to avoid being flagged as malicious. This may be as simple as refusing to run on any virtual machine, or as advanced as looking for signs specific to a particular sandbox.
  - Blind spots vary by product, but in some cases malware creators have found ways to act maliciously that the sensors of a particular sandbox cannot detect.
- Each sample needs an environment to execute in and enough time to collect a full report, particularly when accommodating stalled code execution. Processing a single sample therefore takes substantial time and hardware resources, which results in relatively low throughput.
- Although the industry trend is toward automated sandboxes, many still provide only the raw data on the malware's behavior. It is thus necessary either to build a custom application to interpret that data or to have a malware analyst review it manually.
- Because of the overhead of running them, many sandboxes are optionally or entirely cloud-based, which makes them unsuitable for sensitive files that cannot be uploaded.
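The throughput limitation comes down to simple arithmetic. The sketch below compares samples processed per hour under hypothetical timings: the per-sample durations and worker counts are assumptions made up for illustration, not measurements of any product.

```python
# Hypothetical figures for illustration only; real numbers vary widely
# by product, configuration, and hardware.
SANDBOX_SECONDS_PER_SAMPLE = 180    # e.g. a 3-minute detonation window per VM
SANDBOX_VMS = 10                    # VMs running in parallel
HEURISTIC_SECONDS_PER_SAMPLE = 0.5  # static scan, no execution
SCANNER_WORKERS = 10


def samples_per_hour(seconds_per_sample: float, workers: int) -> float:
    """Throughput when each worker handles one sample at a time."""
    return workers * 3600 / seconds_per_sample


print(samples_per_hour(SANDBOX_SECONDS_PER_SAMPLE, SANDBOX_VMS))           # 200.0
print(samples_per_hour(HEURISTIC_SECONDS_PER_SAMPLE, SCANNER_WORKERS))     # 72000.0
# Lengthening the window to outlast stalling code lowers sandbox throughput further.
print(samples_per_hour(600, SANDBOX_VMS))                                  # 60.0
```

Under these assumed numbers, the only ways to raise sandbox throughput are more hardware (more VMs) or shorter detonation windows, and shorter windows make it easier for stalling malware to finish the analysis without revealing itself.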
As detailed above, sandboxing does have its limitations. We recommend using sandboxing in combination with other methods, like multi-scanning, to increase malware detection rates.
Both heuristic-based scanning and sandboxing have unique strengths and weaknesses, and in a given situation one method may be more appropriate than the other. The best security comes from using both methods together to minimize the number of samples that can evade detection. Multi-scanning takes advantage of the differing heuristic algorithms of many scan engines.
To demonstrate the benefits of a layered approach using many scan engines, we tested Metascan Online, a cloud-based multi-scanning solution powered by anti-malware engines that use both heuristic and signature-based methods to detect threats.
The results on the statistics page show that the percentage of threats detected increases as more anti-malware engines are added. With 4 anti-malware engines, 85.59% of the top 10,000 threats were detected, compared with 98.83% of the same threats when using 16 anti-malware engines plus 14 custom engines. These statistics highlight the value of multi-scanning: every engine has different strengths and weaknesses, so the more engines you use, the greater the chance of detecting threats.
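Why adding engines raises the detection rate can be shown with a minimal sketch: if a sample counts as detected when at least one engine flags it, the detected set is the union of each engine's detections, and that union can only grow as engines are added. The engine names and detection sets below are entirely made up to show the mechanics; they are not real engine data and do not reproduce the statistics above.

```python
# Hypothetical detection results: which of 10 sample IDs each engine flags.
# Made up purely to show the mechanics; not real engine data.
ENGINE_DETECTIONS = {
    "engine_a": {0, 1, 2, 3, 4, 5},
    "engine_b": {0, 1, 2, 6},
    "engine_c": {3, 7, 8},
    "engine_d": {1, 5, 9},
}
TOTAL_SAMPLES = 10


def multiscan_detection_rate(engines: list[str]) -> float:
    """A sample counts as detected if at least one selected engine flags it."""
    detected: set[int] = set()
    for name in engines:
        detected |= ENGINE_DETECTIONS[name]
    return len(detected) / TOTAL_SAMPLES


names = list(ENGINE_DETECTIONS)
for n in range(1, len(names) + 1):
    print(n, multiscan_detection_rate(names[:n]))
# 1 0.6
# 2 0.7
# 3 0.9
# 4 1.0
```

Each engine contributes the samples it uniquely catches (here `engine_c` adds samples 7 and 8), which is the "different strengths and weaknesses" effect in miniature. The sketch ignores false positives, which an "any engine flags it" rule would also accumulate.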
About the Author
Curtis Cade is a sales engineer at Opswat.