Introducing Hank

Todd Lewis, tlewis@mindspring.com

Hank (Hank Acts on Network Kaptures) is a new program that I wrote and released under the GNU GPL. Hank embodies a union of several previously distinct types of network programs; it can be used as a network intrusion detection system (NIDS) like snort, as a network analysis tool like tcpdump, and as a packet-filtering firewall like those found in many operating systems. This article describes Hank's architecture and features and talks about where the project is headed in the future.


Table of Contents
1. Program Information
How to get Hank
How to compile Hank
How to run Hank
2. Feature Overview
Protocol independence
If you can match it, you can report it
Rules can be arbitrarily complex
Packet acquisition, firewalling and policy transformations
ACBM implementation
Snort compatibility
3. What the future holds for Hank
Fine tuning of rules
Rule management infrastructure
Write plugins!
Needless microoptimizations
4. Conclusion

Chapter 1. Program Information


How to get Hank

Hank is hosted at SourceForge. On the SourceForge project page you can get Hank executables for Linux as well as the source code in tarball format, and you can browse the source code in CVS.

If you are going to use Hank, then you may want to join the hank-devel mailing list.


How to compile Hank

I have been developing Hank on my own, and so the only real guarantee that I give concerning the source code is that it compiles and works on a recent Debian Linux x86 machine. Fortunately, Hank is coded to POSIX norms and should run anywhere that the standard C interfaces (*printf, opendir, signal, etc.) exist, which should include every modern unix and, presumably, win32 machines with POSIX compatibility.

The only external libraries that Hank must have are libxml and libxslt. Fortunately, both of these libraries are extremely platform-independent and well tested on a wide range of systems.

As for optional external libraries, the main drivers are the packet acquisition engines (paengines). The pcap paengine needs libpcap. The netfilter paengine needs libipq. The divert socket paengine needs a BSD libc. And so on. If a paengine that you do not need is giving you compilation trouble, then it can be removed.

Since this is Hank's first public release, I am sure that there will be headaches associated with getting it compiled on the various platforms out there. I am also sure that such headaches will be easy to solve. Please do not be bashful about posting to the hank-devel list with any problems you have; I love helping users.


How to run Hank

Running "hank -h" will give you a summary of command-line options. Here is a common way to invoke Hank: "./hank -l . -r hank_rules". This runs Hank with its default packet acquisition mechanism, pcap, using pcap's default behavior, which is probably to grab packets from the first interface, eth0.

If you want to read packets from a capture file, then you pass a paengine parameter, like this: "./hank -l . -r hank_rules -p pcap -P rf=./packet_capture_file". Of course, pcap isn't the only mechanism available. If you want to grab packets from the kernel and act as a firewall using Linux's netfilter system, then you can choose that mechanism: "./hank -l . -r hank_rules -p netfilter".

If you have trouble running Hank, then hop onto our mailing list and ask questions.


Chapter 2. Feature Overview

Protocol independence

Unlike many existing NIDS programs, Hank is not hard-coded to support any particular protocols. Rather, protocol support is modular in the form of a protocol engine (pe), and any protocol can be supported at any layer. Thus, Hank can deal with tunneled protocols, link-layer protocols, non-IP protocols and other facts of life dealt with in the real world but not well supported by competing programs.

Rather than adding support for matching individual protocol fields on an ad-hoc basis, Hank supports protocols in a systematic and comprehensive way. Although any method can be used to create a piece of code conforming to the pe interface, all of the existing protocol engines are coded using a completely automated transformation process. The protocol is described in an xml file created for this purpose, and that xml file is transformed into C code. (If the protocol needs special handling, then hand-written code may be included to perform specific functions.) Each field in the protocol represents one of the Hank data types (hdt), and every operation on a data type is supported for every field of that data type. Thus, e.g., every integer field in every protocol can be tested for all of the integer operations, from "less-than" and "equal-to" to "modulo-equal-to" and "within-range".

This stands in marked contrast to existing NIDS programs where some integer fields can be tested for equality, some for being less than a value, and others can be bit-mask-compared, with the available operations depending on what the author happened to add support for that day. It also stands in marked contrast to other NIDS programs where many such fields, even in mainstream protocols such as IPv4, cannot be matched at all! With Hank, you can match all fields using all data type operators, and adding support for new data types, protocols, fields and operators are all easy, fast and orthogonal tasks.
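To make the "every operation for every field of a data type" idea concrete, here is a minimal Python sketch. It is not Hank's generated C code; the protocol table, the field names and the exact operator semantics (for example, passing a pair for "modulo-equal-to" and "within-range") are assumptions made for illustration only.

```python
import operator

# Operators are registered once per data type, never per field.
# The operator names come from the article; their argument shapes are assumed.
INTEGER_OPS = {
    "less-than": operator.lt,
    "equal-to": operator.eq,
    # test = (modulus, remainder)
    "modulo-equal-to": lambda value, test: value % test[0] == test[1],
    # test = (low, high), inclusive
    "within-range": lambda value, test: test[0] <= value <= test[1],
}
OPS_BY_TYPE = {"integer": INTEGER_OPS}

# Hypothetical protocol descriptions: field name -> data type.
PROTOCOLS = {
    "ipv4": {"ttl": "integer", "id": "integer"},
    "tcp": {"sport": "integer", "window": "integer"},
}


def match_field(packet, protocol, field, op, test):
    """Apply any operator of the field's data type to a decoded packet."""
    data_type = PROTOCOLS[protocol][field]
    return OPS_BY_TYPE[data_type][op](packet[protocol][field], test)


# A decoded packet, represented here as nested dictionaries.
packet = {
    "ipv4": {"ttl": 64, "id": 4660},
    "tcp": {"sport": 31337, "window": 512},
}
```

Because the lookup goes through the data type, adding a new integer field to a protocol description makes every integer operator available for it with no further code.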


If you can match it, you can report it

Every field that can be matched on can be reported on, at least for those data types that support reporting. The string (blob) data type does not support reporting, since it is potentially unbounded in the case of streaming protocols, but the integer and address data types do support reporting. More importantly, since all of the programmatic interfaces support reporting, adding such support to data types that lack it can be done easily and quickly at any time in the future, and the automatic code-generation system will propagate such support to all protocols that use that data type.


Rules can be arbitrarily complex

Just as matching is flexible, comprehensive and easily extensible, the syntax for Hank's rule file is flexible, can handle any possible matching criterion, and is easily extensible. Again, this stands in marked contrast to the usual practice of having a hand-coded rule syntax which is extended in an ad-hoc manner when new features are added. The key to this flexibility is Hank's use of the eXtensible Markup Language (XML) to represent rules. Extending Hank's parser in the usual ad-hoc way would be exceedingly difficult; thanks to Daniel Veillard's excellent libxml, Hank doesn't even have a parser. Instead, all rules are specified in Hank's xml rule syntax. They are even validated by libxml according to the Document Type Definition (DTD) before Hank ever sees them, and so ungrammatical rules are rejected before they can cause any trouble. I know of no other NIDS that validates its rules.

Hank's rules differ from other NIDS programs in other ways, too. Rules consist of groupings of actions, of which matching is one. Matching actions are segregated from other types of actions (like reporting) and from syntactic sugar, like version numbering, that the NIDS doesn't care about. As a result of that separation, Hank can be aggressive about optimizing its evaluation of matching criteria. Matching primitives consist of a field, an operation, and a test value. Primitives are combined together using AND, OR, and NOT elements. That combination can be arbitrarily simple, from a single primitive, to arbitrarily complex, with groupings and negations descending until you exhaust your computer's memory. Unlike many NIDS programs, which mandate, e.g., that there must be a source IP address defined for each rule, there are no mandatory fields in Hank rules, either. Indeed, with our protocol independence, there can't be; what, exactly, would the IP address field represent in an AppleTalk rule?
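The structure described above, primitives of field, operation and test value combined with AND, OR and NOT, can be sketched in a few lines of Python. The element and attribute names below are invented for this example and are not Hank's actual rule syntax, and the sketch skips the DTD validation step that libxml performs for Hank.

```python
import operator
import xml.etree.ElementTree as ET

# Hypothetical rule syntax: <test> is a primitive; <and>, <or>, <not> combine.
RULE_XML = """
<match>
  <and>
    <test field="tcp.dport" op="equal-to" value="80"/>
    <or>
      <test field="ipv4.ttl" op="less-than" value="5"/>
      <not>
        <test field="tcp.window" op="equal-to" value="0"/>
      </not>
    </or>
  </and>
</match>
"""

OPS = {"equal-to": operator.eq, "less-than": operator.lt}


def evaluate(node, packet):
    """Recursively evaluate a matching tree against a decoded packet."""
    if node.tag == "test":
        compare = OPS[node.get("op")]
        return compare(packet[node.get("field")], int(node.get("value")))
    if node.tag == "and":
        return all(evaluate(child, packet) for child in node)
    if node.tag == "or":
        return any(evaluate(child, packet) for child in node)
    if node.tag in ("not", "match"):
        (child,) = node  # both wrappers take exactly one child
        result = evaluate(child, packet)
        return not result if node.tag == "not" else result
    raise ValueError("unknown element: " + node.tag)


rule = ET.fromstring(RULE_XML)
```

Nothing in the tree is mandatory: a rule for a non-IP protocol would simply use different field names in its primitives.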


Packet acquisition, firewalling and policy transformations

Unlike all major existing NIDS programs, Hank is not tied to a particular mechanism for acquiring packets. The most common of these mechanisms is the packet capture library, libpcap. Hank relies on an abstracted "packet acquisition engine" (paengine) interface to acquire packets. Each paengine supports a single mechanism for acquiring packets, and which engine you use can be set at run-time. Hank can use libpcap, the Linux 2.4 netfilter ip queue mechanism, or the BSD/Linux 2.2 divert socket mechanism for acquiring packets. Adding support for a new acquisition mechanism is a simple matter; none of the existing paengines is more than 300 lines of code, and most of that is boilerplate.

However, acquisition is only half of the paengine's job; the other half is disposition. The traditional mechanism, pcap, was a read-only interface, and so this is how most people conceive of the NIDS role. However, the free operating systems have made big progress over the past few years in tearing down the barriers between user-space and kernel network processing, making it possible not only for programs to examine network traffic, as pcap has allowed them to for a long time, but to take part in firewalling and deciding whether a packet should be modified and forwarded, forwarded intact or discarded.

Thanks to this recent functionality, as part of your Hank rules, you can set a firewall verdict, and the paengine will enforce this verdict. (A warning will ensue with a firewall-incapable paengine like pcap.) Both firewalls and NIDS programs examine traffic to determine if it meets security criteria, and yet they have remained distinct realms. This has caused huge problems for people who want to use the strength of NIDS as the basis for active-response firewalls. With Hank, the entire problem can be solved in a single step: if you don't like what you see in Hank, you can tell the operating system to throw it away.
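The two halves of the paengine's job, acquisition and disposition, can be pictured with the following Python sketch. The class and method names are hypothetical and do not reflect Hank's C interface; the engines here replay in-memory packets rather than talking to libpcap or netfilter.

```python
import warnings
from abc import ABC, abstractmethod

ACCEPT, DROP = "accept", "drop"


class PacketAcquisitionEngine(ABC):
    """Hypothetical paengine interface: acquire packets, then dispose of them."""

    can_enforce = False  # read-only engines cannot carry out verdicts

    @abstractmethod
    def acquire(self):
        """Yield packets from the underlying mechanism."""

    def dispose(self, packet, verdict):
        if not self.can_enforce:
            warnings.warn("this paengine cannot enforce firewall verdicts")
            return False
        self._enforce(packet, verdict)
        return True


class ReadOnlyEngine(PacketAcquisitionEngine):
    """Stands in for a capture-only mechanism such as pcap."""

    def __init__(self, packets):
        self._packets = list(packets)

    def acquire(self):
        yield from self._packets


class QueueEngine(ReadOnlyEngine):
    """Stands in for a mechanism that lets user space decide a packet's fate."""

    can_enforce = True

    def __init__(self, packets):
        super().__init__(packets)
        self.forwarded = []

    def _enforce(self, packet, verdict):
        if verdict == ACCEPT:
            self.forwarded.append(packet)  # dropped packets are not forwarded


def run(engine, rule):
    """Acquire every packet and hand the rule's verdict back to the engine."""
    for packet in engine.acquire():
        engine.dispose(packet, rule(packet))
```

With a QueueEngine, a rule that returns DROP for a packet keeps it out of the forwarded list; with a ReadOnlyEngine, the same rule only triggers the warning.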

To reiterate, Hank is a complete user-space packet filter, capable of deciding to block single packets based on a rich matching ruleset. It combines the expressiveness of an NIDS with the power of a packet filter, surpassing both in utility. For this reason, Hank constitutes a true advance in network security software beyond the previous state of the art.

Such an advance, however, is not without complications. With an NIDS transformed from a passive observer to an active element in the network, simply describing attacks in rules is not enough; rules must also declare an action to be taken against packets that match. The temptation is simply to add such decisions to rules, just as you would add any other element. This, however, would be a mistake. Consider, after all, that the usual course is for an expert to write a rule once and for that rule to be widely distributed. How is the rule author supposed to make a single binary decision that is appropriate for all potential users of the rule? Some users may want to be very restrictive, whereas others will want only to stop the worst traffic, or none at all. The rule author can state the severity of the attack described, but the decision of whether to block the attack is a matter of local policy, and he must leave it to each site to decide.

How can this be allowed, however, without forcing each operator to revise the rule file by hand for each new release, clearly a very expensive burden? The answer is to allow the local operator to state his local policy, which can be one of a standard set of policies, and to use that policy automatically to transform the attack description into a series of events for Hank, according to the severity of each attack. This is exactly what Hank does using XSL Transformations (XSLT), as implemented in Daniel Veillard's libxslt. With XSLT's expansive ability to examine, test and transform XML documents, almost any policy can be expressed in a very simple language. Hank is distributed with a number of standard policies, any of which can be selected at run-time. The local policy is used to take the very rich rule declaration, which can include version information, severity, attack type, etc., and transform it into a tree of Hank events, which Hank then uses to process packets when they come in. The transformation rules can be simple ones based on severity or complex ones based on the version number of the rule, the target of the attack, or any other criteria specifiable with XSLT. Thus, rule writers can write good rules, administrators can pick a standard policy transform or write or customize their own, and everyone gets what they need.
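The effect of a policy transform can be illustrated without XSLT. The Python sketch below swaps in a plain function over ElementTree in place of a real stylesheet, and every element and attribute name in it is hypothetical; it only shows how one rule declaration can yield different event trees under different local policies.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical rich rule declarations, as a rule author might ship them.
RULES_XML = """
<rules>
  <rule name="probe" severity="2">
    <match field="tcp.dport" op="equal-to" value="79"/>
  </rule>
  <rule name="exploit" severity="9">
    <match field="tcp.dport" op="equal-to" value="111"/>
  </rule>
</rules>
"""


def apply_policy(rules_root, drop_at_or_above):
    """Turn rule declarations into an event tree under a severity policy.

    Every rule reports; only rules at or above the local severity
    threshold also get a firewall verdict of "drop".
    """
    events = ET.Element("events")
    for rule in rules_root.iter("rule"):
        event = ET.SubElement(events, "event", name=rule.get("name"))
        event.extend([copy.deepcopy(child) for child in rule])
        ET.SubElement(event, "report")
        if int(rule.get("severity")) >= drop_at_or_above:
            ET.SubElement(event, "verdict", action="drop")
    return events


rules = ET.fromstring(RULES_XML)
strict = apply_policy(rules, drop_at_or_above=1)    # block everything listed
lenient = apply_policy(rules, drop_at_or_above=8)   # block only the worst
```

The rule file itself never changes; only the threshold, which stands in for the site's chosen policy, differs between the two calls.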

I must here point out that XML is what makes all of this possible. To my mind, a programmer would have to be a fool to write his own parser today absent an exceedingly good reason not to use XML. The code for this entire system took a few evenings to write, most of which consisted of reading the libxml/libxslt documentation and experimenting. A custom solution would have taken months and yielded a less powerful result. Throw in document validation, the DOM interface, and namespace support, and the justification curve for ignoring XML becomes exceedingly steep. Just contemplate providing users with a language as expressive as XSLT for specifying policy transforms and you'll start to see the magnitude of work that must be undertaken if you wish to ignore XML and duplicate its functionality.

As I've mentioned, what Hank ends up seeing at the end of this process is simply a tree of events. Matching is one class of event, as are reporting and setting firewall verdicts. These event handlers are designed to be cascadable and to perform whatever function needs performing. It is my sincere hope not only to port the major snort plugins to Hank event handlers, but also to implement rate limiting, bridging, NAT and traffic normalization. There are some enhancements that need to be made to the event API before these projects can begin in earnest; specifically, there needs to be a robust and fast parameter passing mechanism. This is the main feature planned for the next release of Hank, and after that a thousand processor flowers may truly bloom in this former desert.


ACBM implementation

With the explosion of public research in NIDSs over the past few years, few researchers have been as active and as productive as the guys at Silicon Defense. In one of their fits of original thinking, they diagnosed a major performance problem with most NIDS programs in the fact that the content of packets is searched over and over again when looking for multiple patterns. They also proposed and implemented a solution for this problem: the use of an Aho-Corasick Boyer-Moore (ACBM) unified search tree to look for all patterns in a single pass through the data.

Unfortunately, the Silicon Defense ACBM implementation was both memory inefficient and very tightly tied to snort. Even more unfortunately and unbelievably, snort's maintainers didn't even integrate their work! However, their research was very instructive on how to solve this problem. As a part of Hank, I have independently implemented the ACBM search algorithm and use it for all content searches. (It was rough, too! I've never coded anything as difficult as the ACBM code, but eventually, after a month and a half of reading the algorithm description over and over, I got it working and am happy with the result.) As a result of this code, Hank needs to make only one or two passes (one pass each for caseful and caseless matching) over packet payloads searching for strings, regardless of how many patterns you have registered. With most NIDS programs, the number of passes, and therefore the total time cost, rises in proportion to the number of patterns. With Hank, more patterns merely require more memory; the speed of the search is mostly constrained by the length of the shortest pattern.

As Coit, Staniford and McAlerney point out in their paper, pattern matching is often the critical performance bottleneck in NIDS programs. With this implementation, Hank should be in a very competitive position when it comes to performance.
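To show why a unified search structure needs only one pass regardless of the number of patterns, here is a compact Python sketch of the classic Aho-Corasick automaton. It is deliberately the plain Aho-Corasick algorithm, not the ACBM hybrid with its Boyer-Moore-style skipping, and it is not Hank's implementation; function names are chosen for this example.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure and output tables for a set of byte patterns."""
    goto, fail, output = [{}], [0], [[]]
    for pattern in patterns:
        state = 0
        for byte in pattern:
            if byte not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][byte] = len(goto) - 1
            state = goto[state][byte]
        output[state].append(pattern)

    # Breadth-first pass: a state's failure link points to the longest
    # proper suffix of its path that is also a path from the root.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for byte, nxt in goto[state].items():
            queue.append(nxt)
            fallback = fail[state]
            while fallback and byte not in goto[fallback]:
                fallback = fail[fallback]
            fail[nxt] = goto[fallback].get(byte, 0)
            # Inherit patterns that end at the suffix state.
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def search(automaton, data):
    """Report (offset, pattern) for every match in a single pass over data."""
    goto, fail, output = automaton
    state, hits = 0, []
    for end, byte in enumerate(data):
        while state and byte not in goto[state]:
            state = fail[state]
        state = goto[state].get(byte, 0)
        for pattern in output[state]:
            hits.append((end - len(pattern) + 1, pattern))
    return hits


automaton = build_automaton([b"he", b"she", b"his", b"hers"])
hits = search(automaton, b"ushers")
```

Each payload byte is consumed once however many patterns are registered; extra patterns only add states to the tables. A caseless search could use a second automaton built from lowercased patterns and run over a lowercased copy of the payload, which mirrors the "one pass each for caseful and caseless matching" described above.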


Snort compatibility

What good is a house with no furniture? Hank needs rules in order to be useful. To that end, I have created a conversion script that takes snort rules and converts them to Hank rules. It was able to convert all of the rules in a recent snort ruleset save three, and those three were actually invalid rules that snort's parser accepts. (I reported the problems, and Fyodor Yarochkin has fixed them in snort.) I then used snot to generate packets that matched the rules. I ran both snort and Hank against the generated packets and, every time I discovered a rule that snort detected and Hank did not, I debugged the conversion script or Hank until it did. As a result, I can with high confidence say that today Hank is truly compatible with snort's ruleset.

Unfortunately, I can't say that it performs very well with these rules. With Hank, the onus is on the rule writer to structure the tests in an optimal way, and the present snort_converter.pl script does not do a good job of that. While my microbenchmarks of Hank's performance on individual operations indicate that Hank will be very competitive with snort in terms of performance, the rules will have to be tuned before this will happen. This is one of the main reasons that I do not yet suggest Hank's use in a production capacity.


Chapter 3. What the future holds for Hank

So, that's where Hank is now. What does the future hold for Hank? Lots of improvements.


Fine tuning of rules

There is one modification to the event system that needs to be made before the rules can be staged properly for good performance. I hope to have that modification done soon, the snort converter script updated and good rules available. Help with the project from others could speed this process.

Precisely speaking, the event system needs to be updated to allow parameter passing along with events, probably in both directions. With that change, I probably will go ahead and turn the packet itself into just another parameter. Additionally, the packet acquisition engines should be convertible to simple event handlers. If that happens, then not only will the Hank core not know about protocols, it won't know about networking, either! The big hold-up on this project is getting the XML syntax right; it's proven surprisingly slippery, but I hope to have it nailed soon.

Additionally, it should be possible at that point to start testing Hank's threaded mode, where events are dispatched in parallel. This capability has been built in from the get-go, with each event handler advertising its degree of threadability; non-reentrant handlers can be wrapped in a mutex, and reentrant handlers can be invoked with abandon. Depending on the threading mode (one-packet-per-thread at a time, one packet at a time, etc.), good results should be obtainable with the existing code, and better results should come from flagging performance-critical event handlers (like the matcher) as reentrant, which will happen once I'm absolutely sure that they really are.
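The threadability scheme can be sketched as follows in Python. This is only an illustration of the idea: the Handler wrapper, the reentrant flag and the one-packet-per-thread dispatch loop are assumptions for this example, not Hank's event API.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Handler:
    """Wrap an event handler; non-reentrant handlers get their own mutex."""

    def __init__(self, func, reentrant):
        self.func = func
        self.reentrant = reentrant
        self._lock = None if reentrant else threading.Lock()

    def __call__(self, packet):
        if self._lock is None:
            return self.func(packet)      # safe to run in parallel
        with self._lock:                  # serialize non-reentrant code
            return self.func(packet)


def dispatch(handlers, packets, workers=4):
    """Process packets in parallel, one packet per worker thread at a time."""
    def process(packet):
        return [handler(packet) for handler in handlers]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process, packets))


seen = {"count": 0}


def count_packet(packet):
    seen["count"] += 1  # shared state: relies on the wrapper's mutex
    return seen["count"]


def payload_length(packet):
    return len(packet)  # touches no shared state, so it is reentrant


handlers = [
    Handler(count_packet, reentrant=False),
    Handler(payload_length, reentrant=True),
]
results = dispatch(handlers, [b"a", b"bb", b"ccc"])
```

Flagging a handler as reentrant simply removes the lock from its call path, which is why it pays to flag the performance-critical handlers once they are known to be safe.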


Rule management infrastructure

Rule management should be easy, much easier than it is right now. Fortunately, since Hank rules are XML files, there is a large and growing body of tools available to manage them. Two XML technologies in particular stand out as holding promise for Hank. Jabber, the open-source chat system, might be a good platform for distributing rules. With its peer-to-peer nature, it might be possible to engineer a system where individuals can submit new rules, these rules can be tested and vouched for, and they can automatically be added to Hank installations depending on the local policy constraints, such as what score a given new rule must have, its severity level, whether it's been vouched for by one (or more) of a set of trusted authorities, etc. Such a system could allow for much faster rule addition as well as much more review of the rulesets, goals which under the status quo are in conflict with each other. Second, and relatedly, a security framework to guarantee the integrity and authenticity of Hank rules will be important. While a simple PGP-based solution may suffice for the time being, in the long run I am hopeful that the W3C's efforts to add signatures to XML will bear fruit useful for Hank.

Further in the future, I hope to add a new practice to the NIDS arsenal: dynamic rule addition. In designing Hank's internal APIs, I have tried to keep them clear of any obstacles to adding and removing rules at run-time, and several of the subsystems have active provisions to allow for dynamic rule addition. Hopefully, at some point in the future, one will be able to feed a live stream of rule modifications into a running instance of Hank without the need for restarting, a distasteful operation for all NIDSs.


Write plugins!

Once the updated event system is available, it should be possible to start writing Hank plugins in earnest without worrying about the APIs changing from underneath you. At that point, wholesale porting of the Snort plugins will commence. Further, I am excited about supporting the notion of packet normalization, and so that also might be a sooner-rather-than-later feature.


Needless microoptimizations

Once all of the important optimizations have been written, I really want to do some needless microoptimizations. I keep looking in the Linux kernel tree at their MMX-optimized page clearing functions and drooling; mmm, yummy SIMD goodness. If Intel or AMD gives me a fast SMP system to replace my dual 200MHz box, then I will start immediately on 3DNow!/MMX/SSE-optimized versions of memcpy, memset, and a function that I've never seen but I could really use, strnchr. (The last one will dramatically speed up HTTP protocol decomposition by allowing you to examine 16 characters at a time looking for that damned newline.) My hope is that everything else can be tuned to the point where these routines become the performance bottleneck, and that this will happen soon. I am saving them until that time, however, so that I will stay motivated to solve the important tuning problems first. 8^)


Chapter 4. Conclusion

So, that is Hank. I've been working on it since last spring, and I'm pretty happy with it. It's not perfect yet, but as I've said above, I think that its design represents several major advances and that its problems are superficial ones, typical of the early stages of a project. It has reached the point that I think that experienced programmers and network security administrators can look at it and make informed decisions concerning its design and promise. To that end, I encourage other network security researchers, NIDS programmers and users, and anyone else interested to download Hank, look at it, test it, and think about the problems it tackles. Hopefully a few people will be interested enough to help work on it, but criticism alone would be a big help as I try to keep my thinking fresh on how to proceed.

Hank grew out of my own frustration in trying to bridge the gap between NIDS and firewalls. After an attempt to adapt an existing program to solve the problem, I became convinced that a fresh approach could result in a substantially improved system. My experiences so far have reinforced that belief, and I think that Hank does represent some significant advances. I certainly hope that Hank will triumph and become the dominant program in this space, widely used and widely enjoyed. Whatever the outcome that time holds, however, just building it has been a rewarding experience in its own right, and I hope that others find it similarly rewarding.