I haven’t posted any security related posts in a while. Richard Bejtlich recently wrote an article illustrating a general event model of an analysts reaction using just
an IDS vs an NSM based operation. I thought it would be a great opportunity to illustrate that process with an example to go along with the "Elvis" slides. The following is based on an incident, and will illustrate the NSM investigative process and using data to respond to the incident and make a business decision to modify network policy.
I’m cleaning out old events that have accumulated, and setting back up our sensor after an OS update deprecated some of the shared libraries used by Snort and SANCP. Once I got everything up and running, I saw a whole slew of SSH connection attempts kicked off by Snort. This is a fairly common event, and based on experience I know this is just an automated scan looking for open SSH ports and trying to brute force them.
Now, if this was only an IDS setup, I would only see an alert that an attack was underway. The only information I would have available would be the alert, and any logs. However, because Sguil was designed around the NSM principles, I had more information than that.
The first thing I did was query the event table for the potential attackers IP, and I can see that the connection attempts started at around 6:00 AM, and went on for 23 minutes. At this point, I have a pretty good idea that this is some sort of brute force attack, but not enough information to be sure. I need to investigate further.
The next thing I do is query the SANCP table for this IP address to see all the sessions the attacker has generated. More hits come back for this than alerts, which indicate there were connection attempts that did not trigger alerts because they did not match any of the signatures in Snort. This is an important distinction as to why an IDS or IPS system are not foolproof and shouldn’t be seen as security silver bullets. There were no sessions prior to this series of events, so this is our attackers first visit. In the below screenshot you can see there is a single session before 6:00 AM. This is probably the scan looking for live targets to attack. After this single session with no data sent or received, the connection attempts begin. I can tell by the source and destination data size that nothing more than the initial key exchange and login attempt ever gets done before closing the connection.
I investigate further and pull a transcript, which is a full content of the session. As expected, the data is encrypted because it is SSH traffic. But that doesn’t dead-end the investigation, I still have log data to analyze. I grep the auth.log file on the system in question with the IP address of the attacker. Sure enough, I see that this attacker is indeed trying to brute force our machine with different user and password combinations. The first log entry for the attackers IP shows that they did not send anything, solidifying my original theory that the first session in the SANCP query was to look for live targets.
This illustrates an important point. No 1 tool can handle everything, or give enough of a picture. As I understand it (and it may be a little off, I learned this stuff almost 8 years ago now), it's using all the tools at your disposal to get a big picture view of whats going on in your network, and be able to gather evidence as to make informed decision. Somewhat like Business Intelligence, but in a security perspective.
So at this point, being that I am a decision maker, I decide to block this IP from accessing the system further. I implement an IPTables rules to block them from any further connection attempts. To further reduce exposure, I may disable password authentication, and do white list rules to allow only known machines to connect. I could have blocked this at the firewall, but I didn't, which leads to the next scenario.
Now, this would have been the end of it, except I started to see alerts trigger from another one of our systems. Apparently this other system also had it’s SSH port exposed for remote logins. At this point, I am now aware of a policy violation on the network. So, while this was the exact same attacker just moving on to the next IP address in their scan, I have an entirely different scenario. I have information that leads me to make the decision to modify the firewall rules to close the firewall hole that is exposing this second machine.