Jordan LaRose, Incident Response Consultant
16 mins read
To protect the identities of those involved, all stories in our True Forensics series are dramatizations of events, using a mixture of cases to form the basis of the narrative.
Every movement an attacker makes leaves threads of data with their own narrative, unravelling the nature of an incident like a documented testimony. Just like the data collected at a physical crime scene, it’s delicate, degradable over time, and often hard to find. It demands that responders know where to search under pressure and how to make critical decisions at the right time. This article is a story about forensics—the art of investigation within incident response (IR).
Our client was a 150,000-strong corporation specializing in business-to-consumer (B2C) messaging software, with global operations. Engagement began some time after the compromise of a Linux web server used by its customers. This platform allowed for bulk business messaging to thousands of recipients at a time and was internet-facing, accessible publicly via login.
All hands were already on deck when we joined the response. The client’s attack surface was extremely broad due to the nature of their service, and it was clear from the start we would need to leverage investigators around the clock to combat the incident. The CISO, Head of IT, and other leaders from across the business were shoulder-to-shoulder with their own teams fighting the chaos the attacker had unleashed.
Although our recent work in the anti-phishing space made us a natural fit for the investigation, a sense of the challenges to come was palpable right away. The attacker had deployed a far-reaching phishing campaign through the platform by hijacking user accounts for some of the client’s most high-profile customers. They had compromised numerous customers and likely had a foothold on the main web server itself. Over 200,000 phishing messages were sent within just 1 hour. By the end of the first day, that number exceeded 1,000,000. Significant reputational harm had been done, and the client’s customers were naturally demanding a stop to the phishing activity being initiated on their behalf.
Before any forensic collection and analysis could take place, we provided recommendations for immediate containment precautions whilst the attacker continued their service abuse. The CISO and their team prepared the firewalls for rule additions and created user account credential reset scripts, while our attention turned quickly to the evidence available. If the client was to have any chance of locating and stopping the attacker, a thorough forensic investigation would be needed to understand their point of entry, their lateral movement strategy, their goal, and—most importantly—their next move.
Initial analysis led us to realize that the attacker had been present on the network for much longer than the month’s worth of data the client had retained and could account for. With the server being customer-facing, traffic was high, rolling over logs faster than they could be gathered. This restricted what we could learn about the attacker’s tactics, techniques, and procedures (TTPs), or their motives and capability; we had almost nothing to build a picture from. The Apache API logs showed numerous accounts generating activity from the attacker’s IP addresses, but despite tracing these all the way back to the start of the logs, there was no sign of their entry point. The attacker had clearly made efforts to cover their tracks and delete evidence on the server that would otherwise have indicated their origin and intention. There were gaps in bash history, missing syslog files, and plenty of “set +o history”-like commands from different users.
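Anti-forensics traces like these are often easier to spot programmatically than by eye. As a rough illustration—the pattern list and history fragment below are fabricated assumptions, not the engagement’s exact indicators—a recovered shell history can be scanned for common cover-track commands:

```python
import re

# Illustrative (not exhaustive) patterns for history tampering and log deletion.
ANTI_FORENSICS = re.compile(
    r"set \+o history|unset HISTFILE|HISTSIZE=0|history -c|rm .*(\.log|syslog)"
)

def find_anti_forensics(history_lines):
    """Return (line_number, command) pairs matching known cover-track commands."""
    return [(n, line) for n, line in enumerate(history_lines, 1)
            if ANTI_FORENSICS.search(line)]

# Fabricated history fragment for demonstration:
history = ["cd /var/log", "rm auth.log", "set +o history", "exit"]
hits = find_anti_forensics(history)   # flags lines 2 and 3
```

In practice the same sweep would run across every user’s recovered history, with hits cross-referenced against login times to show which sessions the attacker was cleaning up after.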
We pieced together what we could from the remnants of any evidence available—building phishing timelines from the API logs, guessing the attacker’s route in from firewall data—but we still didn’t have the smoking gun, just unanswered (and unanswerable) questions. What we did find couldn’t be used to stop further reputational damage, and stakeholders were growing restless. Between the webserver logs, the Linux server’s piecemeal system log files, and some network data, we could only create a pallid picture of an attacker. There were few answers as to how they were maintaining persistence or how to stop them. The stakes were high. We had a canny attacker hiding in plain sight thanks to the high traffic on the server. They had gained almost full control of the estate and we didn’t have a lead.
Experience has shown me that sometimes there really is no way forward to tie up an investigation cleanly and eradicate the threat there and then. Sometimes “now” just isn’t the moment you’ve been waiting for. That’s the judgement call you have to make. This time though, that experience told me we had missed something. If there’s one thing that you can rely on in security, it’s persistence.
In cyber forensics, it’s essential to recognize when you’ve reached a dead end, retrace your steps, then try a different route. The client’s IT team was being inundated with questions and complaints from customers affected by the attack. We had to find a way to nail down the attacker before the situation spiraled out of control. So, instead of pursuing sterile hypotheses, we turned to a deeper analysis of the Linux server. This had been imaged to provide a snapshot of the used and dead space on its 2TB hard drive. We’d gleaned little from our initial surface inspection of the server image; it would also have been a fair guess that the remainder of the disk space was filled with meaningless information from weeks of data rollover. Still, when there is little to go on, opportunities only come when you persevere.
Although it was a high-traffic server, a significant portion of the disk was not in use. Here, we hoped to find some data in the disk’s swap space to restore and identify the initial date of compromise. We took it apart and struck gold...
To explain how, the layout of a Linux disk needs to be understood first. These disks typically carry a secondary partition called “swap”, used only when physical RAM is full: when a Linux system runs low on memory, inactive pages are moved from RAM into the swap space. The swap essentially becomes an extra buffer, used to temporarily write and read memory. Swap space can be a bit “wild west” from a forensics perspective because of the ad-hoc way it is written to; you’re just as likely to come across a thousandth of a file as you are the whole file. And yet, it can be useful for the following reason: space on the hard disk is constantly overwritten and rolled over as files are deleted and added, and this is even more true of RAM, whose contents may last just a couple of hours. Because swap space acts as a backup of memory, artefacts there are less susceptible to being overwritten.
We investigated the swap partition, revealing the time the attacker had landed on the client’s network via an artefact left by dmesg. For the reader’s understanding, dmesg entries on the swap partition tend to be of little use to IR because of the volume of background data dmesg collects and the randomness of what’s written to swap. As luck would have it though, a crucial piece of data had been written into the swap space after overflowing from memory, showing us an SSH login from an IP address we knew the attacker was using as a platform for attack. Suddenly, we had the initial logon showing when they had first accessed the server—9 months prior to the case being opened. Now we had a trail to follow.
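Carving this kind of fragment out of a raw swap image works much like running strings(1) over the partition and filtering for log-shaped lines. A minimal sketch with fabricated patterns and data (the real artefact was a dmesg fragment, but the same approach applies to any log-like text):

```python
import re

# Carve printable ASCII runs (strings(1)-style) from raw swap bytes,
# then keep only runs that look like sshd authentication messages.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{8,}")
SSH_LOGIN = re.compile(r"sshd\[\d+\]: Accepted (password|publickey) for \S+ from \S+")

def carve_ssh_fragments(raw: bytes):
    """Yield decoded printable runs containing sshd login messages."""
    for m in PRINTABLE_RUN.finditer(raw):
        text = m.group().decode("ascii")
        if SSH_LOGIN.search(text):
            yield text

# Fabricated swap-image excerpt (IP is an RFC 5737 documentation address):
raw = b"\x00\x01sshd[4211]: Accepted password for admin from 203.0.113.5 port 51122\x00junk\x00"
fragments = list(carve_ssh_fragments(raw))
```

On a real 2TB image you would stream the file in overlapping chunks rather than load it whole, but the carve-then-filter logic is the same.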
The beauty of forensics is that once you get a good lead, things start to fall into place. With our swap space lead, we could go back and look at other data around that initial date. Information that had no value previously was given meaning and transformed into the chapters of a clear story. We took this account data and timeframe and used it to contextualize activity from across the server image and its slack space. The disk was far too large to examine byte by byte, which was why the dmesg entry had eluded our earlier sweeps of the slack space. Now that we could crawl through the key indicators of compromise and use them to fingerprint activity, however, we found trace after trace of the attacker’s movements. The team could finally reconstruct their attack path.
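That fingerprinting step can be as simple as intersecting recovered events with the known indicators and the newly established timeframe. A sketch with fabricated indicator values and events:

```python
from datetime import datetime

# Fabricated indicators: attacker source IPs and compromised accounts.
ATTACKER_IPS = {"203.0.113.5", "198.51.100.77"}
COMPROMISED_USERS = {"svc_admin", "api_batch"}

def likely_attacker_activity(events, start, end):
    """Keep events inside the compromise window that touch a known indicator.
    Each event is a dict: {"time": datetime, "ip": str, "user": str, "line": str}."""
    return [e for e in events
            if start <= e["time"] <= end
            and (e["ip"] in ATTACKER_IPS or e["user"] in COMPROMISED_USERS)]

events = [
    {"time": datetime(2022, 1, 10), "ip": "203.0.113.5", "user": "svc_admin",
     "line": "API bulk-send request"},
    {"time": datetime(2021, 6, 1), "ip": "203.0.113.5", "user": "svc_admin",
     "line": "outside the window"},
    {"time": datetime(2022, 1, 11), "ip": "192.0.2.9", "user": "jane",
     "line": "legitimate customer traffic"},
]
window = likely_attacker_activity(events, datetime(2022, 1, 1), datetime(2022, 2, 1))
```

Each pass over the image with a refined indicator set surfaces new events, which in turn yield new indicators—the loop that turned previously meaningless data into an attack path.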
The attacker was smart. It was evident from the dmesg entries that after deploying an SSH brute-force attack to credential spray the server, they had acquired administrator credentials and logged on. From there, they established persistence with backdoor SSH keys, acquired customer data, and began their phishing campaign. Once we knew which accounts had been initially compromised, we could see in other places (like authorized_keys) where they were logging in to tweak information or position their attack. What before seemed like an admin logging in to maintain the server, or the API handling what looked like test requests from regular customers, turned into important events around the initial user compromise.
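Backdoor SSH keys of this kind can be surfaced by diffing each account’s authorized_keys against a known-good baseline. A minimal sketch (all key material below is a fabricated placeholder):

```python
def find_backdoor_keys(current_keys, baseline_keys):
    """Return keys present on the server but absent from the approved baseline."""
    return sorted(set(current_keys) - set(baseline_keys))

# Fabricated key sets for demonstration:
baseline = {"ssh-ed25519 AAAAC3Example1 admin@corp"}
current = {
    "ssh-ed25519 AAAAC3Example1 admin@corp",
    "ssh-rsa AAAAB3Backdoor attacker@unknown",
}
suspects = find_backdoor_keys(current, baseline)   # the planted key
```

The catch, of course, is having a trustworthy baseline; without one, each unexplained key has to be attributed by hand against the compromise timeline.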
We’d found everything we needed to take back the server from the attacker: what accounts they had, where they were coming from, and the tricks they had used to hide themselves from the start. This built a complete picture of the opportunist attacker that had taken a successful SSH login and run with it. In this case, they had managed to run all the way to phishing millions of customers across multiple countries in a cash-grab scheme to steal bank account information under the guise of a trusted service.
Now armed with solid evidence, we were able to create a containment plan to eradicate the attacker from the server and stop them getting back in. Evidence from the server indicated a need to target every account the attacker had compromised, on both the server and its APIs, and then remove the potential for another brute-force attack. We reset every affected account, rotated API keys, implemented key-based SSH authentication, and blocked the attacker’s command and control (C2) IPs at the network level within the span of a few hours. It worked, stopping the attacker dead in their tracks.
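The key-based SSH authentication piece of that plan typically comes down to a small sshd_config change; a minimal sketch (exact directives vary by distribution and OpenSSH version, so verify against your own defaults):

```
# /etc/ssh/sshd_config — minimal hardening sketch
PasswordAuthentication no          # removes the password brute-force surface
PubkeyAuthentication yes           # keys only
PermitRootLogin prohibit-password  # no root password logins
```

followed by a reload of the SSH daemon (e.g. `systemctl reload sshd`), ideally with a working key-based session kept open to avoid locking yourself out.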
If we had made our move prematurely or with too little information, we could have failed. Moving to containment and eradication at the wrong time may have easily led to the attacker withdrawing, before leveraging their backdoors on the server to come right back at us with a new set of tactics. In reprisal, they could have destroyed the platform altogether. But with all the evidence under our belt, we were able to confidently contain them and destroy any of their hopes to regain control of the server.
An effective incident responder relies on experience to make difficult decisions, using their own knowledge and any artefacts of the truth left behind in an attack. Our choice to check the swap space in this incident was a lifeline when we desperately needed a lead. 99% of the time, that data wouldn’t even be worth looking at, but it was the one thing that opened up this case. As Sam Wilde says in Born to Kill, “I don’t like gambling very much. I don’t like being at the mercy of those little white squares that roll around and decide whether you win or lose. I like to have the say-so myself.” When IR teams find themselves choosing between digging deeper and throwing in the towel, time is the factor they must weigh most heavily. While it can be tempting to keep digging until everything is known, time is the enemy in an incident. When we find our moment to make a good choice, we take it. We should always be the ones who choose, not the attacker.
As far as true crime stories go, this one doesn’t end with a private investigator lighting one last cigarette with his betrayed comrade, but it does end happily. Those two crucial decisions—the one to take a deep dive into the server and the one that pulled it all together through the swap space—were the make-or-break moments for the investigation. Without being unearthed and understood for their relevance through the forensics process, this case could have just as easily ended with a metaphorical gunshot in a dark room. Two things that separate investigations that succeed from those that don’t are a technical awareness and the ability to see potential where everyone else thinks there's none.