This blog is part of our SOCwise series, where we dig into all things related to SecOps from a practitioner’s point of view, helping defenders build both context and confidence in what they do.
Although there’s been a lot of chatter about supply chain attacks, we’re going to bring you a slightly different perspective. Instead of talking about the technique itself, let’s talk about what it means to a SOC, focusing in particular on the SUNBURST attack, in which the adversary leveraged a trusted application from SolarWinds.
Below is the riveting discussion between our very own Ismael Valenzuela and Michael Leland, in which they talk about supply chain attacks and the premise behind them, why this one in particular was so successful, and, lastly, best practices for hardening, prevention, and early detection.
Michael: Ismael, let’s start by talking a little bit about the common types of supply chain attacks. We know from past experience that they’ve primarily been software-based, though it’s not unheard of to have hardware-based supply chain attacks as well. But really, it’s about hijacking or masquerading as a vendor or a trusted supplier and injecting malicious code into trusted, authorized applications, sometimes even hijacking the signing certificate to make it look legitimate. And this last one was about injecting into third-party components that organizations already trust.
In relation to SUNBURST, it was a long game, right? This was an adversary long-game attack where they had over 12 months to plan, stage, deploy, weaponize, and reap the benefits. We’re going to talk more about what they did, but more importantly about how we as practitioners can leverage the sources of telemetry we have for both detection and, hopefully, future prevention. The first question most people ask is: is this new? Clearly it isn’t a new technique or tactic, but let’s talk a little bit about why this one was different.
Ismael: Right! The most interesting piece about SolarWinds is not so much that it is a supply chain attack, because as you said, that’s not new. We’ve seen similar things in the past. I know there’s a lot of controversy around some of them, like Supermicro, over the last few years, and these types of attacks are difficult to prove. But to me, the most interesting piece is not just how it got into the environment; we already talked about malicious updates to legitimate applications. For example, we’ve seen some of that in the past with code being modified on GitHub, right? Unprotected repositories, where threat actors modify the code.
We’re going to talk a little bit about what organizations can do to identify these, but what I really want to highlight is that the attackers had a plan, right? They compromised the environment carefully, they stayed dormant for about two weeks, and after that, as we have seen in recent research, they started to deploy second-stage payloads. The way they did that was very, very interesting, and it’s changing the game. It’s not radically new, but there’s always something new that we may not have seen before. And it’s important for defenders to understand these behaviors so they can start trying to detect them. In summary, they had a plan, and we should ask ourselves if we have a plan for this type of attack, not only for the initial vector but also for what happens after that.
Michael: Let’s take a look at the timeline (figure 1 below) and talk about the story arc of what took place. I think the important thing is, again, that the adversary had this planned out long before the attack, long before the weaponization of the application, long before the deployment. They knew they were going after a very specific vendor, in this case SolarWinds. As far back as 2018 or early 2019 they had already registered a domain for it, and they didn’t even give it a DNS lookup until almost a year later. The code made it into the application in 2019, and the weaponization came in 2020. We’re talking about months, almost a year, of time passing, and they knew very well going into it what their intent was.
Ismael: Yep, absolutely. And as I mentioned before, even once they have the backdoor in place, the infamous DLL, it stays dormant for about two weeks. Then they start careful reconnaissance and discovery, trying to find out where they are, what type of information is around them, the users, and the identity management in place. In some cases we have seen them stealing tokens and credentials and then pivoting to the cloud. All of that takes time, right? Which indicates that the attacker has a lot of knowledge about how to do this in a stealthy way. But if we think in terms of attack chains, it also helps us understand where we have better opportunities to catch these types of activities.
Michael: We’ve set the stage to understand what exactly took place, and a lot of people have talked about the methodology and the attack life cycle. But they had a plan. They weren’t especially advanced in the way they leveraged the tools; rather, they were very deliberate about combining multiple somewhat novel methods to make use of the vulnerability. More important was the amount of effort they put into planning and the amount of time they spent trying not to be seen. We look at telemetry all the time, whether it’s in a SIEM tool or an EDR tool, and we need those pieces of telemetry that tell us what’s happening, and they were very stealthy in the way they leveraged their techniques.
Let’s talk a little bit about what they did that was unique to this specific attack and then we’ll talk more about how we can better define our defenses and prevention around what we learned.
Ismael: Yep, absolutely! And one of the interesting things we have seen recently is how they disassociated stage one and stage two to make sure that stage one, the backdoor DLL, wasn’t going to be detected or burnt. So once again, you were talking about the long game. They were planning and architecting their attack for the long game. Even if you found an artifact on a specific machine, it would be hard for you to trace it back to the original backdoor. So they could maintain persistence in the environment for quite some time. I know this is not necessarily new. We have been telling defenders for a long time: you need to focus on finding persistence, because attackers need to stay in the environment.
We need to look at command and control, but obviously these techniques are evolving. They went to great lengths to ensure that the artifacts, the indicators of compromise, on each of these different systems for stage two were unique; at this point we know they used Cobalt Strike beacons. Each of these beacons was unique, not just for each organization, which would make sense, but for each computer within each organization. What does that mean for a SOC? Well, imagine you’re doing incident response and you find some odd behavior coming out of a machine. You look at the indicators, and what are you going to do next? Scoping, right? Let’s see where else in my network I’m seeing activity going to that domain, those IPs, those registry keys, or that WMI consumer, for example. But the truth is that those indicators were not used anywhere else, not even in your own environment. So that was interesting.
Michael: Given that we don’t have specific indicators we can attribute to something malicious at that stage, what we do know is that they leveraged common protocols in an uncommon way. Much of the command and control, and the partial exfiltration, was done over DNS. For organizations that aren’t effectively monitoring DNS traffic, DNS on non-standard ports or, more importantly, the volume of DNS originating from machines that don’t typically generate it, volumetric analysis can tell us a lot, if in fact there’s some heuristic value we can leverage for detection. What else should we be thinking about on the protection side of things, given this abuse of trust?
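As a rough illustration of the volumetric DNS analysis Michael describes, a minimal sketch along these lines could flag hosts with unusually high DNS query counts or DNS on non-standard ports. The field names (src_host, dst_port, qname) are assumptions about how DNS logs might be parsed, not any specific product’s schema:

```python
# Illustrative sketch only: flag hosts with unusually high DNS query volume
# or DNS traffic on non-standard ports. Field names are hypothetical and
# depend on your own log pipeline.
from collections import Counter
from statistics import mean, stdev

def dns_anomalies(dns_records, min_hosts=5, z_threshold=3.0):
    """dns_records: iterable of dicts like
       {"src_host": "ws012", "dst_port": 53, "qname": "example.com"}"""
    per_host = Counter(r["src_host"] for r in dns_records)
    odd_ports = {r["src_host"] for r in dns_records if r["dst_port"] != 53}

    findings = set(odd_ports)
    if len(per_host) >= min_hosts:
        counts = list(per_host.values())
        mu, sigma = mean(counts), stdev(counts)
        for host, n in per_host.items():
            # A simple z-score cut-off; real baselines would be per-host and time-aware.
            if sigma and (n - mu) / sigma > z_threshold:
                findings.add(host)
    return sorted(findings)
```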
We trusted an application; we trusted a vendor. This was a clear abuse of that trust. Zero trust is one methodology that can help, incorporating micro-segmentation, explicit verification and, more importantly, least privilege. I also think about the fact that we give these applications rights and privileges in our environment, including administrative privileges. We need to make sure we’re monitoring both the user accounts and the service accounts those applications use, specifically so we can put walls and barriers around what they have access to. What else can we do in terms of detection or visibility for these types of attacks?
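One way to act on the service-account monitoring Michael mentions is to compare where an application’s accounts authenticate from against where they are expected to. The sketch below is only an illustration; the account names, host allow-lists, and event fields are all hypothetical:

```python
# Illustrative sketch: alert when a known application service account
# authenticates from a host outside its expected set. Names are hypothetical.
EXPECTED = {
    "svc_orion": {"orion-app01", "orion-db01"},   # hypothetical allow-list
    "svc_backup": {"backup01"},
}

def unexpected_service_logons(logon_events):
    """logon_events: iterable of dicts like
       {"account": "svc_orion", "src_host": "ws042", "logon_type": "network"}"""
    alerts = []
    for ev in logon_events:
        allowed = EXPECTED.get(ev["account"])
        # Only evaluate accounts we have an allow-list for.
        if allowed is not None and ev["src_host"] not in allowed:
            alerts.append(ev)
    return alerts
```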
Ismael: When we’re talking about a complicated or advanced attack, I like to think in terms of frameworks like the NIST Cybersecurity Framework, for example, which talks about protection, detection, and response, but also about identifying risks and assets first. If you look at it from that perspective and look at an attack chain, even though some aspects of this attack were very advanced, there are always limitations from the attacker’s perspective. There’s no such thing as the perfect attack, so beware of the perfect-attack fallacy. There’s always something the attacker does that can help you detect them. With that in mind, think about putting the MITRE ATT&CK behaviors, the tactics and techniques, on one side of a matrix and, on the other side, the NIST Cybersecurity Framework functions: identify, protect, detect, and so on.
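To make the matrix idea concrete, one could model it as a simple coverage map of ATT&CK tactics against CSF functions and look for empty cells. The entries below are examples only, not a complete or authoritative mapping:

```python
# Illustrative sketch of the matrix idea: ATT&CK tactics on one axis,
# NIST CSF functions on the other, with the controls or data sources you
# already have filled into the cells.
coverage = {
    "Initial Access":      {"Protect": ["software update policy"], "Detect": ["EDR alerts"]},
    "Persistence":         {"Detect": ["scheduled task / WMI consumer monitoring"]},
    "Command and Control": {"Detect": ["DNS analytics", "proxy logs"]},
    "Credential Access":   {"Protect": ["tiered admin model"], "Detect": ["DC auth logs"]},
}

def gaps(matrix, functions=("Protect", "Detect", "Respond")):
    """List tactic/function cells with no mapped control or telemetry."""
    return [(tactic, fn) for tactic, cells in matrix.items()
            for fn in functions if not cells.get(fn)]

print(gaps(coverage))  # cells with no coverage, e.g. ('Initial Access', 'Respond')
```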
One of the things I would suggest is identifying the assets at risk, and I always talk about BCP, business continuity planning. Sometimes we work in silos and don’t leverage information that already exists in the organization and that can point you to the crown jewels. You can’t protect everything, but you need to know what to protect and how the information flows. For example, where are your soft spots, where are your vendors’ products located on the network, and how do they get updated? That will help you define a defensible, secure architecture that protects what matters: the flow of the data.
Protection will fail at some point. It could be a firewall rule; it can be any type of protection. Attempts to bypass those protections can be turned into detections. Visibility across your environment is very important, and that doesn’t mean just managed devices; it also means the network, the endpoints, and the servers. Attackers are going to go after the servers, the domain controllers, right? Why? Because they want to steal those credentials, those identities used somewhere else, and maybe pivot to the cloud. So having enough visibility across the network is important, which means having the cameras pointed at the right places. That’s where EDR or XDR come into play: products that keep that telemetry and give you visibility into what’s going on, and potentially detect the attack.
Michael: I think it’s important, as we conclude our discussion, to talk about the fact that telemetry comes in various flavors. Both real-time and historical telemetry are of significant value, not only on the detection side but also on the forensic investigation and scoping side, to understand exactly where an adversary may have landed. It’s not just about having telemetry accessible; sometimes the lack of telemetry is the indicator. When logging gets disabled on a device and we stop hearing from it, the SIEM starts seeing a gap in its visibility into that specific asset. That’s why we need a combination of real-time endpoint protection technologies deployed on both endpoints and servers, and the historical telemetry we typically consume in our analytics frameworks and technologies like SIEM.
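The “lack of telemetry” signal Michael describes can be approximated by comparing the assets seen recently in the SIEM against an asset inventory. This is a minimal sketch, assuming you already track the last time each asset sent an event; the names and thresholds are illustrative:

```python
# Illustrative sketch: a host that suddenly stops sending logs can itself be a signal.
from datetime import datetime, timedelta

def silent_assets(last_seen, inventory, max_silence=timedelta(hours=1), now=None):
    """last_seen: dict of asset -> datetime of the most recent event received.
       inventory: iterable of assets that are expected to report."""
    now = now or datetime.utcnow()
    quiet = []
    for asset in inventory:
        seen = last_seen.get(asset)
        # Never seen, or silent for longer than the allowed window.
        if seen is None or now - seen > max_silence:
            quiet.append(asset)
    return quiet
```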
Ismael: Absolutely, and to reiterate the point: find the places where attackers are going to be and where they can be spotted more easily. If you look at the whole attack chain, maybe the initial vector is harder to find, but start looking at how they got privileges, their escalation, and their persistence. Michael, you mentioned logging being disabled; apparently they were disabling audit logging by using auditpol on the endpoint, or creating new firewall rules on the endpoints. If you consume these events, ask yourself: why would somebody disable event logging temporarily, turning it off and then back on again after some time? Well, they were doing it for a reason.
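A simple way to hunt for the off-then-on pattern Ismael describes is to look for auditpol activity that disables auditing on a host and is followed, within a short window, by a re-enable on the same host. This is only a sketch over assumed process-creation telemetry; field names and the exact auditpol arguments you see will vary by environment:

```python
# Illustrative sketch: flag short windows where auditing is turned off and back
# on again on the same host, based on parsed process-creation events.
from datetime import timedelta

def audit_toggle_windows(proc_events, max_gap=timedelta(hours=6)):
    """proc_events: time-sorted dicts like
       {"host": "srv01", "time": datetime(...), "cmdline": "auditpol /set ..."}"""
    pending = {}    # host -> time auditing was disabled
    findings = []
    for ev in proc_events:
        cmd = ev["cmdline"].lower()
        if "auditpol" not in cmd:
            continue
        if "disable" in cmd:
            pending[ev["host"]] = ev["time"]
        elif "enable" in cmd and ev["host"] in pending:
            off_at = pending.pop(ev["host"])
            if ev["time"] - off_at <= max_gap:
                findings.append((ev["host"], off_at, ev["time"]))
    return findings
```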
Michael: Right. That concludes our discussion; hopefully it was informative. Please subscribe to our Securing Tomorrow blog, where you can keep up to date with all things SOC-related, and feel free to visit McAfee.com/SOCwise for more SOC material from our experts.