Updated: Aug 11
In the first part of this blog post, I described the in-house security system we built ourselves, which allows us to maintain our security posture across the entire scale of the Wix environment - continuously and efficiently.
While developing a full-blown system is not an option for every organization, the journey and the different phases we went through to get there can, and we believe should, be shared with our peers. After all, it’s a journey that can be repeated in any organization, even without much development work!
Let’s get into these phases, the methodology that guided us, and the outcomes of our security journey.
So, what was your first step?
The very first thing you have to do is get a proper aggregation tool that can collect data from your infrastructure - whether it is the cloud, virtualization, or any other solution.
Ideally, you’d want the tool to also let you query raw data from the infrastructure (and not just processed alerts), since that allows you to define your own set of rules and policies.
The tool should be able to get data from several sources, including:
Users, Roles, Groups etc.
Network Devices & Resources
At this stage you should be looking for a tool that would connect to the API layer of the infrastructure management/console and fetch data to a centralized place.
When reviewing the different tools available, consider choosing the one that can fetch data from as many components as you have in your infrastructure - not limited to just AWS or GCP. Think of your security systems, your office networks, third-party SaaS applications, and so on.
Imagine having rich data for each compute instance running in your infrastructure, available with a single click.
It is also very important to make sure that such a tool has good visualization features (charts, graphs, etc.) and that it supports integrations with ticketing systems, emailing systems and the like.
Next, you want to look for a way to get an inventory from the workloads running inside your infrastructure - Compute, Containers, Kubernetes etc.
Oh, and don’t forget your employee laptops too! Here are several things to be collected from your workloads:
Applications, Libraries, Services
Important system files
Additionally, make sure that your tool of choice does not degrade performance within your infrastructure - this is crucial! You do not want it to interfere with the business!
At this stage I would opt for a tool that provides both workload security and the inventory you’re looking for. Try not to overload your infrastructure with several different tools running simultaneously.
Lastly, and most importantly, make sure that you can integrate this tool with your aggregation tool from step #1. This will allow you to correlate data from both systems and gain amazing visibility into your infrastructure and its workloads.
Congrats! You’re now an Asset Management Master.
What’s next? Detecting and mitigating security issues
Now that we’ve got both infrastructure level data and workload level (or application stack) data in our hands - hopefully all of it in one queryable place - we can start working on satisfying our security needs!
1. Security Coverage - By combining the hosts information gathered from the infrastructure with the data from the workload security tool, we can tell whether our infrastructure is fully covered. The infrastructure data tells you how many compute instances, or workloads, are running; correlating it with the data from your security tools tells you whether every one of them is covered.
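The coverage check above boils down to a set difference between two inventories. Here is a minimal sketch, assuming you have already exported host lists from both systems; the field names (`id`, `host_id`) are illustrative, not any particular tool’s schema:

```python
# Sketch: find hosts known to the infrastructure inventory but missing
# from the workload security tool's inventory (i.e. coverage gaps).
# Field names are assumptions for illustration.

def coverage_gaps(infra_inventory, agent_inventory):
    """Return hosts missing the security agent, plus a coverage percentage."""
    infra_ids = {h["id"] for h in infra_inventory}
    covered_ids = {h["host_id"] for h in agent_inventory}
    missing = sorted(infra_ids - covered_ids)
    coverage = (
        100.0 * len(infra_ids & covered_ids) / len(infra_ids) if infra_ids else 100.0
    )
    return missing, coverage

# Example: three hosts in the cloud inventory, two reporting an agent.
infra = [{"id": "i-001"}, {"id": "i-002"}, {"id": "i-003"}]
agents = [{"host_id": "i-001"}, {"host_id": "i-003"}]
missing, pct = coverage_gaps(infra, agents)
```

In practice you would feed this from your aggregation tool’s API exports and alert whenever the gap list is non-empty.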
2. Configuration Management - By analyzing data collected from our infrastructure, we are now able to discover those misconfigured entities or resources.
Check the following first:
NAT, VPC, Route Table, Subnet
Looking at the above (among other things) will help you find those publicly accessible resources (like buckets and RDSs) and servers exposing risky ports (Telnet, SSH, Logstash etc.).
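As a concrete illustration of such a check, here is a small rule that flags security groups opening risky ports to the whole internet. It assumes the group data has already been fetched from the cloud API into plain dicts; the field names mirror AWS’s EC2 API shape but are assumptions here, and the port list is a tiny illustrative subset:

```python
# Sketch: flag security-group rules that expose a risky port to 0.0.0.0/0.
# Field names (GroupId, IpPermissions, IpRanges, CidrIp, FromPort) follow
# the AWS EC2 API shape but are used here on plain, pre-fetched dicts.

RISKY_PORTS = {23: "Telnet", 22: "SSH", 5044: "Logstash", 9200: "Elasticsearch"}

def risky_exposures(security_groups):
    """Return (group id, port, service) for rules open to the internet."""
    findings = []
    for sg in security_groups:
        for rule in sg.get("IpPermissions", []):
            open_to_world = any(
                r.get("CidrIp") == "0.0.0.0/0" for r in rule.get("IpRanges", [])
            )
            port = rule.get("FromPort")
            if open_to_world and port in RISKY_PORTS:
                findings.append((sg["GroupId"], port, RISKY_PORTS[port]))
    return findings

# Example: SSH open to the world gets flagged, HTTPS does not.
sgs = [{
    "GroupId": "sg-123",
    "IpPermissions": [
        {"FromPort": 22, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"FromPort": 443, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ],
}]
```

The same pattern extends to buckets, databases and route tables: fetch the raw configuration, then apply your own policy rules over it.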
3. Identity & Access Management - Although arguably a subset of the configuration management domain, it fully deserves a place of its own. Identity and access management is one of the most (if not the most) important disciplines that any organization has to master in today’s cloud infrastructure.
You want to be able to map out your cloud infrastructure and become familiar with the different users and roles that are in use - both manually and programmatically.
AWS, for example, has more than 300 services with a remarkable 9,000 different permissions created for them. Can you tell which of these permissions can actually harm your infrastructure or lead to data exposure?
Sadly, most tools will only alert on administrative policies that are in use. But if a user or role has “iam:*” (which is effectively admin), they’ll ignore it.
Here’s how we can overcome this:
Create a list of harmful permissions that can be used against you (pro tip: look for penetration testing blogs and known IAM escalation tricks - it will save you some valuable time).
Find a way to identify all Users, Roles, Policies and Compute with Attached Roles that have any of the permissions you mapped in step 1.
Begin by looking for permissions that are also externally facing - compute instances with open inbound security groups and a public IP, or roles that can be assumed by entities outside your organization.
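Steps 1 and 2 above can be sketched as a matcher that checks policy actions, including wildcards like “iam:*”, against a curated harmful-permissions list. The list below is a tiny illustrative subset of what such a curated list would contain:

```python
# Sketch: match IAM policy actions (which may contain wildcards such as
# "iam:*") against a curated list of harmful permissions. The HARMFUL set
# here is a small illustrative subset, not a complete curated list.

from fnmatch import fnmatch

HARMFUL = {
    "iam:CreateAccessKey",    # create keys for another user
    "iam:AttachUserPolicy",   # classic privilege-escalation path
    "iam:PassRole",           # pass a powerful role to a service
    "sts:AssumeRole",         # pivot into other roles
}

def harmful_actions(policy_actions):
    """Return the harmful permissions effectively granted by a policy,
    expanding glob-style wildcards in the policy's actions."""
    granted = set()
    for action in policy_actions:
        for bad in HARMFUL:
            # fnmatch treats "iam:*" as a glob covering "iam:PassRole" etc.
            # Both sides are lowercased because IAM actions are case-insensitive.
            if fnmatch(bad.lower(), action.lower()):
                granted.add(bad)
    return sorted(granted)
```

Run this over every user, role and instance-attached policy from your inventory, then intersect the hits with the externally-facing assets to find the riskiest combinations first.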
4. Vulnerability Management - Make a complete inventory of all libraries and applications running on each workload (compute, container etc) and correlate it with known vulnerabilities (there are plenty of tools in the industry to assist you with this).
Find a proper way to “purify” the results. Trust me, there will be a lot. Focus on the vulnerabilities that are truly susceptible to exploitation - ones that are easy to exploit, that may have already been exploited in the wild and potentially have a POC published, ones that spread widely in your infrastructure, etc.
And do not forget your network gear: firewalls, VPNs, WAFs. These are all crucial!
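The “purification” step above is essentially a filter-and-rank pass over scanner output. Here is a hedged sketch; the record fields (`cvss`, `has_poc`, `exploited_in_wild`, `affected_hosts`) are assumptions about what your scanner exports, not a real tool’s schema:

```python
# Sketch: prioritize vulnerabilities that are actually likely to be
# exploited - known in-the-wild exploitation, a published POC, or wide
# spread across the fleet. Record fields are illustrative assumptions.

def prioritize(vulns, min_hosts=10):
    """Keep exploitable/widespread findings, worst first."""
    keep = [
        v for v in vulns
        if v["exploited_in_wild"] or v["has_poc"] or v["affected_hosts"] >= min_hosts
    ]
    # In-the-wild exploitation outranks severity; then sort by CVSS score.
    return sorted(keep, key=lambda v: (v["exploited_in_wild"], v["cvss"]), reverse=True)

vulns = [
    {"cve": "CVE-A", "cvss": 9.8, "exploited_in_wild": True,  "has_poc": True,  "affected_hosts": 3},
    {"cve": "CVE-B", "cvss": 5.0, "exploited_in_wild": False, "has_poc": False, "affected_hosts": 2},
    {"cve": "CVE-C", "cvss": 7.5, "exploited_in_wild": False, "has_poc": True,  "affected_hosts": 40},
]
```

The exact ranking policy is yours to tune; the point is to encode it once, centrally, instead of triaging thousands of raw findings by hand.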
5. Data Protection - Data is king, which is why you want to make sure it is protected. Off-the-shelf data scanning tools that identify sensitive data and PII can help you discover (hopefully not!) data that is not adequately protected.
6. Centralizing things
Circling back to step #1: If you can gather all of this data and information into a single place, allowing you to query and correlate between all of these sources, you’re already one step ahead.
If at this stage your solution also supports acting on findings (opening tickets, triggering remediations and so on), consider it a done deal.
Pro tip: Tag your assets with findings from each domain (like “isPublic”, “HighlyPermissiveRole”, “Unencrypted”, “SensitiveData”, “Log4J”).
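The tagging pro tip can be sketched as a small rule set that derives tags from each asset’s findings. The tag names follow the examples in the text; the asset fields (`public_ip`, `role_risk`, `disk_encrypted`, `pii_found`, `libraries`) are illustrative assumptions about your asset-management schema:

```python
# Sketch: derive findings-based tags for one asset record. Asset field
# names are assumptions; the tag names come from the pro tip above.

def derive_tags(asset):
    """Map raw findings on an asset to the tags stored alongside it."""
    tags = set()
    if asset.get("public_ip"):
        tags.add("isPublic")
    if asset.get("role_risk") == "high":
        tags.add("HighlyPermissiveRole")
    if not asset.get("disk_encrypted", True):
        tags.add("Unencrypted")
    if asset.get("pii_found"):
        tags.add("SensitiveData")
    if "log4j-core" in asset.get("libraries", []):
        tags.add("Log4J")
    return tags

asset = {
    "public_ip": "203.0.113.7",
    "role_risk": "high",
    "disk_encrypted": False,
    "pii_found": False,
    "libraries": ["log4j-core", "spring-web"],
}
```

Recomputing these tags on every inventory refresh keeps them current, which is what makes them usable for enrichment later.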
This kind of looks like a framework… Doesn't it?
Retrospectively, when looking at what we achieved so far, we realize that we have created an actual security framework!
A security framework that adopts the industry’s practices (like NIST’s CSF, for example) but is also highly effective in our complex, ever-changing infrastructure.
What happens when a new technology stack is published?
We got it covered!
The cornerstones of security will remain the same, no matter what the technology brings. Identity and access management is relevant in legacy networks, cloud networks, SaaS solutions and even OT - and so are asset management, vulnerability management, configuration management, etc.
We will make sure to support any new technology in our framework, “processing” it through the same phases:
Visibility -> Asset Management -> Security Domains -> Threat Hunting.
What about that threat hunting?
Ahh yes… This is where true magic happens.
Up until now we’ve only been collecting static information (crucial for improving our security posture continuously), but now it's time to put it to use.
By correlating all of the information we gathered so far with our cloud’s audit logs and other security systems’ alerts and logs, we are now aware of the context of things!
Any GuardDuty (AWS’s intrusion detection service) alert will now be accompanied by all of the tags that you created in your asset management system (remember?!). So now, not only do you know that a server is being brute-forced, but also that it has HighlyPermissiveRole and SensitiveData!
The same applies to the raw audit logs (like CloudTrail) generated by the cloud infrastructure itself. For every log entry created, you can parse its “trusted entity” field, the “policy” field, the source IP, etc., and compare them against your asset management system!
By enriching all of the trails with such valuable data we no longer need to look for that annoying “needle in the haystack”.
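The enrichment described above is, at its core, a join between an event and the tag store keyed by resource id. A minimal sketch, assuming a simple dict-based tag store; the event and store shapes are illustrative, not an actual GuardDuty or CloudTrail schema:

```python
# Sketch: enrich a raw security event (e.g. a GuardDuty finding or a
# CloudTrail entry) with the tags held in the asset management system.
# The event shape and tag store are illustrative assumptions.

ASSET_TAGS = {
    "i-0abc": ["isPublic", "HighlyPermissiveRole", "SensitiveData"],
}

def enrich(event, tag_store=ASSET_TAGS):
    """Attach asset-management tags to a security event by resource id."""
    enriched = dict(event)  # leave the original event untouched
    enriched["asset_tags"] = tag_store.get(event.get("resource_id"), [])
    return enriched

alert = {"type": "BruteForce", "resource_id": "i-0abc", "source_ip": "198.51.100.9"}
```

An alert carrying `["isPublic", "HighlyPermissiveRole", "SensitiveData"]` is triaged very differently from the same alert on an untagged, internal-only host - that is the needle jumping out of the haystack on its own.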
The major outcome of correlating logs and alerts with our tags is the connection we get between potential and actual - meaning, the potential security risks that exist on the resource and the actual exploitation threats. This all takes plenty of work, and not just from the security team.
Great, so all teams jumped on board right away?
Here’s the thing.
It seems as if security efforts often “collide” with the business efforts of a company, but that shouldn’t be the case. In fact, it should be the exact opposite.
Security should not revolve around “why not”, but “how to” instead!
Ultimately, our job as security professionals is to enable the business efforts and goals by removing any barriers throughout the process.
But since “secure” effectively brings limitations and restrictions with it (which easily translate into resources and time), it is often unwanted and may even be overlooked.
So, it is indeed true that we had to “earn” our place and prove that we are not “wasting” anyone’s time.
So how do you “recruit” everyone to this effort of security? The answer is actually quite simple - transparency and collaboration.
The moment you stop “ruling” and start sharing your knowledge, explaining the risks and the importance of things and start working together with everyone, people will understand and want to collaborate with you.
But how do you do that in practice?
You don’t state facts (unless there’s absolutely no way around it) - you introduce people to your thoughts and hear their feedback.
You prove that you have reviewed the materials thoroughly and understand the subject fully - it isn’t shameful to ask questions and ask for clarifications.
You describe the risks in each scenario - what the probability and the potential impact is for each scenario.
You share all of the knowledge at hand - knowledge is power.
Most importantly, you don’t “fire and forget”. You work with the developers and infrastructure engineers throughout the entire project.
Awesome. Where do we go from here?
As you know, the job of security truly never ends. Technology keeps evolving and changing to support the needs of the industry, and so, to remain relevant, our security framework has to be updated accordingly.
We’re also working hard on minimizing the time of identification of security gaps, and even preventing them before they are actually deployed to our Production (“Shift Left”).
Security posture management is a continuous effort. It's a journey. If you can manage to create an in-house system like we did, all the better. But every solution needs to be continuously monitored and updated.
It is our duty to make sure we are always aligned with the latest technologies and security practices out there to ensure we are protecting our infrastructure and our customers.
By working methodically and always updating your one source of truth, you should be able to maintain proper visibility, asset management and the security of your entire stack - no matter your solution.
This post was written by Opher Hofshi, Security Architect at Wix.