- Infrastructure level
- User access level
- Storage and data level
- Application access level
- Network level
- Logging and monitoring level
Infrastructure level security is of the utmost importance. In a public cloud, the physical infrastructure is the cloud provider's responsibility. But in a private cloud, we must ensure the security at the infrastructure level as well. In OpenStack, all the components are separate services and they communicate with each other via APIs. It's very complex to ensure security at each level.
In OpenStack, we have services such as keystone, nova, and neutron, which depend on their underlying databases. It is always advisable to give each database its own unique access credentials. Then, if one component is compromised, its credentials cannot be used to reach the databases of the other components.
The hypervisor in OpenStack must have SELinux or AppArmor enabled. People often disable it during configuration, but this is not recommended, as it provides a virtual boundary that protects your VMs. Apart from this, all the security patches must be deployed on the hypervisor.
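As a minimal sketch of the idea of separate credentials per service, the following Python snippet generates a distinct random password for each service database. The `<service>_db` user naming and the password length are assumptions made for illustration, not an OpenStack convention.

```python
import secrets

# Service names taken from the text; extend the list for your deployment.
SERVICES = ["keystone", "nova", "neutron"]


def generate_db_credentials(services, password_bytes=24):
    """Return a distinct database user and random password for each service."""
    return {
        service: {
            "user": f"{service}_db",  # hypothetical naming convention
            "password": secrets.token_urlsafe(password_bytes),
        }
        for service in services
    }


credentials = generate_db_credentials(SERVICES)
for service, cred in credentials.items():
    # Print only the user name; never log the password itself.
    print(service, cred["user"])
```

Because every password is generated independently, leaking one service's credentials reveals nothing about the others.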
There should always be an isolation between networks responsible for management, guest, and storage traffic. It's always preferred to have a separate VLAN for internal users so that users with infected or compromised machines cannot affect the cloud infrastructure.
Internal and external firewalls must be used with OpenStack to control external and internal traffic.
In OpenStack, the services communicate with one another on specific ports, so only these ports should be open on the firewall.
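The port allowlist idea can be sketched in Python as follows. The port numbers are the commonly documented default API ports for these services and are an assumption here; verify them against your own deployment. The function names are hypothetical.

```python
# Commonly documented default API ports (assumption: your deployment may differ).
SERVICE_PORTS = {
    "keystone": 5000,
    "nova": 8774,
    "neutron": 9696,
}

ALLOWED_PORTS = set(SERVICE_PORTS.values())


def unexpected_open_ports(open_ports):
    """Return the open ports that no known service needs, in ascending order."""
    return sorted(port for port in open_ports if port not in ALLOWED_PORTS)


# Hypothetical result of a port scan against a controller node.
scan_result = [22, 5000, 8774, 3306, 9696]
print(unexpected_open_ports(scan_result))
```

Any port returned by `unexpected_open_ports` is a candidate for closing on the firewall.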
You must watch the activity performed by users, such as successful versus failed logins, and unique transactional behavior, such as users trying to download all the images at once.
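A minimal sketch of this kind of monitoring is shown next. It counts failed logins per user and flags anyone at or above a threshold. The event format and the threshold of three failures are assumptions for illustration only.

```python
from collections import Counter


def flag_suspicious_users(events, max_failures=3):
    """Return users with at least max_failures failed logins.

    events is an iterable of (user, succeeded) tuples (hypothetical format).
    """
    failures = Counter(user for user, succeeded in events if not succeeded)
    return sorted(user for user, count in failures.items() if count >= max_failures)


events = [
    ("alice", True),
    ("bob", False),
    ("bob", False),
    ("bob", False),
    ("carol", False),
    ("alice", True),
]
print(flag_suspicious_users(events))
```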
In AWS, to secure the infrastructure, you can use IAM, Trusted Advisor, and AWS Config. All these services help you identify the loopholes in the configuration. Enabling logs, monitoring, and alerts using CloudWatch helps you to strengthen security.
At the instance level, we must keep the guest OS up to date with security patches. VPC flow logs must be enabled to monitor network-level activity. Using custom alerts on AWS services, you can proactively manage security. For example, we can create alerts on the network interfaces of EC2 instances. If an instance starts broadcasting a massive amount of traffic, we can easily identify the issue by going through the logs.
In the cloud, it is critical to define users and user access. In this section, we define the users, groups, roles, and policies. Users are entities who will access the cloud infrastructure using the console or APIs. A group defines the collection of users who will perform a similar set of actions. Roles define the nature of the job the user will perform, while a policy defines the rules for resource access. It also describes how the users will access the services or applications, and how one service will securely communicate with another service. In the public cloud, communication or integration of different services is usually the user's responsibility, where the consumer defines the secure way for communication. But most of the time, we make a mistake in this process and leave this part vulnerable to security breaches.
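The relationship between users, groups, and policies can be sketched with a small data model. This is a simplified illustration, not the AWS IAM or keystone implementation; the class names, action strings, and resource names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    actions: frozenset    # actions this policy allows
    resources: frozenset  # resources the actions apply to


@dataclass
class Group:
    name: str
    policies: list = field(default_factory=list)


@dataclass
class User:
    name: str
    groups: list = field(default_factory=list)


def can_access(user, action, resource):
    """Default deny: allow only if some policy of some group grants the action."""
    return any(
        action in policy.actions and resource in policy.resources
        for group in user.groups
        for policy in group.policies
    )


operators = Group(
    "operators",
    [Policy(frozenset({"compute:start", "compute:stop"}), frozenset({"vm-1"}))],
)
alice = User("alice", [operators])
bob = User("bob")  # belongs to no group, so nothing is allowed

print(can_access(alice, "compute:stop", "vm-1"))
print(can_access(bob, "compute:stop", "vm-1"))
```

The important design choice is the default deny: a user without an explicit grant gets no access.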
For example, suppose we have a solution where EC2 instances need to store static files on S3 storage. In this case, we should ideally create an EC2 role that has permissions to access specific S3 buckets, but most people put the access key and secret key into a text file on the EC2 instances, which is not recommended. If the VM gets compromised and the stored keys belong to the root account, then the whole account is at risk.
Similarly, we must use MFA for console access and should not use the root account to access the console. However, in real life, most users access the console using the root credentials and do not use MFA.
For audit purposes, IAM events should be logged to CloudTrail.
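As a hedged sketch of the role-based approach, the following Python snippet builds the two JSON documents such a role typically needs: a trust policy that lets EC2 assume the role, and a permissions policy limited to one bucket. The bucket name and the choice of `s3:GetObject` and `s3:PutObject` are assumptions for illustration; adjust them to what the application really needs.

```python
import json

BUCKET = "example-static-files"  # hypothetical bucket name

# Trust policy: allows the EC2 service to assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Permissions policy: read and write objects in one specific bucket only.
permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
print(json.dumps(permissions_policy, indent=2))
```

With a role like this attached to the instance, no long-lived access key needs to be stored on the VM.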
In OpenStack we also have identity management to define user access. As in the case of the AWS service, here also we define users, groups, and roles. Identity management in OpenStack provides you with the Role-Based Access Control (RBAC) and ACLs.
Storage and data level security is very important. Recently, we have heard about many security breaches, such as the one at Verizon, which suffered a data leak because an S3 bucket was publicly accessible. This also happened with Accenture, where a server was exposing data to the public. These cases happened because no security policy was implemented at the storage and data level. In the cloud, we have the following types of storage:
- Volume storage: This type of storage is used as block storage, which can be attached to a VM as a partition. To secure the data, we can use OS-based encryption or an HSM. For data protection, we can define RAID as well. For example, in AWS we have Elastic Block Store (EBS), which provides an encryption facility, and multiple EBS volumes can be combined into a RAID array at the OS level.
- Object storage: This type of storage is used to store static content, such as images and documents. Here, we can define encryption and ACLs to ensure the security of data. Many cloud providers already keep multiple copies of object storage data to ensure safety. For example, in AWS we have S3, which stores data redundantly across multiple Availability Zones within a region.
- Database storage: This is the type of storage that we use to store our database. In AWS, we have RDS. To ensure data security, we must ensure that encryption is enabled and also that only authorized users have access.
In general terms, we define data security in storage in two parts:
- Data at rest: For data at rest security, we enable encryption using Key Management Service (KMS) or HSM. Here, we can enable encryption at the storage level. All the aforementioned examples of security for storage are for data at rest encryption.
- Data in transit: For data in transit, we must define the secure channel to maintain the integrity of data. For this, we use SSL/TLS while communicating with the external service or users. From a management perspective, we always prefer to use a secure VPN tunnel.
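For data in transit, a client-side TLS configuration can be sketched with the Python standard library as follows. Requiring TLS 1.2 as the minimum version is an assumption for this example; choose the minimum that your compliance requirements dictate.

```python
import ssl


def make_client_context():
    """Build a TLS context that verifies the server certificate and hostname."""
    context = ssl.create_default_context()
    # Refuse protocol versions older than TLS 1.2 (assumed policy).
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


ctx = make_client_context()
print(ctx.verify_mode, ctx.check_hostname, ctx.minimum_version)
```

A context like this can be passed to standard library clients, for example as the `context` argument of `urllib.request.urlopen`.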
Application access is one of the most important areas of concern in terms of security. Here, we have our data and information in transit, and we must secure it using a secure channel, such as SSL/TLS. Apart from this, if our application is a web application, we must ensure availability. We have heard about cases of DDoS attacks, SQL injections, and so on, and there are always attackers trying to steal your important data. To prevent this, we must put preventive measures in place, such as a web application firewall (WAF), and deploy our infrastructure in such a way that it can handle a DDoS attack. Security groups should allow traffic on specific ports and from specific sources only. For example, if we have a web application that runs with SSL on port 443, make sure that only port 443 is open for public access. Network ACLs should also be configured to allow only legitimate traffic.
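The rule that only port 443 should be publicly reachable can be checked with a small audit script. This is a sketch under assumptions: the `(port, source CIDR)` rule format and the sample rules are hypothetical and do not mirror the AWS API response format.

```python
import ipaddress

ANYWHERE = ipaddress.ip_network("0.0.0.0/0")

# Hypothetical inbound rules: (port, source CIDR).
rules = [
    (443, "0.0.0.0/0"),     # public HTTPS, intended
    (22, "0.0.0.0/0"),      # SSH open to the world, not intended
    (22, "10.0.3.0/24"),    # SSH from the management subnet
    (3306, "10.0.2.0/24"),  # database access from the private subnet
]


def publicly_exposed_ports(rules, allowed_public=frozenset({443})):
    """Return ports open to any source that are not on the public allowlist."""
    exposed = set()
    for port, cidr in rules:
        if ipaddress.ip_network(cidr) == ANYWHERE and port not in allowed_public:
            exposed.add(port)
    return sorted(exposed)


print(publicly_exposed_ports(rules))
```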
We can also use WAF to stop malicious traffic and prevent DDoS attacks. WAF also helps to apply rules on your websites for accessibility. You can also manage the traffic on the basis of geographical locations.
If your application uses a Content Delivery Network (CDN) to make your site perform faster, you must define security at the CDN level. The CDN keeps a local copy of all static content, which it fetches from an origin. So you must define security regarding file access at both the origin level and the CDN level.
For APIs, we must ensure that the API is accessible only to authorized users with key-based authentication and only over SSL/TLS.
Internet-based applications are more prone to DDoS and brute force attacks, in which a large amount of illegitimate traffic hits your application and makes it unavailable. For online businesses, a DDoS attack can be critical, as the application's unavailability will essentially halt the revenue stream.
To tackle these situations, we can use a global DNS service such as Route 53, which can handle a traffic burst. The application must be deployed in HA with autoscaling behind a load balancer so that, if a peak comes, the resources scale out automatically to handle the traffic.
There is also a chance that your VM gets compromised and starts broadcasting packets. To guard against this, we must harden the virtual machine and enable monitoring so that, if any such situation arises, you get an alert and can take appropriate action.
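A minimal sketch of key-based API authorization over a secure channel is shown next. The in-memory key store, the client ID, and the sample key are hypothetical; a real service would load hashed keys from a secrets store.

```python
import hashlib
import hmac


def hash_key(api_key):
    """Hash an API key so that the plain key is never stored."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


# Hypothetical key store: client ID mapped to the hash of its API key.
KEY_HASHES = {"client-1": hash_key("s3cr3t-example-key")}


def is_authorized(client_id, api_key, scheme):
    """Accept a request only over HTTPS and only with a valid key."""
    if scheme != "https":
        return False
    expected = KEY_HASHES.get(client_id)
    if expected is None:
        return False
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(expected, hash_key(api_key))


print(is_authorized("client-1", "s3cr3t-example-key", "https"))
print(is_authorized("client-1", "s3cr3t-example-key", "http"))
```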
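The autoscaling decision can be illustrated with a simple capacity calculation. This is a sketch of the general idea, not the algorithm any cloud provider uses; the per-instance capacity and the minimum and maximum group sizes are assumed values.

```python
import math


def desired_instances(requests_per_second, capacity_per_instance,
                      min_instances=2, max_instances=10):
    """Return how many instances are needed, clamped to the group limits.

    A minimum of two instances keeps the application highly available
    even when the load is low (assumed policy).
    """
    needed = math.ceil(requests_per_second / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))


print(desired_instances(750, 100))   # normal peak
print(desired_instances(5000, 100))  # burst larger than the group allows
```

The upper limit matters during an attack: it caps the cost of scaling out in response to illegitimate traffic.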
Most of the time, we secure our environment externally, but what about the internal users? This case is very common in a private cloud or hybrid cloud environment. So, we must watch the user activity, the number of sessions, and the kind of transactions taking place. For this you can check the load balancer logs, application server logs, and user access, or you can use any monitoring tool that can display real-time logs in a meaningful way. Here we can utilize the Elasticsearch, Logstash, and Kibana (ELK) stack, which gives very interactive dashboards and graphs.
When we are moving to the cloud or opting for the cloud, network security is of the utmost importance. In the cloud, we can define policies at the firewall level to allow and deny traffic. In AWS, we use VPC to define the network. In a VPC, we must create public, private, and management subnets. For SSH or RDP access, we must have either a jump server or a bastion host, which adds one more layer of security. The route tables should be properly defined. We must define and configure network ACLs to control the incoming and outgoing packets. In security groups, only the required ports should be open, and the source should be clearly specified. Do not open all the ports to the public.
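The division of a VPC into public, private, and management subnets can be sketched with the standard `ipaddress` module. The `10.0.0.0/16` range and the `/24` subnet size are assumptions chosen for illustration.

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")  # hypothetical VPC CIDR
subnets = list(vpc.subnets(new_prefix=24))

# Assign the first three /24 blocks to the three tiers.
layout = {
    "public": subnets[0],
    "private": subnets[1],
    "management": subnets[2],
}


def tier_of(address):
    """Return the tier whose subnet contains the address, or None."""
    ip = ipaddress.ip_address(address)
    for tier, network in layout.items():
        if ip in network:
            return tier
    return None


for tier, network in layout.items():
    print(tier, network)
print(tier_of("10.0.1.25"))
```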
For private subnet VMs, we can use the NAT service to enable internet access.
If you need to meet a specific compliance, you can use IPS and IDS to make the environment more secure.
To access resources from a management perspective, we should use a VPN connection. There are different types of VPN connections offered by AWS.
For a private and secure connection, we can use a Direct Connect connection between the customer site and AWS.
In OpenStack, we must understand how the workflow process for the tenant instance creation needs to be mapped to security domains. There are a few services that directly communicate with neutron and these services must be mapped to security domains, as follows:
- OpenStack dashboard: Public and management
- OpenStack identity: Management
- OpenStack compute node: Management and guest
- OpenStack network node: Management, guest, and possibly public, depending upon the neutron plugin in use
- SDN services node: Management, guest, and possibly public, depending upon the product used
To isolate sensitive data communication between neutron and other OpenStack core services, we configure communication channels to only allow communication over an isolated management network.
We must restrict the neutron API connection to a specific interface by specifying the details in the neutron configuration file.
Likewise, we must define the incoming and outgoing traffic using security groups.
When using flat networking, we cannot assume that projects that share the same layer 2 network (or broadcast domain) are fully isolated from each other. These projects may be vulnerable to ARP spoofing, risking the possibility of man-in-the-middle attacks.
To prevent this, we must enable prevent_arp_spoofing in the Open vSwitch configuration file.
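A quick way to verify this setting is to parse the configuration and check the option. The sketch below works on an inline sample; placing the option in an `[agent]` section is an assumption, and whether the option exists at all depends on your OpenStack release, so check the documentation for your version.

```python
import configparser

# Hypothetical excerpt of an Open vSwitch agent configuration file.
SAMPLE_CONFIG = """
[agent]
prevent_arp_spoofing = True
"""


def arp_spoofing_protection_enabled(config_text, section="agent"):
    """Return True only if prevent_arp_spoofing is explicitly enabled."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.getboolean(section, "prevent_arp_spoofing", fallback=False)


print(arp_spoofing_protection_enabled(SAMPLE_CONFIG))
```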
Logging and monitoring is a very important aspect of any IT infrastructure. Here we get granular details about all the events performed in the infrastructure at each level. Logging and monitoring is a bit complex in the cloud. In logs, we cannot always filter on the basis of IP due to dynamic allocation of IP. There can arise a situation where one IP was earlier representing the x virtual machine, but is now representing the y virtual machine.
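One way to handle dynamic IP allocation is to resolve an IP to a virtual machine for a given point in time rather than by IP alone. The following sketch assumes that an allocation history is available; the record format, IP address, and dates are hypothetical.

```python
from datetime import datetime

# Hypothetical allocation history: (ip, instance, start, end).
# An end of None means the IP is still allocated to that instance.
ALLOCATIONS = [
    ("10.0.1.15", "vm-x", datetime(2017, 10, 1), datetime(2017, 10, 20)),
    ("10.0.1.15", "vm-y", datetime(2017, 10, 20), None),
]


def instance_for(ip, when):
    """Return the instance that held the IP at the given time, or None."""
    for alloc_ip, instance, start, end in ALLOCATIONS:
        if alloc_ip == ip and start <= when and (end is None or when < end):
            return instance
    return None


print(instance_for("10.0.1.15", datetime(2017, 10, 10)))
print(instance_for("10.0.1.15", datetime(2017, 10, 23)))
```

The same IP resolves to different machines depending on the log timestamp, which is why filtering logs by IP alone can mislead an investigation.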
Apart from this, the cloud comprises different services. We must ensure that activity is logged for each service.
In AWS, we can use CloudTrail to log the activity for each service, and we can either store these logs in an S3 bucket or forward them to CloudWatch Logs.
Recently, access logs enabled at the load balancer helped us to identify illegitimate traffic. Consider a financial application running in an HA and autoscaling environment. Over a few days, we saw a peak in resource utilization. As the application was configured with autoscaling, its performance was not affected. But when we investigated the issue, we found that an attacker was targeting our application:
2017-10-23T00:12:54.164535Z ASP-SaaS-Prod-ELB 22.214.171.124:46838 172.31.2.240:80 0.000038 0.001246 0.000057 404 404 0 0 "HEAD http://X.X.X.X:80/mysql/admin/ HTTP/1.1" "Mozilla/5.0 Jorgee" - -
2017-10-23T00:12:54.294395Z ASP-SaaS-Prod-ELB 126.96.36.199:46838 172.31.1.37:80 0.000069 0.000936 0.000051 404 404 0 0 "HEAD http://X.X.X.X:80/mysql/dbadmin/ HTTP/1.1" "Mozilla/5.0 Jorgee" - -
2017-10-23T00:12:54.423798Z ASP-SaaS-Prod-ELB 188.8.131.52:46838 172.31.2.240:80 0.000051 0.001275 0.000052 404 404 0 0 "HEAD http://X.X.X.X:80/mysql/sqlmanager/ HTTP/1.1" "Mozilla/5.0 Jorgee" - -
2017-10-23T00:12:54.553557Z ASP-SaaS-Prod-ELB 184.108.40.206:46838 172.31.1.37:80 0.000047 0.000982 0.000062 404 404 0 0 "HEAD http://X.X.X.X:80/mysql/mysqlmanager/ HTTP/1.1" "Mozilla/5.0 Jorgee" - -
2017-10-23T00:12:54.682829Z ASP-SaaS-Prod-ELB 220.127.116.11:46838 172.31.2.240:80 0.000076 0.00103 0.000065 404 404 0 0 "HEAD http://X.X.X.X:80/phpmyadmin/ HTTP/1.1" "Mozilla/5.0 Jorgee" - -
In the aforementioned logs, you can see the client IP from which the attacker is operating. The attacker is trying to hack the application by requesting different URLs or passing different headers.
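Log entries in this format can be analyzed with a short script. The sketch below parses two of the entries shown earlier and counts the 404 responses per user agent. The field positions are inferred from the sample lines and are an assumption; the function names are hypothetical.

```python
import shlex
from collections import Counter

# Two entries copied from the log excerpt in the text.
SAMPLE_LOGS = [
    '2017-10-23T00:12:54.164535Z ASP-SaaS-Prod-ELB 22.214.171.124:46838 '
    '172.31.2.240:80 0.000038 0.001246 0.000057 404 404 0 0 '
    '"HEAD http://X.X.X.X:80/mysql/admin/ HTTP/1.1" "Mozilla/5.0 Jorgee" - -',
    '2017-10-23T00:12:54.682829Z ASP-SaaS-Prod-ELB 220.127.116.11:46838 '
    '172.31.2.240:80 0.000076 0.00103 0.000065 404 404 0 0 '
    '"HEAD http://X.X.X.X:80/phpmyadmin/ HTTP/1.1" "Mozilla/5.0 Jorgee" - -',
]


def parse_entry(line):
    """Split one log line into named fields (positions inferred from the sample)."""
    fields = shlex.split(line)  # keeps the quoted request and user agent intact
    method, url, _protocol = fields[11].split(" ")
    return {
        "timestamp": fields[0],
        "client": fields[2],
        "status": int(fields[7]),
        "method": method,
        "url": url,
        "user_agent": fields[12],
    }


def not_found_by_agent(lines):
    """Count 404 responses per user agent."""
    counts = Counter()
    for line in lines:
        entry = parse_entry(line)
        if entry["status"] == 404:
            counts[entry["user_agent"]] += 1
    return counts


print(not_found_by_agent(SAMPLE_LOGS))
```

A user agent that produces many 404 responses for administration URLs in a short time is a strong hint of automated probing.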
To prevent this, we enabled WAF and blocked all the traffic from the outside world. Also, you can make WAF learn about this malicious traffic so that whenever such a request comes, WAF will reject the packet. It won't let the packet pass through WAF.
In the monitoring, you must define the metrics and alarm. It helps us to take preventive action. If anything goes against your expectation, you get an alarm and can take appropriate action to mitigate the risk:
Alarm Details:
- Name: awsrds-dspdb-CPU-Utilization
- Description:
- State Change: OK -> ALARM
- Reason for State Change: Threshold Crossed: 1 datapoint [51.605 (24/10/17 07:02:00)] was greater than or equal to the threshold (50.0).
- Timestamp: Tuesday 24 October, 2017 07:07:55 UTC
- AWS Account: XXXXXXX
Threshold:
- The alarm is in the ALARM state when the metric is GreaterThanOrEqualToThreshold 50.0 for 300 seconds.
Monitored Metric:
- MetricNamespace: AWS/RDS
- MetricName: CPUUtilization
- Dimensions: [DBInstanceIdentifier = aspdb]
- Period: 300 seconds
- Statistic: Average
- Unit: not specified
In the preceding example, we have defined an alarm on CPU utilization at the RDS level. We got this alert when CPU utilization was more than 50% but less than 70%. As soon as we got the alert, we started investigating what caused the CPU utilization to rise.
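The threshold logic behind such an alarm can be sketched in a few lines. This is a simplified model for illustration, not the CloudWatch implementation. The threshold of 50.0 and the datapoint 51.605 come from the alarm shown earlier; the earlier datapoint of 32.1 is a hypothetical value.

```python
import operator

# Comparison names as they appear in the alarm details.
COMPARISONS = {
    "GreaterThanOrEqualToThreshold": operator.ge,
    "GreaterThanThreshold": operator.gt,
    "LessThanThreshold": operator.lt,
    "LessThanOrEqualToThreshold": operator.le,
}


def alarm_state(datapoints, threshold, comparison, evaluation_periods=1):
    """Return ALARM if the most recent datapoints all breach the threshold."""
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    compare = COMPARISONS[comparison]
    recent = datapoints[-evaluation_periods:]
    return "ALARM" if all(compare(value, threshold) for value in recent) else "OK"


# Average CPU utilization per 300-second period (32.1 is hypothetical).
cpu_average = [32.1, 51.605]
print(alarm_state(cpu_average, 50.0, "GreaterThanOrEqualToThreshold"))
```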
Now, let's see the summarized security risk and preventive action at different levels in the cloud:
- Hypervisor level: In the cloud, our VMs run on shared resources. A single host may run both the x and y VMs. If the x VM is compromised or hacked, there is a risk of the y VM being compromised as well. Resource isolation normally prevents this, but what if the attacker gets access to the host? So, we must apply the required security patches on the hypervisor and ensure that all the security parameters are configured at the VM level. Most of the time, such compromises happen when the underlying security parameters have been disabled, which mostly occurs in private clouds. At the hypervisor level, we also segregate the traffic at the vSwitch level, where we must have at least management, guest, and storage traffic running on different VLANs.
- Network level: The network is the backbone of the cloud. If the network is compromised, it can completely break down the cloud. The most common attacks on the network are DDoS, network eavesdropping, illegal invasion, and so on. To secure the network, we must define the following:
  - Isolation of traffic (management, storage, and guest)
  - ACL for network traffic
  - Ingress and egress rules must be clearly defined
  - IDS and IPS must be enabled to control the intrusion
  - Antivirus and antispam engines should be enabled to scan the packets
  - Network monitoring must be configured to track the traffic
- Storage level: Storage is also a critical component of the cloud where we store our critical data. Here, we can have risk of data loss, data tampering, and data theft. At the storage level, we must ensure the following to maintain security and integrity of data:
  - All the data at rest must be encrypted
  - Backup must be provisioned
  - If possible, enable data replication to mitigate the risk of hardware failure
  - User roles and data access policy must be defined
  - A DLP mechanism should be enabled
  - All the data transactions should happen using encrypted channels
  - Access logs should be enabled
- VM level: At the VM level, we can have the risk of password compromise, virus infection, and exploited vulnerabilities. To mitigate this, we must ensure the following:
  - OS-level security patches must be deployed from time to time
  - Compromised VMs must be stopped instantly
  - Backup should be provisioned using continuous data protection (CDP) or using a snapshot
  - Antivirus and antispam agents should be installed
  - User access should be clearly defined
  - If possible, define key-based authentication instead of passwords
  - The OS must be hardened and the OS-level firewall and security rules should also be enabled
  - Log management and monitoring must be enabled
- User level: User identity and access are critical for every cloud. We must clearly define the users, groups, roles, and access policy. This is the basis of cloud security, because this is where we authorize users to work with the infrastructure and services. If identity and access are not clearly defined, it can lead to a disaster at any time. To ensure security, we must define the following:
  - Users, groups, roles, and access policies
  - Enable MFA for user authentication
  - The password policy and access key must be defined
  - Make sure that the users are not accessing the cloud using the root account
  - Logs must be enabled for audit purposes
- Application level: Once your application is hosted and open for public access, the real risks to the availability and accessibility of the service arise. Here, you will face DDoS, SQL injection, man-in-the-middle attacks, cross-site scripting, and so on. To prevent all such attacks, we must use the following:
  - Scalable DNS
  - Load balancer
  - Provision autoscaling
  - Use IAM policies and roles
- Compliance: If you have to meet a compliance standard, such as ISO 27001, PCI, or HIPAA, then you must follow its guidelines and design the solution accordingly. We will read about compliance standards in the last chapter and learn how to meet them.
While designing the solution, always think that you are designing for failure. Identify all the single points of failure and find appropriate solutions for them. Also, while designing the solution for the cloud, always consider security, reliability, performance, and cost efficiency, as these factors have a huge impact on your solution as well as on your organization.