Industrial Control Systems and Security

Symantec’s 2015 Website Security Threat Report included details of the methods used in targeted attacks on Industrial Control Systems (ICS). ICSes control not just industrial facilities, but critical infrastructure components such as energy, transportation, and communications networks. Repercussions of ICS hacks range from the potential sabotage of energy systems to the massive damage done in a German steel mill, where an incursion led to the unscheduled shutdown of a blast furnace.

The function of ICSes makes them high-priority targets for hackers. Long-term attacks using every possible attack vector and vulnerability are the norm, and understanding the best practices needed to protect ICSes can provide significant insight into protecting systems that are not as heavily targeted.

NIST’s Guide to Industrial Control Systems (ICS) Security notes that ICSes were originally not reachable over the internet, and were therefore subject to threats from local access only. The primary design concerns for ICSes were availability, safety, and performance, and patching security vulnerabilities cannot come at the expense of those features. With that in mind, NIST provides the following recommendations for securing ICSes:

  • Restricting logical access to the ICS network and network activity
  • Restricting physical access to the ICS network and devices
  • Applying security patches “in as expeditious a manner as possible, after testing them under field conditions”
  • Disabling all unused ports and services (a quick inventory script is sketched below)
  • Restricting user privileges to only those that are required
  • Using antivirus software
  • Using file integrity checking software
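
Several of these recommendations, such as disabling unused ports and services, lend themselves to simple scripted audits. The sketch below, which assumes the third-party psutil Python package is installed and that the script runs with enough privileges to inspect other processes, lists listening TCP ports and the processes behind them so that anything unexpected can be reviewed and disabled. It is a starting point, not a substitute for a full configuration audit.

```python
# Minimal sketch: list listening TCP ports and the processes behind them,
# as a starting point for the "disable all unused ports and services"
# recommendation. Assumes the third-party psutil package is installed and
# sufficient privileges to see other processes' details.
import psutil

def listening_services():
    services = []
    for conn in psutil.net_connections(kind="inet"):
        if conn.status != psutil.CONN_LISTEN:
            continue
        try:
            name = psutil.Process(conn.pid).name() if conn.pid else "unknown"
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            name = "unknown"
        services.append((conn.laddr.port, name))
    return sorted(set(services))

if __name__ == "__main__":
    for port, name in listening_services():
        print(f"port {port:5d}  {name}")
```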

Symantec’s report found that in practice ICSes were not locked down as stringently as NIST recommended.

The Dragonfly group exploited multiple vulnerabilities in ICS security with attacks on the US energy sector dating back to 2011. They started with a spear-phishing campaign, and then moved on to compromising websites that users at the targeted organizations were likely to visit (a watering hole attack). The compromised websites redirected visitors to other sites hosting exploit kits, which downloaded malware onto the target network’s computers. The malware at this stage was primarily used for reconnaissance on the corporate network, capturing everything from file names to passwords. With attackers entrenched in the corporate network, the only way to protect the ICS is to completely segregate its network.

The last stage of the Dragonfly attack was innovative: it broke directly into the ICS network. The group used a variant of the watering hole attack in which software from ICS equipment manufacturers was infected. When ICSes checked for updates, they downloaded the malware along with the update. A thorough test of patches before installation might have detected the infected software.

Another direct attack on ICSes targeted internet-accessible human-machine interfaces (HMIs). Symantec outlines the vulnerabilities in HMIs and other ICS web interfaces:

Many of the proprietary Web applications available have security vulnerabilities that allow buffer overflows, SQL injection, or cross-site scripting attacks. Poor authentication and authorization techniques can lead the attacker to gain access to critical ICS functionalities. Weak authentication in ICS protocols allows for man-in-the-middle attacks like packet replay and spoofing.

Best practices for vulnerable, internet-accessible applications are patching where available, restricting access to secure channels (e.g., VPN), and implementing multifactor authentication.

The stakes are higher for ICSes, but the best practices are the same. Keep your high-value targets segregated. Keep up to date on patches and antivirus definitions. Educate your users on security. And finally, make sure you monitor your networks for any and all suspicious activity.

Why You Should Care About Legacy Patches

As we noted in previous posts on security best practices and unsupported releases, unpatched applications and operating systems can be a significant security vulnerability. Verizon’s 2015 Data Breach Investigations Report (DBIR) includes a survey of the Common Vulnerabilities and Exposures (CVEs) observed in security incidents in 2014. They found that “Ten CVEs account for almost 97% of the exploits observed in 2014”, with the top 10 threats ranging in age from 1999 to 2014:

CVE-2002-0012: SNMP V1 trap vulnerabilities
CVE-2002-0013: SNMP V1 GetRequest, GetNextRequest, and SetRequest vulnerabilities
CVE-1999-0517: SNMP default (public) or null community string
CVE-2001-0540: Malformed RDP requests cause a memory leak in Windows NT/2000 Terminal Services
CVE-2014-3556: I/O buffering bug in the nginx SMTP proxy allows command injection into encrypted SMTP sessions (fixed in nginx 1.7.4)
CVE-2012-0152: Windows 7/2008 Terminal Server denial-of-service vulnerability
CVE-2001-0680: QPC’s QVT/Net 4.0 and AVT/Term 5.0 FTP daemon allows remote users to traverse directories
CVE-2002-1054: Pablo FTP server allows arbitrary directory listing
CVE-2002-1931: PHP Arena paFileDB versions 1.1.3 and 2.1.1 vulnerable to cross-site scripting
CVE-2002-1932: Windows XP and 2000 do not send alerts when Windows Event Logs overwrite entries
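
Because these are specific, well-known bugs, checking whether they are still present in your environment can be automated. As a minimal sketch of the idea, the Python snippet below compares a hypothetical CSV export of installed packages against a locally maintained list of known-vulnerable versions. The file names and column layout are illustrative only; in practice the vulnerable-version list would come from a vulnerability scanner or CVE feed.

```python
# Minimal sketch: flag installed packages that appear in a locally maintained
# list of known-vulnerable versions. The CSV files and their column layouts
# (package, version) are hypothetical examples.
import csv

def load_csv(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def find_vulnerable(installed_csv, vulnerable_csv):
    vulnerable = {(row["package"], row["version"]) for row in load_csv(vulnerable_csv)}
    return [row for row in load_csv(installed_csv)
            if (row["package"], row["version"]) in vulnerable]

if __name__ == "__main__":
    for row in find_vulnerable("installed.csv", "known_vulnerable.csv"):
        print(f'{row["package"]} {row["version"]} needs patching')
```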

While these CVEs covered the majority of the observed incidents, patching only the applicable CVEs in this list is not a shortcut to 97% safety. Thousands of vulnerabilities exist, more are discovered on a daily basis, and any vulnerability present on your system is subject to exploitation. Even if you do get everything in your environment patched, you will still need to apply new patches in a timely manner. To get an idea of exactly what “timely manner” means, we can use the DBIR’s examination of how much time it took for newly disclosed CVEs to be exploited after they were published:

Half of the CVEs exploited in 2014 fell within two weeks. What’s more, the actual time lines in this particular data set are likely underestimated due to the inherent lag between initial attack and detection readiness (generation, deployment, and correlation of exploits/signatures).

The timeline between a bug’s disclosure and its exploitation puts “timely manner” somewhere between “drop everything and do it now” and “wait for the next maintenance window”, depending on the severity of the problem and your organization’s sensitivity to patch-associated downtime.

This prompts the question: should you set all your software to update automatically and simply clean up after any bugs and outages that the patches themselves might cause? In a previous post we looked at the topic of automatic patch updates and listed some criteria for determining when to apply patches. Your patching policy will be a balance between catching up on old vulnerabilities, patching new ones in a timely manner, and maximizing availability for your users.

Regardless of how well caught up you are on patches, there will be unavoidable holes in patch coverage and security. As with the Shellshock bug last September, initial patches may need to be revised, and patches are not always available for all platforms at the same time. The nature of zero-day exploits is that the bug is already there; you just don’t know about it. Effective patch management will minimize but not eliminate your security risks. You still need to monitor your systems for unexpected behavior that could indicate a security problem.

Where do you start with Critical Security Controls?

Verizon’s 2015 Data Breach Investigations Report (DBIR) was published last month, building on the findings of 2014’s DBIR. We’ll take a look at the major findings of the 2015 report in our next post, but in the spirit of heading for the dessert bar first, we’ll start with the goodies. The conclusion of the report includes a table of the top Critical Security Controls (CSCs) from the SANS Institute, which cover the majority of the observed threats:

CSC ID – Description
CSC 13-7 – Two-factor authentication for remote logins.
CSC 6-1 – Make sure applications are running up-to-date, patched, and supported versions.
CSC 11-5 – Verify that internal devices reachable from the internet actually need to be reachable from the internet.
CSC 13-6 – Use a proxy on all outgoing traffic to provide authentication, logging, and the ability to whitelist or blacklist external sites.
CSC 6-4 – Test web applications both periodically and when changes are made. Test applications under heavy loads for both DoS and legitimate high-use cases.
CSC 16-9 – Use an account lockout policy for too many failed login attempts.
CSC 17-13 – Block known file transfer sites.
CSC 5-5 – Scan email attachments, email content, and web content before they reach the user.
CSC 11-1 – Remove unneeded services, protocols, and open ports from systems.
CSC 13-10 – Segment the internal network and limit traffic between segments to specific approved services.
CSC 16-8 – Use strong passwords – suggested:

  • contain letters, numbers, and special characters
  • changed at least every 90 days
  • minimum password age of 1 day before the password can be changed again
  • cannot reuse any of the 15 previous passwords
CSC 3-3 – Restrict admin privileges to prevent installation of unauthorized software and admin privilege abuse.
CSC 5-1 – Use antivirus software on all endpoints and log all detection events.
CSC 6-8 – Review and monitor the vulnerability history, customer notification process, and patching/remediation for all third-party software.


This is not a comprehensive list – it’s a starting point. Some CSCs listed above may not apply to your company, and other CSCs critical to your environment may not be in the list. The technology used to implement CSCs is updated over time to keep up with threats, but it is more often than not playing catch-up. The bottom line is that even with a well-controlled environment you can still be vulnerable to unknown threats.

The key to detecting unknown threats is to know the baseline behavior for your network and look for deviations from that baseline. If you know what normal login traffic looks like, you can see when there are attempts to log in from unusual locations or at unusual times. You can detect attempts to upload files to unknown FTP sites. You can detect an unusual spike in web traffic. More importantly – you can isolate and investigate the behavior.

Logs from domain controllers and web servers, and syslogs from network devices, exist by default, but there is no easy way to correlate across the logs in their raw form to analyze problems. We recommend Splunk® for its ability to correlate fields across different log formats, and for its dashboard and alerting capabilities that highlight activity deviating from baselines.
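
To illustrate the baseline idea on a small scale, here is a minimal Python sketch that learns which source addresses each account normally logs in from and flags successful logins from addresses outside that baseline. The CSV log format (timestamp, user, source_ip, result) is hypothetical; a SIEM such as Splunk does this kind of correlation at scale and across log formats.

```python
# Minimal sketch of the baseline idea: learn which source IPs normally log in
# for each account, then flag logins from addresses not seen during the
# baseline period. The CSV columns (timestamp,user,source_ip,result) are a
# hypothetical example format.
import csv
from collections import defaultdict

def build_baseline(rows):
    baseline = defaultdict(set)
    for row in rows:
        if row["result"] == "success":
            baseline[row["user"]].add(row["source_ip"])
    return baseline

def flag_deviations(rows, baseline):
    return [row for row in rows
            if row["result"] == "success"
            and row["source_ip"] not in baseline[row["user"]]]

if __name__ == "__main__":
    with open("logins_lastmonth.csv", newline="") as f:
        history = list(csv.DictReader(f))
    with open("logins_today.csv", newline="") as f:
        today = list(csv.DictReader(f))
    for row in flag_deviations(today, build_baseline(history)):
        print(f'{row["timestamp"]} unusual login: {row["user"]} from {row["source_ip"]}')
```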


How Does an Advanced Threat Work?

In our last post we looked at the Verizon 2014 DBIR’s recommendations for security controls from the SANS Institute to protect against advanced threats. Mandiant’s threat report – M-Trends 2015: A View From the Front Lines – provides additional context for why these controls are needed through a case study of an advanced threat.

The first step in the attack outlined in Mandiant’s study was gaining initial access to a corporate domain, and this was done by authenticating with valid credentials through a virtualized application server. Mandiant was not able to determine how the credentials were obtained in this case. While spear phishing and other user-targeted exploits are possible methods for harvesting credentials, the attackers may also have used software vulnerabilities on the system to intercept login credentials. For example, the year-old Heartbleed vulnerability was exploited to gain valid credentials used in attacks on Community Health Systems and the Canada Revenue Agency.

Once the attacker had access, the next step was to use a misconfiguration of the virtual appliance to gain elevated privileges, which allowed the download of a password-dumping utility. That led to the local administrator password, which was the same for all systems and thus gave access to every system in the domain. Mandiant’s report then outlines how Metasploit was used to reconnoiter the environment, obtain corporate domain admin credentials, configure additional entry points, and install back doors to communicate with command and control servers over the internet.

Once the attacker was entrenched in the systems, they moved on to their target in the retail environment. The retail environment was a child domain of the corporate domain, and the stolen domain admin credentials allowed full access. The attacker copied malware to the retail sales registers; that malware harvested payment card data and copied it to a workstation with internet access so that it could be exfiltrated to the attacker’s servers via FTP.

Some important points about this attack are:

  1. Valid credentials were used.
    Patching vulnerable servers and educating users may make it harder, but not impossible, for attackers to obtain valid credentials. Multifactor authentication could have made access more difficult, and monitoring user login locations could have flagged unauthorized access.

  2. Server and application configuration is critical.
    An application misconfiguration led to local admin privileges, and a shared local admin password made it easy to spread throughout the corporate domain and then to the target retail environment. Using different admin credentials and restricting access to the retail environment would have slowed down and possibly prevented data exfiltration.

  3. Network traffic anomalies were not noticed.
    FTP traffic was used to download tools and upload data, and the attack used external command and control servers. Watching traffic over the corporate and retail networks would have picked up recurring patterns in data sent to and received from unknown external addresses (a rough version of this check is sketched below).
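
As a rough illustration of that last point, the Python sketch below counts outbound connections per external destination from flow-style records and flags destinations that are contacted repeatedly but are not on an approved list, a crude beaconing and exfiltration indicator. The record format, approved-address list, and threshold are all hypothetical examples rather than recommended values.

```python
# Minimal sketch: count outbound connections per external destination from
# flow-style records and flag destinations not on an approved list that are
# contacted repeatedly. The CSV columns (src_ip,dst_ip,dst_port,bytes), the
# APPROVED list, and THRESHOLD are hypothetical examples.
import csv
from collections import Counter

APPROVED = {"203.0.113.10", "203.0.113.11"}   # known partners, update servers, etc.
THRESHOLD = 50                                # repeat connections before we care

def suspicious_destinations(flow_csv):
    counts = Counter()
    with open(flow_csv, newline="") as f:
        for row in csv.DictReader(f):
            if row["dst_ip"] not in APPROVED:
                counts[(row["dst_ip"], row["dst_port"])] += 1
    return [(dst, n) for dst, n in counts.items() if n >= THRESHOLD]

if __name__ == "__main__":
    for (dst_ip, dst_port), n in suspicious_destinations("outbound_flows.csv"):
        print(f"{n} connections to {dst_ip}:{dst_port} - investigate")
```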

Advanced threats are designed to be persistent and difficult to detect.  Mandiant reported:

[A]ttackers still had a free rein in breached environments far too long before being detected—a median of 205 days in 2014 vs. 229 days in 2013. At the same time, the number of organizations discovering these intrusions on their own remained largely unchanged. Sixty-nine percent learned of the breach from an outside entity such as law enforcement. That’s up from 67 percent in 2013 and 63 percent in 2012.


It is not possible to completely eliminate the risk of a data breach. But you can limit the scope of a breach and detect it sooner by using the best practices outlined by SANS.org and by monitoring your network and server activity with SIEM tools such as Splunk®.


Advanced Threats and Cyber Espionage

Cyber Espionage was one of the most complex attack patterns described in the 2014 Data Breach Investigations Report (DBIR) from Verizon. Cyber Espionage is one form of Advanced Threat, in which an attack is specifically designed to infiltrate your network, dig in once it’s there, and then exfiltrate data back to the attacker’s servers.

Advanced threat attackers will typically enter the network through user activity and will exploit unpatched software security bugs in order to entrench themselves in your network. In the best of all possible worlds, users would know better than to open email attachments or click on suspicious links, and your software would be patched as soon as patches become available. In reality, users can be fooled by sophisticated spear-phishing emails that are indistinguishable from legitimate emails, or by a previously trustworthy website that has been compromised in a watering hole attack. And while patching systems quickly is the goal, the reality is that there are a large number of systems running older, unpatchable software that are easy targets.

The DBIR assembled a list of security controls from the SANS Institute to combat Cyber Espionage – and all of them are fundamental best practices.

Even with best practices firmly in place, due diligence is no guarantee against compromise. Best practices can lower the chances that an advanced threat will make it into your network, but complete security coverage means more than just locking down your network. You also need to check whether you’ve been compromised and, if you have, determine the scope of the problem.

One of the difficulties in determining whether you’ve been compromised by an Advanced Threat is that the signs are subtle. Once your systems have been compromised, there will be no ransomware demands or programs that suddenly stop working. The point of an Advanced Threat is to infiltrate a computer without ever being detected and then spread to other computers in the network so that the attack can maintain a foothold even if it’s removed from the originally infected computer.

Detecting an Advanced Threat means understanding what normal activity looks like on your network and monitoring it so that you notice activity outside normal operating parameters. It means monitoring your outbound network activity for signs that data is being exfiltrated. It means looking for internal network traffic between computers that have no reason to contact each other. It may also mean monitoring servers or workstations for unusual processes or unexpected installations. It means looking for any activity you can’t explain.

Finding the information to set up a security baseline isn’t the problem. There is a vast amount of data available in logs and from collectors for system or network activity. The problem is filtering out the vast majority of normal activity in order to pinpoint the exceptions that indicate a compromise. That’s where Splunk® can be used to surface anomalies and correlate incidents across multiple log files, so that you not only find a problem but also trace the scope of the attacker’s activity.

Prioritizing Your Cyberdefenses

Verizon’s 2014 Data Breach Investigations Report (DBIR) compiled a decade’s worth of security incident data, from both breaches and security incidents that did not result in data loss. They were able to group the incidents into 9 patterns based on the method of attack, the attack target, and attacker motivations: Point of Sale Intrusions (e.g. Home Depot), Web App Attacks, Insider and Privilege Misuse (e.g. Edward Snowden), Physical Theft, Miscellaneous Errors (e.g. document misdelivery or incorrect document disposal), Crimeware, Payment Card Skimmers, Cyberespionage, and DoS attacks. These 9 patterns described 92% of the attacks; the remaining 8% didn’t fit into the existing patterns but used similar underlying methods.

This breakdown of attack patterns is relevant because Verizon also found that different industries were targeted by different patterns. Obviously, retailers with physical stores were targets for Point of Sale Intrusions, but they were also primary targets of DoS attacks. While one might expect Crimeware to be a significant problem in healthcare, it turns out that Physical Theft, Insider Misuse, and Miscellaneous Errors were far more serious issues.

Knowing the most likely avenue of attack for your environment enables you to prioritize your defenses. Each pattern has a recommended set of control measures: securing POS systems can mean isolating those systems, restricting access, using and updating antivirus, and enforcing strong password policies. If Crimeware or Cyberespionage are potential problems, keeping a software inventory and scanning for unauthorized software may be more beneficial than restricting access. The conclusion of the report includes charts showing which security control measures are most significant for each threat pattern and industry, with links to the applicable sections of the SANS Critical Security Controls.

Addressing the most significant threats to your network is a good place to start, but obviously you want 100% protection, and prioritizing your defenses for 92% of possible threats isn’t a complete defense. As we discussed in an earlier post looking at Cisco’s 2015 Annual Security Report, malware and spam are evolving and becoming more difficult to detect, so there is also the possibility that a new type of attack could make it through your defenses before you have the tools to defend against it.

How can you defend against unknown attacks? The answer is by knowing what your environment looks like when there is no problem and monitoring it for unusual activity that can indicate problems. Network traffic to atypical outside sites could indicate someone trying to exfiltrate data from your environment. Failed login attempts could be someone trying a brute-force attack against your network. A large number of connections to your company’s web portal could be someone trying to hijack it.

Whether or not you’re looking at the activity in your environment, most applications have logs recording it. The difficulty is in differentiating normal behavior from threats and tracing threats across different log formats. Heroix has begun to work with Splunk® to help analyze logs and identify attacks.
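
As a small example of what that log analysis can look like, here is a minimal Python sketch that flags source IPs with an unusually high number of failed logins inside a short window, a simple brute-force indicator. The CSV format, window, and threshold are hypothetical; in production the same logic would run against your authentication logs inside a SIEM.

```python
# Minimal sketch: flag source IPs with many failed logins in a short window,
# a simple brute-force indicator. The CSV columns (timestamp,source_ip,user,
# result), WINDOW, and MAX_FAILURES are hypothetical examples.
import csv
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)
MAX_FAILURES = 20

def brute_force_sources(auth_csv):
    failures = defaultdict(list)
    with open(auth_csv, newline="") as f:
        for row in csv.DictReader(f):
            if row["result"] == "failure":
                failures[row["source_ip"]].append(
                    datetime.fromisoformat(row["timestamp"]))
    flagged = []
    for ip, times in failures.items():
        times.sort()
        for i, start in enumerate(times):
            # count failures inside the window that starts at this event
            in_window = [t for t in times[i:] if t - start <= WINDOW]
            if len(in_window) >= MAX_FAILURES:
                flagged.append((ip, start, len(in_window)))
                break
    return flagged

if __name__ == "__main__":
    for ip, start, count in brute_force_sources("auth_events.csv"):
        print(f"{ip}: {count} failed logins within {WINDOW} starting {start}")
```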

Security and Unsupported Linux Releases

Last September a serious vulnerability was found in the widely used Unix/Linux bash shell. The vulnerability, dubbed “Shellshock”, had been present since the beginnings of the bash shell in 1989. It was widely publicized, and patches were developed and distributed quickly. While the initial round of Shellshock patches was incomplete, comprehensive patches have been available since October 2014. In theory, Shellshock should no longer be a threat.
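
For readers who want to verify their own systems, a common way to check for the original Shellshock bug is to define a crafted function in the environment and see whether bash executes the trailing command. The Python sketch below wraps that classic test; note that it exercises only the original CVE-2014-6271 vector, not the follow-on bugs addressed by the later, more complete patches.

```python
# Minimal sketch of the classic Shellshock check: export a crafted function
# definition in the environment and see whether bash executes the trailing
# command. Only exercises the original CVE-2014-6271 vector.
import os
import subprocess

def bash_vulnerable_to_shellshock():
    env = dict(os.environ, testvar="() { :;}; echo VULNERABLE")
    result = subprocess.run(
        ["bash", "-c", "echo probe"],
        env=env, capture_output=True, text=True)
    return "VULNERABLE" in result.stdout

if __name__ == "__main__":
    print("vulnerable" if bash_vulnerable_to_shellshock() else "patched")
```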

In practice, as per the Cisco 2015 Annual Security Report, there are still a large number of systems that have not been patched for Shellshock. Part of the reason is that they are running older, unsupported Linux releases for which patches are not available. Unless a server is actively targeted for attack or displaying performance issues, upgrading to an actively supported OS and patching possible threats is often a lower priority for already overworked IT departments.

However, as older code is examined more closely, more bugs are found and the possible attack vectors increase. Qualys’ recently discovered GHOST vulnerability is yet another bug found in older code. GHOST uses a buffer overrun in the gethostbyname functions of the GNU glibc library to give attackers a means of executing arbitrary code. The security bulletin includes C code for a program that tests for GHOST and outlines how GHOST can be exploited against the Exim mail server.

GHOST dates back to glibc 2.2 (released November 2000) and was fixed in glibc 2.18 (released August 2013), so the fix is already available.  All you have to do is upgrade to glibc version 2.18 or newer.
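
Checking where you stand is straightforward on a glibc-based Linux system. The Python sketch below reads the version reported by “ldd --version” and compares it against 2.18. Keep in mind that this is only a version check, not an exploit test like the C program in Qualys’ bulletin, and that some vendors backported the GHOST fix (CVE-2015-0235) to older glibc versions without changing the version number.

```python
# Minimal sketch: report the system glibc version so it can be compared
# against 2.18, the first upstream release with the GHOST fix. Assumes a
# glibc-based Linux system where "ldd --version" prints the library version
# on its first line. This is a version check only, not an exploit test.
import re
import subprocess

def glibc_version():
    out = subprocess.check_output(["ldd", "--version"], text=True)
    match = re.search(r"(\d+)\.(\d+)", out.splitlines()[0])
    return (int(match.group(1)), int(match.group(2))) if match else None

if __name__ == "__main__":
    version = glibc_version()
    if version is None:
        print("could not determine glibc version")
    elif version >= (2, 18):
        print(f"glibc {version[0]}.{version[1]}: GHOST fix included upstream")
    else:
        print(f"glibc {version[0]}.{version[1]}: check vendor patches for CVE-2015-0235")
```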

Unfortunately, the repaired versions of glibc may not be an option for older OS versions. One problem is that the glibc library is referenced by applications, and a change in glibc versions can cause those applications to break. As a result, patching glibc involves both upgrading to an OS that supports the repaired glibc library and testing applications to make sure they function with the newer system and library versions. If an application doesn’t work on the newer system, then the application needs to be upgraded as well. If the application is no longer supported and doesn’t function on the new system version, then the decision comes down to weighing how critical the application is to your organization against how serious a security threat you consider unpatched vulnerabilities.

Finding and patching bugs will not eliminate all security threats. Users may select insecure passwords, fall prey to phishing scams, or lose a laptop with cached passwords. If attackers want in badly enough, they will find a way into your network. Patching security bugs will make it harder for them to succeed and may give you enough notice to block them completely.

Best Practices to Combat Computer Security Issues in 2015

Cisco’s recently published 2015 Annual Security Report summarized the security trends it found in 2014 and advised on best practices to address predicted threats in 2015.  Some of the key security findings were:

  • Malware is getting better at evading detection
    With increasing attention to Java security, the number of exploits using Java decreased over the course of 2014. However, exploits are becoming more difficult to detect because they change quickly, they are expanding into technologies not often exploited before (e.g. Microsoft’s Silverlight), and they can involve multiple technologies (e.g. using both Flash Player and JavaScript).

  • Spam is growing in volume and sophistication
    Cisco found that the volume of spam increased 250% from January to November 2014, and spammers are finding more ways of evading spam filters. Spam content, sender addresses, and originating IP addresses can all be difficult to differentiate from legitimate emails. To hide the sender’s IP address, spammers have been using a “snowshoe” method of emailing in which the emails are sent from a large number of (often infected) computers, so it isn’t possible to track the email back to a specific blacklisted address.

  • Known threats are still problems on outdated software
    2014’s headline security threats – Heartbleed and Shellshock – are still issues. Cisco’s surveys found that 56% of OpenSSL implementations were using versions more than 4 years old, and only 10% of the IE browsers accessing sites were using the current version. The problem is not that patches aren’t being applied; it’s that installations aren’t being updated to versions for which patches are available. Cisco provided the following recommendation:

    To overcome the guaranteed eventual compromise that results from manual update processes, it may be time for organizations to accept the occasional failure and incompatibility that automatic updates represent.

  • Users are a significant vulnerability
    Malware may be inadvertently downloaded from seemingly safe websites – Cisco found that many high-ranking, short-lived websites contained malware. Malware was also installed through browser add-ons and software downloads – often with misleading and confusing install options that trick the user into agreeing to install the malware. In Cisco’s survey of 70 companies, 711 users were affected by malware at the beginning of the year, rising to a peak of 1,751 affected users during the month of September.

Cisco provided several recommendations on how to deal with the current security climate:

  • Adopt a more sophisticated endpoint visibility, access, and security (EVAS) control strategy.
    Even if you are able to secure your network, you still have to plan for what to do if an attack occurs. Determining the scope of an attack that makes it past a network’s front door means monitoring the potential target endpoints within the network. EVAS monitors endpoint activity within a network before, during, and after attacks, allowing you to formulate a plan for mitigating the threat and providing the tools to conduct a forensic analysis to prevent the problem from recurring.

  • Security must be integrated into the business
    Business planners and security staff must work together to ensure that security is an integral part of all IT plans. However, security that makes it difficult for users to access resources can result in users finding ways to circumvent security measures. Security planning must consider both protection from threats and accessibility for users.

  • Users must be included in the security plans
    No security plan will be able to address all problems. There are too many possible threats, and they change too quickly. Users must be trained on what activities are potentially dangerous, how to recognize when there is a problem, and how to report it.

VMware NUMA Performance

In our previous post we outlined how NUMA works with Virtual Machines (VMs) that either fit entirely into one NUMA home node or are divided into multiple NUMA clients, with each client assigned its own home node. The following points should be considered when working with VMs that use NUMA hardware:

  • The hypervisor may migrate VMs to a new NUMA node on the host.
    Hypervisors adjust physical resources as needed to balance VM performance and fairness of resource allocation across all VMs. If the home NUMA node for a VM is maxed out on CPU, the hypervisor may allocate CPU from a different NUMA node, even if that means incurring latency from accessing remote memory. When CPU becomes available on the NUMA node holding the VM’s memory, the hypervisor will migrate the CPU resources back to that NUMA node in order to improve memory access. The hypervisor factors in CPU usage, memory locality, and the performance cost of moving memory from one NUMA node to another when determining whether to migrate a VM to a different NUMA node.


    VMs may also be migrated in an attempt to achieve long-term fairness. The CPU Scheduler in VMware vSphere 5.1 gives an example of 3 VMs on 2 NUMA nodes. The NUMA node with 2 VMs splits its resources between those 2 VMs, while the NUMA node with only one VM has all of its resources allocated to that VM. In the long term, migrating VMs between the two nodes may average out performance, but in the short term it just transfers the resource contention from one VM to another, with the additional performance cost of moving memory from one NUMA node to the other. This type of migration can be disabled by setting the advanced host attribute /Numa/LTermFairnessInterval=0.

  • VMs that frequently communicate with each other may be placed on the same NUMA node.
    VMware’s ESX hypervisor may place VMs together on the same NUMA node if there is frequent communication between the VMs. This “action-affinity” can end up causing an imbalance in VM placement on NUMA nodes by assigning multiple VMs to the same home NUMA node and underpopulating other NUMA nodes. In theory, the gain from the VMs being able to access common memory resources will offset the increase in CPU ready levels. In practice this may not be the case, and the feature can be disabled by setting /Numa/LocalityWeightActionAffinity=0 in advanced host attributes.

  • Hyperthreading doesn’t count when VMs are assigned to NUMA nodes.
    The hypervisor looks at available physical cores when determining which NUMA node to assign as the home node for a VM. If a NUMA node has 4 physical cores and a VM is allocated 8 processors, the VM would be divided into 2 NUMA clients spread over 2 nodes. This ensures CPU resources but may increase memory latency. If a VM is running a memory-intensive workload, it may be more efficient to restrict the VM to one NUMA node by configuring the hypervisor to take hyperthreading into account. This is done by setting the numa.vcpu.preferHT advanced VM property to True.

  • VMs migrating between hosts with different NUMA configurations may experience degraded performance.
    VMs moving to a host with smaller NUMA nodes may need to be split into multiple NUMA clients, while hosts with larger NUMA nodes may be able to merge wide VMs into a single node. Performance will be degraded until the hypervisor on the new host can reconfigure the VMs for the new host’s NUMA configuration.

  • VMs spread over multiple NUMA nodes may benefit from vNUMA.
    Some applications and operating systems can take advantage of the NUMA architecture to improve performance. A VM running these applications or operating systems that is spread across multiple NUMA nodes can be configured to use virtualized NUMA (vNUMA) to take advantage of the underlying architecture just as if it were on a physical host, leading to large performance gains. However, if the VM migrates to a new host with a different NUMA configuration, performance could be degraded until the VM can be restarted using the new vNUMA configuration.

While adjustments to the hypervisor’s NUMA algorithms may provide some performance improvements, the last two items are the most important takeaways. It is a best practice to ensure that hosts in a cluster have the same NUMA configuration to avoid performance issues when VMs move from one host to another.

How to Minimize CPU Latency in VMware with NUMA

On the most basic level, CPUs do one thing: process data based on instructions. The faster the CPU, the faster it processes data. But before a CPU can process data, it has to read both the data and the instructions from slower system RAM, and that latency can slow processing. To minimize the time the CPU spends waiting on reads, CPU architectures include on-chip memory caches that are much faster than RAM. However, even though the on-chip caches have hit rates better than 95%, there are still times when the CPU has to wait for data from RAM.

When the CPU reads from RAM, the data is transferred along a bus shared by all the CPUs in a system. As the number of CPUs in a system increases, the traffic along that bus increases as well, and CPUs can end up contending with each other for access to RAM. This is where NUMA comes in – NUMA is designed to minimize system bus contention by increasing the number of paths between CPU and RAM.

NUMA (Non-Uniform Memory Access) breaks up a system into nodes of associated CPUs and local RAM. NUMA nodes are optimized so that the CPUs in a node preferentially use the local RAM within that node. The result is that CPUs typically contend only with the other CPUs in their NUMA node for access to RAM, rather than with all the CPUs in the system.

As an example, consider a system with 4 processor sockets, each with 4 cores, and 128 GB of RAM. Without NUMA, that comes to 16 physical processors that could potentially be queued up on the same system bus to access the 128 GB of RAM. If the same system were broken up into 4 NUMA nodes, each node would have 4 CPUs with local access to 32 GB of RAM.

The ESXi hypervisor can manage virtual machines (VMs) so that they take advantage of the NUMA system architecture. The VMware 5.1 Best Practices Guide divides VMs running on NUMA into 2 groups:

  1. The number of virtual CPUs for a VM is less than or equal to the number of physical CPUs in the NUMA node.
    The ESXi hypervisor assigns the VM to a home NUMA node where memory and physical CPU are preferentially used. The best practice in this case is for the allocated VM memory to be less than the NUMA node’s memory. As far as the VM is concerned, it is effectively on a non-NUMA system where all CPU and memory resources are local.

  2. The number of virtual CPUs for a VM is greater than the number of physical CPUs in the NUMA node (“Wide VMs”).
    Wide VMs are split into multiple NUMA clients, with each client assigned a different home NUMA node. For example, if a system had multiple NUMA nodes of 1 socket with 4 cores each (4 physical CPUs per node) and a wide VM had 8 virtual CPUs, then ESXi could divide the VM into two NUMA clients of 4 virtual CPUs each, assigned to 2 different home NUMA nodes. The problem with dividing a wide VM into multiple NUMA clients is that it introduces the possibility that one client node may need to access memory that belongs to a different NUMA client’s node.

Note: In a previous post we discussed hyperthreading – as per The CPU Scheduler in VMware vSphere 5.1, hyperthreading isn’t taken into account when calculating the number of available processors on a NUMA node.
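
The arithmetic behind the split is simple enough to sketch. The Python snippet below estimates, for a given vCPU count and NUMA node size, whether a VM fits in a single home node or becomes a wide VM split into multiple NUMA clients, and how the numa.vcpu.preferHT setting changes that outcome. It is a simplified illustration of the sizing rule described above, not the actual ESXi scheduler logic.

```python
# Minimal sketch of the sizing arithmetic: given a NUMA node's core count and
# a VM's vCPU count, estimate whether the VM fits in one home node or must be
# split into multiple NUMA clients. Simplified illustration only, not the
# actual ESXi scheduler logic.
import math

def numa_placement(vcpus, cores_per_node, threads_per_core=1, prefer_ht=False):
    # By default only physical cores count; numa.vcpu.preferHT=True lets the
    # scheduler count logical (hyperthreaded) processors as well.
    capacity = cores_per_node * (threads_per_core if prefer_ht else 1)
    clients = math.ceil(vcpus / capacity)
    return {"node_capacity": capacity, "numa_clients": clients, "wide_vm": clients > 1}

if __name__ == "__main__":
    # 8 vCPUs on 4-core nodes: split into 2 NUMA clients...
    print(numa_placement(vcpus=8, cores_per_node=4, threads_per_core=2))
    # ...but with preferHT the VM can stay on one node's 8 logical processors.
    print(numa_placement(vcpus=8, cores_per_node=4, threads_per_core=2, prefer_ht=True))
```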

In our next post we’ll take a look at using virtual NUMA (vNUMA) to minimize wide VMs accessing remote NUMA memory and what happens when VMs configured to use NUMA are migrated to a host with a different NUMA system configuration.