SLA Monitoring for Distributed Applications

What are distributed applications?

A distributed application is any application whose components are spread across multiple devices.  For example, a web-based application's components could include multiple web servers, backend databases, and networking devices.  For the application to function at an acceptable level, each of its components must function at an acceptable level.

The first step in monitoring a distributed application is to break it down into its components.  Components can reside entirely on one server or be distributed across multiple servers in a network, and can even include cloud or internet-based resources.  The performance of each component, and the speed and integrity of the connections between components, affect the application as a whole.  You need to be able to differentiate between an application hanging because a SQL query took too long to respond, which you can troubleshoot, and a remotely hosted JavaScript resource taking too long to load, which is out of your hands.


Why use distributed applications?

There are some cases where a single computer is sufficient, but for larger and more business-critical applications a distributed system is more practical. For example, it might be more cost-effective to share the workload across a cluster of low-end computer hardware than to invest in a single high-end computer. An application built on a cluster of systems is also inherently more scalable and fault tolerant.

Distributed applications can also use the Internet and the Cloud for both application features and additional resources that might be impractical to provide locally.  Remotely hosted search engines, mapping utilities, file sharing, etc., can be seamlessly integrated into locally hosted web sites without the cost of developing, hosting, or maintaining these features.  The drawback to using remotely hosted features is that you have no control over their performance and need to monitor them to ensure that they do not create a bottleneck for overall application performance.


What is a distributed application SLA?

The basic concept of a distributed application SLA is monitoring the performance data for each of the application's components as it pertains to the performance of the overall application.  So, for a web server, there would be a test of how long it takes a page to load; for a SQL server, how long it takes to respond to a query.  The level of detail can range from the basic user perspective (e.g. the web page must load in less than 2 seconds) to more in-depth performance metrics (e.g. send an alert if any SQL Server disk queue length is > 2).


What are the parameters of distributed SLA monitoring?

Longitude SLAs have three states:  Good, Degraded, and Unacceptable.  Each performance measure is given thresholds that mark the transition from "good" to "degraded", and from "degraded" to "unacceptable" performance.  For example, if a ping takes longer than 1500 ms to return, that component may slip from "good" to "degraded"; if it takes longer than 3000 ms, it becomes unacceptable.  In terms of user experience, this can translate to waiting a few seconds for a page to load (good), vs. waiting a minute (degraded), vs. the browser timing out (unacceptable).
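
To make the threshold-to-state mapping concrete, here is a minimal sketch of the classification logic expressed in SQL.  This is purely illustrative, not how Longitude itself implements it, and it assumes a hypothetical ping_results table with target_name and response_ms columns:

-- Illustrative sketch only: ping_results and its columns are hypothetical.
SELECT target_name,
response_ms,
CASE
WHEN response_ms IS NULL OR response_ms > 3000 THEN 'Unacceptable' -- no reply, or slower than 3000 ms
WHEN response_ms > 1500 THEN 'Degraded'                            -- between 1500 and 3000 ms
ELSE 'Good'                                                        -- 1500 ms or faster
END AS sla_state
FROM ping_results;

The same pattern applies to any measure: choose the thresholds, classify each sample, and roll the worst state up to the component.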

Each component's state is then color coded and stacked in a real-time chart that provides a visual correlation between components.  For example, the underlying network congestion that causes pings to slow down will also affect other network-dependent components, and that degradation will be visible across every affected component.  This visual correlation can be used to quickly pinpoint the root cause of performance issues in an application.


Why do businesses need distributed application SLA monitoring?

Any enterprise with contractual performance obligations involving service level agreements needs to ensure and document compliance, as well as the causes of any noncompliance. Correlating user experience measurements with the infrastructure of the distributed network can provide vital indicators affecting contract fulfillment as well as future business. To name three:

  • application availability
  • long term application performance
  • root causes of outages or performance degradation affecting end-users


How does the Heroix Longitude SLA dashboard work?

Longitude's SLA Dashboard provides real-time displays of SLA performance, as well as:

  • drilldown for investigating compliance shortfalls
  • historical reports
  • ability to comment on extenuating circumstances or actions taken to resolve identified problems
  • tools for documenting changes in IT infrastructure and measuring the effects on SLA performance


Is Heroix Longitude right for your business?

If you are looking for affordable, easy-to-use performance monitoring software that works across Windows, Unix, Linux, VMware, Hyper-V, and network devices, and that can keep your distributed system compliant and working, our product and support may be for you.

Stay tuned to this location for more details.

Identifying Critical Error Messages in MS SQL

Administering even a single instance of MS SQL can be challenging, let alone a series of linked servers.  While MS SQL is a powerful tool for data management, that power also comes with a great deal of complexity.  Every IT admin worth their salt needs to closely monitor the MS SQL instances in their care, but the sheer volume of potential issues and corresponding error messages can be overwhelming.

In this article we'll examine the most critical error messages presented by an MS SQL instance, from identifying which errors are possible in your environment to seeing how those messages propagate into the local Windows Event Log.

Evaluating Your Environment’s Potential Errors

With MS SQL having been a dominant database backend for over 25 years, there are numerous versions of SQL Server in the field, so it is important to identify the critical error messages that are possible for the particular version you are working with.  Thankfully, MS SQL itself makes this task quite easy with a few simple queries.

Begin by connecting to the database you wish to check, then run the following query:

SELECT *
FROM master.dbo.sysmessages
WHERE msglangid = 1033
ORDER BY severity, description;

This displays the entire list of available system messages, filtered to show only messages that are localized in US English (msglangid = 1033).  If you are using a different localization, you can find the appropriate msglangid code by executing the following query and replacing the msglangid value above to match your own language.

SELECT name, alias, msglangid
FROM sys.syslanguages;

Now with the full list of potential MS SQL messages at our fingertips, we need to narrow this down from all possible messages to only errors. From there we can decide which errors are the most critical and require monitoring or alerts.
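
Before filtering, it can help to get a feel for how the messages are distributed.  As a quick sketch, a simple aggregate over the same view shows how many messages exist at each severity level:

-- Count the US English messages at each severity level.
SELECT severity, COUNT(*) AS message_count
FROM master.dbo.sysmessages
WHERE msglangid = 1033
GROUP BY severity
ORDER BY severity;

The severity column is what we will use below to separate informational messages from errors that need attention.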

MS SQL Message Severity Levels

Due to the sheer volume of possible error types that MS SQL can report, MS SQL assigns every error a numeric severity value indicating how critical, or severe, the error message is.  The MS SQL documentation suggests that all “error messages with severity levels 17 and higher” should be dealt with by an administrator.  You can find the full details on what each severity level of 17 and above means in Microsoft's documentation, but below are the basic descriptions to give a brief overview:

  • 17: Insufficient Resources
  • 18: Nonfatal Internal Error Detected
  • 19: SQL Server Error in Resource
  • 20: SQL Server Fatal Error in Current Process
  • 21: SQL Server Fatal Error in Database (dbid) Processes
  • 22: SQL Server Fatal Error Table Integrity Suspect
  • 23: SQL Server Fatal Error: Database Integrity Suspect
  • 24: Hardware Error

It is worth noting that while some older versions of MS SQL will produce errors with a severity level of 25, most modern instances max out at a severity level of 24.

Regardless, it's important to note that every message with a severity of 17 – 19 should be monitored at the very least, while severity levels of 20+ indicate a fatal system error in which the currently executing process has halted.  As seen above, the particular reason for a fatal error can vary, but in all cases the error is written to the error log and an administrator must be alerted immediately for resolution.
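
As a sketch, the severity 17 – 19 range can be listed with a small variation of the earlier sysmessages query:

-- List the non-fatal but monitor-worthy errors (severity 17 through 19).
SELECT error, severity, description
FROM master.dbo.sysmessages
WHERE msglangid = 1033
AND severity BETWEEN 17 AND 19
ORDER BY severity, description;

These errors are worth monitoring even though they are not fatal; the fatal errors that require immediate alerts are covered next.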

Viewing Fatal Errors

With a slight modification to our previous query, we can now take a look at the list of fatal errors that should be monitored and should generate an administrative alert:

SELECT *
FROM master.dbo.sysmessages
WHERE msglangid = 1033
AND severity >= 20
ORDER BY severity DESC, description;

This lists fatal errors in roughly the order of severity beginning with hardware failures and other messages that must be addressed immediately.

Best of all, the error column, which indicates the specific error number, matches the Event ID column of messages written to the Windows Application Event Log, allowing you to easily compare events from the Event Log to errors that MS SQL may be throwing.
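
For example, if the Application Event Log shows an event with Event ID 823 (used here purely as an illustration of a high-severity error number), the corresponding message text and severity can be looked up directly:

-- Look up the message text and severity for a specific error number.
SELECT error, severity, description
FROM master.dbo.sysmessages
WHERE msglangid = 1033
AND error = 823;

The same lookup works for any error number that appears in the Event Log.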

While the number of potential errors that MS SQL can produce is high, by adding some basic filters to monitor and alert on errors above a particular severity level you can be sure to stay on top of everything that is happening on your server.
To greatly simplify the monitoring process for all these MS SQL errors, begin your free trial of Longitude by Heroix today!

Phishing Kits and Tackle Boxes: Understanding the Danger and Being Prepared

Fishers and phishers both use lures to attract the unsuspecting. Fishers of course catch fish, while phishers capture confidential data that can be used to gain unauthorized access to secure systems. Here’s a closer look at how phishers go phishing and the tools they use. We’ll also cover a few preventative tips to make sure you don’t fall for phishing scams.

Phishing Kits

Professional phishers usually have a go-to tackle box or phishing kit that they use on their phishing endeavors. Some phishers make tackle boxes for themselves, but others use pre-made phishing kits. Pre-made phishing kits can come with email and proxy server lists, pre-generated HTML pages, hosting services, and even scripts to process user input.

What does the average phisher’s tackle box look like?

Phishers don’t have to be technologically advanced because they can purchase a virtual tackle box that comes with all the tools they’ll need to start their phishing attacks:

  • Specialized malware
  • Technical deceit
  • Abuse of DNS
  • Bots/botnets
  • Session hijacking

Specialized Malware

Malware has become extremely sophisticated over the past few years. With the simple installation of malware on a single computer, phishers can spread through a network and dig themselves in to make complete removal extremely difficult. Once they’re in, they can start stealing confidential information and sending it back over the internet to their own servers. Phishers have become incredibly successful at redirecting traffic from a legitimate web server to their own web site. Once the phisher’s site is accessed from the victim’s computer, they can infect the victim with malware, take advantage of browser vulnerabilities, or steal credentials when the user attempts to log on.

Technical Deceit

Because more people are becoming aware of phishing and how to identify deceitful online tactics, technical deceit has become more advanced as well. Phishers now have the capability to fully mimic web pages and dialog boxes, providing direct access to authentication information.

Botnets

A botnet is a large number of computers that have been infected with bot malware. When the malware is activated, the infected computers can be used to send spam, take part in DDOS attacks, steal credentials, etc. Bots are often deployed through social networking platforms via mass mailing, instant messaging features, and file-sharing applications. Phishers with control over the botnets can perform an assortment of illicit activities, including:

  • Proxy services
  • Sending spam and phishing emails
  • Surveillance
  • Installation of malware
  • Updating existing malware


Session Hijacking

Even though most phishing scams take place by leading a person to a malicious site, there are instances in which a session can be hijacked. By capturing cookies or session IDs, hackers can impersonate users, even if the user is logged on to a legitimate site following standard security practices.

Tips for Avoiding Phishing Scams

Now that you know what the average phisher’s tackle box looks like, you can take measures to avoid phishing and malware. Tips to keep in mind include:

  • Learn about phishing: The only way to avoid a problem is if you know it exists, so make sure to keep yourself regularly updated on the latest phishing scams.
  • Use browser add-ons and plugins: All major browsers have add-ons, extensions, and plugins that can look for signs of phishing and provide warnings for sites with bad reputations.
  • Look for “https”: On any site where you are asked to enter personal or financial information, make sure the URL starts with https instead of http. HTTPS sites use security certificates that not only validate the identity of the site, but also encrypt any data you send over the internet.

Contact Heroix for more details on how you can detect malware and phishing in your network.

Stay Protected from Phishing – The Importance of Strong Authentication

The Internet has brought with it an assortment of benefits. From ecommerce to online degree programs, people all around the world are able to communicate and conduct business over secure Internet connections. And while these secure connections have made it possible to send confidential data over the Internet safely, they have also opened up an entirely new form of criminal activity — phishing for authentication credentials.

It’s important to note that phishing isn’t restricted to insecure Internet sites. Phishing can also be accomplished with computer programs, scripts, and malware that take advantage of security bugs. And unfortunately, the resources needed to carry out phishing are available from both private and public sources in the form of exploit kits. Some exploit kits have evolved and become automated, making it much easier for criminals without advanced technology skills to execute their own attacks.

In order to protect your company from phishing scams, it is imperative that proper countermeasures be put into place. Here’s a closer look at phishing and the importance of strong authentication within your IT infrastructure and online activities.

Man-in-the-Middle Phishing Attacks

One of the primary reasons that phishing is so difficult to deter is that you may never know it’s happening. When a criminal carries out a man-in-the-middle phishing attack, communications taking place between two parties are intercepted by the attacker. The attacker could modify transmitted data while the victim is still logged on, or use or sell the intercepted information to impersonate the victim later.

Dialog Box Phishing Attacks

Phishing malware comes in many advanced forms, including dialog box overlays that can capture authentication credentials and PINs. For example, the malware could sit on top of your bank’s login screen and collect your username and password. If it passed those credentials on to the real login screen, you might never know you were compromised.

Phishing Countermeasures: It’s Time to Enhance Your Authentication Processes

The purpose of authentication is to verify a person’s identity and ensure they have the proper credentials to access the data they are trying to reach. To guard against illicit use of phished or easily guessed credentials, an effective way to strengthen the authentication process is to require more than just a username and password. Many companies will ask for two forms of information, such as a person’s birth date and a PIN. Some companies, however, will mandate that a user have a specific hardware token in addition to two other pieces of information. The more information that is needed, the more secure the authentication process will be.

Although strong authentication processes can sometimes be a bit tedious, they are a necessary evil. One way to implement a two-step authentication process is to include SMS authentication; this strategy requires customers to enter at least one piece of information — security question answer, birth date, SSN, etc. — and then, after the correct answer is given, an SMS message is sent to the customer’s phone with a unique login password. This type of security significantly lowers the risk of attackers capturing all of the authentication information needed to hack your customers’ accounts, without forcing customers to remember the answers to a long list of questions.

To learn more about phishing and how you can steer clear of attackers, check out the Heroix blog today!

Industrial Control Systems and Security

Symantec’s 2015 Website Security Threat Report included details of methods used in targeted attacks on Industrial Control Systems (ICS).  ICSes control not just industrial facilities, but critical infrastructure components such as energy, transportation, and communications networks.  Repercussions of hacks on ICSes can include the potential sabotage of energy systems, or the massive damage done in a German steel mill when an incursion led to the unscheduled shutdown of a blast furnace.

The function of ICSes makes them high-priority targets for hackers.  Long-term attacks using every possible attack vector and vulnerability are the norm, and understanding the best practices needed to protect ICSes can provide significant insight into protecting systems that are not as heavily targeted.

NIST’s Guide to Industrial Control Systems (ICS) Security notes that ICSes were originally not available over the internet, and were therefore subject to threats from local access only.  ICSes’ primary design concerns were availability, safety, and performance, and patching security vulnerabilities cannot sacrifice these features.  With that in mind, NIST provides the following recommendations for securing ICSes:

  • Restricting logical access to the ICS network and network activity
  • Restricting physical access to the ICS network and devices
  • Applying security patches “in as expeditious a manner as possible, after testing them under field conditions”
  • Disabling all unused ports and services
  • Restricting user privileges to only required privileges
  • Using antivirus software
  • Using file integrity checking software

Symantec’s report found that in practice ICSes were not locked down as stringently as NIST recommended.

The Dragonfly group exploited multiple vulnerabilities in ICS security with attacks on the US energy sector dating back to 2011.  They started with a spear phishing attack, and then moved on to compromising websites that users at their targets were likely to visit in a watering hole attack.  The compromised websites redirected visitors to other sites that hosted exploit kits, which were then downloaded onto the target network’s computers.  Exploits at this stage were primarily used for reconnaissance on the corporate network, capturing everything from file names to passwords.  With hackers entrenched in the corporate network, the only way to protect the ICS is to completely segregate its network.

The last stage of the Dragonfly attack was innovative, breaking directly into the ICS network.  They used a variant of the watering hole attack in which software from ICS equipment manufacturers was infected.  When ICSes checked for updates, they downloaded the malware along with the update.  A thorough test of patches before installation might have detected the infected software.

Another direct attack on ICSes exploited internet-accessible human-machine interfaces (HMIs).  Symantec outlines the vulnerabilities in HMIs and other ICS web interfaces:

Many of the proprietary Web applications available have security vulnerabilities that allow buffer overflows, SQL injection, or cross-site scripting attacks. Poor authentication and authorization techniques can lead the attacker to gain access to critical ICS functionalities. Weak authentication in ICS protocols allows for man-in-the-middle attacks like packet replay and spoofing.

Best practices for vulnerable, internet-accessible applications are patching where available, restricting access to secure channels (e.g. VPN), and implementing multifactor authentication.

The stakes are higher for ICSes, but the best practices are the same.  Keep your high value targets segregated.  Keep up to date on patches and antivirus definitions.  Educate your users on security.  And finally, make sure you monitor your networks for any and all suspicious activity.

Why You Should Care About Legacy Patches

As we noted in previous posts on security best practices and unsupported releases, unpatched applications and operating systems can be a significant security vulnerability.  Verizon’s 2015 DBIR includes a survey of the Common Vulnerabilities and Exposures (CVEs) observed in security incidents in 2014.  They found that “Ten CVEs account for almost 97% of the exploits observed in 2014”, with the top 10 threats dating from 1999 to 2014:

  • CVE-2002-0012: SNMP V1 trap vulnerabilities
  • CVE-2002-0013: SNMP V1 GetRequest, GetNextRequest, and SetRequest vulnerabilities
  • CVE-1999-0517: SNMP default (public) or null community string
  • CVE-2001-0540: Malformed RDP requests cause memory leak in Windows NT/2000 Terminal Services
  • CVE-2014-3556: I/O buffering bug in nginx SMTP proxy allows for command injection into encrypted SMTP sessions (fixed in nginx 1.7.4)
  • CVE-2012-0152: Windows 7/2008 Terminal Server denial of service vulnerability
  • CVE-2001-0680: QPC’s QVT/Net 4.0 and AVT/Term 5.0 ftp daemon can allow remote users to traverse directories
  • CVE-2002-1054: Pablo ftp server allows arbitrary directory listing
  • CVE-2002-1931: PHP Arena paFileDB versions 1.1.3 and 2.1.1 vulnerable to cross-site scripting
  • CVE-2002-1932: Windows XP and 2000 do not send alerts when Windows Event Logs overwrite entries

While these CVEs covered the majority of the observed incidents, patching only the applicable CVEs in this list is not a shortcut to 97% safety.  Thousands of vulnerabilities exist, more are discovered on a daily basis, and any vulnerability present on your system is subject to exploitation.  Even if you do get everything in your environment patched, you will still need to apply new patches in a timely manner.  To get an idea of exactly what “timely manner” means, we can use the DBIR’s examination of how much time it took for newly discovered CVEs to be exploited after they were published:

Half of the CVEs exploited in 2014 fell within two weeks. What’s more, the actual time lines in this particular data set are likely underestimated due to the inherent lag between initial attack and detection readiness (generation, deployment, and correlation of exploits/signatures).

The timeline between a bug’s discovery and exploitation puts “timely manner” somewhere between “drop everything and do it now” and “wait for the next maintenance window” depending on the severity of the problem and your organization’s sensitivity to patch associated downtime.

This prompts the question: should you set all your software to update automatically and clean up after any bugs and outages the patches themselves might cause?  In a previous post we looked at the topic of automatic patch updates and listed some criteria for determining when to apply patches.  Your patching policy will be a balance between catching up on old vulnerabilities, patching new ones in a timely manner, and maximizing availability for your users.

Regardless of how well caught up you are on patches, there will be unavoidable holes in patch coverage and security.  As with the Shellshock bug last September, the initial patches needed to be revised, and patches were not available for all platforms at the same time.  The nature of zero-day exploits is that the bug is already there; you just don’t know about it.  Effective patch management will minimize but not eliminate your security risks.  You still need to monitor your systems for unexpected behavior that could indicate a security problem.

Where do you start with Critical Security Controls?

Verizon’s 2015 Data Breach Investigations Report (DBIR) was published last month, building on the findings of the 2014 DBIR.  We’ll take a look at the major findings of the 2015 report in our next post, but in the spirit of heading for the dessert bar first, we’ll start with the goodies.  The conclusion of the report includes a table of the top Critical Security Controls (CSCs) from the SANS Institute, which cover the majority of the observed threats:

  • CSC 13-7: Two-factor authentication for remote logins.
  • CSC 6-1: Make sure applications are up-to-date, patched, and supported versions.
  • CSC 11-5: Verify that internal devices available on the internet are actually required to be available on the internet.
  • CSC 13-6: Use a proxy for all outgoing traffic to provide authentication, logging, and the ability to whitelist or blacklist external sites.
  • CSC 6-4: Test web applications both periodically and when changes are made.  Test applications under heavy loads for both DOS and legitimate high-use cases.
  • CSC 16-9: Use an account lockout policy for too many failed login attempts.
  • CSC 17-13: Block known file transfer sites.
  • CSC 5-5: Scan email attachments, content, and web content before it reaches the user’s inbox.
  • CSC 11-1: Remove unneeded services, protocols, and open ports from systems.
  • CSC 13-10: Segment the internal network and limit traffic between segments to specific approved services.
  • CSC 16-8: Use strong passwords.  Suggested policy: passwords contain letters, numbers, and special characters; are changed at least every 90 days; have a minimum password age of 1 day before reset; and cannot reuse any of the 15 previous passwords.
  • CSC 3-3: Restrict admin privileges to prevent installation of unauthorized software and admin privilege abuse.
  • CSC 5-1: Use antivirus software on all endpoints and log all detection events.
  • CSC 6-8: Review and monitor the vulnerability history, customer notification process, and patching/remediation for all 3rd party software.


This is not a comprehensive list – it’s a starting point.  Some CSCs listed above may not apply to your company, and other CSCs critical to your environment may not be in the list.  The technology used to implement CSCs is updated over time to keep up with threats, but more often than not it is playing catch-up.  The bottom line is that even with a well-controlled environment you can still be vulnerable to unknown threats.

The key to detecting unknown threats is to know the baseline behavior for your network and look for deviations from that baseline.  If you know what normal login traffic looks like you can see when there are attempts to log in from unusual locations or at unusual times.  You can detect attempts to upload files to unknown ftp sites.  You can detect an unusual spike in web traffic. More importantly – you can isolate and investigate the behavior.

Logs from domain controllers and web servers, and syslogs from network devices, exist by default, but there is no easy way to correlate across the logs in their raw form to analyze problems.  We recommend Splunk® for its ability to correlate fields across different log formats, and for its dashboard and alerting capabilities that highlight activity deviating from baselines.


How Does an Advanced Threat Work?

In our last post we looked at the Verizon 2014 DBIR’s recommendations for security controls from the SANS Institute to protect against advanced threats.  Mandiant’s threat report – M-Trends 2015: A View From the Front Lines – provides additional context for why these controls are needed through a case study of an advanced threat.

The first step in the attack outlined in Mandiant’s study was gaining initial access to a corporate domain, and this was done by authenticating with valid credentials through a virtualized application server.  Mandiant was not able to determine how the credentials were obtained in this case.  While spear phishing and other user exploits are possible methods for harvesting credentials, the attackers may have used software vulnerabilities on the system to intercept login credentials.  For example, the year-old Heartbleed vulnerability was exploited to gain valid credentials used for attacks on Community Health Systems and the Canada Revenue Agency.

Once the attacker had access, the next step was to use a misconfiguration of the virtual appliance to gain elevated privileges, which allowed for the download of a password-dumping utility.  That led to the local administrator password, which was the same for all systems and thus gave access to every system in the domain.  Mandiant’s report then outlines how Metasploit was used to reconnoiter the environment, obtain corporate domain admin credentials, configure additional entry points, and install back doors to communicate with command and control servers over the internet.

Once the attacker was entrenched in the systems, they moved on to their target in the retail environment.  The retail environment was a child domain of the corporate domain, and the hacked domain admin credentials allowed full access.  The attacker copied malware to the retail sales registers, and that malware harvested payment card data, copying the data to a workstation with internet access so that it could be exfiltrated to the attacker’s servers via FTP.

Some important points about this attack are:

  1. Valid credentials were used.
    Patching vulnerable servers and educating users may make it harder but not impossible for attackers to obtain valid credentials.  Multifactor authentication could have made access more difficult and monitoring user login locations could have flagged unauthorized access.

  2. Server and application configuration is critical.
    An application misconfiguration led to local admin privileges and common local admin accounts made it easy to spread throughout the corporate domain and then to the target retail environment.  Using different admin credentials and restricting access to the retail environment would have slowed down and possibly prevented data exfiltration.

  3. Network traffic anomalies were not noticed.
    FTP traffic was used to download tools and upload data.  The attack used external command and control servers.  Watching traffic over the corporate and retail networks would have picked up recurring patterns in data sent to and received from unknown external addresses.

Advanced threats are designed to be persistent and difficult to detect.  Mandiant reported:

[A]ttackers still had a free rein in breached environments far too long before being detected—a median of 205 days in 2014 vs. 229 days in 2013. At the same time, the number of organizations discovering these intrusions on their own remained largely unchanged. Sixty-nine percent learned of the breach from an outside entity such as law enforcement. That’s up from 67 percent in 2013 and 63 percent in 2012.


It is not possible to completely eliminate the possibility of a data breach.  But you can limit the scope of a breach and detect it sooner by using the best practices outlined by SANS.org, and by monitoring your network and server activity with SIEM tools such as Splunk®.


Advanced Threats and Cyber Espionage

Cyber Espionage was one of the most complex attack patterns described in the 2014 Data Breach Investigations Report (DBIR) from Verizon.  Cyber Espionage is one form of an Advanced Threat in which an attack is specifically designed to infiltrate your network, dig in once it’s there, and then exfiltrate data back to the attacker’s servers.

Advanced threat attackers will typically enter the network through user activity and will exploit unpatched software security bugs in order to entrench themselves in your network.  In the best of all possible worlds, users would know better than to open email attachments or click on suspicious links, and your software would be patched as soon as patches are available.  In reality, users can be fooled by sophisticated spear phishing emails that are indistinguishable from legitimate emails, or by a website that they’ve used previously without issue but that has been compromised in a watering hole attack.  And while patching systems quickly is the goal, the reality is that there are a large number of systems running older, unpatchable software that are easy targets.

The DBIR report assembled a list of security controls from the SANS Institute to combat Cyber Espionage – and all of them are fundamental best practices.

Even with best practices firmly in place, due diligence is no guarantee against compromise.  Best practices can lower the chances that an advanced threat will make it into your network, but complete security coverage means more than just locking down your network.  You also need to check whether you’ve been compromised, and determine the scope of the problem if it’s there.

One of the difficulties in determining if you’ve been compromised by an Advanced Threat is that the signs are subtle.  Once your systems have been compromised there will be no ransomware demands or programs that suddenly stop working.  The point of an Advanced Threat is to infiltrate a computer without ever being detected and to then spread to other computers in the network so that the attack can maintain a foothold even if it’s removed from the originally infected computer.

Detecting an Advanced Threat means understanding what normal activity looks like on your network and monitoring it so that you notice activity outside normal operating parameters.   It means monitoring your outbound network activity to look for signs that data is being exfiltrated.  It means looking for internal network traffic on computers that have no reason to contact each other.  It may also mean monitoring servers or workstations for unusual processes or unexpected installations.  It means looking for any activity you can’t explain.

Finding the information to set up a security baseline isn’t the problem.  There is a vast amount of data available in logs and from collectors set up for system or network activity.  The problem is filtering out the vast majority of normal activity in order to pinpoint the exceptions that indicate a compromise.  That’s where Splunk® can be used to pinpoint anomalies and correlate incidents across multiple log files, not only finding a problem but also tracing the scope of the attacker’s activity.

Prioritizing Your Cyberdefenses

Verizon’s 2014 Data Breach Investigations Report (DBIR) compiled a decade’s worth of security incident data, both from breaches and from security incidents that did not result in data loss.  They were able to group the incidents into 9 patterns based on the method of attack, the attack target, and attacker motivations: Point of Sale Intrusions (e.g. Home Depot), Web App Attacks, Insider and Privilege Misuse (e.g. Edward Snowden), Physical Theft, Miscellaneous Errors (e.g. document misdelivery or incorrect document disposal), Crimeware, Payment Card Skimmers, Cyberespionage, and DOS attacks.  These 9 patterns described 92% of the attacks; the remaining 8% didn’t fit into the existing patterns but used similar underlying methods.

This breakdown of attack patterns is relevant because Verizon also found that different industries were targeted by different patterns.  Obviously, retailers with physical stores were targets for Point of Sale Intrusions, but they were also primary targets of DOS attacks.  And while one might expect Crimeware to be a significant problem in Healthcare, it turns out that Physical Theft, Insider Misuse, and Miscellaneous Errors were far more serious issues.

Knowing the most likely avenue of attack for your environment enables you to prioritize your defenses.  Each pattern has a recommended set of control measures; securing POS systems, for example, can mean isolating those systems, restricting access, using and updating antivirus, and enforcing strong password policies.  If Crimeware or Cyberespionage are potential problems, keeping a software inventory and scanning for unauthorized software may be more beneficial than restricting access.  The conclusion of the report includes charts displaying which security control measures are most significant for each threat pattern and industry, including links to the applicable sections of the SANS Critical Security Controls.

Addressing the most significant threats to your network is a good place to start, but obviously you want 100% protection, and prioritizing your defenses for 92% of possible threats isn’t a complete defense.  As we discussed in an earlier post looking at Cisco’s 2015 Annual Security Report, malware and spam are evolving and becoming more difficult to detect, so there is also the possibility that a new type of attack could make it through your defenses before you have the tools to defend against it.

How can you defend against unknown attacks?  The answer is by knowing what your environment looks like when there is no problem and monitoring your environment for unusual activity that can indicate problems.  Network traffic to atypical outside sites could indicate someone trying to exfiltrate data from your environment.  Failed login attempts could be someone trying a brute force attack to log in to your network.  A large number of connections to your company’s web portal could be someone trying to hijack it.

Whether or not you’re looking at the activity in your environment, most applications keep logs recording it.  The difficulty is in differentiating normal behavior from threats and tracing threats across different log formats.  Heroix has begun to work with Splunk® to help analyze logs and identify attacks.