Whither HSMs (in the cloud)

Hardware Security Modules (HSMs) are physical devices attached to or embedded in another computer to handle various cryptographic functions. HSMs are supposed to provide both physical and logical protection of the cryptographic material they store while handling cryptographic functions for the computer to which they are attached.

As websites move to the cloud, are HSMs the right way to achieve our goals?

Before we talk about goals, it is useful to consider a basic model for talking about them. Our Safety team often uses the following model to consider whether a system is safe:
  • What are the goals we are trying to achieve? (Or, in Leveson's STPA hazard-oriented view, what are the accidents/losses which you wish to prevent?)
  • What are the adversaries we wish to defeat?
  • What are the powers available to those adversaries? What *moves* are available to them?
  • And finally, what controls inhibit adversaries' use of their powers, thus protecting our goals?
Our hazards (or unacceptable losses) are:
  • An adversary can operate a webserver that pretends to be ours;
  • An adversary can decrypt SSL traffic; and
  • An adversary can conduct a man-in-the-middle attack on our SSL website.
In the protection of SSL certificates in the cloud, it would seem that our goals are two-fold:
  • Keep the private key *secret* from third parties; and
  • Prevent unauthorized and undetected use of the key in cryptographic functions. While SSL certificate revocation is a weak control (many browsers do not check for revocation), it is that which generally constrains this goal to both unauthorized *and* undetected; a detected adversary can be dealt with through revocation.
I could argue that the first is a special case of the second, except that I want to distinguish between "cryptographic functions over the valid lifetime of the certificate" and "cryptographic functions after the certificate is supposed to be gone."

As an aside, I could also argue that these goals are insufficient; after all, except for man-in-the-middle attacks, *any* SSL certificate signed by any of the many certificate authorities in the browser store would enable an adversary to cause the first of the losses. HSMs don't really help with that problem.

Given that caveat, what are the interesting adversaries? I propose four "interesting" adversaries, mostly defined by their powers:
  • The adversary who has remotely compromised a server;
  • The adversary who has taken physical control of a server which is still online;
  • The adversary who has taken physical control of a server at end of life; and
  • The adversary who has been given administrative access to a system.
The moves available to these adversaries are clear:
  • Copy key material (anyone with administrative access);
  • Change which key material or SSL configuration we'll use (thus downgrading the integrity of legitimate connections);
  • Escalate privileges to administrative access (anyone with physical or remote access); and
  • Make API calls to execute cryptographic functions (anyone with administrative access).
What controls will affect these adversaries?
  • Use of an HSM will inhibit the copying of keying material;
  • Use of revocation will reduce the exposure of copied keying material;
  • System-integrated physical security (systems that evaluate their own cameras and cabinets, for instance) inhibits escalation from physical access to administrative access;
  • Auditing systems inhibit adversary privilege escalation; and
  • Encrypting keying material, and only providing decrypted versions to audited, online systems, inhibits adversaries with physical control of systems (a rough sketch of this last control follows below).
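As a concrete illustration of that last control, here is a minimal sketch of wrapping (encrypting) a TLS private key at rest and only unwrapping it on a host that passes an audit check. This is not Akamai's implementation; it assumes the third-party Python cryptography package, and check_host_is_audited() is a hypothetical placeholder for whatever audit or attestation hook a real deployment has.

    from cryptography.fernet import Fernet

    def check_host_is_audited() -> bool:
        # Hypothetical placeholder: consult the deployment's audit/attestation system.
        return True

    def wrap_private_key(private_key_pem: bytes) -> tuple[bytes, bytes]:
        # The key-encryption key (KEK) is kept off-host; only the wrapped copy ships to servers.
        kek = Fernet.generate_key()
        wrapped = Fernet(kek).encrypt(private_key_pem)
        return kek, wrapped

    def unwrap_for_service(kek: bytes, wrapped: bytes) -> bytes:
        # Refuse to expose the plaintext key on a host that fails the audit check;
        # the decrypted key should live only in memory on an online, audited system.
        if not check_host_is_audited():
            raise PermissionError("host failed audit; refusing to decrypt key")
        return Fernet(kek).decrypt(wrapped)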
What I find interesting is that for systems outside the physical purview of a company, HSMs may have a subtle flaw: since HSMs must provide an API to be of use, *that API remains exposed to an adversary who has taken possession of an HSM*. While this may be a minor issue when the HSM sits in a server in a "secure" facility, it becomes significant in distributed data centers. By contrast, a control system that combines tightly coupled local physical security, auditing, and software encryption strikes a different balance: slightly less stringent security against an adversary who can gain administrative access (after all, they can likely copy the keys), in exchange for greater security against adversaries who have physical access.

This isn't to say that this is the only way to assemble a control system to protect SSL keys; merely that a reflexive jump to an HSM-based solution may not actually meet the security goals that many companies might have.

(Full disclosure: I’m the primary inventor of Akamai’s SSL content delivery network, which has incorporated software-based key management for over a decade.)
cross posted on The Akamai Blog.

Cognitive Injection

Context: I’m giving a talk today at noon at DerbyCon, entitled “Cognitive Injection: Reprogramming the Situation Oriented Human OS”. Slides are here.

It's a trope among security professionals that other humans - mere mundanes - don't 'get' security, and make foolish decisions. But this is an easy out, and a fundamental attribution error. Everyone has different incentives, motivators, and even perceptions of the world. By understanding this -- and how the human wetware has evolved over the last fifty thousand years or so -- we can redesign our security programs to better manipulate people.

Assessment of the BREACH vulnerability

The recently disclosed BREACH vulnerability in HTTPS enables an attack against SSL-enabled websites. A BREACH attack leverages the use of HTTP-level compression to gain knowledge about some secret inside the SSL stream, by analyzing whether an attacker-injected "guess" is efficiently compressed by the dynamic compression dictionary that also contains the secret. This is a type of attack known as an oracle, in which an adversary extracts information from an online system by making multiple queries to it.
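To make the oracle concrete, here is a minimal, illustrative Python sketch (not an attack tool, and not how BREACH is actually mounted against TLS records): it compresses a hypothetical page that reflects attacker-chosen input next to a secret, and shows that the guess extending the correct prefix tends to compress best. Real attacks have to average over many noisy measurements.

    import zlib

    SECRET = "csrf_token=9f8e7d6c5b4a"        # hypothetical secret embedded in the page

    def compressed_size(injected: str) -> int:
        # The "page" reflects attacker-controlled input alongside the secret.
        body = f"<p>You searched for: {injected}</p><input value='{SECRET}'>"
        return len(zlib.compress(body.encode()))

    # Guess the next character of the secret by picking the candidate that
    # compresses best; a correct extension shares a longer prefix with the
    # secret, so the compressor can back-reference more of it.
    prefix = "csrf_token=9f8e7d6c5b4"
    best = min("0123456789abcdef", key=lambda c: compressed_size(prefix + c))
    print("best-compressing guess:", best)    # likely 'a', the true next character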

BREACH is interesting in that it isn't an attack against SSL/TLS per se; rather, it is a way of compromising some of the secrecy goals of TLS by exploiting an application that will echo back user-injected data on a page that also contains some secret (a good examination of a way to use BREACH is covered by Sophos). There are certain ways of using HTTPS which make this attack possible, and others which merely make the attack easier.

Making attacks possible

Impacted applications are those which:
  • Include in the response body data supplied in the request (for instance, by filling in a search box);
  • Include in the response some static secret (token, session ID, account ID); and
  • Use HTTP compression.
For each of these enabling conditions, making it untrue is sufficient to protect a request. Therefore, never echoing user data, having no secrets in a response stream, or disabling compression are all possible fixes. However, making either of the first two conditions false is likely infeasible; secrets like Cross-Site Request Forgery (CSRF) tokens are often required for security goals, and many web experiences rely on displaying user data (hopefully sanitized to prevent application injection attacks). Disabling compression is possibly the only foolproof and straightforward means of stopping this attack - although it may be sufficient to disable compression only on responses with dynamic content. Responses that do not change between requests do not contain a user-supplied string, and therefore should be safe to compress.
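One way to implement that "compress only static content" compromise at the application tier is sketched below: a tiny WSGI middleware that strips the Accept-Encoding header for requests to dynamic pages, so no downstream layer gzips them, while static assets keep compression. The path prefixes are hypothetical stand-ins for however an application actually distinguishes its dynamic, secret-bearing responses.

    DYNAMIC_PREFIXES = ("/account", "/search")   # hypothetical dynamic (secret-bearing) paths

    class NoCompressDynamic:
        """WSGI middleware: disable compression negotiation for dynamic pages only."""

        def __init__(self, app):
            self.app = app

        def __call__(self, environ, start_response):
            if environ.get("PATH_INFO", "").startswith(DYNAMIC_PREFIXES):
                # Without Accept-Encoding, downstream servers and proxies won't gzip the response.
                environ.pop("HTTP_ACCEPT_ENCODING", None)
            return self.app(environ, start_response)

    # Usage: application = NoCompressDynamic(application)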

Disabling compression is likely to be expensive - some back-of-the-envelope numbers from Guy Podjarny, Akamai's CTO of Web Experience, suggest a significant performance hit. HTML compresses by a factor of around 6:1 - so disabling compression will increase bandwidth usage and latency accordingly. For an average web page, excluding HTML compression will likely increase the time to start rendering the page by around half a second for landline users, with an even greater impact for mobile users.

Making attacks easier

Applications are more easily attacked if they:
  • Have some predictability around the secrets; either by prepending fixed strings, or having a predictable start or end;
  • Are relatively static over time for a given user; and
  • Use a stream cipher.
This second category of enablers presents a greater challenge when evaluating solutions. Particularly challenging is the question of how much secrecy each mitigation gains, and at what cost.

Altering secrets between requests is an interesting challenge - a CSRF token might be split into two dynamically changing values, which “add” together to form the real token (x * y = CSRF token). Splitting the CSRF token differently for each response ensures that an adversary can't pin down the actual token with an oracle attack. This may work for non-human-parseable tokens, but what if the data being attacked is an address, phone number, or bank account number? Splitting them may still be possible (using JavaScript to reassemble in the browser), but the application development cost to identify all secrets, and to implement protections that do not degrade the user experience, seems prohibitive.
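A minimal sketch of one way to do the splitting server-side, using a fresh random mask per response so the on-page bytes change every time while the underlying token does not (the names and the XOR construction here are illustrative assumptions, not a specific framework's API):

    import os
    from hmac import compare_digest

    def mask_token(real_token: bytes) -> tuple[bytes, bytes]:
        # Return (mask, masked) such that mask XOR masked == real_token;
        # embed both values in the page and recombine them on submission.
        mask = os.urandom(len(real_token))
        masked = bytes(a ^ b for a, b in zip(mask, real_token))
        return mask, masked

    def unmask_and_check(mask: bytes, masked: bytes, real_token: bytes) -> bool:
        recovered = bytes(a ^ b for a, b in zip(mask, masked))
        return compare_digest(recovered, real_token)   # constant-time comparison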

Altering a page to be more dynamic, even between identical requests, seems possibly promising, and is certainly easier to implement. However, the secrecy benefit may not be as straightforward to calculate - an adversary may still be able to extract from the random noise some of the information they were using in their oracle. A different way to attack this problem might not be by altering the page, but by throttling the rate at which an adversary can force requests to happen. The attack still may be feasible against a user who is using wireless in a cafe all day, but it requires a much more patient adversary.

Shifting from a stream cipher to a block cipher is a simple change which increases the cost of setting up a BREACH attack (the adversary now has to “pad” attack inputs to hit a block size, rather than getting an exact response size). There is a slight performance hit (most implementations would move from RC4 to AES128 in TLS1.1).

Defensive options

What options are available to web applications?
  • Evaluate your cipher usage, and consider moving to AES128.
  • Evaluate whether supporting compression on dynamic content is a worthwhile performance/secrecy tradeoff.
  • Evaluate applications which can be modified to reduce secrets in response bodies.
  • Evaluate rate-limiting. Rate-limiting requests may defeat some implementations of this attack, and may be useful in slowing down an adversary.
How can Akamai customers use their Akamai services to improve their defenses?
  • You can contact your account team to assist in implementing many of these defenses, and discuss the performance implications.
  • Compression can be turned off by disabling compression for html objects. The performance implications of this change should be well understood before you make it, however (See the bottom of this post for specifics on one way to implement this change, limited only to uncacheable html pages, in Property Manager).
  • Rate-limiting is available to Kona customers.
  • Have your account team modify the cipher selections on your SSL properties.

Areas of exploration

There are some additional areas of interest that bear further research and analysis before they can be easily recommended as both safe *and* useful.
  • Padding response sizes is an interesting area of evaluation. Certainly, adding a random amount of data would at least help make the attack more difficult, as weeding out the random noise increases the number of requests an adversary would need to make. Padding to multiples of a fixed length is also interesting, but is also attackable, as the adversary can increase the size of the response arbitrarily until they force the response to cross an interesting boundary. A promising thought from Akamai's Chief Security Architect Brian Sniffen is to pad the response by a number of bytes based on the hash of the response (see the sketch after this list). This may defeat the attack entirely, but merits further study.
  • An alternative to padding responses is to split them up. Ivan Ristic points us to Paul Querna's proposal to alter how chunked encoding operates, to randomize various response lengths.
  • It may be that all flavors of this attack involve HTTPS requests whose referrer is an HTTP site. Limiting defenses to only apply in this situation may be fruitful - for instance, only disabling HTML compression on an HTTPS site if the referrer begins with "http://". Akamai customers with Property Manager enabled can make this change themselves (Add a rule: Set the Criteria to "Match All": "Request Header", "Referer", "is one of", "http://*" AND "Response Cacheability", "is" "no_store"; set the Behaviors to "Last Mile Acceleration (Gzip Compression)", Compress Response "Never". This requires you to enable wildcard values in settings.).
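A rough sketch of the hash-keyed padding idea mentioned in the first bullet above; the constants and placement are illustrative assumptions. The key property is that the pad size is a deterministic function of the response content, so identical responses pad identically while an attacker-injected guess shifts the pad along with the body, masking the small length differences the oracle relies on. The pad must also be applied where it won't simply compress away.

    import hashlib
    import zlib

    MAX_PAD = 256   # hypothetical upper bound on padding bytes

    def pad_length(compressed_body: bytes) -> int:
        # Derive the pad size deterministically from a hash of the (compressed) content.
        digest = hashlib.sha256(compressed_body).digest()
        return int.from_bytes(digest[:2], "big") % MAX_PAD

    def compress_and_pad(body: bytes) -> bytes:
        # Conceptual only: a real deployment must add the filler at a layer the
        # client tolerates or strips (e.g. transport framing), not assume trailing
        # bytes after the compressed stream are always ignored.
        compressed = zlib.compress(body)
        return compressed + b"\x00" * pad_length(compressed)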

crossposted at blogs.akamai.com.

Environmental Controls at Planetary Scale

A common set of security control objectives found in standard frameworks (ISO 27002, FedRAMP, et al) focuses on environmental controls. These controls, which might focus on humidity sensors and fire suppression, are designed to maximize the mean time between critical failure (MTBCF) of the systems inside a data center. They are often about reliability, not safety [1]; they fixate on over-engineering a small set of systems, rather than building in fault tolerance.

Is the cost worth the hassle? If you run one data center, then the costs might be worthwhile - after all, it’s only a few capital systems, and a few basis points of improvement in MTBCF will likely be worth that hassle (both in operational false positives as well as deployment cost). But what if you operate in thousands of data centers, most of them someone else’s? The cost multiplies significantly, but the marginal benefit significantly decreases - as any given data center improvement only affects such a small portion of your systems. Each data center in a planetary-scale environment is now as critical to availability as a power strip is to a single data center location. Mustering an argument to monitor every power strip would be challenging; a better approach is to have a drawer full of power strips, and replace ones that fail.

The same model applies at the planetary scale: with thousands of data centers all over the world (in most of which the operators already have other incentives to take care of environmental monitoring), a much more effective approach is to continue to focus on regional failover (data centers, metro regions, and countries go offline all the time), and only worry about issues within a data center when they become a noticeable problem.

[1] Leveson, Nancy. Section 2.1, "Confusing Safety with Reliability", Engineering a Safer World, pp. 7-14.

crossposted at blogs.akamai.com

A Brief History of Cryptography



As part of an educational video series, here’s a brief history of cryptography.

DNS reflection defense

Recently, DDoS attacks have spiked well past 100 Gbps several times. A common move used by adversaries is the DNS reflection attack, a category of Distributed, Reflected Denial of Service (DRDoS) attack. To understand how to defend against it, it helps to understand how it works.

How DNS works

At the heart of the Domain Name System are two categories of name server: the authoritative name server, which is responsible for providing authoritative answers to specific queries (like use5.akam.net, which is one of the authoritative name servers for the csoandy.com domain), and the recursive name server, which is responsible for answering any question asked by a client. Recursive name servers (located in ISPs, corporations, and data centers around the world) query the appropriate authoritative name servers around the Internet, and return an answer to the querying client. An open resolver is a recursive name server that will answer queries from any client, not just those local to it. Because DNS requests are fairly small and lightweight, DNS primarily uses the User Datagram Protocol (UDP), a stateless messaging system. Since a UDP request can be sent in a single packet, its source address is easily forged to any address the true sender desires.

DNS reflection

A DNS reflection attack takes advantage of three things: the forgeability of UDP source addresses, the availability of open resolvers, and the asymmetry of DNS requests and responses. To conduct an attack, an adversary sends a set of DNS queries to open resolvers, altering the source address on their requests to be that of their chosen target. The requests are designed to have much larger responses (often, using an ANY request, a 64-byte request yields a 512-byte response), thus resulting in the recursive name servers sending about 8 times as much traffic at the target as they themselves received. A DNS reflection attack can directly use authoritative name servers, but that requires more preparation and research, making requests specific to the scope of each DNS authority used.

Eliminating DNS reflection attacks

An ideal solution would obviously be to eliminate this type of attack, rather than every target needing to defend themselves. Unfortunately, that’s challenging, as it requires significant changes by infrastructure providers across the Internet.

BCP38

No discussion of defending against DRDoS-style attacks is complete without a nod to BCP38. These attacks only work because an adversary, when sending forged packets, has no upstream routers filtering based on the source address. There is rarely a need to permit an ISP user to send packets claiming to originate in another ISP; if BCP38 were adopted and implemented in a widespread fashion, DRDoS would be eliminated as an adversarial capability. That’s sadly unlikely, as BCP38 enters its 14th year; the complexity and edge cases are significant.

The open resolvers

While a few enterprises have made providing an open resolver into a business (OpenDNS, GoogleDNS), many open resolvers are either historical accidents or the result of incorrect configuration. Even MIT has turned off open recursion on its high-profile name servers.
Where resolvers must remain open, they should implement rate limiting, especially on infrequent request types, to reduce the traffic multiplication that adversaries can extract from them.

Self-defense

Until ISPs and resolver operators implement controls to limit how large attacks can become, attack targets must defend themselves. Sometimes, attacks are targeted at infrastructure (like routers and name servers), but most often they are being targeted at high-profile websites operated by financial services firms, government agencies, retail companies, or whoever has caught the eye of the attacker this week.
An operator of a high-profile web property can take steps to defend their front door. The first step, of course, should be to find their front door; and to understand what infrastructure it relies on. And then they can evaluate their defenses.

Capacity

The first line of defense is always capacity. Without enough bandwidth at the front of your defenses, nothing else matters. This needs to be measurable both in raw bandwidth and in packets per second, because hardware often has much lower bandwidth capacity as packet sizes shrink. Unfortunately, robust capacity is now measured in the 300+ gigabit-per-second range, well beyond the resources of the average datacenter. However, attacks in the 3-10 gigabit-per-second range are still common, and well within the range of existing datacenter defenses.

Filtering

For systems that aren’t DNS servers themselves, filtering out DNS traffic as far upstream as possible - and certainly at a border firewall - is a good solution. One caveat: web servers often need to make DNS queries themselves, so ensure that they have a path to do so. In general, the principle of “filter out the unexpected” is a good filtering strategy.

DNS server protection

Since DNS servers have to process incoming requests (an authoritative name server has to respond to all of the recursive resolvers around the Internet, for instance), merely filtering DNS traffic upstream isn’t an option. So what is perceived as a network problem by non-DNS servers becomes an application problem for the DNS server. Defenses may no longer be simple “block this” strategies; rather, defense can take advantage of application tools to provide different defenses.

Redundancy

While the total number of authoritative DNS server IP addresses for a given domain is limited (13 should fit into the 512-byte DNS response packet; 8 is generally a reasonable number), many systems use nowhere near the limit. Servers should be diversified, located in multiple networks and geographies, ensuring that attacks against two name servers aren’t traveling across the same links.

Anycast

Since requests come in via UDP, anycasting (the practice of having servers responding on the same IP address from multiple locations on the internet) is quite practical. Done at small scale (two to five locations), this can provide significant increases in capacity, as well as resilience to localized physical outages. However, DNS also lends itself to architectures with hundreds of name server locations sprinkled throughout the internet, each localized to only provide service to a small region of the Internet (possibly even to a single network). Adversaries outside these localities have no ability to target the sprinkled name servers, which continue to provide high quality support to nearby end users.

Segregation

Based on Akamai’s experience running popular authoritative name servers, 95% of all DNS traffic originates from under a million popular name server IP addresses (to get to 99% requires just under 2 million IP addresses). Given that the total IPv4 address space is around 4.3 billion addresses, name servers can be segregated: a smaller pool to handle the “unpopular” resolvers, and a larger pool to handle the popular ones. Attacks that reflect off unpopular open resolvers thus don’t consume the application resources providing quality of service to the popular name servers.

Response handling

Authoritative name servers should primarily see requests, not responses. Therefore, they should be able to isolate, process, and discard response packets quickly, minimizing impact to resources engaged in replying to requests. This isolation can also apply to less frequent types of request, such that when a server is under attack, it can devote resources to requests that are more likely to provide value.
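A minimal sketch of that kind of early triage: since a correctly operating client never sends a DNS response to an authoritative server, packets with the QR (response) bit set can be discarded before any expensive processing. The header layout follows RFC 1035; what "discard" and "process" route to is left abstract.

    def is_dns_response(packet: bytes) -> bool:
        # A DNS header is 12 bytes; the QR bit is the high bit of the flags
        # byte at offset 2 (0 = query, 1 = response), per RFC 1035.
        if len(packet) < 12:
            return True            # too short to be a valid query: discardable
        return bool(packet[2] & 0x80)

    def triage(packet: bytes) -> str:
        # Cheap check first, so reflected responses never reach the answering logic.
        return "discard" if is_dns_response(packet) else "process"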

Rate Limiting

Traffic from any name server should be monitored to see if it exceeds reasonable thresholds, and, if so, aggressively managed. If a resolver typically sends a few requests per minute, an authoritative name server can decline to answer most requests from one suddenly sending dozens per second (these thresholds can and should be dynamic). This works because of the built-in fault tolerance of DNS: if a requesting name server doesn’t see a quick response, it will send another request, often to a different authoritative name server (and deprioritize the failed name server for future requests).
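A minimal token-bucket sketch of per-source rate limiting for an authoritative server; the rate and burst numbers are illustrative assumptions, and as noted above the real thresholds can and should be dynamic.

    import time
    from collections import defaultdict

    RATE = 5.0     # sustained queries per second allowed per source IP (hypothetical)
    BURST = 20.0   # short bursts tolerated before answers are withheld (hypothetical)

    _buckets = defaultdict(lambda: {"tokens": BURST, "last": time.monotonic()})

    def should_answer(source_ip: str) -> bool:
        bucket = _buckets[source_ip]
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, capped at the burst size.
        bucket["tokens"] = min(BURST, bucket["tokens"] + (now - bucket["last"]) * RATE)
        bucket["last"] = now
        if bucket["tokens"] >= 1.0:
            bucket["tokens"] -= 1.0
            return True
        return False   # drop silently; a legitimate resolver will retry another server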

As attacks grow past the current few hundred gigabits per second toward terabit-per-second attacks, robust architectures will be increasingly necessary to maintain a presence on the Internet in the face of adversarial action.

crossposted at akamai.com

SOURCE Boston Talk

This month I gave the Thursday keynote (slides) at SOURCE Boston 2013. SOURCE is one of my favorite conferences to be at - primarily because of the high density of passionate security practitioners in attendance.

There is some nice coverage of some other Akamai talks given at SOURCE as well. If you think this work sounds cool, we’re hiring!

How big is 300 Gbps, really?

The 300 Gbps attack this week against SpamHaus certainly seems epic. But how big is it, really? When we think about an attack at Akamai, we think about three things: the attacker’s capacity, their leverage, and the target’s capacity. And when we think about leverage, it’s really composed of two smaller pieces: how much cost efficiency the attacker expects to get, and how the target’s resilience mitigates it.

300 Gbps isn’t that bad when it’s restricted to reflected DNS traffic - if you have enough capacity to ingest the packets, they’re pretty trivial to drop, and, until your network cards fill up, are less effective than a SYN flood. So why would an attacker resort to such an inefficient attack? The attacker likely doesn’t have 300 Gbps in their botnet - they probably have somewhere in the range of 30 to 60 Gbps. Attacks through DNS resolvers are amplified - so the attacker can create a larger attack than they might have otherwise, at the cost of reducing their leverage.

In comparison, the BroBot botnets are routinely tossing around 30 Gbps attacks, with peaks upwards of 80 Gbps. Because they’re willing to sacrifice their hosts, they have a wider range of attacks available to them. Commonly, they send HTTPS request floods - requiring their targets to negotiate full SSL connections, parse an HTTP request, and determine whether they’ll deliver a reply or not. BroBot could certainly throw around a bit more bandwidth with DNS reflection - but against most of their targets, it would have less effect than some of their current tactics.

It’s hard to compare the two. If you have less than 60 Gbps of raw bandwidth lying around, they’re both the same (you’ll succumb either way). If you have more than 60 and less than 300 Gbps, BroBot is more palatable, although you need a lot more CPU to handle it. But above 300Gbps of bandwidth? The attack on SpamHaus is much, much easier to deal with.

Should we bother with security awareness?

Bruce Schneier opines that, “training users in security is generally a waste of time.” Dave Kennedy disagrees, “Education and awareness can be effective if you take the complete opposite view of what Bruce views as an education and awareness program.”

I think that both have interesting points, and where Dave is disagreeing with Bruce, I agree with Dave. Bruce unfortunately skirts around what I think would be his strongest point - that the failures of security design and implementation have led us to require users to take actions that cause them to question our security wisdom - undercutting the awareness benefits we might expect.

Passwords are a good example. Because we still have painful authentication systems, we force users into increasingly complex schemes, rather than building better systems.

Better use of awareness resources is critical, indeed, which is part of what Bruce rails about; but more importantly, “patching” design bugs with human workarounds is a sketchy idea.

RSA Keynote


Coping below the Security Poverty Line

On Friday at the RSA Conference, Wendy Nather and I presented on Coping Mechanisms for Living Below the Security Poverty Line. Slides are here. The part of our presentation that seemed to resonate best with the audience was the “What $0 will buy” slide - that is, what “free” options exist to improve security.

Risk compensation

Context: Thursday at 4:05 pm, I’ll be keynoting in Hall D at the RSA Security Conference: “Mind over Matter: Managing Risk with Psychology instead of Brute Force”. Slides are here. There are two core topics covered by the keynote; the other is Understanding and Increasing Value.

One of the biggest challenges facing the information security profession is how we work with our business partners to better manage risk. We often make this harder on ourselves, asserting that “we are the custodians of risk” or “we are the conscience of the business”. This isn’t very productive or helpful, and it generally doesn’t start conversations off well with the business.

In fact, business partners often don’t want to talk to security, and may get reluctantly dragged in. When they ask if what they are doing is “safe enough”, they are dragged through the morass of the ISO 27002 framework, asked questions about esoteric problems that haven’t affected anyone in a decade, and subjected to lectures on the value of various entropy sources and on whether N+1 redundancy is sufficient in the case of the APT attacking during a natural disaster. At the end of that, they just want to leave, either with a “yes”, which makes them happy, or a “no”, which they’re going to ignore and hope they never get caught.

A critical part of thinking about risk is the concept of risk compensation, also known as the Peltzman effect. People have a set point of risk that they will tolerate (NB: that they are aware of!), and anything that increases this risk will cause them to decrease risk elsewhere; anything that decreases this risk will let them take more risk.

At steady state, they believe that the risks that arise in the business are being handled by the security machine, and that overall risk isn’t changing. True or not, this is the perception that companies have. If they believe that there are fewer risks coming into the business, then they’ll want to defund the machine. If they feel the machine isn’t effective at countering risk (and nothing bad is happening), they’ll believe there are fewer risks coming into the system … and defund the machine.

The overreaction to this that many of us in the security community have had is the Chicken Little approach - we make risks sound scarier than they are; or bring up risks that can’t be fixed. Unfortunately, humans have two ways of coping with unmitigated risk. One is to convince ourselves that we’ve always known about this risk, and that’s okay. Sadly, that’s the healthy response. The worse response is to tell ourselves that the risk isn’t real; more importantly, the person who told us about the risk isn’t credible, and we should ignore other risks they’ve told us about. Which, conveniently, leaves us feeling relatively risk-free, so let’s go do something more risky!

Our goal is to make people believe in something approximating the risks they’ve been ignoring. We do that by not letting them outsource risk analysis to the “experts”, but by using those experts to teach them to do the risk analysis themselves. This won’t always improve things right off the bat, but will, over time, cause people to change their behaviors.

We hope.

Understanding and increasing value

Context: Thursday at 4:05 pm, I’ll be keynoting in Hall D at the RSA Security Conference: “Mind over Matter: Managing Risk with Psychology instead of Brute Force”. Slides are here. There are two core topics covered by the keynote; the other is Risk Compensation.

How do we understand how much value we provide to a business? One way is to first understand how much value a business provides - a business spends money (resources), and (hopefully) makes money. The money it makes is its value; the ratio of value to resources is its capabilities: how well it applies resources. We hope that this ratio is greater than 1: that is, that we create surplus through our activities.

Organizations within a business can apply this same measure, even if the numbers are a bit fuzzier. Since we can’t always measure value, sometimes we measure capabilities instead as a proxy. Capabilities are simply our skill at using our resources, times our effort in applying them, times our effectiveness at changing our environment.
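Written out as rough formulas (just a restatement of the informal definitions above, not a precise model):

    \[
      \text{capabilities} \approx \frac{\text{value created}}{\text{resources consumed}}
      \qquad\text{and, as a proxy,}\qquad
      \text{capabilities} \approx \text{skill} \times \text{effort} \times \text{effectiveness}
    \]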

Skill is simple to understand. There’s an apocryphal story about a maintenance engineer for a company who, after retiring, was called back in because one of the ancient mechanical systems had failed, and no amount of effort could restore it. He came in, made a chalk mark on the side of the system, and told them to hit that spot with a hammer. He presented them with a bill for $30,000. When asked for itemization, he noted:
  • Chalk: $5
  • Knowing where to make the mark: $29,995
That’s skill: the ease with which you can accomplish a task.

Effort is about how we approach a task. Do we think it will fail, so we give it insufficient attention? Have we assigned it to someone overburdened, so they are distracted and fail to make progress? Do we give it to someone with true passion, and let it be a priority for them?

Effectiveness is often about the environment we are in: Did a project complete, or did we decide not to finish after investing 80% of the time? Did we have buy in from the business, or will our project collect dust? Did we end up shouting from rooftops, and no one listened? If, as a result of investing resources, there is no change to the business, then the resources were, generally, ineffective.

That last part is hard - we think of ourselves as preventing bad things, so how do we know if we were effective? The answer is simple - we should have enabled our organizations to take more risks! It sounds perverse - but all organizations take risks. We should enable them to understand the risks they are taking, and mitigate some so that they can take others - hopefully ones not related to security, of course.

While measuring capabilities is hard, it’s like three-dimensional differential equations in a non-ideal environment: really hard on paper, but almost anyone can catch a ball. Within an organization, teams are judged on their capabilities, and resources are redirected over time from the less capable to the more capable.

Leveling up Security Awareness

Context: Thursday morning at RSAC, Bob Rudis and I will be presenting “Achievement Unlocked: Designing a Compelling Security Awareness Program” at 10:40 am in Room 123. Slides are here.

Security Awareness has become a controversial topic. Many organizations have fallen back onto rote, annual, computer-based training (CBT), taking a cookie-cutter, one-size-fits-all approach to the problem. Why? Because auditors started checking to see if programs existed -- and their measurement of success was whether or not you’d gotten every employee in the company to certify that they’d received training. And that led to a checklist-based race to the bottom.

The first step in improvement is to separate policy awareness - that annual verification that employees have been “trained” - from security awareness - the steps you take to improve the overall security posture of your employees. If, for instance, you require each of your employees to sit through a one-hour CBT annually, then you’re effectively spending 1 FTE for every ~1600 employees you have just to check that box. That’s a waste of time and money, and your employees know it! By demonstrating that you’re willing to waste their time, they will treat your CBT with the same respect - by playing games to see how fast they can race through it, for instance. Or by finding all the picayune errors they can, and laughing about how clueless you are.

You can solve this problem by racing to the bottom even faster: if what your auditors need is to see that every employee has checked a box annually, then one option is to give every employee a box to check annually. Create an automated system that reaches out each year to employees, driving them to a webpage that has an overview of the highlights of the security policy, with some bullets about why they care, and some links to more information for the enterprising souls. And then give them a box to check that records that they’ve checked the box for the year.

Having done that, you can focus on real security awareness training. Real awareness training is much more targeted. Engage users around specific topics. Social engineering. Phishing. USB drives. Screensavers. Give them a way to respond: at Akamai, we have a mailing list that everyone with a published phone number is on. When a pretexted call comes in, people can notify the next likely targets of the context of the phone call. Give them incentives: gift cards, or visits from the Penguin of Awesome. Give pro bono personal security training: teach them about attacks that might target their families, and point them to educational resources for their children. And don’t worry about tracking that every single person has consumed every single resource - that’s a waste of energy. Give them what they need, and they’ll clamor for more.

Standard Infosec Management Guidance is Wrong. Sorry.

Context: Tuesday evening, I’ll be presenting at the RSAC Infragard/ISSA meeting (Room 120 at 6pm) a talk titled “All our Infosec Management Guidance is Wrong. Sorry about that!”. Slides are here.

There’s an apocryphal story about five monkeys, a ladder, a banana, and a hose. Monkeys would go up the ladder to get the banana, get hosed down, and learn not to climb the ladder. New monkeys would be introduced, and “peer training” would teach them not to climb the ladder, until no monkeys who had been hosed down remained, but monkeys would fear the ladder.

Truthiness aside, the kernel of truth that causes this story to spread is a clear one: we pass down myths and legends about what we should do, or how we should do it, but not always *why* we do it. And so, like the monkeys, we become afraid of the ladder, rather than watchful for the researcher with a hose. And we pass these lessons down, or across, and turn them into pithy statements, without considering what they mean now. Advice like “You should get a certification”, “Pick a good password”, or “Just add security to the contract” - once useful - may end up lost in translation.

In the talk, I discuss pithy quotes from long-dead philosophers, applying policy (or technology!) exclusively to solve problems, Return on Security Investment, Defense in Depth/Breadth/Height, and being “not faster than the bear.”

The value of professional certifications

Context: this afternoon, I’ll be joining a panel at RSAC (PROF-M03; Room 302 at 1450) titled “Information Security Certifications: Do They Still Provide Industry Value?”

Much ado is made about the relative merits of various certificates, certifying tests, and administering organizations. Before arguing the value of those, we should first assess what intrinsic value a professional certificate might have, by understanding the various models and then seeing which fit the information security industry.

One model is the guild certificate - a certificate of competency, generally issued to a journeyman or master of their craft, which acknowledges their capability at their preferred trade. The building trades are the most common example; but medical professionals, lawyers, and pilots hold them as well. As purchasers of services, consumers like to know that the purveyor meets a minimum standard of the craft. Guild certificates are especially preferred where quality of work is important and there tends to be a set of common tasks performed within the profession.

Another model, often a special case of a guild certificate, is the practitioner’s certificate, which is a certificate, generally issued directly or indirectly by a governmental organization, permitting an individual to practice on your behalf. Consider the CPA: an individual who is allowed to practice accounting before the government; and you are shielded from (some) liability for errors they make. Building inspectors are another example; practitioner’s certificates let us know that in trusting an individual, we don’t necessarily have to inspect their work. Practitioner’s certificates are especially effective where there is exactly one correct way to solve a problem or accomplish a task.

Yet a third model is the reputational certificate. A reputational certificate identifies a person as a member of a clique. Membership in that clique might imply certain capabilities, but is no guarantee. A college diploma, membership in a professional organization, or employment at a given company are examples of reputational certificates. A reputational certificate represents a transfer of the reputation of existing members to a new member: the first time you meet someone from MIT, you might accord them respect on the assumption that they are as competent as other MIT graduates. But reputation is a two-edged sword: if you know a lot of incompetent people who joined The Southwest Weasel Security Association, you’ll judge the next person you meet from there as equally incompetent.

So what then, are infosec certifications?

There exist focused, guild certificates, often administered by a vendor: consider the CCIE or MCSE as general examples. But most certifications offered are more reputational: they bear the trappings of a guild certificate, like a common body of knowledge, or coursework, but given the lack of a common craft or single set of solutions in the industry, there is no general purpose guild certificate. Infosec is not unique in this case; sales professionals or product managers also have similar challenges.

And reputational certificates always devolve to the lowest common denominator: the value of the certificate will always devolve to the reputation of the lowest holder of the certificate, not the greatest.

Early RSAC coverage

I’ve done a couple of interviews this week about my upcoming keynote at RSAC. For the English/Spanish speakers, I’ve put up an early draft of the slides, which were the keynote I gave at Security Zone. The talk will have another iteration before RSAC, but you can take an early look, or watch the even earlier version I gave at Hack in the Box last year.