Threat Report
Black box security testing has long been a popular way to help assure the security of web applications. This form of testing probes running applications with various inputs to simulate malicious attacks and gives security teams feedback on potential vulnerabilities in the applications. Typically, these tests are commissioned by business managers and either conducted by in-house security groups or outsourced to professional consultants. They are either run with automated tools or conducted manually.
Business managers often find the process of black box security testing to be more “black art” than “engineering.” The process is dominated by “ethical hackers” or automated tools using mysterious techniques that often find a number of serious issues. The discovery of these issues naturally leads to questions about how widespread the problem is, both within the program under test and among a company’s other web applications, and about the most expedient and cost-effective way to fix the issues. Ensuing discussions involving development, security, and management often elicit accusations, denials, claims of false positives, and widespread malaise about the magnitude of remediation. Dr. Gary E. McGraw, a noted software security author and speaker and the CTO of Cigital, comments on the use of black box security testing: “A penetration test can tell you whether your system can be compromised in a straightforward manner (often using canned, black-box probes), but it can't tell you that you're completely secure.” [1] While black box security testing is an important part of the secure application development lifecycle, Fortify Software’s investigation shows a tremendous opportunity to get more out of these tests.
Over the last two months, Fortify Software has used a pre-release version of its Fortify Tracer product to gather data on black box security testing conducted on applications varying in size, function, and complexity. We analyzed applications that were tested with both automated black box security testing tools and manual black box security testing efforts. Our results uncovered three key areas vital to unlocking significant additional value from black box security testing: measuring how much of the application a test covers, finding the vulnerabilities that tests miss, and providing the information needed to fix the vulnerabilities that tests identify.
When a black box security test produces a result, an open question is “how much of the application was exercised to get that result?” For example, if a black box security test finds 10 issues while covering 5% of the application’s security-critical functions, that inspires a very different level of confidence than if the 10 issues were discovered while covering 80% of those functions.
We define the coverage metric as the percentage of security-critical functions that are exercised during a black box security test. Security-critical functions include the entry points and end points of the application, such as database APIs (the target of SQL injection attacks that steal private data from proprietary databases) and web interface and system functions (which provide sources of input to web applications).
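The coverage metric defined above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name `coverage_percent` and every site name in the example are hypothetical and are not taken from Fortify Tracer.

```python
def coverage_percent(critical_sites, exercised_sites):
    """Return the percentage of security-critical sites exercised by a test.

    Both arguments are iterables of site names. Sites that were exercised
    but are not security-critical do not count toward coverage.
    """
    critical = set(critical_sites)
    if not critical:
        raise ValueError("no security-critical sites defined")
    hit = critical & set(exercised_sites)
    return 100.0 * len(hit) / len(critical)


# Hypothetical inventory of security-critical functions in an application.
critical = {
    "db.execute_query",      # database API (end point)
    "fs.open_file",          # system function (end point)
    "os.run_command",        # system function (end point)
    "http.get_parameter",    # web input (entry point)
    "http.get_cookie",       # web input (entry point)
}

# Hypothetical set of functions observed during one black box test run.
exercised = {"http.get_parameter", "db.execute_query", "ui.render_page"}

print(coverage_percent(critical, exercised))  # 40.0
```

Here two of the five security-critical sites were exercised, so any issue count from this run should be read against 40% coverage.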
When we conducted black box security tests in conjunction with Fortify Tracer, we found that most black box security tests had significant room to improve their security coverage. Across all of the applications we tested, black box security testing covered, on average, 18% of the security-critical sites with automated tests and 26% with manual tests. The table below shows the results of automated and manual black box security testing on a sample of four applications we tested. The first set of coverage percentages gives the coverage ratios for all security-critical sites in the application; the second gives the coverage ratios for only those security-critical sites associated with web-facing points in the application.

As the table highlights, manual tests covered more security-critical functions, due in large part to the fact that they were more focused and tailored to the application. This was to be expected. The best black box security tests are generally performed by a professional who knows the application well and can spend days creating customized tests. However, on average neither automated nor manual testing covered a large portion of the application, suggesting that more effort is needed to ensure the application is secure.
The data also illustrate that while it is often easier to exercise web-input functions, just focusing on that “attack surface” won’t necessarily ensure coverage deep within the web application.
These results stress the importance of being able to measure how much of the application was checked. Whether using an automated tool or conducting manual tests, the tester currently has no way to know when they’ve successfully tested a large portion of the application. In order to feel confident that the application is secure, one must be able to assess how much of the application was checked.
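One way to give a tester this kind of feedback is to instrument the security-critical functions so that each call during a test run is recorded. The sketch below is a minimal illustration of that idea using a Python decorator. It is an assumption made for illustration, not a description of how Fortify Tracer works, and the names `security_critical`, `run_query`, `read_file`, and `coverage_report` are hypothetical.

```python
import functools

CRITICAL = set()   # names of all functions registered as security-critical
EXERCISED = set()  # names of registered functions called at least once


def security_critical(func):
    """Register func as security-critical and record every call to it."""
    name = func.__name__
    CRITICAL.add(name)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        EXERCISED.add(name)
        return func(*args, **kwargs)

    return wrapper


@security_critical
def run_query(sql):
    # Stand-in for a database API call.
    return f"executed: {sql}"


@security_critical
def read_file(path):
    # Stand-in for a file system call.
    return f"read: {path}"


def coverage_report():
    """Return (coverage percentage, sorted list of sites never exercised)."""
    missed = sorted(CRITICAL - EXERCISED)
    if not CRITICAL:
        return 0.0, missed
    return 100.0 * len(CRITICAL & EXERCISED) / len(CRITICAL), missed


# Simulate a test run that reaches only the database call.
run_query("SELECT 1")
print(coverage_report())  # (50.0, ['read_file'])
```

The list of missed sites is the useful part for a tester: it shows which security-critical functions still need a test that reaches them.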
Black box security tests are a crucial part of securing applications because they can identify critical vulnerabilities that can only be found when the application is running. What security teams need to be aware of, however, is that these tests also miss important security vulnerabilities. We found two categories of vulnerabilities that they tended to miss. The first includes vulnerabilities such as SQL injection, path manipulation, and privacy violations. While black box security testing can often identify these types of vulnerabilities, we saw a few instances where they were missed. The second category consists of vulnerabilities that can almost never be identified by black box security testing, such as issues in processing logic.
During our research, we found many instances where the black box security test missed a critical vulnerability. Below, we provide three examples:

(Some sections are X'ed out to preserve anonymity)

(We have X'ed out the actual number in the report.)
Because black box security tests work from outside the application, they are unable to catch certain vulnerabilities. If black box security testers are aware of this, they can adopt additional tools or services to ensure more critical security vulnerabilities are discovered.
Fixing a vulnerability that a black box security test identifies can be time-consuming and often challenging. Because black box security testing operates from outside the application, it has no understanding of the application’s source code, the blueprint for how the application works. When black box security tests successfully identify a specific vulnerability, they report only the type and category of the vulnerability and where it was observed from the outside, such as a SQL injection at a certain URL. As security professionals and developers attempt to investigate and remediate these vulnerabilities, they must spend time searching for the problem within the source code.
In our tests, we witnessed the process that black box security testers go through to investigate a vulnerability. When the results were presented to the development team, the team's main complaint was the lack of code-level information on the reported vulnerabilities. The developers looked at the URL and the type of vulnerability and attempted to determine where in the source code the vulnerability existed. In some cases, this took several hours. To speed remediation, black box security testing needs to provide more information, including the issue type, file name, and line number, as well as the runtime detail of the vulnerability, such as the actual SQL statement being executed. The tables below represent two different forms of information. On the top is the typical output from a black box security test, which includes fairly limited information. On the bottom is the kind of information professionals need to understand where in the source code the problem lies.
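The difference between the two forms of information can be sketched as a simple record type. The field names follow the items listed above (issue type, file name, line number, runtime detail); the class name `Finding` and all example values are hypothetical and do not come from the tested applications.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    """One vulnerability report; code-level fields may be absent."""
    issue_type: str                        # e.g. "SQL Injection"
    url: str                               # where the test observed the issue
    file_name: Optional[str] = None        # source file containing the flaw
    line_number: Optional[int] = None      # line of the vulnerable call
    runtime_detail: Optional[str] = None   # e.g. the SQL statement executed

    def is_actionable(self) -> bool:
        """True when a developer can go straight to the source location."""
        return self.file_name is not None and self.line_number is not None


# Typical black box output: issue type and URL only (hypothetical values).
typical = Finding(
    issue_type="SQL Injection",
    url="http://example.com/account?id=1",
)

# The same finding with code-level and runtime detail (hypothetical values).
detailed = Finding(
    issue_type="SQL Injection",
    url="http://example.com/account?id=1",
    file_name="AccountDao.java",
    line_number=42,
    runtime_detail="SELECT * FROM accounts WHERE id = '1' OR '1'='1'",
)

print(typical.is_actionable())   # False
print(detailed.is_actionable())  # True
```

With only the first form, a developer has to search the source code for the code path behind the URL; the second form points directly at the file and line.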


During our investigation, every black box security test performed reasonably well and successfully scanned each application. In many cases these tests identified critical vulnerabilities that could have been exploited had the application been deployed. However, we also observed significant areas where these black box security tests could be improved, as highlighted in our findings above: they failed to cover large sections of the application, they missed critical vulnerabilities, and they didn’t provide the necessary information to fix identified vulnerabilities. While these limitations should not stop us from using black box security tests (they provide key insights into runtime vulnerabilities), they should encourage us to continue exploring opportunities to improve and enhance our black box security testing.