
\chapter{Application Security}
\label{sec:securityOfIPC}

\section{Attack Vectors against Host-only Applications}
\label{sec:hostOnlyAttackVectors}
As discussed in Section~\ref{sec:networkedIPCSecurity}, networked IPC is inherently insecure because its messages pass through machines that neither endpoint controls.  Therefore, any application that uses the Internet must take precautions to keep communication secure, such as encryption and mutual authentication.  However, applications that do not use the Internet, which I will call host-only applications, have their own attack vectors.  The two most significant of these are memory leaks and local communication channels.

\subsection{Memory Leaks}
\label{sec:memoryLeaks}
In this context, a memory leak occurs when a process releases memory without scrubbing it, leaving confidential information in memory that the program no longer references.  For example, a password manager in use will likely store passwords in memory.  Once the user finishes using the password manager and locks it, the memory holding the passwords will be freed.  However, if the password manager does not first wipe that memory, for example by using \texttt{memset} to overwrite it with zeros or other data, then anyone who later reads that region of memory could recover the passwords.
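A minimal sketch of scrubbing a secret before freeing it is shown below in C.  One subtlety is that an optimizing compiler may delete a plain \texttt{memset} that is immediately followed by \texttt{free}, treating the writes as dead stores; this sketch therefore writes through a \texttt{volatile} pointer.  Some platforms provide dedicated functions for the same purpose, such as \texttt{explicit\_bzero} or the optional C11 \texttt{memset\_s}.  The helper names \texttt{secure\_zero}, \texttt{free\_secret}, and \texttt{dup\_secret} are hypothetical and are not taken from any password manager.

```c
#include <stdlib.h>
#include <string.h>

/* Overwrite n bytes at p with zeros.  Writing through a volatile pointer
   keeps the compiler from removing the stores as "dead", which it may do
   to a plain memset whose buffer is freed right afterwards. */
void secure_zero(void *p, size_t n) {
    volatile unsigned char *vp = (volatile unsigned char *)p;
    while (n--) {
        *vp++ = 0;
    }
}

/* Hypothetical helper: scrub a heap-allocated secret of len bytes,
   then return the memory to the allocator. */
void free_secret(char *secret, size_t len) {
    if (secret == NULL) {
        return;
    }
    secure_zero(secret, len);
    free(secret);
}

/* Hypothetical helper: copy a NUL-terminated secret into fresh heap
   memory and report its size (including the terminator) in *len_out. */
char *dup_secret(const char *src, size_t *len_out) {
    size_t len = strlen(src) + 1;
    char *copy = malloc(len);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, src, len);
    *len_out = len;
    return copy;
}
```

In the password-manager scenario, a function like \texttt{free\_secret} would be called for each decrypted entry when the vault is locked, in place of a bare \texttt{free}.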

This situation is not just hypothetical.  In 2019, researchers found that five common password managers, including \texttt{1Password}, \texttt{KeePass}, and \texttt{LastPass}, fail to adequately scrub memory before freeing it~\cite{independent_security_evaluators_2019}.  There are limits to what a password manager can do to keep passwords secure, but the applications studied did not reach even those limits.  A password requested by the user must be in memory in plaintext while in use so that the client process can access it, but the password manager should scrub that region of memory immediately after the password has been retrieved.  However, applications like \texttt{KeePass} and \texttt{LastPass} fail to scrub passwords after they are first accessed, leaving them in memory in plaintext even after the password manager is locked.  Of even greater concern, \texttt{1Password 7} places all passwords in memory in plaintext when the password manager is unlocked, along with the master password.  An attacker able to read arbitrary memory could therefore recover all of a user's passwords.  These password managers, along with all applications that handle confidential information, should scrub memory regions as soon as they are no longer needed to minimize the risk of data leaks.

The idea of scrubbing memory as soon as possible is not new.  It was described in 2005 under the term ``secure deallocation''~\cite{chow2005shredding}.  Secure deallocation means that memory is scrubbed as soon as it is deallocated.  When that paper was published, data commonly lived in memory from its first write until the same location was next overwritten, regardless of whether a new process had since been given that memory.  Under secure deallocation, data lives from its first write until it is explicitly freed, which signals that it is no longer needed.  The ideal lifetime would be from first write until last read; however, an operating system cannot know when the last read will occur.  With secure deallocation, the operating system can scrub data automatically as it is freed, minimizing the time that confidential information remains in memory.
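Secure deallocation can be sketched at the level of a user-space allocator wrapper.  The sketch assumes that every allocation goes through the hypothetical functions \texttt{sd\_malloc} and \texttt{sd\_free}, which record each block's size in a small header so that the block can be zeroed automatically when it is freed.  It illustrates the idea only and does not reproduce the implementation of Chow et al.; the zeroing loop in \texttt{sd\_free} is where the extra cost of secure deallocation comes from.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header stored in front of each block so that sd_free knows how many
   bytes to wipe.  The union with max_align_t keeps the pointer returned
   to the caller suitably aligned for any type. */
typedef union {
    size_t size;
    max_align_t align;
} sd_header;

/* Allocate size bytes, remembering the size for later scrubbing. */
void *sd_malloc(size_t size) {
    if (size > SIZE_MAX - sizeof(sd_header)) {
        return NULL; /* header + size would overflow size_t */
    }
    sd_header *h = malloc(sizeof(sd_header) + size);
    if (h == NULL) {
        return NULL;
    }
    h->size = size;
    return h + 1;
}

/* Secure deallocation: zero the whole block, then release it. */
void sd_free(void *p) {
    if (p == NULL) {
        return;
    }
    sd_header *h = (sd_header *)p - 1;
    volatile unsigned char *vp = (volatile unsigned char *)p;
    for (size_t i = 0; i < h->size; i++) {
        vp[i] = 0;
    }
    free(h);
}

/* Size recorded for a block returned by sd_malloc. */
size_t sd_size(const void *p) {
    return ((const sd_header *)p - 1)->size;
}
```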

While secure deallocation has a large security benefit, it also has downsides.  The largest is that it takes extra time.  Without secure deallocation, when a program indicates that an area of memory is no longer needed, the memory allocator only needs to return those addresses to the pool of available memory.  If it must also wipe that memory, each free operation takes more time and CPU cycles.  Zeroing memory made processes run up to 10\% slower~\cite{chow2005shredding}, which is an unacceptable delay for any process in which confidential data is not of critical importance.  Secure deallocation could also cause problems in programs that incorrectly use memory after deallocating it.  While the second issue is a bug resulting from poor programming, the first is a significant slowdown that must be addressed before secure deallocation can enter common use.

\subsection{Communication Channels}
\label{sec:communicationChannels}
While memory leaks expose information after it is no longer being used, attackers can also try to read or tamper with information while it is in use.  In particular, as discussed in Sections~\ref{sec:localIPCSecurity} and~\ref{sec:manInMachineAttack}, local communication channels can be vulnerable.  Many common applications, including security-conscious applications such as password managers and security tokens, were vulnerable to client or server impersonation because of weak encryption, weak key exchange, or other flaws that could easily be fixed~\cite{MitMa}.  Other researchers have used local IPC to break application isolation on both iOS and Mac OS X devices, as well as to hijack execution of the Android Bluetooth stack and control Bluedroid-connected devices~\cite{Xing_2015_CAI_2810103_2813609}~\cite{Shao_2016_MAU_2976749_2978297}.  These attacks show that local communication channels are insecure even though the data sent never leaves the computer.  Such data must still be secured as if it were traversing the Internet in order to protect users from local attackers.

\section{Input Management and Parsing}
\label{sec:inputManagement}
All applications must handle input, and that input determines the application's behavior.  Input ranges from command-line arguments, such as the file whose contents the \texttt{cat} program should print, to mouse clicks that a web browser must translate into actions.  Input can also be text, such as a search term that a website uses to filter data, or personal information entered into an online form.

When attacking an application, attackers often craft input that makes the program execute in a way its creators did not intend.  This input may take advantage of lapses in the vulnerable program's parsing algorithm.  Parsing, or input management, is how a program decides whether input is correctly formatted and, if so, how to handle it; it infers meaning from raw bytes.  For example, part of a compiler is a parser that checks that the code follows the programming language's syntax, such as having balanced parentheses and semicolons at the end of statements.  If there is a bug in the parser, invalid input could be allowed into the program, possibly turning a bug into an exploitable vulnerability.  Such input can follow execution paths that were never meant to be taken and drive the program into an unintended state.
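The parenthesis check mentioned above can be sketched as a tiny recognizer in C.  It is a simplified stand-in for one piece of a compiler's parser, not how any particular compiler is written: the hypothetical function \texttt{balanced\_parens} accepts or rejects a string based only on whether its parentheses are balanced, ignoring every other character.

```c
#include <stdbool.h>
#include <stddef.h>

/* Decide whether every '(' in s has a matching ')'.  Other characters
   are ignored.  A single counter suffices because there is only one kind
   of bracket; this language is context-free but not regular. */
bool balanced_parens(const char *s) {
    size_t depth = 0;
    for (; *s != '\0'; s++) {
        if (*s == '(') {
            depth++;
        } else if (*s == ')') {
            if (depth == 0) {
                return false; /* ')' with no matching '(' */
            }
            depth--;
        }
    }
    return depth == 0; /* reject any '(' left unclosed */
}
```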

Therefore, if a programmer is able to create a parser and prove that it accepts exactly the desired input, then he or she will be able to remove a large class of vulnerabilities: input-based vulnerabilities.  These are discussed in more depth in Section~\ref{sec:inputBasedVulnerabilities}.  The difficulty of creating such a parser depends largely on the complexity of the input.  If the input language is too complex, it becomes impossible to prove that a parser accepts exactly the appropriate input.

To break the problem down further, I will call the set of all valid inputs the input language.  For a parser to be correct, it must correctly decide, for any given string, whether that string should be accepted or rejected: it must accept exactly the strings in the input language and reject all others.  If the input language is regular or deterministic context-free, then it is possible to prove whether or not the parser accepts exactly the input language.  If such a proof succeeds, we have a correct parser for the input language.  One could then place the parser at the beginning of the program, so that all input passes through it first.  Accepted strings would be passed on to the rest of the program, while rejected strings would cause the program to stop immediately, without the input ever reaching the application logic.  With this parser, only valid input reaches the application, which rules out input-based vulnerabilities that depend on malformed input.
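As a minimal sketch, assume a toy input language of optionally negative decimal integers, the regular language \texttt{-?[0-9]+}.  Its recognizer can be written as a deterministic finite automaton and placed in front of the application logic.  The names \texttt{accepts\_int}, \texttt{handle\_input}, and \texttt{count\_digits} are hypothetical.  Because the automaton has only four states with explicit transitions, checking by inspection that it accepts exactly the intended language is straightforward.

```c
#include <stdbool.h>

/* States of a deterministic finite automaton for -?[0-9]+ */
enum int_state { ST_START, ST_SIGN, ST_DIGITS, ST_REJECT };

/* Return true only if the entire string is in the input language. */
bool accepts_int(const char *s) {
    enum int_state st = ST_START;
    for (; *s != '\0' && st != ST_REJECT; s++) {
        bool digit = (*s >= '0' && *s <= '9');
        switch (st) {
        case ST_START:
            st = (*s == '-') ? ST_SIGN : (digit ? ST_DIGITS : ST_REJECT);
            break;
        case ST_SIGN:
        case ST_DIGITS:
            st = digit ? ST_DIGITS : ST_REJECT;
            break;
        case ST_REJECT:
            break;
        }
    }
    return st == ST_DIGITS; /* DIGITS is the only accepting state */
}

/* Hypothetical application logic: it may assume its input is well formed,
   so every character other than the leading '-' is a digit. */
int count_digits(const char *s) {
    int n = 0;
    for (; *s != '\0'; s++) {
        if (*s != '-') {
            n++;
        }
    }
    return n;
}

/* Hypothetical entry point: the recognizer runs first, and only accepted
   input reaches the application logic.  Returns -1 on rejection. */
int handle_input(const char *input, int (*app_logic)(const char *)) {
    if (!accepts_int(input)) {
        return -1; /* stop before the application ever sees the input */
    }
    return app_logic(input);
}
```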

However, many input languages were not designed with this in mind, so they are more complex than deterministic context-free languages.  For such languages, proving that a parser accepts exactly the input language is an undecidable problem~\cite{sassaman2011halting}.  These complex input languages usually exist not because an application truly needs that much expressive power, but because programmers do not explicitly consider how difficult parsing is.

This problem is further complicated by the way that parsing logic is currently implemented in many pieces of software.  In many applications, programmers use ``shotgun parsing,'' which means that the parsing logic is scattered throughout the code instead of being performed all at the beginning~\cite{bratus2017parsing}.  Since the parsing code is scattered, it is more difficult to check that all possible cases are covered, even if the input language is regular or context-free.  Another trap that programmers fall into is using a regular expression in an attempt to validate an input language that is not regular~\cite{bratus2017parsing}.  In this case, whether an input should be accepted may be decidable, but a regular expression does not have enough computational power to decide it.

To combat these weaknesses, a design philosophy called Language-Theoretic Security, or LangSec, has risen in popularity.  LangSec holds that the code that decides whether input is valid should be separate from the application code that processes the input~\cite{langsec_language-theoretic_security}.  In a LangSec-compliant program, once the application logic receives input, it knows the exact form that the input follows, without exceptions, and can therefore operate without checking input correctness.  This makes the processing code cleaner, since ad hoc validity checks are no longer needed.  More importantly, it makes the application safe from a large class of exploits.
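A minimal sketch of this separation in C, assuming a toy input format \texttt{HH:MM}: the hypothetical \texttt{parse\_time} function fully recognizes the input and converts it into a structure, and the application logic, here the hypothetical \texttt{minutes\_since\_midnight}, only ever receives that structure.  This illustrates the principle rather than any particular LangSec tool.

```c
#include <stdbool.h>

/* The only form in which a time of day reaches the application logic. */
struct time_of_day {
    int hour;   /* 0..23 */
    int minute; /* 0..59 */
};

static bool is_digit(char c) { return c >= '0' && c <= '9'; }

/* Recognize exactly "HH:MM" with a valid 24-hour time.  On success, fill
   *out and return true; otherwise leave *out untouched and return false.
   The checks run left to right, so it never reads past the terminator. */
bool parse_time(const char *s, struct time_of_day *out) {
    if (!is_digit(s[0]) || !is_digit(s[1]) || s[2] != ':' ||
        !is_digit(s[3]) || !is_digit(s[4]) || s[5] != '\0') {
        return false;
    }
    int hour = (s[0] - '0') * 10 + (s[1] - '0');
    int minute = (s[3] - '0') * 10 + (s[4] - '0');
    if (hour > 23 || minute > 59) {
        return false;
    }
    out->hour = hour;
    out->minute = minute;
    return true;
}

/* Application logic: takes the parsed structure, not raw bytes, so it
   needs no validity checks of its own. */
int minutes_since_midnight(struct time_of_day t) {
    return t.hour * 60 + t.minute;
}
```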

\section{Input-Based Vulnerabilities}
\label{sec:inputBasedVulnerabilities}
When an application's parser does not accept exactly the input language, the application is left open to input-based vulnerabilities.  In these vulnerabilities, the attacker crafts input that breaks assumptions made by the program's authors, causing the application to behave unexpectedly.  They are especially powerful because, if the exploit succeeds, the attacker can run a program of their choosing on the victim's computer~\cite{sassaman2011halting}.

When an input-based vulnerability is exploited, the program does not behave as the programmer intended.  For example, the program could transfer execution to another program, often a shell, giving the attacker arbitrary code execution on the victim's computer.  Another common attack uses holes in the parsing logic to read memory that the programmer never meant to be retrievable by users.

While these attacks are dangerous in user mode, they are even more effective when the vulnerable software is a system call.  A system call is how a process asks the operating system kernel to perform privileged work on its behalf, such as writing to the screen, reading or writing a file, and many other important tasks that involve the computer's hardware.  Unlike user applications, which run in the least privileged protection ring of the processor, system calls run in the most privileged ring, often called kernel mode.  Code running in kernel mode has access to all memory, not just the calling process's memory.  Therefore, bugs in system calls can be used to read any information in memory or to hijack execution with the highest privileges.  When a system call mishandles parameters passed by a user process, an attacker may be able to cause a kernel panic, change user permissions, or trigger many other consequences~\cite{johnson2004finding}.

With this in mind, I will look at three different classes of input-based vulnerabilities: buffer overflows, format string attacks, and other attacks that violate assumptions.

\subsection{Buffer Overflows}
\label{sec:bufferOverflows}
Buffer overflows are one of the most common classes of vulnerability, earning the nickname ``Vulnerability of the Decade'' for the 1990s~\cite{cowan2000buffer}.  Twenty years later, buffer overflows remain important because of how often they occur and how much power they give an attacker.  A buffer overflow occurs when an attacker puts more bytes into a buffer than it can hold, overwriting the memory after the buffer.  Buffer overflow attacks were documented in the 1996 article ``Smashing the Stack for Fun and Profit,'' which described how to overwrite the return address of a function so that execution jumps to another location in memory that the attacker has previously filled with their own instructions~\cite{one1996smashing}.  An easy fix is to check the length of input before writing it to a buffer, but this check is often forgotten, or the programmer may believe the bounds were already checked by parsing logic.  These bugs are especially common in software written in C because the language has no built-in bounds checking; it must be implemented by the programmer.  Many protections, including stack canaries, non-executable stacks, and address space layout randomization (ASLR), have been implemented by compilers and operating systems to make buffer overflows harder to exploit, but all of them can be defeated~\cite{richarte2002four}~\cite{shacham2007geometry}~\cite{evtyushkin2016jump}.  In some cases, either for efficiency or because of their age, programs may be compiled without a stack canary or with an executable stack, allowing simple buffer overflows to succeed.
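The missing bounds check can be sketched in C.  The vulnerable pattern appears only in a comment; the hypothetical \texttt{copy\_name} function checks the length before copying and rejects input that does not fit rather than writing past the buffer.  The 16-byte buffer size is an arbitrary assumption for illustration.

```c
#include <stddef.h>
#include <string.h>

#define NAME_BUF_LEN 16

/* Vulnerable pattern (do not use): strcpy copies until it finds a NUL
   byte, so input of 16 or more characters runs past the end of buf and
   overwrites whatever follows it on the stack, possibly including the
   saved return address.
 *
 *   void greet_unsafe(const char *input) {
 *       char buf[NAME_BUF_LEN];
 *       strcpy(buf, input);
 *   }
 */

/* Bounded version: returns 0 on success, or -1 if input (plus its NUL
   terminator) does not fit in a NAME_BUF_LEN-byte destination. */
int copy_name(char dst[NAME_BUF_LEN], const char *input) {
    size_t len = strlen(input);
    if (len >= NAME_BUF_LEN) {
        return -1; /* reject instead of overflowing */
    }
    memcpy(dst, input, len + 1); /* +1 copies the terminator */
    return 0;
}
```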

One example of a buffer overflow vulnerability existed in libpng version 1.2.5~\cite{CVE-2004-0597}.  Using a malformed PNG image, an attacker could overflow a buffer holding transparency data and achieve arbitrary code execution.  Since libpng was, and still is, used for much of the Internet's PNG processing, any software using this version of libpng was vulnerable.  If an attacker could get the malicious PNG file processed by the vulnerable function, then he or she could run any desired code on the victim's computer, including opening a shell.  If the victim program was compiled without a stack canary and with an executable stack, the attack would be even easier.  This shows the power of a buffer overflow attack and how widespread the damage can be.

\subsection{Format String Attacks}
\label{sec:formatStringAttacks}
Another type of input-based vulnerability is the format string attack, which can occur when a function like \texttt{printf} is called.  This can happen in several situations: when user input is used as the format string itself, when the number of format parameters does not match the number of parameter arguments, or when the \texttt{\%n} format parameter is used, which stores the number of bytes written so far into the variable that its argument points to.  In these situations, an attacker can read or overwrite memory to either disclose information or hijack execution~\cite{newsham_2000}.  Format string attacks can also overwrite addresses in the global offset table~\cite{scut2001exploiting}.  The global offset table, or GOT, contains the addresses of the dynamically linked library functions that the process calls.  When such a function is called, execution jumps through its GOT entry, which points either to the runtime linker or, if the function has already been resolved in the running process, to the function itself.  By overwriting this address, an attacker can redirect execution to an arbitrary address without overflowing a buffer, sidestepping protections such as stack canaries that target buffer overflow attacks.  Format string attacks can also cause a denial of service if the attacker can make the program read memory it does not have access to, causing a segmentation fault and a crash~\cite{scut2001exploiting}.
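The difference between vulnerable and safe uses of the \texttt{printf} family can be sketched as follows.  The vulnerable pattern appears only in a comment.  The hypothetical \texttt{format\_user\_message} writes into a buffer with \texttt{snprintf} so that the result can be inspected; with the constant format \texttt{\%s}, any conversion specifiers typed by the user are copied literally instead of being interpreted.

```c
#include <stddef.h>
#include <stdio.h>

/* Vulnerable pattern (do not use): the user's text becomes the format
   string, so any %x or %n it contains is interpreted by printf.
 *
 *   void log_unsafe(const char *user) { printf(user); }
 */

/* Safe pattern: the format string is a constant, and the user's text is
   passed only as data for a single %s conversion.  Returns the length of
   the formatted text, as snprintf does. */
int format_user_message(char *out, size_t outsz, const char *user) {
    return snprintf(out, outsz, "%s", user);
}
```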

However, while format string attacks were once fertile ground for vulnerabilities, they are also a potential success story for correct parsing.  In C, the syntax of format strings is regular, so it is possible to create a provably correct parser that ensures there are no placeholders in the user's input~\cite{sassaman2013security}.  Since the user then cannot add format parameters to the string, as long as the programmer correctly matches the number of format parameters with the number of parameter arguments, most attacks that attempt to read memory will fail.  However, if the program's own format strings still use \texttt{\%n}, attackers may still influence what is written to memory, since the value stored is the number of characters printed so far, which can depend on the length of their input.
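Because this check only has to recognize a regular language, it can be written as a finite automaton small enough to verify by inspection.  A minimal sketch, assuming the policy is to reject any user string that would contain a conversion specification if it were used as a format string, while still allowing the \texttt{\%\%} escape; the function name \texttt{has\_no\_placeholders} is hypothetical.

```c
#include <stdbool.h>

/* Two-state recognizer for strings in which every '%' is part of a "%%"
   escape, i.e. strings that contain no conversion specifications.
   after_percent == false: ordinary text.
   after_percent == true:  just saw a '%' that still needs its partner. */
bool has_no_placeholders(const char *s) {
    bool after_percent = false;
    for (; *s != '\0'; s++) {
        if (after_percent) {
            if (*s != '%') {
                return false; /* '%' followed by anything else starts a conversion */
            }
            after_percent = false;
        } else if (*s == '%') {
            after_percent = true;
        }
    }
    return !after_percent; /* a trailing lone '%' is also rejected */
}
```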

\subsection{Other Violated Assumptions}
\label{sec:otherViolatedAssumptions}
The last class of input-based vulnerabilities covers the other ways in which assumptions can be violated.  Like those discussed previously, these vulnerabilities occur when application developers do not consider the ways their applications could be attacked.  For example, SQL injection attacks occur when user input is inserted directly into a SQL query without being sanitized, giving users unintended access to the database.  Injection, a category that includes SQL injection, was rated the biggest application security risk of 2017~\cite{owasp_2018}.  SQL injection attacks include attempts to extract data, modify or destroy databases, or bypass authentication~\cite{halfond2006classification}.  These can be mitigated by checking the types of inputs, searching for correct input patterns, and using intrusion detection systems.  However, SQL queries are neither regular nor context-free, so regular expressions cannot correctly separate valid from invalid inputs, and there is no provably correct parser for all inputs.
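The following sketch shows why pasting input into a query string is dangerous.  The table and column names and the function \texttt{build\_login\_query} are hypothetical, and no database is involved: the function only builds the query text.  With the password \texttt{' OR '1'='1}, the quote characters in the input end the string literal early, and the resulting condition \texttt{name = 'alice' AND password = '' OR '1'='1'} is true for every row, because \texttt{AND} binds more tightly than \texttt{OR}.

```c
#include <stddef.h>
#include <stdio.h>

/* Vulnerable pattern: build a query by pasting user input between quotes.
   Returns the query length, or -1 if it did not fit in out or formatting
   failed. */
int build_login_query(char *out, size_t outsz,
                      const char *user, const char *pass) {
    int n = snprintf(out, outsz,
                     "SELECT id FROM users WHERE name = '%s' AND password = '%s';",
                     user, pass);
    if (n < 0 || (size_t)n >= outsz) {
        return -1;
    }
    return n;
}
```

Printing the query produced for user \texttt{alice} and the password above shows how the attacker's quotes change the structure of the \texttt{WHERE} clause rather than merely supplying a value.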

While SQL injection attacks have no complete solution, other vulnerabilities do.  The Heartbleed bug, announced in 2014, was exploited by abusing the heartbeat extension in OpenSSL~\cite{mehta_codenomicon_2014}.  In the heartbeat extension, one side of the connection sends a payload together with its length, and the other side is supposed to send the same payload back.  The responder copies the payload into memory and replies with the number of bytes given in the payload length field.  However, there was no check that the length field was no larger than the actual payload.  An attacker could therefore specify a much larger length, and the response would contain the payload followed by the bytes of memory after it, up to the claimed length~\cite{Durumeric_2014_MH_2663716_2663755}.  This bug occurred because the programmers expected the specified payload length to agree with the actual length of the payload but never checked that it did~\cite{bratus2017parsing}.  This simple mistake highlights the high risk of input-based vulnerabilities: they are easy to overlook, and their consequences can be devastating.  This underlines the need to identify input-based vulnerabilities wherever a provably correct input parser cannot be built.
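A simplified sketch of a heartbeat responder that performs the missing check is shown below.  It assumes the message layout from RFC 6520 (a 1-byte type, a 2-byte big-endian \texttt{payload\_length}, the payload, then at least 16 bytes of padding) and ignores the rest of TLS; the function name \texttt{heartbeat\_echo} is hypothetical and this is not OpenSSL's code.  Removing the length comparison reproduces the Heartbleed pattern: \texttt{memcpy} would then copy as many bytes as the peer claimed, regardless of how many were actually received.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HB_HEADER_LEN 3   /* 1-byte type + 2-byte payload_length */
#define HB_MIN_PADDING 16 /* RFC 6520: padding is at least 16 bytes */

/* Echo the payload of a heartbeat request (msg_len bytes received) into
   resp.  Returns the number of payload bytes copied, or -1 if the
   message must be discarded. */
long heartbeat_echo(const uint8_t *msg, size_t msg_len,
                    uint8_t *resp, size_t resp_cap) {
    if (msg_len < HB_HEADER_LEN + HB_MIN_PADDING) {
        return -1; /* too short to be a heartbeat message at all */
    }
    size_t claimed = ((size_t)msg[1] << 8) | msg[2]; /* big-endian length */

    /* The check Heartbleed lacked: header, claimed payload, and minimum
       padding must all fit inside the bytes actually received. */
    if (HB_HEADER_LEN + claimed + HB_MIN_PADDING > msg_len) {
        return -1; /* discard silently */
    }
    if (claimed > resp_cap) {
        return -1;
    }
    memcpy(resp, msg + HB_HEADER_LEN, claimed);
    return (long)claimed;
}
```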

\section{Fuzzing}
\label{sec:fuzzing}
One way to find input-based vulnerabilities is fuzzing.  Fuzzing is the process of sending random, semi-random, or unexpected input to a process~\cite[pp.~21--22]{fuzzing}.  The goal is to find inputs that cause the application to hang, crash, or otherwise behave unexpectedly.  Such behavior could indicate a bug in the parsing code, where some aspect of the input is not handled correctly and causes problems downstream in the application.  Developers often fuzz their own applications before shipping to find and eliminate as many bugs as possible.  However, third parties can also fuzz software, either to improve it or to find bugs to exploit.
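A minimal mutation fuzzer can be sketched in a few lines of C.  Everything here is hypothetical: \texttt{toy\_parse} stands in for the program under test and, instead of actually reading out of bounds, reports \texttt{TARGET\_BUG} whenever a parser that trusted its length byte would have done so, whereas real fuzzers detect hangs and crashes of the target process.  The fuzzer starts from a valid seed input, overwrites one random byte per iteration, and stops at the first input that triggers the bug; a small deterministic generator (xorshift32) keeps runs reproducible.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TARGET_OK 0
#define TARGET_BUG 1
#define MAX_INPUT 256

/* Hypothetical target: the first byte claims how many payload bytes
   follow.  Returns TARGET_BUG exactly when a parser that trusted that
   byte would read past the end of the input. */
static int toy_parse(const uint8_t *buf, size_t len) {
    if (len == 0) {
        return TARGET_OK;
    }
    size_t claimed = buf[0];
    return claimed > len - 1 ? TARGET_BUG : TARGET_OK;
}

/* xorshift32: a small deterministic PRNG (state must be nonzero). */
static uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Mutation fuzzing loop.  Returns the iteration at which the target
   misbehaved (copying that input into out), or -1 if none did. */
long fuzz(const uint8_t *seed, size_t len, uint8_t *out,
          long max_iters, uint32_t rng_seed) {
    uint8_t buf[MAX_INPUT];
    uint32_t state = rng_seed != 0 ? rng_seed : 1;
    if (len == 0 || len > MAX_INPUT) {
        return -1;
    }
    for (long i = 0; i < max_iters; i++) {
        memcpy(buf, seed, len);                 /* start from the valid seed */
        size_t pos = xorshift32(&state) % len;  /* pick a byte ... */
        buf[pos] = (uint8_t)xorshift32(&state); /* ... and overwrite it */
        if (toy_parse(buf, len) == TARGET_BUG) {
            memcpy(out, buf, len);
            return i;
        }
    }
    return -1;
}
```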

Fuzzers can also be divided into two categories depending on whether they can read the program's source code.  With the source code, a fuzzer can see control-flow structures such as \texttt{if} statements and loops and use them to try to follow every possible execution path.  In practice, following every execution path requires too much time and too many resources, but the fuzzer at least knows how many paths have been tested.  In contrast, without source code, it is impossible to ensure that every execution path has been covered~\cite{godefroid2012sage}.  Without knowing how the program is supposed to run and exactly what causes different paths to be taken, the fuzzer cannot tell how many execution paths remain untested.