improve ELF termination algorithm #9

Open
CosmicToast opened this issue Apr 16, 2020 · 0 comments

The current algorithm to determine where the payload would start (or, rather, where the ELF file will end, relative to its start) is roughly identical across all AppImage (and related) implementations:

Right now the assumption appears to be that the correct offset is e_shoff + (e_shentsize * e_shnum), which holds only if the final "part" of an ELF file is the Section Header Table (SHT).
If we look at the specification, we can see that layout illustrated on page 15.
However, the figure is almost immediately followed by this note:

Although the figure shows the program header table immediately after the ELF header, and the section header table following the sections, actual files may differ. Moreover, sections and segments have no specified order. Only the ELF header has a fixed position in the file.

Obviously, if the SHT is not the final component of an ELF file, the end address of the SHT does not correspond to the end address of the ELF file.

The algorithm should be roughly like so (based on the spec):
if the final component is the section header table: e_shoff + (e_shentsize * e_shnum)
if the final component is the program header table: e_phoff + (e_phentsize * e_phnum)
if the final component is a section: last_sect.sh_offset + last_sect.sh_size (last_sect is defined as the section with the highest offset; note that this is not guaranteed to work, since a section's size is not necessarily its size in the file; I write in Go, where debug/elf calculates the file size, so I'm sure it's possible to adapt)
if the final component is a program segment: last_prog.p_offset + last_prog.p_filesz (last_prog is defined as the program header entry with the highest offset)
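
The four cases above boil down to "take the largest end offset among all components". A rough sketch of that idea, assuming a hypothetical extent type and made-up offsets (nothing here is read from a real file):

```go
package main

import "fmt"

// extent is a hypothetical helper type: one byte range inside the file.
type extent struct {
	name         string
	offset, size int64
}

// elfEnd returns the largest end offset among parts, plus the name of the
// component that ends there.
func elfEnd(parts []extent) (end int64, last string) {
	for _, p := range parts {
		if e := p.offset + p.size; e > end {
			end, last = e, p.name
		}
	}
	return end, last
}

func main() {
	// Made-up layout in which a section is stored after the SHT.
	parts := []extent{
		{"program header table", 64, 56 * 2},
		{"section header table", 4096, 64 * 10},
		{"last section", 4736, 300},
		{"last segment", 0, 4000},
	}
	end, last := elfEnd(parts)
	fmt.Println(end, last) // 5036 last section
}
```

In this layout the SHT-only formula would give 4736 and treat the final 300 bytes of the ELF file as payload.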

In the hope of helping, and for the purposes of discussion, I wrote a small program in Go that finds the correct offset and prints everything that follows it.
The testing procedure is then simple: run the program against any file X that is a concatenation of an ELF file (it should work for any ELF file, not just executables!) and some text data, and observe the text data coming out.
Here is the program:

The go program (main.go)
package main

import (
	"bufio"
	"debug/elf"
	"encoding/binary"
	"fmt"
	"io"
	"os"
)

/* Ok so, ELF is weird
We are guaranteed that the first thing will be the ELF header.
However, the *DATA* can appear in *ANY ORDER*.

There are three types of data.
1. the program header table
2. the section header table
3. things that are referenced in 1 or 2

e_shoff points to the start of the section header table
e_shentsize is the size of a section header entry
e_shnum is the number of entries

the internet says that e_shoff + (e_shentsize * e_shnum) is what we want
this is *ONLY TRUE* if the *LAST THING IN THE FILE* is the section header table
as per above, this *is not guaranteed*

so what else can be at the end, theoretically speaking?
well, it can be:
a) the program header table (e_phoff + (e_phentsize * e_phnum))
b) a section (last section's offset + last section's filesize)
c) a program (last program's offset + last program's filesize)

So what you actually need to do:
1. get max(shoff, phoff, lastsec.Offset, lastprog.Off)
2. depending on which one you find, add either:
	shoff          -> e_shentsize * e_shnum
	phoff          -> e_phentsize * e_phnum
	lastsec.Offset -> lastsec.FileSize
	lastprog.Off   -> lastprog.Filesz
and you get what we want.
*/

func endelf(f reader) (int64, error) {
	var (
		shoff, phoff,
		shentsize, shnum, phentsize, phnum,
		lsoff, lsfs, pgoff, pgfs int64
	)

	e, err := elf.NewFile(f)
	if err != nil {
		return 0, err // not an ELF file
	}

	switch e.Class {
	case elf.ELFCLASSNONE:
		panic("unknown elf class")
	case elf.ELFCLASS32:
		hdr := new(elf.Header32)
		if err := binary.Read(f, e.ByteOrder, hdr); err != nil {
			return 0, err
		}
		shoff = int64(hdr.Shoff)
		shentsize = int64(hdr.Shentsize)
		shnum = int64(hdr.Shnum)
		phoff = int64(hdr.Phoff)
		phentsize = int64(hdr.Phentsize)
		phnum = int64(hdr.Phnum)
	case elf.ELFCLASS64:
		hdr := new(elf.Header64)
		if err := binary.Read(f, e.ByteOrder, hdr); err != nil {
			return 0, err
		}
		shoff = int64(hdr.Shoff)
		shentsize = int64(hdr.Shentsize)
		shnum = int64(hdr.Shnum)
		phoff = int64(hdr.Phoff)
		phentsize = int64(hdr.Phentsize)
		phnum = int64(hdr.Phnum)
	}

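	// Table order is not file order, so scan every entry for the section
	// and the segment whose data ends last in the file.
	for _, s := range e.Sections {
		if s.Type == elf.SHT_NOBITS {
			continue // e.g. .bss: occupies no bytes in the file
		}
		if off, fs := int64(s.Offset), int64(s.FileSize); off+fs > lsoff+lsfs {
			lsoff, lsfs = off, fs
		}
	}
	for _, p := range e.Progs {
		if off, fs := int64(p.Off), int64(p.Filesz); off+fs > pgoff+pgfs {
			pgoff, pgfs = off, fs
		}
	}

	// Compare end offsets rather than start offsets, so that overlapping
	// components (segments contain sections) cannot hide the true end.
	m := max(shoff+shentsize*shnum, phoff+phentsize*phnum, lsoff+lsfs, pgoff+pgfs)
	if m == 0 {
		return 0, fmt.Errorf("no ELF component has a nonzero end offset")
	}
	return m, nil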
}

type reader interface {
	io.Reader
	io.ReaderAt
}

func max(nums ...int64) int64 {
	var res int64
	for _, num := range nums {
		if num > res {
			res = num
		}
	}
	return res
}

func main() {
	if len(os.Args) < 2 { // need an arg
		os.Exit(1)
	}

	f, err := os.Open(os.Args[1])
	if err != nil {
		os.Exit(2)
	}

	pos, err := endelf(f)
	if err != nil {
		fmt.Println(err)
		os.Exit(3)
	}

	_, err = f.Seek(pos, 0)
	if err != nil {
		fmt.Println("Failed to seek")
		os.Exit(4)
	}

	buf := bufio.NewReader(f)
	for {
		b, err := buf.ReadByte()
		if err != nil {
			fmt.Printf("Found %s\n", err)
			break
		}
		fmt.Printf("%c %d\n", b, b)
	}
}
After a discussion with @TheAssassin via email, it was recommended that I open this issue here.