Program Interaction (Module 1)

As part of my degree program, I have to take a class called CSE466: Computer Systems Security. The professor for this class (Dr. Shoshitaishvili) created pwn.college, a free education platform to guide not only students in the course, but anyone who wants to try it out. I will be publishing all of my notes from each relevant module of the course here, though I highly recommend watching the lectures for yourself as they go much more in-depth!

Linux Command Line:

File System:

File System Structure

  • Absolute Paths: start with /
  • Relative Paths: do NOT start with /, relative to the current working directory
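As a rough illustration of the rule above, the sketch below resolves a path against a working directory. The function name resolve and the example paths are made up for this note; posixpath is used so the behavior follows Linux path rules on any machine.

```python
import posixpath

def resolve(cwd, path):
    """Return the absolute path that `path` names when the working directory is `cwd`."""
    if posixpath.isabs(path):
        # Absolute paths start with / and ignore the working directory.
        return posixpath.normpath(path)
    # Relative paths are joined onto the current working directory.
    # Note: normpath collapses ".." textually and does not follow symbolic links.
    return posixpath.normpath(posixpath.join(cwd, path))

if __name__ == "__main__":
    print(resolve("/home/hacker", "/etc/passwd"))
    print(resolve("/home/hacker", "notes/../flag.txt"))
```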

Environment Variables:

Environment Variables: a set of Key/Value pairs passed into every process when launched. Critical variables include:

  • PATH: list of directories to search for programs in
  • PWD: current working directory
  • HOME: path to home directory
  • HOSTNAME: name of the system
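To see the "passed into every process when launched" part in action, here is a minimal sketch that starts a child process with one extra variable and asks the child what it sees. The helper name child_sees and the variable DEMO_VAR are hypothetical, chosen only for this example.

```python
import os
import subprocess
import sys

def child_sees(name, value):
    """Launch a child Python process with name=value in its environment
    and return the value the child reads back."""
    env = dict(os.environ)      # copy the parent's key/value pairs
    env[name] = value           # similar in spirit to: export name=value
    code = "import os, sys; print(os.environ.get(sys.argv[1], ''))"
    result = subprocess.run(
        [sys.executable, "-c", code, name],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    print("HOME in this process:", os.environ.get("HOME"))
    print("DEMO_VAR in the child:", child_sees("DEMO_VAR", "hello"))
```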

Files:

  • Symbolic Links (Soft Links): special type of file that references another file
    • symbolic links to relative paths are relative to the directory containing the link
  • Hard Links: a perfect reference to the data/content inside the linked file. They link only to the data of the linked file, not its original path
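The difference between the two link types shows up when the original file is deleted. The sketch below (file names are invented for the example) creates both kinds of link in a temporary directory, removes the original, and checks what is left.

```python
import os
import tempfile

def link_demo():
    """Return (hard link shares the inode, content read via hard link, symlink is dangling)."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "original.txt")
        soft = os.path.join(d, "soft")
        hard = os.path.join(d, "hard")
        with open(target, "w") as f:
            f.write("data")
        # Relative symlink target: resolved from the directory containing the link.
        os.symlink("original.txt", soft)      # like: ln -s original.txt soft
        os.link(target, hard)                 # like: ln original.txt hard
        same_inode = os.stat(target).st_ino == os.stat(hard).st_ino
        os.remove(target)                     # delete the original path
        with open(hard) as f:                 # the data is still reachable
            hard_content = f.read()
        dangling = os.path.islink(soft) and not os.path.exists(soft)
        return same_inode, hard_content, dangling

if __name__ == "__main__":
    print(link_demo())
```

The hard link keeps working because it points at the data itself, while the symbolic link only stored a path that no longer exists.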

Different Types of Files:

  • -: regular file
  • d: a directory (yes, directories are actually just special files!)
  • l: a symbolic link (a file that transparently points to another file or directory)
  • p: a named pipe (also known as a FIFO)
  • c: a character device file (i.e., backed by a hardware device that produces or receives data streams, such as a microphone)
  • b: a block device file (i.e., backed by a hardware device that stores and loads blocks of data, such as a hard drive)
  • s: a unix socket (essentially a local network connection encapsulated in a file)
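These type characters are the first character of the mode string that ls -l prints. As a small sketch, Python's stat.filemode builds the same string, so the type character can be read from it; the helper names below are hypothetical.

```python
import os
import stat
import tempfile

def type_char(path):
    """Return the file-type character (-, d, l, p, c, b, s) for path."""
    # lstat does not follow symbolic links, so a link reports itself as 'l'.
    return stat.filemode(os.lstat(path).st_mode)[0]

def demo_types():
    """Create one file of several types in a temporary directory and classify each."""
    with tempfile.TemporaryDirectory() as d:
        regular = os.path.join(d, "regular")
        link = os.path.join(d, "link")
        fifo = os.path.join(d, "fifo")
        open(regular, "w").close()
        os.symlink(regular, link)
        os.mkfifo(fifo)
        return {
            "file": type_char(regular),
            "dir": type_char(d),
            "symlink": type_char(link),
            "fifo": type_char(fifo),
        }

if __name__ == "__main__":
    print(demo_types())
```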

Pipes:

  • Unnamed Pipes (|): ephemeral channels of communication, often used to direct the output of one command into the input of another
  • Named Pipes (FIFOs): used to help facilitate data flow in certain situations. The flow of data is First In First Out (FIFO)
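Both kinds of pipe can be tried from Python. This is a minimal sketch, with made-up function and file names: the first function uses an unnamed pipe inside one process, the second creates a FIFO on disk and uses a thread as the writer, since opening a FIFO blocks until both ends are open.

```python
import os
import tempfile
import threading

def unnamed_pipe_demo():
    """Write two messages into an unnamed pipe and read them back in order."""
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"first\n")
    os.write(write_fd, b"second\n")
    os.close(write_fd)                    # closing the write end signals end-of-file
    with os.fdopen(read_fd, "rb") as reader:
        return reader.read()

def named_pipe_demo():
    """Send two lines through a FIFO created with mkfifo and return what arrives."""
    with tempfile.TemporaryDirectory() as d:
        fifo = os.path.join(d, "demo_fifo")
        os.mkfifo(fifo)                   # like: mkfifo demo_fifo

        def writer():
            with open(fifo, "w") as f:    # blocks until a reader opens the FIFO
                f.write("first\nsecond\n")

        t = threading.Thread(target=writer)
        t.start()
        with open(fifo) as f:
            data = f.read()               # data arrives First In First Out
        t.join()
        return data

if __name__ == "__main__":
    print(unnamed_pipe_demo())
    print(named_pipe_demo())
```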

Useful Commands (CLI):

  • which programName: returns the absolute path of the given program, as found by searching the directories in the PATH variable
  • env: prints all the environment variables to the screen
  • export variable=value: can set new/existing variables to a new value
  • ln -s fileToBeLinked symbolicLinkName: creates a symbolic link
  • ln fileToBeLinked hardLinkName: creates a hard link
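  • ls -l: lists the files in the current directory in long format; the first character of each line shows the file type, and symbolic links show their target (add -d to list a directory itself rather than its contents)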
  • mkfifo fifoName: makes a new FIFO
  • command < in_file: redirect in_file into command’s input
  • command > out_file: redirect command’s output into out_file, overwriting it
  • command >> out_file: redirect command’s output into out_file, appending to it
  • command 2> error_file: redirect command’s error output (stderr) into error_file, overwriting it
  • command 2>> error_file: redirect command’s error output (stderr) into error_file, appending to it
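The redirection operators above boil down to opening a file and handing it to the command as its standard output or standard error. Here is a rough sketch of that idea using subprocess; the child command and file names are invented for the example. Opening with "w" behaves like > (overwrite) and "a" behaves like 2>> (append).

```python
import os
import subprocess
import sys
import tempfile

# A tiny child program that writes one line to stdout and one line to stderr.
CHILD = "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"

def redirect_demo():
    """Run the child twice: stdout is overwritten each time, stderr is appended."""
    with tempfile.TemporaryDirectory() as d:
        out_path = os.path.join(d, "out_file")
        err_path = os.path.join(d, "error_file")
        for _ in range(2):
            with open(out_path, "w") as out, open(err_path, "a") as err:
                subprocess.run([sys.executable, "-c", CHILD],
                               stdout=out, stderr=err, check=True)
        with open(out_path) as f:
            out_text = f.read()
        with open(err_path) as f:
            err_text = f.read()
        return out_text, err_text

if __name__ == "__main__":
    out_text, err_text = redirect_demo()
    print("out_file:", repr(out_text))
    print("error_file:", repr(err_text))
```

After two runs, out_file holds a single line because it was truncated each time, while error_file holds two.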

Binary Files

ELF Files:

Executable and Linkable Format (ELF): defines a program as it will be loaded and executed in memory for Linux/BSD systems. Allows for a compiler to create and define a program

  • is a binary file format containing the program, its data, how the program should be loaded (program/segment headers), and the metadata describing program components (section headers)
  • On Windows, the equivalent is the Portable Executable (PE) format
  • On macOS, the equivalent is the Mach-O format
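Every ELF file starts with a 16-byte identification block: the magic bytes 0x7f 'E' 'L' 'F', followed by a class byte (32-bit or 64-bit) and a data byte (endianness). The sketch below checks just those fields; the function name is hypothetical, and on a non-Linux machine the interpreter binary used in the demo may simply not be an ELF.

```python
import sys

def describe_elf_ident(data):
    """Return (word size, byte order) if data starts like an ELF file, else None."""
    if len(data) < 6 or data[:4] != b"\x7fELF":
        return None
    ei_class = {1: "32-bit", 2: "64-bit"}.get(data[4], "unknown")
    ei_data = {1: "little-endian", 2: "big-endian"}.get(data[5], "unknown")
    return ei_class, ei_data

if __name__ == "__main__":
    # Inspect the running Python interpreter, which is an ELF on Linux.
    with open(sys.executable, "rb") as f:
        print(sys.executable, describe_elf_ident(f.read(16)))
```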

Dynamically Linked ELF: The ELF file relies on libraries that also need to be loaded

ELF Program Headers:

Program Headers: specify information and define the segments needed to prepare the program for execution. They are the source of information used when loading a file

  • Segments: parts of an ELF file that are loaded into the memory of a computer when that file is executed

Important Entry Types:

  • INTERP: entry type defining the interpreter (the dynamic loader) that should be used to load this ELF into memory
  • LOAD: entry type defining a part of the file that should be loaded into memory
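To make the program header table less abstract, here is a sketch of a parser that handles only little-endian 64-bit ELF files, roughly what readelf -l reports in much more detail. The helper tiny_elf builds a synthetic, non-runnable file purely so the parser has something to read; its interpreter path is just a typical x86-64 value used for illustration.

```python
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")   # ELF64 file header (64 bytes)
PHDR = struct.Struct("<IIQQQQQQ")           # ELF64 program header (56 bytes)
PT_NAMES = {1: "LOAD", 2: "DYNAMIC", 3: "INTERP"}

def program_headers(data):
    """Return the program header entries of a little-endian ELF64 image."""
    fields = EHDR.unpack_from(data, 0)
    ident, e_phoff, e_phentsize, e_phnum = fields[0], fields[5], fields[9], fields[10]
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    entries = []
    for i in range(e_phnum):
        (p_type, _flags, p_offset, p_vaddr, _paddr,
         p_filesz, _memsz, _align) = PHDR.unpack_from(data, e_phoff + i * e_phentsize)
        entry = {"type": PT_NAMES.get(p_type, hex(p_type)),
                 "offset": p_offset, "vaddr": p_vaddr, "filesz": p_filesz}
        if p_type == 3:
            # INTERP points at a NUL-terminated path stored inside the file.
            raw = data[p_offset:p_offset + p_filesz]
            entry["interp"] = raw.rstrip(b"\0").decode()
        entries.append(entry)
    return entries

def tiny_elf():
    """Build a synthetic ELF64 image with one INTERP and one LOAD entry (not runnable)."""
    interp = b"/lib64/ld-linux-x86-64.so.2\0"   # illustrative loader path
    ident = b"\x7fELF\x02\x01\x01" + bytes(9)
    phoff = EHDR.size
    interp_off = phoff + 2 * PHDR.size
    total = interp_off + len(interp)
    ehdr = EHDR.pack(ident, 3, 62, 1, 0, phoff, 0, 0, EHDR.size, PHDR.size, 2, 0, 0, 0)
    ph_interp = PHDR.pack(3, 4, interp_off, interp_off, interp_off,
                          len(interp), len(interp), 1)
    ph_load = PHDR.pack(1, 5, 0, 0, 0, total, total, 0x1000)
    return ehdr + ph_interp + ph_load + interp

if __name__ == "__main__":
    for entry in program_headers(tiny_elf()):
        print(entry)
```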

ELF Section Headers:

Section Headers: represent a different view inside an ELF file with a lot more semantic information that is less important for the actual loading process

  • They are not required in an ELF file; they are stored only as metadata

Important Sections:

  • .text: the executable code of your program.
  • .plt and .got: used to resolve and dispatch library calls.
  • .data: used for pre-initialized global writable data (such as global arrays with initial values)
  • .rodata: used for global read-only data (such as string constants)
  • .bss: used for uninitialized global writable data (such as global arrays without initial values)

Symbols:

Symbols: names that binaries and libraries relying on dynamically loaded libraries use to find those libraries, resolve function calls into them, etc.

Resources to Interact with ELF:

  • gcc: to make your ELF.
  • readelf: to parse the ELF header.
  • objdump: to parse the ELF header and disassemble the machine code.
  • nm: to view your ELF’s symbols.
  • patchelf: to change some ELF properties.
  • objcopy: to swap out ELF sections.
  • strip: to remove otherwise-helpful information (such as symbols).
  • kaitai struct: to look through your ELF interactively.

Useful Commands (Binary):

  • readelf -a programName: will parse out the ELF file, providing information such as headers
  • nm -a programName: will list out all of the symbols associated with a program
  • nm -D programName: will list out all of the dynamic imports used at runtime

Lifecycle of a Linux Process

What is a Process:

Every Linux process has:

  • State
    • running, waiting, stopped, zombie
  • Priority (and other scheduling information)
  • Parent, Siblings, Children
  • Shared Resources
    • files, pipes, sockets
  • Virtual Memory Space
  • Security Context
    • effective uid and gid
    • saved uid and gid
    • capabilities
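Several of these attributes can be read from inside a running process. The sketch below collects a few of them for the current Python process; the function name is made up, the saved uid is only read where os.getresuid is available, and the state line is only read when /proc/self/status exists (as it does on Linux).

```python
import os

def process_summary():
    """Collect a few attributes of the current process."""
    info = {
        "pid": os.getpid(),
        "ppid": os.getppid(),     # the parent process
        "euid": os.geteuid(),     # effective uid (security context)
        "egid": os.getegid(),     # effective gid
        "cwd": os.getcwd(),
    }
    if hasattr(os, "getresuid"):
        info["saved_uid"] = os.getresuid()[2]   # (real, effective, saved)
    try:
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("State:"):
                    info["state"] = line.split(":", 1)[1].strip()
    except OSError:
        pass                      # no /proc here, skip the state field
    return info

if __name__ == "__main__":
    for key, value in process_summary().items():
        print(f"{key}: {value}")
```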

Virtual Memory: memory dedicated to a specific process

Physical Memory: memory shared among the whole system

libc: a library full of helper functions that is used by almost every program (including common C functions)

Process Timeline:

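  1. Process is created
    • A parent process creates the new process with fork() (or clone())
    • The new process calls execve() to begin loading a program
    • The kernel checks that the file has executable permissions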
  2. Process is loaded
    • The kernel figures out what steps need to be taken to load the file
      • It reads the beginning of the file and looks for #! (shebang); if present, the rest of the line names the interpreter
        • The kernel runs that interpreter instead, passing your file as an argument
      • If there is no #! (not a script), it checks whether the file matches a format registered in /proc/sys/fs/binfmt_misc
        • If a match is found, it executes the interpreter for that format with your file as an argument
      • If it is a dynamically linked ELF file, the kernel reads the interpreter (loader) defined in the ELF file
        • The loader is loaded and given control
        • The loader then locates the libraries, using:
          • LD_PRELOAD: environment variable, and anything in /etc/ld.so.preload
          • LD_LIBRARY_PATH: environment variable (can be set in the shell)
          • DT_RUNPATH or DT_RPATH: specified in the binary file (both can be modified with patchelf)
          • /etc/ld.so.conf: system-wide configuration
          • /lib and /usr/lib
        • The loader then loads the libraries
          • May cause more libraries to load if libraries depend on other libraries
          • Will also update relocations during this process
      • If it is a statically linked ELF file, the kernel just loads the file
    • Everything is loaded into the process’s own virtual memory space (viewable in /proc/self/maps), which contains:
      • the binary
      • the libraries
      • the “heap”
        • for dynamically allocated memory
      • the “stack”
        • for function local variables
      • any memory specifically mapped by the program
      • some helper regions
      • kernel code in the “upper half” of memory, which is inaccessible to the process
        • above 0x8000000000000000 on 64-bit architectures
  3. Process is initialized
    • Will run any constructors specified in the ELF file
  4. Process is launched
    • The ELF’s entry point code automatically calls __libc_start_main() from the libc library, which then calls the program’s main() function
    • Now the code is running
  5. Process reads its arguments and environment
    • The int main(int argc, char **argv, char **envp); function takes three arguments:
      • argc: the number of command-line arguments
      • argv: the command-line arguments themselves
      • envp: the environment variables
  6. Process executes
    • To interact with the operating system (OS), the process uses system calls
    • For the OS to communicate with the process, signals are used
      • Signals pause process execution and invoke the handler (a function that takes in the signal number)
        • If the signal has no handler, the default action for most signals is to kill the process
    • Another method of outside interaction is sharing memory with other processes
      • This requires an initial system call, but is then self-sustaining
      • One way is to use a shared memory-mapped file in /dev/shm
  7. Process terminates
    • There are only two ways a process can terminate:
      • Receive an unhandled signal
      • Call the exit() system call
    • After termination, processes must be “reaped”
      • A process remains in a zombie state and takes up memory until the wait() function is called by its parent
      • This returns the process’s exit code to the parent, and then the process is freed
        • If the parent dies without waiting on its children, they are re-parented to PID 1 and stay there until they are cleaned up
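The two termination paths and the reaping step from the timeline above can be tried with fork and waitpid. This is a minimal sketch for Linux with invented function names; it uses SIGKILL for the signal case because that signal can never be handled, so the default action always applies.

```python
import os
import signal
import time

def reap(pid):
    """Wait for a child and report how it terminated."""
    _, status = os.waitpid(pid, 0)          # reaping frees the zombie
    if os.WIFEXITED(status):
        return ("exited", os.WEXITSTATUS(status))
    if os.WIFSIGNALED(status):
        return ("signaled", os.WTERMSIG(status))
    return ("other", status)

def child_that_exits(code):
    """Fork a child that terminates by calling exit with the given code."""
    pid = os.fork()
    if pid == 0:
        os._exit(code)                      # child: terminate immediately
    return reap(pid)                        # parent: collect the exit code

def child_killed_by_signal(sig=signal.SIGKILL):
    """Fork a child that idles until the parent terminates it with a signal."""
    pid = os.fork()
    if pid == 0:
        time.sleep(60)                      # child: wait to be signaled
        os._exit(0)                         # fallback if no signal arrives
    os.kill(pid, sig)                       # parent: send the signal
    return reap(pid)

if __name__ == "__main__":
    print(child_that_exits(7))
    print(child_killed_by_signal())
```

Until reap runs, the terminated child stays in the zombie state described above; the parent's waitpid call is what releases it.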

Useful Commands (Viewing Processes):

  • readelf -a fileName | grep interpret: will return the specific loader being used to load the file
  • patchelf --set-interpreter /some/interpreter ./fileName: will forcibly set the interpreter for a provided file
  • ldd ./fileName: will list the libraries necessary for the file to run, including the interpreter
  • strace ./executable uncompiledFile: will “trace” out all of the system calls the process makes, from when it is first created to when it terminates
  • ./executable /proc/self/maps: if the executable prints the file it is given (like cat), this will show the virtual memory layout of the executable’s own process
  • man 2 open: documentation for the open() system call (section 2 of the manual covers system calls)
  • du -sb programFile: will output how many bytes a file is
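Each line of /proc/self/maps describes one mapped region: an address range, permissions, an offset, a device, an inode, and an optional path. The sketch below parses that layout; the function name and the sample line are made up for illustration, and the demo only reads the real file when it exists.

```python
import os

def parse_maps_line(line):
    """Split one /proc/<pid>/maps line into its address range, permissions and path."""
    parts = line.split(None, 5)
    start, end = (int(x, 16) for x in parts[0].split("-"))
    path = parts[5].strip() if len(parts) > 5 else ""   # anonymous mappings have no path
    return {"start": start, "end": end, "perms": parts[1], "path": path}

if __name__ == "__main__":
    sample = "00400000-00452000 r-xp 00000000 08:02 173521      /usr/bin/example\n"
    print(parse_maps_line(sample))
    if os.path.exists("/proc/self/maps"):
        # Show the first few regions of this Python process.
        with open("/proc/self/maps") as f:
            for line in list(f)[:5]:
                region = parse_maps_line(line)
                size = region["end"] - region["start"]
                print(f"{region['perms']} {size:>10} bytes  {region['path']}")
```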


Written By

s0merset7