From 8ccbf0965d81b293729cda315de1c9f6216655ae Mon Sep 17 00:00:00 2001
From: Igor Stoppa
Date: Tue, 7 May 2024 11:51:38 +0300
Subject: [PATCH 01/27] Create Linux_Memory_Management_Essentials.md

Co-authored-by: Paul Albertella
Co-authored-by: Daniel Weingaertner
Signed-off-by: Igor Stoppa
---
 .../Linux_Memory_Management_Essentials.md     | 242 ++++++++++++++++++
 Contributions/README.md                       |   3 +
 2 files changed, 245 insertions(+)
 create mode 100644 Contributions/Linux_Memory_Management_Essentials.md

diff --git a/Contributions/Linux_Memory_Management_Essentials.md b/Contributions/Linux_Memory_Management_Essentials.md
new file mode 100644
index 0000000..a97745c
--- /dev/null
+++ b/Contributions/Linux_Memory_Management_Essentials.md
@@ -0,0 +1,242 @@
+# **Linux Memory Management Essentials (Work In Progress)**
+
+## Index
+
+[Terms and Abbreviations](#Terms-and-Abbreviations)
+
+[References](#References)
+
+[Disclaimer](#Disclaimer)
+
+[Purpose of the document](#Purpose-of-the-document)
+
+[Structure of the document](#Structure-of-the-document)
+
+[Kernel-space memory allocations](#Kernel-space-memory-allocations)
+
+[User-space memory allocations](#User-space-memory-allocations)
+
+[License: CC BY-SA 4.0](#License-CC-BY-SA-40)
+
+
+## **Terms and Abbreviations**
+Please refer to the Linux Kernel documentation.
+
+
+## **References**
+
+1. [Linux Kernel website](https://www.kernel.org) -
+2. ***Interference Scenarios for an ARM64 Linux System***
+3. [CC BY-SA 4.0 Deed | Attribution-ShareAlike 4.0 International | Creative Commons](https://creativecommons.org/licenses/by-sa/4.0/) - License
+
+
+## **Disclaimer**
+* This document is not intended to be a replacement for understanding the memory management of the Linux Kernel,
+  nor does it attempt to be an exhaustive analysis of safety implications.
+* Because of the very volatile nature of the code within the Linux Kernel, each of the statements made
+  below should not be taken at face value, but rather verified, for any Linux kernel version following
+  the one used while writing the document (6.9).
+* When referring to specific HW features, the document refers to the ARM64 architecture.
+
+## **Purpose of the document**
+This document aims to provide a holistic view of what happens in Linux memory management, so that
+one is at least aware of certain features and can use this document as a jumping-off point toward more
+detailed documentation, or even the code base itself.
+
+## **Structure of the document**
+The document is divided into two parts, based on the destination of the memory allocations discussed: kernel-space or user-space.
+Individual points are numbered for ease of reference, but the numbering is not meant to represent any sequence.
+
+## **Memory management in Linux**
+
+### **Kernel-space memory allocations**
+
+#### **Directly Verifiable Assertions**
+The following section presents a set of statements that can be objectively verified e.g. by inspecting the sources.
+
+1. unlike processes memory, kernel memory pages are not swapped, nor dropped silently by the kernel itself,
+   although a hypervisor will do to a VM what the kernel does to a process (but this is beyond the control of the kernel)
+2. the kernel context (usually EL1 on ARM64) uses one single memory map (page tables) across all the cores
+   executing in kernel mode
+3. on 64 bit systems (e.g. ARM64 and x86_64), usually almost all physical memory is mapped in a
+   (semi)contiguous (there can be holes) range. Memory within this range is both virtually and physically contiguous.
+4. 
physically contiguous memory is treated as a scarce resource, and typically is not provided to userspace, + unless it explicitly asks for it (e.g. for implementing some DMA buffer) +5. the kernel can access userspace memory in three ways: + 1. through the userspace mapping + 1. this type of access is limited to few functions, like copy_to_user()/copy_from_user() and put_user()/get_user() + 2. outside the execution of the functions mentioned, this type of access is not possible, because the userspace mappings are made available only while executing such functions + 3. the userspace memory map can implement HW protections against being misused by the kernel + 4. the kernel is able to access process pages in the same sequence and with the same mappings as the process does + 2. through a memory buffer (e.g. a memory area where the kernel regularly needs to perform large amount of read/write operations, like a network buffer) + 1. the memory used by a buffer is well defined and delimited, rather than a generic area, and it is specifically reserved for this purpose + 2. the kernel uses own mappings in EL1, while the user space uses the EL0 ones + 3. very often the kernel doesn't actually access this area directly, but it rather configures a DMA controller to do the transfers, directly to the physical memory. + 4. misconfiguration of these peripherals that can access physical memory directly is a potential problem - a form of mitigation relies on using IOMMUs: hardware components that the kernel can configure as a firewall on the phisical memory bus, to limit what memory is accessible to each memory bus master device. + 3. through the EL1 mappings + linear map + 1. not the intended way for the kernel to access process context, because the intended way is one of the two previously described + 2. attempting to use EL1 mappings would be not very useful, both for legitimate purposes and even for malicious ones: + 1. the sequence of pages mapped in the userspace process is not known, and it can change continuously, as the memory management does its job of optimising page allocation to running processes + 2. some process pages might even be "missing", because they have been either swapped out or dropped (see previous point) + 3. even for an attacker, in a security scenario, because most likely the attacker would need to access user space pages sequentially, or anyway through user mappings (this is why EL1 mappings are allowed to have access to pages containing EL0 code/data: the security risk is relatively low) + 4. it bypasses any protection that the process might employ through its own mapping, which, even for legitimate kernel operations on userspace, would be less safe +6. Assertions about page "mobility", from different perspectives: + 1. employment of physical pages - observing a physical memory page (e.g. a 4kB chunk aligned on a 4kB boundary) and how it is employed: what sort of content it might host, over time. + 1. certain physical pages are put to use for a specific purpose, which is (almost) immutable, for the entire duration of the execution of the kernel: + 1. kernel code from the monolithic portion (except for pages containing kernel init code, which are released after kernel init is completed) + 2. kernel statically allocated data from the monolithic portion (except for pages containing kernel init data, which are released after kernel init is completed) + 3. some kernel dynamically allocated data, used by the kernel itself, and never released, due to the nature of its use (e.g. 
object caches, persistent buffers, etc.)
+         4. memory used for loadable kernel modules (code, data) is tied to the permanence of the module in memory - this is typically stable through the entire duration of the execution. Some exceptions are modules loaded/unloaded as a consequence of certain peripherals (dis)appearing, e.g. USB ones.
+      2. other physical pages (most, actually, in a typical system) are available for runtime allocation/release cycles of multiple users:
+         1. transient kernel linear memory allocations (kmalloc / get_free_pages)
+         2. transient kernel virtually linear memory allocations (vmalloc for kernel address space)
+         3. user-space memory allocations (they are always virtually linear), which are by default transient and exposed to repurposing proactively done by the kernel (see below)
+   2. transitioning of a logical page - given certain context and content, where it might be located in memory over time - and if it might be even discarded.
+      1. the kernel doesn't spontaneously repurpose the utilisation of its own physical pages; therefore it is possible to assume that the logical content of kernel allocations will remain tied to the associated physical pages, as long as it is not intentionally altered (or subject to interference)
+         1. metadata memory used by the kernel for housekeeping purposes related to processes is included in this category; examples: task/cred structures, vmas, maple leaves, process page tables.
+         2. memory used by the kernel for the actual userspace processes: the content of this logical page determines its life cycle and expectancy: certain content such as code or constants can be "re-constructed" by reloading it from file (code pages are likely to necessitate undergoing re-linking), so the actual logical content might disappear, over time. Other pages, on the other hand, are meant to hold non-re-constructible content, such as stack, heap, and variable data. These pages can, at most, be swapped out, and loaded back later on, but they cannot be simply dropped.
+         3. page cache: it is a collection of memory pages containing data read from files, over time; e.g. code or initialised data from a file that was accessed recently; in some cases the page might never have been used, but it was loaded as part of the readahead optimisation. The life expectancy of these logical pages is heavily affected by how many processes might keep accessing them and the level of memory starvation of the system caused by other processes, with some additional complexity layered on top of this, by the use of containers.
+   3. The kernel utilises various optimisations that are meant to take advantage of hardware features, such as multi-stage caching, and also to cope with different memory architectures (like Non Unifor Memory Architecture - NUMA). The main goals are:
+      1. avoid having to propagate too frequently write operations through the layers of HW cache, which is caused by pages being evicted from the cache, due to memory pressure
+      2. avoid having multiple cpus writing to the same page, in a NUMA system, where only one cpu has direct write access to that memory page, because it would cause cache invalidation
+      Therefore, the kernel tends to:
+      1. reuse as much as possible a certain page that has just been freed (so called hot page, since it is still presumably present in the HW cache)
+      2. keep for each core a stash of memory pages readily available (which prevents other cores from accessing said pages and introducing additional cache-flush operations)
+   4. 
The MMU, involved in performing address translations, acts also as a bus master, and performs read operations whenever it needs to do an address translation that is not already present in its own local translation cache (TLB - Translation Lookaside Buffer). Having to perform too many of such address translations (page walks) can constitute a significant performance penalty. The TLB is not very large, and accessing lots of different memory addresses that do not belong to the same translation entry can cause severe performance degradation. This is why the kernel actually keeps most of the memory mapped in the linear map, to take advantage of a feature present in many processors, that allows the mapping of large chunks of physical memory (e.g 2MB) as a single entry (or few ones). The kernel code, for example, is kept compact to maximise the efficiency of the fetching operations. For the similar reasons, the linear mapping allows to have memory already mapped, without a need for creating those mappings on the fly. + 5. For what concerns allocation from the linear map (kmalloc / get_free_pages), the kernel attempts to keep the free pages as much continuous as possible, avoiding fragmentation of the free areas. + 1. this is implemented through the concept of the "buddy allocator", meaning that whenever a certain amount of linear memory is requested (either sub-page or multi-page size), it always tries to obtain it from the smallest free slot available, only breaking larger free slots when no alternatives are possible. + 2. the kernel also keeps ready a certain amount of pre-diced memory allocations, to avoid incurring the penalty of having to look for some free memory as a consequence of an allocation request. + 3. folios are structures introduced to simplify the management of what has been traditionally called compound pages: a compound page represents a group of contiguous pages that is treated as a single logical unit. Folios could eventually support the use of optimisations provided by certain pages (e.g. ARM64 allows the use of a single page table entry to represent 16 pages, as long as they are physically contiguous and aligned to a 16 pages boundary, through the "contiguous bit" flag in the page table). This can be useful e.g. when keeping in the page cache a chunk of data from a file, should the memory be released, it could result in releasing several physically contiguous pages, instead of scattered ones. + 6. whenever possible, allocations happen through caches, which means that said caches must be re-filled, whenever they hit a low watermark, and this re-filling can happen in two ways: + 1. through recycling memory as it gets freed: for example in case a core is running short of pages in its own local queue, it might "capture" a page that it is freeing. + 2. through a dedicated thread that can asynchronously dice larger order pages into smaller portions that are then placed into caches in need to be refilled + 7. the kernel can also employ an Out Of Memory Killer feature, that is invoked in extreme cases, when all the existing stashes of memory have been depleted: in this case the killer will pick a user space process and just evict it, releasing all the resources it had allocated. It's far from desirable, but it's a method sometimes employed. + 8. freeing of memory pages also happens in a deferred way, through separate threads, so that there is no overhead on the freer, in updating the metadata associated with the memory that has just been released. + 9. 
All of the mechanisms described above for memory management are not only memory providers, but memory users as well, because they rely on metadata that cannot be pre-allocated and must be adjusted accordingly to the memory transactions happening, as the system evolves over time; therefore they also consume memory, generating an overhead, and being exposed themselves to interference. + 10. The Linux kernel provides means to limit certain requests a process might present; for example with cgroups it is possible to create logical memory "bubbles" that cannot grow beyond a set size, and associate processes to them, that share the collective limit within the bubble. But this does not much toward separating how the kernel might use the underlying memory, besides setting the constraint as described. + +#### **Safety-Oriented consideration** +The following considerations are of a more deductive nature. + +1. Because of the way pages, fractions and multiples of them are allocated, freed, cached, recovered, there is a complex interaction between system components at various layers. +2. Even using cgroups, it is not possible to segregate interaction at the low level between components with differnet level of safety qualification (e.g. a QM container can and most likely will affect the recirculation of pages related to an ASIL one) +3. Because of the nature of memory management, it must be expected that memory management mechanisms will interfere with safe processes, either due to a bug or due to the interference toward the metadata they rely on. For example, the memory management might hand over to a requesting entity a memory page that is currently already in use either by a device driver or by a userspace process playing a role in a safety use case. +4. Still due to te complex interaction between processes, kernel drivers and other kernel code, it is practically impossible to qualify the kernel as safe through positive testing alone, because it is impossible to validate all the possible combinations, and it is equally impossible to assess the overall test coverage and the risk associated with not reaching 100%. The only reliable way to test is to use negative testing (simulating a certain type of interference) and confirming that the system response is consistent with expectations (e.g detect the interference, in case of ASILB requirements). And even then, the only credible claim that can be made is that, given the simulated type of interference, on the typology of target employed, the reaction is aligned with the requirements. Other types of targets will require further ad-hoc negative testing. +5. Linux Kernel mechanisms like SELinux and cgroups/containers do not offer any protection against interference originating from the kernel itself. +6. The Linux Kernel must be assumed to be QM, unless specific subsystems are qualified through both positive and negative testing. +7. Even after achieving the ability to make certain claims about the Kernel integrity (or detection of its loss), this doesn't guarrantees system availability: knowing that an interference has occurred doesn't help with ensuring a certain level of system availability. + +### **User-space memory allocations** + +#### **Assertions** +The following section presents a set of statements that can be objectively verified e.g. by inspecting the sources. + +1. 
given a user-space logical memory page, at a certain moment, it must not be assumed to have a corresponding
+   backing physical memory page; it means that a process might have had some of its pages dropped and/or
+   swapped out to disk, or perhaps not even ever loaded.
+2. it is not possible to make many assumptions about the state of the logical content of a process's pages.
+   Perhaps one of the few statements that can be made is that the logical page containing code being executed
+   in a certain moment also has a physical backing, while it is being executed.
+   It might be possible to introduce some additional certainties, for example by pinning down some memory allocations
+   by marking the process as non-swappable. The presence of caching and other
+   optimisations like the zero-page make it very difficult to be assertive beyond this point.
+3. the management of memory pages associated with a process is handled through the process memory
+   map, which consists of several virtual memory areas, representing address ranges within the
+   process address space, that are put to some use.
+   They come in two flavors:
+   1. anonymous mappings:
+      1. process heap, stack
+      2. zero-initialised variables
+      3. vDSO / vvar (code and data provided by the kernel, to optimise the execution of frequently performed operations, like user space reading the time of day)
+   2. file-backed mappings:
+      1. constants
+      2. the main executable associated with the process
+      3. the dynamic linker (optional but typically used for ELF files)
+      4. linked libraries (e.g. glibc)
+4. when the kernel starts a process, it sets up mappings for all the virtual memory areas required to get it started,
+   but it doesn't actually allocate any memory: once the process is scheduled for execution, it will eventually be run
+   for the first time, and as soon as the various areas are accessed, they will trigger page fault exceptions.
+   1. when an address is accessed for the first time, it might require that a memory page is allocated to host whatever
+      the associated content might be, but it is possible, especially when the process has just been started, that
+      even the associated page in the page table is missing.
+   2. based on both availability of free pages and type of content associated, the access might cause the process
+      to sleep; examples:
+      1. no free pages are available (unusual but possible) and the kernel will have to try to obtain one, in one of
+         the ways described above
+      2. the content needs to be loaded from a file, and the operation is blocking, because the retrieval process is much slower.
+      Either way, it is difficult to know if/when a process is not going to generate any more faults, and it is very much not deterministic.
+6. as also described in the previous point, the kernel uses various optimisations for dealing with processes on-demand mapping:
+   1. a physical page is allocated and mapped to a process only when the process accesses it, otherwise it might not be present:
+      1. file backed pages are allocated/mapped only when read/written to
+      2. anonymous pages are allocated only when written to
+   2. zero page: when a page is known to be empty, it is not reserved and mapped; instead,
+      the kernel has one specific (dummy) memory page that is mapped to the process as read-only
+   3. shared libraries:
+      when the same library pages are mapped by multiple processes, the library physical pages
+      are mapped as read-only into each process address space
+   4. copy-on-write:
+      1. 
when a library has its own data, this is initially mapped as read-only and shared;
+         only when it is written to is a separate physical page reserved for each writing process (threads share the same)
+      2. the same happens for data pages that were treated as zero-page, but then are written to.
+   5. folios (see also kernel section): data structures that try to better abstract compound pages and *might* also be used
+      to represent optimised contiguous pages on ARM64 (instead of mapping 16 entries, it is possible
+      to map a 16-pages chunk of physically contiguous memory, that is also aligned to a 16 pages
+      boundary, reducing TLB use).
+      1. A folio acts as an intermediary between the vma and lower-level memory management
+      2. it might pre-allocate/map more pages than explicitly requested
+7. read-ahead: when asked to fetch data from disk, the kernel might attempt to optimise the operation,
+   reading more pages than requested, under the assumption that more requests might be coming soon
+
+
+#### **Safety-Oriented consideration**
+The following considerations are of a more deductive nature.
+
+1. a process that is supposed to support safety requirements should not have pages swapped out / dropped / missing,
+   because this would introduce:
+   1. uncertainty in the timing required to recover the content, if not immediately available
+   2. additional risk, involving the userspace paging mechanisms in the fulfilment of the safety requirements
+   3. additional dependency on runtime linking, in case the process requires it, and code pages have been
+      discarded - reloading them from disk will not be sufficient
+2. The optimisations made by the kernel in providing physical backing to process memory make it very
+   questionable whether it can be assessed when (part of) a process memory content is actually present in the
+   system physical memory.
+3. by default, it is to be expected that a process will be exposed to various types of interference from the kernel:
+   1. some of a more benign nature, like the dropping of pages, or the non-allocation of not-yet-used ones
+   2. some limited in extent, but hard or even practically impossible to detect, like a rogue write to process physical memory
+   3. some of systemic nature, like some form of use-after-free, where a process page is accidentally in use also by another component
+   4. some of indirect nature, like for example when the page table of the process address space is somehow corrupted
+4. again, because of the extremely complex nature of the system, positive testing is not sufficient, but it needs to
+   be paired also with negative testing, proving that it is possible to cope with interference and detect it, somehow.
+5. the same considerations made about integrity vs. availability for the kernel are valid here too: detecting
+   interference doesn't help with keeping it under a certain threshold, and due to the complexity of the system,
+   it is not possible to estimate the risk reliably.
+
+
+
+## **License: CC BY-SA 4.0**
+
+### **DEED**
+### **Attribution-ShareAlike 4.0 International**
+
+Full License text:
+
+**You are free to:**
+
+* **Share** — copy and redistribute the material in any medium or format for any purpose, even commercially.
+
+* **Adapt** — remix, transform, and build upon the material for any purpose, even commercially.
+
+The licensor cannot revoke these freedoms as long as you follow the license terms.
+
+**Under the following terms:**
+
+* **Attribution** — You must give appropriate credit, provide a link to the license, and indicate if changes were made. 
You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. + +* **ShareAlike** — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original. + +* **No additional restrictions** — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits. + +**Notices:** + +You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation . +No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material. diff --git a/Contributions/README.md b/Contributions/README.md index fdf6f96..6009d2f 100644 --- a/Contributions/README.md +++ b/Contributions/README.md @@ -7,6 +7,9 @@ ELISA umbrella. ## Index These markdown documents are actively extended and reviewed, as part of safety-related work done within ELISA. +* [Linux Memory Management Essentials](Linux_Memory_Management_Essentials.md) + (3rd party) Summary of Memory Management features of the Linux Kernel that are also relevant for safety. + * [Using Linux in a Safe System](Using_Linux_in_a_Safe_System.md) (3rd party) Non-exhaustive list of engineering considerations and practices, that can help with designing a Safe System containing Linux. It can be seen as a companion to the Checklist below. From bd235a1696790f423aabba21f3627b249fe783d4 Mon Sep 17 00:00:00 2001 From: Igor Stoppa Date: Thu, 18 Jul 2024 01:36:41 +0300 Subject: [PATCH 02/27] Update Linux_Memory_Management_Essentials.md Signed-off-by: Igor Stoppa --- Contributions/Linux_Memory_Management_Essentials.md | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/Contributions/Linux_Memory_Management_Essentials.md b/Contributions/Linux_Memory_Management_Essentials.md index a97745c..a52c26c 100644 --- a/Contributions/Linux_Memory_Management_Essentials.md +++ b/Contributions/Linux_Memory_Management_Essentials.md @@ -153,8 +153,8 @@ The following section presents a set of statements that can be objectively verif 2. the main executable associated with the process 3. the dynamic linker (optional but typically used for ELF files) 4. linked libraries (e.g. glibc) -4. when the kernel starts a process, it sets up mappings for all the virtual memory areas required to get it started, - but it doesnt actually allocate any memory: once the process is scheduled for execution, it will eventually be run +4. when the kernel starts a process from scratch (like with init), it sets up mappings for all the virtual memory areas required to get it started, + but it doesnt actually allocate almost any memory: once the process is scheduled for execution, it will eventually be run for the first time, and as soon as the various areas are accessed, they will trigger page fault exceptions. 1. when an address is accessed for the first time, it might require that a memory page is allocated to host whatever the associated content might be, but it is possible, especially when the process has just been started, that @@ -164,8 +164,12 @@ The following section presents a set of statements that can be objectively verif 1. no free pages are available (unusual but possible) and the kernel will have to try to obtain one, in one of the ways described above 2. 
the content needs to be loaded from a file, and the operation is blocking, because the retrieval process is much slower.
+      3. the only exception, where pages are specifically allocated before the process is started, is the beginning of the stack; here process arguments and context are stored.
    Either way, it is difficult to know if/when a process is not going to generate any more faults, and it is very much not deterministic.
-6. as also decribed in the previous point, the kernel uses various optimisations for dealing with processes on-demand mapping:
+5. creation of threads for an existing process reuses the process memory map almost entirely; however, an additional stack is
+   allocated for each new thread, picking an address range between the live ends of heap and primary stack.
+   The sizes of these additional stacks are defined at creation time, but allocation of physical pages happens on-demand.
+6. as also described in the previous points, the kernel uses various optimisations for dealing with processes on-demand mapping:
    1. a physical page is allocated and mapped to a process only when the process accesses it, otherwise it might not be present:
       1. file backed pages are allocated/mapped only when read/written to
       2. anonymous pages are allocated only when written to
@@ -210,6 +214,9 @@ The following considerations are of a more deductive nature.
 5. the same considerations made about integrity vs. availability for the kernel are valid here too: detecting
    interference doesn't help with keeping it under a certain threshold, and due to the complexity of the system,
    it is not possible to estimate the risk reliably.
+6. a single-thread process can interfere with itself, since typically most of its data is writable
+7. when dealing with a multi-threaded process, besides simple self interference, one must also consider cross-thread
+   interference, where each thread can corrupt not only its own stack, but also the stack of every other thread in the same process.

From 28c01810524c965573048c7b0fac27acace47f38 Mon Sep 17 00:00:00 2001
From: Igor Stoppa
Date: Thu, 19 Sep 2024 15:16:52 +0300
Subject: [PATCH 03/27] Update Contributions/Linux_Memory_Management_Essentials.md

Co-authored-by: Paul Albertella
Signed-off-by: Igor Stoppa
---
 Contributions/Linux_Memory_Management_Essentials.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Contributions/Linux_Memory_Management_Essentials.md b/Contributions/Linux_Memory_Management_Essentials.md
index a52c26c..1b04850 100644
--- a/Contributions/Linux_Memory_Management_Essentials.md
+++ b/Contributions/Linux_Memory_Management_Essentials.md
@@ -54,7 +54,7 @@ Individual points are numbered for ease of reference, but the numbering is not m
 
 #### **Directly Verifiable Assertions**
 The following section presents a set of statements that can be objectively verified e.g. by inspecting the sources.
 
-1. unlike processes memory, kernel memory pages are not swapped, nor dropped silently by the kernel itself,
+1. Unlike process memory, kernel memory pages are not dynamically swapped during normal (i.e. not low-power) operation, nor dropped silently by the kernel itself,
    although a hypervisor will do to a VM what the kernel does to a process (but this is beyond the control of the kernel)
 2. 
the kernel context (usually EL1 on ARM64) uses one single memory map (page tables) across all the cores executing in kernel mode From 597c03b0f460277b7c033e8fcc8bb1eb70fd97a4 Mon Sep 17 00:00:00 2001 From: Igor Stoppa Date: Thu, 19 Sep 2024 15:19:25 +0300 Subject: [PATCH 04/27] Update Contributions/Linux_Memory_Management_Essentials.md Co-authored-by: Paul Albertella Signed-off-by: Igor Stoppa --- Contributions/Linux_Memory_Management_Essentials.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Contributions/Linux_Memory_Management_Essentials.md b/Contributions/Linux_Memory_Management_Essentials.md index 1b04850..cdcf544 100644 --- a/Contributions/Linux_Memory_Management_Essentials.md +++ b/Contributions/Linux_Memory_Management_Essentials.md @@ -67,7 +67,7 @@ The following section presents a set of statements that can be objectively verif 1. this type of access is limited to few functions, like copy_to_user()/copy_from_user() and put_user()/get_user() 2. outside the execution of the functions mentioned, this type of access is not possible, because the userspace mappings are made available only while executing such functions 3. the userspace memory map can implement HW protections against being misused by the kernel - 4. the kernel is able to access process pages in the same sequence and with the same mappings as the process does + 4. the kernel is able to access the pages of a user space process using the same translation mechanisms as the process (i.e. the process memory map), unless explicitly prevented by e.g. PAN or PXN 2. through a memory buffer (e.g. a memory area where the kernel regularly needs to perform large amount of read/write operations, like a network buffer) 1. the memory used by a buffer is well defined and delimited, rather than a generic area, and it is specifically reserved for this purpose 2. the kernel uses own mappings in EL1, while the user space uses the EL0 ones From 7dc00ef4e1af272ccc025b72edd6981105ab54c6 Mon Sep 17 00:00:00 2001 From: Igor Stoppa Date: Thu, 19 Sep 2024 15:29:43 +0300 Subject: [PATCH 05/27] Update Contributions/Linux_Memory_Management_Essentials.md Co-authored-by: Paul Albertella Signed-off-by: Igor Stoppa --- Contributions/Linux_Memory_Management_Essentials.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Contributions/Linux_Memory_Management_Essentials.md b/Contributions/Linux_Memory_Management_Essentials.md index cdcf544..e3e886d 100644 --- a/Contributions/Linux_Memory_Management_Essentials.md +++ b/Contributions/Linux_Memory_Management_Essentials.md @@ -71,7 +71,7 @@ The following section presents a set of statements that can be objectively verif 2. through a memory buffer (e.g. a memory area where the kernel regularly needs to perform large amount of read/write operations, like a network buffer) 1. the memory used by a buffer is well defined and delimited, rather than a generic area, and it is specifically reserved for this purpose 2. the kernel uses own mappings in EL1, while the user space uses the EL0 ones - 3. very often the kernel doesn't actually access this area directly, but it rather configures a DMA controller to do the transfers, directly to the physical memory. + 3. in many cases, the kernel does not access this area directly, but rather configures a DMA controller to complete the transfer directly to the physical memory (i.e. bypassing the MMU) 4. 
misconfiguration of these peripherals that can access physical memory directly is a potential problem - a form of mitigation relies on using IOMMUs: hardware components that the kernel can configure as a firewall on the phisical memory bus, to limit what memory is accessible to each memory bus master device. 3. through the EL1 mappings + linear map 1. not the intended way for the kernel to access process context, because the intended way is one of the two previously described From 0dde64187eb594c1a1aea137ca8cd18f0bce20d1 Mon Sep 17 00:00:00 2001 From: Igor Stoppa Date: Thu, 19 Sep 2024 15:41:38 +0300 Subject: [PATCH 06/27] Update Linux_Memory_Management_Essentials.md added PXA, PXN examples Signed-off-by: Igor Stoppa --- Contributions/Linux_Memory_Management_Essentials.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Contributions/Linux_Memory_Management_Essentials.md b/Contributions/Linux_Memory_Management_Essentials.md index e3e886d..799d49b 100644 --- a/Contributions/Linux_Memory_Management_Essentials.md +++ b/Contributions/Linux_Memory_Management_Essentials.md @@ -66,7 +66,7 @@ The following section presents a set of statements that can be objectively verif 1. through the userspace mapping 1. this type of access is limited to few functions, like copy_to_user()/copy_from_user() and put_user()/get_user() 2. outside the execution of the functions mentioned, this type of access is not possible, because the userspace mappings are made available only while executing such functions - 3. the userspace memory map can implement HW protections against being misused by the kernel + 3. the userspace memory map can implement HW protections against being misused by the kernel (e.g. ARM PXN, PXA) 4. the kernel is able to access the pages of a user space process using the same translation mechanisms as the process (i.e. the process memory map), unless explicitly prevented by e.g. PAN or PXN 2. through a memory buffer (e.g. a memory area where the kernel regularly needs to perform large amount of read/write operations, like a network buffer) 1. the memory used by a buffer is well defined and delimited, rather than a generic area, and it is specifically reserved for this purpose From 2d70581b0500e3b1439c1f2314d02d830b779a99 Mon Sep 17 00:00:00 2001 From: Igor Stoppa Date: Thu, 19 Sep 2024 15:45:44 +0300 Subject: [PATCH 07/27] Update Linux_Memory_Management_Essentials.md monolithic -> boot image Signed-off-by: Igor Stoppa --- Contributions/Linux_Memory_Management_Essentials.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Contributions/Linux_Memory_Management_Essentials.md b/Contributions/Linux_Memory_Management_Essentials.md index 799d49b..31ad6bf 100644 --- a/Contributions/Linux_Memory_Management_Essentials.md +++ b/Contributions/Linux_Memory_Management_Essentials.md @@ -83,7 +83,7 @@ The following section presents a set of statements that can be objectively verif 6. Assertions about page "mobility", from different perspectives: 1. employment of physical pages - observing a physical memory page (e.g. a 4kB chunk aligned on a 4kB boundary) and how it is employed: what sort of content it might host, over time. 1. certain physical pages are put to use for a specific purpose, which is (almost) immutable, for the entire duration of the execution of the kernel: - 1. kernel code from the monolithic portion (except for pages containing kernel init code, which are released after kernel init is completed) + 1. 
kernel code from the boot image (except for pages containing kernel init code, which are released after kernel init is completed) 2. kernel statically allocated data from the monolithic portion (except for pages containing kernel init data, which are released after kernel init is completed) 3. some kernel dynamically allocated data, used by the kernel itself, and never released, due to the nature of its use (e.g. object caches, persistent buffers, etc.) 4. memory used for loadable kernel modules (code, data) is tied to the permanence of the module in memory - this is typically stable through the entire duration of the execution. Some exceptions are modules loaded/unloaded as a consequence of certain peripherals (dis)appearing, e.g. USB ones. From 2d747c0920aa4a02631e6b260c26894cd71cd2ee Mon Sep 17 00:00:00 2001 From: Igor Stoppa Date: Thu, 19 Sep 2024 17:20:03 +0300 Subject: [PATCH 08/27] Update Linux_Memory_Management_Essentials.md Signed-off-by: Igor Stoppa --- Contributions/Linux_Memory_Management_Essentials.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/Contributions/Linux_Memory_Management_Essentials.md b/Contributions/Linux_Memory_Management_Essentials.md index 31ad6bf..0299b8e 100644 --- a/Contributions/Linux_Memory_Management_Essentials.md +++ b/Contributions/Linux_Memory_Management_Essentials.md @@ -84,17 +84,19 @@ The following section presents a set of statements that can be objectively verif 1. employment of physical pages - observing a physical memory page (e.g. a 4kB chunk aligned on a 4kB boundary) and how it is employed: what sort of content it might host, over time. 1. certain physical pages are put to use for a specific purpose, which is (almost) immutable, for the entire duration of the execution of the kernel: 1. kernel code from the boot image (except for pages containing kernel init code, which are released after kernel init is completed) - 2. kernel statically allocated data from the monolithic portion (except for pages containing kernel init data, which are released after kernel init is completed) + 2. kernel statically allocated data from the boot image (except for pages containing kernel init data, which are released after kernel init is completed) 3. some kernel dynamically allocated data, used by the kernel itself, and never released, due to the nature of its use (e.g. object caches, persistent buffers, etc.) - 4. memory used for loadable kernel modules (code, data) is tied to the permanence of the module in memory - this is typically stable through the entire duration of the execution. Some exceptions are modules loaded/unloaded as a consequence of certain peripherals (dis)appearing, e.g. USB ones. + 4. memory used for loadable kernel modules (code, data) is tied to the permanence of the module in memory - this is very often stable through the entire duration of the execution. Some exceptions are modules loaded/unloaded as a consequence of certain peripherals (dis)appearing, e.g. USB ones. 2. other physical pages (most, actually, in a typical system) are available for runtime allocation/release cycles of multiple users: 1. transient kernel linear memory allocations (kmalloc / get_free_pages) 2. transient kernel virtually linear memory allocations (vmalloc for kernel address space) 3. user-space memory allocations (they are always virtually linear), which are by default transient and exposed to repurposing proactively done by the kernel (see below) - 2. 
transitioning of a logical page - given certain context and content, where it might be located in memory over time - and if it might be even discarded. + 2. location of the content a "logical" page: as seen by a process perspective, the content is always atthe same memory location, but in reality it can be moved around, over different physical pages. So, given a certain context and content, where it might be located in memory over time - and if it might be even discarded. 1. the kernel doesn't spontaneously repurpose the utilisation of its own physical pages; therefore it is possible to assume that the logical content of kernel allocations will reamin tied to the associated physical pages, as long as it is not intentionally altered (or subject to interference) - 1. metadata memory used by the kernel for housekeeping purposes related to processes is included in this category; examples: task/cred structures, vmas, maple leaves, process page tables. - 2. memory used by the kernel for the actual userspace processes: the content of this logical page determines its life cycle and expectancy: certain content such as code or constants can be "re-constructed" by relaoding it from file (code pages are likely to necessitate undergoing re-linking), so the actual logical content might disappear, over time. Other pages, on the other hand, are meant to hold non-re-constructible content, such as stack, heap, and variable data. These pages can, at most, be swapped out, and loaded back later on, but they cannot be simply dropped. + 1. metadata memory used by the kernel for housekeeping purposes related to processes is included in this category; examples: task/cred structures, vmas structures (not the vma pages), maple leaves, process page tables. + 2. memory used by the kernel for the actual userspace processes: the content of this logical page determines its life cycle and expectancy: + 1. Certain content such as code or constants can be "re-constructed", by relaoding it from file (code pages are likely to necessitate undergoing re-linking), so the actual logical content might disappear, over time. + 2. Other pages, on the other hand, are meant to hold non-re-constructible content, such as stack, heap, and variable data. These pages can, at most, be swapped out, and loaded back later on, but they cannot be simply dropped. 3. page cache: it is a collection of memory pages containing data read from files, over time; e.g. code or initialised data from a file that was accessed recently; in some cases the page might never have been used, but it was loaded as part of the readahaead optimisation. The life expectancy of these logical pages is heavily affected by how many processes might keep accesing them and the level of memory starvation of the system caused by other processes, with some additional complexity layered on top of this, by the use of containers. 3. The kernel utilises various optimisations that are meant to take advantage of hardware features, such as multi-stage caching, and also to cope with different memory architectures (like Non Unifor Memory Architecture - NUMA). The main goals are: 1. 
avoid having to propagate too frequently write operations through the layers of HW cache, which is caused by pages being evicted from the cache, due to memory pressure From ce04f359f5482c2a6cfa576eafccbf848f8a311f Mon Sep 17 00:00:00 2001 From: Igor Stoppa Date: Thu, 19 Sep 2024 17:26:02 +0300 Subject: [PATCH 09/27] Update Contributions/Linux_Memory_Management_Essentials.md Co-authored-by: Paul Albertella Signed-off-by: Igor Stoppa --- Contributions/Linux_Memory_Management_Essentials.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Contributions/Linux_Memory_Management_Essentials.md b/Contributions/Linux_Memory_Management_Essentials.md index 0299b8e..8ffaa04 100644 --- a/Contributions/Linux_Memory_Management_Essentials.md +++ b/Contributions/Linux_Memory_Management_Essentials.md @@ -121,7 +121,7 @@ The following section presents a set of statements that can be objectively verif The following considerations are of a more deductive nature. 1. Because of the way pages, fractions and multiples of them are allocated, freed, cached, recovered, there is a complex interaction between system components at various layers. -2. Even using cgroups, it is not possible to segregate interaction at the low level between components with differnet level of safety qualification (e.g. a QM container can and most likely will affect the recirculation of pages related to an ASIL one) +2. Even using cgroups, it is not possible to eliminate indirect interaction at the low level between components with different levels of safety integrity (e.g. the recirculation of pages related to critical processes in one group might be affected by less critical processes in another group) 3. Because of the nature of memory management, it must be expected that memory management mechanisms will interfere with safe processes, either due to a bug or due to the interference toward the metadata they rely on. For example, the memory management might hand over to a requesting entity a memory page that is currently already in use either by a device driver or by a userspace process playing a role in a safety use case. 4. Still due to te complex interaction between processes, kernel drivers and other kernel code, it is practically impossible to qualify the kernel as safe through positive testing alone, because it is impossible to validate all the possible combinations, and it is equally impossible to assess the overall test coverage and the risk associated with not reaching 100%. The only reliable way to test is to use negative testing (simulating a certain type of interference) and confirming that the system response is consistent with expectations (e.g detect the interference, in case of ASILB requirements). And even then, the only credible claim that can be made is that, given the simulated type of interference, on the typology of target employed, the reaction is aligned with the requirements. Other types of targets will require further ad-hoc negative testing. 5. Linux Kernel mechanisms like SELinux and cgroups/containers do not offer any protection against interference originating from the kernel itself. 
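
The cgroup "memory bubble" mentioned in the kernel-space assertions, and whose limits the consideration above says cannot eliminate low-level page recirculation shared with other groups, can be illustrated with a minimal user-space sketch. This is not part of the patch series: it assumes cgroup v2 is mounted at /sys/fs/cgroup with the memory controller enabled for the parent group, and the group name "safety_demo" is purely hypothetical.

```c
/*
 * Minimal sketch of a cgroup v2 "memory bubble" (illustrative only).
 * Assumptions: cgroup v2 mounted at /sys/fs/cgroup, memory controller
 * enabled in the parent group, group name "safety_demo" is made up,
 * and the program runs with sufficient privileges.
 */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

static int write_str(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fputs(val, f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f);
}

int main(void)
{
	const char *grp = "/sys/fs/cgroup/safety_demo";
	char path[256], pid[32];

	/* Create the group directory; it may already exist. */
	mkdir(grp, 0755);

	/* memory.max is the cgroup v2 hard limit; cap the group at 128 MiB. */
	snprintf(path, sizeof(path), "%s/memory.max", grp);
	if (write_str(path, "134217728") < 0)
		perror("memory.max");

	/* Move the calling process into the group. */
	snprintf(path, sizeof(path), "%s/cgroup.procs", grp);
	snprintf(pid, sizeof(pid), "%d", (int)getpid());
	if (write_str(path, pid) < 0)
		perror("cgroup.procs");

	/* From here on, this process's allocations are charged to the group. */
	return 0;
}
```

Even with such a limit in place, the point made above still holds: the limit only caps how much the group can consume, while the pages recirculated below that boundary keep flowing through the same shared allocator pools used by every other group.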
From fd19bb567793846af11384fa745c38641d43c370 Mon Sep 17 00:00:00 2001 From: Igor Stoppa Date: Thu, 19 Sep 2024 17:26:48 +0300 Subject: [PATCH 10/27] Update Contributions/Linux_Memory_Management_Essentials.md Co-authored-by: Paul Albertella Signed-off-by: Igor Stoppa --- Contributions/Linux_Memory_Management_Essentials.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Contributions/Linux_Memory_Management_Essentials.md b/Contributions/Linux_Memory_Management_Essentials.md index 8ffaa04..4746c25 100644 --- a/Contributions/Linux_Memory_Management_Essentials.md +++ b/Contributions/Linux_Memory_Management_Essentials.md @@ -122,7 +122,7 @@ The following considerations are of a more deductive nature. 1. Because of the way pages, fractions and multiples of them are allocated, freed, cached, recovered, there is a complex interaction between system components at various layers. 2. Even using cgroups, it is not possible to eliminate indirect interaction at the low level between components with different levels of safety integrity (e.g. the recirculation of pages related to critical processes in one group might be affected by less critical processes in another group) -3. Because of the nature of memory management, it must be expected that memory management mechanisms will interfere with safe processes, either due to a bug or due to the interference toward the metadata they rely on. For example, the memory management might hand over to a requesting entity a memory page that is currently already in use either by a device driver or by a userspace process playing a role in a safety use case. +3. Because of the nature of memory management, we cannot rule out the possibility that memory management mechanisms will interfere with safe processes, either due to a bug or due to the interference toward the metadata they rely on. For example, the memory management might hand over to a requesting entity a memory page that is currently already in use either by a device driver or by a userspace process playing a role in a safety use case. 4. Still due to te complex interaction between processes, kernel drivers and other kernel code, it is practically impossible to qualify the kernel as safe through positive testing alone, because it is impossible to validate all the possible combinations, and it is equally impossible to assess the overall test coverage and the risk associated with not reaching 100%. The only reliable way to test is to use negative testing (simulating a certain type of interference) and confirming that the system response is consistent with expectations (e.g detect the interference, in case of ASILB requirements). And even then, the only credible claim that can be made is that, given the simulated type of interference, on the typology of target employed, the reaction is aligned with the requirements. Other types of targets will require further ad-hoc negative testing. 5. Linux Kernel mechanisms like SELinux and cgroups/containers do not offer any protection against interference originating from the kernel itself. 6. The Linux Kernel must be assumed to be QM, unless specific subsystems are qualified through both positive and negative testing. 
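
The negative-testing approach referred to in the considerations above (simulating a given type of interference and confirming that it is detected) can be sketched in a few lines of user-space C. This is only an illustration under stated assumptions: the data layout, the CRC32 seal and the single bit-flip standing in for a rogue write are hypothetical choices, not something the document prescribes.

```c
/*
 * Minimal sketch of the negative-testing idea: inject a simulated
 * interference (one flipped bit in a "critical" structure) and confirm
 * that it is detected. Structure, CRC32 seal and injected fault are
 * illustrative assumptions only.
 */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct critical_data {
	uint32_t limit;
	uint32_t mode;
	uint32_t crc;	/* seal computed over the fields above */
};

/* Bitwise CRC32 (reflected polynomial); slow but fine for a demo. */
static uint32_t crc32(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t crc = 0xffffffffu;

	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ (0xedb88320u & -(crc & 1u));
	}
	return ~crc;
}

static void seal(struct critical_data *d)
{
	d->crc = crc32(d, offsetof(struct critical_data, crc));
}

static int is_intact(const struct critical_data *d)
{
	return d->crc == crc32(d, offsetof(struct critical_data, crc));
}

int main(void)
{
	struct critical_data d = { .limit = 100, .mode = 3 };

	seal(&d);

	/* Simulated interference: a rogue single-bit write into the data. */
	((uint8_t *)&d)[1] ^= 0x10;

	printf("interference %s\n", is_intact(&d) ? "NOT detected" : "detected");
	return 0;
}
```

As the considerations point out, a passing run of such a test only supports a claim about the simulated interference on the target where it was run; it says nothing about availability, nor about other classes of interference.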
From 85581a7db5e5a5acff5cf5b92696aed4e38a7acf Mon Sep 17 00:00:00 2001 From: Igor Stoppa Date: Thu, 19 Sep 2024 17:38:10 +0300 Subject: [PATCH 11/27] Update Linux_Memory_Management_Essentials.md Signed-off-by: Igor Stoppa --- Contributions/Linux_Memory_Management_Essentials.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Contributions/Linux_Memory_Management_Essentials.md b/Contributions/Linux_Memory_Management_Essentials.md index 4746c25..8ff0064 100644 --- a/Contributions/Linux_Memory_Management_Essentials.md +++ b/Contributions/Linux_Memory_Management_Essentials.md @@ -125,7 +125,7 @@ The following considerations are of a more deductive nature. 3. Because of the nature of memory management, we cannot rule out the possibility that memory management mechanisms will interfere with safe processes, either due to a bug or due to the interference toward the metadata they rely on. For example, the memory management might hand over to a requesting entity a memory page that is currently already in use either by a device driver or by a userspace process playing a role in a safety use case. 4. Still due to te complex interaction between processes, kernel drivers and other kernel code, it is practically impossible to qualify the kernel as safe through positive testing alone, because it is impossible to validate all the possible combinations, and it is equally impossible to assess the overall test coverage and the risk associated with not reaching 100%. The only reliable way to test is to use negative testing (simulating a certain type of interference) and confirming that the system response is consistent with expectations (e.g detect the interference, in case of ASILB requirements). And even then, the only credible claim that can be made is that, given the simulated type of interference, on the typology of target employed, the reaction is aligned with the requirements. Other types of targets will require further ad-hoc negative testing. 5. Linux Kernel mechanisms like SELinux and cgroups/containers do not offer any protection against interference originating from the kernel itself. -6. The Linux Kernel must be assumed to be QM, unless specific subsystems are qualified through both positive and negative testing. +6. The Linux Kernel must be assumed to not be safe; possibly QM at best, unless specific subsystems are qualified through both positive and negative testing. 7. Even after achieving the ability to make certain claims about the Kernel integrity (or detection of its loss), this doesn't guarrantees system availability: knowing that an interference has occurred doesn't help with ensuring a certain level of system availability. ### **User-space memory allocations** From e868efe6a1496d116ae9d436c70d07148df7bf4d Mon Sep 17 00:00:00 2001 From: Igor Stoppa Date: Thu, 19 Sep 2024 17:38:46 +0300 Subject: [PATCH 12/27] Update Contributions/Linux_Memory_Management_Essentials.md Co-authored-by: Daniel Weingaertner Signed-off-by: Igor Stoppa --- Contributions/Linux_Memory_Management_Essentials.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Contributions/Linux_Memory_Management_Essentials.md b/Contributions/Linux_Memory_Management_Essentials.md index 8ff0064..b8946c8 100644 --- a/Contributions/Linux_Memory_Management_Essentials.md +++ b/Contributions/Linux_Memory_Management_Essentials.md @@ -98,7 +98,7 @@ The following section presents a set of statements that can be objectively verif 1. 
Certain content such as code or constants can be "re-constructed", by relaoding it from file (code pages are likely to necessitate undergoing re-linking), so the actual logical content might disappear, over time. 2. Other pages, on the other hand, are meant to hold non-re-constructible content, such as stack, heap, and variable data. These pages can, at most, be swapped out, and loaded back later on, but they cannot be simply dropped. 3. page cache: it is a collection of memory pages containing data read from files, over time; e.g. code or initialised data from a file that was accessed recently; in some cases the page might never have been used, but it was loaded as part of the readahaead optimisation. The life expectancy of these logical pages is heavily affected by how many processes might keep accesing them and the level of memory starvation of the system caused by other processes, with some additional complexity layered on top of this, by the use of containers. - 3. The kernel utilises various optimisations that are meant to take advantage of hardware features, such as multi-stage caching, and also to cope with different memory architectures (like Non Unifor Memory Architecture - NUMA). The main goals are: + 3. The kernel utilises various optimisations that are meant to take advantage of hardware features, such as multi-stage caching, and also to cope with different memory architectures (like Non Uniform Memory Architecture - NUMA). The main goals are: 1. avoid having to propagate too frequently write operations through the layers of HW cache, which is caused by pages being evicted from the cache, due to memory pressure 2. avoid having multiple cpus writing to the same page, in a NUMA system, where only one cpu has direct write access to that memory page, because it would cause cache invalidation Therefore, the kernel tends to: From 2ccc6211e93518cbfb3826aea9ebd1640c672209 Mon Sep 17 00:00:00 2001 From: Igor Stoppa Date: Thu, 19 Sep 2024 17:41:23 +0300 Subject: [PATCH 13/27] Update Linux_Memory_Management_Essentials.md Signed-off-by: Igor Stoppa --- Contributions/Linux_Memory_Management_Essentials.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Contributions/Linux_Memory_Management_Essentials.md b/Contributions/Linux_Memory_Management_Essentials.md index b8946c8..34277b7 100644 --- a/Contributions/Linux_Memory_Management_Essentials.md +++ b/Contributions/Linux_Memory_Management_Essentials.md @@ -108,7 +108,7 @@ The following section presents a set of statements that can be objectively verif 5. For what concerns allocation from the linear map (kmalloc / get_free_pages), the kernel attempts to keep the free pages as much continuous as possible, avoiding fragmentation of the free areas. 1. this is implemented through the concept of the "buddy allocator", meaning that whenever a certain amount of linear memory is requested (either sub-page or multi-page size), it always tries to obtain it from the smallest free slot available, only breaking larger free slots when no alternatives are possible. 2. the kernel also keeps ready a certain amount of pre-diced memory allocations, to avoid incurring the penalty of having to look for some free memory as a consequence of an allocation request. - 3. folios are structures introduced to simplify the management of what has been traditionally called compound pages: a compound page represents a group of contiguous pages that is treated as a single logical unit. 
Folios could eventually support the use of optimisations provided by certain pages (e.g. ARM64 allows the use of a single page table entry to represent 16 pages, as long as they are physically contiguous and aligned to a 16 pages boundary, through the "contiguous bit" flag in the page table). This can be useful e.g. when keeping in the page cache a chunk of data from a file, should the memory be released, it could result in releasing several physically contiguous pages, instead of scattered ones. + 3. folios are structures introduced to simplify the management of what has been traditionally called compound pages and reduce memory fragmentation: a compound page represents a group of contiguous pages that is treated as a single logical unit. Folios could eventually support the use of optimisations provided by certain pages (e.g. ARM64 allows the use of a single page table entry to represent 16 pages, as long as they are physically contiguous and aligned to a 16 pages boundary, through the "contiguous bit" flag in the page table). This can be useful e.g. when keeping in the page cache a chunk of data from a file, should the memory be released, it could result in releasing several physically contiguous pages, instead of scattered ones. 6. whenever possible, allocations happen through caches, which means that said caches must be re-filled, whenever they hit a low watermark, and this re-filling can happen in two ways: 1. through recycling memory as it gets freed: for example in case a core is running short of pages in its own local queue, it might "capture" a page that it is freeing. 2. through a dedicated thread that can asynchronously dice larger order pages into smaller portions that are then placed into caches in need to be refilled From c3496f6f566ead0660b415e0ee2b5168d63982c0 Mon Sep 17 00:00:00 2001 From: Igor Stoppa Date: Thu, 19 Sep 2024 17:44:03 +0300 Subject: [PATCH 14/27] Update Linux_Memory_Management_Essentials.md Signed-off-by: Igor Stoppa --- Contributions/Linux_Memory_Management_Essentials.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Contributions/Linux_Memory_Management_Essentials.md b/Contributions/Linux_Memory_Management_Essentials.md index 34277b7..5ebd7df 100644 --- a/Contributions/Linux_Memory_Management_Essentials.md +++ b/Contributions/Linux_Memory_Management_Essentials.md @@ -109,7 +109,7 @@ The following section presents a set of statements that can be objectively verif 1. this is implemented through the concept of the "buddy allocator", meaning that whenever a certain amount of linear memory is requested (either sub-page or multi-page size), it always tries to obtain it from the smallest free slot available, only breaking larger free slots when no alternatives are possible. 2. the kernel also keeps ready a certain amount of pre-diced memory allocations, to avoid incurring the penalty of having to look for some free memory as a consequence of an allocation request. 3. folios are structures introduced to simplify the management of what has been traditionally called compound pages and reduce memory fragmentation: a compound page represents a group of contiguous pages that is treated as a single logical unit. Folios could eventually support the use of optimisations provided by certain pages (e.g. ARM64 allows the use of a single page table entry to represent 16 pages, as long as they are physically contiguous and aligned to a 16 pages boundary, through the "contiguous bit" flag in the page table). This can be useful e.g. 
when keeping in the page cache a chunk of data from a file, should the memory be released, it could result in releasing several physically contiguous pages, instead of scattered ones. - 6. whenever possible, allocations happen through caches, which means that said caches must be re-filled, whenever they hit a low watermark, and this re-filling can happen in two ways: + 6. whenever possible, allocations happen through caches (e.g. kmalloc caches, perCPU caches, ad-hoc object caches, etc.), which means that said caches must be re-filled, whenever they hit a low watermark, and this re-filling can happen in two ways: 1. through recycling memory as it gets freed: for example in case a core is running short of pages in its own local queue, it might "capture" a page that it is freeing. 2. through a dedicated thread that can asynchronously dice larger order pages into smaller portions that are then placed into caches in need to be refilled 7. the kernel can also employ an Out Of Memory Killer feature, that is invoked in extreme cases, when all the existing stashes of memory have been depleted: in this case the killer will pick a user space process and just evict it, releasing all the resources it had allocated. It's far from desirable, but it's a method sometimes employed. From bf30c49c23bb1d65551eceec71b28b8cf38eabee Mon Sep 17 00:00:00 2001 From: Igor Stoppa Date: Fri, 20 Sep 2024 15:34:56 +0300 Subject: [PATCH 15/27] Update Contributions/Linux_Memory_Management_Essentials.md Co-authored-by: Paul Albertella Signed-off-by: Igor Stoppa --- Contributions/Linux_Memory_Management_Essentials.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Contributions/Linux_Memory_Management_Essentials.md b/Contributions/Linux_Memory_Management_Essentials.md index 5ebd7df..2544f5d 100644 --- a/Contributions/Linux_Memory_Management_Essentials.md +++ b/Contributions/Linux_Memory_Management_Essentials.md @@ -126,7 +126,7 @@ The following considerations are of a more deductive nature. 4. Still due to te complex interaction between processes, kernel drivers and other kernel code, it is practically impossible to qualify the kernel as safe through positive testing alone, because it is impossible to validate all the possible combinations, and it is equally impossible to assess the overall test coverage and the risk associated with not reaching 100%. The only reliable way to test is to use negative testing (simulating a certain type of interference) and confirming that the system response is consistent with expectations (e.g detect the interference, in case of ASILB requirements). And even then, the only credible claim that can be made is that, given the simulated type of interference, on the typology of target employed, the reaction is aligned with the requirements. Other types of targets will require further ad-hoc negative testing. 5. Linux Kernel mechanisms like SELinux and cgroups/containers do not offer any protection against interference originating from the kernel itself. 6. The Linux Kernel must be assumed to not be safe; possibly QM at best, unless specific subsystems are qualified through both positive and negative testing. -7. Even after achieving the ability to make certain claims about the Kernel integrity (or detection of its loss), this doesn't guarrantees system availability: knowing that an interference has occurred doesn't help with ensuring a certain level of system availability. +7. 
Claims about kernel integrity (or detection of its loss), do not guarantee system availability; safety arguments for a Linux-based system that rely upon a level of availability must separately show that this is supported. ### **User-space memory allocations** From 52419a6dae2cbb4689735943f97bd979d3e75215 Mon Sep 17 00:00:00 2001 From: Igor Stoppa Date: Fri, 20 Sep 2024 15:35:51 +0300 Subject: [PATCH 16/27] Update Contributions/Linux_Memory_Management_Essentials.md Co-authored-by: Paul Albertella Signed-off-by: Igor Stoppa --- Contributions/Linux_Memory_Management_Essentials.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Contributions/Linux_Memory_Management_Essentials.md b/Contributions/Linux_Memory_Management_Essentials.md index 2544f5d..97c04a7 100644 --- a/Contributions/Linux_Memory_Management_Essentials.md +++ b/Contributions/Linux_Memory_Management_Essentials.md @@ -206,7 +206,7 @@ The following considerations are of a more deductive nature. 2. The optimisations made by the kernel in providing physical backing to process memory make it very questionable if it can be assessed when a (part of) a process memory content is actually present in the system physical memory. -3. by default, it is to be expected that a process will be exposed to various types of interference from the kernel: +3. we cannot rule out the possibility that a process will be exposed to various types of interference from the kernel: 1. some of a more bening nature, like dropping of pages, or not allocation of not-yet-used one 2. some limited in extent, but hard or even practicaly impossible to detect, like a rogue write to process physical memory 3. some of systemic nature, like some form of use-after free, where a process page is accidentally in use also by another component From 51d1a44bdd74a9d1f1c49b85cafa4626a414ef83 Mon Sep 17 00:00:00 2001 From: Igor Stoppa Date: Fri, 20 Sep 2024 15:36:48 +0300 Subject: [PATCH 17/27] Update Contributions/Linux_Memory_Management_Essentials.md Co-authored-by: Paul Albertella Signed-off-by: Igor Stoppa --- Contributions/Linux_Memory_Management_Essentials.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Contributions/Linux_Memory_Management_Essentials.md b/Contributions/Linux_Memory_Management_Essentials.md index 97c04a7..bdb9a11 100644 --- a/Contributions/Linux_Memory_Management_Essentials.md +++ b/Contributions/Linux_Memory_Management_Essentials.md @@ -216,7 +216,7 @@ The following considerations are of a more deductive nature. 5. the same considerations made about integrity vs. avaialbility for the kernel are valid here too: detecting interference doesn't help with keeping it under a certain threshold, and due to the complexity of the system, it is not possible to estimate the risk reliably. -6. a single-thread process can interfere with itself, since typically most of its data is writable +6. a single-thread process can interfere with itself, since typically most of its data is writeable; the kernel cannot be responsible for preventing this category of interference 7. when dealing with a multi-threaded process, besides simple self interference, one must also consider cross-thread interference, where each thread can corrupt not only its own stack, but also the stack of every other process. 
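The hunk above touches on self-interference in a single-threaded process and cross-thread interference in a multi-threaded one. The short C sketch below is an illustration added alongside the patches, not taken from them: it shows why the kernel cannot police this category of interference, since all threads of a process share one address space and one set of page tables, so any thread can write to a stack variable belonging to another thread. It assumes a POSIX/pthreads environment; the names `victim` and `victim_slot` are invented for the example.

```c
/*
 * Illustrative sketch only: threads share a single address space, so the
 * kernel cannot prevent one thread from corrupting another thread's stack.
 * Build with: gcc -pthread demo.c
 */
#include <pthread.h>
#include <stdio.h>

static int *victim_slot;            /* will point into the victim thread's stack */
static pthread_barrier_t barrier;

static void *victim(void *arg)
{
    (void)arg;
    int local = 42;                 /* lives on this thread's stack */
    victim_slot = &local;           /* expose its address to the rest of the process */
    pthread_barrier_wait(&barrier); /* let main() perform the rogue write */
    pthread_barrier_wait(&barrier); /* wait until the write has happened */
    printf("victim: local is now %d (was 42)\n", local);
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_barrier_init(&barrier, NULL, 2);
    pthread_create(&t, NULL, victim, NULL);
    pthread_barrier_wait(&barrier); /* victim has published the address */
    *victim_slot = -1;              /* nothing prevents this cross-thread write */
    pthread_barrier_wait(&barrier); /* allow the victim to observe the change */
    pthread_join(t, NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}
```

Compiled with `gcc -pthread`, the victim thread observes its stack variable changed from 42 to -1 even though it never wrote to it itself; only process boundaries, not thread boundaries, are enforced by the memory map.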
From 3317568ca79e32f77df5805322e739b8ef2c2c70 Mon Sep 17 00:00:00 2001 From: Igor Stoppa Date: Fri, 20 Sep 2024 15:37:01 +0300 Subject: [PATCH 18/27] Update Contributions/Linux_Memory_Management_Essentials.md Co-authored-by: Daniel Weingaertner Signed-off-by: Igor Stoppa --- Contributions/Linux_Memory_Management_Essentials.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Contributions/Linux_Memory_Management_Essentials.md b/Contributions/Linux_Memory_Management_Essentials.md index bdb9a11..15811b6 100644 --- a/Contributions/Linux_Memory_Management_Essentials.md +++ b/Contributions/Linux_Memory_Management_Essentials.md @@ -162,7 +162,7 @@ The following section presents a set of statements that can be objectively verif the associated content might be, but it is possible, especially when the process has just been started, that even the associated page in he page table is missing. 2. based on both availability of free pages and type of content asosciated, the access might cause the process - to sleep; exaples: + to sleep; examples: 1. no free pages are available (unusual but possible) and the kernel will have to try to obtain one, in one of the ways described above 2. the content needs to be loaded from a file, and the operation is blocking, because the retrieval process is much slower. From 20461ea36743ef772fc1f723804ba1b90f46a89a Mon Sep 17 00:00:00 2001 From: Igor Stoppa Date: Fri, 20 Sep 2024 15:37:15 +0300 Subject: [PATCH 19/27] Update Contributions/Linux_Memory_Management_Essentials.md Co-authored-by: Daniel Weingaertner Signed-off-by: Igor Stoppa --- Contributions/Linux_Memory_Management_Essentials.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Contributions/Linux_Memory_Management_Essentials.md b/Contributions/Linux_Memory_Management_Essentials.md index 15811b6..c8f0491 100644 --- a/Contributions/Linux_Memory_Management_Essentials.md +++ b/Contributions/Linux_Memory_Management_Essentials.md @@ -141,7 +141,7 @@ The following section presents a set of statements that can be objectively verif in a certain moment has also a physical backing, while it is being executed. It might be possible to introduce some additional certainties, for example by pinning down some memory allocations by marking the process as non-swappable. The presence of caching and other - optimisations like the zero-page make it very difficult to be assertive beyond this point. + optimisations, like the zero-page, make it very difficult to be assertive beyond this point. 3. the management of memory pages associated with a process is handled through the process memory map, which consists of several virtual memory areas, representing address ranges within the process address space, that are put to some use. 
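The hunk above mentions pinning down memory allocations by marking the process as non-swappable as one way to gain some additional certainty about physical backing. The sketch below is an illustration for this document (not part of the patches) of how that is commonly done from user space with `mlockall()`; it assumes a Linux/glibc target where the process has a sufficient `RLIMIT_MEMLOCK` limit or the `CAP_IPC_LOCK` capability.

```c
/*
 * Minimal sketch, assuming Linux/glibc and adequate RLIMIT_MEMLOCK or
 * CAP_IPC_LOCK: lock the process's current and future mappings so that its
 * pages are not swapped out.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    /* Lock every page currently mapped and every page mapped from now on. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");
        return EXIT_FAILURE;
    }

    size_t len = 1 << 20;
    char *buf = malloc(len);
    if (buf == NULL)
        return EXIT_FAILURE;

    /* With MCL_FUTURE this buffer should already be resident; writing it once
     * anyway makes the pre-faulting intent explicit. */
    memset(buf, 0, len);

    puts("memory locked and pre-faulted");
    free(buf);
    return EXIT_SUCCESS;
}
```

Locking only addresses swapping and paging of the process's own mappings; it does not remove any of the other sources of interference discussed in this document, and the caching and zero-page optimisations mentioned above still apply to the rest of the system.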
From d4e4ae7cd92f6ac530ce026aa23a637083df8393 Mon Sep 17 00:00:00 2001 From: Igor Stoppa Date: Fri, 20 Sep 2024 15:37:25 +0300 Subject: [PATCH 20/27] Update Contributions/Linux_Memory_Management_Essentials.md Co-authored-by: Daniel Weingaertner Signed-off-by: Igor Stoppa --- Contributions/Linux_Memory_Management_Essentials.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Contributions/Linux_Memory_Management_Essentials.md b/Contributions/Linux_Memory_Management_Essentials.md index c8f0491..68d182a 100644 --- a/Contributions/Linux_Memory_Management_Essentials.md +++ b/Contributions/Linux_Memory_Management_Essentials.md @@ -160,7 +160,7 @@ The following section presents a set of statements that can be objectively verif for the first time, and as soon as the various areas are accessed, they will trigger page fault exceptions. 1. when an address is accessed for the first time, it might require that a memory page is allocated to host whatever the associated content might be, but it is possible, especially when the process has just been started, that - even the associated page in he page table is missing. + even the associated page in the page table is missing. 2. based on both availability of free pages and type of content asosciated, the access might cause the process to sleep; examples: 1. no free pages are available (unusual but possible) and the kernel will have to try to obtain one, in one of From 4a4396d2361ad68b9fd7cb53608204e4d4d84883 Mon Sep 17 00:00:00 2001 From: Igor Stoppa Date: Fri, 20 Sep 2024 15:41:32 +0300 Subject: [PATCH 21/27] Update Contributions/Linux_Memory_Management_Essentials.md Co-authored-by: Paul Albertella Signed-off-by: Igor Stoppa --- Contributions/Linux_Memory_Management_Essentials.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Contributions/Linux_Memory_Management_Essentials.md b/Contributions/Linux_Memory_Management_Essentials.md index 68d182a..5f6dadc 100644 --- a/Contributions/Linux_Memory_Management_Essentials.md +++ b/Contributions/Linux_Memory_Management_Essentials.md @@ -207,7 +207,7 @@ The following considerations are of a more deductive nature. questionable if it can be assessed when a (part of) a process memory content is actually present in the system physical memory. 3. we cannot rule out the possibility that a process will be exposed to various types of interference from the kernel: - 1. some of a more bening nature, like dropping of pages, or not allocation of not-yet-used one + 1. some of a more benign nature, like dropping of pages, or not allocation of not-yet-used one 2. some limited in extent, but hard or even practicaly impossible to detect, like a rogue write to process physical memory 3. some of systemic nature, like some form of use-after free, where a process page is accidentally in use also by another component 4. 
some of indirect nature, like for example when the page table of the process address space is somehow corrupted From 18994c68a6e84b41572542c272561e0bd6412a5c Mon Sep 17 00:00:00 2001 From: Igor Stoppa Date: Fri, 20 Sep 2024 15:44:29 +0300 Subject: [PATCH 22/27] Update Contributions/Linux_Memory_Management_Essentials.md Co-authored-by: Paul Albertella Signed-off-by: Igor Stoppa --- Contributions/Linux_Memory_Management_Essentials.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Contributions/Linux_Memory_Management_Essentials.md b/Contributions/Linux_Memory_Management_Essentials.md index 5f6dadc..986c3c8 100644 --- a/Contributions/Linux_Memory_Management_Essentials.md +++ b/Contributions/Linux_Memory_Management_Essentials.md @@ -213,7 +213,7 @@ The following considerations are of a more deductive nature. 4. some of indirect nature, like for example when the page table of the process address space is somehow corrupted 4. again, because of the extremely complex nature of the system, positive testing is not sufficient, but it needs to be paired also with negative testing, proving that it is possible to cope with interference and detect it, somehow. -5. the same considerations made about integrity vs. avaialbility for the kernel are valid here too: detecting +5. the same considerations made about integrity vs. availability for the kernel are valid here too: detecting interference doesn't help with keeping it under a certain threshold, and due to the complexity of the system, it is not possible to estimate the risk reliably. 6. a single-thread process can interfere with itself, since typically most of its data is writeable; the kernel cannot be responsible for preventing this category of interference From 3a589a5bed121602bb67730960b2f50f0808371e Mon Sep 17 00:00:00 2001 From: Igor Stoppa Date: Fri, 20 Sep 2024 15:44:47 +0300 Subject: [PATCH 23/27] Update Contributions/Linux_Memory_Management_Essentials.md Co-authored-by: Paul Albertella Signed-off-by: Igor Stoppa --- Contributions/Linux_Memory_Management_Essentials.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Contributions/Linux_Memory_Management_Essentials.md b/Contributions/Linux_Memory_Management_Essentials.md index 986c3c8..e23f0e9 100644 --- a/Contributions/Linux_Memory_Management_Essentials.md +++ b/Contributions/Linux_Memory_Management_Essentials.md @@ -215,7 +215,7 @@ The following considerations are of a more deductive nature. be paired also with negative testing, proving that it is possible to cope with interference and detect it, somehow. 5. the same considerations made about integrity vs. availability for the kernel are valid here too: detecting interference doesn't help with keeping it under a certain threshold, and due to the complexity of the system, - it is not possible to estimate the risk reliably. + it is not possible to reliably estimate the risk that availability criteria will not be achieved. 6. a single-thread process can interfere with itself, since typically most of its data is writeable; the kernel cannot be responsible for preventing this category of interference 7. when dealing with a multi-threaded process, besides simple self interference, one must also consider cross-thread interference, where each thread can corrupt not only its own stack, but also the stack of every other process. 
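The considerations above note that it is questionable whether it can be assessed when (part of) a process's memory content is actually present in physical memory. One concrete way a process can probe this for its own mappings is `mincore()`, sketched below as an illustration added for this document (not part of the patches); the result is only a snapshot that can be stale immediately afterwards, which is consistent with the caution expressed above. A Linux/glibc target is assumed.

```c
/*
 * Illustrative sketch, assuming Linux/glibc: mincore(2) reports which pages of
 * a mapping were resident at the moment of the call - a snapshot only.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len = 8 * (size_t)page;

    /* Anonymous private mapping: initially it has no physical backing. */
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return EXIT_FAILURE;

    memset(buf, 0xAA, (size_t)(3 * page)); /* fault in only the first three pages */

    unsigned char vec[8];                  /* one status byte per page */
    if (mincore(buf, len, vec) != 0)
        return EXIT_FAILURE;

    for (int i = 0; i < 8; i++)
        printf("page %d: %s\n", i, (vec[i] & 1) ? "resident" : "not resident");

    munmap(buf, len);
    return 0;
}
```

Typically this reports the three touched pages as resident and the untouched ones as not resident, illustrating that allocating address space does not by itself imply physical backing.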
From 67eca54263d60473e95b1acccc371e03b2802a03 Mon Sep 17 00:00:00 2001 From: Igor Stoppa Date: Thu, 23 Jan 2025 13:28:31 +0200 Subject: [PATCH 24/27] Update Contributions/Linux_Memory_Management_Essentials.md Co-authored-by: Paul Albertella Signed-off-by: Igor Stoppa --- Contributions/Linux_Memory_Management_Essentials.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/Contributions/Linux_Memory_Management_Essentials.md b/Contributions/Linux_Memory_Management_Essentials.md index e23f0e9..fae6349 100644 --- a/Contributions/Linux_Memory_Management_Essentials.md +++ b/Contributions/Linux_Memory_Management_Essentials.md @@ -123,7 +123,12 @@ The following considerations are of a more deductive nature. 1. Because of the way pages, fractions and multiples of them are allocated, freed, cached, recovered, there is a complex interaction between system components at various layers. 2. Even using cgroups, it is not possible to eliminate indirect interaction at the low level between components with different levels of safety integrity (e.g. the recirculation of pages related to critical processes in one group might be affected by less critical processes in another group) 3. Because of the nature of memory management, we cannot rule out the possibility that memory management mechanisms will interfere with safe processes, either due to a bug or due to the interference toward the metadata they rely on. For example, the memory management might hand over to a requesting entity a memory page that is currently already in use either by a device driver or by a userspace process playing a role in a safety use case. -4. Still due to te complex interaction between processes, kernel drivers and other kernel code, it is practically impossible to qualify the kernel as safe through positive testing alone, because it is impossible to validate all the possible combinations, and it is equally impossible to assess the overall test coverage and the risk associated with not reaching 100%. The only reliable way to test is to use negative testing (simulating a certain type of interference) and confirming that the system response is consistent with expectations (e.g detect the interference, in case of ASILB requirements). And even then, the only credible claim that can be made is that, given the simulated type of interference, on the typology of target employed, the reaction is aligned with the requirements. Other types of targets will require further ad-hoc negative testing. +4. These complex interaction between processes, kernel drivers and other kernel code mean that it is practically impossible to qualify the kernel through positive testing alone. + 1. Specifying requirements and implementing a credible set of tests to cover all of the kernel's functions for the general case is certainly infeasible (and arguably impossible), because the range of potential applications and integrations for Linux is too broad. + 2. We can constrain this scope by specifying a kernel version and configuration, for a given set of target systems and software integrations, and specifying the set of functions it is intended to provide. However, this would still not be sufficient to assert that the kernel is devoid of certain classes of bug (e.g. for bugs caused by interference). + 3. Negative tests derived from credible analysis of the kernel could be used to address this, by verifying its behaviour (and/or mitigations provided by other components of a target system) for a documented set of failure modes. + 4. 
This might be achieved, for example, by simulating an identified type of interference for a range of positive test cases, and confirming that the overall integrated system's response is consistent with a specified set of expectations (e.g. the interference is detected and a kernel- or system-level mitigation is triggered). + 5. This, in combination with requirements-based functional testing, could be a viable approach for qualifying a specific integration and configuration of Linux, for a given set of target systems and use cases. 5. Linux Kernel mechanisms like SELinux and cgroups/containers do not offer any protection against interference originating from the kernel itself. 6. The Linux Kernel must be assumed to not be safe; possibly QM at best, unless specific subsystems are qualified through both positive and negative testing. 7. Claims about kernel integrity (or detection of its loss), do not guarantee system availability; safety arguments for a Linux-based system that rely upon a level of availability must separately show that this is supported. From c0bba1f936329e642092446301502129e76654db Mon Sep 17 00:00:00 2001 From: Igor Stoppa Date: Thu, 23 Jan 2025 13:29:47 +0200 Subject: [PATCH 25/27] Update Contributions/Linux_Memory_Management_Essentials.md Co-authored-by: Paul Albertella Signed-off-by: Igor Stoppa --- Contributions/Linux_Memory_Management_Essentials.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/Contributions/Linux_Memory_Management_Essentials.md b/Contributions/Linux_Memory_Management_Essentials.md index fae6349..6a5daf3 100644 --- a/Contributions/Linux_Memory_Management_Essentials.md +++ b/Contributions/Linux_Memory_Management_Essentials.md @@ -123,7 +123,12 @@ The following considerations are of a more deductive nature. 1. Because of the way pages, fractions and multiples of them are allocated, freed, cached, recovered, there is a complex interaction between system components at various layers. 2. Even using cgroups, it is not possible to eliminate indirect interaction at the low level between components with different levels of safety integrity (e.g. the recirculation of pages related to critical processes in one group might be affected by less critical processes in another group) 3. Because of the nature of memory management, we cannot rule out the possibility that memory management mechanisms will interfere with safe processes, either due to a bug or due to the interference toward the metadata they rely on. For example, the memory management might hand over to a requesting entity a memory page that is currently already in use either by a device driver or by a userspace process playing a role in a safety use case. -4. These complex interaction between processes, kernel drivers and other kernel code mean that it is practically impossible to qualify the kernel through positive testing alone. +4. These complex interaction between processes, kernel drivers and other kernel code mean that it is not feasible to qualify the kernel through positive, requirement-based testing alone. + 1. Safety-related claims made for the kernel must be specified and verified with a credible set of tests, which must necessarily include negative tests to demonstrate the effectiveness of any detection, mitigation and exception-handling mechanisms provided in support of these claims. + 2. 
Credible analysis of the kernel could be used to devise a suitable set of tests for this, to verify the kernel's behaviour (and/or mitigations provided by other components of a target system) for a documented set of failure modes. + 3. Specifying requirements and implementing a credible set of tests to cover all of the kernel's functions for the general case is certainly infeasible (and arguably impossible), because the range of potential applications and integrations for Linux is too broad. + 4. Demonstrating that claims are supported with a combination of analysis and testing does not allow us to argue that the kernel is devoid of a certain class of bug, only that our verification mechanisms are sufficient to detect bugs affecting the specified claims, and for the identified failure modes. + 5.. Claims must therefore be verified for each iteration of the kernel intended for use in a safety-related context, and for the specific configuration and system integration of a target context; the sufficiency of the set of failure modes considered, and of the analysis informing these, must be also confirmed for the target context. 1. Specifying requirements and implementing a credible set of tests to cover all of the kernel's functions for the general case is certainly infeasible (and arguably impossible), because the range of potential applications and integrations for Linux is too broad. 2. We can constrain this scope by specifying a kernel version and configuration, for a given set of target systems and software integrations, and specifying the set of functions it is intended to provide. However, this would still not be sufficient to assert that the kernel is devoid of certain classes of bug (e.g. for bugs caused by interference). 3. Negative tests derived from credible analysis of the kernel could be used to address this, by verifying its behaviour (and/or mitigations provided by other components of a target system) for a documented set of failure modes. From 8d7f8513176ff0d3421d4988d39cbec33edfc282 Mon Sep 17 00:00:00 2001 From: Igor Stoppa Date: Sun, 26 Jan 2025 17:49:48 +0200 Subject: [PATCH 26/27] Update Contributions/Linux_Memory_Management_Essentials.md Co-authored-by: Paul Albertella Signed-off-by: Igor Stoppa --- Contributions/Linux_Memory_Management_Essentials.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Contributions/Linux_Memory_Management_Essentials.md b/Contributions/Linux_Memory_Management_Essentials.md index 6a5daf3..8c8e9cc 100644 --- a/Contributions/Linux_Memory_Management_Essentials.md +++ b/Contributions/Linux_Memory_Management_Essentials.md @@ -221,8 +221,8 @@ The following considerations are of a more deductive nature. 2. some limited in extent, but hard or even practicaly impossible to detect, like a rogue write to process physical memory 3. some of systemic nature, like some form of use-after free, where a process page is accidentally in use also by another component 4. some of indirect nature, like for example when the page table of the process address space is somehow corrupted -4. again, because of the extremely complex nature of the system, positive testing is not sufficient, but it needs to - be paired also with negative testing, proving that it is possible to cope with interference and detect it, somehow. +4. Because of the extremely complex nature of the system, positive testing alone is not sufficient: it must also be + paired with negative testing, to prove that it is possible to detect or cope with given types of interference. 5. 
the same considerations made about integrity vs. availability for the kernel are valid here too: detecting interference doesn't help with keeping it under a certain threshold, and due to the complexity of the system, it is not possible to reliably estimate the risk that availability criteria will not be achieved. From 030c654083695fc5b34f6f19109e0a7a554ca1d0 Mon Sep 17 00:00:00 2001 From: Igor Stoppa Date: Sun, 26 Jan 2025 19:16:43 +0200 Subject: [PATCH 27/27] Update Contributions/Linux_Memory_Management_Essentials.md Co-authored-by: Paul Albertella Signed-off-by: Igor Stoppa --- Contributions/Linux_Memory_Management_Essentials.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/Contributions/Linux_Memory_Management_Essentials.md b/Contributions/Linux_Memory_Management_Essentials.md index 8c8e9cc..80147a0 100644 --- a/Contributions/Linux_Memory_Management_Essentials.md +++ b/Contributions/Linux_Memory_Management_Essentials.md @@ -207,12 +207,12 @@ The following section presents a set of statements that can be objectively verif #### **Safety-Oriented consideration** The following considerations are of a more deductive nature. -1. a process that is supposed to support safety requirements should not have pages swapped out / dropped / missing, - because this would introduce: - 1. uncertainty in the timing required to recover the content, if not immediately available - 2. additional risk, involving the userspace paging mechanisms in the fulfilling of the safety requirements - 3. additional dependency on runtime linking, in case the process requires it, and code pages have been - discarded - reloading them from disk will not be sufficient +1. For a process intended to support safety requirements, having pages swapped out, dropped or missing + creates additional risk, because it introduces: + 1. Uncertainty in the timing required to recover the content, if it is not immediately available. + 2. Reliance on userspace paging mechanisms for the fulfilment of applicable safety requirements + 3. Additional dependency on runtime linking: where code pages have been discarded, reloading + them from disk can cause a process to violate its applicable timing requirements. 2. The optimisations made by the kernel in providing physical backing to process memory make it very questionable if it can be assessed when a (part of) a process memory content is actually present in the system physical memory.
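As a complement to the final consideration above, the sketch below (an illustration for this document, not part of the patches) shows how a process might at least observe, after the fact, that content had to be reloaded from backing storage across a time-critical section, by comparing the major page fault count reported by `getrusage()`. The `critical_section()` function is a hypothetical placeholder; detection of this kind does not prevent the timing impact, it only makes it observable.

```c
/*
 * Minimal sketch, assuming Linux/glibc: detect major page faults (reloads from
 * backing storage) incurred across a time-critical section.
 */
#include <stdio.h>
#include <sys/resource.h>

static long major_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_majflt;   /* faults that required I/O to satisfy */
}

/* hypothetical placeholder for the actual safety-relevant work */
static void critical_section(void)
{
}

int main(void)
{
    long before = major_faults();
    critical_section();
    long after = major_faults();

    if (before >= 0 && after > before)
        fprintf(stderr, "warning: %ld major fault(s) during critical section\n",
                after - before);
    return 0;
}
```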