[wasm] Webcil-in-WebAssembly (#85932)
Define a WebAssembly module wrapper for Webcil assemblies.
Contributes to #80807 

### Why

In some settings, serving `application/octet-stream` data or files with unusual extensions will trigger firewalls or AV tools. But let's assume that if you're interested in deploying a .NET WebAssembly app, you're in an environment that can at least serve WebAssembly modules.

### How

Essentially we serve this WebAssembly module:

```wat
(module
  (data "\0f\00\00\00") ;; data segment 0: payload size
  (data "webcil Payload\cc")  ;; data segment 1: webcil payload
  (memory (import "webcil" "memory") 1)
  (global (export "webcilVersion") i32 (i32.const 0))
  (func (export "getWebcilSize") (param $destPtr i32) (result)
    local.get $destPtr
    i32.const 0
    i32.const 4
    memory.init 0)
  (func (export "getWebcilPayload") (param $d i32) (param $n i32) (result)
    local.get $d
    i32.const 0
    local.get $n
    memory.init 1))
```

The module exports two WebAssembly functions, `getWebcilSize` and `getWebcilPayload`, that write bytes (the size or the payload of the Webcil assembly, respectively) to linear memory at a given offset. The module also exports the constant `webcilVersion` to version the wrapper format.

So a runtime or tool that wants to consume the webcil module can do something like:

```js
const wasmModule = new WebAssembly.Module (...); // (...) stands for the webcil-in-wasm module bytes
const wasmMemory = new WebAssembly.Memory ({initial: 1});
const wasmInstance =
      new WebAssembly.Instance(wasmModule, {webcil: {memory: wasmMemory}});
const { getWebcilPayload, webcilVersion, getWebcilSize } = wasmInstance.exports;
console.log (`Version ${webcilVersion.value}`);
// write the 4-byte payload size to address 0
getWebcilSize(0);
const size = new Int32Array (wasmMemory.buffer)[0];
console.log (`Size ${size}`);
console.log (new Uint8Array(wasmMemory.buffer).subarray(0, 20));
// copy the payload to address 4 (memory.init traps if it does not fit in memory)
getWebcilPayload(4, size);
console.log (new Uint8Array(wasmMemory.buffer).subarray(0, 20));
```
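The snippet above reads the size through an `Int32Array`, which uses the host's byte order. It also relies on the payload fitting in the initial 64 KiB page. Below is a slightly more defensive sketch that assumes only the `getWebcilSize`/`getWebcilPayload` exports described here. It reads the size explicitly as a little-endian `u32` with `DataView` and grows the imported memory before copying. The helper name `readWebcilViaExports` is hypothetical.

```js
// Hypothetical helper: copy the Webcil payload out of an instantiated wrapper module.
// `exports` must provide getWebcilSize(destPtr) and getWebcilPayload(destPtr, n);
// `memory` is the WebAssembly.Memory passed as the "webcil" "memory" import.
const WASM_PAGE_SIZE = 65536;

function readWebcilViaExports(exports, memory) {
  const { getWebcilSize, getWebcilPayload } = exports;

  // Ask the module to store the payload size (a little-endian u32) at address 0.
  getWebcilSize(0);
  const size = new DataView(memory.buffer).getUint32(0, true);

  // memory.init traps when the copy would run past the end of memory,
  // so grow the memory (in whole pages) until address 4 + size fits.
  const needed = 4 + size;
  const current = memory.buffer.byteLength;
  if (needed > current) {
    memory.grow(Math.ceil((needed - current) / WASM_PAGE_SIZE));
  }

  // Copy the payload to address 4. grow() detaches the old ArrayBuffer,
  // so memory.buffer is read again here rather than cached above.
  getWebcilPayload(4, size);
  return new Uint8Array(memory.buffer, 4, size).slice();
}
```

Because `slice()` copies the bytes, the returned array stays valid even if the memory is grown again later.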

### How (Part 2)

But actually, we will define the wrapper to consist of exactly 2 data segments in the WebAssembly data section: segment 0 is 4 bytes and encodes the webcil payload size; and segment 1 is of variable size and contains the webcil payload.

So to load a webcil-in-wasm module, the runtime gets the _raw bytes_ of the WebAssembly module (i.e., without instantiating it), parses them to find the data section, asserts that there are 2 segments, ensures they're both passive, and reads the data directly from segment 1.
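Here is a minimal JavaScript sketch of that raw-bytes path, for illustration only. It walks the top-level sections, each of which is a one-byte id followed by a ULEB128 size. It stops at the data section (id 11), checks that there are exactly two passive segments, and returns segment 1. The function names are hypothetical. The extra check that the size stored in segment 0 equals segment 1's length is an assumption of this sketch.

```js
// Hypothetical helper: decode an unsigned LEB128 u32 starting at `pos`.
// Returns the value and the position just past the encoding.
function readULEB128(bytes, pos) {
  let value = 0;
  let shift = 0;
  for (;;) {
    if (pos >= bytes.length) throw new Error("unexpected end of module");
    const b = bytes[pos++];
    value += (b & 0x7f) * 2 ** shift; // multiply instead of << to stay unsigned
    if ((b & 0x80) === 0) return { value, pos };
    shift += 7;
    if (shift >= 35) throw new Error("LEB128 encoding longer than 5 bytes");
  }
}

// Hypothetical helper: find the Webcil payload in the raw bytes (a Uint8Array)
// of a wrapper module without instantiating it.
function extractWebcilPayload(bytes) {
  const magic = [0x00, 0x61, 0x73, 0x6d]; // "\0asm"
  if (bytes.length < 8 || magic.some((m, i) => bytes[i] !== m))
    throw new Error("not a WebAssembly module");
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (view.getUint32(4, true) !== 1) throw new Error("unsupported binary version");

  let pos = 8;
  while (pos < bytes.length) {
    const id = bytes[pos++];
    const size = readULEB128(bytes, pos);
    const end = size.pos + size.value;
    if (end > bytes.length) throw new Error("truncated section");
    if (id !== 11) { pos = end; continue; } // skip everything but the data section

    const count = readULEB128(bytes, size.pos);
    if (count.value !== 2) throw new Error("expected exactly 2 data segments");
    pos = count.pos;
    const segments = [];
    for (let i = 0; i < 2; i++) {
      if (bytes[pos++] !== 1) throw new Error(`data segment ${i} is not passive`);
      const len = readULEB128(bytes, pos);
      segments.push({ start: len.pos, length: len.value });
      pos = len.pos + len.value;
    }
    if (pos > end || segments[0].length < 4) throw new Error("malformed data section");

    // Segment 0 starts with the payload size as a little-endian u32 (any trailing
    // bytes are alignment padding). Requiring it to equal segment 1's length is
    // an extra check made by this sketch.
    const payloadSize = view.getUint32(segments[0].start, true);
    const seg1 = segments[1];
    if (payloadSize !== seg1.length) throw new Error("payload size mismatch");
    return { offset: seg1.start, payload: bytes.subarray(seg1.start, seg1.start + seg1.length) };
  }
  throw new Error("no data section");
}
```

The returned `offset` is the payload's position within the module. It is the extra prefix that any RVA-to-file-offset mapping has to account for.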

---

* Add option to emit webcil inside a wasm module wrapper

* [mono][loader] implement a webcil-in-wasm reader

* reword WebcilWasmWrapper summary comment

* update the Webcil spec to include the WebAssembly wrapper module

* Adjust RVA map offsets to account for wasm prefix

   MonoImage:raw_data is used as a base when applying the RVA map to map virtual addresses to physical offsets in the assembly.  With webcil-in-wasm there's an extra wasm prefix before the webcil payload starts, so we need to account for this extra data when creating the mapping.

   An alternative is to compute the correct offsets as part of generating the webcil, but that would entangle the wasm module and the webcil payload.  The current (somewhat hacky) approach keeps them logically separate.

* Add a note about the rva mapping to the spec

* Serve webcil-in-wasm as .wasm

* remove old .webcil support from Sdk Pack Tasks

* Implement support for webcil in wasm in the managed WebcilReader

* align webcil payload to a 4-byte boundary within the wasm module

   Add padding to data segment 0 to ensure that data segment 1's payload (i.e., the webcil content itself) is 4-byte aligned.

* assert that webcil raw data is 4-byte aligned

* add 4-byte alignment requirement to the webcil spec

* Don't modify MonoImageStorage:raw_data

   Instead, just keep track of the webcil offset in the MonoImageStorage.

   This introduces a situation where MonoImage:raw_data is different from MonoImageStorage:raw_data.  The one to use for accessing IL and metadata is MonoImage:raw_data.

   The storage pointer is just used by the image loading machinery.

---------

Co-authored-by: Larry Ewing <[email protected]>
lambdageek and lewing authored May 16, 2023
1 parent 4c23ac2 commit 55c4e8c
Showing 28 changed files with 906 additions and 48 deletions.
84 changes: 74 additions & 10 deletions docs/design/mono/webcil.md
@@ -2,21 +2,83 @@

## Version

This is version 0.0 of the Webcil format.
This is version 0.0 of the Webcil payload format.
This is version 0 of the WebAssembly module Webcil wrapper.

## Motivation

When deploying the .NET runtime to the browser using WebAssembly, we have received some reports from
customers that certain users are unable to use their apps because firewalls and anti-virus software
may prevent browsers from downloading or caching assemblies with a .DLL extension and PE contents.

This document defines a new container format for ECMA-335 assemblies
that uses the `.webcil` extension and uses a new WebCIL container
format.
This document defines a new container format for ECMA-335 assemblies that uses the `.wasm` extension
and uses a new WebCIL metadata payload format wrapped in a WebAssembly module.


## Specification

### Webcil WebAssembly module

Webcil consists of a standard [binary WebAssembly module](https://webassembly.github.io/spec/core/binary/index.html) (binary format version 1) equivalent to the following WAT module:

``` wat
(module
(data "\0f\00\00\00") ;; data segment 0: payload size as a 4 byte LE uint32
(data "webcil Payload\cc") ;; data segment 1: webcil payload
(memory (import "webcil" "memory") 1)
(global (export "webcilVersion") i32 (i32.const 0))
(func (export "getWebcilSize") (param $destPtr i32) (result)
local.get $destPtr
i32.const 0
i32.const 4
memory.init 0)
(func (export "getWebcilPayload") (param $d i32) (param $n i32) (result)
local.get $d
i32.const 0
local.get $n
memory.init 1))
```

That is, the module imports linear memory 0 and exports:
* a global `i32` `webcilVersion` encoding the version of the WebAssembly wrapper (currently 0),
* a function `getWebcilSize : i32 -> ()` that writes the size of the Webcil payload to the specified
address in linear memory as a `u32` (that is: 4 LE bytes), and
* a function `getWebcilPayload : i32 i32 -> ()` that writes `$n` bytes of the content of the Webcil
payload at the specified address `$d` in linear memory.

The Webcil payload size and payload content are stored in the data section of the WebAssembly module
as passive data segments 0 and 1, respectively. The module must not contain additional data
segments. The module must store the payload size in data segment 0, and the payload content in data
segment 1.

The payload content in data segment 1 must be aligned on a 4-byte boundary within the WebAssembly
module. Additional trailing padding may be added to the data segment 0 content to correctly align
data segment 1's content.
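The padding amount depends on where the data section starts. It also depends on the lengths of the LEB128-encoded sizes that precede the payload, and those lengths can themselves change as padding is added. The minimal sketch below assumes the data section is emitted at a known byte offset `sectionOffset` in the module. It simply tries increasing padding until the payload lands on a multiple of 4. The function names are hypothetical and are not the `WebcilWasmWrapper` implementation.

```js
// Hypothetical helper: encode a u32 as unsigned LEB128.
function ulebBytes(value) {
  const out = [];
  do {
    let b = value & 0x7f;
    value >>>= 7;
    if (value !== 0) b |= 0x80;
    out.push(b);
  } while (value !== 0);
  return out;
}

// Hypothetical helper: build a data section holding the two passive segments,
// padding segment 0 so that segment 1's content starts 4-byte aligned relative
// to the start of the module. `sectionOffset` is the offset of the section id byte.
function buildWebcilDataSection(payload, sectionOffset) {
  for (let pad = 0; pad < 8; pad++) {
    const seg0Len = 4 + pad;
    const seg0LenBytes = ulebBytes(seg0Len);
    const seg1LenBytes = ulebBytes(payload.length);
    const contentSize =
      1 +                                          // segment count
      1 + seg0LenBytes.length + seg0Len +          // segment 0: flag, length, size + padding
      1 + seg1LenBytes.length + payload.length;    // segment 1: flag, length, payload
    const sizeBytes = ulebBytes(contentSize);
    const headerLen = 1 + sizeBytes.length;        // section id + section size
    // The payload is the last thing in the section.
    const payloadOffset = sectionOffset + headerLen + contentSize - payload.length;
    if (payloadOffset % 4 !== 0) continue;

    const out = new Uint8Array(headerLen + contentSize); // zero-filled, so padding is 0
    let p = 0;
    out[p++] = 11;                                 // data section id
    out.set(sizeBytes, p); p += sizeBytes.length;
    out[p++] = 2;                                  // two segments
    out[p++] = 1;                                  // segment 0: passive
    out.set(seg0LenBytes, p); p += seg0LenBytes.length;
    new DataView(out.buffer).setUint32(p, payload.length, true);
    p += seg0Len;                                  // size (4 bytes) + padding
    out[p++] = 1;                                  // segment 1: passive
    out.set(seg1LenBytes, p); p += seg1LenBytes.length;
    out.set(payload, p);
    return { bytes: out, payloadOffset };
  }
  // Unreachable in practice: the section-size LEB128 grows at most once across
  // 8 consecutive candidates, so one of them is always aligned.
  throw new Error("could not align payload");
}
```

Trying candidates avoids having to solve for the padding in closed form. A larger padding can push the section size across a LEB128 boundary (for example from 127 to 128), which shifts the payload by one more byte.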

(**Rationale**: With this wrapper it is possible to split the WebAssembly module into a *prefix*
consisting of everything before the data section, the data section, and a *suffix* that consists of
everything after the data section. The prefix and suffix do not depend on the contents of the
Webcil payload and a tool that generates Webcil files could simply emit the prefix and suffix from
constant data. The data section is the only variable content between different Webcil-encoded .NET
assemblies.)

(**Rationale**: Encoding the payload in the data section in passive data segments with known indices
allows a runtime that does not include a WebAssembly host or a runtime that does not wish to
instantiate the WebAssembly module to extract the payload by traversing the WebAssembly module and
locating the Webcil payload in the data section at segment 1.)

(**Rationale**: The alignment requirement is due to ECMA-335 metadata requiring certain portions of
the physical layout to be 4-byte aligned, for example ECMA-335 Sections II.25.4 and II.25.4.5.
Aligning the Webcil content within the wasm module allows tools that directly examine the wasm
module without instantiating it to properly parse the ECMA-335 metadata in the Webcil payload.)

(**Note**: the wrapper may be versioned independently of the payload.)


### Webcil payload

The webcil payload contains the ECMA-335 metadata, IL and resources comprising a .NET assembly.

As our starting point we take section II.25.1 "Structure of the
runtime file format" from ECMA-335 6th Edition.

@@ -40,12 +102,12 @@ A Webcil file follows a similar structure
| CLI Data |
| |

## Webcil Headers
### Webcil Headers

The Webcil headers consist of a Webcil header followed by a sequence of section headers.
(All multi-byte integers are in little endian format).

### Webcil Header
#### Webcil Header

``` c
struct WebcilHeader {
@@ -75,11 +137,11 @@ The next pairs of integers are a subset of the PE Header data directory specifyi
of the CLI header, as well as the directory entry for the PE debug directory.


### Section header table
#### Section header table

Immediately following the Webcil header is a sequence (whose length is given by `coff_sections`
above) of section headers giving their virtual address and virtual size, as well as the offset in
the Webcil file and the size in the file. This is a subset of the PE section header that includes
the Webcil payload and the size in the file. This is a subset of the PE section header that includes
enough information to correctly interpret the RVAs from the webcil header and from the .NET
metadata. Other information (such as the section names) is not included.

@@ -92,11 +154,13 @@ struct SectionHeader {
};
```

### Sections
(**Note**: the `st_raw_data_ptr` member is an offset from the beginning of the Webcil payload, not from the beginning of the WebAssembly wrapper module.)
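Because `st_raw_data_ptr` is relative to the payload, a reader that maps RVAs to positions in the `.wasm` file has to add the payload's offset within the module. The commit notes above describe the same adjustment for the runtime's RVA map. Below is a minimal sketch. The field names `virtualAddress`, `virtualSize`, `sizeOfRawData`, and `rawDataPtr` are hypothetical stand-ins for the section header members, whose declarations are elided in this diff.

```js
// Hypothetical section header shape: { virtualAddress, virtualSize, sizeOfRawData, rawDataPtr },
// where rawDataPtr (st_raw_data_ptr) is relative to the start of the Webcil payload.
// `webcilOffset` is where the payload starts inside the .wasm module.
function rvaToModuleOffset(rva, sections, webcilOffset) {
  for (const s of sections) {
    if (rva < s.virtualAddress || rva >= s.virtualAddress + s.virtualSize) continue;
    const delta = rva - s.virtualAddress;
    // Bytes past the section's raw data have no backing in the file.
    if (delta >= s.sizeOfRawData) return null;
    return webcilOffset + s.rawDataPtr + delta;
  }
  return null; // RVA not covered by any section
}
```

With `webcilOffset` set to 0, the same function maps RVAs within a bare Webcil payload.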

#### Sections

Immediately following the section table are the sections. These are copied verbatim from the PE file.

## Rationale
### Rationale

The intention is to include only the information necessary for the runtime to locate the metadata
root, and to resolve the RVA references in the metadata (for locating data declarations and method IL).
@@ -0,0 +1,148 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using System;
using System.Collections.Immutable;
using System.IO;
using System.Reflection;
using System.Runtime.InteropServices;
using System.Text;

namespace Microsoft.NET.WebAssembly.Webcil;

internal class WasmModuleReader : IDisposable
{
public enum Section : byte
{
// order matters: enum values must match the WebAssembly spec
Custom,
Type,
Import,
Function,
Table,
Memory,
Global,
Export,
Start,
Element,
Code,
Data,
DataCount,
}

private readonly BinaryReader _reader;

private readonly Lazy<bool> _isWasmModule;

public bool IsWasmModule => _isWasmModule.Value;

public WasmModuleReader(Stream stream)
{
_reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true);
_isWasmModule = new Lazy<bool>(this.GetIsWasmModule);
}


public void Dispose()
{
Dispose(true);
}


protected virtual void Dispose(bool disposing)
{
if (disposing)
{
_reader.Dispose();
}
}

protected virtual bool VisitSection (Section sec, out bool shouldStop)
{
shouldStop = false;
return true;
}

private const uint WASM_MAGIC = 0x6d736100u; // "\0asm"

private bool GetIsWasmModule()
{
_reader.BaseStream.Seek(0, SeekOrigin.Begin);
try
{
uint magic = _reader.ReadUInt32();
if (magic == WASM_MAGIC)
return true;
} catch (EndOfStreamException) {}
return false;
}

public bool Visit()
{
if (!IsWasmModule)
return false;
_reader.BaseStream.Seek(4L, SeekOrigin.Begin); // skip magic

uint version = _reader.ReadUInt32();
if (version != 1)
return false;

bool success = true;
while (success) {
success = DoVisitSection (out bool shouldStop);
if (shouldStop)
break;
}
return success;
}

private bool DoVisitSection(out bool shouldStop)
{
shouldStop = false;
byte code = _reader.ReadByte();
Section section = (Section)code;
if (!Enum.IsDefined(typeof(Section), section))
return false;
uint sectionSize = ReadULEB128();

long savedPos = _reader.BaseStream.Position;
try
{
return VisitSection(section, out shouldStop);
}
finally
{
_reader.BaseStream.Seek(savedPos + (long)sectionSize, SeekOrigin.Begin);
}
}

protected uint ReadULEB128()
{
uint val = 0;
int shift = 0;
while (true)
{
byte b = _reader.ReadByte();
val |= (b & 0x7fu) << shift;
if ((b & 0x80u) == 0) break;
shift += 7;
if (shift >= 35)
throw new OverflowException();
}
return val;
}

protected bool TryReadPassiveDataSegment (out long segmentLength, out long segmentStart)
{
segmentLength = 0;
segmentStart = 0;
byte code = _reader.ReadByte();
if (code != 1)
return false; // not passive
segmentLength = ReadULEB128();
segmentStart = _reader.BaseStream.Position;
// skip over the data
_reader.BaseStream.Seek (segmentLength, SeekOrigin.Current);
return true;
}
}
@@ -42,6 +42,8 @@ FilePosition SectionStart

private string InputPath => _inputPath;

public bool WrapInWebAssembly { get; set; } = true;

private WebcilConverter(string inputPath, string outputPath)
{
_inputPath = inputPath;
@@ -62,6 +64,26 @@ public void ConvertToWebcil()
}

using var outputStream = File.Open(_outputPath, FileMode.Create, FileAccess.Write);
if (!WrapInWebAssembly)
{
WriteConversionTo(outputStream, inputStream, peInfo, wcInfo);
}
else
{
// if wrapping in WASM, write the webcil payload to memory because we need to discover the length

// webcil is about the same size as the PE file
using var memoryStream = new MemoryStream(checked((int)inputStream.Length));
WriteConversionTo(memoryStream, inputStream, peInfo, wcInfo);
memoryStream.Flush();
var wrapper = new WebcilWasmWrapper(memoryStream);
memoryStream.Seek(0, SeekOrigin.Begin);
wrapper.WriteWasmWrappedWebcil(outputStream);
}
}

public void WriteConversionTo(Stream outputStream, FileStream inputStream, PEFileInfo peInfo, WCFileInfo wcInfo)
{
WriteHeader(outputStream, wcInfo.Header);
WriteSectionHeaders(outputStream, wcInfo.SectionHeaders);
CopySections(outputStream, inputStream, peInfo.SectionHeaders);
@@ -210,7 +232,7 @@ private static void WriteStructure<T>(Stream s, T structure)
}
#endif

private static void CopySections(FileStream outStream, FileStream inputStream, ImmutableArray<SectionHeader> peSections)
private static void CopySections(Stream outStream, FileStream inputStream, ImmutableArray<SectionHeader> peSections)
{
// endianness: ok, we're just copying from one stream to another
foreach (var peHeader in peSections)