Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

repeat-expr can't use fields defined in instances #298

Closed
GreyCat opened this issue Dec 14, 2017 · 6 comments
Closed

repeat-expr can't use fields defined in instances #298

GreyCat opened this issue Dec 14, 2017 · 6 comments
Labels

Comments

@GreyCat
Copy link
Member

GreyCat commented Dec 14, 2017

Moving issue from #1204, orignally by @yannayl


Trying to compile this .ksy file:

meta:
  id: floats
  endian: le
instances:
  - id: num_floats
    type: u4
  - id: floats
    type: f8
    repeat: expr
    repeat-expr: num_floats

yields the following error:

/instances: expected map, got List(Map(id -> num_floats, type -> u4), Map(id -> floats, type -> f8, repeat -> expr, repeat-expr -> num_floats)) (class scala.collection.immutable.$colon$colon)

This file is a copy-paste from the examples replacing a seq with instances.

I'm not sure if it is related to issue #512 or not.

@GreyCat
Copy link
Member Author

GreyCat commented Dec 14, 2017

seq and instances are not interchangeable, and there is good reason for that:

  • "seq" is a sequence, so we have to:
    • use array to maintain order
    • use extra id keys to add identifiers
  • "instances" is not a sequence, but an unordered dictionary of IDs to instance descriptions ("a map" in YAML terms) — thus:
    • we don't need any extra id keys => we just put these IDs into map keys
    • we don't have any information about where this instance would be located in the file => so we need to specify it using pos key

Thus, stuff like that

instances:
  - id: num_floats
    type: u4

does not make much sense anyway. It does not need order, it does not need id key (it should be put into the map key), and, what's much more important, it needs pos to be meaningful. Something like that will compile:

instances:
  num_floats:
    pos: 0x100
    type: u4

If you'll take a closer look at the error message, it actually tells you exactly that: instances is expected to contain a map, but you've supplied it with an ordered list.

@yannayl
Copy link

yannayl commented Dec 15, 2017

I see. Thanks!

What I was originally looking for is a "lazy" sequence. One which it's values are only retrieved and parsed when accessed. I assumed 'instances' is the closest thing I can get.

So, is it possible to make a 'seq' lazy?
Alternatively, can the positions in 'instances' can be added automatically to be exactly after the previous field unless specified otherwise?
Is there another solution I'm missing?

Thanks for your help.

@GreyCat
Copy link
Member Author

GreyCat commented Dec 15, 2017

There is an issue for lazy sequence attributes — kaitai-io/kaitai_struct_compiler#133 — but don't hold your breath for it. Conceptually, it's somewhat hard, as there are many things to consider. For example, consider this:

seq:
  - id: a
    type: user_type_a
  - id: b
    type: user_type_b
    lazy: true
  - id: c
    type: user_type_c

It is probably impossible to implement, as we don't know size of "user_type_b" before parsing just to skip that amount of bytes to eagerly parse only a and c. And it's even more complicated with repetitions (and that's what many people want — like "lazy parsing of packets in terabyte-long packet dump").

There are a couple of hacks to achieve that with instances, but note that they're all really hacks :( For example, you can actually exploit the way that instances are called and do something like that:

seq:
  - id: num_floats
    type: u4
  - id: placeholder
    size: 8 * num_floats
    if: save_pos1 >= 0
instances:
  save_pos1:
    value: _io.pos
  floats:
    pos: save_pos1
    repeat: expr
    repeat-expr: num_floats
    type: u8

What happens here is that before parsing placeholder field, it would actually check the condition in if, which is obviously always true, but it will invoke save_pos1. save_pos1 is a value (calculated) instance, which will calculate and remember the result of calculation of current position in the stream. And after that, you can use pos: save_pos1 to reference that position for lazy instance.

@koczkatamas
Copy link
Member

So, is it possible to make a 'seq' lazy?

No, but here is a feature request about this: #133

Alternatively, can the positions in 'instances' can be added automatically to be exactly after the previous field unless specified otherwise?

Not in a supported way, but you can still use _io.pos which is (hopefully) set to the position after the last seq field:

Try this:

meta:
  id: lazy_instance
seq:
  - id: field1
    type: u2le
instances:
  field2:
    type: u2le
    pos: _io.pos

Of course if you read the fields in the wrong order, or you want to use more than 1 instance field then the situation gets more complex. (You can use your 1 instance option to read not just a u2le but a complex structure too!)

Is there another solution I'm missing?

We usually recommend to split your type which contains the lazy field into multiple subtypes and control the parsing logic from your program code. This way you have to write more code manually, but Kaitai still helps you to parse the binary structures, so you can make your code more flexible at the cost of a little more hand-written code.

@yannayl
Copy link

yannayl commented Dec 18, 2017

The default position of instance's field is meaningless as of now (I mean, it just reads wherever the stream is). Perhaps it would be wise to print an error or throw an exception or something. (The way I wish it would to be is right after the previous field i.e. previous field position + previous field size which can be either resolved in compile time if possible or deferred to runtime. But I can see it's a very complex feature to implement.)

Just to complete the picture: I am parsing a remote resource thus read of every byte is an expensive operation (I'm using an underlying IO implementation for caching). I can't seem to find a nice way to do it with any existing solution :/

Thanks for your help guys, you are running a really awesome project.

@GreyCat
Copy link
Member Author

GreyCat commented Mar 26, 2019

Perhaps it would be wise to print an error or throw an exception or something.

Extracted as separate issue #544

Closing this one, as discussion seems to have reached its point.

@GreyCat GreyCat closed this as completed Mar 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants