-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unsafety improvements #1683
base: master
Are you sure you want to change the base?
Unsafety improvements #1683
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the interner part is confusing because 'a
and 'self
(although the later doesn't really make sense for a value - which is in part why there's no such keyword in Rust) are in fact morally the same. At least that's what I understand from the code.
What we want to express is that the &str
we are storing inside the vector and the hashmap live as long as the the whole interner, because we store them inside an arena that is owned by the interner. So, for all intent of purpose, 'a = 'self
. But you can't express that in Rust without an explicit lifetime, at least to the best of my knowledge. Hence, we need to use this additional parameter, to juggle between them depending on the requirement of the type signature, and sometimes force the compiler to accept the conversion using transmute
.
I believe you're right that what we want in the second transmute is to get out an &'a
to be able to store it inside the vector and the hashmap.
I'm not sure about the signature change of lookup
, though: after all both lifetimes are "valid" (in that both are possible and sound), but I tend to think that 'a
is an implementation detail/Rust work-around that leaks. As a consumer of an interner, what you want is to create stuff that live as long as the interner, I think, and you don't really care about the 'a
lifetime. So using the same as self
and transmuting looks nicer. It doesn't matter now, because we only have one, static interner, but let's imagine that interner
is a library, and that we might make use of it locally (e.g. if one day we compile to bytecode - we could use an interner for one of the pases, but not the others).
this simplifies things and eliminates a usage of `unsafe`.
42a0556
to
b652cbe
Compare
Ohhh no. Okay, so I played around some more and discovered why the lifetimes were the way they were originally. With the changes I made, one can write this code: let interner = Interner::new();
let sym = interner.intern("temporary");
let smuggler = interner.lookup(sym);
drop(interner);
println!(smuggler); // use-after-free!!! It's more clear what's going on if I add in a couple of explicit lifetime annotations: let interner = Interner::<'static>::new();
let sym = interner.intern("temporary");
let smuggler: &'static str = interner.lookup(sym);
drop(interner);
println!(smuggler); // use-after-free!!! There is no guarantee that So it's definitely not the case that |
Yes, sorry, I think I didn't express myself correctly. What I meant is that, as the writer of the API, you want to think of However you're right that it's not just a question of taste or API usage, because you can obviously pick |
I managed to produce UB without let i1 = Interner::new();
let i2 = Interner::new();
let sym = i1.intern("foo");
let foo = i1.lookup(sym);
*i1.0.write().unwrap() = i2.0.into_inner().unwrap();
dbg!(foo); The issue here is that we assume the lifetime of the There's another problem, that's potentially worse / easier to trigger. I wasn't able to actually trigger it but I'm not entirely sure why. Self-referential structs can be problematic if they're moved. Something like pub struct Foo<'a> {
num: u32,
num_ptr_opt: Option<&'a u32>,
}
impl<'a> Foo<'a> {
pub fn new(num: u32) -> Self {
let mut foo = Foo {
num,
num_ptr_opt: None
};
foo.num_ptr_opt = unsafe { std::mem::transmute<&'_ u32, &'a u32>(foo.num) };
foo
}
pub fn get_num_ptr(&self) -> &u32 {
self.num_ptr_opt.unwrap()
}
} Seems safe, because the data lives as long as the struct, and so the pointer should always point to valid memory. But it's not, because one can write let foo = Foo::new(0);
let num_ptr = foo.get_num_ptr();
let foo = std::hint::black_box(foo); // force move
dbg!(num_ptr); // points to moved / freed memory My guess as to why this doesn't happen with I tried running a test of this behaviour with |
pub(crate) struct Interner<'a>(RwLock<InnerInterner<'a>>); | ||
// NOTE: We set the lifetime parameter of InnerInterner to 'static since | ||
// it's arbitrary, and there's no reason to expose it to users of Interner | ||
pub(crate) struct Interner(RwLock<InnerInterner<'static>>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Radvendii , it seems that it used to be the fact that the lifetime of the interner and the inner interner were the same one, before this change. Or am I misunderstanding something? Does your first example of UB still happens with the original code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it still happens with the original code.
Remember that in Interner<'a>(RwLock<InnerInterner<'a>)
, there is nothing limiting 'a
to the liferime of the Interner. If rust needs it to be 'static
it will just insert 'static
in here anyways, so specifying 'static
doesn't allow any behaviour that wasn't allowed before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. But it's a private item already, as is InnerInterner
, so you can only access the InnerInterner
of an interner from within the identifier::interner
module, is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in fact, I start to believe that the lifetime of InnerInterner
is useless, because we only ever instantiate it to static. So maybe we should just bite the bullet and use static
references, saying that they aren't static at all, but only the actual implementation of InnerInterner
can touch it. And document as well that an Interner
should never ever change the underlying inner interner after construction. However, the inner interner being private should make it relatively ok. Another solution is to inline the inner interner in the Interner
to avoid this situation, but this means more locks, and more coupling. I don't know if this is worth the price: in any case, we should make it clear that this module has unsafe parts and that you shouldn't touch it unless you know what you're doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another possibility is to explore Pin
, which believe allows to express those kind of constraints (for example you can't swap a Pin<..>
with something else, just drop it, which might be sufficient to avoid your UB example ?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And finally, maybe we should use NonNull
instead of references with lifetimes, which would avoid to lie about their lifetime (would still require unsafe, but at least won't make it possible to do conversion to &'static
from safe code)
Cf my in-code comment, but it seems this UB has been introduced by this PR, and wasn't here before?
Ah, and I found other people claiming that Box guarantees that the content will never be moved 😅 https://stackoverflow.com/questions/64854198/can-box-move-its-contents-when-the-whole-box-is-moved Also, I didn't have time to dive as deep as you into the problem, so I might be misinterpreting, but it seems the answer you linked doesn't really says that stuff on the heap can be moved. It says that it's unsafe to have the content of a
It seems what we do here is different, because we never ever expose a mutable static reference, nor do we mutably alias anything, so nothing can take out the backing storage of e.g. the That being said: https://www.reddit.com/r/rust/comments/1479b5n/ouroboros_is_also_unsound/ |
summoning @jneem to have a second look at those tricky soundness issues |
@Radvendii I couldn't reproduce your "Foo" example on the playground. The signature of My understanding of Box (which seems compatible with both SO answers) is that the pointed-to data never moves (expressed in the fact that I wonder, though, if it would be better to just take a dependency on an existing interner crate. This one seems to be widely used, for example. |
I think the original assessment was that we needed something very simple, but now, it looks like it's not as simple as initially thought (and the person who wrote this code left Tweag some time ago, although to be fair, it's well-written). A second point is that |
As a followup to #1681, I've implemented a couple of the easy improvements.
std::mem::transmute
so the types are present locallyWhen doing the latter, I noticed that one of the comments is wrong. I'm not sure how to interpret the lifetime that is correct, though.