-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ISSUE-2464: exposing scryer prolog functionality in libscryer_prolog.so for client library consumption #2465
base: master
Are you sure you want to change the base?
ISSUE-2464: exposing scryer prolog functionality in libscryer_prolog.so for client library consumption #2465
Conversation
(Edit: I really wasn't sure I was going to be able to figure out how to do this, but the puzzle began to progressively come together. The original description of the PR was more of a cry for help, it is now somewhat organized. The following comments are the ramblings leading to the eventual victory of sorts.) Oh wow, it DOES work!! I think?? The problem was actually in the Python code (sort of). I examined the scryer playground code and realize there was more going on "under the hood" of the playground page than I had supposed When the python code is modified as follows:import ctypes
lib = ctypes.cdll.LoadLibrary("/home/jay/programs/scryer-prolog/target/release/libscryer_prolog.so")
lib.eval_code_c.argtypes = (ctypes.c_char_p,)
lib.eval_code_c.restype = ctypes.POINTER(ctypes.c_char)
lib.free_c_string.argtypes = (ctypes.POINTER(ctypes.c_char),)
lib.free_c_string.restype = None
def eval_and_free(source, query):
playground = f"""
{source}
:- use_module(library(charsio)).
:- use_module(library(iso_ext)).
:- use_module(library(format)).
:- use_module(library(dcgs)).
:- initialization(playground_main).
playground_main :-
QueryStr = "{query}",
read_term_from_chars(QueryStr, Query, [variable_names(Vars)]),
bb_put(playground_first_answer, true),
bb_put(playground_pending, true),
( setup_call_cleanup(true, Query, NotPending = true),
% Query succeeded
bb_get(playground_first_answer, FirstAnswer),
( FirstAnswer == true ->
format(" ", []),
bb_put(playground_first_answer, false)
; true
),
phrase(playground_answer(Vars), Answer),
format("~s", [Answer]),
( NotPending == true ->
bb_put(playground_pending, false)
; format("~n; ", [])
),
false
; % Query maybe failed
bb_get(playground_pending, Pending),
( Pending == true ->
% Query failed
bb_get(playground_first_answer, FirstAnswer),
( FirstAnswer == true ->
format(" ", []),
bb_put(playground_first_answer, false)
; true
),
format("false", [])
; % Query just terminated
true
)
),
format(".", []).
playground_answer([]) --> "true".
playground_answer([Var|Vars]) -->
playground_answer_([Var|Vars]).
playground_answer_([]) --> "".
playground_answer_([VarName=Var|Vars]) -->
{{ write_term_to_chars(Var, [double_quotes(true), quoted(true)], VarChars) }},
format_("~a = ~s", [VarName, VarChars]),
( {{ Vars == [] }} ->
[]
; ", ",
playground_answer_(Vars)
).
"""
res_ptr = lib.eval_code_c(playground.encode('utf-8'))
res = ctypes.cast(res_ptr, ctypes.c_char_p).value.decode('utf-8')
lib.free_c_string(res_ptr)
return res
if __name__ == '__main__':
print(eval_and_free(
"""
fact(1).
fact(2).
fact(3).
""",
"fact(X)."
)) /home/jay/_project/tai/gamedev/ffi-test/env/bin/python /home/jay/_project/tai/gamedev/ffi-test/src/ffigo/ffgo.py
X = 1
; X = 2
; X = 3. 🥳🥳🥳🥳 Still, I am not a rust (nor scryer/prolog) expert, so I would certainly love feedback (or even disrespect) on this shared library addition so we can make it better. |
I suppose the next issue is that there does not seem to be any persistence of data between eval calls. If I modify the python code as follows: def result_set(source, query, n):
return f"Result Set {n}:\n{eval_and_free(source, query)}"
if __name__ == '__main__':
print(result_set(
'''
fact(1).
fact(2).
fact(3).
''',
'fact(X).',
0))
print(result_set("", "fact(X).", 1))
eval_and_free("", "assertz(fact(4)).")
print(result_set("", "fact(X).", 2)) /home/jay/_project/tai/gamedev/ffi-test/env/bin/python /home/jay/_project/tai/gamedev/ffi-test/src/ffigo/ffgo.py
Result Set 0:
X = 1
; X = 2
; X = 3.
Result Set 1:
error(existence_error(procedure,fact/1),fact/1).
Result Set 2:
error(existence_error(procedure,fact/1),fact/1). I would have been surprised and delighted had this worked. As you can imagine the lack of persistence between calls would make this a deal breaker for any sort of serious work. (after this I'm going to start wondering if there isn't some sort of zero-copy pathway available between the scryer system and the calling system 🤔 ) |
Ok, I _think_ I may be close to a way to preserve state between calls.#[cfg(not(target_arch = "wasm32"))]
use std::ffi::{c_char, CStr, CString};
#[cfg(not(target_arch = "wasm32"))]
use std::collections::HashMap;
#[cfg(not(target_arch = "wasm32"))]
use std::sync::Mutex;
#[cfg(not(target_arch = "wasm32"))]
use lazy_static::lazy_static;
#[cfg(not(target_arch = "wasm32"))]
use machine::mock_wam::*;
#[cfg(not(target_arch = "wasm32"))]
thread_local! {
pub static MACHINE: RefCell<Option<Machine>> = RefCell::new(None);
}
#[cfg(not(target_arch = "wasm32"))]
#[no_mangle]
pub extern "C" fn machine_new() {
println!("Engaging the machine!");
MACHINE.with(|m| *m.borrow_mut() = Some(Machine::with_test_streams()));
}
#[cfg(not(target_arch = "wasm32"))]
#[no_mangle]
pub extern "C" fn machine_free() {
println!("Releasing the machine!");
MACHINE.with(|m| *m.borrow_mut() = None);
}
#[cfg(not(target_arch = "wasm32"))]
#[no_mangle]
pub extern "C" fn eval_code_c(input: *const c_char) -> *mut c_char {
let c_str = unsafe {
assert!(!input.is_null());
CStr::from_ptr(input)
};
let r_str = c_str.to_str().expect("Not a valid UTF-8 string");
let bytes = MACHINE.with(|m| {
let mut machine = m.borrow_mut();
let machine = machine.as_mut().expect("Machine not initialized.");
machine.test_load_string(r_str)
});
let result_str = String::from_utf8_lossy(&bytes).to_string();
println!("Result: {}", result_str);
let c_string = CString::new(result_str).expect("Failed to convert to CString");
c_string.into_raw()
}
#[no_mangle]
pub extern "C" fn free_c_string(ptr: *mut c_char) {
unsafe {
if ptr.is_null() {
return;
}
let _ = CString::from_raw(ptr);
};
} the only issue is with I can't really tell what this is doing: pub fn test_load_string(&mut self, code: &str) -> Vec<u8> {
let stream = Stream::from_owned_string(code.to_owned(), &mut self.machine_st.arena);
self.load_file("<stdin>", stream);
self.user_output.bytes().map(|b| b.unwrap()).collect()
} because I can't tell what this is doing: pub fn load_file(&mut self, path: &str, stream: Stream) {
self.machine_st.registers[1] = stream_as_cell!(stream);
self.machine_st.registers[2] =
atom_as_cell!(AtomTable::build_with(&self.machine_st.atom_tbl, path));
self.run_module_predicate(atom!("loader"), (atom!("file_load"), 2));
} but I've realize I have some fundamental misunderstanding about how consults work and how they are different from source code. The way the playground code gets around this is: {source}
:- use_module(library(charsio)).
:- use_module(library(iso_ext)).
:- use_module(library(format)).
:- use_module(library(dcgs)).
:- initialization(playground_main)
playground_main :-
QueryStr = "{query}",
read_term_from_chars(QueryStr, Query, [variable_names(Vars)]),
bb_put(playground_first_answer, true),
bb_put(playground_pending, true),
... snip ... but hypothetically, in python, I should be able to do something like: if __name__ == '__main__':
source = '''
fact(1).
fact(2).
fact(3).
'''
lib.machine_new()
eval_and_free(source)
print(eval_and_free('?- fact(X).'))
print(eval_and_free('?- assertz(fact(4)).'))
eval_and_free('?- fact(X).')
lib.machine_free() this raises two problems --
I was looking at sockets.pl to try to figure out how persistent sessions with scryer work, but I haven't been able to figure it out yet :/ |
Hi! I'm very new here, so I can't help you with your efforts, but using Prolog code in a library is a non-trivial tasks, because of non-determinism and highly dynamic nature of Prolog, it is deceptively easy to evaluate some simple cases, but for a complex Prolog code there even might be no good representation in your host language! For example, do you have a plan what to do if Prolog program doesn't terminate after giving you couple of answers? Imho, you can always spawn Scryer as a separate process and communicate with it using pipes or sockets. |
Great point! I want to use this in a C#/Unity project, and since it's for an off-line single player video game it needs to be embedded. Regarding representation, I'm fine working with strings, and the output from calls to the scryer shared library could be unparsed as json by scryer itself, and that's fine for me. |
Wow, amazing. Rust is really great for contributing to, and the code was well organized so it was possible to follow what was happening enough to hack this together. Great work! The secret ended up being using the actual machine designed for shared libraries in Let me know if you want me to squash the commits. |
This looks amazing! Lack of iterator-like generation of answer substitution is one of the biggest limitations of the current approach to using Scryer as a library, and this seems to fix it (I will test it latter). As for the global thread-local storage, I think it would be best if you represented the state of the machine and query generator as actual objects that need to be passed to functions. This could be done in the C API with opaque pointers, and it's a common pattern in C libraries. This would also enable using multiple machines at the same time if I'm not mistaken. Also, while returning JSON is convenient when used from Python, as a general C library it would be best to actually return the data in a format usable in C. If you need to return multiple values, you can use out pointers. So the header could be something like that: typedef struct Machine_s Machine;
typedef struct QueryState_d QueryState;
Machine *machine_new();
void machine_free(Machine* machine);
// From what I've seen, the errors that you are emitting in these functions
// would just be undefined behavior in this implementation
QueryState *start_new_query_generator(Machine *machine, char *query);
void cleanup_query_generator(QueryState *query_state);
// If returns NULL, check error_msg
// (What exactly should this return? Should we have a struct for variable
// substitutions and return a list of that?)
char *run_query_generator(Machine *machine, QueryState *query_state, char **error_msg);
// Both return NULL on success, else returns error message
char *load_module_string(Machine *machine, char *module);
char *consult_module_string(Machine *machine, char *module);
// If returns NULL, check error_msg
char *run_query(Machine *machine, char *query, char **error_msg);
// Run on all the (non-NULL) strings generated by the other functions
void free_c_string(char *str); I think you already did the hard part of actually threading through the low level implementation (though @mthom and @lucksus may want to check if there is anything wrong there), so tidying up the API is basically just plumbing. I think it's very important to have a decently good C API if we can, although it's probably best to just do something usable if the "right way" would be too complicated (like I think is the case for Another thing is, if I understood how this works, // Returns:
// - 0 on success
// - 1 on end of results
// - 2 on error, check error_msg
// (What exactly should this return? Should we have a struct for variable
// substitutions and return a list of that?)
int run_query_generator(Machine *machine, QueryState *query_state, char **error_msg, char **results); |
Great thinking! The thread local storage of the machine state I think was the biggest weakness I could see in the implementation, and I think the approach of giving the client a reference to the machine makes perfect sense. I don't work this close to the metal very often and so I don't have much of an imagination for pointers, thank you for demonstrating this pattern! I will give that a shot when I get some time. |
You will probably want to heap allocate the machine in a box and then use |
Another thing: as C doesn't have namespacing, it's probably a good idea to give all the C API functions a prefix like |
Ok, I played with this for a bit and everything seems to be working fine (apart from the things I pointed at in my previous comments). Even |
Maybe instead of putting the c-interface into the crate root (i.e. |
Thanks for this. Embarrassingly, I didn't even know it was possible to do what @bakaq suggested here:typedef struct Machine_s Machine;
typedef struct QueryState_d QueryState;
Machine *machine_new();
void machine_free(Machine* machine);
// From what I've seen, the errors that you are emitting in these functions
// would just be undefined behavior in this implementation
QueryState *start_new_query_generator(Machine *machine, char *query);
void cleanup_query_generator(QueryState *query_state);
// If returns NULL, check error_msg
// (What exactly should this return? Should we have a struct for variable
// substitutions and return a list of that?)
char *run_query_generator(Machine *machine, QueryState *query_state, char **error_msg);
// Both return NULL on success, else returns error message
char *load_module_string(Machine *machine, char *module);
char *consult_module_string(Machine *machine, char *module);
// If returns NULL, check error_msg
char *run_query(Machine *machine, char *query, char **error_msg);
// Run on all the (non-NULL) strings generated by the other functions
void free_c_string(char *str); particularly in regards to the pointer for the machine, so I will dabble with the boxing technique and you can let me know if I got it right. |
I can see now why the WASM is structured the way it is for the scryer playground, but we could/should extend the WASM code (probably separate ticket) to support the same paradigm for folks who wanted to use scryer as a library in javascript without revaluating all the source code on every call and requiring the prelude with the |
could you possibly elaborate on this a little more? Ohhh I see. The only thing I have to do is "decorate" (Python terminology, not sure what the rust terminology is) the The only reason why we might not do it that way that I can think of is because I can see it being very possible to use some feature directives to make the WASM code and the DLL code be generated from the same file. Of course we could factor out the differences into their own submodules and then conditionally include them into Or if that's overly complicated we could do two separate pathways with a small amount of code duplication. |
Yes! Indeed this should definitively be available (and be the default recommended way) in every way to embed Scryer (Rust library, Wasm and C library). After your work here, it shouldn't be very difficult to adapt for the other cases. This will be immediately useful in the Scryer Playground, although to implement the best toplevel experience there we will need more than this (some interface to pipe stdin and stdout of the embedded machine), but this is out of scope for this PR. |
yeah, I think you got what I was trying to say
maybe it then makes sense to have the following module structure (using inline modules for simplicity) // in lib.rs
mod native_interface {
#[cfg(not(target_arch = "wasm32"))]
mod c_interface {
// non-wasm stuff goes here
}
#[cfg(target_arch = "wasm32")]
mod wasm_interface {
// wasm specific stuff goes here
}
#[cfg(not(target_arch = "wasm32"))]
use c_interface::*;
#[cfg(target_arch = "wasm32")]
use wasm_interface ::*;
// common stuff goes here
} using non-inline modules this would result in a file structure like
|
Based on the docs they are called |
tl;dr requesting help on avoiding some footguns in the implementation and usage of scryer as an embedded library based on past experience doing similar embedded language integrations
By the way, what is a "residual goal"? One thing I noticed while developing this (and having developed similar language hacking efforts in the past) is that there tends to always be a bit of a "leaky abstraction" when embedding languages inside of each other. Generally speaking, you need to be aware of the idiosyncrasies of the client/host and the shared library language. Fortunately, with scryer, I this will be a robust and smooth embedding experience thanks to the design of the language itself. One potential footgun in the implementation, for instance, is I have no idea what the consequence of terminating a generative query early is on the WAM and what steps must be done to cleanup. I tried to stick as close as I could to the run_query() formula and call self.trust_me() on early termination but... I don't trust it!! This could lead to some very subtle and infuriating bugs. Similarly, the way to interact with embedded scryer will likely be quite different from interacting with scryer via @triska 's (wonderful) ediprolog, for instance. Meaning -- should consulting (via the shared library However, exposing the consulting and query tools as they are will lead to user confusion unless we clarify usage patterns (or put in some guard rails) as I'm not even sure what the right answer is here and I've been staring at the source code! I bet @triska has some solid ideas regarding this and the language nuances here. |
Ok the comments about JSON you made here and in your comment on the issue are starting to sink in. I'm used to consuming the output in languages like Python, C#, or Clojure, and so from that standpoint, JSON is very convenient and I think not offering a JSON option would be a detriment to adoption unless there was some very clear way to marshal the data. Also, because I don't operate in the C arena very often, I'm not even sure what the alternative would be 😅 So, if we don't return a JSON string -- what do we return, exactly? A pointer to the raw data structure? Which data structure would this be? I think it would probably be much more efficient than JSON, I just honestly don't what that would look like -- I assume a pointer to some data structure? And then the client would need to make some kind of struct to extract the data? And what scryer would return from queries are... pointers? Thanks for being patient with me again as most of the languages I work with don't even have pointers 😅 |
When using constaint libraries in Scryer that are implemented using attributed variables, like ?- use_module(library(dif)).
true.
?- dif(X, 1).
% This is a residual goal, notice that its not in the usual X = _ substitution form,
% because X still is just a variable that isn't bound to anything, but we do know it's
% different from 1.
dif:dif(X,1).
?- dif(X, 1), X = 2.
% Here we have all the information to actually know what X is, so there are no
% residual goals.
X = 2.
I'm pretty sure "consulting" has a pretty well defined meaning (it's whatever ?- [user].
a(1).
a(2).
a(3).
b(1).
a(4).
% Warning: overwriting a/1 because the clauses are discontiguous
?- a(X).
X = 4. You can, however, define a clause to be discontiguous with a directive: ?- [user].
:- discontiguous(a/1).
a(1).
a(2).
a(3).
b(1).
a(4).
?- a(X).
X = 1 % Now it works!
; X = 2
; X = 3
; X = 4.
?- [user].
a(5).
a(6).
?- a(X).
X = 1
; X = 2
; X = 3
; X = 4
; X = 5
; X = 6. % Even over different consults! A predicate being |
Great, makes sense. My guess is the proper balance here would be to provide solid documentation and not do anything surprising like "automatically make every fact in a consult discontiguous/dynamic on behalf of the user" (a thought I had considered) because that would probably piss off people who actually know what they are doing with scryer and prolog. Instead, we should provide adequate documentation for client library developers so that they understand these nuances and provide a good interactive experience to their users. |
Yes, but when doing a C API for embedding usually you are intending to make it usable in every language (that includes languages like C where JSON is very inconvenient). Providing a JSON (or at least some dict like object) interface for higher level languages makes complete sense, but things like that are usually built on the more raw C API by wrapper libraries. So like, the C API is for everyone, for Python you would use "scryer-python" from pip (whenever someone does that) which provides a Pythonic API built on the low level C API. This low level C API to high level language wrapper pattern is very common because it gives the maximum flexibility for everyone.
That is the difficult part that I think we may have to just ignore for now. Usually at this point we have to create data structures in C that can accurately represent Prolog concepts like terms, and all the machinery to deal with them ((de)allocation, construction, inspection, etc...). As you certainly know, if you are just gonna wrap it in Python or C# then JSON is good enough for now, so I think it's fine if you only implement functions like EDIT: Actually, now that I thought about it, it wouldn't be as difficult as I imagined, because I suppose Scryer already has all the data structures and the machinery to deal with them implemented in Rust. We would need only to make C wrappers in a way that makes sense, but even this seems very hard and tedious. |
I think this is a very valuable insight, worth pursuing! An ideal API could maybe find a way to make all residual constraints available, with concrete bindings maybe simply a special case of any goal that can appear in answers. This could be very interesting for users of the API who actually want to reason about remaining constraints, such as in theorem provers etc. |
We can use ?- X #> 3, copy_term(X, X, Gs). Gs = [clpz:(X in 4..sup)], clpz:(X in 4..sup). |
Yes, the fact that |
c3bffdc
to
4768c01
Compare
|
I wrote a more elaborate test case that is breaking, not printing results correctly. Should probably be included in this ticket. tl;dr ?- solve(N, Moves).
%@ N = 1, Moves = [from_to(a,c)]
%@ ; N = 1, Moves = [from_to(a,d)]
%@ ; ... . does not print correctly, @bakaq ? It's coming out {"result":[{"Moves":["from_to(a","c)"],"N":1}],"status":"ok"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- A couple comments in the test look like they are outdated.
- Why is
wasm-bindgen
now added to the general dependencies in addition to the wasm-only dependencies?
For including the docs it might be easier to use a semi-absolute path by making it relative to the project root.
This can be done by using the environment variableCARGO_MANIFEST_DIR
. While building crates cargo sets this to the folder containing theCargo.toml
file.#![doc = include_str!(concat!(env!("CARGO_MAINFEST_DIR"),"/docs/shared_library/README.md"))]
do we need to document setting this environment variable for building crates or is that already handled in the cargo build?
cargo
handles setting the environment variable, no need to document it.
tests/scryer/shared_library_tests.rs
Outdated
// uncomment if we can figure out why this isn't working | ||
// use crate::lib::dll::{machine_free}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// uncomment if we can figure out why this isn't working | |
// use crate::lib::dll::{machine_free}; |
Cargo.toml
Outdated
@@ -80,6 +81,7 @@ ego-tree = "0.6.2" | |||
|
|||
serde_json = "1.0.122" | |||
serde = "1.0.204" | |||
wasm-bindgen = "0.2.93" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is wasm-bindgen
now added as a general dependency in addition to a wasm only dependency below? it should still only be necessary when compiling to wasm
Edit: "[{\"Moves\":[\"from_to\":[a,c]],\"N\":1}]" Edit2: This particular case makes me think we should give up on returning results as JSON and just return the raw results from the heap printer. There is no JSON equivalent for this: ?- solve(N, Moves).
%@ N = 1, Moves = [from_to(a,c)]
%@ ; N = 1, Moves = [from_to(a,d)]
%@ ; N = 2, Moves = [fill(c),from_to(a,d)]
%@ ; N = 2, Moves = [fill(d),from_to(a,c)]
%@ ; ... . the best we could do would be return Excuse my poor Prolog, but even when I attempted this: ?- solve(N, Moves), maplist(write_canonical, Moves).
%@ from_to(a,c) N = 1, Moves = [from_to(a,c)]
%@ ; from_to(a,d)N = 1, Moves = [from_to(a,d)]
%@ ; fill(c)from_to(a,d)N = 2, Moves = [fill(c),from_to(a,d)]
%@ ; fill(d)from_to(a,c)N = 2, Moves = [fill(d),from_to(a,c)]
%@ ; ... .
The canonical form of To be really clear about it, it's clear from this example JSON does not support my usecase for making this PR in the first place, I'd have to write my own parser for these strings anyway, so we should probably just return the strings! Edit3: It looks like @bakaq 's PR #2493 is actually the perfect solution to this, crisis averted! Edit4: I rebased onto #2493 and it works great. {"result":{"bindings":{"Moves":[{"args":[{"atom":"a"},{"atom":"c"}],"functor":"from_to"}],"N":1}},"status":"ok"} I couldn't imagine anything better!! It will even be worth redoing all the documentation. 😑 Again. 😭 |
464a6b5
to
7987c4f
Compare
Co-authored-by: Kauê Hunnicutt Bazilli <[email protected]> Co-authored-by: Bennet Bleßmann <[email protected]> https://github.com/mthom/scryer-prolog/pull/2465/files/ab2ea3da1c037b6c69f110e3ac2820ae81ac630c#r1723705946 https://github.com/mthom/scryer-prolog/pull/2465/files/ab2ea3da1c037b6c69f110e3ac2820ae81ac630c#r1723703554 https://github.com/mthom/scryer-prolog/pull/2465/files/ab2ea3da1c037b6c69f110e3ac2820ae81ac630c#r1723919182 https://github.com/mthom/scryer-prolog/pull/2465/files/ab2ea3da1c037b6c69f110e3ac2820ae81ac630c#r1723921564 https://github.com/mthom/scryer-prolog/pull/2465/files/ab2ea3da1c037b6c69f110e3ac2820ae81ac630c#r1723926390 https://github.com/mthom/scryer-prolog/pull/2465/files/ab2ea3da1c037b6c69f110e3ac2820ae81ac630c#r1723927197 https://github.com/mthom/scryer-prolog/pull/2465/files/ab2ea3da1c037b6c69f110e3ac2820ae81ac630c#r1723711518 https://github.com/mthom/scryer-prolog/pull/2465/files/ab2ea3da1c037b6c69f110e3ac2820ae81ac630c#r1723715803 https://github.com/mthom/scryer-prolog/pull/2465/files/ab2ea3da1c037b6c69f110e3ac2820ae81ac630c#r1723770872 https://github.com/mthom/scryer-prolog/pull/2465/files/ab2ea3da1c037b6c69f110e3ac2820ae81ac630c#r1723913030 fix file organization remove dead file fix file structure, fix cbindgen.toml, cargofmt, fix docs
|
||
See [`scryer_python.py`](scryer_python.py) for a complete example. | ||
|
||
* Note that the JSON printer returns results that are not completely accurate when it comes to differentiating strings from symbols, e.g.: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still the case? I think the JSON printer implemented by @bakaq does this correctly!
src/ffi/shared_library.rs
Outdated
let value: Value = serde_json::from_str(&format!("{}", query_resolution_line)).unwrap(); | ||
json!( { | ||
"status": "ok", | ||
"result": value | ||
}) | ||
.to_string() |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
src/ffi/shared_library.rs
Outdated
let obj = serde_json::from_str::<serde_json::Value>(&v).expect("Bad JSON"); | ||
|
||
let output_string = json!({ | ||
"status": "ok", | ||
"result": obj | ||
}) | ||
.to_string(); |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
dc1b52f
to
8b307a1
Compare
It's also entirely possible that after @Skgland 's PR #2505, we will also accept queries and code in the form of serialized data -- but depending on timing might be better to put that in another PR. However this would be much more secure, especially for queries, than allowing for raw string injection. |
If the plan is to leave room for non-string based APIs in the he future, are we fine with the string based APIs taking the current names? It feels like we currently use the preferred names for the non-preferred API. |
I'm fine very with changing them. Any suggestions? |
First a list of the functions we might want to rename and why
Not sure what to do about My immediate suggestions would be (though I am not completely happy with them)
On thing that might make this easier to not provide Instead of Machine *m = scryer_machine_new();
char* resuls = scryer_run_query("A = 'Scryer Prolog'");
scryer_machine_free(m);
scryer_string_free(result); one could then do Machine *m = scryer_machine_new();
char* resuls = scryer_all_query_resuls_json(scryer_run_query_iter("A = 'Scryer Prolog'"));
scryer_machine_free(m);
scryer_string_free(result); This could help limit the combinatoric explosion and with With that we could use the following names (I like these better than the ones above)
or with
If we already had a C-API ready I would suggest not adding the On a side note: on a quick glance it looks like everything in Splitting it like this could help to make sure that everything that can be done from C can also be done from rust i.e. the shared library doesn't accidentally depends on or exposes Related to that thought but not this PR: Could this also be done for the I think it would be good to here the others opinion on all of this as well.
Also should we maybe move the discussion to a Discussion? This PR thread is growing rather long. |
@jjtolton, @Skgland, @bakaq: First, thank you all for everything you are doing here! I am following your work with great interest and I think many excellent points were raised and have also been addressed already! I'm amazed by the quality and intensity of your contributions. I hope two additional points may be useful:
Thank you a lot, and please keep up the great work! |
@triska regarding 1, we had a conflict, but the |
@Skgland Also, to everyone -- One additional point to consider -- if we are going to change the API like this -- is there any reason at all we should ask the user to keep track of the I want to emphasize the only reason it exists is because I didn't want to step on any toes by making it an attribute of the The only other reason we would potentially want to have a Personally, I think overall the API would be much nicer if we got rid of the |
In the Rust side it makes sense because something has to implement
I'm really not sure about that. The number of APIs would still be the same, but then the user would have to handle the state implicitly. // Current
Machine *machine = scryer_machine_new();
QueryState *query_state = scryer_machine_query_iter("X = 1; X = 2.");
char *leaf_answer_1 = scryer_query_next(query_state);
char *leaf_answer_1 = scryer_query_next(query_state);
scryer_query_state_free(query_state);
scryer_free_c_string(leaf_answer_1);
scryer_free_c_string(leaf_answer_2);
scryer_machine_free(machine);
// Query state in Machine struct
Machine *machine = scryer_machine_new();
scryer_machine_query_iter("X = 1; X = 2."); // Magic happens here
char *leaf_answer_1 = scryer_query_next(machine);
char *leaf_answer_2 = scryer_query_next(machine);
scryer_query_state_free(machine); // We would still need to finish the query somehow
scryer_free_c_string(leaf_answer_1);
scryer_free_c_string(leaf_answer_2);
scryer_machine_free(machine);
// But what would happen in this situation?
Machine *machine = scryer_machine_new();
// !!??? We haven't initialized a query yet!
// What should this do? Undefined Behavior? Should we check
// if a query has been initialized?
char *leaf_answer_1 = scryer_query_next(machine); In my experience with C APIs, OpenGL-like APIs with global/implicit state (which is what GNU Prolog seems to do) are terrible footguns in an already footgun infested language. I like SDL/Vulkan-like APIs with explicit state with opaque pointers much more, because they are harder to misuse. |
That's all fine. In that case, at least from the shared library perspective, I think we should refer to the query state pointer as a leaf answer cursor rather than a query state, I think it will seem less like a chore and line up with more familiar concepts. |
Yes, and |
|
if self.source: | ||
self.consult(self.source) | ||
if query: | ||
return self.lazy_eval_context(query) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Giving this a bit of a test drive since it's perfect for a prototype I am working on, so thanks so much!
I just wanted to point out quickly that returning three different things like this causes some Python typing issues when used in the way of the examples:
Cannot access attribute "lazy_eval_context" for class "ContextManager[Generator[SCRYER_QUERY_RESULT, None, None]]"
Attribute "lazy_eval_context" is unknown (Pyright reportAttributeAccessIssue)
──────────────────────────────────────────────────────────────────────────────
https://github.com/microsoft/pyright/blob/main/docs/configuration.md#reportAttributeAccessIssue
Closes: #2464
My barbarian efforts to make the WASM
eval_code
functionality available from the DLL for use in other languages. I genuinely hope I did not insult any rust or prolog users here with these efforts!Outstanding tasks from discussion:
cbindgen.toml
#[export_name = ...]
tocbindgen.toml
prefix**Updated TODOs in response to discussion:**
zero copy(non-goal -- text based answers are fine, I think)Machine
pointer referencescryer_
#[repr(C)]
** C-API goal (future enhancement) **
** lower level pure scryer prolog goals (future enhancement) **
run_query()
in terms ofrun_query_generator()
lib_machine::Machine::run_query()
andlib_machine::Machine::run_query_generator()
(thanks @bakaq !)run_query_generator()
express "determinism" (thanks @bakaq !)**Possible additional goals:**
- provide a way for multiple simultaneous generative queries, either via WAM cloning or scryer continuations?Possible addition to `README.md` (going to be moved to subdirectory instead of `README.md`)
Steps to Use the Shared Library:
Locate the Library: After building Scryer Prolog, you'll find the shared library in:
<PATH-TO>/scryer-prolog/target/release
.Replace
<PATH-TO>
with the actual path where you installed Scryer Prolog on your computer. The library will have a name likelibscryer_prolog.so
(Linux),libscryer_prolog.dylib
(macOS), orscryer_prolog.dll
(Windows).Load the Library: Here are basic examples for common languages:
C/C++ (Linux/macOS):
Python: You'll typically use libraries like
ctypes
:Access Functions: Once the library is loaded, you can call its functions (like
consult_module_string
,run_query
) just like you would any other function in your code. See the Python reference implementation below.Important Notes:
The following functions are exposed:
Shared Library Functions Summary
This table summarizes the Rust functions you provided, detailing their names and descriptions:
machine_new()
machine_free()
.machine_free()
machine_new()
.start_new_query_generator()
cleanup_query_generator()
run_query_generator()
load_module_string()
consult_module_string()
run_query()
free_c_string()
Shared Library Memory Management Considerations and Usage Overview
Ownership Transfer:
The Rust library uses C-style strings (
CString
) to return data. To avoid premature cleanup by Rust's garbage collector, ownership of the string pointer is transferred to the client application. This means the client becomes responsible for freeing the memory usingfree_c_string()
. Failure to do so will cause memory leaks.Thread-Local State:
The library utilizes thread-local storage (
thread_local!
) to maintain state between function calls within the same thread. You must initialize this state withmachine_new()
and release it withmachine_free()
when finished. Neglecting to callmachine_free()
will also lead to memory leaks.Using the Library: A Step-by-Step Guide
Initialization (
machine_new()
): Before using any other function, initialize the Prolog machine usingmachine_new()
. This creates a thread-local state object necessary for subsequent operations.Loading Data: You have two options for loading Prolog data:
Direct Consultation: Use
consult_module_string()
to load and immediately run Prolog source code from a C string. Think of it as directly providing the source code to Scryer Prolog.Query Generation:
start_new_query_generator()
, passing your query (in the form?- ...
) as input.run_query_generator()
repeatedly until it returns{ "status": "ok", "result": ["false"] }
. This indicates that all results have been generated.Running Queries:
run_query()
.Cleanup:
cleanup_query_generator()
after finishing with a query generator to release associated resources (unless it's done internally for you).machine_free()
once you are done using the Prolog machine, regardless of the loading and querying methods used.Example Python Client Implementation
# Concerns
load_module_string
andconsult_module_string
-- I included both because maybe there is a nuanced difference, but I can't tell which one is which.I'm not a rust specialist, and I feel like I could have done the feature/cfg targets much better.(being moved to inline modules)Still not a rust specialist, and I am a bit concerned about using the thread-local "globals" for maintaining state, but that was the best way I could figure out how to do it. Please let me know if anyone has any better ideas.(refactored to individual machine references)