Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: PostgreSQL JSON and jsonb types #561

Merged
merged 7 commits into from
Jan 30, 2017

Conversation

emk
Copy link
Contributor

@emk emk commented Dec 31, 2016

NOT READY FOR MERGE. Just for comments and feedback. :-)

We've been experimenting with using Diesel in a production application, and we really like the query builder! As discussed in #44, we have an extremely pressing need for PostgreSQL JSON and jsonb support, which we currently support using rust-progres.

@killercup suggested:

Or, you could try and do this as a plugin crate. (If you think adding this to diesel is easier and more efficient, you can of course still open a PR!)

I think it will be much easier to implement the first version as a PR against diesel itself. Once it works, we can make the decision about whether to move it to a plugin crate or not.

The current version of the code provides types::Json, types::Jsonb, ToSql and FromSql with basic unit tests. We also need more detailed tests in diesel_tests, and there are a couple of open questions I'm trying to figure out right now. But I thought we should get some actual code to look at and comment on. :-)

Your suggestions and comments are very much appreciated!

emk added 2 commits December 31, 2016 08:33
This is inspired by the following rust-postgres code:

https://github.com/sfackler/rust-postgres/blob/de46ba2176d438ddc0b5d0752a6084f241b22242/postgres-shared/src/types/serde_json.rs

Note that we only support encoding version 1, which is all that
rust-postgres supports.  But this seems to work fine in practice with
our database.  We can add other encodings as necessary.
@emk emk mentioned this pull request Dec 31, 2016
This allows us to verify that PostgreSQL actually understands the data
that we're generating, and vice versa.
@emk
Copy link
Contributor Author

emk commented Dec 31, 2016

OK, I think we're getting pretty close to a workable prototype! All the tests pass (including the new round-trip tests) and the following compiles:

table! {
    jsons {
        id -> Integer,
        data -> Jsonb,
    }
}

I'm going to try converting more code in one our projects, and see how it works. But first, a quick discussion of why I built this the way I did.

Why serde_json::Value is a good default JSON representation in Rust

When working with JSON data, there are three representations we might want to use:

  1. A raw JSON &str or &[u8]. This is the format in which programs will tend to send and receive JSON data.
  2. A "parsed" representation of JSON. For example, serde_json::Value has the constructors Value::Bool, Value::Null, Value::String, etc., and it allows us to manipulate JSON data easily.
  3. A Rust data structure. For example, we might want to convert between { "x": 1, "y": 2 } and Point { x: 1, y: 2 }.

An ideal Rust JSON library should make it simple to convert between all three representations. There are a number of Rust libraries for working with JSON. The most popular are rustc_serialize and serde_json. Ideally it would be possible to support both, but if we were to choose one, serde_json is probably the better choice, because it has excellent support for converting between (1), (2) and (3). At the moment, some serde features do require a codegen plugin, but that will go away when Macros 1.1 lands. Also, serde appears to be the favorite choice among newer crates. I'd be happy to try to implement rustc_serialize, though!

Should we support conversion to and from String and Vec<u8> using ToSql and FromSql?

If it's possible to easily support both these formats and serde_json::Value, then I would definitely be in favor of doing so!

What about the JSON query types described in #44 (comment)?

These are really interesting, but they're probably less urgent than simply reading and writing JSON data. Also, I'm not sure how to implement them. But I'd be happy to accept suggestions!

Is this code ready to merge?

Well, I'd love some feedback first! But so far, it seems to cover the basic use cases. I'm going to test this further by converting more internal code, and make sure there are no surprises. I'll let you know how it goes.

I'd also love feedback and test reports from other potential users! To test this out, run:

cd myproject-using-diesel
git clone -b pg-json-types https://github.com/faradayio/diesel.git

...and add the following to your Cargo.toml:

[replace]
"diesel:0.9.1" = { path = "diesel/diesel" }
"diesel_codegen:0.9.0" = { path = "diesel/diesel_codegen" }
"diesel_codegen_shared:0.9.0" = { path = "diesel/diesel_codegen_shared" }
"diesel_codegen_syntex:0.9.0" = { path = "diesel/diesel_codegen_syntex" }

Copy link
Member

@killercup killercup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woah, nice! That was quick! Still made it for 2016 ;) And thanks for your in-depth comments! I agree with basically all of it.

I think it will be much easier to implement the first version as a PR against diesel itself. Once it works, we can make the decision about whether to move it to a plugin crate or not.

The reason I originally wanted this as external crate is because the functions and operators #44 describes are pretty Postgres-specific. But this basic support looks like it can also work with SQLite's JSON type. What I am concerned about though, is how the orphan rules might come to bite us. If we define the types in diesel itself, we can't easily add foreign trait impls in an external crate.

Edit: Btw, I'm 75% certain that your code (with namespaces adjusted) would work as an external crate as-is. :)

Why serde_json::Value is a good default JSON representation in Rust

Indeed, AFAIKT serde is the de-facto library of choice here.

Should we support conversion to and from String and Vec using ToSql and FromSql?

Hmm, you mean like update(foo).set(bar.eq(r#"{"baz": true}"#)).execute(&connection)? I'm not sure if we should go that far. Is that a use case you have? Users can probably replace the string above with r#"{"baz": true}"#.parse()? and it will work (or something similar; FromStr works with serde, right?).

What about the JSON query types described in #44 (comment)?

You are right that supporting those and adding support for the basic data type are two different steps. I'm okay with adding basic support and worrying about more later.

Also, I'm not sure how to implement them.

There are infix_predicate! and sql_function! that may help.

Is this code ready to merge?

Other than the inline comments (from a quick read-though): Can you add some integration tests? E.g., create a table that has a jsonb column address, define a #[derive(Serialize, Deserialize)] struct Address { … }, and do some CRUD work on that column? This could also be done in a doc comment, so we generate a bit of documentation at the same time :)

Also, @sgrif will be back from vacation next week, so I want to wait and see what he thinks about this.

#[derive(Debug, Clone, Copy, Default)] pub struct Json;

#[cfg(feature = "serde_json")]
/// The JSON SQL type. This type can only be used with `feature =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/JSON/JSONB, or this is just the same as above :)

Can you maybe add a sentence that JSONB is storing the data in binary and thus more efficiently, and link to https://www.postgresql.org/docs/9.6/static/datatype-json.html? Maybe even quote:

In general, most applications should prefer to store JSON data as jsonb, unless there are quite specialized needs, such as legacy assumptions about ordering of object keys.

use types::{self, ToSql, IsNull, FromSql};

primitive_impls!(Json -> (serde_json::Value, pg: (114, 199)));
primitive_impls!(Jsonb -> (serde_json::Value, pg: (3802, 3807)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to add a comment on those magic oid/array_oid numbers here, with links to the postgres spec. But I can't find them, except for in other implementations and the source code…

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, my only source for the magic numbers was Googling around and looking at what other libraries did, then confirming that it worked. :-/ I'll keep looking.

fn from_sql(bytes: Option<&[u8]>) -> Result<Self, Box<Error+Send+Sync>> {
let bytes = not_none!(bytes);
serde_json::from_slice(&bytes)
.map_err(|e| Box::new(e) as Box<Error+Send+Sync>)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, is that as Box<Error+Send+Sync not inferred?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, that does not seem to be the case. I tried removing it and got expected trait std::error::Error, found enum pg::types::json::serde_json::Error.

impl FromSql<types::Json, Pg> for serde_json::Value {
fn from_sql(bytes: Option<&[u8]>) -> Result<Self, Box<Error+Send+Sync>> {
let bytes = not_none!(bytes);
serde_json::from_slice(&bytes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably unnecessary & (bytes is already a reference)

@emk
Copy link
Contributor Author

emk commented Dec 31, 2016

Thank you for the detailed review! I'll respond to your detailed comments with either fixes or responses shortly.

Edit: Btw, I'm 75% certain that your code (with namespaces adjusted) would work as an external crate as-is. :)

Is primitive_impls! exported somewhere? Because I've been trying to add Diesel integration for several types in https://github.com/faradayio/bigml-rs, and every time I try it, I wind up missing the queryable_impls! and expression_impls! macros that are used by primitive_impls!. If there were some way to access the necessary macros, then this would probably work fine.

Hmm, you mean like update(foo).set(bar.eq(r#"{"baz": true}"#)).execute(&connection)? I'm not sure if we should go that far. Is that a use case you have?

No, we have no immediate use case for this. The only thing I can imagine is grabbing raw JSON and sending it directly to a web client without deserializing and re-serializing it. If somebody wants it, they can always add it later. :-)

@emk
Copy link
Contributor Author

emk commented Dec 31, 2016

Other than the inline comments (from a quick read-though): Can you add some integration tests? E.g., create a table that has a jsonb column address, define a #[derive(Serialize, Deserialize)] struct Address { … }, and do some CRUD work on that column? This could also be done in a doc comment, so we generate a bit of documentation at the same time :)

I'd love to add better integration tests! Right now, we do have the tests in diesel_tests that test the actual translation of JSON and jsonb between PostgreSQL and diesel, and those seem to work. This provides almost exactly the same test coverage as the existing uuid type, which I used as an example.

Where should I add tests that create a table and do more complex experiments? If you can point me at an existing file, I'd be happy to generalize from there.

I'd also love to be able to test something like the code below, but I can't get table! or Queryable working in any of the obvious places (doc comments, etc.) without a bunch of errors.

#[cfg(feature = "postgres")]
#[allow(dead_code)]
mod jsonb_compile_test {
    extern crate serde_json;

    table! {
        json_values {
            id -> Integer,
            value -> Jsonb,
        }
    }

    #[derive(Queryable)]
    struct JsonValue {
        id: i32,
        value: serde_json::Value,
    }

    fn get_json_value(conn: &diesel::Connection, id: i32)
                      -> Result<JsonValue, diesel::Error> {
        json_values::table.find(id).load::<JsonValue>(conn)
    }
}

Another patch is incoming soon which should address most of your other comments.

@emk
Copy link
Contributor Author

emk commented Dec 31, 2016

OK, I've pushed a bunch of fixes suggested in the code review. Thank you for looking at this PR!

@killercup
Copy link
Member

Happy new year!

Is primitive_impls! exported somewhere?

Oh, good point. Only a few macros are exported. And I'm not sure if we want to expose primitive_impl! as-is because we'll probably need to guarantee backward compatibility for its implementation details (macros are hard). I'll have to think about that. @sgrif probably has some thoughts on that as well.

Where should I add tests that create a table and do more complex experiments? If you can point me at an existing file, I'd be happy to generalize from there.

diesel_tests is a complicated beast! :) There is a lot in there and I can't pretend to know all of it. There is even a small schema creating DLS! I'd suggest looking at this (or similar) tests that define a new table in a transaction.

If you want to dig even deeper into diesel tests, have a look at types_roundtrip.rs :)

For doc tests, I don't think it's easily possible to add custom derive there. You can use the regular macros with the struct definition though. A lot of magic is also done in src/doctest_setup.rs. Have a look at this doc comment as a example.

// https://www.postgresql.org/message-id/CA+mi_8Yv2SVOdhAtx-4CbpzoDtaJGkf8QvnushdF8bMgySAbYg@mail.gmail.com
// https://www.postgresql.org/message-id/CA+mi_8bd_g-MDPMwa88w0HXfjysaLFcrCza90+KL9zpRGbxKWg@mail.gmail.com
primitive_impls!(Json -> (serde_json::Value, pg: (114, 199)));
primitive_impls!(Jsonb -> (serde_json::Value, pg: (3802, 3807)));
Copy link

@jashank jashank Jan 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The C interface defines, in server/catalog/pg_type.h, the macros JSONOID and JSONBOID as 114 and 3802, respectively. Not sure why there's no better documentation than that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that sounds like a solid reference :) Maybe we can link to a current release: https://github.com/postgres/postgres/blob/REL9_6_1/src/include/catalog/pg_type.h#L623-L627

(We should probably go through the code and add more references for type OIDs… I hate magic numbers)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seconded. This seems like something that should be added in the pq-sys bindings, too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually get these by querying the pg_type table FWIW. Eventually I'd like to just load these dynamically, as they technically aren't supposed to be assumed stable (in practice they are though other than user defined types so it's very low priority and possibly not even possible with the rest of the project's goals)

@sgrif
Copy link
Member

sgrif commented Jan 3, 2017

NOT READY FOR MERGE

Ok merging. :trollface:

@sgrif
Copy link
Member

sgrif commented Jan 3, 2017

At the moment, some serde features do require a codegen plugin, but that will go away when Macros 1.1 lands.

The master branch is for 0.10, which will launch no earlier than the release of Rust 1.15 which will stabilize Macros 1.1, so that's not a concern.

@sgrif
Copy link
Member

sgrif commented Jan 3, 2017

looks like it can also work with SQLite's JSON type.

Let's not worry about that type for now (or any type that is an extension to any backend).

@sgrif
Copy link
Member

sgrif commented Jan 3, 2017

I think we should support FromSql for String and Vec<u8>, but not ToSql. It's definitely reasonable to want to send JSON from the database unchanged to a client, and skip the deserialize/serialize step. For ToSql though, we only want to support types that we are reasonably sure are valid JSON.

/// - [`serde_json`][Value]
///
/// [Value]: https://docs.serde.rs/serde_json/value/enum.Value.html
#[derive(Debug, Clone, Copy, Default)] pub struct Jsonb;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should call this type JsonB and have #[doc(hidden)] pub type Jsonb = JsonB to make schema inference happy like we do for VarChar?

@@ -13,11 +13,12 @@ dotenv = "0.8.0"
[dependencies]
assert_matches = "1.0.1"
chrono = { version = "^0.2.17" }
diesel = { path = "../diesel", default-features = false, features = ["quickcheck", "chrono", "uuid"] }
diesel = { path = "../diesel", default-features = false, features = ["quickcheck", "chrono", "uuid", "serde_json"] }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should add an everything feature so we use that here (which would also reduce churn in the travis scripts)

@@ -387,6 +387,41 @@ fn pg_uuid_to_sql_uuid() {

#[test]
#[cfg(feature = "postgres")]
fn pg_json_from_sql() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you mean serde_json_value_from_sql here.

@sgrif
Copy link
Member

sgrif commented Jan 3, 2017

When I originally wrote #44, I had envisioned that we would have our own enum which was basically a copy of the "json value" enum that both rustc-seriailze and serde provide. However, that was over a year ago, and a lot has changed. I'm fine with just targeting serde.

I also envisioned (and would still like to somehow have) an impl that was roughly:

impl<T: serde::Deserialize> FromSql<types::Jsonb> for T

But that will likely never be possible within Rust's coherence rules until we figure out some method of permitting overlap.

Can we see if we can avoid having JSON be treated as a separate type from JSONB internally? I get a result from select * from pg_cast inner join pg_type a on a.typname='json' and a.oid=castsource inner join pg_type b on b.typname='jsonb' and b.oid=casttarget; so I think we can. It will simplify operators and functions if we can treat them as the same type internally. So just doing pub type Json = JsonB should suffice.

Overall this code seems fine for the parts that it has tried to accomplish so far.

@sgrif
Copy link
Member

sgrif commented Jan 3, 2017

Also, I'm not sure how to implement them. But I'd be happy to accept suggestions!

This can be a separate PR, but the implementation of array comparison operators is probably a good place to start. Relevant code is here and here

@sgrif
Copy link
Member

sgrif commented Jan 3, 2017

Thank you for working on this! I'm excited to hear you guys are experimenting with Diesel. Let me know if there's anything else I can do to help.

@norcalli
Copy link

norcalli commented Jan 9, 2017

What would you say is left for this PR to be acceptable? Implementing operators, integration tests, and perhaps the JsonB type alias?

@emk
Copy link
Contributor Author

emk commented Jan 9, 2017

Yes, it would really help to have a list.

I've been procrastinating a bit on the integration tests, because the requested tests for json/jsonb are much more extensive than the existing tests for UUID, so I'm going to have to come up to speed on a fair bit of the code base to implement them.

I think operators should not block the list.

@killercup
Copy link
Member

killercup commented Jan 9, 2017 via email

@norcalli
Copy link

norcalli commented Jan 9, 2017

I agree that operators are not essential and should/could come later. If I have some free time, I'll see about making some integration tests for you as well in a PR against this branch. Between all of us, someone should be able to get in an integration test or two :P

@killercup
Copy link
Member

@norcalli Did you write any tests yet? I just wrote a doc test and opened faradayio#1 for it. More tests will be better though :)

@emk
Copy link
Contributor Author

emk commented Jan 13, 2017

@killercup I've merged your doc test onto my PR branch. Thank you!

@killercup
Copy link
Member

You're welcome. I'd love to add some more integration tests/doc comments with custom derives using serde (because this will show the roundtrip how its meant to be), but I saw we didn't use codegen in doc comments anywhere. Probably with a good reason; I opened #576 nevertheless.

Anyway, I wont get to it today, but I'll try to add a few integration tests tomorrow or on Sunday – unless @norcalli wants to/has time to write some.

Do you have any other things you want to add here before landing this, @emk?

@emk
Copy link
Contributor Author

emk commented Jan 13, 2017

Nope. I'm pretty happy with this personally, but if you want to combine the Json and Jsonb types somehow, we should do that.

I'm dealing with an urgent production problem today, unfortunately.

@killercup
Copy link
Member

@emk I have some time this weekend and would love to merge this. The type Json = Jsonb thing is still open. If you have some time, it would be awesome if you want to tackle this, otherwise I'll try to do that, or leave it open for a future PR.

@killercup
Copy link
Member

So, before this lays around for another week: Let's merge this! We can always tweak it some more later on.

@killercup killercup merged commit ab110c8 into diesel-rs:master Jan 30, 2017
@dtolnay dtolnay mentioned this pull request Jan 31, 2017
sgrif added a commit that referenced this pull request Feb 2, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants