-
-
Notifications
You must be signed in to change notification settings - Fork 229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UNLIST table-valued function #8005
Comments
> The RETURNING syntax look weird, but .... > If data type conversion is omitted, resulting field has the data type of the first input parameter. > But I'm not sure about the column name - should we also imply UNLIST, or maybe use keyword VALUE or maybe some other ideas? |
As I said, this is the standard way used in some other functions. And standard support for JSON functions is implemented by RedSoft and will be offered as PR. So it seems we'll have the "weird" RETURNING syntax anyway ;-) Thus I do not see as a big problem using it for |
As for |
I've edited description to be better readable. |
UNLIST(...) [AS ALIAS[(COL1, ..., COLN)]] ? |
Does it makes sence to have both ? UNLIST(... [AS | RETURNING ...]) |
The fact that we allow derived tables without a correlation name is actually a bug if you look at the standard (e.g. see my second comment on #7574). Personally I think the And otherwise, using |
In other words
(i.e. not optional!) |
Mark, while |
Given the correlation clause should be required, I think having two |
Yes it is, but that is because for normal derived tables you're able to specify column names in the query expression itself, and that doesn't apply here, so naming the column should be required here together with the correlation name. |
So far I tend to agree. |
Though it is true, I wonder why (in both cases) we cannot generate them as usual "expression", "expression1", etc. |
Agreed - 2 AS one after another is not good idea. |
Because derived tables must have explicitly specified column names. You shouldn't rely on generated names. The fact it is allowed (or maybe condoned) for top-level queries and not needed for (single-column) query expressions as columns or for IN/EXISTS/SINGULAR doesn't mean it is allowed for derived tables and CTEs. |
They're somewhat separated -- |
While I'm not insisting, must note that we already have this with CAST: |
That is a good point, but I don't think we should perpetuate it, especially not if there is a standard-endorsed alternative. To quote from 6.26 <JSON value function>:
|
I'm wondering, if we would allow where ID in(unlist(...)) |
Or otherwise, only explicitly allow its use without |
Hi What i have in mind. Currently we have multiple stored procedure like this: CREATE PROCEDURE GET_ROWS_FROM_LISTB_4COLS (
ITEM_LIST BLOB SUB_TYPE 1,
SEPARATOR_LIST VARCHAR(20) DEFAULT ';',
SEPARATOR_SUB_LIST VARCHAR(20) DEFAULT ',' )
RETURNS (
ITEM1 VARCHAR(120),
ITEM2 VARCHAR(120),
ITEM3 VARCHAR(120),
ITEM4 VARCHAR(120) ) and i can call it with something like this. SELECT * FROM GET_ROWS_FROM_LISTB_4COLS ('1111,2222,3333,4444;5555,6666,7777,8888;9999,0000,1234,4567', ';', ',') and i got as result:
i also can ommit somthing in the list and got null or empty string in this case SELECT * FROM GET_ROWS_FROM_LISTB_4COLS ('1111,,3333,;5555,6666;9999,0000,1234,4567', ';', ',')
If your implementation will support above it will be wonderfull. SELECT * FROM GET_ROWS_FROM_LISTB_4COLS ('1111,,3333,;5555,6666;9999,0000,1234,4567', [';', '#', '&'], [',']) if your implementation will alow this will be excelent! |
@livius2: looks like you are asking for CSV string to be parsed and returned in tabular format. |
as rows and columns. With one returned column it can be used in |
Csv and other ideas other than unlist are not discussed in this ticket. If you want other opportunities to hover tricky blobs, create separate tickets. |
Off-topic disclaimer This is very simple implemented with single UDR routine for multiple possible data sources, based in the metadata of the stored procedure defined. Look at this example for JDBC data sources with FB/Java: https://github.com/FirebirdSQL/fbjava/blob/master/examples/fbjava-example/util/code.sql#L35 External routines (aka UDR) are very democratic and there to be used with different languages. It's time for community to even do these useful routines and share them. |
Why do you think about udr here? It does not support anytype parameter/returns declaration, built in function can. And why it is not releated to this ticket? It have strong releation to it. |
And to be more releated to this ticket ;-) and asked questions by @dyemanov: below you have explicite Field name `FIELD_1' and as type:
Or why not derived table syntax? Name of the field after. Default can be select * from unlist('1,2,3') AS U (MY_VALUE) select * from unlist('1,2,3' AS INTEGER, ',') AS U (MY_VALUE) It will allow in the future extensions as proposed by me. And i opt to not trim values as default. If trim required, add another optional parameters or if spaces are always e.g. one, someone should put it in separator e.g. `, '. select * from unlist('1, 2, 3' AS INTEGER, ', ') AS U (MY_VALUE) |
A Udr with a single entry point can be declared with different names and a different set of input and output parameters. Metadata of such sets of procedures can be obtained inside the implementation and used when parsing csv. This is exactly what was hinted at. |
Having built in function work in any database even if you have not DDL privilege. With udr, you must have tones of declaration for same thing with different types. And as in my last comment, supporting such things with udr is missconception. But this is my humble opinion, you can ignore it. I can only bring light for real needs from other side, user side. |
Personally, I can accept multiple separators in More complex parsing surely does not fit this proposal. |
This would be great and actually our users expect something like that (as it's simpler to use). However, the standard plays against us, or maybe I just fail to figure it out. This is possible by standard (for select * from T1 JOIN UNNEST AS T2 ON ... but |
I'm not against many separators, I'm against unlist, which returns many columns according to complex rules. Lots of comma delimiters look good.
|
It seems to me that there is no need to expand the syntax here. It's better to do it in the standard paradigm. As for the returned column, RDB$VALUE can be used instead of VALUE. It's not very pretty, but you can always rename it (give it an alias). |
That would IMHO overload the meaning of |
We could (theoretically) allow missing column names, but only if they're not referenced explicitly. E.g.:
|
Hmm, on the one hand, I'm not a fan of deviating too much from the standard, on the other hand I hate unnecessary verbosity :) As an example, BigQuery supports something like
(so even without the parentheses!) For Maybe we should just follow the standard for now and just require a normal subquery for this case, and see if there is a demand to add a simplification. To be honest, a lot of the time, we all have the tendency to add too much gold-plating and other embellishments to new features instead of keeping it simple initially, as exemplified by some of the comments above asking for a wholly different feature than this (parsing CSV), and wanting to add options like supporting multiple separators (which IMHO would indicate bad data input, which should be addressed, instead of adding bandaids to work with that bad data). |
I suppose multiple separators can be handled outside, e.g.: select val from unlist(replace(str, ';', ',')) as tab (val) but this is not so easy for trimming, if select trim(val) from unlist('1, 2, 3' returning int) as tab (val) conversion error will be raised before |
In the case of ANSI standard |
Nope. What you describe belongs to i.e. you may write both |
You're thinking of
Which is not the same as
All those elements of <table or query name> are identifiers or identifier chains, not expressions. |
A <table function derived table> can be used as a <table primary> (i.e. in the from clause, like a derived table). |
I'm thinking of other branch: |
That derivation would not work for this case, as that is about the individual row values of the |
But IN with single value is perfectly ok even if this value is a collection or row constructor, no? |
In other words, following that production would result in an IN-list with one element, and that doesn't match with what's needed because the single value you test against would never be equal to the element in the list. Also problematic on another level is the fact that Firebird doesn't support row-values (or at least, not with an arity greater than 1), and if it did, the requirement would be that the arity of the row value is deterministic, which it wouldn't be for this case. |
With the derivation you're proposing, use of |
In short, we probably should park this (shortening use of |
On 2/14/24 12:14, Dmitry Yemanov wrote:
I suppose multiple separators can be handled outside, e.g.:
select valfrom unlist(replace(str,';',','))as tab (val)
Yes. Use of replace here is absolutely logical.
but this is not so easy for trimming, if |RETURNING| clause specifies
a non-string data type:
select trim(val)from unlist('1, 2, 3' returningint)as tab (val)
conversion error will be raised before |TRIM| is called.
Here I see something strange.
1. Spaces before first digit do not cause conversion error:
SQL> select cast(' 2' as int) from rdb$database;
CAST
============
2
2. unlist(... returning int) returns integer value, what means trim(int) ?
If list contains some more garbage except spaces one can always use
replace to remove them.
If something more complex is needed - processing can always be performed
out of unlist() :
select userfunc(val)from unlist('1, 2, 3')as tab (val)
|
To summarize:
|
I think the syntax should be extended with an optional
Should return strings: 'abc', 'bcd,d', 'e'. |
I think that is unnecessary gold-plating which should be addressed by preprocessing the value and postprocessing the result. |
Implement a built-in function working the opposite way to the existing aggregate function
LIST
- i.e. parse the delimited string and return a record set containing the parsed values.It may be a possible solution for #3821, as it would be possible to do
WHERE ID IN (SELECT * FROM UNLIST(...))
The name
UNLIST
is suggested due to (1) being an antipode to the existingLIST
function (even if we add a standard synonymLISTAGG
one day) and (2) SQL standard having theUNNEST
table-value function which does the similar job just for arrays/multi-sets.Syntax:
If
<separator>
is omitted, comma is implied (as forLIST
). If<data type conversion>
is omitted, resulting field has the data type of the first input parameter.The RETURNING syntax look weird, but this is exactly how SQL specification declares the output data type in JSON functions, so we used the same approach here.
Remaining questions:
UNLIST
as relation name if theAS
clause is omitted. But I'm not sure about the column name - should we also implyUNLIST
, or maybe use keywordVALUE
or maybe some other ideas?Should we trim the parsed strings under the hood, or make it optional, or leave it up to developers?
select * from unlist('1, 2, 3')
If
<data type conversion>
is omitted and the input argument is BLOB, should we return also BLOB or maybe VARCHAR(MAX)?The text was updated successfully, but these errors were encountered: