Skip to content
This repository has been archived by the owner on Jun 4, 2020. It is now read-only.

Error when revision schema already exists #35

Closed
ymoisan opened this issue Jan 21, 2016 · 13 comments
Closed

Error when revision schema already exists #35

ymoisan opened this issue Jan 21, 2016 · 13 comments

Comments

@ymoisan
Copy link
Contributor

ymoisan commented Jan 21, 2016

If the schema for a specific revision exists AND the current user requesting it after clicking on the view button is not the owner of the schema, this is what the user gets when trying to retrieve the revision :

An error has occured (sic) while executing Python code:

Traceback (most recent call last):
  File "...python/plugins/qgis-versioning/versioning.py", line 501, in view
    schema, branch, rev )
  File "...python/plugins/qgis-versioning/versioning_base.py", line 1129, in add_revision_view
    pcur.execute("CREATE SCHEMA "+rev_schema)
  File "...python/plugins/qgis-versioning/versioning_base.py", line 123, in execute
    self.cur.execute( sql )
ProgrammingError: schema "my_schema_rev_12" already exists

The reason is that line :

pcur.execute("SELECT schema_name FROM information_schema.schemata "
        "WHERE schema_name = '"+rev_schema+"'")

in versioning_base.add_revision_view returns nothing if the current user is not the owner of the schema.

In order to avoid that, I thought about using schemas on a per pg_user basis rather than on a per revision number basis (schema+'_'+branch+'rev'+str(rev)). I did temporary code here : 162a7ed.

I'm using a new function called add_diff_revision_view that I will be using later to get a diff between two revisions but currently used as a test bed.

The idea is that specific revision numbers are not added in the schema name anymore. Since it is not possible to share schemas between users, I thought I'd do the following :

  • create schemas for revisions as "pg_user_revision_views"
  • generate views every time a revision is requested; this adds a bit of computing overhead, but it allows to override the view with a new revision number.

I think this will not only avoid the issue I'm reporting here, but it would also potentially diminish the number of schemas for revisions, given the number of pg_users is smaller than the number of revisions, which I think will be the case generally.

Comments ?

@gignacnic
Copy link
Contributor

+1 for me, I always see all these schemas viewed for each revision in the database a little too much, especially when we have many revisions to be viewed (ex. more than 50). One schema view revision per pg_user is just fine.

@ymoisan
Copy link
Contributor Author

ymoisan commented Jan 21, 2016

There is a problem with my approach : if a user wants more than one revision in a particular work session, then layers showing the first revisions fetched get overriden on the back end (e.g. on pan/zoom). Solutions I see are of two groups :

  1. We don't care about persistence ...
    a. on the back end by transferring the data locally in a spatialite (SL) file and build the layer out of it
    b. at all by using memory layers rather than vector layers
  2. We care about persistence, but we use strategies to minimize the number of objects in the database (schemas, views)

Pros (+) and cons (-) and questions (?) for ...

1 :
(+) a. would avoid network traffic
(-) a. would require managing files locally; in the case the user generates many revision views, SL files may be duplicated unnecessarily and clutter the filesystem
(?) b. what are limitations of memory layers (styling, number of features) ? I was thinking of using them to show a "diff" between versions, where there will likely be a number of items much smaller than the total number of features, but what about the case of a specific revision of all features ?
(+) we could remove the new view on exit of versioning_base.add_revision_view to keep the database clean of unnecessary objects

2 :
(+) the fewer the objects, the less housekeeping; our tests with a small set of layers show the number of schemas already clutters PgAdmin; adding schemas for the new proposed "diffmode", where we'll have a large list of pairs of revisions compared, will add to the problem
(?) would it be better to persist revisions in views rather than schemas ? Schemas would be per user, then views would also hold a revision number in their name
(?) do we require some permission sets for pg_users so that the queries necessary to either list schemas/views or use/write them can be done by all users irrespective of who the owner is ?

If we were to use the current functionality of the view button, that is fetch any number of specific revisions in individual layer groups, could there be a way to compare features between the layers, maybe in a buffer (memory ?) layer ? A kind of "layer diff" button in QGIS that would allow selecting two layers and then build the diff in a new layer ? The layer diff could apply to arbitrary pairs of layers in the current project, provided they meet minimal requirements. It would require more work for the user (load the layers (s)he wants to compare first) but we could build something similar to the conflict resolution mechanism where mine/theirs would become rev_begin/rev_end where we'd highlight attribute differences (geometry differences would obviously be apparent) ?

Whichever way we need to solve the bug mentioned in this ticket and also reduce the clutter of objects in the database.

@ymoisan
Copy link
Contributor Author

ymoisan commented Jan 21, 2016

"Schemas are analogous to directories at the operating system level, except that schemas cannot be nested." (http://www.postgresql.org/docs/current/interactive/ddl-schemas.html). So we can only have a linear structure to hold schemas and thus the only way to reduce clutter is to reduce the number of objects.

@ymoisan
Copy link
Contributor Author

ymoisan commented Jan 27, 2016

@vmora : Here is a solution proposal to both the bug reported and the proliferation of schemas :

  • at historize time (versioning_base.historize), create one schema that will be home to all revision-specific views, e.g. schema+'revision_views'
  • at view creation time, add views that have unique names instead of the currently generic name of tableName, e.g. :
    • tableName_rev_X_pgUser in versioning_base.add_revision_view
    • tableName_rev_X_vs_Y_pgUser in upcoming versioning_base.add_diff_revision_view

Pros :

  • the bug is eliminated : the one schema that holds all views is accessible by all users (no need to check if it exists); view names are specific to users because they include the PG username of the current user in their own names; this avoids replacing a view owned by another user (kind of like the bug reported here, except on views rather than schemas); unfortunately there is no IF EXISTS for views as in tables so views will need to be regenerated every time
  • the number of schemas for read-only views is drastically reduced to one; DBAs can clean the schema if they wish

Cons :

  • number of views will be high with time : numberUsers X revNumber[_vs_revOtherNumber]
  • existing users of the plugin will need to create the schema by hand if migrating to the version of the plugin that would contain this code; we can't check existence of the schema upon creating views because we'd fall back on the bug

I'm expecting to use memory layers for the comparison (diffmode) that I am currently implementing because the number of features will likely be small so for those views it may be possible to delete the views on exit of versioning.view after the features were loaded up into memory layers. Since views don't cost too much (I hear) I think we can defer the problem of view proliferation to periodic database administration tasks, if need be.

Short of fixing the bug and since there doesn't seem to be a way to grant all users view access to all existing and future schemas in a database, I'll need to implement some try/catch on the potentially offending query (schemata), which would obviously not be optimal.

I haven't fully tested this but I think it will work. Comments ?

@gignacnic
Copy link
Contributor

As discuss before, +1 for me.

@vmora
Copy link
Contributor

vmora commented Jan 28, 2016

Have you tried replacing the view by their select in qgis layers.

Views are basically just an alias for a select (except for working copies which are editable and have triggers). So you can create a qgis layer with the same select I thing: no need to add anything to the database and everyone with select role can do that.

@ymoisan
Copy link
Contributor Author

ymoisan commented Jan 29, 2016

@vmora : Thanx for the suggestion. I got it working for the diffmode (see the view() method in https://github.com/GISeHealth/qgis-versioning/blob/diffmode/versioning.py calling versioning_base.add_diff_revision_view). Warning : the code hasn't been cleaned yet.

I thought about doing the same thing in versioning_base.add_revision_view, but I realized doing this means I'll have to edit tests. Would you be fine with removing the schemas and views in versioning_base.add_revision_view too ? In fact, it's the only way we can get rid of the current issue. I would simply stuff the select SQL in a postgres layer instead of the view name.

Question : you'll see in the code ("if self.q_view_dlg.diffmode_chk.isChecked():") that I create a temporary postgres layer that I use to get pendingFields and get my memory layer uri and features out of that. Do you know if there is a way to get QgsFeature objects directly from PG ? It's not too bad for the diffmode because the layers are light (not many features) but if I could avoid the tmp layer that would be nice.

@ymoisan
Copy link
Contributor Author

ymoisan commented Feb 1, 2016

@ vmora : I have a first pass on getting rid of schemas for views altogether : 3ec1f8b. Note I am not calling versioning_base.add_diff_revision_view at all. I do everything in versioning.py. I guess we could do as I do for the diffmode and get versioning_base.add_diff_revision_view to return the select str for the "full rev" case too. Do you prefer that or should I construct my select_str for diffmode in versioning.py too ?

@vmora
Copy link
Contributor

vmora commented Feb 2, 2016

It may be nice to have a function in versioning_base to compute the where clause and another to build the complete "select" statement". Otherwise I thing we have a nice solution here.

We can keep the view creation functions in versioning_base (e.g. for testing) but use the select construction function inside.

I think also that the type conversion code has it's place in versionning base (the type of db should be an argument).

@ymoisan
Copy link
Contributor Author

ymoisan commented Feb 2, 2016

I think also that the type conversion code has it's place in versionning base (the type of db should be an argument).

Thanx for your comments. You mean move mem_layer_uri (and associated field_names_types) to versioning_base and pass an argument to the view functions, like vector_layer_type with possible options of 'postgres' or 'memory' ?

As to the separation of the select and where in two functions, I don't see much need for it since the select doesn't add much to the where, especially in the case of the diffmode. What we are doing really is one big pseudo-table arg to replace the schema.table args of the vector layer constructor.

P.S. I have tried using setSql in setDataSource. I fiddled with it a bit until I realized it would be much easier to read if the whole SQL code were in the table argument so I did not bother using setSql for the where bit.

@vmora
Copy link
Contributor

vmora commented Feb 2, 2016

You mean move mem_layer_uri

if it's the string yes, if it's the qgis class, no. It just look like some of the code could be moved to the base module, but no dep on qgis in the base module (that's the design).

setSql in setDataSource

You should be able to set the where clause in qgis (in layer properties/filter) maybe it's just the mechanism we need.

@ymoisan
Copy link
Contributor Author

ymoisan commented Feb 2, 2016

I moved the string manupulation function (renamed to mem_field_names_types) to versioning_base and left mem_layer_uri in versioning : 723dec2. I could have moved mem_layer_uri too but the dependency on QGis.WKB* definitions was cumbersome. I also integrated the where clause in the sql= parameter of the uri. Cleaner indeed (although one has to know looking at the layer source in the QGIS properties that it is the contents of a where clause).

My feeling at this point is that the SQL query for the non diffmode case (a simple filter on the historized table) can be handled in a simpler manner without versioning_base. Since versioning_base.add_revision_view becomes obsolete the functionality becomes untestable because only stuff in base is tested in the current state of things. A difference with what was done before is that I don't bother getting the versioning columns of a particular branch; I think for a revision a clearer picture involves all branches, because it may well be that a feature was modified in trunk and then in an arbitrary branch along the way and again in trunk, since revision history is not a function of branches.

How do you feel about all that ?

Lastly, I have a question. I see this weird behaviour that the Qt slot (versioning.check_branches) that is connected to my dialog's checkbox stateChanged signal gets called N times, N being the number of times I show the dialog. In the terminal, I see those print statements (from the slot) N times :

Checkbox is checked
Compared branches are trunk, trunk

It's as though something gets incremented every time the dialog pops. Do you know of ways of re-initializing a Qt dialog ? It's not a big deal since it only shows up in the terminal and otherwise does not have adverse effects, but I'd like to understand what is going on.

TIA

@vmora
Copy link
Contributor

vmora commented Feb 3, 2016

I think for a revision a clearer picture involves all branches,

I don't get that. A branch is like another db altogether, so I don't see what concept could involve more than one branch column (except for "merge branch_1 -> branche_2" which is not implemented at the moment.

signal gets called N times

A missing 'disconnect' somewhere. It will happen if the qobject is not destroyed and connect several times.

@ymoisan ymoisan mentioned this issue Feb 9, 2016
@ymoisan ymoisan closed this as completed Feb 12, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants