Feature/80 new upload widget #83

Open
wants to merge 45 commits into
base: master
Changes from 21 commits
Commits
1c984dd
Add new action to infer a tabular resource schema
aivuk Dec 6, 2022
154d529
Update ckanext/validation/logic.py
aivuk Dec 6, 2022
3b1773c
Remove coverage report
aivuk Dec 6, 2022
a167acb
Merge branch 'feature/76-add-resource-table-schema-infer' of github.c…
aivuk Dec 6, 2022
670d896
correct toolkit imported name
aivuk Dec 6, 2022
1dd1a8a
use new action endpoint with upload widget to create a resource
aivuk Dec 9, 2022
0f45263
remove default resource upload file field
aivuk Dec 10, 2022
124a0f7
Update widget and logic to replace already existing resource files
aivuk Dec 12, 2022
f2af792
Pass the url_type parameter to the ckan-uploader component
aivuk Dec 13, 2022
52ca754
get variables from resource form from hidden inputs
aivuk Dec 13, 2022
e8b5b42
add ckan_uploader snippet
aivuk Dec 13, 2022
6c167a6
Update logic to add another resource after saving one
aivuk Dec 14, 2022
a3c6e27
Add some comments to ckan-uploader-module
aivuk Dec 14, 2022
a225742
Corrects behaviour for uploaded file is not tabular
aivuk Dec 15, 2022
0644e0a
use custom resource_create and resource_update instead of new actions…
aivuk Dec 15, 2022
4c61a0b
remove check for content_length in uploaded resource schema file
aivuk Dec 16, 2022
6ece563
Remove unused actions
amercader Dec 16, 2022
61ff6d4
Use helper to get package id from url
aivuk Jan 12, 2023
d744d2f
Import ckan.model on helpers
aivuk Jan 12, 2023
9c16f8f
Fix blueprints missing imports
aivuk Jan 12, 2023
fb8f7f7
add resource edit new endpoint
aivuk Jan 30, 2023
28eec48
Create resource update endpoint and update ckan-uploader widget
aivuk Feb 1, 2023
547c5ac
Correct default return values for helpers
aivuk Feb 1, 2023
5fb0ec0
Get initialization variables for ckan-uploade its template.
aivuk Feb 1, 2023
8ac15f1
Remove custom resource_form.html template.
aivuk Feb 1, 2023
bfd400a
Get the schema from the returned schema from the action
aivuk Feb 1, 2023
036262e
Stop infering the schema as default in resource create and update
aivuk Feb 1, 2023
25e39ad
Add helpers that are used by ckan_uploader.html template
aivuk Feb 1, 2023
93e1956
Fix template for ckan_uploader adding correctly the variables intial …
aivuk Feb 1, 2023
431beeb
shorter form to compare schema_url value
aivuk Feb 6, 2023
63d8bcd
add logic to switch between file upload and resource url
aivuk Feb 8, 2023
930bb35
Merge branch 'feature/80-new-upload-widget' of github.com:frictionles…
aivuk Feb 8, 2023
a9e5459
remove erroneous test for schema file uploaded size
aivuk Feb 8, 2023
56ab8bb
Remove scrolling to schema json on resource edit
aivuk Feb 8, 2023
ebcc71a
Add basic create/update tests
amercader Feb 23, 2023
78a3351
Bump frictionless to fix schema infer errors
amercader Feb 23, 2023
25dbed6
Revert accidental changes in ckan-uploader.js
amercader Feb 23, 2023
95f31e4
Revert changes made to resource_create / resource_update actions
amercader Feb 23, 2023
9cbb754
Add turn_off_validation context manager
amercader Feb 23, 2023
d9598c5
Don't run validations when creating the draft resource
amercader Feb 23, 2023
3af20e2
Specify format in tests because of ckan/ckan#7415
amercader Feb 23, 2023
9236615
Remove duplicated test
amercader Feb 28, 2023
0198797
Bump responses
amercader Feb 28, 2023
d0f780d
Revert file size check removal in a9e5459
amercader Mar 29, 2023
f990969
Remove debugger
amercader Mar 29, 2023
11 changes: 6 additions & 5 deletions .github/workflows/test.yml
@@ -58,9 +58,10 @@ jobs:
run: |
ckan -c test.ini db init
- name: Run tests
run: pytest --ckan-ini=test.ini --cov=ckanext.validation --cov-report=xml --cov-append --disable-warnings ckanext/validation/tests -vv
# run: pytest --ckan-ini=test.ini --cov=ckanext.validation --cov-report=xml --cov-append --disable-warnings ckanext/validation/tests -vv
run: pytest --ckan-ini=test.ini --disable-warnings ckanext/validation/tests -vv

- name: Upload coverage report to codecov
uses: codecov/codecov-action@v1
with:
file: ./coverage.xml
#- name: Upload coverage report to codecov
# uses: codecov/codecov-action@v1
# with:
# file: ./coverage.xml
74 changes: 73 additions & 1 deletion ckanext/validation/blueprints.py
@@ -2,7 +2,21 @@

from flask import Blueprint

from ckantoolkit import c, NotAuthorized, ObjectNotFound, abort, _, render, get_action
from ckan.lib.navl.dictization_functions import unflatten
from ckan.logic import tuplize_dict, clean_dict, parse_params
from ckanext.validation.logic import is_tabular

from ckantoolkit import (
c, g,
NotAuthorized,
ObjectNotFound,
abort,
_,
render,
get_action,
request,
)
import ckantoolkit as t

validation = Blueprint("validation", __name__)

@@ -40,6 +54,64 @@ def read(id, resource_id):

abort(404, _(u"No validation report exists for this resource"))

def _get_data():
data = clean_dict(
unflatten(tuplize_dict(parse_params(request.form)))
)
data.update(clean_dict(
unflatten(tuplize_dict(parse_params(request.files)))
))


def resource_file_create(id):

# Get data from the request
data_dict = _get_data()

# Call resource_create
context = {
'user': g.user,
}
data_dict["package_id"] = id
resource = get_action("resource_create")(context, data_dict)

# If it's tabular (local OR remote), infer and store schema
if is_tabular(resource):
update_resource = get_action('resource_table_schema_infer')(
context, {'resource_id': resource.id, 'store_schema': True}
)

# Return resource
return resource


def resource_file_update(id, resource_id):
# Get data from the request
data_dict = _get_data()

# Call resource_create
context = {
'user': g.user,
}
data_dict["package_id"] = id
resource = get_action("resource_update")(context, data_dict)

# If it's tabular (local OR remote), infer and store schema
if is_tabular(resource):
update_resource = get_action('resource_table_schema_infer')(
context, {'resource_id': resource.id, 'store_schema': True}
)

# Return resource
return resource


validation.add_url_rule(
"/dataset/<id>/resource/file", view_func=resource_file_create, methods=["POST"]
)
validation.add_url_rule(
"/dataset/<id>/resource/<resource_id>/file", view_func=resource_file_update, methods=["POST"]
)
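
A note on the two new views above: as shown in the diff, `_get_data()` builds `data` but never returns it, so `data_dict` would be `None` in both views, and the action result is accessed as `resource.id` while `logic.py` treats the `resource_show` result as a dict (`resource['id']`). The sketch below illustrates the intended flow under stated assumptions; `get_data`, `fake_resource_create` and the plain dicts standing in for `request.form` / `request.files` are hypothetical and not part of the PR or of CKAN.

```python
# Hypothetical stand-ins: plain dicts replace request.form / request.files,
# and fake_resource_create replaces get_action("resource_create").

def get_data(form, files):
    """Merge form fields and uploaded files into one dict and return it."""
    data = dict(form)
    data.update(files)
    return data  # without this line the caller would receive None


def fake_resource_create(context, data_dict):
    # Assumption: the action returns a plain dict, so callers use resource["id"].
    return dict(data_dict, id="res-1")


def resource_file_create(package_id, form, files):
    data_dict = get_data(form, files)
    data_dict["package_id"] = package_id
    resource = fake_resource_create({"user": "tester"}, data_dict)
    # Dict access rather than attribute access on the action result.
    return {"resource_id": resource["id"], "package_id": resource["package_id"]}


result = resource_file_create("pkg-1", {"name": "a"}, {"upload": b"x"})
```

In a real view the returned value would also need to be something Flask can serialize; that part is left out of this sketch.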

validation.add_url_rule(
"/dataset/<id>/resource/<resource_id>/validation", view_func=read
3 changes: 2 additions & 1 deletion ckanext/validation/examples/ckan_default_schema.json
@@ -86,7 +86,8 @@
{
"field_name": "url",
"label": "URL",
"preset": "resource_url_upload"
"preset": "resource_url_upload",
"form_snippet": "ckan_uploader.html"
},
{
"field_name": "name",
16 changes: 13 additions & 3 deletions ckanext/validation/helpers.py
@@ -1,9 +1,10 @@
# encoding: utf-8
import json

from ckan.lib.helpers import url_for_static
from ckantoolkit import url_for, _, config, asbool, literal, h
from ckan import model
from ckantoolkit import url_for, _, config, asbool, literal, h, request

import json
import re

def get_validation_badge(resource, in_listing=False):

@@ -96,6 +97,15 @@ def bootstrap_version():
else:
return '2'

def get_package_id_from_resource_url():
match = re.match("/dataset/(.*)/resource/", request.path)
if match:
return model.Package.get(match.group(1)).id

def get_resource_id_from_resource_url():
match = re.match("/dataset/(.*)/resource/(.*)/edit", request.path)
Review comment (Contributor): This is okay, but since it's a constant pattern, perhaps it should be precompiled at the module level with re.compile?

if match:
return model.Resource.get(match.group(2)).id

def use_webassets():
return int(h.ckan_version().split('.')[1]) >= 9
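
The review suggestion on `get_resource_id_from_resource_url` (precompiling the constant patterns at module level) could look roughly like the sketch below. The constant and function names are hypothetical, and the use of `[^/]+` instead of the diff's `(.*)` is an assumption added here so that each group stops at the next slash; the model lookups from the PR are omitted to keep the example self-contained.

```python
import re

# Compiled once at import time instead of on every helper call.
# Assumption: [^/]+ replaces the diff's (.*) so a group cannot span slashes.
PACKAGE_URL_RE = re.compile(r"/dataset/([^/]+)/resource/")
RESOURCE_EDIT_URL_RE = re.compile(r"/dataset/([^/]+)/resource/([^/]+)/edit")


def package_name_from_path(path):
    """Return the dataset segment of a resource URL path, or None."""
    match = PACKAGE_URL_RE.match(path)
    return match.group(1) if match else None


def resource_id_from_path(path):
    """Return the resource segment of a resource edit URL path, or None."""
    match = RESOURCE_EDIT_URL_RE.match(path)
    return match.group(2) if match else None
```

Returning `None` explicitly when nothing matches keeps the helper usable in templates that are rendered outside a resource page.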
160 changes: 120 additions & 40 deletions ckanext/validation/logic.py
@@ -5,6 +5,7 @@
import json

from sqlalchemy.orm.exc import NoResultFound
from frictionless import system, Resource, FrictionlessException

import ckan.plugins as plugins
import ckan.lib.uploader as uploader
@@ -24,6 +25,23 @@

log = logging.getLogger(__name__)

ACCEPTED_TABULAR_FORMATS = set([
'text/csv',
'application/vnd.ms-excel',
'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
])

ACCEPTED_TABULAR_EXTENSIONS = set([
'csv',
'tsv',
'xls',
'xlsx'
])

def is_tabular(filename = '', mimetype = ''):
uploaded_file_extension = filename.split('.')[-1].lower()
return mimetype in ACCEPTED_TABULAR_FORMATS or \
uploaded_file_extension in ACCEPTED_TABULAR_EXTENSIONS

def enqueue_job(*args, **kwargs):
try:
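
To make the behaviour of `is_tabular` from the hunk above concrete, here is a self-contained copy with a few checks. Two points it illustrates: the function expects a filename string (so passing a whole resource dict, as `blueprints.py` does, would fail on `.split`), and a URL with a query string is not recognised because the "extension" then includes the query. `is_tabular_resource` is a hypothetical wrapper added for illustration only.

```python
ACCEPTED_TABULAR_FORMATS = {
    'text/csv',
    'application/vnd.ms-excel',
    'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
}

ACCEPTED_TABULAR_EXTENSIONS = {'csv', 'tsv', 'xls', 'xlsx'}


def is_tabular(filename='', mimetype=''):
    # Same logic as in the diff: match on MIME type or on the last dot-suffix.
    extension = filename.split('.')[-1].lower()
    return (mimetype in ACCEPTED_TABULAR_FORMATS
            or extension in ACCEPTED_TABULAR_EXTENSIONS)


def is_tabular_resource(resource):
    # Hypothetical wrapper: pull the string fields out of a resource dict
    # before calling is_tabular, tolerating missing or None values.
    return is_tabular(filename=resource.get('url') or '',
                      mimetype=resource.get('mimetype') or '')


checks = [
    is_tabular(filename='data.CSV'),                          # extension match
    is_tabular(mimetype='text/csv'),                          # MIME type match
    is_tabular(filename='report.pdf'),                        # neither matches
    is_tabular(filename='http://example.com/data.csv?dl=1'),  # query string defeats the check
    is_tabular_resource({'url': 'http://example.com/data.xlsx'}),
]
```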
@@ -174,6 +192,50 @@ def resource_validation_show(context, data_dict):

return _validation_dictize(validation)

def resource_table_schema_infer(context, data_dict):
'''
Use frictionless framework to infer a resource schema
'''

t.check_access('resource_create', context, data_dict)

t.get_or_bust(data_dict, 'resource_id')

store_schema = data_dict.get('store_schema', True)

resource = t.get_action('resource_show')(
{}, {u'id': data_dict['resource_id']})

source = None
if resource.get('url_type') == 'upload':
upload = uploader.get_resource_uploader(resource)
if isinstance(upload, uploader.ResourceUpload):
source = upload.get_path(resource['id'])

if not source:
source = resource['url']

with system.use_context(trusted=True):
if is_tabular(filename=resource['url']):
try:
fric_resource = Resource({'path': source, 'format': resource['format'].lower()})
fric_resource.infer()
resource['schema'] = fric_resource.schema.to_json()

if store_schema:
t.get_action('resource_update')(
context, resource)

return {u'schema': fric_resource.schema.to_dict()}
except FrictionlessException as e:
log.warning(
u'Error trying to infer schema for resource %s: %s',
resource['id'], e)

return {u'schema': ''}
else:
return {u'schema': ''}


def resource_validation_delete(context, data_dict):
u'''
@@ -434,9 +496,6 @@ def resource_create(up_func, context, data_dict):

'''

if get_create_mode_from_config() != 'sync':
return up_func(context, data_dict)

model = context['model']

package_id = t.get_or_bust(data_dict, 'package_id')
@@ -470,6 +529,7 @@ def resource_create(up_func, context, data_dict):
try:
context['defer_commit'] = True
context['use_cache'] = False
pkg_dict['state'] = 'active'
t.get_action('package_update')(context, pkg_dict)
context.pop('defer_commit')
except t.ValidationError as e:
@@ -486,25 +546,32 @@ def resource_create(up_func, context, data_dict):

# Custom code starts

run_validation = True
if get_create_mode_from_config() == 'sync':

for plugin in plugins.PluginImplementations(IDataValidation):
if not plugin.can_validate(context, data_dict):
log.debug('Skipping validation for resource {}'.format(resource_id))
run_validation = False
run_validation = True

if run_validation:
is_local_upload = (
hasattr(upload, 'filename') and
upload.filename is not None and
isinstance(upload, uploader.ResourceUpload))
_run_sync_validation(
resource_id, local_upload=is_local_upload, new_resource=True)
for plugin in plugins.PluginImplementations(IDataValidation):
if not plugin.can_validate(context, data_dict):
Review comment (Contributor): Why skip validation if one plugin can do it and another can't? Shouldn't we just let the one plugin handle it?

log.debug('Skipping validation for resource {}'.format(resource_id))
run_validation = False

if run_validation:
is_local_upload = (
hasattr(upload, 'filename') and
upload.filename is not None and
isinstance(upload, uploader.ResourceUpload))
_run_sync_validation(
resource_id, local_upload=is_local_upload, new_resource=True)

# Custom code ends

model.repo.commit()

if upload.filename and is_tabular(filename=upload.filename):
update_resource = t.get_action('resource_table_schema_infer')(
context, {'resource_id': resource_id, 'store_schema': True}
)

# Run package show again to get out actual last_resource
updated_pkg_dict = t.get_action('package_show')(
context, {'id': package_id})
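
On the review question about `can_validate` above: the loop in the diff behaves like an "all plugins must agree" check, whereas the reviewer seems to describe an "any plugin may accept" check. The sketch below contrasts the two; the stub plugin classes and both function names are hypothetical and only stand in for `IDataValidation` implementations.

```python
class AcceptingPlugin:
    def can_validate(self, context, data_dict):
        return True


class DecliningPlugin:
    def can_validate(self, context, data_dict):
        return False


def should_validate_all(plugins, context, data_dict):
    """Equivalent to the loop in the diff: one refusal disables validation."""
    return all(p.can_validate(context, data_dict) for p in plugins)


def should_validate_any(plugins, context, data_dict):
    """Hypothetical alternative: validate if at least one plugin accepts."""
    return any(p.can_validate(context, data_dict) for p in plugins)


mixed = [AcceptingPlugin(), DecliningPlugin()]
```

One difference worth keeping in mind if the semantics were changed: with no plugins registered, `all` yields `True` (validation runs, as in the diff), while `any` yields `False`, so an "any" variant would need an explicit default.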
@@ -540,9 +607,6 @@ def resource_update(up_func, context, data_dict):

'''

if get_update_mode_from_config() != 'sync':
return up_func(context, data_dict)

model = context['model']
id = t.get_or_bust(data_dict, "id")

@@ -576,21 +640,24 @@ def resource_update(up_func, context, data_dict):
'datastore_active' not in data_dict):
data_dict['datastore_active'] = resource.extras['datastore_active']


for plugin in plugins.PluginImplementations(plugins.IResourceController):
plugin.before_update(context, pkg_dict['resources'][n], data_dict)

upload = uploader.get_resource_uploader(data_dict)

if 'mimetype' not in data_dict:
if hasattr(upload, 'mimetype'):
data_dict['mimetype'] = upload.mimetype
if 'mimetype' not in data_dict or hasattr(upload, 'mimetype'):
data_dict['mimetype'] = upload.mimetype
if upload.filename:
data_dict['format'] = upload.filename.split('.')[-1]

if 'size' not in data_dict and 'url_type' in data_dict:
if hasattr(upload, 'filesize'):
data_dict['size'] = upload.filesize
if 'size' not in data_dict and 'url_type' in data_dict or hasattr(upload, 'filesize'):
data_dict['size'] = upload.filesize

pkg_dict['resources'][n] = data_dict



try:
context['defer_commit'] = True
context['use_cache'] = False
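
The flattened `size` condition in this hunk is not equivalent to the nested `if` it replaces, because `and` binds tighter than `or` in Python: `A and B or C` is read as `(A and B) or C`. The sketch below compares the two forms on plain data; the function names are hypothetical and `SimpleNamespace` stands in for the uploader object. The `mimetype` condition in the same hunk has the same shape of problem.

```python
from types import SimpleNamespace


def size_condition_nested(data_dict, upload):
    # Previous form: both dict checks AND the attribute check must hold.
    return ('size' not in data_dict and 'url_type' in data_dict
            and hasattr(upload, 'filesize'))


def size_condition_flattened(data_dict, upload):
    # Form in the diff: parsed as (A and B) or C.
    return ('size' not in data_dict and 'url_type' in data_dict
            or hasattr(upload, 'filesize'))


upload_without_size = SimpleNamespace()
upload_with_size = SimpleNamespace(filesize=10)

# Case 1: dict checks pass but the uploader has no filesize attribute.
# The flattened form is True, so reading upload.filesize would then fail.
case_1 = (size_condition_nested({'url_type': 'upload'}, upload_without_size),
          size_condition_flattened({'url_type': 'upload'}, upload_without_size))

# Case 2: a size is already present in the dict.
# The flattened form is True, so the existing value would be overwritten.
case_2 = (size_condition_nested({'size': 5}, upload_with_size),
          size_condition_flattened({'size': 5}, upload_with_size))
```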
@@ -602,26 +669,39 @@ def resource_update(up_func, context, data_dict):
except (KeyError, IndexError):
raise t.ValidationError(e.error_dict)

resource = updated_pkg_dict['resources'][-1]
upload.upload(id, uploader.get_max_resource_size())

if upload.filename and is_tabular(filename=upload.filename):
update_resource = t.get_action('resource_table_schema_infer')(
context, {'resource_id': resource['id'], 'store_schema': True}
)

# Run package show again to get out actual last_resource
updated_pkg_dict = t.get_action('package_show')(
context, {'id': package_id})
resource = updated_pkg_dict['resources'][-1]


# Custom code starts

run_validation = True
for plugin in plugins.PluginImplementations(IDataValidation):
if not plugin.can_validate(context, data_dict):
log.debug('Skipping validation for resource {}'.format(id))
run_validation = False

if run_validation:
run_validation = not data_dict.pop('_skip_next_validation', None)

if run_validation:
is_local_upload = (
hasattr(upload, 'filename') and
upload.filename is not None and
isinstance(upload, uploader.ResourceUpload))
_run_sync_validation(
id, local_upload=is_local_upload, new_resource=False)
if get_update_mode_from_config() == 'sync':
run_validation = True
for plugin in plugins.PluginImplementations(IDataValidation):
if not plugin.can_validate(context, data_dict):
log.debug('Skipping validation for resource {}'.format(id))
run_validation = False

if run_validation:
run_validation = not data_dict.pop('_skip_next_validation', None)

if run_validation:
is_local_upload = (
hasattr(upload, 'filename') and
upload.filename is not None and
isinstance(upload, uploader.ResourceUpload))
_run_sync_validation(
id, local_upload=is_local_upload, new_resource=False)

# Custom code ends
