Version 0.42 - add search functionality, tests and documentation

OCHA-DAP · Sep 15, 2016 · 864cb2b
1 parent 296042d
Showing 8 changed files with 520 additions and 279 deletions.
diff --git a/.idea/workspace.xml b/.idea/workspace.xml
diff --git a/README.md b/README.md
@@ -17,6 +17,7 @@ For more about the purpose and design philosophy, please visit [HDX Python Libra
 	- [Operations on HDX Objects](#operations-on-hdx-objects)
 	- [Dataset Specific Operations](#dataset-specific-operations)
 - [Working Example](#working-example)
+- [ACLED Example](#acled-example)
 
 ## Usage
 The library has detailed API documentation:  
@@ -44,7 +45,7 @@ The first task is to create an API key file. By default this is assumed to be ca
 
 To include the HDX Python library in your project, pip install the line below or add the following to your `requirements.txt` file:
 
-    git+git://github.com/ocha-dap/hdx-python-api.git@v0.41#egg=hdx-python-api
+    git+git://github.com/ocha-dap/hdx-python-api.git@v0.42#egg=hdx-python-api
 
 If you get errors, it is probably the dependencies of the cryptography package that are missing eg. for Ubuntu: python-dev, libffi-dev and libssl-dev. See [cryptography dependencies](https://cryptography.io/en/latest/installation/#building-cryptography-on-linux)
 
@@ -67,7 +68,7 @@ Let's start with a simple example that also ensures that the library is working
         source test/bin/activate
 4. Install the HDX Python library:
 
-        pip install git+git://github.com/ocha-dap/hdx-python-api.git@v0.41#egg=hdx-python-api
+        pip install git+git://github.com/ocha-dap/hdx-python-api.git@v0.42#egg=hdx-python-api
 5. If you get errors, it is probably the [dependencies of the cryptography package](#installing-the-library)
 6. Launch python:
 
@@ -92,7 +93,11 @@ Let's start with a simple example that also ensures that the library is working
 
         dataset['dataset_date'] = '06/25/2016'
         dataset.update_in_hdx()
-12. Exit and remove virtualenv:
+12. You can search for datasets on HDX:
+
+        datasets = Dataset.search_in_hdx(configuration, 'ACLED')
+        print(datasets)
+13. Exit and remove virtualenv:
 
         exit()
         deactivate
@@ -212,6 +217,12 @@ You can read an existing HDX object with the static `read_from_hdx` method whi
 
     dataset = Dataset.read_from_hdx(configuration, 'DATASET_ID_OR_NAME')
 
+You can search for datasets and resources in HDX using the `search_in_hdx` method, which takes a configuration and a query parameter and returns a list of objects of the appropriate HDX object type, eg. `list[Dataset]`. For example:
+
+    datasets = Dataset.search_in_hdx(configuration, 'QUERY')
+
+The query parameter takes a different format depending upon whether it is for a [dataset](http://lucene.apache.org/core/3_6_0/queryparsersyntax.html) or a [resource](http://docs.ckan.org/en/ckan-2.3.4/api/index.html#ckan.logic.action.get.resource_search). 
+
 You can create an HDX Object, such as a dataset, resource or gallery item by calling the constructor with a configuration, which is required, and an optional dictionary containing metadata. For example:
 
     from hdx.data.dataset import Dataset
@@ -354,6 +365,12 @@ Create a file `my_code.py` and copy into it the code below:
 
 You can then fill out the function `generate_dataset` as required.
 
+## ACLED Example
+
 A complete example can be found here: [https://github.com/mcarans/hdxscraper-acled-africa](https://github.com/mcarans/hdxscraper-acled-africa)
 
-In particular, take a look at the files `run.py`, `acled_africa.py` and the `config` folder.
+In particular, take a look at the files `run.py`, `acled_africa.py` and the `config` folder.
+
+The ACLED scraper creates a dataset in HDX for [ACLED realtime data](https://data.humdata.org/dataset/acled-conflict-data-for-africa-realtime-2016) if it doesn't already exist, populating all the required metadata. It then creates resources that point to URLs of [Excel and CSV files for Realtime 2016 All Africa data](http://www.acleddata.com/data/realtime-data-2016/) (or updates the links and metadata if the resources already exist). Finally, it creates a gallery item that points to these [dynamic maps and graphs](http://www.acleddata.com/visuals/maps/dynamic-maps/).
+
+The first iteration of the ACLED scraper was written without the HDX Python library. Looking at it and at previous work by others, it became clear that some frequently required operations add unnecessary complexity to the task of coding against HDX. Simplifying the interface to HDX drove the development of the Python library, and the second iteration of the scraper was built using it. Because the interface uses HDX terminology and maps directly on to datasets, resources and gallery items, the ACLED scraper was faster to develop and is much easier to understand for someone unfamiliar with how it works and what it is doing. The challenge with ACLED is that sometimes the URLs that the resources point to have not been updated and hence do not work. In this situation, the extensive logging and transparent communication of errors are invaluable and enable action to be taken to resolve the issue as quickly as possible. The static metadata for ACLED is held in human readable files, so modifying it is straightforward. This is another feature of the HDX Python library that makes putting data programmatically into HDX a breeze.
diff --git a/hdx/data/dataset.py b/hdx/data/dataset.py
@@ -6,7 +6,6 @@
 """
 import logging
 from os.path import join
-
 from typing import Any, List, Optional
 
 from hdx.configuration import Configuration
@@ -47,7 +46,8 @@ def actions() -> dict:
             'show': 'package_show',
             'update': 'package_update',
             'create': 'package_create',
-            'delete': 'package_delete'
+            'delete': 'package_delete',
+            'search': 'package_search'
         }
 
     def __setitem__(self, key: Any, value: Any) -> None:
@@ -255,6 +255,20 @@ def read_from_hdx(configuration: Configuration, identifier: str) -> Optional['Da
             return dataset
         return None
 
+    def _dataset_create_resources_gallery(self) -> None:
+        """Creates resource and gallery item objects in dataset
+        """
+
+        if 'resources' in self.data:
+            self.old_data['resources'] = self._copy_hdxobjects(self.resources, Resource)
+            self.separate_resources()
+        if self.include_gallery:
+            success, result = self._read_from_hdx('gallery', self.data['id'], 'id', GalleryItem.actions()['list'])
+            if success:
+                self.data['gallery'] = result
+                self.old_data['gallery'] = self._copy_hdxobjects(self.gallery, GalleryItem)
+                self.separate_gallery()
+
     def _dataset_load_from_hdx(self, id_or_name: str) -> bool:
         """Loads the dataset given by either id or name from HDX
 
@@ -267,15 +281,7 @@ def _dataset_load_from_hdx(self, id_or_name: str) -> bool:
 
         if not self._load_from_hdx('dataset', id_or_name):
             return False
-        if 'resources' in self.data:
-            self.old_data['resources'] = self._copy_hdxobjects(self.resources, Resource)
-            self.separate_resources()
-        if self.include_gallery:
-            success, result = self._read_from_hdx('gallery', self.data['id'], GalleryItem.actions()['list'])
-            if success:
-                self.data['gallery'] = result
-                self.old_data['gallery'] = self._copy_hdxobjects(self.gallery, GalleryItem)
-                self.separate_gallery()
+        self._dataset_create_resources_gallery()
         return True
 
     def check_required_fields(self, ignore_fields: List[str] = list()) -> None:
@@ -422,3 +428,29 @@ def delete_from_hdx(self) -> None:
             None
         """
         self._delete_from_hdx('dataset', 'id')
+
+    @staticmethod
+    def search_in_hdx(configuration: Configuration, query: str) -> List['Dataset']:
+        """Searches for datasets in HDX
+
+        Args:
+            configuration (Configuration): HDX Configuration
+            query (str): Query
+
+        Returns:
+            List[Dataset]: List of datasets resulting from query
+        """
+
+        datasets = []
+        dataset = Dataset(configuration)
+        success, result = dataset._read_from_hdx('dataset', query, 'q')
+        if success:
+            count = result.get('count', None)
+            if count:
+                for datasetdict in result['results']:
+                    dataset = Dataset(configuration)
+                    dataset.old_data = dict()
+                    dataset.data = datasetdict
+                    dataset._dataset_create_resources_gallery()
+                    datasets.append(dataset)
+        return datasets
diff --git a/hdx/data/hdxobject.py b/hdx/data/hdxobject.py
@@ -82,29 +82,34 @@ def update_json(self, path: str):
         """
         self.data = load_json_into_existing_dict(self.data, path)
 
-    def _read_from_hdx(self, object_type: str, id_field: str, action: Optional[str] = None) -> Tuple[bool, dict]:
-        """Checks if the hdx object exists in HDX.
+    def _read_from_hdx(self, object_type: str, value: str, fieldname: Optional[str] = 'id',
+                       action: Optional[str] = None) -> Tuple[bool, dict]:
+        """Makes a read call to HDX passing in given parameter.
 
         Args:
             object_type (str): Description of HDX object type (for messages)
-            id_field (str): HDX object identifier
-            action (Optional[str]): Replacement CKAN url to use. Defaults to None.
+            value (str): Value of HDX field
+            fieldname (Optional[str]): HDX field name. Defaults to id.
+            action (Optional[str]): Replacement CKAN action url to use. Defaults to None.
 
         Returns:
             (bool, dict): (True/False, HDX object metadata/Error)
         """
-        if not id_field:
-            raise HDXError("Empty %s identifier!" % object_type)
+        if not value:
+            raise HDXError("Empty %s value!" % object_type)
         if action is None:
-            action = self.actions()['show']
+            if fieldname == 'query' or fieldname == 'q':
+                action = self.actions()['search']
+            else:
+                action = self.actions()['show']
         try:
-            result = self.hdxpostsite.call_action(action, {'id': id_field},
+            result = self.hdxpostsite.call_action(action, {fieldname: value},
                                                   requests_kwargs={'auth': self.configuration._get_credentials()})
             return True, result
         except NotFound as e:
-            return False, "%s not found!" % id_field
+            return False, "%s=%s: not found!" % (fieldname, value)
         except Exception as e:
-            raise HDXError('HTTP Get failed when trying to read %s' % id_field) from e
+            raise HDXError('HTTP Get failed when trying to read: %s=%s' % (fieldname, value)) from e
 
     def _load_from_hdx(self, object_type: str, id_field: str) -> bool:
         """Helper method to load the HDX object given by identifier from HDX

diff --git a/hdx/data/resource.py b/hdx/data/resource.py
@@ -3,7 +3,6 @@
 """Resource class containing all logic for creating, checking, and updating resources."""
 import logging
 from os.path import join
-
 from typing import Optional, List
 
 from hdx.configuration import Configuration
@@ -35,7 +34,8 @@ def actions() -> dict:
             'show': 'resource_show',
             'update': 'resource_update',
             'create': 'resource_create',
-            'delete': 'resource_delete'
+            'delete': 'resource_delete',
+            'search': 'resource_search'
         }
 
     def update_yaml(self, path: str = join('config', 'hdx_resource_static.yml')) -> None:
@@ -113,6 +113,29 @@ def delete_from_hdx(self) -> None:
         """
         self._delete_from_hdx('resource', 'id')
 
+    @staticmethod
+    def search_in_hdx(configuration: Configuration, query: str) -> List['Resource']:
+        """Searches for resources in HDX
+
+        Args:
+            configuration (Configuration): HDX Configuration
+            query (str): Query
+
+        Returns:
+            List[Resource]: List of resources resulting from query
+        """
+
+        resources = []
+        resource = Resource(configuration)
+        success, result = resource._read_from_hdx('resource', query, 'query')
+        if success:
+            count = result.get('count', None)
+            if count:
+                for resourcedict in result['results']:
+                    resource = Resource(configuration, resourcedict)
+                    resources.append(resource)
+        return resources
+
     def create_datastore(self) -> None:
         """TODO"""
         pass
diff --git a/setup.py b/setup.py
@@ -13,7 +13,7 @@
 
 setup(
     name='hdx-python-api',
-    version='0.41',
+    version='0.42',
     packages=find_packages(exclude=['ez_setup', 'tests', 'tests.*']),
     url='http://data.humdata.org/',
     license='PSF',

diff --git a/tests/hdx/data/test_dataset.py b/tests/hdx/data/test_dataset.py
@@ -12,6 +12,7 @@
 from hdx.data.dataset import Dataset
 from hdx.data.hdxobject import HDXError
 from hdx.utilities.dictionary import merge_two_dictionaries
+from hdx.utilities.loader import load_yaml
 
 
 class MockResponse:
@@ -83,6 +84,7 @@ def json(self):
     'solr_additions': '{"countries": ["Algeria", "Zimbabwe"]}',
     'dataset_date': '06/04/2016'}
 
+searchdict = load_yaml(join('fixtures', 'search_results.yml'))
 
 def mockshow(url, datadict):
     if 'show' not in url and 'related_list' not in url:
@@ -113,6 +115,28 @@ def mockshow(url, datadict):
                         '{"success": false, "error": {"message": "Not found", "__type": "Not Found Error"}, "help": "http://test-data.humdata.org/api/3/action/help_show?name=dataset_show"}')
 
 
+def mocksearch(url, datadict):
+    if 'search' not in url and 'related_list' not in url:
+        return MockResponse(404,
+                            '{"success": false, "error": {"message": "TEST ERROR: Not search", "__type": "TEST ERROR: Not Search Error"}, "help": "http://test-data.humdata.org/api/3/action/help_show?name=package_search"}')
+    if 'related_list' in url:
+        result = json.dumps(TestDataset.gallery_data)
+        return MockResponse(200,
+                            '{"success": true, "result": %s, "help": "http://test-data.humdata.org/api/3/action/help_show?name=related_list"}' % result)
+    result = json.dumps(searchdict)
+    if datadict['q'] == 'ACLED':
+        return MockResponse(200,
+                            '{"success": true, "result": %s, "help": "http://test-data.humdata.org/api/3/action/help_show?name=package_search"}' % result)
+    if datadict['q'] == '"':
+        return MockResponse(404,
+                            '{"success": false, "error": {"message": "Validation Error", "__type": "Validation Error"}, "help": "http://test-data.humdata.org/api/3/action/help_show?name=package_search"}')
+    if datadict['q'] == 'ajyhgr':
+        return MockResponse(200,
+                            '{"success": true, "result": {"count": 0, "results": []}, "help": "http://test-data.humdata.org/api/3/action/help_show?name=package_search"}')
+    return MockResponse(404,
+                        '{"success": false, "error": {"message": "Not found", "__type": "Not Found Error"}, "help": "http://test-data.humdata.org/api/3/action/help_show?name=package_search"}')
+
+
 class TestDataset():
     dataset_data = {
         'name': 'MyDataset1',
@@ -290,6 +314,15 @@ def mockreturn(url, data, headers, files, allow_redirects, auth):
 
         monkeypatch.setattr(requests, 'post', mockreturn)
 
+    @pytest.fixture(scope='function')
+    def search(self, monkeypatch):
+        def mockreturn(url, data, headers, files, allow_redirects, auth):
+            datadict = json.loads(data.decode('utf-8'))
+            return mocksearch(url, datadict)
+
+        monkeypatch.setattr(requests, 'post', mockreturn)
+
+
     @pytest.fixture(scope='class')
     def configuration(self):
         hdx_key_file = join('fixtures', '.hdxkey')
@@ -460,3 +493,11 @@ def test_add_update_delete_gallery(self, configuration, post_delete):
         dataset.delete_galleryitem('NOTEXIST')
         dataset.delete_galleryitem('d59a01d8-e52b-4337-bcda-fceb1d059bef')
         assert len(dataset.gallery) == 0
+
+    def test_search_in_hdx(self, configuration, search):
+        datasets = Dataset.search_in_hdx(configuration, 'ACLED')
+        assert len(datasets) == 10
+        datasets = Dataset.search_in_hdx(configuration, 'ajyhgr')
+        assert len(datasets) == 0
+        with pytest.raises(HDXError):
+            Dataset.search_in_hdx(configuration, '"')
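
The control flow that `search_in_hdx` and `_read_from_hdx` share can be tried without an HDX connection. The sketch below is only an illustration under stated assumptions: `ACTIONS`, `read`, `search`, `fake_call_action` and the local `NotFound` class are hypothetical stand-ins written for this example, not part of the library. It shows the action being chosen from the field name (`q` or `query` selects the search action, anything else selects show), and it checks the success flag before touching the result, because a not-found read returns an error string rather than a dict.

```python
from typing import Callable, Dict, List, Tuple, Union

# Hypothetical stand-in for the mapping returned by Dataset.actions().
ACTIONS = {'show': 'package_show', 'search': 'package_search'}


class NotFound(Exception):
    """Hypothetical stand-in for the CKAN client's not-found error."""


CallAction = Callable[[str, Dict[str, str]], dict]


def read(call_action: CallAction, value: str,
         fieldname: str = 'id') -> Tuple[bool, Union[dict, str]]:
    """Pick the action from the field name and make the read call."""
    if not value:
        raise ValueError('Empty value!')
    if fieldname in ('q', 'query'):
        action = ACTIONS['search']
    else:
        action = ACTIONS['show']
    try:
        return True, call_action(action, {fieldname: value})
    except NotFound:
        # On failure the second element is a message string, not a dict.
        return False, '%s=%s: not found!' % (fieldname, value)


def search(call_action: CallAction, query: str) -> List[dict]:
    """Return the result dicts for a query, or an empty list."""
    success, result = read(call_action, query, 'q')
    # Check success first: calling .get() on the error string would fail.
    if not success or not result.get('count'):
        return []
    return list(result['results'])


def fake_call_action(action: str, data: Dict[str, str]) -> dict:
    """Hypothetical fake server: only knows the search action."""
    if action != 'package_search':
        raise NotFound()
    if data['q'] == 'ACLED':
        return {'count': 2,
                'results': [{'name': 'acled-1'}, {'name': 'acled-2'}]}
    return {'count': 0, 'results': []}
```

Calling `search(fake_call_action, 'ACLED')` returns the two fake result dicts, while an unmatched query returns an empty list. In the library itself each result dict would then be wrapped in a `Dataset` or `Resource` object.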