forked from ibmdbanalytics/ibmdbpy
-
Notifications
You must be signed in to change notification settings - Fork 0
/
ROADMAP.txt
57 lines (40 loc) · 3.57 KB
/
ROADMAP.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
# Roadmap
* Full test coverage (a basic coverage is already provided).
* Add wrappers for ML-Algorithms.
* Implement pending methods.
* Feature selection extension.
* Idea: Add Spark as computational engine?
* Add wrappers for binary geospatial methods.
* Fix issues related to filter queries.
* Fix issues related to geospatial methods.
* Consider the viability of using ibm_db package instead of pypyodbc for ODBC conections
* Add ODBC support from Linux
* ibmdbpy uses future and six as compatibility layers, should use only future if possible
* Fix the number of warnings occurring when building the doc (make html)
* Add support for more datatypes (upload DataFrame and download Tables) such as Timestamps
* Refine the test routine: Instead of giving the choice of testing with a particular table, test systematically with several small but relevant tables (e.g. containing missing values, with different data types, etc)
* Catch properly failures for operation on unsupported datatypes: for example, min and max are not supported for ST_* (geospatial) column types
* Remove Benchmarking section from the docu (start.rst), since those features where discarded
* DataFrame.sort( is deprecated, use .sort_values(inplace=True) instead
# Detailed information
## Issues related to filter queries
* Add a type check to the filter queries so that queries like
idadf[ idadf['name of a column with a non-numeric value'] < 10 ]
are handled properly.
* In general, the behavior of idadf[...] as well as idadf.loc[...] (calling getitem)
is not fully tested. It would be good to test it and correct potential bugs. Keep in
mind that it should be based on Pandas.Dataframe behavior.
## Issues related to geospatial methods
* Ensure that a column with ST_ type, or any other unorderable type, cannot be set as indexer attribute of IdaDataFrame.
* Ensure that methods min() and max() of IdaDataFrame are not carried on columns with ST_ type, or any other unorderable type.
* Hide the warnings thrown by jaydebeapi when retrieving from tables with columns with ST_ type.
A fix might be wrapping columns with ST_ type with ST_AsText() so that actually a CLOB is getting retrieved, instead a geometry type (mind that in the database the geometries are a STRUCT type, but when retrieved with jaydebeapi they do have the getSubString() method, like a CLOB).
For raw SQL queries it's not straightforward knowing which columns have geometry type in the database.
* Add methods for dealing with Spatial reference systems (set and modify the Spatial reference system of a column).
* Handle SQLSTATE 38SU4 error when dealing with Spatial reference systems (see geoSeries.py).
* Include three small datasets, subsets of the tables GEO_COUNTY, GEO_CUSTOMER, and GEO_TORNADO of DashDB sample data in ibmdbpy so that tests of IdaGeoDataFrame and tests of IdaGeoSeries are performed with temporary tables created from this data. Change the tests accordingly if needed.
* Include also a data set for testing with Linestring types, to test the methods mid_point(), start_point() and end_point().
* Add examples to the geospatial methods in IdaGeoSeries.
## ODBC support from Linux
* Check the viability of using a syntax similar to what is currently used for Windows (i.e. giving to ibmdbpy only the previously-set data source name), or allowing to enter the whole ODBC settings as a string.
* Mind that there's a known bug of pypyodbc in python3 when using Pandas' read_sql() method (and dashDB driver), the column names of the returned DataFrame are not properly set. This was found when trying to use ibmdbpy with IBM ODBC Driver, from a Linux environment.