Load Table error (size out of INT32_MAX) #280

Open
bigwater opened this issue Nov 13, 2019 · 2 comments

@bigwater

bigwater commented Nov 13, 2019

Problem

I was trying to load a table into OmniSci from a pandas DataFrame. I created a DataFrame with two columns and N_GEN rows.

from pymapd import connect
import pandas as pd
import numpy as np

con = connect(user="admin", password="HyperInteractive", host="localhost", dbname="omnisci", port=6274)

N_GEN = 2 ** 28
arr1 = np.random.rand(N_GEN)
arr2 = np.random.randint(100, size=N_GEN)
df_arr1 = pd.DataFrame(zip(arr1, arr2), columns=[ 'num', 'grp'])

print(df_arr1.info())
print(df_arr1.shape)

With N_GEN = 2^28, the DataFrame uses 4.0 GB of memory (as reported by the pandas info() method).

We use the following code to insert the DataFrame into the DB.

import time

start = time.time()
con.execute('drop table if exists t2;')
con.load_table('t2', df_arr1)
end = time.time()

print(end - start)

However, when we tried to load it into the DB, it raised an error:

OverflowError: size out of range: exceeded INT32_MAX

The error message does not make sense to me: 2^28 is much less than INT32_MAX, right?

I wonder why this happened and how I can fix it.

Thank you so much!

Config

pymapd                    0.17.0                     py_0    conda-forge
omnisci-os-4.8.1-20190903-e9ac6920a3

The complete error call stack:

---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-3-6b1dc12b0e01> in <module>
      3 start = time.time()
      4 con.execute('drop table if exists t2;')
----> 5 con.load_table('t2', df_arr1)
      6 end = time.time()
      7 

~/miniconda3/envs/xgbnew/lib/python3.7/site-packages/pymapd/connection.py in load_table(self, table_name, data, method, preserve_index, create)
    542             if (isinstance(data, pd.DataFrame)
    543                 or isinstance(data, pa.Table) or isinstance(data, pa.RecordBatch)): # noqa
--> 544                 return self.load_table_arrow(table_name, data)
    545 
    546             elif (isinstance(data, pd.DataFrame)):

~/miniconda3/envs/xgbnew/lib/python3.7/site-packages/pymapd/connection.py in load_table_arrow(self, table_name, data, preserve_index)
    690                                            preserve_index=preserve_index)
    691         self._client.load_table_binary_arrow(self._session, table_name,
--> 692                                              payload.to_pybytes())
    693 
    694     def render_vega(self, vega, compression_level=1):

~/miniconda3/envs/xgbnew/lib/python3.7/site-packages/omnisci/mapd/MapD.py in load_table_binary_arrow(self, session, table_name, arrow_stream)
   2549          - arrow_stream
   2550         """
-> 2551         self.send_load_table_binary_arrow(session, table_name, arrow_stream)
   2552         self.recv_load_table_binary_arrow()
   2553 

~/miniconda3/envs/xgbnew/lib/python3.7/site-packages/omnisci/mapd/MapD.py in send_load_table_binary_arrow(self, session, table_name, arrow_stream)
   2558         args.table_name = table_name
   2559         args.arrow_stream = arrow_stream
-> 2560         args.write(self._oprot)
   2561         self._oprot.writeMessageEnd()
   2562         self._oprot.trans.flush()

~/miniconda3/envs/xgbnew/lib/python3.7/site-packages/omnisci/mapd/MapD.py in write(self, oprot)
  13681     def write(self, oprot):
  13682         if oprot._fast_encode is not None and self.thrift_spec is not None:
> 13683             oprot.trans.write(oprot._fast_encode(self, [self.__class__, self.thrift_spec]))
  13684             return
  13685         oprot.writeStructBegin('load_table_binary_arrow_args')

OverflowError: size out of range: exceeded INT32_MAX

@randyzwitch
Contributor

Unfortunately, this looks like an error in the Arrow method. If you try load_table(..., method='columnar'), does it work?
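
For context on why 2^28 rows can trip an INT32_MAX check: judging from the traceback, the overflow is raised while Thrift encodes the arrow_stream bytes, and Thrift's binary protocol prefixes a binary field with a signed 32-bit length. So the limit most likely applies to the payload size in bytes, not to the row count. A rough estimate, assuming both columns are stored as 8-byte values (float64 and int64, which matches the 4.0 GB reported by info()) and ignoring any Arrow framing overhead:

```python
INT32_MAX = 2**31 - 1   # largest value a signed 32-bit length field can hold

n_rows = 2**28          # N_GEN from the report
bytes_per_row = 8 + 8   # assumption: one float64 'num' plus one int64 'grp'

# Raw column data only; the real Arrow payload would be slightly larger.
payload_bytes = n_rows * bytes_per_row

print(payload_bytes)              # 4294967296
print(payload_bytes > INT32_MAX)  # True
print(n_rows > INT32_MAX)         # False
```

Under these assumptions the row count is well within range, but the serialized payload is about twice what a single 32-bit length field can describe.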

@semelianova

@randyzwitch thanks, it works for me. I had the same issue.
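
If the Arrow path is still preferred, another option might be to send the data in several smaller calls so that each payload stays under the limit. The sketch below only computes the row ranges; chunk_bounds and the 1 GiB budget are hypothetical choices made for illustration, not part of pymapd, and the 16 bytes per row is the same assumption as for the two 8-byte columns in the report.

```python
def chunk_bounds(n_rows, bytes_per_row, max_bytes=2**30):
    """Yield (start, stop) row ranges whose estimated size stays within max_bytes.

    max_bytes defaults to 1 GiB, an arbitrary margin below INT32_MAX that
    leaves room for serialization overhead.
    """
    rows_per_chunk = max(1, max_bytes // bytes_per_row)
    for start in range(0, n_rows, rows_per_chunk):
        yield start, min(start + rows_per_chunk, n_rows)


# 2**28 rows at an assumed 16 bytes per row
bounds = list(chunk_bounds(2**28, 16))
print(len(bounds))   # 4
print(bounds[0])     # (0, 67108864)
print(bounds[-1])    # (201326592, 268435456)
```

Each range could then be passed along as con.load_table('t2', df_arr1.iloc[start:stop]), on the assumption that repeated load_table calls append to the existing table; that behavior has not been checked here against pymapd 0.17.0.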
