Move csv or other files into hive #156

Open
napjon opened this issue Mar 30, 2015 · 4 comments

Comments


napjon commented Mar 30, 2015

Hey guys, how do I move files to a remote Hive server? I have the hostname, username, password, and port.

I've been doing this:

into('hive://[email protected]:3022/default::iris','iris.csv',password='training')

But it returns:

/Users/jon/anaconda/lib/python2.7/site-packages/thrift/transport/TSocket.pyc in read(self, sz)
    118     if len(buff) == 0:
    119       raise TTransportException(type=TTransportException.END_OF_FILE,
--> 120                                 message='TSocket read 0 bytes')

TTransportException: TSocket read 0 bytes

I have succeeded in moving the file to the remote server, but moving it into Hive fails like this. Any ideas?

Member

cpcloud commented Mar 30, 2015

It looks like you're trying to write to Hive from the box where Hive is located. Is that correct? If so, I'll note that that code path isn't well tested. We should add documentation to this effect as well as any fixes that might be required for this to work.

If the CSV is already on the remote machine then you should have no problems doing something like:

into('hive://user@remote_ip:3022/default::iris', './path/to/local/file.csv', username='user', hostname='remote_ip', password='training')

If you have a private key file you can forgo using a plaintext password (usually a good idea) and do this:

into('hive://user@remote_ip:3022/default::iris', './path/to/local/file.csv', username='user', hostname='remote_ip', key_filename='/local/path/to/keyfile.key')

I just ran the latter command successfully. Let me know if you're still having problems.

Member

cpcloud commented Mar 30, 2015

made an issue to track odo-with-hive-on-local: #158

@JONHARI thanks for asking questions! keep 'em coming.

Author

napjon commented Mar 31, 2015

Hi @cpcloud. Thank you for your reply! Really excited about this project.

Currently I'm following remote.ipynb from this tutorial

Yes, I'm just testing with a VM. 3022 is the port forwarded to my VM, not the Hive port. If I do this,

auth = {
 'port': 3022,
 'username': 'training'}

result = into(pd.DataFrame,'ssh://127.0.0.1:data/iris.csv',**auth)
into('ssh://127.0.0.1:data/res1.csv',result,**auth)

Both work well, so I guess paramiko has detected my private key file. So how do I set which port to use for Hive and which port to use to connect to the VM? I still use the default Hive port, which is 10000.
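One way to check which service is actually listening on a forwarded port is to look at the first bytes the server sends. The sketch below is a stdlib-only heuristic and is not part of odo, pyhive, or paramiko; the helper names `classify_banner` and `read_banner` are hypothetical. It assumes that an SSH server announces itself with an identification string beginning with `SSH-`, while a Thrift-based service such as HiveServer2 typically waits for the client to speak first.

```python
import socket


def classify_banner(banner):
    """Guess the service from the first bytes a server sends (heuristic only)."""
    if banner.startswith(b'SSH-'):
        return 'ssh'      # SSH identification string, e.g. b'SSH-2.0-...'
    if not banner:
        return 'silent'   # server is waiting for the client, as Thrift servers usually do
    return 'unknown'


def read_banner(host, port, timeout=3.0):
    """Connect and return whatever the server sends first, or b'' if it stays quiet."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        try:
            return sock.recv(64)
        except socket.timeout:
            return b''


# Example (needs a reachable host, so it is left commented out):
# print(classify_banner(read_banner('127.0.0.1', 3022)))
```

If the forwarded port reports `ssh`, pointing a `hive://` URI at that same port would send Thrift traffic to the SSH daemon, which may be what produces the `TSocket read 0 bytes` error shown earlier in this thread.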

Member

cpcloud commented Mar 31, 2015

The Hive port is specified as I show above. If you don't pass anything, the pyhive library that we use to drive Hive will default to port 10000. You should be able to just pass the hostname/IP, like this:

into('hive://hostname/default::mytable', '/path/to/csv/file.csv')
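To illustrate how the port in a `hive://` URI relates to the default mentioned above, here is a small sketch that splits such a URI with the standard library. It is not odo's actual parsing code; the function name `hive_endpoint` and the constant `DEFAULT_HIVE_PORT` are made up for this example, and the `database::table` split simply mirrors the URIs used in this thread.

```python
from urllib.parse import urlsplit

DEFAULT_HIVE_PORT = 10000  # default port mentioned in this thread


def hive_endpoint(uri):
    """Return (host, port, database, table) for a hive:// URI (illustrative only)."""
    parts = urlsplit(uri)
    # The path looks like '/default::iris': database and table separated by '::'.
    database, _, table = parts.path.lstrip('/').partition('::')
    # The port written in the URI is the Hive port; fall back to the default if absent.
    port = parts.port if parts.port is not None else DEFAULT_HIVE_PORT
    return parts.hostname, port, database, table


print(hive_endpoint('hive://hostname/default::mytable'))
# ('hostname', 10000, 'default', 'mytable')
print(hive_endpoint('hive://user@remote_ip:3022/default::iris'))
# ('remote_ip', 3022, 'default', 'iris')
```

Read this way, the `:3022` in the earlier URI would be taken as the Hive port, so a port that is only an SSH forward would belong in the SSH settings instead.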

@cpcloud cpcloud modified the milestone: 0.3.4 Jul 1, 2015
@cpcloud cpcloud modified the milestones: 0.3.4, 0.4.0 Sep 15, 2015
@cpcloud cpcloud modified the milestones: 0.4.0, 0.3.5 Oct 5, 2015
@cpcloud cpcloud modified the milestones: 0.4.0, 0.4.1 Dec 4, 2015
@kwmsmith kwmsmith modified the milestones: 0.4.1, 0.4.2, 0.5.0 Feb 2, 2016
3 participants