-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Java] How dictionaries work - roundtrip Java-Python #327
base: main
Are you sure you want to change the base?
[Java] How dictionaries work - roundtrip Java-Python #327
Conversation
Hi @vibhatha, just reviewing error logs and error mention that arrow-memory-netty is not available on the nightly packages, and that is true, as you can see at https://nightlies.apache.org/arrow/java/org/apache/arrow/arrow-memory-netty/, which is very rare, but it needs to be reviewed. Alternatively, you could start using the 14.0.0-SNAPSHOT version which can be configured in the arrow-cookbook/java/source/conf.py file as follows:
|
If I understand correctly, should I make that change in this PR itself? Or should we create a separate PR for that? |
This change is being added to #320 as well, but it is true, this must have been created independently. |
@davisusanibar sounds good and I will rebase once this PR is merged. Thank you 👍 |
7befe38
to
c35eefd
Compare
@danepitkin @davisusanibar This PR is ready for review, please take a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @vibhatha ! This will be a great addition to the cook book. I'm not super familiar with the C Data interface / jpype so reviewing this was a good learning opportunity for me.
java/source/index.rst
Outdated
@@ -43,6 +43,7 @@ This cookbook is tested with Apache Arrow |version|. | |||
data | |||
avro | |||
jdbc | |||
python_java |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would "c data interface" potentially be a better name? The section below uses pyarrow/jpype as a python example, but in the future we could also add other language interfaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danepitkin
Thank you for the quick review and suggestions. And I agree with your comment. Will make the changed.
java/source/python_java.rst
Outdated
In the Java component, the MapValuesConsumer class receives data from the Python component through C Data. | ||
It then updates the data and sends it back to the Python component. | ||
|
||
.. code-block:: java |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Java component is tested by custom directives created with testcode
and testoutput
.
During internal works, Java code is executed by Jshell and output defined by system.out.print on the testcode is compared to the value defined on the test output, such as:
.. testcode::
System.out.print("testme");
.. testoutput::
testme
java/source/python_java.rst
Outdated
return this.vector; | ||
} | ||
|
||
public void update(long c_array_ptr, long c_schema_ptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be an option to also validate Python call as as part of Java testing using .. testcode::
and .. testoutput::
directives?
Something like this:
import org.apache.arrow.c.ArrowArray;
import org.apache.arrow.c.ArrowSchema;
import org.apache.arrow.c.CDataDictionaryProvider;
import org.apache.arrow.c.Data;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.BigIntVector;
import org.apache.arrow.vector.FieldVector;
public class MapValuesConsumer {
private final static BufferAllocator allocator = new RootAllocator();
private final CDataDictionaryProvider provider;
private FieldVector vector;
public MapValuesConsumer(CDataDictionaryProvider provider) {
this.provider = provider;
}
public MapValuesConsumer() {
this.provider = null;
}
public static BufferAllocator getAllocatorForJavaConsumer() {
return allocator;
}
public FieldVector getVector() {
return this.vector;
}
public void update(long c_array_ptr, long c_schema_ptr) {
ArrowArray arrow_array = ArrowArray.wrap(c_array_ptr);
ArrowSchema arrow_schema = ArrowSchema.wrap(c_schema_ptr);
this.vector = Data.importVector(allocator, arrow_array, arrow_schema, this.provider);
this.doWorkInJava(vector);
}
public void update2(long c_array_ptr, long c_schema_ptr) {
ArrowArray arrow_array = ArrowArray.wrap(c_array_ptr);
ArrowSchema arrow_schema = ArrowSchema.wrap(c_schema_ptr);
this.vector = Data.importVector(allocator, arrow_array, arrow_schema, null);
this.doWorkInJava(vector);
}
private void doWorkInJava(FieldVector vector) {
System.out.println("Doing work in Java");
BigIntVector bigIntVector = (BigIntVector)vector;
bigIntVector.setSafe(0, 2);
}
public static void main(String[] args) {
simulateAsAJavaConsumers();
}
final static BigIntVector intVector =
new BigIntVector("internal_test", allocator);
public static BigIntVector getIntVectorForJavaConsumers() {
intVector.allocateNew(3);
intVector.set(0, 1);
intVector.set(1, 7);
intVector.set(2, 93);
intVector.setValueCount(3);
return intVector;
}
public static void simulateAsAJavaConsumers() {
MapValuesConsumer mvc = new MapValuesConsumer();//FIXME! Use constructor with dictionary provider
try (
ArrowArray arrowArray = ArrowArray.allocateNew(allocator);
ArrowSchema arrowSchema = ArrowSchema.allocateNew(allocator)
) {
//FIXME! Add custo logic to emulate a dictionary provider adding
Data.exportVector(allocator, getIntVectorForJavaConsumers(), null, arrowArray, arrowSchema);
mvc.update2(arrowArray.memoryAddress(), arrowSchema.memoryAddress());
try (FieldVector valueVectors = Data.importVector(allocator, arrowArray, arrowSchema, null);) {
System.out.print(valueVectors); //FIXME! Validate on .. testoutput::
}
}
intVector.close(); //FIXME! Expose this method also to be called by end python program
allocator.close();
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the insight, @davisusanibar. I will work on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davisusanibar this code doesn't seem to be working, but I get your idea.
68d140b
to
46bba74
Compare
@danepitkin I am addressing the reviews and thank you for reviewing this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work @vibhatha ! A few small comments, but overall LGTM otherwise.
@danepitkin I am working on updating this PR for Java doctests. I will address the reviews as well. |
java/source/python_java.rst
Outdated
This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java, | ||
and finally re-accessed and validated in Python for data consistency. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java, | |
and finally re-accessed and validated in Python for data consistency. | |
This section demonstrates a data roundtrip, where a dictionary array is created in Python, accessed and updated in Java, | |
and finally re-accessed and validated in Python for data consistency. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This description is misleading. You cannot mutate data in the C Data Interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the wording.
This section demonstrates a data roundtrip where C Data interface is being used to provide
the seamless access to data across language boundaries.
In the Java component, the MapValuesConsumer class receives data from the Python component through C Data. | ||
It then updates the data and sends it back to the Python component. | ||
|
||
.. testcode:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Format the code here as well.
java/source/python_java.rst
Outdated
In the Java component, the MapValuesConsumer class receives data from the Python component through C Data. | ||
It then updates the data and sends it back to the Python component. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the Java component, the MapValuesConsumer class receives data from the Python component through C Data. | |
It then updates the data and sends it back to the Python component. | |
In the Java component, the MapValuesConsumer class receives data from the Python component through C Data. | |
It then updates the data and sends it back to the Python component. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, misleading wording
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the wording. Thanks for catching this.
@lidavidm thanks a lot for your feedback. I will address these. |
Given we have no roundtrip examples at all, I don't see why we are starting with dictionaries vs a simpler type |
Could it be consider as a simpler type #325? |
Good point. Although, I am merely picking up an existing issue. |
This PR closes #213