This repository has been archived by the owner on Sep 17, 2019. It is now read-only.

How to increase the output throughput for inserting the records #158

Open

rkhapre opened this issue Jul 2, 2019 · 7 comments

Comments

@rkhapre

rkhapre commented Jul 2, 2019

Currently I am inserting 191277 records into the database. My source is CSV.
I am using 12 workers and a batch size of 6000.
My input completes in 2-3 minutes, then the filter stage gets stuck, and even after 1 hour the output is as below (only 2158 events out).

How can I increase my output throughput? Any suggestions?
I have tried various options like the ones below, but saw no improvement:
max_pool_size => 50
connection_timeout => 100000
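
For context, the jdbc output block I am tuning looks roughly like this (the driver path, connection string and statement here are placeholders rather than my real values):

output {
  jdbc {
    driver_jar_path => "/path/to/jdbc-driver.jar"                                    # placeholder path
    connection_string => "jdbc:mysql://db-host:3306/mydb?user=me&password=secret"    # placeholder URL
    statement => [ "INSERT INTO my_table (col_a, col_b) VALUES (?, ?)", "field_a", "field_b" ]    # placeholder statement
    max_pool_size => 50
    connection_timeout => 100000
  }
}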

The below is what I get from the monitoring API:

"events" : {
    "in" : 191277,
    "filtered" : 68630,
    "out" : 2158,
    "duration_in_millis" : 5406886,
    "queue_push_duration_in_millis" : 79073
  }
@theangryangel
Owner

Check your logs to see if you’re getting backed up because of an SQL exception being retried.

Bear in mind that your connection timeout should not exceed the SQL server's timeout.
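
As an illustration only (assuming a MySQL backend here, with a placeholder value; check the plugin README for the unit this option expects and your own server for its configured limit):

output {
  jdbc {
    # ... driver, connection_string and statement as in your config ...
    # On MySQL the server-side idle limit can be checked with:
    #   SHOW VARIABLES LIKE 'wait_timeout';
    # Keep connection_timeout below whatever the server enforces.
    connection_timeout => 10000
  }
}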

@rkhapre
Author

rkhapre commented Jul 3, 2019

Thanks for your reply. I tried this, but I didn't see any warning/error or anything like an SQL exception in the log. It just gets choked at the output level.

I set the connection timeout to less than the SQL server's timeout. With this there was a slight improvement, but nothing drastic.

Earlier it was loading in batches of 11 every second; now it is loading in batches of 20 and sometimes 30-35. That is the only improvement I see.

But here I am trying to load in batches of 6000, which is not going through.

Do you have any benchmarks for uploading a huge dataset, or some numbers on how much time it takes to flush the output data?

If performance benchmarks are available, that will give me the confidence to invest more time in coming up with the right configuration for this plugin.

Thanks

@theangryangel
Owner

theangryangel commented Jul 3, 2019 via email

@rkhapre
Author

rkhapre commented Jul 4, 2019

I am using 6.8. Do you mean you will not be continuing to maintain this plugin going forward?
That's very sad.
I think there is a possibility that it is slow because of the 6.8 version.

@theangryangel
Owner

The intention is to maintain the plugin as best as I can, but I cannot keep up with elastic's release schedule. I will have the system I'm using it for in place for at least another year, but it's marked as legacy. It's not something I can dedicate enough time to in the evenings or weekends at the moment.

I'm happy for others to step in as co-maintainers. Plenty of people fork the repo, some add features, but very few send any PRs. The majority of the work is helping people with their specific setup via email, and very often it's simply configuration mistakes (which could probably be solved by better documentation and tests, but again I'm finding it hard to make time for it).

@oak1998

oak1998 commented Jul 19, 2019

I just imported 500,000 rows from one MySQL instance to another, which took around 55 minutes; that seems not bad compared with @rkhapre's result.
I am using the default performance settings; my Logstash version is 6.8 too, and the MySQL version is 8.0. I recommend checking the JDBC driver to make sure its version matches your MySQL instance.
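
For example (paths and versions below are illustrative, assuming MySQL Connector/J as the driver), the driver is wired into the output block roughly like this, and the jar should track the server you are writing to:

output {
  jdbc {
    # Illustrative: a Connector/J 8.x jar for a MySQL 8.0 server, per the suggestion above
    driver_jar_path => "/opt/logstash/drivers/mysql-connector-java-8.0.16.jar"
    driver_class => "com.mysql.cj.jdbc.Driver"
    connection_string => "jdbc:mysql://db-host:3306/mydb"                             # placeholder URL
    statement => [ "INSERT INTO my_table (col_a) VALUES (?)", "field_a" ]             # placeholder statement
  }
}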

@stale

stale bot commented Sep 17, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Sep 17, 2019
@stale stale bot removed the stale label Sep 17, 2019