Fix FLUME-3233: only use inode to identify files then taildirSource will support file rename/rotation #336
The issue:
When a file is renamed or rotated, as log4j and similar logging systems do, Flume's taildirSource currently treats it as a new file and collects all of its contents again. This duplicates data, as described in FLUME-3233, FLUME-3219, FLUME-3094, FLUME-3216 and FLUME-2777.
The usual workaround is to monitor only the original *.log and NOT the renamed *.log.xxx. But for the two reasons below, we must monitor both *.log and the renamed *.log.xxx:
1. Sometimes the logging system writes asynchronously, so contents may be flushed to disk after the file has been renamed. If we do not monitor the renamed *.log.xxx, that content is sent only when Flume closes the inactive file. Flume does send it eventually, but with a delay that is currently determined by idleTimeout, which defaults to 120 seconds. In many cases that is unacceptable.
2. Sometimes both the service and Flume are shut down. The service restarts first, writes something to *.log, and renames it to *.log.xxx before Flume has restarted successfully. If we do not monitor the renamed *.log.xxx, that data is lost.
The solution:
This PR adds a new inodeOnly parameter to resolve the data duplication problem when monitoring both *.log and *.log.xxx, which gives taildirSource the ability to support file rename/rotation. By default, inodeOnly is false and Flume behaves exactly as it does now. When inodeOnly is set to true in the config, Flume uses only the inode to identify a file, so a renamed or rotated file is recognized as the same file and reading resumes from the recorded position. Both *.log and *.log.xxx can then be monitored without duplication, which resolves the two problems above.
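
To illustrate the idea, here is a minimal, self-contained sketch; it is not the code in this PR. It assumes a Unix-like file system where the "unix:ino" attribute is available and where a rename within one directory keeps the inode. The class InodeOnlySketch, the Position record, and the methods record, resumeOffset and resumeOffsetAfterRename are hypothetical names invented for illustration; only the inodeOnly flag comes from the description above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

public class InodeOnlySketch {

    /** Hypothetical position record: last known path and bytes already shipped. */
    static final class Position {
        final String path;
        final long offset;

        Position(String path, long offset) {
            this.path = path;
            this.offset = offset;
        }
    }

    private final boolean inodeOnly;
    private final Map<Long, Position> positions = new HashMap<>();

    InodeOnlySketch(boolean inodeOnly) {
        this.inodeOnly = inodeOnly;
    }

    /** Reads the inode number; only works on file systems exposing "unix:ino". */
    static long inodeOf(Path file) throws IOException {
        return ((Number) Files.getAttribute(file, "unix:ino")).longValue();
    }

    /** Remembers how far the given file has been read. */
    void record(Path file, long offset) throws IOException {
        positions.put(inodeOf(file),
                new Position(file.toAbsolutePath().toString(), offset));
    }

    /** Returns the offset to resume from, or 0 if the file counts as new. */
    long resumeOffset(Path file) throws IOException {
        Position known = positions.get(inodeOf(file));
        if (known == null) {
            return 0L; // inode never seen before
        }
        boolean samePath = known.path.equals(file.toAbsolutePath().toString());
        // With inodeOnly the path is ignored, so a rename keeps the offset.
        return (inodeOnly || samePath) ? known.offset : 0L;
    }

    /** Writes 5 bytes, records them as read, renames the file, then asks where to resume. */
    static long resumeOffsetAfterRename(boolean inodeOnly) {
        try {
            Path dir = Files.createTempDirectory("taildir-sketch");
            Path log = dir.resolve("app.log");
            Files.write(log, "hello".getBytes(StandardCharsets.UTF_8));

            InodeOnlySketch tracker = new InodeOnlySketch(inodeOnly);
            tracker.record(log, 5L);

            Path rotated = Files.move(log, dir.resolve("app.log.1"));
            return tracker.resumeOffset(rotated);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("inodeOnly=false -> " + resumeOffsetAfterRename(false));
        System.out.println("inodeOnly=true  -> " + resumeOffsetAfterRename(true));
    }
}
```

In this sketch, the renamed file restarts from offset 0 when inodeOnly is false (the duplication described under "The issue") and resumes from the recorded offset when inodeOnly is true.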