I've been looking into the concepts and applications of Kafka Connect, and I even worked on one project based on it during an internship. In my current job, I'm considering replacing the architecture of our real-time data ingestion platform, which is currently Flume -> Kafka, with Kafka Connect and Kafka.
The reasons I'm considering the change mainly come down to:
- If we use Flume, we have to install an agent on every remote machine, which creates a lot of extra DevOps work. This is especially true where I work, because access to machines is managed so rigidly that maintaining utilities on machines belonging to other departments is difficult.
- The machines' OS environments also vary. If we install Flume on a wide range of machines, some of them, with different operating systems and JDKs (I've come across some with the IBM JDK), simply can't run Flume properly, which in the worst case results in zero data ingestion.
It looks like Kafka Connect can be deployed centrally alongside our Kafka cluster, so the DevOps cost would go down. Besides, we could avoid installing Flume on machines that belong to others, and avoid the risk of incompatible environments, to ensure stable ingestion of data from every remote machine.
Also, the most common ingestion scenario is simply to ingest log text files that are being written in real time on remote machines (on Linux and Unix file systems) into Kafka topics, and that's it. So I won't need advanced connectors that aren't supported in the Apache version of Kafka.
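To make the scenario concrete, here is a rough sketch of the kind of connector definition I have in mind, assuming the `FileStreamSourceConnector` example connector from Apache Kafka and the Connect REST API's `POST /connectors` payload shape. The connector name, file path, and topic name are hypothetical placeholders, and the helper function is only for illustration. As far as I understand, this connector reads the file from the file system of the machine running the Connect worker, which is part of what I'm unsure about.

```python
import json


def file_source_connector(name, path, topic):
    """Build a JSON-serializable connector definition for POST /connectors.

    Assumption: the FileStreamSource example connector is available on the
    Connect worker's plugin path. All argument values are placeholders.
    """
    return {
        "name": name,
        "config": {
            "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
            "tasks.max": "1",
            "file": path,    # log file to tail (path as seen by the worker)
            "topic": topic,  # Kafka topic that receives one record per line
        },
    }


# Hypothetical values, for illustration only.
payload = file_source_connector("app-log-source", "/var/log/app/app.log", "app-logs")
print(json.dumps(payload, indent=2))
```

The printed JSON is what would be submitted to a Connect worker's REST endpoint; nothing here contacts a cluster.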
However, I'm not sure whether I'm understanding the usage and use cases of Kafka Connect correctly, so I hope someone more experienced can shed some light on this.