Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Is it correct to use Flink keyBy with a large number of keys?

If there are 10 million orders and 200,000 or more merchants, and I need to count the number of orders for each merchant, I use the following method (see code). Is this correct? Since there will be a lot of keys, I don't know whether Flink will maintain state for each key internally, and whether that could lead to an OOM.

orderStream
    .map(order -> Tuple2.of(order.getMerchantId(), 1))  // one count per order
    .returns(Types.TUPLE(Types.STRING, Types.INT))
    .keyBy(t -> t.f0)
    .reduce(new ReduceFunction<Tuple2<String, Integer>>() {
        @Override
        public Tuple2<String, Integer> reduce(Tuple2<String, Integer> v1,
                                              Tuple2<String, Integer> v2)
                throws Exception {
            return Tuple2.of(v1.f0, v1.f1 + v2.f1);
        }
    });


1 Answer


In such a case, Flink will maintain an Integer value as managed, keyed state. So once one or more orders have been seen for a merchant, Flink's state backend will hold an entry for that merchant, up to 200,000+ entries in total.

Flink is highly scalable, so it's not a problem to have a lot of keys. keyBy partitions the stream so that each task manager (worker) will only handle events for a subset of the keys. (This is a sharded key/value store.) Moreover, you can choose between a heap-based state backend that keeps this state in memory, or one that uses an embedded RocksDB instance on each task manager, which keeps the state on the local disk of each task manager.
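For the second option, the backend can be selected in `flink-conf.yaml`. A minimal sketch, assuming a Flink 1.13–1.18 release (newer releases rename the first key to `state.backend.type`):

```yaml
# flink-conf.yaml — use the embedded RocksDB backend,
# which keeps keyed state on each task manager's local disk
state.backend: rocksdb
# optional: only ship changed SST files on each checkpoint
state.backend.incremental: true
```

With the default heap-based backend instead, all keyed state lives in JVM memory on the task managers, which is fine for a few hundred thousand small values.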

Bottom line: 200,000 integers isn't much state. Nothing to worry about, even with a single task manager.
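As a rough back-of-the-envelope check (the per-entry overhead below is an assumption for illustration, not a measured Flink figure):

```java
public class StateSizeEstimate {
    public static void main(String[] args) {
        long merchants = 200_000;
        // assumed per-entry cost: 4-byte count plus ~36 bytes of key
        // and backend bookkeeping — a guess, not a measured value
        long bytesPerEntry = 40;
        long totalKiB = merchants * bytesPerEntry / 1024;
        System.out.println(totalKiB + " KiB");  // on the order of a few MiB
    }
}
```

Even with generous overhead, the total is in the single-digit megabytes, orders of magnitude below a typical task manager heap.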

