Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Is it correct to use Flink keyBy with a large number of keys?

If there are 10 million orders and 200,000 or more merchants, and I need to count the number of orders for each merchant, I use the following method (see code). Is this correct? Since there will be a lot of keys, I don't know whether Flink will maintain state for each key internally, and whether that could lead to an OOM.

orderStream
    .map(order -> Tuple2.of(order.getMerchantId(), 1))  // one count per order
    .returns(Types.TUPLE(Types.STRING, Types.INT))
    .keyBy(t -> t.f0)
    .reduce(new ReduceFunction<Tuple2<String, Integer>>() {
        @Override
        public Tuple2<String, Integer> reduce(Tuple2<String, Integer> v1,
                                              Tuple2<String, Integer> v2)
                throws Exception {
            return Tuple2.of(v1.f0, v1.f1 + v2.f1);
        }
    });


1 Answer


In such a case, Flink will maintain an Integer value as managed, keyed state. So once one or more orders have been seen for a merchant, Flink's state backend will hold an entry for that merchant, up to 200,000+ entries in total.

Flink is highly scalable, so it's not a problem to have a lot of keys. keyBy partitions the stream so that each task manager (worker) will only handle events for a subset of the keys. (This is a sharded key/value store.) Moreover, you can choose between a heap-based state backend that keeps this state in memory, or one that uses an embedded RocksDB instance on each task manager, which keeps the state on the local disk of each task manager.
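For the second option, the backend can be selected in `flink-conf.yaml`. A minimal sketch, assuming a Flink 1.13–1.18 release (newer releases rename the first key to `state.backend.type`):

```yaml
# flink-conf.yaml — use the embedded RocksDB backend,
# which keeps keyed state on each task manager's local disk
state.backend: rocksdb
# optional: only ship changed SST files on each checkpoint
state.backend.incremental: true
```

With the default heap-based backend instead, all keyed state lives in JVM memory on the task managers, which is fine for a few hundred thousand small values.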

Bottom line: 200,000 integers isn't much state. Nothing to worry about, even with a single task manager.
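As a rough back-of-the-envelope check (the per-entry overhead below is an assumption for illustration, not a measured Flink figure):

```java
public class StateSizeEstimate {
    public static void main(String[] args) {
        long merchants = 200_000;
        // assumed per-entry cost: 4-byte count plus ~36 bytes of key
        // and backend bookkeeping — a guess, not a measured value
        long bytesPerEntry = 40;
        long totalKiB = merchants * bytesPerEntry / 1024;
        System.out.println(totalKiB + " KiB");  // on the order of a few MiB
    }
}
```

Even with generous overhead, the total is in the single-digit megabytes, orders of magnitude below a typical task manager heap.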

