I have a Kafka stream of Protobuf messages that I would like to write to S3 so they can be queried with Athena or similar tools. My plan so far is to convert the Protobuf messages to Parquet, create an Athena table on top of the files, and repair the table whenever new files are added. Are there any good libraries that already do this? I am not very experienced in this area and would like to start from a solid foundation. My tech stack is the standard JVM languages (Kotlin, Java, and Scala), along with Spring, Kafka, Parquet, AWS, and S3.
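To make the "repair the table" step concrete, here is a minimal sketch of the S3 layout I have in mind. It assumes the table is partitioned by a single date column named `dt` and that files land under Hive-style `dt=YYYY-MM-DD` prefixes, which is the layout `MSCK REPAIR TABLE` scans for. The class name, method names, key format, and the `events`/`orders` values are all hypothetical, and the Protobuf-to-Parquet conversion itself is left out.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: builds S3 object keys in a Hive-style partition layout.
final class PartitionLayout {
    // Formats an Instant as a UTC calendar date, e.g. 2024-03-05.
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ISO_LOCAL_DATE.withZone(ZoneOffset.UTC);

    // Key for one Parquet file. Including the Kafka partition and the first
    // offset in the file name keeps names unique and makes rewrites idempotent.
    static String objectKey(String prefix, String topic, Instant recordTime,
                            int kafkaPartition, long firstOffset) {
        return prefix + "/" + topic
                + "/dt=" + DAY.format(recordTime)
                + "/part-" + kafkaPartition + "-" + firstOffset + ".parquet";
    }

    // Statement to run in Athena after files land under a new dt= prefix.
    static String repairStatement(String table) {
        return "MSCK REPAIR TABLE " + table;
    }
}
```

For example, `PartitionLayout.objectKey("events", "orders", Instant.parse("2024-03-05T23:59:59Z"), 3, 1200L)` yields `events/orders/dt=2024-03-05/part-3-1200.parquet`. A repair would only be needed when a new `dt=` prefix appears, not for every file added to an existing partition.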