I am uploading a minified JSON file to S3 from a Lambda function that pulls data from an API and saves some of it as JSON. A Glue crawler then crawls the bucket, finds the new JSON file, and adds a table to a database in the Data Catalog. When I query that table in Athena, I get the error message below. I am not sure why the field value is the full JSON object instead of the INT value. Below is my setup, as best I can describe it.
Any solutions or ideas would be greatly appreciated.
Query:
SELECT * FROM "<database_name>"."<table_name>" limit 10;
Error Message:
HIVE_BAD_DATA: Error parsing field value '{'id': 1, 'name': 'some name', 'is_active': true}' for field 0: org.openx.data.jsonserde.json.JSONObject cannot be cast to java.lang.Integer.
Example File Name:
<time_uploaded>.json
12:00:00.json
Example JSON (the actual file is minified):
[
{'id': 1, 'name': 'some name', 'is_active': true},
{'id': 1, 'name': 'some name', 'is_active': true},
{'id': 1, 'name': 'some name', 'is_active': true}
]
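For reference, Athena's JSON SerDes read a file line by line and treat each line as one record. A minified top-level array is therefore a single line. Below is a minimal sketch of serializing the same records in two layouts: one minified array, or newline-delimited JSON with one object per line. It assumes the Lambda is written in Python, which the question does not say. `records`, `to_minified_array` and `to_json_lines` are hypothetical names. `json.dumps` also emits double-quoted keys and lowercase `true`.

```python
import json

# Hypothetical records standing in for what the API call returns.
records = [
    {"id": 1, "name": "some name", "is_active": True},
    {"id": 1, "name": "some name", "is_active": True},
    {"id": 1, "name": "some name", "is_active": True},
]


def to_minified_array(rows):
    """One top-level JSON array on a single line (the layout described above)."""
    return json.dumps(rows, separators=(",", ":"))


def to_json_lines(rows):
    """Newline-delimited JSON: one complete, minified object per line."""
    return "\n".join(json.dumps(row, separators=(",", ":")) for row in rows) + "\n"


if __name__ == "__main__":
    print(to_minified_array(records))
    print(to_json_lines(records), end="")
```

A line-oriented reader sees the first layout as one record holding the whole array. It sees the second layout as three records, each with `id`, `name` and `is_active` at the top level.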
Bucket Path:
<bucket_name>/<database_name>/<table_name>/<date_uploaded>/
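The key layout above can be built from the upload time. Below is a sketch in Python under a few assumptions:

- Timestamps are UTC.
- `<date_uploaded>` uses a `YYYY-MM-DD` format. The question only shows the time format, `12:00:00`.
- The database and table names are placeholders.
- `build_key` is a hypothetical helper.

The bucket name is not part of the S3 object key, so it is left out.

```python
from datetime import datetime, timezone


def build_key(database, table, uploaded_at):
    """Object key <database_name>/<table_name>/<date_uploaded>/<time_uploaded>.json.

    The YYYY-MM-DD date format is an assumption; the time format matches
    the example file name 12:00:00.json.
    """
    date_part = uploaded_at.strftime("%Y-%m-%d")
    time_part = uploaded_at.strftime("%H:%M:%S")
    return f"{database}/{table}/{date_part}/{time_part}.json"


if __name__ == "__main__":
    uploaded = datetime(2021, 1, 26, 12, 0, 0, tzinfo=timezone.utc)
    print(build_key("my_database", "my_table", uploaded))
    # my_database/my_table/2021-01-26/12:00:00.json
```

In a Lambda, the resulting key would then be passed to boto3's `s3_client.put_object(Bucket=bucket_name, Key=key, Body=body)`.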
JSON Classifier:
$[*]
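The JSON path `$[*]` selects every element of the top-level array. That is what lets the crawler infer `id`, `name` and `is_active` as columns. The sketch below is a rough illustration only, a deliberately simplified model rather than how Glue is implemented. It applies `$[*]` to a minified array and maps each key's Python type to a Hive type name. `infer_columns`, `hive_type` and `HIVE_TYPES` are hypothetical.

```python
import json

# Simplified Python-type to Hive-type mapping (hypothetical, for illustration).
HIVE_TYPES = [(bool, "boolean"), (int, "int"), (float, "double"), (str, "string")]


def hive_type(value):
    # bool is checked first because bool is a subclass of int in Python.
    for py_type, name in HIVE_TYPES:
        if isinstance(value, py_type):
            return name
    # Anything else (lists, objects, null) falls back to string in this sketch.
    return "string"


def infer_columns(text):
    """Apply $[*] (every element of the top-level array) and collect columns."""
    columns = {}
    for element in json.loads(text):  # equivalent of $[*]
        for key, value in element.items():
            columns.setdefault(key, hive_type(value))
    return columns


if __name__ == "__main__":
    body = '[{"id":1,"name":"some name","is_active":true}]'
    print(infer_columns(body))
    # {'id': 'int', 'name': 'string', 'is_active': 'boolean'}
```

The first inferred column is `id` with type `int`. That lines up with the `field 0` and `java.lang.Integer` in the error message above.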
question from:
https://stackoverflow.com/questions/65907532/hive-bad-data-error-parsing-field-value-jsonobject-cant-be-cast-as-int