Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

amazon web services - HIVE_BAD_DATA: Error Parsing field value JSONObject cant be cast as Int

I am uploading a minified JSON file to S3 from a Lambda function that pulls data from an API and saves part of it as JSON. A Glue crawler then crawls the bucket, finds the new JSON file, and adds a table to a database in the Data Catalog. When I query that table in Athena I get the error message below. I am not sure why the field value would be the full JSON object instead of the INT value. My setup is described below as best I can.

Any solutions or ideas would be greatly appreciated.

Query: SELECT * FROM "<database_name>"."<table_name>" limit 10;

Error Message: HIVE_BAD_DATA: Error parsing field value '{'id': 1, 'name': 'some name', 'is_active': true}' for field 0: org.openx.data.jsonserde.json.JSONObject cannot be cast to java.lang.Integer.

Example File Name: <time_uploaded>.json (e.g. 12:00:00.json)

Example JSON (actual is minified):

[
{'id': 1, 'name': 'some name', 'is_active': true},
{'id': 1, 'name': 'some name', 'is_active': true},
{'id': 1, 'name': 'some name', 'is_active': true}
]

Bucket Path: <bucket_name>/<database_name>/<table_name>/<date_uploaded>/

JSON Classifier: $[*]

question from:https://stackoverflow.com/questions/65907532/hive-bad-data-error-parsing-field-value-jsonobject-cant-be-cast-as-int

1 Answer

Can't test it right now but this is what I would do:

  1. Replace single quotes (') with double quotes (") around keys and string values in the JSON documents; I am not sure whether Athena can read single-quoted JSON.
  2. Remove each entry from the list and treat them as newline-delimited JSON documents, one per line. Ex.:

{"id": 1, "name": "some name", "is_active": true}

{"id": 1, "name": "some name", "is_active": true}

{"id": 1, "name": "some name", "is_active": true}

Note: Remember that Athena can't read multiple JSON documents on the same line. It only reads the first occurrence on each line of the text file.

  3. Check for wrong or unescaped quotes.
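The three steps above could be sketched in Python roughly as follows. This is untested against Athena and only illustrates the idea: `to_json_lines` and `bad_lines` are hypothetical helper names, and the sample records simply mirror the example from the question. It assumes the Lambda already holds the data as a list of dicts, and it lets `json.dumps` produce the double-quoted output rather than patching quotes by hand.

```python
import json


def to_json_lines(records):
    """Serialize a list of dicts as newline-delimited JSON, one object per line.

    json.dumps always emits double-quoted keys and strings and escapes
    embedded quotes, which covers steps 1 and 3.
    """
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


def bad_lines(text):
    """Return 1-based numbers of lines that are not exactly one JSON object."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            value = json.loads(line)
        except json.JSONDecodeError:
            # Single quotes, unescaped quotes, or two documents on one line.
            bad.append(number)
            continue
        if not isinstance(value, dict):
            bad.append(number)  # e.g. a whole array on one line
    return bad


# Sample data mirroring the question (hypothetical values).
records = [
    {"id": 1, "name": "some name", "is_active": True},
    {"id": 1, "name": "some name", "is_active": True},
    {"id": 1, "name": "some name", "is_active": True},
]

body = to_json_lines(records)
print(body, end="")
print(bad_lines(body))
```

The resulting `body` string is what would be uploaded to S3 in place of the array. Running `bad_lines` on a file that is already in the bucket is a quick way to spot lines the SerDe is likely to reject.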
