Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
371 views
in Technique[技术] by (71.8m points)

python - How do I automatically fix an invalid JSON string?

From the 2gis API I got the following JSON string.

{
    "api_version": "1.3",
    "response_code": "200",
    "id": "3237490513229753",
    "lon": "38.969916127827",
    "lat": "45.069889625267",
    "page_url": null,
    "name": "ATB",
    "firm_group": {
        "id": "3237499103085728",
        "count": "1"
    },
    "city_name": "Krasnodar",
    "city_id": "3237585002430511",
    "address": "Turgeneva,   172/1",
    "create_time": "2008-07-22 10:02:04 07",
    "modification_time": "2013-08-09 20:04:36 07",
    "see_also": [
        {
            "id": "3237491513434577",
            "lon": 38.973110606808,
            "lat": 45.029031222211,
            "name": "Advance",
            "hash": "5698hn745A8IJ1H86177uvgn94521J3464he26763737242Cf6e654G62J0I7878e",
            "ads": {
                "sponsored_article": {
                    "title": "Center "ADVANCE"",
                    "text": "Business.English."
                },
                "warning": null
            }
        }
    ]
}

But Python doesn't recognize it:

json.loads(firm_str)

Expecting , delimiter: line 1 column 3646 (char 3645)

It looks like a problem with quotes in: "title": "Center "ADVANCE""

How can I fix it automatically in Python?

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

The answer by @Michael gave me an idea... not a very pretty idea, but it seems to work, at least on your example: Try to parse the JSON string, and if it fails, look for the character where it failed in the exception string1 and replace that character.

while True:
    try:
        result = json.loads(s)   # try to parse...
        break                    # parsing worked -> exit loop
    except Exception as e:
        # "Expecting , delimiter: line 34 column 54 (char 1158)"
        # position of unexpected character after '"'
        unexp = int(re.findall(r'(char (d+))', str(e))[0])
        # position of unescaped '"' before that
        unesc = s.rfind(r'"', 0, unexp)
        s = s[:unesc] + r'"' + s[unesc+1:]
        # position of correspondig closing '"' (+2 for inserted '')
        closg = s.find(r'"', unesc + 2)
        s = s[:closg] + r'"' + s[closg+1:]
print result

You may want to add some additional checks to prevent this from ending in an infinite loop (e.g., at max as many repetitions as there are characters in the string). Also, this will still not work if an incorrect " is actually followed by a comma, as pointed out by @gnibbler.

Update: This seems to work pretty well now (though still not perfect), even if the unescaped " is followed by a comma, or closing bracket, as in this case it will likely get a complaint about a syntax error after that (expected property name, etc.) and trace back to the last ". It also automatically escapes the corresponding closing " (assuming there is one).


1) The exception's str is "Expecting , delimiter: line XXX column YYY (char ZZZ)", where ZZZ is the position in the string where the error occurred. Note, though, that this message may depend on the version of Python, the json module, the OS, or the locale, and thus this solution may have to be adapted accordingly.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...