Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


How can I get the body of a Gmail email with an attachment using the Gmail Python API?

This is my code to get the body of the email:

import base64

body = []
body.append(msg['payload']['parts'])
if 'data' in body[0][0]['body']:
    print("goes path 1")
    body = base64.urlsafe_b64decode(
        body[0][0]['body']['data'])
elif 'data' in body[0][1]['body']:
    print("goes path 2")
    body = base64.urlsafe_b64decode(
        body[0][1]['body']['data'])
else:
    # What do I do here?
    pass

The reason I have the if/elif statements is that the body is sometimes in different places, so I have to try both of them. When I ran this on an email that had an attachment, it raised a KeyError because 'data' did not exist, which means the body is probably in a different place. The JSON object of body is in the image linked below because it is too big to paste here. How do I get the body of the email?

Image of the JSON object: https://i.stack.imgur.com/Ufh5E.png

Edit:

The answers given by @fullfine aren't working. They output another JSON object, the body of which cannot be decoded for some reason:

binascii.Error: Invalid base64-encoded string: number of data characters (1185) cannot be 1 more than a multiple of 4

and:

binascii.Error: Incorrect padding

An example of a JSON object that I got from their answer is:

{'size': 370, 'data': 'PGRpdiBkaXI9Imx0ciI-WW91IGFyZSBpbnZpdGVkIHRvIGEgWm9vbSBtZWV0aW5nIG5vdy4gPGJyPjxicj5QbGVhc2UgcmVnaXN0ZXIgdGhlIG1lZXRpbmc6IDxicj48YSBocmVmPSJodHRwczovL3pvb20udXMvbWVldGluZy9yZWdpc3Rlci90Sll1Y3VpcnJEd3NHOVh3VUZJOGVEdkQ2NEJvXzhjYUp1bUkiPmh0dHBzOi8vem9vbS51cy9tZWV0aW5nL3JlZ2lzdGVyL3RKWXVjdWlyckR3c0c5WHdVRkk4ZUR2RDY0Qm9fOGNhSnVtSTwvYT48YnI-PGJyPkFmdGVyIHJlZ2lzdGVyaW5nLCB5b3Ugd2lsbCByZWNlaXZlIGEgY29uZmlybWF0aW9uIGVtYWlsIGNvbnRhaW5pbmcgaW5mb3JtYXRpb24gYWJvdXQgam9pbmluZyB0aGUgbWVldGluZy48L2Rpdj4NCg=='}

I figured out that I had to use base64.urlsafe_b64decode to decode the body, which gave me: b'<div dir="ltr">You are invited to a Zoom meeting now. <br><br>Please register the meeting: <br><a href="https://zoom.us/meeting/register/tJJuyhn4ndhfjrhUFI8eDvD64Bo_8caJumI">https://zoom.us/meeting/register/tJYucuirrDwsG9XwUFI8eDvD64Bo_8caJumI</a><br><br>After registering, you will receive a confirmation email containing information about joining the meeting.</div> '

How can I remove all the extra HTML tags while keeping the raw text?

Question from: https://stackoverflow.com/questions/65885152/how-can-i-get-the-body-of-a-gmail-email-with-an-attatchment-gmail-python-api


1 Answer


Answer

The structure of the response body changes depending on the message itself. You can run some tests to see what the responses look like from the documentation page of the method users.messages.get.

How to manage it

  • Initial scenario:

Get the message with the id and define the parts.

msg = service.users().messages().get(userId='me', id=message_id['id']).execute()
payload = msg['payload']
parts = payload.get('parts')

  • Simple solution:

You can find a plain version of the message body in the snippet field. As the documentation says, it contains a short part of the message text. This is a simple solution that returns the message without formatting or line breaks, and you don't have to decode the result. If it does not fit your requirements, check the next solutions.

raw_message = msg['snippet']
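
One caveat, stated as an assumption and not something the answer above confirms: the snippet may come back with HTML entities such as &#39; in place of apostrophes. If that happens, a minimal sketch using the standard library's html.unescape could tidy it up. The helper name clean_snippet and the sample string are hypothetical.

```python
import html


def clean_snippet(snippet):
    """Turn HTML entities in a snippet string back into plain characters."""
    # Assumption: the snippet contains entities such as &#39; or &amp;
    return html.unescape(snippet)


# Hypothetical snippet text, standing in for msg['snippet']
print(clean_snippet("You&#39;re invited &amp; welcome"))
```

If the snippet contains no entities, html.unescape returns the string unchanged, so the call is harmless either way.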

  • Solution 1:

Add a conditional statement to check whether any part of the message has a mimeType equal to multipart/alternative. If so, the message typically has an attachment and the body is inside that part, so you have to get the list of subparts inside it. The code is below:

import base64

# Defaults, so the names exist even if no matching part is found
body_message = None
body_html = None

for part in parts:
    body = part.get("body")
    data = body.get("data")
    mimeType = part.get("mimeType")

    # with attachment: the text parts are nested one level down
    if mimeType == 'multipart/alternative':
        subparts = part.get('parts')
        for p in subparts:
            body = p.get("body")
            data = body.get("data")
            mimeType = p.get("mimeType")
            if mimeType == 'text/plain':
                body_message = base64.urlsafe_b64decode(data)
            elif mimeType == 'text/html':
                body_html = base64.urlsafe_b64decode(data)

    # without attachment
    elif mimeType == 'text/plain':
        body_message = base64.urlsafe_b64decode(data)
    elif mimeType == 'text/html':
        body_html = base64.urlsafe_b64decode(data)

# Convert the bytes to a string only if a plain-text body was found
final_result = str(body_message, 'utf-8') if body_message is not None else None

  • Solution 2:

Use a recursive function to process the parts:

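import base64

def processParts(parts):
    # Defaults, so the return below never hits an unassigned name
    body_message = None
    body_html = None
    for part in parts:
        body = part.get("body")
        data = body.get("data")
        mimeType = part.get("mimeType")
        if mimeType == 'multipart/alternative':
            subparts = part.get('parts')
            [sub_message, sub_html] = processParts(subparts)
            # Keep earlier results if the nested part lacks that type
            body_message = sub_message or body_message
            body_html = sub_html or body_html
        elif mimeType == 'text/plain':
            body_message = base64.urlsafe_b64decode(data)
        elif mimeType == 'text/html':
            body_html = base64.urlsafe_b64decode(data)
    return [body_message, body_html]

[body_message, body_html] = processParts(parts)
# Convert the bytes to a string only if a plain-text body was found
final_result = str(body_message, 'utf-8') if body_message is not None else None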

Extra comments

  • If you need to get more data from your message, I recommend using the documentation to see what the response body looks like.
  • You can also check the method in the Python API library to see a detailed description of each element.
  • Do not use images in this way, as DalmTo has said.

Edit

  • I had tested the code with Python 2, which was my mistake. With Python 3, as you said, you have to use base64.urlsafe_b64decode(data) instead of base64.b64decode(data). I have updated the code.

  • I added a simple solution that may fit your needs. It takes the message from the snippet key, which is a simplified version of the message body that does not need decoding.

  • I also don't know how you obtained the text/html part with my code, which did not handle it. To get it, you have to add a second condition. I updated the code so you can see it.

  • Finally, what base64.urlsafe_b64decode returns is a bytes object. To obtain a string, convert it with str(body_message, 'utf-8'). This is now in the code.
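
The question also asks how to drop the HTML tags from the decoded text/html body. The following is a minimal sketch using only the standard library, not a definitive implementation. The names decode_body, html_to_text and _TextExtractor are hypothetical. Treating <br> as a line break is a design choice, and re-adding "=" padding assumes the data may arrive without it.

```python
import base64
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text content, turning <br> tags into newlines."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.chunks.append("\n")

    def handle_data(self, data):
        self.chunks.append(data)


def decode_body(data):
    """Decode URL-safe base64 text, restoring '=' padding if it is missing."""
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


def html_to_text(html):
    """Return the text of an HTML fragment with the tags removed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


# Hypothetical input, encoded here so the example is self-contained
encoded = base64.urlsafe_b64encode(
    b'<div dir="ltr">Hi<br>there &amp; welcome</div>').decode("ascii").rstrip("=")
print(html_to_text(decode_body(encoded)))
```

For heavily styled emails a dedicated HTML library may give cleaner results. This sketch only keeps text nodes and line breaks.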

