Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Unable to copy exact Hindi content from PDF

I am not able to copy Hindi content from a PDF file. When I copy and paste the content, it changes to different Hindi characters.

Example:

Original: ????????

After paste: ???????

Can anybody help me get the exact Hindi characters?

1 Answer

This issue is similar to the one discussed in this answer, and the sample document there also resembles the document here.

In a nutshell

Your document itself states that, for example, the glyphs "????????" in the headline represent the text "???????". You should ask the source of your document for a version in which the font information is not misleading. If that is not possible, you should use OCR.

In detail

The top line of the first page is generated by the following operations in the page content stream:

/9 239 Tf
( !"#$%&) Tj 

The first line selects the font named 9 at a size of 239 (an operation at the beginning of the page scales everything down). The second line causes glyphs to be drawn. These glyphs are referenced between the parentheses using the custom encoding of that font.
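To make the character codes in that string explicit, here is a minimal Python sketch. It assumes the font uses single-byte codes, which matches the one-byte entries in the ToUnicode map quoted in this answer. The helper name string_codes is made up for illustration.

```python
def string_codes(shown: bytes) -> list[int]:
    """Return the character codes of a PDF literal string.

    Assumes a simple font with one byte per code (an assumption,
    consistent with the single-byte ToUnicode entries in this answer).
    """
    return list(shown)


# The bytes between the parentheses of the Tj operation.
codes = string_codes(b' !"#$%&')
print([f"0x{code:02X}" for code in codes])
```

The string consists of the seven consecutive codes 0x20 through 0x26. What each code means depends entirely on the font, not on ASCII.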

The font 9 on the first page of your PDF contains a ToUnicode map. In particular, this map contains the entries

<20> <20> <0928>
<21> <21> <0928>
<22> <22> <0930>
<23> <23> <0930>
<24> <24> <0930> 

i.e. the codes 0x20 (' ') and 0x21 ('!') both map to the Unicode code point U+0928 ('न'), and the codes 0x22 ('"'), 0x23 ('#'), and 0x24 ('$') all map to the Unicode code point U+0930 ('र').
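One way to spot this kind of misleading map is to parse the range entries and look for Unicode characters that several codes share. The sketch below is a simplification under stated assumptions: it treats each entry as a (low code, high code, destination) triple and assumes the destination is a single BMP code point, whereas real ToUnicode CMaps can also hold multi-character destinations. The names parse_ranges and find_collisions are hypothetical.

```python
import re

# One entry: <low code> <high code> <destination code point>, all in hex.
ENTRY = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")

# The five entries quoted in this answer.
ENTRIES = """
<20> <20> <0928>
<21> <21> <0928>
<22> <22> <0930>
<23> <23> <0930>
<24> <24> <0930>
"""


def parse_ranges(text: str) -> dict[int, str]:
    """Build a code -> character map (assumes single BMP destinations)."""
    mapping = {}
    for low, high, dest in ENTRY.findall(text):
        start, end, base = int(low, 16), int(high, 16), int(dest, 16)
        for offset, code in enumerate(range(start, end + 1)):
            mapping[code] = chr(base + offset)
    return mapping


def find_collisions(mapping: dict[int, str]) -> dict[str, list[int]]:
    """Return characters that more than one code maps to."""
    by_char: dict[str, list[int]] = {}
    for code, char in sorted(mapping.items()):
        by_char.setdefault(char, []).append(code)
    return {char: codes for char, codes in by_char.items() if len(codes) > 1}


for char, codes in find_collisions(parse_ranges(ENTRIES)).items():
    shared = ", ".join(f"0x{code:02X}" for code in codes)
    print(f"U+{ord(char):04X} is shared by codes {shared}")
```

For the entries above, this reports U+0928 for two codes and U+0930 for three codes. Different glyphs that share one code point cannot be told apart after extraction.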

Thus, the contents of ( !"#$%&), displayed as "????????", are extracted or copied and pasted as "???????". According to the information in the document, this result is entirely correct.
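The extraction step itself can be sketched in a few lines of Python. This is an illustration of what a text extractor does with the map, not the code of any particular PDF library. The map covers only the five codes quoted in this answer. The entries for 0x25 and 0x26 are not shown here, so the sketch substitutes U+FFFD for them as a placeholder assumption.

```python
# ToUnicode entries quoted in this answer; 0x25 and 0x26 are not quoted.
TO_UNICODE = {
    0x20: "\u0928",  # DEVANAGARI LETTER NA
    0x21: "\u0928",  # same code point for a different glyph
    0x22: "\u0930",  # DEVANAGARI LETTER RA
    0x23: "\u0930",
    0x24: "\u0930",
}


def extract_text(shown: bytes, to_unicode: dict[int, str]) -> str:
    """Map each single-byte code to its ToUnicode character.

    Codes without a known entry become U+FFFD (placeholder assumption).
    """
    return "".join(to_unicode.get(code, "\uFFFD") for code in shown)


extracted = extract_text(b' !"#$%&', TO_UNICODE)
print([f"U+{ord(char):04X}" for char in extracted])
```

The first five codes collapse into only two distinct characters, so the pasted text cannot match what is drawn on the page, whichever viewer or library performs the extraction.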
