Monday, July 4, 2022

Is searching Myanmar language text in pdf files with pdf readers or browsers uniformly reliable?


As a matter of principle, I think all listings of Myanmar language works, such as databases or bibliographies, should include titles in the Myanmar language in addition to whatever language they are transliterated into. For the specific case of the Myanmar Manuscript Digital Library hosted by the University of Toronto, I tried to demonstrate one particular benefit of incorporating such titles in their database listings. I made the listings into a pdf file, assuming that the searchability of text in both English and the Myanmar language would add considerable value. The idea wasn’t foolproof, however.

To my surprise, I found that not all pdf readers are reliable for searching Myanmar language text! I have in mind the major browsers, namely Google Chrome, Microsoft Edge, Opera, and Firefox, and a dedicated reader, Adobe Acrobat Reader.


The Browsers

I tried searching text with “Ctrl+F” in the Google Chrome, Microsoft Edge, Opera, and Firefox browsers. This is the first page of the pdf file used:

The search string was “အဘိဓမ္မတ္ထသင်္ဂဟဒီပနီ”, which is the entire text string of the fourth row in the Name(Myanmar) column. The following image shows how the search result changed as the search string was shortened with the “back arrow” key, where each stroke of the back arrow deletes one Unicode code point from the end of the string.

Search with Chrome browser


Here you can see that Chrome works perfectly. So do the Microsoft Edge and Opera browsers.
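
For readers who want to see exactly what was typed at each step, the shortened search strings can be listed with a short Python sketch. It assumes, as described above, that each stroke of the back arrow removes exactly one code point from the end of the string. The function names code_points and backspace_steps are my own, made up for illustration.

```python
import unicodedata

# The search string from the fourth row of the Name(Myanmar) column.
QUERY = "အဘိဓမ္မတ္ထသင်္ဂဟဒီပနီ"


def code_points(text):
    """Return a (U+XXXX, Unicode name) pair for every code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]


def backspace_steps(text):
    """Return the string after 0, 1, 2, ... deletions of its last code point."""
    return [text[:n] for n in range(len(text), 0, -1)]


# List the code points that make up the search string.
for cp, name in code_points(QUERY):
    print(cp, name)

# List every shortened search string, longest first.
for step in backspace_steps(QUERY):
    print(len(step), step)
```

Pasting each printed string into the search box of a viewer is one way to repeat the comparison without relying on how that viewer handles the back arrow key.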

However, the Firefox browser falls short:

Search with Firefox browser


It couldn’t find “အဘိဓမ္မတ္ထသင်္ဂဟဒီပနီ” even when the “Whole Words” option was selected. 


The Adobe Acrobat Reader

In my earlier post on Myanmar language text search in pdf files, I was impressed with Acrobat Reader’s unique ability to search within the “bookmarks”. However, it performed worse than the first three browsers, and its failure to find “အဘိဓမ္မတ္ထသင်္ဂဟဒီပနီ” is really surprising.

Search with Adobe Acrobat Reader


Caution

Be careful when you search for Myanmar language text in pdf files!


DIY

For the benefit of my fellow dummies who would legitimately doubt my findings, I am sharing the first page of the pdf file I had used for this exercise here.
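
A viewer-independent cross-check is also possible once the text of the page has been extracted with some tool of your choice (the extraction step is not shown here). The sketch below is only a rough illustration: found_prefixes is a hypothetical helper, the sample text is a stand-in rather than the real page content, and the NFC normalization is merely a precaution. I have not established whether normalization plays any part in the failures described above.

```python
import unicodedata


def found_prefixes(extracted_text, query):
    """Map each shortened form of query to True/False: does it occur in the text?"""
    haystack = unicodedata.normalize("NFC", extracted_text)
    results = {}
    for n in range(len(query), 0, -1):
        prefix = query[:n]  # query with its last code points removed
        needle = unicodedata.normalize("NFC", prefix)
        results[prefix] = needle in haystack
    return results


query = "အဘိဓမ္မတ္ထသင်္ဂဟဒီပနီ"
# Stand-in for extracted page text (an assumption, not the actual pdf content).
sample_text = "4 " + query + " ..."

for prefix, found in found_prefixes(sample_text, query).items():
    print(len(prefix), found)
```

If a plain substring search like this finds every shortened string in the extracted text while a viewer does not, the problem more likely lies in the viewer’s search than in the pdf itself.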
