Find Non-ASCII Characters With the TreeSize File Search


Non-ASCII Characters: Find Invalid File Names With the TreeSize File Search

Computer applications use ASCII (American Standard Code for Information Interchange) to represent text. ASCII is a character encoding standard that assigns each of its 128 characters a 7-bit binary number, which lets computers store and process character-oriented information. File names that use only ASCII characters are the most portable choice, because virtually every system can process them. Non-ASCII characters, however, may cause problems.
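As a minimal sketch of what "7-bit" means in practice, the following Python snippet checks whether every character of a name has a code point below 128. The function name is_ascii_name and the sample file names are illustrative assumptions and are not part of TreeSize.

```python
def is_ascii_name(name: str) -> bool:
    """Return True if every character fits into 7 bits (code point < 128)."""
    return all(ord(ch) < 128 for ch in name)


if __name__ == "__main__":
    # "A" has code point 65, which needs exactly seven binary digits.
    print(format(ord("A"), "07b"))           # 1000001
    print(is_ascii_name("report_2024.txt"))  # True
    print(is_ascii_name("Übersicht.txt"))    # False
```

Python 3.7 and later also offer the built-in str.isascii(), which performs the same check.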

Invalid File and Folder Names

 

Companies often use a variety of systems, some of them with special requirements. Many languages contain characters that ASCII cannot represent, for example the German umlauts (“ä”, “ö”, and “ü”). File names containing such characters can be problematic for certain applications and may therefore have to be avoided.

In response to requests from TreeSize users, JAM Software implemented an easy way to find non-ASCII characters: the TreeSize File Search can find file names containing characters outside the ASCII range.

 

How to Use the TreeSize Custom File Search to Find Non-ASCII Characters

 

Open the TreeSize File Search and disable all searches except the “Custom Search”.
In the right panel, define an include filter for “File and Folder Name” of the type “Regular Expression”.

Use one of the following patterns to search for the characters you want to find:

 [^[:print:]]

Matches any character that is not a printable ASCII character, so all non-ASCII characters are treated as invalid. This includes German umlauts as well as, for example, French letters with a circumflex or cedilla and the Spanish eñe.

 [^\P{C}]

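Finds all file names containing non-printable Unicode characters. The double negation is equivalent to \p{C}, the Unicode general category “Other”, which covers control, format, unassigned, private-use, and surrogate code points.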

To find file and folder names containing the non-breaking space (Unicode NO-BREAK SPACE, NBSP, U+00A0), use the following search pattern:

[\xA0]
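To see how the three patterns differ, it can help to try them on sample names outside TreeSize. The sketch below uses only the Python standard library. Python's re module supports neither POSIX classes such as [:print:] nor \p{...} properties, so the first two patterns are replaced by rough stand-ins, and the first one assumes an ASCII-only [:print:] as described above. The function names and sample names are illustrative assumptions.

```python
import re
import unicodedata

# Stand-in for [^[:print:]], assuming an ASCII-only [:print:] class:
# anything outside the printable ASCII range U+0020..U+007E.
OUTSIDE_PRINTABLE_ASCII = re.compile(r"[^\x20-\x7E]")

# Stand-in for [\xA0]: the no-break space.
NBSP = re.compile("\xA0")


def has_category_other(name: str) -> bool:
    # Stand-in for the \p{C} pattern: any character whose Unicode
    # general category starts with "C" (Cc, Cf, Cn, Co, Cs).
    return any(unicodedata.category(ch).startswith("C") for ch in name)


def classify(name: str) -> dict:
    """Report which of the three checks a file name triggers."""
    return {
        "outside_printable_ascii": bool(OUTSIDE_PRINTABLE_ASCII.search(name)),
        "unicode_other": has_category_other(name),
        "nbsp": bool(NBSP.search(name)),
    }


if __name__ == "__main__":
    samples = ["plain.txt", "Größe.txt", "a\u200bb.txt", "a\xa0b.txt"]
    for sample in samples:
        print(repr(sample), classify(sample))
```

In this sketch an umlaut triggers only the first check, a zero-width space (U+200B) triggers the first and second, and a no-break space triggers the first and third. The results of TreeSize's own regular expression engine may differ in edge cases.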

How to Process Found Names

 

The TreeSize File Search will also help you get rid of non-ASCII characters in file names. You can archive or move the files to a different directory – or simply pass them to a renaming program or script that replaces problematic characters.
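A renaming script of the kind mentioned above could look roughly like the following Python sketch. It is not part of TreeSize. The function names ascii_safe_name and plan_renames, the underscore placeholder, and the replacement strategy are assumptions chosen for illustration. The script only proposes new names and does not rename anything.

```python
import unicodedata
from pathlib import Path


def ascii_safe_name(name: str, placeholder: str = "_") -> str:
    """Strip accents and replace remaining non-printable-ASCII characters."""
    decomposed = unicodedata.normalize("NFKD", name)
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop accents that NFKD split off from their base letter
        # Keep printable ASCII (space through tilde); replace everything else.
        out.append(ch if " " <= ch <= "~" else placeholder)
    return "".join(out)


def plan_renames(root: str) -> list:
    """Dry run: list (old_path, new_path) pairs below root without renaming."""
    plan = []
    for path in sorted(Path(root).rglob("*")):
        new_name = ascii_safe_name(path.name)
        if new_name != path.name:
            plan.append((path, path.with_name(new_name)))
    return plan


if __name__ == "__main__":
    for old, new in plan_renames("."):
        print(f"{old} -> {new}")
```

This simple strategy turns “ä” into “a” rather than “ae” and replaces “ß” with the placeholder, so language-specific rules would need an explicit mapping. Before actually renaming, the plan should be checked for name collisions, and parent folders should be renamed after their contents.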

 

 
