Code Snippet: Replacing non-ASCII characters in filenames with Python (or Bash)

python bash code snippet

Reading the file names into a list of strings and replacing German umlauts and ß with ASCII equivalents

import os

# Take the file names of the top-level directory only (the first os.walk entry)
fns = next(os.walk("F:/all_data/rda"))[2]

# Build the corrected names; only lowercase umlauts and ß are handled here
fns_correct = []
for fn in fns:
    fn = fn.replace("ä", "ae")
    fn = fn.replace("ö", "oe")
    fn = fn.replace("ü", "ue")
    fn = fn.replace("ß", "ss")
    fns_correct.append(fn)
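
The chained replace calls can also be written as a single translation table. The following is a minimal sketch, not part of the original snippet: the function name transliterate_german is hypothetical, the uppercase entries are an added assumption, and the normalization step assumes that some file systems hand back names in decomposed Unicode form (for example "a" followed by a combining diaeresis), which a plain replace("ä", "ae") would miss.

```python
import unicodedata

# Mapping of German special characters to ASCII digraphs.
# The uppercase entries are an assumption; the snippet above covers lowercase only.
UMLAUT_TABLE = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
})


def transliterate_german(name):
    """Return name with German umlauts and ß replaced by ASCII digraphs."""
    # Compose "a" + combining diaeresis into a single "ä" first, so the
    # table also matches names stored in decomposed (NFD) form.
    composed = unicodedata.normalize("NFC", name)
    return composed.translate(UMLAUT_TABLE)
```

With this helper, the list of corrected names could be built as fns_correct = [transliterate_german(fn) for fn in fns].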

Renaming the files

for fn_old, fn_new in zip(fns, fns_correct):
    if fn_old == fn_new:
        continue  # name is unchanged, nothing to rename
    old = os.path.join("F:/all_data/rda", fn_old)
    new = os.path.join("F:/all_data/rda", fn_new)
    os.rename(old, new)
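
Two different names can collapse to the same result (for example "käse.rda" and "kaese.rda"), and what os.rename does when the target already exists depends on the platform. The following is a rough sketch of a more defensive rename loop, under the assumption that skipping is preferable to overwriting; rename_all and its dry_run parameter are hypothetical names, not part of the original post.

```python
import os


def rename_all(directory, old_names, new_names, dry_run=True):
    """Rename files pairwise inside directory.

    Returns the (old, new) pairs that were renamed, or that would be
    renamed when dry_run is True.
    """
    done = []
    planned = set()  # targets already claimed during this run
    for old, new in zip(old_names, new_names):
        if old == new:
            continue  # nothing to change
        src = os.path.join(directory, old)
        dst = os.path.join(directory, new)
        if os.path.exists(dst) or new in planned:
            # Skip rather than overwrite an existing or already planned target.
            print(f"skipping {old!r}: {new!r} already exists")
            continue
        if not dry_run:
            os.rename(src, dst)
        planned.add(new)
        done.append((old, new))
    return done
```

Calling rename_all("F:/all_data/rda", fns, fns_correct) first with the default dry_run=True lists the planned changes without touching any file.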

Update: This can also be done in Bash, which is arguably more convenient. Run the loops from inside the directory that contains the files.

for fn in *; do [[ $fn =~ "ä" ]] && mv -- "$fn" "${fn//ä/ae}"; done
for fn in *; do [[ $fn =~ "ö" ]] && mv -- "$fn" "${fn//ö/oe}"; done
for fn in *; do [[ $fn =~ "ü" ]] && mv -- "$fn" "${fn//ü/ue}"; done
for fn in *; do [[ $fn =~ "ß" ]] && mv -- "$fn" "${fn//ß/ss}"; done

Update 2: This line replaces every character other than ASCII letters, digits, dots, underscores, and hyphens with “_”

for fn in *; do new=$(printf '%s' "$fn" | sed -e 's/[^A-Za-z0-9._-]/_/g'); [[ $fn != "$new" ]] && mv -- "$fn" "$new"; done
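
For comparison, the same character class can be applied from Python with the re module. This is a small sketch rather than a drop-in replacement for the shell loop; the names UNSAFE and sanitize are hypothetical.

```python
import re

# Matches anything outside ASCII letters, digits, dot, underscore, and hyphen.
UNSAFE = re.compile(r"[^A-Za-z0-9._-]")


def sanitize(name, replacement="_"):
    """Replace every character matched by UNSAFE with the replacement string."""
    return UNSAFE.sub(replacement, name)
```

For example, sanitize("Straße 12 (neu).txt") returns "Stra_e_12__neu_.txt". Note that this is lossy: "ß", the spaces, and the parentheses all become "_", so distinct names can end up identical.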
