Code Snippet: Replacing non-ASCII characters in filenames with Python (or Unix)

python bash code snippet

Reading file names into a list of strings and replacing non-ASCII characters (here: German umlauts and ß)

import os

# os.walk yields (dirpath, dirnames, filenames); take the file names of the top-level directory only
fns = next(os.walk("F:/all_data/rda"))[2]

fns_correct = []
for fn in fns:
    fn = fn.replace("ä", "ae")
    fn = fn.replace("ö", "oe")
    fn = fn.replace("ü", "ue")
    fn = fn.replace("ß", "ss")
    fns_correct.append(fn)
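
The chained replace calls above can also be written as a single translation table. This is only a minimal sketch: the names UMLAUT_MAP and transliterate are made up for illustration, and the NFC normalization step is an assumption on my part (some file systems, notably on macOS, store "ä" as "a" plus a combining mark, which a plain replace would miss).

```python
import unicodedata

# Hypothetical mapping for illustration; extend it (e.g. with uppercase umlauts) as needed.
UMLAUT_MAP = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def transliterate(name: str) -> str:
    """Return name with German umlauts and ß replaced by ASCII digraphs."""
    # Assumption: compose decomposed characters first so the mapping can match them.
    composed = unicodedata.normalize("NFC", name)
    return composed.translate(UMLAUT_MAP)


print(transliterate("größe_übersicht.rda"))  # prints groesse_uebersicht.rda
```

With this helper, the loop above would shrink to fns_correct = [transliterate(fn) for fn in fns].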

Renaming filenames

for fn_old, fn_new in zip(fns, fns_correct):
    if fn_old == fn_new:
        continue  # name is already clean, nothing to rename
    old = os.path.join("F:/all_data/rda", fn_old)
    new = os.path.join("F:/all_data/rda", fn_new)
    os.rename(old, new)
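
Renaming in bulk can silently clobber files if two names map to the same result, so a slightly more defensive variant may be worth having. The following is a sketch under my own assumptions, not part of the original snippet: rename_all, fix_name and dry_run are hypothetical names, and it uses pathlib instead of os.

```python
from pathlib import Path


def rename_all(directory, fix_name, dry_run=True):
    """Rename each file in directory via fix_name; return the (old, new) name pairs."""
    planned = []
    seen_targets = set()
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        target = path.with_name(fix_name(path.name))
        if target == path:
            continue  # name is already clean
        # Refuse to overwrite an existing file or a target planned earlier in this run.
        if target.exists() or target in seen_targets:
            raise FileExistsError(f"{target} would be overwritten")
        seen_targets.add(target)
        planned.append((path.name, target.name))
        if not dry_run:
            path.rename(target)
    return planned
```

Calling rename_all("F:/all_data/rda", fix_name) with the default dry_run=True only reports what would change; passing dry_run=False performs the renames.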

Update: This can also be done in Bash on Unix, which is arguably more convenient

for fn in *; do [[ $fn =~ "ä" ]] && mv -- "$fn" "${fn//ä/ae}"; done
for fn in *; do [[ $fn =~ "ö" ]] && mv -- "$fn" "${fn//ö/oe}"; done
for fn in *; do [[ $fn =~ "ü" ]] && mv -- "$fn" "${fn//ü/ue}"; done
for fn in *; do [[ $fn =~ "ß" ]] && mv -- "$fn" "${fn//ß/ss}"; done

Update 2: This line replaces every character that is not an ASCII letter, a digit, ".", "_" or "-" with “_”

for fn in *; do new=$(printf '%s' "$fn" | sed -e 's/[^A-Za-z0-9._-]/_/g'); [[ $fn != "$new" ]] && mv -- "$fn" "$new"; done
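
For comparison, the same whitelist idea can be sketched in Python with a regular expression. The function name sanitize is hypothetical; the character class is copied from the sed expression above.

```python
import re


def sanitize(name: str) -> str:
    """Replace every character outside A-Z, a-z, 0-9, '.', '_' and '-' with '_'."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", name)


print(sanitize("my file (2).txt"))  # prints my_file__2_.txt
```

Note that this is lossy: different original names can collapse to the same sanitized name, so it is worth checking for collisions before renaming.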
