Metadata matching for movies with "umlaut" substitutions in German titles (e.g. ä -> ae)

Hello all,

I have a bunch of movie files whose names use German “umlaut” substitutions.

In German, you can use the following substitutions to avoid umlauts, for example with incompatible character encodings or to improve compatibility with some devices:

ä → ae
ü → ue
ö → oe

(and also “ß → ss”)
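For reference, a minimal Python sketch of the substitution table above. The function name `replace_umlauts` and the uppercase entries are my own additions for illustration, not something Infuse or TMDB is known to use.

```python
# Mapping from the table above; the uppercase variants are an added assumption.
SUBSTITUTIONS = {
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
}

def replace_umlauts(text: str) -> str:
    """Return text with each umlaut and ß replaced by its ASCII substitution."""
    return text.translate(str.maketrans(SUBSTITUTIONS))

print(replace_umlauts("Bridge.of.Spies.Der.Unterhändler.2015.mkv"))
# prints: Bridge.of.Spies.Der.Unterhaendler.2015.mkv
```

Going in this direction is unambiguous. Going back from “ae” to “ä” is not, because plenty of words contain “ae”, “oe” or “ue” without any umlaut being meant.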

Recently I discovered that Infuse can’t detect the movie metadata for file names that use these umlaut substitutions:

Doesn’t work:

 - Bridge.of.Spies.Der.Unterhaendler.2015.mkv
 - Ausloeschung.2018.mkv
 - Captain.Fantastic.Einmal.Wildnis.und.zurueck.2016.mkv

Works:

 - Bridge.of.Spies.Der.Unterhändler.2015.mkv
 - Auslöschung.2018.mkv
 - Captain.Fantastic.Einmal.Wildnis.und.zurück.2016.mkv

Interestingly, the search on themoviedb.org can find movies by titles with those umlaut substitutions.

Example:

(Or search for Bridge of Spies Der Unterhaendler manually.)

So, long story short: why can’t Infuse find metadata for movies with umlaut substitutions, even though themoviedb.org can?

Thanks a lot !

I am by no means an expert on this, but what I’ve gleaned over the years is that special characters cause problems across different operating systems and file transfer protocols.

When you search TMDB in a web browser, they have all the special characters covered in the HTML.

When you use the search in Infuse, the transfer protocol may be the first place issues arise. SMB, for example, has some problems like this.

The servers and other machines that may “touch” the data may also have similar issues.

Best to avoid all special characters and go with the simplified version so all the pitfalls are avoided.

Hi @NC_Bullseye ,

thanks a lot for your answer!

But I don’t think that is the case here. Umlauts are not really special characters in the classic sense; in German they are very common and do not cause problems in any app (of course, in HTML you have entities for these, but that’s another story).

My theory is:

If you use the API from themoviedb.org, you have to use the exact movie name (or a part of it). But entering the movie name into the search field on themoviedb.org triggers smarter API usage: it requests search results for variations of the entered text (so it searches for “ä” and “ae”; other examples are searching for “Jack’s” and “Jacks”, and so on).

Infuse may not do this. It uses the API with only one call, namely the entered movie name, and it doesn’t do extra handling to get better or more search results.

So I think what themoviedb.org is doing is:

  1. Sending the entered search term to the backend
  2. The backend analyses the search term and, for example, detects “ae”
  3. The backend does 2 calls to the API: one with “ae” and one with “ä” within the search term.

So while the user enters just one search term, the API gets multiple.

And Infuse just avoids doing the second step above.

Again, this is just my theory :slight_smile:
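To make the theory concrete, here is a rough Python sketch of what such query expansion could look like. It only illustrates the speculation above; nothing here is confirmed to be how TMDB or Infuse works, and `query_variants` and `DIGRAPHS` are hypothetical names.

```python
import re
from itertools import product

# Digraphs that may stand in for an umlaut (assumed list, lowercase only).
DIGRAPHS = {"ae": "ä", "oe": "ö", "ue": "ü"}

def query_variants(term: str) -> list[str]:
    """Return every spelling of term with each ae/oe/ue kept or turned into an umlaut."""
    # Split into pieces; the capturing group keeps the digraphs as separate items.
    parts = re.split("(ae|oe|ue)", term)
    # A digraph offers two choices, any other piece only itself.
    choices = [(p, DIGRAPHS[p]) if p in DIGRAPHS else (p,) for p in parts]
    return ["".join(combo) for combo in product(*choices)]

print(query_variants("Der Unterhaendler"))
# prints: ['Der Unterhaendler', 'Der Unterhändler']
```

Each variant would then be sent as its own search call. The number of variants doubles with every digraph in the term, and the expansion is blind: “Guess Who” would also yield “Güss Who”.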

I just know that there are programs, processes, and file restrictions that don’t allow special characters like the umlaut.

For example, in Microsoft 365:

Letters with diacritical marks, such as umlauts, accents, and tildes, are invalid characters.

In your examples it looks like you have the ones with the umlaut under the “Works” heading. Is that correct?

Sorry I can’t give a more definitive answer; I don’t run across many special characters.

Maybe others will have a better idea on how to handle this. :wink:

I think they do this because allowing for “typos”, essentially, could open up a whole world of trouble and lead to more wrongly identified files than are not being found now, due to intentional misspellings (for software compatibility purposes).

The titles you are looking for include umlauts. When Infuse searches for those titles spelled correctly (with the umlauts), it works. That is as it should be.

If I understand you correctly, you are asking for Infuse to search every title with every instance of your specific list of substitutions (“ue” instead of ü) … which will muddy up the search results for everyone, creating far more mis-identified titles, when there is a far better solution to your problem:

Just add alternate titles to the TMDB entries you are having trouble with — replacing the umlauts with their substitutions when you enter the alternate title. Infuse will then be able to find it, even when spelled differently.

And you won’t be requiring Infuse to do weird things like searching for “Güss Who’s Coming to Dinner” :joy:

Here you go:
To get you started I added an alternative title, to your specifications, for Bridge of Spies, on the German version of the film’s TMDB page — as seen if you click the link below:

Infuse relies on the results from TMDB, and does not currently add its own special handling for cases like these.

These searches on TMDB are not working for me.

Bridge of Spies appears to be working as it looks like an alternative title has been added (thx @FLskydiver)

Yes, I agree.

While your description of the issue is correct, I am not asking Infuse to do that. I was just wondering how TMDB (and Infuse) does it; it was only my own speculation.

Yes, you are right. I’m a software engineer myself, so I wouldn’t do this the naive way. It must be done with the help of a locale dictionary, not just by replacing strings on guesswork.
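Roughly what I mean, as a Python sketch: a digraph is only turned back into an umlaut when the result is a known word. `KNOWN_WORDS` is a hypothetical stand-in for a real German word list, and all names here are made up for illustration.

```python
import re
from itertools import product

DIGRAPHS = {"ae": "ä", "oe": "ö", "ue": "ü"}

# Stand-in for a real German word list (hypothetical, far too small for real use).
KNOWN_WORDS = {"unterhändler", "auslöschung", "zurück"}

def restore_word(word: str) -> str:
    """Restore umlauts in one word only if a variant appears in KNOWN_WORDS."""
    parts = re.split("(ae|oe|ue)", word)
    choices = [(p, DIGRAPHS[p]) if p in DIGRAPHS else (p,) for p in parts]
    for combo in product(*choices):
        candidate = "".join(combo)
        if candidate != word and candidate.lower() in KNOWN_WORDS:
            return candidate
    return word  # no dictionary hit: leave the word untouched

def restore_title(title: str) -> str:
    """Apply restore_word to each run of letters, keeping separators as they are."""
    return re.sub(r"[^\W\d_]+", lambda m: restore_word(m.group()), title)

print(restore_title("Ausloeschung.2018.mkv"))
# prints: Auslöschung.2018.mkv
print(restore_title("Guess.Whos.Coming.to.Dinner.1967.mkv"))
# prints: Guess.Whos.Coming.to.Dinner.1967.mkv
```

This sketch ignores “ss → ß” and capitalised digraphs such as “Ae”, and it is only as good as the word list behind it.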

Again, please understand that I don’t want Infuse to do this; it was just my assumption about how things work on the TMDB side (and the question of whether Infuse does any query preprocessing…).

I could do this, but I think I will fix the filenames of my files to use umlauts, so the filenames match the title correctly. Fixing the root cause, so to speak :wink:

:smiley:

Thanks FLskydiver for testing it. But as I mentioned above, I will go ahead and fix my filenames.

Thanks for clarification, James.

I think the Infuse side of things is clear now. For the sake of the discussion, here is an interesting example, the movie “Auslöschung” from 2018. Original title: “Annihilation”:

There is no German alternative title set.

But I can find it with this query (with umlaut substitution “oe”):

https://www.themoviedb.org/search?query=Ausloeschung

Maybe I should just ask TMDB how they do it with umlauts :slight_smile:

Well, that is interesting!

Because when I first tried your link:

But then I changed my preferred language to German (DE) and what do you know?

I guess they are hiding tricks up their sleeves there.
Definitely ask them how they’re doing it.
Whatever it is, they aren’t utilizing it when the user is viewing the site in American English (and I’m guessing any other languages that don’t commonly use those characters). Might similar substitutions benefit other languages, I wonder, such as Danish/Norwegian or Swedish?

In the meantime, if you don’t have any issues using umlauts locally, your plan to just rename the files with them seems to be the best. :+1:t3:

Thanks for the detailed reply. Blue skies!

Oh okay… so, the current locale is of importance. I didn’t notice that, my setting was always German :slight_smile:

So thanks again @FLskydiver for pointing this out.
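If anyone wants to compare locales outside the website, here is a small Python sketch that builds the same search against the TMDB v3 API for two language codes. The API key is a placeholder, and it is only an assumption on my part that the API’s `language` parameter has the same effect as the site’s language setting; I have not verified that.

```python
from urllib.parse import urlencode

# TMDB v3 movie search endpoint; the key below is a placeholder, not a real credential.
SEARCH_URL = "https://api.themoviedb.org/3/search/movie"

def build_search_url(query: str, language: str, api_key: str = "YOUR_API_KEY") -> str:
    """Build a movie search URL for the given query and language code."""
    params = {"api_key": api_key, "query": query, "language": language}
    return f"{SEARCH_URL}?{urlencode(params)}"

# Same query, two locales, so the result lists can be compared side by side.
for lang in ("en-US", "de-DE"):
    print(build_search_url("Ausloeschung", lang))
```

Fetching both URLs with a real key and comparing the results would show whether the locale also matters for API clients.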

Yes, I have already begun to fix the file names :slight_smile: Thanks to Infuse having the “Others” category, I can easily find the movies that could not be identified.

Thank you all for your replies!
