How many stopwords for non-English languages?

  1. Hi!

    We just stumbled upon this:

    How many stopwords should be used there? We have found many stopword lists, and they are much longer:

    Best regards

    The blog I need help with is

  2. Hi Torsten, I've passed your question along to the internationalization team and will get back to you as soon as I have some guidance on this. Thanks!

  3. Thx.

  4. Seems to be a tough question ... ;-)

  5. Hi Torsten, while I don't have a definitive reply, one of our native German-speakers said that "the top100de.txt list has probably a lot of duplicates and it does have some weird words in it, like ‘percent’, ‘million’, ‘Mark’ (Germany’s pre-Euro currency)."

  6. And now? My question wasn't "What is your opinion of these lists?"; my question was "How many stopwords are okay?"

    But forget it. Seems to be irrelevant for Automattic.

  7. Hi Torsten,

    Looking at this from a translator's perspective, I don't think there's a definitive answer to your question. A lot of translation is about making judgment calls, and in this case there isn't one objective, correct way to create a list of stop words — you can take a look at how they are discussed on Wikipedia to see what I mean: Stop words

    The issue here is that the list of stop words is used to decide what words aren't included in searches. As a native speaker, you (and the other German speakers who contribute to GlotPress) are in the best position to determine what stop words will make searches more useful in German. If you think the list of stop words needs to be expanded, I don't see a problem with that — I think you should add as many stop words as you deem helpful and relevant. Cheers! :)

  8. Hi Rachel,

    I already know the definition of stopwords. You don't have to explain that to me. Thank you.
    The best way to determine the best stopwords would be an analysis of the search terms actually used. But I don't even know which search these stopwords are used for. You wrote: "If you think the list of stop words needs to be expanded" - on which data should I rely? I have no data, and you are not able to provide it.

    The quick answer seems to be: "the more the merrier". And that was so hard to tell...?
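
For readers wondering what the search-term analysis suggested in the last reply might look like in practice, here is a minimal sketch in Python. It is not how WordPress.com or GlotPress builds or applies its stopword list; the function names, the `min_share` threshold, and the sample queries are all invented for illustration. It treats a term as a stopword candidate when it appears in at least a given share of the logged queries, then shows such a list being used to drop words from a search.

```python
import re
from collections import Counter


def stopword_candidates(queries, min_share=0.5):
    """Return terms that appear in at least `min_share` of the queries.

    Hypothetical helper: the threshold is an assumption, not a standard value.
    """
    total = len(queries)
    if total == 0:
        return []
    doc_freq = Counter()
    for query in queries:
        # Count each term once per query, so one long query cannot dominate.
        doc_freq.update(set(re.findall(r"\w+", query.lower())))
    return sorted(term for term, n in doc_freq.items() if n / total >= min_share)


def remove_stopwords(query, stopwords):
    """Return the query's terms in order, with stopwords dropped."""
    return [t for t in re.findall(r"\w+", query.lower()) if t not in stopwords]


# Made-up German queries standing in for a real search log.
sample_queries = [
    "der hund und die katze",
    "die geschichte der stadt",
    "rezept für die suppe",
    "der weg in die berge",
    "wetter in berlin",
]

candidates = stopword_candidates(sample_queries, min_share=0.5)
print(candidates)
print(remove_stopwords("Der Weg in die Berge", set(candidates)))
```

With real data, the threshold would need tuning, and a native speaker would still have to review the candidates by hand, since frequent terms are not always uninformative ones.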

Topic Closed

This topic has been closed to new replies.
