Word Frequency Search Feedback

I am assuming that words used more frequently in English are a higher priority for inclusion in Etymonline.

Based on this, I was wondering if any English word frequency datasets exist that would meet the standards of this website (assuming Google N-grams should be ruled out based on… reasons).

If such a data set exists, I think it could be used to give search feedback along the lines of the following:

1.) No results were found for [search term], but our records show it is a common word in English. Share your search inspiration in the forum.

2.) No results were found for [search term], and our records show it is an uncommon word in English. Etymonline is not a good resource for this need.
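To make the proposal concrete, here is a minimal sketch of the two-message feedback logic. It is hypothetical throughout: the frequency table (`freq_per_million`), the cutoff `COMMON_THRESHOLD_PER_MILLION`, and the function `search_feedback` are illustrative assumptions, not anything Etymonline actually has, and it assumes a word absent from the frequency data counts as uncommon.

```python
from typing import Mapping, Optional, Set

# Hypothetical cutoff: words seen at least this many times per million
# tokens in the (assumed) frequency data set count as "common".
COMMON_THRESHOLD_PER_MILLION = 1.0

MSG_COMMON = ("No results were found for {term!r}, but our records show it is "
              "a common word in English. Share your search inspiration in the forum.")
MSG_UNCOMMON = ("No results were found for {term!r}, and our records show it is "
                "an uncommon word in English. Etymonline is not a good resource "
                "for this need.")


def search_feedback(term: str,
                    entries: Set[str],
                    freq_per_million: Mapping[str, float],
                    threshold: float = COMMON_THRESHOLD_PER_MILLION) -> Optional[str]:
    """Return a feedback message for a failed search, or None if an entry exists."""
    key = term.strip().lower()
    if key in entries:
        return None  # the search succeeded; no extra feedback needed
    # Words missing from the frequency data are treated as uncommon (an assumption).
    if freq_per_million.get(key, 0.0) >= threshold:
        return MSG_COMMON.format(term=term)
    return MSG_UNCOMMON.format(term=term)


if __name__ == "__main__":
    entries = {"etymology", "word"}
    freqs = {"selfie": 5.0, "monotype": 0.2}  # invented numbers for illustration
    print(search_feedback("selfie", entries, freqs))
    print(search_feedback("monotype", entries, freqs))
```

The whole decision rests on a single threshold, which is exactly where the questions in the reply below bite: the result depends entirely on which corpus the frequencies come from and where the cutoff is drawn.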

Hi, Scott, thanks for the suggestion and for thinking it through so well before proposing it.

There are two answers. One is that we don’t trust such data sets compiled by thinking machines to reflect anything real (you seem to share the suspicion). And frequency measured how? Websites? Message boards? Twitter? Print publications? In global English or native English? Uncommon over what range of time? Compared to what other word or words?

My intention is to relentlessly avoid anything on the site that smacks of a machine doing the thinking. I want people to know this as the one site where they can expect not to be told what is true by a machine. I know I will fail at that; I already lost the fight over Ngrams, but at least they’re scarlet-lettered. The whole internet is rushing pell-mell in the opposite direction; when all is going to Hell, every hill is a good one to die on. Here’s mine. They can’t be “our” records if they’re compiled by machines I don’t control using some formula I can’t understand. That would be “us accepting what a machine tells us about us as the truth.” Not on my site. Apologies for hill-dying on you.

The other, big-picture counter-argument is that the initial assumption you make, though quite logical, is the opposite of the case. Etymonline is meant to bridge the present and the past, to give the living words historical context, to help make the speech of the dead alive again, to at least preserve something for whoever remains after the machines eat the brains.

So Etymonline’s interest in a word decreases in direct proportion as it becomes more modern. It pretends the internet never existed. It still writes 19- on its checks. It still writes checks. It’s Gutenberg, not Zuckerberg.

Personally, I would be of no use explaining the present to the present anyway. However, research partner Talia Felix has a livelier interest in such words, or at least lacks my disinclinations.

The site doesn’t have all the words; it never will; it can’t; it doesn’t intend to. There are other dictionary sites online that will happily give people fact-shaped answers to anything they like.

Thank you for the thorough response. There are a lot of tempting strings to pull, but I am going to resist in order to keep the focus on the main idea of my post: I think Etymonline might need to better communicate to visitors what it is (or what it is not). I think this is evidenced by the proportion of forum posts that seem to miss the mark.

I’m not saying that I know what the website is. I don’t fully. But I think there is something true in my feeling on this.


You’re right on the mark there. The site’s identity is obscured now. We’ve talked about redoing the main page, the home page, but nobody goes there anymore. Most of the world neither knows nor searches for “etymology,” much less knows what it is and isn’t. AI says we should call the site “WORD HISTORY BOOK!” Look for that someday. The website was one thing in 2001, another in 2024 or whatever damn year this is, and a dozen other things in the interim. One thing it is is organic.


Scott, thank you. Your feedback is really helpful.

Search is the most important and complex feature of the whole site, but word frequency has never been a part of it. The purpose of the search engine is to present users with the word they want to find, although that may correlate with word frequency.

Perhaps you are observing users who, without first checking the dictionary, post their own largely unsubstantiated ideas in the forum.

Rather than changing the site’s search engine, I think a better solution would be to make it easier and smoother for users to search the dictionary before posting. Currently, when someone drafts a post, the forum software automatically presents some related search results. This alerts the user: hey, what you are typing seems to duplicate something that has already been posted.

The limitation here is that the forum software’s search can only present discussions from the forum. We haven’t found a way to integrate the dictionary search here.


Hey Chongwei,

I think there is an opportunity to provide feedback within search itself, rather than waiting until someone drafts a forum post. When someone submits a search query, I think they are effectively communicating to Etymonline what their expectations are for the website.

My hope for this is that there would be some way to immediately give feedback on whether or not the visitor’s expectation is good.

Expectation good: Encourage visitor to stay (perhaps by engaging with the forum).
Expectation bad: Encourage visitor to leave.

-Scott


This study estimates that there were about 1,022,000 words in the English language as of the year 2000, yet Etymonline currently has fewer than 60,000 entries.

I once used data mining to provide Douglas with a list of 5,226 words, with duplicates and acronyms removed, that are missing from the dictionary, based on a word-frequency cutoff. However, I believe that in a dictionary, each entry is its own judgment call: either it belongs, or there is no need to include it. I also don’t think that adding every single word, especially obscure ones, is the purpose of Etymonline. The editorial team has its own views and workflows on this, which the technical team respects.

We don’t want a situation where a user, unable to find a word in the dictionary search, comes to the forum to complain. This is not the kind of discussion we aim for. Nor do we want our product design to inadvertently guide users in that direction.

Of course, your suggestion is very good. Our goal has always been to meet users’ search expectations, helping them find the information they seek and enjoy the process.

One does not simply add words. Each one takes time, perhaps hours, some of them days, to research independently and to our editorial standards consistent with the rest of the site. Modern words are especially troublesome because their print use will be behind copyright and the internet’s archives are a mess. Still, allow 15 minutes per word to research, collate, type and code the entry. That’s ridiculously short, but convenient for calculation. At that rate, 5,226 words would be something like two straight months of non-stop work.
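Spelling out that estimate, at the assumed 15 minutes per word and counting round-the-clock days:

```latex
5226 \text{ words} \times 15 \text{ min} = 78{,}390 \text{ min} = 1{,}306.5 \text{ h} \approx 54.4 \text{ days}
```

That is a little under two months of non-stop work; at eight working hours a day it would be roughly 163 working days.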

(For example, tonight we’re now 3.5 hours into trying to pin down “monotype” in the printing sense.)


Is Etymonline actually Mordor? You have to tell us if it’s Mordor.


Come on, Doug, don’t be such a Luddite!
It matters little whether you ride a horse or drive a car to work; what matters is that you’re the one doing the driving 😀
A tool is a tool is a tool, as long as it stays a tool.
