Words like grains of sand

It all began when I yielded to a frivolous temptation and ordered online a paper copy of “Grandiloquent words – a pictoric lexicon of ostrobogulous locutions” by Jason Travis Ott.

As I first opened the book I was fascinated, intrigued and horrified at the same time: far beyond my expectations, it listed a staggering number of English words no one in his right mind would ever use in speech or writing before the fourth or fifth beer, and even then with extreme caution.
Each entry was complete with definition, quotations, dating and etymology.
Chilling.

The worst came when, while leafing through it, I ran into a bizarre word *) that long ago some prankster proposed (badly misspelled and bare of any explanation) as a new Etymonline entry.
OK, I was wrong, it was no amateurish improvisation: allegedly in a remote past the word existed – together with the other 227 reported in the book and the devil knows how many others no one ever bothered to record.

Which raises once more the eternal question: which words deserve a place in a good sensible dictionary? How complete should (or can) a good dictionary be?

Many old words die hard, countless new words sprout like mushrooms after the rain. Keeping track of them all is virtually impossible because the new fancy ones keep popping up while one writes, and the old undead ones keep resurfacing like zombies out of control.
How many f**ing words are there altogether, then???

I played a little with the calculator (rainy Sunday, you know) and came to a rough but still stunning estimate: limiting the evaluation to a reasonable maximum of 12 characters per word (let’s leave out supercalifragilisticexpialidocious & co.) and constraining it to English (no diacritics, no special characters etc.), there are exactly 99,246,114,928,149,462 ‘words’ of 1 to 12 letters one can write with 26 characters – over 99 quadrillion.
Now let’s be honest and assume (a shot in the dark, of course) that only one in a million of those ‘words’ is pronounceable and that only one in a thousand of the surviving ones may make sense of a sort if properly stretched with enough perverse fantasy.
This leaves us with approximately 99 million potential entries that would fill about two million pages, tantamount to ~1000 thick tomes to be fitted in ca. 100 normal shelves – “we need a larger apartment, darling” :grinning:.
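
For anyone who wants to redo the calculator exercise, here is a minimal Python sketch of the same arithmetic. The two filters are the same shots in the dark as above, the 50 entries per page is only what the two-million-page figure implies (an assumption, not a measured value), and the function and constant names are made up for illustration.

```python
ALPHABET_SIZE = 26   # plain English letters, no diacritics
MAX_LENGTH = 12      # longest 'word' considered


def count_strings(alphabet_size: int, max_length: int) -> int:
    """Count every string of length 1..max_length over the alphabet."""
    return sum(alphabet_size ** k for k in range(1, max_length + 1))


total = count_strings(ALPHABET_SIZE, MAX_LENGTH)

# The two guesses from the text: one in a million is pronounceable,
# and one in a thousand of those makes some kind of sense.
pronounceable = total // 1_000_000
meaningful = pronounceable // 1_000

ENTRIES_PER_PAGE = 50  # assumption implied by "about two million pages"
pages = meaningful / ENTRIES_PER_PAGE

print(f"all 'words':       {total:,}")
print(f"potential entries: {meaningful:,}")
print(f"pages:             {pages:,.0f}")
```

The sum is a plain geometric series, so the exact figure can also be obtained as (26^13 − 26) / 25.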

No one alive would ever have time to consult such a “complete” dictionary, not to mention compiling and editing it, which would be a full-time job for many generations of wordaholics. There must be a good effective criterion to prune those 99 million potential entries down to a few hundred thousand, something a normal human being can handle and survive.

Unfortunately though the only decent criterion that occurs to me has no place in any known exact science: it’s called common sense and at least so far it’s an exquisitely human prerogative. To the one who manages to provide a deterministic, unquestionable, non-tautological definition of it I’m ready to buy a barrel of excellent Bavarian beer :slight_smile:
Prost! :beverage_box:

*) “Ultracrepidarian”, n.: from a Latin proverb “Ne ultra crepidam sutor iudicaret” (more or less, “Cobbler, don’t comment beyond the shoes!”): someone who pontificates about topics he’s little familiar with.


“Ultracrepidarian” is not unfitting for etymonline. Hebdomadally is in there already, just as bad, and nobody can spell that either.

Both always were pure showoff words. You’d be an ass to try to use ultracrepidarian today in most circumstances, and in print it seems to exist mostly in books of “longest hardest words!”

But Lamb, Hazlitt, and Beddoes among others liked it and preened their prose in it. Guiding the present in reading the past is a main motive for the site. (In addition to giving you the etymology basics of all the basic English words, and serendipity for its own sake.)

I want etymonline to keep alive, in the coming world, the mental ability to read the great thoughts and human truths that people put into words before the world went mad. To read Charles and Mary Lamb or “Hard Times” or “Common Sense” or Federalist No. 70, or Donne’s sermons.

Even if that means feeding the site into the wood-chipper of AI, to be regurgitated into chum for shrunken minds wired to little devices. Which chipping and chumming happened years ago.

A hail-Mary pass to whoever might be there to pull it down.

The whole “ultra crepidam” story has been on the site for decades, under cobbler (n.1).

Your tale and description of the sifting of words from the alphabet approaches Borges’s universal library, which is a seat of madness.

You describe the natural winnowing of the usable words from the possible ones. The language-cultures winnow further; one people avoids certain possible sounds or clusters of sounds as ugly or unpronounceable. The people one valley over from them might embrace those sounds and build on them (the French and the Germans).


Yet I’m afraid we are in dire need of a handier synonym: since Google, Wikipedia & Co. made knowledge (though not necessarily understanding) an easily accessible asset for everyone, an exponentially growing number of simpletons started pontificating profusely on subjects they had no clue about. “Smartass” and “wiseass” are fairly good English approximations but cannot render precisely enough the ultracrepidarianness of the cleaning lady who tells the neurosurgeon where to cut.

Don’t get your hopes up: Pandora’s box has been open for too long, and small minds are too fond of big words and too little sensitive to ridicule. Let’s just hope that a tiny share of humankind benefits from your toil and let the rest wallow in their own verbal onanism.

Intentional madness – reductio ad absurdum is an invaluable tool :slight_smile:

Yes, I admit I cut corners brutally by stating that a ‘word’ in a million would be pronounceable: where? By whom? How long a word? (the shorter the word, the higher its chances).
You could write a thick tome or three on the local preferences about what is pronounceable, what sounds musical, what sounds horrible and so on.
And when you’re ready to send it to the publisher you’d probably come across a long forgotten Ugro-Sumerian dialect where “xwooktz” used to mean “I love you” :smile:

I have thought about the problem of which words should be in a dictionary for a looooooong time, and I have come up with the solution. The solution is based on freedom (like, of speech) first, but tempered by good old fashioned hard-nosed economics; here’s how it goes:

1) Coin. Them. All. Freedom of Speech Rules:
First thing to solve is whether a word should be entered in the dictionary in the first place-- before we discuss which ones should be pruned over time. My answer to this: any and all of them. Dictionaries are the LAST place a word goes, after the people, or society, has already coined a term, and then used it frequently enough (how do dictionaries measure that? I’ll provide that answer too) to warrant an egghead in charge of terms to grandiosely enter it into the lexicon (all official-like with robes and wizard hats in Oxford or Cambridge). The old method’s TRUE determination was economic tho-- only so many pages. But sites like Urban and Etymonline changed all that, as online storage is cheap(-er than print)-- but not free. But the ULTIMATE dictionary would not only have words everyone or a large group can appreciate, but even colloquialisms (see the podcast: A Way With Words, for examples of all the local words there are, which never make dictionaries), localisms, and even words you and your friends invented in 1979 for the purpose of just a single sports team or classroom group or sorority. The ideal dictionary would list ALL the words, even if used once, by a single person, in his bedroom. Who knows? Maybe like fashizzle or Homer’s “Doh!” it will become “a thing”? I think of it this way: only Mr. Higgs thought his theory was correct, and he might have been its only espouser for decades until the supercollider proved his particle exists, and what a particle it was, completing the Standard Model of Physics like Oliver completed the Bradys, or Keith Moon completed The Who.

Credit Urban for allowing almost ANY word to be coined (they do edit, as I found out years ago when defining “Boot’n L’Lee Farnsworth” in it for the sole benefit of a single friend across the country in California who would appreciate it). It took a couple years, but Urban disappointingly deleted my definition, and I’d argue it did have some relevance beyond just my friend, but only if you are a Doctor Dre or possibly NWA, or hip-hop semi-fan.

The point is, Higgs thought “Higgs Boson” was a word, or at least was going to one day BECOME a word, so why shouldn’t he SPONSOR or PAY for the word to remain in dictionaries, and if he runs out of money or belief in his own coined term, fashizzle, then HE AND ONLY HE could prune it from the dictionary using the school of hard economic knocks…

2) PAY TO PLAY (with your coined words) & PAY TO STAY:
So we allow ANYTHING into the dictionary, which is a bit crazy, but there are ways to do it such that your tiny group-of-three friends who are the only ones in the world who appreciate that term don’t have to infringe on the mind-space of the other 8 billion people on Earth. (I call this “auto-complete”, which means that only the BEST words get auto-complete, and everything else in the dictionary, including nonsensical words-- such as some of the quadrillion combinations you propose-- doesn’t get shown to Joe Sixpack, the average Joe.)

But beyond just hiding what I would call “My Words” or “Local words” or “Extreme Colloquialisms” or “Personalized Dictionary Terms” from regular joes, how can a digital dictionary prevent over-expansion to the point of nonsense and lost capital? (the latter of which is truly damning for an online dictionary like Etymonline or any other). The answer is to use economics: make the coin-your-terms bastard pay for his entries in some way. How can you do that fairly? It’s simple, but also complicated. But the first step, to avoid spam, is just to make a TermCoiner pay his “2 cents” for the privilege of coining his crazy-ass term. That alone will prevent a lot of spam and send malignants to Urban or some other place to attempt their dirty work, requiring very expensive editors.

But there’s more to this solution beyond just paying for placement, what about long term storage costs?

That can be solved by what I call “the substrate”. Think of every coined word as having an expiration date determined by:

a) The coiner,
so Higgs can keep paying to keep his term alive long enough to gain popularity, and Gretchen Wieners, besieging the world into “making FETCH happen”, can keep paying to keep the word alive. Like a plate-spinner at the circus, the coiner can go as long as people will watch (or his own back collapses-- in our case his wallet). And that brings up…

b)…The public:
If the word is looked up enough, then it MUST be lexicon, no? Yes. Words are nothing more than repetition amongst people. We see this every year when a dictionary grandiosely TELLS. US. what words are now “English”. How pompous! How dictatorial! What a bunch of ASSES the Oxford and Cambridge professors are. Those who actually WRITE the definitions we seek have probably never themselves coined a single term, but SNOOP Double-Oh Pee has! How crazy is that? All kinds of terms are coined by absolute nobodies, but there’s a way they can actually get credit, as if the world had an immutable ledger to determine who coined what term first-- as if words are now artwork, or doctoral theses, complete with peer review.

After all, it is USAGE which always determined what words make it, going back all the way to Mr. Webster and Sir James Murray. Murray needed citations, and it was some crazy-asses (literally) he recruited, or literally EVERYONE, to get him his citations. But a citation is just usage, and books and printed materials were the only way back then. But the REAL citations are spoken first, mostly, and maybe later get written. Why would a term coiner not be able to go from verbal usage in his city (for something like Philly’s “jawn”) to DIRECT ENTRY into the dictionary, closer to the time of invention, or even, gasp, the very second of inspiration someone coined the term first?

The answer is, in an ideal world, a person should be able to coin his own term the first second he conceives it, and then let either himself OR the world at large decide whether his term should survive.

Long term, the definition’s “substrate” is an amount of money. That money goes up with usage, as seekers of the term can pay a minuscule amount (maybe 1/100th of a penny per seek) to seek and learn the term, and all those seeks will add up to pay to keep the word in the slick dictionary. OR, you let the coiner pay a larger substrate, to keep his word alive, if that’s what pleases him. Again, there’s ways to make sure his word is more hidden if not useful, but also ways to make it rise up if people should seek his term one day (like ole Higgsie).
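
To make the “substrate” idea concrete, here is a rough Python sketch of how such a balance might behave. Only the 1/100th-of-a-penny seek fee comes from the paragraph above; the yearly upkeep figure, the class and the function names are hypothetical, picked purely for illustration.

```python
from dataclasses import dataclass

# All amounts are integers counted in hundredths of a cent,
# so no rounding errors creep in.
SEEK_FEE = 1            # 1/100th of a penny per seek, as floated above
YEARLY_UPKEEP = 5_000   # hypothetical storage cost: 50 cents per entry per year


@dataclass
class Entry:
    word: str
    substrate: int = 0  # money currently backing the entry

    def seek(self, times: int = 1) -> None:
        """The public pays: each lookup adds one seek fee."""
        self.substrate += SEEK_FEE * times

    def top_up(self, amount: int) -> None:
        """The coiner pays: he adds whatever he likes to keep the word alive."""
        self.substrate += amount

    def charge_year(self) -> bool:
        """Deduct one year of upkeep; return False if the entry expires."""
        if self.substrate < YEARLY_UPKEEP:
            return False
        self.substrate -= YEARLY_UPKEEP
        return True


def years_funded(entry: Entry) -> int:
    """How many more full years the current substrate pays for."""
    return entry.substrate // YEARLY_UPKEEP


jawn = Entry("jawn")
jawn.top_up(10_000)   # the coiner puts in one dollar
jawn.seek(2_500)      # 2,500 lookups add a quarter of a dollar
print(years_funded(jawn))
```

With these made-up numbers the entry survives on either source of money alone or on both together, which is the whole point of the expiration date being set by the coiner and the public.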

In short, the answer is freedom to create, but tempered by the cold hard steel of public review and financing. Maximum freedom and the minimum economics (of survival of a word).

I personally think the internet is entering its third phase, and this third phase will provide the exact solutions to your question that I’m outlining here. The first phase was advertising (banner ads) and “eCommerce” (seek book, buy book, Amazon-style). The second phase we are in now, see NYTimes, WSJ, Netflix, Hulu, Spotify and even the good ole OED: subscriptions (in all their annoying uncancelable glory). The third phase will be micro-payments-- payments which are too low for Visa and Mastercard to process (they charge about 10-30 cents per transaction, and almost no one is going to pay 10 cents to look up a definition). It’s coming, some day. The perfect dictionary, which moves with the people, in real time, and allows a language to breathe and pulse and beat with the hearts of all who use it. The next great dictionary will be almost as fast as word ideation itself, and it will even document the hundreds-of-years-long change in the language itself. When you allow freedom and economics, the best old words survive, just like the best old businesses or fiefdoms can, or just like your family tree has survived. Not only that, but each word will have timestamps on it, so you can literally graph the use and even economic value of each word, each definition of each word. A word’s definition can change with time, and that change can be documented and timelined as if words became science.

What do you think of my solution, kind sir(s)?

An interesting point of view indeed: you like a word? You pay a small initial fee to get it added to the dictionary. You want it to stay alive even if no one gives a damn? You just pay a modest annual fee and it stays for as long as you can afford it.
Yes, it’s certainly a way to create and maintain a dictionary - and to make a living out of it.
But I’m afraid that one should also consider a few marginal factors that might affect its success:

  • Social balance: the blabbering of a billionaire would get an unfair advantage over the works of a Spitzweg’s Poor Poet living in a drafty dormer. And today’s billionaires don’t seem to be particularly literate or versed in the art of thinking.
  • Emotional attitude: for reasons my shrink would know better than me I’ve never paid cash for a woman’s favours, although sometimes I invested small fortunes in flowers, petits cadeaux and Champagne to achieve almost the same goal.
    Same goes for words: all my life I’ve spent a fortune on books, but I wouldn’t dream of wasting money on a particular word. Were words copyrighted, I’d just stop writing rather than pay even a microscopic amount per used word. It’s a matter of principle, please don’t ask me why for I’m short of reasonable answers.
  • Multiplicity of dictionaries: In order to ‘register’ a word and to keep it afloat one should pay fees to all the available dictionaries - and you bet that they would spread like wildfire on today’s internet: easy money has an irresistible allure. Which would make the whole thing pretty expensive.

Of course there’s more to it (there always is) but to cut it short I’d rather omit considerations about loanwords (siesta, kindergarten), composite words containing proper names (Hansen’s disease, Bose-Einstein statistics), words jocularly improvised on the spot (lipido, coffeaholic) and many others that alone might be the seed for interesting if lengthy discussions - Rome wasn’t built in a day, right? :slightly_smiling_face:

Those are all fantastic points, and your first was the first one that had to be tackled:
What if Kleen-ex just decided to dominate the word “tissue”? The dictionary would go from academic with problems, to commercial with problems.

The way to solve this is:

  1. DEFINITIONS NOT WORDS
    Words cannot be copyrighted, only definitions. If Snoop defines “fashizzle” first, it will be timestamped on an immutable ledger which couldn’t be changed by the dictionary or anyone else. His timestamp would show that he likely coined the word and his definition of it, but language is of course fungible and words cannot be owned, only claimed. But the DEFINITIONS of those words CAN be claimed. Since the dictionary would offer the #1 way to monetize definitions, it is unlikely that particular definitions would be overly rent-seeking, as charging too much for them would make them less popular (at least in the few, rare commercial instances).

  2. DEFINITIONS COMPETE:
    Definitions would be competitive, vying for the top spot in electronic dictionaries. The best definitions would rise to the top, algorithmically but with the algo focused on volume, learning, efficiency of being learned, and economic algorithms aimed at making the dictionary more enlightening, more efficient and even more entertaining (“The Three Es” of a great reference site)

  3. VOTING SETS RANK
    Usage of the dictionary allows for voting on the top definitions. This works not so differently from Urban’s innovation of ranking definitions (Urban is also a mostly open dictionary whose definitions are subject to competition). But instead of relying on raw popularity, which invites spamming and makes Urban’s rankings terrible, votes here carry a cost, which brings me to…

  4. MICRO-FEE PER VOTE ELIMINATES MASS-SPAM
    Charge an insignificantly small micro-fee for word seeks (like 1/10th of a penny or such). A tiny microtransaction would be insignificant to the seeker, as well as eliminate annoying commercial banner ads from the dictionary site, but the micro-fee would ACCUMULATE significantly enough to prevent mass-spammings by billionaires and companies, as they would have to pay for many paid seeks to spam the system larger than, say, 4 billion English speakers using the dictionary. I call this “The GULLIVER PYGMALION EFFECT”. Sure, billionaires and companies are powerful, but their spammings cost millions and billions, whereas the legitimate seeks of the pygmalions (information seekers, all of us) add up to even bigger, making it quite costly to even keep a commercial definition in the rankings. Companies and billionaires could try, but what they’d find is that it would behoove them to actually make high quality definitions which could compete in their own right. Plus, the algorithm would be augmented, if not dominated, by the efficiency of learning the word by students. That brings me to the ultimate way to rank the EFFICIENCY of the eLearning of the words-- also a competition:

  5. LEARNING EFFICIENCY DOMINATES ALGORITHM
    While certainly the simple votes of the mass public would do a lot of the voting heavy lifting, this would be compared to the efficiency of definitions sticking in people’s brains by offering word/definition certified exams. These exams would be executed similarly to how the math world uses actuarial exams, or how the academic world uses varying levels of degrees. Using this dictionary and a dedicated definition test, certifications could be earned by top Word Nerds using the definition APIs of the site. In other words, your definitions wouldn’t just compete on public popularity or usage, but also in their efficiency to be remembered and savored by those looking to LEARN the English words. This again would cause the billionaires and companies to want to make EFFICIENT definitions which would help students learn faster and easier (augmenting memory). This becomes significant as dictionaries inevitably transition to audio and video definitions which require more effort and production.

Think of definition certification exams as also similar to spelling bees. But definition and word mastery via levels of mastery certifications would be used as badges of honor for resumes and applications for writing jobs, etc… They would far outweigh spelling bees. Also important to remember that teaching and spelling bees are about as old as ole man Webster himself. So certification exams by a dictionary would benefit all, turning knowledge and information into what I call “iSports”, or Information Sports. Again, this concept isn’t new, we’ve had game-shows, bar quizzes, college-entry exams, Information Technology Certification Exams, grade-level standardized testing, Guinness Book of World Records memory tests, spelling bees, and actuarial exams to name just a few. It’s about time for students and academics to compete in important informational topics which can be measured (sakes alive, we have hot dog eating contests!). Why not put them to good use, instead of just testing for college and private high school entry exams? Create tests which will provide a learning efficiency feedback loop for learning definitions and words.

  6. PAY THE WRITERS!
    When your definition gets votes, by knowledge seekers paying the micro-fee, the micro-fee can be split between the writer/owner of the definition and the dictionary platform. This inspires top-quality content to be made, and by anyone qualified. The higher the quality of the definition, the more votes it gets, and the higher and more favorable its ranking among its competitors.

  7. SIGNED WORK
    Signatures are important. Would you believe the “fashizzle” and “Higgs Boson” definitions more if they were signed by Snoop Dogg and Mr. Higgs himself, or by John Smith? Some may not like this, but language isn’t necessarily a merit-based organism. Popularity has ALWAYS counted, or that American Presidential candidate wouldn’t have gotten away with his new definition of “normalcy”. Right? When knowledge seekers can look at the signature of a definition, as much as the content itself, they can decide what’s most important. Since anyone can define, it opens up definitions of words to all, the way Sir Murray would’ve dreamed-- as his job of quality assurance would’ve been happily eliminated by the same economics that makes Madison Avenue such a lovely commercial street in Manhattan, or museums such fine representations of man’s best art. The signature enforces a bit of pride in workmanship as well as its own potential branding opportunity, but this time for Word Nerds like us.
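
The paid votes and the writer/platform split from the points above could be sketched roughly like this in Python. The 1/10th-of-a-penny fee is the figure floated earlier; the 70% writer share, the class and the function names are assumptions made up for the example, and ranking purely by vote count leaves out the learning-efficiency part of the algorithm.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Amounts are integers in hundredths of a cent to keep the arithmetic exact.
VOTE_FEE = 10               # 1/10th of a penny per paid vote
WRITER_SHARE_PERCENT = 70   # hypothetical: the writer keeps the majority


@dataclass
class Definition:
    author: str      # the signature attached to the definition
    text: str
    votes: int = 0   # number of paid votes received

    def revenue(self) -> int:
        """Total micro-fees collected by this definition."""
        return self.votes * VOTE_FEE

    def split(self) -> Tuple[int, int]:
        """Return (writer's cut, platform's cut) of the collected fees."""
        writer = self.revenue() * WRITER_SHARE_PERCENT // 100
        return writer, self.revenue() - writer


def rank(definitions: List[Definition]) -> List[Definition]:
    """Order competing definitions of one word by paid votes, best first."""
    return sorted(definitions, key=lambda d: d.votes, reverse=True)


candidates = [
    Definition("John Smith", "a particle", votes=3),
    Definition("Mr. Higgs", "the quantum of the Higgs field", votes=10),
]
top = rank(candidates)[0]
print(top.author, top.split())
```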

CONCLUSION:
Combine all these together, and you have a system which would not be easily biased, and certainly a lot cleaner than our current way of financing the dictionaries: BANNER ADS. Yuck. Micro-transactions, where a knowledge seeker is out maybe a dollar or two per year but isn’t bothered by flashy banner ads again, yes!

I really really appreciate the comments, thank you so much for your feedback! Your comments are very on-target for my solution of a better dictionary that can ebb and flow with the conversations, speeded-up by openness and the opportunity to profit by teaching the world the most efficiently, most enlighteningly, and most entertainingly (that last part I think is missing in reference materials).

I don’t think my solutions are all the way towards their final destination, as any product never is, including all our historical dictionaries. But I think there’s enough for a model to be tried, and for the feedback loop to work its magic. The bigger trouble isn’t the attempts at gaming the system, it’s the same old “chicken and egg” problem nascent dictionaries have always had, including Etymonline: How do you have a dictionary if it’s not complete yet? The same problem Sir Murray struggled with, but ultimately conquered with the same solution as Urban Dictionary, and this one proposed above-- outsource the work to the mass public.

However, this time, unlike Wikis, Urban, or Oxford-- PAY the contributors with both money and their deserved signatory fame. That’s only fair, and similar to the Apple iPhone App Store, the creators should get the MAJORITY of the revenues, not the minority, creating a whole new industry which would provide many getting shunted by “AI” an avenue for employment. After all, crafty thoughtful PEOPLE write the best, not AI, so AI would have no chance, and also it wouldn’t even have access to the information without paying heavily for the communal corpus (something being fought in courts today, on behalf of live human writers vs the machines of copy known as “AI”).

You’re suggesting quite a complicated - though feasible - mechanism that in theory might work fine. But as a prerequisite it would need a unique NLA (National Language Agency) that sets the rules, collects the fees, chastises the infringers and distributes the revenues.
Which might actually be a weak spot, I fear: languages are a very natural phenomenon, people write and speak as they like, and even harmless primary school teachers are loathed as they scold a pupil for not abiding by the Official Rules of The Language while quarreling with a schoolmate.

What do you think, would people really welcome such an Authority? Or would they rather snub it and keep ignoring the One and Only Holy Dictionary it would try to impose?