The human civilization’s knowledge base as a machine-readable open knowledge source

© belchonok (Depositphotos)
  1. It would be generally good (for the development community, humanity, and progress) to have programmatic access to common-sense knowledge.
  2. The best candidate for the source of this knowledge is Wikipedia — however chaotic and irregular it sometimes might be.

The task

To give a practical demonstration of Wikipedia’s ubiquitous knowledge-bearing power, I spent a weekend experimenting with one small piece of the data: movie ratings from Rotten Tomatoes.


Hands-on Tutorials

I spent a year building a spellchecker, and all I got is some grumbling to share.

Photo by Michael Dziedzic on Unsplash


The open format you probably use every day without even knowing has some delightful peculiarities

Photo by Joshua Hoehne on Unsplash


© raywoo (Depositphotos)


  1. Generate edits and test them for correctness
  2. (If the first stage didn’t produce good enough results) Search…
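The first stage above can be illustrated with a minimal sketch. It is not taken from Hunspell: the functions `generate_edits` and `suggest`, the edit operations chosen, and `TOY_DICTIONARY` are all hypothetical, picked only to show the general "generate candidates, then keep the ones the dictionary accepts" idea.

```python
import string


def generate_edits(word, alphabet=string.ascii_lowercase):
    """Return every string that is one simple edit away from `word`.

    Hypothetical helper: the four edit types here (delete, swap, replace,
    insert) are an assumption, not Hunspell's actual list of edits.
    """
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    swaps = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [
        left + ch + right[1:] for left, right in splits if right for ch in alphabet
    ]
    inserts = [left + ch + right for left, right in splits for ch in alphabet]
    # Exclude the original word itself (e.g. replacing a letter with itself).
    return set(deletes + swaps + replaces + inserts) - {word}


def suggest(word, dictionary):
    """Keep only the generated edits that pass the correctness test."""
    return sorted(edit for edit in generate_edits(word) if edit in dictionary)


# Toy word list standing in for a real dictionary (an assumption).
TOY_DICTIONARY = {"spell", "spelt", "smell", "shell", "sell"}

print(suggest("spel", TOY_DICTIONARY))
```

Here the correctness test is a plain set membership check; in a real spellchecker it would be the full lookup procedure.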


© zimmytws (Depositphotos)
  1. In the first part, I’ve described what Hunspell is and why I decided to rewrite it in Python. It is an explanatory rewrite dedicated to uncovering the knowledge behind Hunspell by “translating” it into a high-level language, with a lot of comments.
  2. In the second part, I’ve covered the basics of the lookup (word correctness check through the dictionary) algorithm, including affix compression.
  3. In the third part, the rest of the…
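As a rough illustration of what lookup with affix compression means, here is a minimal sketch. It assumes a drastically simplified model in which the dictionary stores only stems plus flags, and each flag names one suffix. The data in `STEMS` and `SUFFIXES` and the function `lookup` are hypothetical and do not reproduce Hunspell's file formats or its real algorithm.

```python
# Hypothetical toy data: each stem maps to the flags of the suffix rules it accepts.
STEMS = {"walk": {"S", "D"}, "cat": {"S"}}

# Hypothetical suffix rules: flag -> suffix that may be appended to a flagged stem.
SUFFIXES = {"S": "s", "D": "ed"}


def lookup(word):
    """Return True if `word` is a stem or a stem plus a suffix its flags allow."""
    if word in STEMS:
        return True
    for flag, suffix in SUFFIXES.items():
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            # The stem must exist AND carry the flag of this particular rule.
            if flag in STEMS.get(stem, set()):
                return True
    return False


for candidate in ("walk", "walked", "cats", "cated"):
    print(candidate, lookup(candidate))
```

The point of the compression is that "walks" and "walked" never appear in the word list; they are accepted only because "walk" carries the matching flags, while "cated" is rejected because "cat" lacks the `D` flag.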


© belchonok (Depositphotos)
  1. In the first part, I’ve described what Hunspell is and why I decided to rewrite it in Python. It is an explanatory rewrite dedicated to uncovering the knowledge behind Hunspell by “translating” it into a high-level language, with a lot of comments.
  2. In the second part, I’ve covered the basics of the lookup (word correctness check through the dictionary) algorithm, including affix compression.

  1. Check if a word is correct: “lookup”


© Lamai Prasitsuwan (Shutterstock)

How I decided to write a spellchecker and almost died trying

Meet the Hunspell

Back then, I decided to make a moderately generic tool, at least able to work with…


The idea

The idea appeared sometime in January 2014. It is hard to explain now where it came from or why, but it arrived all at once and fully formed: poems read aloud by their author are often a way to hear and understand them better; almost everyone now has a “home-quality” video recording device; and almost everyone is used to watching that kind of video on their own screen, too. …

Victor Shepelev

Writing in human (en/ru/ukr) and programming (rb/py/…) languages. Open source, open data, text processing, text authoring. And stuff.
