Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I wonder how the ratio of display overhead to cost to compute the minimal cost path has changed over the years? At some point it must be cheaper to do a "dumb" redraw if the display hardware is fast enough?

Also, it seems like this might also be useful for computing minimal diffs, eh?



It was definitely worth it if you had a 300 baud modem and a lightly loaded VAX mostly to yourself. But it became an overkill with higher baud rates and high speed networks.

It's based on the "string-to-string correction problem" (edit distance / Levenshtein distance), combined with "dynamic programming" (finding the best cost path through a cost matrix), which is (kind of) how diff works, on a line-by-line basis for inserted and deleted lines, and then on a character-by-character basis between changed lines. Each "edit" has a "cost" determined by the number of characters you have to send to the terminal (length of the escape codes to perform an edit in multiple possible ways, versus just redrawing the characters). Emacs computes its way through a twisty maze of little escape codes each time it updates the screen.

H. M. Wagner and M. J. Fischer. "The string-to-string correction problem." JACM 21, 1 (January 1974), 168-173

Citation: https://dl.acm.org/doi/10.1145/321796.321811

PDF: https://dl.acm.org/doi/pdf/10.1145/321796.321811?download=tr...

https://en.wikipedia.org/wiki/String-to-string_correction_pr...

>In computer science, the string-to-string correction problem refers to determining the minimum number of edit operations necessary to change one string into another (i.e., computing the shortest edit distance). A single edit operation may be changing a single symbol of the string into another, deleting, or inserting a symbol. The length of the edit sequence provides a measure of the distance between the two strings.

https://en.wikipedia.org/wiki/Diff

>In computing, the diff utility is a data comparison tool that calculates and displays the differences between two files. Unlike edit distance notions used for other purposes, diff is line-oriented rather than character-oriented, but it is like Levenshtein distance in that it tries to determine the smallest set of deletions and insertions to create one file from the other. The diff command displays the changes made in a standard format, such that both humans and machines can understand the changes and apply them: given one file and the changes, the other file can be created.

https://en.wikipedia.org/wiki/Levenshtein_distance

>In information theory, linguistics and computer science, the Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. It is named after the Soviet mathematician Vladimir Levenshtein, who considered this distance in 1965.

https://en.wikipedia.org/wiki/Edit_distance

>In computational linguistics and computer science, edit distance is a way of quantifying how dissimilar two strings (e.g., words) are to one another by counting the minimum number of operations required to transform one string into the other. Edit distances find applications in natural language processing, where automatic spelling correction can determine candidate corrections for a misspelled word by selecting words from a dictionary that have a low distance to the word in question. In bioinformatics, it can be used to quantify the similarity of DNA sequences, which can be viewed as strings of the letters A, C, G and T.

Here's a Levenshtein demo that shows you the cost matrix with the shortest path highlighted. Try calculating the distance between "fotomat" and "tomato", and you can see the shortest path which has a cost of 3 edits (1: delete f at beginning, 2: delete o at beginning, 3: insert o at end):

http://www.let.rug.nl/kleiweg/lev/

>Levenshtein distance is obtained by finding the cheapest way to transform one string into another. Transformations are the one-step operations of (single-phone) insertion, deletion and substitution. In the simplest versions substitutions cost two units except when the source and target are identical, in which case the cost is zero. Insertions and deletions costs half that of substitutions. This demonstration illustrates a simple algorithm which basically looks at all of the different ways for operations to transform one string to another.


Cheers! I'm a big fan of yours BTW. Long ago, in a forum far away, your were the one to clue me in to NeWS (and pie menus too!) Blew open my concepts of what a display system could be. I'll always be grateful to you Don Hopkins.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: