VARUNA JAYASIRI

@vpj

Wallapatta editor spelling checker

April 30, 2016

The Wallapatta spell checker uses the 10,000 word list1. Words not on the list are highlighted in red, and less commonly used words are highlighted in shades of orange. This helps you use simple language in your articles.
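As a rough sketch of the idea (not the editor's actual code; it assumes the word list is ordered most to least frequent, and the 5,000-rank threshold for orange is made up):

 # assumes `words` is the 10,000-word list, ordered by frequency
 rank = {}
 rank[w] = i for w, i in words

 highlightClass = (word) ->
  r = rank[word.toLowerCase()]
  return 'misspelled' unless r? # not on the list: red
  return 'uncommon' if r >= 5000 # less common: shades of orange
  null # common word: no highlight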

images/posts/wallapatta/spell_check.png

In the Wallapatta Chrome Editor, press the check button to turn spell checking on and off.

1 It doesn't have the whole dictionary; i.e. words that are not in the top 10,000 common words will be highlighted as spelling errors.


Open sourcing nearby.lk data models library

April 29, 2016

We rewrote the nearby.lk data model library and decided to open source its core. It supports JSON or YAML data files, and parses them based on a specification (like a schema).

The specification is a CoffeeScript class with a render function. The editor checks the input against the specification and uses the render function to show the parsed input. The editor can work either in YAML mode (plain YAML input) or in form input mode (the user fills in a form). It can handle quite complex data models.
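For instance, input for the Resume model in YAML mode might look roughly like this (the values are invented and the exact shape of nested models is a guess; the property names come from the Resume class below):

 name: Jane Doe
 role: Software Engineer
 website: http://example.com
 address:
  type: 'Null'
 statement:
  - Building data tools for small businesses.
 timeline:
  - type: Education
    from: '2010'
    to: '2014'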

Here's the editor accepting (and rendering) a resume.

This is the class for the Resume example.

 class Resume extends Base
  @extend()

Set the type of the model.

  type: 'Resume'

Define properties. The type of a property is determined by its parameters; it defaults to a Value property.

  @property 'name', {}
  ...

oneof specifies that the property is one of the listed models.

  @property 'address',
   oneof: ['Null', 'Address']

This is a list of values. rows and columns are schema parameters for values.

  @property 'statement',
   list:
    rows: 3
    columns: 50

This property is a list of other models.

  @property 'timeline',
   list:
    oneof: ['Experience', 'Education', 'Recognition']
    defaultValues: -> {from: '2010', to: '2020'}

  ...

These are some private functions of the Resume model.

  _education: ->
   res = (e for e in @_values.timeline when e.type is 'Education')
   res.sort (x, y) -> y._values.from - x._values.from

  ...

This is the template to render the output.

  template: (self) ->
   values = self._values
   education = self._education()
   experience = self._experience()
   recognitions = self._recognitions()

   @div ".resume", ->
    @div ".row", ->
     @div ".six.columns", ->
      @div ".name", "#{values.name}"
      @div ".role", "#{values.role}"
      if values.website isnt ''
       @div ".website", ->
        @a href: "#{values.website}", "#{values.website}"

     if values.address.type isnt 'Null'
      @div ".three.columns.address", ->
       values.address.weya this

     ...

New properties (e.g. image uploads) can be defined similarly.

Why did we need this?

On nearby.lk, we have local business information. Initially it was basic information like contact numbers and a small description; later, more details needed to be included. The page structures also differed based on the type of location. We didn't want the content in free flow: with free text it's easier to make mistakes and harder to keep a uniform style. Structured content also allows much better search.

nearby.lk is not yet running this rewritten library. A little more work is needed to integrate it with the nearby search engine.

How to use it?

These samples show the classes for the Resume example.

The library depends on the Weya and Mod libraries for rendering and module management.

Embedding the editor

Mod.require 'Models.Editor', (Editor) ->
 editor = new Editor
  model: 'Resume' #Base model name

 onRendered = ->
  editor.yaml() #yaml edit mode
  editor.structured() #structured edit mode
  editor.resize #resize editor
   width: width
   height: height
  editor.getModel() #returns model object
  #returns json object omitting default values
  editor.getModel().getJSON()
  #returns full json object
  editor.getModel().getJSONFull()
  editor.setJSON jsonObject #set json object content

 editor.render element,
  width: width
  height: height
  onRendered

Using models

Mod.require 'Models.Models', (Models) ->
 ModelClass = Models.get 'Resume'
 model = new ModelClass
 #parse json object
 results = ModelClass.parse jsonObject
 #results.score = how well the input matched [0..1]
 #results.errors = List of errors when parsing
 #results.value = model
 #returns json object omitting default values
 model.getJSON()
 #returns full json object
 model.getJSONFull()
 model.render element
 model.html() #returns html

Page Breaks

April 27, 2016

How to automatically add sensible page breaks?

In Wallapatta we model pagination as a cost minimisation problem. That is, we try to find where to place page breaks so that there is no overflow and the total cost is minimised. If the cost of adding a page break at a given point is known, this can easily be solved with dynamic programming.
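Here's a minimal sketch of that dynamic program (not the Wallapatta implementation; it assumes every block fits on a page by itself, heights[i] is the rendered height of block i, and breakCost(j) is the cost of breaking just before block j):

 paginate = (heights, pageHeight, breakCost) ->
  n = heights.length
  return [] unless n > 0
  best = [0] # best[i] = minimum cost of paginating the first i blocks
  startAt = [0] # startAt[i] = first block of the last page in that solution
  for i in [1..n]
   best[i] = Infinity
   h = 0
   j = i - 1
   # try every feasible start j for the page ending at block i - 1
   while j >= 0 and h + heights[j] <= pageHeight
    h += heights[j]
    cost = best[j] + (if j > 0 then breakCost j else 0)
    if cost < best[i]
     best[i] = cost
     startAt[i] = j
    j--
  # walk backwards to recover where the breaks go
  breaks = []
  i = n
  while i > 0
   breaks.unshift startAt[i]
   i = startAt[i]
  breaks[1..] # page breaks fall just before these block indices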

The question is how to determine the cost of adding a page break at a given point.

First, we calculated the cost based on the type of sections we split; e.g. a list, a paragraph, a table, between two paragraphs, between two topics. This way the program tries to insert page breaks where the cost is lowest, like between two topics. Often, inserting a page break will split multiple sections.

1 Item one
2 Item two
  <<<<< Page break
  Item two details
3 Item three

For instance, the above page break splits a set of paragraphs as well as a list of items. It's important to know the hierarchy of the document when calculating this cost.
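As a sketch, the combined cost can be computed by walking up the document tree from the break point (the section types and cost values here are made up):

 SPLIT_COSTS = {list: 3, paragraph: 2, table: 5, topic: 1} # made-up values
 breakCost = (node) ->
  # sum the cost of every section this break cuts through,
  # from the innermost node up to the document root
  cost = 0
  while node?
   cost += SPLIT_COSTS[node.type] ? 0
   node = node.parent
  cost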

When we first tried with just the above cost, we ran into two main problems:

  1. Not splitting at the best points

    e.g. when only 4 items would fit on a page

    1. Item 1
    <<<<< Page break
    2. Item 2
    3. Item 3
    4. Item 4
    5. Item 5

    The algorithm would place the break between 1 and 2, whilst the best place is between 4 and 5.

  2. Empty space

    As in the above example, it would leave a lot of empty space on initial pages while filling up later pages.

Our first reaction was to add breaks at later points when the cost is the same. This didn't work out well, as there were instances where the cost of breaking at an earlier point was slightly less - although the cost was lower, it didn't look nice.

So we introduced a cost based on the height of the upper part of the sections we are splitting, and another cost based on the height of the page left empty.

The cost based on the height of the upper part is inversely proportional to that height; i.e. there's a very large cost if you break a section right after it starts. The cost for empty pages is likewise inversely proportional to the height filled with content; i.e. a large cost if only a small portion of the page is filled. The empty-page cost is not added for the last page.
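Illustratively, the two costs have these shapes (the constants and exact forms are made up; only the inverse relationship is the point):

 K_SPLIT = 1000 # made-up constant
 K_EMPTY = 500 # made-up constant
 # very large when a section is split right after it starts
 splitCost = (upperHeight) -> K_SPLIT / upperHeight
 # very large when a page is left mostly empty; not applied to the last page
 emptyPageCost = (filledHeight) -> K_EMPTY / filledHeight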

This algorithm seems to give decent results so far. Here are some examples:

  A guide - A4 size: http://vpj.github.io/images/posts/page_breaks/sample_a4.pdf
  Same guide - A5 size: http://vpj.github.io/images/posts/page_breaks/sample_a5.pdf

Cost based on the type of sections getting split

Cost based on the height of the upper part of the sections getting split, and on the height of the page left empty


A few node.js/JavaScript performance tips

March 19, 2016

Here's a list of small node.js-related performance tips. I will keep updating this with new things we come across.

vm.runInContext vs vm.runInThisContext

runInContext and runInNewContext are much slower than runInThisContext.

Here's a small script that compares the performance with a simple for loop.

vm = require 'vm'
LOG = (require '../lib/log.js/log').log

N = 10000000
COUNT_SCRIPT = "for(var i = 0; i < #{N}; ++i) { count++; }"

T = (new Date).getTime()
count = 0
for i in [0...N]
 count++
LOG 'info', 'normal_time', "#{(new Date).getTime() - T}ms", count: count

global.count = 0
script = new vm.Script COUNT_SCRIPT
T = (new Date).getTime()
try
 script.runInThisContext timeout: 100
catch e
 LOG 'error', 'vm_error', e.message
LOG 'info', 'run_in_this', "#{(new Date).getTime() - T}ms", count: global.count

sandbox = count: 0
context = vm.createContext sandbox
script = new vm.Script COUNT_SCRIPT
T = (new Date).getTime()
try
 script.runInContext context, timeout: 1000
catch e
 LOG 'error', 'vm_error', e.message
LOG 'info', 'run_in_context', "#{(new Date).getTime() - T}ms", sandbox

This is the output:

img/node-performance/vm.png

The reason normal_time is higher than run_in_this is probably that the CoffeeScript for loop has two counters. runInContext gets terminated by the timeout (1s) before it completes the loop.

Heap Limit

The heap size limit of node.js applications on 64-bit systems is about 1.7 GB. Fortunately, you can increase this by passing the max_old_space_size parameter to node (the value is specified in megabytes).

node --max_old_space_size=4096 [JS_FILE]

However, typed arrays are not bound by this limit. That is, you can allocate a few gigabytes of memory to typed arrays without specifying the above parameter.
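For example, something like this runs without any flags even though it allocates well past the default limit (a quick check; sizes are illustrative):

 # 8 x 512 MB = ~4 GB in typed arrays, outside the V8 heap
 buffers = (new Float64Array 64 * 1024 * 1024 for i in [0...8])
 console.log "allocated #{buffers.length * 512} MB"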

Parent of sliced string

Substrings of large strings keep a reference to the parent string. This eats up memory if you want to discard the parent string.

A quick hack of splitting and joining the substring solves this.
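The hack is a one-liner (the separate post below shows it in context):

 newSmallString = smallString.split('').join('')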

String join

String concatenation (s += part) eats up a lot of memory that doesn't get garbage collected. It's OK for a few concatenations; anything more than about 10 should use Array.join.
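A minimal before-and-after sketch (assuming parts is an array of string pieces):

 # repeated concatenation: intermediate strings pile up
 s = ''
 s += part for part in parts

 # build the array and join once instead
 s = parts.join ''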



Wallapatta Update

December 01, 2015

Wallapatta1 got some new features.

First up are full-width blocks, which span into the sidenote area. They are ideal for large images and quotes.

https://static.pexels.com/photos/6547/sky-night-space-galaxy-large.jpeg

Next are JavaScript, CoffeeScript, and Weya template blocks. These can contain code that generates HTML.

Here's a CoffeeScript block.

<<<coffee
 "7 * 100 = <strong>#{7 * 100}</strong>"

It renders as: 7 * 100 = 700

Here's a Weya block that generates the Forestpin logo as an SVG.

<<<weya
 G = 1.618
 H = 13
 order = [2, 4, 1, 0, 3]
 heights = (H * Math.pow G, i for i in order)
 @svg width: 250, height: 250, ->
  @g transform: "translate(2, 154)", ->
   for h, i in heights
    @g ".bar", transform: "translate(#{i * 50}, 0)", ->
     @rect y: -h * G, width: 46, height: h * G, fill: '#4a4a4a'
     @rect width: 30.67, height: h, fill: '#98ff98'
     @rect x: 30.67, width: 15.33, height: h, fill: '#8bea8b'

Check out the editor, the GitHub repo, or the reference for details:

  Editor: https://chrome.google.com/webstore/detail/wallapatta/nleponjjojkllonfamfjhebhadibjlip
  Github: https://github.com/vpj/wallapatta
  Reference: http://vpj.github.io/wallapatta/reference.html#wallapatta_335

1 Wallapatta is similar to markdown. It supports a document layout inspired by Edward Tufte's handouts and books. This blog is written in Wallapatta.



Parent in (sliced string)

November 23, 2015

img/sliced_string.png

In Google Chrome1, when you take substrings of a larger string, the larger string is not garbage collected even if it is no longer referenced. The problem seems to be that the substring keeps a reference to the parent string.

Demo

The demo iteratively:

  1. creates a random large string (~10M in length)
  2. takes a few substrings (each about 50 characters)
  3. pushes the substrings to a global array (the only thing permanently referenced)
  4. removes references to the large string
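Roughly, the loop looks like this (a sketch, not the actual benchmark; sizes and counts are approximate):

 randomString = (n) ->
  parts = []
  len = 0
  while len < n
   chunk = Math.random().toString(36)[2..]
   parts.push chunk
   len += chunk.length
  parts.join ''

 kept = [] # the only thing permanently referenced
 for cycle in [0...2000]
  big = randomString 10 * 1024 * 1024 # ~10M characters
  # each slice is only ~50 characters, but keeps `big` alive via its parent pointer
  kept.push big.substr(i * 100, 50) for i in [0...5]
  big = null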

You would assume that the larger strings get garbage collected, and would expect the program to run without a problem for many, many iterations - until the memory taken up by the smaller strings hits the limit.

Unfortunately, in Google Chrome it doesn't work like that. The small substrings keep a reference to the parent, and therefore it crashes after a few iterations. Try http://vpj.github.io/bench/string_slice.html in Google Chrome; it crashes before 2000 cycles.2 If you take a heap snapshot in the Chrome Inspector, you can see the large strings, referenced as parent in (sliced string) by the smaller strings (as shown in the screenshot above).

Then we wrote a copy of the same program, which copies the substrings before storing them. The following splitting and joining creates an actual copy of the string, instead of a reference copy.

newSmallString = smallString.split('').join('')

Try http://vpj.github.io/bench/string_slice_join.html, which runs for many more iterations doing the same thing.

1 The same happens in node.js since it uses V8.

2 This only happens in Chrome, not in Safari; we did not check Firefox.
