Nim Days
The Nim Days book is about my journey using Nim and creating useful, practical things with it, including:
- ini parser
- bencode parser
- links checker
- tictactoe (commandline and gui)
- testing framework
- build system
- tcp router
- redis parser
- redis client
- assets bundler
- terminal table
- dotfiles manager
- urlshortening application
This book is influenced by the great books Practical Common Lisp and Real World Haskell, and I'm planning to follow the same model of having the book available for free online.
Reporting issues
You can report issues or create pull requests on the book repository
Day 1: Parsing DMIDecode output
On our first day we will write a dmidecode parser in Nim.
What to expect?
let sample1 = """
# dmidecode 3.1
Getting SMBIOS data from sysfs.
SMBIOS 2.6 present.
Handle 0x0001, DMI type 1, 27 bytes
System Information
Manufacturer: LENOVO
Product Name: 20042
Version: Lenovo G560
Serial Number: 2677240001087
UUID: CB3E6A50-A77B-E011-88E9-B870F4165734
Wake-up Type: Power Switch
SKU Number: Calpella_CRB
Family: Intel_Mobile
"""
import dmidecode, tables
var obj : Table[string, dmidecode.Section]
obj = parseDMI(sample1)
for secname, sec in obj:
echo secname & " with " & $len(sec.props)
for k, p in sec.props:
echo "k : " & k & " => " & p.val
if len(p.items) > 0:
for i in p.items:
echo "\t\t I: ", i
Implementation
A while ago at work (https://github.com/zero-os/0-core) we needed to parse some dmidecode output, and it sounded like a good problem with enough concepts to get my feet wet in Nim.
nimble ready!
mkdir dmidecode
cd dmidecode
nimble init
So what does dmidecode output look like?
# dmidecode 3.1
Getting SMBIOS data from sysfs.
SMBIOS 2.6 present.
Handle 0x0001, DMI type 1, 27 bytes
System Information
Manufacturer: LENOVO
Product Name: 20042
Version: Lenovo G560
Serial Number: 2677240001087
UUID: CB3E6A50-A77B-E011-88E9-B870F4165734
Wake-up Type: Power Switch
SKU Number: Calpella_CRB
Family: Intel_Mobile
or
Getting SMBIOS data from sysfs.
SMBIOS 2.6 present.
Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
Vendor: LENOVO
Version: 29CN40WW(V2.17)
Release Date: 04/13/2011
ROM Size: 2048 kB
Characteristics:
PCI is supported
BIOS is upgradeable
BIOS shadowing is allowed
Boot from CD is supported
Selectable boot is supported
EDD is supported
Japanese floppy for NEC 9800 1.2 MB is supported (int 13h)
Japanese floppy for Toshiba 1.2 MB is supported (int 13h)
5.25"/360 kB floppy services are supported (int 13h)
5.25"/1.2 MB floppy services are supported (int 13h)
3.5"/720 kB floppy services are supported (int 13h)
3.5"/2.88 MB floppy services are supported (int 13h)
8042 keyboard services are supported (int 9h)
CGA/mono video services are supported (int 10h)
ACPI is supported
USB legacy is supported
BIOS boot specification is supported
Targeted content distribution is supported
BIOS Revision: 1.40
- DMIDecode output consists of some metadata (comments, versions) and one or more sections
- Section: consists of
  - a handle line
  - a title line
  - one or more indented properties
- Property: consists of
  - a key
  - an optional value
  - an optional list of indented items
Mapping DMI to nim structures
So our plan is to have an API like
dmifile = parseDMI(source)
dmifile["section1"]["property1"].value
Let's describe the document structure we have
import sequtils, tables, strutils
type
Property* = ref object
val*: string
items*: seq[string]
type
Section* = ref object
handleLine*, title*: string
props* : Table[string, Property]
method addItem(this: Property, item: string) =
this.items.add(item)
As our parsing will depend on the indentation level, we can use this handy function to get the indentation level of a line (the number of spaces before the first non-space character)
proc getIndentLevel(line: string) : int =
for i, c in pairs(line):
if not c.isSpaceAscii():
return i
return 0
It'd have been nicer to use takewhile, but it's not available in the Nim stdlib. In Python it'd be:
getindentlevel = lambda l: len(list(takewhile(lambda c: c.isspace(), l)))
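A quick sanity check of getIndentLevel (illustrative calls that would live inside the module, since the proc isn't exported):
doAssert getIndentLevel("Handle 0x0001, DMI type 1, 27 bytes") == 0
doAssert getIndentLevel("    Manufacturer: LENOVO") == 4
doAssert getIndentLevel("\t\tPCI is supported") == 2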
Parsing DMI source into nim structures
There are many ways to parse the DMI output, e.g. using regexes, which would be fairly simple (feel free to implement it and kindly send me a PR to update this tutorial).
proc parseDMI* (source: string) : Table[string, Section]=
In plain English, for output like this
Getting SMBIOS data from sysfs.
SMBIOS 2.6 present.
Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
Vendor: LENOVO
Version: 29CN40WW(V2.17)
Release Date: 04/13/2011
ROM Size: 2048 kB
Characteristics:
PCI is supported
BIOS is upgradeable
BIOS shadowing is allowed
Boot from CD is supported
Selectable boot is supported
EDD is supported
Japanese floppy for NEC 9800 1.2 MB is supported (int 13h)
Japanese floppy for Toshiba 1.2 MB is supported (int 13h)
5.25"/360 kB floppy services are supported (int 13h)
5.25"/1.2 MB floppy services are supported (int 13h)
3.5"/720 kB floppy services are supported (int 13h)
3.5"/2.88 MB floppy services are supported (int 13h)
8042 keyboard services are supported (int 9h)
CGA/mono video services are supported (int 10h)
ACPI is supported
USB legacy is supported
BIOS boot specification is supported
Targeted content distribution is supported
BIOS Revision: 1.40
we have a couple of states
type
ParserState = enum
noOp, sectionName, readKeyValue, readList
- noOp: no action yet
- sectionName: read the section name
- readKeyValue: read a line that has a colon : in it into a key/value pair
- readList: when the next line has a greater indentation level than the property line
so our state is noOp until we reach the line
Handle 0x0000, DMI type 0, 24 bytes
then it moves to sectionName for the line
BIOS Information
then the state changes to reading properties
Vendor: LENOVO
Version: 29CN40WW(V2.17)
Release Date: 04/13/2011
ROM Size: 2048 kB
Characteristics:
then we notice the indentation on the next line is greater than the one on the current line
PCI is supported
Characteristics:
so the state moves into readList to read the items related to the property Characteristics
PCI is supported
BIOS is upgradeable
BIOS shadowing is allowed
Boot from CD is supported
Selectable boot is supported
EDD is supported
Japanese floppy for NEC 9800 1.2 MB is supported (int 13h)
Japanese floppy for Toshiba 1.2 MB is supported (int 13h)
5.25"/360 kB floppy services are supported (int 13h)
5.25"/1.2 MB floppy services are supported (int 13h)
3.5"/720 kB floppy services are supported (int 13h)
3.5"/2.88 MB floppy services are supported (int 13h)
8042 keyboard services are supported (int 9h)
CGA/mono video services are supported (int 10h)
ACPI is supported
USB legacy is supported
BIOS boot specification is supported
Targeted content distribution is supported
and again it notices that the indentation of the next line is less than that of the current line
BIOS Revision: 1.40
Targeted content distribution is supported
so state switches again into readKeyValue
- if we encounter an empty line:
  - if not in a parsing state, it's a noOp: we ignore meta and empty lines
  - if in a parsing state (the current Section isn't nil), we finish parsing the section object
proc parseDMI* (source: string) : Table[string, Section]=
var
state : ParserState = noOp
lines = strutils.splitLines(source)
sects = initTable[string, Section]()
p: Property = nil
s: Section = nil
k, v: string
Here we define the current state, the source lines, and initialize a table sects mapping sectionName to Section objects, plus the variables p (current property), s (current section), and k, v (current property key and value).
for i, l in pairs(lines):
Start looping over (index, line) using pairs; pairs is kind of like enumerate in Python.
if l.startsWith("Handle"):
s = new Section
s.props = initTable[string, Property]()
s.handleline = l
state = sectionName
continue
If we encounter the string Handle
- create a new section object and initialize its props table
- keep track of the handle line
- switch state to reading sectionName
- continue the loop to move to the title line
if l == "": # can be just new line before reading any sections.
if s != nil:
sects[s.title] = s
continue
If the line is empty and we have a section object (not nil), we finish the section and continue.
if state == sectionName: # current line is the title line
s.title = l
state = readKeyValue # change state into reading key value pairs
If the state is sectionName:
- this line is a title line
- change the state for the upcoming lines to readKeyValue
elif state == readKeyValue:
let pair = l.split({':'})
k = pair[0].strip()
if len(pair) == 2:
v = pair[1].strip()
else: # value can be empty
v = ""
p = Property(val: v)
p.items = newSeq[string]()
p.val = v
If the state is readKeyValue:
- split the line on the colon : to get the key/value pair, setting v to "" if the value is not present
- make the current Property p and initialize its related fields items and val
# current line indentation is < nextline indentation => change state to readList
if i < len(lines)-1 and (getIndentLevel(l) < getIndentLevel(lines[i+1])):
state = readList
If the next line's indentation is greater, this means we should be reading a list of items belonging to the current property p.
else:
# add key/value pair directly
s.props[k] = p
If not, finish the property directly.
elif state == readList:
# keep adding the current line to current property items and if dedented => change state to readKeyValue
p.addItem(l.strip())
if getIndentLevel(l) > getIndentLevel(lines[i+1]):
state = readKeyValue
s.props[k] = p
If the state is readList:
- keep adding items to the current property p
- if the indentation level decreases, change the state to readKeyValue and finish the property
return sects
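As a quick usage sketch (not part of the parser itself), assuming the module above is compiled as dmidecode and using sample1 from the What to expect section, accessing a parsed value looks like this:
let dmi = parseDMI(sample1)
# sections are keyed by their title line, properties by their key
echo dmi["System Information"].props["Manufacturer"].val  # LENOVO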
Day 2: Parsing Bencode
nim-bencode is a library to encode/decode Bencode, the encoding used by torrent files.
What to expect?
import bencode, tables, strformat
let encoder = newEncoder()
let decoder = newDecoder()
let btListSample1 = @[BencodeType(kind:btInt, i:1), BencodeType(kind:btString, s:"hi") ]
var btDictSample1 = initOrderedTable[BencodeType, BencodeType]()
btDictSample1[BencodeType(kind:btString, s:"name")] = BencodeType(kind:btString, s:"dmdm")
btDictSample1[BencodeType(kind:btString, s:"lang")] = BencodeType(kind:btString, s:"nim")
btDictSample1[BencodeType(kind:btString, s:"age")] = BencodeType(kind:btInt, i:50)
btDictSample1[BencodeType(kind:btString, s:"alist")] = BencodeType(kind:btList, l:btListSample1)
var testObjects = initOrderedTable[BencodeType, string]()
testObjects[BencodeType(kind: btString, s:"hello")] = "5:hello"
testObjects[BencodeType(kind: btString, s:"yes")] = "3:yes"
testObjects[BencodeType(kind: btInt, i:55)] = "i55e"
testObjects[BencodeType(kind: btInt, i:12345)] = "i12345e"
testObjects[BencodeType(kind: btList, l:btListSample1)] = "li1e2:hie"
testObjects[BencodeType(kind:btDict, d:btDictSample1)] = "d4:name4:dmdm4:lang3:nim3:agei50e5:alistli1e2:hiee"
for k, v in testObjects.pairs():
echo $k & " => " & $v
doAssert(encoder.encodeObject(k) == v)
doAssert(decoder.decodeObject(v) == k)
Implementation
So according to Bencode we have some data types:
- strings: encoded as the string length followed by a colon and the string itself (length:string), e.g. yes will be encoded into 3:yes
- ints: encoded between the letters i and e, e.g. 59 will be encoded into i59e
- lists: can contain any of the Bencode types and are encoded between l and e, e.g. the list of the numbers 1, 2 is encoded into li1ei2ee, or with spaces for clarity: l i1e i2e e
- dicts: mappings from strings to any type, encoded between the letters d and e, e.g. name => hi and num => 3 is encoded into d4:name2:hi3:numi3ee, or with spaces for clarity: d 4:name 2:hi 3:num i3e e
Imports
import strformat, tables, json, strutils, hashes
As we will be dealing a lot with strings and tables, we import those modules.
Types
type
BencodeKind* = enum
btString, btInt, btList, btDict
As we mentioned regarding the Bencode data types, we can define an enum to represent the kinds.
BencodeType* = ref object
case kind*: BencodeKind
of BencodeKind.btString: s* : string
of BencodeKind.btInt: i* : int
of BencodeKind.btList: l* : seq[BencodeType]
of BencodeKind.btDict: d* : OrderedTable[BencodeType, BencodeType]
Encoder* = ref object
Decoder* = ref object
- Encoder: a simple class to represent encoding operations
- Decoder: a simple class to represent decoding operations
- For BencodeType we make use of variant objects (case classes in other languages). It's worth noticing that variant objects are the same technique used by the json module.
So we can use it like this
BencodeType(kind: btString, s:"hello")
BencodeType(kind: btInt, i:55)
let btListSample1 = @[BencodeType(kind:btInt, i:1), BencodeType(kind:btString, s:"hi") ]
BencodeType(kind: btList, l:btListSample1)
So the general rule for case objects is that you have a kind defined in an enum and a constructor value you create the object with.
If you're coming from Haskell or a similar language
data BValue = BInt Integer
| BStr B.ByteString
| BList [BValue]
| BDict (M.Map BValue BValue)
deriving (Show, Eq, Ord)
Please note that if you define your own variant, you should define hash and == procs to be able to compare or hash the values.
proc hash*(obj: BencodeType): Hash =
case obj.kind
of btString : !$(hash(obj.s))
of btInt : !$(hash(obj.i))
of btList: !$(hash(obj.l))
of btDict:
var h = 0
for k, v in obj.d.pairs:
h = h !& hash(k) !& hash(v)
!$(h)
- The hash proc returns a Hash; depending on the kind we return the hash of the underlying stored object (string, int, list) or calculate a new hash if needed
- !& merges two hashes together
- !$ is used to finalize the Hash object
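A tiny check (a sketch, assuming the module is in scope): equal values hash equally, which is what lets BencodeType work as an OrderedTable key.
doAssert hash(BencodeType(kind: btInt, i: 5)) == hash(BencodeType(kind: btInt, i: 5))
doAssert hash(BencodeType(kind: btString, s: "hi")) == hash(BencodeType(kind: btString, s: "hi"))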
proc `==`* (a, b: BencodeType): bool =
## Check two nodes for equality
if a.isNil:
if b.isNil: return true
return false
elif b.isNil or a.kind != b.kind:
return false
else:
case a.kind
of btString:
result = a.s == b.s
of btInt:
result = a.i == b.i
of btList:
result = a.l == b.l
of btDict:
if a.d.len != b.d.len: return false
for key, val in a.d:
if not b.d.hasKey(key): return false
if b.d[key] != val: return false
result = true
Here we define equality on BencodeType values to determine when they're equal, by defining a proc for the == operator.
proc `$`* (a: BencodeType): string =
case a.kind
of btString: fmt("<Bencode {a.s}>")
of btInt: fmt("<Bencode {a.i}>")
of btList: fmt("<Bencode {a.l}>")
of btDict: fmt("<Bencode {a.d}>")
Define a simple toString proc using the $ operator.
Encoding
proc encode(this: Encoder, obj: BencodeType) : string
We add a forward declaration for the encode proc because to encode a list we might need to encode other values (strings, or even other lists), so we will recursively call encode when needed. Feel free to skip to the next part.
proc encode_s(this: Encoder, s: string) : string=
# TODO: check len
return $s.len & ":" & s
To encode a string, as we said, we emit its length, then a colon :, then the string itself.
proc encode_i(this: Encoder, i: int) : string=
# TODO: check len
return fmt("i{i}e")
To encode an int we put it between the i and e chars.
proc encode_l(this: Encoder, l: seq[BencodeType]): string =
var encoded = "l"
for el in l:
encoded &= this.encode(el)
encoded &= "e"
return encoded
- To encode a list of elements of type BencodeType we put their encoded values between the l and e chars
- Notice the call to this.encode; that's why we needed the forward declaration
proc encode_d(this: Encoder, d: OrderedTable[BencodeType, BencodeType]): string =
var encoded = "d"
for k, v in d.pairs():
assert k.kind == BencodeKind.btString
encoded &= this.encode(k) & this.encode(v)
encoded &= "e"
return encoded
- To encode a dict we enclose the encoded values of the pairs between d and e
- Notice the recursive call to this.encode on the keys and values
- Notice the assertion that the kind of the keys must be btString according to the Bencode specs
proc encode(this: Encoder, obj: BencodeType) : string =
case obj.kind
of BencodeKind.btString: result = this.encode_s(obj.s)
of BencodeKind.btInt : result = this.encode_i(obj.i)
of BencodeKind.btList : result = this.encode_l(obj.l)
of BencodeKind.btDict : result = this.encode_d(obj.d)
A simple proxy to encode an obj of type BencodeType.
Decoding
proc decode(this: Decoder, source: string) : (BencodeType, int)
Forward declaration for decode, same as we did for encode.
proc decode_s(this: Decoder, s: string) : (BencodeType, int) =
let lengthpart = s.split(":")[0]
let sizelength = lengthpart.len
let strlen = parseInt(lengthpart)
return (BencodeType(kind:btString, s: s[sizelength+1..sizelength+strlen]), sizelength+1+strlen)
Create a BencodeType after decoding a string; this is the reverse operation of encode_s. Basically we read a string of length strlen after the colon and construct a BencodeType of kind btString out of it.
proc decode_i(this: Decoder, s: string) : (BencodeType, int) =
let epos = s.find('e')
let i = parseInt(s[1..<epos])
return (BencodeType(kind:btInt, i:i), epos+1)
Extract the number between the i and e chars and construct a BencodeType of kind btInt out of it.
proc decode_l(this: Decoder, s: string): (BencodeType, int) =
# l ... e
var els = newSeq[BencodeType]()
var curchar = s[1]
var idx = 1
while idx < s.len:
curchar = s[idx]
if curchar == 'e':
idx += 1
break
let pair = this.decode(s[idx..<s.len])
let obj = pair[0]
let nextobjpos = pair[1]
els.add(obj)
idx += nextobjpos
return (BencodeType(kind:btList, l:els), idx)
Decoding the list can be a bit tricky:
- Its elements are between the l and e chars
- So we start trying to decode objects from the first letter after the l until we reach the final e
- e.g. li120ei492ee will be parsed as follows: we consume the object i120e and set the cursor to the beginning of the second object i492e
- after all the objects are consumed we consume the end character e and we are done
- That's why all decode procs return an int value, to let us know how many characters to skip
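A small sketch exercising the list decoding through the public API described later (newDecoder and decodeObject), using the li1e2:hie example from What to expect:
let decoder = newDecoder()
let lst = decoder.decodeObject("li1e2:hie")
doAssert lst.kind == btList
doAssert lst.l[0] == BencodeType(kind: btInt, i: 1)
doAssert lst.l[1] == BencodeType(kind: btString, s: "hi")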
proc decode_d(this: Decoder, s: string): (BencodeType, int) =
var d = initOrderedTable[BencodeType, BencodeType]()
var curchar = s[1]
var idx = 1
var readingKey = true
var curKey: BencodeType
while idx < s.len:
curchar = s[idx]
if curchar == 'e':
break
let pair = this.decode(s[idx..<s.len])
let obj = pair[0]
let nextobjpos = pair[1]
if readingKey == true:
curKey = obj
readingKey = false
else:
d[curKey] = obj
readingKey = true
idx += nextobjpos
return (BencodeType(kind:btDict, d: d), idx)
- Same technique as above
- Basically we read one object; if we don't have a current key, we set it as the current key
- If we already have a current key, then the object we read is the value, so we store it under curKey and switch back to readingKey mode
proc decode(this: Decoder, source: string) : (BencodeType, int) =
var curchar = source[0]
var idx = 0
while idx < source.len:
curchar = source[idx]
case curchar
of 'i':
let pair = this.decode_i(source[idx..<source.len])
let obj = pair[0]
let nextobjpos = pair[1]
idx += nextobjpos
return (obj, idx)
of 'l':
let pair = this.decode_l(source[idx..<source.len])
let obj = pair[0]
let nextobjpos = pair[1]
idx += nextobjpos
return (obj, idx)
of 'd':
let pair = this.decode_d(source[idx..<source.len])
let obj = pair[0]
let nextobjpos = pair[1]
idx += nextobjpos
return (obj, idx)
else:
let pair = this.decode_s(source[idx..<source.len])
let obj = pair[0]
let nextobjpos = pair[1]
idx += nextobjpos
return (obj, idx)
Starts decoding based on the first character of the encoded object: i for ints, l for lists, d for dicts; otherwise it tries to parse a string.
proc newEncoder*(): Encoder =
new Encoder
proc newDecoder*(): Decoder =
new Decoder
Simple constructor procs for newEncoder, newDecoder
proc encodeObject*(this: Encoder, obj: BencodeType) : string =
return this.encode(obj)
encodeObject dispatches the call to the encode proc.
proc decodeObject*(this: Decoder, source:string) : BencodeType =
let p = this.decode(source)
return p[0]
decodeObject provides a friendlier API: it returns just the BencodeType from decode, instead of the (BencodeType, int) tuple that also carries how many characters were read.
Day 3: Talking to C (FFI and libmagic)
Libmagic is a magic number recognition library. Remember every time you called the file utility on a file to know its type?
➜ file /usr/bin/rm
/usr/bin/rm: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=cbae26b2a032b1ce3129d56aee2bcf70dd8deeb0, stripped
➜ nim-magic file /
/: directory
➜ file /usr/include/stdio.h
/usr/include/stdio.h: C source, ASCII text
What to expect?
import magic
echo magic.guessFile("/usr/bin/rm")
The output should be something like
ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=cbae26b2a032b1ce3129d56aee2bcf70dd8deeb0, stripped
Implementation
FFI Chapter of Nim in Action is freely available.
Step 0: Imports
from os import fileExists, expandFilename
Step 1: Get the library info
Well, libmagic ships a shared library libmagic.so in your library path (/usr/lib/libmagic.so) and a header file magic.h in /usr/include/magic.h.
Let's create a constant for the libmagic library name.
const libName* = "libmagic.so"
Step 2: Extract constants
We should extract the constants from the header
#define MAGIC_NONE 0x0000000 /* No flags */
#define MAGIC_DEBUG 0x0000001 /* Turn on debugging */
#define MAGIC_SYMLINK 0x0000002 /* Follow symlinks */
#define MAGIC_COMPRESS 0x0000004 /* Check inside compressed files */
#define MAGIC_DEVICES 0x0000008 /* Look at the contents of devices */
#define MAGIC_MIME_TYPE 0x0000010 /* Return the MIME type */
#define MAGIC_CONTINUE 0x0000020 /* Return all matches */
#define MAGIC_CHECK 0x0000040 /* Print warnings to stderr */
....
So in Nim it'd be something like this
const MAGIC_NONE* = 0x000000 # No flags
const MAGIC_DEBUG* = 0x000001 # Turn on debugging
const MAGIC_SYMLINK* = 0x000002 # Follow symlinks
const MAGIC_COMPRESS* = 0x000004 # Check inside compressed files
const MAGIC_DEVICES* = 0x000008 # Look at the contents of devices
const MAGIC_MIME_TYPE* = 0x000010 # Return only the MIME type
const MAGIC_CONTINUE* = 0x000020 # Return all matches
const MAGIC_CHECK* = 0x000040 # Print warnings to stderr
const MAGIC_PRESERVE_ATIME* = 0x000080 # Restore access time on exit
const MAGIC_RAW* = 0x000100 # Don't translate unprint chars
const MAGIC_ERROR* = 0x000200 # Handle ENOENT etc as real errors
const MAGIC_MIME_ENCODING* = 0x000400 # Return only the MIME encoding
const MAGIC_NO_CHECK_COMPRESS* = 0x001000 # Don't check for compressed files
const MAGIC_NO_CHECK_TAR* = 0x002000 # Don't check for tar files
const MAGIC_NO_CHECK_SOFT* = 0x004000 # Don't check magic entries
const MAGIC_NO_CHECK_APPTYPE* = 0x008000 # Don't check application type
const MAGIC_NO_CHECK_ELF* = 0x010000 # Don't check for elf details
const MAGIC_NO_CHECK_ASCII* = 0x020000 # Don't check for ascii files
const MAGIC_NO_CHECK_TOKENS* = 0x100000 # Don't check ascii/tokens
Step 3: Extract the types
typedef struct magic_set *magic_t;
so the only type we have is a pointer to some struct (object)
type Magic = object
type MagicPtr* = ptr Magic
Step 4: Extract procedures
magic_t magic_open(int);
void magic_close(magic_t);
const char *magic_getpath(const char *, int);
const char *magic_file(magic_t, const char *);
const char *magic_descriptor(magic_t, int);
const char *magic_buffer(magic_t, const void *, size_t);
const char *magic_error(magic_t);
int magic_getflags(magic_t);
int magic_setflags(magic_t, int);
int magic_version(void);
int magic_load(magic_t, const char *);
int magic_load_buffers(magic_t, void **, size_t *, size_t);
int magic_compile(magic_t, const char *);
int magic_check(magic_t, const char *);
int magic_list(magic_t, const char *);
int magic_errno(magic_t);
We only care about magic_open, magic_load, magic_close, magic_file, and magic_error.
# magic_t magic_open(int);
proc magic_open(i:cint) : MagicPtr {.importc, dynlib:libName.}
magic_open is a proc declared in the dynamic lib libmagic.so; it takes a cint (a C-compatible int) i and returns a MagicPtr.
From the manpage
The function magic_open() creates a magic cookie pointer and returns it. It returns NULL if there was an error allocating the magic cookie. The flags argument specifies how the other magic functions should behave
# void magic_close(magic_t);
proc magic_close(p:MagicPtr): void {.importc, dynlib:libName.}
magic_close is a proc declared in the dynlib libmagic.so; it takes an argument p of type MagicPtr and returns void.
From the manpage
The magic_close() function closes the magic(5) database and deallocates any resources used.
#int magic_load(magic_t, const char *);
proc magic_load(p:MagicPtr, s:cstring) : cint {.importc, dynlib: libName.}
magic_load is a proc declared in the dynlib libmagic.so; it takes an argument p of type MagicPtr and a cstring (a C-compatible string) s, and returns a cint.
From manpage:
The magic_load() function must be used to load the colon separated list of database files passed in as filename, or NULL for the default database file before any magic queries can be performed.
#const char *magic_error(magic_t);
proc magic_error(p: MagicPtr) : cstring {.importc, dynlib:libName.}
magic_error is a proc declared in the dynlib libmagic.so; it takes an argument p of type MagicPtr and returns a cstring.
From manpage
The magic_error() function returns a textual explanation of the last error, or NULL if there was no error.
#const char *magic_file(magic_t, const char *);
proc magic_file(p:MagicPtr, filepath: cstring): cstring {.importc, dynlib: libName.}
magic_file is a proc declared in the dynlib libmagic.so; it takes an argument p of type MagicPtr and a filepath of type cstring, and returns a cstring.
From manpage:
The magic_file() function returns a textual description of the contents of the filename argument, or NULL if an error occurred. If the filename is NULL, then stdin is used.
Step 5: Friendly API
It'd be annoying for people to write C-style code and take care of pointers and such in a higher level language like Nim.
So let's expose a proc guessFile that takes a filepath and flags and internally uses the functions we exposed through the FFI in the previous step.
proc guessFile*(filepath: string, flags: cint = MAGIC_NONE): string =
var mt : MagicPtr
mt = magic_open(flags)
discard magic_load(mt, nil)
if fileExists(expandFilename(filepath)):
result = $magic_file(mt, cstring(filepath))
magic_close(mt)
Only one note here: to convert from cstring to string we use the toString operator $:
result = $magic_file(mt, cstring(filepath))
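A quick usage sketch (output depends on your system): the flags we extracted earlier can be passed to guessFile, e.g. MAGIC_MIME_TYPE to get only the MIME type.
import magic
echo guessFile("/usr/include/stdio.h")                  # e.g. C source, ASCII text
echo guessFile("/usr/include/stdio.h", MAGIC_MIME_TYPE) # e.g. text/x-c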
Day 4: LinksChecker
What to expect?
We will be writing a simple links checker in both sequential and asynchronous style in Nim.
Implementation
Step 0: Imports
import os, httpclient
import strutils
import times
import asyncdispatch
Step 1: Data types
type
LinkCheckResult = ref object
link: string
state: bool
LinkCheckResult is a simple representation for a link and its state
Step 2: GO Sequential!
proc checkLink(link: string) : LinkCheckResult =
var client = newHttpClient()
try:
return LinkCheckResult(link:link, state:client.get(link).code == Http200)
except:
return LinkCheckResult(link:link, state:false)
Here, we have a proc checkLink that takes a link and returns a LinkCheckResult:
- newHttpClient() creates a new client
- client.get sends a GET request to a link and returns a response
- response.code gives us the HTTP status code, and we consider a link valid if its status == 200
- client.get raises an error for badly structured links; that's why we wrapped it in a try/except block
proc sequentialLinksChecker(links: seq[string]): void =
for index, link in links:
if link.strip() != "":
let result = checkLink(link)
echo result.link, " is ", result.state
Here, the sequentialLinksChecker proc takes a sequence of links and executes checkLink on them sequentially:
LINKS: @["https://www.google.com.eg", "https://yahoo.com", "https://reddit.com", "https://none.nonadasdet", "https://github.com", ""]
SEQUENTIAL::
https://www.google.com.eg is true
https://yahoo.com is true
https://reddit.com is true
https://none.nonadasdet is false
https://github.com is true
7.716497898101807
On my lousy internet it took 7.7 seconds to finish :(
Step 3: GO ASYNC!
We can do better than waiting on IO requests to finish
proc checkLinkAsync(link: string): Future[LinkCheckResult] {.async.} =
var client = newAsyncHttpClient()
let future = client.get(link)
yield future
if future.failed:
return LinkCheckResult(link:link, state:false)
else:
let resp = future.read()
return LinkCheckResult(link:link, state: resp.code == Http200)
Here, we define a checkLinkAsync proc:
- to declare a proc as async we use the async pragma
- notice the client is created with newAsyncHttpClient, which doesn't block on .get calls
- client.get immediately returns a future that can either fail (we can know that from future.failed) or succeed
- yield future means: okay, I'm done for now, dear event loop, you can schedule other tasks and continue my execution when you have an update on my fancy future
- when the event loop comes back (because the future now has some updates): clearly, if the future failed we return the link with a false state
- otherwise, we get the response object that's enclosed in the future by calling read
proc asyncLinksChecker(links: seq[string]) {.async.} =
# client.maxRedirects = 0
var futures = newSeq[Future[LinkCheckResult]]()
for index, link in links:
if link.strip() != "":
futures.add(checkLinkAsync(link))
# waitFor -> call async proc from sync proc, await -> call async proc from async proc
let done = await all(futures)
for x in done:
echo x.link, " is ", x.state
Here, we have another async procedure asyncLinksChecker that takes a sequence of links, creates futures for all of them, waits until they finish and gives us the results:
- futures is a sequence of the future LinkCheckResult values for all the links passed to the asyncLinksChecker proc
- we loop over the links, get a future for the execution of checkLinkAsync, and add it to the futures sequence
- we then block until we get all of the results out of the futures into the done variable
- then we print all the results
- Please notice await is used only to call an async proc from another async proc, and waitFor is used to call an async proc from a sync proc
ASYNC::
https://www.google.com.eg is true
https://yahoo.com is true
https://reddit.com is true
https://none.nonadasdet is false
https://github.com is true
is false
3.601503849029541
Step 4: Simple CLI
proc main()=
echo "Param count: ", paramCount()
if paramCount() == 1:
let linksfile = paramStr(1)
var f = open(linksfile, fmRead)
let links = readAll(f).splitLines()
echo "LINKS: " & $links
echo "SEQUENTIAL:: "
var t = epochTime()
sequentialLinksChecker(links)
echo epochTime()-t
echo "ASYNC:: "
t = epochTime()
waitFor asyncLinksChecker(links)
echo epochTime()-t
else:
echo "Please provide linksfile"
main()
The only interesting part is waitFor asyncLinksChecker(links): as we said, to call an async proc from a sync proc like this main proc you need to use waitFor.
Extra, threading
import threadpool
proc checkLinkParallel(link: string) : LinkCheckResult {.thread.} =
var client = newHttpClient()
try:
return LinkCheckResult(link:link, state:client.get(link).code == Http200)
except:
return LinkCheckResult(link:link, state:false)
Same as before; only the thread pragma is used to note that the proc will be executed within a thread.
proc threadsLinksChecker(links: seq[string]): void =
var LinkCheckResults = newSeq[FlowVar[LinkCheckResult]]()
for index, link in links:
LinkCheckResults.add(spawn checkLinkParallel(link))
for x in LinkCheckResults:
let res = ^x
echo res.link, " is ", res.state
- Spawned tasks or threads return a value of type FlowVar[T], where T is the return type of the spawned proc
- To get the value of a FlowVar we use the ^ operator
Note: you should use a nim.cfg with the flag -d:ssl to allow working with HTTPS.
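For reference, such a nim.cfg placed next to your source file could be as small as this (add --threads:on as well if you compile the threadpool variant):
# nim.cfg
-d:ssl
--threads:on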
Day 5: Creating INI Parser
This is a pure INI parser for Nim. Note that Nim already has the more advanced parsecfg module.
What to expect?
let sample1 = """
[general]
appname = configparser
version = 0.1
[author]
name = xmonader
email = notxmonader@gmail.com
"""
var d = parseIni(sample1)
# doAssert(d.sectionsCount() == 2)
doAssert(d.getProperty("general", "appname") == "configparser")
doAssert(d.getProperty("general","version") == "0.1")
doAssert(d.getProperty("author","name") == "xmonader")
doAssert(d.getProperty("author","email") == "notxmonader@gmail.com")
d.setProperty("author", "email", "alsonotxmonader@gmail.com")
doAssert(d.getProperty("author","email") == "alsonotxmonader@gmail.com")
doAssert(d.hasSection("general") == true)
doAssert(d.hasSection("author") == true)
doAssert(d.hasProperty("author", "name") == true)
d.deleteProperty("author", "name")
doAssert(d.hasProperty("author", "name") == false)
echo d.toIniString()
let s = d.getSection("author")
echo $s
Implementation
You can certainly use regular expressions, like Python's configparser does, but we will go for a simpler approach here; also, we want to keep it pure so we don't depend on pcre.
Ini sample
[general]
appname = configparser
version = 0.1
[author]
name = xmonader
email = notxmonader@gmail.com
An INI file consists of one or more sections, and each section consists of one or more key/value pairs separated by =.
Define your data types
import tables, strutils
We will use tables extensively
type Section = ref object
properties: Table[string, string]
The Section type contains a properties table representing the key/value pairs.
proc setProperty*(this: Section, name: string, value: string) =
this.properties[name] = value
To set a property in the underlying properties table.
proc newSection*() : Section =
var s = Section()
s.properties = initTable[string, string]()
return s
To create new Section object
proc `$`*(this: Section) : string =
return "<Section" & $this.properties & " >"
A simple toString proc using the $ operator.
type Ini = ref object
sections: Table[string, Section]
The Ini type represents the whole document and contains a sections table mapping sectionName to Section objects.
proc newIni*() : Ini =
var ini = Ini()
ini.sections = initTable[string, Section]()
return ini
To create new Ini object
proc `$`*(this: Ini) : string =
return "<Ini " & $this.sections & " >"
Define a friendly toString proc using the $ operator.
Define API
proc setSection*(this: Ini, name: string, section: Section) =
this.sections[name] = section
proc getSection*(this: Ini, name: string): Section =
return this.sections.getOrDefault(name)
proc hasSection*(this: Ini, name: string): bool =
return this.sections.contains(name)
proc deleteSection*(this: Ini, name:string) =
this.sections.del(name)
proc sectionsCount*(this: Ini) : int =
echo $this.sections
return len(this.sections)
Some helper procs around Ini objects for manipulating sections.
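A tiny sketch of these section-level helpers in action (assuming the procs above are in scope):
var ini = newIni()
ini.setSection("general", newSection())
doAssert ini.hasSection("general")
doAssert ini.sectionsCount() == 1
ini.deleteSection("general")
doAssert not ini.hasSection("general")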
proc hasProperty*(this: Ini, sectionName: string, key: string): bool=
return this.sections.contains(sectionName) and this.sections[sectionName].properties.contains(key)
proc setProperty*(this: Ini, sectionName: string, key: string, value:string) =
echo $this.sections
if this.sections.contains(sectionName):
this.sections[sectionName].setProperty(key, value)
else:
raise newException(ValueError, "Ini doesn't have section " & sectionName)
proc getProperty*(this: Ini, sectionName: string, key: string) : string =
if this.sections.contains(sectionName):
return this.sections[sectionName].properties.getOrDefault(key)
else:
raise newException(ValueError, "Ini doesn't have section " & sectionName)
proc deleteProperty*(this: Ini, sectionName: string, key: string) =
if this.sections.contains(sectionName) and this.sections[sectionName].properties.contains(key):
this.sections[sectionName].properties.del(key)
else:
raise newException(ValueError, "Ini doesn't have section " & sectionName)
More helpers around properties in the section objects managed by the Ini object.
proc toIniString*(this: Ini, sep:char='=') : string =
var output = ""
for sectName, section in this.sections:
output &= "[" & sectName & "]" & "\n"
for k, v in section.properties:
output &= k & sep & v & "\n"
output &= "\n"
return output
A simple proc toIniString to convert the Nim structures back into an INI text string.
Parse!
OK, here comes the cool part
Parser states
type ParserState = enum
readSection, readKV
Here we have two states
- readSection: when we are supposed to extract section name from the current line
- readKV: when we are supposed to read the line in key value pair mode
ParseIni proc
proc parseIni*(s: string) : Ini =
Here we define a proc parseIni that takes a string s and creates an Ini object.
var ini = newIni()
var state: ParserState = readSection
let lines = s.splitLines
var currentSectionName: string = ""
var currentSection = newSection()
- ini is the object to be returned after parsing
- state is the current parser state (whether it's readSection or readKV)
- lines is the input string split into lines, as we are a line-based parser
- currentSectionName keeps track of what section we are currently in
- currentSection is used to populate ini.sections with a Section object using the setSection proc
for line in lines:
for each line
if line.strip() == "" or line.startsWith(";") or line.startsWith("#"):
continue
We continue if the line is safe to ignore: an empty line, or one starting with ; or #.
if line.startsWith("[") and line.endsWith("]"):
state = readSection
If the line starts with [ and ends with ], then we set the parser state to readSection.
if state == readSection:
currentSectionName = line[1..<line.len-1]
ini.setSection(currentSectionName, currentSection)
state = readKV
continue
If the parser state is readSection:
- extract the section name between [ and ]
- add a section object to the ini under the current section name
- change state to readKV to read key/value pairs
- continue the loop to the next line, as we're done processing the section name
if state == readKV:
let parts = line.split({'='})
if len(parts) == 2:
let key = parts[0].strip()
let val = parts[1].strip()
ini.setProperty(currentSectionName, key, val)
If the state is readKV:
- extract key and val by splitting the line on =
- setProperty under the currentSectionName using key and val
return ini
Here we return the populated ini object.
Day 6: Manage your dotfiles easily with nistow
Today we will create a tool to manage our dotfiles easily.
Dotfiles layout
i3
`-- .config
`-- i3
`-- config
So here we have a directory named i3 at the very top, indicating the APP_NAME, and under it a tree of config paths. Here it means the config file is supposed to be linked at .config/i3/config relative to the destination directory. The home directory is the default destination.
What do we expect?
➜ ~ nistow --help
Stow 0.1.0
-h | --help : show help
-v | --version : show version
--verbose : verbose messages
-s | --simulate : simulate stow operation
-f | --force : override old links
-a | --app : application path to stow
-d | --dest : destination to stow to
- --simulate: flag used to simulate on the filesystem without actually linking
- --app: application directory that's compatible with the dotfiles layout described above
- --dest: destination to symlink files under; defaults to the home dir
nistow --app=/home/striky/wspace/dotfiles/localdir --dest=/tmp/tmpconf --verbose
Implementation
proc writeHelp() =
echo """
Stow 0.1.0 (Manage your dotfiles easily)
Allowed arguments:
-h | --help : show help
-v | --version : show version
--verbose : verbose messages
-s | --simulate : simulate stow operation
-f | --force : override old links
-a | --app : application path to stow
-d | --dest : destination to stow to
"""
writeHelp is a simple proc to write the help string to stdout.
proc writeVersion() =
echo "Stow version 0.1.0"
To write version
proc cli*() =
Entry point for our commandline application.
var
simulate, verbose, force: bool = false
app, dest: string = ""
Variables representing the various options we allow in the application.
if paramCount() == 0:
writeHelp()
quit(0)
If no arguments are passed we write the help string and exit (or quit, in Nim terms) with exit status 0.
for kind, key, val in getopt():
case kind
of cmdLongOption, cmdShortOption:
case key
of "help", "h":
writeHelp()
quit()
of "version", "v":
writeVersion()
quit()
of "simulate", "s": simulate = true
of "verbose": verbose = true
of "force", "f": force = true
of "app", "a": app = val
of "dest", "d": dest = val
else:
discard
else:
discard
Here we parse the commandline arguments using getopt.
for kind, key, val in getopt():
case kind
of cmdLongOption, cmdShortOption:
So for --app=/home/striky/dotfiles/i3 -f:
- kind for --app is cmdLongOption, and for -f it is cmdShortOption
- key for --app is app, and for -f it is f
- val for --app is /home/striky/dotfiles/i3
- val for -f we set to true in our parsing, because it's mainly a boolean switch: if it exists it means we want it set to true
if dest.isNilOrEmpty():
dest = getHomeDir()
Here we set the default dest to the home dir.
if app.isNilOrEmpty():
echo "Make sure to provide --app flags"
quit(1)
Here we exit with error exit status 1 if app isn't set.
try:
stow(getLinkableFiles(appPath=app, dest=dest), simulate=simulate, verbose=verbose, force=force)
except ValueError:
echo "Error happened: " & getCurrentExceptionMsg()
Here we try to stow all the linkable files in the app dir to the dest dir, passing all the options we collected from the command line arguments (simulate, verbose, force), wrapped in a try/except block to show errors to the user.
when isMainModule:
cli()
Invoke our entry point cli if this module is the main module.
OK! Back to stow and getLinkableFiles. We start with getLinkableFiles. Remember the dotfiles hierarchy?
# appPath: application's dotfiles directory
# we expect dir to have the hierarchy.
# i3
# `-- .config
# `-- i3
# `-- config
We want to get all the files in there with their full paths; the link path for each one will be exactly the same, except the appPath part is changed to the dest path.
[/home/striky/wspace/dotfiles/i3]/.config/i3/config -> [/home/striky]/.config/i3/config
__________________appPath________ _____dest____
type
LinkInfo = tuple[original:string, dest:string]
Simple type to represent the original path and where to symlink to
proc getLinkableFiles*(appPath: string, dest: string=expandTilde("~")): seq[LinkInfo] =
# collects the linkable files in a certain app.
# appPath: application's dotfiles directory
# we expect dir to have the hierarchy.
# i3
# `-- .config
# `-- i3
# `-- config
# dest: destination of the link files : default is the home of user.
getLinkableFiles is a proc that takes appPath and dest and returns a seq of LinkInfo containing this transformation for each file:
[/home/striky/wspace/dotfiles/i3]/A_FILE_PATH -> [/home/striky]A_FILE_PATH
__________________apppath________ _____dest____
var appPath = expandTilde(appPath)
if not dirExists(appPath):
raise newException(ValueError, fmt("App path {appPath} doesn't exist."))
var linkables = newSeq[LinkInfo]()
for filepath in walkDirRec(appPath, yieldFilter={pcFile}):
let linkpath = filepath.replace(appPath, dest)
var linkInfo : LinkInfo = (original:filepath, dest:linkpath)
linkables.add(linkInfo)
return linkables
Here, we walk over the appPath dir using walkDirRec and specify in the yieldFilter argument that we're only interested in pcFile ("path component: file"), i.e. entries that are regular files.
proc stow(linkables: seq[LinkInfo], simulate: bool=true, verbose: bool=true, force: bool=false) =
# Creates symoblic links and related directories
# linkables is a list of tuples (filepath, linkpath) : List[Tuple[file_path, link_path]]
# simulate does simulation with no effect on the filesystem: bool
# verbose shows log messages: bool
for linkinfo in linkables:
let (filepath, linkpath) = linkinfo
if verbose:
echo(fmt("Will link {filepath} -> {linkpath}"))
if not simulate:
createDir(parentDir(linkpath))
if not fileExists(linkpath):
createSymlink(filepath, linkpath)
else:
if force:
removeFile(linkpath)
createSymlink(filepath, linkpath)
else:
if verbose:
echo(fmt("Skipping linking {filepath} -> {linkpath}"))
stow is a pretty easy procedure: it takes a list of LinkInfo tuples holding all the information (original filename and destination symlink), does the symlinking unless it's a simulation, and prints messages if verbose is set to true.
Feel free to send improvements to this tutorial or nistow :)
Complete source code available here https://github.com/xmonader/nistow
Day 7: Shorturl service
Today, we will develop a URL shortening service, something like bit.ly.
imports
import jester, asyncdispatch, htmlgen, json, os, strutils, strformat, db_sqlite
- jester: a Sinatra-like web framework
- asyncdispatch: for async/await instructions
- htmlgen: to generate HTML pages
- json: to parse JSON strings into Nim structures and dump JSON structures to strings
- db_sqlite: to work with the sqlite database behind our application
Database connection
# hostname can be something configurable "http://ni.m:5000"
let hostname = "localhost:5000"
var theDb : DbConn
- hostname is the base path used to access our site; it can be configured using the /etc/hosts file or even a reverse proxy like caddy, or in a real-world case you will have a DNS record for your site
- theDb is the connection object to work with the sqlite database
if not fileExists("/tmp/mytest.db"):
theDb = open("/tmp/mytest.db", nil, nil, nil)
theDb.exec(sql("""create table urls (
id INTEGER PRIMARY KEY,
url VARCHAR(255) NOT NULL
)"""
))
else:
theDb = open("/tmp/mytest.db", nil, nil, nil)
- If the database file /tmp/mytest.db doesn't exist we create the urls table; otherwise we just open the connection and do nothing else
Jester and http endpoints
routes:
- jester defines a DSL to work with routes:
METHOD ROUTE_PATH:
##codeblock
- METHOD can be get, post or any HTTP verb
- ROUTE_PATH is the path accessed on the server, for instance /users or /user/52; here 52 is a route parameter when the route is defined like /user/@id
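As a minimal sketch of the DSL's shape (hypothetical endpoints, using the same imports as above):
routes:
  get "/ping":
    resp "pong"
  get "/user/@id":
    resp "user: " & @"id"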
HOME page
Here we handle GET requests on the /home path of our server:
get "/home":
var htmlout = """
<html>
<title>NIM SHORT</title>
<head>
<script
src="https://code.jquery.com/jquery-3.3.1.min.js"
integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8="
crossorigin="anonymous"></script>
<script>
function postData(url, data) {
// Default options are marked with *
return fetch(url, {
body: JSON.stringify(data), // must match 'Content-Type' header
cache: 'no-cache', // *default, no-cache, reload, force-cache, only-if-cached
credentials: 'same-origin', // include, same-origin, *omit
headers: {
'user-agent': 'Mozilla/4.0 MDN Example',
'content-type': 'application/json'
},
method: 'POST', // *GET, POST, PUT, DELETE, etc.
mode: 'cors', // no-cors, cors, *same-origin
redirect: 'follow', // manual, *follow, error
referrer: 'no-referrer', // *client, no-referrer
})
.then(resp => resp.json())
}
$(document).ready(function() {
$('#btnsubmit').on('click', function(e){
e.preventDefault();
postData('/shorten', {url: $("#url").val()})
.then( data => {
let id = data["id"]
$("#output").html(`<a href="%%hostname/${id}">Shortlink: ${id}</a>`);
});
});
});
</script>
</head>
<body>
<div>
<form>
<label>URL</label>
<input type="url" name="url" id="url" />
<button id="btnsubmit" type="button">SHORT!</button
</form>
</div>
<div id="output">
</div>
</body>
</html>
"""
htmlout = htmlout.replace("%%hostname", hostname)
resp htmlout
- Include the jQuery framework
- Create a form in a div tag with one text input to allow the user to enter a URL
- Override form submission to do an AJAX request
- On the button's click event we send a POST request to the /shorten endpoint in the background using the fetch API, and whenever we get a result we parse the JSON data, extract the id from it, and put the new URL in the output div
- resp returns a response to the user, and it can return an HTTP status too
Shorten endpoint
post "/shorten":
let url = parseJson(request.body).getOrDefault("url").getStr()
if not url.isNilOrEmpty():
var id = theDb.getValue(sql"SELECT id FROM urls WHERE url=?", url)
if id.isNilOrEmpty():
id = $theDb.tryInsertId(sql"INSERT INTO urls (url) VALUES (?)", url)
var jsonResp = $(%*{"id": id})
resp Http200, jsonResp
else:
resp Http400, "please specify url in the posted data."
Here we handle POST requests on the /shorten endpoint:
- Get the url from the parsed JSON POST data; please note that POST data is available under request.body, as explained in the previous section
- If a url is passed we check if it's already in our urls table: if it's there we return its id, otherwise we insert it into the table
- If the url isn't passed we return a bad request 400 status code
- parseJson loads JSON from a string, and you can get values using getOrDefault and getStr to get a string value; there's also getBool, and so on
- getValue gets the id from the result of the SELECT statement (it returns the first column of the first row in the result set)
- tryInsertId executes the INSERT statement and returns the id of the new row
- After a successful insertion we return a JSON-serialized string to the user: $(%*{"id": id})
- %* is a macro to convert a Nim structure into a JSON node, and to convert it to a string we wrap $ around it
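For illustration, here is a sketch of exercising the /shorten endpoint from another Nim program (it assumes the server above is running on localhost:5000):
import httpclient, json

var client = newHttpClient()
client.headers = newHttpHeaders({"content-type": "application/json"})
let r = client.post("http://localhost:5000/shorten", body = $(%*{"url": "https://nim-lang.org"}))
echo r.body  # something like {"id":"1"}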
Shorturls redirect
get "/@Id":
let url = theDb.getValue(sql"SELECT url FROM urls WHERE id=?", @"Id")
if url.isNilOrEmpty():
resp Http404, "Don't know that url"
else:
redirect url
- Here we fetch whatever path @Id the user is trying to access (except for /home and /shorten) and we try to get the long URL for that path
- If the path resolves to a URL we redirect the user to it, or else we show an error message
- @"Id" gets the value of the @Id route parameter; notice the @ position in both situations
RUN
runForever()
This starts the jester web server.
Code is available here https://gist.github.com/xmonader/d41a5c9f917eadb90d3025e7b7e748dd
Day 8: minitest
I'm a big fan of Practical Common Lisp, which has a chapter on building a unit test framework using macros. I hadn't had the chance to tinker with Nim macros just yet, so today we will be building almost the same thing in Nim.
So what's up?
Imagine you want to check some expression and print a specific message denoting the expression
doAssert(1==2, "1 == 2 failed")
Here we want to assert that 1==2 holds, or show the message 1 == 2 failed, and it goes on like this for whatever we want to check.
doAssert(1+2==3, "1+2 == 3 failed")
doAssert(5*2==10, "5*2 == 10 failed")
We can already see the boilerplate here, repeating the expression twice one for the check and one for the message itself.
What to expect?
We expect a DSL that removes the boilerplate we're suffering from in the previous section.
check(3==1+2)
check(6+5*2 == 16)
And this will print
3 == 1 + 2 .. passed
6 + 5 * 2 == 16 .. passed
And it should evolve to allow grouping of test checks
check(3==1+2)
check(6+5*2 == 16)
suite "Arith":
check(1+2==3)
check(3+2==5)
suite "Strs":
check("HELLO".toLowerAscii() == "hello")
check("".isNilOrEmpty() == true)
Resulting in something like this
3 == 1 + 2 .. passed
6 + 5 * 2 == 16 .. passed
==================================================
Arith
==================================================
1 + 2 == 3 .. passed
3 + 2 == 5 .. passed
==================================================
Strs
==================================================
"HELLO".toLowerAscii() == "hello" .. passed
"".isNilOrEmpty() == true .. passed
Implementation
So Nim has two ways to do macros:
templates
Templates are like functions that are called at compile time, like a preprocessor.
From the nim manual
template `!=` (a, b: untyped): untyped =
# this definition exists in the System module
not (a == b)
assert(5 != 6) # the compiler rewrites that to: assert(not (5 == 6))
So at compile time 5 != 6 will be converted into not (5 == 6) and the whole expression will be assert(not (5 == 6)).
So what we're going to do is take the passed expression, convert it to a string to be printed in the terminal output, and if the expression fails we append the failed message or any other custom failure message.
template check*(exp:untyped, failureMsg:string="failed", indent:uint=0): void =
let indentationStr = repeat(' ', indent)
let expStr: string = astToStr(exp)
var msg: string
if not exp:
if msg.isNilOrEmpty():
msg = indentationStr & expStr & " .. " & failureMsg
else:
msg = indentationStr & expStr & " .. passed"
echo(msg)
- untyped means the expression doesn't have to have a type yet; imagine passing a variable name that doesn't exist yet, as in defineVar(myVar, 5): here myVar needs to be untyped or the compiler will complain. Check the manual for more info: https://nim-lang.org/docs/manual.html#templates
- astToStr converts the AST exp to a string
- indent is the amount of spaces prefixing the message
Macros
Nim provides us with a way to access the AST at a very low level, for when templates don't cut it.
What we expect is to have a suite macro
suite "Strs":
check("HELLO".toLowerAscii() == "hello")
check("".isNilOrEmpty() == true)
that takes a name for the suite and a bunch of statements.
- Please note there are two kinds of macros, and we're interested in the statement macro here
- A statement macro is a macro that has the colon : operator followed by a bunch of statements
dumpTree
dumpTree is amazing for debugging the AST and printing it in a nice visual way.
dumpTree:
suite "Strs":
check("HELLO".toLowerAscii() == "hello")
Ident ident"suite"
StrLit Strs
StmtList
Call
Ident ident"check"
Infix
Ident ident"=="
Call
DotExpr
StrLit HELLO
Ident ident"toLowerAscii"
StrLit hello
dumpTree says:
- it got an identifier Ident named suite
- suite contains a StrLit node with value Strs
- suite contains a StmtList node
- the first statement in StmtList is a Call statement
- a Call statement consists of the procedure name (check in this case) and the argument list, and so on
macro suite*(name:string, exprs: untyped) : typed =
Here, we define a macro suite that takes a name and a bunch of statements exprs:
- A macro must return an AST; in our case it will be a list of check call statements
- We need the messages to be indented
To achieve the indentation we can either print a tab before calling check, or rewrite check to pass the indent option; we will go with rewriting the check call ASTs.
result = newStmtList()
We will be returning a list of statements, right?
let equline = newCall("repeat", newStrLitNode("="), newIntLitNode(50))
statement node that equals repeat("=", 50)
let writeEquline = newCall("echo", equline)
statement node that equals echo repeat("=", 50)
add(result, writeEquline, newCall("echo", name))
add(result, writeEquline)
this will generate
================
$name
================
Now we iterate over the statements passed to the suite macro and check each one's kind:
for i in 0..<exprs.len:
var exp = exprs[i]
let expKind = exp.kind
case expKind
of nnkCall:
case exp[0].kind
of nnkIdent:
let identName = $exp[0].ident
if identName == "check":
- If we're in a check call we will convert it from check(expr) to check(expr, "", 1)
var checkWithIndent = exp
checkWithIndent.add(newStrLitNode(""))
checkWithIndent.add(newIntLitNode(1))
add(result, checkWithIndent)
Otherwise we add any other statement as-is, unprocessed.
else:
add(result, exp)
else:
discard
return result
Code is available on https://github.com/xmonader/nim-minitest
Day 9: Tic tac toe
Who didn't play Tic-tac-toe with their friends? :)
What to expect
Today we will implement tic tac toe game in Nim, with 2 modes
- Human vs Human
- Human vs AI
Implementation
So, let's get to it. The winner is the first player who manages to get 3 of their marks in the same column, row, or diagonal.
imports
import sequtils, tables, strutils, strformat, random, os, parseopt2
randomize()
Constraints and objects
As the game alternates turns, we should have a way to keep track of the next player.
let NEXT_PLAYER = {"X":"O", "O":"X"}.toTable
Here we use a table to tell us the next player
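A quick check of how the table behaves:
doAssert NEXT_PLAYER["X"] == "O"
doAssert NEXT_PLAYER["O"] == "X"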
Board
type
Board = ref object of RootObj
list: seq[string]
Here we define a simple class representing the board
- list is a sequence representing the cells (maybe cells is a better name)
- please note list is just a flat sequence of elements
0 1 2 3 4 5 6 7 8
but we visualize it as
0 1 2
3 4 5
6 7 8
instead of using a 2D array, for the sake of simplicity
let WINS = @[ @[0,1,2], @[3,4,5], @[6,7,8], @[0, 3, 6], @[1,4,7], @[2,5,8], @[0,4,8], @[2,4,6] ]
We encode the WIN patterns: cells in the same row, the same column, or the same diagonal.
proc newBoard(): Board =
var b = Board()
b.list = @["0", "1", "2", "3", "4", "5", "6", "7", "8"]
return b
This is the initializer of the board; it sets each cell's value to the string representation of its index.
Winning
proc done(this: Board): (bool, string) =
for w in WINS:
if this.list[w[0]] == this.list[w[1]] and this.list[w[1]] == this.list[w[2]]:
if this.list[w[0]] == "X":
return (true, "X")
elif this.list[w[0]] == "O":
return (true, "O")
if all(this.list, proc(x:string):bool = x in @["O", "X"]) == true:
return (true, "tie")
else:
return (false, "going")
Here we check the state of the game and determine the winner: if all the cells of one of the WIN patterns hold the same mark, that mark wins; if the board is full, it's a tie.
proc `$`(this:Board): string =
let rows: seq[seq[string]] = @[this.list[0..2], this.list[3..5], this.list[6..8]]
for row in rows:
for cell in row:
stdout.write(cell & " | ")
echo("\n--------------")
Here we have the string representation of the board so we can show it as a 3x3 grid in a lovely way.
proc emptySpots(this:Board):seq[int] =
var emptyindices = newSeq[int]()
for i in this.list:
if i.isDigit():
emptyindices.add(parseInt(i))
return emptyindices
Here we have a simple helper function that returns the indices of the empty spots (the spots that don't have an X or O in them); remember, all the cells are initialized to the string representation of their indices.
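A quick sketch of emptySpots on a fresh board, then after one move:
var b = newBoard()
doAssert b.emptySpots() == @[0, 1, 2, 3, 4, 5, 6, 7, 8]
b.list[4] = "X"
doAssert b.emptySpots() == @[0, 1, 2, 3, 5, 6, 7, 8]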
Game
type
Game = ref object of RootObj
currentPlayer*: string
board*: Board
aiPlayer*: string
difficulty*: int
proc newGame(aiPlayer:string="", difficulty:int=9): Game =
var
game = new Game
game.board = newBoard()
game.currentPlayer = "X"
game.aiPlayer = aiPlayer
game.difficulty = difficulty
return game
# 0 1 2
# 3 4 5
# 6 7 8
Here we have another object representing the game: the current player, the board, whether it has an AI player, and the difficulty.
- difficulty only matters when there is an AI player; it controls when the AI starts calculating moves and considering scenarios. 9 is the hardest, 0 is the easiest.
proc changePlayer(this:Game) : void =
this.currentPlayer = NEXT_PLAYER[this.currentPlayer]
Simple procedure to switch turns between players
Start the game
proc startGame*(this:Game): void=
while true:
echo this.board
if this.aiPlayer != this.currentPlayer:
stdout.write("Enter move: ")
let move = stdin.readLine()
this.board.list[parseInt($move)] = this.currentPlayer
this.change_player()
let (done, winner) = this.board.done()
if done == true:
echo this.board
if winner == "tie":
echo("TIE")
else:
echo("WINNER IS :", winner )
break
Here, if aiPlayer isn't set, it's just a game with 2 humans switching turns, checking for a winner after each move.
Minmax and AI support
Minmax is an algorithm used to explore the possible future moves, minimizing the losses and maximizing the chances of winning.
- https://www.youtube.com/watch?v=6ELUvkSkCts
- https://www.youtube.com/watch?v=CwziaVrM_vc&t=1199s
type
Move = tuple[score:int, idx:int]
We need a type Move on a certain idx to represent if it's a good/bad move depending on the score
- good means minimizing chances of the human to win or making AI win => high score +10
- bad means maximizing chances of the human to win or making AI lose => low score -10
So let's say we are in this situation
O X X
X 4 5
X O O
And it's AI turn
we have two possible moves (4 or 5)
O X X
X 4 O
X O O
this move (to 5) is clearly wrong, because the human's next move would complete the diagonal (2, 4, 6). So this is a bad move and we give it score -10. Or
O X X
X O 5
X O O
this move (to 4) minimizes the losses (it leads to a TIE instead of letting the human win), so we give it a higher score.
proc getBestMove(this: Game, board: Board, player:string): Move =
let (done, winner) = board.done()
# determine the score of the move by checking where does it lead to a win or loss.
if done == true:
if winner == this.aiPlayer:
return (score:10, idx:0)
elif winner != "tie": #human
return (score:(-10), idx:0)
else:
return (score:0, idx:0)
let empty_spots = board.empty_spots()
var moves = newSeq[Move]()
for idx in empty_spots:
# we calculate more new trees depending on the current situation and see where the upcoming moves lead
var newboard = newBoard()
newboard.list = map(board.list, proc(x:string):string=x)
newboard.list[idx] = player
let score = this.getBestMove(newboard, NEXT_PLAYER[player]).score
let idx = idx
let move = (score:score, idx:idx)
moves.add(move)
if player == this.aiPlayer:
return max(moves)
# var bestScore = -1000
# var bestMove: Move
# for m in moves:
# if m.score > bestScore:
# bestMove = m
# bestScore = m.score
# return bestMove
else:
return min(moves)
# var bestScore = 1000
# var bestMove: Move
# for m in moves:
# if m.score < bestScore:
# bestMove = m
# bestScore = m.score
# return bestMove
Here we have a highly annotated getBestMove procedure that recursively calculates the best move for us.
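One detail worth noting: Nim compares tuples element-wise, so max(moves) and min(moves) above effectively pick by score (idx only breaks ties). A quick sanity check:
let moves = @[(score: -10, idx: 4), (score: 0, idx: 5), (score: 10, idx: 2)]
echo max(moves)   # (score: 10, idx: 2)
echo min(moves)   # (score: -10, idx: 4)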
Now our startGame should look like this
proc startGame*(this:Game): void=
while true:
##old code
## AI check
else:
if this.currentPlayer == this.aiPlayer:
let emptyspots = this.board.emptySpots()
if len(emptyspots) <= this.difficulty:
echo("AI MOVE..")
let move = this.getbestmove(this.board, this.aiPlayer)
this.board.list[move.idx] = this.aiPlayer
else:
echo("RANDOM GUESS")
this.board.list[emptyspots.rand()] = this.aiPlayer
## oldcode
Here we allow the game to use difficulty, which decides when the AI starts calculating moves and building the tree: from the very beginning (9 cells left) or only when, say, 4 cells are left? You can set it however you want, and until the board reaches the difficulty threshold the AI will use random guesses (from the available emptyspots) instead of calculating.
CLI entry
proc writeHelp() =
echo """
TicTacToe 0.1.0 (MinMax version)
Allowed arguments:
-h | --help : show help
-a | --aiplayer : AI player [X or O]
-l | --level : difficulty level (0 easiest .. 9 hardest)
"""
proc cli*() =
var
aiplayer = ""
difficulty = 9
for kind, key, val in getopt():
case kind
of cmdLongOption, cmdShortOption:
case key
of "help", "h":
writeHelp()
# quit()
of "aiplayer", "a":
echo "AIPLAYER: " & val
aiplayer = val
of "level", "l": difficulty = parseInt(val)
else:
discard
else:
discard
let g = newGame(aiPlayer=aiplayer, difficulty=difficulty)
g.startGame()
when isMainModule:
cli()
Code is available on https://github.com/xmonader/nim-tictactoe/blob/master/src/nim_tictactoe_cli.nim
Day 10: Tic tac toe with GUI!!
Hopefully, you're done with day 9 and enjoyed playing tic tac toe.
Expectation
It's fun to play on the command line, but it'd be very cool to have some GUI with some buttons using libui bindings in Nim
- make sure to install it using
nimble install ui
Implementation
In the previous day we reached a good abstraction: the command line interface, the game logic, and the minmax algorithm are separated and not tightly coupled.
minimal ui application
import ui

proc gui*() =
var mainwin = newWindow("tictactoe", 400, 500, true)
show(mainwin)
mainLoop()
when isMainModule:
# cli()
init()
gui()
Here we create a 400x500 window with the title tictactoe, show it, and start its mainLoop, getting ready to receive and dispatch events.
TicTacToe GUI
We can imagine the gui to be something like that
---------------------------------------------
| --------------------------------------- |
+ | INFO LABEL | button to restart | +
| ---------------------------------------| |
+ |--------------------------------------| +
| | btn | btn | btn | |
+ |--------------------------------------| +
| | btn | btn | btn | |
+ |--------------------------------------| +
| | btn | btn | btn | |
+ |--------------------------------------| +
---------------------------------------------
- a window that contains a vertical box
- the vertical box contains 4 rows
- first row to show information about the current game and a button to reset the game
- and the other rows represent the 3x3 tictactoe grid that will reflect the game board's list :)
- and 9 buttons to be pressed to set X or O
- we will support human vs AI, so when the human presses a button it gets disabled, then the AI presses the button that minimizes its loss and that button gets disabled too.
proc gui*() =
var mainwin = newWindow("tictactoe", 400, 500, true)
# game object to contain the state, the players, the difficulty,...
var g = newGame(aiPlayer="O", difficulty=9)
var currentMove = -1
mainwin.margined = true
mainwin.onClosing = (proc (): bool = return true)
# set up the boxes
let box = newVerticalBox(true)
let hbox0 = newHorizontalBox(true)
let hbox1 = newHorizontalBox(true)
let hbox2 = newHorizontalBox(true)
let hbox3 = newHorizontalBox(true)
# list of buttons
var buttons = newSeq[Button]()
# information label
var labelInfo = newLabel("Info: Player X turn")
hbox0.add(labelInfo)
# restart button
hbox0.add(newButton("Restart", proc() =
g = newGame(aiPlayer="O", difficulty=9)
for i, b in buttons.pairs:
b.text = $i
b.enable()))
Here we set up the layout we just described and create a Restart button that resets the game, restores the buttons' text, and enables them all.
# create the buttons
for i in countup(0, 8):
var handler : proc()
closureScope:
let senderId = i
handler = proc() =
currentMove = senderId
g.board.list[senderId] = g.currentPlayer
g.change_player()
labelInfo.text = "Current player: " & g.currentPlayer
for i, v in g.board.list.pairs:
buttons[i].text = v
let (done, winner) = g.board.done()
if done == true:
echo g.board
if winner == "tie":
labelInfo.text = "Tie.."
else:
labelInfo.text = winner & " won."
else:
aiPlay()
buttons[senderId].disable()
buttons.add(newButton($i, handler))
- Here we create the buttons; please notice we are using the closureScope feature to capture the button id so we can keep track of which button is clicked
- after pressing, we set the text of the button to X
- we disable the button so we don't receive any more events.
- switch turns
- update the information label, whether about the next player or the game state
- if the game is still going we ask the AI for a move
# code to run when the game asks the ai to play (after each move from the human..)
proc aiPlay() =
if g.currentPlayer == g.aiPlayer:
let emptySpots = g.board.emptySpots()
if len(emptySpots) <= g.difficulty:
let move = g.getBestMove(g.board, g.aiPlayer)
g.board.list[move.idx] = g.aiPlayer
buttons[move.idx].disable()
else:
let rndmove = emptyspots.rand()
g.board.list[rndmove] = g.aiPlayer
g.change_player()
labelInfo.text = "Current player: " & g.currentPlayer
for i, v in g.board.list.pairs:
buttons[i].text = v
let (done, winner) = g.board.done()
if done == true:
echo g.board
if winner == "tie":
labelInfo.text = "Tie.."
else:
labelInfo.text = winner & " won."
- using the minmax algorithm from the previous day we calculate the best move
- change the button text to O
- disable the button
- update the information label
hbox1.add(buttons[0])
hbox1.add(buttons[1])
hbox1.add(buttons[2])
hbox2.add(buttons[3])
hbox2.add(buttons[4])
hbox2.add(buttons[5])
hbox3.add(buttons[6])
hbox3.add(buttons[7])
hbox3.add(buttons[8])
box.add(hbox0, true)
box.add(hbox1, true)
box.add(hbox2, true)
box.add(hbox3, true)
mainwin.setChild(box)
- Here we add the buttons to their correct rows in the correct columns and set the main widget
show(mainwin)
mainLoop()
when isMainModule:
init()
gui()
Code is available on https://github.com/xmonader/nim-tictactoe/blob/master/src/nim_tictactoe_gui.nim
Day 11 ( Bake applications)
I used to work on an application 2 years ago that was a bit like Ansible: defining recipes to create applications and managing their dependencies.
What to expect
Today we will be doing something very simple to track our dependencies and print the bash commands for each task, like a Makefile does.
HEADERS = program.h headers.h
default: program
program.o: program.c $(HEADERS)
gcc -c program.c -o program.o
program: program.o
gcc program.o -o program
clean:
-rm -f program.o
-rm -f program
Basically, a Makefile consists of
- Variables: like HEADERS=...
- Targets: whatever precedes the :, like clean, program, program.o
- Dependencies: what a target depends on; for instance the program target that generates the executable requires the program.o dependency to be executed first.
Example API usage
Normal usage
var b = initBake()
b.add_task("publish", @["build-release"], "print publish")
b.add_task("build-release", @["nim-installed"], "print exec command to build release mode")
b.add_task("nim-installed", @["curl-installed"], "print curl LINK | bash")
b.add_task("curl-installed", @["apt-installed"], "apt-get install curl")
b.add_task("apt-installed", @[], "code to install apt...")
b.run_task("publish")
OUTPUT:
code to install apt...
apt-get install curl
print curl LINK | bash
print exec command to build release mode
print publish
Circular dependencies
var b = initBake()
b.add_task("publish", @["build-release"], "print publish")
b.add_task("build-release", @["nim-installed"], "print exec command to build release mode")
b.add_task("nim-installed", @["curl-installed"], "print curl LINK | bash")
b.add_task("curl-installed", @["publish", "apt-installed"], "apt-get install curl")
b.add_task("apt-installed", @[], "code to install apt...")
b.run_task("publish")
Output:
Found cycle please fix:@["build-release", "nim-installed", "curl-installed", "publish", "build-release"]
Implementation
Imports
import strformat, strutils, tables, sequtils, algorithm
Graphs
Graphs are a very powerful data structure, used to solve lots of problems like finding the shortest route, and, in our code today, detecting circular dependencies :)
So how do we represent a graph? Well, we will use an adjacency list, as sketched below.
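A minimal sketch of what such an adjacency list looks like for a few of the tasks from the example above (plain Nim tables, nothing Bake-specific yet):

import tables

# each task name maps to the list of task names it depends on
var graph = initTable[string, seq[string]]()
graph["publish"] = @["build-release"]
graph["build-release"] = @["nim-installed"]
graph["nim-installed"] = newSeq[string]()   # no dependencies

for task, deps in graph:
  echo task, " -> ", deps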
Objects
type Task = object
requires*: seq[string]
actions*: string
name*: string
proc `$`(this: Task): string =
return fmt("Task {this.name} Requirements: {this.requires} , actions {this.actions}")
The Task object represents a target in Makefile language; it has a name, actions code, and a list of dependencies.
type Bake = ref object
tasksgraph* : Table[string, seq[string]]
tasks* : Table[string, Task]
The Bake object has tasksgraph, an adjacency list representing the tasks and their dependencies, and a tasks table that maps a task name to its Task object.
Adding a task
proc addTask*(this: Bake, taskname: string, deps: seq[string], actions:string) : void =
var t = Task(name:taskname, requires:deps, actions:actions)
this.tasksgraph[taskname] = deps
this.tasks[taskname] = t
- We update the adjacency list with (taskname and its dependencies)
- Add task object to tasks Table with key task name
Running tasks
proc runTask*(this: Bake, taskname: string): void =
# CODE OMITTED FOR FINDING CYCLES..
var deps = newSeq[string]()
var seen = newSeq[string]()
this.runTaskHelper(taskname, deps, seen)
for tsk in deps:
let t = this.tasks.getOrDefault(tsk)
echo(t.actions)
- Before running a task we should check if it has a cycle first.
- Keep track of the dependencies and the tasks seen so far, so we don't run seen tasks again (for instance, if target install-wget and target install-curl both require target apt-get update, we want to run apt-get update only once).
for example
code to install apt...
apt-get install curl
print curl LINK | bash
print exec command to build release mode
print publish
- Call the runTaskHelper procedure to walk through all the tasks and their dependencies and build a list of deps; each call updates the deps variable, since we pass it by reference (a var parameter).
- After getting the dependency tasks correctly ordered, we execute them; in our case we will just echo the actions property.
And now to runTaskHelper, which basically updates our dependencies list and puts the task execution in order
proc runTaskHelper(this: Bake, taskname: string, deps: var seq[string], seen: var seq[string]) : void =
if taskname in seen:
echo fmt"[+] Solved {taskname} before, no need to repeat action"
return
var tsk = this.tasks.getOrDefault(taskname)
seen.add(taskname)
if len(tsk.requires) > 0:
for c in this.tasksgraph[tsk.name]:
this.runTaskHelper(c, deps, seen)
deps.add(taskname)
Detecting cycles
To detect a cycle we use DFS (depth first search), basically going from one node as deep as we can for each of its neighbours, combined with graph coloring. Youtube Lecture
Explanation from geeksforgeeks
WHITE : Vertex is not processed yet. Initially
all vertices are WHITE.
GRAY : Vertex is being processed (DFS for this
vertex has started, but not finished, which means
that all descendants (in the DFS tree) of this vertex
are not processed yet, or this vertex is in the function
call stack).
BLACK : Vertex and all its descendants are
processed.
While doing DFS, if we encounter an edge from current
vertex to a GRAY vertex, then this edge is back edge
and hence there is a cycle.
OK, back to nim
1- Defining colors
type NodeColor = enum
ncWhite, ncGray, ncBlack
2- Graph has Cycle
proc graphHasCycle(graph: Table[string, seq[string]]): (bool, Table[string, string]) =
var colors = initTable[string, NodeColor]()
for node, deps in graph:
colors[node] = ncWhite
var parentMap = initTable[string, string]()
var hasCycle = false
for node, deps in graph:
parentMap[node] = "null"
if colors[node] == ncWhite:
hasCycleDFS(graph, node, colors, hasCycle, parentMap)
if hasCycle:
return (true, parentMap)
return (false, parentMap)
3- Depth First Function
proc hasCycleDFS(graph:Table[string, seq[string]] , node: string, colors: var Table[string, NodeColor], has_cycle: var bool, parentMap: var Table[string, string]) =
if hasCycle:
return
colors[node] = ncGray
for dep in graph[node]:
parentMap[dep] = node
if colors[dep] == ncGray:
hasCycle = true
parentMap["__CYCLESTART__"] = dep
return
if colors[dep] == ncWhite:
hasCycleDFS(graph, dep, colors, hasCycle, parentMap)
colors[node] = ncBlack
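A quick, hypothetical check of the two procs above (note they have to be compiled with hasCycleDFS declared before graphHasCycle, since the latter calls it; tables is already imported at the top of this day):

# build a graph with an obvious cycle: a -> b -> c -> a
var g = initTable[string, seq[string]]()
g["a"] = @["b"]
g["b"] = @["c"]
g["c"] = @["a"]

let (cyclic, parents) = graphHasCycle(g)
echo cyclic                      # true
echo parents["__CYCLESTART__"]   # the node where the back edge was detected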
What's next?
- support for variables
- recipes maybe using yaml file
- modules like ansible?
Day 12: Implementing Redis Protocol
Today we will implement RESP (REdis Serialization Protocol) in Nim. Hopefully you read Day 2 on bencode data format (encoding/parsing) because we will be using the same techniques.
RESP
From redis protocol page.
Redis clients communicate with the Redis server using a protocol called RESP (REdis Serialization Protocol). While the protocol was designed specifically for Redis, it can be used for other client-server software projects.
RESP is a compromise between the following things:
Simple to implement.
Fast to parse.
Human readable.
RESP can serialize different data types like integers, strings, arrays. There is also a specific type for errors. Requests are sent from the client to the Redis server as arrays of strings representing the arguments of the command to execute. Redis replies with a command-specific data type.
So, basically we have 5 types (ints, strings, bulkstrings, errors, arrays)
What do we expect?
- able to decode strings into Reasonable structures in Nim
echo decodeString("*3\r\n:1\r\n:2\r\n:3\r\n\r\n")
# # @[1, 2, 3]
echo decodeString("+Hello, World\r\n")
# # Hello, World
echo decodeString("-Not found\r\n")
# # Not found
echo decodeString(":1512\r\n")
# # 1512
echo $decodeString("$32\r\nHello, World THIS IS REALLY NICE\r\n")
# Hello, World THIS IS REALLY NICE
echo decodeString("*2\r\n+Hello World\r\n:23\r\n")
# @[Hello World, 23]
echo decodeString("*2\r\n*3\r\n:1\r\n:2\r\n:3\r\n\r\n*5\r\n:5\r\n:7\r\n+Hello Word\r\n-Err\r\n$6\r\nfoobar\r\n")
# @[@[1, 2, 3], @[5, 7, Hello Word, Err, foobar]]
echo $decodeString("*4\r\n:51231\r\n$3\r\nfoo\r\n$-1\r\n$3\r\nbar\r\n")
# @[51231, foo, , bar]
- able to encode Nim structures representing Redis values into RESP
echo $encodeValue(RedisValue(kind:vkStr, s:"Hello, World"))
# # +Hello, World
echo $encodeValue(RedisValue(kind:vkInt, i:341))
# # :341
echo $encodeValue(RedisValue(kind:vkError, err:"Not found"))
# # -Not found
echo $encodeValue(RedisValue(kind:vkArray, l: @[RedisValue(kind:vkStr, s:"Hello World"), RedisValue(kind:vkInt, i:23)] ))
# #*2
# #+Hello World
# #:23
echo $encodeValue(RedisValue(kind:vkBulkStr, bs:"Hello, World THIS IS REALLY NICE"))
# #$32
# # Hello, World THIS IS REALLY NICE
Implementation
Imports and constants
Let's start with the main imports
import strformat, strutils, sequtils
const CRLF = "\r\n"
const REDISNIL = "\0\0"
- CRLF (\r\n) is really important because lots of the protocol depends on that separator
- REDISNIL (\0\0) is used to represent Nil values
Data types
Again, as in the Bencode chapter, we will define a variant RedisValue that represents all the redis data types: strings, errors, bulkstrings, ints, arrays
type
ValueKind = enum
vkStr, vkError, vkInt, vkBulkStr, vkArray
RedisValue* = ref object
case kind*: ValueKind
of vkStr: s*: string
of vkError : err*: string
of vkInt: i*: int
of vkBulkStr: bs*: string
of vkArray: l*: seq[RedisValue]
Let's add the $, hash, and == procedures
import hashes
proc `$`*(obj: RedisValue): string =
result = case obj.kind
of vkStr : obj.s
of vkBulkStr: obj.bs
of vkInt : $obj.i
of vkArray: $obj.l
of vkError: obj.err
proc hash*(obj: RedisValue): Hash =
result = case obj.kind
of vkStr : !$(hash(obj.s))
of vkBulkStr: !$(hash(obj.bs))
of vkInt : !$(hash(obj.i))
of vkArray: !$(hash(obj.l))
of vkError: !$(hash(obj.err))
proc `==`* (a, b: RedisValue): bool =
## Check two nodes for equality
if a.isNil:
result = b.isNil
elif b.isNil or a.kind != b.kind:
result = false
else:
case a.kind
of vkStr:
result = a.s == b.s
of vkBulkStr:
result = a.bs == b.bs
of vkInt:
result = a.i == b.i
of vkArray:
result = a.l == b.l
of vkError:
result = a.err == b.err
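A quick sanity check of the three procs above:

let a = RedisValue(kind: vkStr, s: "Hello")
let b = RedisValue(kind: vkStr, s: "Hello")
echo $a                   # Hello
echo a == b               # true
echo hash(a) == hash(b)   # true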
Encoder
Encoding is just converting the variant RedisValue
to the correct representation according to RESP
Encode simple strings
To encode simple strings, the spec says OK should be encoded as +OK\r\n
proc encodeStr(v: RedisValue) : string =
return fmt"+{v.s}{CRLF}"
Encode Errors
To encode errors we should precede it with -
and end it with \r\n
. So Notfound
should be encoded as -Notfound\r\n
proc encodeErr(v: RedisValue) : string =
return fmt"-{v.err}{CRLF}"
Encode Ints
Ints are encoded :NUM\r\n
so 95 is :95\r\n
proc encodeInt(v: RedisValue) : string =
return fmt":{v.i}{CRLF}"
Encode Bulkstrings
From RESP page
Bulk Strings are used in order to represent a single binary safe string up to 512 MB in length.
Bulk Strings are encoded in the following way:
A "$" byte followed by the number of bytes composing the string (a prefixed length), terminated by CRLF.
The actual string data.
A final CRLF.
So the string "foobar" is encoded as follows:
"$6\r\nfoobar\r\n"
An empty string is just:
"$0\r\n\r\n"
RESP Bulk Strings can also be used in order to signal non-existence of a value using a special format that is used to represent a Null value. In this special format the length is -1, and there is no data, so a Null is represented as:
"$-1\r\n"
proc encodeBulkStr(v: RedisValue) : string =
return fmt"${v.bs.len}{CRLF}{v.bs}{CRLF}"
Encode Arrays
To encode an array we do *
followed by array length then \r\n
then encode each element then end the array encoding with \r\n
- As we are calling encode we should forward declare it
proc encode*(v: RedisValue) : string
proc encodeArray(v: RedisValue): string =
var res = "*" & $len(v.l) & CRLF
for el in v.l:
res &= encode(el)
res &= CRLF
return res
So for instance to encode encodeValue(RedisValue(kind:vkArray, l: @[RedisValue(kind:vkStr, s:"Hello World"), RedisValue(kind:vkInt, i:23)] ))
The result should be
*2\r\n
+Hello World\r\n
:23\r\n
\r\n
Encode any data type
Here we switch on the passed variant and dispatch the encoding to the reasonable encoder.
proc encode*(v: RedisValue) : string =
case v.kind
of vkStr: return encodeStr(v)
of vkInt: return encodeInt(v)
of vkError: return encodeErr(v)
of vkBulkStr: return encodeBulkStr(v)
of vkArray: return encodeArray(v)
Decoder
Decoding is converting the RESP representation into the correct Nim RedisValue structures, basically the reverse of what we did in the previous section.
Please note: the basic strategy is returning the RedisValue and the length of the processed characters.
Decode simple string
proc decodeStr(s: string): (RedisValue, int) =
let crlfpos = s.find(CRLF)
return (RedisValue(kind:vkStr, s:s[1..crlfpos-1]), crlfpos+len(CRLF))
So, Here we are creating RedisValue of kind vkStr
of the string between +
and \r\n
Decode errors
proc decodeError(s: string): (RedisValue, int) =
let crlfpos = s.find(CRLF)
return (RedisValue(kind:vkError, err:s[1..crlfpos-1]), crlfpos+len(CRLF))
Here we are creating RedisValue of kind vkError
of the string between -
and \r\n
Decode ints
Nums as we said are the values between :
and \r\n
so we parseInt
of the characters between :
and \r\n
and create RedisValue of kind vkInt
with that parsed int.
proc decodeInt(s: string): (RedisValue, int) =
var i: int
let crlfpos = s.find(CRLF)
let sInt = s[1..crlfpos-1]
if sInt.isDigit():
i = parseInt(sInt)
return (RedisValue(kind:vkInt, i:i), crlfpos+len(CRLF))
Decode bulkstrings
Bulkstrings are between $
followed by the string length and \r\n
- string length == 0: empty string
- string length == -1: nil
- string length > 0: string with data
proc decodeBulkStr(s:string): (RedisValue, int) =
let crlfpos = s.find(CRLF)
var bulklen = 0
let slen = s[1..crlfpos-1]
bulklen = parseInt(slen)
var bulk: string
if bulklen == -1:
bulk = nil
return (RedisValue(kind:vkBulkStr, bs:REDISNIL), crlfpos+len(CRLF))
else:
let nextcrlf = s.find(CRLF, crlfpos+len(CRLF))
bulk = s[crlfpos+len(CRLF)..nextcrlf-1]
return (RedisValue(kind:vkBulkStr, bs:bulk), nextcrlf+len(CRLF))
Decode arrays
The trickiest part is decoding arrays:
- first we need to get the length between * and \r\n
- then decode objects array length times, and add them to arr
- As we are calling decode we should forward declare it
proc decode(s: string): (RedisValue, int)
proc decodeArray(s: string): (RedisValue, int) =
var arr = newSeq[RedisValue]()
var arrlen = 0
var crlfpos = s.find(CRLF)
var arrlenStr = s[1..crlfpos-1]
if arrlenStr.isDigit():
arrlen = parseInt(arrlenStr)
var nextobjpos = s.find(CRLF)+len(CRLF)
var i = nextobjpos
if arrlen == -1:
return (RedisValue(kind:vkArray, l:arr), i)
while i < len(s) and len(arr) < arrlen:
var pair = decode(s[i..len(s)])
var obj = pair[0]
arr.add(obj)
i += pair[1]
return (RedisValue(kind:vkArray, l:arr), i+len(CRLF))
So this RESP
*2\r\n
+Hello World\r\n
:23\r\n
\r\n
Should be decoded to RedisValue(kind:vkArray, l: @[RedisValue(kind:vkStr, s:"Hello World"), RedisValue(kind:vkInt, i:23)] )
Decode any object
Based on the first character we dispatch to the correct decoder, then we skip the processed count in the string to decode the next object.
proc decode(s: string): (RedisValue, int) =
var i = 0
while i < len(s):
var curchar = $s[i]
if curchar == "+":
var pair = decodeStr(s[i..s.find(CRLF, i)+len(CRLF)])
var obj = pair[0]
var count = pair[1]
i += count
return (obj, i)
elif curchar == "-":
var pair = decodeError(s[i..s.find(CRLF, i)+len(CRLF)])
var obj = pair[0]
var count = pair[1]
i += count
return (obj, i)
elif curchar == "$":
var pair = decodeBulkStr(s[i..len(s)])
var obj = pair[0]
var count = pair[1]
i += count
return (obj, i)
elif curchar == ":":
var pair = decodeInt(s[i..s.find(CRLF, i)+len(CRLF)])
var obj = pair[0]
var count = pair[1]
i += count
return (obj, i)
elif curchar == "*":
var pair = decodeArray(s[i..len(s)])
let obj = pair[0]
let count = pair[1]
i += count
return (obj, i)
else:
echo fmt"Unrecognized char {curchar}"
break
Preparing commands
In redis, commands are sent as List of RedisValues
so GET USER
is converted to *2\r\n$3\r\nGET\r\n$4\r\nUSER\r\n\r\n
proc prepareCommand*(this: Redis, command: string, args:seq[string]): string =
let cmdArgs = concat(@[command], args)
var cmdAsRedisValues = newSeq[RedisValue]()
for cmd in cmdArgs:
cmdAsRedisValues.add(RedisValue(kind:vkBulkStr, bs:cmd))
var arr = RedisValue(kind:vkArray, l: cmdAsRedisValues)
return encode(arr)
nim-resp
This day is based on the nim-resp project, an on-going effort to create a redis client in Nim; it supports the pipelining feature and all of the previous code. Feel free to send PRs or open issues
Day 13: Implementing Redis Client
Today we will implement a redis client for Nim. This requires reading Day 12, where we created the redis parser.
Redisclient
We want to create a client to communicate with redis servers.
As library designers we should keep in mind how people are going to use our library, especially if it's doing IO operations; we need to decide what kind of APIs we are going to support (blocking or non-blocking ones), or whether we should duplicate the functionality for both interfaces. Luckily for us, Nim is pretty neat when it comes to providing both sync and async interfaces for your library.
What do we expect?
- Sync APIs: blocking APIs
let con = open("localhost", 6379.Port)
echo $con.execCommand("PING", @[])
echo $con.execCommand("SET", @["auser", "avalue"])
echo $con.execCommand("GET", @["auser"])
echo $con.execCommand("SCAN", @["0"])
- Async APIs: Nonblocking APIs around
async/await
let con = await openAsync("localhost", 6379.Port)
echo await con.execCommand("PING", @[])
echo await con.execCommand("SET", @["auser", "avalue"])
echo await con.execCommand("GET", @["auser"])
echo await con.execCommand("SCAN", @["0"])
echo await con.execCommand("SET", @["auser", "avalue"])
echo await con.execCommand("GET", @["auser"])
echo await con.execCommand("SCAN", @["0"])
await con.enqueueCommand("PING", @[])
await con.enqueueCommand("PING", @[])
await con.enqueueCommand("PING", @[])
echo await con.commitCommands()
- Pipelining
con.enqueueCommand("PING", @[])
con.enqueueCommand("PING", @[])
con.enqueueCommand("PING", @[])
echo $con.commitCommands()
Implementation
Imports and constants
Let's starts with main imports
import redisparser, strformat, tables, json, strutils, sequtils, hashes, net, asyncdispatch, asyncnet, os, strutils, parseutils, deques, options, net
Mainly:
- redisparser because we will be manipulating redis values, so let's not decouple the parsing from the transport
- asyncnet, asyncdispatch for the async sockets APIs
- net for SSL and the blocking APIs
Data types
Thinking of the expected APIs we talked about earlier we have some sort of client that has exactly the same operations with different blocking policies, so we can abstract it a bit
type
RedisBase[TSocket] = ref object of RootObj
socket: TSocket
connected: bool
timeout*: int
pipeline*: seq[RedisValue]
A base class parameterized on TSocket that has:
- socket: a socket object that can be the blocking net.Socket or the nonblocking asyncnet.AsyncSocket
- connected: a flag to indicate the connection status
- timeout: to time out (raise TimeoutError) after a certain amount of seconds
Redis* = ref object of RedisBase[net.Socket]
Here we say Redis
is a sub type of RedisBase
and the type of transport socket we are using is the blocking net.Socket
AsyncRedis* = ref object of RedisBase[asyncnet.AsyncSocket]
Same, but here we say the socket we use is non blocking of type asyncnet.AsyncSocket
Opening Connection
proc open*(host = "localhost", port = 6379.Port, ssl=false, timeout=0): Redis =
result = Redis(
socket: newSocket(buffered = true),
)
result.pipeline = @[]
result.timeout = timeout
## .. code omitted for supporting SSL
result.socket.connect(host, port)
result.connected = true
Here we define the open proc, the entry point for getting a sync redis client (Redis). We do some initialization regarding the endpoint and the timeout and set that on our new Redis object.
proc openAsync*(host = "localhost", port = 6379.Port, ssl=false, timeout=0): Future[AsyncRedis] {.async.} =
## Open an asynchronous connection to a redis server.
result = AsyncRedis(
socket: newAsyncSocket(buffered = true),
)
## .. code omitted for supporting SSL
result.pipeline = @[]
result.timeout = timeout
await result.socket.connect(host, port)
result.connected = true
Exactly the same thing for openAsync, but instead of returning Redis we return a Future of a potential AsyncRedis object.
Executing commands
Our APIs will be created around the execCommand proc, which sends a command with its arguments, formatted with the redis protocol (using the redisparser library), to the server over our socket, and then reads a complete parsable RedisValue back to the user (using the readForm proc).
- Sync version
proc execCommand*(this: Redis, command: string, args:seq[string]): RedisValue =
let cmdArgs = concat(@[command], args)
var cmdAsRedisValues = newSeq[RedisValue]()
for cmd in cmdArgs:
cmdAsRedisValues.add(RedisValue(kind:vkBulkStr, bs:cmd))
var arr = RedisValue(kind:vkArray, l: cmdAsRedisValues)
this.socket.send(encode(arr))
let form = this.readForm()
let val = decodeString(form)
return val
- Async version
proc execCommandAsync*(this: AsyncRedis, command: string, args:seq[string]): Future[RedisValue] {.async.} =
let cmdArgs = concat(@[command], args)
var cmdAsRedisValues = newSeq[RedisValue]()
for cmd in cmdArgs:
cmdAsRedisValues.add(RedisValue(kind:vkBulkStr, bs:cmd))
var arr = RedisValue(kind:vkArray, l: cmdAsRedisValues)
await this.socket.send(encode(arr))
let form = await this.readForm()
let val = decodeString(form)
return val
It'd be very annoying to provide duplicate procs for every single API: get and asyncGet ...etc.
Multisync FTW!
Nim provides a very neat feature, the multisync pragma, that allows us to use the async definition in sync scopes too.
Here are the details from the Nim documentation:
Macro which processes async procedures into both asynchronous and synchronous procedures. The generated async procedures use the
async
macro, whereas the generated synchronous procedures simply strip off theawait
calls.
proc execCommand*(this: Redis|AsyncRedis, command: string, args:seq[string]): Future[RedisValue] {.multisync.} =
let cmdArgs = concat(@[command], args)
var cmdAsRedisValues = newSeq[RedisValue]()
for cmd in cmdArgs:
cmdAsRedisValues.add(RedisValue(kind:vkBulkStr, bs:cmd))
var arr = RedisValue(kind:vkArray, l: cmdAsRedisValues)
await this.socket.send(encode(arr))
let form = await this.readForm()
let val = decodeString(form)
return val
Readers
readForm is the other main proc in our client. readForm is responsible for reading just enough bytes from the socket to get a complete RedisValue object.
readMany
As the redis protocol encodes information about the values' lengths, we can totally make use of that, so let's build a primitive readMany that reads X bytes from the socket.
proc readMany(this:Redis|AsyncRedis, count:int=1): Future[string] {.multisync.} =
if count == 0:
return ""
let data = await this.receiveManaged(count)
return data
Here again, to make sure our code works for both sync and async usages we use multisync. If the requested count is 0 we return an empty string without touching the socket; otherwise we delegate to the receiveManaged proc.
receiveManaged
A slightly more detailed version of how we read the data from the socket (it could be combined into the readMany proc code):
proc receiveManaged*(this:Redis|AsyncRedis, size=1): Future[string] {.multisync.} =
result = newString(size)
when this is Redis:
if this.timeout == 0:
discard this.socket.recv(result, size)
else:
discard this.socket.recv(result, size, this.timeout)
else:
discard await this.socket.recvInto(addr result[0], size)
return result
We check the type of the this object using the when/is combo to dispatch to the correct implementation (sync or async), with or without timeouts.
- recv has multiple overloads; one of them takes a timeout (this.timeout) if the user wants to time out after a while
- recvInto is the async version and doesn't support timeouts
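A tiny standalone illustration of this kind of when/is compile-time dispatch (the types A and B here are made up for the example, not part of the client):

type
  A = object
  B = object

proc describe(x: A | B): string =
  # the branch is chosen at compile time for each instantiation
  when x is A:
    "got an A"
  else:
    "got a B"

echo describe(A())   # got an A
echo describe(B())   # got a B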
readForm
readForm is used to retrieve a complete RedisValue from the server using the primitives we provided, like readMany or receiveManaged.
Remember how we decode strings into RedisValue objects?
echo decodeString("*3\r\n:1\r\n:2\r\n:3\r\n\r\n")
# # @[1, 2, 3]
echo decodeString("+Hello, World\r\n")
# # Hello, World
echo decodeString("-Not found\r\n")
# # Not found
echo decodeString(":1512\r\n")
# # 1512
echo $decodeString("$32\r\nHello, World THIS IS REALLY NICE\r\n")
# Hello, World THIS IS REALLY NICE
echo decodeString("*2\r\n+Hello World\r\n:23\r\n")
# @[Hello World, 23]
echo decodeString("*2\r\n*3\r\n:1\r\n:2\r\n:3\r\n\r\n*5\r\n:5\r\n:7\r\n+Hello Word\r\n-Err\r\n$6\r\nfoobar\r\n")
# @[@[1, 2, 3], @[5, 7, Hello Word, Err, foobar]]
echo $decodeString("*4\r\n:51231\r\n$3\r\nfoo\r\n$-1\r\n$3\r\nbar\r\n")
# @[51231, foo, , bar]
We will be doing exactly the same, but the only tricky part is we are reading from a socket and we can't move freely forward/backward without consuming data.
The way we were decoding strings into RedisValues was by peeking at the first character to see what type we are decoding: simple string, bulkstring, error, int, or array.
proc readForm(this:Redis|AsyncRedis): Future[string] {.multisync.} =
var form = ""
## code responsible of reading a complete parsable string representing RedisValue from the socket
return form
- Setup the loop
while true:
let b = await this.receiveManaged()
form &= b
## ...
As long as we aren't done reading a complete form yet, we read just 1 byte and append it to the form string we will be returning (in the beginning that byte can be one of +, -, :, $, *).
- Simple String
if b == "+":
form &= await this.readStream(CRLF)
return form
If the character we are peeking at is + we read until we consume the \r\n (CRLF, from the redisparser library), because strings in the redis protocol are contained between + and CRLF. But wait! what's readStream?
It's a small proc we need in order to consume bytes from the socket until we reach (and consume) a certain character sequence.
proc readStream(this:Redis|AsyncRedis, breakAfter:string): Future[string] {.multisync.} =
var data = ""
while true:
if data.endsWith(breakAfter):
break
let strRead = await this.receiveManaged()
data &= strRead
return data
- Errors
elif b == "-":
form &= await this.readStream(CRLF)
return form
Exactly the same as Simple strings
but we check on -
instead of +
- Ints
elif b == ":":
form &= await this.readStream(CRLF)
return form
Same, serialized between :
and CRLF
- Bulkstrings
elif b == "$":
let bulklenstr = await this.readStream(CRLF)
let bulklenI = parseInt(bulklenstr.strip())
form &= bulklenstr
if bulklenI == -1:
form &= CRLF
else:
form &= await this.readMany(bulklenI)
form &= await this.readStream(CRLF)
return form
From RESP page
Bulk Strings are used in order to represent a single binary safe string up to 512 MB in length.
Bulk Strings are encoded in the following way:
A "$" byte followed by the number of bytes composing the string (a prefixed length), terminated by CRLF.
The actual string data.
A final CRLF.
So the string "foobar" is encoded as follows:
"$6\r\nfoobar\r\n"
An empty string is just:
"$0\r\n\r\n"
RESP Bulk Strings can also be used in order to signal non-existence of a value using a special format that is used to represent a Null value. In this special format the length is -1, and there is no data, so a Null is represented as:
"$-1\r\n"
So we can have:
1- 0 for empty strings ($0\r\n\r\n): read from $ until we consume the CRLF, then the final CRLF
2- a number of bytes to read: read from $, then read N bytes, then consume the CRLF
3- -1 for nils ($-1\r\n): read from $ until we consume the CRLF
- Arrays
elif b == "*":
let lenstr = await this.readStream(CRLF)
form &= lenstr
let lenstrAsI = parseInt(lenstr.strip())
for i in countup(1, lenstrAsI):
form &= await this.readForm()
return form
Arrays can be a bit tricky. To encode an array we write * followed by the array length, then \r\n, then each encoded element, and we end the array encoding with \r\n.
As arrays encode their length, we know how many inner forms (items) we need to read from the socket while reading the array.
Pipelining
From redis pipelining page
A Request/Response server can be implemented so that it is able to process new requests even if the client didn't already read the old responses. This way it is possible to send multiple commands to the server without waiting for the replies at all, and finally read the replies in a single step.
This is called pipelining, and is a technique widely in use since many decades. For instance many POP3 protocol implementations already supported this feature, dramatically speeding up the process of downloading new emails from the server.
Redis supports pipelining since the very early days, so whatever version you are running, you can use pipelining with Redis. This is an example using the raw netcat utility:
$ (printf "PING\r\nPING\r\nPING\r\n"; sleep 1) | nc localhost 6379
+PONG
+PONG
+PONG
So the idea is: we maintain a sequence of commands to be executed (enqueueCommand), send them all at once (commitCommands), and reset the pipeline sequence afterwards.
proc enqueueCommand*(this:Redis|AsyncRedis, command:string, args: seq[string]): Future[void] {.multisync.} =
let cmdArgs = concat(@[command], args)
var cmdAsRedisValues = newSeq[RedisValue]()
for cmd in cmdArgs:
cmdAsRedisValues.add(RedisValue(kind:vkBulkStr, bs:cmd))
var arr = RedisValue(kind:vkArray, l: cmdAsRedisValues)
this.pipeline.add(arr)
proc commitCommands*(this:Redis|AsyncRedis) : Future[RedisValue] {.multisync.} =
for cmd in this.pipeline:
await this.socket.send(cmd.encode())
var responses = newSeq[RedisValue]()
for i in countup(0, len(this.pipeline)-1):
responses.add(decodeString(await this.readForm()))
this.pipeline = @[]
return RedisValue(kind:vkArray, l:responses)
Higher level APIs
These are basically procs around the execCommand proc, and by using the multisync pragma you can have them enabled for both sync and async execution.
proc del*(this: Redis | AsyncRedis, keys: seq[string]): Future[RedisValue] {.multisync.} =
## Delete a key or multiple keys
return await this.execCommand("DEL", keys)
proc exists*(this: Redis | AsyncRedis, key: string): Future[bool] {.multisync.} =
## Determine if a key exists
let val = await this.execCommand("EXISTS", @[key])
result = val.i == 1
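As a sketch of how more wrappers would look (SET and GET are real Redis commands, but these particular proc names are just an illustration here, not necessarily what nim-redisclient ships):

proc setValue*(this: Redis | AsyncRedis, key, value: string): Future[RedisValue] {.multisync.} =
  ## Set a key to hold a string value
  return await this.execCommand("SET", @[key, value])

proc getValue*(this: Redis | AsyncRedis, key: string): Future[RedisValue] {.multisync.} =
  ## Get the value of a key
  return await this.execCommand("GET", @[key])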
nim-redisclient
This day is based on the nim-redisclient project, which uses some higher level API code from Nim/redis. Feel free to send PRs or open issues
Day 14: Nim Assets (bundle your assets into single binary)
Today we will implement the nimassets project, heavily inspired by go-bindata.
nimassets
Typically while developing projects we have assets like icons, images, template files, css, javascript ..etc, and it can be annoying to distribute them with your application, or even risk losing them, misconfiguring paths, or messing up the packaging script; so packaging all of them into the same binary is an interesting option to have. These concerns were the reason to have something like go-bindata or the Qt resource system.
What do we expect?
- Having a single binary with the actual resources embedded in the executable.
- Generating a nim file out of the resources we want to bundle. Maybe something like nimassets -d=templatesdir -o=assetsfile.nim
- Easy access to these bundled resources using the getAsset proc
import assetsfile
echo assetsfile.getAsset("templatesdir/index.html")
The plan
So from a very high level:
[ Resource1 ]
[ Resource2 ] -> converter (nimassets) -> [Nim file Representing the resources list]
[ Resource3 ]
The generated file should look like
import os, tables, strformat, base64, ospaths
var assets = initTable[string, string]()
proc getAsset*(path: string): string =
result = assets[path].decode()
assets[RESOURCE1_PATH] = BASE64_ENCODE(RESOURCE1_CONTENT)
assets[RESOURCE2_PATH] = BASE64_ENCODE(RESOURCE2_CONTENT)
assets[RESOURCE3_PATH] = BASE64_ENCODE(RESOURCE3_CONTENT)
...
...
...
...
- We store the resource path and its base64 encoded content in the assets table
- We will expose 1 proc, getAsset, that takes a path and returns the content by decoding the base64 content
Implementation
Let's take a top-down approach for the implementation.
Command line arguments
const buildBranchName* = staticExec("git rev-parse --abbrev-ref HEAD") ## \
const buildCommit* = staticExec("git rev-parse HEAD") ## \
# const latestTag* = staticExec("git describe --abbrev=0 --tags") ## \
const versionString* = fmt"0.1.0 ({buildBranchName}/{buildCommit})"
proc writeHelp() =
echo fmt"""
nimassets {versionString} (Bundle your assets into nim file)
-h | --help : show help
-v | --version : show version
-o | --output : output filename
-f | --fast : faster generation
-d | --dir : dir to include (recursively)
"""
proc writeVersion() =
echo fmt"nimassets version {versionString}"
proc cli*() =
var
compress, fast : bool = false
dirs = newSeq[string]()
output = "assets.nim"
if paramCount() == 0:
writeHelp()
quit(0)
for kind, key, val in getopt():
case kind
of cmdLongOption, cmdShortOption:
case key
of "help", "h":
writeHelp()
quit()
of "version", "v":
writeVersion()
quit()
of "fast", "f": fast = true
of "dir", "d": dirs.add(val)
of "output", "o": output = val
else:
discard
else:
discard
for d in dirs:
if not dirExists(d):
echo fmt"[-] Directory doesnt exist {d}"
quit 2 # 2 means dir doesn't exist.
# echo fmt"compress: {compress} fast: {fast} dirs:{dirs} output:{output}"
createAssetsFile(dirs, output, fast, compress)
when isMainModule:
cli()
Pretty simple: we accept a list of directories (using the -d or --dir flag) to bundle into a nim file defined by the output flag (assets.nim by default).
- the --fast flag indicates whether we should use threading to speed things up a little
- compress is used to allow compression; we will always pass it as false for now
- for version information (branch and commit id) we used some git commands combined with staticExec to ensure these values are available at compile time
createAssetsFile
This proc is the entry point of our application: it receives the seq of directories we want to bundle, the output filename, and the fast flag, and it will make use of the compress flag in the future.
proc createAssetsFile(dirs:seq[string], outputfile="assets.nim", fast=false, compress=false) =
var generator: proc(s:string): string
var data = assetsFileHeader
if fast:
generator = generateDirAssetsSpawn
else:
generator = generateDirAssetsSimple
for d in dirs:
data &= generator(d)
writeFile(outputfile, data)
Here we write the header of the assets file plus the result of generating the bundle of each directory to the outputfile, and we bundle files either one by one (using generateDirAssetsSimple) or in parallel (using generateDirAssetsSpawn).
generateDirAssetsSimple
proc generateDirAssetsSimple(dir:string): string =
var key, val, valString: string
for path in expandTilde(dir).walkDirRec():
key = path
val = readFile(path).encode()
valString = " \"\"\"" & val & "\"\"\" "
result &= fmt"""assets.add("{path}", {valString})""" & "\n\n"
We walk the directory recursively using walkDirRec and write down the assets[RESOURCE_PATH] = BASE64_ENCODE(RESOURCE_CONTENT) part for each file in the directory.
generateDirAssetsSpawn
proc handleFile(path:string): string {.thread.} =
var val, valString: string
val = readFile(path).encode()
valString = " \"\"\"" & val & "\"\"\" "
result = fmt"""assets.add("{path}", {valString})""" & "\n\n"
proc generateDirAssetsSpawn(dir: string): string =
var results = newSeq[FlowVar[string]]()
for path in expandTilde(dir).walkDirRec():
results.add(spawn handleFile(path))
# wait till all of them are done.
for r in results:
result &= ^r
The same as generateDirAssetsSimple, but using spawn to generate each assets table entry in parallel. Note that spawn and FlowVar come from the threadpool module, so the real file also needs to import threadpool and be compiled with --threads:on.
And that's basically it.
nimassets
All of the code is based on nimassets project. Feel free to send a PR or report issues.
Day 15: TCP Router (Routing TCP traffic)
Today we will implement a tcp router, or rather a tcp port forwarder, since it works against only 1 endpoint.
What do we expect?
let opts = ForwardOptions(listenAddr:"127.0.0.1", listenPort:11000.Port, toAddr:"127.0.0.1", toPort:6379.Port)
var f = newForwarder(opts)
asyncCheck f.serve()
runForever()
and then you can do
redis-cli -p 11000
> PING
PONG
The plan
- Listen on listenPort on address listenAddr and accept connections.
- On every new connection (incoming):
- open a socket to toPort on toAddr (outgoing)
- whenever data is ready on either end, write the data to the other one
How do we know a socket is ready?
Linux provides APIs like select and poll to watch (monitor) a set of file descriptors, and allows you to take some action on whichever file descriptor is ready for reading or writing.
The select() function gives you a way to simultaneously check multiple sockets to see if they have data waiting to be recv()d, or if you can send() data to them without blocking, or if some exception has occurred.
Please check Beej's guide to network programming for more on that
Imports
import strformat, tables, json, strutils, sequtils, hashes, net, asyncdispatch, asyncnet, os, strutils, parseutils, deques, options, net
Types
Options for the server specifying on which address to listen and where to forward the traffic.
type ForwardOptions = object
listenAddr*: string
listenPort*: Port
toAddr*: string
toPort*: Port
type Forwarder = object of RootObj
options*: ForwardOptions
proc newForwarder(opts: ForwardOptions): ref Forwarder =
result = new(Forwarder)
result.options = opts
The Forwarder type represents the server, and newForwarder creates a forwarder and sets its options.
Server setup
proc serve(this: ref Forwarder) {.async.} =
var server = newAsyncSocket(buffered=false)
server.setSockOpt(OptReuseAddr, true)
server.bindAddr(this.options.listenPort, this.options.listenAddr)
echo fmt"Started tcp server... {this.options.listenAddr}:{this.options.listenPort} "
server.listen()
while true:
let client = await server.accept()
echo "..Got connection "
asyncCheck this.processClient(client)
We will utilize async/await features of nim to build our server.
- Create a new socket with newAsyncSocket (make sure to set buffered to false so Nim doesn't try to read all requested data)
- setSockOpt allows you to make the socket reusable
SO_REUSEADDR is used in servers mainly because it's common that you need to restart the server for the sake of trying or changing configurations (some use SIGHUP to update the configuration as a pattern), and if there were active connections, the next start of the server would otherwise fail.
- bindAddr binds the server to a certain address and port: listenAddr and listenPort
- then we start a loop to receive connections.
- we should call await processClient, right? why asyncCheck processClient?
await vs asyncCheck
- await means execute that async action and block the execution until you get a result.
- asyncCheck means execute the async action and don't block; a more suitable name might be discard or discardAsync.
Now we can answer the question why we call asyncCheck processClient instead of await processClient: because await would block the event machine until processClient completely executes, which defeats the purpose of concurrency and of accepting/handling multiple clients.
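A tiny standalone sketch of the difference (nothing forwarder-specific, just asyncdispatch; the proc names are made up):

import asyncdispatch

proc tick(name: string) {.async.} =
  for i in 1..3:
    echo name, " ", i
    await sleepAsync(100)

proc main() {.async.} =
  asyncCheck tick("background")   # fire and forget, keeps running concurrently
  await tick("awaited")           # blocks main() until this one finishes

waitFor main()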
Process a client
Establish the connection
proc processClient(this: ref Forwarder, client: AsyncSocket) {.async.} =
let remote = newAsyncSocket(buffered=false)
await remote.connect(this.options.toAddr, this.options.toPort)
...
First thing is to get a socket to the endpoint where we forward the traffic defined in the ForwardOptions
toAddr
and toPort
Now, we could've just set up a loop reading data from the client socket and writing it to the remote socket.
The problem is we may get out of sync: sometimes the remote sends data as soon as a client connects, before reading anything from the client. Maybe the remote sends information like the server version, some metadata, or protocol instructions; we can't be sure that waiting to receive data is always its first step. So what we can do is watch the file descriptors, and whoever has data, we write it to the other one.
e.g.
- remote has data: we read (recv) it and write (send) it to the client.
- client has data: we read (recv) it and write (send) it to the remote.
The remote has data
proc remoteHasData() {.async.} =
while not remote.isClosed and not client.isClosed:
echo " in remote has data loop"
let data = await remote.recv(1024)
echo "got data: " & data
await client.send(data)
client.close()
remote.close()
The client has data
proc clientHasData() {.async.} =
while not client.isClosed and not remote.isClosed:
echo "in client has data loop"
let data = await client.recv(1024)
echo "got data: " & data
await remote.send(data)
client.close()
remote.close()
Run the data processors
Now let's register the clientHasData and remoteHasData procs with the event machine and LET'S NOT BLOCK on any of them (remember, if you don't want to block then you need asyncCheck).
try:
asyncCheck clientHasData()
asyncCheck remoteHasData()
except:
echo getCurrentExceptionMsg()
So now our processClient
should look like
proc processClient(this: ref Forwarder, client: AsyncSocket) {.async.} =
let remote = newAsyncSocket(buffered=false)
await remote.connect(this.options.toAddr, this.options.toPort)
proc clientHasData() {.async.} =
while not client.isClosed and not remote.isClosed:
echo "in client has data loop"
let data = await client.recv(1024)
echo "got data: " & data
await remote.send(data)
client.close()
remote.close()
proc remoteHasData() {.async.} =
while not remote.isClosed and not client.isClosed:
echo " in remote has data loop"
let data = await remote.recv(1024)
echo "got data: " & data
await client.send(data)
client.close()
remote.close()
try:
asyncCheck clientHasData()
asyncCheck remoteHasData()
except:
echo getCurrentExceptionMsg()
Let's forward to redis
let opts = ForwardOptions(listenAddr:"127.0.0.1", listenPort:11000.Port, toAddr:"127.0.0.1", toPort:6379.Port)
var f = newForwarder(opts)
asyncCheck f.serve()
runForever()
runForever
begins a never ending global dispatch poll loop
our full code
# This is just an example to get you started. A typical binary package
# uses this file as the main entry point of the application.
import strformat, tables, json, strutils, sequtils, hashes, net, asyncdispatch, asyncnet, os, strutils, parseutils, deques, options, net
type ForwardOptions = object
listenAddr*: string
listenPort*: Port
toAddr*: string
toPort*: Port
type Forwarder = object of RootObj
options*: ForwardOptions
proc processClient(this: ref Forwarder, client: AsyncSocket) {.async.} =
let remote = newAsyncSocket(buffered=false)
await remote.connect(this.options.toAddr, this.options.toPort)
proc clientHasData() {.async.} =
while not client.isClosed and not remote.isClosed:
echo "in client has data loop"
let data = await client.recv(1024)
echo "got data: " & data
await remote.send(data)
client.close()
remote.close()
proc remoteHasData() {.async.} =
while not remote.isClosed and not client.isClosed:
echo " in remote has data loop"
let data = await remote.recv(1024)
echo "got data: " & data
await client.send(data)
client.close()
remote.close()
try:
asyncCheck clientHasData()
asyncCheck remoteHasData()
except:
echo getCurrentExceptionMsg()
proc serve(this: ref Forwarder) {.async.} =
var server = newAsyncSocket(buffered=false)
server.setSockOpt(OptReuseAddr, true)
server.bindAddr(this.options.listenPort, this.options.listenAddr)
echo fmt"Started tcp server... {this.options.listenAddr}:{this.options.listenPort} "
server.listen()
while true:
let client = await server.accept()
echo "..Got connection "
asyncCheck this.processClient(client)
proc newForwarder(opts: ForwardOptions): ref Forwarder =
result = new(Forwarder)
result.options = opts
let opts = ForwardOptions(listenAddr:"127.0.0.1", listenPort:11000.Port, toAddr:"127.0.0.1", toPort:6379.Port)
var f = newForwarder(opts)
asyncCheck f.serve()
runForever()
This project is very simple, but it helped us tackle multiple concepts like how to utilize async/await and asyncCheck in interesting use cases (literally @dom96 explained it to me). Of course, it can be extended to support something like forwarding TLS traffic based on SNI, so you can serve multiple backends (with domains) using a single public IP :)
Please feel free to contribute by opening PR or issue on the repo.
Day 16: Ascii Tables
ASCII tables are everywhere: every time you issue a SQL select, use tools like docker to see your beloved containers, or view your todo list in a fancy terminal todo app.
What to expect
Being able to render tables in the terminal, control the widths and the rendering characters.
var t = newAsciiTable()
t.tableWidth = 80
t.setHeaders(@["ID", "Name", "Date"])
t.addRow(@["1", "Aaaa", "2018-10-2"])
t.addRow(@["2", "bbvbbba", "2018-10-2"])
t.addRow(@["399", "CCC", "1018-5-2"])
printTable(t)
+---------------------------+---------------------------+---------------------------+
|ID |Name |Date |
+---------------------------+---------------------------+---------------------------+
|1 |Aaaa |2018-10-2 |
+---------------------------+---------------------------+---------------------------+
|2 |bbvbbba |2018-10-2 |
+---------------------------+---------------------------+---------------------------+
|399 |CCC |1018-5-2 |
+---------------------------+---------------------------+---------------------------+
or let Nim decide the widths for you
t.tableWidth = 0
printTable(t)
+---+-------+---------+
|ID |Name |Date |
+---+-------+---------+
|1 |Aaaa |2018-10-2|
+---+-------+---------+
|2 |bbvbbba|2018-10-2|
+---+-------+---------+
|399|CCC |1018-5-2 |
+---+-------+---------+
or even remove the separators between the rows.
+---+-------+---------+
|ID |Name |Date |
+---+-------+---------+
|1 |Aaaa |2018-10-2|
|2 |bbvbbba|2018-10-2|
|399|CCC |1018-5-2 |
+---+-------+---------+
Why not do it manually?
Well, if you want to write code like this
var widths = @[0,0,0,0] #id, name, ports, root
for k, v in info:
if len($v.id) > widths[0]:
widths[0] = len($v.id)
if len($v.name) > widths[1]:
widths[1] = len($v.name)
if len($v.ports) > widths[2]:
widths[2] = len($v.ports)
if len($v.root) > widths[3]:
widths[3] = len($v.root)
var sumWidths = 0
for w in widths:
sumWidths += w
echo "-".repeat(sumWidths)
let extraPadding = 5
echo "| ID" & " ".repeat(widths[0]+ extraPadding-4) & "| Name" & " ".repeat(widths[1]+extraPadding-6) & "| Ports" & " ".repeat(widths[2]+extraPadding-6 ) & "| Root" & " ".repeat(widths[3]-6)
echo "-".repeat(sumWidths)
for k, v in info:
let nroot = replace(v.root, "https://hub.grid.tf/", "").strip()
echo "|" & $v.id & " ".repeat(widths[0]-len($v.id)-1 + extraPadding) & "|" & v.name & " ".repeat(widths[1]-len(v.name)-1 + extraPadding) & "|" & v.ports & " ".repeat(widths[2]-len(v.ports)+extraPadding) & "|" & nroot & " ".repeat(widths[3]-len(v.root)+ extraPadding-2) & "|"
echo "-".repeat(sumWidths)
result = ""
be my guest :)
imports
Not much, but we will deal with lots of strings
import strformat, strutils
Types
Let's think a bit about the entities of a table.
Well, we have a Table, headers, rows, columns, and each row consists of cells.
Cell
type Cell* = object
leftpad*: int
rightpad: int
pad*: int
text*: string
This describes the Cell; we define properties like leftpad and rightpad to set the padding around the text in the cell. Also, we use the general pad property to set an equal leftpad and rightpad.
proc newCell*(text: string, leftpad=1, rightpad=1, pad=0): ref Cell =
result = new Cell
result.pad = pad
if pad != 0:
result.leftpad = pad
result.rightpad = pad
else:
result.leftpad = leftpad
result.rightpad = rightpad
result.text = text
proc len*(this:ref Cell): int =
result = this.leftpad + this.text.len + this.rightpad
Cell length is the length of the whitespace in the left and right paddings plus the text length.
proc `$`*(this:ref Cell): string =
result = " ".repeat(this.leftpad) & this.text & " ".repeat(this.rightpad)
String representation of our Cell.
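A quick sketch of using the Cell procs above (assuming they're in scope), just to make the padding math concrete:
var c = newCell("ID", pad=2)
echo len(c)          # 2 + 2 + 2 = 6: leftpad + text + rightpad
echo ">" & $c & "<"  # >  ID  <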
proc newCellFromAnother(another: ref Cell): ref Cell =
result = newCell(text=another.text, leftpad=another.leftpad, rightpad=another.rightpad)
Little helper procedure to copy properties from one cell to another.
Table
Now let's talk a bit about the table
type AsciiTable* = object
rows: seq[seq[string]]
headers: seq[ref Cell]
rowSeparator*: char
colSeparator*: char
cellEdge*: char
widths: seq[int]
suggestedWidths: seq[int]
tableWidth*: int
separateRows*: bool
AsciiTable describes a table.
- headers: could be a seq of strings @["id", "name", ...] or a list of Cells; we describe it using a seq of Cell.
- tableWidth: sets the total width of the table.
- rowSeparator: character that separates rows.
- colSeparator: character that separates columns.
- cellEdge: character drawn at the edge of each cell.
Remember, that's how our table looks:
+---+-------+---------+
|ID |Name |Date |
+---+-------+---------+
|1 |Aaaa |2018-10-2|
+---+-------+---------+
|399|CCC |1018-5-2 |
+---+-------+---------+
We see each row is separated by a rowSeparator line -, a cellEdge + sits at the edge of every cell, and the columns are separated by the colSeparator |. The separateRows property allows us to remove the separator between rows (see the sketch after the examples below).
without separator
+---+-------+---------+
|ID |Name |Date |
+---+-------+---------+
|1 |Aaaa |2018-10-2|
|2 |bbvbbba|2018-10-2|
|399|CCC |1018-5-2 |
+---+-------+---------+
with separator
+---+-------+---------+
|ID |Name |Date |
+---+-------+---------+
|1 |Aaaa |2018-10-2|
+---+-------+---------+
|2 |bbvbbba|2018-10-2|
+---+-------+---------+
|399|CCC |1018-5-2 |
+---+-------+---------+
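Here's a minimal sketch of toggling that behaviour, assuming a table t built as in the earlier example (separateRows, rowSeparator, colSeparator and cellEdge are all exported fields):
t.separateRows = false   # drop the lines between data rows
printTable(t)
t.separateRows = true    # back to the default: a separator line after every row
t.rowSeparator = '='     # the drawing characters can be swapped too
printTable(t)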
proc newAsciiTable*(): ref AsciiTable =
result = new AsciiTable
result.rowSeparator='-'
result.colSeparator='|'
result.cellEdge='+'
result.tableWidth=0
result.separateRows=true
result.widths = newSeq[int]()
result.suggestedWidths = newSeq[int]()
result.rows = newSeq[seq[string]]()
result.headers = newSeq[ref Cell]()
Helper to initialize the table.
proc columnsCount*(this: ref AsciiTable): int =
result = this.headers.len
helper to get the number of columns.
proc setHeaders*(this: ref AsciiTable, headers:seq[string]) =
for s in headers:
var cell = newCell(s)
this.headers.add(cell)
proc setHeaders*(this: ref AsciiTable, headers: seq[ref Cell]) =
this.headers = headers
Allow using plain strings for the headers, or customized Cells.
proc setRows*(this: ref AsciiTable, rows:seq[seq[string]]) =
this.rows = rows
proc addRow*(this: ref AsciiTable, row:seq[string]) =
this.rows.add(row)
Helpers to add rows to the table data structure
proc printTable*(this: ref AsciiTable) =
echo(this.render())
printTable prints the rendered table, which is prepared by the render proc.
proc reset*(this:ref AsciiTable) =
this.rowSeparator='-'
this.colSeparator='|'
this.cellEdge='+'
this.tableWidth=0
this.separateRows=true
this.widths = newSeq[int]()
this.suggestedWidths = newSeq[int]()
this.rows = newSeq[seq[string]]()
this.headers = newSeq[ref Cell]()
Resets table defaults.
Rendering the table.
Let's assume for a second that the widths property has all the information about the size of each column based on its index, e.g. widths => [5, 10, 20] means:
- column 0 can hold a cell of at most 5 characters.
- column 1 can hold a cell of at most 10 characters.
- column 2 can hold a cell of at most 20 characters.
The cells of a column can't vary in size, so we set each width to the LONGEST item in that column. Calculating the widths is a bit tedious, so we will get back to it later.
proc oneLine(this: ref AsciiTable): string =
result &= this.cellEdge
for w in this.widths:
result &= this.rowSeparator.repeat(w) & this.cellEdge
result &= "\n"
oneLine helps in creating a line like
+---+-------+---------+
So how does it work?
1- add the cellEdge + on the left
2- add the rowSeparator - until you consume the width of the column you are at, then add the cellEdge again.
3- add a new line \n at the end.
Repeating step 2 for each width, the line builds up like this:
+
+---+
+---+-------+
+---+-------+---------+
proc render*(this: ref AsciiTable): string =
this.calculateWidths()
We start by calling our magic function calculateWidths
# top border
result &= this.oneline()
Generate the top border line of the table.
# headers
for colidx, h in this.headers:
result &= this.colSeparator & $h & " ".repeat(this.widths[colidx]-len(h) )
result &= this.colSeparator
result &= "\n"
# finish headers
# line after headers
Now the headers
|ID |Name |Date |
So for each header defined in this.headers we start with the colSeparator |, then print the content of the header (which is a cell, so we print leftpad + text + rightpad) padded to the column width, and after the last header we add a closing colSeparator | and a newline.
result &= this.oneline()
Add another line, So our table looks like this now.
+---+-------+---------+
|ID |Name |Date |
+---+-------+---------+
# start rows
for r in this.rows:
# start row
for colidx, c in r:
let cell = newCell(c, leftpad=this.headers[colidx].leftpad, rightpad=this.headers[colidx].rightpad)
result &= this.colSeparator & $cell & " ".repeat(this.widths[colidx]-len(cell))
result &= this.colSeparator
result &= "\n"
Now we do exactly the same for each row: we print it the same way we printed the headers and follow it with a new line.
Our table looks like this now
+---+-------+---------+
|ID |Name |Date |
+---+-------+---------+
|1 |Aaaa |2018-10-2|
if this.separateRows:
result &= this.oneLine()
# finish row
Now we need to decide: do the rows have a line separating them or not? If they do, we finish each row by adding another oneLine
+---+-------+---------+
|ID |Name |Date |
+---+-------+---------+
|1 |Aaaa |2018-10-2|
+---+-------+---------+
or, if the rows don't have separators and we want our table to look like this in the end,
+---+-------+---------+
|ID |Name |Date |
+---+-------+---------+
|1 |Aaaa |2018-10-2|
|2 |bbvbbba|2018-10-2|
we don't add oneLine
# don't duplicate the finishing line if it's already printed in case of this.separateRows
if not this.separateRows:
result &= this.oneLine()
return result
If we don't separateRows, we add the final oneLine to the table
+---+-------+---------+
|ID |Name |Date |
+---+-------+---------+
|1 |Aaaa |2018-10-2|
|2 |bbvbbba|2018-10-2|
+---+-------+---------+ <- the final oneLine
If we do separateRows, we shouldn't add another oneLine or our table will be rendered like
+---+-------+---------+
|ID |Name |Date |
+---+-------+---------+
|1 |Aaaa |2018-10-2|
+---+-------+---------+
|2 |bbvbbba|2018-10-2|
+---+-------+---------+
+---+-------+---------+
Now back to calculating widths
Back to the magic function. To be honest, it's not magical; it's just a bit tedious. The basic idea is:
proc calculateWidths(this: ref AsciiTable) =
var colsWidths = newSeq[int]()
a list of column widths
if this.suggestedWidths.len == 0:
for h in this.headers:
colsWidths.add(h.len)
else:
colsWidths = this.suggestedWidths
the user might suggest some widths via the suggestedWidths property, so we can use them for guidance.
for row in this.rows:
for colpos, c in row:
var acell = newCellFromAnother(this.headers[colpos])
acell.text = c
if len(acell) > colsWidths[colpos]:
colsWidths[colpos] = len(acell)
We get the length of each column by iterating over all the rows and finding the max item (the cell with the longest size) at that column's position in every row; that max will be the column width.
We also support other options like the tableWidth of the Table, which will make the column sizes equal if the user didn't suggest widths.
let sizeForCol = (this.tablewidth/len(this.headers)).toInt()
var lenHeaders = 0
for w in colsWidths:
lenHeaders += w
Here we calculate sizeForCol, the table width specified by the user divided by the number of column headers, and lenHeaders, the sum of the calculated column widths.
if this.tablewidth > lenHeaders:
if this.suggestedWidths.len == 0:
for colpos, c in colsWidths:
colsWidths[colpos] += sizeForCol - c
If the user didn't suggest any widths, then they want the table columns to have equal width.
if this.suggestedWidths.len != 0:
var sumSuggestedWidths = 0
for s in this.suggestedWidths:
sumSuggestedWidths += s
if lenHeaders > sumSuggestedWidths:
raise newException(ValueError, fmt"sum of {this.suggestedWidths} = {sumSuggestedWidths} and it's less than required length {lenHeaders}")
If the user suggested some widths, we calculate the sum of what was suggested and check that it isn't less than the calculated lenHeaders; if it is, we raise an exception.
this.widths = colsWidths
Phew! We finally set the widths property now
nim-asciitable
This day is based on my project nim-asciitables, which is superseded by nim-terminaltables, providing more customizable styles and unicode box-drawing support.
Day 17: Nim-Sonic-Client: Nim and Rust can be friends!
sonic is a fast, lightweight and schema-less search backend. It ingests search texts and identifier tuples that can then be queried against in a microsecond's time, and it's implemented in rust. Sonic can be used as a simple alternative to super-heavy and full-featured search backends such as Elasticsearch in some use-cases. It is capable of normalizing natural language search queries, auto-completing a search query and providing the most relevant results for a query. Sonic is an identifier index, rather than a document index; when queried, it returns IDs that can then be used to refer to the matched documents in an external database. We use it heavily in all of our projects, currently through the python client, but we are here today to talk about nim. Please make sure to check the sonic website for more info on how to start the server and its configuration.
What to expect ?
Ingest
We should be able to push data over tcp from nim to sonic
var cl = open("127.0.0.1", 1491, "dmdm", SonicChannel.Ingest)
echo $cl.execCommand("PING")
echo cl.ping()
echo cl.protocol
echo cl.bufsize
echo cl.push("wiki", "articles", "article-1",
"for the love of god hell")
echo cl.push("wiki", "articles", "article-2",
"for the love of satan heaven")
echo cl.push("wiki", "articles", "article-3",
"for the love of lorde hello")
echo cl.push("wiki", "articles", "article-4",
"for the god of loaf helmet")
PONG
true
0
0
true
2
0
true
true
true
Search
We should be able to search/complete data from nim client using sonic
var cl = open("127.0.0.1", 1491, "dmdm", SonicChannel.Search)
echo $cl.execCommand("PING")
echo cl.ping()
echo cl.query("wiki", "articles", "for")
echo cl.query("wiki", "articles", "love")
echo cl.suggest("wiki", "articles", "hell")
echo cl.suggest("wiki", "articles", "lo")
PONG
true
@[]
@["article-3", "article-2"]
@[]
@["loaf", "lorde", "love"]
Sonic specification
If you go to their wire protocol page you will find some examples using telnet. I'll copy some in the following section
2️⃣ Sonic Channel (uninitialized)
START <mode> <password>: select the mode to use for the connection (either search or ingest). The password is found in the config.cfg file at channel.auth_password.
Issuing any other command (eg. QUIT) in this mode will abort the TCP connection, effectively resulting in a QUIT with the ENDED not_recognized response.
3️⃣ Sonic Channel (Search mode)
The Sonic Channel Search mode is used for querying the search index. Once in this mode, you cannot switch to other modes or gain access to commands from other modes.
➡️ Available commands:
- QUERY: query database (syntax: QUERY <collection> <bucket> "<terms>" [LIMIT(<count>)]? [OFFSET(<count>)]? [LANG(<locale>)]?; time complexity: O(1) if enough exact word matches or O(N) if not enough exact matches, where N is the number of alternate words tried; in practice it approaches O(1))
- SUGGEST: auto-completes word (syntax: SUGGEST <collection> <bucket> "<word>" [LIMIT(<count>)]?; time complexity: O(1))
- PING: ping server (syntax: PING; time complexity: O(1))
- HELP: show help (syntax: HELP [<manual>]?; time complexity: O(1))
- QUIT: stop connection (syntax: QUIT; time complexity: O(1))
⏩ Syntax terminology:
- <collection>: index collection (ie. what you search in, eg. messages, products, etc.);
- <bucket>: index bucket name (ie. user-specific search classifier in the collection if you have any, eg. user-1, user-2, .., otherwise use a common bucket name eg. generic, default, common, ..);
- <terms>: text for search terms (between quotes);
- <count>: a positive integer number; set within allowed maximum & minimum limits;
- <locale>: an ISO 639-3 locale code eg. eng for English (if set, the locale must be a valid ISO 639-3 code; if set to none, lexing will be disabled; if not set, the locale will be guessed from text);
- <manual>: help manual to be shown (available manuals: commands);
Notice: the bucket terminology may confuse some Sonic users. As we are well aware Sonic may be used in an environment where end-users may each hold their own search index in a given collection, we made it possible to manage per-end-user search indexes with bucket. If you only have a single index per collection (most Sonic users will), we advise you use a static generic name for your bucket, for instance: default.
⬇️ Search flow example (via telnet):
T1: telnet sonic.local 1491
T2: Trying ::1...
T3: Connected to sonic.local.
T4: Escape character is '^]'.
T5: CONNECTED <sonic-server v1.0.0>
T6: START search SecretPassword
T7: STARTED search protocol(1) buffer(20000)
T8: QUERY messages user:0dcde3a6 "valerian saliou" LIMIT(10)
T9: PENDING Bt2m2gYa
T10: EVENT QUERY Bt2m2gYa conversation:71f3d63b conversation:6501e83a
T11: QUERY helpdesk user:0dcde3a6 "gdpr" LIMIT(50)
T12: PENDING y57KaB2d
T13: QUERY helpdesk user:0dcde3a6 "law" LIMIT(50) OFFSET(200)
T14: PENDING CjPvE5t9
T15: PING
T16: PONG
T17: EVENT QUERY CjPvE5t9
T18: EVENT QUERY y57KaB2d article:28d79959
T19: SUGGEST messages user:0dcde3a6 "val"
T20: PENDING z98uDE0f
T21: EVENT SUGGEST z98uDE0f valerian valala
T22: QUIT
T23: ENDED quit
T24: Connection closed by foreign host.
Notes on what happens:
- T6: we enter search mode (this is required to enable search commands);
- T8: we query collection messages, in the bucket for platform user user:0dcde3a6, with search terms valerian saliou and a limit of 10 on returned results;
- T9: Sonic received the query and stacked it for processing with marker Bt2m2gYa (the marker is used to track the asynchronous response);
- T10: Sonic processed the search query of T8 with marker Bt2m2gYa and sends 2 search results (those are conversation identifiers, that refer to a primary key in an external database);
- T11 + T13: we query collection helpdesk twice (in the example, this one is heavy, so processing of results takes more time);
- T17 + T18: we receive search results for search queries of T11 + T13 (this took a while!);
4️⃣ Sonic Channel (Ingest mode)
The Sonic Channel Ingest mode is used for altering the search index (push, pop and flush). Once in this mode, you cannot switch to other modes or gain access to commands from other modes.
➡️ Available commands:
- PUSH: Push search data in the index (syntax: PUSH <collection> <bucket> <object> "<text>" [LANG(<locale>)]?; time complexity: O(1))
- POP: Pop search data from the index (syntax: POP <collection> <bucket> <object> "<text>"; time complexity: O(1))
- COUNT: Count indexed search data (syntax: COUNT <collection> [<bucket> [<object>]?]?; time complexity: O(1))
- FLUSHC: Flush all indexed data from a collection (syntax: FLUSHC <collection>; time complexity: O(1))
- FLUSHB: Flush all indexed data from a bucket in a collection (syntax: FLUSHB <collection> <bucket>; time complexity: O(N) where N is the number of bucket objects)
- FLUSHO: Flush all indexed data from an object in a bucket in a collection (syntax: FLUSHO <collection> <bucket> <object>; time complexity: O(1))
- PING: ping server (syntax: PING; time complexity: O(1))
- HELP: show help (syntax: HELP [<manual>]?; time complexity: O(1))
- QUIT: stop connection (syntax: QUIT; time complexity: O(1))
⏩ Syntax terminology:
- <collection>: index collection (ie. what you search in, eg. messages, products, etc.);
- <bucket>: index bucket name (ie. user-specific search classifier in the collection if you have any, eg. user-1, user-2, .., otherwise use a common bucket name eg. generic, default, common, ..);
- <object>: object identifier that refers to an entity in an external database, where the searched object is stored (eg. you use Sonic to index CRM contacts by name; full CRM contact data is stored in a MySQL database; in this case the object identifier in Sonic will be the MySQL primary key for the CRM contact);
- <text>: search text to be indexed (can be a single word, or a longer text; within maximum length safety limits; between quotes);
- <locale>: an ISO 639-3 locale code eg. eng for English (if set, the locale must be a valid ISO 639-3 code; if set to none, lexing will be disabled; if not set, the locale will be guessed from text);
- <manual>: help manual to be shown (available manuals: commands);
Notice: the bucket terminology may confuse some Sonic users. As we are well aware Sonic may be used in an environment where end-users may each hold their own search index in a given collection, we made it possible to manage per-end-user search indexes with bucket. If you only have a single index per collection (most Sonic users will), we advise you use a static generic name for your bucket, for instance: default.
⬇️ Ingest flow example (via telnet):
T1: telnet sonic.local 1491
T2: Trying ::1...
T3: Connected to sonic.local.
T4: Escape character is '^]'.
T5: CONNECTED <sonic-server v1.0.0>
T6: START ingest SecretPassword
T7: STARTED ingest protocol(1) buffer(20000)
T8: PUSH messages user:0dcde3a6 conversation:71f3d63b Hey Valerian
T9: ERR invalid_format(PUSH <collection> <bucket> <object> "<text>")
T10: PUSH messages user:0dcde3a6 conversation:71f3d63b "Hello Valerian Saliou, how are you today?"
T11: OK
T12: COUNT messages user:0dcde3a6
T13: RESULT 43
T14: COUNT messages user:0dcde3a6 conversation:71f3d63b
T15: RESULT 1
T16: FLUSHO messages user:0dcde3a6 conversation:71f3d63b
T17: RESULT 1
T18: FLUSHB messages user:0dcde3a6
T19: RESULT 42
T20: PING
T21: PONG
T22: QUIT
T23: ENDED quit
T24: Connection closed by foreign host.
Notes on what happens:
- T6: we enter ingest mode (this is required to enable ingest commands);
- T8: we try to push text Hey Valerian to the index, in collection messages, bucket user:0dcde3a6 and object conversation:71f3d63b (the syntax that was used is invalid);
- T9: Sonic refuses the command we issued in T8, and provides us with the correct command format (notice that <text> should be quoted);
- T10: we attempt to push another text in the same collection, bucket and object as in T8;
- T11: this time, our push command in T10 was valid (Sonic acknowledges the push commit to the search index);
- T12: we count the number of indexed terms in collection messages and bucket user:0dcde3a6;
- T13: there are 43 terms (ie. words) in the index for the query in T12;
- T18: we flush all index data from collection messages and bucket user:0dcde3a6;
- T19: 42 terms have been flushed from the index for the command in T18;
5️⃣ Sonic Channel (Control mode)
The Sonic Channel Control mode is used for administration purposes. Once in this mode, you cannot switch to other modes or gain access to commands from other modes.
➡️ Available commands:
- TRIGGER: trigger an action (syntax: TRIGGER [<action>]? [<data>]?; time complexity: O(1))
- INFO: get server information (syntax: INFO; time complexity: O(1))
- PING: ping server (syntax: PING; time complexity: O(1))
- HELP: show help (syntax: HELP [<manual>]?; time complexity: O(1))
- QUIT: stop connection (syntax: QUIT; time complexity: O(1))
⏩ Syntax terminology:
- <action>: action to be triggered (available actions: consolidate, backup, restore);
- <data>: additional data to provide to the action (required for: backup, restore);
- <manual>: help manual to be shown (available manuals: commands);
⬇️ Control flow example (via telnet):
T1: telnet sonic.local 1491
T2: Trying ::1...
T3: Connected to sonic.local.
T4: Escape character is '^]'.
T5: CONNECTED <sonic-server v1.0.0>
T6: START control SecretPassword
T7: STARTED control protocol(1) buffer(20000)
T8: TRIGGER consolidate
T9: OK
T10: PING
T11: PONG
T12: QUIT
T13: ENDED quit
T14: Connection closed by foreign host.
Notes on what happens:
- T6: we enter control mode (this is required to enable control commands);
- T8: we trigger a database consolidation (instead of waiting for the next automated consolidation tick);
Implementation
imports
These are the imports we will use, because we will be dealing with networking, some data parsing, .. etc
import strformat, tables, json, strutils, sequtils, hashes, net, asyncdispatch, asyncnet, os, parseutils, deques, options
Types
As we said earlier there're three channels
type
SonicChannel* {.pure.} = enum
Ingest
Search
Control
Generic sonic exception
type
SonicServerError = object of Exception
Now for the base connection
type
SonicBase[TSocket] = ref object of RootObj
socket: TSocket
host: string
port: int
password: string
connected: bool
timeout*: int
protocol*: int
bufSize*: int
channel*: SonicChannel
Sonic* = ref object of SonicBase[net.Socket]
AsyncSonic* = ref object of SonicBase[asyncnet.AsyncSocket]
We require:
- host: host the sonic server is running on
- password: password for the sonic server
- connected: flag indicating whether we are connected or not
- timeout: timeout in seconds
- protocol: protocol information sent to us on connecting to the sonic server
- bufsize: how big a data buffer you can use
- channel: indicates the current mode.
Helpers
proc quoteText(text:string): string =
## Quote text and normalize it in sonic protocol context.
## - text str text to quote/escape
## Returns:
## str quoted text
return '"' & text.replace('"', '\"').replace("\r\n", "") & '"'
quoteText is used to quote the text, escape quotes and strip newlines.
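A tiny usage sketch (assuming quoteText is in scope):
doAssert quoteText("hello world") == "\"hello world\""
doAssert quoteText("line1\r\nline2") == "\"line1line2\""   # newlines are stripped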
proc isError(response: string): bool =
## Check if the response is Error or not in sonic context.
## Errors start with `ERR`
## - response response string
## Returns:
## bool true if response is an error.
response.startsWith("ERR ")
isError checks if the response represents an error
proc raiseForError(response:string): string =
## Raise SonicServerError in case of error response.
## - response message to check if it's error or not.
## Returns:
## str the response message
if isError(response):
raise newException(SonicServerError, response)
return response
raiseForError is a short circuit: it raises an error if the response is an error, otherwise it returns the response.
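For example (assuming the helpers above):
doAssert raiseForError("PONG") == "PONG"
# raiseForError("ERR invalid_format(...)") would raise SonicServerError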
Making a connection
proc open*(host = "localhost", port = 1491, password="", channel:SonicChannel, ssl=false, timeout=0): Sonic =
result = Sonic(
socket: newSocket(buffered = true),
host: host,
port: port,
password: password,
channel: channel
)
result.timeout = timeout
result.channel = channel
when defined(ssl):
if ssl == true:
SSLifySonicConnectionNoVerify(result)
result.socket.connect(host, port.Port)
result.startSession()
proc openAsync*(host = "localhost", port = 1491, password="", channel:SonicChannel, ssl=false, timeout=0): Future[AsyncSonic] {.async.} =
## Open an asynchronous connection to a Sonic server.
result = AsyncSonic(
socket: newAsyncSocket(buffered = true),
channel: channel
)
when defined(ssl):
if ssl == true:
SSLifySonicConnectionNoVerify(result)
result.timeout = timeout
await result.socket.connect(host, port.Port)
await result.startSession()
Here we provide two APIs, sync and async, for opening a connection, and as soon as the connection is established we call startSession.
startSession
proc startSession*(this:Sonic|AsyncSonic): Future[void] {.multisync.} =
let resp = await this.socket.recvLine()
if "CONNECTED" in resp:
this.connected = true
var channelName = ""
case this.channel:
of SonicChannel.Ingest: channelName = "ingest"
of SonicChannel.Search: channelName = "search"
of SonicChannel.Control: channelName = "control"
let msg = fmt"START {channelName} {this.password} \r\n"
await this.socket.send(msg) #### start
discard await this.socket.recvLine() #### started. FIXME extract protocol bufsize
## TODO: this.parseSessionMeta(line)
- We use the multisync pragma to support both async and sync APIs (check the redis client chapter for more info).
- According to the wire protocol we just send the raw string START SPACE CHANNEL_NAME SPACE SONIC_PASSWORD and terminate it with \r\n.
- When we receive data we should parse the protocol version and the bufsize and set them on our Sonic client (this); see the sketch below.
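That parsing is left as a TODO in the code above; here is one possible sketch of such a parseSessionMeta helper (the name comes from the TODO comment; the parsing details are my assumption based on the STARTED line format):
proc parseSessionMeta(this: Sonic|AsyncSonic, line: string) =
  # line looks like: STARTED search protocol(1) buffer(20000)
  for word in line.splitWhitespace():
    if word.startsWith("protocol("):
      this.protocol = parseInt(word[word.find("(")+1 ..< word.find(")")])
    elif word.startsWith("buffer("):
      this.bufSize = parseInt(word[word.find("(")+1 ..< word.find(")")])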
Sending/Receiving data
proc receiveManaged*(this:Sonic|AsyncSonic, size=1): Future[string] {.multisync.} =
when this is Sonic:
if this.timeout == 0:
result = this.socket.recvLine()
else:
result = this.socket.recvLine(timeout=this.timeout)
else:
result = await this.socket.recvLine()
result = raiseForError(result.strip())
proc execCommand*(this: Sonic|AsyncSonic, command: string, args:seq[string]): Future[string] {.multisync.} =
let cmdArgs = concat(@[command], args)
let cmdStr = join(cmdArgs, " ").strip()
await this.socket.send(cmdStr & "\r\n")
result = await this.receiveManaged()
proc execCommand*(this: Sonic|AsyncSonic, command: string): Future[string] {.multisync.} =
result = await this.execCommand(command, @[""])
Here we have a couple of helpers: execCommand to send data on the wire and receiveManaged to receive data.
- We only support a timeout for the sync client (there's a withTimeout for async that the user can try to implement).
Now we have everything we need to interact with the sonic server, but not with a user-friendly API. We can do better by converting the results to nim data structures or booleans when suitable.
User-friendly APIs
Ping
checks the server endpoint
proc ping*(this: Sonic|AsyncSonic): Future[bool] {.multisync.} =
## Send ping command to the server
## Returns:
## bool True if successfully reaching the server.
result = (await this.execCommand("PING")) == "PONG"
Quit
Ends the connection
proc quit*(this: Sonic|AsyncSonic): Future[string] {.multisync.} =
## Quit the channel and closes the connection.
result = await this.execCommand("QUIT")
this.socket.close()
Push
Pushes search data into the index
proc push*(this: Sonic|AsyncSonic, collection, bucket, objectName, text: string, lang=""): Future[bool] {.multisync.} =
## Push search data in the index
## - collection: index collection (ie. what you search in, eg. messages, products, etc.)
## - bucket: index bucket name (ie. user-specific search classifier in the collection if you have any eg. user-1, user-2, .., otherwise use a common bucket name eg. generic, default, common, ..)
## - objectName: object identifier that refers to an entity in an external database, where the searched object is stored (eg. you use Sonic to index CRM contacts by name; full CRM contact data is stored in a MySQL database; in this case the object identifier in Sonic will be the MySQL primary key for the CRM contact)
## - text: search text to be indexed can be a single word, or a longer text; within maximum length safety limits
## - lang: ISO language code
## Returns:
## bool True if search data are pushed in the index.
var langString = ""
if lang != "":
langString = fmt"LANG({lang})"
let text = quoteText(text)
result = (await this.execCommand("PUSH", @[collection, bucket, objectName, text, langString]))=="OK"
Pop
Pops search data from the index
proc pop*(this: Sonic|AsyncSonic, collection, bucket, objectName, text: string): Future[int] {.multisync.} =
## Pop search data from the index
## - collection: index collection (ie. what you search in, eg. messages, products, etc.)
## - bucket: index bucket name (ie. user-specific search classifier in the collection if you have any eg. user-1, user-2, .., otherwise use a common bucket name eg. generic, default, common, ..)
## - objectName: object identifier that refers to an entity in an external database, where the searched object is stored (eg. you use Sonic to index CRM contacts by name; full CRM contact data is stored in a MySQL database; in this case the object identifier in Sonic will be the MySQL primary key for the CRM contact)
## - text: search text to be indexed can be a single word, or a longer text; within maximum length safety limits
## Returns:
## int
let text = quoteText(text)
let resp = await this.execCommand("POP", @[collection, bucket, objectName, text])
result = resp.split()[^1].parseInt()
Count
Count the indexed data
proc count*(this: Sonic|AsyncSonic, collection, bucket, objectName: string): Future[int] {.multisync.} =
## Count indexed search data
## - collection: index collection (ie. what you search in, eg. messages, products, etc.)
## - bucket: index bucket name (ie. user-specific search classifier in the collection if you have any eg. user-1, user-2, .., otherwise use a common bucket name eg. generic, default, common, ..)
## - objectName: object identifier that refers to an entity in an external database, where the searched object is stored (eg. you use Sonic to index CRM contacts by name; full CRM contact data is stored in a MySQL database; in this case the object identifier in Sonic will be the MySQL primary key for the CRM contact)
## Returns:
## int count of index search data.
var bucketString = ""
if bucket != "":
bucketString = bucket
var objectNameString = ""
if objectName != "":
objectNameString = objectName
result = parseInt(await this.execCommand("COUNT", @[collection, bucket, objectName]))
flush
Generic flush to be called from flushCollection, flushBucket, flushObject
proc flush*(this: Sonic|AsyncSonic, collection: string, bucket="", objectName=""): Future[int] {.multisync.} =
## Flush indexed data in a collection, bucket, or in an object.
## - collection: index collection (ie. what you search in, eg. messages, products, etc.)
## - bucket: index bucket name (ie. user-specific search classifier in the collection if you have any eg. user-1, user-2, .., otherwise use a common bucket name eg. generic, default, common, ..)
## - objectName: object identifier that refers to an entity in an external database, where the searched object is stored (eg. you use Sonic to index CRM contacts by name; full CRM contact data is stored in a MySQL database; in this case the object identifier in Sonic will be the MySQL primary key for the CRM contact)
## Returns:
## int number of flushed data
if bucket == "" and objectName=="":
result = await this.flushCollection(collection)
elif bucket != "" and objectName == "":
result = await this.flushBucket(collection, bucket)
elif objectName != "" and bucket != "":
result = await this.flushObject(collection, bucket, objectName)
flushCollection
Flushes all the indexed data from a collection
proc flushCollection*(this: Sonic|AsyncSonic, collection: string): Future[int] {.multisync.} =
## Flush all indexed data from a collection
## - collection index collection (ie. what you search in, eg. messages, products, etc.)
## Returns:
## int number of flushed data
result = (await this.execCommand("FLUSHC", @[collection])).parseInt
flushBucket
Flushes all indexed data from a bucket in a collection
proc flushBucket*(this: Sonic|AsyncSonic, collection, bucket: string): Future[int] {.multisync.} =
## Flush all indexed data from a bucket in a collection
## - collection: index collection (ie. what you search in, eg. messages, products, etc.)
## - bucket: index bucket name (ie. user-specific search classifier in the collection if you have any eg. user-1, user-2, .., otherwise use a common bucket name eg. generic, default, common, ..)
## Returns:
## int number of flushed data
result = (await this.execCommand("FLUSHB", @[collection, bucket])).parseInt
flushObject
Flushes all indexed data from an object in a bucket in collection
proc flushObject*(this: Sonic|AsyncSonic, collection, bucket, objectName: string): Future[int] {.multisync.} =
## Flush all indexed data from an object in a bucket in collection
## - collection: index collection (ie. what you search in, eg. messages, products, etc.)
## - bucket: index bucket name (ie. user-specific search classifier in the collection if you have any eg. user-1, user-2, .., otherwise use a common bucket name eg. generic, default, common, ..)
## - objectName: object identifier that refers to an entity in an external database, where the searched object is stored (eg. you use Sonic to index CRM contacts by name; full CRM contact data is stored in a MySQL database; in this case the object identifier in Sonic will be the MySQL primary key for the CRM contact)
## Returns:
## int number of flushed data
result = (await this.execCommand("FLUSHO", @[collection, bucket, objectName])).parseInt
Query
Queries sonic and returns a list of results.
proc query*(this: Sonic|AsyncSonic, collection, bucket, terms: string, limit=10, offset: int=0, lang=""): Future[seq[string]] {.multisync.} =
## Query the database
## - collection index collection (ie. what you search in, eg. messages, products, etc.)
## - bucket index bucket name (ie. user-specific search classifier in the collection if you have any eg. user-1, user-2, .., otherwise use a common bucket name eg. generic, default, common, ..)
## - terms text for search terms
## - limit a positive integer number; set within allowed maximum & minimum limits
## - offset a positive integer number; set within allowed maximum & minimum limits
## - lang an ISO 639-3 locale code eg. eng for English (if set, the locale must be a valid ISO 639-3 code; if not set, the locale will be guessed from text).
## Returns:
## list list of objects ids.
let limitString = fmt"LIMIT({limit})"
var langString = ""
if lang != "":
langString = fmt"LANG({lang})"
let offsetString = fmt"OFFSET({offset})"
let termsString = quoteText(terms)
discard await this.execCommand("QUERY", @[collection, bucket, termsString, limitString, offsetString, langString])
let resp = await this.receiveManaged()
result = resp.splitWhitespace()[3..^1]
Suggest
autocompletes a word using a collection and a bucket.
proc suggest*(this: Sonic|AsyncSonic, collection, bucket, word: string, limit=10): Future[seq[string]] {.multisync.} =
## auto-completes word.
## - collection index collection (ie. what you search in, eg. messages, products, etc.)
## - bucket index bucket name (ie. user-specific search classifier in the collection if you have any eg. user-1, user-2, .., otherwise use a common bucket name eg. generic, default, common, ..)
## - word word to autocomplete
## - limit a positive integer number; set within allowed maximum & minimum limits (default: {None})
## Returns:
## list list of suggested words.
var limitString = fmt"LIMIT({limit})"
let wordString = quoteText(word)
discard await this.execCommand("SUGGEST", @[collection, bucket, wordString, limitString])
let resp = await this.receiveManaged()
result = resp.splitWhitespace()[3..^1]
Test code to use
when isMainModule:
proc testIngest() =
var cl = open("127.0.0.1", 1491, "dmdm", SonicChannel.Ingest)
echo $cl.execCommand("PING")
echo cl.ping()
echo cl.protocol
echo cl.bufsize
echo cl.push("wiki", "articles", "article-1",
"for the love of god hell")
echo cl.pop("wiki", "articles", "article-1",
"for the love of god hell")
echo cl.pop("wikis", "articles", "article-1",
"for the love of god hell")
echo cl.push("wiki", "articles", "article-2",
"for the love of satan heaven")
echo cl.push("wiki", "articles", "article-3",
"for the love of lorde hello")
echo cl.push("wiki", "articles", "article-4",
"for the god of loaf helmet")
proc testSearch() =
var cl = open("127.0.0.1", 1491, "dmdm", SonicChannel.Search)
echo $cl.execCommand("PING")
echo cl.ping()
echo cl.query("wiki", "articles", "for")
echo cl.query("wiki", "articles", "love")
echo cl.suggest("wiki", "articles", "hell")
echo cl.suggest("wiki", "articles", "lo")
proc testControl() =
var cl = open("127.0.0.1", 1491, "dmdm", SonicChannel.Control)
echo $cl.execCommand("PING")
echo cl.ping()
echo cl.trigger("consolidate")
testIngest()
testSearch()
testControl()
Code is available on xmonader/nim-sonic-client. Feel free to send me a PR or open an issue.
Day 18: From a socket to a Webframework
Today we will be focusing on building a web framework, starting from a socket :)
What to expect
proc main() =
var router = newRouter()
let loggingMiddleware = proc(request: var Request): (ref Response, bool) =
let path = request.path
let headers = request.headers
echo "==============================="
echo "from logger handler"
echo "path: " & path
echo "headers: " & $headers
echo "==============================="
return (newResponse(), true)
let trimTrailingSlash = proc(request: var Request): (ref Response, bool) =
let path = request.path
if path.endswith("/"):
request.path = path[0..^2]
echo "==============================="
echo "from slash trimmer "
echo "path was : " & path
echo "path: " & request.path
echo "==============================="
return (newResponse(), true)
proc handleHello(req:var Request): ref Response =
result = newResponse()
result.code = Http200
result.content = "hello world from handler /hello" & $req
router.addRoute("/hello", handleHello)
let assertJwtFieldExists = proc(request: var Request): (ref Response, bool) =
echo $request.headers
let jwtHeaderVals = request.headers.getOrDefault("jwt", @[""])
let jwt = jwtHeaderVals[0]
echo "================\n\njwt middleware"
if jwt.len != 0:
echo fmt"bye bye {jwt} "
else:
echo fmt"sure bye but i didn't get ur name"
echo "===================\n\n"
return (newResponse(), true)
router.addRoute("/bye", handleHello, HttpGet, @[assertJwtFieldExists])
proc handleGreet(req:var Request): ref Response =
result = newResponse()
result.code = Http200
result.content = "generic greet" & $req
router.addRoute("/greet", handleGreet, HttpGet, @[])
router.addRoute("/greet/:username", handleGreet, HttpGet, @[])
router.addRoute("/greet/:first/:second/:lang", handleGreet, HttpGet, @[])
let opts = ServerOptions(address:"127.0.0.1", port:9000.Port)
var s = newServy(opts, router, @[loggingMiddleware, trimTrailingSlash])
asyncCheck s.serve()
echo "servy started..."
runForever()
main()
defining a handler and wiring it to one or more patterns
proc handleHello(req:var Request): ref Response =
result = newResponse()
result.code = Http200
result.content = "hello world from handler /hello" & $req
router.addRoute("/hello", handleHello)
proc handleGreet(req:var Request): ref Response =
result = newResponse()
result.code = Http200
result.content = "generic greet" & $req
router.addRoute("/greet", handleGreet, HttpGet, @[])
router.addRoute("/greet/:username", handleGreet, HttpGet, @[])
router.addRoute("/greet/:first/:second/:lang", handleGreet, HttpGet, @[])
defining/registering middlewares on the server globally
let loggingMiddleware = proc(request: var Request): (ref Response, bool) =
let path = request.path
let headers = request.headers
echo "==============================="
echo "from logger handler"
echo "path: " & path
echo "headers: " & $headers
echo "==============================="
return (newResponse(), true)
let trimTrailingSlash = proc(request: var Request): (ref Response, bool) =
let path = request.path
if path.endswith("/"):
request.path = path[0..^2]
echo "==============================="
echo "from slash trimmer "
echo "path was : " & path
echo "path: " & request.path
echo "==============================="
return (newResponse(), true)
var s = newServy(opts, router, @[loggingMiddleware, trimTrailingSlash])
defining middlewares (request filters on certain routes)
router.addRoute("/bye", handleHello, HttpGet, @[assertJwtFieldExists])
Sounds like a lot. Let's get to it.
Implementation
The big picture
proc newServy(options: ServerOptions, router:ref Router, middlewares:seq[MiddlewareFunc]): ref Servy =
result = new Servy
result.options = options
result.router = router
result.middlewares = middlewares
result.sock = newAsyncSocket()
result.sock.setSockOpt(OptReuseAddr, true)
We have a server listening on a socket/address (which should be configurable), with a router that knows which pattern should be handled by which handler, and a set of middlewares to be used.
proc serve(s: ref Servy) {.async.} =
s.sock.bindAddr(s.options.port)
s.sock.listen()
while true:
let client = await s.sock.accept()
asyncCheck s.handleClient(client)
runForever()
we accept a connection and pass it to the handleClient proc
proc handleClient(s: ref Servy, client: AsyncSocket) {.async.} =
## code to read request from the user
var req = await s.parseRequestFromConnection(client)
...
echo "received request from client: " & $req
## code to get the route handler
let (routeHandler, params) = s.router.getByPath(req.path)
req.urlParams = params
let handler = routeHandler.handlerFunc
..
## call the handler and return response in valid http protocol format
let resp = handler(req)
echo "reached the handler safely.. and executing now."
await client.send(resp.format())
echo $req.formData
handleClient reads the data from the wire (in HTTP protocol format), finds the handler for the requested path, then formats a valid http response and writes it on the wire. Cool? Awesome!
Example HTTP requests and responses
When you execute curl httpbin.org/get -v, the following (http formatted) request is sent to the httpbin.org webserver
GET /get HTTP/1.1
Host: httpbin.org
User-Agent: curl/7.62.0-DEV
That is called a Request. It has a request line METHOD PATH HTTPVERSION, e.g. GET /get HTTP/1.1, followed by a list of header lines, each one a Key: value pair separated by a colon, e.g.
Host: httpbin.org
a header naming the target host, and
User-Agent: curl/7.62.0-DEV
a header indicating the client type.
As soon as the server receives that request it'll handle it as it was told to, and reply with something like:
HTTP/1.1 200 OK
Content-Type: application/json
Date: Mon, 21 Oct 2019 18:28:13 GMT
Server: nginx
Content-Length: 206
{
"args": {},
"headers": {
"Accept": "*/*",
"Host": "httpbin.org",
"User-Agent": "curl/7.62.0-DEV"
},
"origin": "197.52.178.58, 197.52.178.58",
"url": "https://httpbin.org/get"
}
This is called a Response. A response consists of:
- a status line: HTTPVER STATUS_CODE STATUS_MESSAGE, e.g. HTTP/1.1 200 OK
- a list of headers:
- Content-Type: application/json, the type of the content
- Date: Mon, 21 Oct 2019 18:28:13 GMT, the date of the response
- Server: nginx, the server name
- Content-Length: 206, the length of the upcoming body
Now let's go over the abstractions needed
Http Version
There are multiple http specifications: 0.9, 1.0, 1.1, .. so let's start with that. A simple enum should be enough
type
HttpVersion* = enum
HttpVer11,
HttpVer10
proc `$`(ver:HttpVersion): string =
case ver
of HttpVer10: result="HTTP/1.0"
of HttpVer11: result="HTTP/1.1"
HttpMethods
We all know the GET, POST, HEAD, .. methods; again, they can be represented by a simple enum
type
HttpMethod* = enum ## the requested HttpMethod
HttpHead, ## Asks for the response identical to the one that would
## correspond to a GET request, but without the response
## body.
HttpGet, ## Retrieves the specified resource.
HttpPost, ## Submits data to be processed to the identified
## resource. The data is included in the body of the
## request.
HttpPut, ## Uploads a representation of the specified resource.
HttpDelete, ## Deletes the specified resource.
HttpTrace, ## Echoes back the received request, so that a client
## can see what intermediate servers are adding or
## changing in the request.
HttpOptions, ## Returns the HTTP methods that the server supports
## for specified address.
HttpConnect, ## Converts the request connection to a transparent
## TCP/IP tunnel, usually used for proxies.
HttpPatch ## Applies partial modifications to a resource.
proc httpMethodFromString(txt: string): Option[HttpMethod] =
let s2m = {"GET": HttpGet, "POST": HttpPost, "PUT":HttpPut, "PATCH": HttpPatch, "DELETE": HttpDelete, "HEAD":HttpHead}.toTable
if txt in s2m:
result = some(s2m[txt.toUpper])
else:
result = none(HttpMethod)
Also we add httpMethodFromString, which takes a string and returns an Option[HttpMethod] value.
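For example (note that the lookup table uses uppercase keys, so the method name is expected in uppercase):
doAssert httpMethodFromString("GET").get() == HttpGet
doAssert httpMethodFromString("BANANA").isNone()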
Http Code
The HTTP specification defines certain response codes (status codes) to indicate the state of the request
- 20X -> it's fine
- 30X -> redirections
- 40X -> client messed up
- 50X -> server messed up
HttpCode* = distinct range[0 .. 599]
const
Http200* = HttpCode(200)
Http201* = HttpCode(201)
Http202* = HttpCode(202)
Http203* = HttpCode(203)
...
Http300* = HttpCode(300)
Http301* = HttpCode(301)
Http302* = HttpCode(302)
Http303* = HttpCode(303)
..
Http400* = HttpCode(400)
Http401* = HttpCode(401)
Http403* = HttpCode(403)
Http404* = HttpCode(404)
Http405* = HttpCode(405)
Http406* = HttpCode(406)
...
Http451* = HttpCode(451)
Http500* = HttpCode(500)
...
proc `$`*(code: HttpCode): string =
## Converts the specified ``HttpCode`` into a HTTP status.
##
## For example:
##
## .. code-block:: nim
## doAssert($Http404 == "404 Not Found")
case code.int
..
of 200: "200 OK"
of 201: "201 Created"
of 202: "202 Accepted"
of 204: "204 No Content"
of 205: "205 Reset Content"
...
of 301: "301 Moved Permanently"
of 302: "302 Found"
of 303: "303 See Other"
..
of 400: "400 Bad Request"
of 401: "401 Unauthorized"
of 403: "403 Forbidden"
of 404: "404 Not Found"
of 405: "405 Method Not Allowed"
of 406: "406 Not Acceptable"
of 408: "408 Request Timeout"
of 409: "409 Conflict"
of 410: "410 Gone"
of 411: "411 Length Required"
of 413: "413 Request Entity Too Large"
of 414: "414 Request-URI Too Long"
of 415: "415 Unsupported Media Type"
of 416: "416 Requested Range Not Satisfiable"
of 429: "429 Too Many Requests"
...
of 500: "500 Internal Server Error"
of 501: "501 Not Implemented"
of 502: "502 Bad Gateway"
of 503: "503 Service Unavailable"
of 504: "504 Gateway Timeout"
...
else: $(int(code))
The code above is taken from pure/http in the nim stdlib.
headers
Another abstraction we need is the headers list. Headers in http aren't just key=value but key=[value], so a key can have a list of values.
type HttpHeaders* = ref object
table*: TableRef[string, seq[string]]
type HttpHeaderValues* = seq[string]
proc newHttpHeaders*(): HttpHeaders =
new result
result.table = newTable[string, seq[string]]()
proc newHttpHeaders*(keyValuePairs:
seq[tuple[key: string, val: string]]): HttpHeaders =
var pairs: seq[tuple[key: string, val: seq[string]]] = @[]
for pair in keyValuePairs:
pairs.add((pair.key.toLowerAscii(), @[pair.val]))
new result
result.table = newTable[string, seq[string]](pairs)
proc `$`*(headers: HttpHeaders): string =
return $headers.table
proc clear*(headers: HttpHeaders) =
headers.table.clear()
proc `[]`*(headers: HttpHeaders, key: string): HttpHeaderValues =
## Returns the values associated with the given ``key``. If the returned
## values are passed to a procedure expecting a ``string``, the first
## value is automatically picked. If there are
## no values associated with the key, an exception is raised.
##
## To access multiple values of a key, use the overloaded ``[]`` below or
## to get all of them access the ``table`` field directly.
return headers.table[key.toLowerAscii].HttpHeaderValues
# converter toString*(values: HttpHeaderValues): string =
# return seq[string](values)[0]
proc `[]`*(headers: HttpHeaders, key: string, i: int): string =
## Returns the ``i``'th value associated with the given key. If there are
## no values associated with the key or the ``i``'th value doesn't exist,
## an exception is raised.
return headers.table[key.toLowerAscii][i]
proc `[]=`*(headers: HttpHeaders, key, value: string) =
## Sets the header entries associated with ``key`` to the specified value.
## Replaces any existing values.
headers.table[key.toLowerAscii] = @[value]
proc `[]=`*(headers: HttpHeaders, key: string, value: seq[string]) =
## Sets the header entries associated with ``key`` to the specified list of
## values.
## Replaces any existing values.
headers.table[key.toLowerAscii] = value
proc add*(headers: HttpHeaders, key, value: string) =
## Adds the specified value to the specified key. Appends to any existing
## values associated with the key.
if not headers.table.hasKey(key.toLowerAscii):
headers.table[key.toLowerAscii] = @[value]
else:
headers.table[key.toLowerAscii].add(value)
proc del*(headers: HttpHeaders, key: string) =
## Delete the header entries associated with ``key``
headers.table.del(key.toLowerAscii)
iterator pairs*(headers: HttpHeaders): tuple[key, value: string] =
## Yields each key, value pair.
for k, v in headers.table:
for value in v:
yield (k, value)
proc contains*(values: HttpHeaderValues, value: string): bool =
## Determines if ``value`` is one of the values inside ``values``. Comparison
## is performed without case sensitivity.
for val in seq[string](values):
if val.toLowerAscii == value.toLowerAscii: return true
proc hasKey*(headers: HttpHeaders, key: string): bool =
return headers.table.hasKey(key.toLowerAscii())
proc getOrDefault*(headers: HttpHeaders, key: string,
default = @[""].HttpHeaderValues): HttpHeaderValues =
## Returns the values associated with the given ``key``. If there are no
## values associated with the key, then ``default`` is returned.
if headers.hasKey(key):
return headers[key]
else:
return default
proc len*(headers: HttpHeaders): int = return headers.table.len
proc parseList(line: string, list: var seq[string], start: int): int =
var i = 0
var current = ""
while start+i < line.len and line[start + i] notin {'\c', '\l'}:
i += line.skipWhitespace(start + i)
i += line.parseUntil(current, {'\c', '\l', ','}, start + i)
list.add(current)
if start+i < line.len and line[start + i] == ',':
i.inc # Skip ,
current.setLen(0)
proc parseHeader*(line: string): tuple[key: string, value: seq[string]] =
## Parses a single raw header HTTP line into key value pairs.
##
## Used by ``asynchttpserver`` and ``httpclient`` internally and should not
## be used by you.
result.value = @[]
var i = 0
i = line.parseUntil(result.key, ':')
inc(i) # skip :
if i < len(line):
i += parseList(line, result.value, i)
elif result.key.len > 0:
result.value = @[""]
else:
result.value = @[]
So we have the abstraction over the headers now. Very nice.
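A quick usage sketch of this headers API (assuming the procs above are in scope):
var headers = newHttpHeaders()
headers["Content-Type"] = "application/json"    # set (and replace) a value
headers.add("Accept", "text/html")              # append values under one key
headers.add("Accept", "application/json")
doAssert headers["accept", 1] == "application/json"   # keys are lowercased internally
doAssert headers.getOrDefault("jwt")[0] == ""          # missing key falls back to the default
echo $headers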
Request
type Request = object
httpMethod*: HTTPMethod
httpVersion*: HttpVersion
headers*: HTTPHeaders
path*: string
body*: string
queryParams*: TableRef[string, string]
formData*: TableRef[string, string]
urlParams*: TableRef[string, string]
Request is a type that keeps track of:
- httpVersion: the http version from the client request
- httpMethod: the request method (get, post, .. etc)
- path: the requested path; if the url is localhost:9000/users/myfile the requested path would be /users/myfile
- headers: the request headers
- body: the request body
- formData: the submitted form data
- queryParams: if the url is /users/search?name=xmon&age=50 then queryParams will be the Table {"name": "xmon", "age": "50"}
- urlParams: the variables captured by the router; if we have a route handling /users/:username/:language and we receive a request with path /users/xmon/ar it will bind username to xmon and language to ar and make them available on the request object to be used later on by the handler (illustrated below).
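Just to illustrate what the router is expected to produce for urlParams (a hypothetical example, not the router code itself):
# For the route /users/:username/:language and the request path /users/xmon/ar
# the router should fill urlParams roughly like this:
let urlParams = newTable[string, string]()
urlParams["username"] = "xmon"
urlParams["language"] = "ar"
doAssert urlParams["username"] == "xmon"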
Building the request
Remember the handleClient proc that we mentioned in the big picture section?
proc handleClient(s: ref Servy, client: AsyncSocket) {.async.} =
var req = await s.parseRequestFromConnection(client)
...
So let's implement parseRequestFromConnection
proc parseRequestFromConnection(s: ref Servy, conn:AsyncSocket): Future[Request] {.async.} =
result.queryParams = newTable[string, string]()
result.formData = newTable[string, string]()
result.urlParams = newTable[string, string]()
let requestline = $await conn.recvLine(maxLength=maxLine)
var meth, path, httpver: string
var parts = requestLine.splitWhitespace()
meth = parts[0]
path = parts[1]
httpver = parts[2]
var contentLength = 0
echo meth, path, httpver
let m = httpMethodFromString(meth)
if m.isSome:
result.httpMethod = m.get()
else:
echo meth
raise newException(OSError, "invalid httpmethod")
if "1.1" in httpver:
result.httpVersion = HttpVer11
elif "1.0" in httpver:
result.httpVersion = HttpVer10
result.path = path
if "?" in path:
# has query params
result.queryParams = parseQueryParams(path)
First we parse the request line METHOD PATH HTTPVER, e.g. GET /users HTTP/1.1; if we split on spaces we get the method, the path, and the http version.
Also, if there's a ? in the request path, like in /users?username=xmon, we should parse the query parameters.
proc parseQueryParams(content: string): TableRef[string, string] =
result = newTable[string, string]()
var consumed = 0
if "?" notin content and "=" notin content:
return
if "?" in content:
consumed += content.skipUntil({'?'}, consumed)
inc consumed # skip ? now.
while consumed < content.len:
if "=" notin content[consumed..^1]:
break
var key = ""
var val = ""
consumed += content.parseUntil(key, "=", consumed)
inc consumed # =
consumed += content.parseUntil(val, "&", consumed)
inc consumed
# result[decodeUrl(key)] = result[decodeUrl(val)]
result.add(decodeUrl(key), decodeUrl(val))
echo "consumed:" & $consumed
echo "contentlen:" & $content.len
Next should be the headers
result.headers = newHttpHeaders()
# parse headers
var line = ""
line = $(await conn.recvLine(maxLength=maxLine))
echo fmt"line: >{line}< "
while line != "\r\n":
# a header line
let kv = parseHeader(line)
result.headers[kv.key] = kv.value
if kv.key.toLowerAscii == "content-length":
contentLength = parseInt(kv.value[0])
line = $(await conn.recvLine(maxLength=maxLine))
# echo fmt"line: >{line}< "
We receive the headers and figure out the body length from the content-length header, to know how much to consume from the socket after we're done with the headers.
if contentLength > 0:
result.body = await conn.recv(contentLength)
discard result.parseFormData()
Now that we know how much to consume from the socket (contentLength), we can capture the request's body. Notice that parseFormData handles the form submitted in the request; let's take a look at that next.
Submitting data.
In HTTP there are different Content-Type(s) to submit (post) data: application/x-www-form-urlencoded and multipart/form-data.
Quoting a stackoverflow answer:
The purpose of both of those types of requests is to send a list of name/value pairs to the server. Depending on the type and amount of data being transmitted, one of the methods will be more efficient than the other. To understand why, you have to look at what each is doing under the covers.
For application/x-www-form-urlencoded, the body of the HTTP message sent to the server is essentially one giant query string -- name/value pairs are separated by the ampersand (&), and names are separated from values by the equals symbol (=). An example of this would be:
MyVariableOne=ValueOne&MyVariableTwo=ValueTwo
That means that for each non-alphanumeric byte that exists in one of our values, it's going to take three bytes to represent it. For large binary files, tripling the payload is going to be highly inefficient.
That's where multipart/form-data comes in. With this method of transmitting name/value pairs, each pair is represented as a "part" in a MIME message (as described by other answers). Parts are separated by a particular string boundary (chosen specifically so that this boundary string does not occur in any of the "value" payloads). Each part has its own set of MIME headers like Content-Type, and particularly Content-Disposition, which can give each part its "name." The value piece of each name/value pair is the payload of each part of the MIME message. The MIME spec gives us more options when representing the value payload -- we can choose a more efficient encoding of binary data to save bandwidth (e.g. base 64 or even raw binary).
e.g:
If you want to send the following data to the web server:
name = John
age = 12
using application/x-www-form-urlencoded
would be like this:
name=John&age=12
As you can see, the server knows that parameters are separated by an ampersand &. If & is required for a parameter value then it must be encoded.
So how does the server know where a parameter value starts and ends when it receives an HTTP request using multipart/form-data?
Using the boundary, similar to &.
For example:
--XXX
Content-Disposition: form-data; name="name"
John
--XXX
Content-Disposition: form-data; name="age"
12
--XXX--
reference of the above explanation
type FormPart = object
name*: string
headers*: HttpHeaders
body*: string
proc newFormPart(): ref FormPart =
new result
result.headers = newHttpHeaders()
proc `$`(this:ref FormPart): string =
result = fmt"partname: {this.name} partheaders: {this.headers} partbody: {this.body}"
type FormMultiPart = object
parts*: TableRef[string, ref FormPart]
proc newFormMultiPart(): ref FormMultiPart =
new result
result.parts = newTable[string, ref FormPart]()
proc `$`(this: ref FormMultiPart): string =
return fmt"parts: {this.parts}"
So that's our abstraction for multipart form.
proc parseFormData(r: Request): ref FormMultiPart =
discard """
received request from client: (httpMethod: HttpPost, requestURI: "", httpVersion: HTTP/1.1, headers: {"accept": @["*/*"], "content-length": @["241"], "content-type": @["multipart/form-data; boundary=------------------------95909933ebe184f2"], "host": @["127.0.0.1:9000"], "user-agent": @["curl/7.62.0-DEV"]}, path: "/post", body: "--------------------------95909933ebe184f2\c\nContent-Disposition: form-data; name=\"who\"\c\n\c\nhamada\c\n--------------------------95909933ebe184f2\c\nContent-Disposition: form-data; name=\"next\"\c\n\c\nhome\c\n--------------------------95909933ebe184f2--\c\n", raw_body: "", queryParams: {:})
"""
result = newFormMultiPart()
let contenttype = r.headers.getOrDefault("content-type")[0]
let body = r.body
if "form-urlencoded" in contenttype.toLowerAscii():
# query params are the post body
let postBodyAsParams = parseQueryParams(body)
for k, v in postBodyAsParams.pairs:
r.queryParams.add(k, v)
If the content type contains form-urlencoded we parse the body as if it were query params (the post body is one big query string).
elif contenttype.startsWith("multipart/") and "boundary" in contenttype:
var boundaryName = contenttype[contenttype.find("boundary=")+"boundary=".len..^1]
echo "boundayName: " & boundaryName
for partString in body.split(boundaryName & "\c\L"):
var part = newFormPart()
var partName = ""
var totalParsedLines = 1
let bodyLines = partString.split("\c\L")[1..^1] # skip the boundary line of this part
for line in bodyLines:
if line.strip().len != 0:
let splitted = line.split(": ")
if len(splitted) == 2:
part.headers.add(splitted[0], splitted[1])
elif len(splitted) == 1:
part.headers.add(splitted[0], "")
if "content-disposition" in line.toLowerAscii and "name" in line.toLowerAscii:
# Content-Disposition: form-data; name="next"
var consumed = line.find("name=")+"name=".len
discard line.skip("\"", consumed)
inc consumed
consumed += line.parseUntil(partName, "\"", consumed)
else:
break # done with headers now for the body.
inc totalParsedLines
let content = join(bodyLines[totalParsedLines..^1], "\c\L")
part.body = content
part.name = partName
result.parts.add(partName, part)
echo $result.parts
If it's not form-urlencoded then it's multipart: we extract the boundary from the content-type header and split the body on that boundary text, parsing each part's headers and body.
Response
Now that we can parse the client request we need to be able to build a correctly formatted response. Response keeps track of
- http version
- response status code
- response content
- response headers
type Response = object
headers: HttpHeaders
httpver: HttpVersion
code: HttpCode
content: string
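The handlers below call a newResponse() helper that isn't shown in this chapter; a minimal sketch consistent with how it's used (the chosen defaults are assumptions) could be:
# not shown in the chapter: construct a ref Response with sane defaults
proc newResponse(): ref Response =
  new result
  result.headers = newHttpHeaders()
  result.httpver = HttpVer11
  result.code = Http200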
Formatting response
proc formatStatusLine(code: HttpCode, httpver: HttpVersion) : string =
return fmt"{httpver} {code}" & "\r\n"
Here we build the status line, which is HTTPVERSION STATUS_CODE STATUS_MSG\r\n, e.g. HTTP/1.1 200 OK
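For example (Http200 and HttpVer11 come from httpcore):
echo formatStatusLine(Http200, HttpVer11)
# HTTP/1.1 200 OK   (followed by \r\n)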
proc formatResponse(code:HttpCode, httpver:HttpVersion, content:string, headers:HttpHeaders): string =
result &= formatStatusLine(code, httpver)
if headers.len > 0:
for k,v in headers.pairs:
result &= fmt"{k}: {v}" & "\r\n"
result &= fmt"Content-Length: {content.len}" & "\r\n\r\n"
result &= content
echo "will send"
echo result
proc format(resp: ref Response) : string =
result = formatResponse(resp.code, resp.httpver, resp.content, resp.headers)
To format a complete response we need to:
- build the status line
- serialize the headers
- set Content-Length to the length of the body
- append the body itself
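Putting it together, the wire format looks roughly like this (the header and body are just an example; the debug echos inside formatResponse will also print):
var headers = newHttpHeaders({"Content-Type": "text/plain"})
echo formatResponse(Http200, HttpVer11, "hello", headers)
# roughly:
# HTTP/1.1 200 OK
# Content-Type: text/plain
# Content-Length: 5
#
# hello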
Handling client request
So every handler function should take a Request object and return a Response to be sent on the wire, right?
proc handleClient(s: ref Servy, client: AsyncSocket) {.async.} =
var req = await s.parseRequestFromConnection(client)
...
let (routeHandler, params) = s.router.getByPath(req.path)
req.urlParams = params
let handler = routeHandler.handlerFunc
...
let resp = handler(req)
await client.send(resp.format())
Very cool: the router's getByPath proc will magically return a suitable route handler, or the 404 handler if nothing matches.
- We get the handler
- apply it to the request to get a valid http response
- send the response to the client on the wire.
Let's get to the Handler Function example definition again
proc handleHello(req:var Request): ref Response =
result = newResponse()
result.code = Http200
result.content = "hello world from handler /hello" & $req
So it takes a request and returns a response; how about we create a type alias for that?
type HandlerFunc = proc(req: var Request):ref Response {.nimcall.}
Middlewares
It's typical in many frameworks to apply a certain set of checks or transformations to the incoming request before it reaches any handler: logging the request, trimming trailing slashes, checking for a certain header, and so on.
How can we implement that? Remember our handleClient? Middlewares need to be applied before the request reaches the handler, so they should run above handler(req):
proc handleClient(s: ref Servy, client: AsyncSocket) {.async.} =
var req = await s.parseRequestFromConnection(client)
### HERE SHOULD BE MIDDLEWARE Code
###
###
let (routeHandler, params) = s.router.getByPath(req.path)
req.urlParams = params
let handler = routeHandler.handlerFunc
...
let resp = handler(req)
await client.send(resp.format())
So let's get to the implementation
proc handleClient(s: ref Servy, client: AsyncSocket) {.async.} =
var req = await s.parseRequestFromConnection(client)
for m in s.middlewares:
let (resp, usenextmiddleware) = m(req)
if not usenextmiddleware:
echo "early return from middleware..."
await client.send(resp.format())
return
...
let handler = routeHandler.handlerFunc
...
let resp = handler(req)
await client.send(resp.format())
Here we loop over all registered middlewares. A middleware:
- should return a response to be sent if it needs to terminate the handling immediately
- should tell us whether to continue applying the remaining middlewares or stop right away
That's why a middleware is defined like this:
let loggingMiddleware = proc(request: var Request): (ref Response, bool) =
let path = request.path
let headers = request.headers
echo "==============================="
echo "from logger handler"
echo "path: " & path
echo "headers: " & $headers
echo "==============================="
return (newResponse(), true)
Let's create an alias for middleware function so we can use it easily in the rest of our code
type MiddlewareFunc = proc(req: var Request): (ref Response, bool) {.nimcall.}
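For instance, a middleware that short-circuits requests missing a header might look like this (the x-token header name is made up for the example):
# reject requests that don't carry the (hypothetical) x-token header
let tokenMiddleware = proc(request: var Request): (ref Response, bool) =
  if not request.headers.hasKey("x-token"):
    var resp = newResponse()
    resp.code = Http401
    resp.content = "missing x-token header"
    return (resp, false)       # stop here and send this response
  return (newResponse(), true) # continue to the next middleware/handler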
Route specific middlewares
Above we talked about global application middlewares, but maybe we want to apply some middleware or filter only to a certain route.
proc handleClient(s: ref Servy, client: AsyncSocket) {.async.} =
var req = await s.parseRequestFromConnection(client)
for m in s.middlewares:
let (resp, usenextmiddleware) = m(req)
if not usenextmiddleware:
echo "early return from middleware..."
await client.send(resp.format())
return
echo "received request from client: " & $req
let (routeHandler, params) = s.router.getByPath(req.path)
req.urlParams = params
let handler = routeHandler.handlerFunc
let middlewares = routeHandler.middlewares
for m in middlewares:
let (resp, usenextmiddleware) = m(req)
if not usenextmiddleware:
echo "early return from route middleware..."
await client.send(resp.format())
return
let resp = handler(req)
echo "reached the handler safely.. and executing now."
await client.send(resp.format())
echo $req.formData
Notice that now we also have route-specific middlewares to apply before calling handler(req), maybe to check for a header before allowing access to that route.
Router
The router is one of the essential components in our code: it keeps track of the registered patterns, their handlers, and the filters/middlewares to apply, so we can actually do something with an incoming request.
type RouterValue = object
handlerFunc: HandlerFunc
middlewares:seq[MiddlewareFunc]
type Router = object
table: TableRef[string, RouterValue]
Basic definition of the router: it's a map from a URL pattern to a RouterValue, which holds a reference to the handler proc and a sequence of middlewares/filters.
proc newRouter(): ref Router =
result = new Router
result.table = newTable[string, RouterValue]()
Initializing the router
proc handle404(req: var Request): ref Response =
var resp = newResponse()
resp.code = Http404
resp.content = fmt"nothing at {req.path}"
return resp
A simple 404 handler used when we don't find a handler for the requested path.
proc getByPath(r: ref Router, path: string, notFoundHandler:HandlerFunc=handle404) : (RouterValue, TableRef[string, string]) =
var found = false
if path in r.table: # exact match
return (r.table[path], newTable[string, string]())
for handlerPath, routerValue in r.table.pairs:
echo fmt"checking handler: {handlerPath} if it matches {path}"
let pathParts = path.split({'/'})
let handlerPathParts = handlerPath.split({'/'})
echo fmt"pathParts {pathParts} and handlerPathParts {handlerPathParts}"
if len(pathParts) != len(handlerPathParts):
echo "length isn't ok"
continue
else:
var idx = 0
var capturedParams = newTable[string, string]()
while idx<len(pathParts):
let pathPart = pathParts[idx]
let handlerPathPart = handlerPathParts[idx]
echo fmt"current pathPart {pathPart} current handlerPathPart: {handlerPathPart}"
if handlerPathPart.startsWith(":") or handlerPathPart.startsWith("@"):
echo fmt"found var in path {handlerPathPart} matches {pathPart}"
capturedParams[handlerPathPart[1..^1]] = pathPart
inc idx
else:
if pathPart == handlerPathPart:
inc idx
else:
break
if idx == len(pathParts):
found = true
return (routerValue, capturedParams)
if not found:
return (RouterValue(handlerFunc:notFoundHandler, middlewares: @[]), newTable[string, string]())
Here we search the patterns registered in the router: either an exact match, or a pattern with variables whose values we capture. For example:
- the pattern /users/:name/:lang matches the request /users/xmon/ar and creates an env Table with {"name":"xmon", "lang":"ar"}
- the pattern /mywebsite/homepage matches the path /mywebsite/homepage exactly, with nothing to capture
- the pattern /blogs/:username matches both /blogs/xmon and /blogs/ahmed, capturing the variable username with the value xmon or ahmed
When we find the suitable handler and its captured env we set it on the request's urlParams field and call the handler on the updated request. Remember our handleClient proc?
proc handleClient(s: ref Servy, client: AsyncSocket) {.async.} =
var req = await s.parseRequestFromConnection(client)
## Global middlewares
## ..
## ..
let (routeHandler, params) = s.router.getByPath(req.path)
req.urlParams = params
let handler = routeHandler.handlerFunc
## Route middlewares.
## ..
## ..
let resp = handler(req)
await client.send(resp.format())
proc addHandler(router: ref Router, route: string, handler: HandlerFunc, httpMethod:HttpMethod=HttpGet, middlewares:seq[MiddlewareFunc]= @[]) =
router.table.add(route, RouterValue(handlerFunc:handler, middlewares:middlewares))
We provide a simple proc to register a handler for a route on a Router object, optionally setting the HTTP method and the route-specific middlewares.
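As a usage sketch, registering routes and resolving a path could look like this (reusing handleHello from above; route-specific middlewares would go in the last parameter):
var router = newRouter()
router.addHandler("/hello", handleHello)             # defaults: HttpGet, no route middlewares
router.addHandler("/users/:name/:lang", handleHello) # pattern with captured variables

# getByPath gives us back the handler plus the captured url params
let (_, params) = router.getByPath("/users/xmon/ar")
echo params["name"]   # xmon
echo params["lang"]   # ar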
What's next?
We didn't talk about templates, cookies, sessions, dates, or sending files, and this is by no means a complete HTTP server implementation. Jester is a great option to check out. Thank you for going through this day, and please feel free to send a PR or open an issue on the nim-servy repository.
Day 19: Wit.AI client
A Nim client for wit.ai, a service to easily create text- or voice-based bots that humans can chat with on their preferred messaging platform. It helps reduce expressions into an entity/trait pair: e.g. in your wit.ai project you define an entity like vm (virtual machine) and traits like create and stop, and when you send an expression like new virtual machine or fresh vm, wit.ai reduces it to the entity vm with the trait create.
What to expect
let tok = getEnv("WIT_ACCESS_TOKEN", "")
if tok == "":
echo "Make sure to set WIT_ACCESS_TOKEN variable"
quit 1
var inp = ""
var w = newWit(tok)
while true:
echo "Enter your query or q to quit > "
inp = stdin.readLine()
if inp == "q":
quit 0
else:
echo w.message(inp)
Enter your query or q to quit >
new vm
{"_text":"new vm","entities":{"vm":[{"confidence":0.97072907352305,"value":"create"}]},"msg_id":"1N6CURN7qaJaSKXSK"}
Enter your query or q to quit >
new machine
{"_text":"new machine","entities":{"vm":[{"confidence":0.90071815565634,"value":"create"}]},"msg_id":"1t8dOpkPbAP6SgW49"}
Enter your query or q to quit >
new docker
{"_text":"new docker","entities":{"container":[{"confidence":0.98475238333984,"value":"create"}]},"msg_id":"1l7ocY7MVWBfUijsm"}
Enter your query or q to quit >
stop machine
{"_text":"stop machine","entities":{"vm":[{"confidence":0.66323929848545,"value":"stop"}]},"msg_id":"1ygXLjnQbEt4lVMyS"}
Enter your query or q to quit >
show my coins
{"_text":"show my coins","entities":{"wallet":[{"confidence":0.75480999601329,"value":"show"}]},"msg_id":"1SdYOY60xXdMvUG7b"}
Enter your query or q to quit >
view coins
{"_text":"view coins","entities":{"wallet":[{"confidence":0.5975926583378,"value":"show"}]},"msg_id":"1HZ3YlfLlr31JlbKZ"}
Enter your query or q to quit >
Speech
echo w.speech("/home/striky/startnewvm.wav", {"Content-Type": "audio/wav"}.toTable)
{
"_text" : "start new the m is",
"entities" : {
"vm" : [ {
"confidence" : 0.54805678200202,
"value" : "create"
} ]
},
"msg_id" : "1jHMTJGHEAFh8LHFS"
}
Implementation
imports
import strformat, tables, json, strutils, sequtils, hashes, net, asyncdispatch,
asyncnet, os, parseutils, deques, options
import logging
import httpclient
import uri
var L = newConsoleLogger()
addHandler(L)
Here we import the utilities we are going to use (string formatting, tables, json, the http client, etc.) and prepare a default console logger.
Crafting wit.ai API
let WIT_API_HOST = getEnv("WIT_URL", "https://api.wit.ai")
let WIT_API_VERSION = getEnv("WIT_API_VERSION", "20160516")
let DEFAULT_MAX_STEPS = 5
To work with the wit.ai API you will need to generate an API token.
- WIT_API_HOST: the base URL for the wit.ai API. Notice it's https, so we will need the -d:ssl flag when compiling.
- WIT_API_VERSION: the API version to request from wit.ai.
We will be interested in the /message and /speech endpoints of the wit.ai API.
Adding authorization to HTTP Headers
proc getWitAIRequestHeaders*(accessToken: string): HttpHeaders =
result = newHttpHeaders({
"authorization": "Bearer " & accessToken,
"accept": "application/vnd.wit." & WIT_API_VERSION & "+json"
})
To authorize our requests against wit.ai we need to add an authorization header carrying the access token.
Encoding params helper
proc encodeQueryStringTable(qsTable: Table[string, string]): string =
result = ""
if qsTable.len == 0:
return result
result = "?"
var first = true
for k, v in qsTable.pairs:
if not first:
result &= "&"
result &= fmt"{k}={encodeUrl(v)}"
first = false
echo $result
return result
A helper to encode a table of key/value pairs into a query string like ?key1=val1&key2=val2.
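For example (encodeUrl comes from the uri module imported above):
echo encodeQueryStringTable({"q": "new vm", "n": "2"}.toTable)
# ?q=new+vm&n=2   (the key order may differ: Table iteration order isn't guaranteed)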
Let's get to the client
Here we define the interesting parts to interact with wit.ai
type WitException* = object of Exception
A generic exception type to raise on API errors.
type Wit* = ref object of RootObj
accessToken*: string
client*: HttpClient
proc newWit(accessToken: string): Wit =
var w = new Wit
w.accessToken = accessToken
w.client = newHttpClient()
result = w
The entry point for our Wit.AI client. The Wit object keeps track of:
- accessToken: the token used to access the API
- client: the HTTP client used underneath
proc newRequest(this: Wit, meth = HttpGet, path: string, params: Table[string,
string], body = "", headers: Table[string, string]): string =
let fullUrl = WIT_API_HOST & path & encodeQueryStringTable(params)
this.client.headers = getWitAIRequestHeaders(this.accessToken)
if headers.len > 0:
for k, v in headers:
this.client.headers[k] = v
var resp: Response
if body == "":
resp = this.client.request(fullUrl, httpMethod = meth)
else:
resp = this.client.request(fullUrl, httpMethod = meth, body = body)
if resp.code != 200.HttpCode:
raise newException(WitException, (fmt"[-] {resp.code}: {resp.body} "))
result = resp.body
A generic helper to build and send wit.ai requests. It does the following:
- prepares the headers with authorization using getWitAIRequestHeaders
- prepares the full URL using WIT_API_HOST and the query params sent
- issues the request (HttpGet or HttpPost, with a body if one is given) and raises WitException if the response's status code is not 200
- returns the response body
/message endpoint
According to the wit.ai docs, only the q param is required.
Definition
GET https://api.wit.ai/message
Example request with single outcome
$ curl -XGET 'https://api.wit.ai/message?v=20170307&q=how%20many%20people%20between%20Tuesday%20and%20Friday' \
-H 'Authorization: Bearer $TOKEN'
Example response
{
"msg_id": "387b8515-0c1d-42a9-aa80-e68b66b66c27",
"_text": "how many people between Tuesday and Friday",
"entities": {
"metric": [ {
"metadata": "{'code': 324}",
"value": "metric_visitor",
"confidence": 0.9231
} ],
"datetime": [
{
"confidence": 0.954105,
"values": [
{
"to": {
"value": "2018-12-22T00:00:00.000-08:00",
"grain": "day"
},
"from": {
"value": "2018-12-18T00:00:00.000-08:00",
"grain": "day"
},
"type": "interval"
},
{
"to": {
"value": "2018-12-29T00:00:00.000-08:00",
"grain": "day"
},
"from": {
"value": "2018-12-25T00:00:00.000-08:00",
"grain": "day"
},
"type": "interval"
},
{
"to": {
"value": "2019-01-05T00:00:00.000-08:00",
"grain": "day"
},
"from": {
"value": "2019-01-01T00:00:00.000-08:00",
"grain": "day"
},
"type": "interval"
}
],
"to": {
"value": "2018-12-22T00:00:00.000-08:00",
"grain": "day"
},
"from": {
"value": "2018-12-18T00:00:00.000-08:00",
"grain": "day"
},
"type": "interval"
}
]
}
}
proc message*(this: Wit, msg: string, context: ref Table[string, string] = nil,
n = "", verbose = ""): string =
var params = initTable[string, string]()
if n != "":
params["n"] = n
if verbose != "":
params["verbose"] = verbose
if msg != "":
params["q"] = msg
if not context.isNil and context.len > 0:
var ctxNode = %* {}
for k, v in context.pairs:
ctxNode[k] = %*v
params["context"] = ( %* ctxNode).pretty()
return this.newRequest(HttpGet, path = "/message", params, "", initTable[
string, string]())
Here we allow msg as the expression we want wit.ai to analyze, and add some extra params for a closer mapping to the official API: context, verbose and n.
- msg: the user's query; length must be > 0 and < 280
- verbose: a flag to get auxiliary information about entities, like the location within the sentence
- n: the maximum number of n-best trait entities you want back; the default is 1 and the maximum is 8
- context: context is key in natural language. For instance, at the same absolute instant, "today" resolves to a different value depending on the timezone of the user. It can contain locale, timezone and coords (coordinates).
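For instance, passing a context along with the query could look like this (the expression and the timezone value are just examples):
var ctx = newTable[string, string]()
ctx["timezone"] = "Africa/Cairo"
echo w.message("new vm", context = ctx, n = "2")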
/speech endpoint
proc speech*(this: Wit, audioFilePath: string, headers: Table[string, string],
context: ref Table[string, string] = nil, n = "", verbose = ""): string =
var params = initTable[string, string]()
if n != "":
params["n"] = n
if verbose != "":
params["verbose"] = verbose
if not context.isNil and context.len > 0:
var ctxNode = %* {}
for k, v in context.pairs:
ctxNode[k] = %*v
params["context"] = ( %* ctxNode).pretty()
let body = readFile(audioFilePath)
return this.newRequest(HttpPost, path = "/speech", params, body, headers)
Almost the same as the /message endpoint, except that we read the audio file and send its content as the request body, along with the appropriate Content-Type header.
Thanks
The complete sources can be found at nim-witai. Please feel free to contribute by opening a PR or an issue on the repo.
Day 20: CacheTable
Today we will implement key expiration on top of Nim tables.
What to expect
var c = newCacheTable[string, string](initDuration(seconds = 2))
c.setKey("name", "ahmed", initDuration(seconds = 10))
c.setKey("color", "blue", initDuration(seconds = 5))
c.setKey("akey", "a value", DefaultExpiration)
c.setKey("akey2", "a value2", DefaultExpiration)
c.setKey("lang", "nim", NeverExpires)
- Here we will create a new cache table from string to string
- We set the default expiration globally on the table to 2 seconds using a Duration object: newCacheTable[string, string](initDuration(seconds = 2))
- We can override the default expiration per key by passing a Duration object to setKey
- We can also set a key to NeverExpires
Here's a small example to see the internals of execution
for i in countup(0, 20):
echo "has key name? " & $c.hasKey("name")
echo $c.getCache
echo $c.get("name")
echo $c.get("color")
echo $c.get("lang")
echo $c.get("akey")
echo $c.get("akey2")
os.sleep(1*1000)
Implementation
Imports
import tables, times, os, options, locks
type Expiration* = enum NeverExpires, DefaultExpiration
We have two kinds of Expiration:
- NeverExpires: the key stays there forever
- DefaultExpiration: use whatever global expiration value is defined on the table
type Entry*[V] = object
value*: V
ttl*: int64
type CacheTable*[K, V] = ref object
cache: Table[K, Entry[V]]
lock*: locks.Lock
defaultExpiration*: Duration
proc newCacheTable*[K, V](defaultExpiration = initDuration(
seconds = 5)): CacheTable[K, V] =
## Create new CacheTable
result = CacheTable[K, V]()
result.cache = initTable[K, Entry[V]]()
result.defaultExpiration = defaultExpiration
The only difference between our CacheTable and Nim's Table is that the entries keep track of a Time To Live (TTL).
- Entry is a generic entry we store in the CacheTable; it has a value of type V and keeps track of its ttl
- CacheTable is a table from keys of type K to values of type Entry[V], and keeps track of the default expiration
- newCacheTable is a helper to create a new CacheTable
proc getCache*[K, V](t: CacheTable[K, V]): Table[K, Entry[V]] =
result = t.cache
a helper to get the underlying Table
proc setKey*[K, V](t: CacheTable[K, V], key: K, value: V, d: Duration) =
## Set ``Key`` of type ``K`` (needs to be hashable) to ``value`` of type ``V`` with duration ``d``
let rightnow = times.getTime()
let rightNowDur = times.initDuration(seconds = rightnow.toUnix(),
nanoseconds = rightnow.nanosecond)
let ttl = d.inNanoseconds + rightNowDur.inNanoseconds
let entry = Entry[V](value: value, ttl: ttl)
t.cache[key] = entry # overwrite any existing entry; Table.add would keep duplicate keys
a helper to set a new key in the CacheTable with a specific Duration
proc setKey*[K, V](t: CacheTable[K, V], key: K, value: V,
expiration: Expiration = NeverExpires) =
## Sets key with `Expiration` strategy
var entry: Entry[V]
case expiration:
of NeverExpires:
entry = Entry[V](value: value, ttl: 0)
t.cache[key] = entry
of DefaultExpiration:
t.setKey(key, value, d = t.defaultExpiration)
A helper to set a key based on an Expiration strategy:
- NeverExpires: the ttl is set to 0
- DefaultExpiration: the ttl is derived from the table's defaultExpiration duration
proc setKeyWithDefaultTtl*[K, V](t: CacheTable[K, V], key: K, value: V) =
## Sets a key with default Ttl duration.
t.setKey(key, value, DefaultExpiration)
sets a key to value with default expiration
proc hasKey*[K, V](t: CacheTable[K, V], key: K): bool =
## Checks if `key` exists in cache
result = t.cache.hasKey(key)
Check if the cache underneath has a specific key
proc isExpired(ttl: int64): bool =
if ttl == 0:
# echo "duration 0 never expires."
result = false
else:
let rightnow = times.getTime()
let rightNowDur = times.initDuration(seconds = rightnow.toUnix(),
nanoseconds = rightnow.nanosecond)
# echo "Now is : " & $rightnow
result = rightnowDur.inNanoseconds > ttl
A helper to check whether a ttl has expired relative to the current time (a ttl of 0 means the entry never expires).
proc get*[K, V](t: CacheTable[K, V], key: K): Option[V] =
## Get value of `key` from cache
var entry: Entry[V]
try:
entry = t.cache[key]
except:
return none(V)
# echo "getting entry for key: " & key & $entry
if not isExpired(entry.ttl):
# echo "k: " & key & " didn't expire"
return some(entry.value)
else:
# echo "k: " & key & " expired"
del(t.cache, key)
return none(V)
Getting a key from the cache returns an Option[V] wrapping the value of type V stored in the Entry[V]; expired entries are deleted and none(V) is returned.
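Callers then unwrap the Option, e.g.:
let name = c.get("name")
if name.isSome:
  echo "name is still cached: " & name.get()
else:
  echo "name expired or was never set"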
Thank you for reading! Please feel free to open an issue or a PR to improve the content of Nim Days :)
Parser combinators
Today we will learn about parser combinators in Nim. A parser is something (a function) that accepts some text and creates a decent structure out of it (that's not a formal definition by any means). The first time I learned about parser combinators was while learning Haskell (still learning, for sure), and I was amazed by the expressiveness and composability. Lots of languages have libraries based on parser combinators, e.g. Python's pyparsing:
from pyparsing import Word, alphas
greet = Word(alphas) + "," + Word(alphas) + "!"
hello = "Hello, World!"
print(hello, "->", greet.parseString(hello))
The program outputs the following:
Hello, World! -> ['Hello', ',', 'World', '!']
In this program we literally said we want to create a greet parser that's the combination of a Word of alphas, followed by a literal comma ,, then another Word of alphas, then a literal exclamation point !. That greet parser is only capable of parsing text that can be broken down into the small chunks (parsable parts) we mentioned.
Imagine that in Python you can express the JSON grammar using pyparsing in around 25 lines:
import pyparsing as pp
from pyparsing import pyparsing_common as ppc
def make_keyword(kwd_str, kwd_value):
return pp.Keyword(kwd_str).setParseAction(pp.replaceWith(kwd_value))
TRUE = make_keyword("true", True)
FALSE = make_keyword("false", False)
NULL = make_keyword("null", None)
LBRACK, RBRACK, LBRACE, RBRACE, COLON = map(pp.Suppress, "[]{}:")
jsonString = pp.dblQuotedString().setParseAction(pp.removeQuotes)
jsonNumber = ppc.number()
jsonObject = pp.Forward()
jsonValue = pp.Forward()
jsonElements = pp.delimitedList(jsonValue)
jsonArray = pp.Group(LBRACK + pp.Optional(jsonElements, []) + RBRACK)
jsonValue << (
jsonString | jsonNumber | pp.Group(jsonObject) | jsonArray | TRUE | FALSE | NULL
)
memberDef = pp.Group(jsonString + COLON + jsonValue)
jsonMembers = pp.delimitedList(memberDef)
jsonObject << pp.Dict(LBRACE + pp.Optional(jsonMembers) + RBRACE)
jsonComment = pp.cppStyleComment
jsonObject.ignore(jsonComment)
A more formal definition, according to Wikipedia: in computer programming, a parser combinator is a higher-order function that accepts several parsers as input and returns a new parser as its output. In this context, a parser is a function accepting strings as input and returning some structure as output, typically a parse tree or a set of indices representing locations in the string where parsing stopped successfully. Parser combinators enable a recursive descent parsing strategy that facilitates modular piecewise construction and testing. This parsing technique is called combinatory parsing.
So today we will try to create a small parser combinator (parsec) library in Nim with the following expectations.
What to expect
parsing just one letter
let aParser = charp('a')
let bParser = charp('b')
echo $aParser.parse("abc")
# <Right parsed: @["a"], remaining: bc >
echo $bParser.parse("bca")
# <Right parsed: @["b"], remaining: ca >
parsing a letter followed by another letter
let abParser = charp('a') >> charp('b')
echo $abParser.parse("abc")
# <Right parsed: @["a", "b"], remaining: c >
parsing one or the other
let aorbParser = charp('a') | charp('b')
echo $aorbParser.parse("acd")
# <Right parsed: @["a"], remaining: cd >
echo $aorbParser.parse("bcd")
# <Right parsed: @["b"], remaining: cd >
parsing abc
let abcParser = parseString("abc")
echo $abcParser.parse("abcdef")
# <Right parsed: @["abc"], remaining: def >
parsing many a's
let manyA = many(charp('a'))
echo $manyA.parse("aaab")
# <Right parsed: @["a", "a", "a"], remaining: b >
echo $manyA.parse("bbb")
# <Right parsed: @[], remaining: bbb >
parsing at least 1 a
let manyA1 = many1(charp('a'))
echo $manyA1.parse("aaab")
# <Right parsed: @["a", "a", "a"], remaining: b >
echo $manyA1.parse("bbb")
# <Left Expecting '$a' and found 'b'>
parsing many digits
let manyDigits = many1(digit)
echo $manyDigits.parse("1234")
# <Right parsed: @["1", "2", "3", "4"], remaining: >
parsing digits separated by comma
let commaseparatednums = sep_by(charp(',').suppress(), digit)
echo $commaseparatednums.parse("1,2,4")
# <Right parsed: @["1", "2", "4"], remaining: >
Creating the greet parser from pyparsing
let greetparser = word >> charp(',').suppress() >> many(ws).suppress() >> word
echo $greetparser.parse("Hello, World")
# <Right parsed: @["Hello", "World"], remaining: >
Multiply parser
echo $(letter*3).parse("abc")
# <Right parsed: @["a", "b", "c"], remaining: >
parsing UUIDs
let uuidsample = "db9674c4-72a9-4ab9-9ddd-1d641a37cde4"
let uuidparser =(hexstr*8).map(smashtransformer) >> charp('-') >> (hexstr*4).map(smashtransformer) >> charp('-') >> (hexstr*4).map(smashtransformer) >> charp('-') >> (hexstr*4).map(smashtransformer) >> charp('-') >> (hexstr*12).map(smashtransformer)
echo $uuidparser.parse(uuidsample)
# <Right parsed: @["db9674c4", "-", "72a9", "-", "4ab9", "-", "9ddd", "-", "1d641a37cde4"], remaining: >
parsing recursive nested structures (ints or list of [ints or lists])
var listp: Parser
var valref = (proc():Parser =digits|listp)
listp = charp('[') >> sep_by(charp(',').suppress(), many(valref)) >> charp(']')
var valp = valref()
echo $valp.parse("1")
# <Right parsed: @["1"], remaining: >
echo $valp.parse("[1,2]")
# <Right parsed: @["[", "1", "2", "]"], remaining: >
echo $valp.parse("[1,[1,2]]")
#<Right parsed: @["[", "1", "[", "1", "2", "]", "]"], remaining: >
Implementation
The idea of a parser is something that accepts some text and returns either a success (with the info of what got consumed from the text and what remains) or a failure with an error message:
-> success( parsed, remaining)
stream of characters -> [ parser ]
-> failure (what went wrong message)
and
- if it was a failure we abort the parsing operation
- if it was a success we try to continue with the next parser
that's the basic idea
imports
import strformat, strutils, sequtils
well, we will be dealing with lots of strings and lists, so probably we need strformat
, strutils
, and sequtils
Either and its friends
Either is one of my favorite types: a bit more advanced than a Maybe or Option, because it allows returning a specific error message instead of just a none that gives us no idea what went wrong.
data Either a b = Left a | Right b
Either a success (Right) with data of type b, or a failure (Left) with data of type a. We can describe it in Nim as an object variant as follows:
type
EitherKind = enum
ekLeft, ekRight
Either = ref object
case kind*: EitherKind
of ekLeft: msg*: string
of ekRight: val*: tuple[parsed: seq[string], remaining:string]
Here we defined the kind EitherKind, which can be ekLeft or ekRight, and on the variant Either we define msg (the error message) when the kind is ekLeft, and val (a tuple of the parsed and the remaining parts of the input string) when the kind is ekRight.
proc map*(this: Either, f: proc(l:seq[string]):seq[string]): Either =
case this.kind
of ekLeft: return this
of ekRight:
return Either(kind:ekRight, val:(parsed:f(this.val.parsed), remaining:this.val.remaining))
Here we define the map function for the Either type: applying a function to an Either should unwrap the data in a Right, pass it to the function and return a new (transformed) Either, while a Left is returned unchanged.
proc `$`*(this:Either): string =
case this.kind
of ekLeft: return fmt"<Left {this.msg}>"
of ekRight: return fmt("<Right parsed: {this.val.parsed}, remaining: {this.val.remaining} >")
We make the Either convertible to a string by defining the $ proc.
proc `==`*(this: Either, other: Either): bool =
return this.kind == other.kind
Here we define a simple comparison for Either objects (basically checking whether both are ekRight or both are ekLeft).
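A tiny sanity check of the type (the values here are made up):
let ok = Either(kind: ekRight, val: (parsed: @["a"], remaining: "bc"))
let err = Either(kind: ekLeft, msg: "expected 'a'")
echo $ok        # <Right parsed: @["a"], remaining: bc >
echo $err       # <Left expected 'a'>
echo ok == err  # false: different kinds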
Now to the parsers
Typically parser combinators are about composing higher-order functions together to parse text. We will emulate that with objects, exploiting the fact that an object can hold some extra instructions for the parser, which is a handy shortcut.
type
Parser = ref object
f* : proc(s:string):Either
suppressed*: bool
Here we define a Parser type that:
- holds a function f (the real parser that consumes the input string and returns an Either)
- has a suppressed flag to indicate that we want to ignore the parsed text
suppressed can be very useful for ignoring/discarding dashes in a string (e.g. uuid text) or commas in a CSV row.
proc newParser(f: proc(s:string):Either, suppressed:bool=false): Parser =
var p = Parser()
p.suppressed = suppressed
p.f = f
return p
A helper to create a new Parser from a real parsing function proc(s: string): Either and a suppressed flag.
proc `$`*(this:Parser): string =
return fmt("<Parser:>")
allowing our parser to convert to string by defining $
proc parse*(this: Parser, s:string): Either =
return this.f(s)
parse receives a string and executes the underlying parsing function f on it, returning an Either.
proc map*(this:Parser, transformer:proc(l:seq[string]):seq[string]):Parser =
proc inner(s:string):Either =
return this.f(s).map(transformer)
return newParser(f=inner)
Here we define a map function to transform the parser's result once it has executed. The idea is that we return a new parser wrapping an inner function that carries all the transformation knowledge (if this feels a bit tricky, move on to the next part).
proc suppress*(this: Parser): Parser =
this.suppressed = true
return this
Here we set the suppressed flag to true; it's used as in the examples from the what-to-expect section:
let commaseparatednums = sep_by(charp(',').suppress(), digit)
echo $commaseparatednums.parse("1,2,4")
Here we are interested in the digits 1, 2 and 4 and want to ignore the commas in the input string; that's what suppress helps us with.
Parsing a single character
Now we would like to be able to parse a single character and get the parsed value plus the remaining characters:
let aParser = charp('a')
echo $aParser.parse("abc")
# (parsed a, remaining bc)
proc charp*(c: char): Parser =
proc curried(s:string):Either =
if s == "":
let msg = "S is empty"
return Either(kind:ekLeft, msg:msg)
else:
if s[0] == c:
let rem = s[1..<s.len]
let parsed_string = @[$c]
return Either(kind:ekRight, val:(parsed:parsed_string, remaining:rem))
else:
return Either(kind:ekLeft, msg:fmt"Expecting '${c}' and found '{s[0]}'")
return newParser(curried)
Here we defined a charp function that takes a character and returns a Parser only capable of parsing that character:
- if the string is empty, we return a Left Either (ekLeft kind)
- if the string starts with the character we want to parse, we return a Right Either with that character as the parsed value and the rest of the string as the remaining part; otherwise we return a Left
- all of the parsing logic lives in a function curried that we pass to newParser
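With charp in hand we can already exercise the parser and the map helper defined earlier (the uppercase transformer is just for illustration; mapIt and toUpperAscii come from sequtils and strutils):
let aParser = charp('a')
echo $aParser.parse("abc")   # <Right parsed: @["a"], remaining: bc >
echo $aParser.parse("xyz")   # <Left Expecting '$a' and found 'x'>

# transform the parsed result with map
let upperA = aParser.map(proc(l: seq[string]): seq[string] = l.mapIt(it.toUpperAscii()))
echo $upperA.parse("abc")    # <Right parsed: @["A"], remaining: bc >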
Sequential parsers
Now we would like to parse a then b sequentially. That's possible if we create a parser for a and a parser for b and try to (parse a andThen parse b). That statement can be expressed as andThen(parserForA, parserForB); let's define that proc.
let abParser = charp('a') >> charp('b')
echo $abParser.parse("abc")
# parse: [a, b] and remaining c
proc andThen*(p1: Parser, p2: Parser): Parser =
proc curried(s: string) : Either=
let res1 = p1.parse(s)
case res1.kind
of ekLeft:
return res1
of ekRight:
let res2 = p2.parse(res1.val.remaining) # parse remaining chars.
case res2.kind
of ekLeft:
return res2
of ekRight:
let v1 = res1.val.parsed
let v2 = res2.val.parsed
var vs: seq[string] = @[]
if not p1.suppressed: #and _isokval(v1):
vs.add(v1)
if not p2.suppressed: #and _isokval(v2):
vs.add(v2)
return Either(kind:ekRight, val:(parsed:vs, remaining:res2.val.remaining))
return res2
return newParser(f=curried)
proc `>>`*(this: Parser, rparser:Parser): Parser =
return andThen(this, rparser)
Straightforward:
- if parsing with p1 fails, we fail with a Left
- if parsing with p1 succeeds, we try to parse the remaining input with p2
  - if p2 succeeds, the whole thing returns a Right combining both parsed values
  - if it doesn't, we return a Left
- we create the >> operator for a more pleasing API
alternate parsing
Now we want to try parsing with one parser or the other, and only fail if both can't parse:
let aorbParser = charp('a') | charp('b')
echo $aorbParser.parse("acd")
echo $aorbParser.parse("bcd")
Here we want to be able to parse a
or b
proc orElse*(p1, p2: Parser): Parser =
proc curried(s: string):Either=
let res = p1.parse(s)
case res.kind
of ekRight:
return res
of ekLeft:
let res = p2.parse(s)
case res.kind
of ekLeft:
return Either(kind:ekLeft, msg:"Failed at both")
of ekRight:
return res
return newParser(curried)
proc `|`*(this: Parser, rparser: Parser): Parser =
return orElse(this, rparser)
- if we are able to parse with p1 we return its Right
- if we can't parse with p1 we try to parse with p2
  - if we succeed we return a Right
  - if we can't we return a failure with Left
- we define the more pleasing | syntax
Parsing n times
We want to apply a parser n times, so instead of doing this
threetimesp1 = p1 >> p1 >> p1
we want to write
threetimesp1 = p1*3
proc n*(parser:Parser, count:int): Parser =
proc curried(s: string): Either =
var mys = s
var fullparsed: seq[string] = @[]
for i in countup(1, count):
let res = parser.parse(mys)
case res.kind
of ekLeft:
return res
of ekRight:
let parsed = res.val.parsed
mys = res.val.remaining
fullparsed.add(parsed)
return Either(kind:ekRight, val:(parsed:fullparsed, remaining:mys))
return newParser(f=curried)
proc `*`*(this:Parser, times:int):Parser =
return n(this, times)
- here we try to apply the parser count times, threading the remaining input through each iteration
- we create the * operator for a more pleasing API
Parsing letters, upper, lower, digits
Now we want to be able to parse any alphabetic letter or digit with something like
let letter = anyOf(strutils.Letters)
let lletter = anyOf({'a'..'z'})
let uletter = anyOf({'A'..'Z'})
let digit = anyOf(strutils.Digits)
For digit we could do
digit = charp('1') | charp('2') | charp('3') | charp('4') ...
but it definitely looks much nicer with the anyOf syntax, so the idea is: we create a parser for each element in the set and orElse between them.
Here we define choice
Here we define choice
proc choice*(parsers: seq[Parser]): Parser =
return foldl(parsers, a | b)
proc anyOf*(chars: set[char]): Parser =
return choice(mapIt(chars, charp(it)))
- choice is a generic function over any seq of Parsers that tries them in order
- anyOf takes a set of characters that gets converted to parsers using mapIt and the charp parser generator (from a character to a Parser)
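Any character set works the same way; for example a single hex digit parser (presumably this is how the hexstr used in the UUID example earlier is defined):
let hexstr = anyOf({'0'..'9', 'a'..'f', 'A'..'F'})
echo $hexstr.parse("db9674c4")
# <Right parsed: @["d"], remaining: b9674c4 >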
Parsing a complete string
Now we would like to parse the complete string "abc" from "abcdef". Instead of doing
abcParser = charp('a') >> charp('b') >> charp('c')
we want an easier syntax that expands to the same thing:
abcParser = parseString("abc")
parseString parser
proc parseString*(s:string): Parser =
var parsers: seq[Parser] = newSeq[Parser]()
for c in s:
parsers.add(charp(c))
var p = foldl(parsers, a >> b)
return p.map(proc(l:seq[string]):seq[string] = @[join(l, "")])
Optionally
What if we want to mark a parser as optional? For example, when parsing a greet statement it's valid not to have a !, so both "Hello, World" and "Hello, World!" should be parsable by our greet parser.
We probably want to define it like that
let greetparser = word >> charp(',').suppress() >> many(ws).suppress() >> word >> optionally(charp('!'))
echo $greetparser.parse("Hello, World")
#<Right parsed: @["Hello", "World", ""], remaining: >
echo $greetparser.parse("Hello, World!")
# <Right parsed: @["Hello", "World", "!"], remaining: >
Notice the optionally(charp('!')): it marks a parser as optional.
proc optionally*(parser: Parser): Parser =
let myparsed = @[""]
let nonproc = proc(s:string):Either = Either(kind:ekRight, val:(parsed:myparsed, remaining:""))
let noneparser = newParser(f=nonproc)
return parser | noneparser
What we basically do is build a noneparser that always succeeds; we try the passed parser first, and if it fails we "succeed" with noneparser.
many: zero or more
Here we try to apply a specific parser as many times as we can, e.g. parse as many a's as possible from a string.
proc parseZeroOrMore(parser: Parser, inp:string): Either = #zero or more
let res = parser.parse(inp)
case res.kind
of ekLeft:
let myparsed: seq[string] = @[]
return Either(kind:ekRight, val:(parsed:myparsed, remaining:inp))
of ekRight:
let firstval = res.val.parsed
let restinpafterfirst = res.val.remaining
# echo "REST INP AFTER FIRST " & restinpafterfirst
let res = parseZeroOrMore(parser, restinpafterfirst)
case res.kind
of ekRight:
let subseqvals = res.val.parsed
let remaining = res.val.remaining
var values:seq[string] = newSeq[string]()
# echo "FIRST VAL: " & firstval
# echo "SUBSEQ: " & $subseqvals
values.add(firstval)
values.add(subseqvals)
return Either(kind:ekRight, val:(parsed:values, remaining:remaining))
of ekLeft:
let myparsed: seq[string] = @[]
return Either(kind:ekRight, val:(parsed:myparsed, remaining:inp))
proc many*(parser:Parser):Parser =
proc curried(s: string): Either =
return parseZeroOrMore(parser, s)
return newParser(f=curried)
many1: one or more
proc many1*(parser:Parser): Parser =
proc curried(s: string): Either =
let res = parser.parse(s)
case res.kind
of ekLeft:
return res
of ekRight:
return many(parser).parse(s)
return newParser(f=curried)
- Here we try to parse once manually
- if parsing fails we return the Left
- if parsing succeeds we invoke the many parser on the input
Separated by parser
Most of the time the data we parse is separated by something: a comma, a space, a dash, etc., and we would like a simple way to parse the data without hassling with the separators. To make something like this possible:
let commaseparatednums = sep_by(charp(',').suppress(), digit)
echo $commaseparatednums.parse("1,2,4")
proc sep_by1*(sep: Parser, parser:Parser): Parser =
let sep_then_parser = sep >> parser
return (parser >> many(sep_then_parser))
proc sep_by*(sep: Parser, parser:Parser): Parser =
let myparsed = @[""]
let nonproc = proc(s:string):Either = Either(kind:ekRight, val:(parsed:myparsed, remaining:""))
return (sep_by1(sep, parser) | newParser(f=nonproc))
How does that work? Let's take the example a,b,c. We want to describe it as sepBy commaParser letterParser. How do we mentally break it into parts? We start by parsing a letter, then a comma, then a letter, then a comma, then a letter; in other words: a letter, then (separator >> letter) many times. That's exactly this line in sep_by1:
return (parser >> many(sep_then_parser))
Surrounded By
If we want to make sure something is surrounded by something, e.g. single quotes or |, we can use the surroundedBy helper:
let sur3pipe = surroundedBy(charp('|'), charp('3'))
echo $sur3pipe.parse("|3|")
#<Right parsed: @["|", "3", "|"], remaining: >
Implementation should be as easy as
let surroundedBy = proc(surparser, contentparser: Parser): Parser =
return surparser >> contentparser >> surparser
Between
between is more generic than surroundedBy because the opening and closing parsers can be different, e.g. (3):
let paren3 = between(charp('('), charp('3'), charp(')') )
echo paren3.parse("(3)")
# <Right parsed: @["(", "3", ")"], remaining: >
Implementation should be as easy as
let between = proc(p1, p2, p3: Parser): Parser =
return p1 >> p2 >> p3
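If we only care about the content, we can combine it with suppress (a small sketch):
let paren3only = between(charp('(').suppress(), charp('3'), charp(')').suppress())
echo $paren3only.parse("(3)")
# <Right parsed: @["3"], remaining: >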
Parsing recursive nested structures
Next, we have a very simple language where you can have
- chars
- lists of chars or lists
It's going to be very easy to express
var listp: Parser
var valref = (proc():Parser =letters|listp)
listp = charp('[') >> sep_by(charp(',').suppress(), many(valref)) >> charp(']')
var valp = valref()
Here's probably the tricky part. Let's think about it for a second: we want to say lang = list | letter and list = list of lang. We need to delay one of them to be able to reference it, and delaying usually means "convert it to a function", or at least have its info declared already. That's what we do with var listp: Parser, just giving Nim the info that there will be a listp at some point, and for the lang parser we create a function that returns list | letter (that's the reason some of our parsec parsers accept a proc in some of their overloads instead of just a Parser). Once we are done with the declaration of listp we can invoke the valref function to get an actual usable parser.
var inps = @["a", "[a,b]", "[a,[b,c]]"]
for inp in inps:
echo &"inp : {inp}"
let parsed = valp.parse(inp)
if parsed.kind == ekRight:
let data = parsed.val.parsed
echo inp, " => ", $parseToNimData(data)
We only need a function parseToNimData to do the conversion; ideally we could enhance the use of map to convert the data to the desired type at parse time.
Before defining parseToNimData
, let's define the language elements first
# recursive lang ints and list of ints or lists
type
LangElemKind = enum
leChr, leList
LangElem = ref object
case kind*: LangElemKind
of leChr: c*: char
of leList: l*: seq[LangElem]
proc `$`*(this:LangElem): string =
case this.kind
of leChr: return fmt"<Char {this.c}>"
of leList: return fmt("<List: {this.l}>")
proc `==`*(this: LangElem, other: LangElem): bool =
return this.kind == other.kind
We state that our language has two kinds of LangElemKind:
- leChr: for characters
- leList: for lists of any language element
proc parseToNimData(data: seq[string]) : LangElem =
result = LangElem(kind:leList, l: @[])
let dataIsList = data[0][0] == '['
for el in data:
var firstchr = el[0]
if firstchr.isAlphaAscii():
var elem = LangElem(kind:leChr, c:firstchr)
if dataIsList == false:
return elem
else:
result.l[result.l.len-1].l.add(LangElem(kind:leChr, c:firstchr))
elif firstchr == '[':
result.l.add(LangElem(kind:leList, l: @[]))
parseToNimData is a simple transformer that builds a tree out of the successfully parsed strings, converting them into LangElems.
This is how the final result looks like
inp : a
@["parsed data: ", "a"]
a => <Char a>
inp : [a,b]
@["parsed data: ", "[", "a", "b", "]"]
[a,b] => <List: @[<List: @[<Char a>, <Char b>]>]>
inp : [a,[b,c]]
@["parsed data: ", "[", "a", "[", "b", "c", "]", "]"]
[a,[b,c]] => <List: @[<List: @[<Char a>]>, <List: @[<Char b>, <Char c>]>]>
That's it!
More resources on the topic
Thank you for reading! Please feel free to open an issue or a PR to improve the content of Nim Days or the very young nim-parsec :)