Parallella LISP – the runtime environment – Language, Thought & Computation

This series of blog posts describes the Parrallella-LISP source code and how to build applications requiring a combination of symbolic and numeric computation. To this end, two example applications will also be blogged about. The first application represents the meaning of documents with vector semantics and the second, extracts contours from binary images.

If you want to understand the examples or modify the source in any way then you need to know how to compile the LISP interpreter to run on the Parallella board and a little bit about the runtime environment.

Compiling and running Parallella LISP

There are two versions of Parallella LISP. One with Garbage Collection (GC) and one without. The one with GC is in the directory plisp-gc and the version without in the directory plisp. All versions were compiled and tested with esdk.2015.1. The source code consists of host and device code:

host code: host_main.c, libhost.c – compiles to fl-host.elf
device code: device_main.c, libdevice.c, libplisp.c – compiles to fl-device.srec

The host code loads the LISP interpreter (device code) contained in fl-device.srec onto the board and prints out the results. To build LISP for the parallella in either of the above mentioned directories type:

make build

./build.sh

To run LISP type:

make run

./run.sh

The shell script plisp, executes the code contained in a single file on each core:

./plisp prog.lisp

If you would like to build lisp to run on Linux or the host then type:

make

This produces two identical executables which you can run by typing:

./fl

./onefile

Both onefile and fl are callable with a filename. If no filename is given then testfuncs.lisp is interpreted. For instance, to interpret the code that will run on core 3 type:

./fl code/p2.lisp

Apart from adding a compiler directive to data definitions, writing code for the Parallella is pretty much the same as for any other machine. For instance declare structures as follows:

struct __attribute__((aligned(8))) string {
    string *next;
    char s[STRINGMAX];
};

Initializing shared memory and loading the code for each core

host_main.c contains the procedure calls for initializing memory, loading the LISP interpreter onto the device and printing out the results. For a change, the main procedure includes a few comments!

int main(int argc, char *argv[]) {
    int rows, cols, result;
    char *code, filename[64];
    e_platform_t platform;
    e_epiphany_t dev;
    e_mem_t emem;
    //
    // init the device and get platform data
    //
    if (E_OK != e_init(NULL)) {
        fprintf(stderr, "\nERROR: epiphany initialization failed!\n\n");
        exit(1);
    }
    if (E_OK != e_reset_system() ) {
        fprintf(stderr, "\nWARNING: epiphany system rest failed!\n\n");
    }
    fprintf(stderr, "Getting platform info\n");
    if ( E_OK != e_get_platform_info(&amp;platform) ) {
        fprintf(stderr, "Failed to get Epiphany platform info\n");
        exit(1);
    }
    fprintf(stderr, "Platform version: %s, HAL version 0x%08x\n",
            platform.version, platform.hal_ver);
    rows = platform.rows;
    cols = platform.cols;
    //
    // Initialize device memory
    //
    memory = init_ememory(argc, argv, rows, cols);
    //
    // open the device
    //
    if (E_OK != e_open(&amp;dev, 0, 0, rows, cols)) {
        fprintf(stderr, "\nERROR: Can't establish connection to Epiphany device!\n\n");
        exit(1);
    }
    //
    // Write the ememory data structure to device memory
    //
    write_ememory(&amp;emem, memory);
    //
    // Load the code
    //
    clear_done_flags(&amp;dev, rows, cols);
    result = e_load_group("./fl-device.srec", &amp;dev, 0, 0, rows, cols, E_TRUE);
    if (result == E_ERR) {
        printf("Error loading Epiphany program.\n");
        exit(1);
    }
    //
    // Poll the device waiting for all cores to finish
    //
    poll_device(&amp;dev, rows, cols);
    //
    // Process the results of device processing
    //
    process_ememory(&amp;emem, memory, rows, cols);
    //
    // Close and finalize the device
    //
    if (e_close(&amp;dev)) {
        printf( "\nERROR: Can't close connection to Epiphany device!\n\n");
        exit(1);
    }
    if (e_free(&amp;emem)) {
        printf( "\nERROR: Can't release Epiphany DRAM!\n\n");
        exit(1);
    }
    e_finalize();
}

Prior to loading the interpreter, the data structures needed to run LISP on each core are loaded into shared memory. The main data type for constructing linked lists is “structure node” which when in use will be one of the types enumerated in the file structures.h:

enum ltype {PAIR, LIST, SYM, SUBR, FSUBR, LAMBDA, INT, NIL, TEE, ENV, FREE};

struct node {
    node *next;
    unsigned char type;
    unsigned char marked;
    union {
        namestr *name;
        struct {
            node *car;
            node *cdr;
        };
        struct {
            namestr *fname;
            node *(*fn)(node *, node *);
        };
        struct {
            node *args;
            node *body;
        };
        long long i;
        double r;
        struct {
            node *top;
            node *bindings;
        };
    };
};

An array of this structure is defined in advance and stored with the other data needed by the interpreter in “structure edata”:

struct edata {
    int id;
    int ememory_size;
    int node_size;
    int nnodes;
    int nodemem;
    int nnames;
    int namemem;
    int nstrings;
    int stringmem;
    int finished;
    char message[1024];
    char code[BANKSIZE];
    node *NULLPTR;
    node *history;
    node *freelist;
    namestr *namefreelist;
    string *stringfreelist;
    string freeStringArray[FREESTRING];
    node freeNodeArray[FREEOBJECT];
    namestr freeNameArray[FREENAME];
};

In addition to freeNodeArray, two other arrays are defined called freeNameArray and freeStringArray. These arrays will eventually be converted to lists (the freelists) and form the basis for memory allocation.

The edata structure for each core is accessible via structure ememory. As far as I understand, structure ememory can take up to 16M of shared memory starting at 0x8f000000. However, this memory will become corrupted by ‘C’ functions that call malloc internally.

struct ememory {
    edata data[NCORES];
};

Shared memory is initialised by function init_ememory located in libhost.c. This procedure allocates space for the ememory data structure, loads the code for each core into the string, memory->data[coreID].code, and then initialises the freelists. The code directory contains the LISP program to run on each core – p0.lisp to p15.lisp. If the plisp shell command is passed a filename then this file is interpreted on each core instead.

ememory *init_ememory(int argc, char *argv[], int rows, int cols) {
    char *code, filename[128];
    ememory *memory = (ememory *)calloc(1, sizeof(ememory));
    if (!memory) {
        fprintf(stderr, "%s\n", "out of memory in init_ememory");
        exit(-1);
    }
    lmem = (char *)memory;
    if (argc == 2) {
        code = readFile(argv[1]);
    }
    for (int i=0; i data[i].code, "%s", code);
            free(code);
        } else {
            sprintf(memory-&gt;data[i].code, "%s", code);
        }
    }
    if (argc == 2) {
        free(code);
    }
    createFreelist(memory, rows, cols);
    createStringFreelist(memory, rows, cols);
    createNameFreelist(memory, rows, cols);
    return memory;
}

Initializing the freelists

The freelists are created from the arrays freeNodeArray, freeNameArray and freeStringArray. For example, the procedure createFreelist, will make freeNodeArray[n].next point to freeNodeArray[n+1] thus creating a linked list. The procedure device_ptr, referenced below, converts a host pointer to a device pointer.

//
// create the freelist for node allocation on the device
//
void createFreelist(ememory *memory, int rows, int cols) {
    int id, k;
    node *freeNodeArray;
    char *base = (char *)memory;
    for (int i=0; i<rows; i++) {
        for (int j=0; j<cols; j++) {
            id = (cols * i) + j;
            freeNodeArray = memory->data[id].freeNodeArray;
            for (k = 0; k < FREEOBJECT - 1; k++) {
                char *ptr = (char *)&freeNodeArray[k + 1];
                freeNodeArray[k].next = (node *)device_ptr(base, ptr);
                freeNodeArray[k].type = FREE;
            }
            freeNodeArray[FREEOBJECT - 1].type = FREE;
            freeNodeArray[FREEOBJECT - 1].next = NULL;
            char *ptr = (char *)memory->data[id].freeNodeArray;
            memory->data[id].freelist = (node *)device_ptr(base, ptr);
        }
    }
}

Accessing shared memory on each core

The function coreInit in libdevice.c, initialises the pointers to the freelists:

#define BUF_ADDRESS 0x8f000000

char *coreInit(int argc, char *argv[], int cid) {
    id = cid;
    memory = (ememory *)(BUF_ADDRESS);
    stringfreelist = &amp;memory-&gt;data[id].freeStringArray[0];
    freelist = &amp;memory-&gt;data[id].freeNodeArray[0];
    namefreelist = &amp;memory-&gt;data[id].freeNameArray[0];
    return &amp;memory-&gt;data[id].code[0];
}

Allocating items from each of the freelists is quite straight forward. For instance, the procedure newnode shown below, allocates nodes of a particular type. Lists are built by linking together nodes created by this procedure.

node *omalloc(void) {
    if (freelist isnt NULL) {
        return (node *)popFree((stack **)(&amp;freelist));
    }
    setflag("ERROR in omalloc: NULL freelist");
    return NULL;
}

node *node_malloc() {
    nodemem += sizeof(node);
    nnodes += 1;
    return (node *)omalloc();
}

void node_free(node *n) {
    nodemem -= sizeof(node);
    nnodes -= 1;
    n-&gt;type = FREE;
    pushFree((stack *)n, (stack **)(&amp;freelist));
}

node *newnode(enum ltype type) {
    node *n;
    n = (node *) node_malloc();
    n-&gt;type = type;
    n-&gt;marked = 0;
    next(n) = allocated;
    allocated = n;
    return n;
}

The freelists are treated as stacks and free items popped from a freelist when needed. During GC, unused items are popped back onto the appropriate freelist. A list (named allocated) is kept of all the currently allocated nodes for use during GC.

struct stack {
    void *next;
};

void pushFree(stack *ptr, stack **stk) {
    ptr-&gt;next = *stk;
    *stk = ptr;
}

stack *popFree(stack **stk) {
    if (*stk is NULL) {
        return NULL;
    }
    stack *item = *stk;
    *stk = (*stk)-&gt;next;
    item-&gt;next = NULL;
    return item;
}

The REPL and global variables (libplisp.c)

When the REPL starts running the first function called is init_lisp which initialises the main global variables – NULLPTR, nil, tee and the history list. The NULLPTR terminates lists and nil and tee are important for many list processing functions.

void init_lisp(void);

After initializing global variables, the LISP code in the input string is interpreted and the results of processing added to the history list along with each input expression. Provided init_lisp is called, the REPL can be redefined in any way you like. In future blog posts the REPL is redefined to demonstrate the procedures contained in libplisp.c.

void REPL(char *input) {
    node *top_env = init_lisp(), *val, *l;
    mark_expr(globals, PERMANENT);
    mark_expr(NULLPTR, PERMANENT);
    l = parse_string(&amp;input);
    mark_expr(l, PERMANENT);
    forlist (sexp in l) {
        pr(car(sexp));
        clear_bindings(top_env);
        val = eval(car(sexp), top_env);
        pr(val);
        mark_expr(globals, PERMANENT);
        mark_expr(history, PERMANENT);
        free_unmarked(&amp;allocated);
    }
}

The REPL is called by function main located in device_main.c.

int main(int argc, char *argv[]) {
    unsigned int row, col, id;
    char *input;
    //
    // get the core id
    //
    id = coreID(&amp;row, &amp;col);
    //
    // Initialize the core
    //
    input = coreInit(argc, argv, id);
    //
    // Read, Eval and Print
    //
    REPL(input);
    //
    // Print stats and exit
    //
    setflag("Exited normally!");
    return 0;
}

Printing out the history list on the host (libhost.c)

When processing has finished, the history list for each core is printed out after uploading the ememory structure from the device. The code to handle lists is pretty much the same except that procedures are needed for converting between host and device pointers so that structure elements can be accessed. The values of important global variables are also printed out such as the size of the ememory data structure and the number of nodes allocated. All of this happens in function process_ememory.

void process_ememory(e_mem_t *emem, ememory *memory, int rows, int cols) {
    e_read(emem, 0, 0, 0x0, memory, sizeof(ememory));
    for (int i=0; i&lt;rows; i++) {
        for (int j=0; jdata[id].NULLPTR));
            history = dr_node(memory-&gt;data[id].history);
            for (node *ptr = history; ptr != NULLPTR; ptr = cdr(ptr)) {
                print(car(ptr));
                printf("\n\n");
            }
            prGlobals(memory, id);
        }
    }
}

Summary

This post was about creating the runtime environment for the LISP interpreter to run on each core. The next post will be about the ‘C’ functions for reading, parsing and manipulating lists.