Calling C Subroutines from A+


This chapter describes how to write C programs to be called from within A+, and what you have to do within A+ to call those functions.

How to Compile C Functions to Be Called by A+

Writing functions to be called from A+ is straightforward. You must:

  1. Write the C function normally, with #includes for the .h files described below. Several functions may be included in the file. The file should not have a main() function in it!

    It is a good idea to declare every function that is not called directly from A+ as "static" (by putting the keyword static at the start of its definition). This keeps the names of your subroutines out of the A+ namespace, where they could cause a problem if another dynamic load contains a subroutine with the same name. All nonlocal variables must be static.

  2. Compile the C source code into an object file (not an executable file). This is done with the -c argument to cc. E.g.,
         cc -c -o foo.o foo.c
    compiles the C source file foo.c into foo.o.

    You may compile your code into several object files. If you do this, remember that static functions can be called only from another function in the same object file. But see "Another Way To Call C Routines From A+: Static Link".
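As a minimal sketch of the layout described in the steps above, a source file might keep its helper static and leave only the entry point externally visible. The file and function names are made up for illustration, and I is declared locally as a stand-in for the typedef that a/arthur.h provides.

```c
/* sumsq.c: hypothetical example file. In a real extension, include
   <a/arthur.h> instead of declaring I yourself. */
#include <assert.h>

typedef long I;              /* stand-in for the A+ integer typedef */

/* Helper: static, so its name is invisible outside this object file. */
static I square(I x)
{
    return x * x;
}

/* Entry point: not static, so it can be named when the file is loaded. */
I sum_squares(I n)
{
    I i, total = 0;
    assert(n >= 0);          /* this sketch only handles non-negative n */
    for (i = 1; i <= n; ++i)
        total += square(i);
    return total;
}
```

Compiled with cc -c, this should yield an object file in which sum_squares is the only name visible to the loader.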

How to Use C Functions When You Are in A+; Dynamic Loading

Once the functions are written, you must either dynamically load or statically link them into A+ before you can call them. In order to dynamically load them, you must know the names and paths of the object files, and the names of the C functions you want to call (known as the "entry points"). An object file can have several entry points, but you need to know only the names of the entry points you are loading. Static functions cannot be entry points.

Warning! If the A+ name used in _dyld (in its right argument) begins with an underscore, then the function will be installed in the root context, no matter what the current context, and it will be listed by $sfs but not by $xfs.

The procedures you must follow depend upon the system you are going to use.

   Dynamic Loading on Sun Machines

When a C program gets compiled, the compiler puts an underscore (_) in front of the name of each function. So if you write a routine called look(), the entry point will be '_look'.

Programs are dynamically loaded using the system function _dyld, which takes two arguments. The first (or left) is the name(s) of the object file(s) as a character string. Multiple object files are separated by spaces. All of the subroutines referenced by the entry points must be in one of the object files, or part of A+. You cannot refer to subroutines or entry points already dynamically loaded.

The second (right) argument is a nested array in the form
     (entry; name; args)
or, to load more than one function from the same file(s),

     (entry1; name1; args1;
      entry2; name2; args2;
      ...               ;
      entryN; nameN; argsN)
where entryI is the entry point name, as a character string, nameI is what you want to call the function in your workspace (the A+ name, also as a character string), and argsI is a numeric vector describing the types of the arguments. (This vector is described below.)

For example:

     'test.o' _dyld ('_look'; 'lookat'; 9 0)
     $xfs
lookat

   Dynamic Loading under AIX

The mechanism provided by IBM to support dynamic loading of object code into a running process dictates that three files be provided.

  1. A file listing the symbols in the A+ interpreter which are to be accessed by the code to be loaded in. This file is provided for you and can be found in
    /usr/local/bin/a+x.xx/lib/liba.exp
    where x.xx is the A+ version and release.

    For the current default release, you can use
    /usr/local/lib/liba.exp.

  2. A file listing the symbols in the code to be loaded that the A+ interpreter needs to know about. You must create this file with an editor.

  3. A single object file containing the modules that you wish to load in. This file is produced by the linker as described below and will be referred to as the "shareable object" file. By convention these files should have a .so suffix.

This is best illustrated by an example.

Below is the file ref.c, which returns the reference count of the A+ object passed into it.

/* begin ref.c */
#include <a/arthur.h>
I ref(a)
  A a;
{
   return a->c;   /* c is the reference count field */
}
/* end ref.c */

Let's say you want to dynamically load in this function and also the function dswap() from the BLAS library. The following procedure should be followed.

  1. Create the exports file, which enumerates all symbols that you want A+ to know about. The first line of this file contains the full pathname of the .so (shareable object) file which will be dynamically loaded. Subsequent lines enumerate symbols to be exported to A+, one per line. In this case we can create the file named xmpl-exports:
         #!/u/foobar/src/hodedo/xmpl.so
         ref
         dswap
  2. Compile all of your C and FORTRAN sources. In this case we only need to compile ref.c to produce ref.o .
         cc -c ref.c
  3. Link together all the object (.o) files and libraries that contain modules which you want to load into A+. The link command must specify an entry point (with the -e option) for the linked result because the default entry point is crt0, which is already the A+ interpreter's entry point. For the entry point, use any function name you wish that is in the code to be loaded. The link command must also specify the two files describing the symbols to be imported or exported. In this case we need to link ref.o and libblas.a:

    cc -e ref -bI:/usr/local/lib/liba.exp -bE:xmpl-exports \
        ref.o -o xmpl.so -lblas

  4. Dynamically load the code into the A+ session. This should occur almost instantaneously. Because ref() returns an integer and takes an arbitrary A+ object, its argument vector is 9 0.
       "xmpl.so" _dyld ("ref";"ref";9 0;
                        "dswap";"dswap";V_,4,FP,4,FP,4)


   Another Way To Call C Routines From A+: Static Link

A hook that was added allows you to create a new A+ executable file with your C or C++ code linked in. The main use for this is in debugging code that will later be dynamically loaded.

The hook is an empty function called uextInstall(). It is invoked in /u/aplus/3prod/src/main/aplus_uext.c. To use this facility to load your functions into A+, you need to add install() calls to uextInstall().

Then you compile and link, to create an A+ executable file. If you compile with the debug flag, you can run the new A+ executable file under a debugger and have access to your own code.

The Basic A+ Data Types

The include file a/arthur.h defines the basic data types which A+ employs. These are: I - long; F - double; C - char. When dealing with A+ objects, these typedefs should be used to refer to integers, floating-point numbers, and characters, respectively.

An A+ object (a variable in an A+ "workspace") has the following typedef:
typedef struct a{ I c, t, r, n, d[MAXR], i, p[1];} *A;

c - reference count.

How many pointers to this object exist? This helps determine whether an object can be modified in place, or whether a copy of the object must be created. If this number is 0, the object is a mapped file, and cannot be written to directly. ("Mapped Files" tells how to write to mapped files.)

An A+ object should be modified only if c is 1.

t - type.

What are the elements of the object? They should be one of the following values, which are #defined in a/arthur.h: It, Ft, Ct, Et, or Xt. It, Ft, and Ct refer to integers, floating-point numbers, and characters (I, F, C). Nested arrays and symbols are type Et. Xt is used for "executable types" - functions and operators. These are beyond the scope of this chapter.

r - rank.

The number of dimensions of the object.

n - number of elements.

The number of elements in the data array (p). With the type (t), it determines the size of the A+ object.

d[] - dimensions.

An array of the dimensions (...) of the object. MAXR is the largest rank allowed (currently 9).

i - items.

The number of items in the object. It is the number reported by _items.

p[] - data array.

It is defined as having one element of type I, but that is just to fool the compiler. In fact, its actual length is determined by n, and the actual type is determined by t. It is worth noting here that, for objects of type Ct, p[] is always a null-terminated string, and has n+1 elements. For all other types, p[] has n elements.

Since A+ objects are almost always allocated from dynamic memory, variables are more often than not pointers to A+ structures rather than the structures themselves. The type A is defined to be a pointer to an A+ object.

Nested arrays are A+ objects of type Et. For such objects, p[] is an array of pointers to the A+ objects which compose the array. Symbols are also represented as objects with type Et. For symbols, p[] is an array of pointers to another struct (the s struct). Symbols are somewhat more complicated than other A+ objects.
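The relationship between the fields can be sketched as follows. This is only an illustration under stated assumptions: the typedefs are local stand-ins for those in a/arthur.h, the T_ tags are hypothetical names used here instead of It, Ft, Ct, and Et (whose values are not assumed), and nested elements are assumed to occupy one I each.

```c
#include <assert.h>
#include <stddef.h>

#define MAXR 9                       /* largest rank, as stated in the text */
typedef long I;                      /* stand-ins for the a/arthur.h typedefs */
typedef double F;
typedef char C;
typedef struct a { I c, t, r, n, d[MAXR], i, p[1]; } *A;

/* Hypothetical type tags for this sketch only. */
enum { T_INT, T_FLOAT, T_CHAR, T_NESTED };

/* Number of elements implied by the shape: the product of the first r
   dimensions (1 for a scalar, whose rank is 0). This should equal n. */
I shape_count(A a)
{
    I k, count = 1;
    assert(a->r >= 0 && a->r <= MAXR);
    for (k = 0; k < a->r; ++k)
        count *= a->d[k];
    return count;
}

/* Bytes occupied by the data array p[]: n elements of the element size,
   plus one terminating null for character data. */
size_t data_bytes(I t, I n)
{
    switch (t) {
    case T_INT:   return (size_t)n * sizeof(I);
    case T_FLOAT: return (size_t)n * sizeof(F);
    case T_CHAR:  return (size_t)n * sizeof(C) + 1;
    default:      return (size_t)n * sizeof(I);  /* assumed: one I per element */
    }
}
```

For a 3 by 5 integer matrix, shape_count() gives 15, which matches the n field.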

   Reference Counts - a Closer Look

The reference count field is used to save memory and time by eliminating identical copies of variables. When a variable is assigned the value of another variable, that variable is normally not copied. Instead, the new variable name is set to point at the same A+ object, and the reference count of that object is incremented.

Objects with reference counts greater than one are pointed to by more than one variable and should not be changed. You must duplicate the object instead (and decrement the reference count for the original object).

When you "dereference" an object - by expunging a variable, or dropping elements from a linked list - the reference count is decremented. If the count becomes zero, the object is destroyed and the associated memory is freed.

The function ic(aobj) is used to increment a reference count, and dc(aobj) is used to decrement it (and possibly erase the object). They work recursively on nested objects.

dc() is rarely used in C subroutines. ic() is used primarily when modifying or returning arguments passed to the function. (See below.)
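The counting scheme can be modelled in a few lines. The sketch below is a toy, not the A+ implementation: the struct and the obj_ functions are invented for illustration, and the model ignores nested objects and mapped files, which the real ic() and dc() must handle.

```c
#include <assert.h>
#include <stdlib.h>

typedef long I;
typedef struct obj { I c; I value; } *Obj;   /* toy object: count plus payload */

static int live_objects = 0;    /* bookkeeping so the sketch can be checked */

/* Create an object; like new A+ objects, it starts with a count of one. */
Obj obj_new(I value)
{
    Obj o = malloc(sizeof *o);
    assert(o != NULL);
    o->c = 1;
    o->value = value;
    ++live_objects;
    return o;
}

/* Model of ic(): one more pointer to the object now exists. */
Obj obj_ic(Obj o)
{
    ++o->c;
    return o;
}

/* Model of dc(): drop one reference; free the object when none remain. */
void obj_dc(Obj o)
{
    assert(o->c > 0);
    if (--o->c == 0) {
        free(o);
        --live_objects;
    }
}

int obj_live(void) { return live_objects; }
```

Assigning one variable to another corresponds to obj_ic(); expunging a variable corresponds to obj_dc(), and only the last obj_dc() releases the memory.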

The Argument Vector

In A+, all functions must have a fixed number of arguments (a number not exceeding 9). This is also true for C functions called by A+. (This fixed number for a C function to be called from A+ cannot exceed 8.) In addition, C functions often expect only certain kinds of arguments - integers, for example - and behave badly if they receive an argument they do not expect.

The argument vector describes the number and types of the arguments to the C function, and the result which it returns. Also, A+ provides several different ways to pass data from A+ to C, which simplifies the C programs you must write. The argument vector allows you to select among these ways.

   Theory

The argument vector is composed of numbers between 0 and 15. The first number describes the result of the function, if any. If there is no result, use code 8, as described below. Otherwise use codes 0, 7, or 9.

The remaining numbers describe the arguments to the function. The length of the vector determines how many arguments there are. The maximum number of arguments allowed is eight.

The sixteen codes are shown in the table "C-Function Argument Types".

C-Function Argument Types
Code  Meaning (an asterisk means acceptable for a result)
0     any A+ object *
1     A+ object consisting of integers
2     A+ object consisting of floating-point numbers
3     A+ object consisting of characters
4     data array of any A+ object
5     data array of A+ object consisting of integers
6     data array of A+ object consisting of floating-point numbers
7     data array of A+ object consisting of characters *
8     first element of data array (use only for void result) *
9     single integer *
10    single floating-point number (don't use)
11    single character (don't use)
12    unique copy of any A+ object
13    unique copy of A+ object consisting of integers
14    unique copy of A+ object consisting of floating-point numbers
15    unique copy of A+ object consisting of characters

*   The result must be one of the codes marked with an asterisk.

Codes 0-3 pass a pointer to the A+ structure. Codes 4-7 pass a pointer to the data array within the A structure (aobj->p). Codes 8-11 pass the value of the first element of the data array of the A+ object (*aobj->p). This does not work correctly for characters and floating-point numbers, which have a different size. Codes 12-15 pass an A+ object whose reference count is guaranteed to be 1. This means that you can modify the object without causing adverse side effects.
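The table has a regular structure: the codes fall into four groups of four, and within each group the order is any, integer, floating point, character. The following sketch decodes a code on that basis. The enum and function names are invented for illustration and are not part of A+.

```c
#include <assert.h>

enum pass_mode { PASS_OBJECT, PASS_DATA_POINTER, PASS_FIRST_ELEMENT, PASS_UNIQUE_COPY };
enum elem_kind { KIND_ANY, KIND_INTEGER, KIND_FLOAT, KIND_CHARACTER };

/* Codes 0-3, 4-7, 8-11, and 12-15 select how the argument is passed. */
enum pass_mode arg_pass_mode(int code)
{
    assert(code >= 0 && code <= 15);
    return (enum pass_mode)(code / 4);
}

/* The position within the group selects the kind of element expected. */
enum elem_kind arg_elem_kind(int code)
{
    assert(code >= 0 && code <= 15);
    return (enum elem_kind)(code % 4);
}

/* Only the codes marked with an asterisk may describe a result. */
int valid_result_code(int code)
{
    return code == 0 || code == 7 || code == 8 || code == 9;
}
```

For example, code 9 decodes to "first element" of an "integer" object, which is how the table arrives at "single integer".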

   Practice

Although all 16 codes are defined, not all of them are useful. In fact, if you need information about the rank and type of your argument, only types 0 and 12 should be used.

If you need to know anything about the shape of the argument, the entire A+ object must be passed. This limits you to types 0-3 or to types 12-15; the latter are used only in special circumstances (see below). Using types 1-3 causes the interface to return a type error if you are passed the wrong kind of data. It also coerces floating-point numbers to integer (provided that 1|data is 0). You must do any checking for rank or length yourself.

Since argument types 4-7 and 8-11 do not pass you the entire A+ object, you cannot check rank or the number of elements of the object.

Argument types 4-6 are problematic. They pass you a pointer to the data array, but you have no way of knowing the size or dimensions of the array. Type 7 (character) is useful, and will pass you a character string, which is null-terminated. You will, however, lose all shape information, so a vector of length 15 will appear identical to a 3 by 5 matrix.

Argument types 8-11 are designed for single-element arrays. (Anything else generates a rank or length error.) This is currently defined only for integers, so types 10 and 11 should not be used. When passing or returning a scalar integer, use type 9. When a function does not return a result, use type 8.

Types 12-15 are used when you want to make an internal modification to the A+ structure passed, and then return the result. An argument passed this way can be safely modified.

Returning a Result from a C Function

Most C functions called by A+ return a result. The argument type for the result must be 0 (an arbitrary A+ object), 7 (a character string), or 9 (an integer scalar). If your function does not return a result, it should be declared as "void", and the return type should be 8.

To return a scalar integer (type 9), just use the return command, e.g., return(7).

To return a character string, also use the return command. Be sure to declare your function as returning a char*. E.g., char *hw() { return("Hello, world!"); }.

Note that A+ takes a copy of the string you return, so you are responsible for freeing any strings you create with malloc(), ma(), or strdup(). In general, you can free the string just before returning it, and this will work fine. ma() performs atmp memory allocation, where the argument specifies the number of words, and returns I*.

If you are returning an argument as the result, you must use ic(). See "Modifying and Returning Arguments".

If you are returning an A+ object (arg type 0 and not an argument to the function), you must create the appropriate object. The next section describes how to do this. Declare your function as returning an "A" type - a pointer to an A+ object. For example,
A foo(x,y)

If you are returning an A+ object, return(0) causes a null to be returned. Returning 0 can also be used to indicate an error condition.

   Creating A+ Objects

A+ provides several functions to create A+ objects. You must know the size and type of an A+ object before you create it. There are several functions to make it easier to create common A+ objects, such as vectors, or integer scalars.
     Initialized object
A gi(i) I i;                    /* make a scalar integer */
A gf(f) F f;                    /* make a scalar float */
A gsv(x,s) I x; C *s;           /* make a string; x is 0 (raw), 1 (apl), or 2 (c) */
A gc(t,r,n,d,p) I t,r,n,*d,*p;  /* make an A+ object, copying data from p */
     Uninitialized object
A gs(t) I t;                    /* make a scalar */
A gv(t,n) I t,n;                /* make a vector */
A gm(t,d1,d2) I t,d1,d2;        /* make a matrix (2-dimensional array) */
A ga(t,r,n,d) I t,r,n,*d;       /* make an array (r dimensions) */
A gd(t,a) I t; A a;             /* make an object taking r,n,d from a */
gd(t,a) <=> ga(t,a->r,a->n,a->d)

The argument names, in all cases, conform to the A+ structure described above. t is type, r is rank, n is the number of elements, d is the array of dimensions. New A+ objects always have reference counts of one.

Because creating A+ objects involves memory allocation, whenever you create an A+ object you must later either destroy it with dc() (see below), or return it, either alone or as part of a nested array.

   Memory Allocation - What to Do and What Not to Do

A+ includes its own memory allocation functions for atmp: ma(), mab(), and mf(). They work much like malloc() and free(), with two differences. First, ma() takes a number of words as its argument rather than a number of bytes (1 word = 4 bytes). Second, ma() and mab() allocate from atmp, whereas malloc() allocates from the heap; malloc() should therefore probably be limited to small allocations, under 1K, say.

For portability, use mab() or malloc(), and cover your allocation and deallocation routines, checking for errors such as no more space. If you do use ma(), be careful! Remember it takes an argument in words, not bytes.

Anything you allocate with ma() or mab() you must free with mf(). Anything that you allocate with malloc() or strdup() must be freed with free(). Don't mix them up. (This is another good reason to stick with mab().)
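Since ma() counts words, a byte count has to be rounded up to whole words before it is passed. The helper below is a hypothetical sketch of that arithmetic; it is not an A+ function, and the word size is taken as a parameter because the figure of 4 bytes given above may not hold on every platform.

```c
#include <assert.h>
#include <stddef.h>

/* Words needed to hold nbytes bytes, rounding up so nothing is cut off. */
size_t words_for_bytes(size_t nbytes, size_t bytes_per_word)
{
    assert(bytes_per_word > 0);
    return (nbytes + bytes_per_word - 1) / bytes_per_word;
}
```

With 4-byte words, a 10-byte buffer needs 3 words; passing 10 to ma() by mistake would allocate 40 bytes instead.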

To "erase" an A+ object, call dc(aobj), not mf(aobj). This should be rare: you should not erase arguments to your function, so the only A+ objects you erase should be ones that you created earlier in the function.

Modifying and Returning Arguments

In general, C routines called from A+ are expected to behave like A+ routines - all arguments are call by value. This means that the arguments should not be modified, since that would cause unexpected side effects in the A+ workspace.

However, if you use argument types 12-15, you can safely modify the arguments to the function. These types guarantee that the argument has a reference count of 1.

When you create your own A+ object, using ga() for example, you can simply return the created object when your program exits. This is not true for modified arguments, or arguments returned as part of a nested array. To return a modified argument, or incorporate an argument as part of a nested array, you must run ic() on the object.

The reason is that the function dc() is run on all arguments after your program exits. This function causes the arguments to be erased unless you increase the reference count with ic(). If you forget to do this, values of variables in the A+ workspace will be changed randomly.

Since ic() is defined as returning an integer, you will often want to cast the result to type A. If you don't do this, you will get the compiler warning "illegal combination of pointer and integer".
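The need for ic() can be seen in a small model of the calling sequence. Everything here is a toy written for illustration: the struct, toy_ic(), toy_dc(), and the "freed" flag stand in for the real object, ic(), dc(), and actual deallocation.

```c
#include <assert.h>

typedef long I;
typedef struct obj { I c; int freed; } Obj;  /* "freed" marks an erased object */

Obj *toy_ic(Obj *o) { ++o->c; return o; }
void toy_dc(Obj *o) { if (--o->c == 0) o->freed = 1; }

/* Two ways for a loaded function to hand back its own argument. */
Obj *return_with_ic(Obj *arg)    { return toy_ic(arg); }   /* correct */
Obj *return_without_ic(Obj *arg) { return arg; }           /* wrong   */

/* Model of the interface: call the function, then dc() the argument.
   Returns 1 if the result is still alive afterwards, 0 if it was erased. */
int result_survives_call(Obj *(*fn)(Obj *))
{
    Obj arg = { 1, 0 };          /* argument referenced by the caller only */
    Obj *result = fn(&arg);
    assert(result == &arg);      /* both functions return their argument */
    toy_dc(&arg);                /* the automatic dc() after the call */
    return !result->freed;
}
```

In this model the function that skips ic() hands back an object that has already been erased by the time the caller looks at it.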

   Examples of Modifying and Returning Arguments

Example 1: join

Let's say we want to write a function, join(), that takes two A+ objects and returns a nested array containing the two elements. That is, join{a;b} is the same as (a;b).

In C, we would write:

A join(obj1, obj2)
  A obj1, obj2;
{
    A result=gv(Et, 2);      /* create nested vector of length 2 */
    result->p[0]=ic(obj1);   /* load result vector with objs,
                                incrementing reference count */
    result->p[1]=ic(obj2);
    return(result);          /* return result, not incremented
                                because we created it in this function */
}

After compiling the function into join.o, we would then enter in A+:
     'join.o' _dyld ('_join';'join';0 0 0)
We can now use join as a function in the workspace:
     7 join 'abc'
<  7
< abc

Example 2: clone

Now we want to write a function that takes an arbitrary A+ object (aobj), and an integer (n), and returns a nested array containing n copies of aobj. That is, clone(aobj, n) is equivalent to n ⍴ <aobj. Notice that we increment the reference count each time we insert aobj into the nested array.

In C we would write:

A clone(aobj, n)
  A aobj;
  I n;
{
    I i;
    A result=gv(Et, n);      /* nested vector of length n */
    for(i=0;i<n;++i) result->p[i]=ic(aobj);
    return(result);
}

We would load in A+ by entering:
     'clone.o' _dyld ('_clone';'clone';0 0 9)
     'abc' clone 2
< abc
< abc

Example 3: ravel

Now let's write a function that modifies its argument. We will replicate the Ravel function (monadic comma). Whatever we get, we will turn into a vector. We could do this by copying the aobj into a vector:

A ravel1(aobj)
  A aobj;
{
    A result;
    /* make new A+ object, copying the data */
    result=gc(aobj->t, 1, aobj->n, &aobj->n, aobj->p);
    return(result);
}

To load in A+:
     'ravel1.o' _dyld ('_ravel1';'ravel';0 0)

We can get a somewhat neater and faster function if we modify the argument in place. Thus:

A ravel2(aobj)
  A aobj;
{
    aobj->r = 1;             /* change argument in place */
    aobj->d[0]=aobj->n;
    return((A)ic(aobj));     /* increment rc of modified argument;
                                ic() returns an integer, so cast to A */
}

     'ravel2.o' _dyld ('_ravel2';'ravel';0 12)

Note that we must now use argument type 12, and increment the reference count on the result (because it will automatically be decremented upon this function's return). Note also that, since the argument that is being modified is of type 12, it may be either the argument that appears in the A+ expression calling the function or a copy of that object. It will be a copy unless the reference count is 1 for the object.

Signalling Errors

If you detect an infelicity in your function, you may want to cause the A+ process to suspend execution and indicate, for example, a length error. This is done using two external variables: I q; C *qs;

To report an error, set the value of q (and possibly qs), and return(0). If your program returns 0, the A+ process will check the value of q. A nonzero value indicates an error condition.

Positive numbers represent different predefined error codes, as shown in the following table.

Error Codes
Code  Meaning
1     interrupt
2     wsfull
3     stack
4     value
5     valence
6     type
7     rank
8     length
9     domain
10    index
11    mismatch
12    nonce
13    maxrank
14    nonfunction
15    parse
16    maxitems
17    invalid
18    nondata (for an argument of any type other than 0, A+ will check for nondata; you must detect and handle wrongly nondata type 0 arguments)

If q is -1, the A+ process will report the error in the qs string.

The file a/firca.h contains #defines for these codes, as well as macros for reporting error conditions. For example:

if (a != b) ERROUT(ERR_LENGTH);       /* reports a length error */

if (positive(a)) ERRMSG("polarity"); /* reports a polarity error */

Note that these macros exit the function, so be sure to clean up first!
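Without the macros, the protocol amounts to setting q and returning 0. The sketch below shows that pattern under stated assumptions: q and qs are defined locally as stand-ins (inside A+ they are external variables owned by the interpreter), the typedefs stand in for a/arthur.h, and the SKETCH_ names are local labels for the numbers in the Error Codes table.

```c
#include <assert.h>
#include <stddef.h>

#define MAXR 9
typedef long I;
typedef char C;
typedef struct a { I c, t, r, n, d[MAXR], i, p[1]; } *A;

/* Stand-ins for the interpreter's external error variables. */
I q = 0;
C *qs = NULL;

/* Error numbers copied from the Error Codes table. */
enum { SKETCH_ERR_RANK = 7, SKETCH_ERR_LENGTH = 8 };

/* Returns 1 if a and b are vectors of equal length. Otherwise sets q and
   returns 0, which is the signal for the caller to inspect q. */
I same_length_vectors(A a, A b)
{
    assert(a != NULL && b != NULL);
    if (a->r != 1 || b->r != 1) { q = SKETCH_ERR_RANK;   return 0; }
    if (a->n != b->n)           { q = SKETCH_ERR_LENGTH; return 0; }
    return 1;
}
```

Any cleanup, such as dc() on objects created earlier in the function, would have to happen before each of the early returns.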

Executing A+ Expressions from Dynamically Loaded C Programs

Your C programs that have been dynamically loaded into A+ can execute A+ expressions. This allows you to switch between A+ and C as needed. Note that you must start with an A+ process, however, to execute these dynamically loaded programs.

The entry points pex() and ex() allow you to do this. pex() takes one argument, a pointer to a string containing the A+ expression you wish to execute. ex() takes two arguments. The first is a context. The second is the string to execute in that context. The prototypes for these functions are:

I pex(I a);
I ex(CX c, C *s);

The longs returned by both functions are pointers to A+ objects.

Mapped Files

Mapped Files look very much like other A+ variables, from the C perspective, and have headers as described in "The Basic A+ Data Types". They have reference counts of 0 to distinguish them, however. That is, (0==aobj->c) means the object is a mapped file.

There is an entry point called wr() which will return 1 for a writable mapped file, and 0 otherwise. So a writable mapped file is indicated by (0==aobj->c && wr(aobj)).

If you write to an aobj where (0==aobj->c && !wr(aobj)) you will cause a segv.

wr() references a variable called wt, which is a list of writable addresses. This code is in y.c, in the a source directory.

Memory Allocation in A+ - a Closer Look

A+ uses the function ma() to allocate memory. This function is specified to return memory locations that begin on 8-byte boundaries, freeing the last three bits for encoding purposes, which is how they are used.

A+ consists of several types of entities, all represented as integer-size objects. The last three bits of the object indicate the type of object.

The most common case (for our purposes) is for those three bits to be 000, which indicates a pointer to an A+ object. In this case, the pointer can be used as is.

Other codes require some manipulation. For example, if the code is 010, the object is a pointer to a symbol, and the last three bits must be cleared before the pointer is used. (As stated above, all pointers in A+ point to 8-byte boundaries, as allocated by ma(), so the last three bits must be 000, and it is this fact that allows A+ to use the last three bits for type encoding.)

Several macros are provided in a/arthur.h to query the type of an object. The next table gives a list of the macros, and the types of objects they represent.

Macros for Querying Object Type
Code  Macro  Object Type
0     QA(a)  pointer to A+ object (struct a)
1     QV(a)  global variable (struct v)
2     QS(a)  pointer to symbol (struct s)
3     QE(a)  pointer to expression (struct e)
4     QN(a)  flow-control/operator
5     QL(a)  local variable
6     QP(a)  primitive
7     QX(a)  dynamically loaded function

Also defined are macros which clear the last three bits for those entities which serve as pointers. They are:

XS(a) retrieve pointer to struct s
XV(a) retrieve pointer to struct v
XE(a) retrieve pointer to struct e

These three macros work by zeroing out the last three bits and casting the result to the appropriate pointer type. Notice that there are no macros for A+ objects, although you will occasionally have to cast them.

Finally, there are macros to add the proper code into the last three bits. They are:

MV(a) global variable (struct v)
MS(a) pointer to symbol (struct s)
ME(a) pointer to expression (struct e)
MN(a) flow-control structure or operator
ML(a) local variable
MP(a) primitive function
MX(a) dynamically loaded function
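The three families of macros correspond to three bit operations on a word: read the low three bits, clear them, or set them. The sketch below shows those operations with invented names; it illustrates the encoding described above and makes no claim about how the macros in a/arthur.h are actually written.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_MASK ((uintptr_t)7)          /* the last three bits */

/* Query: read the 3-bit type code stored in the low bits of a word. */
unsigned tag_of(uintptr_t word)
{
    return (unsigned)(word & TAG_MASK);
}

/* Extract: clear the low bits to recover the 8-byte-aligned address. */
void *pointer_of(uintptr_t word)
{
    return (void *)(word & ~TAG_MASK);
}

/* Make: combine an aligned pointer with a type code from 0 to 7. */
uintptr_t make_tagged(void *p, unsigned code)
{
    uintptr_t bits = (uintptr_t)p;
    assert((bits & TAG_MASK) == 0);      /* pointer must be 8-byte aligned */
    assert(code <= 7);
    return bits | code;
}
```

The scheme only works because every address involved is a multiple of 8, so the low bits carry no address information to begin with.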

The Symbol Structure

The structure of symbols is defined in a/arthur.h as:
typedef struct s{struct s *s; C n[4];} *S;

s   the next symbol. This field is included because all symbols created by A+ are stored in linked lists.

n   a character string with the name of the symbol.

   Symbols in Variables

Symbols are always contained in A+ objects of type Et (nested). In normal nested objects, the elements of p[] point to A+ objects (struct a). With symbols, they point to symbols (struct s).

   Using Symbols in Functions

        Recognizing symbols.
Whenever you encounter an object of type Et, the nested elements may be any type of A+ entity, or several types mixed together. You should always check the contents of p[] individually, using the Q_() macros described above, when using objects of type Et.

To check if an element of an Et object is a symbol, use the QS() macro.

        Getting the name of a symbol.
To get the character string associated with the symbol (its "name"), you must turn the symbol into a pointer to an s-struct using the XS() macro, and access the n field within that structure.

Example 1: Recognizing a symbol and getting its name.

The following C function examines an A+ object and prints out the symbols it contains.

#include <stdio.h>
#include <a/arthur.h>

void printsymbols(aobj)
  A aobj;
{
  int i;
  S sym;
  if (Et != aobj->t) {
    printf("object not nested\n");
    return;
  }
  for (i=0; i<aobj->n; ++i) {
    if (QS(aobj->p[i])) {            /* is this element tagged as a symbol? */
      sym = XS(aobj->p[i]);          /* clear the tag bits to get the s-struct */
      printf("Symbol:%s\n", sym->n);
    } else printf("Not a symbol\n");
  }
}

        Creating a symbol
It is an essential property of symbols that, if two symbols have the same name, they are the same symbol (point to the same memory location).

For this reason, you must always create symbols by using the si() function. This function takes a character string as an argument, and returns a pointer to an s-struct. If the symbol already exists, you get the current memory location. Otherwise, a new symbol is created and stored in A+, and the address of the new symbol is returned.

If you intend to insert a symbol into an A+ object, you must encode the last three bits as 010, which is best done with the MS() macro. Then load the symbol into the p[] field of an A+ object of type Et.
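The "same name, same address" property of si() can be modelled with a linked list that is searched before anything new is added. The code below is a toy written for illustration, with invented names; it is not the A+ implementation and does not apply the tag bits discussed above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy symbol: a link to the next symbol plus the name stored inline. */
typedef struct sym { struct sym *next; char name[]; } Sym;

static Sym *all_symbols = NULL;      /* head of the list of known symbols */

/* Model of si(): return the existing symbol with this name, or create one. */
Sym *intern(const char *name)
{
    Sym *s;
    assert(name != NULL);
    for (s = all_symbols; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;                         /* already known: same address */
    s = malloc(sizeof *s + strlen(name) + 1); /* room for name and its null */
    assert(s != NULL);
    strcpy(s->name, name);
    s->next = all_symbols;                    /* push onto the front of the list */
    all_symbols = s;
    return s;
}
```

Because a second request for the same name returns the first symbol's address, two symbols can be compared with == rather than strcmp().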

Example 2: Returning a symbol

The following function takes a string and returns a symbol with the string as the name.

A makesymbol(str)
  char *str;
{
  A res=gs(Et);
  res->p[0] = MS(si(str));
  return(res);
}

        Comparing symbols
Because symbols with the same name are always in the same memory location, you don't need to use string comparisons to check symbols for identity. The result of si() will match for any symbol that you use. Just make sure that the symbols you are comparing either both have the 010 in their last three bits, or both have not.

Example 3: Comparing symbols.

The following function checks whether the A+ object contains the symbol `qwerty. If so, it returns 1, else 0.

I queryqwerty(aobj)
  A aobj;
{
  int i;
  S qwerty = (S) MS(si("qwerty"));   /* get symbol and set 010 code */
  if (Et != aobj->t) return(0);
  for (i=0; i<aobj->n; ++i) {
    if (qwerty == (S) aobj->p[i]) return(1);   /* both sides carry the tag */
  }
  return(0);
}

doc@aplusdev.org © Copyright 1995–2008 Morgan Stanley Dean Witter & Co. All rights reserved.