Encoding an Integer into a string in Pure Chapel

Does anybody have an example of doing this in pure Chapel please?

I have only ever done this in C or the like where I can do

s[i] = c

which is not an option in Chapel?

Yes, I can build an array of chars and then do a join() on them, but that seems like a sledgehammer.

I might be misunderstanding your question, but wouldn't casting the int to a string work?

var a = 12345,
    s = a : string;

writeln(s);
writeln(s.type: string);

which produces

12345
string

Very smart. I never thought of that. Lean and mean. Exactly what I want.

Thanks Luca.

Almost. What if I want it in hexadecimal?

I thought it was too good to be true.

Hi @damianmoz - thanks for asking. I agree with Luca that casting to a string is a good way to do it.

But, as you said, what if you want hexadecimal?

One way to do it is with the FormattedIO module (Chapel Documentation 2.1), but please don't do that in anything performance-sensitive. It has a lot of overhead.
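For reference, a minimal sketch of the FormattedIO route. It assumes that `use IO;` makes `string.format` visible and that the `%xu` specifier formats an unsigned integer as lower-case hexadecimal; the variable names are only illustrative.

```chapel
use IO;  // string.format is documented under IO.FormattedIO

// Illustrative value, not from the thread.
const n = 255: uint;

// %xu is assumed to mean "unsigned integer, lower-case hex".
const hex = "%xu".format(n);

writeln(hex);  // expected to print ff
```

This is convenient for one-off output, but as noted above it is not the route to take in a hot loop.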

Due to that overhead, I recently used string.appendCodepointValues to add numeric codepoint values to a string (bytes has a similar method as well). You can use this to write something like the C code you would have written, where you compute the ASCII value you want to append.
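A rough sketch of that idea, not taken from the thread. `toHex` is a hypothetical helper, and the sketch assumes `appendCodepointValues` accepts one or more integer codepoint values, so check the String module documentation for the exact signature before relying on it.

```chapel
// Hypothetical helper: lower-case hex digits of n, most significant first.
proc toHex(n: uint): string {
  var s: string;
  var started = false;          // becomes true once a nonzero digit is seen
  for shift in 0..60 by -4 {    // visits 60, 56, ..., 4, 0
    const d = ((n >> shift) & 0xf): int;
    if d != 0 || started || shift == 0 {
      started = true;
      // ASCII: '0' is 48 and 'a' is 97 (87 + 10)
      s.appendCodepointValues(if d < 10 then 48 + d else 87 + d);
    }
  }
  return s;
}

writeln(toHex(0x1f));
```

Because the string only grows at the end, the digits have to be produced most significant first, which is why the loop walks the nibbles from the top down and skips leading zeros.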

You can see an example of it in action in this function:

As far as a built-in and fast function to do this goes, it would be reasonable to add something to the standard library; feel free to make an issue asking for such a thing or a PR adding an implementation.

I am trying to set something up for somebody to take over this task. I translated 'itoa.c' from the dark past. It prints an integer in decimal or hexadecimal, prepending a minus sign if the number is negative. If the number is positive, a qualifier string controls the prefix: '+' prints a '+' sign, '~' prints a blank, and an empty string prints nothing.

This works:

proc positive(z : string)
{
    param EMPTY = '';

    if z == '~' then
        return  ' ';
    else if z == '+' then
        return z;
    else
        return EMPTY;
}

This corrupts the stack:

proc positive(z : string)
{
    param EMPTY = '';

    return if z == '~' then ' ' else if z == '+' then z else EMPTY;
}

Should I keep going here or do a new entry in discourse or raise a github issue?

It should not be corrupting the stack & that's definitely a bug. Please make an issue about it.

I see the digit-by-digit conversion as a simple operation that ought to be easy to implement efficiently. I like the idea of "allocate a buffer, then fill it one element at a time." Since the buffer can't be a Chapel string, how about making it an array? Once the array is filled, use string.createCopyingBuffer() to convert it to a string. Since createCopyingBuffer does not accept Chapel arrays, I would make the buffer a c_array.

If this sounds interesting, I can post a prototype.

Vass

It looks like I found a compiler bug in my crude prototype. So I will post a GitHub issue with the source code and then we can go from there. It works as long as I do not use a ternary operator in a tiny inline proc. Thanks all.

Cuing off Lydia's post on the issue #25032, here is a skeleton for a function that uses c_array. Conceptually hairy, simple in practice:

proc int2string(arg: integral): string {
  use CTypes;
  var buf: c_array(uint(8), 32);
  var len = 0;

  // mockup of digit-by-digit conversion
  buf[0] = 49; len += 1;
  buf[1] = 102; len += 1;
  buf[2] = 0; // I think this is not needed

  return string.createCopyingBuffer(c_addrOf(buf[0]), len);
}

writeln(int2string(0x1f));

Just for fun: in my above int2string the buffer can be filled starting at the end going backwards:

.....
buf[31] = 0;
buf[30] = 102;
buf[29] = 49;
return string.createCopyingBuffer(c_addrOf(buf[29]), 2);
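To make the back-to-front idea concrete, here is a small sketch with the mockup replaced by a real digit loop. `hexString` is a hypothetical name, and the sketch only handles unsigned values in lower-case hex; sign handling, separators, and other bases are left out.

```chapel
use CTypes;

// Hypothetical helper: fill the buffer from the end, then copy out
// only the part that was written.
proc hexString(n: uint): string {
  param last = 15;                 // a 64-bit value has at most 16 hex digits
  var buf: c_array(uint(8), last + 1);
  var k = last + 1;                // index one past the most recent write
  var v = n;

  do {
    const d = (v & 0xf): int;      // lowest hex digit
    k -= 1;
    // ASCII: '0' is 48 and 'a' is 97 (87 + 10)
    buf[k] = (if d < 10 then 48 + d else 87 + d): uint(8);
    v >>= 4;
  } while v > 0;

  return string.createCopyingBuffer(c_addrOf(buf[k]), last + 1 - k);
}

writeln(hexString(0x1f));  // prints 1f
```

The do-while form guarantees that at least one digit is written, so zero comes out as "0" without a special case.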

Thanks Vass. Exactly what I was after. Filling from the back is the way to go. It results in less code.

At the moment this is storing in 8-bit unsigned integers. I might try and make it a little bit more obvious that one is working with characters over the weekend. I can then submit for review to see if my thought patterns are moving in the right directions or whether I have still not grasped the paradigm properly!

How much leaner and meaner is a c_array (with uint(8)s) than a Chapel string? Thanks

Thanks for the help
p is a string which, if a > 0, is printed before the number, e.g. '+'
s is a separator (say into thousands), e.g. ','
g is the count at which to apply s, e.g. 3 (for thousands)
x is the leading character, e.g. 'x', with which to precede a non-decimal

proc itoa(a : int, p = '', s = '', g = 0, x = '', param b = 10)
{
    param NUL = 0:uint(8);
    param PLUS = '+'.toByte();
    param BLANK = ' '.toByte();
    param MINUS = '-'.toByte();
    param DIGITS = "0123456789abcdef";
    proc digit(i : uint) do return DIGITS[i:int(32)].toByte();
    proc achar(s : string) do return s[0].toByte();
    use CTypes;
    param M = max(int):uint + 1:uint;
    param L = if b > 8 then 31 else if b < 8 then 127 else 63;
    param base = b:uint;
    const sign = if a < 0 then MINUS else if p.size == 0 then NUL else achar(p);
    var n = if a == min(int) then M else abs(a):uint;
    var m = 1;
    var i = 1;
    var k = L;
    var t : c_array(uint(8), L + 1);

    // mandatory first element
    {
        const j = n / base, nmbj = n - base * j;

        t[L] = digit(nmbj); n = j;
    }
    while n > 0 do
    {
        const j = n / base, nmbj = n - base * j;

        if g > 1 && i == g then // separator
        {
            k -= 1; t[k] = achar(s); m += 1; i = 0;
        }
        k -= 1; t[k] = digit(nmbj); n = j; m += 1; i += 1;
    }
    assert(k > 0);
    if b != 10 && x.size > 0 then
    {
        k -= 1; t[k] = achar(x);
        k -= 1; t[k] = digit(0);
    }
    if sign != NUL then
    {
        k -= 1; t[k] = sign;
    }
    return string.createCopyingBuffer(c_addrOf(t[k]), L - k + 1);
}

Hello Damian,

Good to hear about your progress. Your code looks good!

I might try and make it a little bit more obvious that one is working with characters over the weekend.

This may be as simple as replacing uint(8) with c_char. Or c_uchar, depending on your taste.

How much leaner and meaner is a c_array (with uint(8)s) than a Chapel string?

A Chapel string, under the hood, is a record with a few pieces of information including a pointer to a heap-allocated buffer, which stores the characters. A c_array is simply a stack-allocated buffer with the characters.

BTW if you are in the mood for it: declare a nested function that stores the current character and advances the pointer, for example:

  proc next(c: c_char) { k -= 1; t[k] = c; }
  proc next(c: string) { next(achar(c)); }

Vass