Variables & Types 2 #383

MichalMarsalek · 2024-04-04T19:36:05Z

Here's a new, more specific proposal superseding #354
Implementing this would close #272 #124 #277 #346 #328
I no longer think Polygolf should have scopes or variable declarations. Declarations should be added as dictated by the target langs. Functions should obviously have their own scope, but I wouldn't mind removing them for now.
Types of variables should be refined on each assignment. Dead code should be eliminated. Assignments that don't cause side effects (which don't really exist in current Polygolf anyway, except for read opcodes) should be correctly inlined if that's shorter. Variables that don't interfere should be merged into one if that's shorter (fewer declarations etc.).
Examples

Polygolf

$x <- (@0):Ascii;
if ((#$x) < 99) {
  while ((#$x) < 99) {
    $x +<- $x;
  };
  $branch <- ("first" + $x);
}{
  $branch <- "second";
};
println $x;
$y <- (@1);
while (contains $y "  ") {
  $y <- (replace $y "  " " ");
};
println $y;
println $branch ;

Nim

import os,strutils
var
 x=1.paramStr
 b="second"
if x.len<99:
 while x.len<99:x&=x
 b="first"&x
x.echo
x=2.paramStr
while"  "in x:x=x.replace("  "," ")
x.echo
b.echo

Polygolf

$a <- (@0);
$b <- ($a + "!");
$a <- "fail";
println $b;

Python

import sys
print(sys.argv[1]+"!")

Static single assignment

To better understand the input programs and achieve the goals outlined above, we need SSA.
Programs should be converted to SSA as the very first step, even before typechecking. Typechecking & inference is then done on the SSA form. This improves the inference because at each use, we know the type depends only on the last def (assignment). At merge points, the types are unioned; if the union is not representable in Polygolf (e.g. Int | Text), that's an error. Referring to a variable that is not defined along every path leading to the use is an error. SSA bindings are identified by distinct integers.
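
The "defined along every path" rule can be sketched as a small recursive check. This is only an illustration in Python, not Polygolf's actual implementation: the tuple-based statement encoding and the names check_block and require are assumptions made for this sketch.

```python
def require(uses, defined):
    """Raise if any used variable is not definitely defined."""
    missing = [name for name in uses if name not in defined]
    if missing:
        raise NameError(f"not defined on every path: {missing}")

def check_block(block, defined):
    """Return the set of variables definitely defined after `block`."""
    defined = set(defined)
    for stmt in block:
        kind = stmt[0]
        if kind == "assign":        # ("assign", name, used_names)
            _, name, uses = stmt
            require(uses, defined)
            defined.add(name)
        elif kind == "use":         # ("use", used_names)
            require(stmt[1], defined)
        elif kind == "if":          # ("if", cond_uses, then_block, else_block)
            _, cond, then_block, else_block = stmt
            require(cond, defined)
            # Only variables defined in both branches survive the merge point.
            defined = check_block(then_block, defined) & check_block(else_block, defined)
        elif kind == "while":       # ("while", cond_uses, body)
            _, cond, body = stmt
            require(cond, defined)
            check_block(body, defined)  # the body may run zero times
        else:
            raise ValueError(f"unknown statement kind: {kind}")
    return defined

# Hypothetical encoding of the first example (up to "println $branch").
good = [
    ("assign", "x", []),
    ("if", ["x"],
     [("while", ["x"], [("assign", "x", ["x"])]), ("assign", "branch", ["x"])],
     [("assign", "branch", [])]),
    ("use", ["x"]),
    ("use", ["branch"]),
]
# Same program, but the else branch no longer assigns $branch.
bad = [good[0], ("if", ["x"], good[1][2], []), good[2], good[3]]
```

Here check_block(good, set()) accepts the program, while check_block(bad, set()) raises because $branch is only defined on one path.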

In SSA form, the two Polygolf programs from the Examples section would be

$1 <- (@0):Ascii;
if ((#$1) < 99) {
  $3 <- (phi $1 $2);
  while ((#$3) < 99) {
    $2 <- ($3 + $3);
  };
  $4 <- (concat "first" $3);
}{
  $5 <- "second";
};
$6 <- (phi $4 $5);
println $3;
$7 <- (@1);
$9 <- (phi $7 $8);
while (contains $9 "  ") {
  $8 <- (replace $9 "  " " ");
};
println $9;
println $6;

$1 <- (@0);
$2 <- ($1 + "!");
$3 <- "fail";
println $2;
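
For straight-line code like the second program, the renaming amounts to handing out a fresh integer at each assignment and pointing each use at the latest definition. A minimal Python sketch follows; the (target, used_variables) statement encoding and the name to_ssa are assumptions for illustration, and control flow (phi insertion) is deliberately left out.

```python
from itertools import count

def to_ssa(statements):
    """Rename variables in straight-line code to numbered SSA bindings.

    Each statement is a (target, used_variables) pair; target is None for
    statements that only read, such as println.
    """
    fresh = count(1)   # source of distinct integer ids
    current = {}       # variable name -> id of its latest definition
    result = []
    for target, uses in statements:
        # Uses refer to the latest definition; a KeyError here would mean
        # the variable is read before it is defined.
        renamed_uses = [current[name] for name in uses]
        new_id = None
        if target is not None:
            new_id = next(fresh)
            current[target] = new_id
        result.append((new_id, renamed_uses))
    return result

# $a <- (@0); $b <- ($a + "!"); $a <- "fail"; println $b;
program = [("a", []), ("b", ["a"]), ("a", []), (None, ["b"])]
ssa = to_ssa(program)
```

The result mirrors the listing above: binding 3 (the second assignment to $a) is never used, which is what makes it removable as dead code.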

I'm thinking the typechecking phase (plugin) could annotate the inferred type on the SSA read (use) nodes. After that, the type of each node could be calculated by just examining its descendants, rather than having to look up the bindings in the entire program.

Type inference

How does one infer types for this program

$i <- 64;
while ($i > 10) {
  $i <- (($i * 2) mod 103);
};

in SSA

$1 <- 64;
$2 <- (phi $1 $3);
while ($2 > 10) {
  $3 <- (($2 * 2) mod 103);
};

?
As we can see, the type of $2 depends on the type of $3 and vice versa. To resolve this, we narrow the type of each SSA binding iteratively.
Here, we

infer $1 as 64..64
infer $3 as Int because otherwise the type of (phi $1 $3) is not representable
infer $2 as Int because that's the union of types of $1 & $3
infer $3 as 0..102 because of the result of the modulo
infer $2 as 0..102 because that's the union of types of $1 & $3.

We could try guessing some common subtype of the top type (like Ascii, 0..oo, etc.) instead and only fall back to the top type on failure.
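
The iterative narrowing for the loop example can be sketched with plain integer intervals. This is a rough Python illustration under several assumptions: types are modelled as (low, high) pairs, the helpers union, times, mod and infer are made up for this sketch, and all bindings start at the top type rather than being visited in the order listed above. A real implementation would also need to bound the number of iterations.

```python
import math

TOP = (-math.inf, math.inf)  # stands in for the top integer type, Int

def union(a, b):
    """Smallest interval containing both a and b (the type of a phi)."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def times(t, k):
    """Interval of t * k for a positive constant k."""
    return (t[0] * k, t[1] * k)

def mod(t, m):
    """Interval of t mod m for a positive constant m (result is in 0..m-1)."""
    lo, hi = t
    return (lo, hi) if 0 <= lo and hi < m else (0, m - 1)

# One transfer function per SSA binding of the loop example.
transfer = {
    1: lambda ty: (64, 64),                    # $1 <- 64
    2: lambda ty: union(ty[1], ty[3]),         # $2 <- (phi $1 $3)
    3: lambda ty: mod(times(ty[2], 2), 103),   # $3 <- (($2 * 2) mod 103)
}

def infer(transfer):
    """Re-evaluate every binding until no type changes any more."""
    types = {binding: TOP for binding in transfer}  # start from the top type
    while True:
        narrowed = {binding: f(types) for binding, f in transfer.items()}
        if narrowed == types:
            return types
        types = narrowed

result = infer(transfer)
```

With these definitions, result assigns 64..64 to $1 and 0..102 to both $2 and $3, matching the final step of the list.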

Coming out of SSA

When golfing on the SSA is done, we need to get rid of the SSA nodes and introduce variables again.

Variable name allocations

The plugin doing this should be parametrizable by a list of constraints that prevent two SSA bindings from being allocated to the same variable. Besides the implicit constraint that the two definitions must not interfere, these include

different original names - this is a very weak constraint that's not needed for correctness, but it could be used for Polygolf & Python,
different target types - this is needed for statically typed langs so that we don't end up sharing int64 & bigint in a single variable for example. If a language relies on this, it should apply a plugin that assigns each node its targetType,
different nature of binding - for example the foreach loop var is immutable in some langs, but not others.

I'm not sure about the exact algorithm, but I would first try "merging" variables that occur together as arguments of a single phi function.
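
One possible shape for such an algorithm, sketched in Python: merge the bindings joined by each phi first, then greedily merge any remaining groups that do not interfere. The function allocate, its parameters, and the hand-written interference pairs for the first example are all assumptions for illustration; computing interference from liveness is not shown.

```python
def allocate(bindings, phis, interferes):
    """Group SSA bindings into variables.

    bindings:   iterable of SSA ids
    phis:       list of (result, [args]) pairs
    interferes: set of frozenset({a, b}) pairs that must not share a variable
    """
    group = {b: {b} for b in bindings}  # id -> the shared set it belongs to

    def can_merge(g1, g2):
        return all(frozenset((a, b)) not in interferes for a in g1 for b in g2)

    def merge(a, b):
        g1, g2 = group[a], group[b]
        if g1 is not g2 and can_merge(g1, g2):
            g1 |= g2
            for member in g2:
                group[member] = g1

    # First merge bindings joined by a phi, so the phi needs no copy.
    for result, args in phis:
        for arg in args:
            merge(result, arg)
    # Then greedily merge whatever else does not interfere.
    ids = sorted(group)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            merge(a, b)
    # Collect the distinct groups in order of their smallest id.
    variables = []
    for b in ids:
        if all(group[b] is not g for g in variables):
            variables.append(group[b])
    return [sorted(g) for g in variables]

# First example: phis as in the SSA listing, interference pairs written by hand.
phis = [(3, [1, 2]), (6, [4, 5]), (9, [7, 8])]
pairs = [(3, 4), (1, 5), (3, 6), (6, 7), (6, 8), (6, 9)]
variables = allocate(range(1, 10), phis, {frozenset(p) for p in pairs})
```

For this input the sketch ends up with two variables, one shared by the $x and $y bindings and one for $branch, which is the sharing shown in the Nim output. The extra constraints listed above (target types, nature of binding) could be passed in as additional interference pairs.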

Introducing scopes & var declarations

Some langs (Python) would not apply such a plugin.
Others would need to add var declarations. Note that one cannot just add the declaration to the first assignment, as is done now, because there might be other assignments in a different scope. The simplest implementation would declare all variables at the start of the program. Better algorithms for declaration placement could be implemented; that shouldn't be very hard.
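
The simplest strategy, declaring everything up front, could look roughly like this in Python. The tuple-based statement encoding, the "declare" node and both function names are hypothetical and only serve the sketch.

```python
def collect_assigned(block, names=None):
    """Return assigned variable names in order of first assignment."""
    if names is None:
        names = []
    for stmt in block:
        if stmt[0] == "assign":              # ("assign", name)
            if stmt[1] not in names:
                names.append(stmt[1])
        elif stmt[0] in ("if", "while"):     # ("if", then, else) / ("while", body)
            for child in stmt[1:]:
                collect_assigned(child, names)
    return names

def hoist_declarations(block):
    """Prefix the program with one declaration per assigned variable."""
    return [("declare", name) for name in collect_assigned(block)] + block

program = [
    ("assign", "x"),
    ("if", [("while", [("assign", "x")]), ("assign", "branch")],
           [("assign", "branch")]),
    ("assign", "y"),
]
hoisted = hoist_declarations(program)
```

Because every declaration sits at the top level, assignments in nested scopes (such as the two assignments to branch) need no declaration of their own.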

Control flow type narrowing

Types could be narrowed by conditions in if, while & conditional nodes but I feel like that's kinda orthogonal to everything else.

Other paradigms

I believe using the SSA as the primary representation should be good for implementing functional languages. Allocating as few variables (registers) as possible is crucial for languages like brainfuck & Hexagony. Some other helper functions and code representations that would be added to make this work should be helpful for some languages as well. For 2D languages like ><> and Hexagony, you need to understand the control flow graph and lay it out in the plane as tightly as possible.

MichalMarsalek added the enhancement New feature or request label Apr 4, 2024

MichalMarsalek mentioned this issue Apr 4, 2024

Variables & Types #354

Closed

MichalMarsalek added high priority architecture nonlocal-analysis Scopes or flow analysis labels Apr 4, 2024
