Variables & Types 2 #383
Labels
architecture
enhancement
New feature or request
high priority
nonlocal-analysis
Scopes or flow analysis
Here's a new, more specific proposal superseding #354
Implementing this would close #272 #124 #277 #346 #328
I no longer think Polygolf should have scopes or variable declarations. Declarations should be added as dictated by the target langs. Functions should obviously have their own scope, but I woudn't mind removing them for now.
Types of variables should be refined on each assingment. Dead code should be eliminated. Assignments that don't cause side effects (which don't really exist in current Polygolf anyway - except for
read
opcodes) should be inlined correctly if shorter. Variables should be merged into one if they are not interfering if that's shorter (less declarations etc.).Examples
Polygolf
Nim
Polygolf
Python
Static single assignment
For better understanding of the input programs and achieving the goals outlined above, we need SSA.
Programs should be converted to SSA as the very first step - even before typechecking. Typechecking & inference is then done on the SSA form. This improves the inference because at each use, we know the type depends on the last def (assingment) only. At merge points, the types are unioned, if the union is not representable in Polygolf, (
Int
|Text
) that's an error. Refering to a variable that is not defined along every path that leads to the usage is an error. Ssa bindings are identified by distinct integers.The above programs would be
I'm thinking the typechecking phase (plugin) could annotate the inferred type on the SSA read (use) nodes. After that a type of each node could be calculated by just examining its descendants, rather than having to lookup the bindings in the entire program.
Type inference
How does one infer types for this program
in SSA
?
As we can see, the type of
$2
depends on the type of$3
and vice versa. To resolve this, we narrow the type of each ssa binding iteratively.Here, we
$1
as64..64
$3
asInt
because otherwise the type of(phi $1 $3)
is not representable$2
asInt
because that's the union of types of$1
&$3
$3
as0..102
because of the result of the modulo$2
as0..102
because that's the union of types of$1
&$3
.We could try guessing some common subtype of the top type (like
Ascii
,0..oo
, etc.) instead and only fallback to the top type on failure.Coming out of SSA
When golfing on the SSA is done, we need to get rid of the ssa nodes and introduce variables again.
Variable name allocations
The plugin doing this should be parametrizable by a list of constraints that prevent the ssa binding to be allocated to the same variable. Except for the implicit constraint that the two definitions must not be interfering, these include
different original names
- this is a very weak constraint that's not needed for correctness, but this is could be used for Polygolf & Python,different target types
- this is needed for statically typed langs so that we don't end up sharing int64 & bigint in a single variable for example. If a language relies on this, it should apply a plugin that assings each node itstargetType
different nature of binding
- for example the foreach loop var is immutable in some langs, but not others.I'm not sure about the exact algorithm, but I would try first "merging" variables that occur together as arguments of a single phi function.
Introducing scopes & var declarations
Some langs (Python) would not apply such plugin.
Others would need to add var declarations. Note that one can not just add the declaration to the first assignment, as is done now, because of there might other assingments that are in a different scope. The simplest implementation would declare all variables at the start of the program. Better algorithms for declaration placement could be implemented, that shoudn't be very hard.
Control flow type narrowing
Types could be narrowed by conditions in
if
,while
&conditional
nodes but I feel like that's kinda orthogonal to everything else.Other paradigms
I believe using the SSA as the primary representation should be good for implementing functional languages. Allocating as few variables (registers) as possible is crucial for languages like brainfuck & Hexagony. Some other help functions and code representations that would be added to make this work should be helpful for some languages as well. For 2D languages like ><> and Hexagony, you need to understand the control flow graph and lay it out in the plane as tightly as possible.
The text was updated successfully, but these errors were encountered: