Memory Datatype and Pointer type #550

xeren · 2023-11-03T16:37:30Z

Adds datatypes to memory objects, as well as the opaque pointer type to the IR. Removes the preprocessing step that replaces GEP expressions with simple additions.
To infer the datatype of a dynamically allocated object, the program is scanned for occurrences of its address. If those occurrences do not agree on one type, the datatype defaults to byte. The current implementation is unsound, as it ignores registers and cases where the base address is stored in memory, where another reader may assume a different type. At present, the datatype controls initialization.
The pointer type will allow the encoder to choose a different theory for address expressions. The IR will be closer to the LLVM IR and should become more readable.
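
The inference rule described above (agree on one type, otherwise fall back to byte) could be sketched roughly as follows. This is a simplified illustration, not the code in this PR: types are modeled as plain strings, and `DatatypeInference` and `inferDatatype` are hypothetical names.

```java
import java.util.List;

// Hypothetical sketch: types are plain strings, not Dartagnan Type objects.
final class DatatypeInference {
    static final String BYTE = "byte";

    // Returns the single type that all observed uses agree on.
    // Falls back to BYTE if there are no uses or the uses disagree.
    static String inferDatatype(List<String> observedTypes) {
        if (observedTypes.isEmpty()) {
            return BYTE;
        }
        final String first = observedTypes.get(0);
        for (String type : observedTypes) {
            if (!first.equals(type)) {
                return BYTE;
            }
        }
        return first;
    }
}
```

Note that the sketch shares the unsoundness mentioned above: it only sees the uses it is handed, so any use reached through a register copy or a stored address is missed.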

More:

Since memory objects and functions can no longer implement IConst, the interface is removed entirely.

Add .getPrimitiveFields(Type,int)
Add .getByteOffset(Type,List)
Add memory size to BooleanType

ThomasHaas · 2023-11-04T11:03:06Z

dartagnan/src/main/java/com/dat3m/dartagnan/encoding/ExpressionEncoder.java

+    public Formula visit(NullPointer constant) {
+        return context.useIntegers ? integerFormulaManager().makeNumber(0) :
+                bitvectorFormulaManager().makeBitvector(64, 0);


Use size of the pointer type rather than the hardcoded value 64.

ThomasHaas · 2023-11-04T11:14:08Z

dartagnan/src/main/java/com/dat3m/dartagnan/encoding/ExpressionEncoder.java

+            }
+            // only encode offset expressions if not constant.
+            // byteCounts already contains constant offset values.
+            final Formula count = constant != null ? null : offset.accept(this);


I don't understand why this distinction is important. I think the method is a lot more complicated than it needs to be.

Also, seeing all this distinction between integers and bitvectors, I think it might be a good idea to have a NumberEncoder that encapsulates IntegerFormulaManager and BitvectorFormulaManager with a unifying interface to avoid all this code duplication.
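
To make the NumberEncoder idea concrete, here is a minimal sketch of what such a unifying interface might look like. Everything in it is hypothetical: it does not use JavaSMT, and `BigInteger` values stand in for formulas (unbounded arithmetic for the integer theory, arithmetic modulo 2^width for bitvectors). The point is only that calling code is written once against the interface.

```java
import java.math.BigInteger;

// Hypothetical unifying interface; not part of Dartagnan or JavaSMT.
interface NumberEncoder {
    BigInteger makeNumber(long value);
    BigInteger add(BigInteger a, BigInteger b);
}

// Stand-in for the integer theory: unbounded arithmetic.
final class IntegerEncoder implements NumberEncoder {
    public BigInteger makeNumber(long value) { return BigInteger.valueOf(value); }
    public BigInteger add(BigInteger a, BigInteger b) { return a.add(b); }
}

// Stand-in for the bitvector theory: arithmetic modulo 2^width.
final class BitvectorEncoder implements NumberEncoder {
    private final BigInteger modulus;

    BitvectorEncoder(int width) { this.modulus = BigInteger.ONE.shiftLeft(width); }

    public BigInteger makeNumber(long value) { return BigInteger.valueOf(value).mod(modulus); }
    public BigInteger add(BigInteger a, BigInteger b) { return a.add(b).mod(modulus); }
}

final class NumberEncoderDemo {
    // Written once; no branching on "useIntegers" at the call site.
    static BigInteger baseAddressPlus(NumberEncoder encoder, long base, long offset) {
        return encoder.add(encoder.makeNumber(base), encoder.makeNumber(offset));
    }
}
```

With a real implementation, the two classes would wrap IntegerFormulaManager and BitvectorFormulaManager respectively, and the choice between them would be made once when the encoder is constructed.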

ThomasHaas · 2023-11-04T11:26:50Z

dartagnan/src/main/java/com/dat3m/dartagnan/expression/IValue.java

+        int result = value.intValue();
+        checkState(BigInteger.valueOf(result).equals(value), "Integer bit length exceeded by %s", value);
+        return result;


You can use value.intValueExact(), which automatically throws when the conversion loses information.
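
For reference, a small standalone illustration of that call. `BigInteger.intValueExact()` is standard since Java 8 and throws `ArithmeticException` when the value does not fit into an `int`; the wrapper class and method name here are made up for the example.

```java
import java.math.BigInteger;

// Hypothetical helper mirroring the shape of IValue.getValueAsInt.
final class ExactConversion {
    static int toIntChecked(BigInteger value) {
        // Throws ArithmeticException if value is outside the int range.
        return value.intValueExact();
    }
}
```

One difference to the checkState version is the exception type (ArithmeticException instead of IllegalStateException), which may matter if callers catch it.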

ThomasHaas · 2023-11-04T11:34:19Z

dartagnan/src/main/java/com/dat3m/dartagnan/expression/processing/ExprSimplifier.java

            // If we reduce MemoryObject as a normal IConst, we loose the fact that it is a Memory Object
            // We cannot call reduce for RSHIFT (lack of implementation)
-            if(!(lhs instanceof MemoryObject) && op != RSHIFT) {
+            if(op != RSHIFT) {
                return expressions.makeBinary(lhs, op, rhs).reduce();


I think the RSHIFT restriction was because back then we didn't know the bitwidth of the expression, so we didn't know if a sign-bit gets shifted or not. Nowadays, I think this should work fine.
Also, the comment about MemoryObject can be removed.
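
To illustrate why the bitwidth matters for a signed right shift, here is a standalone sketch (hypothetical helper, not Dartagnan code). The same bit pattern 0xF8 has its sign bit set when read as an 8-bit value but not when read as a 16-bit value, so the result of an arithmetic shift differs.

```java
// Hypothetical helper: arithmetic shift right on a value of a given bit width.
final class ShiftWidthDemo {
    // Bit mask with the lowest `width` bits set (1 <= width <= 64).
    static long mask(int width) {
        return width == 64 ? -1L : (1L << width) - 1;
    }

    // Interprets the low `width` bits of `bits` as a signed number,
    // shifts it arithmetically by n, and returns the low `width` bits.
    static long ashr(long bits, int width, int n) {
        long signExtended = (bits << (64 - width)) >> (64 - width);
        return (signExtended >> n) & mask(width);
    }
}
```

For example, `ashr(0xF8, 8, 1)` gives 0xFC (the sign bit is shifted in), while `ashr(0xF8, 16, 1)` gives 0x7C.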

ThomasHaas · 2023-11-04T11:41:27Z

dartagnan/src/main/java/com/dat3m/dartagnan/expression/processing/ExprSimplifier.java

@@ -160,7 +158,7 @@ public Expression visit(IExprBin iBin) {
                // Rule for associativity (rhs is IConst) since we cannot reduce MemoryObjects
                // Either op can be +/-, but this does not affect correctness
                // e.g. (&mem + x) - y -> &mem + reduced(x - y)
-                if(lhs instanceof IExprBin lhsBin && lhsBin.getRHS() instanceof IConst && lhsBin.getOp() != RSHIFT) {
+                if(lhs instanceof IExprBin lhsBin && lhsBin.getRHS() instanceof IValue && lhsBin.getOp() != RSHIFT) {
                    Expression newLHS = lhsBin.getLHS();
                    Expression newRHS = expressions.makeBinary(lhsBin.getRHS(), lhsBin.getOp(), rhs).reduce();
                    return expressions.makeBinary(newLHS, op, newRHS);


This optimization is wrong, no? The LHS op may be mul, div, etc. all of which would give wrong results.
Even if the LHS is sub, this is wrong: (a - x) - y != a - (x - y)
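
A concrete counterexample for the subtraction case, as a standalone sketch (class and method names are made up). With a = 10, x = 3, y = 2, the original expression evaluates to 5 while the rewritten form evaluates to 9. A sound reassociation for two subtractions would be a - (x + y).

```java
// Hypothetical check of the reassociation rule on plain ints.
final class ReassociationCheck {
    // The expression as written in the program: (a - x) - y
    static int original(int a, int x, int y) { return (a - x) - y; }

    // What the rule produces when both operators are SUB: a - (x - y)
    static int rewritten(int a, int x, int y) { return a - (x - y); }

    // A sound alternative for this case: a - (x + y)
    static int sound(int a, int x, int y) { return a - (x + y); }
}
```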

ThomasHaas · 2023-11-04T11:48:08Z

dartagnan/src/main/java/com/dat3m/dartagnan/expression/PointerCast.java

+
+import static com.google.common.base.Preconditions.checkArgument;
+
+public final class PointerCast implements Expression {


I think it is inconsistent that we use a unary integer expression with cast op for signed/unsigned casts but then use a dedicated PointerCast class for casts from and to pointers.
It would be better to have a single CastExpression class for all casts. It shouldn't be done in this PR, but we have to keep it in mind.

ThomasHaas · 2023-11-05T09:38:38Z

dartagnan/src/main/java/com/dat3m/dartagnan/parsers/program/utils/ProgramBuilder.java

        mem.setCVar(name);
        return mem;
    }

    public MemoryObject newMemoryObject(String name, int size) {
        checkState(!locations.containsKey(name),
                "Illegal allocation. Memory object %s is already defined", name);
-        final MemoryObject mem = program.getMemory().allocate(size, true);
+        final MemoryObject mem = program.getMemory().allocate(types.getArchType(), size, true);


Why can we use archType here but use byteType for the virtual allocations?
Also, what is the meaning of archType now? Is it an integer type whose size matches the size of pointer types?

ThomasHaas · 2023-11-05T09:48:19Z

dartagnan/src/main/java/com/dat3m/dartagnan/parsers/program/visitors/VisitorLitmusC.java

+        Expression result = v1.getType() instanceof IntegerType ?
+                expressions.makeBinary(v1, ctx.opArith().op, v2) :
+                expressions.makeGetElementPointer(archType, v1, List.of(v2));


This looks dangerous. There needs to be at least a check that the op is + when doing GEP.
I'm also not sure if we can silently introduce GEPs/PointerType into Litmus code like this. Maybe it is fine for CLitmus, but in general we need to be careful.
A canonical way to handle litmus code while requiring pointer types for e.g. memory accesses is to insert a ptr2int cast when using a MemoryObject and an int2ptr cast when doing a memory access. That way, the litmus code will think everything is an integer.
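
As a rough sketch of the guard I have in mind, assuming a simplified model where the visitor only needs to decide which kind of expression to build. The enum, class, and method names are hypothetical and do not match the actual visitor code.

```java
// Hypothetical, simplified model of the lowering decision in the visitor.
final class PointerArithmeticGuard {
    enum Op { ADD, SUB, MUL }

    enum Lowering { BINARY, GEP }

    // Integer operands keep using a binary expression.
    // Pointer operands may only be lowered to a GEP when the operator is ADD.
    static Lowering lower(boolean lhsIsPointer, Op op) {
        if (!lhsIsPointer) {
            return Lowering.BINARY;
        }
        if (op != Op.ADD) {
            throw new IllegalArgumentException("Pointer arithmetic only supports ADD, got " + op);
        }
        return Lowering.GEP;
    }
}
```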

ThomasHaas · 2023-11-05T10:03:41Z

...c/main/java/com/dat3m/dartagnan/program/processing/SparseConditionalConstantPropagation.java

+                ? (expr -> expr instanceof IValue || expr instanceof MemoryObject || expr instanceof Function ||
+                        expr instanceof BConst || expr instanceof Register)
+                : (expr -> expr instanceof IValue || expr instanceof MemoryObject || expr instanceof Function ||
+                        expr instanceof BConst);


I think we can refactor the common check into an isConstant method.
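
Something along these lines, as a self-contained sketch. The nested types are empty stand-ins for the real expression classes, and the `includeRegisters` flag is a hypothetical name for whatever condition selects between the two lambdas.

```java
import java.util.function.Predicate;

// Hypothetical sketch; the nested types are stand-ins for the real classes.
final class ConstantChecks {
    interface Expression {}
    static final class IValue implements Expression {}
    static final class BConst implements Expression {}
    static final class MemoryObject implements Expression {}
    static final class Function implements Expression {}
    static final class Register implements Expression {}

    // The check shared by both branches.
    static boolean isConstant(Expression expr) {
        return expr instanceof IValue || expr instanceof MemoryObject
                || expr instanceof Function || expr instanceof BConst;
    }

    // Registers are additionally accepted only when the flag is set.
    static Predicate<Expression> checker(boolean includeRegisters) {
        return includeRegisters
                ? expr -> isConstant(expr) || expr instanceof Register
                : ConstantChecks::isConstant;
    }
}
```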

xeren · 2023-11-15T19:22:00Z

With how big this has gotten again, and the number of issues, I will close this PR.

xeren added 16 commits November 3, 2023 15:44

Encode (x & c) as (x % c+1) for fitting c when using integers

70482e9

Remove IConst

84452ae

Add TypeFactory.getPrimitiveFields(Type)

c964d0b

Add .getPrimitiveFields(Type,int) Add .getByteOffset(Type,List) Add memory size to BooleanType

Add GEPExpression.getIndexingTypes()

1527ead

Add MemoryObject.getDataType()

5e9451e

Add PointerType

7526048

Add NullPointer

830ef6e

Add PointerCast

4b966ce

IValue.getValueAsInt checks bounds

b8f37c0

Move MemoryObjectCollector to MemoryObject.Collector

9610249

Fix FieldSensitiveAndersen

a6c14fd

IfExpr no longer extends IExpr

a704aa6

Add encoding of PointerType

9d97bc9

Refactor FieldSensitiveAndersen

7134ff0

Integrate PointerType

5f58ca4

Litmus programs use pointer type

59ca125

ThomasHaas reviewed Nov 5, 2023

xeren mentioned this pull request Nov 15, 2023

Add IntegerType.isPointer() #572

Closed

xeren closed this Nov 15, 2023

hernanponcedeleon deleted the pointer-type branch February 18, 2024 15:51
