Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory Datatype and Pointer type #550

Closed
wants to merge 16 commits into from
Closed

Memory Datatype and Pointer type #550

wants to merge 16 commits into from

Conversation

xeren
Copy link
Collaborator

@xeren xeren commented Nov 3, 2023

Adds datatypes to memory objects, as well as the opaque pointer type to the IR. Removes the preprocessing step that replaces GEP expressions with simple additions.
To infer the datatype of dynamically-allocated objects, the program is scanned for occurrences of that address. If those occurrences do not agree on one type, it defaults to byte. The current implementation is unsound, as it ignores registers and the base address being stored in memory, where another reader may assume another type. The datatype currently controls initializations.
The pointer type will allow the encoder to choose a different theory for address expressions. The IR will be closer to the LLVM IR and should become more readable.

More:

  • Since memory objects and functions cannot keep being implementations of IConst anymore, the interface is removed entirely.

Comment on lines +328 to +330
public Formula visit(NullPointer constant) {
return context.useIntegers ? integerFormulaManager().makeNumber(0) :
bitvectorFormulaManager().makeBitvector(64, 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use size of the pointer type rather than the hardcoded value 64.

}
// only encode offset expressions if not constant.
// byteCounts already contains constant offset values.
final Formula count = constant != null ? null : offset.accept(this);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this distinction is important. I think the method is a lot more complicated than it needs to be.

Also, seeing all this distinction between integers and bitvectors, I think it might be a good idea to have a NumberEncoder that encapsulates IntegerFormulaManager and BitvectorFormulaManager with a unifying interface to avoid all this code duplication.

Comment on lines +27 to +29
int result = value.intValue();
checkState(BigInteger.valueOf(result).equals(value), "Integer bit length exceeded by %s", value);
return result;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use value.intValueExact which automatically throws when the conversion loses information.

// If we reduce MemoryObject as a normal IConst, we loose the fact that it is a Memory Object
// We cannot call reduce for RSHIFT (lack of implementation)
if(!(lhs instanceof MemoryObject) && op != RSHIFT) {
if(op != RSHIFT) {
return expressions.makeBinary(lhs, op, rhs).reduce();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the RSHIFT restriction was because back then we didn't know the bitwidth of the expression, so we didn't know if a sign-bit gets shifted or not. Nowadays, I think this should work fine.
Also, the comment about MemoryObject can be removed.

@@ -160,7 +158,7 @@ public Expression visit(IExprBin iBin) {
// Rule for associativity (rhs is IConst) since we cannot reduce MemoryObjects
// Either op can be +/-, but this does not affect correctness
// e.g. (&mem + x) - y -> &mem + reduced(x - y)
if(lhs instanceof IExprBin lhsBin && lhsBin.getRHS() instanceof IConst && lhsBin.getOp() != RSHIFT) {
if(lhs instanceof IExprBin lhsBin && lhsBin.getRHS() instanceof IValue && lhsBin.getOp() != RSHIFT) {
Expression newLHS = lhsBin.getLHS();
Expression newRHS = expressions.makeBinary(lhsBin.getRHS(), lhsBin.getOp(), rhs).reduce();
return expressions.makeBinary(newLHS, op, newRHS);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This optimization is wrong, no? The LHS op may be mul, div, etc. all of which would give wrong results.
Even if the LHS is sub, this is wrong: (a - x) - y != a - (x - y)


import static com.google.common.base.Preconditions.checkArgument;

public final class PointerCast implements Expression {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is inconsistent that we use a unary integer expression with cast op for signed/unsigned casts but then use a dedicated PointerCast class for casts from and to pointers.
It would be better to have a single CastExpression class for all casts. It shouldn't be done in this PR, but we have to keep it in mind.

mem.setCVar(name);
return mem;
}

public MemoryObject newMemoryObject(String name, int size) {
checkState(!locations.containsKey(name),
"Illegal allocation. Memory object %s is already defined", name);
final MemoryObject mem = program.getMemory().allocate(size, true);
final MemoryObject mem = program.getMemory().allocate(types.getArchType(), size, true);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can we use archType here but use byteType for the virtual allocations?
Also, what is the meaning of archType now? Is it an integer type whose size matches the size of pointer types?

Comment on lines +388 to +390
Expression result = v1.getType() instanceof IntegerType ?
expressions.makeBinary(v1, ctx.opArith().op, v2) :
expressions.makeGetElementPointer(archType, v1, List.of(v2));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks dangerous. There needs to be at least a check that the op is + when doing GEP.
I'm also not sure if we can silently introduce GEPs/PointerType into Litmus code like this. Maybe it is fine for CLitmus, but in general we need to be careful.
A canonical way to handle litmus code while requiring pointer types for e.g. memory accesses, is to put a ptr2int cast when usingMemoryObject and int2ptr when doing a memory access. In that way, the litmus code will think everything is an integer.

Comment on lines +73 to +76
? (expr -> expr instanceof IValue || expr instanceof MemoryObject || expr instanceof Function ||
expr instanceof BConst || expr instanceof Register)
: (expr -> expr instanceof IValue || expr instanceof MemoryObject || expr instanceof Function ||
expr instanceof BConst);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can refactor the common check into a isConstant method.

@xeren
Copy link
Collaborator Author

xeren commented Nov 15, 2023

With how big this has gotten again, and the number of issues, I will close this PR.

@xeren xeren closed this Nov 15, 2023
@hernanponcedeleon hernanponcedeleon deleted the pointer-type branch February 18, 2024 15:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants