Skip to content

Commit

Permalink
Lots of work on distributions and first-gen-filler
Browse files Browse the repository at this point in the history
  • Loading branch information
fenneltheloon committed Jun 30, 2023
1 parent 0e3e6e7 commit 43a1baa
Show file tree
Hide file tree
Showing 7 changed files with 463 additions and 42 deletions.
36 changes: 36 additions & 0 deletions bin-algs.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
# Algorithim for computing num. of space groups in each bin
## In the edge-biased case

Compute the values of each $B_i$ for the following series where:
$$B_i = \text{round}\left[\begin{cases}
a\frac{\Chi\left(\frac{zi}{n}\right)}{\Chi(z)} & ,z > 0\\
a\frac{i}{n} & ,z=0\\
a\left(1 - \frac{\Chi\left(\frac{zi}{n}\right)}{\Chi(z)}\right) & ,z < 0\\
\end{cases}\right]$$

$$\Chi(x) = \int_0^x{\exp\left(-\frac{t^2}{2}\right)dt}$$

where:
- $B_i$ is the i-th bin
- $n$ is the total number of bins being divided into (positive integer)
- $a$ is the number of space groups (positive integer)
- $z$ is proportional to the number of standard deviations captured on the normal distribution (practically, will control how aggressively we will select for high space groups, the larger the magnitude of $z$, the more aggressive the selection)

**Note:** $\Chi(x)$ is a modified form of $\Phi(x)$, the CDF of a standard normal distribution. The $\frac{1}{\sqrt{2\pi}}$ term is divided out and as such does not need to be computed.

## In the non-edge-biased case
Let's say that we have a specific space group that we want to bias towards. We can then use something like this:

Let $p$ and $q$ be the lower and upper bounds of the desired range to bias towards (so the mean $\mu$ of the distribution is centered between them). Define a new inverse distribution $\Psi$:
$$\Psi(a,b) = b - a + \int_b^a{\exp\left(-\frac{t^2}{2}\right)dt}$$

Opposite to a bell curve, this density plot is 0 at input 0 and approaches 1 at large magnitude input.
We'll also define some useful terms that represent the z-values of space group 0 and space group $a$, $0^*$ and $a^*$ respectively:

$$0^* = -\frac{q+p}{2z(q-p)}$$
$$a^* = \frac{a}{z(q-p)} + 0^*$$

We then have:
$$B_i = \text{round}\left[a\left(\frac{\Psi(0^*, \frac{i(a^* - 0^*)}{n}+0^*)}{\Psi(0^*, a^*)}\right)\right]$$

where $a$, $i$, and $n$ mean the same the same thing, and $z$ this time refers to how far apart $p$ and $q$ are on the distribution (when $z = 1$, $p$ and $q$ are 1 standard deviation apart). Similar to before, the higher $z$ is, the more aggressively the distribution will be biased towards the range we are selecting for. Because the distribution density function is flipped, the value of $z$ is replaced by its multipliciative inverse to keep its behavior consistent. (So $z = 2$ means $p$ and $q$ are 0.5 std apart).
13 changes: 13 additions & 0 deletions bin_data.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
z-value,Bin 1,Bin 2,Bin 3,Bin 4,Bin 5,Bin 6,Bin 7,Bin 8,Bin 9,Bin 10,Bin 11,Bin 12
0.0,8.333333333333334,16.666666666666668,25.0,33.333333333333336,41.666666666666664,50.0,58.333333333333336,66.66666666666667,75.0,83.33333333333333,91.66666666666667,100.0
0.5,8.679396671173413,17.343740194114698,25.978055697784864,34.56752418751376,43.097558796358285,51.55387903514901,59.92258241187402,68.19021282265155,76.34382515512098,84.37104559004003,92.26012713753069,100.0
1.0,9.728215311670358,19.389146419618168,28.91690199418016,38.248328507162604,47.32426157128338,56.090642518800294,64.49946540442265,72.50952748028438,80.08696509078177,87.20556635624432,93.84686142814958,100.0
1.5,11.481775545623183,22.785772517928013,33.74242764442699,44.19797878330913,54.02085898859528,63.10646170108628,71.38000632817545,78.79741931730409,85.34432796345867,91.03342148997622,95.90055025510725,100.0
2.0,13.867752940628842,27.35645802296168,40.11786573259434,51.86119033692478,62.37227895975765,71.52327720109064,79.27241488195428,85.6550850276495,90.76855285460339,94.75322634936452,97.77341465173873,100.0
2.5,16.710664633596338,32.71406286149464,47.39146445741787,60.28299829577358,71.12672143531563,79.8618763150388,86.60065349003366,91.57928294111483,95.10182870317978,97.48865885107058,99.03749081550363,100.0
3.0,19.794706808098642,38.39615404032502,54.82253920013682,68.45376040656963,79.08355473637484,86.87309939798628,92.23719023030655,95.70836668195317,97.81919735291574,99.02541536140019,99.6731445161666,100.0
3.5,22.956530942246467,44.05360357851618,61.87139560105251,75.7007194624255,85.56494112660444,92.03098679463811,95.92616969329355,98.08256798919246,99.17965447075467,99.6925891425756,99.91297866416475,100.0
4.0,26.113386050376636,49.50462823672091,68.27327381243991,81.7629351222089,90.44765872478247,95.45602003175385,98.04314457055449,99.24021000786351,99.73633794103092,99.92051714708595,99.98175981499496,100.0
4.5,29.234151990729845,54.67490105951022,73.94159903140589,86.63914848924112,93.92136587514905,97.55576839421593,99.1341839785028,99.73069809829926,99.92686334678106,99.98299596203645,99.99697216648056,100.0
5.0,32.30779462002229,59.53435793850596,78.87009048309976,90.4419813961093,96.27797015848569,98.75812355318746,99.6462634796939,99.91424521451151,99.98237386316907,99.99696646919875,99.9995997084275,100.0
5.5,35.328703380163226,64.06826860029969,83.08685868529903,93.32470202742705,97.80751518480963,99.40405112826794,99.86649712858623,99.97543051898631,99.99629645020259,99.99954617623786,99.99995764844462,100.0
93 changes: 62 additions & 31 deletions first-gen-filler.py
Original file line number Diff line number Diff line change
Expand Up @@ -23,13 +23,14 @@
import sys
import os
import random
import scipy
from scipy import integrate
import math
import pymatgen
import pymatgen.io.vasp
import pymatgen.symmetry.analyzer

def phi(x):
return (1 / math.sqrt(2 * math.pi)) * (scipy.integrate.quad(lambda t:\
math.exp(-((t ** 2) / 2)), 0, x))
def chi(x):
out = integrate.quad(lambda t: math.exp(-((t ** 2)/2)), 0, x)
return out[0]

CU_OCC = 18
CU_AV = 108
Expand All @@ -38,10 +39,12 @@ def phi(x):
AG_BI_AV = 27
I_AV_OCC = 54
SPACE_GROUPS = 230
R_TOL = 0.1
A_TOL = 5.0

arg_parser = argparse.ArgumentParser()

arg_parser.add_argument("-i", "--index", type=argparse.FileType("r"),
arg_parser.add_argument("-i", "--index",
required=True, help="provide \"path/to/INDEX.vasp\",\
the index file containing all atomic positions")
arg_parser.add_argument("-b", "--bins", type=int, default=1, help="Specifies\
Expand All @@ -66,14 +69,18 @@ def phi(x):
print("Make sure arguments are within bounds and that -n >= -b.")
sys.exit(1)

index_file = open(args.index, "r")

# Read in INDEX.vasp into linetable and parse components
index = args.index.readlines()
index = index_file.readlines()

index_file.close()

lattice_constant = index[1]
# A: 0, B: 1, C: 2
LatticeMatrix = [index[i] for i in range(2, 5)]

Elements = zip(index[5].split(), [int(x) for x in index[6].split()])
Elements = [i for i in zip(index[5].split(), [int(x) for x in index[6].split()])]
# Just making sure that the file is in the correct format :)
assert len(Elements) == 3
assert index[7].strip() == "Direct"
Expand Down Expand Up @@ -110,52 +117,76 @@ def phi(x):

# Calculate the distribution of space groups into bins on a normal
# distribution
Bins = [math.round(2 * SPACE_GROUPS *\
phi((args.aggressiveness * i) / args.bins)) for i in\
range(1, args.bins)]
Bins[args.bins - 1] = SPACE_GROUPS

# Create subdirectories for each bin
if args.aggressiveness == 0:
Bins = [round((SPACE_GROUPS * i) / args.bins) for i in\
range(1, args.bins + 1)]
else:
Bins = [round(SPACE_GROUPS *\
((chi((args.aggressiveness * i) / args.bins)) /\
(chi(args.aggressiveness)))) for i in\
range(1, args.bins + 1)]
BinSize = [0] * args.bins
MAX_BIN_SIZE = args.number // args.bins

# Create subdirectories for each bin (will throw error if already exists)
parent_dir = os.path.dirname(args.index)
for i in range(0, args.bins):
os.mkdir(os.path.join(parent_dir, str(i)))

# Now generate the random config
for i in range(args.number):
AgBiOcc = random.choices(AgBiIndex, k=AG_OCC + BI_OCC)
i = 0
attempt = 0
while i < args.number:
print(attempt)
AgBiOcc = random.sample(AgBiIndex, k=AG_OCC + BI_OCC)
AgOcc = AgBiOcc[0:AG_OCC]
BiOcc = AgBiOcc[AG_OCC:AG_OCC + BI_OCC]
CuOcc = random.choices(CuIndex, k=CU_OCC)
CuOcc = random.sample(CuIndex, k=CU_OCC)

# Create the new file and write its contents
vasp_file = open(f"{i}.vasp", "w")
file_path = os.path.join(parent_dir, f"{i}.vasp")
vasp_file = open(file_path, "w")
vasp_file.write(f"{i}\n")
vasp_file.write(lattice_constant)
vasp_file.writelines(LatticeMatrix)
vasp_file.write(f"{'Ag':<3}{'Bi':<3}{'Cu':<3}{'I':<3}\n")
vasp_file.write(f"{AG_OCC:3d}{BI_OCC:3d}{CU_OCC:3d}{I_AV_OCC:3d}\n")
vasp_file.write(f"{AG_OCC:2d}{BI_OCC:3d}{CU_OCC:3d}{I_AV_OCC:3d}\n")
vasp_file.write("Direct\n")
vasp_file.writelines(AgOcc)
vasp_file.writelines(BiOcc)
vasp_file.writelines(CuOcc)
vasp_file.writelines(IIndex)
vasp_file.close()

# Calculate and append spacegroup to the start of the file
poscar = pymatgen.io.vasp.inputs.Poscar\
.from_file(f"{i}.vasp", check_for_POTCAR=False, read_velocities=False)

spacegroup = pymatgen.symmetry.analyzer.SpacegroupAnalyzer(poscar)\
.get_space_group_number()

vasp_file.seek(0)
vfline = vasp_file.readline()
vfline = vfline + f" ({spacegroup})"
vasp_file.seek(0)
vasp_file.writeline(vfline)
.from_file(file_path, check_for_POTCAR=False, read_velocities=False)

# TODO: figure out why this is only returning 1. Ran 40k+ attempts, only
# returned 1 ever. Am I just not scanning enough input or is there actually
# a problem with the system?
spacegroup = pymatgen.symmetry.analyzer.\
SpacegroupAnalyzer(poscar.structure, symprec=R_TOL,\
angle_tolerance=A_TOL).get_space_group_number()

vasp_file = open(file_path, "r")
vflines = vasp_file.readlines()
vflines[0] = vflines[0].strip() + f" ({spacegroup})\n"
vasp_file.close()
vasp_file = open(file_path, "w")
vasp_file.writelines(vflines)
vasp_file.close()

# Now determine the proper bin and move file into that bin.
for j in range(0, args.bins):
if spacegroup <= Bins[j]:
new_path = os.path.join(parent_dir, j, f"{i}.vasp")
os.rename(f"{i}.vasp", new_path)
if BinSize[j] < MAX_BIN_SIZE:
new_path = os.path.join(parent_dir, str(j), f"{i}.vasp")
os.rename(file_path, new_path)
BinSize[j] += 1
i += 1
break
os.remove(file_path)
break

attempt += 1
Loading

0 comments on commit 43a1baa

Please sign in to comment.