-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathnotes.txt
157 lines (132 loc) · 4.96 KB
/
notes.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
Types
next: parsed
prev: parsed
dimensions: Array of number
charClass: { min: string, max: string, negated: boolean }
props: Array of { name: string, shallow: boolean }
parsed: everything below
alternationStart { type, props, end, next: Array of parsed, prev }
alternationEnd { type, props, next, prev: Array of parsed }
assertionStart { type, name: string, dimensions, condition: parsed | null, then: parsed, thenNot: parsed, prev }
assertionEnd { type, name: string, dimensions, condition: parsed | null, then: parsed, thenNot: parsed, next }
backreference { type, name: string, dimensions, next, prev }
capturingGroupStart { type, name: string, objName: string | null, shallow: boolean, next: parsed, prev }
capturingGroupEnd {type, next, prev: parsed }
conditionCheck: { condition: parsed, negated: boolean, lookbehind: boolean, next: parsed, prev: parsed }
group { type, arr: Array of [ boundary | string | charClass | Array of [ string | charClass ] ], next, prev }
subroutine { type, name: string, next, prev }
quantifierStart { type: [greedy|lazy|possesive]Start, min: number, max: number, special: boolean, props, end: parsed, next, prev }
quantifierEnd { type: [greedy|lazy|possesive]End, min: number, max: number, special: boolean, props, start: parsed, next, prev }
Stacks
subroutineStackItem
{ retObj: Object
, quantStateOpenStack: Array of quantStateStackItem
, quantStateClosedStack: Array of quantStateStackItem
, quantifierOpenStack: Array of quantifierStackItem
, quantifierClosedStack: Array of quantifierStackItem
, ignoreCase: boolean
, next: parsed | null
, depth: number
}
quantifierStackItem
{ captrGroupOpenStack: Array of captrGroupStackItem
, captrGroupClosedStack: Array of captrGroupStackItem
, start: number
, end: number
, flag: boolean
, props
, special
, depth: number
}
captrGroupStackItem
{ name: string
, objName
, shallow: boolean
, posStart: number
, posEnd: number
}
new quantifier - captrGroup qNested; qItem
push pop
start !== end; start > 0 start === end; end === 0
old quantifier - captrGroup qBreak;
push pop
start === end; start > 0 start === end; end > 0
------
Regex
Name:x
i|Name:x
Special characters
global: `^$|\*+?{}[]()`
local (only considered special in some places): `<>:`
Boundaries
^ $ | \a \z \` \'
Alternation and character classes
\[-?xyz\]
Groups and subroutines
\(x\)
\(:Name\)
\([.:]name[.:]x\)
\(:name:\(:Name\)\)
\(:name=capturedValue\)
TODO global capturing groups (;name)
TODO? unmatch capturing group (:name!)
Backreference
\(=name(\[n\])*\)
Quantifiers
x[\?\*\+(\{n(,m?)?\})][\?\+]?
Lookaround
(?x:y<z)
(?x>y:z)
Assertion
\(|name(\[n\])*(=x)?>y:z)
------
------
global:
array of string
map of captrName -> type (multi shallow global unmatch)
Character: char
Character classes: negated, char, char
Negated alternation: negated, array of char
Boundary (7 types): type
Alternation: array of prop, array of parsed
Subroutine: name
Capturing group: name, obj, hidden, value
Backreference name, array of dimension
Quantifier: array of prop, type (greedy lazy possesive), min, max, special
Lookahead: alternation of [ [ conditionA, then ], [ !conditionA, thenNot ] ]
Lookbehind: alternation of [ [ conditionB, then ], [ !conditionB, thenNot ] ]
Condition: lookbehind, negated, address
Assertion name, array of dimension, condition, then, thenNot
------
TODO
1. Unicode? http://www.regular-expressions.info/unicode.html
2. Second mode of matching? Matches will be organized into array, including parts
of the string that are not captured, object properies will not be strings,
but integer indexes of the array. Concatenating the array will create the
substring from starting to ending position.
Example:
{ arr [ "<", "i", ">", "Hello!", "</i>" ]
, loc: { ... }
, input: "<i>Hello!</i>"
, name: { ... }
, htmlTag: 1
, attributes: []
, content: 3
}
3. Option to throw when match is not found, with descriptive message like: Expected one of: X, Y, Z found "("
4. Detect possible catastrophic bactracking, provide option to disable such
regexes so this library can be used without worrying about ReDos (I have
no idea whether this is possible)
http://www.regular-expressions.info/catastrophic.html
5. \p{UnicodeSomething} - implement if/when javascript implements this?
6. \x{UnicodeCodePoint} - implement if/when javascript implements this?
7. Should this be valid?
(abc(;(:num:\d+)zzz(=num);){3,}(=num[0])def(, )?)*(=num[-1][-1])
8. RegexGroup.finish() - forbids futrher modification of this RegexGroup, can do
further optimizations maybe?
9. Alternation shouldn't manipulate quantifierStack if it contains no captrGroups?
10. Assertion and backreference that work for all values in array, like (=ref[])
11. Encode regex as array of integers?
12. How should replacing work?
13. Way to capture true, false, number, date, symbol? (:Boolean), (:Boolean(true)) (:Int(178))
14. RegexGroup.export() : Object (not static); new RegexGroup(exportObj[, newName[, trackLines]])