Add ability to select path segments using Python re regexprs; follow on to PR#146 & #177 #186

Open. Wants to merge 17 commits into base: master. Showing changes from 5 commits.

79 changes: 79 additions & 0 deletions README.rst
Expand Up @@ -111,6 +111,9 @@ elements in ``x['a']['b']`` where the key is equal to the glob ``'[cd]'``. Okay.
}
}

**Note**: Using Python's `re` regular expressions instead of globs is explained
below, in re_regexp_.

... Wow that was easy. What if I want to iterate over the results, and
not get a merged view?

Expand Down Expand Up @@ -438,6 +441,82 @@ To get around this, you can sidestep the whole "filesystem path" style, and aban
>>> dpath.get(['a', 'b/c'])
0

.. _re_regexp:

Globs too imprecise? Use Python's `re` Regular Expressions
==========================================================

Python's `re` regular expressions (see PythonRe_) may be used as follows:

.. _PythonRe: https://docs.python.org/3/library/re.html

- This facility is enabled by default, but may be disabled (for backwards
compatibility in the unlikely case where a path expression component starts
with '{' and ends with '}'):

.. code-block:: python

>>> import dpath
>>> # disable
>>> dpath.options.DPATH_ACCEPT_RE_REGEXP = False
>>> # enable
>>> dpath.options.DPATH_ACCEPT_RE_REGEXP = True

- A path component may now also be specified as a regular expression:

- in a path expression, as ``{<re.regexpr>}``, where ``<re.regexpr>`` is a regular expression
accepted by the standard Python module `re`. For example:

.. code-block:: python

>>> selPath = 'Config/{(Env|Cmd)}'
>>> x = dpath.search(js.lod, selPath)

.. code-block:: python

>>> selPath = '{(Config|Graph)}/{(Env|Cmd|Data)}'
>>> x = dpath.search(js.lod, selPath)

- When using the list form for a path, a list element can also
be expressed as

- a string as above
- the output of ``re.compile(args)``

An example:

.. code-block:: python

>>> import re
>>> selPath = [re.compile('(Config|Graph)'), re.compile('(Env|Cmd|Data)')]
>>> x = dpath.search(js.lod, selPath)
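
To make the list-form semantics concrete without depending on this PR's changes, here is a minimal stand-alone sketch. The helper ``select_by_patterns`` and the sample dictionary are hypothetical and not part of dpath. The helper consumes one compiled pattern per dictionary level and keeps the keys for which ``Pattern.match`` succeeds.

```python
import re


def select_by_patterns(data, patterns):
    """Hypothetical helper (not part of dpath): keep nested keys that match, level by level."""
    head, rest = patterns[0], patterns[1:]
    selected = {}
    for key, value in data.items():
        # Pattern.match anchors at the start of the key only.
        if head.match(str(key)) is None:
            continue
        if not rest:
            selected[key] = value
        elif isinstance(value, dict):
            nested = select_by_patterns(value, rest)
            if nested:
                selected[key] = nested
    return selected


# Made-up sample data, loosely shaped like a container description.
sample = {
    "Config": {"Env": ["A=1"], "Cmd": ["sh"], "Image": "alpine"},
    "Graph": {"Data": {"k": "v"}, "Name": "overlay2"},
    "State": {"Env": "ignored"},
}
sel_path = [re.compile("(Config|Graph)"), re.compile("(Env|Cmd|Data)")]
result = select_by_patterns(sample, sel_path)
```

Here ``result`` keeps ``Config/Env``, ``Config/Cmd`` and ``Graph/Data``. ``State/Env`` is dropped because ``State`` fails the first pattern.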

More examples from a realistic json context:

+-----------------------------------------+--------------------------------------+
| **Extended path glob**                  | **Designates**                       |
+-----------------------------------------+--------------------------------------+
| "\*\*/{^[A-Za-z]{2}$}"                  | "Id"                                 |
+-----------------------------------------+--------------------------------------+
| r"\*/{[A-Z][A-Za-z\\d]*$}"              | "Name","Id","Created", "Scope",...   |
+-----------------------------------------+--------------------------------------+
| r"\*\*/{[A-Z][A-Za-z\\d]*\\d$}"         | "EnableIPv6"                         |
+-----------------------------------------+--------------------------------------+
| r"\*\*/{[A-Z][A-Za-z\\d]*Address$}"     | "Containers/199c5/MacAddress"        |
+-----------------------------------------+--------------------------------------+
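
The regular expressions in the last three rows can be checked directly with the standard `re` module, independently of dpath. The key list below is a made-up sample built from the names in the table. `re.match` is used because it is the call the matching code in this PR relies on.

```python
import re

# Hypothetical sample keys; only the names quoted in the table are taken from the README.
keys = ["Name", "Id", "Created", "Scope", "EnableIPv6", "MacAddress", "com.docker.network"]

patterns = {
    "capitalized alphanumeric": r"[A-Z][A-Za-z\d]*$",
    "ends in a digit": r"[A-Z][A-Za-z\d]*\d$",
    "ends in Address": r"[A-Z][A-Za-z\d]*Address$",
}

# For each pattern, collect the keys that re.match accepts.
matches = {
    label: [key for key in keys if re.match(regex, key)]
    for label, regex in patterns.items()
}
```

The key ``com.docker.network`` is rejected by all three patterns, since each requires an uppercase first character.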

Under Python's string-literal rules, the backslashes required by the `re` syntax
can be written either in raw strings or as doubled backslashes, so
the following are equivalent:

+-----------------------------------------+----------------------------------------+
| *with raw strings*                      | *equivalent* with double backslash     |
+-----------------------------------------+----------------------------------------+
| r"\*\*/{[A-Z][A-Za-z\\d]*\\d$}"         | "\*\*/{[A-Z][A-Za-z\\\\d]*\\\\d$}"     |
+-----------------------------------------+----------------------------------------+
| r"\*\*/{[A-Z][A-Za-z\\d]*Address$}"     | "\*\*/{[A-Z][A-Za-z\\\\d]*Address$}"   |
+-----------------------------------------+----------------------------------------+
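
The equivalence of the two spellings can be confirmed in plain Python. This is a small sketch using only the standard library, and the tested keys are illustrative.

```python
import re

raw = r"[A-Z][A-Za-z\d]*\d$"         # raw string: backslashes are kept as typed
doubled = "[A-Z][A-Za-z\\d]*\\d$"    # ordinary string: each backslash is doubled

same = raw == doubled                # both literals denote the same pattern text
hit = re.match(raw, "EnableIPv6")    # ends in a digit, so it matches
miss = re.match(doubled, "Name")     # no trailing digit, so no match
```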


dpath.segments : The Low-Level Backend
======================================

Expand Down
21 changes: 20 additions & 1 deletion dpath/__init__.py
Expand Up @@ -27,6 +27,10 @@
from dpath.exceptions import InvalidKeyName, PathNotFound
from dpath.types import MergeType, PathSegment, Creator, Filter, Glob, Path, Hints

import sys
import re


_DEFAULT_SENTINEL = object()


Expand All @@ -45,7 +49,22 @@ def _split_path(path: Path, separator: Optional[str] = "/") -> Union[List[PathSe
     else:
         split_segments = path.lstrip(separator).split(separator)

-    return split_segments
+    final = []
+    for segment in split_segments:
+        # startswith/endswith are safe on empty segments, unlike segment[0]
+        if (options.DPATH_ACCEPT_RE_REGEXP and isinstance(segment, str)
+                and segment.startswith('{') and segment.endswith('}')):
+            try:
+                rs = segment[1:-1]
+                rex = re.compile(rs)
+            except re.error as reErr:
+                print(f"Error in segment '{segment}': string '{rs}' not accepted "
+                      + f"as re.regexp:\n\t{reErr}",
+                      file=sys.stderr)
+                raise
+            final.append(rex)
+        else:
+            final.append(segment)
+    return final


def new(obj: MutableMapping, path: Path, value, separator="/", creator: Creator = None) -> MutableMapping:
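
The segment post-processing in the hunk above can be tried in isolation with a stand-alone sketch. The function ``compile_segments`` and its ``accept_regex`` parameter are hypothetical stand-ins for ``_split_path`` and ``options.DPATH_ACCEPT_RE_REGEXP``. The sketch handles string paths only and, unlike the hunk, requires at least two characters so that a lone ``{`` stays literal.

```python
import re
from typing import List, Union


def compile_segments(path: str, separator: str = "/",
                     accept_regex: bool = True) -> List[Union[str, re.Pattern]]:
    """Hypothetical stand-in: split a path, compiling '{...}' segments into patterns."""
    segments: List[Union[str, re.Pattern]] = []
    for segment in path.lstrip(separator).split(separator):
        is_regex = (accept_regex and len(segment) >= 2
                    and segment.startswith("{") and segment.endswith("}"))
        if is_regex:
            # re.error propagates to the caller for an invalid pattern.
            segments.append(re.compile(segment[1:-1]))
        else:
            segments.append(segment)
    return segments


parsed = compile_segments("Config/{(Env|Cmd)}")
literal = compile_segments("a/b/{cd}", accept_regex=False)
with_empty = compile_segments("a//b")
```

Because splitting on the separator happens before the braces are inspected, a pattern that itself contains the separator (for example ``{a/b}``) is cut into two ordinary segments in this sketch.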
Expand Down
6 changes: 6 additions & 0 deletions dpath/options.py
@@ -1 +1,7 @@
ALLOW_EMPTY_STRING_KEYS = False

# Interpret path segments of the form "{rrr}" as the regular expression "rrr"
# (enabled by default). Disable to preserve backwards compatibility when a
# path such as "a/b/{cd}" contains intentional braces that are not a request
# to re.compile("cd").
DPATH_ACCEPT_RE_REGEXP = True
21 changes: 14 additions & 7 deletions dpath/segments.py
Expand Up @@ -6,6 +6,8 @@
from dpath.exceptions import InvalidGlob, InvalidKeyName, PathNotFound
from dpath.types import PathSegment, Creator, Hints, Glob, Path, SymmetricInt

from re import Pattern


def make_walkable(node) -> Iterator[Tuple[PathSegment, Any]]:
"""
Expand Down Expand Up @@ -182,9 +184,11 @@ def match(segments: Path, glob: Glob):
     or more star segments and the type will be coerced to match that of
     the segment.

-    A segment is considered to match a glob if the function
-    fnmatch.fnmatchcase returns True. If fnmatchcase returns False or
-    throws an exception the result will be False.
+    A segment is considered to match a glob component when either:
+    - the glob component is a string and fnmatch.fnmatchcase returns True
+      (if fnmatchcase returns False or throws an exception, the result is False);
+    - or the glob component is a re.Pattern (the result of re.compile) and
+      re.Pattern.match returns a match.

     match(segments, glob) -> bool
     """
Expand Down Expand Up @@ -241,10 +245,13 @@ def match(segments: Path, glob: Glob):
                 s = str(s)

             try:
-                # Let's see if the glob matches. We will turn any kind of
-                # exception while attempting to match into a False for the
-                # match.
-                if not fnmatchcase(s, g):
+                # Let's see if the glob or the regular expression matches. We will turn any kind of
+                # exception while attempting to match into a False for the match.
+                if isinstance(g, Pattern):
+                    mobj = g.match(s)
+                    if mobj is None:
+                        return False
+                elif not fnmatchcase(s, g):
                     return False
             except:
                 return False
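
One consequence of calling ``Pattern.match`` in the branch above is that patterns are anchored at the start of the segment only. The sketch below isolates that branch in a hypothetical helper, ``component_matches`` (not part of dpath), to show the effect and how a trailing ``$`` restores exact matching.

```python
import re
from fnmatch import fnmatchcase


def component_matches(segment, glob) -> bool:
    """Hypothetical helper mirroring the branch above for one segment/glob pair."""
    try:
        if isinstance(glob, re.Pattern):
            return glob.match(segment) is not None
        return fnmatchcase(segment, glob)
    except Exception:
        # As in the hunk, any error while matching counts as "no match".
        return False


prefix_hit = component_matches("Environment", re.compile("Env"))   # start-anchored only
exact_miss = component_matches("Environment", re.compile("Env$"))  # '$' forces the end
glob_hit = component_matches("Env", "E*")                          # plain glob path
type_mismatch = component_matches(b"Env", re.compile("Env"))       # TypeError becomes False
```

Users who want whole-key matches should therefore end their patterns with ``$``, as the README examples do.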
Expand Down