regex form guess CSV params; fix multiline tb bug
1. The regex search form now automatically tries to
    guess the delimiter, EOL, and number of columns of a CSV file
    if the new auto_try_guess_csv_delim_newline setting is on.
    This is pretty slow, but there's room for improvement,
    and it makes life a lot easier.
2. Fix a bug (introduced in the last commit) where the Enter key
    did not add a new line in multiline textboxes.
molsonkiko committed Feb 9, 2024
1 parent bfba863 commit 5a418d7
Showing 14 changed files with 433 additions and 107 deletions.
90 changes: 45 additions & 45 deletions .github/workflows/CI_build.yml
@@ -1,45 +1,45 @@
name: Continuous Integration

on:
  push:
    paths-ignore:
      - 'docs/**'
      - '*.md'
      - '*.txt'
      - '*.PNG'
      - 'makerelease.bat'
      - 'testfiles/**'

jobs:
  build:
    runs-on: windows-2022
    strategy:
      max-parallel: 4
      matrix:
        build_configuration: [Release, Debug]
        build_platform: [x64, x86]

    steps:
    - name: Checkout repo
      uses: actions/checkout@v4

    - name: Add msbuild to PATH
      uses: microsoft/[email protected]

    - name: MSBuild of solution
      run: msbuild JsonToolsNppPlugin/JsonToolsNppPlugin.sln /p:configuration="${{ matrix.build_configuration }}" /p:platform="${{ matrix.build_platform }}" /m /verbosity:minimal

    - name: Archive artifacts for x64
      if: matrix.build_platform == 'x64' && matrix.build_configuration == 'Release'
      uses: actions/upload-artifact@v4
      with:
        name: plugin_dll_x64
        path: JsonToolsNppPlugin\bin\${{ matrix.build_configuration }}-x64\JsonTools.dll

    - name: Archive artifacts for x86
      if: matrix.build_platform == 'x86' && matrix.build_configuration == 'Release'
      uses: actions/upload-artifact@v4
      with:
        name: plugin_dll_x86
        path: JsonToolsNppPlugin\bin\${{ matrix.build_configuration }}\JsonTools.dll

8 changes: 3 additions & 5 deletions CHANGELOG.md
@@ -26,11 +26,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
* Add `uses_context` field to ArgFunction instances, so that they have JQueryContext appended to their arguments, and they can reference fields of that JQueryContext.
* This way we don't have to have these methods mutating and referencing a global static variable.
* Additionally, the presence of a function with `uses_context=true` would serve as a flag that the query cannot be executed in parallel, because doing so would cause race conditions associated with the shared JQueryContext fields.
- 7. Make it so the regex search form makes a very basic effort to determine the quote character, delimiter, and number of columns in CSV files.
- * maybe only try to do this for files with the `.csv` and `.tsv` extensions
- * only test the `,` and `\t` delimiters, and only the `"` or `'` quote characters
- * test only the first 10KB of the file, or first 25 lines, whichever comes first.
- 8. Unit tests that randomly generate text with JSON chars to make sure JSON parser never throws for any reason, since errors aren't caught.
7. Unit tests that randomly generate text with JSON chars to make sure JSON parser never throws for any reason, since errors aren't caught.

### To Be Changed

@@ -52,6 +48,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
- issue with treeview closing when a file with a treeview is moved from one view to another
- `loop()` function used in `s_sub` callbacks is not thread-safe. *This doesn't matter right now* because RemesPath is single-threaded, but it could matter in the future.
- __GrepperForm loses its JSON permanently when the buffer associated with its treeview is deleted.__
- Since v7.0, holding down `Enter` in a multiline textbox (like the [tree viewer query box](/docs/README.md#remespath)) only adds one newline when the key is lifted.

## [7.0.0] - (UNRELEASED) YYYY-MM-DD

@@ -63,6 +60,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
4. [Python-style single-line comments in RemesPath](/docs/RemesPath.md#comments-added-in-v62)
5. A [RemesPath user-defined language (UDL) file](/RemesPath%20UDL.xml), providing some very basic syntax highlighting. It is buggy, but that is because the UDL system is inherently buggy, not because I did anything wrong (as far as I know).
6. A `:` character between two key-value pairs in an object no longer causes a fatal error that makes the parser quit.
7. Add new `auto_try_guess_csv_delim_newline` setting. If this is true (default false), the [Regex search form](/docs/README.md#regex-search-form) makes a very basic attempt to "sniff" whether the current file is CSV (guessing its delimiter, newline, and number of columns) whenever the form is opened, or when the `Parse as CSV?` checkbox is checked.

### Changed

25 changes: 25 additions & 0 deletions JsonToolsNppPlugin/Forms/NppFormHelper.cs
@@ -79,6 +79,8 @@ public static void GenericKeyUpHandler(Form form, object sender, KeyEventArgs e,
// Enter has the same effect as clicking a selected button
btn.PerformClick();
}
else
PressEnterInTextBoxHandler(sender, isModal);
}
// Escape ->
// * if this.IsModal (meaning this is a pop-up dialog), close this.
@@ -97,6 +99,29 @@ public static void GenericKeyUpHandler(Form form, object sender, KeyEventArgs e,
}
}

/// <summary>
/// NPPM_MODELESSDIALOG consumes the KeyDown and KeyPress events for the Enter key,<br></br>
/// so our KeyUp handler needs to simulate pressing enter to add a new line in a multiline text box.<br></br>
/// Note that this does not fully repair the functionality of the Enter key in a multiline text box,
/// because only one newline can be created for a single keypress of Enter, no matter how long the key is held down.
/// </summary>
/// <param name="sender">the text box that sent the message</param>
/// <param name="isModal">if true, this blocks the parent application until closed. THIS IS ONLY TRUE OF POP-UP DIALOGS</param>
public static void PressEnterInTextBoxHandler(object sender, bool isModal)
{

if (!isModal && sender is TextBox tb && tb.Multiline)
{
int selstart = tb.SelectionStart;
tb.SelectedText = "";
string text = tb.Text;
tb.Text = text.Substring(0, selstart) + "\r\n" + text.Substring(selstart);
tb.SelectionStart = selstart + 2; // after the inserted newline
tb.SelectionLength = 0;
tb.ScrollToCaret();
}
}

/// <summary>
/// CALL THIS IN YOUR Dispose(bool disposing) METHOD, INSIDE OF THE ".Designer.cs" FILE<br></br>
/// If this was a modeless dialog (i.e., !isModal; a dialog that does not block Notepad++ while open),<br></br>
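The text splicing in `PressEnterInTextBoxHandler` is easy to check outside WinForms. The sketch below is a hypothetical helper (`NewlineInsertion.InsertNewline` is not part of JsonTools) that assumes the handler's intended effect is: delete the current selection, insert `\r\n` where the selection started, and leave the caret just after the newline.

```csharp
using System;

// Hypothetical helper (not part of JsonTools) mirroring the string arithmetic in
// PressEnterInTextBoxHandler, without a WinForms TextBox.
public static class NewlineInsertion
{
    // Replaces text[selStart .. selStart + selLength) with "\r\n" and returns
    // the new text plus the caret index just after the inserted newline.
    public static (string Text, int Caret) InsertNewline(string text, int selStart, int selLength)
    {
        if (selStart < 0 || selLength < 0 || selStart + selLength > text.Length)
            throw new ArgumentOutOfRangeException(nameof(selStart));
        // like tb.SelectedText = "": drop the selected characters
        string withoutSelection = text.Remove(selStart, selLength);
        // like rebuilding tb.Text: splice CRLF in at the old selection start
        string result = withoutSelection.Insert(selStart, "\r\n");
        return (result, selStart + 2); // caret goes after the 2-char newline
    }
}
```

For example, with `"abcd"` and `"bc"` selected (`selStart = 1`, `selLength = 2`), the result is `"a\r\nd"` with the caret at index 3.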
75 changes: 63 additions & 12 deletions JsonToolsNppPlugin/Forms/RegexSearchForm.cs
@@ -5,6 +5,7 @@
using JSON_Tools.JSON_Tools;
using System.Linq;
using System.Collections.Generic;
using Kbg.NppPluginNET.PluginInfrastructure;

namespace JSON_Tools.Forms
{
@@ -53,6 +54,10 @@ public RegexSearchForm()
"}"),
0);
GetTreeViewInRegexMode();
// checking the box may trigger CSV sniffing; uncheck it again if no column count was found
ParseAsCsvCheckBox.Checked = true;
if (NColumnsTextBox.Text.Length == 0)
ParseAsCsvCheckBox.Checked = false;
}

public void GrabFocus()
@@ -86,7 +91,7 @@ private void GetTreeViewInRegexMode()
private static readonly Dictionary<string, int> HEADER_HANDLING_ABBREV_MAP = new Dictionary<string, int> { ["\"h\""] = 1, ["\"n\""] = 0, ["\"d\""] = 2, ["1"] = 1, ["2"] = 2, ["0"] = 0 };

private static readonly Dictionary<string, int> NEWLINE_MAP = new Dictionary<string, int> { ["\"\\r\\n\""] = 0, ["\"\\n\""] = 1, ["\"\\r\""] = 2, ["0"] = 0, ["1"] = 1, ["2"] = 2 };

public void SearchButton_Click(object sender, EventArgs e)
{
GetTreeViewInRegexMode();
@@ -151,22 +156,68 @@ private void RegexSearchForm_KeyUp(object sender, KeyEventArgs e)
NppFormHelper.GenericKeyUpHandler(this, sender, e, false);
}

/// <summary>
/// <strong>Checking</strong> the ParseAsCsvCheckBox does the following:<br></br>
/// - reveals all the CSV-related controls<br></br>
/// - disables the regex-related controls<br></br>
/// - if the auto_try_guess_csv_delim_newline setting is true, sniffs the first 1600 characters of the document (or 16 lines, whichever comes first)
/// using every combo of (',', '\t') delimiters and ('\r\n', '\n', '\r') newlines
/// and sets the CSV controls appropriately if a match is found<br></br>
/// <strong>Unchecking</strong> the ParseAsCsvCheckBox does the following:<br></br>
/// - hides the CSV-related controls<br></br>
/// - enables the regex-related controls
/// </summary>
public void ParseAsCsvCheckBox_CheckedChanged(object sender, EventArgs e)
{
bool showCsvButtons = ParseAsCsvCheckBox.Checked;
- QuoteCharTextBox.Visible = showCsvButtons;
- QuoteCharTextBoxLabel.Visible = showCsvButtons;
- DelimiterTextBox.Visible = showCsvButtons;
- DelimiterTextBoxLabel.Visible = showCsvButtons;
- NewlineComboBox.Visible = showCsvButtons;
- NewlineComboBoxLabel.Visible = showCsvButtons;
- HeaderHandlingComboBox.Visible = showCsvButtons;
// thanks to the magical mysteries of registering this form with NPPM_MODELESSDIALOG,
// the order in which I make controls visible defines their tab order.
DelimiterTextBox.Visible = showCsvButtons;
DelimiterTextBoxLabel.Visible = showCsvButtons;
QuoteCharTextBox.Visible = showCsvButtons;
QuoteCharTextBoxLabel.Visible = showCsvButtons;
NewlineComboBox.Visible = showCsvButtons;
NewlineComboBoxLabel.Visible = showCsvButtons;
NColumnsTextBox.Visible = showCsvButtons;
NColumnsTextBoxLabel.Visible = showCsvButtons;
HeaderHandlingComboBox.Visible = showCsvButtons;
HeaderHandlingComboBoxLabel.Visible = showCsvButtons;
- NColumnsTextBox.Visible = showCsvButtons;
- NColumnsTextBoxLabel.Visible = showCsvButtons;
- RegexTextBox.Enabled = !showCsvButtons;
- IgnoreCaseCheckBox.Enabled = !showCsvButtons;
RegexTextBox.Enabled = !showCsvButtons;
IgnoreCaseCheckBox.Enabled = !showCsvButtons;
IncludeFullMatchAsFirstItemCheckBox.Enabled = !showCsvButtons;
if (showCsvButtons && Main.settings.auto_try_guess_csv_delim_newline)
{
if (TrySniffCommonDelimsAndEols(out EndOfLine eol, out char delim, out int nColumns))
{
// we found possible NColumns, delimiter, and Newline values
NColumnsTextBox.Text = nColumns.ToString();
DelimiterTextBox.Text = ArgFunction.CsvCleanChar(delim);
QuoteCharTextBox.Text = "\"";
NewlineComboBox.SelectedIndex = eol == EndOfLine.CRLF ? 0 : eol == EndOfLine.LF ? 1 : 2;
}
}
}

private static bool TrySniffCommonDelimsAndEols(out EndOfLine eol, out char delim, out int nColumns)
{
eol = EndOfLine.CRLF;
delim = '\x00';
nColumns = -1;
string text = Npp.editor.GetText(CsvSniffer.DEFAULT_MAX_CHARS_TO_SNIFF * 3 / 2);
foreach (EndOfLine maybeEol in new EndOfLine[]{EndOfLine.CRLF, EndOfLine.LF, EndOfLine.CR})
{
foreach (char maybeDelim in ",\t")
{
nColumns = CsvSniffer.Sniff(text, maybeEol, maybeDelim, '"');
if (nColumns >= 2)
{
delim = maybeDelim;
eol = maybeEol;
return true;
}
}
}
return false;
}


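`TrySniffCommonDelimsAndEols` stops at the first (newline, delimiter) pair whose sniffed column count is at least 2, so the order of attempts matters. The sketch below isolates that search. `Eol` is a hypothetical stand-in for the plugin's `EndOfLine` enum, and the sniffing function is passed in as a delegate, an assumption made so the search can be exercised without Notepad++.

```csharp
using System;

// Hypothetical stand-in for the plugin's EndOfLine enum.
public enum Eol { CRLF, LF, CR }

public static class DelimEolSearch
{
    // Tries (eol, delimiter) pairs in the same order as TrySniffCommonDelimsAndEols:
    // CRLF, then LF, then CR; ',' before '\t'. The first pair whose sniffed
    // column count is at least 2 wins.
    public static bool TryGuess(string text, Func<string, Eol, char, int> sniff,
        out Eol eol, out char delim, out int nColumns)
    {
        eol = Eol.CRLF;
        delim = '\0';
        nColumns = -1;
        foreach (Eol maybeEol in new[] { Eol.CRLF, Eol.LF, Eol.CR })
        {
            foreach (char maybeDelim in ",\t")
            {
                nColumns = sniff(text, maybeEol, maybeDelim);
                if (nColumns >= 2)
                {
                    eol = maybeEol;
                    delim = maybeDelim;
                    return true;
                }
            }
        }
        return false;
    }

    // The same mapping the form uses for NewlineComboBox.SelectedIndex (0 = CRLF, 1 = LF, 2 = CR).
    public static int NewlineComboIndex(Eol eol) => eol == Eol.CRLF ? 0 : eol == Eol.LF ? 1 : 2;
}
```

Trying CRLF before LF is presumably the safer order: a CRLF file sniffed with LF as the newline may still yield a consistent column count, just with a stray `\r` at the end of each row's last field.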
2 changes: 2 additions & 0 deletions JsonToolsNppPlugin/Forms/TreeViewer.cs
@@ -144,6 +144,8 @@ private void TreeViewer_KeyUp(object sender, KeyEventArgs e)
selected.Collapse(true); // don't collapse the children as well
else selected.Expand();
}
else if (QueryBox.Focused)
NppFormHelper.PressEnterInTextBoxHandler(QueryBox, false);
}
// Escape -> go to editor
else if (e.KeyData == Keys.Escape)
68 changes: 68 additions & 0 deletions JsonToolsNppPlugin/JSONTools/CsvSniffer.cs
@@ -0,0 +1,68 @@
using Kbg.NppPluginNET.PluginInfrastructure;
using System.Text.RegularExpressions;

namespace JSON_Tools.JSON_Tools
{
public class CsvSniffer
{
public const int DEFAULT_MAX_CHARS_TO_SNIFF = 1600;

/// <summary>
/// Attempt to parse text as an RFC 4180-compliant CSV file with delimiter delimiter, newline eol, and quote character quote.<br></br>
/// For each line, count how many columns there are. If two lines have different numbers of columns, or if all lines have one column, return -1.<br></br>
/// If maxLinesToSniff lines are consumed before hitting maxCharsToSniff characters,
/// and all the lines have the same number of columns, return that number of columns.<br></br>
/// If maxCharsToSniff characters are consumed before the end of text, return -1 unless at least minLinesToDecide lines were consumed; always return -1 if fewer than two lines were consumed.
/// </summary>
/// <param name="eol">CR, CRLF, or LF</param>
/// <param name="delimiter">the delimiter (e.g., ',' for a CSV file, '\t' for a TSV file)</param>
/// <param name="maxLinesToSniff">consume no more than this many complete lines while sniffing</param>
/// <param name="maxCharsToSniff">consume no more than this many characters while sniffing</param>
/// <param name="minLinesToDecide">return -1 if fewer than this many complete lines were consumed</param>
/// <returns>the number of columns in the file, or -1 if text does not appear to have that delimiter-eol combo</returns>
public static int Sniff(string text, EndOfLine eol, char delimiter, char quote, int maxLinesToSniff = 16, int maxCharsToSniff = DEFAULT_MAX_CHARS_TO_SNIFF, int minLinesToDecide = 6)
{
maxCharsToSniff = maxCharsToSniff > text.Length ? text.Length : maxCharsToSniff;
int nColumns = -1;
int nColumnsThisLine = 0;
int matchStart = 0;
int linesConsumed = 0;
string delimStr = ArgFunction.CsvCleanChar(delimiter);
string newline = eol == EndOfLine.CRLF ? "\r\n" : eol == EndOfLine.CR ? "\r" : "\n";
string escapedNewline = JNode.StrToString(newline, false);
string newlineOrDelimiter = $"(?:{delimStr}|{escapedNewline}|\\z)";
string regexStr = ArgFunction.CsvColumnRegex(ArgFunction.CsvCleanChar(delimiter), new string(quote, 1)) + newlineOrDelimiter;
Regex regex = new Regex(regexStr, RegexOptions.Compiled);
while (matchStart < maxCharsToSniff)
{
Match match = regex.Match(text, matchStart);
if (!match.Success)
return -1;
nColumnsThisLine++;
int matchEnd = matchStart + match.Length;
if (matchEnd == matchStart)
matchEnd++;
bool atEndOfLine = matchEnd >= text.Length || match.Value.EndsWith(newline);
if (atEndOfLine)
{
if (nColumns == -1) // first line
nColumns = nColumnsThisLine;
else if (nColumns == 1 // a row has only one column
|| (nColumns >= 0 && nColumnsThisLine != nColumns)) // two rows with different numbers of columns
return -1;
nColumnsThisLine = 0;
linesConsumed++;
if (linesConsumed == maxLinesToSniff)
return nColumns;
}
matchStart = matchEnd;
}
if (linesConsumed < 2 // only one line is never enough to decide anything
|| (matchStart < text.Length && linesConsumed < minLinesToDecide))
// we haven't consumed enough lines to be confident in our delimiter-eol combo
// (unless we consumed the whole file, in which case we're fine)
return -1;
return nColumns;
}
}
}
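`CsvSniffer.Sniff` delegates field matching to `ArgFunction.CsvColumnRegex`, whose base pattern (`CSV_BASE_COLUMN_REGEX`) is not shown in this diff. The decision rules around it can be illustrated with a simplified, regex-free sketch. `SimpleCsvSniffer` is hypothetical, not JsonTools code: it takes the newline as a string rather than an `EndOfLine`, assumes RFC 4180 quoting (a doubled quote inside a quoted field is a literal quote), and uses the same defaults as the committed method (16 lines, 1600 characters, 6 lines to decide).

```csharp
using System;

// Hypothetical, simplified sketch of the decision rules in CsvSniffer.Sniff (not JsonTools code).
// Assumes RFC 4180 quoting: delimiters and newlines inside a quoted field do not end it,
// and a doubled quote inside a quoted field is a literal quote.
// Simplification: any quote char outside a quoted field starts one, even mid-field.
public static class SimpleCsvSniffer
{
    // Returns the number of columns, or -1 if (newline, delim) does not look right for this text.
    public static int Sniff(string text, string newline, char delim, char quote,
        int maxLinesToSniff = 16, int maxCharsToSniff = 1600, int minLinesToDecide = 6)
    {
        int limit = Math.Min(maxCharsToSniff, text.Length);
        int nColumns = -1, columnsThisLine = 1, lines = 0, pos = 0, lineStart = 0;
        bool inQuotes = false;
        while (pos < limit)
        {
            char c = text[pos];
            if (inQuotes)
            {
                if (c == quote && pos + 1 < text.Length && text[pos + 1] == quote)
                    pos++; // doubled quote: skip the first of the pair
                else if (c == quote)
                    inQuotes = false; // closing quote
                pos++;
            }
            else if (c == quote) { inQuotes = true; pos++; }
            else if (c == delim) { columnsThisLine++; pos++; }
            else if (MatchesAt(text, pos, newline))
            {
                pos += newline.Length;
                lineStart = pos;
                if (!FinishLine(ref nColumns, columnsThisLine)) return -1;
                columnsThisLine = 1;
                if (++lines == maxLinesToSniff) return nColumns;
            }
            else pos++;
        }
        if (pos >= text.Length) // the whole text was consumed
        {
            if (inQuotes) return -1; // unterminated quoted field
            if (lineStart < text.Length) // final line has no trailing newline
            {
                if (!FinishLine(ref nColumns, columnsThisLine)) return -1;
                lines++;
            }
        }
        // one line is never enough; a partial read also needs minLinesToDecide lines
        if (lines < 2 || (pos < text.Length && lines < minLinesToDecide)) return -1;
        return nColumns;
    }

    // Records a finished line. Returns false if the first row had one column
    // or this row's column count differs from the first row's.
    private static bool FinishLine(ref int nColumns, int columnsThisLine)
    {
        if (nColumns == -1) { nColumns = columnsThisLine; return true; }
        return nColumns != 1 && columnsThisLine == nColumns;
    }

    // True if s occurs in text starting exactly at pos.
    private static bool MatchesAt(string text, int pos, string s)
    {
        if (pos + s.Length > text.Length) return false;
        for (int i = 0; i < s.Length; i++)
            if (text[pos + i] != s[i]) return false;
        return true;
    }
}
```

For example, `Sniff("a,b,c\n1,2,3\n", "\n", ',', '"')` returns 3, while the same text sniffed with `'\t'` as the delimiter returns -1 because every row has a single column.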
4 changes: 2 additions & 2 deletions JsonToolsNppPlugin/JSONTools/RemesPathFunctions.cs
@@ -2485,12 +2485,12 @@ public static JNode StrFind(List<JNode> args)
/// <summary>
/// converts the delimiter to a format suitable for use in regular expressions
/// </summary>
- private static string CsvCleanChar(char c)
public static string CsvCleanChar(char c)
{
return c == '\t' ? "\\t" : Regex.Escape(new string(c, 1));
}

- private static string CsvColumnRegex(string delimiter, string quote)
public static string CsvColumnRegex(string delimiter, string quote)
{
return CSV_BASE_COLUMN_REGEX.Replace("{QUOTE}", quote).Replace("{DELIM}", delimiter);
}
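`CsvCleanChar` is made public here so that `CsvSniffer` can splice the delimiter into its own pattern, `(?:{delimStr}|{escapedNewline}|\z)`. Escaping matters when the delimiter is a regex metacharacter: an unescaped `|` would add an empty alternative to that group. The sketch copies the one-line body into a standalone class (`DelimiterEscaping` is a hypothetical name) to show what it returns.

```csharp
using System.Text.RegularExpressions;

// Hypothetical standalone copy of the body of ArgFunction.CsvCleanChar.
public static class DelimiterEscaping
{
    // Tab becomes the two characters \t; anything else goes through Regex.Escape,
    // so metacharacters such as '|' or '.' are matched literally.
    public static string CsvCleanChar(char c) =>
        c == '\t' ? "\\t" : Regex.Escape(new string(c, 1));
}
```

With this, `CsvCleanChar('|')` yields `\|`, so `Regex.Matches("a|b|c", CsvCleanChar('|'))` finds exactly the two literal pipes, while `CsvCleanChar(',')` leaves the comma unchanged.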
2 changes: 2 additions & 0 deletions JsonToolsNppPlugin/JsonToolsNppPlugin.csproj
@@ -116,6 +116,7 @@
<Compile Include="Forms\TreeViewer.Designer.cs">
<DependentUpon>TreeViewer.cs</DependentUpon>
</Compile>
<Compile Include="JSONTools\CsvSniffer.cs" />
<Compile Include="JSONTools\Dson.cs" />
<Compile Include="PluginInfrastructure\ClikeStringArray.cs" />
<Compile Include="PluginInfrastructure\DllExport\DllExportAttribute.cs" />
@@ -206,6 +207,7 @@
<DependentUpon>Resources.resx</DependentUpon>
</Compile>
<Compile Include="Tests\Benchmarker.cs" />
<Compile Include="Tests\CsvSnifferTests.cs" />
<Compile Include="Tests\IniFileParserTests.cs" />
<Compile Include="Tests\JsonGrepperTests.cs" />
<Compile Include="Tests\JsonParserTests.cs" />
4 changes: 2 additions & 2 deletions JsonToolsNppPlugin/Properties/AssemblyInfo.cs
@@ -28,5 +28,5 @@
// Build Number
// Revision
//
- [assembly: AssemblyVersion("6.1.1.19")]
- [assembly: AssemblyFileVersion("6.1.1.19")]
[assembly: AssemblyVersion("6.1.1.20")]
[assembly: AssemblyFileVersion("6.1.1.20")]