Initial commit of library, unit tests, and pom file.
0 parents
commit 14df58f
Showing 9 changed files with 13,590 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
<?xml version="1.0" encoding="UTF-8"?> | ||
<classpath> | ||
<classpathentry kind="src" path="src/main/java"/> | ||
<classpathentry kind="src" path="src/test/java"/> | ||
<classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER/org.eclipse.jdt.internal.debug.ui.launcher.StandardVMType/java-11-openjdk-amd64"> | ||
<attributes> | ||
<attribute name="module" value="true"/> | ||
</attributes> | ||
</classpathentry> | ||
<classpathentry kind="con" path="org.eclipse.jdt.junit.JUNIT_CONTAINER/5"/> | ||
<classpathentry kind="output" path="bin"/> | ||
</classpath> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
<?xml version="1.0" encoding="UTF-8"?> | ||
<projectDescription> | ||
<name>PublicSuffixList</name> | ||
<comment></comment> | ||
<projects> | ||
</projects> | ||
<buildSpec> | ||
<buildCommand> | ||
<name>org.eclipse.jdt.core.javabuilder</name> | ||
<arguments> | ||
</arguments> | ||
</buildCommand> | ||
</buildSpec> | ||
<natures> | ||
<nature>org.eclipse.jdt.core.javanature</nature> | ||
</natures> | ||
</projectDescription> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
eclipse.preferences.version=1 | ||
org.eclipse.jdt.core.compiler.codegen.inlineJsrBytecode=enabled | ||
org.eclipse.jdt.core.compiler.codegen.methodParameters=do not generate | ||
org.eclipse.jdt.core.compiler.codegen.targetPlatform=9 | ||
org.eclipse.jdt.core.compiler.codegen.unusedLocal=preserve | ||
org.eclipse.jdt.core.compiler.compliance=9 | ||
org.eclipse.jdt.core.compiler.debug.lineNumber=generate | ||
org.eclipse.jdt.core.compiler.debug.localVariable=generate | ||
org.eclipse.jdt.core.compiler.debug.sourceFile=generate | ||
org.eclipse.jdt.core.compiler.problem.assertIdentifier=error | ||
org.eclipse.jdt.core.compiler.problem.enumIdentifier=error | ||
org.eclipse.jdt.core.compiler.source=9 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
Public Suffix List Helper | ||
------------------------- | ||
|
||
##### Terminology used in this page: | ||
|
||
* TLD - Top Level Domain | ||
* SLD - Second Level Domain | ||
* eTLD - effective Top Level Domain | ||
* ccSLD - country-code Second Level Domain | ||
|
||
The Public Suffix List is a register of domain suffixes that are used in | ||
combination to provide effective Top Level Domains for certain actual Top | ||
Level Domains. | ||
|
||
For example, the TLD .uk has a number of second-level domains that are for most | ||
purposes considered part of the TLD. Therefore, .co.uk, .ac.uk, .org.uk can all | ||
be thought of as eTLDs, each made up of a ccSLD and an actual TLD. | ||
|
||
``` | ||
Host  SLD      ccSLD  TLD | ||
------------------------------- | ||
www   example         com | ||
www   example  co     uk | ||
``` | ||
|
||
Above, the eTLDs would be '.com' and '.co.uk'. | ||
|
||
In the `net.susa.cfs.psl` package, I consider the part to the left of the eTLD to | ||
be the SLD. Everything to the left of the SLD is taken as the host. | ||
|
||
To achieve this, we load the Public Suffix List into a tree structure. The | ||
first tree level contains the actual TLDs, each of which contains a list of | ||
its ccSLDs, each of which can contain further ccSLDs, and so on. | ||
|
||
``` | ||
              root | ||
                | | ||
     ----------------------- | ||
     |                     | | ||
     uk                    au | ||
     |                     | | ||
----------        ------------------- | ||
|    |   |        |                 | | ||
co   ac  org      edu               gov | ||
                  |                 | | ||
            -------------        ------- | ||
            |     |     |        |     | | ||
            vic   wa    tas      vic   wa | ||
``` | ||
|
||
Using a tree allows us to store and search the list of ~9000 entries quickly | ||
when processing large numbers of URLs. | ||
|
||
The API for looking up the eTLD of a domain name is: | ||
|
||
    String getETLD(String fqdn) | ||
|
||
Returns the substring that should be considered the eTLD. If the domain does not | ||
match any entry from the Public Suffix List, then the first part of the domain | ||
is returned. If the string cannot be parsed as a valid domain, the function | ||
returns the empty string. | ||
|
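The tree lookup that the README describes can be sketched roughly as follows. This is a minimal illustration, not the code in the `net.susa.cfs.psl` package: the class name `SuffixTreeSketch`, the `Node` type and the `addSuffix` method are hypothetical. It assumes that "the first part of the domain" in the fallback case means the rightmost label (the first tree level), and it uses only a very simple validity check. It also ignores the wildcard (`*.`) and exception (`!`) rules that the real Public Suffix List contains.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of a suffix tree keyed by domain labels, read right to left. */
final class SuffixTreeSketch {

    /** One label in the tree, e.g. "uk" at the first level or "co" beneath it. */
    private static final class Node {
        final Map<String, Node> children = new HashMap<>();
        boolean isSuffix; // true when the path from the root to this node is a listed suffix
    }

    private final Node root = new Node();

    /** Adds a plain suffix such as "co.uk"; wildcard and exception rules are not handled. */
    void addSuffix(String suffix) {
        String[] labels = suffix.toLowerCase(Locale.ROOT).split("\\.");
        Node node = root;
        // Walk from the TLD (rightmost label) towards the left, creating nodes as needed.
        for (int i = labels.length - 1; i >= 0; i--) {
            node = node.children.computeIfAbsent(labels[i], k -> new Node());
        }
        node.isSuffix = true;
    }

    /**
     * Returns the longest listed suffix of fqdn. Falls back to the rightmost label
     * when nothing matches (an assumption about the README's wording), and returns
     * "" for input that fails the simple validity check below.
     */
    String getETLD(String fqdn) {
        if (fqdn == null || fqdn.isEmpty() || fqdn.startsWith(".")
                || fqdn.endsWith(".") || fqdn.contains("..")) {
            return "";
        }
        String[] labels = fqdn.toLowerCase(Locale.ROOT).split("\\.");
        Node node = root;
        int matchStart = labels.length - 1; // fallback: the rightmost label only
        for (int i = labels.length - 1; i >= 0; i--) {
            node = node.children.get(labels[i]);
            if (node == null) {
                break; // no deeper entry in the tree
            }
            if (node.isSuffix) {
                matchStart = i; // remember the longest match seen so far
            }
        }
        return String.join(".", Arrays.copyOfRange(labels, matchStart, labels.length));
    }
}
```

With `com`, `uk` and `co.uk` loaded through `addSuffix`, `getETLD("www.example.co.uk")` would yield `co.uk` and `getETLD("www.example.com")` would yield `com`. Each lookup visits at most one node per label, so its cost depends on the depth of the domain name and not on the number of entries in the list.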
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" | ||
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"> | ||
|
||
<modelVersion>4.0.0</modelVersion> | ||
|
||
<groupId>net.susa.cfs.psl</groupId> | ||
<artifactId>PublicSuffixList</artifactId> | ||
|
||
<packaging>jar</packaging> | ||
|
||
<version>1.0-SNAPSHOT</version> | ||
|
||
<name>PublicSuffixList</name> | ||
<url>http://maven.apache.org</url> | ||
|
||
<properties> | ||
<maven.compiler.source>1.9</maven.compiler.source> | ||
<maven.compiler.target>1.9</maven.compiler.target> | ||
</properties> | ||
|
||
<dependencies> | ||
<dependency> | ||
<groupId>org.junit.jupiter</groupId> | ||
<artifactId>junit-jupiter-api</artifactId> | ||
<version>5.3.2</version> | ||
<scope>test</scope> | ||
</dependency> | ||
<dependency> | ||
<groupId>org.junit.jupiter</groupId> | ||
<artifactId>junit-jupiter-engine</artifactId> | ||
<version>5.3.2</version> | ||
<scope>test</scope> | ||
</dependency> | ||
</dependencies> | ||
|
||
<build> | ||
<plugins> | ||
<plugin> | ||
<groupId>org.apache.maven.plugins</groupId> | ||
<artifactId>maven-compiler-plugin</artifactId> | ||
<version>3.8.0</version> | ||
</plugin> | ||
<plugin> | ||
<groupId>org.apache.maven.plugins</groupId> | ||
<artifactId>maven-surefire-plugin</artifactId> | ||
<version>2.22.0</version> | ||
<configuration> | ||
<argLine> | ||
--illegal-access=permit | ||
</argLine> | ||
</configuration> | ||
</plugin> | ||
<plugin> | ||
<groupId>org.apache.maven.plugins</groupId> | ||
<artifactId>maven-failsafe-plugin</artifactId> | ||
<version>2.22.0</version> | ||
<configuration> | ||
<argLine> | ||
--illegal-access=permit | ||
</argLine> | ||
</configuration> | ||
</plugin> | ||
</plugins> | ||
</build> | ||
</project> |