How to parse word combinations?

Hi, hope all is well.

I’m trying to analyze domain names. Does anyone know how to parse a name with 2 or 3 English dictionary words using Exploratory?

For example,

  1. AirTable = Air, Table
  2. SeoNinja = Seo, Ninja
  3. BigFatDollar = Big, Fat, Dollar

Appreciate your feedback, thanks.


I am not sure the names can be split perfectly into ‘dictionary words’, but if the strings follow a specific pattern like the one in your example, I think it can be done to some extent.

In your example, each word starts with an uppercase letter. If that pattern holds, you can insert a comma and a space before every uppercase letter (except the first one) with the following regular expression.

  • col option > Work With Text Data > Replace > Text(All)
From: (?<!^)(?=[A-Z])
To: , 
※ Note: there is a space after the comma.
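
For anyone who prefers to try the same pattern in an R script step, here is a minimal sketch in base R. It assumes the names are CamelCase as in the question; the variable names are only for illustration.

```r
# Domain-style names written in CamelCase (taken from the question).
names_camel <- c("AirTable", "SeoNinja", "BigFatDollar")

# Insert ", " before every uppercase letter except one at the very start.
# perl = TRUE is required because the pattern uses lookarounds.
split_names <- gsub("(?<!^)(?=[A-Z])", ", ", names_camel, perl = TRUE)
print(split_names)

# If separate words are preferred over one delimited string,
# split on the delimiter that was just inserted.
words <- strsplit(split_names, ", ", fixed = TRUE)
print(words)
```

This only works when capitalization marks the word boundaries; an all-lowercase name is returned unchanged.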

I hope this helps.

I will give it a try, many thanks.

There is no specific pattern other than the extension (.com), please see a sample below.


If there is no specific pattern, it is difficult to split the string with a rule-based condition such as a regular expression.

In this situation, I think it would be better to use a string search algorithm (Knuth-Morris-Pratt, Boyer-Moore, and many others): you supply the keywords you want to look for, and it tells you whether a string contains them.
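
As a rough illustration of the "does the string contain the keyword" idea, a plain fixed-string check in base R is enough for a small word list. This is not a KMP or BM implementation, and the short word list is a hypothetical stand-in for a real dictionary.

```r
# All-lowercase names with no capitalization to split on.
target_text <- c("bizcashadvance", "moneymarketplace", "teststring")

# Hypothetical mini-dictionary; a real one would hold far more words.
target_words <- c("money", "market", "cash", "apple")

# One row per name, one column per keyword:
# TRUE where the keyword occurs somewhere in the name.
contains <- sapply(target_words, function(w) grepl(w, target_text, fixed = TRUE))
rownames(contains) <- target_text
print(contains)
```

With a large dictionary this checks every keyword against every name in turn, which is why a dedicated multi-keyword algorithm becomes attractive.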

As far as I know, Exploratory doesn’t provide such a feature by default, so you need to either write the algorithm from scratch or write an R script by combining R packages.

After some research (I’m sure there are many more if you look for them), I found an R package that implements the Aho-Corasick algorithm. By combining packages like this, you can write a function that outputs the search results in a format that is acceptable to Exploratory.

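# AhoCorasickSearch() is provided by the AhoCorasickTrie package on CRAN.
library(AhoCorasickTrie)

# Strings to search (domain names without the extension).
target_text <- c("bizcashadvance", "moneymarketplace", "apartmentlocators", "teststring")

# Normally, you would not specify the words by hand, but use dictionary data where a huge number of words are stored.
target_words <- c("money", "market", "place", "cash", "advance", "apple", "apartment")

# For each string, returns the keywords found and the offset where each one starts.
AhoCorasickSearch(keywords = target_words, text = target_text)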

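Result (keyword found and its 1-based offset in each string; "teststring" has no match):

bizcashadvance: "cash" at 4, "advance" at 8
moneymarketplace: "money" at 1, "market" at 6, "place" at 12
apartmentlocators: "apartment" at 1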
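
If installing an extra package is not an option, a similar keyword/offset table can be sketched in base R. To be clear, this is a brute-force stand-in and not the Aho-Corasick algorithm, and find_keywords is a hypothetical helper name. It returns a plain data frame, which I would expect to be easier to work with in a data wrangling step than a nested list.

```r
# Hypothetical helper: one row per (text, keyword, offset) hit.
find_keywords <- function(text, keywords) {
  hits <- lapply(text, function(txt) {
    rows <- lapply(keywords, function(kw) {
      pos <- gregexpr(kw, txt, fixed = TRUE)[[1]]
      pos <- as.integer(pos[pos > 0])  # gregexpr() returns -1 when there is no match
      if (length(pos) == 0) return(NULL)
      data.frame(text = txt, keyword = kw, offset = pos, stringsAsFactors = FALSE)
    })
    do.call(rbind, rows)
  })
  out <- do.call(rbind, hits)
  if (is.null(out)) {
    # No keyword matched anywhere: return an empty table with the same columns.
    return(data.frame(text = character(0), keyword = character(0),
                      offset = integer(0), stringsAsFactors = FALSE))
  }
  # Keep the input order of the strings, then sort the hits by position.
  out <- out[order(match(out$text, text), out$offset), ]
  rownames(out) <- NULL
  out
}

target_text <- c("bizcashadvance", "moneymarketplace", "apartmentlocators", "teststring")
target_words <- c("money", "market", "place", "cash", "advance", "apple", "apartment")

res <- find_keywords(target_text, target_words)
print(res)
```

Parts of a name that are not in the word list (for example "biz" or "locators") simply do not appear, so the quality of the split depends on how complete the dictionary is.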


I hope this helps.

Thanks, Sugiaki. This is way too difficult for me, but I truly appreciate your feedback.
