If there is a specific pattern in the string, as in the example you showed, I am not sure it can be split completely into ‘dictionary words’, but I think it can be done to some extent.
In the example you showed, a comma and a space go before each uppercase letter, so if this pattern is present, the string can be split with the following regular expression replacement. The pattern matches the empty position before every uppercase letter except at the very start of the string.
col option > Work With Text Data > Replace > Text(All)
From: (?<!^)(?=[A-Z])
To: ,
※ Note: there is a space after the comma in the To value.
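For reference, here is a minimal sketch of the same replacement in plain R, in case you want to try it outside the menu. The input string "BizCashAdvance" is only my assumption of what a CamelCase value might look like, and I am using base R's gsub() with perl = TRUE here, which may not be exactly what Exploratory runs internally.

```r
# Hypothetical CamelCase input, used only for illustration.
x <- "BizCashAdvance"

# (?<!^)    : not at the very start of the string
# (?=[A-Z]) : the next character is an uppercase letter
# Both parts are zero-width, so gsub() inserts ", " without consuming any text.
result <- gsub("(?<!^)(?=[A-Z])", ", ", x, perl = TRUE)
print(result)
```

With this input the result should be "Biz, Cash, Advance". Note that a run of consecutive uppercase letters would get a separator before each letter.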
There is no specific pattern other than the extension (.com), please see a sample below.
If there is no specific pattern, then it is difficult to split the string by rule.
In this situation, I think it would be better to use a string search algorithm (KMP, BM, …many), which lets you supply the keywords you want to search for and find out whether the string contains them.
As far as I know, Exploratory doesn’t provide such a feature by default, so you need to either write the algorithm from scratch or write an R script by combining R packages.
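To make the idea concrete, here is a naive baseline in base R that just checks each keyword against one string. This is not KMP or BM, only a simple scan with grepl(), and find_keywords is a hypothetical helper name I made up for this sketch. It would be slow with a large dictionary, which is why a dedicated algorithm is the better choice.

```r
# Hypothetical helper: return the keywords that occur somewhere in `text`.
find_keywords <- function(text, keywords) {
  # fixed = TRUE treats each keyword as a literal string, not a regex
  hits <- vapply(keywords, function(k) grepl(k, text, fixed = TRUE), logical(1))
  keywords[hits]
}

found <- find_keywords("moneymarketplace", c("money", "market", "place", "cash"))
print(found)
```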
After some research (I’m sure there are many more if you look for them), I found an R package, AhoCorasickTrie, that implements the Aho-Corasick algorithm. By combining packages like this one, you can write a function that outputs the search results in a format that Exploratory accepts.
library(AhoCorasickTrie)
target_text <- c("bizcashadvance", "moneymarketplace", "apartmentlocators", "teststring")
# In practice you would not list the words by hand; you would load them
# from dictionary data that stores a large number of words.
target_words <- c("money", "market", "place", "cash", "advance", "apple", "apartment")
AhoCorasickSearch(keywords = target_words, text = target_text)
[[1]]
[[1]][[1]]
[[1]][[1]]$Keyword
[1] "cash"
[[1]][[1]]$Offset
[1] 4
[[1]][[2]]
[[1]][[2]]$Keyword
[1] "advance"
[[1]][[2]]$Offset
[1] 8
[[2]]
[[2]][[1]]
[[2]][[1]]$Keyword
[1] "money"
[[2]][[1]]$Offset
[1] 1
[[2]][[2]]
[[2]][[2]]$Keyword
[1] "market"
[[2]][[2]]$Offset
[1] 6
[[2]][[3]]
[[2]][[3]]$Keyword
[1] "place"
[[2]][[3]]$Offset
[1] 12
[[3]]
[[3]][[1]]
[[3]][[1]]$Keyword
[1] "apartment"
[[3]][[1]]$Offset
[1] 1
[[4]]
list()
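As a rough sketch of the "function that outputs the search results" part, the nested list above could be collapsed into one comma-separated string per input, which fits into a single column. collapse_matches is a hypothetical helper name, and the search_result below is hand-built to mirror the first and fourth results shown above so that the sketch runs without the package. In a real script you would pass the return value of AhoCorasickSearch() instead.

```r
# Hypothetical helper: turn one element of the search result
# (a list of list(Keyword = ..., Offset = ...)) into "word1, word2",
# ordered by where each keyword starts in the string.
collapse_matches <- function(matches) {
  if (length(matches) == 0) return(NA_character_)  # no keyword found
  keywords <- vapply(matches, function(m) m$Keyword, character(1))
  offsets  <- vapply(matches, function(m) as.numeric(m$Offset), numeric(1))
  paste(keywords[order(offsets)], collapse = ", ")
}

# Hand-built stand-in for the results of "bizcashadvance" and "teststring".
search_result <- list(
  list(list(Keyword = "cash", Offset = 4),
       list(Keyword = "advance", Offset = 8)),
  list()
)

split_words <- vapply(search_result, collapse_matches, character(1))
print(split_words)
```

Inputs with no match come back as NA, so they are easy to filter afterwards. Parts of the string that are not in the keyword list (like "biz" here) are simply dropped, so the quality depends on the dictionary you give it.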