Group similar rows of Primary Key Column?

Hi there,

I am dealing with unstructured data. I am particularly interested in clubbing values of 2-3 rows of a primary key column into one. For instance,

Column name: Industry
Column Values:

Phamaceuticals, Biotechnology
Pharmaceuticals & Biotechnology
Pharmaceuticals, Biotechnology

I wish to club these 3 rows into 1 and aggregate the numerical values, if any, for each related column. Is there any way to do it in Exploratory?


Sounds like you want to get the ‘industry’ column cleaned up first, then ‘group by’ the ‘industry’ column, then ‘summarize’.

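To clean up the column, you can use one of the str_xxx functions with ‘Create Function (Mutate)’. For example, you can use the ‘str_sub’ function as below, which keeps only the first 15 characters of each value (‘Pharmaceuticals’ for the correctly spelled rows).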

str_sub(Industry, 1, 15)
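
Putting the three steps (clean up, group by, summarize) together in plain R, a minimal sketch might look like the one below. It assumes the dplyr and stringr packages. The Revenue column and its numbers are made up for illustration. The str_replace step is my own addition for the misspelled ‘Phamaceuticals’ row, since str_sub alone would turn it into ‘Phamaceuticals,’ and leave it in a separate group.

```r
library(dplyr)
library(stringr)

# Hypothetical sample data: the Industry values come from the question,
# the Revenue column and its numbers are invented for this example.
df <- data.frame(
  Industry = c("Phamaceuticals, Biotechnology",
               "Pharmaceuticals & Biotechnology",
               "Pharmaceuticals, Biotechnology"),
  Revenue = c(10, 20, 30),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Assumed extra step: repair the misspelling before truncating.
  mutate(Industry = str_replace(Industry, "^Phama", "Pharma")) %>%
  # Keep the first 15 characters, i.e. "Pharmaceuticals".
  mutate(Industry = str_sub(Industry, 1, 15)) %>%
  # Collapse the now-identical values into one group and add up the numbers.
  group_by(Industry) %>%
  summarize(Revenue = sum(Revenue))

print(result)
```

With this sample data the result is a single ‘Pharmaceuticals’ row with Revenue 60. In Exploratory itself the same steps would be a Mutate step followed by Group By and Summarize steps. If the real data has more spelling variants, a different cleanup function may fit better than a fixed-length str_sub.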