Dictionary Encoding Java

prashant kumar
2 min readSep 18, 2020

Dictionary encoding is a loss-less compression technique to compress a list of words. Dictionary encoding works by maintaining a mapping of unique words to their compressed value, and replacing occurrence of the words with the compressed value

Here is an example:

Input String:
“USA,USA,USA,USA,Mexico,Canada,Mexico,Mexico,Mexico,Argentina”

Potential Encoder Output String:
“USA,Mexico,Canada,Argentina:0,0,0,0,1,2,1,1,1,3”

The code is assigned to the country in the order they appear in the input, starting from 0 and so on

decode(encode(input)) = input

Solution:

class Solution
{
public static String encode(final String input) {

// Split the input string to assist encoding
String[] encoded = input.split(",");
int n = encoded.length;
// Map to store the country and it's encoding
Map<String, Integer> encoding = new HashMap<>();
// Map for unique countries to provide a reverse lookup for encoding
Map<Integer, String> uniqueCountries = new HashMap<>();
int counter = 0;
// create a map for country and it's code
for( int i=0; i<n; i++) {
if(encoding.containsKey(encoded[i])) {
continue;
}

encoding.put(encoded[i], counter);
uniqueCountries.put(counter, encoded[i]);
counter++;
}
StringBuilder sb = new StringBuilder();int i=0, j=0;// encode the map using the reverse look up of the country code
// You can also use a map that maintains the input order instead of
// reverse lookup
for(Map.Entry<Integer, String> x: uniqueCountries.entrySet()) {
sb.append(uniqueCountries.get(j++));
// to ensure a ',' is not appended at the end
if(j != uniqueCountries.size()) {
sb.append(",");
}
}
sb.append(":");

int z = sb.length();
while( i < n ) {
sb.append(encoding.get(encoded[i++]));
// to ensure a ',' is not appended at the end
if(i!=n) {
sb.append(",");
}
}return sb.toString();
}
public static String decode(final String input) {
// Split the encoded string by country and it's code
String[] countryCode = input.split(":");
String[] countries = countryCode[0].split(",");
String[] code = countryCode[1].split(",");
StringBuilder sb = new StringBuilder();int n = code.length;// decode the String using the respective country provided
for(int i=0; i<n; i++) {
sb.append(countries[Integer.parseInt(code[i])]);
if(i != n-1) {
sb.append(",");
}
}
return sb.toString();
}
public static void main(String[] args) {
String input = "USA,USA,USA,USA,Mexico,Canada,Mexico,Mexico,Mexico,Argentina";
System.out.println(encode(input));
System.out.println(decode(encode(input)));
}
}

Output:

USA,Mexico,Canada,Argentina:0,0,0,0,1,2,1,1,1,3
USA,USA,USA,USA,Mexico,Canada,Mexico,Mexico,Mexico,Argentina

decode(encode(input)) = input string.

--

--