Dictionary Encoding Java
Dictionary encoding is a loss-less compression technique to compress a list of words. Dictionary encoding works by maintaining a mapping of unique words to their compressed value, and replacing occurrence of the words with the compressed value
Here is an example:
Input String:
“USA,USA,USA,USA,Mexico,Canada,Mexico,Mexico,Mexico,Argentina”
Potential Encoder Output String:
“USA,Mexico,Canada,Argentina:0,0,0,0,1,2,1,1,1,3”
The code is assigned to the country in the order they appear in the input, starting from 0 and so on
decode(encode(input)) = input
Solution:
class Solution
{
public static String encode(final String input) {
// Split the input string to assist encoding
String[] encoded = input.split(",");
int n = encoded.length;// Map to store the country and it's encoding
Map<String, Integer> encoding = new HashMap<>();
// Map for unique countries to provide a reverse lookup for encoding
Map<Integer, String> uniqueCountries = new HashMap<>();
int counter = 0;// create a map for country and it's code
for( int i=0; i<n; i++) {
if(encoding.containsKey(encoded[i])) {
continue;
}
encoding.put(encoded[i], counter);
uniqueCountries.put(counter, encoded[i]);
counter++;
}StringBuilder sb = new StringBuilder();int i=0, j=0;// encode the map using the reverse look up of the country code
// You can also use a map that maintains the input order instead of
// reverse lookup
for(Map.Entry<Integer, String> x: uniqueCountries.entrySet()) {
sb.append(uniqueCountries.get(j++));// to ensure a ',' is not appended at the end
if(j != uniqueCountries.size()) {
sb.append(",");
}
}sb.append(":");
int z = sb.length();
while( i < n ) {
sb.append(encoding.get(encoded[i++]));// to ensure a ',' is not appended at the end
if(i!=n) {
sb.append(",");
}}return sb.toString();
}public static String decode(final String input) {
// Split the encoded string by country and it's code
String[] countryCode = input.split(":");
String[] countries = countryCode[0].split(",");
String[] code = countryCode[1].split(",");StringBuilder sb = new StringBuilder();int n = code.length;// decode the String using the respective country provided
for(int i=0; i<n; i++) {
sb.append(countries[Integer.parseInt(code[i])]);if(i != n-1) {
sb.append(",");
}
}return sb.toString();
}public static void main(String[] args) {
String input = "USA,USA,USA,USA,Mexico,Canada,Mexico,Mexico,Mexico,Argentina";System.out.println(encode(input));
System.out.println(decode(encode(input)));
}
}
Output:
USA,Mexico,Canada,Argentina:0,0,0,0,1,2,1,1,1,3
USA,USA,USA,USA,Mexico,Canada,Mexico,Mexico,Mexico,Argentina
decode(encode(input)) = input string.