Skip to Content

Alien Dictionary

There is a new alien language that uses the English alphabet. However, the order among the letters is unknown to you.

You are given a list of strings words from the alien language's dictionary, where the strings in words are sorted lexicographically by the rules of this new language.

Return a string of the unique letters in the new alien language sorted in lexicographically increasing order by the new language's rules. If there is no solution, return "". If there are multiple solutions, return any of them.

from collections import defaultdict, deque
from typing import Dict, List, Set

class Solution:
    def alienOrder(self, words: List[str]) -> str:
        graph = defaultdict(set)
        inDegree = defaultdict(int)

        self._buildGraph(graph, words, inDegree)
        return self._topology(graph, inDegree)

    def _buildGraph(self, graph: Dict[str, Set[str]], words: List[str], inDegree: Dict[str, int]) -> None:
        # Create a node for each character in each word
        for word in words:
            for c in word:
                inDegree[c] = 0  # necessary for final char counting

        for first, second in zip(words, words[1:]): # or pairwise(words)
            length = min(len(first), len(second))
            for j in range(length):
                u = first[j]
                v = second[j]
                if u != v:
                    if v not in graph[u]:
                        graph[u].add(v)
                        inDegree[v] += 1
                    break  # Later characters' order is meaningless
                if j == length - 1 and len(first) > len(second):
                    # If 'ab' comes before 'a', it's an invalid order
                    graph.clear()
                    return

    def _topology(self, graph: Dict[str, Set[str]], inDegree: Dict[str, int]) -> str:
        result = ''
        q = deque([c for c in inDegree if inDegree[c] == 0])

        while q:
            u = q.popleft()
            result += u
            for v in graph[u]:
                inDegree[v] -= 1
                if inDegree[v] == 0:
                    q.append(v)

        # If there are remaining characters in inDegree, it means there's a cycle
        if any(inDegree.values()):
            return ''

        # Words = ['z', 'x', 'y', 'x']
        return result if len(result) == len(indegree) else ''

############

import collections


class Node(object):
  def __init__(self, val):
    self.val = val
    self.neighbors = []

  def connect(self, node):
    self.neighbors.append(node)

  def getNbrs(self):
    return self.neighbors


class Solution(object):
  def alienOrder(self, words):
    """
    :type words: List[str]
    :rtype: str
    """

    def dfs(root, graph, visited):
      visited[root] = 1
      for nbr in graph[root].getNbrs():
        if visited[nbr.val] == 0:
          if not dfs(nbr.val, graph, visited):
            return False
        elif visited[nbr.val] == 1:
          return False

      visited[root] = 2
      self.ans += root
      return True

    self.ans = ""
    graph = {}
    visited = collections.defaultdict(int)
    self.topNum = 0
    for i in range(0, len(words) - 1):
      a = words[i]
      b = words[i + 1]
      i = 0
      while i < len(a) and i < len(b):
        if a[i] != b[i]:
          nodeA = nodeB = None
          if a[i] not in graph:
            nodeA = Node(a[i])
            graph[a[i]] = nodeA
          else:
            nodeA = graph[a[i]]
          if b[i] not in graph:
            nodeB = Node(b[i])
            graph[b[i]] = nodeB
          else:
            nodeB = graph[b[i]]
          nodeA.connect(nodeB)
          break
        i += 1
      if i < len(a) and i >= len(b):
        return ""

    for c in graph:
      if visited[c] == 0:
        if not dfs(c, graph, visited):
          return ""

    unUsedSet = set()
    for word in words:
      for c in word:
        unUsedSet.add(c)

    for c in unUsedSet:
      if c not in graph:
        self.ans += c
    return self.ans[::-1]

The easiest approach to the Alien Dictionary problem is to process the input into a list of known orders of characters. This is similar to the given inputs for the course schdedule problem where the inputs are a list of prerequisites [course1, course2] where a student must take course1 before course2. Once the data is in this form, the same topological sort approaches can be used to find the answer. It is also common to skip creating the list of inputs and simply build the adjacency list from the inputs. This is done by looking at two adjacent words in the input word list. Iterating the words one character at a time, ie looking at word1[3] and word2[3] at the same time, the first spot where they are not equal gives us one piece of information for the adjacency graph. For example abxcd and abgzz tells us that the character 'x' is lower in lexicographical order than 'g', any characters after this index tells us nothing.

What is Topological Sort?

Topological sort is a linear ordering of vertices in a directed acyclic graph (DAG) where for every directed edge u→v, vertex u comes before v. This technique is often used in scenarios where you need to schedule tasks, resolve symbol dependencies in linkers, or, as in our case, determine the correct order of letters in an alien language.

To solve the Alien Dictionary problem using topological sort, follow these steps:

Extract Character Relationships: Compare adjacent words to determine the order of characters. This step involves iterating through the list of words and identifying the first differing character between each pair of adjacent words.

Build the Graph: Represent the characters as nodes and the identified precedence rules as directed edges.

Perform Topological Sort: Apply topological sorting on the graph. If the graph contains a cycle, it means there is no valid ordering, and you should return an empty string. Otherwise, the topological sort will give you the characters in the correct order according to the alien language.

Example Consider the words list: ["wrt", "wrf", "er", "ett", "rftt"].

Extract Relationships:

"wrt" vs. "wrf": 't' comes before 'f' "wrf" vs. "er": 'w' comes before 'e' "er" vs. "ett": 'r' comes before 't' "ett" vs. "rftt": 'e' comes before 'r' Build the Graph:

Nodes: {w, r, t, f, e} Edges: {w -> e, e -> r, r -> t, t -> f} Resulting order: "wertf" (one possible valid ordering)

Posted by Jamie Meyer 8 months ago

Related Problems

Given an array of distinct integers candidates and a target integer target, return a list of all unique combinations of candidates where the chosen numbers sum to target. You may return the combinations in any order.

The same number may be chosen from candidates an unlimited number of times. Two combinations are unique if the frequency of at least one of the chosen numbers is different.

There are a total of numCourses courses you have to take, labeled from 0 to numCourses - 1. You are given an array prerequisites where prerequisites[i] = [ai, bi] indicates that you must take course bi first if you want to take course ai.

For example, the pair [0, 1], indicates that to take course 0 you have to first take course 1.

Return true if you can finish all courses. Otherwise, return false.

Given an array nums of distinct integers, return all the possible permutations. You can return the answer in any order.

Given a sorted array of distinct integers and a target value, return the index if the target is found. If not, return the index where it would be if it were inserted in order.

You must write an algorithm with O(log n) runtime complexity.