Google+ Followers

07 December 2011

Blanagrams

A blanagram is an anagram for another word except for the substitution of one letter. Think of Scrabble with a blank square (blank + anagram = blanagram). For example, phyla is a blanagram of glyph; replace the "g" with an "a" and find anagrams (see blog on Anagrams).

We will prototype with a simple word, glyph. We'll start by getting a list of the individual characters in glyph.

In[1]:= Characters["glyph"]
Out[1]= {"g", "l", "y", "p", "h"}


Start by replacing the first letter in glyph with an a and then finding all anagrams. The third argument to StringReplacePart is a list of beginning and ending positions for the replacement.

In[2]:= StringReplacePart["glyph", "a", {1, 1}]
Out[2]= "alyph"

In[3]:= Anagrams[%]
Out[3]= {"phyla", "haply"}

Now do the same for each character position in the word.

In[4]:= Map[StringReplacePart["glyph", "a", {#, #}] &, Range[StringLength["glyph"]]]
Out[4]= {"alyph", "gayph", "glaph", "glyah", "glypa"}

In[5]:= Flatten[Map[Anagrams, %]]
Out[5]= {"phyla", "haply"}

Having done this for the letter a, we now repeat for all other single characters.

In[6]:= CharacterRange["a", "z"]
Out[6]= {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"}

In[7]:= blana = Table[
  Map[StringReplacePart["glyph", ch, {#, #}] &,
   Range[StringLength["glyph"]]],
  {ch, CharacterRange["a", "z"]}]

Out[7]= {{"alyph", "gayph", "glaph", "glyah", "glypa"}, {"blyph", "gbyph", "glbph", "glybh", "glypb"}, {"clyph", "gcyph", "glcph", "glych", "glypc"}, {"dlyph", "gdyph", "gldph", "glydh", "glypd"}, {"elyph", "geyph", "gleph", "glyeh", "glype"}, {"flyph", "gfyph", "glfph", "glyfh", "glypf"}, {"glyph", "ggyph", "glgph", "glygh", "glypg"}, {"hlyph", "ghyph", "glhph", "glyhh", "glyph"}, {"ilyph", "giyph", "gliph", "glyih", "glypi"}, {"jlyph", "gjyph", "gljph", "glyjh", "glypj"}, {"klyph", "gkyph", "glkph", "glykh", "glypk"}, {"llyph", "glyph", "gllph", "glylh", "glypl"}, {"mlyph", "gmyph", "glmph", "glymh", "glypm"}, {"nlyph", "gnyph", "glnph", "glynh", "glypn"}, {"olyph", "goyph", "gloph", "glyoh", "glypo"},{"plyph", "gpyph", "glpph", "glyph", "glypp"}, {"qlyph", "gqyph", "glqph", "glyqh", "glypq"}, {"rlyph", "gryph", "glrph", "glyrh", "glypr"}, {"slyph", "gsyph", "glsph", "glysh", "glyps"}, {"tlyph", "gtyph", "gltph", "glyth", "glypt"}, {"ulyph", "guyph", "gluph", "glyuh", "glypu"}, {"vlyph", "gvyph", "glvph", "glyvh", "glypv"}, {"wlyph", "gwyph", "glwph", "glywh", "glypw"}, {"xlyph", "gxyph", "glxph", "glyxh", "glypx"}, {"ylyph", "gyyph", "glyph", "glyyh", "glypy"}, {"zlyph", "gzyph", "glzph", "glyzh", "glypz"}}

Because of the extra nesting of the output (Table[Map[...]]) we need to flatten the output at a deeper level. And delete duplicates.

In[8]:= Flatten[Map[Anagrams, blana, {2}]] // DeleteDuplicates
Out[8]= {"phyla", "haply", "glyph", "lymph", "sylph"}

Finally, put all the pieces together to create a reusable function, Blanagrams.

In[9]:= Blanagrams[word_String] := Module[{blana},
  blana = Table[
    Map[StringReplacePart[word, ch, {#, #}] &,  
      Range[StringLength[word]]],
    {ch, CharacterRange["a", "z"]}];
    DeleteDuplicates[Flatten[Map[Anagrams, blana, {2}]]]
  ]

This turns out to be fairly quick for small words, but it bogs down for larger words.

In[10]:= Blanagrams["glyph"] //Timing
Out[10]= {1.25209, {"phyla", "haply", "glyph", "lymph", "sylph"}}

In[11]:= Blanagrams["string"] //Timing
Out[11]= {8.85717, {"taring", "rating", "gratin", "arsing", "grains", "sating", "giants", "grants", "gratis", "strain", "trains", "brings", "grinds", "sering", "singer", "signer", "resign", "reigns", "ingres", "signet", "tinges", "ingest", "tigers", "insert", "inters", "inerts", "string", "things", "nights", "rights", "girths", "tiring", "siring", "rising", "siting", "tigris", "glints", "roting", "soring", "signor", "groins", "ingots", "strong", "intros", "spring", "sprint", "prints", "stings", "turing", "truing", "strung", "grunts", "wrings", "trying", "stying", "stingy"}}

Because we are doing similar computations for each of the letters of the alphabet, this is a good candidate for parallelizing. But which parts do we parallelize? One way to help determine where the computational bottlenecks are is to profile the code. Integrated development environments such as Wolfram Workbench have built-in profilers, but here we will create a simple set of profiling steps to determine where our bottlenecks are.

Here is a small auxiliary function that wraps AbsoluteTiming around an expression and adds a tag to make it easy to identify the various parts of what is reported.

In[12]:= timing[expr_, tag_] := Print[{NumberForm[First@AbsoluteTiming[expr], 10], tag}]
In[13]:= SetAttributes[timing, HoldAll];
In[14]:= word = "string";
timing[
  tmp = Table[Map[StringReplacePart[word, ch, {#, #}] &,   
  Range[StringLength[word]]],{ch, CharacterRange["a", "z"]}];,
   "table"
  ];

timing[
  Flatten[tmp2 = Map[Anagrams, tmp, {2}]];,
  "map Anagrams"
  ];

timing[
  DeleteDuplicates[Flatten[tmp2]];,
  "flatten and delete duplicates"
  ]
Out[15]= {0.000794,table}
Out[16]= {8.986847,map Anagrams}
Out[17]= {0.000050,flatten and delete duplicates}

Not surprisingly, creating the many possible letter combinations is very quick. The greatest part of this computation is spent with mapping Anagrams across the many word combinations. So we can simply try to parallelize that using ParallelMap.

In[17]:= BlanagramsParallel[word_String]:=
  Module[{blana},
    blana = Table[Map[StringReplacePart[word,ch{#,#}]&, 
         Range[StringLength[word]]],{ch,CharacterRange["a","z"]}];
    DeleteDuplicates@Flatten[ParallelMap[Anagrams,blana,{2}]]
  ]

Launch kernels on our current machine and compute.

In[18]:= LaunchKernels[]
Out[18]= {KernelObject[1,local],KernelObject[2,local]}

In[19]:= DistributeDefinitions[Anagrams]
Out[19]= {Anagrams,word}

In[20]:= BlanagramsParallel["strings"]//AbsoluteTiming
Out[20]= {37.792462,{ratings,gratins,staring,strains,resting,stinger,singers,signers,resigns,ingress,signets,ingests,tigress,inserts,strings,tirings,risings,sitings,sorting,storing,signors,tossing,springs,sprints,rusting,tryings,stringy}}

For comparison, here is the computation done serially on one kernel.

In[21]:= Blanagrams["strings"]//AbsoluteTiming
Out[21]= {47.756149,{ratings,gratins,staring,strains,resting,stinger,singers,signers,resigns,ingress,signets,ingests,tigress,inserts,strings,tirings,risings,sitings,sorting,storing,signors,tossing,springs,sprints,rusting,tryings,stringy}}

With the 2-kernel machine this was run on, we are getting a slight speedup. This particular computation parallelizes well and so on an 8-kernel machine, we would see substantially faster compute times.

In[22]:= First[%]/First[%%]
Out[22]= 1.2636422


Anagrams

Anagrams are words that have the same set of letters but in a different order. For example, "modern" and "normed" are anagrams; so are "algorithm" and "logarithm". Good Scrabble players are adept at anagram creation.

From a programmatic point of view, anagrams can be created by first extracting the characters in a word, permuting those characters, and then finding which permutations are real words.

Let's start by getting the characters in a word:

In[1]:= chars = Characters["tame"]
Out[1]= {t,a,m,e}

Permute the characters.

In[2]:= p = Permutations[chars]
Out[2]= {{t,a,m,e},{t,a,e,m},{t,m,a,e},{t,m,e,a},{t,e,a,m},{t,e,m,a},{a,t,m,e},{a,t,e,m},{a,m,t,e},{a,m,e,t},{a,e,t,m},{a,e,m,t},{m,t,a,e},{m,t,e,a},{m,a,t,e},{m,a,e,t},{m,e,t,a},{m,e,a,t},{e,t,a,m},{e,t,m,a},{e,a,t,m},{e,a,m,t},{e,m,t,a},{e,m,a,t}}

Concatenate the characters in each list:

In[3]:= words = Map[StringJoin, p]
Out[3]= {tame,taem,tmae,tmea,team,tema,atme,atem,amte,amet,aetm,aemt,mtae,mtea,mate,maet,meta,meat,etam,etma,eatm,eamt,emta,emat}

Note that OutputForm of strings in Mathematica omits the double-quotes.

In[4]:= "string"
Out[4]= string

You can still see them using FullForm or InputForm.

In[5]:= FullForm[%]
Out[5]//FullForm= "string"

Now, which of these "words" are really words? You can select those that are in the dictionary. Those elements in words that are not in the dictionary will return {} when run against DictionaryLookup, so we omit those using !=.

In[6]:= Select[words, DictionaryLookup[#, IgnoreCase -> True]!={}&]
Out[6]= {tame,team,mate,meta,meat}

Note: you could substitute any list of words for DictionaryLookup[...] here including languages other than English; e.g.,

In[7]:= DictionaryLookup[All]
Out[7]= {Arabic,BrazilianPortuguese,Breton,BritishEnglish,Catalan,Croatian,Danish,Dutch,English,Esperanto,Faroese,Finnish,French,Galician,German,Hebrew,Hindi,Hungarian,IrishGaelic,Italian,Latin,Polish,Portuguese,Russian,ScottishGaelic,Spanish,Swedish}

So putting all the pieces together, we have the function Anagrams.

In[8]:= Anagrams[word_String]:= 
Module[{chars = Characters[word],words},
  words = Map[StringJoin, Permutations[chars]];
  Select[words, DictionaryLookup[#, IgnoreCase -> True]!={}&]]

In[9]:= Anagrams["elvis"]
Out[9]= {elvis,evils,levis,lives,veils}

In[10]:= Anagrams["instance"]
Out[10]= {instance,ancients,canniest}

The exercises include an example of a more direct approach to this problem, one that avoids the creation of permutations of the characters in the word.

In the next blog post we will put Anagrams to work in creating blanagrams, words that differ by one letter.

-- Excerpted from Programming with Mathematica: An Introduction, Cambridge University Press, 2013.