Google+ Followers

07 December 2011

Blanagrams

A blanagram is an anagram for another word except for the substitution of one letter. Think of Scrabble with a blank square (blank + anagram = blanagram). For example, phyla is a blanagram of glyph; replace the "g" with an "a" and find anagrams (see blog on Anagrams).

We will prototype with a simple word, glyph. We'll start by getting a list of the individual characters in glyph.

In[1]:= Characters["glyph"]
Out[1]= {"g", "l", "y", "p", "h"}


Start by replacing the first letter in glyph with an a and then finding all anagrams. The third argument to StringReplacePart is a list of beginning and ending positions for the replacement.

In[2]:= StringReplacePart["glyph", "a", {1, 1}]
Out[2]= "alyph"

In[3]:= Anagrams[%]
Out[3]= {"phyla", "haply"}

Now do the same for each character position in the word.

In[4]:= Map[StringReplacePart["glyph", "a", {#, #}] &, Range[StringLength["glyph"]]]
Out[4]= {"alyph", "gayph", "glaph", "glyah", "glypa"}

In[5]:= Flatten[Map[Anagrams, %]]
Out[5]= {"phyla", "haply"}

Having done this for the letter a, we now repeat for all other single characters.

In[6]:= CharacterRange["a", "z"]
Out[6]= {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"}

In[7]:= blana = Table[
  Map[StringReplacePart["glyph", ch, {#, #}] &,
   Range[StringLength["glyph"]]],
  {ch, CharacterRange["a", "z"]}]

Out[7]= {{"alyph", "gayph", "glaph", "glyah", "glypa"}, {"blyph", "gbyph", "glbph", "glybh", "glypb"}, {"clyph", "gcyph", "glcph", "glych", "glypc"}, {"dlyph", "gdyph", "gldph", "glydh", "glypd"}, {"elyph", "geyph", "gleph", "glyeh", "glype"}, {"flyph", "gfyph", "glfph", "glyfh", "glypf"}, {"glyph", "ggyph", "glgph", "glygh", "glypg"}, {"hlyph", "ghyph", "glhph", "glyhh", "glyph"}, {"ilyph", "giyph", "gliph", "glyih", "glypi"}, {"jlyph", "gjyph", "gljph", "glyjh", "glypj"}, {"klyph", "gkyph", "glkph", "glykh", "glypk"}, {"llyph", "glyph", "gllph", "glylh", "glypl"}, {"mlyph", "gmyph", "glmph", "glymh", "glypm"}, {"nlyph", "gnyph", "glnph", "glynh", "glypn"}, {"olyph", "goyph", "gloph", "glyoh", "glypo"},{"plyph", "gpyph", "glpph", "glyph", "glypp"}, {"qlyph", "gqyph", "glqph", "glyqh", "glypq"}, {"rlyph", "gryph", "glrph", "glyrh", "glypr"}, {"slyph", "gsyph", "glsph", "glysh", "glyps"}, {"tlyph", "gtyph", "gltph", "glyth", "glypt"}, {"ulyph", "guyph", "gluph", "glyuh", "glypu"}, {"vlyph", "gvyph", "glvph", "glyvh", "glypv"}, {"wlyph", "gwyph", "glwph", "glywh", "glypw"}, {"xlyph", "gxyph", "glxph", "glyxh", "glypx"}, {"ylyph", "gyyph", "glyph", "glyyh", "glypy"}, {"zlyph", "gzyph", "glzph", "glyzh", "glypz"}}

Because of the extra nesting of the output (Table[Map[...]]) we need to flatten the output at a deeper level. And delete duplicates.

In[8]:= Flatten[Map[Anagrams, blana, {2}]] // DeleteDuplicates
Out[8]= {"phyla", "haply", "glyph", "lymph", "sylph"}

Finally, put all the pieces together to create a reusable function, Blanagrams.

In[9]:= Blanagrams[word_String] := Module[{blana},
  blana = Table[
    Map[StringReplacePart[word, ch, {#, #}] &,  
      Range[StringLength[word]]],
    {ch, CharacterRange["a", "z"]}];
    DeleteDuplicates[Flatten[Map[Anagrams, blana, {2}]]]
  ]

This turns out to be fairly quick for small words, but it bogs down for larger words.

In[10]:= Blanagrams["glyph"] //Timing
Out[10]= {1.25209, {"phyla", "haply", "glyph", "lymph", "sylph"}}

In[11]:= Blanagrams["string"] //Timing
Out[11]= {8.85717, {"taring", "rating", "gratin", "arsing", "grains", "sating", "giants", "grants", "gratis", "strain", "trains", "brings", "grinds", "sering", "singer", "signer", "resign", "reigns", "ingres", "signet", "tinges", "ingest", "tigers", "insert", "inters", "inerts", "string", "things", "nights", "rights", "girths", "tiring", "siring", "rising", "siting", "tigris", "glints", "roting", "soring", "signor", "groins", "ingots", "strong", "intros", "spring", "sprint", "prints", "stings", "turing", "truing", "strung", "grunts", "wrings", "trying", "stying", "stingy"}}

Because we are doing similar computations for each of the letters of the alphabet, this is a good candidate for parallelizing. But which parts do we parallelize? One way to help determine where the computational bottlenecks are is to profile the code. Integrated development environments such as Wolfram Workbench have built-in profilers, but here we will create a simple set of profiling steps to determine where our bottlenecks are.

Here is a small auxiliary function that wraps AbsoluteTiming around an expression and adds a tag to make it easy to identify the various parts of what is reported.

In[12]:= timing[expr_, tag_] := Print[{NumberForm[First@AbsoluteTiming[expr], 10], tag}]
In[13]:= SetAttributes[timing, HoldAll];
In[14]:= word = "string";
timing[
  tmp = Table[Map[StringReplacePart[word, ch, {#, #}] &,   
  Range[StringLength[word]]],{ch, CharacterRange["a", "z"]}];,
   "table"
  ];

timing[
  Flatten[tmp2 = Map[Anagrams, tmp, {2}]];,
  "map Anagrams"
  ];

timing[
  DeleteDuplicates[Flatten[tmp2]];,
  "flatten and delete duplicates"
  ]
Out[15]= {0.000794,table}
Out[16]= {8.986847,map Anagrams}
Out[17]= {0.000050,flatten and delete duplicates}

Not surprisingly, creating the many possible letter combinations is very quick. The greatest part of this computation is spent with mapping Anagrams across the many word combinations. So we can simply try to parallelize that using ParallelMap.

In[17]:= BlanagramsParallel[word_String]:=
  Module[{blana},
    blana = Table[Map[StringReplacePart[word,ch{#,#}]&, 
         Range[StringLength[word]]],{ch,CharacterRange["a","z"]}];
    DeleteDuplicates@Flatten[ParallelMap[Anagrams,blana,{2}]]
  ]

Launch kernels on our current machine and compute.

In[18]:= LaunchKernels[]
Out[18]= {KernelObject[1,local],KernelObject[2,local]}

In[19]:= DistributeDefinitions[Anagrams]
Out[19]= {Anagrams,word}

In[20]:= BlanagramsParallel["strings"]//AbsoluteTiming
Out[20]= {37.792462,{ratings,gratins,staring,strains,resting,stinger,singers,signers,resigns,ingress,signets,ingests,tigress,inserts,strings,tirings,risings,sitings,sorting,storing,signors,tossing,springs,sprints,rusting,tryings,stringy}}

For comparison, here is the computation done serially on one kernel.

In[21]:= Blanagrams["strings"]//AbsoluteTiming
Out[21]= {47.756149,{ratings,gratins,staring,strains,resting,stinger,singers,signers,resigns,ingress,signets,ingests,tigress,inserts,strings,tirings,risings,sitings,sorting,storing,signors,tossing,springs,sprints,rusting,tryings,stringy}}

With the 2-kernel machine this was run on, we are getting a slight speedup. This particular computation parallelizes well and so on an 8-kernel machine, we would see substantially faster compute times.

In[22]:= First[%]/First[%%]
Out[22]= 1.2636422


No comments:

Post a Comment