/r/dailyprogrammer – Poetry in a Haystack

I decided to start participating in some of the programming challenges in the dailyprogrammer subreddit. I'm not saying I'm going to do them every day, but maybe one a week; we'll see! The challenge from July 3 is the first one I've attempted. The task is to find the 3 lines of a famous poem hidden in a text file of 50,000 lines; the other 49,997 lines are gibberish. I will be solving the challenges in C# unless otherwise noted.

Here is the solution I used for the poetry in a haystack challenge:

using System;
using System.Collections.Generic;
using System.IO;

namespace July5_2015
{
    class Program
    {
        static void Main(string[] args)
        {
            // A line is rejected once this many of its words are missing from the dictionary.
            const int TOLERANCE = 5;
            DateTime start = DateTime.Now;

            try
            {
                List<string> words = new List<string>();

                Console.Write("Loading Dictionary...");
                using( StreamReader dic = new StreamReader("US.dic"))
                {                    
                    string line = dic.ReadLine();
                    while(line != null)
                    {
                        words.Add(line);
                        line = dic.ReadLine();
                    }
                    Console.Write("Done\n");
                }

                Console.WriteLine("Processing data file...");
                using( StreamReader sr = new StreamReader("./challenge.txt") )
                {
                    string line = sr.ReadLine();
                    int count = 0;
                    int misses = 0;
                    string searchStr;
                    string[] poem;

                    while( line != null )
                    {
                        searchStr = line.Replace(",", "");
                        poem = searchStr.Split(' ');
                        misses = 0;
                        
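                        foreach (string str in poem)
                        {
                            // Ignore single characters and the empty strings Split can produce.
                            if (str.Length < 2)
                                continue;

                            // List.IndexOf scans the dictionary linearly and returns -1 when the word is absent.
                            if (words.IndexOf(str) < 0)
                            {
                                ++misses;
                                // Give up on this line once it has too many unknown words.
                                if (misses >= TOLERANCE)
                                {
                                    break;
                                }
                            }
                        }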

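                        // Fewer misses than the tolerance: treat the line as part of the poem.
                        if (misses < TOLERANCE)
                        {
                            Console.WriteLine(++count + ") " + line);
                        }

                        // Stop reading once all three poem lines have been found.
                        line = (count >= 3) ? null : sr.ReadLine();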
                    }
                }
            }
            catch(Exception e)
            {
                Console.WriteLine(e.Message);
            }

            TimeSpan total = DateTime.Now - start;

            Console.WriteLine("DONE");
            Console.WriteLine("Elapsed: " + total.Minutes + "M " + total.Seconds + "S");
            Console.ReadLine();
        }
    }
}

This solution is fairly straightforward. I loaded a dictionary file (US.dic) into a list and then broke up each line of challenge.txt from reddit into an array of words, removing commas first so they don't get in the way of matching. As the program compares the words on a line with the dictionary, it skips anything shorter than two characters. Each word that isn't found in the dictionary increments the misses variable by one. If misses reaches a pre-determined tolerance (5 in the code above), the line is disregarded. I ran the program a couple of times to fine-tune the tolerance so that all 3 lines from the poem are displayed.
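
To make the per-line check easier to see on its own, here is a minimal sketch that pulls it out of the read loop. The names HaystackSketch, CountMisses and LooksLikePoetry are hypothetical and not part of the program above, and the five-word dictionary is only a stand-in for US.dic.

```csharp
using System;
using System.Collections.Generic;

static class HaystackSketch
{
    // Hypothetical helper: counts the words (length >= 2) on a line that are
    // missing from the dictionary, stopping early once the tolerance is reached.
    public static int CountMisses(string line, ICollection<string> dictionary, int tolerance)
    {
        int misses = 0;

        // Same preprocessing as the post: drop commas, then split on spaces.
        foreach (string word in line.Replace(",", "").Split(' '))
        {
            if (word.Length < 2)
                continue; // single characters and empty strings are ignored

            if (!dictionary.Contains(word) && ++misses >= tolerance)
                break;
        }

        return misses;
    }

    // A line is kept only when it stayed below the tolerance.
    public static bool LooksLikePoetry(string line, ICollection<string> dictionary, int tolerance)
    {
        return CountMisses(line, dictionary, tolerance) < tolerance;
    }

    static void Main()
    {
        // Tiny stand-in dictionary (an assumption for this example only).
        var dictionary = new List<string> { "sing", "goddess", "the", "anger", "of" };
        const int Tolerance = 2;

        Console.WriteLine(LooksLikePoetry("sing, o goddess, the anger of achilles", dictionary, Tolerance));
        Console.WriteLine(LooksLikePoetry("xq zzkj pfw qqrt", dictionary, Tolerance));
    }
}
```

With this stand-in dictionary the first line has a single unknown word ("achilles") and is kept, while the gibberish line hits the tolerance on its second word and is rejected.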

It does take the program a while to find the correct lines, so I added a timer out of curiosity. On my PC, it takes just under 2 minutes to find the lines from the poem, which are shown below.
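
Much of that time is probably spent in List.IndexOf, which walks the whole dictionary for every word it cannot find. One possible variation, not what the program above does, is to keep the words in a HashSet<string> so each lookup is a hash probe instead. The sketch below compares the two with a Stopwatch; LookupSketch and BuildWordSet are hypothetical names, and the 100,000 generated words are a synthetic stand-in for the dictionary file, so the timings only indicate the trend.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

static class LookupSketch
{
    // Hypothetical helper: builds a hash-based set from any sequence of words.
    public static HashSet<string> BuildWordSet(IEnumerable<string> words)
    {
        return new HashSet<string>(words, StringComparer.Ordinal);
    }

    static void Main()
    {
        // Synthetic stand-in for the dictionary file (an assumption for this example).
        var list = new List<string>();
        for (int i = 0; i < 100000; i++)
            list.Add("word" + i);

        HashSet<string> set = BuildWordSet(list);
        const int Lookups = 2000;

        // Worst case for the list: every word is absent, so each search scans everything.
        Stopwatch sw = Stopwatch.StartNew();
        int foundInList = 0;
        for (int i = 0; i < Lookups; i++)
        {
            if (list.IndexOf("missing" + i) >= 0)
                foundInList++;
        }
        sw.Stop();
        Console.WriteLine("List.IndexOf:     " + sw.ElapsedMilliseconds + " ms, found " + foundInList);

        // Same lookups against the hash set.
        sw.Restart();
        int foundInSet = 0;
        for (int i = 0; i < Lookups; i++)
        {
            if (set.Contains("missing" + i))
                foundInSet++;
        }
        sw.Stop();
        Console.WriteLine("HashSet.Contains: " + sw.ElapsedMilliseconds + " ms, found " + foundInSet);
    }
}
```

In the original program the set could be filled from the dictionary file, for example with BuildWordSet(File.ReadLines("US.dic")), and words.IndexOf(str) < 0 would become !words.Contains(str). The tolerance logic stays the same.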

sing, o goddess, the anger of achilles son of peleus, that brought countless ills upon the achaeans.

many a brave soul did it send hurrying down to hades, and many a hero did it yield a prey to dogs and vultures,

for so were the counsels of jove fulfilled from the day on which the son of atreus, king of men, and great achilles, first fell out with one another.
