Test Document Generator

Many of my projects involve some form of document management. Whether I am developing demos or features around document management, I often need test documents. Most of the actual Word documents I generate for my role are sales or feature documentation and contain some proprietary information, which makes them unsuitable for demoing to other clients. So I find myself opening Word, pasting some Lorem Ipsum into a document, saving it, and repeating until I have enough documents. The problem is that if I want to show off search, or prove to a client that the right document is coming up, it is hard because the documents are either identical or indistinguishable from one another.
I decided to write a little console app that goes to Wikipedia, copies 200 random pages, and saves them as Word documents. This very quickly gave me 200 test documents, each with unique, meaningful, yet harmless content.
I used the Open XML SDK 2.0 and a CodePlex project called HTML to Open XML to convert the HTML into Word documents. The formatting is not perfect, but it is good enough for my purposes.
Below is the code for the console app. I hope it helps someone else out.
 

using System;
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml;
using System.Net;
using System.IO;
using NotesFor.HtmlToOpenXml;
using System.Collections.Generic;
namespace DocumentGenerator
{
    class Program
    {
        static void Main(string[] args)
        {
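            // Make sure the output folder exists before writing any files.
            Directory.CreateDirectory(@"c:\temp");
            for (int i = 0; i < 200; i++)
            {
                // Special:Random redirects to a random article, so each file gets different content.
                GetDocs(@"c:\temp\Test" + i + ".docx", "http://en.wikipedia.org/wiki/special:Random");
                // Pause briefly between requests to avoid hammering Wikipedia.
                System.Threading.Thread.Sleep(100);
            }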
        }
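        // Downloads the HTML of the page at the given URL.
        private static string GetPageText(string url)
        {
            using (WebClient client = new WebClient())
            {
                // WebClient sends no User-Agent by default, and Wikimedia asks clients
                // to identify themselves. The value below is only a placeholder.
                client.Headers.Add("user-agent", "TestDocumentGenerator/1.0");
                return client.DownloadString(url);
            }
        }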
        private static void GetDocs(string docName, string url)
        {
            bool isDocGenerated = true;
            using (MemoryStream generatedDocument = new MemoryStream())
            {
                using (WordprocessingDocument package = WordprocessingDocument.Create(generatedDocument, WordprocessingDocumentType.Document))
                {
                    MainDocumentPart mainPart = package.MainDocumentPart;
                    if (mainPart == null)
                    {
                        mainPart = package.AddMainDocumentPart();
                        new Document(new Body()).Save(mainPart);
                    }
                    HtmlConverter converter = new HtmlConverter(mainPart);
                    Body body = mainPart.Document.Body;
                    string source = GetPageText(url);
                    converter.ConsiderDivAsParagraph = true;
                    converter.ExcludeLinkAnchor = true;
                    converter.ImageProcessing = ImageProcessing.Ignore;
                    IList<OpenXmlCompositeElement> paragraphs = null;
                    try
                    {
                        paragraphs = converter.Parse(source);
                        for (int i = 0; i < paragraphs.Count; i++)
                        {
                            body.Append(paragraphs[i]);
                        }
                    }
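                    catch (Exception ex)
                    {
                        // Skip pages the converter cannot handle, but report why.
                        Console.WriteLine("Skipping " + docName + ": " + ex.Message);
                        isDocGenerated = false;
                    }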
                    mainPart.Document.Save();
                }
                if (isDocGenerated)
                {
                    // WriteAllBytes overwrites an existing file, so no explicit delete is needed.
                    File.WriteAllBytes(docName, generatedDocument.ToArray());
                }
            }
        }
    }
}
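
Since the point of these documents is to tell them apart in search results, it can help to name each file after its article rather than Test0, Test1, and so on. The following is a minimal sketch of that idea, not part of the original app. The class DocNaming and the helpers ExtractTitle and ToSafeFileName are hypothetical names, and the sketch assumes the downloaded page contains a standard title element.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helpers for naming generated documents after their source page.
static class DocNaming
{
    // Returns the decoded text of the first <title> element, or null if none is found.
    public static string ExtractTitle(string html)
    {
        Match m = Regex.Match(html, @"<title[^>]*>(.*?)</title>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        if (!m.Success) return null;
        string title = WebUtility.HtmlDecode(m.Groups[1].Value).Trim();
        return title.Length == 0 ? null : title;
    }

    // Replaces characters that are invalid in file names on the current platform
    // with '_' and appends ".docx". Falls back to the given name if the title is empty.
    public static string ToSafeFileName(string title, string fallback)
    {
        if (string.IsNullOrWhiteSpace(title)) return fallback + ".docx";
        char[] invalid = Path.GetInvalidFileNameChars();
        string cleaned = new string(
            title.Select(c => invalid.Contains(c) ? '_' : c).ToArray()).Trim();
        return (cleaned.Length == 0 ? fallback : cleaned) + ".docx";
    }
}
```

To use this, GetDocs would need to build the file name after the page has been downloaded, for example with Path.Combine(@"c:\temp", DocNaming.ToSafeFileName(DocNaming.ExtractTitle(source), "Test" + i)), which means passing the loop index in instead of a finished path. If Special:Random returns the same article twice, the later file replaces the earlier one.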

 

David Palfery
