Working with Microsoft Word Documents: An Open XML SDK C# Guide
Manipulating Microsoft Word documents programmatically can be a daunting task, but it doesn't have to be. The Open XML SDK for Microsoft Office provides a powerful and versatile way to interact with Word documents using C#. This article will guide you through the basics of using the Open XML SDK to create, modify, and read Word documents, focusing on practical examples.
Why Open XML SDK?
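The Open XML SDK lets you read and write Word documents directly. A .docx file is a ZIP package of XML parts, and the SDK exposes those parts as strongly typed C# classes, so Word does not need to be installed on the machine running your code. Compared with approaches such as automating the Word application itself, this offers several advantages: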
- Flexibility: You can precisely control the content and formatting of your documents, creating highly customized documents.
- Control: You have complete control over the document structure and elements, allowing you to tailor the document to your specific needs.
- Automation: You can automate tasks like creating documents, merging data, and applying specific formatting, saving time and effort.
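To make the "structured XML" idea concrete, here is a minimal sketch that uses only the .NET base class library, not the SDK. It writes a stripped-down word/document.xml part into an in-memory ZIP archive and reads the text back. The class and method names (DocxPackageDemo, BuildPackage, ReadText) are made up for illustration, and the archive is deliberately incomplete: a real .docx also needs [Content_Types].xml and relationship parts, so Word would not open this one.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class DocxPackageDemo
{
    // WordprocessingML namespace used by elements such as w:p, w:r, and w:t.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: zips a single word/document.xml part in memory.
    public static byte[] BuildPackage(string text)
    {
        var xml = new XDocument(
            new XElement(W + "document",
                new XAttribute(XNamespace.Xmlns + "w", W.NamespaceName),
                new XElement(W + "body",
                    new XElement(W + "p",
                        new XElement(W + "r",
                            new XElement(W + "t", text))))));

        using (var buffer = new MemoryStream())
        {
            using (var zip = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
            using (Stream entry = zip.CreateEntry("word/document.xml").Open())
            {
                xml.Save(entry);
            }
            // The archive is finalized once the ZipArchive has been disposed.
            return buffer.ToArray();
        }
    }

    // Hypothetical helper: concatenates the text of every w:t element.
    public static string ReadText(byte[] package)
    {
        using (var zip = new ZipArchive(new MemoryStream(package), ZipArchiveMode.Read))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null) return string.Empty;
            using (Stream stream = entry.Open())
            {
                return string.Concat(
                    XDocument.Load(stream).Descendants(W + "t").Select(t => t.Value));
            }
        }
    }

    public static void Main()
    {
        byte[] package = BuildPackage("Hello from document.xml");
        Console.WriteLine(ReadText(package)); // prints "Hello from document.xml"
    }
}
```

The SDK classes used in the rest of this article (Body, Paragraph, Run, Text) correspond to the w:body, w:p, w:r, and w:t elements built by hand here.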
Setting up Your Environment
- Install the Open XML SDK: Install the "DocumentFormat.OpenXml" NuGet package in your C# project.
- Namespaces: Include the following namespaces in your C# code:
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
Creating a Simple Word Document
Let's start by creating a simple Word document with some basic text.
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
// Create a new Word document
using (WordprocessingDocument document = WordprocessingDocument.Create("myDocument.docx", WordprocessingDocumentType.Document))
{
    // Add a main document part
    MainDocumentPart mainPart = document.AddMainDocumentPart();
    // Create a new body element
    Body body = new Body();
    mainPart.Document = new Document(body);
    // Add a paragraph
    Paragraph paragraph = new Paragraph();
    Run run = new Run(new Text("This is a simple example document created using the Open XML SDK."));
    paragraph.Append(run);
    body.Append(paragraph);
    // Save the document
    document.Save();
}
Explanation:
- Create a new Word document: The WordprocessingDocument.Create() method creates a new Word document with the specified filename and document type.
- Add a main document part: Every Word document has a main document part, which contains the main content of the document.
- Create a body element: The Body element holds all the content of the document, such as paragraphs, tables, and images.
- Add a paragraph: We create a Paragraph element and add a Run element containing the text "This is a simple example document created using the Open XML SDK.".
- Save the document: Finally, we call document.Save() to save the changes to the Word file.
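Formatting is applied in the same way, by adding child elements to a run. The following sketch shows one possible way to produce a bold paragraph; the helper name BoldParagraph and the file name formatted.docx are made up for illustration, and it assumes the DocumentFormat.OpenXml package is installed.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class FormattedTextDemo
{
    // Hypothetical helper: builds a paragraph whose single run is bold.
    public static Paragraph BoldParagraph(string text)
    {
        // RunProperties must be the first child of the run, before the text.
        var run = new Run(
            new RunProperties(new Bold()),
            new Text(text) { Space = SpaceProcessingModeValues.Preserve });
        return new Paragraph(run);
    }

    public static void Main()
    {
        using (WordprocessingDocument document = WordprocessingDocument.Create(
            "formatted.docx", WordprocessingDocumentType.Document))
        {
            MainDocumentPart mainPart = document.AddMainDocumentPart();
            mainPart.Document = new Document(new Body(BoldParagraph("Quarterly report")));
            mainPart.Document.Save();
        }
    }
}
```

Other run-level formatting, such as Italic or FontSize, goes into the same RunProperties element.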
Adding Images to Your Document
Let's see how to insert an image into our Word document.
using System.IO;
using A = DocumentFormat.OpenXml.Drawing;
using DW = DocumentFormat.OpenXml.Drawing.Wordprocessing;
using PIC = DocumentFormat.OpenXml.Drawing.Pictures;
// ... (code from previous example; place the following before document.Save(), inside the using block)
// Insert an image
string imagePath = "myImage.jpg";
// Add an image part and copy the file contents into it
ImagePart imagePart = mainPart.AddImagePart(ImagePartType.Jpeg);
using (FileStream stream = File.OpenRead(imagePath))
{
    imagePart.FeedData(stream);
}
string relationshipId = mainPart.GetIdOfPart(imagePart);
long width = 990000L, height = 792000L; // displayed size in EMUs (914,400 per inch)
// Create a drawing element: an inline container wrapping a picture that references the image part
Drawing drawing = new Drawing(
    new DW.Inline(
        new DW.Extent { Cx = width, Cy = height },
        new DW.DocProperties { Id = 1U, Name = "Picture 1" },
        new A.Graphic(new A.GraphicData(
            new PIC.Picture(
                new PIC.NonVisualPictureProperties(
                    new PIC.NonVisualDrawingProperties { Id = 0U, Name = "myImage.jpg" },
                    new PIC.NonVisualPictureDrawingProperties()),
                new PIC.BlipFill(new A.Blip { Embed = relationshipId }, new A.Stretch(new A.FillRectangle())),
                new PIC.ShapeProperties(
                    new A.Transform2D(new A.Offset { X = 0L, Y = 0L }, new A.Extents { Cx = width, Cy = height }),
                    new A.PresetGeometry(new A.AdjustValueList()) { Preset = A.ShapeTypeValues.Rectangle })))
        { Uri = "http://schemas.openxmlformats.org/drawingml/2006/picture" })));
// Add the drawing to the paragraph (a drawing must sit inside a run)
paragraph.Append(new Run(drawing));
Explanation:
- Add an image part: We use mainPart.AddImagePart() to add an image part to the document, specifying the image type (in this case, JPEG).
- Feed image data: We then use imagePart.FeedData() to copy the image bytes from the specified file into the part, and mainPart.GetIdOfPart() to obtain the relationship ID that the drawing uses to reference the image.
- Create a drawing element: We create a Drawing element containing an Inline element, which places the picture in the text flow like a character.
- Specify image properties: Extent sets the displayed size in English Metric Units (EMUs), DocProperties gives the drawing an ID and a name, and the Graphic element wraps a Picture whose Blip points at the image part through the relationship ID.
- Add the drawing to the paragraph: Finally, we wrap the Drawing element in a Run and append that run to our Paragraph, because a drawing cannot be a direct child of a paragraph.
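Image sizes in a drawing are expressed in English Metric Units (EMUs), with 914,400 EMUs to the inch, so pixel dimensions have to be converted before they can be used. The sketch below shows the arithmetic; the EmuConverter class is made up for illustration, and the 96 DPI default is an assumption that you should replace with the image's real resolution when it is known.

```csharp
using System;

public static class EmuConverter
{
    // 914,400 EMUs per inch is fixed by the Open XML format.
    public const long EmusPerInch = 914400L;

    // Hypothetical helper: converts a pixel length to EMUs.
    // The default of 96 DPI is an assumption, not something read from the image.
    public static long FromPixels(int pixels, double dpi = 96.0)
    {
        return (long)Math.Round(pixels * EmusPerInch / dpi);
    }

    public static void Main()
    {
        // 200 px at 96 DPI: 200 * 914400 / 96 = 1905000
        Console.WriteLine(FromPixels(200)); // prints 1905000
    }
}
```

At 96 DPI one pixel corresponds to 9,525 EMUs, so a 200 x 100 pixel image would be 1,905,000 x 952,500 EMUs.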
Reading Word Document Content
Now let's see how to read the text content from an existing Word document.
using System;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
string documentPath = "myDocument.docx";
// Open the Word document read-only (the second argument is isEditable)
using (WordprocessingDocument document = WordprocessingDocument.Open(documentPath, false))
{
    // Access the main document part
    MainDocumentPart mainPart = document.MainDocumentPart;
    // Iterate through the paragraphs
    foreach (Paragraph paragraph in mainPart.Document.Body.Descendants<Paragraph>())
    {
        // Iterate through the runs in each paragraph
        foreach (Run run in paragraph.Descendants<Run>())
        {
            // Get the text from the run
            string text = run.InnerText;
            Console.WriteLine(text);
        }
    }
}
Explanation:
- Open the Word document: We use WordprocessingDocument.Open() to open the existing Word document.
- Access the main document part: We retrieve the MainDocumentPart from the document.
- Iterate through paragraphs and runs: We use Descendants<Paragraph>() to iterate through all Paragraph elements in the body and then use Descendants<Run>() to access each Run element within the paragraph.
- Get the text: We extract the text content from each Run element using InnerText.
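Printing run by run splits a paragraph wherever Word happened to break the text into runs. If you want one string per paragraph instead, a sketch along the following lines may be more convenient. The ParagraphReader class and ReadParagraphs method are made up for illustration, and the code assumes the DocumentFormat.OpenXml package is installed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class ParagraphReader
{
    // Hypothetical helper: returns the text of each paragraph as one string.
    public static List<string> ReadParagraphs(string path)
    {
        using (WordprocessingDocument document = WordprocessingDocument.Open(path, false))
        {
            // The main part or its body can be missing in a malformed file.
            var body = document.MainDocumentPart?.Document?.Body;
            if (body == null) return new List<string>();

            // ToList() materializes the result before the document is disposed.
            return body.Descendants<Paragraph>()
                .Select(p => string.Concat(p.Descendants<Text>().Select(t => t.Text)))
                .ToList();
        }
    }

    public static void Main(string[] args)
    {
        string path = args.Length > 0 ? args[0] : "myDocument.docx";
        foreach (string line in ReadParagraphs(path))
        {
            Console.WriteLine(line);
        }
    }
}
```

Because Descendants<Paragraph>() walks the whole tree, paragraphs inside table cells are included as well.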
Working with Tables
Creating and manipulating tables in Word documents is another essential task. Here's an example of adding a simple table to your document.
// ... (code from previous examples)
// Create a table
Table table = new Table();
// Create a table row
TableRow tableRow = new TableRow();
// Create table cells
TableCell tableCell1 = new TableCell(new Paragraph(new Run(new Text("Cell 1"))));
TableCell tableCell2 = new TableCell(new Paragraph(new Run(new Text("Cell 2"))));
// Add cells to the row
tableRow.Append(tableCell1);
tableRow.Append(tableCell2);
// Add the row to the table
table.Append(tableRow);
// Add the table to the body
body.Append(table);
Explanation:
- Create a table: We create a Table element to represent the table in our document.
- Create a table row: We create a TableRow element to hold the cells of a single row.
- Create table cells: We create TableCell elements for each cell in the row, adding a paragraph with text content to each cell.
- Add cells to the row: We append the created cells to the TableRow.
- Add the row to the table: We append the TableRow to the Table element.
- Add the table to the body: Finally, we add the Table element to the Body of the document.
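A table built this way has no borders, so its grid is invisible when the document is printed. The following sketch generalizes the steps above to any number of rows and adds simple single-line borders. The TableBuilder class and FromRows method are made up for illustration, the border size of 4 (in eighths of a point) is an arbitrary choice, and the code assumes the DocumentFormat.OpenXml package is installed.

```csharp
using System.Collections.Generic;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class TableBuilder
{
    // Hypothetical helper: builds a bordered table from rows of cell text.
    public static Table FromRows(IEnumerable<IEnumerable<string>> rows)
    {
        var single = new EnumValue<BorderValues>(BorderValues.Single);

        // TableProperties must come before any rows in the table.
        var table = new Table(
            new TableProperties(
                new TableBorders(
                    new TopBorder { Val = single, Size = 4U },
                    new LeftBorder { Val = single, Size = 4U },
                    new BottomBorder { Val = single, Size = 4U },
                    new RightBorder { Val = single, Size = 4U },
                    new InsideHorizontalBorder { Val = single, Size = 4U },
                    new InsideVerticalBorder { Val = single, Size = 4U })));

        foreach (IEnumerable<string> row in rows)
        {
            var tableRow = new TableRow();
            foreach (string cellText in row)
            {
                // Every cell needs at least one paragraph, even if it is empty.
                tableRow.Append(new TableCell(new Paragraph(new Run(new Text(cellText)))));
            }
            table.Append(tableRow);
        }
        return table;
    }
}
```

With the body from the earlier examples, a call such as body.Append(TableBuilder.FromRows(new[] { new[] { "Cell 1", "Cell 2" }, new[] { "Cell 3", "Cell 4" } })) would add a two-by-two table.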
Conclusion
The Open XML SDK for Microsoft Office offers a powerful and flexible way to interact with Word documents programmatically using C#. By understanding the basic concepts and using the examples provided, you can leverage the SDK's capabilities to automate document creation, modification, and reading. Remember to explore the full range of Open XML SDK elements and classes to implement more complex and customized document manipulation tasks.