I am facing some very weird behavior with my PowerShell script. I am reading values from a PDF document via Word as a COM object. Reading the values works fine, but as soon as I concatenate two of the variables I got from the document, one of them goes missing from the result. At first I suspected the underscore was the problem, but after trying every form of escaping I could think of, such as `_ or ${variable}_, and even substituting another character such as a space, I still have the same problem.
I can display the variables by themselves and they return the right value, but after concatenation I get some very weird behavior.
    $filepath = "C:\Users\xxxx\Desktop\all_spools\"
    $wd = New-Object -ComObject Word.Application
    $wd.Visible = $false
    $files = Get-ChildItem -path $filepath
    foreach($file in $files) {
        $doc = $wd.Documents.Open($file.FullName)
        if ($doc.tables(1).rows.count -eq 7) {
            $docnum = $doc.tables(1).Columns(2).cells(2).Range.Text
            $intdocarr = $doc.tables(1).Columns(2).cells(7).Range.Text
            $intdocnum = $intdocarr.split(" ")
            $finalintdocnum = $intdocnum | Select-Object -first 1
            $doc.Close()
        } else {
            $docnum = $doc.tables(1).Columns(2).cells(2).Range.Text
            $intdocarr = $doc.tables(1).Columns(2).cells(8).Range.Text
            $intdocnum = $intdocarr.split(" ")
            $finalintdocnum = $intdocnum | Select-Object -first 1
            $doc.Close()
        }
        $filename = "${docnum}_$finalintdocnum.pdf"
        $filename
    }
    $wd.Quit()
My expected output would be something like "90004234_74503423.pdf", but in fact I get "_74503424.pdf". Sometimes, seemingly at random, it becomes "_74503423.pdf90004234", which I cannot reproduce reliably. I am kind of lost.
When I use the variable $docnum to rename the file, I get an error about illegal characters in the path. I have also tried stripping everything and using a regex to keep only the digits.
My question is, am I missing something? I cannot figure out why this would not work.
1 Answer
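If you pipe $doc.tables(1).Columns(2).cells(2).Range.Text to clip and paste the result into a more revealing text editor (I'm using Notepad++), you'll see that it captures ASCII control characters you're not expecting. Word ends a table cell's text with a carriage return followed by an end-of-cell mark (code points 13 and 7). When the string is printed, the carriage return moves the cursor back to the start of the line, which is most likely why the first value appears to vanish.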
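As an alternative to pasting into an editor, the hidden characters can be listed by code point directly in PowerShell. The snippet below is only a sketch: $cellText is an assumed sample string that simulates what a Word table cell typically returns, so it runs without Word installed.

```powershell
# Assumed sample data: simulates Range.Text from a Word table cell,
# i.e. the visible digits followed by a carriage return (13) and the cell mark (7).
$cellText = "90004234" + [char]13 + [char]7

# List the code point of every character in the string.
$codes = $cellText.ToCharArray() | ForEach-Object { [int]$_ }
$codes -join ' '

# Keep only the control characters (code points below 32) to see what needs stripping.
$hidden = @($codes | Where-Object { $_ -lt 32 })
"Hidden code points: $($hidden -join ', ')"
```

Against the real document, $cellText would be replaced by the Range.Text value itself. Any code point below 32 in the output is a control character that the console will not display as a visible glyph.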
You can strip these characters when you assign $docnum, in both your if and else branches.
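    $docnum = ($doc.tables(1).Columns(2).cells(2).Range.Text) -replace "[\x00-\x1F]+"

Also, be careful with how you build $filename. In a double-quoted string, an underscore written directly after a variable name (as in "$docnum_...") is treated as part of the name. Your ${docnum} form avoids that, and explicit concatenation makes the intent unambiguous: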
    $filename = "$docnum" + "_" + "$finalintdocnum.pdf"
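Putting both pieces together, a minimal sketch could look like the following. Remove-ControlChars is a hypothetical helper name, and the two raw strings (including the "Rev A" suffix) are assumed sample data standing in for the Range.Text reads, so the example runs without Word.

```powershell
# Hypothetical helper: strips ASCII control characters (0x00-0x1F) from a string.
function Remove-ControlChars {
    param([string]$Text)
    return ($Text -replace '[\x00-\x1F]+', '')
}

# Assumed sample data standing in for the two Range.Text values read from the table.
$rawDocnum    = "90004234" + [char]13 + [char]7
$rawIntDocArr = "74503423 Rev A" + [char]13 + [char]7

# Clean both values, then keep only the first space-separated token of the second.
$docnum         = Remove-ControlChars $rawDocnum
$finalintdocnum = ((Remove-ControlChars $rawIntDocArr) -split ' ') | Select-Object -First 1

# Braces around both names keep the underscore and the extension out of the variable names.
$filename = "${docnum}_${finalintdocnum}.pdf"
$filename
```

Cleaning the text before it is split or concatenated also addresses the rename error from the question, since control characters are not allowed in Windows file names.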