We have a PowerShell script which creates a user in Microsoft Exchange and Active Directory.
We get the user's data from a preformatted txt file which serves as a sort of CSV:
$data = import-csv signup.txt
The problem is that, as we are from Spain, the character ñ sometimes appears; it isn't picked up by the script and produces a bad username and bad data. So for now we replace it with N, and then go into Exchange and change it back there.
How can I fix that problem?
I recommend converting the file to UTF-8, because the Import-Csv cmdlet handles it correctly.
I usually create an empty file in Notepad++ with UTF-8 encoding and copy the text over from the other file.
Or, as stated here:
# read with the encoding the source file actually uses (Default = the system's ANSI code page)
Get-Content signup.txt -Encoding Default | Out-File signup_utf8.txt -Encoding UTF8
Import-Csv signup_utf8.txt
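If the source txt is ANSI-encoded (typical for a file produced on a Spanish Windows system), another option may be to pass the encoding to Import-Csv directly; a sketch, assuming Windows PowerShell 3.0 or later:
# "Default" here means the system's ANSI code page (Windows-1252 on a Spanish system), which includes ñ
$data = Import-Csv signup.txt -Encoding Default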
Related
I am trying to create a script that converts the encoding of a collection of CSV files (10-20 files) in a directory to UTF-8. Currently, I am doing this manually by opening each individual file in Notepad++, switching the encoding to UTF-8, and re-saving.
Are there any Windows commands or something else (I have Cygwin installed as well) that I could use to build a script to do this? Ideally, the script would loop through every CSV file in the directory and convert it to UTF-8.
Thank you in advance for the help!!!
You're not specifying what to convert from, but assuming the input encoding is Windows-1252, try:
for file in *.csv; do
    iconv -f WINDOWS-1252 -t UTF-8 <"$file" >"$file.tmp" &&
    mv "$file.tmp" "$file"
done
This could leave some files unconverted (for example, if an input file contains bytes which are undefined in the source encoding), but in that case it will not overwrite the source file. (Maybe disable the mv logic until you can see whether everything converts without errors.)
You can easily do that in PowerShell:
Get-Content filename.csv | Set-Content -Encoding utf8 filename-utf8.csv
For your loop, you need to modularize your commands so that you can reference and call them properly. In your case, that means taking each file's BaseName and appending the new extension to it. After that, simply using the right variables in the right places in the ForEach loop will make it work:
$a = Get-ChildItem *.csv
ForEach ($item in $a) {
    Get-Content $item.FullName | Set-Content -Encoding utf8 "$($item.BaseName).csv.utf8"
}
Remember that before PowerShell 6, Set-Content -Encoding utf8 includes a BOM (byte-order mark): three bytes placed at the beginning of the file during the conversion.
The conversion needs to create an additional file; later on you can use mv to replace the original with it.
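If BOM-less UTF-8 output is needed from Windows PowerShell, one workaround (a sketch, not part of the original answer) is to write through .NET, since [IO.File]::WriteAllLines defaults to UTF-8 without a BOM:
Get-ChildItem *.csv | ForEach-Object {
    # WriteAllLines writes UTF-8 without a BOM by default
    [IO.File]::WriteAllLines("$($_.FullName).utf8", @(Get-Content $_.FullName))
}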
I am trying to extract every line whose first character is a 1 from a CSV that has over 1 million (1,000,000) lines.
The 1 in this case refers to the first line of a log. There are several different logs in this file, and I need the first line from all of them. The problem is (as you can understand) that 1 is not unique and can appear in any of the 12 'columns' of data in this CSV.
Essentially, I would like to extract them all to a new CSV file as well, for further breakdown.
I know it sounds simple enough, but I cannot seem to get the information I need.
I have searched StackOverflow, Microsoft, Google and my own Tech Team.
PS: Get-Content 'C:\Users\myfiles\Desktop\massivelogs.csv' | Select-String "1" | Out-File "extractedlogs.csv"
The immediate answer is that you must use Select-String '^1' in order to restrict matching to the start (^) of each input line.
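A sketch of the corrected command (paths as in the question):
Get-Content 'C:\Users\myfiles\Desktop\massivelogs.csv' |
    Select-String '^1' |
    ForEach-Object { $_.Line } |
    Set-Content 'extractedlogs.csv'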
However, a much faster solution is to use the switch statement with the -File option:
$inFile = 'C:\Users\myfiles\Desktop\massivelogs.csv'
$outFile = 'extractedlogs.csv'
& { switch -File $inFile -Wildcard { '1*' { $_ } } } | Set-Content $outFile
Note, however, that the output file won't be a true CSV file, because it will lack a header row.
Also, note that Set-Content applies an edition-specific default character encoding (the active ANSI code page in Windows PowerShell, BOM-less UTF-8 in PowerShell Core); use -Encoding as needed.
Using -Wildcard with a wildcard pattern (1*) speeds things up slightly, compared to -Regex with ^1.
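If you do need the header row in the output, one option (a sketch, assuming the first line of the input file is the header) is to write it first and then append the matches:
Get-Content $inFile -TotalCount 1 | Set-Content $outFile
& { switch -File $inFile -Wildcard { '1*' { $_ } } } | Add-Content $outFile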
I'm trying to add or remove a specific entry in the Windows hosts file using PowerShell, but when I do this it works for some time, and after a while the file gets edited again (when Windows reads it, I guess) and becomes corrupted (it displays Chinese characters).
I've tried using parts of some code I found here.
It allows me to edit the file properly and the entry is effective, until it gets corrupted.
I'm doing this to add the entry:
If ((Get-Content "$($env:windir)\system32\Drivers\etc\hosts" ) -notcontains "111.111.111.111 example.com")
{ac -Encoding UTF8 "$($env:windir)\system32\Drivers\etc\hosts" "111.111.111.111 example.com" }
After it gets corrupted, the file shows Chinese characters instead of its original content.
Thanks for your help.
Solved:
Remove -Encoding UTF8
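In other words, the add becomes (a sketch of the snippet from the question with just the encoding parameter dropped):
If ((Get-Content "$($env:windir)\system32\Drivers\etc\hosts") -notcontains "111.111.111.111 example.com")
{ Add-Content "$($env:windir)\system32\Drivers\etc\hosts" "111.111.111.111 example.com" }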
Because, as the comment in the hosts file states, "The IP address and the host name should be separated by at least one space.", trying to find a string with exactly one space character in between could return a false negative.
I think it would be better to use a regex for this, as it allows matching on more than one space character between the IP and the host name.
However, this does require the use of [Regex]::Escape() on both parts of the entry, as they contain regex special characters (the dot).
Something like this:
$hostsFile = "$($env:windir)\system32\Drivers\etc\hosts"
$hostsEntry = '111.111.111.111 example.com'
# split the entry into separate variables
$ipAddress, $hostName = $hostsEntry -split '\s+',2
# prepare the regex
$re = '(?m)^{0}[ ]+{1}' -f [Regex]::Escape($ipAddress), [Regex]::Escape($hostName)
If ((Get-Content $hostsFile -Raw) -notmatch $re) {
    Add-Content -Path $hostsFile -Value $hostsEntry
}
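The question also mentions removing an entry; a possible sketch for that case (not part of the original answer), reusing the same regex:
# remove the matching line (and its line break) if present; -NoNewline needs PowerShell 5.0 or later
If ((Get-Content $hostsFile -Raw) -match $re) {
    (Get-Content $hostsFile -Raw) -replace ($re + '\r?\n?') | Set-Content -Path $hostsFile -NoNewline
}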
I have some text files with different encodings. Some of them are UTF-8 and others are Windows-1251 encoded. I tried executing the following recursive script to re-encode them all as UTF-8.
Get-ChildItem *.nfo -Recurse | ForEach-Object {
    $content = $_ | Get-Content
    Set-Content -PassThru $_.FullName $content -Encoding UTF8 -Force
}
After that I am unable to use the files in my Java program: the files that were already UTF-8 encoded now have the wrong encoding, and I can't get the original text back. For the Windows-1251 encoded files I get empty output, just as with the original files. So the script corrupts the files that were already UTF-8 encoded.
I found another solution, iconv, but as far as I can see it needs the current encoding as a parameter:
$ iconv options -f from-encoding -t to-encoding inputfile(s) -o outputfile
Differently encoded files are mixed within the folder structure, so the files should stay on the same paths.
The system uses code page 852.
The existing UTF-8 files have no BOM.
In Windows PowerShell you won't be able to use the built-in cmdlets for two reasons:
From your OEM code page being 852 I infer that your "ANSI" code page is Windows-1250 (both defined by the legacy system locale), which doesn't match your Windows-1251-encoded input files.
Using Set-Content (and similar) with -Encoding UTF8 invariably creates files with a BOM (byte-order mark), which Java and, more generally, Unix-heritage utilities don't understand.
Note: PowerShell Core actually defaults to BOM-less UTF8 and also allows you to pass any available [System.Text.Encoding] instance to the -Encoding parameter, so you could solve your problem with the built-in cmdlets there, while needing direct use of the .NET framework only to construct an encoding instance.
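A rough sketch of what that could look like in PowerShell Core (not part of the original answer; detection still uses .NET's strict UTF-8 decoder, but the rewrite relies on Set-Content, whose default there is already BOM-less UTF-8):
$strictUtf8 = [System.Text.UTF8Encoding]::new($false, $true)   # throws on invalid UTF-8
$win1251    = [System.Text.Encoding]::GetEncoding(1251)
Get-ChildItem *.nfo -Recurse | ForEach-Object {
    try {
        $null = [IO.File]::ReadAllText($_.FullName, $strictUtf8)   # already valid UTF-8; leave it alone
    } catch [System.Text.DecoderFallbackException] {
        [IO.File]::ReadAllText($_.FullName, $win1251) |
            Set-Content -NoNewline -Path $_.FullName               # defaults to BOM-less UTF-8 in Core
    }
}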
In Windows PowerShell, therefore, you must use the .NET Framework directly:
Get-ChildItem *.nfo -Recurse | ForEach-Object {
    $file = $_.FullName
    $mustReWrite = $false
    # Try to read as UTF-8 first and throw an exception if
    # invalid-as-UTF-8 bytes are encountered.
    try {
        $null = [IO.File]::ReadAllText($file, [Text.Utf8Encoding]::new($false, $true))   # discard the content; only the decoding test matters here
    } catch [System.Text.DecoderFallbackException] {
        # Fall back to Windows-1251
        $content = [IO.File]::ReadAllText($file, [Text.Encoding]::GetEncoding(1251))
        $mustReWrite = $true
    }
    # Rewrite as UTF-8 without BOM (the .NET Framework's default)
    if ($mustReWrite) {
        Write-Verbose "Converting from 1251 to UTF-8: $file"
        [IO.File]::WriteAllText($file, $content)
    } else {
        Write-Verbose "Already UTF-8-encoded: $file"
    }
}
Note: As in your own attempt, the above solution reads each file into memory as a whole, but that could be changed.
Note:
If an input file comprises only bytes with ASCII-range characters (7-bit), it is by definition also UTF-8-encoded, because UTF-8 is a superset of ASCII encoding.
It is highly unlikely with real-world input, but purely technically a Windows-1251-encoded file could be a valid UTF-8 file as well, if the bit patterns and byte sequences happen to be valid UTF-8 (which has strict rules around what bit patterns are allowed where).
Such a file would not contain meaningful Windows-1251 content, however.
There is no need for a fallback strategy when decoding as Windows-1251, because such decoding cannot fail the way strict UTF-8 decoding can: there are no technical restrictions on what bit patterns can occur where.
Generally, in the absence of external information (or a BOM), there's no simple and no robust way to infer a file's encoding just from its content (though heuristics can be employed).
I have a batch script that prompts a user for some input and then outputs a couple of files I'm using in an AIX environment. These files need to be in UNIX format (which I believe is UTF8), but I'm looking for some direction on the SIMPLEST way of doing this.
I don't want to have to download extra software packages such as Cygwin or GnuWin32. I don't mind coding this if it is possible; my coding options are Batch, PowerShell and VBS. Does anyone know of a way to do this?
Alternatively, could I create the files with Batch and call a PowerShell script to reformat them?
The idea here is that a user is prompted for some information, and then I output a standard file which is basically the prompt answers for a job in AIX. I used Batch initially because I didn't know I would run into this problem, but I'm leaning towards redoing this in PowerShell, because I found some code on another forum that can do the conversion (below).
foreach ($i in ls -name DIR/*.txt) {
    get-content DIR/$i |
        out-file -encoding utf8 -filepath DIR2/$i
}
Looking for some direction or some input on this.
In batch files you can't do this without external tools.
If all you need is the file encoding, then the snippet you gave should work. If you want to convert the files in place (instead of writing them to another location), you can do:
Get-ChildItem *.txt | ForEach-Object { (Get-Content $_) | Out-File -Encoding UTF8 $_ }
(The parentheses around Get-Content are important.) However, this will write the files in UTF-8 with a signature (U+FEFF, a BOM) at the start, which some Unix tools don't accept (even though it's technically legal, its use is discouraged).
Then there is the problem that line breaks are different between Windows and Unix. Unix uses only U+000A (LF) while Windows uses two characters for that: U+000D U+000A (CR+LF). So ideally you'd convert the line breaks, too. But that gets a little more complex:
Get-ChildItem *.txt | ForEach-Object {
    # get the contents and replace line breaks by U+000A
    $contents = [IO.File]::ReadAllText($_.FullName) -replace "`r`n?", "`n"
    # create UTF-8 encoding without signature
    $utf8 = New-Object System.Text.UTF8Encoding $false
    # write the text back
    [IO.File]::WriteAllText($_.FullName, $contents, $utf8)
}
Try the overloaded version ReadAllText(String, Encoding) if your files contain ANSI characters rather than only ASCII ones:
$contents = [IO.File]::ReadAllText($_.FullName, [Text.Encoding]::Default) -replace "`r`n", "`n"
https://msdn.microsoft.com/en-us/library/system.io.file.readalltext(v=vs.110).aspx
https://msdn.microsoft.com/en-us/library/system.text.encoding(v=vs.110).aspx
ASCII - Gets an encoding for the ASCII (7-bit) character set.
Default - Gets an encoding for the operating system's current ANSI code page.
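Putting those pieces together, a sketch (assuming the input files are ANSI-encoded) that reads them as ANSI, normalizes the line breaks, and writes BOM-less UTF-8 back in place:
Get-ChildItem *.txt | ForEach-Object {
    # read with the system ANSI code page and convert CRLF to LF
    $contents = [IO.File]::ReadAllText($_.FullName, [Text.Encoding]::Default) -replace "`r`n", "`n"
    # write back as UTF-8 without a signature
    $utf8 = New-Object System.Text.UTF8Encoding $false
    [IO.File]::WriteAllText($_.FullName, $contents, $utf8)
}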