Need "ANSI" encoding enumeration value to support "ANSI"-code-page-encoded files (e.g., Windows 1252) #6562
Comments
We don't use numbers with OEM, and it seems we should use only ANSI, without a number, as an alias of |
@iSazonov: Agreed. @chuanjiao10: The purpose of this issue is to restore accidentally removed functionality to PS Core: support for the active ANSI code page. What you're proposing is an enhancement (as an aside: something like |
Please open a new issue. We should discuss this. (and why |
@iSazonov: Yes, it should be a new issue, but I was suggesting that @chuanjiao10 create it (I only suggested a possible syntax). Interesting about the short list of Unix - hadn't noticed that - perhaps yet another issue. |
@iSazonov: Just as a quick pointer regarding the "short list":
|
New issue for |
New issue for "to allow numerical values" discussion #6581 |
I appreciate it, @iSazonov. |
I would like to echo the concerns above. The current list of encodings is too limiting. I am dealing with text encoded as SHIFT_JIS (cp932) on an OEM-US (cp437) system and need to convert the text to Unicode. I am currently working around it with a Get-EncodedContent function that accepts any of the named encodings returned by [System.Text.Encoding]::GetEncodings() and then reads the file with [System.IO.File]::ReadAllLines($Path, $Encoding). Even using encoding RAW would destroy the SHIFT_JIS text on my system. This is a work in progress, but it should help others work around the issue in the meantime: function Get-EncodedContent {
} |
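The approach described in the comment above (resolve a System.Text.Encoding from a name or code page, read the file through .NET, and end up with Unicode text) might be sketched roughly as follows. This is not the commenter's function: the name Convert-TextFileToUtf8 and its parameters are hypothetical, and the sketch assumes the requested code page is available in the session (for example, 932 for SHIFT_JIS).

```powershell
# Hypothetical helper (not from the thread): re-encode a legacy-encoded text file as UTF-8.
function Convert-TextFileToUtf8 {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $Destination,
        # Encoding name (e.g. 'shift_jis') or code-page number (e.g. 932) of the source file.
        [Parameter(Mandatory)] [string] $SourceEncoding
    )

    # A purely numeric argument is treated as a code page, anything else as an encoding name.
    $codePage = 0
    $encoding = if ([int]::TryParse($SourceEncoding, [ref] $codePage)) {
        [System.Text.Encoding]::GetEncoding($codePage)
    } else {
        [System.Text.Encoding]::GetEncoding($SourceEncoding)
    }

    # .NET resolves relative paths against the process directory, not PowerShell's
    # current location, so convert both paths to full file-system paths first.
    $source = Convert-Path -LiteralPath $Path
    $target = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Destination)

    # Decode with the source encoding; the resulting .NET strings are Unicode.
    $lines = [System.IO.File]::ReadAllLines($source, $encoding)

    # UTF8Encoding($false) writes UTF-8 without a BOM.
    [System.IO.File]::WriteAllLines($target, $lines, [System.Text.UTF8Encoding]::new($false))
}

# Example call (file names are placeholders):
# Convert-TextFileToUtf8 -Path .\legacy.txt -Destination .\legacy.utf8.txt -SourceEncoding 932
```

Like the original workaround, this reads the whole file into memory before writing, which is usually acceptable for small text files.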
Thanks for the snippet. That's better than my current way to work around this issue in my scripts:
|
Let me summarize the status quo as of PowerShell Core 7.0.0-rc.2:
Tab-completion would be nice, however; here's a proof-of-concept function adapted from @jongross4's workaround; it supports both code-page numbers and encoding names for tab completion, along with PowerShell's own identifiers if you type
function Get-EncodedContent {
[CmdletBinding()]
param (
$Path
)
DynamicParam {
$paramName = 'Encoding'
$codePageNums = [Text.Encoding]::GetEncodings().CodePage
$encodingNames = [Text.Encoding]::GetEncodings().Name
# PowerShell's valid -Encoding arguments - sans 'Unknown' and 'String'
$psEncodingNames = 'Unicode', 'Byte', 'BigEndianUnicode', 'UTF8', 'UTF7', 'UTF32', 'Ascii', 'Default', 'Oem', 'BigEndianUTF32'
if ($codePageNums -notcontains 1252) {
# Workaround for PS Core as of v7: only the .NET Core default set is listed, not also those added later by PowerShell - see https://github.com/dotnet/corefx/issues/28944
# We use hard-coded lists obtained via Windows PowerShell:
# ([Text.Encoding]::GetEncodings().CodePage) -join ', '
# "'{0}'" -f (([Text.Encoding]::GetEncodings().Name) -join "', '")
$codePageNums = 37, 437, 500, 708, 720, 737, 775, 850, 852, 855, 857, 858, 860, 861, 862, 863, 864, 865, 866, 869, 870, 874, 875, 932, 936, 949, 950, 1026, 1047, 1140, 1141, 1142, 1143, 1144, 1145, 1146, 1147, 1148, 1149, 1200, 1201, 1250, 1251, 1252, 1253, 1254, 1255, 1256, 1257, 1258, 1361, 10000, 10001, 10002, 10003, 10004, 10005, 10006, 10007, 10008, 10010, 10017, 10021, 10029, 10079, 10081, 10082, 12000, 12001, 20000, 20001, 20002, 20003, 20004, 20005, 20105, 20106, 20107, 20108, 20127, 20261, 20269, 20273, 20277, 20278, 20280, 20284, 20285, 20290, 20297, 20420, 20423, 20424, 20833, 20838, 20866, 20871, 20880, 20905, 20924, 20932, 20936, 20949, 21025, 21866, 28591, 28592, 28593, 28594, 28595, 28596, 28597, 28598, 28599, 28603, 28605, 29001, 38598, 50220, 50221, 50222, 50225, 50227, 51932, 51936, 51949, 52936, 54936, 57002, 57003, 57004, 57005, 57006, 57007, 57008, 57009, 57010, 57011, 65000, 65001
$encodingNames = 'IBM037', 'IBM437', 'IBM500', 'ASMO-708', 'DOS-720', 'ibm737', 'ibm775', 'ibm850', 'ibm852', 'IBM855', 'ibm857', 'IBM00858', 'IBM860', 'ibm861', 'DOS-862', 'IBM863', 'IBM864', 'IBM865', 'cp866', 'ibm869', 'IBM870', 'windows-874', 'cp875', 'shift_jis', 'gb2312', 'ks_c_5601-1987', 'big5', 'IBM1026', 'IBM01047', 'IBM01140', 'IBM01141', 'IBM01142', 'IBM01143', 'IBM01144', 'IBM01145', 'IBM01146', 'IBM01147', 'IBM01148', 'IBM01149', 'utf-16', 'utf-16BE', 'windows-1250', 'windows-1251', 'Windows-1252', 'windows-1253', 'windows-1254', 'windows-1255', 'windows-1256', 'windows-1257', 'windows-1258', 'Johab', 'macintosh', 'x-mac-japanese', 'x-mac-chinesetrad', 'x-mac-korean', 'x-mac-arabic', 'x-mac-hebrew', 'x-mac-greek', 'x-mac-cyrillic', 'x-mac-chinesesimp', 'x-mac-romanian', 'x-mac-ukrainian', 'x-mac-thai', 'x-mac-ce', 'x-mac-icelandic', 'x-mac-turkish', 'x-mac-croatian', 'utf-32', 'utf-32BE', 'x-Chinese-CNS', 'x-cp20001', 'x-Chinese-Eten', 'x-cp20003', 'x-cp20004', 'x-cp20005', 'x-IA5', 'x-IA5-German', 'x-IA5-Swedish', 'x-IA5-Norwegian', 'us-ascii', 'x-cp20261', 'x-cp20269', 'IBM273', 'IBM277', 'IBM278', 'IBM280', 'IBM284', 'IBM285', 'IBM290', 'IBM297', 'IBM420', 'IBM423', 'IBM424', 'x-EBCDIC-KoreanExtended', 'IBM-Thai', 'koi8-r', 'IBM871', 'IBM880', 'IBM905', 'IBM00924', 'EUC-JP', 'x-cp20936', 'x-cp20949', 'cp1025', 'koi8-u', 'iso-8859-1', 'iso-8859-2', 'iso-8859-3', 'iso-8859-4', 'iso-8859-5', 'iso-8859-6', 'iso-8859-7', 'iso-8859-8', 'iso-8859-9', 'iso-8859-13', 'iso-8859-15', 'x-Europa', 'iso-8859-8-i', 'iso-2022-jp', 'csISO2022JP', 'iso-2022-jp', 'iso-2022-kr', 'x-cp50227', 'euc-jp', 'EUC-CN', 'euc-kr', 'hz-gb-2312', 'GB18030', 'x-iscii-de', 'x-iscii-be', 'x-iscii-ta', 'x-iscii-te', 'x-iscii-as', 'x-iscii-or', 'x-iscii-ka', 'x-iscii-ma', 'x-iscii-gu', 'x-iscii-pa', 'utf-7', 'utf-8'
}
$validateSet = [Management.Automation.ValidateSetAttribute]::new([string[]] ($codePageNums + $encodingNames + $psEncodingNames))
$dynParam = [Management.Automation.RuntimeDefinedParameter]::new(
$paramName,
[string],
([Management.Automation.ParameterAttribute] @{ ParameterSetName = '__AllParameterSets' }, $validateSet)
)
($paramDictionary = [Management.Automation.RuntimeDefinedParameterDictionary]::new()).Add($paramName, $dynParam)
return $paramDictionary
}
end {
Set-StrictMode -Version 1
if (($encoding = $PSBoundParameters.Encoding)) { # -Encoding specified.
$isPSCore = $PSVersionTable.PSEdition -eq 'Core'
$isPsIdentifier = $false
if ($encoding -as [int]) { # code page
# If a code-page number was given, make it an [int].
$encoding = [int] $encoding
} else { # name
# See if the identifier is a standard PS encoding identifier.
$isPsIdentifier = 'Unicode', 'Byte', 'BigEndianUnicode', 'UTF8', 'UTF7', 'UTF32', 'Ascii', 'Default', 'Oem', 'BigEndianUTF32' -contains $encoding
}
# In PS Core we can always pass the -Encoding argument through,
# in Win PS only if it is a standard identifier.
if ($isPSCore -or $isPsIdentifier) {
# Workaround for PS Core as of v7.0 for 'BigEndianUTF32' not being supported - see https://github.com/PowerShell/PowerShell/issues/11645
# Translate to the equivalent System.Text.Encoding name.
if ($isPSCore -and $encoding -eq 'BigEndianUTF32') { $encoding = 'UTF-32BE' }
Get-Content $Path -Encoding $encoding
}
else { # WinPS - obtain a System.Text.Encoding instance and use [IO.File]::ReadAllLines()
# Caveat: This doesn't *stream* through the pipeline - it reads all lines *up front*
[IO.File]::ReadAllLines((Convert-Path $Path), [Text.Encoding]::GetEncoding($encoding))
}
}
else { # -Encoding not specified -> simply invoke Get-Content
Get-Content -Path $path
}
}
} |
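Regarding the caveat in the Windows PowerShell branch of the function above (ReadAllLines() reads everything up front): one possible way to stream lines instead is to read through a System.IO.StreamReader and emit each line as it is read. The following is only a sketch under that assumption; the function name Read-LinesStreaming is hypothetical and not part of the thread.

```powershell
# Hypothetical helper (not from the thread): emit lines one at a time instead of
# collecting them all first, so downstream commands can start working immediately.
function Read-LinesStreaming {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [System.Text.Encoding] $Encoding
    )

    # Convert-Path yields a full file-system path that .NET can open reliably.
    $reader = [System.IO.StreamReader]::new((Convert-Path -LiteralPath $Path), $Encoding)
    try {
        # ReadLine() returns $null at end of file.
        while ($null -ne ($line = $reader.ReadLine())) {
            $line   # written to the pipeline as soon as it has been read
        }
    }
    finally {
        # Runs even if a downstream command such as Select-Object -First stops the pipeline.
        $reader.Dispose()
    }
}

# Example call (file name is a placeholder):
# Read-LinesStreaming -Path .\legacy.txt -Encoding ([System.Text.Encoding]::GetEncoding(932)) |
#     Select-Object -First 5
```

Note that this StreamReader constructor still honors a byte-order mark if the file has one, so the given encoding effectively applies to BOM-less files.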
🎉 This issue was addressed in #19298, which has now been successfully released.
As discussed in #6550:
While you can pass OEM to filesystem cmdlets to support the legacy system locale's OEM code page on Windows, its "ANSI" counterpart (such as Windows 1252 on US-English systems) is currently missing. (In Windows PowerShell, the Default value fulfills that role, but in PowerShell Core Default now refers to the new default, (BOM-less) UTF-8.)
Therefore, an ANSI encoding value should be introduced to complement the OEM value.
With ANSI available, the current workaround:
would simply become:
Note: Given that OEM already is available even when running on Unix-like platforms, it sounds like we shouldn't restrict ANSI's availability to Windows. ([System.Text.Encoding]::GetEncoding([cultureinfo]::CurrentCulture.TextInfo.ANSICodePage) seemingly does return a locale-appropriate value on Unix-like platforms as well.)
Environment data
Written as of:
PowerShell Core v6.0.2
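For illustration, a workaround along the lines the issue describes might look like the sketch below in PowerShell Core. It relies on the expression quoted in the note above to resolve the culture's ANSI code page and on the -Encoding parameter of the filesystem cmdlets accepting a System.Text.Encoding instance in PowerShell Core (Windows PowerShell does not accept one). The sample file name is made up for the example.

```powershell
# Resolve the ANSI code page associated with the current culture
# (e.g. 1252 on US-English systems) and get the matching encoding object.
$ansiCodePage = [cultureinfo]::CurrentCulture.TextInfo.ANSICodePage
$ansiEncoding = [System.Text.Encoding]::GetEncoding($ansiCodePage)

# Create a small sample file in that encoding (hypothetical file name).
$file = Join-Path ([System.IO.Path]::GetTempPath()) 'ansi-sample.txt'
[System.IO.File]::WriteAllText($file, 'sample text', $ansiEncoding)

# PowerShell Core: pass the encoding object directly to -Encoding.
Get-Content -LiteralPath $file -Encoding $ansiEncoding

# With the enumeration value proposed in this issue, the call would shrink to:
#   Get-Content -LiteralPath $file -Encoding ANSI
```

The commented-out last line shows the proposed form only; whether it works depends on the PowerShell version in use.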