[“Texas” as a cute name has been changed to the less offensive and more prosaic “ASCII”]
Perl 6 is more than Unicode “aware”—it reaches into the Unicode Character Database to use appropriate and meaningful characters. There use to be a joke that the only thing that kept Perl 5 from expanding was the lack of unused punctuation keys. That’s not a problem anymore.
But, for most Unicode thingys, there is an ASCII version. That version is probably multiple characters, so it’s larger than the Unicode version. And, since it’s a larger, super-sized version, we’ll call it the “Texas” version. The Perl 6 docs map the Unicode to ASCII versions.
I wrote a small command-line program to convert between the two. It’s not very sophisticated and I plan on improving it later. I’d especially like to look up things based on what they do. For instance, search for “subset” and get subset operators. Another data column with a description would be nice. And, reading all this from a file. Although I’ve taken the data from a single page in the Perl 6 docs, there are many other things I could add (such as the quoting stuff). I’ve also saved this in my unicode2texas.p6 gist.
use v6; # https://docs.perl6.org/language/unicode_texas#Other_acceptable_single_codepoints # https://gist.github.com/briandfoy/619638701f38817387f3e5d22d6dab12 my %hash; BEGIN { # Phasers on! my @table = qw{ « U+00AB << » U+00BB >> × U+00D7 * ÷ U+00F7 / − U+2212 - ∘ U+2218 o ≅ U+2245 =~= π U+03C0 pi τ U+03C4 tau U+1D452 e ∞ U+221E Inf … U+2026 ... ‘ U+2018 ' ’ U+2019 ' ‚ U+201A ' “ U+201C " ” U+201D " „ U+201E " 「 U+FF62 Q// 」 U+FF63 Q// ⁺ U+207A + ⁻ U+207B - ¯ U+00AF - ⁰ U+2070 **0 ¹ U+2071 **1 ² U+2072 **2 ³ U+2073 **3 ⁴ U+2074 **4 ⁵ U+2075 **5 ⁶ U+2076 **6 ⁷ U+2077 **7 ⁸ U+2078 **8 ⁹ U+2079 **9 ∘ U+2218 o ∅ U+2205 set() ∈ U+2208 (elem) ∉ U+2209 !(elem) ∋ U+220B (cont) ∌ U+220C !(cont) ⊆ U+2286 (<=) ⊈ U+2288 !(<=) ⊂ U+2282 (<) ⊄ U+2284 !(<) ⊇ U+2287 (>=) ⊉ U+2289 !(>=) ⊃ U+2283 (>) ⊅ U+2285 !(>) ≼ U+227C (<+) ≽ U+227D (>+) ∪ U+222A (|) ∩ U+2229 (&) ∖ U+2216 (-) ⊖ U+2296 (^) ⊍ U+228D (.) ⊎ U+228E (+) }; while ( @table #`(I could also read from a file) ) { # I'd really like a @table.shift(3); my ( $unicode, $codepoint, $texas ) = @table.splice( 0, 3 ); # I don't particularly like the two way hash here. %hash{ $unicode, $texas } = %( unicode => $unicode, texas => $texas, codepoint => $codepoint ) xx *; # list replication } } # I don't really need a multi here, but I'm playing with it anyway. # multi implies sub, so I could have said "multi sub MAIN" multi MAIN( Str $s where { %hash{$_}:exists and .substr(0, 1).ord > 0xAA } ) { say "Running the Unicode version for $s"; show( %hash{$s} ); } multi MAIN( Str $s where { %hash{$_}:exists and .substr(0, 1).ord <= 0xAA } ) { say "Running the Texas version for $s"; show( %hash{$s} ); } sub show ( %h ) { say sprintf "Unicode: %s\nTexas: %s", %h<unicode texas>, }
Some notes on the program, which I didn’t spend much time thinking about. In the while
I put an embedded comment #`(...)
. I think I want to create a data file to hold all of this.
I use a multi to define the same subroutine name several times but with different signatures. Inside each, I specify that they take a Str but I constrain them with where. Those code blocks get the parameter as $_
(or some other ways). That $_
is the default “topic”,. When I neglect to type out the object in .substr(0, 1)
, Perl 6 uses the topic.
And, I do a few hash slices here, but never change the sigil.
This utility works great in a shell that can handle Unicode!
You can also use a `for` loop to iterate over that array, by the way:
Is there some direct way to convert the Texas representation to the Unicode code point? I have tried quoting + .perl like this:
but does not work, obviously.
Are you trying to let P6 parse the code you wrote with Texas quotes and then deparse it with the fancier versions? I don’t know of a way to do that.