Thursday, January 28, 2010
Perl UTF8 to DEC
This reads the XML lines loaded in @xmld and builds $jxmld, the same XML with every UTF-8 character converted to a decimal character reference (&#NNN;).
It is meant for systems that cannot store UTF-8 character sets.
my $jxmld = "";    # accumulates the converted output
foreach my $line (@xmld)
{
    # One pass can miss a sequence when a match starts on the wrong byte,
    # so repeat the conversion, at most five times.
    my $loopc = 0;
    while ($line =~ /[\x{80}-\x{FFFF}]/ || $line =~ /\d{3}_=_/) {
        $line = utf8todec($line);
        $loopc++;
        last if $loopc > 4;
    }
    # Any value that is still tagged is emitted as a character reference as-is.
    $line =~ s/(\d{3,})_=_/&#$1;/g;
    if ($line ne "") {
        # Insert a space when the output so far does not end in whitespace.
        if ($jxmld !~ /\s$/ && $line !~ /.\s/ && $jxmld ne "") {
            $jxmld .= " $line";
        } else {
            $jxmld .= $line;
        }
    }
}
sub utf8todec
{
    my $u_st = shift;
    my (@u_ar, $u_c1, $u_c2, $u_c3, $u_cs, $u_ch);
    # Tag every non-ASCII byte (or character) with its decimal value,
    # e.g. byte 233 becomes "233_=_".
    $u_st =~ s/([\x{80}-\x{FFFF}])/ord($1).'_=_'/gse;
    # Three-byte sequences: lead byte 224-239 (0xE0-0xEF).
    if (@u_ar = ($u_st =~ m/\d{3}_=_\d{3}_=_\d{3}_=_/g)) {
        foreach $u_cs (@u_ar) {
            if ($u_cs =~ m/(\d{3})_=_(\d{3})_=_(\d{3})_=_/) {
                ($u_c1, $u_c2, $u_c3) = ($1, $2, $3);
                if ($u_c1 >= 224 && $u_c1 <= 239) {
                    $u_ch = ($u_c1-224)*64*64 + ($u_c2-128)*64 + ($u_c3-128);
                    $u_st =~ s/\Q$u_cs\E/&#$u_ch;/g;
                }
            }
        }
    }
    # Two-byte sequences: lead byte 192-223 (0xC0-0xDF).
    if (@u_ar = ($u_st =~ m/\d{3}_=_\d{3}_=_/g)) {
        foreach $u_cs (@u_ar) {
            if ($u_cs =~ m/(\d{3})_=_(\d{3})_=_/) {
                ($u_c1, $u_c2) = ($1, $2);
                if ($u_c1 >= 192 && $u_c1 <= 223) {
                    $u_ch = ($u_c1-192)*64 + ($u_c2-128);
                    $u_st =~ s/\Q$u_cs\E/&#$u_ch;/g;
                }
            }
        }
    }
    # Four-byte sequences (lead byte 240-247) are not handled.
    return $u_st;
}
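The arithmetic inside utf8todec can be checked by hand. The sketch below is not part of the original script: it repeats the same two formulas in shell, and utf8_bytes_to_codepoint is a hypothetical helper name chosen only for this illustration. It takes the decimal values of the bytes of one sequence and prints the code point.

```shell
#!/bin/sh
# Hypothetical helper: decode the decimal byte values of one 2- or 3-byte
# UTF-8 sequence into a code point, using the same formulas as utf8todec.
utf8_bytes_to_codepoint() {
    if [ "$#" -eq 3 ] && [ "$1" -ge 224 ] && [ "$1" -le 239 ]; then
        # Three bytes: 4 payload bits, then 6, then 6.
        echo $(( ($1 - 224) * 64 * 64 + ($2 - 128) * 64 + ($3 - 128) ))
    elif [ "$#" -eq 2 ] && [ "$1" -ge 192 ] && [ "$1" -le 223 ]; then
        # Two bytes: 5 payload bits, then 6.
        echo $(( ($1 - 192) * 64 + ($2 - 128) ))
    else
        echo "unsupported sequence: $*" >&2
        return 1
    fi
}

utf8_bytes_to_codepoint 226 130 172   # euro sign, bytes E2 82 AC, prints 8364
utf8_bytes_to_codepoint 195 169       # e with acute, bytes C3 A9, prints 233
```

So the euro sign ends up in the XML as &#8364; and the accented e as &#233;.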
Tuesday, January 19, 2010
BASH
for loops: 1 to 10
for i in $(seq 1 10); do
    echo "$i"
done
for loops: 1 to 10, using brace expansion
for i in {1..10}; do
    echo "$i"
done
for loops: A to Z
for i in {A..Z}; do
    echo "$i"
done
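Brace expansion happens before variable expansion, so {1..$n} does not turn into a range when the upper bound is held in a variable. A counting loop is one way around that. This is a minimal sketch, and count_to is a hypothetical helper name used only here:

```shell
#!/bin/sh
# Print the integers from 1 to the bound given as the first argument.
count_to() {
    n=$1
    i=1
    while [ "$i" -le "$n" ]; do
        echo "$i"
        i=$((i + 1))
    done
}

count_to 3
```

In bash the same loop can also be written as for ((i = 1; i <= n; i++)), and seq 1 "$n" accepts a variable bound as well.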
Monday, January 18, 2010
File differences
diff -BNarq
The options I find most useful when comparing files (-b and -w are not in the command above; add them when whitespace differences do not matter):
-b  --ignore-space-change  Ignore changes in the amount of white space.
-w  --ignore-all-space     Ignore all white space.
-B  --ignore-blank-lines   Ignore changes whose lines are all blank.
-a  --text                 Treat all files as text.
-r  --recursive            Recursively compare any subdirectories found.
-N  --new-file             Treat absent files as empty.
-q  --brief                Output only whether files differ.
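If the sources are checked out from svn, I would first remove all the .svn directories and then run the diff.
Do this on a copy of the checkout, because it deletes the working-copy metadata:
find . -type d -iname ".svn" -prune -exec rm -frv '{}' \;
Here -prune stops find from trying to descend into a directory it has just deleted.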
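An alternative that leaves the checkout untouched is to have diff skip the metadata itself. The sketch below assumes GNU diff, whose -x (--exclude) option ignores files and directories whose base name matches the pattern. The two directory trees are throwaway examples built only so the commands can be run as-is.

```shell
#!/bin/sh
# Build two small trees, each with a .svn directory that differs.
work=$(mktemp -d)
mkdir -p "$work/old/.svn" "$work/new/.svn"
echo "same"  > "$work/old/a.txt"
echo "same"  > "$work/new/a.txt"
echo "one"   > "$work/old/b.txt"
echo "two"   > "$work/new/b.txt"
echo "meta1" > "$work/old/.svn/entries"
echo "meta2" > "$work/new/.svn/entries"

# -x .svn skips anything named .svn; diff exits with status 1 when the
# trees differ, hence the "|| true".
result=$(diff -BNarq -x .svn "$work/old" "$work/new") || true
echo "$result"

rm -rf "$work"
```

Only b.txt is reported, even though the .svn/entries files differ too.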
SED Tips
Sed: Search text and display
sed -n '/404 Not Found/,/405 Method Not Allowed/p' rfc2616.txt
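This prints every block of rfc2616.txt that starts at a line matching 404 Not Found and ends at the next line matching 405 Method Not Allowed. In the output below the range matches twice: once in the table of contents and once at the section itself.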
source: http://www.faqs.org/rfcs/rfc2616.txt
sed -n '/404 Not Found/,/405 Method Not Allowed/p' rfc2616.txt
10.4.5 404 Not Found ...........................................66
10.4.6 405 Method Not Allowed ..................................66
10.4.5 404 Not Found
The server has not found anything matching the Request-URI. No
indication is given of whether the condition is temporary or
permanent. The 410 (Gone) status code SHOULD be used if the server
knows, through some internally configurable mechanism, that an old
resource is permanently unavailable and has no forwarding address.
This status code is commonly used when the server does not wish to
reveal exactly why the request has been refused, or when no other
response is applicable.
10.4.6 405 Method Not Allowed
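The same behaviour is easier to see on a small file. This sketch uses a made-up sample with START and END markers (not from the RFC) to show that sed reopens the range each time the first pattern matches again:

```shell
#!/bin/sh
# Write a sample file containing two START..END blocks.
sample=$(mktemp)
cat > "$sample" <<'EOF'
intro
START first
body one
END first
between
START second
body two
END second
outro
EOF

# -n suppresses the default output; p prints only lines inside a range.
ranges=$(sed -n '/START/,/END/p' "$sample")
echo "$ranges"

rm -f "$sample"
```

The lines intro, between and outro are left out; both blocks are printed in full, including their START and END lines.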