split_and_decode converts the escaped var=val pair inputs text to a corresponding vector of string elements. If the optional third argument is a string of less than three characters, then does only the decoding (but no splitting) and returns back a string.
split_and_decode("Tulipas=Taloon+kumi=kala&Joka=haisi +pahalle&kuin&%E4lymystöporkkana=ilman ruuvausta",1) produces a vector: ("TULIPAS" "Taloon kumi=kala" "JOKA" "haisi pahalle" "KUIN" NULL "ÄLYMYSTÖPORKKANA" "ilman ruuvausta")
split_and_decode(NULL) => NULL split_and_decode("") => NULL split_and_decode("A") => ("A" NULL) split_and_decode("A=B") => ("A" "B") split_and_decode("A&B") => ("A" NULL "B" NULL) split_and_decode("=") => ("" "") split_and_decode("&") => ("" NULL "" NULL) split_and_decode("&=") => ("" NULL "" "") split_and_decode("&=&") => ("" NULL "" "" "" NULL) split_and_decode("%") => ("%" NULL) split_and_decode("%%") => ("%" NULL) split_and_decode("%41") => ("A" NULL) split_and_decode("%4") => ("%4" NULL) split_and_decode("%?41") => ("%?41" NULL)
Can also work like Perl's split function (we define the escape prefix and space escape character as NUL-characters, so that they will not be encountered at all:
split_and_decode('Un,dos,tres',0,'\0\0,') => ("Un" "dos" "tres") split_and_decode("Un,dos,tres",1,'\0\0,') => ("UN" "DOS" "TRES") split_and_decode("Un,dos,tres",2,'\0\0,') => ("un" "dos" "tres")
Can also be used as replace and ucase (or lcase) together, for example, here we use the comma as space-escape instead of element-separator: (not recommended, use replace and ucase instead.
split_and_decode("Un,dos,tres",0,'\0,') => "Un dos tres" split_and_decode("Un,dos,tres",1,'\0,') => "UN DOS TRES"
Can be also used for decoding (some of) MIME-encoded mail-headers:
split_and_decode('=?ISO-8859-1?Q?Tiira_lent=E4=E4_taas?=',0,'=_') => "=?ISO-8859-1?Q?Tiira lentää taas?=" split_and_decode('Message-Id: <199511141036.LAA06462@correo.unet.ar>\n From: "=?ISO-8859-1?Q?Jorge_Mo=F1as?=" <jorgem@unet.ar>\n To: "Jore Carvajal" <carvajal@wanabee.fr>\nSubject: RE: Um-pah-pah\n Date: Wed, 12 Nov 1997 11:28:51 +0100\n X-MSMail-Priority: Normal\nX-Priority: 3\n X-Mailer: Molosoft Internet Mail 4.70.1161\nMIME-Version: 1.0\n Content-Type: text/plain; charset=ISO-8859-1\n Content-Transfer-Encoding: 8bit\nX-Mozilla-Status: 0011', 1,'=_\n:'); => ('MESSAGE-ID' ' <199511141036.LAA06462@correo.unet.ar>' 'FROM' ' "=?ISO-8859-1?Q?Jorge Moñas?=" <jorgem@unet.ar>' 'TO' ' "Jore Carvajal" <carvajal@wanabee.fr>' 'SUBJECT' ' RE: Um-pah-pah' 'DATE' ' Wed, 12 Nov 1997 11:28:51 +0100' 'X-MSMAIL-PRIORITY' ' Normal' 'X-PRIORITY' ' 3' 'X-MAILER' ' Molosoft Internet Mail 4.70.1161' 'MIME-VERSION' ' 1.0' 'CONTENT-TYPE' ' text/plain; charset=ISO-8859-1' 'CONTENT-TRANSFER-ENCODING' ' 8bit' 'X-MOZILLA-STATUS' ' 0011')
Same, but let's use space, not colon as a variable=value separator:
split_and_decode('Message-Id: <199511141036.LAA06462@correo.unet.ar>\n From: "=?ISO-8859-1?Q?Jorge_Mo=F1as?=" <jorgem@unet.ar>\n To: "Jore Carvajal" <carvajal@wanabee.fr>\nSubject: RE: Um-pah-pah\n Date: Wed, 12 Nov 1997 11:28:51 +0100\n X-MSMail-Priority: Normal\nX-Priority: 3\n X-Mailer: Molosoft Internet Mail 4.70.1161\nMIME-Version: 1.0\n Content-Type: text/plain; charset=ISO-8859-1\n Content-Transfer-Encoding: 8bit\nX-Mozilla-Status: 0011', 1,'=_\n ') => ('MESSAGE-ID:' '<199511141036.LAA06462@correo.unet.ar>' 'FROM:' '"=?ISO-8859-1?Q?Jorge Moñas?=" <jorgem@unet.ar>' 'TO:' '"Jore Carvajal" <carvajal@wanabee.fr>' 'SUBJECT:' 'RE: Um-pah-pah' 'DATE:' 'Wed, 12 Nov 1997 11:28:51 +0100' 'X-MSMAIL-PRIORITY:' 'Normal' 'X-PRIORITY:' '3' 'X-MAILER:' 'Molosoft Internet Mail 4.70.1161' 'MIME-VERSION:' '1.0' 'CONTENT-TYPE:' 'text/plain; charset=ISO-8859-1' 'CONTENT-TRANSFER-ENCODING:' '8bit' 'X-MOZILLA-STATUS:' '0011')
Of course this approach does not work with multiline headers, except somewhat kludgously. If the lines are separated by CR+LF, there is left one trailing CR at the end of each value part string.