created by Brian Leroux & Andrew Lunny. sporadically uncurated by David Trejo.

regular expression and slash

jan 29, 2014

When I use regular expressions and want to validate a range of letters, I can use a-z or A-Z. Even A-z seems to work fine. The problem shows up when I run some tests:

  /[A-Z]/.test("A"); // true
  /[A-Z]/.test("b"); // false
  /[A-Z]/.test("Z"); // true
  /[A-Z]/.test("z"); // false
  /[a-z]/.test("a"); // true
  /[a-z]/.test("A"); // false
  /[a-z]/.test("z"); // true
  /[a-z]/.test("Z"); // false

The weird thing comes when I do this test:

  /[A-z]/.test("A"); // true
  /[A-z]/.test("a"); // true
  /[A-z]/.test("Z"); // true
  /[A-z]/.test("z"); // true
  /[A-z]/.test("m"); // true
  /[A-z]/.test("D"); // true
  /[A-z]/.test("\\"); // true WTF?

It's supposed to accept only letters from A to Z and a to z. Can someone explain this?

@byoigres

I had a look into this with the following code:

  var re = /[A-z]/g,s=(function(){
    var f = String.fromCharCode;
    for(var i=0;i<6000;i++) f=f.bind(0, i);
    return f();
  })(),q,z=[];while((q=re.exec(s)) != null) z.push(q[0]);z

It returns

  ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O",
  "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "[", "\", "]", "^",
  "_", "`", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
  "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]

I think it is likely that A-z literally means 'any character between "A" and "z"' in Unicode code-point order, or at least charCode order. That is also why a range like /[ -y]/g is accepted (this is standard behaviour, not an engine quirk):

  var re = /[ -y]/g,s=(function(){
    var f = String.fromCharCode;
    for(var i=0;i<6000;i++) f=f.bind(0, i);
    return f();
  })(),q,z=[];while((q=re.exec(s)) != null) z.push(q[0]);z

Which returns

  [" ", "!", """, "#", "$", "%", "&", "'", "(", ")", "*", "+", ",", "-", ".",
  "/", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ":", ";", "<", "=",
  ">", "?", "@", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L",
  "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "[",
  "\", "]", "^", "_", "`", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j",
  "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y"]
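The code-point explanation can be checked more directly. This is only a sketch: matchedAscii is a helper name made up for this example, and it only looks at char codes 0 to 127.

```javascript
// Hypothetical helper: collect every character with char code 0..127
// that the given pattern matches.
function matchedAscii(re) {
  var out = [];
  for (var code = 0; code < 128; code++) {
    var ch = String.fromCharCode(code);
    if (re.test(ch)) out.push(ch);
  }
  return out;
}

console.log('A'.charCodeAt(0), 'Z'.charCodeAt(0)); // 65 90
console.log('a'.charCodeAt(0), 'z'.charCodeAt(0)); // 97 122

// 65..122 is 58 characters: 52 letters plus the 6 that sit between Z and a.
console.log(matchedAscii(/[A-z]/).length);    // 58
console.log(matchedAscii(/[A-Za-z]/).length); // 52
```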

This probably has some security implications: if you're using [A-z] to sanitise something, you'll also accept the six characters that sit between "Z" and "a", namely [ \ ] ^ _ and `.
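A minimal sketch of the sanitising problem, assuming the intent was "ASCII letters only". The variable names and the sample input are invented for illustration.

```javascript
var loose = /^[A-z]+$/;     // accidentally also allows [ \ ] ^ _ `
var strict = /^[A-Za-z]+$/; // two explicit ranges, letters only

var input = 'foo[bar]_baz'; // made-up input containing no-letter characters

console.log(loose.test(input));  // true
console.log(strict.test(input)); // false
console.log(strict.test('fooBAR')); // true
```

Spelling out both ranges is a little longer, but it says exactly what is allowed.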

A very interesting find!

zemnmez


The Hungry Variable

dec 19, 2013

Consider this JS:

  x = /[/ + "javascript"[0] + '///'

What do you expect the value of x to be?

Those well-versed in JavaScript's concatenation may either reject the statement or perhaps say:

  "/[/j///"

Chromium's console, however, says:

  x = /[/ + "javascript"[0] + '///'
  /[/ + "javascript"[0] + '/

Those who don't immediately see through this may notice a few things.

  • Like /(/, /[/ throws an error.
  • "javascript"[0] returns "j" as expected.
  • '///' returns "///" as expected.

What should be noted is that the errors for /(/ and /[/ are subtly different. In Chromium console:

  /(/
  //SyntaxError: Invalid regular expression: /(/: Unterminated group

  /[/
  //SyntaxError: Invalid regular expression: missing /

It seems that JavaScript cannot see the second / for some reason. This is because a forward slash inside a character class (/[any letter]/) doesn't need escaping, so /[/]/ is perfectly acceptable and the same as /\//.
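A quick way to convince yourself of this (the variable names are just for this sketch):

```javascript
var inClass = /[/]/; // unescaped slash inside a character class
var escaped = /\//;  // escaped slash outside one

console.log(inClass.test('a/b')); // true
console.log(escaped.test('a/b')); // true
console.log(inClass.test('ab'));  // false
console.log(escaped.test('ab'));  // false
```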

If the code block were syntax highlighted, you might notice that the highlighter is just as confused as you are. Only GitHub's highlighter seems to get this right.

Thus, the character class extends across the apparent string concatenation and terminates at the "]" of [0] (character classes can't nest, so the second "[" is just a literal).

The regex then runs on to the slash after the typewriter single quote, and so the remaining //' is simply a comment.
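You can check that x really is a regex and not a string. In this sketch the test string "j  '" is made up: it is one character from the class, then spaces (the + after the class has become a quantifier on the space before it), then a single quote.

```javascript
// Everything up to the first slash after the single quote is one regex
// literal; the rest of the line (//') is a comment.
var x = /[/ + "javascript"[0] + '///'

console.log(typeof x === 'string'); // false
console.log(x instanceof RegExp);   // true
console.log(x.test("j  '"));        // true
```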

If you want the code to do what you expected, add a single backslash before the first "[":

  x = /\[/ + "javascript"[0] + '///'
  "/\[/j///"

  //for comparison
  x = /[/ + "javascript"[0] + '///'
  /[/ + "javascript"[0] + '/

It is an interesting example of a statement that can completely change meaning with the insertion of one character without creating any errors.

@zemnmez


charAt is not the same as []

dec 15, 2013

In case someone tells you it doesn't matter how you access characters in strings, they're wrong:


  'hello'[1]          // 'e'
  'hello'.charAt(1)   // 'e'

  'hello'[-1]         // undefined
  'hello'.charAt(-1)  // ''

What better "character" than the empty string to say "no such index"?
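The difference bites as soon as the result is used. A small sketch (the variable s is just for illustration):

```javascript
var s = 'hello';

// Out of range at the other end behaves the same way.
console.log(typeof s[9]);        // "undefined"
console.log(typeof s.charAt(9)); // "string"

// Concatenation turns the difference into visible output.
console.log('x' + s[-1]);        // "xundefined"
console.log('x' + s.charAt(-1)); // "x"
```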

@rtoal


Local storage limitations

oct 7, 2013

The local storage functionality in browsers is a bit limited and this can lead to some rather surprising behaviour.

  localStorage[0] = false;

  if (localStorage[0]) {
      console.log('wtf'); // runs?!
  }

When checking the value stored in localStorage, it appears that the boolean was silently converted to the string "false", which is truthy.

Turns out that this is one of those cases where it pays off to carefully read the specification, which states that local storage only accepts string values!

If you want to store an object or other type of value, you can serialize the data with JSON.stringify and load it again with JSON.parse.
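A minimal sketch of that round trip. saveValue, loadValue and makeFakeStorage are hypothetical names invented for this example. The fake storage is only there so the snippet also runs outside a browser; in a page you would pass window.localStorage instead.

```javascript
// Hypothetical helpers: work with any Storage-like object that has
// getItem/setItem. Values must be JSON-serializable.
function saveValue(storage, key, value) {
  storage.setItem(key, JSON.stringify(value));
}

function loadValue(storage, key) {
  var raw = storage.getItem(key);
  return raw === null ? undefined : JSON.parse(raw);
}

// In-memory stand-in (an assumption for this sketch). Like real local
// storage, it coerces every stored value to a string.
function makeFakeStorage() {
  var data = {};
  return {
    setItem: function (k, v) { data[k] = String(v); },
    getItem: function (k) {
      return Object.prototype.hasOwnProperty.call(data, k) ? data[k] : null;
    }
  };
}

var store = makeFakeStorage();

store.setItem('raw', false);
console.log(store.getItem('raw'));     // "false", a truthy string

saveValue(store, 'flag', false);
console.log(loadValue(store, 'flag')); // false, a real boolean again
```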

@Overv of http://while.io

