Wilco Bauwer brought this article to my attention and as I'm always up for a good challenge I decided to give it a try. As I like regular expressions the first thing I decided was that I would use regular expressions to split the strings within the array instead of string methods and tokens. Now that that was out of the way I still needed to find a way to compare e.g. "10" and "5" so that "5" takes precedence. Obvious solution was to pad the strings with "0" so they would equal in length - i.e. the strings in the previous example would come "10" and "05" respectively. Enough of this chit chat as it's time to let the code speak for itself:
private static int CompareNaturally(string a, string b) { MatchCollection m1 = Regex.Matches(a ?? "", @"\d+|[^\d]+"); MatchCollection m2 = Regex.Matches(b ?? "", @"\d+|[^\d]+"); for (int i = 0, c = 0; i < Math.Max(m1.Count, m2.Count); i++) { string s1 = i < m1.Count ? m1[i].Value : ""; string s2 = i < m2.Count ? m2[i].Value : ""; if (s1.Length > 0 && s2.Length > 0 && Char.IsDigit(s1[0]) && Char.IsDigit(s2[0])) { int len = Math.Max(s1.Length, s2.Length); c = s1.PadLeft(len, '0').CompareTo(s2.PadLeft(len, '0')); } else c = s1.CompareTo(s2); if (c != 0) return c; } return 0; }
To use the above method for sorting an array one would simply use Array.Sort(myStringArray, CompareNaturally); where myStringArray is of course an array of strings. Wilco recommended I should use Regex.Replace() and MatchEvaluator to make it even shorter. So, here is a one-liner (okay, it's split on multiple lines but just for easier reading):
private static int CompareNaturally_Short(string a, string b) { return Regex.Replace(a ?? "", @"\d+|[^\d]+", delegate(Match m) { return Char.IsDigit(m.Value[0]) ? m.Value.PadLeft( Math.Max(a.Length, b != null ? b.Length : 0), '0') : m.Value; }).CompareTo(Regex.Replace(b, @"\d+|[^\d]+", delegate(Match m) { return Char.IsDigit(m.Value[0]) ? m.Value.PadLeft( Math.Max(a != null ? a.Length : 0, b.Length), '0') : m.Value; })); }The idea behind this shorter version is pretty much the same with the exception that the MatchEvaluator takes care of padding the numeric strings. Don't hesitate contacting me or dropping a comment if you spot errors in either of the methods - Unfortunately I really didn't have time to test them thoroughly with different type of inputs.