Molecule to atoms

同学发过来一道有意思的题

For a given chemical formula represented by a string, count the number of atoms of each element contained in the molecule and return an object (associative array in PHP, Dictionary<string, int> in C#, Map<String,Integer> in Java).

For example:

1
2
3
4
5
6
7
8
var water = 'H2O';
parseMolecule(water); // return {H: 2, O: 1}
var magnesiumHydroxide = 'Mg(OH)2';
parseMolecule(magnesiumHydroxide); // return {Mg: 1, O: 2, H: 2}
var fremySalt = 'K4[ON(SO3)2]2';
parseMolecule(fremySalt); // return {K: 4, O: 14, N: 2, S: 4}

As you can see, some formulas have brackets in them. The index outside the brackets tells you that you have to multiply count of each atom inside the bracket on this index. For example, in Fe(NO3)2 you have one iron atom, two nitrogen atoms and six oxygen atoms.

Note that brackets may be round, square or curly and can also be nested. Index after the braces is optional.

一句话描述:统计化学式中出现各个原子的个数.

不含括号的化学式很好解析,带上括号只要依次处理括号里的部分,乘以括号后的倍数就完成了.而括号里的部分又可以是一个完整的化学式.完美的分治.

K4[ON(SO3)2]2为例.首先解析K4,接着解析[ON(SO3)2]2,碰到括号解析字串ON(SO3)2,对应个数乘2再合并即可.ON(SO3)2也一样,解析出O,N后继续解析括号里的(SO3)2即可.

解析字符串一般可以转化成解析上下文无关文法,答案就在生成的AST中.

所有的括号{},[]替换为()后均不影响结果.这里就偷个懒

生成式:

1
2
3
4
5
Number: [1-9][0-9]*
Atom:[A-Z][a-z]*
Term: Atom Number
Molscule: TermTerm*
| Term*(Molscule)Number?Term*

  • 原子以大写字母开头,后跟任意个小写字母
  • 化学式的一项是一个原子后跟一个分子.
  • 分子式可以由至少一个项构成
  • 分子式可以由任意项中间插入左括号(,另一个分子式右括号)和一个可选的数字构成.

由生成式就可以动手写解析器啦.首先对每一个可能的分支都写出一个完整的解析函数,在无法确定接下来是哪个分支的时候使用超前查看来实现分支预测.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
class Program
{
static void Main(string[] args)
{
Console.WriteLine("Please enter a chemical");
Console.WriteLine("Default: K4(ON(SO3)2)2");
string source = "K4(ON(SO3)2)2";
var str = Console.ReadLine();
str = string.IsNullOrWhiteSpace(str) ? source : str;
var dict = Parse(str, 0, 1);
dict.ToList().ForEach(p => Console.WriteLine($"{p.Key}:{p.Value}"));
Console.WriteLine("Press any key to continue");
Console.ReadLine();
}
static bool IsUpper(char c)
{
return c >= 'A' && c <= 'Z';
}
static bool IsLower(char c)
{
return c >= 'a' && c <= 'z';
}
static bool IsNumber(char c)
{
return c >= '0' && c <= '9';
}
static int GetNextElemIndex(string str, int start)
{
if (start >= str.Length) return -1;
if (!IsUpper(str[start]))
{
return -1;
}
for (int i = start + 1; i < str.Length; i++)
{
if (!IsLower(str[i]) && !IsUpper(str[i])) return i - 1;
if (IsLower(str[i]))
{
continue;
}
if(IsUpper(str[i]))
{
return i - 1;
}
}
return str.Length - 1;
}
static int GetNextNumber(string str,int start)
{
if (start >= str.Length) return -1;
if (!IsNumber(str[start])) return -1;
for(int i=start;i<str.Length;i++)
{
if (!IsNumber(str[i]))
return i - 1;
}
if (IsNumber(str.Last())) return str.Length - 1;
else return -1;
}
static int GetRightParin(string str,int start)
{
for(int i=start;i<str.Length;i++)
{
if (str[i] == ')') return i;
}
return -1;
}
static Dictionary<string, int> Parse(string str,int startIndex,int multiplex)
{
Dictionary<string, int> dict = new Dictionary<string, int>();
while (true)
{
if (startIndex == str.Length)
{
return dict;
}
var elem = GetNextElemIndex(str, startIndex);
if (elem!=-1)
{
// N个微元的情况,直接解析
var name = str.Substring(startIndex, elem - startIndex + 1);
int value = 1;
startIndex = elem + 1;
var num = GetNextNumber(str, startIndex);
if(num!=-1)
{
value = int.Parse(str.Substring(startIndex, num - startIndex + 1));
startIndex = num + 1;
}
if (!dict.ContainsKey(name)) dict[name] = 0;
dict[name] += value;
}
else if(str[startIndex]=='(')
{
// 碰到括号,解析子化学式
var right = GetRightParin(str, startIndex + 1)+1;
var nextNum = GetNextNumber(str, right);
//子化学式的倍数
int mul = int.Parse(str.Substring(nextNum, nextNum - right + 1));
var recursion = Parse(str, startIndex + 1, mul);
startIndex = nextNum + 1;
//合并字典
recursion.Keys.ToList().ForEach(k => { if (!dict.ContainsKey(k)) dict[k] = 0; });
recursion.ToList().ForEach(p => dict[p.Key] += p.Value);
} else if(str[startIndex]==')')
{
// 返回
return dict.ToDictionary(i => i.Key, i => i.Value * multiplex);
}
}
}
}