HackerRank/Dynamic Programming/Substring Diff

Problem Summary

Given two strings of length N (P and Q) and an integer S, find the maximum of L such that there exists a pair of indices(i,j) for which we have M(i,j,L) ≤ S.

M(i,j,L) refers to the size of the set {0 ≤ x < L | p[i+x] ≠ q[j+x]}.

Solution 1

First, let
    f[i][j] = M(0,j-i,i), i ≤ j
    f[i][j] = M(i-j,0,j), i > j

We can use dynamic programming to find f[i][j] for all i and j. With the help of f, for any len, 1 ≤ len ≤ N, we only need O(len^2) time to find whether it is legal.

Now we can use binary search to find the answer. The total time complexity is O(N^2 × (1 + log S)).

Solution 2

For any gap, 0 ≤ gap < N, we consider two sets of substrings :

  1. substrings ending at en in string P and substrings ending at en+gap in string Q.

  2. substrings ending at en+gap in string P and substrings ending at en in string Q.

By enumerating gap and en, we can cover all situations.

For each gap, we maintain two values, st1 and st2, as the mininal starting positions for two kinds of substrings.

Theoretically, the time complexity is O(N^2).

Code for Solution 1

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
#include <cstdio>
#include <iostream>
#include <cstring>
#include <algorithm>
#include <climits>
using namespace std;
const int maxn = 1502;
int n,lim;
char s1[maxn],s2[maxn];
int f[maxn][maxn];
bool check(int len) {
for (int i = 1; i <= n-len+1; ++i)
for (int j = 1; j <= n-len+1; ++j)
if (f[i+len-1][j+len-1] - f[i-1][j-1]<=lim)
return true;
return false;
}
int main()
{
int T;
scanf("%d",&T);
while(T) {
T--;
scanf("%d",&lim);
scanf("%s%s",s1+1,s2+1);
n = strlen(s1+1);
for (int i = 1; i <= n; ++i)
for (int j = 1; j <= n; ++j)
{
if (s1[i] == s2[j])
f[i][j] = f[i-1][j-1];
else
f[i][j] = f[i-1][j-1]+1;
}
int l = 0, r = n;
while (l<r) {
int mid = l + ((r-l)>>1) + 1;
if (check(mid)) {
l = mid;
}
else
r = mid-1;
}
printf("%d\n", l);
}
return 0;
}

Code for Solution 2

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
#include <cstdio>
#include <iostream>
#include <cstring>
#include <algorithm>
#include <climits>
using namespace std;
const int maxn = 1502;
int n,lim;
char s1[maxn],s2[maxn];
int f[maxn][maxn];
int main()
{
int T;
scanf("%d",&T);
while(T) {
T--;
scanf("%d",&lim);
scanf("%s%s",s1+1,s2+1);
n = strlen(s1+1);
for (int i = 1; i <= n; ++i)
for (int j = 1; j <= n; ++j)
{
if (s1[i] == s2[j])
f[i][j] = f[i-1][j-1];
else
f[i][j] = f[i-1][j-1]+1;
}
int ans = 0;
for (int gap = 0; gap < n; ++gap) {
int st1 = 0, st2 = 0;
for (int en = 1; en+gap <= n; ++en) {
while (st1 < en && f[en][en+gap] - f[st1][st1+gap] > lim)
st1++;
if (en-st1 > ans)
ans = en-st1;
while (st2 < en && f[en+gap][en] - f[st2+gap][st2] > lim)
st2++;
if (en-st2 > ans)
ans = en-st2;
}
}
printf("%d\n", ans);
}
return 0;
}