计算机数据结构

计算机

发布日期: 2025-02-27

更新日期: 2025-03-23

文章字数: 1.7k

阅读时长: 7 分

阅读次数:

串

实现思路

1. 成员变量

char* data: 存储字符串数据的字符数组。
size_t length: 字符串的长度（不包括结尾的 \0）。

2. 构造函数

默认构造函数：初始化一个空字符串。
构造函数：接收一个 const char* 类型的字符串，将其拷贝到 data 中并记录长度。
拷贝构造函数：通过拷贝其他字符串对象来创建新对象。
移动构造函数：通过转移资源来创建新对象，避免多余的内存分配和复制。
析构函数：释放 data 所占用的内存。

3. 赋值运算符重载

拷贝赋值：如果赋值的目标对象和当前对象不同，首先判断是否需要重新分配内存，然后逐个字符进行拷贝。
移动赋值：直接转移资源，避免不必要的内存分配。

4. 字符串连接 (`operator+`)

创建一个新的字符数组，将两个字符串的数据复制到其中，并返回新字符串对象。时间复杂度为 O(n + m)，其中 n 为当前字符串的长度，m 为要连接的字符串的长度。

5. 子串提取 (`substring`)

提取从 start 开始，长度为 len 的子串。如果 start 超出字符串的范围，返回空字符串；否则，复制子串并返回新的 String 对象。时间复杂度为 O(len)。

6. 查找子串 (`contains`)

遍历当前字符串，通过比较每个位置是否能匹配目标子串，如果找到匹配则返回 true，否则返回 false。时间复杂度为 O(n * m)，其中 n 是当前字符串长度，m 是目标子串的长度。

代码实现

#include <iostream>

class String {
private:
    char* data;
    size_t length;

    // 私有构造函数：直接使用现有字符数组，避免重复分配
    String(char* buffer, size_t len, bool) : data(buffer), length(len) {}

public:
    // 默认构造函数
    String() : data(new char[1]{'\0'}), length(0) {}

    // 构造函数
    String(const char* str) {
        length = 0;
        while (str[length] != '\0') length++;
        data = new char[length + 1];
        for (size_t i = 0; i < length; i++) {
            data[i] = str[i];
        }
        data[length] = '\0';
    }

    // 拷贝构造函数
    String(const String& other) {
        length = other.length;
        data = new char[length + 1];
        for (size_t i = 0; i < length; i++) {
            data[i] = other.data[i];
        }
        data[length] = '\0';
    }

    // 移动构造函数
    String(String&& other) noexcept : data(other.data), length(other.length) {
        other.data = nullptr;
        other.length = 0;
    }

    // 析构函数
    ~String() {
        delete[] data;
    }

    // 赋值运算符重载（拷贝）
    String& operator=(const String& other) {
        if (this != &other) {
            if (length != other.length) { // 只有在长度不同的时候才重新分配
                delete[] data;
                data = new char[other.length + 1];
            }
            length = other.length;
            for (size_t i = 0; i < length; i++) {
                data[i] = other.data[i];
            }
            data[length] = '\0';
        }
        return *this;
    }

    // 赋值运算符重载（移动）
    String& operator=(String&& other) noexcept {
        if (this != &other) {
            delete[] data;
            data = other.data;
            length = other.length;
            other.data = nullptr;
            other.length = 0;
        }
        return *this;
    }

    // 获取长度
    size_t size() const {
        return length;
    }

    // 串连接 (O(n + m))
    String operator+(const String& other) const {
        char* newData = new char[length + other.length + 1];
        for (size_t i = 0; i < length; i++) {
            newData[i] = data[i];
        }
        for (size_t i = 0; i < other.length; i++) {
            newData[length + i] = other.data[i];
        }
        newData[length + other.length] = '\0';
        return String(newData, length + other.length, true);
    }

    // 子串提取 (O(len))
    String substring(size_t start, size_t len) const {
        if (start >= length) return String("");
        len = (start + len > length) ? (length - start) : len;
        char* subData = new char[len + 1];
        for (size_t i = 0; i < len; i++) {
            subData[i] = data[start + i];
        }
        subData[len] = '\0';
        return String(subData, len, true);
    }

    // 查找是否包含某个子串 (O(n * m))
    bool contains(const String& subStr) const {
        if (subStr.length > length) return false;

        // 遍历当前字符串
        for (size_t i = 0; i <= length - subStr.length; i++) {
            size_t j = 0;
            // 比较子串
            while (j < subStr.length && data[i + j] == subStr.data[j]) {
                j++;
            }
            // 如果完全匹配，返回 true
            if (j == subStr.length) {
                return true;
            }
        }
        return false;
    }

    // 输出流重载
    void print() const {
        for (size_t i = 0; i < length; i++) {
            std::cout << data[i];
        }
        std::cout << std::endl;
    }
};

int main() {
    String s1("Hello");
    String s2("World");
    String s3 = s1 + s2;
    s3.print(); // HelloWorld

    String s4 = s3.substring(5, 5);
    s4.print(); // World

    // 检查是否包含子串
    String sub("World");
    if (s3.contains(sub)) {
        std::cout << "Found substring!" << std::endl; // Found substring!
    } else {
        std::cout << "Substring not found." << std::endl;
    }

    return 0;
}

优化子串查找：KMP 算法

KMP（Knuth-Morris-Pratt）算法是一种高效的字符串匹配算法，能够避免在匹配失败时回溯已匹配的字符，从而提高效率。它的时间复杂度为 O(n + m)，其中 n 是主串长度，m 是子串长度。

实现思路

KMP 算法的工作原理

构建部分匹配表（又叫“失配函数”或“前缀函数”）：
- 部分匹配表记录了每个位置之前的最长相同前缀和后缀的长度，用于指示在发生匹配失败时应该跳过多少字符。
字符串匹配：
- 通过比较主串和子串的字符，如果匹配失败，根据部分匹配表的信息跳过一些字符，避免重复比较。

代码实现

// KMP 算法子串查找
bool contains(const String& subStr) const {
    size_t m = subStr.length;
    size_t n = length;
    if (m == 0) return true;
    if (m > n) return false;

    // 构建部分匹配表
    size_t* lps = new size_t[m];  // 最长前缀后缀表
    lps[0] = 0;  // 初始值
    size_t j = 0;  // 前缀后缀长度
    for (size_t i = 1; i < m; i++) {
        while (j > 0 && subStr.data[i] != subStr.data[j]) {
            j = lps[j - 1];
        }
        if (subStr.data[i] == subStr.data[j]) {
            j++;
        }
        lps[i] = j;
    }

    // 主串匹配
    size_t i = 0;
    j = 0;
    while (i < n) {
        if (data[i] == subStr.data[j]) {
            i++;
            j++;
        }
        if (j == m) {
            delete[] lps;
            return true;  // 匹配成功
        } else if (i < n && data[i] != subStr.data[j]) {
            if (j != 0) {
                j = lps[j - 1];  // 使用部分匹配表跳过字符
            } else {
                i++;
            }
        }
    }

    delete[] lps;
    return false;  // 匹配失败
}

优化子串查找：Boyer-Moore 算法

Boyer-Moore 算法是一种基于字符的模式匹配算法，其通过启发式规则来跳过更多的字符，从而进一步提高匹配效率。它的时间复杂度在最佳情况下接近 O(n/m)，但最坏情况下是 O(n * m)。

实现思路

Boyer-Moore 算法的工作原理

坏字符规则：
- 如果在匹配过程中某个字符不匹配，根据主串中字符的位置，跳过一些字符，直到找到一个匹配的字符。
好后缀规则：
- 当出现不匹配时，检查当前的后缀部分，尝试通过已有的匹配信息跳过一些字符。

代码实现

// Boyer-Moore 算法子串查找
bool contains(const String& subStr) const {
    size_t m = subStr.length;
    size_t n = length;
    if (m == 0) return true;
    if (m > n) return false;

    // 创建坏字符表
    size_t* badChar = new size_t[256];  // ASCII 字符表
    for (size_t i = 0; i < 256; i++) {
        badChar[i] = m;  // 默认跳过全部字符
    }
    for (size_t i = 0; i < m - 1; i++) {
        badChar[(unsigned char)subStr.data[i]] = m - i - 1;
    }

    size_t i = 0;
    while (i <= n - m) {
        size_t j = m - 1;
        while (j >= 0 && data[i + j] == subStr.data[j]) {
            j--;
        }
        if (j == -1) {
            delete[] badChar;
            return true;  // 匹配成功
        } else {
            i += badChar[(unsigned char)data[i + m - 1]];
        }
    }

    delete[] badChar;
    return false;  // 匹配失败
}

Gustavo

https://gustavo1841.github.io/2025/02/27/ji-suan-ji/shu-ju-jie-gou/xian-xing-biao/chuan/

本博客所有文章除特別声明外，均采用 CC BY-NC 4.0 许可协议。转载请注明来源 Gustavo !

计算机数据结构

二叉树

2025-02-27 计算机

计算机数据结构

数组

2025-02-27 计算机

计算机数据结构

串

串

实现思路

1. 成员变量

2. 构造函数

3. 赋值运算符重载

4. 字符串连接 (operator+)

5. 子串提取 (substring)

6. 查找子串 (contains)

代码实现

优化子串查找：KMP 算法

实现思路

KMP 算法的工作原理

代码实现

优化子串查找：Boyer-Moore 算法

实现思路

Boyer-Moore 算法的工作原理

代码实现

4. 字符串连接 (`operator+`)

5. 子串提取 (`substring`)

6. 查找子串 (`contains`)