【面試現(xiàn)場】如何在500w個單詞中統(tǒng)計特定前綴的單詞有多少個？

鷹兔牛熊眼 2019-02-07

展開全文

小史是一個應屆生，雖然學的是電子專業(yè)，但是自己業(yè)余時間看了很多互聯(lián)網與編程方面的書，一心想進BAT互聯(lián)網公司。

今天小史去了一家在線英語培訓公司面試了。

簡單的自我介紹后，面試官給了小史一個問題。

【面試現(xiàn)場】

題目：我有500w個單詞，你幫忙設計一個數據結構來進行存儲，存好之后，我有兩個需求。

1、來了一個新的單詞，需要判斷是否在這500w個單詞中

2、來了一個單詞前綴，給出500w個單詞中有多少個單詞是該前綴

小史這次沒有不假思索就給出回答，他學會了深沉。

小史回憶起呂老師之前教他的bitmap算法。

小史心想：bitmap可以判斷一個數是否在40億個int32數中，其核心是每一個數映射成一個位，同時申請的bit位數覆蓋了整個int32的值域。

小史在紙上算了半天，終于開口了。

小史：好的，我用bitmap來做第一問。我把每一個字符串映射成一個位。比如，a是第1位，b是第2位，z是第26位，aa是第27位，ab是第28位，以此類推。英文一共26個字母，我算了一下，6個字符長度的單詞總共有26的6次方個，需要占26的6次方個位，大概300M。

小史：建立數據結構的時候，排序需要花掉nlg(n)，排序時字符串比較花掉m，時間一共mnlg(n)。查找的話用二分，就是mlg(n)了?？臻g是mn。

一分鐘過去了。

【請教大神】

回到學校，小史把面試情況和呂老師說了一下。

呂老師：你想想，a到z這26個字母中，可能只有a和i兩個是單詞，其他都不是，所以你的bitmap大量空間都被浪費了。這種情況你搞個hashset沒準還更省一點。

【樹形結構解難題】

小史：哦，這確實是節(jié)省了空間，如果要找單詞interest，那么就找根節(jié)點了，如果是找單詞interesting，那么就從根節(jié)點往下走，再把沿路的字母們都拼起來就行了。

（注：這里說的in不是單詞，指的是in不是500w單詞中的單詞）

呂老師還沒說完，小史就打斷了他。

找單詞interest：

找前綴為inter的所有單詞：

遍歷以前綴節(jié)點為根結點的一棵樹，就能統(tǒng)計出前綴為inter的所有單詞有多少個。

【字典樹】

小史：節(jié)點中增加一個變量用于計數，在添加節(jié)點的時候，就把相應的計數+1

理解了算法之后，小史的代碼寫起來也是非?？欤灰粫壕蛯懞昧耍海ㄊ謾C端代碼如果有問題，可以去PC看一下）

DictionaryTree.java

import java.util.HashMap;
import java.util.Map;

/**
 * @author xiaoshi on 2018/10/5.
 */
public class DictionaryTree {

    // 字典樹的節(jié)點
    private class Node {
        // 是否是單詞
        private boolean isWord;
        // 單詞計數
        private int count;
        // 字串
        private String str;
        // 子節(jié)點
        private Map<String, Node> childs;
        // 父節(jié)點
        private Node parent;

        public Node() {
            childs = new HashMap<String, Node>();
        }

        public Node(boolean isWord, int count, String str) {
            this();
            this.isWord = isWord;
            this.count = count;
            this.str = str;
        }

        public void addChild(String key, Node node) {
            childs.put(key, node);
            node.parent = this;
        }

        public void removeChild(String key) {
            childs.remove(key);
        }

        public String toString() {
            return 'str : ' + str + ', isWord : ' + isWord + ', count : ' + count;
        }
    }

    // 字典樹根節(jié)點
    private Node root;

    DictionaryTree() {
        // 初始化root
        root = new Node();
    }

    // 添加字串
    private void addStr(String word, Node node) {

        // 計數
        node.count++;

        String str = node.str;
        if(null != str) {

            // 尋找公共前綴
            String commonPrefix = '';
            for(int i=0; i<word.length(); i++) {
                if(str.length() > i && word.charAt(i) == str.charAt(i)) {
                    commonPrefix += word.charAt(i);
                } else {
                    break;
                }
            }

            // 找到公共前綴
            if(commonPrefix.length() > 0) {
                if (commonPrefix.length() == str.length() && commonPrefix.length() == word.length()) {
                    // 與之前的詞重復
                } else if(commonPrefix.length() == str.length() && commonPrefix.length() < word.length()) {
                    // 剩余的串
                    String wordLeft = word.substring(commonPrefix.length());
                    // 剩余的串去子節(jié)點中繼續(xù)找
                    searchChild(wordLeft, node);
                } else if(commonPrefix.length() < str.length()) {
                    // 節(jié)點裂變
                    Node splitNode = new Node(true, node.count, commonPrefix);
                    // 處理裂變節(jié)點的父關系
                    splitNode.parent = node.parent;
                    splitNode.parent.addChild(commonPrefix, splitNode);
                    node.parent.removeChild(node.str);
                    node.count--;
                    // 節(jié)點裂變后的剩余字串
                    String strLeft = str.substring(commonPrefix.length());
                    node.str = strLeft;
                    splitNode.addChild(strLeft, node);
                    // 單詞裂變后的剩余字串
                    if(commonPrefix.length() < word.length()) {
                        splitNode.isWord = false;
                        String wordLeft = word.substring(commonPrefix.length());
                        Node leftNode = new Node(true, 1, wordLeft);
                        splitNode.addChild(wordLeft, leftNode);
                    }
                }
            } else {
                // 沒有共同前綴，直接添加節(jié)點
                Node newNode = new Node(true, 1, word);
                node.addChild(word, newNode);
            }
        } else {
            // 根結點
            if(node.childs.size() > 0) {
                searchChild(word, node);
            } else {
                Node newNode = new Node(true, 1, word);
                node.addChild(word, newNode);
            }
        }
    }

    // 在子節(jié)點中添加字串
    public void searchChild(String wordLeft, Node node) {
        boolean isFind = false;
        if(node.childs.size() > 0) {
            // 遍歷孩子
            for(String childKey : node.childs.keySet()) {
                Node childNode = node.childs.get(childKey);
                // 首字母相同，則在該子節(jié)點繼續(xù)添加字串
                if(wordLeft.charAt(0) == childNode.str.charAt(0)) {
                    isFind = true;
                    addStr(wordLeft, childNode);
                    break;
                }
            }
        }
        // 沒有首字母相同的孩子，則將其變?yōu)樽庸?jié)點
        if(!isFind) {
            Node newNode = new Node(true, 1, wordLeft);
            node.addChild(wordLeft, newNode);
        }
    }

    // 添加單詞
    public void add(String word) {
        addStr(word, root);
    }

    // 在節(jié)點中查找字串
    private boolean findStr(String word, Node node) {
        boolean isMatch = true;
        String wordLeft = word;
        String str = node.str;
        if(null != str) {
            // 字串與單詞不匹配
            if(word.indexOf(str) != 0) {
                isMatch = false;
            } else {
                // 匹配，則計算剩余單詞
                wordLeft = word.substring(str.length());
            }
        }
        // 如果匹配
        if(isMatch) {
            // 如果還有剩余單詞長度
            if(wordLeft.length() > 0) {
                // 遍歷孩子繼續(xù)找
                for(String key : node.childs.keySet()) {
                    Node childNode = node.childs.get(key);
                    boolean isChildFind = findStr(wordLeft, childNode);
                    if(isChildFind) {
                        return true;
                    }
                }
                return false;
            } else {
                // 沒有剩余單詞長度，說明已經匹配完畢，直接返回節(jié)點是否為單詞
                return node.isWord;
            }
        }
        return false;
    }

    // 查找單詞
    public boolean find(String word) {
        return findStr(word, root);
    }

    // 統(tǒng)計子節(jié)點字串單詞數
    private int countChildStr(String prefix, Node node) {
        // 遍歷孩子
        for(String key : node.childs.keySet()) {
            Node childNode = node.childs.get(key);
            // 匹配子節(jié)點
            int childCount = countStr(prefix, childNode);
            if(childCount != 0) {
                return childCount;
            }
        }
        return 0;
    }

    // 統(tǒng)計字串單詞數
    private int countStr(String prefix, Node node) {
        String str = node.str;
        // 非根結點
        if(null != str) {
            // 前綴與字串不匹配
            if(prefix.indexOf(str) != 0 && str.indexOf(prefix) != 0) {
                return 0;
            // 前綴匹配字串，且前綴較短
            } else if(str.indexOf(prefix) == 0) {
                // 找到目標節(jié)點，返回單詞數
                return node.count;
            // 前綴匹配字串，且字串較短
            } else if(prefix.indexOf(str) == 0) {
                // 剩余字串繼續(xù)匹配子節(jié)點
                String prefixLeft = prefix.substring(str.length());
                if(prefixLeft.length() > 0) {
                    return countChildStr(prefixLeft, node);
                }
            }
        } else {
            // 根結點，直接找其子孫
            return countChildStr(prefix, node);
        }
        return 0;
    }

    // 統(tǒng)計前綴單詞數
    public int count(String prefix) {
        // 處理特殊情況
        if(null == prefix || prefix.trim().isEmpty()) {
            return root.count;
        }
        // 從根結點往下匹配
        return countStr(prefix, root);
    }

    // 打印節(jié)點
    private void printNode(Node node, int layer) {
        // 層級遞進
        for(int i=0; i<layer; i++) {
            System.out.print('\t');
        }
        // 打印
        System.out.println(node);
        // 遞歸打印子節(jié)點
        for (String str : node.childs.keySet()) {
            Node child = node.childs.get(str);
            printNode(child, layer + 1);
        }
    }

    // 打印字典樹
    public void print() {
        printNode(root, 0);
    }

}

(友情提示：可左右滑動)

Main.java

/**
 * @author xiaoshi on 2018/10/5.
 */
public class Main {

    public static void main(String[] args) {

        DictionaryTree dt = new DictionaryTree();

        dt.add('interest');
        dt.add('interesting');
        dt.add('interested');
        dt.add('inside');
        dt.add('insert');
        dt.add('apple');
        dt.add('inter');
        dt.add('interesting');

        dt.print();

        boolean isFind = dt.find('inside');
        System.out.println('find inside : ' + isFind);

        int count = dt.count('inter');
        System.out.println('count prefix inter : ' + count);

    }

}

(友情提示：可左右滑動)

運行結果

str : null, isWord : false, count : 8
    str : apple, isWord : true, count : 1
    str : in, isWord : false, count : 7
        str : ter, isWord : true, count : 5
            str : est, isWord : true, count : 4
                str : ing, isWord : true, count : 2
                str : ed, isWord : true, count : 1
        str : s, isWord : false, count : 2
            str : ert, isWord : true, count : 1
            str : ide, isWord : true, count : 1
find inside : true
count prefix inter : 5

(友情提示：可左右滑動)

【字典樹的應用】

小史：我想想啊，大量字符串的統(tǒng)計和查找應該就可以用字典樹吧？字符串前綴的匹配也可以用，像咱們搜索常見的autoComplete控件是不是就可以用？

END

本站是提供個人知識管理的網絡存儲空間，所有內容均由用戶發(fā)布，不代表本站觀點。請注意甄別內容中的聯(lián)系方式、誘導購買等信息，謹防詐騙。如發(fā)現(xiàn)有害或侵權內容，請點擊一鍵舉報。

小男孩‘自慰网亚洲一区二区,亚洲一级在线播放毛片,亚洲中文字幕av每天更新,黄aⅴ永久免费无码,91成人午夜在线精品,色网站免费在线观看,亚洲欧洲wwwww在线观看

【面試現(xiàn)場】如何在500w個單詞中統(tǒng)計特定前綴的單詞有多少個？

【面試現(xiàn)場】如何在500w個單詞中統(tǒng)計特定前綴的單詞有多少個？