List利用HashSet去重

微信图片_20230614112241.png

首先是造出一个 List 模拟数据，一共2W条，里面有一半数据1W条是重复的:

  List<String> list = new ArrayList();

        for(int i = 1; i<= 10000; i++) {
            list.add(String.valueOf(i));
        }

        for(int i = 1; i<= 10000; i++) {
            list.add(String.valueOf(i));
        }

先看看我们用 contain 去重的代码：

 StopWatch stopWatch = new StopWatch();
        stopWatch.start();
        System.out.println("contains 开始去重，条数：" + list.size());
        List<String> testListDistinctResult = new ArrayList<>();
        for (String str : list) {
            if (!testListDistinctResult.contains(str)) {
                testListDistinctResult.add(str);
            }
        }
        System.out.println("contains 去重完毕，条数：" + testListDistinctResult.size());
        stopWatch.stop();
        System.out.println("去重 最终耗时" + stopWatch.getTime());

微信图片_20230614112455.png

去重最终耗时661毫秒

接下来使用HashSet去重

  StopWatch stopWatch = new StopWatch();
        stopWatch.start();
        System.out.println("HashSet.add 开始去重，条数：" + list.size());
        List<String> testListDistinctResult = new ArrayList<>(new HashSet(list));
        System.out.println("HashSet.add 去重完毕，条数：" + testListDistinctResult.size());
        stopWatch.stop();
        System.out.println("去重 最终耗时" + stopWatch.getTime());

去重最终耗时8毫秒

HashSet 的效率是相当高啊建议List去重使用HashSet去做处理。

为什么耗时差距这么大？

不多说，我们看源码：

可以看到里面用到了 index(o) ：
list.contains(o)：

/**
     * Returns <tt>true</tt> if this list contains the specified element.
     * More formally, returns <tt>true</tt> if and only if this list contains
     * at least one element <tt>e</tt> such that
     * <tt>(o==null&nbsp;?&nbsp;e==null&nbsp;:&nbsp;o.equals(e))</tt>.
     *
     * @param o element whose presence in this list is to be tested
     * @return <tt>true</tt> if this list contains the specified element
     */
    public boolean contains(Object o) {
        return indexOf(o) >= 0;
    }

    /**
     * Returns the index of the first occurrence of the specified element
     * in this list, or -1 if this list does not contain the element.
     * More formally, returns the lowest index <tt>i</tt> such that
     * <tt>(o==null&nbsp;?&nbsp;get(i)==null&nbsp;:&nbsp;o.equals(get(i)))</tt>,
     * or -1 if there is no such index.
     */
    public int indexOf(Object o) {
        if (o == null) {
            for (int i = 0; i < size; i++)
                if (elementData[i]==null)
                    return i;
        } else {
            for (int i = 0; i < size; i++)
                if (o.equals(elementData[i]))
                    return i;
        }
        return -1;
    }

循环重复判断去决定SIZE大小

接下来看看HashSet
时间复杂度：O(n) n: 元素个数

那么我们看看 set.add(o) 是怎么样的：

/**
     * Constructs a new set containing the elements in the specified
     * collection.  The <tt>HashMap</tt> is created with default load factor
     * (0.75) and an initial capacity sufficient to contain the elements in
     * the specified collection.
     *
     * @param c the collection whose elements are to be placed into this set
     * @throws NullPointerException if the specified collection is null
     */
    public HashSet(Collection<? extends E> c) {
        map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
        addAll(c);
    }



  /**
     * {@inheritDoc}
     *
     * <p>This implementation iterates over the specified collection, and adds
     * each object returned by the iterator to this collection, in turn.
     *
     * <p>Note that this implementation will throw an
     * <tt>UnsupportedOperationException</tt> unless <tt>add</tt> is
     * overridden (assuming the specified collection is non-empty).
     *
     * @throws UnsupportedOperationException {@inheritDoc}
     * @throws ClassCastException            {@inheritDoc}
     * @throws NullPointerException          {@inheritDoc}
     * @throws IllegalArgumentException      {@inheritDoc}
     * @throws IllegalStateException         {@inheritDoc}
     *
     * @see #add(Object)
     */
    public boolean addAll(Collection<? extends E> c) {
        boolean modified = false;
        for (E e : c)
            if (add(e))
                modified = true;
        return modified;
    }

map的add , 老生常谈就不谈了，hash完直接塞到某个位置，时间复杂度：O(1) 。

所以 O(n) 和 O(1) 谁快谁慢？显然。

瑞星的博客

去重 最终耗时661毫秒

去重 最终耗时8毫秒

去重最终耗时661毫秒

去重最终耗时8毫秒