algorithm - Sparse matrices / arrays in Java

Question

Welcome To Ask or Share your Answers For Others

algorithm - Sparse matrices / arrays in Java

asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

algorithm - Sparse matrices / arrays in Java

I'm working on a project, written in Java, which requires that I build a very large 2-D sparse array. Very sparse, if that makes a difference. Anyway: the most crucial aspect for this application is efficency in terms of time (assume loads of memory, though not nearly so unlimited as to allow me to use a standard 2-D array -- the key range is in the billions in both dimensions).

Out of the kajillion cells in the array, there will be several hundred thousand cells which contain an object. I need to be able to modify cell contents VERY quickly.

Anyway: Does anyone know a particularly good library for this purpose? It would have to be Berkeley, LGPL or similar license (no GPL, as the product can't be entirely open-sourced). Or if there's just a very simple way to make a homebrew sparse array object, that'd be fine too.

I'm considering MTJ, but haven't heard any opinions on its quality.

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-16T21:23:26+0000

Sparsed arrays built with hashmaps are very inefficient for frequently read data. The most efficient implementations uses a Trie that allows access to a single vector where segments are distributed.

A Trie can compute if an element is present in the table by performing only read-only TWO array indexing to get the effective position where an element is stored, or to know if its absent from the underlying store.

It can also provide a default position in the backing store for the default value of the sparsed array, so that you don't need ANY test on the returned index, because the Trie guarantees that all possible source index will map at least to the default position in the backing store (where you'll frequently store a zero, or an empty string or a null object).

There exists implementations that support fast-updatable Tries, with an otional "compact()" operation to optimze the size of the backing store at end of multiple operations. Tries are MUCH faster than hashmaps, because they don't need any complex hashing function, and don't need to handle collisions for reads (With Hashmaps, you have collision BOTH for reading and for writing, this requires a loop to skip to the next candidate position, and a test on each of them to compare the effective source index...)

In addition, Java Hashmaps can only index on Objects, and creating an Integer object for each hashed source index (this object creation will be needed for every read, not just writes) is costly in terms of memory operations, as it stresses the garbage collector.

I really hoped that the JRE included an IntegerTrieMap<Object> as the default implementation for the slow HashMap<Integer, Object> or LongTrieMap<Object> as the default implementation for the even slower HashMap<Long, Object>... But this is still not the case.

You may wonder what is a Trie?

It's just a small array of integers (in a smaller range than the full range of coordinates for your matrix) that allows mapping the coordinates into an integer position in a vector.

For example suppose you want a 1024*1024 matrix containing only a few non zero values. Instead of storing that matrix in a array containing 1024*1024 elements (more than 1 million), you may just want to split it into subranges of size 16*16, and you'll just need 64*64 such subranges.

In that case, the Trie index will contain just 64*64 integers (4096), and there will be at least 16*16 data elements (containing the default zeroes, or the most common subrange found in your sparse matrix).

And the vector used to store the values will contain only 1 copy for subranges that are equal with each other (most of them being full of zeroes, they will be represented by the same subrange).

So instead of using a syntax like matrix[i][j], you'd use a syntax like:

trie.values[trie.subrangePositions[(i & ~15) + (j >> 4)] +
            ((i & 15) << 4) + (j & 15)]

which will be more conveniently handled using an access method for the trie object.

Here is an example, built into a commented class (I hope it compiles OK, as it was simplified; signal me if there are errors to correct):

/**
 * Implement a sparse matrix. Currently limited to a static size
 * (<code>SIZE_I</code>, <code>SIZE_I</code>).
 */
public class DoubleTrie {

    /* Matrix logical options */        
    public static final int SIZE_I = 1024;
    public static final int SIZE_J = 1024;
    public static final double DEFAULT_VALUE = 0.0;

    /* Internal splitting options */
    private static final int SUBRANGEBITS_I = 4;
    private static final int SUBRANGEBITS_J = 4;

    /* Internal derived splitting constants */
    private static final int SUBRANGE_I =
        1 << SUBRANGEBITS_I;
    private static final int SUBRANGE_J =
        1 << SUBRANGEBITS_J;
    private static final int SUBRANGEMASK_I =
        SUBRANGE_I - 1;
    private static final int SUBRANGEMASK_J =
        SUBRANGE_J - 1;
    private static final int SUBRANGE_POSITIONS =
        SUBRANGE_I * SUBRANGE_J;

    /* Internal derived default values for constructors */
    private static final int SUBRANGES_I =
        (SIZE_I + SUBRANGE_I - 1) / SUBRANGE_I;
    private static final int SUBRANGES_J =
        (SIZE_J + SUBRANGE_J - 1) / SUBRANGE_J;
    private static final int SUBRANGES =
        SUBRANGES_I * SUBRANGES_J;
    private static final int DEFAULT_POSITIONS[] =
        new int[SUBRANGES](0);
    private static final double DEFAULT_VALUES[] =
        new double[SUBRANGE_POSITIONS](DEFAULT_VALUE);

    /* Internal fast computations of the splitting subrange and offset. */
    private static final int subrangeOf(
            final int i, final int j) {
        return (i >> SUBRANGEBITS_I) * SUBRANGE_J +
               (j >> SUBRANGEBITS_J);
    }
    private static final int positionOffsetOf(
            final int i, final int j) {
        return (i & SUBRANGEMASK_I) * MAX_J +
               (j & SUBRANGEMASK_J);
    }

    /**
     * Utility missing in java.lang.System for arrays of comparable
     * component types, including all native types like double here.
     */
    public static final int arraycompare(
            final double[] values1, final int position1,
            final double[] values2, final int position2,
            final int length) {
        if (position1 >= 0 && position2 >= 0 && length >= 0) {
            while (length-- > 0) {
                double value1, value2;
                if ((value1 = values1[position1 + length]) !=
                    (value2 = values2[position2 + length])) {
                    /* Note: NaN values are different from everything including
                     * all Nan values; they are are also neigher lower than nor
                     * greater than everything including NaN. Note that the two
                     * infinite values, as well as denormal values, are exactly
                     * ordered and comparable with <, <=, ==, >=, >=, !=. Note
                     * that in comments below, infinite is considered "defined".
                     */
                    if (value1 < value2)
                        return -1;        /* defined < defined. */
                    if (value1 > value2)
                        return 1;         /* defined > defined. */
                    if (value1 == value2)
                        return 0;         /* defined == defined. */
                    /* One or both are NaN. */
                    if (value1 == value1) /* Is not a NaN? */
                        return -1;        /* defined < NaN. */
                    if (value2 == value2) /* Is not a NaN? */
                        return 1;         /* NaN > defined. */
                    /* Otherwise, both are NaN: check their precise bits in
                     * range 0x7FF0000000000001L..0x7FFFFFFFFFFFFFFFL
                     * including the canonical 0x7FF8000000000000L, or in
                     * range 0xFFF0000000000001L..0xFFFFFFFFFFFFFFFFL.
                     * Needed for sort stability only (NaNs are otherwise
                     * unordered).
                     */
                    long raw1, raw2;
                    if ((raw1 = Double.doubleToRawLongBits(value1)) !=
                        (raw2 = Double.doubleToRawLongBits(value2)))
                        return raw1 < raw2 ? -1 : 1;
                    /* Otherwise the NaN are strictly equal, continue. */
                }
            }
            return 0;
        }
        throw new ArrayIndexOutOfBoundsException(
                "The positions and length can't be negative");
    }

    /**
     * Utility shortcut for comparing ranges in the same array.
     */
    public static final int arraycompare(
            final double[] values,
            final int position1, final int position2,
            final int length) {
        return arraycompare(values, position1, values, position2, length);
    }

    /**
     * Utility missing in java.lang.System for arrays of equalizable
     * component types, including all native types like double here.
     */ 
    public static final boolean arrayequals(
            final double[] values1, final int position1,
            final double[] values2, final int position2,
            final int length) {
        return arraycompare(values1, position1, values2, position2, length) ==
            0;
    }

    /**
     * Utility shortcut for identifying ranges in the same array.
     */
    public static final boolean arrayequals(
            final double[] values,
            final int position1, final int position2,
            final int length) {
        return arrayequals(values, position1, values, position2, length);
    }

    /**
     * Utility shortcut for copying ranges in the same array.
     */
    public static final void arraycopy(
            final double[] values,
            final int srcPosition, final int dstPosition,
            final int length) {
        arraycopy(values, srcPosition, values, dstPosition, length);
    }

    /**
     * Utility shortcut for resizing an array, preserving values at start.
     */
    public static final double[] arraysetlength(
            double[] values,
            final int newLength) {
        final int oldLength =
            values.length < newLength ? values.length : newLength;
        System.arraycopy(values, 0, values = new double[newLength], 0,
            oldLength);
        return values;
    }

    /* Internal instance members. */
    private double values[];
    private int subrangePositions[];
    private bool isSharedValues;
    private bool isSharedSubrangePositions;

    /* Internal method. */
    private final reset(
            final double[] values,
            final int[] subrangePositions) {
        this.isSharedValues =
            (this.values = values) == DEFAULT_VALUES;
        this.isSharedsubrangePositions =
            (this.subrangePositions = subrangePositions) ==
                DEFAULT_POSITIONS;
    }

    /**
     * Reset the matrix to fill it with the same initial value.
     *
     * @param initialValue  The value to set in all cell positions.
     */
    public reset(final double initialValue = DEFAULT_VALUE) {
        reset(
            (initialValue == DEFAULT_VALUE) ? DEFAULT_VALUES :
                new double[SUBRANGE_POSITIONS](initialValue),
            DEFAULT_POSITIONS);
    }

    /**
     * Default constructor, using single default value.
     *
     * @param initialValue  Alternate default value to initialize all
     *                      positions in the matrix.
     */
    public DoubleTrie(final double initialValue = DEFAULT_VALUE) {
        this.reset(initialValue);
    }

    /**
     * This is a useful preinitialized instance containing the
     * DEFAULT_VALUE in all cells.
     */
    public static DoubleTrie DEFAULT_INSTANCE = new DoubleTrie();

    /**
     * Copy constructor. Note that the source trie may be immutable
     * or not; but this constructor will create a new mutable trie
     * even if the new trie initially shares some storage with its
     * source when that source also uses shared storage.
     */
    public DoubleTrie(final DoubleTrie source) {
        this.values = (this.isSharedValues =
            source.isSharedValues) ?
            source.values :
            source.values.clone();
        this.subrangePositions = (this.isSharedSubrangePositions =
            source.isSharedSubrangePositions) ?

Categories

algorithm - Sparse matrices / arrays in Java

algorithm - Sparse matrices / arrays in Java

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags