python - Find the intersection between sublists

Question

asked Jan 31, 2022 in Technique[技术] by 深蓝 (71.8m points)

I recently came across a problem about finding the intersections between sublists: any sublists that share one or more elements must be merged into one. For example, the following list:

l=[[1,2,3],[0,13,6],[9,10],[3,4,5],[10,11],[6,7,50]]

must be converted to :

[[1, 2, 3, 4, 5],[0, 50, 6, 7, 13],[9, 10, 11]]

So I wrote the following function, which works and performs reasonably well. I use sets because membership checks on them are fast. The inner loop uses slicing to compare the current set with every set after it. Note that the list shrinks after each merge, because the function recurses from inside the loop:

def find_intersection(m_list):
    for i, v in enumerate(m_list):
        for j, k in enumerate(m_list[i+1:], i+1):
            if v & k:
                # merge set j into set i, then restart on the shorter list
                m_list[i] = v.union(m_list.pop(j))
                return find_intersection(m_list)
    return m_list

s = [set(i) for i in l if i]
print(find_intersection(s))
[set([1, 2, 3, 4, 5]), set([0, 50, 6, 7, 13]), set([9, 10, 11])]

But I think there may be another solution with better performance. I thought about collections.deque, or numpy, or simply modifying my function to improve it. If you have any suggestions, I would be grateful to hear them!

1 Answer

深蓝 · Answer 1 · 2022-01-31T07:26:33+0000

Here is a more efficient algorithm:

For each unique number that appears in at least one sublist, maintain a list of the indices of all sublists that contain it. This takes O(n log n) time if we sort to find the unique numbers, or O(n) if we use a hash table, where n is the total number of elements across all sublists.
Create a graph whose vertices are sublist indices, with an edge between two indices whenever they appear together in the index list of at least one number. We need to create at most O(n) edges. This part is slightly non-trivial: there is no need to create all edges explicitly. Because connectivity is transitive, it is enough to add an edge from each index to the next one in the index list of every unique element. Here is some pseudocode:
```
g = empty graph
for elem in unique_elements:
    sublist_indices = list of indices of all sublists that contain this element
    for i = 1 ... size(sublist_indices) - 1:
        g.add_edge(sublist_indices[i], sublist_indices[i + 1])
```
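The pseudocode above could be written in Python roughly as follows. This is a minimal sketch, assuming hashable elements and a dictionary for the index lists; the names `build_edges` and `occurrences` are made up for illustration and are not part of the original answer.

```python
from collections import defaultdict

def build_edges(sublists):
    """Return edges linking consecutive sublist indices that share an element."""
    # Hypothetical helper: map each element to the indices of sublists containing it.
    occurrences = defaultdict(list)
    for idx, sub in enumerate(sublists):
        for elem in set(sub):  # set() so a repeated element counts once per sublist
            occurrences[elem].append(idx)

    edges = []
    for indices in occurrences.values():
        # Link each index only to the next one; transitivity connects the rest.
        for a, b in zip(indices, indices[1:]):
            edges.append((a, b))
    return edges

l = [[1, 2, 3], [0, 13, 6], [9, 10], [3, 4, 5], [10, 11], [6, 7, 50]]
print(sorted(build_edges(l)))  # [(0, 3), (1, 5), (2, 4)]
```

Each element contributes one fewer edge than the number of sublists it appears in, so the edge count stays at most n.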
Now we can find the connected components of this graph using depth-first search in linear time (the graph is undirected).
We now know which sublists to merge (two sublists are merged if and only if they are in the same connected component), so we can easily construct the answer.

The total time complexity is O(n) when a hash table is used in the first step. It is optimal because reading the input already requires O(n) operations.
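The component search and merge steps might look like the sketch below. It assumes the edges have already been computed as pairs of sublist indices (they are hard-coded here for the example list), and it uses an iterative depth-first search to avoid recursion limits. The name `merge_components` is illustrative only. Empty sublists are skipped, as in the question's own code.

```python
def merge_components(sublists, edges):
    """Merge sublists whose indices fall in the same connected component."""
    # Build an undirected adjacency list over sublist indices.
    adjacency = [[] for _ in sublists]
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)

    seen = [False] * len(sublists)
    merged = []
    for start in range(len(sublists)):
        if seen[start] or not sublists[start]:
            continue
        # Iterative depth-first search from an unvisited index.
        seen[start] = True
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.update(sublists[node])
            for nxt in adjacency[node]:
                if not seen[nxt]:
                    seen[nxt] = True
                    stack.append(nxt)
        # Sorting is only for readable output; it is not part of the algorithm.
        merged.append(sorted(component))
    return merged

l = [[1, 2, 3], [0, 13, 6], [9, 10], [3, 4, 5], [10, 11], [6, 7, 50]]
edges = [(0, 3), (1, 5), (2, 4)]  # assumed precomputed: one edge per shared element
print(merge_components(l, edges))
# [[1, 2, 3, 4, 5], [0, 6, 7, 13, 50], [9, 10, 11]]
```

Dropping the final `sorted` call and returning the sets directly keeps the merge step linear in the total number of elements.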
