Using multiple threads in Python

I’m trying to solve a problem where I have many (on the order of ten thousand) URLs and need to download the content from all of them. I’ve been doing this in a “for link in links:” loop up till now, but it now takes too long. I think it’s time to implement a multithreaded or multiprocessing approach. My question is, what is the best approach to take?

I know about the Global Interpreter Lock, but since my problem is network-bound, not CPU-bound, I don’t think that will be an issue. I need to pass data back from each thread/process to the main thread/process. I don’t need help implementing whatever approach I choose (Multiple Threads in Python covers that); I need advice on which approach to take. My current approach:

data_list = get_data(...)
output = []
for datum in data_list:
    output.append(get_URL_data(datum))
return output

There’s no other shared state.

I think the best approach would be to have a queue with all the data in it, and have several worker threads pop from the input queue, get the URL data, then push onto an output queue.
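To make the idea concrete, here is a rough sketch of the input-queue/output-queue design I have in mind, using only the standard library. The names `worker`, `fetch_all`, and `num_workers` are made up for illustration, and the body of `get_URL_data` is a stand-in for my real download function so the snippet runs without a network. Each item carries its index so the results can be put back in input order.

```python
import queue
import threading

def get_URL_data(datum):
    # Stand-in for the real download function (assumption, not the actual code).
    return f"content for {datum}"

def worker(in_q, out_q):
    # Pull items until the sentinel (None) arrives, then exit.
    while True:
        item = in_q.get()
        if item is None:
            break
        index, datum = item
        try:
            out_q.put((index, get_URL_data(datum)))
        except Exception as exc:
            # Report the failure instead of silently killing the thread.
            out_q.put((index, exc))

def fetch_all(data_list, num_workers=8):
    in_q = queue.Queue()
    out_q = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(in_q, out_q))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for index, datum in enumerate(data_list):
        in_q.put((index, datum))
    for _ in threads:
        in_q.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # All workers have finished, so the output queue can be drained safely.
    results = [None] * len(data_list)
    while not out_q.empty():
        index, content = out_q.get()
        results[index] = content
    return results
```

Calling `fetch_all(data_list)` would then replace the loop above. A failed download shows up as an exception object in its slot of the result list rather than stopping the other workers.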

Am I right? Is there anything I’m missing? This is my first time implementing multithreaded code in any language, and I know it’s generally a Hard Problem.



source: http://stackoverflow.com/questions/11638349/using-multiple-threads-in-python