I wanted to compare reading lines of string input from stdin using Python and C++ and was shocked to see my C++ code run an order of magnitude slower than the equivalent Python code. Since my C++ is rusty and I'm not yet an expert Pythonista, please tell me if I'm doing something wrong or if I'm misunderstanding something.

(TL;DR answer: include the statement cin.sync_with_stdio(false) or just use fgets instead. TL;DR results: scroll all the way down to the bottom of my question and look at the table.)

C++ code:

#include <iostream>
#include <time.h>
using namespace std;

int main() {
    string input_line;
    long line_count = 0;
    time_t start = time(NULL);
    int sec;
    int lps;

    while (cin) {
        getline(cin, input_line);
        if (!cin.eof())
            line_count++;
    };

    sec = (int) time(NULL) - start;
    cerr << "Read " << line_count << " lines in " << sec << " seconds.";
    if (sec > 0) {
        lps = line_count / sec;
        cerr << " LPS: " << lps << endl;
    } else
        cerr << endl;
    return 0;
}

// Compiled with:
// g++ -O3 -o readline_test_cpp foo.cpp

Python equivalent:

#!/usr/bin/env python
import time
import sys

count = 0
start = time.time()
for line in sys.stdin:
    count += 1

delta_sec = int(time.time() - start)
if delta_sec > 0:
    lines_per_sec = int(round(count / delta_sec))
    print("Read {0} lines in {1} seconds. LPS: {2}".format(count, delta_sec, lines_per_sec))

Here are my results:

$ cat test_lines | ./readline_test_cpp
Read 5570000 lines in 9 seconds. LPS: 618889

$ cat test_lines | ./readline_test.py
Read 5570000 lines in 1 seconds. LPS: 5570000

I should note that I tried this both under Mac OS X v10.6.8 (Snow Leopard) and Linux 2.6.32 (Red Hat Linux 6.2). The former is a MacBook Pro, and the latter is a very beefy server, not that this is too pertinent.

$ for i in {1..5}; do echo "Test run $i at `date`"; echo -n "CPP:"; cat test_lines | ./readline_test_cpp ; echo -n "Python:"; cat test_lines | ./readline_test.py ; done
Test run 1 at Mon Feb 20 21:29:28 EST 2012
CPP: Read 5570001 lines in 9 seconds. LPS: 618889
Python: Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 2 at Mon Feb 20 21:29:39 EST 2012
CPP: Read 5570001 lines in 9 seconds. LPS: 618889
Python: Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 3 at Mon Feb 20 21:29:50 EST 2012
CPP: Read 5570001 lines in 9 seconds. LPS: 618889
Python: Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 4 at Mon Feb 20 21:30:01 EST 2012
CPP: Read 5570001 lines in 9 seconds. LPS: 618889
Python: Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 5 at Mon Feb 20 21:30:11 EST 2012
CPP: Read 5570001 lines in 10 seconds. LPS: 557000
Python: Read 5570000 lines in 1 seconds. LPS: 5570000

Tiny benchmark addendum and recap

For completeness, I thought I'd update the read speed for the same file on the same box with the original (synced) C++ code. Again, this is for a 100M-line file on a fast disk. Here's the comparison, with several solutions/approaches:

Implementation              Lines per second
python (default)                   3,571,428
cin (default/naive)                  819,672
cin (no sync)                     12,500,000
fgets                             14,285,714
wc (not fair comparison)          54,644,808
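For readers skimming the TL;DR: the fix is the single call shown below, placed before the first read. This is a sketch showing the placement only, keeping the original loop untouched (the extra-line eof() issue in that loop is discussed in the comments further down):

#include <iostream>
#include <string>
using namespace std;

int main() {
    string input_line;
    long line_count = 0;

    // The TL;DR fix: must run before any input so iostreams stop
    // synchronizing with C stdio on every read.
    cin.sync_with_stdio(false);

    while (cin) {
        getline(cin, input_line);
        if (!cin.eof())
            line_count++;
    }

    cerr << "Read " << line_count << " lines" << endl;
    return 0;
}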


Did you run your tests multiple times? Perhaps there is a disk cache issue.


@JJC: I see two possibilities (assuming you have removed the caching problem suggested by David): 1) <iostream> performance sucks. Not the first time it happens. 2) Python is clever enough not to copy the data in the for loop because you don't use it. You could retest trying to use scanf and a char[]. Alternatively, you could try rewriting the loop so that something is done with the string (e.g., keep the 5th letter and concatenate it into a result).
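As a rough illustration of that retest, a sketch using scanf and a char[]; the buffer size and the handling of empty lines are my assumptions, not taken from the original post's edits:

#include <cstdio>

int main() {
    char buf[4096];                        // assumed maximum line length; longer lines get split
    long line_count = 0;
    for (;;) {
        int rc = scanf("%4095[^\n]", buf); // rc == 0 on an empty line (nothing to match)
        if (rc == EOF)
            break;
        int c = getchar();                 // consume the '\n' that %[^\n] leaves behind
        ++line_count;
        if (c == EOF)                      // last line had no trailing newline
            break;
    }
    fprintf(stderr, "Read %ld lines\n", line_count);
    return 0;
}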


The problem is synchronization with stdio -- see my answer.


Since nobody seems to have mentioned why you get an extra line with C++: Do not test against cin.eof()!! Put the getline call into the `if` statement.


wc -l is fast because it reads the stream more than one line at a time (it might be an fread(stdin)/memchr('\n') combination). Python results are in the same order of magnitude, e.g., wc-l.py.
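To make the block-at-a-time idea concrete, a hedged sketch of an fread/memchr line counter; this illustrates the technique being described, not wc's actual implementation:

#include <cstdio>
#include <cstring>

int main() {
    static char buf[1 << 16];          // 64 KiB block; size chosen arbitrarily
    long line_count = 0;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, stdin)) > 0) {
        // Scan the whole block for newlines instead of parsing line by line.
        const char *p = buf;
        const char *end = buf + n;
        while ((p = static_cast<const char *>(memchr(p, '\n', end - p)))) {
            ++line_count;
            ++p;                       // continue scanning after this newline
        }
    }
    fprintf(stderr, "Counted %ld lines\n", line_count);
    return 0;
}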


This should be at the top. It is almost certainly correct. The answer cannot lie in replacing the read with an fscanf call, because that quite simply doesn't do as much work as Python does. Python must allocate memory for the string, possibly multiple times as the existing allocation is deemed inadequate - exactly like the C++ approach with std::string. This task is almost certainly I/O bound and there is way too much FUD going around about the cost of creating std::string objects in C++ or using <iostream> in and of itself.


Yes, adding this line immediately above my original while loop sped the code up to surpass even Python. I'm about to post the results as the final edit. Thanks again!


Yes, this actually applies to cout, cerr, and clog as well.


To make cout, cin, cerr and clog faster, do it this way: std::ios_base::sync_with_stdio(false);


Note that sync_with_stdio() is a static member function, and a call to this function on any stream object (e.g. cin) toggles on or off synchronization for all standard iostream objects.
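So, for example, these two spellings do the same thing (a minimal sketch of the static-member point above):

#include <iostream>

int main() {
    // Static member: either call disables synchronization for cin, cout, cerr and clog alike.
    std::ios_base::sync_with_stdio(false);
    // std::cin.sync_with_stdio(false);   // equivalent call through a stream object
    return 0;
}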


Wow, that was quite insightful! While I've been aware that cat is unnecessary for feeding input to stdin of programs and that the < shell redirect is preferred, I've generally stuck to cat because of the left-to-right flow of data it preserves visually when I reason about pipelines. I've found the performance differences in such cases to be negligible. But I do appreciate your educating us, Bela.


I'll refrain from an upvote, personally, since this doesn't address the original question (note that the use of cat is constant in the competing examples). But, again, thanks for the intellectual discussion about the ins and outs of *nix.


Redirection is parsed out of the shell command line at an early stage, which allows you to do one of these, if it gives a more pleasing appearance of left-to-right flow:

$ < big_file time my_program
$ time < big_file my_program

This should work in any POSIX shell (i.e., not `csh`, and I'm not sure about exotica like `rc` :)


Again, aside from the perhaps uninteresting incremental performance difference due to the `cat` binary running at the same time, you are giving up the possibility of the program under test being able to mmap() the input file. This could make a profound difference in results. This is true even if you wrote the benchmarks yourself, in the various languages, using only their 'input lines from a file' idiom. It depends on the detailed workings of their various I/O libraries.
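For what that could look like, a hedged POSIX sketch of counting lines in an mmap()ed file; this only works when the program is handed a regular file on the command line, which is exactly the possibility piping through cat takes away:

#include <cstdio>
#include <algorithm>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc < 2) return 1;
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }
    if (st.st_size == 0) { fprintf(stderr, "Counted 0 lines\n"); return 0; }

    // Map the whole file; the kernel pages it in as we scan, with no read() copies.
    const char *data = static_cast<const char *>(
        mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0));
    if (data == MAP_FAILED) { perror("mmap"); return 1; }

    long line_count = std::count(data, data + st.st_size, '\n');
    fprintf(stderr, "Counted %ld lines\n", line_count);

    munmap(const_cast<char *>(data), st.st_size);
    close(fd);
    return 0;
}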


Don't forget that you can still do left-to-right with redirection: <file program does almost the same thing (with the caveats JJC mentioned) as cat file | program.


You might want to try different buffer sizes to get more useful information. I suspect you will see rapidly diminishing returns.


I was too hasty in my reply; setting the buffer size to something other than the default did not produce an appreciable difference.


I would also avoid setting up a 1 MB buffer on the stack. It can lead to a stack overflow (though I guess it's a good place to debate about it!).


@Matthieu, Mac uses an 8 MB process stack by default. Linux uses 4 MB per thread by default, IIRC. 1 MB isn't that much of an issue for a program that transforms input with relatively shallow stack depth. More importantly, though, std::cin will trash the stack if the buffer goes out of scope.


@SEK: The default Windows stack size is 1 MB.
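If someone wants to experiment with a larger cin buffer, here is a sketch that sidesteps both concerns above by using static storage; note that whether pubsetbuf has any effect on the standard streams is implementation-defined, so treat this as an experiment rather than a guaranteed improvement:

#include <iostream>
#include <string>

int main() {
    // Static storage: never on the stack, and never goes out of scope while cin lives.
    static char buf[1 << 20];   // 1 MiB; the size is an arbitrary choice
    std::ios_base::sync_with_stdio(false);
    // Must be called before any input; implementations may ignore it for cin.
    std::cin.rdbuf()->pubsetbuf(buf, sizeof buf);

    std::string line;
    long count = 0;
    while (std::getline(std::cin, line))
        ++count;
    std::cerr << "Read " << count << " lines\n";
    return 0;
}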


The really correct loop would be: while (getline(cin, input_line)) line_count++;
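Combining that loop with the sync_with_stdio fix gives a corrected version of the whole benchmark; the timing scaffolding below mirrors the original question's, and this is a sketch rather than the poster's exact final code:

#include <iostream>
#include <string>
#include <ctime>

int main() {
    std::ios_base::sync_with_stdio(false);   // the fix from the accepted answer

    std::string input_line;
    long line_count = 0;
    time_t start = time(nullptr);

    // getline itself is the loop condition, so no eof() test and no extra line.
    while (std::getline(std::cin, input_line))
        ++line_count;

    int sec = static_cast<int>(time(nullptr) - start);
    std::cerr << "Read " << line_count << " lines in " << sec << " seconds.";
    if (sec > 0)
        std::cerr << " LPS: " << line_count / sec;
    std::cerr << std::endl;
    return 0;
}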


@JonathanWakely I know that I'm pretty late, but use ++line_count; and not line_count++;.


@val If that makes any difference, your compiler has a bug. The variable is a long, and the compiler is quite capable of telling that the result of the increment is not used. If it doesn't generate identical code for post-increment and pre-increment, it's broken.


You can get even faster than that with a tiny custom but completely straightforward C program that iteratively makes either unbuffered read syscalls into a static buffer of length BUFSIZE, or the equivalent corresponding mmap syscalls, and then whips through that buffer counting newlines à la for (char *cp = buf; *cp; cp++) count += *cp == '\n'. You'll have to tune BUFSIZE for your system, though, which stdio will have already done for you. But that for loop should compile down to awesomely screaming-fast assembler instructions for your box's hardware.
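A hedged sketch of that read(2)-into-a-static-buffer approach; unlike the one-liner in the comment, it scans by the byte count read() returns instead of relying on a NUL terminator, and BUFSIZE is deliberately left as something to tune:

#include <unistd.h>
#include <cstdio>

#ifndef BUFSIZE
#define BUFSIZE (1 << 16)        // 64 KiB is just a starting point; tune per system
#endif

int main() {
    static char buf[BUFSIZE];
    long count = 0;
    ssize_t n;
    // Unbuffered read(2) straight into the static buffer, then scan for newlines.
    while ((n = read(0, buf, sizeof buf)) > 0)
        for (ssize_t i = 0; i < n; ++i)
            count += (buf[i] == '\n');
    fprintf(stderr, "Counted %ld lines\n", count);
    return 0;
}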


count_if and a lambda also compile down to "awesomely screaming-fast assembler".
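For instance, the newline scan could be written like this; buf and n are assumed to come from whatever read loop fills the buffer:

#include <algorithm>

// Count newlines in a chunk that was read elsewhere.
long newlines_in(const char *buf, long n) {
    return std::count_if(buf, buf + n,
                         [](char c) { return c == '\n'; });
}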


Didn't see this post until I made my third edit, but thanks again for your suggestion. Strangely, there is no 2x hit for me vs. Python now with the scanf line in edit 3 above. I'm using Python 2.7, by the way.


After fixing the C++ version, this stdio version is substantially slower than the C++ iostreams version on my computer (3 seconds vs. 1 second).


Same here. The sync to stdio was the trick.


fgets is even faster; please see edit 5 above. Thanks.


Except fgets will be wrong (in terms of line counts, and in terms of splitting lines across loops if you actually need to use them) for sufficiently large lines, without additional checks for incomplete lines (and attempting to compensate for it involves allocating unnecessarily large buffers, where std::getline handles reallocation to match actual input seamlessly). Fast and wrong is easy, but it's almost always worth it to use "slightly slower, but correct", which turning off sync_with_stdio gets you.
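A sketch of the extra check being described: only count a chunk as a line if it actually ends in a newline. The buffer size is arbitrary, and a final line with no trailing newline would still need one more check after the loop:

#include <cstdio>
#include <cstring>

int main() {
    char buf[4096];                       // arbitrary; long lines arrive in pieces
    long line_count = 0;
    while (fgets(buf, sizeof buf, stdin)) {
        size_t len = strlen(buf);
        // Only a chunk ending in '\n' completes a line; otherwise the same line
        // continues in the next fgets call and must not be counted twice.
        if (len > 0 && buf[len - 1] == '\n')
            ++line_count;
    }
    fprintf(stderr, "Read %ld lines\n", line_count);
    return 0;
}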
