Simple Perl and C comparison

Simple Perl and C comparison

Postby gch15 » Wed Jun 29, 2011 9:38 pm

Hi,

I program quite a lot in Perl and find it fast enough for whatever I want to do. A few days back I thought of comparing Perl with C. Since I often read in lines of text from files, I thought I would compare the speed of doing this in Perl and in C.

First I generate a text file to read using the BASH code below.
Code:
if [[ -e stuff ]]
then
  rm stuff
fi
for x in {1..5000}
do
  echo "This is line $x" >> stuff
done

If I need a longer test file I just change the 5000 to some bigger number.

Below are a Perl script and a C program. Both do the same thing: read lines from the file (stuff, created above) and keep appending them to a string variable. When all the lines have been read, the length of this string variable is printed. That is all.

Code:
$ gcc -o for_cmp for_cmp.c

$ time ./for_cmp

88893

real   0m1.126s
user   0m1.122s
sys   0m0.003s


$ time perl for_cmp.pl

88893

real   0m0.014s
user   0m0.006s
sys   0m0.007s


As you can see above, my C program is significantly slower than the Perl script. My C is very amateurish, so I believe there must be faster ways of doing this in C. I would greatly appreciate an example of C code that is faster than (or as fast as) the Perl script at this simple task.

Thanks.

Here is the Perl code
Code:
# begin perl script for_cmp.pl
open(IN, "<stuff");
my $growing;
while (<IN>) {
    $growing .= $_;
}
close(IN);
print(length($growing), "\n");
# end perl script


And here is the C code
Code:
/* begin C code for_cmp.c */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main (int argc, char *argv[]) {
  FILE *infile;
  const size_t mem_chunk = sizeof(char) * 1000 * 500;
  size_t allocd;
  char *growing = (char *) malloc(mem_chunk);
  char *moving = growing;
  allocd = mem_chunk;
  size_t initsize = 10000;
  char *line = (char *) malloc(sizeof(char) * initsize);
  infile = fopen("stuff", "r");

  growing[0] = '\0';  /* start with a valid empty string so strlen(growing) is defined */

  while ((fgets(line, initsize, infile)) != NULL) {
    /* grow the buffer if the next line would not fit */
    if (strlen(growing) + strlen(line) + 100 > allocd) {
      growing = (char *) realloc(growing, allocd + mem_chunk);
      allocd += mem_chunk;
      moving = growing + strlen(growing);
    }
    /* append the line; mempcpy returns a pointer just past the copied bytes */
    moving = mempcpy(moving, line, strlen(line));
    *moving = '\0';  /* keep the growing string NUL-terminated */
  }
  printf("%zu\n", strlen(growing));
  fclose(infile);
  free(growing);
  free(line);
  return 0;
}
/* end C code */
gch15
 
Posts: 39
Joined: Thu Jun 09, 2005 4:00 pm
Location: Norfolk, UK

Postby spaceyhase » Wed Jul 06, 2011 9:42 pm

All the memory allocation and copying is killing the C performance, and fgets isn't helping as it is a line-orientated read. What you should do is figure out the file size (using fseek and ftell, for instance) and allocate once. We know the length of the file at that point, so the rest is artificial, but... fill the buffer (again, just a single read will do) and count its length (no need to pull in string.h then, either). The best way would be to just keep track of how many bytes have been read as you go - there's no need to count 'em afterwards. Or, since we know the file size and expect to read exactly that many bytes, the return value of the read is enough to confirm the length of the 'string'.

And then free the memory.
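Something like this is what I mean - a rough, untested sketch, assuming the same "stuff" file as above and that the whole lot fits in memory in one go:

Code:
/* slurp.c - read the whole file in one go (rough, untested sketch) */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *infile = fopen("stuff", "rb");
    if (infile == NULL) {
        perror("fopen");
        return 1;
    }

    /* find the file size: seek to the end and ask where we are */
    fseek(infile, 0, SEEK_END);
    long size = ftell(infile);
    rewind(infile);

    /* allocate once... */
    char *buf = malloc((size_t)size);
    if (buf == NULL) {
        fclose(infile);
        return 1;
    }

    /* ...and read once; fread tells us how many bytes it actually read */
    size_t nread = fread(buf, 1, (size_t)size, infile);

    /* nread is the length of the 'string' - no strlen() needed */
    printf("%zu\n", nread);

    free(buf);
    fclose(infile);
    return 0;
}

One malloc, one fread, and the length comes straight back from fread, so there's no strlen() or realloc() inside a loop at all.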

You can probably do something similar in Perl to make it even faster, too.

Sorry it's all a bit vague. It shows the obvious differences between the two languages and that it isn't just a like-for-like comparison (who knows what Perl's interpreter has done? is while(<IN>) functionally the same as fgets? etc.), even though the question itself is a fairly interesting one.
spaceyhase
LXF regular
 
Posts: 116
Joined: Mon Jun 30, 2008 12:07 pm

Postby johnhudson » Thu Jul 07, 2011 9:14 am

johnhudson
LXF regular
 
Posts: 873
Joined: Wed Aug 03, 2005 1:37 pm

Postby Bazza » Thu Jul 07, 2011 2:58 pm

Hi jh...

Would be interesting to know how fast this really is:-

http://www.linuxformat.com/forums/viewtopic.php?t=11351

;o)
73...

Bazza, G0LCU...

Team AMIGA...
Bazza
LXF regular
 
Posts: 1476
Joined: Sat Mar 21, 2009 11:16 am
Location: Loughborough

Postby gch15 » Fri Jul 22, 2011 12:14 pm

Thanks for the response. I had guessed some of the issues you mention, but not all of them, so I have learned something. "Who knows what Perl's interpreter has done?" - true, but it is good to know that whatever it is doing, it is pretty efficient!

spaceyhase wrote:All the memory allocation and copying is killing the C performance. fgets isn't helping as it is a line-orientated read. ...
gch15
 
Posts: 39
Joined: Thu Jun 09, 2005 4:00 pm
Location: Norfolk, UK

