[libgfortran] PR78351 comma not terminating READ of formatted input field

Message ID b656db0b-04da-2303-e07c-44f7237bfe47@charter.net
State New
Headers show
Series
  • [libgfortran] PR78351 comma not terminating READ of formatted input field
Related show

Commit Message

Jerry Nov. 3, 2018, 10:33 p.m.
Hi all,

The attached patch adds code in read_sf_internal to handle early 
termination of reads in the presence of comma's. This is to support 
legacy codes which are not standard conforming as far as we can tell.

The additions are executed only if -std=legacy is given at compile time. 
  It does not support kind=4 internal units since in legacy years there 
should be no kind=4 internal units.

I have provuded a simplified test case for various combinations of comma 
embedded strings.

This has been regression tested on x86_64-pc-linux-gnu.

OK for trunk?

This use to work way back in early versions so should probably go to 7 
and 8 branches. Opinions welcome.

Regards,

Jerry

2018-11-04  Jerry DeLisle  <jvdelisle@gcc.gnu.org>

	* io/transfer.c (read_sf_internal): Add support for early
	comma termination of internal unit formatted reads.

Comments

Andreas Schwab Nov. 4, 2018, 6:48 a.m. | #1
On Nov 03 2018, Jerry DeLisle <jvdelisle@charter.net> wrote:

> +  /* To support legacy code we have to scan the input string one byte

> +     at a time because we don't no where an early comma may be and the


s/no/know/

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."
Bernhard Reutner-Fischer Nov. 4, 2018, 9:51 a.m. | #2
On Sat, 3 Nov 2018 15:33:07 -0700
Jerry DeLisle <jvdelisle@charter.net> wrote:

> diff --git a/libgfortran/io/transfer.c b/libgfortran/io/transfer.c

> index 31198a3cc39..0d26101cef0 100644

> --- a/libgfortran/io/transfer.c

> +++ b/libgfortran/io/transfer.c


> @@ -260,22 +250,80 @@ read_sf_internal (st_parameter_dt *dtp, size_t *length)

>        sseek (dtp->u.p.current_unit->s, -1, SEEK_CUR);

>      }

>  

> -  lorig = *length;

> -  if (is_char4_unit(dtp))

> +  /* To support legacy code we have to scan the input string one byte

> +     at a time because we don't no where an early comma may be and the


As Andreas said s/no/know/

> +     requested length could go passed the end of a comma shortened


s/passed/past/

> +      /* Get the first chracter of the string to establish the base


s/chracter/character/

> +      /* Now we scan the rest and exit deal with an end-of-file


s/ exit// ?
Jerry Nov. 4, 2018, 3:14 p.m. | #3
On 11/4/18 1:51 AM, Bernhard Reutner-Fischer wrote:
> On Sat, 3 Nov 2018 15:33:07 -0700

> Jerry DeLisle <jvdelisle@charter.net> wrote:

> 

>> diff --git a/libgfortran/io/transfer.c b/libgfortran/io/transfer.c

>> index 31198a3cc39..0d26101cef0 100644

>> --- a/libgfortran/io/transfer.c

>> +++ b/libgfortran/io/transfer.c

> 

>> @@ -260,22 +250,80 @@ read_sf_internal (st_parameter_dt *dtp, size_t *length)

>>         sseek (dtp->u.p.current_unit->s, -1, SEEK_CUR);

>>       }

>>   

>> -  lorig = *length;

>> -  if (is_char4_unit(dtp))

>> +  /* To support legacy code we have to scan the input string one byte

>> +     at a time because we don't no where an early comma may be and the

> 

> As Andreas said s/no/know/

> 

>> +     requested length could go passed the end of a comma shortened

> 

> s/passed/past/

> 

>> +      /* Get the first chracter of the string to establish the base

> 

> s/chracter/character/

> 

>> +      /* Now we scan the rest and exit deal with an end-of-file

> 

> s/ exit// ?

> 

> 

> 


Fixed all my typos, thanks.

Jerry
Thomas Koenig Nov. 8, 2018, 7:13 p.m. | #4
Hi Jerry!

> Hi all,

> 

> The attached patch adds code in read_sf_internal to handle early 

> termination of reads in the presence of comma's. This is to support 

> legacy codes which are not standard conforming as far as we can tell.

> 

> The additions are executed only if -std=legacy is given at compile time. 

>   It does not support kind=4 internal units since in legacy years there 

> should be no kind=4 internal units.

> 

> I have provuded a simplified test case for various combinations of comma 

> embedded strings.

> 

> This has been regression tested on x86_64-pc-linux-gnu.

> 

> OK for trunk?


Yes, OK. I like the fact that the old, slow behavior is restricted
to -std=legacy :-)

> This use to work way back in early versions so should probably go to 7 

> and 8 branches. Opinions welcome.


I agree, it should go in.

Thanks a lot for the patch!

Regards

	Thomas

Patch

diff --git a/libgfortran/io/transfer.c b/libgfortran/io/transfer.c
index 31198a3cc39..0d26101cef0 100644
--- a/libgfortran/io/transfer.c
+++ b/libgfortran/io/transfer.c
@@ -241,16 +241,6 @@  read_sf_internal (st_parameter_dt *dtp, size_t *length)
       && dtp->u.p.current_unit->pad_status == PAD_NO)
     hit_eof (dtp);
 
-  /* If we have seen an eor previously, return a length of 0.  The
-     caller is responsible for correctly padding the input field.  */
-  if (dtp->u.p.sf_seen_eor)
-    {
-      *length = 0;
-      /* Just return something that isn't a NULL pointer, otherwise the
-         caller thinks an error occurred.  */
-      return (char*) empty_string;
-    }
-
   /* There are some cases with mixed DTIO where we have read a character
      and saved it in the last character buffer, so we need to backup.  */
   if (unlikely (dtp->u.p.current_unit->child_dtio > 0 &&
@@ -260,22 +250,80 @@  read_sf_internal (st_parameter_dt *dtp, size_t *length)
       sseek (dtp->u.p.current_unit->s, -1, SEEK_CUR);
     }
 
-  lorig = *length;
-  if (is_char4_unit(dtp))
+  /* To support legacy code we have to scan the input string one byte
+     at a time because we don't no where an early comma may be and the
+     requested length could go passed the end of a comma shortened
+     string.  We only do this if -std=legacy was given at compile
+     time.  We also do not support this on kind=4 strings.  */
+  if (unlikely(compile_options.warn_std == 0)) // the slow legacy way.
     {
-      gfc_char4_t *p = (gfc_char4_t *) mem_alloc_r4 (dtp->u.p.current_unit->s,
-			length);
-      base = fbuf_alloc (dtp->u.p.current_unit, lorig);
-      for (size_t i = 0; i < *length; i++, p++)
-	base[i] = *p > 255 ? '?' : (unsigned char) *p;
-    }
-  else
-    base = mem_alloc_r (dtp->u.p.current_unit->s, length);
+      size_t n;
+      size_t tmp = 1;
+      char *q;
+
+      /* If we have seen an eor previously, return a length of 0.  The
+	 caller is responsible for correctly padding the input field.  */
+      if (dtp->u.p.sf_seen_eor)
+	{
+	  *length = 0;
+	  /* Just return something that isn't a NULL pointer, otherwise the
+	     caller thinks an error occurred.  */
+	  return (char*) empty_string;
+	}
+
+      /* Get the first chracter of the string to establish the base
+	 address and check for comma or end-of-record condition.  */
+      base = mem_alloc_r (dtp->u.p.current_unit->s, &tmp);
+      if (tmp == 0)
+	{
+	  dtp->u.p.sf_seen_eor = 1;
+	  *length = 0;
+	  return (char*) empty_string;
+	}
+      if (*base == ',')
+	{
+	  dtp->u.p.current_unit->bytes_left--;
+	  *length = 0;
+	  return (char*) empty_string;
+	}
 
-  if (unlikely (lorig > *length))
+      /* Now we scan the rest and exit deal with an end-of-file
+         condition or the comma.  */
+      for (n = 1; n < *length; n++)
+	{
+	  q = mem_alloc_r (dtp->u.p.current_unit->s, &tmp);
+	  if (tmp == 0)
+	    {
+	      hit_eof (dtp);
+	      return NULL;
+	    }
+	  if (*q == ',')
+	    {
+	      dtp->u.p.current_unit->bytes_left -= n;
+	      *length = n;
+	      break;
+	    }
+	}
+    }
+  else // the fast way
     {
-      hit_eof (dtp);
-      return NULL;
+      lorig = *length;
+      if (is_char4_unit(dtp))
+	{
+	  gfc_char4_t *p = (gfc_char4_t *) mem_alloc_r4 (dtp->u.p.current_unit->s,
+			    length);
+	  base = fbuf_alloc (dtp->u.p.current_unit, lorig);
+	  for (size_t i = 0; i < *length; i++, p++)
+	    base[i] = *p > 255 ? '?' : (unsigned char) *p;
+	}
+      else
+	base = mem_alloc_r (dtp->u.p.current_unit->s, length);
+
+      if (unlikely (lorig > *length))
+	{
+	  hit_eof (dtp);
+	  return NULL;
+	}
     }
 
   dtp->u.p.current_unit->bytes_left -= *length;