Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

C - What are the possible newline characters in different operating systems?

We have code that reads files line by line and stores each line as a string. The problem is that Linux adds '\n' at the end of the line, Windows adds '\r\n', and iOS adds '\r'. We want to replace all these special characters with '\0' so that we get our string.

Is there any other character added by any other OS? We need to handle that.

In response to the comments below, I tried using text file mode:

file = fopen ( filename, "rt" );
while (fgets(buf, sizeof buf, file) != NULL)
{
    (*properties)->url= (char *) malloc(strlen(buf)+1);
    strcpy( (*properties)->url, buf);
/////...................more code

}

//line x-->
strncpy(url,properties->url,strlen(properties->url));

At line x, gdb prints "https://example.com/file\n" for properties->url,

whereas my URL is "https://example.com/file".

1 Answer

There are three relatively common newline conventions: \r\n, \n, and \r, and a fourth one that can occur when an editor gets confused about the newline convention, \n\r. If an approach supports universal newlines, it supports all four simultaneously, even if mixed.
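For lines that have already been read with fgets(), as in the question, one simple way to drop the line ending under any of these four conventions is to truncate the string at the first '\r' or '\n'. The following is only a minimal sketch; chomp_universal() is a hypothetical helper name, not something from the original code, and it assumes the buffer holds at most one line.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the original code): truncate `line`
   at the first '\r' or '\n', whichever comes first, and return the
   length of what remains. Works for "\n", "\r\n", "\r", and "\n\r". */
size_t chomp_universal(char *line)
{
    /* strcspn() returns the length of the initial part of `line`
       that contains neither '\r' nor '\n'. */
    size_t len = strcspn(line, "\r\n");

    line[len] = '\0';
    return len;
}
```

Calling chomp_universal(buf) right after a successful fgets() would leave buf without its line ending, so a later strlen(buf) no longer counts the newline characters.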

Reading files line by line with universal newline support is easy. The only problem is that interactive input from line-buffered sources looks like it is read one line late. To avoid that, one can read lines into a dynamic buffer up to, but not including, the newline, and consume the newline when reading the next line. For example:

#include <sys/types.h>  /* ssize_t (POSIX) */
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <stdio.h>

ssize_t  getline_universal(char **dataptr, size_t *sizeptr, FILE *in)
{
    char   *data = NULL;
    size_t  size = 0;
    size_t  used = 0;
    int     c;

    if (!dataptr || !sizeptr || !in) {
        errno = EINVAL;
        return -1;
    }

    if (*sizeptr) {
        data = *dataptr;
        size = *sizeptr;
    } else {
        *dataptr = data;
        *sizeptr = size;
    }

    /* Ensure there are at least 2 chars available. */
    if (size < 2) {
        size = 2;
        data = malloc(size);
        if (!data) {
            errno = ENOMEM;
            return -1;
        }
        *dataptr = data;
        *sizeptr = size;
    }

    /* Consume leading newline (left in the stream by the previous call). */
    c = fgetc(in);
    if (c == '\n') {
        c = fgetc(in);
        if (c == '\r')
            c = fgetc(in);
    } else
    if (c == '\r') {
        c = fgetc(in);
        if (c == '\n')
            c = fgetc(in);
    }

    /* No more data? */
    if (c == EOF) {
        data[used] = '\0';
        errno = 0;
        return -1;
    }

    while (c != '\n' && c != '\r' && c != EOF) {

        if (used + 1 >= size) {
            if (used < 7)
                size = 8;
            else
            if (used < 1048576)
                size = (3 * used) / 2;
            else
                size = (used | 1048575) + 1048577;

            data = realloc(data, size);
            if (!data) {
                errno = ENOMEM;
                return -1;
            }

            *dataptr = data;
            *sizeptr = size;
        }

        data[used++] = c;
        c = fgetc(in);
    }

    /* Terminate line. We know used < size. */
    data[used] = '\0';

    /* Do not consume the newline separator. */
    if (c != EOF)
        ungetc(c, in);

    /* Done. */
    errno = 0;
    return used;
}

The above function works much like POSIX.1-2008 getline(), except that it supports all four newline conventions (even mixed), and that it omits the newline from the line read. (That is, the newline is not included in either the return value or the dynamically allocated buffer. The newline is left in the stream, and consumed by the next getline_universal() operation.)
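The pairing rule used when consuming a newline (a '\n' optionally followed by '\r', or a '\r' optionally followed by '\n', counts as one newline) can be illustrated in isolation on an in-memory string. This is only a sketch; newline_length() and count_newlines() are hypothetical helper names introduced here for illustration and are not part of the function above.

```c
#include <stddef.h>

/* Hypothetical helper: return the length (0, 1, or 2) of the newline
   sequence at the start of the NUL-terminated string s. */
size_t newline_length(const char *s)
{
    if (s[0] == '\n')
        return (s[1] == '\r') ? 2 : 1;   /* "\n\r" or "\n" */
    if (s[0] == '\r')
        return (s[1] == '\n') ? 2 : 1;   /* "\r\n" or "\r" */
    return 0;                            /* not a newline */
}

/* Hypothetical helper: count newline sequences in s, pairing
   greedily from left to right, even when conventions are mixed. */
size_t count_newlines(const char *s)
{
    size_t count = 0;

    while (*s) {
        size_t n = newline_length(s);
        if (n) {
            count++;
            s += n;     /* skip the whole newline sequence */
        } else {
            s++;        /* ordinary character */
        }
    }
    return count;
}
```

With this rule, two consecutive identical characters such as "\n\n" are two newlines (one empty line between them), whereas "\r\n" and "\n\r" are each a single newline.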

Unlike standard functions, getline_universal() always sets errno: to zero on success and at end of input, and to a nonzero value on error. If you don't like this behaviour, feel free to change it.

As a usage example:

int main(void)
{
    unsigned long  linenum = 0u;
    char          *line_buf = NULL;
    size_t         line_max = 0;
    ssize_t        line_len;

    while (1) {
        line_len = getline_universal(&line_buf, &line_max, stdin);
        if (line_len < 0)
            break;

        linenum++;

        printf("%lu: \"%s\" (%zd chars)\n", linenum, line_buf, line_len);
        fflush(stdout);
    }

    if (errno) {
        fprintf(stderr, "Error reading from standard input: %s.\n", strerror(errno));
        return EXIT_FAILURE;
    }

    /* Not necessary before exiting, but here's how to
       safely discard the line buffer: */
    free(line_buf);
    line_buf = NULL;
    line_max = 0;
    line_len = 0;

    return EXIT_SUCCESS;
}

Note that because free(NULL) is safe, you can discard the buffer (using free(line_buf); line_buf = NULL; line_max = 0;) before any call to getline_universal(&line_buf, &line_max, stream).

