chrisname 0 Light Poster

I reworked my code into a way that was simpler.

I have a URL in the following format: "http://sub.domain.tld/path/to/page"
I need to extract 'path' and 'page' and discard the rest of the URL.

I know about the System.Uri class, and with Uri.AbsolutePath I can get "/path/to/page". I need a way of splitting that into 'path' and 'page'. I can currently do it with the String.Substring / String.IndexOf / String.LastIndexOf methods, but that requires bounds checking and checking the return value of each method to be safe. Is there a simpler and more robust way?
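
A sketch of the split-on-separator idea, written in C rather than C# to match the other code in these posts. The helper name split_path is made up for illustration. It leans on strtok, which skips empty tokens, so leading, trailing and doubled slashes need no manual bounds checks. In .NET the same idea should be reachable through String.Split with StringSplitOptions.RemoveEmptyEntries, but treat that as something to verify rather than as tested code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split `path` on '/' in place, skipping empty segments.
 * Stores up to `max` segment pointers in `out` and returns how many were stored.
 * `path` must be writable because strtok overwrites the separators with '\0'. */
size_t split_path(char *path, const char **out, size_t max)
{
    size_t count = 0;
    char *seg;

    assert(path != NULL && out != NULL);

    for (seg = strtok(path, "/"); seg != NULL && count < max; seg = strtok(NULL, "/")) {
        out[count++] = seg;
    }
    return count;
}
```

Called on a writable copy of "/path/to/page/", this stores three segments, and the first and last are 'path' and 'page'. strtok is not reentrant, so this is only suitable for single-threaded use.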

chrisname 0 Light Poster

How can I write a bootloader which loads a Linux kernel? I want to write a simple bootloader like the one mentioned above.

Regards,
Mohsin

It's a lot more complicated to boot the Linux kernel. You have to be able to load all its boot modules, including the initial ramdisk, and tell it where they are, as well as what the root device is.

Try writing a simple stage1. (By the way, I figured out why times wasn't working: if you let NASM do the sectioning itself, so that it chooses what goes in .data, .bss and .text, then the directive times 510 - ($ - $$) db 0 will work.)

All you need to do for a stage1 is read some sectors from the boot device and then jump to [ES:BX]. I would say between 8 and 32 sectors (4 KiB and 16 KiB respectively), although GRUB loads about 60 sectors (30 KiB) AFAIK. The program you load has to be a flat binary, so you're pretty much in the same position as your stage1 loader, with one major advantage: no 512-byte limit.

In the second stage I would implement a filesystem (FAT32, ext2 or ReiserFS; both ext2 and ReiserFS are unencumbered by patents, but Linux usually resides on an ext{2, 3, 4} partition (3 and 4 are backwards-compatible), so I'd choose ext2) and an executable format (e.g. ELF, COFF, AOUT, PE (again, ELF, COFF and AOUT (a.out) aren't …

chrisname 0 Light Poster

Hm, I got it to work by allowing nasm to section everything itself, so I basically removed all sectioning stuff and then used

times 510 - ($ - $$) db 0

chrisname 0 Light Poster

($ - $$) gives you the offset in the current section, that is, in .data. The way your code is written, the .data section will occupy whole 512 bytes.

You need a simple

times (510 - $) db 0

I have changed that, but now nasm complains that times is not being supplied a constant value:

error: non-constant argument supplied to TIMES

I'm not understanding how this:

section .data
;=============================================================================;
dseg	db	0x10	; data segment
cseg	db	8		; code segment
off		dw	0x7e00  ; offset of stage2 in memory (a word: 0x7e00 does not fit in a byte)
bootd	db	0		; boot disk
		times	(510 - $)	db	0
		dw	0xaa55
;=============================================================================;

is not a constant value.

What can I try to do to get this to work?

chrisname 0 Light Poster

($ - $$) gives you the offset in the current section, that is, in .data. The way your code is written, the .data section will occupy whole 512 bytes.

You need a simple

times (510 - $) db 0

Oh, thanks.
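
The arithmetic behind the padding line can be sketched in C. This is an illustration only, not NASM's implementation, and build_boot_sector is a made-up name. The idea is that the padding is 510 minus the number of bytes emitted so far in the whole file. Because $$ is the start of the current section, ($ - $$) only matches that count when everything sits in a single section, which is consistent with what was observed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SECTOR_SIZE = 512, SIGNATURE_SIZE = 2 };

/* Copy `len` bytes of code into a 512-byte sector, zero-fill the gap and
 * append the boot signature. Returns the number of padding bytes written,
 * or -1 if the code does not fit in the first 510 bytes. */
int build_boot_sector(uint8_t sector[SECTOR_SIZE], const uint8_t *code, size_t len)
{
    const size_t limit = SECTOR_SIZE - SIGNATURE_SIZE; /* 510 */

    assert(sector != NULL && (code != NULL || len == 0));

    if (len > limit) {
        return -1;
    }
    memset(sector, 0, SECTOR_SIZE);   /* plays the role of "times ... db 0" */
    if (len > 0) {
        memcpy(sector, code, len);
    }
    sector[510] = 0x55;               /* dw 0xaa55 is stored low byte first */
    sector[511] = 0xAA;
    return (int)(limit - len);
}
```

With three bytes of code the function pads 507 bytes, so the signature always lands at offsets 510 and 511 no matter how much code is added.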

chrisname 0 Light Poster

Hi :)

I'm writing a basic 2 stage bootloader (I actually do plan for it to work, it's not just a hello world loader) and I've migrated to nasm. I find it much easier to use, etc. etc.

Anyway; I'm using the times directive to get the program to 512 bytes, and in all the examples I've seen about boot sector programs, they always have it like this:

times	510-($-$$)	db	0

(it's 510 to account for the boot signature).

Anyway, that's exactly how I have it. I've tried org'ing to different locations ([org 0] was interesting, I got a bunch of S characters printed in an infinite loop) and none have helped.

I've read the nasm manual on times; but it didn't help. I'm having to change the number from 510 to get it to stay at 512 bytes every time I add new code, and it's getting annoying because I have to compile the program twice so that I can change it.

Anyway, what I don't understand is why the times instruction never works for me, but it seems to work when I compile other people's code. I tried combining my include files into one file, and making sure the times and boot signature were at the end of the data segment, but it didn't work.

Here's the main file; tell me if you need more

;=============================================================================;
; stage1.asm: stage1, load and initialize stage2                              ;
;=============================================================================;

;=============================================================================;
[bits	16]
[org 0x7c00]
jmp …

chrisname 0 Light Poster

Assuming you're using nasm and you're on an x86 processor, here's a quick example of boot sector code to print "Hello world"

;=============================================================================;
; helloworld.asm: print "Hello world" to the screen and hang                  ;
;=============================================================================;

section .text
;=============================================================================;
; start: entry point
start:
[org 0x7c00] ; Tell NASM the BIOS loads this code at 0x7c00
	cli ; Disable interrupts

	mov si, msg ; Place message in si
	call putstr ; Print the string
;-----------------------------------------------------------------------------;
; hang: hang the machine indefinitely
hang:
	jmp hang
;-----------------------------------------------------------------------------;
; putstr: print a string given in si to the screen
putstr:
	pusha ; Push all registers
	mov ah, 0eh ; BIOS teletype output (int 0x10, function 0x0e)
	
	; putchar: print a character from the string
	putchar:
		lodsb ; Get a character
		cmp al, 0 ; Check for null-terminator
		je .return ; NULL found, stop
		; Not NULL, print the character
		int 0x10
		jmp putchar
	.return:
		popa ; Pop all registers
		mov ax, 0
		ret ; Return (0 in ax)
;=============================================================================;

section .data ; This always goes at the end of the binary, so putting it at the
; end of the source file makes sense.
;=============================================================================;
msg	db	'Hello world', 13, 10, 0 ; The 13, 10, 0 refers to '\r\n\0' in C/C++.
	times	482-($-$$)	db	0 ; Fill the binary until it's 512 bytes
	dw	0xaa55 ; Boot signature stored as 0x55 0xaa in memory (little endian)
;=============================================================================;

Now, for some reason that times isn't working. Really it should be

times 510-($-$$) db 0

but for some reason, that never works for me.

Hopefully you can compare my printing function and yours, and see where yours goes wrong.

chrisname 0 Light Poster

One more thing...
I forgot to mention that you'll see a lot of examples where the first line is something like #!/usr/local/bin/perl -w where the -w turns on warnings. Warnings can tell you about unused filehandles or variables with no assigned value. Or you can turn on warnings by a use warnings; statement in your program, or perl -w from the command line.

Oh, thanks.

chrisname 0 Light Poster

Also use strict; nags you until you declare all variables (by adding 'my', as in foreach my $arg (@ARGV), for example), which seems like a nuisance at first, but you soon get used to it.

Oh, I was declaring variables before using them:

use strict;

my @days = ("Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday", "Saturday", "Sunday"
           );
           
my $day = "";
           
foreach $day (@days) {
    print "$day\n";
}

when I could have done this

use strict;

my @days = ("Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday", "Saturday", "Sunday"
           );
           
foreach my $day (@days) {
    print "$day\n";
}

chrisname 0 Light Poster

You're welcome. I neglected to say the test worked for me only after I changed the '==' operator to 'eq' as ItecKid pointed out. Also, I recommend using the strict module in any Perl script -- just add use strict; -- unless you have a particular reason not to use it. It warns you of unused or misspelled variable names, etc.

Ok, cool, thanks. I'd seen that before but didn't know what it does.

Thank you.

chrisname 0 Light Poster

I don't have bash, but when I tested your Perl script from the Windows command line (after copying an HTML file into the current directory) it worked -- until it died because I don't have a readFile subroutine. But the point is that I didn't get the "$0: no existing file available for parsing" message on my platform. It's a mystery.

Remember that naming variables in Perl is case-sensitive. I see you have variables named "$htmlFile" and "$htmlfile" in your program, but I don't see how that would cause this particular error.

Thanks for pointing that out, I didn't notice that! No wonder I can't read the file :)

chrisname 0 Light Poster

Your question intrigued me, so I copied your code and got the same error. Your problem is with this if ($htmlFile == 'no file found') .

Instead of the == operator, use the eq operator.

Ok :)

Your actual problem, however, is that this will only work for the first HTML file in @ARGV. return $arg will exit the function as soon as it finds a file. If you want it to find multiple files, push $arg onto an array and return the entire array.

That was intentional. It's meant to return as soon as it finds an existing file.
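
Why == misfires on strings can be illustrated in C, as an analogy only. Perl's == converts both operands to numbers before comparing, and a string that does not start with a number converts to 0. C's strtod behaves the same way for that case, although it is not identical to Perl's conversion in general. The function names below are made up for the illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Rough stand-in for Perl's == : convert both strings to numbers, then compare.
 * strtod returns 0.0 when the string does not begin with a number. */
int numeric_equal(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strtod(a, NULL) == strtod(b, NULL);
}

/* Stand-in for Perl's eq : compare the characters themselves. */
int string_equal(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}
```

Under the numeric comparison, "htmlfile.html" and "no file found" both become 0 and so compare equal, which matches the "no existing file" branch being taken even though the file was found. The character comparison tells them apart.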

chrisname 0 Light Poster

So I decided to learn Perl yesterday, and I'm writing a Perl script to try and parse HTML files (not using regexps for the actual parsing).

Anyway; I have this subroutine to iterate over each argument in @ARGV and, if one of them is an existing file, return its filename (and path if it's not in the cwd; the path would have to be given on the command line). If it breaks out of the loop and still hasn't found a file, it returns the text "no file found". For some reason, although I'm calling it correctly and although it's actually telling me it found the file, it still returns "no file found":

sub getHTMLfile {
    foreach $arg (@ARGV) {
        if (-e $arg) {
            print "Found file \"$arg\"\n";
            return $arg;
        }
    }
    
    return "no file found";
}

This is where I call getHTMLfile:

sub __main {
    $htmlFile   = getHTMLfile(); # Get file to parse
    
    if ($htmlFile == 'no file found') { # getHTMLfile() didn't find a suitable
                                        # file in the argument list.
        print "$0: no existing file available for parsing\n";
        exit;
    }
    
    $htmlBuffer = readFile($htmlfile); # Read its contents into memory
    
    if ($htmlBuffer == 'no data found') {
        print "$0: unable to find data in the file, is it blank?\n";
        exit;
    }
    
    pre_parse($htmlBuffer); # Parse syntax errors and stuff
}

I'm getting the "$0: no existing file available for parsing" message when I call the script with this (bash):

$ perl parsehtml.pl htmlfile.html
Found file …

chrisname 0 Light Poster

I am starting to write an assembly program for an MC68HC12. I will be reading an 8-bit signed number into PortA and outputting the results to 7-segment LEDs, common anode. I will be using Port P to output the segments and PortCAN to control the LEDs displayed. I assume I need 4 LEDs, 1 for the sign and 3 for the max number of 128. If I hard-wire the data in, I get something like below. How would I go about pulling the data in from a port and then displaying the number on a set of LEDs?

org $1000
        ldaa #$ff
        staa DDRP
        ldaa #$7c
        staa DDRCAN
forever ldy #display
next    ldaa 0,y
        staa PORTP
        ldaa PORTCAN
        anda #$83
        staa PORTCAN
        ldaa 1,y
        oraa PORTCAN
        staa PORTCAN

; a delay subroutine would go here, then the code would cycle back to forever

display
       fcc  $30,$40
       fcc  $6d,$20
       fcc  $79,$10
       fcc  $33,$08
       fcc  $5b,$04

Thanks

If you know what the port is (e.g. 0x60 is the keyboard data port on a PC) you can use the in instruction.

It's used like this (AT&T syntax):

inb $0x60, %al

which would store the byte read from the keyboard port in al. You'll probably want to store it elsewhere, though.

Hope this helps.
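
Once the byte has been read, the display side of the question comes down to splitting a signed 8-bit value into a sign and three decimal digits. Here is a minimal sketch of that step in C rather than HC12 assembly. The struct and function names are hypothetical, and mapping each digit to a segment pattern is left out because it depends on the wiring.

```c
#include <assert.h>
#include <stdint.h>

/* One entry per LED position: a sign flag plus three decimal digits. */
struct display_digits {
    int negative;
    uint8_t hundreds;
    uint8_t tens;
    uint8_t ones;
};

/* Split a value in the signed 8-bit range into sign and digits.
 * The parameter is an int so that negating -128 cannot overflow. */
struct display_digits split_for_display(int value)
{
    struct display_digits d;

    assert(value >= -128 && value <= 127);

    d.negative = value < 0;
    if (d.negative) {
        value = -value;
    }
    d.hundreds = (uint8_t)(value / 100);
    d.tens = (uint8_t)((value / 10) % 10);
    d.ones = (uint8_t)(value % 10);
    return d;
}
```

For -128 this gives the sign flag set and the digits 1, 2 and 8, which is the widest case the four LEDs have to show.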

chrisname 0 Light Poster

Looks like your problem is that the for loop is incrementing n , but you're also incrementing it at the end of the loop:

for (n = 0; n < 49; n++) {
  putchar(bstr[n]);
  ++n;
}

That'll print every other character in the string, which is what you appear to be getting. Try this instead:

for (n = 0; n < 49; n++) {
  putchar(bstr[n]);
}

Lol, I feel stupid now...
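
The effect of the extra ++n can be reproduced without any CPUID code. In this sketch (copy_with_step is a made-up helper) a step of 1 mirrors the corrected loop and a step of 2 mirrors the loop that increments n twice per pass.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy characters from src to dst, advancing the index by `step` each pass.
 * dst must have room for strlen(src) + 1 bytes.
 * Returns the number of characters copied. */
size_t copy_with_step(char *dst, const char *src, size_t step)
{
    size_t len, n, out = 0;

    assert(dst != NULL && src != NULL && step > 0);

    len = strlen(src);
    for (n = 0; n < len; n += step) {
        dst[out++] = src[n];
    }
    dst[out] = '\0';
    return out;
}
```

Feeding it "Intel(R) Core(TM)2" with a step of 2 produces "ItlR oeT)", the same prefix as the garbled brand string reported in the thread.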

chrisname 0 Light Poster

I'm still doing some cpuid code; but now I want to get the brand string. I've sort of got about half of it... If you have a browser that supports it; search for "/* relevant */" to seek straight to the relevant part of the code

Important edit: if I strncpy() it into a data structure or itself, it works :l; so this is probably a problem with something else.

cpuid_t do_cpuid(void) {
    cpuid_t stcpuid;

    /* Get vendor string: */ {
        unsigned a[3] = {0};
        int eax;
     
        asm volatile(
            "cpuid\n\t"
            :"=a"(eax), "=b"(a[0]), "=d"(a[1]), "=c"(a[2])
            :"a"(CPUID_GET_VENDORSTRING)
        );
       
        if (eax == 0) {
            fprintf(stderr, "cpuid not supported on this CPU.\n");
            exit(1);
        }
        
            memmove(stcpuid.vstring, a, 12);
            strncpy(stcpuid.manufacturer, get_manufacturer(stcpuid.vstring), 24);    
    }
    /* relevant */
    /* Get brandstring: */ {
        unsigned _bstr[12 + 1] = {0};
        char bstr[48 + 1] = {0};
        
        asm(
            "cpuid\n\t"
            :"=a"(_bstr[0]), "=b"(_bstr[1]), "=c"(_bstr[2]), "=d"(_bstr[3])
            :"a"(0x80000002)
        );
        
        asm(
            "cpuid\n\t"
            :"=a"(_bstr[4]), "=b"(_bstr[5]), "=c"(_bstr[6]), "=d"(_bstr[7])
            :"a"(0x80000003)
        );
        
        asm(
            "cpuid\n\t"
            :"=a"(_bstr[8]), "=b"(_bstr[9]), "=c"(_bstr[10]), "=d"(_bstr[11])
            :"a"(0x80000004)
        );
    
        /* I decided that the asm above was best to do in 3 separate calls; it is
         * clearer this way, I hope.
         */
 
        memmove(bstr, _bstr, 48 + 1);
        
#if 1   /* What I should be getting: Intel(R) Core(TM)2 Quad  CPU   Q8200  @ 2.33GHz
         * What I am getting:        ItlR oeT) ud P  80 @23Gz
         * As you can see there are some slight differences.
         */
        int n = 0;
        for (n = 0; n < 49; n++) {
            putchar(bstr[n]);
            ++n;
        }
    
        putchar('\n');
    
#endif
    }
        
    return stcpuid;
}

chrisname 0 Light Poster

Thanks! It worked!

chrisname 0 Light Poster

Don't, assembly can be a hard step to make...Just remember you're dealing with either addresses or numbers...Gerard4143

Yeah; it's difficult all right :P
Easy to learn enough opcodes to get started, and the syntax is easy (Intel is, AT&T took a little getting used to but I prefer it now -- you can tell more what's going on because instead of just "mov" you get "movl", "movq", etc.), but it's just hard to implement and the semantics are crazy.

Anyway, thanks for helping. I'll try that CPUID code now.

chrisname 0 Light Poster

Lol, thank you! :)

I feel silly now :P

chrisname 0 Light Poster

Hi :)

I'm trying to get the vendor ID using the CPUID opcode. Again, this is inline ASM with C; but it's the ASM that I need help with. The C is fine AFAIK.

I'm not sure if it's the way I'm trying to do this, or if my CPU simply doesn't support CPUID...

char* getCPUID(void) {
    int vendor[3];
    char vstring[12 + 1];

    /* Get vendor string: */
    asm("mov %0, %%eax\n\t"
        "cpuid\n\t"
        "cmp $3, %%eax\n\t" /* If eax < 3, CPU doesn't support CPUID... */
        "jl dont_bother\n\t" /* So we can't CPUID :( */
        "mov %%ebx, %0\n\t"
        "mov %%ecx, %1\n\t"
        "mov %%edx, %2\n\t"
        "dont_bother:\n\t"
        "mov $-1, %0"
        :"=r"(vendor[0]), "=r"(vendor[1]), "=r"(vendor[2])
       );
       
   if (vendor[0] == -1) {
       return "Error: CPU doesn't support CPUID!";
   }

   /* Store in vstring; 12 char string results */
   memcpy(&(vstring[0]), &(vendor[0]), sizeof(int));
   memcpy(&(vstring[4]), &(vendor[1]), sizeof(int));
   memcpy(&(vstring[8]), &(vendor[2]), sizeof(int));

   /* Make sure string is NULL-terminated */
   vstring[12] = '\0';
   
   return getManufacturer(vstring); /* Function simply compares vstring to some
                                       known vendor strings. */
}

... but my error message is printed. Is it my ASM, or the CPU?
I'm hoping it's the ASM...

$ ./a.out
Manufacturer: Error: CPU doesn't support CPUID!

chrisname 0 Light Poster

I almost forgot to tell you: if you're using 64-bit Linux assembly programming, then most of the books and tutorials out there are outdated...The ABI (application binary interface) for 64-bit Linux programs is different from the 32-bit ABI...Below is an excerpt from the entry_64.S file from a 64-bit kernel source...Please note the order in which the registers are passed data for a system call...You'll see I used this order in the 64-bit code above

%rax
%rdi
%rsi
%rdx
%r10
%r8
%r9
%r11

/*
 * System call entry. Upto 6 arguments in registers are supported.
 *
 * SYSCALL does not save anything on the stack and does not change the
 * stack pointer.
 */

/*
 * Register setup:
 * rax  system call number
 * rdi  arg0
 * rcx  return address for syscall/sysret, C arg3
 * rsi  arg1
 * rdx  arg2
 * r10  arg3 	(--> moved to rcx for C)
 * r8   arg4
 * r9   arg5
 * r11  eflags for syscall/sysret, temporary for C
 * r12-r15,rbp,rbx saved by C code, not touched.
 *
 * Interrupts are off on entry.
 * Only called from user space.
 *
 * XXX	if we had a free scratch register we could save the RSP into the stack frame
 *      and report it properly in ps. Unfortunately we haven't.
 *
 * When user can change the frames always force IRET. That is because
 * it deals with uncanonical addresses better. SYSRET has trouble
 * with them due to …

chrisname 0 Light Poster

Try this code...I'm not sure if you're running a 32 or 64 bit box...

#include <stdio.h>
#include <stdlib.h>

void printit(char *chr)
{
	//64 bit
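	__asm__ __volatile__
	(
	 	"movq	$1, %%rax\n\t"	/* system call number 1: write */
		"movq	$1, %%rdi\n\t"	/* arg0: file descriptor 1 (stdout) */
		"movq	%0, %%rsi\n\t"	/* arg1: address of the character */
		"movq	$1, %%rdx\n\t"	/* arg2: number of bytes to write */
		"syscall\n\t"
		:			/* no outputs */
		:"r"(chr)		/* input: pointer to the character */
		:"rax", "rdi", "rsi", "rdx", "rcx", "r11", "memory"
	);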
	//32 bit - Note untested
	//__asm__ __volatile__
	//(
	//	"movl	$4, %%eax\n\t"
	//       	"movl	$1, %%ebx\n\t"
	//	"movl	%0, %%ecx\n\t"
	//	"movl	$1, %%edx\n\t"
	//	"int	$0x80\n\t"
	//	:"=m"(chr)	
	//);
}

int main(int argc, char**argv)
{
	char ch = 0x61;
	int i = 0;

	for (i = 0; i < 26; ++i)
	{
		printit(&ch);
		++ch;
	}
	exit(EXIT_SUCCESS);
}

I'm using x86_64 so yes, 64-bit.
And thanks :) that does work.
I'll play around with that now.
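
For comparison, the same write call can be made without inline assembly through libc's syscall() wrapper, which loads the call number and arguments into the registers in the order discussed in this thread. This is a Linux-only sketch, and write_char and print_range are made-up names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Write one byte to stdout with the raw write system call.
 * Returns the kernel's result: 1 on success, -1 on error. */
long write_char(char c)
{
    return syscall(SYS_write, 1, &c, (size_t)1);
}

/* Print every character from `first` to `last` inclusive.
 * Returns how many were written, or -1 on the first failure. */
long print_range(char first, char last)
{
    long total = 0;
    int c;

    assert(first <= last);

    for (c = first; c <= last; c++) {
        if (write_char((char)c) != 1) {
            return -1;
        }
        total++;
    }
    return total;
}
```

Calling print_range('a', 'z') writes the alphabet one byte at a time, much like the loop around printit above.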

chrisname 0 Light Poster

I have decided to try this in a plain asm file using as and ld;

.intel_syntax noprefix

.globl _start

_start:  
    mov ah, 0x4C
    mov al, 'A'

    abc:
        nextletter:
            add ah, 1
        print:
            int 0x21

I don't understand where I'm getting a segfault.
I thought it was because of my use of AT&T syntax and not knowing it properly; so I tried with intel syntax and it didn't help :l

Does anyone know where the segfault is?

Thanks :l

chrisname 0 Light Poster

If you're going to be messing around with assembly system programming

Yes, I will be, but not for a while. I just wanted to play with hlt. And by a while I mean a few years. I'm nowhere near that level yet.

you should download the Intel or AMD manuals, they have explanations for all the opcodes..

Oh, thanks :-)

chrisname 0 Light Poster

Your seg fault is probably from this line

movb %ah, 0x4C\n

You're moving the contents of ah into memory address 0x4c

AHHH! Drat! The first tutorial I ever read was in Intel syntax; so as I say I'm likely to make many mistakes in AT&T syntax. I think, however, that it is probably better to write AT&T -- when you learn it I think it's more explicit -- you can easily see what are variables, what is in hex, and what is being used as a register.

Thank you :)

Are you writing this for Linux or Windows?

Linux.

I just figured out why the list of registers didn't work. You're meant to put %% where you use them instead of a single %, as per this article: http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html#s5

Do you know if I need ring 0 access to use halt? I would assume so because IIRC it hangs the machine; and obviously you don't want me running code on your machine that halts execution :P

Thanks.

chrisname 0 Light Poster

I'm actually writing this code as inline ASM but I feel it is more relevant to ASM than it is to C...

Basically all I want to do for now is print the alphabet. However I'm having some trouble getting used to AT&T syntax (I would rather learn it than keep just using Intel syntax) and I'm also unsure as to how to solve this segfault I'm getting:

#include <stdio.h>

int main() {
    // code
    {
        asm volatile("movb %ah, 0x4C\n" /* BIOS interrupt call for output */
            "movb $'A', %AL\n"          /* Place "A" in AL                */
            "jmp start\n"
            "start:\n\t"
                "add $1, %AL\n\t"   /* Add 1 to 'A' to move on to 'B', etc. */
                "print:\n\t\t"
                    "int $0x21\n\t" /* Print contents of AL */
              #if 0
                "hlt\n"       /* Halt the machine (commented out because I think I need ring 0 access to use hlt) */
              : " %ax", "%bx" /* List of registers used (commented out because GCC complained of the comma)       */
              #endif
            );
    }
    // code
    return 0;
}

I have tried without the "volatile" keyword. It did nothing, as I expected...

Thank you :)